From 48fd7ec67f6cea86eb9987a3284e9b7e91385e0b Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Wed, 9 Mar 2022 11:48:45 +0000 Subject: [PATCH 01/89] clarify pre-rquisites and link to doc --- docs/setup-dev-environment.md | 8 ++++++-- etl/README.md | 27 ++++++++++++--------------- 2 files changed, 18 insertions(+), 17 deletions(-) diff --git a/docs/setup-dev-environment.md b/docs/setup-dev-environment.md index 1d2ba0bd..734c5c8b 100644 --- a/docs/setup-dev-environment.md +++ b/docs/setup-dev-environment.md @@ -230,8 +230,12 @@ npm install ## :house: Loading the building data +There are several ways to create the Colouring London database in your environment. The simplest way if you are just trying out the application would be to use test data from OSM, but otherwise you should follow one of the instructions below to create the full database either from scratch, or from a previously made db (via a dump file). + +To create the full database from scratch, follow [these instructions](../etl/README.md), otherwise choose one of the following: +
- With a database dump

+ Create database from dump

If you are a developer on the Colouring London project (or another Colouring Cities project), you may have a production database (or staging etc) that you wish to duplicate in your development environment. @@ -261,7 +265,7 @@ ls ~/colouring-london/migrations/*.up.sql 2>/dev/null | while read -r migration;
- With test data

+ Create database with test data

This section shows how to load test buildings into the application from OpenStreetMaps (OSM). diff --git a/etl/README.md b/etl/README.md index 60f25780..93554116 100644 --- a/etl/README.md +++ b/etl/README.md @@ -1,4 +1,6 @@ -# Data loading +# Creating a Colouring London database from scratch + +## Data loading The scripts in this directory are used to extract, transform and load (ETL) the core datasets for Colouring London: @@ -8,20 +10,13 @@ for Colouring London: ## Prerequisites -Install PostgreSQL and create a database for colouringlondon, with a database -user that can connect to it. The [PostgreSQL -documentation](https://www.postgresql.org/docs/12/tutorial-start.html) covers -installation and getting started. +You should already have set up PostgreSQL and created a database. Make sure to create environment variables to use `psql` if you haven't already: -Install the [PostGIS extension](https://postgis.net/). - -Connect to the colouringlondon database and add the PostGIS, pgcrypto and -pg_trgm extensions: - -```sql -create extension postgis; -create extension pgcrypto; -create extension pg_trgm; +```bash +export PGPASSWORD= +export PGUSER= +export PGHOST=localhost +export PGDATABASE= ``` Create the core database tables: @@ -88,4 +83,6 @@ psql < ../migrations/003.index-buildings.up.sql ## Finally -Run the remaining migrations in `../migrations` to create the rest of the database structure. \ No newline at end of file +Run the remaining migrations in `../migrations` to create the rest of the database structure. + +# Updating the Colouring London database with new OS data \ No newline at end of file From 2dae59e5400025160f418de901b9a13cf9728535 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Wed, 9 Mar 2022 13:24:27 +0000 Subject: [PATCH 02/89] data downloading --- etl/README.md | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/etl/README.md b/etl/README.md index 93554116..4dd2c931 100644 --- a/etl/README.md +++ b/etl/README.md @@ -1,6 +1,6 @@ # Creating a Colouring London database from scratch -## Data loading +## Data downloading The scripts in this directory are used to extract, transform and load (ETL) the core datasets for Colouring London: @@ -8,6 +8,19 @@ for Colouring London: 1. Building geometries, sourced from Ordnance Survey MasterMap (Topography Layer) 1. Unique Property Reference Numbers (UPRNs), sourced from Ordnance Survey AddressBase +To get the required datasets, you'll need to complete the following steps: + +1. Sign up for the Ordnance Survey [Data Exploration License](https://www.ordnancesurvey.co.uk/business-government/licensing-agreements/data-exploration-sign-up). You should receive an e-mail with a link to log in to the platform (this could take up to a week). +2. Navigate to https://orders.ordnancesurvey.co.uk/orders and click the button for: ✏️ Order. From here you should be able to click another button to add a product. +3. Drop a rectangle or Polygon over London and make the following selections, clicking the "Add to basket" button for each: + +![](screenshot/MasterMap.png) +

+ +![](screenshot/AddressBase.png) + +4. You should be then able to check out your basket and download the files + ## Prerequisites You should already have set up PostgreSQL and created a database. Make sure to create environment variables to use `psql` if you haven't already: From 732bae1f202a748d5942794db32b556bf4f314b3 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Wed, 9 Mar 2022 13:33:26 +0000 Subject: [PATCH 03/89] clarify --- etl/README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/etl/README.md b/etl/README.md index 4dd2c931..df51754f 100644 --- a/etl/README.md +++ b/etl/README.md @@ -35,7 +35,8 @@ export PGDATABASE= Create the core database tables: ```bash -psql < ../migrations/001.core.up.sql +cd ~/colouring-london +psql < migrations/001.core.up.sql ``` There is some performance benefit to creating indexes after bulk loading data. From bf75d6f9ed6a413a56b0339064125986871cc778 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Wed, 9 Mar 2022 13:38:57 +0000 Subject: [PATCH 04/89] make data available --- etl/README.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/etl/README.md b/etl/README.md index df51754f..832b8239 100644 --- a/etl/README.md +++ b/etl/README.md @@ -5,7 +5,7 @@ The scripts in this directory are used to extract, transform and load (ETL) the core datasets for Colouring London: -1. Building geometries, sourced from Ordnance Survey MasterMap (Topography Layer) +1. Building geometries, sourced from Ordnance Survey (OS) MasterMap (Topography Layer) 1. Unique Property Reference Numbers (UPRNs), sourced from Ordnance Survey AddressBase To get the required datasets, you'll need to complete the following steps: @@ -45,6 +45,9 @@ creation steps below. Install GNU parallel, this is used to speed up loading bulk data. +## Make data available to Ubuntu + +If you didn't download the OS files to the Ubuntu machine where you are setting up your Colouring London application, you will need to make them available there. If you are using Virtualbox, you could host share a folder containing the files with the VM via a shared folder (e.g. [see these instructions for Mac](https://medium.com/macoclock/share-folder-between-macos-and-ubuntu-4ce84fb5c1ad)). ## Process and load Ordance Survey data From 269f24b946efba95feb1b7a757b0016a615caff1 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Thu, 10 Mar 2022 09:33:55 +0000 Subject: [PATCH 05/89] remove section duplicated elsewhere --- etl/README.md | 22 ---------------------- 1 file changed, 22 deletions(-) diff --git a/etl/README.md b/etl/README.md index 832b8239..9fe3adde 100644 --- a/etl/README.md +++ b/etl/README.md @@ -76,28 +76,6 @@ load_uprns.py ./addressbase_dir psql < ../migrations/003.index-buildings.sql ``` -## Alternative, using OpenStreetMap - -This uses the [osmnx](https://github.com/gboeing/osmnx) python package to get OpenStreetMap data. You will need python and osmnx to run `get_test_polygons.py`. - -To help test the Colouring London application, `get_test_polygons.py` will attempt to save a -small (1.5km²) extract from OpenStreetMap to a format suitable for loading to the database. 
- -In this case, run: - -```bash -# download test data -python get_test_polygons.py -# load all building outlines -./load_geometries.sh ./ -# index geometries (should be faster after loading) -psql < ../migrations/002.index-geometries.up.sql -# create a building record per outline -./create_building_records.sh -# index building records -psql < ../migrations/003.index-buildings.up.sql -``` - ## Finally Run the remaining migrations in `../migrations` to create the rest of the database structure. From d58b0d35fb8bef40cfa72a13ca262ba5f3edb67c Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Thu, 10 Mar 2022 09:35:04 +0000 Subject: [PATCH 06/89] remove duplicate info --- etl/README.md | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/etl/README.md b/etl/README.md index 9fe3adde..0c4bb0a4 100644 --- a/etl/README.md +++ b/etl/README.md @@ -49,12 +49,7 @@ Install GNU parallel, this is used to speed up loading bulk data. If you didn't download the OS files to the Ubuntu machine where you are setting up your Colouring London application, you will need to make them available there. If you are using Virtualbox, you could host share a folder containing the files with the VM via a shared folder (e.g. [see these instructions for Mac](https://medium.com/macoclock/share-folder-between-macos-and-ubuntu-4ce84fb5c1ad)). -## Process and load Ordance Survey data - -Before running any of these scripts, you will need the OS data for your area of -interest. AddressBase and MasterMap are available directly from [Ordnance -Survey](https://www.ordnancesurvey.co.uk/). The alternative setup below uses -OpenStreetMap. +## Process and load Ordnance Survey data The scripts should be run in the following order: From 490307e9c5c31e33044641d5fb4718c1052b42c1 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Thu, 10 Mar 2022 09:46:12 +0000 Subject: [PATCH 07/89] clarify --- etl/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/etl/README.md b/etl/README.md index 0c4bb0a4..a451b26e 100644 --- a/etl/README.md +++ b/etl/README.md @@ -47,7 +47,7 @@ Install GNU parallel, this is used to speed up loading bulk data. ## Make data available to Ubuntu -If you didn't download the OS files to the Ubuntu machine where you are setting up your Colouring London application, you will need to make them available there. If you are using Virtualbox, you could host share a folder containing the files with the VM via a shared folder (e.g. [see these instructions for Mac](https://medium.com/macoclock/share-folder-between-macos-and-ubuntu-4ce84fb5c1ad)). +If you didn't download the OS files to the Ubuntu machine where you are setting up your Colouring London application, you will need to make them available there. If you are using Virtualbox, you could host share a the two folders containing the files (one for MasterMap, one for AddressBase) with the VM via a shared folder (e.g. [see these instructions for Mac](https://medium.com/macoclock/share-folder-between-macos-and-ubuntu-4ce84fb5c1ad)). 
## Process and load Ordnance Survey data From b85a2bf86527c8a69871b0117c0a1fcd0d23f5b6 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Thu, 10 Mar 2022 09:53:12 +0000 Subject: [PATCH 08/89] add sudo cmds --- etl/README.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/etl/README.md b/etl/README.md index a451b26e..37bb13c0 100644 --- a/etl/README.md +++ b/etl/README.md @@ -54,17 +54,18 @@ If you didn't download the OS files to the Ubuntu machine where you are setting The scripts should be run in the following order: ```bash +cd ~/colouring-london/etl # extract both datasets -extract_addressbase.sh ./addressbase_dir -extract_mastermap.sh ./mastermap_dir +sudo ./extract_addressbase.sh ./addressbase_dir +sudo ./extract_mastermap.sh ./mastermap_dir # filter mastermap ('building' polygons and any others referenced by addressbase) -filter_transform_mastermap_for_loading.sh ./addressbase_dir ./mastermap_dir +sudo ./filter_transform_mastermap_for_loading.sh ./addressbase_dir ./mastermap_dir # load all building outlines -load_geometries.sh ./mastermap_dir +sudo ./load_geometries.sh ./mastermap_dir # index geometries (should be faster after loading) psql < ../migrations/002.index-geometries.sql # create a building record per outline -create_building_records.sh +sudo ./create_building_records.sh # add UPRNs where they match load_uprns.py ./addressbase_dir # index building records From b773cec0e82f1802c5e5d1d54b736e2b002bafe1 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Thu, 10 Mar 2022 09:59:22 +0000 Subject: [PATCH 09/89] unzip stage --- etl/extract_mastermap.sh | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/etl/extract_mastermap.sh b/etl/extract_mastermap.sh index 09ada8cb..64187a71 100755 --- a/etl/extract_mastermap.sh +++ b/etl/extract_mastermap.sh @@ -17,6 +17,14 @@ data_dir=$1 # Use `fid` as source ID, aka TOID. # +# +# Unzip to GML +# + +find $data_dir -type f -name '*.zip' -printf "%f\n" | \ +parallel \ +unzip -u $data_dir/{} -d $data_dir + find $data_dir -type f -name '*.gz' -printf "%f\n" | \ parallel \ gunzip $data_dir/{} -k -S gml From c029976741190c509996fd9fb4c183757ad408b1 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Thu, 10 Mar 2022 10:06:44 +0000 Subject: [PATCH 10/89] todo --- etl/README.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/etl/README.md b/etl/README.md index 37bb13c0..3190f7ea 100644 --- a/etl/README.md +++ b/etl/README.md @@ -76,4 +76,6 @@ psql < ../migrations/003.index-buildings.sql Run the remaining migrations in `../migrations` to create the rest of the database structure. 
-# Updating the Colouring London database with new OS data \ No newline at end of file +# [WIP] Updating the Colouring London database with new OS data + +TODO: this section should instruct how to update and existing db \ No newline at end of file From 826dcc5236e2672c436a29be4e8d6b3cfce49b0a Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Thu, 10 Mar 2022 10:33:55 +0000 Subject: [PATCH 11/89] move python setup section --- docs/setup-dev-environment.md | 71 +++++++++++++++++++---------------- 1 file changed, 39 insertions(+), 32 deletions(-) diff --git a/docs/setup-dev-environment.md b/docs/setup-dev-environment.md index 734c5c8b..61c053b8 100644 --- a/docs/setup-dev-environment.md +++ b/docs/setup-dev-environment.md @@ -50,6 +50,7 @@ ssh @localhost -p 4022 - [:arrow_down: Installing Node.js](#arrow_down-installing-nodejs) - [:large_blue_circle: Configuring PostgreSQL](#large_blue_circle-configuring-postgresql) - [:arrow_forward: Configuring Node.js](#arrow_forward-configuring-nodejs) + - [Install Python]() - [:house: Loading the building data](#house-loading-the-building-data) - [:computer: Running the application](#computer-running-the-application) - [:eyes: Viewing the application](#eyes-viewing-the-application) @@ -228,6 +229,42 @@ cd ~/colouring-london/app npm install ``` +### Set up Python + +Install python and related tools. + +```bash +sudo apt-get install -y python3 python3-pip python3-dev python3-venv +``` + +Now set up a virtual environment for python. In the following example we have named the +virtual environment *colouringlondon* but it can have any name. + +```bash +pyvenv colouringlondon +``` + +Activate the virtual environment so we can install python packages into it. + +```bash +source colouringlondon/bin/activate +``` + +Install python pip package manager and related tools. + +```bash +pip install --upgrade pip +pip install --upgrade setuptools wheel +``` + +Install the required python packages. This relies on the `requirements.txt` file located +in the `etl` folder of your local repository. + +```bash +cd ~/colouring-london/etl/ +pip install -r requirements.txt +``` + ## :house: Loading the building data There are several ways to create the Colouring London database in your environment. The simplest way if you are just trying out the application would be to use test data from OSM, but otherwise you should follow one of the instructions below to create the full database either from scratch, or from a previously made db (via a dump file). @@ -269,34 +306,6 @@ ls ~/colouring-london/migrations/*.up.sql 2>/dev/null | while read -r migration; This section shows how to load test buildings into the application from OpenStreetMaps (OSM). -#### Set up Python - -Install python and related tools. - -```bash -sudo apt-get install -y python3 python3-pip python3-dev python3-venv -``` - -Now set up a virtual environment for python. In the following example we have named the -virtual environment *colouringlondon* but it can have any name. - -```bash -pyvenv colouringlondon -``` - -Activate the virtual environment so we can install python packages into it. - -```bash -source colouringlondon/bin/activate -``` - -Install python pip package manager and related tools. - -```bash -pip install --upgrade pip -pip install --upgrade setuptools wheel -``` - #### Load OpenStreetMap test polygons First install prerequisites. @@ -304,12 +313,10 @@ First install prerequisites. sudo apt-get install -y parallel ``` -Install the required python packages. 
This relies on the `requirements.txt` file located -in the `etl` folder of your local repository. +Ensure you have the `colouringlondon` environment activated. ```bash -cd ~/colouring-london/etl/ -pip install -r requirements.txt +source colouringlondon/bin/activate ``` To help test the Colouring London application, `get_test_polygons.py` will attempt to save a small (1.5km²) extract from OpenStreetMap to a format suitable for loading to the database. From 57b580ea04efcea53691194c06e4d2da5c16bdea Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Thu, 10 Mar 2022 10:36:31 +0000 Subject: [PATCH 12/89] emojify --- docs/setup-dev-environment.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/setup-dev-environment.md b/docs/setup-dev-environment.md index 61c053b8..f2582a59 100644 --- a/docs/setup-dev-environment.md +++ b/docs/setup-dev-environment.md @@ -50,7 +50,7 @@ ssh @localhost -p 4022 - [:arrow_down: Installing Node.js](#arrow_down-installing-nodejs) - [:large_blue_circle: Configuring PostgreSQL](#large_blue_circle-configuring-postgresql) - [:arrow_forward: Configuring Node.js](#arrow_forward-configuring-nodejs) - - [Install Python]() + - [:snake: Set up Python](#snake-set-up-python) - [:house: Loading the building data](#house-loading-the-building-data) - [:computer: Running the application](#computer-running-the-application) - [:eyes: Viewing the application](#eyes-viewing-the-application) @@ -229,7 +229,7 @@ cd ~/colouring-london/app npm install ``` -### Set up Python +### :snake: Set up Python Install python and related tools. From ecb9301be1ed04882a43096d367ff1d8c5d277ee Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Thu, 10 Mar 2022 10:38:22 +0000 Subject: [PATCH 13/89] move parallel install to essential tools --- docs/setup-dev-environment.md | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/docs/setup-dev-environment.md b/docs/setup-dev-environment.md index f2582a59..7b4447cc 100644 --- a/docs/setup-dev-environment.md +++ b/docs/setup-dev-environment.md @@ -67,7 +67,7 @@ sudo apt-get upgrade -y Now install some essential tools. ```bash -sudo apt-get install -y build-essential git wget curl +sudo apt-get install -y build-essential git wget curl parallel ``` ### :red_circle: Installing PostgreSQL @@ -308,11 +308,6 @@ This section shows how to load test buildings into the application from OpenStre #### Load OpenStreetMap test polygons -First install prerequisites. -```bash -sudo apt-get install -y parallel -``` - Ensure you have the `colouringlondon` environment activated. ```bash From 4da6c96181acf900bfc160d21c9b7e3065376699 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Thu, 10 Mar 2022 10:39:06 +0000 Subject: [PATCH 14/89] tidy --- etl/README.md | 55 ++++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 46 insertions(+), 9 deletions(-) diff --git a/etl/README.md b/etl/README.md index 3190f7ea..41d3abe7 100644 --- a/etl/README.md +++ b/etl/README.md @@ -55,27 +55,64 @@ The scripts should be run in the following order: ```bash cd ~/colouring-london/etl -# extract both datasets +``` + +Extract the addressBase dataset. + +```bash sudo ./extract_addressbase.sh ./addressbase_dir +``` + +Extract the MasterMap data (this step could take a while). + +```bash sudo ./extract_mastermap.sh ./mastermap_dir -# filter mastermap ('building' polygons and any others referenced by addressbase) +``` + +Filter MasterMap 'building' polygons and any others referenced by addressbase. 
+ +```bash sudo ./filter_transform_mastermap_for_loading.sh ./addressbase_dir ./mastermap_dir -# load all building outlines +``` + +Load all building outlines. + +```bash sudo ./load_geometries.sh ./mastermap_dir -# index geometries (should be faster after loading) +``` + +Index geometries. + +```bash psql < ../migrations/002.index-geometries.sql -# create a building record per outline +``` + +Create a building record per outline. + +```bash sudo ./create_building_records.sh -# add UPRNs where they match +``` + +Add UPRNs where they match. + + + +```bash load_uprns.py ./addressbase_dir -# index building records +```` + +Index building records. + +```bash psql < ../migrations/003.index-buildings.sql ``` -## Finally - Run the remaining migrations in `../migrations` to create the rest of the database structure. + + # [WIP] Updating the Colouring London database with new OS data TODO: this section should instruct how to update and existing db \ No newline at end of file From 5fb4f91e5c8f2f4189798ca76648af1cc8a16846 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Thu, 10 Mar 2022 10:40:38 +0000 Subject: [PATCH 15/89] source python env --- etl/README.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/etl/README.md b/etl/README.md index 41d3abe7..7bf8a9d8 100644 --- a/etl/README.md +++ b/etl/README.md @@ -93,11 +93,10 @@ Create a building record per outline. sudo ./create_building_records.sh ``` -Add UPRNs where they match. - - +Ensure you have the `colouringlondon` environment activated, then add UPRNs where they match. ```bash +source colouringlondon/bin/activate load_uprns.py ./addressbase_dir ```` From 5810aafc1e9d0ae50081b012e2d876e143c44c38 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Thu, 10 Mar 2022 10:44:31 +0000 Subject: [PATCH 16/89] update how to run migrations --- etl/README.md | 14 ++++---------- 1 file changed, 4 insertions(+), 10 deletions(-) diff --git a/etl/README.md b/etl/README.md index 7bf8a9d8..2c22f46c 100644 --- a/etl/README.md +++ b/etl/README.md @@ -84,7 +84,7 @@ sudo ./load_geometries.sh ./mastermap_dir Index geometries. ```bash -psql < ../migrations/002.index-geometries.sql +psql < ../migrations/002.index-geometries.up.sql ``` Create a building record per outline. @@ -100,17 +100,11 @@ source colouringlondon/bin/activate load_uprns.py ./addressbase_dir ```` -Index building records. - -```bash -psql < ../migrations/003.index-buildings.sql -``` - Run the remaining migrations in `../migrations` to create the rest of the database structure. - +```bash +ls ~/colouring-london/migrations/*.up.sql 2>/dev/null | while read -r migration; do psql < $migration; done; +``` # [WIP] Updating the Colouring London database with new OS data From aafff2911ca7b1ff3c0b30064829927fe5d637f8 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Thu, 10 Mar 2022 10:52:44 +0000 Subject: [PATCH 17/89] add rename package --- docs/setup-dev-environment.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/setup-dev-environment.md b/docs/setup-dev-environment.md index 7b4447cc..792e97db 100644 --- a/docs/setup-dev-environment.md +++ b/docs/setup-dev-environment.md @@ -67,7 +67,7 @@ sudo apt-get upgrade -y Now install some essential tools. 
```bash -sudo apt-get install -y build-essential git wget curl parallel +sudo apt-get install -y build-essential git wget curl parallel rename ``` ### :red_circle: Installing PostgreSQL From b5bcd379a3719d5ea9e62ce5652c6b2e6fa9a113 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Thu, 10 Mar 2022 11:10:53 +0000 Subject: [PATCH 18/89] update the python docs --- docs/setup-dev-environment.md | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/docs/setup-dev-environment.md b/docs/setup-dev-environment.md index 792e97db..52b4c541 100644 --- a/docs/setup-dev-environment.md +++ b/docs/setup-dev-environment.md @@ -237,10 +237,10 @@ Install python and related tools. sudo apt-get install -y python3 python3-pip python3-dev python3-venv ``` -Now set up a virtual environment for python. In the following example we have named the -virtual environment *colouringlondon* but it can have any name. +Create a virtual environment for python in the `etl` folder of your repository. In the following example we have name the virtual environment *colouringlondon* but it can have any name. ```bash +cd ~/colouring-london/etl pyvenv colouringlondon ``` @@ -257,11 +257,9 @@ pip install --upgrade pip pip install --upgrade setuptools wheel ``` -Install the required python packages. This relies on the `requirements.txt` file located -in the `etl` folder of your local repository. +Install the required python packages. ```bash -cd ~/colouring-london/etl/ pip install -r requirements.txt ``` From e66e375cb800256b5714cad97208d2581bfbfd6e Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Thu, 10 Mar 2022 11:46:34 +0000 Subject: [PATCH 19/89] add note --- etl/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/etl/README.md b/etl/README.md index 2c22f46c..fd9427f4 100644 --- a/etl/README.md +++ b/etl/README.md @@ -19,7 +19,7 @@ To get the required datasets, you'll need to complete the following steps: ![](screenshot/AddressBase.png) -4. You should be then able to check out your basket and download the files +4. You should be then able to check out your basket and download the files. Note: there may be multiple `.zip` files to download for MasterMap due to the size of the dataset. ## Prerequisites From c1cf06dca9ebcb4e0e402c41e64af9971b759915 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Thu, 10 Mar 2022 11:55:35 +0000 Subject: [PATCH 20/89] update AddressBase instructions --- etl/README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/etl/README.md b/etl/README.md index fd9427f4..c552bb2f 100644 --- a/etl/README.md +++ b/etl/README.md @@ -20,6 +20,7 @@ To get the required datasets, you'll need to complete the following steps: ![](screenshot/AddressBase.png) 4. You should be then able to check out your basket and download the files. Note: there may be multiple `.zip` files to download for MasterMap due to the size of the dataset. +5. Unzip the AddressBase `.zip` in a convenient location. We will use the unzipped folder in later steps. Rename the folder as appropriate (make sure this folder doesn't contain the original `.zip` file). Note: this folder also contains `.zip` files, do not unzip at this stage as a script will do this later. ## Prerequisites From 3366b6756006145c9ceafc28b2bb7878cfdca591 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Thu, 10 Mar 2022 12:00:37 +0000 Subject: [PATCH 21/89] Revert "unzip stage" This reverts commit b773cec0e82f1802c5e5d1d54b736e2b002bafe1. 
--- etl/extract_mastermap.sh | 8 -------- 1 file changed, 8 deletions(-) diff --git a/etl/extract_mastermap.sh b/etl/extract_mastermap.sh index 64187a71..09ada8cb 100755 --- a/etl/extract_mastermap.sh +++ b/etl/extract_mastermap.sh @@ -17,14 +17,6 @@ data_dir=$1 # Use `fid` as source ID, aka TOID. # -# -# Unzip to GML -# - -find $data_dir -type f -name '*.zip' -printf "%f\n" | \ -parallel \ -unzip -u $data_dir/{} -d $data_dir - find $data_dir -type f -name '*.gz' -printf "%f\n" | \ parallel \ gunzip $data_dir/{} -k -S gml From 26090e03b048d0c33537cef7244076b356b083bf Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Thu, 10 Mar 2022 12:01:41 +0000 Subject: [PATCH 22/89] update mastermap instructions --- etl/README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/etl/README.md b/etl/README.md index c552bb2f..64c7f5a8 100644 --- a/etl/README.md +++ b/etl/README.md @@ -21,6 +21,7 @@ To get the required datasets, you'll need to complete the following steps: 4. You should be then able to check out your basket and download the files. Note: there may be multiple `.zip` files to download for MasterMap due to the size of the dataset. 5. Unzip the AddressBase `.zip` in a convenient location. We will use the unzipped folder in later steps. Rename the folder as appropriate (make sure this folder doesn't contain the original `.zip` file). Note: this folder also contains `.zip` files, do not unzip at this stage as a script will do this later. +6. Unzip the MasterMap `.zip` files and move all the `.gz` files from each to a single folder in a convenient location. We will use this folder in later steps. ## Prerequisites From 8050419e716e1cfbb8c28f8b675bbf84c52396bb Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Thu, 10 Mar 2022 12:02:26 +0000 Subject: [PATCH 23/89] add error --- etl/README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/etl/README.md b/etl/README.md index 64c7f5a8..bb59e907 100644 --- a/etl/README.md +++ b/etl/README.md @@ -65,6 +65,8 @@ Extract the addressBase dataset. sudo ./extract_addressbase.sh ./addressbase_dir ``` + + Extract the MasterMap data (this step could take a while). ```bash From 4f58ea64f436521839d06f868d87ae8a8ff449ab Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Thu, 10 Mar 2022 13:40:30 +0000 Subject: [PATCH 24/89] rearrange --- etl/README.md | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/etl/README.md b/etl/README.md index bb59e907..12fa4bdc 100644 --- a/etl/README.md +++ b/etl/README.md @@ -73,6 +73,14 @@ Extract the MasterMap data (this step could take a while). sudo ./extract_mastermap.sh ./mastermap_dir ``` + + +Ensure you have the `colouringlondon` environment activated. + +```bash +source colouringlondon/bin/activate +``` + Filter MasterMap 'building' polygons and any others referenced by addressbase. ```bash @@ -97,10 +105,9 @@ Create a building record per outline. sudo ./create_building_records.sh ``` -Ensure you have the `colouringlondon` environment activated, then add UPRNs where they match. +Add UPRNs where they match. 
```bash -source colouringlondon/bin/activate load_uprns.py ./addressbase_dir ```` From 5796df69a1617e5c4aa19f9a3d7209206fcd25cd Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Thu, 10 Mar 2022 13:47:43 +0000 Subject: [PATCH 25/89] update python path in script --- etl/filter_transform_mastermap_for_loading.sh | 2 ++ 1 file changed, 2 insertions(+) diff --git a/etl/filter_transform_mastermap_for_loading.sh b/etl/filter_transform_mastermap_for_loading.sh index 7f7e6dd3..45597805 100755 --- a/etl/filter_transform_mastermap_for_loading.sh +++ b/etl/filter_transform_mastermap_for_loading.sh @@ -9,6 +9,8 @@ addressbase_dir=$1 mastermap_dir=$2 +PATH+=:~/colouring-london/etl/colouringlondon/bin/python + # # Check which TOIDs are matched against UPRNs # From 5b06b12f9852af7b87b68ad9c271be8d7aa5c59d Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Thu, 10 Mar 2022 13:51:05 +0000 Subject: [PATCH 26/89] temp hard code python path --- etl/filter_transform_mastermap_for_loading.sh | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/etl/filter_transform_mastermap_for_loading.sh b/etl/filter_transform_mastermap_for_loading.sh index 45597805..878a28d3 100755 --- a/etl/filter_transform_mastermap_for_loading.sh +++ b/etl/filter_transform_mastermap_for_loading.sh @@ -9,19 +9,17 @@ addressbase_dir=$1 mastermap_dir=$2 -PATH+=:~/colouring-london/etl/colouringlondon/bin/python - # # Check which TOIDs are matched against UPRNs # -python check_ab_mm_match.py $addressbase_dir $mastermap_dir +colouringlondon/bin/python check_ab_mm_match.py $addressbase_dir $mastermap_dir # # Filter # - WHERE descriptiveGroup = '(1:Building)' # - OR toid in addressbase_toids # -python filter_mastermap.py $addressbase_dir $mastermap_dir +colouringlondon/bin/python filter_mastermap.py $addressbase_dir $mastermap_dir # # Transform to 3857 (web mercator) From 7f3693355e93eeceda5873e53e58a90d5c5d2af0 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Thu, 10 Mar 2022 14:02:00 +0000 Subject: [PATCH 27/89] add comments and update how to run --- etl/README.md | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/etl/README.md b/etl/README.md index 12fa4bdc..9fd1d161 100644 --- a/etl/README.md +++ b/etl/README.md @@ -89,6 +89,8 @@ sudo ./filter_transform_mastermap_for_loading.sh ./addressbase_dir ./mastermap_d Load all building outlines. + + ```bash sudo ./load_geometries.sh ./mastermap_dir ``` @@ -101,14 +103,20 @@ psql < ../migrations/002.index-geometries.up.sql Create a building record per outline. + + ```bash sudo ./create_building_records.sh ``` + + Add UPRNs where they match. + + ```bash -load_uprns.py ./addressbase_dir +sudo ./load_uprns.sh ./addressbase_dir ```` Run the remaining migrations in `../migrations` to create the rest of the database structure. From 6916e4ca598ed4468f5e7dddc57899dacfe2d45d Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Fri, 11 Mar 2022 11:46:00 +0000 Subject: [PATCH 28/89] remove use of addressbase --- etl/README.md | 6 +++--- etl/filter_transform_mastermap_for_loading.sh | 12 ++++++------ 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/etl/README.md b/etl/README.md index 9fd1d161..cb9ba144 100644 --- a/etl/README.md +++ b/etl/README.md @@ -59,11 +59,11 @@ The scripts should be run in the following order: cd ~/colouring-london/etl ``` -Extract the addressBase dataset. + @@ -84,7 +84,7 @@ source colouringlondon/bin/activate Filter MasterMap 'building' polygons and any others referenced by addressbase. 
```bash -sudo ./filter_transform_mastermap_for_loading.sh ./addressbase_dir ./mastermap_dir +sudo ./filter_transform_mastermap_for_loading.sh ./mastermap_dir ``` Load all building outlines. diff --git a/etl/filter_transform_mastermap_for_loading.sh b/etl/filter_transform_mastermap_for_loading.sh index 878a28d3..63d613a4 100755 --- a/etl/filter_transform_mastermap_for_loading.sh +++ b/etl/filter_transform_mastermap_for_loading.sh @@ -4,27 +4,27 @@ # Filter and transform for loading # : ${1?"Usage: $0 ./path/to/addressbase/dir ./path/to/mastermap/dir"} -: ${2?"Usage: $0 ./path/to/addressbase/dir ./path/to/mastermap/dir"} +# : ${2?"Usage: $0 ./path/to/addressbase/dir ./path/to/mastermap/dir"} -addressbase_dir=$1 -mastermap_dir=$2 +# addressbase_dir=$1 +mastermap_dir=$1 # # Check which TOIDs are matched against UPRNs # -colouringlondon/bin/python check_ab_mm_match.py $addressbase_dir $mastermap_dir +# colouringlondon/bin/python check_ab_mm_match.py $addressbase_dir $mastermap_dir # # Filter # - WHERE descriptiveGroup = '(1:Building)' # - OR toid in addressbase_toids # -colouringlondon/bin/python filter_mastermap.py $addressbase_dir $mastermap_dir +# colouringlondon/bin/python filter_mastermap.py $addressbase_dir $mastermap_dir # # Transform to 3857 (web mercator) # -find $mastermap_dir -type f -name '*.filtered.csv' -printf "%f\n" | \ +find $mastermap_dir -type f -name '*.gml.csv' -printf "%f\n" | \ parallel \ ogr2ogr \ -f CSV $mastermap_dir/{}.3857.csv \ From 7e8817d2c52772d9ba8861c81692085efc13d55c Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Fri, 11 Mar 2022 13:28:23 +0000 Subject: [PATCH 29/89] make scripts executable --- etl/README.md | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/etl/README.md b/etl/README.md index cb9ba144..6337586f 100644 --- a/etl/README.md +++ b/etl/README.md @@ -53,16 +53,17 @@ If you didn't download the OS files to the Ubuntu machine where you are setting ## Process and load Ordnance Survey data -The scripts should be run in the following order: +Move into the `etl` directory and set execute permission on all scripts. ```bash cd ~/colouring-london/etl +chmod +x *.sh ``` @@ -70,7 +71,7 @@ sudo ./extract_addressbase.sh ./addressbase_dir Extract the MasterMap data (this step could take a while). ```bash -sudo ./extract_mastermap.sh ./mastermap_dir +./extract_mastermap.sh ./mastermap_dir ``` @@ -84,7 +85,7 @@ source colouringlondon/bin/activate Filter MasterMap 'building' polygons and any others referenced by addressbase. ```bash -sudo ./filter_transform_mastermap_for_loading.sh ./mastermap_dir +./filter_transform_mastermap_for_loading.sh ./mastermap_dir ``` Load all building outlines. @@ -92,7 +93,7 @@ Load all building outlines. ```bash -sudo ./load_geometries.sh ./mastermap_dir +./load_geometries.sh ./mastermap_dir ``` Index geometries. @@ -106,7 +107,7 @@ Create a building record per outline. ```bash -sudo ./create_building_records.sh +./create_building_records.sh ``` @@ -116,7 +117,7 @@ Add UPRNs where they match. ```bash -sudo ./load_uprns.sh ./addressbase_dir +./load_uprns.sh ./addressbase_dir ```` Run the remaining migrations in `../migrations` to create the rest of the database structure. 
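For reference, the commands documented up to this point in the series amount to roughly the following sequence. This is a minimal sketch, assuming the `./mastermap_dir` and `./addressbase_dir` placeholder folder names used above and a `psql` environment configured as in the Prerequisites section; later patches in the series drop the AddressBase steps, so treat it as a recap of the README's state at PATCH 29 rather than the final procedure.

```bash
# Recap of the documented ETL sequence as of PATCH 29 (not the final form of the README).
cd ~/colouring-london/etl
chmod +x *.sh                                     # make the ETL scripts executable
source colouringlondon/bin/activate               # python env used by the filter scripts

sudo ./extract_addressbase.sh ./addressbase_dir   # extract AddressBase (removed by later patches)
./extract_mastermap.sh ./mastermap_dir            # unzip and convert MasterMap tiles
./filter_transform_mastermap_for_loading.sh ./mastermap_dir   # keep 'Building' polygons, reproject to 3857
./load_geometries.sh ./mastermap_dir              # load building outlines into Postgres
psql < ../migrations/002.index-geometries.up.sql  # index geometries
./create_building_records.sh                      # one building record per outline
./load_uprns.sh ./addressbase_dir                 # attach matching UPRNs (removed by later patches)

# run the remaining migrations
ls ~/colouring-london/migrations/*.up.sql 2>/dev/null | while read -r migration; do psql < "$migration"; done
```
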
From 8fccec68275181061b12cc459851eb9306db5bf1 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Mon, 14 Mar 2022 09:39:29 +0000 Subject: [PATCH 30/89] temp python path edit --- etl/extract_addressbase.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/etl/extract_addressbase.sh b/etl/extract_addressbase.sh index d9d7e07e..b6b94480 100755 --- a/etl/extract_addressbase.sh +++ b/etl/extract_addressbase.sh @@ -40,7 +40,7 @@ ogr2ogr -f CSV \ # find $data_dir -type f -name '*.gml.csv' -printf "%f\n" | \ parallel \ -python filter_addressbase_csv.py $data_dir/{} +colouringlondon/bin/python filter_addressbase_csv.py $data_dir/{} # From 8b8f6622f5dc6ad1db1eb778881b63028e54bf2b Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Thu, 17 Mar 2022 14:41:54 +0000 Subject: [PATCH 31/89] temp python path edit --- etl/filter_transform_mastermap_for_loading.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/etl/filter_transform_mastermap_for_loading.sh b/etl/filter_transform_mastermap_for_loading.sh index 63d613a4..042a3441 100755 --- a/etl/filter_transform_mastermap_for_loading.sh +++ b/etl/filter_transform_mastermap_for_loading.sh @@ -12,14 +12,14 @@ mastermap_dir=$1 # # Check which TOIDs are matched against UPRNs # -# colouringlondon/bin/python check_ab_mm_match.py $addressbase_dir $mastermap_dir +colouringlondon/bin/python check_ab_mm_match.py $addressbase_dir $mastermap_dir # # Filter # - WHERE descriptiveGroup = '(1:Building)' # - OR toid in addressbase_toids # -# colouringlondon/bin/python filter_mastermap.py $addressbase_dir $mastermap_dir +colouringlondon/bin/python filter_mastermap.py $addressbase_dir $mastermap_dir # # Transform to 3857 (web mercator) From dafe5de2780bc02e71550dd51fbea75fab8ce557 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Thu, 17 Mar 2022 14:47:08 +0000 Subject: [PATCH 32/89] comment out addessbase stuff --- etl/README.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/etl/README.md b/etl/README.md index 6337586f..0772b80a 100644 --- a/etl/README.md +++ b/etl/README.md @@ -5,8 +5,8 @@ The scripts in this directory are used to extract, transform and load (ETL) the core datasets for Colouring London: -1. Building geometries, sourced from Ordnance Survey (OS) MasterMap (Topography Layer) -1. Unique Property Reference Numbers (UPRNs), sourced from Ordnance Survey AddressBase +Building geometries, sourced from Ordnance Survey (OS) MasterMap (Topography Layer) + To get the required datasets, you'll need to complete the following steps: @@ -17,10 +17,10 @@ To get the required datasets, you'll need to complete the following steps: ![](screenshot/MasterMap.png)

-![](screenshot/AddressBase.png) + 4. You should be then able to check out your basket and download the files. Note: there may be multiple `.zip` files to download for MasterMap due to the size of the dataset. -5. Unzip the AddressBase `.zip` in a convenient location. We will use the unzipped folder in later steps. Rename the folder as appropriate (make sure this folder doesn't contain the original `.zip` file). Note: this folder also contains `.zip` files, do not unzip at this stage as a script will do this later. + 6. Unzip the MasterMap `.zip` files and move all the `.gz` files from each to a single folder in a convenient location. We will use this folder in later steps. ## Prerequisites From d822dfaaec2f3d8bd96165db3231410185ae690c Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Thu, 17 Mar 2022 15:02:51 +0000 Subject: [PATCH 33/89] removee addressbase stuff --- etl/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/etl/README.md b/etl/README.md index 0772b80a..44d1d778 100644 --- a/etl/README.md +++ b/etl/README.md @@ -49,7 +49,7 @@ Install GNU parallel, this is used to speed up loading bulk data. ## Make data available to Ubuntu -If you didn't download the OS files to the Ubuntu machine where you are setting up your Colouring London application, you will need to make them available there. If you are using Virtualbox, you could host share a the two folders containing the files (one for MasterMap, one for AddressBase) with the VM via a shared folder (e.g. [see these instructions for Mac](https://medium.com/macoclock/share-folder-between-macos-and-ubuntu-4ce84fb5c1ad)). +If you didn't download the OS files to the Ubuntu machine where you are setting up your Colouring London application, you will need to make them available there. If you are using Virtualbox, you could host share a the folder containing the MasteerMap files with the VM via a shared folder (e.g. [see these instructions for Mac](https://medium.com/macoclock/share-folder-between-macos-and-ubuntu-4ce84fb5c1ad)). 
## Process and load Ordnance Survey data From 0e35a7cca25b2276581935b1e16e19384e4e432e Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Thu, 17 Mar 2022 15:43:23 +0000 Subject: [PATCH 34/89] mastermap filtering without using addressbase --- etl/filter_mastermap.py | 46 ++++++++++--------- etl/filter_transform_mastermap_for_loading.sh | 8 ++-- 2 files changed, 27 insertions(+), 27 deletions(-) diff --git a/etl/filter_mastermap.py b/etl/filter_mastermap.py index 1713d262..e291c7ad 100644 --- a/etl/filter_mastermap.py +++ b/etl/filter_mastermap.py @@ -1,7 +1,6 @@ -"""Filter MasterMap to buildings and addressbase-matches +"""Filter MasterMap to buildings - WHERE descriptiveGroup includes 'Building' -- OR toid in addressbase_toids """ import csv import glob @@ -13,25 +12,28 @@ from multiprocessing import Pool csv.field_size_limit(sys.maxsize) -def main(ab_path, mm_path): - mm_paths = sorted(glob.glob(os.path.join(mm_path, "*.gml.csv"))) - toid_paths = sorted(glob.glob(os.path.join(ab_path, "ab_toids_*.txt"))) +def main(mastermap_path): + mm_paths = sorted(glob.glob(os.path.join(mastermap_path, "*.gml.csv"))) + # toid_paths = sorted(glob.glob(os.path.join(ab_path, "ab_toids_*.txt"))) - try: - assert len(mm_paths) == len(toid_paths) - except AssertionError: - print(mm_paths) - print(toid_paths) - zipped_paths = zip(mm_paths, toid_paths) + # try: + # assert len(mm_paths) == len(toid_paths) + # except AssertionError: + # print(mm_paths) + # print(toid_paths) + # zipped_paths = zip(mm_paths, toid_paths) # parallel map over tiles - with Pool() as p: - p.starmap(filter, zipped_paths) + # with Pool() as p: + # p.starmap(filter, zipped_paths) + + for mm_path in mm_paths: + filter(mm_path) -def filter(mm_path, toid_path): - with open(toid_path, 'r') as fh: - r = csv.reader(fh) - toids = set(line[0] for line in r) +def filter(mm_path): + # with open(toid_path, 'r') as fh: + # r = csv.reader(fh) + # toids = set(line[0] for line in r) output_path = "{}.filtered.csv".format(str(mm_path).replace(".gml.csv", "")) alt_output_path = "{}.filtered_not_building.csv".format(str(mm_path).replace(".gml.csv", "")) @@ -48,13 +50,13 @@ def filter(mm_path, toid_path): if 'Building' in line['descriptiveGroup']: w.writerow(line) - elif line['fid'] in toids: - alt_w.writerow(line) + # elif line['fid'] in toids: + # alt_w.writerow(line) if __name__ == '__main__': - if len(sys.argv) != 3: - print("Usage: filter_mastermap.py ./path/to/addressbase/dir ./path/to/mastermap/dir") + if len(sys.argv) != 2: + print("Usage: filter_mastermap.py ./path/to/mastermap/dir") exit(-1) - main(sys.argv[1], sys.argv[2]) + main(sys.argv[1]) diff --git a/etl/filter_transform_mastermap_for_loading.sh b/etl/filter_transform_mastermap_for_loading.sh index 042a3441..45c62b2e 100755 --- a/etl/filter_transform_mastermap_for_loading.sh +++ b/etl/filter_transform_mastermap_for_loading.sh @@ -3,23 +3,21 @@ # # Filter and transform for loading # -: ${1?"Usage: $0 ./path/to/addressbase/dir ./path/to/mastermap/dir"} -# : ${2?"Usage: $0 ./path/to/addressbase/dir ./path/to/mastermap/dir"} +: ${1?"Usage: $0 ./path/to/mastermap/dir"} -# addressbase_dir=$1 mastermap_dir=$1 # # Check which TOIDs are matched against UPRNs # -colouringlondon/bin/python check_ab_mm_match.py $addressbase_dir $mastermap_dir +# colouringlondon/bin/python check_ab_mm_match.py $addressbase_dir $mastermap_dir # # Filter # - WHERE descriptiveGroup = '(1:Building)' # - OR toid in addressbase_toids # -colouringlondon/bin/python filter_mastermap.py $addressbase_dir $mastermap_dir 
+colouringlondon/bin/python filter_mastermap.py $mastermap_dir # # Transform to 3857 (web mercator) From 9e4224f51cc34561d303f490ce79eef5300f26b3 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Fri, 18 Mar 2022 10:46:28 +0000 Subject: [PATCH 35/89] add sudo cmds and comment --- etl/README.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/etl/README.md b/etl/README.md index 44d1d778..5b4cbe34 100644 --- a/etl/README.md +++ b/etl/README.md @@ -71,10 +71,12 @@ chmod +x *.sh Extract the MasterMap data (this step could take a while). ```bash -./extract_mastermap.sh ./mastermap_dir +sudo ./extract_mastermap.sh ./mastermap_dir ``` - + + + Ensure you have the `colouringlondon` environment activated. @@ -85,7 +87,7 @@ source colouringlondon/bin/activate Filter MasterMap 'building' polygons and any others referenced by addressbase. ```bash -./filter_transform_mastermap_for_loading.sh ./mastermap_dir +sudo ./filter_transform_mastermap_for_loading.sh ./mastermap_dir ``` Load all building outlines. From f55ce63d84f3f46c0a05c7a0238fe9a90e2d934a Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Fri, 18 Mar 2022 10:49:41 +0000 Subject: [PATCH 36/89] add drop_outside_limit to instructions --- etl/README.md | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/etl/README.md b/etl/README.md index 5b4cbe34..49611071 100644 --- a/etl/README.md +++ b/etl/README.md @@ -128,6 +128,15 @@ Run the remaining migrations in `../migrations` to create the rest of the databa ls ~/colouring-london/migrations/*.up.sql 2>/dev/null | while read -r migration; do psql < $migration; done; ``` +TODO: Drop outside limit. + + + +```bash +./drop_outside_limit.sh ./path/to/boundary_file +```` + + # [WIP] Updating the Colouring London database with new OS data TODO: this section should instruct how to update and existing db \ No newline at end of file From 3653e3036246e7c3f2c50b2701129e57f2ee21d2 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Fri, 18 Mar 2022 11:07:15 +0000 Subject: [PATCH 37/89] remove addressbase from all steps and reorder readme --- etl/README.md | 37 +++-------- etl/check_ab_mm_match.py | 60 ------------------ etl/extract_addressbase.sh | 63 ------------------- etl/filter_addressbase_csv.py | 42 ------------- etl/filter_transform_mastermap_for_loading.sh | 5 -- etl/load_uprns.sh | 36 ----------- etl/run_all.sh | 19 +++--- etl/run_clean.sh | 7 +-- 8 files changed, 17 insertions(+), 252 deletions(-) delete mode 100644 etl/check_ab_mm_match.py delete mode 100755 etl/extract_addressbase.sh delete mode 100755 etl/filter_addressbase_csv.py delete mode 100755 etl/load_uprns.sh diff --git a/etl/README.md b/etl/README.md index 49611071..5f879177 100644 --- a/etl/README.md +++ b/etl/README.md @@ -6,7 +6,6 @@ The scripts in this directory are used to extract, transform and load (ETL) the for Colouring London: Building geometries, sourced from Ordnance Survey (OS) MasterMap (Topography Layer) - To get the required datasets, you'll need to complete the following steps: @@ -17,10 +16,7 @@ To get the required datasets, you'll need to complete the following steps: ![](screenshot/MasterMap.png)

- - 4. You should be then able to check out your basket and download the files. Note: there may be multiple `.zip` files to download for MasterMap due to the size of the dataset. - 6. Unzip the MasterMap `.zip` files and move all the `.gz` files from each to a single folder in a convenient location. We will use this folder in later steps. ## Prerequisites @@ -60,14 +56,6 @@ cd ~/colouring-london/etl chmod +x *.sh ``` - - - - Extract the MasterMap data (this step could take a while). ```bash @@ -104,6 +92,14 @@ Index geometries. psql < ../migrations/002.index-geometries.up.sql ``` +TODO: Drop outside limit. + + + +```bash +./drop_outside_limit.sh ./path/to/boundary_file +```` + Create a building record per outline. @@ -114,29 +110,12 @@ Create a building record per outline. -Add UPRNs where they match. - - - -```bash -./load_uprns.sh ./addressbase_dir -```` - Run the remaining migrations in `../migrations` to create the rest of the database structure. ```bash ls ~/colouring-london/migrations/*.up.sql 2>/dev/null | while read -r migration; do psql < $migration; done; ``` -TODO: Drop outside limit. - - - -```bash -./drop_outside_limit.sh ./path/to/boundary_file -```` - - # [WIP] Updating the Colouring London database with new OS data TODO: this section should instruct how to update and existing db \ No newline at end of file diff --git a/etl/check_ab_mm_match.py b/etl/check_ab_mm_match.py deleted file mode 100644 index 98d82684..00000000 --- a/etl/check_ab_mm_match.py +++ /dev/null @@ -1,60 +0,0 @@ -"""Check if AddressBase TOIDs will match MasterMap -""" -import csv -import glob -import os -import sys - -from multiprocessing import Pool - -csv.field_size_limit(sys.maxsize) - -def main(ab_path, mm_path): - ab_paths = sorted(glob.glob(os.path.join(ab_path, "*.gml.csv.filtered.csv"))) - mm_paths = sorted(glob.glob(os.path.join(mm_path, "*.gml.csv"))) - - try: - assert len(ab_paths) == len(mm_paths) - except AssertionError: - print(ab_paths) - print(mm_paths) - - zipped_paths = zip(ab_paths, mm_paths) - - # parallel map over tiles - with Pool() as p: - p.starmap(check, zipped_paths) - -def check(ab_path, mm_path): - tile = str(os.path.basename(ab_path)).split(".")[0] - output_base = os.path.dirname(ab_path) - ab_toids = set() - mm_toids = set() - - with open(ab_path, 'r') as fh: - r = csv.DictReader(fh) - for line in r: - ab_toids.add(line['toid']) - - with open(mm_path, 'r') as fh: - r = csv.DictReader(fh) - for line in r: - mm_toids.add(line['fid']) - - missing = ab_toids - mm_toids - print(tile, "MasterMap:", len(mm_toids), "Addressbase:", len(ab_toids), "AB but not MM:", len(missing)) - - with open(os.path.join(output_base, 'missing_toids_{}.txt'.format(tile)), 'w') as fh: - for toid in missing: - fh.write("{}\n".format(toid)) - - with open(os.path.join(output_base, 'ab_toids_{}.txt'.format(tile)), 'w') as fh: - for toid in ab_toids: - fh.write("{}\n".format(toid)) - - -if __name__ == '__main__': - if len(sys.argv) != 3: - print("Usage: check_ab_mm_match.py ./path/to/addressbase/dir ./path/to/mastermap/dir") - exit(-1) - main(sys.argv[1], sys.argv[2]) diff --git a/etl/extract_addressbase.sh b/etl/extract_addressbase.sh deleted file mode 100755 index b6b94480..00000000 --- a/etl/extract_addressbase.sh +++ /dev/null @@ -1,63 +0,0 @@ -#!/usr/bin/env bash - -# -# Extract address points from OS Addressbase GML -# - as supplied in 5km tiles, zip/gz archives -# -: ${1?"Usage: $0 ./path/to/data/dir"} - -data_dir=$1 - -# -# Unzip to GML -# - -find $data_dir -type f -name '*.zip' -printf "%f\n" | \ 
-parallel \ -unzip -u $data_dir/{} -d $data_dir - -# -# Extract to CSV -# -# Relevant fields: -# WKT -# crossReference (list of TOID/other references) -# source (list of cross-reference sources: 7666MT refers to MasterMap Topo) -# uprn -# parentUPRN -# logicalStatus: 1 (one) is approved (otherwise historical, provisional) -# - -find $data_dir -type f -name '*.gml' -printf "%f\n" | \ -parallel \ -ogr2ogr -f CSV \ - -select crossReference,source,uprn,parentUPRN,logicalStatus \ - $data_dir/{}.csv $data_dir/{} BasicLandPropertyUnit \ - -lco GEOMETRY=AS_WKT - -# -# Filter -# -find $data_dir -type f -name '*.gml.csv' -printf "%f\n" | \ -parallel \ -colouringlondon/bin/python filter_addressbase_csv.py $data_dir/{} - - -# -# Transform to 3857 (web mercator) -# -find $data_dir -type f -name '*.filtered.csv' -printf "%f\n" | \ -parallel \ -ogr2ogr \ - -f CSV $data_dir/{}.3857.csv \ - -s_srs "EPSG:4326" \ - -t_srs "EPSG:3857" \ - $data_dir/{} \ - -lco GEOMETRY=AS_WKT - -# -# Update to EWKT (with SRID indicator for loading to Postgres) -# -find $data_dir -type f -name '*.3857.csv' -printf "%f\n" | \ -parallel \ -cat $data_dir/{} "|" sed "'s/^\"POINT/\"SRID=3857;POINT/'" "|" cut -f 1,3,4,5 -d "','" ">" $data_dir/{}.loadable diff --git a/etl/filter_addressbase_csv.py b/etl/filter_addressbase_csv.py deleted file mode 100755 index c6d273c8..00000000 --- a/etl/filter_addressbase_csv.py +++ /dev/null @@ -1,42 +0,0 @@ -#!/usr/bin/env python -"""Read ogr2ogr-converted CSV, filter to get OSMM TOID reference, only active addresses -""" -import csv -import json -import sys - - -def main(input_path): - output_path = "{}.filtered.csv".format(input_path) - fieldnames = ( - 'wkt', 'toid', 'uprn', 'parent_uprn' - ) - with open(input_path) as input_fh: - with open(output_path, 'w') as output_fh: - w = csv.DictWriter(output_fh, fieldnames=fieldnames) - w.writeheader() - r = csv.DictReader(input_fh) - for line in r: - if line['logicalStatus'] != "1": - continue - - refs = json.loads(line['crossReference']) - sources = json.loads(line['source']) - toid = "" - for ref, source in zip(refs, sources): - if source == "7666MT": - toid = ref - - w.writerow({ - 'uprn': line['uprn'], - 'parent_uprn': line['parentUPRN'], - 'toid': toid, - 'wkt': line['WKT'], - }) - - -if __name__ == '__main__': - if len(sys.argv) != 2: - print("Usage: filter_addressbase_csv.py ./path/to/data.csv") - exit(-1) - main(sys.argv[1]) diff --git a/etl/filter_transform_mastermap_for_loading.sh b/etl/filter_transform_mastermap_for_loading.sh index 45c62b2e..85c68b51 100755 --- a/etl/filter_transform_mastermap_for_loading.sh +++ b/etl/filter_transform_mastermap_for_loading.sh @@ -7,11 +7,6 @@ mastermap_dir=$1 -# -# Check which TOIDs are matched against UPRNs -# -# colouringlondon/bin/python check_ab_mm_match.py $addressbase_dir $mastermap_dir - # # Filter # - WHERE descriptiveGroup = '(1:Building)' diff --git a/etl/load_uprns.sh b/etl/load_uprns.sh deleted file mode 100755 index 6001f65c..00000000 --- a/etl/load_uprns.sh +++ /dev/null @@ -1,36 +0,0 @@ -#!/usr/bin/env bash - -# -# Load UPRNS from CSV to Postgres -# - assume postgres connection details are set in the environment using PGUSER, PGHOST etc. 
-# -: ${1?"Usage: $0 ./path/to/addressbase/dir"} - -data_dir=$1 - -# -# Create 'building_properties' record with -# uprn: , -# parent_uprn: , -# toid: , -# uprn_geom: -# -find $data_dir -type f -name '*.3857.csv.loadable' \ --printf "$data_dir/%f\n" | \ -parallel \ -cat {} '|' psql -c "\"COPY building_properties ( uprn_geom, toid, uprn, parent_uprn ) FROM stdin WITH CSV HEADER;\"" - -# -# Create references -# - -# index essential for speeed here -psql -c "CREATE INDEX IF NOT EXISTS building_toid_idx ON buildings ( ref_toid );" -# link to buildings -psql -c "UPDATE building_properties -SET building_id = ( - SELECT b.building_id - FROM buildings as b - WHERE - building_properties.toid = b.ref_toid -);" diff --git a/etl/run_all.sh b/etl/run_all.sh index 76cc9d00..ed625de7 100755 --- a/etl/run_all.sh +++ b/etl/run_all.sh @@ -3,13 +3,11 @@ # # Extract, transform and load building outlines and property records # -: ${1?"Usage: $0 ./path/to/addressbase/dir ./path/to/mastermap/dir ./path/to/boundary"} -: ${2?"Usage: $0 ./path/to/addressbase/dir ./path/to/mastermap/dir ./path/to/boundary"} -: ${3?"Usage: $0 ./path/to/addressbase/dir ./path/to/mastermap/dir ./path/to/boundary"} +: ${1?"Usage: $0 ./path/to/mastermap/dir ./path/to/boundary"} +: ${2?"Usage: $0 ./path/to/mastermap/dir ./path/to/boundary"} -addressbase_dir=$1 -mastermap_dir=$2 -boundary_file=$3 +mastermap_dir=$1 +boundary_file=$2 script_dir=${0%/*} # @@ -17,10 +15,9 @@ script_dir=${0%/*} # # extract both datasets -$script_dir/extract_addressbase.sh $addressbase_dir $script_dir/extract_mastermap.sh $mastermap_dir # filter mastermap ('building' polygons and any others referenced by addressbase) -$script_dir/filter_transform_mastermap_for_loading.sh $addressbase_dir $mastermap_dir +$script_dir/filter_transform_mastermap_for_loading.sh $mastermap_dir # # Load @@ -33,7 +30,5 @@ psql < $script_dir/../migrations/002.index-geometries.up.sql $script_dir/drop_outside_limit.sh $boundary_file # create a building record per outline $script_dir/create_building_records.sh -# add UPRNs where they match -$script_dir/load_uprns.sh $addressbase_dir -# index building records -psql < $script_dir/../migrations/003.index-buildings.up.sql +# Run remaining migrations +ls $script_dir/../migrations/*.up.sql 2>/dev/null | while read -r migration; do psql < $migration; done; diff --git a/etl/run_clean.sh b/etl/run_clean.sh index 58bf2454..7f5104ef 100755 --- a/etl/run_clean.sh +++ b/etl/run_clean.sh @@ -3,11 +3,8 @@ # # Filter and transform for loading # -: ${1?"Usage: $0 ./path/to/addressbase/dir ./path/to/mastermap/dir"} -: ${2?"Usage: $0 ./path/to/addressbase/dir ./path/to/mastermap/dir"} +: ${1?"Usage: $0 ./path/to/mastermap/dir"} -addressbase_dir=$1 -mastermap_dir=$2 +mastermap_dir=$1 -rm -f $addressbase_dir/*.{csv,gml,txt,filtered,gfs} rm -f $mastermap_dir/*.{csv,gml,txt,filtered,gfs} From dfe48da577ce62b59ddfc71dd330c836c5f33534 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Fri, 18 Mar 2022 11:08:33 +0000 Subject: [PATCH 38/89] tidy script --- etl/filter_mastermap.py | 22 +--------------------- 1 file changed, 1 insertion(+), 21 deletions(-) diff --git a/etl/filter_mastermap.py b/etl/filter_mastermap.py index e291c7ad..af27a21c 100644 --- a/etl/filter_mastermap.py +++ b/etl/filter_mastermap.py @@ -14,27 +14,11 @@ csv.field_size_limit(sys.maxsize) def main(mastermap_path): mm_paths = sorted(glob.glob(os.path.join(mastermap_path, "*.gml.csv"))) - # toid_paths = sorted(glob.glob(os.path.join(ab_path, "ab_toids_*.txt"))) - - # try: - # assert len(mm_paths) == 
len(toid_paths) - # except AssertionError: - # print(mm_paths) - # print(toid_paths) - # zipped_paths = zip(mm_paths, toid_paths) - - # parallel map over tiles - # with Pool() as p: - # p.starmap(filter, zipped_paths) - for mm_path in mm_paths: filter(mm_path) -def filter(mm_path): - # with open(toid_path, 'r') as fh: - # r = csv.reader(fh) - # toids = set(line[0] for line in r) +def filter(mm_path) output_path = "{}.filtered.csv".format(str(mm_path).replace(".gml.csv", "")) alt_output_path = "{}.filtered_not_building.csv".format(str(mm_path).replace(".gml.csv", "")) output_fieldnames = ('WKT', 'fid', 'descriptiveGroup') @@ -50,10 +34,6 @@ def filter(mm_path): if 'Building' in line['descriptiveGroup']: w.writerow(line) - # elif line['fid'] in toids: - # alt_w.writerow(line) - - if __name__ == '__main__': if len(sys.argv) != 2: From 7ec09a7123429704ce692dd734e0a18bce3edf0e Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Fri, 18 Mar 2022 11:27:05 +0000 Subject: [PATCH 39/89] add screenshot image --- etl/screenshot/MasterMap.png | Bin 0 -> 38832 bytes 1 file changed, 0 insertions(+), 0 deletions(-) create mode 100644 etl/screenshot/MasterMap.png diff --git a/etl/screenshot/MasterMap.png b/etl/screenshot/MasterMap.png new file mode 100644 index 0000000000000000000000000000000000000000..a6e9e408733f5c32b4e20a6f4656794c4ccfa341 GIT binary patch literal 38832 zcmZU)19T?Mwm%$aV%xTDJjujPCblNF@x+--tcf+TZQHhO+sQxgx#!+<*7x;V)xEnm zYj;<#?qBT=Qv07WSPfLPJd z&dkEv6a+*vEHMdMQRNuZ@6&4*j{@eKBn$_-2rp<}&=i>D_eC$K|W$Z!qf)rR^p;v$+@aGa_ekc8dq8IK!n@y|{jJdbaMYtNr4PN3OA zS9vqC8VG*$QVsO1U7;vhF+*JfkkrB;CR^|Zg!+u8p`oE5VwCs#`|AWCk;iEZ#?>Fs zZ=JAe#0j^cesRe1C_~Sh-x^7+elaDQg8T7CEJZri<{Nv%8MeUFAa8%L$fs>h8mF^P z{H%A#I+8@}p#<3)=Fg-@0I6XxGT*zJjp5$253->EtA^N4{N{Bd&JWrnsmDx4XVs03 zL~0%NBmW5+=&oO_lRB*Z2JA@vWYA3cJ&bQQL>g}5 zANy-TbYpWp0;hTt}EgGVz8AcT` z+EXlPFzf3OBiFypK}U&pbKAv?4AeQZ7Wh0AEHN? zT@vILgy@$rW{kg16?z$1a5lCzgtiFJ5~OS3`xY`cSZWuHJ(LcJdY6km9zV!>Hc}f1 zV2c3TpZFWJ5;1Nb0*-J&42B6(HL=1F60ARI4g@uhbRc;Qm=;wJ-&H|)c#PEJgqZAD+Yhq5&v?$#G-OzP^o2~{UzUuZRLwv--$l4x{tsZ-R z9nK2mMglivH(WRH#n4wjs%?a;2X~-1sb0Y3VA2lKmcq8(R@Sz}H%kbW06dZiVq$tq zYvMDKI^+tJmv3ZZE~N2CA`z4YXzD1VzwY~P`$GG;$We!X?MeS84wIsjGR^DF3(q5* zkzh@9Sx&f5&>GPkF&=@&Nc|lNt1g~hBbqOo zue2XBBBK^iQcSDul4+S~S$v{2Z&ea#3Bkd}5rwN@)2tP|;!w|6Z|IWRlHr-``R9fA z9{eO}?zNb^1ZO^)HXBpXO=VKq%Yr(UB@(bZ`(yTDYGU@~NP{-G>`F{5!~kR^skM2=XlgVGr*NWj*|8oh4#D1EX@on zn`jz(jKelKHlS*sSDmE$r)={ASaUtUCvhyN+GH5(ny(w@TA&%l+s&HXIu;C%^`)el zHB7!)o|>dtv<^P_6U;LXF>Y4$U(~nf3v71ueEb3L@7^>F2GsV_Nzh5;?8*cr{N>b7V&|KPSFIQ>gEFhA%Ld2~Tyd{FQht)ux!uo1@4=^ENsNG%`spXpV#kRzyg|UGJH(aY4H;nE{ z*@o){hIM@pl28@nkE{#V!CfX44^Jg>AwJQqQ@9H`f@V2j`JLRLF~5|zw6gT*(BVMg zuzt<9o7Ar_H6?{i#ZI+@YK&%u8hZFNzRX+5G)z;>d_19ZuqN)rG(MY>#B;glJ^$SH zED_H#>@W-+Cv@B6KHs*DZYym%f}GX6IRGTN!n8(#|0WPb(PLgrSWHp;Zb4{~HVY-D z_g8s3JDH3Tt5X_Db~Euqkc{3~f8eK?ju`+`kdEiHDTsDBCS5zb(_lJ4zBsW?aix9A zW_uwRP-CDOi61wa$YnkoD{@@YqjR0{muir$kn|8>Yke^-eiBX|UcuO))8=|z%m3Nd zE;A+bB(w4dmr=LA&K$=}bKdc%$H+|s#sD4ckGpbsjblq(EFjKuB8>y5#f>|^dij(6e1--oA1!0 zqEowzk6NG^;M&k$!{O;!VOVIL2h}_8oci?V`(%NTI$O zG=ZI!gB{O9_&C$6Ja%Rgcd>(GFU?MIfJ$!Rtvphrv^Ml)UdFLa+UCQS_1m)cA?bum zI+7spI^Z+7_WbN&7E6FJvuV%u_{nr(zkqH+9k-HKSKy&^wsMd+oe#INS;wm5!m~Z! 
zL}5Yvj9tkgiumYXRQw!J;Fx`{IIy@cKT3G`G{9IZ=B zl7PS<1?)1@3Nwggmtb`pS6_&#=KIGmAD-szh4$9@D3@s~_jXLE;6`A5@LY-u7kmX%^-ogvm#Le=>`U8gEo(%3UQ>Tcs}dfOwAxSA%7 zk_S;3J@5YAjCqt<;87=$1I|#X^XyZ-$H2Vv4f4MG4!}}wDIv;K4hXt* zul4F9-2kVVUXU_}R))w$!J{YyLN|XeU#2%6LLByVH9Oznx5RmAYs(Mi<5TDNvtc4V z*Xck+mH8fm<0LS_dO{hVy9YHCGSro3+_LNsw!t{aPYva9GAH!y?sD9xSe8kc!7o2d zt-4Ls%)NrP0#{N7Ay^;567;8u0n}H;-VQ2*NY)A4`0;WP*b9JET1!pEq{oilHY>jp z3kqor8N?y)uD-y^fzoae86)3-DWP;F2@*?7ORDzd=<9;4)0Fk14};CG=45&nCgCSk zO~P##<4qDsrb;1;k6Y{(o=ypPo|nms zA-&pw3p1gj634LEOe>EAiz?)dlCF67@8{9-71YRb1q(4Zq0;L_DaPH`~ao6jPPpw4R_yh zdz5Oy5(zf`@7ZSckS@93^mQRlgshznXK{#^L63&=G}+3scJ}POJt~6xPJ^k`moS7s zU>s|zsHmu}g5wIJI7r-w34<$|p3rW95~7?4SUym>Nf~e57?bD=QpTp>Uvezfx#+IyFgDYvH><{WKO=`%tw_?tFr;(OFc+uxm}78g_=ww9dDU9hb5nZn7Vt zDo7s$_+L{g=t>Tkr^to%_e9gGL+C{BBStvq8z4u7PnuS5T{1gs4uu|BW_-cYowM9z zfN>zZH4rxXsi=m;+r!!GnJZr=9lu6-pj*8q`ZTQYyiGKIf#8)ZtGXZ@D49tyVxak2 zTuV92=CYvsN2Q3i@p3f<8*jfpo%Kyx-RKU(+P0=GVGMd4<;}Q^|0?c&U#{Cs*F1qb zZ8$gn=E|U!4zfVnnnv5VuJ<$55Ji9_PS26gLYCs-(NTH*`XO`q!G%N1C(O1TG$Teo zPAAc;rZ5isG-BvF2#z$Sa(v~mV2YrXAtdtDXnLfvTrrd?Y3wo5eT1VV%QIV_sVx$^ z1a#FKfa%awsyCT5J)-_D>5xiXdCxUV(mA0=V9D|(!H$AkI8YEr<6>s2O+8!WH0!Dl z=(Jg;*iywDBAME-m%U%Xm=jbSB`0lDYt6oH+vmx(DqXdMzv{cZ`5`$z>RI<^G;#d%?)}FJD*=kgo$YMImSf}d4f)j< z-R~n+(GJp6iZc%JAR?d35x$*spR1WecGEgjpJG_b~On8_OZ?w<0G&w3CF5GMHeI{vk3mG!K>cyqXOzVaW9%gHQu02t9(#E}rXwXQxp0_6R>`-?M4I8d0df!C1|$ zJ&thY+7^dc96n9Z(|o}WToN5bI803goO*h`Ia{QUD~St^p_=f53Li~!;4SALY&R@#+~D)96MsT8>%( zaso;Yie{jd{?whW9(nR5Hdf|R=Dl0vx(RTI4%J7FQZp}Rg#fcxI{4V{HB`*H&UWfb1-8sz}m635RR(dYR z-E&#RI|3T48(QX-I_@_--M^j})()NNb!ZuspK zi16xYQy(Z?u#fYpqkhATqI%7$)DHYRAxiE=@tTPPXP0e?ZKgK{Q*KG4w-e8+^5Gag zz@t*8KH;N^0w51LR*+`ID9koBIMOyKZ#XYdm0|yQrW2~9(i5h&v%i+{?L;OnE&mnp zY_7)$54r@Qoi?hFf(Np}PRoB3E zflkDKa~iCw%TCb#$o3P(4#G~czdTmC|Fvu2EUUiHd~@%!6)6M8GK32+e7(o85ZW2M z#=zaT)??c^(K%I_^83R|Kmoni8rP*r63KzNgybxay=xK4a=OyvnHd;==UV!&YJ`pC zfR@nWgp44a&B3@z&gW(HmeO6PHPK?JpfXsc(~an8x0R-f{{-AZs4^?> z#`?LRRMWL!f(gGNv)8_O4r%<(;uFSx{Tjhe(lV_NZoLM_4Vz^KXJoZ&y)5@v>V25V z9y7GWz1f-7NgYgVoRj>D?0oA^XPpEzU`5X@+;WAR4xI2D+2Q7{IshIJ%S$-$rLD9T z&KiTqyhflUS1c8<5;;6Z8Fl;v2TsqAhyGaDXz2CKr=*DLKje*|^Sw*qkj4b9?d-~1 zaPoPDz7V2>Eww&(hCe;ch0g6329dq(_pTxoxl~tt`0%a!e$M8E6A%c1n!kyXt(INF z`#TN5QBe(sGuwa{s}Rz8+&h}qWRwp}SFo9P&zYzxiOH+u(25o?a5nL|x^3GCP%niH z&ErFPqo$L*YUJ}|ioV(SnHo&>$@!I_hJh$i b*uR8>->6tPjV~^*0WU3eJ+&%TyU_muXO!3M literal 0 HcmV?d00001 From 103300f969fac2fef1dc6dd02d1cd25bfef28360 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Fri, 18 Mar 2022 13:36:06 +0000 Subject: [PATCH 40/89] remove not needed non-buildings toid code --- etl/filter_mastermap.py | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/etl/filter_mastermap.py b/etl/filter_mastermap.py index af27a21c..76c1d559 100644 --- a/etl/filter_mastermap.py +++ b/etl/filter_mastermap.py @@ -8,7 +8,6 @@ import json import os import sys -from multiprocessing import Pool csv.field_size_limit(sys.maxsize) @@ -20,19 +19,18 @@ def main(mastermap_path): def filter(mm_path) output_path = "{}.filtered.csv".format(str(mm_path).replace(".gml.csv", "")) - alt_output_path = "{}.filtered_not_building.csv".format(str(mm_path).replace(".gml.csv", "")) output_fieldnames = ('WKT', 'fid', 'descriptiveGroup') + # Open the input csv with all polygons, buildings and others with open(mm_path, 'r') as fh: r = csv.DictReader(fh) + # Open a new buildings csv with open(output_path, 'w') as output_fh: w = csv.DictWriter(output_fh, fieldnames=output_fieldnames) - w.writeheader() - with open(alt_output_path, 'w') as alt_output_fh: - alt_w = 
csv.DictWriter(alt_output_fh, fieldnames=output_fieldnames) - alt_w.writeheader() - for line in r: - if 'Building' in line['descriptiveGroup']: - w.writerow(line) + w.writeheader() + # Then write to the output csv buildings only + for line in r: + if 'Building' in line['descriptiveGroup']: + w.writerow(line) if __name__ == '__main__': From f6cb0c488edcd8d1f8be2655dbf70824790ed0eb Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Fri, 18 Mar 2022 14:34:06 +0000 Subject: [PATCH 41/89] add test for filtering mastermap (not working) --- etl/__init__.py | 1 + etl/filter_mastermap.py | 9 ++++----- tests/test_filter.py | 8 ++++++++ 3 files changed, 13 insertions(+), 5 deletions(-) create mode 100644 etl/__init__.py create mode 100644 tests/test_filter.py diff --git a/etl/__init__.py b/etl/__init__.py new file mode 100644 index 00000000..a9f46b58 --- /dev/null +++ b/etl/__init__.py @@ -0,0 +1 @@ +from .filter_mastermap import filter_mastermap \ No newline at end of file diff --git a/etl/filter_mastermap.py b/etl/filter_mastermap.py index 76c1d559..be68f756 100644 --- a/etl/filter_mastermap.py +++ b/etl/filter_mastermap.py @@ -14,20 +14,19 @@ csv.field_size_limit(sys.maxsize) def main(mastermap_path): mm_paths = sorted(glob.glob(os.path.join(mastermap_path, "*.gml.csv"))) for mm_path in mm_paths: - filter(mm_path) + filter_mastermap(mm_path) -def filter(mm_path) +def filter_mastermap(mm_path) output_path = "{}.filtered.csv".format(str(mm_path).replace(".gml.csv", "")) output_fieldnames = ('WKT', 'fid', 'descriptiveGroup') # Open the input csv with all polygons, buildings and others with open(mm_path, 'r') as fh: r = csv.DictReader(fh) - # Open a new buildings csv + # Open a new output csv that will contain just buildings with open(output_path, 'w') as output_fh: w = csv.DictWriter(output_fh, fieldnames=output_fieldnames) - w.writeheader() - # Then write to the output csv buildings only + w.writeheader() for line in r: if 'Building' in line['descriptiveGroup']: w.writerow(line) diff --git a/tests/test_filter.py b/tests/test_filter.py new file mode 100644 index 00000000..cf2a2083 --- /dev/null +++ b/tests/test_filter.py @@ -0,0 +1,8 @@ +import pytest +from etl import filter_mastermap + +def test_filter_mastermap(): + """Test that MasterMap CSV can be correctly filtered to include only buildings.""" + input_file = "" + expected_output = "" + assert filter_mastermap(input_file) == expected_output \ No newline at end of file From b875e27fb861ffb655c6d209e2b7ecb27f159087 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Fri, 18 Mar 2022 14:48:19 +0000 Subject: [PATCH 42/89] missing colon --- etl/filter_mastermap.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/etl/filter_mastermap.py b/etl/filter_mastermap.py index be68f756..de4a02d0 100644 --- a/etl/filter_mastermap.py +++ b/etl/filter_mastermap.py @@ -17,7 +17,7 @@ def main(mastermap_path): filter_mastermap(mm_path) -def filter_mastermap(mm_path) +def filter_mastermap(mm_path): output_path = "{}.filtered.csv".format(str(mm_path).replace(".gml.csv", "")) output_fieldnames = ('WKT', 'fid', 'descriptiveGroup') # Open the input csv with all polygons, buildings and others From 35f85ee2edbee20c5b1e81420f118e5fe5e81328 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Fri, 18 Mar 2022 15:04:09 +0000 Subject: [PATCH 43/89] test mastermap filtering works --- tests/test_filter.py | 10 +++++++--- tests/test_mastermap.filtered.csv | 2 ++ tests/test_mastermap.gml.csv | 3 +++ 3 files changed, 12 insertions(+), 3 deletions(-) create mode 100644 
tests/test_mastermap.filtered.csv create mode 100644 tests/test_mastermap.gml.csv diff --git a/tests/test_filter.py b/tests/test_filter.py index cf2a2083..9f4b34a8 100644 --- a/tests/test_filter.py +++ b/tests/test_filter.py @@ -1,8 +1,12 @@ +import csv import pytest from etl import filter_mastermap + def test_filter_mastermap(): """Test that MasterMap CSV can be correctly filtered to include only buildings.""" - input_file = "" - expected_output = "" - assert filter_mastermap(input_file) == expected_output \ No newline at end of file + input_file = "tests/test_mastermap.gml.csv" # Test csv with one building and one non-building + filter_mastermap(input_file) # creates test_mastermap.filtered.csv + with open('tests/test_mastermap.filtered.csv', newline='') as csvfile: + csv_array = list(csv.reader(csvfile)) + assert len(csv_array) == 2 # assert that length is 2 because just one row after header \ No newline at end of file diff --git a/tests/test_mastermap.filtered.csv b/tests/test_mastermap.filtered.csv new file mode 100644 index 00000000..28cc7baf --- /dev/null +++ b/tests/test_mastermap.filtered.csv @@ -0,0 +1,2 @@ +WKT,fid,descriptiveGroup +"POLYGON ((484704.003 184721.2,484691.62 184729.971,484688.251 184725.214,484700.633 184716.443,484704.003 184721.2))",osgb5000005129953843,"[ ""Building"" ]" diff --git a/tests/test_mastermap.gml.csv b/tests/test_mastermap.gml.csv new file mode 100644 index 00000000..d76a6eee --- /dev/null +++ b/tests/test_mastermap.gml.csv @@ -0,0 +1,3 @@ +WKT,fid,descriptiveGroup +"POLYGON ((484704.003 184721.2,484691.62 184729.971,484688.251 184725.214,484700.633 184716.443,484704.003 184721.2))",osgb5000005129953843,"[ ""Building"" ]" +"POLYGON ((484703.76 184849.9,484703.46 184849.7,484703.26 184849.4,484703.06 184849.2,484702.86 184848.9,484702.76 184848.6,484702.66 184848.2,484702.66 184847.3,484702.76 184847.0,484702.96 184846.7,484703.06 184846.4,484703.36 184846.2,484703.56 184846.0,484704.16 184845.6,484704.46 184845.5,484705.46 184845.5,484706.06 184845.7,484706.26 184845.8,484706.76 184846.3,484706.96 184846.6,484707.16 184846.8,484707.26 184847.2,484707.36 184847.5,484707.36 184848.4,484707.26 184848.7,484707.16 184848.9,484706.76 184849.5,484706.46 184849.7,484706.26 184849.9,484705.66 184850.2,484704.66 184850.2,484703.76 184849.9))",osgb1000000152730957,"[ ""General Surface"" ]" \ No newline at end of file From a159c974dc33b6e7e3a3efab57f4403172ac08d9 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Fri, 18 Mar 2022 15:27:12 +0000 Subject: [PATCH 44/89] name output_file with input_file --- tests/test_filter.py | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/tests/test_filter.py b/tests/test_filter.py index 9f4b34a8..ed785013 100644 --- a/tests/test_filter.py +++ b/tests/test_filter.py @@ -6,7 +6,17 @@ from etl import filter_mastermap def test_filter_mastermap(): """Test that MasterMap CSV can be correctly filtered to include only buildings.""" input_file = "tests/test_mastermap.gml.csv" # Test csv with one building and one non-building + output_file = input_file.replace('gml', 'filtered') filter_mastermap(input_file) # creates test_mastermap.filtered.csv - with open('tests/test_mastermap.filtered.csv', newline='') as csvfile: + with open(output_file, newline='') as csvfile: csv_array = list(csv.reader(csvfile)) - assert len(csv_array) == 2 # assert that length is 2 because just one row after header \ No newline at end of file + assert len(csv_array) == 2 # assert that length is 2 because just one row after header + 
+ +# def test_filter_mastermap_missing_type(): +# """Test that MasterMap CSV can be correctly filtered when the polygon does not have a type specified.""" +# input_file = "tests/test_mastermap_missing_type.gml.csv" # Test csv with one building and one non-building +# filter_mastermap(input_file) # creates test_mastermap.filtered.csv +# with open('tests/test_mastermap.filtered.csv', newline='') as csvfile: +# csv_array = list(csv.reader(csvfile)) +# assert len(csv_array) == 2 # assert that length is 2 because just one row after header \ No newline at end of file From 158a09637b70813119bee3d7ddc896ca1aaeae40 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Fri, 18 Mar 2022 15:30:17 +0000 Subject: [PATCH 45/89] add failing test for missing mastermap type --- tests/test_filter.py | 15 ++++++++------- tests/test_mastermap_missing_type.gml.csv | 2 ++ 2 files changed, 10 insertions(+), 7 deletions(-) create mode 100644 tests/test_mastermap_missing_type.gml.csv diff --git a/tests/test_filter.py b/tests/test_filter.py index ed785013..aae7842d 100644 --- a/tests/test_filter.py +++ b/tests/test_filter.py @@ -13,10 +13,11 @@ def test_filter_mastermap(): assert len(csv_array) == 2 # assert that length is 2 because just one row after header -# def test_filter_mastermap_missing_type(): -# """Test that MasterMap CSV can be correctly filtered when the polygon does not have a type specified.""" -# input_file = "tests/test_mastermap_missing_type.gml.csv" # Test csv with one building and one non-building -# filter_mastermap(input_file) # creates test_mastermap.filtered.csv -# with open('tests/test_mastermap.filtered.csv', newline='') as csvfile: -# csv_array = list(csv.reader(csvfile)) -# assert len(csv_array) == 2 # assert that length is 2 because just one row after header \ No newline at end of file +def test_filter_mastermap_missing_type(): + """Test that MasterMap CSV can be correctly filtered when the polygon does not have a type specified.""" + input_file = "tests/test_mastermap_missing_type.gml.csv" # Test csv with one building and one non-building + output_file = input_file.replace('gml', 'filtered') + filter_mastermap(input_file) # creates test_mastermap.filtered.csv + with open(output_file, newline='') as csvfile: + csv_array = list(csv.reader(csvfile)) + assert len(csv_array) == 2 # assert that length is 2 because just one row after header \ No newline at end of file diff --git a/tests/test_mastermap_missing_type.gml.csv b/tests/test_mastermap_missing_type.gml.csv new file mode 100644 index 00000000..04c1efdd --- /dev/null +++ b/tests/test_mastermap_missing_type.gml.csv @@ -0,0 +1,2 @@ +WKT,fid,descriptiveGroup +"POLYGON ((517896.1 186250.8,517891.7 186251.6,517891.1 186248.7,517890.75 186246.7,517890.65 186246.35,517890.45 186245.95,517890.25 186245.8,517889.95 186245.75,517889.65 186245.75,517878.3 186247.9,517874.61 186248.55,517872.9 186239.5,517873.4 186239.7,517873.95 186239.8,517874.25 186239.75,517874.65 186239.7,517875.05 186239.6,517878.35 186238.95,517889.1 186236.85,517892.769 186236.213,517903.2 186234.4,517919.55 186231.4,517932.25 186229.1,517942.1 186227.25,517954.65 186225.05,517968.75 186222.45,517985.25 186219.5,518000.0 186216.65,518021.7 186212.7,518026.7 186211.75,518029.1 186211.3,518029.68 186211.173,518033.65 186210.3,518046.1 186207.65,518058.45 186204.95,518063.3 186203.6,518068.1 186202.25,518068.9 186202.05,518079.6 186198.95,518081.4 186198.3,518083.2 186197.55,518084.95 186196.8,518086.7 186196.0,518088.45 186195.25,518097.85 186191.05,518099.15 186190.45,518108.3 
186186.2,518108.375 186186.175,518108.45 186186.15,518108.477 186186.132,518114.5 186183.6,518114.65 186183.55,518114.85 186183.45,518115.05 186183.4,518115.25 186183.3,518115.35 186183.2,518115.45 186183.15,518141.85 186171.55,518142.0 186171.5,518142.15 186171.4,518142.45 186171.3,518142.6 186171.2,518142.7 186171.1,518142.8 186171.05,518142.9 186170.95,518143.05 186170.85,518143.15 186170.75,518143.25 186170.6,518143.4 186170.5,518143.5 186170.4,51814 \ No newline at end of file From 3f281a5e73f546b178000f3d6a84ac498f428130 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Fri, 18 Mar 2022 15:44:19 +0000 Subject: [PATCH 46/89] handle missing descriptiveGroup --- etl/filter_mastermap.py | 8 ++++++-- tests/test_filter.py | 10 +++++----- ...est_mastermap_missing_descriptivegroup.filtered.csv | 1 + ...=> test_mastermap_missing_descriptivegroup.gml.csv} | 0 4 files changed, 12 insertions(+), 7 deletions(-) create mode 100644 tests/test_mastermap_missing_descriptivegroup.filtered.csv rename tests/{test_mastermap_missing_type.gml.csv => test_mastermap_missing_descriptivegroup.gml.csv} (100%) diff --git a/etl/filter_mastermap.py b/etl/filter_mastermap.py index de4a02d0..2167a53c 100644 --- a/etl/filter_mastermap.py +++ b/etl/filter_mastermap.py @@ -14,6 +14,7 @@ csv.field_size_limit(sys.maxsize) def main(mastermap_path): mm_paths = sorted(glob.glob(os.path.join(mastermap_path, "*.gml.csv"))) for mm_path in mm_paths: + print(mm_path) filter_mastermap(mm_path) @@ -28,8 +29,11 @@ def filter_mastermap(mm_path): w = csv.DictWriter(output_fh, fieldnames=output_fieldnames) w.writeheader() for line in r: - if 'Building' in line['descriptiveGroup']: - w.writerow(line) + try: + if 'Building' in line['descriptiveGroup']: + w.writerow(line) + except TypeError: # when descriptiveGroup is missing, ignore this Polygon + pass if __name__ == '__main__': diff --git a/tests/test_filter.py b/tests/test_filter.py index aae7842d..16284657 100644 --- a/tests/test_filter.py +++ b/tests/test_filter.py @@ -7,17 +7,17 @@ def test_filter_mastermap(): """Test that MasterMap CSV can be correctly filtered to include only buildings.""" input_file = "tests/test_mastermap.gml.csv" # Test csv with one building and one non-building output_file = input_file.replace('gml', 'filtered') - filter_mastermap(input_file) # creates test_mastermap.filtered.csv + filter_mastermap(input_file) # creates output_file with open(output_file, newline='') as csvfile: csv_array = list(csv.reader(csvfile)) assert len(csv_array) == 2 # assert that length is 2 because just one row after header -def test_filter_mastermap_missing_type(): +def test_filter_mastermap_missing_descriptivegroup(): """Test that MasterMap CSV can be correctly filtered when the polygon does not have a type specified.""" - input_file = "tests/test_mastermap_missing_type.gml.csv" # Test csv with one building and one non-building + input_file = "tests/test_mastermap_missing_descriptivegroup.gml.csv" # Test csv with one building and one non-building output_file = input_file.replace('gml', 'filtered') - filter_mastermap(input_file) # creates test_mastermap.filtered.csv + filter_mastermap(input_file) # creates output_file with open(output_file, newline='') as csvfile: csv_array = list(csv.reader(csvfile)) - assert len(csv_array) == 2 # assert that length is 2 because just one row after header \ No newline at end of file + assert len(csv_array) == 1 # assert that length is 1 because just header \ No newline at end of file diff --git 
a/tests/test_mastermap_missing_descriptivegroup.filtered.csv b/tests/test_mastermap_missing_descriptivegroup.filtered.csv new file mode 100644 index 00000000..8de75769 --- /dev/null +++ b/tests/test_mastermap_missing_descriptivegroup.filtered.csv @@ -0,0 +1 @@ +WKT,fid,descriptiveGroup diff --git a/tests/test_mastermap_missing_type.gml.csv b/tests/test_mastermap_missing_descriptivegroup.gml.csv similarity index 100% rename from tests/test_mastermap_missing_type.gml.csv rename to tests/test_mastermap_missing_descriptivegroup.gml.csv From 37ecbd0bd2f4854f083ff10c4ae6006151a6440b Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Fri, 18 Mar 2022 16:02:14 +0000 Subject: [PATCH 47/89] add etl github workflow --- .github/workflows/etl.yml | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) create mode 100644 .github/workflows/etl.yml diff --git a/.github/workflows/etl.yml b/.github/workflows/etl.yml new file mode 100644 index 00000000..9b8208d4 --- /dev/null +++ b/.github/workflows/etl.yml @@ -0,0 +1,22 @@ +name: etl +on: [push, pull_request] + +jobs: + build: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v2 + - uses: actions/setup-python@v2 + with: + python-version: '3.9' + - name: + Install dependencies + run: | + python -m pip install --upgrade pip + python -m pip install -r requirements.txt + - name: Run Flake8 + run: | + ls etl/*py | grep -v 'join_building_data' | xargs flake8 + - name: Run tests + run: | + python -m pytest \ No newline at end of file From db01442d9b68f9d062f20aec2180ce76e66dcd4b Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Fri, 18 Mar 2022 16:18:44 +0000 Subject: [PATCH 48/89] flake8 --- .github/workflows/etl.yml | 2 +- etl/filter_mastermap.py | 8 +++++--- etl/get_test_polygons.py | 19 +++++++++++++------ 3 files changed, 19 insertions(+), 10 deletions(-) diff --git a/.github/workflows/etl.yml b/.github/workflows/etl.yml index 9b8208d4..0d56a433 100644 --- a/.github/workflows/etl.yml +++ b/.github/workflows/etl.yml @@ -16,7 +16,7 @@ jobs: python -m pip install -r requirements.txt - name: Run Flake8 run: | - ls etl/*py | grep -v 'join_building_data' | xargs flake8 + ls etl/*py | grep -v 'join_building_data' | xargs flake8 --exclude etl/__init__.py - name: Run tests run: | python -m pytest \ No newline at end of file diff --git a/etl/filter_mastermap.py b/etl/filter_mastermap.py index 2167a53c..3315a6ca 100644 --- a/etl/filter_mastermap.py +++ b/etl/filter_mastermap.py @@ -4,13 +4,13 @@ """ import csv import glob -import json import os import sys csv.field_size_limit(sys.maxsize) + def main(mastermap_path): mm_paths = sorted(glob.glob(os.path.join(mastermap_path, "*.gml.csv"))) for mm_path in mm_paths: @@ -19,7 +19,8 @@ def main(mastermap_path): def filter_mastermap(mm_path): - output_path = "{}.filtered.csv".format(str(mm_path).replace(".gml.csv", "")) + output_path = "{}.filtered.csv" + output_path.format(str(mm_path).replace(".gml.csv", "")) output_fieldnames = ('WKT', 'fid', 'descriptiveGroup') # Open the input csv with all polygons, buildings and others with open(mm_path, 'r') as fh: @@ -32,7 +33,8 @@ def filter_mastermap(mm_path): try: if 'Building' in line['descriptiveGroup']: w.writerow(line) - except TypeError: # when descriptiveGroup is missing, ignore this Polygon + # when descriptiveGroup is missing, ignore this Polygon + except TypeError: pass diff --git a/etl/get_test_polygons.py b/etl/get_test_polygons.py index 6b1b34e3..388b9872 100644 --- a/etl/get_test_polygons.py +++ b/etl/get_test_polygons.py @@ -25,11 +25,12 @@ gdf = 
osmnx.footprints_from_point(point=point, dist=dist) # preview image gdf_proj = osmnx.projection.project_gdf(gdf, to_crs={'init': 'epsg:3857'}) -gdf_proj = gdf_proj[gdf_proj.geometry.apply(lambda g: g.geom_type != 'MultiPolygon')] +gdf_proj = gdf_proj[gdf_proj.geometry.apply(lambda g: g.geom_type != 'MultiPolygon')] # noqa -fig, ax = osmnx.plot_footprints(gdf_proj, bgcolor='#333333', color='w', figsize=(4,4), - save=True, show=False, close=True, - filename='test_buildings_preview', dpi=600) +fig, ax = osmnx.plot_footprints(gdf_proj, bgcolor='#333333', + color='w', figsize=(4, 4), + save=True, show=False, close=True, + filename='test_buildings_preview', dpi=600) # save test_dir = os.path.dirname(__file__) @@ -50,7 +51,13 @@ gdf_to_save.rename( # convert to CSV test_data_csv = str(os.path.join(test_dir, 'test_buildings.3857.csv')) subprocess.run(["rm", test_data_csv]) -subprocess.run(["ogr2ogr", "-f", "CSV", test_data_csv, test_data_geojson, "-lco", "GEOMETRY=AS_WKT"]) +subprocess.run( + ["ogr2ogr", "-f", "CSV", test_data_csv, + test_data_geojson, "-lco", "GEOMETRY=AS_WKT"] +) # add SRID for ease of loading to PostgreSQL -subprocess.run(["sed", "-i", "s/^\"POLYGON/\"SRID=3857;POLYGON/", test_data_csv]) +subprocess.run( + ["sed", "-i", "s/^\"POLYGON/\"SRID=3857;POLYGON/", + test_data_csv] +) From 8eeea322aadab15e7175134f552befd25f357fd8 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Fri, 18 Mar 2022 16:21:29 +0000 Subject: [PATCH 49/89] update path to etl requirements --- .github/workflows/etl.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/etl.yml b/.github/workflows/etl.yml index 0d56a433..c14b9904 100644 --- a/.github/workflows/etl.yml +++ b/.github/workflows/etl.yml @@ -13,7 +13,7 @@ jobs: Install dependencies run: | python -m pip install --upgrade pip - python -m pip install -r requirements.txt + python -m pip install -r etl/requirements.txt - name: Run Flake8 run: | ls etl/*py | grep -v 'join_building_data' | xargs flake8 --exclude etl/__init__.py From 1b43d4c1baf09208fb2dbe3d5cd42864e37375e7 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Fri, 18 Mar 2022 16:22:21 +0000 Subject: [PATCH 50/89] run tests on PR only --- .github/workflows/etl.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/etl.yml b/.github/workflows/etl.yml index c14b9904..e718820f 100644 --- a/.github/workflows/etl.yml +++ b/.github/workflows/etl.yml @@ -1,5 +1,5 @@ name: etl -on: [push, pull_request] +on: [pull_request] jobs: build: From 0b176c7531a439658dc3874e3389fcda85947ba7 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Fri, 18 Mar 2022 16:30:14 +0000 Subject: [PATCH 51/89] make sure pytest and flake8 installed --- .github/workflows/etl.yml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/.github/workflows/etl.yml b/.github/workflows/etl.yml index e718820f..2bee1586 100644 --- a/.github/workflows/etl.yml +++ b/.github/workflows/etl.yml @@ -13,6 +13,8 @@ jobs: Install dependencies run: | python -m pip install --upgrade pip + python -m pip install pytest + python -m pip install flake8 python -m pip install -r etl/requirements.txt - name: Run Flake8 run: | From 426aa98a33b72cf6271cf6ad4f174d554cffe01b Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Fri, 18 Mar 2022 16:32:52 +0000 Subject: [PATCH 52/89] add libgeos-dev --- .github/workflows/etl.yml | 1 + 1 file changed, 1 insertion(+) diff --git a/.github/workflows/etl.yml b/.github/workflows/etl.yml index 2bee1586..352fbf54 100644 --- a/.github/workflows/etl.yml +++ 
b/.github/workflows/etl.yml @@ -12,6 +12,7 @@ jobs: - name: Install dependencies run: | + apt-get install libgeos-dev python -m pip install --upgrade pip python -m pip install pytest python -m pip install flake8 From 0d9e54d08d7b3452c22aab656c54491574198eae Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Fri, 18 Mar 2022 16:34:20 +0000 Subject: [PATCH 53/89] use sudo with libgeos-dev --- .github/workflows/etl.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/etl.yml b/.github/workflows/etl.yml index 352fbf54..6d7f3b27 100644 --- a/.github/workflows/etl.yml +++ b/.github/workflows/etl.yml @@ -12,7 +12,7 @@ jobs: - name: Install dependencies run: | - apt-get install libgeos-dev + sudo apt-get install libgeos-dev python -m pip install --upgrade pip python -m pip install pytest python -m pip install flake8 From d985b3214692ee86115c8263a75504dbb3cb86a8 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Fri, 18 Mar 2022 16:37:30 +0000 Subject: [PATCH 54/89] switch to python 3.7.4 --- .github/workflows/etl.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/etl.yml b/.github/workflows/etl.yml index 6d7f3b27..eee64314 100644 --- a/.github/workflows/etl.yml +++ b/.github/workflows/etl.yml @@ -8,7 +8,7 @@ jobs: - uses: actions/checkout@v2 - uses: actions/setup-python@v2 with: - python-version: '3.9' + python-version: '3.7.4' - name: Install dependencies run: | From 48e56b0ad70cae98686ce78351e82473196faea3 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Fri, 18 Mar 2022 16:40:40 +0000 Subject: [PATCH 55/89] 3.7 --- .github/workflows/etl.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/etl.yml b/.github/workflows/etl.yml index eee64314..32893dc1 100644 --- a/.github/workflows/etl.yml +++ b/.github/workflows/etl.yml @@ -8,7 +8,7 @@ jobs: - uses: actions/checkout@v2 - uses: actions/setup-python@v2 with: - python-version: '3.7.4' + python-version: '3.7' - name: Install dependencies run: | From 27623cf3d48856b4adfa3ea499ab39cff6f45694 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Fri, 18 Mar 2022 16:49:46 +0000 Subject: [PATCH 56/89] update filter step explanation --- etl/README.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/etl/README.md b/etl/README.md index 5f879177..7f4cfe99 100644 --- a/etl/README.md +++ b/etl/README.md @@ -72,12 +72,14 @@ Ensure you have the `colouringlondon` environment activated. source colouringlondon/bin/activate ``` -Filter MasterMap 'building' polygons and any others referenced by addressbase. +Filter MasterMap 'building' polygons. ```bash sudo ./filter_transform_mastermap_for_loading.sh ./mastermap_dir ``` + + Load all building outlines. From cf30f71d0a2b9e4de01b3a231369a0b92549e07b Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Fri, 18 Mar 2022 16:52:41 +0000 Subject: [PATCH 57/89] remove bad comments --- etl/README.md | 6 ------ 1 file changed, 6 deletions(-) diff --git a/etl/README.md b/etl/README.md index 7f4cfe99..50d605eb 100644 --- a/etl/README.md +++ b/etl/README.md @@ -82,8 +82,6 @@ sudo ./filter_transform_mastermap_for_loading.sh ./mastermap_dir Load all building outlines. - - ```bash ./load_geometries.sh ./mastermap_dir ``` @@ -104,14 +102,10 @@ TODO: Drop outside limit. Create a building record per outline. - - ```bash ./create_building_records.sh ``` - - Run the remaining migrations in `../migrations` to create the rest of the database structure. 
```bash From 8398bcea3ab1bee77c7bec98faa546ea1755501e Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Thu, 24 Mar 2022 09:51:41 +0000 Subject: [PATCH 58/89] remove python environment not needed --- etl/README.md | 6 ------ etl/filter_transform_mastermap_for_loading.sh | 2 +- 2 files changed, 1 insertion(+), 7 deletions(-) diff --git a/etl/README.md b/etl/README.md index 50d605eb..ed51f28b 100644 --- a/etl/README.md +++ b/etl/README.md @@ -66,12 +66,6 @@ sudo ./extract_mastermap.sh ./mastermap_dir -Ensure you have the `colouringlondon` environment activated. - -```bash -source colouringlondon/bin/activate -``` - Filter MasterMap 'building' polygons. ```bash diff --git a/etl/filter_transform_mastermap_for_loading.sh b/etl/filter_transform_mastermap_for_loading.sh index 85c68b51..ac44efd1 100755 --- a/etl/filter_transform_mastermap_for_loading.sh +++ b/etl/filter_transform_mastermap_for_loading.sh @@ -12,7 +12,7 @@ mastermap_dir=$1 # - WHERE descriptiveGroup = '(1:Building)' # - OR toid in addressbase_toids # -colouringlondon/bin/python filter_mastermap.py $mastermap_dir +python filter_mastermap.py $mastermap_dir # # Transform to 3857 (web mercator) From d7b4f9eeb567fae84dd514703b412aae2882d803 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Thu, 24 Mar 2022 11:41:56 +0000 Subject: [PATCH 59/89] only convert gml files with 5690395 to csv --- etl/extract_mastermap.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/etl/extract_mastermap.sh b/etl/extract_mastermap.sh index 09ada8cb..ddf5b566 100755 --- a/etl/extract_mastermap.sh +++ b/etl/extract_mastermap.sh @@ -23,7 +23,7 @@ gunzip $data_dir/{} -k -S gml rename 's/$/.gml/' $data_dir/*[^gzvt] -find $data_dir -type f -name '*.gml' -printf "%f\n" | \ +find $data_dir -type f -name '*5690395*.gml' -printf "%f\n" | \ parallel \ ogr2ogr \ -select fid,descriptiveGroup \ From d060ebf4bb1878d630316ea3da27e0f7dd1d893d Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Mon, 28 Mar 2022 13:42:08 +0100 Subject: [PATCH 60/89] add comment --- etl/extract_mastermap.sh | 1 + 1 file changed, 1 insertion(+) diff --git a/etl/extract_mastermap.sh b/etl/extract_mastermap.sh index ddf5b566..d5b2403c 100755 --- a/etl/extract_mastermap.sh +++ b/etl/extract_mastermap.sh @@ -23,6 +23,7 @@ gunzip $data_dir/{} -k -S gml rename 's/$/.gml/' $data_dir/*[^gzvt] +# Note: we may need to update the below for other downloads find $data_dir -type f -name '*5690395*.gml' -printf "%f\n" | \ parallel \ ogr2ogr \ From 5f9aaf7f0beb1ab5287ef94536d094bd0ae03483 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Mon, 28 Mar 2022 13:54:00 +0100 Subject: [PATCH 61/89] tidy --- etl/README.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/etl/README.md b/etl/README.md index ed51f28b..73cfd70b 100644 --- a/etl/README.md +++ b/etl/README.md @@ -59,7 +59,7 @@ chmod +x *.sh Extract the MasterMap data (this step could take a while). ```bash -sudo ./extract_mastermap.sh ./mastermap_dir +sudo ./extract_mastermap.sh /path/to/mastermap_dir ``` @@ -69,7 +69,7 @@ sudo ./extract_mastermap.sh ./mastermap_dir Filter MasterMap 'building' polygons. ```bash -sudo ./filter_transform_mastermap_for_loading.sh ./mastermap_dir +sudo ./filter_transform_mastermap_for_loading.sh /path/to/mastermap_dir ``` @@ -77,7 +77,7 @@ sudo ./filter_transform_mastermap_for_loading.sh ./mastermap_dir Load all building outlines. ```bash -./load_geometries.sh ./mastermap_dir +./load_geometries.sh /path/to/mastermap_dir ``` Index geometries. @@ -91,7 +91,7 @@ TODO: Drop outside limit. 
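The `*5690395*` glob added to `extract_mastermap.sh` above matches the tile prefix of one particular MasterMap order, and the accompanying comment notes it may need changing for other downloads. Before running the extract step it is worth checking what your own order is actually called; a small sketch, assuming the downloaded `.gz` files sit in `mastermap_dir`:

```bash
# Inspect the downloaded MasterMap tile names so the '*5690395*' pattern in
# extract_mastermap.sh can be adjusted to match your own order if necessary.
mastermap_dir=./mastermap_dir            # assumption: folder holding the downloaded *.gz files

ls "$mastermap_dir"/*.gz | head -n 5     # eyeball a few filenames
ls "$mastermap_dir"/*.gz | wc -l         # count how many tiles the order contains
```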
```bash -./drop_outside_limit.sh ./path/to/boundary_file +./drop_outside_limit.sh /path/to/boundary_file ```` Create a building record per outline. From da89c9b63b4234655d1fb3144b0281386146211e Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Mon, 28 Mar 2022 14:02:47 +0100 Subject: [PATCH 62/89] clarify comment --- etl/extract_mastermap.sh | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/etl/extract_mastermap.sh b/etl/extract_mastermap.sh index d5b2403c..f0863e0c 100755 --- a/etl/extract_mastermap.sh +++ b/etl/extract_mastermap.sh @@ -23,7 +23,9 @@ gunzip $data_dir/{} -k -S gml rename 's/$/.gml/' $data_dir/*[^gzvt] -# Note: we may need to update the below for other downloads +# Note: previously the rename cmd above resulted in some temp files being renamed to .gml +# so I have specified the start of the filename (appears to be consistent for all OS MasterMap downloads) +# we may need to update this below for other downloads find $data_dir -type f -name '*5690395*.gml' -printf "%f\n" | \ parallel \ ogr2ogr \ From 405f9558ad3372b2b8ddbb3776e0c82e2594bc7f Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Mon, 28 Mar 2022 14:17:55 +0100 Subject: [PATCH 63/89] gather stored vars for postgres --- etl/load_geometries.sh | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/etl/load_geometries.sh b/etl/load_geometries.sh index 4a4d5745..a2b20416 100755 --- a/etl/load_geometries.sh +++ b/etl/load_geometries.sh @@ -1,5 +1,11 @@ #!/usr/bin/env bash +# Gather stored postgres variables +export PGHOST=${PGHOST} +export PGDATABASE=${PGDATABASE} +export PGUSER=${PGUSER} +export PGPASSWORD=${PGPASSWORD} + # # Load geometries from GeoJSON to Postgres # - assume postgres connection details are set in the environment using PGUSER, PGHOST etc. From bb964f55c9e880f3f740b77020bdf07f7252a1da Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Mon, 28 Mar 2022 14:18:10 +0100 Subject: [PATCH 64/89] add sudo --- etl/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/etl/README.md b/etl/README.md index 73cfd70b..1403b4e8 100644 --- a/etl/README.md +++ b/etl/README.md @@ -77,7 +77,7 @@ sudo ./filter_transform_mastermap_for_loading.sh /path/to/mastermap_dir Load all building outlines. ```bash -./load_geometries.sh /path/to/mastermap_dir +sudo ./load_geometries.sh /path/to/mastermap_dir ``` Index geometries. From 7bffa445025314ff930722942b79ab3297fdaed7 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Mon, 28 Mar 2022 14:42:29 +0100 Subject: [PATCH 65/89] use env vars properly? --- etl/load_geometries.sh | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/etl/load_geometries.sh b/etl/load_geometries.sh index a2b20416..27d6b07f 100755 --- a/etl/load_geometries.sh +++ b/etl/load_geometries.sh @@ -1,11 +1,5 @@ #!/usr/bin/env bash -# Gather stored postgres variables -export PGHOST=${PGHOST} -export PGDATABASE=${PGDATABASE} -export PGUSER=${PGUSER} -export PGPASSWORD=${PGPASSWORD} - # # Load geometries from GeoJSON to Postgres # - assume postgres connection details are set in the environment using PGUSER, PGHOST etc. 
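PATCH 63 through 65 are all about getting `psql` inside `load_geometries.sh` to pick up the connection settings. Before running the load it can save a round trip to confirm that the exported variables reach `psql` at all; a quick sketch, assuming the variables from the README have been exported in the current shell:

```bash
# Assumes PGHOST, PGDATABASE, PGUSER and PGPASSWORD are exported as per the etl README.
psql -c '\conninfo'                         # shows which host/database/user psql resolves
psql -c 'SELECT count(*) FROM geometries;'  # only works once the core-table migration has run
```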
@@ -23,12 +17,12 @@ mastermap_dir=$1 find $mastermap_dir -type f -name '*.3857.csv' \ -printf "$mastermap_dir/%f\n" | \ parallel \ -cat {} '|' psql -c "\"COPY geometries ( geometry_geom, source_id ) FROM stdin WITH CSV HEADER;\"" +cat {} '|' PGHOST=$PGHOST PGDATABASE=$PGDATABASE PGUSER=$PGUSER PGPASSWORD=$PGPASSWORD psql -c "\"COPY geometries ( geometry_geom, source_id ) FROM stdin WITH CSV HEADER;\"" # # Delete any duplicated geometries (by TOID) # -psql -c "DELETE FROM geometries a USING ( +PGHOST=$PGHOST PGDATABASE=$PGDATABASE PGUSER=$PGUSER PGPASSWORD=$PGPASSWORD psql -c "DELETE FROM geometries a USING ( SELECT MIN(ctid) as ctid, source_id FROM geometries GROUP BY source_id From c85b321a60fe0094b7aecdc926f19cd702116c0e Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Mon, 28 Mar 2022 15:06:23 +0100 Subject: [PATCH 66/89] revert changes to env vars --- etl/load_geometries.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/etl/load_geometries.sh b/etl/load_geometries.sh index 27d6b07f..4a4d5745 100755 --- a/etl/load_geometries.sh +++ b/etl/load_geometries.sh @@ -17,12 +17,12 @@ mastermap_dir=$1 find $mastermap_dir -type f -name '*.3857.csv' \ -printf "$mastermap_dir/%f\n" | \ parallel \ -cat {} '|' PGHOST=$PGHOST PGDATABASE=$PGDATABASE PGUSER=$PGUSER PGPASSWORD=$PGPASSWORD psql -c "\"COPY geometries ( geometry_geom, source_id ) FROM stdin WITH CSV HEADER;\"" +cat {} '|' psql -c "\"COPY geometries ( geometry_geom, source_id ) FROM stdin WITH CSV HEADER;\"" # # Delete any duplicated geometries (by TOID) # -PGHOST=$PGHOST PGDATABASE=$PGDATABASE PGUSER=$PGUSER PGPASSWORD=$PGPASSWORD psql -c "DELETE FROM geometries a USING ( +psql -c "DELETE FROM geometries a USING ( SELECT MIN(ctid) as ctid, source_id FROM geometries GROUP BY source_id From 4cd14ffd7709dd3e23a6d402de6c9b65e8a2f1a2 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Mon, 28 Mar 2022 15:22:04 +0100 Subject: [PATCH 67/89] remove sudo --- etl/README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/etl/README.md b/etl/README.md index 1403b4e8..aa284a5d 100644 --- a/etl/README.md +++ b/etl/README.md @@ -74,10 +74,10 @@ sudo ./filter_transform_mastermap_for_loading.sh /path/to/mastermap_dir -Load all building outlines. +Load all building outlines. Note: you should ensure that `mastermap_dir` has permissions that will allow the linux `find` command to work without using sudo. ```bash -sudo ./load_geometries.sh /path/to/mastermap_dir +./load_geometries.sh /path/to/mastermap_dir ``` Index geometries. From 6602fc916a8b18eeaea5358e531fe962bf656348 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Mon, 28 Mar 2022 15:23:40 +0100 Subject: [PATCH 68/89] comment out step temporarily --- etl/README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/etl/README.md b/etl/README.md index aa284a5d..595c5235 100644 --- a/etl/README.md +++ b/etl/README.md @@ -86,13 +86,13 @@ Index geometries. psql < ../migrations/002.index-geometries.up.sql ``` -TODO: Drop outside limit. + -```bash + Create a building record per outline. 
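The de-duplication that `load_geometries.sh` wraps in a `psql -c` call is easier to read as plain SQL. Only the opening lines of the statement are visible in the hunks above; the `HAVING` and `WHERE` clauses below follow the usual keep-one-row-per-key formulation and are an assumption about the parts the diff does not show.

```sql
-- Keep a single geometry per source_id (TOID) and delete the other copies.
-- The clauses after GROUP BY are assumed; only the opening lines appear in the diff.
DELETE FROM geometries a
USING (
    SELECT MIN(ctid) AS ctid, source_id
    FROM geometries
    GROUP BY source_id
    HAVING COUNT(*) > 1
) b
WHERE a.source_id = b.source_id
  AND a.ctid <> b.ctid;
```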
From 7297354f5baabc5e25eb3afbb562cf3dddc5bf30 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Tue, 29 Mar 2022 10:12:31 +0100 Subject: [PATCH 69/89] update test that so that it fails --- tests/test_filter.py | 4 ++-- tests/test_mastermap.gml.csv | 3 ++- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/tests/test_filter.py b/tests/test_filter.py index 16284657..3c4f3a72 100644 --- a/tests/test_filter.py +++ b/tests/test_filter.py @@ -5,12 +5,12 @@ from etl import filter_mastermap def test_filter_mastermap(): """Test that MasterMap CSV can be correctly filtered to include only buildings.""" - input_file = "tests/test_mastermap.gml.csv" # Test csv with one building and one non-building + input_file = "tests/test_mastermap.gml.csv" # Test csv with two buildings and one non-building output_file = input_file.replace('gml', 'filtered') filter_mastermap(input_file) # creates output_file with open(output_file, newline='') as csvfile: csv_array = list(csv.reader(csvfile)) - assert len(csv_array) == 2 # assert that length is 2 because just one row after header + assert len(csv_array) == 3 # assert that length is 3 because just two building rows after header def test_filter_mastermap_missing_descriptivegroup(): diff --git a/tests/test_mastermap.gml.csv b/tests/test_mastermap.gml.csv index d76a6eee..9837036b 100644 --- a/tests/test_mastermap.gml.csv +++ b/tests/test_mastermap.gml.csv @@ -1,3 +1,4 @@ WKT,fid,descriptiveGroup "POLYGON ((484704.003 184721.2,484691.62 184729.971,484688.251 184725.214,484700.633 184716.443,484704.003 184721.2))",osgb5000005129953843,"[ ""Building"" ]" -"POLYGON ((484703.76 184849.9,484703.46 184849.7,484703.26 184849.4,484703.06 184849.2,484702.86 184848.9,484702.76 184848.6,484702.66 184848.2,484702.66 184847.3,484702.76 184847.0,484702.96 184846.7,484703.06 184846.4,484703.36 184846.2,484703.56 184846.0,484704.16 184845.6,484704.46 184845.5,484705.46 184845.5,484706.06 184845.7,484706.26 184845.8,484706.76 184846.3,484706.96 184846.6,484707.16 184846.8,484707.26 184847.2,484707.36 184847.5,484707.36 184848.4,484707.26 184848.7,484707.16 184848.9,484706.76 184849.5,484706.46 184849.7,484706.26 184849.9,484705.66 184850.2,484704.66 184850.2,484703.76 184849.9))",osgb1000000152730957,"[ ""General Surface"" ]" \ No newline at end of file +"POLYGON ((484703.76 184849.9,484703.46 184849.7,484703.26 184849.4,484703.06 184849.2,484702.86 184848.9,484702.76 184848.6,484702.66 184848.2,484702.66 184847.3,484702.76 184847.0,484702.96 184846.7,484703.06 184846.4,484703.36 184846.2,484703.56 184846.0,484704.16 184845.6,484704.46 184845.5,484705.46 184845.5,484706.06 184845.7,484706.26 184845.8,484706.76 184846.3,484706.96 184846.6,484707.16 184846.8,484707.26 184847.2,484707.36 184847.5,484707.36 184848.4,484707.26 184848.7,484707.16 184848.9,484706.76 184849.5,484706.46 184849.7,484706.26 184849.9,484705.66 184850.2,484704.66 184850.2,484703.76 184849.9))",osgb1000000152730957,"[ ""General Surface"" ]" +"POLYGON ((530022.138 177486.29,530043.695 177498.235,530043.074 177499.355,530042.435 177500.509,530005.349 177480.086,529978.502 177463.333,529968.87 177457.322,529968.446 177457.057,529968.199 177455.714,529968.16 177455.504,529966.658 177454.566,529958.613 177449.543,529956.624 177448.301,529956.62 177448.294,529956.08 177447.4,529954.238 177444.351,529953.197 177442.624,529953.186 177442.609,529950.768 177438.606,529950.454 177438.086,529949.47 177434.209,529950.212 177434.038,529954.216 177433.114,529955.098 177437.457,529952.714 177437.98,529953.55 
177441.646,529953.842 177442.008,529957.116 177446.059,529957.449 177446.471,529968.508 177453.375,529974.457 177451.966,529976.183 177458.937,530003.157 177475.772,530020.651 177485.466,530021.257 177484.372,530022.744 177485.196,530022.138 177486.29))",osgb5000005283023887,"[ ""Building"" ]" \ No newline at end of file From f49e378fe3db0ebea54bb8897b3bd802504f5053 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Tue, 29 Mar 2022 10:32:13 +0100 Subject: [PATCH 70/89] fix filter_mastermap --- etl/filter_mastermap.py | 3 +-- tests/test_mastermap.filtered.csv | 1 + 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/etl/filter_mastermap.py b/etl/filter_mastermap.py index 3315a6ca..66108b78 100644 --- a/etl/filter_mastermap.py +++ b/etl/filter_mastermap.py @@ -19,8 +19,7 @@ def main(mastermap_path): def filter_mastermap(mm_path): - output_path = "{}.filtered.csv" - output_path.format(str(mm_path).replace(".gml.csv", "")) + output_path = "{}.filtered.csv".format(str(mm_path).replace(".gml.csv", "")) output_fieldnames = ('WKT', 'fid', 'descriptiveGroup') # Open the input csv with all polygons, buildings and others with open(mm_path, 'r') as fh: diff --git a/tests/test_mastermap.filtered.csv b/tests/test_mastermap.filtered.csv index 28cc7baf..286b40b3 100644 --- a/tests/test_mastermap.filtered.csv +++ b/tests/test_mastermap.filtered.csv @@ -1,2 +1,3 @@ WKT,fid,descriptiveGroup "POLYGON ((484704.003 184721.2,484691.62 184729.971,484688.251 184725.214,484700.633 184716.443,484704.003 184721.2))",osgb5000005129953843,"[ ""Building"" ]" +"POLYGON ((530022.138 177486.29,530043.695 177498.235,530043.074 177499.355,530042.435 177500.509,530005.349 177480.086,529978.502 177463.333,529968.87 177457.322,529968.446 177457.057,529968.199 177455.714,529968.16 177455.504,529966.658 177454.566,529958.613 177449.543,529956.624 177448.301,529956.62 177448.294,529956.08 177447.4,529954.238 177444.351,529953.197 177442.624,529953.186 177442.609,529950.768 177438.606,529950.454 177438.086,529949.47 177434.209,529950.212 177434.038,529954.216 177433.114,529955.098 177437.457,529952.714 177437.98,529953.55 177441.646,529953.842 177442.008,529957.116 177446.059,529957.449 177446.471,529968.508 177453.375,529974.457 177451.966,529976.183 177458.937,530003.157 177475.772,530020.651 177485.466,530021.257 177484.372,530022.744 177485.196,530022.138 177486.29))",osgb5000005283023887,"[ ""Building"" ]" From f1946ea35bb36cbc7f572b392ea7296b670cc7cb Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Tue, 29 Mar 2022 10:33:47 +0100 Subject: [PATCH 71/89] flake8 --- etl/filter_mastermap.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/etl/filter_mastermap.py b/etl/filter_mastermap.py index 66108b78..d540938f 100644 --- a/etl/filter_mastermap.py +++ b/etl/filter_mastermap.py @@ -19,7 +19,8 @@ def main(mastermap_path): def filter_mastermap(mm_path): - output_path = "{}.filtered.csv".format(str(mm_path).replace(".gml.csv", "")) + output_path = str(mm_path).replace(".gml.csv", "") + output_path = "{}.filtered.csv".format(output_path) output_fieldnames = ('WKT', 'fid', 'descriptiveGroup') # Open the input csv with all polygons, buildings and others with open(mm_path, 'r') as fh: From 3f07a9b081a7febe064b7588cda303820f00f270 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Tue, 29 Mar 2022 14:37:42 +0100 Subject: [PATCH 72/89] remove print statement --- etl/filter_mastermap.py | 1 - 1 file changed, 1 deletion(-) diff --git a/etl/filter_mastermap.py b/etl/filter_mastermap.py index d540938f..847b0cf8 
100644 --- a/etl/filter_mastermap.py +++ b/etl/filter_mastermap.py @@ -14,7 +14,6 @@ csv.field_size_limit(sys.maxsize) def main(mastermap_path): mm_paths = sorted(glob.glob(os.path.join(mastermap_path, "*.gml.csv"))) for mm_path in mm_paths: - print(mm_path) filter_mastermap(mm_path) From 4a8d79a54a178e197fd37b5379da713fe121b037 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Tue, 29 Mar 2022 14:42:34 +0100 Subject: [PATCH 73/89] add echos to filtration step --- etl/filter_transform_mastermap_for_loading.sh | 19 +++++-------------- 1 file changed, 5 insertions(+), 14 deletions(-) diff --git a/etl/filter_transform_mastermap_for_loading.sh b/etl/filter_transform_mastermap_for_loading.sh index ac44efd1..84ee4718 100755 --- a/etl/filter_transform_mastermap_for_loading.sh +++ b/etl/filter_transform_mastermap_for_loading.sh @@ -1,22 +1,13 @@ #!/usr/bin/env bash -# -# Filter and transform for loading -# : ${1?"Usage: $0 ./path/to/mastermap/dir"} mastermap_dir=$1 -# -# Filter -# - WHERE descriptiveGroup = '(1:Building)' -# - OR toid in addressbase_toids -# +echo "Filter WHERE descriptiveGroup = '(1:Building)'... " python filter_mastermap.py $mastermap_dir -# -# Transform to 3857 (web mercator) -# +echo "Transform to 3857 (web mercator)..." find $mastermap_dir -type f -name '*.gml.csv' -printf "%f\n" | \ parallel \ ogr2ogr \ @@ -27,13 +18,13 @@ ogr2ogr \ $mastermap_dir/{} \ -lco GEOMETRY=AS_WKT -# -# Update to EWKT (with SRID indicator for loading to Postgres) -# +echo "Update to EWKT (with SRID indicator for loading to Postgres)..." +echo "Updating POLYGONs.." find $mastermap_dir -type f -name '*.3857.csv' -printf "%f\n" | \ parallel \ sed -i "'s/^\"POLYGON/\"SRID=3857;POLYGON/'" $mastermap_dir/{} +echo "Updating MULTIPOLYGONs.." find $mastermap_dir -type f -name '*.3857.csv' -printf "%f\n" | \ parallel \ sed -i "'s/^\"MULTIPOLYGON/\"SRID=3857;MULTIPOLYGON/'" $mastermap_dir/{} From 0e01971b4af4723ddefb788f80607aeb225e3a0e Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Tue, 29 Mar 2022 14:46:18 +0100 Subject: [PATCH 74/89] add echos to extraction step --- etl/extract_mastermap.sh | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/etl/extract_mastermap.sh b/etl/extract_mastermap.sh index f0863e0c..568c990d 100755 --- a/etl/extract_mastermap.sh +++ b/etl/extract_mastermap.sh @@ -1,31 +1,28 @@ #!/usr/bin/env bash -# -# Extract MasterMap -# - : ${1?"Usage: $0 ./path/to/mastermap/dir"} data_dir=$1 -# -# Extract buildings from *.gz to CSV -# + +echo "Extract buildings from *.gz..." + # Features where:: # descriptiveGroup = '(1:Building)' # # Use `fid` as source ID, aka TOID. -# find $data_dir -type f -name '*.gz' -printf "%f\n" | \ parallel \ gunzip $data_dir/{} -k -S gml +echo "Rename extracted files to .gml..." rename 's/$/.gml/' $data_dir/*[^gzvt] # Note: previously the rename cmd above resulted in some temp files being renamed to .gml # so I have specified the start of the filename (appears to be consistent for all OS MasterMap downloads) # we may need to update this below for other downloads +echo "Covert .gml files to .csv" find $data_dir -type f -name '*5690395*.gml' -printf "%f\n" | \ parallel \ ogr2ogr \ @@ -35,5 +32,6 @@ ogr2ogr \ TopographicArea \ -lco GEOMETRY=AS_WKT +echo "Remove .gfs and .gml files from previous steps..." 
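Boiled down, the filter fixed in PATCH 69 through 72 keeps only rows whose `descriptiveGroup` mentions "Building" and writes them next to the input as a `.filtered.csv`. A condensed, self-contained sketch of that logic (simplified from the committed `etl/filter_mastermap.py`, which also raises the CSV field size limit and globs over a whole directory):

```python
# Simplified sketch of the MasterMap building filter; not the committed module.
import csv


def filter_buildings(mm_path):
    """Write rows whose descriptiveGroup mentions 'Building' to <name>.filtered.csv."""
    output_path = "{}.filtered.csv".format(str(mm_path).replace(".gml.csv", ""))
    fieldnames = ("WKT", "fid", "descriptiveGroup")
    with open(mm_path, "r", newline="") as src, \
            open(output_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        for row in reader:
            if "Building" in (row.get("descriptiveGroup") or ""):
                writer.writerow({key: row[key] for key in fieldnames})
    return output_path
```

Against the two-building fixture added in PATCH 69, the output should contain a header plus two rows, which is what `tests/test_filter.py` now asserts.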
rm $data_dir/*.gfs rm $data_dir/*.gml From 2163dc58129f03b43fd7316368dfee2ecb68ff2f Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Tue, 29 Mar 2022 14:48:27 +0100 Subject: [PATCH 75/89] add echos to load geometries --- etl/load_geometries.sh | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/etl/load_geometries.sh b/etl/load_geometries.sh index 4a4d5745..a2febe8b 100755 --- a/etl/load_geometries.sh +++ b/etl/load_geometries.sh @@ -1,27 +1,25 @@ #!/usr/bin/env bash -# # Load geometries from GeoJSON to Postgres # - assume postgres connection details are set in the environment using PGUSER, PGHOST etc. -# + : ${1?"Usage: $0 ./path/to/mastermap/dir"} mastermap_dir=$1 -# # Create 'geometry' record with # id: , # source_id: , # geom: -# + +echo "Copy geometries to db..." find $mastermap_dir -type f -name '*.3857.csv' \ -printf "$mastermap_dir/%f\n" | \ parallel \ cat {} '|' psql -c "\"COPY geometries ( geometry_geom, source_id ) FROM stdin WITH CSV HEADER;\"" -# # Delete any duplicated geometries (by TOID) -# +echo "Delete duplicate geometries..." psql -c "DELETE FROM geometries a USING ( SELECT MIN(ctid) as ctid, source_id FROM geometries From 7bd78bf03a6bae580030c52ca92d7995047f4a59 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Tue, 29 Mar 2022 15:41:00 +0100 Subject: [PATCH 76/89] fix filtering script --- etl/filter_transform_mastermap_for_loading.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/etl/filter_transform_mastermap_for_loading.sh b/etl/filter_transform_mastermap_for_loading.sh index 84ee4718..95aab262 100755 --- a/etl/filter_transform_mastermap_for_loading.sh +++ b/etl/filter_transform_mastermap_for_loading.sh @@ -8,7 +8,7 @@ echo "Filter WHERE descriptiveGroup = '(1:Building)'... " python filter_mastermap.py $mastermap_dir echo "Transform to 3857 (web mercator)..." -find $mastermap_dir -type f -name '*.gml.csv' -printf "%f\n" | \ +find $mastermap_dir -type f -name '*.filtered.csv' -printf "%f\n" | \ parallel \ ogr2ogr \ -f CSV $mastermap_dir/{}.3857.csv \ From 8227320b9d10e364d9ccfcd6e45171b4d18eae27 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Tue, 29 Mar 2022 16:48:15 +0100 Subject: [PATCH 77/89] move python instructions specific to test data --- docs/setup-dev-environment.md | 48 +++++++++++++++-------------------- 1 file changed, 21 insertions(+), 27 deletions(-) diff --git a/docs/setup-dev-environment.md b/docs/setup-dev-environment.md index 52b4c541..b5626ebc 100644 --- a/docs/setup-dev-environment.md +++ b/docs/setup-dev-environment.md @@ -237,32 +237,6 @@ Install python and related tools. sudo apt-get install -y python3 python3-pip python3-dev python3-venv ``` -Create a virtual environment for python in the `etl` folder of your repository. In the following example we have name the virtual environment *colouringlondon* but it can have any name. - -```bash -cd ~/colouring-london/etl -pyvenv colouringlondon -``` - -Activate the virtual environment so we can install python packages into it. - -```bash -source colouringlondon/bin/activate -``` - -Install python pip package manager and related tools. - -```bash -pip install --upgrade pip -pip install --upgrade setuptools wheel -``` - -Install the required python packages. - -```bash -pip install -r requirements.txt -``` - ## :house: Loading the building data There are several ways to create the Colouring London database in your environment. 
The simplest way if you are just trying out the application would be to use test data from OSM, but otherwise you should follow one of the instructions below to create the full database either from scratch, or from a previously made db (via a dump file). @@ -306,12 +280,32 @@ This section shows how to load test buildings into the application from OpenStre #### Load OpenStreetMap test polygons -Ensure you have the `colouringlondon` environment activated. +Create a virtual environment for python in the `etl` folder of your repository. In the following example we have name the virtual environment *colouringlondon* but it can have any name. + +```bash +cd ~/colouring-london/etl +pyvenv colouringlondon +``` + +Activate the virtual environment so we can install python packages into it. ```bash source colouringlondon/bin/activate ``` +Install python pip package manager and related tools. + +```bash +pip install --upgrade pip +pip install --upgrade setuptools wheel +``` + +Install the required python packages. + +```bash +pip install -r requirements.txt +``` + To help test the Colouring London application, `get_test_polygons.py` will attempt to save a small (1.5km²) extract from OpenStreetMap to a format suitable for loading to the database. Download the test data. From ea76ad0288940c7688394715ff1f32ccfaff0d10 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Tue, 29 Mar 2022 16:50:31 +0100 Subject: [PATCH 78/89] clarify --- docs/setup-dev-environment.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/setup-dev-environment.md b/docs/setup-dev-environment.md index b5626ebc..b44972f3 100644 --- a/docs/setup-dev-environment.md +++ b/docs/setup-dev-environment.md @@ -158,7 +158,7 @@ Ensure the `en_US` locale exists. sudo locale-gen en_US.UTF-8 ``` -Configure the database to listen on network connection. +Configure postgres to listen on network connection. ```bash sudo sed -i "s/#\?listen_address.*/listen_addresses '*'/" /etc/postgresql/12/main/postgresql.conf From a8e6dc0b095ea4d6eecd51ecfe5c72ec0330bd67 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Tue, 29 Mar 2022 16:55:40 +0100 Subject: [PATCH 79/89] add new header on db creation --- docs/setup-dev-environment.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/docs/setup-dev-environment.md b/docs/setup-dev-environment.md index b44972f3..719fd26c 100644 --- a/docs/setup-dev-environment.md +++ b/docs/setup-dev-environment.md @@ -49,6 +49,7 @@ ssh @localhost -p 4022 - [:rainbow: Installing Colouring London](#rainbow-installing-colouring-london) - [:arrow_down: Installing Node.js](#arrow_down-installing-nodejs) - [:large_blue_circle: Configuring PostgreSQL](#large_blue_circle-configuring-postgresql) + - [:space_invader: Create an empty database](#space_invader_create_an_empty_database) - [:arrow_forward: Configuring Node.js](#arrow_forward-configuring-nodejs) - [:snake: Set up Python](#snake-set-up-python) - [:house: Loading the building data](#house-loading-the-building-data) @@ -190,6 +191,10 @@ If you intend to load the full CL database from a dump file into your dev enviro

+### :space_invader: Create an empty database + +Now create an empty database configured with geo-spatial tools. The database name (``) is arbitrary. + Set environment variables, which will simplify running subsequent `psql` commands. ```bash @@ -199,7 +204,7 @@ export PGHOST=localhost export PGDATABASE= ``` -Create a colouring london database if none exists. The name (``) is arbitrary. +Create the database. ```bash sudo -u postgres psql -c "SELECT 1 FROM pg_database WHERE datname = '';" | grep -q 1 || sudo -u postgres createdb -E UTF8 -T template0 --locale=en_US.utf8 -O From 57e655b48276b8adef7576863c211566706fb517 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Tue, 29 Mar 2022 16:56:58 +0100 Subject: [PATCH 80/89] fix link --- docs/setup-dev-environment.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/setup-dev-environment.md b/docs/setup-dev-environment.md index 719fd26c..d70679c3 100644 --- a/docs/setup-dev-environment.md +++ b/docs/setup-dev-environment.md @@ -49,7 +49,7 @@ ssh @localhost -p 4022 - [:rainbow: Installing Colouring London](#rainbow-installing-colouring-london) - [:arrow_down: Installing Node.js](#arrow_down-installing-nodejs) - [:large_blue_circle: Configuring PostgreSQL](#large_blue_circle-configuring-postgresql) - - [:space_invader: Create an empty database](#space_invader_create_an_empty_database) + - [:space_invader: Create an empty database](#space_invader-create-an-empty-database) - [:arrow_forward: Configuring Node.js](#arrow_forward-configuring-nodejs) - [:snake: Set up Python](#snake-set-up-python) - [:house: Loading the building data](#house-loading-the-building-data) From 5b2029a4e38182b2f6f7f9e020e51dd633b6b07f Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Thu, 31 Mar 2022 15:42:06 +0100 Subject: [PATCH 81/89] remove bad comments --- etl/README.md | 8 -------- 1 file changed, 8 deletions(-) diff --git a/etl/README.md b/etl/README.md index 595c5235..d1714321 100644 --- a/etl/README.md +++ b/etl/README.md @@ -62,18 +62,12 @@ Extract the MasterMap data (this step could take a while). sudo ./extract_mastermap.sh /path/to/mastermap_dir ``` - - - - Filter MasterMap 'building' polygons. ```bash sudo ./filter_transform_mastermap_for_loading.sh /path/to/mastermap_dir ``` - - Load all building outlines. Note: you should ensure that `mastermap_dir` has permissions that will allow the linux `find` command to work without using sudo. ```bash @@ -88,8 +82,6 @@ psql < ../migrations/002.index-geometries.up.sql - - From d6ca8852d4cd22ddee7bdb8e53b65759f2aeca9b Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Thu, 31 Mar 2022 16:27:36 +0100 Subject: [PATCH 82/89] create script to load new geometries --- etl/load_new_geometries.sh | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) create mode 100644 etl/load_new_geometries.sh diff --git a/etl/load_new_geometries.sh b/etl/load_new_geometries.sh new file mode 100644 index 00000000..0d4bfc0d --- /dev/null +++ b/etl/load_new_geometries.sh @@ -0,0 +1,20 @@ +#!/usr/bin/env bash + +# Load new geometries from GeoJSON to Postgres +# - assume postgres connection details are set in the environment using PGUSER, PGHOST etc. + +: ${1?"Usage: $0 ./path/to/mastermap/dir"} + +mastermap_dir=$1 + +# Create 'geometry' record with +# id: , +# source_id: , +# geom: + +echo "Copy new geometries to db..." 
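The `load_new_geometries.sh` script introduced here (and reverted in the commit that follows) tries to do the "only new TOIDs" filtering inside `COPY` itself. A more conventional way to get the same effect is to stage the rows first and filter with an ordinary `INSERT ... SELECT`. A sketch to run through `psql`, in which the staging table name, the column types and the CSV filename are all assumptions; only `geometries ( geometry_geom, source_id )` comes from the existing load script:

```sql
-- Stage the incoming CSV, then insert only source_ids not already present.
-- Column types and the file name are illustrative, not taken from the schema.
CREATE TEMPORARY TABLE geometries_staging (
    geometry_geom geometry,
    source_id varchar
);

\copy geometries_staging ( geometry_geom, source_id ) FROM 'new_tile.3857.csv' WITH CSV HEADER

INSERT INTO geometries ( geometry_geom, source_id )
SELECT s.geometry_geom, s.source_id
FROM geometries_staging s
WHERE NOT EXISTS (
    SELECT 1 FROM geometries g WHERE g.source_id = s.source_id
);
```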
+find $mastermap_dir -type f -name '*.3857.csv' \ +-printf "$mastermap_dir/%f\n" | \ +parallel \ +cat {} '|' psql -c "\"COPY geometries ( geometry_geom, source_id ) FROM stdin WITH CSV HEADER +WHERE source_id NOT IN geometries;\"" From a3d2537e22961dd967f554e278c95cc72d38e2e4 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Fri, 1 Apr 2022 10:05:11 +0100 Subject: [PATCH 83/89] Revert "create script to load new geometries" This reverts commit d6ca8852d4cd22ddee7bdb8e53b65759f2aeca9b. --- etl/load_new_geometries.sh | 20 -------------------- 1 file changed, 20 deletions(-) delete mode 100644 etl/load_new_geometries.sh diff --git a/etl/load_new_geometries.sh b/etl/load_new_geometries.sh deleted file mode 100644 index 0d4bfc0d..00000000 --- a/etl/load_new_geometries.sh +++ /dev/null @@ -1,20 +0,0 @@ -#!/usr/bin/env bash - -# Load new geometries from GeoJSON to Postgres -# - assume postgres connection details are set in the environment using PGUSER, PGHOST etc. - -: ${1?"Usage: $0 ./path/to/mastermap/dir"} - -mastermap_dir=$1 - -# Create 'geometry' record with -# id: , -# source_id: , -# geom: - -echo "Copy new geometries to db..." -find $mastermap_dir -type f -name '*.3857.csv' \ --printf "$mastermap_dir/%f\n" | \ -parallel \ -cat {} '|' psql -c "\"COPY geometries ( geometry_geom, source_id ) FROM stdin WITH CSV HEADER -WHERE source_id NOT IN geometries;\"" From b470591420588ef24f33d1d5260b58b8846fff03 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Fri, 1 Apr 2022 10:20:18 +0100 Subject: [PATCH 84/89] rearrange sections --- etl/README.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/etl/README.md b/etl/README.md index d1714321..ab7863a8 100644 --- a/etl/README.md +++ b/etl/README.md @@ -1,6 +1,4 @@ -# Creating a Colouring London database from scratch - -## Data downloading +# Downloading Ordnance Survey data The scripts in this directory are used to extract, transform and load (ETL) the core datasets for Colouring London: @@ -19,6 +17,12 @@ To get the required datasets, you'll need to complete the following steps: 4. You should be then able to check out your basket and download the files. Note: there may be multiple `.zip` files to download for MasterMap due to the size of the dataset. 6. Unzip the MasterMap `.zip` files and move all the `.gz` files from each to a single folder in a convenient location. We will use this folder in later steps. +# Make data available to Ubuntu + +Before creating or updating a Colouring London database, you'll need to make sure the downloaded OS files are available to the Ubuntu machine where the database is hosted. If you are using Virtualbox, you could host share folder(s) containing the OS files with the VM (e.g. [see these instructions for Mac](https://medium.com/macoclock/share-folder-between-macos-and-ubuntu-4ce84fb5c1ad)). + +# Creating a Colouring London database from scratch + ## Prerequisites You should already have set up PostgreSQL and created a database. Make sure to create environment variables to use `psql` if you haven't already: @@ -43,10 +47,6 @@ creation steps below. Install GNU parallel, this is used to speed up loading bulk data. -## Make data available to Ubuntu - -If you didn't download the OS files to the Ubuntu machine where you are setting up your Colouring London application, you will need to make them available there. If you are using Virtualbox, you could host share a the folder containing the MasteerMap files with the VM via a shared folder (e.g. 
[see these instructions for Mac](https://medium.com/macoclock/share-folder-between-macos-and-ubuntu-4ce84fb5c1ad)). - ## Process and load Ordnance Survey data Move into the `etl` directory and set execute permission on all scripts. From 647d75abb4dc7f38c2e825fbda330a84dc128958 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Fri, 1 Apr 2022 10:21:58 +0100 Subject: [PATCH 85/89] update gitignore --- .gitignore | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/.gitignore b/.gitignore index 56060de3..b7ecfbd3 100644 --- a/.gitignore +++ b/.gitignore @@ -18,6 +18,11 @@ etl/**/*.txt etl/**/*.xls etl/**/*.xlsx etl/**/*.zip +etl/**/*.gml +etl/**/*.gz +etl/**/5690395* +postgresdata +*/__pycache__/* .DS_Store From aba46dc95d25e481857df3ba86393549394f9310 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Fri, 1 Apr 2022 10:32:53 +0100 Subject: [PATCH 86/89] add contents --- etl/README.md | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/etl/README.md b/etl/README.md index ab7863a8..28dbd223 100644 --- a/etl/README.md +++ b/etl/README.md @@ -1,4 +1,11 @@ -# Downloading Ordnance Survey data +# Contents + +- :arrow_down: [Downloading Ordnance Survey data](#arrow_down-downloading-ordnance-survey-data) +- :penguin: [Making data available to Ubuntu](#penguin-making-data-available-to-ubuntu) +- :new_moon: [Creating a Colouring London database from scratch](#new_moon-creating-a-colouring-london-database from-scratch) +- :full_moon: [Updating the Colouring London database with new OS data](#full_moon-updating-the-colouring-london-database-with-new-os-data) + +# :arrow_down: Downloading Ordnance Survey data The scripts in this directory are used to extract, transform and load (ETL) the core datasets for Colouring London: @@ -17,11 +24,11 @@ To get the required datasets, you'll need to complete the following steps: 4. You should be then able to check out your basket and download the files. Note: there may be multiple `.zip` files to download for MasterMap due to the size of the dataset. 6. Unzip the MasterMap `.zip` files and move all the `.gz` files from each to a single folder in a convenient location. We will use this folder in later steps. -# Make data available to Ubuntu +# :penguin: Making data available to Ubuntu Before creating or updating a Colouring London database, you'll need to make sure the downloaded OS files are available to the Ubuntu machine where the database is hosted. If you are using Virtualbox, you could host share folder(s) containing the OS files with the VM (e.g. [see these instructions for Mac](https://medium.com/macoclock/share-folder-between-macos-and-ubuntu-4ce84fb5c1ad)). -# Creating a Colouring London database from scratch +# :new_moon: Creating a Colouring London database from scratch ## Prerequisites @@ -47,7 +54,7 @@ creation steps below. Install GNU parallel, this is used to speed up loading bulk data. -## Process and load Ordnance Survey data +## Processing and loading Ordnance Survey data Move into the `etl` directory and set execute permission on all scripts. 
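The "Making data available to Ubuntu" section above leaves the actual sharing mechanism to the linked guide. For a VirtualBox setup, the mount typically ends up looking something like the sketch below; the share name, the mount point and the presence of Guest Additions are all assumptions about your VM configuration rather than anything this repository prescribes.

```bash
# Assumes VirtualBox Guest Additions are installed in the VM and that a shared
# folder named "osdata" was added in the VM settings; names are illustrative.
sudo mkdir -p /mnt/osdata
sudo mount -t vboxsf -o uid=$(id -u),gid=$(id -g) osdata /mnt/osdata
ls /mnt/osdata    # the downloaded MasterMap *.gz files should appear here
```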
@@ -98,6 +105,6 @@ Run the remaining migrations in `../migrations` to create the rest of the databa ls ~/colouring-london/migrations/*.up.sql 2>/dev/null | while read -r migration; do psql < $migration; done; ``` -# [WIP] Updating the Colouring London database with new OS data +# :full_moon: Updating the Colouring London database with new OS data TODO: this section should instruct how to update and existing db \ No newline at end of file From efc49bd2b98503e900ae14d07de883b5ce24b1db Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Fri, 1 Apr 2022 10:37:56 +0100 Subject: [PATCH 87/89] fix link --- etl/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/etl/README.md b/etl/README.md index 28dbd223..2bdf4494 100644 --- a/etl/README.md +++ b/etl/README.md @@ -2,7 +2,7 @@ - :arrow_down: [Downloading Ordnance Survey data](#arrow_down-downloading-ordnance-survey-data) - :penguin: [Making data available to Ubuntu](#penguin-making-data-available-to-ubuntu) -- :new_moon: [Creating a Colouring London database from scratch](#new_moon-creating-a-colouring-london-database from-scratch) +- :new_moon: [Creating a Colouring London database from scratch](#new_moon-creating-a-colouring-london-database-from-scratch) - :full_moon: [Updating the Colouring London database with new OS data](#full_moon-updating-the-colouring-london-database-with-new-os-data) # :arrow_down: Downloading Ordnance Survey data From 667110537f7d9c9a46a6026a2734f508207a9b74 Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Fri, 1 Apr 2022 10:50:34 +0100 Subject: [PATCH 88/89] clarify --- etl/README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/etl/README.md b/etl/README.md index 2bdf4494..428a9682 100644 --- a/etl/README.md +++ b/etl/README.md @@ -32,7 +32,7 @@ Before creating or updating a Colouring London database, you'll need to make sur ## Prerequisites -You should already have set up PostgreSQL and created a database. Make sure to create environment variables to use `psql` if you haven't already: +You should already have set up PostgreSQL and created a database in an Ubuntu environment. Make sure to create environment variables to use `psql` if you haven't already: ```bash export PGPASSWORD= @@ -52,7 +52,7 @@ There is some performance benefit to creating indexes after bulk loading data. Otherwise, it's fine to run all the migrations at this point and skip the index creation steps below. -Install GNU parallel, this is used to speed up loading bulk data. +You should already have installed GNU parallel, which is used to speed up loading bulk data. ## Processing and loading Ordnance Survey data From a635655995ab63c2263d5f94061467fdde5beb3b Mon Sep 17 00:00:00 2001 From: Ed Chalstrey Date: Fri, 1 Apr 2022 11:23:24 +0100 Subject: [PATCH 89/89] explain "ETL" --- etl/README.md | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/etl/README.md b/etl/README.md index 428a9682..316d9af2 100644 --- a/etl/README.md +++ b/etl/README.md @@ -1,3 +1,7 @@ +# Extract, transform and load + +The scripts in this directory are used to extract, transform and load (ETL) the core datasets for Colouring London. This README acts as a guide for setting up the Colouring London database with these datasets and updating it. 
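Every extract and load script in this series leans on the same `find ... -printf "%f\n" | parallel ...` idiom, which is why GNU parallel sits in the prerequisites. When adapting the scripts to a new download it can help to let parallel print the per-file commands instead of running them; a sketch based on the gunzip step of `extract_mastermap.sh`, with `mastermap_dir` assumed:

```bash
# Show, without executing, the per-file commands that the extract step would fan out.
mastermap_dir=./mastermap_dir    # assumption: folder holding the downloaded *.gz files

find $mastermap_dir -type f -name '*.gz' -printf "%f\n" | \
parallel --dry-run \
gunzip $mastermap_dir/{} -k -S gml
```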
+ # Contents - :arrow_down: [Downloading Ordnance Survey data](#arrow_down-downloading-ordnance-survey-data) @@ -7,12 +11,7 @@ # :arrow_down: Downloading Ordnance Survey data -The scripts in this directory are used to extract, transform and load (ETL) the core datasets -for Colouring London: - -Building geometries, sourced from Ordnance Survey (OS) MasterMap (Topography Layer) - -To get the required datasets, you'll need to complete the following steps: +The building geometries are sourced from Ordnance Survey (OS) MasterMap (Topography Layer). To get the required datasets, you'll need to complete the following steps: 1. Sign up for the Ordnance Survey [Data Exploration License](https://www.ordnancesurvey.co.uk/business-government/licensing-agreements/data-exploration-sign-up). You should receive an e-mail with a link to log in to the platform (this could take up to a week). 2. Navigate to https://orders.ordnancesurvey.co.uk/orders and click the button for: ✏️ Order. From here you should be able to click another button to add a product.
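Once the OS data has been downloaded and shared with the Ubuntu machine, the from-scratch route that this README builds up reduces to a short sequence of commands. The outline below simply strings together the steps documented above (the boundary-trimming step is still commented out in the README, so it is omitted here); the paths are placeholders.

```bash
# Condensed outline of the from-scratch load described in this README; paths are placeholders.
cd ~/colouring-london/etl
chmod +x *.sh

sudo ./extract_mastermap.sh /path/to/mastermap_dir
sudo ./filter_transform_mastermap_for_loading.sh /path/to/mastermap_dir
./load_geometries.sh /path/to/mastermap_dir
psql < ../migrations/002.index-geometries.up.sql
./create_building_records.sh

# Apply the remaining migrations to finish the database structure.
ls ../migrations/*.up.sql 2>/dev/null | while read -r migration; do psql < $migration; done;
```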