Merge pull request #798 from colouring-cities/os-data-loading
Document & test Ordnance Survey data loading
This commit is contained in:
commit a4771eaac0

.github/workflows/etl.yml (new file, +25 lines)
@@ -0,0 +1,25 @@
+name: etl
+
+on: [pull_request]
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    steps:
+    - uses: actions/checkout@v2
+    - uses: actions/setup-python@v2
+      with:
+        python-version: '3.7'
+    - name: Install dependencies
+      run: |
+        sudo apt-get install libgeos-dev
+        python -m pip install --upgrade pip
+        python -m pip install pytest
+        python -m pip install flake8
+        python -m pip install -r etl/requirements.txt
+    - name: Run Flake8
+      run: |
+        ls etl/*py | grep -v 'join_building_data' | xargs flake8 --exclude etl/__init__.py
+    - name: Run tests
+      run: |
+        python -m pytest
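
The same checks can be reproduced locally before opening a pull request; a sketch, assuming it is run from the repository root with the same Python available:

```bash
# install the same dependencies the workflow uses
sudo apt-get install libgeos-dev
python -m pip install --upgrade pip
python -m pip install pytest flake8
python -m pip install -r etl/requirements.txt

# lint the etl scripts, skipping join_building_data and the package __init__
ls etl/*py | grep -v 'join_building_data' | xargs flake8 --exclude etl/__init__.py

# run the test suite
python -m pytest
```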

.gitignore (5 lines changed)
@@ -18,6 +18,11 @@ etl/**/*.txt
 etl/**/*.xls
 etl/**/*.xlsx
 etl/**/*.zip
+etl/**/*.gml
+etl/**/*.gz
+etl/**/5690395*
+postgresdata
+*/__pycache__/*

 .DS_Store

@@ -49,7 +49,9 @@ ssh <linuxusername>@localhost -p 4022
 - [:rainbow: Installing Colouring London](#rainbow-installing-colouring-london)
 - [:arrow_down: Installing Node.js](#arrow_down-installing-nodejs)
 - [:large_blue_circle: Configuring PostgreSQL](#large_blue_circle-configuring-postgresql)
+- [:space_invader: Create an empty database](#space_invader-create-an-empty-database)
 - [:arrow_forward: Configuring Node.js](#arrow_forward-configuring-nodejs)
+- [:snake: Set up Python](#snake-set-up-python)
 - [:house: Loading the building data](#house-loading-the-building-data)
 - [:computer: Running the application](#computer-running-the-application)
 - [:eyes: Viewing the application](#eyes-viewing-the-application)
@@ -66,7 +68,7 @@ sudo apt-get upgrade -y
 Now install some essential tools.

 ```bash
-sudo apt-get install -y build-essential git wget curl
+sudo apt-get install -y build-essential git wget curl parallel rename
 ```

 ### :red_circle: Installing PostgreSQL
@@ -157,7 +159,7 @@ Ensure the `en_US` locale exists.
 sudo locale-gen en_US.UTF-8
 ```

-Configure the database to listen on network connection.
+Configure postgres to listen on network connection.

 ```bash
 sudo sed -i "s/#\?listen_address.*/listen_addresses '*'/" /etc/postgresql/12/main/postgresql.conf
@@ -189,6 +191,10 @@ If you intend to load the full CL database from a dump file into your dev environment
 </details><p></p>

+### :space_invader: Create an empty database
+
+Now create an empty database configured with geo-spatial tools. The database name (`<colouringlondondb>`) is arbitrary.
+
 Set environment variables, which will simplify running subsequent `psql` commands.

 ```bash
@@ -198,7 +204,7 @@ export PGHOST=localhost
 export PGDATABASE=<colouringlondondb>
 ```

-Create a colouring london database if none exists. The name (`<colouringlondondb>`) is arbitrary.
+Create the database.

 ```bash
 sudo -u postgres psql -c "SELECT 1 FROM pg_database WHERE datname = '<colouringlondondb>';" | grep -q 1 || sudo -u postgres createdb -E UTF8 -T template0 --locale=en_US.utf8 -O <username> <colouringlondondb>
@@ -228,10 +234,22 @@ cd ~/colouring-london/app
 npm install
 ```

+### :snake: Set up Python
+
+Install python and related tools.
+
+```bash
+sudo apt-get install -y python3 python3-pip python3-dev python3-venv
+```
+
 ## :house: Loading the building data

+There are several ways to create the Colouring London database in your environment. The simplest way, if you are just trying out the application, is to use test data from OSM; otherwise you should follow one of the sets of instructions below to create the full database, either from scratch or from a previously made db (via a dump file).
+
+To create the full database from scratch, follow [these instructions](../etl/README.md), otherwise choose one of the following:
+
 <details>
-<summary> With a database dump </summary><p></p>
+<summary> Create database from dump </summary><p></p>

 If you are a developer on the Colouring London project (or another Colouring Cities project), you may have a production database (or staging etc) that you wish to duplicate in your development environment.
@@ -261,22 +279,16 @@ ls ~/colouring-london/migrations/*.up.sql 2>/dev/null | while read -r migration;
 </details>

 <details>
-<summary> With test data </summary><p></p>
+<summary> Create database with test data </summary><p></p>

 This section shows how to load test buildings into the application from OpenStreetMap (OSM).

-#### Set up Python
+#### Load OpenStreetMap test polygons

-Install python and related tools.
+Create a virtual environment for python in the `etl` folder of your repository. In the following example we have named the virtual environment *colouringlondon* but it can have any name.

-```bash
-sudo apt-get install -y python3 python3-pip python3-dev python3-venv
-```
-
-Now set up a virtual environment for python. In the following example we have named the
-virtual environment *colouringlondon* but it can have any name.
-
 ```bash
+cd ~/colouring-london/etl
 pyvenv colouringlondon
 ```
|
|||||||
pip install --upgrade setuptools wheel
|
pip install --upgrade setuptools wheel
|
||||||
```
|
```
|
||||||
|
|
||||||
#### Load OpenStreetMap test polygons
|
Install the required python packages.
|
||||||
|
|
||||||
First install prerequisites.
|
|
||||||
```bash
|
|
||||||
sudo apt-get install -y parallel
|
|
||||||
```
|
|
||||||
|
|
||||||
Install the required python packages. This relies on the `requirements.txt` file located
|
|
||||||
in the `etl` folder of your local repository.
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
cd ~/colouring-london/etl/
|
|
||||||
pip install -r requirements.txt
|
pip install -r requirements.txt
|
||||||
```
|
```
|
||||||
|
|
||||||
|

etl/README.md (142 lines changed)
@@ -1,91 +1,109 @@
-# Data loading
+# Extract, transform and load

-The scripts in this directory are used to extract, transform and load (ETL) the core datasets
-for Colouring London:
+The scripts in this directory are used to extract, transform and load (ETL) the core datasets for Colouring London. This README acts as a guide for setting up the Colouring London database with these datasets and updating it.

-1. Building geometries, sourced from Ordnance Survey MasterMap (Topography Layer)
-1. Unique Property Reference Numbers (UPRNs), sourced from Ordnance Survey AddressBase
+# Contents
+
+- :arrow_down: [Downloading Ordnance Survey data](#arrow_down-downloading-ordnance-survey-data)
+- :penguin: [Making data available to Ubuntu](#penguin-making-data-available-to-ubuntu)
+- :new_moon: [Creating a Colouring London database from scratch](#new_moon-creating-a-colouring-london-database-from-scratch)
+- :full_moon: [Updating the Colouring London database with new OS data](#full_moon-updating-the-colouring-london-database-with-new-os-data)
+
+# :arrow_down: Downloading Ordnance Survey data
+
+The building geometries are sourced from Ordnance Survey (OS) MasterMap (Topography Layer). To get the required datasets, you'll need to complete the following steps:
+
+1. Sign up for the Ordnance Survey [Data Exploration License](https://www.ordnancesurvey.co.uk/business-government/licensing-agreements/data-exploration-sign-up). You should receive an e-mail with a link to log in to the platform (this could take up to a week).
+2. Navigate to https://orders.ordnancesurvey.co.uk/orders and click the button for: ✏️ Order. From here you should be able to click another button to add a product.
+3. Drop a rectangle or Polygon over London and make the following selections, clicking the "Add to basket" button for each:
+
+![](screenshot/MasterMap.png)
+<p></p>
+
+4. You should then be able to check out your basket and download the files. Note: there may be multiple `.zip` files to download for MasterMap due to the size of the dataset.
+5. Unzip the MasterMap `.zip` files and move all the `.gz` files from each to a single folder in a convenient location. We will use this folder in later steps.
+
+# :penguin: Making data available to Ubuntu
+
+Before creating or updating a Colouring London database, you'll need to make sure the downloaded OS files are available to the Ubuntu machine where the database is hosted. If you are using Virtualbox, you could share folder(s) containing the OS files with the VM (e.g. [see these instructions for Mac](https://medium.com/macoclock/share-folder-between-macos-and-ubuntu-4ce84fb5c1ad)).
+
+# :new_moon: Creating a Colouring London database from scratch
+
 ## Prerequisites

-Install PostgreSQL and create a database for colouringlondon, with a database
-user that can connect to it. The [PostgreSQL
-documentation](https://www.postgresql.org/docs/12/tutorial-start.html) covers
-installation and getting started.
-
-Install the [PostGIS extension](https://postgis.net/).
-
-Connect to the colouringlondon database and add the PostGIS, pgcrypto and
-pg_trgm extensions:
-
-```sql
-create extension postgis;
-create extension pgcrypto;
-create extension pg_trgm;
+You should already have set up PostgreSQL and created a database in an Ubuntu environment. Make sure to create environment variables to use `psql` if you haven't already:
+
+```bash
+export PGPASSWORD=<pgpassword>
+export PGUSER=<username>
+export PGHOST=localhost
+export PGDATABASE=<colouringlondondb>
 ```

 Create the core database tables:

 ```bash
-psql < ../migrations/001.core.up.sql
+cd ~/colouring-london
+psql < migrations/001.core.up.sql
 ```

 There is some performance benefit to creating indexes after bulk loading data.
 Otherwise, it's fine to run all the migrations at this point and skip the index
 creation steps below.

-Install GNU parallel, this is used to speed up loading bulk data.
+You should already have installed GNU parallel, which is used to speed up loading bulk data.

-## Process and load Ordance Survey data
+## Processing and loading Ordnance Survey data

-Before running any of these scripts, you will need the OS data for your area of
-interest. AddressBase and MasterMap are available directly from [Ordnance
-Survey](https://www.ordnancesurvey.co.uk/). The alternative setup below uses
-OpenStreetMap.
-
-The scripts should be run in the following order:
+Move into the `etl` directory and set execute permission on all scripts.

 ```bash
-# extract both datasets
-extract_addressbase.sh ./addressbase_dir
-extract_mastermap.sh ./mastermap_dir
-# filter mastermap ('building' polygons and any others referenced by addressbase)
-filter_transform_mastermap_for_loading.sh ./addressbase_dir ./mastermap_dir
-# load all building outlines
-load_geometries.sh ./mastermap_dir
-# index geometries (should be faster after loading)
-psql < ../migrations/002.index-geometries.sql
-# create a building record per outline
-create_building_records.sh
-# add UPRNs where they match
-load_uprns.py ./addressbase_dir
-# index building records
-psql < ../migrations/003.index-buildings.sql
+cd ~/colouring-london/etl
+chmod +x *.sh
 ```

-## Alternative, using OpenStreetMap
-
-This uses the [osmnx](https://github.com/gboeing/osmnx) python package to get OpenStreetMap data. You will need python and osmnx to run `get_test_polygons.py`.
-
-To help test the Colouring London application, `get_test_polygons.py` will attempt to save a
-small (1.5km²) extract from OpenStreetMap to a format suitable for loading to the database.
-
-In this case, run:
+Extract the MasterMap data (this step could take a while).
+
+```bash
+sudo ./extract_mastermap.sh /path/to/mastermap_dir
+```
+
+Filter MasterMap 'building' polygons.
+
+```bash
+sudo ./filter_transform_mastermap_for_loading.sh /path/to/mastermap_dir
+```
+
+Load all building outlines. Note: you should ensure that `mastermap_dir` has permissions that will allow the linux `find` command to work without using sudo.
+
+```bash
+./load_geometries.sh /path/to/mastermap_dir
+```
+
+Index geometries.

 ```bash
-# download test data
-python get_test_polygons.py
-# load all building outlines
-./load_geometries.sh ./
-# index geometries (should be faster after loading)
 psql < ../migrations/002.index-geometries.up.sql
-# create a building record per outline
-./create_building_records.sh
-# index building records
-psql < ../migrations/003.index-buildings.up.sql
 ```

-## Finally
+<!-- TODO: Drop outside limit. -->

-Run the remaining migrations in `../migrations` to create the rest of the database structure.
+<!-- ```bash
+./drop_outside_limit.sh /path/to/boundary_file
+``` -->
+
+Create a building record per outline.
+
+```bash
+./create_building_records.sh
+```
+
+Run the remaining migrations in `../migrations` to create the rest of the database structure.
+
+```bash
+ls ~/colouring-london/migrations/*.up.sql 2>/dev/null | while read -r migration; do psql < $migration; done;
+```
+
+# :full_moon: Updating the Colouring London database with new OS data
+
+TODO: this section should instruct how to update an existing db
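
For the ":penguin: Making data available to Ubuntu" step above, a VirtualBox shared folder can be mounted from inside the guest; a sketch, where the share name `os_data` and the mount point are assumptions:

```bash
# requires VirtualBox Guest Additions in the VM; "os_data" is a hypothetical share name
sudo mkdir -p /mnt/os_data
sudo mount -t vboxsf -o uid=$(id -u),gid=$(id -g) os_data /mnt/os_data
```

Similarly, for the note about `mastermap_dir` permissions before `load_geometries.sh`, one way to let `find` traverse the folder without sudo is:

```bash
# make the MasterMap folder readable and its directories searchable for all users
sudo chmod -R a+rX /path/to/mastermap_dir
```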

etl/__init__.py (new file, +1 line)
@@ -0,0 +1 @@
+from .filter_mastermap import filter_mastermap

etl/check_ab_mm_match.py (deleted)
@@ -1,60 +0,0 @@
-"""Check if AddressBase TOIDs will match MasterMap
-"""
-import csv
-import glob
-import os
-import sys
-
-from multiprocessing import Pool
-
-csv.field_size_limit(sys.maxsize)
-
-
-def main(ab_path, mm_path):
-    ab_paths = sorted(glob.glob(os.path.join(ab_path, "*.gml.csv.filtered.csv")))
-    mm_paths = sorted(glob.glob(os.path.join(mm_path, "*.gml.csv")))
-
-    try:
-        assert len(ab_paths) == len(mm_paths)
-    except AssertionError:
-        print(ab_paths)
-        print(mm_paths)
-
-    zipped_paths = zip(ab_paths, mm_paths)
-
-    # parallel map over tiles
-    with Pool() as p:
-        p.starmap(check, zipped_paths)
-
-
-def check(ab_path, mm_path):
-    tile = str(os.path.basename(ab_path)).split(".")[0]
-    output_base = os.path.dirname(ab_path)
-    ab_toids = set()
-    mm_toids = set()
-
-    with open(ab_path, 'r') as fh:
-        r = csv.DictReader(fh)
-        for line in r:
-            ab_toids.add(line['toid'])
-
-    with open(mm_path, 'r') as fh:
-        r = csv.DictReader(fh)
-        for line in r:
-            mm_toids.add(line['fid'])
-
-    missing = ab_toids - mm_toids
-    print(tile, "MasterMap:", len(mm_toids), "Addressbase:", len(ab_toids), "AB but not MM:", len(missing))
-
-    with open(os.path.join(output_base, 'missing_toids_{}.txt'.format(tile)), 'w') as fh:
-        for toid in missing:
-            fh.write("{}\n".format(toid))
-
-    with open(os.path.join(output_base, 'ab_toids_{}.txt'.format(tile)), 'w') as fh:
-        for toid in ab_toids:
-            fh.write("{}\n".format(toid))
-
-
-if __name__ == '__main__':
-    if len(sys.argv) != 3:
-        print("Usage: check_ab_mm_match.py ./path/to/addressbase/dir ./path/to/mastermap/dir")
-        exit(-1)
-    main(sys.argv[1], sys.argv[2])

etl/extract_addressbase.sh (deleted)
@@ -1,63 +0,0 @@
-#!/usr/bin/env bash
-
-#
-# Extract address points from OS Addressbase GML
-# - as supplied in 5km tiles, zip/gz archives
-#
-: ${1?"Usage: $0 ./path/to/data/dir"}
-
-data_dir=$1
-
-#
-# Unzip to GML
-#
-
-find $data_dir -type f -name '*.zip' -printf "%f\n" | \
-parallel \
-unzip -u $data_dir/{} -d $data_dir
-
-#
-# Extract to CSV
-#
-# Relevant fields:
-#   WKT
-#   crossReference (list of TOID/other references)
-#   source (list of cross-reference sources: 7666MT refers to MasterMap Topo)
-#   uprn
-#   parentUPRN
-#   logicalStatus: 1 (one) is approved (otherwise historical, provisional)
-#
-
-find $data_dir -type f -name '*.gml' -printf "%f\n" | \
-parallel \
-ogr2ogr -f CSV \
--select crossReference,source,uprn,parentUPRN,logicalStatus \
-$data_dir/{}.csv $data_dir/{} BasicLandPropertyUnit \
--lco GEOMETRY=AS_WKT
-
-#
-# Filter
-#
-find $data_dir -type f -name '*.gml.csv' -printf "%f\n" | \
-parallel \
-python filter_addressbase_csv.py $data_dir/{}
-
-
-#
-# Transform to 3857 (web mercator)
-#
-find $data_dir -type f -name '*.filtered.csv' -printf "%f\n" | \
-parallel \
-ogr2ogr \
--f CSV $data_dir/{}.3857.csv \
--s_srs "EPSG:4326" \
--t_srs "EPSG:3857" \
-$data_dir/{} \
--lco GEOMETRY=AS_WKT
-
-#
-# Update to EWKT (with SRID indicator for loading to Postgres)
-#
-find $data_dir -type f -name '*.3857.csv' -printf "%f\n" | \
-parallel \
-cat $data_dir/{} "|" sed "'s/^\"POINT/\"SRID=3857;POINT/'" "|" cut -f 1,3,4,5 -d "','" ">" $data_dir/{}.loadable

etl/extract_mastermap.sh
@@ -1,29 +1,29 @@
 #!/usr/bin/env bash

-#
-# Extract MasterMap
-#
 : ${1?"Usage: $0 ./path/to/mastermap/dir"}

 data_dir=$1

-#
-# Extract buildings from *.gz to CSV
-#
+echo "Extract buildings from *.gz..."
 # Features where:
 #   descriptiveGroup = '(1:Building)'
 #
 # Use `fid` as source ID, aka TOID.
-#

 find $data_dir -type f -name '*.gz' -printf "%f\n" | \
 parallel \
 gunzip $data_dir/{} -k -S gml

+echo "Rename extracted files to .gml..."
 rename 's/$/.gml/' $data_dir/*[^gzvt]

-find $data_dir -type f -name '*.gml' -printf "%f\n" | \
+# Note: previously the rename cmd above resulted in some temp files being renamed to .gml
+# so I have specified the start of the filename (appears to be consistent for all OS MasterMap downloads)
+# we may need to update this below for other downloads
+echo "Convert .gml files to .csv"
+find $data_dir -type f -name '*5690395*.gml' -printf "%f\n" | \
 parallel \
 ogr2ogr \
 -select fid,descriptiveGroup \
@@ -32,5 +32,6 @@ ogr2ogr \
 TopographicArea \
 -lco GEOMETRY=AS_WKT

+echo "Remove .gfs and .gml files from previous steps..."
 rm $data_dir/*.gfs
 rm $data_dir/*.gml

etl/filter_addressbase_csv.py (deleted)
@@ -1,42 +0,0 @@
-#!/usr/bin/env python
-"""Read ogr2ogr-converted CSV, filter to get OSMM TOID reference, only active addresses
-"""
-import csv
-import json
-import sys
-
-
-def main(input_path):
-    output_path = "{}.filtered.csv".format(input_path)
-    fieldnames = (
-        'wkt', 'toid', 'uprn', 'parent_uprn'
-    )
-    with open(input_path) as input_fh:
-        with open(output_path, 'w') as output_fh:
-            w = csv.DictWriter(output_fh, fieldnames=fieldnames)
-            w.writeheader()
-            r = csv.DictReader(input_fh)
-            for line in r:
-                if line['logicalStatus'] != "1":
-                    continue
-
-                refs = json.loads(line['crossReference'])
-                sources = json.loads(line['source'])
-                toid = ""
-                for ref, source in zip(refs, sources):
-                    if source == "7666MT":
-                        toid = ref
-
-                w.writerow({
-                    'uprn': line['uprn'],
-                    'parent_uprn': line['parentUPRN'],
-                    'toid': toid,
-                    'wkt': line['WKT'],
-                })
-
-
-if __name__ == '__main__':
-    if len(sys.argv) != 2:
-        print("Usage: filter_addressbase_csv.py ./path/to/data.csv")
-        exit(-1)
-    main(sys.argv[1])

etl/filter_mastermap.py
@@ -1,60 +1,44 @@
-"""Filter MasterMap to buildings and addressbase-matches
+"""Filter MasterMap to buildings

 - WHERE descriptiveGroup includes 'Building'
-- OR toid in addressbase_toids
 """
 import csv
 import glob
-import json
 import os
 import sys

-from multiprocessing import Pool
-
 csv.field_size_limit(sys.maxsize)


-def main(ab_path, mm_path):
-    mm_paths = sorted(glob.glob(os.path.join(mm_path, "*.gml.csv")))
-    toid_paths = sorted(glob.glob(os.path.join(ab_path, "ab_toids_*.txt")))
-
-    try:
-        assert len(mm_paths) == len(toid_paths)
-    except AssertionError:
-        print(mm_paths)
-        print(toid_paths)
-
-    zipped_paths = zip(mm_paths, toid_paths)
-
-    # parallel map over tiles
-    with Pool() as p:
-        p.starmap(filter, zipped_paths)
+def main(mastermap_path):
+    mm_paths = sorted(glob.glob(os.path.join(mastermap_path, "*.gml.csv")))
+    for mm_path in mm_paths:
+        filter_mastermap(mm_path)


-def filter(mm_path, toid_path):
-    with open(toid_path, 'r') as fh:
-        r = csv.reader(fh)
-        toids = set(line[0] for line in r)
-
-    output_path = "{}.filtered.csv".format(str(mm_path).replace(".gml.csv", ""))
-    alt_output_path = "{}.filtered_not_building.csv".format(str(mm_path).replace(".gml.csv", ""))
+def filter_mastermap(mm_path):
+    output_path = str(mm_path).replace(".gml.csv", "")
+    output_path = "{}.filtered.csv".format(output_path)
     output_fieldnames = ('WKT', 'fid', 'descriptiveGroup')
+    # Open the input csv with all polygons, buildings and others
     with open(mm_path, 'r') as fh:
         r = csv.DictReader(fh)
+        # Open a new output csv that will contain just buildings
         with open(output_path, 'w') as output_fh:
             w = csv.DictWriter(output_fh, fieldnames=output_fieldnames)
             w.writeheader()
-            with open(alt_output_path, 'w') as alt_output_fh:
-                alt_w = csv.DictWriter(alt_output_fh, fieldnames=output_fieldnames)
-                alt_w.writeheader()
-                for line in r:
-                    if 'Building' in line['descriptiveGroup']:
-                        w.writerow(line)
-                    elif line['fid'] in toids:
-                        alt_w.writerow(line)
+            for line in r:
+                try:
+                    if 'Building' in line['descriptiveGroup']:
+                        w.writerow(line)
+                # when descriptiveGroup is missing, ignore this Polygon
+                except TypeError:
+                    pass


 if __name__ == '__main__':
-    if len(sys.argv) != 3:
-        print("Usage: filter_mastermap.py ./path/to/addressbase/dir ./path/to/mastermap/dir")
+    if len(sys.argv) != 2:
+        print("Usage: filter_mastermap.py ./path/to/mastermap/dir")
         exit(-1)
-    main(sys.argv[1], sys.argv[2])
+    main(sys.argv[1])
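
The effect of the new `filter_mastermap` (including the `TypeError` branch for rows with no `descriptiveGroup` field) can be checked by hand on a throwaway CSV; a sketch, run from the repository root so the `etl` package is importable, with made-up TOIDs:

```bash
# a tiny MasterMap-style CSV: one Building row and one short row with no descriptiveGroup
cat > /tmp/demo.gml.csv <<'EOF'
WKT,fid,descriptiveGroup
"POLYGON ((0 0,1 0,1 1,0 0))",osgb0000000000000001,"[ ""Building"" ]"
"POLYGON ((2 0,3 0,3 1,2 0))",osgb0000000000000002
EOF

python -c "from etl import filter_mastermap; filter_mastermap('/tmp/demo.gml.csv')"
cat /tmp/demo.filtered.csv  # header plus the one Building row; the short row is skipped
```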

etl/filter_transform_mastermap_for_loading.sh
@@ -1,29 +1,13 @@
 #!/usr/bin/env bash

-#
-# Filter and transform for loading
-#
-: ${1?"Usage: $0 ./path/to/addressbase/dir ./path/to/mastermap/dir"}
-: ${2?"Usage: $0 ./path/to/addressbase/dir ./path/to/mastermap/dir"}
+: ${1?"Usage: $0 ./path/to/mastermap/dir"}

-addressbase_dir=$1
-mastermap_dir=$2
+mastermap_dir=$1

-#
-# Check which TOIDs are matched against UPRNs
-#
-python check_ab_mm_match.py $addressbase_dir $mastermap_dir
-
-#
-# Filter
-# - WHERE descriptiveGroup = '(1:Building)'
-# - OR toid in addressbase_toids
-#
-python filter_mastermap.py $addressbase_dir $mastermap_dir
-
-#
-# Transform to 3857 (web mercator)
-#
+echo "Filter WHERE descriptiveGroup = '(1:Building)'... "
+python filter_mastermap.py $mastermap_dir
+
+echo "Transform to 3857 (web mercator)..."
 find $mastermap_dir -type f -name '*.filtered.csv' -printf "%f\n" | \
 parallel \
 ogr2ogr \
@@ -34,13 +18,13 @@ ogr2ogr \
 $mastermap_dir/{} \
 -lco GEOMETRY=AS_WKT

-#
-# Update to EWKT (with SRID indicator for loading to Postgres)
-#
+echo "Update to EWKT (with SRID indicator for loading to Postgres)..."
+echo "Updating POLYGONs.."
 find $mastermap_dir -type f -name '*.3857.csv' -printf "%f\n" | \
 parallel \
 sed -i "'s/^\"POLYGON/\"SRID=3857;POLYGON/'" $mastermap_dir/{}

+echo "Updating MULTIPOLYGONs.."
 find $mastermap_dir -type f -name '*.3857.csv' -printf "%f\n" | \
 parallel \
 sed -i "'s/^\"MULTIPOLYGON/\"SRID=3857;MULTIPOLYGON/'" $mastermap_dir/{}
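
The EWKT update in the last two steps just prefixes each WKT value with its SRID so PostGIS can interpret the geometry on COPY; for a single CSV line the rewrite looks like this (illustrative geometry and TOID):

```bash
echo '"POLYGON ((0 0,1 0,1 1,0 0))",osgb0000000000000001' \
    | sed 's/^"POLYGON/"SRID=3857;POLYGON/'
# "SRID=3857;POLYGON ((0 0,1 0,1 1,0 0))",osgb0000000000000001
```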

etl/get_test_polygons.py
@@ -25,11 +25,12 @@ gdf = osmnx.footprints_from_point(point=point, dist=dist)

 # preview image
 gdf_proj = osmnx.projection.project_gdf(gdf, to_crs={'init': 'epsg:3857'})
-gdf_proj = gdf_proj[gdf_proj.geometry.apply(lambda g: g.geom_type != 'MultiPolygon')]
+gdf_proj = gdf_proj[gdf_proj.geometry.apply(lambda g: g.geom_type != 'MultiPolygon')]  # noqa

-fig, ax = osmnx.plot_footprints(gdf_proj, bgcolor='#333333', color='w', figsize=(4,4),
-                                save=True, show=False, close=True,
-                                filename='test_buildings_preview', dpi=600)
+fig, ax = osmnx.plot_footprints(gdf_proj, bgcolor='#333333',
+                                color='w', figsize=(4, 4),
+                                save=True, show=False, close=True,
+                                filename='test_buildings_preview', dpi=600)

 # save
 test_dir = os.path.dirname(__file__)
@@ -50,7 +51,13 @@ gdf_to_save.rename(
 # convert to CSV
 test_data_csv = str(os.path.join(test_dir, 'test_buildings.3857.csv'))
 subprocess.run(["rm", test_data_csv])
-subprocess.run(["ogr2ogr", "-f", "CSV", test_data_csv, test_data_geojson, "-lco", "GEOMETRY=AS_WKT"])
+subprocess.run(
+    ["ogr2ogr", "-f", "CSV", test_data_csv,
+     test_data_geojson, "-lco", "GEOMETRY=AS_WKT"]
+)

 # add SRID for ease of loading to PostgreSQL
-subprocess.run(["sed", "-i", "s/^\"POLYGON/\"SRID=3857;POLYGON/", test_data_csv])
+subprocess.run(
+    ["sed", "-i", "s/^\"POLYGON/\"SRID=3857;POLYGON/",
+     test_data_csv]
+)

etl/load_geometries.sh
@@ -1,27 +1,25 @@
 #!/usr/bin/env bash

-#
 # Load geometries from GeoJSON to Postgres
 # - assume postgres connection details are set in the environment using PGUSER, PGHOST etc.
-#
 : ${1?"Usage: $0 ./path/to/mastermap/dir"}

 mastermap_dir=$1

-#
 # Create 'geometry' record with
 #     id: <polygon-guid>,
 #     source_id: <toid>,
 #     geom: <geom>
-#
+echo "Copy geometries to db..."
 find $mastermap_dir -type f -name '*.3857.csv' \
 -printf "$mastermap_dir/%f\n" | \
 parallel \
 cat {} '|' psql -c "\"COPY geometries ( geometry_geom, source_id ) FROM stdin WITH CSV HEADER;\""

-#
 # Delete any duplicated geometries (by TOID)
-#
+echo "Delete duplicate geometries..."
 psql -c "DELETE FROM geometries a USING (
     SELECT MIN(ctid) as ctid, source_id
     FROM geometries
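
For a single tile, the `find | parallel` COPY pipeline above boils down to the following (a sketch; assumes the `PG*` environment variables from the README are set, and the filename is illustrative):

```bash
# stream one transformed CSV straight into the geometries table
cat /path/to/mastermap_dir/tile.3857.csv \
    | psql -c "COPY geometries ( geometry_geom, source_id ) FROM stdin WITH CSV HEADER;"
```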

etl/load_uprns.sh (deleted)
@@ -1,36 +0,0 @@
-#!/usr/bin/env bash
-
-#
-# Load UPRNS from CSV to Postgres
-# - assume postgres connection details are set in the environment using PGUSER, PGHOST etc.
-#
-: ${1?"Usage: $0 ./path/to/addressbase/dir"}
-
-data_dir=$1
-
-#
-# Create 'building_properties' record with
-#     uprn: <uprn>,
-#     parent_uprn: <parent_uprn>,
-#     toid: <toid>,
-#     uprn_geom: <point>
-#
-find $data_dir -type f -name '*.3857.csv.loadable' \
--printf "$data_dir/%f\n" | \
-parallel \
-cat {} '|' psql -c "\"COPY building_properties ( uprn_geom, toid, uprn, parent_uprn ) FROM stdin WITH CSV HEADER;\""
-
-#
-# Create references
-#
-
-# index essential for speeed here
-psql -c "CREATE INDEX IF NOT EXISTS building_toid_idx ON buildings ( ref_toid );"
-# link to buildings
-psql -c "UPDATE building_properties
-    SET building_id = (
-        SELECT b.building_id
-        FROM buildings as b
-        WHERE
-        building_properties.toid = b.ref_toid
-    );"

@@ -3,13 +3,11 @@
 #
 # Extract, transform and load building outlines and property records
 #
-: ${1?"Usage: $0 ./path/to/addressbase/dir ./path/to/mastermap/dir ./path/to/boundary"}
-: ${2?"Usage: $0 ./path/to/addressbase/dir ./path/to/mastermap/dir ./path/to/boundary"}
-: ${3?"Usage: $0 ./path/to/addressbase/dir ./path/to/mastermap/dir ./path/to/boundary"}
+: ${1?"Usage: $0 ./path/to/mastermap/dir ./path/to/boundary"}
+: ${2?"Usage: $0 ./path/to/mastermap/dir ./path/to/boundary"}

-addressbase_dir=$1
-mastermap_dir=$2
-boundary_file=$3
+mastermap_dir=$1
+boundary_file=$2
 script_dir=${0%/*}

 #
@@ -17,10 +15,9 @@ script_dir=${0%/*}
 #

 # extract both datasets
-$script_dir/extract_addressbase.sh $addressbase_dir
 $script_dir/extract_mastermap.sh $mastermap_dir
 # filter mastermap ('building' polygons and any others referenced by addressbase)
-$script_dir/filter_transform_mastermap_for_loading.sh $addressbase_dir $mastermap_dir
+$script_dir/filter_transform_mastermap_for_loading.sh $mastermap_dir

 #
 # Load
@@ -33,7 +30,5 @@ psql < $script_dir/../migrations/002.index-geometries.up.sql
 $script_dir/drop_outside_limit.sh $boundary_file
 # create a building record per outline
 $script_dir/create_building_records.sh
-# add UPRNs where they match
-$script_dir/load_uprns.sh $addressbase_dir
-# index building records
-psql < $script_dir/../migrations/003.index-buildings.up.sql
+# Run remaining migrations
+ls $script_dir/../migrations/*.up.sql 2>/dev/null | while read -r migration; do psql < $migration; done;

@@ -3,11 +3,8 @@
 #
 # Filter and transform for loading
 #
-: ${1?"Usage: $0 ./path/to/addressbase/dir ./path/to/mastermap/dir"}
-: ${2?"Usage: $0 ./path/to/addressbase/dir ./path/to/mastermap/dir"}
+: ${1?"Usage: $0 ./path/to/mastermap/dir"}

-addressbase_dir=$1
-mastermap_dir=$2
+mastermap_dir=$1

-rm -f $addressbase_dir/*.{csv,gml,txt,filtered,gfs}
 rm -f $mastermap_dir/*.{csv,gml,txt,filtered,gfs}

etl/screenshot/MasterMap.png (new binary file, 38 KiB; not shown)

tests/test_filter.py (new file, +23 lines)
@@ -0,0 +1,23 @@
+import csv
+import pytest
+from etl import filter_mastermap
+
+
+def test_filter_mastermap():
+    """Test that MasterMap CSV can be correctly filtered to include only buildings."""
+    input_file = "tests/test_mastermap.gml.csv"  # Test csv with two buildings and one non-building
+    output_file = input_file.replace('gml', 'filtered')
+    filter_mastermap(input_file)  # creates output_file
+    with open(output_file, newline='') as csvfile:
+        csv_array = list(csv.reader(csvfile))
+    assert len(csv_array) == 3  # assert that length is 3 because just two building rows after header
+
+
+def test_filter_mastermap_missing_descriptivegroup():
+    """Test that MasterMap CSV can be correctly filtered when the polygon does not have a type specified."""
+    input_file = "tests/test_mastermap_missing_descriptivegroup.gml.csv"  # Test csv with one polygon missing its descriptiveGroup
+    output_file = input_file.replace('gml', 'filtered')
+    filter_mastermap(input_file)  # creates output_file
+    with open(output_file, newline='') as csvfile:
+        csv_array = list(csv.reader(csvfile))
+    assert len(csv_array) == 1  # assert that length is 1 because just header
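
These tests are what the new `etl` workflow runs on each pull request; locally they can be run the same way (assuming pytest is installed and you are at the repository root):

```bash
python -m pytest                            # whole suite, as in CI
python -m pytest tests/test_filter.py -v    # just the new filter tests
```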

tests/test_mastermap.filtered.csv (new file, +3 lines)
@@ -0,0 +1,3 @@
+WKT,fid,descriptiveGroup
+"POLYGON ((484704.003 184721.2,484691.62 184729.971,484688.251 184725.214,484700.633 184716.443,484704.003 184721.2))",osgb5000005129953843,"[ ""Building"" ]"
+"POLYGON ((530022.138 177486.29,530043.695 177498.235,530043.074 177499.355,530042.435 177500.509,530005.349 177480.086,529978.502 177463.333,529968.87 177457.322,529968.446 177457.057,529968.199 177455.714,529968.16 177455.504,529966.658 177454.566,529958.613 177449.543,529956.624 177448.301,529956.62 177448.294,529956.08 177447.4,529954.238 177444.351,529953.197 177442.624,529953.186 177442.609,529950.768 177438.606,529950.454 177438.086,529949.47 177434.209,529950.212 177434.038,529954.216 177433.114,529955.098 177437.457,529952.714 177437.98,529953.55 177441.646,529953.842 177442.008,529957.116 177446.059,529957.449 177446.471,529968.508 177453.375,529974.457 177451.966,529976.183 177458.937,530003.157 177475.772,530020.651 177485.466,530021.257 177484.372,530022.744 177485.196,530022.138 177486.29))",osgb5000005283023887,"[ ""Building"" ]"

tests/test_mastermap.gml.csv (new file, +4 lines)
@@ -0,0 +1,4 @@
+WKT,fid,descriptiveGroup
+"POLYGON ((484704.003 184721.2,484691.62 184729.971,484688.251 184725.214,484700.633 184716.443,484704.003 184721.2))",osgb5000005129953843,"[ ""Building"" ]"
+"POLYGON ((484703.76 184849.9,484703.46 184849.7,484703.26 184849.4,484703.06 184849.2,484702.86 184848.9,484702.76 184848.6,484702.66 184848.2,484702.66 184847.3,484702.76 184847.0,484702.96 184846.7,484703.06 184846.4,484703.36 184846.2,484703.56 184846.0,484704.16 184845.6,484704.46 184845.5,484705.46 184845.5,484706.06 184845.7,484706.26 184845.8,484706.76 184846.3,484706.96 184846.6,484707.16 184846.8,484707.26 184847.2,484707.36 184847.5,484707.36 184848.4,484707.26 184848.7,484707.16 184848.9,484706.76 184849.5,484706.46 184849.7,484706.26 184849.9,484705.66 184850.2,484704.66 184850.2,484703.76 184849.9))",osgb1000000152730957,"[ ""General Surface"" ]"
+"POLYGON ((530022.138 177486.29,530043.695 177498.235,530043.074 177499.355,530042.435 177500.509,530005.349 177480.086,529978.502 177463.333,529968.87 177457.322,529968.446 177457.057,529968.199 177455.714,529968.16 177455.504,529966.658 177454.566,529958.613 177449.543,529956.624 177448.301,529956.62 177448.294,529956.08 177447.4,529954.238 177444.351,529953.197 177442.624,529953.186 177442.609,529950.768 177438.606,529950.454 177438.086,529949.47 177434.209,529950.212 177434.038,529954.216 177433.114,529955.098 177437.457,529952.714 177437.98,529953.55 177441.646,529953.842 177442.008,529957.116 177446.059,529957.449 177446.471,529968.508 177453.375,529974.457 177451.966,529976.183 177458.937,530003.157 177475.772,530020.651 177485.466,530021.257 177484.372,530022.744 177485.196,530022.138 177486.29))",osgb5000005283023887,"[ ""Building"" ]"

tests/test_mastermap_missing_descriptivegroup.filtered.csv (new file, +1 line)
@@ -0,0 +1 @@
+WKT,fid,descriptiveGroup

tests/test_mastermap_missing_descriptivegroup.gml.csv (new file, +2 lines)
@@ -0,0 +1,2 @@
+WKT,fid,descriptiveGroup
+"POLYGON ((517896.1 186250.8,517891.7 186251.6,517891.1 186248.7,517890.75 186246.7,517890.65 186246.35,517890.45 186245.95,517890.25 186245.8,517889.95 186245.75,517889.65 186245.75,517878.3 186247.9,517874.61 186248.55,517872.9 186239.5,517873.4 186239.7,517873.95 186239.8,517874.25 186239.75,517874.65 186239.7,517875.05 186239.6,517878.35 186238.95,517889.1 186236.85,517892.769 186236.213,517903.2 186234.4,517919.55 186231.4,517932.25 186229.1,517942.1 186227.25,517954.65 186225.05,517968.75 186222.45,517985.25 186219.5,518000.0 186216.65,518021.7 186212.7,518026.7 186211.75,518029.1 186211.3,518029.68 186211.173,518033.65 186210.3,518046.1 186207.65,518058.45 186204.95,518063.3 186203.6,518068.1 186202.25,518068.9 186202.05,518079.6 186198.95,518081.4 186198.3,518083.2 186197.55,518084.95 186196.8,518086.7 186196.0,518088.45 186195.25,518097.85 186191.05,518099.15 186190.45,518108.3 186186.2,518108.375 186186.175,518108.45 186186.15,518108.477 186186.132,518114.5 186183.6,518114.65 186183.55,518114.85 186183.45,518115.05 186183.4,518115.25 186183.3,518115.35 186183.2,518115.45 186183.15,518141.85 186171.55,518142.0 186171.5,518142.15 186171.4,518142.45 186171.3,518142.6 186171.2,518142.7 186171.1,518142.8 186171.05,518142.9 186170.95,518143.05 186170.85,518143.15 186170.75,518143.25 186170.6,518143.4 186170.5,518143.5 186170.4,51814