Update ETL docs

Tom Russell 2018-09-29 18:29:57 +01:00
parent 4984ad9515
commit 2a1902f6ce


@@ -12,16 +12,40 @@ Before running any of these scripts, you will need the OS data for your area of
interest. AddressBase and MasterMap are available directly from [Ordnance
Survey](https://www.ordnancesurvey.co.uk/).
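
Every later step reads from the OS data directories, so it can help to stop early if a download is missing. The following is a minimal sketch, not a script from the repository. `require_data_dir` and `check_inputs` are hypothetical helper names. The only assumption is that the downloaded files for each dataset sit in their own directory, for example `./addressbase_dir` and `./mastermap_dir`.

```bash
#!/bin/sh
# Hypothetical pre-flight check: confirm each OS data directory exists and
# is non-empty before any ETL script is started. Safe to source.

# require_data_dir DIR: succeed if DIR is a directory with at least one entry
require_data_dir() {
    if [ ! -d "$1" ]; then
        echo "missing data directory: $1" >&2
        return 1
    fi
    # ls -A lists every entry (including dotfiles) except . and ..
    if [ -z "$(ls -A "$1")" ]; then
        echo "data directory is empty: $1" >&2
        return 1
    fi
}

# check_inputs DIR...: check every directory, report all problems,
# and return non-zero if any of them failed
check_inputs() {
    failed=0
    for dir in "$@"; do
        require_data_dir "$dir" || failed=1
    done
    return "$failed"
}
```

For example, `check_inputs ./addressbase_dir ./mastermap_dir || exit 1` at the top of a wrapper script reports every missing or empty directory at once, rather than failing partway through extraction.
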

The scripts should be run in the following order:

1. `extract_addressbase.sh`
1. `extract_mastermap.sh`
1. `filter_transform_mastermap_for_loading.sh`
1. `load_geometries.sh`
1. (SQL migration) `psql < ../migrations/002.index-geometries.sql`
1. `create_building_records.sh`
1. `load_uprns.py`
1. (SQL migration) `psql < ../migrations/002.index-buildings.sql`

```bash
# extract both datasets
./extract_addressbase.sh ./addressbase_dir
./extract_mastermap.sh ./mastermap_dir
# filter mastermap ('building' polygons and any others referenced by addressbase)
./filter_transform_mastermap_for_loading.sh ./addressbase_dir ./mastermap_dir
# load all building outlines
./load_geometries.sh ./mastermap_dir
# index geometries (should be faster after loading)
psql < ../migrations/002.index-geometries.sql
# create a building record per outline
./create_building_records.sh
# add UPRNs where they match
./load_uprns.py ./addressbase_dir
# index building records
psql < ../migrations/002.index-buildings.sql
```
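
Because the order above matters, a small driver that stops at the first failing step can keep a half-loaded database from going unnoticed. This is a sketch under stated assumptions, not a script from the repository: it assumes it runs from the directory holding the ETL scripts, that every script is executable, and that `psql` connects via the standard `PG*` environment variables. `run`, `migrate`, `run_pipeline` and the `DRY_RUN` variable are hypothetical names. The one substantive change from the commands above is `-v ON_ERROR_STOP=1`. Without it, `psql` carries on past a failed statement in a script and still exits with status 0, so a broken migration would not stop the pipeline.

```bash
#!/bin/sh
# Sketch of a fail-fast driver for the ETL steps, meant to be sourced.
# Assumptions (not from the repository): run from the directory holding the
# ETL scripts, every script is executable, and psql connects using the
# standard PG* environment variables (PGHOST, PGDATABASE, PGUSER, ...).

# run CMD [ARGS...]: echo the command, then run it unless DRY_RUN=1
# (DRY_RUN is a hypothetical convenience variable). Returns CMD's status.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != 1 ]; then
        "$@"
    fi
}

# migrate FILE: apply one SQL migration. ON_ERROR_STOP=1 makes psql stop at
# the first failed statement and exit non-zero instead of carrying on.
migrate() {
    echo "+ psql -v ON_ERROR_STOP=1 < $1"
    if [ "${DRY_RUN:-0}" != 1 ]; then
        psql -v ON_ERROR_STOP=1 < "$1"
    fi
}

# run_pipeline ADDRESSBASE_DIR MASTERMAP_DIR: the full OS-data pipeline.
# The && chain stops at the first failing step and returns its status.
run_pipeline() {
    addressbase_dir=$1
    mastermap_dir=$2
    run ./extract_addressbase.sh "$addressbase_dir" &&
        run ./extract_mastermap.sh "$mastermap_dir" &&
        run ./filter_transform_mastermap_for_loading.sh "$addressbase_dir" "$mastermap_dir" &&
        run ./load_geometries.sh "$mastermap_dir" &&
        migrate ../migrations/002.index-geometries.sql &&
        run ./create_building_records.sh &&
        run ./load_uprns.py "$addressbase_dir" &&
        migrate ../migrations/002.index-buildings.sql
}
```

Sourced into an interactive bash session, `DRY_RUN=1 run_pipeline ./addressbase_dir ./mastermap_dir` prints the eight steps without running them, and `run_pipeline ./addressbase_dir ./mastermap_dir` runs them for real. Chaining with `&&` instead of `set -e` keeps the file safe to source, since `set -e` in an interactive shell would close it on the first failing command.
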

To help test the Colouring London application, `get_test_polygons.py` will attempt to save a
small (1.5 km²) extract from OpenStreetMap in a format suitable for loading into the database.
To load this test extract instead of the OS data, run:
```bash
# download test data
./get_test_polygons.py
# load all building outlines
./load_geometries.sh ./test_data_dir
# index geometries (should be faster after loading)
psql < ../migrations/002.index-geometries.sql
# create a building record per outline
./create_building_records.sh
# index building records
psql < ../migrations/002.index-buildings.sql
```
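
Since `create_building_records.sh` creates one building record per outline, a quick check after either pipeline is to compare the two row counts. This is a minimal sketch. It assumes the outlines and building records live in tables named `geometries` and `buildings` (hypothetical names inferred from the migration file names; check the schema in `../migrations` before relying on them) and that `psql` connects via the `PG*` environment variables. Equal counts do not prove a one-to-one match, but unequal counts show that a step went wrong.

```bash
#!/bin/sh
# Sketch of a post-load sanity check. The table names "geometries" and
# "buildings" are assumptions, not confirmed by the repository.

# count_rows TABLE: print the number of rows in TABLE.
# -t prints rows only (no headers/footers), -A disables alignment padding,
# -c runs a single command. TABLE is interpolated, so pass trusted names only.
count_rows() {
    psql -tA -c "SELECT count(*) FROM $1"
}

# check_one_building_per_outline: report both counts and fail if they
# differ, or if either query fails
check_one_building_per_outline() {
    n_geometries=$(count_rows geometries) || return 1
    n_buildings=$(count_rows buildings) || return 1
    echo "geometries: $n_geometries, buildings: $n_buildings"
    [ "$n_geometries" = "$n_buildings" ]
}
```

For example, run `check_one_building_per_outline || echo "building count does not match outline count" >&2` once `create_building_records.sh` has finished.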