diff --git a/etl/README.md b/etl/README.md
index d42e90b5..60f25780 100644
--- a/etl/README.md
+++ b/etl/README.md
@@ -8,9 +8,41 @@ for Colouring London:
 
 ## Prerequisites
 
+Install PostgreSQL and create a database for colouringlondon, with a database
+user that can connect to it. The [PostgreSQL
+documentation](https://www.postgresql.org/docs/12/tutorial-start.html) covers
+installation and getting started.
+
+Install the [PostGIS extension](https://postgis.net/).
+
+Connect to the colouringlondon database and add the PostGIS, pgcrypto and
+pg_trgm extensions:
+
+```sql
+create extension postgis;
+create extension pgcrypto;
+create extension pg_trgm;
+```
+
+Create the core database tables:
+
+```bash
+psql < ../migrations/001.core.up.sql
+```
+
+Creating indexes after bulk loading data gives some performance benefit. If
+that is not a concern, it's fine to run all the migrations at this point and
+skip the index creation steps below.
+
+Install GNU parallel, which is used to speed up loading bulk data.
+
+
+## Process and load Ordnance Survey data
+
 Before running any of these scripts, you will need the OS data for your area of
 interest. AddressBase and MasterMap are available directly from [Ordnance
-Survey](https://www.ordnancesurvey.co.uk/).
+Survey](https://www.ordnancesurvey.co.uk/). The alternative setup below uses
+OpenStreetMap.
 
 The scripts should be run in the following order:
 
@@ -32,6 +64,10 @@ load_uprns.py ./addressbase_dir
 psql < ../migrations/003.index-buildings.sql
 ```
 
+## Alternative, using OpenStreetMap
+
+This uses the [osmnx](https://github.com/gboeing/osmnx) Python package to get OpenStreetMap data. You will need Python and osmnx to run `get_test_polygons.py`.
+
 To help test the Colouring London application, `get_test_polygons.py` will
 attempt to save a small (1.5km²) extract from OpenStreetMap to a format
 suitable for loading to the database.
@@ -39,13 +75,17 @@ In this case, run:
 
 ```bash
 # download test data
-get_test_polygons.py
+python get_test_polygons.py
 # load all building outlines
-load_geometries.sh ./test_data_dir
+./load_geometries.sh ./
 # index geometries (should be faster after loading)
-psql < ../migrations/002.index-geometries.sql
+psql < ../migrations/002.index-geometries.up.sql
 # create a building record per outline
-create_building_records.sh
+./create_building_records.sh
 # index building records
-psql < ../migrations/002.index-buildings.sql
+psql < ../migrations/003.index-buildings.up.sql
 ```
+
+## Finally
+
+Run the remaining migrations in `../migrations` to create the rest of the database structure.
\ No newline at end of file
diff --git a/migrations/README.md b/migrations/README.md
index 66f9c245..b41d92d3 100644
--- a/migrations/README.md
+++ b/migrations/README.md
@@ -8,6 +8,7 @@ $ psql "host={hostname} user={username} port={port} sslmode=require dbname=postg
 > \c colouringlondon
 > create extension postgis;
 > create extension pgcrypto;
+> create extension pg_trgm;
 > \q
 ```
 
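The new "Finally" step leaves running the remaining migrations to the reader. The following is a minimal sketch of one way to do that. It is not a script from the Colouring London repository. It assumes migration files are named like `004.name.up.sql` with zero-padded numeric prefixes, that 001 to 003 have already been applied by the steps in the patch, and that `psql` picks up its connection settings from the usual libpq environment variables (`PGHOST`, `PGUSER`, `PGDATABASE`, ...). The function name `run_remaining_migrations` and the `PSQL` override variable are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): apply every *.up.sql migration
# whose numeric prefix is greater than the last one already applied.
# Assumes names like "004.name.up.sql" with zero-padded prefixes.
set -euo pipefail

run_remaining_migrations() {
  local dir="${1:-../migrations}"
  local applied_up_to="${2:-3}"      # assumption: 001-003 were run during the ETL steps
  local psql_cmd="${PSQL:-psql}"     # hypothetical override, e.g. for testing
  local f name num

  # Glob results are sorted, so zero-padded prefixes come out in numeric order.
  for f in "$dir"/*.up.sql; do
    name="$(basename "$f")"
    num="${name%%.*}"                # "004.name.up.sql" -> "004"
    # Skip anything without a numeric prefix (including an unmatched glob).
    [[ $num =~ ^[0-9]+$ ]] || continue
    # 10# forces base 10, so "008" is not parsed as an invalid octal number.
    if (( 10#$num <= applied_up_to )); then
      continue
    fi
    echo "applying $name"
    # ON_ERROR_STOP=1 makes psql exit non-zero on the first failing statement.
    "$psql_cmd" -v ON_ERROR_STOP=1 -f "$f"
  done
}
```

For example, after `export PGDATABASE=colouringlondon`, running `run_remaining_migrations ../migrations 3` would apply every `*.up.sql` file numbered above 003. Because `psql` is run with `ON_ERROR_STOP=1` and the script uses `set -e`, a migration that fails stops the loop instead of letting later migrations run against a half-built schema.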