Merge pull request #447 from tomalrussell/feature/extract_readme
Extract data update
commit 4cc7b59027
130
maintenance/extract_data/README.txt
Normal file
@@ -0,0 +1,130 @@
+# Colouring London Data Extract
+
+This extract contains a snapshot of contributions to Colouring London
+(https://colouring.london).
+
+Colouring London is a citizen science platform collecting information on every building in
+London, to help make the city more sustainable.
+
+The data included are open data, licensed under the Open Data Commons Open Database License
+(ODbL, http://opendatacommons.org/licenses/odbl/) by Colouring London contributors.
+
+You are free to copy, distribute, transmit and adapt the data, as long as you credit Colouring
+London and our contributors. If you alter or build upon our data, you may distribute the
+result only under the same licence.
+
+
+## Contents
+
+This extract contains four files:
+
+- README.txt
+- building_attributes.csv
+- building_uprns.csv
+- edit_history.csv
+
+
+## Building Attributes
+
+This is the main table, containing almost all data collected by Colouring London. Apart from
+`building_id`, `revision_id` and `ref_toid`, all of these fields are optional.
+
+- `building_id`: unique building ID for Colouring London buildings
+- `revision_id`: unique revision ID for Colouring London, cross-references to our edit history
+- `ref_toid`: cross-reference to Ordnance Survey MasterMap TOID
+- `ref_osm_id`: cross-reference to OpenStreetMap feature osm_id
+- `location_name`: building name
+- `location_number`: building number
+- `location_street`: street name
+- `location_line_two`: additional address line
+- `location_town`: town
+- `location_postcode`: postcode
+- `location_latitude`: latitude
+- `location_longitude`: longitude
+- `date_year`: year built
+- `date_lower`: lower bound on year built
+- `date_upper`: upper bound on year built
+- `date_source`: type of source for building dates
+- `date_source_detail`: details of source for building dates
+- `date_link`: list of links to further information relating to building dates
+- `facade_year`: facade date
+- `facade_upper`: upper bound on facade date
+- `facade_lower`: lower bound on facade date
+- `facade_source`: type of source for facade dates
+- `facade_source_detail`: details of source for facade dates
+- `size_storeys_attic`: number of attic storeys
+- `size_storeys_core`: number of core storeys
+- `size_storeys_basement`: number of basement storeys
+- `size_height_apex`: height in metres to the building apex
+- `size_floor_area_ground`: floor area of the ground floor in square metres
+- `size_floor_area_total`: total floor area in square metres
+- `size_width_frontage`: width of frontage in metres
+- `likes_total`: number of times the building has been liked by Colouring London users
+- `planning_portal_link`: link to an entry on https://www.planningportal.co.uk/
+- `planning_in_conservation_area`: in a conservation area? (True/False)
+- `planning_conservation_area_name`: conservation area name
+- `planning_in_list`: in the National Heritage List for England? (True/False)
+- `planning_list_id`: National Heritage List for England ID
+- `planning_list_cat`: National Heritage List for England listing type
+- `planning_list_grade`: National Heritage List for England listing grade
+- `planning_heritage_at_risk_id`: on the Heritage at Risk list? (True/False)
+- `planning_world_list_id`: UNESCO World Heritage list ID
+- `planning_in_glher`: in the Greater London Historic Environment Record? (True/False)
+- `planning_glher_url`: Greater London Historic Environment Record link
+- `planning_in_apa`: in an Archaeological Priority Area? (True/False)
+- `planning_apa_name`: Archaeological Priority Area name
+- `planning_apa_tier`: Archaeological Priority Area tier
+- `planning_in_local_list`: in a local list? (True/False)
+- `planning_local_list_url`: local list reference link
+- `planning_in_historic_area_assessment`: within a historic area assessment? (True/False)
+- `planning_historic_area_assessment_url`: historic area assessment reference link
+
+
+## Building UPRNs
+
+Buildings are matched to UPRNs (Unique Property Reference Numbers), which should help link
+Colouring London data against other datasets.
+
+Read more about UPRNs: https://www.ordnancesurvey.co.uk/business-government/tools-support/uprn
+
+`building_uprns.csv` looks something like this:
+
+    building_id,uprn,parent_uprn
+    2810432,10091093495,100023038313
+    2810432,10091093496,100023038313
+    2810432,10091093497,
+
+- `building_id`: Colouring London unique building ID, references the building_id in
+  building_attributes.csv
+- `uprn`: Unique Property Reference Number associated with the building. In some cases
+  multiple UPRNs are associated with a single Colouring London building, for example in
+  blocks of flats or mixed-use buildings.
+- `parent_uprn`: optional. Some UPRNs are grouped by a parent-child relationship, so while
+  each UPRN is unique, multiple UPRNs may share the same parent.
+
+
+## Edit History
+
+Each change to the Colouring London database is recorded, so it is possible to explore how the
+dataset evolves over time.
+
+The edit history logs changes made by users, with the following fields:
+
+- `revision_id`: unique change id, referenced by building_attributes
+- `revision_timestamp`: date and time of the change
+- `building_id`: Colouring London building ID, references building_attributes
+- `forward_patch`: the changes made, encoded as a JSON string where keys are attribute/column
+  names, and values are the values set by this change.
+- `reverse_patch`: the reverse of the change, encoded as a JSON string. This shows what the
+  values were before this change was made.
+- `user`: username of the user who made the change
+
+
+For example, a forward patch might show a building date being provided, along with some source
+details:
+
+    {"date_year": 1911, "date_source_detail": "Survey of London Marylebone draft text"}
+
+The corresponding reverse patch shows that no previous data was stored:
+
+    {"date_year": null, "date_source_detail": null}
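As a reading aid for the Edit History section of the README, here is a minimal Python sketch of how a forward/reverse patch pair might be decoded and applied. It assumes patches are flat JSON objects keyed by column name, as the README describes, and that an empty previous value is encoded as JSON `null`. The patch pair is modelled on the README example, using the `date_source_detail` column name from the attribute list. `apply_patch` is a hypothetical helper, not part of the extract tooling.

```python
import json


def apply_patch(attributes, patch_json):
    """Return a copy of `attributes` with the JSON patch's values set.

    Hypothetical helper: assumes a patch is a flat JSON object whose keys
    are column names and whose values are the values to store.
    """
    updated = dict(attributes)
    updated.update(json.loads(patch_json))
    return updated


# Patch pair modelled on the README example.
forward_patch = (
    '{"date_year": 1911, '
    '"date_source_detail": "Survey of London Marylebone draft text"}'
)
reverse_patch = '{"date_year": null, "date_source_detail": null}'

before = {"date_year": None, "date_source_detail": None}
after = apply_patch(before, forward_patch)

# Applying the reverse patch should restore the earlier values.
restored = apply_patch(after, reverse_patch)
```

If the two patches really are inverses, `restored` equals `before`, which gives a cheap consistency check when exploring the history file.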
@@ -1,4 +1,4 @@
-SELECT
+COPY (SELECT
     building_id,
     ref_toid,
     ref_osm_id,
@@ -16,6 +16,7 @@ SELECT
     date_upper,
     date_source,
     date_source_detail,
+    date_link,
     facade_year,
     facade_upper,
     facade_lower,
@@ -34,6 +35,8 @@ SELECT
     planning_conservation_area_name,
     planning_in_list,
     planning_list_id,
+    planning_list_cat,
+    planning_list_grade,
     planning_heritage_at_risk_id,
     planning_world_list_id,
     planning_in_glher,
@@ -44,8 +47,7 @@ SELECT
     planning_in_local_list,
     planning_local_list_url,
     planning_in_historic_area_assessment,
-    planning_historic_area_assessment_url,
-    planning_list_cat,
-    planning_list_grade,
-    date_link
-FROM buildings
+    planning_historic_area_assessment_url
+FROM buildings)
+TO '/tmp/building_attributes.csv'
+WITH CSV HEADER
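One practical note on the file this query produces, sketched in Python under stated assumptions. PostgreSQL's `COPY ... WITH CSV` writes NULL as an empty field and boolean values as `t`/`f`, so the (True/False) fields described in the README may need converting on load. The sketch assumes the `planning_in_*` columns are stored as booleans; `parse_flag` and the sample rows are hypothetical.

```python
import csv
import io

TRUE_VALUES = {"t", "true"}
FALSE_VALUES = {"f", "false"}


def parse_flag(value):
    """Map a CSV boolean field to True, False, or None when empty."""
    text = value.strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    if text == "":
        return None
    raise ValueError(f"unexpected boolean value: {value!r}")


# Hypothetical sample with a subset of the exported columns.
sample = io.StringIO(
    "building_id,planning_in_conservation_area,planning_in_list\n"
    "1,t,f\n"
    "2,,True\n"
)

rows = [
    {
        "building_id": int(row["building_id"]),
        "in_conservation_area": parse_flag(row["planning_in_conservation_area"]),
        "in_list": parse_flag(row["planning_in_list"]),
    }
    for row in csv.DictReader(sample)
]
```

Accepting both spellings keeps the loader working whether the flags arrive as `t`/`f` or as `True`/`False`.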
@@ -1,3 +1,12 @@
-SELECT log_id as revision_id, log_timestamp as revision_timestamp, building_id, forward_patch, reverse_patch, u.username as user
+COPY(SELECT
+    log_id as revision_id,
+    date_trunc('second', log_timestamp) as revision_timestamp,
+    building_id,
+    forward_patch,
+    reverse_patch,
+    u.username as user
 FROM logs l
-JOIN users u ON l.user_id = u.user_id
+JOIN users u
+    ON l.user_id = u.user_id)
+TO '/tmp/edit_history.csv'
+WITH CSV HEADER
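To illustrate what the exported edit history allows, the following Python sketch replays forward patches to rebuild the edited attributes of one building as of a given revision. It assumes `revision_id` values are integers that increase over time (plausible, since the query exports `log_id`, but not confirmed here) and that each `forward_patch` is a flat JSON object. `state_at_revision` and the in-memory sample rows are hypothetical.

```python
import csv
import io
import json


def state_at_revision(history_rows, building_id, revision_id):
    """Replay forward patches for one building up to and including a revision."""
    state = {}
    relevant = [
        row for row in history_rows
        if int(row["building_id"]) == building_id
        and int(row["revision_id"]) <= revision_id
    ]
    for row in sorted(relevant, key=lambda r: int(r["revision_id"])):
        state.update(json.loads(row["forward_patch"]))
    return state


# Build a small hypothetical edit_history.csv in memory.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["revision_id", "revision_timestamp", "building_id",
                 "forward_patch", "reverse_patch", "user"])
writer.writerow([1, "2019-10-01 09:00:00", 42,
                 json.dumps({"date_year": 1900}),
                 json.dumps({"date_year": None}), "alice"])
writer.writerow([2, "2019-10-02 09:00:00", 42,
                 json.dumps({"date_year": 1911}),
                 json.dumps({"date_year": 1900}), "bob"])
buffer.seek(0)

history = list(csv.DictReader(buffer))
```

The result only covers attributes that appear in at least one patch; anything never edited is absent from the returned dictionary.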
@@ -1,3 +1,8 @@
-SELECT building_id, uprn, parent_uprn
+COPY(SELECT
+    building_id,
+    uprn,
+    parent_uprn
 FROM building_properties
-WHERE building_id IS NOT NULL
+WHERE building_id IS NOT NULL)
+TO '/tmp/building_uprns.csv'
+WITH CSV HEADER
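The rows exported by this query can be grouped per building on the consumer side. A minimal Python sketch, using the sample rows from the README and assuming an empty `parent_uprn` field means no parent is recorded; the variable names are illustrative only.

```python
import csv
import io
from collections import defaultdict

# Sample rows taken from the README example.
sample = io.StringIO(
    "building_id,uprn,parent_uprn\n"
    "2810432,10091093495,100023038313\n"
    "2810432,10091093496,100023038313\n"
    "2810432,10091093497,\n"
)

uprns_by_building = defaultdict(list)
for row in csv.DictReader(sample):
    # parent_uprn is optional: an empty field means no parent is recorded.
    parent = int(row["parent_uprn"]) if row["parent_uprn"] else None
    uprns_by_building[int(row["building_id"])].append((int(row["uprn"]), parent))
```

Keying on `building_id` makes it straightforward to join the result against rows from `building_attributes.csv`.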
@@ -22,39 +22,6 @@ def get_connection():
     )


-def fetch_with_server_side_cursor(
-    connection,
-    query,
-    on_row,
-    row_batch_size=10000
-):
-    with connection.cursor('server_side') as cur:
-        cur.itersize = row_batch_size
-        cur.execute(query)
-
-        header_saved = False
-
-        for row in cur:
-            if not header_saved:
-                columns = [c[0] for c in cur.description]
-                on_row(columns)
-                header_saved = True
-            on_row(row)
-
-
-def db_to_csv(connection, query):
-    string_io = StringIO()
-    writer = csv.writer(string_io)
-
-    fetch_with_server_side_cursor(
-        connection,
-        query,
-        lambda row: writer.writerow(row)
-    )
-
-    return string_io.getvalue()
-
-
 def get_extract_zip_file_path(current_time):
     base_dir = Path(os.environ['EXTRACTS_DIRECTORY'])
     file_name = f"data-extract-{current_time:%Y-%m-%d-%H_%M_%S}.zip"
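The hunk above drops the client-side streaming helpers in favour of `COPY ... TO '/tmp/...'` in the SQL files. A `COPY` to a file path is written by the database server process, so it needs the corresponding server-side file-write privilege, and the script has to be able to read the server's `/tmp`. If that ever becomes a constraint, one alternative (not what this commit does) is `COPY ... TO STDOUT` streamed through psycopg2's `cursor.copy_expert`. The sketch assumes `connection` is a psycopg2 connection, which the named-cursor usage in the removed code suggests; `copy_query_to_csv` is a hypothetical helper.

```python
def copy_query_to_csv(connection, select_sql, csv_path):
    """Stream the result of `select_sql` into a local CSV file.

    Hypothetical helper: wraps the query in COPY ... TO STDOUT so the
    database client, not the server, writes the file.
    """
    copy_sql = f"COPY ({select_sql}) TO STDOUT WITH CSV HEADER"
    with open(csv_path, "w", newline="") as csv_file:
        with connection.cursor() as cur:
            # copy_expert is psycopg2's API for COPY to or from a file object.
            cur.copy_expert(copy_sql, csv_file)
```

With this shape the SQL files would hold only the inner `SELECT`, and the output path would be chosen by the Python script rather than hard-coded in SQL.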
@@ -79,27 +46,30 @@ def read_sql(rel_path_from_script):
     return sql_path.read_text()


-building_attr_query = read_sql('./export_attributes.sql')
-building_uprn_query = read_sql('./export_uprns.sql')
-edit_history_query = read_sql('./export_edit_history.sql')


 def make_data_extract(current_time, connection, zip_file_path):
     if zip_file_path.exists():
         raise ZipFileExistsError('Archive file under specified name already exists')

+    # Execute data dump as Postgres COPY commands, write from server to /tmp
+    with connection.cursor() as cur:
+        cur.execute(read_sql('./export_attributes.sql'))
+
+    with connection.cursor() as cur:
+        cur.execute(read_sql('./export_uprns.sql'))
+
+    with connection.cursor() as cur:
+        cur.execute(read_sql('./export_edit_history.sql'))
+
     zip_file_path.parent.mkdir(parents=True, exist_ok=True)

     try:
         with zipfile.ZipFile(zip_file_path, mode='w') as newzip:
-            newzip.writestr('building_attributes.csv',
-                            db_to_csv(connection, building_attr_query))
-            newzip.writestr('building_uprns.csv',
-                            db_to_csv(connection, building_uprn_query))
-            newzip.writestr('edit_history.csv',
-                            db_to_csv(connection, edit_history_query))
-
-            # TODO: add README
+            newzip.write('README.txt')
+            newzip.write('/tmp/building_attributes.csv', arcname='building_attributes.csv')
+            newzip.write('/tmp/building_uprns.csv', arcname='building_uprns.csv')
+            newzip.write('/tmp/edit_history.csv', arcname='edit_history.csv')

         add_extract_record_to_database(connection, zip_file_path, current_time)
     except:
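The archive step above relies on `ZipFile.write` with `arcname` so that files read from `/tmp` are stored under plain names inside the zip. Note that `newzip.write('README.txt')` uses a relative path, which is resolved against the current working directory, so the script presumably has to be run from the directory containing the README. A minimal, self-contained Python sketch of the same pattern follows; `build_archive` and the temporary paths are hypothetical stand-ins, not the script's own code.

```python
import tempfile
import zipfile
from pathlib import Path


def build_archive(zip_path, files):
    """Write each (source_path, archive_name) pair into a new zip file."""
    with zipfile.ZipFile(zip_path, mode="w") as newzip:
        for source_path, archive_name in files:
            # arcname controls the name stored inside the archive, so the
            # temporary directory does not leak into the extract layout.
            newzip.write(source_path, arcname=archive_name)


# Hypothetical stand-in for a file written by one of the COPY commands.
work_dir = Path(tempfile.mkdtemp())
csv_path = work_dir / "building_uprns.csv"
csv_path.write_text("building_id,uprn,parent_uprn\n")

zip_path = work_dir / "data-extract.zip"
build_archive(zip_path, [(csv_path, "building_uprns.csv")])

with zipfile.ZipFile(zip_path) as archive:
    archive_names = archive.namelist()
```

Passing absolute source paths together with explicit archive names would remove the dependency on the working directory.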