$> ./bin/wof emit -h
Emit one or more Who's On First records.
Usage:
./bin/wof emit [options] path(N) path(N)
-as-spr
Emit Who's On First records formatted as Standard Place Response (SPR) records. This flag is DEPRECATED. Please use '-format spr' instead.
-as-spr-geojson
Emit Who's On First records as GeoJSON records where the 'properties' element is replaced by a Standard Place Response (SPR) representation of the record. This flag is DEPRECATED. Please use '-format geojson' instead.
-csv-append-property value
Zero or more additional properties to append to each CSV row. Properties should be in the format of {COLUMN_NAME}={PATH}. This flag is only honoured if the -format flag has a value of "csv".
-forgiving
Do not stop processing when errors are encountered.
-format string
Valid options are: csv, spr, spr-geojson or [none]. If none then the raw GeoJSON for each matching record will be emitted.
-include-alt-geoms
Emit alternate geometry records. (default true)
-iterator-uri string
A valid whosonfirst/go-whosonfirst-iterate/v2/emitter URI. If URI is "-" then this flag will be assigned a value of "file://" whose input will be the expanded URIs derived from additional arguments. Available options are: cwd://, directory://, featurecollection://, file://, filelist://, geojsonl://, null://, repo:// (default "repo://")
-query value
One or more {PATH}={REGEXP} parameters for filtering records.
-query-mode string
Specify how query filtering should be evaluated. Valid modes are: ALL, ANY (default "ALL")
-writer-uri string
A valid whosonfirst/go-writer.Writer URI. Available options are: cwd://, featurecollection://, fs://, geoparquet://, io://, jsonl://, null://, repo://, sqlite://, stdout:// (default "jsonl://?writer=stdout://")
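Note that the help text for the deprecated -as-spr-geojson flag suggests '-format geojson', but the matching -format value listed above is 'spr-geojson'.
For example, emitting records as CSV with additional custom properties: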
$> ./bin/wof emit \
-format csv \
-csv-append-property sfomuseum_description=properties.sfomuseum:description \
-writer-uri stdout:// \
-query 'properties.millsfield:subcategory_id=1511213363' \
/usr/local/data/sfomuseum-data-collection
wof:repo,mz:is_deprecated,edtf:inception,mz:max_latitude,mz:is_ceased,mz:is_superseding,wof:lastmodified,mz:longitude,mz:max_longitude,wof:placetype,wof:country,mz:is_current,wof:belongsto,mz:latitude,mz:min_longitude,wof:superseded_by,mz:min_latitude,edtf:cessation,wof:parent_id,wof:path,mz:is_superseded,wof:name,wof:id,mz:uri,wof:supersedes,sfomuseum_description
sfomuseum-data-collection,-1,1948,37.61661101879963,1,0,1716579582,-122.38615540108617,-122.38583385944366,custom,XY,1,"102527513,1159162825,102191575,85633793,102087579,85922583,85688637,1159160869",37.61635747477365,-122.38647222518921,,37.6161053594541,1948,1159162825,184/675/460/1/1846754601.geojson,0,postcard: Pigeon Key,1846754601,184/675/460/1/1846754601.geojson,,"Color postcard with photographic image depicting houses on island with bridge; postmarked June 7, 1948 in Fort Lauderdale, Fla.; text on front: “Kodachrome by C.H. Ruth / Overseas Highway, Above Pigeon Key, Between Key West and Miami, Fla.”"
Black and white postcard with photographic image depicting profile view of Sikorsky XPBS-1 on water; text on front: “Sikorsky flying dreadnaught [sic] for U/S/ Navy with Hamilton Constant Speed Propellers”.,custom,184/675/460/3/1846754603.geojson,XY,37.6161053594541,1,1716579582,1159162825,0,1930-12,184/675/460/3/1846754603.geojson,"102527513,1159162825,102191575,85633793,102087579,85922583,85688637,1159160869",-1,,37.61661101879963,1930-09,,1,-122.38615540108617,-122.38647222518921,sfomuseum-data-collection,0,postcard: Sikorsky XPBS-1,-122.38583385944366,1846754603,37.61635747477365
XY,1930~,"Black and white postcard with photographic image depicting aerial view of Havana Harbor with Pan American Airways Fokker F-10 in flight; text on front: “Tri-motor airliner, Pan American Airways, over El Morro, Havana”.",1716579582,-122.38583385944366,1,0,184/675/460/5/1846754605.geojson,0,37.61661101879963,-122.38647222518921,37.61635747477365,"102527513,1159162825,102191575,85633793,102087579,85922583,85688637,1159160869",-122.38615540108617,"postcard: Pan American Airways, Fokker F.10, Havana",custom,1159162825,sfomuseum-data-collection,1930~,,37.6161053594541,,1846754605,184/675/460/5/1846754605.geojson,1,-1
-122.38583385944366,,37.61661101879963,193X,0,"102527513,1159162825,102191575,85633793,102087579,85922583,85688637,1159160869",0,184/675/460/7/1846754607.geojson,1159162825,193X,sfomuseum-data-collection,"Black and white postcard with photographic image depicting low-angle front three-quarter view of Pan American Airways Sikorsky S-42 in flight against clouds; text on front: “A ‘Clipper Ship’ of the Air”; text on reverse: “America’s Largest Airliners - Giant 4-engined 32 and 44 passenger, 19-ton Flying Boats - Ply the Pan American Airways Routes between the U.S., the West Indies and South America”.","postcard: Pan American Airways, Sikorsky S-42",custom,37.61635747477365,,1,184/675/460/7/1846754607.geojson,-122.38647222518921,-1,1716579582,1846754607,-122.38615540108617,1,XY,37.6161053594541
... and so on
The default set of CSV columns maps to the properties of a Standard Place Response (SPR).
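Each -csv-append-property value is a {COLUMN_NAME}={PATH} pair that adds one column after the default ones. The Python sketch below shows that idea under stated assumptions: the three default columns are a hypothetical subset of the real SPR-derived set, and paths are resolved as simple dot-separated keys, which may be narrower than the path syntax the tool accepts.

```python
import csv
import io

# Hypothetical subset of the SPR-derived default columns, kept short for clarity.
DEFAULT_COLUMNS = ["wof:id", "wof:name", "wof:placetype"]


def get_path(feature, path):
    """Resolve a dot-separated path such as 'properties.wof:name'; '' if missing."""
    value = feature
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return ""
        value = value[key]
    return value


def features_to_csv(features, append_properties):
    """Write features as CSV, appending one column per {COLUMN_NAME}={PATH} value."""
    # Split each flag value on the first '=' only: column name, then path.
    extra = [value.split("=", 1) for value in append_properties]
    header = DEFAULT_COLUMNS + [column for column, _ in extra]

    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(header)
    for feature in features:
        # Every row is built from the same ordered column list as the header.
        row = [get_path(feature, "properties." + name) for name in DEFAULT_COLUMNS]
        row += [get_path(feature, path) for _, path in extra]
        writer.writerow(row)
    return out.getvalue()
```

Calling features_to_csv(features, ["sfomuseum_description=properties.sfomuseum:description"]) mirrors the flag value in the example above. Building every row from one explicit, ordered column list is what keeps each value under the right header.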
For example, emitting all the records marked mz:is_current=1 in the whosonfirst-data-venue-ca repository to a GeoParquet file:
$> ./bin/wof emit \
-writer-uri 'geoparquet://?min=100&max=1000&append-property=wof:concordances' \
-iterator-uri 'repo://?include=properties.mz:is_current=1' \
/usr/local/data/whosonfirst-data-venue-ca \
> /usr/local/data/venue-ca.geoparquet
processed 17349 records in 1m0.000336833s (started 2024-08-16 17:38:11.279896 -0700 PDT m=+0.044642876)
... time passes
processed 546529 records in 20m0.002204708s (started 2024-08-16 17:38:11.279896 -0700 PDT m=+0.044642876)
2024/08/16 17:59:07 INFO time to index paths (1) 20m56.026484291s
And then loading the resulting file in DuckDB:
$> duckdb
v1.0.0 1f98600c2c
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D LOAD spatial;
D SELECT "wof:id", "wof:name", "wof:concordances" FROM read_parquet('/usr/local/data/venue-ca.geoparquet') LIMIT 5;
┌────────────┬───────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────┐
│ wof:id │ wof:name │ wof:concordances │
│ varchar │ varchar │ varchar │
├────────────┼───────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────┤
│ 1108798699 │ Foxy │ {\n "osm:node": 2687914646,\n "ovtr:id": "08f2baa44c8f6028031206a8c92b602c"\n } │
│ 1108798581 │ Myriade │ │
│ 1125142651 │ Liverpool House │ {} │
│ 1125142781 │ Joe Beef │ {\n "4sq:id": "4b84801af964a520fd3831e3"\n } │
│ 1108808935 │ Drawn & Quarterly │ {\n "4sq:id": "4ad4c06ff964a5205ffb20e3",\n "osm:node": 2704382357\n } │
└────────────┴───────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────┘
D
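In the DuckDB output above, wof:concordances is a varchar column holding JSON text (DuckDB prints the embedded newlines as \n), and some rows are blank or {}. Once rows have been fetched into Python (for example through DuckDB's Python client, which is not shown here), a sketch like the following decodes those values and tallies concordance sources. Treating a blank or NULL cell as "no concordances" is an assumption.

```python
import json


def decode_concordances(cell):
    """Decode a wof:concordances value; NULL or blank cells become {} (assumption)."""
    if cell is None or not cell.strip():
        return {}
    return json.loads(cell)


def concordance_sources(cells):
    """Count how many records carry each concordance key, e.g. 'osm:node'."""
    counts = {}
    for cell in cells:
        for key in decode_concordances(cell):
            counts[key] = counts.get(key, 0) + 1
    return counts
```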
Or, reading data directly from a Who's On First-style data repository on GitHub and writing all the records to a GeoParquet file:
$> ./bin/wof emit \
-writer-uri 'geoparquet://?min=100&max=1000&append-property=sfomuseum:placetype' \
-iterator-uri 'git:///tmp' \
https://github.com/sfomuseum-data/sfomuseum-data-architecture.git \
> /usr/local/data/arch.geoparquet
And then, again, in DuckDB:
D SELECT "wof:id", "wof:name", "sfomuseum:placetype", "wof:placetype", "mz:is_current" FROM read_parquet('/usr/local/data/arch.geoparquet');
┌────────────┬─────────────────────────────────────────┬─────────────────────┬───────────────┬───────────────┐
│ wof:id │ wof:name │ sfomuseum:placetype │ wof:placetype │ mz:is_current │
│ varchar │ varchar │ varchar │ varchar │ varchar │
├────────────┼─────────────────────────────────────────┼─────────────────────┼───────────────┼───────────────┤
│ 102527513 │ San Francisco International Airport │ airport │ campus │ 1 │
│ 1159157037 │ G-02 International North Cases │ gallery │ enclosure │ 0 │
│ 1159157039 │ A-02 International South Cases │ gallery │ enclosure │ 0 │
│ 1159157041 │ A-07 International Central Vitrine │ gallery │ enclosure │ 0 │
│ 1159157045 │ 4B International North Wall │ gallery │ enclosure │ 0 │
│ 1159157047 │ 4C International South Wall │ gallery │ enclosure │ 0 │
│ 1159157049 │ 3L Terminal 3 Connector Arrival Level │ gallery │ enclosure │ 0 │
│ 1159157051 │ 3J Photographs │ gallery │ enclosure │ 0 │
│ 1159157053 │ 3C North Connector │ gallery │ enclosure │ 0 │
│ 1159157055 │ 3D Terminal 3 Hub │ gallery │ enclosure │ 0 │
... and so on
Please consult the whosonfirst/go-writer-geoparquet documentation for details on how to configure the -writer-uri flag.
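The -writer-uri and -iterator-uri values used above carry their options as URI query-string parameters: min, max and append-property for geoparquet://, and include for repo://. The Python sketch below, assuming standard URL query syntax, pulls those parameters apart. Reading each include value as a {PATH}={REGEXP} pair, by analogy with the -query flag, is an assumption; what min and max control is left to the go-writer-geoparquet documentation.

```python
from urllib.parse import parse_qsl, urlsplit


def parse_uri(uri):
    """Return (scheme, [(key, value), ...]) for a wof-style URI."""
    parts = urlsplit(uri)
    # parse_qsl splits each pair on its first '=', so 'a=b=c' yields ('a', 'b=c').
    return parts.scheme, parse_qsl(parts.query)


def include_filters(iterator_uri):
    """Collect 'include' parameters as (PATH, REGEXP) pairs (assumed syntax)."""
    _, params = parse_uri(iterator_uri)
    return [tuple(value.split("=", 1)) for key, value in params if key == "include"]
```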
For example, emitting records as SPR results:
$> ./bin/wof emit \
-format spr \
-writer-uri 'jsonl://?writer=stdout://' \
/usr/local/data/sfomuseum-data-maps/
{"edtf:cessation":"1985~","edtf:inception":"1985~","mz:is_ceased":1,"mz:is_current":0,"mz:is_deprecated":-1,"mz:is_superseded":0,"mz:is_superseding":0,"mz:latitude":37.616459,"mz:longitude":-122.386272,"mz:max_latitude":37.63100646804649,"mz:max_longitude":-122.37094362769881,"mz:min_latitude":37.60096637420677,"mz:min_longitude":-122.40407820844655,"mz:uri":"https://data.whosonfirst.org/136/039/131/3/1360391313.geojson","wof:belongsto":[],"wof:country":"US","wof:id":1360391313,"wof:lastmodified":1716594274,"wof:name":"SFO (1985)","wof:parent_id":-4,"wof:path":"136/039/131/3/1360391313.geojson","wof:placetype":"custom","wof:repo":"sfomuseum-data-maps","wof:superseded_by":[],"wof:supersedes":[]}
... and so on
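With -writer-uri 'jsonl://?writer=stdout://', each SPR record is a single line of JSON, so downstream code can process records one at a time, for example by reading sys.stdin when the output is piped into a script. A minimal Python sketch, assuming only that each line is one SPR object with the keys shown above:

```python
import json


def read_spr_jsonl(lines):
    """Yield one SPR record (a dict) per non-blank line of JSONL input."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def current_ids(lines):
    """Return the wof:id of every record whose mz:is_current is 1."""
    return [r["wof:id"] for r in read_spr_jsonl(lines) if r.get("mz:is_current") == 1]
```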
Or with query filtering:
$> ./bin/wof emit \
-format spr \
-query 'properties.wof:name=SFO \(2023\)' \
/usr/local/data/sfomuseum-data-maps/
{"edtf:cessation":"","edtf:inception":"2023-07~","mz:is_ceased":-1,"mz:is_current":-1,"mz:is_deprecated":-1,"mz:is_superseded":0,"mz:is_superseding":0,"mz:latitude":37.621284127293116,"mz:longitude":-122.38285759138246,"mz:max_latitude":37.642285759714994,"mz:max_longitude":-122.34578162574567,"mz:min_latitude":37.60153229886917,"mz:min_longitude":-122.40810153962025,"mz:uri":"https://data.whosonfirst.org/188/030/951/9/1880309519.geojson","wof:belongsto":[102527513,102191575,85633793,102087579,85922583,554784711,85688637,102085387],"wof:country":"US","wof:id":1880309519,"wof:lastmodified":1716594274,"wof:name":"SFO (2023)","wof:parent_id":-1,"wof:path":"188/030/951/9/1880309519.geojson","wof:placetype":"custom","wof:repo":"sfomuseum-data-maps","wof:superseded_by":[],"wof:supersedes":[]}
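The -query value is a regular expression, which is why the parentheses in SFO (2023) are escaped; the single quotes stop the shell from consuming the backslashes. The Python sketch below shows one plausible reading of the -query and -query-mode flags. It assumes dot-separated paths and unanchored matching against the value's string form, and it is not the tool's own implementation, which may resolve paths or anchor patterns differently.

```python
import re


def get_path(feature, path):
    """Resolve a dot-separated path such as 'properties.wof:name'; None if missing."""
    value = feature
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def matches(feature, queries, mode="ALL"):
    """Evaluate {PATH}={REGEXP} queries; ALL needs every match, ANY needs one."""
    results = []
    for query in queries:
        path, pattern = query.split("=", 1)
        value = get_path(feature, path)
        # Unanchored search against the string form of the value (an assumption).
        results.append(value is not None and re.search(pattern, str(value)) is not None)
    if mode == "ALL":
        return all(results)
    if mode == "ANY":
        return any(results)
    raise ValueError(f"unknown query mode: {mode}")
```

Without the backslashes, SFO (2023) is read as "SFO " followed by a group matching 2023, so the pattern would look for the text SFO 2023 and no longer match the literal name.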
Or, emitting records as a FeatureCollection of GeoJSON-formatted SPR results (where the original geometry is preserved but the properties hash is replaced by that record's SPR) and piping the result to ogr2ogr:
$> ./bin/wof emit \
-format spr-geojson \
-writer-uri 'featurecollection://?writer=stdout://' \
/usr/local/data/sfomuseum-data-flights-2024-05 \
| ogr2ogr -f parquet flights2.parquet /vsistdin/
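The spr-geojson shape described above can be sketched in Python: keep each feature's geometry and swap its properties for an SPR-style dictionary, then wrap the results in a FeatureCollection. The SPR_KEYS list is a hypothetical subset, and copying values straight from the properties is a simplification; the real SPR (see the -format spr output above) carries more fields, such as the mz: bounding-box values and wof:belongsto.

```python
# Hypothetical subset of SPR keys, for illustration only.
SPR_KEYS = ["wof:id", "wof:name", "wof:placetype", "wof:parent_id", "wof:repo"]


def to_spr(properties):
    """Build a simplified SPR-style dict from a feature's properties."""
    return {key: properties.get(key) for key in SPR_KEYS}


def spr_feature_collection(features):
    """Keep each geometry, replace its properties with the SPR-style dict."""
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "geometry": feature.get("geometry"),
                "properties": to_spr(feature.get("properties", {})),
            }
            for feature in features
        ],
    }
```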
Or, iterating over a custom list of records passed as Who's On First IDs:
$> ./bin/wof emit \
-iterator-uri - \
-writer-uri 'featurecollection://?writer=stdout://' \
1914563993 1914564157 1914564489 1914564345 \
| json_pp \
| grep 'wof:name'
2024/07/09 13:57:07 INFO time to index paths (4) 797.084µs
"wof:name" : "Terminal 2",
"wof:name" : "Terminal 1",
"wof:name" : "International Terminal",
"wof:name" : "Terminal 3",
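Per the help text, -iterator-uri - expands the remaining arguments into file:// URIs. The wof:path values in the outputs above (for example 1846754601 and 184/675/460/1/1846754601.geojson) follow the Who's On First convention of splitting an ID into groups of three digits. The Python sketch below computes that relative path, assuming positive IDs; how the tool then locates the data directory for each path is not described here.

```python
def id_to_path(wof_id):
    """Map an ID to its relative path, e.g. 1846754601 -> 184/675/460/1/1846754601.geojson."""
    if wof_id <= 0:
        raise ValueError("only positive Who's On First IDs map to paths")
    digits = str(wof_id)
    # Split the decimal ID into groups of three digits; the last group may be shorter.
    groups = [digits[i:i + 3] for i in range(0, len(digits), 3)]
    return "/".join(groups) + "/" + digits + ".geojson"
```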