GeoParquet and Arrow IPC read/write support (PDAL#4115)
* remove dead codepath

* fix initialization order

* implement writers.arrow for feather and parquet support

* use type_fwd provided from arrow

* add ORC output support too

* fix doc warnings

* add license

* add readers.arrow scaffolding

* readers.arrow implementation. fix writers.arrow to write dimensions in correct order

* parquet read support

* make sure to init m_formatType

* retab dependabot?

* geoparquet output

* configure CI to run arrow builds

* feather/parquet GeoParquet-style metadata reading

* missing file

* report read failure error information

* NOMINMAX for WIN32

* need NOMINMAX for tests too

* typo'd target names

* fix up geoparquet projjson output

* fix parquet reader

* remove extraneous Close()

* bump ci

* WIP

* arrow and parquet batch writing

* wip

* support pdal::Geometry creation from WKB

* set XYZ from GeoParquet wkb if it is there

* write XYZ as StructArray for GeoArrow compatibility

* warning nit

* GeoArrow support

* write arrow schema

* set 4326 for empty crs for geoparquet

* oops

* cleanups and docs

* add writers.arrow.write_pipeline_metadata to support writing final table metadata into the ARROW schema for the GeoArrow struct
hobu authored Sep 18, 2023
1 parent 4761ae9 commit 702ca29
Showing 28 changed files with 2,381 additions and 12 deletions.
7 changes: 7 additions & 0 deletions cmake/arrow.cmake
@@ -0,0 +1,7 @@
#
# Arrow configuration.
#

find_package(Arrow REQUIRED)
find_package(Parquet REQUIRED)

32 changes: 32 additions & 0 deletions doc/stages/readers.arrow.rst
@@ -0,0 +1,32 @@
.. _readers.arrow:

readers.arrow
==============


.. plugin::

.. streamable::

The Arrow reader reads Arrow- and Parquet-formatted data as written by
:ref:`writers.arrow`. It should also read point clouds written by other
software, provided they follow either the `GeoArrow <https://github.com/geoarrow/geoarrow/>`__
or `GeoParquet <https://github.com/opengeospatial/geoparquet/>`__ specification.

Caveats:

* Which schema is read is chosen by the filename extension, but this can be
  overridden by setting the ``format`` option to ``geoarrow`` or
  ``geoparquet``.

Options
-------

filename
  Arrow (GeoArrow) or Parquet (GeoParquet) file to read [Required]

format
  ``geoarrow`` or ``geoparquet``; overrides the data type inferred from the
  filename extension [Optional]

.. include:: reader_opts.rst
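The caveat and the `format` option together describe a simple precedence rule: an explicit `format` wins, otherwise the filename extension decides. A minimal standalone sketch of that rule follows. `ArrowFormat` and `chooseFormat` are hypothetical names, and mapping `.feather` to GeoArrow and `.parquet` to GeoParquet is an assumption based on the extensions this commit registers for `readers.arrow`; this is not the reader's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical enum; the reader's internal names may differ.
enum class ArrowFormat { GeoArrow, GeoParquet };

// Lower-case a copy of the input for case-insensitive comparison.
static std::string toLower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
        [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// An explicit `format` option takes precedence; otherwise fall back to
// the filename extension (assumed: .feather -> GeoArrow,
// .parquet -> GeoParquet).
ArrowFormat chooseFormat(const std::string& filename,
                         const std::string& formatOption)
{
    const std::string fmt = toLower(formatOption);
    if (fmt == "geoarrow")
        return ArrowFormat::GeoArrow;
    if (fmt == "geoparquet")
        return ArrowFormat::GeoParquet;
    if (!fmt.empty())
        throw std::invalid_argument("unknown format: " + formatOption);

    const auto dot = filename.rfind('.');
    const std::string ext =
        dot == std::string::npos ? "" : toLower(filename.substr(dot + 1));
    if (ext == "feather")
        return ArrowFormat::GeoArrow;
    if (ext == "parquet")
        return ArrowFormat::GeoParquet;
    throw std::invalid_argument("cannot infer format from: " + filename);
}
```

The sketch throws on an unrecognized extension instead of guessing, so a typo in `format` or an unexpected filename surfaces as an error.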

1 change: 1 addition & 0 deletions doc/stages/readers.rst
@@ -16,6 +16,7 @@ like :ref:`readers.pgpointcloud`, or a network service like :ref:`readers.ept`.
:glob:
:hidden:

readers.arrow
readers.bpf
readers.buffer
readers.copc
73 changes: 73 additions & 0 deletions doc/stages/writers.arrow.rst
@@ -0,0 +1,73 @@
.. _writers.arrow:

writers.arrow
===============

The **Arrow Writer** supports writing to `Apache Arrow`_ `Feather`_
and `Parquet`_ file types.

.. plugin::

.. streamable::



Example
-------

.. code-block:: json

    [
        {
            "type":"readers.las",
            "filename":"inputfile.las"
        },
        {
            "type":"writers.arrow",
            "format":"feather",
            "filename":"outputfile.feather"
        }
    ]

.. code-block:: json

    [
        {
            "type":"readers.las",
            "filename":"inputfile.las"
        },
        {
            "type":"writers.arrow",
            "format":"parquet",
            "geoparquet":"true",
            "filename":"outputfile.parquet"
        }
    ]

Options
-------

batch_size
  Number of rows to write per batch [Default: 65536 * 64 = 4,194,304]

filename
  Output file to write [Required]

format
  File type to write (``feather`` or ``parquet``) [Default: ``feather``]

geoarrow_dimension_name
  Name of the dimension under which the GeoArrow struct of XYZ is written
  [Default: ``xyz``]

geoparquet
  Write a WKB geometry column and GeoParquet metadata when writing Parquet
  output

write_pipeline_metadata
  Write PDAL pipeline metadata into the ``PDAL:pipeline:metadata`` entry of
  the ``geoarrow_dimension_name`` field
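Per the GeoParquet specification, GeoParquet metadata is a JSON document stored under the `geo` key of the Parquet file's key/value metadata. The sketch below assembles a minimal document for one WKB column of 3D points. It is an illustration only: the column name, the `version` string, and the helper name `geoParquetMetadata` are assumptions, and it omits the PROJJSON `crs` member that this commit writes (falling back to EPSG:4326 when the input has no CRS).

```cpp
#include <cassert>
#include <string>

// Build a minimal GeoParquet "geo" metadata document for a single WKB
// column of 3D points. Assumption: spec version "1.0.0". Without a
// "crs" member the spec defaults to OGC:CRS84; a real writer would add
// a PROJJSON "crs" object inside the column entry.
std::string geoParquetMetadata(const std::string& column)
{
    return std::string("{")
        + "\"version\":\"1.0.0\","
        + "\"primary_column\":\"" + column + "\","
        + "\"columns\":{\"" + column + "\":{"
        + "\"encoding\":\"WKB\","
        + "\"geometry_types\":[\"Point Z\"]"
        + "}}}";
}
```

For example, `geoParquetMetadata("geometry")` returns `{"version":"1.0.0","primary_column":"geometry","columns":{"geometry":{"encoding":"WKB","geometry_types":["Point Z"]}}}`.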

.. include:: writer_opts.rst
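With the default `batch_size` of 65536 * 64 = 4,194,304 rows, a cloud of N points is written as ceil(N / batch_size) batches, and only the last one may be short. A minimal sketch of that split follows, assuming rows are batched in point order; `batchRanges` is a hypothetical helper, not part of PDAL.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Split `numPoints` rows into [begin, end) ranges of at most `batchSize`
// rows each; only the final range may be shorter.
std::vector<std::pair<std::uint64_t, std::uint64_t>>
batchRanges(std::uint64_t numPoints, std::uint64_t batchSize)
{
    if (batchSize == 0)
        throw std::invalid_argument("batch_size must be positive");

    std::vector<std::pair<std::uint64_t, std::uint64_t>> ranges;
    for (std::uint64_t begin = 0; begin < numPoints; begin += batchSize)
    {
        const std::uint64_t end =
            begin + batchSize < numPoints ? begin + batchSize : numPoints;
        ranges.emplace_back(begin, end);
    }
    return ranges;
}
```

For instance, `batchRanges(10, 4)` yields `[0, 4)`, `[4, 8)` and `[8, 10)`.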

.. _Apache Arrow: https://arrow.apache.org/
.. _Feather: https://arrow.apache.org/docs/python/feather.html
.. _Parquet: https://arrow.apache.org/docs/cpp/parquet.html

6 changes: 5 additions & 1 deletion doc/stages/writers.rst
@@ -16,6 +16,7 @@ dimension type, while others only understand fixed dimension names.
:glob:
:hidden:

writers.arrow
writers.bpf
writers.copc
writers.draco
@@ -38,8 +39,11 @@ dimension type, while others only understand fixed dimension names.
writers.text
writers.tiledb

:ref:`writers.arrow`
write Apache Arrow Feather- or Parquet-formatted files

:ref:`writers.bpf`
Write BPF version 3 files. BPF is an NGA specification for point cloud data.
write BPF version 3 files. BPF is an NGA specification for point cloud data.

:ref:`writers.copc`
COPC, or Cloud Optimized Point Cloud, is an LAZ 1.4 file stored as a
66 changes: 56 additions & 10 deletions pdal/Geometry.cpp
@@ -99,6 +99,20 @@ Geometry::Geometry(OGRGeometryH g, const SpatialReference& srs)
}


Geometry::Geometry(double x, double y, double z, SpatialReference ref)
{
    OGRGeometry* geom(nullptr);
    OGRPoint point(x, y, z);
    geom = reinterpret_cast<OGRGeometry *>(&point);

    if (geom)
        m_geom.reset(geom->clone());

    setSpatialReference(ref);
}



Geometry::~Geometry()
{}

@@ -112,32 +126,50 @@ void Geometry::update(const std::string& wkt_or_json)
bool isJson = (wkt_or_json.find("{") != wkt_or_json.npos) ||
(wkt_or_json.find("}") != wkt_or_json.npos);

OGRGeometry *newGeom;
bool maybeWkt = (wkt_or_json.find("(") != wkt_or_json.npos) ||
(wkt_or_json.find(")") != wkt_or_json.npos);

// first byte is 00 or 01
bool maybeWkb = (wkt_or_json[0] == 0 || wkt_or_json[0] == 1);

OGRGeometry *newGeom (nullptr);
std::string srs;
if (isJson)

if (maybeWkb)
{
// createFromGeoJson may set the geometry's SRS for us
// because GeoJSON is 4326. If the user provided a 'srs'
// node, we're going to override with that, however
newGeom = gdal::createFromGeoJson(wkt_or_json, srs);
// assume WKB
newGeom = gdal::createFromWkb(wkt_or_json, srs);
if (!newGeom)
throw pdal_error("Unable to create geometry from input GeoJSON");
throw pdal_error("Unable to create geometry from input WKB");

if (srs.size())
if (!newGeom->getSpatialReference() && srs.size())
newGeom->assignSpatialReference(
new OGRSpatialReference(SpatialReference(srs).getWKT().data()));

}
else
else if (maybeWkt)
{
newGeom = gdal::createFromWkt(wkt_or_json, srs);
if (!newGeom)
throw pdal_error("Unable to create geometry from input WKT");
throw pdal_error("Unable to create geometry from input WKT");

if (!newGeom->getSpatialReference() && srs.size())
newGeom->assignSpatialReference(
new OGRSpatialReference(SpatialReference(srs).getWKT().data()));
}
else if (isJson)
{
// createFromGeoJson may set the geometry's SRS for us
// because GeoJSON is 4326. If the user provided a 'srs'
// node, we're going to override with that, however
newGeom = gdal::createFromGeoJson(wkt_or_json, srs);
if (!newGeom)
throw pdal_error("Unable to create geometry from input GeoJSON");

if (srs.size())
newGeom->assignSpatialReference(
new OGRSpatialReference(SpatialReference(srs).getWKT().data()));
}


m_geom.reset(newGeom);
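The updated `Geometry::update()` tries WKB first (first byte `0x00` or `0x01`, the WKB byte-order marker), then WKT (parentheses present), then GeoJSON (braces present). The standalone sketch below reproduces that ordering with the standard library only; `GeomEncoding` and `sniffEncoding` are hypothetical names. Unlike the hunk above, it rejects an empty string explicitly, because `s[0]` on an empty `std::string` yields the terminating `'\0'`, which would otherwise be mistaken for a WKB byte-order marker.

```cpp
#include <cassert>
#include <string>

// Hypothetical classification mirroring the order used in
// Geometry::update(): WKB, then WKT, then GeoJSON.
enum class GeomEncoding { Unknown, Wkb, Wkt, GeoJson };

GeomEncoding sniffEncoding(const std::string& s)
{
    // Guard: s[0] on an empty string would read the '\0' terminator.
    if (s.empty())
        return GeomEncoding::Unknown;

    // WKB starts with a byte-order marker: 0x00 (big-endian, XDR)
    // or 0x01 (little-endian, NDR).
    if (s[0] == '\x00' || s[0] == '\x01')
        return GeomEncoding::Wkb;

    // WKT geometries always carry a parenthesized coordinate list.
    if (s.find('(') != std::string::npos || s.find(')') != std::string::npos)
        return GeomEncoding::Wkt;

    // GeoJSON is a JSON object.
    if (s.find('{') != std::string::npos || s.find('}') != std::string::npos)
        return GeomEncoding::GeoJson;

    return GeomEncoding::Unknown;
}
```

The order matters: a JSON string value could contain parentheses, so any GeoJSON that does would be routed to the WKT branch by this heuristic.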
@@ -281,6 +313,20 @@ std::string Geometry::wkt(double precision, bool bOutputZ) const
return wkt;
}

std::string Geometry::wkb() const
{

    std::string output(m_geom->WkbSize(), '\0');

    char *buf;
    OGRErr err = m_geom->exportToWkb(wkbNDR, (unsigned char*) output.data(), wkbVariantIso);
    if (err != OGRERR_NONE)
        throw pdal_error("Geometry::wkb: unable to export geometry to wkb.");

    return output;
}
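`Geometry::wkb()` exports little-endian (`wkbNDR`), ISO-variant WKB. For a single 3D point that encoding is 29 bytes: the byte-order marker `0x01`, the ISO type code 1001 (Point Z) as a little-endian 32-bit integer, then X, Y and Z as little-endian IEEE 754 doubles. The stdlib-only sketch below writes that layout by hand to show what the exported bytes look like. `pointZToIsoWkb` is a hypothetical name, it handles points only, and it assumes the platform's `double` is IEEE 754.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <initializer_list>
#include <string>

// Append the low `n` bytes of `value` to `out`, least significant first.
static void putLE(std::string& out, std::uint64_t value, int n)
{
    for (int i = 0; i < n; ++i)
        out.push_back(static_cast<char>((value >> (8 * i)) & 0xFF));
}

// Encode a 3D point as little-endian ISO WKB ("Point Z", type 1001).
std::string pointZToIsoWkb(double x, double y, double z)
{
    std::string out;
    out.push_back('\x01');          // byte order: 1 = little-endian (NDR)
    putLE(out, 1001, 4);            // ISO geometry type: Point Z
    for (double v : {x, y, z})
    {
        std::uint64_t bits;
        std::memcpy(&bits, &v, sizeof bits);  // copy the IEEE 754 bit pattern
        putLE(out, bits, 8);
    }
    return out;
}
```

For `POINT Z (1 2 3)` the type bytes are `E9 03 00 00` (1001 = 0x3E9), and 1.0 is stored as `00 00 00 00 00 00 F0 3F`.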



std::string Geometry::json(double precision) const
{
4 changes: 3 additions & 1 deletion pdal/Geometry.hpp
@@ -52,10 +52,11 @@ class PDAL_DLL Geometry
{

public:
Geometry(const std::string& wkt_or_json,
Geometry(const std::string& wkt_or_wkb_or_json,
SpatialReference ref = SpatialReference());
Geometry();
Geometry(const Geometry&);
Geometry(double x, double y, double z, SpatialReference ref = SpatialReference());
Geometry(Geometry&&);
Geometry(OGRGeometryH g);
Geometry(OGRGeometryH g, const SpatialReference& srs);
@@ -77,6 +78,7 @@ class PDAL_DLL Geometry
Utils::StatusWithReason transform(SpatialReference ref);

std::string wkt(double precision=15, bool bOutputZ=false) const;
std::string wkb() const;
std::string json(double precision=15) const;

BOX3D bounds() const;
2 changes: 2 additions & 0 deletions pdal/StageExtensions.cpp
@@ -50,6 +50,7 @@ using Extensions = std::map<std::string, StringList>;

static const Extensions readerExtensions =
{
{ "readers.arrow", { "feather", "parquet"} },
{ "readers.draco", { "drc" } },
{ "readers.icebridge", { "icebridge", "h5" } },
{ "readers.matlab", { "mat" } },
@@ -69,6 +70,7 @@ static const Extensions readerExtensions =

static const Extensions writerExtensions =
{
{ "writers.arrow", { "feather", "parquet"} },
{ "writers.draco", { "drc" } },
{ "writers.fbi", { "fbi" } },
{ "writers.matlab", { "mat" } },
23 changes: 23 additions & 0 deletions pdal/private/gdal/GDALUtils.cpp
@@ -200,6 +200,29 @@ OGRGeometry *createFromWkt(const std::string& s, std::string& srs)
return newGeom;
}

/**
  Create OGR geometry given a WKB string and text SRS.
  \param s  WKB string to convert to OGR Geometry.
  \param srs  Text representation of coordinate reference system.
  \return  Pointer to new geometry.
*/
OGRGeometry *createFromWkb(const std::string& s, std::string& srs)
{
    OGRGeometry *newGeom(nullptr);

    size_t nBytesRead;
    OGRErr err = OGRGeometryFactory::createFromWkb(s.c_str(),
        NULL,
        &newGeom,
        s.size(),
        wkbVariantIso,
        nBytesRead);
    if (!newGeom)
        throw pdal_error("Couldn't convert WKB string to geometry.");

    return newGeom;
}
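WKB decoding starts with the byte-order marker, which decides how every following integer and double is read. The sketch below shows that step for the single case of an ISO Point Z (type 1001) in either byte order. It is illustrative only: `isoWkbToPointZ` is a hypothetical name, it supports no other geometry types and no EWKB/SRID extensions, and it returns `false` where `createFromWkb()` above would throw. Doubles are assumed to be IEEE 754.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Read an n-byte unsigned integer at `pos`, honoring the WKB byte order.
static std::uint64_t getUInt(const std::string& s, std::size_t pos, int n,
                             bool littleEndian)
{
    std::uint64_t v = 0;
    for (int i = 0; i < n; ++i)
    {
        const std::uint64_t byte = static_cast<unsigned char>(s[pos + i]);
        const int shift = littleEndian ? 8 * i : 8 * (n - 1 - i);
        v |= byte << shift;
    }
    return v;
}

// Decode an ISO WKB Point Z (type 1001) in either byte order.
// Returns false if the input is not exactly that geometry.
bool isoWkbToPointZ(const std::string& wkb, double& x, double& y, double& z)
{
    if (wkb.size() != 29 || (wkb[0] != '\x00' && wkb[0] != '\x01'))
        return false;
    const bool le = (wkb[0] == '\x01');   // 1 = NDR, 0 = XDR
    if (getUInt(wkb, 1, 4, le) != 1001)
        return false;

    double* out[3] = {&x, &y, &z};
    for (int i = 0; i < 3; ++i)
    {
        const std::uint64_t bits = getUInt(wkb, 5 + 8 * i, 8, le);
        std::memcpy(out[i], &bits, sizeof bits);  // bit pattern -> double
    }
    return true;
}
```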


/**
Create OGR geometry given a GEOjson text string and text SRS.
1 change: 1 addition & 0 deletions pdal/private/gdal/GDALUtils.hpp
@@ -68,6 +68,7 @@ PDAL_DLL std::vector<Polygon> getPolygons(const NL::json& ogr);
// New signatures to support extraction of SRS from the end of geometry
// specifications.
OGRGeometry *createFromWkt(const std::string& s, std::string& srs);
OGRGeometry *createFromWkb(const std::string& s, std::string& srs);
OGRGeometry *createFromGeoJson(const std::string& s, std::string& srs);

inline OGRGeometry *fromHandle(OGRGeometryH geom)
4 changes: 4 additions & 0 deletions plugins/CMakeLists.txt
@@ -67,3 +67,7 @@ endif()
if(BUILD_PLUGIN_E57)
add_subdirectory(e57)
endif()

if(BUILD_PLUGIN_ARROW)
add_subdirectory(arrow)
endif()