Improve and document our GeoArrow compatibility story #8123

Open
abey79 opened this issue Nov 13, 2024 · 0 comments
Labels: 🔩 data model, 📖 documentation, feat-map-view

The new map-related archetypes can already interoperate with GeoArrow efficiently, including without copying the coordinate data. How to do this should be documented, and helpers should be added to the SDK.

Here is a mostly self-explanatory script showing ways to efficiently log GeoArrow data and interoperate with GeoPandas, along with a hint of what future helpers could look like:

from __future__ import annotations

from typing import Iterable

import geoarrow.pyarrow as ga
from geopandas import GeoSeries
import pyarrow as pa

import rerun as rr


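# Let's assume we have some GeoArrow data coming from somewhere.
# IMPORTANT: This covers points only. Things get a bit more complicated once lines and polygons are involved (the
# latter of which we don't support yet).
# NOTE: The coordinates below are written in (lat, lon) order, which is the order `rr.GeoPoints(lat_lon=...)` expects,
# whereas WKT conventionally lists x (longitude) before y (latitude).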
data = ga.as_geoarrow(
    ["MULTIPOINT((37.7749 -122.4194), (34.0522 -118.2437), (32.7157 -117.1611))"],
    coord_type=ga.CoordType.INTERLEAVED,  # Rerun needs interleaved coordinates for zero copy
)

rr.init("rerun_example_raw_geoarrow")
rr.connect_tcp()


# ======================================================================================================================
# Scenario 1: zero-copy logging of raw GeoArrow data

# Convert the data to a NumPy array of the right shape without copying
data_as_np_array = data.storage.values.values.to_numpy(zero_copy_only=True).reshape(-1, 2)
rr.log("points", rr.GeoPoints(lat_lon=data_as_np_array))


# ======================================================================================================================
# Scenario 2: GeoPandas interoperability

# Load the GeoArrow data into a GeoPandas series (there are many other ways to load data to a GeoSeries)
series = GeoSeries.from_arrow(data)

rr.log("geopandas_points", rr.GeoPoints(lat_lon=series.get_coordinates().to_numpy(copy=False)))


# ======================================================================================================================
# Scenario 3: logging wrapper for GeoArrow data (we should definitely provide these in the SDK eventually)


class GeoArrowLatLonBatch(rr.ComponentBatchLike):
    def __init__(self, lat_lon: pa.Array) -> None:
        # TODO: validate input data format
        self.lat_lon = lat_lon

    def component_name(self) -> str:
        return "rerun.components.LatLon"

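    def as_arrow_array(self) -> pa.Array:
        # Drop the outer MULTIPOINT list level, then cast the coordinate pairs to Rerun's LatLon Arrow type.
        return self.lat_lon.storage.values.cast(rr.components.LatLon.arrow_type())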


class GeoPointsWrapper:
    lat_lon: pa.Array

    def __init__(self, lat_lon: pa.Array) -> None:
        self.lat_lon = lat_lon

    def as_component_batches(self) -> Iterable[rr.ComponentBatchLike]:
        return [rr.GeoPoints.indicator(), GeoArrowLatLonBatch(lat_lon=self.lat_lon)]


rr.log("wrapper_points", GeoPointsWrapper(data))
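
The script asks for `ga.CoordType.INTERLEAVED` because Rerun needs interleaved coordinates for zero copy. The following is a minimal, standard-library-only sketch of why the layout matters. It does not use GeoArrow, NumPy, or Rerun, and every variable name in it is illustrative. An interleaved buffer can be reinterpreted as an (N, 2) table of pairs in place, while a separated layout (one buffer per axis) has to be repacked into a new buffer first.

```python
from array import array

# Interleaved layout: a single flat buffer [lat0, lon0, lat1, lon1, ...].
interleaved = array("d", [37.7749, -122.4194, 34.0522, -118.2437, 32.7157, -117.1611])

# Reinterpret the same memory as an (N, 2) table of pairs. memoryview.cast()
# reshapes only via a byte view, hence the intermediate cast to "B".
pairs = memoryview(interleaved).cast("B").cast("d", shape=[len(interleaved) // 2, 2])

# The view shares memory with the original buffer: a write is visible through it.
interleaved[0] = 37.0
shares_memory = pairs[0, 0] == 37.0

# Separated layout: one buffer per axis. Producing (lat, lon) pairs requires
# writing the values into a new buffer, i.e. a copy.
lats = array("d", [37.7749, 34.0522, 32.7157])
lons = array("d", [-122.4194, -118.2437, -117.1611])
repacked = array("d", [value for pair in zip(lats, lons) for value in pair])
```

The reshape in Scenario 1 plays the same role as the `memoryview` cast here: it changes how the existing buffer is indexed rather than moving any values.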