.splat universal format discussion #47
Comments
I think that is a great idea; given all the different splat viewers that have been developed, I think this was pretty much inevitable. I think one of the first questions we need to answer is: have we identified all the potential stakeholders in such a project? Ideally the decisions we make here would produce a universal compact format that would be beneficial to anyone that has implemented a viewer (commercial, open-source, and so on). Or maybe my thinking is a little too grandiose at this point and we should just move forward and hope others will join. I'm sure you've read over Aras Pranckevičius's blog about compressing splat files; it would also probably be good to reach out to Kevin Kwok (if you haven't already). As far as the actual approaches we use to organize the data and/or compress it, I'm totally open to suggestions. I admit my current implementation is fairly quick & dirty; I just wanted to get something in place that would cut down the size of the |
Sounds great! I think a universal compact format would be very nice, or at least a plan how the open source community can stay in sync as it keeps improving. I have reached out to Aras Pranckevičius and Kevin Kwok and pointed them here. |
Wall of text!

Unity Gaussian Splatting format

Thought process behind it is in my blog posts (one, two), but it's kinda like this:
Data "header" is like:
Now the data itself is separate "conceptual files" of:
For color data (which is RGB color plus opacity), I store that in a 2D texture, to enable GPU compression. The texture width is always 2048 (allows for up to 32M splats total, given that max GPU texture height is 16k); height is dependent on splat count but always a multiple of 16. And the order that the splat data is laid out inside the texture is not simple row-major, but rather each "chunk" (256 splats) is put into a 16x16 block, and within a block pixels are arranged in Morton order.
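To make that layout concrete, here is a small TypeScript sketch of the indexing scheme (my own illustration of the description above; the actual bit interleaving and chunk ordering in the Unity project may differ):

```ts
// Illustrative only: mapping a splat index to a texel in a 2048-wide texture, assuming
// 256-splat chunks stored as 16x16 blocks with Morton-ordered pixels inside each block.
function splatIndexToTexel(splatIndex: number): { x: number; y: number } {
  const chunkIndex = splatIndex >> 8;      // 256 splats per chunk
  const withinChunk = splatIndex & 0xff;   // index 0..255 inside the chunk

  // Morton (Z-order) decode: even bits form x, odd bits form y (4 bits each for 16x16).
  let mx = 0;
  let my = 0;
  for (let bit = 0; bit < 4; bit++) {
    mx |= ((withinChunk >> (2 * bit)) & 1) << bit;
    my |= ((withinChunk >> (2 * bit + 1)) & 1) << bit;
  }

  // Chunks tile the texture left-to-right, top-to-bottom: 2048 / 16 = 128 chunks per row.
  const chunksPerRow = 2048 / 16;
  const blockX = (chunkIndex % chunksPerRow) * 16;
  const blockY = Math.floor(chunkIndex / chunksPerRow) * 16;
  return { x: blockX + mx, y: blockY + my };
}
```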
For rotation+scale+SH data, it's like this:
Additional data transformations that are done:
In my implementation the "Very Low" preset is not the absolutely smallest allowed by these formats,
So overall this is like 13.25 bytes/splat, plus 360 KB of SH palette data for the whole splat cloud. For a million splats, this works out to roughly 13.6 MB.

antimatter15/splat .splat format

From what I can tell, the format there is 32 bytes/splat (for a million splats: 32 MB):
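For reference, a TypeScript sketch of reading one such 32-byte record (the field order of position, scale, RGBA color, then a quantized rotation is my reading of the antimatter15 layout, so double-check it against that repo):

```ts
// Sketch of parsing one 32-byte splat record: 3x float32 position, 3x float32 scale,
// 4x uint8 RGBA color, 4x uint8 rotation. Decoding conventions here are assumptions.
function readSplatRecord(buf: ArrayBuffer, index: number) {
  const f32 = new Float32Array(buf, index * 32, 6);   // position (3) + scale (3)
  const u8 = new Uint8Array(buf, index * 32 + 24, 8); // color (4) + rotation (4)
  return {
    position: [f32[0], f32[1], f32[2]],
    scale: [f32[3], f32[4], f32[5]],
    color: [u8[0], u8[1], u8[2], u8[3]],                              // RGBA in 0..255
    rotation: [u8[4], u8[5], u8[6], u8[7]].map(v => (v - 128) / 128), // quaternion components
  };
}
```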
The format drops all SH data, so some of the realism of "shininess" of surfaces is lost when looking at them from different angles. Some "probably easy" ways of making this data smaller:
The above would get it down to 20 bytes/splat, but the format would still not have spherical harmonics.

gsplat.tech format

I don't recall that format's details right now, but IIRC it was something like:
Difference here is that instead of storing rotation+scale, they store 6 numbers of the "3D covariance". This saves a tiny bit of calculations in the shader, but you can't easily factor this out into rotation/scale again if you want to visualize splats as something other than splats (e.g. as oriented boxes). Their genius bit is the clustered/paletted SH table idea. They also do something with opacities: IIRC they are not stored directly in the 8 bits available, but rather each of the 256 opacity values indexes into a premade table. I guess this achieves a similar effect as my "non-linear opacity transformation" above. Their splat data is thus 24 bytes/splat, and IIRC they always store 64k possible SH entries in half precision float format, so that table is always like 6 MB. For a million splats: 30 MB of data.

Wot I think a format could be

So the first question is, do you want a "simple" format like .splat or gsplat.tech, where there are no "chunks" but rather data for each splat is just stored somehow quantized acceptably. This is simple, but probably hard to get below ~20 bytes/splat. With "chunking" like what the Unity project does, it gets a bit more complicated, but since each chunk stores a min/max value range, it is possible to quantize the actual splat values into a smaller number of bits while still retaining good quality. This is important for positions, scales and color data.

Another question is, do you want to have spherical harmonics data or not. It's probably out of the question that each splat would store any form of SH data per-splat, since it's way too large. Even if you cut it down massively (e.g. BC1 GPU compression like in https://aras-p.info/blog/2023/09/13/Making-Gaussian-Splats-smaller/), that is still 7.5 bytes/splat just for SH. For the web, I think the only practical choices are:
I quite like the chunking approach TBH, and it's not terribly complicated. Keeping in mind WebGL2, which can't read from arbitrary data buffers, there's a certain elegance in putting all the possible data formats into GPU textures, and letting the GPU hardware do all the sampling and decoding. I initially had that in the Unity project, but then backed out of that partially because things like "float3" just don't exist as a GPU format (WebGL has it, but internally for GPUs that gets turned into a float4, thus wasting some VRAM). However, that is not a big deal, and specifically for the web, I doubt anyone would use float3 format options. So it might make sense to put everything into textures, laid out in the same order as the color data in the Unity project case, i.e.:

All the splats are put into "chunks" of 256 splats each. These are preferably put in some sort of "chunk is small / close in space" fashion, e.g. by rearranging splats in 3D Morton order by position or some other way. Each chunk stores min/max values of: position (float3 x2), scale (half3 x2), color and opacity (half4 x2). This is 52 bytes/chunk (or 0.2 bytes/splat). For WebGL2 usage, this could be put into a R32UI texture, with rows of 13 pixels containing raw chunk data bits, and within the shader you convert from raw bits into floats and halves.

Now, you also have more textures, with per-splat data:
So the defaults listed above would be 16 bytes/splat (or 13 bytes/splat when using UASTC). And the SH palette data would be stored in half3 format, each of the 15 RGB SH values arranged in a 4x4 pixel block (with one pixel unused), similar to how gsplat.tech does it. For a 4k SH palette that would be 386 KB.

But if really, really needed, you could get crazily lower, with zero added complexity (since all the data is "just textures" and the shader code does not really care how the GPU decodes them): positions, rotations, scales and color all using UASTC, and drop the SH index. Now it's just 4 bytes/splat; it would look a bit like https://aras-p.info/img/blog/2023/gaussian-splat/GsTruck_4VeryLow.jpg but if you really need to go super small then heck why not.

Now, another question is whether all this data should be in one file, or multiple files. I don't know much about web practices, like is it better to load one file, or multiple files in parallel? One file (like .splat or .ply) is very convenient to use. gsplat.tech IIRC loads from something like 4 files at once. I don't know which approach is better. Conceptually, if it's one file, I'd put data in this order:
This way you can display "something" while it's loading, kinda similar to Luma's "magic reveal" but not quite:
"Technology" needed to build all of the above (all/most of that exists in Unity project, but it's all written in C#):
All of the above maybe could be done as some ad-hoc format, or maybe as some way of using glTF2. @hybridherbst might know more there. |
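To put the chunk idea into code, here is a minimal TypeScript sketch of per-chunk min/max quantisation; the bit width and storage layout here are placeholders, not what the Unity project (or any proposed format) actually uses:

```ts
// Quantise one attribute channel of a 256-splat chunk into fixed-point values relative
// to the chunk's min/max range; dequantisation is the mirror image. Illustrative only.
function quantizeChunk(values: Float32Array, bits: number) {
  let min = Infinity;
  let max = -Infinity;
  for (const v of values) {
    min = Math.min(min, v);
    max = Math.max(max, v);
  }
  const levels = (1 << bits) - 1;
  const scale = max > min ? levels / (max - min) : 0;
  const quantized = new Uint16Array(values.length); // assumes bits <= 16
  for (let i = 0; i < values.length; i++) {
    quantized[i] = Math.round((values[i] - min) * scale);
  }
  return { min, max, quantized };
}

function dequantize(q: number, min: number, max: number, bits: number): number {
  return min + (q / ((1 << bits) - 1)) * (max - min);
}
```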
Hi, we would be very much on board with finding a good universal representation for the ref implementation too! Speaking only for myself, my current thought is that it makes sense to leave this mainly to the community, but there's one concrete concern: In order to preserve the usability of the representation in scientific contexts as well, it would require at least an OPTION where everything, including SHs, is stored losslessly. We would love to be kept in the loop! If a reasonable consensus is found, we would do our best to quickly support it in the ref! Best, |
Yeah, that makes a lot of sense. This is what I do in the Unity project too, e.g. some of the (WIP) splat editing tools are only enabled and only work if literally everything is full floats. Then I actually don't do any "chunking" at all (since it's both cumbersome when editing splats, and might lead to precision loss). |
needs:
|
Isn't this just original PLY files? |
Thanks @aras-p for the amazing wall of text! I'm on board with the layout you described. My general sense is that it's a bit like .png vs .jpg: PLY as the lossless original, and a compact format for delivery. |
Not sure if the .png vs .jpg analogy holds up all that well. PLY is completely uncompressed, full float32 precision data (whereas .png is "compressed losslessly"). FWIW I tried doing some lossless compression of 3DGS data, but it does not compress well, mostly because things like rotations and scales are very random.

Anyway, probably the first question is what scope we're targeting. Everything I wrote above is more towards "this represents a single gaussian splat cloud, nothing else". @chris-aeviator's comment above indicates needs for it to be more extensible and/or the ability to augment it with some additional metadata.

I think (ab)using glTF2 might be a very viable way to look into that. glTF2 itself would provide the ability to put "more than just the splat" into a file in case someone needs it (e.g. positions of the cameras, transform of the splat itself, etc.). A good question though is how exactly to represent the data of the splats. If we go towards the "everything is actually put into textures" idea as in my previous comment, then maybe splat data could be put into glTF2 roughly like so:

The glTF2 file defines "something dummy" for a mesh (if that's needed at all), like one quad or something.
Then it defines / references all the data textures needed for the splats.
And then it defines a custom "splat material" with some "extra" properties that have nothing to do with standard PBR materials, but instead reference the needed data textures, as well as any other data as needed.

Advantage of glTF2 is that it's very much "native" for the web stack, i.e. almost any 3D engine on the web supports it, including things like UASTC texture data transcoding handling. And it would be somewhat "extensible" for the future (animated splats, etc.), because the "file format" is just glTF2. But whether any of the above makes sense at all would have to be evaluated by someone who actually knows anything about glTF2. Maybe I'll ask some people around :) |
I haven't fully digested all the great points above (and I don't have any experience with texture compression), so some of this might be wrong! But so far here's how I've been thinking:

I would like the format to support streaming, where something can be shown as soon as possible. And in particular, I would like to support a kind of "early termination" where, on devices that are compute- and/or bandwidth-constrained, only some subset of the splats is loaded. I.e. I would like it if mobile devices could just fetch the first n MB of a file, abort the transfer, and still be able to deliver an acceptable interactive user experience.

I would like the format to be deliverable as a single file, rather than a number of files or a folder.

I would like the file format to support sharing additional information, for instance the contents of

I think it would be nice to support palettized spherical harmonics, but loading them after all the uniformly colored splats. Additionally, I would like the format to be fairly simple to parse and to generate. I think

Another thing I have played around with a little bit is to take the "far away" splats and condense them into a panoramic skybox. At the very least I think that the format should be able to represent whether the background is assumed to be black, white or transparent. But having an arbitrary skybox cubemap texture might also be useful for compression.

I think that the space is probably evolving too fast for this format to be the "last word" on splat shipping. I haven't really thought about what the right format would be for dynamic/animated splats, or for scenes consisting of multiple splats (either somewhat naively composed, or arranged into some regular grid). Perhaps in the future there might be a way to do coarse-to-fine/LOD splats. My thinking was a new
|
It might be that some of the points I raised really belong to a streaming-compatible format; so much of this reminds me of the tiling used e.g. in vector maps.
If glTF supports additional metadata this would need schemas, but suffice it to say that people will want to vary precision and the number of SH channels - these contribute the most to file size.
|
I'd be thrilled if GLTF2 could be made to store splats; on the surface, it seems to support both the streaming and extensibility features one would want for an evolving, bleeding-edge rendering primitive. This article provides context for how BabylonJS handles the API for streaming GLTF2 loading via Microsoft's LOD Extension: https://doc.babylonjs.com/features/featuresDeepDive/importers/glTF/progressiveglTFLoad Gaussian Splats have the added benefit that LODs are additive; presumably the low LODs will consist of the largest % of splats, with smaller, more transparent splats loading as part of the high LODs. |
I love aras's chunked approach, yes yes and more yes. For LOD, I wonder if you could do something within the 256-splat chunks that is like a mini treelet: you have a single mega-splat representing the whole chunk, then do a radix-4 or radix-16 tree from there, storing deltas to your parent params? I guess you'd have to see if it actually helped, but I'm kinda thinking along the lines of: it's more likely to make the splat values close to 0, so that a byte-oriented compression of the output (brotli or lz4 or something) would get to squish it harder, without overly complicating the format. You could even 'delta' and 'undelta' at serialisation time, in-place, so that the in-memory / in-texture format is exactly as aras describes (plus the tree structure for LOD, I guess) but the on-disk version has had the parent values subtracted out to make things more compressible. |
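A toy TypeScript sketch of that delta/undelta step (the parentOf array describing the treelet is hypothetical, and indices are assumed to be ordered so that parents come before children):

```ts
// On save: replace each value by the delta against its treelet parent, so values cluster
// near zero and byte-oriented compressors (brotli, lz4, ...) squeeze them harder.
// Processing children before parents (reverse index order) keeps parent values intact
// while their children still need them as references.
function deltaAgainstParent(values: Int16Array, parentOf: Int32Array): void {
  for (let i = values.length - 1; i >= 0; i--) {
    const p = parentOf[i];
    if (p >= 0) values[i] -= values[p]; // roots have parentOf[i] === -1
  }
}

// On load: restore parents before their children (forward index order) to undo the deltas
// in place, so the in-memory / in-texture layout ends up exactly as before.
function undeltaAgainstParent(values: Int16Array, parentOf: Int32Array): void {
  for (let i = 0; i < values.length; i++) {
    const p = parentOf[i];
    if (p >= 0) values[i] += values[p];
  }
}
```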
@mmalex you know where I got the idea for the chunked style, right? From your own Dreams presentation :) |
I second what several others have already mentioned about supporting "future" data. This space is indeed evolving fast and I think whatever strategy we ultimately land on should be adaptable/flexible enough to evolve with it (I know that's easy to say and possibly not so easy to do :) ). I also believe supporting some sort of LOD mechanism is very important and will ultimately be required if we want these viewers to be capable of rendering large scenes. Gaussian splat LODs are out of my wheelhouse so I am unsure whether or not the lower-fidelity data should be produced at the same time as the original

@zalo and @mmalex -- I would definitely like to learn more about the LOD strategies you are proposing. @zalo -- If Gaussian splat LODs are additive, would supporting LODs simply be a matter of properly ordering the base data (highest fidelity) within the

As far as compression goes, I'm a big fan of the chunked approach as well -- thank you @aras-p for sharing your very detailed and insightful thoughts on this matter. |
glTF 2.0 has a concept of "extensions", and that's usually the path by which new features are added and adopted. Here, I'd imagine defining something like:

"scenes": [
  { "nodes": [ 0 ] }
],
"nodes": [
"nodes": [
{
"name": "MySplat",
"extensions": {
"EXT_splat": {
"count": 1024,
"chunkData": 25, // accessor index to f32[] data?
"positionTexture": 0, // texture index
"rotationTexture": 1,
"scaleTexture": 2,
"colorTexture": 3,
"shPaletteTexture": 4,
}
}
}
],

The texture indices resolve to a texture associated with the file, which could be PNG or UASTC or something else. Future extensions could add new texture formats to glTF 2.0, and that wouldn't affect the extension itself.

If anyone would like to help with creating input data (.png or .exr uncompressed textures?) and defining the metadata, I'd be happy to help with converting textures to KTX2/UASTC and constructing glTF files using the hypothetical extension above. Also see https://github.com/donmccurdy/KTX2-Samples/blob/main/encode.sh for examples of KTX2 encoding steps (requires the latest KTX Software CLI alpha release).
Neither is strictly better. Web clients can use range requests to grab chunks of a file as if they were multiple files. But not all applications or servers implement range requests, and there is a bit of overhead on each request, so choices vary. glTF has some flexibility here — .glb uses embedded resources, .gltf uses external resources, and conversion between the two (including any glTF extensions) is trivial. |
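For illustration, a tiny fetch-based sketch of the range-request option mentioned above (it assumes the server honours the Range header; a client that wants "early termination" could simply stop after the first prefix of a single file):

```ts
// Fetch only the first `byteCount` bytes of a single splat file via an HTTP range request.
// Servers that ignore Range will respond with 200 and the full body instead of 206.
async function fetchPrefix(url: string, byteCount: number): Promise<ArrayBuffer> {
  const response = await fetch(url, {
    headers: { Range: `bytes=0-${byteCount - 1}` },
  });
  if (!response.ok) {
    throw new Error(`Request failed: ${response.status}`);
  }
  return response.arrayBuffer();
}
```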
We're obviously heading at breakneck speed towards the Splataverse ... someone had better reserve the domain names, etc, for this creature ... |
I took the domain splats.ai last month LOL.
I come from a 3D graphics background and have played with glTF a lot, so I will see what I can do to help with a 'glTF2 extension that loads splats'. Even if the community agreed on creating a new file format, the glTF extension approach would still be valuable because glTF easily reaches the broader 3D audience. Do you think this glTF extension idea should be discussed separately? |
We're busy implementing a slightly compressed GS PLY format as an interim solution while an all-bells-and-whistles format is being thrashed out. It takes ideas directly from Dreams (@mmalex) and @aras-p and packages the data into a standard PLY file (though using non-standard PLY properties). The PLY file contains two elements:
This gives roughly 4x saving over uncompressed PLY data without much quality degradation (visual tests still to be done). We have a PR implementing decompression here. It's a simple format with narrow scope, but please do let us know if we've missed anything obvious! |
@slimbuck nice! I very much like the simplicity. I would think that within the same size you could improve quality slightly (just a guess, I haven't actually tested it), at the expense of a small amount of complexity:
|
Thanks @aras-p! I wasn't sure the extra complexity was worth it in such a simple format, but perhaps it is. I'll make these changes and compare the results to see if they make any difference. Thanks again! |
I did some testing today with the garden scene, train scene and guitar scene and found the 2/10/10/10 quaternion format lowered reconstruction error deviation by a good 12-33%. I've updated the format to adopt this. I also changed chunks to store min/max instead of min/size, which just makes a lot of sense. I tried the squared opacity mapping change too, but actually found that it resulted in worse error variance with my test scenes. Not too sure why this might be. So we're just storing plain old opacity for now. We added this format to our editor tool's import & export if anyone is interested to give it a try. Next up we will likely investigate some sort of splat LOD and support for skinning. It would be great to hear if anyone has started investigating either of these! |
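For anyone unfamiliar with the 2/10/10/10 idea, here is a generic "smallest three" packing sketch in TypeScript (the exact bit order and value mapping used in the PlayCanvas format are assumptions on my part):

```ts
// Pack a unit quaternion into 32 bits: 2 bits select the largest-magnitude component
// (which is dropped and reconstructed on decode), and the remaining three components are
// stored in 10 bits each over the range [-1/sqrt(2), 1/sqrt(2)].
function packQuat(q: [number, number, number, number]): number {
  // Find the largest-magnitude component; flip the sign so it is positive
  // (q and -q represent the same rotation).
  let largest = 0;
  for (let i = 1; i < 4; i++) {
    if (Math.abs(q[i]) > Math.abs(q[largest])) largest = i;
  }
  const sign = q[largest] < 0 ? -1 : 1;

  const bound = Math.SQRT1_2; // the three smallest components are bounded by 1/sqrt(2)
  let packed = largest;       // 2 bits
  for (let i = 0; i < 4; i++) {
    if (i === largest) continue;
    const t = ((sign * q[i]) / bound) * 0.5 + 0.5;            // map to [0, 1]
    packed = (packed << 10) | (Math.round(t * 1023) & 0x3ff); // 10 bits per component
  }
  return packed >>> 0; // 2 + 3 * 10 = 32 bits
}
```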
If the splat format is being renewed, would it be possible to include bounds in the header somewhere? Especially for streaming it would be very useful to be able to dynamically center/position a splat before it has fully loaded. I am using the antimatter format currently with streaming, so the model appears immediately, but it has no interop with center and fitting components. |
Would the LightGaussian paper be of help here? |
Using extensions is great. It is one of the gems of the GLTF format. For the GLTF Gsplat format, here's our (Adobe's preliminary) practice:
Our GLTF format is a direct translation of the *.PLY format. A GLTF Gaussian splats file contains an extension named
In addition, we also have a Finally, we have a |
Maybe @zeux has some ideas from meshopt compression? |
@arpu I have tried using meshopt on gaussian splat data, and the results are, well, "not stellar". Things like rotations and scales are very random, and don't losslessly compress well. |
@aras-p Unsure if you tried filters (e.g. rotations would need to use the quaternion filter, scales a shared-exponent filter), but they would be a requirement for sure. Realistically you'd probably also need custom filters for some data types for best performance. Lossless is definitely a no-go in any event. Finally, for geometry data the meshopt codecs rely on implicit spatial coherence; GS data would need to be clustered in some way for the codecs to work. (One potential issue is that different components of GS might not be as well correlated as vertex data, so this may not work as well.)
@nepluno It would be cool if there were a way to compose with draco compression as is already implemented and sort-of-standardized for point cloud data (it exists tho not sure how often used https://github.com/google/draco/blob/7d58126d076bc3f5f9d8c114d1700b7311faecfe/src/draco/io/point_cloud_io.h#L56 ) In theory, couldn't gsplat GLTF use no extensions, just use a bunch of standard buffers, then leave it up to the viewer gl code to interpret the data? That provides some fallback (e.g. store points as a standard point cloud) and gives some wiggle room for viewer improvements that are likely in the coming 6-24 months. That said, I know a main point of this discussion is to attempt to get some consensus versus hacky mitigations ... |
@pwais Yes I fully agree on this! There's already some discussion on supporting Draco for point cloud in GLTF. KhronosGroup/glTF#1809
That could also be doable...but the main goal of using GLTF is that we need some hierarchy to do some sort of composition (otherwise using just PLY is fine), and hence we need some tag to identify that a buffer is related to a Gsplat and another buffer is not. |
If attributes were stored in accessors rather than buffers (see #47 (comment)), we would:
By using the existing glTF accessor/bufferview/buffer constructs, you have the choice of storing data as base64 data URIs, or as binary. The three.js community uses glTF extensively, and I always recommend avoiding these data URIs because the loading cost is significant, unless the data involved is trivially small. |
The hierarchy and organization is def helpful, but pretty sure that the performance benefit of GLTF is that the buffers largely skip any JS intervention / SERDES, unlike PLY (at least canonical THREE PLY loader appears to be non-native code). That might make a marginally small benefit if you just load one single object / scene to viz, even if it's a few megabytes... But if you want to viz a stream of splats / point cloud data, then GLTF beats the pants off anything that requires javascript SERDES. E.g. try loading 1,000 frames of ~1MB PLYs (or protobufs or your fav buffer format) versus 1,000 frames of GLTFs. And scrub over those frames. Browsers / phones from even 2-4 years ago, GLTFs are night-and-day better, especially above 1M points. (Like imagine if you're working in robotics, and you want to introduce a replacement for rosbag... you surely would try out GLTF wouldn't you? no?!)
@donmccurdy 💯 💯 base64 data uris are extremely handy hacks but +1 to optional but not required |
Agreed. Storing the attributes into individual accessors would definitely be better, just as how mesh vertices are stored. |
Yes! That's another reason we'd make a schema to store Gsplat in GLTFs. |
A bit late to the party here, but is there a magic number or specific file header that we all agree on that I can reliably use to detect whether or not a given file is a .splat file? |
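As far as I know there is no agreed-upon magic number for raw .splat data today, so in practice you can only rule out formats that do have one; a minimal sniffing sketch (PLY files start with the ASCII bytes "ply", binary glTF with "glTF"):

```ts
// Best-effort detection: recognise formats with a known magic, otherwise assume raw .splat.
function sniffSplatFormat(bytes: Uint8Array): "ply" | "glb" | "assume-splat" {
  const head = String.fromCharCode(...bytes.subarray(0, 4));
  if (head.startsWith("ply")) return "ply"; // ASCII PLY header ("ply\n...")
  if (head === "glTF") return "glb";        // binary glTF container magic
  return "assume-splat";                    // no magic: treat as raw fixed-size records
}
```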
Stumbled upon this thread, but as I've been working more and more with .splats in the architecture space/use cases, I think .gltf would be a great container for it, especially with meshopt compression etc. |
So you would split up the different attributes of a splat vertex into what parts of the gltf file exactly? I'm not as familiar, so I don't immediately know how I would go about that. |
@arcman7 if you scroll up to some previous comments you can see some suggestions from folks, but .gltf can already show off point clouds using point primitives, so that handles the XYZ locations of all the points in a splat,
Has anyone tried to use Python to implement ply conversion to ksplat? |
After posting on an issue on the KhronosGroup glTF repo, it was suggested I post this here as well, since it's relevant here too and would get more eyes on it: One concern raised in the MrNerf Discord server is that standardizing on a format this early in the game might either stifle innovation, or result in an explosion of competing standards, especially in a field as actively researched as this. While the creation of standards is inevitable, whether they will be able to be successful at this point remains to be seen. After some further discussion, we instead worked out a super early(!) draft for a high-level container format. |
Just cross-linking a discussion (nianticlabs/spz#7) given the recent announcement by Niantic of their open-sourcing of the spz format, which might be interesting to some people in this discussion.
Hello! I'm the author of gsplat.js, in which I'm using the splat format as provided in antimatter15/splat
I have opened an issue on splat compression, and I think it would be great if we can have a universal representation, with a consistent header to support different compression methods.
I can replicate your compressed format, but maybe we can open a common format repo with test files, so we can stay on the same page?
What do you think?