Streaming decoder #12
Hrm, on micropython
So this now works reliably to parse even the largest mapbox vector tiles, though on an embedded device it's quite slow. One thing I'm considering adding is start/end events, like a proper SAX API. Currently it's kinda hard to tell where repeated elements start and end.
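A rough sketch of what SAX-style start/end events could look like, not the code in this PR. The `events` generator, the `nested` schema dict, and the event tuple shapes are all made up for illustration, and only wire types 0 (varint) and 2 (length-delimited) are handled.

```python
import io

def read_varint(stream):
    """Return (value, bytes_consumed) for one base-128 varint."""
    result = shift = used = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("stream ended inside a varint")
        used += 1
        result |= (chunk[0] & 0x7F) << shift
        if not chunk[0] & 0x80:
            return result, used
        shift += 7

def events(stream, length, nested):
    """Walk `length` bytes of wire data, yielding SAX-style event tuples.

    `nested` (hypothetical layout) maps a field number to the schema dict
    of its sub-message; other length-delimited fields come out as raw bytes.
    """
    while length > 0:
        key, used = read_varint(stream)
        length -= used
        field, wire_type = key >> 3, key & 7
        if wire_type == 0:
            value, used = read_varint(stream)
            length -= used
            yield ("value", field, value)
        elif wire_type == 2:
            size, used = read_varint(stream)
            length -= used + size
            if field in nested:
                # Sub-message: bracket its fields with start/end events.
                yield ("start", field)
                yield from events(stream, size, nested[field])
                yield ("end", field)
            else:
                yield ("value", field, stream.read(size))
        else:
            raise ValueError("unsupported wire type %d" % wire_type)

# Field 3 is a sub-message holding field 1 = 150, repeated twice.
data = bytes([0x1A, 0x03, 0x08, 0x96, 0x01] * 2)
for event in events(io.BytesIO(data), len(data), {3: {}}):
    print(event)
# ('start', 3), ('value', 1, 150), ('end', 3), then the same three again
```

With events like these, each repeated element is bracketed by its own start/end pair, so the consumer doesn't have to guess where one element stops and the next begins.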
Is this ready?
Basically? At least it's ready for a review. I think there might be an optimization or two to be had, and I haven't looked at writing tests; I just threw a ton of data at it and it didn't break. One thing I saw in profiles is that my buffer wrapper is quite hot, and I think I can avoid a lot of that by not nesting it. One thing that is kinda unavoidable is that it's less robust to malformed input, because it reads things more lazily. It could also be interesting to consider whether it's worth porting the eager version over to this system, since the eager version is pretty wasteful with memory: it allocates a byte array for every level of nesting and then turns it into an IO again. Another small thing that might speed things up is implementing readinto on my buffer object somehow, so that the parsevint function doesn't need to allocate like it does now.
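To illustrate the `readinto` idea, here is a minimal sketch of varint decoding that reuses one preallocated scratch buffer instead of allocating a new bytes object per byte read. `read_varint` and `_ONE` are illustrative names, not the `parsevint` from this PR, and the stream is assumed to provide a standard `readinto()`.

```python
import io

_ONE = bytearray(1)  # shared scratch buffer, reused for every byte read

def read_varint(stream):
    """Decode one base-128 varint from a stream that supports readinto()."""
    result = 0
    shift = 0
    while True:
        # readinto() fills the existing buffer and returns the byte count.
        if stream.readinto(_ONE) != 1:
            raise EOFError("stream ended inside a varint")
        byte = _ONE[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result
        shift += 7

data = io.BytesIO(bytes([0xAC, 0x02, 0x01]))
print(read_varint(data))  # 300
print(read_varint(data))  # 1
```

The shared buffer makes this non-reentrant, which seems like an acceptable trade-off on a single-threaded embedded target.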
Concern: debug logging considerably slows the code down even when nothing is actually being logged, something like 1.9s vs 2.6s on a large file.
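One possible mitigation, sketched here assuming the standard `logging` module (the MicroPython port may behave differently): check the level once outside the hot loop, so a disabled debug call costs a local boolean test instead of a method call per item. The logger name and `sum_fields` are placeholders, and I haven't measured whether this is where the time actually goes.

```python
import logging

log = logging.getLogger("decoder_sketch")  # placeholder logger name

def sum_fields(values):
    """Hot loop that only calls log.debug() when DEBUG is enabled."""
    debug = log.isEnabledFor(logging.DEBUG)  # evaluated once, not per item
    total = 0
    for value in values:
        if debug:
            log.debug("value %d", value)
        total += value
    return total

print(sum_fields(range(5)))  # 10
```

This keeps the warning-level messages on the normal logging path, so integration into CPython projects is unaffected.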
So readinto actually resulted in a small speedup, but not nesting wrappers turned out to be way slower actually because So I guess I'm done with it. The question is what to do with logging and testing.
Since the core logic seems to be quite stable right now, I'm OK with removing the debug logs. However, I don't think changing warning messages to print is a good idea, since I still want it to be as easy to integrate into CPython projects as possible. Also, once you're finished, could you please add a couple of unit tests to
Will do. I did notice the loglevel logic seems incorrect, though I've kinda forgotten the details.
Ohai, would you still be interested in this if I got around to adding tests?
As explained in #11, this PR adds a Wire class that decodes the data without allocating everything at once.
It's a bit hacky because it modifies some things to defer doing work until later, which narrowly avoids interfering with normal Wire operation. Maybe there is a way to separate them more cleanly or to make Wire itself lazier, but I didn't want to break Wire too much.
One example would be, instead of keeping track of repeats, to just repeat the value in the fmt dict so that you can index into it directly rather than searching for the right value.
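A tiny sketch of that idea, using a made-up compact layout of `(type_code, count)` pairs rather than the real fmt structure used by Wire: expanding the repeats up front trades a little memory for a direct lookup by field number.

```python
def expand_fields(compact):
    """Flatten (type_code, count) pairs so flat[field_number - 1] is the type."""
    flat = []
    for type_code, count in compact:
        flat.extend([type_code] * count)
    return flat

# Hypothetical schema: fields 1-2 are "V", field 3 is "U", fields 4-6 are "z".
compact = [("V", 2), ("U", 1), ("z", 3)]
flat = expand_fields(compact)
print(flat[4 - 1])  # type for field number 4: z
```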
Probably needs some review, tests, and polish.
Here is some code I used for testing:
Which outputs
This is using a mapbox vector tile sample:
330.zip