-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor many things, fix extra/missing commas #58
base: develop
Are you sure you want to change the base?
Commits on Apr 6, 2023
-
ref: improve usesAscii28 parsing from version
Also adds some tests for this
Configuration menu - View commit details
-
Copy full SHA for eb7ebed - Browse repository at this point
Copy the full SHA eb7ebedView commit details -
Configuration menu - View commit details
-
Copy full SHA for c218b74 - Browse repository at this point
Copy the full SHA c218b74View commit details -
Configuration menu - View commit details
-
Copy full SHA for 8e0e914 - Browse repository at this point
Copy the full SHA 8e0e914View commit details -
Configuration menu - View commit details
-
Copy full SHA for 413fbcc - Browse repository at this point
Copy the full SHA 413fbccView commit details -
ref: statically compile regexes
By using static variables inside helper functions, these regexes will only get compiled/allocated once. Since this only happens once, we don't need to worry about freeing the memory after use. This greatly simplifies init/cleanup of the context struct. While I was at it, make the helper functions more readable at what they are doing.
Configuration menu - View commit details
-
Copy full SHA for 1df3bd3 - Browse repository at this point
Copy the full SHA 1df3bd3View commit details -
Configuration menu - View commit details
-
Copy full SHA for cb308c2 - Browse repository at this point
Copy the full SHA cb308c2View commit details
Commits on Apr 7, 2023
-
ref: move string stuff to string_utils.h
Also: - remove passing "extension" around excessively in fec.c - stop passing around the full FEC_CONTEXT excessively when you only need access to a single line. - Capitalize CSV_EXTENSION since it's a constant
Configuration menu - View commit details
-
Copy full SHA for a79be13 - Browse repository at this point
Copy the full SHA a79be13View commit details -
ref: remove unused FEC_CONTEXT.useAscii28
After I did all that refactoring, I now realize we don't even use this info at all. As we read each line, we see if it contains any asci28. If so, then we treat that as the delimiter. Otherwise default to comma.
Configuration menu - View commit details
-
Copy full SHA for 22fcc2c - Browse repository at this point
Copy the full SHA 22fcc2cView commit details -
ref: Move isParseDone() to csv.h and csv.c
It really is a private method of PARSE_CONTEXT, we don't care about the implementation in fec.c
Configuration menu - View commit details
-
Copy full SHA for 1467a7d - Browse repository at this point
Copy the full SHA 1467a7dView commit details -
ref: move initParseContext() into csv.h and csv.c
This is an implementation detail of the csv parser, we don't want to have to deal with it.
Configuration menu - View commit details
-
Copy full SHA for 4ef050a - Browse repository at this point
Copy the full SHA 4ef050aView commit details -
ref: Move readField() into csv.h
Another implementation detail of the csv reader.
Configuration menu - View commit details
-
Copy full SHA for 088bf0b - Browse repository at this point
Copy the full SHA 088bf0bView commit details -
ref: store FIELD_INFO in PARSE_CONTEXT as value, not pointer
We never need to share one FIELD_INFO between multiple places, and the lifetimes make sense to be tied to each other. This makes the API and lifecycle management simpler.
Configuration menu - View commit details
-
Copy full SHA for 98774db - Browse repository at this point
Copy the full SHA 98774dbView commit details -
If I add "-Werror" build flag, then I found a few errors. Ideally I think we should turn on this flag by default to help catch more bugs.
Configuration menu - View commit details
-
Copy full SHA for 88c31c7 - Browse repository at this point
Copy the full SHA 88c31c7View commit details -
ref: Move mapping regexes to separate, static struct
It was weird to store these regexes in the persistentMemory. We only have one version of the mappings per compilation (and runtime) so we can just store them statically. Next up would be to refactor the lookupMappings() function from fec.c to mappings.c While I'm at it, clean up the regex compilation to share the helper function we already have. This involved creating regex.h to hold that shared regexCompile()
Configuration menu - View commit details
-
Copy full SHA for cd86048 - Browse repository at this point
Copy the full SHA cd86048View commit details -
Configuration menu - View commit details
-
Copy full SHA for efe1ff3 - Browse repository at this point
Copy the full SHA efe1ff3View commit details -
Configuration menu - View commit details
-
Copy full SHA for 2f0d02f - Browse repository at this point
Copy the full SHA 2f0d02fView commit details -
ref: Replace py.path with pathlib.Path in tests
tmpdir is deprecated in pytest.
Configuration menu - View commit details
-
Copy full SHA for 41672ce - Browse repository at this point
Copy the full SHA 41672ceView commit details -
ref: test: simplify getting fixtures
We never used the load kwarg, and the helper function did not need to be a fixture itself.
Configuration menu - View commit details
-
Copy full SHA for 30b27a7 - Browse repository at this point
Copy the full SHA 30b27a7View commit details -
Configuration menu - View commit details
-
Copy full SHA for c9ef40c - Browse repository at this point
Copy the full SHA c9ef40cView commit details -
fix: Make reader thread a daemon
We want the reader thread to be stopped if the parent thread stops. Before, my pytest runs sometime hung.
Configuration menu - View commit details
-
Copy full SHA for 706710b - Browse repository at this point
Copy the full SHA 706710bView commit details -
test: ref: Overhaul python testing
Before, the testing was a bit half-baked. We didn't actually check the contents of the output files. We also weren't very thorough, and only checked some APIs on some examples. Now we test all APIs on all examples. Some example .fec files that were tracked in git weren't actually getting tested at all. For example, the 13360.fec file actually produces a segfault! We need to track down the issue and fix it. It is skipped for now. These changes also revealed an inconsistency in howform types are normalized. See `python/tests/filings
Configuration menu - View commit details
-
Copy full SHA for c784f91 - Browse repository at this point
Copy the full SHA c784f91View commit details
Commits on Apr 8, 2023
-
ref: take char*, not PARSE_CONTEXT, as arg to lookupMappings()
We didn't need all the info that PARSE_CONTEXT contained.
Configuration menu - View commit details
-
Copy full SHA for 7bc8549 - Browse repository at this point
Copy the full SHA 7bc8549View commit details -
Configuration menu - View commit details
-
Copy full SHA for 7427a07 - Browse repository at this point
Copy the full SHA 7427a07View commit details -
test: add a unique marker for each test case
So to run the "trailing_commas" test, you can do: pytest -m trailing_commas
Configuration menu - View commit details
-
Copy full SHA for 18451c1 - Browse repository at this point
Copy the full SHA 18451c1View commit details -
ref: Move all form schema lookup logic into mappings.c
Now there is a single understandable `FORM_SCHEMA *lookupSchema(const char *version, int versionLength, const char *form, int formLength);` function. The stuff in fec.c just wraps it with logic specific to FEC_CONTEXT
Configuration menu - View commit details
-
Copy full SHA for c17412e - Browse repository at this point
Copy the full SHA c17412eView commit details -
Configuration menu - View commit details
-
Copy full SHA for 46db0b4 - Browse repository at this point
Copy the full SHA 46db0b4View commit details -
ref: rename lookupSchema() to formSchemaLookup()
and similarly, freeSchema() to formSchemaFree()
Configuration menu - View commit details
-
Copy full SHA for 05dccfa - Browse repository at this point
Copy the full SHA 05dccfaView commit details -
Configuration menu - View commit details
-
Copy full SHA for 6772cdc - Browse repository at this point
Copy the full SHA 6772cdcView commit details -
Configuration menu - View commit details
-
Copy full SHA for 93456d3 - Browse repository at this point
Copy the full SHA 93456d3View commit details -
Configuration menu - View commit details
-
Copy full SHA for 2e84117 - Browse repository at this point
Copy the full SHA 2e84117View commit details -
ref: Rename writeQuotedCsvField() to writeQuotedString()
This doesn't have to do with csv fields, it's just a raw string.
Configuration menu - View commit details
-
Copy full SHA for 3dd36c4 - Browse repository at this point
Copy the full SHA 3dd36c4View commit details -
Configuration menu - View commit details
-
Copy full SHA for 86730f5 - Browse repository at this point
Copy the full SHA 86730f5View commit details -
ref: rename PARSE_CONTEXT to CSV_LINE_PARSER
When I first read this code I was confused as to this structs role.
Configuration menu - View commit details
-
Copy full SHA for dbb7ee8 - Browse repository at this point
Copy the full SHA dbb7ee8View commit details -
Configuration menu - View commit details
-
Copy full SHA for 4e5cdd1 - Browse repository at this point
Copy the full SHA 4e5cdd1View commit details
Commits on Apr 10, 2023
-
endOfField() is only used in csv.c so it doesn't need to be declared.
Configuration menu - View commit details
-
Copy full SHA for b55d079 - Browse repository at this point
Copy the full SHA b55d079View commit details -
Configuration menu - View commit details
-
Copy full SHA for 29a73bf - Browse repository at this point
Copy the full SHA 29a73bfView commit details -
ref: csv: make advanceField() a noop if isParseDone()
Now the call sites don't have to do the test, and the loop logic is much simpler.
Configuration menu - View commit details
-
Copy full SHA for 66ad20d - Browse repository at this point
Copy the full SHA 66ad20dView commit details -
ref: make setVersion() take explicit str
Before, it set the version string based on the implicit ctx->persistentMemory->line->str, which was not at all obvious from the call sites.
Configuration menu - View commit details
-
Copy full SHA for 39e5174 - Browse repository at this point
Copy the full SHA 39e5174View commit details -
Configuration menu - View commit details
-
Copy full SHA for b30a4de - Browse repository at this point
Copy the full SHA b30a4deView commit details -
Configuration menu - View commit details
-
Copy full SHA for 0d4d3c6 - Browse repository at this point
Copy the full SHA 0d4d3c6View commit details -
fix: Fix segfault crash when parsing legacy header
The testing for whitespace was wrong, which meant we never found the version key, so we never setVersion(), so in later steps we got a null pointer for ctx->version
Configuration menu - View commit details
-
Copy full SHA for 1c8173a - Browse repository at this point
Copy the full SHA 1c8173aView commit details -
Configuration menu - View commit details
-
Copy full SHA for 3f75fd1 - Browse repository at this point
Copy the full SHA 3f75fd1View commit details -
Configuration menu - View commit details
-
Copy full SHA for 361fac5 - Browse repository at this point
Copy the full SHA 361fac5View commit details -
Configuration menu - View commit details
-
Copy full SHA for 0c2d4e3 - Browse repository at this point
Copy the full SHA 0c2d4e3View commit details -
Configuration menu - View commit details
-
Copy full SHA for d6df47f - Browse repository at this point
Copy the full SHA d6df47fView commit details -
Remove extra FEC constant in fec.c
It is only used in one place, so remove one level of indirection
Configuration menu - View commit details
-
Copy full SHA for 9e53f1b - Browse repository at this point
Copy the full SHA 9e53f1bView commit details -
feat: python: prioritize zig-out dir when searching for dll
First of all this was broken: - it always returned os.path.join(PARENT_DIR, files[0]), not where we actually found files. - The zig build directory wasn't right. - Really we want to warn if there are multiple DLLs for a certain dir, not for a certain pattern. Before, I had to run - zig build - python setup.py - pytest to do a testing cycle. Now I don't need the `python setup.py` step.
Configuration menu - View commit details
-
Copy full SHA for e5a70d6 - Browse repository at this point
Copy the full SHA e5a70d6View commit details -
Configuration menu - View commit details
-
Copy full SHA for 1f9aa57 - Browse repository at this point
Copy the full SHA 1f9aa57View commit details -
Configuration menu - View commit details
-
Copy full SHA for d6d3f5f - Browse repository at this point
Copy the full SHA d6d3f5fView commit details -
Configuration menu - View commit details
-
Copy full SHA for aea66e9 - Browse repository at this point
Copy the full SHA aea66e9View commit details -
ref: explicitly pass NULL for types for header
We haven't yet set ctx->types yet, so it will always be NULL at this point. But make this explicit. We set ctx->types once we start parsing lines. It would be irrelevant to even try to pass something here, because there is no entry in the `types` lookup table for the `hdr` form.
Configuration menu - View commit details
-
Copy full SHA for e2abdc9 - Browse repository at this point
Copy the full SHA e2abdc9View commit details -
ref: fix: make formSchemaFree() safe from NULLS
in the docstring we say it is, but it wasn't!
Configuration menu - View commit details
-
Copy full SHA for d0d8188 - Browse repository at this point
Copy the full SHA d0d8188View commit details
Commits on Apr 11, 2023
-
Configuration menu - View commit details
-
Copy full SHA for c531383 - Browse repository at this point
Copy the full SHA c531383View commit details -
Configuration menu - View commit details
-
Copy full SHA for e632121 - Browse repository at this point
Copy the full SHA e632121View commit details -
Configuration menu - View commit details
-
Copy full SHA for 235aaae - Browse repository at this point
Copy the full SHA 235aaaeView commit details -
Configuration menu - View commit details
-
Copy full SHA for 3628de1 - Browse repository at this point
Copy the full SHA 3628de1View commit details -
fix: don't reallocate extra byte in copyString()
growStringTo() expects an arg for the total bytes needed including the null terminator. STRING->n represents exactly this, we don't need to compensate with an extra +1 like we do when we are reading from a raw `char *`
Configuration menu - View commit details
-
Copy full SHA for f97e8b1 - Browse repository at this point
Copy the full SHA f97e8b1View commit details -
ref: don't excessively pass around FEC_CONTEXT to _lineContainsF99Sta…
…rt() etc just pass the actual line we are operating on. Alos make them static "private" with a leading underscore. This DOES slightly change the behavior by no longer using ctx->currentLineLength, but I think that is vestigal and unused anyway, and I'm about to remove it.
Configuration menu - View commit details
-
Copy full SHA for cfcebb9 - Browse repository at this point
Copy the full SHA cfcebb9View commit details -
ref: remove unused FEC_CONTEXT.currentLineLength, LINE_INFO.length
These were mistakenly used before, when really we should have been using ctx->persistentMemory->line->n Now I've switched us to that everywhere, so it's safe to remove this. This also has the cascading effect of allowing us to simplify the API of decodeLine() so it no longer returns anything, it just decodes the line. This also fixes a bug in iso_8859_1_to_utf_8() where we didn't increment length when we should have. I'm amazed this didn't crash before? It makes me nervous that we're not using it correctly elsewhere.
Configuration menu - View commit details
-
Copy full SHA for 472104d - Browse repository at this point
Copy the full SHA 472104dView commit details -
Configuration menu - View commit details
-
Copy full SHA for be55068 - Browse repository at this point
Copy the full SHA be55068View commit details -
ref: Move date and float csv writing to csv.c
Also continue removing passing FEC_CONTEXT around willy nilly
Configuration menu - View commit details
-
Copy full SHA for 567e85d - Browse repository at this point
Copy the full SHA 567e85dView commit details -
ref: Create CSV_FIELD as first class citizen in csv API
Now we can pass around this object that contains the parsed value, instead of having to deal with the pointers in the parser.
Configuration menu - View commit details
-
Copy full SHA for 8c25c50 - Browse repository at this point
Copy the full SHA 8c25c50View commit details -
Configuration menu - View commit details
-
Copy full SHA for 3699c97 - Browse repository at this point
Copy the full SHA 3699c97View commit details -
ref: csv: Remove advanceField(), replace columnIndex with numFieldsRead
I think this API makes more sense. Now the parser actis like an iterator. You call nextField() to get hte next field and advance. No need to do the additional followup step of calling advanceField(). Renaming to numFieldsRead also is more intuitive for the users
Configuration menu - View commit details
-
Copy full SHA for f7addbb - Browse repository at this point
Copy the full SHA f7addbbView commit details -
ref: reduce level of nesting in parseLine()
Make it a special case to pull out the first column as the form type. Now we don't need an if() inside the loop, which is more understandable
Configuration menu - View commit details
-
Copy full SHA for 95d2a5d - Browse repository at this point
Copy the full SHA 95d2a5dView commit details -
Configuration menu - View commit details
-
Copy full SHA for 55c107b - Browse repository at this point
Copy the full SHA 55c107bView commit details -
Configuration menu - View commit details
-
Copy full SHA for 1258352 - Browse repository at this point
Copy the full SHA 1258352View commit details -
Configuration menu - View commit details
-
Copy full SHA for e50e4b2 - Browse repository at this point
Copy the full SHA e50e4b2View commit details -
ref: reduce API of collectLineInfo()
We don't need the full STRING, so don't ask for it.
Configuration menu - View commit details
-
Copy full SHA for 466cc9e - Browse repository at this point
Copy the full SHA 466cc9eView commit details -
ref: simplify API of encoding.decodeLine()
We only were ever using the ascii28 flag from the result, so don't pass around the rest of it.
Configuration menu - View commit details
-
Copy full SHA for 4f7a575 - Browse repository at this point
Copy the full SHA 4f7a575View commit details -
Configuration menu - View commit details
-
Copy full SHA for 007c1c8 - Browse repository at this point
Copy the full SHA 007c1c8View commit details -
Configuration menu - View commit details
-
Copy full SHA for 0e846f9 - Browse repository at this point
Copy the full SHA 0e846f9View commit details -
Configuration menu - View commit details
-
Copy full SHA for 6401f2b - Browse repository at this point
Copy the full SHA 6401f2bView commit details -
Configuration menu - View commit details
-
Copy full SHA for bbe1e7a - Browse repository at this point
Copy the full SHA bbe1e7aView commit details -
ref: cleanup malloc()s, don't cast result
We get a void * back, so the casts just add clutter.
Configuration menu - View commit details
-
Copy full SHA for 8340587 - Browse repository at this point
Copy the full SHA 8340587View commit details -
ref: re-order stuff in FEC_CONTEXT
try to put the related things next to each other
Configuration menu - View commit details
-
Copy full SHA for 763fbf9 - Browse repository at this point
Copy the full SHA 763fbf9View commit details -
ref: make path length calculation make more sense
put the parts in order
Configuration menu - View commit details
-
Copy full SHA for 1d7af46 - Browse repository at this point
Copy the full SHA 1d7af46View commit details -
Configuration menu - View commit details
-
Copy full SHA for df17f90 - Browse repository at this point
Copy the full SHA df17f90View commit details -
Configuration menu - View commit details
-
Copy full SHA for 6461d00 - Browse repository at this point
Copy the full SHA 6461d00View commit details -
ref: calculate outputDir outside of WRITE_CONTEXT
We might want to write to a different dir and want control of that outside of WRITE_CONTEXT
Configuration menu - View commit details
-
Copy full SHA for 4f26872 - Browse repository at this point
Copy the full SHA 4f26872View commit details -
Configuration menu - View commit details
-
Copy full SHA for 12726ca - Browse repository at this point
Copy the full SHA 12726caView commit details -
ref: one arg per line for newFecContext()
This is way more readable I think
Configuration menu - View commit details
-
Copy full SHA for 70776ed - Browse repository at this point
Copy the full SHA 70776edView commit details -
ref: label args to newFecContext() in client.py
It was really hard to tell what each represented
Configuration menu - View commit details
-
Copy full SHA for f7837de - Browse repository at this point
Copy the full SHA f7837deView commit details -
Configuration menu - View commit details
-
Copy full SHA for 91ade4a - Browse repository at this point
Copy the full SHA 91ade4aView commit details -
Configuration menu - View commit details
-
Copy full SHA for a524dd7 - Browse repository at this point
Copy the full SHA a524dd7View commit details -
Configuration menu - View commit details
-
Copy full SHA for 2a790ee - Browse repository at this point
Copy the full SHA 2a790eeView commit details
Commits on Apr 12, 2023
-
ref: remove includeFilingId from FEC_CONTEXT
This can be inferred from whether or not filingID is NULL, and now that the WRITE_CONTEXT doesn't depend on the filingId, all the filingId processing can happen in the callers of newFecContext()
Configuration menu - View commit details
-
Copy full SHA for 1b8e100 - Browse repository at this point
Copy the full SHA 1b8e100View commit details -
BREAKING ref: re-order args to newFecContext()
These "options" should all be near each other.
Configuration menu - View commit details
-
Copy full SHA for cc2df27 - Browse repository at this point
Copy the full SHA cc2df27View commit details -
ref: rename test case trailing_commas to too_few_fields
Because I'm about to add another that has extra fields and I want to make the distinction
Configuration menu - View commit details
-
Copy full SHA for 4919d82 - Browse repository at this point
Copy the full SHA 4919d82View commit details
Commits on Apr 13, 2023
-
Configuration menu - View commit details
-
Copy full SHA for 827f903 - Browse repository at this point
Copy the full SHA 827f903View commit details -
test: ref: add helper assert method to better show errors
when you hit an assert, pytest will show the values of the args of the encolsing function. Before, you only saw the values to dir1 and dir2, but now you can see file1 and file2.
Configuration menu - View commit details
-
Copy full SHA for 6005a40 - Browse repository at this point
Copy the full SHA 6005a40View commit details -
I thought this test indicated a bug in the lib, but actually I just wasn't testing it correctly.
Configuration menu - View commit details
-
Copy full SHA for 7f5d80b - Browse repository at this point
Copy the full SHA 7f5d80bView commit details -
Configuration menu - View commit details
-
Copy full SHA for 0d70f9c - Browse repository at this point
Copy the full SHA 0d70f9cView commit details -
feat: Fail earlier on bad output_directory in parse_as_files()
Now if the supplied value can't be coerced to a Path, you get the error when you first call the function, not when the file is opened int he callback.
Configuration menu - View commit details
-
Copy full SHA for dc72c90 - Browse repository at this point
Copy the full SHA dc72c90View commit details -
docs: improve docstrings of python API
make them shorter to conform to line legnth. improve wording a bit. dont repeat defaults that are in the funciton signature.
Configuration menu - View commit details
-
Copy full SHA for 70b3431 - Browse repository at this point
Copy the full SHA 70b3431View commit details -
test: better error message on fail
Now we can see the line of which file.
Configuration menu - View commit details
-
Copy full SHA for 4618d74 - Browse repository at this point
Copy the full SHA 4618d74View commit details -
Configuration menu - View commit details
-
Copy full SHA for 5c4c487 - Browse repository at this point
Copy the full SHA 5c4c487View commit details -
Configuration menu - View commit details
-
Copy full SHA for a244b38 - Browse repository at this point
Copy the full SHA a244b38View commit details -
fix: Don't warn on an empty float field
If we run into a field that is supposed to be a float, but it is an empty string, this isn't actually a problem. Before, we were throwing a warning.
Configuration menu - View commit details
-
Copy full SHA for 56b954c - Browse repository at this point
Copy the full SHA 56b954cView commit details -
feat: put quotes around strings in warnings.
It was hard to tell where the stirng ended before, like if it was an empty string.
Configuration menu - View commit details
-
Copy full SHA for 5e0cd77 - Browse repository at this point
Copy the full SHA 5e0cd77View commit details -
BREAKING: feat: print correct number of fields be default
Fixes washingtonpost#24 Before, we printed exactly what was in the .fec file. If a row had more fields or fewer fields than we expected from the schema, we just printed it as is. This broke downstream tooling, such as loading the .csv's into databases. Now, by default we - pad short rows with empty fields - truncate long rows You can get back to the old behavior by setting the `raw` flag to True. By default it is False. Note that this could be BREAKING. This also adjusts the warnings a bit: BEfore, you got a warning for every extra field in a row. Now you only get one warning per row, and we print out the full row (even though that row is a bit mangled by the csv parser as it removes quotes and delimiters) The tests only currently test the default behavior. A follow up should adjust how we define test cases. Currently, we expect a 1:1 correspondence between an input .fec file and an output. But really we want a 1:N relationship, where one .fec file can generate multiple outputs depending on the options passed. That will require updating our test definition format.
Configuration menu - View commit details
-
Copy full SHA for 8073321 - Browse repository at this point
Copy the full SHA 8073321View commit details
Commits on Jul 29, 2023
-
Configuration menu - View commit details
-
Copy full SHA for b7eb213 - Browse repository at this point
Copy the full SHA b7eb213View commit details