BREAKING: feat: print correct number of fields by default
Fixes washingtonpost#24

Before, we printed exactly what was in the .fec file. If a row had
more or fewer fields than the schema expected, we printed it as-is.
This broke downstream tooling, such as loading the .csv files into
databases.

Now, by default, we:
- pad short rows with empty fields
- truncate long rows

You can restore the old behavior by setting the `raw` flag to True
(it defaults to False).
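In other words, the non-raw default normalizes every row to the width the schema expects. A minimal Python sketch of that rule, assuming a row is already split into a list of strings and `expected` is the schema's field count. `normalize_row` is a hypothetical helper for illustration only; the real logic lives in libfastfec, which the Python client calls through ctypes.

```python
def normalize_row(fields: list[str], expected: int, raw: bool = False) -> list[str]:
    """Return `fields` adjusted to `expected` length unless `raw` is set.

    Hypothetical helper illustrating the padding/truncation rule; it is not
    FastFEC's actual implementation.
    """
    if raw:
        # Old behavior: emit the row exactly as it appeared in the .fec file.
        return list(fields)
    if len(fields) < expected:
        # Pad short rows with empty fields.
        return fields + [""] * (expected - len(fields))
    # Truncate long rows (a no-op when the length already matches).
    return fields[:expected]


print(normalize_row(["SA20A", "C00703975"], 4))  # ['SA20A', 'C00703975', '', '']
print(normalize_row(["a", "b", "c", "d", "e"], 4))  # ['a', 'b', 'c', 'd']
print(normalize_row(["a", "b", "c", "d", "e"], 4, raw=True))  # ['a', 'b', 'c', 'd', 'e']
```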

Note that this is potentially BREAKING: by default, the output now
differs for any row whose field count doesn't match the schema.

This also adjusts the warnings a bit:
Before, you got a warning for every extra field in a row.
Now you get only one warning per row, and we print out the full row
(though that row is somewhat mangled, since the csv parser has already
removed quotes and delimiters).
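A sketch of the new warning policy, using Python's `warnings` module as a stand-in for FastFEC's own warning output. `check_row` and its message format are hypothetical. A row with two extra fields produces one warning rather than two, and because the printed row is re-joined from already-parsed fields, quotes and embedded delimiters are not reproduced exactly.

```python
import warnings


def check_row(fields: list[str], expected: int, line_number: int) -> bool:
    """Warn at most once for a row whose field count doesn't match the schema.

    Hypothetical helper; the real warnings come from libfastfec.
    """
    if len(fields) == expected:
        return True
    # One warning per row showing the whole row, instead of one per extra
    # field. Joining parsed fields loses the original quoting.
    warnings.warn(
        f"line {line_number}: expected {expected} fields, got {len(fields)}: "
        + ",".join(fields),
        stacklevel=2,
    )
    return False


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_row(["SA20A", "x", "y", "z"], 2, line_number=7)  # two extra fields
print(len(caught))  # 1
```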

The tests currently cover only the default behavior. A follow-up should
change how we define test cases. Currently, we expect a 1:1
correspondence between an input .fec file and an output, but we really
want a 1:N relationship, where one .fec file can generate multiple
outputs depending on the options passed. That will require updating our
test definition format.
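One possible shape for a 1:N test definition, sketched in Python. Everything here is hypothetical: the `FecCase`/`Variant` names, the input file name, and the `expected/<variant>/` directory layout are assumptions, not a format this commit defines. Each variant pairs a name with the keyword options (such as `raw=True`) a test runner would pass to the parser.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class Variant:
    """One expected output: a name plus the parser options that produce it."""

    name: str
    options: dict[str, Any] = field(default_factory=dict)


@dataclass
class FecCase:
    """One input .fec file mapped to N expected outputs."""

    fec_path: Path
    variants: list[Variant]

    def expected_dir(self, variant: Variant) -> Path:
        # Hypothetical layout: <case dir>/expected/<variant name>/
        return self.fec_path.parent / "expected" / variant.name


# Hypothetical case: the same filing checked with default and raw parsing.
case = FecCase(
    fec_path=Path("cases/1527862/1527862.fec"),
    variants=[Variant("default"), Variant("raw", {"raw": True})],
)
for variant in case.variants:
    print(variant.name, variant.options, case.expected_dir(variant).as_posix())
```

Keeping options as a plain dict lets a runner call the parser with `**variant.options`, so adding a new option needs no change to the case format.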
NickCrews committed Apr 13, 2023
1 parent 5e0cd77 commit 8073321
Showing 51 changed files with 86,299 additions and 86,231 deletions.
19 changes: 19 additions & 0 deletions python/src/fastfec/client.py
@@ -41,6 +41,7 @@ def parse(
file_handle: io.BinaryIO,
include_filing_id: str | None = None,
should_parse_date: bool = True,
+raw: bool = False,
) -> Generator[tuple[str, dict[str, Any]], None, None]:
"""Parses the input file line-by-line.
@@ -53,6 +54,10 @@ def parse(
If False, date fields are returned as raw YYYY-MM-DD
strings. This would mainly be set to false for
performance reasons.
+raw -- If True, if there are fewer or more fields in a row than we
+expect, the row will be written to the output file as-is.
+If False, we will add empty fields, or skip extra fields,
+to the row to make it the correct length.
Returns:
-------
@@ -85,6 +90,7 @@ def parse(
filing_id, # filingId
1, # silent
0, # warn
+raw, # raw
)

# Run the parsing in a separate thread. It's essentially still single-threaded
@@ -113,6 +119,7 @@ def parse_as_files(
file_handle: io.BinaryIO,
output_directory: str | pathlib.Path,
include_filing_id: str | None = None,
+raw: bool = False,
) -> int:
"""Parses the input file into output files in the output directory.
@@ -124,6 +131,10 @@ def parse_as_files(
output_directory -- A directory in which to place output parsed .csv files
include_filing_id -- If set, prepend a column `filing_id` into each
outputted csv filled with the specified value.
+raw -- If True, if there are fewer or more fields in a row than we
+expect, the row will be written to the output file as-is.
+If False, we will add empty fields, or skip extra fields,
+to the row to make it the correct length.
Returns:
-------
@@ -143,13 +154,15 @@ def open_output_file(form_type: str, *args, **kwargs):
file_handle,
open_output_file,
include_filing_id=include_filing_id,
+raw=raw,
)

def parse_as_files_custom(
self,
file_handle: io.BinaryIO,
open_function,
include_filing_id: str | None = None,
+raw: bool = False,
) -> int:
"""Parses the input file into output files.
@@ -161,6 +174,10 @@ def parse_as_files_custom(
stream for each parsed .csv file
include_filing_id -- If set, prepend a column `filing_id` into each
outputted csv filled with the specified value.
+raw -- If True, if there are fewer or more fields in a row than we
+expect, the row will be written to the output file as-is.
+If False, we will add empty fields, or skip extra fields,
+to the row to make it the correct length.
Returns:
-------
@@ -187,6 +204,7 @@ def parse_as_files_custom(
filing_id, # filingId
1, # silent
0, # warn
+raw, # raw
)

# Parse
@@ -223,6 +241,7 @@ def __init_lib(self) -> None:
c_char_p, # filingId
c_int, # silent
c_int, # warn
+c_int, # raw
]
self.libfastfec.newFecContext.restype = c_void_p
self.libfastfec.parseFec.argtypes = [c_void_p]
4,384 changes: 2,192 additions & 2,192 deletions python/tests/cases/1527862/expected/SA18.csv


64 changes: 32 additions & 32 deletions python/tests/cases/1527862/expected/SA20A.csv
@@ -1,33 +1,33 @@
form_type,filer_committee_id_number,transaction_id,back_reference_tran_id_number,back_reference_sched_name,entity_type,contributor_organization_name,contributor_last_name,contributor_first_name,contributor_middle_name,contributor_prefix,contributor_suffix,contributor_street_1,contributor_street_2,contributor_city,contributor_state,contributor_zip_code,election_code,election_other_description,contribution_date,contribution_amount,contribution_aggregate,contribution_purpose_descrip,contributor_employer,contributor_occupation,donor_committee_fec_id,donor_committee_name,donor_candidate_fec_id,donor_candidate_last_name,donor_candidate_first_name,donor_candidate_middle_name,donor_candidate_prefix,donor_candidate_suffix,donor_candidate_office,donor_candidate_state,donor_candidate_district,conduit_name,conduit_street1,conduit_street2,conduit_city,conduit_state,conduit_zip_code,memo_code,memo_text_description,reference_code
SA20A,C00703975,29874245,,,ORG,United States Secret Service (USSS),,,,,,950 H St NW,,Washington,DC,202230001,G2020,,2021-04-07,579634.88,3079091.53,,,,,,,,,,,,,,,,,,,,,,Travel Offset
SA20A,C00703975,29874246,,,ORG,United States Secret Service (USSS),,,,,,950 H St NW,,Washington,DC,202230001,G2020,,2021-04-07,692087.52,3079091.53,,,,,,,,,,,,,,,,,,,,,,Travel Offset
SA20A,C00703975,29874253,,,ORG,Zenefits,,,,,,50 Beale St,Fl 10,San Francisco,CA,941051863,G2020,,2021-04-13,27264.40,184631.16,,,,,,,,,,,,,,,,,,,,,,Insurance Offset
SA20A,C00703975,29874222,,,ORG,Boston Globe,,,,,,53 State St,,Boston,MA,021092820,G2020,,2021-04-14,5131.22,7424.37,,,,,,,,,,,,,,,,,,,,,,Travel Offset
SA20A,C00703975,29874243,,,ORG,Bully Pulpit Interactive LLC,,,,,,1445 New York Ave NW,Fl 5,Washington,DC,200052267,G2020,,2021-04-14,18112.89,214904.10,,,,,,,,,,,,,,,,,,,,,,Advertising Refund
SA20A,C00703975,29874256,,,ORG,Zenefits,,,,,,50 Beale St,Fl 10,San Francisco,CA,941051863,G2020,,2021-04-16,0.01,185274.30,,,,,,,,,,,,,,,,,,,,,,Taxes Refund
SA20A,C00703975,29874257,,,ORG,Zenefits,,,,,,50 Beale St,Fl 10,San Francisco,CA,941051863,G2020,,2021-04-16,643.13,185274.30,,,,,,,,,,,,,,,,,,,,,,Taxes Refund
SA20A,C00703975,29874248,,,ORG,American Express,,,,,,PO Box 1270,,Newark,NJ,071011270,G2020,,2021-05-05,0.21,364.27,,,,,,,,,,,,,,,,,,,,,,Processing Fees Refund
SA20A,C00703975,29874258,,,ORG,Zenefits,,,,,,50 Beale St,Fl 10,San Francisco,CA,941051863,G2020,,2021-05-05,45.00,185319.30,,,,,,,,,,,,,,,,,,,,,,Taxes Refund
SA20A,C00703975,29874223,,,ORG,United States Postal Service,,,,,,900 Brentwood Rd NE,,Washington,DC,200181004,G2020,,2021-05-10,2062.47,6872.47,,,,,,,,,,,,,,,,,,,,,,Postage Refund
SA20A,C00703975,29874224,,,ORG,United States Postal Service,,,,,,900 Brentwood Rd NE,,Washington,DC,200181004,G2020,,2021-05-10,2049.98,6872.47,,,,,,,,,,,,,,,,,,,,,,Postage Refund
SA20A,C00703975,29874225,,,ORG,United States Postal Service,,,,,,900 Brentwood Rd NE,,Washington,DC,200181004,G2020,,2021-05-10,1399.19,6872.47,,,,,,,,,,,,,,,,,,,,,,Postage Refund
SA20A,C00703975,29874226,,,ORG,United States Postal Service,,,,,,900 Brentwood Rd NE,,Washington,DC,200181004,G2020,,2021-05-10,1360.83,6872.47,,,,,,,,,,,,,,,,,,,,,,Postage Refund
SA20A,C00703975,29874254,,,ORG,Zenefits,,,,,,50 Beale St,Fl 10,San Francisco,CA,941051863,G2020,,2021-05-12,18298.62,203617.92,,,,,,,,,,,,,,,,,,,,,,Insurance Offset
SA20A,C00703975,29874229,,,ORG,C & J Development Co. LLC,,,,,,2401 SE Tones Dr,Ste 17,Ankeny,IA,500218886,G2020,,2021-05-17,307.78,1669.14,,,,,,,,,,,,,,,,,,,,,,Rent Refund
SA20A,C00703975,29874230,,,ORG,Hudson School District,,,,,,20 Library St,,Hudson,NH,030514240,G2020,,2021-05-17,210.00,210.00,,,,,,,,,,,,,,,,,,,,,,Site Rental Refund
SA20A,C00703975,29874228,,,ORG,Pitney Bowes,,,,,,1 Elmcroft Rd,,Stamford,CT,069260700,G2020,,2021-05-17,414.25,5779.49,,,,,,,,,,,,,,,,,,,,,,Computer Equipment Refund
SA20A,C00703975,29874231,,,ORG,Pitney Bowes,,,,,,1 Elmcroft Rd,,Stamford,CT,069260700,G2020,,2021-05-17,165.24,5779.49,,,,,,,,,,,,,,,,,,,,,,Computer Equipment Refund
SA20A,C00703975,29874227,,,ORG,Treasurer Of The State Of Missouri,,,,,,301 W High St,,Jefferson City,MO,651011517,G2020,,2021-05-17,665.07,665.07,,,,,,,,,,,,,,,,,,,,,,Tax Refund; Original payment through Zenefits
SA20A,C00703975,29874250,,,ORG,Media Buying & Analytics LLC,,,,,,2020 Howell Mill Rd NW,Ste 348D,Atlanta,GA,303181732,G2020,,2021-05-20,750000.00,750000.00,,,,,,,,,,,,,,,,,,,,,,Media Buy Refund
SA20A,C00703975,29874241,,,ORG,Carter Printing Company,,,,,,1739 E Grand Ave,,Des Moines,IA,503163611,G2020,,2021-06-07,2396.48,2396.48,,,,,,,,,,,,,,,,,,,,,,Printing Refund
SA20A,C00703975,29874235,,,ORG,Comcast,,,,,,401 White Horse Rd,,Voorhees,NJ,080432604,G2020,,2021-06-07,8970.00,8988.93,,,,,,,,,,,,,,,,,,,,,,Utilities Refund
SA20A,C00703975,29874238,,,ORG,"Greenspun Media Group, LLC",,,,,,2275 Corporate Cir,Ste 300,Henderson,NV,890747745,G2020,,2021-06-07,206.22,206.22,,,,,,,,,,,,,,,,,,,,,,Travel Offset
SA20A,C00703975,29874239,,,ORG,State Of California Treasurer,,,,,,800 CAPITAL Mall,,Sacramento,CA,94813,G2020,,2021-06-07,238.72,586.75,,,,,,,,,,,,,,,,,,,,,,Tax Refund; Original payment through Zenefits
SA20A,C00703975,29874236,,,ORG,"WJCT, Inc",,,,,,100 Festival Park Ave,,Jacksonville,FL,322021309,G2020,,2021-06-07,270.16,270.16,,,,,,,,,,,,,,,,,,,,,,Travel Offset
SA20A,C00703975,29874255,,,ORG,Zenefits,,,,,,50 Beale St,Fl 10,San Francisco,CA,941051863,G2020,,2021-06-11,13841.21,217459.13,,,,,,,,,,,,,,,,,,,,,,Insurance Offset
SA20A,C00703975,29874259,,,ORG,Zenefits,,,,,,50 Beale St,Fl 10,San Francisco,CA,941051863,G2020,,2021-06-16,500.00,217959.13,,,,,,,,,,,,,,,,,,,,,,Taxes Refund
SA20A,C00703975,29874244,,,ORG,Agence France-Presse,,,,,,1500 K St NW,Ste 600,Washington,DC,200051200,G2020,,2021-06-17,7578.99,96065.20,,,,,,,,,,,,,,,,,,,,,,Travel Offset
SA20A,C00703975,29874247,,,ORG,Bloomberg LP,,,,,,731 Lexington Ave,,New York,NY,100221331,G2020,,2021-06-17,3621.52,95610.18,,,,,,,,,,,,,,,,,,,,,,Travel Offset
SA20A,C00703975,29874249,,,ORG,Fox News Network,,,,,,1211 Avenue Of The Americas,,New York,NY,100368701,G2020,,2021-06-23,5874.51,91973.33,,,,,,,,,,,,,,,,,,,,,,Travel Offset
SA20A,C00703975,29874242,,,ORG,State Of New Hampshire,,,,,,25 Capitol St,,Concord,NH,033016312,G2020,,2021-06-25,2064.31,2064.31,,,,,,,,,,,,,,,,,,,,,,Tax Refund; Original payment through Zenefits
SA20A,C00703975,29874252,,,ORG,ActBlue Technical Services,,,,,,366 Summer St,,Somerville,MA,021443132,G2020,,2021-06-30,7932.41,72576.96,,,,,,,,,,,,,,,,,,,,,,Service Fee Refund
SA20A,C00703975,29874245,,,ORG,United States Secret Service (USSS),,,,,,950 H St NW,,Washington,DC,202230001,G2020,,2021-04-07,579634.88,3079091.53,,,,,,,,,,,,,,,,,,,,,,Travel Offset,
SA20A,C00703975,29874246,,,ORG,United States Secret Service (USSS),,,,,,950 H St NW,,Washington,DC,202230001,G2020,,2021-04-07,692087.52,3079091.53,,,,,,,,,,,,,,,,,,,,,,Travel Offset,
SA20A,C00703975,29874253,,,ORG,Zenefits,,,,,,50 Beale St,Fl 10,San Francisco,CA,941051863,G2020,,2021-04-13,27264.40,184631.16,,,,,,,,,,,,,,,,,,,,,,Insurance Offset,
SA20A,C00703975,29874222,,,ORG,Boston Globe,,,,,,53 State St,,Boston,MA,021092820,G2020,,2021-04-14,5131.22,7424.37,,,,,,,,,,,,,,,,,,,,,,Travel Offset,
SA20A,C00703975,29874243,,,ORG,Bully Pulpit Interactive LLC,,,,,,1445 New York Ave NW,Fl 5,Washington,DC,200052267,G2020,,2021-04-14,18112.89,214904.10,,,,,,,,,,,,,,,,,,,,,,Advertising Refund,
SA20A,C00703975,29874256,,,ORG,Zenefits,,,,,,50 Beale St,Fl 10,San Francisco,CA,941051863,G2020,,2021-04-16,0.01,185274.30,,,,,,,,,,,,,,,,,,,,,,Taxes Refund,
SA20A,C00703975,29874257,,,ORG,Zenefits,,,,,,50 Beale St,Fl 10,San Francisco,CA,941051863,G2020,,2021-04-16,643.13,185274.30,,,,,,,,,,,,,,,,,,,,,,Taxes Refund,
SA20A,C00703975,29874248,,,ORG,American Express,,,,,,PO Box 1270,,Newark,NJ,071011270,G2020,,2021-05-05,0.21,364.27,,,,,,,,,,,,,,,,,,,,,,Processing Fees Refund,
SA20A,C00703975,29874258,,,ORG,Zenefits,,,,,,50 Beale St,Fl 10,San Francisco,CA,941051863,G2020,,2021-05-05,45.00,185319.30,,,,,,,,,,,,,,,,,,,,,,Taxes Refund,
SA20A,C00703975,29874223,,,ORG,United States Postal Service,,,,,,900 Brentwood Rd NE,,Washington,DC,200181004,G2020,,2021-05-10,2062.47,6872.47,,,,,,,,,,,,,,,,,,,,,,Postage Refund,
SA20A,C00703975,29874224,,,ORG,United States Postal Service,,,,,,900 Brentwood Rd NE,,Washington,DC,200181004,G2020,,2021-05-10,2049.98,6872.47,,,,,,,,,,,,,,,,,,,,,,Postage Refund,
SA20A,C00703975,29874225,,,ORG,United States Postal Service,,,,,,900 Brentwood Rd NE,,Washington,DC,200181004,G2020,,2021-05-10,1399.19,6872.47,,,,,,,,,,,,,,,,,,,,,,Postage Refund,
SA20A,C00703975,29874226,,,ORG,United States Postal Service,,,,,,900 Brentwood Rd NE,,Washington,DC,200181004,G2020,,2021-05-10,1360.83,6872.47,,,,,,,,,,,,,,,,,,,,,,Postage Refund,
SA20A,C00703975,29874254,,,ORG,Zenefits,,,,,,50 Beale St,Fl 10,San Francisco,CA,941051863,G2020,,2021-05-12,18298.62,203617.92,,,,,,,,,,,,,,,,,,,,,,Insurance Offset,
SA20A,C00703975,29874229,,,ORG,C & J Development Co. LLC,,,,,,2401 SE Tones Dr,Ste 17,Ankeny,IA,500218886,G2020,,2021-05-17,307.78,1669.14,,,,,,,,,,,,,,,,,,,,,,Rent Refund,
SA20A,C00703975,29874230,,,ORG,Hudson School District,,,,,,20 Library St,,Hudson,NH,030514240,G2020,,2021-05-17,210.00,210.00,,,,,,,,,,,,,,,,,,,,,,Site Rental Refund,
SA20A,C00703975,29874228,,,ORG,Pitney Bowes,,,,,,1 Elmcroft Rd,,Stamford,CT,069260700,G2020,,2021-05-17,414.25,5779.49,,,,,,,,,,,,,,,,,,,,,,Computer Equipment Refund,
SA20A,C00703975,29874231,,,ORG,Pitney Bowes,,,,,,1 Elmcroft Rd,,Stamford,CT,069260700,G2020,,2021-05-17,165.24,5779.49,,,,,,,,,,,,,,,,,,,,,,Computer Equipment Refund,
SA20A,C00703975,29874227,,,ORG,Treasurer Of The State Of Missouri,,,,,,301 W High St,,Jefferson City,MO,651011517,G2020,,2021-05-17,665.07,665.07,,,,,,,,,,,,,,,,,,,,,,Tax Refund; Original payment through Zenefits,
SA20A,C00703975,29874250,,,ORG,Media Buying & Analytics LLC,,,,,,2020 Howell Mill Rd NW,Ste 348D,Atlanta,GA,303181732,G2020,,2021-05-20,750000.00,750000.00,,,,,,,,,,,,,,,,,,,,,,Media Buy Refund,
SA20A,C00703975,29874241,,,ORG,Carter Printing Company,,,,,,1739 E Grand Ave,,Des Moines,IA,503163611,G2020,,2021-06-07,2396.48,2396.48,,,,,,,,,,,,,,,,,,,,,,Printing Refund,
SA20A,C00703975,29874235,,,ORG,Comcast,,,,,,401 White Horse Rd,,Voorhees,NJ,080432604,G2020,,2021-06-07,8970.00,8988.93,,,,,,,,,,,,,,,,,,,,,,Utilities Refund,
SA20A,C00703975,29874238,,,ORG,"Greenspun Media Group, LLC",,,,,,2275 Corporate Cir,Ste 300,Henderson,NV,890747745,G2020,,2021-06-07,206.22,206.22,,,,,,,,,,,,,,,,,,,,,,Travel Offset,
SA20A,C00703975,29874239,,,ORG,State Of California Treasurer,,,,,,800 CAPITAL Mall,,Sacramento,CA,94813,G2020,,2021-06-07,238.72,586.75,,,,,,,,,,,,,,,,,,,,,,Tax Refund; Original payment through Zenefits,
SA20A,C00703975,29874236,,,ORG,"WJCT, Inc",,,,,,100 Festival Park Ave,,Jacksonville,FL,322021309,G2020,,2021-06-07,270.16,270.16,,,,,,,,,,,,,,,,,,,,,,Travel Offset,
SA20A,C00703975,29874255,,,ORG,Zenefits,,,,,,50 Beale St,Fl 10,San Francisco,CA,941051863,G2020,,2021-06-11,13841.21,217459.13,,,,,,,,,,,,,,,,,,,,,,Insurance Offset,
SA20A,C00703975,29874259,,,ORG,Zenefits,,,,,,50 Beale St,Fl 10,San Francisco,CA,941051863,G2020,,2021-06-16,500.00,217959.13,,,,,,,,,,,,,,,,,,,,,,Taxes Refund,
SA20A,C00703975,29874244,,,ORG,Agence France-Presse,,,,,,1500 K St NW,Ste 600,Washington,DC,200051200,G2020,,2021-06-17,7578.99,96065.20,,,,,,,,,,,,,,,,,,,,,,Travel Offset,
SA20A,C00703975,29874247,,,ORG,Bloomberg LP,,,,,,731 Lexington Ave,,New York,NY,100221331,G2020,,2021-06-17,3621.52,95610.18,,,,,,,,,,,,,,,,,,,,,,Travel Offset,
SA20A,C00703975,29874249,,,ORG,Fox News Network,,,,,,1211 Avenue Of The Americas,,New York,NY,100368701,G2020,,2021-06-23,5874.51,91973.33,,,,,,,,,,,,,,,,,,,,,,Travel Offset,
SA20A,C00703975,29874242,,,ORG,State Of New Hampshire,,,,,,25 Capitol St,,Concord,NH,033016312,G2020,,2021-06-25,2064.31,2064.31,,,,,,,,,,,,,,,,,,,,,,Tax Refund; Original payment through Zenefits,
SA20A,C00703975,29874252,,,ORG,ActBlue Technical Services,,,,,,366 Summer St,,Somerville,MA,021443132,G2020,,2021-06-30,7932.41,72576.96,,,,,,,,,,,,,,,,,,,,,,Service Fee Refund,
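Every row in the updated SA20A.csv ends with one extra comma: an empty trailing field, presumably the `reference_code` column at the end of the header that the old rows left out. One quick way to confirm an output file is rectangular is to compare each row's width with the header's. This sketch uses Python's `csv` module rather than counting commas, so a quoted value such as `"Greenspun Media Group, LLC"` counts as one field. `ragged_rows` is a hypothetical helper, not part of FastFEC.

```python
import csv
import io


def ragged_rows(csv_text: str) -> list[tuple[int, int]]:
    """Return (line_number, field_count) for each row whose width differs from the header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # reader.line_num is read after each row is parsed, so it is that row's line.
    return [
        (reader.line_num, len(row))
        for row in reader
        if len(row) != len(header)
    ]


sample = "a,b,c\n1,2\n1,2,3\n1,2,3,\n"
print(ragged_rows(sample))  # [(2, 2), (4, 4)]
```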