Bills
Bill data is primarily collected by the bills.py task, which covers 2013 to the present. There are two other scrapers to cover earlier time periods.
bills.py collects data from the official congressional XML data on legislation, which covers the 113th Congress (2013) to the present.
This process has two parts. First, the XML data must be fetched from Govinfo. This script pulls the bill status XML and on subsequent runs only pulls new and changed files:
./run govinfo --bulkdata=BILLSTATUS
Then run the bills task to process any new and changed files:
./run bills
It's recommended to do this two-step process no more than every 6 hours, as the data is not updated more frequently than that (and often really only once daily).
To process only a specific bill, pass in the ID for that bill. For example, the ID for S. 968 in the 112th Congress is s968-112:
./run bills --bill_id=s968-112
Previously we were able to fetch bill information from 1973 (93rd Congress) through 2012 by screen scraping THOMAS.gov, the now-gone official source for legislative information run by the Library of Congress. THOMAS was shut down on July 5, 2016, and no screen scraper has been written for Congress.gov, so that information is no longer available in this project. The ProPublica data store has this information as a final archive from THOMAS.
statutes.py collects bill data from GPO FDSys's Statutes at Large collection. The Statutes at Large is the official final compilation of public and private laws and agreed-to concurrent resolutions produced by each Congress. This source therefore only provides information on bills that were enacted and concurrent resolutions that were agreed to. This covers 1951-1972 (82nd-92nd Congresses).
To run this scraper, first download the Statutes at Large MODS metadata files from GPO:
./run fdsys --collections=STATUTE --store=mods
Then run the scraper. You can run it in various forms:
./run statutes
./run statutes --volume=65
./run statutes --volumes=65-86
./run statutes --year=1951
./run statutes --years=1951-1972
This outputs the same sort of bill JSON and XML as the bills.py scraper. It also outputs bill text metadata files, and it can also be used to output bill text itself (see the Bill Text documentation).
The Statutes at Large are available from GPO from the 82nd Congress to the present. But starting with the 93rd Congress (1973), you should get bill metadata from bills.py instead.
The bill files produced by this scraper contain made-up action entries, since we don't know the legislative history of the bill. For the sake of outputting status information, we also assume every bill was enacted by being signed by the President, although it could have been enacted in either of the two other ways a bill can become law.
The American Memory scraper, which is in a separate repository, collects data from the Library of Congress's American Memory Century of Lawmaking collection of historical bill data from 1799-1873 (6th-42nd Congresses). This data is the most impoverished, but the scraper attempts to output bill data in the same format as bills.py.
Every bill has a JSON file, data.json, with fields related to a bill's ID, status, names, sponsorship, amendments, and history. There is also a corresponding XML file, data.xml, which is roughly compatible with GovTrack's legacy data format; however, the XML format is not documented, and is not particularly recommended.
The output files are in the following location:
data/[congress]/bills/[bill_type]/[bill_type][number]/data.{json,xml}
See the documentation for definitions of congress, bill_type, and number.
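As a rough illustration of the layout above, the following sketch builds the expected path for a bill from its ID. The helper name `bill_data_path`, the regular expression, and the default `root` of `data` are assumptions made for this example and are not part of the project.

```python
import re
from pathlib import Path

# Hypothetical helper (not part of the project): maps a bill ID such as
# "hr3590-111" onto the documented output location.
BILL_ID_RE = re.compile(r"^([a-z]+)(\d+)-(\d+)$")


def bill_data_path(bill_id: str, fmt: str = "json", root: str = "data") -> Path:
    """Return data/[congress]/bills/[bill_type]/[bill_type][number]/data.[fmt]."""
    match = BILL_ID_RE.match(bill_id)
    if match is None:
        raise ValueError(f"not a bill ID: {bill_id!r}")
    bill_type, number, congress = match.groups()
    return Path(root) / congress / "bills" / bill_type / f"{bill_type}{number}" / f"data.{fmt}"


print(bill_data_path("hr3590-111").as_posix())
```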
The examples below use data excerpts where possible from H.R. 3590 from the 111th Congress - the Patient Protection and Affordable Care Act, also known as Obamacare.
{
"bill_id": "hr3590-111",
"bill_type": "hr",
"number": "3590",
"congress": "111",
"introduced_at": "2009-09-17",
"updated_at": "2013-07-19T23:40:56-04:00"
}
Bills are uniquely identified by the combination of a Congress, bill type, and bill number. A "Congress" is a two-year period beginning at noon on January 3 following an election to noon on January 3 two years later. The 111th Congress was from January 2009 to January 2011. Because Congresses end so early in a calendar year, we often write 2009-2010 for shorthand. This timing of Congresses started in 1941 with the 77th Congress. Before that, the starting and ending dates of Congresses were irregular.
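The relationship between a Congress number and its calendar years can be sketched as below. The function names are hypothetical; the arithmetic relies on the 1st Congress having convened in 1789 and is consistent with the pairs quoted on this page (93rd and 1973, 111th and 2009, 113th and 2013). It yields only the year a Congress convened, not its exact start and end dates.

```python
def congress_start_year(congress: int) -> int:
    """Calendar year in which the given Congress convened (1st Congress: 1789)."""
    if congress < 1:
        raise ValueError("Congress numbers start at 1")
    return 1789 + 2 * (congress - 1)


def congress_shorthand(congress: int) -> str:
    """Two-calendar-year shorthand, e.g. "2009-2010" for the 111th Congress."""
    start = congress_start_year(congress)
    return f"{start}-{start + 1}"


print(congress_shorthand(111))
```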
bill_type can be one of hr, hres, hjres, hconres, s, sres, sjres, sconres. These are distinct sorts of legislative documents. Two of these are for bills. The remaining are types of resolutions. When you display these types, it is important to use the standard abbreviations.
- hr: "H.R. 1234". It stands for House of Representatives, but it is the prefix used for bills introduced in the House.
- hres: "H.Res. 1234". It stands for House Simple Resolution.
- hconres: "H.Con.Res. 1234". It stands for House Concurrent Resolution.
- hjres: "H.J.Res. 1234". It stands for House Joint Resolution.
- s: "S. 1234". It stands for Senate and it is the prefix used for bills introduced in the Senate. Any abbreviation besides "S." is incorrect.
- sres: "S.Res. 1234". It stands for Senate Simple Resolution.
- sconres: "S.Con.Res. 1234". It stands for Senate Concurrent Resolution.
- sjres: "S.J.Res. 1234". It stands for Senate Joint Resolution.
Simple resolutions only get a vote in their originating chamber. Concurrent resolutions get a vote in both chambers but do not go to the President. Neither has the force of law. Joint resolutions can be used either to propose an amendment to the Constitution or to propose a law. When used to propose a law, they follow exactly the same procedural steps as bills.
The bill number is a positive integer. Bills die at the end of a Congress and numbering starts with 1 at the beginning of each new Congress.
Bill IDs are of the form [bill_type][number]-[congress].
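A minimal sketch of working with this ID format: it splits a bill ID into its parts and renders the standard display abbreviation for each bill type listed earlier. `parse_bill_id` and `display_name` are hypothetical names, and returning the number and Congress as integers is a choice made for this example (the JSON files store them as strings).

```python
import re

# Standard display abbreviations for each bill_type, as listed on this page.
DISPLAY_PREFIX = {
    "hr": "H.R.", "hres": "H.Res.", "hconres": "H.Con.Res.", "hjres": "H.J.Res.",
    "s": "S.", "sres": "S.Res.", "sconres": "S.Con.Res.", "sjres": "S.J.Res.",
}

BILL_ID_RE = re.compile(r"^(hr|hres|hjres|hconres|s|sres|sjres|sconres)(\d+)-(\d+)$")


def parse_bill_id(bill_id: str) -> tuple:
    """Split "[bill_type][number]-[congress]" into (bill_type, number, congress)."""
    match = BILL_ID_RE.match(bill_id)
    if match is None:
        raise ValueError(f"not a bill ID: {bill_id!r}")
    bill_type, number, congress = match.groups()
    return bill_type, int(number), int(congress)


def display_name(bill_id: str) -> str:
    """Render a bill ID with its standard abbreviation, e.g. "H.R. 3590"."""
    bill_type, number, _congress = parse_bill_id(bill_id)
    return f"{DISPLAY_PREFIX[bill_type]} {number}"


print(display_name("hr3590-111"))
```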
All introduction dates are dates, not specific times.
updated_at is the date and time that the JSON file was last saved. It reflects the time the scraper was run and is not metadata about the bill itself.
{
"official_title": "An act entitled The Patient Protection and Affordable Care Act.",
"popular_title": "Health care reform bill",
"short_title": "Patient Protection and Affordable Care Act",
"titles": [
{
"as": null,
"is_for_portion": false,
"title": "Health care reform bill",
"type": "popular"
},
{
"as": "introduced",
"is_for_portion": false,
"title": "Service Members Home Ownership Tax Act of 2009",
"type": "short"
},
{
"as": "passed house",
"is_for_portion": false,
"title": "Patient Protection and Affordable Care Act",
"type": "short"
},
{
"as": "passed house",
"is_for_portion": false,
"title": "Service Members Home Ownership Tax Act of 2009",
"type": "short"
},
{
"as": "passed house",
"is_for_portion": true,
"title": "Biologics Price Competition and Innovation Act of 2009",
"type": "short"
},
...
{
"as": "enacted",
"is_for_portion": false,
"title": "Patient Protection and Affordable Care Act",
"type": "short"
},
...
{
"as": "amended by senate",
"is_for_portion": false,
"title": "An act entitled The Patient Protection and Affordable Care Act.",
"type": "official"
}
]
}
Bills can have "official" descriptive titles (almost always), "short" catchy titles (sometimes), and "popular" nickname titles (rare). They can have many of these titles, given at various stages of a bill's life. The current official, short, and popular titles are kept in top-level official_title, short_title, and popular_title fields.
Popular titles are assigned by the Library of Congress, and can be added at any time.
A bill may have multiple titles for any given stage. is_for_portion is true when the title is for a portion of the bill, and these titles should not be used when choosing a title for display for the entire bill. See the current_title_for function for how to choose a bill title for display, if you do not want to use the titles we've already picked out.
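The sketch below shows one way a consumer of the JSON might use these fields. It is not the project's current_title_for function; the helper names and the preference order (short, then popular, then official) are assumptions made for illustration.

```python
def whole_bill_titles(bill: dict, title_type: str) -> list:
    """Titles of the given type that apply to the entire bill, in file order."""
    return [
        t["title"]
        for t in bill.get("titles", [])
        if t["type"] == title_type and not t["is_for_portion"]
    ]


def display_title(bill: dict):
    """Pick a display title from the top-level fields (assumed preference order)."""
    return bill.get("short_title") or bill.get("popular_title") or bill.get("official_title")


bill = {
    "official_title": "An act entitled The Patient Protection and Affordable Care Act.",
    "popular_title": "Health care reform bill",
    "short_title": "Patient Protection and Affordable Care Act",
    "titles": [
        {"as": "passed house", "is_for_portion": False,
         "title": "Patient Protection and Affordable Care Act", "type": "short"},
        {"as": "passed house", "is_for_portion": True,
         "title": "Biologics Price Competition and Innovation Act of 2009", "type": "short"},
    ],
}
print(display_title(bill))
print(whole_bill_titles(bill, "short"))
```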
{
"subjects_top_term": "Health",
"subjects": [
"Abortion",
"Administrative law and regulatory procedures",
"Adoption and foster care"
],
"summary": {
"as": "Public Law",
"date": "2010-03-23",
"text": "Patient Protection and Affordable Care Act - Title I: Quality, Affordable Health Care for All Americans..."
}
}
The Library of Congress assigns official summaries to some bills, and official keywords to most bills. These values are written by a human in the Library of Congress, and are not usually present when information on the bill is first published. They can be added at any time.
A bill is assigned at most one "top term", which is set in subjects_top_term. A bill's subject may change through its life cycle, and we believe the Library of Congress adds subject terms cumulatively. Thus, by the end of a bill's life, some subjects may no longer appear relevant.
Two taxonomies of subject terms are used. From the 93rd through the 110th Congress a very expansive taxonomy was used. The taxonomy of subject terms changed with the 111th Congress and got smaller (main terms, named entities, documentation on THOMAS).
{
"status": "ENACTED:SIGNED",
"status_at": "2010-03-23",
"history": {
"active": true,
"active_at": "2009-10-07T14:35:00-04:00",
"awaiting_signature": false,
"enacted": true,
"enacted_at": "2010-03-23",
"house_passage_result": "pass",
"house_passage_result_at": "2010-03-21T22:48:00-04:00",
"senate_cloture_result": "pass",
"senate_cloture_result_at": "2009-12-23",
"senate_passage_result": "pass",
"senate_passage_result_at": "2009-12-24",
"vetoed": false
},
"enacted_as": {
"congress": "111",
"law_type": "public",
"number": "148"
}
}
The status property gives the current status of the bill, and status_at gives the date or timestamp when the bill transitioned to that status. The values for status are documented at the end of this page.
The history section contains a number of useful flags and timestamps documenting the life cycle of a bill. Timestamps can be either dates or datetimes. The fields are:
- active: true if the bill has activity beyond the typical introductory activities all bills go through (mostly referrals). The corresponding _at field (active_at) is the date of the first such activity.
- house_passage_result / senate_passage_result: The result of the (first) House/Senate vote on passage: pass or fail. (This excludes ping-pong and conference report votes.)
- senate_cloture_result: The result of the most recent Senate cloture vote on passage of the bill.
- vetoed: true if the bill was vetoed.
- house_override_result / senate_override_result: Present if a veto override occurs, and is either pass or fail.
- enacted: true if the bill was enacted.
- awaiting_signature: true in bills that have been sent to the President for signature but have not yet been enacted or vetoed. The time field for this one is awaiting_signature_since.
The _at field is the date or datetime on which the corresponding event occurred. It is present only if the event occurred.
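Because these timestamps may be either plain dates or full datetimes, a reader of the JSON has to handle both. The following is a minimal sketch assuming the two ISO 8601 shapes seen in the examples on this page; `parse_timestamp` and `history_events` are hypothetical names.

```python
from datetime import date, datetime
from typing import Union


def parse_timestamp(value: str) -> Union[date, datetime]:
    """Parse "2010-03-23" as a date and "2010-03-21T22:48:00-04:00" as a datetime."""
    if "T" in value:
        return datetime.fromisoformat(value)
    return date.fromisoformat(value)


def history_events(history: dict) -> dict:
    """Map each event name to its parsed timestamp, using the *_at fields."""
    return {
        key[: -len("_at")]: parse_timestamp(value)
        for key, value in history.items()
        if key.endswith("_at") and value
    }


history = {
    "active": True,
    "active_at": "2009-10-07T14:35:00-04:00",
    "enacted": True,
    "enacted_at": "2010-03-23",
}
print(history_events(history))
```

Note that date and datetime values cannot be compared with each other directly, so code that sorts events needs to normalize them first.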
If a bill has been enacted as law, the enacted_as field will be present. In this case, H.R. 3590 became "Public Law 111-148".
law_type can be either "public" or "private". Most laws are public. Private laws affect a particular person or group. For example, sometimes individuals are granted citizenship directly through private laws, such as with S. 4010 in the 111th Congress. Enacted bills are typically cited in the form of "Public Law 111-148" or "Private Law 111-1".
The number field is called the "slip law number". It is assigned consecutively starting with 1 at the start of each Congress as each new law is enacted. The number is assigned by the National Archives and Records Administration as it files away the actual document signed by the President.
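The citation form described above can be assembled from the enacted_as object, as in this short sketch (`law_citation` is a hypothetical helper name):

```python
def law_citation(enacted_as: dict) -> str:
    """Format an enacted_as object as e.g. "Public Law 111-148"."""
    kind = {"public": "Public Law", "private": "Private Law"}[enacted_as["law_type"]]
    return f"{kind} {enacted_as['congress']}-{enacted_as['number']}"


print(law_citation({"congress": "111", "law_type": "public", "number": "148"}))
```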
{
"sponsor": {
"bioguide_id": "C001062",
"district": "11",
"name": "Conaway, K. Michael",
"state": "TX",
"title": "Rep",
"type": "person"
},
"cosponsors": [
{
"bioguide_id": "A000374",
"district": "5",
"name": "Abraham, Ralph Lee",
"original_cosponsor": true,
"sponsored_at": "2015-02-24",
"state": "LA",
"title": "Rep",
"withdrawn_at": null
},
{
"bioguide_id": "A000370",
"district": "12",
"name": "Adams, Alma S.",
"original_cosponsor": false,
"sponsored_at": "2015-07-22",
"state": "NC",
"title": "Rep",
"withdrawn_at": null
},
...
]
}
A bill has at most one primary sponsor (sponsor), and zero or more cosponsors.
Information on the sponsor and cosponsors includes basic information on their name, state, district, title, and, for cosponsors, when they joined as a cosponsor. Sometimes cosponsors withdraw cosponsorship.
The most useful field here is the bioguide_id. This can be used in conjunction with the dataset at congress-legislators to find much more information about the legislator - including their IDs in other useful systems.
GovTrack's archival bill data files (through the 113th Congress) use a thomas_id in place of a bioguide_id.
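As an illustration of reading the sponsorship fields, this sketch lists cosponsors who have not withdrawn and collects bioguide_id values for lookup elsewhere. The function names are hypothetical, and treating a null withdrawn_at as "still a cosponsor" is an assumption based on the example above.

```python
def active_cosponsors(bill: dict) -> list:
    """Cosponsors whose withdrawn_at is null, i.e. who have not withdrawn."""
    return [c for c in bill.get("cosponsors", []) if c.get("withdrawn_at") is None]


def legislator_ids(bill: dict) -> list:
    """bioguide_id of the sponsor (if any) followed by those of active cosponsors."""
    ids = []
    sponsor = bill.get("sponsor")
    if sponsor:
        ids.append(sponsor["bioguide_id"])
    ids.extend(c["bioguide_id"] for c in active_cosponsors(bill))
    return ids


bill = {
    "sponsor": {"bioguide_id": "C001062", "name": "Conaway, K. Michael"},
    "cosponsors": [
        {"bioguide_id": "A000374", "original_cosponsor": True,
         "sponsored_at": "2015-02-24", "withdrawn_at": None},
        {"bioguide_id": "A000370", "original_cosponsor": False,
         "sponsored_at": "2015-07-22", "withdrawn_at": None},
    ],
}
print(legislator_ids(bill))
```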
{
"committees": [
{
"activity": [
"referral"
],
"committee": "House Ways and Means",
"committee_id": "HSWM"
}
]
}
Bills are typically referred to one or more committees shortly after introduction. The committees object will list which committees have what relation to the bill. The fields on each committee object are:
committee: The name of the committee as it is referenced on THOMAS.gov. If the committee is a subcommittee, the name of the subcommittee will appear in a subcommittee field, with its parent committee named in the committee field.
committee_id: The ID of the committee in committees-historical.yaml, and, for current bills, in committees-current.yaml. As with committee, if this object is for a subcommittee, this is the parent committee's ID.
When this object refers to a relation to a subcommittee, then subcommittee and subcommittee_id will also be set. Here is an example from S. 609 in the 113th Congress:
{
"committees": [
{
"activity": [
"referral",
"markup",
"reporting",
"in committee"
],
"committee": "Senate Energy and Natural Resources",
"committee_id": "SSEG"
},
{
"activity": [
"hearings"
],
"committee": "Senate Energy and Natural Resources",
"committee_id": "SSEG",
"subcommittee": "Subcommittee on Public Lands, Forests, and Mining",
"subcommittee_id": "03"
}
]
}
subcommittee: The name of the subcommittee as it is referenced on THOMAS.gov.
subcommittee_id: The ID of the subcommittee. It is a two-digit number-like string, and is unique within a committee. See the committee YAML files linked above.
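Since subcommittee_id is only unique within a committee, a natural key for a committee entry is the pair of committee_id and subcommittee_id. The sketch below groups activity by that pair; `committee_activity` is a hypothetical helper, and using None for the full committee is a choice made for this example.

```python
def committee_activity(bill: dict) -> dict:
    """Map (committee_id, subcommittee_id or None) to the list of activities."""
    result = {}
    for entry in bill.get("committees", []):
        key = (entry["committee_id"], entry.get("subcommittee_id"))
        result.setdefault(key, []).extend(entry.get("activity", []))
    return result


bill = {
    "committees": [
        {"activity": ["referral", "markup", "reporting", "in committee"],
         "committee": "Senate Energy and Natural Resources", "committee_id": "SSEG"},
        {"activity": ["hearings"],
         "committee": "Senate Energy and Natural Resources", "committee_id": "SSEG",
         "subcommittee": "Subcommittee on Public Lands, Forests, and Mining",
         "subcommittee_id": "03"},
    ]
}
print(committee_activity(bill))
```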
{
"amendments": [
{
"amendment_id": "s2786-111",
"amendment_type": "s",
"chamber": "s",
"number": "2786"
},
{
"amendment_id": "s2787-111",
"amendment_type": "s",
"chamber": "s",
"number": "2787"
}
]
}
The amendments field lists any amendments introduced in relation to this bill. Amendment IDs are of the form [amendment_type][number]-[congress].
Almost all the time, the amendment_type is the chamber ("h" or "s"). In the 97th and 98th Congresses, there appear some "Senate Unprinted Amendments". For these amendments, the amendment_type is "su".
Amendments have at least five numbering systems. The number here is the number according to the Library of Congress in their "S.Amdt./S.Up.Amdt./H.Amdt." numbering system.
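A small sketch of splitting an amendment ID into its parts, including the "su" type. `parse_amendment_id` is a hypothetical name and the regular expression is an assumption derived from the ID form described above.

```python
import re

AMENDMENT_ID_RE = re.compile(r"^(su|s|h)(\d+)-(\d+)$")


def parse_amendment_id(amendment_id: str) -> tuple:
    """Split "[amendment_type][number]-[congress]" into its three parts."""
    match = AMENDMENT_ID_RE.match(amendment_id)
    if match is None:
        raise ValueError(f"not an amendment ID: {amendment_id!r}")
    amendment_type, number, congress = match.groups()
    return amendment_type, int(number), int(congress)


print(parse_amendment_id("s2786-111"))
```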
{
"related_bills": [
{
"bill_id": "hconres254-111",
"reason": "related",
"type": "bill"
},
{
"bill_id": "hr4872-111",
"reason": "related",
"type": "bill"
}
]
}
The related_bills field lists the IDs and relationships of related bills and related amendments. The example above includes only related bills.
When type is bill, bill_id is set to the related bill and is in the same format as described earlier.
When type is amendment, amendment_id will identify the related amendment (and again is in the same format as described in the section above on amendments). We believe the Library of Congress will list related amendments when the amendment in question is essentially an entire bill. It is rare.
reason is one of:
- identical: A bill with identical substantive text.
- related and unknown: Other related bill.
- supersedes: The named bill superseded the bill represented by this file.
- rule: The named bill sets the rules under which the bill represented by this file will be considered. (This is a "providing for the consideration" resolution.)
- included-in: The text of the named bill was included in the bill represented by this file.
- ruled-by: The bill represented by this file "provides for the consideration of" the named bill.
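One way to consume these entries is to group the related IDs by reason, reading bill_id or amendment_id depending on type. This is a sketch only; `related_by_reason` is a hypothetical helper.

```python
from collections import defaultdict


def related_by_reason(bill: dict) -> dict:
    """Group related bill and amendment IDs by their reason value."""
    groups = defaultdict(list)
    for rel in bill.get("related_bills", []):
        # The ID lives in bill_id or amendment_id depending on the entry's type.
        ident = rel["bill_id"] if rel["type"] == "bill" else rel["amendment_id"]
        groups[rel["reason"]].append(ident)
    return dict(groups)


bill = {
    "related_bills": [
        {"bill_id": "hconres254-111", "reason": "related", "type": "bill"},
        {"bill_id": "hr4872-111", "reason": "related", "type": "bill"},
    ]
}
print(related_by_reason(bill))
```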
{
"actions": [
{
"acted_at": "2009-12-23",
"references": [
{
"reference": "CR S13796-13866",
"type": "consideration"
}
],
"text": "Considered by Senate.",
"type": "action"
},
...
{
"acted_at": "2009-12-24",
"how": "roll",
"references": [
{
"reference": "CR S13890-14212",
"type": "text"
}
],
"result": "pass",
"roll": "396",
"status": "PASS_BACK:SENATE",
"text": "Passed Senate with an amendment and an amendment to the Title by Yea-Nay Vote. 60 - 39. Record Vote Number: 396.",
"type": "vote",
"vote_type": "vote2",
"where": "s"
},
...
{
"acted_at": "2010-03-21T22:48:00-04:00",
"how": "roll",
"references": [
{
"reference": "CR H1920-2152",
"type": "text as House agreed to Senate amendments"
}
],
"result": "pass",
"roll": "165",
"status": "PASSED:BILL",
"suspension": null,
"text": "On motion that the House agree to the Senate amendments Agreed to by recorded vote: 219 - 212 (Roll no. 165).",
"type": "vote",
"vote_type": "pingpong",
"where": "h"
},
{
"acted_at": "2010-03-23",
"references": [],
"text": "Signed by President.",
"type": "signed"
},
{
"acted_at": "2010-03-23",
"congress": "111",
"law": "public",
"number": "148",
"references": [],
"status": "ENACTED:SIGNED",
"text": "Became Public Law No: 111-148.",
"type": "enacted"
}
]
}
Many actions can occur to a bill over its life, and there will almost always be at least one (its referral to a committee). Every action has:
- text: The action line as entered by House, Senate, or Library of Congress staff and as it appears on THOMAS.
- acted_at: The date or datetime at which the action occurred.
- references: A list of references to Congressional Record pages that document the action.
- type: The normalized type of the action (see below).
If the action occurs in a committee, the action object will also have:
- in_committee: The committee in which the activity occurred. It is the full committee name.
- in_subcommittee: The subcommittee in which the activity occurred. It is the subcommittee's name.
There are three other optional fields:
- status: If the action causes a change in the status of the bill, this has the new status of the bill after this action (see the documentation of status codes below).
- committees: A list of any related committees.
- bill_ids: A list of any related bills.
Where possible, metadata is parsed out of the text of an action to infer more information about it. Some actions will have a more specific type.
When type is vote, this action is a vote in the House or Senate on the passage of the bill, including ping pong votes, conference report votes, and veto overrides.
- vote_type is one of vote (vote on passage in originating chamber), vote2 (vote on passage in second chamber), pingpong (votes on passage after the first two, i.e. ping pong votes), conference (vote on a conference report), override (veto override), and cloture (Senate vote on cloture on the motion to pass). For vote2 and pingpong votes, you'll need to check the status to see if the chamber amended the bill (i.e. is the bill done or does it go back to the other chamber).
- how is roll if this is a roll call vote, in which case roll is set to the roll call vote number. Otherwise this is a free-form text field describing the vote type, such as unanimous consent.
- where is h or s, indicating the chamber of the vote.
- result is pass or fail.
- suspension is true when this is a House vote "under suspension of the rules", which is a special voting mode that requires a higher threshold to pass.
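The vote fields above can be used to pull roll call votes out of a bill's actions, as in this sketch. `roll_call_votes` is a hypothetical helper; it keeps only actions whose type is vote and whose how is roll.

```python
def roll_call_votes(bill: dict) -> list:
    """Summaries of actions that are roll call votes, in file order."""
    return [
        {
            "where": action["where"],
            "roll": action["roll"],
            "vote_type": action["vote_type"],
            "result": action["result"],
            "acted_at": action["acted_at"],
        }
        for action in bill.get("actions", [])
        if action.get("type") == "vote" and action.get("how") == "roll"
    ]


bill = {
    "actions": [
        {"acted_at": "2009-12-23", "text": "Considered by Senate.", "type": "action"},
        {"acted_at": "2009-12-24", "how": "roll", "result": "pass", "roll": "396",
         "status": "PASS_BACK:SENATE", "type": "vote", "vote_type": "vote2", "where": "s"},
        {"acted_at": "2010-03-21T22:48:00-04:00", "how": "roll", "result": "pass",
         "roll": "165", "status": "PASSED:BILL", "type": "vote",
         "vote_type": "pingpong", "where": "h"},
    ]
}
for vote in roll_call_votes(bill):
    print(vote["where"], vote["roll"], vote["vote_type"], vote["result"])
```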
For committee-related actions, type can be:
- referral, when the bill is referred to a committee (which could occur in either chamber).
- reported, hearings, or discharged, in which case committee will give the name of the committee that reported, held a hearing, or discharged the bill.
For actions involving the President and enactment, type can be:
- topresident, which indicates when the bill is sent to the President to be signed.
- signed, when the President signs the bill (but it may not be administratively enacted yet).
- vetoed, when the President vetoes the bill. pocket is set to 1 when this is a pocket veto.
- enacted, when the bill is administratively considered enacted. law, congress, and number are also set (see the enacted_as property on bills above).
type may also be calendar, but this is never useful. It is a formal action that has no significance, so don't rely on it.
We use a fixed taxonomy to describe the current status of a bill. Bear in mind that the taxonomy is for status, which is a separate notion from the activity that got the bill into that status. Sometimes only one activity can lead to a status, but not always.
- INTRODUCED. The bill or resolution was introduced but not yet referred to committee.
- REFERRED. The bill or resolution has been referred to committee in the originating chamber and needs committee action to continue.
- REPORTED. The bill or resolution was reported by committee in the originating chamber and can now continue with floor debate in the originating chamber.
- PROV_KILL:SUSPENSIONFAILED. The bill or resolution was brought up "under suspension of the rules" and failed that vote. It could be voted on again, so we call it provisionally killed. If the vote had passed, the status would have been a PASSED or PASS_OVER status.
- PROV_KILL:CLOTUREFAILED. A Senate cloture vote was taken on the bill or resolution and the vote failed, meaning it was successfully filibustered. It is provisionally dead, in a sense. If the vote had succeeded, no status would be noted.
- FAIL:ORIGINATING:HOUSE, FAIL:ORIGINATING:SENATE. The bill or resolution failed in its originating chamber, either in the House or Senate.
- PASSED:SIMPLERES. A simple resolution has been passed in its originating chamber. This is the end of the life for a simple resolution.
- PASSED:CONSTAMEND. A joint resolution which is proposing an amendment to the Constitution has passed both chambers in identical form. This is the end of the life for the resolution in the legislative branch. It goes on subsequently to the states.
- PASS_OVER:HOUSE, PASS_OVER:SENATE. These status codes indicate the bill or joint or concurrent resolution has passed favorably in its originating chamber and now goes on to the other chamber. PASS_OVER:HOUSE means the House bill passed the House. These statuses are not used for simple resolutions. When the second chamber passes a bill, the next step is either PASSED:BILL or, if the second chamber amended the bill, PASS_BACK:HOUSE/SENATE.
- PASSED:CONCURRENTRES. A concurrent resolution has been passed by both chambers in identical form. This is the end of the life for concurrent resolutions.
- FAIL:SECOND:HOUSE, FAIL:SECOND:SENATE. The bill or resolution passed in its originating chamber but failed in the other chamber. FAIL:SECOND:HOUSE means the bill passed in the Senate but failed in the House.
- PASS_BACK:HOUSE, PASS_BACK:SENATE. These status codes occur when a bill is passed in both chambers, but the second chamber made changes that the first chamber now has to agree to. The bill either goes to conference or "ping pong" ensues, where the chambers go back and forth between passing the bill until no one makes any more changes. The final vote takes the bill to the status PASSED:BILL. PASS_BACK:HOUSE means the House voted and sent the bill back to the Senate. Ping pong can go around and around, so PASS_BACK:HOUSE can be for both House and Senate bills.
- PROV_KILL:PINGPONGFAIL. After both chambers have passed a bill or joint/concurrent resolution, if the second chamber made a change the chambers have to resolve their differences. When the second chamber's changes go back to the first chamber for a vote, if the vote fails it's a provisional failure since they can try again.
- PASSED:BILL. A bill (or a joint resolution not proposing an amendment to the Constitution) has been passed by both chambers in identical form. This is normally displayed as "Enrolled Bill." The bill will go on to the President next. This status typically is applied when the second chamber passes a bill.
- CONFERENCE:PASSED:HOUSE, CONFERENCE:PASSED:SENATE. After a PASS_BACK:HOUSE/SENATE status, the two chambers have passed the bill but in non-identical form. The chambers can continue to play ping-pong (agreeing to the other chamber's amendment or proposing a new amendment), which will show up as alternating PASS_BACK:HOUSE/SENATE statuses. Or, the chambers can form a conference committee. A conference committee issues a report to which each chamber must agree. When the first chamber agrees to the conference report, one of these statuses is used. When the second chamber agrees to the conference report, the bill moves into the PASSED:BILL state (or PASSED:SIMPLERES, PASSED:CONSTAMEND).
- ENACTED:SIGNED. The President signed the bill.
- PROV_KILL:VETO. A bill in the PASSED:BILL status was vetoed by the President. A veto can be overridden. This status applies until an override attempt is made. Had the bill been signed, the status would have been ENACTED:SIGNED. A pocket veto is indicated separately with VETOED:POCKET because it is final (and not a provisional kill). If no override is attempted, the bill's final status remains PROV_KILL:VETO.
- VETOED:POCKET. This status code is for bills that were pocket-vetoed, meaning the President does not sign the bill and Congress adjourns. The bill does not become law and Congress has no opportunity to override.
- VETOED:OVERRIDE_FAIL_ORIGINATING:HOUSE, VETOED:OVERRIDE_FAIL_ORIGINATING:SENATE. Veto override failed in the House or Senate, the bill's originating chamber.
- VETOED:OVERRIDE_PASS_OVER:HOUSE, VETOED:OVERRIDE_PASS_OVER:SENATE. These status codes indicate a veto override attempt was successful in the originating chamber, and that it is now up to the second chamber to attempt the override. The chamber named in the status is the chamber that just had a successful override vote.
- VETOED:OVERRIDE_FAIL_SECOND:HOUSE, VETOED:OVERRIDE_FAIL_SECOND:SENATE. Veto override passed in the originating chamber but failed in the second chamber.
- ENACTED:VETO_OVERRIDE. The bill was vetoed but the veto was overridden in both chambers.
- ENACTED:TENDAYRULE. The bill became law pursuant to the "ten Days (Sundays excepted)" provision of the Constitution, when a bill is neither signed nor vetoed. This has happened only six times since the 93rd Congress, none recently.
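Because the status codes are colon-separated, simple string handling is enough to classify them. The helpers below are a hypothetical sketch that relies only on the codes listed above: the first component is treated as the status family, and a trailing HOUSE or SENATE component as the chamber named in the status.

```python
from typing import Optional


def status_family(status: str) -> str:
    """First colon-separated component, e.g. "ENACTED" or "PROV_KILL"."""
    return status.split(":", 1)[0]


def is_enacted(status: str) -> bool:
    """True for ENACTED:SIGNED, ENACTED:VETO_OVERRIDE, and ENACTED:TENDAYRULE."""
    return status_family(status) == "ENACTED"


def is_provisionally_killed(status: str) -> bool:
    """True for the PROV_KILL statuses, which may still be revived."""
    return status_family(status) == "PROV_KILL"


def status_chamber(status: str) -> Optional[str]:
    """The chamber named at the end of the status, or None if there is none."""
    last = status.rsplit(":", 1)[-1]
    return last if last in ("HOUSE", "SENATE") else None


print(is_enacted("ENACTED:SIGNED"), status_chamber("PASS_BACK:SENATE"))
```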