Historical Committee assignments #46

Open
alexanderfurnas opened this issue Mar 14, 2013 · 21 comments
Comments

@alexanderfurnas

Great work here, such an excellent source. I was curious about the possibility of keeping historical committee assignments for legislators from their previous terms. As I understand it, only current committee assignments are housed here. Does anyone have thoughts on this?

@dwillis
Member

dwillis commented Mar 14, 2013

It's on my list - I have assignments from the 105th congress onward in the NYT data, but only 111th-present are in the API and vetted. But these should be coming.

@konklone
Member

That's great - Derek, if you do that, and wouldn't mind updating this
thread, I'd be happy to do the legwork of importing them into our data here.


@alexanderfurnas
Author

Fantastic. Thanks for the response Derek.

@schmod
Contributor

schmod commented Apr 18, 2013

The Senate Calendar includes a listing of committee assignments, and is available from fdsys as far back as 1996 (although it's PDF only prior to the 105th Congress).

http://www.gpo.gov/fdsys/browse/collection.action?collectionCode=CCAL&browsePath=107%2FSCAL%2F2002-11%2F11-20%5C%2F4%3BFINAL&isCollapsed=false&leafLevelBrowse=false&isDocumentResults=true&ycord=0

@dwillis
Member

dwillis commented Apr 18, 2013

Yep, a good resource, although those are the "final" rosters and don't reflect changes made during the course of each congress, which ideally we'd like to have.

@schmod
Contributor

schmod commented Apr 18, 2013

Aha, got it.

@schmod
Contributor

schmod commented Apr 18, 2013

Hm. You could step through hearing reports on FDSys, which all have the supposedly-then-current committee membership attached to them.

Parsing actually might be fairly easy (as far as these things go), as the GPO put the committee membership in the XML metadata for each document.
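A minimal sketch of that idea, assuming each hearing's metadata file lists members in `congMember` elements under the MODS namespace with a `bioGuideId` attribute and a child `name` element. Those element and attribute names, the sample document, and the placeholder IDs are assumptions for illustration and should be checked against a real metadata file before relying on them:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; real metadata files are much larger and their
# exact element names should be verified before use.
SAMPLE_MODS = """<mods xmlns="http://www.loc.gov/mods/v3">
  <extension>
    <congMember bioGuideId="X000001" chamber="S">
      <name type="parsed">Member One of Vermont</name>
    </congMember>
    <congMember bioGuideId="X000002" chamber="S">
      <name type="parsed">Member Two of Iowa</name>
    </congMember>
  </extension>
</mods>"""

NS = {"m": "http://www.loc.gov/mods/v3"}

def members_from_mods(xml_text):
    """Return one dict per member element found in the metadata."""
    root = ET.fromstring(xml_text)
    members = []
    for el in root.iterfind(".//m:congMember", NS):
        name_el = el.find("m:name", NS)
        has_name = name_el is not None and name_el.text
        members.append({
            "bioguide": el.get("bioGuideId"),
            "chamber": el.get("chamber"),
            "name": name_el.text.strip() if has_name else None,
        })
    return members

for member in members_from_mods(SAMPLE_MODS):
    print(member["bioguide"], member["name"])
```

Running this over every hearing in date order and recording when a member first and last appears would give approximate membership windows, though only as precise as the hearing dates allow.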

@jasonab

jasonab commented Jul 17, 2013

Just wanted to check on this issue, with Ed Markey moving from the House to the Senate today. It would be nice for the data to reflect that in his committee memberships. For my needs, I don't care about past data so much as current changes, and maybe the 112th Congress. It'd be nice to get at least that much.

@JoshData
Member

We update current committee assignments using the committee_membership.py script. I've just run it, see f15f12d.

@bchartoff
Contributor

There's a wealth of historical committee membership data here:
http://web.mit.edu/17.251/www/data_page.html#2%29

Pros:

  • It does record mid-session changes to rosters
  • Easily readable data for all historical congresses (csvs w/ icpsr id keys)

Cons:

  • Doesn't include subcommittees, only parent committees
  • Large time lag, certainly not a replacement source for updating current data (just a one-off historical update)

Given that the current-committees data has higher granularity (sub committees), is it worth scraping and preserving this data for historical committee membership?
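To illustrate why recorded mid-session changes matter, here is a rough sketch of answering "who sat on this committee on a given day" from a CSV keyed by ICPSR id. The column names (`icpsr`, `committee`, `start`, `end`) and the sample rows are hypothetical and are not the layout of the files linked above:

```python
import csv
import io
from datetime import date

# Hypothetical rows; an empty "end" means the assignment is still open.
SAMPLE_CSV = """icpsr,committee,start,end
10001,Appropriations,2009-01-06,2010-03-01
10002,Appropriations,2010-03-02,
10003,Judiciary,2009-01-06,
"""

def load_assignments(fileobj):
    """Read assignment rows and convert the date columns."""
    rows = []
    for row in csv.DictReader(fileobj):
        rows.append({
            "icpsr": row["icpsr"],
            "committee": row["committee"],
            "start": date.fromisoformat(row["start"]),
            "end": date.fromisoformat(row["end"]) if row["end"] else None,
        })
    return rows

def roster_on(rows, committee, day):
    """Return the sorted ICPSR ids serving on a committee on a given day."""
    return sorted(
        r["icpsr"]
        for r in rows
        if r["committee"] == committee
        and r["start"] <= day
        and (r["end"] is None or day <= r["end"])
    )

rows = load_assignments(io.StringIO(SAMPLE_CSV))
print(roster_on(rows, "Appropriations", date(2009, 6, 1)))
print(roster_on(rows, "Appropriations", date(2010, 6, 1)))
```

A "final roster" source would only ever return the second answer, which is the limitation raised earlier in the thread.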

@JoshData
Member

Have to be a little careful with that data. Some is listed as for academic use only.

@schmod
Contributor

schmod commented Aug 27, 2013

If anybody wants to brute-force this, Robert Byrd compiled one of the more comprehensive listings of old committees (and their chairpeople, but not members) that I've seen. The Senate historian seems to be keeping the list up to date.

Full membership information is available in the congressional directory, which has been published continuously since 1820. Scanned copies should be available from archive.org. Good luck getting that data into a structured format though...

Charles Stewart's data from the 1st-79th congresses does not have the academic-only disclaimer, but he does request a citation. (If you want his data served to you on a dead tree, you can apparently also buy the thing as a 4,000 page printed volume). I'm pretty sure that CQ also has a fairly comprehensive database of this information, locked away somewhere.

@schmod
Contributor

schmod commented Aug 27, 2013

Oh, and the Wikipedians have compiled a good listing of resources for researching historical committee information....

@konklone
Member

If a link in our README would suffice as a citation, I don't have a problem
with that.

On Tue, Aug 27, 2013 at 10:49 AM, schmod [email protected] wrote:

If anybody wants to brute-force this, Robert Byrd compiled one of the more
comprehensive listings of old committees (and their chairpeople, but not
members) that I've seen:
http://books.google.com/books?id=PeHByMYxVm8C&printsec=frontcover&dq=isbn:0160632560&hl=en&sa=X&ei=xq4cUu_EEqi9sASwz4GQDA&ved=0CC8Q6AEwAA#v=onepage&q&f=false
The Senate historian seems to be keeping the list up to date:
http://www.senate.gov/artandhistory/history/resources/pdf/CommitteeChairs.pdf

Full membership information is available in the congressional directory,
which has been published continuously since 1820. Scanned copies should be
available from archive.org. Good luck getting that data into a structured
format though...

Charles Stewart's data from the 1st-79th congresses does not have the
academic-only disclaimer, but he does request a citation. (If you want his
data served to you on a dead tree, you can apparently also buy the thing as
a 4,000 page printed volume:
http://books.google.com/books?id=J4JPMQEACAAJ&dq=isbn:1568021712&hl=en&sa=X&ei=UrAcUpCdI_Si4AP7u4B4&ved=0CDgQ6AEwAg)
I'm pretty sure that CQ also has a fairly comprehensive database of this
information, locked away somewhere.



@bchartoff
Contributor

I'm w/ @konklone on README citation. I've also had zero luck getting CQ data in the past, they hold onto it pretty tight.

@schmod
Contributor

schmod commented Aug 27, 2013

Maybe somebody should send Charles Stewart an email as a courtesy?

@konklone
Member

Agreed. And we should invite him to join Github and help us out!

@wilson428
Member

I can take a stab at revisiting this. Does the Senate Calendar from FDSys still seem like a good place to start? @schmod, where is the XML metadata containing the committee memberships that you referenced a few months ago? I can't locate it just by poking around.

Also getting lots of dead links for commands like this:

fdsys --year=2009 --store=text,xml --collections=CCAL

e.g.

Downloading: data/fdsys/CCAL/2009/CCAL-111scal-2009-10-30/document.xml
file not found: http://www.gpo.gov/fdsys/pkg/CCAL-111scal-2009-10-30/xml/CCAL-111scal-2009-10-30.xml

Most of the GPO site seems to be active. Any ideas?

@konklone
Member

I believe GPO's FDSys is only open for certain high priority collections:
https://twitter.com/USGPO/status/384993220536455168

@davidmooreppf

Just popping in to say I'm finding this thread helpful in our latest Cong. research, thanks all.

If anyone has a lead on historical member data with subcommittee affiliations, it would be of interest to us, but parent committees are a good start.

Also, I see this has been a recent request again, in issue #522 - maybe this is an area of wider interest for re-use.

@JoshData
Member

The Congressional Directory was mentioned earlier, but I was looking it over so I thought I'd post more information:

  • There is plain-text from GPO going back to 1997: https://www.govinfo.gov/app/collection/cdir
  • Some Congresses have more than one update.
  • Recent years seem to only have Senate membership (but we have recent years from other sources).
  • Some years seem to have subcommittees.

I started writing some code before deciding parsing the plain text would be too hard to get done any time soon, but here's some code to pull down the text files:

import json
import re
import urllib.request

def parse_committee_assignments(package_id, text_file):
	# Placeholder: this function was not included in the original post.
	# For now it only reports which text file holds committee assignments.
	print("found assignments:", package_id, text_file)

def walk_directory(url):
	# Fetch the child nodes at this point in the browse tree.
	print(url + "...")
	with urllib.request.urlopen(url + "?fetchChildrenOnly=1") as response:
		directory = json.loads(response.read().decode("utf8"))
	for node in directory["childNodes"]:
		value = node["nodeValue"]
		if value["level"] == 3 and value.get("displayValue", "") != "Committee Assignments":
			# Skip nodes that don't have committee assignments within them.
			pass
		elif "value" in value:
			# Recursively go into this node.
			walk_directory(url + "/" + value["value"])
		elif re.match("ASSIGNMENTS OF (SENATORS|REPRESENTATIVES) TO COMMITTEES", value.get("title", "")):
			# This holds committee assignments!
			parse_committee_assignments(value["packageid"], value["textfile"])

walk_directory("https://www.govinfo.gov/wssearch/rb/cdir")
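For the parsing step that was set aside, one possible starting point is sketched below. It assumes a layout in which each member's name is followed by a run of dots and the first committee, with further committees on indented continuation lines. That layout, the sample text, and the function name `parse_assignment_lines` are assumptions; the real files may well differ between years, which is likely part of why this is hard:

```python
import re

# A name, a dotted leader of three or more dots, then a committee name.
LEADER = re.compile(r"^(?P<name>\S.*?)\s*\.{3,}\s*(?P<committee>\S.*)$")

# Hypothetical sample in the assumed layout.
SAMPLE_TEXT = (
    "Member One ............ Appropriations\n"
    "                        Budget\n"
    "\n"
    "Member Two ............ Judiciary\n"
)

def parse_assignment_lines(text):
    """Map each member name to the list of committees listed under it."""
    assignments = {}
    current = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line.strip():
            continue  # skip blank lines
        match = LEADER.match(line)
        if match:
            # A new member entry starts here.
            current = match.group("name")
            assignments.setdefault(current, []).append(match.group("committee"))
        elif current is not None and line[0].isspace():
            # An indented line continues the current member's list.
            assignments[current].append(line.strip())
    return assignments

print(parse_assignment_lines(SAMPLE_TEXT))
```

Lines that match neither pattern (page headers, section titles) are silently ignored here; a real parser would probably want to log them so layout changes are noticed.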


9 participants