Ethics.Data.gov brings records and data from across the federal government to one central location, making it easier for citizens to hold public officials accountable.

Open Source Software Used in Creating Ethics.Data.gov

Drew Hess's Senate LD-1 & LD-2 analysis package was very helpful in understanding the structure of lobbying disclosure data. It gathers and parses the Senate data, then loads it into a database, which let us identify the range of possible values and the outliers found within that data.

Naveen Manivannan's FEC-Parser proved essential for parsing FEC data. The FEC stores its data in fixed-width columns, with its numbers in the IBM Signed format that was created for COBOL in the 1970s. Manivannan's program gathers all the files from the FEC's FTP site, converts them to a modern format, and then provides a script to load all of the data neatly into MySQL. Because FEC-Parser is no longer being maintained, we had to make some minor modifications to it. Though we have not put it into use for Ethics.Data.gov, Chris Schnaars has also very recently released FEC-Scraper to continue performing the same task.
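
To illustrate the kind of conversion involved, here is a minimal Python sketch of decoding a fixed-width record with a signed number field. It is not taken from FEC-Parser. It assumes the common signed-overpunch convention, in which the final character encodes both the last digit and the sign ("{" and "A" through "I" for positive 0 through 9, "}" and "J" through "R" for negative 0 through 9). The column layout, field names, and sample record are hypothetical.

```python
# Assumed convention: the last character of the field carries the final
# digit and the sign. "{", "A".."I" mean +0..+9; "}", "J".."R" mean -0..-9.
POSITIVE = {"{": 0, **{chr(ord("A") + i): i + 1 for i in range(9)}}
NEGATIVE = {"}": 0, **{chr(ord("J") + i): i + 1 for i in range(9)}}


def decode_overpunch(field):
    """Convert a signed-overpunch string such as '000050}' to an int."""
    text = field.strip()
    if not text:
        return 0
    head, last = text[:-1], text[-1]
    if last in POSITIVE:
        return int(head + str(POSITIVE[last]))
    if last in NEGATIVE:
        return -int(head + str(NEGATIVE[last]))
    # No overpunch character: treat the field as a plain unsigned number.
    return int(text)


def parse_record(line, layout):
    """Slice one fixed-width line into named, whitespace-trimmed fields."""
    return {name: line[start:end].strip() for name, start, end in layout}


# Hypothetical layout: (field name, start offset, end offset).
LAYOUT = [("committee_id", 0, 9), ("amount", 9, 16)]

record = parse_record("C00000001000050}", LAYOUT)
record["amount"] = decode_overpunch(record["amount"])
```

Once each line has been turned into a dictionary of ordinary values like this, writing the rows out as CSV or loading them into a database is straightforward.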

cURL and sed were also indispensable. cURL's ability to fetch the contents of all URLs within a defined range is priceless (e.g., http://example.gov/year/[2008-2011]/category[0-9]/type{d,j,k}.xls). sed let us separate a great deal of wheat from a great deal of chaff, turning piles of unstructured data into neatly formatted CSV with just a few minutes' work.
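
To show how much a single pattern like the one above covers, here is a rough Python sketch that expands the same bracket and brace notation into a list of URLs. It is only an illustration of the idea, not cURL's own implementation: the function name expand_url is hypothetical, and the sketch handles only numeric ranges and comma-separated sets, not the alphabetic ranges or step values that cURL also accepts.

```python
import itertools
import re

# Matches a numeric range such as [2008-2011] or a set such as {d,j,k}.
TOKEN = re.compile(r"\[(\d+)-(\d+)\]|\{([^{}]+)\}")


def expand_url(pattern):
    """Return every URL described by a pattern with ranges and sets."""
    literals = []  # literal text preceding each range or set
    choices = []   # the values each range or set can take
    pos = 0
    for match in TOKEN.finditer(pattern):
        literals.append(pattern[pos:match.start()])
        if match.group(3) is not None:
            choices.append(match.group(3).split(","))
        else:
            start, end = match.group(1), match.group(2)
            # Keep zero padding when the range is written like [01-12].
            width = len(start) if start.startswith("0") else 0
            choices.append(
                [str(n).zfill(width) for n in range(int(start), int(end) + 1)]
            )
        pos = match.end()
    tail = pattern[pos:]

    urls = []
    for combo in itertools.product(*choices):
        body = "".join(lit + val for lit, val in zip(literals, combo))
        urls.append(body + tail)
    return urls


urls = expand_url(
    "http://example.gov/year/[2008-2011]/category[0-9]/type{d,j,k}.xls"
)
```

For the example pattern, four years times ten categories times three types gives 120 URLs, all described by one short string.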