Friday, May 15, 2015

Finding data through the web inspector

One of my students wanted to harvest data from a government site on nuclear facility decommissioning. The data is published online, but not in the most friendly format.

There was a map:

[screenshot: the map of decommissioning sites]

Followed by many tables:

[screenshot: one of the data tables]
It's possible to scrape the data (with import.io, one of the Chrome scraper extensions, or similar tools), but the approach I want to cover here is finding the file that feeds that map, using Chrome's Developer Tools.
  • Go to the page: http://www.nrc.gov/info-finder/decommissioning/
  • Press Command-Option-I (on a Mac), or go to the Chrome menu > More tools > Developer tools.
  • Click on the Network tab, then refresh the page.

  • Now scroll through the results. This is a list of every file the browser downloaded to build the page. We're looking for whatever might be the data behind the map; typically that's a file ending in .xml or .json.
  • Sure enough, we find something called decomissioning.xml. (A quick programmatic way to hunt for these files is sketched below.)
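
If you'd rather do that hunt in code, here is a minimal Python sketch (it assumes you have the requests library installed). It fetches the page's HTML and greps it for anything ending in .xml or .json. One caveat: a file the page loads via JavaScript at runtime may not appear in the static source at all, which is exactly why the Network tab is the more reliable tool.

    import re
    import requests

    # Fetch the static HTML of the page. Data files requested by
    # JavaScript at runtime may not show up here -- the Network tab
    # catches those.
    url = "http://www.nrc.gov/info-finder/decommissioning/"
    html = requests.get(url).text

    # Grep for anything ending in .xml or .json, the usual suspects
    # for a map's data feed.
    for candidate in sorted(set(re.findall(r'[\w./:-]+\.(?:xml|json)', html))):
        print(candidate)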

What to do with this, though? I googled "xml to csv converter" and found this site: http://www.convertcsv.com/xml-to-csv.htm
  • I took the URL of the XML file, entered it into the appropriate field, and loaded it.
  • Then, in the box below, the site converted the file to CSV, which I was able to download.


That gave me a pretty clean CSV file I can use in Excel.
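
If you'd rather skip the converter site, the same conversion is only a few lines of Python. This is just a sketch under a couple of assumptions: the URL below is a placeholder for wherever decomissioning.xml actually lives (copy the real address out of the Network tab), and I'm assuming each record in the file is one child element of the root, with flat fields inside it. A real file may nest differently, in which case the parsing step needs adjusting.

    import csv
    import requests
    import xml.etree.ElementTree as ET

    # Placeholder URL -- substitute the actual address of
    # decomissioning.xml as shown in the Network tab.
    XML_URL = "http://www.nrc.gov/path/to/decomissioning.xml"

    root = ET.fromstring(requests.get(XML_URL).content)

    # Assumption: each direct child of the root is one record, and each
    # of that record's children is a flat field (tag name -> text value).
    records = [{field.tag: (field.text or "").strip() for field in record}
               for record in root]

    # Take the union of all field names seen, so records that omit a
    # field still line up in the right columns.
    fieldnames = sorted({key for rec in records for key in rec})

    with open("decommissioning.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)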


Pretty slick, eh?
