One of my students wanted to harvest data from a government site on nuclear facility decommissioning. The data is published online, but not in the most friendly format.
There was a map, followed by many tables.
It's possible to scrape the data with tools like import.io or Chrome scraper extensions, but the technique I want to cover here is finding the file that feeds that map, using Chrome's Developer Tools.
Pretty slick, eh?
- Go to the page: http://www.nrc.gov/info-finder/decommissioning/
- Press Command-Option-I (on a Mac) or go to the Chrome menu > More tools > Developer tools.
- Click on the Network tab, then refresh the page.
- Now scroll through the results. This is a list of every file the browser downloaded to build the page. We're looking for something that might be the data for the map; typically that means a file ending in .xml or .json.
- Sure enough, we find something called decomissioning.xml
- Right-click on that entry, choose "Open Link in New Tab," and you get the XML file: http://www.nrc.gov/admin/data/gmaps/decomissioning.xml
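If you'd rather grab the file with a script than with the browser, here's a minimal Python sketch. It assumes the third-party requests library is installed; everything else comes straight from the URL we just found.

```python
# A minimal sketch: download the XML file directly once you know its URL.
# Assumes `requests` is installed (pip install requests).
import requests

url = "http://www.nrc.gov/admin/data/gmaps/decomissioning.xml"
response = requests.get(url)
response.raise_for_status()  # stop here if the download failed

# Save the raw XML locally for the conversion step
with open("decomissioning.xml", "wb") as f:
    f.write(response.content)
```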
What to do with this, though? I googled "xml to csv converter" and found this site: http://www.convertcsv.com/xml-to-csv.htm
- I took the URL of the XML file, entered it into the appropriate field, and loaded it.
- The box below then showed the file converted to CSV, which I downloaded.
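You can also do the conversion yourself in Python with just the standard library. This is a rough sketch that assumes the XML is a flat list of elements whose attributes carry the data, which is a common layout for Google Maps feeds; the real file's structure may differ, so inspect it first.

```python
# A rough sketch of converting flat, attribute-based XML to CSV.
# Assumption: each child of the root element is one record, with the
# data stored in XML attributes. Check the actual file before relying on this.
import csv
import xml.etree.ElementTree as ET

tree = ET.parse("decomissioning.xml")
root = tree.getroot()

# Collect each child element's attributes as one row
rows = [child.attrib for child in root]

# Use the union of all attribute names as the CSV header
fieldnames = sorted({key for row in rows for key in row})

with open("decommissioning.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
```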
That gave me a pretty clean CSV file I can use in Excel.
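Before opening it in Excel, a couple of lines of Python make a quick sanity check; the filename here matches the output of the conversion sketch above.

```python
import csv

# Quick sanity check on the converted file
with open("decommissioning.csv", newline="") as f:
    rows = list(csv.DictReader(f))

print(len(rows), "rows")
print(rows[0])  # peek at the first record
```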