Wednesday, March 11, 2015

Batch geocoding with Texas A&M geocoding service

Data Visualization, Spring 2015


For our Spring 2015 class project, we geocoded every school in Texas. The files are available for free use here: https://github.com/utdata/texas-schools

This is an edited version of the notes for class. Some of it may be discombobulated due to editing around the project. Forgive me.

------

Texas A&M Geoservices offers a (mostly) free, awesome batch geocoding application, including step-by-step instructions on how to use it.


We started with the TEA's list of schools throughout the state (Download School and District File with Site Address), which I found here: http://mansfield.tea.state.tx.us/tea.askted.web/Forms/Home.aspx.

The entire file is available here. I split it into pieces and gave each student a file, which they used for the following steps.

  • Start by converting your source material to a .csv file if it isn't already. (You should be good if you saved the file from github as I asked.) There is one thing to watch here: The school and district number fields start with a ' because some of these numbers start with zeros. If you lose that quote mark and the field turns into numbers, you will not be able to match the data laster.
  • Log in or register
  • Since you are starting a new project, select "Upload a new database"
  • Choose your file type, and then go browse to your file
  • Upload the file
  • Validate the file
  • Review the upload, and then choose Geocoding
  • Go to the next step where you choose which fields are what.
    • StreetAddress needs to be SchoolSiteStreetAddress (NOT SchoolStreetAddress)
    • City needs to be SchoolSiteCity
    • and so on.
    • Leave the other defaults checkmarks as-is
  • Start the process.


You will get an email when your process is completed. There is a link there that will take you to a page like this where you can download:


Screen Shot 2015-01-19 at 8.54.56 AM.png
The file you download will be the one you uploaded, but with 28 new columns added. We will keep all of the fields, but the ones are are interested in are:


  • GeocodeQualityType
  • Latitude
  • Longitude


The GeocodeQualityType tells us the method used to match the address. It ranges from finding the exact parcel (good!) to finding the exact state (not good! That's not very specific for finding a school.) Most frequently you will see:
  • ExactParcelCentroidPoint: This is good. It found the exact land parcel.
  • AddressRangeInterpolation: This is OK. It found a close approximation based of the address. We can look at the MatchScore to see how close it was, but we'll just leave as is for now.
  • USPSZipAreaCentroid: This is bad for us. It might have found a 100% match, but it is only based on the center of the ZIP code, so it may not be anywhere near the school. We will have to manually code this address for any accuracy.
  • You might also see City or State centroids, which are also bad. Basically anything but the first two are bad.


What to do if they are bad? Manually find them in Google Maps, if you can. If you do update the Lat/Long manually, then change this value to "Manual" so we know it has updated. If you can't find it, leave it be so we know it is still bad.


Fixing bad addresses

We'll use this file of AISD results as an example to talk about fixing addresses. (You should use your own file from the geocoding service)


Of the 131 addresses I submitted to the geocoder, 16 of them are USPSZipAreaCentroid and need to be manually coded. That's 8%. If that percentage holds state-wide, then you can expect about 40 manual codings out of your 500+. Probably more if you have rural schools. It's not hard, though.


  • Now that you've got your addresses back, you can save-as on the file and make it an .xlsx file.
  • Filter your data on GeocodeQualityType for your bad addresses.
  • You might want to hide columns between SchoolSiteZip and GeoCodeQualityType so you can see the name and address of the school along with your Lat/Long as you work. (Control-click on the column letter, then choose "Hide")
  • Go to Google Maps and put your school name and address into the search bar and see if you can find it. (If you find you have a problem with the "What's here" funcitonality, try Classic Google Maps. It works mo betta.)
    • If you have trouble, you might try spelling out Elementary, etc.
    • If you can't find by name, try typing in the School Site Address
  • Once you find it, Zoom in to the 500ft or 200ft level if you aren't already.
  • Control-click on the map right on top of the school, and choose "What's here". This will put a box at the top left of the map that includes the Latitude and Longitude. (Screenshot below)
  • Copy and paste those values back into the spreadsheet, then update the GeocodeQualityType to "Manual" for that record. DO THIS CAREFULLY. You are creating data for the world … do it right.
  • Rinse and repeat for all your bad addresses.
  • If you just can't find it, leave it as the Zip or City centroid, but take note of it.
  • Submit your finished file to the Canvas assignment. Note any issues you had, like how many schools you could not find at all.


Screen Shot 2015-01-19 at 9.40.40 AM.png

0 comments:

Post a Comment

Note: Only a member of this blog may post a comment.