Saturday, May 16, 2015

The Texas Public Schools geocoding project

My data visualization class this year embarked on a project to geocode every public school in Texas. The resulting file can be used to join with many data sets from the Texas Education Agency.

You can find the resulting data files on GitHub. Anyone is free to download and use.

We started with 9300 rows of data, and the students and observers in the class each took 520 of the addresses, ran them through Texas A&M's geocoding service, and then hand-corrected any result that was not better than a ZIP code match.

Check out the result here: https://github.com/utdata/texas-schools

Use them to write stories about school performance data and other data sets from the TEA.

Friday, May 15, 2015

Finding data through the web inspector

One of my students wanted to harvest data from a government site on nuclear facility decommissioning. The data is published online, but not in the most friendly format.

There was a map:



Followed by many tables:


It's possible to scrape the data (maybe import.io, or Chrome scrapers and more) but the one I want to cover here is looking for the file that feeds that map, using the Chrome's Developer Tools.
  • Go to the page: http://www.nrc.gov/info-finder/decommissioning/
  • Do Command-Option-I or go to Chrome menu > More tools > Developer tools.
  • Click on the Network tab, then refresh the page.

  • Now scroll through the results. This is a list of every file that the browser downloaded to create the web page. We are looking for something that might be the data for the map. Typically we're looking for something that ends in .xml for .json
  • Sure enough, we find something called decomissioning.xml

What to do with this, though? I googled "xml to csv converter" and found this site: http://www.convertcsv.com/xml-to-csv.htm
  • I took the url to the xml file and entered it into the proper field and loaded it.
  • Then in the box below, it converted that file to CSV, which I was then able to download.


Which gave me a pretty clean csv file I can use in Excel.


Pretty slick, eh?

Wednesday, May 6, 2015

Project: Austin's restaurant inspections process

By Renee Moreno

Click for interactive visualization
Last year the City of Austin performed over 8,000 restaurant inspections. These inspections cover just about any facility that serves food, from churches to restaurants. These scores are then posted online for the public to be able to access and use to judge the quality of a restaurant.

In Texas, the standard elements of a restaurant inspection are set by the state, but it’s then up to the city to monitor and enforce. In Austin, there are 23 Environmental Health Officers who are assigned to monitor the establishments that require inspections according to Carole Barasch, the manager of communications for the Austin department of Health and Human Services.

Inspections happen during any regular business hours, which allows the officers to monitor the daily operations of a business. They bring a scorecard along with them. Included on the card are the three main categories of requirements: Food, Personnel/Handling and Facility/Equipment. Each category has some important requirements listed under it. For example, one of the personnel requirements is “proper/adequate hand washing.” Next to each requirement there is a space for an inspector to deduct points if something isn’t up to code.

Click for interactive visualization
Each part of the scorecard relates back to the Texas Food Establishment Rules, which owners can read to make sure they’re following all of their rules. Some of which are very specific. For example, the code has specific times that they can “hold” different perishable items while waiting to fill orders. This time changes depending on the circumstances and the temperature at which the foods are kept.

However, these rules aren’t just about the food safety. Some of what an inspector looks for is the safety of staff and patrons. One other thing inspectors look for is a poster on the Heimlich maneuver. This poster is required to be in all food establishments at which people are eating. The sign must meet certain size and color requirements and be posted in a place where people could find it if needed. Failure to have this poster, or other crucial consumer advisories can result in a deduction of 3 points from restaurants overall score.

At the end of the visit the officers take add up the number of deductions and subtract that number from 100 to get to the final score that shows up online.

The officers are responsible for inspecting each of the approximately 4,000 food establishments within the city and its surrounding areas twice per year. They aren’t always able to inspect restaurants at this frequency. According to their website, “inspections are prioritized by risk.” This means some restaurants are only inspected once per year.

However, the lack of frequent inspections doesn’t necessarily mean that a restaurant is unsafe. Many restaurants, like Guero’s Taco Bar perform their own inspections to ensure quality at their restaurant.

“Our main goal is to make sure people leave here happy and healthy,” said Rob Lippencott, owner of Guero’s.

Lippencott’s strong desires to keep his patrons happy seem to be a theme across Austin. Over the past three years, less than 2 percent of inspections have resulted in a failing grade. A failing grade is just like it was back in school, anything less than a 70. If a restaurant fails, depending on the violations they have 48 hours to begin fixing the problem. After proper changes have been made, owners must get their establishment re-inspected within approximately 30 days.
Food trucks and farmer's markets however aren’t included in these restaurant inspections. They along with other types of establishments are monitored differently. “Only brick & mortar establishments receive scored inspections,” said Barasch. Their inspections are simply pass/fail. As such they're kept in a different data set that isn't currently available online.

Regardless of the facility being inspected one of the most important things to the department of Heath and Human Services is that the public is able to know about what is going on in their city.

“Educating the public with accurate information is something we work to do on a daily basis,” said Barasch.

Click for interactive visualization

Project: Bullying a hidden epidemic

By Manzhi Wu
Data Visualization, Spring 2015

Sonia Kotecha, cofounder of The Asian Behavioral Health Network in Austin, still remembers what happened in Vermont when she was a toddler. She was called “poop poop” on the playground just because she had a darker skin.

In kindergarten, she was teamed with another South Asian student to address teasing experienced by children of color in a school where they were in the minority. “As a kid it felt strange, I thought ‘why is a friend being imposed on me?’ It drew attention to the fact that something was different about me,” Kotecha said.

Click image to view visualization.
When Kotecha was a child, her babysitter’s children, also South Asian, were picked on at school. “Growing up, I often felt like an easy target for teasing and bullying, because I was shy and came from a different cultural background than my peers at school,” Kotecha said.

Statistics from Centers for Disease Control and Prevention show that in 2013 almost one out of every five high school students in the United States reported being bullied on school property. Of that number, 21.7 percent were Asian American, 17.8 percent were Hispanic and 12.7 percent were African American. According to the Federal Office of Management and Budget, “Asian” refers to people from the Far East, Southeast Asia or the Indian subcontinent.

Besides, the rate of Asian American students being bullied increased more than any other racial group over the previous two years, rising 6 percent.

“Unfortunately, when it comes to Asian American and Pacific Islanders, they have the highest rate of bullying with very limited resources,” said Linda Phan, commissioner from the White House Initiative on Asian Americans and Pacific Islanders.

Dr. Richard Yuen, a clinical psychologist at Lonestar Psychological Services in Austin, said the effects of bullying can be so subtle that sometimes many people overlook them, but over time they can transform into something more severe for the victim.

“There is increased chance of anxiety disorders, depression as well as low self-esteem that really cause me to concern for their overall adjustment to life,” Yuen said.

However, The American Psychiatric Association found that among all ethnicities, Asian Americans and Pacific Islanders are the least likely to seek help for mental disorders. Experts say a cultural value of self-reliance and a fear of shaming the family may keep many of those being bullied from seeking assistance with emotional problems.

“If parents are not supportive of students’ emotional health, then the students have no place else to turn, so they turn more inside,” said Vincent Cobalis, the vice chair of the City of Austin Asian American Quality of Life Commission. “The Asian culture is very reluctant to admit to mental health issues, then they don’t seek out help. We need to break out of that perception that dealing with mental health issues is negative.”

Kotecha is now a social worker and her work touches on race, ethnicity and children and family support. She believes that parents can play a crucial role in addressing bullying and racism.

“We don’t get a lot of protective messages from our parents about potential racism and discrimination in the mainstream society, because I think many of our parents who are immigrants don’t know the history of race in America; they don’t know how deeply rooted that is. They are coming here focusing on education, good quality of life. They didn’t grow up in the context.”

Kotecha said she thought that parents should not simply try to avoid confrontation. “Many Asian parents just told their children to go to school, focus on studies and ignore everything else. They should know it’s more complicated than that. Kids have pressure to fit in and learn social skills, which can also be productive in the real world too,” she said.

For Kotecha, it is all about education – educating teachers and students to be more attuned to bullying and to be more open to different cultures, races and ethnicities. However, in Texas, children from kindergarten to eighth grade don’t have many chances to learn about Asian American history.

According to Noreen Rodriguez, a bilingual elementary teacher in Austin for nine years, she remembers the only times Asians are mentioned in textbooks are for the Chinese during the building of the transcontinental railroad and Chinese exclusion, and Japanese internment during World War II. “The two instances where you talk about Asian groups, it was a very long time ago,” Rodriguez said.

Rodriguez said she noticed when people talk about Asia now in school, it’s often through 3“F”s: Food, Festival and Fun. She thought that’s not enough: “You celebrate a holiday one time a year and that’s it. So what the students know is ‘Oh, the Chinese people have this holiday and it’s fun, and I made a lantern in school,’” Rodriguez said.

She is now working closely with a historian at UT to develop a curriculum for kindergarten through eighth grade in Asian American studies and professional development training for AISD teachers.

On Rodriguez’s bookshelf, there is a children’s book called “Paper Son,” which tells a story of Chinese immigrants. She hopes teachers can teach Asian American history using children’s literature, when there’s so little in textbooks. “It’s hard to teach the things you don’t know,” said Rodriguez, “I’ll expose them to these books and tell them about the history that is not part of what they themselves learned.”

*Anti-bullying campaign across the nation*

Tales of harassment and staggering statistics have prompted actions nationwide, including Austin, where a group was voluntarily formed to deal with the bullying of Asian Americans. The team was composed of Vincent Cobalis, the vice chair of Austin Asian American Quality of Life Commission; Thao Phao, licensed professional counselor and therapist; Peteria Chan, research associate at the Texas Institute for Excellence in Mental Health at UT Austin; and Nicole Williams, a teacher from St. Andrew’s Episcopal School.

“There is expectation among Asian cultures that if the rule is there, then people should be fine,” said Cobalis, “But I don’t think that you can rely just on rules and policies. You have to get people to care about you.”

In Austin, the team is trying to launch a mentorship program. “The idea is to get college students that have been bullied to talk about their experience and share their experience with high school students, and then high school students can share their experience with middle school students,” Cobalis said.

Dr. Richard Yuen said this kind of communication is vital.

“The first and foremost component of any anti-bullying measure is we have to have an honest, open and friendly dialogue about aggression and bullying behavior, ” Yuen said. He suggested that stakeholders including youth, parents, teachers and principals should all be involved.

Right now, the White House initiative is holding listening sessions across the country, according to Linda Phan, commissioner in the initiative. A federal survey for those who interact with Asian youth is in process.

The listening session was aimed at learning what the kids’ experiences are like. “When they are being bullied, who are they getting help from? Do they know where to get help? Do they think their teacher and parents are good sources?” Phan said.

On April 29, the White House initiative will come to Austin and team up with Canyon Vista Middle School for a listening session.

The Asian Behavioral Health Network (ABHN) and the YWCA of Greater Austin are also organizing a workshop focusing on the impact and cultural implications of bullying. A panel of mental and behavioral health professionals will provide examples and lessons from real life experiences, as well as foster a collaboration to address the issue.

“Our hope for this training is to raise awareness about the hidden epidemic of bullying and Asian American youth, while discussing ways we can work together to prevent bullying in our schools and community,” Kotecha said.

Project: Growing number of women vets prompts need for state program

By Nicole Cobler
Data Visualization, Spring 2015

AUSTIN — A state program to provide a permanent means for women military veterans living in Texas to receive health and parental benefits is advancing in the House and Senate.

If the legislation is eventually approved and signed into law by Gov. Greg Abbott, the measure would be the first such program geared strictly to the specific needs of women who served in the U.S. armed services and transitioning back to civilian life.

“By making it statutory program, when we go out in the state and advocate for women veterans, it give us a little bit of street credibility,” Edith Disler, manager of the current Women Veterans Initiative, a program with the Texas Veterans Commission to raise awareness and help women obtain benefits.

The need for the statuatory program has it roots in Texas’ long ties with the U.S. military, with major installations in San Antonio, El Paso, Killeen, Corpus Christi and other Lone Star cities.

Despite a growing number of female veterans in Texas, there is not a permanent state program to address the health care needs of the group.

There are roughly 1.7 million veterans in the state, and women make up 11 percent of that figure, according to the U.S. Department of Veterans Affairs.


There are 155,000 veterans in Bexar County, including 25,000 women veterans, or 16 percent, according to the VA.

Although a temporary initiative was established under the Texas Veterans Commission in 2012 to help female veterans readjust to civilian life, no permanent program exists.

Senate Bill 2001, authored by Sen. Sylvia Garcia, D-Houston, would establish a permanent Texas Women Veterans Program under the commission to address the needs of a growing female veteran population. The bill has companion legislation in the House filed Rep. Ana Hernandez, D-Houston.

Both bills have been approved by House and Senate committees.


Garcia said the current program should be permanent because of the different health care needs that females have when leaving the service.

“Those needs are different,” Garcia said. “I think it’s important that the agency be responsive to that.”

The current initiative has four employees who manage employment, claims and outreach for female veterans in the state. Disler said she does not foresee additional employees being hired when the program becomes permanent.

Disler said that some lawmakers questioned the need for specific programs for women.

“That was the big question,” Disler said. “We have all these programs for veterans, why do women need anything in particular?”

Testimony at legislative hearings committees revealed the differences female veterans face compared to their male counterparts. Those differences including a higher risk of sexual assault, receiving prenatal care and adjusting to running a household when they return from the service.

“Basically it wasn’t lack of support, it was lack of knowledge,” said Disler, an Air Force veteran.

Lashondra Jones, policy associate with Texas Criminal Justice Coalition and a former Marine, testified before the Senate Committee on Veteran Affairs and Military Installations in favor of a permanent program.

Jones said she has met with many women who have experienced military sexual trauma and are unable to get the help they need.

According to a 2014 Veteran Affairs report, one in four women have experienced military sexual trauma. The department considers any sexual activity against a service member’s will as sexual trauma.

“There are so few women veterans programs that cater to females,” Jones said. “It’s just not enough.”

The population of female veterans in the U.S. will only grow, according to the VA. A 2014 report showed that women veterans make up less than 10 percent of total veterans in the U.S., but will account for 16 percent of all veterans by 2043.

Garcia isn’t the only lawmaker taking notice of the growing need of programs for female veterans in the state.

Sen. Jose Menendez, D-San Antonio, who served as chair of the House’s Defense and Veterans’ Affairs Committee before being elected to the Senate, authored several pieces of legislation to help female veterans this session.

In addition to being a co-author on Garcia’s bill, Menendez authored legislation that would create a women veterans mental health initiative. The bill was approved by the Senate.

“We’ve been addressing male veterans’ issues for hundreds of years, and it has not stopped,” Menendez said. “But because the female warrior is new in combat and new in this arena, we haven’t had specific attention paid to their issues.”

ncobler@express-news.net

Twitter: @nicolecobler

Wednesday, April 8, 2015

Adding custom shapefiles to Tableau

Data Visualization, Spring 2015


Tableau has plenty of useful geography built in that we can utilize for data: Countries, states, counties, zip codes and even congressional districts. But, it doesn't have school boundaries, voting precincts or some other geographic features that we could use in journalism.

Unfortunately, Tableau doesn't read the Esri shapefile format or the Google keyhole markup language, so you can't pull in these custom shapes like you can with QGIS or Google Fusion Tables. Tableau understands polygons, and while it is hard to manually convert shapefiles to this polygon format, the company Alteryx has a tool that does it for us.

This lecture walks through converting a shapefile to polygons, pulling it into Tableau and then matching it with data to create a map.

Our goal is to use the City of Austin voting precinct shapefiles and election results to show which candidate garnered the most votes in each precinct for the 2014 City of Austin races. We'll need three things to complete this task:

First, we need a shapefile that will have a field we can link with our data. In our case the CoA_precincts_2012.zip shapefile that I got from the city demographer, that includes the precinct. (Go ahead and download it. You do NOT need to unpack the zip file.) It has the shapes of voting precincts within the City of Austin boundaries.

Next, we'll need our data, coa_winnerslist.xlsx. (Go ahead and download it.) This data set was prepared the day after the election based on total returns provided by the Travis County Elections division. There was a bit of scripting involved to get the file into the shape you see here, which we won't go into here.

The magic happens with our third item, an Alteryx Analytics Gallery tool called Tableau Shapefile to Polygon Converter. This will take our Esri based shapefile and convert into polygons Tableau can understand. You will need an Alteryx account, which is free. (Sign up now.)

Convert the shapefile to polygons

  • Go to the Alteryx Analytics Gallery and under Search Workflows look for "Tableau Shapefile to Polygon Converter"
  • Once there, click on the Signin button, put in your credentials, and then click the Run button.
  • You'll be asked to upload the shapefile in zipped format … we will use the CoA_precincts_2012.zip shapefile described earlier. (If you are doing this with a different shapefile, make sure you have all the files in a folder and then zip them together. I can run through how to do this in class, or Google it.)
  • Once you run the process you will get a window that allows you to download PolygonConverted.tde, which is a Tableau Data extract. There is another file that is a .csv that we don't need for this.tableau-shape-download.png
  • Download this with you data files so you can find it.

Plot the polygons

  • Start a new Tableau project.
  • When asked to connect to a file, choose Tableau Data Extract and then go find your PolygonConverted.tde file you downloaded from Alteryx. You'll be asked to Create an Extract Connection, where you can just OK the default.tableau-shape-dragpoints.png
  • There are three fields (PointID, PolygonID an SubPolygonID) that show up as Measures that need to instead become Dimensions. It's easy enough to do so, just drag them from the Measures area into the Dimensions area. (Pic at right)
  • Double-click on Latitude to move it to Rows
  • Double-click on Longitude to move it to Columns
  • Change the dropdown on Marks from Automatic to Polygon.
  • Now we'll place the points. There are three parts:
    • Drag PointID onto Path
    • Drag PolygonID to Detail
    • Drag SubPolygonID to Detail
  • This will give you a monochoromatic view of the map and you can't see the boundaries. Let's adjust that:
    • Click on the Color mark and under Effects change the border color to white.


Import and join data

Next, we'll pull coa_winnerslist.xlsx into our project and create a relationship to the polygon shapes.
  • Choose Data > Connect to data.
  • Choose the Microsoft Excel type, and go find your coa_winnerslist file.
  • When the data inspector comes up, change the the type of Precincts to a string instead of a number. (Tableau sees numbers here and things they are continuous values that can been added together, but we need them to be a string because they are essentially a text code for the location.) Click Go to Worksheet.
  • We'll still need to move Precincts from Measures to Dimensions. Click and drag it from one box to the other, just like we did PointID, etc. (What's going on here is Tableau sees that field as a numbers, but we need to convert it to text to match our polygon file.)
  • Now we have to connect the "Precincts" fields in both data sources so they can talk to each other:
    • Go under Data > Edit Relationships
    • Click on Custom and then Add to define a relationship. (See screenshot below)
    • Choose the fields you want to match up. In this case, "precincts"


tableau-shape-match.png

tableau-shape-linked.pngtableau-shape-clicklink.png

  • Lastly, you have to click on the green broken link and turn it into a red connected link. In this case red is what you want.
  • This last part is very similar to the "join" we did in Tableau last week. By linking these together, we are joining the two sets.


Build our viz

We now have all the pieces in Tableau, and our data is connected to our shapes, so we can build the main portions of this map. Remember our goal is to show which candidate got the most votes in each precinct.

Filter by race

  • We will want to filter the map for each race, so take the Race dimension and drop it on Filters. When you do, you'll be asked which fields to include … use all except Null.
  • Click on the dropdown on the Race pill and choose Show Quick Filter. You can then use the dropdown on the filter created (which might be under your Show Me pallete) and make that a single-value dropdown.
  • Go back to that dropdown, choose Customize and uncheck "Show all values".

Candidate visual cue

  • Take the Candidate Name field and drop it on the Color mark. This should list which candidate "won" each precinct.

Race results

  • We created a special field in the data that included the results of each candidate for that precinct and race. Drag the dimension Race tooltip onto the Tooltip mark and then try it out to see how it looks.


That's the basics of the map. There is some work to be done before it is publishable, like creating a dashboard, writing headlines and chatter, cleaning up tooltips and such, but the basics of the project are there, including the hard part of bringing a custom shapefile into Tableau.

Wednesday, April 1, 2015

Merging data in Tableau

Data Visualization, Spring 2015


Now that we have a data set of all schools in Texas and their location, we can match that up to just about any other campus-level data set from the Texas Education Agency. (This video explains how to do what we are outlining below, but with different data:)


We are going to do this for STAAR Postsecondary Readiness Standard (All Grades)


We are going to start by pulling the most recent data from the TEA:


As of January, 2015, this was 2013-14 Texas Academic Performance Reports:


  • Go to the download data files link
  • Choose Campus download
  • Select Continue to confirm you are downloading all campuses
  • Find in the list STAAR Postsecondary Readiness Standard (All Grades), select it and Continue.
  • To keep things simple, we'll pull only one data set: Postsecondary Readiness Standard, Two or More Subjects, 2014. Select that, and then click Download.
  • You will also need the reference file there, STAAR Postsecondary Readiness Standard (All Grades) Reference, so you can tell what the column headers are.


The file downloaded is called CSTAAR5.xls. It's a pretty big file … about 4M. Go ahead and open it up in Excel and take a look at it, and maybe saveas a .xlsx file. The first column is CAMPUS. Looks amazingly like the SchoolNumber from our school file, even with the single apostrophe to begin that field.


But notice there is no other identifying information in this file. No school name or address or anything? How can we tell what school we are looking at? That CAMPUS column is the key. We'll match that to our school list. Name this worksheet CSTAAR.


Create a new worksheet in the CSTAAR5 file. Now download and open the completed geocoded address file: We'll use this AISD School file. (Or, if we have the full state from our geocoding lesson ready, we'll use that.) Select all and copy that entire file and then paste it into the blank worksheet in the CSTAAR5 file. Name that worksheet "Schools". Save the .xlsx file.


Now we can go into Tableau and connect to the CSTAAR file. Stop at the connections screen.


Screen Shot 2015-01-19 at 4.14.19 PM.png


Take the CSTAAR5 sheet and drag onto the connection palette. Then when you take the Schools sheet and drag it over, you'll get a connect box, something similar to above.


The first thing to consider is how you want to join the files: Inner, Left, Right or Full Outer. The diagrams help you figure this out … do you want just the records that match on your key in both files? Or do you want all the records from one file and only the matching data from the other? We're going to use Inner for matching records. In our case, a score without a campus name wouldn't be useful for us.


The next thing is to match up the fields that the two will match on. From Data Source, choose the "Campus" field, and from Schools choose "SchoolNumber."


Now, in the data explorer you can explore the data columns all the way to the right and see that the school data has been added to the CSTAAR5 data where there was a match.


Now you can click the Go to Worksheet button and start work on your visualization.


A couple of things about this data:
  • Because TEA uses periods and other non-numbers to code the data, all of the STAAR readiness came in as a string.
    • The . means there is no data for that value for that school.
    • -1 means there was too little data to report it and preserve privacy, by law.
    • You can find a Glossary of these terms on TEA's website.
  • We won't be able to use these numbers as values until we convert them to numbers and move them to measures.
    • Choose the little dropdown on Ca00A004214R > Change Data Type > Number (Whole).
    • Once that is done, drag the field from Dimensions to Measures.


Screen Shot 2015-01-19 at 4.54.06 PM.png


There is another way to handle this. You could blank out all the cells with . or -1 in the Excel file and then pull all the data in again. Tableau will then just see numbers


Of course, you'll notice when you start playing with this data that it isn't shaped very well.

Monday, March 23, 2015

Using importHTML to scrape websites

This is a quick post to show how you can scrape data from websites using Google Spreadsheets. (There are ways to do this in Excel as well).

Let's start with the page, this table of rushing stats on NFL.com. The fact that it is tabular data gives us hope we can scrape it.

Start a new spreadsheet in Google Docs. Go to the first cell and put in this formula:

=IMPORTHTML("http://www.nfl.com/stats/categorystats?tabSeq=0&statisticCategory=RUSHING&conference=null&season=2014&seasonType=POST&d-447263-s=RUSHING_YARDS&d-447263-o=2&d-447263-n=1","table",1)

The function you are calling is IMPORTHTML and it takes three arguments:

  • URL: This is the url you are pulling from. It needs to be in quotes.
  • QUERY: If you are looking for a table or a list. This refers to the HTML of the page. In our case, it is a table, so it is looking for the <table> tag in the code. This should be in quotes. "list" is your other option.
  • INDEX: This is which table on the page. We started with 1 and found it, but some pages might have more than one table on the page. If you put in the number 2 it would look for the second table on the page.
What you get in return is all the tabbed material getting pulled into your page. It is tied to the formula, so if you want to keep it or don't want it to change, copy the content and paste special as values.


Not every page will work, but it is simple enough to try. There are plenty of other discussions on the web about this function, so you can search if you have trouble.

Wednesday, March 11, 2015

Batch geocoding with Texas A&M geocoding service

Data Visualization, Spring 2015


For our Spring 2015 class project, we geocoded every school in Texas. The files are available for free use here: https://github.com/utdata/texas-schools

This is an edited version of the notes for class. Some of it may be discombobulated due to editing around the project. Forgive me.

------

Texas A&M Geoservices offers a (mostly) free, awesome batch geocoding application, including step-by-step instructions on how to use it.


We started with the TEA's list of schools throughout the state (Download School and District File with Site Address), which I found here: http://mansfield.tea.state.tx.us/tea.askted.web/Forms/Home.aspx.

The entire file is available here. I split it into pieces and gave each student a file, which they used for the following steps.

  • Start by converting your source material to a .csv file if it isn't already. (You should be good if you saved the file from github as I asked.) There is one thing to watch here: The school and district number fields start with a ' because some of these numbers start with zeros. If you lose that quote mark and the field turns into numbers, you will not be able to match the data laster.
  • Log in or register
  • Since you are starting a new project, select "Upload a new database"
  • Choose your file type, and then go browse to your file
  • Upload the file
  • Validate the file
  • Review the upload, and then choose Geocoding
  • Go to the next step where you choose which fields are what.
    • StreetAddress needs to be SchoolSiteStreetAddress (NOT SchoolStreetAddress)
    • City needs to be SchoolSiteCity
    • and so on.
    • Leave the other defaults checkmarks as-is
  • Start the process.


You will get an email when your process is completed. There is a link there that will take you to a page like this where you can download:


Screen Shot 2015-01-19 at 8.54.56 AM.png
The file you download will be the one you uploaded, but with 28 new columns added. We will keep all of the fields, but the ones are are interested in are:


  • GeocodeQualityType
  • Latitude
  • Longitude


The GeocodeQualityType tells us the method used to match the address. It ranges from finding the exact parcel (good!) to finding the exact state (not good! That's not very specific for finding a school.) Most frequently you will see:
  • ExactParcelCentroidPoint: This is good. It found the exact land parcel.
  • AddressRangeInterpolation: This is OK. It found a close approximation based of the address. We can look at the MatchScore to see how close it was, but we'll just leave as is for now.
  • USPSZipAreaCentroid: This is bad for us. It might have found a 100% match, but it is only based on the center of the ZIP code, so it may not be anywhere near the school. We will have to manually code this address for any accuracy.
  • You might also see City or State centroids, which are also bad. Basically anything but the first two are bad.


What to do if they are bad? Manually find them in Google Maps, if you can. If you do update the Lat/Long manually, then change this value to "Manual" so we know it has updated. If you can't find it, leave it be so we know it is still bad.


Fixing bad addresses

We'll use this file of AISD results as an example to talk about fixing addresses. (You should use your own file from the geocoding service)


Of the 131 addresses I submitted to the geocoder, 16 of them are USPSZipAreaCentroid and need to be manually coded. That's 8%. If that percentage holds state-wide, then you can expect about 40 manual codings out of your 500+. Probably more if you have rural schools. It's not hard, though.


  • Now that you've got your addresses back, you can save-as on the file and make it an .xlsx file.
  • Filter your data on GeocodeQualityType for your bad addresses.
  • You might want to hide columns between SchoolSiteZip and GeoCodeQualityType so you can see the name and address of the school along with your Lat/Long as you work. (Control-click on the column letter, then choose "Hide")
  • Go to Google Maps and put your school name and address into the search bar and see if you can find it. (If you find you have a problem with the "What's here" funcitonality, try Classic Google Maps. It works mo betta.)
    • If you have trouble, you might try spelling out Elementary, etc.
    • If you can't find by name, try typing in the School Site Address
  • Once you find it, Zoom in to the 500ft or 200ft level if you aren't already.
  • Control-click on the map right on top of the school, and choose "What's here". This will put a box at the top left of the map that includes the Latitude and Longitude. (Screenshot below)
  • Copy and paste those values back into the spreadsheet, then update the GeocodeQualityType to "Manual" for that record. DO THIS CAREFULLY. You are creating data for the world … do it right.
  • Rinse and repeat for all your bad addresses.
  • If you just can't find it, leave it as the Zip or City centroid, but take note of it.
  • Submit your finished file to the Canvas assignment. Note any issues you had, like how many schools you could not find at all.


Screen Shot 2015-01-19 at 9.40.40 AM.png