Monday, March 3, 2014
Notes from NICAR 2014
Posted on Monday, March 03, 2014 by Christian McDonald
This was my first NICAR conference, which has a focus on data-driven reporting and other technical aspects of our craft. The IRE conference usually has similar content, but is also broader in scope.
Here's the full schedule from NICAR 2014 and from IRE 2013 last year. This year's IRE 2014 will be in San Francisco on June 26-29.
But back to NICAR 2014. Here are the goods:
- IRE's list of all the slides, handouts and tip sheets
- Chrys Wu's list of, well, everything and more
- Matt Waite's how to survive and thrive afterward
Here is a quick list of the panels I went to and what I got out of them.
Wednesday
Before NICAR in Baltimore, I also went to Tapestry, a data-visualization conference sponsored by Tableau. Unlike NICAR's 900+ attendees, Tapestry was limited to about 100 participants, all on the same track. Speakers included Alberto Cairo, Aron Pilhofer, Fernanda ViƩgas, Martin Wattenberg and a host of others. It was good for inspiration, connections and a preview of Alberto's talk at NICAR, allowing me to double up on some content.
Thursday
- Since I had seen Alberto's talk the day before, I could skip the first part of the D3 News camp to take some MySQL. I actually took two sessions on this: one was more an overview of how a database manager can help you vs. Access, and the other an intro to Navicat, a paid tool for working with MySQL databases. It went a little slowly for me, but I did enjoy CIR's Chase Davis' take on it.
- I hopped into the last part of the Intro to D3 talk, which was really a panel walkthrough of basic code, and I suffered from not being there at the beginning. I quickly reviewed my notes and hoped I would be OK for the afternoon hands-on session.
- Afternoon D3: In short, this helped me understand the structure of D3 better, but it didn't build any proficiency, and I would have trouble creating even the simplest viz now. But it was better than not going.
Friday
- Three hour-long sessions on statistics. The intro was really a get-to-know-SPSS session (SPSS is a powerful and expensive statistics package). We didn't get to the most important part, comparing two types of categorical information, so I felt like we missed out a little. The next session, on comparing two continuous fields (linear regression), was awesome, and the third, on comparing categorical and continuous information (logistic regression), was also valuable.
- Since I didn't get much actual SQL on Thursday, I chose the Intro to SQLite class and got the best tip of the conference: you can run SQLite in Firefox using the SQLite Manager add-on, with no other software to install (at least on a Mac). This might be how I teach SQL in Data-Driven Reporting next year. (For the flavor of the SQL involved, see the sketch after this list.)
- Next I helped Scott Klein (ProPublica) and Michelle Minkoff (AP) teach a session on grabbing data from the web.
- Last was the lightning talks, 10 quick talks. It was a packed room ... well over 800 people. They were all good ...
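Since SQLite may end up in my own course, here's a minimal sketch of the kind of group-and-count query we practiced, using Python's built-in sqlite3 module rather than the Firefox add-on. The table and records are made up for illustration:

import sqlite3

# Build a throwaway in-memory database with a hypothetical table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE dogs (name TEXT, breed TEXT, city TEXT)")
cur.executemany(
    "INSERT INTO dogs VALUES (?, ?, ?)",
    [("Rex", "Pit Bull", "Austin"),
     ("Lady", "Collie", "Dallas"),
     ("Duke", "Pit Bull", "Austin")],
)

# The bread-and-butter reporting query: group and count.
for breed, count in cur.execute(
    "SELECT breed, COUNT(*) FROM dogs GROUP BY breed ORDER BY COUNT(*) DESC"
):
    print(breed, count)
conn.close()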
Saturday
- Building a data-journalism course. One of the panelists from Maryland has his students all work on the same data set so they can help each other. Amanda Hickman works to get more basic data and viz/charts into earlier courses so students arrive with enough familiarity to get further in class. A couple of panelists have groups of students working on the same long-term project, or on different parts of the same subject. A Canadian panelist used donuts to describe databases (then ate them).
- Clean, clean, clean your mess. The regular expressions class was helpful for this. I had no problem with the code, but had some problems with the teaching style. The cleaning-data class introduced OpenRefine, and the class after was helpful in reinforcing those skills. Refine is awesome. (A small taste of the regex side is sketched after this list.)
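For a taste of the regex side, here's a small sketch with made-up messy strings (not the class examples), showing the kind of normalizing that comes up constantly:

import re

# Hypothetical messy rows of the kind regex is good at cleaning.
rows = ["SMITH, JOHN  (512) 555-0134", "doe, jane 512-555-0187"]

for row in rows:
    # Strip everything but digits, then reformat the phone number.
    digits = re.sub(r"\D", "", row)
    phone = f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
    # Remove the number, collapse extra spaces, standardize the case.
    name = re.sub(r"[\d()\-]", "", row)
    name = re.sub(r"\s+", " ", name).strip(" ,").title()
    print(name, "|", phone)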
In between all this I spent good time with folks from the Canadian Broadcasting Corporation, Orlando Sentinel, Huffington Post, Pixar/Paramount, the Texas Tribune, the AJC, Dayton Daily News, Seattle Times, the Vancouver Sun, small papers in Pennsylvania, South Carolina and god knows where else. And speaking of God, some good bar time with a gentleman attending a Catholic youth ministries event, and a different time with vets getting training at Veterans Affairs. Also several folks from various schools and many more I'm just not remembering.
My eyes were opened to some database structures, and my skills refined for Refine, which I then proceeded to crash time after time for four hours straight while working on a huge dataset. The statistics classes were well worth it, too ... I know I need to learn more, but that is OK.
Categories: Spring 2014
Shaping data and the Tableau Data Shaper
Posted on Monday, March 03, 2014 by Christian McDonald
Here is a good post that explains how you might want to clean up and *shape* your data for display in Tableau. How you format your data before import determines a lot about what you can do with it once it's in Tableau.
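If you'd rather script the reshaping than use the Excel add-in below, the same wide-to-tall "unpivot" takes a few lines of Python with pandas. The data here is invented just to show the shape change:

import pandas as pd

# Crosstab-style data: one column per year. Easy to read,
# but Tableau would rather have one row per measurement.
wide = pd.DataFrame({
    "county": ["Travis", "Harris"],
    "2012": [100, 250],
    "2013": [110, 240],
})

# melt() turns the year columns into a single year/value pair per row.
tall = wide.melt(id_vars="county", var_name="year", value_name="count")
print(tall)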
The Tableau Add-in for Excel is a great help for this. Here is a post with detailed instructions about it, but I have some shortcuts here, because our school machines don't give you admin access and you might not be able to download and run the .exe file.
- Download this file (Tableau.xlam) and put it in Documents > My Tableau Repository.
- Go into Excel > File > Options > Add-ins
- At the bottom, choose Manage > Excel Add-ins and click Go
- Browse to find the Tableau.xlam file in Documents > My Tableau Repository.
- Click OK through the boxes and you should end up with the "Tableau" menu in Excel.
Categories: Handouts, Spring 2014, Tableau, Tips
Tuesday, February 4, 2014
Getting a Lat/Long from Google Maps
Posted on Tuesday, February 04, 2014 by Christian McDonald
UPDATE 4.13.2014: It looks like the "new" Google Maps has returned the old functionality of being able to right-click on any spot in a map and choose "What's here" to get a decimal-based latitude and longitude. So I think the directions below are irrelevant now.
----------------------
It's easiest to do this from the "classic" Google Maps. Who knows how long that will be available, so I've explained how with the "new" maps as well.
- Type in the address and hit return and make sure Google Maps takes you to the right place.
- Right-click on the map at the location and choose “What's here.”
- That will put something like this in the search bar: “30.258659,-97.744548”
- Put the first number that usually starts with “30” in the Latitude field.
- Put the second number that usually starts with ”-97” in the Longitude field.
If you have the "new" Google Maps, you have to do some extra work to get lat/long. It's easier to just use the "classic" link above, but if you insist:
- Type in the address and hit return to find the location
- X-out the location in the search bar so the pin goes away.
- Click on the map where the pin was (and then maybe click again), and a window will come up showing the address and the lat/long, but it will be formatted wrong. It will be something like: 30° 22.096', -97° 42.209'. Copy that text.
- Go to http://dbsgeo.com/latlon/ and paste the text into Place Name.
- Make sure it takes you to your location, then from the "Latitude, Longitude" under the map, copy the "30" number into the Latitude field of the homicide database, and the "-97" number into Longitude.
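If you'd rather skip dbsgeo.com, the conversion is simple arithmetic: decimal degrees = degrees + minutes/60, keeping the sign. A quick Python sketch (the parsing assumes Google's degrees-and-decimal-minutes format shown above):

import re

def dm_to_decimal(text):
    """Convert a "30° 22.096', -97° 42.209'" string to decimal degrees."""
    result = []
    for deg, minutes in re.findall(r"(-?\d+)°\s*([\d.]+)'", text):
        deg = int(deg)
        sign = -1 if deg < 0 else 1
        result.append(round(deg + sign * float(minutes) / 60, 6))
    return result

print(dm_to_decimal("30° 22.096', -97° 42.209'"))
# -> [30.368267, -97.703483]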
Categories: Spring 2014, Tips
Wednesday, January 29, 2014
How to build a Google Fusion Tables map
Posted on Wednesday, January 29, 2014 by Christian McDonald
These are the basic steps we talked about in class today for building a Google Fusion Tables map. There are lots of nuances and such, but this is just the basic stuff. There are four steps: upload, set feature styles, set the info window and publish.
Easy sneazy.
Now, here is the dirty little secret. The Fusion Table part is the easy part. It's getting the data formatted the way you want it before you import it that is the trick. You have to watch things like:
- Start with a good data set (we plotted points using this Dangerous Dogs list). While you could start with data that has a standard address, Google may or may not do a good job geocoding that data. It's best if you have good Latitude and Longitude for each record. You can have a separate column each for Latitude and Longitude, or put them in the same column with a comma between them (there's a short sketch of that after these steps).
- Texas A&M has a free geocoding service that is wonderful. Once it codes everything, it also tells you how close it got on each address. Watch out for anything that matched only on ZIP code, as those results aren't good and you'll need to fix them manually.
- The Classic Google Maps makes geocoding a single address easy. Just find the place on the map, then right-click at that place and choose What's here, and it will put the Lat,Long in the search bar.
- Log into Fusion Tables. You have to have a regular free Google account (in other words, not your UTMail account).
- Fusion Tables is connected to Google Drive. Go under Create and see if Fusion Tables is listed. If not, go to "Get more apps" and find it and add it.
- That said, I suggest you bookmark this link that shows only FT tables. It's hard to find otherwise, and is useful in searching for public tables.
- In Drive, go to File > Create > Fusion Table (or choose New Table in the showtables list), which will take you to a window where you can upload your .xls file or find one of your existing Google Spreadsheets.
- Let the wizard guide you through the upload until you see the rows of data.
- Note which columns are colored yellow, as FT thinks those are Locations. Sometimes you have to go under Edit > Change Columns and find the right Lat/Long column and change the type to "Location"
- Most times, FT will recognize the location columns (Lat/Long) and create a tab called "Map of [whatever]". If you don't have Lat/Long or shapes (covered later), then it might try to geocode based on an address field it finds. It might also ask for help to make it better, like adding a city or state, etc.
- Once you have the map and your points are plotted it's time to work on the feature styles and info window.
- While looking at the map tab, look left for Feature Styles. This is where you can pick what kind of marker you want to use for points. You can use a single style, or use different markers based on a column that has numeric values. You can also set the marker by name using a column.
- Click on Set Info Window to change what shows up in the pop-up window when you click on features in the map. The Automatic side lets you check fields on and off in the window. But if you want to edit what the label says, or add some simple HTML, then go to Custom and change the info window there.
- Once all that is set, there are two steps to publish.
- Click on the Share button at top right and set it to "View" for "Anyone with link can view" or for "Public." But DON'T USE THE URL THEY GIVE YOU THERE.
- Instead, now choose the menu under the Map tab and go to Publish. There you can get a link, iframe or JavaScript embed to use in your blog.
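One prep step from the walkthrough above that scripts nicely: combining separate Latitude and Longitude columns into the single comma-separated column that FT can type as a Location. A pandas sketch; the file and column names are assumptions:

import pandas as pd

df = pd.read_csv("dogs_geocoded.csv")  # hypothetical geocoded export

# Fusion Tables accepts "lat,long" in one column typed as Location.
df["Location"] = df["Latitude"].astype(str) + "," + df["Longitude"].astype(str)
df.to_csv("dogs_for_ft.csv", index=False)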
Using shapefiles for data
Later in class, we published using shapes. I'll go over how to get your own shapefiles in a future class, but in this case we merged some Census data with an existing shapefile that was already public in Fusion Tables. This is a real abbreviated version:
- Upload your data file to Fusion Tables. Note the GEOID field, which is a field unique to each county.
- Go either to the advanced search in Drive or the Showtables page and search for Texas County Shapefile. If it doesn't come up, make sure you are searching for "public tables." Once you open that file, copy the URL into your clipboard.
- Go back into your Census data file, go under File > Merge, then put in the URL of the county shapefile and hit Next.
- Match up the GEOID column in your data file to the GEOID10 column in the shapefile and merge them. You can keep all the columns for this demonstration.
- A map tab was created for you upon the merge because you have a "geometry" column with all the shape information.
- Now you can use Set Feature Style and set the polygon fills to color by buckets on the Median Age field. You can also set up an Automatic Legend there.
- Set up your Info Window with the important information, publish your map like you did above and BAM!, you are done.
I'll say it again, because it bears repeating: the Fusion Table part is the easy part. Getting the data formatted the way you want it before you import it is the trick. A few more things to watch:
- ID numbers that start with 0. Those fields have to be set as text fields in Excel, or you'll drop the leading zeros. ZIP codes on the East Coast are notorious for this. So are school codes.
- Fields you have to merge on need to be identical. If you are merging by county name, you can't have La Salle County in one file and LaSalle County in another.
- Fusion Tables takes .kml (Keyhole Markup Language) for shapefiles, but most governments supply them in the multi-file .shp format. You can convert them at shpescape.com, or with QGIS, which we'll learn later in the course.
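The first two gotchas above are easy to guard against if you prep in Python with pandas rather than Excel. The file and column names here are made up for illustration:

import pandas as pd

# Read ID-like columns as text so leading zeros survive the import.
data = pd.read_csv("census_data.csv", dtype={"zip": str, "school_code": str})
shapes = pd.read_csv("county_names.csv")

# Normalize the county-name merge key so "La Salle" and "LaSalle" match.
for df in (data, shapes):
    df["county_key"] = (df["county"].str.replace(" ", "", regex=False)
                                    .str.lower())

merged = data.merge(shapes, on="county_key", suffixes=("", "_shape"))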
Categories: Fusion Tables, Spring 2014
Saturday, January 4, 2014
Data-related blogs, sites and Twitter accounts
Posted on Saturday, January 04, 2014 by Christian McDonald
In no particular order, and sure to change, these are blogs and sites that you should read regularly for ideas, lessons and knowledge about data and journalism.
- Source
- Flowing Data
- The Functional Art
- The Guardian's Data Blog
- Good Infographics
- ProPublica
- WNYC DataNews
- Texas Tribune TribNerds
- Charts N Things
- Visualizing Data
- Chicago Trib Apps Team
- Data Driven Journalism
- Center for Investigative Reporting
- Technical Bent
Twitter accounts and people worth following
- GoodData
- NPRViz
- Chicago TribApps
- DataRemixed (Ben Jones of Tableau)
- Texas Tribune TribData
Some Tableau-centric data blogs
Categories: Spring 2014
Sunday, November 10, 2013
Spring 2014 course is on a waitlist
Posted on Sunday, November 10, 2013 by Christian McDonald
The good news is that there is enough interest for this class to make for Spring 2014! The bad news: if you haven't registered yet, the class is full and you'll be on a wait list. That also means students outside the Journalism department (the LBJ School, for instance) probably won't get in unless there are drops.
It's great to have so much interest in the class. The limit of 16 is firm because this is a lab-based classroom, with lots of hands-on work during class. That makes it both fun and challenging.
I've been working on the syllabus and outline for the class and should post some drafts well before the semester, but I'm also working on following semesters where I split this course with Data Driven Reporting.