Data GIS Blog
Here at Data & GIS Services, we love finding new ways to map things. Earlier this semester I was researching how the Sheets tool in Google Drive could be used for quick and easy visualizations when I re-discovered its simple map functionality. While there are plenty of more powerful mapping tools if you want to have a lot of features (e.g., ArcGIS, QGIS, Google Fusion Tables, Google Earth, GeoCommons, Tableau, CartoDB), you might consider just sticking with a spreadsheet for some of your simpler projects.
I’ve created a few examples in a public Google Sheet, so you can see what the data and final maps look like. If you’d like to try creating these maps yourself, you can use this template (you’ll have to log into your Google account first, and then click on the “Use this template” button to get your own copy of the spreadsheet).

Organizing Your Data
The main thing to remember when trying to create any map or chart in a Google sheet is that the tool is very particular about the order of columns. For any map, you will need (exactly) two columns. According to the error message that pops up if your columns are problematic: “The first column should contain location names or addresses. The second column should contain numeric values.”
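To make that column requirement concrete, here is a minimal Python sketch of the layout the chart expects. (The helper function and the sample data are hypothetical, purely for illustration; Google Sheets performs this check for you and shows the error message quoted above.)

```python
# Hypothetical sanity check for the two-column layout a GeoMap chart needs:
# column 1 = location name or address, column 2 = numeric value.

def check_geomap_columns(rows):
    """Return the indices of rows that would trip the chart's error message."""
    bad = []
    for i, (location, value) in enumerate(rows):
        if not isinstance(location, str) or not location.strip():
            bad.append(i)  # first column must be a non-empty location string
        elif not isinstance(value, (int, float)):
            bad.append(i)  # second column must be numeric
    return bad

data = [
    ("US", 309),      # ISO 3166-1 alpha-2 code plus a number: fine
    ("Canada", 34),   # full country name also works
    ("", 12),         # empty location: would fail in the chart editor
]
print(check_geomap_columns(data))  # -> [2]
```

Run against real data, a check like this catches layout problems before you ever open the chart editor.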
Of course, I was curious about what counts as “location names” and wanted to test the limits of this GeoMap chart. If you have any experience with the Google Charts API, you might expect the Google Sheet GeoMap chart to work like the Geo Chart offered there. In the spreadsheet, however, you have only a small set of options compared to the charts API. You do have two map options — a “region” (or choropleth) map and a “marker” (or proportional symbol) map — but the choices for color shading and bubble size are built-in or limited.

Region maps (Choropleths)
Region maps are fairly restrictive, because Google needs to know the exact boundary of the country or state that you’re interested in. In a nutshell, a region map can use either country names (or abbreviations) or state names (or abbreviations). The ISO 3166-1 alpha-2 codes seem to work exceptionally well for countries (blazing fast speeds!), but the full country name works well, too. For US states, I also recommend the two-letter state abbreviation instead of the full state name. If you ever want to switch the map from “region” to “marker”, the abbreviations are much less ambiguous than the full state name. (For example, when I switch my “2008 US pres election” map to marker, Washington state turns into a bubble over Washington DC.)

Marker maps (Proportional symbol maps)
Marker maps, on the other hand, allow for much more flexibility. In fact, the marker map in Google Sheets will actually geocode street addresses for you. In general, the marker map will work best if the first column (the location column) includes information that is as specific as possible. As I mentioned before, the word “Washington” will go through a search engine and will get matched to Washington DC before Washington state. Same with New York. But the marker map will basically do the search on any text, so the spreadsheet cell can say “NY”, or “100 State Street, Ithaca, NY”, or even the specific latitude and longitude of a place. (See the “World Capitals with lat/lon” sheet; I just put latitude and longitude in a single column, separated with a comma.) As long as the location information is in a single column, it should work, but the more specific the information is, the better.

Procedure
When you have your data ready and want to create a map, just select the correct two columns in your spreadsheet, making sure that the first one has appropriate location information and the second one has some kind of numerical data. Then click on the “Insert” menu and go down to “Chart…” You’ll get the chart editor. The first screen will be the “Start” tab, and Google will try to guess what chart you’re trying to use. It probably won’t guess a map on the first try, so just click on the “Charts” tab at the top to manually select a map. Map is one of the lower options on the left hand side, and then you’ll be given a choice between the regions and markers maps. After you select the map, you can either stick with the defaults or go straight to the final tab, “Customize,” to change the colors or to zoom your map into a different region. (NB: As far as I can tell, the only regions that actually work are “World,” “United States,” “Europe,” and “Asia”.)
The default color scale goes from red to white to green. You’ll notice that the maps automatically have a “mid” value for the color. If you’d rather go straight from white to a dark color, set the “mid” color to a shade partway between the two.
And there you have it! You can’t change anything beyond the region and the colors, so once you’ve customized those you can click “Update” and check out your map. Don’t like something? Click on the map and a little arrow will appear in the upper right corner. Click there to open the menu, then click on “Advanced edit…” to get back to the chart editor. If you want a bigger version of the map, you can select “Move to own sheet…” from that same menu.

Pros and Cons
So, what are these maps good for? Well, firstly, they’re great if you have state or country data and you want a really quick view of the trends or errors in the data. Maybe you have a country missing and you didn’t even realize it. Maybe one of the values has an extra zero at the end and is much larger than expected. This kind of quick and dirty map might be exactly what you need to do some initial exploration of your data, all while staying in a spreadsheet program.
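Those same two error checks can be sketched outside the spreadsheet, too. This hypothetical Python helper (names and sample numbers are invented for illustration) flags a region that is missing entirely and a value that looks like it picked up an extra zero:

```python
# Hypothetical pre-mapping sanity checks: a missing region and a value
# with an extra zero are exactly the errors a quick map makes visible.

def find_missing(rows, expected):
    """Which expected regions have no row at all?"""
    present = {location for location, _ in rows}
    return sorted(expected - present)

def find_outliers(rows, factor=10):
    """Flag values more than `factor` times the median as suspect."""
    values = sorted(value for _, value in rows)
    median = values[len(values) // 2]
    return [location for location, value in rows if value > factor * median]

rows = [("NC", 9.5), ("VA", 8.0), ("SC", 4.6), ("GA", 97.0)]  # GA has an extra zero
expected = {"NC", "VA", "SC", "GA", "TN"}

print(find_missing(rows, expected))  # -> ['TN']
print(find_outliers(rows))           # -> ['GA']
```

The map gives you the same answer visually: a blank state and one enormous bubble.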
Another good use of this tool is to make a map where you need to geocode addresses but also have proportional symbols. Google Fusion Tables will geocode addresses for you, but it is best for point maps where all the points are the same size or for density maps that calculate how tightly clustered those points are. If you want the points to be sized (and colored) according to a data variable, this is possibly the easiest geocoder I’ve found. It’ll take a while to search for all of the locations, though, and there is probably an upper limit of a couple of hundred rows.
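If you do hit that (undocumented) ceiling, one workaround is to split a large dataset into smaller pieces and map each one on its own sheet. A rough Python sketch, with 200 rows as an assumed limit rather than an official one:

```python
# Sketch: split a large address list into sheet-sized chunks so each
# marker map geocodes only a manageable number of rows. The 200-row
# limit here is an assumption, not a documented Google Sheets limit.

def chunk_rows(rows, max_rows=200):
    return [rows[i:i + max_rows] for i in range(0, len(rows), max_rows)]

addresses = [("100 State Street, Ithaca, NY", n) for n in range(450)]
chunks = chunk_rows(addresses)
print([len(c) for c in chunks])  # -> [200, 200, 50]
```

Each chunk can then be pasted into its own sheet and mapped separately.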
If this isn’t the tool for you, don’t despair! Make an appointment through email (firstname.lastname@example.org) or stop in to see us (walk-in schedule) to learn about other mapping tools, or you can even check out these 7 Ways to Make a Google Map Using Spreadsheet Data.
As we begin our summer in Data and GIS Services, we’re spending this post reflecting on some of the services, software, and tools that made data work this spring more productive and more visible. We proudly present our top 10 list for the Spring 2014 semester:
10. DMPTool
While we enjoy working directly with researchers crafting data management plans, we realize that some data management needs arise outside of consultation hours. Fortunately, the Data Management Planning Tool (DMPTool) is there 24/7 to provide targeted guidance on data management plans for a range of granting agencies.
9. Fusion Tables
A database in the cloud that allows you to query and visualize your data, Fusion Tables has proven a powerful tool for researchers who need database functionality but don’t have time for a full-featured database. We’ve worked with many groups to map their data in the cloud; see the Digital Projects blog for an example. Fusion Tables is a regular workshop in Data and GIS.
8. Open Refine
You could learn the UNIX command line and a scripting language to clean your data, but Open Refine opens data cleaning to a wider audience that is more concerned with simplicity than syntax. Open Refine is also a regular workshop in Data and GIS.
7. R and RStudio
A programming language that excels at statistics and data visualization, R offers a powerful, open source solution to running statistics and visualizing complex data. RStudio provides a clean, full-featured development environment for R that greatly enhances the analysis process.
6. Tableau Public
Need a quick, interactive data visualization that you can share with a wide audience? Tableau Public excels at producing dynamic data visualizations from a range of different datasets and provides intuitive controls for letting your audience explore the data.
5. ArcOnline
ArcGIS has long been a core piece of software for researchers working with digital maps. ArcOnline extends the rich mapping features of ArcGIS into the cloud, allowing a wider audience to share and build mapping projects.
4. Pandas
A library that adds data analysis and modeling tools to the Python scripting language, Pandas brings the ease and power of Python to a range of data management and analysis challenges.
3. Raw
Paste in your spreadsheet data, choose a layout, drag and drop your variables… and your visualization is ready. Raw makes it easy to go from data to visualization using an intuitive, minimal interface.
2. Stata 13
Another core piece of software in the Data and GIS Lab (and at Duke), Stata 13 brought new features and flexibility (automatic memory management — “hello big data”) that were greatly appreciated by Duke researchers.
1. R Markdown
While many librarians tell people to “document your work,” R Markdown makes it easy to document your research data, explain results, and embed your data visualizations using a minimal markup language that works in any text editor and ties nicely into the R programming language. For pulling it all together, R Markdown is number one in our top ten list!
We hope you’ve enjoyed the list! If you are interested in these or other data tools and techniques, please contact us at email@example.com!
On Thursday, April 17 and Friday, April 18, Duke University will host a visit from Francesca Samsel, a visual artist who uses technology to develop work on the fulcrum between art and science. Francesca works as Research Assistant Faculty in the Computer Science department of the University of Texas at El Paso, is a Research Affiliate with the Center for Agile Technologies at the University of Texas at Austin, and is also a long-term collaborating partner with Jim Ahrens’ Visualization Research Team at Los Alamos National Labs.
Francesca will give two presentations during her visit. A presentation on Thursday afternoon for the Media Arts + Sciences Rendezvous series will address the humanities community and present recommendations for work with scientists and visualization teams. A presentation over lunchtime on Friday for the Visualization Friday Forum will describe a variety of collaborations with scientific teams and address the benefits that can come from incorporating artists into a scientific research team.
Francesca’s visit is sponsored by Information Science + Information Studies (ISIS), with additional support from Media Arts + Sciences. We hope you can join us for one or both of the presentations!
Creating Mutually-Beneficial Multiple-Outcome Collaborations
Thursday, April 17
4:15 pm (talk starts at 4:30)
Smith Warehouse, Bay 10 classroom (2nd floor – enter through Bays 9 or 11)
Drinks and light snacks provided
Many artists draw on the scientific community as sources for their work. Research communities are exploding with rich material connected to our contemporary lives. Given that art-science collaborations require weeks, realistically months, in a lab, shoulder to shoulder with the scientists, access is a huge barrier. Francesca Samsel will discuss her history of collaborations with visualization teams and scientists, what worked, what didn’t, and how to get in the door.
An Artist, No Thanks! Employing Design and Color Theory to Increase Clarity, Perception and Depth within Scientific Visualization
Friday, April 18, 2014
12:00p.m. to 1:00p.m. (lunch provided)
Levine Science Research Center, Room D106 (near the Research Drive entrance), in conjunction with the Visualization Friday Forum
Francesca Samsel will discuss her ongoing work with the Research Visualization Team at Los Alamos National Labs and why they hired an artist to help them design the next generation of scientific visualization tools. Their recent work focuses on developing algorithmically generated color maps to extract the maximum perceivable detail within exascale data sets. She will also discuss collaborations with the Visualization Division of the Texas Advanced Computing Center, as well as with hydrogeologists, neurologists, environmental research teams, and more.