Web Scraping (Guide | Video). Cross-listed as OnYourLaptop - Part 3/3 - Web Scraping - gathering data from websites, HTML & JSON Parsing, APIs and gathering Twitter streams (Guide | Video).
Preexisting clean data sets, such as the General Social Survey (GSS) or Census data, are readily available, cover long periods of time, and have well-documented codebooks. However, some people want to gather their own data. Tools and techniques for finding and compiling data from webpages, whole websites, or social media sources have recently become more accessible, but they introduce a different layer of complexity.
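As a rough illustration of the HTML and JSON parsing mentioned above, the following minimal Python sketch pulls links out of an HTML fragment and text fields out of a JSON document, using only the standard library. The page content, the file paths, and the JSON field names ("results", "user", "text") are hypothetical and chosen only for illustration; a real project would first download the page or call an API, and the workshop itself may use different tools.

```python
import json
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag encountered."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical page content; a real scraper would download this first.
html_doc = (
    '<ul>'
    '<li><a href="/data/2012.csv">2012</a></li>'
    '<li><a href="/data/2013.csv">2013</a></li>'
    '</ul>'
)
collector = LinkCollector()
collector.feed(html_doc)
links = collector.links

# Hypothetical API response; the field names are assumptions.
json_doc = (
    '{"results": ['
    '{"user": "a", "text": "hello"}, '
    '{"user": "b", "text": "world"}'
    ']}'
)
texts = [item["text"] for item in json.loads(json_doc)["results"]]

print(links)
print(texts)
```

Even in this small case the extra complexity is visible: the code depends on the structure of the page and of the response, and either can change without notice.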
Data Storage and Management
Data Management Plans: Grants, Strategies and Considerations
Fall 2012-Spring 2014
In the last few years, granting agencies across the disciplines have increasingly required data management plans as part of a grant proposal. These plans detail strategies to manage, share, and preserve research data as part of a funded grant project. NSF, the NIH, the National Endowment for the Humanities, and other organizations have similar requirements, and Duke policy requires that research records (including digital data) be kept for at least five years. How should researchers respond? In this presentation, we’ll give an overview of research data management challenges and opportunities and describe some approaches for meeting them. We’ll ask the audience to share how they do data management now, and we’ll talk about planning underway for new services to help with data management at Duke.
Data Cleaning and Analysis
Stata for Research (Slides and Sample Data)
Spring 2011-Spring 2014
Stata for Research focuses on the core concepts of using Stata. This hands-on workshop provides an overview of how to load, manage, and analyze data using Stata. The workshop will also include a brief introduction to Stata graphics. No previous experience with Stata is required.
Analysis with R
Spring 2013-Spring 2014
Explore the basics of the R programming language for statistics and graphing in this introductory workshop. This hands-on workshop covers the basics of getting help and of loading, managing, graphing, and analyzing data in R. No previous experience with R is required. Course materials will be available before the class for workshop participants.
Useful R Packages: Extensions for Data Analysis, Management, and Visualization
The basic version of the R programming language provides a powerful tool for data analysis, but much of the value in R lies in the wide range of libraries that extend its basic functionality. This workshop shares a number of popular extensions to R that enable rich graphics (ggplot, Google graphics), file conversions, and additional statistical tests. A basic familiarity with R would be useful for this workshop.
OpenRefine: (Guide | Video). (Previously Google Refine.) Cross-listed as OnYourLaptop - Part 1/3 - OpenRefine - Data Cleaning, Mining, Transformations, and Text Normalization (Guide | Video)
OpenRefine (formerly Google Refine) is a tool for working with semi-structured datasets. It allows you to explore data, find patterns within data using facets, detect data inconsistencies, and apply quick clean-up and transformation options. OpenRefine is an often intuitive but powerful tool for normalizing data before importing the dataset into an application for presentation or analysis (e.g., mapping, charting, or analyzing). In this hands-on class, we'll explore how Refine can help with common data cleaning challenges.
Introduction to Text Analysis
Fall 2012-Spring 2014
Many research projects involve textual data, and computational advances now provide the means to engage in various types of automated text analysis that can enhance these projects. Understanding what analysis techniques are available and where they can appropriately be applied is an important first step to beginning a text analysis project.
This hands-on approach to text analysis will give a quick overview of small- and large-scale text-based projects before addressing strategies for organizing and conducting text analysis projects. Tools for data collection, parsing and eventual analysis will be introduced and demonstrated. The workshop will focus on acquiring and preparing text sources for small-scale projects and text-based visualizations, but many of the techniques will be useful for larger projects as well. For this introduction, the focus will primarily be on using Graphical User Interface (GUI) tools like Microsoft Excel and Google Refine, instead of programming languages and command line approaches.
Regular expressions are a powerful method of finding patterns in text. For example: find all words ending in "ing"; all words that begin with a capital letter; all telephone area codes that begin with either 7 or 8; all email addresses that contain "duke.edu". Many programming languages support regular expressions as a means of pattern matching.
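The four example searches can be sketched with Python's built-in re module. This is a minimal illustration, not the workshop's own material: the sample text is invented, telephone numbers are assumed to be written as NNN-NNN-NNNN, and "contains duke.edu" is interpreted here as an address whose domain ends in duke.edu.

```python
import re

# Invented sample text for illustration only.
text = (
    "Running and Jumping are tiring. Call 704-555-0101, 919-555-0102, "
    "or 828-555-0103. Write to jane.doe@duke.edu or sam@example.org."
)

# Words ending in "ing": word characters followed by "ing" at a word boundary.
ing_words = re.findall(r"\b\w+ing\b", text)

# Words beginning with a capital letter.
capitalized = re.findall(r"\b[A-Z]\w*", text)

# Area codes beginning with 7 or 8 (assumes the NNN-NNN-NNNN format).
# The parentheses capture only the area code.
area_codes = re.findall(r"\b([78]\d{2})-\d{3}-\d{4}\b", text)

# Email addresses whose domain is duke.edu or a subdomain of it.
duke_emails = re.findall(r"\b[\w.+-]+@(?:[\w-]+\.)*duke\.edu\b", text)

print(ing_words)    # ['Running', 'Jumping', 'tiring']
print(capitalized)  # ['Running', 'Jumping', 'Call', 'Write']
print(area_codes)   # ['704', '828']
print(duke_emails)  # ['jane.doe@duke.edu']
```

Real-world phone numbers and email addresses vary far more than these patterns allow, so patterns like these usually need adjusting to the data at hand.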
Mapping and GIS
Do you want to find out how geographic information system (GIS) software can aid your research? This class will provide an overview of how ArcGIS software can help you analyze or visualize digital data that has a locational component, and will discuss starting points for obtaining data. Examples will focus on social science data, but attendees are encouraged to ask questions regarding their own needs and will be welcome to make one-on-one appointments later for more focused instruction.
ArcGIS Online (AGOL) is a companion to the ArcGIS client that allows members of a group to store and share spatial data online and that can be used independently or in conjunction with the client. We'll discuss aspects of the AGOL organizational account, adding and accessing content, creating map and feature services, creating and sharing web maps and presentations, publishing web applications, and using analysis tools.
This class will show some ways that ArcGIS can be used for the analysis and visualization of historical spatial data. Topics will include: sources for GIS layers reflecting the past, georeferencing a scanned historic map, creating new layers from scratch based on known locations of features, editing existing GIS layers to reflect former features, and vectorizing a scanned map to create editable features.
Compare and contrast several products intended for geospatial visualization (e.g., a map to embed in a blog or PowerPoint, or for a poster session) and in some cases for GIS data analysis. (1) ArcGIS Online: companion to the ArcGIS client that allows members of a group to store and share spatial data online, and that can be used independently or in conjunction with the client; (2) GeoCommons: both a repository for spatial data and an analysis and visualization tool; (3) Google Earth: emphasis on the features that are most applicable in an academic setting. See our schedule for another session on Google Fusion Tables.
Google Fusion Tables
Fall 2011-Spring 2014
Introduction to the features of Google Fusion Tables, which include merging datasets, filtering and aggregating data, and visualizing data by creating online maps and graphs. For certain tasks, it can serve as an alternative to using statistical software such as Stata or GIS software such as ArcGIS.
Introduction to Tableau Public
Fall 2012-Spring 2014
Tableau Public is free software that allows individuals to quickly create interactive visualizations of their research and business analytics data. This workshop will focus on using Tableau Public to create data visualizations, starting with an overview of the structure of the program and the terminology used. The workshop will include a sample data visualization project, focusing especially on some of the new features in Tableau Public. We will also discuss publishing to the Tableau Public web server and related services and tools, like the full Tableau Desktop application and the recently released Tableau Online.
Designing Academic Figures and Posters
Fall 2013-Spring 2014
Figures and other forms of visual representation can have a huge impact on the communication of research to a broader audience. A well-designed figure can summarize research, captivate audience interest, and explain complicated phenomena and processes. Likewise, becoming familiar with good strategies for poster design allows researchers to take full advantage of the opportunity to network with colleagues and promote their own research. This workshop will cover basic considerations for designing effective academic figures and posters, including use of color, layout, fonts/typography, and software choices.
Top 10 Dos and Don'ts for Charts and Graphs
Spring 2013-Fall 2013
Simple charts and graphs can be incredibly effective at summarizing data. They are common and thus easier for a wide audience to understand. They are also easy to produce in the tools many people regularly use for other data analysis or project management work. With a few simple tips and tricks, you can avoid common missteps and make sure your charts are clear and easy to understand.
Data Visualization on the Web
Until recently, most data visualizations were created by installing statistics or visualization software onto our computers. In recent years, however, a number of web-based data visualization tools have been developed. These tools offer many advantages over downloaded software applications: visualizations can be created on PCs or Macs, groups can often collaborate in their creation, and the results can often be shared more easily. This workshop will give a quick overview of several web-based visualization tools, including Google Spreadsheets and Raw. Participants are encouraged to bring laptops to follow along with the demonstrations.
Data Visualization on the Web (Advanced)
Networks (or graphs) are a compelling way of studying relationships between people, places, objects, ideas, etc. Generating network data and visualizations, however, can be an involved process requiring specialized tools. This workshop will explore some of the easier ways to produce, load, and visualize network data using Gephi, an open source, multi-platform network analysis and visualization application. Time will be available at the end of the workshop to discuss specific projects and test out different techniques with Gephi.