Data Visualization

Adobe Illustrator for Diagrams and Visualizations

Spring 2016
Video · Files and slides

Do you ever try to draw simple diagrams for proposals, reports or publications? Have you ever struggled to get a graphing program to make your plots look just right? Adobe Illustrator is a vector graphics editing program which can be very useful for faculty, students and staff in these types of situations, but many people avoid it because of the seemingly steep learning curve. In this workshop I will present a few basic principles of good graphic design, and then run through some simple examples of Illustrator's capabilities, showing you how to start using it to modify your graphs and create diagrams to explain your ideas.

Advanced Tableau (Data Structures)

Spring 2016
Video · Data and slides

This workshop will focus on the challenges of using different types and structures of data in Tableau. We will learn how to clean and organize various data sources for Tableau, how to join and blend data to combine datasets, and how to design visualizations when datasets have been joined or blended.

Designing Academic Figures and Posters

Fall 2013 - Spring 2016
Video · Guide

Figures and other forms of visual representation can have a huge impact on the communication of research to a broader audience. A well designed figure can summarize research, captivate audience interest, and/or explain complicated phenomena and processes. Likewise, becoming familiar with good strategies for poster design allows researchers to take full advantage of the opportunity to network with colleagues and promote their own research. This workshop will cover basic considerations for designing effective academic figures and posters, including use of color, layout, fonts/typography, and software choices.

Easy Interactive Charts and Maps with Tableau

Fall 2012 - Spring 2016
Video · Data and slides · Guide

Tableau Public (available for both Windows and Mac) is free software that allows individuals to quickly create interactive visualizations of their research and business analytics data. This workshop will focus on using Tableau Public to create data visualizations, starting with an overview of the structure of the program and the terminology used. The workshop will include a sample data visualization and mapping project, focusing especially on some of the new features in Tableau Public 9. We will also discuss publishing to the Tableau Public web server and related services and tools, like the full Tableau Desktop application (free for full-time students).

Making Data Visual

Spring 2016
Video · Slides

The process of making data visual can be nuanced and iterative. Sometimes we start a project with a very specific idea of the kind of visualization we want, but other times we may not be sure what will work best. This workshop will address three important aspects of making data visual: identifying the goal of your visualization, identifying the audience of your visualization, and understanding the pros and cons of different types of visualizations. This workshop will focus not on any specific software application, but instead will focus on helping attendees develop instincts for what kinds of visualizations match well with particular datasets, goals, and audiences. While some mention will be made of non-traditional visualizations, like custom diagrams, the emphasis will be on standard visualization types.

Top 10 Dos and Don'ts for Charts and Graphs

Spring 2013 - Fall 2013
Guide

Simple charts and graphs can be incredibly effective at summarizing data. They are common and thus easier for a wide audience to understand. They are also easy to produce in the tools many people regularly use for other data analysis or project management work. With a few simple tips and tricks, you can avoid common missteps and make sure your charts are clear and easy to understand.

Data Visualization on the Web

Spring 2014

Until recently, most data visualizations were created by installing statistics or visualization software onto our computers. In recent years, however, a number of web-based data visualization tools have been developed. These tools offer many advantages over downloaded software applications: visualizations can be created on PCs or Macs, groups can often collaborate in their creation, and the results can often be shared more easily. This workshop will give a quick overview of several web-based visualization tools, including Google Spreadsheets and Raw. Participants are encouraged to bring laptops to follow along with the demonstrations.

Data Visualization on the Web (Advanced)

Spring 2014
Slides

While many web-based tools are now available for developing standard chart and visualization types, there are many times when it is necessary to generate custom visualization types that can be shared via the web. JavaScript is one of the most popular ways of generating custom, web-based visualizations. This workshop, held in conjunction with the Duke d3 study group, will introduce a popular JavaScript data visualization library called d3 (http://d3js.org/). We will start from a working code example that can be modified and extended, using it to highlight some of the tricky aspects of learning d3. Some comfort with HTML will be useful for this workshop, but limited knowledge of JavaScript is fine.

Using Gephi for Network Analysis and Visualization

Spring 2014
Video · Slides and data

Networks (or graphs) are a compelling way of studying relationships between people, places, objects, ideas, etc. Generating network data and visualizations, however, can be an involved process requiring specialized tools. This workshop will explore some of the easier ways to produce, load, and visualize network data using Gephi, an open source, multi-platform network analysis and visualization application. Time will be available at the end of the workshop to discuss specific projects and test out different techniques with Gephi.

Data Sources

Web Scraping

Fall 2015 - Spring 2017
Video · Guide

AKA: OnYourLaptop - Part 3/3 - Web Scraping - gathering data from websites, HTML & JSON Parsing, APIs and gathering Twitter streams

Clean, preexisting data sets such as the General Social Survey (GSS) or Census data are readily available, cover long periods of time, and have well-documented codebooks. However, some people want to gather their own data. Tools and techniques for finding and compiling data from webpages, whole websites, or social media sources have recently become more accessible, but they introduce a different layer of complexity.
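
As a rough illustration of the HTML and JSON parsing mentioned above, here is a minimal sketch using only the Python standard library. It is not the workshop's own material: the class name LinkCollector, the sample HTML fragment, and the sample JSON response are all hypothetical stand-ins for content you would normally download from a website or API.

```python
import json
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical page fragment, standing in for HTML fetched from a website.
SAMPLE_HTML = """
<ul>
  <li><a href="/reports/2015.csv">2015 report</a></li>
  <li><a href="/reports/2016.csv">2016 report</a></li>
</ul>
"""

# Hypothetical API response, standing in for JSON returned by a web service.
SAMPLE_JSON = (
    '{"results": [{"id": 1, "text": "first post"},'
    ' {"id": 2, "text": "second post"}]}'
)

# HTML parsing: walk the markup and pull out every link.
collector = LinkCollector()
collector.feed(SAMPLE_HTML)
links = collector.links

# JSON parsing: the response is already structured, so it maps onto
# Python dicts and lists directly.
posts = [(item["id"], item["text"]) for item in json.loads(SAMPLE_JSON)["results"]]

print(links)
print(posts)
```

The contrast is the point of the sketch: HTML has to be walked tag by tag to recover structure, while JSON from an API arrives already structured.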

Twitter Stream Gathering

Spring 2017
Video · Guide

Explore two methods of gathering real-time Twitter stream data through hands-on exercises in applying for Twitter API keys and configuring a Twitter stream gathering tool. Investigate and discuss gathering historical Twitter data. Discuss considerations for analysis.

Structuring Humanities Data

Spring 2016
Video · Data and slides

Have you ever wondered what medieval scribes, ancient artifacts, historical paintings and Victorian fiction have to do with data? Have you ever thought about how social media data can be used to document and analyze groups, events and moments in history? Digital tools can open up new and exciting possibilities for Humanistic inquiry, as long as you see people, places, dates and relationships as data and know how to "speak" in a way that computers understand. Through a series of case studies, the Structuring Humanities Data workshop will help Humanists see the data in their subjects and provide guidelines for how to structure and gather data in simple spreadsheets, including ways to deal with tricky but common situations like uncertainty in dates. The workshop will also show examples where computers were used to help gather data automatically, and look under the hood at some data driving visualizations on the web.

Data Management

Data Management Fundamentals

Spring 2017 - Present
Video · Slides

This workshop introduces data management practices to consider throughout the research lifecycle: planning, organization, documentation, storage and backup, sharing, citation, and preservation. The workshop will offer an overview of general recommendations that are relevant across disciplines and will point attendees to additional resources at Duke and beyond.

Introduction to the Open Science Framework

Spring 2018 - Present
Video · Slides

The Open Science Framework (OSF) is a free, open source project management tool developed and maintained by the Center for Open Science. The OSF can help scholars manage their workflow, organize their materials, and share all or part of a project with the broader research community. This workshop will demonstrate some of the key functionalities of the tool including how to structure your materials, manage permissions, version content, integrate with third-party tools (such as Box, GitHub, or Mendeley), share materials, register projects, and track usage.

OSF + TIER Protocol: Designing a reproducible workflow

Spring 2018
Video · Slides

The Open Science Framework is a free online system for managing and sharing research materials throughout the research cycle, and the TIER Protocol is a workflow for maintaining well-organized documentation of your data and analyses. This workshop will introduce you to key features of the OSF and demonstrate how it can be used in conjunction with TIER to facilitate reproducible research practices, collaboration, and best practices in data management.

Publishing Data with Research and Other Strategies for Increasing Your Impact

Spring 2018
Video · Slides

Scholars can and do communicate their research in various ways. While peer-reviewed journal publications remain the primary outlet for sharing the key results of research projects, there are growing norms (and expectations) that the underlying data from projects should also be published. In this workshop, we will look at 1) strategies to effectively publish data; 2) journal policies related to data sharing; 3) new types of publications such as data articles and registered reports; and 4) strategies for increasing and measuring the impact of your research. There will also be a hands-on portion of the workshop where participants will create their own ORCID identifier.

Managing Sensitive Data

Spring 2018
Video · Slides

In the course of your research you may collect, interact with or analyze data that are classified as "Sensitive" or "Restricted" according to Duke's data classification standard. In this workshop we will examine common sensitive data types, how Duke's IRB and Information Technology Security Office (ITSO) expect you to protect that data throughout your project's lifecycle, and the resources available to you for sensitive data storage and analysis, data de-identification, and data archiving and sharing.

Writing a Data Management Plan

Fall 2017
Slides

This workshop will be a deep dive into the process of writing a data management plan (DMP) using the DMPTool. To make the most of this workshop, attendees are encouraged to bring a “live” DMP that they are ready to begin or are currently in the process of writing. Attendees without active DMPs may write a “test” DMP based on who they would typically apply to for research funding. The “test” DMP can then serve as a useful reference when it is time to write a live plan. The instructors (both Research Data Management Consultants) will be on-hand to provide individual help during the writing portion of the workshop as needed and, following the workshop, are available to review plans through the DMPTool at any point up to final submission.

Finding a Home for Your Data: An Introduction to Archives and Repositories

Fall 2017
Video · Slides

Publishing and preserving research data within a trusted repository helps researchers comply with funder and journal data sharing policies, supports the discovery of and access to data, and can result in more visibility and higher impact for research projects. This workshop will provide an overview of the different types of repositories and the overall role of repositories within the data sharing landscape. Key repositories in various disciplines will be explored, and attendees will learn about resources for locating and assessing repositories. Attendees will also have an opportunity to locate appropriate repositories for their own research.

Research Collaboration Strategies and Tools

Fall 2017
Slides

Scholars increasingly work on collaborative research projects. Collaborative projects often bring together partners across disciplines, institutions, and sectors. These projects present opportunities for innovation but also raise challenges for the development of efficient and effective workflows and the management of data. This workshop will examine considerations for collaborative research and present some strategies for developing and documenting workflows as well as methods for storing and sharing data. We will also look at some tools available at Duke (e.g., Box, OSF, and PRDN) that can be used to support these types of projects.

Data Management and Grants: Complying with Mandates

Spring 2017
Slides

Today, researchers are increasingly faced with requirements from both federal and private funders to share, archive, and plan for the management of their data. This trend began in 2003, when NIH released its data sharing policy. In 2011 the NSF began requiring that all grant proposals include a two-page Data Management Plan (or DMP). Then in 2013 an Office of Science and Technology Policy Memo directed all federal agencies with over $100 million in annual funding to develop plans to make research products, including data, openly accessible. This workshop will provide an overview of funding agencies' DMP requirements, the primary components of data management plans, and suggestions for integrating data management updates into grant reporting. Attendees will also learn about tools and resources that can help them write a DMP that complies with funder mandates. A portion of this workshop will include a hands-on data management plan exercise.

Data Management Tools: The Dataverse Project

Spring 2017
Slides

The Dataverse is an open source repository software platform for sharing, preserving, citing, discovering, exploring, and analyzing research data. This workshop will provide an overview of the Dataverse Project and demonstrate how the Dataverse can be used to discover research data and manage and share data in compliance with best practices.

Data Management Tools: Colectica for Excel

Spring 2017
Slides

Are you an avid Excel user? Would you like to know how to add helpful documentation into your Excel files and generate codebooks automatically? If so, I’d like to introduce you to Colectica. While there is a paid version, there is a free version that is more than adequate if you plan to do all of your analysis in Excel. Visit http://www.colectica.com/software/colecticaforexcel and download before the workshop (my apologies to Mac Users in advance - this software only works with Windows). You are encouraged to bring a laptop and your own Excel file(s) to take some time to get to know the tool. Feel free to bring your lunch. In the spirit of Love Your Data Week and Valentine's Day, chocolate will be provided.

Data Management and Reproducibility: Enabling Open and Transparent Research through Data Sharing

Spring 2017
Slides

Making data available within repositories is an essential aspect of supporting open and transparent research. Today as science is tackling the so-called “reproducibility crisis”, researchers are increasingly faced with journal requirements to share their data for the purposes of verification. This workshop will explore the concept of reproducibility, the growth in journal data sharing policies, and present strategies to help researchers share data that meet standards for reproducibility and reuse.

Data Management Plans: Grants, Strategies and Considerations

Fall 2012 - Spring 2014
Guide

In the last few years granting agencies across the disciplines have increasingly required data management plans as part of a grant proposal that detail strategies to manage, share and preserve research data as part of a funded grant project. NSF, the NIH, the National Endowment for the Humanities and other organizations have similar requirements, and Duke policy requires that research records (including digital data) be kept for at least five years. How should researchers respond? In this presentation, we’ll give an overview of research data management challenges and opportunities and describe some approaches for meeting them. We’ll ask the audience to share how they do data management now, and we’ll talk about planning underway for new services to help with data management at Duke.

Data Cleaning and Analysis

Advanced Excel for Data Projects

Fall 2015 - Spring 2016
Video · Data and slides

Spreadsheets are a standard tool for many data projects, whether because of the ability to easily edit data, the ubiquity of spreadsheet programs, or the added features like charts and filters. This workshop extends the introduction from our Basic Data Cleaning and Analysis for Data Tables workshop by focusing on more advanced features in Excel. Examples include filtering, pivot tables, and data visualizations.

Basic Data Cleaning and Analysis for Data Tables

Fall 2014 - Spring 2016
Video · Data and slides

Tables of data, like those you see in spreadsheets or relational databases, are the foundation of most data-driven research today. There are many pitfalls of working with these tables, though, that most people end up having to learn the hard way. In this workshop, we'll take a dataset that has a variety of different properties and learn to work through many common steps of data-driven research to clean and begin analyzing the data. We'll be using Excel to make sure the methods we suggest can be reproduced easily "at home," but many of these techniques are important for other data analysis tools as well. No data experience necessary.

Introduction to R: Data Transformations, Analysis, and Data Structures

Fall 2016 - Present
Video · Workshop files · Guide

A gentle introduction to the basics of the R statistical programming language using the RStudio development environment. Learn about managing your R projects, data types, variable assignments, data cleaning and visualization. No previous experience required.

Reproducibility: Data Management, Git, and RStudio

Fall 2017 - Present
Video · OSF Page · Guide

In response to a growing focus on the importance of reproducibility, replication, and transparency in the research endeavor, scholars are adapting their practices and learning new skills and tools. This workshop will introduce some general data management strategies that can increase the reproducibility of your work. You will also learn through hands-on exercises how to harness two specific tools, Git and RStudio, to support the execution of more reproducible research projects. Git is a powerful version control system, and RStudio is an open-source development environment for the R statistical programming language.

Introduction to Stata

Spring 2011 - Spring 2016
Video · Slides and Sample Data

Stata for Research focuses on the core concepts of using Stata. This workshop provides a hands-on overview of how to load, manage, and analyze data using Stata. The workshop will also include a brief introduction to Stata graphics. No previous experience with Stata is required.

OpenRefine

Fall 2011 - Spring 2017
Video · Guide · Data

AKA: OnYourLaptop - Part 1/3 - OpenRefine - Data Cleaning, Mining, Transformations, and Text Normalization

OpenRefine (formerly Google Refine) is a tool for working with semi-structured datasets. It allows you to explore data, easily find patterns within data using facets, detect data inconsistencies, and apply quick clean-up and transformation options. OpenRefine is an often intuitive but powerful tool for normalizing data before importing the dataset into another application (e.g., for mapping, charting, or analysis). In this hands-on class, we'll explore how OpenRefine can help with common data cleaning challenges.
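
To give a sense of what faceting and inconsistency detection involve, here is a small Python sketch. It is only loosely modeled on the "fingerprint" key-collision idea that OpenRefine documents for clustering. It is not OpenRefine's actual implementation, which handles more cases (such as accented characters). The sample values and the function name fingerprint are assumptions made for illustration.

```python
import string
from collections import Counter, defaultdict

# Hypothetical column of institution names with typical inconsistencies.
values = [
    "Duke University",
    "duke university",
    "Duke  University.",
    "University, Duke",
    "UNC Chapel Hill",
    "UNC Chapel Hill",
]


def fingerprint(value):
    """Simplified normalization key: trim, lowercase, drop ASCII punctuation,
    then sort and de-duplicate the whitespace-separated tokens."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


# A text "facet" is essentially a count of each distinct value in a column.
facet = Counter(values)

# Group values by fingerprint; groups holding more than one distinct
# spelling are candidates for merging into a single canonical value.
clusters = defaultdict(list)
for value in values:
    clusters[fingerprint(value)].append(value)

candidates = {
    key: members for key, members in clusters.items() if len(set(members)) > 1
}

print(facet.most_common())
print(candidates)
```

In this sample, the four spellings of "Duke University" collapse to one key, while the consistently spelled value is left alone.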

Regular Expressions (RegEx)

Fall 2015 - Spring 2017
Video · Guide · Workshop files

AKA: OnYourLaptop - Part 2/3 - Regular Expressions (RegEx)

Regular Expressions are a powerful method of finding patterns in text. For example: find all words ending in "ing"; all words which begin with a capital letter; all telephone area codes that begin with either the numbers 7 or 8; all email addresses which contain "duke.edu". Many programming languages use regular expressions as a means to support pattern matching.
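
The four examples above can be sketched with Python's built-in re module. This is an illustrative sketch rather than the workshop's own material: the sample text is made up, and the area code pattern assumes phone numbers written as NNN-NNN-NNNN.

```python
import re

# Made-up sample text for illustration.
text = (
    "Dr. Smith is teaching a Reading and Writing seminar this spring. "
    "Call 704-555-0142, 919-555-0199, or 828-555-0117 for details, "
    "or email jane.doe@duke.edu or sam@example.org."
)

# Words ending in "ing": one or more word characters, then "ing",
# then a word boundary.
ing_words = re.findall(r"\b\w+ing\b", text)

# Words that begin with a capital letter.
capitalized = re.findall(r"\b[A-Z]\w*", text)

# Area codes that begin with 7 or 8 (assumes the NNN-NNN-NNNN format).
# The parentheses capture only the area code.
area_codes = re.findall(r"\b([78]\d{2})-\d{3}-\d{4}\b", text)

# Email addresses at duke.edu, including any subdomain of it.
duke_emails = re.findall(r"\b[\w.+-]+@(?:[\w-]+\.)*duke\.edu\b", text)

print(ing_words)
print(capitalized)
print(area_codes)
print(duke_emails)
```

Pattern syntax varies slightly between programming languages and tools, so expressions like these may need small adjustments outside Python.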

Useful R Packages: Extensions for Data Analysis, Management, and Visualization

Spring 2014

The basic version of the R programming language provides a powerful tool for data analysis, but much of the value in R lies in the wide range of libraries that extend its basic functionality. This workshop shares a number of popular extensions to R that enable rich graphics (ggplot, google graphics), file conversions, and additional statistical tests. A basic familiarity with R would be useful for this workshop.

Introduction to Text Analysis

Fall 2012 - Spring 2014
Guide

Many research projects involve textual data, and computational advances now provide the means to engage in various types of automated text analysis that can enhance these projects. Understanding what analysis techniques are available and where they can appropriately be applied is an important first step to beginning a text analysis project.

This hands-on approach to text analysis will give a quick overview of small- and large-scale text-based projects before addressing strategies for organizing and conducting text analysis projects. Tools for data collection, parsing and eventual analysis will be introduced and demonstrated. The workshop will focus on acquiring and preparing text sources for small-scale projects and text-based visualizations, but many of the techniques will be useful for larger projects as well. For this introduction, the focus will primarily be on using Graphical User Interface (GUI) tools like Microsoft Excel and Google Refine, instead of programming languages and command line approaches.

Mapping and GIS

Introduction to ArcGIS

Fall 2010 - Spring 2016
Video · Guide · Sample Data

Do you want to find out how geographic information system (GIS) software can aid your research? This class will provide an overview of how ArcGIS software can help you analyze or visualize digital data that has a locational component, as well as discuss starting points for obtaining data. Examples will focus on social science data, but attendees are encouraged to ask questions regarding their own needs and will be welcome to make one-on-one appointments later for more focused instruction.

Historical GIS

Fall 2012 - Spring 2016
Guide · Sample Data

This class will show some ways that ArcGIS can be used for the analysis and visualization of historical spatial data. Topics discussed will be: sources for GIS layers reflecting the past, georeferencing a scanned historic map, creating new layers from scratch based on known locations of features, editing existing GIS layers to reflect former features and vectorizing a scanned map to create editable features.

Introduction to QGIS

Spring 2016
Guide

Looking for an open source option for GIS? QGIS is free and it is one alternative to using ArcGIS. In this workshop we will demonstrate how to import and analyze data in QGIS and discuss the benefits of using QGIS over other GIS software.

ArcGIS Online

Spring 2014
Guide

ArcGIS Online (AGOL) is a companion to the ArcGIS client that allows members of a group to store and share spatial data online and that can be used independently or in conjunction with the client. We'll discuss aspects of the AGOL organizational account, adding and accessing content, creating map and feature services, creating and sharing web maps and presentations, publishing web applications, and using analysis tools.

Mapping with R

Fall 2017 - Present
Video · Guide · Slides

R has become a popular and reproducible option for supporting spatial and statistical analysis. This hands-on workshop will demonstrate how to plot x/y coordinates; how to generate thematic choropleths with US Census and other federal data; import, view and produce shapefiles; and create leaflet maps for viewing on the web.

Web GIS Applications

Fall 2012 - Fall 2013
Guide · Sample Data

Compare and contrast several products intended for geospatial visualization (e.g., a map to embed in a blog or PowerPoint, or for a poster session) and in some cases for GIS data analysis. (1) ArcGIS Online: Companion to the ArcGIS client that allows members of a group to store and share spatial data online, and that can be used independently or in conjunction with the client; (2) GeoCommons: both a repository for spatial data as well as an analysis and visualization tool; (3) Google Earth: emphasis on its features that are most applicable in an academic setting. See our schedule for another session on Google Fusion Tables.

Google Fusion Tables

Fall 2011 - Spring 2014
Slides

Introduction to the features of Google Fusion Tables, which include merging datasets, filtering and aggregating data, and visualizing data by creating online maps and graphs. For certain tasks, it can serve as an alternative to using statistical software such as Stata or GIS software such as ArcGIS.