
A self-guided resource and short video for a general overview of data management practices
Data management involves developing strategies and processes to ensure that research data are well organized, formatted, described, and documented during a project’s lifecycle to support the potential sharing and archiving of the resulting data. General good practices and resources are highlighted in this guide and short video.

Learn about services available for publishing your data at Duke.
This workshop provides an overview of Duke's Research Data Repository. It discusses the general functionality of the platform, tips for submitting data, and how repositories can help researchers comply with funder and journal policies and meet growing standards around data stewardship and sharing, such as the FAIR Guiding Principles. New features are also demonstrated, including a new integration with the Globus platform to support transferring large-scale data.

Part of a series of data management 101 workshops developed for scientists, social scientists, and humanists respectively.
Humanists work with various media, content and materials (sources) as part of their research. These sources can be considered data. This workshop introduces data management practices for humanities researchers to consider and apply throughout the research lifecycle. Good data management practices pertaining to planning, organization, documentation, storage and backup, sharing, citation, and preservation will be presented through a humanities lens with discipline-based, concrete examples.

Learn about OSF features and tools that support research project management and reproducibility.
The Open Science Framework (OSF) is a free, open source project management tool developed and maintained by the Center for Open Science. The OSF can help scholars manage their workflow, organize their materials, and share all or part of a project with the broader research community. This workshop will demonstrate some of the key functionalities of the tool including how to structure your materials, manage permissions, version content, integrate with third-party tools (such as Box, GitHub, or Mendeley), share materials, register projects, and track usage. This workshop was presented in the Spring of 2018.

A workshop that addresses some of the challenges of preparing human participants data for ethical data publication.

Learn how to prepare your data for formal publishing within a data repository
This workshop, taught in Fall 2023, explores strategies and best practices for formally publishing your data in a data repository. Topics covered will include new modes of publishing in academia, the use of data and metadata standards to support interoperability and harmonization, an overview of repository options and key features, examples of disciplinary repositories, and data publishing methods to increase the impact of research projects and support the FAIR Guiding Principles (i.e., Findable, Accessible, Interoperable, and Reusable).

Learn about practices and principles to ethically and equitably share data.
This workshop explores the many ethical issues that can arise with data management and sharing, and strategies to address those issues so that goals set by publishers and funders around reproducibility and reuse can be met. How are researchers expected to comply with data sharing policies and practices when they do not actually own the data, or when they must ensure disclosure protection for human participants? Likewise, how can researchers ethically collect, handle, and share data from certain communities, such as Indigenous People? Topics covered will include proper consent procedures, de-identification, the impact of privacy laws on data sharing, and the application of diversity and equity principles to open science and data sharing.

A workshop to teach faculty and staff about new NIH data management policies and connect them with resources and tools to help
This workshop was a collaboration between the Duke Office of Scientific Integrity and the Duke University Libraries. Many federal and private funders require data management plans as part of a grant application, including NIH, which recently released a new Data Management and Sharing Policy that takes effect in 2023 and will apply to all grants. This workshop covers the components of a data management plan, what makes a strong plan and how to adhere to it, and where to find guidance, tools, resources, and assistance for building funder-based plans. We also discuss how to make data management plans actionable and meaningful living documents that support research integrity, reproducibility, reuse, and verification of results.

Overview of general data management good practices and a few specific tools that can be used
This workshop first introduces data management practices for researchers to consider and apply throughout the research lifecycle. Good data management practices pertaining to planning, organization, documentation, storage and backup, sharing, citation, and preservation will be presented using examples that span disciplines. The second hour of this workshop will offer a mini “tour” of research data management tools including GitHub, LabArchives, and OSF and provide a framework for considering how to assess data management tools for future adoption.

General practices and an example workflow for developing more reproducible practices
The importance of reproducibility, replication, and transparency in the research endeavor is increasingly discussed in academia. This full-length workshop and associated online modules introduce the concept of “reproducibility” and foundational strategies that can increase the reproducibility of your work, particularly related to organization, documentation, literate coding techniques, version control, and archiving data and code for future access and use. We will also present the TIER protocol as a tool for graduate students and others who are first approaching reproducibility. In the second half of the workshop we will present a potential end-to-end reproducible workflow using git, RStudio, Binder, and Zenodo to demonstrate some of the concepts in practice.

Pandas for tabular data, JupyterLab, Altair for visualization, & why Humanists might want to learn Python
The Python programming language is a great option for exploring, analyzing, and visualizing tabular data, such as spreadsheets and CSV files. This series of workshops will take you through some practical examples, from basic to advanced, using the Pandas module to load and transform data for analysis and visualization. There is also a video motivating why Humanities scholars could benefit from learning Python, showing examples of work that would have been very hard to do in other ways.
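For a flavor of what the workshops cover, here is a minimal Pandas sketch for loading and transforming a CSV file; the file name and column names are hypothetical examples, not data from the workshops.

```python
# A minimal sketch of loading and transforming tabular data with Pandas.
# "survey.csv" and its columns are hypothetical examples.
import pandas as pd

# Load a CSV file into a DataFrame
df = pd.read_csv("survey.csv")

# Inspect the first rows and the column types
print(df.head())
print(df.dtypes)

# Filter rows, derive a new column, and summarize by group
adults = df[df["age"] >= 18].copy()
adults["bmi"] = adults["weight_kg"] / adults["height_m"] ** 2
summary = adults.groupby("department")["bmi"].agg(["mean", "count"])
print(summary)
```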

Programming power for non-coders
OpenRefine allows for easy exploration of data: define facets within the data, identify inconsistencies, and quickly clean and transform values. OpenRefine is an intuitive yet powerful tool for normalizing data. Use it before importing the dataset into a presentation or analysis application (e.g., for mapping, charting, or analysis).

Use regex to find patterns in text
Regular Expressions are a powerful method of finding patterns in text. For example: find all words ending in "ing"; all words which begin with a capital letter; all telephone area codes that begin with either the numbers 7 or 8; all email addresses which contain "duke.edu". Many programming languages use regular expressions as a means to support pattern matching.
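As an illustration, here is a short Python sketch of the example patterns listed above, using the standard re module; the sample text is invented for demonstration.

```python
# Illustrating the example patterns above with Python's re module.
# The sample text is invented for demonstration.
import re

text = "Dr. Smith is running tests; call (786) 555-0100 or (812) 555-0199, or email smith@duke.edu."

# Words ending in "ing"
print(re.findall(r"\b\w+ing\b", text))                     # ['running']

# Words beginning with a capital letter
print(re.findall(r"\b[A-Z]\w*\b", text))                   # ['Dr', 'Smith']

# Telephone area codes beginning with 7 or 8
print(re.findall(r"\((7\d{2}|8\d{2})\)", text))            # ['786', '812']

# Email addresses containing "duke.edu"
print(re.findall(r"\b[\w.+-]+@[\w.-]*duke\.edu\b", text))  # ['smith@duke.edu']
```
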
Many research projects involve textual data, and computational advances now provide the means to engage in various types of automated text analysis that can enhance these projects. Understanding what analysis techniques are available and where they can appropriately be applied is an important first step to beginning a text analysis project. This hands-on approach to text analysis will give a quick overview of small- and large-scale text-based projects before addressing strategies for organizing and conducting text analysis projects. Tools for data collection, parsing and eventual analysis will be introduced and demonstrated. The workshop will focus on acquiring and preparing text sources for small-scale projects and text-based visualizations, but many of the techniques will be useful for larger projects as well. For this introduction, the focus will primarily be on using Graphical User Interface (GUI) tools like Microsoft Excel and Google Refine, instead of programming languages and command line approaches.

Gathering data from websites, HTML & JSON Parsing, APIs and gathering Twitter streams
Preexisting clean data sets, such as the General Social Survey (GSS) or Census data, are readily available, cover long periods of time, and have well-documented codebooks. However, some researchers want to gather their own data. Tools and techniques for finding and compiling data from webpages, whole websites, or social media sources have become more accessible, but they also introduce an additional layer of complexity.
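For a sense of the techniques involved, here is a minimal Python sketch of fetching a page, parsing its HTML, and requesting JSON from an API. The URLs are placeholders, and the requests and beautifulsoup4 packages are assumed to be installed.

```python
# A minimal sketch of gathering data from the web: fetch a page,
# parse its HTML, and request JSON from an API.
# The URLs below are placeholders, not real data sources.
import requests
from bs4 import BeautifulSoup

# Fetch a web page and parse its HTML
response = requests.get("https://example.com")
soup = BeautifulSoup(response.text, "html.parser")

# Extract every link's text and URL from the page
for link in soup.find_all("a"):
    print(link.get_text(strip=True), link.get("href"))

# Many APIs return JSON rather than HTML; requests can decode it directly
api_response = requests.get("https://example.com/api/items", params={"page": 1})
data = api_response.json()  # a Python dict or list, depending on the API
```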

Data analysis using Excel, and a Humanities-focused guide on how to turn single spreadsheets into normalized tables
The purpose of this workshop is to demonstrate simple steps in Excel that you can take to transform a single spreadsheet (such as a master copy of your data that you used to facilitate the gathering process) into a series of normalized tables that can be used to populate a relational database model using, for example, MySQL. In order to accomplish this, we first identify entities and their corresponding attributes within the master datasheet, and then create separate tables for each entity that can be connected to one another by foreign keys (columns that reference, by means of an identification code, columns present in other tables). This workshop does not require any coding experience, but it is recommended that users be familiar with Excel basics.
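The workshop itself works entirely in Excel, but the same normalization idea can be sketched programmatically. Below is a pandas analogue of the steps described above; the tables and column names are made up for illustration.

```python
# A pandas analogue of the normalization steps described above
# (the workshop itself uses Excel). The data and column names are made up.
import pandas as pd

# A "master" sheet that mixes two entities: books and their publishers
master = pd.DataFrame({
    "title": ["Moby-Dick", "Beloved", "Dubliners"],
    "year": [1851, 1987, 1914],
    "publisher_name": ["Harper", "Knopf", "Grant Richards"],
    "publisher_city": ["New York", "New York", "London"],
})

# Entity table 1: publishers, with a generated primary key
publishers = (master[["publisher_name", "publisher_city"]]
              .drop_duplicates()
              .reset_index(drop=True))
publishers["publisher_id"] = publishers.index + 1

# Entity table 2: books, referencing publishers through a foreign key
books = (master.merge(publishers, on=["publisher_name", "publisher_city"])
               [["title", "year", "publisher_id"]])

print(publishers)
print(books)
```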

Overview of basic mechanics of using NVivo to organize and analyze qualitative data
This guide provides a brief overview of how to use NVivo to process and analyze qualitative data. Topics covered include ingesting data, creating a coding schema, coding data, querying and visualizing data once processed, and operating in a team setting on a shared project.

General tips for creating better data visualization and visual communication

Things to consider when trying to create or improve your posters

Using Tableau to create easy interactive charts and maps for data exploration and communication
The Intro video shows you how to use Tableau to create easy interactive charts and maps for data exploration and communication. Tableau for Survey Data is an intermediate Tableau workshop valuable for everyone, not just people who deal with survey data: it introduces the Pivot for transforming wide data into tall/tidy form and shows how to form relationships between multiple tables (like a database JOIN). It also happens to visualize Likert scale survey data. The second half covers many common data transformation issues that arise when visualizing data from Qualtrics.

Develop skills for creating and manipulating shapes in PowerPoint so you can make better diagrams
While Adobe Illustrator is my preferred software for producing diagrams, PowerPoint is quite full-featured and a great option for those who don't have access to Illustrator. Plus, any skills you gain drawing diagrams will help you create better presentation slides! In this workshop I'll cover basic shape creation and manipulation in Microsoft PowerPoint, including ways to make your own icons with shape combinations.

Basic Adobe Illustrator skills and tools for creating diagrams and for modifying charts and graphs
In the first workshop, you will learn the basics of using Adobe Illustrator, the professional standard in vector graphics software for creating diagrams and infographics. Many people avoid using it because of its steep learning curve, but you will see that it is quite easy to combine simple shapes to create interesting and clear diagrams, and to give all your work that professional edge. In the second workshop you’ll learn how to use Illustrator for fine-tuning charts and graphs created in other programs like Excel, Matlab and R, to give all your work a consistent look, extra highlights and annotations.

Learn how to make this extremely useful and flexible plot that's not part of Excel's default choices
Horizontal bar charts are one of the Excel default plots, but what if you need the same arrangement, but with symbols instead of bars, and perhaps you need error bars on those symbols (forest plot) or sets of symbols with bars in between (dumbbell plot)? In this short session you'll learn a very general technique for tricking Excel into making these sorts of plots. It's a bit of a pain the first time around, but once you practice a couple times you'll be a pro at bending Excel to your data visualization will!

Using the Python module Altair for data visualization and exploration that can be displayed on the web
While Python is my preferred programming language for scripted data transformations, I have avoided routinely doing data visualization in Python. I could follow examples for the many Python visualization libraries, but in the end they all seemed confusing and made it hard to do the types of exploratory visualization that Tableau made easy. Finally, [Altair](https://altair-viz.github.io/) has emerged as a viable alternative for me, because of the way it "thinks about" data and the visualization process. Altair is a declarative statistical visualization library for Python, built on top of the well-designed and powerful [Vega-Lite](https://vega.github.io/vega-lite/) visualization grammar. (Vega-Lite was built for the web, includes interaction, and is being adopted as a standard by high-profile websites and tools.) It works well for small to medium-sized tabular data (like spreadsheets). In this workshop, I’ll run you through both some introductory and some more complex examples using Altair with Python in Jupyter notebooks, so you can get a feeling for how you might use it in your own work.
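To give a feel for Altair's declarative style, here is a small sketch along the lines of the workshop's introductory examples; the data frame is invented for demonstration.

```python
# A small Altair sketch in the declarative style described above.
# The data frame here is invented for demonstration.
import altair as alt
import pandas as pd

df = pd.DataFrame({
    "year": [2018, 2019, 2020, 2021, 2018, 2019, 2020, 2021],
    "group": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "value": [3, 5, 4, 7, 2, 4, 6, 5],
})

# Map columns to visual channels; Altair/Vega-Lite handles the rest
chart = (
    alt.Chart(df)
    .mark_line(point=True)
    .encode(
        x="year:O",       # ordinal axis
        y="value:Q",      # quantitative axis
        color="group:N",  # one line per group
        tooltip=["group", "year", "value"],
    )
    .interactive()        # enable pan/zoom in the rendered chart
)

chart.save("chart.html")  # or display the chart inline in a Jupyter notebook
```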

Visualizing data in R using the ggplot2 library, both introduction and more advanced videos.
In the first workshop, held in Spring 2021, Angela Zoss focuses on ggplot2, a library for R that creates clear and well-designed visualizations and that plays well with other tidyverse packages. We get up and running quickly with ggplot2, going through a variety of examples to learn how to understand, modify, and create ggplot2 visualizations. Building basic skills with visualization will improve your ability to create quick, exploratory visualizations for data analysis as well as more formal, outward-facing visualizations for presentations or publications. The second video covers more advanced topics like dealing with categorical data.

Using Gephi for layout, styling, and analysis of networks or node-link diagrams.
Networks (or graphs) are a compelling way of studying relationships between people, places, objects, ideas, etc. Generating network data and visualizations, however, can be an involved and unintuitive process requiring specialized tools. This workshop will explore some of the easier ways to produce, load, and visualize network data using Gephi, an open source, multi-platform network analysis and visualization application.

Guide to Esri's platform for creating interactive maps and web applications
ArcGIS Online (AGOL) is a companion to the ArcGIS client that allows members of a group to store and share spatial data online and that can be used independently or in conjunction with the client. We'll discuss aspects of the AGOL organizational account, adding and accessing content, creating map and feature services, creating and sharing web maps and presentations, publishing web applications, and using analysis tools.

Mapping and spatial analysis with the latest desktop GIS application from Esri

Legacy desktop GIS software from Esri

Importing GIS data into Google Earth

Free and open-source desktop GIS software
Looking for an open source option for GIS? QGIS is free and is one alternative to ArcGIS. In this workshop we will demonstrate how to import and analyze data in QGIS and discuss the benefits of using QGIS over other GIS software.

Using the R language to process, analyze, and visualize geospatial data
R has become a popular and reproducible option for supporting spatial and statistical analysis. This hands-on workshop will demonstrate how to plot x/y coordinates; how to generate thematic choropleths with US Census and other federal data; import, view and produce shapefiles; and create leaflet maps for viewing on the web.

Tableau's capabilities to visualize spatial data

Overview of tools for developing interactive online maps

Current and historical financial data, financial markets news, and economic data

Geospatial datasets for local, US, and international areas

Sources for data on international trade between countries, at aggregate and commodity levels

The premier source for detailed commodity-level trade data between countries

Sources to get data from the US population and economics Censuses, both current and historical

Economic data sources from US federal government agencies.

Additional Videos

More videos from CDVS, including several workshop recordings, are available on: