Adobe Illustrator for Diagrams and Visualizations
Do you ever try to draw simple diagrams for proposals, reports or publications? Have you ever struggled to get a graphing program to make your plots look just right? Adobe Illustrator is a vector graphics editing program which can be very useful for faculty, students and staff in these types of situations, but many people avoid it because of the seemingly steep learning curve. In this workshop I will present a few basic principles of good graphic design, and then run through some simple examples of Illustrator's capabilities, showing you how to start using it to modify your graphs and create diagrams to explain your ideas.
Advanced Tableau (Data Structures)
This workshop will focus on the challenges of using different types and structures of data in Tableau. We will learn how to clean and organize various data sources for Tableau, how to join and blend data to combine datasets, and how to design visualizations when datasets have been joined or blended.
Designing Academic Figures and Posters
Figures and other forms of visual representation can have a huge impact on the communication of research to a broader audience. A well-designed figure can summarize research, captivate audience interest, or explain complicated phenomena and processes. Likewise, becoming familiar with good strategies for poster design allows researchers to take full advantage of the opportunity to network with colleagues and promote their own research. This workshop will cover basic considerations for designing effective academic figures and posters, including use of color, layout, fonts/typography, and software choices.
Easy Interactive Charts and Maps with Tableau
Tableau Public (available for both Windows and Mac) is free software that allows individuals to quickly create interactive visualizations of their research and business analytics data. This workshop will focus on using Tableau Public to create data visualizations, starting with an overview of the structure of the program and the terminology used. The workshop will include a sample data visualization and mapping project, focusing especially on some of the new features in Tableau Public 9. We will also discuss publishing to the Tableau Public web server and related services and tools, like the full Tableau Desktop application (free for full-time students).
Making Data Visual
The process of making data visual can be nuanced and iterative. Sometimes we start a project with a very specific idea of the kind of visualization we want, but other times we may not be sure what will work best. This workshop will address three important aspects of making data visual: identifying the goal of your visualization, identifying the audience of your visualization, and understanding the pros and cons of different types of visualizations. Rather than focusing on any specific software application, this workshop will help attendees develop instincts for what kinds of visualizations match well with particular datasets, goals, and audiences. While some mention will be made of non-traditional visualizations, like custom diagrams, the emphasis will be on standard visualization types.
Top 10 Dos and Don'ts for Charts and Graphs
Spring 2013 - Fall 2013
Simple charts and graphs can be incredibly effective at summarizing data. They are common and thus easier for a wide audience to understand. They are also easy to produce in the tools many people regularly use for other data analysis or project management work. With a few simple tips and tricks, you can avoid common missteps and make sure your charts are clear and easy to understand.
Data Visualization on the Web
Until recently, most data visualizations were created by installing statistics or visualization software onto our computers. In recent years, however, a number of web-based data visualization tools have been developed. These tools offer many advantages over downloaded software applications: visualizations can be created on PCs or Macs, groups can often collaborate in their creation, and the results can often be shared more easily. This workshop will give a quick overview of several web-based visualization tools, including Google Spreadsheets and Raw. Participants are encouraged to bring laptops to follow along with the demonstrations.
Data Visualization on the Web (Advanced)
Using Gephi for Network Analysis and Visualization
Networks (or graphs) are a compelling way of studying relationships between people, places, objects, ideas, etc. Generating network data and visualizations, however, can be an involved process requiring specialized tools. This workshop will explore some of the easier ways to produce, load, and visualize network data using Gephi, an open source, multi-platform network analysis and visualization application. Time will be available at the end of the workshop to discuss specific projects and test out different techniques with Gephi.
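As a rough illustration of what "producing network data" can involve, the sketch below turns co-occurrence lists into a weighted edge list in CSV form. The names and groupings are invented, and the Source/Target/Weight column headings are an assumption about what Gephi's spreadsheet importer expects, so check the import dialog in your version.

```python
import csv
import io
from collections import Counter
from itertools import combinations

# Hypothetical data: each inner list names people who appear together in one document.
groups = [
    ["Ada", "Ben", "Cy"],
    ["Ada", "Ben"],
    ["Ben", "Dee"],
]

# Count how often each unordered pair co-occurs; the count becomes the edge weight.
edge_weights = Counter()
for group in groups:
    for a, b in combinations(sorted(set(group)), 2):
        edge_weights[(a, b)] += 1

# Write an edge list with Source, Target, and Weight columns (assumed headings).
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Source", "Target", "Weight"])
for (source, target), weight in sorted(edge_weights.items()):
    writer.writerow([source, target, weight])

edge_csv = buffer.getvalue()
print(edge_csv)
```

Writing to an in-memory buffer keeps the sketch self-contained; in practice the same rows would be written to a .csv file on disk.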
AKA: OnYourLaptop - Part 3/3 - Web Scraping - gathering data from websites, HTML & JSON Parsing, APIs and gathering Twitter streams
Preexisting clean data sets, such as the General Social Survey (GSS) or Census data, are readily available, cover long periods of time, and have well-documented codebooks. However, some people want to gather their own data. Tools and techniques for finding and compiling data from webpages, whole websites or social media sources have recently become more accessible, but these techniques add a different layer of complexity.
Explore two methods of gathering real-time Twitter stream data, with hands-on exercises in applying for Twitter API keys and configuring a Twitter stream data-gathering tool. Investigate and discuss historical Twitter data gathering. Discuss considerations for analysis.
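For a sense of what HTML and JSON parsing look like in practice, here is a minimal sketch using only Python's standard library. The page fragment, the JSON payload, and the field names (`results`, `id`, `text`) are all hypothetical; they do not reflect any particular website or the Twitter API, and a real scraper would first download the content.

```python
import json
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical page fragment, invented for this illustration.
html_doc = (
    '<ul><li><a href="/data/2019.csv">2019</a></li>'
    '<li><a href="/data/2020.csv">2020</a></li></ul>'
)

collector = LinkCollector()
collector.feed(html_doc)

# Hypothetical JSON payload shaped like a generic API response.
json_doc = '{"results": [{"id": 1, "text": "first post"}, {"id": 2, "text": "second post"}]}'
records = json.loads(json_doc)["results"]
texts = [record["text"] for record in records]

print(collector.links)
print(texts)
```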
Structuring Humanities Data
Have you ever wondered what medieval scribes, ancient artifacts, historical paintings and Victorian fiction have to do with data? Have you ever thought about how social media data can be used to document and analyze groups, events and moments in history? Digital tools can open up new and exciting possibilities for Humanistic inquiry, as long as you see people, places, dates and relationships as data and know how to "speak" in a way that computers understand. Through a series of case studies, the Structuring Humanities Data workshop will help Humanists see the data in their subjects and provide guidelines for how to structure and gather data in simple spreadsheets, including ways to deal with tricky but common situations like uncertainty in dates. The workshop will also show examples where computers were used to help gather data automatically, and look under the hood at some data driving visualizations on the web.
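One possible way to handle uncertain dates in a spreadsheet is to store an earliest and a latest year in separate columns. The sketch below illustrates that idea under stated assumptions: the function name `year_range`, the accepted label formats, and the ten-year margin for "circa" dates are all hypothetical choices, not conventions taken from the workshop.

```python
import re


def year_range(label, circa_margin=10):
    """Turn a simple date label into (earliest_year, latest_year)."""
    label = label.strip()

    # An explicit span such as "1450-1460".
    span = re.fullmatch(r"(\d{3,4})\s*-\s*(\d{3,4})", label)
    if span:
        return int(span.group(1)), int(span.group(2))

    # An approximate year such as "c. 1450", "ca. 1450", or "circa 1450".
    circa = re.fullmatch(r"c(?:irca|a)?\.?\s*(\d{3,4})", label, flags=re.IGNORECASE)
    if circa:
        year = int(circa.group(1))
        return year - circa_margin, year + circa_margin

    # A single known year such as "1455".
    if re.fullmatch(r"\d{3,4}", label):
        return int(label), int(label)

    raise ValueError(f"Unrecognized date label: {label!r}")


# Each row pairs the original label with its earliest and latest year columns.
labels = ["1455", "c. 1450", "1450-1460"]
rows = [(label, *year_range(label)) for label in labels]
for row in rows:
    print(row)
```

Keeping the original label alongside the two derived columns preserves what the source actually said while still letting software sort and filter by year.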
Data Management 101 for Scientists
Scientists work with lots of data both big and small, and in many formats and systems. This workshop will introduce data management practices for scientists to consider and apply throughout the research lifecycle. Good data management practices pertaining to planning, organization, documentation, storage and backup, sharing, citation, and preservation will be presented through a sciences lens using discipline-based, concrete examples. While good general data management practices are relevant across disciplines, participants working specifically within the sciences are the intended audience for this workshop.
Data Management 101 for Humanists
Humanists work with various media, content and materials (sources) as part of their research. These sources can be considered data. This workshop will introduce data management practices for humanities researchers to consider and apply throughout the research lifecycle. Good data management practices pertaining to planning, organization, documentation, storage and backup, sharing, citation, and preservation will be presented through a humanities lens with discipline-based, concrete examples. While general good data management practices are relevant across disciplines, participants working specifically within the humanities are the intended audience for this workshop.
Data Management 201: How and where to publish your data
Data management practices help researchers take care of their data throughout the entire research process from the planning phase to the end of a project when data might be shared or “published” within a repository. Building upon the foundational concepts covered in the Data Management 101 courses offered this year, this workshop will provide hands-on experience where participants will learn strategies for how to prepare data for publishing by “curating” an example dataset and identifying common data issues. Participants will also learn about the overall role of repositories within the data sharing landscape and learn strategies for locating and assessing repositories.
Research Reproducibility: Tips and Tools
In response to a growing focus on the importance of reproducibility, replication, and transparency in the research endeavor, scholars are adapting their practices and learning new skills and tools. This workshop will introduce some foundational strategies that can increase the reproducibility of your work. You will also learn about specific tools and protocols that you might use within your research workflows including the TIER protocol, git and GitHub, and online containerization tools such as Binder and Code Ocean.
Data Management 101 with Disciplinary Discussions
Researchers work with lots of data both big and small, in many different formats and across various digital systems. The first hour of this workshop will introduce good digital data management practices and how they can be practically applied throughout the research lifecycle. Good data management practices cover pre-project planning, active workflow organization, documentation, storage and backup strategies, and optimizing your final “data package” to fulfill research quality and reproducibility requirements and facilitate new research. The second hour of this workshop will allow participants to break up into broad disciplinary groups (sciences and engineering, social sciences, and humanities) for a facilitated discussion on how to specifically apply good data management in your field. Be prepared to share your own tips and tricks and challenges for this portion of the workshop.
Data Management 101 with Tool Demonstrations
Researchers work with lots of data both big and small, in many different formats and across various digital systems. The first hour of this workshop will introduce good digital data management practices and how they can be practically applied throughout the research lifecycle. Good data management practices cover pre-project planning, active workflow organization, documentation, storage and backup strategies, and optimizing your final “data package” to fulfill research quality and reproducibility requirements and facilitate new research. The second hour of this workshop will offer a mini “tour” of research data management tools including GitHub, LabArchives, and Tropy. Participants will be able to attend two of these demonstrations in the time allotted. We will end with a share-out about what you’ve learned and how you might apply the tool to your own work.
Data Management 201: Preparing Data for Publishing
Data management practices help researchers take care of their data throughout the entire research process from the planning phase to the end of a project when data might be shared or “published” within a repository. Building upon the foundational concepts covered in the Data Management 101 courses offered this year, this workshop will provide hands-on experience where participants will learn strategies for “curating” a dataset for formal sharing. Participants will identify common data issues, determine recommendations to optimize the dataset, generate metadata and documentation, and consider how these practices might be applied to their own research.
Open Science: General Principles and Practices
Open Science is a growing movement that advocates for research to be transparent and openly available to all others for the purposes of engagement, validation, and extension. This workshop will present an overview of the Open Science movement and the general principles of the movement including the importance of access to data, publications, and the underlying research process as well as new initiatives within scholarly communications that support “openness” of the research endeavor such as preprints, registered reports, persistent identifiers, and community engagement platforms.
Introduction to Duke's Research Data Repository
Spring 2019 - Fall 2019
This workshop will provide an overview of Duke's Research Data Repository. The general functionalities of the platform as well as tips for submitting data will be discussed. Participants will also have an opportunity to discuss how the RDR or other repositories can help them comply with funder and journal policies as well as meet growing standards around data stewardship and sharing, such as the FAIR Guiding Principles.
Managing Sensitive Data
In the course of your research you may collect, interact with or analyze data that are classified as “Sensitive” or “Restricted” according to Duke’s data classification standard. In this workshop we will examine common sensitive data types, how Duke’s IRB and Information Technology Security Office (ITSO) expect you to protect those data throughout your project’s lifecycle, and the resources available to you for sensitive data storage and analysis, data de-identification, and data archiving and sharing.
Finding a Home for Your Data: An Introduction to Archives and Repositories
Publishing and preserving research data within a trusted repository helps researchers comply with funder and journal data sharing policies, supports the discovery of and access to data, and can result in more visibility and higher impact for research projects. This workshop will provide an overview of the different types of repositories and the overall role of repositories within the data sharing landscape. Key repositories in various disciplines will be explored, and attendees will learn about resources for locating and assessing repositories. Attendees will also have an opportunity to locate appropriate repositories for their own research.
Building Blocks for Reproducibility: Concepts and Practices
In response to a growing focus on the importance of reproducibility, replication, and transparency in the research endeavor, scholars are adapting their practices and learning new skills and tools. DVS is offering a workshop series that will introduce the concepts, practices and tools that will help increase the reproducibility of your work. This workshop will introduce the concept of reproducibility, its impact on science, and basic best practices that you can apply to make your work more transparent and reproducible. The workshop will be taught by a guest instructor, April Clyburne-Sherin, from Code Ocean. The general functionalities of the computational reproducibility platform Code Ocean will also be presented during the workshop.
Building Blocks for Reproducibility: Open Science Framework
In response to a growing focus on the importance of reproducibility, replication, and transparency in the research endeavor, scholars are adapting their practices and learning new skills and tools. DVS is offering a workshop series that will introduce the concepts, practices and tools that will help increase the reproducibility of your work. This workshop will introduce the Open Science Framework (OSF), which is a free, open source project management tool developed and maintained by the Center for Open Science. The OSF can help scholars manage their workflow, organize their materials, and share all or part of a project with the broader research community. This workshop will demonstrate some of the key functionalities of the tool including how to structure your materials, manage permissions, version content, integrate with third-party tools (such as Box, GitHub, or Mendeley), share materials, register projects, and track usage.
Building Blocks for Reproducibility: TIER Protocol
This workshop will introduce the TIER Protocol, which outlines a specification and process for maintaining well-organized documentation and producing more reproducible research projects. An example of using the TIER Protocol in conjunction with the Open Science Framework will also be presented.
Reproducibility: Data Management, Git, and RStudio
This workshop will introduce some general data management strategies that can increase the reproducibility of your work. You will also learn through hands-on exercises how to harness two specific tools, git and RStudio, to support the execution of more reproducible research projects. Git is a powerful version control system and RStudio is an open-source development environment for the R statistical programming language. The hands-on part of this workshop focuses on the practical aspects of configuring RStudio with Git. If you don't intend to use the R programming language, you may want to take a different workshop.
Data Management Fundamentals
This workshop introduces data management practices to consider throughout the research lifecycle: planning, organization, documentation, storage and backup, sharing, citation, and preservation. The workshop will offer an overview of general recommendations that are relevant across disciplines and will point attendees to additional resources at Duke and beyond.
Introduction to the Open Science Framework
The Open Science Framework (OSF) is a free, open source project management tool developed and maintained by the Center for Open Science. The OSF can help scholars manage their workflow, organize their materials, and share all or part of a project with the broader research community. This workshop will demonstrate some of the key functionalities of the tool including how to structure your materials, manage permissions, version content, integrate with third-party tools (such as Box, GitHub, or Mendeley), share materials, register projects, and track usage.
OSF + TIER Protocol: Designing a Reproducible Workflow
The Open Science Framework is a free online system for managing and sharing research materials throughout the research cycle, and the TIER Protocol is a workflow for maintaining well-organized documentation of your data and analyses. This workshop will introduce you to key features of the OSF and demonstrate how it can be used in conjunction with TIER to facilitate reproducible research practices, collaboration, and best practices in data management.
Developing a Good Informed Consent Process
In this forum you will learn how to develop fully comprehensible, participant-friendly consent protocols. With numerous research funders and journal publishers requiring data management and sharing plans, consent forms and protocols need to address how participant privacy will be protected throughout a project’s lifecycle and how and what data will be shared. The workshop includes interactive exercises where you will assess consent form language to ensure that it is appropriate for the target study population and addresses data sharing in a responsible way.
Publishing Data with Research and Other Strategies for Increasing Your Impact
Scholars can and do communicate their research in various ways. While peer-reviewed journal publications remain the primary outlet for sharing the key results of research projects, there are growing norms (and expectations) that the underlying data from projects should also be published. In this workshop, we will look at 1) strategies to effectively publish data; 2) journal policies related to data sharing; 3) new types of publications such as data articles and registered reports; and 4) strategies for increasing and measuring the impact of your research. There will also be a hands-on portion of the workshop where participants will create their own ORCID identifier.
Consent, Data Sharing and Data Reuse
Spring 2017 - Fall 2017
Research involving humans requires multiple approaches to protect participants’ anonymity. If you are planning to (or are required to) share your human subjects’ data outside of your original project team, or plan to re-use data you collected previously for a new research project, you will need to design a consent form/consent protocol that properly addresses these situations. This workshop will present various research scenarios with examples of how consent protocols should be developed. If available, a staff member of the Duke Office of Research Support will be on hand to answer questions.
Writing a Data Management Plan
This workshop will be a deep dive into the process of writing a data management plan (DMP) using the DMPTool. To make the most of this workshop, attendees are encouraged to bring a “live” DMP that they are ready to begin or are currently in the process of writing. Attendees without active DMPs may write a “test” DMP based on who they would typically apply to for research funding. The “test” DMP can then serve as a useful reference when it is time to write a live plan. The instructors (both Research Data Management Consultants) will be on-hand to provide individual help during the writing portion of the workshop as needed and, following the workshop, are available to review plans through the DMPTool at any point up to final submission.
Research Collaboration Strategies and Tools
Scholars increasingly work on collaborative research projects. Collaborative projects often bring together partners across disciplines, institutions, and sectors. These projects present opportunities for innovation but also raise challenges for the development of efficient and effective workflows and the management of data. This workshop will examine considerations for collaborative research and present some strategies for developing and documenting workflows as well as methods for storing and sharing data. We will also look at some tools (e.g., Box, OSF, and PRDN) available at Duke that can be used to support these types of projects.
Data Management and Grants: Complying with Mandates
Today, researchers are increasingly faced with requirements from both federal and private funders to share, archive, and plan for the management of their data. This trend began in 2003, when NIH released its data sharing policy, and continued in 2011, when the NSF began requiring that all grant proposals include a two-page Data Management Plan (DMP). Then in 2013 an Office of Science and Technology Policy Memo directed all federal agencies with over $100 million in annual funding to develop plans to make research products, including data, openly accessible. This workshop will provide an overview of funding agencies’ DMP requirements, the primary components of data management plans, and suggestions for integrating data management updates into grant reporting. Attendees will also learn about tools and resources that can help them write a DMP that complies with funder mandates. A portion of this workshop will include a hands-on data management plan exercise.
Data Management Tools: The Dataverse Project
The Dataverse is an open source repository software platform for sharing, preserving, citing, discovering, exploring, and analyzing research data. This workshop will provide an overview of the Dataverse Project and demonstrate how the Dataverse can be used to discover research data and manage and share data in compliance with best practices.
Data Management Tools: Colectica for Excel
Are you an avid Excel user? Would you like to know how to add helpful documentation into your Excel files and generate codebooks automatically? If so, I’d like to introduce you to Colectica. While there is a paid version, there is a free version that is more than adequate if you plan to do all of your analysis in Excel. Visit http://www.colectica.com/software/colecticaforexcel and download before the workshop (my apologies to Mac Users in advance - this software only works with Windows). You are encouraged to bring a laptop and your own Excel file(s) to take some time to get to know the tool. Feel free to bring your lunch. In the spirit of Love Your Data Week and Valentine's Day, chocolate will be provided.
Data Management and Reproducibility: Enabling Open and Transparent Research through Data Sharing
Making data available within repositories is an essential aspect of supporting open and transparent research. Today as science is tackling the so-called “reproducibility crisis”, researchers are increasingly faced with journal requirements to share their data for the purposes of verification. This workshop will explore the concept of reproducibility, the growth in journal data sharing policies, and present strategies to help researchers share data that meet standards for reproducibility and reuse.
Data Management Plans: Grants, Strategies and Considerations
Fall 2012 - Spring 2014
In the last few years granting agencies across the disciplines have increasingly required data management plans as part of a grant proposal that detail strategies to manage, share and preserve research data as part of a funded grant project. NSF, the NIH, the National Endowment for the Humanities and other organizations have similar requirements, and Duke policy requires that research records (including digital data) be kept for at least five years. How should researchers respond? In this presentation, we’ll give an overview of research data management challenges and opportunities and describe some approaches for meeting them. We’ll ask the audience to share how they do data management now, and we’ll talk about planning underway for new services to help with data management at Duke.
Advanced Excel for Data Projects
Spreadsheets are a standard tool for many data projects, whether because of the ability to easily edit data, the ubiquity of spreadsheet programs, or the added features like charts and filters. This workshop extends the introduction from our Basic Data Cleaning and Analysis for Data Tables workshop by focusing on more advanced features in Excel. Examples include filtering, pivot tables, and data visualizations.
Basic Data Cleaning and Analysis for Data Tables
Tables of data, like those you see in spreadsheets or relational databases, are the foundation of most data-driven research today. There are many pitfalls of working with these tables, though, that most people end up having to learn the hard way. In this workshop, we'll take a dataset that has a variety of different properties and learn to work through many common steps of data-driven research to clean and begin analyzing the data. We'll be using Excel to make sure the methods we suggest can be reproduced easily "at home," but many of these techniques are important for other data analysis tools as well. No data experience necessary.
A gentle introduction to the basics of the R statistical programming language using the RStudio development environment. Learn about managing your R projects, data types, variable assignments, data cleaning and visualization. No previous experience required.
In response to a growing focus on the importance of reproducibility, replication, and transparency in the research endeavor, scholars are adapting their practices and learning new skills and tools. This workshop will introduce some general data management strategies that can increase the reproducibility of your work. You will also learn through hands-on exercises how to harness two specific tools, git and RStudio, to support the execution of more reproducible research projects. Git is a powerful version control system and RStudio is an open-source development environment for the R statistical programming language.
Introduction to Stata
Stata for Research focuses on the core concepts of using Stata. This workshop provides a hands-on overview of how to load, manage, and analyze data using Stata. The workshop will also include a brief introduction to Stata graphics. No previous experience with Stata is required.
AKA: OnYourLaptop - Part 1/3 - OpenRefine - Data Cleaning, Mining, Transformations, and Text Normalization
OpenRefine (formerly Google Refine) is a tool for working with semi-structured datasets. It allows you to explore data, easily find facet patterns within data, detect data inconsistencies, and quickly clean up and transform data. OpenRefine is an often intuitive but powerful tool for normalizing data before importing the dataset into a presentation application (e.g., for mapping, charting, or analyzing). In this hands-on class, we'll explore how OpenRefine can help with common data cleaning challenges.
AKA: OnYourLaptop - Part 2/3 - Regular Expressions (RegEx)
Regular Expressions are a powerful method of finding patterns in text. For example: find all words ending in "ing"; all words which begin with a capital letter; all telephone area codes that begin with either the numbers 7 or 8; all email addresses which contain "duke.edu". Many programming languages use regular expressions as a means to support pattern matching.
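The four example searches above can be sketched with Python's built-in `re` module. The sample text is invented, and each pattern is just one reasonable reading of the description (for instance, the area-code pattern assumes numbers written as NNN-NNN-NNNN), so treat these as starting points, not definitive answers.

```python
import re

# Hypothetical sample text, invented for this illustration.
text = (
    "Call 704-555-0101 or 919-555-0199 before Reading Week. "
    "Email jane.doe@duke.edu or sam@example.org about mapping and Charting."
)

# Words ending in "ing".
ing_words = re.findall(r"\b\w+ing\b", text)

# Words that begin with a capital letter.
capitalized = re.findall(r"\b[A-Z]\w*", text)

# Three-digit area codes that start with 7 or 8 (assumes the NNN-NNN-NNNN layout).
area_codes = re.findall(r"\b([78]\d{2})-\d{3}-\d{4}\b", text)

# Email addresses whose domain is, or ends in, "duke.edu".
duke_emails = re.findall(r"\b[\w.+-]+@(?:[\w-]+\.)*duke\.edu\b", text)

print(ing_words)
print(capitalized)
print(area_codes)
print(duke_emails)
```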
Useful R Packages: Extensions for Data Analysis, Management, and Visualization
The basic version of the R programming language provides a powerful tool for data analysis, but much of the value in R lies in the wide range of libraries that extend its basic functionality. This workshop shares a number of popular extensions to R that enable rich graphics (ggplot, google graphics), file conversions, and additional statistical tests. A basic familiarity with R would be useful for this workshop.
Introduction to Text Analysis
Fall 2012 - Spring 2014
Many research projects involve textual data, and computational advances now provide the means to engage in various types of automated text analysis that can enhance these projects. Understanding what analysis techniques are available and where they can appropriately be applied is an important first step to beginning a text analysis project.
This hands-on approach to text analysis will give a quick overview of small- and large-scale text-based projects before addressing strategies for organizing and conducting text analysis projects. Tools for data collection, parsing and eventual analysis will be introduced and demonstrated. The workshop will focus on acquiring and preparing text sources for small-scale projects and text-based visualizations, but many of the techniques will be useful for larger projects as well. For this introduction, the focus will primarily be on using Graphical User Interface (GUI) tools like Microsoft Excel and Google Refine, instead of programming languages and command line approaches.
Mapping and GIS
Introduction to ArcGIS
Do you want to find out how geographic information system (GIS) software can aid your research? This class will provide an overview of how ArcGIS software can help you analyze or visualize digital data that has a locational component, as well as discuss starting points for obtaining data. Examples will focus on social science data, but attendees are encouraged to ask questions regarding their own needs and will be welcome to make one-on-one appointments later for more focused instruction.
This class will show some ways that ArcGIS can be used for the analysis and visualization of historical spatial data. Topics discussed will be: sources for GIS layers reflecting the past, georeferencing a scanned historic map, creating new layers from scratch based on known locations of features, editing existing GIS layers to reflect former features and vectorizing a scanned map to create editable features.
Introduction to QGIS
Looking for an open source option for GIS? QGIS is free and it is one alternative to using ArcGIS. In this workshop we will demonstrate how to import and analyze data in QGIS and discuss the benefits of using QGIS over other GIS software.
ArcGIS Online (AGOL) is a companion to the ArcGIS client that allows members of a group to store and share spatial data online and that can be used independently or in conjunction with the client. We'll discuss aspects of the AGOL organizational account, adding and accessing content, creating map and feature services, creating and sharing web maps and presentations, publishing web applications, and using analysis tools.
R has become a popular and reproducible option for supporting spatial and statistical analysis. This hands-on workshop will demonstrate how to plot x/y coordinates; how to generate thematic choropleths with US Census and other federal data; import, view and produce shapefiles; and create leaflet maps for viewing on the web.
Web GIS Applications
Compare and contrast several products intended for geospatial visualization (e.g., a map to embed in a blog or PowerPoint, or for a poster session) and in some cases for GIS data analysis. (1) ArcGIS Online: Companion to the ArcGIS client that allows members of a group to store and share spatial data online, and that can be used independently or in conjunction with the client; (2) GeoCommons: both a repository for spatial data as well as an analysis and visualization tool; (3) Google Earth: emphasis on its features that are most applicable in an academic setting. See our schedule for another session on Google Fusion Tables.
Google Fusion Tables
Fall 2011 - Spring 2014
Introduction to the features of Google Fusion Tables, which include merging datasets, filtering and aggregating data, and visualizing data by creating online maps and graphs. For certain tasks, it can serve as an alternative to using statistical software such as Stata or GIS software such as ArcGIS.