Data Cleaning Using OpenRefine

Introduction

Data cleaning is a critical step in any assignment or project that involves collecting data or analyzing existing datasets. OpenRefine is a free desktop application for working with messy spreadsheet data.

Use Cases

  • Discover trends in your data
  • Identify and clean up inconsistencies in your data
  • Parse and combine your data
  • Enrich and reconcile your data with external datasets
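
To make "identify and clean up inconsistencies" concrete, here is a minimal Python sketch of key-collision clustering, loosely modeled on the "fingerprint" idea behind OpenRefine's clustering feature. It is a simplified illustration, not OpenRefine's own implementation (for example, it skips accent normalization), and the function names and sample values are hypothetical.

```python
import string
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Normalize a cell value into a key so near-duplicate spellings collide."""
    # Trim whitespace, lowercase, and drop ASCII punctuation.
    table = str.maketrans("", "", string.punctuation)
    cleaned = value.strip().lower().translate(table)
    # Split on whitespace, de-duplicate and sort the tokens, then rejoin.
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values by fingerprint; keep only groups with more than one variant."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


# Hypothetical column of inconsistently entered institution names.
names = ["Haverford College", "haverford college ", "College, Haverford", "Bryn Mawr College"]
clusters = cluster(names)

for key, variants in clusters.items():
    print(key, "<-", variants)
```

Each cluster lists spellings that probably refer to the same thing. A person still decides whether to merge them, which is the review step OpenRefine presents in its interface.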

Tutorials

For more information about OpenRefine and its use in the classroom, contact the Digital Scholarship team in the libraries.

Close Reading Using Annotation Tools

Introduction

Collaborative annotation tools can engage a group of students in discussion around a common resource. Annotations and threaded conversations can shape class discussion agendas, or serve as discussions in their own right. With browser-based tools, group annotation can engage students around a single resource or a collection. Students can also produce public-facing critical editions of texts, visual sources, or other media.

Pedagogical Possibilities

  • Engaging group discussion around a common text through threaded conversations
  • Connecting a common text to other web-based resources via hypertext and linking
  • Annotating visual resources in synchronous or asynchronous settings
  • Incorporating other voices into classroom discussion (community partners, other scholars, remote students, students in other classes, etc.)

Tools

Hypothes.is

Hypothes.is is an excellent browser-based tool for group annotation. A Chrome browser extension makes annotating any web page quick and easy. You can also add a bookmarklet for any web browser. Students can annotate any web page, or any PDF or EPUB file that is available on the web. If you have a PDF scan of an article, you can share that via a Box or Google Drive link (or anywhere that the file can be viewed in a web browser) so your students can read and add annotations. Hypothes.is is ideal for textual sources available on the web.

Google Jamboard

Jamboard is an app within Haverford’s institutional suite of Google applications (“G Suite”) that serves as a digital whiteboard. Because multiple participants can open and work in the same “Jam,” it is possible to load an image or group of images to the board and mark it up with a variety of colors and drawing tools, or even import other related images via Google Image search. A new Jam can be created independently of the physical Jamboard, so the app can be used in both hybrid and online-only classroom formats. This is a particularly useful tool for marking up visual sources.

Omeka+Neatline

Omeka is a content management system primarily used for building digital collections and exhibits. Neatline, a plugin for Omeka that is primarily used for mapping, also creates opportunities for annotating images, since a static image can also be used as a map background. These annotated sources can be published to the web for a public audience. Within a Neatline exhibit, students can embed images and media within annotations, use hypertext to link annotations to each other or to other sources, and control the visibility of annotations based on user input. This tool is ideal for sharing individually and collaboratively produced annotated images with the public.

Examples

The Hypothes.is website has several illustrative examples of the tool being used in a classroom setting.

This Jam was produced during a text markup exercise to introduce text encoding in TEI.

This Neatline exhibit is a digital critical edition of Pablo Picasso’s “Guernica,” collaboratively annotated by students in a writing seminar. This annotated version of a 1968 Ebony magazine article titled “What To Do If Arrested” is another excellent example of Neatline’s capabilities.

Potential Assignments/Classroom Activities

Some potential activities involving these tools are listed with recommended duration or lead time:

  • Weekly reading assignments with asynchronous annotations and comment threads can establish topic discussions for class meetings. Students can also link out to additional resources and embed external media in their annotations to create a digital critical edition of an existing text online. (daily or weekly, one library instruction session recommended)
  • Group annotation of a photograph or other visual media in the Jamboard app with live discussion during or after (daily or weekly, one library instruction session recommended)
  • Individually or collaboratively annotated images in Neatline published to the web for public view (at least 4 weeks with at least one library instruction session introducing the tool)

Text Analysis

Introduction

Text analysis is the process of extracting information from a body, or corpus, of texts and organizing it in a meaningful way so that it can serve as the basis for scholarly interpretation.

Pedagogical Possibilities

  • Engaging in close reading by encoding the structural and semantic features of texts
  • Engaging in distant reading by applying computer-assisted analysis to texts
  • Creating digital editions and data visualizations of texts

Texts, Tools, and Examples

For texts, tools, and examples, see the Resources for Text Analysis page of the Resources for Digital Scholarship Research Guide.

Potential Assignments

  • Text encoding with the Text Encoding Initiative Standard (group and/or individual) – select text(s) for encoding, develop and document decisions for how the text(s) should be encoded, encode texts according to guidelines, determine how online versions of the text(s) should be displayed
  • Text mining with Voyant – select text(s) for analysis, clean up the text and upload it to Voyant, determine stopwords (i.e., words that should be excluded from results), experiment with the various tools, write up the experience
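
The role of stopwords in the text mining assignment can be illustrated with a short Python sketch. This is not how Voyant is implemented; it only shows the general idea of counting word frequencies after excluding a stopword list. The stopword list and the function name are hypothetical and chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; real lists are much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "it", "is", "was"}


def word_frequencies(text, stopwords=STOPWORDS):
    """Count lowercase word tokens, skipping any that appear in the stopword list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(token for token in tokens if token not in stopwords)


sample = "It was the best of times, it was the worst of times."
frequencies = word_frequencies(sample)

# Most frequent remaining term and its count.
top_terms = frequencies.most_common(1)
print(top_terms)
```

Without the stopword list, "it", "was", "the", and "of" would tie with or outrank the content words, which is why choosing stopwords is an interpretive decision worth documenting in the write-up.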

For more information about text analysis and its use in the classroom, contact the Digital Scholarship team in the libraries.

Data Visualization

Data visualization is an excellent way for students to engage with course materials in ways that open interpretation and lend themselves to original discoveries. The same information, when displayed as a graph or map, often reveals new and unexpected features.

Several web applications can be used to create simple visualizations, such as:

RAWGraphs, https://rawgraphs.io/

Plotly Chart Studio, https://plotly.com/chart-studio/

Datawrapper, https://www.datawrapper.de/

As part of several courses, Jake Culbertson (Anthropology) asks his students to contribute to a collaborative spreadsheet of the people, places, key topics, and debates that students encounter in course readings. This provides a tangible task for students as they read, to highlight significant ideas and entities, and to report their results to the class. The spreadsheet provides a common pool of references to significant information that can be used for papers and class discussions. The spreadsheet also provides opportunities to discuss how best to transform the information in the texts into structured data.

With the spreadsheet and a tool from Stanford called Palladio, students then create maps, graphs, and facets that allow them to identify significant patterns and features in their data. The spreadsheet exercise offers project-based collaboration with outcomes that benefit students’ engagement with readings, and it builds a shared knowledge base for faculty research that continues to be developed with students from semester to semester.
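
One decision that comes up when turning reading notes into structured data is what to do with cells that hold several values. The Python sketch below shows one option: splitting a multi-valued cell so that each row records a single value, then writing the result as CSV for loading into a visualization tool. The column names, separator, and sample rows are hypothetical and are not taken from the course spreadsheet.

```python
import csv
import io

# Hypothetical rows; the columns and values are illustrative only.
raw_rows = [
    {"reading": "Reading A", "people": "Person One; Person Two", "topic": "land use"},
    {"reading": "Reading B", "people": "Person Two", "topic": "infrastructure"},
]


def split_multivalued(rows, column, separator=";"):
    """Return one row per value found in a multi-valued cell."""
    tidy = []
    for row in rows:
        for value in row[column].split(separator):
            value = value.strip()
            if value:
                # Copy the row, replacing the multi-valued cell with one value.
                tidy.append({**row, column: value})
    return tidy


tidy_rows = split_multivalued(raw_rows, "people")

# Write the tidy rows as CSV text.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["reading", "people", "topic"])
writer.writeheader()
writer.writerows(tidy_rows)
csv_text = buffer.getvalue()
print(csv_text)
```

With one person per row, it becomes straightforward to count how often a person appears across readings or to link people to topics. The cost is that each reading's details are repeated on several rows.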

Digital Exhibits

Digital exhibits are an effective way of engaging students with digitized primary sources, curating digital media, teaching visual literacies, and producing collaborative scholarship. Exhibits can help students reach a broader audience with their scholarship, and introduce them to creating multimodal and non-linear narratives.

Teaching Goals

  • Close reading of primary sources or visual resources
  • Curation of digital objects
  • Digital publishing and a critical understanding of the Web
  • Public and/or multimodal scholarship

Tools

Digital exhibits can be created in almost any web framework or content management system. The Digital Scholarship team in the library can support all of the tools mentioned here, and many that aren’t.

Omeka

Omeka is a web-based digital collections and exhibits builder. With no specialized software needed, students can create digital collections by uploading their own digital objects or linking to objects on the web, and describing them using library and archival standards. The platform is aesthetically rigid (only a few themes exist, and it is not as easily customizable as some others), but what it lacks in flexibility it makes up for in ease of use. Students can focus on their content without needing to learn many technical skills. Omeka is available on Haverford Sites, so your class can either work on a single common instance hosted by the library or each student can host their own Omeka site.

WordPress

While WordPress was originally built as a blogging platform, it is now one of the most popular site builders on the web. It has a very active developer community that builds themes and plugins that extend the functionality and aesthetics of the core installation. It’s a tool that many students are already familiar with, and its simplicity allows them to focus on content creation. Those who wish to “get under the hood” can still do so. Like Omeka, WordPress is supported on Haverford Sites, so each student could create their own site or the class project can be hosted on a library server.

Jekyll or Static HTML Sites

More robust web frameworks like Jekyll offer total control over the look, feel, and functionality of a digital exhibit or website, and they create static web sites that are easy to host, migrate, and preserve. However, static sites require that students learn some markup language (either Markdown or HTML) to publish their work. While this critical making approach facilitates deeper understanding of the web, it requires more time for instruction and mastery of the required technical skills.

Examples

Possible Assignments

  • In the final weeks of the semester, students create their own WordPress or Omeka exhibits on a research topic of their choice (possibly adapting a paper they’ve written).
  • As a class, students create a common collection of items that they upload to Omeka over several weeks or throughout the semester. From that common collection, each student or group of students creates an exhibit exploring a theme within the collection.
  • Throughout the semester and with regular library instruction, students undertake a scaffolded series of assignments in which they learn markup, create one or more digital collections, and co-curate a digital exhibit to be launched at the end of the course.