If you are interested in finding out what the most frequently occurring words are in the text(s) you are researching, you could always start by creating a word cloud, one of the best-known text analysis visualizations. The results are simple and aesthetically pleasing: run your text through a word cloud application to produce a roughly circular design of the most frequently used words, with the most frequent words appearing largest and less frequent words diminishing in size.
But why might researchers want to find the most frequently occurring words in their texts? What kinds of text lend themselves best to quantitative linguistic analysis? And, beyond the word cloud, what kinds of visualizations can give us textual insight? On May 17, I led a workshop aimed at thinking through these questions with the online text analysis tool Voyant.
Voyant is free, online software created by Stéfan Sinclair and Geoffrey Rockwell, and its purpose is to allow researchers to produce word frequency and location results (and corresponding visualizations) from their texts without needing to write their own programs. Researchers can upload any document(s), or copy and paste any body of text or URL, into the analysis tool on Voyant’s homepage, click reveal, and, as the website says, “see through their texts.” The default “skin,” or suite of five tools, offers some popular types of text analysis (including a word cloud function), with options to change tools in all of the windows. Each tool offers some minor customizability via the word search, so if, for example, you don’t want an interviewer’s name affecting the frequency analysis of an interview, you can add it to the stoplist. Similarly, if you want to track only how often and where the names of the main characters in your novel appear, you can create a whitelist that includes only those names. Voyant will reveal how many unique words occur in the text or corpus (group of texts) being analyzed, show keywords in context, and create graphs to visualize the frequency of keywords over the length of the corpus. Voyant also has documentation for each of the tools, as well as a gallery of real research examples.
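For readers curious about what these analyses involve under the hood, here is a minimal Python sketch of the same ideas: word frequencies filtered by a stoplist or whitelist, keywords in context, and keyword counts across equal segments of a text. It is not Voyant's code and does not reproduce its results exactly. The function names, the simple tokenizer (lowercased runs of letters, with straight apostrophes allowed inside a word), and the default window and segment sizes are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of letters, optionally joined by straight
# apostrophes (e.g. "don't"). Voyant's own tokenizer may differ.
WORD_PATTERN = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def tokenize(text):
    """Lowercase the text and return its words in order."""
    return WORD_PATTERN.findall(text.lower())


def word_frequencies(text, stoplist=None, whitelist=None):
    """Count words, keeping only whitelist words and dropping stoplist words."""
    tokens = tokenize(text)
    if whitelist is not None:
        allowed = {word.lower() for word in whitelist}
        tokens = [t for t in tokens if t in allowed]
    if stoplist:
        blocked = {word.lower() for word in stoplist}
        tokens = [t for t in tokens if t not in blocked]
    return Counter(tokens)


def keywords_in_context(text, keyword, window=3):
    """Return (left, keyword, right) for each occurrence of the keyword."""
    tokens = tokenize(text)
    keyword = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


def segment_counts(text, keyword, segments=5):
    """Count keyword occurrences in each of several equal slices of the text."""
    tokens = tokenize(text)
    keyword = keyword.lower()
    counts = [0] * segments
    total = len(tokens)
    for i, token in enumerate(tokens):
        if token == keyword:
            # Map the token's position to the segment it falls in.
            counts[i * segments // total] += 1
    return counts
```

With these helpers, len(word_frequencies(text)) gives the number of unique words, and passing an interviewer's name in the stoplist keeps it out of the counts, much as described above. The list returned by segment_counts is the kind of data a frequency-over-the-text graph would plot.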
For a smaller corpus, the tool can be helpful in confirming what human readers already suspect, although it can certainly produce surprises (as when one of the most frequent words in an interview ends up being “inaudible,” for example). It becomes most helpful, though, for queries that would be onerous or impossibly time-consuming for human researchers, as in Alyssa Anderson’s study of more than 2,500 runaway slave advertisements. Voyant produces concrete answers to questions of linguistic context or pattern in large or small corpora, which may aid researchers in asking specific questions of their texts. But, as Sinclair and Rockwell emphasize, Voyant is also a tool to help us think about and look at our texts differently. Our ability to search a digital text to instantly locate a particular word has changed the way we approach research; similarly, Voyant complicates and nuances what it means to read in a digital environment. Exploring texts with Voyant might be the final step in analysis, but with its emphasis on process and its multiplicity of quantitative analyses, it may very well end up being one of the first.
If you have questions or would like to learn more about Voyant, contact us: firstname.lastname@example.org.
Stéfan Sinclair and Geoffrey Rockwell, Hermeneutica
Stephanie Posthumus and Stéfan Sinclair, “Reading Environment(s): Digital Humanities Meets Ecocriticism”
Lara Putnam, “The Transnational and the Text-Searchable”
Stephen Ramsay, Reading Machines
Alyssa Anderson, “Using Voyant for Text Analysis”