Survey of the Field: Analyzing Digital Public History

For this activity, I looked at eleven different digital public history websites, ranging in launch date from 1998 to 2014. These sites reveal the broad outline of digital public history work. In the first phase, there is a sense of trying to figure out what digital public history sites could do. Should they be the digital equivalent of a physical exhibition, like the Library of Congress’s Progress of a People site? Should they seek to capture memories from people who lived through an event, like the Blackout History Project? Should they take a reflexive approach and represent not just an historical event but how it is later remembered and interpreted, like the Great Chicago Fire and Web of Memory site? Some of these sites, like the Blackout History Project, try to do too much – not just collect and present oral histories, but also archive media coverage and government and corporate documents. Others don’t take full advantage of the medium and simply reproduce physical exhibitions. In this phase we see the beginnings of some of the markers of good digital public history work: the site should be clearly focused, take advantage of the medium, and promote some form of public engagement.

The second phase of digital public history work is more coherent. The sites I reviewed in this phase are more focused and take advantage of the medium by linking and using multimedia extensively. Links allow users to chart their own path through the material and engage with the site more actively. These sites make historical thinking more visible by foregrounding a multiplicity of primary sources and perspectives, and they promote other forms of public engagement. The Smithsonian’s site, A More Perfect Union: Japanese Americans and the U.S., includes a section for users to reflect on the site and to think about it within the context of contemporary events. The Raid on Deerfield site, unlike the others created during this phase, takes a Rashomon-like approach to history, presenting five perspectives on a single event and asking users to decide for themselves what happened. These sites are more clearly focused on a particular theme or event and primarily take the form of an exhibition of primary sources surrounded by traditional historical narrative. The reflexive and educational element of good digital public history emerges in this phase, as these sites try to get their users to engage in historical thinking.

The third, and current, phase of digital public history work is characterized by the differentiation of sites. The Lincoln at 200 site is similar to A More Perfect Union and Jasenovac: Holocaust Era in Croatia; it’s essentially an interactive exhibition of primary sources. The Bracero Archive returns to the goals of the Blackout History Project by soliciting oral histories and other materials from users. In both cases, this is a very good way to use the web, as the individuals involved are likely dispersed and primary materials from both events were probably never systematically collected. Manifold Greatness is also an exhibition site, but it more extensively incorporates multimedia and interactive elements for specific user populations (in this case, children). It emphasizes reflexivity and historical thinking by pairing a section on subsequent uses of the King James Bible with the more traditional narrative of its origins. Operation War Diary brings a new and promising form of public engagement: crowdsourced transcription work. This allows users to directly engage with primary sources and to essentially “do” history in a way that other forms of interaction do not. This is really neat in a lot of ways, but I have reservations about outsourcing labor that really should be adequately funded (I do understand the fiscal constraints cultural heritage institutions operate under, and I understand thinking strategically, but it’s important not to forget this point either).

Tl;dr: Good digital public history work is clearly focused, is reflexive, models or allows users to engage in historical thinking, takes advantage of the medium, and promotes public engagement and interaction.

Response to “Whose Public? Whose History?”

Perhaps not surprisingly, in trying to define public history, Grele, Meringolo, Howe, Karamanski, and Conard end up writing histories of public history. Grele focuses on the relationship between public history and history within the academy, while Meringolo highlights the emergence of public history within government institutions. Howe focuses on the field’s origin in the job crisis of the 1970s and then traces the creation of professional associations and publications. This post focuses mostly on Grele’s and Meringolo’s work, as theirs are the most interesting texts.

Grele offers the most ambitious, idealistic vision of public history and its goals, and I really like it (apologies for the lengthy quotation – this is the best summary):

“In addition, what we know about public historical activities as they now exist points to a similar correctness in Carl Becker’s view that every man can become his own historian; that relatively ordinary people can seek and find knowledge of the world they have made or that was made for them, and that since history always has a social purpose - explicitly or implicitly - such knowledge shapes the way the present is viewed. Thus the task of the public historian, broadly defined, should be to help members of the public do their own history and to aid them in understanding their role in shaping and interpreting events” (Grele, 47-48).

He does temper this a bit:

“This is not to suggest that once a correct view of the past is reached, through the aid of benevolent, anti-corporatist public historians, history immediately becomes a weapon in the arsenal of those who struggle for social change […] But it does mean that local or community history projects can play an important role in moving people to a clearer sense of the possibilities of social change and social action and their roles in such change” (Grele, 48).

This last sentence is key for me, with its emphasis on learning and historical thinking as empowering for individuals and communities. Much of my library scholarship deals with how discursive practices within the library community, particularly around ideas such as “the future” and technology, reify the world we currently live in and work to render systemic change unimaginable and impossible. It is very important to me to think about ways to work against these sorts of discourses, which aren’t limited to libraryland.

Meringolo expands on Grele’s definition, but does not lose the core of it. Public history is fundamentally multidisciplinary and collaborative, but that collaboration is not limited to working with historians or other academics: “In reflective practice, public historians engage in active collaboration, constantly reframing questions and improving interpretations in conversation with themselves and with their stakeholders — employers, audiences, and so on” (Meringolo, xxiii). The addition of “employers” here is interesting, especially given Meringolo’s emphasis on public history within governmental institutions (and given the sorts of historical narratives you’ll find in something like the Foreign Relations of the United States), but also accurate. The ideas of “shared authority” and “shared inquiry” are key to this vision of public history, and play into public history’s goals of empowering individuals and communities (Meringolo, xxiii).  

Grele notes that this means that there is always negotiation between the public historian and the public: “Sometimes this merely means helping to bring to the front the information, understanding, and consciousness that is already there. More often it means a much more painstaking process of confronting old interpretations, removing layer upon layer of ideology and obfuscation, and countering the effects of spectacularized media-made instant history” (Grele, 48). Meringolo, too, sees this tension as central to practicing public history: “By acknowledging these emotional attachments, public historians can open up dialogue and foster a mutually educational experience, allowing public historians not only to educate their audiences but also to learn something about the ways in which average people understand, use, and value the past” (xxv).

This tension and the reflective nature of public history mean that it is also interested in how historical narratives are produced, communicated, received, and interpreted, which I don’t necessarily think is as central to history more broadly. There is also a much more prominent and explicit educational element to the practice of public history as described by these authors, one that is less crucial to history in general (whether that is a good thing or not is debatable). Grele’s understanding of the pedagogical element of public history is more one-way than that of Meringolo, who is more interested in understanding the relationship between the public and the public historian as one of exchange, negotiation, and recursiveness. In the quotation above, Grele really portrays the public historian’s role as exposing truth. Meringolo’s description comes closer to what I think the practice of public history should be: thinking through a variety of historical narratives and interpretations, considering the work that those narratives perform, and empowering through education.

My definition of public history would include Grele’s expansiveness and idealism about the ultimate goals of public history and would incorporate Meringolo’s emphasis on public collaboration and public engagement. Education is crucial and multidirectional; it is not the public historian revealing the correct historical interpretation but rather opening up spaces in which the public can question, analyze, and interpret. In contrast to some of the definitions Meringolo cites, which locate the distinctiveness of public history in its communication methods and audiences, I would suggest that the reflective nature of public history and its interest in how historical narratives are created are what distinguish it from (some) traditional historical work.

The authors of these articles primarily focus on the origin of the field of public history (wherever they locate that origin) and don’t discuss/unpack “the public.” My definition of public history includes the caveat that publics vary widely by institution and have different needs. Part of doing public history is being aware of who your specific public is, and what its needs are. This is a key element of librarianship, and I think it applies here as well.

Finally, my definition of public history rejects the notion of a division between “academic” history and practical/applied/public history. It is more productive to think of historical work as having particular goals and stakes depending on the contexts in which it is produced and received. This division is reductive and can slide into anti-intellectualism (I see this all the time in librarianship).

Roots of Public History: Introductory Post

This post kicks off another semester in GMU’s digital public humanities certificate program. I’m not sure if I’m ready.

Some basic information about me as a student: I’ve been working as an academic librarian since 2007. I have an M.A. in American studies and an M.S.I. in library and information services. I started this certificate program in the fall of 2015 but otherwise haven’t been in school since April 2007. It’s rough having homework again, especially since I’m quite active in libraryland research and scholarship and have two articles and one conference presentation to write this semester.

My background in digital humanities: My background primarily consists of the course last semester, although a lot of DH concerns overlap with those of libraries (metadata, searching, digitization, etc.).

My interest in digital public history: A lot of this interest is tied to my background in American studies. I am interested in how we talk about and represent the past and what it means to be American, and this interest has grown as I’ve lived in Washington, D.C. The digital piece is interesting because it offers different forms of presenting, interpreting, and engaging with history for both institutions and individuals.

My learning goals for the semester: I want to become familiar with the theories, methods, and tools of digital public history. This is less clearly tied to my current position as an academic librarian than the class last semester, so I am a little uncertain about how the projects will inform my library work.


Final Project Reflection

My final project, Topic Modeling Detroit, was rooted in my longstanding interest in the history and representation of Detroit. I had previously written about Detroit being represented as ruined in both newspaper articles and photography. The figuration of Detroit as ruined is bound up with a narrative that locates the beginning of its decline in the 1967 riots. That project focused on texts from the 1990s and 2000s, so for this project, I wanted to look at texts from before the riots.

In terms of methods, I was inspired by projects like Robots Reading Vogue, which seemed to take an exploratory, open-ended approach to a group of texts. I didn’t really have any preconceptions as to what I might find in texts about Detroit published prior to 1967, although I did have a broad sense of its history and the centrality of the automobile industry and labor unions to that history. I liked the idea of being able to engage with a group of texts on their own terms, without looking for something specific. Because the university I work at is a member of HathiTrust and the HathiTrust Research Center allows you to use Mallet, the same tool used in Robots Reading Vogue, on a set of texts, I decided to conduct my project within the HathiTrust Research Center. This meant I could only use public domain texts digitized by HathiTrust, but the corpus I was able to create was still quite large – over 2,000 texts.

Because HathiTrust includes Library of Congress Subject Headings with each item record, I decided to create my corpus using a subject search for Detroit. Initially, I thought I might break up my corpus by publication date, but it turned out that a lot of the texts in my corpus did not have accurate publication dates. This was pretty surprising given how good HathiTrust metadata usually is. I instead tried different numbers of words and topics in the topic modeling algorithm, and my final project incorporates perspectives from each version. More topics generally lead to finer-grained topics but also more noise. Fewer topics got at the big picture of what was in the corpus but subsumed some interesting distinctions within topics.

The feedback I got mostly indicated that I needed to explain the background of the project more fully, which isn’t surprising. This is a topic I’ve been thinking about for a long time, and I keep up on the scholarly literature on the topic (it helps that I buy books for the library for American history and studies). I also had to adjust the way I usually approach texts and textual analysis. I generally read the text first, let it sit in my head, and eventually come up with what I think the text is doing. This was a bit different, and it was (is) hard for me to think about this project as showing what the corpus is, what the texts in it are. This project is not making an argument about the texts, which is what I am accustomed to doing; it is asking “what are these texts?” I’m still trying to wrap my head around this, because it is so different from what I generally do.

Now that I’ve looked at the results of the topic modeling several times (and made myself approach them differently), I think they’re actually pretty neat. I would want to combine these results with some closer reading of some of the texts, because to me, what texts do is the more interesting question, but being able to get at what they are (and more than 2000 of them, too) is kind of amazing.

Topic Modeling Detroit: First Draft

REVISED 12/6/15

[This is a draft/outline version of my final project, which uses topic modeling to analyze public domain books about Detroit in the HathiTrust Research Center.]

“Topic Modeling Detroit” seeks to perform a “distant reading” of public domain books about Detroit digitized by HathiTrust and available in the HathiTrust Research Center. While ultimately it would be ideal to bring this project into conversation with other work on representations of Detroit, this project is primarily exploratory and designed for me to familiarize myself with the textual analysis tools available through the HathiTrust Research Center.

The HathiTrust Research Center is available to universities that are members of the HathiTrust and is designed to “[enable] computational access for nonprofit and educational users to published works in the public domain.” What this means is that once you have created an account, you can create text corpora and then analyze those corpora with eleven different techniques SUPER EASILY. Francesca Giannetti has a very good primer on the HathiTrust Research Center, which also includes information about using the Data Capsule. The Data Capsule allows you to use in-copyright books, unlike the algorithms embedded in the HathiTrust Research Center.

Creating a workset was very easy, since I knew I wanted the items to have “Detroit” as a subject heading. Subject headings get at the topic of an item, unlike full-text searches (too broad) or title searches (too narrow). There are two interfaces for creating worksets (all images can be clicked on to embiggen):

[Screenshots: the two workset-creation interfaces]

The workset I created and used for analysis contained 2,364 books with publication dates prior to 1963. Although HathiTrust is very good about metadata like publication year, there were 244 titles whose publication date was listed as 1800, and spot checking indicates that that date is not entirely accurate. There were also 52 items with a publication date of “0000,” which is unpossible, and 124 items with a publication date of “1000,” which seems unlikely. 2,324 of the books were in English and 2,248 were published in the United States. Each of these facets (there are a few others) can be used to limit your workset, and each can be clicked on to see the list of items that have the selected characteristic.

[Screenshot: workset facets]

[Screenshot: language facet]

Each item record is connected to the full view and full catalog record of the item in HathiTrust.

The topic modeling algorithm within the HTRC uses Mallet and allows you to select the number of “tokens” (words) and topics. In this project, I mostly played around with varying numbers of both, rather than limiting by years as I initially thought I might. As I mentioned earlier, the publication dates are incorrect for many items and it’s not possible to limit your search to a date range (years have to be entered individually and joined by Boolean connectors). Running the algorithm involves two clicks, naming the job, and deciding on the number of tokens and topics. It does take a day or two to return the results, as far as I can tell, and they are displayed within the browser, like so:

[Screenshot: topic modeling results displayed in the browser]

This means the word clouds cannot be manipulated, and the best way to capture them is with a screenshot. The results page also includes a text list of the most popular words.
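
The HTRC interface hides the mechanics of all this, but for anyone curious, here is a minimal sketch of roughly what such a run does, written in Python with scikit-learn rather than Mallet (a different implementation of the same LDA technique, so the topics will not match exactly); the directory name is hypothetical, the number of topics corresponds to the “topics” setting, and the number of top words printed corresponds loosely to the “tokens” setting:

    import glob
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    # Hypothetical directory of plain-text volumes from the workset
    docs = [open(path, encoding="utf-8").read()
            for path in glob.glob("detroit-texts/*.txt")]

    # Build a document-term matrix, dropping standard English stop words
    # and very rare or very common terms
    vectorizer = CountVectorizer(stop_words="english", max_df=0.9, min_df=5)
    dtm = vectorizer.fit_transform(docs)

    # Fit a 20-topic model (Mallet uses Gibbs sampling; scikit-learn
    # uses variational inference)
    lda = LatentDirichletAllocation(n_components=20, random_state=0)
    lda.fit(dtm)

    # Print the top ten words per topic, analogous to the HTRC word clouds
    words = vectorizer.get_feature_names_out()
    for i, topic in enumerate(lda.components_):
        top = [words[j] for j in topic.argsort()[:-11:-1]]
        print(f"Topic {i}: {', '.join(top)}")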

I ran the topic modeling algorithm four times: 100 tokens/10 topics; 200 tokens/10 topics; 200 tokens/20 topics; and 200 tokens/40 topics. Screenshots of the results for each run are below; each screenshot includes between one and three topics, due to the total number of topics and the way they are displayed on the results page. Differences in size between topics should be ignored, since the topics are the same size on the results page (that is, I just took screenshots and didn’t resize them). Also, each set of results had at least one topic consisting of punctuation marks, diacritics, symbols, and other non-word content; I did not include those here.

100 tokens/10 topics

[Four topic screenshots from this run]

200 tokens/10 topics

[Three topic screenshots from this run]

200 tokens/20 topics

[Seven topic screenshots from this run]

200 tokens/40 topics

[Fourteen topic screenshots from this run]

Analysis

The topic models created with 100 tokens/10 topics and 200 tokens/10 topics seem to resemble each other and to be the most coherent sets of topics. These models clearly identify topics or genres within the corpus. They are:

  • history
  • city government/administration/development/public projects
  • biography
  • education/schools
  • geography/maps

The topic model created with 200 tokens and 20 topics refines these categories somewhat and introduces related topics. The topics above are still present, joined by:

  • 18th/19th century history
  • construction/building
  • medicine
  • libraries
  • accounting/budgets
  • cars

These are pretty interesting refinements/related topics (and we see the emergence of “cars,” which is of course what Detroit has been associated with throughout the twentieth century), but this topic model also introduces some noise. Two of the topics above are not meaningful, and I removed one consisting of symbols.

The topic model created with 200 tokens and 40 topics further refines the broad topics of the 10 topic models. It includes the following additional subtopics:

  • legal profession/court
  • public works
  • water
  • population/demography
  • books
  • government documents
  • engineering/math
  • church
  • car manufacturing

This topic model also reveals that some of the corpus is in French, although the words included in that topic are primarily stop words.

When I initially reviewed the four topic models, I was kind of disappointed, but on taking another look, they do reveal a fair amount about the items in the corpus, particularly the broad categories they fall into. Compared to a content analysis I did of newspaper articles about Detroit, this is obviously much broader and less detailed, but it could definitely help identify subsets of texts for close reading. Using the HathiTrust Research Center was very easy, and I can now show students or faculty how to build a workset and use the analysis tools embedded within HTRC.

There are a few drawbacks, however, both specific to this project and more general. Limiting to public domain texts means that only specific post-1924 texts are included, like government documents, which may overly influence the resulting topics. This is a particular problem for this project, whose subject only really became prominent in the twentieth century (I’m especially interested in the period between 1920 and 1960, and that period is not well represented in HTRC, or really in any digitized text corpus). I’m also very interested in change over time, so the lack of good publication year metadata for so many of these texts was really disappointing; I had hoped to be able to look at individual decades, even with the caveat I just mentioned. This could be addressed by manually reviewing the catalog records for texts with years of 1000 or 0000, since for at least some of them, the publication date is in the title or text. That would be extremely time-consuming, though, and for uncertain results.
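
A minimal sketch of that triage step, in Python with pandas, might look like this (the file name and column names are hypothetical, assuming a CSV export of the workset metadata):

    import re
    import pandas as pd

    # Hypothetical metadata export with "pub_date" and "title" columns
    df = pd.read_csv("workset_metadata.csv")

    # Flag the implausible publication dates noted above
    suspect = df["pub_date"].isin(["0000", "1000", "1800"])

    # As a fallback, try to recover a plausible four-digit year from the title
    def year_from_title(title):
        match = re.search(r"\b(1[5-9]\d{2})\b", str(title))
        return match.group(1) if match else None

    df.loc[suspect, "recovered_year"] = df.loc[suspect, "title"].map(year_from_title)
    print(df.loc[suspect, ["title", "pub_date", "recovered_year"]].head())

This would only catch titles that happen to contain a year, so the rest would still require manual checking.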


Social Media Strategy

This post is meant to outline my social media strategy for sharing my course project (a textual analysis of books about Detroit digitized by HathiTrust), but I have to say that right now I don’t know if I want to share it. I ran the algorithms within HathiTrust and am not wild about the results; that is, I don’t think they actually reveal anything interesting, although I will take a closer look over the next week.

These are the social media strategies I developed for different, but sometimes overlapping, groups that I belong to (or sort of belong to, in the second instance). I chose the platforms I did because I already have a presence on them and am connected to these groups through them. For academic librarians and for digital humanities librarians and scholars, Twitter does seem to be the preferred social media platform. I am more personally connected to Detroit scholars and activists, which is why I would try to reach them through Facebook. In all three cases, I might consider asking someone prominent in those groups to boost the signal, as it were.

I did not include other social media platforms because I do not have a presence on them and frankly, it is a lot of work to build a presence and then to interact through numerous social media platforms. Some platforms – like Instagram – don’t seem particularly appropriate, since my project will incorporate a fair amount of text in addition to some images.

Social media strategies

Audience: Academic librarians

Platform: Twitter

Messages: The message for this group would probably be primarily an announcement, since it may or may not be something they’re actually professionally interested in.

Measure: I would measure the success of this via favorites, replies, and retweets and also possibly via analytics on my online portfolio.

___________________________________________

Audience: Digital humanities librarians and scholars

Platform: Twitter

Messages: The message for this group would be more about soliciting feedback and initiating discussion about the project.

Measure: I would measure the success of this through conversations (which could be replies on Twitter) and comments on the portfolio post.

___________________________________________

Audience: Detroit scholars/activists

Platform: Facebook

Messages: This would be a combination of the previous two messages: an announcement, but also looking for feedback about the project from the perspective of people less interested in digital humanities per se and more interested in Detroit.

Measure: I would measure the success of this primarily through conversations on Facebook and the portfolio post, but would also consider likes and shares on Facebook and analytics from my portfolio.

What Can You Do with Crowdsourced Digitization?

In answer to the question in the title of this blog post, you can do a whole lot with crowdsourced digitization. Members of the public can transcribe manuscripts and archival materials (as in the Papers of the War Department and Transcribe Bentham), correct faulty OCR (as in Trove newspapers), and verify shapes and colors on historical maps (as in Building Inspector). They can also do things like add tags and comments to materials, which helps make them more findable for other users. Trove offers this to members of the public, and so do other projects such as Flickr Commons, which I use a lot for historical photographs.

The types of projects and tasks that seem likely to attract contributors are those that appeal to their interests. In the case of Trove, primary contributors are most interested in family and local history and genealogy. In the case of Transcribe Bentham, frequent contributors were interested in Bentham or in history and philosophy more broadly. Main contributors to the Papers of the War Department were similarly interested in American history. These tasks and projects also let contributors feel like they are giving back, that they are contributing to something larger and possibly of historical significance. Building Inspector is somewhat different; it seems more like the sort of task that contributors would do while standing in line or waiting for the bus (and since it’s optimized for mobile devices, I imagine they do). Because the New York Public Library is asking for help, though, I suspect that it would still be seen as altruistic or as helping out with a larger, more important project, similar to the ways in which the other projects are perceived by their contributors.

Based on my experiences contributing to the Papers of the War Department and Trove, having a WYSIWYG, easy-to-use interface is crucial. This is particularly true of the Papers of the War Department, since I had to expend a significant amount of brainpower on reading eighteenth-century handwriting. Essentially, the interface can’t stand in the way of the contributor. In terms of community building, it does seem to be helpful to have some sort of community, although that can manifest in different ways. The Trove forums seem to be quite active and a good resource if you’re not quite sure what you’re doing. The Papers of the War Department has a conversations tab for each document, where you can ask questions about the item you’re transcribing. The community of Transcribe Bentham used to be moderated, which was extremely effective but also labor-intensive; now there is a scoreboard, which I’m guessing does some of the same community-building work, but to a lesser degree. The community around Building Inspector is more implied – the same images are shown to three people – but it’s reassuring, as it lets you know that you won’t ruin something.

There is one aspect of crowdsourced digitization that hasn’t come up, and that is its labor politics. Several project creators/managers indicated that their motivation for crowdsourcing transcription and other work is that their institutions will never have the ability to pay for that labor. I certainly don’t blame organizations for using crowdsourced labor (yay austerity), but I do sometimes (particularly as a member of the information professions) wonder about how/if crowdsourced digitization replaces the creation of finding aids for manuscript collections or of catalog records and metadata for almost any item. Not everyone appreciates metadata, and even among librarians I frequently hear about how we don’t need metadata when everything is full-text searchable. This makes me want to bang my head on the wall, since metadata searching can be sooooooo much easier and more effective. Using unpaid labor – often interns – is also endemic to libraries, museums, and archives, and even full-time labor is often underpaid and undervalued, as these are historically feminized positions that involve soft skills and emotional labor.

How to Read a Wikipedia Article

Reading a Wikipedia article is fairly straightforward in one sense. In order to see the changes made to the article and who has made those changes, you can click on the “View History” tab. Some users will have profile pages, while others (as in the case of the Digital Humanities article) use their real names, which you can then search. Some profile pages include real names, credentials, and institutional affiliations. Other changes, though, will have been made by users only marked by an IP address or by an unsearchable pseudonym. Some changes will include notes as to why those changes were made, but there is also a “Talk” tab where you can see discussions about the article. Most Wikipedia articles also include a list of references, which serve the same purpose as they do with any other book or article; they show the sources used to create the article and allow the reader to find and read those sources herself. These elements emphasize the transparency that Wikipedia cultivates in the creation of articles.
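
The same revision history is also available programmatically. Here is a minimal sketch in Python using the standard MediaWiki API (the article title is just an example), which pulls the same information shown on the “View History” tab:

    import requests

    # Fetch the five most recent revisions of the "Digital humanities" article
    response = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "prop": "revisions",
            "titles": "Digital humanities",
            "rvlimit": 5,
            "rvprop": "user|timestamp|comment",
            "format": "json",
        },
    )

    # Print who changed the article, when, and their edit summary
    for page in response.json()["query"]["pages"].values():
        for rev in page["revisions"]:
            print(rev["timestamp"], rev["user"], rev.get("comment", ""))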

What is not necessarily transparent, but should be kept in mind when reading Wikipedia articles, is what both Rosenzweig’s and Auerbach’s articles emphasize: the social context in which Wikipedia operates. Rosenzweig notes the demographics of Wikipedia writers and editors (and Wikipedia’s corresponding “geek priorities”), localism, avoidance of controversy, and emphasis on conventional wisdom. Auerbach discusses the organizational culture of Wikipedia’s editors, including its problematic gender politics. Of course, I would argue (and try to convey to my students) that it’s important to be aware of the social context of any text, since that context should inform both the selection and use of that text. I really don’t like to think of “assessing” any text, including a Wikipedia article, outside of how I plan on using it, so providing a general guide to doing this for Wikipedia is difficult. I primarily use Wikipedia when I quickly want to know something and the stakes for knowing it aren’t high – when I want to know who was in a movie or what year it came out, for example, or for a quick and dirty definition of something like digital humanities. If I were writing an article on that film, though, I would track down a different source, but that has less to do with the quality or trustworthiness of the Wikipedia article than with the conventions of academic writing and publishing.

I do use Wikipedia frequently for succinct explanations that will work in the moment and not much else because it is fundamentally an encyclopedia, as Rosenzweig notes (I really like McHenry’s “blandness of mere information,” which Rosenzweig cites). This is apparent in the digital humanities Wikipedia article, which glosses over complexity, disagreement, and contestation to produce a view of digital humanities that is more or less coherent but lacks depth. Rosenzweig ties this to Wikipedia’s ideals of objectivity and neutrality, its ban on original research, and its heavy emphasis on citing “published” sources. I think this leads to sentences like this one from the digital humanities article: “The definition of the ‘digital humanities’ is something that is being continually formulated by scholars and practitioners; they ask questions and demonstrate through projects and collaborations with others.” If I were grading this paper, I would probably write “vague” next to this sentence.

If Wikipedia articles tend to be too shallow for some purposes, the references and links are almost always valuable, which is undoubtedly attributable in part to Wikipedia’s emphasis on “published” sources. Linkypedia showed that many Wikipedia articles on economics and labor link to pages from the U.S. Bureau of Labor Statistics, and frankly, it’s usually much easier to find the Wikipedia page than any given U.S. government website or document. The Galloway and DellaCorte article discussed how libraries, museums, and archives are and have been adding links to digitized materials, finding aids, and other resources to Wikipedia. Even if the article on, say, the Pittsburgh Courier is not ultimately detailed enough for my purposes, the links to the University of Pittsburgh’s collections would still be valuable. In the digital humanities article, the links to centers, resources, related entries, references, and bibliography would help me move beyond the somewhat superficial article, if I needed to. (The list of references can also help with this, but in many Wikipedia articles – I think the digital humanities article is an exception rather than the rule – the sources cited tend to be things that can be easily found and accessed. No recent journal articles, because those are frequently behind paywalls. Not a lot of monographs, unless they’re out of copyright. And so on.) I’ve emphasized “if I needed to” because it’s key: Wikipedia articles should be read and used with both the context of their creation and the context of their ultimate use in mind.

Comparing Voyant, CartoDB, and Palladio

Voyant, CartoDB, and Palladio are somewhat difficult to compare because although they can all be used to analyze the same basic dataset (in this example, the WPA Slave Narratives and a smaller subset of the narratives – those conducted in Alabama), each is best used to focus on specific aspects of that dataset. In order to compare these tools, I reviewed my previous posts on each of them (Voyant, CartoDB, and Palladio). In terms of interface and usability, all of these tools were accessible to the novice (me). This is somewhat of a sidebar: since I used these tools in the context of a class, I was given data to work with. Learning how to format and clean up data for specific tools would probably be really useful, and should maybe be part of this course or other courses in the program (see the sketch after this paragraph for the kind of thing I mean). I know there are tools like DataWrangler, but this is something I feel sort of lost with, and I think this played into my ability to work with some tools more effectively than others. That is, I understand how Voyant uses complete texts that have been OCRed; I understand what stop words are and how full-text search works, primarily because I am a librarian. I understand the data used by CartoDB mostly because I sat through six days of ArcGIS classes. Palladio eluded me because I lacked the same sort of background knowledge, despite Scott Weingart’s lucid introduction to networks and examples like Mapping the Republic of Letters, Viral Texts, Linked Jazz, and the London Gallery Project, none of which I had trouble understanding. I think part of this is because the Alabama WPA Slave Narratives didn’t seem like an obvious fit for network analysis to me, but not understanding how the data behind it worked further confused the matter.
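
To make “format and clean up data” concrete, here is a minimal pandas sketch of the kind of preparation a mapping tool like CartoDB expects (the file name and column names are hypothetical):

    import pandas as pd

    # Hypothetical interview metadata to be loaded into a mapping tool
    df = pd.read_csv("alabama_interviews.csv")

    # Normalize column names so the tool recognizes them consistently
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

    # Parse dates, coercing malformed values to NaT instead of failing
    df["interview_date"] = pd.to_datetime(df["interview_date"], errors="coerce")

    # Mapping tools need coordinates, so drop rows that lack them
    df = df.dropna(subset=["latitude", "longitude"])

    df.to_csv("alabama_interviews_clean.csv", index=False)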

Anyway, in reviewing my previous posts, it was apparent that I found Voyant to reveal the most interesting aspects of the WPA Slave Narratives. This may be due to two things: 1. In my own research, I primarily do textual analysis, so what Voyant makes possible is a more expansive version of methods I already use; and 2. Because these are narratives, and the dataset is the full text of the narratives, it is a richer dataset compared to what I used in CartoDB and Palladio. This doesn’t mean, though, that mapping and network visualization were not useful approaches to the WPA Slave Narratives. The CartoDB maps that showed where the interviews were clustered within Alabama and where the interviewees were enslaved, and the CartoDB animated map that showed the time period in which the Alabama interviews were conducted, were revealing in ways the texts in Voyant were not, even if that information was available in the texts or the texts’ metadata. I recall trying to work with the differences between male and female interviewees and the subject matter of their interviews in Voyant (I don’t recall if it was successful – much of what I did in most of these tools was not), but graphing interview topics against interviewee gender (and, for that matter, type of slave) in Palladio was immediately and obviously informative. Information about specific interviewers and who they interviewed, which I think was also available in either the texts or the texts’ metadata in Voyant, was also much more obvious in Palladio.

The tools complement each other because they reveal distinctive aspects of the WPA Slave Narratives. Voyant reveals patterns in words, language, and discourse. CartoDB reveals geographies and spaces. Palladio reveals relationships. That sounds banal and inconclusive, but I think it is appropriate given that I’m still at a point where I see these tools primarily as exploratory and want to be careful about stating what they can and can’t do. Musher’s article on the context of the WPA Slave Narratives highlights the importance of understanding, appreciating, and respecting the context of the data you’re working with, as does Weingart’s post on when not to use networks. All of the projects we’ve looked at are very careful about historicizing and situating their digital projects and only using the methods and tools that make sense given the research question and data. As I alluded to in my definition of digital humanities, I think it’s important that the field as a whole push against dominant discourses of technological utopianism, and foregrounding context and contingency is one way to do that.

Network Visualization with Palladio

Palladio is a browser-based tool that allows you to create network visualizations. The process for creating a visualization is fairly straightforward; you upload data and can then visualize that data on a map or as a graph. The mapping feature is not particularly advanced, but it can provide geographic context for the network graphs, which can sometimes be a bit abstract.

I used Palladio to map data from the Alabama WPA Slave Narratives, including interviewee and interviewer names, where the interviewees were enslaved, where the interviews were conducted, the gender of interviewees and interviewers, the types of positions held by the interviewees (e.g. house, field, or not identified), the ages of interviewees, and the topics covered in the interviews. Since I did this for a class, I had instructions for uploading the data and connecting the datasets; if I hadn’t, I might have had more trouble with this. Fortunately, Palladio has a very helpful FAQ section, which I ended up reading in order to write this post. Once the data is uploaded, it’s very easy to generate multiple graphs based on that data. The challenging part is not creating the graphs but deciding what would be a meaningful graph. Again, because this was for a class, I had instructions for which items to choose as source and target, but I misread those once and ended up with a strange, blobby graph. I’m still not entirely sure of the difference between source and target and why I would use one or the other, since it doesn’t seem to matter with some graphs (e.g. I tried graphing topics vs. interviewee gender both ways and got the same graph). On at least one occasion, the graph I was instructed to make initially appeared to be a meaningless mess – I think it was the graph for topics and interviewees – but once I began moving the topics around, to the outside of the circle, it actually became quite illuminating. I could see which interviewees talked about specific topics, and then subsequent graphs of topics/gender of interviewee and topics/type of slave became much more interesting and obviously useful in terms of thinking about the WPA narratives. Here’s what that graph looked like before I started moving topics around:

[Screenshot: messy graph created using Palladio]
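
To get a better grip on what Palladio is doing with source and target, I sketched the same kind of graph in Python with networkx (the interviewee and topic names here are made up). Because the graph is undirected, source and target are interchangeable, which would explain why I got the same graph both ways:

    import networkx as nx

    # Hypothetical interviewee-topic pairs like the ones behind the graph above
    edges = [
        ("Interviewee A", "food"),
        ("Interviewee A", "work"),
        ("Interviewee B", "work"),
        ("Interviewee B", "family"),
        ("Interviewee C", "food"),
    ]

    # An undirected graph: edge order (source vs. target) doesn't matter
    G = nx.Graph()
    G.add_edges_from(edges)

    # A topic's degree shows how many interviewees mentioned it
    for topic in ("food", "work", "family"):
        print(topic, G.degree[topic])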

Graphs can be manipulated once you’ve created them. Nodes can be moved around to make the graph clearer. The graph can be zoomed in and out to tease apart connections or to see the broader picture. Sources or targets can be highlighted so as to more easily distinguish between them. Facets can be applied to the graph, so, for example, I could look at a graph of topics and interviewees and then apply a facet so I was only looking at the results for female interviewees or interviewees who had been house slaves. There are also timeline and time span facets, which I did not use; I don’t really have a good sense of what those do.

I think Palladio is designed to do a lot of thinking for the user, which can be really nice – there isn’t a huge learning curve, and I got some nifty-looking graphs out of it without doing much work. For me, though, that also meant that I had to think less and so I feel like I don’t understand as much about what it is showing in those nifty-looking graphs. A big drawback is also that because it is browser-based and you don’t have to create an account, there is not a good way to save your work. It looks like the only option is to download it as a .json file and then upload it again. There’s also not a good way to export your graphs – the FAQ recommends a screen capture (which is what I used to include the image above), but that doesn’t include the data behind the graph. I did download some of my graphs, but I’m not entirely sure how to open the file format (.svg). Overall, I feel like I need to work with this tool much more.