Down and Dirty Guide to Literary Research with Digital Humanities Tools: Text Mining Basics

Miao and Jason get things done with computers!

As part of the final Digital Pedagogy seminar of fall 2012, Margaret Konkol, Patrick McHenry, Olga Menagarishvili, and I will lead the discussion on “trends in the digital humanities.” You can find out more about our readings and other DH resources by reading our TECHStyle post here.

As part of my contribution to the seminar, I will give a demo titled “Down and Dirty Guide to Literary Research with Digital Humanities Tools: Text Mining Basics.” In my presentation, I will show how traditional literary scholars can employ computers, cameras, and software to enhance their research.

To supplement my presentation, I created the following outline with links to useful resources.

Down and Dirty Guide to Literary Research with Digital Humanities Tools: Text Mining Basics

  1. Text Analysis and Text Mining
    1. My working definition of text mining: “Studying texts with computers and software to uncover new patterns, overlooked connections, and deeper meaning.”
    2. What is Text Analysis: Electronic Texts and Text Analysis by Geoffrey Rockwell and Ian Lancashire
    3. Text mining on Wikipedia
    4. Text Mining as a Research Tool by Ryan Shaw (an excellent resource with a presentation and links to more useful material on and offline)
  2. Advantages to Digital Research Materials
    1. Ask Interesting Questions That Would Otherwise Be Too Difficult or Time Consuming to Ask
    2. Efficiency
    3. Thoroughness
    4. Find New Patterns
    5. Develop Greater Insight
  3. Types of Digital Research Materials
    1. Your Notes
    2. eBooks
    3. eJournals
  4. Digitizing Your Own Research Materials
    1. What to Digitize
      1. Primary Sources
      2. Secondary Sources
    2. How to Digitize
      1. Acquire
        1. Camera > high resolution JPG
        2. Scanner > high resolution TIFF or JPG
      2. Collate as PDF
        1. Adobe Acrobat X Pro (now XI!)
        2. PDFCreator
        3. Mac OS X Preview
      3. Perform Optical Character Recognition (OCR) to generate machine readable/searchable plain text
        1. Adobe Acrobat X Pro
          1. Print PDF to a letter-size PDF
          2. Tools > Recognize Text
        2. DevonThink
        3. Use Google
        4. Others?
      4. Save As/Export plain text > .txt files
      5. Engage the “Text” in New Ways
        1. New Ways of Seeing “Texts”
          1. Keyword Search
          2. Line Search
          3. Word Counts
          4. Concordance
          5. Patterns
        2. Tools to Help with Seeing “Texts”
          1. AntConc
          2. BBEdit (“It doesn’t suck” ®)
          3. Mac OS X and Linux: cat, find, grep, and print (use “man cat” and “man grep” in the Terminal to learn more. More info here, here, here, here, and here.)
          4. DevonThink
          5. Notepad++
          6. Mac OS X Spotlight/Windows 7 Search
          7. TextEdit
          8. Others?
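
To make the “New Ways of Seeing ‘Texts’” items above a little more concrete, here is a minimal Python sketch of a line search, a word count, and a simple concordance (keyword in context) run on plain text of the kind you get after OCR. It is only an illustration, not a method from the presentation: the function names (tokenize, line_search, word_counts, concordance) and the three-line sample text are invented for this example, and the tokenizer is deliberately crude. Dedicated tools such as AntConc do all of this with far more care.

```python
import re
from collections import Counter

# Invented sample text standing in for the contents of an OCR'd .txt file.
SAMPLE = (
    "The whale surfaced at dawn.\n"
    "No one saw the whale again.\n"
    "The sea was calm."
)


def tokenize(text):
    """Lowercase the text and split it into rough word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def line_search(text, keyword):
    """Return (line number, line) pairs for lines containing the keyword."""
    needle = keyword.lower()
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if needle in line.lower()
    ]


def word_counts(text, top=5):
    """Return the `top` most frequent words with their counts."""
    return Counter(tokenize(text)).most_common(top)


def concordance(text, keyword, width=2):
    """Return (left context, keyword, right context) for every hit."""
    tokens = tokenize(text)
    target = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


if __name__ == "__main__":
    print(line_search(SAMPLE, "whale"))
    print(word_counts(SAMPLE, top=2))
    print(concordance(SAMPLE, "whale"))
```

To try it on your own material, replace SAMPLE with the contents of one of your exported .txt files, for example by reading the file with open(path, encoding="utf-8").read(), where path is wherever you saved it.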
Miao awaits digitization.