Text Analysis for the Geeks
print('Thank you based Guido van Rossum')
For those of you who don’t know who Guido van Rossum is, he is the genius who created Python! (To be honest, I didn’t know that either, but Google did!)
While he may not be that well known in digital humanities circles, his creation has been used to its fullest extent by all you word counters and Project Gutenbergers out there. Text analysis wouldn’t exactly be impossible without him, but he allowed us to do it in a prettier, simpler, and, shall I say, more Pythonic way.
But I’ll put my excited Computer Science self aside and talk about the purpose of my little rant (or writing assignment, if my professor is reading this. P.S. Please gift me an A). At this point you probably have no idea what all these words are about, so I shall tell you now!
In digital humanities, text analysis is a huge deal. And when I say huge, I mean HUUUGE. However, could you imagine going through the entire Sherlock Holmes series and counting every word (omitting common words like “a” or “the”), counting the number of paragraphs, the average number of paragraphs per chapter, or even the number of times a single word has been used (by the way, the words “small” and “little” are used A LOT!)?! That is why computational analysis is such a great tool. It takes the mundane tasks out of text analysis so that we can focus on what is actually important: the ANALYSIS! Sure, we can have the computer tell us that Sir Arthur Conan Doyle likes to use the word “small”, but what does that actually mean?
While it would be nice if we could write a Python script that parses the entire Sherlock Holmes series and explains the motives of the characters or the underlying themes of the books, a computer cannot process that kind of information (not yet, anyway). So for now, we can use the computer as an aid that helps us find information we might otherwise miss and build a more complete analysis of whatever text we feed it.
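For the curious, here is a minimal sketch of what those mundane counting tasks could look like in Python. Everything in it is an assumption made for illustration: the tiny stop-word list, the made-up sample chapters (not actual Conan Doyle), the helper names, and the idea that paragraphs are separated by blank lines. A real project would load the full text and use a much longer stop-word list.

```python
import re
from collections import Counter

# A tiny, hand-picked stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "and", "of", "to", "in", "it", "was", "is"}


def split_paragraphs(text):
    """Split text into paragraphs, assuming blank lines separate them."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def word_counts(text, stop_words=STOP_WORDS):
    """Count how often each word appears, ignoring case and stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)


def average_paragraphs_per_chapter(chapters):
    """Average number of paragraphs across a list of chapter strings."""
    if not chapters:
        return 0.0
    return sum(len(split_paragraphs(c)) for c in chapters) / len(chapters)


# Made-up sample text standing in for a real book (hypothetical data).
chapters = [
    "It was a small room.\n\nThe little dog sat in the small chair.",
    "A small man opened the door.\n\nHe was quiet.\n\nThe little lamp flickered.",
]

counts = word_counts("\n\n".join(chapters))
print(counts["small"], counts["little"])          # 3 2
print(average_paragraphs_per_chapter(chapters))   # 2.5
```

The script happily reports that “small” shows up three times in the sample, but deciding what that means is still up to us.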