“kiss dark torn sea crenellations gods walk disappointment eluded colours”
The ten words above represent a “topic” generated by the MALLET Topic Modeling Tool (TMT) from a 546-word test file that I put in called “Poem ideas.txt.” The tool generates topics by searching for significant clusters of words in the texts you put in (you can find out how that works in more detail here and here). Then it makes spreadsheets that show you the topics and how relevant they are to each “doc,” a chunk of text that can vary in size from a few sentences in one file to the entire contents of multiple files.
Whizzbang! you might think, but what good is it? And if you’re a) a careful poet, b) an anti-poet, or c) not a fan of abstract art, you might add, It’s just ten random words with loose associations.
“A hit, a very palpable hit!” The TMT is not meant for small inputs; it is more suited to dealing with inputs like all of gothic fiction, or ten thousand emails, or issues of National Geographic from 1960-2014. Nor does the quoted string of words say anything particularly new or particularly well. The answer to the question, “What good is this ‘found poetry’?” is that it is no good at all, either from the statistician’s perspective, or from a conservative poet’s perspective, like my own.
It is, however, a lovely surprise. In the last place I thought to look for the ingredients of an epic poem, I found conflict in a relationship (kiss, torn, dark, disappointment), larger-than-life characters (gods, walk, eluded), and environment (crenellations, dark, sea, colours). The funny thing is that this cluster doesn’t really represent what it appears to represent: a coherent story. The text it models is a disconnected collection of lines I thought up. I’m not even sure how crenellations made the cut – I don’t believe I’ve used that word more than once in the whole of my poetic endeavors, let alone in the single input file.
If you find yourself nodding at the connections I drew between the words of the cluster, you can probably imagine the program’s usefulness as a heuristic: not just for literary critical argument – digital humanities is all over that – but for creative writing. The TMT has adjustable settings, like the number of topics you want to display, the topic proportion threshold, and the words you want the program to ignore (articles, prepositions and proper nouns, maybe). Here are some of the other topics I’ve generated while playing with settings, file types and larger sets of files:
- sky mark crash deafening drunk cat curled hoofbeats hounds trumpets
- encyclopedia trevisa medieval principles greek roman memory bowker type detail
- settings vk mc li styleswitheffects zx nk gg jvm fj
- body somme work hondes qualitees touchinge goode liknes ordre litil
- fear kingdom eliot print stars norton fairy harder context made
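If you want to picture what two of those settings do (the ignored words and the topic proportion threshold), here is a toy sketch in Python. It is not how MALLET works internally: MALLET infers its topics statistically, whereas this sketch takes two topics as given and only measures how much of a doc each one covers. The stopword list, the function names, the topic labels "epic" and "hunt", and the 0.05 threshold are all my own assumptions for illustration.

```python
import re

# Hypothetical list of words to ignore; the real tool has its own list.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to"}

def tokenize(text, stopwords=STOPWORDS):
    """Lowercase the text, split it into words, and drop the ignored words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in stopwords]

def topic_proportions(doc, topics, threshold=0.05):
    """Return the share of a doc's kept words found in each topic.

    Topics whose share falls below the threshold are left out, which is
    roughly what a topic proportion threshold is for.
    """
    tokens = tokenize(doc)
    if not tokens:
        return {}
    shares = {}
    for name, words in topics.items():
        hits = sum(1 for t in tokens if t in words)
        shares[name] = hits / len(tokens)
    return {name: s for name, s in shares.items() if s >= threshold}

# Two topics borrowed (in part) from the word clusters in this post.
topics = {
    "epic": {"kiss", "dark", "torn", "sea", "gods"},
    "hunt": {"sky", "crash", "hounds", "trumpets", "hoofbeats"},
}

doc = "The gods walk in the dark and the sea is torn"
print(topic_proportions(doc, topics))
```

In this made-up doc, four of the six words that survive the stopword filter belong to the "epic" topic, so it is the only topic reported; "hunt" scores zero and falls under the threshold.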
For a while now my creativity has needed a kick in the arse, and I’m tickled pink that it was a thing so heavily invested in numbers that did it. Honestly, I shouldn’t be surprised; numbers have been kicking my butt since I met them, but hey. This is cool.