Friday, June 21, 2013

Finite Haiku

So a funny thing happened in my brain today. I wondered... given that a haiku has exactly 17 syllables all up (5/7/5), and that the English language has a certain number of words/sounds, how many unique haiku exist?

Well... I'm still not sure, but I did some poking around to work out the worst case scenario. The worst case is that English is a living language, and that a haiku is simply a collection of 17 syllables, each from a pool of all possible syllables in the language. While this is very simplistic, and throwing random syllables together is unlikely to produce words, let alone meaning, it does future-proof against new words that may turn up. And it does provide a maximum number.

So how many syllables are there in English? Turns out that I couldn't find any definitive answer, but I did find an article here http://semarch.linguistics.fas.nyu.edu/barker/Syllables/index.txt which again is kind of a worst case scenario. The author refers to 15,831 syllable candidates. This does seem rather large, and I'd be interested if someone else has any good sources on something more accurate.

So if we take this worst case of 15,831 syllable candidates, and we have 17 positions to fill, again using a worst case scenario that any syllable can follow any syllable, we end up with 15,831^17 unique haiku (15,831 choices for each of the 17 positions) - which will include both the nonsense ones and also every possible sensible haiku. That's not a number most calculators will take without falling over. Luckily, Wolfram Alpha is obliging: it comes to about 2.5*10^71. A stupidly large number.

How stupidly large? Well... let's compare it to some other things. For the bridge players out there, there are 5.4*10^28 unique bridge deals. For the chess players, it's estimated that there are 1*10^120 unique chess games. So the haiku ceiling is far beyond bridge, though still well short of chess.

So I'd like to make the number more accurate, but I'm not sure how. Any suggestions?
Maybe if I could find the average number of syllables in a word (not as it occurs in everyday usage, but across the whole English vocabulary), I could use that and the total number of unique words. Any other ideas?
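
If you'd like to check the arithmetic yourself, here's a rough Python sketch. With 17 positions and 15,831 choices for each one, the count is the pool size raised to the number of positions, i.e. 15,831^17. Swapping the two gives 17^15,831, which is a wildly bigger number (the script prints how many digits it has) and counts something else entirely. The second half is one possible take on the word-based idea: rather than an average, it counts ordered word sequences that add up to 5, 7 and 5 syllables, assuming no word straddles a line break. The function names are my own, and the word counts at the bottom are made-up placeholders, not real English data.

```python
from math import comb, log10

SYLLABLE_POOL = 15_831  # syllable candidates quoted in the linked article
POSITIONS = 5 + 7 + 5   # 17 syllables in a haiku


def syllable_bound(pool: int, positions: int) -> int:
    """Count strings of `positions` syllables, each chosen freely from `pool`."""
    return pool ** positions


def sci(n: int) -> str:
    """Show a large positive integer as d.dd*10^e (truncated, not rounded)."""
    s = str(n)
    return f"{s[0]}.{s[1:3]}*10^{len(s) - 1}"


def count_lines(words_by_syllables: dict[int, int], target: int) -> int:
    """Count ordered word sequences whose syllables add up to `target`.

    `words_by_syllables` maps a syllable count to the number of distinct
    words with that many syllables.
    """
    ways = [1] + [0] * target  # ways[0] = 1: the empty sequence
    for total in range(1, target + 1):
        for k, n_words in words_by_syllables.items():
            if 1 <= k <= total:
                # Pick any k-syllable word last, then fill the rest.
                ways[total] += n_words * ways[total - k]
    return ways[target]


def word_bound(words_by_syllables: dict[int, int]) -> int:
    """Haiku count if each 5/7/5 line is made of whole words."""
    five = count_lines(words_by_syllables, 5)
    seven = count_lines(words_by_syllables, 7)
    return five * seven * five


haiku = syllable_bound(SYLLABLE_POOL, POSITIONS)
bridge = comb(52, 13) * comb(39, 13) * comb(26, 13)  # four hands of 13 cards

print("syllable bound:", sci(haiku))
print("bridge deals:  ", sci(bridge))
# Digit count of the swapped expression 17^15,831, via logarithms:
print("digits in 17^15831:", int(SYLLABLE_POOL * log10(17)) + 1)

# MADE-UP placeholder counts, only to show the call; not real English data.
toy_counts = {1: 1000, 2: 3000, 3: 2000}
print("toy word bound:", sci(word_bound(toy_counts)))
```

To get a real figure out of word_bound you'd need an actual tally of English words by syllable count, for example from a pronouncing dictionary. I don't have one, so that part is left open.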