We know the rules and statistics of English: which words go together, which sounds the language employs, and which pairs of letters appear most often. (Q is usually followed by a u, for example, and “quiet” is rarely followed by “bulldozer.”) There are only so many translation schemes that will work with these grammatical parameters. That narrows the number of possible keys from billions to merely millions.
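To make the letter-pair statistics concrete, here is a minimal Python sketch. It is an illustration, not Knight's code. It counts adjacent letter pairs in a sample sentence and reports which letter most often follows a given one. The function names `letter_bigrams` and `most_likely_follower` and the sample sentence are hypothetical; a real system would estimate these statistics from a large English corpus.

```python
from collections import Counter


def letter_bigrams(text: str) -> Counter:
    """Count adjacent letter pairs inside each word (case-insensitive)."""
    counts = Counter()
    for word in text.lower().split():
        letters = [ch for ch in word if ch.isalpha()]
        counts.update(zip(letters, letters[1:]))  # pairs like ('q', 'u')
    return counts


def most_likely_follower(counts: Counter, first: str) -> str:
    """Return the letter seen most often right after `first`."""
    followers = {b: n for (a, b), n in counts.items() if a == first}
    return max(followers, key=followers.get)


sample = "The quiet queen quickly quit the quarry"
counts = letter_bigrams(sample)
print(most_likely_follower(counts, "q"))  # → u
```

Scores like these are what let a decipherment system discard candidate keys that would produce unlikely letter sequences.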
The next step is to take a whole lot of educated guesses about what the key might be. Knight uses what’s called an expectation-maximization algorithm to do that. Instead of relying on a predefined dictionary, it runs through every possible English translation of those Russian words, no matter how ridiculous; it’ll interpret as “yes,” “horse,” “to break dance,” and “quiet!” Then, for each one of those possible interpretations, the algorithm invents a key for transforming an entire document into English—what would the text look like if meant “break dancing”?
The algorithm’s first few thousand attempts are always way, way off. But with every pass, it figures out a few words. And those isolated answers inch the algorithm closer and closer to the correct key. Eventually the computer finds the most statistically likely set of translation rules, the one that properly interprets as “yes” and as “quiet.”
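The guess-and-refine loop can be sketched in miniature. The Python below is a toy built on stated assumptions: it works on a simple letter-substitution cipher rather than Russian words, uses a letter-bigram model estimated from a one-line English sample, and re-estimates only the probability that each English letter is written as each cipher symbol. The E-step (forward-backward) scores every possible plaintext letter at every position; the M-step rebuilds the key from those scores. All names (`bigram_model`, `forward_backward`, `em_decipher`) and the sample text are hypothetical, and this is not Knight's implementation.

```python
import math
import random


def bigram_model(sample, alphabet, smoothing=0.1):
    """Hypothetical helper: letter start/transition probabilities from a sample.

    Adding `smoothing` to every count keeps all probabilities positive.
    """
    letters = [ch for ch in sample.lower() if ch in alphabet]
    start = {a: smoothing for a in alphabet}
    trans = {a: {b: smoothing for b in alphabet} for a in alphabet}
    for a in letters:  # letter frequencies double as the start distribution
        start[a] += 1
    for a, b in zip(letters, letters[1:]):  # adjacent-pair counts
        trans[a][b] += 1
    total = sum(start.values())
    start = {a: v / total for a, v in start.items()}
    for a in alphabet:
        total = sum(trans[a].values())
        trans[a] = {b: v / total for b, v in trans[a].items()}
    return start, trans


def forward_backward(obs, start, trans, emit, states):
    """E-step: posterior P(plain letter at t | ciphertext) and the log-likelihood."""
    alpha, scale = [], []
    for t, c in enumerate(obs):
        if t == 0:
            a = {p: start[p] * emit[p][c] for p in states}
        else:
            prev = alpha[-1]
            a = {p: sum(prev[q] * trans[q][p] for q in states) * emit[p][c]
                 for p in states}
        s = sum(a.values())
        alpha.append({p: v / s for p, v in a.items()})  # rescale to avoid underflow
        scale.append(s)
    beta = [None] * len(obs)
    beta[-1] = {p: 1.0 for p in states}
    for t in range(len(obs) - 2, -1, -1):
        nxt = obs[t + 1]
        beta[t] = {p: sum(trans[p][q] * emit[q][nxt] * beta[t + 1][q] for q in states)
                   / scale[t + 1] for p in states}
    gamma = [{p: alpha[t][p] * beta[t][p] for p in states} for t in range(len(obs))]
    return gamma, sum(math.log(s) for s in scale)


def em_decipher(cipher, start, trans, iterations=20, seed=0):
    """Learn P(cipher symbol | plain letter) by EM, holding the bigram model fixed."""
    states = list(start)
    symbols = sorted(set(cipher))
    rng = random.Random(seed)
    emit = {}
    for p in states:  # random starting key, so the first guesses are way off
        w = {c: rng.random() + 0.5 for c in symbols}
        total = sum(w.values())
        emit[p] = {c: v / total for c, v in w.items()}
    history = []
    for _ in range(iterations):
        gamma, loglik = forward_backward(cipher, start, trans, emit, states)
        history.append(loglik)
        for p in states:  # M-step: expected counts, renormalized per plain letter
            counts = {c: 0.0 for c in symbols}
            for c, g in zip(cipher, gamma):
                counts[c] += g[p]
            total = sum(counts.values())
            emit[p] = {c: v / total for c, v in counts.items()}
    gamma, _ = forward_backward(cipher, start, trans, emit, states)
    guess = "".join(max(g, key=g.get) for g in gamma)  # most probable letter per position
    return guess, emit, history


ALPHABET = "abcdefghijklmnopqrstuvwxyz"
sample = "the quick brown fox jumps over the lazy dog while the quiet queen quits"
start, trans = bigram_model(sample, ALPHABET)
key = dict(zip(ALPHABET, random.Random(1).sample(ALPHABET, len(ALPHABET))))
message = "thequietdogjumps"
cipher = "".join(key[ch] for ch in message)
guess, emit, history = em_decipher(cipher, start, trans)
print(f"log-likelihood: {history[0]:.2f} -> {history[-1]:.2f}")
print("best guess:", guess)
```

The recorded log-likelihood never decreases from one pass to the next, which is the defining guarantee of expectation-maximization. With this little training text and ciphertext, the toy should not be expected to recover the message; real decipherment needs far more text and a richer language model.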
The algorithm can also help break codes, Knight told the Uppsala conference; generally, the longer the cipher, the better it performs. So he casually told the audience, “If you’ve got a long coded text to share, let me know.”
“Funny,” Schaefer said to Knight at a reception afterward. “I have just the thing.”