ICCM Paper on a Computational Model of Crossword Play

This summer, we published a study in the Proceedings of the International Conference on Cognitive Modeling (ICCM) about some of our crossword puzzle experiments and models. In the study, we had crossword experts and novices solve individual crossword clues, and we developed a computational model of performance based on the knowledge of crossword players.

The basic results are shown on the right. When we gave crossword experts answers that were mostly filled in (only a few missing letters; the 'E' circles), they were on average above 80% correct on their first try. Giving them an easy rather than a difficult clue had a modest impact here, raising their accuracy to above 90%. Clue difficulty had a large impact, however, when we gave them difficult letter clues (only one or two letters were given; the 'D' circles), raising their accuracy from below 40% to close to 80%. This indicates that both 'routes', semantic and orthographic, are critical for solving puzzles.

We also developed a model that views crossword solving as a form of recognition-primed decision making. On this view, cues (letter parts and semantics) activate possible answers, which then have to be modified and filtered to determine the 'correct' response. The model does this by using Matt Ginsberg's clue database, which he develops for puzzle creators and which also forms a critical part of his 'Dr Fill' bot.
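To make the activate-then-filter idea concrete, here is a minimal Python sketch. It is not the model from the paper: the tiny clue list, the function names, and the rule of ranking candidates by how often a clue-answer pair occurs are all assumptions made for illustration, and the real clue database is far larger.

```python
import re
from collections import Counter

# Hypothetical toy clue database: (clue, answer) pairs as they might recur
# across many puzzles. These entries are invented for illustration only.
TOY_CLUE_DB = [
    ("iron clothes", "ARMOR"),
    ("iron clothes", "ARMOR"),
    ("iron clothes", "PRESS"),
    ("knight's protection", "ARMOR"),
    ("love, in rome", "AMORE"),
]

def semantic_candidates(clue, db):
    """Semantic route: answers previously paired with this clue, with counts."""
    return Counter(answer for c, answer in db if c == clue.lower())

def orthographic_candidates(pattern, db):
    """Orthographic route: answers that fit a letter pattern ('?' = unknown letter)."""
    regex = re.compile("^" + pattern.upper().replace("?", "[A-Z]") + "$")
    return Counter(answer for _, answer in db if regex.match(answer))

def solve(clue, pattern, db):
    """Activate candidates from both cue types, filter, and pick the most frequent."""
    semantic = semantic_candidates(clue, db)
    orthographic = orthographic_candidates(pattern, db)
    # Keep semantically activated answers that also fit the given letters.
    fitting = {a: n for a, n in semantic.items() if a in orthographic}
    # Assumption: if the clue activates nothing that fits, fall back on letters alone.
    pool = fitting or dict(orthographic)
    if not pool:
        return None
    return max(pool, key=pool.get)

print(solve("iron clothes", "A???R", TOY_CLUE_DB))
```

In this toy setup, a clue-answer pair that appears more often in the database gets a higher count and so wins the ranking, which is one simple way that corpus frequency could make a clue easier.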

Our basic model shows a few things. First, the large effects of semantic and orthographic difficulty can be accounted for by corpus statistics. That is, one of the things that makes a hard clue more difficult is how infrequently it appears in crossword puzzles. This means that a tricky clue (like 'iron clothes=ARMOR') becomes easier the more it appears in puzzles. Second, the number of letters given to start with has a large impact. In our current study and model, we find little evidence that these two types of information work together. Each is important, and neither can solve the problem on its own, but to a first approximation, people appear to solve the clue via _either_ a semantic or an orthographic route.

