Crossword Science: Error detection, backtracking, and expert problem solving.

As I discussed in an earlier posting about a computer model of Sudoku, computers can solve puzzles very differently than humans do. Because computers can be very fast and do a good job of systematically exploring a problem, you can give let them churn on a problem and exhaustively figure out the answer. For something like Sudoku, it is easy to figure out if the answer is correct (it is just simple math, or really, it is just pattern sorting). Problems like Sudoku are sometimes called 'well-structured', meaning that you can tell if you are right or wrong. Other puzzles, problems, and tasks are ill-structured; anything that is not well-structured.

Allen Newell and (Nobel Prize Winner) Herb Simon did research extensively on problem solving starting in the 1950s, and at least by 1958 described a well-structured problem as (roughly):
1. It can be described in numerical terms
2. The goals can be described in a well-understood cost (objective) function
3. There are deterministic algorithms that can be used to solve the problem.

Although there are many many well-structured problems whose solutions are now routinely done by computers and they make our world a better place (just for example, inventory control, your GPS, supply chain management, data compression, database access, etc.), many very important problems are ill-structured. Things like "who should I marry?", "how do you achieve peace in the Middle East?", "how can you prevent an economy from going into recession?", "how do you save someone from a burning building?". And among these ill-structure problems are crossword puzzles. One of the reasons crosswords are ill-structured is requirement #2; the goalstate cannot be described by a well-understood objective function. The multiple constraints for every letter help make the right answer easier to detect, but it is not uncommon to for a few 'mistakes' to creep in, even for very good players.

Without a clear objective function, it is difficult to do a computer-style search to solve crosswords. You can fill in random letters, check to see if they work or if you are getting closer to the answer, back up if you aren't, and repeat. As such, we humans are very reluctant to write down answers that we are not confident about, just to explore the space, even though the stakes are low. In other domains of expertise, where the stakes are high, experts may similarly be unwilling to search possibilities and backtrack. Many times, a surgeon or a CEO or a small business owner cannot afford to solve their problems through this kind of search. What really good experts often do is rely on their intuition and experience to recognize similar situations from the past, and execute without having the search. And this is what we see with good crossword players; they rarely make errors and backtrack. Even novice players who make more errors don't use this as a deliberate strategy and their backtracking is nothing compared to what traditional AI approaches would do.

This brings me to the Sunday April 5 NYT puzzle by Patrick Berry, which led me to do some backtracking. This was a nice puzzle with a theme involving reinterpreting some common phrases and relying on homophony. There was also a fairly opaque clue to 57D: Offensive Line Striker. _ _ _ _ _ _. I was fairly confident about C E N _ _ R, and the answer CENTER seems like it could have worked, either for Football (which has an offensive line and a center position) or Soccer (which I know has a striker and a center). But the crossing words would not fit (see inset)

Obviously, something did not fit. SAST and LOESED aren't words know. Actually, LOESS is the name for a statistical smoothing algorithm I do know, but it doesn't fit 78A Unfettered, and 72A Answers that may anger is wrong too.. Because we don't often make errors, and CENTER seemed to fit, detecting there was an error and figuring out where it was took some time. After deleting the two incorrect letters I was unsure about and working it from the across clues, I finally got LOOSED and SASS, which gives 57D CENSOR. Of course, I had been tricked by the clue to something that was nearly the right answer but not quite. I wonder if Patrick did this on purpose?

Clues like this--ones that complete to two similarly lettered answers, are not uncommon, and I like to collect them when I see them. By chaining these together, I think a constructor could really create a garden-path puzzle, one where a single mis-step will lead to others, which will lead to others, but eventually will prevent solving. Here are some less pernicious doublets I've collected; words that might have the same clue, and also share length and at least one letter

APEX/ACME (Clued as Top)
ACHE/AGUE: (Flu malady)
ETTE/ENNE (feminine suffix)
ILLS/AILS: troubles
REC/REW tape recorder button
GLUM/BLUE/GRIM (sad, far from cheerful)
GALLONS/GASCANS (fuel volumes)
COLT/CALF Newborn barn animal
ONION/OLIVE: cocktail garnish
ROTC/OPEC: driller's organization
DENY/DEFY: contradict

There was a great puzzle on April 1 that included a bunch of these, and could actually be solved multiple ways depending on your choice.

Crossword Science

Saturday, April 5, 2014

Error detection, backtracking, and expert problem solving.

No comments:

Post a Comment