WordSearch is NP-hard
(If you don't know what "NP-hard" or "NP-complete" means, or if
you have some other problem with my proof, you can check the reference
I give at the very end of this message, which introduces these notions;
you can also ask me for clarifications)
INTRODUCTION:
------------
I'm considering here a slight generalisation of the
WORDSEARCH problem from the POTM contest.
Look there for more precise rules; the only
difference is that here, the board size is unbounded
(so that 'NP-completeness' considerations make sense):
(generalized) WordSearch:
given
* a variable-sized grid NxM filled with letters
(in 'A'..'Z' - but fewer letters would do),
* a fixed number of words whose length is bounded
(say, at most 500 words of length at most 80 - but
much lower bounds would still do, as the proof will
show)
you must find disjoint, connected copies of the words
in the grid so as to cover as many letters as possible.
THEOREM:
-------
The (generalized) WordSearch decision problem
("is there a solution that covers N cells ?")
is NP-complete.
As a consequence, (generalized) WordSearch is NP-hard, and
no polynomial-time algo is guaranteed to find the
optimal solution (unless P==NP).
PROOF:
------
The WordSearch decision problem is obviously in NP, as any
claimed solution can be checked in polynomial time. We must
now prove it is NP-hard.
A boolean circuit is a graph made of wires, logical gates,
etc; that produces a boolean output value from boolean input
values. It is "satisfiable" if it can yield output "true"
for some input.
Checking the "satisfiability" of a boolean circuit (BCS)
is known to be a NP-complete problem.
To prove that the WordSearch decision problem is NP-hard
I'll reduce BCS to it, by encoding a BCS problem as a
WordSearch decision problem.
in particular each polynomial-time algo for the decision problem
of WordSearch ("is there a solution that covers N cells ?")
will provide a polynomial-time algo for BCS.
I'll be using a limited number of letters and of words;
only the size of the board will be allowed vary.
letter 'Z': this letter is part of no word: it just fills
the space left outside of the boolean circuit.
I will translate a boolean circuit C
in a WordSearch problem W in such a way that
condition
(1) the circuit C is satisfiable
(i.e. you can attribute values to the inputs so that
the output value is "true")
will be equivalent to condition
(2) in W you can find words that cover all letters
which are not 'Z'.
or (2') there is a solution of W that covers n cells
(where n = numbers of cells that are not 'Z')
'A','B','C': these letters form the wires.
One word: "ACB" is the only word in which 'C' appears.
each wire has the form
ACACAC.....CACACA
B B B B B B
(ending with 'AB' on both ends)
(you must imagine that the blanks are filled with 'Z')
There are only two ways to cover all 'C' is a wire:
all pieces will have shape
AC ("one" or "true")
B
or all pieces will have shape
CA ("zero" or "false")
B
( or also CA or AC but these will be "forbidden" below; )
( B B )
so a wire really transmits information from one end to the other.
You assign a direction to each wire; then, say it carries a "one"
if all pieces read "ACB" in the direction you chose, and say it
carries a "zero" if all pieces read "BCA" in the direction you chose.
Wires can be bent, you just have to be careful about not bending them
too narrowly; each 'A' or 'B' should touch only the 'C's to which they
are meant to be connected.
e.g.
...ACACAC
B B BAB
C
AB
...
Logical gates are really simple to implement.
They will all use letter 'E' (read 'equal')
For instance the "and" gate will use letters 'E' (denoted '='
for convenience) and 'F' (denoted '&'):
B B
in: ..ACAC
AB
& BCBCBC.. :out
&===A A A
AB
in: ..ACAC
B B
The words will describe the four ways to connect input
to output:
"A&&A===B"
"A&&B===A"
"B&&A===A"
"B&&B===A"
For each entry of the gate, a word covers either 'A' or 'B'
but not both. That is the trick that forbids pieces
CA and AC
B B
in my first presentation of wires: such pieces would force
to cover both 'A' and 'B' and the same end of the wire, and
no piece could then cover the adjacent logical gate.
Note that in the words with '&', 'A' represents 'true' on input
but 'false' on output, and conversely for 'B', because of the way
'true'/'false' are defined on wires.
Defining all 16 possible logical gates with two inputs should
now be easy...
'Not' gates are even easier: introduce letter 'D' and words
"ADA" and "BDB". Putting a 'not' gate in a wire is just
a matter of replacing a 'C' by a 'D'.
You must also be able to split wires:
C...
AB
A S
...CB===S with new letter 'S' and words "A===BSSB" and "B===ASSA"
AB (this is sort of a reversed logical gate)
C...
One last problem is how to cross wires without interference:
you can do this as follows:
...
C
AB
T
A AXXT A A A
..BCB TXXBCBCB...
T
AB
C
...
with new letters 'X', 'T' and words
"AXXXXB"
"ATTTTB"
(exercise: you can implement crossings
with only one letter and one word)
So, I showed how to translate the circuit itself, but what
about inputs and output ? We have to end wires in such a way
that, on output, only value "true" is admitted:
...CACACACACB (no need for new letter or words !)
B B B B
or, on input, both values are admitted:
IACAC...
B B
(new letter 'I' and words "IA","IB").
(Remember that a choice is "admitted" if and only if it
still allows you to cover all non-'Z' cells with words.)
To complete the proof, you have to show formally that
a boolean circuit can be translated in polynomial time
in a WordSearch problem, which size is bounded by a
polynomial of the size of the original circuit.
According to R. Kaye (see reference below) this should
be "straightforward" so you must believe me ;-)
END of sketch of proof.
WARNING:
-------
The details of the proof as published on the POTM message board
where not fully correct: the logical gates had unexpected (and
unwanted) covering solutions. So I tripled the '=' in them
(cf. 'and' and 'split' gates.), and completely rewrote the
'crossing' gate (these modifications also make the proof
somewhat clearer).
CONCLUSION REMARKS:
-------------------
Note that the proof needs only a very limited number of
letters and words.
Of course, the fact that the general WordSearch problem is NP-hard
does not imply that all instances of it will be difficult. It all
depends on how Fred will chose his data ;-)
As well, there may still exist a polynomial-time algorithm that
always finds a near-optimal solution. For instance, it is
quite easy to find near-optimal solutions to my
translated boolean circuits: you loose at most one cell per
wire - and this only when you cannot attribute a coherent
boolean value to the wire.
(Don't dream too much of finding such a polynomial-time
approximation algorith: I'm ready to bet they do not exist ;-)
ACKNOWLEDGEMENT:
---------------
The proof was inspired by "Minesweeper is NP-complete",
paper of Richard Kaye where he shows that determining if
a particular MineSweeper board is possible is a NP-complete
problem. To prove MineSweeper is NP-hard, he encodes
boolean circuits as MineSweeper positions. He also
introduces to notion of NP-completeness for those who are
interested.
This paper used to be available at
http://for.mat.bham.ac.uk/R.W.Kaye/publ/minesw.pdf,
but is apparently no longer available electronically (for
copyright reasons ?). However, Richard Kaye discusses
Minesweeper on his site.