P and NP
Posted on August 24, 2010
by Tommy McGuire
In my last post, I made a short, not particularly funny comment relating to Vinay Deolalikar's recent possible proof that P does not equal NP. The possible proof was also briefly the topic of conversation at work, replacing Megan Fox's thumbs and similar issues of greater import. Subsequently, I realized that I had not seen a reasonable, halfway comprehensible description of the P/NP problem. I would like to remedy that situation with this post. (I would like to, but I probably won't.)
To understand the P/NP problem, you need to at least partially understand two things: the difference between determinism and nondeterminism in computer science and how to compare the efficiency of algorithms.
Deterministic and Nondeterministic Machines
Figure 1: A deterministic finite state machine.
In the machine shown in figure 1, there are three states. The initial state is indicated as usual by some sort of arrowhead pointing to the state (although I have drawn it looking something like rabbit-ears on state 1), and the accepting state is indicated by a circle around the state. Transitions are represented by arrows labeled with the corresponding input character.
- The machine starts in state 1 and matches the input string if it finishes in state 3.
- This machine matches the language "any combination of a's and b's, with at least one b in the middle and ending with a b", or "a*b(a|b)*b" over the alphabet {a,b}.
- Every state has exactly one outgoing arrow for every letter in the alphabet.
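Executing a deterministic machine is trivial, since there is never a choice to make. Here is a minimal sketch in Python; the transition table is my reconstruction of figure 1 from the description above, so treat the details as illustrative:

```python
# Reconstruction of the figure 1 machine: a*b(a|b)*b over {a,b}.
DFA = {
    (1, "a"): 1, (1, "b"): 2,   # state 1: no b seen yet
    (2, "a"): 2, (2, "b"): 3,   # state 2: some b seen, not accepting
    (3, "a"): 2, (3, "b"): 3,   # state 3: accepting
}
START, ACCEPTING = 1, {3}

def dfa_match(s):
    """Run the machine; accept if it finishes in an accepting state."""
    state = START
    for ch in s:
        state = DFA[(state, ch)]   # exactly one transition per (state, letter)
    return state in ACCEPTING

assert dfa_match("abb") and dfa_match("bab") and not dfa_match("bba")
```

Each input character is examined exactly once, and the only memory needed is the current state.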
A nondeterministic machine like figure 2, on the other hand, does not follow that rule. This machine is based on the first, but with an additional state 3 (and the original state 3 renamed to 4) and some extra transitions added.
Figure 2: A nondeterministic finite state machine.
- In state 1, the machine has two possible transitions for the letter "b".
- In state 3, the machine has an unlabeled transition to state 2. This can be interpreted as meaning that the machine can spontaneously change to state 2 without consuming any input.
- Also in state 3, the machine has no transition for input letter "a". As a result, if it finds itself in state 3 facing an "a", it can do nothing; that execution path simply fails.
So how does a nondeterministic machine actually run? There are (at least) three ways to look at it:
- At each state, if there is only one valid transition, it is taken normally. If there is more than one valid transition, the machine records the current state and input position and takes an arbitrarily chosen transition. If there are no valid transitions, or if the machine is at the end of the input string and is not in an accepting state, the machine fails, returns to a previously recorded state, and restarts with another transition. The failure and restart is backtracking.

Backtracking is frequently used when implementing nondeterministic algorithms, including some regular expression engines. However, at the moment I am only concerned with whether or not the machine can match the input string, not with how it does it, making backtracking less interesting.
- At each state, the machine takes all possible valid transitions. As a result, the next "state" will be a set of states, and the subsequent group of valid transitions will be all of those valid from any state in the set. For example, in state 1 looking at a "b" in the input, figure 2 enters the combination state {2,3}. If the next character is an "a", state 3 fails, leaving only state {2}. The machine is said to accept the string if the accepting state is a member of the set of current states when it reaches the end of the string. (There is a code sketch of this model after these notes.)

The idea of creating a combination of all possible states lies at the base of the algorithm for converting a nondeterministic finite state machine into a deterministic machine. It is also at the core of Russ Cox's Regular Expression Matching Can Be Simple And Fast engine (and that page has a lot of good information about finite state machines). Ultimately, since the only interesting fact about the finite state machine is the language it matches, this view is the one normally chosen.
- Finally, the machine could metaphorically make use of a hypothetical oracle. At each state, the machine simply knows which transition to take in order to ultimately end in an accepting state, if that is possible for the input string. For most uses, this viewpoint is not particularly interesting, but it will become more important when I get to the "NP" bit.
- This finite state machine actually matches the same language as that of figure 1. (The proof is left as an exercise for the reader.)
- In general, nondeterministic finite state machines are no more powerful than deterministic machines, as you may have guessed from the comment above about the algorithm for converting a nondeterministic machine to a deterministic machine. For some machine models of intermediate power this is not true (nondeterministic pushdown automata, for example, recognize languages that deterministic ones cannot), but for the Turing machines at the other end of the power spectrum from finite state machines, it is: nondeterminism does not actually add any expressive power.
- On the other hand, the performance of an implementation of a nondeterministic machine may be different from that of an implementation of a deterministic machine. A deterministic FSM examines each input character once and does not require any extra memory; a backtracking implementation requires space to store state information and may examine each input character many times. The "all possible transitions" approach may require more memory for each "super-state". Finally, good oracles are very hard to find.
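Here is the promised sketch of the "all possible transitions" model, again in Python. The transition table is my reconstruction of figure 2 from the bullets above (the labeled transitions out of the new state 3 are guesses), so it is illustrative rather than definitive:

```python
NFA = {
    (1, "a"): {1}, (1, "b"): {2, 3},   # two choices on "b" in state 1
    (2, "a"): {2}, (2, "b"): {4},
    (3, "b"): {4},                     # no "a" entry: that path just dies
    (4, "a"): {2}, (4, "b"): {4},
}
EPSILON = {3: {2}}                     # the unlabeled, input-free transition
START, ACCEPTING = 1, {4}

def eclose(states):
    """Add every state reachable through unlabeled (epsilon) transitions."""
    todo, seen = list(states), set(states)
    while todo:
        for t in EPSILON.get(todo.pop(), set()):
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return seen

def nfa_match(s):
    """Track the set of all states the machine could be in at once."""
    current = eclose({START})
    for ch in s:
        current = eclose({t for state in current
                          for t in NFA.get((state, ch), set())})
    return bool(current & ACCEPTING)

assert nfa_match("abb") and nfa_match("bab") and not nfa_match("bba")
```

Note that the set of current states can never grow beyond the total number of states in the machine, which is why this model still examines each input character only once.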
Algorithmic efficiency
(I am stealing a lot of this from myself.)
For any algorithm, any operation used in that algorithm, and any input given to the algorithm, there is a mathematical expression that describes how many times the operation is used on that input by that algorithm. Consider a simple algorithm like going through a list of 12 numbers counting how many items there are in the list, one by one. If the operation is looking at a number to determine whether it is the last one in the list, then the operation is used 12 times by that algorithm on that input.
The expression normally depends on the size of the input: if that simple algorithm is given a list of length n (the variable traditionally used for such things), then it will use the operation n times.
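Here is that counting algorithm as a minimal Python sketch, with the counted operation marked:

```python
def count_items(items):
    """Count the items in a list, one by one."""
    n = 0
    for item in items:   # the "look at an item" operation: once per element
        n += 1
    return n

count_items(list(range(12)))   # uses the operation 12 times; in general, n times
```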
Unfortunately, very often the exact mathematical expression is complex, not very helpful, and very unenlightening. The big-oh notation was designed to highlight the most important part of the expression while hiding irrelevant details. It describes the asymptotic behavior of the expression as the inputs of the expression get larger. In this case, that means that it describes the behavior of the algorithm as the size of the inputs to it get bigger.
The behavior of the counting algorithm is O(n), "order of n". How does that describe the speed of the algorithm? If each operation takes a finite (and by assumption, constant) time, then the time taken by any implementation of the algorithm will depend linearly on the size of the input plus some constant factors here and there. The big-oh notation provides a way of comparing two algorithms: for a sufficiently large value of n, n^2 will be greater than 10000n, and thus an O(n^2) algorithm will be slower than an O(n) algorithm for any sufficiently large input. There is a traditional hierarchy of algorithms:
- O(1) is constant-time; such an algorithm does not depend on the size of its inputs.
- O(n) is linear-time; such an algorithm looks at each input element once and is generally pretty good.
- O(n log n) is also pretty decent (that is, n times the logarithm base 2 of n, although the base does not actually matter inside the O; changing it only multiplies by a constant).
- O(n^2), O(n^3), etc. These start to look pretty slow, but are still usable. These and the previous classes are described as polynomial time.
- O(2^n) is exponential time, which is really quite bad. Exponential-time algorithms begin to run the risk of having a decent-sized input not finish before the person wanting the result retires. For example, n^2 and 2^n for n=10 are 100 and about 1000 (1024), respectively. But for n=30, n^2 is 900 while 2^n is a bit over one billion.
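To make those numbers concrete, here is the comparison from that last bullet, computed in Python:

```python
for n in (10, 20, 30):
    print(f"n={n:2}  n^2={n**2:>5}  2^n={2**n:>13,}")
# n=10  n^2=  100  2^n=        1,024
# n=20  n^2=  400  2^n=    1,048,576
# n=30  n^2=  900  2^n=1,073,741,824
```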
P and NP
P is the class of problems for which there is a deterministic polynomial time algorithm which computes a solution to the problem. NP is the class of problems for which there is a nondeterministic polynomial time algorithm; every problem in P is also in NP, but for many NP problems no deterministic polynomial time algorithm is known.
P likely seems simple enough to understand, but what does it mean for a nondeterministic algorithm to compute a solution? Neglecting the first model for a moment, if you try to use the second model of nondeterminism above, the "take all possible actions" model, you encounter a situation where it is very difficult to count the number of operations executed by the algorithm (is taking every valid transition at once one operation, or one per active state?).
So, the best approach is the third model, the use of an oracle. For most NP problems the idea is that the oracle, or some other nondeterministic process, generates a potential solution to the problem in, say, constant time, and the remainder of the algorithm verifies that solution, taking polynomial time.
For example, check out the NP problem used in Deolalikar's possible proof: given a collection of n Boolean variables p, q, r, ... and a Boolean expression containing those variables, is there an assignment of true or false to each of the variables that makes the expression true overall? This is called the SAT problem, for determining the satisfiability of the expression.
SAT is a problem that actually crops up from time to time. For instance, software designed to solve SAT problems has been used to determine whether (and how) a group of OSGi bundles can satisfy their mutual dependencies.
Now, if you give me an assignment of true or false to all of the variables, I can tell you whether the expression is satisfied pretty easily, by evaluating the expression. (I want to say that evaluation would be linear on the size of the expression, and that the expression can be modified to make it linear in the number of variables, n. But I'm not sure. In any case, evaluating the expression will definitely be polynomial time.) So, if I have a perfect oracle to give me solutions, or if I had a nondeterministic machine that could look at all possible assignments, I could solve the SAT problem in polynomial time.
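Here is that check as a small Python sketch. The expression is a made-up three-variable example (a real instance could have any number of variables), represented as a plain function so that verifying an assignment is just a function call:

```python
# A made-up SAT instance: (p or q) and (not p or r) and (not q or not r)
def expr(p, q, r):
    return (p or q) and (not p or r) and (not q or not r)

# Verifying one proposed solution is the easy, polynomial-time part:
print(expr(p=True, q=False, r=True))   # True: this assignment satisfies it
```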
Unfortunately, I have neither. (Pesky oracles!) All I have is deterministic machines, and, in general, the only known way to convert a nondeterministic algorithm to a deterministic algorithm is to use the backtracking model. At a fundamental level, the deterministic version of the algorithm generates a possible solution, tests it, and if that solution fails it backtracks to generate another possible solution. Certainly, there may be optimizations involved; the deterministic conversion will try not to use naive generate-and-test, but at some point, in general, it will have to fall back to something like backtracking.
To take SAT for example, I can look at the expression and see if there are special characteristics that will tell me the answer (p & ~p, for example, or p | ~p), but in general, for more complex expressions, the best I can hope to do is to narrow down the range of possible solutions.
There is no question here about power; if I have a nondeterministic algorithm to solve a problem, I can get a deterministic algorithm to solve the problem. The downside, however, is that the algorithmic efficiency of the solution gets much worse. In fact, the nondeterministic polynomial time algorithm becomes, in the worst case, a deterministic, exponential time algorithm.
With SAT, I can check a possible solution in polynomial time, but to check whether there is any solution, I will have to look at a significant fraction of the 2^n possible assignments of Boolean values to the n variables. That might take a while.
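Continuing the made-up expr example from the sketch above, the naive deterministic algorithm enumerates assignments and tests each one, which is exactly the exponential blowup in question:

```python
from itertools import product

def sat_brute_force(expression, names):
    """Try up to 2^len(names) assignments; exponential in the worst case."""
    for values in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, values))
        if expression(**assignment):
            return assignment          # found a satisfying assignment
    return None                        # unsatisfiable

print(sat_brute_force(expr, ["p", "q", "r"]))
# {'p': False, 'q': True, 'r': False}
```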
The question here, then, the question that no one has so far been able to answer, is whether the NP class of problems is actually the same as P. Seems simple and obvious, right? Well, one confounding factor is the existence of NP-complete problems. These are problems which are in NP themselves, but which, if they could be solved in polynomial time, could be used to solve all other NP problems in polynomial time. SAT is one of the NP-complete problems. (I will not go into the proof; this post is too long already.)
So you have P, problems for which you have a deterministic polynomial time algorithm, and NP, problems for which you have a nondeterministic polynomial time algorithm but for which, in the hard cases, the best known deterministic algorithm takes exponential time. If P=NP, which few really believe, you could solve a number of really hard problems efficiently. If P!=NP, which seems more likely, you cannot.
I do not want to get too far out of bounds here, but
- The P vs. NP question probably doesn't have all that much to do with cryptography; cryptography is based on different hard problems (integer factoring, for example) which are not known to be NP-complete.
- If you are waiting for Ray Kurzweil and Vernor Vinge's singularity, you ought to be hoping P=NP. If not, you could make yourself a billion times smarter and still not put much of a dent in SAT.
- [Insert your philosophical blather here!]
Comments
Also check out A simple explanation of the P vs NP problem
Tommy McGuire
2010-08-26
...and P vs. NP for Dummies
Tommy McGuire
2010-08-26
...or Stating P=NP Without Turing Machines, which is over my bow, I'm afraid
Tommy McGuire
2010-08-26