Applied Formal Logic: Correctness of Quick Search
When last I left, I had verified that the implementation of Quick Search was safe: the modified algorithm didn’t access memory it shouldn’t, it didn’t have numeric overflows, and it didn’t modify anything outside of its own implementation. And, using Frama-C, the process hadn’t hardly been torturous or even particularly unpleasant at all. The most important tasks were describing the requirements for calling the two functions. Those requirements are…detailed…but nothing more than what an experienced C programmer would have in mind while writing or calling the functions. A final set of shiny set of annotations ensured that the loops and the variable modifications needed by them behaved themselves.
But I hadn’t done anything about proving that the function actually did what it claimed it was supposed to do. It’s not hard to understand (one of the reasons I like Quick Search), but the ANSI/ISO C Specification Language is unfamiliar and I didn’t know how to express what went on, where. Fortunately, Yannick Moy, a software engineer at AdaCore, came to my assistance with an excellent introduction to doing the same thing with SPARK, and Loïc Correnson, one of the creators of Frama-C, popped in with some excellent suggestions (and very flattering comments).
So now, for your software verification pleasure, here is my essay at proving the correctness of Quick Search. It pretty much follows the same pattern as Yannick’s, but I did come up with a slightly different postcondition for make_bad_shift
.
To start with, I stole the partial_match_at
predicate from the brute force search post, cleaned it up a little and renamed it match_at
since I really don’t need two versions of the goofy thing.
/*@
@ predicate match_at(int loc, unsigned char *haystack, int h,
@ unsigned char *needle, int len) =
@ loc ≤ h - len
@ ∧ ∀ int i; 0 ≤ i < len ⇒ needle[i] ≡ haystack[loc + i];
*/
I’ll come back to the loc ≤ h - len
clause in a bit, but the remainder simply says that the needle matches the haystack starting at loc
, up to the offset len
. Oh, and I replaced the ASCII ACSL tokens with the spiffier Unicode math symbols: among others, ∀ replaces \forall
. In the future, though, I’m going to stick with C’s ==
rather than ≡, since the font I’m using doesn’t do a spectacular job with the Unicode character IDENTICAL TO.
The specification of the function make_bad_shift
is mostly unchanged from the previous post; the addition is the correctness postcondition.
/*@
@ requires \valid(needle + (0 .. n-1)) ∧ 0 ≤ n < INT_MAX;
@ requires \valid(bad_shift + (0 .. chars-1)) ∧ 0 ≤ chars < INT_MAX;
@ requires \separated(needle + (0 .. n-1), bad_shift + (0 .. chars-1));
@ requires ∀ int i; 0 ≤ i < n ⇒ 0 ≤ needle[i] < chars;
@
@ assigns *(bad_shift + (0 .. chars-1));
@
@ ensures ∀ int c; 0 ≤ c < chars ⇒ 1 ≤ bad_shift[c] ≤ n+1;
@
@ // correctness: bad_shift has meaningful shifts
@ ensures ∀ int c; 0 ≤ c < chars ⇒
@ (∀ int k; n - bad_shift[c] + 1 ≤ k < n ⇒ needle[k] != c);
*/
void
make_bad_shift (unsigned char *needle, int n, int bad_shift[], int chars)
Yeah, those two lines towards the bottom. For reasons I don’t want to go into, I parameterized this function by the size of the character set that makes up the strings, chars
. (Well, ok, “reason” probably doesn’t have much to do with it.) Thus, the postcondition predicate is quantified over all characters c in the alphabet of the strings, which is assumed to be between 0 and chars
.
∀ int c; 0 ≤ c < chars ⇒
(∀ int k; n - bad_shift[c] + 1 ≤ k < n ⇒ needle[k] != c);
To understand this predicate, you have to go back to the fundamental idea of the faster string search.
This function is initializing a bad character shift array, where the indices are characters and the values are shifts of characters to skip in the haystack: portions of the haystack that do not need to be examined because no match is possible in those locations.
The
bad_shift
map takes the character following the end of a failed match of the needle and turns it into the longest shift that is possible given that character.On one hand, if the character does not appear in the needle, the shift can move the entire length of the needle, plus one. For example, given a haystack and needle
xxxxXabcb abcb
the chosen character is X, which does not appear in the needle. No position between the current location to that following the X can be a match. The shift is the length of the needle plus one, or 5. The next state to attempt a match is therefore
xxxxXabcb abcb
which conveniently matches. Yay.
On the other, if the character does appear in the array, the next shift is the minimum offset from the right end of the needle of the character. That shift aligns the character in the haystack with the final matching character in the needle. Since it is the final matching character, no shorter shift can work. For example, given the haystack and needle
xxabcbxxx abcb
the chosen character is c, which has a shift of 2 (in other words, c is the second character from the end of the needle). The next state is
xxabcbxxx abcb
where the shift has made the c in the haystack align with the c in the needle.
The interesting thing to notice about both those cases is that the character in question does not occur in the needle from the position immediately after that identified by the shift to the end of the needle.
This last fact is what the postcondition predicate expresses: for any character c, all of the characters after the final occurrence of c in the needle if there is one—n - bad_shift[c]
is the final occurrence of c; adding one indicates the next character—or all of the characters in the needle, if there isn’t—n - bad_shift[c] + 1
is 0—do not match c.
Calculating this magic shift value is what make_bad_shift
does.
static void
make_bad_shift (unsigned char *needle, int n, int bad_shift[], int chars)
{
int i;
/*@
@ loop assigns i, *(bad_shift + (0 .. chars-1));
@ loop invariant 0 ≤ i ≤ chars;
@ loop invariant ∀ int k; 0 ≤ k < i ⇒ bad_shift[k] == n + 1;
*/
for (i = 0; i < chars; ++i) {
bad_shift[i] = n + 1;
}
/*@
@ loop assigns i, *(bad_shift + (0 .. chars-1));
@ loop invariant 0 ≤ i ≤ n;
@ loop invariant ∀ int c; 0 ≤ c < chars ⇒
@ 1 ≤ bad_shift[c] ≤ n + 1;
@ loop invariant ∀ int c; 0 ≤ c < chars ⇒
@ ∀ int k; n - bad_shift[c] + 1 ≤ k < i ⇒ needle[k] != c;
*/
for (i = 0; i < n; ++i) {
bad_shift[needle[i]] = n - i;
}
}
The third loop invariant of the second loop is the clause that allows Frama-C to prove that the final, correctness, postcondition holds. It is the final postcondition with the length of the needle, n, replaced by i, the index of the next element of the needle to be evaluated. Like all good loop invariants (well, ok, some good loop invariants), this predicate is trivially true at the start of the loop, since n - bad_shift[c] + 1
is n − (n + 1)+1, or 0. Furthermore, it is maintained by the loop because n − i is the current offset from the end of the needle.
With that in place, I can describe the Quick Search function itself.
/*@
@ // Original requirements for searching
@ requires \valid(needle + (0 .. n-1)) ∧ 0 ≤ n < INT_MAX;
@ requires \valid(haystack + (0 .. h-1)) ∧ 0 ≤ h < INT_MAX;
@ requires n ≤ h;
@ // the elements of needle are valid indices into bad_shift
@ requires ∀ int i; 0 ≤ i < n ⇒ 0 ≤ needle[i] < UCHAR_MAX + 1;
@ assigns \nothing;
@ ensures -1 ≤ \result ≤ (h-n);
@
@ behavior success:
@ ensures \result >= 0 ⇒ match_at(\result, haystack, h, needle, n);
@ behavior failure:
@ ensures \result == -1 ⇒
@ ∀ int i; 0 ≤ i < h ⇒
@ !match_at(i, haystack, h, needle, n);
*/
int
QS (unsigned char *needle, int n, unsigned char *haystack, int h)
The public contract for QS is almost identical to that of the brute force search. The guarantees made by the function are the same and the only difference in the preconditions are a requirement that the elements of needle be valid when used as indices into bad_shift
.
(And, honestly, I’m not sure why that is. It’s based on the requirement to make_bad_shift
, but I would think it would be satisfied by the unsigned char
type.)
By the way, back at the top of this post, I mentioned match_at
and the uninteresting, but seemingly unnecessary, conjunct in its definition: loc ≤ h - len
. That little guy is seemingly unnecessary because the location of the start of a copy of the needle cannot be found at a location greater than (h − n): There simply isn’t enough room for the needle. However, the code and specifications here are apparently not strong enough to prove that guarantee directly, with the result that it is only possible to prove, in the failure case, that there is no match between 0 and (h − n).
Presumably, I could find some way to logically express the impossibility. Or, I could just rule it out in the definition of match_at
by adding loc ≤ h - len
, which is the route I took. With that conjunct in the definition of match_at
, the failure case is wonderfully complete, showing clearly that there is no match in the haystack.
Upholding the overall guarantees of the contract, however, is a bit of a complication.
This is the complete implementation of Quick Search, including the ACSL assertions necessary to prove its correctness.
int
QS (unsigned char *needle, int n, unsigned char *haystack, int h)
{
int i, j, bad_shift[UCHAR_MAX + 1];
int g;
/* Preprocessing */
make_bad_shift(needle, n, bad_shift, UCHAR_MAX + 1);
/* Searching */
i = 0;
/*@
@ loop assigns i, j;
@ loop invariant 0 ≤ i ≤ h + 1;
@ loop invariant ∀ int k; 0 ≤ k < i ⇒
@ !match_at(k, haystack, h, needle, n);
*/
while (i <= h - n) {
/*@
@ loop assigns j;
@ loop invariant 0 ≤ j ≤ n;
@ loop invariant match_at(i, haystack, h, needle, j);
*/
for (j = 0; j < n && needle[j] == haystack[i + j]; ++j);
if (j >= n) {
return i;
}
if (i == h - n) { break; }
/*@
@ loop assigns g;
@ loop invariant i + 1 ≤ g ≤ i + bad_shift[ haystack[i + n] ];
@ loop invariant ∀ int k; i + 1 ≤ k < g ⇒
@ !match_at(k, haystack, h, needle, n);
*/
for (g = i + 1; g < i + bad_shift[ haystack[i + n] ]; ++g) {
//@ assert haystack[i + n] != needle[i + n - g];
}
i += bad_shift[ haystack[i + n] ]; /* shift */
}
return -1;
}
The one fly in the ointment here is the disconnect between the formal properties of bad_shift
:
∀ int c; 0 ≤ c < chars ⇒
(∀ int k; n - bad_shift[c] + 1 ≤ k < n ⇒ needle[k] != c);
And the overall goal of the outer loop invariant:
∀ int k; 0 ≤ k < i ⇒
!match_at(k, haystack, h, needle, n);
On the one hand, I know that the shift described by bad_shift
gives me the length of the longest non-matching suffix of the needle. On the other, I need to show that, given a location in the haystack, there cannot be a match between that location and the shifted location.
In order to cover this disconnect, QS uses an additional loop which in execution does nothing:
/*@
@ loop assigns g;
@ loop invariant i + 1 ≤ g ≤ i + bad_shift[ haystack[i + n] ];
@ loop invariant ∀ int k; i + 1 ≤ k < g ⇒
@ !match_at(k, haystack, h, needle, n);
*/
for (g = i + 1; g < i + bad_shift[ haystack[i + n] ]; ++g) {
//@ assert haystack[i + n] != needle[i + n - g];
}
Think of the assertion in the loop as a witness: g moves from just after the current location, i + 1, to just before the next location the algorithm examines, i + bad_shift[ haystack[i + n] ]
. With every iteration, the assertion indicates precisely where the mismatch described by the postcondition of bad_shift
lies. That allows Frama-C to conclude that there is no match at the current (or previous) locations given by g. Thus, since there is no match at i and no possible matches between i + 1 and i + bad_shift[ haystack[i + n] ]
, the overall loop invariant of the outer loop is satisfied and QS is proved correct.
But what about that loop…
As the situation now stands, I have proven that Quick Search satisfies its specification—a return value between 0 and h − n indicates it found a valid match and a return value of -1 indicates no match exists. Almost.
What I have actually proven is that Quick Search with a funny loop in the middle that does nothing useful at run-time satisfies its specifications. Which is just not good enough. Sure, with enough optimization, the compiler will probably throw the loop away. Still, it is an unpleasant wart to have executable code to do nothing other than satisfy the verifier. Enter ghost code.
Ghost code is treated as normal code while verifying the properties of the program, but ignored by the compiler while producing executable code. Ghost code in Frama-C is contained in ACSL comments marked by the keyword “ghost”.
In QS, the first statement that can be made ghostly is the declaration of the variable g.
//@ ghost int g;
The remainder is the useless loop in the middle of the function.
/*@ ghost
@ //@ loop assigns g; loop invariant i + 1 ≤ g ≤ i + bad_shift[ haystack[i + n] ]; loop invariant ∀ int k; i + 1 ≤ k < g ⇒ !match_at(k, haystack, h, needle, n);
@ for (g = i + 1; g < i + bad_shift[ haystack[i + n] ]; ++g) {
@ //@ assert haystack[i + n] != needle[i + n - g];
@ }
*/
Within the comment block, the ghost code uses normal C syntax, while only the single line “//@” comments are available for ACSL specifications. The result is the moderately unpleasant single line of clauses that applies to the loop—loop clauses such as assigns and invariants must immediately precede the loop statement itself.
I mentioned this problem in the comments to the previous post, and Loïc Correnson wrote,
Probably you should use an external regular function that holds the ghost loop. Simply pass the pointers to both ghost and non-ghost values to the “ghost” function. Then you can just call the C-function in a one-line ghost code.
Not only it will be less invasive in the original source code, but it cuts the surrounding proofs too.
For people interested, this is an occurence of the lemma-function pattern developed in Dafny and Why-3 tools.
Now, I’m not entirely sure what he has in mind, specifically, by “pass the pointers to both ghost and non-ghost values”, so I am not sure I’m doing this right, but I came up with the following lemma function, lemma_no_match_in_shift
:
/*@
@ requires \valid(haystack + (0 .. h - 1))
@ ∧ \valid(bad_shift + (0 .. UCHAR_MAX));
@ requires 0 ≤ n ≤ h < INT_MAX;
@ requires 0 ≤ i < h - n;
@ requires
@ ∀ int c; 0 ≤ c < UCHAR_MAX + 1 ⇒
@ (1 ≤ bad_shift[c] ≤ n + 1
@ ∧ ∀ int k; n - bad_shift[c] + 1 ≤ k < n ⇒ needle[k] != c);
@
@ assigns \nothing;
@
@ ensures
@ ∀ int k; i + 1 ≤ k < i + bad_shift[ haystack[i + n] ] ⇒
@ !match_at(k, haystack, h, needle, n);
*/
static void
lemma_no_match_in_shift(unsigned char *needle, int n,
unsigned char *haystack, int h,
int *bad_shift, int i)
{
int g;
/*@
@ loop assigns g;
@ loop invariant i + 1 ≤ g ≤ i + bad_shift[ haystack[i + n] ];
@ loop invariant ∀ int k; i + 1 ≤ k < g ⇒ !match_at(k, haystack, h, needle, n);
*/
for (g = i + 1; g < i + bad_shift[ haystack[i + n] ]; ++g) {
//@ assert haystack[i + n] != needle[i + n - g];
}
}
This function incorporates the ghost variable, g, and acts as a proof of the lemma that,
$$
\begin{array}{ll}
& \forall c : 0 \leq c < \left( \textrm{UCHAR_MAX} + 1 \right) : \\
& \quad ( 1 \leq \textrm{bad_shift}[c] \leq ( n + 1 ) \\
& \quad\; \land\; \forall p : (n - \textrm{bad_shift}[c] + 1) \leq p < n : \textrm{needle}[p] \not= c ) \\
\Rightarrow & \\
& \forall q : \left( i + 1 \right) \leq q < \left( i + \textrm{bad_shift}[ \textrm{haystack}[i + n] ] \right) : \\
& \quad \neg \textrm{match_at}(q, \textrm{haystack}, h, \textrm{needle}, n)
\end{array}
$$
In other words, that the guarantees that bad_shift
provides are enough to ensure that there is no possible match of the needle between haystack[i]
and haystack[i + bad_shift[ haystack[i + n] ]]
. The other four requirements of the contract for lemma_no_match_in_shift
are only there to satisfy the -wp-rte
argument to the WP plugin, to prevent possible run-time errors. When checking functional correctness only, those requirements are unnecessary. Loïc also writes,
Interestingly, WP was initially designed with functional properties in mind, not safety : we already have Frama-C/EVA (or ASTREE, or …) for that purpose. Using
-wp-rte
is more a trick than a feature, that still does the job, indeed. However, you may pollute your functional proofs by a lot of « mumbling » assertions regarding overflows and such. And SMT solvers quickly break down with all this garbage.
And unfortunately, I haven’t looked at the EVA plugin as a way of avoiding run-time errors. I’m not sure how I feel about the thing; I’m quite fond of the WP annotations for avoiding errors since they express exactly the ideas that go through my mind when thinking about calling a C function.
(Yes, I did feel it necessary to break out MathJax and LaTeX formulas. Gotta problem wit’ that?)
The loop in the middle of QS
(as well as the ghost variable, g) are replaced by the single ghost code line:
//@ ghost lemma_no_match_in_shift(needle, n, haystack, h, bad_shift, i);
And with that, Quick Search has been fully verified.
The conclusion of a challenge
Back in the brute-force post, I made the following challenge:
There is one…issue (I won’t use the word “error”) with this verification, as written. It’s related to a problem described in “An empirical study on the correctness of formally verified distributed systems.” I’ll leave it as an exercise to the reader to find it.
Quick Search here has the same…issue. Yannick was kind enough to meet the exercise. The issue from “An empirical study” is that brute_force
and IronFleet’s RSM server (and Quick Search) are under-specified (or over-implemented, if you choose to look at it that way). The RSM server implemented once-only semantics although its specification did not mention that fact. brute_force
and Quick Search actually find the left-most copy of the needle in the haystack, if more than one copy exists, but do not include that their guarantees.
That is all fine and dandy for some specific use of specific piece of code. A single user may not need the extended guarantee actually provided by the code for its own correctness. On the other hand, if Quick Search is being used in a library by multiple consumers, at some time the user will need those full guarantees. At that point, if the user is not aware of the actual behavior of the code or (more gallingly) even if they are aware but cannot change the specification, they will need to go through some extra dance steps in the client code to ensure their overall program satisfies the extended requirements—extra steps that are completely unnecessary and at best worthless. (At worst, those steps could be quite expensive.)
There is more to be said on this subject, and I hopefully will soon get around to saying it. But for now, Quick Search is relatively easy to fix. A simple addition to the ensures clause of the success behavior, of a conjunct that says that no earlier match exists, does the trick.
∀ int k; 0 ≤ k < \result ⇒
!match_at(k, haystack, h, needle, n));
This addition, in fact, is already satisfied by the code and the invariant of the outer loop, so no additional work is needed!
Here is the complete code to Quick Search.
/*@
@ // There is a partial match of length len, of the needle at location
@ // loc in the haystack of length h. A complete match has length n.
@ predicate match_at(int loc, unsigned char *haystack, int h,
@ unsigned char *needle, int len) =
@ loc ≤ h - len
@ ∧ ∀ int k; 0 ≤ k < len ⇒ needle[k] ≡ haystack[loc + k];
*/
Do keep in mind that neither match_at
nor lemma_no_match_in_shift
below, nor any of the ACSL specifications in either make_bad_shift
or QS
, have any cost at run-time.
/*@
@ // needle and bad_shift are valid arrays, are separate, and the
@ // elements of needle are valid indices into bad_shift
@ requires
@ \valid(needle + (0 .. n-1))
@ ∧ \valid(bad_shift + (0 .. chars-1))
@ ∧ 0 ≤ n < INT_MAX
@ ∧ 0 ≤ chars < INT_MAX
@ ∧ \separated(needle + (0 .. n-1), bad_shift + (0 .. chars-1))
@ ∧ ∀ int k; 0 ≤ k < n ⇒ 0 ≤ needle[k] < chars;
@
@ // this function initializes bad_shift
@ assigns
@ *(bad_shift + (0 .. chars-1));
@
@ // safety: bad_shift has valid shifts
@ // correctness: bad_shift has meaningful shifts
@ ensures
@ ∀ int c; 0 ≤ c < chars ⇒
@ (1 ≤ bad_shift[c] ≤ n+1
@ ∧ ∀ int k; n - bad_shift[c] + 1 ≤ k < n ⇒ needle[k] != c);
*/
static void
make_bad_shift (unsigned char *needle, int n, int bad_shift[], int chars)
{
int i;
/*@
@ loop assigns
@ i, *(bad_shift + (0 .. chars-1));
@ loop invariant
@ 0 ≤ i ≤ chars
@ ∧ ∀ int k; 0 ≤ k < i ⇒ bad_shift[k] == n + 1;
*/
for (i = 0; i < chars; ++i) {
bad_shift[i] = n + 1;
}
/*@
@ loop assigns
@ i, *(bad_shift + (0 .. chars-1));
@ loop invariant
@ 0 ≤ i ≤ n
@ ∧ ∀ int c; 0 ≤ c < chars ⇒
@ (1 ≤ bad_shift[c] ≤ n + 1
@ ∧ ∀ int k; n - bad_shift[c] + 1 ≤ k < i ⇒ needle[k] != c);
*/
for (i = 0; i < n; ++i) {
bad_shift[needle[i]] = n - i;
}
}
In this function, I did pretty up the ACSL clauses, by combining multiple clauses together. I strongly recommend that you not do this thing, and I don’t intend to do it again in the future. On the one hand, it is prettier and reduces the number of proofs that Frama-C is required to make (although presumably those proofs are more complicated) since Frama-C performs one or more proofs per actual clause. On the other, and more painfully, failing proofs are reported per-clause, and a clause containing multiple conjuncts is very difficult to debug. So, just don’t do it. (But it is prettier.)
/*@
@ // haystack and bad_shift are valid arrays
@ requires
@ \valid(haystack + (0 .. h - 1)) ∧ \valid(bad_shift + (0 .. UCHAR_MAX));
@ // n and h are appropriately sized
@ requires
@ 0 ≤ n ≤ h < INT_MAX;
@ // i is suitably inside haystack
@ requires
@ 0 ≤ i < h - n;
@ // bad_shift has valid and meaningful shifts
@ requires
@ ∀ int c; 0 ≤ c < UCHAR_MAX + 1 ⇒
@ (1 ≤ bad_shift[c] ≤ n + 1
@ ∧ ∀ int k; n - bad_shift[c] + 1 ≤ k < n ⇒ needle[k] != c);
@
@ assigns
@ \nothing;
@
@ ensures
@ ∀ int k; i + 1 ≤ k < i + bad_shift[ haystack[i + n] ] ⇒
@ !match_at(k, haystack, h, needle, n);
*/
static void
lemma_no_match_in_shift(unsigned char *needle, int n,
unsigned char *haystack, int h,
int *bad_shift, int i)
{
int g;
/*@
@ loop assigns g;
@ loop invariant i + 1 ≤ g ≤ i + bad_shift[ haystack[i + n] ];
@ loop invariant ∀ int k; i + 1 ≤ k < g ⇒ !match_at(k, haystack, h, needle, n);
*/
for (g = i + 1; g < i + bad_shift[ haystack[i + n] ]; ++g) {
//@ assert haystack[i + n] != needle[i + n - g];
}
}
As an aside, there are a lot of ACSL comments in the code. If you live somewhere with a requirement of one or more comment lines per line of code, ACSL would seem to be a very good (and productive!) way of satisfying that requirement.
/*@
@ // Original requirements for searching: valid strings and lengths;
@ // also, the elements of needle are valid indices into bad_shift
@ requires
@ \valid(needle + (0 .. n-1))
@ ∧ \valid(haystack + (0 .. h-1))
@ ∧ 0 ≤ n ≤ h < INT_MAX
@ ∧ ∀ int k; 0 ≤ k < n ⇒ 0 ≤ needle[k] < UCHAR_MAX + 1;
@
@ // QS makes no globally visible changes
@ assigns
@ \nothing;
@
@ // safety: limits on return values
@ ensures
@ -1 ≤ \result ≤ (h - n);
@
@ // success: the leftmost match was found and the location returned
@ behavior success:
@ ensures
@ \result >= 0 ⇒
@ (match_at(\result, haystack, h, needle, n)
@ ∧ ∀ int k; 0 ≤ k < \result ⇒
@ !match_at(k, haystack, h, needle, n));
@
@ // failure: there is no match in the haystack
@ behavior failure:
@ ensures
@ \result == -1 ⇒
@ ∀ int k; 0 ≤ k < h ⇒
@ !match_at(k, haystack, h, needle, n);
*/
int
QS (unsigned char *needle, int n, unsigned char *haystack, int h)
{
int i, j, bad_shift[UCHAR_MAX + 1];
/* Preprocessing */
make_bad_shift(needle, n, bad_shift, UCHAR_MAX + 1);
/* Searching */
i = 0;
/*@
@ loop assigns
@ i, j;
@ loop invariant
@ 0 ≤ i ≤ h + 1
@ ∧ ∀ int k; 0 ≤ k < i ⇒
@ !match_at(k, haystack, h, needle, n);
*/
while (i <= h - n) {
/*@
@ loop assigns
@ j;
@ loop invariant
@ 0 ≤ j ≤ n
@ ∧ match_at(i, haystack, h, needle, j);
*/
for (j = 0; j < n && needle[j] == haystack[i + j]; ++j);
if (j >= n) {
return i;
}
if (i == h - n) { break; }
//@ ghost lemma_no_match_in_shift(needle, n, haystack, h, bad_shift, i);
i += bad_shift[ haystack[i + n] ]; /* shift */
}
return -1;
}
And look, Ma, no Peano arithmetic! Not a single Zero or Successor in sight!