Type-safe C?

Posted on August 8, 2014 by Tommy McGuire
Labels: theory, quote, c, unix, programming language

I'm rather proud of an answer to Robert Harper's discussion of C typing that I wrote as a comment to a post, Six Points about Type Safety.

The post includes the footnote:

Dr. Robert Harper describes such a type safe analysis of C in a comment here

and while I largely agree with the six points, I disagree with Harper (and don't feel the need to carry the disagreement to wherever Harper posted his comment).

He says, in part, "For example, C is perfectly type safe. It’s semantics is a mapping from 2^64 64-bit words to 2^64 64-bit words. It should be perfectly possible to call rnd(), cast the result as a word pointer, write to it, and read it back to get the same value. Unix never implemented the C dynamics properly, so we get absurdities like 'Bus Error' that literally have no meaning whatsoever in terms of C code."

I don't believe this to be the case for two reasons, philosophical and definitional.

In the first place, if Unix, etc., "never implemented the C dynamics properly", we are very definitely into the discussion of the status of the concept of "unicorns", given that no actual unicorn exists. As a result, I feel perfectly free to assert anything without any real worry about contradiction---what is he going to do, declare me a heretic? Further, philosophically---my previously existing, unreasoned prejudices---I find his stance silly.

In the second, he is wrong about the definition of C. It's semantics, for any reasonable definition of the "semantics of C", are not a mapping from any number of any sized words to other words. The C standard, which is not formal but which is C in a real sense, pretty clearly says that the operation he describes is either undefined or implementation defined or otherwise similarly verboten. In which case, Unix does implement C dynamics properly. He and many others may not include SIGSEGV in their mental model of C's semantics, but that does not mean that he nor they are right.

Those are both significant problems, although the first is the worse. In what sense can one talk about the semantics of a language if no implementation of the language follows those semantics (even assuming those are the semantics of the definition of the language)? I do know that such is not a useful thing to do.

[Update] ...and the post on Hacker News for the Six Points article has been declared dead. Boy, do I love HN.

active directory applied formal logic ashurbanipal authentication books c c++ comics conference continuations coq data structure digital humanities Dijkstra eclipse virgo electronics emacs goodreads haskell http java job Knuth ldap link linux lisp math naming nimrod notation OpenAM osgi parsing pony programming language protocols python quote R random REST ruby rust SAML scala scheme shell software development system administration theory tip toy problems unix vmware yeti
Member of The Internet Defense League
Site proudly generated by Hakyll.