Saturday, September 27, 2014

On shellshock

The 'net is currently focused on Shellshock, leading to interesting discussions of the responsibility for the problem. Recently, someone posted Not a bash bug, which was linked on Hacker News. The argument there is that:

I would argue that the bash security concern is not a bug. It is clearly a feature. Admittedly, a misguided and misimplemented feature, but still a feature....

It is an old precept for security on unix systems, that environment variables shall be controlled by the parent processes, and an even older and more general precept that input data shall be validated.

That was my initial opinion of the bug, as well. The parent processes are in control of the environment and should be validating input.

On the other hand, after thinking about it, there are a number of reasons why I decided that this is at best a misfeature of Bash.

  • It is incredibly undocumented. I've been a Unix guy for over 25 years, and I've been using Bash for most of that time. (Sorry, David Korn.) I've used Bash a lot. But I've never heard of this thing.

  • It violates some ill-defined, personal, un-thought-about assumptions about environment variables. An environment variable with executable code? That's as terrifying as LD_LIBRARY_PATH, and that is very well known. One reason I've probably missed this feature is that it is something I would never consider using.

  • In my opinion, it's almost impossible to secure this on the parent process' side. Sure, the parent can look for magic Bash strings, but.... This isn't just Apache, it's potentially every other network accessible program that calls a shell, and that is a very common thing to do in Unix.

  • Finally, consider some of the special behavior of execlp and execvp:

    If the header of a file isn't recognized (the attempted execve(2) failed with the error ENOEXEC), these functions will execute the shell (/bin/sh) with the path of the file as its first argument. (If this attempt fails, no further searching is done.)

    You could end up starting a shell without knowing.
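To make the "look for magic Bash strings" idea concrete, here is a minimal Python sketch of a parent process dropping suspicious variables before it starts a shell. The function names are hypothetical and the filter is an assumption for illustration: it recognizes only values that begin with the "() {" prefix Bash used to import functions from the environment. It is not a complete defense, which is rather the point of the third bullet above.

```python
import os
import subprocess

# Assumption: a value starting with this prefix is what Bash treated as an
# exported function definition. Nothing else is recognized by this sketch.
BASH_FUNCTION_PREFIX = "() {"


def looks_like_bash_function(value):
    """Return True if the value starts with the function-import prefix."""
    return value.startswith(BASH_FUNCTION_PREFIX)


def scrubbed_environment(env=None):
    """Copy an environment mapping, leaving out function-like values."""
    source = os.environ if env is None else env
    return {
        name: value
        for name, value in source.items()
        if not looks_like_bash_function(value)
    }


def run_in_shell(command, env=None):
    """Run a command through /bin/sh with the scrubbed environment."""
    return subprocess.run(
        ["/bin/sh", "-c", command],
        env=scrubbed_environment(env),
        capture_output=True,
        text=True,
        check=False,
    )
```

Every program that might start a shell, directly or through something like execvp's fallback, would need the same treatment, and a filter like this is only as good as its knowledge of what the shell will interpret.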

One comment on HN is interesting:

The original author of bash (a friend of mine, which is why I have this context) has been being interviewed by various newspapers today regarding shellshock, and finds the idea that he might have anticipated the number of ways people integrated bash into various systems (such as with Apache allowing remote control over environment variables when running software in security domains designed to protect against unknown malicious users) quite humorous. Apparently, it has been an uphill battle to explain that this was all coded so long ago that even by the time he had already passed the project on to a new developer (after having maintained it for quite a while himself) the World Wide Web still wasn't a thing, and only maybe gopher (maybe) had been deployed: that this was even before the Morris worm happened...

I certainly understand the impossibility of anticipating "the number of ways people integrated bash into various systems", but the idea of installing a facility for executing back-channel code was certainly sketchy at the time. Further, why is the feature still there? We stopped using rsh and telnet long ago, right?

Saturday, August 30, 2014

Link o' the day: Maciej Cegłowski is my new waifu

I'm behind the times, I know, but I just found Dabblers and Blowhards by Maciej Cegłowski and I find it an acceptable piece of text. This essay, a response to Paul Graham's "Hackers and Painters", is the source of many unique quotes such as, "In his essays he tends to flit from metaphor to metaphor like a butterfly, never pausing long enough for a suspicious reader to catch up with his chloroform jar," or his definition of:

  • Painters apply colored goo to cloth using animal hairs tied to a stick.

which you may object to, until you realize it exactly matches his description of:

  • Computer programmers cause a machine to perform a sequence of transformations on electronically stored data.

Some thumbs up.

Saturday, August 23, 2014

More wisdom from mcguire

Names have been changed to protect the identities of those who may not wish to admit they know the Great Sage.

(06:53:46 PM) Mittens: Have you fixed everything yet?!?!
(06:54:00 PM) mcguire: I have fixed some things.
(06:54:20 PM) mcguire: Some things cannot be fixed and some things have not been fixed yet. Some things may never get fixed.
(06:54:30 PM) mcguire: Other things may be fixed, then broken again.
(06:54:35 PM) Mittens: Some things are idiots.
(06:55:23 PM) Mittens: Some things are big idiots.
(06:55:28 PM) mcguire: Finally, there are those things which, like the great blue heron, are neither fixed nor broken, but must merely be understood. Or failing that, just accepted.
(06:55:37 PM) Mittens: Or shot.

Saturday, August 9, 2014

Letterpress cheating in Rust 0.11.0, part 2

I have finally completed upgrading all of the assorted toy Rust programs, my ports of Jeff Knupp's Creating and Optimizing a Letterpress Cheating Program in Python, to 0.11. I also re-executed the notoriously crappy benchmarks.

These programs look for all of the words that can be made from a given set of letters, based on the system dictionary. The argument was "asdwtribnowplfglewhqagnbe", which produces 7440 results from my dictionary with a possible 33,554,406 combinations made from those letters.
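The general technique can be sketched in a few lines of Python. This is a toy illustration, not any of the benchmarked programs: the function names and the word list in the usage note are made up. Each dictionary word is indexed by its sorted letters, and every combination of the input letters is looked up by the same key. The last function shows that counting every selection of two or more letters from 25 positions gives 33,554,406, which matches the figure above; that the figure was derived this way is an assumption.

```python
from collections import defaultdict
from itertools import combinations
from math import comb


def build_index(words):
    """Map each word's sorted letters to the set of words using them."""
    index = defaultdict(set)
    for word in words:
        index["".join(sorted(word))].add(word)
    return index


def find_words(letters, index, min_len=2):
    """Return every indexed word that can be made from the given letters."""
    ordered = sorted(letters)
    found = set()
    for size in range(min_len, len(ordered) + 1):
        # combinations() keeps input order, so each combo is already sorted.
        for combo in set(combinations(ordered, size)):
            found |= index.get("".join(combo), set())
    return found


def count_combinations(n_letters, min_len=2):
    """Count selections of at least min_len letters, by position."""
    return sum(comb(n_letters, k) for k in range(min_len, n_letters + 1))
```

For example, with the index built from "tab", "bat", "at" and "cat", the letters "tab" yield "tab", "bat" and "at", and count_combinations(25) returns 33554406.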

Durations are in seconds; each column is the run made for that Rust release.

Language  Program                           Rust 0.6   Rust 0.7   Rust 0.8   Rust 0.9  Rust 0.11
Python    alternatives/presser_one.py           49.1       48.6       47.8       39.0       59.4
Nimrod    alternatives/nimrod_anagrams             -          -          -       12.3       18.0
Python    alternatives/presser_two.py           12.8       12.6       12.3       11.6       17.2
Rust      anagrams-hashmap-wide                  9.3       15.4       12.1       19.6       15.7
Rust      anagrams-vectors-wide                 11.8       13.1       12.2       16.8       12.4
Rust      anagrams-vectors                       8.0        8.2       11.9        8.1       11.0
Rust      anagrams-hashmap                       6.0       35.5        7.2        7.0        9.3
C         alternatives/anagrams-vectors          8.0        5.8        5.8        6.0        9.6
Python    alternatives/presser_three.py          6.0        6.3        6.0        5.8        8.3
Rust      anagrams-vectors-tasks                27.1       13.8        4.2        4.6        7.7
Rust      anagrams-djbhash-tasks                   -          -          -        6.2        5.5
Rust      anagrams-hashmap-mmap                  4.8       10.6        7.3        6.3        2.9
Rust      anagrams-djbhashmap                      -          -          -        2.8        2.5
C         alternatives/anagrams-hash             0.9        1.0        1.0        0.9        1.4

The programming languages and versions for this run are:

  • Python: Python 2.7.6, with Python 2.7.3 and 2.7.5 for previous versions.
  • C: gcc 4.8.2, with 4.6.3 and 4.8.1 for the prior runs, all with -O3.
  • Nimrod: Nimrod 0.9.4 this time, 0.9.2 last, compiled with -d:release.
  • Rust: Rust 0.11.0, compiled with -O.

The various versions of the programs take slightly different approaches. Those with hashmap use a hashtable to store the anagram dictionary, while those with vectors use a sorted array and binary search to look up anagrams. Those with djbhash use an alternative hashtable implementation, based on the DJB hash algorithm and Python's dictionary implementation. The mmap version, as well as both of the C versions, import the dictionary via mmap rather than reading it. All of the programs are single threaded, except for the wide and tasks versions. The wide versions split the dictionary into segments and have each thread search all of the possible combinations in its reduced dictionary. The tasks versions give each task a copy of the full dictionary, and the master process round-robins the combinations to the tasks. The parameters of each were tuned a while back and have not been adjusted.
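For readers who want the variants spelled out, here is a small Python sketch of three of the ideas: a sorted array searched with binary search, a DJB-style string hash, and round-robin distribution of work. All names are hypothetical. The hash shown is the common "djb2" form (start at 5381, multiply by 33, add each byte), truncated to 32 bits; whether the benchmarked programs use exactly this variant is an assumption.

```python
from bisect import bisect_left


def build_sorted_index(words):
    """Return (sorted-letters key, word) pairs in sorted order."""
    return sorted(("".join(sorted(word)), word) for word in words)


def lookup(index, key):
    """Binary-search the sorted pairs and collect every word for the key."""
    # (key,) sorts before any (key, word), so this finds the first match.
    position = bisect_left(index, (key,))
    matches = []
    while position < len(index) and index[position][0] == key:
        matches.append(index[position][1])
        position += 1
    return matches


def djb2(text):
    """DJB-style hash: h = h * 33 + byte, kept to 32 bits."""
    value = 5381
    for byte in text.encode("utf-8"):
        value = (value * 33 + byte) & 0xFFFFFFFF
    return value


def round_robin(items, n_tasks):
    """Deal items out to n_tasks lists, one at a time in turn."""
    return [items[start::n_tasks] for start in range(n_tasks)]
```

The round-robin split corresponds to the tasks versions, where every worker sees the whole dictionary but only some of the combinations. The wide versions do the opposite and would instead slice the dictionary while giving every worker all of the combinations.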

Friday, August 8, 2014

Type-safe C?

I'm rather proud of an answer to Robert Harper's discussion of C typing that I wrote as a comment to a post, Six Points about Type Safety.

The post includes the footnote:

Dr. Robert Harper describes such a type safe analysis of C in a comment here

and while I largely agree with the six points, I disagree with Harper (and don't feel the need to carry the disagreement to wherever Harper posted his comment).

He says, in part, "For example, C is perfectly type safe. It’s semantics is a mapping from 2^64 64-bit words to 2^64 64-bit words. It should be perfectly possible to call rnd(), cast the result as a word pointer, write to it, and read it back to get the same value. Unix never implemented the C dynamics properly, so we get absurdities like 'Bus Error' that literally have no meaning whatsoever in terms of C code."

I don't believe this to be the case for two reasons, philosophical and definitional.

In the first place, if Unix, etc., "never implemented the C dynamics properly", we are very definitely into the discussion of the status of the concept of "unicorns", given that no actual unicorn exists. As a result, I feel perfectly free to assert anything without any real worry about contradiction---what is he going to do, declare me a heretic? Further, philosophically---my previously existing, unreasoned prejudices---I find his stance silly.

In the second, he is wrong about the definition of C. Its semantics, for any reasonable definition of the "semantics of C", are not a mapping from any number of any sized words to other words. The C standard, which is not formal but which is C in a real sense, pretty clearly says that the operation he describes is either undefined or implementation defined or otherwise similarly verboten. In which case, Unix does implement C dynamics properly. He and many others may not include SIGSEGV in their mental model of C's semantics, but that does not mean that either he or they are right.

Those are both significant problems, although the first is the worse. In what sense can one talk about the semantics of a language if no implementation of the language follows those semantics (even assuming those are the semantics of the definition of the language)? I do know that such is not a useful thing to do.

[Update] ...and the post on Hacker News for the Six Points article has been declared dead. Boy, do I love HN.