What’s wrong with CS research

The other day I was glaring obsessively at my sad, dwindling traffic numbers, and I noticed that a couple of weeks ago someone seems to have sent my post on the decline and fall of the American university system to reddit.programming, where it got—I thought quite amusingly—46 “upvotes” and 36 “downvotes.”

Obviously, UR is not a programming blog. And in a post that warms up with a 75-mile-long blockquote about Hegel, I didn’t really think of myself as writing for a technical audience. Technical audiences are not to be underrated, however, and I’ve noticed that a lot of UR readers are in this category. So I thought I’d say a few more words about CS research.

This is not a subject that interests everyone. And if it doesn’t interest you, no one can conceivably require you to read about it. I recommend you just skip this post.

First, the Reddit comment thread raised various criticisms which I didn’t see in time to refute. Let me start by answering two of these points, and sneakily segue into what I think is wrong with CS research.

One: Guy Steele is a system software designer, not a PL researcher. In other words, he is in a class with Dennis Ritchie, Alan Kay, Brendan Eich, John McCarthy, John Warnock, John Ousterhout, Bjarne Stroustrup, Rob Pike, Larry Wall, Ted Codd, Tim Berners-Lee, Leslie Lamport, Ken Thompson, Dave Cutler, Sergey Brin, Luis von Ahn, Guido van Rossum, Linus Torvalds, and anyone else who has designed fundamental architectural components of code which everyone on earth points their CPU at a zillion times a day. Frankly, these men are heroes. They are literally warriors of God. And anyone who insults them is an asshole.

I’m sure some of the above individuals—perhaps even most—were, at some time in their long and productive careers, funded by various grants or other sweetheart deals with the State. For that matter, I’m sure few if any of them would endorse the extreme opinions I express below. But if anyone with a forehead steeper than a halfpipe has ever described their work as, for example, “grant-savvy,” I am a Cape buffalo.

So here’s the first thing that’s wrong with CS research: there’s no such thing as CS research. First, there is no such thing as “computer science.” Except for a few performance tests and the occasional usability study, nothing any CS researcher does has anything to do with the Scientific Method. Second, there is no such thing as “research.” Any activity which is not obviously productive can be described as “research.” The word is entirely meaningless. All just semantics, of course, but it’s hardly a good sign that even the name is fraudulent.

When we look at what “CS researchers” actually do, we see three kinds of people. We can describe them roughly as creative programmers, mathematicians, and bureaucrats.

Creative programmers write interesting, cool, and relevant software. The only workable system for generating interesting, cool, and relevant software is well-known. Find a bunch of really smart programmers and point them at a large problem space for which there are actual users, and for which new solutions are unconstrained by old designs. Give them a manager who’s just as smart and knows how and when to stay out of their way, and turn them loose.

Depending on the people and the problem, this approach may or may not produce results. But certainly nothing else will. More or less the above strategy was followed at the MIT AI Lab, at Bell Labs in the early Unix days, at Engelbart’s lab, at Xerox PARC, at Berkeley CSRG, etc. It produced most of the aforementioned pioneers of system software.

And it is the farthest possible thing from academic CS research as practiced today, which I think is the main reason why we see so little good come out of the field.

Of course, creative programmers can be hard to inhibit. Academic CS programs do bring together a lot of very smart people, and sometimes they still get out of control and actually build something useful. Thus, for example, Google. Note that the idea that became Google was not a part of anyone’s research project, it was not assigned by Page and Brin’s advisors, it had no Principal Investigator—it was just interesting, cool, and relevant software.

As a general rule, this sort of thing can only happen in the first couple years of grad school, in which your basic purpose in life is to learn all the things they forgot to teach you as an undergrad. After that you are sucked into the apparat, and unless you are very good at working the system, your power to work on random cool stuff disappears. If you are extremely lucky and successful, it may reappear in fifteen years, when you have your own empire. But most creative people don’t have the patience to play this game.

The problem is the funding. The reason why CS research produces so little that can be called creative programming these days is that the modern process of grant-funded research is fundamentally incompatible with the task of writing interesting, cool and relevant software. Rather, its goal is to produce publications and careers, and it’s very good at that. As we’ll see when we get to the bureaucrats.

But because of this problem, most creative programming these days comes from free-software programmers working in their spare time. For example, the most interesting new software projects I know of are revision control systems, such as darcs, Monotone, Mercurial, etc. As far as Washington knows, this problem space doesn’t even exist. And yet the field has developed wonderfully.

It’s very difficult for creative programmers in the academic CS world to compete with those in the free-software world. Academics have one advantage: money. Free-software programmers have another: freedom. Ideally one would have both, but if you have to pick, pick freedom. If you look at the salary of a grad student or a postdoc, it’s not that hard to support yourself at the same standard of living by working intermittently in the corporate salt mines. Besides, you might actually learn something.

Next we have the mathematicians, who prove interesting, cool, and relevant propositions. Unfortunately, most of the really interesting, cool, and relevant propositions in computational mathematics were proved before 1960, but there is still some fun stuff left. Genuine computational mathematics typically goes under the name “algorithms” in CS, although cryptography is also of course math.

Obviously, computational mathematics and creative programming are only barely related. In my humble opinion, if something is math, it should be in the math department. As we’ll see, managing math and programming as if they were one field has produced disastrous results.

This brings us to the third group, the bureaucrats. Bureaucrats build academic empires which churn out meaningless solutions to irrelevant problems.

The CS-research bureaucrat’s main difficulty is that no one wants to fund bureaucrats. Therefore, he must pretend to be either a creative programmer or a mathematician, preferably both. Since this task is critical to his survival, he is extremely good at it.

The bureaucrat has many strategies. But probably his best is to take an area of creative programming and devour it like a locust, by turning it into a form of mathematics.

Math outcompetes creative programming in the funding process, simply because it appears to be more rigorous. It is more rigorous, and it generates a longer, deeper river of more impressive publications. And, because its area is nominally applied, it doesn’t have to compete with the real mathematicians over in “algorithms,” who would clean the bureaucrats’ clocks in five minutes.

The problem with math is that there’s an infinite supply of it. It is always possible to create new problems to solve and new formalisms to apply. Math is always interesting and cool, at least to mathematicians, and after your field has been parasitized by mathematicians for a decade or two, everyone in it will be a mathematician. There’s enough work for everyone.

It’s possible to describe anything in mathematical notation. I recall seeing some paper once in which someone had created a mathematical description of C. (I forget whether or not this included the preprocessor.) As an achievement, this is somewhat like building a full-size model of the Eiffel Tower out of tongue depressors. It’s clearly not the act of a talentless man, but you have to wonder what he said when he applied for his grant.

Whatever their tactics, what CS bureaucrats always sacrifice is relevance. Of course everyone has a conscience, and everyone would like to be actually relevant. But, especially when the only people checking up are your own godfathers in the funding agencies, it’s much easier to pretend to be relevant. Actual relevance is extremely difficult to achieve, and hardly rewarding at all. In fact it’s embarrassing, because everyone who isn’t relevant and knows it has an automatic grudge against you. Since CS research is quite competitive, evolution has its way, and relevance goes the way of the snake’s legs.

When the responsibility of relevance has been lifted, it’s easy to write as many interesting and cool papers as you like. The point of these papers is to prove that you, the researcher, are smart. Of course, not only would the world be a much better place if a smart person such as yourself were working on something relevant, but you yourself would be having more fun. But this is the trap.

No field has been more infested by irrelevant formalisms than that of programming languages—also known as “PL research.” So when I say Guy Steele isn’t a PL researcher, what I mean is that he’s not a bureaucrat. Rather, he’s a creative programmer who writes interesting, cool, and relevant software, which happens to consist of programming languages.

The only way to understand the horrific trainwreck that is PL research is to actually look, in detail, at the ways it’s discarded relevance in favor of irrelevance. To that end, let me answer another of the Reddit readers’ complaints.

So, two: it was perhaps a little light-hearted of me to describe proof-carrying code as the world’s most pointless compression scheme. But the argument is basically accurate.

The whole idea of proof-carrying code is that it’s a binary format for executable code whose correctness can be automatically verified. The term cannot imply a source format, because a source format whose correctness can be automatically verified is best known as a “type-safe programming language.”

Such a binary format is useful because it saves CPU cycles across the entire network. If the source format can be verified but the binary can’t—a trivial, and generally universal, property of anything anyone could describe as a “compiler”—obviously, you cannot ensure correctness unless you send the source rather than the binary. This requires you to perform the same compilation step on every machine for every piece of code it runs. An obvious, and obviously massive, source of global inefficiency.

Unfortunately for the researchers working on this problem, now and for the foreseeable future, the cost of client-side compilation is quite moot. It may have mattered at a time when computers were much slower. In fact, I will go so far as to say it probably did matter, say, as recently as 1990. However, with modern incremental compilation it is not even worth considering. Everyone who browses the web is compiling code from source, with every click. What is HTML? What is Javascript? Not chopped liver. Source-based systems have won, because they are simply simpler—they pass what Rohit Khare calls the “telnet test”—and they are easily fast enough on any modern machine.

If you can check the source but you can’t check the binary, it’s probably better to just send the source. If the binary is smaller than the source, a naive compression algorithm should do pretty well, too. If you like the fact that the binary is opaque and hard to decompile, it’s not at all hard to write a shrouder (automatic code obfuscator) that achieves the same goals. If you want to support multiple source languages, pick one which covers the entire feature space of the binary verifier, and compile the others to it instead. If all this fails and you really need secure binaries, use a cryptographic signing service. (People who really care might enjoy this amusing discussion of “The Future of PCC” from 2000.)

Because the price of PCC or anything like it is inevitably a file format whose definition is an 80-page LaTeX paper, including many strange, squiggly marks which for the average coder might as well be Sumerian cuneiform. There is a reason why IETF RFCs are ASCII-only, and at least the good ones are quite short. Therefore, the chance that PCC or any design involving it will ever be approvable by any standards process not involving a caoutchouc-based engraved ink applicator is, to put it mildly, remote. In standards, complexity is death.

Therefore, because the entire goal of a network protocol is to be standardized, PCC cannot possibly achieve its own goal. And therefore, every CS researcher working on verification algorithms for generated code is wasting time at public expense.

There is a name for this. The name is patronage, and most historians consider it a criminal abuse of government power. CS research is only a tiny brick in the great pyramid of academic patronage that is the university and educational system, and yes—if you didn’t, someone else would. But if you only steal a candybar, you’re still a thief.

And this is what I think is wrong with CS research: it’s best understood as a massive patronage scheme. Compared to all the other wholesale patronage operations run by our benign, caring masters, the financial cost of this particular scheme is relatively small. However, its human and economic cost is much higher, because it sucks in many young, talented people who could otherwise be engaged in productive work.

Again, any unproductive activity can be described as “research.” There is no endeavor so obviously noble that its value to others cannot be evaluated. Nor is there any endeavor whose toilers will happily report to you that they’re wasting their time, and their funding should be cut. There is no substitute for actual critical thinking.

If you feel the point is made, please feel free to skip to the end of the post. But if you think PCC may be an exception and you’re looking for more examples, by all means, let’s continue.

Again, the general problem in PL research is that it’s been drowned in clouds of math, whose effect is not to create better languages, but to publish more papers. Specifically, PL research today is actually the study of something called “type theory,” whose correspondence to what a naive person might consider research into programming languages—designing new languages for actual users to actually use—is extremely remote.

Type theory actually has nothing to do with computers or computing. It is the modern development of the project in metamathematics (the expression of mathematical statements in mathematical terms) that originated in the late 19th century and continued into the first half of the 20th, with researchers like Frege, Russell, and Church. In other words, it dates back to when “computer” was a job description. I rest my case.

You can model computing as metamathematics, of course. You can model computing as Tinkertoys. The whole point of Church-Turing equivalence is that pretty much any recursive model you can build is an equivalent description of computing. The only reason the lambda calculus and other such metamathematical approaches (such as defining programs as proofs) are so popular in the world of “CS research” is that “CS research” was largely founded by a land-rush of metamathematicians. The motivation for this migration is left as an exercise for the reader.

Of course, in a sense, anything you do with a computer can be described as “mathematics.” The point is not whether or not programming is math. The point is what notation is most effective for expressing programs. Choosing the best notation is the entire problem of programming language design, and this problem is neither mathematical nor scientific. A programming language is a user interface for programmers, and if you can reduce UI design to math, science, or any formal process, call Apple, not me.

I happen to prefer the ASCII-based family of notations used by working programmers, which are clearly effective for encoding the transformations of state which all programs, functional or imperative, typed or unsafe, express. I don’t see why anyone whose goal is to design, explain, or specify a programming language should be using any other notation than that of the language itself, and I don’t see why any language needs any non-ASCII notation.
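For a concrete taste of the gap, here is the same idea twice: first the standard function-application rule as a type-theory paper would typeset it (transliterated into ASCII), then the nearest thing a working Haskell programmer would actually write. This is only a rough sketch of the correspondence, and the apply function is my own throwaway name.

    -- The application rule, as the papers write it (squiggles transliterated):
    --
    --   Gamma |- f : A -> B     Gamma |- x : A
    --   ---------------------------------------
    --               Gamma |- f x : B
    --
    -- Roughly the same fact in the language's own ASCII notation: applying a
    -- function of type a -> b to an a yields a b. The compiler checks it.
    apply :: (a -> b) -> a -> b
    apply f x = f x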

Even academic languages like Haskell and OCaml have ASCII notations, although researchers who write papers on them are still fond of their private symbol set. As a marketing strategy this approaches the genius of Crunchy Frog. Frankly, CS research is so distrusted outside its own ghetto that even posting your work as a PDF file (let alone ps.gz) is a big red flag. Break out the squiggles and it’s pretty much curtains.

Anyway, if you want to learn about type theory, you can buy Types and Programming Languages, by Benjamin C. Pierce. If you are at all confused after this, you can clear up any difficulties with Advanced Topics in Types and Programming Languages, by Benjamin C. Pierce. Presumably Professor Pierce is now working on “Further Reading in Types and Programming Languages,” or “Extremely Advanced Topics in Types and Programming Languages.”

Each of these tomes would make a fine textbook for a graduate class, or at least upper-level undergraduate, at a top-rank American university. Certainly, anyone who can master this material is qualified to teach undergraduate programming. I’d also say they’re smart enough to take any other programming job—for example, they’d make good kernel hackers. If, of course, they were fully retrained to be masters of C (though many already are).

Meanwhile, you can explain the type system of C to a fifteen-year-old in a day. And C is almost certainly the most difficult programming language in use today, except C++. (I love C, but I have never much liked C++.) It takes a decent C programmer typically a good week or two to learn basic C++, although they often emit very nasty C-isms for quite some time thereafter.

Meanwhile, you can explain the type system of Python to a retarded nine-year-old in five minutes. I exaggerate, slightly.

This perhaps explains why Python was not designed by a PL researcher. Python was not designed by a PL researcher because Python would not impress other PL researchers. And since impressing your peers is the sine qua non of any academic discipline, any academic who designed something like Python (probably the closest similar case is Ousterhout’s Tcl) would be doing it simply for their own private kicks—a perverse feat of intellectual masturbation, perhaps impressing the undergrads, but hardly a source of any actual professional prestige.

It also explains why Haskell was not designed by Guido van Rossum (designer of Python). Mynheer van Rossum, who as far as I know dropped out of kindergarten, wanted to build a language that people would find useful. And he certainly did.

A programming language like Haskell, with a generally Piercean type system that takes a semester for even the smartest undergraduates to learn, is not useful. Because one of the key ingredients of victory in a language conflict—as in any other standards war—is how many people already use the language.

This conflict is unstable. It exhibits the Matthew Effect. Since a language is a UI for programmers, a language that’s hard to learn has a fatal and self-reinforcing disadvantage in this contest. There’s no need to learn it, because it has no chance of winning the standards war. And thus it is not a good language.

Haskell actually is quite useful—in one context. It’s an excellent teaching language for advanced students in a first-rate undergraduate CS program. Learning Haskell cannot help but teach you many challenging concepts, and it is inevitable that those who are both smart and dedicated will master these concepts. The rest will not. And thus the university system performs its principal function, the assignment of rank. If you venture to suggest that the same function could be performed, at two orders of magnitude less cost, by an IQ test and three weeks at Parris Island, you are probably some kind of a fascist.

Of course, the exponents of Haskell have an answer for this. They answer that since Haskell is so much more elegant and powerful than the likes of Python, it can overcome all these problems.

I am certainly a fan of pure functional programming, which Haskell offers and Python does not. In theory, pure functional programming is actually simpler than imperative programming, because the former is a subset of the latter. However, the slight advantage of removing imperative features does not even begin to make up for the Eigerlike learning curve of Haskell, whose incredible and utterly-unwarranted complexity manifests itself in more ways than I can even begin to describe.
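To make the “subset” point concrete before moving on, here is a minimal sketch of the same computation written both ways, in Haskell itself; the names sumPure and sumImperative are mine, not anything standard.

    import Data.IORef

    -- Pure style: no mutation, just a fold over the input list.
    sumPure :: [Int] -> Int
    sumPure = foldl (+) 0

    -- Imperative style, still legal Haskell: allocate a mutable accumulator
    -- in IO, loop over the list, then read the result back out.
    sumImperative :: [Int] -> IO Int
    sumImperative xs = do
      acc <- newIORef 0
      mapM_ (\x -> modifyIORef acc (+ x)) xs
      readIORef acc

    main :: IO ()
    main = do
      print (sumPure [1 .. 10])          -- 55
      print =<< sumImperative [1 .. 10]  -- 55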

Let me describe one of these ways by examining one of the principal features of Haskell, which is what might be called “dynamic higher-order programming.”

A language has “dynamic higher-order programming” when your programs can construct new functions dynamically at runtime. If you know a little about programming, this probably sounds esoteric, and for good reason. However, Haskell uses it all the time to do just about everything.

For example, in Haskell, when you write an add function—call it add()—that adds two integers a and b, the proper and approved way to do it is to first dynamically construct a function—it will have no name, but call it add_a(x)—that adds a to some argument x. Then you call add_a(b). This is called “currying” (after the mathematician Haskell Curry).

The motivation behind this bizarre obfuscation has always escaped me. Obviously it allows each function to have only a single argument, but it is pretty easy to write an add() function whose single argument is a pair (a b)—hardly a recondite data structure.
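A minimal sketch of the two styles just described, in real Haskell; the names add, addA, and addPair are mine, purely for illustration.

    -- The approved, curried way: `add a` constructs a new one-argument
    -- function at runtime (the nameless add_a from the text), which is then
    -- applied to b.
    add :: Int -> Int -> Int
    add a b = a + b

    addA :: Int -> Int   -- "add_a" with a = 3, given a name for clarity
    addA = add 3

    -- The alternative: one function whose single argument is a pair.
    addPair :: (Int, Int) -> Int
    addPair (a, b) = a + b

    main :: IO ()
    main = do
      print (addA 4)          -- 7
      print (addPair (3, 4))  -- 7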

And it’s not just that dynamic higher-order programming is overkill for add(). It’s overkill for anything. For example, one of the most touted uses of higher-order programming is parser generation—as performed by systems like Parsec in Haskell.

The idea of parser generation is, unsurprisingly, to generate a parser. A parser is a function whose input is a text file and whose output is a data structure. Obviously, parsing is the first stage in any process which involves converting source code into executable code, i.e., compiling. No programming language, for example, can exist without a parser.

The trick is that if you write a parser just as a normal program, it tends to be extremely messy. A parser is really best described as a data structure called a grammar, and a grammar—though it may include pieces of code—is not code at all, but its own data structure.

In most programming environments which are not language-specific, there is a separate tool (called yacc in Unix) that automatically converts a grammar into a function written in a normal language. For example, yacc converts a yacc grammar, typically a .y file, into a C program, or .c file.

Now the way yacc works is a nasty process, because it involves generating source, and as a general rule it’s always nasty to generate source. The Haskell model—in which the grammar is converted directly into a function, by a normal program in a language designed to build functions—seems much prettier.
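Here is a minimal sketch of that style, assuming the Parsec library (Text.Parsec); the toy grammar below—a comma-separated list of integers—is mine, not anything from the text.

    import Text.Parsec
    import Text.Parsec.String (Parser)

    -- The grammar is written directly as ordinary Haskell values: each
    -- combinator call builds a piece of the parsing function, and no
    -- generated .c file is involved anywhere.
    number :: Parser Int
    number = read <$> many1 digit

    numberList :: Parser [Int]
    numberList = number `sepBy` char ','

    main :: IO ()
    main = print (parse numberList "<example>" "1,2,3")
    -- prints: Right [1,2,3]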

And it is. However, there is only one problem—the function we build is the same every time. A language is a language. Its syntax is not dynamic. By definition, one parser function is sufficient for any program written in the language.

Therefore, we have no need for dynamic higher-order programming. All we need is static higher-order programming: the parser generator only has to run once per grammar, not every time the program runs. Granted, you might want to load grammars dynamically at runtime, but you might want to load any function dynamically at runtime. The solution to this problem is not dynamic function construction—it is dynamic module loading.

(Note that static higher-order programming is not at all inconsistent with a system of first-class functions, like function pointers in C, closures in Lisp, etc. In a language with first-class functions, you can store functions as variables in normal data structures, but this does not imply the ability to construct new functions.)

In fact, “static higher-order programming” has another name. It’s a specialty of the older, and much simpler, Lisp language family. In Lisp and its many relatives, “static higher-order programs” are called macros. Unsurprisingly, your average PL researcher looks at a macro pretty much the same way your average Parisian chef looks at an Outback Steakhouse.

Even forgetting the macros, it’s pretty easy to see why you don’t need dynamic higher-order programming. For example, a very common operation on functions is to compose them—you have f(x) and g(x), you make h(x), defined as f(g(x)). In the right formal model, a compose operator is really no different from addition. And once you have first-class functions—which are not absolutely necessary, but certainly useful—it seems like a trivial extension.
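In Haskell the compose operator is spelled (.); a minimal sketch, with square and increment as my own throwaway functions:

    -- (.) builds f(g(x)) as a brand-new, anonymous function value at runtime.
    square, increment :: Int -> Int
    square x = x * x
    increment x = x + 1

    h :: Int -> Int
    h = square . increment

    main :: IO ()
    main = print (h 4)   -- square (increment 4) == 25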

But is it what you, the programmer, want to do? Actually, no. It is simply bad programming style. Its only effect is to make your program less comprehensible.

Because in a language with first-class functions but without dynamic higher-order programming, all you need to do is build a data structure which is the ordered pair (f(x) g(x)), and define h(x) as an operation on this structure. This may be slightly more cumbersome for the programmer. At least, if one line of code counts as “slightly more cumbersome.”
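A sketch of that alternative, again in Haskell; the names Composed and runCompose are hypothetical, not from any library.

    -- Store the two functions in an ordinary two-field value, and make
    -- composition an explicit operation on that value. The pair is now an
    -- ordinary piece of data that can be named and passed around explicitly.
    data Composed = Composed (Int -> Int) (Int -> Int)

    runCompose :: Composed -> Int -> Int
    runCompose (Composed f g) x = f (g x)

    square, increment :: Int -> Int
    square x = x * x
    increment x = x + 1

    h :: Composed
    h = Composed square increment

    main :: IO ()
    main = print (runCompose h 4)   -- 25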

But it has a big benefit, which is that you’re dealing with a data structure and you know it. Strictly from the perspective of user experience, it’s much easier to debug data structures than to debug dynamic code. This is also the reason why self-modifying code is a bad idea, at least from a software-engineering perspective, and the bad idea of dynamic higher-order programming can be seen as a corollary of this well-known conclusion.

Of course, it may be slightly easier to optimize the result of dynamic code generation, because you can throw a generic runtime optimizer at it. But this comes back to the first problem, which is that, in practice, we only need static code generation. Whatever problem we are solving by constructing h(x), it is probably better solved by a macro, which would perform the same composition statically.

This entire problem space comes down to a single error in the metamathematical approach to computing, which is that the 1930s metamathematicians were fascinated by the fact that they could represent any data structure as a function. While this is true, to describe it as perverse is an insult to perverts.

Surely, if you were actually building a programming environment, rather than trying to express the fundamentals of mathematics, your main concern would be how to represent code as data, rather than the other way around. (Again, this is the Lisp approach.)

Defining data as dynamic higher-order functions worked badly when most computing was standalone. And it works even worse now that we have these things called networks. The problem is that a dynamic higher-order function is a sort of ethereal construct, a speck of four-dimensional pixie dust, and the idea of saving it to disk or sending it over the network does not make any particular sense.

The few people who actually try to use the metamathematical languages for real work struggle manfully to get around this—for example, at Jane Street Capital, an OCaml shop, they use a source preprocessor to generate Lisp-style (S-expression) representations of their types. A thing of beauty is a joy forever. A source preprocessor is not a thing of beauty.

Okay. Let’s rise slightly above this rant, and survey the wreckage. The problem, again, is that taxpayer funds are being used to employ some of the world’s most brilliant and energetic people to do work of no conceivable practical utility. Perhaps it may have some serendipitous benefit, but surely if the same people were working on relevant problems they would produce much more useful results.

How can such a thing happen? How did this disastrous patronage system come to exist? Was it anyone’s evil plot?

Not at all. It’s just the effect of treating something that isn’t a science as if it were a science. Confucius, if I may quote my namesake’s teacher, said that if he were given a chance to reform the state, his first act would be to make sure all things had their proper names.

I think the world could use a charity that funds creative programming. The software systems that people use today—don’t even start me on “Web 2.0”—are awful and ancient, and hardly anyone has any reasonable plan to improve them. Free-software programmers are not at all bad at supporting themselves, but nothing like Xerox PARC exists today, and it should.

But the US Government is not a charity, and creative programming is not “computer science.” And the result of treating these things as if they were things that they aren’t is abominable. It’s the morass of busywork, waste and fraud that is CS research today. By far the simplest treatment for this mess is to just abolish it.

Because anyone who’s not involved in CS research treats the products of this endeavor as if they were smallpox-infected blankets. Even when it is clearly—in my opinion—good, it winds up ignored. Because of the inescapable grant-related propaganda, it’s impossible to tell what’s good and what’s not.

For example, recently some of the same people involved in the PL research I so dislike built a language called Cyclone, which adds some very useful safety features to C. There are still zillions of free-software projects written in C, and probably most of them could benefit from upgrading to Cyclone. But the thing has the academic kiss of death on it, and no one will touch it. The mailing list is virtually empty.

This is not just prejudice. It is rational mistrust. Academics, as we’ve seen, have no incentive to build software that’s actually relevant, and every incentive to build software that appears to be relevant. They have every incentive to overpromote, and no incentive at all to build an actual user base. In fact, an actual user base is an enormous drag for a CS researcher, because it demands actual support, which does not advance the researcher’s career at all. Also, because of the nefarious Bayh-Dole Act, CS researchers are frequently arm-twisted into patenting their work, which of course they have no incentive to disclose. If the rest of these factors didn’t give you a reason not to use academic software, this certainly would.

The result is a world of Potemkin software which appears to be enormously useful, even revolutionary. In reality it is unusable. The researchers are some of the smartest people in the world, and surely some of what they’re doing has some merit. But it is almost impossible to figure out what’s wheat and what’s chaff, and most sensible people just don’t come near it.

For example, maybe seven years ago I was in a design meeting at a small company which was at the time the unquestioned innovation leader in its field, mobile Web browsing. (I admit that this is a pathetic field to be an “innovation leader” in, and in fact most of our “innovations” turned out to be worthless. I’m just describing the perception here, not the reality.)

The conversation turned to something and one employee, Patrick, who’d been recently poached from Nokia and didn’t know the vibe, said “maybe we could research that.”

Bruce, the CTO, promptly said: “if Alain the CEO heard you use that word, he’d rip off your head and spit down your neck.”

I said, “I take back everything bad I ever said about Alain.”