Wolfram Alpha and hubristic user interfaces

I feel it’s been too long since we had a purely technical discussion here on UR. Gotta mix it up a little more. I know UR has some technical readers. For everyone else, the summer is long.

Aside from being Billy Getty’s freshman roommate (it is no longer a secret that Mr. Getty owned an illegal ferret named “Earwig”), your author was a graduate student in computer science around the same time as Messrs. Brin and Page, at a similar though different institution. Unfortunately, his interest was not search, but operating systems. This turned out to be the “old thing.” Thus your author is doubly familiar with proximity to great wealth and success.

Basically, MM thought search was lame. It reminded him uncomfortably of “library and information science.” As for the Web, it seemed a laudable improvement on FTP—without that nasty reverse TCP connection. It certainly didn’t involve any distributed shared memory or process migration, that’s for sure.

Your author has certainly seen the error of his young ways. He now agrees not only that full-text search is a good idea, but that distributed shared memory is almost always a bad one, and process migration is always a bad one. Indeed, it’s not really clear to me that operating systems is a valid academic field at all. If someone had axed its funding (the dirtiest word in the English language has seven letters and starts with “F”) in 1980, how different would your computer be? Bear in mind: someone would also have a few billion dollars to spend on something else.

But there were (and, sadly, still are) a lot of very bright people in OS, which is really what attracted young MM. And he did learn a trick or two. Some of which work for problems other than process migration. And so, when in the year 2009, he sees people (also very bright) making one of the same mistakes that these bright people taught him not to make in 1992—he feels obliged to comment.

Indeed (as we’ll see), every decade since the ’80s, billions of dollars and gazillions of man-hours have been invested in this fundamental error, to end routinely in disaster. It’s as though the automotive industry had a large ongoing research program searching for the perpetual-motion engine.

The error is putting intelligence in a control interface; the lesson is that control interfaces must not be intelligent. Briefly: intelligent user interfaces should be limited to applications in which the user does not expect to control the behavior of the product. If the product is used as a tool, its interface should be as unintelligent as possible. Stupid is predictable; predictable is learnable; learnable is usable.

I was reminded of this lesson by a brief perusal of Wolfram Alpha, the hype machine’s latest gift. Briefly: there is actually a useful tool inside Wolfram Alpha, which hopefully will be exposed someday. Unfortunately, this would require Stephen Wolfram to amputate what he thinks is the beautiful part of the system, and leave what he thinks is the boring part.

WA is two things: a set of specialized, hand-built databases and data visualization apps, each of which is cool in its own right, and the set of which almost deserves the hype; and an intelligent UI, which translates an unstructured natural-language query into a call to one of these tools. The apps are useful and fine and good. The natural-language UI is a monstrous encumbrance, which needs to be taken out back and shot. It won't be.

This is hilariously illustrated by WA’s own Technology Review puff piece. Our writer, par for the course, spends seven pages more or less fellating Dr. Wolfram (for real technology journalism: L’Inq and El Reg), but notes:

The site was also bedeviled by an inflexible natural-language interface. For example, if you searched for “Isaac Newton birth,” you got Newton’s birth date (December 25, 1642; you also learned that the moon was in the waxing-crescent phase that day). But if you searched for “Isaac Newton born,” Alpha choked. Aaronson tested it with me and found it couldn’t answer “Who invented the Web?” and didn’t know state-level GDP figures, only national ones. […] “Why won’t it work with two cups of flour and two eggs?” Gray asked, finally.

“Well,” Williams replied, “there’s a bug.” […] But if you gave Wolfram Alpha every allowance—that is, if you asked it about subjects it knew, used search terms it understood, and didn’t care to know the primary source—it was detailed, intelligent, and graphically stunning. […] “Wolfram Alpha is an important advance in search technology in that it raises expectations about how content that is stored in databases should be searched,” Marti Hearst, a computer scientist at the University of California, Berkeley, and the author of Search User Interfaces, told me. But she added that it “has a long way to go before achieving its ambitious goals.”

Fun fact: when the author was a junior grad student, Marti Hearst was a senior grad student. How long will it be before intelligent search interfaces achieve their ambitious goals? Will Professor Hearst have retired by then, or not? Suppose we cut off her funding, etc.? And how exactly did CS get to be this field that goes around in a circle, sucking cash and getting nowhere? That’s certainly not why I spent all my Friday nights in the Sun lab.

But what do I mean by a control interface? The hypothesis turns on this definition. A control interface is the interface to a tool: the user already knows what she wants the tool to do, and she expects a predictable mapping from her input to the tool’s behavior.

Let’s examine this difference between Google and WA. Basically, Google is the exception: the UI that is not a control interface. Because Google’s search interface is not a control interface, it should be an intelligent interface, as of course it is.

Google is not a control interface because intrinsic to the act of performing a full-text search is the assumption that the results are to some extent random. Let’s say I’ve heard of some blog called “Unqualified Reservations” and I type it into Google.

Am I sure that the first result will be the blog itself? I suppose I’m about 95% sure. Do I have any idea what will come next? Of course not. Will I automatically click on the first result? Certainly not. I will look first. Because for all I know, the million lines of code that parsed my query could be having a bad hair day, and send me to Jim Henley instead.

Google is not a control interface, because no predictable mapping exists between control input and system behavior, and none can be expected. A screwdriver is a control interface because if I am screwing in a screw and I turn the handle clockwise, I expect the screw to want to go in. If the screw is reverse threaded, it will want to come out instead, confusing me dreadfully. Fortunately, this mapping is not random; it is predictable. (Yes, Aspies, by “random” I mean “arbitrary.”)

Because of this predictable mapping, people who screw in large numbers of screws are saved a large amount of cognitive load. The feedback loop becomes automatic. It embeds itself in muscle memory. Billions of lives made easier. Give it up for the standardization of the screw.

But any such mapping is inherently impossible for full-text search. Google’s problem is an intrinsically heuristic one. The result of the search is always a starting point for further analysis. There is never any automatic next step.

The advantage of this inherent unpredictability is that since a search request never implies any precise rules for the prioritization of results, a search engine can use arbitrarily fuzzy and complex heuristics to get the best results to the top. And, indeed, should. Thus, Google can be Google, and Google should be Google. And Google is Google. Give it up to teh Goog.

And here we come to Wolfram Alpha. WA is not the same thing as Google. Everyone knows this. Not everyone, however, seems to realize the implications. Let me explain why the natural-language interface of WA is such an awful idea.

WA is not a full-text search engine. It is a database query and visualization tool. More precisely, it is a large (indeed, almost exhaustive) set of such tools. These things may seem similar, but they are as different as popes and partridges.

Google is not a control interface; WA is. When you use WA, you know which of these tools you wish to select. You know that when you type “two cups of flour and two eggs” (which now works) you are looking for a Nutrition Facts label. It is only Stephen Wolfram’s giant electronic brain which has to run ten million lines of code to figure this out. Inside your own brain, it is written in glowing letters across your forehead.
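To make the contrast concrete, here is a toy sketch in Python of the two dispatch styles: the user naming her tool, versus the giant electronic brain guessing it from free text. The tool names and the keyword heuristic are, of course, invented for illustration; nothing here pretends to describe WA’s actual internals.

    # Hypothetical sketch: explicit tool selection vs. natural-language guessing.
    # Tool names and the keyword heuristic are invented for illustration only.

    TOOLS = {
        "nutrition_facts": lambda q: f"Nutrition Facts for: {q}",
        "date_facts":      lambda q: f"Facts about the date: {q}",
        "epidemic_model":  lambda q: f"Epidemic curve for: {q}",
    }

    def run_explicit(tool_name, query):
        """Control-interface style: the user says which tool she wants."""
        return TOOLS[tool_name](query)   # predictable: same input, same tool, every time

    def guess_tool(query):
        """'Giant electronic brain' style: infer the tool from free text.
        A real system is millions of lines of code; the failure modes are the same."""
        q = query.lower()
        if "cup" in q or "egg" in q:
            return "nutrition_facts"
        if "chicken" in q or "flu" in q:
            return "epidemic_model"      # oops: a recipe mis-routed to epidemiology
        return None                      # "wasn't sure what to do with your input"

    def run_guessing(query):
        tool = guess_tool(query)
        if tool is None:
            return "Wolfram Alpha wasn't sure what to do with your input."
        return TOOLS[tool](query)

    print(run_explicit("nutrition_facts", "two cups of flour and two eggs"))
    print(run_guessing("Grandma's fried-chicken recipe"))   # wrong tool entirely
    print(run_guessing("what rhymes with orange"))          # no tool at all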

So the giant electronic brain is doing an enormous amount of work to discern information which the user knows and can enter easily: which tool she wants to use.

When the giant electronic brain succeeds in this task, it has saved the user from having to manually select and indicate her actual data-visualization application of choice. This has perhaps saved her some time. How much? Um, not very much.

When the giant electronic brain fails in this task, you type in Grandma’s fried-chicken recipe and get a beautiful 3-D animation of a bird-flu epidemic. (Or, more likely, “Wolfram Alpha wasn’t sure what to do with your input.” Thanks, Wolfram Alpha!) How do you get from this to your Nutrition Facts? Rearrange some words, try again, bang your head on the desk, give up. What we’re looking at here is a classic, old-school, big steaming lump of UI catastrophe.

And does the giant electronic brain fail? Gosh, apparently it does. After many years of research, WA is nowhere near achieving routine accuracy in guessing the tool you want to use from your unstructured natural-language input. No surprise. Not only is the Turing test kinda hard, even an actual human intelligence would have a tough time achieving reliability on this task.

The task of “guess the application I want to use” is actually not even in the domain of artificial intelligence. AI is normally defined by the human standard. To work properly as a control interface, Wolfram’s guessing algorithm actually requires divine intelligence. It is not sufficient for it to just think. It must actually read the user’s mind. God can do this, but software can’t.

Of course, the giant electronic brain is an algorithm, and algorithms can be remembered. For instance, you can be pretty sure that the example queries on the right side of your screen (“June 23, 1988”) will always send you to the same application. If you memorize these formats and avoid inappropriate variations, you may not end up in the atomic physics of the proton.

This is exactly what people do when circumstances force them to use this type of bad UI. They create an incomplete model of the giant electronic brain in their own, non-giant, non-electronic brains. Of course, since the giant electronic brain is a million lines of code which is constantly changing, this is a painful, inadequate and error-prone task. But if you are one of those people for whom one of Wolfram’s data-visualization tools is useful, you have no choice.

This effect is unavoidable in any attempt at an intelligent control interface. Because any attempt at intelligence is inherently complex, the UI is effectively byzantine and incomprehensible. It isn’t actually random, but it might as well be. There is no human way of knowing when it will work and when it will crap out.

But the attempt fails: the algorithm is incapable of producing actual divine awareness of the user’s intent. So a user who is actually trying to use the control interface to get something done, i.e., to perform the normal task of selecting the dataset to query and visualize, cannot simply delegate that task to the UI. At least, not reliably. So she is constantly pounding her head on her desk. (As a toy, of course, Wolfram Alpha is great—but a toy is not a tool, i.e., not a control interface. And as a mere toy, it would never have been built.)

Thus, the “flexible” and “convenient” natural-language interface becomes one which even Technology Review, not exactly famous for its skepticism, describes as “inflexible.” The giant electronic brain has become a giant silicon portcullis, standing between you and your application of choice. You can visualize all sorts of queries with Wolfram Alpha—but first you have to trick, cajole, or otherwise hack a million lines of code into reading your mind.

For serious UI geeks, one way to see an intelligent control interface is as a false affordance—like a knob that cannot be turned, or a chair that cannot be sat in. The worst kind of false affordance is an unreliable affordance—a knob that can be turned except when it can’t, a chair that’s a cozy place to sit except when it rams a hidden metal spike deep into your tender parts.

Wolfram’s natural-language query interface is an unreliable affordance because of its implicit promise of divine intelligence. The tool-guessing UI implicitly promises to read your mind and do what you want. Sometimes it even does. When it fails, however, it leaves the user angry and frustrated—a state of mind seldom productive of advertising revenue.

Now: as I said, we have seen this pattern before. In the department of intelligent control interfaces, everyone above a certain age will be reminded of one great fiasco of the past: the Apple Newton, and its notorious cursive handwriting recognition. (The Doonesbury clip is a perfect four-panel dramatization of the effect of an unreliable affordance.)

Again we see an intelligent algorithm attempt to insinuate itself into the control loop. Again, we see risible disaster. (One difference is that handwriting recognition is not a problem requiring divine intelligence—at least, not for everyone. But even human-level intelligence is just as far out of software’s reach. Apple actually still ships the Newton handwriting engine, but no one uses it and it still sucks.)

With both these examples under our belt, we may consider the general problem of hubristic user interfaces. For at the spiritual level, the sin here is clearly that of hubris—overweening pride that angers the gods and calls down ate, or divine destruction. By presuming to divine intelligence, of course, Wolfram Alpha has committed hubris in the highest degree. (Dr. Wolfram is certainly no stranger to hubris.)

At a more mundane level, however, we may ask: how do these obvious disasters come about? Man is flawed and hubris is eternal, of course. But really. Why, year after year, does the software industry piss away zillions of dollars, and repeatedly infuriate whatever gods there be, butting its head against this wall like a salmon trying to climb Boulder Dam? Why on earth do these mistakes continue to be designed, implemented, and shipped? By smart, smart people?

The simple answer is that both academia and the industry are, to a substantial extent, driven by hype. Hype gets press, and hype also gets funding. The press (Inquirer and Register excepted) is not a critical audience. The NSF is an even less critical audience—at least, for research directions it is already funding. Again, if abject failure were an obstacle to continued funding, most of “computer science” would have ceased to exist sometime in the ’90s. Instead, Professor Hearst will no doubt be able to pursue her ambitious goals until a comfortable retirement in the 2030s. Long live science!

Hype also generates funding because it generates exaggerated sales projections. For instance:

“What Wolfram Alpha will do,” Wolfram says, “is let people make use of the achievements of science and engineering on an everyday basis, much as the Web and search engines have let billions of people become reference librarians, so to speak.” […] It could do things the average person might want (such as generating customized nutrition labels) as well as things only geeks would care about (such as generating truth tables for Boolean algebraic equations).

Generating customized nutrition labels! The average person! I just laughed so hard, I needed a complete change of clothing.

Dr. Wolfram, may I mention a word to you? That word is MySpace. If there is any such person as this average person, she has a MySpace account. Does she generate customized nutrition labels? On a regular basis, or just occasionally? In what other similar activities does she engage—monitoring the population of Burma? Graphing the lifecycle of stars? Charting Korean copper consumption since the 1960s? Perhaps you should feed MySpace into your giant electronic brain, and see what comes out.

Like most hubristic UIs, Wolfram Alpha is operating with a completely fictitious user narrative. The raison d’être of the natural-language interface, stated baldly, is to create a usable tool for stupid people who might be confused or intimidated by a tree of menus. The market of stupid people is indeed enormous. The market of stupid people who like to use data-visualization tools is, well, not. (And since the interface is not in fact easy but actually quite difficult, it achieves the coveted status of a non-solution to a non-problem.)

But there is a more subtle and devilish answer to the question of why hubristic UIs happen.

Strangely, to the developers of intelligent control interfaces, these interfaces appear to work perfectly well. Moreover, when the developers demo these interfaces, the demo comes off without a hitch—and is often quite impressive. This is not the normal result of broken software. This “demo illusion” convinces the developers that the product is ready to ship, although it is not and will never be ready to ship.

Demo illusion is caused, I think, by the same compensation mechanism that allows users to grit their teeth and use a hubristic UI. Again, the user who has no choice but to use such a monster develops her own internal mental model of its algorithm. If you are forced to use a Newton, you can, and this is what you do.

For example, the Newton user may note that when she writes a T with the bar sloping up, it is recognized as a T, whereas when the bar slopes down it has an ugly tendency to come out as a lambda. So she trains herself to slope her Ts upwards, or to always enter “one cup of flour” rather than “two cups of flour” and double the Nutrition Facts herself, or to jump through any other trivial and unnecessary hoop in order to placate the angry god inside the “intelligent” UI. By slow painful effort, she constructs a crude subset of the system’s functionality which happens to work for her, and sticks to it thereafter.

But for the actual developers, this compensation mechanism is far more effective. The actual developers (a) have enormous experience with the hubristic UI, (b) have enormous patience with its flaws, and (c) most important, know how it actually works. So their internal model can be, and typically is, orders of magnitude better than that of any naive user. So the product actually seems to work for them, and does. Unfortunately, it’s hard to make money by selling a product to yourself.

Now. UR is a positive, upbeat blog, and we never explore problems without offering solutions. And one of the reasons that Newton is such a fine example of hubristic UI is that Palm, a few years later, came along and did pen input right. It turns out, as some of us had always suspected, that pen computing is just not a very good idea, and the real solution is little keyboards. However, it is not impossible to make pen input work as a product—and Palm proved it, with Graffiti.

What Jeff Hawkins realized is that the human skull contains an organ called a “brain,” which has spent several million years learning to use tools. Therefore, if you are building a control interface, i.e., a tool, the prudent way to proceed is to (a) assume your users will need to learn to use your tool, (b) make it as easy as possible to learn the tool, and (c) make the tool as effective as possible once it is learned.

The big win of Graffiti was that the Graffiti recognizer was simple—perhaps an order of magnitude simpler than Newton’s, maybe more like two. If you invested the small amount of mental effort to learn Graffiti, which was not at all out of proportion to the cost or utility of the Palm, you had a predictable and reliable control mapping with a low error rate, because your brain’s internal model of Graffiti was reasonably close to the actual algorithm. Moreover, the process of learning it was actually kind of fun.
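For flavor, here is a toy recognizer in the spirit of Graffiti, with an invented stroke encoding and an invented table; the real Graffiti algorithm was naturally more involved. The point is only that a mapping this small is one the user can hold in her head, so her internal model and the machine’s actually coincide.

    # Toy single-stroke recognizer in the spirit of Graffiti.
    # Strokes are encoded as sequences of compass directions; the table is invented.

    STROKE_TABLE = {
        ("S",):               "i",   # one downstroke
        ("S", "E"):           "l",   # down, then right
        ("E",):               "-",   # a simple dash
        ("SE", "NE"):         "v",
        ("S", "E", "N", "W"): "o",   # a closed loop, drawn as four turns
    }

    def recognize(stroke):
        """Deterministic lookup: the same stroke always yields the same character.
        Unknown strokes fail loudly instead of guessing."""
        return STROKE_TABLE.get(tuple(stroke), "?")

    print(recognize(["S"]))            # 'i'
    print(recognize(["S", "E"]))       # 'l'
    print(recognize(["N", "N", "W"]))  # '?' -- not in the alphabet; learn the stroke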

Applying this realization to put a good UI on Wolfram Alpha would not be difficult at all. It would not even require removing the giant electronic brain, which could remain as a toy or exploratory feature. Again, it is a perfectly decent toy, and it may even be a reasonable way to explore the space of visualization tools and datasets that WA provides.

But if you are an actual flow user who actually needs to get something done, WA could give you an alternative, manual interface for selecting your tool. You might perform the discovery task by browsing, say, a good old-fashioned menu. For example, the Nutrition Facts tool might come with its own URL, which you could bookmark and navigate to directly. There might even be a special form for entering your recipe. Yes, I know none of this is very high-tech. (Obviously the coolest thing would be a true command line—but the command line is truly not for all.)
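A sketch of what such a manual interface might look like, with hypothetical paths and parameter names (this is not how WA’s site is actually organized): each tool gets a stable, bookmarkable URL, and the query string carries structured input rather than free text.

    # Hypothetical per-tool URLs for a WA-style site. Paths and parameters are invented.
    from urllib.parse import urlparse, parse_qs

    def nutrition_facts(params):
        return f"Nutrition Facts for ingredients: {params['ingredients'][0]}"

    def date_facts(params):
        return f"Facts about {params['date'][0]}"

    ROUTES = {
        "/tools/nutrition": nutrition_facts,  # bookmark this; you always get this tool
        "/tools/date":      date_facts,
    }

    def handle(url):
        parsed = urlparse(url)
        handler = ROUTES.get(parsed.path)
        if handler is None:
            return "404: no such tool"
        return handler(parse_qs(parsed.query))

    print(handle("https://example.com/tools/nutrition?ingredients=2+cups+flour,+2+eggs"))
    print(handle("https://example.com/tools/date?date=1988-06-23"))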

A more intriguing question is whether the Graffiti approach can be applied to full-text search. Many modern search engines, notably the hideous, awfully-named Bing, are actually multiple applications under the hood—just like WA. If Bing figures out that you are searching for a product, it will show you one UI. If it figures out that you are searching for a celebrity, it will show you another UI. It may also switch algorithms, data sets, etc., etc. I’m sure Google has all kinds of analogous, if more subtle, meta-algorithms.

While generic full-text search, unlike generic data visualization, remains a viable application and a very useful one, specialized search might (or might not—this is not my area of expertise) be an even more useful one. If the user has an affordance by which to tell the algorithm the purpose or category of her search, the whole problem of guessing which application to direct the query to disappears and is solved perfectly. A whole class of category errors ceases to exist.
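A sketch of what such an affordance might look like (the scope prefixes and routing are invented here; the actual meta-algorithms of Bing and Google are their own business): the user prepends an explicit category, the query goes straight to the corresponding specialized search, and plain full-text search remains the fallback when no scope is given.

    # Hypothetical scoped search: an explicit category prefix replaces intent-guessing.

    def product_search(terms):  return f"[product results for '{terms}']"
    def person_search(terms):   return f"[people results for '{terms}']"
    def fulltext_search(terms): return f"[generic full-text results for '{terms}']"

    SCOPES = {"product": product_search, "person": person_search}

    def search(query):
        """If the user names a scope ('product: ...'), route directly to that engine.
        Otherwise fall back to generic full-text search -- no guessing involved."""
        scope, sep, rest = query.partition(":")
        if sep and scope.strip().lower() in SCOPES:
            return SCOPES[scope.strip().lower()](rest.strip())
        return fulltext_search(query)

    print(search("product: 4gb sd card"))
    print(search("person: isaac newton"))
    print(search("unqualified reservations"))  # no scope: plain full-text search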

My guess is that if there is any “next thing” in search interfaces, it will come not from smarter UIs, but from dumber ones in which the user does more work—the Graffiti effect. If a small quantity of user effort can produce a substantial improvement in user experience (which is a big if), the user will accept the bargain. Hey, it made Jeff Hawkins rich.