It seems like the article may have confused two concepts from information theory. He talks about entropy (without explicitly naming it), which can be used to bound the best possible data compression scheme.
Data compression isn't really the problem when interacting with devices, though. What he should instead be looking at is channel capacity, where the user encodes information for transmission and the device receives and decodes it.
Maybe that's too technical for the nature of this article, but oh well.
It isn't really sensible to distinguish between "meaning" and "information." As he notes in the sidebar, measurements of "information" content depend on the statistical model used. "Meaning" is just information content under a statistical model we don't currently know how to formalize: patterns of human thought.
Like you said, the distinction between meaning and information is largely superficial and arbitrary. Letters are symbols, symbols carry meaning - a particular squiggle only contains information because it has meaning (outside of models and other artificial systems). If I don't read Sanskrit, one letter contains zero information/meaning for me.
"If something is more probable, it contains less information." I don't understand, why would "ee" contain less information than "hh"?
I'm also wondering where his comparison of English to Japanese came from:
"For instance, we could compare the number of bits per character in Japanese versus English (it turns out that English is only 60% as compact as Japanese, which explains why Japanese book are always around two-thirds the size of their English counterparts...)."
Are we talking about kanji or hiragana/katakana? If we're talking about kanji, we're talking about thousands of logographic symbols (a lot of which are multi-syllabic), which are fundamentally different in nature from English letters (as for the "60% as compact" bit, citation please?).
Based on his own assertion, "Let’s say that there are 32 letters and punctuation marks... each character represents 5 bits of information ... a total of 5*16 = 80 bits of information." There are somewhere between 2,000 and 7,000 kanji in common use ... so each character would represent...?
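Filling in that arithmetic (the 2,000-7,000 range is from the comment above, not a measured count): bits per character is just log2 of the symbol count.

    import math

    print(math.log2(32))     # 5.0 bits per character, the article's 32-symbol alphabet
    print(math.log2(2000))   # ~11.0 bits per character for a 2,000-kanji repertoire
    print(math.log2(7000))   # ~12.8 bits per character for 7,000 kanji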
"I don't understand: why would 'ee' contain less information than 'hh'?"
He's not talking about "information" in the sense of "facts" in this case. Do some reading on Shannon entropy for the details, but broadly, take the following example.
Suppose you're looking for a given word out of some arbitrarily defined set of words, such as a dictionary file. Suppose further you're given either the clue that it contains the sequence "ee" or the clue that it contains the sequence "hh". If 10,000-odd words in the set contain "ee" but only 100-odd contain "hh", the "hh" clue narrows your search far more. In information-theoretic terms, the less probable clue carries more information about the word you're looking for; the common "ee" rules out fewer possibilities and so carries less.
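Here's a minimal sketch of that in Python (the word-list path is an assumption, standard on many Unix systems; the 10,000/100 counts above are illustrative, so expect different numbers):

    import math

    # Count how informative each clue is: the rarer the clue, the more bits.
    with open("/usr/share/dict/words") as f:
        words = [w.strip().lower() for w in f]

    total = len(words)
    for clue in ("ee", "hh"):
        matches = sum(1 for w in words if clue in w)
        p = matches / total  # probability a random word fits the clue
        print(f"{clue}: {matches} of {total} words, {-math.log2(p):.2f} bits")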
How does this approach relate to the value of "don't make me think" in design? How "good" is, for example, a digital watch that asks you to enter the time you want to set in base-4, by pressing one button from a set of four at a time? According to Raskin's measure, such a design would be much more "efficient" (and bloody cool geeky, surely)... but is it "good" design? (The sketch below this comment works out the numbers.)
a) the effect of seeing the result of each of your actions immediately is not accounted for by this model of "design efficiency", and b) neither is how such "feedback" relates to the "current state of your mind" from instant to instant (that's the subjective bit).
I generally feel design needs a bit of this information thinking as well as a specification of a concurrently evolving "user state".
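To put numbers on the base-4 watch (a sketch under Raskin's measure; it assumes the 720 possible times, 12 hours x 60 minutes, that come up later in the thread):

    import math

    settings = 720                        # 12 hours x 60 minutes
    info_needed = math.log2(settings)     # ~9.49 bits to specify one time
    presses = math.ceil(info_needed / 2)  # each of 4 buttons conveys log2(4) = 2 bits
    print(presses)                        # 5 presses suffice
    print(info_needed / (2 * presses))    # ~0.95: nearly "perfect" by this measure

Which is the point of the question: the measure rewards dense encodings no matter how much thinking each press demands.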
Shouldn't the efficiency of an atomically synchronized watch be infinite, since there is no user input? Having a crown which needs neither to be pulled out nor pushed in would give 100% efficiency, and the no-input watch seems strictly better than this.
I thought this was a pretty interesting article, but I don't really buy the analog vs digital watch comparison. Mostly I don't agree that turning the crown is only 9.5 bits. There are 720 possible positions, and you have to move through 180 of them on average to get to the right time (assuming you can move forwards and backwards). Each tick of the crown is a movement, just as each button press is. In that case the analog watch drops to just over 5% efficiency.
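The arithmetic behind that, as I read the model (each crown tick counted as one unit of input, like a button press):

    import math

    positions = 720              # 12 hours x 60 minutes
    info = math.log2(positions)  # ~9.49 bits actually required to pick a time
    avg_ticks = positions / 4    # 180: average distance when you can turn either way
    print(info / avg_ticks)      # ~0.053, i.e. just over 5% efficiency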
He is overshooting it, but so are you. Really, you just need to know when it's time to stop turning the crown, and that's all.
You won't stop at the exact position, so there are going to be some adjustments that need to be made, but overall it's quite close to 9.5 bits, as he says.
Do you mind explaining how you arrive at that 9.5 bits? I'm just trying to understand this. It seems like you have to iterate through the variations one at a time; turning the dial lets you go through them quickly, but you're still going through them one by one. I don't see how turning a dial and pressing a button differ in that regard, other than that you can iterate much faster with a dial.
I guess I mostly don't see how he treats turning a dial as equal to 9.5 bits. I understand why he says it would take that many bits to store that decision, but not how a dial gets you there.
Bits of information, in the information-theory sense used for compression, are defined by choices. So it's choices you're counting, not actions. The guy's article is not technical, though he seems to think it is; it's more of a sentiment, a way of thinking about a problem domain, an approach.
To that extent you're not "going through them one by one", but getting into the right neighborhood, then backtracking (or advancing slowly), and only then stepping one by one through a dozen or so variations until you reach your desired option (roughly). It's not as technical or as precise as he presents it to be, but this approach to tackling UI complexity does give usable approximations, even if imprecise.
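Concretely (a sketch; the "halving" framing is mine, not his):

    import math

    print(math.log2(720))             # ~9.49 bits: the size of choosing 1 time of 720
    # Equivalently, about 10 coarse "keep going / stop and reverse" decisions,
    # if each one roughly halves the remaining range:
    print(math.ceil(math.log2(720)))  # 10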