Wednesday, November 4, 2015

The Hidden Secrets of QWERTY

This is another off-topic posting, but it's an issue that I stumble across quite often in my research on typewriter history.  The myth-busting ideas I offer here aren't new or original, but I've added in a new statistical analysis that provides extremely strong evidence in favor of one theory.

[Update 2015/11/09: Sigh.  It turns out there's nothing new under the Sun.  A similar (and more rigorous) statistical analysis can also be found in various papers by Neil Kay, starting with Rerun the Tape of History and QWERTY Always Wins, 2013.]

There is a very old and popular myth about the typewriter which claims that the keys on the QWERTY keyboard were deliberately laid out badly to slow down typists.  Allegedly the inventor, Christopher Latham Sholes, had problems with the type-writing machine (as he called it) jamming when people typed too fast, so he moved keys around until people couldn't type fast enough to jam it.  There are a number of problems with this very popular theory.

One problem is that the first customers (or product testers) for the new typewriter were telegraph operators, who needed to keep up with the incoming telegraph signal.  This was typically only about forty words per minute, but keeping up was absolutely necessary.  And because of the nature of the telegraph code used at the time, transcription did not proceed at a steady rate; the operator would often have to wait until a possibly ambiguous code was made clear by context.  The telegraph operator might therefore type in bursts, so the typewriter had to handle speeds well above the average rate of the signal simply to keep up.  So perhaps fifty or sixty words per minute was a minimum requirement in his design.  If jams occurred at these speeds, the machine would have been a failure.  And since every typist in the world was brand new during his testing, it's unlikely he was running into issues with operators going vastly faster than these design speeds.  At least at first.

Ad for the Remington Typewriter, advertising sixty to seventy words per minute.
As found in St. Louis Medical and Surgical Journal, Volume 39, No. 2, July 20, 1880, pg. v

The historical record bears this out.  The typewriter began commercial production in 1873.  In 1880 (and with the same fundamental design), Remington was advertising "sixty to seventy words per minute".  Other sources confirm that 60 words per minute was a realistic figure for almost anyone with enough practice.  But the typewriter was an immediate commercial success, and very soon there were expert typists who were very fast.  In the next decade typing speed tests became quite popular, and speeds well over 100 words per minute were commonplace.  In 1889, Mr. McBride of Ottawa, Ontario typed 179 words in one minute (albeit the same sentence repeated over and over).  The early machines were clearly capable of far more speed than any normal typist would ever require.

Sholes had no reason to slow down the typists, and in any case, QWERTY did not slow them down.

But there is a grain of truth in the myth.  The problem, though, wasn't in the keys but in the type-bars.  These were the metal arms that swung out and struck the ribbon against the paper to stamp each letter.  Anyone who has ever operated an older typewriter knows that these bars can sometimes get stuck against each other.  Almost any two type-bars could do this if struck at exactly the same time, but for type-bars right next to each other, a near miss was enough to cause a jam.

Jamming type-bars.  From a 1920s Hermes Model 2, with a modernized semi-circular basket.  Wikimedia Commons.

This hypothesis of type-bar collisions has been around since at least 1923, when The Story of the Typewriter made this assertion.  That book drew on a large collection of letters from Sholes to various associates, which lends credibility to its claim.  The claim is also partly corroborated by The Early History of the Typewriter (1918), written by one of the people who worked with Sholes, which notes that type-bar collisions were a problem on the very earliest typewriter designs.

There's a very important observation to make about this hypothesis that many modern commentators have completely missed: two adjacent keys on the keyboard do not have two adjacent type-bars.  Have you ever wondered why most keyboards arrange the keys in an odd, irregular staggered slant?  This was originally done because every key was a mechanical lever that needed its own parallel path straight to the back of the machine.  These paths were all evenly spaced, and the keys in any one row used every fourth lever.

Keys are staggered so every key has a parallel lever arm.
Diagram from Sholes' patent 207559, filed 1875. 

So while the keyboard order might be QWERTY, the actual order of the key levers on the early typewriters looked like this:

Q A 2 Z W S 3 X E D 4 C R F 5 V T G 6 B Y H 7 N U J 8 M I K 9 , O L _ / P ;
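To make the "every fourth lever" interleaving concrete, here is a minimal Python sketch that rebuilds the lever order above from the four keyboard rows. The row contents are an assumption read back off the lever list itself (including the "_" key), and `lever_order` is a hypothetical helper name, not part of the original research.

```python
from itertools import zip_longest

# Assumed row contents, read back off the lever list above.  Within each
# group of four levers the rows appear in this order: top letter row,
# home row, number row, bottom row.
TOP = list("QWERTYUIOP")
HOME = list("ASDFGHJKL;")
NUMBERS = list("23456789_")
BOTTOM = list("ZXCVBNM,/")


def lever_order(*rows):
    """Interleave the rows column by column, so each row uses every fourth lever."""
    order = []
    for column in zip_longest(*rows):
        # Shorter rows run out first; skip the empty slots they leave behind.
        order.extend(key for key in column if key is not None)
    return order


# Prints the lever order listed above.
print(" ".join(lever_order(TOP, HOME, NUMBERS, BOTTOM)))
```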

As you can see, supposedly problematic common keys like E and R are no longer next to each other.  But the early typewriters were even more complicated than this.  On more modern typewriters (from the early 1900s onward), this was also the order of the type-bars that struck the paper.  They were arranged in a semi-circle and struck the paper in front of you, so you could read what you typed.  On the original typewriters, however, the type-bars were arranged in a full circle, called a basket.  They struck the paper on the underside of the roller (you could only see what you had typed by lifting the roller).  Half the keys were linked to type-bars on the back half of the basket and half to the front.  The split was by keyboard row: the top two rows of keys went to the back of the basket and the bottom two rows to the front.

Remington Standard No. 2 with rollers raised to show circular type basket.


This meant that although the key levers were in the order described above, that was still not the order of the type-bars.  Instead, the type-bars in the basket were in this order:

Back of the basket:     Q 2 W 3 E 4 R 5 T 6 Y 7 U 8 I 9 O - P
Front of the basket:     A Z S X D C F V G B H N J M K , L / ;
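The basket order follows the same idea, except that only two rows share each half of the basket, so their keys simply alternate. A minimal sketch, assuming the key after 9 on the number row is the hyphen shown in the back-of-basket list; `interleave` is a hypothetical helper.

```python
from itertools import zip_longest

TOP = "QWERTYUIOP"
NUMBERS = "23456789-"   # assumption: hyphen after 9, as in the back-of-basket list
HOME = "ASDFGHJKL;"
BOTTOM = "ZXCVBNM,/"


def interleave(first, second):
    """Alternate keys from two rows, starting with `first`; a longer row's extra key goes last."""
    return [key for pair in zip_longest(first, second) for key in pair if key is not None]


back = interleave(TOP, NUMBERS)     # top two rows -> back of the basket
front = interleave(HOME, BOTTOM)    # bottom two rows -> front of the basket
print("Back: ", " ".join(back))
print("Front:", " ".join(front))
```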

Now we're finally in a position to understand the engineering decisions involved.  In the back of the basket (the top letter row), we have the vowels (except "A"), with a number placed between each pair of letters.  No vowel type-bar was adjacent to any other letter, and therefore none could collide with one.  The entire QWERTY row of the keyboard was protected from adjacent type-bar clashes, so in addition to the vowels, the very common letters R and T were also protected from collisions.
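Which neighboring type-bars could actually produce a letter-on-letter clash is easy to check mechanically. This sketch treats each half of the basket as a simple line and ignores whatever sits where the two halves meet, which is an assumption; `adjacent_letter_pairs` is a hypothetical name.

```python
BACK = "Q 2 W 3 E 4 R 5 T 6 Y 7 U 8 I 9 O - P".split()
FRONT = "A Z S X D C F V G B H N J M K , L / ;".split()


def adjacent_letter_pairs(basket):
    """List neighboring type-bars that both carry letters (possible letter-letter clashes)."""
    return [a + b for a, b in zip(basket, basket[1:]) if a.isalpha() and b.isalpha()]


print(adjacent_letter_pairs(BACK))   # []: every letter is flanked by digits or punctuation
print(adjacent_letter_pairs(FRONT))  # the 14 letter pairs on adjacent front type-bars
```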

A closer look at the bottom two rows of keys, the front of the basket, shows that every pair of adjacent type-bars forms a relatively uncommon letter pair.  But how uncommon?  Quite a few people who've gotten this far in the analysis still seem to think this layout could have happened by chance.  I wanted to know if that was even plausible, so I did a statistical analysis of his keyboard layout.

In order to analyze the QWERTY layout, I first analyzed a word list, the Enhanced North American Benchmark Lexicon (ENABLE).  For every possible pair of letters, like "er", I counted the number of words that contained that pair.  (The matches for "er" and "re" were added together, as the order is irrelevant for our purposes.)  This gave a total number of matching words for each letter pair, and I then ranked the pairs by their number of matches.  Not surprisingly, "er" ranks first, with 50047 word matches.  Nineteen different letter pairs are tied for 307th place because they never occur at all.
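A sketch of this tally in Python follows. It assumes "contained that pair" means the two letters appear side by side in the word, and it uses a tiny toy word list in place of ENABLE, so its numbers illustrate the method only. Ties are handled with competition ranking (1 plus the number of pairs with strictly more matches), which is how nineteen unused pairs can all share 307th place out of 325. `pair_counts` and `competition_ranks` are hypothetical names.

```python
from collections import Counter
from string import ascii_lowercase


def pair_counts(words):
    """For each unordered pair of distinct letters, count words containing it side by side.

    A word containing both "er" and "re" counts once for each, mirroring the rule
    that matches for the two orders are added together.  Doubled letters are
    skipped, since only the C(26, 2) = 325 pairs of distinct letters are ranked.
    """
    counts = Counter()
    for word in words:
        word = word.lower()
        bigrams = {word[i:i + 2] for i in range(len(word) - 1)}  # each bigram once per word
        for bigram in bigrams:
            if bigram.isalpha() and bigram[0] != bigram[1]:
                counts["".join(sorted(bigram))] += 1
    return counts


def competition_ranks(counts):
    """Rank all 325 pairs by count; tied pairs share a rank, so unused pairs tie for last."""
    pairs = [a + b for i, a in enumerate(ascii_lowercase) for b in ascii_lowercase[i + 1:]]
    return {p: 1 + sum(counts[q] > counts[p] for q in pairs) for p in pairs}


# Toy word list standing in for ENABLE.
counts = pair_counts(["tree", "error", "here", "read"])
ranks = competition_ranks(counts)
print(counts["er"], ranks["er"])  # 5 1
```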

I used this ranking of letter pairs to analyze the adjacent type-bars in the front of the basket.  The most common letter pair on adjacent type-bars in Sholes' layout is A and Z, which ranks 131st out of 325 in my analysis; every other adjacent pair in the front of the basket is even less common.

We could actually calculate the odds of this being a coincidence.  But since I'm a programmer, not a statistician, I wrote a simulation that tried random keyboard layouts, looking for keyboards where the most common letter pair on adjacent type-bars ranked 131st or lower.  On average, it takes more than 90,000 random tries for this to happen by luck.
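A simulation along these lines might look like the following sketch. It is not the program used for the figures above: it assumes a random layout reshuffles only the 26 letters among the existing letter keys (digits and punctuation stay put), treats each basket half as a line, and runs on a placeholder ranking of the 325 pairs because the ENABLE-based ranking isn't reproduced here. `random_baskets`, `most_common_clash_rank`, and `tries_until_as_good` are hypothetical names.

```python
import random
import string

# Basket halves from the lists above.  Assumption: a random layout reshuffles
# only the 26 letters among the letter keys; digits and punctuation stay put.
BACK = "Q 2 W 3 E 4 R 5 T 6 Y 7 U 8 I 9 O - P".split()
FRONT = "A Z S X D C F V G B H N J M K , L / ;".split()


def random_baskets(rng):
    """Deal the 26 letters at random onto the letter slots of both basket halves."""
    letters = list(string.ascii_uppercase)
    rng.shuffle(letters)
    deal = iter(letters)
    return [[next(deal) if key.isalpha() else key for key in half] for half in (BACK, FRONT)]


def most_common_clash_rank(baskets, ranks):
    """Smallest rank (1 = most common pair) among letter pairs on adjacent type-bars."""
    return min(
        ranks["".join(sorted(a + b)).lower()]
        for half in baskets
        for a, b in zip(half, half[1:])
        if a.isalpha() and b.isalpha()
    )


def tries_until_as_good(ranks, threshold, rng, max_tries=10_000_000):
    """Draw random layouts until the most common clash has rank number >= `threshold`."""
    for tries in range(1, max_tries + 1):
        if most_common_clash_rank(random_baskets(rng), ranks) >= threshold:
            return tries
    return None  # gave up without a success


# Placeholder ranking only: a shuffled order of the 325 pairs stands in for the
# ENABLE-based ranking, so the count printed here says nothing about QWERTY.
rng = random.Random(1)
pairs = [a + b for i, a in enumerate(string.ascii_lowercase) for b in string.ascii_lowercase[i + 1:]]
rng.shuffle(pairs)
placeholder_ranks = {pair: i + 1 for i, pair in enumerate(pairs)}
print(tries_until_as_good(placeholder_ranks, threshold=131, rng=rng))
```

Averaging the returned counts over many runs, with the real ranking plugged in, would give an estimate comparable to the 90,000 figure.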

But that doesn't take into account how rare the remaining adjacent type-bar pairs are on Sholes' layout.  I added up the word matches from my letter-pair database for all of the adjacent type-bar pairs on his keyboard, and found 3877 in total.  This is out of 1,239,045 word matches overall, or about 0.3%.  So I rewrote my simulation for this new standard, looking for keyboards with this few (or fewer) total word matches on colliding type-bar pairs.  With this approach, it takes on average about 5.9 million random tries to find a keyboard layout that is as good as QWERTY.
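In a simulation, this second criterion would replace the single-pair rank test with a sum over all adjacent pairs, accepting a random layout when its total is 3877 or less. A sketch of that sum, treating each basket half as a line and assuming `counts` maps sorted lowercase pairs such as "az" to their match totals; `total_clash_matches` is a hypothetical name.

```python
BACK = "Q 2 W 3 E 4 R 5 T 6 Y 7 U 8 I 9 O - P".split()
FRONT = "A Z S X D C F V G B H N J M K , L / ;".split()


def total_clash_matches(baskets, counts):
    """Sum the word matches of every letter pair sitting on adjacent type-bars.

    `counts` is assumed to map sorted lowercase pairs such as "az" to match totals.
    """
    return sum(
        counts.get("".join(sorted(a + b)).lower(), 0)
        for half in baskets
        for a, b in zip(half, half[1:])
        if a.isalpha() and b.isalpha()
    )


# The QWERTY figures quoted above: 3877 clash matches out of 1,239,045 in total.
print(f"{3877 / 1_239_045:.2%}")  # 0.31%
```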

To put it another way, the odds that solving this type-bar problem was not Sholes' primary design goal are worse than one in five million.

We can also take a brief glance at how the keyboard evolved.  Did you notice anything odd about the keyboard layout in the image above from the 1875 patent?  The bottom row shows "Z C X V".  The last change made to the layout before it settled into its modern configuration was to swap the X and the C.  Before this final change, the front of the basket would have been in the order A-Z-S-C-D-X.  This puts the letter combination SC on adjacent type-bars.  This pair is more than twice as common as AZ, ranking 85th out of 325.  So the final change made to QWERTY before it reached our modern standard was to separate the most common letter pair remaining on adjacent type-bars.
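The effect of that final X/C swap can be checked with the same interleaving idea. This sketch assumes the rest of the 1875 layout (the home row and the other bottom-row keys) already matched the modern one, which may not be exactly right; `front_pairs` is a hypothetical helper.

```python
from itertools import zip_longest

HOME = "ASDFGHJKL;"


def front_pairs(bottom_row):
    """Unordered letter pairs on adjacent type-bars in the front of the basket."""
    front = [k for pair in zip_longest(HOME, bottom_row) for k in pair if k is not None]
    return {"".join(sorted(a + b)) for a, b in zip(front, front[1:]) if a.isalpha() and b.isalpha()}


old = front_pairs("ZCXVBNM,/")   # bottom row as drawn in the 1875 patent
new = front_pairs("ZXCVBNM,/")   # bottom row after the X/C swap
print(sorted(old - new))  # ['CS', 'FX']: pairs the swap separated
print(sorted(new - old))  # ['CF', 'SX']: pairs the swap made adjacent
```

Under these assumptions, the swap separates S and C (and X and F) at the cost of putting S-X and C-F side by side.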

Now let's revisit the QWERTY myth.  Sholes was allegedly trying to slow down typists.  But how exactly was QWERTY supposed to do that?  One suggestion is that QWERTY put popular letter pairings on the same hand, on the theory that you can type more quickly alternating hands and more slowly on the same hand.  But according to my letter-pair database, the QWERTY layout puts eight of the twenty most common letter pairings on opposite hands.  Considering there are 325 possible letter pairs, the QWERTY layout is hardly a successful implementation of this strategy.
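The same-hand argument depends on which hand types which key. The sketch below assumes the modern touch-typing split (left hand: QWERT, ASDFG, ZXCVB), which is itself an anachronism for the 1870s. The top-twenty list from the ENABLE tally isn't reproduced here, so the helpers simply take letter pairs as input; `on_opposite_hands` and `count_opposite` are hypothetical names.

```python
# Assumption: modern touch-typing hand assignment on the QWERTY letter keys.
LEFT_HAND = set("QWERTASDFGZXCVB")


def on_opposite_hands(pair):
    """True if the two letters of `pair` would be typed by different hands."""
    first, second = pair.upper()
    return (first in LEFT_HAND) != (second in LEFT_HAND)


def count_opposite(pairs):
    """How many of the given letter pairs alternate hands."""
    return sum(on_opposite_hands(p) for p in pairs)


print(on_opposite_hands("er"))  # False: E and R are both left-hand keys
print(on_opposite_hands("th"))  # True: T is left-hand, H is right-hand
```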

Another (contradictory) notion is that you could slow down typing by avoiding common letter pairs on adjacent fingers, because those are very easy to type quickly.  However, the letters of the most popular pair of all, E and R, sit right next to each other, typed with the third and second fingers, the most coordinated fingers.  These keys can be typed very rapidly.

If slowing down the typist was a design goal, then by any theory, Sholes clearly did a horrible job.

Now let's look at this analysis from the opposite point of view.  Suppose type-bar jams were his primary concern, and a secondary concern was making the layout as fast and convenient as possible.  While keeping type-bar collisions to an impressively low frequency, he still managed to put seldom-used letters like Q and Z on the far edge of the keyboard.  He put most of the vowels on the right hand (sorry, lefties).  He put the most common letter pair in the language, ER, in a position where our two strongest fingers could be drummed in succession to type these letters.  While we could quibble over the details, I find it impressive that he managed to maintain such convenience in layout while avoiding type-bar jams.

I should also point out that his keyboard layout was likely designed with two- or four-finger hunt-and-peck typing in mind, as touch typing didn't exist yet.  It developed over the first decade of the typewriter, with eight- and ten-finger methods and different fingerings.  It's possible that some touch typing already existed in 1875 when he proposed the nearly final keyboard layout, but that layout wasn't drastically different from his 1873 one.  Still, by 1873 Sholes and a short list of others had spent a very large number of hours typing, so it's entirely possible that they had at least toyed with touch typing themselves.  Regardless, moving keys like Q and Z away from the middle makes sense for both touch typing and hunt-and-peck.

But alas, Sholes was not satisfied with QWERTY.  He actually included a new layout in another patent filed shortly before he died.

Sholes' final keyboard design.
U.S. Patent 568630, filed 1889 (granted posthumously).

By spreading more punctuation along the bottom and the numbers along the top, this layout has even fewer type-bar pairs where one letter sits next to another.  We only have G-Z-K at the top right and Q-J-V-B at the lower left.  The most common pair here is BV/VB, which ranks 246th out of 325.  The total number of word matches from the ENABLE lexicon involving adjacent type-bar pairs on the QWERTY keyboard was 3877.  The total for this new keyboard?  Only 77, or about one word match out of every 16,000.  At the same time, he groups the vowels together (in order) and puts all of the most common letters either on the right hand (again, sorry, lefties) or on the left index finger.
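The ratios quoted here are simple divisions over the three totals given in the text; a quick check in Python:

```python
TOTAL_MATCHES = 1_239_045   # all letter-pair word matches in the ENABLE tally
QWERTY_CLASHES = 3877       # matches on adjacent type-bars, QWERTY
LATE_CLASHES = 77           # matches on adjacent type-bars, 1889 patent layout

print(f"QWERTY: 1 in {TOTAL_MATCHES / QWERTY_CLASHES:,.0f}")      # QWERTY: 1 in 320
print(f"1889 layout: 1 in {TOTAL_MATCHES / LATE_CLASHES:,.0f}")   # 1889 layout: 1 in 16,091
print(f"improvement: {QWERTY_CLASHES / LATE_CLASHES:.0f}x")       # improvement: 50x
```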

I tried to simulate the odds of randomly finding a better keyboard than this one (at avoiding adjacent type-bar collisions).  The program ran for a week, tried more than 40 trillion keyboards, and found none as good as this one.  Even if you assume this was a bizarre dry patch in the random search, it's hard to imagine the odds being better than one in 100 billion that this layout was an accident.  At this point it is essentially impossible that avoiding type-bar jams was not a primary design goal of his keyboard arrangements.  Based on my simulations, the odds of creating two different keyboards that satisfy this design criterion merely by chance are far worse than one in five hundred quadrillion.  The odds of one person winning two Powerball lotteries back to back (after entering each lottery only once) are more than ten times better than this.
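The combined figure comes from multiplying the two estimates, which assumes the two chance events are independent:

```latex
\underbrace{5.9\times 10^{6}}_{\text{QWERTY}} \times \underbrace{10^{11}}_{\text{1889 layout}} = 5.9\times 10^{17} \approx 590\ \text{quadrillion}
```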

Now to be fair, it's not likely that a keyboard design would be completely random, and so this may not be the ideal calculation of odds.  But even if you took a few orders of magnitude off the odds, they'd still be incredibly remote.  And given the original argument about slowing down typists, I think testing completely random keyboard arrangements is justified: most of these random layouts would have been much harder to type on than QWERTY, slowing the typist down even more.  It would have been trivial for Sholes to hamper typists far more than QWERTY does.  It would have been very difficult for him to beat QWERTY on type-bar collisions.  And yet he did.

This is not a mistake or design flaw, this is impressive engineering.

This statistical analysis not only demonstrates that type-bar jams were the reason for QWERTY, the analysis also shows what a fantastic job Sholes did to solve this problem, while maintaining a usable keyboard.  Based on all available evidence, it's obvious that he did his best to make typing as fast as possible.  And he put far more time, effort, and engineering into this layout than critics today would ever have imagined.

But in the end, does any of this matter?  Clearly the reasons for the layout of the keyboard and the offset position of the keys are all gone.  QWERTY is purely vestigial.  On the other hand, it does show that QWERTY isn't so bad after all, and the muscle memory spent on learning QWERTY isn't a hopeless waste.  Recent examinations of the Dvorak layout have revealed that it isn't all it's cracked up to be.  And if we decide to replace QWERTY, exactly what metric should we be using?  With Sholes' own XPM layout, he seemed to place value in making the keyboard easy to learn (based on having AEIOUY in order on the keyboard).  So is learning important?  Raw speed?  Limiting repetitive stress?  Reducing errors?  Each of these goals might result in a completely different and not wholly satisfying design.

As awful as people claim QWERTY is, so far there's just been no compelling reason to replace it.  Perhaps this more than anything else demonstrates that QWERTY wasn't so bad after all.

In upcoming posts, we'll look at who might have done the statistical analysis of this keyboard, and we'll look at (and squash) a recent theory that claims QWERTY was created to simplify telegraph code transcription by putting keys with related telegraph codes near each other.
