Friday, December 28, 2012

One Space is Just Wrong

"The space between sentences is an aesthetic and functional choice, not a law."

That's the tag line on my Twitter account.  It's been my motto as I approach this issue.  It's my attempt to be fair and even-handed, and to avoid being preachy.  However, as I learn more about this issue, I'm beginning to realize that it might be wrong.  Or at least incomplete.

When it comes to typography, I stand by it.  If you have the luxury of choosing your sentence spacing, by all means, choose whatever sentence spacing you find appropriate for your composition.  Wide spacing, or word spacing, a little wide or very wide, or even narrower than word spacing if you like.

But what I've learned is there is a very clear right answer on how many times to hit the space bar after a full stop.

We must use two spaces after a full stop.

It's odd that I'd say this, considering that such brash and pedantic statements from monospacers like Farhad Manjoo are what set me down this obsessive-compulsive trail.  But unlike their reasons, which are clearly wrong and easily dismantled, I have a reason that is practical, and actually important.

First, let me be completely clear.  The printed space between sentences in a published work is not what I am talking about here.  I am only talking about the number of times you should hit the space bar after a sentence.  These things are not the same today, although they used to be.  On a typewriter they were the same thing.  On a Mergenthaler Linotype hot metal typesetting machine, they were the same thing (although it used two different kinds of spaces).  But on computers, they are not.

Computers can format your sentences however you like.  There's just one catch: computers must first know what a sentence is.

Wait, isn't that an easy one?  Actually, it is not.  The problem is that the meaning of a period is ambiguous.  It's not just a full stop.  It's also used to mark abbreviations.  And for initials.  And as a decimal point.  And after enumerations.  And as part of an ellipsis.  These things don't trip up human beings that often.  But for computers, deciding if that period is a full stop defies all attempts at a solution.  And there have been scores of scholarly papers on this very subject of sentence boundaries.  Let's look at a worst case scenario:
"Who's going?" "You and I. Smith is going too."
Is that "You and I." followed by "Smith is going too."?  Or is that referring to someone named "I. Smith"?  With no context, there's no way for even a human being to decide.  So for a complete solution we need a computer program that is capable of understanding context.  This is way beyond our current state of the art in computer software.
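
To make the problem concrete, here is a minimal sketch of the kind of rule a program might start with: treat a period, question mark or exclamation point as a full stop whenever it is followed by whitespace and a capital letter.  The function name naive_split is my own invention for illustration, and real sentence splitters are far more elaborate than this.

```python
import re

def naive_split(text):
    """Split after ., ! or ? when followed by whitespace and a capital letter.

    This is a deliberately simple illustration, not a real algorithm:
    it cannot tell a full stop from an initial or an abbreviation.
    """
    return re.split(r"(?<=[.!?])\s+(?=[A-Z])", text)

# The rule silently picks one reading of the ambiguous reply.
print(naive_split("You and I. Smith is going too."))

# The same rule breaks an ordinary abbreviation in half.
print(naive_split("Ask Dr. Jones about it."))
```

The first call returns "You and I." and "Smith is going too." as two sentences, whether or not that was the intended meaning.  The second call cuts the sentence after "Dr.", which is simply wrong.  Adding a list of known abbreviations patches the second case, but nothing in the text itself can settle the first.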

But that's just one example, right?  I mean, even people aren't perfect, so there must be an approach that's close enough?  Well, that depends on what you mean.  I've seen algorithms that claim to be successful 95% of the time or even 98% of the time.  Let's think about that - 98% means that your average article longer than 50 sentences is likely to have at least one sentence boundary in it that the computer gets wrong.
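
The arithmetic behind that claim can be sketched in a few lines.  This assumes each sentence boundary is an independent guess with the same chance of being right, which is my simplification and not something the algorithms' authors state.  The function name is hypothetical.

```python
def chance_of_at_least_one_error(accuracy, sentences):
    """Probability that at least one boundary is wrong.

    Assumes every boundary is an independent guess that is
    correct with probability `accuracy` (a simplification).
    """
    return 1 - accuracy ** sentences

for accuracy in (0.95, 0.98):
    p = chance_of_at_least_one_error(accuracy, 50)
    print(f"{accuracy:.0%} accurate, 50 sentences: {p:.0%} chance of an error")
```

Under that assumption, a 98% algorithm has roughly a 64% chance of getting at least one boundary wrong in a 50-sentence article, and a 95% algorithm has roughly a 92% chance.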

So why do we care if a computer can parse what we say?  One practical reason is for machine translation.  Machines that are trying to translate what you are communicating can do a much better job if they know what a sentence is.  There's also text-to-speech for the visually impaired (and others), which requires a knowledge of sentence boundaries to yield realistic inflection.  And of course it's useful for those who want wider sentence spacing.  But the larger reason is simply that we are trying to communicate, and if a computer is going to mess it up, there's a chance a person might blow it too.

This whole problem could be solved if we only had an unambiguous full stop, rather than the confusing period.  But actually we do, or at least, we did.  For hundreds of years, it wasn't just professional printers that added extra space after sentences.  When students learned basic penmanship in school, they were also taught to add extra space between sentences.  There were no typewriters or computers.  All person-to-person communications before the typewriter were handwritten, and people penned their letters with extra space between sentences.

In other words, the unambiguous full stop is simply a period combined with extra space.  But we are losing this standard piece of punctuation.  We are losing it to technology.  We lost it to the poor formatting capabilities of the Linotype, and the expense of corrections.  We lost it to the general desire for faster and cheaper printing, and even the slight savings in paper.  We lost it to early web design standards that ignored the issue, because HTML was never meant to be so heavily typographic, and the designers didn't want to bother.

But that's all in the past.  Right now, we are losing it to a group of people who have declared war on the extra space.

No, I don't mean typographers.  You do often see claims that typographers everywhere have declared extra spacing to be wrong.  What you don't see is hordes of actual typographers saying that.  Sure, you see some here and there.  But there are also quite a few typographers who don't think it's such a hard and fast rule.  And even a few who long for the days of wider sentence spacing.  Despite any claims, typographers are not the driving force in this war.

The real leaders of the crusade are the editors.  If you see someone complaining online about this issue, by far the best guess is that they are an editor of some sort, and they are complaining about editing out extra spaces.  It's unfortunate that no one has bothered to tell them of the wonders of search-and-replace.  Or of requesting new features in their software if it doesn't do what they need.  Or of buying someone else's software.  All of these choices would alleviate the editor's pain without destroying a basic piece of punctuation.  But instead of fixing their software, they have chosen to bend the habits of the world to their will.

It's more than just unfortunate.  It's tragic.  Editors are entrusted with preserving our communications, with standardizing them.  And yet it is editors who, simply out of plain laziness or technological ignorance, are willing to cast aside the unambiguous full stop like it was nothing more than yesterday's newspaper, or this morning's toilet paper.

In print, the unambiguous full stop disappeared thanks to cost-cutting.  Luckily, we had typewriters there to keep it alive, and teach us all the two-space habit.  But now we're in the computer age.  Much software has learned its lesson from typewriters, and those two spaces are still used and recognized in a wide range of software packages.  But unfortunately there are other software packages that don't understand.  They learned their lessons from the print industry, without understanding that the loss of extra spacing was simply a cost-cutting strategy.

The irony is that today, the costs that led to the demise of wide sentence spacing in the print industry no longer apply.  Thanks to the two-space typing habit, it's a trivial change to allow the print industry to reliably typeset sentences according to any desired style.  And it's trivial for translation software and text-to-speech software to reliably detect sentences and use that information.
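
Here is a minimal sketch of how trivial it is, assuming the writer has consistently typed two spaces after every sentence and one space everywhere else.  The function name split_sentences is hypothetical, and the pattern only looks for sentence-ending punctuation (optionally a closing quote or parenthesis) followed by two or more whitespace characters.

```python
import re

# Sentence-ending punctuation, or a closing quote/parenthesis,
# followed by a run of at least two whitespace characters.
BOUNDARY = re.compile(r"(?<=[.!?\"')])\s{2,}")

def split_sentences(text):
    """Split text typed with the two-space convention into sentences."""
    return BOUNDARY.split(text)

# Two spaces: "I." ends a sentence.
print(split_sentences("You and I.  Smith is going too."))

# One space: "I. Smith" is a name, and there is only one sentence.
print(split_sentences("You and I. Smith is going too."))
```

No dictionary of abbreviations and no understanding of context is needed.  The writer has already said where the sentences end, and the program only has to listen.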

Ultimately we write so that we can communicate our ideas clearly and effectively.  And our most basic unit of communication is the sentence.  Isn't it worth providing a reliable way of deciding what is and is not a sentence?

On the other hand, we could all abandon that extra space and just leave it up to chance.  Because think of all the time those editors can save!


  1. Great article on the defense of double spacing. Next we should get a defense for the use of "its"

  2. How about using a "square" of whitespace?

  3. Haha this is so wrong! So all the newspapers, magazines, advertisements, etc. etc. are all using two spaces incorrectly?? And I don't buy this whole "cost savings" crap, that has zero to do with it. One space is all you need, and that's all there is to it. It was decided long ago before typewriters, but unfortunately the typewriter is to blame for it. If you see an article or advertisement in print, online, etc., and it uses two spaces, there is something very wrong. - good example of an ad that was properly composed by Apple. How many spaces do they use after a period? That's right....

  4. It's hardly in doubt that word-spacing is widely popular now (pun intended). But why did this happen and is this prescriptive? As far as the cost issue, I'll touch on that more in a future posting. You also might want to read my posting "A River Runs Through It", which touches on just one of the issues. And as far as this being "decided long ago before typewriters", this is completely wrong. Find any 18th or 19th century book on typography, and it will describe wide sentence spacing as standard, and will in fact use very wide sentence spacing.

    And is a single space "prescriptive"? Even the Chicago Manual of Style people in their FAQ acknowledge that two spaces are still in use by some editors. But that's not the only style manual in town. The American Psychological Association recommends two spaces for their manuscripts, and the Modern Language Association states that "As a practical matter, however, there is nothing wrong with using two spaces". The alleged universal support for a one-space rule simply does not exist.

  5. I don't have any specific disagreements with your points here, but I wanted to note that your "I. Smith" example doesn't really serve the purpose to which you've put it.

    "Who's going?" "You and I. Smith is going too."

    Both humans and computers could easily parse this because of the grammar involved. "You and I. Smith are going" has the implicit meaning of two individuals, while "You and I. Smith is going" has the implicit meaning of three. It's all to do with the agreement of the "be" form with "and" as the conjunction used.

    That being said, your point is preserved with a simple switch to the following:

    "Who else is going?" "You or I. Smith is going too."

    The use of "or" instead of "and" means that the "be" form will always be "is", no matter what. It's here that a program would truly benefit from knowing that a single space means an initializing as opposed to a full stop, since there's no obvious grammatical disagreement.

    I'm still undecided on the whole spacing thing, but I wanted to throw that out there - it would be a shame for people to ignore an otherwise totally valid example because of a grammatical mix-up.

  6. Thanks. I considered that to some degree and decided my example was "close enough". If we're talking computer algorithms, ideally we'd want them to function well on all common input, not just input that is formally written and grammatically correct. You could also write it with a comma after the "You", which at first glance might seem to force one interpretation. But a comma as a speaker's pause could still make sense in both cases if the pause simply represents the speaker considering his answer.

    At any rate this is a sort of a worst case. Looking at it in terms of readability, what I find in reading is that when I'm tripped up on periods, it's almost always the case that seeing the entire sentence clarifies the meaning. The problem is that as you read a sentence left to right your brain doesn't have the whole sentence, and this is why it's easy to get tripped up on initials and abbreviations even if the meaning is clear across the entire sentence. Your brain tries to anticipate, tries to interpret the period in the wrong sense, and only then do you realize something isn't right and have to re-parse it.

    Wider sentence spacing allows you to avoid this re-parsing. To me it's a clear case where wider spacing should improve readability.

    There's always a tension in typography between form and function. Form says that text should appear as a solid block of grey, or that it should all appear to be the same "color". But function says (according to 19th century typographers) to provide spacing to indicate pauses, even though it breaks up the solid grey just slightly. I came across a phrase the other night that's repeated in many different sources, which sums up how to handle this tension between form and function:

    "Type was made to read."

  7. A convention I have seen rise over the past couple of years for input into engines that do not distinguish between kinds of whitespace, such as HTML and TeX, is to end each sentence with a new line (TeX marks paragraph ends with blank lines, and has line-terminated comments, but otherwise treats new lines as spaces).

    This is done because having principled places for line breaks works well with version control software, but it obviously supports what you propose here.