The Magic of Reading







“What they call Originality is achieved by getting down to the root-principle underlying the practice. From that origin you think your way back to the surface, where you may find you’re breaking untrodden ground.”

Stanley Morison, typographic advisor to the Monotype Corporation and the greatest type historian of the 20th Century.





osprey



Version 1.0



Note:

This document will always be a draft. I’m learning all the time. Please throw all the rocks you can so I can learn more.



1. Executive Summary

This report is a new study of reading, how it works, and how to achieve that mysterious state referred to as “readability”. It’s targeted in the first instance at electronic books, but is also relevant everywhere else that text is read.

If the ideas in this document work – and there are very strong signs that they will – they will change the world. That’s a grandiose claim. But reading is a core human task. We were not ready to implement the much-hyped “Paperless Office” in the 1970s and 1980s. The main obstacle to that vision was: How can you have a paperless office, when reading on the computer screen is so awful?

We are about to break through that barrier. And everything will change when we do.

I’ve read around 12,000 pages of research papers, books and articles over the past several months. The (hopefully logical) case that follows is almost an exact reversal of the discovery process that took place.

The top-level conclusions are:

  1. Pattern recognition is a basic behavior of all animals that became automatic, unconscious and unceasing to ensure survival.
  2. Humans have developed visual pattern recognition to a high degree, and human brain development has given priority to the visual cortex that is a key component of the recognition system.
  3. Pattern recognition is key to the development of language and especially writing and reading systems, which depend entirely upon it.
  4. The book is a complex technological system whose purpose is to Optimize Serial Pattern Recognition, so it can be carried on at a basic instinctive level, leaving the conscious cognitive processing of the reader free to process meaning, visualize and enter the world created by the writer. I call this system OSPREY.
  5. OSPREY is how books work, and the same optimization can be done algorithmically for electronic books and other computer screens by developing two new technologies, both of which are described in this paper:
    • ClearType font display technology that can greatly improve the screen display of letter- and word-shapes, recognition of which lies at the heart of reading.
    • An OSPREY reading engine that will automatically take structured content and display it according to OSPREY rules.

2. Introduction

This ongoing study into the readability of text on screen was carried out as part of Microsoft’s “Bookmaker” Electronic Books project.

If electronic books are ever to become an acceptable alternative to books in print, readability is the biggest single challenge they must overcome. We can deliver text on screen, and the computer offers significant potential advantages in terms of searching, adding active time-based media such as sound, carrying many different books in a single device, and so on.

But will electronic books be readable? Will people ever want to spend the same amount of time looking at a screen as they spend today reading a printed book?

People still don’t like to read even relatively short documents on screen, whereas they will happily spend many hours “lost” in a book. Unless we can make significant advances in readability, electronic books will be limited to niche markets in which early adopters are prepared to put up with relatively poor readability. Is it merely a question of waiting until screens get better?

Almost 15 years ago, I helped develop a hypertext product aimed at moving us towards the “Paperless Office”. As we know, the Paperless Office has so far been a complete bust; more paper is produced today as a result of the widespread adoption of the desktop computer than at any time in history.

The Paperless Office foundered on the same shoal as the first attempts to produce electronic books: poor screen readability. Reading is at the core of everything we do. This paper, I hope, explains what went wrong, and how to fix it. The Paperless Office is now a real possibility. We can make it a reality.

2.1 First step: understand what works

To understand what went wrong, and how to fix it, the best place to start is by asking the question: “What went right?”

There is one indisputable fact: The Book works.

Boiled down to its essence, a book is basically sooty marks on shredded trees. Yet it succeeds in capturing and holding our attention for hours at a stretch. Not only that, but as we read it, the book itself disappears. The “real” book we read is inside our heads; reading is an immersive experience.

What’s going on here? What’s the magic?

Those questions are the starting-point of this study.

Although a great deal of readability and reading research has been done over the past couple of centuries, reading and how it works remain something of a mystery.

One body of work has focused largely on typography and legibility. Another body of work has examined the psychology and physiology of reading. All the research so far has added valuable data to the body of knowledge. But it has failed to explain the true nature of reading and readability, possibly because it was the work of specialists, each with a strong focus in a single area such as psychology, physiology or typography.

I’m not a specialist, although I’ve been dealing with type for 30 years. This paper takes a generalist approach, which I believe is the key to understanding the phenomenon of immersive reading.

Some great work has been done on the specifics. But what has been lacking until now is a way of tying all of this work together. Some important pieces were also missing from the puzzle. Writing, printing, binding books, and the human beings that read them together make up a “system”. Analyzing its parts does not reveal the whole picture.

2.2 A General Theory of Readability

This paper puts forward a “General Theory of Readability”, which builds on the findings of these different areas of research, and adds perspectives from the study of information processing and instinctive human behavior, to build a new unified model of the reading process. I believe this model gives new insight into the magic of the book; how it works, and why it works. And thus it tells us how to recreate that magic on the screen.

Something deep and mysterious happens when we read, intimately linked to human psychology and physiology, and probably even to our DNA.

The book as we know it today did not happen by chance. It evolved over thousands (arguably millions) of years, as a result of human physiology and the way in which we perceive the world. In a very real sense, the form of the book as we know it today was predetermined by the decision of developing humans to specialize in visual pattern recognition as a core survival skill.

The book is a complex and sophisticated technology for capturing and holding human attention. It is hard to convince people of its sophistication; there are no flashing lights, no knobs or levers, no lines of programming code (there really is programming going on, but not in any sense we’d recognize today…).

The conclusions in this document could have great implications for the future of books. But books are only an extreme case of reading – a skill we use constantly in our daily lives. Advances made to enhance the readability of books on the screen also apply to the display of all information on computer screens, inside Microsoft applications and on the Web.

This has been an amazing journey of exploration for me. The central question: “What’s going on here?” kept leading backwards in time, from printed books to written manuscripts, to writing systems, to pictures drawn on the walls of caves by prehistoric man, and eventually to primitive survival skills and behaviors we humans share with all other animal forms. At the outset, I had no idea just how far back I’d have to go.

2.3 What’s this got to do with software?

Some of the areas touched on in this report are pretty strange territory for a company at the leading edge of technology at the end of the 20th Century. But computer software isn’t an end in itself. We build it so people can create, gather, analyze and communicate information and ideas. Reading and writing are at the very heart of what we do. The difficulty that most people have in getting to grips with computers is a direct result of the fact that they force us to work in ways that are fundamentally different from the way we naturally perceive and interact with our world.

I came across the quote from Stanley Morison – one of the greatest and best-known names in the world of typography – only at the end of this current phase of work. Morison was talking about the design of new typefaces, but it is great advice for any researcher, in any field.

He is absolutely correct. Trying to get right back to the roots and basic principles involved in reading allows us to analyze the book and see it as a truly sophisticated technological system. And understanding how this technology hooks into human nature and perception makes it as relevant and alive today as when Johannes Gutenberg printed the first 42-line Bible in Mainz more than five centuries ago, or when the first cave-dwellers drew the “user manual” for hunting on the walls of their homes.

Understanding the root-principle is key to taking text into the future. The computer can go beyond the book – but only if we first really understand it, then move forward with respect and without breaking what already works so well.

The basic principles outlined in this paper will allow us to focus future research on areas most likely to be productive, to develop specific applications for reading information on the screen, to develop testing methods and metrics so we can track how well we are doing, and to go “beyond the book”.

2.4 Why is this a printed document?

Ideally, this document should have “walked the talk”, and been in electronic format for reading on the screen, demonstrating the validity of its conclusions.

Unfortunately, no system today exists that can deliver truly readable text on the screen. We have a first, far-from-perfect implementation, which is constrained by the device on which it runs. It is already better than anything seen so far, and will improve dramatically over the next few months.

This paper, I hope, explains how to build the first really useable eBook, and defines its functionality. But there’s a lot more work to be done to make it real.


3. Detailed Conclusions

  1. One of the most basic functions of the human brain is pattern-recognition. We recognize and match patterns unconsciously and unceasingly while we are awake. This behavior developed for survival in pre-human (animal) evolution, and humans have developed it to a highly sophisticated level. It is coded into our DNA. The growth in size of the visual cortex in the human brain is believed to have resulted from the increasing importance to us of this faculty.
  2. The book has evolved from primitive writing and languages into a sophisticated system that hooks into this basic human function at such a deep level we are not even aware of it. The effect is that the book “just disappears” once it hooks our attention.
  3. The book succeeds in triggering this automated process because it is a “system” whose purpose is to Optimize Serial Pattern Recognition. From the outside it looks simple – not surprising, since it’s designed to become invisible to the reader. There are no bells, whistles, or flashing lights. But “under the hood” the technology is as complex as an internal combustion engine, and similarly it depends on a full set of variables that must be tuned to work together for maximum efficiency. Much previous research has failed to grasp this because of researchers’ tendencies to take the traditional path of attempting to isolate a single variable at a time. To gain full value from these variables requires first setting some invariable parameters, then adjusting complex combinations of variables for readability. I have called this complete system OSPREY (from Optimized Serial Pattern Recognition).
  4. OSPREY has an “S-shaped” efficiency curve. Readability improves only slowly at first as individual variables are tuned. But once enough variables are tuned to work together, efficiency of the system rises dramatically until eventually the law of diminishing returns flattens the curve to a plateau. Conversely, it takes only two or three “broken” or sub-optimal variables to seriously degrade readability.
  5. Reading is a complex and highly automated mental and visual process but makes no demands on conscious processing, leaving the reader free to distill meaning, to visualize, and to enter the world created by the writer. That world is in reality a combination of the writer’s creation and the reader’s own interpretation of it.
  6. Interaction with this technology changes the level of consciousness of the reader. A reader who becomes “lost” in a book is in a conscious state that is closest to hypnotic trance. OSPREY allows the reader to achieve this state of consciousness by reaching his or her own “harmonic rhythm” of eye movements and fixations that becomes so automatic the reader is no longer aware of the process.

    All of the parameters and variables needed to achieve OSPREY are already known for print. They can be duplicated on the computer screen, but some technology improvements are required. Where the computer screen is weakest in relation to print is in the area of fonts and font rendering, which has the greatest effect on the way letter and word shapes are presented to the reader. In the course of this study, the author and others have carried out research in this area and have developed a new rendering technology (Microsoft® ClearType) that greatly improves the quality of type on existing screens.

  7. OSPREY can be produced algorithmically, with no requirement for manual intervention. Now that we have proved ClearType works, all of the required technologies are known. But they have never been assembled into a full system and tuned for the screen with OSPREY goals in mind. Since no such complete system for displaying text on screens has yet been built, screen display of text is currently on the “low-efficiency” segment of the S-curve. This explains why people prefer to read from paper rather than from the screen, especially for longer-duration reading tasks – a fact documented by many researchers, and by our own experience.
  8. OSPREY technology will allow Microsoft to deliver electronic books that set new standards for readability on the screen. The technology can be folded back into mainstream Web browsing and other software to bring major improvements in the readability of all information on the screen.
  9. Many attempts have been made over the years to develop alternative methods of improving reading speed and comprehension. Examples include technologies such as Rapid Serial Visual Presentation (flashing single words on a computer screen at accelerated rates) that are claimed to greatly increase reading speeds. They have failed to gain acceptance because they do not take the holistic approach needed to achieve the OSPREY state, and fail to take into account the wide variations in reading speed shown by a single reader during the course of reading one book. However, the possibility remains that some new technology can be developed to revolutionize reading, and further research should be carried out to fully explore alternative approaches. Any such “revolutionary” technology will have to be extremely powerful and easy to learn and apply in order to succeed. Not only will it have to improve the immersive reading experience, it will then have to be widely adopted as a replacement for the current system, which has evolved over thousands of years into its current mature technology and is highly bound up with the nature of humans.
  10. We are now building an OSPREY reading engine from existing components and new technologies. For success, the team must continue to have a mix of software developers, typographers and designers. An important part of this project will be work on new fonts for reading on the screen, especially new font display technologies outlined in this paper to squeeze additional resolution from the mainstream display technologies which are likely to remain at or near their current resolution level for some years.
  11. Once an OSPREY system is built, we should carry out further research into cognitive loading – a way of measuring the demands that the reading process makes on our attention. We should compare cognitive loading values for the printed book, for current Web-based documents, and for tuned OSPREY systems. This research will validate the OSPREY approach and provide valuable data for optimal tuning of OSPREY systems. We must develop a range of metrics for immersive reading, and tools to track them.
  12. Using these measures will enable Microsoft to take the book to a new level that is impossible to achieve in print, and then apply the same technologies to all information. Understanding the basic OSPREY principles and implementing a system will enable us to use computer technology to enhance and reinforce the OSPREY effect without breaking it, for example by:
    • Creating new typefaces and font technologies to enhance pattern-recognition, especially for LCD screens.
    • Providing unabridged audio synchronized to the text so the reader can continue the story in places they would normally be unable to read – for example while driving – switching transparently between audio and display.
    • Using subtle and subliminal effects such as ambient sound and lighting to reinforce the book’s ability to draw the reader into the world created by the author. (Subtle is the keyword here: effects must enhance the OSPREY state without disrupting it.) This utilizes the “Walkman Effect” to allow the reader to more quickly move from the physical world into the world of the book and keep her attention there by enhancing the book’s already-powerful capability to blank out distractions. At this point, this is merely a possibility; there is no proof that it will work, or that it might not run contrary to maintaining the flow of reading. This should be investigated in further research.
    • Defining new devices, or improving existing desktop PCs, with displays tuned to the “sweet spots” identified by the OSPREY research. Documents can be formatted for these “sweet spots” and degrade intelligently to provide maximum readability on other devices. A key to this will be the implementation of “adaptive document technology” (Microsoft patent applied for), which will automatically reformat documents to be read on any device while still adhering as closely as possible to OSPREY principles within device constraints; a minimal sketch of such a reformatting step follows this list. This technology, and the devices that run it, will help drive the paradigm shift from the desktop PCs of today to the portable, powerful, information-centric devices of tomorrow.
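To make the idea of an algorithmic OSPREY layout step concrete, here is a minimal sketch in Python of the kind of reformatting such an engine might perform. It is emphatically not the adaptive document technology itself: the Device class, the 66-character measure and the 120% leading are assumptions borrowed from classic book typography, used purely for illustration.

    # Minimal, illustrative sketch of an OSPREY-style reflow step.
    # Names and parameter values are assumptions, not the actual engine.
    import textwrap
    from dataclasses import dataclass

    @dataclass
    class Device:
        width_px: int      # usable display width in pixels
        height_px: int     # usable display height in pixels
        ppi: int           # pixels per inch of the display

    def layout(paragraphs, device, point_size=11, target_measure=66):
        """Reflow structured content for a device, keeping the measure and
        leading near classic book-typography values where the device allows."""
        char_px = point_size / 72 * device.ppi * 0.5          # rough average character width
        line_chars = min(target_measure, int(device.width_px / char_px))
        line_px = int(point_size / 72 * device.ppi * 1.2)     # leading at ~120% of type size
        lines_per_page = device.height_px // line_px

        lines = []
        for para in paragraphs:
            lines.extend(textwrap.wrap(para, width=line_chars))
            lines.append("")                                   # paragraph break
        # Fixed page depth, so the reader keeps a regular gait from page to page.
        return [lines[i:i + lines_per_page] for i in range(0, len(lines), lines_per_page)]

    pages = layout(["The book presents each reader with level ground..."],
                   Device(width_px=480, height_px=640, ppi=120))

The same content, run through the same function with a different Device description, yields a different but equally readable layout; that, in miniature, is the intent of the adaptive approach.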

4. Pattern recognition: a basic human skill

One of the most basic skills of living beings is pattern-recognition. It is a fundamental part of our nature, one we humans share with animals, birds, and even insects and plants.

Pattern recognition is a precursor to survival behavior. All life needs to recognize the patterns that mean food, shelter, or threats to survival. A daisy will turn to track the path of the sun across the sky. Millions of years ago, the dog family took the decision to specialize in olfactory and aural pattern-recognition. They grew a long nose with many more smell receptors, and their brains developed to recognize and match those patterns.

As our ancestors swung through the trees, a key to survival was the ability to quickly recognize almost-ripe fruit as we moved rapidly past it. (Unripe fruit lacked nutrition and caused digestive problems; but if we waited for it to become fully ripe, some other ape got there first…)

So we specialized in visual pattern recognition, and grew a visual system to handle it (including a cerebral cortex optimized for this task).

Pattern-matching in humans makes extraordinary use of the visual cortex, one of the most highly-developed parts of the human brain. Recognition of many patterns appears to be programmed at DNA level, as evidenced by the newborn human’s ability to recognize a human face.

In primitive times, we had to learn which berries were safe to eat, and which were dangerous. We had to learn to recognize movement using our peripheral vision, then use our higher-acuity focus to match the pattern to our “survival database” to evaluate whether it was caused by another ape (opposite sex: breeding opportunity; same sex: possible territorial battle) or a lion (predator: threat).

For our survival, this pattern recognition had to become unceasing and automatic. In computer terms, pattern matching belongs to the “device driver” class of program. It is activated at birth (maybe even at conception), and remains running in the background until we die, responding to interrupts and able to command the focus of the system when required.

Anyone who studies animal tracking and survival skills realizes at a very early stage that at the core of all these skills is pattern-recognition and matching.

Jon Young, a skilled animal tracker and naturalist who runs the Wilderness Awareness School in Duvall, WA, spent many years being mentored in tracking and wilderness skills by Tom Brown Jr., one of the best-known names in US tracking and wilderness skills circles.

Jon has studied the tracking and survival skills still used by native peoples all over the world, including Native Americans – who were (and in some cases, still are) masters of the art. He has documented how children begin to learn from birth the patterns essential to their survival.

For example, the Kalahari region of Africa is one of the most inhospitable parts of the world. There is almost no surface water for most of the year. Yet to the tiny Kalahari Bushmen this is “home”, and provides all that they need to survive.

One of the first survival skills taught to the Bushmen’s children is how to recognize the above-ground “pattern” of a particular bush which has a water-laden tuber in its root system, invisible from the surface. If this is sliced, and the pulp squeezed, it provides a large quantity of pure drinking water.

All survival skills which involve animal tracking, or use of wild plants for food, medicines, clothing etc., are based on pattern recognition, and learning from birth the right database for the relevant ecosystem. Taking a Kalahari Bushman and placing him in the Arctic would pose him a serious survival problem. An Eskimo transplanted to the Kalahari would have different but equally serious challenges.

There are patterns associated with wolf, domestic dog, wildcat, cougar, bear, squirrel, or mouse. Each has subtleties that enable the skilled tracker to recognize different events, such as an animal that is hunting, or running from a predator. There are even patterns within tracks which show when an animal turned its head to the side, perhaps to listen to a sound which means danger, or just to nibble a juicy shoot from a bush as it quietly grazed in the forest.

An expert in survival, such as a native or a well-trained woodsman, is one who has studied enough of the patterns of nature – the tracks of animals, the sounds of the birds, and so on – to have built a large “database” of patterns in his or her memory store.

“Nature provides everything we need to survive – and even thrive – in what we call the wilderness. All we have to do is learn to recognize it,” (Jon Young: 1998).

It is also probably no coincidence that the first uses of symbols we have on record appear to be either records of hunts, or how-to instructions on hunting.

4.1 Pattern Recognition and Reading

What has all this to do with the life of modern man, and especially with reading? Well, most of us may have left the woods to live in towns and cities, but the woods have never left us. We still use this same survival trait of pattern recognition unceasingly and unconsciously in our daily lives. It’s hard-wired into the organism.

Pattern recognition is how we walk down a hallway without continually bumping into the walls. It’s how we stay on the sidewalk and out of the traffic on the roadway. It’s how we recognize each other. Pattern recognition still tells us where to find food – why else would McDonald’s be so protective of its corporate logo?

Modern civilization makes constant use of the fact that we continually pattern recognize and match. Corporate logos, freeway signs, “Walk/Don’t Walk” signals, and so on are all examples.

One of the most pervasive applications of our innate pattern recognition behavior is reading. We learn to read by first learning to recognize the basic patterns of letters. Then we learn to recognize the larger patterns of words. Once we have learned the pattern of the word “window”, we never again read the individual letters; the larger pattern is immediately matched as a gestalt. If we are skilled readers, we may learn to match patterns at phrase or sentence level, or perhaps in even larger units.

Reading is an amalgam of highly automated processes that include word recognition. Seen as a system, the task of reading is simply serial pattern recognition. Patterns are recognized as symbols, groups of which are inferred to have meaning. Word recognition is the primary task of reading. In effect, the book takes our highly-tuned survival skill for a walk through a friendly neighborhood park, where almost all the people we meet are old friends whom we recognize immediately (depending on the level of challenge in the content). When we come across a new pattern, we are able to find its meaning (by consulting a dictionary or “pattern database”) and enter it into our memory of stored patterns as a new friend.
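Restated as a toy loop, that description looks like the sketch below. The known_words set stands in for the reader’s stored pattern database, and the lookup function for the conscious effort of consulting a dictionary; both are invented purely for illustration.

    # Toy illustration of reading as serial pattern recognition.
    # known_words stands in for the reader's stored pattern database;
    # lookup stands in for consciously working out a new word.
    known_words = {"the", "book", "takes", "our", "skill", "for", "a", "walk"}

    def read(text, lookup):
        for word in text.lower().split():
            if word in known_words:                            # instant, unconscious match: an old friend
                continue
            print("new pattern:", word, "->", lookup(word))    # conscious effort required
            known_words.add(word)                              # the new pattern joins the database

    read("The book takes our skill for a stroll", lookup=lambda w: "definition of " + w)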

If reading, especially of longer texts like books, is analyzed in detail from this viewpoint, the “art” of typography and design can be shown to be a highly-sophisticated technology with a coherent underlying logic which is set up to make Serial Pattern Recognition as effortless as possible. The book is the embodiment of a technology of Optimized Serial Pattern Recognition.

In honor of its wilderness roots, I’ve called it OSPREY.

4.2 The Concept of Harmonic Gait

There is another feature of animal tracks that is highly relevant to readability: the concept of harmonic gait.

Every animal has its own specific harmonic gait; the pattern in a group of successive tracks which the animal makes when in its normal relaxed state. Tracks are regularly spaced. In animals with four long legs, for example the dog, cat and deer families, the right rear paw or hoof lands directly on top of the print left by the right front paw or hoof. Trackers call this direct register.

When the animal is moving faster than normal, rear feet land ahead of front feet, until gait speeds up into a canter or gallop, and the pattern changes. When the animal is moving slower than normal, the rear feet land behind the impressions left by the front feet. But even these new patterns are regular and predictable.

Trackers use these regular gaits to analyze animal behavior. Changes in gait are clues to what the animal was doing. Speeding up normally indicates either predatory behavior (e.g. chasing the next meal) or trying to escape from a perceived threat (e.g. when the deer spots movement in its peripheral vision, and matches it to the pattern of “mountain lion”).

These regular gaits have another important use. If a tracker wants to find out where an animal is now, or where it went, he obviously has to follow its tracks. This is easy enough in soft sand, where tracks are deep and easy to see. But when the animal moves over rougher or harder ground, tracks are much harder to spot.

If the tracker knows the animal’s gait, he can predict with reasonable certainty exactly where the next track is likely to be found. He can narrow his search for the next print to the most likely area, find it quickly even if its traces are faint, and confirm the animal’s direction of movement. By using gait measurements (with a “tracking stick” easily made from a fallen branch), trackers can continue to follow the animal in conditions that would otherwise make tracking extremely difficult, if not impossible.
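Expressed in the simplest possible terms, the tracker’s prediction is nothing more than extrapolation of a regular stride, as in the sketch below; the coordinates are invented for illustration.

    # Predicting the next track from a regular gait by extrapolating the average stride.
    def predict_next_track(tracks):
        """tracks: (x, y) positions of successive prints, oldest first."""
        strides = [(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(tracks, tracks[1:])]
        avg_dx = sum(dx for dx, _ in strides) / len(strides)
        avg_dy = sum(dy for _, dy in strides) / len(strides)
        last_x, last_y = tracks[-1]
        return (last_x + avg_dx, last_y + avg_dy)    # centre of the area to search next

    print(predict_next_track([(0.0, 0.0), (0.55, 0.02), (1.12, 0.01)]))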

The regular rhythm of the gait acts as a cue to the tracker, telling him exactly where the next pattern-recognition task will take place. The relevance of this will become apparent when we look at typography later in this paper. Books do exactly the same by controlling the pace at which the words are presented and allowing the reader to move through the content at his or her own harmonic or natural gait (which readers change all the time in the course of reading).

The book presents each reader with level ground over which he or she can move at their own pace.


5. The Concept of “Ludic” Reading

A key term that may be unfamiliar to readers of this study is ludic reading. The term was coined in 1964 by reading researcher W. Stephenson, from the Latin ludo, meaning “I play”. A ludic reader is someone who reads for pleasure.

Many of the conclusions of this paper are reached as a result of examining the technology of the printed book in conjunction with research carried out by psychologists into reading, especially ludic reading. This is clearly the most relevant form of reading to the eBook.

Ludic reading is an extreme case of reading, in which the process becomes so automatic that readers can immerse themselves in it for hours, often ignoring alternative activities such as eating or sleeping (and even working).

A major shortcoming of most of the research carried out into readability over the last hundred or more years is that it has focused, for practical reasons, on short-duration reading tasks. Researchers have announced (with some pride) that they have used “long” reading tasks consisting of 800-word documents in their research.

Compare this with the average “ludic reading” session. Even at the low (for ludic readers) reading speed of 240 words per minute (wpm), in a one-hour session – which in the context of a book counts as a short read – the reader will read some 14,400 words.
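The gap is easy to quantify using only the figures above:

    # A one-hour ludic session versus an 800-word research task.
    wpm = 240                        # a low speed for a ludic reader
    session_minutes = 60
    words = wpm * session_minutes
    print(words)                     # 14400 words in a "short" read
    print(words / 800)               # equivalent to about 18 of those "long" test documents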

For very short-duration reading tasks, such as reading individual emails, readers are prepared to put up with poor display of text. They have learned to live with it for short periods. But the longer the read, the more even small faults in display, layout and rendering begin to irritate and distract from attention to content.

The consequence is that a task that should be automatic and unconscious begins to make demands on conscious cognitive processing. Reading becomes hard work. Cognitive capacity normally available exclusively for extracting meaning has to carry an additional load.

If we are trying to read a document on screen, and the computer is connected to a printer, the urge to push the “print” button becomes stronger in direct proportion to the length of the document and its complexity (the demands it makes on cognitive processing).

The massive growth in the use of the Internet over the past few years has actually led to a huge increase in the number of documents being printed, although these documents are delivered in electronic form and could be read without the additional step of printing. Why? Because reading on screen is too much like hard work. People use the Internet to find information – not to read it.

Research into ludic reading is especially valuable to the primary goal of this study, finding ways of making electronic books readable. If eBooks are to succeed, readers must be able to immerse themselves in reading for hours, in the same way as they do with a printed book.

For this to happen, reading on the screen needs to be as automatic and unconscious as reading from paper, which today it clearly is not.

If we can solve this extreme case, the same basic principles apply to any reading task.


5.1 Ludic Reading Research

“It seems incredible, the ease with which we sink through books quite out of sight, pass clamorous pages into soundless dreams” (W. H. Gass, Fiction and the Figures of Life, 1972).

This passage is quoted in the introduction to “Lost In A Book”, written by Victor Nell, senior lecturer and head of the Health Psychology Unit at the University of South Africa.

Nell’s work is unusual and significant because it concentrates wholly on ludic reading, and details the findings of research projects carried out over a six-year period to examine the phenomenon of long-duration reading.

It looks at the social forces that have shaped reading, the component processes of ludic reading, and the changes in human consciousness that reading brings about. Nearly 300 subjects took part in the studies. In addition to lengthy interviews, subjects’ metabolisms were monitored during ludic reading. The data collected gives a remarkable insight into the reading process and its effect on the reader.

For anyone interested in reading research, this book is worth reading in its entirety. I’ll try to summarize the main points, then develop them. I’ve devoted a whole section to this book, because it’s such a goldmine of data.

Reading books seems to give a deeper pleasure than watching television or going to the theater. Reading is both a spectator and a participant activity, and ludic readers are by and large skilled readers who rapidly and effortlessly assimilate information from the printed page.


Nell gives a skeletal model of reading, then develops it during the course of the book.

Figure 1: Nell’s preliminary model of Ludic Reading

5.2 The requirements of Ludic Reading

Nell gives three preliminary requirements for Ludic reading: Reading Ability, Positive Expectations, and Correct Book Choice. In the absence of any one of these three, ludic reading is either not attempted or fails. If all three are present, and reading is more attractive than the available alternatives, reading begins and is continued as long as the reinforcements generated are strong enough to withstand the pull of alternative attractions.

Reinforcements include physiological changes in the reader mediated by the autonomic nervous system, such as alterations in heartbeat, muscle tension, respiration, electrical activity of the skin, and so on. Nell and his co-researchers carried out extensive monitoring experiments on subjects’ metabolic rates, and collected hard data showing metabolism changes in readers as they became involved in reading.

These events are by and large unconscious and feed back to consciousness as a general feeling of well-being (my italics). This ties in well with how book typography has developed to make the word-recognition aspect of the reading process automatic and unconscious, as we will examine later in this paper.

In the reading process itself, meaning is extracted from the symbols themselves and formed into inner experience. It is clear that the ability of the content to engage the reader (the “quality” of writing), the reader’s consciousness, social and cultural values and personal experiences all play a part in this process.

Nell says the “greatest mystery of reading” is its power to absorb the reader completely and effortlessly, and on occasion to change his or her state of consciousness through entrancement.

Humans can do many complex things two or more at a time, such as talking while driving a car. But one behavior of each such pair is highly automatized, so only the other makes demands on conscious attention.

However, it is impossible to carry on a conversation or do mental arithmetic while reading a book. The more effortful the reading task, the less able we are to resist distractions, and the more our attention leaks away to other things, such as listening to the birds in the trees or other forms of woolgathering.

One of the most striking characteristics of Ludic reading is that it is effortless; it holds our attention. The Ludic reader is relaxed and able to resist outside distractions, as if the work of concentration is done for him by the task.

The moment evaluative demands intrude, ludic reading becomes “work reading”.

5.3 Highly-automated processes

Skilled reading is an amalgam of highly-automated processes (word recognition, syntactic parsing, and so on) that make no demands on conscious processing, and the extraction of meaning from long continuous texts.

Although reading uses only a fraction of available processing capacity, it does use up all available conscious attention. Furthermore, ludic reading, which makes no response demands of the reader, may entail some arousal, though little effort.

The term reading trance can be used to describe the extent to which the reader has become a “temporary citizen” of another world – has “gone away”.

“Attention holds me, but trance fills me, to varying degrees with the wonder and flavor of alternative worlds. Attention grips us and distracts us from our surroundings; but the otherness of reading experience, the wonder and thrill of the author’s creations (as much mine as his), are the domain of trance.”

“The ludic reader’s absorption may be seen as an extreme case of subjectively effortless arousal, which owes its effortlessness to the automatized nature of the skilled reader’s decoding activity; which is aroused because focused attention, like other kinds of alert consciousness, is possible only under the sway of inputs from the ascending reticular activating system of the brainstem; and which is absorbed because of the heavy demands comprehension processes appear to make of conscious attention.”

5.4 Eye movement

Reading requires two kinds of eye movements: saccades, or rapid sweeps of the eye from one word group to the next, and fixations, in which the gaze is focused on one word group.

Reading speeds vary. There is a neuromuscular limit of 800-900 words per minute. Intelligent readers cannot fully comprehend even easy material at speeds above 500-600 wpm. The average college student reads at 280 wpm, and superior college readers at 400-600 wpm. Skilled readers read faster than passages can be read aloud to them.

Even skilled readers pick up information from at most eight or nine character spaces to the right of a fixation, and four to the left. Fixation duration is dependent on cognitive processing (i.e. is determined by the difficulty or complexity of the material being read).
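A back-of-envelope calculation shows how these figures hang together. Assuming a typical fixation of around a quarter of a second and an average English word of about six character spaces (both assumptions for illustration, not findings from the studies cited here):

    # What the perceptual span and fixation figures imply for reading speed.
    fixation_s = 0.25           # assumed typical fixation duration
    span_chars = 8              # useful characters gained per fixation (from the text above)
    chars_per_word = 6          # assumed average word length plus the following space

    words_per_fixation = span_chars / chars_per_word
    fixations_per_minute = 60 / fixation_s
    print(words_per_fixation * fixations_per_minute)    # about 320 wpm

The result, roughly 320 wpm, sits in the same region as the college-reader figures quoted above; pushing toward the 800-900 wpm ceiling requires some combination of shorter fixations, fewer regressions and a wider effective span, which is where the neuromuscular and perceptual limits come into play.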

Findings of a sophisticated study (Just and Carpenter, 1980) discredit the widely-held view that the saccades and fixations of good readers are of approximately equal length and duration, or that reading ability is improved by lengthening saccade span and shortening fixation duration.

Perceptually-based approaches to the improvement of reading speed (increasing fixation span, decreasing saccade frequency, learning regular eye movements, reading down the center of the page, and so forth) are unsupported by studies which, on the contrary, show skilled readers do not use these techniques.

Studies show that all book readers also read newspapers and magazines; the converse does not apply.

Ludic readers read at wildly different rates; Nell’s study found the fastest read at five times the speed of the slowest.

The reading speed of each individual varied just as dramatically in the course of reading a book. One reader moved between a fastest speed of 2,214 wpm and a slowest of 457 wpm, while the average ratio between fastest and slowest speeds across the study group was 2.69.

Nell found readers “savor” passages they enjoy most – often rereading them – while often skimming passages they enjoy less.

These last two findings suggest that any external attempt to present information at a pre-determined speed is doomed to failure, even if the reader is allowed to set presentation speed at their own average reading rate. The only method of controlling presentation rate which offers any hope of success would be to very accurately track the reader’s eye movements and link presentation rate to that.
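Such a system remains speculative, but in outline it would be a simple control loop: display a segment of text, watch the reader’s fixations, and advance only when the gaze reaches the end of what is shown. In the sketch below, get_fixation and show_segment are placeholders for eye-tracking hardware and a display layer that do not exist today; nothing here should be read as a working design.

    # Hypothetical control loop linking presentation rate to tracked eye movements.
    def present(segments, get_fixation, show_segment, end_of_segment_x):
        for segment in segments:
            show_segment(segment)                 # put the next segment on screen
            while True:
                x, y = get_fixation()             # most recent fixation position
                if x >= end_of_segment_x:         # the gaze has reached the end of the segment
                    break                         # only then advance to the next one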

Pace control is one of the reader’s “reward systems”, and terms such as savoring, bolting and their equivalents are accurate descriptions of how skilled readers read.

5.5 Convention – or optimization?

“The appearance of books has changed very little in the five centuries since the invention of printing. Lettering has always been black on white, lines have always marched down the page between white margins in orderly left-and-right-justified form, and letterspacing has always been proportional”.

Nell refers to these and other typographic features as print conventions, which have exhibited extraordinary stability. Considered together with the unchanging nature of perception physiology, he says, they make Tinker’s Legibility of Print (1963) appear to be the last word on the subject, although many technological changes have caused legibility problems. For example, in many of these technologies, word and letter spacing is less tightly controlled; letters may be fractionally displaced to the left or right to create the illusion of a word space, thus compelling the reader’s eye to make an unnecessary regression. Poor letter definition and low contrast, distortion of letters and words are also cited as contributing to poor legibility.

Extraordinary stability is a key observation. It suggests that these are not merely print conventions, but optimizations that have stood the test of time. What worked, survived. What did not work disappeared. Survival did not happen because so-called conventions were easier for the printer (in fact, the reverse is the case), but because they are tuned to the way in which people read. Good typesetting requires much more work and attention to detail than bad typesetting. But bad typesetting is not acceptable to readers.

Nell describes these and other effects of the developing technology as “onslaughts on ease of reading”.

Ludic readers seek books which will “entrance” them; the reader’s assessment of a book’s trance potential is probably the most important single decision in relation to correct book choice, and the most important contributor to the reward systems that keep ludic reading going once it has begun.

Readers’ judgements of trance potential over-ride judgements of merit and difficulty. Tolkien’s Lord of the Rings (1954) is a relatively difficult book, but many readers prefer it to easier ones because of its great power to entrance. Best-sellers are entrancing to large numbers of readers.

Nell undertook a large and complex study of the physiology of reading trance using as his subjects a group of “reading addicts”.

5.6 Reading and Arousal

During reading, brain metabolism rises in the visual-association area, frontal eye fields, premotor cortex and in the classic speech centers of the left hemisphere.

Reading is a state of arousal of the system. Humans like to alternate arousal and relaxed states. Sexual intercourse is high arousal followed by postcoital relaxation; reading a book in bed before going to sleep uses the same arousal/relaxation mechanism – reading before falling asleep is especially prized by ludic readers.

This suggests an electronic book (eBook) had better be able to cope with being dropped off the bed! It also suggests that a backlit book, with no need to have a reading light – and perhaps keep a partner from falling asleep – is a positive benefit of the technology. Reaction to early backlit eBook prototypes confirms this is an attractive feature.

Ludic reading is substantially more activated than the baseline state. Immediately following reading, when the reader lays down the book and closes her eyes, there is a “precipitous decline” in arousal, which affects skeletal muscle, the emotion-sensitive respiratory system and also the autonomic nervous system.

Perversely, ludic readers actually misperceive the arousal of reading as relaxation – they perceive effortlessness, although substantial physiological arousal is actually taking place.

During ludic reading, heart rate decreases slightly. This indicates that the cognitive processing demands made by ludic reading are not high: a brain that was working hard would demand an increased blood supply, and therefore an increased heart rate.

This finding that reading involves arousal is highly significant; it suggests a strong parallel between the level of awareness we achieve while reading and the level of awareness required to survive in primitive times. In effect, we are taking an automatic skill developed for survival for a “walk in a neighborhood park”, during which we meet many old friends (words we know) and make some new ones.

Reading is a form of consciousness change. The state of consciousness of the ludic reader has clear similarities to hypnotic trance.

The two states have three things in common: concentrated attention, imperviousness to distraction, and an altered sense of reality.

Consciousness change is eagerly sought after by humans, says Nell, and means of attaining it have been highly prized throughout history – whether through alcohol, mystic experiences, meditative states, or ludic reading. “Of these, ludic books may well be the most portable and most readily accessible means available to us of changing the content and quality of consciousness. It is also under our control at all times”.

There are two reading types: Type A, who read to dull consciousness (escapism) and Type B, who read to heighten it. Type A read for absorption, Type B for entrancement.

Automatized reading skills require no conscious attention. This suggests that any distractions on the page which require the reader to make conscious effort (for example, poorly-defined word shapes, difficulty in following line breaks, etc.) will greatly detract from the experience.

Ludic readers report a concentration effort of near zero for ludic reading, climbing steeply through work reading (39 percent) to boring reading (67 percent).

At the end of the book, Nell draws together the threads of his research to build a motivational model of reading, reproduced below.

Figure 2: Nell’s Motivational Model of Ludic Reading

5.7 An expanded model

While this model sheds a great deal of light on the motivational aspect, it does not include a detailed examination of the “ludic reading process”, which is portrayed as a “black box”. For researchers who wish to examine the process itself in more detail, Nell cites the complex information processing model of reading developed by Eric Brown (Brown, E. R. (1981). A Theory of Reading. Journal of Communication Disorders).

Brown’s model, which takes up five separate pages, documents the true complexity of the reading process. Brown suggests that, contrary to previous theories that there are at least two different types of reading – phonemic and semantic – there is really only one, but that it is realized by fluent adult readers to a greater or lesser extent.

However, while an extremely complex sequence of events does take place in the reading process, it is normally automatic and involuntary.

Between Nell’s “black box” of the reading process, and Brown’s highly complex one, there is some middle ground which is worth exploring.

Expanding Nell’s “black box” only slightly gives a new, and I believe valuable, picture of the motivational model of reading.

There are at least three additional decision points that need to be added.

Figure 3: Motivational model with an expanded view of the reading process

5.8 Additional decision points

The three additional decision points consist of:

  1. Degree of Effort. The reader carries out continuous subjective evaluation of the effort she is expending to read the book or document. The key to ludic reading, as put forward by Nell, is that it requires subjectively effortless attention. The process is in fact a state of arousal of the system, not a state of relaxation. But the reader perceives that she is relaxed, and the perceived effortlessness of the task is a key to this subjective feeling of relaxation. Once reading starts to feel like hard work – reading a hard passage, reading material on which the reader will later have to answer questions, or straining to read poor typography – the perception of effort augments the “stop reading reinforcers”. Once perceived effort passes a certain threshold value, the reader will simply stop reading.
  2. Comprehension. Evaluation is taking place continuously (Am I understanding this content?). The reader will certainly put in effort to comprehend difficult reading material (reading “broadens the mind”), but again there is a subjective threshold value. If it becomes too hard, reading will stop.
  3. Content match. (Am I enjoying this book?) Nell’s model suggests this is a once-for-all decision covered by Correct Book Choice in the Antecedents of Ludic Reading element of his model, but in reality this evaluation must also be continuous, with a threshold value which if exceeded will also result in the reader ceasing to read. How many of us have started a book and failed to finish it because it did not engage us?
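Read as a control loop, the expanded model says simply that reading continues only while all three continuous evaluations stay on the right side of their thresholds. The sketch below illustrates that logic; the threshold values and scoring functions are invented placeholders, not figures from Nell’s research.

    # The expanded motivational model as a toy control loop.
    EFFORT_LIMIT = 0.6          # placeholder thresholds, not measured values
    COMPREHENSION_FLOOR = 0.4
    ENGAGEMENT_FLOOR = 0.3

    def reading_session(pages, effort, comprehension, engagement):
        for page in pages:
            if effort(page) > EFFORT_LIMIT:                  # reading has become hard work
                return "stopped: too much effort"
            if comprehension(page) < COMPREHENSION_FLOOR:    # "Am I understanding this?"
                return "stopped: lost the thread"
            if engagement(page) < ENGAGEMENT_FLOOR:          # "Am I enjoying this book?"
                return "stopped: book abandoned"
        return "finished"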

Electronic books have a level playing-field with printed books in relation to comprehension and content match, provided publishers and developers ensure the same kinds of content are available on screen as can be found today in any successful bookstore.

Best-selling novels and best-selling authors achieve their success because the level of comprehension and their content is matched to the comprehension and “absorption criteria” of the general book-reading population. When I buy an espionage novel by English novelist Anthony Price, I know before I begin that this author is able to consistently engage me with characters and plot. I have positive expectations based on his track record with me. I already “know” (have my own internal model) of many of the central characters. These are old friends, who make few demands on my cognitive capacity.

It is in the area of effortless attention that eBooks – and all electronic documents – face their biggest challenge. It’s harder to read on the screen today than it is to read print.


5.9 Flow theory and the reading process

Nell’s analysis of ludic reading meshes extremely well with work on flow theory by researchers in recent years, the best-known of whom is Professor Mihaly Csikszentmihalyi of the Department of Psychology at the University of Chicago.

(For readers who, like me, have trouble with his name, it’s pronounced “chick-sent-mee-high”: I’m grateful to the magazine article that thoughtfully included the pronunciation, thereby removing a major obstacle from verbally quoting the author’s work…)

Csikszentmihalyi, in his US best-seller Flow: The Psychology of Optimal Experience, details how focused attention leads to changes in our state of consciousness.

Attention can be either focused, or diffused in desultory, random movements. Attention is also wasted when information that conflicts with an individual’s goals appears in consciousness.

What is the goal of the reader? To become immersed in the content. In this context, any information that takes conscious attention detracts from the reading experience. As Csikszentmihalyi says, “…it is impossible to enjoy a tennis game, a book, or a conversation unless attention is fully concentrated on the activity”.

He is even more specific later in his book, categorizing reading specifically as one of the activities capable of triggering the “flow state” by concentrating the attention.

“One of the most universal and distinctive features of optimal experience… is that people become so involved in what they are doing that the activity becomes spontaneous, almost automatic; they stop being aware of themselves as separate from the actions they are performing”.

He details activities designed to make the optimal experience easier to achieve: rock climbing, dancing, making music, sailing and so on. Reading, in its most powerful form (the book), falls into the same category, as we will show later in this paper. The book is designed to capture human attention.

5.10 “On a roll”

Another researcher’s perspective on the flow experience appears in the paper A theory of productivity in the creative process (Brady, 1986), which examined how computer programmers achieve the state of maximum efficiency and creativity we call “being on a roll”.

The key to achieving the “roll state” is that concentration is not broken by distractions. “Interruptions from outside the flow of the problem at hand are particularly damaging … because of their unexpected nature”.

This data on the flow experience will resonate when we come to consider the psychology and physiology of reading and the typographic analysis in subsequent sections of this paper.


6. Previous Reading Research

6.1 The Reading Process: physiology and psychology

Reading is a complex physiological and psychological process involving the eyes, the visual cortex, and both sides of the brain. Memory is key to reading, from the simple and mundane act of recognizing a single letter, to comprehending a whole sentence or passage of text. (Taylor & Taylor, 1983: The Psychology of Reading)

Reading psychology and physiology are tied inextricably to the development of human language and writing systems. Methods of printing books and documents were a groundbreaking development only in that they enabled mass production of what had previously been a manual task requiring perhaps years of labor by a scribe.

By the time printing systems appeared, writing was already a very mature technology. Johannes Gutenberg was not the “Thomas Alva Edison” of writing systems. He was the “Henry Ford”, who worked out how to turn what was previously a hand-built technology into a system for mass production.

The writing system itself remained basically unchanged. In fact, the first typefaces were designed to emulate as nearly as possible the calligraphy of scribes.

Writing and reading were a natural outgrowth of the human instinct for pattern-recognition. Pictures were drawn to represent animals and other objects as early as 20,000 BC – the Stone Age. Reading and writing systems were in existence in North Babylonia 8000 years ago. Alphabet signs were used in Egypt at least 7000 years ago.

A detailed history of the evolution of reading and writing (also one of the earliest and most widely quoted works on the psychology and physiology of reading) is found in The Psychology and Pedagogy of Reading (Huey, 1915).

6.2 How we read

A huge amount of work has been done, and many books and scientific papers have been written, on how we read. Researchers have dived down into incredible levels of detail, and several different models of how memory works in reading have emerged. There are disputes about the roles of long- and short-term memory for example.

However, all researchers agree that the primary task in reading is pattern recognition. There are disputes about the length of patterns we recognize – individual letters, whole words, groups, phrases and sentences – and how these are assembled, parsed and given meaning by the human mind. But all agree we recognize patterns and then mentally process them in some way.

The traditional approach to teaching reading was to first teach the alphabet of letters, then teach words. Other systems have emerged which concentrate first on whole words.

While the letter-then-word system held sway for languages with alphabets, in languages with logographies such as Chinese, the method of teaching is based on learning words first – since a single character is a word or phrase. Later, children learn the meanings of the component parts or strokes of those characters.

Teaching of reading in alphabet-based systems has moved towards the latter model in past decades, focusing more on words than the basic alphabet, which is learned in the process.

Taylor and Taylor suggest both letter- and word-recognition theories are valid. Poor readers often do not progress beyond the stage of having to identify individual letters before they can recognize a word. Even the adept reader who comes across an unfamiliar word will fall back to recognizing word-parts and even single letters. Ability to use words rather than letters as a unit increases with age and reading skill.

6.3 Saccades and fixations

In the late nineteenth century, the French oculist Emile Javal made the surprising discovery that we read not with a smooth sweep of the eyes along a line of print, but by moving our viewpoint in a series of jumps or saccades, carrying out recognition during pauses or fixations.

The reader focuses the image of the text upon the retina, the screen of photosensitive receptors at the back of the eyeball. The retina as a whole has a 240-degree field of vision, but has its maximum resolution in a tiny area at the center of the field called the fovea which is only about 0.2mm in diameter. Foveal vision has a field of only one or two degrees at most (Taylor & Taylor). Huey suggested its field of vision was only about 0.75 of a degree of arc. Outside the fovea is the parafovea, three millimeters in diameter and with a field of around ten degrees. From there vision becomes progressively less clear all the way out to the periphery of the retinal field.
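
To put these angular measures into familiar units: at a typical book-reading distance the foveal field covers only a handful of characters at a time. The minimal sketch below, in Python, assumes a reading distance of 40 centimeters and an average character width of 2.5 millimeters (roughly 11-point body text); both figures are illustrative rather than measured.

import math

READING_DISTANCE_MM = 400.0   # assumed: about 40cm, a typical book-reading distance
CHAR_WIDTH_MM = 2.5           # assumed: average character width of ~11-point body text

def field_width_mm(field_deg: float, distance_mm: float = READING_DISTANCE_MM) -> float:
    """Width on the page covered by a visual field of the given angular size."""
    return 2 * distance_mm * math.tan(math.radians(field_deg / 2))

for label, degrees in [("fovea, 1 degree", 1.0), ("fovea, 2 degrees", 2.0), ("parafovea, 10 degrees", 10.0)]:
    width = field_width_mm(degrees)
    print(f"{label}: {width:.1f} mm, about {width / CHAR_WIDTH_MM:.0f} characters")

In other words, only a few letters at a time are seen sharply enough to be recognized, which is why the eye must make so many fixations along every line.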

Target words are brought into the fovea by a saccade. After information is acquired during a fixation, another saccade moves to the next target word. Occasionally the eyes jump back to a previous word, to clarify an incomplete perception, to help with the semantic understanding of a complex passage, or simply to enjoy a particular passage a second time.

Information is gathered by foveal vision. Parafoveal vision is used to determine locations of following fixations.

These eye movements are under constant cognitive control.

6.4 Shape and rhythm are critical

Readers learn to recognize words, not letters, although individual letters can help word recognition.

Thus the shapes of letters, and the way they are assembled together into words, are critical to ease of reading.

Huey makes it clear that the way in which the stream of words is presented to the reader’s eye is also critical. “Lines of varying length lead to a more cautious mode of eye movement… and may cause unnecessarily slow readers”. Elsewhere, he says “…when other conditions are constant, reading rates depend largely upon the ease with which a regular, rhythmical movement can be established and sustained.”

Letter shapes and the way they are assembled into words and presented to the reader is the domain of typography.

In the next section we will examine how typography has developed to take advantage of the instinctive human behavior of pattern recognition. We will show how the properly typeset book is a sophisticated yet largely invisible technology deliberately constructed to hook human attention by making this pattern recognition process automatic and unconscious.

Barriers to effective reading. Huey suggests that bad lighting and bad posture are the two most common causes of reading fatigue.

Too great a distance between desk and seat causes problems, and correct reading angle – which must be matched to the height of the reader – is also necessary. Consider the difference between reading a book (normally held at an angle of 45-degrees) and reading from today’s CRT computer monitors (which place text at a 90-degree angle to the reader). This is an effective argument for a tilting screen which can be placed below the reader’s sight horizon, as seen in the latest flat-panel LCD displays, or for an eBook which can be easily held in the hand or placed on a tilting stand.

6.5 Typographic Research

A huge amount of typographic research has been conducted this century, most of it related to legibility in print. The most prolific of the typographic researchers has without question been Professor Miles Tinker of the University of Minnesota, who with his colleague Donald Paterson published dozens of research papers and a number of books summarizing experiments with thousands of subjects. By 1940, Tinker and Paterson had already administered Tinker's speed-of-reading tests to more than 33,000 subjects, and the two continued to work in this field for more than 20 years.

Tinker attempted to evaluate all of the variables in turn: typefaces, type sizes, line length, leading, and so on. In many cases he reached conclusions that can serve as fixed guidelines for setting readable type. Many of these seem relatively obvious in retrospect, but they have value because they are confirmed by scientific data. However, it must continually be kept in mind that Tinker's testing used relatively short passages: small differences that register only as reader preferences in short reading tasks are likely to be magnified as the duration of reading increases.

The most complete summary of their work is contained in Legibility of Print (1963).

For example:

Typeface. Typefaces in common use are equally legible. Tinker cites faces such as Scotch Roman, which was in widespread use at that time for school textbooks.

Readers prefer a typeface that appears to border on “boldface”, such as Antique or Cheltenham. Sanserif faces are read as rapidly as ordinary type, but readers do not prefer them.

Type style. Italics are read slower than ordinary lower-case roman. While bold type is read at the same speed as roman, seventy percent of readers preferred ordinary lower case. So neither italics nor boldface should be used for large amounts of text, but should be kept for emphasis only.

Type size. 11-point type is read significantly faster than 10-point – but 12-point was read slightly more slowly. 8- and 9-point types are significantly less readable, and once the type size rises to 14 points, efficiency is again reduced. This finding is extremely important when it comes to designing books to be read on the screen, since displaying type on screen at sizes below 14 points presents technical difficulties due to poor screen resolution. This key issue is addressed later in this paper, which describes a new display technology capable of solving these difficulties even at today's screen resolutions.

Line length. Standard printing practices of between eight and 12 words to the line are preferred by readers. Relatively long and short lines are disliked.

Leading. Readers definitely prefer type set with “leading” or additional space between lines. 10-point type, for instance, is preferred with an additional two points of leading added between the lines. More leading than this begins to counter the beneficial effect.

As a general principle, at body text sizes, an additional 20 percent of space should be added, although type size, leading and line length are inter-related variables, none of which can be designed in isolation.

Tinker defines a series of “safe zones” or effective combinations for type sizes from 6 to 12 points.
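
These guidelines are simple enough to express directly as code. The sketch below, in Python, encodes only what has just been described – roughly 20 percent additional leading, and a flag for sizes outside the 10- to 12-point range Tinker found most efficient – and is not a reproduction of Tinker's published tables.

def suggest_setting(type_size_pt: float) -> dict:
    """Illustrative translation of the guidelines above into numbers."""
    leading_pt = round(type_size_pt * 0.2, 1)    # ~20 percent additional space between lines
    return {
        "type_size_pt": type_size_pt,
        "leading_pt": leading_pt,
        "line_height_pt": type_size_pt + leading_pt,
        "within_efficient_range": 10 <= type_size_pt <= 12,
    }

print(suggest_setting(10))   # 10-point type gets two points of leading, as Tinker recommends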

Page size and margins. Tinker makes no recommendation on page size other than calling for publishers, printers and paper manufacturers to agree on standards. This suggests that the page sizes in common use are satisfactory; the experiments on line length confirm this. Tinker's experiments showed that readers preferred material with margins, although material without margins proved just as legible. This is one of the areas where Tinker's short-duration testing may mask a deeper effect: what registers only as a reader preference in a brief task may surface as a real irritation over a longer-duration task such as reading a book.

Color of print and background. Black print on a white background is much more legible than the reverse. Printed material on the whole is perceived better as the brightness contrast between print and paper becomes greater. Reading rates are the same for colored ink on colored paper, provided high contrast is maintained.

Tinker’s work, while focused often on single variables, recognized that typography was a system of many inter-related variables. If only two or three of those variables were degraded from optimum settings, he found that this was accompanied by a rapidly-increasing loss in legibility.

6.6 The book as a “system”: Tschichold and Dowding

To truly understand the typography of the book as a “system”, we have to examine the work of specialists in book typography. It is here that analysis often runs into difficulties, since many typographers and designers speak in a language with its own esoteric terms.

The best typographers I have met or read, though, all agree on one point: the purpose of typography in a book is to become invisible. We can re-state this in more scientific terms as “making the reading process as transparent as possible for the reader”. Good typography is meant to pass unnoticed, although achieving it requires an astonishing attention to detail that the lay person can easily misconstrue as “unnecessary fussiness” or even “just art”.

Typographers and designers talk often in terms such as the color of a page (a uniform grayness in which no single word, letter or space stands out from the whole). “Nothing should jump out at you” is another frequent assertion, or “Typography should honor the content”.

What do these unscientific terms really mean? For a detailed analysis of book typography, the reader can do no better than to read in its entirety The Form of the Book, by the eminent 20th Century typographer Jan Tschichold.

6.7 Tschichold: The rebel who recanted

Tschichold’s own history is of great value in the search for readability in books. He was one of the “young rebels” who in the 1920s and 30s led the “revolution” in typography that was meant to overthrow centuries of hidebound tradition.

Tschichold was one of the leading lights of the “New Typography” of that time, in which the rebels eschewed the conventions of the past. Serif typefaces were passé, and text was to be set ragged right, with no indents for paragraphs but instead with additional space between them.

Tschichold was such a leading light among these revolutionaries that in 1933 he was imprisoned by the Nazi Government for six weeks for his “subversive ideas”. Perhaps they wanted to make certain that the traditional “Aryan” values they believed to be embodied in the Gothic blackletter in common use in Germany and Austria at that time were not diluted by “non-Aryan” typography, taking the same attitude to “modern” typography as they took to modern art.

Tschichold fled to Switzerland with his wife and infant son, and spent most of the rest of his life in that country until he died in 1974. He spent two years in London at Penguin Books, which was at that time the largest publisher of paperback books in the world.

Within two years of leaving Germany, Tschichold began to step back from his revolutionary theories. The Form of the Book, a series of essays published in 1975, a year after his death, shows that over the course of the next 30 years he fully recanted. It is all the more valuable because Tschichold clearly took none of the “print conventions”, as they have been described elsewhere, at face value. All were rejected, and then returned to in the light of experience.

Tschichold’s writings are especially valuable because he expressed good book typography and how to achieve it in extremely scientific terms. His work is summarized below, although it contains far more detail which cannot be ignored if good typography is to be achieved on the screen.

6.8 Achieving good typography

“Perfect Typography depends on perfect harmony between all of its elements. It is determined by relationships or proportions, which are hidden everywhere; in the margins, in the relationships of the margins to each other, between leading of the type and the margins, placement of page number relative to type area, in the extent to which capital letters are spaced differently from the text, and not least in the spacing of the words themselves.

Comfortable legibility is the absolute benchmark for all typography, and the art of good typography is eminently logical.

Leading, letterspacing and wordspacing must be faultless.

The book designer has to be the loyal and tactful servant of the written word.

Though largely forgotten today, methods and rules on which it is impossible to improve have been developed over centuries. The book designer strives for perfection which is frequently mistaken for dullness by the insensitive. A really well-designed book is recognizable as such only by a select few. The large majority of readers will have only a vague sense of its exceptional qualities.

Typography that cannot be read by everybody is useless. Even with no knowledge, the average reader will rebel at once when the type is too small or otherwise irritates the eye. (We may not know about Art, but we know what we like!)

First and foremost, the form of the letters themselves contributes much to legibility or its opposite. Spacing, if it is too wide or compressed, will spoil almost any typeface.

We cannot change the characteristics of a single letter without at the same time rendering the entire typeface alien and therefore useless.

The more unusual the look of a word we have read – that is to say, recognized – a million times in familiar form, the more we will be disturbed if the form has been altered. Unconsciously, we demand the shape to which we have been accustomed. Anything else alienates us and makes reading difficult.

Small modifications are thinkable, but only within the basic form of the letter.”

6.9 Back to the classical approach

After fifty years of experimentation – and indeed being one of the leading lights of “innovation” and “revolution” – Tschichold concluded “the best typefaces are either the classical fonts themselves, recuttings of them, or new typefaces not drastically different from the classical pattern”.

Sanserif faces are more difficult to read for the average adult. This assertion by Tschichold that serif faces are more readable is not fully consistent with Tinker’s finding that sans serif faces are no less readable. However, it should be borne in mind that Tinker’s research was based on much shorter-duration reading tasks than the book, whereas Tschichold was speaking only of typefaces for books. Tinker’s finding that readers preferred serif faces may indicate that research with book-length reading tasks would produce harder evidence.

Beginnings of paragraphs must be indented. The indention – usually one em – is the only sure way to indicate a paragraph.

The gestalt of the written word ties the education and culture of every single human being to the past, whether he is conscious of it or not. “There are always people around offering ever-simpler recipes as the last word in wisdom. At the present it is the ragged-right line, in an unserifed face, and preferably in one size only”.

Besides an indispensable rhythm, the most important thing is distinct, clear and unmistakable form. Tschichold is talking here about reading gait.

Good typesetting is tight; generous letterspacing is difficult to read because the holes disturb the internal linking of the line and thus endanger comprehension of the thought.

Italics should be used for emphasis.

Two constants reign over the proportions of a well-made book: the hand and the eye. A healthy eye is always about two spans away from the book page, and all people hold a book in the same manner.

The books we study should rest at a slant in front of us.

6.10 Size DOES matter!

Tschichold analyzed page sizes and margins in detail, and says a proportion of 3:4 in page size is fine, but only for quarto books that rest on a table. It is too large for most print, because the size of a double-page spread makes it unwieldy. However, in an electronic book – which has no “facing pages” – this would suggest that the standard screen proportion of 3:4 would work quite well, provided it was used in portrait orientation.

Harmony between page size and type area is achieved when both have the same proportions.

Choice of type size and leading contribute greatly to the beauty of a book. The lines should contain from eight to twelve words; more is a nuisance. Typesetting without leading is a torture for the reader.

Care must be taken to make the spaces between the words in a line optically equal. Wider spacing tends to tear the words of a sentence apart and make comprehension difficult. It results in a page image that is agitated, nervous, flecked with snow. Words in a line are frequently closer to their upper and lower neighbors than to those at the left and right. They lose their significant optical association. Tight typesetting also requires that the space after a period be equal to or narrower than the space between words.

Indents are required at the start of paragraphs. So far no device more economical or even equally good has been found to designate a new group of sentences. Type can only be set without indents if care (i.e. manual intervention) is taken to give the lines at the ends of paragraphs some form of exit. Typesetting without indents makes it difficult for the reader to comprehend what has been printed.

Normal, old-fashioned setting with indents is infinitely better. It simply is not possible to improve upon the old method. It was probably an accidental discovery, but it presents the ideal solution to the problem.

Italic is the right way to emphasize. It is conspicuous because of its tilt, and irritates no more than is necessary for this function.

6.11 Leading or Interlinear spacing

Leading is of great importance for the legibility, beauty and economy of the composition.

Poor typesetting – set too wide – may be saved if the leading is increased. But even the most substantial leading does not abrogate the rules of good word spacing.

Leading in a piece of work such as a book depends also on the width of the margins. Ample leading needs wide borders in order to make the type area stand out.

Lines over 26 picas almost always demand leading. Longer lines need more because the eye would otherwise find it difficult to pick up the next line.

A fixed and ideal line length for a book does not exist. 21 picas is good if eight to ten-point sizes are used. It is not sufficient for 12 point. Nine centimeters looks abominable when the type size is large, because good line justification becomes almost impossible.

Widows – single words or worse, hyphenated parts of words, which appear as the first line on a page – are unacceptable. The typesetter needs to look at preceding pages – perhaps all the way back to the start of the chapter, where there is generally additional space between chapter heading and text.

Pure white paper is cold, unfriendly and is upsetting because, like snow, it blinds the eye. Lightly tinted paper is superior. This suggests that the screen – which is incapable of displaying snow-white – has some hope. It may even be desirable to use a color tint. It is not only unnecessary, but runs counter to good readability, to try to achieve the binary contrast effect of pure black type on pure white paper.

Tschichold’s assertions are set out in a logical manner. He makes it clear that creating easily-recognizable word-patterns, by attending to the shapes of letters, then to the way in which they are assembled into easily-recognizable words, is at the core of good book production. The remainder is the task of presenting these words to the reader in a smoothly-flowing stream.

The devil is in the details. Some letter pairings in words, for example, do not fit well together unless the pairs are “kerned” or moved closer together to remove some of the white space, which would otherwise tend to break up the word.

Ligatures are another method of grouping letters more closely together to harmonize two or even three-letter combinations: “ff” and “ffl” being two examples.

6.12 Dowding: FINER POINTS in the spacing and arrangement of TYPE

Another fine logical analysis of the science of typography is given in Finer Points in the Spacing and Arrangement of Type by Geoffrey Dowding. Dowding had a long career as typographer to many British publishers; he was also an instructor in typographic design at the London College of Printing for over 20 years.

Most of the book is devoted to the setting of type for continuous reading (i.e. the book).

“Typography consists of detailed manipulation of many variables which may not be immediately obvious, but which in sum add enormously to the appearance and readability of text” – almost an exact echo of Tschichold.

Even the most carefully-planned design will fall short of perfection unless unremitting attention is paid to these details, “minor canons” which have governed both the printed and written manifestations of the Latin script from the earliest times.

Disturbingly large amounts of white space in the wrong places – i.e. between the words – are the antithesis of good composing and sound workmanship.

Consistently close spacing between words, and after full stops, secures one of the essentials of well-set text matter – a striplike quality of line.

An excessive amount of white space between words makes reading harder.

More interlinear spacing can mitigate the effect of carelessly-spaced lines, but a combination of well-spaced lines and properly-spaced words magnifies the beneficial effect of both.

Why does close spacing work?

6.13 Spacing and recognition

A child learns to read by spelling out words, at first letter by letter, then syllable by syllable and afterwards by reading individual words one at a time. But the eyes of the adult reader take in a group of words at each glance.

Although quite wide spacing is desirable between the words of a child’s book and ample leading is also necessary between the lines (reducing progressively as the child becomes older and more adept), in settings not intended for young children great gaps of white between the words break the eye’s track.

The “color” or degree of blackness of a line is improved tremendously by close word-spacing. A carefully composed text page appears as an orderly series of strips of black separated by horizontal channels of white space.

In slovenly setting the page appears as a gray and muddled pattern of isolated spots, this effect being caused by overly-separated words (the same spottiness is noticeable in most typefaces when read on the computer screen).

The normal, easy, left-to-right movement of the eye is slowed down simply because of this separation; further, the short letters and serifs are unable to discharge an important function: that of keeping the eye on ‘the line’. The eye also tends to be confused by a feeling of vertical emphasis, that is, an up & down movement, induced by the relative isolation of the words & consequent insistence of the ascending and descending letters. This movement is further emphasized by those ‘rivers’ of white which are the inseparable & ugly accompaniments of all carelessly set text matter. The letter-spacing of words in upper- and lower-case increases the confusion. Of course, in solid, i.e. unleaded settings, such faults, both of word- and of letter-spacing, are especially noticeable.

Any feeling of vertical emphasis is absent in a well-composed page, the close word-spacing ensuring that the white space is available for use between the lines where it serves the useful purpose of aiding readability. It is astonishing how much space can be saved depthwise by close spacing in the lines themselves. And in hand-setting when word-spacing in a line is close it is more likely to be even throughout the line. In varying the spacing between pairs of words in a too openly spaced line, frequently and often shockingly, the compositor is obviously not intent on securing visually even spacing throughout the line but on justifying it with the least amount of effort in the shortest possible time.

The plea for closer wordspacing in text settings is not something which has been fathered recently by a small company of eccentrics. In the best printing it has been an established practice for over five hundred years, and in the manuscript for many more centuries than that.

In arranging text setting care must be exercised to ensure that the type and the measure are so related that the eye has, firstly, no difficulty in swinging easily to & fro without any suggestion of strain: and secondly, is not hindered in finding the beginning of the following line.

“Other things being equal, the longer the line the greater the excursions of the eyes and the greater the difficulty in passing from one line to the next. Very short lines, on the other hand, demand too frequent a change of direction in the movement of the eyes.”

For what kind of setting are we designing? Is it a large work? How is it to be used? Is it for a Bible, a work of reference, or a novel? If the work is a lectern Bible the reader will be standing & his eyes will be at a considerable distance from the page: each period of reading is likely to be a short one. On the other hand if the work is a pocket dictionary or other book of reference it will either be consulted for brief periods or be pored over; if it is a novel it may be read quickly, perhaps in a single evening: an easily readable measure is therefore imperative.

6.14 Line Length

Those lines which exceed the normal, i.e. lines of more than nine or ten average words, must be leaded proportionately in order to compensate for the extension.

Lines in this document, which is intended to be printed on standard US letter paper, average around 14 words – longer than the ideal. Two things make this acceptable: extra space has been added between the lines, and the lines have been shortened by increasing the size of the margins. In a book, this amount of leading would drive up production costs by creating far more pages. If the leading is not increased as the measure is extended there is a risk of “doubling”, i.e. reading the beginning of the same line twice. Conversely, some settings to very narrow measures may require less leading than normal or near-normal measures.

Seriffed faces generally, with the exception of those styled ‘modern’, are undoubtedly easier to read than the sans serifs because the serifs help the normal horizontal movement of the eyes in reading by carrying them along the line. (By “modern”, Dowding is referring to faces such as Bodoni, which have very black, almost bold, letter stems, with highly-contrasted thin serifs). No such guides exist in a sans serif face and unless the lines are impeccably set and well separated by leading there is a distinct tendency to movement in the other direction, i.e. a vertical, or up-and-down movement. Sans serif faces require more leading than any other kind of type, except perhaps the Egyptians. Neither is suitable for solid setting (i.e. unleaded). Modern faces like Bodoni are inclined to dazzle the reader for the reasons already stated especially when printed on coated papers. Faces in this group should always be amply leaded.

6.15 Dividing words: hyphenation

Consistently close and even spacing cannot be achieved, except in the most unusual circumstances, if the typesetter has resolved never to divide words. Such works would rarely, if ever, be of any typographic distinction.

It is a popularly though erroneously held opinion that close spacing in text setting inevitably multiplies the number of word divisions, for one can have as many, or more, divided words in a careless piece of text composition as in one that is well set. Indeed, the reduction of word-spacing in a slovenly setting often helps to reunite, and so reduce, the number of divided words. And in ease of reading we tend to gain more by the close spacing of words than we lose in the momentary pauses occasioned at the ends of lines by word-division: one pauses at the end of each line in any case.

It is infinitely preferable to have a number of break lines succeeding each other than to have widely word-spaced lines. In a little book (Symbola Heroica) printed at Antwerp by Christopher Plantin in 1583, five successive hyphens are a commonplace, six occur frequently and there is at least one instance of ten. The word-spacing is very pleasant and there are never any rivers.

Words must be divided according to syllabic or etymological principles. Breaking words merely to the convenience of a full line cannot be justified.

There are two places in which divided words prove objectionable. First, in books for the very young. Children who are learning to read are likely to be confused by them. The fact that a hyphen follows the first part of the division means little to a child. Second, no paragraph should end with a divided word. “Widows” are frowned on by some but much depends on their position on the page: syllabic ‘widows’ would rightly be condemned wherever they appeared.

6.16 Tighter setting: importance of ligatures

The term ligature comes from the Latin word ‘ligatura’, which means anything used in binding or tying. In printing, an exact definition of the word would recognize only the actual tie or link between two joined letters, e.g. between the letters ct, st in some fonts. Now, however, the term ligature is used less exactly to describe those combinations of either two or three letters which are joined together and cast as one unit, for example ff fi fl ffi ffl, and the compound vowel characters, or vowel-ligatures, æ, œ, known as diphthongs. The ‘f’ ligatures & the vowel ligatures are the ones which are standard to the normal font.

Early founder-printers cut many ligatures for their fonts; today, only certain type designs carry (in roman and italic) great numbers of ligatured letters.

If letters, normally ligatured, are set separately, as they sometimes are, they create the impression that they are on the wrong ‘set’. This unpacked and spotty appearance is caused by the excess of white space round them. From the purely practical point of view ligatures are space savers.
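
On screen, the same grouping can be sketched in a few lines of Python. The example below substitutes the standard Unicode Latin ligature codepoints (U+FB00 to U+FB04); a real layout engine would substitute glyphs from the font's own tables rather than rewriting the text, but the principle is the same.

LIGATURES = {"ffi": "\ufb03", "ffl": "\ufb04", "ff": "\ufb00", "fi": "\ufb01", "fl": "\ufb02"}

def apply_ligatures(text: str) -> str:
    """Replace letter sequences with single ligature characters (longest sequences first)."""
    for sequence, glyph in LIGATURES.items():
        text = text.replace(sequence, glyph)
    return text

print(apply_ligatures("the office affliction was difficult to fix"))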

Both Tschichold and Dowding make it clear that the purpose of typography is to create text which presents easily-recognizable word-shapes to the reader in as smooth a manner as possible, leading the reader’s eye from the first word to the last in an unbroken and smooth manner, and that ligatures, because they improve the optical spacing of characters, make words easier to recognize.

In the next section, we analyze this flow and how it is set up.


7. How the Book works

7.1 A 300-page waterslide for human attention

Now we have seen how human attention works, and the importance of pattern-recognition as an instinctive human trait.

We have some understanding of how this instinctive trait led to the evolution of writing and reading systems, a process that goes back some 22,000 years.

We have seen how books alter our state of consciousness, how they capture and hold our attention.

We have glimpsed some of the factors at work in making reading easy or hard, and some of the care and attention to detail that goes into the typography of the book in order to make it invisible.

Now we can conduct a scientific analysis of the book and how the technology actually works.

It is hard to recognize technology at work in the book, since there are no flashing LEDs, no wheels, knobs or switches. There are no moving parts to this technology, because its magic is to stay still while moving us. But it is basically a 300-page waterslide for human attention.

7.2 The logical structure of the book

We’ll start our analysis of the book at the micro level.

The atomic component of the book is the letter. Formation of the letters is important, since these have to be easily recognizable (Tinker, Taylor, Nell, Tschichold, etc.).

Although typefaces vary in the shape of their letters, creative differences are only possible within fairly narrow boundaries. No matter how creative the type designer wishes to be, a lower-case “a” always has to be recognizable as such, or it is unusable. (This very fact is central to the US Government’s refusal to allow copyrighting of typefaces. Other Governments, e.g. Germany, take a different view and recognize the creativity involved in typefaces.)

Design of letters is not merely about the shapes of the letters, but how those shapes work together to form words. It is also about the visual balance between the counters – the spaces inside characters – and the spaces outside. Some of this is the domain of the type designer, who has to consider how all the letters of her face work in combination with each other when creating them. Some of it is the province of the typesetter (human or computer), who has to define a letterspacing that makes the words as effortless to recognize as possible. (For more details, see Fred Smeijers' excellent book, Counterpunch.)

Typographers pay extremely close attention to letterspacing, since it is a core part of their work. Letterspacing should be as tight as possible without causing characters to collide. Typographic research documents that as letterspacing gets wider, word recognition becomes progressively harder. Letterspacing should remain constant throughout a book.

There is plenty of hard data from typographic research documenting how serif typefaces (e.g. Times New Roman, Palatino) work better for longer-duration reading tasks than sanserif faces (e.g. Helvetica, Arial).

The serifs fulfil two tasks. First, they aid in visually tying the individual letters together into word “gestalts”, making those units easier to recognize. Second, their direction helps to lead the eye along the horizontal path that makes for effortless recognition of successive “patterns” or words.

Some researchers have failed to understand the significance of letter spacing and its effect on building easily-recognizable units of meaning. For example, one study carried out on letter-spacing attempted to gauge its effect by measuring subjects' ability to recognize “pseudo-words” (i.e. groups of letters with no meaning) under varying letter-spacing conditions.

This study concluded that letter-spacing had no discernible effect on pseudo-word recognition.

The researchers were exactly right – and totally wrong at the same time. It is the fact that words have meaning, and that our brain recognizes units of meaning, that makes letterspacing important. The easier we make the brain’s task of recognizing units of meaning, the easier the document is to read. The study completely misses the point.

7.3 Words and Lines

The next important piece of the technology is inter-word spacing. Research shows that this should be perceptibly wider than the inter-character spacing, to signal where one word-recognition ends and the next begins. There is an optimum value for each type size. While actual spacing may vary slightly from this optimum, it can do so only within fairly tight limits, or it will interfere with the “flow” of the eye across the line of text.

Inter-word spacing should also be more or less constant throughout the whole of a book. This allows the reader to find and maintain his own “harmonic gait” or relaxed reading speed throughout the whole text.

The next level up in the technology is how this arrangement of words and spaces between them is assembled into lines. Research shows that line-length is important; an optimum value is around 66 letters and spaces per line at normal reading distance and in normal type sizes of 10-12 point. It also shows that ideally, line length should be constant, “cueing” the reader’s eye by keeping the start and end of each line in the same horizontal position. This way of setting text is called “justified”.
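
A rough calculation shows how the 66-character guideline translates into a physical measure. The sketch below assumes an average character advance of half an em – an illustrative figure; the real average depends on the typeface.

def measure_points(type_size_pt: float, chars_per_line: int = 66,
                   avg_char_width_em: float = 0.5) -> float:
    """Line length in points needed to hold the target number of letters and spaces."""
    return chars_per_line * avg_char_width_em * type_size_pt

for size in (10, 11, 12):
    pts = measure_points(size)
    print(f"{size}-point type: measure of about {pts:.0f} points ({pts / 12:.1f} picas)")

At roughly six characters per word, 66 characters is about eleven words to the line, consistent with the eight-to-twelve-word lines recommended by Tinker and Tschichold.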

Justification brings up another set of factors. If line length is constant AND spacing between words cannot vary by more than a small amount, then the only way to achieve this is to hyphenate words. Research shows that piecing together the two parts of a hyphenated word on successive lines costs the reader less effort than dealing with variations in word spacing.

However, if words are to be hyphenated it must be done meaningfully, on the basis of syllables or etymology; the only way to do this is by using a language-specific dictionary that stores acceptable hyphenation points for words in that language. Algorithmic hyphenation is a very poor substitute; incorrectly-hyphenated words are blockers to the smooth flow of effortless attention.

It may be possible to avoid shipping a hyphenation dictionary for every language by encoding “soft hyphens”, or hyphenation opportunities, in electronic book or other computer-read text. Such soft hyphenation would be carried out by authoring tools.

If words are to be hyphenated and spacing adjusted microscopically, then this must be done at paragraph, not line, level. In some specific cases hyphenation is unacceptable, e.g. when a hyphenated part of a word lands as the only text left in a new line at the top of a page. In this case, decisions have to be made interactively about which is the “least worst” case: e.g. adding an extra line elsewhere in the text, (for instance at the start of a chapter, or on the previous page), or reducing inter-word spacing in that paragraph below the desired threshold value.

So line length, word spacing, number of lines and hyphenation are inter-related variables.
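
To illustrate how these variables interact, here is a deliberately simplified Python sketch. It fills lines greedily to a fixed measure (counted in characters for simplicity) and prefers breaking a word at an encoded soft hyphen (U+00AD) over closing a line short and stretching its word spaces. A real engine would, as noted above, optimize over the whole paragraph rather than line by line, and would measure in real font units; this is a sketch of the principle only.

SHY = "\u00ad"   # soft hyphen: an encoded hyphenation opportunity, invisible until used

def fill_lines(words, measure):
    """Greedy sketch: fill each line to the measure (in characters), preferring a break
    at a soft hyphen over closing the line short and stretching its word spaces."""
    queue = list(words)
    lines, line, used = [], [], 0
    while queue:
        word = queue.pop(0)
        plain = word.replace(SHY, "")
        gap = 1 if line else 0
        if used + gap + len(plain) <= measure:              # the whole word fits
            line.append(plain)
            used += gap + len(plain)
            continue
        placed = False
        parts = word.split(SHY)                             # soft hyphens mark break points
        for i in range(len(parts) - 1, 0, -1):              # try the longest prefix first
            head = "".join(parts[:i]) + "-"
            if used + gap + len(head) <= measure:
                line.append(head)
                queue.insert(0, SHY.join(parts[i:]))        # remainder starts the next line
                placed = True
                break
        if not placed:
            if line:
                queue.insert(0, word)                       # the word moves down whole
            else:
                line.append(plain)                          # overlong, unbreakable word
                used = len(plain)
                continue
        lines.append(" ".join(line))                        # close the line
        line, used = [], 0
    if line:
        lines.append(" ".join(line))
    return lines

words = ["pattern", "recog\u00adni\u00adtion", "is", "at", "the", "heart", "of", "read\u00ading"]
for set_line in fill_lines(words, measure=16):
    print(set_line)

Run on the sample text, this produces three lines of near-constant length, hyphenating “recognition” rather than leaving a loose first line.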

Distance between successive lines of text is another important factor. It should be constant throughout a book. In traditional typesetting, this is called “leading”, from the strips of metal that were put between lines of set metal type to control spacing.

Size of type, line length and leading are inter-related variables.

7.4 Top-down analysis

Once we understand the method of assembling individual lines, it is worth changing our perspective on the book, and analyzing it from a “top-down” viewpoint.

Page size is important; “portrait” orientation gives more lines, of a better length, than landscape. It is ideally suited to the way we read, in fact, it evolved from basic reading principles. It is NOT an artifact left over from the past that needs to be left behind as we evolve reading technology; it is a result of the way we read, not a cause.

The page acts as a focal plane for the eyes. We use the high-acuity areas of the fovea and parafovea to read the text. The text area works best when proportional to the page size, creating a perspective that keeps the reader’s attention from wandering away from the text. The margins thus created also help the reader to unconsciously define the “field of recognition”; i.e. “this is the area of attention, where word recognition takes place”.
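
One way to see the focal plane at work is to give the type area the same proportions as the page and let the margins fall out of that single choice. The Python sketch below assumes a single page with equal left and right margins, and splits the leftover vertical space one third above and two thirds below the type area – a common book convention, used here purely for illustration.

def type_area(page_w: float, page_h: float, area_fraction: float = 0.72) -> dict:
    """Type area with the same proportions as the page, plus the resulting margins."""
    text_w = page_w * area_fraction
    text_h = text_w * (page_h / page_w)            # same proportions as the page itself
    side = (page_w - text_w) / 2                   # equal left and right margins
    top = (page_h - text_h) / 3                    # one third of the leftover space above...
    bottom = (page_h - text_h) - top               # ...and two thirds below
    return {"text_w": round(text_w, 1), "text_h": round(text_h, 1),
            "left": round(side, 1), "right": round(side, 1),
            "top": round(top, 1), "bottom": round(bottom, 1)}

print(type_area(150, 200))   # a 3:4 portrait page, e.g. 150mm x 200mm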

Understanding that reading is built upon the survival trait of pattern recognition reveals another way in which this “focal plane” works. Peripheral vision in animals and humans does not have the same resolution as high-acuity foveal and parafoveal vision. It is designed not to focus on pattern recognition, but to detect movement. Peripheral vision is used to detect threat or prey. It is the background detection system that tells us when and where to focus attention.

In reading, peripheral vision can remain in its “background/watchful” state, leaving us free to focus attention on the content.

Margins delineate the area between “attention” or focus and “background watchfulness”.

At the top of the page, the eye begins on the first word. If more than a single word is contained in the fixation, constant inter-word spacing defines where one recognition ends and the next begins.

Now the attention (having been trained in the reading process) moves to the next line, repeating the same process for the number of lines on the page.

This flow is repeated from page to page.

7.5 Visual Cues

At every stage, there are visual “cues” to help us. These cues are constant. Analyzing book typography from this perspective reveals a large number of possible variables that are given constant values (for a single instance of a book): the typeface and its letterforms, type size, letterspacing, word spacing, line length, leading, margins and page size.

The effect of these constant values is that we can settle into a comfortable (because unconscious) reading pattern or gait.
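
These constants can be written down directly in software as a single “tuning” that is fixed for a whole book. The Python sketch below is illustrative only: the names and default values are drawn from the guidelines discussed in earlier sections, not from any shipping product.

from dataclasses import dataclass

@dataclass(frozen=True)
class BookTuning:
    typeface: str = "Times New Roman"     # a seriffed face of classical pattern
    type_size_pt: float = 11.0            # body size in the range found most efficient
    leading_pt: float = 2.2               # ~20 percent of the type size
    measure_chars: int = 66               # letters and spaces per line
    word_space_em: float = 0.25           # constant target inter-word space (illustrative)
    letter_space_em: float = 0.0          # no added letterspacing in body text
    paragraph_indent_em: float = 1.0      # one-em indent marks each new paragraph
    page_proportion: tuple = (3, 4)       # portrait page
    justified: bool = True                # constant line length, hyphenation allowed

tuning = BookTuning()                     # held constant from the first page to the last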

7.6 Disrupting the flow

Anything inside the text which disrupts this regular pattern has the effect of making an automatic, unconscious process become a conscious one: character shapes that are hard to identify, bad letterspacing that makes it an effort to recognize words, large variations in the spacing between words, and so on.

Tinker identifies all of these variables, and asserts that not only do they all have to be tuned to work together to make reading effortless, but that if only two or three are set sub-optimally, it can completely destroy reading efficiency.

It can thus be seen that the technology of a book is a complex engine to capture and hold human attention by directly hooking our innate pattern-recognizing behavior.

The book is a technology for Optimizing Serial Pattern Recognition. We get on this 300-page waterslide at Word One, and it is so designed that our attention slides from word to word, from line to line, and from page to page until we reach the last word. Of course, it is not quite that simple: we stop reading when other distractions, such as hunger, get in the way and rise to take priority. We also sometimes regress, perhaps to read a passage we did not fully understand, or enjoyed so much we want to go back and savor. But the natural dynamic is for a serial flow from start to finish.

There are other types of reading: encyclopaedias, reference books and so on, which we read in different ways. But they use the same basic mechanism to keep us in the passage which we are reading.

This technology is at least as complex as an internal combustion engine. And like an internal combustion engine, it takes only one or two variables wrongly “tuned” to make the whole engine vibrate with a dramatic loss in efficiency.

Precise attention to apparently insignificant tiny details such as word and letter-spacing is not disproportionate “fussiness”. It is by these and all the other details that the true power of setting up serial pattern recognition is achieved, making the recognition apparently effortless for the reader.

7.7 Underlying mathematics

There is an underlying mathematics to OSPREY that can be captured in software code.

This mathematics is largely already known. Desktop publishing applications such as Adobe PageMaker, Quark Xpress and Microsoft Publisher do this today, provided they are driven with the correct parameters. Purists may gasp at the placement of Publisher in the same context as “professional” publishing applications. But in reality Microsoft Publisher 98, with its Quill pagination engine and underlying Line Services line-breaking engine, stands up extremely well in comparison.

All three of these examples, and the many other similar software packages on the market, suffer from basic shortcomings in relation to displaying text on screen for electronic books.

The first and most important of these is that they were all designed to produce print. As such, their screen display of text must adhere strictly to WYSIWYG. What their users really care about is what they will see in print. Screen text then has to be an exact representation of printer text. To achieve this, printer font metrics are used; this results in distortion of the screen display. Word and letter-spacing, and even the shapes of the characters themselves, are altered in order to make the lines match the printer output.

All defaults and adjustment of spacing are driven by these printer metrics: ideal screen parameters would be different.

All of these applications provide a framework of controls that allow the experienced user to achieve good results. None of them is today capable of taking text and automatically setting it to the right type size, measure, leading, page size, margins etc. for the eBook. However, the underlying code is perfectly capable of being tuned to do the job.

It is perfectly possible to program a set of “harmonic tunings” for the OSPREY variables and have code such as Quill and Line Services format incoming book text automatically.
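
As a sketch of what such a tuning pass might look like – illustrative Python, not the actual Quill or Line Services interfaces – the OSPREY parameters can be derived from the physical characteristics of the target screen and then handed, with the structured text, to whatever line-breaking and pagination engine is in use.

def osprey_tuning(screen_w_px: int, screen_h_px: int, dpi: float) -> dict:
    """Derive one harmonic set of OSPREY parameters for a given display (illustrative)."""
    page_w_pt = screen_w_px / dpi * 72                    # usable page width in points
    page_h_pt = screen_h_px / dpi * 72                    # usable page height in points
    type_size_pt = 11.0                                   # assumed comfortable body size
    line_height_pt = type_size_pt * 1.2                   # ~20 percent additional leading
    measure_pt = min(66 * 0.5 * type_size_pt,             # ~66 characters at ~0.5 em each
                     page_w_pt * 0.85)                    # never wider than the page allows
    return {
        "type_size_pt": type_size_pt,
        "leading_pt": round(line_height_pt - type_size_pt, 1),
        "measure_pt": round(measure_pt, 1),
        "side_margin_pt": round((page_w_pt - measure_pt) / 2, 1),
        "lines_per_page": int(page_h_pt * 0.85 / line_height_pt),
        "justify": True,
        "hyphenate": True,
    }

print(osprey_tuning(screen_w_px=600, screen_h_px=800, dpi=100))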


8. The state of screen reading today

8.1 Research into reading on screen

Now that we understand how the printed book works, we can turn at last to the computer screen and compare it with print. An understanding of the OSPREY principle, and how books and other printed documents use it in order to capture and hold our attention, reveals just how poor reading from the screen is by comparison. However, this understanding also reveals methods for dramatically improving screen displays – to the point where screen reading can become widely accepted.

Before we move forward, though, we need to look back at the research so far, and the conclusions of the researchers.

A great deal of research has been done over the past 20 or so years into reading on the screen. It began in the days of primitive computer displays – known as VDUs or visual display units. Early examples offered only the most primitive display of characters on the screen; crude character shapes, flicker, and typically green or amber text on a black background.

It was clear that such screens were completely unsuitable for protracted use. Operators complained of eyestrain; health agencies, labor unions and others in many countries were successful in introducing mandatory limits on continuous working. Many early research studies looked at the readability of such screens and found them wanting – not surprisingly.

Screen displays evolved dramatically, especially with the introduction of Graphical User Interfaces (GUIs) and computer graphics cards capable of displaying higher resolutions.

Computer displays – especially in software designed for document creation – have tried as much as possible to emulate paper, with black text on a “white” background.

Much of the research into readability on the screen has been overtaken by the rapid development of GUIs, as have standards for screen readability drawn up by organizations such as the European Commission. For example, a screen readability standard that is about to become a legal requirement in Europe, ISO 9241, is clearly rooted in pre-GUI days. For instance, the standard specifies that no two characters should be presented so close together that they touch.

This type of requirement made sense in the days of low-resolution displays. However, as screen graphics have evolved and resolutions improved such standards are increasingly out of touch. Two important tools of the typographer’s trade are ligatures (pairs or even triple-letter combinations) which not only touch, but merge, and kerning, in which certain pairs of letters are moved more closely together (perhaps even touching) in order to improve the optically-perceived spacing and make words more readable.

When resolutions and graphics capability were low, this kind of functionality was impractical on the screen. As both improve, they have become not just desirable, but essential, for readability.
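
Kerning itself is mechanically simple; the craft lies in the pair values. The Python sketch below applies a tiny table of kerning pairs – the values are invented for illustration; real ones come from the font – to per-character advance widths.

KERN_PAIRS = {("W", "A"): -70, ("A", "V"): -80, ("V", "E"): -40, ("T", "o"): -70}
# pair adjustments in thousandths of an em; negative values pull the letters closer

def kerned_advances(text: str, advance_em: float = 0.5) -> list:
    """Per-character advance widths in ems after applying pair kerning."""
    advances = []
    for left, right in zip(text, text[1:] + " "):
        kern = KERN_PAIRS.get((left, right), 0) / 1000.0
        advances.append(advance_em + kern)
    return advances

print(sum(kerned_advances("WAVE")))   # less than 2.0 em: the kern pairs pull the word together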

Research into readability on the screen needs to be viewed against the background of rapid development, in both software and hardware.

For example, almost all of the studies carried out so far have used “traditional” computer displays employing Cathode Ray Tubes (CRTs).

CRT characteristics are fundamentally different from paper. They have inherent flicker which is perceptible at low refresh rates, and the solution so far has been to increase refresh rates to a level at which flicker becomes “imperceptible”. Whether this has truly solved the problem, or whether it remains as a factor in screen reading fatigue that continues to act at a more subtle level, remains to be seen.

CRTs are capable of producing very bright displays that if not carefully controlled can add an unacceptable level of glare.

Over the past few years, the growth in market share of portable computers – laptops, notebooks, handheld devices and so on – has led to rapid evolution of flat-panel display technology.

Early displays were monochrome, and of relatively low resolution. But Liquid Crystal Display (LCD) screens, as well as more esoteric technologies such as plasma displays, have rapidly improved.

Early LCD displays suffered from poor contrast, and in portable machines, where power requirements have to be kept low to prolong working time on batteries, this remains an issue. However, battery technology continues to evolve, and new on-glass circuitry and new screen materials such as polysilicon – now beginning to ship in devices with screen resolutions of 200dpi or more – are greatly improving screen aperture, raising screen brightness and lowering backlighting power requirements.

None of the screen readability research – apart from our own Microsoft work – takes into account the major leap forward in displaying text on LCD screens provided by the ClearType technology.

8.2 Optimization of Reading of Continuous Text

Paul Muter at the University of Toronto, in his paper Interface Design and Optimization of Reading of Continuous Text (© 1996, Ablex Publishing Corp), suggests that we do not yet know how to optimize reading via electronic equipment, but goes on to suggest that many of the factors which affect readability in print – justification, margins, typeface, character size and the like – also apply on screen.

Muter’s conclusions on right justification reflect a well-understood typographic issue which has been addressed earlier in this paper, i.e. poor justification is worse than no justification at all. Simplistic justification, which merely introduces additional spaces between words, disrupts the reading gait by breaking the pattern of more-or-less fixed space between words. Fully-justified text – which requires hyphenation to ensure wordspacing is constant – is standard practice in publishing. It is only in recent years that this has been implemented in “standard” applications such as word-processing; previously it was implemented only in “professional” desktop publishing applications.

The key term in his conclusions on margins, serifs or typeface in general is “within reasonable limits”. This is shorthand for “within limits already established in text production”, i.e. by hundreds of years of publishing usage.

Many of the key factors identified by Muter and others in the differences between print and screen – factors that could account for the slower reading observed on the computer screens of the 1980s – are addressed by Microsoft's ClearType technology.

The benefits of ClearType are not only in character shape, edge sharpness, resolution, and stroke width. Because ClearType makes it possible to produce excellent character shapes at “small” sizes – the sizes people normally read in print – the technology also solves the problems of character size and thus number of characters per line, lines per page, words per page and inter-line spacing.

Other factors mentioned by Muter, such as the effect of margins, method for text advancement, and so on, are addressed by the OSPREY reading engine.

Again, Muter reaches the same conclusion as regards screen readability that Tinker reached for print: “It is quite clear that no single variable accounts for the obtained differences in performance between CRTs and paper. Several of the … variables, including resolution, interline spacing, polarity and edge sharpness contribute to the effect”.

“With a more modern system, including a large, higher-resolution screen with dark characters on a light background, reading from a computer can be as efficient as reading from a book.” (Muter & Maurutto, 1991).

Here Muter contradicts himself slightly: if line length is important (which he agrees it is) then specifying a “large, high-resolution screen” is a mistake. The higher resolution is the benefit, NOT the size. Of course, at the time of writing, and since Muter was comparing print with CRTs, high-resolution screens were only available in large sizes. This probably explains the slip.

Muter’s work also explored the issues of color and flicker, which relate to phosphor patterns on CRT screens and refresh rates, neither of which is as relevant on LCD displays.

Muter also re-iterates previous findings that paging is superior to scrolling in terms of both performance and user preference.

Muter does quote one finding (Nas, 1988) which suggests reading is slower if words are hyphenated at the ends of lines. It is likely that such a disadvantage, if it exists, is outweighed by the cueing benefits of justification AND constant word-spacing. These two requirements are mutually-exclusive, unless words are hyphenated.

Muter also examines various dynamic text presentation systems such as Rapid Serial Visual Presentation, in which single words are flashed on the screen in rapid succession.

His conclusion, however, is that despite the large number of published experiments on reading text from computers, no system has been found which is more efficient, in terms of both speed and comprehension, than the book.

Muter suggests that a likely reason for this is that the “bottleneck” is in the processing in the human brain, and that the technology of the book is optimal, having evolved over several centuries.

8.3 Paper versus screen

Reading from paper versus screens: a critical review of the empirical literature (Dillon, 1992).

This is an excellent review of a great deal of the work done up to 1992, which was dominated by work on overcoming speed deficits resulting from poor image quality. Dillon highlights the fact that emerging literature revealed a more complex set of variables at work. His review considered the differences between paper and screen in terms of outcome and processes of reading and concluded that single-variable explanations failed to capture the range of issues involved. He pointed to existing research, which was dominated by problems found when reading from VDUs (generally, green or white text on a black background). Testing methodologies, experiment design, and subject selection were frequently flawed.

By far the most common finding, he said, was that reading from the screen is between 20 and 30 percent slower than from paper.

Some of the test methodologies used by researchers are almost unbelievable, and completely unrelated to the normal reading experience. For example, in one study by Muter et al in 1982, subjects were asked to read white characters 1cm high, displayed on a blue screen, at a reading distance of 5 meters – in a “well-illuminated room”.

In this study, it also took nine seconds to repaint each screenful of information.

Other studies carried out at that time used similarly skewed test methodologies, for example, comparing printed text with characters 4mm high with green text 3mm high on a black screen background. Given Tinker and Paterson’s work on legibility in print, it was hardly surprising that researchers found screen reading slower and less acceptable.

Most of these studies were carried out in the 1980s using older displays (then referred to as Visual Display Units – presumably for the Visual Display Operatives who would run them). (Heppner, Anderson & Farstrup, 1985)

Later studies, using computers with GUIs and thus text which more closely approached print parameters, showed there was in fact little or no difference between screen and print, provided that attention was paid to such factors as screen resolution, refresh rates, anti-aliasing and text polarity.

Various researchers found that paging text rather than scrolling was much more acceptable. (Schwarz, Beldie & Pastoor, 1983)

In a study carried out at Hewlett-Packard Labs in California (SID 95 Digest, 1995), E.R. Anthony and J.E. Farrell used a 1200 dpi, 24-bit color screen to simulate printed output, and found that users found no difference between the two, suggesting that screen resolution was, in fact, the major issue, and that when there was sufficient resolution then existing parameters for achieving legibility in print can be applied to the screen.

Reading performance using screen and paper was directly compared in another study (Osborne and Holton, 1988), which examined the argument that reading from the screen was slower. These researchers paid closer attention to experimental detail, comparing light characters on a dark background and dark characters on a light background on both paper and screen.

They found no significant difference, although readers expressed a clear preference for the “normal” presentation – dark characters on a light background – for both screen and paper.



Another study (Gould, Alfaro, Finn, Haupt and Minuto, 1987) reached the same conclusion, finding that a combination of dark characters on a light background, removing jaggedness from screen fonts (in their case using anti-aliasing) and using a high-resolution monitor (in their case, 1000×800) leveled the playing-field between screen and paper in terms of reading speed.

Some researchers in the past have suggested that one way of improving screen readability is simply to give users a larger screen. (de Bruijn, de Mul & van Oostendorp, 1992)

This study seems fatally flawed. Virtually all researchers conclude that a myriad of factors, including resolution, refresh rate and character size are involved in producing readable text on screen. Yet this study uses standard and large-size screens with differences in all three of these variables, but bases its conclusions only on the different screen sizes. The researchers argue that these factors can only have had a minimal effect (contradicting the vast body of research to the contrary) and dismiss this by saying further research is required to exclude these “possibly confounding effects”.

Other studies (Richardson, Dillon and McKnight, 1989) (Duchnicky & Kolers, 1983) indicate that screen size is not a major factor in reading performance – although readers expressed preference for larger screens.

Other researchers attempt to bypass the screen issue altogether by developing new technologies such as “electronic paper”, and even inks based on bacteriorhodopsin (a light-sensitive protein produced by mutant bacteria) which change color in response to electrical charge.

While these technologies may prove a future substitute for paper, so far none of the R&D efforts has succeeded in shipping a usable product. Some researchers have suggested that electronic paper that responds to a change in charge could be used to build “electronic books” by binding several hundred sheets of this material together.

This begs the question: If a single sheet can change into any page, why not use just one – and isn’t that (assuming it refreshes fast enough) just a screen by any other name? The “page” still requires electronics to drive it.

It is often dangerous to dismiss nascent technologies. But the failure so far of any of these groups to ship a practical implementation suggests that it is safe to do so at least until they are proved to work.

(Kolers, Duchnicky and Ferguson, 1981) found readers preferred more characters per line rather than larger type sizes, and that static pages were processed more efficiently than pages scrolled at the reader’s preferred rate. Scrolling faster than the preferred rate gave readers better reading efficiency, but created problems of user acceptance.

In one of the few tests that used extended reading times, subjects read printed books and continuous text on screen for two hours. There was no significant difference in reading performance as measured by comprehension scores, nor were there differences evident in eyestrain, dizziness or fatigue.

However, reading from the screen was found to be 28.5 percent slower.

This test found no difference between proportional and non-proportional word spacing – hardly surprising, as the examples of both were displayed on videotext monitors that looked uniformly awful. Viewing the sample screens in the paper is enough to explain the slower reading performance; it is surprising any of the screen subjects actually made it to the end of the two-hour test.

(Jorna & Snyder, 1991) found that if image quality (i.e. resolution) of print and screen were equal, they would yield equivalent reading speeds.

(Gould, Alfaro, Barnes, Finn, Grischkowsky and Minuto, 1987) tried to explain causative factors in their earlier findings on screen reading being slower than reading print. They tried to isolate single variables to explain the difference, and concluded that the difference was due to a combination of variables such as display orientation, character size, font or polarity, probably centering on the image quality of the characters themselves.

(Trollip and Sales, 1986) compared unjustified text with “fill-justified” text, i.e. text justified by inserting extra spaces between words. Subjects were asked to read printed samples. They found fill-justified text was read more slowly. This evidence supports assertions by typographers that irregular word-spacing from line to line interrupts the reading flow. If text is justified, it must be combined with hyphenation in order to keep word-spacing constant.

(Gould and Grischkowsky, 1986) examined the effect of the visual angle of a line of characters on reading speed. The experiment found that proofreading performance and accuracy were reduced at extreme visual angles; however, text displayed on most normal computer screens did not produce such extreme angles, and thus visual angle had no effect.

The study used two different typefaces, the 3277 CRT character set (characteristic of VDU displays at that time), and Letter Gothic. The CRT characters were green on a black background, Letter Gothic was black on white. It was found proofreading performance was significantly poorer and slower with the CRT characters.

The experiment varied the size of characters as visual angle (line length) was varied, and the researchers concluded that line length and character size were inter-related variables which contributed to readability in an interactive way.

(Waern and Rollenhagen, 1983) analyzed the task of reading text from the screen. As with many of these “early” studies (ten years is a very long time in the computer world), much of the data is irrelevant, since “traditional” VDU displays have been superseded by Graphical User Interfaces which display text more closely resembling paper (though still a long way off).

They listed the following parameters as affecting human vision, citing earlier research (Stewart, 1979):

The researchers appear to have ignored other factors – line length, page size, justification etc., all of which were investigated and found to be important variables by the earlier work of researchers like Tinker and Paterson in their classic studies of readability and legibility of print.

8.4 Innovative approaches

Innovative approaches to reading from the screen have been tried. Scrolling at a fixed pace, at a user-preferred pace, and at faster or slower than the user-preferred pace have all been tested. In all cases, inexperienced users preferred paging to scrolling.

Another innovative approach is referred to as Rapid Serial Visual Presentation; generally this means flashing single words on the screen at extremely high speeds. In some implementations, speed is gradually accelerated.

Grandiose claims have been made for this technology, in all cases by companies attempting to sell RSVP tools for authoring and reading. None have so far met with commercial success. Claims of reading speeds of 750 words per minute and higher have been made; one company has even claimed that RSVP induces a “trance-like state” in which readers actually see pictures associated with the text.

The “trance-like state” is clearly a nod to the work of Victor Nell.

Researchers have tried to seriously investigate some of these claims.

(Konrad, Kramer, Watson and Weber, 1996) investigated the use of RSVP for dynamic information. They found that RSVP had better performance for clause-level units only if the author had already carried out “chunking” into units of meaning. In other words, RSVP efficiency depended on semantic analysis of content and editing into units of meaning – which would rule it out as a general-purpose panacea to improve readability of text on the computer screen. RSVP worked better on clause-level presentation than word-level presentation. The work also suggests that when objects (words) are viewed in rapid sequence, attention may be diverted or overloaded.
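For illustration only, the sketch below (in Python, with invented names) shows the core of an RSVP display loop, contrasting word-level presentation with presentation of clause-level units that have been “chunked” by hand – which, as the study above found, is what RSVP needs in order to perform well. It is not based on any of the commercial RSVP products mentioned earlier.

    import time

    def rsvp(units, wpm=400):
        """Flash text units one at a time at a fixed rate (words per minute).
        Each unit is held on 'screen' (here, the console) for a duration
        proportional to the number of words it contains."""
        seconds_per_word = 60.0 / wpm
        for unit in units:
            print(f"\r{unit:<40}", end="", flush=True)
            time.sleep(len(unit.split()) * seconds_per_word)
        print()

    sentence = "The quick brown fox jumps over the lazy dog near the river bank"

    # Word-level RSVP: every word is a unit, so the reader must buffer
    # words until a unit of meaning is complete.
    rsvp(sentence.split(), wpm=400)

    # Clause-level RSVP: the author has already "chunked" the text into
    # units of meaning (done by hand here), which is what Konrad et al.
    # found was required for RSVP to perform well.
    chunks = ["The quick brown fox", "jumps over the lazy dog", "near the river bank"]
    rsvp(chunks, wpm=400)

Note that the chunking in the second call cannot be produced automatically by the display loop itself; it must come from semantic analysis or editing of the content, which is precisely the limitation the study identified.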

Although this is purely anecdotal, I personally tried all of the implementations of RSVP I could find on the Web, and came to similar conclusions.

Because words differ in length, the “recognition field” varies continuously. This made it tiring to continually refocus on each word.

As presentation speed increased, I had the sensation of gradually being left behind. As an extremely fast reader it was disconcerting to have to stop the display to try to catch up.

Since words were flashed up one at a time, my brain appeared to be working hard at “chunking” units of meaning – having to hold words in some temporary buffer until the full “chunk” had been built, at which time it could be released (unfortunately, this extra cognitive load normally meant I was running to try to catch up with the text, which had continued to change).

This personal experience reinforced the findings of the paper cited above.

For readers who wish to explore the claims for RSVP in more detail, here are some of the references:


The Art Of Legibility (Tenax Software, 1997). A sales job for the company’s MARS and Vortex RSVP software. Plenty of references to other RSVP work.

Colorado Entrepreneur sells software that improves literacy (Knight-Ridder Business News, 26/12/97)

AceReader helps you read web pages, email and documents faster! (Stepware, Inc.)

Super speed-reading (Forbes, 08/11/97)


A number of less-exotic technologies have been introduced to try to improve screen readability. The most common of these is the use of anti-aliasing (aka grayscaling when applied to black-and-white text and background).


(O’Regan, Bismuth, Hersch and Pappas) published a paper on “perceptually-tuned” grayscale fonts, which claimed that their grayscaling technique improves legibility of type at small (8 and 10 point) sizes.

Professor Hersch, of the Ecole Polytechnique Fédérale de Lausanne, is a world expert in digital typography and the author of numerous books and papers on the subject.

It is clear that the grayscaling techniques employed do improve legibility at these vital sizes (vital, because these are the sizes at which most people prefer – perhaps even need – to read large bodies of text).

However, it is also clear that the improvement is not sufficient to make people comfortable reading for extended periods on the screen. The problem with grayscaling is that it blurs the text in order to smooth out its jaggedness, but it does so by using the same-sized pixels which caused the problem in the first place.

Type has extremely small features that work to enhance its legibility. Trying to portray such small features with the traditional pixel is akin to being asked to paint the Mona Lisa, then being handed a paint-roller.

Grayscaling unfortunately still uses the same size paint-roller. It’s just that along with the black paint we had, we now get a few extra buckets of gray with which to smear the edges.

Especially at small sizes, text looks blurred. It is fatiguing to read for extended periods, since the eye is continually trying to focus the inherently unfocusable.
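A deliberately simplified example makes the point. The sketch below applies plain box-filter anti-aliasing (“grayscaling”) to a one-pixel-wide vertical stem; when the stem’s true outline falls between pixel boundaries, the single crisp pixel becomes two gray ones. This is the textbook coverage-averaging technique, not the perceptually-tuned method of O’Regan et al.

    def stem_coverage(left, width, n_pixels=6):
        """Return per-pixel gray coverage (0.0-1.0) for a vertical stem.
        The stem occupies the horizontal interval [left, left + width),
        measured in pixel units. Each pixel's gray level is the fraction
        of that pixel covered by the stem - box-filter anti-aliasing."""
        right = left + width
        coverage = []
        for px in range(n_pixels):
            overlap = max(0.0, min(px + 1, right) - max(px, left))
            coverage.append(round(overlap, 2))
        return coverage

    # A one-pixel-wide stem exactly aligned to the pixel grid: crisp black.
    print(stem_coverage(left=2.0, width=1.0))   # [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]

    # The same stem half a pixel off the grid: the single black pixel
    # becomes two gray ones - the blur described above.
    print(stem_coverage(left=2.5, width=1.0))   # [0.0, 0.0, 0.5, 0.5, 0.0, 0.0]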

Another approach is to try to design fonts specifically for reading on the screen, in effect adapting type to the constraints of the pixel.

At Microsoft, we have spent more time, expertise and money on this issue than anyone.

We commissioned Matthew Carter, one of the world’s leading type designers, to create two completely new typefaces for Microsoft – Verdana, a sans serif face, and Georgia, a serif face.

These two faces have shipped with every version of Microsoft Internet Explorer and Microsoft Windows since 1997. They have thus been in constant use by millions of people daily. They have been hailed as great standards for the Web. They have been made available for free download from Microsoft’s website.

And they were still not good enough – until ClearType freed them from the constraints of the pixel.

A research study Microsoft commissioned from Carnegie Mellon University (Boyarski, Neuwirth, Forlizzi, and Regli, 1997), found that Georgia was indeed more readable than Times New Roman, a Windows core font designed for print, but which had been tuned for screen readability to a huge degree. The study found Verdana was even more readable than Georgia.

Anti-aliased versions of fonts were easier to read and more legible than non-anti-aliased versions.

The study also compared Microsoft’s anti-aliasing algorithms with those of Adobe Systems, and found that readers preferred the Adobe anti-aliasing (although, thankfully, only to a very small degree!).

Microsoft also commissioned other research at the University of Reading (pronounced redding) in the UK, which has a large and internationally renowned Typography and Graphics department. This study examined the effect of line length on readability. (Dyson and Kipping, 1996)

Accepted wisdom among graphics designers and typographers is that lines of 55-65 characters (at type sizes from 9-12 point) are most readable.

Dyson and Kipping found that longer lines of 100 characters were actually read more quickly, but that readers were less comfortable with these longer lines, preferring the more normal line lengths.

They also found that scrolling was slower than paging.

With typical academic reluctance to jump to conclusions, these researchers suggested that the difference between readers’ perceptions and actual performance means it is difficult to make practical recommendations on optimal line length.

Microsoft commissioned a follow-up study to investigate this in more detail (Dyson and Haselgrove, 1998), which had similar results.

At Microsoft – where we have spent a great deal of effort over many years trying to make people feel more comfortable using computers – we take a more pragmatic view. “Perception is everything.” Especially when applied to long-duration reading tasks such as books, comfort is far more important than performance – since we read books at our own pace in any event, and pacing is driven by content.


9. The Readable Electronic Book

It is now possible to write a high-level functional specification for the electronic book and its user interface. The draft below is not exhaustive and will need considerable refinement and attention to detail during the detailed specification and development process. Most of the findings apply to any document we might wish to read on the screen.

In reality, there may be more than one type of electronic book: there are at least two “sweet spots” for reading. The first is a device that is smaller, more portable, and equates more or less to the printed paperback, except that it will contain a number of books, be able to download new books, and may also have other features such as annotation. It could be a monochrome device.

The second level of device will have color and support for sound, and will take advantage of these and other capabilities to take electronic books beyond the books of today. To succeed, this will have to be done in a careful, planned manner, to ensure that this additional functionality does not destroy readability by degrading the OSPREY effect.

Both levels of device will have much in common.

They will take as input a defined data structure designed to allow automatic formatting of the content.
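The report does not define that data structure, so the following is only an illustrative sketch (in Python, with hypothetical names) of the kind of structure meant: content tagged by role rather than by appearance, with no visual formatting embedded, so that the reading engine is free to lay it out according to OSPREY rules.

    from dataclasses import dataclass, field

    @dataclass
    class Block:
        """One structural unit of content, tagged by role; the reading
        engine decides how each role is displayed."""
        role: str          # e.g. "chapter-title", "heading", "paragraph"
        text: str

    @dataclass
    class Chapter:
        title: str
        blocks: list = field(default_factory=list)

    @dataclass
    class Book:
        title: str
        author: str
        language: str      # drives the choice of hyphenation dictionary
        chapters: list = field(default_factory=list)

    book = Book(
        title="Treasure Island",
        author="Robert Louis Stevenson",
        language="en",
        chapters=[
            Chapter(
                title="Part One: The Old Buccaneer",
                blocks=[Block("paragraph", "Squire Trelawney, Dr. Livesey, and the rest...")],
            )
        ],
    )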

They will use the screen as a “page” in portrait mode.

They will utilize a reading engine to layout this text according to OSPREY principles.

They may use a single typeface, or a number of typefaces, but these will be carefully chosen (and in some cases, adjusted by hinting and other techniques) for optimum readability on the screen.

New RGB striping font technology will be used to improve the readability of the type. If the lower-level device is to be monochrome, this should be manufactured using a basic color screen but with the color filter left out during the manufacturing process. This will allow high-resolution grayscale to be implemented using the same technique as for RGB color.

Text will be divided into “pages” which will have adequate margins. Pages will be numbered as the text is being formatted. Pages will always have the same number on the same device, and will always be laid out identically.

The user interface will have two modes: browsing and reading.

In browsing mode, the reader will be provided with all the software tools necessary to find a book, purchase or borrow a book, and load a book.

In reading mode, these tools will disappear and the user interface will be very similar to a printed book, permitting only page turns, forward and back, etc. It will provide a means of “backing out” to browsing mode. This mode change will happen automatically and completely transparently as far as the reader is concerned. The decision to open or re-open a book will automatically trigger reading mode.

The OSPREY engine will have a set of harmonic tunings for different type sizes. For example, when the reader wants to read 11-point type, the text will be formatted with the correct line length, leading, etc. If the reader wants to read, say, 16 point, these settings will all change together to a new “set”, harmonically balanced according to OSPREY rules. Changes will be driven by the reader’s typesize preference alone.
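The following Python fragment is a minimal sketch of the mechanism just described: one self-consistent set of layout parameters per type size, selected by the reader’s size preference alone. The numeric values are placeholders invented for illustration, not OSPREY’s actual tunings.

    # Hypothetical harmonic tuning sets: every layout parameter changes
    # together when the reader picks a new type size. Values are
    # illustrative placeholders only (point size -> layout parameters).
    HARMONIC_SETS = {
        9:  {"leading_pt": 12, "line_chars": 60, "wordspace_em": 0.25},
        11: {"leading_pt": 15, "line_chars": 62, "wordspace_em": 0.25},
        13: {"leading_pt": 18, "line_chars": 64, "wordspace_em": 0.26},
        16: {"leading_pt": 22, "line_chars": 66, "wordspace_em": 0.27},
        20: {"leading_pt": 27, "line_chars": 68, "wordspace_em": 0.28},
    }

    def tuning_for(preferred_size):
        """Return the harmonically-balanced set for the reader's preferred
        type size; only sizes with a defined set are offered."""
        if preferred_size not in HARMONIC_SETS:
            raise ValueError(f"No tuning set defined for {preferred_size}pt")
        return HARMONIC_SETS[preferred_size]

    print(tuning_for(11))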

The OSPREY engine will use screen metrics to set the text.

Harmonic tunings will include word- and letter-spacing settings.

Text will be set fully-justified.

To keep spacing constant, text will be hyphenated.

Hyphenation will be done using a dictionary specific to the language of the text. Soft hyphens may be embedded in the text to avoid requiring the device to carry multiple dictionaries.
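To make the interaction between justification, word spacing and embedded soft hyphens concrete, here is a minimal greedy line-breaker in Python. It measures lines by character count rather than real glyph metrics, so it is a sketch of the principle only, not of the OSPREY engine; all names are invented.

    SOFT_HYPHEN = "\u00ad"

    def break_lines(text, measure, hyphenate=True):
        """Greedy line-breaker. Words may contain soft hyphens marking the
        points at which they may break; character count stands in for
        real glyph metrics in this sketch."""
        lines, current = [], ""
        for word in text.split():
            parts = word.split(SOFT_HYPHEN) if hyphenate else [word.replace(SOFT_HYPHEN, "")]
            whole = "".join(parts)
            candidate = (current + " " + whole).strip()
            if len(candidate) <= measure:
                current = candidate
                continue
            # The word does not fit: take as many leading parts as will fit,
            # ending the line with a visible hyphen.
            taken = ""
            for i in range(len(parts) - 1, 0, -1):
                head = "".join(parts[:i]) + "-"
                trial = (current + " " + head).strip()
                if len(trial) <= measure:
                    current, taken = trial, "".join(parts[i:])
                    break
            lines.append(current)
            current = taken if taken else whole
        if current:
            lines.append(current)
        return lines

    text = ("Serial pattern recog\u00adnition is automatic and uncon\u00adscious, "
            "and hyphen\u00adation keeps word spacing con\u00adstant.")

    for line in break_lines(text, measure=22):
        # The gap between the line and the measure is what justification
        # must absorb by stretching word spaces; hyphenation keeps it small
        # (the final line of a paragraph is set ragged).
        print(f"{line:<22}|  slack = {22 - len(line)}")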

The OSPREY engine will be capable of handling all typographic features handled by the print publishing engines of today – although the reader will never be aware of the complexity of this process since it will occur automatically.

These features will include: pair kerning (essential for good letter spacing), ligatures, super- and sub-script, small capitals, non-aligning numerals, etc. It may be necessary to implement this by integrating the OpenType Services library.

The engine will be aware of text at the paragraph, page and chapter level and will be capable of widow and orphan control (by adding an additional line to the previous page).
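A minimal sketch of that single rule (in Python, with invented names): if the last line of a paragraph would otherwise fall alone at the top of the next page, the current page accepts one extra line so the paragraph ends there instead. Real pagination involves many more rules than this.

    def paginate(paragraph_lines, lines_per_page):
        """Split paragraphs (given as lists of already-broken lines) into
        pages, avoiding widows: if only the final line of a paragraph
        would spill onto the next page, the current page takes one extra
        line instead."""
        pages, page = [], []
        for para in paragraph_lines:
            for i, line in enumerate(para):
                is_last_line = (i == len(para) - 1)
                page_full = len(page) >= lines_per_page
                # Allow one line beyond the nominal page depth when that
                # line finishes its paragraph.
                if page_full and not (is_last_line and len(page) == lines_per_page):
                    pages.append(page)
                    page = []
                page.append(line)
        if page:
            pages.append(page)
        return pages

    paras = [
        ["para 1, line 1", "para 1, line 2", "para 1, line 3"],
        ["para 2, line 1", "para 2, line 2"],
    ]
    for n, p in enumerate(paginate(paras, lines_per_page=4), start=1):
        print(f"page {n}: {p}")   # page 1 takes five lines; no widow on page 2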

The two-page spread may be one aspect of the printed book that really is merely an artifact of the means of production, i.e. if we have a sheet of paper, it makes sense to print on both sides. But anyone who has tried reading in bed will quickly realize its disadvantages. Most books do not use the spread; perhaps it will be retained only on larger displays, more likely it will go the way of hot-metal typesetting.

9.1 Alternatives to OSPREY

There are many possible alternatives to OSPREY. A number are listed below. The reader will see that all have either been specifically dealt with earlier in this document, or are ruled out on general principles.


10. OSPREY requirements for print

Font

Inter-character spacing

Inter-word spacing

Line length

Fully-justified lines

Leading (interlinear spacing)

Text area (no. of lines per page)

Page size and layout

Navigation


11. OSPREY requirements for the screen

Character shapes are the worst

The single biggest obstacle to screen readability today is at the micro level: typefaces and the way in which they are displayed.

There are a number of contributing factors:

Most typefaces were NOT designed for reading on the screen, nor were the technologies for displaying them. Most typefaces came from the traditional printing world. When typesetting was translated to computers, the main concern was WYSIWYG – representing on-screen exactly what would come out on paper from laser printers or imagesetters.

With a limited number of pixels available, screen fonts were designed to be mere representations of the true printer fonts. Also, when laying out text on screen, typesetting (and word processing) programs used the font’s printer metrics in order to accurately show line-breaks, page-breaks and so on.

To fit screen fonts into printer metrics – and with a very limited number of pixels available – individual character shapes had to be distorted to fit. The result was typefaces whose screen versions had irregular shapes and lost all of the subtlety and defining features of the original print faces.

The worst problem was, and remains today, the resolution of the screen. Even the first primitive laser printers had a resolution of 300dpi (dots per inch). Today 600dpi is the norm, and 1200dpi is common. In contrast, the resolution of mainstream computer monitors (leaving aside expensive high-resolution devices) today ranges from 72dpi to about 120dpi. In practice, the true resolutions go up only to around 106-110dpi.

With so few pixels available to create representations of characters, especially at the small 10 and 12 point sizes typically used for reading large amounts of text, no improvement is possible.

Typeface design does not solve the problem

Some typefaces have been designed for reading on the screen (Microsoft has been a leader in this effort). But all run up against the same basic problem of resolution. Designing typefaces specifically for the screen is a pragmatic approach: we have only so many pixels available, therefore we design characters to give the best and most readable shapes within those constraints. We make the best of what’s available.

However, even this approach runs up against some fundamental problems. Research shows, for instance, that serif typefaces are best for sustained reading. It also reveals that a serif should typically be only around 18% as heavy as the character’s main stem. At small sizes, single-pixel-wide stems are the thinnest that can be achieved using current technology. The next jump up is to two-pixel-wide stems.

The same single-pixel limit, however, also applies to serifs. Therefore a typical serif face in a small size has serifs of equal weight to the character stems, giving a “slabby” look that destroys much of the benefit of the serifs.

At the same time, this coarseness of resolution causes another problem. If a Bold version of a typeface is required, it must of course have heavier stems than the regular or roman weight. But the only means today of increasing this weight is to go from one-pixel to two-pixel stems, which results in bold versions of typefaces that are far too bold. In print, typically, a boldface might be 30% heavier than the roman. On the screen, it’s 100% heavier.

This might seem like only a minor problem, except that the one-pixel stem weight of a regular typeface produces characters that are too spidery and light to be read comfortably on the screen, even when character shapes are specifically designed for screen display. And two-pixel type is too heavy at small sizes.

This problem screams out for increased screen resolution. With screens of, say, 300dpi, there would be roughly three times as many pixels in each dimension for each character, each pixel being much smaller. So a regular weight of a typeface might be four or five (300dpi) pixels wide, and a bold version might be seven or eight pixels wide. Serifs could be one or two pixels thick, and so on, and it would be possible to display most of the subtlety of the original faces on the screen.
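The arithmetic behind these figures is easy to make explicit. The sketch below converts point size and device resolution into pixel widths for stems and serifs; the em-fractions used are rough assumptions for a typical text face, not measurements of any particular font.

    def feature_pixels(point_size, dpi, em_fraction):
        """Ideal width, in device pixels, of a feature occupying a given
        fraction of the em at the given point size and resolution."""
        em_px = point_size / 72.0 * dpi      # 1 point = 1/72 inch
        return em_fraction * em_px

    # Rough assumptions for a typical text face (not measured values):
    FEATURES = {
        "regular stem": 0.10,          # ~10% of the em
        "bold stem":    0.16,          # heavier, but well short of 2x
        "serif":        0.18 * 0.10,   # ~18% as heavy as the regular stem
    }

    for dpi in (96, 300):
        print(f"--- {dpi} dpi, 11 point ---")
        for name, frac in FEATURES.items():
            ideal = feature_pixels(11, dpi, frac)
            drawn = max(1, round(ideal))   # whole-pixel rendering can do no better
            print(f"{name:>12}: ideal {ideal:4.2f}px  ->  drawn {drawn}px")

Under these assumptions, at 96dpi the regular stem, the serif and (almost) the bold stem all collapse onto one or two whole pixels – the “slabby” serifs and the too-bold bold described above – while at 300dpi the proportions of the original design survive.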

Resolution of this standard is achievable today only in specialized and expensive devices, either high-resolution CRT monitors or high-resolution LCD devices. Both pose production problems that make it unlikely they will become mainstream within a three-to-four-year timeframe.

ClearType RGB striping font technology

Faced with the failure of the pragmatic approach, and unwilling to wait until mainstream screen resolution improvements solved the problems for us, we decided to look for an innovative solution using today’s technologies.

LCD screens are becoming an increasingly mainstream display technology. Introduced originally to provide low-power displays for laptop computers, they have been steadily evolving over the past few years. Resolutions have increased to a nominal 96dpi (in reality, anywhere from 75 to 106, depending on the individual device).

Alongside resolution improvements has come support for color, and new backlit LCD devices seem set to replace traditional CRTs once economies of scale and competition pull prices down. This is in fact already occurring: in August, Compaq Computer announced a 20% price cut, taking the cost of a pedestal-mounted desktop LCD below $1000 for the first time. The first fully digitally-addressed desktop LCD, which can also be rotated into either portrait or landscape mode, was launched by Toshiba this summer at a price of $1500, which includes its own graphics card. We can expect the capabilities of such devices to increase almost as rapidly as their prices fall. As desktop devices, they can use more powerful (and power-hungry) backlighting, and are now credible (and desirable) replacements for CRTs. They are more readable, since they have no flicker, can be easily tilted to the desirable reading angle, and take up far less space on the desktop, making them also much less intimidating devices on which to read.

During the preparation of this fairly lengthy document, I found myself for the very first time proofreading and correcting entirely on the screen (a 110dpi SGI flat-panel display running ClearType). Like everyone else, I would normally not consider using the screen for a task of this length.

LCD devices are especially relevant to the concept of the “electronic book” (eBook), since these devices must be portable, light, and consume little power. But they are also likely to play a major role in increasing acceptance for reading all kinds of information on the computer screen instead of in print.

The rise of LCD devices presents new sets of both problems and opportunities for screen readability. ClearType was developed to take advantage of hitherto-unused capabilities of LCD screens, and delivers on-screen type of a clarity and subtlety that has never been seen on mainstream devices.

LCD problems

Existing font rendering technologies were developed for CRT devices, and in the past these have been applied to LCD screens simply by treating an LCD pixel as identical to a CRT pixel. However, the two are very different. A CRT “pixel” is a dot generated by electrons impinging on a phosphor screen. By virtue of the process, there is some “bleed” from one pixel to its adjacent pixel. This is a phenomenon equivalent to “dot gain” in printing, in which ink spreads as it is absorbed into the paper. This is a well-known effect for which printers have to compensate in advance. The same “pixel bleed” on a CRT has the effect of smoothing out some of the jagged edges of pixels.

Other font display technologies such as anti-aliasing deliberately introduce additional pixels, in levels of gray or color, around the rasterized screen type in order to smooth out curves and diagonals, taking advantage of the human eye’s tendency to merge the dots and fill in the blanks.

LCD pixels are very different, for two reasons. The first causes a further problem; the second represents a major opportunity which Microsoft is now beginning to exploit.

LCD pixels are much sharper than CRT pixels, by virtue of the physical construction of LCD screens. While on a CRT there is no real “pixel grid” and software has to construct a virtual grid to aid pixel manipulation, on the LCD the grid is very real and consists of hard lines which define each pixel boundary. Because of these hard lines, there is no pixel bleed on LCDs; the jagged effects of pixelation (aliasing) are thus much worse. In addition, because of lack of pixel bleed and color contrast levels, existing anti-aliasing technologies do not work nearly as well on LCD screens.

However, hidden in this pixel world is a physical characteristic of color LCD screens. LCDs are made up of pixels that consist of Red, Green and Blue (RGB) sub-pixels. Normally, these run in stripes down the length of the screen. In reality, this means that the screen has a theoretical horizontal resolution three times that normally addressed.

One can address these sub-pixels directly. Unfortunately, you end up with text that looks as if it was designed by the late Dr. Timothy Leary. It’s great to read if you’re on acid, but way too colorful for the rest of us.

The “trick” in the ClearType technology is a way of addressing this resolution without color artifacts. It’s a complex process; we had a lot of assistance from Microsoft Research display experts and mathematicians. Details are described in the relevant US and international patent claims we have lodged for the technology.
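The patented ClearType filtering cannot be reproduced here, but the general sub-pixel idea can be illustrated. The sketch below samples a stem at three times the horizontal pixel resolution, maps each sample triplet onto one pixel’s R, G and B sub-pixels, and then applies a simple neighbor-averaging filter to damp the color fringing of the naive approach. The filter is an illustrative stand-in only – it is not Microsoft’s algorithm – and all names are invented.

    def stem_samples(left, width, n_pixels, oversample=3):
        """Coverage of a vertical stem sampled at 'oversample' times the
        pixel grid - one sample per RGB sub-pixel when oversample == 3."""
        right = left + width
        samples = []
        for s in range(n_pixels * oversample):
            s_left, s_right = s / oversample, (s + 1) / oversample
            overlap = max(0.0, min(s_right, right) - max(s_left, left))
            samples.append(overlap * oversample)   # 0.0 (white) .. 1.0 (black)
        return samples

    def naive_subpixel(samples):
        """Map each consecutive (R, G, B) sample triplet straight onto one
        pixel's sub-pixels: triple resolution, but strong color fringing."""
        return [tuple(round(v, 2) for v in samples[i:i + 3])
                for i in range(0, len(samples), 3)]

    def filtered_subpixel(samples):
        """Same mapping, but each sub-pixel is first averaged with its
        immediate neighbors - a crude fringe-reduction stand-in, NOT ClearType."""
        padded = [0.0] + samples + [0.0]
        smoothed = [(padded[i - 1] + padded[i] + padded[i + 1]) / 3
                    for i in range(1, len(padded) - 1)]
        return [tuple(round(v, 2) for v in smoothed[i:i + 3])
                for i in range(0, len(smoothed), 3)]

    # A stem one-third of a pixel wide whose edge falls between whole pixels:
    cov = stem_samples(left=1.667, width=0.333, n_pixels=4)
    print(naive_subpixel(cov))      # one sub-pixel fully on: maximum color fringe
    print(filtered_subpixel(cov))   # coverage spread over neighbors: less fringe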

It is enough to say it works more spectacularly than we could have dreamed; the reaction has been uniform amazement at the new quality bar it sets for type on the screen.

Font

Five typesizes only:

Inter-character spacing

Inter-word spacing

Line length

Fully-justified lines

Leading (interlinear spacing)

Text area (no. of lines per page)

Page size and layout

Internal navigation

External navigation


12. Future research and development

This document is a blueprint for further research. There’s a huge amount of future research needed to test and prove the OSPREY concepts and to implement the functionality – enough to keep a whole team of researchers, including external research groups, busy for the rest of our lives. Here are some of the projects that we need to do. Some have already begun. This section will continue to grow.

ClearType

Readability


Appendix: Readers Anonymous

Hi: Welcome to the inaugural meeting of the Redmond Chapter of Readers Anonymous.


My name is Bill, and I’m a book addict. Let me tell you my story.

It began when I was three years old. We lived in a very rundown house, in a very rundown area of Glasgow, Scotland – one of the roughest cities anywhere in the world.

My father was a high-steel construction worker. He spent most of his working life several hundred feet above the ground, on major construction projects like bridges, nuclear generating stations, and so on. He’d left school at the age of 14. In Glasgow, hit worse than most places in the world by the Depression of the 1930s, there was no prospect of a job. So he joined the British Royal Navy, where he spent 17 years of his life.

One picture stays clear in my mind. My father, holding out his hands, palm-upward, saying to me, “Son, the Navy was good for me. But I only ever learned to work with these. You should learn how to work with your head. Get an education, and you’ll have opportunities I never had.”

Well, on the few occasions my parents decided to have a night out, they left me in the care of Tommy Nicholson, the 16-year-old son of neighbors.

Tommy was amazing. Together, we dismantled old clocks and radios and tried to figure out how they worked. But the real magic happened when he’d bring around his collection of school exercise books in which he’d drawn his own comics, filled with the heroes and villains he’d invented. That’s when I first got interested in reading.

Seeing my interest, my mum and dad bought me a 12-volume set of The Children’s Encyclopaedia, written and edited by Arthur Mee. The volumes were bound in red leather, and tooled with gold leaf. God knows what they represented in terms of a fraction of the family’s total disposable income in those days – a small fortune, I imagine. I never appreciated the sacrifice they made until many years after my father died, so I never did get to tell him how much I valued what they’d done.

But the investment paid off. From the age of four until I was 11 or 12, I seldom spent less than two or three hours a day, lying on the kitchen floor, reading those encyclopaedias. I was hooked. I handled those volumes with reverence and respect. Ten years later, when I eventually passed them on to another kid, you’d have thought they’d lain on a bookshelf untouched for the entire time, or been bought new the previous day. There was not a single marked or damaged page or cover.

My reading grew. By the time I was 13, I had to visit the local public library twice a week. Even with my two library tickets, my mum’s tickets, my dad’s tickets and my sister’s, the eight books I could borrow at one time was nowhere near enough. I had to stop back in midweek and borrow a new pile. I counted; at that stage I was reading an average of 17 books a week.

It’s continued ever since. Even working at Microsoft, and with the demands of a family of my own, I seldom read fewer than four books a week. I never set off on a business trip – or even a shopping trip with the family, or a visit to the dentist’s surgery – without a paperback book either tucked down beside the car seat or in my pocket.

Last year – wearing a kilt to honor the past – I stood on stage with Bill Gates at Comdex in Las Vegas, and we announced ClearType to the world. It’s been a long journey from Old Shettleston Road in Glasgow.

Books have taken me from the back streets to the Pacific Northwest, many other parts of the USA, and much of Europe. They’ve been my friends through loneliness and hard times, and my companions in good times. There are books I’ve read once, and books I’ve read 20 times or more, returning to them like old friends.

Books have altered my consciousness and changed my life in many ways. This is one addiction I’m not about to quit.


References

Andrews, R.B. (1949). Reading Power Unlimited. Texas Outlook: 20-21.
Anthony, E.J. & Farrell, J.E. (1995). CRT-Display Simulation of Printed Output. SID 95 Digest 15(2): 209-212.
Birkerts, S. (1994). The Gutenberg Elegies. New York, Fawcett Columbine.
Boyarski, D., Neuwirth, C., Forlizzi, J. & Regli, S.H. (1997). A study of fonts designed for screen display. Carnegie Mellon University.
Brady, J. (1986). A theory of productivity in the creative process. IEEE Computer Graphics and Applications (May 1986): 25-34.
Bringhurst, R. (1997). The Elements of Typographic Style. Vancouver, BC, Hartley & Marks Publishers.
Brown, E.R. (1981). A Theory of Reading. Journal of Communication Disorders 14(6): 443-466.
Csikszentmihalyi, M. (1990). Flow: The Psychology of Optimal Experience. Harper & Row, Publishers, Inc.
de Bruijn, D., de Mul, S. & van Oostendorp, H. (1992). The influence of screen size and text layout on the study of text. Behaviour and Information Technology 11(2): 71-8.
DeMarco, T. & Lister, T. (1987). Peopleware: productive projects & teams. New York, Dorset House Publishing Co.
Dillon, A. (1992). Reading from paper versus screens: a critical review of the empirical literature. Ergonomics 35(10): 1297-1326.
Dowding, G. (1966). Finer Points in the spacing and arrangement of type. Vancouver, BC, Hartley & Marks Publishers Inc.
Duchnicky, R.L. & Kolers, P.A. (1983). Readability of text scrolled on visual display terminals as a function of window size. Human Factors 25(6): 683-92.
Dyson, M.C. & Kipping, G.J. (1996). The effects of line length and method of movement on reading from screen.
Dyson, M.C. & Haselgrove, M. (1998). Effects of reading speed and line length on comprehension.
Fabrizio, R., Kaplan, I. & Teal, G. (1967). Readability as a function of the straightness of right-hand margins. Journal of Typographic Research 1(1): 90-95.
Geisler, W.S. (1984). Physical limits of acuity and hyperacuity. Journal of the Optical Society of America 1(7): 775-82.
Gould, J.D., Alfaro, L., Finn, R., Haupt, B. & Minuto, A. (1987). Reading from CRT displays can be as fast as reading from paper. Human Factors 29(5): 497-517.
Gould, J.D. & Grischkowsky, N. (1986). Does visual angle of a line of characters affect reading speed? Human Factors 28(2): 165-73.
Gould, J.D., Alfaro, L., Barnes, V., Finn, R., Grischkowsky, N. & Minuto, A. (1987). Reading is slower from CRT displays than from paper: Attempts to isolate a single-variable explanation. Human Factors 29(3): 269-99.
Haber, R.N. & Haber, L.R. (1981). Visual components of the reading process. Visible Language 15(2): 147-182.
Healy, A.F. & McNamara, D.S. (1996). Verbal learning and memory: does the modal model still work? Annual Review of Psychology (1996): 143.
Heppner, F.H., Anderson, J.G.T., Farstrup, A.E. & Weiderman, N.H. (1985). Reading performance on a standardized test is better from print than from computer display. Journal of Reading 1985(1): 321-5.
Hersch, R.D. (1993). Visual and technical aspects of type. Cambridge, England, Cambridge University Press.
High, C.R. (1997). The Art of Legibility. Olympia, WA, Tenax Software Engineering.
Huey, E.B. (1915). The Psychology and Pedagogy of Reading. New York, The MacMillan Company.
Hutheesing, N. (1997). Super speed-reading. Forbes. 160: 123.
Johnston, P.H. (1984). Assessment in Reading. Handbook of Reading Research (1984): 147-82.
Jorna, G.C. & Snyder, H.L. (1991). Image quality determines differences in reading performance and perceived image quality with CRT and hard-copy displays. Human Factors 33(4): 459-469.
Kolers, P.A., Duchnicky, R.L. & Ferguson, D.C. (1981). Eye movement measurement of readability of CRT displays. Human Factors 23(1981): 517-27.
Konrad, C.M., Kramer, A.F., Watson, S.E. & Weber, T.A. (1996). A comparison of sequential and spatial displays in a complex monitoring task. Human Factors (09/01/96): 464.
Kupper, N. Recording of visual reading activity: research into newspaper reading behaviour.
McAteer, S. (1998). Electronic Books: progress on the path to electronic distribution. Jupiter Strategic Planning Services: 1-2.
Mewhort, D.J.K. (1966). Sequential redundancy and letter spacing as determinants of tachistoscopic recognition. Canadian Journal of Psychology 20(4): 435-44.
Mills, C.B. & Weldon, L.J. (1987). Reading Text from Computer Screens. ACM Computing Surveys 19(4): 329-58.
Morison, S. (1936). Typography. London, Encyclopaedia Britannica. 18: 650-652.
Morison, S. & Browne, W.C. (1936). Printing Types. London, Encyclopaedia Britannica. 18: 508-512.
Morrison, R.E. & Inhoff, A.W. (1981). Visual factors and eye movements in reading. Visible Language 15(2): 129-147.
Muter, P. (1998). Interface Design and Optimization of Reading of Continuous Text. Psychology, University of Toronto: 1-17.
Muter, P., Latremouille, S.A., Treurniet, W.C. & Beam, P. (1982). Extended reading of continuous text on television screens. Human Factors 24(5): 501-8.
Nell, V. (1988). Lost in a book: the psychology of reading for pleasure. New Haven & London, Yale University Press.
Ogg, O. (1948). The 26 Letters. New York, The Thomas Y. Crowell Company.
O’Hara, K. & Sellen, A. (1997). A comparison of reading paper and on-line documents. CHI ’97, Atlanta, GA, ACM.
O’Regan, K., Bismuth, N., Hersch, R.D. & Pappas, A. Legibility of perceptually-tuned grayscale fonts.
Osborne, D.J. & Holton, D. (1988). Reading from screen vs. paper: there is no difference. International Journal of Man-Machine Studies 28: 1-9.
Paterson, D.G. & Tinker, M.A. (1940). How to Make Type Readable. New York, Harper & Brothers Publishers.
Paterson, D.G. & Tinker, M.A. (1931). Studies of typographical factors influencing speed of reading: VI Black type versus white type. Journal of Applied Psychology 15(2): 241-7.
Paterson, D.G. & Tinker, M.A. (1936). Studies of typographical factors influencing speed of reading: XII Printing surface. Journal of Applied Psychology 20(1936): 128-31.
Paterson, D.G. & Tinker, M.A. (1932). Studies of typographical factors influencing speed of reading: X Style of type face. Journal of Applied Psychology 16(6): 605-13.
Paterson, D.G. & Tinker, M.A. (1932). Studies of typographical factors influencing speed of reading: VIII Space between lines or leading. Journal of Applied Psychology 16(4): 388-97.
Paterson, D.G. & Tinker, M.A. (1930). Studies of typographical factors influencing speed of reading: IV Effect of practice on equivalence of test forms. Journal of Applied Psychology 14(3): 211-7.
Payne, D.E. (1967). Readability of typewritten material: proportional versus standard spacing. Journal of Typographic Research 1(2): 125-137.
Postgate, J.P.P. (1936). Textual criticism. London, Encyclopaedia Britannica. 22: 6-11.
Quan, M. (1998). Mutant bacteria, electronic ink and paper under development: offbeat technologies may hold key to displays. Electronic Engineering Times.
Rayner, K. (1981). Visual cues in word recognition and reading. Visible Language 15(2): 125-129.
Richardson, J., Dillon, A. & McKnight, C. (1989). The effect of window size on reading and manipulating electronic text. Contemporary Ergonomics: 474-479.
Riddell, J.R. & Oswald, J.C. (1936). Printing. London, Encyclopaedia Britannica. 18: 499-508.
Romero, C.L. & Schulaka, C. (1997). Colorado Entrepreneur sells software that improves literacy. KRTBN Knight-Ridder Business Tribune News. Boulder, Co.: 1.
Rubinstein, R. (1988). Digital Typography: An introduction to type and composition for computer system design. New York, Addison-Wesley Publishing Company.
Schwarz, E., Beldie, I.P. & Pastoor, S. (1983). A comparison of paging and scrolling for changing screen contents by inexperienced users. Human Factors 25(3): 279-82.
Silberman, S. (1998). Ex Libris. Wired: 5.
Smeijers, F. (1996). Counterpunch: making type in the 16th Century; designing typefaces now. London, Hyphen Press.
Spencer, H. (1968). The Visible Word. London, Royal College of Art.
StepWare, Inc. (1998). AceReader helps you read web pages, email and documents faster! Press release, StepWare, Inc.
Stewart, T.F.M. (1979). Eyestrain and visual display units: a review. Displays (April): 25-32.
Taylor, I. & Taylor, M.M. (1983). The Psychology of Reading. Toronto, Academic Press.
Tinker, M.A. (1963). Legibility of Print. Ames, Iowa State University Press.
Tinker, M.A. (1965). Bases for Effective Reading. Minneapolis, University of Minnesota Press.
Tinker, M.A. & Paterson, D.G. (1935). Studies of typographical factors influencing speed of reading: XI Role of set in typographical studies. Journal of Applied Psychology 19(1935): 647-51.
Tinker, M.A. & Paterson, D.G. (1928). Influence of type form on speed of reading. Journal of Applied Psychology 12(4): 359-68.
Tinker, M.A. & Paterson, D.G. (1931). Studies of typographical factors influencing speed of reading: V Simultaneous variation of type size and line length. Journal of Applied Psychology 15(1): 72-8.
Tinker, M.A. & Paterson, D.G. (1929). Studies of typographical factors influencing speed of reading: II Size of type. Journal of Applied Psychology 13(2): 120-130.
Tinker, M.A. & Paterson, D.G. (1931). Studies of typographical factors influencing speed of reading: VII Variations in color of print and background. Journal of Applied Psychology 15(5): 471-9.
Tinker, M.A. & Paterson, D.G. (1929). Studies of typographical factors influencing speed of reading: III Length of line. Journal of Applied Psychology 13(3): 205-219.
Tinker, M.A. (1963). Influence of simultaneous variation in size of type, width of line and leading for newspaper type. Journal of Applied Psychology 47(6): 380-2.
Tinker, M.A. & Paterson, D.G. (1931). Studies of typographical factors influencing speed of reading: IX Reductions in size of newspaper print. Journal of Applied Psychology 16(5): 525-31.
Tinker, M.A. & Paterson, D.G. (1936). Studies of typographical factors influencing speed of reading: XIII Methodological considerations. Journal of Applied Psychology 20(1936): 132-45.
Trollip, S.R. & Sales, G. (1986). Readability of computer-generated fill text. Human Factors 28(2): 159-63.
Tschichold, J. (1991). The Form of the Book: essays on the morality of good design. Vancouver, BC, Hartley & Marks Publishers.
Waern, Y. & Rollenhagen, C. (1983). Reading text from visual display units. International Journal of Man-Machine Studies (18): 441-465.
Waller, R. (1991). Typography and Discourse. Handbook of Reading Research 2: 341-80.
Warford, H.S. (1972). Design for print production.