Gestalt principles hindered my sudoku performance

Last week, while waiting for friends, I picked up a community newspaper in hopes of finding a puzzle to help me pass the time. I found a sudoku puzzle.

A sudoku puzzle consists of nine 3×3 squares, sprinkled with a few starter numbers. The player must fill in all the blanks by referring to the numbers that are already filled. A number can only occur once in each row of 9, each column of 9, and each 3×3 square.
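The three rules above (no repeats in a row, a column, or a 3×3 square) can be sketched as a small check. This is only an illustration: the function name and the convention that 0 marks a blank cell are assumptions, not part of any puzzle standard.

```python
def is_valid_sudoku(grid):
    """Return True if no digit repeats in any row, column, or 3x3 square.

    `grid` is a 9x9 list of lists; 0 marks a blank cell (an assumption
    made for this sketch).
    """
    def no_repeats(cells):
        digits = [c for c in cells if c != 0]
        return len(digits) == len(set(digits))

    rows = grid
    cols = [[grid[r][c] for r in range(9)] for c in range(9)]
    # Each 3x3 square starts at a row and column that is a multiple of 3.
    boxes = [
        [grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)]
        for br in range(0, 9, 3)
        for bc in range(0, 9, 3)
    ]
    return all(no_repeats(unit) for unit in rows + cols + boxes)
```

Note that the check treats the 3×3 squares as a unit in their own right, which is exactly the grouping the player needs to perceive.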

I regularly complete difficult sudoku puzzles, but this easy one—more starter numbers make the puzzle easier—was taking much longer than I expected.

I soon realised that my slow performance was due to a design decision by the graphic artist!

In the original puzzle, shown at left, the graphic designer used shading for all the starter numbers. In my reformatted version, on the right, I used shading to separate the 3×3 squares. Both puzzles also use thicker lines to separate the 3×3 squares.

gestalt-sudoku-puzzle

The shading for starter numbers, on the left, is unfortunate because it interferes with the player’s perception of the nine 3×3 squares. Instead, players perceive groups of numbers (in diagonals, in sets of two, and sets of five).

I assume the designer’s intention was to help identify the starter numbers. Regardless of the designer’s intention, the human brain processes the shading just as it processes all visual information: according to rules that cognitive psychologists call gestalt principles. A sudoku player’s brain—any human brain—will first perceive the shaded boxes as groups or sets.

gestalt-sudoku-circled

In sudoku, the grouping on the left is actually meaningless—and counterproductive. However, since the brain applies gestalt principles rather involuntarily and at a low level, the grouping cannot easily be ignored. The player must make a deliberate cognitive effort to ignore the disruptive visual signal of the original shading. This extra effort slows the player’s time-on-task performance.

You can check your own perception by comparing how readily you see diagonals and groups in both puzzles above. On the left, are you more likely to see two diagonals, two groups of five, and many groups of two? If you are a sudoku player, you’ll recognise that these groupings in the puzzle are irrelevant to the game.

If you like, you can print the puzzles at the top, and give them to different sudoku players. Which puzzle is faster to complete?

Interested in gestalt principles? I’ve blogged about the use of gestalt principles before.

Auto-correct a touch-screen problem

For the past few months, I’ve been taking an average of 1.6 flights per week on commercial airplanes. Most of these offered seatback entertainment, so I could watch the TV show or movie of my choice, or listen to satellite radio while reading. Touch-screen controls are easy to use because they let me touch—or tap—the item or the control that I want. By using the touch screen, I can select a program, adjust the volume, skip the next song, and so on.

One thing I’ve noticed is that about ¼ of seatback touch screens are poorly registered. By registration I mean that the system and the user agree on where the user is tapping or touching the screen:

An illustration of registration

I recorded a video of two common tasks for a seatback entertainment system: selecting the language and adjusting the volume. As you can see, the registration is off, so I initially get the French interface instead of the English, and I must press an unrelated button to adjust the sound:

This video shows a touch screen that detects a tap in a different location than the one I tapped.

The registration error is significant. My fingertip tapped about 2 cm left of the centre of the EN button. The larger the registration error, the harder to tap a small target—as was the case with the volume controls in the video, above, where I appear to be tapping the Fast-Forward button. On more than one flight I have unintentionally increased the sound to painful levels while attempting to lower the volume!

A system such as this could be made to detect and auto-correct poor registration. If we assume that repeat taps on a blank location indicate poor registration, the software could:

  1. After several repeat taps, select the nearest target—a reasonable guess—even if it is a centimetre or two away from the user’s tap.
  2. Ask the user to confirm the guess. “Did you mean [this one]?”
  3. If the user confirms, calculate the amount by which to correct the registration, and then fix the registration error.
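As a rough sketch of the three steps above, assuming tap and target positions are available as (x, y) coordinates in centimetres: all function names, the 0.5 cm radius, and the three-tap threshold are hypothetical choices for illustration, not taken from any real seatback system.

```python
import math

def looks_like_repeat_taps(taps, radius=0.5, min_count=3):
    """True if the last `min_count` taps all fall within `radius` of the latest tap."""
    if len(taps) < min_count:
        return False
    recent = taps[-min_count:]
    return all(math.dist(t, recent[-1]) <= radius for t in recent)

def nearest_target(tap, targets):
    """Step 1: guess the target whose centre is closest to the tap."""
    return min(targets.items(), key=lambda item: math.dist(tap, item[1]))

def registration_offset(tap, target_centre):
    """Step 3: the correction to add to future raw taps."""
    return (target_centre[0] - tap[0], target_centre[1] - tap[1])

def apply_offset(tap, offset):
    """Shift a raw tap by the stored correction."""
    return (tap[0] + offset[0], tap[1] + offset[1])

# Hypothetical start screen with two widely spaced buttons.
targets = {"FR": (2.0, 5.0), "EN": (12.0, 5.0)}
taps = [(10.0, 5.0), (10.0, 5.0), (10.0, 5.0)]  # three taps 2 cm left of EN

if looks_like_repeat_taps(taps):
    name, centre = nearest_target(taps[-1], targets)
    # Step 2 would happen here: ask the user "Did you mean EN?"
    offset = registration_offset(taps[-1], centre)
```

The confirmation in step 2 is left as a comment because it depends on the user interface. The sketch also assumes the error is a simple shift; a real screen might need scaling or rotation as well.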

This solution requires a screen—perhaps the start screen—whose choices are spaced far apart, so the system can detect when the user appears to be tapping a blank space:

Tapping a blank space (at right)

If user testing were to show that auto-correction needs human involvement, after calculating the registration error, the system could ask the user to check the corrected registration. For example:

Confirming that the registration is correct
Are you there? Please tap the green circle.

I haven’t done any testing of this idea, nor have I given this much thought, so I’m certain there are many more and better ways to auto-correct a registration problem on a touch screen. I merely wanted to identify one possible solution in order to get to the next point: the need to consider the business drivers when deciding to address (or deciding not to address) a usability problem.

Everything costs money

Fixing this problem—it’s a real problem, you’ve seen the video—would cost money. If the following can be quantified and evaluated within a framework of passenger-experience goals, there may be a convincing business case:

  • Not every passenger can work around a registration problem. Those who cannot would be unable to use the entertainment system. When everyone else gets a movie, how does the passenger with a failing system feel?
  • If a failed entertainment system is perceived as a negative experience, will passengers blame the touch-screen/software manufacturer or blame the airline? I’m sure you can imagine the complaint: “I sat there for hours without a movie! It’s the airline’s fault.” What’s the likelihood that this will cause churn (passenger switches to another brand next time)?
  • Based on the screens I’ve seen, some frustrated passengers must use hard objects that scratch and even gouge the touch screen. Are they trying to force the screen to understand what they want? Are they vandalising the screen? What’s the cost of replacing a damaged or vandalised screen?
  • A scratched screen is like graffiti. It affects every subsequent passenger in that seat. Do vandalised screens affect the airline’s goal of attaining a particular passenger rating for perceived quality or aesthetic experience?
  • The in-flight entertainment system was implicated in a catastrophic Swissair crash near Peggy’s Cove about a decade ago. Would a fix to the touch-screen registration problem incur prohibitive safety-testing costs?

Leaner, more agile

This week, I’m attending a few days of training in agile software development, in an Innovel course titled Lean, Agile and Scrum for Project Managers and IT Leadership.

My first exposure to agile was in Desiree Sy’s 2005 presentation, Strategy and Tactics for Agile Design: A design case study, to the Usability Professionals Association (UPA) annual conference in Montreal, Canada. It was a popular presentation then, and UPA-conference attendees continue to be interested in agile methods now. This year, at the UPA conference in Portland, USA, a roomful of usability analysts and user-experience practitioners discussed the challenges that agile methods present to their practice. One of the panellists told the room: “Agile is a response to the classic development problem: delivering the wrong product, too late.” There was lots of uncomfortable laughter at this. Then came the second, thought-provoking sentence: “Agile shines a light on the rest of us, since we are now on the critical path.” Wow! So it’s no longer developers, but designers, usability analysts, etc, who are holding up the schedule?

An agile load

During this week’s training, I’m learning lots while looking for one thing in particular: how to ensure agile methods accommodate non-developer activities, from market-facing product management activities, to generative product design, to early prototype testing, to usability testing, and so on.

I’m starting to suspect that when agile methods “don’t work” for non-developers, it’s because the process is wagging the dog (or that its “rules” are being applied dogmatically). I think I’m hearing that agile isn’t a set of fixed rules—so not a religion—but a sensible and flexible method that team members can adapt to their specific project and product.

Durable design: still possible?

A simple and good design can last and last. Consider the qualities of a BC Telephones operator’s chair from the 1930s:

Telephone operators (ca. 1932)

Environmentally defensible. It is made primarily of a renewable resource—wood—and is so durable that, after decades, it still withstands daily use.

Functional. Originally, at BC Tel, this chair fit a small space, swivelled so the operator could get in and out of a small workspace, and provided a place for the operator’s personal items. After it was decommissioned, this compact and strong chair continued to be functional in other settings.

Aesthetically appealing. I’m thinking of the wood, the form, and the chair’s history. This chair has only marginally been repurposed, because it still seats people as they connect to a telco service—formerly a telephone, now Internet access.

Can we still design objects that last as long as this chair has?

A wooden chair from a telephone operator “call centre”. The video shows the chair’s swivel- and coat-hook features.

User performance depends on conditions

In early June, in a hotel lobby, I stopped to observe someone troubleshooting a wireless connection. I’ve faced this challenge myself, since every hotel seems to have a slightly different process for connecting.

The person I was observing was visually impaired and had his GUI enlarged by about 1000% or more. As he attempted to troubleshoot his wireless connection, he was very rapidly scrolling horizontally and vertically in order to read the text and view the icons in the Wireless Connection Status dialog box. The hugely enlarged GUI flew around the screen. His screen displayed only a small portion of the total GUI, but he never lost his place.

Only part of the screen is visible

In contrast, I lost my place repeatedly. I couldn’t relate the different pieces of information, so what I saw was effectively meaningless to me much of the time. His spatial awareness—his ability to navigate quickly around a relatively large area—was clearly more developed than mine.

I could not keep up with all of the text, either, even when he was reading it to me out loud: “It says ‘Signal Strength’ is 4 bars, but it won’t connect. See?” (Well, actually, I didn’t see.) Though I’m very familiar with this dialog box, I could only read the shorter words as they flew by on screen. The larger words were illegible to me. His ability to read rapidly moving whole words when only parts of them were visible at any given instant was much more developed than mine. I felt sheepish about being functionally illiterate under these conditions.

Flying text is hard to read

It was interesting to see how my own user performance depends on such a narrow range of conditions. I need to see the whole word and its context. I need to see at least half the dialog box at once. And, if the image is moving, it must be moving slowly.

Designing and influencing user performance

When designing the user experience of software, UX- and Development teams often focus on how the user interface supports user performance, because that’s within their locus of control. Once the product is in the wild, environmental factors may reduce user performance despite the team’s best product-design efforts. But I believe it’s possible for a UX team to also influence the environment in which their products get used. Consider two of these:

  • The user’s display size.
  • The soundscape.

Large displays < one salary

The environment affects user performance

Users of all ages and genders are more effective at performing search tasks and comparison tasks (Tao Ni et al, 2006), and more effective at spatial tasks, when they use large displays. Mary Czerwinski et al reported a significant 12% performance benefit (2003). However, when given a choice, people don’t want very large displays on their office desks; they opt for medium-sized displays instead. One study showed that older users least prefer large displays but stand to gain the most performance benefit. (This study was done before multi-monitor arrangements became common.)

A 12% improvement in performance suggests that 7 people with large displays could theoretically do the job of 8 people with medium displays. How many large displays could your office buy for one person’s salary every year? For business-to-business sales and especially for enterprise-wide software implementations, there’s a place for sales teams and proposal writers to mention the business case for larger displays.
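The arithmetic behind that claim can be sketched as follows. The 12% figure is the one cited above; the salary is a made-up number for illustration only, and the calculation assumes the benefit applies evenly to all of a person’s work, which the research does not claim.

```python
def equivalent_headcount(people, improvement):
    """Output of `people` workers who each perform `improvement` better."""
    return people * (1 + improvement)

def break_even_budget_per_display(annual_salary, people):
    """Most you could spend per display, per year, before the saved salary is used up."""
    return annual_salary / people

# 7 people at a 12% improvement do roughly the work of 8.
print(round(equivalent_headcount(7, 0.12), 2))  # 7.84

# Hypothetical salary of 56,000 spread over 7 large displays.
print(break_even_budget_per_display(56_000, 7))  # 8000.0
```

Seven people at 112% produce about 7.84 people’s worth of work, which is where “7 people could do the job of 8” comes from, rounded up.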

Call it what you want—innovation, thinking outside the box, providing solutions—your UX-Design team can work with the Sales and Service/Implementation teams to ensure customers get solutions that include better hardware choices.

Speak less clearly, please

A half-decade of research by Dr Sabine Schlittmeier has expanded on what common sense told us: it’s harder to concentrate when others are chatting in the background. Schlittmeier found that when background speech is louder and more intelligible, it negatively affects verbal short-term memory, sustained attention, and verbal-logical reasoning. When I asked her what techniques have been shown successful, Schlittmeier told me that a masking sound, such as music or talk radio, is not objectively effective because the higher level of background sound has detrimental cognitive effects, but subjectively people feel this is effective. She added that there’s a measurable benefit to:

  • Shifting high-concentration work to times when fewer people are around.
  • Doing high-concentration work in single offices.

I suppose working remotely—from a quiet home—is a variation of these solutions.

I also asked, “What one thing, if handled differently, would most improve the way people experience noise at work?” Schlittmeier said it’s not about one thing. She recommended attacking problem sound from all dimensions at once: loudness, frequency characteristics, sound production, transmission, and so on.

The way I read the research results, reducing background speech to a soft, unintelligible noise could result in a 10% to 25% decrease in memory errors and logic errors, and an 18% increase in attention span. What Schlittmeier hasn’t provided is data about overall productivity improvement, without which it’s harder to make a business case for spending on office-noise abatement.

But there are other ways to mitigate the background office noise that affects your users, and you may be able to influence how your customers approach that problem.

A box that promotes wide screens or headsets

Again: call it what you want—innovation, thinking outside the box, providing solutions—your UX-Design team can work with the Marketing team to influence the environment through traditional marketing. Imagine a business-to-consumer product that is designed to work even better with a (noise-cancelling) headset—and which is depicted in use with headsets in the marketing messages and on the packaging.

Train yourself in frustration, confusion, and inefficiency

For professional reasons, I like to mess around with software. It’s a form of training, because some of the messing around leads to frustration, confusion, and inefficiency. And that’s good.

My hope is that my experiences will help me to better understand what I put various groups of software users through when they use the software I helped design and build.

An easy way to mess around is by changing default settings. For example, my iTunes isn’t set to English. This helps me understand the experience of users who learned one language at home as children and now use another language at work as adults. It’s not just beneficial to experience the initial pain of memorising where to click (as I become a rote user in a GUI I cannot read), but also the additional moments of frustration when I must do something new—an occasional task whose command vector I haven’t memorised.

Relating to the language challenges that some users face

Another easy way to mess around is to switch between iMac and Windows computers. It’s not just the little differences, such as whether the Minimise/Maximise/Close buttons are on the left or right sides of the title bar, or whether that big key on the keyboard is labelled Enter or Return.

Switching between operating systems

It’s also the experience of inefficiency. It’s knowing you could work faster, if only the tool weren’t in your way. This also applies to successive versions of “the same” operating system. This is the frustration of the transfer user.

It’s noticing how completely arbitrary many design standards are—how arbitrarily different between operating systems—such as the End key that either does or doesn’t move the insertion point to the end of the line.

Another easy way to mess around is to run applications in a browser that’s not supported. I do it for tasks that matter, such as making my travel bookings.

All this occasional messing around is about training myself. The experiences I get from this broaden the range of details I ask developers to think about as they convert designs into code and into pleasing, productive user experiences.

In a separate IxDA discussion thread, a few people reacted to this blog post:

  • Try a Dvorak keyboard instead of a Qwerty keyboard (Johnathan Berger).
  • Watch children’s first use of a design (Brandon E.B. Ward).
  • Use only the keyboard, not the mouse (CK Vijay Bhaskar).
  • Sit in at the Customer Support desk for a day (Adrian Howard).
  • Search Twitter to find out how people feel about a product (Paul Bryan).

See also the comment(s) below, directly in this blog.

Unreliability of self-reported user data

Many people are bad at estimating how often and how long they’re on the phone. Interestingly, you can predict who will overestimate and who will underestimate their phone usage, according to the 2009 study, “Factors influencing self-report of mobile phone use” by Dr Lada Timotijevic et al. For this study, a self-reported estimate is considered accurate if it is within 10% of the actual number:

Defining 'accuracy'

                         Underestimated       Accurate             Overestimated
                         (number of people)   (number of people)   (number of people)
Number of phone calls
  High user              71%                  10%                  19%
  Medium user            53%                  21%                  26%
  Low user               33%                  16%                  51%
Duration of phone calls
  High user              41%                  20%                  39%
  Medium user            27%                  17%                  56%
  Low user               13%                  6%                   81%
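
The 10% rule used to sort people into these three columns can be sketched as a small function. The function name and the choice to treat exactly 10% as accurate are assumptions for illustration; the study’s own handling of boundary cases is not described here.

```python
def classify_estimate(estimated, actual, tolerance=0.10):
    """Label a self-reported number against the measured one.

    An estimate counts as accurate when it is within `tolerance`
    (10% by default) of the actual value.
    """
    if actual <= 0:
        raise ValueError("actual must be positive")
    if abs(estimated - actual) <= tolerance * actual:
        return "accurate"
    return "underestimated" if estimated < actual else "overestimated"

# A person who made 100 calls but reports 60 has underestimated.
print(classify_estimate(60, 100))  # underestimated
```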

If people are bad at estimating their phone use, does this mean that people are bad at all self-reporting tasks?

Not surprisingly, it depends how long it’s been since the event they’re trying to remember. It also depends on other factors. Here are some factoids that should convince you to be careful with self-reported user data that you collect.

What’s the problem with self-reported data?

On questions that ask respondents to remember and count specific events, people frequently have trouble because their ability to recall is limited. Instead of answering “I’m not sure,” people typically use partial information from memory to construct or infer a number. In 1987, N.M. Bradburn et al found that U.S. respondents to various surveys had trouble answering such questions as:

  • During the last 2 weeks, on days when you drank liquor, about how many drinks did you have?
  • During the past 12 months, how many visits did you make to a dentist?
  • When did you last work at a full-time job?

To complicate matters, not all self-report data is suspect. Can you predict which data is likely to be accurate or inaccurate?

  • Self-reported Madagascar crayfish harvesting—quantities, effort, and harvesting locations—collected in interviews was shown reliable (2008, Julia P. G. Jones et al).
  • Self-reported eating behaviour by people with binge-eating disorders was shown “acceptably” reliable, especially for bulimic episodes (2001, Carlos M. Grilo et al).
  • Self-reported condom use was shown accurate over the medium term, but not in the short term or long term (1995, James Jaccard et al).
  • Self-reported numbers of sex partners were underreported and sexual experiences and condom use overreported a year later when compared to self-reported data at the time (2002, Maryanne Garry et al).
  • Self-reported questions about family background, such as father’s employment, result in “seriously biased” research findings in studies of social mobility in The Netherlands—by as much as 41% (2008, Jannes Vries and Paul M. Graaf).
  • Participation in a weekly worship service is overreported in U.S. polls. Polls say 40% but attendance data says 22% (2005, C. Kirk Hadaway and Penny Long Marler).

Can you improve self-reported data that you collect?

Yes, you can. Consider these:

  • Decomposition into categories. Estimates of credit-card spending get more accurate if respondents are asked for separate estimates of their expenditures on, say, entertainment, clothing, travel, and so on (2002, J. Srivastava and P. Raghubir).
  • For your quantitative or qualitative usability research or other user research, it’s easy to write your survey questions or your lines of inquiry so they ask for data in a decomposed form.

  • Real-time data collection. Collecting self-reported real-time data from patients in their natural environments “holds considerable promise” for reducing bias (2002, Michael R. Hufford and Saul Shiffman).
    Collecting real-time self-report data
  • This finding is from 2002. Social-media tools and handheld devices now make real-time data collection more affordable and less unnatural. For example, use text messages or Twitter to send reminders and receive immediate direct/private responses.

  • Fuzzy set collection methods. Fuzzy-set representations provide a more complete and detailed description of what participants recall about past drug use (2003, Georg E. Matt et al).
  • If you’re afraid of math but want to get into fuzzy sets, try a textbook (for example, Fuzzy set social science by Charles Ragin), audit a fuzzy-math course for social sciences (auditing is a low-stakes way to get things explained), or hire a tutor in math or sociology/anthropology to teach it to you.

Also, when there’s a lot at stake, use multiple data sources to examine the extent of self-report response bias, and to determine whether it varies as a function of respondent characteristics or assessment timing (2003, Frances K. Del Boca and Jack Darkes). Remember that your qualitative research is also one of those data sources.

Up and down the TV channels

My television lets me step through the channels. To do this, I use the remote control’s CH button. Similarly, my television lets me page through the list of programs, five channels at a time. To do this, I use the remote control’s PG button. In fact, it’s one button for the stepping and paging functions.

My remote control

The programs in the list are shown in numeric order, so smaller numbers are higher in the list. Pressing “+” will page the list up, so “+” leads to smaller numbers. Similarly, pressing “–” will page the list down, to larger numbers. This follows the same mental model as scrolling in a computer window, including the one you’re reading in, now.

Scrolling up

In contrast, when I’m watching one channel (full-screen, so with the program guide hidden), the same two buttons have the inverse effect. The “+” button increases the number of the channel (which is like moving down in the programs list, not up). This follows the same mental model as a spin control in many computer programs.

Spinning up

Imagine using the one button in succession for the two functions:

first as PG to page through the menu
  and then, after selecting a channel,
as CH to step through the channels.
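
The two behaviours of the one button can be modelled in a few lines, which makes the inversion easy to see. This is a sketch of the behaviour described above, not of any real television’s firmware; the mode names and the lowest channel number of 1 are assumptions.

```python
def press_plus(mode, channel, page_size=5):
    """Return the channel reached after pressing "+" once.

    In the program guide, "+" pages UP the list, toward smaller numbers.
    While watching full-screen, "+" steps to a LARGER channel number.
    """
    if mode == "guide":
        return max(1, channel - page_size)
    if mode == "watching":
        return channel + 1
    raise ValueError(f"unknown mode: {mode}")

print(press_plus("guide", 20))     # 15
print(press_plus("watching", 20))  # 21
```

The same key press moves the channel number in opposite directions depending on a mode that the button itself does not show.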

I see in this an excellent problem for a practicum student, or for a class assignment that combines user research, design, GUI, and handheld devices. Possible questions:

  • What research would confirm that this is, in fact, a problem?
  • If you confirm the problem, is it entirely on the hardware side? How many people are affected?
  • Is there a business case to fix the problem?
  • How could you fix it? What design methods and processes would you use? Why?
  • How could you demonstrate that your design fixes the problem? Is there a lower-cost way to validate the design, and, if so, what are the trade-offs?

Testing in the UX-design process

Three weeks ago, a client called me. They had just completed release 1.0 of a new Web application that will replace their current flagship product. The client was asking about summative usability testing to evaluate how well the product performs in the hands of users, because they want their customers to succeed.

Since the product is an enterprise-wide product that requires training, one thing the client specifically asked about was whether the Help is a help to users.

Do you need help?

A quick heuristic review I did turned up no obvious problems in the Help, so we decided on user observation with scenarios. In a preparatory dry run done a few weeks ago, I supplied a participant with a few scenarios and some sample data. The participant I observed was unable to start two of the scenarios, and completed the third scenario incorrectly by adding data to the wrong database.

The Help didn’t help her. The participant was able to find the right Help topic, but she completely misinterpreted the first step in the Help’s instructions.

The team had not anticipated the apparent problem that turned up during the dry run. Assuming it is a real problem—and this can’t be more than an assumption given the sample size of 1—this story nicely illustrates the benefit of summative testing, as you’ll see below.

Best practices working together

The team, including a product manager, several developers, a technical communicator as Help author, and me as a contract usability analyst, used these best practices:

  • The Help author used a single-sourcing method. The most common GUI names, phrases, and sentences are re-used: each is inserted into many topics from one source, like a variable. In almost every Help topic, the problematic first step was one such re-usable snippet.
  • The product manager assesses the bugs based on severity and cost, ensuring the low-hanging fruit and most serious of defects get priority.
  • In a heuristic review of the Help, I (wearing a usability-analyst hat) did not predict that the first step in most topics would be misinterpreted. Heuristic reviews, when conducted by a lone analyst, typically won’t predict all usability problems.
  • The developers use an Agile method. At this stage of their development cycle, they build a new version of the product every Friday, and, after testing, publish it the following Friday.

After the dry run uncovered the apparent problem, the product manager said: “Let’s fix it.” Since the Help author used re-usable snippets, rewording in one place quickly fixed the problem throughout the Help. And the company’s Agile software development method meant the correction has already been published.
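
To show why a single-sourced snippet makes such a fix cheap, here is a minimal sketch using Python’s standard `string.Template`. The snippet name, the topic names, and all of the wording are invented for illustration; the client’s actual Help tooling is not described in this post.

```python
from string import Template

# One shared source for a step that appears in many topics (hypothetical wording).
snippets = {"first_step": "Open the record."}

# Each topic refers to the shared step by name, like a variable.
topics = {
    "add-data": Template("1. $first_step\n2. Type the new data."),
    "edit-data": Template("1. $first_step\n2. Change the data."),
}

def build_help(topics, snippets):
    """Expand every topic with the current wording of the shared snippets."""
    return {name: topic.substitute(snippets) for name, topic in topics.items()}

# Rewording the snippet once changes every topic on the next build.
snippets["first_step"] = "On the Records page, click the record you want to open."
help_pages = build_help(topics, snippets)
```

With this arrangement, the cost of rewording is the same whether the step appears in two topics or two hundred.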

Was this the right thing to do? Should an error found by one participant during a dry run of upcoming usability tests result in a change? The team’s best practices certainly made this change so inexpensive as to be irresistible. With the first corporate customer already migrated to the new product, my client has a lot riding on this. I can’t be certain this rewritten sentence has improved the Help, but—along with the other bugs they’ve fixed—I know it increases my client’s confidence and pride in their product’s quality.

It’ll be interesting to see what the upcoming user observations turn up.

Reminding myself of things I already know

The actual user-observation sessions are still ten days away, but the dry run reminded me of things I already know:

  • Despite each professional’s best efforts, there will always be unanticipated outcomes where users are involved. Users have a demonstrated ability to be more “creative” than designers, developers, and content authors, simply by misinterpreting and making unintended use/misuse of our work.
  • The best practices in each discipline can dovetail and work together to allow rapid iteration of the product by the team as a whole. A faster response means fewer users will be affected and the cost of support—and of the rapid iteration—will be lower. A good development process adjusts practices across teams (product management, research, development, user experience, design, tech-comm, quality assurance) so the practices dovetail rather than conflict.
  • Summative testing helps validate and identify what needs to be iterated. Testing earlier and more often means that fewer or perhaps no users will be affected. Testing earlier and more often is a great way to involve users, a requirement for user-centred design, or UCD. It also changes the role of testing from summative to formative, as it shapes the design of the product before release, rather than after.