Testing in the UX-design process

Three weeks ago, a client called me. They had just completed release 1.0 of a new Web application that will replace their current flagship product. The client was asking about summative usability testing to evaluate how well the product performs in the hands of users, because they want their customers to succeed.

Since the product is an enterprise-wide product that requires training, one thing the client specifically asked about was whether the Help is a help to users.

A quick heuristic review I did turned up no obvious problems in the Help, so we decided on user observation with scenarios. In a preparatory dry run done a few weeks ago, I supplied a participant with a few scenarios and some sample data. The participant I observed was unable to start two of the scenarios, and completed the third scenario incorrectly by adding data to the wrong database.

The Help didn’t help her. The participant was able to find the right Help topic, but she completely misinterpreted the first step in the Help’s instructions.

The team had not anticipated the apparent problem that turned up during the dry run. Assuming it is a real problem—and this can’t be more than an assumption given the sample size of 1—this story nicely illustrates the benefit of summative testing, as you’ll see below.

Best practices working together

The team, including a product manager, several developers, a technical communicator as Help author, and me as a contract usability analyst, used these best practices:

  • The Help author used a single-sourcing method. The most common GUI names, phrases, and sentences are re-used: each is inserted into many topics from one source, like a variable. In almost every Help topic, the problematic first step was one such re-usable snippet.
  • The product manager assesses the bugs based on severity and cost, ensuring the low-hanging fruit and the most serious defects get priority.
  • In a heuristic review of the Help, I (wearing a usability-analyst hat) did not predict that the first step in most topics would be misinterpreted. Heuristic reviews, when conducted by a lone analyst, typically won’t predict all usability problems.
  • The developers use an Agile method. At this stage of their development cycle, they build a new version of the product every Friday, and, after testing, publish it the following Friday.

After the dry run uncovered the apparent problem, the product manager said: “Let’s fix it.” Since the Help author used re-usable snippets, rewording in one place quickly fixed the problem throughout the Help. And the company’s Agile software development method means the correction has already been published.
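To make the single-sourcing idea concrete, here is a minimal sketch in Python. It is not the Help author's actual tool. The snippet name, the topic titles, and all of the wording are invented for illustration. It only shows why rewording one shared snippet changes every topic that uses it.

```python
from string import Template

# Hypothetical shared snippets: each one is written once and reused in many topics.
SNIPPETS = {
    "first_step": "In the main menu, click Records.",
}

# Hypothetical Help topics that pull in the shared snippet by name.
TOPICS = {
    "Add a record": Template("1. $first_step\n2. Click New."),
    "Delete a record": Template("1. $first_step\n2. Select the record and click Delete."),
}


def render_help(snippets):
    """Build every topic's text by filling in the shared snippets."""
    return {title: body.substitute(snippets) for title, body in TOPICS.items()}


before = render_help(SNIPPETS)

# Reword the snippet in one place only (invented wording).
SNIPPETS["first_step"] = "On the toolbar at the top of the page, click Records."
after = render_help(SNIPPETS)

for title, text in after.items():
    print(title)
    print(text)
    print()
```

Every topic in `after` carries the new first step, even though only one line of source changed.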

Was this the right thing to do? Should an error found by one participant during a dry run of upcoming usability tests result in a change? The team’s best practices certainly made this change so inexpensive as to be irresistible. With the first corporate customer already migrated to the new product, my client has a lot riding on this. I can’t be certain this rewritten sentence has improved the Help, but—along with the other bugs they’ve fixed—I know it increases my client’s confidence and pride in their product’s quality.

It’ll be interesting to see what the upcoming user observations turn up.

Reminding myself of things I already know

The actual user-observation sessions are still ten days away, but the dry run reminded me of things I already know:

  • Despite each professional’s best efforts, there will always be unanticipated outcomes where users are involved. Users have a demonstrated ability to be more “creative” than designers, developers, and content authors, simply by misinterpreting and making unintended use/misuse of our work.
  • The best practices in each discipline can dovetail and work together to allow rapid iteration of the product by the team as a whole. A faster response means fewer users will be affected and the cost of support—and of the rapid iteration—will be lower. A good development process adjusts practices across teams (product management, research, development, user experience, design, tech-comm, quality assurance) so the practices dovetail rather than conflict.
  • Summative testing helps validate and identify what needs to be iterated. Testing earlier and more often means that fewer or perhaps no users will be affected. Testing earlier and more often is a great way to involve users, a requirement for user-centred design, or UCD. It also changes the role of testing from summative to formative, as it shapes the design of the product before release, rather than after.

Cognitive psych in poll design

The WordPress community recently ran a poll. Users were asked to choose one of 11 visual designs. The leading design got only 18% of the vote, which gives rise to such questions as:

  • Is this a meaningful win? The leader only barely beat the next three designs, and 82% voted for other designs.

I don’t know about the 18% versus 82%. I do wonder whether some of the entries triggered a cognitive process in voters that caused them to pay less attention to the other designs, which may bring the leading design’s razor-thin lead into question. This cognitive process, known as the “ugly option”, is used successfully by designers as they deliberately apply cognitive psychology to entice users to act. I’ll explain why, below, but I first want to explain my motivation for this blog post.

I’m using this WordPress poll as a jumping-off point to discuss the difficulty of survey design. I’m not commenting on the merit of the designs. (I never saw the designs up close.) And I’m certainly not claiming that people involved in the poll used cognitive psych to affect the poll’s outcome. Instead, in this blog, I’m discussing what I know about cognitive psychology as it applies to the design of surveys such as this recent WordPress.org poll.

Survey design affects user responses

If you’ve heard of the controversial Florida butterfly ballot in the USA’s presidential election in 2000, then you know ballot design—survey design—can affect the outcome. I live outside the USA, but as a certified usability analyst I regularly come across this topic in industry publications; since that infamous election, usability analysts in the USA have been promoting more research and usability testing to ensure good ballot design. I imagine that the Florida butterfly ballot would have tested poorly in a formative usability study.

The recent WordPress poll, however, would likely have tested well in a usability study to determine whether WordPress users could successfully vote for their choice. The question I have is whether the entries themselves caused a cognitive bias in favour of some entries at the expense of others.

It seems that one entry was entered multiple times, as dark, medium, and light variations. This seems like a good idea: “Let’s ask voters which one is better.” Interestingly, the visual repetition—the similar images—may have an unintended effect if you add other designs into the mix. Cognitive science tells us people are more likely to select one of the similar ones. Consider this illustration:

More people choose the leftmost image. The brain’s tendency to look for patterns keeps it more interested in the two similar images. The brain’s tendency to avoid the “ugly option” means it’ll prefer the more beautiful one of the two. Research shows that symmetry correlates with beauty across cultures, so I manipulated the centre image in Photoshop to make it asymmetrical, or “uglier”.

The ugly-option rule applies to a choice between different bundles of goods (like magazine subscriptions with different perks), different prices (like the bottles on a restaurant wine list), and different appearances (like the photos, above). It may have applied to the design images in the WordPress poll. The poll results published by WordPress.org list the intentional variations in the table of results:

  • DR1: Fluency style, dark
  • DR2: Fluency style, medium
  • DR3: Fluency style, light

In addition to these three, which placed 1st, 4th, and 6th overall, it’s possible there were other sets of variations, because other entries may have resembled each other, too.

As a usability analyst and user researcher, I find this fascinating. Does the ugly-option rule hold true when there are 11 options? Was the dark-medium-light variation sufficient to qualify one of the three as ugly? Did the leading design win because it was part of a set that included an ugly option? And, among the 11 entries, how many sets were there?

There are ways to test this.

Test whether the poll results differ in the absence of an ugly-option set. A|B testing is useful for this. It involves giving half the users poll A with only one of the dark-medium-light variants, and the other half poll B with all three variants included. You can then compare the two result sets. If there is a significant difference, then some further combinations can be tested to see if other possible explanations can be ruled out.
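Here is a rough sketch of how the two result sets might be compared. It assumes you track the share of votes that one variant (say, the dark one) receives in each poll. It uses a two-proportion z-test, which is only one reasonable choice of statistic. The vote counts are invented for illustration and are not data from the WordPress poll.

```python
import math


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Return (z, two-sided p-value) for the difference between two vote shares."""
    share_a = hits_a / n_a
    share_b = hits_b / n_b
    # Pooled share under the null hypothesis that both polls behave the same.
    pooled = (hits_a + hits_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (share_b - share_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Invented counts: votes for the dark variant out of all votes cast in each poll.
# Poll A shows only the dark variant; poll B shows dark, medium, and light.
z, p = two_proportion_z_test(hits_a=120, n_a=1000, hits_b=180, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")

if p < 0.05:
    print("The dark variant's share differs between the two polls.")
else:
    print("No significant difference between the two polls.")
```

A small p-value would only say that the two polls differ. It would not, by itself, show that the ugly option is the cause, which is why the further combinations matter.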

For more about the ugly option and other ways to make your designs persuasive, I recommend watching Kath Straub and Spencer Gerrol in the HFI webcast, The Science of Persuasive Design: Convincing is converting, with video and slides. There’s also an audio-only podcast and an accompanying white paper.