Usability testing distant users

When a product’s users are scarce and widely dispersed, and your travel budget is limited, usability testing can be a challenge.

For me, remote testing from North America was part of the answer. I’ve never used UserVue because the users I needed to reach were in Africa, Australia, South America, and Asia—continents that UserVue doesn’t reach. Even within North America, UserVue didn’t address the biggest problems I faced:

  • My study participants commonly face restrictive IT policies, so they cannot install our pre-release product and prerequisites.
  • I need to prevent study participants from risking their data by using it with a pre-release product.
  • There’s no way to force an uninstall after the usability test. Who else will see our pre-release?

Instead, I blended a solution of my own with Morae, Skype, Virtual Audio Cable, and GoToMeeting. I used GoToMeeting to share my desktop, which addresses all three of the problems listed above. I used Skype for video and audio. I used Virtual Audio Cable to redirect the incoming voice from Skype to Morae Recorder’s microphone channel. Morae recorded everything except the PIP video. It worked. However, my studies were sometimes limited by poor Internet bandwidth to the isolated locations of my study participants.

Amateur facilitators. I realise this is controversial among usability practitioners, but beggars in North America can’t be choosers about how they conduct usability tests on other continents. I developed a one-hour training session for the team of travelling product managers. Training included a short PowerPoint presentation about the concepts, followed by use of Morae Recorder with webcam and headset while role-playing in pairs. The main points I had to get across:

  • Between study participants, reset the sample data and program defaults.
  • When you’re ready to start recording, first check that the video and audio are in focus and recording.
  • While you facilitate, do not lead the user. Instead, try paraphrasing and active listening (by which I mean vernacular elicitation). Remember that you’re not training the users, so task failure is acceptable, and useful to us.

I had a fair bit of influence over the quality of the research, since I developed the welcome script and test scenarios, provided the sample data, and analysed the Morae recordings once they arrived in North America. Due to poor Internet bandwidth to the isolated areas of my study participants, the product managers had to ship me the Morae recordings on DVD, by courier.

It worked. I also believe that amateur facilitation gave the product managers an additional opportunity to learn about customers.

Why products stay pre-chasm

I’ve spent some time working with legacy products—software for which the core code was written before “usability” was a term developers had heard of, back when developers were still called programmers.

I remember my first conversation—held last century—with a developer about his users and the usability of his legacy product. I used Geoffrey Moore’s book, Crossing the Chasm, to introduce the idea that there are different types of users. The product-adoption curve (illustrated) shows five groups of users. The area under the curve represents the quantity of potential users, in the order in which they will typically adopt the product.

[Figure: the product-adoption curve, with the chasm]

According to Moore, users who are technology enthusiasts and visionaries either enjoy or don’t mind being on the bleeding edge because of the benefits they get from using the product. Usability isn’t one of the typical benefits to the left of the chasm, where new products are first introduced. Wider adoption follows only after the product is made to cross the chasm—but this takes a concerted effort on the part of the development team.

What delays a product from crossing the chasm?

  • Revenues that are sufficient to let the company putter along.
  • Team members who themselves are tech enthusiasts or visionaries. This occurs, for example, when a nutritionist who also knows how to program develops software for nutritionists.
  • Team members who have infrequent exposure to newer user experiences. This could include someone who still uses Microsoft Office 2003, eschews social-networking applications on the web, or uses Linux.
  • Product managers whose roles are weakly defined or absent, making it more likely that new features get developed for the technical challenge of it, rather than for the user need or the business strategy.
  • Sales reps who ask for more functions in the product in order to make a sale—and a development culture that goes along with this.
  • Managers whose vision and strategic plan leave out user experience and usability.
  • Team members who have been on the team for decades.
  • Organisations that don’t use business cases to help them decide where to apply business resources.
  • Organisations with poor change-management practices—because moving a product across the chasm is more likely dramatic and disruptive than smooth and gradual.
  • Designers who are weak or absent in the process, so that “design” happens on the fly, during development.

If you recognise your work environment or yourself in this list, do you want to change things? If you do, but don’t know how, what actions can you take to learn the answer?

Are usability studies experiments?

When I conduct usability studies, I use a laptop and Morae to create a portable and low-cost usability lab. I typically visit the participant on site, so I can look around. I provide a scenario that participants can follow as they try using a new software feature. Morae records their voices, facial expressions, on-screen actions, clicks, and typing. The raw data gets saved in a searchable and graphable format. Afterward, I review the recorded data (I rarely have written notes) and make evidence-based recommendations to the feature’s project team.

For example, in one usability study I realised that the order of the steps was causing confusion and doubt. The three-step wizard would disappear after step 2, so users could modify their data on the screen. Users thought the task was complete at step 2, so the reappearance of the wizard for step 3 caused confusion. I recommended we change the order of the steps:

[Figure: change the order of the steps]

Changing the order fixed the “confused users” problem. But was this scientific? Aside from the fact that I’m researching human subjects rather than test tubes, am I following a scientific model? At first glance, the usability test I conducted doesn’t seem to follow the positivist model of what constitutes an experiment:

[Figure: a positivist model]

For example, unlike the test-tube lab experiment of a chemist, my usability test had no control group. Also, my report went beyond factual conclusions to actual recommendations (based on my expert opinion) for the next iteration.

I can argue this both ways.

On the one hand, my usability test does have a control group if I take the next iteration of the product and repeat the usability test with additional participants, to see whether my recommendations solved the problem. I could compare the task-completion rates and the task duration.

[Figure: a model of usability testing]
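To make that comparison concrete, here’s a minimal Python sketch, assuming hypothetical completion flags and task timings exported from the two rounds of testing (all numbers invented for illustration):

    from statistics import mean

    # Hypothetical data from two rounds of testing the same tasks:
    # completion flags (1 = task completed) and time on task, in seconds.
    round_1 = {"completed": [1, 0, 0, 1, 0], "seconds": [410, 520, 600, 380, 555]}
    round_2 = {"completed": [1, 1, 1, 1, 0], "seconds": [300, 280, 350, 310, 470]}

    for label, data in (("Before redesign", round_1), ("After redesign", round_2)):
        rate = mean(data["completed"]) * 100
        print(f"{label}: {rate:.0f}% completion, "
              f"mean time on task {mean(data['seconds']):.0f} s")

With samples as small as a typical usability study’s, I’d treat the resulting difference as direction-finding rather than proof.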

On the other hand, if I were asked to determine whether the product has an “efficient” workflow or a “great” user experience—which are subjective measures—I’d say a positivist-model experiment is inappropriate. To measure a user’s confusion or satisfaction, I might consider their facial expressions, verbal utterances, and self-reported ratings. This calls for a research design whose epistemology is rooted in post-positivist, ethnomethodological, situated, standpoint, or critical approaches, and has more in common with research done by an ethnographer than by a chemist.

If you liked this, you may also like Epistemology of usability studies.

Epistemology of usability studies

Currently, I’m conducting research on usability analysis and on how Morae software might influence that. My research gaze is rather academic, in that I’m especially interested in the epistemology of usability analysis.

One of my self-imposed challenges is to make my research relevant to usability practitioners. I’m a practitioner and CUA myself, and I have little time for academic exercises because I work where the rubber hits the road. This blog post outlines what I’m up to.

At Simon Fraser University, I learned that epistemological approaches make different assumptions about what is knowable. On one side (below, left), it’s about numbers, rates, percentages, graphs, grids, and tables, and about proving absolute truths. On the other side (below, right), it’s about seeking objectivity while knowing that objectivity is impossible because everything has a cultural context. The epistemology you choose when doing research depends on what you believe. And the epistemology dictates what methods you use, and how you report your results.

[Figure: two epistemological stances. Left: “You can be certain of what you know.” Right: “You cannot be objective about what you know.”]

Let’s look at some examples.

Study 1 fits with the view (above, left) that “you can be certain of what you know.” I plan and conduct a quantitative study to measure the time it takes a series of users to complete two common tasks in a software package: upgrading to the latest version of the software, and activating the software. I make appointments with users. In my workplace, I give each user a scenario and a computer. I observe them and time them as they complete the tasks by using the software package. My hope is that statistical analysis will give me results that I can report, including the average time on task with error bars, as the graph (right) illustrates.
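As a sketch of the arithmetic behind that graph, here’s how the mean time on task and its error bars might be computed, assuming hypothetical timings and a normal approximation of the 95% confidence interval (a t-distribution would be more defensible with so few participants):

    from math import sqrt
    from statistics import mean, stdev

    # Hypothetical times, in seconds, to complete the upgrade task.
    times = [212, 250, 190, 305, 228, 274, 241, 199]

    m = mean(times)
    # Half-width of a 95% confidence interval, normal approximation.
    half = 1.96 * stdev(times) / sqrt(len(times))
    print(f"Mean time on task: {m:.0f} s "
          f"(95% CI: {m - half:.0f} to {m + half:.0f} s)")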

Study 2 fits with the view (above, right) that “you cannot be objective about what you know” because all research takes place within a context. To lessen the impact of conducting research, I contact users to ask if I can study their workplace. I observe each user for a day. My hope is to analyse the materials and interaction that I’ve observed in context—complete with typical interruptions, distractions, and stimuli. Since a new software version has just been released, my hope is that I’ll get to observe them as they upgrade. I’ll report any usability issues, interaction-design hurdles, and unmet needs that I observe.

The examples above are composites of studies I conducted.

  • Study 1 revealed several misunderstandings and installation problems, including a user who abandoned the installation process because he believed it was complete. I was able to report the task success rate and have the install wizard fixed.
  • Study 2 revealed that users write numbers on paper and then re-enter them elsewhere, which had not been observed when users visited our site for usability testing. One user told me: “I never install the latest version because the updates can be unstable.” Another, wary of unexpected new defects, said: “I only upgrade if there’s a fix for a feature I use.” I was able to report the paper-based workaround and the users’ feelings about quality, for product managers to reflect in future requirements.

Clearly, there’s more than one way to conduct research, and not every method fits every team. That’s an idea that can be explored at length.

This has me wondering: which method fits what, when, where? Is there a relationship between a team’s development process and the approach to user research (epistemology) that it’s willing to embrace? …between its corporate usability maturity and the approach?

Those are two of the lines of inquiry in my research at Simon Fraser University.

If you liked this post, you may also like Are usability studies experiments?

Put the card in the slot

We know that human brains use patterns (or schemata) to figure out the world and decide what to do. This kind of cognitive activity takes place very quickly, which means we can react quickly to the world around us, as long as the pattern holds.

Here’s a pattern (or schema) that your brain may know: to put a card in a slot, use the narrow edge. If the card has a clipped corner, use that edge. Some examples:
[Image: cards with clipped corners, showing the affordance]
This pattern is easy for your brain because of the physical cues—also known as affordance. The narrow edge + clipped corner say: “This side goes in.”

A pattern that’s harder for your brain to learn is the magnetic-stripe bank card. You have to think about which way the stripe goes. And it seems harder when you’re in a hurry, or when you feel less safe. Have you used an outdoor bank machine at night, with people hanging around?

Vancouver transit tickets are awful because they don’t follow the card-in-the-slot pattern correctly. As a passenger on the SkyTrain (an elevated rapid-transit line), you must punch your ticket at the station entrance. The machines are placed so you must turn your back on the drug dealers and their customers who hang out in the stations. As with bank cards, these transit tickets fit four ways, but only one will punch the ticket, and it’s not the edge with the clipped corner. There’s a yellow arrow to assist, but while the arrow is clearly visible in daylight, it’s almost invisible in the yellow-tinged fluorescent lighting of most SkyTrain stations. Also, the yellow arrow must be face down, which is counterintuitive because then you cannot see it. In the photo (above, right), can you find the arrow on the ticket?

Did the designers consider how their ticket machines or bank cards would make passengers feel? Designing for emotion seems to have legs these days, but it’s not a new idea.

Doing better. I wonder whether the designers of these systems (transit tickets, bank cards) considered all possible options. It’s a Five Sketches™ mantra: You get a better design when you first saturate the design space. This can include doing a competitor analysis to seek out other ideas. And there are other models for transit tickets. I’ve seen Paris subway tickets with the magnetic stripe in the middle, so passengers could insert the ticket face up or face down, frontwards or backwards. The more recent tag-on/tag-off technology used from London, UK, to Perth, WA, avoids the insert-your-card problem—though the overall experience may be worse, since failure to tag off means paying the highest possible fare.

As for bank cards, IBM’s designers must have modelled bank cards on credit cards, which had the magnetic stripe toward the top instead of in the middle. An off-centre stripe means a flipped card misses the reader head, which doubles the customer’s chances of inserting the card incorrectly. An obvious question to have asked at the design stage: can we design a bank machine to read the card regardless of how it’s inserted?

Standard OK-Cancel button order

I have two stories about command buttons.

Quite a few years ago, a team member walked me through a new dialog box. He entered some data, and then unintentionally clicked the Cancel button. He made this error twice in a row, thus losing his changes twice in a row. I pointed out that the OK and Cancel buttons were in the wrong order. The developer switched the buttons to the Windows-standard layout (below, right), and the user-performance problem was solved.
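For illustration only (this is not the original dialog), here’s a minimal tkinter sketch of that Windows-standard arrangement: the affirmative OK button sits to the left of Cancel, and the pair is right-aligned at the bottom of the dialog:

    import tkinter as tk

    # Minimal sketch of the Windows-standard button order: OK to the
    # left of Cancel, right-aligned at the bottom of the dialog.
    root = tk.Tk()
    root.title("Options")

    button_row = tk.Frame(root)
    button_row.pack(side="bottom", anchor="e", padx=10, pady=10)

    tk.Button(button_row, text="OK", width=10).pack(side="left", padx=4)
    tk.Button(button_row, text="Cancel", width=10).pack(side="left", padx=4)

    root.mainloop()

The point isn’t the toolkit; it’s that the affirmative button comes first and the pair sits where users expect it.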

A few years later, on a different project, not only were the buttons in non-standard order, they used non-standard wording and coloured icons. My request to follow the Windows standard was met only halfway and then sent for Beta testing before I saw it again. The buttons were now in the correct order, but the button names were changed, and the names and icons were still non-standard. Beta testers loudly protested the change. (Beta testers are often expert users, and experts abhor any change that slows them down.) At the time, the company was only a few steps up the Nielsen Corporate Usability Maturity model, so instead of completing the change to Windows-standard OK and Cancel buttons, the buttons were rolled back, to appease the protesting Beta users. I found out too late to retest with Windows-standard buttons, so there was no data to convince the developers. For me, it was an opportunity to learn from failure. 🙂

Why is non-standard so hard?

Try this Stroop test (right). Ignore the words; instead, identify the colours, out loud. No doubt the second panel went more slowly and took more effort.

Try the variation, at left. Find the first occurrence of the word Blue. Next, find the first occurrence of the colour blue.
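If you’d like to feel the interference for yourself, here’s a minimal console sketch of a Stroop-style test (entirely hypothetical; it prints colour words in mismatched ink colours using ANSI escape codes and times the run):

    import random
    import time

    # Name the ink colour of each word aloud, then press Enter.
    COLOURS = {"red": "31", "green": "32", "yellow": "33", "blue": "34"}

    words = random.choices(list(COLOURS), k=10)
    inks = [random.choice(list(COLOURS)) for _ in words]

    start = time.monotonic()
    for word, ink in zip(words, inks):
        # The ANSI code sets the ink colour; the word itself may disagree.
        input(f"\033[{COLOURS[ink]}m{word.upper()}\033[0m  ")
    print(f"Total time: {time.monotonic() - start:.1f} s")

Run it once as-is and once with inks = list(words), so word and ink match, and compare the times.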

Just as mismatches between text and colour slow your Stroop-test performance, mismatches between standard and non-standard OK and Cancel buttons slow user performance. Our Beta users clicked the wrong buttons—a huge waste of their time—because the new solution didn’t follow any standard. The Beta testers were right to protest, but wrong in their demand to revert to the original non-standard state. (See: Customers can’t do your job.)

Users learn GUI patterns—patterns reinforced by experience across many applications—and they expect GUIs to behave predictably, so it’s unwise to deviate radically from the standards unless there are product-management reasons to do so.

I’ll write more about following standards versus designing something new in the coming few posts.

P.S. It looks like Jakob Nielsen got here before me.

Users are not used to it

For several years, I did usability testing on CAD-style software that was full of legacy code, some of which preceded Windows 98.

Some of that legacy code dealt with CAD objects that displayed on screen. To work with these objects, users had a choice of menu commands and toolbar buttons, supplemented by dialog boxes. For example, to move an object, users could not simply click and drag it; they would choose a command, click the object, and then, in a dialog box, enter the distance to move the object.

That’s the way CAD programs worked when that legacy code was originally written.

Over the years, during my usability testing of various features, I noticed a growing trend toward direct manipulation. That is, to work with an object, users would try to click it or drag it. They would do this without thinking. Even long-time users, faced with a new feature (studies from 2005-2006), would try direct manipulation first:

  • 100% of the test subjects clicked a cube, trying to select it.
  • 100% of the test subjects dragged a point or line, trying to move it.
  • 100% of the test subjects clicked in the window, trying to create a point.
  • 100% of the test subjects dragged across points, trying to select them.

But the new features were built on the legacy code, so they had the command-driven interaction style. A simple click on the object was usually a dead end for users.

And the users would say: “Darn,” and then look for another vector—another pathway—to complete the task.
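For contrast, the interaction those users kept attempting is simple to express in a modern toolkit. A minimal sketch, assuming a hypothetical draggable rectangle on a tkinter canvas:

    import tkinter as tk

    # Direct manipulation: click the object and drag it. No menu
    # command, no dialog box, no distance to type.
    root = tk.Tk()
    canvas = tk.Canvas(root, width=300, height=200)
    canvas.pack()
    box = canvas.create_rectangle(40, 40, 90, 90, fill="steelblue")

    last = {"x": 0, "y": 0}

    def on_press(event):
        last["x"], last["y"] = event.x, event.y

    def on_drag(event):
        canvas.move(box, event.x - last["x"], event.y - last["y"])
        last["x"], last["y"] = event.x, event.y

    canvas.tag_bind(box, "<ButtonPress-1>", on_press)
    canvas.tag_bind(box, "<B1-Motion>", on_drag)
    root.mainloop()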

The reasons we didn’t provide direct manipulation:

  • “Our users are used to the way it is now.” Clearly, usability-test results negated that argument. Users are not so accustomed to old-style interaction, because their first instinct for new CAD tasks in an existing product was direct manipulation.
  • “There’s not enough bang for the buck” because the opportunity cost (the cost of skipping other possible projects) was deemed too great. It’s hard to argue with this, as a usability analyst. The company opted for more features, and may have increased its risk of being leapfrogged by the competition, as discussed in an earlier blog.

From napkin to Five Sketches™

In 2007, a flash of insight hit me, which led to the development of the Five Sketches™ method for small groups who need to design usable software. Looking back, it was an interesting journey.

The setting. I was working on a two-person usability team faced with six major software and web products to support. We were empowered to do usability, but not design. At the time, the team was in the early stages of Nielsen’s Corporate Usability Maturity model. Design, it was declared, would be the responsibility of the developers, not the usability team. I was faced with this challenge:

How to get usable products
from software and web developers
by using a method that is
both reliable and repeatable.

The first attempt. I introduced each development team to the usability basics: user personas, requirements, paper prototyping, heuristics, and standards. Some developers went for usability training. In hindsight, it’s easy to see that none of this could work without a formal design process in place.

The second attempt. I continued to read, to listen, and to ask others for ideas. The answer came in separate pieces, from different sources. For several months, I was fumbling in the metaphorical dark, having no idea that the answer was within reach. Then, after a Microsoft product launch on Thursday, 18 October 2007, the light went on. While sitting on a bar stool, the event’s guest speaker, GK VanPatter, mapped out an idea for me on a cocktail napkin:

  1. Design requires three steps.
  2. Not everyone is comfortable with each of those steps.
  3. You have to help them.

The quadrants in his sketch represent the conative preferences, or preferred problem-solving styles.

I recognised that I already had an answer to step 3, because I’d heard Bill Buxton speak at the 2007 UPA conference, four months earlier. I could help developers be comfortable designing by asking them to sketch.

It was more easily said than done. Everyone on that first team showed dedication and courage. We had help from a Vancouver-based process expert who skilfully debriefed each of us and then served us a summary of remaining problems to iron out. And, when we were done, we had the beginnings of an ideation-and-design method.

Since then, it’s been refined with additional teams of design participants, and it will be refined further—perhaps changed significantly to suit changing circumstances. But that’s the story of the first year.

Functional sophistication, not complexity

Some software companies add ever more features to their software as a way to differentiate it from its competitors. Lucinio Santos’ lengthy analysis of sophistication versus complexity includes this graphic:

[Figure: functional sophistication, not complexity]

An excellent example of simplification is the Microsoft Office ribbon. Many users who upgrade dislike the ribbon for months because of the sheer amount of GUI change it imposes, but the ribbon successfully simplifies the interface and makes existing features more discoverable.

Incidentally, the Office ribbon was designed by a design team using generative design. I facilitated a ribbon-design project in which a team of developers used Five Sketches™—a method that incorporates generative design.