Unreliability of self-reported user data

Many people are bad at estimating how often and for how long they’re on the phone. Interestingly, you can predict who will overestimate and who will underestimate their phone usage, according to the 2009 study “Factors influencing self-report of mobile phone use” by Dr Lada Timotijevic et al. For this study, a self-reported estimate is considered accurate if it is within 10% of the actual number:

Defining 'accuracy'

                           Underestimated    Accurate      Overestimated
                           (% of people)     (% of people) (% of people)
Number of phone calls
  High user                     71%               10%            19%
  Medium user                   53%               21%            26%
  Low user                      33%               16%            51%
Duration of phone calls
  High user                     41%               20%            39%
  Medium user                   27%               17%            56%
  Low user                      13%                6%            81%
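
The study’s 10% band is easy to operationalise. Here is a minimal Python sketch; the function name and the sample numbers are my own illustration, not from the study:

```python
# Classify a self-report against a logged actual, using the study's rule that
# an estimate is "accurate" when it falls within 10% of the actual number.
# Function name and sample data are illustrative, not from the study.

def classify_estimate(reported: float, actual: float, band: float = 0.10) -> str:
    """Label a self-report as Underestimated / Accurate / Overestimated."""
    lower = actual * (1 - band)
    upper = actual * (1 + band)
    if reported < lower:
        return "Underestimated"
    if reported > upper:
        return "Overestimated"
    return "Accurate"

# A high user who logs 50 calls a week but reports 30:
print(classify_estimate(30, 50))   # Underestimated
print(classify_estimate(52, 50))   # Accurate
print(classify_estimate(70, 50))   # Overestimated
```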

If people are bad at estimating their phone use, does this mean that people are bad at all self-reporting tasks?

Not surprisingly, it depends on how long it’s been since the event they’re trying to remember. It also depends on other factors. Here are some findings that should convince you to be careful with the self-reported user data that you collect.

What’s the problem with self-reported data?

On questions that ask respondents to remember and count specific events, people frequently have trouble because their ability to recall is limited. Instead of answering “I’m not sure,” people typically use partial information from memory to construct or infer a number. In 1987, N.M. Bradburn et al. found that U.S. respondents to various surveys had trouble answering such questions as:

  • During the last 2 weeks, on days when you drank liquor, about how many drinks did you have?
  • During the past 12 months, how many visits did you make to a dentist?
  • When did you last work at a full-time job?

To complicate matters, not all self-report data is suspect. Can you predict which data is likely to be accurate or inaccurate?

  • Self-reported Madagascar crayfish harvesting—quantities, effort, and harvesting locations—collected in interviews was shown to be reliable (2008, Julia P. G. Jones et al.).
  • Self-reported eating behaviour by people with binge-eating disorders was shown to be “acceptably” reliable, especially for bulimic episodes (2001, Carlos M. Grilo et al.).
  • Self-reported condom use was shown to be accurate over the medium term, but not in the short term or long term (1995, James Jaccard et al.).
  • Self-reported numbers of sex partners were underreported, and sexual experiences and condom use were overreported, a year later when compared to self-reported data from the time (2002, Maryanne Garry et al.).
  • Self-reported questions about family background, such as father’s employment, result in “seriously biased” research findings in studies of social mobility in The Netherlands—by as much as 41% (2008, Jannes de Vries and Paul M. de Graaf).
  • Participation in a weekly worship service is overreported in U.S. polls: polls say 40%, but attendance data says 22% (2005, C. Kirk Hadaway and Penny Long Marler).

Can you improve self-reported data that you collect?

Yes, you can. Consider these techniques:

  • Decomposition into categories. Estimates of credit-card spending get more accurate if respondents are asked for separate estimates of their expenditures on, say, entertainment, clothing, travel, and so on (2002, J. Srivastava and P. Raghubir).
  • For your quantitative or qualitative usability research or other user research, it’s easy to write your survey questions or your lines of inquiry so they ask for data in a decomposed form.
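
The idea can be sketched in a few lines of Python: collect one estimate per category and derive the overall figure, instead of asking for a single global number. The categories and amounts here are invented for illustration:

```python
# Decomposition sketch: per-category estimates summed into an overall figure.
# Categories and sample amounts are illustrative only.

categories = ["entertainment", "clothing", "travel", "groceries", "other"]

def total_from_decomposed(estimates: dict[str, float]) -> float:
    """Combine per-category estimates into one overall estimate."""
    return sum(estimates.get(c, 0.0) for c in categories)

respondent = {
    "entertainment": 120.0,
    "clothing": 80.0,
    "travel": 200.0,
    "groceries": 350.0,
    "other": 60.0,
}
print(total_from_decomposed(respondent))  # 810.0
```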

  • Real-time data collection. Collecting self-reported real-time data from patients in their natural environments “holds considerable promise” for reducing bias (2002, Michael R. Hufford and Saul Shiffman).
    Collecting real-time self-report data
  • This finding is from 2002. Social-media tools and handheld devices now make real-time data collection more affordable and less unnatural. For example, use text messages or Twitter to send reminders and receive immediate, private responses.
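
The scheduling side of real-time collection can be sketched quickly. The following Python sketch generates random prompt times within waking hours, in the spirit of experience-sampling protocols; the sampling window and the number of prompts are my assumptions, not the authors’ procedure:

```python
# Sketch of a signal-contingent sampling schedule: prompt each participant at
# random times during waking hours. Window and prompt count are assumptions.
import random
from datetime import datetime, timedelta

def daily_prompts(day: datetime, n: int = 4,
                  start_hour: int = 9, end_hour: int = 21) -> list[datetime]:
    """Pick n distinct random prompt times between start_hour and end_hour."""
    window = (end_hour - start_hour) * 60          # minutes in the window
    minutes = random.sample(range(window), n)      # distinct random offsets
    base = day.replace(hour=start_hour, minute=0, second=0, microsecond=0)
    return sorted(base + timedelta(minutes=m) for m in minutes)

for t in daily_prompts(datetime(2024, 5, 6)):
    print(t.strftime("%H:%M"))
```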

  • Fuzzy set collection methods. Fuzzy-set representations provide a more complete and detailed description of what participants recall about past drug use (2003, Georg E. Matt et al).
  • If you’re afraid of math but want to get into fuzzy sets, try a textbook (for example, Fuzzy-Set Social Science by Charles Ragin), audit a fuzzy-math course for the social sciences (auditing is a low-stakes way to get things explained), or hire a tutor in math or sociology/anthropology to teach it to you.
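
One way to picture a fuzzy-set response format: instead of forcing a single number, let respondents grade how well each frequency range describes their recall. A minimal Python sketch follows; the ranges, grades, and centroid defuzzification are my own illustration, not Matt et al.’s instrument:

```python
# Fuzzy-set self-report sketch: membership grades (0..1) over frequency
# ranges, collapsed to a point estimate with a simple weighted average.
# Values are invented for illustration.

# Midpoint of each frequency range -> respondent's membership grade
recall = {
    2: 0.2,    # "about 2 times a month":  barely fits
    5: 0.9,    # "about 5 times a month":  fits very well
    10: 0.4,   # "about 10 times a month": fits somewhat
}

def centroid(fuzzy: dict[float, float]) -> float:
    """Defuzzify with a weighted average (centroid) of the grades."""
    total = sum(fuzzy.values())
    return sum(value * grade for value, grade in fuzzy.items()) / total

print(round(centroid(recall), 2))  # 5.93
```

Note that the fuzzy representation also preserves the respondent’s uncertainty (the grades themselves), which a single point estimate throws away.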

Also, when there’s a lot at stake, use multiple data sources to examine the extent of self-report response bias, and to determine whether it varies as a function of respondent characteristics or assessment timing (2003, Frances K. Del Boca and Jack Darkes). Remember that your qualitative research is also one of those data sources.
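
A multi-source bias check can be as simple as comparing self-reports against a second data source and breaking the difference down by respondent group. This Python sketch uses invented field names and data:

```python
# Compare self-reports with a second source (e.g. logs) and report the mean
# bias per respondent group. Positive bias = overreporting. Data is invented.

records = [
    # (group, self-reported, logged)
    ("high", 30, 50), ("high", 45, 48),
    ("low",  12,  8), ("low",  10,  6),
]

def mean_bias_by_group(rows):
    """Average (reported - logged) per group."""
    diffs: dict[str, list[float]] = {}
    for group, reported, logged in rows:
        diffs.setdefault(group, []).append(reported - logged)
    return {g: sum(v) / len(v) for g, v in diffs.items()}

print(mean_bias_by_group(records))  # {'high': -11.5, 'low': 4.0}
```

In this toy data the high users underreport and the low users overreport, the same pattern as the phone-use table at the top of this section.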

User mismatch: discard data?

When you’re researching users, every once in a while you come across a user who’s an anomaly. You must decide whether to exclude their data points from the set or whether to adjust your model of the users.

Let me tell you about one such user. I’ll call him Bob (not his real name). I met Bob during a day of back-to-back usability tests of a specialised software package. The software has two categories of users:

  • Professionals who interpret data and use it in their design work.
  • Technicians who mainly enter and check data. A senior technician may do some of the work of a professional user.

When Bob walked in, I went through the usual routine: welcome; sign this disclaimer; tell me about the work you do. Bob’s initial answers identified him as a professional user. Once on the computer, though, Bob was unable to complete the first step of the test scenario. Rather than try to solve the problem, he sat back, folded his hands, and said: “I don’t know how to use this.” Since Bob was unwilling to try the software, I instead had a conversation (actually an unstructured interview) with him. Here’s what I learned:

My observation                                                        For this product, this is…
No formal product training; taught by his colleagues                  Typical of more than half the professionals.
University degree only indirectly related to his job                  Atypical of professionals.
Young (graduated 3 years ago)                                         Atypical of professionals, but desirable because many of his peers are expected to retire within a decade.
Moved to his current town for his spouse’s job; unwilling to          Atypical. Professionals in this industry often work in remote locations for high pay.
relocate for work
Risk averse                                                           Typical of the professionals.
Easily discouraged; not inclined to troubleshoot                      Atypical. Professionals take responsibility for driving their own troubleshooting.
Completes the same task once or several times a day, with             Atypical of professionals; typical of technicians.
updated data each time

I decided to discard Bob’s data from the set.

The last two observations are characteristic of a rote user. Some professionals are rote users because they don’t know the language of the user interface, but this did not apply to Bob. There was a clear mismatch between the work that Bob said he does and both his lack of curiosity and non-performance in the usability lab. These usability tests took place before the 2008 economic downturn, when professionals in Bob’s industry were hard to find, so I quietly wondered whether hiring Bob had been a desperation move on the part of his employer.

If Bob had been a new or emerging type of user, discarding his data would have been a mistake. Imagine if Bob had been part of a new group of users:

  • What would be the design implications?
  • What would be the business implications?
  • Would we need another user persona to represent users like Bob?