Unreliability of self-reported user data

Many people are bad at estimating how often and how long they’re on the phone. Interestingly, you can predict who will overestimate and who will underestimate their phone usage, according to the 2009 study, “Factors influencing self-report of mobile phone use” by Dr Lada Timotijevic et al. For this study, a self-reported estimate is considered  accurate if it is within 10% of the actual number:

Defining 'accuracy'

Underestimated Accurate Overestimated
Number of phone calls (number of people) (number of people) (number of people)
High user 71% 10% 19%
Medium user 53% 21% 26%
Low user 33% 16% 51%
Duration of phone calls
High user 41% 20% 39%
Medium user 27% 17% 56%
Low user 13% 6% 81%

If people are bad at estimating their phone use, does this mean that people are bad at all self-reporting tasks?

Not surprisingly, it depends how long it’s been since the event they’re trying to remember. It also depends on other factors. Here are some factoids that should convince you to be careful with self-reported user data that you collect.

What’s the problem with self-reported data?

On questions that ask respondents to remember and count specific events, people frequently have trouble because their ability to recall is limited. Instead of answering “I’m not sure,” people typically use partial information from memory to construct or infer a number. In 1987, N.M. Bradburn et al found that U.S. respondents to various surveys had trouble answering such questions as:

  • During the last 2 weeks, on days when you drank liquor, about how many drinks did you have?
  • During the past 12 months, how many visits did you make to a dentist?
  • When did you last work at a full-time job?

To complicate matters, not all self-report data is suspect. Can you predict which data is likely to be accurate or inaccurate?

  • Self-reported Madagascar crayfish harvesting—quantities, effort, and harvesting locations—collected in interviews was shown reliable (2008, Julia P. G. Jones et al).
  • Self-reported eating behaviour by people with binge-eating disorders was shown “acceptably” reliable, especially for bulimic episodes (2001, Carlos M. Grilo et al).
  • Self-reported condom use was shown accurate over the medium term, but not in the short term or long term (1995, James Jaccard et al).
  • Self-reported numbers of sex partners were underreported and sexual experiences and condom use overreported a year later when compared to self-reported data at the time (2002, Maryanne Garry et al).
  • Self-reported questions about family background, such as father’s employment, result in “seriously biased” research findings in studies of social mobility in The Netherlands—by as much as 41% (2008, Jannes Vries and Paul M. Graaf).
  • Participation in a weekly worship service is overreported in U.S. polls. Polls say 40% but attendance data says 22% (2005, C. Kirk Hadaway and Penny Long Marler).
Can you improve self-reported data that you collect?

Yes, you can. Consider these:

  • Decomposition into categories. Estimates of credit-card spending gets more accurate if respondents are asked for separate estimates of their expenditures on, say, entertainment, clothing, travel, and so on (2002, J. Srivastava and P. Raghubir ).
  • For your quantitative or qualitative usability research or other user research, it’s easy to write your survey questions or your lines of inquiry so they ask for data in a decomposited form.

  • Real-time data collection. Collecting self-reported real-time data from patients in their natural environments “holds considerable promise” for reducing bias (2002, Michael R. Hufford and Saul Shiffman).
    Collecting real-time self-report data
  • This finding is from 2002. Social-media tools and handheld devices now make real-time data collection more affordable and less unnatural. For example, use text messages or Twitter to send reminders and receive immediate direct/private responses.

  • Fuzzy set collection methods. Fuzzy-set representations provide a more complete and detailed description of what participants recall about past drug use (2003, Georg E. Matt et al).
  • If you’re afraid of math but want to get into fuzzy sets, try a textbook (for example, Fuzzy set social science by Charles Ragin), audit a fuzzy-math course for social sciences (auditing is a low-stakes way to get things explained), or hire a tutor in math or sociology/anthropology to teach it to you.

Also, when there’s a lot at stake, use multiple data sources to examine the extent of self-report response bias, and to determine whether it varies as a function of respondent characteristics or assessment timing (2003, Frances K. Del Boca and Jack Darkes). Remember that your qualitative research is also one of those data sources.

One Reply to “Unreliability of self-reported user data”

  1. Jerome,

    It’s interesting that as compared to the group, a high user is the worst reporter of frequency but the best reporter of duration. The number of calls helps them more accurately report duration, but that they get lots (more than the magic 7, say) may mean they can’t visualize each instance so counting is not possible.

    This is useful for me. In my business I think I can say “People who get a lot of phone calls underestimate (71%)
    how many they actually get – the same may be true for emails.” 😉


Comments are closed.