Sunday, October 27, 2013

New Posts on IonPsych - October 27, 2013

This fall, I am teaching a graduate seminar on speaking/writing for general audiences. As part of the class, students blog at www.ionpsych.com. Each week, I'll provide a short summary of the latest posts.


The latest posts on www.ionpsych.com

Anna Popova discusses the value of negative experiences and imagination in decision making.

Jim Monti describes the best way to stave off the cognitive costs of aging.

Luis Flores describes a recent study of depression and argues that people with depression may enjoy activities just as much as those without depression, but they are less willing to work to experience those activities.

Christina Tworek muses on the causes of the large disparity between what scientists know and what the public knows of science.

Emily Hankosky critiques claims in a recent book espousing personal responsibility for overcoming addiction. She makes a case that discounting neurobiological bases of addiction is irresponsible.

Lindsey Hammerslag discusses the importance of teaching about certainty and uncertainty in science.

Sunday, October 13, 2013

New Posts on IonPsych - 13 October 2013

This fall, I am teaching a graduate seminar on speaking/writing for general audiences. As part of the class, students blog at www.ionpsych.com. Each week, I'll provide a short summary of the latest posts.


The latest posts on www.ionpsych.com

H. A. Logis explores how we mold our behaviors to the people around us.

Brian Metzger demonstrates how perception often isn't subject to free will.

Emily Kim describes alternatives to academia for social psychologists.

Joachim Operskalski evaluates how new developments in neuroscience might improve treatment of psychiatric disorders and why we shouldn't be so quick to dismiss Prozac. 

Anna Popova explains how to make statistics interesting.

Sunday, October 6, 2013

New Posts on www.ionpsych.com (Oct 6, 2013)

This fall, I am teaching a graduate seminar on speaking/writing for general audiences. As part of the class, students blog at www.ionpsych.com. Each week, I'll provide a short summary of the latest posts.


The latest posts on www.ionpsych.com

Jim Monti discusses why omega-3 fatty acids are important for your brain, and why diet matters.

Aldis Sipolins presents an alternative to random assignment when dealing with small samples.

Carolyn Hughes considers whether you really experience pain when you see someone else hurting. 

Monday, September 30, 2013

New Posts on IonPsych

This fall, I am teaching a graduate seminar on speaking/writing for general audiences. As part of the class, students blog at www.ionpsych.com. Each week, I'll provide a short summary of the latest posts.


The latest posts on www.ionpsych.com

Carolyn Hughes discusses the neural basis of first impressions.

Luis Flores addresses the difference between worrying and generalized anxiety disorder.

Lindsey Hammerslag considers whether scientists are sexist for failing to deal with monkey cramps. 

H. A. Logis revisits claims that bullying kills.

Christina Tworek examines why some people have a problem with boys wearing nail polish.

Monday, September 23, 2013

New posts on www.ionpsych.com

This fall, I am teaching a graduate seminar on speaking/writing for general audiences. As part of the class, students blog at www.ionpsych.com. Each week, I'll provide a short summary of the latest posts.


The latest posts on www.ionpsych.com

Anna Popova explains why individual and group decisions are treated differently by researchers studying decision making (and why they shouldn't be).

Jim Monti examines how your diet might affect your risk of Alzheimer's disease.

Aldis Sipolins covers how you might improve your video game playing (and learning in general) by applying direct current to your head. 

Emily Hankosky describes a breakthrough study that might change how people have children in the not-too-distant future.

Judy Chiu argues that stress isn't inherently bad for you - what matters is how you think about stress.

Thursday, September 5, 2013

19 questions about video games, multitasking, and aging (a HI-BAR commentary on a new Nature paper)

HI-BAR (Had I Been A Reviewer)

A post-publication review of Anguera et al (2013). Video game training enhances cognitive control in older adults. Nature, 501, 97-101.

For more information about HI-BAR reviews, see my post from earlier today.


In a paper published this week in Nature, Anguera et al reported a study in which older adults were trained on a driving video game for 12 hours. Approximately 1/3 of the participants engaged in multitasking training (both driving and detecting signs), another 1/3 did the driving or sign tasks separately without having to do both at once, and the final 1/3 was a no-contact control. The key findings in the paper: 

  • After multitasking training, the seniors attained "levels beyond those achieved by untrained 20-year-old participants, with gains persisting for 6 months"
  • Multitasking training "resulted in performance benefits that extended to untrained cognitive control abilities"
  • Neural measures of midline-frontal theta power and frontal-parietal theta coherence correlated with these improvements
This is one of many recent papers touting the power of video games to improve cognition that have been published in a top journal and received glowing (almost breathless) media coverage: The NY Times reports "A Multitasking Video Game Makes Old Brains Act Younger," a news story in Nature claims "Gaming Improves Multitasking Skills," and The Atlantic titles its story "How to Rebuild an Attention Span." (Here's one exception that notes a few limitations.)

In several media quotes, the senior author on the paper (Gazzaley) admirably cautions against over-hyping these findings (e.g., "Video games shouldn’t now be seen as a guaranteed panacea" in the Nature story). Yet over-hyping is exactly what we have in the media coverage (and a bit in the paper as well).

The research is not bad. It's a reasonable, publishable first study that requires a bit more qualification and more limited conclusions: Some of the strongest claims are not justified, the methods and findings have limitations, and none of those shortcomings is acknowledged or addressed. If you are a regular reader of this blog, you're familiar with the problems that plague most such studies. Unfortunately, it appears that the reviewers, editors, and authors did not address them.

In the spirit of Rolf Zwaan's recent "50 questions" post (although this paper is far stronger than the one he critiqued), here are 19 comments/questions about the paper and supplementary materials (in a somewhat arbitrary order). I hope that the authors can answer many of these questions by providing more information. Some might be harder to address. I would be happy to post their response here if they would like.

19 Questions and Comments 

1.  The sample size is small given the scope of the claims, averaging about 15 per group. That's worrisome -- it's too small a sample to be confident that random assignment compensates for important unknown differences between the groups.

2. The no-contact control group is of limited value. All it tells us is whether the training group improved more than would be expected from just re-taking the same tests. It's not an adequate control group to draw any conclusions about the effectiveness of training. It does nothing to control for motivation, placebo effects, differences in social contact, differences in computer experience, etc. Critically, the relative improvements due to multitasking training reported in the paper are consistently weaker (and fewer are statistically significant) when the comparison is to the active "single task" control group. According to Supplementary Table 2, out of the 11 reported outcome measures, the multitasking group improved more than the no-contact group on 5 of those measures, and they improved more than the single-task control on only 3. 

3. The dual-task element of multitasking is the mechanism that purportedly underlies transfer to the other cognitive tasks, and neither the active nor the no-contact control included that interference component. If neither control group had the active ingredient, why were the effects consistently weaker when the multitasking group was compared to the single-task group than when it was compared to the no-contact group? That suggests the possibility of a differential placebo effect: Regardless of whether the condition included the active ingredient, participants might improve because they expected to improve.

4. The active control group is relatively good (compared to those often used in cognitive interventions) - it uses many of the same elements as the multitasking group and is fairly closely matched. But, the study included no checks for differential expectations between the two training groups. If participants expected greater improvements on some outcome measures from multitasking training than from single-task training, then some or all of the benefits for various outcome measures might have been due to expectations rather than to any benefits of dual-task training. For details, see our paper in Perspectives that discusses this pervasive problem with active control groups. If you want the shorter version, see my blog post about it. Just because a control group is active does not mean that it accounts for differential expectations across conditions.

5. The paper reports that improvements in the training task were maintained for 6 months. That's welcome information, but not particularly surprising (see #13 below). The critical question is whether the transfer effects were long-lasting. Were they? The paper doesn't say. If they weren't, then all we know is that subjects retained the skills they had practiced, and we know nothing about the long-term consequences of that learning for other cognitive skills.

6. According to Figure 9 in the supplement, 23% of the participants who enrolled in the study were dropped from the study/analyses (60 enrolled, 46 completed the study). Did dropout or exclusion differentially affect one group? If participants were dropped based on their performance, how were the cutoffs determined? Did the number of subjects excluded for each reason vary across groups? Are the results robust to different cut-offs? What are the implications for the broad use of this type of training if nearly a quarter of elderly adults cannot do the tasks adequately?

7. Supplementary Table 2 reports 3 significant outcome measures out of 11 tasks/measures (when comparing to the active control group). Many of those tasks include multiple measures and could have been analyzed differently. Consider also that each measure could be compared to either control group and that it also would have been noteworthy if the single-task group had outperformed the no-contact group. That means there were a very large number of possible statistical tests that, if significant, could have been interpreted as supporting transfer of training. I see no evidence of correction for multiple tests. Only a handful of these many tests were significant, and most were just barely so (the session x group interactions were p=.08, p=.03, and p=.02). For the crucial comparisons of the multitasking group to each control group, the table reports only a "yes" or "no" for statistical significance at .05, and the significant effects must be close to that boundary. (There also are oddities in the table, like a reported significant effect with d=.67 but a non-significant one with d=.68 for the same group comparison.) With correction for multiple comparisons, I'm guessing that none of these effects would reach statistical significance (see the sketch following question 19). A confirmatory replication with a larger sample would be needed to show that the few significant results (with small sample sizes) were not just false positives.

8. The pattern of outcome measures is somewhat haphazard and inconsistent with the hypothesis that dual-task interference is the reason for differential improvements. For example, if the key ingredient in dual-task training is interference, why didn't multitasking training lead to differential improvement on the dual-task outcome measure? That lack of a critical finding is largely ignored. Similarly, why was there a benefit for the working memory task that didn't involve distraction/interference? Why wasn't there a difference in the visual short-term memory task either with or without distraction? Why was there a benefit for the working memory task without distraction (basically a visual memory task) but not for the visual memory task? The pattern of improvements seems inconsistent with the proposed mechanism for improvement.
 
9. The study found that practice on a multitasking video game improves performance on that game to the level of a college student. Does that mean that the game improved multitasking abilities to the level of a 20-year-old? No, although you'd never know that from the media coverage. The actual finding is that after 12 hours of practice on a game, seniors play as well as a 20-year-old who is playing the game for the first time. The finding does not show that multitasking training improved multitasking more broadly. In fact, it did not even transfer to a different dual task. Did they improve to the level of 20-year-olds on any of the transfer tasks? That seems unlikely, but if they did, that would be bigger news.

10. The paper reports only difference scores and does not report any means or standard deviations. This information is essential to help the reader decide whether improvements were contaminated by regression to the mean or influenced by outliers. Perhaps the authors could upload the full data set to openscienceframework.org or another online repository to make those numbers available?

11. Why are there such large performance decreases in the no-contact group (Figures 3a and 3b of the main paper)? This is a slowing of 100ms, a pretty massive decline for just one month of aging. Most of the other data are presented as z-scores, so it's impossible to tell whether the reported interactions are driven by a performance decrease in one or both of the control groups rather than an improvement in the multitasking group. That's another reason why it's essential to report the performance means and standard deviations for both the pre-test and the post-test.

12. It seems a bit generous to claim (p.99) that, in addition to the significant differences on some outcome measures, there were trends for better performance in other tasks like the UFOV. Supplementary Figure 15 shows no difference in UFOV improvements between the multitasking group and the no-contact control. Moreover, because these figures show Z-scores, it's impossible to tell whether the single-task group is improving less or even showing worse performance. Again, we need the means for pre- and post-testing to evaluate the relative improvements.

13. Two of the core findings of this paper, that multitasking training can improve the performance of elderly subjects to the levels shown by younger subjects and that those improvements last for months, are not novel. In fact, they were demonstrated nearly 15 years ago in a paper that wasn't cited. Kramer et al (1999) found that giving older adults dual-task training led to substantial improvements on the task, reaching the levels of young adults after a small amount of training. Moreover, the benefits of that training lasted for two months. Here are the relevant bits from the Kramer et al abstract: 
Young and old adults were presented with rows of digits and were required to indicate whether the number of digits (element number task) or the value of the digits (digit value task) were greater than or less than five. Switch costs were assessed by subtracting the reaction times obtained on non-switch trials from trials following a task switch.... First, large age-related differences in switch costs were found early in practice. Second, and most surprising, after relatively modest amounts of practice old and young adults switch costs were equivalent. Older adults showed large practice effects on switch trials. Third, age-equivalent switch costs were maintained across a two month retention period. 

14. While we're on the subject of novelty, the authors state in their abstract: "These findings ... provide the first evidence, to our knowledge, of how a custom-designed video game can be used to assess cognitive abilities across the lifespan, evaluate underlying neural mechanisms, and serve as a powerful tool for cognitive enhancement." Unfortunately, they seem not to have consulted the extensive literature on the effects of training with the custom-made game Space Fortress. That game was designed by cognitive psychologists and neuroscientists in the 1980s to study different forms of training and to measure cognitive performance. It includes components designed to train and test memory, attention, motor control, etc. It has been used with young and old participants, and it has been studied using ERP, fMRI, and other measures. The task has been used to study cognitive training and transfer of training both to laboratory tasks and to real-world performance. It has also been used to study different forms of training, some of which involve explicit multitasking and others that involve separating different task components. There are dozens of papers (perhaps more than 100) using that game to study cognitive abilities, training, and aging. Those earlier studies suffered from many of the same problems that most training interventions do, but they do address the same issues studied in this paper. The new game looks much better than Space Fortress, and undoubtedly is more fun to play, but it's not novel in the way the authors claim.

15. Were the experimenters who conducted the cognitive testing blind to the condition assignment? That wasn't stated, and if they were not, then experimenter demands could contribute to differential improvements during the post-test. 

16. Could the differences between conditions be driven by differences in social contact and computer experience? The extended methods state, "if requested by the participant, a laboratory member would visit the participant in their home to help set up the computer and instruct training." How often was such assistance requested? Did the rates differ across groups? Later, the paper states, "All participants were contacted through email and/or phone calls on a weekly basis to encourage and discuss their training; similarly, in the event of any questions regarding the training procedures, participants were able to contact the research staff through phone and email." Presumably, the authors did not really mean "all participants." What reason would the no-contact group have to contact the experimenters, and why would the experimenters check in on their training progress? As noted earlier, differences like this are one reason why no-contact controls are entirely inadequate for exploring the necessary ingredients of a training improvement.

17. Most of the assessment tasks were computer based. Was there any control for prior computer experience or the amount of additional assistance each group needed? If not, the difference between these small samples might partly be driven by baseline differences in computer skills that were not equally distributed across conditions. The training tasks might also have trained the computer skills of the older participants or increased their comfort with computers. If so, improved computing skills might account for any differences in improvement between the training conditions and the no-contact control.

18. The paper states, "Given that there were no clear differences in sustained attention or working memory demands between MTT and STT, transfer of benefits to these untrained tasks must have resulted from challenges to overlapping cognitive control processes." Why are there no differences? Presumably maintaining both tasks in mind simultaneously in the multitasking condition places some demand on working memory. And, the need to devote attention to both tasks might place a greater demand on attention as well. Perhaps the differences aren't clear, but it seems like an unverified assumption that they tap these processes equally.

19. The paper reports a significant relationship between brain measures and TOVA improvement (p = .04).  The “statistical analyses” section reports that one participant was excluded for not showing the expected pattern of results after training (increased midline frontal theta power).  Is this a common practice? What is the p value of the correlation when this excluded participant is included?  Why aren’t correlations reported for the relationship between the transfer tasks and training performance or brain changes for the single-task control group? If the same relationships exist for that group, then that undermines the claim that multitask training is doing something special. The authors report that these relationships aren't significant, but the ones for the multitasking group are not highly significant either, and the presence of a significant relationship in one case and not in the other does not mean that the effects are reliably different for the two conditions. 
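
As flagged in question 7, here is a minimal sketch (mine, not anything from the paper) of what a Holm-Bonferroni correction does to the three reported session x group interaction p-values. Even this generous version, which ignores the many other tests that could have been run, leaves none of them significant:

```python
# Holm-Bonferroni correction applied to the three reported interaction p-values
# (illustration only; the paper did not report any correction for multiple tests).
def holm_bonferroni(pvals, alpha=0.05):
    """Return a list of booleans: does each p-value survive the correction?"""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices from smallest p to largest
    survives = [False] * m
    for step, i in enumerate(order):
        if pvals[i] <= alpha / (m - step):  # compare k-th smallest p to alpha / (m - k + 1)
            survives[i] = True
        else:
            break  # once one test fails, every larger p-value fails as well
    return survives

print(holm_bonferroni([0.08, 0.03, 0.02]))  # [False, False, False]
```

Correcting over the full set of possible comparisons described in question 7 would only make the threshold stricter.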
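
For question 19, a leave-one-out sensitivity check along the lines sketched below would show how much the brain-behavior correlation and its p-value hinge on any single participant. This is only a sketch; `brain_change` and `tova_improvement` are hypothetical placeholders, since the study's data are not publicly available:

```python
# Leave-one-out sensitivity check for a correlation in a small sample (sketch only).
import numpy as np
from scipy import stats

def leave_one_out_correlation(x, y):
    """Print Pearson r and p with all cases, then with each single case dropped."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r, p = stats.pearsonr(x, y)
    print(f"all n={len(x)}: r={r:.2f}, p={p:.3f}")
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        r_i, p_i = stats.pearsonr(x[keep], y[keep])
        print(f"drop case {i:2d}: r={r_i:.2f}, p={p_i:.3f}")

# Usage with the real data would look like (variable names are hypothetical):
# leave_one_out_correlation(brain_change, tova_improvement)
```

If the reported p = .04 moves above .05 whenever one or two participants are dropped (or added back in), that's a sign the relationship is too fragile to carry much weight.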


Conclusions

Is this a worthwhile study? Sure. Is it fundamentally flawed? Not really. Does it merit extensive media coverage due to its importance and novelty? Probably not. Should seniors rush out to buy brain training games to overcome their real-world cognitive declines? Not if their decision is based on this study. Should we trust claims that such games might have therapeutic benefits? Not yet.

Even if we accept all of the findings of this paper as correct and replicable, nothing in this study shows that the game training will improve an older person's ability to function outside of the laboratory. Claims of meaningful benefits, either explicit or implied, should be withheld until demonstrations show improvements on real or simulated real-world tasks as well. 

This is a good first study of the topic, and it provides a new and potentially useful way to measure and train multitasking, but it doesn't merit quite the exuberance displayed in media coverage of it. If I were to rewrite the abstract to reflect what the study actually showed, it might sound something like this:
In three studies, we validated a new measure of multitasking, an engaging video game, by replicating prior evidence that multitasking declines linearly with age. Consistent with earlier evidence, we find that 12 hours of practice with multitasking leads to substantial performance gains for older participants, bringing their performance to levels comparable to those of 20-year-old subjects performing the task for the first time. And, they remained better at the task even after 6 months. The multitasking improvements were accompanied by changes to theta activity in EEG measures. Furthermore, an exploratory analysis showed that multitasking training led to greater improvements than did an active control condition for a subset of the tasks in a cognitive battery. However, the pattern of improvements on these transfer tasks was not entirely consistent with what we might expect from multitasking training, and the active control condition did not necessarily induce the same expectations for improvement. Future confirmatory research with a larger sample size and checks for differential expectations is needed to confirm that training enhances performance on other tasks before strong claims about the benefits of training are merited. The video game multitasking training we developed may prove to be a more enjoyable way to measure and train multitasking in the elderly.

HI-BAR (Had I Been A Reviewer)

HI-BAR
Had I Been a Reviewer

If you're a researcher, you undoubtedly have had the experience of reading a new paper in your specialty area and thinking to yourself, "Had I been a reviewer, I would have raised serious concerns about these findings and claims." Or, less charitably, you might ask, "How the hell did that paper survive peer review?"

Each paper is reviewed by only 2 or 3 people, and small samples can lead to flawed conclusions. Given that I can't insert myself into the review process in advance of publication, I will, on occasion, use my blog to post the sorts of comments I would have made Had I Been A Reviewer of the original manuscript. My comments won't always take the same form that they would have if I had reviewed the paper in advance of publication when constructive comments might improve a manuscript. Rather, they will comment on the strengths and shortcomings of the finished product. On occasion, when I have reviewed a manuscript and the paper was published without addressing major concerns, I might post the reviews here as well (I always sign my reviews, so they won't come as a surprise to the authors in such cases).

Not all of these posts will be take-downs or critiques, although some will be. Post-publication review can help to correct mistakes in the literature, identify controversies that might have been glossed over in a manuscript or in media coverage of it, and inspire future research. I hope that more researchers will take up the call and post their own HI-BAR post-publication reviews.

Monday, August 12, 2013

Good resources for science writing/speaking?

For a psychology graduate class I'm teaching this fall (on speaking/writing for general audiences), I'm trying to create a list of good resources on writing, speaking, and blogging about science. I'm hoping that you can help. 

I'm particularly interested in finding good discussions of the value and risks of blogging, suggestions for best practices in writing/speaking, etc. Do you have a favorite go-to source for such advice? Do you know of helpful resources for beginning science writers and speakers? If so, please leave them in the comments (or send them to me directly). I'll compile the full list and will post it here.

Wednesday, August 7, 2013

Stop the presses

Yesterday I encountered something I've never seen before: a formal press release from an academic society (SPSP) about a conference presentation of unpublished research:
http://www.eurekalert.org/pub_releases/2013-08/sfpa-vgb080213.php
A friend of mine forwarded it to me because it makes claims about the cognitive benefits of video game training, an area fraught with methodological problems that my colleagues and I have written about extensively (e.g., here's a recent blog post about a critique of such interventions). My guess is that the design shortcomings we discussed in that paper undermine the claims that these authors are making. But, I have no way to know. The actual research isn't available.

 Why does this work merit a press release now, before the research has been published? 

 The purpose of a press release is to draw public (and media) attention to a new finding, but in this case, the press release effectively is the finding because nobody can access the actual research. Journalists or science writers covering this study will have no more information than is available in the release itself, so they cannot verify that the research actually shows what the release claims that it does. In other words, the press release encourages churnalism rather than science reporting.

 In my view, academic societies should not be encouraging media coverage of research until the actual research is available for popular consumption. Doing so risks misleading the public. For this particular release, if the studies suffer from the problems we discussed in our recent article, then the conclusions might be unjustified and there would be a reasonable chance that the research would not survive the peer review process (I can only hope that reviewers would nix publication if the claims aren't justified). If that happened, then the press release would have hyped vapor-findings, claims that lack any underlying support. How does that benefit the popular appraisal of our field?

 Journalists and bloggers are free to discuss research they learn about at conferences, of course. And they typically do a good job in noting when findings are tentative (or giving enough details that others can evaluate the claims). But a formal press release from an academic society about unpublished research that is not available seems to me to be a different beast.

Are there cases in which an academic society should issue a press release based on a conference presentation? Do you think this sort of press release is acceptable? I'd be curious to hear the perspectives of other scientists and science writers. Let me know what you think.

Tuesday, July 9, 2013

Pop Quiz - What can we learn from an intervention study?

Pop Quiz

1. Why is a double-blind, placebo-controlled study with random assignment to conditions the gold standard for testing the effectiveness of a treatment?

2. If participants are not blind to their condition and know the nature of their treatment, what problems does that lack of control introduce?

3. Would you use a drug if the only study showing that it was effective used a design in which those people who were given the drug knew that they were taking the treatment and those who were not given the drug knew they were not receiving the treatment? If not, why not?

Stop reading now, and think about your answers. 


Most people who have taken a research methods class (or introductory psychology) will be able to answer all three. The gold standard controls for participant and experimenter expectations and helps to control for unwanted variation between the people in each group. If participants know their treatment, then their beliefs and expectations might affect the outcome. I would hope that you wouldn't trust a drug tested without a double-blind design. Without such a design, any improvement by the treatment group need not have resulted from the drug.

In a paper out today in Perspectives on Psychological Science, my colleagues (Walter Boot, Cary Stothart, and Cassie Stutts) and I note that psychology interventions typically cannot blind participants to the nature of the intervention—you know what's in your "pill." If you spend 30 hours playing an action video game, you know which game you're playing. If you are receiving treatment for depression, you know what is involved in your treatment. Such studies almost never confront the issues introduced by the lack of blinding to conditions, and most make claims about the effectiveness of their interventions when the design does not permit that sort of inference. Here is the problem:
If participants know the treatment they are receiving, they may form expectations about how that treatment will affect their performance on the outcome measures. And, participants in the control condition might form different expectations. If so, any difference between the two groups might result from the consequences of those expectations (e.g., arousal, motivation, demand characteristics, etc.) rather than from the treatment itself.
A truly double blind design addresses that problem—if people don't know whether they are receiving the treatment or the placebo, their expectations won't differ. Without a double blind design, researchers have an obligation to use other means to control for differential expectations. If they don't, then a bigger improvement in the treatment group tells you nothing conclusive about the effectiveness of the treatment. Any improvement could be due to the treatment, to different expectations, or to some combination of the two. No causal claims about the effectiveness of the treatment are justified.

If we wouldn't trust the effectiveness of a new drug when the only study testing it lacked a control for placebo effects, why should we believe a psychology intervention if it lacked any controls for differential expectations? Yet, almost all published psychology interventions attribute causal potency to interventions that lack such controls. Authors seem to ignore this known problem, reviewers don't block publication of such papers, and editors don't reject them.

Most psychology interventions have deeper problems than just a lack of controls for differential expectations. Many do not include a control group that is matched to the treatment group on everything other than the hypothesized critical ingredient of the treatment. Without such matching, any difference between the tasks could contribute to the difference in performance. Some psychology interventions use completely different control tasks (e.g., crossword puzzles as a control for working memory training, educational DVDs as a control for auditory memory training, etc.). Even worse, some do not even use an active control group, instead comparing performance to a "no-contact" control group that just takes a pre-test and a post-test. Worst of all, some studies use a wait-list control group that doesn't even complete the outcome measures before and after the intervention.

In my view, a psychology intervention that uses a waitlist or no-contact control should not be published. Period. Reviewers and editors should reject it without further consideration -- it tells us almost nothing about whether the treatment had any effect, and is just a pilot study (and a weak one at that). 

Studies with active control groups that are not matched to the treatment intervention should be viewed as suspect—we have no idea what differences between the treatment and control condition were necessary. Even closely matched control groups do not permit causal claims if the study did nothing to check for differential expectations.

To make it easy to understand these shortcomings, here is a flow chart from our paper that illustrates when causal conclusions are merited and what we can learn from studies with weaker control conditions (short answer -- not much):

Figure illustrating the appropriate conclusions as a function of the intervention design

Almost no psychology interventions even fall into that lower-right box, but almost all of them make causal claims anyway. That needs to stop.
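
For readers who prefer code to flow charts, here is a rough paraphrase in Python of the decision logic described in the preceding paragraphs (my summary of the reasoning above, not the published figure itself):

```python
# A minimal sketch of the logic: what conclusions an intervention design can support.
def allowable_conclusion(control, matched_except_ingredient=False,
                         expectations_checked=False):
    """Return the strongest conclusion a given intervention design supports."""
    if control in (None, "wait-list", "no-contact"):
        return ("No causal claims; at best a pilot result, since nothing rules out "
                "retest effects, motivation, placebo effects, or differential contact.")
    if control == "active":
        if not matched_except_ingredient:
            return ("Suspect: any difference between the treatment and control tasks "
                    "could explain the group difference.")
        if not expectations_checked:
            return ("No causal claims: differential expectations remain a plausible "
                    "explanation for the group difference.")
        return "Causal claims about the critical ingredient may be justified."
    raise ValueError(f"unknown control type: {control!r}")

print(allowable_conclusion("no-contact"))
print(allowable_conclusion("active", matched_except_ingredient=True,
                           expectations_checked=True))
```

Only the last branch, an active control matched on everything but the critical ingredient plus a check for differential expectations, corresponds to that lower-right box.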



If you want to read more, check out our OpenScienceFramework page for this paper/project. It includes answers to a set of frequently asked questions.

Tuesday, July 2, 2013

Six simple steps scientists can take to avoid having their work misrepresented

Writing a journal article? Here are six simple steps you can take to avoid having your claims misinterpreted and misrepresented. None of these steps require any special analyses or changes to your lab practices. They are steps you should take when writing about your findings. I haven't always followed these steps in my own articles, but I will be in the future (whenever I have final say on a paper or can convince my collaborators).

  1. Do not speculate in your abstract. Abstracts are the place to report what you did, why you did it, and what you found. It is fine to report any conclusion that follows directly from your data. But, you should not use the abstract as a place to make claims that exceed your evidence. For example, even if you think your findings with undergrads in your laboratory might be relevant for a better understanding of autism, your abstract should not mention autism unless you actually studied it. Readers of your abstract (often the only thing people read) will assume that what you said is what you found, and media reports will focus on your speculation rather than your findings.
  2. Separate planned and exploratory analyses and label them. If you registered your analysis plan and stuck to it, you can mark those analyses as planned, documenting that you are testing what you originally intended to test. It is fine to explore your data fully, but you should flag any unplanned analyses as exploratory and note explicitly that they require replication and verification. Your exploratory analyses should be treated as speculative rather than definitive tests.
  3. Combine results and discussion sections. Justify each analysis and explain what it shows in the same place in your manuscript. If you separate your analyses and explanations, non-expert readers will skip your evidence and focus on your conclusions. By combining them, you allow the reader to better evaluate the link between your evidence and your conclusions.
  4. Add a caveats and limitations section. In your general discussion, you should add a description of any limitations of your study. That includes shortcomings of the method, but also limitations to the generalizability of your sample, effects in need of replication, etc. If your effects are small, you should note if and how that limits their practical implications. If you identify limitations and caveats in your paper, your readers will better understand what your findings do and do not show.
  5. Specify the limits of generalization. Few papers do this, but all of them should. Most papers in psychology test undergraduates and then make claims as if they apply to all of humanity. Perhaps they do, but any generalization beyond the tested population should be justified. If you tested undergraduates and expect your studies to generalize to similar undergraduate populations, you should say so. If you think they also will generalize to the elderly or to children, you should say so and explain why. Spell out the characteristics of your sample that you think are essential to obtain your effect. Specifying generalization has benefits. First, it lets readers know the scope of your effects and helps them to predict whether they could obtain the same result with their own available population. Second, it clarifies the importance of your findings. If you expect that your effects are limited to subjects at your university in December of 2012 and won't generalize to other times or places, then it is less clear that anyone should care. Third, by specifying your generalization, you are making a more precise claim about your effect that others can then test. If you claim your effect should generalize to all undergraduates, then anyone testing undergraduates should be able to find it (assuming adequate statistical power), and if they can't, that undermines your claim. If you restrict generalization too much to protect yourself against challenges, then others will have no reason to bother testing your effect. Perhaps most importantly, if you appropriately limit your generalization in the paper itself, then media coverage will be less likely to generalize your claims beyond what you actually intended.
  6. Flag speculation as speculation. If you must discuss implications that go beyond what your data show, explicitly flag those conclusions as speculative and note that they are not supported by your study. By calling speculation what it is, you avoid having others assume that your wildest and most provocative ideas are evidence-based. Speculation is okay as long as everyone reading your paper knows what it is.

Bonus suggestion: If you have a multiple-author paper, the Acknowledgements or Author's Note should specify each author's contributions clearly and completely. By doing so, you assign both credit and blame where it is deserved. For example, when I collaborate on a neuroimaging project, I make clear that I had nothing to do with any of the imaging data collection, coding, or analysis. I should get no credit for that part of a study (given that I know nothing about imaging), but I also should take no blame for any missteps in that part of the project.

Thursday, June 6, 2013

When beliefs and actions are misaligned - the case of distracted driving


Originally posted to the invisiblegorilla blog on 22 December 2010. I am gradually reposting all of my earlier blog posts from other sites onto my personal website where I will be blogging for the foreseeable future. The post is unedited from the 2010 version.

During the summer of 2010, the California Office of Traffic Safety conducted a survey of 1671 drivers at gas stations throughout California. The survey asked drivers about their own driving behavior and perceptions of driving risks. Earlier this year I posted about the apparent contradiction between what we know and what we do—people continue to talk and text while driving despite awareness of the dangers. The California survey results (pdf) reinforce that conclusion.
59.5% of respondents listed talking on a phone (hand-held or hands-free) as the most serious distraction for drivers. In fact, 45.8% of respondents admitted to making a mistake while driving and talking on a phone, and 54.6% claimed to have been hit or almost hit by someone talking on a phone. People are increasingly aware of the dangers. As David Strayer has shown, talking on a phone while driving is roughly comparable to driving under the influence of alcohol (pdf). Yet, people continue to talk on the phone while driving.
Unlike some earlier surveys that only asked general questions about phone use, this one asked how often the respondents talked on a phone in the past 30 days. 14.0% report regularly talking on a hand-held phone (now illegal) and another 29.4% report regularly talking on a hands-free phone. Fewer than 50% report never talking on a hands-free phone while driving (and only 52.8% report never talking on hand-held phones). People know that they are doing something dangerous, but they do it anyway (at least sometimes).
Fewer people report texting while driving than talking while driving: 9.4% do so regularly, 10.4% do so sometimes, and another 10.6% do so rarely. In other words, more than 30% of subjects still text while driving, at least on occasion, even though texting is much more distracting than talking and is substantially worse than driving under the influence.
68% of respondents thought that a hands-free conversation is safer than a hand-held one, a mistaken but unfortunately common belief. The misconception is understandable given that almost all laws regulating cell phones while driving focus on hand-held phones. The research consistently shows little if any benefit from using a hands-free phone—the distraction is in your head, not your hands.
Fortunately, there is hope that education (and perhaps regulation) can help. The extensive education campaigns about mandatory seatbelt use and the dangers of drunk driving have had an effect over the years: 95.8% report always using a seat belt, and only 1% report never wearing a seatbelt. Only 5.9% reported having driven when they thought they had already had too much alcohol to drive safely.
Sources cited:
Strayer, D., Drews, F., & Crouch, D. (2006). A comparison of the cell phone driver and the drunk driver. Human Factors: The Journal of the Human Factors and Ergonomics Society, 48(2), 381-391. DOI: 10.1518/001872006777724471

Wednesday, June 5, 2013

Continuing the "diablog" with Rolf Zwaan -- still more thoughts

+Rolf Zwaan just continued our "diablog" (love that term) on reliability and replication. (Rolf -- sorry for slightly misrepresenting your conclusion in my last post, and thanks for clarifying.) At this point, I think we're in complete agreement on pretty much everything in our discussion. I thought I'd comment on one suggestion in his post that was first raised by +Etienne LeBel in the comments on Rolf's first post and that Rolf discussed in his most recent post.

The idea of permitting additional between subjects conditions on registered replication reports is an interesting one. As Rolf notes, that won't work for within-subject designs as the new conditions would potentially affect the measurement of the original conditions. I have several concerns about permitting additional conditions for registered replication reports at Perspectives, but I don't think any of them necessarily precludes additional conditions. It's something the other editors and I will need to discuss more. Here are the primary issues as I see them:  


  • The inclusion of additional conditions should not diminish the sample size for the primary conditions. Otherwise, it would lead to a noisier effect size estimate for the crucial conditions, undermining the primary purpose of the replication reports. Given subject pool constraints and our desire to measure the crucial effects with a maximal sample size, that could be a problem, particularly at smaller schools.
  • The additional condition must in no way affect the measurements in the primary condition. That is, subjects in the primary conditions could not be aware of the existence of an additional condition. Some measures would need to be taken to avoid any interactions among subjects. That's already something we account for in most designs, so I don't see this as a major impediment.
  • The additional conditions could not be reported alongside the primary analyses in the printed journal article. The issue here is that we want the final published article to report the same measures and tests for each individual replication attempt. Otherwise, the final report will become unwieldy, with each of the many participating labs reporting different analyses. That would hinder the ability of readers to assess the strength of the primary effect under study.
If we do decide to permit additional between-subjects conditions, analyses of those conditions could be reported on the OSF project pages created for each participating lab. There are no page limits for those project pages, and each lab could discuss their additional conditions more fully. I will make sure the other editors (+Alex Holcombe and +Bobbie Spellman) and I discuss this possibility.




The Value of Pre-Registration - comments on a letter in the Guardian

The Guardian just published a great letter, signed by many of the leaders of our field, calling for pre-registration as a way to improve our science. I imagine they would have had many more signatories if the authors had put out a more public call. I'll add my virtual signature here. If you agree with the letter, please make sure your colleagues see it and add your virtual signature as well.

I think pre-registration is the way forward. I hadn't pre-registered my studies before this past year, but I've started doing that for all of the studies for which I have direct input into the management of the study. I hope more journals will begin to conduct the review process before the data are in, vetting the method and analysis plan and then publishing the results regardless of the outcome. But even if they don't, pre-registration is the one way to demonstrate that the planned analyses weren't p-hacked. My bet is that, as the ratio of pre-registered to not-pre-registered studies in our journals grows, researchers will begin to look askance at studies that were not pre-registered. The incentive to pre-register will increase as a result, and that's a good thing.


Even if journals don't accept studies before data collection, pre-registration helps to certify that the research showed what it claimed to show. And, pre-registration does not preclude exploratory analyses. They can just be flagged as such in the final article, and readers will know to treat these explorations as preliminary and speculative, requiring further verification. I personally favor having two labeled headings in every results section, one for planned analyses and one for exploratory analyses. Even without pre-registration, that's a good approach. But pre-registration certifies the planned ones.

It's easy to pre-register your studies and post your data publicly. You can do that with a free account at OpenScienceFramework.org.




update: fixed formatting errors.