Monday, April 14, 2014

SATA-gate: How One Woman’s Inflated Statistics and Embellished Recollections Made Her Famous

Numbers, not just words, tell stories. As with words, the arrangement and selection of numbers determine both the accuracy and the perspective of the story. Readers, therefore, must be discerning when encountering a story told with numbers, especially when the source of that story makes claims like, “My data is 100% correct.”

When I was in middle school, I wanted to be a rapper. Reflecting on that time of my life, I cannot help but laugh at my puerile imagination. The self-aggrandizing stage name I had selected for myself was Daddy Smooth, and I still, in jest, occasionally refer to myself as such when going out for karaoke with my friends.

Although I gave up my fantasy of being a rapper a long time ago, I am still a fan of hip-hop. One of my favorite hip hop artists is Yasiin Bey, formerly known as Mos Def. On his 1999 album Black on Both Sides, he has a song called “Mathematics,” which I have been listening to and thinking about continually over the past three months. A paradoxical lyric from the song goes, “You wanna know how to rhyme / You better learn how to add / It’s mathematics.”

Without hearing the rest of the song or being familiar with Bey’s oeuvre, one would likely wonder how rhyming (rapping) and mathematics have anything to do with each other. Unlike many of the more commercially successful rap songs, Bey’s hip hop reflects the lyricist’s heightened social consciousness and penchant for social commentary. Bey understands that the stories we tell ourselves as a society are often stories told with numbers. Hence, if our numbers are not right, our stories are likely not right, either.


Since early January, a story has been circulating the media about UNC athletes and athletics. The story, first told by Mary Willingham on CNN, is that 70% of a large sample of UNC athletes—consisting mostly of football and men’s basketball players—read below a high school level, and of that 70% at least 10% read at an elementary level.

Willingham’s story is obviously one based on numbers. Since its original publication on CNN, it has also been propagated by BusinessWeek, CBS Sports, ESPN, HBO, HuffPost, Slate, The Chronicle of Higher Education, and countless local media outlets around the country. It is a story, Willingham and the media have been proclaiming, that exposes the descent of college sports into moral degeneracy. It is a story, Paul Barrett of BusinessWeek continually pontificates, that reveals the extent to which UNC has “failed” its athletes. It is a story O’Bannon attorney Michael Hausfeld hoped would demonstrate that college athletes do not receive a “real” education.

On Friday, however, we learned Willingham’s story is one predicated on numbers as fictitious as the monsters of the Gévaudan.

Intimations of a Scandal

Almost exactly a year ago, on the morning of April 18, I was getting ready to leave my office for the College Sport Research Institute (CSRI) Conference here at UNC. As I was browsing the conference program on my computer, I noticed a listing for a poster presentation titled, “The Incidence of Learning Disabilities and ADHD in First Year NCAA DI College Athletes.” The presenters were Mary Willingham and Dr. Lyn Johnson, the psychologist who, for several years, had conducted psycho-educational testing for UNC student-athletes. Before leaving for the conference, I brought the poster presentation to the attention of some colleagues and asked whether anyone had been informed Willingham and Johnson would be presenting. No one had heard anything about it.

Technically, Willingham and Johnson would have had no obligation to contact us or to obtain consent from the students if the data on which the research was based had been secondary (de-identified) data generated from Johnson’s assessments. However, from the poster presentation’s title, I could not apprehend whether they would have used primary or secondary data, and I was concerned.

Later that day, I attended the poster presentation (see the pictures below). Without taking the time to critique the study’s design and conclusions, for now I only want to highlight one aspect of the poster that caught my attention from an ethical standpoint: the inclusion of aggregate ACT/SAT scores. Although the way those aggregate scores were reported did not categorically indicate they were gathered from primary data, it raised suspicion nonetheless. Specifically, if Johnson had turned over primary data to Willingham, who had not worked with athletes since 2010, Johnson had unethically breached the athletes’ confidentiality. Since Willingham stopped working with athletes, she would have had no right to access their primary data without their consent.

Some of my colleagues also attended the poster presentation and came away with similar concerns. Following the conference, we notified our superiors about the potential breach of confidentiality. We were assured our concerns would be taken seriously, and some time later we learned our office would no longer be utilizing Johnson’s services. I was not informed of the specific reason, and I did not ask. Such contractual matters are outside the purview of my job, and I presumed the concerns my colleagues and I had shared were indeed investigated.

Before I flash forward several months, I want to make two points related to Willingham and Johnson’s poster presentation. First, just last Friday, the same day the university published the external review of Willingham’s data, Dan Kane published an article in the N&O about Johnson’s contract being canceled last year. Kane wrote, “In July after athletic officials learned her data was being used in research that suggested the university was admitting athletes with poor literacy skills, they pulled the plug on her $50,000 annual contract.” From what I understand of the situation, Kane’s statement is not accurate. To my knowledge, no one in my office or in Athletics knew at that time Willingham and Johnson were analyzing data on literacy skills or athletes’ preparedness for college. Their poster presentation was exclusively about learning disabilities and ADHD, not reading levels, as the pictures above demonstrate. The concern my colleagues and I shared was strictly about the ethics of the study, not about its findings regarding reading levels. Again, such findings were not included in the poster presentation and were not revealed until several months later, when the CNN report was published.

Kane’s insinuation that Athletics officials were trying to silence Willingham and Johnson’s research was thus likely based on his misunderstanding of the poster presentation. However, before blaming Kane entirely, consider that Willingham may have misled him about both the study and the circumstances surrounding Johnson’s contract being canceled. I do not recall seeing Kane at the conference, and so his understanding of the study is likely based exclusively on Willingham’s account. His article about Johnson was published the same day the external review was released, suggesting Willingham may have fed him information in an effort to preemptively discredit the review (more on that later in the essay).

The second point related to the poster presentation is the number of athletes in the sample: 117. One of the pictures above shows the part of the poster that lists the numbers and demographics of the athletes in the study. The important number is 117, the total sample size. I will come back to that later.

Flash forward five months to September, when Willingham's Paper Class Inc. co-conspirator Jay Smith and I engaged in a heated email exchange initiated by his attack on my credibility. As I explained in my essay “Silent Dishonesty,” in that exchange I shared my concern that Willingham and Johnson had breached the athletes’ confidentiality for the study they presented at the CSRI conference. Smith responded as follows:

The data she has collected is secondary data. According to IRB protocols, such data requires no explicit consent, since aggregated data generated by the operations of the institution are not considered data from “human subjects.” This is a non-issue. You have decided to make it one, for some reason, but it is a non-issue.

At the time, I had no reason to suspect Smith would be dishonest, and I accepted his response. Months later, however, Willingham’s admission she had been monitoring the athletes’ grades revealed Smith had not been truthful.

Inflated Statistics

The next part of the story marks its preliminary descent down the rabbit hole of the absurd. Within a month of my exchange with Smith, my director gave me a copy of an email a faculty member had forwarded to her from Willingham. My director asked me to examine the email, evaluate Willingham’s claims to the best of my understanding, and then report back to her at my earliest convenience. Willingham’s claims in that email were the same she would eventually make to CNN: according to her research, 60% of a sample of athletes read between 4th and 8th grade levels, and another 10% read below a 4th grade level.

My job responsibilities do not include maintaining the records of student-athletes’ psycho-educational assessments and evaluations. However, I have both consent and a legitimate educational interest in accessing those records, and I review new students’ records at the beginning of every year so that I can plan students’ educational services accordingly. Those records, and the data contained therein, come from the psychologist who conducted the assessments and evaluations, the same Lyn Johnson whom Dan Kane wrote about.

The scope of Willingham’s “findings” extends beyond football players (the only athletes with whom I work) and also includes five years’ worth of data from football players who attended UNC before I started. Therefore, I could only comment with any level of certainty on the data I had seen, but based on that data, I was skeptical of Willingham’s claims.

Colleagues whom I respect and trust volunteered to conduct a basic review of all the data from Johnson and compare it against Willingham’s findings. Because Johnson had very thoroughly reported each student's scores and interpretations of the SATA subtests, my colleagues' task of generating descriptive statistics, though tedious, was as simple as adding up numbers and calculating percentages. My colleagues then, like the independent reviewers whose conclusions were published Friday, found no basis for Willingham’s claims.

I do not know whether my superiors responded to Willingham’s email after my colleagues determined the error of Willingham’s claims, but I was not concerned because none of us could have imagined she would take such baseless findings to the national media.

CNN's report on January 7 was thus as much of a shock to my office as to the rest of the campus community. I was befuddled with conflicting feelings of incredulity and intrigue. Willingham, I thought, must have used a means of assessing reading levels other than the SATA. I sent her an email of support and asked her to email me back. My hope was that she would respond, so that I could inquire what assessment she had used and uncover whether she had administered it in a way that would protect students’ anonymity.

However, two days later she had not responded, and I went back to the CNN report to watch the video segment that accompanied the article. In an interview with Sara Ganim, Willingham compared the intellectual abilities of college athletes with the abilities of elementary students, and I was appalled. No educator with integrity would make such an analogy, especially on national TV.

That same day, a colleague informed me Willingham had posted on her blog the aggregate grades for a “cohort” of football players who had played in the recent bowl game. Knowing Willingham’s current position did not give her a legitimate educational interest in accessing athletes’ records, my previously held suspicions about the ethics of her research were revived. Thus, I sent her an email expressing my concern and stating that I hoped she had been following FERPA guidelines. She never responded.

After Willingham indicated, to my shock, that her data did in fact come from the SATA results, the Provost wanted another, more systematic review of the assessment data from Johnson, in addition to the basic review my colleagues had conducted just months prior. I was not involved this time either, but I know the Provost himself secured a copy of the SATA manual and participated in the review. Again, not surprisingly, he and the other reviewers found that the data from Johnson did not support Willingham’s claims.

We all felt bamboozled. Not only were her statistics skewed, the sample size was a mystery, too. Remember the sample size from Willingham and Johnson’s poster presentation: 117. Yet the sample size Willingham reported to CNN was 183. If the findings she reported to CNN were based on her research with Johnson, then why did the sample sizes conflict? Willingham’s findings did not make sense to any of us who knew what was happening.

At some point during the first few weeks after the CNN report was published, I felt the experience go from bizarre to surreal. I felt as if I were inhabiting a Salvador Dali painting, trapped in the liminal space between dreams and reality, where the physical world and the imagined world are indistinguishable. How could a person take such obviously fabricated data to a national news source and insist she was telling the truth? My naïve view of human nature could not account for such beguiling behavior.

The Persistence of Memory, Salvador Dali, MOMA New York

Of course, anyone who has followed this story knows Willingham did more than just insist she was telling the truth, even after others thoroughly refuted her. On January 16, the IRB (as part of the university’s research office) ordered Willingham to halt her research because, contrary to her statements on her research application, she was indeed conducting research on primary data. Then, the following day, the Provost thoroughly disproved her findings at a Faculty Council Meeting. Yet she responded with the brazen pronouncement, “My data is 100% accurate.” Moreover, she charged that the university was conspiring to “squash” her research, and she referred to the Provost as a “void.”

Perhaps Willingham’s most peculiar defense of her findings was her spurious attempt to make university officials seem as if they were hiding something: “They have all the data and more. It belongs to them, and they paid a lot of money for it.” Yes, we do have the data—all of it, directly from Johnson. Several people reviewed the data, finding that it does not support Willingham’s claims.

Puffery Par Excellence

For me, the most disorientating aspect of this experience was observing how uncritically the national media promulgated Willingham’s claims, willfully ignoring the Provost’s repudiation of the findings. No journalist expressed even the slightest skepticism about Willingham’s findings or her credibility. For example, Robin Wilson, a senior writer for The Chronicle of Higher Education, published a puff piece on Willingham the week after the Faculty Council meeting, in which Wilson’s only two attempts to appear balanced were palpably disingenuous. She wrote a few lines about the Provost’s critique and a few lines about the IRB director’s explanation for why Willingham’s research was halted, but after each of those small sections of the article, Wilson interjected some comment from Willingham meant to overrule her critics.

Dismissing the Provost’s and the IRB director’s explanations, Wilson chose to favor the perspective of Frank Baumgartner, a Distinguished Professor of Political Science. Baumgartner was one of the small handful of faculty members at the meeting who were critical of the Provost’s presentation while most applauded his conclusions. Baumgartner’s distinguished professorship makes his perspective seem worth considering, yet he is the same person who, elsewhere, baselessly accused the Provost of stonewalling, saying, “It looks bad, it smells bad. I don’t know if it was bad, but it smells bad.”

My question for Baumgartner is this: What kind of research methodologies do political scientists use? Are their methodologies olfactory in nature? Does the Political Science Department teach doctoral students to sniff around for the reliability and validity of research findings, like hounds hunting for foxes? If so, I will be sure to sign up for a course next fall. Otherwise, maybe Baumgartner should not respond to presentations on research findings the same way he would respond to a cooking demonstration, and maybe Wilson and the other journalists who quoted Baumgartner should think more carefully about whom and what they quote.

Another example of a journalist’s suspending disbelief as if she were watching an installment of Twilight is Rebecca Schuman, an education columnist for Slate. In the second paragraph of her article, before further flattering Willingham’s credentials while discarding the Provost’s critique, Schuman refers to Willingham’s study as “extensive research.” After I read that sentence, I had to start reading the article over because I thought perhaps I had overlooked some hint indicating the article was a parody. Unfortunately, it was not a parody; it was meant to be taken seriously. What makes Schuman's reference to Willingham’s research as “extensive” especially vexing is the fact that Schuman has a PhD from the University of California at Irvine. However, she has on occasion written about her struggles to succeed in academia. Considering her aberrant understanding of “extensive research,” I am not surprised.

As anyone who has followed this story knows, the worst offender is Paul Barrett of BusinessWeek. However, most people do not know the full extent of his offense. His reporting on Willingham was so much more egregiously irresponsible than others’ reporting because he had more reason to question Willingham’s claims than anyone. How do I know he had more reason to question her claims? I know because I told him myself in an email, and he responded with, “All very helpful. Many thanks.”

On January 31, Barrett emailed me to inquire about an email I sent Chancellor Folt last summer. I willingly sent him the full text of my email to the Chancellor. In addition, I suggested to him that he “consider consulting an expert in educational assessment to evaluate Ms. Willingham’s claims,” and I explained that ACT/SAT scores should not be used to calculate grade equivalents as Willingham claimed to have done. Did his later reporting demonstrate any indication he had pursued an informed evaluation of Willingham’s methodology?


Rather, on February 27, Barrett published a fulsome cover story, complete with iconoclastic graphics, extolling Willingham’s virtues while denouncing UNC’s transgressions. Nowhere in the article did Barrett entertain the possibility Willingham had reported inflated statistics and embellished recollections, despite my giving him reasonable cause to do so. In fact, not only did he willfully ignore my comments, he reported on the very methodology I had explained was flawed. Defending Willingham's findings, he wrote, "And her assessment wasn’t based solely on the SATA; she also looked at results from athletes’ SAT and ACT entrance exams." Barrett’s article clearly demonstrates his utter disregard for journalistic integrity, and the article’s appearance as the cover story raises questions about the legitimacy of BusinessWeek as a credible news source.

Even before Barrett’s story—more like a combination of love letter to Willingham and hate mail to UNC—was published, I had felt pushed to the edge of my tolerance for distortion and defamation. At some point early in February, I decided I could remain silent no longer, which led to my writing and publishing “Truth and Literacy at UNC” on my blog. Yet Barrett’s puffery pushed me over the edge. After reading Barrett’s newest contribution to the Willingham melodrama, I felt like Howard Beale in the 1976 film Network, when he shouts, “I’m as mad as hell, and I’m not going to take this anymore!”

How could so many local and national media outlets be complicit in such a contrived, sensationalized narrative of a humble David (i.e., Willingham) versus a domineering Goliath (i.e., UNC)? How could the media manufacture such a hero out of Willingham, making her the face of the movement to reform college sports—a movement, by the way, I otherwise support—when the numbers telling her story were fabricated? What other stories has the media sold to us and affixed onto our collective memory? What else have we believed because the media told the story that would sell rather than the story that was true?

I became obsessed with these questions, continuously brooding over the Willingham saga, and I began furiously taking notes and outlining essays, waiting for the opportune time to publish again on my blog. The previews for the recent ESPN Outside the Lines segment and HBO Real Sports segment prompted the production of some drafts, and since March 28, I have now published, including this essay, approximately 15,000 words on the issues surrounding what I am now calling “SATA-gate.”

Embellished Recollections

Thus far in this essay, I have focused only on the statistics Willingham reported. Yet her anecdotal claims and recollections of her experience in the Academic Support Program are also dubious.

As I explained in my essay “Truth and Literacy at UNC,” Willingham was likely misleading with her anecdote about the athlete who could not sound out Wisconsin and her other anecdote about the athlete who wanted to learn to read newspaper articles about himself. If those anecdotes are even true, I suspect each of those athletes had a specific learning disability that impeded his reading fluency but which he could overcome with technological accommodations. Students with learning disabilities are not intellectually disabled and are as capable of succeeding as other students, when provided appropriate accommodations and strategy instruction. Considering the likelihood those two students had a learning disability, and given that the external review found only about 7% of the athletes in the sample read below high school level, I do not believe Willingham’s anecdotes fairly represent the intellectual abilities of the vast majority of athletes, even football and men’s basketball players.

In fact (to return to the SATA for a moment), a significant point Willingham never bothered to explain about the SATA-based assessments of the athletes is that the SATA was used as a tool to identify students who might have a learning disability. I know for a fact that some of the approximately 7% whom the external review found to read below a high school level were eventually diagnosed with a learning disability and therefore would likely test higher on the SATA if they were permitted to use accommodations. Accommodations for learning disabilities are like glasses for those of us with less than 20/20 vision. When I take a sight test without my glasses, I do poorly. However, when I put on my glasses, I perform much better. Accommodations for learning disabilities work much the same way. Therefore, when we consider the presence of learning disabilities among the athletes and the fact that with accommodations some of those athletes would have tested higher, even the statistic of 7% is likely too high to accurately represent the number of athletes with actual elementary reading levels.

Furthermore, not only did Willingham use misleading anecdotes, she also very likely exaggerated the extent of her experience. In a WUNC story that aired on January 31, she claimed she worked with about 500 athletes during her seven years with the Academic Support Program. Unless the phrase “worked with” is interpreted so loosely it includes any athlete she may have briefly encountered in a single interaction, I find that claim highly improbable. To have worked with 500 students over seven years means she would have had to work with an average of roughly 70 new students per year. Learning specialists, however, do not work with that many students.

I am an active member of the National Association of Academic Advisors for Athletics (N4A), the umbrella organization for academic support staff, and I am a leading member of the N4A’s committee of learning specialists. Every year at our national conference, one of our most discussed topics is the number of students each of us has in our caseload. I have heard of caseloads as low as six students and as high as 40. However, those numbers include returning students as well as new students. Willingham undoubtedly had students who needed her support for more than their first year, meaning her caseload would have been even larger than 70 to accommodate both new and returning students. To have worked with that many students intimately enough to “still see their faces,” as she claims she does, or even closely enough to be considered the typical work of a learning specialist, would have required more hours than are available in a day or week or month.
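For readers who want to check the arithmetic behind that claim, here is a minimal sketch in Python. It assumes only the figures quoted above (about 500 athletes, seven years, and a highest reported caseload of 40) and treats the yearly figure as a plain average; the variable names are mine, chosen purely for illustration.

```python
# Figures quoted in the essay; the per-year value is a simple average,
# which is an assumption about how the 500 students were spread out.
students_claimed = 500        # athletes Willingham said she worked with
years_in_program = 7          # her stated tenure in the Academic Support Program
highest_caseload_heard = 40   # largest total caseload I have heard reported

# Average number of NEW students she would have needed each year.
new_students_per_year = students_claimed / years_in_program

# How that compares with the largest caseload mentioned, which already
# counts returning students as well as new ones.
ratio = new_students_per_year / highest_caseload_heard

print(f"Implied new students per year: {new_students_per_year:.1f}")
print(f"Multiple of the highest caseload mentioned: {ratio:.2f}")
```

Even before any returning students are counted, the implied yearly intake alone exceeds the largest total caseload I have heard of.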

Furthermore, her availability to work in the capacity of a learning specialist with that many students would have been even more limited if she was also responsible for certifying athletes’ NCAA eligibility, as she recently claimed when fumbling for an explanation of the now infamous “Rosa Parks” paper. In a Slate article that “corrected” the viral version of the story, Willingham explained that the A- originally reported as the grade for the paper was actually the grade for the class. She stated she knew the class grade because she had certified that student’s NCAA eligibility.

However, I find that claim highly suspect because typically NCAA certification is the responsibility of the academic counselor, not the learning specialist. Certification is a tedious process of filling out the paperwork that demonstrates a student-athlete is meeting all the NCAA academic requirements. In addition to me, the Academic Support Program here at UNC has one other full-time learning specialist, and neither of us has certification responsibilities. My understanding is that learning specialists in our program have never been responsible for certification.

Of course, I did not work with Willingham, and so I cannot say with certainty that she was not involved in certification. Yet I can assert with certainty that she would not have had enough time in her schedule to work as a learning specialist for 70 new students per year, least of all while also carrying certification responsibilities.

In short, Willingham’s recollection of her experience as a learning specialist is highly questionable. As Mos Def / Yasiin Bey rapped, “It’s a numbers game / But shit don’t add up somehow.”

Coordinated Denial and Obfuscation

On Friday, the university published the external review of the data from which Willingham claimed 70% of the athletes in her sample read below a high school level. The external review was actually three external reviews, conducted by three experts in educational assessment. Each of the experts came to similar conclusions, confirming what we already knew: “The data do not support the public claims about the students’ reading ability.”

Yet a week prior, in anticipation of the external review, Smith began his and Willingham’s conspicuously coordinated strategy to discredit the review. On April 3, in an interview on WCHL, Smith said, “It’s very curious that no effort has been made to reach [Johnson].” A week later, the day the review was released, Sara Ganim, the CNN reporter who originally published Willingham’s findings, tweeted the same sentiment.

In addition, as I already mentioned, Dan Kane published an article that same day, in which he insinuated the university had canceled Johnson’s contract to silence Willingham and Johnson’s research.

Of course, Willingham’s response to the review was obfuscation at the most unscrupulous level. Incidentally, it was publicized via a tweet from Sara Ganim.

One of the primary means of obfuscation is to trumpet irrelevant information as if it were relevant. Smith and Willingham both criticized the university for not contacting Johnson, but Johnson’s input is unequivocally inconsequential because we already have all her assessment data and evaluations. Ironically, that was exactly Willingham’s point back in January when she was complaining about the Provost’s request for her data. “They have all the data and more,” she repeated. Again, yes, we do have the data, and it does not confirm Willingham’s findings. Two internal reviews and an external review consisting of three independent experts have demonstrated Willingham’s findings are bogus. She knows that, as does Smith. Their pretending otherwise reveals how desperate and dishonest they have become. Instead of attempting to discredit the review, they should be volunteering to pay the university the $15,000 the review cost.

Stated simply, the story Mary Willingham has been telling since January 7 is one based on counterfeit numbers, rendering it more fraudulent than one of Julius Nyang’oro’s classes.


Daniel Kahneman is a Nobel Prize-winning psychologist and the author of Thinking, Fast and Slow. In a TED talk he gave four years ago, he explains how the perception of one’s experience profoundly affects the way one later remembers the facts of that experience. Specifically, the way one perceives the conclusion of an experience will influence the way one later recollects the entire experience, regardless of how one perceived the bulk of the experience while it was occurring. For example, if one is watching and enjoying a movie, but the ending turns out highly disappointing, one is likely to review the whole movie negatively.

Kahneman explains this phenomenon by distinguishing between the two selves that each of us possesses: the experiencing self and the remembering self. The experiencing self is the self that is actively perceiving and engaging in the current experience. On the other hand, the remembering self is the self that tells stories based on retrospective perception. Those stories, and the memories that comprise them, are ineluctably affected by how one perceived the conclusions of the associated past experiences.

Borrowing Kahneman’s language, I would like to suggest Mary Willingham be understood as UNC’s remembering learning specialist, whereas I might be understood as UNC’s experiencing learning specialist. Of course, Willingham was at one time the experiencing learning specialist, but she clearly perceived the conclusion of her experience negatively, skewing her entire recollection. As the current experiencing learning specialist, however, I have recently witnessed the university implement meaningful reform that will ensure future student-athletes (1) have a reasonable chance at academic success and (2) receive the support they need to achieve that success. My extant perception, therefore, is profoundly more positive than Willingham’s retrospective perception, which is obviously distorted.

With her statistics disproved and her experience called into question, Willingham has no more support for her invectives against UNC. She has spoken of significant numbers of UNC athletes who read below a high school level, saying, “I can still see their faces.” However, we now know the number of faces was not even 7% of her sample. I do believe she sees faces, but unlike Haley Joel Osment’s character in the M. Night Shyamalan thriller, Willingham has no sixth sense through which some otherwise imperceptible faces become visible. Sadly, the faces Willingham sees are the products of a mind compromised by regret, disposed toward melodrama, and desperate for attention.

Jay Smith recently wrote, “Lucky for me, I have many friends. Many smart friends.” If that is true, I now implore those many smart friends of his to intervene to stop him and Willingham from descending any further into the abyss of dishonesty. Encourage them to retreat from public life and return to their respective positions as professor and advisor, where they have many opportunities to make productive contributions to the life of the university. Willingham’s disturbed, remembering self has imposed itself on both her and Smith’s experiencing selves, but, as active educators, they both still have the opportunity to abide in a more fruitful here and now.

Willingham and Smith would do more good for themselves and their cause by abandoning their campaign now.