you learn by being taught

Forgive the relative quiet lately; I’ve been enjoying my birthday weekend and then catching up on a ton of work. There’s a bunch of good things coming this week, including the return of book reviews after a brief (and unplanned) break.

This morning I spoke to an entire public high school, where I was invited to discuss being a product of public schools, higher ed, and success. It was very funny for me to be asked – as I told the kids today, I would never think of myself casually as a success. Who ever thinks that way, beyond the wealthy and the deluded? But it was flattering and fun. I told them that there was no great wisdom in life, just a series of decisions before you, and hopefully with time the perspective to be able to choose better from worse. And, because I think this is important, I told them that they needed to cultivate a sense of “good enough” in their lives. At that age, they are being told constantly that they should pursue their dreams. But very few of us get what we’ve dreamed of, and those who have often find it’s far less grand than they’d imagined. So I told them to learn and experience and enjoy and to figure out how to live in the essential disappointment of human life.

It wasn’t as much of a bummer as it sounds!

I have been reflecting on the value of teachers. I have been accused a lot, lately, of not believing that teachers matter. That’s the opposite of the truth, really. I just think that casting the value of teachers in purely quantitative terms is a mistake, and a very recent one. The entire history of the Western canon, from Socrates to Aquinas to Locke to Dewey to Baldwin, contains arguments against this reduction. But this fight – to define what I mean and what I don’t, against the tide – is one I suspect I will always have to keep fighting, and I intend to.

Our culture celebrates autodidacts. It talks constantly of “disrupting” education. It insists always that we need to radically reshape how we teach and learn. It treats as heroic the rejection of teachers and traditional mentorship. The self-help aisle of the bookstore abounds with writers who insist that they truly learned by rejecting the typical method of education and became, instead, self-taught, self-made. It’s an unavoidable trope.

What amazes me about my own education is just how far that is from the truth for me personally. I’ve learned, over decades, how I learn. It’s pretty simple: teachers teach me. That was true in kindergarten and it’s true now that I have my doctorate. I can’t tell you how often I have found myself feeling lost and ignorant, only to have patient, kind teachers take me through the familiar processes of modeling and repetition that are cornerstones of education. I think back to my graduate statistics classes, where I often felt like the slowest person in class, but where I always ended up getting there, thanks to steady and reassuring teaching. When I didn’t get what I needed from class, I’d go to office hours, or I’d go to the statistics help room, where brilliant graduate students eagerly shared knowledge and experience with me. None of this is fundamentally any different from when Mrs. Gebhardt taught me to cut shapes out of paper or when Mr. Shearer taught me simple algebra or when Mr. Tucci taught me to read poetry or when Dr. Nunn taught me to write a real research paper. The process is always the same, and in every case, I have succeeded not by rejecting the authority of teachers but by accepting their help, by recognizing their superior knowledge and letting them use it to enrich my life.

Is that a contradiction of what I’ve said about the limited ability of teachers to control the outcomes of their students? I don’t think so. The question is, do you want us to have a fuller and more humane vision of what it means to learn? I do.

They say that great men see farther than others by standing on the shoulders of giants. I think most of us are enabled to see as far as others because others have collectively reached their hands down and pulled us up.

another notch in the belt

It’s my birthday today. Wasn’t that long ago that I was part of a vanguard of young writer types. What the hell happened?

This project’s about three months old now, and I gotta tell you guys: I haven’t had this much fun writing in ages. It’s been better than I could have hoped. Thanks for coming along.

I woke up one day to find that my life had gotten pretty damn good. My job’s not perfect, but it’s still pretty great. I miss teaching, and I’d love to be in a position where I had some motivation to get peer reviewed stuff published. But I’m working at a great college with a gorgeous campus in a system I admire immensely. It’s part of my job to stay on top of the research literature, so I’m reading books and articles at a good clip. Polanyi said that a scholar is someone who lives with the questions, and I do, and that’s enough. Very few people get that opportunity. It’s a privilege.

It’s also a privilege to live in this city. The other day I was walking home, cutting through Prospect Park right after dusk. I came to the Long Meadow, which a few hours before had been absolutely packed with people picnicking and jogging and flying kites and walking dogs. For a brief moment I found it utterly empty, not another soul in sight, alone in one of the most popular parks in the city. And I knew in that moment that it was all for me.

Study of the Week: Feed Kids to Feed Them

Today’s Study of the Week is about subsidized meal programs for public school students, particularly breakfast. School breakfast programs have been targeted by policymakers for a while, in part because of discouraging participation levels. Even students who are eligible for subsidized lunches often don’t take advantage of school breakfast. The reasons for this are multiple. Price is certainly a factor; as you’d expect, price is inversely related to participation rates for school breakfast. Also, in order to take advantage of breakfast programs, you need to arrive at school early enough to eat before school formally begins, and it’s often hard enough to get teenagers to school on time just for class. Finally, there’s a stigma component, particularly associated with subsidized breakfast programs. It was certainly the case at my public high school, where 44% of students were eligible for federal school lunch subsidies, that school breakfast carried class associations. At lunch, everybody ate together, but students at breakfast tended to be poorer kids – which in turn makes it even less likely that students will want to be seen getting school breakfast.

The study, written by Jacob Leos-Urbel, Amy Ellen Schwartz, Meryle Weinstein, and Sean Corcoran (all of NYU), takes advantage of a policy change in New York public schools in 2003. Previously, school breakfast had been free only to those who were eligible for federal lunch subsidies, which remains the case in most school districts. New York made breakfast free for all students, defraying the costs by raising the price of unsubsidized lunch from $1.00 to $1.50. The authors then examined whether the switch to free breakfast for all changed participation in the breakfast program, looking for differences among the three tiers – free lunch students, reduced-price lunch students, and students who pay full price. They also compared outcomes from traditional schools to Universal Free Meal (UFM) schools, where the percentage of eligible students is so high that everyone in the school already gets meals for free. This helped them tease out possible differences in participation based on moving to a universal free breakfast model. They were able to use a robust data set comprising results from 723,843 students across 667 schools, grades 3–8. They also investigated whether breakfast participation rates were associated with performance on quantitative educational metrics.

It’s important to say that it’s hard to really get at causality here because we’re not doing a randomized experiment. Such an experiment would be flatly unethical – “sorry, kid, you got sorted into the no-free-breakfast group, good luck.” So we have to do observational studies and use what techniques we can to adjust for their weaknesses. In this study, the authors used what’s called a difference-in-differences design. Such designs are often used when analyzing natural experiments. In the current case, we have schools where the change in policy has no impact on who receives free breakfast (the UFM schools) and schools where there is an impact (the traditional schools). Therefore the UFM schools can function as a kind of natural control group, since they did not receive the “treatment.” You then use a statistical model to compare the change in the variables of interest for the “control” group to the change for the “treatment” group. Make sense?
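
If you like seeing the machinery, the heart of a difference-in-differences analysis is a single interaction term in a regression. Here’s a minimal sketch in Python – the file and column names are hypothetical, and this is far simpler than the authors’ actual models, which add student and school controls:

```python
# Minimal difference-in-differences sketch (hypothetical data and column names).
# The question: did breakfast participation change more in traditional schools
# (where the policy newly made breakfast free for some students) than in UFM
# schools (where everyone already ate free, so the policy changed nothing)?
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("breakfast_panel.csv")  # one row per student-year (invented file)
df["treated"] = (df["school_type"] == "traditional").astype(int)  # UFM = control
df["post"] = (df["year"] >= 2003).astype(int)  # after the policy change

# breakfasts_eaten ~ treated + post + treated:post; the coefficient on the
# interaction term is the difference-in-differences estimate of the policy effect.
model = smf.ols("breakfasts_eaten ~ treated * post", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["school_id"]}  # cluster by school
)
print(model.params["treated:post"])
```

The “treated” and “post” terms soak up the baseline difference between the two kinds of schools and the citywide trend over time; whatever change is left over in the treated group is attributed to the policy.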

What did the authors find? The results of the policy change were modest in almost every measurable way, and consistent across a number of models that the authors go into in great detail in the paper. Students did take advantage of school breakfast more after breakfast became universally free. On the one hand, students who paid full price increased breakfast participation by 55%, which sounds like a large number; but on the other hand, their baseline participation rates were so low (again, because breakfast participation is class-influenced) that this amounted to only about 6 additional breakfasts a year on average – a 55% increase on a baseline of only about eleven breakfasts a year. Participation among reduced-price and free-lunch students increased by 33% and 15%, respectively – the latter particularly interesting given that those students did not pay for breakfast to begin with. Still, that too represents only about 6 meals over the course of a year: not nothing, but perhaps less than we’d hope for a program with low participation rates. The only meaningful difference among models appears when the authors restrict their analysis to the small number (91) of schools where fewer than a third of students are eligible for lunch subsidies, in which case breakfast participation grew by a substantially larger amount. The purchase of lunches, for what it’s worth, remained static despite the price increase.

There’s a lot of picking apart the data and attempting to determine to what degree these findings are related to stigma. I confess I find that discussion a bit muddled, but your mileage may vary. The educational impacts were also slight: a small but statistically insignificant increase in attendance, and no impact on reading and math outcomes.

These findings are somewhat discouraging. Certainly we would hope that moving to a universal program would help to spur participation rates to a greater degree than we’re seeing here. But it’s important to note that the authors largely restricted their analysis to the years immediately before and after the policy change, thanks to the needs of their model. When broadening the time frame by a couple years, they find an accelerating trend in participation rates, though the model is somewhat less robust. What’s more, as the authors note, decreasing stigma is the kind of thing that takes time. If it is in fact the case that stigma keeps students from taking part in school breakfast, it may well take a longer time period for universal free breakfast to erode that disincentive.

I’m also inclined to suspect that the need to get kids to school early to eat represents a serious challenge to the pragmatic success of this program. There’s perhaps good news on the way:

Even when free for all, school breakfast is voluntary. Further, unlike school lunch, breakfast traditionally is not fully incorporated into the school day and students must arrive at school early in order to participate. Importantly, in the time period since the introduction of the universal free breakfast policy considered in this paper, New York City and other large cities have begun to explore other avenues to increase participation. Most notably, some schools now provide breakfast in the classroom.

Ultimately, I believe that making school breakfast universally free is a great change even in light of relatively modest impacts on participation rates. We should embrace providing free breakfast to all students regardless of income level as a matter of principle, particularly considering that fluctuations in parental income might leave kids who are technically ineligible unable to pay for breakfast. In time, if we set up this universal program as an embedded part of the school day, and work diligently to erase the stigma of using it, I believe more and more kids will begin their days with a full stomach.

As for the lack of impact on quantitative metrics, well – I think that’s no real objection at all. We should feed kids to feed them, not to improve their numbers. This all dovetails with my earlier point about after-school programs: if we insist on viewing every question through the lens of test scores, we’re missing out on opportunities to improve the lives of children and parents that are real and important. Again, I will say that I recognize the value of quantitative academic outcomes in certain policy situations. But the relentless focus on quantitative outcomes leads to scenarios where we have to ask questions like whether giving kids free breakfast improves test scores. If it does, great – but the reason to feed children is to feed children. When it comes to test scores and education policy, the tail too often wags the dog, and it has to stop.

two economists ask teachers to behave as irrational actors

I was considering doing a front-to-back fisking of this interview of Raj Chetty, Professor of Economics at Stanford University, conducted by the libertarian economist Tyler Cowen. Despite Chetty’s obviously impressive credentials, he says several things in the interview that simply don’t hold up to scrutiny, particularly regarding the simultaneity problem and the impact of the shared environment. I’ve decided to just focus on one key point, though.

The standard neoliberal ed reform argument goes like this: the major entrenched socioeconomic and racial inequalities in this country are no excuse for poor quantitative outcomes for groups of students; teachers and schools, despite all of the evidence to the contrary, control most of the variation in educational outcomes; therefore our perceived education problems are the result of lazy, untalented teachers; introducing a market for schooling will force schools to get rid of those teachers, and metrics will improve. Now, this story has failed to play out again and again in places like Detroit and Washington, DC, but we’ll let that slide for now. If we accept this argument on its own terms, we need to get many talented people into teaching to replace the hundreds of thousands of “bad” teachers we’d be getting rid of.

Ed reform types are typically cagey about the scale of teacher dismissals – they hate to actually come out and say “I’d like to get hundreds of thousands of teachers fired” – but based on their own numbers, their own claims about the size and extent of the problem, that’s what needs to happen. You can’t simultaneously say that there’s a nationwide education crisis that needs to be solved by firing teachers and avoid the conclusion that huge numbers need to be fired. If reformers claim that even one out of every ten public school teachers needs to be let go (a low number in reform rhetoric), we’re talking about more than 300,000 fired teachers.

I’ve argued before that the idea that market economics is an effective means of solving educational problems falls apart once you recognize that, unlike a factory building a widget, educators don’t control most of what contributes to a child’s learning outcomes. But suppose you do believe in the standard conservative economics take on school reform: how can Chetty’s ideas make sense, if we trust young workers in a labor market to act in their own rational best interest? Chetty believes that we need, at scale, to “either retrain or dismiss the teachers who are less effective, [to] substantially increase productivity without significantly increasing cost.” Without increasing costs – in other words, without raising teacher salaries. The median teacher in this country makes ~$57,000 a year; the 75th percentile makes ~$73k, and the 25th percentile, ~$45k. Compare that with median lawyer salaries well above $100,000 a year, median doctor salaries close to $200,000, and an average of $125,000+ for MBA graduates. So we’re not going to pay teachers more, and we’re going to erode labor protections enough to dismiss those less effective teachers. This doesn’t sound like a good deal already.

Of course, teachers don’t just suffer from low median wages compared to people with similar levels of schooling. They also suffer from far lower social status than teachers are typically afforded in other countries, as Dr. Chetty acknowledges:

Yeah, I think status seems incredibly important. My sense of the K–12 education system in the US is, unfortunately for many kids graduating from top colleges, teaching is not near the top of the list of professions that they’d consider. It’s partly because, in a sense, they can’t afford to be teachers because it entails such a pay cut. But also because they feel that it’s not the most prestigious career to pursue.

Why yes, Dr. Chetty, it’s true! Teachers don’t get a lot of prestige in this country! Maybe that’s because well-paid celebrity academics who make several times the median teacher salary – people like you – talk casually about firing them en masse and insist that they are the source of poor metrics! The ed reform movement has insulted the profession of public school teacher for years. Popular expressions of that philosophy, like the execrable documentary Waiting for “Superman”, have contributed to widespread assumptions that students are failing because their teachers are lazy and corrupt. How can a political movement that has relentlessly insulted the teaching profession not contribute to declining interest in joining that profession?

Here in New York, the numbers are clear: we’re already facing a serious teacher shortage.

What Chetty and Cowen are asking for makes no sense according to their own manner of thinking. Dr. Chetty, Dr. Cowen: there is no bullpen. Even if I thought that teachers controlled far more of the variance in quantitative education metrics than I do, and even if I didn’t have fair-labor objections to removing hundreds of thousands of teachers, we would be stuck with this simple fact. We do not have hundreds of thousands of talented young professionals, eager to forgo the far greater rewards available in the private sector, ready to jump in and start teaching. And we certainly won’t have such a thing if we share Chetty’s resistance to paying teachers more and his commitment to making it easier to fire them.

So: no higher salaries for a relatively low-paying profession, eroding the job security that is the most treasured benefit of the job, continuing to degrade and insult the current workforce as lazy and undeserving, getting rid of hundreds of thousands of them, and yet somehow attracting hundreds of thousands of more talented, more committed young workers to become teachers.

According to what school of economics, exactly, is such a thing possible?

Study of the Week: Better and Worse Ways to Attack Entrance Exams

For this week’s Study of the Week I want to look at standardized tests, the concept of validity, and how best – and worst – to criticize exams like the SAT and ACT. To begin, let’s consider what exactly it means to call such exams valid.

What is validity?

Validity is a multi-faceted concept that’s seen as a core aspect of test development. Like many subjects in psychometrics and stats, it tends to be used casually and referred to as something fairly simple, when in fact the concept is notoriously complex. Accepting that any one-sentence definition of validity is thus a distortion, generally we say that validity refers to the degree to which a test measures what it purports to measure. A test is more or less valid depending on its ability to actually capture the underlying traits we are interested in investigating through its mechanism. No test can ever be fully or perfectly validated; we can only say that it is more or less valid. Validity is a vector, not a destination.

Validity is so complex, and so interesting, in part because it sits at the nexus of both quantitative and philosophical concerns. Concepts that we want to test may appear superficially simple but are often filled with hidden complexity. As I wrote in a past Study of the Week, talking about the related issues of construct and operationalization,

If we want to test reading ability, how would we go about doing that? A simple way might be to have a test subject read a book out loud. We might then decide if the subject can be put into the CAN READ or CAN’T READ pile. But of course that’s quite lacking in granularity and leaves us with a lot of questions. If a reader mispronounces a word but understands its meaning, does that mean they can’t read that word? How many words can a reader fail to read correctly in a given text before we sort them into the CAN’T READ pile? Clearly, reading isn’t really a binary activity. Some people are better or worse readers and some people can read harder or easier texts. What we need is a scale and a test to assign readers to it. What form should that scale take? How many questions is best? Should the test involve reading passages or reading sentences? Fill in the blank or multiple choice? Is the ability to spot grammatical errors in a text an aspect of reading, or is that a different construct? Is vocabulary knowledge a part of the construct of reading ability or a separate construct?

Questions such as these are endemic to test development, and frequently we are forced to make subjective decisions about how best to measure complex constructs of interest. As is common in the quantitative social sciences, this subjective, theoretical side of validity is often written out of our conception of the topic, as we want to speak with the certainty of numbers and the authority of the “harder” sciences. But theory is inextricable from empiricism, and the more we wish to hide it, the more subject we are to distortions that arise from failing to fully think through our theories and what they mean. Good empiricists know theory comes first; without it, the numbers are meaningless.

Validity has been subdivided into a large number of types, which reflect different goals and values within the test development process. Some examples include:

  • Predictive Validity: The ability of a test’s results to predict that which it should be able to predict if the test is in fact valid. If a reading test predicts whether students can in fact read texts of a given complexity or reading level, that would provide evidence of predictive validity. The SAT’s ability to predict the grades of college freshmen is a classic example.
  • Concurrent Validity: If a test’s results are strongly correlated with those of a test that measures similar constructs and which has itself been sufficiently validated, that provides evidence of concurrent validity. Of course, you have to be careful – two invalid tests might provide similar results but not tell us much of actual worth. Still, a test of quantitative reasoning and a test of math would be expected to be imperfectly yet moderately-to-strongly correlated if each is itself a valid test of the given construct.
  • Curricular Validity: As the name implies, curricular validity reflects the degree to which a test matches with a given curriculum. If a test of biology closely matches the content in the syllabus of that biology course, we would argue for high curricular validity. This is important because we can easily imagine a scenario where general ability in biology could be measured effectively by a test that lacked curricular validity – students who are strong in biology might score well on a test, and students who are poor would likely score poorly, even if that test didn’t closely match the curriculum. But that test would still not be a particularly valid measure of biology as learned in that class, so curricular validity would be low. This is often expressed as a matter of ethics.
  • Ecological Validity: Heading in a “softer” direction, ecological validity is often invoked to refer to the degree to which a test or similar assessment instrument matches the real-life contexts in which its consequences will be enacted. Take writing assessment. In previous generations, it was common for student writing ability to be tested through multiple choice tests on grammar and sentence combining. These tests were argued to be valid because their results tend to be highly correlated with the scores that students receive on written essay exams. But writing teachers objected, quite reasonably, that we should test student writing by having them write, even if those correlations are strong. This is an invocation of ecological validity and reflects a broader (and to me positive) effort not to reduce validity to narrowly numerical terms.

I could go on!
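
Since predictive validity does the heavy lifting in the rest of this post, here’s a toy simulation – invented numbers, not data from any real exam – of what a validity coefficient actually is: nothing more than the correlation between test scores and the criterion the test claims to predict.

```python
# Toy predictive-validity simulation (invented data, not any real test).
# Even a genuinely valid test correlates imperfectly with its criterion,
# because both the test and the criterion measure the trait with noise.
import numpy as np

rng = np.random.default_rng(42)
ability = rng.normal(size=10_000)                                  # latent trait
test_score = ability + rng.normal(scale=0.7, size=10_000)          # noisy measure
freshman_gpa = 0.6 * ability + rng.normal(scale=0.8, size=10_000)  # noisy criterion

r = np.corrcoef(test_score, freshman_gpa)[0, 1]
print(f"predictive validity coefficient: r = {r:.2f}")  # about .49 here
```

Under these made-up noise levels the coefficient lands near .5 – by construction, in the same neighborhood as the real coefficients in this week’s study.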

When we talk about entrance examinations like the SAT or GRE, we often fixate on predictive validity, for obvious reasons. If we’re using test scores as criteria for entry into selective institutions, we are making a set of claims about the relationship between those scores and the eventual performance of those students. Most importantly, we’re saying that the tests help us to know that students can complete a given college curriculum, that we’re not setting them up to fail by admitting them to a school where they are not academically prepared to thrive. This is, ostensibly, the first responsibility of the college admissions process. Ostensibly.

Of course, there are ceiling effects here, and a whole host of social and ethical concerns that predictive validity can’t address. I can’t find a link now, but a while back a Harvard admissions officer admitted that something like 90% of applicants have the academic ability to succeed at the school, and that much of the screening process had little to do with actual academic preparedness. This is a big subject that’s outside the bounds of this week’s study.

The ACT: Still Predictively Valid

Today’s study, by Paul A. Westrick, Huy Le, Steven B. Robbins, Justine M. R. Radunzel, and Frank L. Schmidt, is a large-n (189,612) study of the predictive validity of the ACT, with analysis of the role of socioeconomic status (SES) and high school grades in retention and college grades. The researchers examined the outcomes of students who took the ACT and went on to enroll in 4-year institutions from 2000 to 2006.

The nut:

After corrections for range restriction, the estimated mean correlation between ACT scores and 1st-year GPA was .51, and the estimated mean correlation between high school GPA and 1st-year GPA was .58. In addition, the validity coefficients for ACT Composite score and high school GPA were found to be somewhat variable across institutions, with 90% of the coefficients estimated to fall between .43 and .60, and between .49 and .68, respectively (as indicated by the 90% credibility intervals). In contrast, after correcting for artifacts, the estimated mean correlation between SES and 1st-year GPA was only .24 and did not vary across institutions….

…1st-year GPA, the most proximal predictor of 2nd-year retention, had the strongest relationship (.41). ACT Composite scores (.19) and high school GPA (.21) were similar in the strength of their relationships with 2nd-year retention, and SES had the weakest relationship with 2nd-year retention (.10).

The results should be familiar to anyone who has taken a good look at the literature on these tests, and to anyone who has been a regular reader of this blog. The ACT is in fact a pretty strong predictor of GPA, though far from a perfect one at .51. Context is key here; in the world of social sciences and education, .51 is an impressive degree of predictive validity for the criterion of interest. But there’s lots of wiggle! And I think that’s ultimately a good thing; it permits us to recognize that there are a variety of ways to effectively navigate the challenges of the college experience… and to fail to do so. (As the Study of the Week post linked to above notes, GPA is strongly influenced by Conscientiousness, the part of the Five Factor Model associated with persistence and delaying gratification.) We live in a world of variability, and no test can ever make perfectly accurate predictions about who will succeed or fail. Exceptions abound. Proponents of these tests will say, though, that they are probably much more valid predictors of college grades and dropout rates than more subjective criteria like essays and extracurricular activities. And they have a point.
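
One technical note: the phrase “after corrections for range restriction” in the block quote matters. We only observe college GPA for students who actually enrolled – a range-restricted slice of all test takers – and restriction attenuates correlations. A standard adjustment is Thorndike’s Case II formula; here’s a minimal sketch, with invented input numbers for illustration:

```python
# Thorndike's Case II correction for direct range restriction: estimate the
# predictor-criterion correlation in the full applicant pool from the
# correlation observed in the (range-restricted) enrolled sample.
import math

def correct_for_range_restriction(r_obs: float, sd_pool: float, sd_sample: float) -> float:
    u = sd_pool / sd_sample  # > 1 when the enrolled sample is restricted
    return (r_obs * u) / math.sqrt(1 + r_obs**2 * (u**2 - 1))

# Invented example: an observed r of .35 among enrollees whose scores have
# 70% of the spread of all test takers implies a pool-level r near .47.
print(correct_for_range_restriction(0.35, sd_pool=1.0, sd_sample=0.7))
```
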

Does the fact that SES correlates “only” at .24 with college GPA mean SES doesn’t matter? Of course not. That level of correlation for a variable that is truly construct-irrelevant and which has such obvious social justice dimensions is notable even if it’s less powerful than some would suspect. It simply means that we should take care not to exaggerate that relationship, or the relationship between SES and performance on tests like the ACT and SAT, which is similar at about .25 in the best data known to me. Again: clearly that is a relevant relationship, and clearly it does not support the notion that these tests only reflect differences in SES.

Ultimately, every read I have of the extant evidence demonstrates that tests like the SAT and ACT are moderately to highly effective at predicting which students will succeed in terms of college GPA and retention rates. They are not perfect and should not be treated as such, so we should use other types of evidence such as high school grades and other, “soft” factors in our college admissions procedures – in other words, what we already do – if we’re primarily concerned with screening for prerequisite ability. Does that mean I have no objections to these tests or their use? Not at all. It just means that I want to make the right kinds of criticisms.

Don’t Criticize Strength, Criticize Weakness

A theme that I will return to again and again in this space is that we need to consider education and its place in society from a high enough level to think coherently. Critics of the SAT and ACT tend to pitch their criticisms at a level that does them no good.

So take this piece in Slate from a couple of enthusiastic SAT (and IQ) proponents. In it, they take several liberal academics to task for making inaccurate claims about the SAT, in particular the idea that the SAT only measures how well you take the SAT. As the authors say, the evidence against this is overwhelming; the SAT, like the ACT, is and has always been an effective predictor of college grades and retention rates, which is precisely what the test is meant to predict. The big testing companies invest a great deal of money and effort in making them predictively valid. (And a great deal of test taker time and effort, too, given that one section of each given exam is “experimental,” unscored and used for the production of future tests.) When you attack the predictive validity of these tests – their ability to make meaningful predictions about who will succeed and who will fail at college – you are attacking them at their strongest point. It’s as if their critics are deliberately making the weakest critique possible.

“These tests are only proxies for socioeconomic status” is a factually incorrect attempt to make a criticism of how our educational system replicates received advantage. It fails because it does not operate at the right level of perspective. Here’s a better version, my version: “these tests are part of an educational system that reflects a narrow definition of student success that is based on the needs of capitalism, rather than a fuller, more humanistic definition of what it means to be a good student.”

These tests do indeed tell us how well students are likely to do in college and in turn provide some evidence of how well they will do in the working world. But college, like our educational system as a whole, has been tuned to attend to the needs of the market rather than to the broader needs of humanity. The former privileges the kind of abstract processing and brute reasoning skills that tests are good at measuring and which make one a good Facebook or Boeing employee. The latter would include things like ethical responsibility, aesthetic appreciation, elegance of expression, and dedication to equality, among other things, which tests are not well suited to measuring. A more egalitarian society would of course also have need for, and value, the raw processing power that we can test for effectively, but that strength would be correctly seen as just one value among many. To get there, though, we have to make much broader critiques and reforms of contemporary society than the “SAT just measures how well you take the SAT” crowd tends to engage in.

What I am asking for, in other words, is that we focus on telling the whole story rather than distorting what we know about part of the story. There is so much to criticize in our system and how it doles out rewards, so let’s attack weakness, not strength.

notes

  • For some odd reason my last post, on public subsidies for wealthy Ivies in an era of austerity, did not get pushed out to RSS readers. Apparently that’s happened before. It’s frustrating and I’m not sure what’s happening. You can always follow the ANOVA’s Twitter account for new posts.
  • That post has been republished at Jacobin.
  • I was recently on the left-leaning military affairs podcast What a Hell of a Way to Die, talking about the GI Bill.
  • I will be appearing on the Katie Halper Show on June 14th at Brooklyn Commons from 7 PM to 10 PM, with the brilliant Angela Nagle. It’s a fundraiser for WBAI, which is well worth supporting.
  • This past week’s book review, on “Rebekah Nathan”‘s My Freshman Year, has been delayed and will be pushed out to Patreon patrons tomorrow afternoon. Archival content for patrons is coming later today.
  • Sometimes I write about non-education stuff on Medium. Here’s me on podcasts.
  • A couple people have asked about my Academia.edu and ResearchGate profiles, so I’ll just note that I often forget those exist and they are rarely if ever updated, though I’m going to make an effort to get them up to speed this week.
  • Coming soon: posts on teacher observations, corpus linguistics, and regression.

two sets of universities, two countries, two futures

image by Flickr user John Walker used under CC License

Today, Yale University’s 316th commencement will take place. Beaming young people and their proud parents will flock to the immaculate New Haven campus, eager to start their climb further up the ladder of American success. They know, as they surely knew the day they arrived, that their passage through such an august institution prepares them for a life of financial security and high social standing. They know, in other words, that as much as any young people, they are positioned to advance to the rarefied world of elite America.

Meanwhile, elsewhere in Connecticut, twelve community colleges and four public universities – including one in the very same city – are starved to death by austerity and neoliberalism, as the Democratic governor and a Democratic state legislature in a rich blue state enact brutal cuts to education, social services, and mental health care, while fighting to cut taxes on corporations. The cuts to the Connecticut State University system are particularly devastating. They risk killing majors, shuttering departments, and destroying tenure. Programs that help shepherd a student body that comes disproportionately from non-traditional backgrounds, and thus needs help the most, are under threat. Classes may be cut from course schedules, making it even harder for working students and students who are parents to fit school into their schedules. In every way, a university system that already struggles to serve its students and its state thanks to resource constraints will be hurt even more.

These cuts are personal for me, as I am a graduate of Central Connecticut State University in the CSU system. I will risk self-aggrandizement in saying that I am an example of the kind of success story that is routinely produced by the CSU system and systems like it. In my early 20s I was lost – orphaned, broke, alcoholic, struggling with then-undiagnosed mental illness, and completely without direction or a sense of purpose. But I took classes at the local community college for a year, then transferred to Central, where I met warm, engaging, committed educators who shepherded me through my education and showed me that I had skills and knowledge that had value – that my life had value. Today, I have a PhD, live in New York City, work at a wonderful public college myself, and have been published by some of the most prominent newspapers and magazines in the world. I owe all of that, without exception, to my time in the CSU system. It was there that I put my life back together, thanks to the dedication of the professionals who worked there and the relatively low tuition costs that enabled me to attend. I say with no exaggeration: the Connecticut State University system saved my life. And now, for shortfalls of less than $100 million a year, that system risks being permanently crippled.

To make all of this worse, down I-91 from my old university, Yale sits on a mountain of money, and yet receives more and more in public funds. The degree to which our government subsidizes the immensely wealthy Ivy League schools defies belief. A report from Open the Books, an organization that works for transparency in government spending, estimates that the federal and state governments spent over $40 billion on the Ivy League schools in tax exemptions, contracts, grants, and direct gifts from 2010 to 2015. The eight Ivy League universities – small, elite institutions from one region of the country that serve a tiny fraction of our college students and that could scarcely need government support less – receive more money annually from the federal government, on average, than 16 states. Four in ten students from the top 0.1% of families by income attend the Ivy League or similarly elite institutions; in 2012, 70% of Yale’s incoming freshmen came from families making more than $120,000; the median family income for Harvard students is triple the national average. The overwhelming majority of these students go on to lives of economic security, and many to the upper echelons of our economy.

Yet we continue to pour government money into these rich institutions, and their wealthy alumni pour hundreds of millions of untaxed dollars into their endowments, often invoking the spirit of giving and the need for equal opportunity while they do so. Meanwhile, we know empirically that systems like the CSU system, or the City University of New York system (where I now work), or the California State University system – America’s Great Working Class Colleges – do a far better job of creating social mobility than their elite counterparts. Yet each of these systems struggles under brutal funding cuts even though our country has never been richer.

What political philosophy, exactly, could possibly justify this condition? What ideology would conclude that this is a good use of resources, either public or philanthropic?

And yet the condition endures, even accelerates, year after year. No one seems to ask why institutions that are objectively fabulously wealthy should receive such outlandish public subsidy, nor does anyone provide an answer as to why so many of our wealthiest continue to cut large checks for these institutions while our working class colleges, which need the money so desperately, starve. I am absolutely committed to the idea that higher education should be funded with public moneys, but I am also perplexed by the tendency of charitable donations to go where they are needed least of all. Where is Bill Gates to subsidize our working class colleges? Where is Mark Zuckerberg? Why does the philanthropic impulse, when it comes to higher education, always result in the rich getting richer? Connecticut is home to a small army of hedge fund managers and other incredibly wealthy types. I would love it if we could take their money by force for the good of all of society. But barring that, why don’t they use Connecticut’s starving public system for tax avoidance, rather than elite universities that are already filthy rich? Unless the entire point of such gifts is not to create equality of opportunity but to destroy it, to ensure that only those who start out at the top get to end up there. Our elite universities do many good things, but there is no question that they perpetuate and deepen inequality. That is in fact their most basic function: the replication of the ruling class.

I have no doubt that Yale’s class of 2017 is full of smart, talented, and passionate young people. I wish them the best. I also have no doubt that those among them who may not be talented or hardworking will be wholly inoculated against that condition thanks to the accidents of birth and privilege that helped them reach their rarefied station in the first place. As a socialist, I am not interested in making them more susceptible to material hardship and the vagaries of chance, but rather in giving everyone that same level of protection – and that means raiding the coffers of their school, their parents, and their future employers for the betterment of all. I also don’t doubt that, on balance, graduates of the Connecticut State system will succeed as well. College graduates writ large enjoy substantially higher incomes and lower unemployment rates than those without degrees, after all. But how hard will they have to struggle, as their instructors are stretched thinner and thinner by these brutal cuts? How many of them will sink deeper into debt as they are forced to take additional semesters of classes to complete their degrees? How many of them will drop out, thanks to these cuts, and suffer under the burden of student loan debt with no degree to help them secure a better life? How many people who could have been saved, as I was saved, now won’t be because of these cuts?

Today’s Yale commencement ceremony, of course, will be stocked with liberals, decent progressive folk who will tell you they believe in equality and social justice. The parents will mostly be liberal Democrats. The student ranks will be filled, no doubt, by genuine radicals, and the faculty with Marxists and socialists. They do good deeds at these places, as when Yale’s community recently forced the school to change the name of Calhoun College, given John C. Calhoun’s history as a slave owner. I celebrate the activist zeal of all involved in such actions. Yet what Yale’s community can’t do – and perhaps wouldn’t, if it could – is to dismantle its place in the engine of American inequality. For all of the decent people involved in that institution, there is no chance that it will ever voluntarily abandon its role as an incubator of the ruling class. To do so would be unthinkable. That’s the reality of higher education: ostensible leftists preside over the ever-accelerating accumulation of power, money, and privilege. A better way is possible, but it cannot be achieved from within campus.

Until we reach that better world, we’ll be left with these ugly divides. In a sea of political ugliness it’s hard for me to imagine a starker statement of America’s grand failures than this: a starving public university system that serves the poor and the brown and the needy, while next door a school for the 1% sits on $25 billion, untaxed. CSU students, like Yale students, will walk across campus lawns in caps and gowns, eager to begin their new lives. Like Yale students, CSU students will seek a better life. But how many of them will be stuck here in this other America, inequality America, austerity America, while those who’ve already been given so much are given even more?

Correction: Fixed some inaccurate wording in the fourth paragraph.

“Like the validity of intelligence testing, the heritability of intelligence is no longer scientifically contentious.”

That headline is taken from this piece on Vox, by Eric Turkheimer, Kathryn Paige Harden, and Richard E. Nisbett, advocating a third way between “race realist”-style racism and liberal blank slatism. I’ve chosen it as the title for this post because I thought the reaction on social media showed the power of a headline for shaping popular perception of an argument.

Yesterday was an interesting day for me, watching that piece get passed around approvingly on Twitter and Facebook. Interesting because I wrote a post that made substantially the same argument as the Vox piece – that intelligence testing is predictively valid and that genes account for some of the individual variation in that testing, but that racial groupings are socially constructed and arguments about inherent racial inferiority are invalid (and bigoted). Yet I got a lot of heat for my post while the Vox piece was roundly praised. In particular, I was told often that a) IQ and its proxies are not valid and b) that there is no genetic influence on psychological and cognitive outcomes. Both of these ideas are strenuously denied by the authors, who are (unlike me) experts in the relevant fields. Yet because the piece was pitched as anti-Charles Murray (and Sam Harris), objections to these points were muted to nonexistent. Still, it’s essential that progressive people recognize the most important contention of the Vox piece: that rejecting pseudoscientific racism does not undermine the predictive validity of IQ testing or the overwhelming evidence of polygenic heritability of cognitive outcomes. As they say,

a realistic acceptance of the facts about intelligence and genetics, tempered with an appreciation of the complexities and gaps in evidence and interpretation, does not commit the thoughtful scholar to Murrayism in either its right-leaning mainstream version or its more toxically racialist forms.

Obviously, when topics are as sensitive as these, first impressions are incredibly important. Still, it was simultaneously gratifying and aggravating to me. For example, I was accused of “cosplaying as Charles Murray” at Lawyers, Guns, and Money for my initial post on these topics, but yesterday a blogger there approvingly shared the Vox piece – which makes the same argument – as a rejection of Murray. Such a fine line between imitation and rejection, when you don’t read carefully! Like I said in a brief post on Medium, same planet, different worlds.

In any event, I am encouraged by the success of that essay, the authors deserve credit for laying out the case so persuasively, and I think the worm is finally turning against blank slatism or IQ denialism as default progressive opinions. It is not necessary to embrace blank slate thinking to fight racism, and in fact our efforts to do so will be strengthened by our willingness to embrace genetic behaviorism.

Why It Matters

Some people ask me, why bother? Why not just leave this stuff alone, given that some have taken ideas in the same general orbit to truly noxious ends?

It matters that progressive people reject blank slatism because blank slatism is incorrect and we should tell the truth. But even from the most pragmatic or consequentialist perspective, we should accept the contemporary science on intelligence and heritability because doing so is the only way to effectively fight racism and white supremacy. By refusing to engage with the extant science on individual variation, we leave that field of argument entirely to those who would use it for the worst possible ends. As the authors say,

The left has another lesson to learn as well. If people with progressive political values, who reject claims of genetic determinism and pseudoscientific racialist speculation, abdicate their responsibility to engage with the science of human abilities and the genetics of human behavior, the field will come to be dominated by those who do not share those values. Liberals need not deny that intelligence is a real thing or that IQ tests measure something real about intelligence, that individuals and groups differ in measured IQ, or that individual differences are heritable in complex ways.

This is precisely my position. Don’t play to the alt-right frame; don’t help them make the case that progressives are anti-science or resistant to facts. Fight bad science with better. It is also my position, as readers of this blog know, that the assumption that all human beings have equal academic potential produces bad educational policy and leads inevitably to conservative “just deserts” economic attitudes and the social inequalities inherent to meritocracy.

As the authors note, the heritability of cognitive outcomes does not imply that they are not mutable. My position is not, and has never been, genetic determinism, which suggests that genes are destiny and that there are no other meaningful factors. But the influence of genes has to be part of a frank discussion about the fact that, summatively speaking, we have overwhelming evidence that not all individuals have the same academic potential. I have always actually been mechanism agnostic about this; that is, I am not sure to what degree the persistence of academic inequality is about genes or parenting or environment or resources or pure luck. I’m just sure that when we look at scale, the obvious conclusion has to be that not everyone has the same level of potential in all academic endeavors. (We should bear in mind that just as genetic influence does not make an outcome immutable, environmental influence does not mean it’s necessarily changeable.)

As someone interested in education policy, the obvious analytical conclusion is that we should stop trying to force students to reach universal arbitrary performance goals, as No Child Left Behind mandated and test mania encourages. As a socialist, the obvious moral conclusion is that we should move more and more material needs outside of the market economy and guarantee them via government, as our own inability to fully control our academic outcomes means that they cannot morally be used as justification for increased risk of poverty, hopelessness, and marginalization. As I’ve said many times, I believe the racial academic achievement gap will be closed, precisely because I don’t think races are meaningful categories or that they express intrinsic differences in human value. The question is, what happens after we close the racial achievement gap? Would it imply that a bigotry against those not blessed with strong academic potential would be justified? That’s what “meritocracy” argues, and I believe it’s a moral error. I believe that this tendency, called the hereditarian left by some, will only grow in a world where the logic of meritocracy has brought us spiraling inequality, the division of our country into essentially two different societies with profoundly different qualities of life.

To fight against it, we have to talk clearly and openly about these issues, and I think that Vox piece was a strong step in that direction. The Genetics & Human Agency project, which is led by Turkheimer and Harden, is a step in the right direction too. There are big moral and political questions here, and it’s up to us to answer them in a fair and humane way.

Campbell’s Law and the inevitability of school fraud

When discussing education policy, there are few things more useful to understand than Campbell’s Law:

The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.

There’s a great piece on Campbell’s Law and testing mania from a mathematician here (PDF). The implications of this dynamic are obvious in ed circles. Why do teachers cheat in high-stakes environments? Why do charter schools cook the books while preaching about the importance of their mission? Why do I suspect that there’s a mountain of cheating and corruption going on in our ed data that hasn’t been discovered? Because of Campbell’s Law and the stubbornness of academic inequalities.

Hard to think of a more apt example of the influence of Campbell’s Law than this story out of a San Diego charter school. I highly encourage you to read the piece, as it’s the kind of diligent and important local journalism that is so deeply threatened today. It tells the story of a school where a leader’s missionary zeal and “no excuses” culture have combined to pressure teachers into rampant grade inflation, sending young people into higher education with grades that don’t remotely match their skills.

Forgive this lengthy excerpt but I think it’s worth it here.

Teachers who have worked with 48-year-old Riveroll say he’s an inspiring leader, a visionary with extraordinary charisma and passion. Parents adore the man who has been named teacher of the year, educator of the year and selected as one of four principals nationwide to participate in the Public Education Leadership Program at Harvard University.

Yet data, documents and interviews contradict the Gompers brand of preparing every student for college. Gompers’ standardized test scores — one metric for college acceptance — are among the bottom of schools in San Diego County and California. These numbers are in contrast to students’ straight A grades with courses in precalculus, advanced biology and AP history.

Teachers say grades are inflated, and if students still can’t graduate, they are “counseled” to attend school elsewhere. The same teachers who praise Riveroll’s talent blame him, saying he shames educators who assign failing grades by telling them they are “murdering” kids.

“He knows he’s not allowed to say, ‘Change their grades or else,’” said former Gompers chemistry teacher Ben Davey.

“But he can say, ‘You’re killing these kids, are you sure you want to leave it as an F?’”

Many people have pointed to rising graduation rates as evidence of the effectiveness of ed reform. And more kids graduating from high school is a good thing indeed. But there are concerns about the graduation rate, involving juking the stats (again) and the fear that the rise stems from lower standards rather than objectively better students. (Take Renewal Schools here in NYC, for example.) I’m agnostic on the overall question, although I agree with pessimists who say, for example, that something like a third to a half of all graduating American high school students probably couldn’t demonstrate the level of algebra ability nominally required to graduate from high school. The question is, what happens when you combine intense pressure from above to graduate students with the reality that (as I keep insisting) student outcomes are not nearly as plastic as policy types like to imagine?

Standardized tests show proficiency in math and English language arts at Gompers has gotten worse from 2011 to 2016. Forty percent of 11th-graders are below basic proficiency in English. Ninety-one percent didn’t reach the state standard for mathematics….

Six percent of Gompers students were considered “college-ready” based on their SAT scores in 2015-2016. Five percent based on their ACT.

Twenty-two percent of the Advanced Placement (AP) tests taken that year were marked three or higher, the level at which college credit is granted. San Diego Unified averaged 59 percent.

However, inewsource learned that of the 113 students graduating this year, not one earned a grade lower than a C in the first semester of their 2015-2016 school year. More than half of the class had straight A’s with courses in advanced chemistry, AP history and precalculus. Some of those students failed several lower-level classes the year before.

The class averaged a 4.7 GPA out of 5 the first half of their junior year.

It’s essential to say: this kind of dynamic, where a crusading spirit and an insistence that everyone can achieve to the same level collide with the limitations of reality, makes fraud, lowered standards, or both inevitable. It’s an entirely predictable condition; as long as you make people’s jobs dependent on reaching metrics that they can’t reach legitimately, they will achieve them illegitimately. It doesn’t matter how much integrity they have. It doesn’t matter if they’re good people. It doesn’t matter if they’re really invested in their students’ success. Campbell’s Law is not a normative claim but an empirical observation: educational fraud does happen under these conditions, no matter what we think about it morally. And these lowered standards inevitably come back to bite us in the end, as these students go on to colleges where they either fail out for lack of prerequisite ability or are graduated into jobs they can’t perform. That’s a problem here in the CUNY system, for example, where 57% of undergraduates can’t pass an algebra test. Advancing them through the system might seem humane at the individual level, but from the broader perspective it just amplifies problems.

This is the kind of story that some might imagine inspires bitter cynicism in me. But I don’t feel embittered; I just feel sad. It seems genuinely tragic to me: good intentions leading to bad outcomes. The charter school world is no doubt full of profiteers and con men, but I also acknowledge that many people really believe their cheery, fingers-stuck-in-their-ears rhetoric about how every single child is capable of excelling. The problem is that this enthusiasm is destructive in a world where students differ, in very real ways, both in individual ability and in the socioeconomic and environmental conditions in which they learn. Truly humane education policy would acknowledge those differences, not attempt to paper them over with cheerful, dishonest bromides. What we need to accept as a society is that what’s really “killing these kids” is not their lack of academic preparation for college but an economic system in which only those who are so prepared have a meaningful shot at a comfortable and secure life.

She remembers a strict but supportive director who valued 12-hour days out of his staff, along with sacrificing vacation time, career goals and a personal life. But she also remembers that Riveroll would bring her coffee after she’d put in late hours the night before.

“It is very challenging to balance this work,” said Parsons. “It is truly missionary work.”

So much pathology

If it is to be as effective and pragmatically useful as it can be at scale, teaching cannot be missionary work. This has always been the problem with the Dead Poets Society, one-inspiring-teacher-breaks-through-to-kids-in-the-ghetto narrative. Even if I were sure these narratives actually reflected what’s best for kids, they are by nature not scalable or subject to being instituted by policy. It’s a hard thing for teachers to accept, but true: by its very nature, inspiration cannot be required or replicated. It’s a beautiful thing when the lives of students are changed in that way. But mass education has to be a system that works within the mundane constraints of real life.

Acknowledging those constraints is necessary if we’re really committed to improving our system for the pragmatic benefit of all. And the first step to getting there is to recognize that insisting that all students can excel will inevitably result in these kinds of pleasant lies.

Study of the Week: What Actually Helps Poor Students? Human Beings

As I’ve said many times, a big part of improving our public debates about education (and, with hope, our policy) lies in having a more realistic attitude towards what policy and pedagogy are able to accomplish in terms of changing quantitative outcomes. We are subject to socioeconomic constraints which create persistent inequalities, such as the racial achievement gap; these may be fixable via direct socioeconomic policy (read: redistribution and hierarchy leveling), but have proven remarkably resistant to fixing through educational policy. We also are constrained by the existence of individual differences in academic talent, the origins of which are controversial but the existence of which should not be. These, I believe, will be with us always, though their impact on our lives can be ameliorated through economic policy.

I have never said that there is no hope for changing quantitative indicators. I have, instead, said that reducing the value of education to only those quantitative indicators is a mistake, especially given a realistic attitude towards what pedagogy and policy can achieve. We can and should attempt to improve outcomes on these metrics, but we must be realistic, and the absolute refusal of policy types to be realistic has resulted in disasters like No Child Left Behind. Of course we should ask questions about what works, but we must be willing to recognize that even what works is likely of limited impact compared to factors that schools, teachers, and policy don’t control.

This week’s Study of the Week, by Dietrichson, Bøg, Filges, and Jørgensen, provides some clues. It’s a meta-analysis of 101 studies from the past 15 years, three quarters of which were randomized controlled trials. That’s a particularly impressive evidentiary standard. It doesn’t mean the conclusions are completely certain, but that number of studies, particularly with randomized controlled designs, lends powerful evidence to what the authors find. If we’re going to avoid the pitfalls of significance testing and the replication crisis, we have to do meta-analyses, even as we recognize that they are not a panacea. Before we take a look at this one, a quick word on how they work.

Effect Size and Meta-Analysis

The term “statistically significant” appears in discussions of research all the time, but as you often hear, statistical significance is not the same thing as practical significance. (After “correlation does not imply causation!” that’s the second most common Stats 101 bromide people will throw at you on the internet.) And it’s true and important to understand. Statistical significance tests are performed to help ascertain the likelihood that a perceived quantitative effect is a figment of our data. So we have some hypothesis (giving kids an intervention before they study will boost test scores, say) and we also have the null hypothesis (kids who had the intervention will not perform differently than those who didn’t get it). After we do our experiment we have an average test score for each of the two groups, and we know how many students are in each and how spread out their scores are (the standard deviation). From that we can calculate a p-value, which tells us the likelihood that we would have gotten a difference in averages at least that large even if the null were actually true. Stat heads hate this kind of language, but casually people will say that a result with a low p-value is likely a “real” effect.
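If you want to see the mechanics, here’s a toy version of that experiment in Python. It’s a sketch with invented scores, not data from any real study:

```python
from scipy import stats

# Two groups of made-up test scores: one that got our hypothetical
# intervention before studying, one that didn't.
intervention = [78, 85, 90, 72, 88, 81, 79, 94]
control = [74, 80, 69, 77, 83, 71, 75, 70]

# Welch's t-test: the p-value is the probability of seeing a difference
# in group means at least this large if the null were actually true.
t_stat, p_value = stats.ttest_ind(intervention, control, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```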

For all of its many problems, statistical significance testing remains an important part of navigating a world of variability. But note what a p-value is not telling us: the actual strength of the effect. That is, a p-value helps us have confidence in making decisions based on a perceived difference in outcomes, but it can’t tell us how practically strong the effect is. So in the example above, the p-value would not be an appropriate way to report the size of the difference in averages between the two groups. Typically people have just reported those different averages and left it at that. But consider the limitations of that approach: frequently we’re going to be comparing figures from profoundly different research contexts, derived from different metrics and scales. So how can we responsibly compare different studies, and through them different approaches? By calculating and reporting effect size.

As I discussed the other day, we frequently compare different interventions and outcomes through reference to the normal distribution and standard deviation. As I said, that allows us to make easy comparisons between positions on different scales. You look at the normal distribution and can say, OK, students in group A were this far below the mean, students in group B were this far above it, and so we can say responsibly how different they are and where they stand relative to the norm. Pragmatically speaking (and please don’t scold me), there are only about three standard deviations of space below and above the mean in normally-distributed data. So when we say that someone is a standard deviation above or below someone else, that gives you a sense of the scale we’re talking about here. Of course, the context and subject matter make a good deal of difference too.
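To make that concrete, here’s a quick sketch using the standard normal distribution (mean 0, SD 1); position in SD units translates directly into percentiles:

```python
from scipy import stats

# A student one SD above the mean sits at roughly the 84th percentile;
# one SD below, roughly the 16th.
print(stats.norm.cdf(1.0))   # ~0.841
print(stats.norm.cdf(-1.0))  # ~0.159

# And nearly everyone falls within three SDs of the mean, which is why
# there's so little "space" beyond that.
print(stats.norm.cdf(3.0) - stats.norm.cdf(-3.0))  # ~0.997
```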

There are lots of different ways to calculate effect sizes, though all involve comparing the size of the given effect to the standard deviation. (Remember, standard deviation is important because spread tells us how much we should trust a given average. If I give a survey on a 0-10 scale and I get equal numbers of every number on that scale – exactly as many 0s, 1s, 2s, 3s, etc. – I’ll get an average of 5. If I give that same survey and everyone scores a 5, I still get an average of 5. But for which situation is 5 a more accurate representation of my data?) In the original effect size metric, one you still see sometimes, you simply divide the difference between the averages by the pooled standard deviation of the groups you’re comparing, which gives you Cohen’s d. There are much fancier ways to calculate effect size, but those are outside the bounds of this post.
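For the simple two-group case, the arithmetic fits in a few lines of Python. This is a sketch of the textbook formula, not how any particular study computed it:

```python
import math

def cohens_d(group_a, group_b):
    # Cohen's d: the difference in means divided by the pooled
    # standard deviation, so the effect is expressed in SD units.
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = sum(group_a) / n_a, sum(group_b) / n_b
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)
    # Pool the two sample variances, weighted by degrees of freedom.
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean_a - mean_b) / pooled_sd
```

Feed it the two groups from the t-test sketch above and you get the difference between them in standard deviation units.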

A meta-analysis takes advantage of the affordances of effect size to compare different interventions in a mathematically responsible way. A meta-analysis isn’t just a literature review; rather than simply reporting what previous researchers have found, those conducting a meta-analysis use the quantitative data those researchers make available to calculate pooled effect sizes. When doing so, they weight the data by sample size (more is better), standard deviation (less spread is better), and the size of the effect. There are then some quality controls and attempts to account for differences in context and procedure between studies. What you’re left with is the ability to compare different results and discuss how big effects are in a way that helps mitigate the power of error and variability in individual studies.
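The weighting is the heart of it. The authors’ actual models are fancier (and meta-analysts argue about fixed versus random effects), but a bare-bones inverse-variance pool looks something like this sketch, with hypothetical numbers:

```python
def pooled_effect(effects, variances):
    # Fixed-effect, inverse-variance pooling: studies with smaller
    # variance (bigger samples, tighter estimates) get more weight.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = (1.0 / sum(weights)) ** 0.5  # standard error of the pooled effect
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

# Three hypothetical studies: per-study effect sizes and their variances.
print(pooled_effect([0.30, 0.15, 0.45], [0.01, 0.04, 0.09]))
```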

Because meta-analyses must go laboriously through explanations of how studies were selected and disqualified, as well as descriptions of quality controls and of the particular methods used to pool standard deviations and calculate effect sizes, reading them carefully is very boring. So feel free to hit up the Donation buttons to the right to reward your humble servant for wading through all this.

Bet On the Null

One cool thing about meta-analyses is that they allow you to get a bird’s-eye view of the kinds of effects reported across many studies of many types of interventions. And what you find, in ed research, is that we’re mostly playing with small effects.

In the graphic above, the scale at the bottom is for effect sizes represented in standard deviations. The dots on the lines are the effect sizes for a given study. The lines extending from the dots are confidence intervals. A confidence interval is another way of grappling with statistical significance and how much we trust a given average. Because of the inevitability of measurement and sampling error, we can never say with 100% certainty that a sample mean is the actual mean of the population. Instead, we can say with a certain degree of confidence, which we choose ourselves, that the true mean lies within a given range of values. 95% confidence intervals, such as these, are a typical convention. Those lines tell us that, given the underlying data, we can say with 95% confidence that the true average lies within them. If you wanted to narrow those intervals, you could choose a lower level of confidence, but then you’re necessarily increasing the chance that the true mean isn’t actually inside them.
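Here’s what that computation looks like for a single sample mean, again with invented numbers:

```python
import numpy as np
from scipy import stats

scores = [74, 80, 69, 77, 83, 71, 75, 70, 88, 79]  # hypothetical sample
mean, sem = np.mean(scores), stats.sem(scores)     # sem = standard error

# 95% CI from the t distribution. Drop 0.95 to 0.90 and the interval
# narrows, at the cost of a greater chance the true mean lies outside it.
low, high = stats.t.interval(0.95, df=len(scores) - 1, loc=mean, scale=sem)
print(f"mean = {mean:.1f}, 95% CI = ({low:.1f}, {high:.1f})")
```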

Anyhow, look at the effects here. As is so common in education, we’re generally talking about small impacts from our various interventions. This doesn’t tell you what kinds of interventions these studies performed – we’ll get there in just a second – but I just want to note how studies with the most dependable designs tend to produce limited effects in education. In fact, in a majority of these studies the confidence interval includes zero. Meanwhile, only 6 of these studies show meaningfully powerful effects – though in the context of ed research, those are pretty large.

Not to cast aspersions, but the Good et al. study reports the kind of effect size that makes me skeptical right off the bat. The very large confidence interval should also give us pause. That doesn’t mean the researchers weren’t responsible, or that we should throw the study out entirely. It just means that this is exactly what meta-analysis is for: it helps us put results in context, to compare the quantitative results of individual studies against others, and to get a better perspective on the size of a given effect and the meaning of a confidence interval. In this case, the confidence interval is so wide that we should take the result with several pinches of salt, given the variability involved. Again, no insult to the researchers; ed data is highly variable, so getting dependable numbers is hard. We just need to be real: when it comes to education interventions, we are constrained by the boundaries of the possible.

Poor students benefit most from the intervention of human beings

OK, on to the findings. When it comes to improving outcomes for students from poor families, what does this meta-analysis suggest works?

A few things here. We’ve got a pretty good illustration of the relationship between confidence intervals and effect size: small-group instruction has a strong effect size, but because its confidence interval (just barely) overlaps with 0, it could not be considered statistically significant at the .05 level. Does that mean we throw out the findings? No; the .05 threshold isn’t a dogma, despite what journal publishing guidelines might make you think. But it does mean that we have to be frank about the level of variability in outcomes here. It seems small-group instruction is pretty effective in some contexts for some students, but potentially not effective at all in others.

Bear in mind: because we’re looking at aggregates of various studies here, wide confidence intervals likely mean that different studies found conflicting findings. We might say, then, that these interventions can be powerful but that we are less certain about the consistency of their outcomes; maybe these things work well for some students but not at all for others. Meanwhile an intervention like increased resources has a nice tight confidence interval, giving us more confidence that the effect is “real,” but a small effect size. Is it worth it? That’s a matter of perspective.

Tutoring looks pretty damn good, doesn’t it? True, we’re talking about less than .4 of an SD on average, but again, look at the context here. And that confidence interval is nice and tight, meaning we should feel pretty confident that this is a real effect. This should not be surprising to anyone who has followed the literature on tutoring interventions. Yet how often do you hear about tutoring from ed reformers? How often does it pop up at The Atlantic or The New Republic? Compare that to computer-mediated instruction, a topic of absolute obsession in our ed debate, the digital Godot we’re all waiting on to swoop in and save our students. No matter how often we get the same result, technology retains its undeserved reputation as the key to fixing our system. When I say that education reform is an ideological project and not a practical one, this is what I mean.

What’s shared by tutoring, small-group instruction, cooperative learning, and feedback and progress monitoring – the interventions that come out looking best? The influence of another human being. The ability to work closely with others, particularly trained professionals, to go through the hard, inherently social work of error and correction and trying again. Being guided by another human being towards mastery of skills and concepts. Not spending tons of money on some ed tech boondoggle, but rather giving individual people the time necessary to work closely with students and shepherd their progress. Imagine if we invested our money in giving all struggling students the chance to work individually or in small groups with dedicated educational professionals whom we treated as respected experts and paid accordingly.

What are we doing instead? Oh, right. Funneling millions of dollars into one of the most profitable companies in the world for little proven benefit. Guess you can’t be too cynical.