Longtime readers should know that Sherman Dorn is one of my favorite people in the edusphere. His recent “How can we use bad measures in decisionmaking?” is a fine example of why I value his contributions so much.
His titular question is THE QUESTION at the heart of so much ed policy action these days. Nobody who isn’t seeking profits or losing their mind likes the tests being used — not Arne Duncan, not Barack Obama, not the people in Madison poised to build a Gifted Education house of cards on them — but almost nobody wants to give up on the tests and many want to expand their use (Arne Duncan, Barack Obama, those house of card builders in Madison).
Everyone talks of better tests, multimodal assessments, new ways of looking at data…. All this can be good, but we aren’t there yet, and the simple-minded attraction of letting flawed data “drive” education policy is strong (the current draft of the MMSD Strategic Plan contains both the reasonable “data inform[ed]” language and the frightening “data driven” language). Additionally, at least three truths often get lost when better assessments and data are discussed (Dorn hits most, if not all, of these).
- All assessments and data are of limited utility. They are snapshots at best; they are designed to measure only specific things; standard deviations and confidence intervals recognize some of the limits but are rarely part of “accountability” discussions. The temptation to use assessments for things they were not designed for is always there.
- Because better assessments should mean assessing more things in more ways, fulfilling this promise will result in more time and resources devoted to assessment and analysis and less to teaching and learning.
- Employing multiple assessments or sophisticated data analysis (i.e., Value Added) moves away from transparency in accountability. It is already clear that few policy makers, much less members of the public, understand the nature of current assessments and accountability practices. When you employ Value Added techniques, all but the most statistically adept are shut out (some Value Added methods are proprietary, and even those who commission the analysis are kept in the dark about its nature; others are open, but beyond the understanding of most people). Combining multiple assessments, including qualitative approaches, produces similar issues. The MMSD Gifted plan is a perfect illustration. They promise to identify potential and achievement with referrals and multiple assessments over five domains (academic, creative, leadership, visual and performing arts) and then decide who gets the extra services based on “percentile scores.” Does anyone think that the promised “transparency” of this exercise will be meaningful to parents and Board members?
This was supposed to be about Sherman Dorn’s post, so back to that (although I think the above — especially the local stuff — is a salient context for what Dorn wrote).
After much good introductory material (including a link to the relatively recent, must-read Broader, Bolder Approach Accountability Paper), Dorn explores a variety of positions relative to the problems of “data that cover too little” and “data of questionable trustworthiness.” His presentation of their strengths and weaknesses is insightful and informative.
Dorn himself rejects both the “don’t worry” and “toss” extremes and seeks to extend (begin?) the conversation in pragmatic directions. Here is how he closes:
Even if you haven’t read Accountability Frankenstein or other entries on this blog, you have probably already sussed out my view that both “don’t worry” and “toss” are poor choices in addressing messy data. All other options should be on the table, usable for different circumstances and in different ways. Least explored? The last idea, modeling trustworthiness problems as formal uncertainty. I’m going to part from measurement researchers and say that the modeling should go beyond standard errors and measurement errors, or rather head in a different direction. There is no way to use standard errors or measurement errors to address issues of trustworthiness that go beyond sampling and reliability issues, or to structure a process to balance the inherently value-laden and political issues involved here.
The difficulty in looking coldly at messy and mediocre data generally revolve around the human tendency to prefer impressions of confidence and certainty over uncertainty, even when a rational examination and background knowledge should lead one to recognize the problems in trusting a set of data. One side of that coin is an emphasis on point estimates and firmly-drawn classification lines. The other side is to decide that one should entirely ignore messy and mediocre data because of the flaws. Neither is an appropriate response to the problem.
I probably don’t do justice to his post. Read the whole thing.
The reality is that bad data are being used and those uses are expanding. I am not as sanguine as Sherman Dorn about the potential for better data and better ways of using it (I’m guessing he’d object to the word “sanguine” here, and he’d be right, because it does not capture where I think he is coming from; take it not as an absolute but only as a comparison with me), but I do know that explicit discussions of the issues involved, like Dorn’s post, are necessary for progress.
Thanks, Sherman, for the questions and answers.
Thomas J. Mertz