TeachLab Presents The Homework Machine

Failure to Disrupt Book Club with Courtney Bell

Episode Summary

For TeachLab’s eighth Failure to Disrupt Book Club we look back at Justin’s live conversation with regular Audrey Watters and special guest Courtney Bell, a former research scientist at the Educational Testing Service (ETS) and now director of the Wisconsin Center for Education Research (WCER), UW–Madison School of Education. Together they discuss the book’s third edtech dilemma, the Trap of Routine Assessment. “The assessment practice of observing Justin teach or Justin's teaching in an assessment situation is not the same, by definition, from Justin's real-world teaching… My assertion is that's always true in every assessment. If that's the case, then we think to ourselves, where can technology fit into this thing?” - Courtney Bell

Episode Notes

In this episode we’ll talk about:

Courtney’s edtech story - PalmPilot and Mursion
Complex performance assessment
History of assessment technology - TUTOR and PLATO
Real-world teaching vs. the observer effect
Capturing teacher decision making
Lack of social understanding in technology assessment
Peer-assessment technology
Meaningful feedback
Stealth assessment

Resources and Links

Watch the full Book Club webinar here!

Check out Justin Reich’s new book, Failure To Disrupt!

Join our self-paced online edX course: Becoming a More Equitable Educator: Mindsets and Practices

Transcript

https://teachlabpodcast.simplecast.com/episodes/bookclub8/transcript

Produced by Aimee Corrigan and Garrett Beazley

Recorded and mixed by Garrett Beazley

Follow TeachLab:

Facebook

Twitter

YouTube

Episode Transcription

Justin Reich: From the home studios of the Teaching Systems Lab at MIT, this is TeachLab, a podcast about the art and craft of teaching. I'm Justin Reich. I'm here with Audrey Watters for the eighth episode of our book club series discussing my new book, Failure to Disrupt: Why Technology Alone Can't Transform Education. And today we're talking about Chapter 7, The Trap of Routine Assessment, with the incredible Courtney Bell. Courtney Bell is a research scientist and the Director of the Wisconsin Center for Education Research, where she focuses on teacher evaluation. Courtney is someone who deeply understands assessment and assessment systems, and she's also just a profoundly humane person, as you'll get to know in this episode. Audrey and I enjoyed the conversation and we hope you will too.

So Courtney, the way that we ask everyone to introduce themselves... We'll get more into your background specifically, but maybe you can start with your EdTech story. Is there any particular moment as a student or as a teacher that got you interested in assessment or interested in education technology? Or what stands out to you as you think about your pathway here?

Courtney Bell: Yeah, so two things. One, which maybe then, Justin, you're going to say to yourself, "Oh wow, I really should have rethought this guest." So the first one is, in graduate school we had to take a course at a different institution. And as a graduate student you have that self-consciousness of, "Oh, maybe I don't really understand what the big point here is." So here we were listening to this super famous science educator, who, this gives away my age, was working with PalmPilots.

And this science teacher, educator, researcher person was going on and on about how this PalmPilot was this amazing data collection tool. It was going to revolutionize K-12 science education. And I am sitting there, wheels turning as fast as they can turn like, "Okay, I don't get it." I had just been a science teacher. I had just been a high school science teacher, biology. I don't get it. How is this going to revolutionize [inaudible 00:02:20]? What am I missing? So I come to EdTech as a skeptic, I should say first. So, a born skeptic of [crosstalk 00:02:26].

Justin Reich: I don't know if you can remember in that moment, what was it about the description of the PalmPilot that just seemed totally discordant with reality for you?

Courtney Bell: Well, the first and most important thing is it solved a problem I'd never had as a teacher. Okay? I didn't have any trouble getting people to write things down and collect data. I had 8 million probes that I could use. I had awesome graphing calculators. Why do I need a PalmPilot? It solved a problem I didn't have as a teacher. So I'm like, "Okay, I must be missing something. Maybe somebody else has this problem. I don't know."

Justin Reich: That's great. Did you say there was a second one?

Courtney Bell: Yeah. So that's the skeptic in me, which goes to the second story, which sends the skeptic in this way. And I can share a slide to show people this setup later, if we're curious about it. So when I was at ETS, we partnered with two organizations, TeachingWorks, which is a center at the University of Michigan run by Deborah Ball, for those of you who know stuff about math ed, and Francesca Forzani, her colleague there. And so, they were busy working on these things called high-leverage practices, which are discrete teaching practices that all teachers do across grade levels. They look very different across grade levels and subjects, but they're important, and they are repetitive. People do them a lot. So teachers do them a lot.

And so we've been partnering with them and Mursion, which is an EdTech company that we can say more about in a second, but they do this AI-supported avatar technology. So it's basically a person, an actor, actually... When the company was originally founded, an actor behind and animating a number of avatars on the screen. And so-

Justin Reich: Like a digital puppetry kind of thing. [crosstalk 00:04:14]

Courtney Bell: You got it, digital puppetry.

Justin Reich: Somebody's sitting in a warehouse with an Xbox controller and a voice modulator, making five little avatars that look like children talk to people and stuff like that.

Courtney Bell: And it's a one actor to a five kid thing, in the case like Justin is describing. So what you do in real time is, they brought the technology to ETS to show us this. And so you would put on this headset, and it had a microphone around it and a camera would capture you. Imagine yourself standing in front of a big slide projector screen, like you would for a PowerPoint slide projector. So, you're up there in front of it and it's capturing you, and you've got this headset on. And then the kids would say something. They would say like, "Oh, hi, Ms. Bell." So I watched a couple of people go through what we call the simulator, and I was like, "Hmm." And everyone's quiet, and they kept asking like, "Oh, you want to hop in? You want to hop in?" Okay, so I'm like, "I'm in."

So what was really weird for me is this, the skeptic in me fully expected this to be kind of performing. Like me thinking about, "How do I test this assessment system?" That's what I thought it was going to feel like, and that is not at all what it felt like. It felt like somebody needed to have an fMRI on my brain. I seriously felt like I was a high school teacher again. These neural pathways that I was enacting of like the calling on the kids, who are, by the way, little cartoons, puppets, and I know that full well, and completely interacting with them using the kinds of thinking that I did as a teacher. Both as a high school teacher and as a university teacher. And that was a profound experience to me. Until I was inside that simulator, I would never have believed it would have felt that way.

Justin Reich: That's great. So the skeptic, fully prepared to screen new technologies and say, "Uh-uh, that's not what's going to work. That's not what's going to be helpful." And then you found something that you could step into, this digital teaching simulator, where you go, "Wow, this is making me exercise my brain in a way that feels really real and authentic to me and could potentially be helpful to other teachers." That's great. That's a great introduction.

So we probably ought to ask you for a little bit more background, just so people know where you're coming from. So you're a biology teacher, and then you taught for a little bit at the University of Connecticut, and then you went and worked at Educational Testing Service.

Courtney Bell: Yep.

Justin Reich: Which, for the folks who are out of the country, maybe you could just describe what ETS is and what your work there was like.

Courtney Bell: Yeah. ETS is a nonprofit testing company. So all that nonprofit part means is that the money that they do make from those test fees, that all of us pay for various tests, the GRE, TOEFL, TOEIC, for those of you who are out of the country, you're probably more familiar with those, those then get invested back into the public good. And one of the versions of the public good is supporting foundational basic science research on all kinds of things within the assessment domain. So while I was there, I started and led a center that developed assessments of teaching quality. And some of them were technology-enhanced kinds of assessments, and some of them were not at all, like observation tools out in schools in the US and around the globe, actually.

And I most recently, just before I left at the end of June this past year... We're just in the process, it'll be released next week. It's by the OECD, where we're doing a big, large-scale study in eight countries of the relationship between teaching and learning, using all different kinds of assessment tools. So my assessment background focuses on observation tools, but we've used multiple choice items, all kinds of computer adaptive stuff, all kinds of stuff around portfolios. So lots of different kinds of ways, but all for me personally around the assessment of teaching and teachers. That said, people that were in my center focused on students and their learning as well.

Justin Reich: Great. So your expertise is in this really challenging domain: teaching is this immensely complex task where the outcomes of what happens from teaching are really hard to trace. As all teachers know, sometimes it's obvious that the kid in front of you clicks and gets it, and then sometimes it looks like it's obvious, but they actually knew it before and you haven't taught them a thing. And then sometimes it looks like they totally don't get it at all, and a month later they snap together something from November with something from December and have this major breakthrough. And half of what you're doing isn't really related to academic content anyway, it's making sure they feel good, healthy, whole people. And how do we figure out who's doing that well, and what they are doing when they do that well, so we can tell other people about what that work looks like and raise another generation of educators to be a little bit better than the last one. Is that a reasonable way of capturing what you're aiming for?

Courtney Bell: Yeah, for sure. Complex performance assessment is what I would call it for shorthand in assessment language.

Justin Reich: Good. Yeah, so simple performance assessment would be, can you add? Can you repeat something? Can you remember a list of numbers or something like that, and this is complex, doing a real world task. Okay. So hopefully now it's obvious to people why we invited you here to talk with us about the trap of routine assessment, one of these things that I describe as a fundamental dilemma for education technology, that if we can't get better at assessment technologies, then there's going to be parts of our education technology that remain stunted for a long time. Maybe we'll pull in Audrey Watters into the conversation here. And Audrey, and then Courtney, maybe you can, especially for folks who [inaudible 00:09:58] had a chance to read the chapter, what did you take away from it? What are the key arguments and key ideas here? And then we can get into what you think worked and made sense, and what was a problem? Audrey, do you want to start us off?

Audrey Watters: Yeah. I want to say one thing about ETS. I actually just finished-

Justin Reich: Yeah, anything.

Audrey Watters: I just finished working on a book on some of the history of EdTech, and one of the people who I look at is Ben Wood. He ran ETS for a while, but his archives were at ETS headquarters in New Jersey. And he was a professor at Columbia in the 1920s, '30s, '40s, and he was one of the very first people... So, he was really interested in standardized testing early, and early standardized testing. And at the time, things were graded by hand. And he was one of the first who became really interested in the idea of starting to use computational machinery, business machinery at the time, in order to be able to scale assessment.

It's just interesting to think about, we have this really long legacy of what assessment started to look like, almost a hundred years ago, in order for it to scale. He happened to have a partnership, reached out to IBM and was very interested in building machines that would automate the grading of tests. When you think about what those kinds of machines would look like in the 1930s, it's not a surprise that they were multiple choice tests. Just thinking about the machinery that we use to automate assessment is actually one of those classic, almost like cart before the horse kinds of things. In some ways we're still using a technology of assessment, the multiple choice test, that's a hundred years old.

I liked how you talked about TUTOR, the programming language from PLATO, is that we think that we're building these brand new artificial intelligence assessments that are using the latest and greatest in data analysis and machine learning, but really there's this whole other really long legacy of assessment that we're still kind of stuck with.

Justin Reich: PLATO was a computer system that was developed at the University of Illinois, Urbana-Champaign. It was one of the first massively networked computer systems and so people bought terminals rather than machines and they hooked into the, like we do now with most computers, but it was sort of an internet before there was an internet. TUTOR was one of the very first programming languages for PLATO after machine code and it was called TUTOR because one of the main things that people tried to do with the PLATO computer system was to teach other people. They taught them lots of things. The example that I cite in the book is from a lesson about art history, but a lot of what they taught was math. Then of course, universally throughout the history of computer assisted instruction, people are trying to teach computer programming.

Some of the most popular lessons in TUTOR were how to use TUTOR to program other lessons that people could take in other topics. One theme of the book is that not only our education technologies, but in some ways our whole learning systems, are shaped by what assessment technologies are available to us and those assessment technologies have always been limited. They've always been constrained in various ways.

Then, of course, everyone should buy Teaching Machines when it comes out next year, from MIT Press, by Audrey Watters, which I've had a chance to read and is outstanding. I'm hoping that we'll be able to do another conversation like this, but Courtney, what would be your takeaway on the trap of routine assessment?

Courtney Bell: I love the idea that connects back to the workplace. I love the connection back to the sociological, which is right. Society keeps valuing these things that, in very simple terms, just are more and more increasingly complex human behaviors. Whether it's problem solving or collaboration. Increasingly, both economically and as humans, we value those things. That's fine and we're able to take apart and decompose or dissect the lower level things, but it's that, that we teach to computers and so computers by definition are always going to be our assessment technology to connect, Justin, to your words. Our assessment technology is always going to be back behind the thing that we value in society and the thing we want most for our children or want most for our undergraduates, for example. That's a trap, right? How do we think about the nature of that trap?

One of the things the chapter offers for us is this idea that it might be possible to broaden out what those computers can do. We could pick up a little bit here, like maybe the framing of a problem. A computer can't figure out how to score whether or not Courtney can problem solve, but the computer could figure out or help figure out, make more possible at scale, does Courtney have the ability to read a complex situation and frame the problem? That's one piece of problem solving. It's not the whole of it. That to me is a very striking idea for a way to walk forward from a progress perspective.

Justin Reich: I think the chapter is a bit hand-wavy about why this is so hard. One of the arguments that I make is just sort of empirically, if you look back over the last 20 years... education policymakers. There's no serious person out there who is arguing, no, no, no, we basically have all the education technology, the assessment technologies that we need, but let's just use them. In the era, in the United States, of the Common Core, there are these two huge testing consortia that are made and millions of dollars put behind them. PARCC and Smarter Balanced, they're consortia of states with the [inaudible 00:16:35] to try to come up with better [inaudible 00:16:39]. Then there are universities, there's organizations like ETS and the College Board, there's lots of smart people who are working on these things.

Somebody this morning, actually in Russia, asked me, well, when AI comes along, how much of a difference is this going to make? My answer is something like, I don't think that much. We've had super smart people working on this problem with millions of dollars at their disposal for a long, long time, and lots of motivation, both financially, but also educationally, morally. I assume that people in testing companies look at their tests and go, yeah, we wish these things were better. I don't think I explained very well why it's the case that it's so hard to make progress, and I'm wondering if you, who've been inside the belly of the beast, can give us some more insight into that dilemma.

Courtney Bell: First we need to think together about what we mean by the word assessment. We want to know some information from a certain kind of setting. We can think about these contexts in which human beings interact in the world, let's say as practices, with a little P practice. Not the practice, capital P, of teaching, but a little P practice. I can engage in a certain way with students around, let's say, double digit subtraction and it goes a certain way. That's a practice and I do it repetitively. Okay, fine. Let's say what we really care about, let's keep the teaching example.

Let's say what I really care about, from an assessment perspective, is I really want to know, can Justin teach very well? That's really what I want to know. I want to assess that. Justin has a practice, a little P practice, called Justin's teaching and it's going on all the time in the real world. For example, Justin plans his lesson this week, he thinks about the unit. Teachers in general do not think only lesson by lesson, they often have a curriculum and then they make a plan for the lesson for that day. They go ahead, they teach the lesson, they think about the lesson, they reflect on it and then they seek to address those lessons', maybe that particular lesson's, strengths and weaknesses over time. Maybe they found out that Audrey didn't understand the lesson on Monday and so the teacher decides, oh, got to reteach that thing on Tuesday. Okay, fine. That all happens and that's a part of Justin's little P, real-world teaching practice.

Now we say to ourselves, we want to assess Justin's teaching. We want to know how well does Justin teach. We now have to intervene in some way in that real world phenomenon. In the case of children, this is in the real world phenomenon of how, for example, they learn how to read. They're learning how to read at lots of places, not just inside of school. They're learning in their homes, as they drive down the street on their bicycles, et cetera. Seeing stop signs. When we layer on top of real-world practice, which is the thing again, we care about in assessment, the practice of assessment, now we create all kinds of things going on.

First, we've got to make decisions about what lesson that we're going to go watch of Justin's. Justin's going to engage in selecting that lesson. If I'm principal, he's going to think real carefully about which lesson he invites me to if he gets the choice to invite me. Justin and I, as principal and teacher, might go back and forth about that lesson, et cetera and on, and on.

The point here being that the assessment practice of observing Justin teach or Justin's teaching in an assessment situation is not the same, by definition, from Justin's real-world teaching. That is-

Justin Reich: This is like the observer effect, that people see throughout science. That when you do things, when you intervene in some way, the circumstances become different. What we would ideally want to evaluate is this abstract thing called real teaching, but as soon as we start looking at Justin's real teaching, we don't get Justin's real teaching anymore. We get Justin's teaching under assessment.

Courtney Bell: That is exactly right and you cannot escape that. That fact is like, maybe somebody will argue with us, Justin, I hope they do. My assertion is that's always true in every assessment. If that's the case, then we think to ourselves, where can technology fit into this thing? Some people argue that the thing we need to do is to use technology to create opportunities to make the assessment space more like the real world space. This is the gaming stuff that you talk about. An example of this that we see, that's a low tech version of it, is when we started to do student portfolios. Vermont actually had a huge effort around student portfolios.

We want to keep the things and then use technologies, various kinds of technologies, to build upon the real world setting. For us, I think whenever we're assessing and the places where we can imagine assessment technology coming in, unless it figures out which pieces of this are we going to engage the technology in, we will always be doing the kinds of machine learning kinds of things that you're reacting to your Russian colleagues with, like, "Yeah, pretty sure you're not going to make a lot of progress on that." So we have to be clear-eyed about the reality that there is only so much right this second about the real world that the technology can get at.

And so people will be dissatisfied until we begin to learn about, for example, things as complex as games and simulations, et cetera. But that's not going to get us out of the problem that those are not the thing that we really want, what we really want is do kids understand science? Can kids problem solve? Et cetera.

Justin Reich: So I would say a thing that you're maybe arguing against in the book is that I think one way you can interpret my chapter is I say, "Look, what we basically do with assessments is some form of pattern-matching. And much of what we've done to improve assessment since then is to make more complex pattern-matching."

And I sort of make the claim like, "Well, it seems like the way we're going to make assessment better is by coming up with more and more clever ways of doing that kind of pattern-matching."

And I hear you saying, "No, that might be one thing to do, but a more interesting thing to do is to ask the question, 'Can technologies create environments in which we have more observational control, but people still perform in ways that seem authentic and natural to them,'" like you described with this Mursion teaching. Where a great thing about the Mursion teaching is that you can basically do it in front of any computer monitor; it doesn't have to be with a particular group of students on a particular day, and a particular, whatever... That that is a more promising way of getting out of the trap of routine assessment, is like building cooler worlds for people to perform in, rather than just trying to do better pattern-matching about their responses. How fair is that?

Courtney Bell: I think that's fair. And I would add not just cooler, there's actually something very specific we're aiming at. We're aiming at that too, we want the technology to be able to get us closer to the real world actions that the person is engaged in that we care about.

So in the case of teachers, their quality of teaching inheres in part in their decision-making, their moment-to-moment decision-making. Do they hear Audrey tell the teacher the wrong answer and decide to ignore Audrey because the teacher's then going to call on Justin, who she knows has tracked that math problem, and is going to be able to explain to the class the steps that he went through? Or does the teacher make the choice, "Why don't we let Audrey say whatever Audrey's going to say, and I'm going to work with what Audrey brings to us as a class"? That's a decision a teacher makes in a moment. So if we can get our technology to get our teachers, or for example in this case our students, in the space where they're more likely to engage in the behavior we care about, we're much more likely to learn something that's worth knowing.

Justin Reich: It would be great to hear more, both from Audrey and Courtney about other things in the chapter that you thought, "Oh, this doesn't sound quite right," or, "We ought to rethink that." And then if there are questions that are coming up in the chat, it would be great to hear from some other folks. So I'll process the chat, but Audrey and Courtney, were there things in the chapter that you found otherwise, where you went, "Oh, I'm not sure if that's the right way to think about this?"

Audrey Watters: Let me think; I don't think that there was for me. I was really struck with a couple of pieces, and actually I think a lot about, what's his name, he's at Google, Peter Norvig. He wrote a piece, I think it was called The Unreasonable Effectiveness of Data, just this idea, and I think it is very commonly held within a lot of engineering folks, that as long as we do have tons of data, the answers are going to bubble up to the top.

Norvig would say, "We don't need theory anymore, we have data." And I think that that runs really counter to some of the stuff that Courtney was talking about, with really carefully thinking about not just how do we design assessment, but what are we designing in terms of instruction, in terms of curriculum as well?

Courtney Bell: I love that thought, Audrey. One of the things I'd argue back, I guess Justin, to the chapter, is not so much that the chapter gets it wrong, but the chapter narrowly has to talk about assessment kind of as a tool that we can use for a particular purpose. And it puts into the background, all authors have to do this, it puts into the background the social setting in which it comes and sits.

So the story that you tell about the duel, can you create an automated scoring engine that can throw all the words into a word bag and figure out with some amount of similar-to-human-level of reliability, what score that essay should be given? And then the computer programmers on the other side are like, "Ooh, let's break this thing, and let's figure out how to send it gibberish, and have that thing score high."

And one of the things that's so profound about that, and you do mention it, is this idea some people object to automated scoring engines of text, because they feel like we write for audiences, we're human beings in interaction, you never write for a nondescript audience; you're always writing for a purpose. So already you're bankrupting it by first making an assessment, and second then when you put an automated scoring engine into the whole thing, it's like, "What are we even after anyway?"

And the thing that it made me think about is the idea that assessment at some level is built on this idea that we know what we're measuring, and we agree, I'll say scores, what the scores coming out of that assessment mean. And so if the technology is only ever aiming at getting those scores right, and doesn't actually aim at getting the meaning right, I'll say something provocative, I think we've done something where we've actually started to erode the trust in the assessment itself; because the person taking the test, and the person designing the test is actually trying to get at math knowledge, they're actually trying to get at writing capability.

So that isn't to say we shouldn't work on automated scoring engines, for sure we should. But it is to say assessments wind up in social situations, and the whole assessment enterprise is built on shared trust and shared meaning in what those scores mean. So to the degree technology begins to undermine that, we've got a serious problem on our hands.

Justin Reich: Well, I think the most powerful illustration of that undermining of trust and communication, I'm sure there are others, Audrey may have her own suggestions, but was with peer grading.

So when massive open online courses were released, there are a bunch of folks who realized, "We're not going to be able to assess some of the things that we most care about using multiple choice questions, using AI grading; but it's entirely possible that if we ask a bunch of people in the class to evaluate someone else's performance, then what we'll find is that the average of those peer assessment scores comes out to be typically what an expert would say, or that a group of peers will disagree about as much as two experts will."

And that proved generally to be true. It proved particularly to be true when people came up with clever mechanisms, first [inaudible 00:30:03] testing people as peers. So they'd say, "Courtney, evaluate these five essays from your colleagues," and then secretly give you two that they had already graded.

And so if you were way too easy or way too hard, we could be like, "All right, let's down-weight Courtney, because she's way too tough or she's way too easy." And then if you were right on with those other two, we can up-weight you. Once we do a few tricks like that, it turns out that if you randomly assign a hundred essays to be graded by peers, and then randomly assign a hundred essays to be graded by experts, that they average out to be about the same scores, or reasonably close.
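The calibration trick described above can be sketched in a few lines of Python. This is only an illustration, not the mechanism any particular MOOC platform used: the function names, the `tolerance` parameter, the weighting formula, and the sample scores are all assumptions made up for the example. Each peer is weighted by how closely they matched the expert on the secretly pre-graded essays, and an essay's score is the weighted average of its peer scores.

```python
from statistics import mean


def reviewer_weight(calibration_scores, expert_scores, tolerance=1.0):
    """Weight a peer by agreement with the expert on pre-graded essays.

    Hypothetical formula: 1.0 for perfect agreement, shrinking toward 0
    as the peer's average gap from the expert grows.
    """
    gap = mean(abs(p - e) for p, e in zip(calibration_scores, expert_scores))
    return 1.0 / (1.0 + gap / tolerance)


def weighted_peer_score(peer_scores, weights):
    """Weighted average of the peer scores for one essay."""
    return sum(s * w for s, w in zip(peer_scores, weights)) / sum(weights)


if __name__ == "__main__":
    # Made-up data: two calibration essays the expert scored 4 and 2.
    expert = [4, 2]
    calibration = {
        "on_target": [4, 2],  # matches the expert
        "too_easy": [6, 4],   # two points high on both essays
        "slightly_off": [3, 2],
    }
    weights = [reviewer_weight(scores, expert) for scores in calibration.values()]

    # The same three peers then score an essay no expert has seen.
    peer_scores = [4, 6, 3]
    print("plain average:   ", mean(peer_scores))
    print("weighted average:", weighted_peer_score(peer_scores, weights))
```

With these made-up numbers the overly generous peer counts for about a third as much as the well-calibrated one, so the weighted average sits closer to the calibrated peer's score than the plain average does.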

But what a grade means to a person in a course is that there is a single mechanism, which has evaluated your performance, usually an expert, and then given you some meaningful feedback to it, not this gibberish that it just took me three minutes to describe of like, "Yeah, a bunch of people at your thing, and we average it, and we're pretty sure that that average is about what an expert would have given you. And so even though these people are individually clueless, we're fairly confident that the wisdom of the crowds should [inaudible 00:31:10] evaluation."

And some of the research that came back on people's response to this was like, "No, this just doesn't feel right." Like it wasn't enough to build the peer assessment technology, you had to re-inculcate people into a new culture where there's a new kind of trust. And the thing that the peer graders were trying to get away from was these mechanisms of automated assessment that felt cheesy; it feels gross to have a computer say, "Well, we didn't actually read your essay, but we predict based on your word usage that a human would have graded it as X." But even when you have multiple humans, you still come up with these dilemmas in the social situation.

Audrey Watters: To that end, Justin, I think that this ties back to what we were talking about when Dan Meyer was the guest, which is that in that situation, those peers were not part of your community. We use the word "peer," but in a MOOC with 10,000 participants, they weren't really part of your community.

And I think that the other piece that you just alluded to is a lot of this automatic grading stuff that always makes me chuckle is that they claim that auto graders are just as good as the people who grade the essay portions of standardized tests. But the essay portions of standardized tests are also graded in these massive warehouses, with people making barely minimum wage, who are given a rubric to follow. It's not the way in which your teacher, again, who's part of your learning community would grade you.

It's a job that someone got from Craigslist and is making minimum wage doing. I mean, the bar is really low. We're not having students' work read by their community. We're not having their writing read by people that they're engaged with.

Justin Reich: That's right. I mean, I think it's exactly what Courtney said too, about the idea of this sort of, you're trying to modify the social situation, like, "Hey, the 10,000 of you that are taking this class, you're a community now, and your community is going to evaluate that." And of course lots of people would go, "No, no, no. I'm pretty sure these 10,000 people I've never seen before, some of whom are just typing gibberish into this peer editing thing, are not my community." Yeah, and absolutely right, that peer editing might come off very differently in a group of 30 or 40 that know one another, rather than in a group of 10,000 whom you could never plausibly meet.

Eric, who works on Graspable Math, makes a comment where he's saying that part of what they're trying to do with their particular piece of education technology related to stealth assessment is to not make people feel like they're being assessed. Because when people feel like they're being assessed, then they behave in different ways that are not real-world ways.

In your field, Courtney, of teacher evaluation, this must come up all the time. I teach a certain way when it's just me and my students in the classroom. As soon as someone else walks in with a PalmPilot that's taking notes on my performance, or with a rubric or with a video camera or something like that, I go, oh, maybe I don't want to teach as regular me, like, regular me makes a bunch of off-color jokes and regular me does these other kinds of things. I mean, is that connected to this idea of the observer effect again? If as soon as we tell you you're being assessed, you start doing something different, it seems like a major problem with assessment.

Courtney Bell: By definition. And then when you scale it, you get Campbell's Law, right, which is sort of disastrous. You've doubled disasters. So you've impacted it by assessing it, and now you incentivize certain kinds of behaviors, which is back to the thing Audrey was commenting on. We [inaudible 00:35:16] this is what we think math learning is, down to this assessment, right?

But here's the thing. It's not like out there in nature, so to speak. Out in the wild, everything is all great. Things are not great, right? Lots of kids are not learning what we want for them to learn, what they're capable of learning.

So we shouldn't somehow vilify assessment as like, "Oh, this thing is just awful on a hundred." It's not, but we absolutely have to have clearer eyes that even in a technology-enhanced environment, it's only ever going to do part of the work that we need it to do. And in my mind, if we could get over that and be like, "Yeah, okay, it's only going to do part of it. Let's figure out how to make it do the work that it's best suited to do from a technology standpoint," like you were talking about at the beginning, Justin, build rich environments, figure out how to get to interactions. That feels like that's really good and that's a fruitful direction to press ourselves in as researchers.

Justin Reich: And what sort of sustains you? Because this is something that you've dedicated so many years of your career towards, but you come at this as a bit of a skeptic, you come at this with some authentic teaching experience, and then you decide to go into assessment design. And because you're there, you see all of the problems and all of the challenges that Audrey described. What are the things that make you go sort of at the end of the day, "But yeah, this is still a thing worth really pushing on and worth really trying to get better at."

Courtney Bell: I mean, I guess this is so personal. Left to our own devices, we're tribalists, right? We're people who we love one another, we want to take care of one another, but the truth is we're pretty bad at doing it with people that are different than ourselves. And we have a really long history in this country and around the world of doing that to one another. So assessment can be, and people will really hate this thought, one upside of NCLB, No Child Left Behind-

Justin Reich: Yeah, No Child Left Behind, which was this act which sort of mandated greater assessment in third through eighth grade and tenth grade in the United States.

Courtney Bell: It shed light on something that had been going on in the United States for years, which is we have been failing certain groups of kids systematically and egregiously. Egregiously. Now, it led to a ton of horrific other implications, bubble kids and all kinds of teaching behaviors and test prep kind of things that truly have been very detrimental for education, but if we can't document that there's a problem, it's very hard to act on it.

So for me, if we take assessment and we right-size our expectations of it, and we try to work on its most productive aspects, and we treat it as a part of a larger system that we use to help ourselves get better in this democracy, speaking specifically in the US context, to create more equitable learning opportunities and outcomes for all children in this country, I think that's the best we could hope for.

And I'm not optimistic we'll do it without something that sheds that light. I guess that's the thing. So in some ways, it's like the worst of two evils. Do we let ourselves be just regular, keep going as we are, or do we pick a tool or a set of tools, put it together with other information, and chain ourselves to that as a society to try to use it as a tool to improve overall? And I guess I choose the second.

Justin Reich: Well, I don't think you could have a more impassioned argument for sort of the tinkerer's view towards assessment, that this is what we have, there's a particular function that it can perform if we build that well and if we build the whole system around that well, and for all of its imperfections, to say, "Well, let's keep kind of working on this thing until we can get it more right."

Audrey, do you have any parting words thinking about the trap of routine assessment this week?

Audrey Watters: Yeah. I mean, it's been interesting to watch the chat and think about the ideas of how do students feel anxiety and distrust around assessment, and how do we create situations where students feel less anxiety? And I think that part of that is not actually having the stealth assessment, so that students are actually sort of not just having their tests once every other year or a big test in the spring, but the students are somehow always being assessed seems to lead us down to some other paths that we've talked about already with surveillance. And I guess we'll talk about with Canvas, right, when we think about data.

So I do think how do we answer some of these questions and are there ways in which we can think about it without being so reliant on scores? I think that that's an important takeaway.

Justin Reich: Terrific. Well, Audrey Watters, thank you once again. Courtney Bell, thank you so much for joining us. A really great conversation and one that I hope a lot of folks will have benefited from. I know it was helpful for me to think about some of the ways that the chapter that tracks assessment [inaudible 00:40:51] some ways forward, but you've added some more to that list, which was really wonderful. So thank you so much, Courtney, for joining us.

Courtney Bell: Yeah, thank you guys. It's great.

Justin Reich: That was our Failure to Disrupt Book Club conversation about Chapter 7, The Trap of Routine Assessment, with Courtney Bell. Thanks to Courtney and Audrey Watters for joining us. If you'd like to dig deeper into what we discussed in today's episode, you can check out our show notes. You can find my new book, Failure to Disrupt: Why Technology Alone Can't Transform Education, available from booksellers everywhere, but buy it at your local bookshop. Be sure to check out related media and sign up for future online events at failuretodisrupt.com. That's failuretodisrupt.com.

And join myself and Vanderbilt professor and author Rich Milner in a free self-paced online course for educators, Becoming a More Equitable Educator: Mindsets and Practices. Through inquiry and practice, you'll cultivate a better understanding of yourself and your students, gain new resources to help all students thrive, and develop an action plan to work in your community to advance the lifelong work of equitable teaching.

You can find the link to this edX course in our show notes, where you can enroll now and join hundreds of educators, passionate anti-racist educators from around the world who are talking with one another about how to do this incredibly important work better.

I'm Justin Reich. Thanks for listening to TeachLab. Please subscribe to TeachLab to get future episodes. And if you like our podcast, share it with a friend and leave us a review.

This episode of TeachLab was produced by Aimee Corrigan and Garrett Beazley, recorded and sound mixed by Garrett Beazley. Stay safe. Until next time.