Testing 105: Standardized Tests – The Students’ Point of View

An interesting conversation happened on the metro on the way home from the airport the other day. A gentleman sat down next to me. Through a bit of small talk we discovered that we were both educators; he at a nearby college, and I at a public elementary school. The next thing you know, he mentioned something about the education reforms of Michelle Rhee.

I took a deep breath, and thought to myself, “Do I make an excuse not to chat and stare out the window all the way to my park & ride stop, or do I dive right into this giant can of worms that is Public Education Reform?”

I took a calming breath and said, “I’m not a big fan of Michelle Rhee.”

As expected, he followed up with, “Oh? Why not?”

Now, my answer could have gone in many different directions. But our commute was generally less than 30 minutes, so I had to narrow it down. I picked teacher evaluation.

I told him that I felt that it was not fair to evaluate teachers based upon their students’ test scores.

The test scores she was using to determine teacher effectiveness were not representative of what the students knew or were able to do. The tests were norm referenced standardized tests, identifiable because the scores were reported as percentile rankings, and were designed to rank. The state tests were given in the spring, and the results were not published until the following fall. By then, a teacher has a whole new class, and last year’s class is off to middle school.

Then I told him the story I heard at Bunco. I play this silly dice game called Bunco once a month with eleven other women from my neighborhood. Most of them are white, affluent and privileged, as are their children. It was May, which means testing time at the local middle school. The son of one of the Bunco moms got home from school, and being the proactive, interested, involved-in-her-son’s-education mom that the Bunco lady was, she asked him how the testing went. His response was something to this effect: “Who cares? The tests don’t count for anything, anyway. If the teachers were nice and gave us treats like in elementary school, then we might try harder. But they just talk down to us and treat us all like cheaters, so why should we try at all? It’s just a big waste of time.”

Now, before we go all ballistic on the poor, apathetic, pre-teen kid, let’s see if we can figure out where he’s coming from.

Let’s start with, “Who cares? The tests don’t count for anything, anyway.” From his point of view, he is exactly right. The tests do not count toward any grade on his report card. The tests do not count toward any exit requirements for middle school, nor any entrance requirements for high school. In fact, the students, their parents (and the students’ teachers) won’t even get the results of the test until the following fall, by which time our middle schooler will be well into the first semester of his next grade level, and will have forgotten all about last year’s testing.

And when the results do come, they don’t give anybody much relevant information. One can determine a student’s percentile ranking for each test. A 70th percentile ranking means that the student scored better than 70% of his peers on a particular test. One can find out how many questions were possible, and how many responses were correct. But one cannot find out what the questions were, so there is no way to use the results for any individual’s educational purpose.

There is also no way to tell if a student accidentally skipped one, missed one because it was left blank, missed one because of a stray pencil mark on the answer document, or maybe accidentally bubbled in the correct answer on the wrong line, or bubbled in two circles for one question by mistake. Pesky humans!

Middle school students have, by now, caught on to the fact that the state test does not matter to them personally. And as human juveniles, they have not all reached that cognitive level of social awareness and civic responsibility that would motivate them from within to do their best on the test anyway. So the results are skewed for everyone. And I can’t really blame them. After all, “the tests don’t count for anything, anyway.”

On to the next gripe. “If the teachers were nice and gave us treats like in elementary school, then we might try harder.” Clearly, this student has fond memories of his elementary testing windows. I can’t speak to his elementary school testing experience, but I can speak with authority about my 29 years of testing experience at 3 different schools in 2 different California districts.

In elementary school, we do, indeed, try to make it fun. Testing time has almost a Mardi Gras atmosphere. There is a big lead up to the two week testing window. Lots of information about dates and times is sent home to the parents. Students, parents, teachers, administrators, counselors, custodians, secretaries, the PTA parents; everyone is soundly reminded that while the testing is very important to our school, and that we want students to do their best on the test, the testing days during the two week testing window will be normal, business as usual days, just like any other school days.

Except that parents and students are urged to not be absent during the testing window. Any non-emergency appointments, like for doctors or dentists, should be scheduled for outside the testing window so no student has to come to school late or leave early. But it will be just like any other school day.

Except that parents and students are urged to come to school on time; maybe even a few minutes early. A student who comes in late disrupts the learning environment for the other students, and it can often throw the rest of the tardy child’s day off kilter if he comes in late after all the others have started teaching and learning. But it will be just like any other school day.

Except that parents and students are reminded that students need to go to bed early and get a good night’s sleep (8-10 hours!) so students can be well rested and do their best on the test. It will be just like any other school day.

Except that parents and students are reminded that students need to eat a healthy breakfast, including protein, so they won’t be hungry and will have fuel in their tummies to do their best on the test. It will be just like any other school day.

Except that at recess, students get to have a snack of cheese and crackers just to make sure that no student (at this 100% free lunch school) fails to do his best on the test because he was hungry. But it will be just like any other school day.

Oh, except that recess times and the lunch schedule might be altered to accommodate the testing schedule. And part of the playground might be off limits during specific times during the day for distraction mitigation. But it will be just like any other school day.

Oh, and Speech will be cancelled during this two week period, and so will RSP and the library and the computer lab, because we need the credentialed teachers to help proctor. But it will be just like any other school day.

Oh, and both instrumental and vocal music will be cancelled. But it will be just like any other school day.

AND, on the Friday before the testing window begins, we get to celebrate with a (use your best Game Show Host announcer voice here) “Do your Best on the Test” Pep-Rally Assembly! (And the crowd roars!) The principal hands out Smarties (that he has asked the staff to voluntarily donate so he will have enough for 850 kids). Mr. C. will teach the students how to do the wave, and videotape it! We will sing songs and stamp our feet! The student council members will do a dance that the principal taught them! And the principal will announce that each day during the two week testing window, each student who comes to school ON TIME will receive a red ticket. All of the red tickets will be put into an opportunity drawing for this Shiny New Red Bike!!! (Picture the student council president riding around in circles.) Even students who don’t get tested yet (kinder through 2nd grade) will get on-time attendance tickets, but their opportunity drawing will be for something else, not the bike, because they don’t get to take the test just yet. But we want ALL of the students from ALL of the families to be on time every day during the two week testing window!

But it will be just like any other school day.

Can you say, mixed messages? The elementary school students still fall for this sort of thing, and for the most part, they really DO try their best to do their best on the test. But by the time they get to middle school, when all of the ‘fun’ of testing disappears, the kids catch on pretty fast.

Forging ahead: “But they just talk down to us…”

Well, maybe that’s because we have to. To administer the state standardized test, each teacher receives an instruction booklet. The booklet is scripted. It specifically tells us what to say, in little boxes that say, “Say:” We are required to read the script exactly as it is written; no additions, no omissions. We sign a document that says we promise to do this. This is to ensure that each teacher in each classroom says to the students the exact same thing that every other teacher in every other classroom in the state says to the students. This is a way to protect against one set of students getting an unfair advantage (or disadvantage!) because the directions to the test were explained in a different manner.

To middle school students, who have heard similar directions about, for example, how to fill in the bubble completely, since at least second grade, this scripted language could certainly be construed as ‘being talked down to’. I get that. Can’t fix it.

And: “…treat us all like cheaters…”

Again, this has to do with standardization practices, but how would a middle schooler know that? Before the testing window begins, we must remove from the classroom anything that could give a student an advantage over another student somewhere else in the state taking the same test. If math facts, formulas, vocabulary words, a poster illustrating the writing process, diagrams, student work, and other charts and posters are decorating one classroom and not another, then the conditions under which the test is given are not the same; they are not standardized. So, it all comes down off of the walls before the test. And nothing goes up in its place until after the testing for the whole school is completed. So what looks like ‘ensuring the test is given under similar conditions’ to the staff, looks like, “You take bulletin boards down because you think we will cheat,” to the students.

And we rearrange their desks from the cooperative groups of four or six that they have been in all year to stand alone, individual desks.

And we make them use desk dividers.

And we collect their cellphones first thing in the morning, and return them as they are leaving on testing days.

And they are not allowed to go to the bathroom while the test is in progress.

And then there are the proctors. The proctors (of which the principal is one!) periodically wander in and out through classrooms.

The teachers know they are there to monitor teacher behavior. Are we sticking to the script? Are we replacing dull pencils with sharp ones in a timely fashion? Are we keeping test materials in a locked cabinet when not in use? Are our walls bare and free of any hint-giving materials? Are we keeping all the promises we made when we signed the testing agreement?

But the students think the proctors are there to make sure that they are not cheating. To the tweens, what else would proctors be doing?

Obviously, we are assuming they will cheat if given half a chance, and we are doing everything in our power to remove that option, right?

So yeah, during the testing window, we appear to talk down to the students and we appear to treat them like cheaters. Yeah. On the up side, they notice this as a drastic change from status quo, so that means we are not treating them that way the rest of the school year, right? Whew! A silver lining!

And you want to evaluate my effectiveness as a teacher on that student’s test score? The guy sitting next to me on the Metro said he had no idea, and thanked me. You’re welcome.

I am not a fan of Michelle Rhee.


Testing 104: Standardized Tests – Criterion Referenced or Norm Referenced?

Wait, we’ve just learned what it means for a test to be standardized, but now we hear that there are different types of standardized tests?

Yes! Not all standardized tests are created equal! (Which is a little bit ironic when you stop and think about it.)

Standardized tests can fall into one of two broad categories: criterion referenced standardized tests and norm referenced standardized tests. It is important to know the difference between the two, because each type of standardized test has a different purpose.

The purpose of a criterion referenced test is to find out to what extent any given student has mastered a specific set of criteria. A spelling test is a good example of a criterion referenced test. One student’s test answers are compared to a key of correct answers, and the results show how well the student did on the test. A well written criterion referenced test will show what the student knows, understands, and is able to do with respect to a specific set of criteria.

On the other hand, the purpose of a norm referenced test is to sort; it is designed to take any given set of students, have them take the same test, and, using the results, rank them top to bottom, from the highest scoring student to the lowest scoring student. The SAT test commonly used for college admissions is a good example of a norm referenced test. The results of a well written norm referenced test will form something very close to a bell curve, with a few scores spread out at opposite ends of the curve, and a whole bunch of scores clustered in the middle. The results of this type of test are used to rank; to compare how each student did compared to the others within the same group who took the same test. These kinds of tests have their uses, but norm referenced test results are not designed to indicate how much students know, understand, and are able to do.

Luckily, it’s super easy to tell which kind of standardized test is which by simply looking at how the test results are reported. If the score of the test is reported as a percent score, for example, “Your child scored 70% (read 70 percent) on the spelling test”, then it is a criterion referenced standardized test. If the score of the test is reported as a percentile ranking, for example, “Your child scored in the 70th %ile (read 70th percentile) for Reading”, then it is a norm referenced standardized test.
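For readers who like to see the arithmetic, here is a minimal sketch in Python of the difference between a percent score and a percentile ranking. The ten scores and the 20-question test are made up for illustration, and the percentile function uses one simple definition (the share of the group that scored strictly lower); real testing companies may calculate it a little differently.

```python
def percent_score(correct, total):
    """Criterion referenced: the share of questions answered correctly."""
    return 100 * correct / total


def percentile_rank(score, group_scores):
    """Norm referenced: the percent of the group that scored strictly lower.

    This is one simple definition, assumed here for illustration only.
    """
    below = sum(1 for s in group_scores if s < score)
    return 100 * below / len(group_scores)


# Hypothetical results: number correct out of 20 for a group of ten students.
group = [8, 10, 11, 12, 13, 14, 15, 16, 18, 19]
child_correct = 14

print(percent_score(child_correct, 20))       # 70.0
print(percentile_rank(child_correct, group))  # 50.0
```

Same child, same answer sheet: 70 percent correct, but only the 50th percentile in this particular group. The first number describes the child against the answer key; the second describes the child against the other test takers.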

So in the realm of standardized tests there are criterion referenced standardized tests and norm referenced standardized tests…two different types of tests, each being necessarily standardized, created for two distinctly different purposes: criterion referenced tests are designed to measure mastery of specific criteria, and norm referenced tests are designed to rank.

So, if I wanted to find out what you now know, understand, and are able to do with each type of standardized test, would the best assessment for the job be a criterion referenced test or a norm referenced test?

Testing 103: Testing Environment Standardization

So a standardized test, because it is ‘standardized’, ensures fairness, equity and justice, right? Since every student taking that test is tested on the exact same content under the exact same circumstances, it stands to reason that that gives everyone taking the test an equal opportunity to do well, right? And that the results will be an unbiased way to compare one student to another, apples to apples, right?

But that is not always as easy as it sounds.

Yes, the testing environment and testing conditions are standardized, so each student taking the test will take it under the same circumstances as every other student taking that same test.  The trouble is, students are human, and like snowflakes, each one is different. (And states are different, and school districts are different, and individual schools within a district are different, because they are all filled with those pesky humans!)

A standardized test often has a time limit. That is part of standardizing the testing environment. If one student had three hours, and another student had just 30 minutes, then the testing conditions are not the same, and the student who had more time may have an advantage over the student who had only 30 minutes. Using the results of such an assessment for comparison purposes would be like comparing apples to oranges, as the saying goes.

As another example, at each testing site, the testing window always occurs after the same number of days in school. This ensures equity across different school calendars. In California, the Education Code (Ed Code) dictates that one school year will contain exactly 180 student days. But the start dates, holidays, and vacations are up to each individual district to decide. If the whole state had the exact same testing window start date, then kids at schools that began the school year before Labor Day, for example, would have attended more school days before the test begins than kids at schools that started after Labor Day. Another example is that a school on a Traditional Calendar would have had more student days in school before the testing start date than a Year-Round Calendar school that had already had a number of off-track days before the start of the testing window. In order for the test environment to be standardized, i.e. the same for every participant, the start date is set to begin after a specified number of school days, rather than a date on the calendar.
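To make the counting concrete, here is a minimal sketch in Python. Everything in it is a made-up assumption for illustration: the two district start dates, the single holiday, and especially the number 10, which is deliberately tiny so it is easy to check by hand. It is not the state's actual rule or anyone's real calendar.

```python
from datetime import date, timedelta


def nth_school_day(first_day, n, days_off=frozenset()):
    """Return the calendar date of the n-th school day.

    A school day is assumed to be any weekday that is not in days_off.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    current = first_day
    count = 0
    while True:
        if current.weekday() < 5 and current not in days_off:
            count += 1
            if count == n:
                return current
        current += timedelta(days=1)


# Hypothetical calendars: one district starts before Labor Day, one after.
LABOR_DAY = date(2024, 9, 2)
early_start = date(2024, 8, 19)
late_start = date(2024, 9, 3)
TEST_AFTER_DAYS = 10  # illustrative only, not a real testing rule

print(nth_school_day(early_start, TEST_AFTER_DAYS, {LABOR_DAY}))  # 2024-08-30
print(nth_school_day(late_start, TEST_AFTER_DAYS, {LABOR_DAY}))   # 2024-09-16
```

The two districts land on different calendar dates, but the students in both have sat through the same number of school days, which is the whole point of counting days instead of picking a date.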

So standardization makes it fair and equal for everybody.

Oh, except for that one student who was out for a week and a half back in February because he had to have an emergency appendectomy. (You can’t plan that sort of thing for vacation time!)

Oh, and that other student who moved here from out of the district. They were on a different school calendar, but now she is on our calendar. She hasn’t been in school as many days as the students who have been at our site since the first day of school, but she takes the test on our schedule now anyway.

Oh, and the student who is chronically tardy through no fault of his own. His parents just can’t seem to get him to school on time, so he misses anywhere from 10 to 30 minutes of instruction per day. If that happens for the first 120 days of school, averaging 20 minutes late, that’s 120 days × 20 minutes, which is 2400 minutes, or 40 hours, or almost 7 days, based on a 6 hour school day. That’s a lot of time.
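If you want to try the tardiness arithmetic with your own numbers, here is a small sketch in Python. The function name and the 6 hour default are just illustrative choices based on the example above.

```python
def lost_instruction(days_late, minutes_per_day, school_day_hours=6):
    """Return lost instruction time as (minutes, hours, school days)."""
    minutes = days_late * minutes_per_day
    hours = minutes / 60
    school_days = hours / school_day_hours
    return minutes, hours, school_days


minutes, hours, school_days = lost_instruction(120, 20)
print(minutes, hours, round(school_days, 1))  # 2400 40.0 6.7
```

Change the 20 to 10 or 30 to see the low and high ends of the range for this student.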

Sometimes, special accommodations are made in the testing environment; in some circumstances they are even required. For example, accommodations may be made for a certain student population, perhaps according to Individualized Education Programs (IEPs). Certain students, who have been identified with specific learning disabilities, may take the test in a small group setting, or be given more time than the general population. Or, second language learners may have the test read to them in their primary language, or have access to a bilingual dictionary during the test. Those accommodations are made with the intention of leveling the playing field, in hopes that apples can still be compared to apples, despite the learning disability or language barrier.

There is a lot to take into consideration when one tries to standardize the testing conditions across an entire state, for example.

In our district, there is a list of things to do before and during the testing window to ensure standardization. It looks something like this each year:

Before the testing, each site will:

  • Notify families of the testing window and encourage 100% attendance. Discourage absences, late starts, and early outs for students for things such as doctor or dentist appointments that could be rescheduled outside of the testing window.
  • Cancel all special pull-out programs scheduled during the testing window. This includes library, vocal music, instrumental music, speech, RSP, ESL, counseling, field trips, etc. Sometimes physical education and even recess can be cancelled or rescheduled so it does not interfere with the testing schedule.
  • Sometimes, schedule a school-wide snack for each day of the testing window to level the ‘hunger’ playing field. (Some students have breakfast, some students don’t.)

Before testing, each teacher will:

  • Remove any bulletin boards that may contain information that students could use to help them on the test; i.e. math facts, formulas, sound-letter cards, word walls, vocabulary, posters or charts containing subject-matter content, student work containing subject-matter content, etc. etc. etc.

During testing, each teacher will:

  • Hang the red TESTING: DO NOT DISTURB sign on the door.
  • Collect all cell phones and unauthorized electronic devices.
  • Ensure that no student leaves the classroom during the testing session.
  • Use desk dividers, or separate students from one another to ensure independent work.

Also to ensure standardization during the testing, each teacher is required to read the directions for the test aloud to the students directly from the test administration booklet, exactly as printed. It’s even written as a script. This ensures that each teacher giving a particular test says the exact same thing to the students in exactly the same way as all of the other teachers giving that test. If one teacher worded a direction slightly differently, or accidentally skipped something, or added something that was not written in the directions, it could somehow give that group of test takers an unfair advantage (or disadvantage!).

Again, the intention is for all of the students taking the standardized test to be on equal footing with all of the other students taking the same test, enabling the results to be compared on an apples to apples basis.

Oh, but what about that time that our testing days fell in the middle of a heat wave? Students in classrooms without air conditioning where the temperatures can reach 103 degrees are not taking the test under the same conditions as students in classrooms where the A/C keeps the temperature comfortable, or as students in schools not experiencing a heat wave during their testing days. Does classroom temperature affect student outcomes?

Oh, and what about the student who was running a fever last night but not this morning, and was sent to school today anyway, because it was a testing day, and the school had stressed how important it was for the students not to be absent? Will she score as well as she would have on a day that she didn’t run a fever the night before? How does physical health affect student outcomes?

Oh, and what about the student who found out just this morning that his cat had been run over by a car in the middle of the night? Would he score as well as he would have if the test had been the day before the death of his pet? How does emotional health affect student outcomes?

These are just a few of the ways that the very best intentions toward test standardization in pursuit of fairness, equity, and justice can be thwarted simply because humans are involved. Too bad we’re not educating widgets!


Testing 102: Test Question Standardization

So a standardized test, because it is ‘standardized’, ensures fairness, equity and justice, right? Since every student taking that test is tested on the exact same content under the exact same circumstances, it stands to reason that that gives everyone taking the test an equal opportunity to do well, right? And that the results will be an unbiased way to compare one student to another, apples to apples, right?

But that is not always as easy as it sounds.

Yes, the content is standardized, so each student taking that test will answer the same questions as every other student taking that test.

The trouble is, students are human, and like snowflakes, each one is different.

Sometimes, there is inadvertent bias built right into the test questions themselves, even though the test developers are aware of this and try to control for it. Here are a couple of oversimplified possibilities, just to give you the idea. Test developers look for common ground, so no one test taker has an advantage over another because of, say, who they are. So let’s say we create a question to assess a student’s ability to identify sequence, a common reading skill.

So the test developers decide to use an excerpt from the story of The Three Little Pigs, because everybody knows that, right? Well, not necessarily. If the child comes from a culture other than the dominant one, the set of bedtime stories that the child heard could be totally different and ‘foreign’ to the dominant culture’s. The bedtime stories may even have been told/read in a completely different language. Or maybe the child’s parent works two jobs and doesn’t have time to read to the children every night, even though the parent knows he/she should and would, in fact, relish the time spent with the children. Would a test taker not familiar with the story of the three pigs be at a disadvantage?

Or maybe the test could contain a reference to sibling rivalry. Could that put an only child who never played with brothers and sisters at a disadvantage when answering that question compared to a child who has had firsthand experience interacting with siblings on a regular basis?

It has been shown that the more affluent the student’s family, the more closely the prior knowledge of the test taker matches the general content from which the test developers draw.

So even if the questions on a given test are standardized, individual students could interpret those questions differently because of influences over which the student has no control: gender, race, religious affiliation, family socio-economic status, and education levels of the parents.

The quest for a fair, equitable, and just test, the results of which can be used to compare apples to apples, is a noble one. It is also very difficult to achieve.

Testing 101: What is a Standardized Test?

A test is a test is a test, right?  Wrong!

There is a lot of talk across the nation about standardized tests.  What does it actually mean if the test is “standardized”?  Standardization is all about comparing apples to apples. Basically, it means that the exact same test is administered to every student who takes it under the exact same conditions…the conditions and the test itself are ‘standardized’.  The purpose of this kind of test is so that later, when the results are available, the results can be used to ‘compare apples to apples’.

Here’s a common misconception about standardized tests: it is often assumed that the questions on the test (in other words, the content being tested) are based on an agreed upon set of ‘standards’. Sounds logical, right? Now, that may or may not be the case (that the content being tested is based on some set of standards), but in the world of educational assessment, content is not what makes a standardized test, well, standardized.

A standardized test means that every student taking a certain test will have the exact same questions to answer.  The questions may be in a different order, but each student will, by the end of the test, answer the same set of questions as every other student taking that same test.

A standardized test means that every student taking the test will take it under the exact same testing conditions.  A standardized test is administered in a controlled environment.  Simply put, the test is given under similar conditions everywhere, and the conditions are specifically dictated by the creators of the test and enforced by the district (or body) administering the test.  That is often why there is a proctor present.  The purpose of the proctor is to make sure that the testing conditions that were specified are being followed in each and every place the test is being administered.

The good old SAT test is a good example of a standardized test.

So, if a student is taking a standardized test, it does not mean that the content being assessed is based on some set of standards, as the term ‘standardized test’ seems to imply, but rather that the test itself, and the conditions under which the test is given, to the extent that it is humanly possible, are consistent and the same across all test takers.  Hence the term, ‘standardized’.  Just like, thank goodness, all electrical outlets (within the United States, at least) and USB ports are ‘standardized’.