The Next Generation Science Standards call for a new model of science instruction. As educators make the switch from traditional instruction to next generation approaches, state-level standardized testing for science must make the same shift the classroom environment is making.
The above benchmarks, taken from the Trends in International Mathematics and Science Study (TIMSS), give some insight into the concern evident in the National Research Council’s 2011 report. That report was in turn responding to concerns from the National Academies and the National Institutes of Health, which about a decade ago realized that students were not college and career ready from a STEM standpoint, and asked an important question: Why?
Right now, the best-performing students in various American states are scoring at about the 565-point level, which corresponds to the “high” international benchmark. But let’s look at what that really represents, starting at the bottom benchmark and working our way up.
At the low benchmark, students demonstrate knowledge of simple facts, interpret simple diagrams, complete simple tables and provide short written responses to questions. If that describes instruction and performance in your district, especially at its best, that is unfortunately a low level of performance, even under the old standards.
At the intermediate benchmark, students show basic knowledge and understanding of practical situations, can interpret information in pictorial diagrams and can apply factual knowledge to practical situations. The “high” international benchmark adds students who can apply material, explain phenomena, compare and contrast, make simple inferences and provide brief descriptive responses.
This is where the best states in the U.S. are weighing in. And this is a problem, for reasons that will become clear after we examine the diagram below.
Bloom’s Taxonomy traditionally orders “learning domains” in a pyramid that progresses from remembering to understanding, applying, analyzing, evaluating and creating at the top. We reorder that to show the top three domains – creating, evaluating and analyzing – as occurring equally in a true next generation environment.
Looking at the reordering of Bloom’s Taxonomy, you can see that creating, evaluating and analyzing are equally important skills. However, the high international benchmark only touches on applying, which is the level below these three higher order thinking skills. At the intermediate level, we don’t even have that.
This comparison of benchmarks against Bloom’s Taxonomy bears out the concern of the National Academies: most students are not ready for college or STEM careers, or even, in many cases, non-STEM careers.
Contrast that with the “advanced” international benchmark. Students who hit this benchmark show some knowledge of the process of scientific inquiry, can interpret results in the context of a simple experiment, can reason and draw conclusions from descriptions and diagrams, and can evaluate and support an argument. Students who do this are creating, evaluating and analyzing simultaneously.
They are also remembering, understanding and applying, of course, but those lower-level skills aren’t sufficient on their own. The students who will be successful in college and careers are those who have higher order thinking skills, which is why the Next Generation Science Standards push for them so heavily in both the learning and the testing environments.
An example of an assessment that aligns with NGSS, requires higher order thinking skills and asks students to draw conclusions using the practices of science and engineering. Image credit: PISA.
We will see positive results from this type of testing if, and only if, it aligns with the curriculum in the classroom. But first we must understand what the assessments are looking for, so we can be sure instruction mirrors the mode of testing as well as the content.
Let’s take an example of an assessment that could align with NGSS and might be familiar to those of you who use PARCC or Smarter Balanced. Up top, you have a reading passage. In the red area, you see a fictional student, Andre, who puts forward his ideas. Down at the bottom, below the graphs, you’ll see that he relates his idea to specific data.
So how does this work as an assessment? The overall question asks students to use data to ascertain whether Andre’s conclusion is reasonable. In other words, is the basic assumption behind the greenhouse effect supported by the data in the two graphs this student found? To answer, students need to use their science and engineering practice skills, their content knowledge and crosscutting concepts of system behavior.
Here we have the meat of the assessment, which asks students to use their skills, disciplinary knowledge and crosscutting concepts to find and present data backing a specific argument. Image credit: PISA.
In response to Question 1, for instance, a student might point out that carbon emissions have not always risen steadily even while temperature has, as happened in the 1930s. That might lead them to conclude that it isn’t reasonable to say greenhouse gases are the sole cause of rising temperatures. Reaching this conclusion requires students to draw on all of those skill sets and concepts at once: they must use science and engineering practices to think about content dynamically and to consider how the parts of a system actually interact.
Students who have these skill sets will be equipped to answer complex questions that require higher order thinking skills. Teaching this kind of thinking – creating, evaluating and analyzing – is crucial for students who will soon be tested in this way.
This particular test is aimed at an 8th- or 9th-grade level. So if you have an 8th- or 9th-grade classroom, ask yourself whether your students would be ready to handle a question like this. If your students aren’t currently receiving this type of instruction, chances are they aren’t equipped to answer questions like the one posed above.
Instructional practices have to evolve to meet these higher order assessments, or we won’t see the results we want.