Lots of primary schools are now using standardised tests in each year group to help monitor the progress of pupils. They can be useful for identifying those pupils who seem to have dropped behind their peers, or perhaps aren’t progressing through the curriculum as you might expect based on their prior attainment.
However, the fact that standardised scores from such tests look very much like the scaled scores issued for end of Key Stage assessments can cause confusion. If schools are aiming to predict outcomes at the end of Key Stage 2, it doesn’t make sense to treat the two as the same thing.
Tests like Rising Stars’ PiRA and PUMA assessments, or the NFER tests, use standardised scores based on a sample of pupils who have taken the test. On a standardised scale, a score of 100 represents the average achievement in that cohort. People are usually familiar with this idea from IQ tests. Scores above 100 suggest achievement that is above average, and vice versa. But even this should be treated with caution.
Because no test is a perfect measure, it’s not wise to treat somebody with a score of 98 as any different from somebody with a score of 102; we just can’t be that accurate. Most test series will give you an indication of confidence intervals: that is to say, a range of scores within which you could reasonably expect a pupil to fall. For example, scoring 103 on a test might mean that you could be 95% sure that such a pupil would score between 99 and 107 if you kept testing them. Of course, we don’t keep testing them. We use the figures from a single test as an indicator of how they are doing compared to others their age.
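To see why 98 and 102 are hard to tell apart, here is a rough sketch of how such a range could be worked out. It assumes the test's error is roughly normal and uses a made-up standard error of measurement (`ASSUMED_SEM`), picked only so that a score of 103 gives about 99 to 107 as in the example above. The real figure would come from the test publisher, and the function names are purely illustrative.

```python
from statistics import NormalDist


def score_interval(score, sem, confidence=0.95):
    """Return (low, high) bounds around a standardised score.

    sem is the standard error of measurement. It is an assumed input
    here; the real value should come from the test's own manual.
    """
    # z is about 1.96 for a 95% interval
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sem
    return score - margin, score + margin


def intervals_overlap(a, b):
    """True if two (low, high) ranges share any scores."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical value, chosen so that 103 gives roughly 99 to 107
ASSUMED_SEM = 2.04

low, high = score_interval(103, ASSUMED_SEM)
print(round(low), round(high))

# Do the plausible ranges for scores of 98 and 102 overlap?
overlap = intervals_overlap(
    score_interval(98, ASSUMED_SEM),
    score_interval(102, ASSUMED_SEM),
)
print(overlap)
```

Under these assumptions the ranges for 98 and 102 overlap, which is the sense in which the two scores can't safely be treated as different.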
Standardised scores are based on the familiar concept of the bell curve. Setting aside those who score exactly 100, half of pupils will score below 100 and half above. For most school tests, only about one in six children will score above 115; similarly, only about one in six will score below 85.
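Those "one in six" figures can be checked with a few lines of Python. This sketch assumes a normal distribution with a mean of 100 and a standard deviation of 15, which is a common convention for standardised scores but is an assumption here rather than something taken from a particular test's documentation.

```python
from statistics import NormalDist

# Assumed scale: mean 100, standard deviation 15
ASSUMED_MEAN = 100
ASSUMED_SD = 15
scale = NormalDist(mu=ASSUMED_MEAN, sigma=ASSUMED_SD)

share_above_115 = 1 - scale.cdf(115)              # upper tail
share_below_85 = scale.cdf(85)                    # lower tail
share_between = scale.cdf(115) - scale.cdf(85)    # everyone in the middle

print(f"Above 115: {share_above_115:.1%}")        # Above 115: 15.9%
print(f"Below 85: {share_below_85:.1%}")          # Below 85: 15.9%
print(f"Between 85 and 115: {share_between:.1%}") # Between 85 and 115: 68.3%
```

About 16% in each tail is close to one in six, leaving roughly two thirds of pupils between 85 and 115.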
Scaled scores, while looking very similar to standardised scores, are in fact very different. For scaled scores, the 100 marker has been planned in advance. There is a threshold of attainment which pupils must cross in order to score at least 100. In the Key Stage 2 tests since 2016, considerably more than half of pupils have scored 100 or more.
In simple terms: it is easier to score 100+ in the national tests than in a standardised test like PiRA or NFER.
If we look again at the bell curve, around 75% of pupils achieved 100+ in KS2 maths. If we look at the top ¾ of achievers in a standardised test, then some of those pupils might have scored as low as 90 on the standardised scale. This has nothing to do with whether the tests are easier or harder; the scoring systems are simply different.
On the bell curve, while only 50% of children can score over 100 on the standardised test, around ¾ can – and do – on the statutory tests.
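The figure of about 90 can be reproduced by asking where the top three quarters of a bell curve begins. As a sketch, this again assumes a normal scale with mean 100 and standard deviation 15 (an assumption, not a published property of any one test), and `cutoff_for_top_share` is just an illustrative helper name.

```python
from statistics import NormalDist

# Assumed standardised scale: mean 100, standard deviation 15
scale = NormalDist(mu=100, sigma=15)


def cutoff_for_top_share(share):
    """Lowest standardised score that still falls in the top `share` of pupils."""
    return scale.inv_cdf(1 - share)


# Around 75% of pupils reached 100+ in KS2 maths, so find the
# standardised score that marks off the top 75% of the bell curve.
cutoff = cutoff_for_top_share(0.75)
print(round(cutoff, 1))
```

This only lines up proportions on the two scales; it does not say that any particular pupil scoring around 90 will go on to reach the expected standard.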
The problem is reversed when it comes to Greater Depth. On a standardised test, you would expect around ¼ of pupils to score 110 or higher. However, for KS2 maths, only 17% of pupils got a scaled score greater than 110.
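The same kind of check works at the top end. Assuming once more a normal scale with mean 100 and standard deviation 15 (my assumption, for illustration only), the sketch below works out what share of pupils would score 110 or more, and what standardised score would mark off the top 17% instead.

```python
from statistics import NormalDist

# Assumed standardised scale: mean 100, standard deviation 15
scale = NormalDist(mu=100, sigma=15)

# Share of pupils expected at 110 or above on the standardised scale
share_at_110_plus = 1 - scale.cdf(110)

# Standardised score that would cut off only the top 17% of pupils,
# the proportion quoted above for KS2 maths
cutoff_top_17 = scale.inv_cdf(1 - 0.17)

print(f"{share_at_110_plus:.1%}")
print(round(cutoff_top_17, 1))
```

On these assumptions roughly a quarter of pupils reach 110, while the top 17% begins at a standardised score somewhere around 114. Treat that as a rough guide to the size of the gap, not a conversion table.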
As ever, making predictions is a fool’s game. Scoring 95 on one year’s standardised test is no more an indicator of SATs success than England winning a match this year is a sign that they’ll win the World Cup next year.
If you rely on standardised scores for making your predictions of later scaled scores, then you may find yourself over-estimating your proportions at greater depth, and potentially under-estimating your proportions achieving the expected standard.
Rising Stars have provided indicative bands based on the correlation between their PiRA/PUMA tests and the national tests – but it’s not a perfect science.