Standardised and Scaled Scores

Lots of primary schools are now using standardised tests in each year group to help monitor the progress of pupils. They can be useful for identifying those pupils who seem to have dropped behind their peers, or perhaps aren’t progressing through the curriculum as you might expect based on their prior attainment.

However, the fact that standardised scores from such tests look very much like the scaled scores issued for end of Key Stage assessments can cause confusion. If schools are aiming to predict outcomes at the end of Key Stage 2, it doesn’t make sense to treat the two as the same thing.

Standardised scores

Tests like Rising Stars’ PiRA and PUMA assessments, or the NFER tests, use standardised scores based on a sample of pupils who have taken the test. On a standardised scale, a score of 100 represents the average achievement of the cohort. People are usually familiar with this idea from IQ tests. Scores above 100 suggest achievement that is above average, and vice versa. But even this should be treated with caution.

Because no test is a perfect measure, it’s not wise to treat somebody with a score of 98 as any different from somebody with a score of 102; we just can’t be that accurate. Most test series will give you an indication of confidence intervals: that is, a range of scores within which you could reasonably expect a pupil to fall. For example, scoring 103 on a test might mean that you could be 95% sure that such a pupil would score between 99 and 107 if you kept testing them. Of course, we don’t keep testing them. We use the figures from a single test as an indicator of how a pupil is doing compared to others their age.
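
If you want to see where a range like 99–107 comes from, here is a minimal sketch in Python. The standard error of measurement (SEM) of 2 is a hypothetical figure chosen to reproduce the example above; real SEM values come from the test publisher’s technical manual.

    from statistics import NormalDist

    def confidence_interval(score, sem, confidence=0.95):
        """Range within which a pupil's 'true' score is likely to fall.

        sem is the test's standard error of measurement, published by the
        test provider. The larger the SEM, the less precise a single score.
        """
        z = NormalDist().inv_cdf((1 + confidence) / 2)  # ~1.96 for 95%
        margin = z * sem
        return score - margin, score + margin

    # A score of 103 with a (hypothetical) SEM of 2 gives roughly 99-107
    print(confidence_interval(103, sem=2))  # ~(99.1, 106.9)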

Standardised scores are based on the familiar concept of the bell curve. Half of pupils will score below 100, and half will score above (setting aside those who score exactly 100). For most school tests, only about one in six children will score above 115; similarly, only about one in six will score below 85.

[Figure: bell curve of standardised scores]
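
If you want to sanity-check those proportions yourself, a quick sketch assuming the usual standardised scale with a mean of 100 and a standard deviation of 15 (check your test’s manual, as scales can vary) looks like this:

    from statistics import NormalDist

    scores = NormalDist(mu=100, sigma=15)  # assumed standardised scale

    print(f"Above 115: {1 - scores.cdf(115):.1%}")  # ~15.9%, about one in six
    print(f"Below 85:  {scores.cdf(85):.1%}")       # ~15.9%, about one in six
    print(f"Above 100: {1 - scores.cdf(100):.1%}")  # 50.0%, half the cohort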

Scaled scores

Scaled scores, while looking very similar to standardised scores, are in fact very different. For scaled scores, the 100 mark is set in advance: there is a threshold of attainment which pupils must cross in order to score at least 100. In the Key Stage 2 tests since 2016, considerably more than half of pupils have scored over 100.

In simple terms: it is easier to score 100+ in the national tests than in a standardised test like PiRA or NFER.

If we look again at the bell curve, around 75% of pupils achieved 100+ in KS2 maths. If we look at the top ¾ of achievers in a standardised test, then some of those pupils might have scored as little as 90 on the standardised scale. It’s not to do with whether the tests are easier or harder; it’s just that the scoring systems are different.

On the bell curve, while only 50% of children can score over 100 on the standardised test, around ¾ can – and do – on the statutory tests.

[Figure: bell curve comparing the proportions scoring 100+ on standardised and scaled scores]
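
The rough arithmetic behind that “as little as 90” claim can be sketched as below, again assuming a standardised scale with mean 100 and standard deviation 15, and taking the roughly 75% KS2 maths pass rate at face value:

    from statistics import NormalDist

    scores = NormalDist(mu=100, sigma=15)  # assumed standardised scale

    # If ~75% of pupils reach the expected standard at KS2, the equivalent
    # cut-off on a standardised scale sits at the 25th percentile:
    equivalent_cut_off = scores.inv_cdf(0.25)
    print(round(equivalent_cut_off))  # ~90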

The problem is reversed when it comes to Greater Depth. On a standardised test, you would expect around ¼ of pupils to score 110 or higher. However, for KS2 maths, only 17% of pupils got a scaled score greater than 110.
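
Running the same comparison in the other direction (same assumptions as above) shows where that ¼ figure comes from, and why using standardised scores as a proxy tends to over-estimate greater depth:

    from statistics import NormalDist

    scores = NormalDist(mu=100, sigma=15)  # assumed standardised scale

    # Proportion expected to score 110+ on a standardised test
    print(f"{1 - scores.cdf(110):.0%}")  # ~25%, versus ~17% scoring 110+ at KS2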

Predictions

As ever, making predictions is a fool’s game. Scoring 95 on one year’s standardised test is no more an indicator of SATs success than England winning a match this year means they’ll win the World Cup next year.

If you rely on standardised scores for making your predictions of later scaled scores, then you may find yourself over-estimating your proportions at greater depth, and potentially under-estimating your proportions achieving the expected standard.

Rising Stars have provided indicative bands based on the correlation between their PiRA/PUMA tests and the national tests – but it’s not a perfect science.

One thought on “Standardised and Scaled Scores”

  1. Tom Burkard, 9 March 2018 at 5:55 pm

    However inconvenient it may be for predicting pupils’ subsequent achievement, there is an advantage to having standardised (norm-referenced) tests used in conjunction with scaled, or criterion-referenced, tests. All we have to do is look back to when Sir Michael Barber was trumpeting the stunning success of the National Literacy Strategy, when in fact the SATs (scaled tests) weren’t properly anchored from year to year. Testifying before the House Education Committee, Barber was torn to shreds by Prof Peter Tymms, who then held the chair of Durham’s Centre for Evaluation and Monitoring. Using the same test with small samples each year, the CEM found that literacy standards had barely budged, and maths showed only a very slight improvement. Subsequent investigation revealed that SATs had in fact got noticeably easier each year, which is hardly surprising considering the political pressure from Barber’s ‘Delivery Unit’ to produce results.

    Obviously, we can’t use the same test annually for monitoring pupil progress, or teachers would just teach to it. But as Tymms noted, it is statistically impossible to achieve anything more than very small incremental progress from year to year when your sample consists of an entire year group in England (upwards of 700,000 pupils). So the next-best thing is a standardised test. Already, they’ve revealed that the scoring on SATs is extremely optimistic. If, a few years down the line, we find that 90% of our pupils are scoring 100 on SATs, we will know that the government is trying to pull another Barber.
