Standardised and Scaled Scores

Lots of primary schools are now using standardised tests in each year group to help monitor the progress of pupils. They can be useful for identifying those pupils who seem to have dropped behind their peers, or perhaps aren’t progressing through the curriculum as you might expect based on their prior attainment.

However, the fact that standardised scores from such tests look very much like the scaled scores issued for end of Key Stage assessments can cause confusion. If schools are aiming to predict outcomes at the end of Key Stage 2, it doesn’t make sense to treat the two as the same thing.

Standardised scores

Tests like Rising Stars’ PiRA and PUMA assessments, or the NFER tests, use standardised scores based on a sample of pupils who have taken the test. For a standardised scale, a score of 100 is the average achievement in a cohort. People are usually familiar with this idea from IQ tests. Scores above 100 suggest achievement that is above average, and vice versa. But even this should be taken with caution.

Because no test is a perfect measure, it’s not wise to treat somebody with a score of 98 as any different from somebody with a score of 102; we just can’t be that accurate. Most test series will give you an indication of confidence intervals – that is to say, a range of scores within which you could reasonably expect a pupil to fall. For example, scoring 103 on a test might mean that you could be 95% sure that such a pupil would score between 99 and 107 if you kept testing them. Of course, we don’t keep testing them. We use the figures from a single test as an indicator of how they are doing compared to others their age.
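To make that concrete, here’s a rough sketch of the arithmetic in Python. The standard error of measurement (SEM) of 2 points is an assumption on my part – it’s simply the value implied by the 99–107 band above, not a figure from any particular test provider:

```python
# A minimal sketch of the confidence-interval arithmetic, assuming an
# SEM of about 2 points (the value implied by the 99-107 example above).

def confidence_interval(score, sem=2.0, z=1.96):
    """Return the range within which a pupil's 'true' score could
    reasonably be expected to fall (z = 1.96 gives ~95% confidence)."""
    margin = z * sem
    return score - margin, score + margin

low, high = confidence_interval(103)
print(f"A score of 103 suggests a true score between {low:.0f} and {high:.0f}")
# -> A score of 103 suggests a true score between 99 and 107
```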

Standardised scores are based on the familiar concept of the bell curve. Half of pupils will score below 100, and half will score above (well, after those who have scored exactly 100). For most school tests, only about one in six children will score above 115; similarly, only around one in six will score below 85.

[Figure: bell curve of standardised scores]
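If you want to check the ‘one in six’ figures for yourself, here’s a quick sketch using scipy. The mean-100, standard-deviation-15 normal distribution is the usual convention for standardised scales, though individual test series may differ:

```python
# Tail proportions of the bell curve, assuming standardised scores are
# normally distributed with mean 100 and standard deviation 15.
from scipy.stats import norm

MEAN, SD = 100, 15

above_115 = norm.sf(115, loc=MEAN, scale=SD)   # upper tail: P(score > 115)
below_85 = norm.cdf(85, loc=MEAN, scale=SD)    # lower tail: P(score < 85)

print(f"Proportion above 115: {above_115:.3f}")  # ~0.159, about one in six
print(f"Proportion below 85:  {below_85:.3f}")   # ~0.159, about one in six
```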

Scaled scores

Scaled scores, while looking very similar to standardised scores, are in fact very different. For scaled scores, the 100 marker has been planned in advance: there is a threshold of attainment which pupils must cross in order to score at least 100. In the Key Stage 2 tests since 2016, considerably more than half of pupils have scored over 100.

In simple terms: it is easier to score 100+ in the national tests than in a standardised test like PiRA or NFER.

If we look again at the bell curve, around 75% of pupils achieved 100+ in KS2 maths. If we look at the top ¾ of achievers in a standardised test, then some of those pupils might have scored as little as 90 on the standardised scale. It’s not to do with whether the tests are easier or harder; just that the scoring systems are different.

On the bell curve, while only 50% of children can score over 100 on the standardised test, around ¾ can – and do – on the statutory tests.

[Figure: bell curve comparing the standardised-score threshold with the ~75% achieving 100+ at KS2]

The problem is reversed when it comes to Greater Depth. On a standardised test, you would expect around ¼ of pupils to score 110 or higher. However, for KS2 maths, only 17% of pupils got a scaled score greater than 110.
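The arithmetic behind both comparisons is easy to check, making the same mean-100, SD-15 assumption as before:

```python
# Converting between percentiles and standardised scores, assuming a
# normal distribution with mean 100 and standard deviation 15.
from scipy.stats import norm

MEAN, SD = 100, 15

# If ~75% of pupils achieve 100+ on the KS2 scaled score, the equivalent
# cut-off on a standardised scale sits at the 25th percentile:
equivalent_cutoff = norm.ppf(0.25, loc=MEAN, scale=SD)
print(f"Standardised score at the 25th percentile: {equivalent_cutoff:.0f}")  # ~90

# And the proportion you'd expect to score 110+ on a standardised test:
scoring_110_plus = norm.sf(110, loc=MEAN, scale=SD)
print(f"Proportion scoring 110+: {scoring_110_plus:.2f}")  # ~0.25, about a quarter
```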

Predictions

As ever, making predictions is a fool’s game. Scoring 95 on one year’s standardised test is no more an indicator of future SATs success than England winning a match this year is a sign that they’ll win the World Cup next year.

If you rely on standardised scores for making your predictions of later scaled scores, then you may find yourself over-estimating your proportions at greater depth, and potentially under-estimating your proportions achieving the expected standard.

Rising Stars have provided indicative bands based on the correlation between their PiRA/PUMA tests and the national tests – but it’s not a perfect science.


Bad news to bury worse news

The DfE announced today that it plans to introduce a multiplication tables check in Year 4 – and I’m angry.

I’m not alone in feeling angry, it seems, but my reasons are very different from those of so many. The multiplication check has been government policy for some time, has been moved to Year 4 on the basis of feedback from the profession, and will not form part of the high stakes assessment information that is published every year. Perhaps more importantly, the check focuses on something which is undoubtedly useful for mathematics. It’s a classic case where teaching to the test is absolutely desirable.

So why the anger?

Well, the DfE also chose today – perhaps not coincidentally – to release the updates to the Teacher Assessment frameworks for KS1 and KS2. So while everyone was getting their knickers in a twist about whether an online check was helpful or harmful, the department managed to quietly sneak out the news that the useless writing assessment procedures we’ve been battling with for nearly three years now are here to stay.

It’s worth remembering that these are the frameworks against which statutory teacher assessments are made – the judgements which have seen wild volatility between and within local authorities, a failed moderation system, huge discrepancies in what is permitted, and a real lack of understanding of the circumstances under which judgements should be made. This is the system we’ll continue to have to use in the years to come.

Notably, the DfE doesn’t trust such judgements for the purposes of setting a baseline for secondary schools. The new Progress 8 measure ignores the Writing judgement completely. Yet it will remain an integral part of the high stakes assessment process against which primaries are judged. Schools and school leaders will continue to have to choose between honest, accurate assessment, and playing the system to ensure that schools remain above the floor and coasting standards.

It’s clear from recent years’ results that the system isn’t a fair or useful reflection of how pupils are achieving in schools, and that the high stakes use of the outcomes will unjustly damage schools and careers. It’s obvious to most that the framework offers no sensible judgement on the quality of children’s writing, or their skill as a writer.

Yet here we all are, arguing about whether a 25-minute quiz in Year 4 is the problem.

I can’t help but think that that’s exactly what the DfE hoped for.


What is a “particular weakness” anyway?

In DfE terms, it’s early days for being able to make decisions about KS2 Writing outcomes. After all, it wasn’t so long ago that we were reaching February without any exemplification at all, so for the STA to have released its “particular weakness” scenarios as early as mid-January is progress!

However, publishing the materials is one thing. Providing the clarity that a high stakes statutory assessment process dearly needs is quite another. The example scenarios offer some insight into the thinking at the STA about this new ‘flexibility’, but seem to have deliberately skirted round the key issues that keep coming up, such as dyslexia!

In an effort to get a sense of the interpretations out there, I put together some very brief scenarios of my own, and asked Y6 teachers to say whether or not they thought such pupils would be awarded the expected standard. And as I feared, there is a real lack of clarity. The six example scenarios follow, accompanied by pie charts showing the decisions. In each case, the blue represents those who would award EXS (based on a sample of 668 responses).

Scenario 1

[Pie chart: Scenario 1 responses]

77% award EXS

Edith has shown herself to be a fluent and confident writer. She adapts her writing for a variety of purposes, and in many cases has evidence of elements of working at Greater Depth. However, there are no examples of the passive voice used in any of her writing, except through planned tasks.

Scenario 2

[Pie chart: Scenario 2 responses]

67% award EXS

Beowulf is a good writer, who meets almost all of the requirements for EXS. However, he has been identified as being at high risk of dyslexia. In his writing he has shown that he can use some of the Y5/6 words accurately. However, he struggles with some of the regular spelling patterns from the curriculum, and his work contains several errors, particularly for the more complex patterns.

Scenario 3

[Pie chart: Scenario 3 responses]

36% award EXS

Ethelred writes effectively for a range of audiences and purposes, with sound grammatical accuracy. He uses inverted commas correctly to mark speech, but does not yet consistently include punctuation within the inverted commas.

Scenario 4

[Pie chart: Scenario 4 responses]

71% award EXS

Boudicca writes well, showing an interesting range of language, sentence type and punctuation. However, she has developed a largely un-joined style of writing, which although clearly legible does not include the usual diagonal or horizontal strokes.

Scenario 5

[Pie chart: Scenario 5 responses]

55% award EXS

Cleopatra is a confident writer, who shows a good grasp of technical aspects and a beautiful joined style of writing. She enjoys writing fiction and can develop a good plot, with writing that flows well. However, in non-fiction texts she is not always able to use the devices that create cohesion between paragraphs. There are some examples of stock phrases used (On the other hand, Another reason, etc.) when writing in a formal style, but these are not consistent across the non-fiction texts she writes.

Scenario 6


[Pie chart: Scenario 6 responses]

92% award EXS

Englebert is a technically sound writer. He is able to adapt writing for fiction and non-fiction purposes and uses a variety of language and punctuation techniques. His spelling of common patterns is generally good. However, there are a number of examples of words from the Y5/6 lists which are mis-spelt in his writing generally. His teacher has shown that he could spell these words correctly when tested in the context of dictated sentences throughout the year.


Notably, all but one of the results were within 5 percentage points of the figures above when looking only at those who said they had had some training provided on this topic. The biggest difference came for scenario 4 (handwriting) where only 61% of those who said they’d been trained would award EXS compared to 71% of the full sample.


It’s hard to say what I expected when I set up these little scenarios. I certainly don’t know what any “correct” responses might be. I think I imagined that some would be fairly evenly split – as with the case of Cleopatra’s weak use of cohesive devices.

Scenario 6 has genuinely surprised me. I don’t know what a moderator would say, but my fear about dictated sentences would be that children could easily be tested on a handful of words each week, learned for Friday’s test, and then quickly forgotten. Is that sufficient to say they can spell at the Expected Standard? Who knows? (That’s not to say that I think ‘no’ is the correct answer either; I’m not persuaded that the importance of spelling those particular words is as great as the system might suggest).

I’m equally surprised at scenario 3. Is it really right that speech punctuation is so important that 2/3 of teachers would deny a pupil an EXS judgement on this alone – even when so many are happy to overlook spelling or handwriting failures?

As I say – I don’t have any answers. If any moderator – or perhaps an STA representative – would like to give a definitive response, I’d be glad of it. I suspect that the closest we’d get to an official answer is that a moderator would have more evidence upon which to make a decision. Which is all well and good. For the 3-4% of pupils whose work gets moderated. For everyone else, we have to hope that teachers have got it right. And judging by these results, that’s not that easy!

Writing Moderation materials

Just a quick post to share the moderation support materials that were shared by the STA today. For some reason, they have only been shared via the password-protected NCA Tools website. However, there is no indication that they should be maintained under any conditions of secrecy, and no indication that they are not covered by the usual Crown Copyright rules… so here they are:

KS1_standardisation_training_presentation_1

KS1_teacher_assessment_moderation_training_pack_1

KS2_standardisation_training_presentation_1

KS2_teacher_assessment_moderation_training_pack_1

The presentations include clarifications about some of the criteria included in the assessment frameworks.

The moderation training packs include the examples that are meant to help illustrate what counts as an exception when you want to overlook one of the criteria.

See if you find them at all helpful…

National Curriculum test videos

With the introduction of the new-style National Curriculum tests in 2016, I made some short informative videos for parents about each set of tests. Since then, I’ve updated them each year to reflect changes such as this year’s timetable changes at KS2. The videos last around 5 minutes and are ideal for sharing on school websites, Twitter feeds, Facebook pages, etc.

To help schools use them most effectively, I have provided links below in each of the main formats so they can easily be shared. Please feel free to share or download the videos and use them for your school:

Key Stage 2 tests

YouTube | Facebook | Twitter | MP4 download

Key Stage 1 tests – including Grammar, Punctuation & Spelling


YouTube | Facebook | Twitter | MP4 download

Key Stage 1 tests – without GPS

YouTube | Facebook | Twitter | MP4 download

Primary Assessment changes… again!

First of all, let me say that I’m pleased that primary assessment is changing again, because it’s been a disaster in so many ways. So here is a summary of the changes at each key stage – with my thoughts about each.

Early Years Foundation Stage Profile

  • The EYFS Profile will stay, but will be updated to bring it into line with the new national curriculum and take account of current knowledge & research. I’ve never been a huge fan of the profile, but I know most EY practitioners have been, so that seems a sensible move.
  • A proposed change to reduce the number of reported Early Learning Goals to focus on prime areas and Literacy/Maths.
  • The ‘emerging’ band may be divided to offer greater clarity of information, particularly for lower-attaining pupils.
  • An advisory panel will be set up to advise on changes to the profile and ELGs. Membership of that could be contentious.

Reception Baseline

  • New Reception baseline to be introduced from 2020 (with proper trialling beforehand this time, one presumes!) to take place in the first 6 weeks of school.
  • Won’t be a ‘test’, but also won’t be observational over time. Suspect something more like the current CEM model, perhaps?
  • Will focus on literacy & numeracy, and potentially a ‘self-regulation’ element, as good predictors for attainment in KS2.
  • Data won’t be used for any judgements about Reception, but will be used at cohort level to judge progress by the end of KS2.
  • The intention is for the assessment to provide some narrative formative information about children’s next steps.

Key Stage 1

  • The KS1 Grammar, Punctuation & Spelling test will remain optional.
  • Statutory Assessment will remain until at least 2023 (to allow for a year of overlap with the first cohort to be assessed using Reception baseline).
  • A new framework for Teacher Assessment of Writing has been published for this year only. Exemplification will follow this term.
  • DfE will continue to make assessments available (perhaps through an assessment bank if that ever gets off the ground!) after 2023, to help schools to benchmark attainment.
  • After 2023, tests and statutory teacher assessment will become optional for all-through primary schools.
  • There is more work to be done to find a system which works well for infant/junior and first/middle schools. This will be done with those in the sectors.

Key Stage 2

  • A multiplication check will be introduced at the end of Year 4. (Although, of course, whether the end means July or May remains to be seen).
  • School-level data on the multiplication check won’t be published.
  • This will be the last year that teachers have to make Teacher Assessment judgements for Reading and Maths.
  • A new framework for Teacher Assessment of Writing has been published for this year only. Exemplification will follow this term.
  • The DfE will continue to evaluate other options for the future, but isn’t committing to anything yet.
  • Small trials of peer-to-peer moderation will take place this summer.
  • Science Teacher Assessment frameworks will be updated next year.
  • The Reading test will not be timetabled for Monday of SATs week any more (hurrah!).
  • The DfE aims to link the reading content of the tests more closely to the curriculum to ensure children are drawing on their knowledge.

My thoughts

Overall, I’m pleased. Most of these changes are to be welcomed. The Reception baseline is a sensible idea (just a shame it was so badly implemented the first time round), as is scrapping KS1 assessments. The Early Years changes seem reasonable given the popularity of the current setup. The improvements to the KS2 Reading test are positive, as is the removal of pointless Teacher Assessment judgements.

On Writing, I fear we haven’t gone far enough. The current system is a joke, and it seems like the interim solution we’ll have to replace the old interim solution will just aim to make it less awful without really fixing the problem. It’s a shame that there is no obvious answer on the horizon. Perhaps the department has had its fingers burnt by rushing into quick fixes in the past and is prepared to bide its time.

In the interim, the updated expectations for Writing seem more manageable both in terms of achieving and assessing them. Of course, the devil is in the detail. If we get another exemplification book that breaks down single statements into several tick-boxes then we may be back at square one. Equally, of course, we can expect proportions of pupils meeting the expected standard to rise again substantially this year. Surely we have to be honest now and say that we really cannot use this data for accountability purposes? Mind you, perhaps it won’t matter – if we’re all getting 90% in Writing, it’ll only be the tested subjects that will make a difference to the accountability!

There are some other changes I would have liked to have seen. I really don’t think the “expected standard” label is helpful, particularly in subjects where scaled scores are used; it’s a shame we’ve not seen the back of that.

We’re not out of the woods yet. But we’re heading in the right direction, and credit is due to those at the department for listening. Let’s just hope they keep listening until we all get it right.

Will we see a leap in Writing attainment?

I’ve long been clear that I think that the current system of assessing writing at KS2 (and at KS1 for that matter) is so flawed as to be completely useless. The guidance on independence is so vague and open to interpretation and abuse, the framework so strictly applied (at least in theory), and moderation so ineffective at identifying any poor practice, that frankly you could make up your results by playing lottery numbers and nobody would be any the wiser.

One clear sign of its flaws came last year: despite Writing having for years been the lowest-scoring area of attainment, and despite the new, very stringent criteria which almost all teachers seem to dislike, somehow we ended up with more children achieving the expected standard in Writing than in any other subject area.

My fear now is that we will see that odd situation continue, as teachers get wise to the flaws in the framework and exploit them. I’m not arguing that teachers are cheating (although I’m sure some are), but rather that the system is so hopelessly constructed that the best a teacher can do for their pupils is to teach to the framework and ensure that every opportunity is provided for children to show the few skills required to reach the standard. There is no merit now in focusing on high quality writing; only in meeting the criteria. Results will rise, with no corresponding increase in the quality of writing needed.

For that reason, I suspect that we will see a substantial increase in the number of schools having more pupils reaching the expected standard. At Greater Depth level I suspect the picture will be more varied, as different LAs give contradictory messages about how easy it should be to achieve, and different moderators appear to apply different expectations.

In an effort to get a sense of the direction of travel, I asked teachers – via social media – to share their writing data for last year, and their intended judgements for this year. Now, perhaps unsurprisingly, more teachers from schools with lower attainment last year have shared their data, so along with all the usual caveats about how small a sample this is, it’s worth noting that it’s certainly not representative. But it might be indicative.

Over 250 responses were given, of which just over 10 had to be ignored (because it seems that some teachers can’t grasp percentages, or can’t read questions!). Of the 240 responses used, the average figure for 2016 was 71% achieving EXS and 11% achieving GDS. Both of these figures are lower than last year’s national figures (74% / 15%) – which themselves seemed quite high, considering that just 5 years before, a similar percentage had managed to reach the old (apparently easier) Level 4 standard. Consequently, we might reasonably expect a greater increase in these schools’ results in 2017 – as the lower-attaining schools strive to get closer to last year’s averages.

Nevertheless, it does appear that the rise could be quite substantial. Across the group as a whole, the percentage of pupils achieving the expected standard rose by 4 percentage points (to just above last year’s national average), with the percentage achieving greater depth rising by a very similar amount (again, to just above last year’s national average).

We might expect this regression towards the mean, and certainly that seems evident. Among those schools which fell short of the 74% figure last year, the median increase in the percentage achieving the expected standard was 8 percentage points; by contrast, for those which exceeded it, the median change was a fall of 1 percentage point.
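For anyone who wants to replicate this kind of comparison with their own data, here’s a minimal sketch. The file and column names are hypothetical – just stand-ins for wherever your survey responses live:

```python
# Median change in EXS percentages, split by last year's attainment,
# assuming a hypothetical CSV with columns 'exs_2016' and 'exs_2017'.
import pandas as pd

NATIONAL_2016 = 74  # last year's national EXS figure

df = pd.read_csv("writing_survey.csv")  # hypothetical file name
df["change"] = df["exs_2017"] - df["exs_2016"]

below = df[df["exs_2016"] < NATIONAL_2016]
above = df[df["exs_2016"] >= NATIONAL_2016]

print("Median change, schools below 74% in 2016:", below["change"].median())
print("Median change, schools at/above 74%:     ", above["change"].median())
```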

Now again, let me emphasise the caveats. This isn’t a representative sample at all – just a self-selecting group. And maybe if you’re in a school which did poorly last year and has pulled out all the stops this year, you’d be more likely to have responded, so it’s perfectly possible that this overestimates the national increase.

But equally, it’s possible that we’ll see an increase in teacher assessment scores which outstrips the increases in tested subjects – even though it’s already starting from a higher (some might say inflated) base.

I’m making a stab in the dark and predicting that we might see the proportion of children – nationally – reaching the Expected Standard in Writing reach 79% this year. Which is surely bonkers?

Stop moaning about tests!

Today marked the end of 4 short days of testing. Year 6 pupils everywhere will have spent less than 5 hours on tests – probably not for the first time this year – and later in the year we’ll find out how they did.

Now, I’m the first to complain when assessment isn’t working, and there are lots of problems with KS2 assessment. Statutory Teacher Assessment is a joke; the stakes for schools – and especially headteachers – are ridiculously high; the grammar test is unnecessary for accountability and unnecessarily prescriptive. I certainly couldn’t be seen as an apologist for the DfE. And yet…

For some reason it appears that many primary teachers (particularly in Facebook groups, it seems) are cross that some of the tests contained hard questions. I’ve genuinely seen someone complain that their low-ability children can’t reach the expected standard. Surely that’s the very reason they’re defining them as low ability?

Plenty of people seem annoyed that some of the questions on the maths test were very challenging. Except, of course, we know that some children will score 100% each year, so the level of challenge seems fair. There were also plenty of easier, more accessible questions that allowed those less confident mathematicians to show what they can do. It’s worth remembering that to reach the expected standard last year, just 55% of marks were needed.

But the thing that annoys me most is the number of people seemingly complaining that the contexts for problem-solving questions make the questions too difficult. Of course they do, that’s the point: real maths doesn’t come in lists of questions on a page that follow a straightforward pattern. What makes it all the more irritating is that many of those bemoaning the contexts of problems are exactly the same sort who moan about a tables test, complaining that knowing facts isn’t worthwhile unless you can apply them.

Well guess what: kids need both. Arithmetic knowledge and skills need to be secure to allow children to focus their energies on tackling those more complex mathematical problems. You can’t campaign against the former, and then complain about the latter.

The tests need to – as much as possible – allow children across the ability range to demonstrate their skill, while differentiating between those who are more and less confident. That’s where last year’s reading test fell down: too few accessible elements and too many which almost no children could access. This year’s tests were fair and did a broadly good job of catering for that spread. For those complaining about the level of literacy required, it’s worth remembering that questions can be read to children, and indeed many will have had a 1:1 reader throughout.

No test will be perfect, and there are plenty of reasons to be aggrieved about the chaos that is primary assessment at the moment, but blaming tests because not all children can answer all questions is a nonsense, and we’d do well to pick our battles more carefully!

Platitudes don’t reduce workload

There’s no denying that workload remains a significant issue in our profession. However, the solutions are not to be found in platitudes and pleasantries.

Two popular solutions have cropped up this weekend and both need dropping.

The first is slightly tangential, and focuses in theory on wellbeing. The problem with that is that the biggest threat to teachers’ wellbeing is workload. Reduce the workload and you’ll reduce the issue.

The TES ran a column this week that included ideas such as laughing yoga and ‘star of the week’. Now, if ‘star of the week’ is the sort of thing that floats your boat, then knock yourself out. Personally, I’d find it cringy or patronising. Similarly with yoga: if that’s for you, then great. As a way of improving my wellbeing, it reminds me of the course I attended as an NQT where we were told that massage would be a good relaxation technique, before being paired up with complete strangers to practise massage techniques. I assure you, I did not feel relaxed!

If teachers want to use yoga to find inner peace and relaxation, then wouldn’t the best thing we could do as schools be to ensure that teachers have enough time left in their week to attend yoga classes in their own time?

The second solution which comes up every now and then is the barmy notion that Ofsted should judge schools on how they reduce workload. Can you imagine the nonsense of it?

As I’ve said before, in recent years Ofsted has done a good job of clarifying its expectations (both for schools and inspectors), so it is now rarely the cause of the problem.

However, Ofsted cannot be the solution either. Excessive workload is often a matter of weak leadership. Confident headteachers will make decisions about policies on things like marking, data and planning which focus on benefit for pupils in relation to time and effort costs, which align with the recommendations of the DfE’s workload reports. That’s great. But where weak leaders fail to follow such guidance, they’re also likely to get it wrong when it comes to Ofsted judging their efforts.

A poor headteacher who thinks that draconian marking or planning policies are useful, is just the sort of headteacher who might think that locking up the school at 5pm every night is a helpful workload-reduction technique. Just because you can’t be in the building doesn’t make that workload disappear, but it might appear a good strategy at first glance.

The problem is, with all the best intentions, as soon as you make a measurable goal of reducing workload, you actually create a new task: headteachers being seen to act on workload. The school which never had a bonkers policy gets no credit, while the crazy head who insists on scrutinising every lesson plan gets to claim that he’s made it easier by allowing you to upload them rather than print them in triplicate.

As my TES column last autumn was headed: Want to reduce workload? Reduce work.

KS2 Writing: Moderated & Unmoderated Results

After the chaos of last year’s writing assessment arrangements, there have been many questions hanging over the results, one of which has been the difference between the results of schools which had their judgements moderated, and those which did not.

When the question was first raised, I was doubtful that it would show much difference. Indeed, back in July when questioned about it, I said as much.

At the time, I was of the view that LAs each trained teachers in their own authorities about how to apply the interim frameworks, and so most teachers within an LA would be working to the same expectations. As a result, while variations between LAs were to be expected (and clearly emerged), the variation within each authority should be less.

At a national level, it seems that the difference is relatively small. Having submitted Freedom of Information Requests to 151 Local Authorities in England, I now have responses from all but one of them. Among those results, the differences are around 3-4 percentage points:

[Chart: percentages achieving EXS and GDS in moderated and unmoderated schools]

Now, these results are not negligible, but it is worth bearing in mind that Local Authorities deliberately select schools for moderation based on their knowledge of them, so it may be reasonable to presume that a larger number of lower-attaining schools might form part of the moderated group.

The detail that has surprised me is the variation between authorities in the consistency of their results. Some Local Authority areas have substantial differences between the moderated and unmoderated schools. As Helen Ward has reported in her TES article this week, the large majority of authorities have results which were lower in moderated schools. Indeed, in 11 authorities, the difference is 10 or more percentage points for pupils working at the Expected Standard. By contrast, in a small number, it seems that moderated schools have ended up with higher results than their unmoderated neighbours.
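For what it’s worth, tallying those gaps is straightforward once the FOI responses are in one table. A sketch, with hypothetical file and column names:

```python
# Counting LA-level gaps between unmoderated and moderated EXS results,
# assuming a hypothetical CSV with one row per authority and columns
# 'exs_moderated' and 'exs_unmoderated' (percentages at EXS).
import pandas as pd

la = pd.read_csv("la_writing_results.csv")  # hypothetical FOI dataset
la["gap"] = la["exs_unmoderated"] - la["exs_moderated"]

print("LAs where unmoderated schools were 10+ points higher:",
      (la["gap"] >= 10).sum())
print("LAs where moderated schools came out higher:",
      (la["gap"] < 0).sum())
```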

What can we learn from this? Probably not a great deal that we didn’t already know. It’s hard to blame the Local Authorities: they can’t be responsible for the judgements made in schools they haven’t visited, and nor is it their fault that we were all left with such an unclear and unhelpful assessment system. All this data highlights is the chaos we all suffered – and may well suffer again in 2017.

To see how your Local Authority results compare, view the full table* of data here. It shows the proportions of pupils across the LA who were judged as working at the Expected and Greater Depth Standards in both moderated and unmoderated schools.


*Liverpool local authority claimed a right not to release their data on the grounds of commercial sensitivity, which I am appealing. I fully expect this to be released in due course and for it to be added here.