Category Archives: assessment

Some clarity on KS2 Writing moderation … but not a lot

Not for the first time, the Department has decided to issue some clarification about the writing assessment framework at Key Stage 2 (and its moderation!). For some inexplicable reason, rather than sharing this clarity in writing, it has been produced as a slowly-worded video – as if it were us that were stupid!

Here’s my take on what it says:

Some Clarity – especially on punctuation

  • For Greater Depth, the long-winded bullet point about shifts in formality has to be seen in several pieces of work, with more than one shift within each of those pieces.
  • For Expected Standard, it is acceptable to have evidence of colons and semi-colons for introducing, and within, lists (i.e. not between clauses).
  • For Expected Standard, any one of brackets, dashes or commas is acceptable to show parenthesis. There is no need to show all three.
  • Bullet points are punctuation, but the DfE is pretending they’re not, so there’s no need to have evidence of them as part of the “full range” of punctuation needed for Greater Depth.
  • Three full stops to mark ellipsis are also punctuation, but again, the DfE has managed to redefine ellipsis in such a way that they’re not… so again, not needed for Greater Depth.

A bit of guidance on spelling

This was quite clear: if a teacher indicates that a spelling needs correcting by writing a comment in the margin on the relevant line, then the correction of that spelling cannot be counted as independent. If the comment to correct spellings comes at the end of a paragraph or whole piece, without specifying what to correct, then it can still count as independent.

No clarity whatsoever on ‘independence’

Believe me, I’ve re-watched this several times – and not all of them at double-speed – and I’m still bemused that they think this clarifies things. The whole debacle is still reliant on phrases like “over-scaffolding” and “over-detailed”. Of course, if things are over-detailed then there is too much detail. What isn’t any clearer is how much detail is too much detail. The video tells us that:

“success criteria would be considered over-detailed where the advice given directly shapes what pupils write by directing them to include specific words or phrases”

So we know specifying particular words is too much, but is it okay to use success criteria which include:

  • Use a varied range of sentence structures

Is it too specific to include this?

  • Use a varied range of sentence openers

What about…?

  • Use adverbs as sentence openers

There’s a wide gulf between the three examples above. Which of these is acceptable? Because if it’s the last, then schools relying on the first will find themselves under-valuing work – and vice versa, of course. That’s before you even begin to consider the impossibility of telling what success criteria and other supporting examples are available in classrooms at the time of writing.

The video tries to help by adding:

“success criteria must not specifically direct pupils as to what to include or where to include something in their writing”

But all of those examples are telling children what to include – that’s the whole point of success criteria.

If I’ve understood correctly, I think all three of those examples are acceptable. But it shouldn’t matter what I think: if the whole system depends on what each of us thinks the guidance means, then the consistency necessary for fair and useful assessment is non-existent.

The whole issue remains a farce. Doubtless this year Writing results will rise, probably pushing them even higher above the results for the externally tested subjects. Doubtless results will vary widely across the country, with little or no relationship to success in the tested subjects. And doubtless moderation will be a haphazard affair with professionals doing their best to work within an incomprehensible framework.

And to think that people will lose their jobs over data that results from this nonsense!


The full video in all its 11-minute glory can be found at: https://www.youtube.com/watch?v=BQ-73l71hqQ

 

National Curriculum Test videos

I’ve updated the videos I made last year to explain the KS1 and KS2 tests to parents. As schools can choose whether or not to use the Grammar, Punctuation & Spelling tests at KS1, there are now two versions of the video for KS1 (one with, one without the GPS tests).

Please feel free to use these videos on your school’s website or social media channels, or in parent meetings, etc. There are MP4 versions available to download.

Key Stage 2

Re-tweetable version:

Facebook shareable version:
https://www.facebook.com/primarycurriculum/videos/1311921482187352/

Downloadable MP4 file: https://goo.gl/b0Lo9v

Key Stage 1 – version that includes the GPS tests

Re-tweetable version:

Facebook shareable version:
https://www.facebook.com/primarycurriculum/videos/1311921482187352/

Downloadable MP4 file: https://goo.gl/jo18qk

Key Stage 1 – version for schools not using the GPS tests

Re-tweetable version:

Facebook shareable version:
https://www.facebook.com/primarycurriculum/videos/1311921482187352/

Downloadable MP4 file:  https://goo.gl/xMDFSJ

The impossibility of Teacher Assessment

I’ve said for a fair while now that I’d like to see the end of statutory Teacher Assessment. It’s becoming a less unpopular thing to say, but I still don’t think it’s quite reached the point of popularity yet. But let me try, once again, to persuade you.

The current focus of my ire is the KS2 Writing assessment, partly because it’s the one I am most directly involved in (doing as a teacher, not designing the monstrosity!), and partly because it is the one with the highest stakes. But the issues are the same at KS1.

Firstly, let me be frank about this year’s KS2 Writing results: they’re nonsense! Almost to a man we all agreed last year that the expectations were too high; that the threshold was something closer to a Level 5 than a 4b; that the requirements for excessive grammatical features would lead to a negative impact on the quality of writing. And then somehow we ended up with 74% of children at the expected standard, more than in any other subject. It’s poppycock.

Some of that will be a result of intensive drilling, which won’t have improved writing that much. Some of it will be a result of a poor understanding of the frameworks, or accidental misuse of them. Some of it will be because of cheating. The real worry is that we hardly know which is which. And guidance released this year which is meant to make things clearer barely helps.

I carried out a poll over the last week asking people to consider various sets of success criteria and to decide whether they would be permitted under the new rules which state that

[Image: extract from the guidance on ‘independent’ work]

So we need to decide what constitutes “over-aiding” pupils. At either end of the scale, that seems quite simple. Just under 90% of the 824 responses said that the following broad guidance would be fine:

[Image: Simplest criteria]

Similarly, at the other extreme, 92% felt that the following ‘slow-writing’ type model would not fit within the definition of ‘independent’:

[Image: Slow writing approach]

This is all very well, but in reality, few of us would use such criteria for assessed work. The grey area in the middle is where it becomes problematic. Take the following example:

[Image: The disputed middle ground]

In this case results are a long way from agreement: 45% of responses said that it would be acceptable, 55% not. If half of schools eschew this level of detail and it is actually permitted, then their outcomes are bound to suffer. By contrast, if nearly half use it but it ought not to be allowed, then perhaps their results will be inflated. Of course, a quarter of those schools may be moderated, which could lead to even those schools with over-generous interpretations of the rules suffering. There is no consistency here at all.

The STA will do their best to temper these issues, but I really think they are insurmountable. At last week’s Rising Stars conference on the tests, John McRoberts of the STA was quoted as explaining where the line should be drawn:

That advice does appear to clarify things (such that it seems the 45% were probably right in the example above), but it is far from solving the problem. For the guidance is full of such vague statements. It’s clear that I ought not to be telling children to use the word “anxiously”, but is it okay to tell them to open with an adverb while also having a display on the wall listing appropriate adverbs – including anxiously? After all, the guidance does say that:

[Image: extract from the guidance]

Would that count as independent? What if my classroom display contained useful phrases for opening sentences for the particular genre we were writing? Would that still be independent?

The same problems apply in many contexts. For spelling, children are meant to be able to spell words from the Y5/6 list. Is it still okay if they have the list permanently printed on their desks? If they’re trained to use the words in every piece?

What about peer-editing, which is also permitted? Is it okay if I send my brightest speller around the room to edit children’s work with them? Is that ‘independent’?

For an assessment to be a fair comparison of pupils across the country, the conditions under which work is produced must be as close to identical as possible, yet this is clearly impossible in this case.

Moderation isn’t a solution

The temptation is to say that Teacher Assessment can be robust if combined with moderation. But again, the flaws are too obvious. For a start, the cost of moderating all schools is likely to be prohibitive. But even if it were possible, it’s clear that a moderator cannot tell everything about how a piece of work was produced. Of course moderators will be able to see if all pupils use the same structure or sentence openers. But they won’t know what was on my classroom displays while the children were writing the work. They won’t know how much time was spent on peer-editing work before it made the final book version. They won’t be able to see whether or not teachers have pointed out the need for corrections, or whether each child had been given their own key phrases to learn by heart. Moderation is only any good at comparing judgements of the work in front of you, not of the conditions in which it was produced.

That’s not to imply that cheating is widespread. Far from it: I’ve already demonstrated that a good proportion of people will be wrong in their interpretations of the guidance in good faith. It is almost impossible for the system to work any other way.

The stakes are too high now. Too much rests on those few precious numbers. And while in an ideal world that wouldn’t be the case, we cannot expect teachers to provide accurate, meaningful and fair comparisons, while also judging them and their schools on the numbers they produce in the process.

Surely it’s madness to think otherwise?


For the results of all eight samples of success criteria, see this document.

 

A consistent inconsistency

With thanks to my headteacher for inadvertently providing the blog title.

With Justine Greening’s announcement yesterday we discovered that the DfE has definitely understood that all is not rosy in the primary assessment garden. And yet, we find ourselves looking at two more years of the broken system before anything changes. My Twitter timeline today has been filled with people outraged at the fact that the “big announcement” turned out to be “no change”.

I understand the rage entirely. And I certainly don’t think I’ve been shy about criticising the department’s chaotic organisation of the test and errors made. But I’m also not ready to throw my toys out of the pram just yet. This might just be the first evidence that the department is really listening. Yes, perhaps too little too late. Yes, it would have been nice for it to have been accompanied by an acknowledgement that the problems were caused by the pace of change enforced by ministers. But maybe they’re learning that lesson?

For a start, there are many teachers nationally who are just glad of the consistency. As my headteacher said earlier today, it leaves us with a consistent inconsistency. But nevertheless, there will be many teachers who are relieved to see that the system is going to be familiar for the next couple of years.

It’s a desire I can understand, but just can’t go along with. There are too many problems with the current system – mostly those surrounding the Teacher Assessment frameworks and moderation. But I will hang fire, because there is the prospect of change on the horizon.

It’s tempting to see it as meaningless consultation, but until we see the detail I don’t want to rule anything out. I hope that the department is listening to advice, and is open to recommendations – including those which the NAHT Assessment Reform Group, of which I am a member, is drawing together over this term.

If the DfE listens to the profession, and in the spring consults on a meaningful reform that brings about sensible assessment and accountability processes, then we may eventually come to see yesterday’s announcement as the least bad of the available options.

Of course, if they mess it up again, I’ll be on their case.

The potential of Comparative Judgement in primary

I have made no secret of my loathing of the Interim Assessment Frameworks, and the chaos surrounding primary assessment of late. I’ve also been quite open about a far less popular viewpoint: that we should give up on statutory Teacher Assessment. The chaos of the 2016 moderation process and outcomes was an extreme case, but it’s quite clear that the system cannot work.

It’s crazy that schools can be responsible for deciding the scores on which they will be judged. It has horrible effects on reliability of that data, and also creates pressure which has an impact on the integrity of teachers’ and leaders’ decisions. What’s more, as much as we would like for our judgements to be considered as accurate, the evidence points to a sad truth: humans (including teachers) are fallible. As a result, Teacher Assessment judgements are biased – before we even take into account the pressures of needing the right results for the school. Tests tend to be more objective.

However, it’s also fair to say that tests have their limitations. I happen to think that the model of Reading and Maths tests is not unreasonable. True, there were problems with this year’s, but the basic principles seem sound to me, so long as we remember that the statutory tests are about the accountability cycle, not about formative information. But even here there is a gap: the old Writing test was scrapped because of its failings.

That’s where Comparative Judgement has a potential role to play. But there is some work to be done in the profession for it to find its right place. Firstly we have to be clear about a couple of things:

  1. Statutory Assessment at the end of Key Stages is – and indeed should be – separate from the rest of assessment that happens in the classroom
  2. What we do to judge work, and how we report that to pupils and parents are – and should be – separate things.

Comparative Judgement is based on the broad idea of comparing lots of pieces of work until you have essentially sorted them into a rank order. That doesn’t mean that individuals’ ranks need be reported, any more than we routinely report raw scores to pupils and parents. It does, though, offer the potential of moving away from the hideous tick-box approach of the Interim Frameworks.
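To make the idea of "comparing until you have a rank order" concrete, here is a minimal sketch in Python. It is deliberately crude: it simply counts how often each script was preferred, whereas real Comparative Judgement tools usually fit a statistical model to the comparisons. The script names and judgements are invented for illustration.

```python
from collections import defaultdict


def rank_by_wins(judgements):
    """Rank scripts by the share of pairwise comparisons they won.

    `judgements` is a list of (winner, loser) pairs, one per comparison.
    This is a simple win-rate ranking, not the model a real CJ tool fits.
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for winner, loser in judgements:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    # Highest win rate first; ties are broken by script name for stability.
    return sorted(appearances, key=lambda s: (-wins[s] / appearances[s], s))


# Hypothetical judgements: each pair records which script a judge preferred.
judgements = [
    ("script_A", "script_B"),
    ("script_A", "script_C"),
    ("script_B", "script_C"),
    ("script_C", "script_D"),
    ("script_B", "script_D"),
    ("script_A", "script_D"),
]

ranking = rank_by_wins(judgements)
print(ranking)
```

Note that no judge ever ticks off a feature list here: each decision is just "which of these two is the better piece of writing?", and the order falls out of many such decisions.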

Teachers are understandably concerned by the idea of ranking, but it’s really not that different from how we previously judged writing. Most experienced Y2/Y6 teachers didn’t spend hours poring over the level descriptors, but rather used their knowledge of what they considered L2/L4 to look like, and judged whether they were looking at work that was better or worse. Comparative Judgement simply formalises this process.

It especially tackles an issue that is particularly prevalent with the current interim arrangements: excellent writing which scores poorly because of a lack of dashes or hyphens (and poor writing which scores highly because it’s littered with them!). If we really want good writing to be judged “in the round”, then we cannot rely on simplistic and narrow criteria. Rather, we have to look at work more holistically – and Comparative Judgement can achieve that.

Rather than teachers spending hours poring over tick-lists and building portfolios of evidence, we would simply submit a number of pieces of work towards the end of Year 6 and they would be compared to others nationally. If the DfE really wanted to, once the pieces had been ranked in order, it could apply scaled scores to the general pattern, so that pupils received a scaled score for their writing just as they do for the tests. The difference would be that instead of collecting a few marks for punctuation, and a few for modal verbs, the whole score would be based on the overall effect of the piece of writing. Equally, the rankings could be turned into “bands” that matched pupils who were “Working Towards” or “Working at Greater Depth”. Frankly, we could choose quite what was reported to pupils and parents; the key point is that we would be more fairly comparing pupils based on how good they were at writing, rather than how good they were at ticking off features from a list.

There are still issues to be resolved, such as exactly what pieces of writing schools would submit for judgement, and the tricky issue of quite how independent the work should be. Equally, the system doesn’t lend itself as easily to teachers being able to use the information formatively – but then, aren’t we always saying that we don’t want teachers to teach to the tests?

Certainly if we want children’s writing to be judged based on its broad effectiveness, and for our schools to be compared fairly for how well we have developed good writers, then it strikes me that it’s a lot better than what we have at the moment.


Dr Chris Wheadon and his team are carrying out a pilot project to look at how effective moderation could be in Year 6. Schools can find out more, and sign up to join the pilot (at a cost) at: https://www.sharingstandards.com/

 

Getting started with FFT data for KS2

School leaders are used to dealing with change, not least when it comes to assessment data, but this year is in a league of its own. With changes to all the tests, teacher assessment, scaled scores and accountability measures, headteachers would be forgiven for despairing of any attempt to make sense of it.

Even when Raise becomes available, there’s no saying how easy it will be to interpret, not least because of all the changes this year. However, the FFT Summary Dashboard is available from today (Wednesday 14th), allowing you to make headway into that first stage of data analysis to evaluate your school’s strengths, and pick out areas for further development. In today’s climate, any help with that will be welcome!

The first glance of your dashboard will give you a very quick visual representation of your key headline figures – attainment and progress – related to those that will feature in performance tables and be published on your school website. In FFT these are represented in the form of comparison gauges:

[Image: Comparison gauges that show key figures at a glance]

The beauty of these is the clarity they provide compared to the complexity of the published data and its confidence intervals. In short: the middle white zone shows that you’re broadly in line with national outcomes; the red and green bands at either end suggest significantly lower or higher results. This will be particularly helpful for governors who are either shocked by changes in numbers from the old system, or who are concerned about small negative values on the progress measures.

 

The dashboard offers more clarity, too, about specific groups within your school. With a changing landscape it can be hard to know what to expect, but the pupil group analysis will quickly tell you which specific groups – girls, middle attainers, free school meals – have performed particularly well, and which seem not to be keeping up. It’s a simple overview that makes a good starting point for further investigation.

[Image: pupil group analysis]

Quick identification of groups that have done particularly well, or poorly (green plus symbols show significant values)

It’s worth remembering, though, that some groups may be very small in your school: if you’ve only got a handful of girls, then don’t get too worked up over variations!

The dashboard also helps to pick out trends over time – another challenge when all the goalposts seem to have moved. By comparing the national results to previous years, FFT have been able to plot a trajectory that compares how attainment and progress might have looked in 2014 and 2015 under the current system. As a result, you can begin to see whether your school has improved by comparison to the national picture.

[Image: time series chart]

The time series shows your previous results adjusted to bring them more closely into line with the new frameworks. Not perfect, but a very telling ‘starter for ten’!

A caveat here: this is much more difficult with the writing judgements which are much less precise than the scaled scores. Take that alongside the evident variation in writing outcomes this year, and you may want to look deeper into those figures before making any quick judgements.

[Image: Groups analysed]

Further into the summary dashboard itself, we get into the detail of vulnerable groups and of the separate subjects. Again, you get an overview that helps to pinpoint areas to look into further. Specific groups remain a clear focus for Ofsted and other inspections, so this information will be vital to leaders. The further breakdown of subjects will be of interest too, and of particular use in schools where writing has been affected by the national inconsistencies. Again these sections allow you to compare your attainment and progress to the national picture, and also to reflect on how your results may have changed over time.

No doubt, by the time school leaders and governors have begun to look at their summary overview, there will be many more questions asked. That’s where the FFT Aspire platform can help. Using your summary as a starting point, you can explore each element in greater detail, filtering your results for different groups, or subjects – even down to the level of individual pupils. It will help you to unpick the measures that are likely to feature on your Raise Online profile when it arrives, and with others too, including using contextual information about your pupils to compare to similar groups elsewhere.  Alongside the target-setting and other elements of FFT, you have a wealth of information at your fingertips that can be used to focus your school improvement planning – the summary dashboard is just the start.

 


This post was written with the support of FFT in preparation for the launch of the new dashboards on 14th September 2016.

Some thoughts on KS2 Progress

Caveats first: these conclusions, such as they are, are drawn from a small sample of a little over 50 schools. That sample of schools isn’t representative: indeed, it has slightly higher attainment than the national picture, both in terms of KS2 outcomes, and in KS1 starting points. However, with over 2000 pupils’ data, it shows some interesting initial patterns – particularly when comparing the three subject areas.

Firstly, on Maths – the least controversial of the three subjects. It seems that – in this sample – pupils who achieved Level 2c at KS1 had an approximately 40% chance of reaching the new expected standard (i.e. a scaled score of 100+). That leaps to around 66% for those achieving L2b at KS1 (i.e. just short of the national average).

[Figure: chance of reaching the expected standard in Maths, by KS1 level]

The orange bar shows the average of this sample, which is slightly higher than the national average of 70%

It’s important to note, though, that progress measures will not be based on subject levels, but on the combined APS score at Key Stage 1. The graph for these comparisons follows a similar pattern, as you’d expect:

[Figure: chance of reaching the expected standard in Maths, by KS1 APS]

Where fewer than 10 pupils’ data was available for any given APS score, these have been omitted.
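For anyone wanting to run the same sort of comparison on their own data, the calculation behind these graphs can be sketched roughly as follows. This is only an illustration under my own assumptions: the pupil records are made up, and the function name and data layout are hypothetical. The only things taken from the analysis above are the expected standard of a scaled score of 100+ and the rule of omitting any APS group with fewer than 10 pupils.

```python
from collections import defaultdict

EXPECTED_STANDARD = 100  # scaled score needed to meet the expected standard
MIN_GROUP_SIZE = 10      # smaller groups are omitted, as in the graphs


def conversion_rates(pupils, min_group_size=MIN_GROUP_SIZE):
    """Return {KS1 APS: share of pupils scoring 100+} for large-enough groups.

    `pupils` is a list of (ks1_aps, ks2_scaled_score) pairs.
    """
    groups = defaultdict(list)
    for ks1_aps, scaled_score in pupils:
        groups[ks1_aps].append(scaled_score)

    rates = {}
    for ks1_aps, scores in sorted(groups.items()):
        if len(scores) < min_group_size:
            continue  # too few pupils to say anything sensible
        met = sum(1 for s in scores if s >= EXPECTED_STANDARD)
        rates[ks1_aps] = met / len(scores)
    return rates


# Made-up records for illustration only (not the sample described here).
pupils = (
    [(13.0, 98)] * 6 + [(13.0, 101)] * 4      # 10 pupils, 4 meet the standard
    + [(15.0, 104)] * 8 + [(15.0, 97)] * 4    # 12 pupils, 8 meet the standard
    + [(21.0, 110)] * 3                       # only 3 pupils, so omitted
)

rates = conversion_rates(pupils)
print(rates)
```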

There is an interesting step here between pupils in this sample on APS of 13 (or less) who have a chance of 40% or less of reaching the expected standard, while those scoring 13.5 or more have a greater than 50% chance of achieving the standard. (The dip at 12.5 APS points relates to pupils who scored Level 2s in Maths and one English subject, but a Level 1 in the other, highlighting the importance of good literacy for achievement in KS2 Maths.)

For Reading, the graphs look broadly similar in shape:

[Figure: chance of reaching the expected standard in Reading, by KS1 level]

Blue bar shows average of this sample at 67%, which is slightly higher than national average of 66%

Interestingly, here the Level 2c scorers still have only a 40% chance of meeting the expected standard, but those achieving 2b have a lower chance than in Maths of reaching it (58% compared to 66% for Maths).

When looking at the APS starting points, there is something of a plateau at the right-hand end of the graph. The numbers of pupils involved here are relatively small (as few as 31 pupils in some columns). Interestingly, the dip at 18.5 APS points represents the smallest sample group shown, made up of pupils who scored 2a/2b in the two English subjects, but a Level 3 in Maths at KS1. This will be of comfort to teachers who have been concerned about the negative effect of such patterns on progress measures: it seems likely that we will still be comparing like with like in this respect.

[Figure: chance of reaching the expected standard in Reading, by KS1 APS]

It is in Writing that the differences become more notable – perhaps an artefact of the unusual use of Teacher Assessment to measure progress. Compared to just 40% of pupils attaining L2c in Reading or Maths achieving the new expected standard, some 50% of those in Writing managed to make the conversion, and this against a backdrop of teachers concerned that the expected standard was too high in English. Similarly, over three-quarters of those achieving Level 2b managed to reach the standard (cf. 58% Reading, 66% Maths).

[Figure: chance of reaching the expected standard in Writing, by KS1 level]

In contrast to the other subjects, attainment in this sample appears notably lower in Writing than the national average (at 70% compared to 74% nationally)

With the APS comparisons, there are again slight dips at certain APS points, including 18.5 and 19.5 points. In the latter case, this reflects the groups of pupils who achieved Level 3s in both Reading and Maths, but only a 2b in Writing at KS1, suggesting again that the progress measure does a good job of separating out different abilities, even using combined APS scores.

[Figure: chance of reaching the expected standard in Writing, by KS1 APS]

Of course, this is all of interest (if you’re interested in such things), but the real progress measures will be based on the average score of each pupil with each KS1 APS score. I’d really like to collect some more data to try to get a more reliable estimate of those figures, so if you would be willing to contribute your school’s KS1 and KS2 data, please see my previous blog here.


Spread of data

Following a request in the comments, below, I’ve also attached a table showing the proportions of pupils achieving each scaled score for the two tests. This is now based on around 2800-2900 pupils, and again it’s important to note that this is not a representative sample.

[Table: proportions of pupils achieving each scaled score]

A few words on the 65% floor standard

There’s been much discussion about this in the last few days, so I thought I’d summarise a few thoughts.

Firstly, many people seem to think that the government will be forced to review the use of a 65% floor standard in light of the fact that only 53% of pupils nationally met the combined requirements. In fact, I’d argue the opposite: the fact that so few schools exceed the attainment element of the floor standard is no bad thing. Indeed, I’d prefer it if no such attainment element existed.

There will be schools for whom reaching 65% combined Reading, Writing & Maths attainment did not require an inordinate amount of work – and won’t necessarily represent great progress. Why should those schools escape further scrutiny just because they had well-prepared intakes? Of course, there will be others who have met the standard through outstanding teaching and learning… but they will have great progress measures too. The 65% threshold is inherently unfair on those schools working with the most challenging intakes and has no good purpose.

That’s why I welcomed the new progress measures. Yes it’s technical, and yes it’s annoying that we won’t have it for another couple of months, but it is a fairer representation of how well a school has achieved in educating its pupils – regardless of their prior attainment.

That said, there will be schools fretting about their low combined Reading, Writing & Maths scores. I carried out a survey immediately after results were released, and so far 548 schools have responded, sharing their combined RWM scores. From that (entirely unscientific self-selecting) group, just 28% of schools had reached the 65% attainment threshold. And the spread of results is quite broad – including schools at both 0% and 100%.

The graph below shows the spread of results with each colour showing a band of 1/5th of schools in the survey. Half of schools fell between 44% and 66%.

[Figure: Combined attainment]
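If you want to see where your own school sits in a spread like this, the banding is straightforward to reproduce. The sketch below uses Python's standard `statistics.quantiles` to find the cut points for five equal bands and the range containing the middle half of schools. The list of percentages is invented for illustration and is not the survey data; only the 65% threshold comes from the floor standard itself.

```python
import statistics

FLOOR_THRESHOLD = 65  # combined RWM attainment element of the floor standard

# Hypothetical combined RWM percentages, one per school (not the survey data).
school_rwm = [0, 22, 35, 41, 44, 48, 50, 53, 55, 58, 61, 64, 66, 70, 78, 85, 100]

# Four cut points that split the schools into five equal-sized bands.
quintile_cuts = statistics.quantiles(school_rwm, n=5)

# Lower and upper quartiles: the middle half of schools falls between them.
lower_q, _median, upper_q = statistics.quantiles(school_rwm, n=4)

# Share of schools at or above the 65% threshold.
share_at_floor = sum(1 for p in school_rwm if p >= FLOOR_THRESHOLD) / len(school_rwm)

print("Quintile cut points:", quintile_cuts)
print("Middle half of schools lies between", lower_q, "and", upper_q)
print("Share of schools at 65% or above:", round(share_at_floor, 2))
```

With a self-selecting sample, of course, the cut points describe the schools that responded rather than the national picture.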

As I said on the day the results were published – for a huge number of schools, the progress measure will become all important this year. And for that, we just have to wait.

Edit:

Since posting, a few people have quite rightly raised the issue of junior/middle schools, who have far less control over the KS1 judgements (and indeed in middle schools, don’t even have control over the whole Key Stage). There are significant issues here about the comparability of KS1 data between infant/first schools and through primary schools (although not necessarily with the obvious conclusions). I do think that it’s a real problem that needs addressing: but I don’t think that the attainment floor standard does anything to address it, so it’s a separate – albeit important – issue.

A little data experiment

Right, let me be clear up-front: I cannot predict your school’s progress scores. I can’t even pretend to estimate a prediction of it. There is just no way to find it, without knowing the full national picture of data – and even the DfE don’t have that finalised yet.

However, we do know how the progress measure works (see the video here if you don’t), so it would be possible to recreate the process based on a sample of data. It’s really little more than a thought experiment, but it may be of interest all the same.

To get even close to that, though, it will need lots of data, from lots of schools in lots of detail. Where in the past I have tried to collect summary data, for this experiment I would need real data from schools that includes both KS1 and KS2 results for their Year 6 cohorts. My plan, then, is to collate the data and find the average progress made by pupils with common starting points within the sample.  I will then share the resulting progress calculations with schools who have submitted data.
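In rough terms, the calculation I have in mind looks like the sketch below. It is a simplified illustration under my own assumptions, not the DfE's published method: each pupil's scaled score is compared with the average score of all pupils in the sample who share the same KS1 APS, and a school's figure is the average of those differences. The school names, data layout and function name are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def progress_scores(pupils):
    """Estimate a progress figure per school from pupil-level records.

    `pupils` is a list of (school, ks1_aps, ks2_scaled_score) tuples.
    The benchmark for each starting point is the sample average, which
    stands in for the national average that is not yet available.
    """
    by_start = defaultdict(list)
    for _school, ks1_aps, score in pupils:
        by_start[ks1_aps].append(score)
    benchmark = {aps: mean(scores) for aps, scores in by_start.items()}

    by_school = defaultdict(list)
    for school, ks1_aps, score in pupils:
        # Positive means the pupil scored above others with the same start.
        by_school[school].append(score - benchmark[ks1_aps])
    return {school: mean(diffs) for school, diffs in by_school.items()}


# Made-up records for two hypothetical schools.
pupils = [
    ("School X", 15.0, 104),
    ("School X", 15.0, 100),
    ("School X", 17.0, 108),
    ("School Y", 15.0, 96),
    ("School Y", 17.0, 104),
    ("School Y", 17.0, 106),
]

school_progress = progress_scores(pupils)
print(school_progress)
```

Because the benchmarks come from the sample itself, the figures only tell you how a school compares with the other schools that took part, which is exactly why the sample needs to be as large as possible.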

Because this needs very specific and accurate data, it won’t be possible to collect this using an open spreadsheet. Instead, below is a master spreadsheet which can be downloaded by anybody who wants to take part. If you wish to join in, please download the form, add your own data, and then return it to info@primarycurriculum.me.uk

Just don’t base your career decisions on the results 🙂

To take part, please download, complete and return the following spreadsheet:

 

 

Am I overstretching it…?

What are people’s thoughts?

Everyone wants to know about progress measures, but we won’t have the national data until September. We can’t work it out in advance… but is it worth trying to estimate?

I collected data on Tuesday night about the SATs results, and my sample was within 1 percentage point of the final national figures, which wasn’t bad. However, this would be a much more significant project.

To get anything close to an estimate of national progress measures, we would need a substantial number of schools to share their school’s data at pupil level. It would mean schools sharing their KS1 and scaled score results for every pupil – anonymised of course, but detailed school data all the same.

My thinking at this stage is that I’d initially only share any findings with the schools that were able to contribute. It would be a small sample, but it might give us a very rough idea. Very rough.

Would it be useful… and do people think they would be able to contribute?