Online testing
Queensland Teachers' Journal, Vol 123 No 5, 27 July 2018, p12
In the March issue, I explained why reviewing NAPLAN is just good policy. However, some have suggested a review is not needed because NAPLAN is changing and evolving into an online test. In this short piece I’d like to describe what an online, adaptive test is, what is claimed for such tests and why I think they are unlikely to solve the major problems with NAPLAN.
Since 2013, ACARA has been trialling NAPLAN Online in a sample of schools. ACARA maintains a webpage of research reports and information at https://www.nap.edu.au/online-assessment/research-and-development, which makes interesting reading for all educators. ACARA claims that “NAPLAN Online will provide better assessment, more precise results and faster turnaround of information”. In 2018, 200,000 students sat the online version of NAPLAN, in the lead-up to full implementation in 2020.
NAPLAN Online is a form of standardised testing known as a computer adaptive test (CAT). The development of a traditional standardised test can be understood as occurring in three distinct stages (Madaus, Russell, & Higgins, 2009, p. 40). The first stage is identifying the domain, the specific area of interest being measured, whether it is a body of knowledge, skills, abilities or attributes. Tests are usually developed from these domains to report on abstract traits called constructs. A construct is (usually) the statistical creation of a theoretical idea that is not directly observable but is assumed to be measured through a collection of items. These items, the second stage of test development, are effectively samples of domains and sub-domains, chosen so that a meaningful prediction can be claimed, or inferred, within the domain of interest. The third stage, inference, concerns how test scores can be used to infer achievement within the domain. Inference can occur at the level of the individual (for example, how well a Year 5 student understands grammar), or results can be aggregated into larger groups to enable comparisons (Sellar et al., 2017).
CATs differ from traditional tests because they “adapt to a student’s ability level” by offering branches or pathways (Shapiro & Gebhardt, 2012, p. 296). In NAPLAN Online, students are streamed or branched into different groups of questions, known as testlets, based on their performance on preceding questions. ACARA calls this its tailored test design.
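For readers who want to see the branching idea concretely, here is a minimal sketch in Python of how a two-stage tailored test might route a student. It is an illustration under stated assumptions, not ACARA’s actual design: the testlet names, answer keys, the single branching point and the cut scores (2 and 4 correct out of 5) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Testlet:
    """A hypothetical testlet: a fixed block of questions at one difficulty level."""
    name: str
    answer_key: list


# Hypothetical item bank: one common starting testlet, then three second-stage testlets.
STARTING = Testlet("A (common start)", ["b", "d", "a", "c", "b"])
SECOND_STAGE = {
    "easier": Testlet("B (easier)", ["a", "a", "c", "d", "b"]),
    "middle": Testlet("C (middle)", ["d", "b", "b", "a", "c"]),
    "harder": Testlet("D (harder)", ["c", "d", "a", "b", "d"]),
}


def number_correct(testlet, responses):
    """Count the responses that match the testlet's answer key."""
    return sum(r == k for r, k in zip(responses, testlet.answer_key))


def route(score, low_cut=2, high_cut=4):
    """Choose the next branch from the score on the previous testlet.

    The cut scores are illustrative assumptions, not ACARA's values.
    """
    if score < low_cut:
        return "easier"
    if score >= high_cut:
        return "harder"
    return "middle"


def tailored_path(first_responses):
    """Return the starting-testlet score and the second-stage testlet chosen."""
    score = number_correct(STARTING, first_responses)
    return score, SECOND_STAGE[route(score)]


if __name__ == "__main__":
    # A student who answers four of the five starting questions correctly.
    score, next_testlet = tailored_path(["b", "d", "a", "c", "a"])
    print(score, next_testlet.name)  # 4 D (harder)
```

Routing on a raw number-correct score keeps the sketch short. Operational adaptive tests typically route on an ability estimate from an item response theory model instead, which is one reason the comparability of scores across different pathways depends so heavily on the underlying statistics.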
The traditional pencil-and-paper NAPLAN has been rightly criticised for a number of reasons. When it is used for accountability purposes, it can have the following issues:
- It can only ever be a snapshot of student achievement at a given moment in time and is therefore subject to external factors outside the remit of the teacher or school.
- It isn’t very good as a diagnostic test because a) it doesn’t assess how students are being taught and b) it takes too long to get information back to be of use to students.
- Its use as an accountability measure can lead to an excessive focus on tests that distorts classroom practice: inordinate time is spent on practice tests, and other curriculum areas are squeezed out as class time becomes dominated by the domains being tested.
- It can be influenced by a variety of student responses, including anxiety and low motivation when students realise they do not seem to benefit from the tests themselves.
As Nichols and Berliner (2007) remind us, any time a test carries high stakes (such as publishing the data on MySchool or using class results to determine whether or not a teacher should be given a job), it tends to distort the processes it intends to measure. Further, as Wyse and Torrance (2009) found, when national tests are used in high-stakes ways there can be an initial improvement in results, but over time these improvements plateau because policymakers and system leaders ignore the difficult structural issues that affect achievement.
While these are just some of the criticisms of NAPLAN, it is important to see NAPLAN Online as an attempt to respond to some of these issues. First, shifting the tests online enables quicker feedback to teachers, which makes the test more useful for diagnosis. This may explain ACARA’s enthusiasm for machine marking of student writing. Second, an online assessment allows different types of question to be asked, such as embedded multimedia tasks. Third, the adaptive design means that questions can be tailored to students’ capabilities so that they are more engaged with the test; that is, items of appropriate difficulty encourage them to keep trying to answer the questions. Research conducted by Martin and Lazendic (2018) suggests that online testing has a positive effect on student test motivation.
However, while online tests may offer some psychometric advantages (precision, time, test engagement), a concerning aspect is the belief that these will solve the structural problems associated with NAPLAN itself. We should not be optimistic: NAPLAN’s problems are also structural, and shifting the test online is not going to change the fact that the overemphasis on, and publication of, the results is distorting classroom practice.
Running two separate tests and then combining the results through statistical manipulation is also something we should be sceptical about. ACARA seems convinced that this is a fairly straightforward process, but this is not certain. Producing a score from the online version that can be moderated against the paper version is far more precarious than producing a pencil-and-paper score, because it depends on access to computers and is vulnerable to device, software and internet problems. Furthermore, test-taking behaviour may work differently in online environments. When investigating the 2015 PISA assessment, in which some cohorts took online (but not adaptive) tests and others took traditional tests, Jerrim (2018) found that pupils completing the computer-based test performed substantially worse than pupils completing the traditional test, even after statistical moderation.
As a psychometrician friend of mine once said, “Just as you can’t model your way out of bad data, neither can you test your way out of bad results.” Equally, I’d say that redesigning your tests to alleviate minor problems rather than the major ones is not going to improve the education system or the way the tests affect it. NAPLAN Online does not remove the need for an immediate evaluation of the tests.
Citations
Hwang, G. (2003). A conceptual map model for developing intelligent tutoring systems. Computers & Education, 40, 217-235.
Jerrim, J. (2018). A digital divide? Randomised evidence on the impact of computer-based assessment in PISA. Centre for Education Economics Ltd (CFEE). http://www.cfee.org.uk/node/242
Madaus, G., Russell, M., & Higgins, J. (2009). The Paradoxes of High Stakes Testing: How they affect students, their parents, teachers, principals, schools, and society. Charlotte, NC: Information Age Publishing.
Martin, A. J., & Lazendic, G. (2018). Computer-adaptive testing: Implications for students’ achievement, motivation, engagement, and subjective test experience. Journal of Educational Psychology, 110(1), 27-45. http://dx.doi.org/10.1037/edu0000205
Nichols, S. L., & Berliner, D. C. (2007). Collateral damage: How high-stakes testing corrupts America's schools. Cambridge, MA: Harvard Education Press.
Sellar, S., Thompson, G., & Rutkowski, D. (2017). The Global Education Race: Taking the Measure of PISA and International Testing. Toronto: Brush Education.
Wyse, D., & Torrance, H. (2009). The development and consequences of national curriculum assessment for primary education in England. Educational Research, 51(2), 213-228.