Latest research results from the EU project THE LANGUAGE MAGICIAN
3 September 2018
As part of the TLM project, data was collected from pupils in Germany, Great Britain, Italy and Spain over several survey periods. The data served two purposes: to check the validity of the instrument – i.e. its quality and suitability – and to gain a first impression of the pupils’ linguistic competences. In addition, a questionnaire at the end of the game recorded the participants’ motivation.
In the first survey cluster, starting in October 2016, the first level of the game was tested. It was designed for primary school pupils with 50 to 70 hours of foreign language instruction and tested listening comprehension, reading comprehension and writing, as well as a combination of these skills (integrated skills). In the second survey cluster, the second level of the game was tested on pupils with 70 to 100 hours of foreign language instruction. In addition to English, Spanish and German, the second cluster also examined the pupils’ language competences in Italian. Listening comprehension, reading comprehension, writing and integrated skills were once again the focus of the observations.
During the pilot phase of the project, more than 6,000 pupils tested the game, with more participants playing Level 1 than Level 2. For the data analysis, 2,574 data sets from Level 1 were used. The analysis of data from the second survey cluster covered 885 participants: 141 learning German as a foreign language at school, 96 Spanish, 80 Italian and 568 English. In the following, the results from Spain, Germany and Italy for English as a test language are described by way of example.
The results indicate that the test instrument can differentiate well between strong and weak learners. With regard to the distribution of the data, a left-skewed distribution (i.e. scores clustering towards the upper end, with a longer tail towards the lower scores) and a rather flat distribution can be observed, which can still be interpreted as approximately normal.
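The two shape properties mentioned above can be measured with the standard moment-based statistics. A minimal sketch with invented scores (none of these numbers come from the project data); a negative skewness corresponds to the left-skewed shape, a negative excess kurtosis to a flatter-than-normal distribution:

```python
def skewness_and_kurtosis(scores):
    """Moment-based skewness and excess kurtosis of a list of scores.

    Negative skewness: long left tail, scores cluster at the high end.
    Negative excess kurtosis: flatter than a normal distribution.
    """
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((s - mean) ** 2 for s in scores) / n
    m3 = sum((s - mean) ** 3 for s in scores) / n
    m4 = sum((s - mean) ** 4 for s in scores) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skew, excess_kurtosis

# Hypothetical test scores clustering near a maximum of 30 points:
scores = [10, 18, 22, 24, 26, 27, 28, 29, 29, 30]
skew, kurt = skewness_and_kurtosis(scores)
print(round(skew, 2))  # negative, i.e. left-skewed
```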
Another research task consisted of testing how reliably – i.e. how dependably – the computer game works as an instrument for analysing the pupils’ learning progress. It turned out [2] that the majority of the pupils completed the items (individual tasks) correctly. However, in addition to the theoretical aspects of testing, educational aspects must also be taken into account when testing by means of a game, as the first few tasks contribute significantly to the pupils’ motivation as they work through the series of tests.
For the following set of tasks, which test reading comprehension, an increasing level of difficulty [3] can be observed. The final set of writing tasks – the only productive skill tested, and one rarely focused on during the early years of learning – turns out, as expected, to be very difficult. Overall, the results show that the degree of difficulty is appropriate or can even be rated as ‘good’ [4].
In order to test the validity of the test instrument, not only individual skills but also general language competence was measured by means of an adapted C-test. The C-test is considered one of the most intensively researched test procedures and has been regarded as a reliable, valid and objective method for measuring general language competence since the 1980s. It is a specific form of cloze test, used among other settings in higher education as an instrument for assessment at certain levels. With the help of a bivariate regression – which tests the correlation between two variables – it was examined to what extent the test series developed also measures general linguistic competence. The answer was a definite ‘Yes’ [5], with a high proportion of explained variance. This piece of evidence also confirmed the project team’s approach and working method.
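For a bivariate regression, the explained variance R² equals the squared Pearson correlation between the two variables. A minimal pure-Python sketch with invented game and C-test scores (the function name and all numbers are illustrative, not the project’s data):

```python
def r_squared(x, y):
    """Coefficient of determination R² for a simple linear regression of y on x.

    In the bivariate case, R² is the squared Pearson correlation:
    the share of variance in y that x accounts for.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov ** 2 / (var_x * var_y)

# Hypothetical game scores and C-test scores for six pupils:
game_scores = [12, 15, 20, 22, 27, 30]
ctest_scores = [40, 48, 55, 60, 70, 78]
print(round(r_squared(game_scores, ctest_scores), 2))
```

The closer R² is to 1, the more strongly the game scores track general language competence as measured by the C-test.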
It can therefore be assumed that the set of tasks designed measures linguistic competence and can make a significant contribution to the diagnosis of language competence, including individual skills. In addition to the test instrument itself, the motivational aspects of the tests were examined. The task formats proved to be very motivating for the pupils. Furthermore, assessing the pupils’ learning progress with the help of a computer game creates a learning situation that the participants rate very highly and that, fortunately, the majority of learners do not perceive as a test situation. Rather, the pupils experience the fun of playing the game while working on their tasks. In their feedback on the project THE LANGUAGE MAGICIAN, teachers rated the academic monitoring of the game very highly. With these results, the academics involved in the project can fully recommend the game as an instrument for measuring the level of learning in schools.
More articles in which research results have been published:
https://www.thelanguagemagician.net/resources/ , “Presentations Final Conference – Assessment with a Magic Touch”, Session 3: “Research results from piloting THE LANGUAGE MAGICIAN”, by Louise Courtney, Suzanne Graham, Norbert Schlüter, Josefine Klein and Anna Cicogna
https://www.thelanguagemagician.net/research-background-language-magician/ , “Some research background to the Language Magician”, 19 October 2017, by Suzanne Graham, University of Reading
[2] For the reliability analyses of the second level, the difficulty index (p) is given: the percentage of respondents who, on average, solve an item correctly. A low index indicates a difficult task, a high p-value an easy one, which was consequently solved correctly by many participants. Ideally, the degree of difficulty should be around 50 %, which enables differentiation in both the lower and upper performance segments. In educational practice, however, these theoretical requirements can hardly be implemented in full, because an essential factor in test taking is motivation, which increases with the feasibility of the tasks.
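The difficulty index described in this footnote reduces to a simple percentage. A minimal sketch with made-up item responses (the function name and the response data are illustrative only):

```python
def difficulty_index(responses):
    """Difficulty index p: percentage of respondents who solved an item.

    `responses` is a list of 0/1 scores (1 = item solved correctly).
    A high p means an easy item, a low p a difficult one.
    """
    return 100.0 * sum(responses) / len(responses)

# Hypothetical responses of ten pupils to one item (1 = correct):
item_scores = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
print(difficulty_index(item_scores))  # → 80.0, an easy item
```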
[3] The results show p-values of 79.20, a difficulty index of p = 74.40 for listening comprehension and p = 66.72 for integrated skills.
[4] The p-value for the writing tasks is 24.10. The entire test series, with a difficulty index of p = 66.63, can consequently be rated as ‘good’.
[5] With an R² of .66, the estimation model explains 66 % of the variation, which is remarkable for social science research.