- Number of slides: 24
Why plagiarism detection software might not catch cheats Dr Edgar A. Whitley LSE
Background • A three-year HEFCE-funded project on student diversity and academic writing (LSE and Lancaster) – http://www.lums.lancs.ac.uk/Departments/owt/Research/sdaw/ • Lessons learned about international students apply equally to many home students
Research assumptions • Some plagiarism is a deliberate attempt to deceive – Copy someone else’s essay – Buy or ‘commission’ essays • Much “plagiarism” might be the result of students learning – To become members of a new academic community – To do lengthy academic writing – To do academic writing in an additional language
‘Plagiarism detection’ software
Turnitin • Used in over 80 countries and by 5,000 institutions (12 million students and educators) worldwide • 40 million student papers in their database, growing by 50,000 papers per day • Turnitin crawler has downloaded over 12 billion Internet pages and updates itself at a rate of 60 million pages per day
No original work
May catch students learning to become part of the academy • May have come from a ‘teaching only’ background (e.g. India) • May have limited experience of using journals and refereed conference papers (e.g. China) • May have limited experience of writing long ‘essays’ (e.g. Greece)
Implications for practice • Need to rethink recruitment and selection policies • Need to provide advice about the why and how of referencing – At the time of need, not administrative convenience
More generally • May have limited skills for paraphrasing and critical engagement with the literature (argumentation) • May be unaware of regulations and penalties regarding plagiarism
Continued • Need to provide opportunity to learn (i.e. make mistakes) and get feedback • Need to provide clear guidance on what is expected from student work
What does this indicate?
What might not be being caught?
‘Copy’ detection software • Dependent on coverage of database of texts • Dependent on algorithm used to match texts
Database coverage • Inevitably limited to a subset of available materials – Must be in electronic form – Must be in ‘readable’ electronic form – Must have access to materials – Must have up-to-date materials
Actual coverage • Some indications – A total of 15,308 fragments were submitted to Turnitin – 48.4% of fragments were ‘found’ (i.e. similarity index > 25%)
Based on our study, text taken at random from the internet has roughly a 50% chance of going undetected
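The arithmetic behind the "roughly 50%" figure can be checked with a short sketch. The two input numbers come from the slides; the function and variable names are illustrative assumptions, and the fragment counts are rounded estimates because the slides report only the percentage.

```python
# Figures reported on the "Actual coverage" slide.
TOTAL_FRAGMENTS = 15308   # fragments submitted to Turnitin
FOUND_RATE = 0.484        # share with similarity index > 25%


def coverage_summary(total: int, found_rate: float) -> dict:
    """Estimate how many fragments were found and how many were missed.

    The counts are rounded estimates derived from the reported percentage,
    not figures taken from the study itself.
    """
    found = round(total * found_rate)
    missed = total - found
    return {
        "found": found,
        "missed": missed,
        "missed_rate": missed / total,
    }


summary = coverage_summary(TOTAL_FRAGMENTS, FOUND_RATE)
print(f"found ~{summary['found']}, missed ~{summary['missed']} "
      f"({summary['missed_rate']:.1%} undetected)")
```

On these figures, slightly more than half of the submitted fragments were not flagged, which is where the 50% estimate comes from.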
Matching algorithm • Based on system-specific criteria for what counts as a match, e.g. number of characters • If sufficient variation within the matching block then no match detected
Turnitin’s algorithm • Based on matching consecutive characters • 7 consecutive words + 4 new words will probably never be detected • Minor changes at the right place can mean the difference between detection and non-detection
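To illustrate why small changes in the right place can defeat this kind of matching, here is a toy sketch of a consecutive-run matcher. It is not Turnitin's actual algorithm, which is not public: it compares words rather than characters, and the `min_run=8` threshold, the function names, and the sample texts are all assumptions chosen so that the "7 copied words + 4 new words" pattern from the slide slips through.

```python
def longest_shared_run(source: str, submission: str) -> int:
    """Length, in words, of the longest run of consecutive words
    that appears in both texts (longest common substring over words)."""
    a = source.lower().split()
    b = submission.lower().split()
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous word pair.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def is_flagged(source: str, submission: str, min_run: int = 8) -> bool:
    """Flag a match only if the shared run reaches the (assumed) threshold."""
    return longest_shared_run(source, submission) >= min_run


source = ("plagiarism detection software compares each submitted essay "
          "against a large database of web pages and earlier student "
          "papers collected over many years")
words = source.split()

# Copy 7 words, swap in 4 new ones, copy 7 more, swap in 4 new ones.
disguised = " ".join(
    words[:7] + ["with", "an", "enormous", "archive"]
    + words[11:18] + ["gathered", "across", "several", "decades"]
)

print(is_flagged(source, source))     # verbatim copy
print(is_flagged(source, disguised))  # 7 copied + 4 new words
```

In this sketch the verbatim copy shares a run of 22 words and is flagged, while the disguised version never shares more than 7 consecutive words and is not, even though 14 of its 22 words are copied unchanged.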
Implications • Ability to paraphrase affects likelihood of match being found • Not all misuse of sources will be picked up – Absence of match does not mean no inappropriate use of sources
Why plagiarism detection software might not catch cheats • Some of what is caught is not cheating but learning to become part of an academic community • Some cheating might not be picked up by algorithm and database
Who, in your institution, should we inform about our project work?
For more information • Resources website – http://www.sdaw.info • Email E. A. [email protected] ac. uk