Why plagiarism detection software might not catch cheats

Why plagiarism detection software might not catch cheats
Dr Edgar A. Whitley, LSE

Background
• A three-year HEFCE-funded project on student diversity and academic writing (LSE and Lancaster)
  – http://www.lums.lancs.ac.uk/Departments/owt/Research/sdaw/
• Lessons learned about international students apply equally to many home students

Research assumptions
• Some plagiarism is a deliberate attempt to deceive
  – Copy someone else’s essay
  – Buy or ‘commission’ essays
• Much “plagiarism” might be the result of students learning
  – To become members of a new academic community
  – To do lengthy academic writing
  – To do academic writing in an additional language

‘Plagiarism detection’ software

Turnitin
• Used in over 80 countries and by 5,000 institutions (12 million students and educators) worldwide
• 40 million student papers in its database, growing by 50,000 papers per day
• The Turnitin crawler has downloaded over 12 billion Internet pages and updates itself at a rate of 60 million pages per day

Summary reports

Proper referencing

No original work

May catch students learning to become part of the academy
• May have come from a ‘teaching only’ background (e.g. India)
• May have limited experience of using journals and refereed conference papers (e.g. China)
• May have limited experience of writing long ‘essays’ (e.g. Greece)

Implications for practice
• Need to rethink recruitment and selection policies
• Need to provide advice about the why and how of referencing
  – At the time of need, not at a time of administrative convenience

More generally
• May have limited skills for paraphrasing and critical engagement with the literature (argumentation)
• May be unaware of regulations and penalties regarding plagiarism

Continued
• Need to provide the opportunity to learn (i.e. make mistakes) and get feedback
• Need to provide clear guidance on what is expected from student work

What does this indicate?

What might not be being caught?

‘Copy’ detection software
• Dependent on coverage of the database of texts
• Dependent on the algorithm used to match texts

Database coverage
• Inevitably limited to a subset of available materials
  – Must be in electronic form
  – Must be in ‘readable’ electronic form
  – Must have access to materials
  – Must have up-to-date materials

Actual coverage
• Some indications
  – A total of 15,308 fragments were submitted to Turnitin
  – 48.4% of fragments were ‘found’ (i.e. similarity index > 25%)

Based on our study, text taken at random from the internet has roughly a 50% chance of going undetected
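The arithmetic behind the "roughly 50%" figure can be checked directly from the two numbers reported on the coverage slide. The sketch below uses only those figures (15,308 fragments, 48.4% found); the variable names are illustrative, and the fragment counts are rounded estimates derived from the percentage, not counts reported by the study.

```python
# Figures taken from the "Actual coverage" slide.
total_fragments = 15308
found_rate = 0.484  # share of fragments with similarity index > 25%

# Derived values (rounded estimates, not reported counts).
undetected_rate = 1 - found_rate
found_estimate = round(total_fragments * found_rate)
undetected_estimate = total_fragments - found_estimate

print(f"found:      ~{found_estimate} fragments ({found_rate:.1%})")
print(f"undetected: ~{undetected_estimate} fragments ({undetected_rate:.1%})")
```

An undetected share of about 51.6% is what the slide rounds to a 50% chance.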

Matching algorithm
• Based on system-specific criteria for what counts as a match, e.g. number of characters
• If there is sufficient variation within the matching block, no match is detected

Turnitin’s algorithm
• Based on matching consecutive characters
• 7 consecutive copied words followed by 4 new words will probably never be detected
• Minor changes in the right place can mean the difference between detection and non-detection
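To illustrate why a pattern such as "7 copied words + 4 new words" can slip past a consecutive-match rule, here is a minimal toy sketch. It is not Turnitin's algorithm, which is not published in these slides: it assumes a hypothetical matcher that only counts runs of `n` identical consecutive words, with `n = 8` chosen purely for illustration. The function names and the synthetic source text are likewise assumptions.

```python
def word_ngrams(text, n):
    """Return the set of all runs of n consecutive words in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def matched_fraction(source, submission, n=8):
    """Fraction of the submission's n-word windows that also occur in the source.

    n=8 is an assumed threshold for this toy matcher, not a documented value.
    """
    windows = word_ngrams(submission, n)
    if not windows:
        return 0.0
    return len(windows & word_ngrams(source, n)) / len(windows)


def interleave(source, copied=7, inserted=4, filler="new"):
    """Copy `copied` source words, add `inserted` filler words, and repeat."""
    words = source.split()
    out = []
    for i in range(0, len(words), copied):
        out.extend(words[i:i + copied])
        out.extend([filler] * inserted)
    return " ".join(out)


# Synthetic 70-word "source" so every word is distinct.
source = " ".join(f"w{i}" for i in range(70))

verbatim_score = matched_fraction(source, source)
disguised_score = matched_fraction(source, interleave(source))

print("verbatim copy: ", verbatim_score)
print("7 + 4 disguise:", disguised_score)
```

In this toy setting the disguised text never contains 8 source words in a row, so no window matches even though most of the text is copied. Lowering the assumed window to `n=7` makes some windows match again, which mirrors the slide's point that a small change in the right place separates detection from non-detection.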

Implications
• Ability to paraphrase affects the likelihood of a match being found
• Not all misuse of sources will be picked up
  – Absence of a match does not mean there was no inappropriate use of sources

Why plagiarism detection software might not catch cheats
• Some of what is caught is not cheating but learning to become part of an academic community
• Some cheating might not be picked up by the algorithm and database

Who, in your institution, should we inform about our project work?

For more information
• Resources website
  – http://www.sdaw.info
• Email: E.A.Whitley@lse.ac.uk