
- Number of slides: 14
Carnegie Mellon
The Sounds of Silence: Towards Automated Evaluation of Student Learning in a Reading Tutor that Listens
Jack Mostow and Gregory Aist
Project LISTEN, Carnegie Mellon University
http://www.cs.cmu.edu/~listen
Mostow 3/18/2018, p. 1

Pilot study in urban elementary school
Goals:
• Analyze extended use of Reading Tutor
• Identify opportunities for improvement
Protocol:
• Principal chose 8 lowest third-grade readers
• Aide took each kid daily to use Reading Tutor in small room
• Kid chose text to read (Weekly Reader, poems, …)
Milestones:
• Oct. 96: deployed Pentium, trained users, refined design
• Nov. 96: school pre-tested individually
• June 97: school post-tested individually

User-Tutor interaction (11/7/96 version used in pilot study)
User may:
• click Back
• click Help
• click Go
• click word
• read
Tutor may:
• go on
• read word
• recue word
• read phrase

Data recorded by Reading Tutor
Sessions from Nov. 96 to May 97 (excluding outliers)
• 29 to 57 sessions per kid, averaging 14 minutes
• Not used during vacations, downtime, absences
6 gigabytes of data
• .WAV files of kids' spoken utterances
• .SEG files of time-aligned speech recognizer output
• .LOG files of Reading Tutor events

What to evaluate?
Usability (can kids use it?)
• 1993 Wizard of Oz experiments
• Lab and in-school user tests of successive versions
Assistiveness (do kids perform better with than without?)
• 1994 Reading Coach boosted comprehension by ~20%
• But: evaluation obtrusive, costly, sparse, subjective, noisy
Learning (do kids improve over time?)
• Within tutor: this talk
• On unassisted reading: pre-/post-test by school
• More than with alternatives: future studies

How should the Reading Tutor evaluate learning?
Evaluation should be
• Ecologically valid -- based on normal system use
• Authentic -- student chooses material
• Unobtrusive -- invisible to student
• Automatic -- objective, cheap
• Fast -- computable in real-time on PC
• Robust -- to student, recognizer, and tutor behavior
• Data-rich -- based on many observations
• Sensitive -- detect subtle effects
So estimate improvement in assisted performance

How to estimate performance?
Accuracy = % of text words matched by recognizer output
• Coarse-grained
• Sensitive to missed words
• Doesn't penalize requests for help
Inter-word latency = time interval between aligned text words
• Finer-grained
• Sensitive to hesitations, insertions
• Robust to many speech recognizer errors

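As a rough illustration of the latency definition above, the Python sketch below computes inter-word latencies from time-aligned words. The `AlignedWord` record, its field names, and the choice to measure from the end of one aligned word to the start of the next are all assumptions made for illustration; the slides do not specify the .SEG layout or the exact endpoints used. Times are in centiseconds, and the timings in the example are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AlignedWord:
    """A text word aligned to recognizer output (hypothetical record layout)."""
    text: str
    start_cs: int  # start time of the aligned word, in centiseconds
    end_cs: int    # end time of the aligned word, in centiseconds


def inter_word_latencies(words: List[AlignedWord]) -> List[Optional[int]]:
    """Return one latency per word; None for the first word (no predecessor)."""
    latencies: List[Optional[int]] = []
    previous: Optional[AlignedWord] = None
    for word in words:
        if previous is None:
            latencies.append(None)
        else:
            # Assumed definition: gap between the previous aligned word's end
            # and this word's start. Anything spoken in between (hesitations,
            # insertions) lengthens the gap.
            latencies.append(word.start_cs - previous.end_cs)
        previous = word
    return latencies


# Made-up timings, not data from the study.
example = [
    AlignedWord("if", 0, 20),
    AlignedWord("the", 30, 45),
    AlignedWord("computer", 46, 110),
]
print(inter_word_latencies(example))
```

Because the measure depends only on where aligned text words start and end, misrecognized material between two aligned words changes the latency but does not break the computation, which is one plausible reading of "robust to many speech recognizer errors".
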
Estimation of accuracy and latency (Nov. 96 example from video)
Text: If the computer thinks you need help, it talks to you.
Student said: if the computer... takes your name... help it... take...s to you
Recognizer heard: IF THE COMPUTER THINKS YOU IF THE HELP IT TO TO YOU
Tutor estimated 81% accuracy; inter-word latencies:
Word:         If  the  computer  thinks  you  need  help,  it  talks  to   you.
Latency (cs): ?   43   39        1       60   41    226    7   1      242  1

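The accuracy figure for this example can be approximated with a generic sequence alignment. The sketch below uses Python's `difflib.SequenceMatcher` as a stand-in aligner; that choice is an assumption, not the Reading Tutor's actual alignment procedure, and the names `tokenize` and `estimate_accuracy` are hypothetical.

```python
from difflib import SequenceMatcher
import re


def tokenize(sentence: str) -> list:
    """Lowercase a sentence and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", sentence.lower())


def estimate_accuracy(text: str, heard: str) -> float:
    """Fraction of text words matched, in order, by the recognizer output."""
    text_words = tokenize(text)
    heard_words = tokenize(heard)
    if not text_words:
        return 0.0
    # Stand-in aligner (assumption): difflib's matching blocks.
    matcher = SequenceMatcher(a=text_words, b=heard_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(text_words)


text = "If the computer thinks you need help, it talks to you."
heard = "IF THE COMPUTER THINKS YOU IF THE HELP IT TO TO YOU"
accuracy = estimate_accuracy(text, heard)
print(f"{accuracy:.1%}")  # 9 of 11 text words matched
```

With this aligner, 9 of the 11 text words are matched ("need" and "talks" are missed), which agrees with the slide's 81% if the percentage is truncated rather than rounded.
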
Improvement in accuracy and latency (same kid reads "help" in May 97)
Text: When some kids jump rope, they help other people too.
Student said: when some kids jump rope they help other people too
Recognizer heard: WHEN SOME KIDS JUMP ROPE THEY HELP OTHER PEOPLE TOO
Tutor estimated 100% accuracy; inter-word latencies:
Word:         When  some  kids  jump  rope,  they  help  other  people  too.
Latency (cs): ?     1     10    34    19     77    9     1      34      1

Which performance improvements count?
Echoing the sentence doesn't count.
• So look only at the first try.
Picking stories with easier words doesn't count.
• So look at changes on the same word.
Memorizing the story doesn't count.
• So look only at encounters of words in new contexts.
Remembering recent words doesn't count.
• So look only at the first time a word is seen that day.

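The four exclusion rules above can be sketched as a filter over a chronological log of word encounters. Everything in this Python sketch is an assumption made for illustration: the `Encounter` record and its fields, the use of a sentence identifier to stand for "context", and the order in which the rules are combined. The slides do not describe how the Reading Tutor actually implemented the selection, and the log entries are illustrative rather than study data.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Encounter:
    """One reading of one text word (hypothetical record layout)."""
    student: str
    word: str
    day: int          # day index of the session
    sentence_id: str  # identifies the sentence (context) containing the word
    attempt: int      # 1 = first try at this sentence, 2+ = re-reading
    latency_cs: int   # inter-word latency in centiseconds


def qualifying_encounters(encounters: List[Encounter]) -> List[Encounter]:
    """Apply the exclusion rules; input must be in chronological order."""
    seen_context: Set[Tuple[str, str, str]] = set()  # (student, sentence, word)
    seen_today: Set[Tuple[str, str, int]] = set()    # (student, word, day)
    kept: List[Encounter] = []
    for e in encounters:
        context_key = (e.student, e.sentence_id, e.word)
        day_key = (e.student, e.word, e.day)
        is_first_try = e.attempt == 1                    # no echoing
        is_new_context = context_key not in seen_context  # no memorized story
        is_first_today = day_key not in seen_today        # no recent memory
        if is_first_try and is_new_context and is_first_today:
            kept.append(e)
        # Record the encounter whether or not it was kept.
        seen_context.add(context_key)
        seen_today.add(day_key)
    return kept


def first_and_last(
    kept: List[Encounter],
) -> Dict[Tuple[str, str], Tuple[Encounter, Encounter]]:
    """Pair each (student, word) kept on 2+ days with its first and last encounter."""
    by_word: Dict[Tuple[str, str], List[Encounter]] = {}
    for e in kept:
        by_word.setdefault((e.student, e.word), []).append(e)
    return {
        key: (group[0], group[-1])
        for key, group in by_word.items()
        if group[0].day != group[-1].day
    }


# Illustrative log for one student and one word.
log = [
    Encounter("s1", "help", 1, "A", 1, 226),  # kept: first try, new context
    Encounter("s1", "help", 1, "A", 2, 40),   # dropped: re-reading the sentence
    Encounter("s1", "help", 1, "B", 1, 30),   # dropped: word already seen that day
    Encounter("s1", "help", 5, "A", 1, 20),   # dropped: context seen before
    Encounter("s1", "help", 9, "C", 1, 9),    # kept: new day, new context
]
kept = qualifying_encounters(log)
pairs = first_and_last(kept)
```

Grouping by student and word implements the "same word" rule: improvement is then measured by comparing the first and last paired encounters of each word rather than by comparing averages over different words.
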
Accuracy increased 16% on same word from first to last day seen in new context

Latency decreased 35% on same word from first to last day read in new context

Is accuracy and latency estimation...
• Ecologically valid? Reading Tutor used in school
• Authentic? kids choose stories
• Unobtrusive? evaluate assisted reading invisibly
• Automatic? align recognizer output against text
• Fast? real-time on Pentium
• Robust? to much student, recognizer, and tutor behavior
• Data-rich? 10498 utterances, 139133 aligned words
• Sensitive? detects significant but subtle effects (< 0.1 sec)

Conclusion
Does the Reading Tutor help?
• Yes, with assisted reading
• Transfers to unassisted reading!
Research questions:
• Who benefits how much, when, and why?
• How should we improve the Tutor?
For more information:
• http://www.cs.cmu.edu/~listen