Research Funded Projects
Two funded research projects were based on Athanassios Hatzis’ PhD thesis on Optical Logo-Therapy: Ortho-Logo-Paedia (OLP), a 2.67M€ EC project under FP5 Quality of Life, and STARDUST, a 275K€ UK NHS-NEAT project. This is a review of the software that was developed for these projects.
STAPTK - Speech Technology Application Toolkit
STAPTk was developed in the Speech and Hearing group at the Department of Computer Science, University of Sheffield. It is the culmination of nine years of research work by Athanassios Hatzis, spanning his MSc, his PhD thesis, and the two funded research projects, STARDUST and OLP, that he and Prof. Phil Green initiated.
At that time Athanassios was a central figure in the department: he wrote most of the code for STARDUST and OLP and also led software development for the two funded projects.
The paper titled “An Integrated Toolkit Deploying Speech Technology for Computer Based Speech Training with Application to Dysarthric Speakers” summarizes the development and application of STAPTk.
Computer based speech training systems aim to provide the client with customised tools for improving articulation based on audio-visual stimuli and feedback. They require the integration of various components of speech technology, such as speech recognition and transcription tools, and a database management system which supports multiple on-the-fly configurations of the speech training application. This paper describes the requirements and development of STAPTk from the point of view of developers, clinicians, and clients in the domain of speech training for severely dysarthric speakers. Preliminary results from an extended field trial are presented.
Eurospeech 2003 – STAPTk (ppsx)
STAPTK in OLP
One of the main components of the OLP system is the final release of STAPTK (OPTACIA). OPTACIA was first presented at the WISP 2001 conference and builds directly on the software that Athanassios Hatzis developed during his PhD thesis, as well as the software developed during the STARDUST project. This final version is linked to the other components of the OLP system, which execute OPTACIA through a command-line interface.
OPTACIA Kinematic Maps
An OPTACIA Kinematic map uses a two-dimensional ANN-trained map as a visual metaphor for isolated sounds and simple utterances. We describe the map as “kinematic” (i.e. relating to movement) because its visualization technique correlates strongly with articulatory movement during speech production. Visualization of speech sounds is instantaneous, so the client can vary the articulators in response to on-screen visual feedback. Speech therapy is based on evaluating the quality of speech production and its accompanying characteristics, i.e. consistency and intelligibility. OPTACIA is meant to assist speech therapists in that role.
This paper concentrates on therapy based on real-time audio-visual feedback of the client’s speech and presents results from building and testing visual maps for training hearing-impaired clients using the OPTACIA component. OPTACIA was developed from Optical Logo-Therapy (OLT) (Hatzis, 1999; Hatzis and Green, 2001). OPTACIA is based on three basic, well-founded treatment principles: visuomotor tracking, visual contrast, and visual reinforcement. Visuomotor tracking (Ziegler, Vogel, Teiwes, and Ahrndt, 1997) is a special case of biofeedback where some dynamic physical measure of performance is portrayed visually in real time.
Real-time audio visual animation with a sprite on the map
Real-time audio visual animation and Waveform display on wavesurfer
Visual Feedback Main Characteristics
Speech therapy requirements translate into visual feedback requirements. The computer-based speech training aid, in our case OPTACIA, therefore has to demonstrate visual consistency (phonetic map areas and speech trajectories), visual contrast between the target areas of the map, and visual accuracy in the production of speech. In OPTACIA we achieved this with target markers; each target marker has an associated area around it that is used for scoring purposes. Target markers are very useful because the speech therapist can use them to fine-tune an OPTACIA map with intermediate targets and to establish short-term and long-term speech production targets for patients with various speech disorders.
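The scoring idea behind target markers can be illustrated with a minimal Python sketch. It is not the OPTACIA implementation: the circular scoring area, the `TargetMarker` class, the `hit_rate` measure, and the coordinates are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class TargetMarker:
    """Hypothetical target marker on a 2D map (not OPTACIA's data structure)."""
    label: str      # e.g. the sound the marker stands for
    x: float
    y: float
    radius: float   # assumed circular scoring area around the marker


def distance_to_target(point, target):
    """Euclidean distance from a map point (x, y) to the marker centre."""
    return math.hypot(point[0] - target.x, point[1] - target.y)


def hit_rate(trajectory, target):
    """Fraction of trajectory points that fall inside the scoring area."""
    if not trajectory:
        return 0.0
    hits = sum(1 for p in trajectory
               if distance_to_target(p, target) <= target.radius)
    return hits / len(trajectory)


# Illustrative values only: a speech trajectory approaching a target for "s".
target = TargetMarker("s", x=0.8, y=0.2, radius=0.1)
trajectory = [(0.5, 0.5), (0.7, 0.3), (0.78, 0.22), (0.8, 0.2)]
print(hit_rate(trajectory, target))  # prints 0.5
```

In a sketch like this, a therapist-defined intermediate target would simply be another `TargetMarker` with its own position and radius, and a larger radius would make the task easier for the client.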
OPTACIA Main Tasks
- Record speech data
- Transcribe speech data
- Build map
  - select speech data
  - design layout
  - train map
- Explore layout of the map
- Monitor/Measure performance on the map
Future Directions and Problems to solve
Researchers interested in the technique developed in OPTACIA should consider the following issues, which have been tackled in the present software:
- Acoustic to articulation mapping
- Synchronizing real-time audio-visual playback and recording
- Definition and statistical modeling of speech targets
- Automation of segmentation/labelling problem
- Automation of training maps
- The silence/speech detection problem
- Definition of metrics
- Distance from target
- Visualization of metrics
- Mapping/Visualisation of unseen speech input
- Mapping/Visualisation of non-continuous sounds
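As an illustration of one item in the list above, the silence/speech detection problem, here is a minimal Python sketch of a short-time energy detector. It is a generic textbook approach, not the detector used in STAPTk; the function names, frame length, threshold, and minimum run length are assumptions chosen for the synthetic example.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(s * s for s in frame) / frame_len)
    return energies


def detect_speech(samples, frame_len=160, threshold=0.01, min_frames=3):
    """Return (start_frame, end_frame) pairs, end exclusive, for runs of
    at least min_frames consecutive frames whose energy exceeds threshold."""
    segments = []
    run_start = None
    energies = frame_energies(samples, frame_len)
    # A sentinel below any threshold closes a run that reaches the end.
    for i, energy in enumerate(energies + [float("-inf")]):
        if energy > threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_frames:
                segments.append((run_start, i))
            run_start = None
    return segments


# Synthetic signal at an assumed 16 kHz rate: silence, a 440 Hz tone, silence.
rate = 16000
silence = [0.0] * (5 * 160)
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(10 * 160)]
signal = silence + tone + silence
print(detect_speech(signal))  # prints [(5, 15)]
```

A fixed energy threshold is fragile with real recordings, especially for speakers with weak or inconsistent voicing, which is one reason the problem appears in the list above.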
STAPTK in STARDUST
The STARDUST NHS project (2000-2003) integrates two things: computer-based visual feedback for speech training, which helps dysarthric speakers improve the consistency of their utterances, and speech recognition of commands in sequence, which is used to control an environmental device.
In an attempt to reduce speech production inconsistency and hence enhance the success of voice-driven assistive technology, the ASR component in STAPTk is closely coupled with therapy - Eurospeech 2003, Phil Green et al.
- Provide visual training aids to help improve consistency and/or intelligibility of severely dysarthric speakers
- Use training sessions to produce data for ASR
- Build small vocabulary recognisers for dysarthric clients
- Use the recognisers in assistive technology
Dysarthric ASR Problems
- Small training corpus
- Large deviance from normal
- Fluency problems
- Limited phonetic contrast
- Inconsistent production
Eurospeech 2003 Presentation
Eurospeech 2003 – Automatic Speech Recognition with Sparse Training Data for Dysarthric Speakers (ppsx)
STAPTk is based on open-source software, but unfortunately the University of Sheffield decided not to make it public.
Front end GUI in Tcl/Tk and incr-Tcl/incr-Tk
Back end speech and graphics processing in C
Speech processing with Snack and Wavesurfer (Kåre Sjölander and Jonas Beskow, 2003)
Speech detection algorithms, MGR endpointer (Bruce T. Lowerre, Public domain, 1995)
Graphics processing MATLAB Run-Time-Libraries
Credit is given to the following people who contributed to the development of STAPTk.
Domain expertise contributors
Phil Green (Speech Technology)
Mark Hawley (Assistive Technology)
Pam Enderby (Speech Pathology)
Sara Howard (Phonetics)
Rebecca Palmer (Speech Therapy)
Mark Parker (Speech Therapy)
Kate Woods (Speech Therapy)
David House (Speech Phonology)
Anne-Marie Öster (Speech Pathology)
Vincent Wan (ANN training)
James Carmichael (GUIs, STARDUST)
Stuart Cunningham (STARDUST-Command Sequence Recognition, Speech Training)
Kåre Sjölander (Synchronization of real-time audio-visual feedback with Snack-Wavesurfer)