Speech Technology Application Toolkit
STAPTk was developed in the Speech and Hearing group at the Department of Computer Science, University of Sheffield. It is the culmination of nine years of research by Athanassios Hatzis, spanning his MSc, his PhD thesis and the two funded research projects (STARDUST, OLP) that he and Prof. Phil Green initiated. Athanassios was a central figure in the department at the time: he wrote most of the code for the applications described below (STARDUST, OLP) and also led software development for the two funded projects.
The paper titled "An Integrated Toolkit Deploying Speech Technology for Computer Based Speech Training with Application to Dysarthric Speakers" summarizes the development and application of STAPTk.
Computer based speech training systems aim to provide the client with customised tools for improving articulation based on audio-visual stimuli and feedback. They require the integration of various components of speech technology, such as speech recognition and transcription tools, and a database management system which supports multiple on-the-fly configurations of the speech training application. This paper describes the requirements and development of STAPTk (www.dcs.shef.ac.uk/spandh/projects/staptk – Speech Training Application Toolkit) from the point of view of developers, clinicians, and clients in the domain of speech training for severely dysarthric speakers. Preliminary results from an extended field trial are presented.
Eurospeech 2003 – STAPTk (.ppsx)
STAPTk is based solely on open-source software
Front end GUI in Tcl/Tk and incr-Tcl/incr-Tk
Back end speech and graphics processing in C
Speech processing with Snack and Wavesurfer (Kåre Sjölander and Jonas Beskow, 2003)
Speech detection algorithms, MGR endpointer (Bruce T. Lowerre, Public domain, 1995)
Graphics processing with MATLAB Run-Time Libraries
Credit is given to the following people, who contributed to the development of STAPTk.
Domain expertise contributors
Phil Green (Speech Technology)
Mark Hawley (Assistive Technology)
Pam Enderby (Speech Pathology)
Sara Howard (Phonetics)
Rebecca Palmer (Speech Therapy)
Mark Parker (Speech Therapy)
Kate Woods (Speech Therapy)
David House (Speech Phonology)
Anne-Marie Öster (Speech Pathology)
Vincent Wan (ANN training)
James Carmichael (GUIs, STARDUST)
Stuart Cunningham (STARDUST-Command Sequence Recognition, Speech Training)
Kåre Sjölander (Synchronization of real-time audio-visual feedback with Snack-Wavesurfer)
Cross-References to STAPTk on the Web
2008 Computer Technology to Aid Therapy of Speech Disorders
2007 Articulation practice software solution incorporating real-time phonetic mapping technology
CORDIS Technology Marketplace, Phil Green
2005 International Summer School on Speech Variation VISPP’2005
2004 OLP User’s Manual
STARDUST – Speech Recognition for People with Severe Dysarthria
"In an attempt to reduce speech production inconsistency and hence enhance the success of voice-driven assistive technology, the ASR component in STAPTk is closely coupled with therapy" (Eurospeech 2003, Phil Green et al.)
The STARDUST NHS project (2000-2003) integrates two components: computer-based visual feedback for speech training, which helps dysarthric speakers improve the consistency of their utterances, and speech recognition of commands in sequence to control an environmental device.
Provide visual training aids to help improve consistency and/or intelligibility of severely dysarthric speakers
Use training sessions to produce data for ASR
Build small vocabulary recognisers for dysarthric clients
Use the recognisers in assistive technology
Dysarthric ASR Problems
- Small training corpus
- Large deviance from normal
- Fluency problems
- Limited phonetic contrast
- Inconsistent production
STAPTk in STARDUST action
Eurospeech 2003 – Automatic Speech Recognition with Sparse Training Data for Dysarthric Speakers (.ppsx)
OLP – Ortho-Logo-Paedia
One of the main components of the OLP system is the final release of STAPTk (OPTACIA). OPTACIA was first published at the WISP 2001 conference and is strongly based on the software that Athanassios Hatzis developed during his PhD thesis, as well as on the software developed during the STARDUST project (see above). This final version is linked to the other components of the OLP system, which execute OPTACIA through a command-line interface.
OPTACIA Kinematic Maps
An OPTACIA kinematic map uses a two-dimensional ANN-trained map as a visual metaphor for isolated sounds and simple utterances. We describe the map as "kinematic" (i.e. relating to movement) because its visualization technique correlates strongly with articulatory movement during speech production. Visualization of speech sounds is instantaneous, so the client can vary the articulators in response to on-screen visual feedback. Speech therapy is based on evaluating the quality of speech production and its accompanying characteristics, i.e. consistency and intelligibility. OPTACIA is meant to assist speech therapists in that role.
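To make the idea of placing a speech frame on a two-dimensional map concrete, here is a minimal sketch in Python. It is not OPTACIA's actual mapping: it assumes a classifier has already produced one score per phonetic target for a frame, and it places the frame at the probability-weighted average of the target positions. The target names, their coordinates and the function names are all hypothetical.

```python
import math

# Hypothetical target positions on a unit-square map (not taken from OPTACIA).
TARGETS = {"s": (0.2, 0.8), "sh": (0.8, 0.8), "silence": (0.5, 0.1)}


def softmax(scores):
    """Turn raw per-target scores into probabilities that sum to 1."""
    peak = max(scores.values())
    exps = {name: math.exp(value - peak) for name, value in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def map_position(scores):
    """Place one frame on the map as the probability-weighted mean of targets."""
    probs = softmax(scores)
    x = sum(probs[name] * TARGETS[name][0] for name in probs)
    y = sum(probs[name] * TARGETS[name][1] for name in probs)
    return x, y
```

Under these assumptions, a frame that the classifier scores strongly as one target lands near that target's position, while an ambiguous frame lands between targets. Calling `map_position` on successive frames would trace a trajectory across the map.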
Testing a New Method for Training Fricatives using Visual Maps in the Ortho-Logo-Paedia Project (OLP) (pdf)
This paper concentrates on therapy based on real-time audio-visual feedback of the client’s speech and presents results from building and testing visual maps for training hearing-impaired clients using the OPTACIA component. OPTACIA was developed from Optical Logo-Therapy (OLT) (Hatzis, 1999; Hatzis and Green, 2001). OPTACIA is based on three well-founded treatment principles: visuomotor tracking, visual contrast, and visual reinforcement. Visuomotor tracking (Ziegler, Vogel, Teiwes, and Ahrndt, 1997) is a special case of biofeedback in which some dynamic physical measure of performance is portrayed visually in real time.
STAPTk in OLP (OPTACIA) action
Visual Feedback Main Characteristics
Speech therapy requirements translate into visual feedback requirements. The computer-based speech training aid, in our case OPTACIA, therefore has to demonstrate visual consistency (phonetic map areas and speech trajectories), visual contrast of the target areas of the map, and visual accuracy in the production of speech. In OPTACIA we achieved this with target markers: each target marker has an associated area around it that is used for scoring purposes. Target markers are very useful because the speech therapist can set them to fine-tune an OPTACIA map with intermediate targets and to establish short-term and long-term speech production targets for patients with various speech disorders.
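The scoring role of a target marker can be illustrated with a short Python sketch. It is not OPTACIA's implementation: it assumes the area around a marker is a circle, and it scores a speech trajectory as the fraction of its points that fall inside that circle. The class `TargetMarker` and both functions are hypothetical names introduced for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class TargetMarker:
    """A target on the map; the circular scoring area is an assumption."""
    name: str
    x: float
    y: float
    radius: float


def distance_to(marker, point):
    """Euclidean distance from a map point (x, y) to the marker centre."""
    return math.hypot(point[0] - marker.x, point[1] - marker.y)


def score_trajectory(marker, trajectory):
    """Fraction of trajectory points that lie inside the marker's area."""
    if not trajectory:
        return 0.0
    hits = sum(1 for p in trajectory if distance_to(marker, p) <= marker.radius)
    return hits / len(trajectory)
```

In such a scheme a therapist could place an intermediate marker with a large radius early in therapy and shrink the radius as production becomes more accurate, which is one way to read the short-term and long-term targets described above.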
OPTACIA Main Tasks
- Record speech data
- Transcribe speech data
- Build map
  - Select speech data
  - Design layout
  - Train map
- Explore layout of the map
- Monitor/Measure performance on the map
Problems to solve
Researchers interested in the technique developed in OPTACIA should consider the following issues, all of which have been tackled in the present software:
- Acoustic to articulation mapping
- Synchronizing real-time audio-visual playback and recording
- Definition and statistical modeling of speech targets
- Automation of segmentation/labelling problem
- Automation of training maps
- The silence/speech detection problem
- Definition of metrics (Distance from target, Acceptance/Rejection, Consistency, Intelligibility)
- Visualization of metrics
- Mapping/Visualisation of unseen speech input
- Mapping/Visualisation of non-continuous sounds
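As an illustration of the silence/speech detection problem listed above, the following Python sketch marks frames as speech when their short-time energy exceeds a fixed threshold. This is a deliberately simple stand-in, not the MGR endpointer used in STAPTk; the frame length, the threshold value and the function names are assumptions chosen for the example.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_speech(samples, frame_len=4, threshold=0.01):
    """Return (first, last) indices of frames above threshold, or None."""
    energies = frame_energies(samples, frame_len)
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0], active[-1]
```

A fixed threshold like this is fragile in practice: background noise shifts the energy floor, and quiet speech can fall below the threshold, so a real endpointer needs more than this sketch.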
Athanassios Hatzis is currently active in a different domain of health information technology, but please do not hesitate to contact him if you plan to continue research in this specific area.