Gavin Brockis on Monday 14 February 2011
Experiments with YouTube Machine Transcription and Camtasia Speech-to-Text tools.
This screencast shows the machine transcription that YouTube generated from a screencast I recorded with Camtasia and uploaded. While I was uploading the pre-prepared caption track, I noticed that YouTube had also automatically (and quite quickly) analysed the voice-over and generated its own caption track using its built-in Speech-to-Text algorithm. As this is a feature I hadn't noticed before, I made a quick recording of the results:
Screencast URL: http://www.youtube.com/watch?feature=player_embedded&v=8SqVP_c0tf0#. There's quite a bit of dense text, so I'd suggest you watch it in fullscreen (use the button at the bottom right of the player).
The caption track can also be downloaded as text, so even if YouTube isn't your chosen delivery channel, you could use it as a free online Speech-to-Text (STT) tool...
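If you do use YouTube this way, the downloaded track still carries cue numbers and timings that you would need to strip out to get a plain transcript. The sketch below is a minimal illustration, assuming the track is saved in the SubRip (.srt) layout: blank-line-separated cues, each an index line, a timing line such as `00:00:01,000 --> 00:00:04,500`, then the caption text. Whether YouTube offers this exact format for download is an assumption, and the function name `srt_to_text` and the sample captions are hypothetical.

```python
import re
import sys

# Matches an SRT timing line, e.g. "00:00:01,000 --> 00:00:04,500"
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}")


def srt_to_text(srt: str) -> str:
    """Hypothetical helper: keep only the caption text from an SRT file."""
    words = []
    # Cues are separated by one or more blank lines
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        # Drop the cue index (only the first line, so numeric captions survive)
        if lines and lines[0].isdigit():
            lines = lines[1:]
        # Drop the timing line
        if lines and TIMING.match(lines[0]):
            lines = lines[1:]
        words.extend(lines)
    return " ".join(words)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Usage: python srt_to_text.py captions.srt  (file name is hypothetical)
        with open(sys.argv[1], encoding="utf-8-sig") as f:
            print(srt_to_text(f.read()))
    else:
        # Hypothetical two-cue sample for demonstration
        demo = (
            "1\n00:00:00,000 --> 00:00:03,200\nthis is the first caption\n\n"
            "2\n00:00:03,200 --> 00:00:06,000\nand this is the second\n"
        )
        print(srt_to_text(demo))
```

Joining every cue with a single space gives one running paragraph; a real transcript would still need the heavy editing described below, since this only removes the caption scaffolding.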
For the sake of comparison, I then removed both the original captions and those transcribed by YouTube from the Camtasia project, and used Camtasia's own automatic Speech-to-Text transcription tool to make a third set of captions. It generated the following version of the same section of speech:
"I discovered official media
in this greenhouse or by giving you a quick introduction to screen customer to work for which you can use spotted checklist when
planning any type of screen coating project
off the five key stages of the screen coating work for a similar to those for the production of other types of digital media and the for learning
objects and they are planning
post production and use of the
planning involves both the design of your screen cast including scripting 40 villages another content and also planning what resources you
will need to act as during production and delivery
of goods clink of design is focused on the set of learning goals we should inform your plans for the content length and formatted screen cost
target length in particular can be an important decision
which can add useful structural constraints
resources you will need to include people collaborators and technical assistance equipment access to a suitable workspace the
skills needed to capture prepare your materials for delivery
planning is an absolutely vital stage of the workflow and many of you most important decisions will be made before you press record
for the first time
once planning is complete you can start to prepare your workspace WorkStation &Materials to ensure a successful and stress free recording session
this is the preproduction stage of the workflow way the shore everything you need is to hand and the you've configured and where necessary tested your input devices your liking
want to pursue a call by UN protection
while this is the stage where you are she create much your contact you may find if you've planned and prepared successfully that can be quite a quick process
the main thing to bear in mind return of factors which can factual record and white background noise interruptions and they need to minimize or avoid them
different techniques for recording often a matter of personal style and preference of goodwill is to concentrate on a relaxed and natural performance record everything that a decisions about which version or
take to use until afterwards
if you plan to edit the work of your screen. The next stage of its production is where you will do it
this production can be as simple as Preston the export by not using outlook for Mac all he can involve complex and editing process..."
Note: I have 'trained' the Camtasia STT tool to recognise my voice, but it is still very hit-and-miss, as you can see! Particular words and phrases can be taught (e.g. JISC Digital Media). I have not yet done this, so teaching it may also improve performance.
The original screencast can be seen in its entirety here, and the captions made from the imported text of the script can be enabled by pressing the CC button in the transport bar.
I've had a lot of interest from workshop and online surgery attendees in the production of captions and subtitles, and captions are a vital tool for providing access. It's interesting to compare these STT transcripts, and they are certainly better than nothing, but they still need heavy editing to be usable: a process which could take as much time as manual transcription by a fast typist, if not more.
Still, the most reliable method I have found (not being a touch typist) is to read the voice-over from a script where possible, then import the script and sync it with the screencast in post-production.
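Having the script also gives you a reference against which to score a machine transcript, rather than judging it by eye. A common measure for this is word error rate (WER): the minimum number of word substitutions, deletions and insertions needed to turn the reference into the machine output, divided by the number of words in the reference. The following is a minimal Python sketch of that calculation; it was not part of the experiment above, the punctuation-stripping normalisation is a simplifying assumption, and the example sentences are hypothetical.

```python
import re
from typing import List


def normalise(text: str) -> List[str]:
    """Lower-case the text and split it into words, dropping punctuation.

    This simple normalisation is an assumption; dedicated scoring tools often
    do more, such as expanding numerals like "40" into words.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = normalise(reference), normalise(hypothesis)
    if not ref:
        # WER is undefined for an empty reference; treat any output as fully wrong
        return 0.0 if not hyp else 1.0
    # prev[j] = edits needed to turn the previous prefix of the reference
    # into the first j hypothesis words (one row of the edit-distance table)
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # reference word deleted
                cur[j - 1] + 1,          # extra word inserted
                prev[j - 1] + (r != h),  # substitution, or free if words match
            )
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    script = "the cat sat on the mat"   # hypothetical reference text
    machine = "the cat sat on a mat"    # hypothetical STT output
    # One substitution in six reference words
    print(f"WER: {word_error_rate(script, machine):.2f}")
```

Because insertions count as errors, WER can exceed 1.0 when a transcript adds many spurious words, which is worth bearing in mind when comparing tools as different as YouTube's and Camtasia's.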