Speech recognition technology can dramatically improve productivity

Paul Boughton

If you read almost any press release for the latest version of a CAD, solid modelling or surface modelling package, you will inevitably find claims of increased functionality and improvements to the user interface.

All of which is entirely understandable, as companies using such software want to maximise their return on investment in staff and equipment in the design office as much as they do on the factory floor.

Usually, usability improvements centre on small changes that reduce the number of mouse clicks or keystrokes needed to perform a specific command, or on changes to the layout or menu configuration that make the interface more intuitive or put the most commonly used features within easier reach.

Various other techniques have been employed over the years to make CAD operators more productive, such as colour-coded keyboards designed for particular CAD packages, tablets, multi-function mice and the Spaceball, and research is currently under way into haptic (touch-feedback) devices for use with three-dimensional (3D) models.

But there is one area of usability that is perhaps under-utilised: speech recognition (sometimes called voice recognition). Software that can accept input from a microphone and interpret it as a command has been commercially available for approximately 10 years. Numerous packages – some offered as freeware or shareware – can be used for dictation or for ‘command and control’ of the computer. Some of these have been developed primarily for disabled users who find it difficult to use a keyboard and mouse, while others are aimed at general productivity. Indeed, some commentators believe that such tools will become increasingly popular alongside the wider adoption of VoIP (voice over internet protocol – or, effectively, telephony via a broadband internet connection).

One company that has realised the potential of speech recognition for CAD is Enact Technology, which has recently launched a product called Speak4CAD. The software has been developed for use with Autocad and is claimed to double an operator's drawing speed. During Speak4CAD’s beta testing, productivity was measured by comparing manual drawings with those created using spoken drawing commands and dimensions (Fig.1). Bruce Swan, Enact Technology senior partner, said: “We are extremely pleased at the reception Speak4CAD software is receiving. One of our beta-test users, architect Fernando Andrade's response is typical of what we are hearing from everyone now using our product. He says that Speak4CAD has entirely changed the way he uses Autocad because it enables him to focus on drawing instead of the keyboard – a more natural way to work.”

CAD operators often find that the software forces them to work with one hand on the mouse and the other on the keyboard. A right-handed person will normally use the mouse with the right hand, leaving the left hand to do the typing. Inevitably this is slow, and it requires the operator to look down at the keyboard (Fig.2). Speak4CAD resolves this conflict by letting operators speak commands and numbers while they carry on making the corresponding moves with the mouse.

With Speak4CAD, typing is said to be reduced by an average of 70 percent, and operators report that they experience much less fatigue and stress, both mental and physical (Fig.3).

Although the software comes preconfigured with all the standard Autocad commands, users can also create their own customised commands.

Language lexicon

Speech-recognition applications based on a specific command grammar are considerably faster and more accurate than dictation software that has to continuously search an entire language lexicon to find a word match.
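To make the grammar-versus-lexicon point concrete, here is a minimal Python sketch that works on text a recognizer has already produced. It is a hypothetical illustration, not how Speak4CAD or any particular engine works internally; real recognizers apply the grammar while decoding the audio itself. The phrase table, the `match_command` function and the 0.75 similarity cutoff are assumptions; only the AutoCAD command names (LINE, CIRCLE, OFFSET, TRIM, ZOOM, UNDO) are standard.

```python
import difflib
from typing import Dict, Optional

# Hypothetical command grammar: each allowed spoken phrase maps to an AutoCAD
# command. The phrases are illustrative, not Speak4CAD's real grammar.
COMMAND_GRAMMAR: Dict[str, str] = {
    "line": "LINE",
    "circle": "CIRCLE",
    "offset": "OFFSET",
    "trim": "TRIM",
    "zoom": "ZOOM",
    "undo": "UNDO",
}


def match_command(heard: str, grammar: Dict[str, str] = COMMAND_GRAMMAR,
                  cutoff: float = 0.75) -> Optional[str]:
    """Map a recognized phrase to a command, or return None if it is not in the grammar.

    Only the grammar's own phrases are candidates, so the search space is a
    handful of entries rather than a whole language vocabulary.
    """
    phrase = " ".join(heard.lower().split())  # normalise case and whitespace
    best = difflib.get_close_matches(phrase, list(grammar), n=1, cutoff=cutoff)
    return grammar[best[0]] if best else None


if __name__ == "__main__":
    for utterance in ["Circle", "circel", "make me a sandwich"]:
        print(f"{utterance!r} -> {match_command(utterance)}")
```

Because only six phrases are candidates, a slightly mangled result such as "circel" still resolves to CIRCLE, while anything outside the grammar is rejected rather than guessed at.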

Because a command grammar is faster and more accurate, Enact Technology designed and wrote Speak4CAD with an Autocad-specific command grammar, which allows the software to respond to spoken commands instantaneously. Enact Technology has also developed a number-recognition grammar that eliminates the need for users to type in difficult numerical expressions.
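A number grammar can be sketched in the same spirit. The Python function below is a hypothetical illustration, not Enact Technology's implementation: it accepts only a small, fixed vocabulary of number words (plus "and", "point" and "minus") and turns a spoken phrase such as "three hundred and twenty point five" into 320.5. Reading the digits after "point" one at a time is an assumption chosen for simplicity.

```python
# Hypothetical spoken-number grammar: only these words are accepted.
_SMALL = ("zero one two three four five six seven eight nine ten eleven twelve "
          "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
UNITS = {word: value for value, word in enumerate(_SMALL)}
TENS = {word: 10 * value for value, word in
        enumerate("twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}


def words_to_number(phrase: str) -> float:
    """Convert a spoken number such as 'twelve point five' into 12.5."""
    words = phrase.lower().replace("-", " ").split()
    negative = bool(words) and words[0] == "minus"
    if negative:
        words = words[1:]

    # Split into the whole-number part and the digits read after "point".
    if "point" in words:
        cut = words.index("point")
        whole_words, digit_words = words[:cut], words[cut + 1:]
    else:
        whole_words, digit_words = words, []

    total, current = 0, 0
    for word in whole_words:
        if word == "and":
            continue  # as in "three hundred and twenty"
        elif word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current *= 100
        elif word == "thousand":
            total += current * 1000
            current = 0
        else:
            raise ValueError(f"{word!r} is not in the number grammar")

    # Digits after "point" are read individually: "point two five" -> .25
    digits = ""
    for word in digit_words:
        if UNITS.get(word, 10) > 9:
            raise ValueError(f"{word!r} is not a single digit")
        digits += str(UNITS[word])

    whole = total + current
    value = float(f"{whole}.{digits}") if digits else float(whole)
    return -value if negative else value
```

For example, `words_to_number("minus seven point two five")` returns -7.25, and a word outside the grammar raises `ValueError` instead of producing a wrong dimension.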

Speak4CAD is based on the Windows .NET Framework and does not alter the Autocad installation. A fully functional 30-day trial can be downloaded from the Speak4CAD website, and the software can be purchased for €125 (US$149.95).

It has also been reported that Infoquest Technologies recently showed a speech-recognition add-on for Autocad at Autodesk University 2005.

Surveyor 2006 is said to be designed for creating as-built architectural drawings, using voice recognition to turn a two-person job into a one-person task. It is claimed that the software can pay for itself in as little as 12 days.

Integrated speech recognition

While Enact Technology and Infoquest Technologies have developed Autocad add-ons, Think3 has for several years built voice recognition into its Thinkdesign solid and surface modelling package as an integral feature (Fig.4).

When speech input was first introduced in Thinkdesign version 6.0 in 2001, it was described as the most significant advance in CAD user-interface technology since the industry’s migration from Unix to Windows. Speech recognition is said to give users a simpler, faster and more natural design experience that keeps the focus where it is most needed – on the model – without the distractions of command-line interfaces and dialogue boxes.

Think3 worked with Microsoft to build speech functionality into Thinkdesign 6.0, as an early adopter of Microsoft’s SAPI 5.0 (speech application programming interface). As an example of what can be achieved with speech recognition, Glenn Kennedy, a reviewer with Cadalyst magazine, built a simple 3D solid model with and without it.

With speech recognition enabled, the model required 65 mouse clicks and no keystrokes, and took 160 seconds to complete. Without speech recognition, the same part required 117 mouse clicks and 37 keystrokes, and took 207 seconds.
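Kennedy's figures can be expressed as percentages. The working below treats each mouse click and each keystroke as one input action, which is a simplifying assumption made here rather than part of the review.

```latex
\begin{aligned}
\text{time saved} &= \frac{207 - 160}{207} = \frac{47}{207} \approx 22.7\%,
  \qquad \text{speed-up} = \frac{207}{160} \approx 1.29 \\
\text{mouse clicks saved} &= \frac{117 - 65}{117} = \frac{52}{117} \approx 44.4\% \\
\text{input actions saved} &= \frac{(117 + 37) - 65}{117 + 37} = \frac{89}{154} \approx 57.8\%
\end{aligned}
```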

Thinkdesign version 9.0 is now available, still with an inbuilt speech recognition user interface.

With the majority of low- and mid-range CAD and modelling packages now running under the Windows operating system, there is an opportunity for users to make at least some productivity gains by installing one of the numerous Windows-based speech recognition utilities that are available.

VR Commander is one such command-and-control programme that adds a voice interface to virtually any Windows-based application. Users can simulate keystrokes, run any file or script and input text strings of any size using voice commands.
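In outline, a command-and-control utility maps each recognized phrase to an action of one of the three kinds just mentioned: simulated keystrokes, running a file, or inserting a text string. The Python sketch below is hypothetical: the phrase table, the `Action` type and the `lookup` and `describe` functions are illustrative and do not reflect VR Commander's configuration or API. The actions are described rather than performed, since actually sending keystrokes requires platform-specific Windows calls.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Action:
    kind: str     # "keys" (simulate keystrokes), "run" (open a file) or "text"
    payload: str


# Hypothetical voice-command table; phrases and payloads are examples only.
VOICE_COMMANDS: Dict[str, Action] = {
    "save drawing": Action("keys", "Ctrl+S"),
    "undo that": Action("keys", "Ctrl+Z"),
    "open parts list": Action("run", "parts_list.xlsx"),
    "insert title note": Action("text", "ALL DIMENSIONS IN MILLIMETRES"),
}


def lookup(phrase: str, commands: Dict[str, Action] = VOICE_COMMANDS) -> Optional[Action]:
    """Return the action bound to a recognized phrase, or None if it is unknown."""
    return commands.get(" ".join(phrase.lower().split()))


def describe(action: Action) -> str:
    """Say what would be done; a real tool would call OS input APIs here instead."""
    if action.kind == "keys":
        return f"send keystrokes {action.payload}"
    if action.kind == "run":
        return f"open {action.payload}"
    if action.kind == "text":
        return f"type text {action.payload!r}"
    raise ValueError(f"unknown action kind {action.kind!r}")


if __name__ == "__main__":
    for spoken in ["Save drawing", "insert title note", "fetch coffee"]:
        action = lookup(spoken)
        print(spoken, "->", describe(action) if action else "not recognised")
```

Keeping the table as plain data means new commands can be added without touching the dispatch code, which is broadly the kind of user customisation these utilities advertise.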

This technology is claimed to be so powerful that it is used to command and control some of today's most advanced fighter aircraft. For users who have grown accustomed to a wireless keyboard and mouse, VR Commander also supports wireless Bluetooth microphones.

A seven-day free trial version can be downloaded from the company's website, or the software can be purchased for €8.34 (US$9.95).

A similar package is e-Speaking, from the company of the same name. It has over 100 built-in commands, and more can be downloaded. Like the speech recognition capability in Thinkdesign, e-Speaking uses Microsoft's SAPI and .NET technologies. The package is initially free to download, but users must purchase a licence for €11.73 (US$14) if they wish to add, edit or delete commands.

Text to speech

Speechtoolscenter is another Windows speech-recognition utility, but it can also convert text to speech. Although this function is limited to plain-text documents, rich-text format (RTF) files, Microsoft Word files, e-mails and web pages, it could make reading – or proof-reading – documents considerably easier.

It can also convert text into WAV audio files, and it can translate English text into spoken Spanish, French, Italian and German. Speechtoolscenter is available as shareware, and a licence costs €16.72 (US$19.95).

The Windows-based utilities described above are only a few of the many that are available as conventional licensed software, shareware or freeware. Plenty of information – and user reviews – can be accessed via the internet if required.

Many people have tried speech-recognition programmes in the past and been disappointed with the results.

But it has to be remembered that these were probably dictation tools, using continuous speech recognition and a large associated vocabulary database that has to be searched for every word spoken.

As a result, such tools can be slow and inaccurate, and they require training. In contrast, command-and-control programmes are extremely fast and accurate, and offer a real benefit in improved productivity and reduced fatigue. Speech recognition has now come of age and deserves another hearing.
