Voice Direct - Sensory, Inc.

Voice Direct™

Speech Recognition IC

From the Interactive Speech™ Line of Products

Speaker-Dependent Speech Recognition Solution

GENERAL DESCRIPTION

Voice Direct™, from the Interactive Speech™ family of products, is a speaker-dependent speech recognition IC designed for use in cost-sensitive electronic products. In addition to performing speech recognition, Voice Direct plays speech prompts, performs system control functions, provides status outputs, and interfaces to external ROM and Serial EEPROM. Voice Direct can be controlled by an external host processor, or it can operate in a pin-configurable stand-alone mode.

Voice Direct employs a sophisticated neural network to recognize trained words or phrases with greater than 99% accuracy. Because the chip is highly integrated, it reduces external parts count. A complete recognition system can be built with few parts beyond a battery, speaker, external memory, microphone, crystal, and audio input circuitry. Voice Direct is available as an IC or as a complete system module that includes a PCB and all external components.

Voice Direct can be easily integrated into existing products or used to quickly design new products. High-quality speech recognition is now possible in cost-sensitive consumer products!

FEATURES

High Quality, Low Cost Speech Recognition

• Speaker-dependent recognition works in any language

• Recognizes up to 60 words/phrases

• Supports phrases up to 3.2 seconds

• Minimal memory: less than 100 bytes of external memory per word

Integrated Single-Chip Solution

• Direct interface to 8K-byte external memory for template storage (Serial EEPROM)

• English speech prompts

• On-chip A/D and D/A

• Output PWM circuitry for direct speaker drive

• Language localization and custom synthesis options

Two Operating Modes

• External Host - controlled by an external processor through a simple 3-wire host interface

• Stand Alone - pin-configurable operation

Voice Direct™ Block Diagram (figure not reproduced; blocks shown: microphone, external preamp, AGC, A/D converter and audio signal processor, control processor, speech synthesizer, DTMF generator, DAC, amplifier, speaker, I2C serial interface, external ROM interface, digital I/O, oscillator)

Voice Direct™ DATA SHEET


FEATURE OVERVIEW

Voice Direct performs high-quality speaker-dependent speech recognition. The chip utilizes its neural network recognizer to recognize discrete words or short phrases.

The chip performs three basic functions:

Train - Users train the chip to identify a specific word by saying the word twice. After training, the two patterns are averaged and a template is stored.

Recognize - The user speaks a word and the chip compares the new pattern with the previously trained templates to identify which word was spoken. The chip then outputs the result of its analysis.

Erase - Users can delete previously trained words from the set of recognition templates.
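
The train, recognize and erase flow can be illustrated with a minimal sketch. Voice Direct itself uses a neural network recognizer whose internals and template format are not documented here, so this C example substitutes plain template matching (sum of absolute differences) purely to show the sequence: average two patterns into a template, pick the closest stored template, delete a template. FEAT_LEN, Template, train_template, recognize and erase_template are hypothetical names, not part of any Sensory API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical feature-vector length; the chip's real template format is not documented here. */
#define FEAT_LEN 16

typedef struct {
    uint8_t feat[FEAT_LEN];
} Template;

/* Train: average the two utterance patterns element-wise (rounded) into one template. */
void train_template(const uint8_t first[FEAT_LEN], const uint8_t second[FEAT_LEN],
                    Template *out)
{
    for (int i = 0; i < FEAT_LEN; i++) {
        out->feat[i] = (uint8_t)((first[i] + second[i] + 1) / 2);
    }
}

/* Distance between a new pattern and a stored template: sum of absolute differences. */
static unsigned distance(const uint8_t pattern[FEAT_LEN], const Template *t)
{
    unsigned d = 0;
    for (int i = 0; i < FEAT_LEN; i++) {
        d += (unsigned)abs((int)pattern[i] - (int)t->feat[i]);
    }
    return d;
}

/* Recognize: return the index of the closest template, or -1 if none are stored. */
int recognize(const uint8_t pattern[FEAT_LEN], const Template *set, int count,
              unsigned *best_distance)
{
    assert(count >= 0);
    int best = -1;
    unsigned best_d = 0;
    for (int k = 0; k < count; k++) {
        unsigned d = distance(pattern, &set[k]);
        if (best < 0 || d < best_d) {
            best = k;
            best_d = d;
        }
    }
    if (best_distance != NULL && best >= 0) {
        *best_distance = best_d;
    }
    return best;
}

/* Erase: remove template k by moving the last template into its slot; returns the new count. */
int erase_template(Template *set, int count, int k)
{
    assert(k >= 0 && k < count);
    set[k] = set[count - 1];
    return count - 1;
}
```

A real recognizer would also reject a best match that is not close enough, which is the role of the acceptance threshold described under RECOGNITION THRESHOLD.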

In each of these functions, Voice Direct provides integrated speech prompting, giving the user a complete interactive interface.

EXTERNAL HOST MODE

Voice Direct’s external host operating mode provides a complete speaker-dependent recognition system that can easily be controlled by an external host processor (Host). The Host communicates with Voice Direct using a 3-wire serial bus. This high-level control interface allows the Host to control the flow of operations and to initiate all of the chip’s functions, including training, recognition, and synthesis.

In external host mode, Voice Direct recognizes up to 60 words. To improve application flexibility, these words can be divided into as many as 8 smaller recognition sets, improving accuracy and functionality.

STAND ALONE MODE

Voice Direct’s stand-alone operating mode is designed to provide a complete recognition system using only the chip, external template storage memory, and a few passive electronic components. All operations, including training, recognition, and erase, can be controlled by configuring chip input pins. Output pins provide status information to external devices. In stand-alone mode, Voice Direct can recognize one set of 15 words.

SPEECH PROMPTS

Voice Direct includes a standard English vocabulary of over 100 phrases to guide the user through its functions. This standard word list can be replaced with a customized word list for English or foreign languages via an external ROM chip.

RECOGNITION THRESHOLD

Voice Direct supports multiple acceptance threshold levels during the recognition process. The acceptance level determines how closely the spoken word must match a pre-trained template in order to pass. The user adjusts the level depending on the complexity of the recognition set: more complex recognition sets should use a higher acceptance level, while simpler sets can use a lower level.
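
A minimal sketch of how an acceptance level gates a recognition result. The feature summary lists 5 acceptance threshold levels in external host mode (3 in stand-alone mode), but the chip's internal score scale and per-level thresholds are not published here, so the 0 to 100 score and the kMinScore values below are hypothetical placeholders, as is the accept_match name.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical minimum match scores (0..100 scale) for acceptance levels 1..5.
 * The real chip's scoring scale and threshold values are not given in this data sheet. */
static const int kMinScore[5] = { 40, 50, 60, 70, 80 };

/* Accept the best-matching template only if its score reaches the selected level's minimum.
 * A higher level demands a closer match, which suits larger or more confusable sets. */
bool accept_match(int score, int level)
{
    assert(level >= 1 && level <= 5);
    return score >= kMinScore[level - 1];
}
```

With these placeholder values, a match scoring 65 passes at level 3 but is rejected at level 4, which is the trade-off the paragraph describes: stricter levels reduce false accepts at the cost of more rejections.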

INPUT AUDIO AMPLIFIER AND FILTER

Voice Direct requires an external pre-amplifier to condition the input signal. When used with an inexpensive omni-directional electret microphone, the input audio amplifier and filter must provide approximately 58 dB of low-noise mid-band gain, 2-bit AGC control (driven by the AGC0 and AGC1 outputs), and a first-order bandpass response with 3 dB points at roughly 700 Hz and 3300 Hz.
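
As a worked check on these figures (arithmetic only, not an additional specification), 58 dB of voltage gain is a linear factor of roughly 794, and the geometric center of the 700 Hz to 3300 Hz passband is about 1.5 kHz. The 1 mV microphone level in the last line is an assumed example, not a data sheet value; it lands well inside the AGND to AVDD/2 input range of AIN0 (2.5 V at a 5.0 V supply), while a much louder input would need the AGC to reduce gain.

```latex
G_v = 10^{58/20} = 10^{2.9} \approx 794
f_0 = \sqrt{f_L\, f_H} = \sqrt{700\ \mathrm{Hz} \times 3300\ \mathrm{Hz}} \approx 1520\ \mathrm{Hz}
v_{out} \approx 794 \times 1\ \mathrm{mV} \approx 0.79\ \mathrm{V} \quad (\text{assumed } 1\ \mathrm{mV} \text{ microphone signal})
```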

AUDIO OUTPUT

Voice Direct can directly drive a 32-ohm speaker from the SP0 pin, providing approximately 0.15 W of audio power.
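
To put the output figure in perspective, the arithmetic below converts 0.15 W into a 32-ohm load into voltage and current, assuming a sinusoidal signal and a purely resistive load (real speakers are not purely resistive, so treat these as rough numbers):

```latex
V_{rms} = \sqrt{P R} = \sqrt{0.15\ \mathrm{W} \times 32\ \Omega} = \sqrt{4.8}\ \mathrm{V} \approx 2.19\ \mathrm{V}
V_{peak} = \sqrt{2}\, V_{rms} \approx 3.10\ \mathrm{V}
I_{rms} = V_{rms} / R \approx 2.19\ \mathrm{V} / 32\ \Omega \approx 68\ \mathrm{mA}
```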

MEMORY INTERFACES

Voice Direct requires 8K bytes of dedicated external Serial EEPROM memory for template storage. Each time a new word is trained, Voice Direct automatically writes the template to the memory device. During recognition, Voice Direct reads the templates from the memory device and compares them with spoken words or phrases. Voice Direct communicates with the Serial EEPROM through an I2C 2-wire serial interface (MEM1 serial clock, MEM2 serial data).
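
A quick storage budget shows why 8K bytes is enough: 60 words at the stated upper bound of 100 bytes per word need at most 6000 bytes. The sketch below checks that budget and, for illustration only, lays templates out in fixed 128-byte slots. The slot layout, SLOT_BYTES and the function names are hypothetical; the chip's actual on-EEPROM layout is not documented here.

```c
#include <assert.h>
#include <stdint.h>

/* Figures from this data sheet. */
#define EEPROM_BYTES        8192u  /* 8K bytes of Serial EEPROM             */
#define MAX_WORDS_HOST      60u    /* external host mode vocabulary limit   */
#define MAX_BYTES_PER_WORD  100u   /* "less than 100 bytes/word" upper bound */

/* Hypothetical fixed slot size (a power of two >= 100); not the chip's real layout. */
#define SLOT_BYTES          128u

/* Worst-case storage for n trained words, using the per-word upper bound. */
uint32_t worst_case_bytes(uint32_t n_words)
{
    return n_words * MAX_BYTES_PER_WORD;
}

/* Number of fixed-size template slots the EEPROM could hold in this hypothetical layout. */
uint32_t slot_count(void)
{
    return EEPROM_BYTES / SLOT_BYTES;
}

/* Starting EEPROM byte address of template slot k in the hypothetical layout. */
uint16_t slot_address(uint32_t k)
{
    assert(k < slot_count());
    return (uint16_t)(k * SLOT_BYTES);
}
```

Fixed power-of-two slots waste some space (128 bytes reserved for fewer than 100 used) but make each template's address a simple multiplication, and 64 slots still cover the 60-word maximum.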

TSSP MODULE

The Voice Direct solution is also available as a complete module. The module is a single 2" x 2" PCB that includes all external components (e.g., preamplifier, memory) required by Voice Direct, except the microphone and speaker. This module is ideal for prototype development or small production runs.

Feature Summary of Voice Direct™

Mode           Max. Recognition Words  Multiple Recognition Sets  Acceptance Threshold Levels  Custom Synthesis  Foreign Language Synthesis
External Host  60                      Yes (up to 8)              5                            Yes               Yes
Stand-Alone    15                      No                         3                            No                Yes


Voice Direct 64-pin QFP Pinouts (figure not reproduced)

ELECTRICAL SPECIFICATIONS

OPERATING CONDITIONS: -20°C to +70°C; Vcc = 3.5 to 5.0 V; Vss = 0 V

ABSOLUTE MAXIMUM RATINGS

Maximum voltage 7.5V

Minimum voltage on any pin Vss-0.6V

Maximum voltage on any pin Vdd+0.6V

Any pin to GND -0.1V to +7.5V

Operating temperature (TO) -20°C to +70°C

Soldering temperature 260°C for 10 sec

WARNING: Stressing the Voice Direct beyond the “Absolute Maximum Ratings” may cause permanent damage. These are stress ratings only. Operation beyond the “Operating Conditions” is not recommended, and extended exposure beyond the “Operating Conditions” may affect device reliability.

Name Pin Description I/O

A[15:0] 1-8, 11-18 External ROM Memory Address Bus O

AGC0 33 AGC control 0. The Voice Direct controls the amplifier gain with this signal. O

AGC1 32 AGC control 1 O

AGND 52 Analog Ground. For noise reasons, analog and digital grounds should be separate. -

AIN0 51 Analog In, low gain (range AGND to AVDD/2). I

AIN1 50 Analog In, high gain (8X input amplitude of AIN0, same range). I

AVDD 55 Analog Voltage. For noise reasons, keep this supply independent of digital circuitry. -

DACOUT 48 Analog Output (unbuffered). O

GND 9, 22, 41, 56 Digital Ground -

MD[7:0] 57-64 External ROM Memory Data Bus I/O

MEM1 35 Memory Control 1. Serial Clock for Serial EEPROM. O

MEM2 34 Memory Control 2. Serial Data for Serial EEPROM. I/O

MODE/SP1 54 The MODE pin is used to select stand-alone or CPU mode. This pin is also Speaker Connect1. A 32-Ohm speaker can be connected directly to this pin. O (I at power-up)

-RESET 21 Reset I

-RM 44 Read Memory Strobe. Can control -OE pin of External ROM. O

SH 49 Sample and Hold. Connect a 470 pF capacitor from here to AGND. I

SP0 53 Speaker Connect0. A 32-Ohm speaker can be connected directly to this pin. O

VDD 10, 23, 36, 40, 46, 47 Digital Supply Voltage (core) -

-WR 43 Write Result. After a recognition sequence, the chip places the result on the memory data bus MD[7:0] and strobes this signal to latch the result into external devices. O

X1, X2 19, 20 Crystal connect. A 14.312 MHz crystal is connected to these pins. O

NC 42 No connect -

Name Pin (64 MQFP) External Host Mode (I/O: Description) Stand-Alone Mode (I/O: Description)

HIGH 24 I/O: expansion bit 3 O: Add 8 to selected Category

OUT7 25 I/O: expansion bit 2 O: Category 7 (or 15 with HIGH)

OUT6 26 I/O: expansion bit 1 O: Category 6 (or 14 with HIGH)

OUT5 27 I/O: expansion bit 0 O: Category 5 (or 13 with HIGH)

OUT4 28 I/O: External host Bus Data bit3 O: Category 4 (or 12 with HIGH)

OUT3 29 I/O: External host Bus Data bit2 O: Category 3 (or 11 with HIGH)

OUT2 30 I/O: External host Bus Data bit1 O: Category 2 (or 10 with HIGH)

OUT1 31 I/O: External host Bus Data bit0 O: Category 1 (or 9 with HIGH)

ERROR 36 O: Voice Direct sets LOW to indicate processing in progress O: Indicates an error occurred during the last training or recognition sequence.

-TRAIN 37 O: Auxiliary status bit. I: “L”: Initiate Set Training.

-RECOG 38 I: CPU sets LOW to request action I: “L”: Initiate Recognition

MHS 39 I: Master handshake pin
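
In stand-alone mode the result is reported on OUT1 to OUT7, with HIGH adding 8 to the selected category, so an external controller can decode it as sketched below. The pin table does not state output polarity, whether several OUT lines can be active together, or how category 8 is signalled, so this sketch assumes active-high outputs, at most one OUT line asserted, and HIGH alone meaning category 8. decode_category and its bit packing are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Decode a stand-alone-mode result from the output pins.
 * Assumptions (not stated in the pin table): outputs are active HIGH,
 * at most one of OUT1..OUT7 is asserted at a time, and HIGH alone means
 * category 8.
 *
 * out_bits: bit 0 = OUT1 ... bit 6 = OUT7 (sampled pin levels)
 * high:     nonzero if the HIGH pin is asserted
 * Returns the category 1..15, 0 if no result is signalled, or -1 if the
 * pin pattern is invalid (more than one OUT line asserted).
 */
int decode_category(uint8_t out_bits, int high)
{
    out_bits &= 0x7F;             /* only OUT1..OUT7 are meaningful */
    int category = 0;
    for (int n = 1; n <= 7; n++) {
        if (out_bits & (1u << (n - 1))) {
            if (category != 0) {
                return -1;        /* two OUT lines at once: not a valid result */
            }
            category = n;
        }
    }
    if (high) {
        category += 8;            /* HIGH adds 8 to the selected category */
    }
    assert(category >= 0 && category <= 15);
    return category;
}
```

For example, OUT3 together with HIGH decodes to category 11, matching the OUT3 row of the table.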


521 East Weddell Drive

Sunnyvale, CA 94089

TEL: (408) 744-9000

FAX: (408) 744-1299

P/N 80-0022-5 9/8/98

Sensory is registered by the U.S. Patent and Trademark Office.

All other trademarks or registered trademarks are the property

of their respective owners.


THE INTERACTIVE SPEECH™ PRODUCT LINE

The Interactive Speech line of ICs and software was developed to “bring life to products” through advanced speech recognition and audio technology. The Interactive Speech Product Line was designed for consumer telephony products and cost-sensitive consumer electronic applications such as home electronics, personal security, and personal communication. The product line includes award-winning RSC-series general purpose microcontrollers plus a line of easy-to-implement chips which can be pin-configured or controlled by an external host microcontroller. Sensory’s software technologies run on a variety of microcontrollers and DSPs.

RSC-164

The RSC-164 is a low-cost 8-bit microcontroller designed for use in consumer electronics. It is a fully integrated microcontroller and includes A/D, D/A, ROM, and RAM circuitry on chip. The RSC-164 can perform a full range of speech/audio functions including speech recognition, speaker verification, speech and music synthesis, and voice record/playback.

Voice Direct™ TSSP

The Voice Direct TSSP provides cost-sensitive products with speaker-dependent speech recognition, speech synthesis and

DTMF tone generation. This easy-to-use, pin-configurable chip requires no custom programming and can recognize up to 60

trained words. The Voice Direct TSSP is most ideal for consumer telephony products which feature voice dialing.

Voice Dialer™ ASSP

The Voice Dialer ASSP delivers speech recognition technology that allows users to dial phone numbers by saying the name

of the person they wish to call. Voice dialing and phone directory management through speech recognition can be easily

integrated into existing products. This IC is designed for use as a slave chip controlled by an external host processor.

Voice Dialer Software

The Voice Dialer software provides advanced speech technology on a variety of microcontroller and DSP platforms. A complete speech API and flexible design allow manufacturers to easily integrate speech functionality into telephony products.

IMPORTANT NOTICES

Sensory reserves the right to make changes to or to discontinue any product or service identified in this publication at any time without notice in order to improve design and supply the best possible product. Sensory does not assume responsibility for use of any circuitry other than circuitry entirely embodied in a Sensory product. Information contained herein is provided gratuitously and without liability to any user. Reasonable efforts have been made to verify the accuracy of this information but no guarantee whatsoever is given as to the accuracy or as to its applicability to particular uses.

Applications described in this data sheet are for illustrative purposes only, and Sensory makes no warranties or representations that the RSC series of products will be suitable for such applications. In every instance, it must be the responsibility of the user to determine the suitability of the products for each application. Sensory products are not authorized for use as critical components in life support devices or systems.

Sensory conveys no license or title, either expressed or implied, under any patent, copyright, or mask work right to the RSC series of products, and Sensory makes no warranties or representations that the RSC series of products are free from patent, copyright, or mask work right infringement, unless otherwise specified.

Nothing contained herein shall be construed as a recommendation to use any product in violation of existing patents or other rights of third parties. The sale of any Sensory product is subject to all Sensory Terms and Conditions of Sales and Sales Policies.