Voice Direct™ Speech Recognition IC
Speaker-Dependent Speech Recognition Solution

GENERAL DESCRIPTION

Voice Direct™, from the Interactive Speech™ family of products, is a speaker-dependent speech recognition IC designed for use in cost-sensitive electronic products. In addition to performing speech recognition, Voice Direct plays speech prompts, performs system control functions, provides status outputs, and interfaces to external ROM and Serial EEPROM. Voice Direct can be controlled by an external host processor, or it can operate in a pin-configurable stand-alone mode.

Voice Direct employs a sophisticated neural network to recognize trained words or phrases with greater than 99% accuracy. The highly integrated nature of the chip reduces external parts count. A complete recognition system can be built with few additional parts other than a battery, speaker, external memory, microphone, crystal, and audio input circuitry. Voice Direct is available as an IC or in a complete system module that includes a PCB and all external components.

Voice Direct can be easily integrated into existing products or used to quickly design new products. High quality speech recognition is now possible in cost-sensitive consumer products!

FEATURES

High Quality, Low Cost Speech Recognition
* Speaker-dependent recognition works in any language
* Recognizes up to 60 words/phrases
* Supports phrases up to 3.2 seconds
* Minimal external memory (less than 100 bytes/word)

Integrated Single-Chip Solution
* Direct interface to 8K byte external memory for template storage (Serial EEPROM)
* English speech prompts
* On-chip A/D and D/A
* Output PWM circuitry for direct speaker drive
* Language localization and custom synthesis options

Two Operating Modes
* External Host - controlled by an external processor through a simple 3-wire host interface
* Stand Alone - pin-configurable operation

[Figure: Voice Direct™ Block Diagram. Blocks shown: Oscillator, External Preamp, Microphone, DTMF Generator, A/D Converter and Audio Signal Processor, AMP, Control Processor, Speaker, Speech Synthesizer, AGC, DAC, I2C Serial Interface, Digital I/O, External ROM Interface.]

From the Interactive Speech™ Line of Products
Voice Direct™ DATA SHEET

FEATURE OVERVIEW

Voice Direct performs high quality speaker-dependent speech recognition. The chip uses its neural network recognizer to recognize discrete words or short phrases. The chip performs three basic functions:

* Train - Users train the chip to identify a specific word by saying each word twice. After training, the two patterns are averaged and a template is stored.
* Recognize - The user speaks a word and the chip compares the new pattern with the previously trained templates to identify which word was spoken. The chip then outputs the result of its analysis.
* Erase - Users can delete previously trained words from the set of recognition templates.

In each of these functions, Voice Direct features integrated speech prompting, providing a complete interactive user interface.

EXTERNAL HOST MODE

Voice Direct's external host operating mode provides a complete speaker-dependent recognition system that can easily be controlled by an external host processor (Host). The Host communicates with Voice Direct over a 3-wire serial bus. This high-level control interface allows the Host to control the flow of operations and to initiate all of the chip's functions, including training, recognition, and synthesis. In external host mode, Voice Direct recognizes up to 60 words. To improve application flexibility, these words can be divided into smaller recognition sets, improving accuracy and functionality.
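The Train, Recognize, and Erase functions described above can be pictured with a toy model. The sketch below is only an illustration under stated assumptions: the class and method names are hypothetical and are not part of any Sensory interface, a "pattern" is assumed to be a short list of numbers, and plain Euclidean distance stands in for the chip's neural network recognizer, whose internals this data sheet does not describe.

```python
from math import dist


class TemplateStore:
    """Toy model of the trained-word memory (hypothetical, not Sensory's API)."""

    def __init__(self, max_words=60):
        # 60 is the external host mode limit quoted in this data sheet.
        self.max_words = max_words
        self.templates = {}  # word label -> averaged pattern

    def train(self, label, first, second):
        """Average the patterns from two utterances and store the template."""
        if len(first) != len(second):
            raise ValueError("both patterns must have the same length")
        if label not in self.templates and len(self.templates) >= self.max_words:
            raise ValueError("vocabulary is full")
        self.templates[label] = [(a + b) / 2 for a, b in zip(first, second)]

    def recognize(self, pattern):
        """Return the label of the closest template, or None if none are stored."""
        if not self.templates:
            return None
        # Euclidean distance is an assumption standing in for the neural network.
        return min(self.templates, key=lambda w: dist(self.templates[w], pattern))

    def erase(self, label):
        """Delete a trained word; erasing an unknown label is a no-op."""
        self.templates.pop(label, None)


store = TemplateStore()
store.train("lights", [2.0, 0.0], [4.0, 2.0])  # stored as [3.0, 1.0]
store.train("radio", [0.0, 4.0], [2.0, 6.0])   # stored as [1.0, 5.0]
first_guess = store.recognize([3.0, 2.0])      # closest template is "lights"
store.erase("lights")
second_guess = store.recognize([3.0, 2.0])     # only "radio" remains
```

Splitting a large vocabulary into smaller recognition sets would correspond to keeping several such stores and searching only the one that is active, so each comparison runs against fewer candidates.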

STAND ALONE MODE

Voice Direct's stand alone operating mode is designed to provide a complete recognition system using only the chip, external template storage memory, and a few passive electronic components. All operations, including training, recognition, and erase, can be controlled by configuring chip input pins. Output pins provide status information to external devices. In stand alone mode, Voice Direct can recognize one set of 15 words.

SPEECH PROMPTS

Voice Direct includes a standard English vocabulary of over 100 phrases to guide the user through its functions. This standard word list can be replaced with a customized word list for English or foreign languages via an external ROM chip.

RECOGNITION THRESHOLD

Voice Direct supports multiple acceptance threshold levels during the recognition process. The acceptance level determines how closely the spoken word must match a pre-trained template in order to pass. The user adjusts the level depending on the complexity of the recognition set. More complex recognition sets should have a higher acceptance level, while simpler sets can use a lower threshold level.

INPUT AUDIO AMPLIFIER AND FILTER

Voice Direct requires an external pre-amplifier to condition the input signal. When used with an inexpensive omni-directional electret microphone, the input audio amplifier and filter must provide approximately 58 dB of low-noise mid-band gain, 2-bit AGC controllability, and a first-order bandpass response with 3 dB points at roughly 700 Hz and 3300 Hz.

AUDIO OUTPUT

Voice Direct can directly drive a 32-Ohm speaker from the SP0 pin, providing approximately 0.15W of audio power.

MEMORY INTERFACES

Voice Direct requires 8K bytes of dedicated external Serial EEPROM memory for template storage. Each time a new word is trained, Voice Direct automatically writes the template to the memory device. During recognition, Voice Direct reads the templates from the memory device and compares them with spoken words or phrases.
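
The template storage figures quoted in this data sheet can be cross-checked with a little arithmetic. The sketch below assumes that "8K bytes" means 8192 bytes and treats "less than 100 bytes/word" as a worst-case upper bound. It ignores any bookkeeping data the chip may also keep in the EEPROM, which this data sheet does not describe, and the function and constant names are illustrative only.

```python
EEPROM_BYTES = 8 * 1024      # assumes "8K bytes" means 8192 bytes
BYTES_PER_WORD_LIMIT = 100   # "less than 100 bytes/word", taken as an upper bound
MAX_WORDS = {"external_host": 60, "stand_alone": 15}


def template_budget(mode):
    """Return (worst-case template bytes, bytes left over) for an operating mode."""
    used = MAX_WORDS[mode] * BYTES_PER_WORD_LIMIT
    return used, EEPROM_BYTES - used


host_used, host_spare = template_budget("external_host")   # (6000, 2192)
alone_used, alone_spare = template_budget("stand_alone")   # (1500, 6692)
```

Under these assumptions, even the full 60-word external host vocabulary stays within the 8K byte device.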
Voice Direct communicates with the Serial EEPROM through an I2C 2-wire serial interface.

TSSP MODULE

The Voice Direct solution is also available as a complete module. The module is a single 2" x 2" PCB that includes all external components (e.g., preamplifier, memory) required by Voice Direct, except the microphone and speaker. This module is ideal for prototype development or small production runs.

Feature Summary of Voice Direct™

Feature                                External Host    Stand-Alone
Maximum number of recognition words    60               15
Multiple recognition sets supported    Yes (up to 8)    No
Acceptance threshold levels            5                3
Custom synthesis                       Yes              No
Foreign language synthesis             Yes              Yes

ELECTRICAL SPECIFICATIONS

OPERATING CONDITIONS

-20°C to +70°C; Vcc = 3.5-5.0V; Vss = 0V

ABSOLUTE MAXIMUM RATINGS

Maximum voltage                7.5V
Minimum voltage on any pin     Vss-0.6V
Maximum voltage on any pin     Vdd+0.6V
Any pin to GND                 -0.1V to +7.5V
Operating temperature (TO)     -20°C to +70°C
Soldering temperature          260°C for 10 sec

WARNING: Stressing the Voice Direct beyond the "Absolute Maximum Ratings" may cause permanent damage. These are stress ratings only. Operation beyond the "Operating Conditions" is not recommended and extended exposure beyond the "Operating Conditions" may affect device reliability.

[Figure: Voice Direct 64-pin QFP Pinouts]

Name       Pin                      I/O                Description
A[15:0]    1-8, 11-18               O                  External ROM Memory Address Bus
AGC0       33                       O                  AGC control 0. Voice Direct controls the amplifier gain with this signal.
AGC1       32                       O                  AGC control 1
AGND       52                                          Analog Ground. For noise reasons, analog and digital grounds should be separate.
AIN0       51                       I                  Analog In, low gain (range AGND to AVDD/2)
AIN1       50                       I                  Analog In, high gain (8X input amplitude of AIN0, same range)
AVDD       55                                          Analog Voltage. For noise reasons, keep this supply independent of digital circuitry.
DACOUT     48                       O                  Analog Output (unbuffered)
GND        9, 22, 41, 56                               Digital Ground
MD[7:0]    57-64                    I/O                External ROM Memory Data Bus
MEM1       35                       O                  Memory Control 1. Serial Clock for Serial EEPROM.
MEM2       34                       I/O                Memory Control 2. Serial Data for Serial EEPROM.
MODE/SP1   54                       O (I at powerup)   The MODE pin is used to select Stand alone or CPU mode. This pin is also Speaker Connect1.
                                                       A 32-Ohm speaker can be connected directly to this pin.
-RESET     21                       I                  Reset
-RM        44                       O                  Read Memory Strobe. Can control the -OE pin of External ROM.
SH         49                       I                  Sample and Hold. Connect a 470 pF capacitor from here to AGND.
SP0        53                       O                  Speaker Connect0. A 32-Ohm speaker can be connected directly to this pin.
VDD        10, 23, 36, 40, 46, 47   -                  Digital Supply Voltage (core)
-WR        43                       O                  Write Result. After a recognition sequence the chip places the result on the memory
                                                       data bus MD[7:0] and strobes this signal to latch the result into external devices.
X1, X2     19, 20                   O                  Crystal connect. A 14.312 MHz crystal is connected to these pins.
NC         42                       ***                No connect

Name     Pin (64 MQFP)   External Host Mode                         Stand Alone Mode
HIGH     24              I/O: Expansion bit 3                       O: Add 8 to selected Category
OUT7     25              I/O: Expansion bit 2                       O: Category 7 (or 15 with HIGH)
OUT6     26              I/O: Expansion bit 1                       O: Category 6 (or 14 with HIGH)
OUT5     27              I/O: Expansion bit 0                       O: Category 5 (or 13 with HIGH)
OUT4     28              I/O: External host bus data bit 3          O: Category 4 (or 12 with HIGH)
OUT3     29              I/O: External host bus data bit 2          O: Category 3 (or 11 with HIGH)
OUT2     30              I/O: External host bus data bit 1          O: Category 2 (or 10 with HIGH)
OUT1     31              I/O: External host bus data bit 0          O: Category 1 (or 9 with HIGH)
ERROR    36              O: Voice Direct sets LOW to indicate       O: Indicates an error occurred during the
                         processing in progress                     last training or recognition sequence
-TRAIN   37              O: Auxiliary status bit                    I: "L": Initiate Set Training
-RECOG   38              I: CPU sets LOW to request action          I: "L": Initiate Recognition
MHS      39              I: Master handshake pin

THE INTERACTIVE SPEECH™ PRODUCT LINE

The Interactive Speech line of ICs and software was developed to "bring life to products" through advanced speech recognition and audio technology. The Interactive Speech Product Line was designed for consumer telephony products and cost-sensitive consumer electronic applications such as home electronics, personal security, and personal communication. The product line includes award-winning RSC-series general purpose microcontrollers plus a line of easy-to-implement chips which can be pin-configured or controlled by an external host microcontroller. Sensory's software technologies run on a variety of microcontrollers and DSPs.

RSC-164

The RSC-164 is a low-cost 8-bit microcontroller designed for use in consumer electronics. It is a fully integrated microcontroller and includes A/D, D/A, ROM, and RAM circuitry on chip. The RSC-164 can perform a full range of speech/audio functions including speech recognition, speaker verification, speech and music synthesis, and voice record/playback.

Voice Direct™ TSSP

The Voice Direct TSSP provides cost-sensitive products with speaker-dependent speech recognition, speech synthesis, and DTMF tone generation. This easy-to-use, pin-configurable chip requires no custom programming and can recognize up to 60 trained words. The Voice Direct TSSP is ideal for consumer telephony products which feature voice dialing.

Voice Dialer™ ASSP

The Voice Dialer ASSP delivers speech recognition technology that allows users to dial phone numbers by saying the name of the person they wish to call. Voice dialing and phone directory management through speech recognition can be easily integrated into existing products. This IC is designed for use as a slave chip controlled by an external host processor.

Voice Dialer Software

The Voice Dialer software provides advanced speech technology on a variety of microcontroller and DSP platforms. A complete speech API and flexible design allow manufacturers to easily integrate speech functionality into telephony products.

IMPORTANT NOTICES

Sensory reserves the right to make changes to or to discontinue any product or service identified in this publication at any time without notice in order to improve design and supply the best possible product. Sensory does not assume responsibility for use of any circuitry other than circuitry entirely embodied in a Sensory product. Information contained herein is provided gratuitously and without liability to any user. Reasonable efforts have been made to verify the accuracy of this information but no guarantee whatsoever is given as to the accuracy or as to its applicability to particular uses. Applications described in this data sheet are for illustrative purposes only, and Sensory makes no warranties or representations that the RSC series of products will be suitable for such applications. In every instance, it must be the responsibility of the user to determine the suitability of the products for each application. Sensory products are not authorized for use as critical components in life support devices or systems. Sensory conveys no license or title, either expressed or implied, under any patent, copyright, or mask work right to the RSC series of products, and Sensory makes no warranties or representations that the RSC series of products are free from patent, copyright, or mask work right infringement, unless otherwise specified. Nothing contained herein shall be construed as a recommendation to use any product in violation of existing patents or other rights of third parties. The sale of any Sensory product is subject to all Sensory Terms and Conditions of Sales and Sales Policies.

521 East Weddell Drive, Sunnyvale, CA 94089
TEL: (408) 744-9000  FAX: (408) 744-1299

(c) 1996 SENSORY, INC. ALL RIGHTS RESERVED
P/N 80-0022-5 9/8/98

Sensory is registered by the U.S. Patent and Trademark Office. All other trademarks or registered trademarks are the property of their respective owners.