How will automation affect the VLSI industry


Communication between humans and computers is facing one of the most important innovations in its history: the input and output of spoken language. During the sixties, punch cards and paper tape were the main means of entering information, and printed listings served for output. The introduction of screens and keyboards in the 1970s was seen as a clear step forward, even if there was initially some resistance to the new communication aids.

The direct consequence of introducing these new techniques was a gain in effectiveness and productivity. Advances in very large-scale integration (VLSI) technology are now carrying electronic data processing into ever new areas. Inevitably, more and more people who have had no special training in this field come into contact with the corresponding devices and systems. To avoid setbacks in productivity, advanced communication techniques that make using the computer as easy as possible must therefore be developed as a priority.

Voice input and output are actually nothing new; corresponding devices were brought onto the market as early as the beginning of the 1970s. Their market success was quite small, however, since the software effort needed to integrate them into a system was alarmingly large and the price/performance ratio was rather unfavorable. The development of VLSI technology has made decisive progress possible here. Powerful 16-bit processors and dedicated signal processors allow the use of complex but very robust algorithms. At the same time, the main part of the problem solving shifts into software and firmware, which increases flexibility when the product is changed or improved. Hardware costs fall as well, because inexpensive mass-produced components can be used.

Intel recently brought a speech recognition module onto the market that shows, by way of example, what solutions are possible today with state-of-the-art hardware. The most important features and components of this system are:

- A speech processing module. This module performs not only the actual speech recognition but also other operations such as feedback to the operator, buffering of messages, and transfer of information to the user's system.

- Support for the development of the user software. This reduces effort, costs and risk for the user.

- Optional use of different degrees of integration with the same application software. Not only is the complete system offered, but also a functionally identical set of boards, as well as the components needed to assemble such boards oneself.

As with the transition from the punch card to the screen, correctly structuring and applying the new speech processing aids is the key to success. The basic structure of communication by means of speech should therefore first be explained briefly here.

Repetition in the event of transmission errors

Both the (human) sender and the (human) recipient transform the message. The sender must generate acoustic signals that represent the message to be transmitted; the recipient reacts to the message and gives feedback to the sender. If, for example, the message was incomprehensible to the recipient, he or she returns a corresponding error message and the sender repeats the message, using more easily understood words or speaking more slowly.

In principle, communication between a human and a computer works the same way. The computer uses feedback, for example on a screen, to show whether and what it has understood. Of course, the words and sentences to be recognized must first be made known to the machine in a training session. With the aid of this feedback, the user can then adapt his or her behavior to the performance of the speech processing system. Because of the need for training, the system will generally only understand the person who performed the training, so speech recognition only makes sense where processes with the same speaker are repeated sufficiently often.
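To make the training step and its restriction to one speaker more concrete, the following sketch shows the principle of speaker-dependent template matching. It is purely illustrative: the feature vectors, distance measure and rejection threshold are assumptions chosen for clarity, not the algorithm used in the Intel module.

```python
# Illustrative sketch of speaker-dependent template matching.
# Feature extraction, distance measure and threshold are hypothetical
# placeholders, not the method of the Intel speech module.

import math

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class SpeakerDependentRecognizer:
    def __init__(self, reject_threshold=2.0):
        self.templates = {}                  # word -> reference pattern from training
        self.reject_threshold = reject_threshold

    def train(self, word, features):
        """Store the reference pattern spoken by the intended user."""
        self.templates[word] = features

    def recognize(self, features):
        """Return the best-matching word, or None if nothing is close enough."""
        if not self.templates:
            return None
        word, template = min(self.templates.items(),
                             key=lambda item: distance(features, item[1]))
        return word if distance(features, template) <= self.reject_threshold else None
```

Because the stored templates come from one speaker's training session, another speaker's utterances will usually fall outside the threshold and be rejected, which is exactly the limitation described above.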

Other important conclusions can be drawn from this model. For one thing, the computer's response time must be very short: studies have shown that in a conversation the time until the partner answers, whether verbally or through body language, is only about half a second.

Errors in the system, namely failing to recognize a speech pattern or recognizing it incorrectly, are also very important. While a failure to recognize does not have too great an impact, incorrect recognition can cause considerable irritation for the user and, in the long term, dangerously reduce the acceptance of a speech recognition system. Interference from background noise, coughing and the like can have the same effect.

Since speech processing takes place interactively, input prompts for the user and control outputs to verify correct recognition must be generated at the same time. This makes it possible to guide the user through more complex tasks. The same method also contributes greatly to the reliability of the system, since the user guidance avoids many errors from the start.
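A question-and-answer loop of this kind can be sketched as follows. This is a hypothetical illustration of the dialogue principle; the callback names and messages are placeholders, not the interface of the Intel module. It also shows how the two error types behave differently: a rejected utterance merely triggers a repetition request, while a misrecognition is caught by echoing the result back to the user.

```python
# Hypothetical dialogue loop: prompt the user, echo what was understood,
# and ask for repetition when recognition fails or the user rejects the
# result. The callbacks are placeholders, not the Intel module's interface.

def ask(prompt, capture_utterance, recognize, give_feedback, max_attempts=3):
    """Guide the user through one input step with verification."""
    for _ in range(max_attempts):
        give_feedback(prompt)
        word = recognize(capture_utterance())
        if word is None:                                    # rejection: just ask again
            give_feedback("Not understood, please repeat.")
            continue
        give_feedback("Understood: %s - correct?" % word)   # echo back for verification
        if recognize(capture_utterance()) == "yes":
            return word
        give_feedback("Please repeat.")                     # misrecognition caught by the user
    return None                                             # give up after too many attempts
```

In a real application the prompts and the echoed result would of course appear on a screen or be spoken by a synthesis channel, as discussed below.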

This includes failing to hear a message altogether: the question-and-answer exchange detects such a mistake immediately. And since plain language is used, i.e. the same terminology one would use with other people, coding errors by the operator are largely ruled out.

There are a number of options for placing the microphone. In many applications a direct connection via cable is possible; if longer distances have to be bridged, infrared transmission or a wireless microphone should be considered.

The human being is part of the system

Screens and speech synthesis are options for issuing input prompts and for checking correctness. It must always be remembered, however, that an appropriate selection of these auxiliary devices is critical to the acceptance of the system. In contrast to other areas (...) here the human being is not just a user, but part of the system!

The speech processing system mentioned briefly above offers a number of properties that are very useful for the implementation method described. Intel offers the product at three different levels of integration.

The SBC570 is a complete system comprising a microphone, the speech processing board, the necessary software and a direct connection to Intel development systems. The development system can also take on the role of the host computer. With this combination it is possible to carry out the development work required in the first two steps.

The SBC570 speech processing board contains only the hardware and firmware functions for speech recognition. For connection, either a Multibus interface or a serial interface (V.24) is available; the corresponding signals for the operator console are provided as well. A second serial channel is used for guiding the user and for returning the feedback.

At the component level, a set of components (SBC570) is offered that is technically identical to the components used in the system and on the boards. The set contains Intel's signal processor 2921, the single-chip computer 8048 and the 16-bit standard processor 8086; the EPROM memory is the 27128. Since only standard components are used, users of speech processing systems can expect the usual price decline for electronic components.

The announcements made by companies in the office and factory automation sector clearly show the importance attached to direct communication with the computer through speech processing. Some go so far as to predict that speech recognition systems will revolutionize data processing in the eighties.

Henning Wriedt is Marcom Manager at Intel Semiconductor GmbH, Munich.