New features that improve Measurement of Sound and Vibration and Speech Recognition

Problems with sound analysis and speech recognition

In the past, it was not known which properties of sound signals are actually decisive for speech intelligibility. It has therefore been assumed that the frequency spectrum plays the decisive part, although this has never been proved, and the assumption leaves a number of phenomena unexplained.

Among other things, it does not explain how it is possible to identify a deep male voice with a pitch as low as 60-70 Hz through a telephone with a lower cut-off frequency of 300 Hz, or why mynah birds are able to talk “exactly” like human beings even though the frequencies (formants) in their sound signals are quite different from those of human beings. Furthermore, humans are able to perceive speech under very complicated phonetic conditions, for example understanding the lyrics of a song even with heavy accompaniment.

In the period from 1977 to 1980 Frank U. Leonhard carried out his Ph.D. study at the Technical University of Denmark. The subject was Speech Analysis Based on Linear Predictive Coding (LPC). During the study he tried, among other things, to find the relationship between the spectrum of the speech signal and auditory perception. The conclusion was that there was too much inconsistency between the spectrum and human auditory perception, especially when comparing speech from children, females and males, for the spectrum to be the answer.

At that time it was not possible to raise money for research into another pre-process, and Frank U. Leonhard therefore left the Technical University of Denmark in 1980 and joined industry. His work in industry did not deal with speech analysis and synthesis, but he continued to study auditory perception in his spare time. After 13 years of research Frank got his first breakthrough. It came in the spring of 1993, and on May 4, 1993 Leonhard Research was established. Leonhard Research has since changed its name to Leonhard.

It appears that energy changes with short rise or fall times (at most 2 ms), in the form of transients in sound signals, play an important part in auditory perception. No one had previously used the auditory information implied in abrupt energy changes, and Frank U. Leonhard patented it in 1993. These abrupt changes are normally pulses generated by falling objects, breaking twigs or mechanical faults in rotating machines. Pulses may also be generated by the vocal cords in voiced speech. In this case the signal is a pulse train, and the period between the pulses defines the pitch. This is another important auditory property, and it explains why it is possible to identify a deep male voice with a pitch as low as 60-70 Hz through a telephone with a lower cut-off frequency of 300 Hz.
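The telephone example can be illustrated with a small numerical sketch. It is not Leonhard's patented method; it only assumes a simple stand-in: a 65 Hz pulse train is built from those harmonics that survive an idealised telephone band (300 Hz lower cut-off from the text, 3400 Hz upper limit assumed here), and the pitch is then read off from the spacing of the abrupt energy rises. The sampling rate, the energy threshold and all function names are hypothetical choices made for illustration.

```python
import math


def telephone_band_pulse_train(f0, fs, duration, low_cut=300.0, high_cut=3400.0):
    """Build a pulse train from the harmonics of f0 that lie inside the band.

    The fundamental and every harmonic below low_cut are left out, which
    mimics an ideal high-pass filter (an assumption, not a real telephone).
    """
    harmonics = [k for k in range(1, int(high_cut // f0) + 1)
                 if low_cut <= k * f0 <= high_cut]
    n_samples = int(duration * fs)
    signal = [sum(math.cos(2 * math.pi * k * f0 * n / fs) for k in harmonics)
              for n in range(n_samples)]
    return signal, [k * f0 for k in harmonics]


def pulse_onsets(signal, relative_threshold=0.25):
    """Return sample indices where the energy rises above the threshold."""
    energy = [s * s for s in signal]
    threshold = relative_threshold * max(energy)
    onsets = []
    above = False
    for n, e in enumerate(energy):
        if e > threshold and not above:
            onsets.append(n)  # rising edge: start of a new pulse
        above = e > threshold
    return onsets


def pitch_from_onsets(onsets, fs):
    """Estimate the pitch in Hz from the mean spacing between pulse onsets."""
    mean_period = (onsets[-1] - onsets[0]) / (len(onsets) - 1)
    return fs / mean_period


fs = 8000    # sampling rate in Hz (assumed)
f0 = 65.0    # pitch of a deep male voice, taken from the 60-70 Hz range
signal, component_freqs = telephone_band_pulse_train(f0, fs, duration=0.5)
print("lowest component present:", min(component_freqs), "Hz")
print("pitch from pulse spacing:", pitch_from_onsets(pulse_onsets(signal), fs), "Hz")
```

Although the signal contains no component below 300 Hz, the pulses still recur once per pitch period, so the estimate from the pulse spacing should land close to 65 Hz. A real telephone channel and real speech would need a more robust onset detector than this fixed threshold.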

In 2000 the first commercial transient analyser that analysed abrupt energy changes was introduced under the name HARMONI. The first outstanding results quickly showed up in industry, among them the testing of rub & buzz in loudspeakers.

In 2001 NTI, Liechtenstein, became the first test instrument company to implement the transient analyser in an instrument. It was introduced under the name PureSound™.

Today the transient analyser is used in many industries, among others the engineering industry, for sound and vibration analysis, especially on production lines. Examples of tested products are compressors and electric motors.

In 2003 Frank got his second breakthrough. He discovered that the ear performs an oscillation analysis in the time domain to identify vowels and the colour of the sound. In most cases an oscillation analysis gives a much more distinct profile than a spectrum based on frequency analysis, and its quality is at least equal to that of the spectrum. Furthermore, in contrast to the spectrum, it gives a speaker-independent identification of vowels. The method is patent pending. Together, the transient and oscillation analyses are assembled in APPA (Auditory Perceptual Pulse Analysis).
