
Chapter 4: 1977-1982: Acoustics & Speech Research


On April 1, 1977, I returned to the Bell Labs research area, joining the Acoustics and Speech Research Department in Murray Hill, NJ, headed by Osamu Fujimura. While a professor at the University of Tokyo, Fujimura had developed an X-ray tracking system that recorded the movements of small lead pellets placed on the tongue, lips, and jaw of subjects, along with the sounds, as they spoke basic sounds, words, sentences, and paragraphs of text. The movements of the pellets, the corresponding acoustic speech signals, and the phonetic transcriptions of the speech exercises were stored on multi-layer Winchester disks.

My initial project was to develop a computer program to align in time the phonetic transcriptions of the speech samples with the recorded sound signal and the pellet-movement data. With these alignment time-marks, researchers could call up and analyze the articulation of the various phonetic components of speech. I was not an expert on the phonetics of speech and had to learn the symbols and the basics of this field quickly.

My first need was to learn how to use the computer system through which I could access the acoustic, articulatory, and phonetic data on the disks. The computer area housed a DDP-224 computer, an SEL computer, and a data room with several washing-machine-sized disk readers. In today’s world of 200-gigabyte hard drives in small laptop computers, such as the Mac PowerBook on which I am writing this, it is hard to imagine these large disk-reader machines, into which we loaded a 15-inch-diameter multi-layer disk unit that held only a few hundred kilobytes of data. The speech-articulation database comprised a dozen of these large disks – large in size, but not in storage capacity.

To access the data, I had to sign up for time on the computer, and when I came into the computer room, I had to start up the DDP-224 by manually keying in the octal boot code, and then load the selected Winchester data disk into the big disk reader. I wrote programs in Fortran that provided access to and analysis of the articulation, acoustic, and phonetic-transcription data on the disk units. Demand for time on the computer was high, and my hours of access were usually very early in the morning or very late at night. I also had a portable Texas Instruments modem unit that could reach the host computer from home: I dialed the server number, then set the phone handset into the receptacles at the top of the TI unit. Everything I typed and everything the computer sent back was printed on a roll of thermal paper in the unit. On many a late night at home, I produced a long roll of printout while writing and debugging code.

The time alignment of the acoustic and articulation data with the phonetic transcription of each speech utterance was facilitated by a program written by Bishnu Atal and Larry Rabiner of our lab, which accurately assigned time markers to the segments of the acoustic data corresponding to the voiced (V), unvoiced (U), and silence (S) segments of the speech utterance. Ideally, the elements of the V, U, S sequence could then be lined up with the elements of the phonetic transcription, but unfortunately the measured and predicted features did not form a clean one-to-one match. After struggling with this matching problem for many days, I got the idea that instead of trying to match all the V, U, S time segments with the phonetic elements, I should first try to match only the S segments in the acoustic data with the pauses indicated in the phonetic transcriptions. If I could find the optimum fit in this first stage, then the V, U, S segments between each pair of matched pauses could be aligned in a second stage of the matching procedure. This was one of those “AHA!” moments that allowed the whole procedure to work. [See Note 1 at the end of this chapter].

Because of my work in control theory, I was familiar with the dynamic programming method developed by Richard Bellman at the RAND Corporation for finding the optimum match between two feature sets, using an efficient search algorithm to minimize the error cost over the possible matching paths. I believe this was the first application of this “time-warping” matching algorithm in our research lab. It was later applied successfully by others in our research on speech-recognition and signature-verification techniques.
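In outline, the procedure can be reconstructed in a few dozen lines of code. The sketch below is a modern Python rendering of the two-stage idea, not the original Fortran: the “#” pause symbol, the toy voicing table, and the unit insertion/deletion costs are all illustrative assumptions.

```python
# A reconstruction of the two-stage matching idea (the original was Fortran).
# The '#' pause marker, the voicing table, and the unit costs are assumed.

VOICED = set("aeiouylrwmnbdgvz")      # toy voicing classes; real phone
                                      # inventories are richer than this
def voicing_class(phone):
    return 'V' if phone[0] in VOICED else 'U'

def align(a, b, sub_cost, ins=1.0, dele=1.0):
    """Dynamic-programming (edit-distance) alignment of two sequences.
    Returns the matched index pairs along a minimum-cost path."""
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * dele
    for j in range(1, m + 1):
        d[0][j] = j * ins
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),
                          d[i - 1][j] + dele,
                          d[i][j - 1] + ins)
    pairs, i, j = [], n, m            # backtrack to recover the path
    while i > 0 and j > 0:
        if d[i][j] == d[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif d[i][j] == d[i - 1][j] + dele:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]

def two_stage_align(vus, phones):
    """vus: acoustic labels, e.g. ['S','V','U','V','S']; phones: transcription
    symbols with '#' marking a pause.  Returns matched (vus, phones) indices."""
    sils = [i for i, lab in enumerate(vus) if lab == 'S']
    pauses = [j for j, p in enumerate(phones) if p == '#']
    # Stage 1: anchor silences to pauses (the zero substitution cost here is a
    # stand-in; a real cost could weigh agreement of the measured times).
    anchors = [(sils[i], pauses[j])
               for i, j in align(sils, pauses, lambda x, y: 0.0)]
    # Stage 2: align the V/U labels between consecutive anchors against the
    # phones between the corresponding pauses.
    bounds = [(-1, -1)] + anchors + [(len(vus), len(phones))]
    matches = list(anchors)
    for (a0, b0), (a1, b1) in zip(bounds, bounds[1:]):
        local = align(vus[a0 + 1:a1], phones[b0 + 1:b1],
                      lambda lab, p: 0.0 if lab == voicing_class(p) else 1.0)
        matches.extend((a0 + 1 + x, b0 + 1 + y) for x, y in local)
    return sorted(matches)
```

Anchoring on the silences first confines each second-stage search to the short stretch between two pauses, which avoids the spurious matches of a single global alignment and keeps each search small and cheap.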

I completed this first project about a year after I transferred to Fujimura’s department in the Acoustics and Speech Research Laboratory, where Max Mathews was the Director. Although I had not been told beforehand, I learned that my continued work there depended on management’s evaluation of this initial work. My talk on the project was listed in the Labs-wide BTL Research Calendar for April. The talk went well, but I think Max was still not convinced I should get a permanent position, perhaps because he didn’t understand the significance of the method I used to get a successful match, and partly because of his general disapproval of Fujimura’s projects. Shortly after that Research Calendar talk, he called me into his office. With him was John Pierce, a famous researcher and former Executive Director of Research at Bell Labs, who had retired and was then a professor at Caltech. Max asked me to explain my matching procedure for the acoustic, articulatory, and phonetic data, and when I described the optimal matching of the silences in the data rather than the speech portions, and the application of dynamic programming to efficiently find the optimum match, Pierce appreciated the breakthrough that allowed the complete alignment to be achieved. Shortly after this meeting, I was told I could remain as a Member of Technical Staff (MTS) in his laboratory at Murray Hill. Subsequently, Max and his wife Marge, and my wife Mary Jean and I, became good friends. [See Note 2].

Given the successful alignment of the acoustic, articulatory, and phonetic databases, the next problem was to develop the means to access, analyze, and display the data. For my part, I wrote a program that allowed one to search the data for a given phonetic symbol and display the articulator pellet positions and the speech sounds for each occurrence of that phoneme segment. My friend Joe Perkell, who worked on speech dynamics at MIT, came to Murray Hill to do a study with me on the variability of vocal-tract positions for certain vowels in different speech contexts. When Joe came down to work on this project, he stayed on a bed in our music studio, and we would get up at 6 am to drive to the Labs for our computer time. The results showed for these vowels that, while there was variability in tongue position due to the tongue transitions before and after the particular vowel, the variation was least in the tongue position necessary for that vowel’s sound. [Perkell, J.S. and Nelson, W.L., “Articulatory targets and speech motor control: A study in vowel production,” in Speech Motor Control, Pergamon Press, Stockholm, Sweden (1982)].
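The lookup side of such a program is easy to picture once the alignment time-marks exist. Here is a minimal sketch of that kind of search, assuming the aligned transcription was stored as (symbol, start, end) records and the pellet tracks as time-stamped samples; the names and data layout are illustrative, not those of the original program.

```python
def occurrences(segments, pellets, audio, rate, phoneme):
    """Yield the pellet trajectories and the audio slice for each
    occurrence of `phoneme`.

    segments: list of (symbol, t_start, t_end) from the aligned transcription
    pellets:  dict of articulator name -> list of (t, x, y) samples
    audio:    sequence of speech samples; rate: samples per second
    """
    for symbol, t0, t1 in segments:
        if symbol != phoneme:
            continue
        traj = {name: [s for s in samples if t0 <= s[0] <= t1]
                for name, samples in pellets.items()}
        yield traj, audio[int(t0 * rate):int(t1 * rate)]

# Usage: pellet positions for every occurrence of a (hypothetical) symbol 'i':
# for traj, sound in occurrences(segments, pellets, audio, 10000, 'i'):
#     print(traj['tongue_tip'])
```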

My second research project involved the analysis of the physics of skilled movements. The hypothesis I put forth was that skilled movements reflect an efficient trade-off between the speed of the movement and the effort expended.
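One illustrative way to formalize such a trade-off, as a textbook-style sketch rather than necessarily the formulation the project settled on: model the limb as a point mass driven by an acceleration u(t), moved a distance D from rest to rest in time T, and take effort to be the integrated squared acceleration.

```latex
\ddot{x}(t) = u(t), \qquad x(0) = \dot{x}(0) = 0, \qquad
x(T) = D,\ \dot{x}(T) = 0, \qquad
E = \int_0^T u^2(t)\,dt
```

Minimizing E subject to these boundary conditions gives a control that is linear in time,

```latex
u^*(t) = \frac{6D}{T^2} - \frac{12D}{T^3}\,t,
\qquad
E_{\min}(T) = \int_0^T u^{*2}(t)\,dt = \frac{12\,D^2}{T^3},
```

so halving the movement time multiplies the minimum effort by eight: a steep speed/effort trade-off of exactly the kind the hypothesis describes.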


….More to follow.

Chapter 4 Notes


  1. In my years of research at Columbia and Bell Labs, there have been several such “Aha!” moments that led me to the solution of difficult problems. I’m sure that many researchers struggling with a problem have had such unexplained brainstorms. Probably the most famous “Aha!” moment is the one reportedly experienced by Archimedes while he was struggling to find a method for measuring the volume of various solid objects. As he sat in his bathtub and observed the water rise, he realized that the volume of his submerged body equaled the volume of the water it displaced, and he supposedly ran naked into the street shouting “Eureka!” (“I have found it!”).


  2. Max Mathews, I, and many others in the lab were interested in the computer analysis and recording of music, and many of us were amateur musicians. Since I was studying the cello, I soon became involved in chamber-music get-togethers with Max, David Slepian, Joan Miller, Steve Levenson, Aaron Rosenberg, and others at Murray Hill. For several years, Mary Jean tactfully and patiently agreed to play some of the easier Haydn, Mozart, and Beethoven trios with Max and me at our studio in Morristown, even though it must have been boring for such an accomplished musician as she was. The hardest part for her was abstaining from correcting all of our many errors. Sometimes she would explain the problem and have us try a passage again, but if it didn’t work, she would let it go. For example, Max simply could not manage the three-against-two rhythm between the piano and violin parts in the Haydn Trio No. 1. Finally, realizing he was my boss at the Labs, she let it go, bless her heart.