Audio Analysis of the 911 Call of Patsy Ramsey
20 August 2003, 24 August 2003
"Dave" on Jameson's Webbsleuths (www.webbsleuths.com)
Abstract
Copies of the recording of the emergency 911 call made by Patsy Ramsey were analyzed using audio
software; results are discussed. It is concluded that purported conversation between the Ramseys is a
combination of several different noise sources that give only the appearance of conversation. MP3®
files have been produced which demonstrate both spectral differences between noise sources and the
similarity between different sections of the recordings in terms of timing and cadence, suggesting a
repetitive, mechanical source for some of the noises.
Introduction
On 26 December 1996, Patsy Ramsey, then of Boulder, Colorado, made a 911 emergency call to the
Boulder Regional Communication Center to report the kidnapping of her daughter, JonBenét. The 911
call was recorded, and very recently (July 2003) the Boulder County District Attorney's office has
released audio copies of Patsy Ramsey's 911 call in the form of audio cassette tapes and audio CDs.
A controversy has existed ever since 1997 as to whether or not the recording of the 911 call contains
unintentionally overheard conversation between members of the Ramsey family after the point where it
was assumed that Patsy had hung up the phone. Some of those investigating the case of the murder
of six-year-old JonBenét Ramsey seem to think that this issue is a very important one while others
believe that regardless of what is on the tape, it is of very little evidentiary value, unless it is a
confession of some sort. The main issue appears to be whether or not Burke Ramsey, JonBenét's older
brother, was awake at the time of the 911 call. The Ramseys had maintained that Burke was asleep at
this time, while certain investigators had maintained that he was awake and speaking with his parents,
and therefore that the elder Ramseys had engaged in some sort of deception and perhaps lied about the
circumstances surrounding the death of their daughter.
This report contains a technical discussion of testing that was done on both an audio cassette
recording and an audio CD that were obtained from the Boulder County DA's office. The audio cassette
and CD track are described, various types of processing are discussed, including separate processing
and testing of portions of the recordings that some believe to be conversation. The conclusion is
reached that these purported conversations are almost certainly not conversation among the
Ramseys picked up over their phone line and recorded after Patsy thought she had hung
up the phone. If there is such conversation during the controversial audio sections, it is not audible
even after various types of enhancement. Instead it appears that different types of noise with
different spectral characteristics are superimposed in such a manner as to produce the appearance of
voices. The appearance of voices would no doubt be especially strong in the presence of imagination
empowered by suggestion and wishful thinking.
Audio Processing
A cassette tape and an audio CD of the emergency 911 call made by Patsy Ramsey on 26 December
1996 were obtained from the Boulder County District Attorney's office in Boulder, Colorado. For a more
complete description of the audio processing steps applied, please refer to the Appendix. What follows
here is a summary in text form, but is still intended primarily for a technical audience.
The "tape" was a recording on a common-usage, 120-minute audio cassette, a Maxell® UR,
normal-bias tape. This tape was played on a Pioneer® CW-650R dual-cassette tape deck. A digitized
version was produced by feeding the output of the tape deck into a Behringer® Eurorack MX802A
mixer, the output of which was fed into a TerraTec EWX-2496 mastering sound card that was
installed in a Pentium III® computer. The recording software was Steinberg® WaveLab(tm) Lite, set
for monophonic recording, 24-bit, 96,000 samples per second.
The "CD track" was a single audio track on a generic, brandless CD-R or CD-RW, probably CD-R. The
track data were so-called CD quality, that is, 16-bit, 44,100 samples per second, stereo (actually dual
monophonic). The audio track was ripped from the CD track using Ahead's Nero® CD burning software
that was bundled with a Creative Labs® 8432E CD/RW CD burner, reportedly a relabeled Plextor®. The
stereo image was discovered to be two completely identical monophonic tracks. The left channel of
the CD track was then upsampled to a monophonic, 24-bit, 96,000 samples per second WAV file using
Syntrillium's CoolEdit® 2000.
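The change from 44,100 to 96,000 samples per second corresponds to a ratio of 320/147. The following is a minimal sketch of band-limited upsampling in Python with NumPy. It is not CoolEdit 2000's resampler, whose internals are not described in this report; the helper name resample_fft and the synthetic test tone are illustrative assumptions only.

```python
from fractions import Fraction

import numpy as np


def resample_fft(x, sr_in, sr_out):
    """Band-limited resampling by zero-padding (or cropping) the spectrum.

    Hypothetical helper for illustration. It treats the clip as periodic,
    so real material would normally be padded or windowed first.
    """
    n_in = len(x)
    n_out = int(round(n_in * sr_out / sr_in))
    spectrum = np.fft.rfft(x)
    # irfft with n=n_out zero-pads the spectrum when upsampling
    y = np.fft.irfft(spectrum, n=n_out)
    # Undo the 1/n_out normalization so amplitudes match the input
    return y * (n_out / n_in)


SR_CD, SR_HI = 44_100, 96_000
ratio = Fraction(SR_HI, SR_CD)  # reduces to 320/147

# Synthetic stand-in for the left channel of the CD track: 1 kHz tone, 0.1 s
t_cd = np.arange(4410) / SR_CD
left_cd = np.sin(2 * np.pi * 1000 * t_cd)
left_hi = resample_fft(left_cd, SR_CD, SR_HI)
print(ratio, len(left_cd), "->", len(left_hi))
```

Upsampling of this kind adds no information above the original 22,050 Hz limit of the CD track; it only puts both recordings on a common 96,000 samples per second grid for later processing.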
Both the tape and the upsampled CD track were then processed through EXE Consulting's Engulf
Audio(tm) software to produce a high-quality stereo image. This high-quality stereo image simulates a
binaural recording, including stereo separation, echo, and reverb, all calculated in a self-consistent
manner by solving the 3-D wave equation for a point source in a large enclosure. The simulated
environment was a large concert hall. The reverb time T60 (set through the absorption) was 0.30 seconds at
100 Hz and 0.20 seconds at 4,000 Hz. These reverb times are very short for a large concert hall, but
the tape contains conversation, so the absorption was increased (T60 decreased). As is well known, reverb
times should be shorter for speech than for orchestral music in order to maintain speech
comprehension. The ideal listening environment for this simulated binaural recording is studio-quality
headphones, but with the high absorption (short reverb times), the use of computer speakers in an
acoustically dead office space was found to be acceptable. We used primarily studio-quality
headphones for our testing. Normally, this type of step would be performed later rather than earlier.
The reason for performing this step first was to create a stereo image so that following processing
steps could be more readily monitored using studio-quality headphones.
Both stereo images were subjected to dynamics processing (compression and expansion) to bring up
low-level sounds, then noise-reduced using Syntrillium's CoolEdit 2000. Noiseprints were taken from the
stereo images themselves during relatively quiet sections. The tape, in particular, produced a very
noise-free 24/96 stereo WAV file. The CD track was very noisy to begin with, so didn't produce as
noise-free a WAV file.
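The idea of noise reduction from a noiseprint can be sketched as plain spectral subtraction: estimate the average magnitude spectrum of a quiet section, then subtract it from every short frame of the recording. This is a generic textbook technique offered for illustration, not the algorithm used by CoolEdit 2000; the function names, frame sizes, and synthetic signals are all assumptions.

```python
import numpy as np


def split_frames(x, size, hop):
    """Return overlapping frames of x as rows of a 2-D array."""
    count = 1 + (len(x) - size) // hop
    index = np.arange(size)[None, :] + hop * np.arange(count)[:, None]
    return x[index]


def spectral_subtract(x, noiseprint, size=1024, hop=512, strength=1.0):
    """Subtract the noiseprint's mean magnitude spectrum from each frame."""
    # Periodic Hann window: copies at 50% overlap sum to exactly one
    window = np.hanning(size + 1)[:-1]
    noise_frames = split_frames(noiseprint, size, hop) * window
    noise_mag = np.abs(np.fft.rfft(noise_frames, axis=1)).mean(axis=0)

    spec = np.fft.rfft(split_frames(x, size, hop) * window, axis=1)
    # Reduce magnitudes, never below zero, and keep the original phase
    mag = np.maximum(np.abs(spec) - strength * noise_mag, 0.0)
    cleaned = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n=size, axis=1)

    out = np.zeros(len(x))
    for i, frame in enumerate(cleaned):  # overlap-add the processed frames
        out[i * hop:i * hop + size] += frame
    return out


# Synthetic stand-ins: a tone buried in hiss, plus a hiss-only "quiet section"
rng = np.random.default_rng(0)
sr = 16_000
t = np.arange(2 * sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
noisy = tone + 0.05 * rng.standard_normal(len(t))
noiseprint = 0.05 * rng.standard_normal(sr)
denoised = spectral_subtract(noisy, noiseprint)
```

A noiseprint taken from the recording itself, as was done here, only removes noise that is statistically similar to the quiet section; intermittent noises with a different spectrum pass through largely untouched.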
Both 24/96 stereo WAV files were then downsampled to CD-quality (16-bit, 44,100 samples per second, stereo) images,
then compressed using the Fraunhofer Institute's MP3 algorithms, licensed by Syntrillium for their
CoolEdit 2000 program. The images were compressed at a rate of 256 kbit/s. They are joint
stereo, MPEG-1, Layer 3. (See Reference <2> for a summary of MP3.)
General Characteristics of Recordings
The cassette recording begins with Patsy Ramsey saying, "?55 Fifteenth Street." The "7" digit is cut
off at the beginning, but something sounding like the ending "n" sound can be heard. It is possible that
whoever made the cassette tape copy neglected to take into account the unmagnetized tape leader
at the beginning of the cassette tape.
The cassette recording ends with what appears to be typing by the 911 dispatcher. There is a final
louder click sound as if the dispatcher hit a particular key harder than the others. The CD track begins
with buzzing, a quick series of pops, a couple of separated pops, more buzzing, a very brief amount of
noise, then a click which probably was the connection to the Ramsey phone line being made. Patsy
Ramsey utters something immediately after the click, then the dispatcher says, "911 Emergency."
Patsy then utters something else which cannot easily be distinguished, then says, "Police!" The 911
dispatcher starts saying, "What's going..." Then Patsy says, "?55 Fifteenth Street," the point where
the cassette recording starts. It is difficult to hear the "7" of the Ramsey home address in this case
because the 911 dispatcher is talking at that point, as Patsy interrupts her.
The CD track appears to contain another audio section beyond the point where the cassette tape
recording ends. Immediately after the point where the cassette recording ends at what sounds like a
final, loud key click, there is a three or four second section of a buzzing sound on the CD track, the
same as that at the beginning of the track. Then there is a very brief section of noise like an open
microphone followed by a pop. Following this is a section of noise similar to that which can be heard
earlier in the track, immediately after the 911 dispatcher said, "Patsy?" for the last time and prior to
her typing sounds. This later noise section is three or four seconds long. A short series of pops occurs,
then a short section of buzzing sound on top of the noise ends the CD track.
The bulk of the recordings are very similar, as would be expected. A very notable difference between
the recordings is that popping noises occur throughout the CD track that cannot be heard on the
cassette tape recording. The CD track is in general much noisier than the cassette recording. It
appears to be a poor-quality digitized version of what is contained on the cassette tape plus additional
preceding and succeeding audio; however, it does contain this additional, critically important
information, from a forensic standpoint, at the beginning and end of the track. These additional audio
sections were used to draw conclusions about the controversial audio sections which some claim
contain conversation. It is important to note that the controversial sections are NOT contained in the
additional audio sections that are only on the CD track; the controversial sections potentially
containing barely perceptible conversation are on BOTH the cassette recording and the CD track.
Purported Conversation
The audio section of both the tape and the CD track that contains purported conversation begins
shortly after the 911 dispatcher says, "Patsy?" for the fourth time with noise that could possibly be
interpreted as "We're not speaking to you" and ends with a very brief bit of noise resembling "What did
you find?" and containing what sounds like a final, hard keystroke. A separate MP3 file was created for
this particular section from the audio cassette. This should help listeners locate the controversial
section without having to search for it. This section has slightly different audio processing performed
on it in order to help bring out the purported conversation so that listeners can easily identify the
"We're not speaking to you" and "What DID you find" sections. Please refer to the Appendix for more
complete details on audio processing of this section.
Analysis of Noise
Several loops were created from short audio sections extracted from both the digitized cassette tape
and the ripped CD track. Audio sections were extracted from the purported conversation between the
Ramseys after the 911 call was assumed to be completed, from a section prior to the 911 call being
connected to Patsy Ramsey, and from another section on the CD track which appears not to be the
call from Patsy, but is perhaps part of the recording of a subsequent call.
Immediately prior to the connection being made to Patsy Ramsey, there is a very brief audio section
(hereinafter #1) on the CD track that contains noise that sounds as though it could possibly be
conversation. After the last time the 911 dispatcher says, "Patsy?" (the fourth time), there is another
audio section (#2) which is relatively quiet except for some noise which sounds as though it could also
possibly be conversation. It has been alleged at various times since December 1996 that this
conversation was John Ramsey saying something like, "We're not speaking to you." This audio section
is contained on both the cassette tape and the CD track. At the time near the last audible (what is
assumed to be) key press by the 911 dispatcher, there is another section (#3) with noise which could
possibly be conversation. This has been alleged to be Burke Ramsey saying something like, "What DID
you find?" This section is contained on both the cassette tape and the CD track. A final audio section
(#4) containing similar noise to section #2 is contained only on the CD track and follows a section of
buzzing which resembles that of an amplified ground loop where the ground loop is close to electrical
equipment such as a computer. Section #4 also sounds like #1 which is immediately prior to the
connection being made to the Ramsey phone line.
This last audio section lasts about seven seconds and is somewhat of a mystery. It is not contained
on the cassette tape at all, and it appears to have nothing whatsoever to do with the call from Patsy.
The intervening buzzing sound between the audio sections #3 and #4 is very similar to the buzzing
sound at the very beginning of the CD track, before audio section #1 starts. During both of these
buzzing sounds, there is no audible input whatsoever, that is, no discernible background noise such as
from an open microphone or anything of that kind. This intervening buzzing sound is the
strongest evidence that audio section #4 has nothing to do with Patsy's call. It is fortuitous, however,
that it appears to have the same type of noise as does the recording of Patsy's call, as if it were from
the same original recording system, perhaps something like a hangup call that came in soon after
Patsy's call.
The extracted audio sections were processed again, separately from the rest of the recordings, in
order to determine whether or not they contained conversation. It was noted early on that these
noises had higher frequencies than should be passed over the Ramsey phone line; they were also very
mechanical sounding in their cadence and precision and in their apparently repetitive nature.
Certain of these audio sections were overlaid with each other, one in the stereo Left channel, the
other in the stereo Right channel. With this arrangement, the timing of the audio sections could be
independently altered until a common sound was heard from both the Left and Right channels. If some
sort of repetitive machine noise was present, it should be possible to synchronize the sounds from the
two channels so that they became one, at least for any common, repetitive source. We found that we
could do this with, for example, section #2 in the Right channel and each of the others in the Left
channel. The composite sections were then looped a number of times for ease of listening. It is much
easier to comprehend a short audio section if it is repeated a number of times.
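The overlay and looping procedure can be sketched as follows. In our testing the offset was adjusted by ear; the cross-correlation helper below is a swapped-in, automated stand-in for that step, and all function names and the synthetic "machine noise" are assumptions made for illustration.

```python
import numpy as np


def suggest_offset(left, right):
    """Samples by which `right` should be delayed to best match `left`.

    Uses cross-correlation of the rectified, mean-removed signals.
    """
    a = np.abs(left) - np.abs(left).mean()
    b = np.abs(right) - np.abs(right).mean()
    corr = np.correlate(a, b, mode="full")
    # Index 0 of the "full" output corresponds to lag -(len(right) - 1)
    return int(np.argmax(corr)) - (len(right) - 1)


def overlay_stereo(left, right, right_offset):
    """Put `left` in channel 0 and `right` in channel 1.

    A positive offset delays the Right channel; a negative one delays Left.
    """
    l_delay = max(-right_offset, 0)
    r_delay = max(right_offset, 0)
    n = max(l_delay + len(left), r_delay + len(right))
    out = np.zeros((n, 2))
    out[l_delay:l_delay + len(left), 0] = left
    out[r_delay:r_delay + len(right), 1] = right
    return out


def loop(stereo, repeats):
    """Repeat a stereo clip end to end for easier listening."""
    return np.tile(stereo, (repeats, 1))


# Synthetic stand-ins: the same noise pattern appears in both sections,
# 50 samples later in the section destined for the Left channel.
rng = np.random.default_rng(1)
pattern = rng.standard_normal(200)
section_left = np.concatenate([np.zeros(50), pattern, np.zeros(10)])
section_right = pattern.copy()

offset = suggest_offset(section_left, section_right)
composite = loop(overlay_stereo(section_left, section_right, offset), repeats=4)
```

When the two sections share a repetitive source, the aligned channels fuse into a single centered sound on headphones; unrelated material stays separated to the left and right.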
The cassette recording was judged to be probably the most faithful rendition of the master recording
(evidence tape), even though we did not have access to this master. We judge this because of the
excessive amount of extra noise on the CD track. With respect to the digitized audio cassette
recording, it was possible to overlay only audio sections #2 and #3 because sections #1 and #4 were
not present on the audio tape. Nevertheless, it was possible to find a particular offset time, the same
as for the CD track, which seemed to join these sections together. Some of the noise was very similar
in both channels. In particular, the cadence of the two was suspiciously similar.
Two additional loops were made from this overlay of #2 and #3 ("We're not..." and "What DID...") from
the digitized audio cassette recording, one with low-pass filtering, the other with high-pass filtering.
The filtering had a strong effect on certain portions of the noise and not on others, depending on
which type of filtering was performed. If the noise had been conversation from a single individual
uttering a sentence, one would not expect a strong filtering effect which caused the first part of the
sentence to disappear almost completely yet leave the latter portion almost intact. One would also not
expect two completely different utterances (one a statement and one a question, no less) by two
different individuals to have the same cadence. Nor would one expect the words of sentences uttered
in ordinary conversation to be precisely timed as though they were mechanically produced.
To summarize, the loops created were:
CD track
1) Overlay of #2 ("We're not...") in Right channel to #1 (prior to connection) in Left channel.
2) Overlay of #2 in Right to #3 ("What DID...") in Left.
3) Overlay of #2 in Right to #4 (end) in Left.
Tape
1) Overlay of #2 in Right to #3 in Left with high-pass filtering.
2) Overlay of #2 in Right to #3 in Left with low-pass filtering.
Please refer to the Appendix for more complete details.
Discussion
CD track:
Overlay of #2 ("We're not...") to #1 (prior to connection):
The noise loops from the CD track demonstrate a repetitive noise which is present at various times
throughout the track. In particular, some of that repetitive noise was detected before the 911
dispatcher answered the 911 call from Patsy Ramsey, audio section #1. A portion of this noise is
indistinguishable from that of #2. The previously purported conversation by John Ramsey, said to be
something like, "We're not speaking to you" could very well be this repetitive noise ("We're not
speaking...") plus a second, narrow-band "hooting" type of noise ("...to you") that appears to be
centered at approximately 500 Hz, as determined from the tape overlay. The narrow-band hooting
type of sound is repeated during the sounds that are probably the 911 dispatcher's typing. The
repetitive noise ("We're not speaking...") may simply be drowned out at this time by the typing sounds.
It is a part of this "We're not speaking..." section that is similar to the noise of audio section #1,
recorded prior to the 911 dispatcher answering the call from Patsy.
Overlay of #2 to #3 ("What DID..."):
The overlay of sections #2 and #3 displays a very similar cadence for the two different sections. This
should be very surprising if one is expecting John Ramsey to be saying, "We're not speaking to you"
and Burke to be saying "What did you find?" These sections are too mechanical and precise to be
ordinary human conversation. In addition, as just previously mentioned, the long 'u' sound of "...to
you" does not have the same spectral characteristics as the part "(We're) not speaking..." because
the high-pass filtering does away with the former but not the latter, as determined by the tape
overlay below.
Overlay of #2 and #4 (end of track):
The overlay of sections #2 and #4 reveals a repetitive background noise, sounding somewhat like a
dishwasher operating, that can be heard in both channels at the same cadence. The cadence is precise
enough that it would appear to be a machine noise rather than speech, unless someone is purposefully
speaking in a very mechanical and unnatural manner.
Summary of CD track overlays:
Our conclusion is that there is no discernible conversation during the purported "We're not speaking to
you" and "What DID you find?" sections of the CD track, but that these are instead composed of
background noises, possibly modulated by the recording equipment, for example the automatic gain
control (AGC) which is particularly evident after loud keystrokes. In particular, the section that
precedes the 911 dispatcher's answering of the 911 call (#1) cannot possibly contain speech by the
Ramseys, yet this early background noise is indistinguishable from the noise that is part of purported
conversation between the Ramseys (#2). One can easily fool oneself into believing that there is
conversation, but a more careful examination of the recording, complete with comparisons of one
section of the recording to another by superposing them together in separate stereo channels, allows
one to hear that the background noise is repeated at various times throughout the recording. Other
noises are recorded on top of this repetitive background noise, especially the long 'u' sound, and this
causes the noise to appear not to be as repetitive as more complete analysis shows it to be. There is
also some sort of automatic gain control in operation, as was previously mentioned, which may very
well be causing the repetitive background noise to appear to come and go.
Tape Overlay:
When listening to the overlay of samples extracted from the digitized audio cassette recording which
are high-pass filtered (6,500 and 16,000 Hz bandpass), one can hear ample signal of purported
conversation which, in that case, sounds like clicking that is distinct from the clearly audible keyboard
typing sounds. But this shouldn't be the case because phone lines have a fairly steep cutoff at about
3,000 Hz. Moreover, the long 'u' or "hoot" sound is practically gone. With low-pass filtering (375 and
750 Hz bandpass), we hear hooting sounds and something from the "What DID you find" purported
conversation, but not much of the other purported conversation. Conclusion: As with the CD track
overlay, these noises have different spectral characteristics from voice, and the hooting type of noise
is again found to have different spectral characteristics from the clicking type of sound.
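The band comparison above can be illustrated with a small sketch that measures how much of a clip's energy falls in each band. It reads the stated figures as band edges (375 to 750 Hz and 6,500 to 16,000 Hz), which is an assumption, and it uses an idealized FFT brick-wall filter rather than the filters in CoolEdit 2000. The function names and the synthetic "hoot" and "click" components are hypothetical.

```python
import numpy as np


def bandpass_fft(x, sr, lo, hi):
    """Idealized band-pass: zero every FFT bin outside [lo, hi] Hz."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    spec[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))


def band_energy_fraction(x, sr, lo, hi):
    """Fraction of the clip's total energy that lies inside [lo, hi] Hz."""
    return np.sum(bandpass_fft(x, sr, lo, hi) ** 2) / np.sum(x ** 2)


SR = 96_000
t = np.arange(SR) / SR  # one second

# Synthetic stand-ins: a narrow-band "hoot" near 500 Hz and a weaker
# high-frequency component at 8 kHz standing in for the clicking sounds
hoot = 1.0 * np.sin(2 * np.pi * 500 * t)
click = 0.5 * np.sin(2 * np.pi * 8000 * t)
clip = hoot + click

low_fraction = band_energy_fraction(clip, SR, 375, 750)
high_fraction = band_energy_fraction(clip, SR, 6_500, 16_000)
above_phone_band = band_energy_fraction(clip, SR, 3_000, SR / 2)
print(low_fraction, high_fraction, above_phone_band)
```

A sound that survives one band and vanishes in the other, as the hooting and clicking sounds do, shows up as a fraction near one in one band and near zero in the other. Appreciable energy above roughly 3,000 Hz is what one would not expect from voice carried over a phone line.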
Summary of all overlays:
There appear to be at least four different noise sounds: the background repetitive noise, the
purported "What DID you find," the purported "We're not speaking...," and the "...to you" hooting type
of sound. None of these have the spectral characteristics of voice over a phone line, although perhaps
the "...not speaking..." comes closest in our analysis. These noises all have different spectral
characteristics from each other. If someone said, "(We're) not speaking...," that same person did not
say, "...to you." Also, because the hooting ("...to you") sound is repeated at intervals during the
typing, and because it is fairly narrow-band, it is unlikely to be the voice of anyone. A very short
instance of this sound is also heard after the click of Patsy's hangup, immediately before the third
"Patsy?" The purported "What DID you find" noise is too broad band to be voice over a phone line. The
"(We're) not speaking..." noise has too many high frequencies at certain specific times (clicks) to be
voice over a phone line. There appears to be some sort of underlying mechanical cadence to some
portion of all the audio sections, even though all the sections do display different spectral
characteristics. Also of interest is that the purported conversation sections and the occurrences of the hooting
sounds during the typing occur in an almost periodic fashion <3>.
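One way to put a number on an "almost periodic" cadence is to take the short-time level of the audio and look for a peak in its autocorrelation. The sketch below shows this on synthetic bursts. It was not part of our listening-based analysis, and the function names, frame length, and search range are assumptions chosen only for illustration.

```python
import numpy as np


def rms_envelope(x, frame):
    """Short-time RMS level, one value per non-overlapping frame."""
    usable = len(x) - len(x) % frame
    return np.sqrt(np.mean(x[:usable].reshape(-1, frame) ** 2, axis=1))


def dominant_period(envelope, env_rate, min_period, max_period):
    """Return (period in seconds, strength) of the strongest repetition.

    Strength is the autocorrelation at that lag relative to lag zero,
    so values near 1 indicate a very regular, mechanical cadence.
    """
    e = envelope - envelope.mean()
    ac = np.correlate(e, e, mode="full")[len(e) - 1:]  # lags 0, 1, 2, ...
    lo = int(round(min_period * env_rate))
    hi = int(round(max_period * env_rate))
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return lag / env_rate, ac[lag] / ac[0]


# Synthetic stand-in: a 10 ms, 1 kHz burst every half second for 5 seconds
SR, FRAME = 8_000, 80
audio = np.zeros(5 * SR)
burst = np.sin(2 * np.pi * 1000 * np.arange(FRAME) / SR)
for start in range(0, len(audio), SR // 2):
    audio[start:start + FRAME] = burst

envelope = rms_envelope(audio, FRAME)  # envelope rate is SR / FRAME = 100 Hz
period, strength = dominant_period(envelope, SR / FRAME, 0.2, 1.5)
print(period, strength)
```

Ordinary speech would be expected to give a weak, smeared peak in such a measure, whereas a machine-like source would give a sharp one at a fixed lag.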
Conclusion
After extensive processing and analysis, we conclude that recordings of the 911 emergency call made
by Patsy Ramsey to report the kidnapping of her daughter JonBenét do not contain any audible
conversation between any of the Ramseys following Patsy's hanging up the phone. There are too many
discrepancies between the expectations of voice characteristics and the characteristics of the noises
which some have reported as conversation for the hypothesis of additional conversation on the
recording to be accepted. There appear instead to be several different noises with different
characteristics, including at least one that has a cadence and is repeated. It is suggested that the
combinations of these noises provide merely an appearance of conversation, particularly to wishful
thinkers after the idea of conversation has been suggested to them. Unfortunately this noise has not
only been falsely portrayed as conversation, but the idea that it is conversation has been
bootstrapped into a demonstration of deception by the Ramseys, and then to a virtual proof of the
guilt of at least one of the parents.
Further work could be done to test the actual evidence tape to verify these findings, although one
shouldn't expect that the spectral characteristics would be appreciably different; however, one may
find that certain noises that are on the audio cassette, and especially on the CD track, are not
present on the evidence tape. One or more of the several noise sources we found may be due to
copying rather than due to the original recording. Unless one of these potential copying artifacts is
masking something, we expect the same conclusions would be reached regarding the lack of audible
conversation from the Ramseys' phone line after Patsy hung up the phone. We also don't expect any
revolutionary findings because after various enhancements, we do clearly hear something that we
could imagine, with a little effort, to be "We're not speaking to you" and "What DID you find?" It would
also perhaps be beneficial to produce a better-quality audio CD for distribution by the District
Attorney's office. It may also prove beneficial to determine the actual audio environment during the
recording of Patsy's 911 call, although it may be far too late to do that accurately if the environment
has changed considerably.
Thanks to the Boulder County District Attorney's office for providing the audio cassette and audio CD.
Thanks to "Jameson" of www.webbsleuths.com for encouragement in this project.
References and Notes
<1> An attempt was made to acknowledge trademarks the first time they are encountered in this
document. CoolEdit is a registered trademark of Syntrillium Software Corporation. WaveLab is a
trademark and Steinberg is a registered trademark of Steinberg Media Technologies AG. Engulf Audio is
a trademark of EXE Consulting. Pioneer is a registered trademark of Pioneer Electronics (USA) Inc.
Behringer is a registered trademark of Behringer International GmbH. Nero is a registered trademark of
Ahead Software. MP3 is a registered trademark of Thomson Multimedia. Maxell is a registered
trademark of Hitachi Maxell, Ltd. Creative Labs is a registered trademark of Creative Technology Ltd.
Plextor is a registered trademark of Plextor Corp. Pentium III is a registered trademark of Intel
Corporation. Microsoft and Windows are registered trademarks of Microsoft Corporation. Other
trademarks are trademarks of their respective holders (obviously).
<2> The following web page contains a good summary of what MP3 is all about as well as links to
related pages:
http://www.mp3licensing.com/mp3/mp3.html
<3> The mechanically repetitive noises during this section sound something like "...to you,"
"HOOT-hoot," "HOOT-hoot," "What did..." on the tape and "...to YOU," "hoot-hoo-hoo-HOOT,"
"hoot-hoo-hoo-HOOT," "What did you find" on the CD track. Closer examination indicates that the
"hooting" sounds are also composed of more than one sound. One is almost a short, squawking or
squeaking sort of hoot whereas the other one, usually following the former by a half second or so, is
quieter and more drawn out. The short one is particularly noticeable in two occurrences during the
typing that follows the purported "We're not speaking to you."
Appendix
Specific processing steps.