10. Digital Audio Basics

**James** Tue Jun 15, 2010 1:44 am

There are two main categories of audio files: compressed and uncompressed. This has nothing to do with the compressor effect. Compression in digital audio refers to slimming down the file size. Within compressed files there are two types: lossy and loss-less.

LOSSY COMPRESSION:
Lossy compression formats (mp3, m4a, aac, wma...) result in some degree of loss of quality. Different formats do it slightly differently, but they all take advantage of our "perceived loudness" of different frequencies. One of the first things to go when making an mp3 is the very lowest and highest frequencies. Resolution for lossy audio files is usually measured in kbps (kilobits per second), meaning how many kilobits it takes to describe a second of audio. This is not the ultimate measure of resolution for a lossy file though. It also depends on the quality of the file it came from, and some formats sound better than others at the same bit rate. A 128kbps .m4a will generally sound better than a 128kbps .mp3 made from an identical file. Lossy files can also be saved with variable bit rates, which means the resolution isn't constant. For example, a moment where only one instrument is playing may need fewer bits to reproduce than a moment where more is happening. These are mostly for applications where the file will be streaming online. Generally avoid them for other applications. They can result in minor timing alterations, which would make them bad for what we're doing.
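To make the kbps figure concrete, here is a rough sketch of how bit rate turns into file size for a constant-bit-rate file. The function name is made up for illustration, and it ignores tags and container overhead, so treat the result as an estimate.

```python
def lossy_file_size_bytes(bitrate_kbps, duration_seconds):
    """Approximate size of a constant-bit-rate file (hypothetical helper)."""
    # "kilo" here means 1000 bits, and there are 8 bits in a byte.
    total_bits = bitrate_kbps * 1000 * duration_seconds
    return total_bits / 8


# A 4-minute song at 128kbps
size = lossy_file_size_bytes(128, 4 * 60)
print(f"{size / 1_000_000:.2f} MB")  # 3.84 MB
```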

LOSS-LESS COMPRESSION:
Loss-less formats (flac, apple lossless...) reproduce the signal 100% accurately when played back, but they can't make the files as small as lossy formats. Roughly speaking, they reduce the file size by detecting repeated patterns of bits and only saving each pattern once. The file can then reference the saved pattern every time it would come up. You won't run into these formats too often. I've only really seen them on audiophile peer-to-peer networks.
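The "smaller file, identical playback" idea can be sketched with Python's built-in zlib. Note the assumption: zlib is a general-purpose compressor standing in for the idea, not the method flac or apple lossless actually use, and the byte pattern is made up rather than real audio.

```python
import zlib

# Made-up, highly repetitive "audio" bytes (not a real recording).
original = bytes([0, 10, 20, 30, 20, 10] * 1000)

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), len(compressed))  # the compressed copy is smaller
print(restored == original)            # True: every byte comes back
```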

UNCOMPRESSED FILES:
Uncompressed formats (aif, wav) (referred to as PCM formats) are the types of files every program uses to record. Most programs will play either one. Some programs will let you drop a compressed file on a track, but others like Logic and Pro Tools will require you to convert the file to PCM to use it. The resolution of PCM files is measured by two parameters: bit depth and sample rate. These are both important to understand.
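If you want to check those two parameters on a file, Python's standard `wave` module can read them from a wav header. This is only a sketch: it writes one second of silence into memory first so it has something to read, and the variable names are my own.

```python
import io
import wave

# Write a tiny 16-bit/44.1kHz mono wav into memory (one second of silence).
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(1)
    writer.setsampwidth(2)  # bytes per sample: 2 bytes = 16-bit
    writer.setframerate(44100)
    writer.writeframes(b"\x00\x00" * 44100)

# Read the header back and report the two resolution parameters.
buffer.seek(0)
with wave.open(buffer, "rb") as reader:
    sample_rate = reader.getframerate()
    bit_depth = reader.getsampwidth() * 8
    seconds = reader.getnframes() / sample_rate

print(sample_rate, bit_depth, seconds)  # 44100 16 1.0
```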

SAMPLE RATE:
Think of a recording as a reel of film: a bunch of audio snapshots in rapid series. Each frame is a sample. The sample rate of the audio is how many samples there are per second. Each sample is just a notation of the signal level (amplitude) at that instant. The smallest sample rate you will encounter is 44.1kHz, or 44,100 samples/second. This is the standard for CDs. Audio on DVDs is typically at 48kHz. I think Blu-ray can go up to 96kHz or 192kHz. All the common sample rates are multiples of 44.1kHz or 48kHz (i.e. 44.1k, 48k, 88.2k, 96k, 176.4k, 192k). If you're recording something that will ultimately end up on CD, you want to use a multiple of 44.1k. If it's something that will ultimately be on DVD, you want a multiple of 48k. It makes the eventual step-down in resolution a little bit cleaner.
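The film-reel picture can be sketched in a few lines: take a snapshot of a sine wave at evenly spaced instants. The function name and the 440Hz test tone are just assumptions for the example.

```python
import math


def sample_sine(frequency_hz, sample_rate_hz, seconds):
    """Return one amplitude value per sample for a pure tone (illustrative)."""
    count = int(round(sample_rate_hz * seconds))
    return [
        math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(count)
    ]


# 10 milliseconds of a 440Hz tone at the CD sample rate
samples = sample_sine(440, 44100, 0.01)
print(len(samples))  # 441 snapshots for 1/100 of a second
```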

NYQUIST's THEOREM:
There is a principle that determines what the lowest possible sample rate is, called Nyquist's theorem. It says the sample rate must be at least twice the highest frequency you wish to reproduce. The human hearing range is roughly 20Hz-20kHz, so to reproduce that entire range, the smallest possible sample rate would actually be 40kHz. When they were designing CDs they upped it to 44.1kHz, which leaves a little headroom above 20kHz for the filter that has to remove everything over half the sample rate before the audio is sampled. The sample rate needs to be twice the highest frequency because you need at least one snapshot for each compression and rarefaction of the wave. If you missed one compression, the two rarefactions on either side would no longer be separated, thus increasing the wavelength at that instant and lowering the pitch.
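Here is a small sketch of what happens to a pure tone that is above half the sample rate: it "folds" back down and shows up at a lower pitch. The function name is made up, and it assumes an ideal sampler with no filter in front of it.

```python
def alias_frequency(frequency_hz, sample_rate_hz):
    """Pitch a sampled pure tone appears to have (idealized, no filtering)."""
    folded = frequency_hz % sample_rate_hz
    # Anything above half the sample rate mirrors back below it.
    return min(folded, sample_rate_hz - folded)


print(alias_frequency(20000, 44100))  # 20000: below the limit, unchanged
print(alias_frequency(30000, 44100))  # 14100: too high, so it comes out lower
```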

BIT DEPTH:
The other parameter, bit depth, is a measure of how many bits (or binary digits) of data are used to describe each sample. This determines how many possible volumes a sound can be at, i.e. the dynamic range. To find the possible values at a given bit depth, just take 2 to the power of the bit depth. 8-bit audio is capable of 256 different values (2^8). 16-bit audio (CD standard) has 65,536 values (2^16). 24-bit (DVD standard) has 16,777,216 values (2^24). Sometimes you'll see 32-bit floating point. That stores each sample as a floating-point number instead of a plain integer, so it doesn't follow the same 2-to-the-power rule. Generally avoid it unless told otherwise.
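The 2-to-the-power rule is easy to check in code. The decibel figure is my addition rather than something from the post: it uses the standard approximation that the dynamic range of integer audio is 20 times the log (base 10) of the number of values, which works out to about 6 dB per bit.

```python
import math


def possible_values(bit_depth):
    """Number of distinct values at an integer bit depth."""
    return 2 ** bit_depth


def dynamic_range_db(bit_depth):
    """Approximate dynamic range in dB (about 6 dB per bit)."""
    return 20 * math.log10(possible_values(bit_depth))


for bits in (8, 16, 24):
    print(bits, possible_values(bits), round(dynamic_range_db(bits), 1))
# 8 256 48.2
# 16 65536 96.3
# 24 16777216 144.5
```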

I've been recording in 24-bit/44.1kHz. You guys should too, if you're not already (or 88.2k if you have extra hard drive space you don't want).
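For a sense of what the extra hard drive space looks like, here is a rough sketch of the raw PCM data rate. It assumes stereo and ignores file headers, and the function name is made up.

```python
def pcm_bytes_per_minute(sample_rate_hz, bit_depth, channels=2):
    """Raw PCM bytes for one minute of audio (headers ignored)."""
    bits_per_second = sample_rate_hz * bit_depth * channels
    return bits_per_second * 60 // 8


print(pcm_bytes_per_minute(44100, 24))  # 15876000, about 15.9 MB per minute
print(pcm_bytes_per_minute(88200, 24))  # 31752000, exactly double
```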