A spectrogram is a time-varying spectral representation, and probably the simplest Time-Frequency Representation. I'm fascinated with them, including optimal methods for calculating them. Questions arise, e.g. FFT sample length, FFT overlap, windowing, zero-padding, wavelets vs. Fourier transform.

Take a look at Eddie Vedder singing Yellow Ledbetter. He's got great control ;) Note all the harmonics. Apparently that's normal for human voice.

Next up is a dolphin. Note the coherent frequency peak with time. That's a waveform. Much better sustain than Eddie.

What's my point? They're using echolocation to sense the environment. (The animals, not Eddie. Well, maybe Eddie. You'd have to ask him.) The principles are almost exactly the same for RADAR, despite the fundamental difference in carrier waves. This isn't new knowledge or anything, I simply found the similarity striking, since I discovered these images after spending months working with radar waveforms.

Radar waveforms are commonly called "chirps". When heard at baseband (audio) frequency, they sound like a bird's chirp. The rate of change of frequency with time is a measure with units of Hz/s, i.e. s^-2. With radars, this is commonly known as the "chirp rate". The chirp rate is the slope of the straight line in the adjacent figure.

Calculating Spectrograms:

The web is full of information and practical webpages, but I've never read a decent explanation of how to optimally calculate a spectrogram. The short answer: there is no optimal method, only different ways of looking at the same information.

The glib answer is to sample faster. A spectrogram illustrates rapidly changing signals by viewing the spectral evolution in time. The foremost requirement is that the signal is sampled sufficiently fast to resolve those changes. If this condition is not met, forget about the spectrogram or anything else for that matter.[1]

A spectrogram is calculated from a single timeseries. Think of it as a line of samples.
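To make the "line of samples" and the chirp rate concrete, here is a minimal sketch that synthesizes a linear chirp and recovers its chirp rate from the sample phases. The values of `Fs`, `f0` and `Fr` are made up for illustration, and the complex form of the signal is my assumption, chosen so the instantaneous frequency falls straight out of the phase.

```python
import numpy as np

Fs = 8000.0   # sampling rate in Hz (illustrative value)
f0 = 500.0    # start frequency in Hz (illustrative value)
Fr = 1000.0   # chirp rate in Hz/s, i.e. s^-2 (illustrative value)
N = 8000      # one second of samples

t = np.arange(N) / Fs
# Linear chirp: instantaneous frequency f(t) = f0 + Fr * t,
# so the phase is its integral, 2*pi*(f0*t + Fr*t^2/2).
phase = 2 * np.pi * (f0 * t + 0.5 * Fr * t**2)
x = np.exp(1j * phase)  # the timeseries: a single line of complex samples

# Instantaneous frequency from the sample-to-sample phase increment.
inst_freq = np.diff(np.unwrap(np.angle(x))) * Fs / (2 * np.pi)

# The slope of frequency vs. time is the chirp rate.
Fr_est, f0_est = np.polyfit(t[:-1], inst_freq, 1)
print(f"estimated chirp rate: {Fr_est:.1f} Hz/s")
```

The phase trick only works here because the signal is clean and sampled well above its highest frequency; for noisy data the spectrogram route discussed in this post is the more forgiving one.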
The simplest method of calculating the spectrogram is to partition the timeseries into equal lengths, stack them up next to each other (line -> 2D image), then Fourier-transform each one. The question arises: at what length (# samples) do we cut the timeseries? Using this method, there is a trade-off between resolution in the time domain and resolution in the frequency domain. If you don't find this interesting, consider that this problem is mathematically analogous to Heisenberg's uncertainty principle. The two bases (dimensions) are non-commutative transforms ;)

Let's recap:

N: total number of samples ("length" of timeseries) [fixed]
Nfft: number of samples in the frequency domain [variable]
Nt: number of spectra in the time domain [variable]

The simple method listed above requires N = Nfft * Nt.

For complex signals, varying the Nfft/Nt ratio reveals different information. A large Nfft/Nt ratio emphasizes frequency information (harmonics, etc.), while a small ratio emphasizes timing information.

Figures pulled from wikipedia, because they're better than mine. (These spectra were created using the reassignment method, which uses the spectral phase information in an algorithm to emphasize a single phase-coherent signal in a limited frequency-time region. It's an extension to the basic spectrogram and the same rule applies.)

In short, we'd like to increase Nfft to improve frequency resolution, but run the risk of including two distinct temporal events in the same spectrum. The impulse response of the windowing function limits the time range that can be resolved (see Nelson's separability).

N = Nfft * Nt is an unsatisfactory restriction, and tends to miss signal at the transition between adjacent segments Nt1, Nt2, ... because the power is split between adjacent spectra. A common trick is to relax this restriction by allowing for "sliding", i.e. overlap of the samples used in each spectrum. A 50% overlap is justified in this regard. Overlapping will repeat information, resulting in a blurred visual effect.

A useless trick is zero-padding to increase Nfft, i.e. frequency resolution. Zero-padding increases analysis resolution (a finer frequency grid), not data resolution.

Provided the signal chirp rate (frequency rate) Fr is known a priori, the optimal Nfft can be calculated as

$$N_{fft} = \left( F_s^2 / F_r \right)^{1/2}$$

where Fs is the sampling rate. This follows from setting the ratio of the spectral resolution df to the temporal resolution dt to match the frequency rate Fr:

$$df = F_s / N_{fft}$$

$$dt = N_{fft} / F_s$$

$$F_r \rightarrow df / dt = \left( F_s / N_{fft} \right)^2$$

Conversely, Fr could be estimated from the data, and the spectrogram optimized. If you're a visual learner, this is equivalent to having the chirp line run along the diagonal of each data point/square/pixel on the figure. The area of one data point, df*dt, is unitless and equal to one, i.e. one sample.

This is for the extremely simple case of a signal with constant frequency rate. For the case of multiple signals with varying Fr, a more advanced time-frequency representation is called for. Specifically, how to visualize varying frequency rates? :)

Comments:
---------------------------------

Stationary phase works because noise has random phase.

For spectral analysis, phase is half the information. Applying that information is the difficult part.

Spectrogram results may be improved by extending the one-dimensional window to a two-dimensional kernel.

If trying to estimate a chirp's properties, try chirplets. (ugh.. that name)

If you want to frequency-filter a broadband chirp, perhaps rotation of the time-frequency domain will help.

For complex signals, the spectrogram method may not be sufficient. Data-adaptive kernels may provide better results.

Regarding windowing functions, people tend to use them blindly as a necessity. Most often they are used as an apodization function[2] to manage the Gibbs phenomenon (spectral leakage from abruptly truncating the signal) and reduce the noise floor at the cost of broadening spectral peaks. There is no windowing function (to my knowledge) that narrows and amplifies a spectral peak; they only lower the portion of the noise floor due to the Gibbs phenomenon.
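The broadening trade-off is easy to check numerically. The sketch below is a toy with a made-up tone frequency and record length. It compares no window against a Hann window (my choice of example; the text above does not single one out) on a tone that does not fit an integer number of periods, and measures both the leakage far from the peak and the width of the peak itself. The zero-padding is there only to draw the peak shape on a finer grid, in line with the "analysis resolution" remark above.

```python
import numpy as np

Fs = 1000.0          # sampling rate in Hz (illustrative)
N = 1024             # samples in the record (illustrative)
f_tone = 123.4       # deliberately not an integer number of periods in N samples
t = np.arange(N) / Fs
x = np.sin(2 * np.pi * f_tone * t)

pad = 16 * N         # zero-pad only to draw the peak shape on a finer grid
freqs = np.fft.rfftfreq(pad, d=1 / Fs)

def spectrum_db(window):
    """Magnitude spectrum in dB relative to its own peak."""
    mag = np.abs(np.fft.rfft(x * window, n=pad))
    return 20 * np.log10(mag / mag.max() + 1e-300)

def width_3db(spec_db):
    """Width in Hz of the region around the peak that stays above -3 dB."""
    above = freqs[spec_db > -3.0]
    return above.max() - above.min()

def far_leakage(spec_db):
    """Highest level found more than 50 Hz away from the tone."""
    return spec_db[np.abs(freqs - f_tone) > 50.0].max()

rect = spectrum_db(np.ones(N))      # "no window"
hann = spectrum_db(np.hanning(N))   # Hann apodization

print(f"no window: width {width_3db(rect):.2f} Hz, far leakage {far_leakage(rect):.1f} dB")
print(f"Hann:      width {width_3db(hann):.2f} Hz, far leakage {far_leakage(hann):.1f} dB")
```

The Hann result should show a lower leakage floor and a wider peak than the unwindowed one, which is the trade described above: the window buys a cleaner floor and pays for it in peak width.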
In some cases, the Gibbs noise is insignificant compared to other sources. If the non-windowed spectra have sufficient SNR, why window? Indeed, if you know the signal frequency, why not use an integer number of periods? No Gibbs leakage, and the sharpest peaks possible.[3] National Instruments has a pretty good guide to choosing windows. For signals that change rapidly in frequency, and/or are broadband, often no window is the best choice.

Footnotes:
---------------------------------

1. Unless you are purposely undersampling, in which case I would love to see some experimental results. It looks great on paper, but I worry about hardware imperfections. I need to read more.

2. I prefer "apodization function" because it's specific about purpose. Windowing is also used for beamforming and for other general frequency filters, which are completely different applications.

3. Sometimes the spectral peak is extremely narrow, i.e. a few pixels/data points. On a high-resolution spectrogram, it won't be seen unless you zoom in. Then you get into issues of matching the resolutions of the spectrogram and the screen. Don't mistake a peak widened by windowing for a better one simply because it wasn't visible without the window.
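As a closing sketch, here is the main-text recipe end to end: cut the timeseries into equal lengths, stack them, Fourier-transform, with Nfft picked from a known chirp rate and no window applied. The function name `simple_spectrogram` and all signal parameters are hypothetical, chosen only for illustration, and rounding Nfft to a whole number of samples is my assumption.

```python
import numpy as np

def simple_spectrogram(x, nfft):
    """Partition x into non-overlapping length-nfft segments and FFT each.

    Returns an (Nt, nfft) array of magnitudes; leftover samples are dropped.
    """
    nt = len(x) // nfft
    segments = np.reshape(x[: nt * nfft], (nt, nfft))  # line -> 2D image
    return np.abs(np.fft.fft(segments, axis=1))

Fs = 10000.0    # sampling rate in Hz (illustrative)
Fr = 10000.0    # known chirp rate in Hz/s (illustrative)
f0 = 950.0      # start frequency in Hz (illustrative)
N = 5000        # half a second of samples

t = np.arange(N) / Fs
x = np.exp(2j * np.pi * (f0 * t + 0.5 * Fr * t**2))  # complex linear chirp

# Nfft = (Fs^2 / Fr)^(1/2), rounded to a whole number of samples.
nfft = int(round(np.sqrt(Fs**2 / Fr)))
df = Fs / nfft   # frequency resolution in Hz
dt = nfft / Fs   # time resolution in s

S = simple_spectrogram(x, nfft)
peak_bins = S.argmax(axis=1)  # strongest frequency bin in each spectrum

print("Nfft =", nfft, " df =", df, "Hz  dt =", dt, "s")
print("peak bin per spectrum (first 10):", peak_bins[:10])
```

With these values df/dt equals Fr, so the peak should step by about one frequency bin per spectrum. That is the "chirp line crossing each pixel diagonally" picture from the main text.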






