Wave Extraction


Input:
A rectified bundle of grooves.

Output:
A matrix corresponding to the audio information in the bundle.  The entry at row i, column j is the audio sample from the i-th groove in the bundle at time j.

The first step in this stage is to discover the shape of the grooves in the rectified image, which is not exactly straight, as explained here.  To do this, we first need the density of the grooves, or, in other words, the frequency at which an ant walking along a vertical cross section of the rectified image crosses grooves.  This frequency is found with a method similar to the one explained in the track separation detection stage.  Once the frequency is known, the algorithm finds the phase of this frequency at each column cross section of the rectified image.  A plot of the phase (as a function of the column index) looks something like this:



As you can see, the value of the phase is between -pi and pi, so it jumps by 2*pi wherever it wraps around.  To get the shape of the groove, we need to make the phase continuous (we want to "lift" the right part of the above image, so that it fits the left part).  The resulting groove shape (after smoothing and rescaling) looks like this:




This groove shape should be thought of as the shape of a theoretical groove with no modulation on it, because the modulations of the different grooves average out in the process.
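
The phase measurement and the "lifting" step can be sketched roughly as follows.  This is an illustrative Python sketch, not the code in the .m files: the function names (column_phase, unwrap, groove_shape) are made up for this example, the groove frequency freq (in cycles per pixel) is assumed to be already known, and the smoothing and rescaling mentioned above are left out apart from the conversion from phase to pixels.

```python
import cmath
import math

def column_phase(column, freq):
    """Phase of the component at `freq` (cycles per pixel) in one image column.

    Hypothetical helper: a single-bin Fourier sum, result in [-pi, pi].
    """
    acc = sum(v * cmath.exp(-2j * math.pi * freq * y) for y, v in enumerate(column))
    return cmath.phase(acc)

def unwrap(phases):
    """Remove the 2*pi jumps so that consecutive phases differ by less than pi."""
    out = [phases[0]]
    for p in phases[1:]:
        d = p - out[-1]
        d -= 2 * math.pi * round(d / (2 * math.pi))  # bring the step into [-pi, pi]
        out.append(out[-1] + d)
    return out

def groove_shape(image, freq):
    """Vertical offset of the grooves, in pixels, at each column.

    `image` is a list of rows, each row a list of intensities.
    """
    n_cols = len(image[0])
    phases = [column_phase([row[x] for row in image], freq) for x in range(n_cols)]
    lifted = unwrap(phases)
    # A phase change of -2*pi corresponds to a downward shift of one
    # groove spacing, which is 1/freq pixels.
    return [-(p - lifted[0]) / (2 * math.pi * freq) for p in lifted]
```

The unwrapping only works if the grooves move by less than half a groove spacing from one column to the next, which should hold for a reasonably well rectified image.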

Compare the shape of the curve to the shape of a single groove in the following image:



After the shape of the groove is discovered, the groove positions need to be found.  This is done by "sweeping" the groove shape vertically over the rectified image.  For each sweep position, we calculate the sum of the intensities along the groove shape.  A plot of these sums-of-intensities along a sweep looks like this:



The highest peaks in this plot correspond to the locations of the grooves.  Now we know where the grooves are.
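
A minimal sketch of the sweep, in Python for illustration (the names sweep_sums and find_peaks, and the min_height threshold, are assumptions of this example and do not come from the .m files):

```python
def sweep_sums(image, shape):
    """Sum of intensities along the groove shape for every sweep position.

    `image` is a list of rows; `shape` holds one vertical offset per column.
    Position r follows the path (r + shape[x], x).
    """
    n_rows, n_cols = len(image), len(image[0])
    offsets = [int(round(s)) for s in shape]  # nearest-pixel path, no interpolation
    sums = []
    for r in range(n_rows):
        total = 0.0
        for x in range(n_cols):
            y = r + offsets[x]
            if 0 <= y < n_rows:  # ignore the part of the path that leaves the image
                total += image[y][x]
        sums.append(total)
    return sums

def find_peaks(sums, min_height):
    """Indices of local maxima that reach at least `min_height`."""
    return [i for i in range(1, len(sums) - 1)
            if sums[i] >= min_height
            and sums[i] > sums[i - 1]
            and sums[i] >= sums[i + 1]]
```

Each index returned by find_peaks is the row at which one groove starts in the first column; how the threshold is chosen in practice is not specified here.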

The next step is to walk along each groove and extract samples.  The idea is to look at the intensity vector of a cross section of the groove at time t, and to find a peak in that cross section.  A typical cross section depends on the manufacturing process of the record.  The following example is from our 33 RPM record.




And this one is from our 78 RPM record.  Note that the grooves are wider in this record.



The peak is found roughly as follows:  Find the highest intensity pixel in the vertical cross section of the groove at time t.  Look two pixels before and two after.  Fit a parabola to this vector of 5 pixel intensities, and return the position of the peak of the parabola.  This is the desired audio sample.
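
The peak search can be sketched like this.  It is a Python illustration under stated assumptions rather than the actual implementation: the function name subpixel_peak is made up, the parabola is fitted by least squares, and the window is shifted inward when the brightest pixel is too close to the edge of the cross section.

```python
def subpixel_peak(cross_section):
    """Sub-pixel position of the intensity peak in a groove cross section."""
    n = len(cross_section)
    if n < 5:
        raise ValueError("need at least 5 pixels")
    # Brightest pixel, moved inward if needed so that a 5-pixel window fits.
    i = max(range(n), key=cross_section.__getitem__)
    i = min(max(i, 2), n - 3)
    window = cross_section[i - 2:i + 3]
    xs = (-2, -1, 0, 1, 2)
    # Least-squares fit of y = a*x^2 + b*x + c on x = -2..2.  For these x
    # values: sum(x) = 0, sum(x^2) = 10, sum(x^3) = 0, sum(x^4) = 34.
    sy = sum(window)
    sxy = sum(x * y for x, y in zip(xs, window))
    sx2y = sum(x * x * y for x, y in zip(xs, window))
    a = (sx2y - 2 * sy) / 14
    b = sxy / 10
    if a >= 0:  # the fitted parabola has no maximum, fall back to the pixel
        return float(i)
    return i - b / (2 * a)  # vertex of the parabola, in pixel coordinates
```

For example, a cross section sampled from the parabola 10 - (x - 4.3)^2 at x = 0..8 has its brightest pixel at x = 4, and the fit recovers the true peak position 4.3.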

The vector of samples for each groove is finally passed through two filters:  The first is a high-pass filter that is responsible for further cleaning out rectification inaccuracies (the assumption is that rectification errors are at a very low frequency).  The second is a low-pass filter that removes frequencies in a range that is clearly not part of the music and is therefore noise.
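
The filter designs themselves are not described here, so the following Python sketch uses simple stand-ins: subtracting a wide moving average plays the role of the high-pass filter, and a short moving average plays the role of the low-pass filter.  The names moving_average and clean_groove and the default widths are assumptions chosen only for illustration.

```python
def moving_average(samples, width):
    """Centered moving average; the window shrinks near the ends."""
    half = width // 2
    out = []
    for i in range(len(samples)):
        lo, hi = max(0, i - half), min(len(samples), i + half + 1)
        out.append(sum(samples[lo:hi]) / (hi - lo))
    return out

def clean_groove(samples, drift_width=201, smooth_width=3):
    """Stand-in for the two filters applied to one groove's samples."""
    # High-pass: estimate the slow drift (rectification error) and remove it.
    drift = moving_average(samples, drift_width)
    high_passed = [s - d for s, d in zip(samples, drift)]
    # Low-pass: average neighbouring samples to damp the highest frequencies.
    return moving_average(high_passed, smooth_width)
```

With these stand-ins, a slow linear drift in the samples is removed completely away from the ends of the vector, while a modulation that varies faster than drift_width samples passes through largely unchanged.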

.m files: get_waves.m, get_wav.m