Audio Alignment and Track Reconstruction
Problem
The audio extraction, or "wave extraction", algorithm produces blocks of
overlapping waves as its output. Each track, or "song", corresponds to a
number of wave blocks that must be aligned horizontally across time and
vertically across grooves. The time alignment problem arises because
multiple images must be scanned, and the vertical groove alignment
problem arises because the size of a wave block in memory must be
limited. The alignments must be computed and stored so that output wave
files can be produced from the wave blocks.
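Each stored alignment can be thought of as a shift in samples (time) plus a shift in grooves. As a hypothetical Python sketch (the Offset record and absolute_positions function are illustrative names, not taken from the original Matlab code, and the offset values are made up), chaining the pairwise shifts between consecutive blocks gives every block's position relative to the first block:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offset:
    """Hypothetical alignment record between two wave blocks."""
    samples: int  # horizontal shift across time
    grooves: int  # vertical shift across grooves


def absolute_positions(pairwise):
    """Chain pairwise offsets (block k -> block k+1) into offsets from block 0."""
    positions = [Offset(0, 0)]
    for step in pairwise:
        last = positions[-1]
        positions.append(Offset(last.samples + step.samples,
                                last.grooves + step.grooves))
    return positions


# Illustrative values only.
print(absolute_positions([Offset(900, 0), Offset(880, 1)]))
# [Offset(samples=0, grooves=0), Offset(samples=900, grooves=0), Offset(samples=1780, grooves=1)]
```

Storing only pairwise offsets keeps each alignment independent; absolute positions are derived when the output wave is assembled.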
After all of the necessary alignments have been found, the individual
wave chunks must be pieced together to form one long playable wave for
each track on the record.
Solution
As mentioned in the previous section, the wave extraction returns
blocks of data that roughly correspond to regions of the record, as
shown below. A useful property of the blocks is that, in the overlapping
regions of two blocks, the samples should correspond closely from groove
to groove, provided that the center was found reasonably accurately.

Grooves must be tracked along multiple overlapping misaligned sectors
with wrap-around.

To find an alignment, we took a small patch (typically about 150 samples
by ten grooves) from one block and searched the second block for all
possible matches. The quality of a match was measured as the sum of
squared differences (SSD), and the match with the minimal SSD was taken
as a candidate alignment. To speed up the search somewhat, we used
convolution in Matlab (see find_diffs.m for the implementation). Because
of differences between scanned sectors caused by lighting, dust, quiet
record regions, and limitations of the rectification and wave extraction
(such as failure to completely remove the warp), the best match found
might not be correct. To test the validity of a hypothesized alignment,
we used a statistical check. We first compute the mean and standard
deviation of the match score for a random alignment by looking at
hundreds of random alignment positions between the two wave blocks. We
then compute the mean again over hundreds of positions, but this time
using the hypothesized alignment as the offset. If this new mean is more
than one standard deviation from the random mean, we accept the
alignment; otherwise we try again, up to three times. On both the 78 and
33 rpm records that we tried, a good alignment was always found within
three tries. Finding an alignment typically takes about 15 seconds, and
the statistical test takes less than a second.
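The search-and-verify loop can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a port of the Matlab files listed at the end of this section: blocks are assumed to be lists of grooves (each a list of samples), the search is brute force rather than the convolution speed-up (convolution helps because SSD expands as ||w - p||^2 = ||w||^2 - 2<w, p> + ||p||^2, and the cross term can be computed for all offsets at once by correlation), a fresh random patch is assumed on each retry, "more than one standard deviation from" is read as "below" since a lower SSD is a better match, and the names patch_ssd, best_match, window_ssd, is_valid and align, plus the window length w and sample count n, are hypothetical.

```python
import random
import statistics

# Assumed layout: a block is a list of grooves; each groove is a list of samples.


def patch_ssd(block, patch, g0, t0):
    """SSD between `patch` and the same-sized window of `block` at (g0, t0)."""
    total = 0.0
    for i, prow in enumerate(patch):
        brow = block[g0 + i]
        for j, p in enumerate(prow):
            d = brow[t0 + j] - p
            total += d * d
    return total


def best_match(block, patch):
    """Brute-force search: (groove, sample) position in `block` with minimal SSD."""
    ph, pw = len(patch), len(patch[0])
    positions = [(g, t) for g in range(len(block) - ph + 1)
                 for t in range(len(block[0]) - pw + 1)]
    return min(positions, key=lambda gt: patch_ssd(block, patch, gt[0], gt[1]))


def window_ssd(a, b, ga, ta, gb, tb, w):
    """SSD between a w-sample window of groove ga in `a` and groove gb in `b`."""
    return sum((a[ga][ta + k] - b[gb][tb + k]) ** 2 for k in range(w))


def is_valid(a, b, dg, dt, rng, w=20, n=300):
    """Accept (dg, dt) if its mean SSD is more than one std below random pairings.

    Assumes a[g][t] corresponds to b[g + dg][t + dt] and that lower SSD is better.
    """
    # Null distribution: SSD between unrelated random window positions.
    rand = [window_ssd(a, b, rng.randrange(len(a)), rng.randrange(len(a[0]) - w + 1),
                       rng.randrange(len(b)), rng.randrange(len(b[0]) - w + 1), w)
            for _ in range(n)]
    mu, sigma = statistics.mean(rand), statistics.stdev(rand)
    # Valid window starts in `a` whose shifted windows stay inside `b`.
    g_lo, g_hi = max(0, -dg), min(len(a), len(b) - dg)
    t_lo, t_hi = max(0, -dt), min(len(a[0]) - w, len(b[0]) - w - dt)
    if g_lo >= g_hi or t_lo > t_hi:
        return False  # the offset leaves no overlap to test
    hyp = []
    for _ in range(n):
        g, t = rng.randrange(g_lo, g_hi), rng.randint(t_lo, t_hi)
        hyp.append(window_ssd(a, b, g, t, g + dg, t + dt, w))
    return statistics.mean(hyp) < mu - sigma


def align(a, b, ph=10, pw=150, tries=3, seed=0):
    """Return (dg, dt) with a[g][t] ~ b[g + dg][t + dt], or None if all tries fail."""
    rng = random.Random(seed)
    for _ in range(tries):
        pg = rng.randrange(len(a) - ph + 1)  # a fresh patch on every try
        pt = rng.randrange(len(a[0]) - pw + 1)
        patch = [row[pt:pt + pw] for row in a[pg:pg + ph]]
        g0, t0 = best_match(b, patch)
        if is_valid(a, b, g0 - pg, t0 - pt, rng):
            return g0 - pg, t0 - pt
    return None
```

As a toy check, if block a is block b cropped by 2 grooves and 50 samples (a = [row[50:350] for row in b[2:10]] for a 12 by 400 block of random samples), align(a, b, ph=4, pw=30) should recover (2, 50); the smaller patch is needed only because the toy block has 8 grooves.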

The diagram below shows an example alignment result. The images are
reasonably well normalized and centered, which is a result of the effort
put into the wave extraction. Good normalization and centering are
critical for the type of alignment we used.
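Normalization matters because SSD compares raw sample values: the same groove scanned with a different gain and offset (for instance, under different lighting) looks like a poor match unless both versions are normalized first. The Python sketch below assumes normalization means zero mean and unit variance per groove, which the text does not specify; normalize and ssd_1d are hypothetical helpers.

```python
import math
import statistics


def normalize(samples):
    """Zero-mean, unit-variance copy of one groove's samples (assumed normalization)."""
    mu = statistics.fmean(samples)
    sd = statistics.pstdev(samples)
    return [(x - mu) / sd for x in samples]


def ssd_1d(x, y):
    """Sum of squared differences between two equal-length sample lists."""
    return sum((p - q) ** 2 for p, q in zip(x, y))


# The same 150-sample groove segment, once as-is and once with gain 2 and offset 0.5.
wave = [math.sin(0.1 * k) for k in range(150)]
rescaled = [2.0 * s + 0.5 for s in wave]

raw_score = ssd_1d(wave, rescaled)                         # large despite identical shape
norm_score = ssd_1d(normalize(wave), normalize(rescaled))  # essentially zero
print(raw_score, norm_score)
```

Without normalization the correct alignment of these two segments scores far worse than it should, which is exactly the failure mode the statistical test would then have to catch.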

Finally, once all of the alignments have been computed, reconstructing
the wave data from them is a simple matter. To minimize audible
misalignment errors between waves from neighboring sectors, we applied a
crossfade of about 200 samples in the overlapping region. The
create_all_songs and create_song functions listed below do the actual
reconstruction and produce a set of playable wave files corresponding to
the original songs on the record. No human interaction is needed.
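A linear crossfade over the overlap can be sketched in Python as below. The linear ramp, the names crossfade_concat and build_track, and the assumption that the last `overlap` samples of one aligned chunk and the first `overlap` samples of the next cover the same audio are all assumptions; the text only states that a crossfade of about 200 samples was used.

```python
from functools import reduce


def crossfade_concat(a, b, overlap=200):
    """Join chunk `a` and chunk `b`, whose last/first `overlap` samples are the same audio.

    Linear crossfade (assumed): a's weight falls from ~1 to ~0 while b's rises.
    """
    if overlap <= 0:
        return a + b
    if overlap > min(len(a), len(b)):
        raise ValueError("overlap is longer than one of the chunks")
    mixed = []
    for k in range(overlap):
        w = (k + 1) / (overlap + 1)  # weight of b, strictly between 0 and 1
        mixed.append((1.0 - w) * a[len(a) - overlap + k] + w * b[k])
    return a[:-overlap] + mixed + b[overlap:]


def build_track(chunks, overlap=200):
    """Fold the aligned chunks of one track into a single sample list."""
    return reduce(lambda acc, c: crossfade_concat(acc, c, overlap), chunks)
```

For example, build_track([[0.0] * 300, [1.0] * 300, [1.0] * 300]) yields 500 samples, since each join absorbs one 200-sample overlap.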
.m files: align_all.m, align_waves.m, find_stats.m, find_stats_rand.m,
find_all_jumps.m, find_jump.m, find_diffs.m, create_all_songs.m,
create_song.m