Audio Alignment and Track Reconstruction


Problem

The audio extraction or "wave extraction" algorithm produces blocks of overlapping waves as its output. Each track or "song" corresponds to a number of wave blocks that must be aligned horizontally across time and vertically across grooves. The time alignment problem arises because multiple images must be scanned, and the vertical groove alignment problem arises because the wave block size must be limited to fit in memory. The alignments must be computed and stored so that output wave files can be produced from the wave blocks.

After all of the necessary alignments have been found, the individual wave chunks must be pieced together to form one long playable wave for each track on the record.

Solution

As mentioned in the previous section, the wave extraction returns blocks of data that roughly correspond to regions of the record, as shown below. A useful property of the blocks is that, where two blocks overlap, the samples should correspond very well from groove to groove, assuming that the center was found reasonably well.




Grooves must be tracked along multiple overlapping misaligned sectors with wrap-around.



To find an alignment, we took a small patch (typically about 150 samples by ten grooves) from one block and then searched the second block for all possible matches. The goodness of a match was measured as the sum of squared differences, and the position with the minimal value was taken as the candidate alignment. To speed up the search slightly we used convolution in Matlab (see find_diffs.m for the implementation).

The best match that is found might not be correct. Scanned sectors differ because of lighting, dust, and quiet record regions, and because of limitations in the rectification and wave extraction, such as failure to completely remove the warp. We therefore used a simple statistical test to check each hypothesized alignment. The idea is to compute the mean and standard deviation of the match error for a random alignment by looking at hundreds of random alignment positions between the two wave blocks. We then compute the mean again over hundreds of positions, but this time using the hypothesized alignment as the offset. If the new mean is more than one standard deviation from the random mean, we accept the alignment. Otherwise we try again, up to three times. On both the 78 and 33 rpm records that we tried, a good alignment was always found within three tries. Finding an alignment typically takes about 15 seconds, and the statistical test takes less than a second.
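The search and the validity test can be sketched as follows. This is a minimal illustration in Python rather than the Matlab code in find_diffs.m and find_stats.m, and it makes several assumptions that are not taken from the original implementation: blocks are lists of grooves (rows) of samples, the search is a plain exhaustive loop instead of a convolution, each trial in the test measures the mean squared difference over a short run of samples (the `window` parameter), and an alignment is accepted only when its mean error is more than one standard deviation below the random mean. The function names `best_offset` and `alignment_is_plausible` are hypothetical.

```python
import random


def ssd(patch, block, row, col):
    """Sum of squared differences between patch and the block window at (row, col)."""
    total = 0.0
    for i, patch_row in enumerate(patch):
        block_row = block[row + i]
        for j, value in enumerate(patch_row):
            diff = value - block_row[col + j]
            total += diff * diff
    return total


def best_offset(patch, block):
    """Exhaustively search block for the window with the minimal SSD to patch."""
    ph, pw = len(patch), len(patch[0])
    best = None
    for row in range(len(block) - ph + 1):
        for col in range(len(block[0]) - pw + 1):
            score = ssd(patch, block, row, col)
            if best is None or score < best[0]:
                best = (score, row, col)
    return best[1], best[2]


def window_error(block_a, block_b, ra, ca, rb, cb, window):
    """Mean squared difference between two runs of `window` samples."""
    return sum((block_a[ra][ca + k] - block_b[rb][cb + k]) ** 2
               for k in range(window)) / window


def alignment_is_plausible(block_a, block_b, offset, trials=300, window=8, seed=0):
    """Compare the error at the hypothesized offset with the error at random offsets.

    offset = (d_row, d_col) maps position (r, c) in block_a to
    (r + d_row, c + d_col) in block_b.
    """
    rng = random.Random(seed)
    d_row, d_col = offset
    # Number of valid window start positions in each block.
    rows_a, cols_a = len(block_a), len(block_a[0]) - window + 1
    rows_b, cols_b = len(block_b), len(block_b[0]) - window + 1
    # Overlap of the two blocks under the hypothesized offset, in block_a coordinates.
    r_lo, r_hi = max(0, -d_row), min(rows_a, rows_b - d_row)
    c_lo, c_hi = max(0, -d_col), min(cols_a, cols_b - d_col)
    if r_lo >= r_hi or c_lo >= c_hi:
        return False  # the blocks do not overlap at this offset
    random_errs, aligned_errs = [], []
    for _ in range(trials):
        random_errs.append(window_error(
            block_a, block_b,
            rng.randrange(rows_a), rng.randrange(cols_a),
            rng.randrange(rows_b), rng.randrange(cols_b), window))
        r, c = rng.randrange(r_lo, r_hi), rng.randrange(c_lo, c_hi)
        aligned_errs.append(window_error(
            block_a, block_b, r, c, r + d_row, c + d_col, window))
    rand_mean = sum(random_errs) / trials
    rand_std = (sum((e - rand_mean) ** 2 for e in random_errs) / trials) ** 0.5
    aligned_mean = sum(aligned_errs) / trials
    # Assumption: accept only when the aligned error is lower by more than one std.
    return rand_mean - aligned_mean > rand_std
```

In use, a patch cut from one block at a known position is located in the other block with `best_offset`, the difference between the two positions gives the offset, and `alignment_is_plausible` decides whether to keep it or to retry with a different patch.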



The diagram below shows an example alignment result. The images are reasonably well normalized and centered, which is a result of the effort put into the wave extraction. Good normalization and centering are critical for the type of alignment we used.


Finally, once all of the alignments have been computed, reconstructing the wave data from them is straightforward. To minimize audible misalignment errors between waves from neighboring sectors, we applied a crossfade of about 200 samples in the overlapping region. The create_all_songs and create_song functions listed below do the actual reconstruction and produce a set of playable wave files corresponding to the original songs on the record. No human interaction is needed.
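The crossfade step can be illustrated with a short sketch. It is written in Python rather than Matlab and is not the code from create_song.m. The linear fade shape and the function name `crossfade_join` are assumptions; the text above specifies only the approximate fade length.

```python
def crossfade_join(first, second, overlap):
    """Join two sample lists whose last/first `overlap` samples cover the same audio.

    The overlapping region is blended with a linear ramp (an assumed fade shape):
    the weight of `second` rises from near 0 to near 1 across the overlap.
    """
    if overlap <= 0:
        return list(first) + list(second)
    if overlap > min(len(first), len(second)):
        raise ValueError("overlap is longer than one of the waves")
    head = list(first[:-overlap])      # samples only in the first wave
    tail = list(second[overlap:])      # samples only in the second wave
    start = len(first) - overlap
    blended = []
    for k in range(overlap):
        w = (k + 1) / (overlap + 1)    # weight given to the second wave
        blended.append((1 - w) * first[start + k] + w * second[k])
    return head + blended + tail
```

With an overlap of about 200 samples, as used here, a call such as `crossfade_join(wave_a, wave_b, 200)` would return a single wave that is 200 samples shorter than the two inputs combined.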


.m files: align_all.m, align_waves.m, find_stats.m, find_stats_rand.m, find_all_jumps.m, find_jump.m, find_diffs.m, create_all_songs.m, create_song.m