Audio Extraction from Optical Scans of Records



COS 429 Final Project

Written by:

Mark McCann

Paul Calamia

Nir Ailon

e-mail: {mmccann,pcalamia,nailon}@cs.princeton.edu



Overview

Vinyl records encode audio signals mechanically.  For monaural records, the signal is modulated as tiny sideways movements of a spiral groove.  A record needle following these movements creates an electrical signal, which is sent to the speakers via amplifying circuits.  The movements of the groove can also be captured using off-the-shelf scanning equipment.  This gives rise to the idea of extracting the audio signal from optical information and saving it as a sound file playable by any standard playback software.


Objective

Our goal was to create a system that has the following properties:

Technical Information

Programming environment: MATLAB
Scanner type: Epson Expression 1640XL
Memory usage: around 80 MB
Running time for a full record (not including scanning, on an Intel Pentium 4 2 GHz or a Mac G4 1.25 GHz): around 2-3 hours

Input & Output

The input to the system is a directory with n scanned images of the record.  The filenames are 1.png, 2.png, ..., n.png.  Each image should look roughly like this:
[Figure: sample scan of a record]


The output is a collection of .wav files, corresponding to the different tracks of the record.
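
The input naming convention above can be checked before a long run starts. The following is a minimal sketch in Python (not the project's MATLAB code); the helper names expected_scan_names and missing_scans are hypothetical and only illustrate the 1.png ... n.png convention.

```python
from pathlib import Path


def expected_scan_names(num_scans):
    """Return the file names 1.png ... n.png in scan order."""
    return [f"{i}.png" for i in range(1, num_scans + 1)]


def missing_scans(dir_name, num_scans):
    """Return the expected scan files that are absent from dir_name."""
    folder = Path(dir_name)
    return [name for name in expected_scan_names(num_scans)
            if not (folder / name).is_file()]
```

Calling missing_scans on the scan directory and getting an empty list back would indicate that all n images are in place.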


The Full Pipeline

1. Record scanning
2. Center of record detection
3. Track separation detection
4. Rectification
5. Wave extraction
6. Alignment and track reconstruction
7. Audio post-processing

Sample output

From a 78 RPM record by Woody Herman and His Orchestra (DECCA 9-25300).  The title of the track is "Las Chiapanecas".
From a 33 1/3 RPM record of music by composer Scott Joplin, performed by the New England Conservatory Ragtime Ensemble.  See if you can recognize some of these (hint: Entertainer, Maple Leaf Rag, Easy Winners).

References and related work

S. Cavaglieri, O. Johnsen, and F. Bapst, Optical Retrieval and Storage of Analog Sound Recordings. Presented at the AES 20th International Conference, Budapest, Hungary, October 5–7, 2001. www.eif.ch/visualaudio/publications/AES20.pdf
V. Fadeyev and C. Haber, Reconstruction of Mechanically Recorded Sound. Lawrence Berkeley National Laboratory Technical Report 51983, 2003 (also submitted to the Journal of the Audio Engineering Society). www-cdf.lbl.gov/%7Eav/
U. Kalla, N. Jalden, N. Lithammer, M. Eriksson, and E. Perez, The Digital Needle Project - Group Light Green. KTH Royal Institute of Technology, Stockholm, Sweden. http://www.s3.kth.se/signal/edu/projekt/students/03/lightgreen/
P. Olsson, D. Öhlin, R. Olofsson, R. Vaerlien, and C. Ayrault, The Digital Needle Project - Group Light Blue. KTH Royal Institute of Technology, Stockholm, Sweden. www.s3.kth.se/signal/edu/projekt/students/03/lightblue/
O. Springer, Digital Needle - A Virtual Gramophone. www.cs.huji.ac.il/~springer/
S. Stotzer, O. Johnsen, F. Bapst, C. Sudan, and R. Ingold, Phonographic Sound Extraction Using Image and Signal Processing. Proc. ICASSP, May 17-21, 2004. www.eif.ch/visualaudio/publications/ICASSP%20Phonographic%20sound%20extraction.pdf

[Figure from Fadeyev et al.]


Code

Each entry below gives the file name, a description, and the usage.
do_everything.m
Runs the entire pipeline.
do_everything(dir_name, num_scans, rpm)
environment.m
Adds global variables related to the project to the current scope.
environment
set_dir.m
Sets the working directory that holds the scanned images and all intermediate and output files created during the process.  The smooth_image and smooth_wave parameters determine the amount of smoothing (Gaussian width) applied to the image and the extracted wave, respectively.  This function is called by do_everything.
set_dir(dir_name, num_scans, rpm [,smooth_image] [,smooth_wave])
findOuterEdge.m
This function returns a list of points on the outer edge of the record for a particular scanned image n.  It is called by calcCenter.
L = findOuterEdge(n)
calcCenter.m
This function computes the center and radius of the circle that best fits the list of edge points returned by findOuterEdge.  The result is used for the warping performed by get_track.
[x y r] = calcCenter(pts)
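
As an illustration of fitting a circle to edge points, here is a minimal sketch in Python (not the project's MATLAB code). It uses an algebraic least-squares fit, which is an assumption: the page does not say which fitting method calcCenter uses. The names solve3 and fit_circle are hypothetical.

```python
import math


def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0] * 3
    for i in (2, 1, 0):
        s = a[i][3] - sum(a[i][j] * x[j] for j in range(i + 1, 3))
        x[i] = s / a[i][i]
    return x


def fit_circle(points):
    """Algebraic least-squares circle fit (assumed method); returns (cx, cy, r)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # On centered coordinates (u, v) a circle satisfies
    #   u^2 + v^2 = 2*a*u + 2*b*v + c,  with c = r^2 - a^2 - b^2.
    # Accumulate the normal equations for the unknowns (a, b, c).
    ata = [[0.0] * 3 for _ in range(3)]
    atb = [0.0] * 3
    for x, y in points:
        u, v = x - mx, y - my
        row = (2.0 * u, 2.0 * v, 1.0)
        rhs = u * u + v * v
        for i in range(3):
            atb[i] += row[i] * rhs
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    a, b, c = solve3(ata, atb)
    return a + mx, b + my, math.sqrt(c + a * a + b * b)
```

The fit works on a partial arc, which matters here because a single scan may show only part of the record's outer edge.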
get_separation.m
This function performs the track separation for a single image.  It is called by get_track.

y and x are the coordinates of the record center.  r1 and r2 are end points of an interval of radii outside of which the data is ignored.  The record image is passed via a global variable.
[sep_list,..dbg vars..] = get_separation(y,x,r1,r2)
get_track.m
This function iterates over all scanned images and, for each one, calls calcCenter to detect the record center.  It then calls get_separation to get the outer and inner points and the track separation.  After that, it rectifies all regions of each image, respecting the track separation and the memory constraint (the max_mem parameter).  arg is used for debugging.

This function has two outputs.  The first is a set of files named #.#.trk.mat, each holding one rectified region.  The first # is the image number ("sector") and the second # is the bundle number (a bundle is a run of consecutive grooves from one scanned image).  The second is a file named song_struct.mat, which describes the track structure (e.g. "bundles 2, 3, and 4 belong to track number 2").
get_track([max_mem] [,arg])
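
Rectification can be pictured as unwrapping a ring-shaped region around the record center into a rectangle. The following is a minimal sketch in Python (not the project's MATLAB code), assuming nearest-neighbor sampling, one output row per pixel of radius, and an image stored as a list of rows. The function name rectify and all of its parameters are hypothetical.

```python
import math


def rectify(image, cx, cy, r_min, r_max, theta_min, theta_max, n_theta):
    """Unwrap an annular sector into a rectangle.

    Rows correspond to radii r_min..r_max (one pixel apart) and columns to
    n_theta evenly spaced angles in [theta_min, theta_max] (radians).
    Pixels that fall outside the image are filled with 0.
    """
    height = len(image)
    width = len(image[0])
    out = []
    for r in range(r_min, r_max + 1):
        row = []
        for k in range(n_theta):
            theta = theta_min + (theta_max - theta_min) * k / max(n_theta - 1, 1)
            # Nearest-neighbor lookup of the source pixel at (r, theta).
            x = int(round(cx + r * math.cos(theta)))
            y = int(round(cy + r * math.sin(theta)))
            inside = 0 <= x < width and 0 <= y < height
            row.append(image[y][x] if inside else 0)
        out.append(row)
    return out
```

In the rectified output each groove runs roughly horizontally, so its sideways movement becomes a vertical displacement along the row.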
get_waves.m
This function is run after get_track.  It is called by do_everything.  It iterates over the #.#.trk.mat files and extracts a wave matrix from each one.  The matrices are saved in files named #.#.waves.mat.  It uses get_wav.m to do the actual wave extraction.
get_waves
get_wav.m
This function takes one rectified bundle of grooves as input and returns the corresponding wave matrix.  Input and output are passed through global variables.  It is called by get_waves.
[...debug info...] = get_wav
align_all.m
This function is run after get_waves.  It is called by do_everything.  Its purpose is to align wave blocks in the time domain between sectors.  align_waves does the actual work and is called repeatedly, for every wave block alignment that is needed.  The output from this function is stored in files named #.#.align.mat.
align_all
align_waves.m
This function aligns a small subregion on the right side of matrix waves1 with all of waves2.  gstart and gend are optional parameters that specify the starting and ending grooves of waves1 on which to try alignment.  offset is an optional parameter that specifies how many samples from the end of waves1 to try alignment on.  The results are stored in the struct alignStruct.  This function is called by align_all.  It calls find_diffs to do the actual search and tests the validity of a proposed alignment with find_stats and find_stats_rand.  It will try up to three times to find a good enough alignment.
alignStruct = align_waves(waves1, waves2, gstart, gend, offset)
Contents.m
This function provides a list of all our project function files in Matlab's standard help format.
Contents  or  help
create_all_songs.m
This function calls create_song for every track on the record.  It assumes that align_all and find_all_jumps have already been run.  It is called by do_everything and is the last step that ultimately generates playable audio files.
create_all_songs
create_song.m
This function is called by create_all_songs and is responsible for generating playable audio files by grabbing chunks of waves from wave blocks, in order, based on time and groove alignment.  It generates a song from bundles first_bun to last_bun.  Its output is stored in global Gmaster.
create_song(first_bun, last_bun)
do_gaussian1d.m
Performs Gaussian smoothing of a 1d vector.  Used in several places in the project.
outvec = do_gaussian1d(invec, sigma)
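
For readers unfamiliar with the operation, here is a minimal sketch of 1d Gaussian smoothing in Python (not the project's MATLAB code). The kernel radius of three sigma and the edge handling (repeating the end samples) are assumptions, as are the names gaussian_kernel and gaussian_smooth_1d.

```python
import math


def gaussian_kernel(sigma):
    """Return normalized Gaussian weights covering about three sigma per side."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_smooth_1d(invec, sigma):
    """Convolve invec with a Gaussian kernel, repeating the edge samples."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(invec)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp index at the ends
            acc += w * invec[j]
        out.append(acc)
    return out
```

Because the weights sum to one, a constant vector passes through unchanged, while isolated spikes are spread out and reduced.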
do_gaussian.m
Performs piecemeal Gaussian smoothing of a 2d image.  It is called by get_track to smooth the rectified images obtained there.  Because the MATLAB filter2 function is extremely slow on big matrices, we needed a function that does the smoothing piecemeal.

Input and output of this function are passed via global variables.
do_gaussian(sigma)
find_all_jumps.m
This function finds the alignment between wave blocks in the first sector.  It vertically aligns blocks within the same song, as determined by the track separation algorithm.  It calls find_jump multiple times to do the alignment work.  The results are saved in files named #.jump.mat.
find_all_jumps
find_diffs.m
This function finds the position of the least sum of squared differences for a small matrix A on a larger matrix B.  The result is either the best match or a list of all best matches sorted in ascending order by value.  This is used by the alignment functions align_waves and find_jump.
L = find_diffs(A, B)
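
The search described for find_diffs can be sketched as an exhaustive sum-of-squared-differences (SSD) scan. This is a minimal sketch in Python (not the project's MATLAB code), with matrices stored as lists of rows; the names ssd_at and find_matches, and the (ssd, row, col) result format, are hypothetical.

```python
def ssd_at(a, b, top, left):
    """Sum of squared differences between a and the window of b at (top, left)."""
    return sum((a[i][j] - b[top + i][left + j]) ** 2
               for i in range(len(a)) for j in range(len(a[0])))


def find_matches(a, b):
    """Return (ssd, row, col) for every placement of a on b, best match first."""
    rows = len(b) - len(a) + 1
    cols = len(b[0]) - len(a[0]) + 1
    candidates = [(ssd_at(a, b, r, c), r, c)
                  for r in range(rows) for c in range(cols)]
    candidates.sort()  # ascending by SSD, so index 0 is the best match
    return candidates
```

Taking the first element gives the single best placement; keeping the whole sorted list allows a caller to fall back to the next candidate if the best one fails a validity check.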
find_jump.m
This does the work of the function find_all_jumps for a particular vertical jump on the first sector between the bundles passed in.
j_struct = find_jump(bundle)
find_stats.m
Finds the mean and standard deviation of numwindows sum-of-squared-differences between data windows in overlapping areas of different wave matrices.  The windows are assumed to contain the same data, and the mean is expected to be low compared to the mean returned by find_stats_rand.m (see below).
[ssdmean, ssdstd_dev] = find_stats(wavemat1, wavemat2, numgrooves, numsamples, groovejump, samplejump, numwindows)
find_stats_rand.m
Finds the mean and standard deviation of numwindows sum-of-squared-differences between randomly chosen data windows in adjacent wave matrices.  The return values are used with those from find_stats.m to confirm the validity of the calculated groove alignment.
[ssdmean, ssdstd_dev] = find_stats_rand(wavemat1, wavemat2, numgrooves, numsamples, numwindows)
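
The comparison between the two sets of statistics can be sketched as follows, in Python (not the project's MATLAB code). The window pairs are passed in directly rather than being cut from wave matrices. The decision rule in alignment_looks_valid and its num_std threshold are assumptions made for illustration; the page only says that the aligned mean should be low compared to the random one.

```python
import math


def ssd(win_a, win_b):
    """Sum of squared differences between two equally sized windows."""
    return sum((x - y) ** 2
               for row_a, row_b in zip(win_a, win_b)
               for x, y in zip(row_a, row_b))


def ssd_mean_std(pairs):
    """Mean and (population) standard deviation of the SSD over window pairs."""
    values = [ssd(a, b) for a, b in pairs]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)


def alignment_looks_valid(aligned_pairs, random_pairs, num_std=2.0):
    """Hypothetical rule: aligned windows must differ far less than random ones."""
    aligned_mean, _ = ssd_mean_std(aligned_pairs)
    random_mean, random_std = ssd_mean_std(random_pairs)
    return aligned_mean < random_mean - num_std * random_std
```

If a proposed alignment is correct, windows taken from the overlap should be nearly identical, so their mean SSD should fall well below the baseline set by unrelated windows.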
get_num_bundles.m
Returns the number of bundles as was determined by get_track.
bundles = get_num_bundles
load_aligns.m
Loads the alignment data for a bundle into the global Galigns struct array.
load_aligns(bundle)
load_waves.m
Loads all of the wave data for a bundle into the global Gall_waves struct array.
load_waves(bundle)
process_song.m
Resamples and filters audio data after extraction from scanned images.
process_song(wav_file_name)
rate_from_align.m
Determines the sample rate in samples per second from the alignment data for a particular bundle and from the user-supplied rpm.  This number should agree closely with the number from rate_from_track.
rate = rate_from_align(bundle)
rate_from_track.m
Determines the sample rate in samples per second directly from the scanned image files and the user-supplied rpm.  In practice this number agrees very closely across scanned images, so the value is determined from the first image only.
rate = rate_from_track
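
The relationship between rpm and sample rate can be sketched with simple arithmetic, here in Python (not the project's MATLAB code). The sketch assumes that a rectified image has num_columns evenly spaced angular samples covering angle_span_rad radians of the record; both parameter names and the function name sample_rate are hypothetical.

```python
import math


def sample_rate(num_columns, angle_span_rad, rpm):
    """Samples per second implied by the angular sampling density and rpm."""
    # Samples in one full revolution, extrapolated from the scanned span.
    samples_per_rev = num_columns * (2 * math.pi / angle_span_rad)
    # The record makes rpm / 60 revolutions per second.
    return samples_per_rev * rpm / 60.0
```

For example, 10000 samples per full revolution at 78 rpm corresponds to 13000 samples per second.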