Original idea
One day in a biology class, we analysed bird songs to work out which bird each song came from. We used a program called Audacity, which can render a sound’s spectrogram: a plot of each frequency’s intensity at each point in time in the song. That way, patterns start emerging and it becomes possible to recognise a bird’s specific “footprint” and deduce which bird sang the song in the first place.

I then had the idea of putting an arbitrary image inside a sound’s spectrogram. Since a spectrogram is computed with a Discrete Fourier Transform, all I had to do was write a way to perform an inverse Fourier transform on the original image.
My approach was to convert the image to grayscale, then map the x-axis to time and the y-axis to frequency. For a given column, I computed a sum of cosines, one per pixel, each with a frequency proportional to the pixel’s height in that column. Therefore, the higher the pixel, the higher “its frequency” in the output. I also matched each pixel’s grayscale intensity to the amplitude of its cosine. I then wrote the summed values for the column to a file and proceeded to the next column. This produces a sound whose spectrogram closely reproduces the original image.
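The column-by-column synthesis described above can be sketched in Python as follows. This is a minimal illustration, not the project's actual source code: the function names `image_to_samples` and `write_wav`, the sample rate, the duration of each column and the 500 to 8000 Hz frequency range are all assumptions chosen for the example. Loading the image is left out, so the input is simply a list of rows of grayscale values.

```python
import math
import struct
import wave


def image_to_samples(pixels, sample_rate=44100, column_seconds=0.05,
                     f_min=500.0, f_max=8000.0):
    """Turn a grayscale image into audio samples, one image column at a time.

    `pixels` is a list of rows (top row first), each a list of 0..255 values.
    All default parameter values are illustrative assumptions.
    """
    height = len(pixels)
    width = len(pixels[0])
    samples_per_column = int(sample_rate * column_seconds)

    # Row 0 is the top of the image, so it gets the highest frequency.
    if height > 1:
        freqs = [f_max - (f_max - f_min) * row / (height - 1)
                 for row in range(height)]
    else:
        freqs = [f_min]

    samples = []
    for col in range(width):
        for n in range(samples_per_column):
            # Using global time keeps each cosine phase-continuous across columns.
            t = (col * samples_per_column + n) / sample_rate
            value = sum(
                (pixels[row][col] / 255.0)
                * math.cos(2 * math.pi * freqs[row] * t)
                for row in range(height)
                if pixels[row][col]  # black pixels contribute nothing
            )
            samples.append(value)

    # Scale so the loudest sample fits in [-1, 1].
    peak = max((abs(s) for s in samples), default=0.0)
    if peak > 0:
        samples = [s / peak for s in samples]
    return samples


def write_wav(path, samples, sample_rate=44100):
    """Write mono samples in [-1, 1] to a 16-bit PCM WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        frames = b"".join(struct.pack("<h", int(s * 32767)) for s in samples)
        wav.writeframes(frames)
```

With these assumed names, `write_wav("out.wav", image_to_samples(pixels))` would produce a file that can be opened in Audacity's spectrogram view. The pure-Python loops are slow for large images; a vectorised version would be the natural next step.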
Applications
This technique was used in the video game Doom (the 2016 reboot of the 1993 classic).
Indeed, the authors hid satanic symbols (666, pentagrams) in the video game’s soundtrack. The general public only noticed once the deed was done. Nowadays, this technique has some use in protecting copyrighted audio material: any recording of it will inevitably pick up those hidden images too, and can later be analysed to show that it is in fact an illegal copy.
Documentation
Today, the project is documented on Hackster.io, a website for DIY enthusiasts which I wanted to try out (even though it’s mainly meant for hardware people). The source code is also freely available on my GitHub, where anyone can download it, run it and contribute to it.
