Sunday, September 29, 2019

Remote Audio Transmitter with an ESP8266

The motivation for this project was to have a wireless way to listen to my building's intercom remotely, be it from another room or from another state. Being able to do this doesn't necessarily add security to my residence the way cameras would (maybe keep an eye out for a post on that), but it's interesting nonetheless. So the plan is to transmit the audio over WiFi with an ESP8266 and capture it on a server. All with the power of Docker, of course!

The best place to start any project is a web search, to see if someone has already done it. A lot of what I found was for the ESP32. I only have ESP8266s on hand, and I didn't want to cut a sandwich with a chainsaw, so I kept looking. Eventually I found an interesting page on making a baby monitor with an ESP8266.

Sven337's wonderful baby monitor project served as a very good jumping-off point. It included code for using the ESP8266 to send delta-compressed audio samples over UDP to a receiver over WiFi. That was almost exactly what I wanted to do.
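
To make "delta-compressed" concrete, here is a minimal sketch of the general idea. It is not Sven337's actual packet format; the 8-bit delta width and the clamping behavior are my own assumptions for illustration. Each block keeps one full sample and then stores only the difference from the previously reconstructed sample:

```python
"""Minimal delta-compression sketch (hypothetical scheme, not Sven337's exact format).

Each block carries one full 12-bit starting sample followed by signed 8-bit
deltas. Deltas that don't fit in 8 bits are clamped, so the scheme is lossy
on sharp transients; the encoder tracks what the decoder will reconstruct so
clamping errors don't accumulate.
"""

from typing import List, Tuple

DELTA_MIN, DELTA_MAX = -128, 127  # range of a signed 8-bit delta


def delta_encode(samples: List[int]) -> Tuple[int, List[int]]:
    """Return (first_sample, deltas) for a non-empty block of samples."""
    first = samples[0]
    predicted = first  # value the decoder will have reconstructed so far
    deltas = []
    for s in samples[1:]:
        d = max(DELTA_MIN, min(DELTA_MAX, s - predicted))
        deltas.append(d)
        predicted += d  # follow the decoder, not the true signal
    return first, deltas


def delta_decode(first: int, deltas: List[int]) -> List[int]:
    """Rebuild samples by accumulating deltas onto the first sample."""
    out = [first]
    for d in deltas:
        out.append(out[-1] + d)
    return out


if __name__ == "__main__":
    block = [2048, 2050, 2047, 2100, 2400, 2390]
    first, deltas = delta_encode(block)
    print(first, deltas)
    print(delta_decode(first, deltas))
```

Because the encoder follows what the decoder will reconstruct, a clamped jump gets caught up over the next few samples instead of leaving a permanent offset. Storing one byte per delta instead of two bytes per sample roughly halves the payload.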

His project used an MCP3201, a 12-bit SPI ADC, to capture the audio. He claimed that the ESP8266's built-in ADC was far too noisy to be useful in this context, and after trying it myself, I found him to be absolutely correct. I guess there's just too much going on in such a small footprint to isolate a clean signal. So I got myself an MCP3201 to stay as close as possible to his original design, with the plan of stripping out the features I don't need and building a Docker container around it afterwards.
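
Reading the MCP3201 comes down to pulling a 12-bit value out of an SPI transfer. The sketch below is based on my reading of the datasheet's 16-clock read (two undefined bits while the input is sampled, a null bit, then B11 through B0, then a repeated B1), and the helper names are hypothetical. It also shows one way to turn the unsigned reading into the signed 16-bit little-endian samples (s16le) that ffplay consumes, assuming the audio is biased around mid-scale:

```python
"""Sketch: unpack an MCP3201 SPI read and convert it to s16le PCM.

Assumes a 16-clock SPI transfer: 2 undefined sample-period bits, a null bit,
then B11..B0 MSB-first, then one repeated bit (B1). That puts the 12-bit
result in bits 12..1 of the big-endian 16-bit word.
"""

import struct
from typing import Sequence


def mcp3201_value(msb: int, lsb: int) -> int:
    """Extract the 12-bit conversion result from the two bytes read over SPI."""
    word = (msb << 8) | lsb
    return (word >> 1) & 0x0FFF  # drop the trailing B1, mask off the leading bits


def to_s16(value: int) -> int:
    """Map an unsigned 12-bit reading (0..4095) to signed 16-bit PCM.

    Assumes the audio is biased to mid-scale, so 2048 means silence.
    """
    return (value - 2048) << 4  # scale 12 bits up to the 16-bit range


def to_s16le_bytes(values: Sequence[int]) -> bytes:
    """Pack signed samples as little-endian int16, as `ffplay -f s16le` expects."""
    return struct.pack("<%dh" % len(values), *values)
```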


I managed to get the code to write a WAV file of the sound that was passed into the ADC. This was, indeed, a good start.

Then I went to work trimming the code. This mostly meant removing things I didn't need: silence detection and filtering on the transmitter, and log messages on the receiver. All I really wanted the receiver to do was receive and decode the UDP packets and write the samples straight to STDOUT, so I removed a bunch of file-handling logic as well.
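
With the file logic gone, the receiver is essentially a loop that reads UDP datagrams and writes the decoded samples to STDOUT. Here is a minimal sketch of that loop in Python rather than the original C. It assumes each payload is already raw s16le PCM (the real udpserver also decodes the delta compression first), and the function names are hypothetical:

```python
"""Sketch of a stripped-down receiver: UDP datagrams in, raw PCM out on STDOUT.

Assumes each datagram's payload is already s16le PCM; the real udpserver
also undoes the delta compression before writing.
"""

import socket
import sys
from typing import BinaryIO, Optional

MAX_DATAGRAM = 65535  # buffer large enough for any UDP datagram


def relay(sock: socket.socket, out: BinaryIO, max_packets: Optional[int] = None) -> int:
    """Copy datagram payloads from sock to out; return the number of bytes written."""
    written = 0
    received = 0
    while max_packets is None or received < max_packets:
        payload, _addr = sock.recvfrom(MAX_DATAGRAM)
        out.write(payload)
        out.flush()  # don't let output buffering add latency
        written += len(payload)
        received += 1
    return written


def main(port: int) -> None:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))  # listen on all interfaces
    relay(sock, sys.stdout.buffer)


if __name__ == "__main__" and len(sys.argv) == 2:
    main(int(sys.argv[1]))
```

It could be used the same way as the original, e.g. ffplay -f s16le -ac 1 -ar 20k <(python3 receiver.py 45990), where receiver.py and port 45990 are arbitrary placeholders. Flushing after every packet trades a little efficiency for lower latency.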

This resulted in smaller and (subjectively) faster binaries across the board. When running ffplay -f s16le -ac 1 -ar 20k <(./udpserver), the sound was pretty much real time, with no pitch skewing. That's obviously a good thing, but don't hold onto that real-time hope just yet; we're about to lose it.
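
For a sense of scale, the ffplay flags describe 16-bit little-endian samples (-f s16le), one channel (-ac 1), at 20,000 samples per second (-ar 20k). The raw data rate follows directly:

```python
def pcm_bytes_per_second(sample_rate_hz: int, channels: int, bytes_per_sample: int) -> int:
    """Raw PCM data rate: samples per second x channels x bytes per sample."""
    return sample_rate_hz * channels * bytes_per_sample


if __name__ == "__main__":
    rate = pcm_bytes_per_second(20_000, 1, 2)  # s16le = 2 bytes, mono, 20 kHz
    print(rate, "bytes/s =", rate * 8 / 1000, "kbit/s")  # 40000 bytes/s = 320.0 kbit/s
```

That is small by WiFi standards, so raw bandwidth is probably not the bottleneck. At one byte per sample, as 8-bit delta coding would give, it halves to 20,000 bytes/s.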

The part of the project I'd actually be writing myself was the simple one: dockerizing the program. On top of that, we'd have to restream the audio as well. That's certainly no problem for us!

The idea here is to use a multistage build. We'll use a regular Alpine Linux container to build the UDP server program and then copy the binary into the container that will actually run it. That runtime container is FFMPEG version 3.3, because it still includes FFServer (FFServer was dropped from FFMPEG as of version 4.0). The power of FFServer, just like the power of FFMPEG, is that it's both exceptionally flexible and free. I wish they had kept maintaining it, but for now we'll have to stick to an old version of FFMPEG that still ships with FFServer.

Our container, despite best practices, will be running three components. The first is the UDP server we compile in the first stage. It pipes its samples into FFMPEG, which encodes them and pushes them into FFServer's feed buffer (an FFM file). FFServer then streams from that buffer to anybody asking for it. We'll be doing this in OGG format because it's streamable and because I feel like it. This buffering introduces a significant delay, but there is still no pitch skewing and very few blips in the audio.
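
The three-process layout can be sketched as a small launcher. It's written in Python only to keep these examples in one language (a shell entrypoint would do the same job), and several details are assumptions: the feed URL http://localhost:8090/feed1.ffm follows the port and feed name used in FFServer's sample configuration and must match whatever ffserver.conf actually declares, /etc/ffserver.conf is a hypothetical path, and the one-second sleep is a crude stand-in for waiting until FFServer is listening. The OGG output format would be declared in ffserver.conf, not on FFMPEG's command line:

```python
"""Sketch of the container's three processes: udpserver -> ffmpeg -> ffserver.

FEED_URL and FFSERVER_CONF are assumptions; they must match ffserver.conf.
"""

import subprocess
import time
from typing import List

FEED_URL = "http://localhost:8090/feed1.ffm"  # assumed to match ffserver.conf
FFSERVER_CONF = "/etc/ffserver.conf"          # hypothetical path


def ffserver_cmd(conf: str = FFSERVER_CONF) -> List[str]:
    """FFServer reads its feeds and streams from the given config file."""
    return ["ffserver", "-f", conf]


def ffmpeg_cmd(feed_url: str = FEED_URL, rate: int = 20000) -> List[str]:
    """Describe the raw input exactly as the ffplay test did: s16le, mono, 20 kHz."""
    return ["ffmpeg", "-f", "s16le", "-ac", "1", "-ar", str(rate),
            "-i", "pipe:0", feed_url]


def run_pipeline(udpserver: str = "./udpserver") -> None:
    server = subprocess.Popen(ffserver_cmd())
    time.sleep(1)  # crude: give ffserver a moment to open its port
    source = subprocess.Popen([udpserver], stdout=subprocess.PIPE)
    encoder = subprocess.Popen(ffmpeg_cmd(), stdin=source.stdout)
    source.stdout.close()  # so udpserver gets SIGPIPE if ffmpeg exits
    encoder.wait()
    source.terminate()
    server.terminate()
```

Calling run_pipeline() would be the container's entrypoint in this sketch. Running three processes in one container goes against the usual one-process-per-container advice, as noted above, but it keeps the pipe between udpserver and FFMPEG trivial.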

This Gist contains the code for the whole project. Now we move on to the main event: does it work with the intercom system in my building?


This is the back of the panel. Ideally, we can just hook our input wires to the terminals that the "Listen" button is wired to. We'll assume for the sake of this exercise that there is a good way to power this setup (currently there isn't; I'm using a battery bank). Is it safe to connect directly to the ADC? Sure, why the heck not?


I've done my best to figure out which contact on the board does what. Take this list with a whole Utah full of salt:

  • D - Door release?
  • "-" - Not sure what this one is yet. It could be the reference ground for the door release and voice output.
  • PT - Voice output? (Could stand for Push to Talk)
  • T - Likely audio input (Could stand for Talk)
  • SIG - Pretty sure this is the buzzer noise you hear when somebody pushes your unit's doorbell
  • C - Likely audio common
So, going with this best guess, I attached the inputs to T and C. This yielded a disgusting noise that bore no resemblance to the sounds outside. Even with some resistors to bring the volume down, it was the same noise, just quieter, so the problem wasn't that I was clipping the ADC. When there's a loud sound (like the door closing), you can hear some disruptions to the pattern, but the signal is completely drowned out by the noise.

I'm not exactly sure why this is happening. It's possible that the speaker's resistive load is acting as some sort of filter on the incoming audio. This speaker is rated at 15 ohms and 0.5 watts, and every speaker behaves like a combination of a resistor and an inductor to some degree. So maybe I need to add some load and inductance to the line? I don't know. I don't have an inductor on hand to try it out, and working on this makes me nervous because I don't want to break the entire building's intercom. So I decided to take the L and wrap the project up until I'm brave enough to figure out how to pull the audio out of the noise.
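
The speaker rating also gives a rough upper bound on the voltages at those terminals. A back-of-the-envelope sketch, assuming the speaker is a pure 15-ohm resistor running at its full rated 0.5 W (a worst case, not a measurement) and a hypothetical 3.3 V reference on the ADC:

```python
import math


def rms_voltage(power_w: float, impedance_ohm: float) -> float:
    """V_rms = sqrt(P * R), from P = V_rms**2 / R.

    Treats the speaker as a pure resistor, which ignores its inductance.
    """
    return math.sqrt(power_w * impedance_ohm)


def sine_peak(v_rms: float) -> float:
    """Peak voltage of a sine wave with the given RMS value."""
    return v_rms * math.sqrt(2)


if __name__ == "__main__":
    v_rms = rms_voltage(0.5, 15)  # about 2.74 V RMS at full rated power
    v_peak = sine_peak(v_rms)     # about 3.87 V peak, 7.75 V peak-to-peak
    vref = 3.3                    # hypothetical ADC reference voltage
    print(f"{v_rms:.2f} V rms, {v_peak:.2f} V peak, exceeds {vref} V: {v_peak > vref}")
```

At full rated power the peak would exceed a 3.3 V reference. Actual intercom levels are likely far lower, so this is only a ceiling; measuring the line would tell for sure.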


As always, there are additional smaller things in this project that I'd like to get to that I just didn't have time for:
  • Fix the buffer problem - the UDP server doesn't deliver samples at exactly the 20kHz that FFMPEG expects. With FFPlay the problem goes away, because playback is real time. With FFServer, samples get buffered in the FFM file, so any deviation from 20kHz can affect the latency. I'm not sure how to fix that yet, but removing silence detection seemed to help.
  • Talkback feature and remote unlocking - wouldn't it be cool if I could talk back through the speaker and trigger the "unlock" button remotely? The former sounds much harder than the latter.
  • Make this thing more resistant to unstable power - the difference between a small USB power brick and my computer's USB port is astounding. Even with a couple of capacitors across the 5V line, the ADC (I assume) isn't very tolerant of noisy power. This can probably be fixed.
  • Make it easier to expand - the UDP server is hardcoded to a single port. Maybe the UDP server could accept multiple inputs, or we could run multiple UDP servers and pipe them all through FFServer. Either way, there's probably a better approach than just spinning up several containers.
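
To put numbers on the buffer problem: if the transmitter delivers slightly more or fewer samples per second than the 20kHz FFMPEG is told to expect, the difference turns into time. A sketch, where the function names are hypothetical and the 19,900 Hz and 20,100 Hz rates are purely illustrative, not measurements from this project:

```python
def effective_rate_hz(samples_received: int, elapsed_s: float) -> float:
    """Measured sample rate: samples counted at the receiver over a wall-clock interval."""
    return samples_received / elapsed_s


def backlog_change_s(actual_hz: float, nominal_hz: float, wall_s: float) -> float:
    """Seconds of audio gained (+) or lost (-) relative to real time over wall_s.

    The source produces actual_hz * wall_s samples, but the encoder treats them
    as nominal_hz audio, so they represent actual_hz * wall_s / nominal_hz seconds.
    Positive: a backlog builds up and latency grows. Negative: the consumer starves.
    """
    return wall_s * (actual_hz / nominal_hz - 1.0)


if __name__ == "__main__":
    # Hypothetical rates, not measurements from this project.
    for actual in (19_900, 20_100):
        print(actual, "Hz:", round(backlog_change_s(actual, 20_000, 3600), 1), "s per hour")
```

A 0.5% rate error sounds tiny, but it adds up to 18 seconds per hour in one direction or the other, which is the kind of creeping latency a buffer like the FFM file can accumulate.
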
All in all, it was a fun project. I wish I'd done more of it from scratch, but it would have taken me twice as long to figure out the UDP and ADC stuff. So again, a big shout-out to Sven337. His code was a great starting point for this little project.
