Undocumented Code: Making a Smart Speaker

The Orange Pi Zero came in the mail yesterday, so I guess we're ready to move on to the next part in the design of the smart speaker: audio input and output. We won't cover any of the intelligence behind the AI right now because all I want to do is get audio out of the Orange Pi and into the speaker at a reasonable volume, and maybe get voice recognition to work. By the end of this, we should have something that can play music files and maybe transcribe what I say. We'll see.

This is the Orange Pi Zero, a Raspberry Pi knockoff from China. It's much cheaper than a full RPi, and this one has exactly the features I want:

It's small
It has built-in WiFi
It has Ethernet and USB ports for easy expansion
It has an interface header with USB, Line Out, and Mic In
Did I mention it was cheap?

I downloaded an Ubuntu Server image from Armbian and wrote it to an SD card. The Orange Pi takes a little while to boot up, but once it does it seems like a competent system. I'm going to run it on Ethernet for now, but when it's all built together, you'll only need Ethernet for the initial setup (maybe not even that).

So the first thing was to get audio out of there and into the speaker. The pinout from the speaker to the volume board is very graciously marked on the volume board's PCB. It has 5 pins: VC-0, VCC, L, R, and GND. To turn the speaker on, VC-0 and VCC need to be shorted together. The speaker runs at 12 volts, so to run everything from one supply I'll need to step that down for the Orange Pi somehow. L, R, and GND are fairly self-explanatory.

This is taken from the manufacturer's pretty decent documentation. So I attached the line outs and ground to the pin header and turned everything on. The first thing I noticed was a 60 Hz hum coming from the speaker. I assume this is because the rectifier doesn't smooth out all of the bumps. There is a capacitor on the volume board (the one with the power switch) that I assume does this smoothing. I'll add that to the final design later on; I think I can solve this with a single capacitor bridging VCC and GND.
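As a rough sanity check on the single-capacitor idea, the usual approximation for ripple on a rectified supply is delta_V = I / (f * C). The sketch below turns that around to estimate a capacitor size. The load current, ripple frequency, and acceptable ripple are all assumed values for illustration, not measurements from this speaker.

```python
def smoothing_capacitance(load_current_a, ripple_freq_hz, max_ripple_v):
    """Return the capacitance in farads that keeps ripple under max_ripple_v.

    Uses the approximation delta_V = I / (f * C), which assumes the
    capacitor alone supplies the load between charging peaks.
    """
    if ripple_freq_hz <= 0 or max_ripple_v <= 0:
        raise ValueError("frequency and ripple must be positive")
    return load_current_a / (ripple_freq_hz * max_ripple_v)


if __name__ == "__main__":
    # Assumed values: 0.5 A load, 60 Hz ripple, 0.5 V of acceptable ripple.
    # A full-wave rectifier on 60 Hz mains would ripple at 120 Hz instead.
    farads = smoothing_capacitance(0.5, 60.0, 0.5)
    print(f"{farads * 1e6:.0f} uF")
```

If the hum turns out to be a ground loop rather than supply ripple, no capacitor value will fix it, so this is only a starting point.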

When the board finally booted, I scp'd over a WAV file and it played very well. The only potential issue I see is that the volume has to be over 50% to be even close to audible. That being said, apart from the 60 Hz hum that's taking out much of the bass, the sound quality is pretty good.
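For reference, playback from a script can be as simple as shelling out to ALSA's `amixer` and `aplay`. This is a minimal sketch, not the exact commands I ran: the card index `0`, the mixer control name `Line Out`, and the 80% default are assumptions, and `amixer scontrols` will list the controls that actually exist on a given image.

```python
import subprocess


def volume_command(percent, control="Line Out", card=0):
    """Build the amixer command that sets a mixer control to a percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # The control name is an assumption; it differs between boards and kernels.
    return ["amixer", "-c", str(card), "sset", control, f"{percent}%"]


def play_command(wav_path, card=0, device=0):
    """Build the aplay command for a WAV file on the given card and device."""
    # plughw lets ALSA convert sample rate and format if the hardware needs it.
    return ["aplay", "-D", f"plughw:{card},{device}", wav_path]


def play(wav_path, percent=80):
    """Set the volume, then play the file; raises if either tool fails."""
    subprocess.run(volume_command(percent), check=True)
    subprocess.run(play_command(wav_path), check=True)
```

Calling `play("test.wav")` (a hypothetical file name) would set the volume and then block until playback finishes.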

The next step was trying to get a USB microphone to work. This, too, was very painless with the built-in ALSA system. I'm eventually going to use a USB microphone anyway, so I plugged in my Blue Snowball, and the system detected and identified it just fine (aren't USB standards just great?). I recorded a WAV and played it back. A potential issue here is that the recording was very quiet. This might be the fault of the microphone, I'm not sure.
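To put a number on "very quiet", one option is to measure the peak and RMS level of the recording relative to full scale. The sketch below does that for a 16-bit PCM WAV using only the standard library; the function name is my own, and it assumes the recording was saved as 16-bit samples.

```python
import array
import math
import sys
import wave


def wav_levels_dbfs(path):
    """Return (peak_dbfs, rms_dbfs) for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM is handled in this sketch")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    if not samples:
        raise ValueError("file contains no audio")

    full_scale = 32768.0
    peak = max(abs(s) for s in samples) / full_scale
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / full_scale

    def to_db(value):
        return 20 * math.log10(value) if value > 0 else float("-inf")

    return to_db(peak), to_db(rms)
```

A peak far below 0 dBFS would suggest the capture gain is set low rather than the microphone being at fault, though that is a guess until the gain is actually checked.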

The final step in seeing if everything was working was trying to get voice recognition to work. I had heard some really good things about Steven Hickson's AUI project, so I decided to give it a shot. I followed the instructions in the readme, but there were some weird dependency issues. So I tried the streaming Google Speech API from Node.js instead (which is what I wanted to write this in to begin with).

Once I figured out all of the authentication stuff (using a default account), the sample given for streaming live audio in real time worked pretty well!
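Assuming the "default account" setup here means Google's Application Default Credentials, the client library looks for a service-account key file at the path in the `GOOGLE_APPLICATION_CREDENTIALS` environment variable. The sketch below is a small pre-flight check I would run before starting the recognizer; the function name and the list of required fields are my own choices, based on what a service-account key file normally contains.

```python
import json
import os

# Fields a service-account key file normally carries (assumed sufficient here).
REQUIRED_KEYS = ("type", "project_id", "private_key", "client_email")


def check_default_credentials(environ=os.environ):
    """Return the key file path if it looks usable, else raise ValueError."""
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise ValueError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    try:
        with open(path, encoding="utf-8") as handle:
            key = json.load(handle)
    except (OSError, json.JSONDecodeError) as exc:
        raise ValueError(f"cannot read key file {path}: {exc}") from exc
    missing = [name for name in REQUIRED_KEYS if name not in key]
    if missing:
        raise ValueError(f"key file is missing fields: {', '.join(missing)}")
    return path
```

This only checks that the file exists and has the expected shape; it does not confirm that Google will accept the key.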

This method works really well. With a faster network connection the transcription would come back quicker, but as it stands it's fairly usable. The only downsides are that I still need to figure out hotword detection, and I would really like to stay under Google's free tier of 60 min/month. Other than that, I'm fairly impressed with what this has been able to do in such a short amount of time. Here are the next steps for next time:

Thermal Regulation: the board slows down and sometimes freezes because the chip gets too hot, so I'll need a heat sink or something like that.
WiFi Connection Setup
Fix it to the subwoofer box and make the wiring a little more permanent
Write a system that does hotword detection and command processing

That last one will be a doozy and will probably be a whole part all on its own, but until then, I think this has been a great success with some great progress!
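For the thermal item in the list above, a first step before buying a heat sink could be to log the chip temperature while it runs. The sketch below reads the standard Linux sysfs thermal zone; the zone index `0` and the 70 °C limit are assumptions, and I have not confirmed which units this particular Armbian image reports.

```python
def read_cpu_temp_c(path="/sys/class/thermal/thermal_zone0/temp"):
    """Read the SoC temperature in degrees Celsius from sysfs."""
    with open(path, encoding="ascii") as handle:
        raw = int(handle.read().strip())
    # Mainline kernels report millidegrees; some older vendor kernels
    # report whole degrees. Treat anything above 1000 as millidegrees.
    return raw / 1000.0 if raw > 1000 else float(raw)


def is_too_hot(temp_c, limit_c=70.0):
    """Return True when the temperature is at or above the assumed limit."""
    return temp_c >= limit_c
```

Polling `read_cpu_temp_c()` every few seconds during a transcription run should show whether the freezes line up with temperature spikes.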


Saturday, September 30, 2017

Making a Smart Speaker - Part 3
