[Audio] Configure sample rate of BufferRecorderNode


#1

Hi all,

It’s my first time using the audio API in Cinder. I can’t seem to find how to control the sample rate.

My application is very simple audio-wise: after a trigger I want to record a snippet of audio (about 10 seconds) and transmit it over the network. I’m working with voice audio, so the sample rate can be lowered to 16 kHz to minimize data.

I see BufferRecorderNode has a getSampleRate function, whose documentation mentions that the sample rate “is governed by the Context’s OutputNode”. But 1) I don’t have an OutputNode and 2) I can’t find such a function there either…

I find it confusing that BufferRecorderNode does not inherit from OutputNode, isn’t it also an endpoint for audio?

Can someone help me out?
Thanks!

My code:

auto ctx = audio::Context::master();

// use mono processing
auto format_mono = audio::Node::Format().channels( 1 );

auto input = ctx->createInputDeviceNode( audio::Device::getDefaultInput(), format_mono );

mRecorder = ctx->makeNode( new ci::audio::BufferRecorderNode( format_mono ) );
mRecorder->setNumSeconds( CAPTURE_LENGTH );

input >> mRecorder;

input->enable();
ctx->enable();

#2

I think you need to update the sample rate of the Device Manager.
https://forum.libcinder.org/topic/changing-audio-speed-sample-rate#23286000002683005


#3

Thanks for the link!

So the simplest way to change the sample rate seems to be:

auto device = ctx->deviceManager()->getDefaultInput();
device->updateFormat( audio::Device::Format().sampleRate( 48000 ) );

Unfortunately this is limited to the sample rates supported by your hardware (in my case only 44.1 and 48 kHz).
It would be nice if you could configure the context to work at a rate independent from the hardware, but I can imagine that could get quite complicated.

For now, I’ll see if I can override BufferRecorderNode::writeToFile to accept an extra parameter.


#4

@koenaad you seem to be figuring it out, but here’s my understanding: audio::dsp::Converter is likely what you’re looking for. On OS X / iOS, setting the samplerate on an audio::Device is currently limited to what your hardware supports, as it instructs Core Audio to process audio at that rate. On other platforms, both OutputDeviceNodes and InputDeviceNodes use an audio::dsp::Converter internally, and I believe that implementation supports pretty much any samplerate. But that will also affect what you hear (are you also playing your sound out of some speakers?). If this is just to compress and send over the 'net, maybe you should get the audio::Buffer back from BufferRecorderNode and run it through the Converter yourself.
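The buffer-plus-Converter route boils down to resampling the recorded frames before serializing them. Cinder’s audio::dsp::Converter does this properly (with filtering); the sketch below is plain self-contained C++ with no Cinder dependency, using naive linear interpolation only to illustrate the idea of a 48 kHz to 16 kHz reduction, so the function name and approach are illustrative rather than Cinder’s actual implementation:

```cpp
#include <cstddef>
#include <vector>

// Naive linear-interpolation resampler, illustrative only. A real
// converter (such as cinder's audio::dsp::Converter) low-pass filters
// first to avoid aliasing when downsampling.
std::vector<float> resampleLinear( const std::vector<float> &src,
                                   double srcRate, double dstRate )
{
    if( src.empty() )
        return {};
    const double ratio = srcRate / dstRate; // e.g. 48000 / 16000 = 3
    const size_t dstFrames = static_cast<size_t>( src.size() / ratio );
    std::vector<float> dst( dstFrames );
    for( size_t i = 0; i < dstFrames; ++i ) {
        double pos = i * ratio;                 // fractional source index
        size_t i0 = static_cast<size_t>( pos );
        size_t i1 = ( i0 + 1 < src.size() ) ? i0 + 1 : i0;
        double frac = pos - static_cast<double>( i0 );
        dst[i] = static_cast<float>( src[i0] * ( 1.0 - frac ) + src[i1] * frac );
    }
    return dst;
}
```

Downsampling 10 seconds of mono 48 kHz audio (480,000 frames) this way yields 160,000 frames, a third of the data before any compression is applied.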

Also makes me think writing to an .ogg file would suit your situation, as it’d be much, much smaller. This isn’t implemented yet (only reading oggs is supported at this time), but it wouldn’t take a huge effort to add a TargetFileOggVorbis; all the ogg source is in there to do it (and a commented-out shell here). It’s definitely something I’d like to see get done, anyway.

I find it confusing that BufferRecorderNode does not inherit from OutputNode, isn’t it also an endpoint for audio?

Currently the only OutputNode implementation is OutputDeviceNode. BufferRecorderNode falls into the ‘auto-pullable’ category, along with MonitorNodes. At the time I designed it, I was under the impression that you could only ever have one OutputDeviceNode running at any given time, and that this node essentially drives the dsp graph within an audio::Context. The Context also keeps a list of NodeAutoPullables, which it updates each time the OutputDeviceNode gets a system callback. Knowing what I do now, this design is limiting on many platforms (for example, on Windows most multi-channel soundcards present themselves to the OS as multiple stereo pairs of audio devices), and I’m planning to change it. I think your confusion is worth considering; perhaps the idea of NodeAutoPullable should be removed, and everything that currently inherits from it should be an OutputNode. This isn’t at the top of my todo list, but it’s getting there, and I’m collecting notes from situations such as this.

Please keep me updated, and let me know what you find!

cheers,
Rich


#5

Hi @rich.e, I’m not playing back any audio (yet). The only nodes used are an InputDeviceNode and the BufferRecorderNode.
So even though I’m not actively using any OutputDeviceNode, the default OutputDeviceNode is still responsible for pulling audio through the context?

The documentation was especially confusing because it mentions everywhere that the sample rate is defined by the OutputDeviceNode, while I didn’t use one.
But I guess it’s pretty rare to create a Cinder app without an OutputDeviceNode…

I’ve now adapted BufferRecorderNode::writeToFile to accept an extra parameter sampleRate, using a Converter internally. Is this something that would fit into mainline Cinder? I could create a PR if desired.

Using .ogg files seems a good idea as well, I’ll check it out.


#6

The documentation was especially confusing because it mentions everywhere that the sample rate is defined by the OutputDeviceNode, while I didn’t use one.
But I guess it’s pretty rare to create a Cinder app without an OutputDeviceNode…

Well, it’s more that so far we’ve focused on covering the main use case of real-time audio i/o, with a design intended to be extended in the future to support more advanced use cases, such as offline processing, spectral graphs, etc. You can read about the audio::Context here, which briefly mentions the role of its one and only OutputNode.

I’ve now adapted BufferRecorderNode::writeToFile to accept an extra parameter sampleRate using an Converter internally. Is this something that would fit into mainline Cinder? I could create a PR if desired.

Nice! I think it’d be very useful to be able to specify the output samplerate when writing to a file. I think this functionality (and the audio::dsp::Converter) should live on the audio::Target itself, similar to what I’ve done with audio::Source. You can see here where I enable a converter if the file’s native samplerate doesn’t match the desired raw PCM samplerate. Then I agree, BufferRecorderNode::writeToFile can get an additional optional argument to specify the samplerate.

cheers,
Rich


#7

Alright thanks, I’m starting to get the hang of it!

This is how I implemented resampling within BufferRecorderNode: github commit.
I thought this was quite an elegant solution, since the resampling is performed while copying the buffer. But I get that performing the resampling in audio::Target is more consistent and more useful to other classes.

I’m willing to have a look at it and implement it there. I’m new to the Cinder framework (and contributing via Github in general) so some guidance is welcome :wink:

The most logical implementation would be an additional TargetFile::write overload with a new parameter size_t sampleRate, the sample rate of the buffer being written. Does that seem good?
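To make the proposal concrete, here is a small self-contained model of that overload. The names follow the thread’s proposal but are a sketch, not Cinder’s actual API; the bool return value stands in for “a dsp::Converter had to be constructed and run”, just so the decision logic is visible and testable:

```cpp
#include <cstddef>

// Stub standing in for cinder's audio::Buffer; sample data omitted.
struct Buffer { };

// Model of the proposed TargetFile interface (sketch, not real Cinder
// code). The file knows its own samplerate; the new overload receives
// the buffer's native rate and decides whether conversion is needed.
class TargetFile {
  public:
    explicit TargetFile( size_t fileSampleRate ) : mSampleRate( fileSampleRate ) {}

    // existing behavior: the buffer is assumed to match the file's rate
    bool write( const Buffer *buffer ) { return write( buffer, mSampleRate ); }

    // proposed overload: returns true when a rate conversion would be
    // required, standing in for setting up and running a converter
    bool write( const Buffer *buffer, size_t sampleRate )
    {
        (void)buffer; // sample data not modeled in this sketch
        return sampleRate != mSampleRate;
    }

  private:
    size_t mSampleRate;
};
```

The design point this models: the caller only states the buffer’s rate, and the target owns the converter, which is the split Rich suggested for audio::Target.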


#8

Really cool to see you figured out a solution! I’d love to work with you on getting this feature into core.

Re-reading your original need: are you sending the .wav data uncompressed over the network (as in, serializing the samples in the resulting audio::Buffer)? If so, I can see why you added this functionality directly to BufferRecorderNode. I’d like to discuss the options and use cases here, as there are a few directions this could go. I still like the idea of having a Converter directly at the audio::Target layer, but that will only cover output files. Then again, wouldn’t you want to send a compressed audio file over the net anyhow, or is encode/decode latency a problem?

Another thing worth considering is whether to avoid the expense of creating a Converter each time the file is written, by specifying the output sample format when BufferRecorderNode is constructed and then allocating the Converter once in BufferRecorderNode::initialize().

I’d like to take this discussion to github, or feel free to email me directly at rich.eakin@gmail.com if you like. Also, the beauty of this Node system is that you can run with your modified BufferRecorderNode for your current project, and we can work through the best way to get this feature into core in a manner that is robust and forward-thinking.

cheers,
Rich


#9

My use case might be a bit unorthodox… my application uses automatic speech recognition. When you hit record, the app records a snippet of audio (about 12 seconds) and sends it to the cloud to be recognized.

I’m currently using the Google Speech API. Google has a number of official client libraries, but unfortunately none of them are in C or C++. The client libraries are easier to authenticate with, and you don’t need to enable billing.
So I decided to use a separate Python script (which I had already made for a separate project, btw).

The procedure is:

  • record a snippet of audio with BufferRecorderNode
  • write it to disk
  • run the Python script in a separate thread (using popen)
  • the script picks up the file and sends it to Google Cloud
  • a string with the results is returned to Cinder
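The popen step above can be sketched in plain, self-contained C++ (the script name in the usage example is a placeholder, not from the actual project):

```cpp
#include <array>
#include <cstdio>
#include <stdexcept>
#include <string>

// Run a shell command and capture its stdout as a string. In the app this
// would run on a worker thread so the audio/UI thread isn't blocked while
// the recognition script talks to the cloud.
std::string runAndCapture( const std::string &cmd )
{
    std::array<char, 256> chunk;
    std::string result;
    FILE *pipe = popen( cmd.c_str(), "r" );
    if( ! pipe )
        throw std::runtime_error( "popen() failed" );
    while( fgets( chunk.data(), static_cast<int>( chunk.size() ), pipe ) )
        result += chunk.data();
    pclose( pipe );
    return result;
}
```

For example, `runAndCapture( "python recognize.py capture.wav" )` would return whatever transcription the script prints to stdout (script and file names hypothetical).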

Their best practices mention using uncompressed audio (16-bit linear PCM or FLAC) at 16 kHz or higher.
I thought it would be useful if BufferRecorderNode had resampling functionality, and I integrated it there without really considering the project as a whole. Looking at the bigger picture, resampling in TargetFile would indeed be more useful.
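The “16-bit linear” requirement means converting float samples (nominally in [-1, 1], as Cinder’s buffers hold) to signed 16-bit integers before writing. A self-contained sketch of that conversion, using one common symmetric scaling convention rather than Cinder’s exact internal code:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Convert float samples in [-1, 1] to 16-bit linear PCM by clamping and
// scaling by 32767. Encoders differ slightly in how they map the
// extremes; symmetric scaling is just one widely used convention.
std::vector<int16_t> floatToPcm16( const std::vector<float> &samples )
{
    std::vector<int16_t> out;
    out.reserve( samples.size() );
    for( float s : samples ) {
        float clamped = std::max( -1.0f, std::min( 1.0f, s ) );
        out.push_back( static_cast<int16_t>( std::lround( clamped * 32767.0f ) ) );
    }
    return out;
}
```

At 16 kHz mono this comes to 32 kB per second of audio, which keeps a 12-second snippet around 384 kB even before any FLAC compression.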

What is the best way to start a discussion on github? Open a new issue?


#10

Doesn’t sound that crazy to me. :slight_smile: Yeah, I suppose writing either to raw or .wav format makes sense; I’d think .wav would be easier to handle on the Python end (if it supports reading .wav files rather than raw PCM buffers)? I’m not entirely opposed to the resampler living in BufferRecorderNode, though it still strikes me as a necessary piece of the audio::Target system (whether the target is a file or, hopefully, fancier things in the future).

I think either a github issue or PR is appropriate, and easier to keep track of than this forum.

cheers,
Rich


#11

The Python script can handle both .wav files and raw PCM streams. But if you’re using .wav files, it can deduce the encoding from the data header, which is a bit easier.

Apologies for the delay; I’ve made a PR with an implementation of the sample rate converter inside audio::TargetFile (though I’m not completely happy with it).
We can further discuss it here: https://github.com/cinder/Cinder/pull/1869