A Detailed Walk Through PitchClient#

PitchClient is an example of a Client that takes audio input and produces streaming, non-audio output (a stream of lists in Max and Pure Data, control-rate outputs in SuperCollider). Let’s walk through it.

PitchClient.hpp Overview#

In common with all Clients, PitchClient.hpp is built from a few basic blocks:

  • A namespace within fluid::client, in this case fluid::client::pitch

  • An enum for indexing the Client’s parameters

  • A constexpr variable that describes the parameters

  • The Client’s class, in this case PitchClient, that inherits from FluidBaseClient and tag classes that describe its input and output types

  • A type alias that wraps the Client in the ClientWrapper template

In addition, because PitchClient has an offline counterpart which works on host buffers, there are further blocks that describe the parameters and aliases needed to automatically generate this offline version.

Let’s go through these. As we go, we’ll also highlight (ahem, confess to) bits that are currently somewhat smelly.

The Namespace#

namespace fluid {
namespace client {
namespace pitch {

This is simple. By convention all Clients live within fluid::client. Because of the unfortunately complex way that parameters are handled, they also all have their own sub-namespace to help prevent name collisions (this applies mostly to the contents of the indexing enum that we’re about to meet).

The enum#

enum PitchParamIndex {
  kSelect,
  kAlgorithm,
  kMinFreq,
  kMaxFreq,
  kUnit,
  kFFT
};

Parameters in Clients are, at heart, wrapped around C++’s heterogeneous container std::tuple and, as such, need to be retrieved using a compile-time constant integer. By convention, we use an enum for this because it allows us to use labels that are more memorable and less error-prone than trying to remember the numbers.

Code Smell

The labels of plain enums like these get injected into whatever namespace they’re in without qualification. Hence the need for the namespace: otherwise code that might use more than one Client header could end up with collisions from these labels.

Having the enum before the next block (the parameter declaration) is also a matter of convention. For Clients with complex parameter declarations that need to reference other parameters, it’s helpful (because we can use the labels), as we’ll see below.
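To make the tuple indexing concrete, here is a minimal standalone sketch. It is not FluCoMa code: the namespace sketch, the enum DemoParamIndex and the contents of the tuple are all made up for illustration, and only std::tuple and std::get are real standard library facilities.

```cpp
#include <string>
#include <tuple>

namespace sketch { // hypothetical, stands in for a per-Client namespace

// Plain enum: its labels convert implicitly to integral constants
enum DemoParamIndex { kGain, kName, kEnabled };

// Heterogeneous storage: each slot has its own type
using DemoParams = std::tuple<double, std::string, bool>;

inline DemoParams makeDemoParams()
{
  return DemoParams{0.5, "demo", true};
}

// std::get needs a compile-time constant index; the enum label supplies it
inline double gainOf(const DemoParams& p) { return std::get<kGain>(p); }

} // namespace sketch
```

Because the labels sit inside namespace sketch, a second header could declare its own kGain in a different namespace without a clash, which is the reason for the per-Client namespace described above.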

Parameters Declaration#

constexpr auto PitchParams = defineParameters(
    ChoicesParam("select","Selection of Outputs","pitch","confidence"),
    EnumParam("algorithm", "Algorithm", 2, "Cepstrum",
              "Harmonic Product Spectrum", "YinFFT"),
    FloatParam("minFreq", "Low Frequency Bound", 20, Min(0), Max(10000),
               UpperLimit<kMaxFreq>()),
    FloatParam("maxFreq", "High Frequency Bound", 10000, Min(1), Max(20000),
               LowerLimit<kMinFreq>()),
    EnumParam("unit", "Frequency Unit", 0, "Hz", "MIDI"),
    FFTParam("fftSettings", "FFT Settings", 1024, -1, -1));

This dense little block specifies much of how the interface of PitchClient will be generated by a host (i.e. Max, PD, SuperCollider). In Max and PD, these declarations map to the attributes that the eventual object will expose; in SC, they describe the controls of the UGen (for the audio processing case) or the arguments for the processing method (in the offline case).

Note

What we end up with from this block is an object that describes the parameters. Under the hood, it is an instance of a specialization of a class template called ParameterDescriptorSet. That is: this object describes the parameters, rather than holding the values.

The variable PitchParams is a compile-time constant (constexpr) of automatically determined type (auto) that results from calling a function, defineParameters(). This function takes a variable number of arguments, each of which describes a parameter for the object.
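The general idea of a variadic function that yields a constexpr object describing parameters (rather than holding their values) can be sketched as follows. FloatDesc and describeAll are hypothetical stand-ins, and the sketch uses a bare std::tuple where the real code uses ParameterDescriptorSet, so treat it as an analogy only.

```cpp
#include <cstddef>
#include <tuple>

// Hypothetical descriptor: name, label and default, but no current value
struct FloatDesc
{
  const char* name;
  const char* label;
  double      defaultValue;
};

// Hypothetical stand-in for defineParameters(): a variadic function template
// that bundles any number of descriptors into a single object
template <typename... Descs>
constexpr std::tuple<Descs...> describeAll(Descs... d)
{
  return std::tuple<Descs...>(d...);
}

// The deduced type records how many descriptors there are, and of what kind
constexpr auto DemoDescriptors =
    describeAll(FloatDesc{"minFreq", "Low Frequency Bound", 20.0},
                FloatDesc{"maxFreq", "High Frequency Bound", 10000.0});

// So the descriptor count is available at compile time
constexpr std::size_t demoCount =
    std::tuple_size<decltype(DemoDescriptors)>::value;
```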

Note

The order of these parameter declarations can matter. In SuperCollider, it determines the order in which the server expects to receive them from a client (and crashing can happen if this isn’t followed). In Max and PD, it makes no difference for attributes, but parameters can sometimes also be flagged for use as arguments to an object, in which case the ordering used in the declaration dictates the ordering of the arguments.

Declaring a Parameter#

Each of the parameter descriptions results from a function call to a function that determines the type of the parameter, and takes arguments that specify it. Let’s break one down:

FloatParam("minFreq", 
          "Low Frequency Bound", 
          20, 
          Min(0), Max(10000), UpperLimit<kMaxFreq>())

This says that we want a parameter that’s a floating point number (FloatParam). The next argument is the name for the parameter that will be rendered in environments that use them (Max, PD) and (by convention) repeated in the sclang code that talks to this client. Following that is a longer descriptive label, as one would see in Max’s object inspector, and in generated documentation for this Client. This is followed by a default value for the parameter, so here we’re saying that the default low-frequency bound for the object is 20 Hz.

All parameter declarations have in common that the first two arguments are a name and a descriptive label. By convention the name is always camelCase – the Max and Pure Data wrappers will, in fact, convert this to lower case for attribute names, but the documentation generator for SuperCollider will use the original, to match up with the sclang class file (which, one day, will be generated). Remaining arguments differ for different types: what follows the name and label is always a default value, where that makes sense for the type, and then final arguments define constraints on that parameter (again, where that makes sense for the type).

Constraints#

What follows is a list of constraints for the parameter. Min and Max are, hopefully, quite self-explanatory: the absolute range for this parameter is 0 Hz to 10 kHz. The final entry, UpperLimit<kMaxFreq>, is more complex though: this is a constraint that says to reference the value of another parameter and use that to constrain the value of this one. In this case, it’s saying that the maxFreq parameter establishes an upper limit on the value of this one, so if maxFreq were lower than 10 kHz, then the effective maximum value of minFreq would also be lower than 10 kHz. This notation is why the enum above is helpful: we can see which other parameter is being referenced here.
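As a rough illustration of how the three constraints on minFreq combine, here is a sketch that assumes constraints are enforced by clamping (the text above doesn’t say how the real parameter system enforces them). constrainMinFreq is a hypothetical helper, not part of the library.

```cpp
#include <algorithm>

// Hypothetical helper mirroring the constraints declared on minFreq:
// Min(0), Max(10000) and an upper limit taken from the current maxFreq
inline double constrainMinFreq(double requested, double currentMaxFreq)
{
  const double absoluteMin = 0.0;
  const double absoluteMax = 10000.0;
  // The effective ceiling is whichever is lower: the fixed Max or maxFreq
  const double ceiling = std::min(absoluteMax, currentMaxFreq);
  return std::max(absoluteMin, std::min(requested, ceiling));
}
```

With maxFreq at 2000, a request for 5000 comes back as 2000; with maxFreq at 20000, the fixed Max of 10000 is what bites.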

Code Smell

This way of doing inter-parameter constraints is brittle and non-DRY, because the order of the parameters has to be replicated correctly between the enum and this block. It also adds considerable complexity to the underlying code and introduces technical constraints out of proportion to how often the facility is actually useful. It is likely to be replaced.

Parameter Types#

We can see that there are a number of different types of parameter being declared here in addition to FloatParam:

  • ChoicesParam: a set of options from which the user can choose any number, from one to the whole set. It is normally used to choose between the available outputs of Clients when there are multiple options. Here, one can decide to just get a pitch estimate, a confidence rating or both.

  • EnumParam: a set of options from which the user can select just one. In this case it’s used twice: once to specify which pitch estimation algorithm to use, and again to specify in what units to send the pitch output.

  • FFTParam: a set of numbers that represent settings for the FFT analysis used by this Client.

You’ll see that each function call has the first two arguments in common (the name and description), but that things diverge from there. For types where a default value makes sense, this is always the third argument. That applies everywhere here except for ChoicesParam (where the default is always ‘everything’).

For the EnumParam, the final arguments describe the set of options, and the default indicates the (zero-based) index for the default option:

EnumParam("algorithm", "Algorithm", 2, "Cepstrum",
              "Harmonic Product Spectrum", "YinFFT")
EnumParam("unit", "Frequency Unit", 0, "Hz", "MIDI")

So for algorithm, the options are ["Cepstrum", "Harmonic Product Spectrum", "YinFFT"], and the default is 2: YinFFT.

For unit, the options are ["Hz", "MIDI"] and the default is 0: Hz.

Note

EnumParams are, in practice, just bounded non-negative integers. However, the labels can be used in richer environments, like Max, and for generated documentation.

Finally, the trailing numbers for FFTParam also spell out the defaults:

    FFTParam("fftSettings", "FFT Settings", 1024, -1, -1));

Here this is a series of numbers describing the window size, hop size and FFT size. There could also be a fourth number specifying the default maximum FFT size (up to a global max of 65536).

Code Smell

By convention this set of FFT defaults has been used everywhere, so should probably actually use a common function / object to make changing our mind easier, should it happen. There are also persistent questions about whether it would be better to have FFT defaults that are (more) optimal for the use case rather than generic (e.g. finer frequency resolution for sinusoidal analysis).

The Class#

PitchClient is defined as

class PitchClient : public FluidBaseClient, public AudioIn, public ControlOut

You’ll see it inherits from three other classes, FluidBaseClient, AudioIn and ControlOut. All Clients (currently) must inherit from FluidBaseClient (mostly for the parameter system to work). Meanwhile AudioIn and ControlOut are tag types that are used by the host wrappers to generate the appropriate code for this I/O configuration.
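For readers who haven’t met tag types before, here is a small sketch of the technique. The tag and client names are hypothetical (they are not the real AudioIn and ControlOut), and the real host wrappers may well detect tags differently; the point is just that an empty base class can be queried at compile time.

```cpp
#include <type_traits>

// Hypothetical empty tag types
struct AudioInTag {};
struct ControlOutTag {};
struct AudioOutTag {};

// Hypothetical client that advertises its I/O by inheriting from tags
struct DemoAnalysisClient : AudioInTag, ControlOutTag {};

// A wrapper can then branch at compile time on which tags are present
template <typename Client>
constexpr bool hasAudioIn()
{
  return std::is_base_of<AudioInTag, Client>::value;
}

template <typename Client>
constexpr bool hasAudioOut()
{
  return std::is_base_of<AudioOutTag, Client>::value;
}
```

Note that a tag can only say ‘this kind of port exists’: it carries no count, which is one of the limitations of the approach.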

Code Smell

This tag inheritance system isn’t really powerful enough to fulfil our needs:

  • It would be better to be able to specify the quantity of ports of different types at compile time, where possible

  • It would also be better to specify this information in a more similar way to other interface specifications, like parameters and messages

  • What combinations of I/O types are feasible is unfortunately host dependent, and enforcement of this is currently pretty ad hoc

The heart of the Client class is the process member function that actually does the work:

template <typename T>
void process(std::vector<HostVector<T>>& input,
              std::vector<HostVector<T>>& output, FluidContext& c)

Other important features:

  • A constructor: This is responsible for important things, like letting the algorithms in this client know how much memory to allocate up-front, and for declaring the number of different inputs and outputs

  • A bunch of boilerplate to do with the parameter system, for which I will apologize shortly

  • Member functions needed by the host wrapper

In the case of PitchClient there are also some declarations, before the public section of the class, that are used internally. We’ll address these when we look at process() in more detail, but first the boilerplate and the constructor.

Boilerplate#

Code Smell

Even needing a section on boilerplate is pretty smelly. A high priority ambition is to get rid of the need for this and to considerably streamline how Clients describe themselves to the outside world.

Each Client currently needs an unfortunate amount of boilerplate code to let the parameter magic work without resorting to macros. It’s quite embarrassing. Here it is for PitchClient, with some added inline comments:

// Advertises the C++ type of the ParameterDescriptorSet that we 
// defined above in `PitchParams`
using ParamDescType = decltype(PitchParams);

// Derives and advertises the type of the ParamSetView that actually 
// holds parameter values rather than descriptions 
using ParamSetViewType = ParameterSetView<ParamDescType>;

// Declares a member variable, mParams, that is a 
// std::reference_wrapper around an instance of the parameter values
// This needs to be killed with fire 
std::reference_wrapper<ParamSetViewType> mParams;

// Defines a member function used by host wrappers to set 
// the value of mParams (called before each invocation of process())
void setParams(ParamSetViewType& p) { mParams = p; }

// Defines a member function, used by PitchClient, to retrieve
// parameter values without having to write an absurd number of things
template <size_t N>
auto& get() const
{
  return mParams.get().template get<N>();
}

// Defines a *static* member function that returns the actual ParameterDescriptorSet instance, PitchParams 
static constexpr auto& getParameterDescriptors() { return PitchParams; }

Every Client needs some version of this, and it’s almost always identical except for the name of the variable that holds the ParameterDescriptorSet instance (here PitchParams). So, it’s not even just straightforward copypasta. I’m very sorry (but I do have a vision of how to get rid of it all).

So, the instructions for making the boilerplate for your own client come down to:

  1. Paste a block from an existing Client

  2. Find the two references to the parameter description object and change them to match what you declared above
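The one part of the boilerplate that may look odd is the std::reference_wrapper member. A plain reference member can’t be re-pointed after construction, whereas assigning to a std::reference_wrapper rebinds it, which is what setParams() relies on. A minimal sketch, with hypothetical DemoValues and DemoClient types standing in for the real ones:

```cpp
#include <functional>

// Hypothetical stand-in for a set of parameter values
struct DemoValues
{
  double minFreq;
};

// Hypothetical client holding a rebindable reference to values it doesn't own
class DemoClient
{
public:
  explicit DemoClient(DemoValues& v) : mValues(v) {}

  // Assigning to a std::reference_wrapper rebinds it to another object;
  // assigning through a plain reference would overwrite the referee instead
  void setValues(DemoValues& v) { mValues = v; }

  double minFreq() const { return mValues.get().minFreq; }

private:
  std::reference_wrapper<DemoValues> mValues;
};
```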

Constructor#

Here is the signature for the constructor:

PitchClient(ParamSetViewType& p, FluidContext& c)

This is the general form for all Clients. The arguments are a reference to a set of starting parameter values, of type ParamSetViewType (ewwwwww), and an instance of FluidContext, which is a container for important features about the current execution context, like what memory allocator to use and the current vector size.

The whole constructor for PitchClient:

PitchClient(ParamSetViewType& p, FluidContext& c)
    : mParams(p), 
      mSTFTBufferedProcess(get<kFFT>(), 1, 0, c.hostVectorSize(), c.allocator()),
      cepstrumF0(get<kFFT>().maxFrameSize(), c.allocator()),
      mMagnitude(get<kFFT>().maxFrameSize(), c.allocator()),
      mDescriptors(2, c.allocator())
{
  audioChannelsIn(1);
  controlChannelsOut({1,mMaxFeatures});
  setInputLabels({"audio input"});
  setOutputLabels({"pitch (hz or MIDI), pitch confidence (0-1)"});
}

So, after the declaration, there’s the member initialization list. This, uh, initializes the class members: here, the parameter values (mParams) and the various Algorithm objects that are used by PitchClient. Note that the algorithms all get passed the allocator from FluidContext, so that they allocate memory from the correct place (especially important for SuperCollider).

The body of the constructor then does a bunch of stuff that would be better done at compile-time, namely declaring the numbers of I/O ports and giving them some labels (for assistance in Max).

process(), at last#

As a reminder, here’s the signature for process:

template <typename T>
void process(std::vector<HostVector<T>>& input,
              std::vector<HostVector<T>>& output, FluidContext& c)

So this is a member function template, on this type T, which is the underlying type of these HostVector things of which we have two std::vectors. std::vector<HostVector<T>> is, in essence, a slightly convoluted way of saying either double** or float**, i.e. T will be either double or float (depending on the host), and the input and output arguments represent the arrays of data that we read from / write to. As with the constructor, we also have an instance of FluidContext.

Note

HostVector<T> is actually an alias for FluidTensorView<T,1>. The FluidTensorView class template acts as a wrapper around a T* that allows us to do multidimensional indexing relatively painlessly. It’s the main container type passed between FluCoMa functions, and what most Clients work with.

Code Smell

  • Really input should be const. Something to fix across the clients

  • std::vector<HostVector<T>> is a bit of a mouthful. We can do better and make a single container alias that hides this implementation detail

The steps that process() follows are widely repeated across many Clients:

  1. Check that there’s actually data to process

  2. assert any invariants, such as ensuring that the number of outputs is as needed

  3. See if certain key parameters have changed value, in which case some Algorithms may need to be reinitialized

  4. The actual work

  5. Transform and dispatch outputs

Steps 1 and 2 are pretty trivial:

if (!input[0].data() || !output[0].data()) return;
assert(controlChannelsOut().size && "No control channels");
assert(output[0].size() >= controlChannelsOut().size &&
        "Too few output channels");

These say that if either the inputs or the outputs are nullptr then we can finish: in some hosts this signals that they aren’t connected. This is left to the Client to decide because in some contexts a Client may still want to do some work even if it’s not connected. The assertions are more dramatic: they’re saying that we expect there to be at least enough outputs to support what we think we need, and if there aren’t then this is a catastrophic breach of contract and we terminate the program (in debug builds).

Step 3 looks like this:

if (mParamTracker.changed(get<kFFT>().frameSize(), sampleRate(), c.hostVectorSize()))
{
  cepstrumF0.init(get<kFFT>().frameSize(), c.allocator());
  mSTFTBufferedProcess = STFTBufferedProcess(get<kFFT>(), 1, 0, c.hostVectorSize(), c.allocator());
}

This uses a little utility called ParamValueTracker that we can use to track whether the values of some bunch of things have changed since last time we called it. In this case, if the FFT analysis settings, the sample rate or host vector size have changed, then the cepstrum algorithm and the STFT processor need to be reinitialized.
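The idea behind such a tracker is simple enough to sketch in a few lines. DemoTracker below is hypothetical and is not the real ParamValueTracker: in particular, it assumes that the very first call should report a change (so that initialization happens at least once), and that the set of tracked types is fixed by the class template arguments.

```cpp
#include <tuple>

// Hypothetical change tracker: remembers the last set of values it saw and
// reports whether the new set differs from it
template <typename... Ts>
class DemoTracker
{
public:
  bool changed(Ts... current)
  {
    std::tuple<Ts...> now(current...);
    // Assumption: the first call always counts as a change
    const bool differs = !mPrimed || now != mLast;
    mLast = now;
    mPrimed = true;
    return differs;
  }

private:
  std::tuple<Ts...> mLast{};
  bool              mPrimed{false};
};
```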

Note

By and large the Algorithms all expose this pattern of having a separate init() member function. This is to try and encourage a decoupling of up-front allocation (hopefully based on full knowledge of the maximum memory requirements over the object’s lifetime) from any initialization that an Algorithm needs to do before it can do useful work. Maybe it’d be better if an Algorithm’s storage were more explicitly decoupled from its behaviour (so we could just make a new instance instead, without doing lots of reallocations).

Step 4, then, the actual work:

FluidTensorView<double, 1> mags = mMagnitude(Slice(0,get<kFFT>().frameSize()));
        
mSTFTBufferedProcess.processInput(
    get<kFFT>(), input, c, [&](ComplexMatrixView in) {
      algorithm::STFT::magnitude(in.row(0), mags);
      switch (get<kAlgorithm>())
      {
      case 0:
        cepstrumF0.processFrame(mags, mDescriptors, get<kMinFreq>(),
                                get<kMaxFreq>(), sampleRate(),c.allocator());
        break;
      case 1:
        hps.processFrame(mags, mDescriptors, 4, get<kMinFreq>(),
                          get<kMaxFreq>(), sampleRate(), c.allocator());
        break;
      case 2:
        yinFFT.processFrame(mags, mDescriptors, get<kMinFreq>(),
                            get<kMaxFreq>(), sampleRate(), c.allocator());
        break;
      }
    });

First, into the variable mags we’re taking a slice out of a container of double that we allocated when the PitchClient was instantiated – remember that FluidTensorView is pointer-ish, so this isn’t an allocation, just making a wrapper around a portion of already allocated memory. We use this to store FFT magnitudes below.

The next portion might be a bit magic. The member variable mSTFTBufferedProcess (which could maybe be renamed) is an instance of a helper class that handles arranging input samples into overlapping windows and doing an FFT on these windows, i.e. an STFT. In this case we’re calling its member function template processInput(), which just says to do a forward FFT on the input samples; we don’t need to do an IFFT and overlap-add on an output frame because we’re not outputting audio. The signature looks something like:

template<typename ProcessFunc>
processInput(FFTParams, InputSamples, FluidContext, ProcessFunc);

That is, we pass it the current FFT settings, our current vector of input samples, the FluidContext we got passed into process() and a ProcessFunc, which should be a function object that takes a single ComplexMatrixView (in the input-only case). Here, as in all the Clients, we supply that function object through a lambda:

[&](ComplexMatrixView in) {
  algorithm::STFT::magnitude(in.row(0), mags);
  switch (get<kAlgorithm>())
  {
  case 0:
    cepstrumF0.processFrame(mags, mDescriptors, get<kMinFreq>(),
                            get<kMaxFreq>(), sampleRate(),c.allocator());
    break;
  case 1:
    hps.processFrame(mags, mDescriptors, 4, get<kMinFreq>(),
                      get<kMaxFreq>(), sampleRate(), c.allocator());
    break;
  case 2:
    yinFFT.processFrame(mags, mDescriptors, get<kMinFreq>(),
                        get<kMaxFreq>(), sampleRate(), c.allocator());
    break;
  }
}

Note

ComplexMatrixView is an alias for FluidTensorView<std::complex<double>,2>. So, a wrapper around some pointer to an array of std::complex<double> that allows us to address it as a 2D structure (hence ‘matrix’). It doesn’t, however, offer any linear algebra facilities.

So this is a function that takes a 2D array of complex numbers, representing the content of the current FFT frame. What can be confusing is how often this function gets called: because we’re buffering the input, the lambda gets called every time there’s a new window of data available. Meanwhile process() gets called every time the host delivers a new signal vector. So the lambda could get called multiple times in one visit to process() if the host vector size is bigger than the STFT hop size, or conversely it could get called only every n invocations of process() if the hop size is bigger than the vector size by some factor n.
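The arithmetic is easy to check with a toy counter. DemoHopCounter is hypothetical and ignores everything the real buffering code has to care about (the initial window fill, latency, the actual samples); it only counts how many frames would fall due on each call, assuming one frame per hop’s worth of new samples.

```cpp
#include <cstddef>

// Hypothetical counter: accumulates incoming sample counts and reports how
// many analysis frames fall due for each host vector
class DemoHopCounter
{
public:
  explicit DemoHopCounter(std::ptrdiff_t hopSize) : mHop(hopSize) {}

  // Call once per host vector; returns how many times the frame callback
  // would fire during this call
  std::ptrdiff_t push(std::ptrdiff_t vectorSize)
  {
    mPending += vectorSize;
    const std::ptrdiff_t frames = mPending / mHop;
    mPending %= mHop; // carry the remainder over to the next call
    return frames;
  }

private:
  std::ptrdiff_t mHop;
  std::ptrdiff_t mPending{0};
};
```

With a hop of 512 and a vector size of 64, seven calls in a row produce nothing and the eighth produces one frame; with a hop of 256 and a vector size of 1024, every call produces four.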

Code Smell

This scheme works fine, but is a source of easy errors and confusion. Not least because it makes reasoning about the output of process() more difficult: when hop > host vector size, it can be easy to forget that we need to hold on to the last output rather than output zeros (which makes users sad). It also means that we potentially do more work than the user sees getting used (as the outputs can’t update more than once per signal vector).

Clearly it would be better if we could be more declarative about this in the Client and merely state that we need windowing and transforming to happen, and just treat process() itself as where the core processing happens. This way, the framework could handle caching results, and we can split up the mixture of things that happen in process() but aren’t processing.

Meanwhile, actually in the lambda, the steps are straightforward enough:

  1. Calculate the FFT magnitudes from the complex input.

  2. See which Algorithm we’re currently using, and send the magnitudes into its processFrame member function, along with another FluidTensorView<double,1> from the mDescriptors member variable to hold the output data, and whatever parameters and other data they need.

Note

Passing mDescriptors this way is taking advantage of the fact that FluidTensorView’s cousin FluidTensor can be implicitly converted to a FluidTensorView of the same type. FluidTensor is an owning version of FluidTensorView, which is to say that rather than wrapping a pointer owned by someone else, it wraps a container (a std::vector, as it happens), and therefore is in charge of the lifetime of its contents.

Step 5! Finally, then, it’s time to marshal our outputs and finish up. Here we go, with some added comments:

// Query the `ChoicesParam` 'select' to see which outputs the user wants
auto selection = get<kSelect>();
index numSelected = asSigned(selection.count());
index numOuts = std::min<index>(mMaxFeatures,numSelected);
// that's now our number of outputs to report back to the host 
controlChannelsOut({1,numOuts, mMaxFeatures});

//Then fill the appropriate slots in the output buffer. 
index i = 0;
//pitch
if (selection[0])
  output[0](i++) =
      static_cast<T>(setPitchUnits[asUnsigned(get<kUnit>())](mDescriptors(0)));

// pitch confidence
if(selection[1])
  output[0](i) = static_cast<T>(mDescriptors(1));

//fill any unselected slots with 0 (for supercollider) 
output[0](Slice(numOuts,mMaxFeatures - numOuts)).fill(0);  
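The same packing logic can be tried out in isolation. This sketch swaps the FluCoMa types for standard ones (std::bitset for the selection, std::array for the buffers) and leaves out the unit conversion; packSelected is a hypothetical name.

```cpp
#include <algorithm>
#include <array>
#include <bitset>
#include <cstddef>

// Hypothetical packing routine: copy the selected descriptors to the front of
// `out`, zero the rest, and return how many slots carry real data
inline std::ptrdiff_t packSelected(const std::bitset<2>&        selection,
                                   const std::array<double, 2>& descriptors,
                                   std::array<double, 2>&       out)
{
  std::ptrdiff_t i = 0;
  if (selection[0]) out[static_cast<std::size_t>(i++)] = descriptors[0];
  if (selection[1]) out[static_cast<std::size_t>(i++)] = descriptors[1];
  // Unselected trailing slots are zeroed so stale values never leak out
  std::fill(out.begin() + i, out.end(), 0.0);
  return i;
}
```

If only the second option (confidence) is selected, its value lands in slot 0 and slot 1 is zeroed: outputs are packed to the front rather than keeping fixed positions.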

Note

The type fluid::index is an alias to std::ptrdiff_t, i.e. a signed integer of pointer-width. We realise it’s a religious issue, but we’ve chosen to go with signed throughout, including for indexing, despite the pain that this brings in interactions with standard library containers. Given that now the C++ committee (mostly) acknowledge that using unsigned indexing was a mistake, even if they won’t change it, we can live with ourselves.

Supporting Member Functions#

There are three remaining member functions:

//reports the latency introduced by this processing chain to the host 
index latency() { return get<kFFT>().winSize(); }

// used by the automatic non-real-time wrapper mechanism to work out things 
// like how much padding to add in processing 
AnalysisSize analysisSettings()
{
  return { get<kFFT>().winSize(), get<kFFT>().hopSize() }; 
}

// allows the host to reset the `Client` to a starting state 
void  reset(FluidContext& c)
{
  mSTFTBufferedProcess.reset();
  cepstrumF0.init(get<kFFT>().frameSize(), c.allocator());
}

Code Smell

Needing latency() (for a trivial case like this) and analysisSettings() is a further consequence of having the windowing->fft pipeline hidden in process() rather than expressed declaratively. IOW, hosts should be able to know about this pipeline and deal with its entirely mundane consequences automatically, and Clients should only need to step in if they’re doing something wacky.

Registering PitchClient and its Non-real-time Sibling#

At this point we exit the pitch namespace and come back up to fluid::client. All that remains is to make available the final types that the host wrappers can use to generate objects:

// Register the real-time version
using RTPitchClient = ClientWrapper<pitch::PitchClient>;

// Declare some extra parameters for the non-real-time version that 
// serve as input and output buffer objects 
auto constexpr NRTPitchParams = makeNRTParams<pitch::PitchClient>(
    InputBufferParam("source", "Source Buffer"),
    BufferParam("features", "Features Buffer"));

// Wrap PitchClient up as a non-real-time object
using NRTPitchClient =
    NRTControlAdaptor<pitch::PitchClient, decltype(NRTPitchParams),
                      NRTPitchParams, 1, 1>;

// Make it so that the non-real-time object can do its work in 
// a separate thread if requested 
using NRTThreadedPitchClient = NRTThreadingAdaptor<NRTPitchClient>;

The pattern for making the wrapped version for offline use is quite similar to what we’ve already seen: we make a constexpr variable whose type is a specialization of ParameterDescriptorSet. The difference here is that we’re composing two new parameters with the existing set in PitchParams. These are types that we haven’t seen before: InputBufferParam represents a buffer object and is read-only, and BufferParam is a writeable equivalent. So this is saying that we will replace our audio input with a buffer in the parameter source and our stream of output features with a buffer in the parameter features. The function makeNRTParams will also add, for each input buffer, a set of parameters for the offset and number of frames to process, and likewise for channels. This means we can do multichannel processing, and / or work just on sections of buffers.

Code Smell

  • I’m not sure why we need to explicitly wrap the real-time Client in a ClientWrapper. Seems like it could be done automagically.

  • The whole business with makeNRTParams is more boilerplate, which would go away if clients could report their I/O more effectively.