A Detailed Walk Through PitchClient
===================================

`PitchClient` is an example of a `Client` that takes audio input and produces streaming, non-audio output (a stream of lists in Max and Pure Data, control-rate outputs in SuperCollider). Let's walk through it.

# `PitchClient.hpp` Overview

In common with all clients, `PitchClient.hpp` is built from the same basic blocks:

* A namespace within `fluid::client`, in this case `fluid::client::pitch`
* An `enum` for indexing the `Client`'s parameters
* A `constexpr` variable that describes the parameters
* The `Client`'s class, in this case `PitchClient`, which inherits from `FluidBaseClient` and from tag classes that describe its input and output types
* A type alias that wraps the `Client` in the `ClientWrapper` template

In addition, because `PitchClient` has an offline counterpart that works on host buffers, there are further blocks that describe the parameters and aliases needed to generate this offline version automatically.

Let's go through these. As we go, we'll also highlight (ahem, confess to) bits that are currently somewhat smelly.

## The Namespace

```c++
namespace fluid {
namespace client {
namespace pitch {
```

This is simple. By convention all `Client`s live within `fluid::client`. Because of the unfortunately complex way that parameters are handled, they also each have their own sub-namespace to help prevent name collisions (this applies mostly to the contents of the indexing `enum` that we're about to meet).

## The `enum`

```c++
enum PitchParamIndex { kSelect, kAlgorithm, kMinFreq, kMaxFreq, kUnit, kFFT };
```

Parameters in `Client`s are, at heart, stored in C++'s heterogeneous container `std::tuple` and, as such, need to be retrieved using a compile-time constant integer. By convention, we use an `enum` for this because it lets us use labels that are more memorable and less error-prone than the bare numbers.
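
Here is a minimal sketch of why the index has to be a compile-time constant, and of how the `enum` labels help. It is not FluCoMa's parameter machinery: `ToyParamValues` and `toy::get` are hypothetical stand-ins, with a plain `std::tuple` playing the role of the parameter values. The key point is that `std::get` takes its index as a template argument, and unscoped enumerators convert implicitly to that index.

```c++
#include <cassert>
#include <cstddef>
#include <tuple>

namespace toy {

// Same labels as PitchParamIndex. Unscoped enumerators convert implicitly to
// integers, so they can be used wherever a compile-time index is expected.
enum ToyParamIndex { kSelect, kAlgorithm, kMinFreq, kMaxFreq, kUnit, kFFT };

// Hypothetical stand-in for a set of parameter values: a heterogeneous tuple
// whose element order must match the enum order exactly.
using ToyParamValues = std::tuple<unsigned, // kSelect: bitmask of outputs
                                  int,      // kAlgorithm: enum index
                                  double,   // kMinFreq (Hz)
                                  double,   // kMaxFreq (Hz)
                                  int,      // kUnit: enum index
                                  int>;     // kFFT: window size only, for brevity

// std::get needs its index as a template argument, i.e. at compile time;
// the enum label supplies a readable name for that index.
template <std::size_t N>
auto& get(ToyParamValues& values)
{
  return std::get<N>(values);
}

} // namespace toy
```

This also shows the fragility of the convention: the `enum` order and the tuple order have to agree, and nothing checks that for us beyond the occasional type mismatch.
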
```{admonition} Code Smell
The labels of plain `enum`s like these get injected, unqualified, into whatever namespace encloses them. Hence the need for the namespace: otherwise code that uses more than one `Client` header could end up with collisions between these labels.
```

Placing the `enum` before the next block (the parameter declaration) is also a matter of convention. For `Client`s with complex parameter declarations that need to reference other parameters, it's helpful because we can use the labels, as we'll see below.

## Parameters Declaration

```c++
constexpr auto PitchParams = defineParameters(
    ChoicesParam("select", "Selection of Outputs", "pitch", "confidence"),
    EnumParam("algorithm", "Algorithm", 2, "Cepstrum",
              "Harmonic Product Spectrum", "YinFFT"),
    FloatParam("minFreq", "Low Frequency Bound", 20, Min(0), Max(10000),
               UpperLimit<kMaxFreq>()),
    FloatParam("maxFreq", "High Frequency Bound", 10000, Min(1), Max(20000),
               LowerLimit<kMinFreq>()),
    EnumParam("unit", "Frequency Unit", 0, "Hz", "MIDI"),
    FFTParam("fftSettings", "FFT Settings", 1024, -1, -1));
```

This dense little block specifies much of how the interface of `PitchClient` will be generated by a host (i.e. Max, PD, SuperCollider). In Max and PD, these declarations map to the attributes that the eventual object will expose; in SC, they describe the controls of the UGen (for the audio processing case) or the arguments for the processing method (in the offline case).

```{note}
What we end up with from this block is an *object* that describes the parameters. Under the hood, it is an instance of a specialization of a class template called `ParameterDescriptorSet`. That is: this object describes the parameters, rather than holding their values.
```

The variable `PitchParams` is a compile-time constant (`constexpr`) of automatically deduced type (`auto`) that results from calling a function, `defineParameters()`. This function takes a variable number of arguments, each of which describes a parameter for the object.
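
The distinction between *describing* parameters and *holding* their values can be sketched in a few lines of standard C++17. In this hypothetical version, `defineToyParameters()` just packs descriptor structs into a `constexpr` tuple, and `makeValues()` builds a separate, mutable tuple of values from the descriptors' defaults. None of these names exist in FluCoMa; they only mimic the shape of `defineParameters()` and the split between a `ParameterDescriptorSet` and the values it describes.

```c++
#include <cassert>
#include <string_view>
#include <tuple>

namespace toy {

// Hypothetical descriptor: compile-time metadata about one parameter,
// not its current value.
template <typename T>
struct ParamDesc
{
  std::string_view name;
  std::string_view label;
  T                defaultValue;
};

constexpr ParamDesc<double> FloatDesc(std::string_view name,
                                      std::string_view label, double dflt)
{
  return {name, label, dflt};
}

// Hypothetical variadic stand-in for defineParameters(): packs the
// descriptors into a tuple, preserving declaration order.
template <typename... Descs>
constexpr auto defineToyParameters(Descs... descs)
{
  return std::make_tuple(descs...);
}

// The values live elsewhere: a mutable tuple initialised from the defaults.
template <typename... Ts>
auto makeValues(const std::tuple<ParamDesc<Ts>...>& descs)
{
  return std::apply(
      [](const auto&... d) { return std::make_tuple(d.defaultValue...); },
      descs);
}

constexpr auto ToyParams = defineToyParameters(
    FloatDesc("minFreq", "Low Frequency Bound", 20),
    FloatDesc("maxFreq", "High Frequency Bound", 10000));

} // namespace toy
```

Changing a value in the tuple returned by `makeValues()` leaves `ToyParams` untouched, which is the point: the descriptor set is fixed at compile time, and each object instance carries its own values.
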
```{note}
The order of these parameter declarations can matter. In SuperCollider, it determines the order in which the server expects to receive them from a client (and crashes can happen if this order isn't followed). In Max and PD, it makes no difference for attributes, but parameters can sometimes also be flagged for use as *arguments* to an object, in which case the order used in the declaration dictates the order of the arguments.
```

### Declaring a Parameter

Each of the parameter descriptions results from a call to a function that determines the type of the parameter and takes arguments that specify it. Let's break one down:

```c++
FloatParam("minFreq", "Low Frequency Bound", 20, Min(0), Max(10000), UpperLimit<kMaxFreq>())
```

This says that we want a parameter that's a floating-point number (`FloatParam`). The first argument is the name of the parameter, which will be rendered in environments that use names (Max, PD) and (by convention) repeated in the `sclang` code that talks to this client. Following that is a longer descriptive label, as one would see in Max's object inspector and in generated documentation for this `Client`. This is followed by a default value for the parameter, so here we're saying that the default low-frequency bound for the object is 20 Hz.

All parameter declarations have in common that the first two arguments are a name and a descriptive label. By convention the name is always `camelCase`. The Max and Pure Data wrappers will, in fact, convert it to lower case for attribute names, but the documentation generator for SuperCollider will use the original, to match up with the sclang class file (which, one day, will be generated).

Remaining arguments differ for different types: what follows the name and label is always a default value, where that makes sense for the type, and then the final arguments define *constraints* on that parameter (again, where that makes sense for the type).
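
As a rough illustration of the name / label / default triple, and of the lower-casing that the Max and Pure Data wrappers apply to attribute names, here is a hypothetical descriptor with two helpers. The wrappers' real code isn't shown in this walkthrough; `attributeName()` and `scName()` just mimic the behaviour described above.

```c++
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

namespace toy {

// Hypothetical descriptor holding the three leading arguments of FloatParam.
struct FloatParamDesc
{
  std::string name;         // camelCase, e.g. "minFreq"
  std::string label;        // longer description for inspectors and docs
  double      defaultValue; // third argument
};

// Mimics the Max / Pure Data wrappers: attribute names are lower-cased.
inline std::string attributeName(const FloatParamDesc& p)
{
  std::string out = p.name;
  std::transform(out.begin(), out.end(), out.begin(), [](unsigned char ch) {
    return static_cast<char>(std::tolower(ch));
  });
  return out;
}

// Mimics the SuperCollider documentation: the camelCase name is kept as-is.
inline std::string scName(const FloatParamDesc& p) { return p.name; }

} // namespace toy
```
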
### Constraints

What follows the default is a list of *constraints* for the parameter. `Min` and `Max` are, hopefully, quite self-explanatory: the absolute range for this parameter is 0 Hz to 10 kHz. The final entry, `UpperLimit<kMaxFreq>`, is more complex though: this is a constraint that says to reference the value of another parameter and use that to constrain the value of this one. In this case, it's saying that the `maxFreq` parameter establishes an upper limit on the value of this one, so if `maxFreq` were lower than 10 kHz, then the effective maximum value of `minFreq` would also be lower than 10 kHz. This notation is why the `enum` above is helpful: we can see which other parameter is being referenced here.

```{admonition} Code Smell
This way of doing inter-parameter constraints is brittle and non-DRY, because the order of the parameters has to be replicated correctly between the `enum` and this block. It also adds considerable complexity to the underlying code and introduces technical constraints out of proportion to how often the facility is actually useful. It is likely to be replaced.
```

### Parameter Types

We can see that there are a number of different types of parameter being declared here in addition to `FloatParam`:

* `ChoicesParam`: a set of options from which the user can choose anything from a single option to the whole set. It is normally used to choose between the available outputs of `Client`s when there are multiple options. Here, one can decide to get just a pitch estimate, just a confidence rating, or both.
* `EnumParam`: a set of options from which the user can select just one. In this case it's used twice: once to specify which pitch estimation algorithm to use, and again to specify in what units to send the pitch output.
* `FFTParam`: a set of numbers that represent settings for the FFT analysis used by this `Client`.

You'll see that each function call has the first two arguments in common (the name and description), but that things diverge from there.
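
The combined effect of `Min`, `Max`, `UpperLimit` and `LowerLimit` on the two frequency bounds can be approximated with plain clamping. This is a sketch of the *effective* behaviour under an assumption of our own (that `maxFreq` is resolved first and `minFreq` is then limited by it); FluCoMa resolves constraints generically through the parameter descriptors, and `FreqBounds` / `constrain()` are hypothetical names.

```c++
#include <algorithm>
#include <cassert>

namespace toy {

// Hypothetical pair of parameter values, with PitchClient's defaults.
struct FreqBounds
{
  double minFreq = 20;
  double maxFreq = 10000;
};

// minFreq: Min(0), Max(10000), UpperLimit -> may not exceed maxFreq
// maxFreq: Min(1), Max(20000)  (its LowerLimit is then satisfied by
//          construction, because minFreq is clamped against it)
inline FreqBounds constrain(FreqBounds b)
{
  b.maxFreq = std::clamp(b.maxFreq, 1.0, 20000.0);
  b.minFreq = std::clamp(b.minFreq, 0.0, std::min(10000.0, b.maxFreq));
  return b;
}

} // namespace toy
```

For example, asking for `minFreq = 800` while `maxFreq = 500` leaves `minFreq` pinned at 500 in this sketch.
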
For types where a default value makes sense, this is always the third argument. That applies everywhere here except to `ChoicesParam` (where the default is always 'everything'). For `EnumParam`, the final arguments describe the set of options, and the default indicates the (zero-based) index of the default option:

```c++
EnumParam("algorithm", "Algorithm", 2, "Cepstrum", "Harmonic Product Spectrum", "YinFFT")
EnumParam("unit", "Frequency Unit", 0, "Hz", "MIDI")
```

So for `algorithm`, the options are `["Cepstrum", "Harmonic Product Spectrum", "YinFFT"]`, and the default is `2`: `YinFFT`. For `unit`, the options are `["Hz", "MIDI"]` and the default is `0`: `Hz`.

```{note}
`EnumParam`s are, in practice, just bounded non-negative integers. However, the labels can be used in richer environments, like Max, and for generated documentation.
```

Finally, the trailing numbers for `FFTParam` also spell out the defaults:

```c++
FFTParam("fftSettings", "FFT Settings", 1024, -1, -1)
```

Here this is a series of numbers describing the window size, hop size and FFT size, where `-1` asks for a value derived from the window size. There could also be a fourth number specifying the default *maximum* FFT size (up to a global maximum of 65536).

```{admonition} Code Smell
By convention this set of FFT defaults has been used everywhere, so it should probably come from a common function / object to make changing our minds easier, should that happen. There are also persistent questions about whether it would be better to have FFT defaults that are (more) optimal for the use case rather than generic (e.g. finer frequency resolution for sinusoidal analysis).
```

## The Class

`PitchClient` is defined as

```c++
class PitchClient : public FluidBaseClient, public AudioIn, public ControlOut
{
  // ...
};
```

You'll see it inherits from three other classes: `FluidBaseClient`, `AudioIn` and `ControlOut`. All `Client`s (currently) must inherit from `FluidBaseClient` (mostly for the parameter system to work).
Meanwhile `AudioIn` and `ControlOut` are *tag types* that are used by the host wrappers to generate the appropriate code for this I/O configuration.

```{admonition} Code Smell
This tag inheritance system isn't really powerful enough to fulfil our needs:

* It would be better to be able to specify the number of ports of different types at compile time, where possible
* It would also be better to specify this information in a way more similar to other interface specifications, like parameters and messages
* Which combinations of I/O types are feasible is unfortunately host dependent, and enforcement of this is currently pretty *ad hoc*
```

The heart of the `Client` class is the `process` member function that actually does the work:

```c++
template <typename T>
void process(std::vector<HostVector<T>>& input,
             std::vector<HostVector<T>>& output, FluidContext& c)
```

Other important features:

* A constructor: this is responsible for important things, like letting the `Algorithm`s in this client know how much memory to allocate up-front, and for declaring the number of different inputs and outputs
* A bunch of boilerplate to do with the parameter system, for which I will apologize shortly
* Member functions needed by the host wrapper

In the case of `PitchClient` there are also some declarations before the `public` section of the class that are used internally. We'll address these when we look at `process()` in more detail, but first the boilerplate and the constructor.

### Boilerplate

```{admonition} Code Smell
Even needing a section on boilerplate is pretty smelly. A high-priority ambition is to get rid of the need for this and to considerably streamline how `Client`s describe themselves to the outside world.
```

Each `Client` currently needs an unfortunate amount of boilerplate code to make the parameter magic work without resorting to macros. It's quite embarrassing.
Here it is for `PitchClient`, with some added inline comments:

```c++
// Advertises the C++ type of the ParameterDescriptorSet that we
// defined above in `PitchParams`
using ParamDescType = decltype(PitchParams);

// Derives and advertises the type of the ParameterSetView that actually
// holds parameter values rather than descriptions
using ParamSetViewType = ParameterSetView<ParamDescType>;

// Declares a member variable, mParams, that is a
// std::reference_wrapper around an instance of the parameter values
// This needs to be killed with fire
std::reference_wrapper<ParamSetViewType> mParams;

// Defines a member function used by host wrappers to set
// the value of mParams (called before each invocation of process())
void setParams(ParamSetViewType& p) { mParams = p; }

// Defines a member function, used by PitchClient, to retrieve
// parameter values without having to write an absurd number of things
template <size_t N>
auto& get() const
{
  return mParams.get().template get<N>();
}

// Defines a *static* member function that returns the actual
// ParameterDescriptorSet instance, PitchParams
static constexpr auto& getParameterDescriptors() { return PitchParams; }
```

Every `Client` needs some version of this, and it's almost always identical *except* for the name of the variable that holds the `ParameterDescriptorSet` instance (here `PitchParams`). So it's not even straightforward copypasta. I'm very sorry (but I do have a vision of how to get rid of it all).

So, the instructions for making the boilerplate for your own client come down to:

1. Paste a block from an existing `Client`
2. Find the two references to the parameter description object and change them to match what you declared above

### Constructor

Here is the signature for the constructor:

```c++
PitchClient(ParamSetViewType& p, FluidContext& c)
```

This is the general form for all `Client`s.
The arguments are a reference to a set of starting parameter values, of type `ParamSetViewType` (ewwwwww), and an instance of `FluidContext`, which is a container for important features of the current execution context, like which memory allocator to use and the current vector size.

The whole constructor for `PitchClient`:

```c++
PitchClient(ParamSetViewType& p, FluidContext& c)
    : mParams(p),
      mSTFTBufferedProcess(get<kFFT>(), 1, 0, c.hostVectorSize(), c.allocator()),
      cepstrumF0(get<kFFT>().maxFrameSize(), c.allocator()),
      mMagnitude(get<kFFT>().maxFrameSize(), c.allocator()),
      mDescriptors(2, c.allocator())
{
  audioChannelsIn(1);
  controlChannelsOut({1, mMaxFeatures});
  setInputLabels({"audio input"});
  setOutputLabels({"pitch (hz or MIDI), pitch confidence (0-1)"});
}
```

So, after the declaration, there's the member initializer list. This, uh, initializes the class members: here, the parameter values (`mParams`) and the various `Algorithm` objects and working buffers that are used by `PitchClient`. Note that they all get passed the allocator from `FluidContext`, so that they allocate memory from the correct place (especially important for SuperCollider).

The body of the constructor then does a bunch of stuff that would be better done at compile time, namely declaring the numbers of I/O ports and giving them some labels (for assistance in Max).

### `process()`, at last

As a reminder, here's the signature for `process`:

```c++
template <typename T>
void process(std::vector<HostVector<T>>& input,
             std::vector<HostVector<T>>& output, FluidContext& c)
```

So this is a member function *template* on the type `T`, which is the underlying sample type of the `HostVector` things, of which we have two `std::vector`s. `std::vector<HostVector<T>>` is, in essence, a slightly convoluted way of saying either `double**` or `float**`, i.e. `T` will be either `double` or `float` (depending on the host), and the `input` and `output` arguments represent the arrays of data that we read from / write to. As with the constructor, we also have an instance of `FluidContext`.
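
To make the `T**` analogy concrete, here is a deliberately stripped-down stand-in. `ToyHostVector` is a hypothetical, non-owning pointer-plus-size view (far simpler than `FluidTensorView`), and `ToyClient::process()` is a member function template that the compiler instantiates for whichever sample type the caller passes. The "analysis" (a peak detector) is only a placeholder.

```c++
#include <cassert>
#include <cstddef>
#include <vector>

namespace toy {

// Hypothetical non-owning view over one channel of host memory.
template <typename T>
struct ToyHostVector
{
  T*          ptr;
  std::size_t len;
  T*          data() const { return ptr; }
  std::size_t size() const { return len; }
  T&          operator()(std::size_t i) const { return ptr[i]; }
};

struct ToyClient
{
  // One instantiation per sample type (float or double, depending on host).
  template <typename T>
  void process(std::vector<ToyHostVector<T>>& input,
               std::vector<ToyHostVector<T>>& output)
  {
    if (!input[0].data() || !output[0].data()) return; // unconnected
    // Placeholder analysis: write the peak absolute sample to slot 0.
    T peak = 0;
    for (std::size_t i = 0; i < input[0].size(); ++i)
    {
      T x = input[0](i) < 0 ? -input[0](i) : input[0](i);
      if (x > peak) peak = x;
    }
    output[0](0) = peak;
  }
};

} // namespace toy
```

Calling `process()` with `float` buffers and again with `double` buffers produces two separate instantiations of the same code, which is exactly what lets one `Client` serve hosts with different sample types.
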
```{note}
`HostVector<T>` is actually an alias for `FluidTensorView<T, 1>`. The `FluidTensorView` class template acts as a wrapper around a `T*` that allows us to do multidimensional indexing relatively painlessly. It's the main container type passed between FluCoMa functions, and what most `Client`s work with.
```

```{admonition} Code Smell
* Really `input` should be `const`. Something to fix across the clients
* `std::vector<HostVector<T>>` is a bit of a mouthful. We can do better and make a single container alias that hides this implementation detail
```

The steps that `process()` follows are widely repeated across many `Client`s:

1. Check that there's actually data to process
2. `assert` any invariants, such as ensuring that the number of `outputs` is as needed
3. See if certain key parameters have changed value, in which case some `Algorithm`s may need to be reinitialized
4. The actual work
5. Transform and dispatch outputs

Steps 1 and 2 are pretty trivial:

```c++
if (!input[0].data() || !output[0].data()) return;

assert(controlChannelsOut().size && "No control channels");
assert(output[0].size() >= controlChannelsOut().size &&
       "Too few output channels");
```

These say that if either the inputs or the outputs are `nullptr` then we can finish: in some hosts this signals that they aren't connected. This is left to the `Client` to decide because in some contexts a `Client` may still want to do some work even if it's not connected.

The `assert`ions are more dramatic: they say that we expect there to be at least enough outputs to support what we think we need, and if there aren't then this is a catastrophic breach of contract and we terminate the program (in debug builds).

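
The five steps can be seen together in a toy skeleton. Everything here is hypothetical: `ChangeTracker` loosely imitates the kind of change detection that step 3 relies on, plain `std::vector<double>*` pointers stand in for host vectors (with `nullptr` meaning "unconnected"), and the "analysis" is just the mean of each complete hop of input. Because the number of hops completed per call depends on how the hop relates to the vector size, a new result may appear several times in one call or not at all, so step 5 always reports the most recent one.

```c++
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

namespace toy {

// Hypothetical change detector: true on the first call, or whenever any
// value differs from the previous call.
template <typename... Ts>
class ChangeTracker
{
public:
  bool changed(Ts... values)
  {
    std::tuple<Ts...> current{values...};
    bool result = mFirst || current != mPrevious;
    mPrevious = current;
    mFirst = false;
    return result;
  }

private:
  std::tuple<Ts...> mPrevious{};
  bool              mFirst = true;
};

// Hypothetical client; returns how many hops were analysed in this call.
class ToySkeleton
{
public:
  std::size_t process(const std::vector<double>* input,
                      std::vector<double>* output, std::size_t hopSetting)
  {
    // 1. Nothing to do if either side is unconnected
    if (!input || !output) return 0;
    // 2. Invariants: at least one output slot, and a usable hop
    assert(!output->empty() && "Too few output channels");
    assert(hopSetting > 0 && "Hop must be positive");
    // 3. Reset the buffering if a key setting changed
    if (mTracker.changed(hopSetting))
    {
      mHop = hopSetting;
      mFill = 0;
      mSum = 0.0;
    }
    // 4. The actual work: analyse once per complete hop
    std::size_t analysed = 0;
    for (double x : *input)
    {
      mSum += x;
      if (++mFill == mHop)
      {
        mLatest = mSum / static_cast<double>(mHop);
        mSum = 0.0;
        mFill = 0;
        ++analysed;
      }
    }
    // 5. Dispatch: report the latest result, never zeros in between
    (*output)[0] = mLatest;
    return analysed;
  }

private:
  ChangeTracker<std::size_t> mTracker;
  std::size_t                mHop = 1;
  std::size_t                mFill = 0;
  double                     mSum = 0.0;
  double                     mLatest = 0.0;
};

} // namespace toy
```

With a vector of 64 samples and a hop of 16, one call analyses four hops; with a hop of 256, only every fourth call produces a new value, and the calls in between report the previous one.
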
Step 3 looks like this:

```c++
if (mParamTracker.changed(get<kFFT>().frameSize(), sampleRate(),
                          c.hostVectorSize()))
{
  cepstrumF0.init(get<kFFT>().frameSize(), c.allocator());
  mSTFTBufferedProcess = STFTBufferedProcess(get<kFFT>(), 1, 0,
                                             c.hostVectorSize(), c.allocator());
}
```

This uses a little utility called `ParamValueTracker` that tracks whether the values of some bunch of things have changed since the last time we called it. In this case, if the FFT analysis settings, the sample rate or the host vector size have changed, then the cepstrum algorithm and the STFT processor need to be reinitialized.

```{note}
By and large the `Algorithm`s all expose this pattern of having a separate `init()` member function. This is to try and encourage a decoupling of up-front allocation (hopefully based on full knowledge of the maximum memory requirements over the object's lifetime) from any initialization that an `Algorithm` needs to do before it can do useful work. Maybe it'd be better if an `Algorithm`'s storage were more explicitly decoupled from its behaviour (so we could just make a new instance instead, without doing lots of reallocations).
```

Step 4, then, the actual work:

```c++
FluidTensorView<double, 1> mags =
    mMagnitude(Slice(0, get<kFFT>().frameSize()));

mSTFTBufferedProcess.processInput(
    get<kFFT>(), input, c, [&](ComplexMatrixView in) {
      algorithm::STFT::magnitude(in.row(0), mags);
      switch (get<kAlgorithm>())
      {
      case 0:
        cepstrumF0.processFrame(mags, mDescriptors, get<kMinFreq>(),
                                get<kMaxFreq>(), sampleRate(), c.allocator());
        break;
      case 1:
        hps.processFrame(mags, mDescriptors, 4, get<kMinFreq>(),
                         get<kMaxFreq>(), sampleRate(), c.allocator());
        break;
      case 2:
        yinFFT.processFrame(mags, mDescriptors, get<kMinFreq>(),
                            get<kMaxFreq>(), sampleRate(), c.allocator());
        break;
      }
    });
```

First, into the variable `mags` we're taking a slice out of a container of `double` that we allocated when the `PitchClient` was instantiated. Remember that `FluidTensorView` is pointer-ish, so this isn't an allocation, just a wrapper around a portion of already allocated memory.
We use this to store FFT magnitudes below.

The next portion might seem a bit magic. The member variable `mSTFTBufferedProcess` (which could maybe be renamed) is an instance of a helper class that handles arranging input samples into overlapping windows and doing an FFT on these windows, i.e. an STFT. In this case we're calling its member function template `processInput()`, which does a forward FFT on the input samples but skips the IFFT and overlap-add of an output frame, because we're not outputting audio. The signature looks something like:

```c++
template <typename ProcessFunc>
void processInput(FFTParams, InputSamples, FluidContext, ProcessFunc);
```

That is, we pass it the current FFT settings, our current vector of input samples, the `FluidContext` we got passed into `process()`, and an object of the template parameter type `ProcessFunc`, which should be a function object that takes a single `ComplexMatrixView` argument (in the input-only case). Here, as in all the `Client`s, we supply that function object through a lambda:

```c++
[&](ComplexMatrixView in) {
  algorithm::STFT::magnitude(in.row(0), mags);
  switch (get<kAlgorithm>())
  {
  case 0:
    cepstrumF0.processFrame(mags, mDescriptors, get<kMinFreq>(),
                            get<kMaxFreq>(), sampleRate(), c.allocator());
    break;
  case 1:
    hps.processFrame(mags, mDescriptors, 4, get<kMinFreq>(), get<kMaxFreq>(),
                     sampleRate(), c.allocator());
    break;
  case 2:
    yinFFT.processFrame(mags, mDescriptors, get<kMinFreq>(), get<kMaxFreq>(),
                        sampleRate(), c.allocator());
    break;
  }
}
```

```{note}
`ComplexMatrixView` is an alias for `FluidTensorView<std::complex<double>, 2>`. So, a wrapper around a pointer to an array of `std::complex<double>` that allows us to address it as a 2D structure (hence 'matrix'). It doesn't, however, offer any linear algebra facilities.
```

So this is a function that takes a 2D array of complex numbers, representing the content of the current FFT frame. What can be confusing is how often this function gets called: because we're buffering the input, the lambda gets called every time there's a new window of data available.
Meanwhile `process()` gets called every time the host delivers a new signal vector. So the lambda could get called multiple times in one visit to `process()` if the host vector size is bigger than the STFT hop size, or conversely it could get called only once every `n` invocations of `process()` if the hop size is bigger than the vector size by some factor `n`.

```{admonition} Code Smell
This scheme *works* fine, but is a source of easy errors and confusion. Not least because it makes reasoning about the `output` of `process()` more difficult: when hop > host vector size, it can be easy to forget that we need to hold on to the *last output* rather than output zeros (which makes users sad). It also means that we potentially do more work than the user sees getting used (as the outputs can't update more than once per signal vector).

Clearly it would be better if we could be more declarative about this in the `Client` and merely state that we need windowing and transforming to happen, and just treat `process()` itself as where the core processing happens. This way, the framework could handle caching results, and we could split up the mixture of things that happen in `process()` but aren't processing.
```

Meanwhile, inside the lambda itself, the steps are straightforward enough:

1. Calculate the FFT magnitudes from the complex input.
2. See which `Algorithm` we're currently using, and send the magnitudes into its `processFrame` member function, along with another `FluidTensorView` from the `mDescriptors` member variable to hold the output data, and whatever parameters and other data it needs.

```{note}
Passing `mDescriptors` this way takes advantage of the fact that `FluidTensorView`'s cousin `FluidTensor` can be implicitly converted to a `FluidTensorView` of the same type.
`FluidTensor` is an *owning* version of `FluidTensorView`, which is to say that rather than wrapping a pointer owned by someone else, it wraps a container (a `std::vector`, as it happens), and is therefore in charge of the lifetime of its contents.
```

Step 5! Finally, then, it's time to marshal our outputs and finish up. Here we go, with some added comments:

```c++
// Query the `ChoicesParam` 'select' to see which outputs the user wants
auto  selection = get<kSelect>();
index numSelected = asSigned(selection.count());
index numOuts = std::min(mMaxFeatures, numSelected);
// that's now our number of outputs to report back to the host
controlChannelsOut({1, numOuts, mMaxFeatures});

// Then fill the appropriate slots in the output buffer.
index i = 0;
// pitch
if (selection[0])
  output[0](i++) = static_cast<T>(
      setPitchUnits[asUnsigned(get<kUnit>())](mDescriptors(0)));
// pitch confidence
if (selection[1]) output[0](i) = static_cast<T>(mDescriptors(1));
// fill any unselected slots with 0 (for SuperCollider)
output[0](Slice(numOuts, mMaxFeatures - numOuts)).fill(0);
```

```{note}
The type `fluid::index` is an alias for `std::ptrdiff_t`, i.e. a signed integer of pointer width. We realise it's a religious issue, but we've chosen to go with signed throughout, including for indexing, despite the pain that this brings in interactions with standard library containers. Given that the C++ committee now (mostly) acknowledges that using unsigned indexing was a mistake, even if they won't change it, we can live with ourselves.
```

### Supporting Member Functions

There are three remaining member functions:

```c++
// reports the latency introduced by this processing chain to the host
index latency() { return get<kFFT>().winSize(); }

// used by the automatic non-real-time wrapper mechanism to work out things
// like how much padding to add in processing
AnalysisSize analysisSettings()
{
  return {get<kFFT>().winSize(), get<kFFT>().hopSize()};
}

// allows the host to reset the `Client` to a starting state
void reset(FluidContext& c)
{
  mSTFTBufferedProcess.reset();
  cepstrumF0.init(get<kFFT>().frameSize(), c.allocator());
}
```

```{admonition} Code Smell
Needing `latency()` (for a trivial case like this) and `analysisSettings()` is a further consequence of having the windowing->FFT pipeline hidden in `process()` rather than expressed declaratively. In other words, hosts should be able to know about this pipeline and deal with its entirely mundane consequences automatically, and `Client`s should only need to step in if they're doing something wacky.
```

## Registering `PitchClient` and its Non-real-time Sibling

At this point we exit the `pitch` namespace and come back up to `fluid::client`.
All that remains is to make available the final types that the host wrappers can use to generate objects:

```c++
// Register the real-time version
using RTPitchClient = ClientWrapper<pitch::PitchClient>;

// Declare some extra parameters for the non-real-time version that
// serve as input and output buffer objects
auto constexpr NRTPitchParams = makeNRTParams<pitch::PitchClient>(
    InputBufferParam("source", "Source Buffer"),
    BufferParam("features", "Features Buffer"));

// Wrap PitchClient up as a non-real-time object
using NRTPitchClient = NRTControlAdaptor<pitch::PitchClient,
                                         decltype(NRTPitchParams),
                                         NRTPitchParams, 1, 1>;

// Make it so that the non-real-time object can do its work in
// a separate thread if requested
using NRTThreadedPitchClient = NRTThreadingAdaptor<NRTPitchClient>;
```

The pattern for making the wrapped version for offline use is quite similar to what we've already seen: we make a `constexpr` variable that's a specialization of `ParameterDescriptorSet`. The difference here is that we're composing two new parameters with the existing set in `PitchParams`. These are types that we haven't seen before: `InputBufferParam` represents a buffer object and is read-only, and `BufferParam` is a writeable equivalent. So this is saying that we will replace our audio input with a buffer in the parameter `source`, and our stream of output features with a buffer in the parameter `features`.

The function `makeNRTParams` will also add, for each input buffer, a set of parameters for the offset and number of frames to process, and likewise for channels. This means we can do multichannel processing and / or work on just sections of buffers.

```{admonition} Code Smell
* I'm not sure why we need to explicitly wrap the real-time `Client` in a `ClientWrapper`. Seems like it could be done automagically.
* The whole business with `makeNRTParams` is more boilerplate, which would go away if clients could report their I/O more effectively.
```
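
As a hypothetical illustration of what the extra offset / count parameters make possible, here is a sketch that copies out a region of a multichannel buffer before it would be handed on for processing. The field names (`startFrame`, `numFrames`, `startChan`, `numChans`) and the convention that `-1` means "through to the end" are assumptions for this sketch, not taken from `makeNRTParams` itself; the signed indices follow the `fluid::index` convention described above.

```c++
#include <cassert>
#include <vector>

namespace toy {

// Hypothetical buffer layout: channels[c][frame].
using Buffer = std::vector<std::vector<double>>;

// Hypothetical region parameters; -1 counts mean "everything remaining".
struct Region
{
  long startFrame = 0;
  long numFrames = -1;
  long startChan = 0;
  long numChans = -1;
};

// Copy out the selected frames of the selected channels.
inline Buffer selectRegion(const Buffer& src, const Region& r)
{
  const long totalChans = static_cast<long>(src.size());
  const long totalFrames = totalChans ? static_cast<long>(src[0].size()) : 0;
  const long nChans = r.numChans < 0 ? totalChans - r.startChan : r.numChans;
  const long nFrames =
      r.numFrames < 0 ? totalFrames - r.startFrame : r.numFrames;
  assert(r.startChan >= 0 && r.startFrame >= 0 && "negative offset");
  assert(r.startChan + nChans <= totalChans && "channel range out of bounds");
  assert(r.startFrame + nFrames <= totalFrames && "frame range out of bounds");

  Buffer out;
  for (long c = r.startChan; c < r.startChan + nChans; ++c)
    out.emplace_back(src[c].begin() + r.startFrame,
                     src[c].begin() + r.startFrame + nFrames);
  return out;
}

} // namespace toy
```

With every field left at its default, the whole buffer is selected; setting, say, `startFrame = 1`, `numFrames = 3` and `startChan = 1` selects three frames from every channel after the first.
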