
The decimation is done by selecting the appropriate filter output associated with the computation cycle for each decimator output rate (for the first two stages this just ends up being that the commutators move back one sample after every output update, and the last stage moves forward one sample after every other update). This can be a very efficient approach since only one of the filters in each stage actually needs to be computed for each output (note that each filter within a group holds the exact same data, but the multiply and sum only need to be done on one of them each time). Because of this, the implementation can be done with just three FIR filters (44-tap, 12-tap and 5-tap), with the coefficients updated from a ROM table for each decimator output computation cycle. This high-efficiency approach would require a tightly synchronized state machine, whereas computing all internal filters would allow a lot of slack in the internal timing as long as it stays above the minimum limits.
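As a rough illustration of the ROM-table idea, the sketch below splits a prototype filter into one coefficient set per commutator position and lists which set each output would use. The helper names `coefficient_rom` and `commutator_schedule` are made up for this example, and the indexing convention (output $m$ of a stage sits at index $mD$ of the interpolated stream) is an assumption; other conventions simply reverse the direction of rotation.

```python
def coefficient_rom(h, I):
    """Split prototype filter h into I coefficient sets (one ROM row per
    commutator position), zero-padded so every row has the same length."""
    rows = [h[p::I] for p in range(I)]
    width = max(len(r) for r in rows)
    return [r + [0] * (width - len(r)) for r in rows]


def commutator_schedule(I, D, n_outputs):
    """For each stage output m, return (newest input index, ROM row),
    assuming output m sits at index m*D of the interpolated stream."""
    return [((m * D) // I, (m * D) % I) for m in range(n_outputs)]


# Placeholder coefficients: only the lengths matter for this illustration.
rom2 = coefficient_rom(list(range(95)), 8)   # 95-tap prototype, interpolate by 8
rom3 = coefficient_rom(list(range(25)), 5)   # 25-tap prototype, interpolate by 5
print(len(rom2), len(rom2[0]))               # 8 rows of 12 taps
print(len(rom3), len(rom3[0]))               # 5 rows of 5 taps

# ROM row used for successive outputs of the first two stages.
rows_stage1 = [row for _, row in commutator_schedule(4, 3, 5)]
rows_stage2 = [row for _, row in commutator_schedule(8, 7, 4)]
print(rows_stage1)                           # [0, 3, 2, 1, 0]
print(rows_stage2)                           # [0, 7, 6, 5]
```

Under this convention the first two stages step back by one row per output, so a state machine only needs a small counter per stage to pick the next ROM row.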

To do this as shown, with a requirement of 20 kHz audio bandwidth and 80 dB resampling image rejection, I estimate that 171 taps would be needed for FIR1, 95 taps for FIR2 and 25 taps for FIR3 (as linear phase filters, so one multiplier for every 2 taps). For real-time applications, the expected delay through the resampler would be about 0.6 ms (the sum of the three group delays, each $(N-1)/2$ samples at the interpolated rate where that filter runs). The filters could certainly be designed with windowed sinc functions (this is known as the windowing approach to FIR filter design, which is sub-optimal; see our further discussion in FIR Filter Design: Window vs Parks McClellan and Least Squares). The least squares algorithm (`firls` in MATLAB/Octave and in Python's `scipy.signal`) provides a solution that is optimal in the least-squares sense for resampling applications, resulting in higher image rejection for a given number of taps. Further, in many resamplers (not this one, due to the close ratio), the images to be rejected can be isolated to distinct frequency bands. This allows the use of multiband filters, which the least squares algorithms support, further maximizing rejection where it is needed most.

Interpolation and decimation resampling can also be accomplished by mapping the same filter coefficients as designed for the multistage resampler into polyphase structures, as depicted in the diagram below. This would be identical in performance to that resampler but can be done with an internal sampling rate as low as 67.2 kSps and significantly fewer overall computations.
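To make the polyphase mapping concrete, here is a minimal pure-Python sketch of a single interpolate-by-$I$, decimate-by-$D$ stage done two ways: the direct way (zero-stuff, filter at the high rate, discard samples) and the polyphase way (compute only the outputs that are kept, each with one sub-filter). The function names and the small integer test filter are placeholders for illustration, not the filters designed for this resampler.

```python
def naive_stage(x, h, I, D):
    """Zero-stuff by I, filter with h at the high rate, keep every D-th sample."""
    up = []
    for s in x:
        up.append(s)
        up.extend([0] * (I - 1))
    filtered = [
        sum(h[k] * up[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(up))
    ]
    return filtered[::D]


def polyphase_stage(x, h, I, D):
    """Same result, computing only the outputs that survive decimation."""
    n_out = -(-len(x) * I // D)        # ceil(len(x) * I / D)
    y = []
    for m in range(n_out):
        u = m * D                      # index at the interpolated rate
        phase, k = u % I, u // I       # sub-filter to use, newest input sample
        taps = h[phase::I]             # every I-th coefficient, offset by phase
        y.append(sum(c * x[k - j] for j, c in enumerate(taps) if k - j >= 0))
    return y


# Integer data so the two results can be compared exactly.
x = [3, -1, 4, 1, -5, 9, 2, -6]
h = list(range(1, 12))                 # placeholder 11-tap filter
for I, D in [(4, 3), (8, 7), (5, 7)]:
    assert naive_stage(x, h, I, D) == polyphase_stage(x, h, I, D)
print("polyphase and direct outputs match for all three stages")
```

The polyphase version needs roughly `len(h) / I` multiplies per output and never runs at the interpolated rate, which is where the computational savings come from.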

The conversion is split into three stages: interpolate by 4 and decimate by 3, interpolate by 8 and decimate by 7, then interpolate by 5 and decimate by 7. This would be implemented with the following structure, where the interpolator blocks signify inserting $I-1$ zero-valued samples between each pair of samples (up-sampling) and the decimation blocks signify selecting every $D$th sample and throwing away the rest (down-sampling). The intermediate blocks can run at any arbitrary higher sampling rate to keep up with the throughput, and the input/output blocks are rate matched (consuming samples at 44.1 kSps and providing output samples at 48 kSps).
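The rates through the chain can be checked with a few lines of Python. This is only a bookkeeping sketch: the lowpass filter that sits between each up-sampler and down-sampler is omitted, the helper names are hypothetical, and it assumes the usual convention that interpolating by $I$ places $I-1$ zeros between consecutive samples so the rate rises by exactly $I$.

```python
from fractions import Fraction

STAGES = [(4, 3), (8, 7), (5, 7)]      # (interpolate, decimate) per stage


def upsample(x, I):
    """Zero-stuff: follow each input sample with I-1 zeros."""
    out = []
    for s in x:
        out.append(s)
        out.extend([0] * (I - 1))
    return out


def downsample(x, D):
    """Keep every D-th sample and discard the rest."""
    return x[::D]


def rate_chain(f_in, stages):
    """Sampling rate after every up-sampler and down-sampler in the chain."""
    rates = [Fraction(f_in)]
    for I, D in stages:
        rates.append(rates[-1] * I)    # rate at the FIR filter of this stage
        rates.append(rates[-1] / D)    # rate at the output of this stage
    return rates


rates = rate_chain(44_100, STAGES)
print([int(r) for r in rates])
# [44100, 176400, 58800, 470400, 67200, 336000, 48000]
print(upsample([1, 2], 4))             # [1, 0, 0, 0, 2, 0, 0, 0]
print(downsample(list(range(7)), 3))   # [0, 3, 6]
```

Every rate in the list is an integer and none falls below 44.1 kSps, and the stage outputs are 58.8, 67.2 and 48 kSps.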

The greatest common divisor of the two rates is 300, so resampling exactly from 44.1 kHz to 48 kHz requires the ratio $160/147$ (and the inverse for the other direction). The approach demonstrated here resamples from 44.1 kHz to 48 kHz in multiple stages, where care has been taken not to reduce the sampling rate below 44.1 kHz at any point (if that matters for fidelity concerns) and the multiple stages simplify the filtering needed.
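The ratio is easy to confirm with the Python standard library; the variable names below are just for this sketch.

```python
from fractions import Fraction
from math import gcd

f_in, f_out = 44_100, 48_000

g = gcd(f_in, f_out)                   # greatest common divisor of the rates
ratio = Fraction(f_out, f_in)          # reduced automatically to lowest terms
up, down = ratio.numerator, ratio.denominator

print(g)                               # 300
print(up, down)                        # 160 147
```

Going the other way (48 kHz to 44.1 kHz) swaps the two factors, so it would interpolate by 147 and decimate by 160.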

The consumption time and transmission time are identical: one second of data is still one second of data regardless of sampling rate. However, if the transmitter and receiver are not synchronized, then buffering will ultimately be needed (as further detailed at the end of this post).
