Spectral Animation Compression
Minimalistic research notes on an experiment in lossy animation compression.
Motivation
In the game Sky: Children of the Light, a concert by singer Aurora was attended by over four thousand concurrent players. To avoid lag, player actions and animations were heavily restricted since loading all unique animations into RAM could overwhelm memory.
Here I suggest an alternative approach. We stream compressed animations from disk and decompress them on the fly. This would move the load onto compute instead of memory. How can this be done? From my experience documenting animation compression at Ubisoft, significant memory savings are already made with bit tricks, as most animation tracks are sparse or constant. However, quaternion tracks are a richer signal.
Quaternions are uncommon objects with properties like double-covering, unitarity, and non-commutativity. While their high correlation over time suggests frequency-based compression might work, they must be afforded special treatment if we are to approach the problem rigourously.
Quaternion Fourier Transform
If we are to treat quaternion-valued signals as first class citizens, then we need to define the Quaternion Fourier Transform. My reference is the book Quaternion and Clifford Fourier Transforms, by Eckard Hitzer.
Once this object is in hand, we will be able reach out to the vast world of signal processing, recovering for example an Uncertainty Principle that can tell us the tradeoff between animation segment length and frequency resolution.
This will allow us to derive a Short-time Quaternion Fourier Transform, which is essential since we intend to slice up an animation into many streamable segments.
The plan is to selectively zero-out particular frequency coefficients of the animation track.
Random Notes
The point is that we need the equivalent of a frequency for elements of \(SO(3)\). I know that for locally compact Abelian groups there exists the Pontryagin dual, which generalises the Fourier transform, but quaternions are not Abelian.
I have seen papers around that claim a well-defined quaternion Fourier transform, but I cannot judge whether it makes mathematical sense. I have also seen the words "representation theory" used in the same context as Fourier transforms, so I will definitely look into spherical harmonics.
Finally, another question is how to actually interpret the amplitudes, frequences and phases of the basis functions. It would be nice for them to simply correspond to something like angular velocity around an axis but it is not so clear to me why that should be the case.
This is important because we may decide to perform some kind of band-limiting or perform a psycho-perceptual frequency masking to get better compression ratios. So it would be nice to be able to have some intuition about which frequences to mask.
Further Questions
A user on Hacker News remarked that audio compression has dedicated hardware support. The question arises, could the hardware be hijacked for motion data?
Research Notes
(25/11/2024)
- Practiced compressing synthetic sinusoidal data with a simple frequency ratio drop-out algorithm.
- Acquired a book on Quaternion and Clifford Transforms.
(24/11/2024)
- Illustrated a proof of the Cartan-Dieudonné theorem.
(23/11/2024)
- Implemented a Quaternion class.
- Performed some basic testing with the unittest framework.
- Learned about animation data sources: CMU and La Forge.
- Checked out the c3d file format and ezc3d package for motion data.
Image adapted from Chem Libre Texts
Last edited 23/11/2024