A new way of music creation

We've spent 2.5 years building a generative music model and a brand-new interface for making music. We are a tiny team of self-taught researchers, seasoned engineers, and designers. We all love and make music, and we felt unsatisfied with how existing music models were being deployed and used.

We started with a long, grueling journey of building and refining the whole stack to pretrain and post-train our music model on a shoestring budget. Two key lessons stood out along the way. First, it is a trap to think that compression and reconstruction quality in the VAE / audio codec matter most; what really matters is how learnable the latents are for the downstream model. Second, hyperparameter-optimal scaling laws are the best way to ablate training-recipe and architecture experiments.
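The post doesn't describe how these scaling-law ablations were set up. As a minimal sketch of the general idea only, assuming loss follows a pure power law L(C) = a·C^(−b) in training compute C, and assuming a learning-rate sweep at each compute budget is a good enough stand-in for "hyperparameter-optimal", one can keep the best run per budget, fit the law in log-log space, and compare recipes by their extrapolated loss. All function names, recipes, and numbers below are hypothetical and synthetic.

```python
import numpy as np

# Hypothetical run format: (compute_flops, learning_rate, final_loss).
def hparam_optimal_frontier(runs):
    """Keep only the best (lowest) loss found at each compute budget."""
    best = {}
    for compute, _lr, loss in runs:
        if compute not in best or loss < best[compute]:
            best[compute] = loss
    budgets = sorted(best)
    computes = np.array(budgets, dtype=float)
    losses = np.array([best[c] for c in budgets], dtype=float)
    return computes, losses

def fit_power_law(computes, losses):
    """Fit loss ~= a * C**(-b) by linear regression on (log C, log loss)."""
    slope, intercept = np.polyfit(np.log(computes), np.log(losses), 1)
    return np.exp(intercept), -slope  # (a, b)

def predict(a, b, compute):
    return a * compute ** (-b)

def synthetic_runs(a, b, computes, lr_penalty):
    """Fake sweep: loss = a * C**-b, inflated when the learning rate is off-optimal."""
    return [(c, lr, a * c ** (-b) * (1.0 + pen))
            for c in computes for lr, pen in lr_penalty.items()]

COMPUTES = [1e17, 1e18, 1e19, 1e20]
LR_PENALTY = {1e-4: 0.05, 3e-4: 0.0, 1e-3: 0.02}  # made-up sweep, 3e-4 is "optimal"

runs_a = synthetic_runs(10.0, 0.05, COMPUTES, LR_PENALTY)  # hypothetical recipe A
runs_b = synthetic_runs(16.0, 0.06, COMPUTES, LR_PENALTY)  # hypothetical recipe B

fits = {}
for name, runs in [("A", runs_a), ("B", runs_b)]:
    c, l = hparam_optimal_frontier(runs)
    fits[name] = fit_power_law(c, l)

target = 1e22  # extrapolate well beyond the largest measured budget
pred = {name: predict(a, b, target) for name, (a, b) in fits.items()}
winner = min(pred, key=pred.get)
print(fits, pred, winner)
```

In this made-up data, recipe A has the lower loss at every measured budget up to 1e20 FLOPs, yet its smaller fitted exponent means recipe B is predicted to overtake it by 1e22 FLOPs. A single-scale ablation would have picked A; comparing fitted curves is what surfaces the difference.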

We finally reached a level of quality that we believe punches well above its compute budget. The stack consists of three models: Aphrodite, our audio codec (~3 kbps); Apollo, our diffusion transformer; and Virgil, our synthetic audio captioner.

Our next step was to rethink the HCI layer for music models. We rebuilt the DAW / timeline experience from first principles into a beginner-friendly web GAW (Generative Audio Workstation), where inpainting, extending, remixing, and multitrack / stem editing are as intuitive as “painting” on the screen. We aimed for an experience that stays true to the spirit of creation, where creators still walk away from a finished song proud of what they made, yet one accessible enough for anyone to feel the joy of making music.
