

AutoencoderOobleck

The Oobleck variational autoencoder (VAE) model with KL loss was introduced in the Stability-AI/stable-audio-tools repository and in the Stable Audio Open paper by Stability AI. 🤗 Diffusers uses the model to encode audio waveforms into latent representations and to decode those latents back into audio waveforms.
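To make the encode/decode round trip more concrete, here is a minimal sketch of the shape bookkeeping involved, written in plain Python with no model weights. The downsampling ratios `(2, 4, 4, 8, 8)` and the 64 latent channels are assumptions about the model's default configuration, not values stated on this page, so check them against the config of the checkpoint you load. The stereo output and the 44.1 kHz sampling rate come from the abstract. The helper names are hypothetical.

```python
from math import prod

# ASSUMPTION: per-stage strides of the encoder; verify against the loaded config.
DOWNSAMPLING_RATIOS = (2, 4, 4, 8, 8)
# ASSUMPTION: number of channels in the latent representation.
LATENT_CHANNELS = 64
# From the abstract: stereo audio at 44.1 kHz.
AUDIO_CHANNELS = 2
SAMPLING_RATE = 44_100


def hop_length(ratios=DOWNSAMPLING_RATIOS):
    """Number of audio samples summarized by one latent frame."""
    return prod(ratios)


def latent_shape(batch_size, num_samples, ratios=DOWNSAMPLING_RATIOS):
    """Shape of the latents for a (batch, channels, samples) waveform.

    This sketch only handles waveform lengths that are an exact multiple
    of the hop length; other lengths would need padding or trimming first.
    """
    hop = hop_length(ratios)
    if num_samples % hop != 0:
        raise ValueError(f"num_samples must be a multiple of {hop}")
    return (batch_size, LATENT_CHANNELS, num_samples // hop)


def decoded_shape(batch_size, latent_frames, ratios=DOWNSAMPLING_RATIOS):
    """Shape of the waveform recovered from `latent_frames` latent frames."""
    return (batch_size, AUDIO_CHANNELS, latent_frames * hop_length(ratios))


if __name__ == "__main__":
    num_samples = 2_097_152  # an illustrative clip length (a multiple of the hop)
    print("hop length:", hop_length())
    print("clip duration (s):", num_samples / SAMPLING_RATE)
    print("latent shape:", latent_shape(1, num_samples))
    print("decoded shape:", decoded_shape(1, num_samples // hop_length()))
```

Under these assumptions each latent frame covers 2048 audio samples, so the latent sequence is far shorter than the waveform it represents.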

The abstract from the paper is:

Open generative models are vitally important for the community, allowing for fine-tunes and serving as baselines when presenting new models. However, most current text-to-audio models are private and not accessible for artists and researchers to build upon. Here we describe the architecture and training process of a new open-weights text-to-audio model trained with Creative Commons data. Our evaluation shows that the model's performance is competitive with the state-of-the-art across various metrics. Notably, the reported FDopenl3 results (measuring the realism of the generations) showcase its potential for high-quality stereo sound synthesis at 44.1kHz.