- Modify offload_models function to handle DiffusionPipeline correctly
- Ensure compatibility with both single and multiple module inputs
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* CogView4: remove SiLU in final AdaLN (match Megatron); add switch to AdaLayerNormContinuous; split temb_raw/temb_blocks
* CogView4: remove SiLU in final AdaLN (match Megatron); add switch to AdaLayerNormContinuous; split temb_raw/temb_blocks
* CogView4: remove SiLU in final AdaLN (match Megatron); add switch to AdaLayerNormContinuous; split temb_raw/temb_blocks
* CogView4: use local final AdaLN (no SiLU) per review; keep generic AdaLN unchanged
* re-add configs as normal files (no LFS)
* Apply suggestions from code review
* Apply style fixes
---------
Co-authored-by: 武嘉涵 <lambert@wujiahandeMacBook-Pro.local>
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* try to use deepseek with an agent to auto i18n to zh
Signed-off-by: SamYuan1990 <yy19902439@126.com>
* add two more docs
Signed-off-by: SamYuan1990 <yy19902439@126.com>
* fix, updated some prompt for better translation
Signed-off-by: SamYuan1990 <yy19902439@126.com>
* Try to passs CI check
Signed-off-by: SamYuan1990 <yy19902439@126.com>
* fix up for human review process
Signed-off-by: SamYuan1990 <yy19902439@126.com>
* fix up
Signed-off-by: SamYuan1990 <yy19902439@126.com>
* fix review comments
Signed-off-by: SamYuan1990 <yy19902439@126.com>
---------
Signed-off-by: SamYuan1990 <yy19902439@126.com>
* Initial commit implementing frequency-decoupled guidance (FDG) as a guider
* Update FrequencyDecoupledGuidance docstring to describe FDG
* Update project so that it accepts any number of non-batch dims
* Change guidance_scale and other params to accept a list of params for each freq level
* Add comment with Laplacian pyramid shapes
* Add function to import_utils to check if the kornia package is available
* Only import from kornia if package is available
* Fix bug: use pred_cond/uncond in freq space rather than data space
* Allow guidance rescaling to be done in data space or frequency space (speculative)
* Add kornia install instructions to kornia import error message
* Add config to control whether operations are upcast to fp64
* Add parallel_weights recommended values to docstring
* Apply style fixes
* make fix-copies
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Aryan <aryan@huggingface.co>
* feat: support lora in qwen image and training script
* up
* up
* up
* up
* up
* up
* add lora tests
* fix
* add tests
* fix
* reviewer feedback
* up[
* Apply suggestions from code review
Co-authored-by: Aryan <aryan@huggingface.co>
---------
Co-authored-by: Aryan <aryan@huggingface.co>
[Examples] uniform naming notations
since the in parameter `size` represents `args.resolution`, I thus replace the `args.resolution` inside DreamBoothData with `size`. And revise some notations such as `center_crop`.
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
* style
* Fix class name casing for SkyReelsV2 components in multiple files to ensure consistency and correct functionality.
* cleaning
* cleansing
* Refactor `get_timestep_embedding` to move modifications into `SkyReelsV2TimeTextImageEmbedding`.
* Remove unnecessary line break in `get_timestep_embedding` function for cleaner code.
* Remove `skyreels_v2` entry from `_import_structure` and update its initialization to directly assign the list of SkyReelsV2 components.
* cleansing
* Refactor attention processing in `SkyReelsV2AttnProcessor2_0` to always convert query, key, and value to `torch.bfloat16`, simplifying the code and improving clarity.
* Enhance example usage in `pipeline_skyreels_v2_diffusion_forcing.py` by adding VAE initialization and detailed prompt for video generation, improving clarity and usability of the documentation.
* Refactor import structure in `__init__.py` for SkyReelsV2 components and improve formatting in `pipeline_skyreels_v2_diffusion_forcing.py` to enhance code readability and maintainability.
* Update `guidance_scale` parameter in `SkyReelsV2DiffusionForcingPipeline` from 5.0 to 6.0 to enhance video generation quality.
* Update `guidance_scale` parameter in example documentation and class definition of `SkyReelsV2DiffusionForcingPipeline` to ensure consistency and improve video generation quality.
* Update `causal_block_size` parameter in `SkyReelsV2DiffusionForcingPipeline` to default to `None`.
* up
* Fix dtype conversion for `timestep_proj` in `SkyReelsV2Transformer3DModel` to *ensure* correct tensor operations.
* Optimize causal mask generation by replacing repeated tensor with `repeat_interleave` for improved efficiency in `SkyReelsV2Transformer3DModel`.
* style
* Enhance example documentation in `SkyReelsV2DiffusionForcingPipeline` with guidance scale and shift parameters for T2V and I2V. Remove unused `retrieve_latents` function to streamline the code.
* Refactor sample scheduler creation in `SkyReelsV2DiffusionForcingPipeline` to use `deepcopy` for improved state management during inference steps.
* Enhance error handling and documentation in `SkyReelsV2DiffusionForcingPipeline` for `overlap_history` and `addnoise_condition` parameters to improve long video generation guidance.
* Update documentation and progress bar handling in `SkyReelsV2DiffusionForcingPipeline` to clarify asynchronous inference settings and improve progress tracking during denoising steps.
* Refine progress bar calculation in `SkyReelsV2DiffusionForcingPipeline` by rounding the step size to one decimal place for improved readability during denoising steps.
* Update import statements in `SkyReelsV2DiffusionForcingPipeline` documentation for improved clarity and organization.
* Refactor progress bar handling in `SkyReelsV2DiffusionForcingPipeline` to use total steps instead of calculated step size.
* update templates for i2v, v2v
* Add `retrieve_latents` function to streamline latent retrieval in `SkyReelsV2DiffusionForcingPipeline`. Update video latent processing to utilize this new function for improved clarity and maintainability.
* Add `retrieve_latents` function to both i2v and v2v pipelines for consistent latent retrieval. Update video latent processing to utilize this function, enhancing clarity and maintainability across the SkyReelsV2DiffusionForcingPipeline implementations.
* Remove redundant ValueError for `overlap_history` in `SkyReelsV2DiffusionForcingPipeline` to streamline error handling and improve user guidance for long video generation.
* Update default video dimensions and flow matching scheduler parameter in `SkyReelsV2DiffusionForcingPipeline` to enhance video generation capabilities.
* Refactor `SkyReelsV2DiffusionForcingPipeline` to support Image-to-Video (i2v) generation. Update class name, add image encoding functionality, and adjust parameters for improved video generation. Enhance error handling for image inputs and update documentation accordingly.
* Improve organization for image-last_image condition.
* Refactor `SkyReelsV2DiffusionForcingImageToVideoPipeline` to improve latent preparation and video condition handling integration.
* style
* style
* Add example usage of PIL for image input in `SkyReelsV2DiffusionForcingImageToVideoPipeline` documentation.
* Refactor `SkyReelsV2DiffusionForcingPipeline` to `SkyReelsV2DiffusionForcingVideoToVideoPipeline`, enhancing support for Video-to-Video (v2v) generation. Introduce video input handling, update latent preparation logic, and improve error handling for input parameters.
* Refactor `SkyReelsV2DiffusionForcingImageToVideoPipeline` by removing the `image_encoder` and `image_processor` dependencies. Update the CPU offload sequence accordingly.
* Refactor `SkyReelsV2DiffusionForcingImageToVideoPipeline` to enhance latent preparation logic and condition handling. Update image input type to `Optional`, streamline video condition processing, and improve handling of `last_image` during latent generation.
* Enhance `SkyReelsV2DiffusionForcingPipeline` by refining latent preparation for long video generation. Introduce new parameters for video handling, overlap history, and causal block size. Update logic to accommodate both short and long video scenarios, ensuring compatibility and improved processing.
* refactor
* fix num_frames
* fix prefix_video_latents
* up
* refactor
* Fix typo in scheduler method call within `SkyReelsV2DiffusionForcingVideoToVideoPipeline` to ensure proper noise scaling during latent generation.
* up
* Enhance `SkyReelsV2DiffusionForcingImageToVideoPipeline` by adding support for `last_image` parameter and refining latent frame calculations. Update preprocessing logic.
* add statistics
* Refine latent frame handling in `SkyReelsV2DiffusionForcingImageToVideoPipeline` by correcting variable names and reintroducing latent mean and standard deviation calculations. Update logic for frame preparation and sampling to ensure accurate video generation.
* up
* refactor
* up
* Refactor `SkyReelsV2DiffusionForcingVideoToVideoPipeline` to improve latent handling by enforcing tensor input for video, updating frame preparation logic, and adjusting default frame count. Enhance preprocessing and postprocessing steps for better integration.
* style
* fix vae output indexing
* upup
* up
* Fix tensor concatenation and repetition logic in `SkyReelsV2DiffusionForcingImageToVideoPipeline` to ensure correct dimensionality for video conditions and latent conditions.
* Refactor latent retrieval logic in `SkyReelsV2DiffusionForcingVideoToVideoPipeline` to handle tensor dimensions more robustly, ensuring compatibility with both 3D and 4D video inputs.
* Enhance logging in `SkyReelsV2DiffusionForcing` pipelines by adding iteration print statements for better debugging. Clean up unused code related to prefix video latents length calculation in `SkyReelsV2DiffusionForcingImageToVideoPipeline`.
* Update latent handling in `SkyReelsV2DiffusionForcingImageToVideoPipeline` to conditionally set latents based on video iteration state, improving flexibility for video input processing.
* Refactor `SkyReelsV2TimeTextImageEmbedding` to utilize `get_1d_sincos_pos_embed_from_grid` for timestep projection.
* Enhance `get_1d_sincos_pos_embed_from_grid` function to include an optional parameter `flip_sin_to_cos` for flipping sine and cosine embeddings, improving flexibility in positional embedding generation.
* Update timestep projection in `SkyReelsV2TimeTextImageEmbedding` to include `flip_sin_to_cos` parameter, enhancing the flexibility of time embedding generation.
* Refactor tensor type handling in `SkyReelsV2AttnProcessor2_0` and `SkyReelsV2TransformerBlock` to ensure consistent use of `torch.float32` and `torch.bfloat16`, improving integration.
* Update tensor type in `SkyReelsV2RotaryPosEmbed` to use `torch.float32` for frequency calculations, ensuring consistency in data types across the model.
* Refactor `SkyReelsV2TimeTextImageEmbedding` to utilize automatic mixed precision for timestep projection.
* down
* down
* style
* Add debug tensor tracking to `SkyReelsV2Transformer3DModel` for enhanced debugging and output analysis; update `Transformer2DModelOutput` to include debug tensors.
* up
* Refactor indentation in `SkyReelsV2AttnProcessor2_0` to improve code readability and maintain consistency in style.
* Convert query, key, and value tensors to bfloat16 in `SkyReelsV2AttnProcessor2_0` for improved performance.
* Add debug print statements in `SkyReelsV2TransformerBlock` to track tensor shapes and values for improved debugging and analysis.
* debug
* debug
* Remove commented-out debug tensor tracking from `SkyReelsV2TransformerBlock`
* Add functionality to save processed video latents as a Safetensors file in `SkyReelsV2DiffusionForcingPipeline`.
* up
* Add functionality to save output latents as a Safetensors file in `SkyReelsV2DiffusionForcingPipeline`.
* up
* Remove additional commented-out debug tensor tracking from `SkyReelsV2TransformerBlock` and `SkyReelsV2Transformer3DModel` for cleaner code.
* style
* cleansing
* Update example documentation and parameters in `SkyReelsV2Pipeline`. Adjusted example code for loading models, modified default values for height, width, num_frames, and guidance_scale, and improved output video quality settings.
* Update shift parameter in example documentation and default values across SkyReels V2 pipelines. Adjusted shift values for I2V from 3.0 to 5.0 and updated related example code for consistency.
* Update example documentation in SkyReels V2 pipelines to include available model options and update model references for loading. Adjusted model names to reflect the latest versions across I2V, V2V, and T2V pipelines.
* Add test templates
* style
* Add docs template
* Add SkyReels V2 Diffusion Forcing Video-to-Video Pipeline to imports
* style
* fix-copies
* convert i2v 1.3b
* Update transformer configuration to include `image_dim` for SkyReels V2 models and refactor imports to use `SkyReelsV2Transformer3DModel`.
* Refactor transformer import in SkyReels V2 pipeline to use `SkyReelsV2Transformer3DModel` for consistency.
* Update transformer configuration in SkyReels V2 to increase `in_channels` from 16 to 36 for i2v conf.
* Update transformer configuration in SkyReels V2 to set `added_kv_proj_dim` values for different model types.
* up
* up
* up
* Add SkyReelsV2Pipeline support for T2V model type in conversion script
* upp
* Refactor model type checks in conversion script to use substring matching for improved flexibility
* upp
* Fix shard path formatting in conversion script to accommodate varying model types by dynamically adjusting zero padding.
* Update sharded safetensors loading logic in conversion script to use substring matching for model directory checks
* Update scheduler parameters in SkyReels V2 test files for consistency across image and video pipelines
* Refactor conversion script to initialize text encoder, tokenizer, and scheduler for SkyReels pipelines, enhancing model integration
* style
* Update documentation for SkyReels-V2, introducing the Infinite-length Film Generative model, enhancing text-to-video generation examples, and updating model references throughout the API documentation.
* Add SkyReelsV2Transformer3DModel and FlowMatchUniPCMultistepScheduler documentation, updating TOC and introducing new model and scheduler files.
* style
* Update documentation for SkyReelsV2DiffusionForcingPipeline to correct flow matching scheduler parameter for I2V from 3.0 to 5.0, ensuring clarity in usage examples.
* Add documentation for causal_block_size parameter in SkyReelsV2DF pipelines, clarifying its role in asynchronous inference.
* Simplify min_ar_step calculation in SkyReelsV2DiffusionForcingPipeline to improve clarity.
* style and fix-copies
* style
* Add documentation for SkyReelsV2Transformer3DModel
Introduced a new markdown file detailing the SkyReelsV2Transformer3DModel, including usage instructions and model output specifications.
* Update test configurations for SkyReelsV2 pipelines
- Adjusted `in_channels` from 36 to 16 in `test_skyreels_v2_df_image_to_video.py`.
- Added new parameters: `overlap_history`, `num_frames`, and `base_num_frames` in `test_skyreels_v2_df_video_to_video.py`.
- Updated expected output shape in video tests from (17, 3, 16, 16) to (41, 3, 16, 16).
* Refines SkyReelsV2DF test parameters
* Update src/diffusers/models/modeling_outputs.py
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
* Refactor `grid_sizes` processing by using already-calculated post-patch parameters to simplify
* Update docs/source/en/api/pipelines/skyreels_v2.md
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
* Refactor parameter naming for diffusion forcing in SkyReelsV2 pipelines
- Changed `flag_df` to `enable_diffusion_forcing` for clarity in the SkyReelsV2Transformer3DModel and associated pipelines.
- Updated all relevant method calls to reflect the new parameter name.
* Revert _toctree.yml to adjust section expansion states
* style
* Update docs/source/en/api/models/skyreels_v2_transformer_3d.md
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Add copying label to SkyReelsV2ImageEmbedding from WanImageEmbedding.
* Refactor transformer block processing in SkyReelsV2Transformer3DModel
- Ensured proper handling of hidden states during both gradient checkpointing and standard processing.
* Update SkyReels V2 documentation to remove VRAM requirement and streamline imports
- Removed the mention of ~13GB VRAM requirement for the SkyReels-V2 model.
- Simplified import statements by removing unused `load_image` import.
* Add SkyReelsV2LoraLoaderMixin for loading and managing LoRA layers in SkyReelsV2Transformer3DModel
- Introduced SkyReelsV2LoraLoaderMixin class to handle loading, saving, and fusing of LoRA weights specific to the SkyReelsV2 model.
- Implemented methods for state dict management, including compatibility checks for various LoRA formats.
- Enhanced functionality for loading weights with options for low CPU memory usage and hotswapping.
- Added detailed docstrings for clarity on parameters and usage.
* Update SkyReelsV2 documentation and loader mixin references
- Corrected the documentation to reference the new `SkyReelsV2LoraLoaderMixin` for loading LoRA weights.
- Updated comments in the `SkyReelsV2LoraLoaderMixin` class to reflect changes in model references from `WanTransformer3DModel` to `SkyReelsV2Transformer3DModel`.
* Enhance SkyReelsV2 integration by adding SkyReelsV2LoraLoaderMixin references
- Added `SkyReelsV2LoraLoaderMixin` to the documentation and loader imports for improved LoRA weight management.
- Updated multiple pipeline classes to inherit from `SkyReelsV2LoraLoaderMixin` instead of `WanLoraLoaderMixin`.
* Update SkyReelsV2 model references in documentation
- Replaced placeholder model paths with actual paths for SkyReels-V2 models in multiple pipeline files.
- Ensured consistency across the documentation for loading models in the SkyReelsV2 pipelines.
* style
* fix-copies
* Refactor `fps_projection` in `SkyReelsV2Transformer3DModel`
- Replaced the sequential linear layers for `fps_projection` with a `FeedForward` layer using `SiLU` activation for better integration.
* Update docs
* Refactor video processing in SkyReelsV2DiffusionForcingPipeline
- Renamed parameters for clarity: `video` to `video_latents` and `overlap_history` to `overlap_history_latent_frames`.
- Updated logic for handling long video generation, including adjustments to latent frame calculations and accumulation.
- Consolidated handling of latents for both long and short video generation scenarios.
- Final decoding step now consistently converts latents to pixels, ensuring proper output format.
* Update activation function in `fps_projection` of `SkyReelsV2Transformer3DModel`
- Changed activation function from `silu` to `linear-silu` in the `fps_projection` layer for improved performance and integration.
* Add fps_projection layer renaming in convert_skyreelsv2_to_diffusers.py
- Updated key mappings for the `fps_projection` layer to align with new naming conventions, ensuring consistency in model integration.
* Fix fps_projection assignment in SkyReelsV2Transformer3DModel
- Corrected the assignment of the `fps_projection` layer to ensure it is properly cast to the appropriate data type, enhancing model functionality.
* Update _keep_in_fp32_modules in SkyReelsV2Transformer3DModel
- Added `fps_projection` to the list of modules that should remain in FP32 precision, ensuring proper handling of data types during model operations.
* Remove integration test classes from SkyReelsV2 test files
- Deleted the `SkyReelsV2DiffusionForcingPipelineIntegrationTests` and `SkyReelsV2PipelineIntegrationTests` classes along with their associated setup, teardown, and test methods, as they were not implemented and not needed for current testing.
* style
* Refactor: Remove hardcoded `torch.bfloat16` cast in attention
* Refactor: Simplify data type handling in transformer model
Removes unnecessary data type conversions for the FPS embedding and timestep projection.
This change simplifies the forward pass by relying on the inherent data types of the tensors.
* Refactor: Remove `fps_projection` from `_keep_in_fp32_modules` in `SkyReelsV2Transformer3DModel`
* Update src/diffusers/models/transformers/transformer_skyreels_v2.py
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
* Refactor: Remove unused flags and simplify attention mask handling in SkyReelsV2AttnProcessor2_0 and SkyReelsV2Transformer3DModel
Refactor: Simplify causal attention logic in SkyReelsV2
Removes the `flag_causal_attention` and `_flag_ar_attention` flags to simplify the implementation.
The decision to apply a causal attention mask is now based directly on the `num_frame_per_block` configuration, eliminating redundant flags and conditional checks. This streamlines the attention mechanism and simplifies the `set_ar_attention` methods.
* Refactor: Clarify variable names for latent frames
Renames `base_num_frames` to `base_latent_num_frames` to make it explicit that the variable refers to the number of frames in the latent space.
This change improves code readability and reduces potential confusion between latent frames and decoded video frames.
The `num_frames` parameter in `generate_timestep_matrix` is also renamed to `num_latent_frames` for consistency.
* Enhance documentation: Add detailed docstring for timestep matrix generation in SkyReelsV2DiffusionForcingPipeline
* Docs: Clarify long video chunking in pipeline docstring
Improves the explanation of long video processing within the pipeline's docstring.
The update replaces the abstract description with a concrete example, illustrating how the sliding window mechanism works with overlapping chunks. This makes the roles of `base_num_frames` and `overlap_history` clearer for users.
* Docs: Move visual demonstration and processing details for SkyReelsV2DiffusionForcingPipeline to docs page from the code
* Docs: Update asynchronous processing timeline and examples for long video handling in SkyReels-V2 documentation
* Enhance timestep matrix generation documentation and logic for synchronous/asynchronous video processing
* Update timestep matrix documentation and enhance analysis for clarity in SkyReelsV2DiffusionForcingPipeline
* Docs: Update visual demonstration section and add detailed step matrix construction example for asynchronous processing in SkyReelsV2DiffusionForcingPipeline
* style
* fix-copies
* Refactor parameter names for clarity in SkyReelsV2DiffusionForcingImageToVideoPipeline and SkyReelsV2DiffusionForcingVideoToVideoPipeline
* Refactor: Avoid VAE roundtrip in long video generation
Improves performance and quality for long video generation by operating entirely in latent space during the iterative generation process.
Instead of decoding latents to video and then re-encoding the overlapping section for the next chunk, this change passes the generated latents directly between iterations.
This avoids a computationally expensive and potentially lossy VAE decode/encode cycle within the loop. The full video is now decoded only once from the accumulated latents at the end of the process.
* Refactor: Rename prefix_video_latents_length to prefix_video_latents_frames for clarity
* Refactor: Rename num_latent_frames to current_num_latent_frames for clarity in SkyReelsV2DiffusionForcingImageToVideoPipeline
* Refactor: Enhance long video generation logic and improve latent handling in SkyReelsV2DiffusionForcingImageToVideoPipeline
Refactor: Unify video generation and pass latents directly
Unifies the separate code paths for short and long video generation into a single, streamlined loop.
This change eliminates the inefficient decode-encode cycle during long video generation. Instead of converting latents to pixel-space video between chunks, the pipeline now passes the generated latents directly to the next iteration.
This improves performance, avoids potential quality loss from intermediate VAE steps, and enhances code maintainability by removing significant duplication.
* style
* Refactor: Remove overlap_history parameter and streamline long video generation logic in SkyReelsV2DiffusionForcingImageToVideoPipeline
Refactor: Streamline long video generation logic
Removes the `overlap_history` parameter and simplifies the conditioning process for long video generation.
This change avoids a redundant VAE encoding step by directly using latent frames from the previous chunk for conditioning. It also moves image preprocessing outside the main generation loop to prevent repeated computations and clarifies the handling of prefix latents.
* style
* Refactor latent handling in i2v diffusion forcing pipeline
Improves the latent conditioning and accumulation logic within the image-to-video diffusion forcing loop.
- Corrects the splitting of the initial conditioning tensor to robustly handle both even and odd lengths.
- Simplifies how latents are accumulated across iterations for long video generation.
- Ensures the final latents are trimmed correctly before decoding only when a `last_image` is provided.
* Refactor: Remove overlap_history parameter from SkyReelsV2DiffusionForcingImageToVideoPipeline
* Refactor: Adjust video_latents parameter handling in prepare_latents method
* style
* Refactor: Update long video iteration print statements for clarity
* Fix: Update transformer config with dynamic causal block size
Updates the SkyReelsV2 pipelines to correctly set the `causal_block_size` in the transformer's configuration when it's provided during a pipeline call.
This ensures the model configuration reflects the user's specified setting for the inference run. The `set_ar_attention` method is also renamed to `_set_ar_attention` to mark it as an internal helper.
* style
* Refactor: Adjust video input size and expected output shape in inference test
* Refactor: Rename video variables for clarity in SkyReelsV2DiffusionForcingVideoToVideoPipeline
* Docs: Clarify time embedding logic in SkyReelsV2
Adds comments to explain the handling of different time embedding tensor dimensions.
A 2D tensor is used for standard models with a single time embedding per batch, while a 3D tensor is used for Diffusion Forcing models where each frame has its own time embedding. This clarifies the expected input for different model variations.
* Docs: Update SkyReels V2 pipeline examples
Updates the docstring examples for the SkyReels V2 pipelines to reflect current best practices and API changes.
- Removes the `shift` parameter from pipeline call examples, as it is now configured directly on the scheduler.
- Replaces the `set_ar_attention` method call with the `causal_block_size` argument in the pipeline call for diffusion forcing examples.
- Adjusts recommended parameters for I2V and V2V examples, including inference steps, guidance scale, and `ar_step`.
* Refactor: Remove `shift` parameter from SkyReelsV2 pipelines
Removes the `shift` parameter from the call signature of all SkyReelsV2 pipelines.
This parameter is a scheduler-specific configuration and should be set directly on the scheduler during its initialization, rather than being passed at runtime through the pipeline. This change simplifies the pipeline API.
Usage examples are updated to reflect that the `shift` value should now be passed when creating the `FlowMatchUniPCMultistepScheduler`.
* Refactors SkyReelsV2 image-to-video tests and adds last image case
Simplifies the test suite by removing a duplicated test class and streamlining the dummy component and input generation.
Adds a new test to verify the pipeline's behavior when a `last_image` is provided as input for conditioning.
* test: Add image components to SkyReelsV2 pipeline test
Adds the `image_encoder` and `image_processor` to the test components for the image-to-video pipeline.
Also replaces a hardcoded value for the positional embedding sequence length with a more descriptive calculation, improving clarity.
* test: Add callback configuration test for SkyReelsV2DiffusionForcingVideoToVideoPipeline
test: Add callback test for SkyReelsV2DFV2V pipeline
Adds a test to validate the callback functionality for the `SkyReelsV2DiffusionForcingVideoToVideoPipeline`.
This test confirms that `callback_on_step_end` is invoked correctly and can modify the pipeline's state during inference. It uses a callback to dynamically increase the `guidance_scale` and asserts that the final value is as expected.
The implementation correctly accounts for the nested denoising loops present in diffusion forcing pipelines.
* style
* fix: Update image_encoder type to CLIPVisionModelWithProjection in SkyReelsV2ImageToVideoPipeline
* UP
* Add conversion support for SkyReels-V2-FLF2V models
Adds configurations for three new FLF2V model variants (1.3B-540P, 14B-540P, and 14B-720P) to the conversion script.
This change also introduces specific handling to zero out the image positional embeddings for these models and updates the main script to correctly initialize the image-to-video pipeline.
* Docs: Update and simplify SkyReels V2 usage examples
Simplifies the text-to-video example by removing the manual group offloading configuration, making it more straightforward.
Adds comments to pipeline parameters to clarify their purpose and provides guidance for different resolutions and long video generation.
Introduces a new section with a code example for the video-to-video pipeline.
* style
* docs: Add SkyReels-V2 FLF2V 1.3B model to supported models list
* docs: Update SkyReels-V2 documentation
* Move the initialization of the `gradient_checkpointing` attribute to its suggested location.
* Refactor: Use logger for long video progress messages
Replaces `print()` calls with `logger.debug()` for reporting progress during long video generation in SkyReelsV2DF pipelines.
This change reduces console output verbosity for standard runs while allowing developers to view progress by enabling debug-level logging.
* Refactor SkyReelsV2 timestep embedding into a module
Extract the sinusoidal timestep embedding logic into a new `SkyReelsV2Timesteps` `nn.Module`.
This change encapsulates the embedding generation, which simplifies the `SkyReelsV2TimeTextImageEmbedding` class and improves code modularity.
* Fix: Preserve original shape in timestep embeddings
Reshapes the timestep embedding tensor to match the original input shape.
This ensures that batched timestep inputs retain their batch dimension after embedding, preventing potential shape mismatches.
* style
* Refactor: Move SkyReelsV2Timesteps to model file
Colocates the `SkyReelsV2Timesteps` class with the SkyReelsV2 transformer model.
This change moves model-specific timestep embedding logic from the general embeddings module to the transformer's own file, improving modularity and making the model more self-contained.
* Refactor parameter dtype retrieval to use utility function
Replaces manual parameter iteration with the `get_parameter_dtype` helper to determine the time embedder's data type.
This change improves code readability and centralizes the logic.
* Add comments to track the tensor shape transformations
* Add copied froms
* style
* fix-copies
* up
* Remove FlowMatchUniPCMultistepScheduler
Deletes the `FlowMatchUniPCMultistepScheduler` as it is no longer being used.
* Refactor: Replace FlowMatchUniPC scheduler with UniPC
Removes the `FlowMatchUniPCMultistepScheduler` and integrates its functionality into the existing `UniPCMultistepScheduler`.
This consolidation is achieved by using the `use_flow_sigmas=True` parameter in `UniPCMultistepScheduler`, simplifying the scheduler API and reducing code duplication. All usages, documentation, and tests are updated accordingly.
* style
* Remove text_encoder parameter from SkyReelsV2DiffusionForcingPipeline initialization
* Docs: Rename `pipe` to `pipeline` in SkyReels examples
Updates the variable name from `pipe` to `pipeline` across all SkyReels V2 documentation examples. This change improves clarity and consistency.
* Fix: Rename shift parameter to flow_shift in SkyReels-V2 examples
* Fix: Rename shift parameter to flow_shift in example documentation across SkyReels-V2 files
* Fix: Rename shift parameter to flow_shift in UniPCMultistepScheduler initialization across SkyReels test files
* Removes unused generator argument from scheduler step
The `generator` parameter is not used by the scheduler's `step` method within the SkyReelsV2 diffusion forcing pipelines. This change removes the unnecessary argument from the method call for code clarity and consistency.
* Fix: Update time_embedder_dtype assignment to use the first parameter's dtype in SkyReelsV2TimeTextImageEmbedding
* style
* Refactor: Use get_parameter_dtype utility function
Replaces manual parameter iteration with the `get_parameter_dtype` helper.
* Fix: Prevent (potential) error in parameter dtype check
Adds a check to ensure the `_keep_in_fp32_modules` attribute exists on a parameter before it is accessed.
This prevents a potential `AttributeError`, making the utility function more robust when used with models that do not define this attribute.
---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
* Update pipeline_onnx_stable_diffusion.py to remove float64
init_noise_sigma was being set as float64 before multiplying with latents, which changed latents into float64 too, which caused errors with onnxruntime since the latter wanted float16.
* Update pipeline_onnx_stable_diffusion_inpaint.py to remove float64
init_noise_sigma was being set as float64 before multiplying with latents, which changed latents into float64 too, which caused errors with onnxruntime since the latter wanted float16.
* Update pipeline_onnx_stable_diffusion_upscale.py to remove float64
init_noise_sigma was being set as float64 before multiplying with latents, which changed latents into float64 too, which caused errors with onnxruntime since the latter wanted float16.
* Update pipeline_onnx_stable_diffusion.py with comment for previous commit
Added comment on purpose of init_noise_sigma. This comment exists in related scripts that use the same line of code, but it was missing here.
---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* remove k-diffusion as we don't use it from the core.
* Revert "remove k-diffusion as we don't use it from the core."
This reverts commit 8bc86925a0.
* pin k-diffusion
* FIX set_lora_device when target layers differ
Resolves#11833
Fixes a bug that occurs after calling set_lora_device when multiple LoRA
adapters are loaded that target different layers.
Note: Technically, the accompanying test does not require a GPU because
the bug is triggered even if the parameters are already on the
corresponding device, i.e. loading on CPU and then changing the device
to CPU is sufficient to cause the bug. However, this may be optimized
away in the future, so I decided to test with GPU.
* Update docstring to warn about device mismatch
* Extend docstring with an example
* Fix docstring
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* ENH Improve speed of expanding LoRA scales
Resolves#11816
The following call proved to be a bottleneck when setting a lot of LoRA
adapters in diffusers:
cdaf84a708/src/diffusers/loaders/peft.py (L482)
This is because we would repeatedly call unet.state_dict(), even though
in the standard case, it is not necessary:
cdaf84a708/src/diffusers/loaders/unet_loader_utils.py (L55)
This PR fixes this by deferring this call, so that it is only run when
it's necessary, not earlier.
* Small fix
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* feat: use exclude modules to loraconfig.
* version-guard.
* tests and version guard.
* remove print.
* describe the test
* more detailed warning message + shift to debug
* update
* update
* update
* remove test
* ⚡️ Speed up method `AutoencoderKLWan.clear_cache` by 886%
**Key optimizations:**
- Compute the number of `WanCausalConv3d` modules in each model (`encoder`/`decoder`) **only once during initialization**, store in `self._cached_conv_counts`. This removes unnecessary repeated tree traversals at every `clear_cache` call, which was the main bottleneck (from profiling).
- The internal helper `_count_conv3d_fast` is optimized via a generator expression with `sum` for efficiency.
All comments from the original code are preserved, except for updated or removed local docstrings/comments relevant to changed lines.
**Function signatures and outputs remain unchanged.**
* Apply style fixes
* Apply suggestions from code review
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
* Apply style fixes
---------
Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
Co-authored-by: Aseem Saxena <aseem.bits@gmail.com>
* Add Pruna optimization framework documentation
- Introduced a new section for Pruna in the table of contents.
- Added comprehensive documentation for Pruna, detailing its optimization techniques, installation instructions, and examples for optimizing and evaluating models
* Enhance Pruna documentation with image alt text and code block formatting
- Added alt text to images for better accessibility and context.
- Changed code block syntax from diff to python for improved clarity.
* Add installation section to Pruna documentation
- Introduced a new installation section in the Pruna documentation to guide users on how to install the framework.
- Enhanced the overall clarity and usability of the documentation for new users.
* Update pruna.md
* Update pruna.md
* Update Pruna documentation for model optimization and evaluation
- Changed section titles for consistency and clarity, from "Optimizing models" to "Optimize models" and "Evaluating and benchmarking optimized models" to "Evaluate and benchmark models".
- Enhanced descriptions to clarify the use of `diffusers` models and the evaluation process.
- Added a new example for evaluating standalone `diffusers` models.
- Updated references and links for better navigation within the documentation.
* Refactor Pruna documentation for clarity and consistency
- Removed outdated references to FLUX-juiced and streamlined the explanation of benchmarking.
- Enhanced the description of evaluating standalone `diffusers` models.
- Cleaned up code examples by removing unnecessary imports and comments for better readability.
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Enhance Pruna documentation with new examples and clarifications
- Added an image to illustrate the optimization process.
- Updated the explanation for sharing and loading optimized models on the Hugging Face Hub.
- Clarified the evaluation process for optimized models using the EvaluationAgent.
- Improved descriptions for defining metrics and evaluating standalone diffusers models.
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* support text-to-image
* update example
* make fix-copies
* support use_flow_sigmas in EDM scheduler instead of maintain cosmos-specific scheduler
* support video-to-world
* update
* rename text2image pipeline
* make fix-copies
* add t2i test
* add test for v2w pipeline
* support edm dpmsolver multistep
* update
* update
* update
* update tests
* fix tests
* safety checker
* make conversion script work without guardrail
* add clarity in documentation for device_map
* docs
* fix how compiler tester mixins are used.
* propagate
* more
* typo.
* fix tests
* fix order of decroators.
* clarify more.
* more test cases.
* fix doc
* fix device_map docstring in pipeline_utils.
* more examples
* more
* update
* remove code for stuff that is already supported.
* fix stuff.
* allow loading from repo with dot in name
* put new arg at the end to avoid breaking compatibility
* add test for loading repo with dot in name
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update pipeline_flux_inpaint.py to fix padding_mask_crop returning only the inpainted area and not the entire image.
* Apply style fixes
* Update src/diffusers/pipelines/flux/pipeline_flux_inpaint.py
* Add community class StableDiffusionXL_T5Pipeline
Will be used with base model opendiffusionai/stablediffusionxl_t5
* Changed pooled_embeds to use projection instead of slice
* "make style" tweaks
* Added comments to top of code
* Apply style fixes
[examples] flux-control: use num_training_steps_for_scheduler in get_scheduler instead of args.max_train_steps * accelerator.num_processes
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* add guidance rescale
* update docs
* support adaptive instance norm filter
* fix custom timesteps support
* add custom timestep example to docs
* add a note about best generation settings being available only in the original repository
* use original org hub ids instead of personal
* make fix-copies
---------
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
* [gguf] Refactor __torch_function__ to avoid unnecessary computation
This helps with torch.compile compilation lantency. Avoiding unnecessary
computation should also lead to a slightly improved eager latency.
* Apply style fixes
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* feat: pipeline-level quant config.
Co-authored-by: SunMarc <marc.sun@hotmail.fr>
condition better.
support mapping.
improvements.
[Quantization] Add Quanto backend (#10756)
* update
* updaet
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* Update docs/source/en/quantization/quanto.md
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* Update src/diffusers/quantizers/quanto/utils.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* update
* update
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
[Single File] Add single file loading for SANA Transformer (#10947)
* added support for from_single_file
* added diffusers mapping script
* added testcase
* bug fix
* updated tests
* corrected code quality
* corrected code quality
---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
[LoRA] Improve warning messages when LoRA loading becomes a no-op (#10187)
* updates
* updates
* updates
* updates
* notebooks revert
* fix-copies.
* seeing
* fix
* revert
* fixes
* fixes
* fixes
* remove print
* fix
* conflicts ii.
* updates
* fixes
* better filtering of prefix.
---------
Co-authored-by: hlky <hlky@hlky.ac>
[LoRA] CogView4 (#10981)
* update
* make fix-copies
* update
[Tests] improve quantization tests by additionally measuring the inference memory savings (#11021)
* memory usage tests
* fixes
* gguf
[`Research Project`] Add AnyText: Multilingual Visual Text Generation And Editing (#8998)
* Add initial template
* Second template
* feat: Add TextEmbeddingModule to AnyTextPipeline
* feat: Add AuxiliaryLatentModule template to AnyTextPipeline
* Add bert tokenizer from the anytext repo for now
* feat: Update AnyTextPipeline's modify_prompt method
This commit adds improvements to the modify_prompt method in the AnyTextPipeline class. The method now handles special characters and replaces selected string prompts with a placeholder. Additionally, it includes a check for Chinese text and translation using the trans_pipe.
* Fill in the `forward` pass of `AuxiliaryLatentModule`
* `make style && make quality`
* `chore: Update bert_tokenizer.py with a TODO comment suggesting the use of the transformers library`
* Update error handling to raise and logging
* Add `create_glyph_lines` function into `TextEmbeddingModule`
* make style
* Up
* Up
* Up
* Up
* Remove several comments
* refactor: Remove ControlNetConditioningEmbedding and update code accordingly
* Up
* Up
* up
* refactor: Update AnyTextPipeline to include new optional parameters
* up
* feat: Add OCR model and its components
* chore: Update `TextEmbeddingModule` to include OCR model components and dependencies
* chore: Update `AuxiliaryLatentModule` to include VAE model and its dependencies for masked image in the editing task
* `make style`
* refactor: Update `AnyTextPipeline`'s docstring
* Update `AuxiliaryLatentModule` to include info dictionary so that text processing is done once
* simplify
* `make style`
* Converting `TextEmbeddingModule` to ordinary `encode_prompt()` function
* Simplify for now
* `make style`
* Up
* feat: Add scripts to convert AnyText controlnet to diffusers
* `make style`
* Fix: Move glyph rendering to `TextEmbeddingModule` from `AuxiliaryLatentModule`
* make style
* Up
* Simplify
* Up
* feat: Add safetensors module for loading model file
* Fix device issues
* Up
* Up
* refactor: Simplify
* refactor: Simplify code for loading models and handling data types
* `make style`
* refactor: Update to() method in FrozenCLIPEmbedderT3 and TextEmbeddingModule
* refactor: Update dtype in embedding_manager.py to match proj.weight
* Up
* Add attribution and adaptation information to pipeline_anytext.py
* Update usage example
* Will refactor `controlnet_cond_embedding` initialization
* Add `AnyTextControlNetConditioningEmbedding` template
* Refactor organization
* style
* style
* Move custom blocks from `AuxiliaryLatentModule` to `AnyTextControlNetConditioningEmbedding`
* Follow one-file policy
* style
* [Docs] Update README and pipeline_anytext.py to use AnyTextControlNetModel
* [Docs] Update import statement for AnyTextControlNetModel in pipeline_anytext.py
* [Fix] Update import path for ControlNetModel, ControlNetOutput in anytext_controlnet.py
* Refactor AnyTextControlNet to use configurable conditioning embedding channels
* Complete control net conditioning embedding in AnyTextControlNetModel
* up
* [FIX] Ensure embeddings use correct device in AnyTextControlNetModel
* up
* up
* style
* [UPDATE] Revise README and example code for AnyTextPipeline integration with DiffusionPipeline
* [UPDATE] Update example code in anytext.py to use correct font file and improve clarity
* down
* [UPDATE] Refactor BasicTokenizer usage to a new Checker class for text processing
* update pillow
* [UPDATE] Remove commented-out code and unnecessary docstring in anytext.py and anytext_controlnet.py for improved clarity
* [REMOVE] Delete frozen_clip_embedder_t3.py as it is in the anytext.py file
* [UPDATE] Replace edict with dict for configuration in anytext.py and RecModel.py for consistency
* 🆙
* style
* [UPDATE] Revise README.md for clarity, remove unused imports in anytext.py, and add author credits in anytext_controlnet.py
* style
* Update examples/research_projects/anytext/README.md
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
* Remove commented-out image preparation code in AnyTextPipeline
* Remove unnecessary blank line in README.md
[Quantization] Allow loading TorchAO serialized Tensor objects with torch>=2.6 (#11018)
* update
* update
* update
* update
* update
* update
* update
* update
* update
fix: mixture tiling sdxl pipeline - adjust gerating time_ids & embeddings (#11012)
small fix on generating time_ids & embeddings
[LoRA] support wan i2v loras from the world. (#11025)
* support wan i2v loras from the world.
* remove copied from.
* upates
* add lora.
Fix SD3 IPAdapter feature extractor (#11027)
chore: fix help messages in advanced diffusion examples (#10923)
Fix missing **kwargs in lora_pipeline.py (#11011)
* Update lora_pipeline.py
* Apply style fixes
* fix-copies
---------
Co-authored-by: hlky <hlky@hlky.ac>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Fix for multi-GPU WAN inference (#10997)
Ensure that hidden_state and shift/scale are on the same device when running with multiple GPUs
Co-authored-by: Jimmy <39@🇺🇸.com>
[Refactor] Clean up import utils boilerplate (#11026)
* update
* update
* update
Use `output_size` in `repeat_interleave` (#11030)
[hybrid inference 🍯🐝] Add VAE encode (#11017)
* [hybrid inference 🍯🐝] Add VAE encode
* _toctree: add vae encode
* Add endpoints, tests
* vae_encode docs
* vae encode benchmarks
* api reference
* changelog
* Update docs/source/en/hybrid_inference/overview.md
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* update
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Wan Pipeline scaling fix, type hint warning, multi generator fix (#11007)
* Wan Pipeline scaling fix, type hint warning, multi generator fix
* Apply suggestions from code review
[LoRA] change to warning from info when notifying the users about a LoRA no-op (#11044)
* move to warning.
* test related changes.
Rename Lumina(2)Text2ImgPipeline -> Lumina(2)Pipeline (#10827)
* Rename Lumina(2)Text2ImgPipeline -> Lumina(2)Pipeline
---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>
making ```formatted_images``` initialization compact (#10801)
compact writing
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Fix aclnnRepeatInterleaveIntWithDim error on NPU for get_1d_rotary_pos_embed (#10820)
* get_1d_rotary_pos_embed support npu
* Update src/diffusers/models/embeddings.py
---------
Co-authored-by: Kai zheng <kaizheng@KaideMacBook-Pro.local>
Co-authored-by: hlky <hlky@hlky.ac>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
[Tests] restrict memory tests for quanto for certain schemes. (#11052)
* restrict memory tests for quanto for certain schemes.
* Apply suggestions from code review
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* fixes
* style
---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
[LoRA] feat: support non-diffusers wan t2v loras. (#11059)
feat: support non-diffusers wan t2v loras.
[examples/controlnet/train_controlnet_sd3.py] Fixes#11050 - Cast prompt_embeds and pooled_prompt_embeds to weight_dtype to prevent dtype mismatch (#11051)
Fix: dtype mismatch of prompt embeddings in sd3 controlnet training
Co-authored-by: Andreas Jörg <andreasjoerg@MacBook-Pro-von-Andreas-2.fritz.box>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
reverts accidental change that removes attn_mask in attn. Improves fl… (#11065)
reverts accidental change that removes attn_mask in attn. Improves flux ptxla by using flash block sizes. Moves encoding outside the for loop.
Co-authored-by: Juan Acevedo <jfacevedo@google.com>
Fix deterministic issue when getting pipeline dtype and device (#10696)
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
[Tests] add requires peft decorator. (#11037)
* add requires peft decorator.
* install peft conditionally.
* conditional deps.
Co-authored-by: DN6 <dhruv.nair@gmail.com>
---------
Co-authored-by: DN6 <dhruv.nair@gmail.com>
CogView4 Control Block (#10809)
* cogview4 control training
---------
Co-authored-by: OleehyO <leehy0357@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail.com>
[CI] pin transformers version for benchmarking. (#11067)
pin transformers version for benchmarking.
updates
Fix Wan I2V Quality (#11087)
* fix_wan_i2v_quality
* Update src/diffusers/pipelines/wan/pipeline_wan_i2v.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Update src/diffusers/pipelines/wan/pipeline_wan_i2v.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Update src/diffusers/pipelines/wan/pipeline_wan_i2v.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Update pipeline_wan_i2v.py
---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: hlky <hlky@hlky.ac>
LTX 0.9.5 (#10968)
* update
---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: hlky <hlky@hlky.ac>
make PR GPU tests conditioned on styling. (#11099)
Group offloading improvements (#11094)
update
Fix pipeline_flux_controlnet.py (#11095)
* Fix pipeline_flux_controlnet.py
* Fix style
update readme instructions. (#11096)
Co-authored-by: Juan Acevedo <jfacevedo@google.com>
Resolve stride mismatch in UNet's ResNet to support Torch DDP (#11098)
Modify UNet's ResNet implementation to resolve stride mismatch in Torch's DDP
Fix Group offloading behaviour when using streams (#11097)
* update
* update
Quality options in `export_to_video` (#11090)
* Quality options in `export_to_video`
* make style
improve more.
add placeholders for docstrings.
formatting.
smol fix.
solidify validation and annotation
* Revert "feat: pipeline-level quant config."
This reverts commit 316ff46b76.
* feat: implement pipeline-level quantization config
Co-authored-by: SunMarc <marc@huggingface.co>
* update
* fixes
* fix validation.
* add tests and other improvements.
* add tests
* import quality
* remove prints.
* add docs.
* fixes to docs.
* doc fixes.
* doc fixes.
* add validation to the input quantization_config.
* clarify recommendations.
* docs
* add to ci.
* todo.
---------
Co-authored-by: SunMarc <marc@huggingface.co>
* test permission
* Add cross attention type for Sana-Sprint.
* Add Sana-Sprint training script in diffusers.
* make style && make quality;
* modify the attention processor with `set_attn_processor` and change `SanaAttnProcessor3_0` to `SanaVanillaAttnProcessor`
* Add import for SanaVanillaAttnProcessor
* Add README file.
* Apply suggestions from code review
* style
* Update examples/research_projects/sana/README.md
---------
Co-authored-by: lawrence-cj <cjs1020440147@icloud.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* begin transformer conversion
* refactor
* refactor
* refactor
* refactor
* refactor
* refactor
* update
* add conversion script
* add pipeline
* make fix-copies
* remove einops
* update docs
* gradient checkpointing
* add transformer test
* update
* debug
* remove prints
* match sigmas
* add vae pt. 1
* finish CV* vae
* update
* update
* update
* update
* update
* update
* make fix-copies
* update
* make fix-copies
* fix
* update
* update
* make fix-copies
* update
* update tests
* handle device and dtype for safety checker; required in latest diffusers
* remove enable_gqa and use repeat_interleave instead
* enforce safety checker; use dummy checker in fast tests
* add review suggestion for ONNX export
Co-Authored-By: Asfiya Baig <asfiyab@nvidia.com>
* fix safety_checker issues when not passed explicitly
We could either do what's done in this commit, or update the Cosmos examples to explicitly pass the safety checker
* use cosmos guardrail package
* auto format docs
* update conversion script to support 14B models
* update name CosmosPipeline -> CosmosTextToWorldPipeline
* update docs
* fix docs
* fix group offload test failing for vae
---------
Co-authored-by: Asfiya Baig <asfiyab@nvidia.com>
* [train_controlnet_sdxl] Add LANCZOS as the default interpolation mode for image resizing
* [train_dreambooth_lora_flux_advanced] Add LANCZOS as the default interpolation mode for image resizing
* 1. add pre-computation of prompt embeddings when custom prompts are used as well
2. save model card even if model is not pushed to hub
3. remove scheduler initialization from code example - not necessary anymore (it's now if the base model's config)
4. add skip_final_inference - to allow to run with validation, but skip the final loading of the pipeline with the lora weights to reduce memory reqs
* pre encode validation prompt as well
* Update examples/dreambooth/train_dreambooth_lora_hidream.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update examples/dreambooth/train_dreambooth_lora_hidream.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update examples/dreambooth/train_dreambooth_lora_hidream.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* pre encode validation prompt as well
* Apply style fixes
* empty commit
* change default trained modules
* empty commit
* address comments + change encoding of validation prompt (before it was only pre-encoded if custom prompts are provided, but should be pre-encoded either way)
* Apply style fixes
* empty commit
* fix validation_embeddings definition
* fix final inference condition
* fix pipeline deletion in last inference
* Apply style fixes
* empty commit
* layers
* remove readme remarks on only pre-computing when instance prompt is provided and change example to 3d icons
* smol fix
* empty commit
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* fix issue that training flux controlnet was unstable and validation results were unstable
* del unused code pieces, fix grammar
---------
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Fix: Inherit `StableDiffusionXLLoraLoaderMixin`
`StableDiffusionXLControlNetAdapterInpaintPipeline`
used to incorrectly inherit
`StableDiffusionLoraLoaderMixin`
instead of `StableDiffusionXLLoraLoaderMixin`
* Update pe_selection_index_based_on_dim
* Make pe_selection_index_based_on_dim work with torh.compile
* Fix AuraFlowTransformer2DModel's dpcstring default values
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
loose expected_max_diff from 5e-1 to 8e-1 to make
KandinskyV22PipelineInpaintCombinedFastTests::test_float16_inference
pass on XPU
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Before this if txt_ids was 3d tensor, line with txt_ids[:1] concat txt_ids by batch dim. Now we first check that txt_ids is 2d tensor (or take first batch element) and then concat by token dim
* loose test_float16_inference's tolerance from 5e-2 to 6e-2, so XPU can
pass UT
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix test_pipeline_flux_redux fail on XPU
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
---------
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* [WIP][LoRA] Implement hot-swapping of LoRA
This PR adds the possibility to hot-swap LoRA adapters. It is WIP.
Description
As of now, users can already load multiple LoRA adapters. They can
offload existing adapters or they can unload them (i.e. delete them).
However, they cannot "hotswap" adapters yet, i.e. substitute the weights
from one LoRA adapter with the weights of another, without the need to
create a separate LoRA adapter.
Generally, hot-swapping may not appear not super useful but when the
model is compiled, it is necessary to prevent recompilation. See #9279
for more context.
Caveats
To hot-swap a LoRA adapter for another, these two adapters should target
exactly the same layers and the "hyper-parameters" of the two adapters
should be identical. For instance, the LoRA alpha has to be the same:
Given that we keep the alpha from the first adapter, the LoRA scaling
would be incorrect for the second adapter otherwise.
Theoretically, we could override the scaling dict with the alpha values
derived from the second adapter's config, but changing the dict will
trigger a guard for recompilation, defeating the main purpose of the
feature.
I also found that compilation flags can have an impact on whether this
works or not. E.g. when passing "reduce-overhead", there will be errors
of the type:
> input name: arg861_1. data pointer changed from 139647332027392 to
139647331054592
I don't know enough about compilation to determine whether this is
problematic or not.
Current state
This is obviously WIP right now to collect feedback and discuss which
direction to take this. If this PR turns out to be useful, the
hot-swapping functions will be added to PEFT itself and can be imported
here (or there is a separate copy in diffusers to avoid the need for a
min PEFT version to use this feature).
Moreover, more tests need to be added to better cover this feature,
although we don't necessarily need tests for the hot-swapping
functionality itself, since those tests will be added to PEFT.
Furthermore, as of now, this is only implemented for the unet. Other
pipeline components have yet to implement this feature.
Finally, it should be properly documented.
I would like to collect feedback on the current state of the PR before
putting more time into finalizing it.
* Reviewer feedback
* Reviewer feedback, adjust test
* Fix, doc
* Make fix
* Fix for possible g++ error
* Add test for recompilation w/o hotswapping
* Make hotswap work
Requires https://github.com/huggingface/peft/pull/2366
More changes to make hotswapping work. Together with the mentioned PEFT
PR, the tests pass for me locally.
List of changes:
- docstring for hotswap
- remove code copied from PEFT, import from PEFT now
- adjustments to PeftAdapterMixin.load_lora_adapter (unfortunately, some
state dict renaming was necessary, LMK if there is a better solution)
- adjustments to UNet2DConditionLoadersMixin._process_lora: LMK if this
is even necessary or not, I'm unsure what the overall relationship is
between this and PeftAdapterMixin.load_lora_adapter
- also in UNet2DConditionLoadersMixin._process_lora, I saw that there is
no LoRA unloading when loading the adapter fails, so I added it
there (in line with what happens in PeftAdapterMixin.load_lora_adapter)
- rewritten tests to avoid shelling out, make the test more precise by
making sure that the outputs align, parametrize it
- also checked the pipeline code mentioned in this comment:
https://github.com/huggingface/diffusers/pull/9453#issuecomment-2418508871;
when running this inside the with
torch._dynamo.config.patch(error_on_recompile=True) context, there is
no error, so I think hotswapping is now working with pipelines.
* Address reviewer feedback:
- Revert deprecated method
- Fix PEFT doc link to main
- Don't use private function
- Clarify magic numbers
- Add pipeline test
Moreover:
- Extend docstrings
- Extend existing test for outputs != 0
- Extend existing test for wrong adapter name
* Change order of test decorators
parameterized.expand seems to ignore skip decorators if added in last
place (i.e. innermost decorator).
* Split model and pipeline tests
Also increase test coverage by also targeting conv2d layers (support of
which was added recently on the PEFT PR).
* Reviewer feedback: Move decorator to test classes
... instead of having them on each test method.
* Apply suggestions from code review
Co-authored-by: hlky <hlky@hlky.ac>
* Reviewer feedback: version check, TODO comment
* Add enable_lora_hotswap method
* Reviewer feedback: check _lora_loadable_modules
* Revert changes in unet.py
* Add possibility to ignore enabled at wrong time
* Fix docstrings
* Log possible PEFT error, test
* Raise helpful error if hotswap not supported
I.e. for the text encoder
* Formatting
* More linter
* More ruff
* Doc-builder complaint
* Update docstring:
- mention no text encoder support yet
- make it clear that LoRA is meant
- mention that same adapter name should be passed
* Fix error in docstring
* Update more methods with hotswap argument
- SDXL
- SD3
- Flux
No changes were made to load_lora_into_transformer.
* Add hotswap argument to load_lora_into_transformer
For SD3 and Flux. Use shorter docstring for brevity.
* Extend docstrings
* Add version guards to tests
* Formatting
* Fix LoRA loading call to add prefix=None
See:
https://github.com/huggingface/diffusers/pull/10187#issuecomment-2717571064
* Run make fix-copies
* Add hot swap documentation to the docs
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: hlky <hlky@hlky.ac>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Refactor `LTXConditionPipeline` to add text-only conditioning
* style
* up
* Refactor `LTXConditionPipeline` to streamline condition handling and improve clarity
* Improve condition checks
* Simplify latents handling based on conditioning type
* Refactor rope_interpolation_scale preparation for clarity and efficiency
* Update LTXConditionPipeline docstring to clarify supported input types
* Add LTX Video 0.9.5 model to documentation
* Clarify documentation to indicate support for text-only conditioning without passing `conditions`
* refactor: comment out unused parameters in LTXConditionPipeline
* fix: restore previously commented parameters in LTXConditionPipeline
* fix: remove unused parameters from LTXConditionPipeline
* refactor: remove unnecessary lines in LTXConditionPipeline
* model card gen code
* push modelcard creation
* remove optional from params
* add import
* add use_dora check
* correct lora var use in tags
* make style && make quality
---------
Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* allow models to run with a user-provided dtype map instead of a single dtype
* make style
* Add warning, change `_` to `default`
* make style
* add test
* handle shared tensors
* remove warning
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
set self._hf_peft_config_loaded to True on successful lora load
Sets the `_hf_peft_config_loaded` flag if a LoRA is successfully loaded in `load_lora_adapter`. Fixes bug huggingface/diffusers/issues/11148
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* [Documentation] Update README and example code with additional usage instructions for AnyText
* [Documentation] Update README for AnyTextPipeline and improve logging in code
* Remove wget command for font file from example docstring in anytext.py
* Don't use `torch_dtype` when `quantization_config` is set
* up
* djkajka
* Apply suggestions from code review
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* fix bug when pixart-dmd inference with `num_inference_steps=1`
* use return_dict=False and return [1] element for 1-step pixart model, which works for both lcm and dmd
reverts accidental change that removes attn_mask in attn. Improves flux ptxla by using flash block sizes. Moves encoding outside the for loop.
Co-authored-by: Juan Acevedo <jfacevedo@google.com>
* Add initial template
* Second template
* feat: Add TextEmbeddingModule to AnyTextPipeline
* feat: Add AuxiliaryLatentModule template to AnyTextPipeline
* Add bert tokenizer from the anytext repo for now
* feat: Update AnyTextPipeline's modify_prompt method
This commit adds improvements to the modify_prompt method in the AnyTextPipeline class. The method now handles special characters and replaces selected string prompts with a placeholder. Additionally, it includes a check for Chinese text and translation using the trans_pipe.
* Fill in the `forward` pass of `AuxiliaryLatentModule`
* `make style && make quality`
* `chore: Update bert_tokenizer.py with a TODO comment suggesting the use of the transformers library`
* Update error handling to raise and logging
* Add `create_glyph_lines` function into `TextEmbeddingModule`
* make style
* Up
* Up
* Up
* Up
* Remove several comments
* refactor: Remove ControlNetConditioningEmbedding and update code accordingly
* Up
* Up
* up
* refactor: Update AnyTextPipeline to include new optional parameters
* up
* feat: Add OCR model and its components
* chore: Update `TextEmbeddingModule` to include OCR model components and dependencies
* chore: Update `AuxiliaryLatentModule` to include VAE model and its dependencies for masked image in the editing task
* `make style`
* refactor: Update `AnyTextPipeline`'s docstring
* Update `AuxiliaryLatentModule` to include info dictionary so that text processing is done once
* simplify
* `make style`
* Converting `TextEmbeddingModule` to ordinary `encode_prompt()` function
* Simplify for now
* `make style`
* Up
* feat: Add scripts to convert AnyText controlnet to diffusers
* `make style`
* Fix: Move glyph rendering to `TextEmbeddingModule` from `AuxiliaryLatentModule`
* make style
* Up
* Simplify
* Up
* feat: Add safetensors module for loading model file
* Fix device issues
* Up
* Up
* refactor: Simplify
* refactor: Simplify code for loading models and handling data types
* `make style`
* refactor: Update to() method in FrozenCLIPEmbedderT3 and TextEmbeddingModule
* refactor: Update dtype in embedding_manager.py to match proj.weight
* Up
* Add attribution and adaptation information to pipeline_anytext.py
* Update usage example
* Will refactor `controlnet_cond_embedding` initialization
* Add `AnyTextControlNetConditioningEmbedding` template
* Refactor organization
* style
* style
* Move custom blocks from `AuxiliaryLatentModule` to `AnyTextControlNetConditioningEmbedding`
* Follow one-file policy
* style
* [Docs] Update README and pipeline_anytext.py to use AnyTextControlNetModel
* [Docs] Update import statement for AnyTextControlNetModel in pipeline_anytext.py
* [Fix] Update import path for ControlNetModel, ControlNetOutput in anytext_controlnet.py
* Refactor AnyTextControlNet to use configurable conditioning embedding channels
* Complete control net conditioning embedding in AnyTextControlNetModel
* up
* [FIX] Ensure embeddings use correct device in AnyTextControlNetModel
* up
* up
* style
* [UPDATE] Revise README and example code for AnyTextPipeline integration with DiffusionPipeline
* [UPDATE] Update example code in anytext.py to use correct font file and improve clarity
* down
* [UPDATE] Refactor BasicTokenizer usage to a new Checker class for text processing
* update pillow
* [UPDATE] Remove commented-out code and unnecessary docstring in anytext.py and anytext_controlnet.py for improved clarity
* [REMOVE] Delete frozen_clip_embedder_t3.py as it is in the anytext.py file
* [UPDATE] Replace edict with dict for configuration in anytext.py and RecModel.py for consistency
* 🆙
* style
* [UPDATE] Revise README.md for clarity, remove unused imports in anytext.py, and add author credits in anytext_controlnet.py
* style
* Update examples/research_projects/anytext/README.md
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
* Remove commented-out image preparation code in AnyTextPipeline
* Remove unnecessary blank line in README.md
* updated train_dreambooth_lora to fix the LR schedulers for `num_train_epochs` in distributed training env
* fixed formatting
* remove trailing newlines
* fixed style error
* Fix SD2.X clip single file load projection_dim
Infer projection_dim from the checkpoint before loading
from pretrained, override any incorrect hub config.
Hub configuration for SD2.X specifies projection_dim=512
which is incorrect for SD2.X checkpoints loaded from civitai
and similar.
Exception was previously thrown upon attempting to
load_model_dict_into_meta for SD2.X single file checkpoints.
Such LDM models usually require projection_dim=1024
* convert_open_clip_checkpoint use hidden_size for text_proj_dim
* convert_open_clip_checkpoint, revert checkpoint[text_proj_key].shape[1] -> [0]
values are identical
---------
Co-authored-by: Teriks <Teriks@users.noreply.github.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* fix-copies went uncaught it seems.
* remove more unneeded encode_prompt() tests
* Revert "fix-copies went uncaught it seems."
This reverts commit eefb302791.
* empty
* minor documentation fixes of the depth and normals pipelines
* update license headers
* update model checkpoints in examples
fix missing prediction_type in register_to_config in the normals pipeline
* add initial marigold intrinsics pipeline
update comments about num_inference_steps and ensemble_size
minor fixes in comments of marigold normals and depth pipelines
* update uncertainty visualization to work with intrinsics
* integrate iid
---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* More robust from_pretrained init_kwargs type checking
* Corrected for Python 3.10
* Type checks subclasses and fixed type warnings
* More type corrections and skip tokenizer type checking
* make style && make quality
* Updated docs and types for Lumina pipelines
* Fixed check for empty signature
* changed location of helper functions
* make style
---------
Co-authored-by: hlky <hlky@hlky.ac>
This PR updates the max_shift value in flux to 1.15 for consistency across the codebase. In addition to modifying max_shift in flux, all related functions that copy and use this logic, such as calculate_shift in `src/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3_img2img.py`, have also been updated to ensure uniform behavior.
* init
* encode with glm
* draft schedule
* feat(scheduler): Add CogView scheduler implementation
* feat(embeddings): add CogView 2D rotary positional embedding
* 1
* Update pipeline_cogview4.py
* fix the timestep init and sigma
* update latent
* draft patch(not work)
* fix
* [WIP][cogview4]: implement initial CogView4 pipeline
Implement the basic CogView4 pipeline structure with the following changes:
- Add CogView4 pipeline implementation
- Implement DDIM scheduler for CogView4
- Add CogView3Plus transformer architecture
- Update embedding models
Current limitations:
- CFG implementation uses padding for sequence length alignment
- Need to verify transformer inference alignment with Megatron
TODO:
- Consider separate forward passes for condition/uncondition
instead of padding approach
* [WIP][cogview4][refactor]: Split condition/uncondition forward pass in CogView4 pipeline
Split the forward pass for conditional and unconditional predictions in the CogView4 pipeline to match the original implementation. The noise prediction is now done separately for each case before combining them for guidance. However, the results still need improvement.
This is a work in progress as the generated images are not yet matching expected quality.
* use with -2 hidden state
* remove text_projector
* 1
* [WIP] Add tensor-reload to align input from transformer block
* [WIP] for older glm
* use with cogview4 transformers forward twice of u and uc
* Update convert_cogview4_to_diffusers.py
* remove this
* use main example
* change back
* reset
* setback
* back
* back 4
* Fix qkv conversion logic for CogView4 to Diffusers format
* back5
* revert to sat to cogview4 version
* update a new convert from megatron
* [WIP][cogview4]: implement CogView4 attention processor
Add CogView4AttnProcessor class for implementing scaled dot-product attention
with rotary embeddings for the CogVideoX model. This processor concatenates
encoder and hidden states, applies QKV projections and RoPE, but does not
include spatial normalization.
TODO:
- Fix incorrect QKV projection weights
- Resolve ~25% error in RoPE implementation compared to Megatron
* [cogview4] implement CogView4 transformer block
Implement CogView4 transformer block following the Megatron architecture:
- Add multi-modulate and multi-gate mechanisms for adaptive layer normalization
- Implement dual-stream attention with encoder-decoder structure
- Add feed-forward network with GELU activation
- Support rotary position embeddings for image tokens
The implementation follows the original CogView4 architecture while adapting
it to work within the diffusers framework.
* with new attn
* [bugfix] fix dimension mismatch in CogView4 attention
* [cogview4][WIP]: update final normalization in CogView4 transformer
Refactored the final normalization layer in CogView4 transformer to use separate layernorm and AdaLN operations instead of combined AdaLayerNormContinuous. This matches the original implementation but needs validation.
Needs verification against reference implementation.
* 1
* put back
* Update transformer_cogview4.py
* change time_shift
* Update pipeline_cogview4.py
* change timesteps
* fix
* change text_encoder_id
* [cogview4][rope] align RoPE implementation with Megatron
- Implement apply_rope method in attention processor to match Megatron's implementation
- Update position embeddings to ensure compatibility with Megatron-style rotary embeddings
- Ensure consistent rotary position encoding across attention layers
This change improves compatibility with Megatron-based models and provides
better alignment with the original implementation's positional encoding approach.
* [cogview4][bugfix] apply silu activation to time embeddings in CogView4
Applied silu activation to time embeddings before splitting into conditional
and unconditional parts in CogView4Transformer2DModel. This matches the
original implementation and helps ensure correct time conditioning behavior.
* [cogview4][chore] clean up pipeline code
- Remove commented out code and debug statements
- Remove unused retrieve_timesteps function
- Clean up code formatting and documentation
This commit focuses on code cleanup in the CogView4 pipeline implementation, removing unnecessary commented code and improving readability without changing functionality.
* [cogview4][scheduler] Implement CogView4 scheduler and pipeline
* now It work
* add timestep
* batch
* change convert scipt
* refactor pt. 1; make style
* refactor pt. 2
* refactor pt. 3
* add tests
* make fix-copies
* update toctree.yml
* use flow match scheduler instead of custom
* remove scheduling_cogview.py
* add tiktoken to test dependencies
* Update src/diffusers/models/embeddings.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* apply suggestions from review
* use diffusers apply_rotary_emb
* update flow match scheduler to accept timesteps
* fix comment
* apply review sugestions
* Update src/diffusers/schedulers/scheduling_flow_match_euler_discrete.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
---------
Co-authored-by: 三洋三洋 <1258009915@qq.com>
Co-authored-by: OleehyO <leehy0357@gmail.com>
Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Update Custom Diffusion Documentation for Multiple Concept Inference
This PR updates the Custom Diffusion documentation to correctly demonstrate multiple concept inference by:
- Initializing the pipeline from a proper foundation model (e.g., "CompVis/stable-diffusion-v1-4") instead of a fine-tuned model.
- Defining model_id explicitly to avoid NameError.
- Correcting method calls for loading attention processors and textual inversion embeddings.
* update
* fix
* non_blocking; handle parameters and buffers
* update
* Group offloading with cuda stream prefetching (#10516)
* cuda stream prefetch
* remove breakpoints
* update
* copy model hook implementation from pab
* update; ~very workaround based implementation but it seems to work as expected; needs cleanup and rewrite
* more workarounds to make it actually work
* cleanup
* rewrite
* update
* make sure to sync current stream before overwriting with pinned params
not doing so will lead to erroneous computations on the GPU and cause bad results
* better check
* update
* remove hook implementation to not deal with merge conflict
* re-add hook changes
* why use more memory when less memory do trick
* why still use slightly more memory when less memory do trick
* optimise
* add model tests
* add pipeline tests
* update docs
* add layernorm and groupnorm
* address review comments
* improve tests; add docs
* improve docs
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* apply suggestions from code review
* update tests
* apply suggestions from review
* enable_group_offloading -> enable_group_offload for naming consistency
* raise errors if multiple offloading strategies used; add relevant tests
* handle .to() when group offload applied
* refactor some repeated code
* remove unintentional change from merge conflict
* handle .cuda()
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* feat: new community mixture_tiling_sdxl pipeline for SDXL mixture-of-diffusers support
* fix use of variable latents to tile_latents
* removed references to modules that are not being used in this pipeline
* make style, make quality
* fixfeat: added _get_crops_coords_list function to pipeline to automatically define ctop,cleft coord to focus on image generation, helps to better harmonize the image and corrects the problem of flattened elements.
* feat: new community mixture_tiling_sdxl pipeline for SDXL mixture-of-diffusers support
* fix use of variable latents to tile_latents
* removed references to modules that are not being used in this pipeline
* make style, make quality
* feat(training-utils): support device and dtype params in compute_density_for_timestep_sampling
* chore: update type hint
* refactor: use union for type hint
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
controlnet union XL, make control_image immutible
when this argument is passed a list, __call__
modifies its content, since it is pass by reference
the list passed by the caller gets its content
modified unexpectedly
make a copy at method intro so this does not happen
Co-authored-by: Teriks <Teriks@users.noreply.github.com>
* add community pipeline for semantic guidance for flux
* fix imports in community pipeline for semantic guidance for flux
* Update examples/community/pipeline_flux_semantic_guidance.py
Co-authored-by: hlky <hlky@hlky.ac>
* fix community pipeline for semantic guidance for flux
---------
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
Co-authored-by: hlky <hlky@hlky.ac>
* Add IP-Adapter example to Flux docs
* Apply suggestions from code review
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* update
* update
* make style
* remove dynamo disable
* add coauthor
Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>
* update
* update
* update
* update mixin
* add some basic tests
* update
* update
* non_blocking
* improvements
* update
* norm.* -> norm
* apply suggestions from review
* add example
* update hook implementation to the latest changes from pyramid attention broadcast
* deinitialize should raise an error
* update doc page
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update docs
* update
* refactor
* fix _always_upcast_modules for asym ae and vq_model
* fix lumina embedding forward to not depend on weight dtype
* refactor tests
* add simple lora inference tests
* _always_upcast_modules -> _precision_sensitive_module_patterns
* remove todo comments about review; revert changes to self.dtype in unets because .dtype on ModelMixin should be able to handle fp8 weight case
* check layer dtypes in lora test
* fix UNet1DModelTests::test_layerwise_upcasting_inference
* _precision_sensitive_module_patterns -> _skip_layerwise_casting_patterns based on feedback
* skip test in NCSNppModelTests
* skip tests for AutoencoderTinyTests
* skip tests for AutoencoderOobleckTests
* skip tests for UNet1DModelTests - unsupported pytorch operations
* layerwise_upcasting -> layerwise_casting
* skip tests for UNetRLModelTests; needs next pytorch release for currently unimplemented operation support
* add layerwise fp8 pipeline test
* use xfail
* Apply suggestions from code review
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* add assertion with fp32 comparison; add tolerance to fp8-fp32 vs fp32-fp32 comparison (required for a few models' test to pass)
* add note about memory consumption on tesla CI runner for failing test
---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* autoencoder_dc tiling
* add tiling and slicing support in SANA pipelines
* create variables for padding length because the line becomes too long
* add tiling and slicing support in pag SANA pipelines
* revert changes to tile size
* make style
* add vae tiling test
* fix SanaMultiscaleLinearAttention apply_quadratic_attention bf16
---------
Co-authored-by: Aryan <aryan@huggingface.co>
* Update hunyuan_video.md to rectify the checkpoint id
* bfloat16
* more fixes
* don't update the checkpoint ids.
* update
* t -> T
* Apply suggestions from code review
* fix
---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Fix argument name in 8bit quantized example
Found a tiny mistake in the documentation where the text encoder model was passed to the wrong argument in the FluxPipeline.from_pretrained function.
* autoencoder_dc tiling
* add tiling and slicing support in SANA pipelines
* create variables for padding length because the line becomes too long
* add tiling and slicing support in pag SANA pipelines
* revert changes to tile size
* make style
* add vae tiling test
---------
Co-authored-by: Aryan <aryan@huggingface.co>
Correcting a typo in the table number of a referenced paper (in scheduling_ddim_inverse.py)
Changed the number of the referenced table from 1 to 2 in a comment of the set_timesteps() method of the DDIMInverseScheduler class (also according to the description of the 'timestep_spacing' attribute of its __init__ method).
* Add no_mmap arg.
* Fix arg parsing.
* Update another method to force no mmap.
* logging
* logging2
* propagate no_mmap
* logging3
* propagate no_mmap
* logging4
* fix open call
* clean up logging
* cleanup
* fix missing arg
* update logging and comments
* Rename to disable_mmap and update other references.
* [Docs] Update ltx_video.md to remove generator from `from_pretrained()` (#10316)
Update ltx_video.md to remove generator from `from_pretrained()`
* docs: fix a mistake in docstring (#10319)
Update pipeline_hunyuan_video.py
docs: fix a mistake
* [BUG FIX] [Stable Audio Pipeline] Resolve torch.Tensor.new_zeros() TypeError in function prepare_latents caused by audio_vae_length (#10306)
[BUG FIX] [Stable Audio Pipeline] TypeError: new_zeros(): argument 'size' failed to unpack the object at pos 3 with error "type must be tuple of ints,but got float"
torch.Tensor.new_zeros() takes a single argument size (int...) – a list, tuple, or torch.Size of integers defining the shape of the output tensor.
in function prepare_latents:
audio_vae_length = self.transformer.config.sample_size * self.vae.hop_length
audio_shape = (batch_size // num_waveforms_per_prompt, audio_channels, audio_vae_length)
...
audio = initial_audio_waveforms.new_zeros(audio_shape)
audio_vae_length evaluates to float because self.transformer.config.sample_size returns a float
Co-authored-by: hlky <hlky@hlky.ac>
* [docs] Fix quantization links (#10323)
Update overview.md
* [Sana]add 2K related model for Sana (#10322)
add 2K related model for Sana
* Update src/diffusers/loaders/single_file_model.py
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* Update src/diffusers/loaders/single_file.py
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* make style
---------
Co-authored-by: hlky <hlky@hlky.ac>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Leojc <liao_junchao@outlook.com>
Co-authored-by: Aditya Raj <syntaxticsugr@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Junsong Chen <cjs1020440147@icloud.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* dont assume scheduler has optional config params
* make style, make fix-copies
* calculate_shift
* fix-copies, usage in pipelines
---------
Co-authored-by: hlky <hlky@hlky.ac>
* fix device issue in single gpu case
* Update src/diffusers/pipelines/pipeline_utils.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
RFInversionFluxPipeline.encode_image, device fix
Use self._execution_device instead of self.device when selecting
a device for the input image tensor.
This allows for compatibility with enable_model_cpu_offload &
enable_sequential_cpu_offload
Co-authored-by: Teriks <Teriks@users.noreply.github.com>
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
* update
* fix make copies
* update
* add relevant markers to the integration test suite.
* add copied.
* fox-copies
* temporarily add print.
* directly place on CUDA as CPU isn't that big on the CIO.
* fixes to fuse_lora, aryan was right.
* fixes
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* make base code changes referred from train_instructpix2pix script in examples
* change code to use PEFT as discussed in issue 10062
* update README training command
* update README training command
* refactor variable name and freezing unet
* Update examples/research_projects/instructpix2pix_lora/train_instruct_pix2pix_lora.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* update README installation instructions.
* cleanup code using make style and quality
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update pipeline_controlnet.py
* Update pipeline_controlnet_img2img.py
runwayml Take-down so change all from to this
stable-diffusion-v1-5/stable-diffusion-v1-5
* Update pipeline_controlnet_inpaint.py
* runwayml take-down make change to sd-legacy
* runwayml take-down make change to sd-legacy
* runwayml take-down make change to sd-legacy
* runwayml take-down make change to sd-legacy
* Update convert_blipdiffusion_to_diffusers.py
style change
Enable VAE hash to be able to change with args change. If not, train_dataset_with_embeddiings may have row number inconsistency with train_dataset_with_vae.
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
* fix the Positinoal Embedding bug in 2K model;
* Change the default model to the BF16 one for more stable training and output
* make style
* substract buffer size
* add compute_module_persistent_sizes
---------
Co-authored-by: yiyixuxu <yixu310@gmail.com>
[BUG FIX] [Stable Audio Pipeline] TypeError: new_zeros(): argument 'size' failed to unpack the object at pos 3 with error "type must be tuple of ints,but got float"
torch.Tensor.new_zeros() takes a single argument size (int...) – a list, tuple, or torch.Size of integers defining the shape of the output tensor.
in function prepare_latents:
audio_vae_length = self.transformer.config.sample_size * self.vae.hop_length
audio_shape = (batch_size // num_waveforms_per_prompt, audio_channels, audio_vae_length)
...
audio = initial_audio_waveforms.new_zeros(audio_shape)
audio_vae_length evaluates to float because self.transformer.config.sample_size returns a float
Co-authored-by: hlky <hlky@hlky.ac>
* Check correct model type is passed to `from_pretrained`
* Flax, skip scheduler
* test_wrong_model
* Fix for scheduler
* Update tests/pipelines/test_pipelines.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* EnumMeta
* Flax
* scheduler in expected types
* make
* type object 'CLIPTokenizer' has no attribute '_PipelineFastTests__name'
* support union
* fix typing in kandinsky
* make
* add LCMScheduler
* 'LCMScheduler' object has no attribute 'sigmas'
* tests for wrong scheduler
* make
* update
* warning
* tests
* Update src/diffusers/pipelines/pipeline_utils.py
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* import FlaxSchedulerMixin
* skip scheduler
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* copy transformer
* copy vae
* copy pipeline
* make fix-copies
* refactor; make original code work with diffusers; test latents for comparison generated with this commit
* move rope into pipeline; remove flash attention; refactor
* begin conversion script
* make style
* refactor attention
* refactor
* refactor final layer
* their mlp -> our feedforward
* make style
* add docs
* refactor layer names
* refactor modulation
* cleanup
* refactor norms
* refactor activations
* refactor single blocks attention
* refactor attention processor
* make style
* cleanup a bit
* refactor double transformer block attention
* update mochi attn proc
* use diffusers attention implementation in all modules; checkpoint for all values matching original
* remove helper functions in vae
* refactor upsample
* refactor causal conv
* refactor resnet
* refactor
* refactor
* refactor
* grad checkpointing
* autoencoder test
* fix scaling factor
* refactor clip
* refactor llama text encoding
* add coauthor
Co-Authored-By: "Gregory D. Hunkins" <greg@ollano.com>
* refactor rope; diff: 0.14990234375; reason and fix: create rope grid on cpu and move to device
Note: The following line diverges from original behaviour. We create the grid on the device, whereas
original implementation creates it on CPU and then moves it to device. This results in numerical
differences in layerwise debugging outputs, but visually it is the same.
* use diffusers timesteps embedding; diff: 0.10205078125
* rename
* convert
* update
* add tests for transformer
* add pipeline tests; text encoder 2 is not optional
* fix attention implementation for torch
* add example
* update docs
* update docs
* apply suggestions from review
* refactor vae
* update
* Apply suggestions from code review
Co-authored-by: hlky <hlky@hlky.ac>
* Update src/diffusers/pipelines/hunyuan_video/pipeline_hunyuan_video.py
Co-authored-by: hlky <hlky@hlky.ac>
* Update src/diffusers/pipelines/hunyuan_video/pipeline_hunyuan_video.py
Co-authored-by: hlky <hlky@hlky.ac>
* make fix-copies
* update
---------
Co-authored-by: "Gregory D. Hunkins" <greg@ollano.com>
Co-authored-by: hlky <hlky@hlky.ac>
* first add a script for DC-AE;
* DC-AE init
* replace triton with custom implementation
* 1. rename file and remove un-used codes;
* no longer rely on omegaconf and dataclass
* replace custom activation with diffuers activation
* remove dc_ae attention in attention_processor.py
* iinherit from ModelMixin
* inherit from ConfigMixin
* dc-ae reduce to one file
* update downsample and upsample
* clean code
* support DecoderOutput
* remove get_same_padding and val2tuple
* remove autocast and some assert
* update ResBlock
* remove contents within super().__init__
* Update src/diffusers/models/autoencoders/dc_ae.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* remove opsequential
* update other blocks to support the removal of build_norm
* remove build encoder/decoder project in/out
* remove inheritance of RMSNorm2d from LayerNorm
* remove reset_parameters for RMSNorm2d
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* remove device and dtype in RMSNorm2d __init__
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Update src/diffusers/models/autoencoders/dc_ae.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Update src/diffusers/models/autoencoders/dc_ae.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Update src/diffusers/models/autoencoders/dc_ae.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* remove op_list & build_block
* remove build_stage_main
* change file name to autoencoder_dc
* move LiteMLA to attention.py
* align with other vae decode output;
* add DC-AE into init files;
* update
* make quality && make style;
* quick push before dgx disappears again
* update
* make style
* update
* update
* fix
* refactor
* refactor
* refactor
* update
* possibly change to nn.Linear
* refactor
* make fix-copies
* replace vae with ae
* replace get_block_from_block_type to get_block
* replace downsample_block_type from Conv to conv for consistency
* add scaling factors
* incorporate changes for all checkpoints
* make style
* move mla to attention processor file; split qkv conv to linears
* refactor
* add tests
* from original file loader
* add docs
* add standard autoencoder methods
* combine attention processor
* fix tests
* update
* minor fix
* minor fix
* minor fix & in/out shortcut rename
* minor fix
* make style
* fix paper link
* update docs
* update single file loading
* make style
* remove single file loading support; todo for DN6
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* add abstract
* 1. add DCAE into diffusers;
2. make style and make quality;
* add DCAE_HF into diffusers;
* bug fixed;
* add SanaPipeline, SanaTransformer2D into diffusers;
* add sanaLinearAttnProcessor2_0;
* first update for SanaTransformer;
* first update for SanaPipeline;
* first success run SanaPipeline;
* model output finally match with original model with the same intput;
* code update;
* code update;
* add a flow dpm-solver scripts
* 🎉[important update]
1. Integrate flow-dpm-sovler into diffusers;
2. finally run successfully on both `FlowMatchEulerDiscreteScheduler` and `FlowDPMSolverMultistepScheduler`;
* 🎉🔧[important update & fix huge bugs!!]
1. add SanaPAGPipeline & several related Sana linear attention operators;
2. `SanaTransformer2DModel` not supports multi-resolution input;
2. fix the multi-scale HW bugs in SanaPipeline and SanaPAGPipeline;
3. fix the flow-dpm-solver set_timestep() init `model_output` and `lower_order_nums` bugs;
* remove prints;
* add convert sana official checkpoint to diffusers format Safetensor.
* Update src/diffusers/models/transformers/sana_transformer_2d.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/diffusers/models/transformers/sana_transformer_2d.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/diffusers/models/transformers/sana_transformer_2d.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/diffusers/pipelines/pag/pipeline_pag_sana.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/diffusers/models/transformers/sana_transformer_2d.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/diffusers/models/transformers/sana_transformer_2d.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/diffusers/pipelines/sana/pipeline_sana.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/diffusers/pipelines/sana/pipeline_sana.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update Sana for DC-AE's recent commit;
* make style && make quality
* Add StableDiffusion3PAGImg2Img Pipeline + Fix SD3 Unconditional PAG (#9932)
* fix progress bar updates in SD 1.5 PAG Img2Img pipeline
---------
Co-authored-by: Vinh H. Pham <phamvinh257@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* make the vae can be None in `__init__` of `SanaPipeline`
* Update src/diffusers/models/transformers/sana_transformer_2d.py
Co-authored-by: hlky <hlky@hlky.ac>
* change the ae related code due to the latest update of DCAE branch;
* change the ae related code due to the latest update of DCAE branch;
* 1. change code based on AutoencoderDC;
2. fix the bug of new GLUMBConv;
3. run success;
* update for solving conversation.
* 1. fix bugs and run convert script success;
2. Downloading ckpt from hub automatically;
* make style && make quality;
* 1. remove un-unsed parameters in init;
2. code update;
* remove test file
* refactor; add docs; add tests; update conversion script
* make style
* make fix-copies
* refactor
* udpate pipelines
* pag tests and refactor
* remove sana pag conversion script
* handle weight casting in conversion script
* update conversion script
* add a processor
* 1. add bf16 pth file path;
2. add complex human instruct in pipeline;
* fix fast \tests
* change gemma-2-2b-it ckpt to a non-gated repo;
* fix the pth path bug in conversion script;
* change grad ckpt to original; make style
* fix the complex_human_instruct bug and typo;
* remove dpmsolver flow scheduler
* apply review suggestions
* change the `FlowMatchEulerDiscreteScheduler` to default `DPMSolverMultistepScheduler` with flow matching scheduler.
* fix the tokenizer.padding_side='right' bug;
* update docs
* make fix-copies
* fix imports
* fix docs
* add integration test
* update docs
* update examples
* fix convert_model_output in schedulers
* fix failing tests
---------
Co-authored-by: Junyu Chen <chenjydl2003@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: chenjy2003 <70215701+chenjy2003@users.noreply.github.com>
Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: hlky <hlky@hlky.ac>
* add test for expanding lora and normal lora error
* Update tests/lora/test_lora_layers_flux.py
* fix things.
* Update src/diffusers/loaders/peft.py
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Add offload option in flux-control training
* Update examples/flux-control/train_control_flux.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* modify help message
* fix format
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Use torch in get_2d_sincos_pos_embed
* Use torch in get_3d_sincos_pos_embed
* get_1d_sincos_pos_embed_from_grid in LatteTransformer3DModel
* deprecate
* move deprecate, make private
* fixed a dtype bfloat16 bug in torch_utils.py
when generating 1024*1024 image with bfloat16 dtype, there is an exception:
File "/opt/conda/lib/python3.10/site-packages/diffusers/utils/torch_utils.py", line 107, in fourier_filter
x_freq = fftn(x, dim=(-2, -1))
RuntimeError: Unsupported dtype BFloat16
* remove whitespace in torch_utils.py
* Update src/diffusers/utils/torch_utils.py
* Update torch_utils.py
---------
Co-authored-by: hlky <hlky@hlky.ac>
Sometimes, the decoder might lack parameters and only buffers (e.g., this happens when we manually need to convert all the parameters to buffers — e.g. to avoid packing fp16 and fp32 parameters with FSDP)
* Avoid creating a progress bar when it is disabled.
This is useful when exporting a pipeline, and allows a compiler to avoid trying to compile away tqdm.
* Prevent the PyTorch compiler from compiling progress bars.
* Update pipeline_utils.py
* Workaround for upscale with large output tensors.
Fixes#10040.
* Fix scale when output_size is given
* Style
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Add reference_attn & reference_adain support for sdxl with other controlnet
* Update README.md
* Update README.md by replacing human example with a cat one
Replace human example with a cat one
* Replace default human example with a cat one
* Use example images from huggingface documentation-images repository
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update sdxl reference community pipeline
* Update README.md
Add example images.
* Style & quality
* Use example images from huggingface documentation-images repository
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update handle single blocks on _convert_xlabs_flux_lora_to_diffusers to fix bug on updating keys and old_state_dict
---------
Co-authored-by: raul_ar <raul.moreno.salinas@autoretouch.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* add skip_layers argument to SD3 transformer model class
* add unit test for skip_layers in stable diffusion 3
* sd3: pipeline should support skip layer guidance
* up
---------
Co-authored-by: bghira <bghira@users.github.com>
Co-authored-by: yiyixuxu <yixu310@gmail.com>
* Move files to research-projects.
* docs: add IP Adapter training instructions
* Delete venv
* Update examples/ip_adapter/tutorial_train_sdxl.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Cherry-picked commits and re-moved files
to research_projects.
* make style.
* Update toctree and delete ip_adapter.
* Nit Fix
* Fix nit.
* Fix nit.
* Create training script for single GPU and set
model format to .safetensors
* Add sample inference script and restore _toctree
* Restore toctree.yaml
* fix spacing.
* Update toctree.yaml
---------
Co-authored-by: AMohamedAakhil <a.aakhilmohamed@gmail.com>
Co-authored-by: BootesVoid <78485654+AMohamedAakhil@users.noreply.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Add server example.
* Minor updates to README.
* Add fixes after local testing.
* Apply suggestions from code review
Updates to README from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* More doc updates.
* Maybe this will work to build the docs correctly?
* Fix style issues.
* Fix toc.
* Minor reformatting.
* Move docs to proper loc.
* Fix missing tick.
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Sync docs changes back to README.
* Very minor update to docs to add space.
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update pipeline_flux_img2img.py
Added FromSingleFileMixin to this pipeline loader like the other FLUX pipelines.
* Update pipeline_flux_img2img.py
typo
* modified: src/diffusers/pipelines/flux/pipeline_flux_img2img.py
* Feature IP Adapter Xformers Attention Processor: this fix error loading incorrect attention processor when setting Xformers attn after load ip adapter scale, issues: #8863#8872
* Add new community pipeline for 'Adaptive Mask Inpainting', introduced in [ECCV2024] Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models
Update train_controlnet_flux.py
Fix the problem of inconsistency between size of image and size of validation_image which causes np.stack to report error.
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* modelcard generation edit
* add missed tag
* fix param name
* fix var
* change str to dict
* add use_dora check
* use correct tags for lora
* make style && make quality
---------
Co-authored-by: Aryan <aryan@huggingface.co>
* make lora target modules configurable and change the default
* style
* make lora target modules configurable and change the default
* fix bug when using prodigy and training te
* fix mixed precision training as proposed in https://github.com/huggingface/diffusers/pull/9565 for full dreambooth as well
* add test and notes
* style
* address sayaks comments
* style
* fix test
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* fix: removed setting of text encoder lr for T5 as it's not being tuned
* fix: removed setting of text encoder lr for T5 as it's not being tuned
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
* add ostris trainer to README & add cache latents of vae
* add ostris trainer to README & add cache latents of vae
* style
* readme
* add test for latent caching
* add ostris noise scheduler
9ee1ef2a0a/toolkit/samplers/custom_flowmatch_sampler.py (L95)
* style
* fix import
* style
* fix tests
* style
* --change upcasting of transformer?
* update readme according to main
* add pivotal tuning for CLIP
* fix imports, encode_prompt call,add TextualInversionLoaderMixin to FluxPipeline for inference
* TextualInversionLoaderMixin support for FluxPipeline for inference
* move changes to advanced flux script, revert canonical
* add latent caching to canonical script
* revert changes to canonical script to keep it separate from https://github.com/huggingface/diffusers/pull/9160
* revert changes to canonical script to keep it separate from https://github.com/huggingface/diffusers/pull/9160
* style
* remove redundant line and change code block placement to align with logic
* add initializer_token arg
* add transformer frac for range support from pure textual inversion to the orig pivotal tuning
* support pure textual inversion - wip
* adjustments to support pure textual inversion and transformer optimization in only part of the epochs
* fix logic when using initializer token
* fix pure_textual_inversion_condition
* fix ti/pivotal loading of last validation run
* remove embeddings loading for ti in final training run (to avoid adding huggingface hub dependency)
* support pivotal for t5
* adapt pivotal for T5 encoder
* adapt pivotal for T5 encoder and support in flux pipeline
* t5 pivotal support + support fo pivotal for clip only or both
* fix param chaining
* fix param chaining
* README first draft
* readme
* readme
* readme
* style
* fix import
* style
* add fix from https://github.com/huggingface/diffusers/pull/9419
* add to readme, change function names
* te lr changes
* readme
* change concept tokens logic
* fix indices
* change arg name
* style
* dummy test
* revert dummy test
* reorder pivoting
* add warning in case the token abstraction is not the instance prompt
* experimental - wip - specific block training
* fix documentation and token abstraction processing
* remove transformer block specification feature (for now)
* style
* fix copies
* fix indexing issue when --initializer_concept has different amounts
* add if TextualInversionLoaderMixin to all flux pipelines
* style
* fix import
* fix imports
* address review comments - remove necessary prints & comments, use pin_memory=True, use free_memory utils, unify warning and prints
* style
* logger info fix
* make lora target modules configurable and change the default
* make lora target modules configurable and change the default
* style
* make lora target modules configurable and change the default, add notes to readme
* style
* add tests
* style
* fix repo id
* add updated requirements for advanced flux
* fix indices of t5 pivotal tuning embeddings
* fix path in test
* remove `pin_memory`
* fix filename of embedding
* fix filename of embedding
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* refac: docstrings in training_utils.py
* fix: manual edits
* run make style
* add docstring at cast_training_params
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Fix local variable 'cached_folder' referenced before assignment in hub_utils.py
Fix when use `local_files_only=True` with `subfolder`, local variable 'cached_folder' referenced before assignment issue.
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Improve the performance and suitable for NPU
* Improve the performance and suitable for NPU computing
* Improve the performance and suitable for NPU
* Improve the performance and suitable for NPU
* Improve the performance and suitable for NPU
* Improve the performance and suitable for NPU
---------
Co-authored-by: 蒋硕 <jiangshuo9@h-partners.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Fixed local variable noise_pred_text referenced before assignment when using PAG with guidance scale and guidance rescale at the same time.
* Fixed style.
* Made returning text pred noise an argument.
* Removed int8 to float32 conversion (`* 2.0 - 1.0`) from `train_transforms` as it caused image overexposure.
Added `_resize_for_rectangle_crop` function to enable video cropping functionality. The cropping mode can be configured via `video_reshape_mode`, supporting options: ['center', 'random', 'none'].
* The number 127.5 may experience precision loss during division operations.
* wandb request pil image Type
* Resizing bug
* del jupyter
* make style
* Update examples/cogvideo/README.md
* make style
---------
Co-authored-by: --unset <--unset>
Co-authored-by: Aryan <aryan@huggingface.co>
* Support bfloat16 for Upsample2D
* Add test and use is_torch_version
* Resolve comments and add decorator
* Simplify require_torch_version_greater_equal decorator
* Run make style
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* add train flux-controlnet scripts in example.
* fix error
* fix subfolder error
* fix preprocess error
* Update examples/controlnet/README_flux.md
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update examples/controlnet/README_flux.md
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* fix readme
* fix note error
* add some Tutorial for deepspeed
* fix some Format Error
* add dataset_path example
* remove print, add guidance_scale CLI, readable apply
* Update examples/controlnet/README_flux.md
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* update,push_to_hub,save_weight_dtype,static method,clear_objs_and_retain_memory,report_to=wandb
* add push to hub in readme
* apply weighting schemes
* add note
* Update examples/controlnet/README_flux.md
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* make code style and quality
* fix some unnoticed error
* make code style and quality
* add example controlnet in readme
* add test controlnet
* rm Remove duplicate notes
* Fix formatting errors
* add new control image
* add model cpu offload
* update help for adafactor
* make quality & style
* make quality and style
* rename flux_controlnet_model_name_or_path
* fix back src/diffusers/pipelines/flux/pipeline_flux_controlnet.py
* fix dtype error by pre calculate text emb
* rm image save
* quality fix
* fix test
* fix tiny flux train error
* change report to to tensorboard
* fix save name error when test
* Fix shrinking errors
---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Your Name <you@example.com>
* bugfix: precedence of operations should be slicing -> tiling
* fix typo
* fix another typo
* deprecate current implementation of tiled_encode and use new impl
* Update src/diffusers/models/autoencoders/autoencoder_kl.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Update src/diffusers/models/autoencoders/autoencoder_kl.py
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* flux controlnet mode to take into account batch size
* incorporate yiyixuxu's suggestions (cleaner logic) as well as clean up control mode handling for multi case
* fix
* fix use_guidance when controlnet is a multi and does not have config
---------
Co-authored-by: Christopher Beckham <christopher.j.beckham@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* refactor scheduler class usage
* reorder to make tests more readable
* remove pipeline specific checks and skip tests directly
* rewrite denoiser conditions cleaner
* bump tolerance for cog test
* [docs] Replace runwayml/stable-diffusion-v1-5 with Lykon/dreamshaper-8
Updated documentation as runwayml/stable-diffusion-v1-5 has been removed from Huggingface.
* Update docs/source/en/using-diffusers/inpaint.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Replace with stable-diffusion-v1-5/stable-diffusion-v1-5
* Update inpaint.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Fix dtype error
* [bugfix] Fixed the issue on sd3 dreambooth training
* [bugfix] Fixed the issue on sd3 dreambooth training
---------
Co-authored-by: 蒋硕 <jiangshuo9@h-partners.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* MAINT Permission for GH token in stale.yml
See https://github.com/huggingface/peft/pull/2061 for the equivalent PR
in PEFT.
This restores the functionality of the stale bot after permissions for
the token have been limited. The action still shows errors for PEFT but
the bot appears to work fine.
* Also add write permissions for PRs
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* enable pxla training of stable diffusion 2.x models.
* run linter/style and run pipeline test for stable diffusion and fix issues.
* update xla libraries
* fix read me newline.
* move files to research folder.
* update per comments.
* rename readme.
---------
Co-authored-by: Juan Acevedo <jfacevedo@google.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* add controlnet sd3 example
* add controlnet sd3 example
* update controlnet sd3 example
* add controlnet sd3 example test
* fix quality and style
* update test
* update test
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* add animatediff + vid2vide + controlnet
* post tests fixes
* PR discussion fixes
* update docs
* change input video to links on HF + update an example
* make quality fix
* fix ip adapter test
* fix ip adapter test input
* update ip adapter test
* remove 2 shapes from SDFunctionTesterMixin::test_vae_tiling
* combine freeu enable/disable test to reduce many inference runs
* remove low signal unet test for signature
* remove low signal embeddings test
* remove low signal progress bar test from PipelineTesterMixin
* combine ip-adapter single and multi tests to save many inferences
* fix broken tests
* Update tests/pipelines/test_pipelines_common.py
* Update tests/pipelines/test_pipelines_common.py
* add progress bar tests
* add vid2vid pipeline for cogvideox
* make fix-copies
* update docs
* fake context parallel cache, vae encode tiling
* add test for cog vid2vid
* use video link from HF docs repo
* add copied from comments; correctly rename test class
* Update train_custom_diffusion.py to fix the LR schedulers for `num_train_epochs`
* Fix saving text embeddings during safe serialization
* Fixed formatting
to avoid "FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations"
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* optimize guidance creation in flux pipeline by moving it outside the loop
* use torch.full instead of torch.tensor to create a tensor with a single value
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* add requirements + fix link to bghira's guide
* text ecnoder training fixes
* text encoder training fixes
* text encoder training fixes
* text encoder training fixes
* style
* add tests
* fix encode_prompt call
* style
* unpack_latents test
* fix lora saving
* remove default val for max_sequenece_length in encode_prompt
* remove default val for max_sequenece_length in encode_prompt
* style
* testing
* style
* testing
* testing
* style
* fix sizing issue
* style
* revert scaling
* style
* style
* scaling test
* style
* scaling test
* remove model pred operation left from pre-conditioning
* remove model pred operation left from pre-conditioning
* fix trainable params
* remove te2 from casting
* transformer to accelerator
* remove prints
* empty commit
* fix for lr scheduler in distributed training
* Fixed the recalculation of the total training step section
* Fixed lint error
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* clipping for fp16
* fix typo
* added fp16 inference to docs
* fix docs typo
* include link for fp16 investigation
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* initial work draft for freenoise; needs massive cleanup
* fix freeinit bug
* add animatediff controlnet implementation
* revert attention changes
* add freenoise
* remove old helper functions
* add decode batch size param to all pipelines
* make style
* fix copied from comments
* make fix-copies
* make style
* copy animatediff controlnet implementation from #8972
* add experimental support for num_frames not perfectly fitting context length, ocntext stride
* make unet motion model lora work again based on #8995
* copy load video utils from #8972
* copied from AnimateDiff::prepare_latents
* address the case where last batch of frames does not match length of indices in prepare latents
* decode_batch_size->vae_batch_size; batch vae encode support in animatediff vid2vid
* revert sparsectrl and sdxl freenoise changes
* revert pia
* add freenoise tests
* make fix-copies
* improve docstrings
* add freenoise tests to animatediff controlnet
* update tests
* Update src/diffusers/models/unets/unet_motion_model.py
* add freenoise to animatediff pag
* address review comments
* make style
* update tests
* make fix-copies
* fix error message
* remove copied from comment
* fix imports in tests
* update
---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* add hunyuan model test
* apply suggestions
* reduce dims further
* reduce dims further
* run make style
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* add LatteTransformer3DModel model test
* change patch_size to 1
* reduce req len
* reduce channel dims
* increase num_layers
* reduce dims further
* run make style
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Aryan <aryan@huggingface.co>
* Update TensorRT txt2img and inpaint community pipelines
Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
* update tensorrt install instructions
Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
---------
Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* animatediff specific transformer model
* make style
* make fix-copies
* move blocks to unet motion model
* make style
* remove dummy object
* fix incorrectly passed param causing test failures
* rename model and output class
* fix sparsectrl imports
* remove todo comments
* remove temporal double self attn param from controlnet sparsectrl
* add deprecated versions of blocks
* apply suggestions from review
* update
---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* handle lora scale and clip skip in lpw sd and sdxl
* use StableDiffusionLoraLoaderMixin
* use StableDiffusionXLLoraLoaderMixin
* style
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* chore: Update is_google_colab check to use environment variable
* Check Colab with all possible COLAB_* env variables
* Remove unnecessary word
* Make `_is_google_colab` more inclusive
* Revert "Make `_is_google_colab` more inclusive"
This reverts commit 6406db21ac.
* Make `_is_google_colab` more inclusive.
* chore: Update import_utils.py with notebook check improvement
* Refactor import_utils.py to improve notebook detection for VS Code's notebook
* chore: Remove `is_notebook()` function and related code
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Fix multi-gpu case
* Prefer previously created `unwrap_model()` function
For `torch.compile()` generalizability
* `chore: update unwrap_model() function to use accelerator.unwrap_model()`
* Add AuraFlowPipeline and KolorsPipeline to auto map
Just T2I. Validated using `quickdif`
* Add Kolors I2I and SD3 Inpaint auto maps
* style
---------
Co-authored-by: yiyixuxu <yixu310@gmail.com>
* add Latte to diffusers
* remove print
* remove print
* remove print
* remove unuse codes
* remove layer_norm_latte and add a flag
* remove layer_norm_latte and add a flag
* update latte_pipeline
* update latte_pipeline
* remove unuse squeeze
* add norm_hidden_states.ndim == 2: # for Latte
* fixed test latte pipeline bugs
* fixed test latte pipeline bugs
* delete sh
* add doc for latte
* add licensing
* Move Transformer3DModelOutput to modeling_outputs
* give a default value to sample_size
* remove the einops dependency
* change norm2 for latte
* modify pipeline of latte
* update test for Latte
* modify some codes for latte
* modify for Latte pipeline
* modify for Latte pipeline
* modify for Latte pipeline
* modify for Latte pipeline
* modify for Latte pipeline
* modify for Latte pipeline
* modify for Latte pipeline
* modify for Latte pipeline
* modify for Latte pipeline
* modify for Latte pipeline
* modify for Latte pipeline
* modify for Latte pipeline
* modify for Latte pipeline
* modify for Latte pipeline
* modify for Latte pipeline
* modify for Latte pipeline
* modify for Latte pipeline
* modify for Latte pipeline
* modify for Latte pipeline
* modify for Latte pipeline
* modify for Latte pipeline
* modify for Latte pipeline
* modify for Latte pipeline
* modify for Latte pipeline
* modify for Latte pipeline
* modify for Latte pipeline
* modify for Latte pipeline
* video_length -> num_frames; update prepare_latents copied from
* make fix-copies
* make style
* typo: videe -> video
* update
* modify for Latte pipeline
* modify latte pipeline
* modify latte pipeline
* modify latte pipeline
* modify latte pipeline
* modify for Latte pipeline
* Delete .vscode directory
* make style
* make fix-copies
* add latte transformer 3d to docs _toctree.yml
* update example
* reduce frames for test
* fixed bug of _text_preprocessing
* set num frame to 1 for testing
* remove unuse print
* add text = self._clean_caption(text) again
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
Co-authored-by: Aryan <aryan@huggingface.co>
* Add vae_roundtrip.py example
* Add cuda support to vae_roundtrip
* Move vae_roundtrip.py into research_projects/vae
* Fix channel scaling in vae roundrip and also support taesd.
* Apply ruff --fix for CI gatekeep check
---------
Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
* introduce to promote reusability.
* up
* add more tests
* up
* remove comments.
* fix fuse_nan test
* clarify the scope of fuse_lora and unfuse_lora
* remove space
* minor changes
* minor changes
* minor changes
* minor changes
* minor changes
* minor changes
* minor changes
* fix
* fix
* aligning with blora script
* aligning with blora script
* aligning with blora script
* aligning with blora script
* aligning with blora script
* remove prints
* style
* default val
* license
* move save_model_card to outside push_to_hub
* Update train_dreambooth_lora_sdxl_advanced.py
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Motion Model / Adapter versatility
- allow to use a different number of layers per block
- allow to use a different number of transformer per layers per block
- allow a different number of motion attention head per block
- use dropout argument in get_down/up_block in 3d blocks
* Motion Model added arguments renamed & refactoring
* Add test for asymmetric UNetMotionModel
* Add check for WindowsPath in to_json_string
On Windows, os.path.join returns a WindowsPath. to_json_string does not convert this from a WindowsPath to a string. Added check for WindowsPath to to_json_saveable.
* Remove extraneous convert to string in test_check_path_types (tests/others/test_config.py)
* Fix style issues in tests/others/test_config.py
* Add unit test to test_config.py to verify that PosixPath and WindowsPath (depending on system) both work when converted to JSON
* Remove distinction between PosixPath and WindowsPath in ConfigMixIn.to_json_string(). Conditional now tests for Path, and uses Path.as_posix() to convert to string.
---------
Co-authored-by: Vincent Dovydaitis <vincedovy@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* doc for max_sequence_length
* better position and changed note to tip
* apply suggestions
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Class methods are supposed to use `cls` conventionally
* `make style && make quality`
* An Empty commit
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Discourage using `revision`
* `make style && make quality`
* Refactor code to use 'variant' instead of 'revision'
* `revision="bf16"` -> `variant="bf16"`
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Trim all the trailing white space in the whole repo
* Remove unnecessary empty places
* make style && make quality
* Trim trailing white space
* trim
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Add support for _foreach operations and non-blocking to EMAModel
* default foreach to false
* add non-blocking EMA offloading to SD1.5 T2I example script
* fix whitespace
* move foreach to cli argument
* linting
* Update README.md re: EMA weight training
* correct args.foreach_ema
* add tests for foreach ema
* code quality
* add foreach to from_pretrained
* default foreach false
* fix linting
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: drhead <a@a.a>
* fix
* add check
* key present is checked before
* test case draft
* aply suggestions
* changed testing repo, back to old class
* forgot docstring
---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* get rid of the legacy lora remnants and make our codebase lighter
* fix depcrecated lora argument
* fix
* empty commit to trigger ci
* remove print
* empty
* fix typo in __call__ of pipeline_stable_diffusion_3.py
* fix typo in __call__ of pipeline_stable_diffusion_3_img2img.py
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
[SD3 Docs] Corrected title about loading model with T5
Corrected the documentation title to "Loading the single file checkpoint with T5" Previously, it incorrectly stated "Loading the single file checkpoint without T5" which contradicted the code snippet showing how to load the SD3 checkpoint with the T5 model
* [LoRA] text encoder: read the ranks for all the attn modules
* In addition to out_proj, read the ranks of adapters for q_proj, k_proj, and v_proj
* Allow missing adapters (UNet already supports this)
* ruff format loaders.lora
* [LoRA] add tests for partial text encoders LoRAs
* [LoRA] update test_simple_inference_with_partial_text_lora to be deterministic
* [LoRA] comment justifying test_simple_inference_with_partial_text_lora
* style
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Fix sharding when no device_map is passed
* style
* add tests
* align
* add docstring
* format
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update train_dreambooth_sd3.py to fix TE garbage collection
* Update train_dreambooth_lora_sd3.py to fix TE garbage collection
---------
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* image_processor.py: Fixed an error in ValueError's message , as the string's join method tried to join types, instead of strings
Bug that occurred:
f"Input is in incorrect format. Currently, we only support {', '.join(supported_formats)}"
TypeError: sequence item 0: expected str instance, type found
* Fixed: C417 Unnecessary `map` usage (rewrite using a generator expression)
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* feat: support saving a model in sharded checkpoints.
* feat: make loading of sharded checkpoints work.
* add tests
* cleanse the loading logic a bit more.
* more resilience while loading from the Hub.
* parallelize shard downloads by using snapshot_download()/
* default to a shard size.
* more fix
* Empty-Commit
* debug
* fix
* uality
* more debugging
* fix more
* initial comments from Benjamin
* move certain methods to loading_utils
* add test to check if the correct number of shards are present.
* add a test to check if loading of sharded checkpoints from the Hub is okay
* clarify the unit when passed as an int.
* use hf_hub for sharding.
* remove unnecessary code
* remove unnecessary function
* lucain's comments.
* fixes
* address high-level comments.
* fix test
* subfolder shenanigans./
* Update src/diffusers/utils/hub_utils.py
Co-authored-by: Lucain <lucainp@gmail.com>
* Apply suggestions from code review
Co-authored-by: Lucain <lucainp@gmail.com>
* remove _huggingface_hub_version as not needed.
* address more feedback.
* add a test for local_files_only=True/
* need hf hub to be at least 0.23.2
* style
* final comment.
* clean up subfolder.
* deal with suffixes in code.
* _add_variant default.
* use weights_name_pattern
* remove add_suffix_keyword
* clean up downloading of sharded ckpts.
* don't return something special when using index.json
* fix more
* don't use bare except
* remove comments and catch the errors better
* fix a couple of things when using is_file()
* empty
---------
Co-authored-by: Lucain <lucainp@gmail.com>
* first draft
* secret
* tiktok
* capital matters
* dataset matter
* don't be a prick
* refact
* only on main or tag
* document with an example
* Update destination dataset
* link
* allow manual trigger
* better
* lin
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* handle norm_type of transformer2d_model safely.
* log an info when old model class is being returned.
* Apply suggestions from code review
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* remove extra stuff
---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* Update transformer2d.md title
For the other classes (e.g., UNet2DModel) the title of the documentation coincides with the name of the class, but that was not the case for Transformer2DModel.
* Update model docs titles for consistency with class names
* Modularized the train_lora_sdxl file
* Modularized the train_lora_sdxl file
* Modularized the train_lora_sdxl file
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Modularized the train_lora file
* Modularized the train_lora file
* Modularized the train_lora file
* Modularized the train_lora file
* Modularized the train_lora file
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* implement marigold depth and normals pipelines in diffusers core
* remove bibtex
* remove deprecations
* remove save_memory argument
* remove validate_vae
* remove config output
* remove batch_size autodetection
* remove presets logic
move default denoising_steps and processing_resolution into the model config
make default ensemble_size 1
* remove no_grad
* add fp16 to the example usage
* implement is_matplotlib_available
use is_matplotlib_available, is_scipy_available for conditional imports in the marigold depth pipeline
* move colormap, visualize_depth, and visualize_normals into export_utils.py
* make the denoising loop more lucid
fix the outputs to always be 4d tensors or lists of pil images
support a 4d input_image case
attempt to support model_cpu_offload_seq
move check_inputs into a separate function
change default batch_size to 1, remove any logic to make it bigger implicitly
* style
* rename denoising_steps into num_inference_steps
* rename input_image into image
* rename input_latent into latents
* remove decode_image
change decode_prediction to use the AutoencoderKL.decode method
* move clean_latent outside of progress_bar
* refactor marigold-reusable image processing bits into MarigoldImageProcessor class
* clean up the usage example docstring
* make ensemble functions members of the pipelines
* add early checks in check_inputs
rename E into ensemble_size in depth ensembling
* fix vae_scale_factor computation
* better compatibility with torch.compile
better variable naming
* move export_depth_to_png to export_utils
* remove encode_prediction
* improve visualize_depth and visualize_normals to accept multi-dimensional data and lists
remove visualization functions from the pipelines
move exporting depth as 16-bit PNGs functionality from the depth pipeline
update example docstrings
* do not shortcut vae.config variables
* change all asserts to raise ValueError
* rename output_prediction_type to output_type
* better variable names
clean up variable deletion code
* better variable names
* pass desc and leave kwargs into the diffusers progress_bar
implement nested progress bar for images and steps loops
* implement scale_invariant and shift_invariant flags in the ensemble_depth function
add scale_invariant and shift_invariant flags readout from the model config
further refactor ensemble_depth
support ensembling without alignment
add ensemble_depth docstring
* fix generator device placement checks
* move encode_empty_text body into the pipeline call
* minor empty text encoding simplifications
* adjust pipelines' class docstrings to explain the added construction arguments
* improve the scipy failure condition
add comments
improve docstrings
change the default use_full_z_range to True
* make input image values range check configurable in the preprocessor
refactor load_image_canonical in preprocessor to reject unknown types and return the image in the expected 4D format of tensor and on right device
support a list of everything as inputs to the pipeline, change type to PipelineImageInput
implement a check that all input list elements have the same dimensions
improve docstrings of pipeline outputs
remove check_input pipeline argument
* remove forgotten print
* add prediction_type model config
* add uncertainty visualization into export utils
fix NaN values in normals uncertainties
* change default of output_uncertainty to False
better handle the case of an attempt to export or visualize none
* fix `output_uncertainty=False`
* remove kwargs
fix check_inputs according to the new inputs of the pipeline
* rename prepare_latent into prepare_latents as in other pipelines
annotate prepare_latents in normals pipeline with "Copied from"
annotate encode_image in normals pipeline with "Copied from"
* move nested-capable `progress_bar` method into the pipelines
revert the original `progress_bar` method in pipeline_utils
* minor message improvement
* fix cpu offloading
* move colormap, visualize_depth, export_depth_to_16bit_png, visualize_normals, visualize_uncertainty to marigold_image_processing.py
update example docstrings
* fix missing comma
* change torch.FloatTensor to torch.Tensor
* fix importing of MarigoldImageProcessor
* fix vae offloading
fix batched image encoding
remove separate encode_image function and use vae.encode instead
* implement marigold's intial tests
relax generator checks in line with other pipelines
implement return_dict __call__ argument in line with other pipelines
* fix num_images computation
* remove MarigoldImageProcessor and outputs from import structure
update tests
* update docstrings
* update init
* update
* style
* fix
* fix
* up
* up
* up
* add simple test
* up
* update expected np input/output to be channel last
* move expand_tensor_or_array into the MarigoldImageProcessor
* rewrite tests to follow conventions - hardcoded slices instead of image artifacts
write more smoke tests
* add basic docs.
* add anton's contribution statement
* remove todos.
* fix assertion values for marigold depth slow tests
* fix assertion values for depth normals.
* remove print
* support AutoencoderTiny in the pipelines
* update documentation page
add Available Pipelines section
add Available Checkpoints section
add warning about num_inference_steps
* fix missing import in docstring
fix wrong value in visualize_depth docstring
* [doc] add marigold to pipelines overview
* [doc] add section "usage examples"
* fix an issue with latents check in the pipelines
* add "Frame-by-frame Video Processing with Consistency" section
* grammarly
* replace tables with images with css-styled images (blindly)
* style
* print
* fix the assertions.
* take from the github runner.
* take the slices from action artifacts
* style.
* update with the slices from the runner.
* remove unnecessary code blocks.
* Revert "[doc] add marigold to pipelines overview"
This reverts commit a505165150afd8dab23c474d1a054ea505a56a5f.
* remove invitation for new modalities
* split out marigold usage examples
* doc cleanup
---------
Co-authored-by: yiyixuxu <yixu310@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
* add a more secure way to run tests from a PR.
* make pytest more secure.
* address dhruv's comments.
* improve validation check.
* Update .github/workflows/run_tests_from_a_pr.yml
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
sampling bug fix in basic_training.md
In the diffusers basic training tutorial, setting the manual seed argument (generator=torch.manual_seed(config.seed)) in the pipeline call inside evaluate() function rewinds the dataloader shuffling, leading to overfitting due to the model seeing same sequence of training examples after every evaluation call. Using generator=torch.Generator(device='cpu').manual_seed(config.seed) avoids this.
* Update pipeline_stable_diffusion_instruct_pix2pix.py
Add `cross_attention_kwargs` to `__call__` method of `StableDiffusionInstructPix2PixPipeline`, which are passed to UNet.
* Update documentation for pipeline_stable_diffusion_instruct_pix2pix.py
* Update docstring
* Update docstring
* Fix typing import
* make _callback_tensor_inputs consistent between sdxl pipelines
* forgot this one
* fix failing test
* fix test_components_function
* fix controlnet inpaint tests
* Merged isinstance calls to make the code simpler.
* Corrected formatting errors using ruff.
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Fix `added_cond_kwargs` when using IP-Adapter
Fix error when using IP-Adapter in pipeline and passing `ip_adapter_image_embeds` instead of `ip_adapter_image`
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Expand `diffusers-cli env`
* SafeTensors -> Safetensors
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Move `safetensors_version = "not installed"` to `else`
* Update `safetensors_version` checking
* Add GPU detection for Linux, Mac OS, and Windows
* Add accelerator detection to environment command
* Add is_peft_version to import_utils
* Update env.py
* Add `huggingface_hub` reference
* Add `transformers` reference
* Add reference for `huggingface_hub`
* Fix print statement in env.py for unusual OS
* Up
* Fix platform information in env.py
* up
* Fix import order in env.py
* ruff
* make style
* Fix platform system check in env.py
* Fix run method return type in env.py
* 🤗
* No need f-string
* Remove location info
* Remove accelerate config
* Refactor env.py to remove accelerate config
* feat: Add support for `bitsandbytes` library in environment command
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* fixed vae loading issue #7619
* rerun make style && make quality
* bring back model_has_vae and add change \ to / in config_file_name on windows os to make match work
* add missing import platform
* bring back import model_info
* make config_file_name OS independent
* switch to using Path.as_posix() to resolve OS dependence
* improve style
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: bssrdf <bssrdf@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Update requirements.txt
If the datasets library is old, it will not read the metadata.jsonl and the label will default to an integer of type int.
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Fixed a wrong link to python versions in contributing.md file.
* Updated the link to a permalink, so that it will permanently point to the specific line.
* find & replace all FloatTensors to Tensor
* apply formatting
* Update torch.FloatTensor to torch.Tensor in the remaining files
* formatting
* Fix the rest of the places where FloatTensor is used as well as in documentation
* formatting
* Update new file from FloatTensor to Tensor
* Remove dead code
* PylancereportGeneralTypeIssues: Strings nested within an f-string cannot use the same quote character as the f-string prior to Python 3.12.
* Remove dead code
SDXL LoRA weights for text encoders should be decoupled on save
The method checks if at least one of unet, text_encoder and
text_encoder_2 lora weights are passed, which was not reflected in the
implentation.
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
`model_output.shape` may only have rank 1.
There are warnings related to use of random keys.
```
tests/schedulers/test_scheduler_flax.py: 13 warnings
/Users/phillypham/diffusers/src/diffusers/schedulers/scheduling_ddpm_flax.py:268: FutureWarning: normal accepts a single key, but was given a key array of shape (1, 2) != (). Use jax.vmap for batching. In a future JAX version, this will be an error.
noise = jax.random.normal(split_key, shape=model_output.shape, dtype=self.dtype)
tests/schedulers/test_scheduler_flax.py::FlaxDDPMSchedulerTest::test_betas
/Users/phillypham/virtualenv/diffusers/lib/python3.9/site-packages/jax/_src/random.py:731: FutureWarning: uniform accepts a single key, but was given a key array of shape (1,) != (). Use jax.vmap for batching. In a future JAX version, this will be an error.
u = uniform(key, shape, dtype, lo, hi) # type: ignore[arg-type]
```
* 7879 - adjust documentation to use naruto dataset, since pokemon is now gated
* replace references to pokemon in docs
* more references to pokemon replaced
* Japanese translation update
---------
Co-authored-by: bghira <bghira@users.github.com>
* Add Ascend NPU support for SDXL fine-tuning and fix the model saving bug when using DeepSpeed.
* fix check code quality
* Decouple the NPU flash attention and make it an independent module.
* add doc and unit tests for npu flash attention.
---------
Co-authored-by: mhh001 <mahonghao1@huawei.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* chore: reducing model sizes
* chore: shrinks further
* chore: shrinks further
* chore: shrinking model for img2img pipeline
* chore: reducing size of model for inpaint pipeline
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
FlaxStableDiffusionSafetyChecker sets main_input_name to "clip_input".
This makes StableDiffusionSafetyChecker consistent.
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Added get_velocity function to EulerDiscreteScheduler.
* Fix white space on blank lines
* Added copied from statement
* back to the original.
---------
Co-authored-by: Ruining Li <ruining@robots.ox.ac.uk>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
swap the order for do_classifier_free_guidance concat with repeat
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* Check for latents, before calling prepare_latents - sdxlImg2Img
* Added latents check for all the img2img pipeline
* Fixed silly mistake while checking latents as None
A new function compute_dream_and_update_latents has been added to the
training utilities that allows you to do DREAM rectified training in line
with the paper https://arxiv.org/abs/2312.00210. The method can be used
with an extra argument in the train_text_to_image.py script.
Co-authored-by: Jimmy <39@🇺🇸.com>
* Convert channel order to BGR for the watermark encoder. Convert the watermarked BGR images back to RGB. Fixes#6292
* Revert channel order before stacking images to overcome limitations that negative strides are currently not supported
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Fixed wrong decorator by modifying it to @classmethod.
* Updated the method and it's argument.
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* add scheduled pseudo-huber loss training scripts
See #7488
* add reduction modes to huber loss
* [DB Lora] *2 multiplier to huber loss cause of 1/2 a^2 conv.
pairing of c6495def1f
* [DB Lora] add option for smooth l1 (huber / delta)
Pairing of dd22958caa
* [DB Lora] unify huber scheduling
Pairing of 19a834c3ab
* [DB Lora] add snr huber scheduler
Pairing of 47fb1a6854
* fixup examples link
* use snr schedule by default in DB
* update all huber scripts with snr
* code quality
* huber: make style && make quality
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Initialize target_unet from unet rather than teacher_unet so that we correctly add time_embedding.cond_proj if necessary.
* Use UNet2DConditionModel.from_config to initialize target_unet from unet's config.
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* give it a shot.
* print.
* correct assertion.
* gather results from the rest of the tests.
* change the assertion values where needed.
* remove print statements.
* get device <-> component mapping when using multiple gpus.
* condition the device_map bits.
* relax condition
* device_map progress.
* device_map enhancement
* some cleaning up and debugging
* Apply suggestions from code review
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* incorporate suggestions from PR.
* remove multi-gpu condition for now.
* guard check the component -> device mapping
* fix: device_memory variable
* dispatching transformers model to have force_hooks=True
* better guarding for transformers device_map
* introduce support balanced_low_memory and balanced_ultra_low_memory.
* remove device_map patch.
* fix: intermediate variable scoping.
* fix: condition in cpu offload.
* fix: flax class restrictions.
* remove modifications from cpu_offload and model_offload
* incorporate changes.
* add a simple forward pass test
* add: torch_device in get_inputs()
* add: tests
* remove print
* safe-guard to(), model offloading and cpu offloading when balanced is used as a device_map.
* style
* remove .
* safeguard device_map with more checks and remove invalid device_mapping strategues.
* make a class attribute and adjust tests accordingly.
* fix device_map check
* fix test
* adjust comment
* fix: device_map attribute
* fix: dispatching.
* max_memory test for pipeline
* version guard the tests
* fix guard.
* address review feedback.
* reset_device_map method.
* add: test for reset_hf_device_map
* fix a couple things.
* add reset_device_map() in the error message.
* add tests for checking reset_device_map doesn't have unintended consequences.
* fix reset_device_map and offloading tests.
* create _get_final_device_map utility.
* hf_device_map -> _hf_device_map
* add documentation
* add notes suggested by Marc.
* styling.
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* move updates within gpu condition.
* other docs related things
* note on ignore a device not specified in .
* provide a suggestion if device mapping errors out.
* fix: typo.
* _hf_device_map -> hf_device_map
* Empty-Commit
* add: example hf_device_map.
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* remove libsndfile1-dev and libgl1 from workflows and ensure that re present in the respective dockerfiles.
* change to self-hosted runner; let's see 🤞
* add libsndfile1-dev libgl1 for now
* use self-hosted runners for building and push too.
* Restore unet params back to normal from EMA when validation call is finished
* empty commit
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Allow safety and feature extractor arguments to be passed to convert_from_ckpt
Allows management of safety checker and feature extractor
from outside of the convert ckpt class.
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* reduce block sizes for unet1d.
* reduce blocks for unet_2d.
* reduce block size for unet_motion
* increase channels.
* correctly increase channels.
* reduce number of layers in unet2dconditionmodel tests.
* reduce block sizes for unet2dconditionmodel tests
* reduce block sizes for unet3dconditionmodel.
* fix: test_feed_forward_chunking
* fix: test_forward_with_norm_groups
* skip spatiotemporal tests on MPS.
* reduce block size in AutoencoderKL.
* reduce block sizes for vqmodel.
* further reduce block size.
* make style.
* Empty-Commit
* reduce sizes for ConsistencyDecoderVAETests
* further reduction.
* further block reductions in AutoencoderKL and AssymetricAutoencoderKL.
* massively reduce the block size in unet2dcontionmodel.
* reduce sizes for unet3d
* fix tests in unet3d.
* reduce blocks further in motion unet.
* fix: output shape
* add attention_head_dim to the test configuration.
* remove unexpected keyword arg
* up a bit.
* groups.
* up again
* fix
* Skip `test_freeu_enabled ` on MPS
* Small fixes
- import skip_mps correctly
- disable all instances of test_freeu_enabled
* Empty commit to trigger tests
* Empty commit to trigger CI
* increase number of workers for the tests.
* move to beefier runner.
* improve the fast push tests too.
* use a beefy machine for pytorch pipeline tests
* up the number of workers further.
* UniPC Multistep add `rescale_betas_zero_snr`
Same patch as DPM and Euler with the patched final alpha cumprod
BF16 doesn't seem to break down, I think cause UniPC upcasts during some
phases already? We could still force an upcast since it only
loses ≈ 0.005 it/s for me but the difference in output is very small. A
better endeavor might upcasting in step() and removing all the other
upcasts elsewhere?
* UniPC ZSNR UT
* Re-add `rescale_betas_zsnr` doc oops
* UniPC UTs iterate solvers on FP16
It wasn't catching errs on order==3. Might be excessive?
* UniPC Multistep fix tensor dtype/device on order=3
* UniPC UTs Add v_pred to fp16 test iter
For completions sake. Probably overkill?
* 7529 do not disable autocast for cuda devices
* Remove typecasting error check for non-mps platforms, as a correct autocast implementation makes it a non-issue
* add autocast fix to other training examples
* disable native_amp for dreambooth (sdxl)
* disable native_amp for pix2pix (sdxl)
* remove tests from remaining files
* disable native_amp on huggingface accelerator for every training example that uses it
* convert more usages of autocast to nullcontext, make style fixes
* make style fixes
* style.
* Empty-Commit
---------
Co-authored-by: bghira <bghira@users.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* start printing the tensors.
* print full throttle
* set static slices for 7 tests.
* remove printing.
* flatten
* disable test for controlnet
* what happens when things are seeded properly?
* set the right value
* style./
* make pia test fail to check things
* print.
* fix pia.
* checking for animatediff.
* fix: animatediff.
* video synthesis
* final piece.
* style.
* print guess.
* fix: assertion for control guess.
---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* Add `final_sigma_zero` to UniPCMultistep
Effectively the same trick as DDIM's `set_alpha_to_one` and
DPM's `final_sigma_type='zero'`.
Currently False by default but maybe this should be True?
* `final_sigma_zero: bool` -> `final_sigmas_type: str`
Should 1:1 match DPM Multistep now.
* Set `final_sigmas_type='sigma_min'` in UniPC UTs
* Initial commit
* Implemented block lora
- implemented block lora
- updated docs
- added tests
* Finishing up
* Reverted unrelated changes made by make style
* Fixed typo
* Fixed bug + Made text_encoder_2 scalable
* Integrated some review feedback
* Incorporated review feedback
* Fix tests
* Made every module configurable
* Adapter to new lora test structure
* Final cleanup
* Some more final fixes
- Included examples in `using_peft_for_inference.md`
- Added hint that only attns are scaled
- Removed NoneTypes
- Added test to check mismatching lens of adapter names / weights raise error
* Update using_peft_for_inference.md
* Update using_peft_for_inference.md
* Make style, quality, fix-copies
* Updated tutorial;Warning if scale/adapter mismatch
* floats are forwarded as-is; changed tutorial scale
* make style, quality, fix-copies
* Fixed typo in tutorial
* Moved some warnings into `lora_loader_utils.py`
* Moved scale/lora mismatch warnings back
* Integrated final review suggestions
* Empty commit to trigger CI
* Reverted emoty commit to trigger CI
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* speed up test_vae_slicing in animatediff
* speed up test_karras_schedulers_shape for attend and excite.
* style.
* get the static slices out.
* specify torch print options.
* modify
* test run with controlnet
* specify kwarg
* fix: things
* not None
* flatten
* controlnet img2img
* complete controlet sd
* finish more
* finish more
* finish more
* finish more
* finish the final batch
* add cpu check for expected_pipe_slice.
* finish the rest
* remove print
* style
* fix ssd1b controlnet test
* checking ssd1b
* disable the test.
* make the test_ip_adapter_single controlnet test more robust
* fix: simple inpaint
* multi
* disable panorama
* enable again
* panorama is shaky so leave it for now
* remove print
* raise tolerance.
* Bug fix for controlnetpipeline check_image
Bug fix for controlnetpipeline check_image when using multicontrolnet and prompt list
* Update test_inference_multiple_prompt_input function
* Update test_controlnet.py
add test for multiple prompts and multiple image conditioning
* Update test_controlnet.py
Fix format error
---------
Co-authored-by: Lvkesheng Shen <45848260+Fantast416@users.noreply.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* add remove_all_hooks
* a few more fix and tests
* up
* Update src/diffusers/pipelines/pipeline_utils.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* split tests
* add
---------
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* apple mps: training support for SDXL LoRA
* sdxl: support training lora, dreambooth, t2i, pix2pix, and controlnet on apple mps
---------
Co-authored-by: bghira <bghira@users.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* mps: fix XL pipeline inference at training time due to upstream pytorch bug
* Update src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* apply the safe-guarding logic elsewhere.
---------
Co-authored-by: bghira <bghira@users.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
you cannot specify `type="bool"` and `action="store_true"` at the same time.
remove excessive and buggy `type=bool`.
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
* feat: support dora loras from community
* safe-guard dora operations under peft version.
* pop use_dora when False
* make dora lora from kohya work.
* fix: kohya conversion utils.
* add a fast test for DoRA compatibility..
* add a nightly test.
* fixed typo
* updated doc to be consistent in naming
* make style/quality
* preprocessing for 4 channels and not 6
* make style
* test for 4c
* make style/quality
* fixed test on cpu
* fixed doc typo
* changed default ckpt to 4c
* Update pipeline_stable_diffusion_ldm3d.py
* fix bug
---------
Co-authored-by: Aflalo <estellea@isl-iam1.rr.intel.com>
Co-authored-by: Aflalo <estellea@isl-gpu33.rr.intel.com>
Co-authored-by: Aflalo <estellea@isl-gpu38.rr.intel.com>
* Add properties and `IPAdapterTesterMixin` tests for `StableDiffusionPanoramaPipeline`
* Update torch manual seed to use `torch.Generator(device=device)`
* Refactor 📞🔙 to support `callback_on_step_end`
* make fix-copies
* fix freeinit impl
* fix progress bar
* fix progress bar and remove old code
* fix num_inference_steps==1 case for freeinit by atleast running 1 step when fast sampling enabled
* checking to improve pipelines.
* more fixes.
* add: tip to encourage the usage of revision
* Apply suggestions from code review
* retrigger ci
---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Fix ControlNetModel.from_unet do not load add_embedding
* delete white space in blank line
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* debugging
* let's see the numbers
* let's see the numbers
* let's see the numbers
* restrict tolerance.
* increase inference steps.
* shallow copy of cross_attentionkwargs
* remove print
* pop scale from the top-level unet instead of getting it.
* improve readability.
* Apply suggestions from code review
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* fix a little bit.
---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Add properties and `IPAdapterTesterMixin` tests for `StableDiffusionPanoramaPipeline`
* Fix variable name typo and update comments
* Update deprecated `output_type="numpy"` to "np" in test files
* Discard changes to src/diffusers/pipelines/stable_diffusion_panorama/pipeline_stable_diffusion_panorama.py
* Update test_stable_diffusion_panorama.py
* Update numbers in README.md
* Update get_guidance_scale_embedding method to use timesteps instead of w
* Update number of checkpoints in README.md
* Add type hints and fix var name
* Fix PyTorch's convention for inplace functions
* Fix a typo
* Revert "Fix PyTorch's convention for inplace functions"
This reverts commit 74350cf65b.
* Fix typos
* Indent
* Refactor get_guidance_scale_embedding method in LEditsPPPipelineStableDiffusionXL class
* log loss per image
* add commandline param for per image loss logging
* style
* debug-loss -> debug_loss
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Change step_offset scheduler docstrings
* Mention it may be needed by some models
* More docstrings
These ones failed literal S&R because I performed it case-sensitive
which is fun.
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* add: support for notifying maintainers about the nightly test status
* add: a tempoerary workflow for validation.
* cancel in progress.
* runs-on
* clean up
* add: peft dep
* change device.
* multiple edits.
* remove temp workflow.
* add: a workflow to check if docker containers can be built if the files are modified.
* type
* unify docker image build test and push
* make it run on prs too.
* check
* check
* check
* check again.
* remove docker test build file.
* remove extra dependencies./
* check
* Initial commit
* Removed copy hints, as in original SDXLControlNetPipeline
Removed copy hints, as in original SDXLControlNetPipeline, as the `make fix-copies` seems to have issues with the @property decorator.
* Reverted changes to ControlNetXS
* Addendum to: Removed changes to ControlNetXS
* Added test+docs for mixture of denoiser
* Update docs/source/en/using-diffusers/controlnet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/using-diffusers/controlnet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* I added a new doc string to the class. This is more flexible to understanding other developers what are doing and where it's using.
* Update src/diffusers/models/unet_2d_blocks.py
This changes suggest by maintener.
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update src/diffusers/models/unet_2d_blocks.py
Add suggested text
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update unet_2d_blocks.py
I changed the Parameter to Args text.
* Update unet_2d_blocks.py
proper indentation set in this file.
* Update unet_2d_blocks.py
a little bit of change in the act_fun argument line.
* I run the black command to reformat style in the code
* Update unet_2d_blocks.py
similar doc-string add to have in the original diffusion repository.
* Fix bug for mention in this issue section #6901
* Update src/diffusers/schedulers/scheduling_ddim_flax.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Fix linter
* Restore empty line
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* copied from for t2i pipelines without ip adapter support.
* two more pipelines with proper copied from comments.
* revert to the original implementation
* throw error when patch inputs and layernorm are provided for transformers2d.
* add comment on supported norm_types in transformers2d
* more check
* fix: norm _type handling
* [bug] Fix float/int guidance scale not working in `StableVideoDiffusionPipeline`
* Add test to disable CFG on SVD
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* support and example launch for sdxl turbo
* White space fixes
* Trailing whitespace character
* ruff format
* fix guidance_scale and steps for turbo mode
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Radames Ajna <radamajna@gmail.com>
* update svd docs
* fix example doc string
* update return type hints/docs
* update type hints
* Fix typos in pipeline_stable_video_diffusion.py
* make style && make fix-copies
* Update src/diffusers/pipelines/stable_video_diffusion/pipeline_stable_video_diffusion.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/diffusers/pipelines/stable_video_diffusion/pipeline_stable_video_diffusion.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update based on suggestion
---------
Co-authored-by: M. Tolga Cangöz <mtcangoz@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Enable FakeTensorMode for EulerDiscreteScheduler scheduler
PyTorch's FakeTensorMode does not support `.numpy()` or `numpy.array()`
calls.
This PR replaces `sigmas` numpy tensor by a PyTorch tensor equivalent
Repro
```python
with torch._subclasses.FakeTensorMode() as fake_mode, ONNXTorchPatcher():
fake_model = DiffusionPipeline.from_pretrained(model_name, low_cpu_mem_usage=False)
```
that otherwise would fail with
`RuntimeError: .numpy() is not supported for tensor subclasses.`
* Address comments
* add tags for diffusers training
* add tags for diffusers training
* add tags for diffusers training
* add tags for diffusers training
* add tags for diffusers training
* add tags for diffusers training
* add dora tags for drambooth lora scripts
* style
* add is_dora arg
* style
* add dora training feature to sd 1.5 script
* added notes about DoRA training
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* initial
* check_inputs fix to the rest of pipelines
* add fix for no cfg too
* use of variable
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Add copyright notice to relevant files and fix typos
* Set `timestep_spacing` parameter of `StableDiffusionXLPipeline`'s scheduler to `'trailing'`.
* Update `StableDiffusionXLPipeline.from_single_file` by including EulerAncestralDiscreteScheduler with `timestep_spacing="trailing"` param.
* Update model loading method in SDXL Turbo documentation
* move model helper function in pipeline to EfficiencyMixin
---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* DPMMultistep rescale_betas_zero_snr
* DPM upcast samples in step()
* DPM rescale_betas_zero_snr UT
* DPMSolverMulti move sample upcast after model convert
Avoids having to re-use the dtype.
* Add a newline for Ruff
* log_validation unification for controlnet.
* additional fixes.
* remove print.
* better reuse and loading
* make final inference run conditional.
* Update examples/controlnet/README_sdxl.md
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* resize the control image in the snippet.
---------
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Make LoRACompatibleConv padding_mode work.
* Format code style.
* add fast test
* Update src/diffusers/models/lora.py
Simplify the code by patrickvonplaten.
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* code refactor
* apply patrickvonplaten suggestion to simplify the code.
* rm test_lora_layers_old_backend.py and add test case in test_lora_layers_peft.py
* update test case.
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* modulize log validation
* run make style and refactor wanddb support
* remove redundant initialization
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* make checkpoint_merger pipeline pass the "variant" argument to from_pretrained()
* make style
---------
Co-authored-by: Lincoln Stein <lstein@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* add stable_diffusion_xl_ipex community pipeline
* make style for code quality check
* update docs as suggested
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* standardize model card
* fix tags
* correct import styling and update tags
* run make style and make quality
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* feat: allow low_cpu_mem_usage in ip adapter loading
* reduce the number of device placements.
* documentation.
* throw low_cpu_mem_usage warning only once from the main entry point.
* use load_model_into_meta in single file utils
* propagate to autoencoder and controlnet.
* correct class name access behaviour.
* remove torch_dtype from load_model_into_meta; seems unncessary
* remove incorrect kwarg
* style to avoid extra unnecessary line breaks
* fix: bias loading bug
* fixes for SDXL
* apply changes to the conversion script to match single_file_utils.py
* do transpose to match the single file loading logic.
Remove <cat-toy> validation prompt from textual_inversion_sdxl.py
The `<cat-toy>` validation prompt is a default choice for the example task in the README. But no other part of `textual_inversion_sdxl.py` references the cat toy and `textual_inversion.py` has a default validation prompt of `None` as well.
So bring `textual_inversion_sdxl.py` in line with `textual_inversion.py` and change default validation prompt to `None`
* attention_head_dim
* debug
* print more info
* correct num_attention_heads behaviour
* down_block_num_attention_heads -> num_attention_heads.
* correct the image link in doc.
* add: deprecation for num_attention_head
* fix: test argument to use attention_head_dim
* more fixes.
* quality
* address comments.
* remove depcrecation.
* add: support for passing ip adapter image embeddings
* debugging
* make feature_extractor unloading conditioned on safety_checker
* better condition
* type annotation
* index to look into value slices
* more debugging
* debugging
* serialize embeddings dict
* better conditioning
* remove unnecessary prints.
* Update src/diffusers/loaders/ip_adapter.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* make fix-copies and styling.
* styling and further copy fixing.
* fix: check_inputs call in controlnet sdxl img2img pipeline
---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* feat: standarize model card creation for dreambooth training.
* correct 'inference
* remove comments.
* take component out of kwargs
* style
* add: card template to have a leaner description.
* widget support.
* propagate changes to train_dreambooth_lora
* propagate changes to custom diffusion
* make widget properly type-annotated
* fix: callback function name is incorrect
On this tutorial there is a function defined and then used inside `callback_on_step_end` argument, but the name was not correct (mismatch)
* fix: typo in num_timestep (correct is num_timesteps)
fixed property name
* remove _to_tensor
* remove _to_tensor definition
* remove _collapse_frames_into_batch
* remove lora for not bloating the code.
* remove sample_size.
* simplify code a bit more
* ensure timesteps are always in tensor.
* Fix `AutoencoderTiny` with `use_slicing`
When using slicing with AutoencoderTiny, the encoder mistakenly encodes the entire batch for every image in the batch.
* Fixed formatting issue
* add noise_offset param
* micro conditioning - wip
* image processing adjusted and moved to support micro conditioning
* change time ids to be computed inside train loop
* change time ids to be computed inside train loop
* change time ids to be computed inside train loop
* time ids shape fix
* move token replacement of validation prompt to the same section of instance prompt and class prompt
* add offset noise to sd15 advanced script
* fix token loading during validation
* fix token loading during validation in sdxl script
* a little clean
* style
* a little clean
* style
* sdxl script - a little clean + minor path fix
sd 1.5 script - change default resolution value
* ad 1.5 script - minor path fix
* fix missing comma in code example in model card
* clean up commented lines
* style
* remove time ids computed outside training loop - no longer used now that we utilize micro-conditioning, as all time ids are now computed inside the training loop
* style
* [WIP] - added draft readme, building off of examples/dreambooth/README.md
* readme
* readme
* readme
* readme
* readme
* readme
* readme
* readme
* removed --crops_coords_top_left from CLI args
* style
* fix missing shape bug due to missing RGB if statement
* add blog mention at the start of the reamde as well
* Update examples/advanced_diffusion_training/README.md
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* change note to render nicely as well
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Fix bug in ResnetBlock2D.forward when not USE_PEFT_BACKEND and using scale_shift for time emb where the lora scale gets overwritten.
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* fix minsnr implementation for v-prediction case
* format code
* always compute snr when snr_gamma is specified
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* feat: explicitly tag to diffusers when using push_to_hub
* remove tags.
* reset repo.
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix: tests
* fix: push_to_hub behaviour for tagging from save_pretrained
* Apply suggestions from code review
Co-authored-by: Lucain <lucainp@gmail.com>
* Apply suggestions from code review
Co-authored-by: Lucain <lucainp@gmail.com>
* import fixes.
* add library name to existing model card.
* add: standalone test for generate_model_card
* fix tests for standalone method
* moved library_name to a better place.
* merge create_model_card and generate_model_card.
* fix test
* address lucain's comments
* fix return identation
* Apply suggestions from code review
Co-authored-by: Lucain <lucainp@gmail.com>
* address further comments.
* Update src/diffusers/pipelines/pipeline_utils.py
Co-authored-by: Lucain <lucainp@gmail.com>
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Lucain <lucainp@gmail.com>
* initial commit for unconditional/class-conditional consistency training script
* make style
* Add entry for consistency training script in community README.
* Move consistency training script from community to research_projects/consistency_training
* Add requirements.txt and README to research_projects/consistency_training directory.
* Manually revert community README changes for consistency training.
* Fix path to script after moving script to research projects.
* Add option to load U-Net weights from pretrained model.
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* begin animatediff img2video and video2video
* revert animatediff to original implementation
* add img2video as pipeline
* update
* add vid2vid pipeline
* update imports
* update
* remove copied from line for check_inputs
* update
* update examples
* add multi-batch support
* fix __init__.py files
* move img2vid to community
* update community readme and examples
* fix
* make fix-copies
* add vid2vid batch params
* apply suggestions from review
Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>
* add test for animatediff vid2vid
* torch.stack -> torch.cat
Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>
* make style
* docs for vid2vid
* update
* fix prepare_latents
* fix docs
* remove img2vid
* update README to :main
* remove slow test
* refactor pipeline output
* update docs
* update docs
* merge community readme from :main
* final fix i promise
* add support for url in animatediff example
* update example
* update callbacks to latest implementation
* Update src/diffusers/pipelines/animatediff/pipeline_animatediff_video2video.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/pipelines/animatediff/pipeline_animatediff_video2video.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix merge
* Apply suggestions from code review
* remove callback and callback_steps as suggested in review
* Update tests/pipelines/animatediff/test_animatediff_video2video.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix import error caused due to unet refactor in #6630
* fix numpy import error after tensor2vid refactor in #6626
* make fix-copies
* fix numpy error
* fix progress bar test
---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* sd1.5 support in separate script
A quick adaptation to support people interested in using this method on 1.5 models.
* sd15 prompt text encoding and unet conversions
as per @linoytsaban 's recommendations. Testing would be appreciated,
* Readability and quality improvements
Removed some mentions of SDXL, and some arguments that don't apply to sd 1.5, and cleaned up some comments.
* make style/quality commands
* tracker rename and run-it doc
* Update examples/advanced_diffusion_training/train_dreambooth_lora_sd15_advanced.py
* Update examples/advanced_diffusion_training/train_dreambooth_lora_sd15_advanced.py
---------
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
* move unets to module 🦋
* parameterize unet-level import.
* fix flax unet2dcondition model import
* models __init__
* mildly depcrecating models.unet_2d_blocks in favor of models.unets.unet_2d_blocks.
* noqa
* correct depcrecation behaviour
* inherit from the actual classes.
* Empty-Commit
* backwards compatibility for unet_2d.py
* backward compatibility for unet_2d_condition
* bc for unet_1d
* bc for unet_1d_blocks
* Fixed the bug related to saving DeepSpeed models.
* Add information about training SD models using DeepSpeed to the README.
* Apply suggestions from code review
---------
Co-authored-by: mhh001 <mahonghao1@huawei.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* - extract function for stage in UNet2DConditionModel init & forward
- Add new function get_mid_block() to unet_2d_blocks.py
* add type hint to get_mid_block aligned with get_up_block and get_down_block; rename _set_xxx function
* add type hint and use keyword arguments
* remove `copy from` in versatile diffusion
* add animatediff img2vid
* fix
* Update examples/community/README.md
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix code snippet between ip adapter face id and animatediff img2vid
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* [Fix] Multiple image conditionings in a single batch for `StableDiffusionControlNetPipeline`.
* Refactor `check_inputs` in `StableDiffusionControlNetPipeline` to avoid redundant codes.
* Make the behavior of MultiControlNetModel to be the same to the original ControlNetModel
* Keep the code change minimum for nested list support
* Add fast test `test_inference_nested_image_input`
* Remove redundant check for nested image condition in `check_inputs`
Remove `len(image) == len(prompt)` check out of `check_image()`
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Better `ValueError` message for incompatible nested image list size
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Fix syntax error in `check_inputs`
* Remove warning message for multi-ControlNets with multiple prompts
* Fix a typo in test_controlnet.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Add test case for multiple prompts, single image conditioning in `StableDiffusionMultiControlNetPipelineFastTests`
* Improved `ValueError` message for nested `controlnet_conditioning_scale`
* Documenting the behavior of image list as `StableDiffusionControlNetPipeline` input
---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Fixes#6418 Advanced Dreambooth LoRa Training
* change order of import to fix nit
* fix nit, use cast_training_params
* remove torch.compile fix, will move to a new PR
* remove unnecessary import
* Enable image resizing to adjust its height and width in StableDiffusionXLInstructPix2PixPipeline
* Ensure that validation is performed at every 'validation_step', not at every step
* fix: training resume from fp16.
* add: comment
* remove residue from another branch.
* remove more residues.
* thanks to Younes; no hacks.
* style.
* clean things a bit and modularize _set_state_dict_into_text_encoder
* add comment about the fix detailed.
* support compile
* make style
* move unwrap_model inside function
* change unwrap call
* run make style
* Update examples/dreambooth/train_dreambooth.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Revert "Update examples/dreambooth/train_dreambooth.py"
This reverts commit 70ab09732e.
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Remove conversion to RGB
* Add a Conversion Function
* Add type hint for convert_method
* Update src/diffusers/utils/loading_utils.py
Update docstring
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update docstring
* Optimize imports
* Optimize imports (2)
* Reformat code
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* base template file - train_instruct_pix2pix.py
* additional import and parser argument requried for lora
* finetune only instructpix2pix model -- no need to include these layers
* inject lora layers
* freeze unet model -- only lora layers are trained
* training modifications to train only lora parameters
* store only lora parameters
* move train script to research project
* run quality and style code checks
* move train script to a new folder
* add README
* update README
* update references in README
---------
Co-authored-by: Rahul Raman <rahulraman@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* enable stable-xl textual inversion
* check if optimizer_2 exists
* check text_encoder_2 before using
* add textual inversion for sdxl in a single file
* fix style
* fix example style
* reset for error changes
* add readme for sdxl
* fix style
* disable autocast as it will cause cast error when weight_dtype=bf16
* fix spelling error
* fix style and readme and 8bit optimizer
* add README_sdxl.md link
* add tracker key on log_validation
* run style
* rm the second center crop
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* add tutorials to toctree.yml
* fix title
* fix words
* add overview ja
* fix diffusion to 拡散
* fix line 21
* add space
* delete supported pipline
* fix tutorial_overview.md
* fix space
* fix typo
* Delete docs/source/ja/tutorials/using_peft_for_inference.md
this file is not translated
* Delete docs/source/ja/tutorials/basic_training.md
this file is not translated
* Delete docs/source/ja/tutorials/autopipeline.md
this file is not translated
* fix toctree
* add: experimental script for diffusion dpo training.
* random_crop cli.
* fix: caption tokenization.
* fix: pixel_values index.
* fix: grad?
* debug
* fix: reduction.
* fixes in the loss calculation.
* style
* fix: unwrap call.
* fix: validation inference.
* add: initial sdxl script
* debug
* make sure images in the tuple are of same res
* fix model_max_length
* report print
* boom
* fix: numerical issues.
* fix: resolution
* comment about resize.
* change the order of the training transformation.
* save call.
* debug
* remove print
* manually detaching necessary?
* use the same vae for validation.
* add: readme.
* unwrap text encoder when saving hook only for full text encoder tuning
* unwrap text encoder when saving hook only for full text encoder tuning
* save embeddings in each checkpoint as well
* save embeddings in each checkpoint as well
* save embeddings in each checkpoint as well
* Update examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* add documentation for DeepCache
* fix typo
* add wandb url for DeepCache
* fix some typos
* add item in _toctree.yml
* update formats for arguments
* Update deepcache.md
* Update docs/source/en/optimization/deepcache.md
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* add StableDiffusionXLPipeline in doc
* Separate SDPipeline and SDXLPipeline
* Add the paper link of ablation experiments for hyper-parameters
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Make WDS pipeline interpolation type configurable.
* Make the VAE encoding batch size configurable.
* Make lora_alpha and lora_dropout configurable for LCM LoRA scripts.
* Generalize scalings_for_boundary_conditions function and make the timestep scaling configurable.
* Make LoRA target modules configurable for LCM-LoRA scripts.
* Move resolve_interpolation_mode to src/diffusers/training_utils.py and make interpolation type configurable in non-WDS script.
* apply suggestions from review
* debug
* debug test_with_different_scales_fusion_equivalence
* use the right method.
* place it right.
* let's see.
* let's see again
* alright then.
* add a comment.
* I added a new doc string to the class. This is more flexible to understanding other developers what are doing and where it's using.
* Update src/diffusers/models/unet_2d_blocks.py
This changes suggest by maintener.
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update src/diffusers/models/unet_2d_blocks.py
Add suggested text
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update unet_2d_blocks.py
I changed the Parameter to Args text.
* Update unet_2d_blocks.py
proper indentation set in this file.
* Update unet_2d_blocks.py
a little bit of change in the act_fun argument line.
* I run the black command to reformat style in the code
* Update unet_2d_blocks.py
similar doc-string add to have in the original diffusion repository.
* Batter way to write binarize function
* Solve check_code_quality error
* My mistake to run pull request but not reformated file
* Update image_processor.py
* remove extra variable and space
* Update image_processor.py
* Run ruff libarary to reformat my file
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* add: test to check if peft loras are loadable in non-peft envs.
* add torch_device approrpiately.
* fix: get_dummy_inputs().
* test logits.
* rename
* debug
* debug
* fix: generator
* new assertion values after fixing the seed.
* shape
* remove print statements and settle this.
* to update values.
* change values when lora config is initialized under a fixed seed.
* update colab link
* update notebook link
* sanity restored by getting the exact same values without peft.
* change timesteps used to calculate snr when --with_prior_preservation is enabled
* change timesteps used to calculate snr when --with_prior_preservation is enabled (canonical script)
* style
* revert canonical script to before snr gamma change
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Add unload_ip_adapter method
* Update attn_processors with original layers
* Add test
* Use set_default_attn_processor
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Fix gradient-checkpointing option is ignored in SDXL+LoRA training. (#6388)
* Fix gradient-checkpointing option is ignored in SD+LoRA training.
* Fix gradient checkpoint is not applied to text encoders. (SDXL+LoRA)
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* add doc for diffusion fast
* add entry to _toctree
* Apply suggestions from code review
* fix titlew
* fix: title entry
* add note about fuse_qkv_projections
* add adapter_name in fuse
* add tesrt
* up
* fix CI
* adapt from suggestion
* Update src/diffusers/utils/testing_utils.py
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
* change to `require_peft_version_greater`
* change variable names in test
* Update src/diffusers/loaders/lora.py
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
* break into 2 lines
* final comments
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
* [Peft] fix saving / loading when unet is not "unet"
* Update src/diffusers/loaders/lora.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* undo stablediffusion-xl changes
* use unet_name to get unet for lora helpers
* use unet_name
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-26 11:39:28 +01:00
2021 changed files with 518573 additions and 59105 deletions
echo "Quality check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make style && make quality'" >> $GITHUB_STEP_SUMMARY
check_repository_consistency:
needs:check_code_quality
runs-on:ubuntu-22.04
steps:
- uses:actions/checkout@v3
- name:Set up Python
uses:actions/setup-python@v4
with:
python-version:"3.10"
- name:Install dependencies
run:|
python -m pip install --upgrade pip
pip install .[quality]
- name:Check repo consistency
run:|
python utils/check_copies.py
python utils/check_dummies.py
python utils/check_support_list.py
make deps_table_check_updated
- name:Check if failure
if:${{ failure() }}
run:|
echo "Repo consistency check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make fix-copies'" >> $GITHUB_STEP_SUMMARY
echo "Quality check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make style && make quality'" >> $GITHUB_STEP_SUMMARY
check_repository_consistency:
needs:check_code_quality
runs-on:ubuntu-22.04
steps:
- uses:actions/checkout@v3
- name:Set up Python
uses:actions/setup-python@v4
with:
python-version:"3.8"
- name:Install dependencies
run:|
python -m pip install --upgrade pip
pip install .[quality]
- name:Check repo consistency
run:|
python utils/check_copies.py
python utils/check_dummies.py
python utils/check_support_list.py
make deps_table_check_updated
- name:Check if failure
if:${{ failure() }}
run:|
echo "Repo consistency check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make fix-copies'" >> $GITHUB_STEP_SUMMARY
PIPELINE_USAGE_CUTOFF:1000000000# set high cutoff so that only always-test pipelines run
jobs:
check_code_quality:
runs-on:ubuntu-22.04
steps:
- uses:actions/checkout@v3
- name:Set up Python
uses:actions/setup-python@v4
with:
python-version:"3.8"
- name:Install dependencies
run:|
python -m pip install --upgrade pip
pip install .[quality]
- name:Check quality
run:make quality
- name:Check if failure
if:${{ failure() }}
run:|
echo "Quality check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make style && make quality'" >> $GITHUB_STEP_SUMMARY
check_repository_consistency:
needs:check_code_quality
runs-on:ubuntu-22.04
steps:
- uses:actions/checkout@v3
- name:Set up Python
uses:actions/setup-python@v4
with:
python-version:"3.8"
- name:Install dependencies
run:|
python -m pip install --upgrade pip
pip install .[quality]
- name:Check repo consistency
run:|
python utils/check_copies.py
python utils/check_dummies.py
python utils/check_support_list.py
make deps_table_check_updated
- name:Check if failure
if:${{ failure() }}
run:|
echo "Repo consistency check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make fix-copies'" >> $GITHUB_STEP_SUMMARY
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -57,13 +57,13 @@ Any question or comment related to the Diffusers library can be asked on the [di
- ...
Every question that is asked on the forum or on Discord actively encourages the community to publicly
share knowledge and might very well help a beginner in the future that has the same question you're
share knowledge and might very well help a beginner in the future who has the same question you're
having. Please do pose any questions you might have.
In the same spirit, you are of immense help to the community by answering such questions because this way you are publicly documenting knowledge for everybody to learn from.
**Please** keep in mind that the more effort you put into asking or answering a question, the higher
the quality of the publicly documented knowledge. In the same way, well-posed and well-answered questions create a high-quality knowledge database accessible to everybody, while badly posed questions or answers reduce the overall quality of the public knowledge database.
In short, a high quality question or answer is *precise*, *concise*, *relevant*, *easy-to-understand*, *accessible*, and *well-formated/well-posed*. For more information, please have a look through the [How to write a good issue](#how-to-write-a-good-issue) section.
In short, a high quality question or answer is *precise*, *concise*, *relevant*, *easy-to-understand*, *accessible*, and *well-formatted/well-posed*. For more information, please have a look through the [How to write a good issue](#how-to-write-a-good-issue) section.
**NOTE about channels**:
[*The forum*](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) is much better indexed by search engines, such as Google. Posts are ranked by popularity rather than chronologically. Hence, it's easier to look up questions and answers that we posted some time ago.
@@ -245,7 +245,7 @@ The official training examples are maintained by the Diffusers' core maintainers
This is because of the same reasons put forward in [6. Contribute a community pipeline](#6-contribute-a-community-pipeline) for official pipelines vs. community pipelines: It is not feasible for the core maintainers to maintain all possible training methods for diffusion models.
If the Diffusers core maintainers and the community consider a certain training paradigm to be too experimental or not popular enough, the corresponding training code should be put in the `research_projects` folder and maintained by the author.
Both official training and research examples consist of a directory that contains one or more training scripts, a requirements.txt file, and a README.md file. In order for the user to make use of the
Both official training and research examples consist of a directory that contains one or more training scripts, a `requirements.txt` file, and a `README.md` file. In order for the user to make use of the
training examples, it is required to clone the repository:
Therefore when adding an example, the `requirements.txt` file shall define all pip dependencies required for your training example so that once all those are installed, the user can run the example's training script. See, for example, the [DreamBooth `requirements.txt` file](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/requirements.txt).
@@ -355,7 +356,7 @@ You will need basic `git` proficiency to be able to contribute to
manual. Type `git --help` in a shell and enjoy. If you prefer books, [Pro
Git](https://git-scm.com/book/en/v2) is a very good reference.
Follow these steps to start contributing ([supported Python versions](https://github.com/huggingface/diffusers/blob/main/setup.py#L265)):
Follow these steps to start contributing ([supported Python versions](https://github.com/huggingface/diffusers/blob/42f25d601a910dceadaee6c44345896b4cfa9928/setup.py#L270)):
1. Fork the [repository](https://github.com/huggingface/diffusers) by
clicking on the 'Fork' button on the repository's page. This creates a copy of the code
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -15,7 +15,7 @@ specific language governing permissions and limitations under the License.
🧨 Diffusers provides **state-of-the-art** pretrained diffusion models across multiple modalities.
Its purpose is to serve as a **modular toolbox** for both inference and training.
We aim at building a library that stands the test of time and therefore take API design very seriously.
We aim to build a library that stands the test of time and therefore take API design very seriously.
In a nutshell, Diffusers is built to be a natural extension of PyTorch. Therefore, most of our design choices are based on [PyTorch's Design Principles](https://pytorch.org/docs/stable/community/design.html#pytorch-design-philosophy). Let's go over the most important ones:
@@ -63,14 +63,14 @@ Let's walk through more detailed design decisions for each class.
Pipelines are designed to be easy to use (therefore do not follow [*Simple over easy*](#simple-over-easy) 100%), are not feature complete, and should loosely be seen as examples of how to use [models](#models) and [schedulers](#schedulers) for inference.
The following design principles are followed:
- Pipelines follow the single-file policy. All pipelines can be found in individual directories under src/diffusers/pipelines. One pipeline folder corresponds to one diffusion paper/project/release. Multiple pipeline files can be gathered in one pipeline folder, as it’s done for [`src/diffusers/pipelines/stable-diffusion`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/stable_diffusion). If pipelines share similar functionality, one can make use of the [#Copied from mechanism](https://github.com/huggingface/diffusers/blob/125d783076e5bd9785beb05367a2d2566843a271/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py#L251).
- Pipelines follow the single-file policy. All pipelines can be found in individual directories under src/diffusers/pipelines. One pipeline folder corresponds to one diffusion paper/project/release. Multiple pipeline files can be gathered in one pipeline folder, as it’s done for [`src/diffusers/pipelines/stable-diffusion`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/stable_diffusion). If pipelines share similar functionality, one can make use of the [#Copied from mechanism](https://github.com/huggingface/diffusers/blob/125d783076e5bd9785beb05367a2d2566843a271/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py#L251).
- Pipelines all inherit from [`DiffusionPipeline`].
- Every pipeline consists of different model and scheduler components, that are documented in the [`model_index.json` file](https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/model_index.json), are accessible under the same name as attributes of the pipeline and can be shared between pipelines with [`DiffusionPipeline.components`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.components) function.
- Every pipeline consists of different model and scheduler components, that are documented in the [`model_index.json` file](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5/blob/main/model_index.json), are accessible under the same name as attributes of the pipeline and can be shared between pipelines with [`DiffusionPipeline.components`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.components) function.
- Every pipeline should be loadable via the [`DiffusionPipeline.from_pretrained`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.from_pretrained) function.
- Pipelines should be used **only** for inference.
- Pipelines should be very readable, self-explanatory, and easy to tweak.
- Pipelines should be designed to build on top of each other and be easy to integrate into higher-level APIs.
- Pipelines are **not** intended to be feature-complete user interfaces. For futurecomplete user interfaces one should rather have a look at [InvokeAI](https://github.com/invoke-ai/InvokeAI), [Diffuzers](https://github.com/abhishekkrthakur/diffuzers), and [lama-cleaner](https://github.com/Sanster/lama-cleaner).
- Pipelines are **not** intended to be feature-complete user interfaces. For feature-complete user interfaces one should rather have a look at [InvokeAI](https://github.com/invoke-ai/InvokeAI), [Diffuzers](https://github.com/abhishekkrthakur/diffuzers), and [lama-cleaner](https://github.com/Sanster/lama-cleaner).
- Every pipeline should have one and only one way to run it via a `__call__` method. The naming of the `__call__` arguments should be shared across all pipelines.
- Pipelines should be named after the task they are intended to solve.
- In almost all cases, novel diffusion pipelines shall be implemented in a new pipeline folder/file.
@@ -81,7 +81,7 @@ Models are designed as configurable toolboxes that are natural extensions of [Py
The following design principles are followed:
- Models correspond to **a type of model architecture**. *E.g.* the [`UNet2DConditionModel`] class is used for all UNet variations that expect 2D image inputs and are conditioned on some context.
- All models can be found in [`src/diffusers/models`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models) and every model architecture shall be defined in its file, e.g. [`unet_2d_condition.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unet_2d_condition.py), [`transformer_2d.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/transformer_2d.py), etc...
- All models can be found in [`src/diffusers/models`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models) and every model architecture shall be defined in its file, e.g. [`unets/unet_2d_condition.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unets/unet_2d_condition.py), [`transformers/transformer_2d.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/transformers/transformer_2d.py), etc...
- Models **do not** follow the single-file policy and should make use of smaller model building blocks, such as [`attention.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention.py), [`resnet.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/resnet.py), [`embeddings.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/embeddings.py), etc... **Note**: This is in stark contrast to Transformers' modeling files and shows that models do not really follow the single-file policy.
- Models intend to expose complexity, just like PyTorch's `Module` class, and give clear error messages.
- Models all inherit from `ModelMixin` and `ConfigMixin`.
@@ -90,7 +90,7 @@ The following design principles are followed:
- To integrate new model checkpoints whose general architecture can be classified as an architecture that already exists in Diffusers, the existing model architecture shall be adapted to make it work with the new checkpoint. One should only create a new file if the model architecture is fundamentally different.
- Models should be designed to be easily extendable to future changes. This can be achieved by limiting public function arguments, configuration arguments, and "foreseeing" future changes, *e.g.* it is usually better to add `string` "...type" arguments that can easily be extended to new future types instead of boolean `is_..._type` arguments. Only the minimum amount of changes shall be made to existing architectures to make a new model checkpoint work.
- The model design is a difficult trade-off between keeping code readable and concise and supporting many model checkpoints. For most parts of the modeling code, classes shall be adapted for new model checkpoints, while there are some exceptions where it is preferred to add new classes to make sure the code is kept concise and
readable long-term, such as [UNet blocks](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unet_2d_blocks.py) and [Attention processors](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
readable long-term, such as [UNet blocks](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unets/unet_2d_blocks.py) and [Attention processors](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
### Schedulers
@@ -100,7 +100,7 @@ The following design principles are followed:
- All schedulers are found in [`src/diffusers/schedulers`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers).
- Schedulers are **not** allowed to import from large utils files and shall be kept very self-contained.
- One scheduler Python file corresponds to one scheduler algorithm (as might be defined in a paper).
- If schedulers share similar functionalities, we can make use of the `#Copied from` mechanism.
- If schedulers share similar functionalities, we can make use of the `#Copied from` mechanism.
- Schedulers all inherit from `SchedulerMixin` and `ConfigMixin`.
- Schedulers can be easily swapped out with the [`ConfigMixin.from_config`](https://huggingface.co/docs/diffusers/main/en/api/configuration#diffusers.ConfigMixin.from_config) method as explained in detail [here](./docs/source/en/using-diffusers/schedulers.md).
- Every scheduler has to have a `set_num_inference_steps`, and a `step` function. `set_num_inference_steps(...)` has to be called before every denoising process, *i.e.* before `step(...)` is called.
🤗 Diffusers is the go-to library for state-of-the-art pretrained diffusion models for generating images, audio, and even 3D structures of molecules. Whether you're looking for a simple inference solution or training your own diffusion models, 🤗 Diffusers is a modular toolbox that supports both. Our library is designed with a focus on [usability over performance](https://huggingface.co/docs/diffusers/conceptual/philosophy#usability-over-performance), [simple over easy](https://huggingface.co/docs/diffusers/conceptual/philosophy#simple-over-easy), and [customizability over abstractions](https://huggingface.co/docs/diffusers/conceptual/philosophy#tweakable-contributorfriendly-over-abstraction).
@@ -77,13 +67,13 @@ Please refer to the [How to use Stable Diffusion in Apple Silicon](https://huggi
## Quickstart
Generating outputs is super easy with 🤗 Diffusers. To generate an image from text, use the `from_pretrained` method to load any pretrained diffusion model (browse the [Hub](https://huggingface.co/models?library=diffusers&sort=downloads) for 16000+ checkpoints):
Generating outputs is super easy with 🤗 Diffusers. To generate an image from text, use the `from_pretrained` method to load any pretrained diffusion model (browse the [Hub](https://huggingface.co/models?library=diffusers&sort=downloads) for 30,000+ checkpoints):
| [Tutorial](https://huggingface.co/docs/diffusers/tutorials/tutorial_overview) | A basic crash course for learning how to use the library's most important features like using models and schedulers to build your own diffusion system, and training your own diffusion model. |
| [Loading](https://huggingface.co/docs/diffusers/using-diffusers/loading_overview) | Guides for how to load and configure all the components (pipelines, models, and schedulers) of the library, as well as how to use different schedulers. |
| [Pipelines for inference](https://huggingface.co/docs/diffusers/using-diffusers/pipeline_overview) | Guides for how to use pipelines for different inference tasks, batched generation, controlling generated outputs and randomness, and how to contribute a pipeline to the library. |
| [Optimization](https://huggingface.co/docs/diffusers/optimization/opt_overview) | Guides for how to optimize your diffusion model to run faster and consume less memory. |
| [Loading](https://huggingface.co/docs/diffusers/using-diffusers/loading) | Guides for how to load and configure all the components (pipelines, models, and schedulers) of the library, as well as how to use different schedulers. |
| [Pipelines for inference](https://huggingface.co/docs/diffusers/using-diffusers/overview_techniques) | Guides for how to use pipelines for different inference tasks, batched generation, controlling generated outputs and randomness, and how to contribute a pipeline to the library. |
| [Optimization](https://huggingface.co/docs/diffusers/optimization/fp16) | Guides for how to optimize your diffusion model to run faster and consume less memory. |
| [Training](https://huggingface.co/docs/diffusers/training/overview) | Guides for how to train a diffusion model for different tasks with different training techniques. |
## Contribution
@@ -154,7 +144,7 @@ Also, say 👋 in our public Discord channel <a href="https://discord.gg/G7tWnz9
@@ -219,7 +210,7 @@ Also, say 👋 in our public Discord channel <a href="https://discord.gg/G7tWnz9
- https://github.com/deep-floyd/IF
- https://github.com/bentoml/BentoML
- https://github.com/bmaltais/kohya_ss
- +7000 other amazing GitHub repositories 💪
- +14,000 other amazing GitHub repositories 💪
Thank you for using us ❤️.
@@ -238,7 +229,7 @@ We also want to thank @heejkoo for the very helpful overview of papers, code and
```bibtex
@misc{von-platen-etal-2022-diffusers,
author={Patrick von Platen and Suraj Patil and Anton Lozhkov and Pedro Cuenca and Nathan Lambert and Kashif Rasul and Mishig Davaadorj and Thomas Wolf},
author={Patrick von Platen and Suraj Patil and Anton Lozhkov and Pedro Cuenca and Nathan Lambert and Kashif Rasul and Mishig Davaadorj and Dhruv Nair and Sayak Paul and William Berman and Yiyi Xu and Steven Liu and Thomas Wolf},
Welcome to Diffusers Benchmarks. These benchmarks are use to obtain latency and memory information of the most popular models across different scenarios such as:
* Base case i.e., when using `torch.bfloat16` and `torch.nn.functional.scaled_dot_product_attention`.
* Base + `torch.compile()`
* NF4 quantization
* Layerwise upcasting
Instead of full diffusion pipelines, only the forward pass of the respective model classes (such as `FluxTransformer2DModel`) is tested with the real checkpoints (such as `"black-forest-labs/FLUX.1-dev"`).
The entrypoint to running all the currently available benchmarks is in `run_all.py`. However, one can run the individual benchmarks, too, e.g., `python benchmarking_flux.py`. It should produce a CSV file containing various information about the benchmarks run.
The benchmarks are run on a weekly basis and the CI is defined in [benchmark.yml](../.github/workflows/benchmark.yml).
## Running the benchmarks manually
First set up `torch` and install `diffusers` from the root of the directory:
```py
pipinstall-e".[quality,test]"
```
Then make sure the other dependencies are installed:
```sh
cd benchmarks/
pip install -r requirements.txt
```
We need to be authenticated to access some of the checkpoints used during benchmarking:
```sh
hf auth login
```
We use an L40 GPU with 128GB RAM to run the benchmark CI. As such, the benchmarks are configured to run on NVIDIA GPUs. So, make sure you have access to a similar machine (or modify the benchmarking scripts accordingly).
Then you can either launch the entire benchmarking suite by running:
```sh
python run_all.py
```
Or, you can run the individual benchmarks.
## Customizing the benchmarks
We define "scenarios" to cover the most common ways in which these models are used. You can
define a new scenario, modifying an existing benchmark file:
You can also configure a new model-level benchmark and add it to the existing suite. To do so, just defining a valid benchmarking file like `benchmarking_flux.py` should be enough.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Outpainting
Outpainting extends an image beyond its original boundaries, allowing you to add, replace, or modify visual elements in an image while preserving the original image. Like [inpainting](../using-diffusers/inpaint), you want to fill the white area (in this case, the area outside of the original image) with new visual elements while keeping the original image (represented by a mask of black pixels). There are a couple of ways to outpaint, such as with a [ControlNet](https://hf.co/blog/OzzyGT/outpainting-controlnet) or with [Differential Diffusion](https://hf.co/blog/OzzyGT/outpainting-differential-diffusion).
This guide will show you how to outpaint with an inpainting model, ControlNet, and a ZoeDepth estimator.
Before you begin, make sure you have the [controlnet_aux](https://github.com/huggingface/controlnet_aux) library installed so you can use the ZoeDepth estimator.
```py
!pipinstall-qcontrolnet_aux
```
## Image preparation
Start by picking an image to outpaint with and remove the background with a Space like [BRIA-RMBG-1.4](https://hf.co/spaces/briaai/BRIA-RMBG-1.4).
<iframe
src="https://briaai-bria-rmbg-1-4.hf.space"
frameborder="0"
width="850"
height="450"
></iframe>
For example, remove the background from this image of a pair of shoes.
[Stable Diffusion XL (SDXL)](../using-diffusers/sdxl) models work best with 1024x1024 images, but you can resize the image to any size as long as your hardware has enough memory to support it. The transparent background in the image should also be replaced with a white background. Create a function (like the one below) that scales and pastes the image onto a white background.
To avoid adding unwanted extra details, use the ZoeDepth estimator to provide additional guidance during generation and to ensure the shoes remain consistent with the original image.
Once your image is ready, you can generate content in the white area around the shoes with [controlnet-inpaint-dreamer-sdxl](https://hf.co/destitech/controlnet-inpaint-dreamer-sdxl), a SDXL ControlNet trained for inpainting.
Load the inpainting ControlNet, ZoeDepth model, VAE and pass them to the [`StableDiffusionXLControlNetPipeline`]. Then you can create an optional `generate_image` function (for convenience) to outpaint an initial image.
> Now is a good time to free up some memory if you're running low!
>
> ```py
> pipeline=None
> torch.cuda.empty_cache()
> ```
Now that you have an initial outpainted image, load the [`StableDiffusionXLInpaintPipeline`] with the [RealVisXL](https://hf.co/SG161222/RealVisXL_V4.0) model to generate the final outpainted image with better quality.
Prepare a mask for the final outpainted image. To create a more natural transition between the original image and the outpainted background, blur the mask to help it blend better.
Create a better prompt and pass it to the `generate_outpaint` function to generate the final outpainted image. Again, paste the original image over the final outpainted background.
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# Caching methods
Cache methods speedup diffusion transformers by storing and reusing intermediate outputs of specific layers, such as attention and feedforward layers, instead of recalculating them at each inference step.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -12,14 +12,24 @@ specific language governing permissions and limitations under the License.
# IP-Adapter
[IP-Adapter](https://hf.co/papers/2308.06721) is a lightweight adapter that enables prompting a diffusion model with an image. This method decouples the cross-attention layers of the image and text features. The image features are generated from an image encoder. Files generated from IP-Adapter are only ~100MBs.
[IP-Adapter](https://hf.co/papers/2308.06721) is a lightweight adapter that enables prompting a diffusion model with an image. This method decouples the cross-attention layers of the image and text features. The image features are generated from an image encoder.
<Tip>
Learn how to load an IP-Adapter checkpoint and image in the [IP-Adapter](../../using-diffusers/loading_adapters#ip-adapter) loading guide.
Learn how to load an IP-Adapter checkpoint and image in the IP-Adapter [loading](../../using-diffusers/loading_adapters#ip-adapter) guide, and you can see how to use it in the [usage](../../using-diffusers/ip_adapter) guide.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -12,10 +12,26 @@ specific language governing permissions and limitations under the License.
# LoRA
LoRA is a fast and lightweight training method that inserts and trains a significantly smaller number of parameters instead of all the model parameters. This produces a smaller file (~100 MBs) and makes it easier to quickly train a model to learn a new concept. LoRA weights are typically loaded into the UNet, text encoder or both. There are two classes for loading LoRA weights:
LoRA is a fast and lightweight training method that inserts and trains a significantly smaller number of parameters instead of all the model parameters. This produces a smaller file (~100 MBs) and makes it easier to quickly train a model to learn a new concept. LoRA weights are typically loaded into the denoiser, text encoder or both. The denoiser usually corresponds to a UNet ([`UNet2DConditionModel`], for example) or a Transformer ([`SD3Transformer2DModel`], for example). There are several classes for loading LoRA weights:
- [`LoraLoaderMixin`] provides functions for loading and unloading, fusing and unfusing, enabling and disabling, and more functions for managing LoRA weights. This class can be used with any model.
- [`StableDiffusionXLLoraLoaderMixin`] is a [Stable Diffusion (SDXL)](../../api/pipelines/stable_diffusion/stable_diffusion_xl) version of the [`LoraLoaderMixin`] class for loading and saving LoRA weights. It can only be used with the SDXL model.
- [`StableDiffusionLoraLoaderMixin`] provides functions for loading and unloading, fusing and unfusing, enabling and disabling, and more functions for managing LoRA weights. This class can be used with any model.
- [`StableDiffusionXLLoraLoaderMixin`] is a [Stable Diffusion (SDXL)](../../api/pipelines/stable_diffusion/stable_diffusion_xl) version of the [`StableDiffusionLoraLoaderMixin`] class for loading and saving LoRA weights. It can only be used with the SDXL model.
- [`SD3LoraLoaderMixin`] provides similar functions for [Stable Diffusion 3](https://huggingface.co/blog/sd3).
- [`FluxLoraLoaderMixin`] provides similar functions for [Flux](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux).
- [`CogVideoXLoraLoaderMixin`] provides similar functions for [CogVideoX](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogvideox).
- [`Mochi1LoraLoaderMixin`] provides similar functions for [Mochi](https://huggingface.co/docs/diffusers/main/en/api/pipelines/mochi).
- [`AuraFlowLoraLoaderMixin`] provides similar functions for [AuraFlow](https://huggingface.co/fal/AuraFlow).
- [`LTXVideoLoraLoaderMixin`] provides similar functions for [LTX-Video](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx_video).
- [`SanaLoraLoaderMixin`] provides similar functions for [Sana](https://huggingface.co/docs/diffusers/main/en/api/pipelines/sana).
- [`HunyuanVideoLoraLoaderMixin`] provides similar functions for [HunyuanVideo](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hunyuan_video).
- [`Lumina2LoraLoaderMixin`] provides similar functions for [Lumina2](https://huggingface.co/docs/diffusers/main/en/api/pipelines/lumina2).
- [`WanLoraLoaderMixin`] provides similar functions for [Wan](https://huggingface.co/docs/diffusers/main/en/api/pipelines/wan).
- [`SkyReelsV2LoraLoaderMixin`] provides similar functions for [SkyReels-V2](https://huggingface.co/docs/diffusers/main/en/api/pipelines/skyreels_v2).
- [`CogView4LoraLoaderMixin`] provides similar functions for [CogView4](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogview4).
- [`AmusedLoraLoaderMixin`] is for the [`AmusedPipeline`].
- [`HiDreamImageLoraLoaderMixin`] provides similar functions for [HiDream Image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hidream)
- [`QwenImageLoraLoaderMixin`] provides similar functions for [Qwen Image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/qwen)
- [`LoraBaseMixin`] provides a base class with several utility methods to fuse, unfuse, unload, LoRAs and more.
<Tip>
@@ -23,10 +39,77 @@ To learn more about how to load LoRA weights, see the [LoRA](../../using-diffuse
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# PEFT
Diffusers supports loading adapters such as [LoRA](../../using-diffusers/loading_adapters) with the [PEFT](https://huggingface.co/docs/peft/index) library with the [`~loaders.peft.PeftAdapterMixin`] class. This allows modeling classes in Diffusers like [`UNet2DConditionModel`], [`SD3Transformer2DModel`] to operate with an adapter.
<Tip>
Refer to the [Inference with PEFT](../../tutorials/using_peft_for_inference.md) tutorial for an overview of how to use PEFT in Diffusers for inference.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -12,26 +12,51 @@ specific language governing permissions and limitations under the License.
# Single files
Diffusers supports loading pretrained pipeline (or model) weights stored in a singlefile, such as a `ckpt` or `safetensors` file. These single file types are typically produced from community trained models. There are three classes for loading single file weights:
The [`~loaders.FromSingleFileMixin.from_single_file`] method allows you to load:
-[`FromSingleFileMixin`] supports loading pretrained pipeline weights stored in a singlefile, which can either be a `ckpt` or `safetensors` file.
-[`FromOriginalVAEMixin`] supports loading a pretrained [`AutoencoderKL`] from pretrained ControlNet weights stored in a single file, which can either be a `ckpt` or `safetensors` file.
- [`FromOriginalControlnetMixin`] supports loading pretrained ControlNet weights stored in a single file, which can either be a `ckpt` or `safetensors` file.
*a model stored in a single file, which is useful if you're working with models from the diffusion ecosystem, like Automatic1111, and commonly rely on a single-file layout to store and share models
*a model stored in their originally distributed layout, which is useful if you're working with models finetuned with other services, and want to load it directly into Diffusers model objects and pipelines
<Tip>
> [!TIP]
> Read the [Model files and layouts](../../using-diffusers/other-formats) guide to learn more about the Diffusers-multifolder layout versus the single-file layout, and how to load models stored in these different layouts.
To learn more about how to load single file weights, see the [Load different Stable Diffusion formats](../../using-diffusers/other-formats) loading guide.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# SD3Transformer2D
This class is useful when *only* loading weights into a [`SD3Transformer2DModel`]. If you need to load weights into the text encoder or a text encoder and SD3Transformer2DModel, check [`SD3LoraLoaderMixin`](lora#diffusers.loaders.SD3LoraLoaderMixin) class instead.
The [`SD3Transformer2DLoadersMixin`] class currently only loads IP-Adapter weights, but will be used in the future to save weights and load LoRAs.
<Tip>
To learn more about how to load LoRA weights, see the [LoRA](../../using-diffusers/loading_adapters#lora) loading guide.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
# UNet
Some training methods - like LoRA and Custom Diffusion - typically target the UNet's attention layers, but these training methods can also target other non-attention layers. Instead of training all of a model's parameters, only a subset of the parameters are trained, which is faster and more efficient. This class is useful if you're *only* loading weights into a UNet. If you need to load weights into the text encoder or a text encoder and UNet, try using the [`~loaders.LoraLoaderMixin.load_lora_weights`] function instead.
Some training methods - like LoRA and Custom Diffusion - typically target the UNet's attention layers, but these training methods can also target other non-attention layers. Instead of training all of a model's parameters, only a subset of the parameters are trained, which is faster and more efficient. This class is useful if you're *only* loading weights into a UNet. If you need to load weights into the text encoder or a text encoder and UNet, try using the [`~loaders.StableDiffusionLoraLoaderMixin.load_lora_weights`] function instead.
The [`UNet2DConditionLoadersMixin`] class provides functions for loading and saving weights, fusing and unfusing LoRAs, disabling and enabling LoRAs, and setting and deleting adapters.
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# AllegroTransformer3DModel
A Diffusion Transformer model for 3D data from [Allegro](https://github.com/rhymes-ai/Allegro) was introduced in [Allegro: Open the Black Box of Commercial-Level Video Generation Model](https://huggingface.co/papers/2410.15458) by RhymesAI.
The model can be loaded with the following code snippet.
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
# AsymmetricAutoencoderKL
Improved larger variational autoencoder (VAE) model with KL loss for inpainting task: [Designing a Better Asymmetric VQGAN for StableDiffusion](https://arxiv.org/abs/2306.04632) by Zixin Zhu, Xuelu Feng, Dongdong Chen, Jianmin Bao, Le Wang, Yinpeng Chen, Lu Yuan, Gang Hua.
Improved larger variational autoencoder (VAE) model with KL loss for inpainting task: [Designing a Better Asymmetric VQGAN for StableDiffusion](https://huggingface.co/papers/2306.04632) by Zixin Zhu, Xuelu Feng, Dongdong Chen, Jianmin Bao, Le Wang, Yinpeng Chen, Lu Yuan, Gang Hua.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# AutoModel
The `AutoModel` is designed to make it easy to load a checkpoint without needing to know the specific model class. `AutoModel` automatically retrieves the correct model class from the checkpoint `config.json` file.
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# AutoencoderDC
The 2D Autoencoder model used in [SANA](https://huggingface.co/papers/2410.10629) and introduced in [DCAE](https://huggingface.co/papers/2410.10733) by authors Junyu Chen\*, Han Cai\*, Junsong Chen, Enze Xie, Shang Yang, Haotian Tang, Muyang Li, Yao Lu, Song Han from MIT HAN Lab.
The abstract from the paper is:
*We present Deep Compression Autoencoder (DC-AE), a new family of autoencoder models for accelerating high-resolution diffusion models. Existing autoencoder models have demonstrated impressive results at a moderate spatial compression ratio (e.g., 8x), but fail to maintain satisfactory reconstruction accuracy for high spatial compression ratios (e.g., 64x). We address this challenge by introducing two key techniques: (1) Residual Autoencoding, where we design our models to learn residuals based on the space-to-channel transformed features to alleviate the optimization difficulty of high spatial-compression autoencoders; (2) Decoupled High-Resolution Adaptation, an efficient decoupled three-phases training strategy for mitigating the generalization penalty of high spatial-compression autoencoders. With these designs, we improve the autoencoder's spatial compression ratio up to 128 while maintaining the reconstruction quality. Applying our DC-AE to latent diffusion models, we achieve significant speedup without accuracy drop. For example, on ImageNet 512x512, our DC-AE provides 19.1x inference speedup and 17.9x training speedup on H100 GPU for UViT-H while achieving a better FID, compared with the widely used SD-VAE-f8 autoencoder. Our code is available at [this https URL](https://github.com/mit-han-lab/efficientvit).*
The following DCAE models are released and supported in Diffusers.
The `AutoencoderDC` model has `in` and `mix` single file checkpoint variants that have matching checkpoint keys, but use different scaling factors. It is not possible for Diffusers to automatically infer the correct config file to use with the model based on just the checkpoint and will default to configuring the model using the `mix` variant config file. To override the automatically determined config, please use the `config` argument when using single file loading with `in` variant checkpoints.
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# AutoencoderKLHunyuanVideo
The 3D variational autoencoder (VAE) model with KL loss used in [HunyuanVideo](https://github.com/Tencent/HunyuanVideo/), which was introduced in [HunyuanVideo: A Systematic Framework For Large Video Generative Models](https://huggingface.co/papers/2412.03603) by Tencent.
The model can be loaded with the following code snippet.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# AutoencoderOobleck
The Oobleck variational autoencoder (VAE) model with KL loss was introduced in [Stability-AI/stable-audio-tools](https://github.com/Stability-AI/stable-audio-tools) and [Stable Audio Open](https://huggingface.co/papers/2407.14358) by Stability AI. The model is used in 🤗 Diffusers to encode audio waveforms into latents and to decode latent representations into audio waveforms.
The abstract from the paper is:
*Open generative models are vitally important for the community, allowing for fine-tunes and serving as baselines when presenting new models. However, most current text-to-audio models are private and not accessible for artists and researchers to build upon. Here we describe the architecture and training process of a new open-weights text-to-audio model trained with Creative Commons data. Our evaluation shows that the model's performance is competitive with the state-of-the-art across various metrics. Notably, the reported FDopenl3 results (measuring the realism of the generations) showcase its potential for high-quality stereo sound synthesis at 44.1kHz.*
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
# AutoencoderKL
The variational autoencoder (VAE) model with KL loss was introduced in [Auto-Encoding Variational Bayes](https://arxiv.org/abs/1312.6114v11) by Diederik P. Kingma and Max Welling. The model is used in 🤗 Diffusers to encode images into latents and to decode latent representations into images.
The variational autoencoder (VAE) model with KL loss was introduced in [Auto-Encoding Variational Bayes](https://huggingface.co/papers/1312.6114v11) by Diederik P. Kingma and Max Welling. The model is used in 🤗 Diffusers to encode images into latents and to decode latent representations into images.
The abstract from the paper is:
@@ -21,7 +21,7 @@ The abstract from the paper is:
## Loading from the original format
By default the [`AutoencoderKL`] should be loaded with [`~ModelMixin.from_pretrained`], but it can also be loaded
from the original format using [`FromOriginalVAEMixin.from_single_file`] as follows:
from the original format using [`FromOriginalModelMixin.from_single_file`] as follows:
```py
fromdiffusersimportAutoencoderKL
@@ -33,6 +33,9 @@ model = AutoencoderKL.from_single_file(url)
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# AutoencoderKLAllegro
The 3D variational autoencoder (VAE) model with KL loss used in [Allegro](https://github.com/rhymes-ai/Allegro) was introduced in [Allegro: Open the Black Box of Commercial-Level Video Generation Model](https://huggingface.co/papers/2410.15458) by RhymesAI.
The model can be loaded with the following code snippet.
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# AutoencoderKLCogVideoX
The 3D variational autoencoder (VAE) model with KL loss used in [CogVideoX](https://github.com/THUDM/CogVideo) was introduced in [CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer](https://github.com/THUDM/CogVideo/blob/main/resources/CogVideoX.pdf) by Tsinghua University & ZhipuAI.
The model can be loaded with the following code snippet.
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.