* Fix multi-gpu case
* Prefer previously created `unwrap_model()` function
For `torch.compile()` generalizability
* `chore: update unwrap_model() function to use accelerator.unwrap_model()`
* Add AuraFlowPipeline and KolorsPipeline to auto map
Just T2I. Validated using `quickdif`
* Add Kolors I2I and SD3 Inpaint auto maps
* style
---------
Co-authored-by: yiyixuxu <yixu310@gmail.com>
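For the `unwrap_model()` change above, a minimal sketch (the helper body is illustrative, not the exact PR code; it assumes the `is_compiled_module` utility) of unwrapping both Accelerate and `torch.compile()` wrappers:

```python
from diffusers.utils.torch_utils import is_compiled_module


def unwrap_model(accelerator, model):
    # Strip Accelerate/DDP wrappers first, then the torch.compile() wrapper if present.
    model = accelerator.unwrap_model(model)
    model = model._orig_mod if is_compiled_module(model) else model
    return model
```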
* add Latte to diffusers (see the usage sketch after this PR's commits)
* remove print
* remove unused code
* remove layer_norm_latte and add a flag
* remove layer_norm_latte and add a flag
* update latte_pipeline
* update latte_pipeline
* remove unused squeeze
* add norm_hidden_states.ndim == 2: # for Latte
* fixed Latte pipeline test bugs
* delete sh
* add doc for latte
* add licensing
* Move Transformer3DModelOutput to modeling_outputs
* give a default value to sample_size
* remove the einops dependency
* change norm2 for latte
* modify pipeline of latte
* update test for Latte
* modify some codes for latte
* modify for Latte pipeline
* video_length -> num_frames; update prepare_latents copied from
* make fix-copies
* make style
* typo: videe -> video
* update
* modify for Latte pipeline
* modify latte pipeline
* modify for Latte pipeline
* Delete .vscode directory
* make style
* make fix-copies
* add latte transformer 3d to docs _toctree.yml
* update example
* reduce frames for test
* fixed bug of _text_preprocessing
* set num frame to 1 for testing
* remove unused print
* add text = self._clean_caption(text) again
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
Co-authored-by: Aryan <aryan@huggingface.co>
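A short, hedged usage sketch of the new Latte text-to-video pipeline; the checkpoint id and output handling follow the example docstring added in this PR and should be treated as illustrative:

```python
import torch
from diffusers import LattePipeline
from diffusers.utils import export_to_gif

pipe = LattePipeline.from_pretrained("maxin-cn/Latte-1", torch_dtype=torch.float16).to("cuda")
# num_frames is reduced in the fast tests; the default generates a short clip.
videos = pipe("A small cactus with a happy face in the Sahara desert.").frames[0]
export_to_gif(videos, "latte.gif")
```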
* Add vae_roundtrip.py example
* Add cuda support to vae_roundtrip
* Move vae_roundtrip.py into research_projects/vae
* Fix channel scaling in vae roundtrip and also support taesd.
* Apply ruff --fix for CI gatekeep check
---------
Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
* introduce to promote reusability.
* up
* add more tests
* up
* remove comments.
* fix fuse_nan test
* clarify the scope of fuse_lora and unfuse_lora
* remove space
* minor changes
* fix
* fix
* aligning with blora script
* remove prints
* style
* default val
* license
* move save_model_card to outside push_to_hub
* Update train_dreambooth_lora_sdxl_advanced.py
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Motion Model / Adapter versatility
- allow using a different number of layers per block
- allow using a different number of transformer layers per block
- allow a different number of motion attention heads per block
- use the dropout argument in get_down/up_block in 3d blocks
* Motion Model: renamed the added arguments & refactored
* Add test for asymmetric UNetMotionModel
* Add check for WindowsPath in to_json_string
On Windows, os.path.join returns a WindowsPath. to_json_string does not convert this from a WindowsPath to a string. Added check for WindowsPath to to_json_saveable.
* Remove extraneous convert to string in test_check_path_types (tests/others/test_config.py)
* Fix style issues in tests/others/test_config.py
* Add unit test to test_config.py to verify that PosixPath and WindowsPath (depending on system) both work when converted to JSON
* Remove distinction between PosixPath and WindowsPath in ConfigMixIn.to_json_string(). Conditional now tests for Path, and uses Path.as_posix() to convert to string.
---------
Co-authored-by: Vincent Dovydaitis <vincedovy@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
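A sketch of the resulting behaviour in `ConfigMixin.to_json_string()`: any `pathlib.Path` (PosixPath or WindowsPath) is made JSON-safe via `Path.as_posix()`. The helper below is a simplified stand-in for the real `to_json_saveable`:

```python
from pathlib import Path

import numpy as np


def to_json_saveable(value):
    # Arrays become lists; any Path (PosixPath or WindowsPath) becomes a forward-slash string.
    if isinstance(value, np.ndarray):
        return value.tolist()
    if isinstance(value, Path):
        return value.as_posix()
    return value


print(to_json_saveable(Path("models") / "unet"))  # "models/unet" on every platform
```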
* doc for max_sequence_length
* better position and changed note to tip
* apply suggestions
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Class methods are supposed to use `cls` conventionally
* `make style && make quality`
* An Empty commit
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Discourage using `revision`
* `make style && make quality`
* Refactor code to use 'variant' instead of 'revision'
* `revision="bf16"` -> `variant="bf16"`
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
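A hedged example of the recommended pattern: select reduced-precision weight files with `variant` rather than pointing `revision` at a branch (the model id is just an example):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    variant="fp16",             # picks the *.fp16.safetensors weight files
    torch_dtype=torch.float16,  # and loads them in half precision
)
```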
* Trim all the trailing white space in the whole repo
* Remove unnecessary empty places
* make style && make quality
* Trim trailing white space
* trim
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Add support for _foreach operations and non-blocking to EMAModel
* default foreach to false
* add non-blocking EMA offloading to SD1.5 T2I example script
* fix whitespace
* move foreach to cli argument
* linting
* Update README.md re: EMA weight training
* correct args.foreach_ema
* add tests for foreach ema
* code quality
* add foreach to from_pretrained
* default foreach false
* fix linting
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: drhead <a@a.a>
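A tiny, self-contained sketch of the `foreach` switch on `EMAModel` described above; the toy module just demonstrates where the flag goes, and the kwarg name is taken from this PR's commits:

```python
import torch
from diffusers.training_utils import EMAModel

model = torch.nn.Linear(4, 4)
# foreach=True routes the EMA update through torch._foreach_* ops
# instead of a per-parameter Python loop (assumed kwarg from this PR).
ema = EMAModel(model.parameters(), decay=0.999, foreach=True)
ema.step(model.parameters())
```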
* fix
* add check
* key presence is checked before
* test case draft
* apply suggestions
* changed testing repo, back to old class
* forgot docstring
---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* get rid of the legacy lora remnants and make our codebase lighter
* fix deprecated lora argument
* fix
* empty commit to trigger ci
* remove print
* empty
* fix typo in __call__ of pipeline_stable_diffusion_3.py
* fix typo in __call__ of pipeline_stable_diffusion_3_img2img.py
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
[SD3 Docs] Corrected title about loading model with T5
Corrected the documentation title to "Loading the single file checkpoint with T5". Previously, it incorrectly stated "Loading the single file checkpoint without T5", which contradicted the code snippet showing how to load the SD3 checkpoint with the T5 model.
* [LoRA] text encoder: read the ranks for all the attn modules
* In addition to out_proj, read the ranks of adapters for q_proj, k_proj, and v_proj
* Allow missing adapters (UNet already supports this)
* ruff format loaders.lora
* [LoRA] add tests for partial text encoders LoRAs
* [LoRA] update test_simple_inference_with_partial_text_lora to be deterministic
* [LoRA] comment justifying test_simple_inference_with_partial_text_lora
* style
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Fix sharding when no device_map is passed
* style
* add tests
* align
* add docstring
* format
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update train_dreambooth_sd3.py to fix TE garbage collection
* Update train_dreambooth_lora_sd3.py to fix TE garbage collection
---------
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* image_processor.py: Fixed an error in the ValueError message, as the string's join method tried to join types instead of strings
Bug that occurred:
f"Input is in incorrect format. Currently, we only support {', '.join(supported_formats)}"
TypeError: sequence item 0: expected str instance, type found
* Fixed: C417 Unnecessary `map` usage (rewrite using a generator expression)
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
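A sketch of the corrected error message: `str.join` needs strings, so each supported type is stringified with a generator expression (which also resolves the C417 lint). The wrapper function is illustrative:

```python
import numpy as np
import PIL.Image
import torch

supported_formats = (PIL.Image.Image, np.ndarray, torch.Tensor)


def check_image_format(image):
    if not isinstance(image, supported_formats):
        # join() needs strings, so convert each type explicitly.
        raise ValueError(
            "Input is in incorrect format. Currently, we only support "
            + ", ".join(str(x) for x in supported_formats)
        )
```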
* feat: support saving a model in sharded checkpoints (see the sketch after this PR's commits).
* feat: make loading of sharded checkpoints work.
* add tests
* cleanse the loading logic a bit more.
* more resilience while loading from the Hub.
* parallelize shard downloads by using snapshot_download()
* default to a shard size.
* more fix
* Empty-Commit
* debug
* fix
* quality
* more debugging
* fix more
* initial comments from Benjamin
* move certain methods to loading_utils
* add test to check if the correct number of shards are present.
* add a test to check if loading of sharded checkpoints from the Hub is okay
* clarify the unit when passed as an int.
* use hf_hub for sharding.
* remove unnecessary code
* remove unnecessary function
* lucain's comments.
* fixes
* address high-level comments.
* fix test
* subfolder shenanigans.
* Update src/diffusers/utils/hub_utils.py
Co-authored-by: Lucain <lucainp@gmail.com>
* Apply suggestions from code review
Co-authored-by: Lucain <lucainp@gmail.com>
* remove _huggingface_hub_version as not needed.
* address more feedback.
* add a test for local_files_only=True
* need hf hub to be at least 0.23.2
* style
* final comment.
* clean up subfolder.
* deal with suffixes in code.
* _add_variant default.
* use weights_name_pattern
* remove add_suffix_keyword
* clean up downloading of sharded ckpts.
* don't return something special when using index.json
* fix more
* don't use bare except
* remove comments and catch the errors better
* fix a couple of things when using is_file()
* empty
---------
Co-authored-by: Lucain <lucainp@gmail.com>
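Roughly what the sharded-checkpoint feature enables, as a hedged sketch (paths and the shard size are illustrative): `save_pretrained` writes an index plus shards once `max_shard_size` is exceeded, and `from_pretrained` resolves them transparently.

```python
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet"
)
# Writes an index.json plus the individual weight shards.
unet.save_pretrained("sdxl-unet-sharded", max_shard_size="2GB")
# Loading consults the index and reads all shards back.
unet = UNet2DConditionModel.from_pretrained("sdxl-unet-sharded")
```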
* first draft
* secret
* tiktok
* capital matters
* dataset matter
* don't be a prick
* refact
* only on main or tag
* document with an example
* Update destination dataset
* link
* allow manual trigger
* better
* lin
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* handle norm_type of transformer2d_model safely.
* log an info when old model class is being returned.
* Apply suggestions from code review
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* remove extra stuff
---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* Update transformer2d.md title
For the other classes (e.g., UNet2DModel) the title of the documentation coincides with the name of the class, but that was not the case for Transformer2DModel.
* Update model docs titles for consistency with class names
* Modularized the train_lora_sdxl file
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Modularized the train_lora file
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* implement marigold depth and normals pipelines in diffusers core (see the usage sketch after this PR's commits)
* remove bibtex
* remove deprecations
* remove save_memory argument
* remove validate_vae
* remove config output
* remove batch_size autodetection
* remove presets logic
move default denoising_steps and processing_resolution into the model config
make default ensemble_size 1
* remove no_grad
* add fp16 to the example usage
* implement is_matplotlib_available
use is_matplotlib_available, is_scipy_available for conditional imports in the marigold depth pipeline
* move colormap, visualize_depth, and visualize_normals into export_utils.py
* make the denoising loop more lucid
fix the outputs to always be 4d tensors or lists of pil images
support a 4d input_image case
attempt to support model_cpu_offload_seq
move check_inputs into a separate function
change default batch_size to 1, remove any logic to make it bigger implicitly
* style
* rename denoising_steps into num_inference_steps
* rename input_image into image
* rename input_latent into latents
* remove decode_image
change decode_prediction to use the AutoencoderKL.decode method
* move clean_latent outside of progress_bar
* refactor marigold-reusable image processing bits into MarigoldImageProcessor class
* clean up the usage example docstring
* make ensemble functions members of the pipelines
* add early checks in check_inputs
rename E into ensemble_size in depth ensembling
* fix vae_scale_factor computation
* better compatibility with torch.compile
better variable naming
* move export_depth_to_png to export_utils
* remove encode_prediction
* improve visualize_depth and visualize_normals to accept multi-dimensional data and lists
remove visualization functions from the pipelines
move exporting depth as 16-bit PNGs functionality from the depth pipeline
update example docstrings
* do not shortcut vae.config variables
* change all asserts to raise ValueError
* rename output_prediction_type to output_type
* better variable names
clean up variable deletion code
* better variable names
* pass desc and leave kwargs into the diffusers progress_bar
implement nested progress bar for images and steps loops
* implement scale_invariant and shift_invariant flags in the ensemble_depth function
add scale_invariant and shift_invariant flags readout from the model config
further refactor ensemble_depth
support ensembling without alignment
add ensemble_depth docstring
* fix generator device placement checks
* move encode_empty_text body into the pipeline call
* minor empty text encoding simplifications
* adjust pipelines' class docstrings to explain the added construction arguments
* improve the scipy failure condition
add comments
improve docstrings
change the default use_full_z_range to True
* make input image values range check configurable in the preprocessor
refactor load_image_canonical in preprocessor to reject unknown types and return the image in the expected 4D format of tensor and on right device
support a list of everything as inputs to the pipeline, change type to PipelineImageInput
implement a check that all input list elements have the same dimensions
improve docstrings of pipeline outputs
remove check_input pipeline argument
* remove forgotten print
* add prediction_type model config
* add uncertainty visualization into export utils
fix NaN values in normals uncertainties
* change default of output_uncertainty to False
better handle the case of an attempt to export or visualize none
* fix `output_uncertainty=False`
* remove kwargs
fix check_inputs according to the new inputs of the pipeline
* rename prepare_latent into prepare_latents as in other pipelines
annotate prepare_latents in normals pipeline with "Copied from"
annotate encode_image in normals pipeline with "Copied from"
* move nested-capable `progress_bar` method into the pipelines
revert the original `progress_bar` method in pipeline_utils
* minor message improvement
* fix cpu offloading
* move colormap, visualize_depth, export_depth_to_16bit_png, visualize_normals, visualize_uncertainty to marigold_image_processing.py
update example docstrings
* fix missing comma
* change torch.FloatTensor to torch.Tensor
* fix importing of MarigoldImageProcessor
* fix vae offloading
fix batched image encoding
remove separate encode_image function and use vae.encode instead
* implement marigold's initial tests
relax generator checks in line with other pipelines
implement return_dict __call__ argument in line with other pipelines
* fix num_images computation
* remove MarigoldImageProcessor and outputs from import structure
update tests
* update docstrings
* update init
* update
* style
* fix
* fix
* up
* up
* up
* add simple test
* up
* update expected np input/output to be channel last
* move expand_tensor_or_array into the MarigoldImageProcessor
* rewrite tests to follow conventions - hardcoded slices instead of image artifacts
write more smoke tests
* add basic docs.
* add anton's contribution statement
* remove todos.
* fix assertion values for marigold depth slow tests
* fix assertion values for depth normals.
* remove print
* support AutoencoderTiny in the pipelines
* update documentation page
add Available Pipelines section
add Available Checkpoints section
add warning about num_inference_steps
* fix missing import in docstring
fix wrong value in visualize_depth docstring
* [doc] add marigold to pipelines overview
* [doc] add section "usage examples"
* fix an issue with latents check in the pipelines
* add "Frame-by-frame Video Processing with Consistency" section
* grammarly
* replace tables with images with css-styled images (blindly)
* style
* print
* fix the assertions.
* take from the github runner.
* take the slices from action artifacts
* style.
* update with the slices from the runner.
* remove unnecessary code blocks.
* Revert "[doc] add marigold to pipelines overview"
This reverts commit a505165150afd8dab23c474d1a054ea505a56a5f.
* remove invitation for new modalities
* split out marigold usage examples
* doc cleanup
---------
Co-authored-by: yiyixuxu <yixu310@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail.com>
Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
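A hedged usage sketch of the new Marigold depth pipeline based on the documentation added here; the checkpoint id, the 4-step setting, and the `visualize_depth` helper should be treated as illustrative:

```python
import torch
from PIL import Image
from diffusers import MarigoldDepthPipeline

pipe = MarigoldDepthPipeline.from_pretrained(
    "prs-eth/marigold-depth-lcm-v1-0", torch_dtype=torch.float16
).to("cuda")
image = Image.new("RGB", (768, 768))  # placeholder; any RGB image works
out = pipe(image, num_inference_steps=4, ensemble_size=1)
depth_vis = pipe.image_processor.visualize_depth(out.prediction)[0]
```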
* add a more secure way to run tests from a PR.
* make pytest more secure.
* address dhruv's comments.
* improve validation check.
* Update .github/workflows/run_tests_from_a_pr.yml
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
sampling bug fix in basic_training.md
In the diffusers basic training tutorial, setting the manual seed argument (generator=torch.manual_seed(config.seed)) in the pipeline call inside the evaluate() function rewinds the dataloader shuffling, leading to overfitting because the model sees the same sequence of training examples after every evaluation call. Using generator=torch.Generator(device='cpu').manual_seed(config.seed) avoids this.
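A small sketch of why the fix works: a dedicated `torch.Generator` keeps evaluation sampling reproducible without touching the global RNG state that the DataLoader's shuffling relies on.

```python
import torch

seed = 0
# generator = torch.manual_seed(seed) would reseed the *global* RNG and rewind
# the DataLoader shuffling; a dedicated generator leaves global state alone.
generator = torch.Generator(device="cpu").manual_seed(seed)
noise = torch.randn((1, 3, 64, 64), generator=generator)
```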
* Update pipeline_stable_diffusion_instruct_pix2pix.py
Add `cross_attention_kwargs` to `__call__` method of `StableDiffusionInstructPix2PixPipeline`, which are passed to UNet.
* Update documentation for pipeline_stable_diffusion_instruct_pix2pix.py
* Update docstring
* Update docstring
* Fix typing import
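Hedged sketch of the new argument in use (placeholder input image; the `scale` entry is the typical use case, e.g. scaling LoRA layers):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")
image = Image.new("RGB", (512, 512))  # placeholder input image
result = pipe(
    "make it snowy",
    image=image,
    cross_attention_kwargs={"scale": 0.5},  # forwarded to the UNet's attention processors
).images[0]
```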
* make _callback_tensor_inputs consistent between sdxl pipelines
* forgot this one
* fix failing test
* fix test_components_function
* fix controlnet inpaint tests
* Merged isinstance calls to make the code simpler.
* Corrected formatting errors using ruff.
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Fix `added_cond_kwargs` when using IP-Adapter
Fix error when using IP-Adapter in pipeline and passing `ip_adapter_image_embeds` instead of `ip_adapter_image`
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Expand `diffusers-cli env`
* SafeTensors -> Safetensors
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Move `safetensors_version = "not installed"` to `else`
* Update `safetensors_version` checking
* Add GPU detection for Linux, Mac OS, and Windows
* Add accelerator detection to environment command
* Add is_peft_version to import_utils
* Update env.py
* Add `huggingface_hub` reference
* Add `transformers` reference
* Add reference for `huggingface_hub`
* Fix print statement in env.py for unusual OS
* Up
* Fix platform information in env.py
* up
* Fix import order in env.py
* ruff
* make style
* Fix platform system check in env.py
* Fix run method return type in env.py
* 🤗
* No need f-string
* Remove location info
* Remove accelerate config
* Refactor env.py to remove accelerate config
* feat: Add support for `bitsandbytes` library in environment command
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* fixed vae loading issue #7619
* rerun make style && make quality
* bring back model_has_vae and change \ to / in config_file_name on Windows so the match works
* add missing import platform
* bring back import model_info
* make config_file_name OS independent
* switch to using Path.as_posix() to resolve OS dependence
* improve style
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: bssrdf <bssrdf@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Update requirements.txt
If the datasets library is old, it will not read the metadata.jsonl and the label will default to an integer of type int.
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Fixed a wrong link to python versions in contributing.md file.
* Updated the link to a permalink, so that it will permanently point to the specific line.
* find & replace all FloatTensors to Tensor
* apply formatting
* Update torch.FloatTensor to torch.Tensor in the remaining files
* formatting
* Fix the rest of the places where FloatTensor is used as well as in documentation
* formatting
* Update new file from FloatTensor to Tensor
* Remove dead code
* Pylance reportGeneralTypeIssues: Strings nested within an f-string cannot use the same quote character as the f-string prior to Python 3.12.
* Remove dead code
SDXL LoRA weights for text encoders should be decoupled on save
The method checks that at least one of the unet, text_encoder, and
text_encoder_2 LoRA weights is passed, which was not reflected in the
implementation.
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
`model_output.shape` may only have rank 1.
There are warnings related to the use of random keys.
```
tests/schedulers/test_scheduler_flax.py: 13 warnings
/Users/phillypham/diffusers/src/diffusers/schedulers/scheduling_ddpm_flax.py:268: FutureWarning: normal accepts a single key, but was given a key array of shape (1, 2) != (). Use jax.vmap for batching. In a future JAX version, this will be an error.
noise = jax.random.normal(split_key, shape=model_output.shape, dtype=self.dtype)
tests/schedulers/test_scheduler_flax.py::FlaxDDPMSchedulerTest::test_betas
/Users/phillypham/virtualenv/diffusers/lib/python3.9/site-packages/jax/_src/random.py:731: FutureWarning: uniform accepts a single key, but was given a key array of shape (1,) != (). Use jax.vmap for batching. In a future JAX version, this will be an error.
u = uniform(key, shape, dtype, lo, hi) # type: ignore[arg-type]
```
* 7879 - adjust documentation to use naruto dataset, since pokemon is now gated
* replace references to pokemon in docs
* more references to pokemon replaced
* Japanese translation update
---------
Co-authored-by: bghira <bghira@users.github.com>
* Add Ascend NPU support for SDXL fine-tuning and fix the model saving bug when using DeepSpeed.
* fix check code quality
* Decouple the NPU flash attention and make it an independent module.
* add doc and unit tests for npu flash attention.
---------
Co-authored-by: mhh001 <mahonghao1@huawei.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* chore: reducing model sizes
* chore: shrinks further
* chore: shrinks further
* chore: shrinking model for img2img pipeline
* chore: reducing size of model for inpaint pipeline
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
FlaxStableDiffusionSafetyChecker sets main_input_name to "clip_input".
This makes it consistent with StableDiffusionSafetyChecker.
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Added get_velocity function to EulerDiscreteScheduler.
* Fix white space on blank lines
* Added copied from statement
* back to the original.
---------
Co-authored-by: Ruining Li <ruining@robots.ox.ac.uk>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
swap the order for do_classifier_free_guidance concat with repeat
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* Check for latents, before calling prepare_latents - sdxlImg2Img
* Added latents check for all the img2img pipeline
* Fixed silly mistake while checking latents as None
A new function compute_dream_and_update_latents has been added to the
training utilities that allows you to do DREAM rectified training in line
with the paper https://arxiv.org/abs/2312.00210. The method can be used
with an extra argument in the train_text_to_image.py script.
Co-authored-by: Jimmy <39@🇺🇸.com>
* Convert channel order to BGR for the watermark encoder. Convert the watermarked BGR images back to RGB. Fixes #6292
* Revert channel order before stacking images to work around the limitation that negative strides are currently not supported
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Fixed wrong decorator by modifying it to @classmethod.
* Updated the method and its argument.
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* add scheduled pseudo-huber loss training scripts
See #7488
* add reduction modes to huber loss
* [DB Lora] *2 multiplier to huber loss because of the 1/2 a^2 convention
pairing of c6495def1f
* [DB Lora] add option for smooth l1 (huber / delta)
Pairing of dd22958caa
* [DB Lora] unify huber scheduling
Pairing of 19a834c3ab
* [DB Lora] add snr huber scheduler
Pairing of 47fb1a6854
* fixup examples link
* use snr schedule by default in DB
* update all huber scripts with snr
* code quality
* huber: make style && make quality
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Initialize target_unet from unet rather than teacher_unet so that we correctly add time_embedding.cond_proj if necessary.
* Use UNet2DConditionModel.from_config to initialize target_unet from unet's config.
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* give it a shot.
* print.
* correct assertion.
* gather results from the rest of the tests.
* change the assertion values where needed.
* remove print statements.
* get device <-> component mapping when using multiple gpus.
* condition the device_map bits.
* relax condition
* device_map progress.
* device_map enhancement (see the usage sketch after this PR's commits)
* some cleaning up and debugging
* Apply suggestions from code review
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* incorporate suggestions from PR.
* remove multi-gpu condition for now.
* guard check the component -> device mapping
* fix: device_memory variable
* dispatching transformers model to have force_hooks=True
* better guarding for transformers device_map
* introduce support balanced_low_memory and balanced_ultra_low_memory.
* remove device_map patch.
* fix: intermediate variable scoping.
* fix: condition in cpu offload.
* fix: flax class restrictions.
* remove modifications from cpu_offload and model_offload
* incorporate changes.
* add a simple forward pass test
* add: torch_device in get_inputs()
* add: tests
* remove print
* safe-guard to(), model offloading and cpu offloading when balanced is used as a device_map.
* style
* remove .
* safeguard device_map with more checks and remove invalid device_mapping strategies.
* make a class attribute and adjust tests accordingly.
* fix device_map check
* fix test
* adjust comment
* fix: device_map attribute
* fix: dispatching.
* max_memory test for pipeline
* version guard the tests
* fix guard.
* address review feedback.
* reset_device_map method.
* add: test for reset_hf_device_map
* fix a couple things.
* add reset_device_map() in the error message.
* add tests for checking reset_device_map doesn't have unintended consequences.
* fix reset_device_map and offloading tests.
* create _get_final_device_map utility.
* hf_device_map -> _hf_device_map
* add documentation
* add notes suggested by Marc.
* styling.
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* move updates within gpu condition.
* other docs related things
* note on ignore a device not specified in .
* provide a suggestion if device mapping errors out.
* fix: typo.
* _hf_device_map -> hf_device_map
* Empty-Commit
* add: example hf_device_map.
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
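What the device_map feature looks like from the user side, as a hedged sketch: `device_map="balanced"` spreads pipeline components across the available GPUs, the resulting placement is exposed via `hf_device_map`, and `reset_device_map()` undoes it.

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    device_map="balanced",
)
print(pipe.hf_device_map)  # e.g. {"unet": 0, "vae": 1, "text_encoder": 0, ...}
pipe.reset_device_map()    # required before .to(), CPU offload, or model offload
```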
* remove libsndfile1-dev and libgl1 from workflows and ensure they're present in the respective dockerfiles.
* change to self-hosted runner; let's see 🤞
* add libsndfile1-dev libgl1 for now
* use self-hosted runners for building and push too.
* Restore unet params back to normal from EMA when validation call is finished
* empty commit
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Allow safety and feature extractor arguments to be passed to convert_from_ckpt
Allows management of safety checker and feature extractor
from outside of the convert ckpt class.
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* reduce block sizes for unet1d.
* reduce blocks for unet_2d.
* reduce block size for unet_motion
* increase channels.
* correctly increase channels.
* reduce number of layers in unet2dconditionmodel tests.
* reduce block sizes for unet2dconditionmodel tests
* reduce block sizes for unet3dconditionmodel.
* fix: test_feed_forward_chunking
* fix: test_forward_with_norm_groups
* skip spatiotemporal tests on MPS.
* reduce block size in AutoencoderKL.
* reduce block sizes for vqmodel.
* further reduce block size.
* make style.
* Empty-Commit
* reduce sizes for ConsistencyDecoderVAETests
* further reduction.
* further block reductions in AutoencoderKL and AsymmetricAutoencoderKL.
* massively reduce the block size in unet2dcontionmodel.
* reduce sizes for unet3d
* fix tests in unet3d.
* reduce blocks further in motion unet.
* fix: output shape
* add attention_head_dim to the test configuration.
* remove unexpected keyword arg
* up a bit.
* groups.
* up again
* fix
* Skip `test_freeu_enabled` on MPS
* Small fixes
- import skip_mps correctly
- disable all instances of test_freeu_enabled
* Empty commit to trigger tests
* Empty commit to trigger CI
* increase number of workers for the tests.
* move to beefier runner.
* improve the fast push tests too.
* use a beefy machine for pytorch pipeline tests
* up the number of workers further.
* UniPC Multistep add `rescale_betas_zero_snr`
Same patch as DPM and Euler with the patched final alpha cumprod
BF16 doesn't seem to break down, I think because UniPC already upcasts during some
phases? We could still force an upcast since it only loses ≈ 0.005 it/s for me,
but the difference in output is very small. A better endeavor might be upcasting
in step() and removing all the other upcasts elsewhere?
* UniPC ZSNR UT
* Re-add `rescale_betas_zsnr` doc oops
* UniPC UTs iterate solvers on FP16
It wasn't catching errs on order==3. Might be excessive?
* UniPC Multistep fix tensor dtype/device on order=3
* UniPC UTs Add v_pred to fp16 test iter
For completeness' sake. Probably overkill?
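A hedged sketch of switching the new option on via the scheduler config (the model id is illustrative):

```python
import torch
from diffusers import DiffusionPipeline, UniPCMultistepScheduler

pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe.scheduler = UniPCMultistepScheduler.from_config(
    pipe.scheduler.config, rescale_betas_zero_snr=True
)
```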
* 7529 do not disable autocast for cuda devices
* Remove typecasting error check for non-mps platforms, as a correct autocast implementation makes it a non-issue
* add autocast fix to other training examples
* disable native_amp for dreambooth (sdxl)
* disable native_amp for pix2pix (sdxl)
* remove tests from remaining files
* disable native_amp on huggingface accelerator for every training example that uses it
* convert more usages of autocast to nullcontext, make style fixes
* make style fixes
* style.
* Empty-Commit
---------
Co-authored-by: bghira <bghira@users.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* start printing the tensors.
* print full throttle
* set static slices for 7 tests.
* remove printing.
* flatten
* disable test for controlnet
* what happens when things are seeded properly?
* set the right value
* style.
* make pia test fail to check things
* print.
* fix pia.
* checking for animatediff.
* fix: animatediff.
* video synthesis
* final piece.
* style.
* print guess.
* fix: assertion for control guess.
---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* Add `final_sigma_zero` to UniPCMultistep
Effectively the same trick as DDIM's `set_alpha_to_one` and
DPM's `final_sigma_type='zero'`.
Currently False by default but maybe this should be True?
* `final_sigma_zero: bool` -> `final_sigmas_type: str`
Should 1:1 match DPM Multistep now.
* Set `final_sigmas_type='sigma_min'` in UniPC UTs
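A sketch of the renamed option, mirroring DPMSolverMultistepScheduler (the `"zero"` and `"sigma_min"` values are taken from the commits above):

```python
from diffusers import UniPCMultistepScheduler

scheduler = UniPCMultistepScheduler.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="scheduler", final_sigmas_type="zero"
)
```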
* Initial commit
* Implemented block lora
- implemented block lora
- updated docs
- added tests
* Finishing up
* Reverted unrelated changes made by make style
* Fixed typo
* Fixed bug + Made text_encoder_2 scalable
* Integrated some review feedback
* Incorporated review feedback
* Fix tests
* Made every module configurable
* Adapt to new lora test structure
* Final cleanup
* Some more final fixes
- Included examples in `using_peft_for_inference.md`
- Added hint that only attns are scaled
- Removed NoneTypes
- Added test to check mismatching lens of adapter names / weights raise error
* Update using_peft_for_inference.md
* Update using_peft_for_inference.md
* Make style, quality, fix-copies
* Updated tutorial;Warning if scale/adapter mismatch
* floats are forwarded as-is; changed tutorial scale
* make style, quality, fix-copies
* Fixed typo in tutorial
* Moved some warnings into `lora_loader_utils.py`
* Moved scale/lora mismatch warnings back
* Integrated final review suggestions
* Empty commit to trigger CI
* Reverted empty commit to trigger CI
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* speed up test_vae_slicing in animatediff
* speed up test_karras_schedulers_shape for attend and excite.
* style.
* get the static slices out.
* specify torch print options.
* modify
* test run with controlnet
* specify kwarg
* fix: things
* not None
* flatten
* controlnet img2img
* complete controlnet sd
* finish more
* finish the final batch
* add cpu check for expected_pipe_slice.
* finish the rest
* remove print
* style
* fix ssd1b controlnet test
* checking ssd1b
* disable the test.
* make the test_ip_adapter_single controlnet test more robust
* fix: simple inpaint
* multi
* disable panorama
* enable again
* panorama is shaky so leave it for now
* remove print
* raise tolerance.
* Bug fix for ControlNetPipeline check_image
Bug fix for ControlNetPipeline check_image when using MultiControlNet and a prompt list
* Update test_inference_multiple_prompt_input function
* Update test_controlnet.py
add test for multiple prompts and multiple image conditioning
* Update test_controlnet.py
Fix format error
---------
Co-authored-by: Lvkesheng Shen <45848260+Fantast416@users.noreply.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* add remove_all_hooks
* a few more fix and tests
* up
* Update src/diffusers/pipelines/pipeline_utils.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* split tests
* add
---------
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* apple mps: training support for SDXL LoRA
* sdxl: support training lora, dreambooth, t2i, pix2pix, and controlnet on apple mps
---------
Co-authored-by: bghira <bghira@users.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* mps: fix XL pipeline inference at training time due to upstream pytorch bug
* Update src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* apply the safe-guarding logic elsewhere.
---------
Co-authored-by: bghira <bghira@users.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
you cannot specify `type="bool"` and `action="store_true"` at the same time.
remove the redundant and buggy `type=bool`.
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
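A tiny illustration of the point: `action="store_true"` already defines a boolean flag, and argparse rejects a `type=` passed alongside it.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--train_text_encoder", action="store_true", default=False)
args = parser.parse_args(["--train_text_encoder"])
print(args.train_text_encoder)  # True; no type= is needed (or allowed) for store_true
```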
* feat: support dora loras from community
* safe-guard dora operations under peft version.
* pop use_dora when False
* make dora lora from kohya work.
* fix: kohya conversion utils.
* add a fast test for DoRA compatibility..
* add a nightly test.
* fixed typo
* updated doc to be consistent in naming
* make style/quality
* preprocessing for 4 channels and not 6
* make style
* test for 4c
* make style/quality
* fixed test on cpu
* fixed doc typo
* changed default ckpt to 4c
* Update pipeline_stable_diffusion_ldm3d.py
* fix bug
---------
Co-authored-by: Aflalo <estellea@isl-iam1.rr.intel.com>
Co-authored-by: Aflalo <estellea@isl-gpu33.rr.intel.com>
Co-authored-by: Aflalo <estellea@isl-gpu38.rr.intel.com>
* Add properties and `IPAdapterTesterMixin` tests for `StableDiffusionPanoramaPipeline`
* Update torch manual seed to use `torch.Generator(device=device)`
* Refactor 📞🔙 to support `callback_on_step_end`
* make fix-copies
* fix freeinit impl
* fix progress bar
* fix progress bar and remove old code
* fix num_inference_steps==1 case for freeinit by running at least 1 step when fast sampling is enabled
* checking to improve pipelines.
* more fixes.
* add: tip to encourage the usage of revision
* Apply suggestions from code review
* retrigger ci
---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Fix ControlNetModel.from_unet do not load add_embedding
* delete white space in blank line
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* debugging
* let's see the numbers
* restrict tolerance.
* increase inference steps.
* shallow copy of cross_attention_kwargs
* remove print
* pop scale from the top-level unet instead of getting it.
* improve readability.
* Apply suggestions from code review
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* fix a little bit.
---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Add properties and `IPAdapterTesterMixin` tests for `StableDiffusionPanoramaPipeline`
* Fix variable name typo and update comments
* Update deprecated `output_type="numpy"` to "np" in test files
* Discard changes to src/diffusers/pipelines/stable_diffusion_panorama/pipeline_stable_diffusion_panorama.py
* Update test_stable_diffusion_panorama.py
* Update numbers in README.md
* Update get_guidance_scale_embedding method to use timesteps instead of w
* Update number of checkpoints in README.md
* Add type hints and fix var name
* Fix PyTorch's convention for inplace functions
* Fix a typo
* Revert "Fix PyTorch's convention for inplace functions"
This reverts commit 74350cf65b.
* Fix typos
* Indent
* Refactor get_guidance_scale_embedding method in LEditsPPPipelineStableDiffusionXL class
* log loss per image
* add commandline param for per image loss logging
* style
* debug-loss -> debug_loss
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Change step_offset scheduler docstrings
* Mention it may be needed by some models
* More docstrings
These ones failed literal S&R because I performed it case-sensitively,
which is fun.
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* add: support for notifying maintainers about the nightly test status
* add: a temporary workflow for validation.
* cancel in progress.
* runs-on
* clean up
* add: peft dep
* change device.
* multiple edits.
* remove temp workflow.
* add: a workflow to check if docker containers can be built if the files are modified.
* type
* unify docker image build test and push
* make it run on prs too.
* check
* check
* check
* check again.
* remove docker test build file.
* remove extra dependencies.
* check
* Initial commit
* Removed copy hints, as in original SDXLControlNetPipeline
Removed copy hints, as in original SDXLControlNetPipeline, as the `make fix-copies` seems to have issues with the @property decorator.
* Reverted changes to ControlNetXS
* Addendum to: Removed changes to ControlNetXS
* Added test+docs for mixture of denoiser
* Update docs/source/en/using-diffusers/controlnet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/using-diffusers/controlnet.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* I added a new docstring to the class. This makes it easier for other developers to understand what it does and where it is used.
* Update src/diffusers/models/unet_2d_blocks.py
This change was suggested by a maintainer.
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update src/diffusers/models/unet_2d_blocks.py
Add suggested text
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update unet_2d_blocks.py
I changed the Parameter to Args text.
* Update unet_2d_blocks.py
proper indentation set in this file.
* Update unet_2d_blocks.py
a little bit of change in the act_fun argument line.
* I ran the black command to reformat the code style
* Update unet_2d_blocks.py
added a docstring similar to the one in the original diffusion repository.
* Fix the bug mentioned in issue #6901
* Update src/diffusers/schedulers/scheduling_ddim_flax.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Fix linter
* Restore empty line
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* copied from for t2i pipelines without ip adapter support.
* two more pipelines with proper copied from comments.
* revert to the original implementation
* throw error when patch inputs and layernorm are provided for transformers2d.
* add comment on supported norm_types in transformers2d
* more check
* fix: norm _type handling
* [bug] Fix float/int guidance scale not working in `StableVideoDiffusionPipeline`
* Add test to disable CFG on SVD
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* support and example launch for sdxl turbo
* White space fixes
* Trailing whitespace character
* ruff format
* fix guidance_scale and steps for turbo mode
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Radames Ajna <radamajna@gmail.com>
* update svd docs
* fix example doc string
* update return type hints/docs
* update type hints
* Fix typos in pipeline_stable_video_diffusion.py
* make style && make fix-copies
* Update src/diffusers/pipelines/stable_video_diffusion/pipeline_stable_video_diffusion.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/diffusers/pipelines/stable_video_diffusion/pipeline_stable_video_diffusion.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update based on suggestion
---------
Co-authored-by: M. Tolga Cangöz <mtcangoz@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Enable FakeTensorMode for EulerDiscreteScheduler scheduler
PyTorch's FakeTensorMode does not support `.numpy()` or `numpy.array()`
calls.
This PR replaces the `sigmas` numpy array with a PyTorch tensor equivalent.
Repro:
```python
with torch._subclasses.FakeTensorMode() as fake_mode, ONNXTorchPatcher():
fake_model = DiffusionPipeline.from_pretrained(model_name, low_cpu_mem_usage=False)
```
that otherwise would fail with
`RuntimeError: .numpy() is not supported for tensor subclasses.`
* Address comments
* add tags for diffusers training
* add dora tags for dreambooth lora scripts
* style
* add is_dora arg
* style
* add dora training feature to sd 1.5 script
* added notes about DoRA training
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* initial
* check_inputs fix to the rest of pipelines
* add fix for no cfg too
* use of variable
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Add copyright notice to relevant files and fix typos
* Set `timestep_spacing` parameter of `StableDiffusionXLPipeline`'s scheduler to `'trailing'`.
* Update `StableDiffusionXLPipeline.from_single_file` by including EulerAncestralDiscreteScheduler with `timestep_spacing="trailing"` param.
* Update model loading method in SDXL Turbo documentation
* move model helper function in pipeline to EfficiencyMixin
---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* DPMMultistep rescale_betas_zero_snr
* DPM upcast samples in step()
* DPM rescale_betas_zero_snr UT
* DPMSolverMulti move sample upcast after model convert
Avoids having to re-use the dtype.
* Add a newline for Ruff
* log_validation unification for controlnet.
* additional fixes.
* remove print.
* better reuse and loading
* make final inference run conditional.
* Update examples/controlnet/README_sdxl.md
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* resize the control image in the snippet.
---------
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Make LoRACompatibleConv padding_mode work.
* Format code style.
* add fast test
* Update src/diffusers/models/lora.py
Simplify the code by patrickvonplaten.
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* code refactor
* apply patrickvonplaten suggestion to simplify the code.
* rm test_lora_layers_old_backend.py and add test case in test_lora_layers_peft.py
* update test case.
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* modulize log validation
* run make style and refactor wanddb support
* remove redundant initialization
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* make checkpoint_merger pipeline pass the "variant" argument to from_pretrained()
* make style
---------
Co-authored-by: Lincoln Stein <lstein@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* add stable_diffusion_xl_ipex community pipeline
* make style for code quality check
* update docs as suggested
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* standardize model card
* fix tags
* correct import styling and update tags
* run make style and make quality
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* feat: allow low_cpu_mem_usage in ip adapter loading
* reduce the number of device placements.
* documentation.
* throw low_cpu_mem_usage warning only once from the main entry point.
* use load_model_into_meta in single file utils
* propagate to autoencoder and controlnet.
* correct class name access behaviour.
* remove torch_dtype from load_model_into_meta; seems unnecessary
* remove incorrect kwarg
* style to avoid extra unnecessary line breaks
* fix: bias loading bug
* fixes for SDXL
* apply changes to the conversion script to match single_file_utils.py
* do transpose to match the single file loading logic.
Remove <cat-toy> validation prompt from textual_inversion_sdxl.py
The `<cat-toy>` validation prompt is a default choice for the example task in the README. But no other part of `textual_inversion_sdxl.py` references the cat toy and `textual_inversion.py` has a default validation prompt of `None` as well.
So bring `textual_inversion_sdxl.py` in line with `textual_inversion.py` and change default validation prompt to `None`
* attention_head_dim
* debug
* print more info
* correct num_attention_heads behaviour
* down_block_num_attention_heads -> num_attention_heads.
* correct the image link in doc.
* add: deprecation for num_attention_head
* fix: test argument to use attention_head_dim
* more fixes.
* quality
* address comments.
* remove deprecation.
* add: support for passing ip adapter image embeddings
* debugging
* make feature_extractor unloading conditioned on safety_checker
* better condition
* type annotation
* index to look into value slices
* more debugging
* debugging
* serialize embeddings dict
* better conditioning
* remove unnecessary prints.
* Update src/diffusers/loaders/ip_adapter.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* make fix-copies and styling.
* styling and further copy fixing.
* fix: check_inputs call in controlnet sdxl img2img pipeline
---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* feat: standarize model card creation for dreambooth training.
* correct 'inference
* remove comments.
* take component out of kwargs
* style
* add: card template to have a leaner description.
* widget support.
* propagate changes to train_dreambooth_lora
* propagate changes to custom diffusion
* make widget properly type-annotated
* fix: callback function name is incorrect
In this tutorial, a function is defined and then used via the `callback_on_step_end` argument, but the names did not match.
* fix: typo in num_timestep (the correct name is num_timesteps)
fixed property name
* remove _to_tensor
* remove _to_tensor definition
* remove _collapse_frames_into_batch
* remove lora for not bloating the code.
* remove sample_size.
* simplify code a bit more
* ensure timesteps are always in tensor.
* Fix `AutoencoderTiny` with `use_slicing`
When using slicing with AutoencoderTiny, the encoder mistakenly encodes the entire batch for every image in the batch.
* Fixed formatting issue
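A hedged sketch of the code path the fix concerns: with slicing enabled, each image in the batch should be encoded on its own.

```python
import torch
from diffusers import AutoencoderTiny

vae = AutoencoderTiny.from_pretrained("madebyollin/taesd", torch_dtype=torch.float32)
vae.enable_slicing()  # encode/decode one image at a time to save memory
latents = vae.encode(torch.randn(4, 3, 512, 512)).latents  # one latent per input image
```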
* add noise_offset param
* micro conditioning - wip
* image processing adjusted and moved to support micro conditioning
* change time ids to be computed inside train loop
* time ids shape fix
* move token replacement of validation prompt to the same section of instance prompt and class prompt
* add offset noise to sd15 advanced script
* fix token loading during validation
* fix token loading during validation in sdxl script
* a little clean
* style
* a little clean
* style
* sdxl script - a little clean + minor path fix
sd 1.5 script - change default resolution value
* ad 1.5 script - minor path fix
* fix missing comma in code example in model card
* clean up commented lines
* style
* remove time ids computed outside training loop - no longer used now that we utilize micro-conditioning, as all time ids are now computed inside the training loop
* style
* [WIP] - added draft readme, building off of examples/dreambooth/README.md
* readme
* removed --crops_coords_top_left from CLI args
* style
* fix missing shape bug due to missing RGB if statement
* add blog mention at the start of the reamde as well
* Update examples/advanced_diffusion_training/README.md
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* change note to render nicely as well
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Fix a bug in ResnetBlock2D.forward when USE_PEFT_BACKEND is off and scale_shift is used for the time embedding, where the LoRA scale gets overwritten.
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* fix minsnr implementation for v-prediction case
* format code
* always compute snr when snr_gamma is specified
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
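A sketch of the corrected weighting, as described in the commits above (treat the helper as illustrative): min-SNR-gamma weights are `min(SNR, gamma) / SNR` for epsilon prediction, and the v-prediction case divides by `SNR + 1` instead.

```python
import torch


def min_snr_loss_weights(snr: torch.Tensor, gamma: float, prediction_type: str = "epsilon") -> torch.Tensor:
    weights = torch.clamp(snr, max=gamma)  # min(SNR, gamma)
    if prediction_type == "v_prediction":
        return weights / (snr + 1)
    return weights / snr
```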
* feat: explicitly tag to diffusers when using push_to_hub
* remove tags.
* reset repo.
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix: tests
* fix: push_to_hub behaviour for tagging from save_pretrained
* Apply suggestions from code review
Co-authored-by: Lucain <lucainp@gmail.com>
* Apply suggestions from code review
Co-authored-by: Lucain <lucainp@gmail.com>
* import fixes.
* add library name to existing model card.
* add: standalone test for generate_model_card
* fix tests for standalone method
* moved library_name to a better place.
* merge create_model_card and generate_model_card.
* fix test
* address lucain's comments
* fix return identation
* Apply suggestions from code review
Co-authored-by: Lucain <lucainp@gmail.com>
* address further comments.
* Update src/diffusers/pipelines/pipeline_utils.py
Co-authored-by: Lucain <lucainp@gmail.com>
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Lucain <lucainp@gmail.com>
* initial commit for unconditional/class-conditional consistency training script
* make style
* Add entry for consistency training script in community README.
* Move consistency training script from community to research_projects/consistency_training
* Add requirements.txt and README to research_projects/consistency_training directory.
* Manually revert community README changes for consistency training.
* Fix path to script after moving script to research projects.
* Add option to load U-Net weights from pretrained model.
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* begin animatediff img2video and video2video
* revert animatediff to original implementation
* add img2video as pipeline
* update
* add vid2vid pipeline
* update imports
* update
* remove copied from line for check_inputs
* update
* update examples
* add multi-batch support
* fix __init__.py files
* move img2vid to community
* update community readme and examples
* fix
* make fix-copies
* add vid2vid batch params
* apply suggestions from review
Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>
* add test for animatediff vid2vid
* torch.stack -> torch.cat
Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>
* make style
* docs for vid2vid
* update
* fix prepare_latents
* fix docs
* remove img2vid
* update README to :main
* remove slow test
* refactor pipeline output
* update docs
* update docs
* merge community readme from :main
* final fix i promise
* add support for url in animatediff example
* update example
* update callbacks to latest implementation
* Update src/diffusers/pipelines/animatediff/pipeline_animatediff_video2video.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/pipelines/animatediff/pipeline_animatediff_video2video.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix merge
* Apply suggestions from code review
* remove callback and callback_steps as suggested in review
* Update tests/pipelines/animatediff/test_animatediff_video2video.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix import error caused due to unet refactor in #6630
* fix numpy import error after tensor2vid refactor in #6626
* make fix-copies
* fix numpy error
* fix progress bar test
---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* sd1.5 support in separate script
A quick adaptation to support people interested in using this method on 1.5 models.
* sd15 prompt text encoding and unet conversions
as per @linoytsaban's recommendations. Testing would be appreciated.
* Readability and quality improvements
Removed some mentions of SDXL, and some arguments that don't apply to sd 1.5, and cleaned up some comments.
* make style/quality commands
* tracker rename and run-it doc
* Update examples/advanced_diffusion_training/train_dreambooth_lora_sd15_advanced.py
* Update examples/advanced_diffusion_training/train_dreambooth_lora_sd15_advanced.py
---------
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
* move unets to module 🦋
* parameterize unet-level import.
* fix flax unet2dcondition model import
* models __init__
* mildly deprecating models.unet_2d_blocks in favor of models.unets.unet_2d_blocks.
* noqa
* correct deprecation behaviour
* inherit from the actual classes.
* Empty-Commit
* backwards compatibility for unet_2d.py
* backward compatibility for unet_2d_condition
* bc for unet_1d
* bc for unet_1d_blocks
* Fixed the bug related to saving DeepSpeed models.
* Add information about training SD models using DeepSpeed to the README.
* Apply suggestions from code review
---------
Co-authored-by: mhh001 <mahonghao1@huawei.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* - extract function for stage in UNet2DConditionModel init & forward
- Add new function get_mid_block() to unet_2d_blocks.py
* add type hint to get_mid_block aligned with get_up_block and get_down_block; rename _set_xxx function
* add type hint and use keyword arguments
* remove `copy from` in versatile diffusion
* add animatediff img2vid
* fix
* Update examples/community/README.md
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix code snippet between ip adapter face id and animatediff img2vid
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* [Fix] Multiple image conditionings in a single batch for `StableDiffusionControlNetPipeline`.
* Refactor `check_inputs` in `StableDiffusionControlNetPipeline` to avoid redundant codes.
* Make the behavior of MultiControlNetModel to be the same to the original ControlNetModel
* Keep the code change minimum for nested list support
* Add fast test `test_inference_nested_image_input`
* Remove redundant check for nested image condition in `check_inputs`
Remove `len(image) == len(prompt)` check out of `check_image()`
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Better `ValueError` message for incompatible nested image list size
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Fix syntax error in `check_inputs`
* Remove warning message for multi-ControlNets with multiple prompts
* Fix a typo in test_controlnet.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Add test case for multiple prompts, single image conditioning in `StableDiffusionMultiControlNetPipelineFastTests`
* Improved `ValueError` message for nested `controlnet_conditioning_scale`
* Documenting the behavior of image list as `StableDiffusionControlNetPipeline` input
---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Fixes #6418 Advanced Dreambooth LoRA Training
* change order of import to fix nit
* fix nit, use cast_training_params
* remove torch.compile fix, will move to a new PR
* remove unnecessary import
* Enable image resizing to adjust its height and width in StableDiffusionXLInstructPix2PixPipeline
* Ensure that validation is performed at every 'validation_step', not at every step
* fix: training resume from fp16.
* add: comment
* remove residue from another branch.
* remove more residues.
* thanks to Younes; no hacks.
* style.
* clean things a bit and modularize _set_state_dict_into_text_encoder
* add comment about the fix detailed.
* support compile
* make style
* move unwrap_model inside function
* change unwrap call
* run make style
* Update examples/dreambooth/train_dreambooth.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Revert "Update examples/dreambooth/train_dreambooth.py"
This reverts commit 70ab09732e.
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Remove conversion to RGB
* Add a Conversion Function
* Add type hint for convert_method
* Update src/diffusers/utils/loading_utils.py
Update docstring
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update docstring
* Optimize imports
* Optimize imports (2)
* Reformat code
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* base template file - train_instruct_pix2pix.py
* additional import and parser argument required for lora
* finetune only instructpix2pix model -- no need to include these layers
* inject lora layers
* freeze unet model -- only lora layers are trained
* training modifications to train only lora parameters
* store only lora parameters
* move train script to research project
* run quality and style code checks
* move train script to a new folder
* add README
* update README
* update references in README
---------
Co-authored-by: Rahul Raman <rahulraman@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* enable stable-xl textual inversion
* check if optimizer_2 exists
* check text_encoder_2 before using
* add textual inversion for sdxl in a single file
* fix style
* fix example style
* reset for error changes
* add readme for sdxl
* fix style
* disable autocast as it will cause cast error when weight_dtype=bf16
* fix spelling error
* fix style and readme and 8bit optimizer
* add README_sdxl.md link
* add tracker key on log_validation
* run style
* rm the second center crop
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* add tutorials to toctree.yml
* fix title
* fix words
* add overview ja
* fix diffusion to 拡散
* fix line 21
* add space
* delete supported pipeline
* fix tutorial_overview.md
* fix space
* fix typo
* Delete docs/source/ja/tutorials/using_peft_for_inference.md
this file is not translated
* Delete docs/source/ja/tutorials/basic_training.md
this file is not translated
* Delete docs/source/ja/tutorials/autopipeline.md
this file is not translated
* fix toctree
* add: experimental script for diffusion dpo training.
* random_crop cli.
* fix: caption tokenization.
* fix: pixel_values index.
* fix: grad?
* debug
* fix: reduction.
* fixes in the loss calculation.
* style
* fix: unwrap call.
* fix: validation inference.
* add: initial sdxl script
* debug
* make sure images in the tuple are of same res
* fix model_max_length
* report print
* boom
* fix: numerical issues.
* fix: resolution
* comment about resize.
* change the order of the training transformation.
* save call.
* debug
* remove print
* manually detaching necessary?
* use the same vae for validation.
* add: readme.
* unwrap text encoder when saving hook only for full text encoder tuning
* unwrap text encoder when saving hook only for full text encoder tuning
* save embeddings in each checkpoint as well
* save embeddings in each checkpoint as well
* save embeddings in each checkpoint as well
* Update examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* add documentation for DeepCache
* fix typo
* add wandb url for DeepCache
* fix some typos
* add item in _toctree.yml
* update formats for arguments
* Update deepcache.md
* Update docs/source/en/optimization/deepcache.md
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* add StableDiffusionXLPipeline in doc
* Separate SDPipeline and SDXLPipeline
* Add the paper link of ablation experiments for hyper-parameters
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Make WDS pipeline interpolation type configurable.
* Make the VAE encoding batch size configurable.
* Make lora_alpha and lora_dropout configurable for LCM LoRA scripts.
* Generalize scalings_for_boundary_conditions function and make the timestep scaling configurable.
* Make LoRA target modules configurable for LCM-LoRA scripts.
* Move resolve_interpolation_mode to src/diffusers/training_utils.py and make interpolation type configurable in non-WDS script.
* apply suggestions from review
* debug
* debug test_with_different_scales_fusion_equivalence
* use the right method.
* place it right.
* let's see.
* let's see again
* alright then.
* add a comment.
* I added a new docstring to the class. This makes it easier for other developers to understand what it does and where it is used.
* Update src/diffusers/models/unet_2d_blocks.py
This change was suggested by the maintainer.
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update src/diffusers/models/unet_2d_blocks.py
Add suggested text
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update unet_2d_blocks.py
I changed the Parameters heading to Args.
* Update unet_2d_blocks.py
Set proper indentation in this file.
* Update unet_2d_blocks.py
A small change to the act_fun argument line.
* I ran the black command to reformat the code style
* Update unet_2d_blocks.py
Added a docstring similar to the one in the original diffusion repository.
* Better way to write the binarize function
* Solve check_code_quality error
* My mistake: I opened the pull request without reformatting the file
* Update image_processor.py
* remove extra variable and space
* Update image_processor.py
* Ran the ruff library to reformat my file
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* add: test to check if peft loras are loadable in non-peft envs.
* add torch_device appropriately.
* fix: get_dummy_inputs().
* test logits.
* rename
* debug
* debug
* fix: generator
* new assertion values after fixing the seed.
* shape
* remove print statements and settle this.
* to update values.
* change values when lora config is initialized under a fixed seed.
* update colab link
* update notebook link
* sanity restored by getting the exact same values without peft.
* change timesteps used to calculate snr when --with_prior_preservation is enabled
* change timesteps used to calculate snr when --with_prior_preservation is enabled (canonical script)
* style
* revert canonical script to before snr gamma change
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Add unload_ip_adapter method
* Update attn_processors with original layers
* Add test
* Use set_default_attn_processor
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
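A minimal usage sketch of the new `unload_ip_adapter` method (the checkpoint and adapter IDs are illustrative):
```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
# ... run image-prompted generation ...
pipe.unload_ip_adapter()  # drops the image encoder and restores the default attention processors
```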
* Fix gradient-checkpointing option is ignored in SDXL+LoRA training. (#6388)
* Fix gradient-checkpointing option is ignored in SD+LoRA training.
* Fix gradient checkpoint is not applied to text encoders. (SDXL+LoRA)
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* add doc for diffusion fast
* add entry to _toctree
* Apply suggestions from code review
* fix title
* fix: title entry
* add note about fuse_qkv_projections
* add adapter_name in fuse
* add test
* up
* fix CI
* adapt from suggestion
* Update src/diffusers/utils/testing_utils.py
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
* change to `require_peft_version_greater`
* change variable names in test
* Update src/diffusers/loaders/lora.py
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
* break into 2 lines
* final comments
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
* [Peft] fix saving / loading when unet is not "unet"
* Update src/diffusers/loaders/lora.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* undo stablediffusion-xl changes
* use unet_name to get unet for lora helpers
* use unet_name
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* remove validation args from textual inversion tests
* reduce number of train steps in textual inversion tests
* fix: directories.
* debug
* fix: directories.
* remove validation tests from textual inversion
* try reducing the time of test_text_to_image_checkpointing_use_ema
* fix: directories
* speed up test_text_to_image_checkpointing
* speed up test_text_to_image_checkpointing_checkpoints_total_limit_removes_multiple_checkpoints
* fix
* speed up test_instruct_pix2pix_checkpointing_checkpoints_total_limit_removes_multiple_checkpoints
* set checkpoints_total_limit to 2.
* test_text_to_image_lora_checkpointing_checkpoints_total_limit_removes_multiple_checkpoints speed up
* speed up test_unconditional_checkpointing_checkpoints_total_limit_removes_multiple_checkpoints
* debug
* fix: directories.
* speed up test_instruct_pix2pix_checkpointing_checkpoints_total_limit
* speed up: test_controlnet_checkpointing_checkpoints_total_limit_removes_multiple_checkpoints
* speed up test_controlnet_sdxl
* speed up dreambooth tests
* speed up test_dreambooth_lora_checkpointing_checkpoints_total_limit_removes_multiple_checkpoints
* speed up test_custom_diffusion_checkpointing_checkpoints_total_limit_removes_multiple_checkpoints
* speed up test_text_to_image_lora_sdxl_text_encoder_checkpointing_checkpoints_total_limit
* speed up # checkpoint-2 should have been deleted
* speed up examples/text_to_image/test_text_to_image.py::TextToImage::test_text_to_image_checkpointing_checkpoints_total_limit
* additional speed ups
* style
* fix RuntimeError: Input type (float) and bias type (c10::Half) should be the same
* format source code
* format code
* remove the autocast blocks within the pipeline
* add autocast blocks to pipeline caller in train_text_to_image_lora.py
* [Community Pipeline] Add Marigold Monocular Depth Estimation
- add single-file pipeline
- update README
* fix format - add one blank line
* format script with ruff
* use direct image link in example code
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* separate out upsamplers and downsamplers.
* import all the necessary blocks in resnet for backward comp.
* move upsample2d and downsample2d to utils.
* move downsample_2d to downsamplers.py
* apply feedback
* fix import
* samplers -> sampling
* EulerAncestral add `rescale_betas_zero_snr`
Uses same infinite sigma fix from EulerDiscrete. Interestingly the
ancestral version had the opposite problem: too much contrast instead of
too little.
* UT for EulerAncestral `rescale_betas_zero_snr`
* EulerAncestral upcast samples during step()
It helps this scheduler too, particularly when the model is using bf16.
While the noise dtype is still the model's, it's automatically upcast
for the add, so all it affects is determinism.
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
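A minimal sketch of opting into the new flag on a loaded pipeline:
```python
from diffusers import EulerAncestralDiscreteScheduler

# assuming `pipe` is an already-loaded Stable Diffusion pipeline
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(
    pipe.scheduler.config, rescale_betas_zero_snr=True
)
```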
* fix: unscale fp16 gradient problem
* fix for dreambooth lora sdxl
* make the type-casting conditional.
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix: init for vae during pixart tests
* print the values
* add flatten
* correct assertion value for test_inference
* correct assertion values for test_inference_non_square_images
* run styling
* debug test_inference_with_multiple_images_per_prompt
* fix assertion values for test_inference_with_multiple_images_per_prompt
Typo: The script for LoRA training is `train_text_to_image_lora_prior.py` not `train_text_to_image_prior_lora.py`.
Alternatively you could rename the file and keep the README.md unchanged.
* feat: introduce autoencoders module
* more changes for styling and copy fixing
* path changes in the docs.
* fix: import structure in init.
* fix controlnetxs import
* Clean up comments in LCM(-LoRA) distillation scripts.
* Calculate predicted source noise noise_pred correctly for all prediction_types.
* make style
* apply suggestions from review
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* load pipeline for inference only if validation prompt is used
* move things outside
* load pipeline for inference only if validation prompt is used
* fix readme when validation prompt is used
---------
Co-authored-by: linoytsaban <linoy@huggingface.co>
Co-authored-by: apolinário <joaopaulo.passos@gmail.com>
* fix broken example in pipeline_stable_diffusion_safe
* fix typo in pipeline_stable_diffusion_pix2pix_zero
* add missing docs
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_upscale.py
* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_attend_and_excite.py
* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_instruct_pix2pix.py
* update tests
* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_panorama.py
* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_sag.py
* support ip-adapter in src/diffusers/pipelines/stable_diffusion_safe/pipeline_stable_diffusion_safe.py
* support ip-adapter in src/diffusers/pipelines/latent_consistency_models/pipeline_latent_consistency_text2img.py
* support ip-adapter in src/diffusers/pipelines/latent_consistency_models/pipeline_latent_consistency_img2img.py
* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_ldm3d.py
* revert changes to sd_attend_and_excite and sd_upscale
* make style
* fix broken tests
* update ip-adapter implementation to latest
* apply suggestions from review
---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
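A hedged sketch of using IP-Adapter with one of the newly supported pipelines (the checkpoint, adapter weights, and image URL are illustrative placeholders):
```python
import torch
from diffusers import StableDiffusionPanoramaPipeline
from diffusers.utils import load_image

pipe = StableDiffusionPanoramaPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")

ip_image = load_image("https://example.com/reference.png")  # placeholder URL
panorama = pipe(
    "a photo of a mountain range at sunset",
    ip_adapter_image=ip_image,
    num_inference_steps=50,
).images[0]
```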
* Fix SD scripts - there are only 2 items per batch
* Adjustments to make the SDXL scripts work with other datasets
* Use public webdataset dataset for examples
* make style
* Minor tweaks to the readmes.
* Stress that the database is illustrative.
* utils and test modifications to enable device agnostic testing
* device for manual seed in unet1d
* fix generator condition in vae test
* consistency changes to testing
* make style
* add device agnostic testing changes to source and one model test
* make dtype check fns private, log cuda fp16 case
* remove dtype checks from import utils, move to testing_utils
* adding tests for most model classes and one pipeline
* fix vae import
* Update train_dreambooth_lora_sdxl_advanced.py
* remove global function args from dreamboothdataset class
* style
* style
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* improve help tags
* style fix
* changes token_abstraction type to string.
support multiple concepts for pivotal using a comma separated string.
* style fixup
* changed logger to warning (not yet available)
* moved the token_abstraction parsing to be in the same block as where we create the mapping of identifier to token
---------
Co-authored-by: Linoy <linoy@huggingface.co>
* Update value_guided_sampling.py
Changed the scheduler step function as predict_epsilon parameter is not there in latest DDPM Scheduler
* Update value_guided_sampling.md
Updated a link to a working notebook
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* fix: duplicate unet prefix problem.
* Update src/diffusers/loaders/lora.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* adapt PixArtAlphaPipeline for pixart-lcm model
* remove original_inference_steps from __call__
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* LLMGroundedDiffusionPipeline: inherit from DiffusionPipeline and fix peft
* Use main in the revision in the examples
* Add "Copied from" statements in comments
* Fix formatting with ruff
* imports and readme bug fixes
* bug fix - ensures text_encoder params are dtype==float32 (when using pivotal tuning) even if the rest of the model is loaded in fp16
* added pivotal tuning to readme
* mapping token identifier to new inserted token in validation prompt (if used)
* correct default value of --train_text_encoder_frac
* change default value of --adam_weight_decay_text_encoder
* validation prompt generations when using pivotal tuning bug fix
* style fix
* textual inversion embeddings name change
* style fix
* bug fix - stopping text encoder optimization halfway
* readme - will include token abstraction and new inserted tokens when using pivotal tuning
- added type to --num_new_tokens_per_abstraction
* style fix
---------
Co-authored-by: Linoy Tsaban <linoy@huggingface.co>
* make `requires_safety_checker` a kwarg instead of a positional argument as it's more future-proof
* apply `make style` formatting edits
* add image_encoder to arguments and pass to super constructor
* add diffusers example
* add diffusers example
* Comment about making it faster
* Apply suggestions from code review
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
---------
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Fixed custom module importing on Windows
Windows uses backslashes and `os.path.join()` follows that convention.
* Apply suggestions from code review
Co-authored-by: Lucain <lucainp@gmail.com>
* Update pipeline_utils.py
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Lucain <lucainp@gmail.com>
* integrated sdxl for the text2video-zero pipeline
* make fix-copies
* fixed CI issues
* make fix-copies
* added docs and `copied from` statements
* added fast tests
* made a small change in docs
* quality+style check fix
* updated docs. added controlnet inference with sdxl
* added device compatibility for fast tests
* fixed docstrings
* changing vae upcasting
* remove torch.empty_cache to speed up inference
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* made fast tests to run on dummy models only, fixed copied from statements
* fixed testing utils imports
* Added bullet points for SDXL support
* fixed formatting & quality
* Update tests/pipelines/text_to_video/test_text_to_video_zero_sdxl.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update tests/pipelines/text_to_video/test_text_to_video_zero_sdxl.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fixed minor error for merging
* fixed updates of sdxl
* made fast tests inherit from `PipelineTesterMixin` and run in 3-4secs on CPU
* make style && make quality
* reimplemented fast tests w/o default attn processor
* make style & make quality
* make fix-copies
* make fix-copies
* fixed docs
* make style & make quality & make fix-copies
* bug fix in cross attention
* make style && make quality
* make fix-copies
* fix gpu issues
* make fix-copies
* updated pipeline signature
---------
Co-authored-by: Vahram <vahram.tadevosyan@lambda-loginnode02.cm.cluster>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* Add SSD-1B support for controlnet model
* Add conditioning_channels into ControlNet init from unet
* Fix black formatting
* Isort fixes
* Adds SSD-1B controlnet pipeline test with UNetMidBlock2D as mid block
* Overrides failing ssd-1b tests
* Fixes tests after main branch update
* Fixes code quality checks
---------
Co-authored-by: Marko Kostiv <marko@linearity.io>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Change pipeline_controlnet_inpaint.py to add ip-adapter support. Changes are similar to those in pipeline_controlnet
* Change tests for the StableDiffusionControlNetInpaintPipeline by adding image_encoder: None
* Update src/diffusers/pipelines/controlnet/pipeline_controlnet_inpaint.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* move several state dict conversion utils out of lora.py
* check
* check
* check
* check
* check
* check
* check
* revert back
* check
* check
* again check
* maybe fix?
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* bug in MultiAdapter for Inpainting
* adapter_input is a list for MultiAdapter
---------
Co-authored-by: andres <andres@hax.ai>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* [Tests] Make sure that we don't run tests multiple times
* [Tests] Make sure that we don't run tests multiple times
* [Tests] Make sure that we don't run tests multiple times
* add comments to explain the code better
* add comments to explain the code better
* add comments to explain the code better
* add comments to explain the code better
* add comments to explain the code better
* fix more
* fix more
* fix more
* fix more
* fix more
* fix more
* I added a new docstring to the class. This makes it easier for other developers to understand what it does and where it is used.
* Update src/diffusers/models/unet_2d_blocks.py
This change was suggested by the maintainer.
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update src/diffusers/models/unet_2d_blocks.py
Add suggested text
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update unet_2d_blocks.py
I changed the Parameters heading to Args.
* Update unet_2d_blocks.py
Set proper indentation in this file.
* Update unet_2d_blocks.py
A small change to the act_fun argument line.
* I ran the black command to reformat the code style
* Update unet_2d_blocks.py
Added a docstring similar to the one in the original diffusion repository.
* I enhanced the code by replacing multiple redundant variables with a single variable, as they all served the same purpose. Additionally, I utilized the get_activation function for improved flexibility in choosing activation functions.
* Used the black package to reformat my file
* reverted some changes
* Remove the conv_out_padding variables and use conv_in_padding instead
* Created conv_out_padding and added it back into the code.
* ran the black command to solve styling problems
* added a little space between the comment and the import statement
* I am utilizing the ruff library to address the style issues in my Makefile.
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add custom timesteps support to LCMScheduler.
* Add custom timesteps support to StableDiffusionPipeline.
* Add custom timesteps support to StableDiffusionXLPipeline.
* Add custom timesteps support to remaining Stable Diffusion pipelines which support LCMScheduler (img2img, inpaint).
* Add custom timesteps support to remaining Stable Diffusion XL pipelines which support LCMScheduler (img2img, inpaint).
* Add custom timesteps support to StableDiffusionControlNetPipeline.
* Add custom timesteps support to T2I Stable Diffusion (XL) Adapters.
* Clean up Stable Diffusion inpaint tests.
* Manually add support for custom timesteps to AltDiffusion pipelines since make fix-copies doesn't appear to work correctly (it deletes the whole pipeline).
* make style
* Refactor pipeline timestep handling into the retrieve_timesteps function.
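A hedged sketch of the custom-timesteps call path added here (checkpoint and timestep values are illustrative):
```python
import torch
from diffusers import StableDiffusionPipeline, LCMScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# pass an explicit timestep schedule instead of num_inference_steps
image = pipe(
    "an astronaut riding a horse",
    timesteps=[999, 759, 499, 259],
    guidance_scale=1.0,
).images[0]
```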
* deprecated: KarrasVeScheduler, ScoreSdeVpScheduler
* delete tests relevant to deprecated schedulers
* chore: run make style
* fix: import error caused due to incorrect _import_structure after deprecation
* fix: ScoreSdeVpScheduler was not importable from diffusers
* remove import added by assumption
* Update src/diffusers/schedulers/__init__.py as suggested by @patrickvonplaten
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* make it a part deprecated
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Fix
* fix
* fix doc
* fix doc....again.......
* remove karras_ve test folder
Co-Authored-By: YiYi Xu <yixu310@gmail.com>
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail,com>
* [Fix: pixart-alpha]
add ASPECT_RATIO_512_BIN in use_resolution_binning for random 512px image generation.
* add slow test file for 512px generation without resolution binning
* fix: slow tests for resolution binning.
---------
Co-authored-by: jschen <chenjunsong4@h-partners.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* finalize
* finalize
* finalize
* add slow test
* add slow test
* add slow test
* Fix more
* add slow test
* fix more
* fix more
* fix more
* fix more
* fix more
* fix more
* fix more
* fix more
* fix more
* Better
* Fix more
* Fix more
* add slow test
* Add auto pipelines
* add slow test
* Add all
* add slow test
* add slow test
* add slow test
* add slow test
* add slow test
* Apply suggestions from code review
* add slow test
* add slow test
* Additions:
- support for different lr for text encoder
- support for Prodigy optimizer
- support for min snr gamma
- support for custom captions and dataset loading from the hub
* adjusted --caption_column behaviour (to -not- use the second column of the dataset by default if --caption_column is not provided)
* fixed --output_dir / --model_dir_name confusion
* added --repeats, --adam_weight_decay_text_encoder
+ some fixes
* Update examples/dreambooth/train_dreambooth_lora_sdxl.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update examples/dreambooth/train_dreambooth_lora_sdxl.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update examples/dreambooth/train_dreambooth_lora_sdxl.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* - import compute_snr from diffusers/training_utils.py
- cluster adamw together
- when using 'prodigy', if --train_text_encoder == True and --text_encoder_lr != --learning_rate, changes the lr of the text encoders' optimization params to be --learning_rate (otherwise errors)
* shape fixes when custom captions are used
* formatting and a little cleanup
* code styling
* --repeats default value fixed, changed to 1
* bug fix - removed redundant lines of embedding concatenation when using prior_preservation (that duplicated class_prompt embeddings)
* changed dataset loading logic according to the following use cases (to avoid an unnecessary dependency on datasets):
1. user provides --dataset_name
2. user provides local dir --instance_data_dir that contains a metadata .jsonl file
3. user provides local dir --instance_data_dir that contains only images
in cases [1,2] we import datasets and use load_dataset method, in case [3] we process the data same as in the original script setting
* styling fix
* arg name fix
* adjusted the --repeats logic
- removed redundant arg and 'if' when loading local folder with prompts
- updated readme template
- some default val fixes
- custom caption tests
* image path fix for readme
* code style
* bug fix
* --caption_column arg
* readme fix
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Linoy Tsaban <linoy@huggingface.co>
* Change LCMScheduler.set_timesteps to pick more evenly spaced inference timesteps.
* Change inference_indices implementation to better match previous behavior.
* Add num_inference_steps=26 test case to test_inference_steps.
* run CI
---------
Co-authored-by: patil-suraj <surajp815@gmail.com>
* fix an issue where ipex occupies too much memory; it will not impact performance
* make style
---------
Co-authored-by: root <jun.chen@intel.com>
Co-authored-by: Meng Guoqing <guoqing.meng@intel.com>
An upcoming change to JAX will include non-local (addressable) CPU devices in jax.devices() when JAX is used multicontroller-style, where there are multiple Python processes.
This change preserves the current behavior by replacing uses of jax.devices("cpu"), which previously only returned local devices, with jax.local_devices("cpu"), which will return local devices both now and in the future.
This change is always safe (i.e., it should always preserve the previous behavior), but it may sometimes be unnecessary if code is never used in a multicontroller setting.
Co-authored-by: Peter Hawkins <phawkins@google.com>
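A small sketch of the substitution described above (the variable name is illustrative):
```python
import jax

# before: implicitly returned only local CPU devices
cpu_devices = jax.devices("cpu")

# after: locality is explicit, so multicontroller JAX keeps the same behavior
cpu_devices = jax.local_devices(backend="cpu")
```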
* fix: UnboundLocalError with image_latents
* chore: run make style, quality, fix-copies
* revert changes from make fix-copies
* revert changes from make fix-copies
---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* add also peft latest on peft CI
* up
* up
* up
* Update .github/workflows/pr_test_peft_backend.yml
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* begin doc
* fix examples
* add in toctree
* fix toctree
* improve copy
* improve introductions
* add lcm doc
* fix filename
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* address Sayak's comments
* remove controlnet aux
* open in colab
* move to Specific pipeline examples
* update controlent and adapter examples
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* improvement: docs and type hints
* improvement: docs and type hints
minor refactor
* improvement: docs and type hints
* update with suggestions from review
Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>
---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* Fix typos, update, add Copyright info, and trim trailing whitespace
* Update docs/source/en/api/pipelines/text_to_video_zero.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* 1 second is not a long video, but 6 seconds is
* Update text_to_video_zero.md
* Update text_to_video_zero.md
* Update text_to_video_zero.md
* Update wuerstchen.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* does this fix things?
* attention mask use
* attention mask order
* better masking.
* add: test
* remove mask_feature
* test
* debug
* fix: tests
* deprecate mask_feature
* add deprecation test
* add slow test
* add print statements to retrieve the assertion values.
* fix for the 1024 fast test
* fix test
* fix the remaining
* Apply suggestions from code review
* more debug
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Fix the pipeline name in the examples for LMD+ pipeline
* Add LMD+ colab link
* Apply code formatting
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update the reference for text_to_video.md
The original reference (VideoFusion) might be misleading. VideoFusion is not open-sourced. I am the co-first author of ModelScopeT2V. I changed the referenced paper to the correct one.
* Update docs/source/en/api/pipelines/text_to_video.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* [Docs] Running the pipeline twice does not appear to be the intention of these examples
One is with `cross_attention_kwargs` and the other (next line) removes it
* [Docs] Clarify that these are two separate examples
One using `scale` and the other without it
* add: lcm docs.
* correct path
* Apply suggestions from code review
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Apply suggestions from code review
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* up
* add
---------
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* consistency decoder
* rename
* Apply suggestions from code review
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update src/diffusers/pipelines/consistency_models/pipeline_consistency_models.py
* uP
* Apply suggestions from code review
* uP
* uP
* uP
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
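A minimal sketch of swapping in the consistency decoder as the pipeline VAE (checkpoint IDs are illustrative):
```python
import torch
from diffusers import StableDiffusionPipeline, ConsistencyDecoderVAE

vae = ConsistencyDecoderVAE.from_pretrained("openai/consistency-decoder", torch_dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", vae=vae, torch_dtype=torch.float16
).to("cuda")
image = pipe("a photo of a forest at dawn").images[0]
```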
* Add adapter fusing + PEFT to the docs
* Update docs/source/en/tutorials/using_peft_for_inference.md
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update docs/source/en/tutorials/using_peft_for_inference.md
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update docs/source/en/tutorials/using_peft_for_inference.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tutorials/using_peft_for_inference.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/tutorials/using_peft_for_inference.md
* Update docs/source/en/tutorials/using_peft_for_inference.md
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
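A hedged sketch of the fuse/unfuse workflow covered in these docs (the base model and LoRA repo are illustrative):
```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors")
pipe.fuse_lora()    # fold the LoRA weights into the base model for faster inference
image = pipe("pixel art, a cozy cabin in the woods").images[0]
pipe.unfuse_lora()  # restore the original weights
```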
* I added a new docstring to the class. This makes it easier for other developers to understand what it does and where it is used.
* Update src/diffusers/models/unet_2d_blocks.py
This change was suggested by the maintainer.
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update src/diffusers/models/unet_2d_blocks.py
Add suggested text
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update unet_2d_blocks.py
I changed the Parameters heading to Args.
* Update unet_2d_blocks.py
Set proper indentation in this file.
* Update unet_2d_blocks.py
A small change to the act_fun argument line.
* I ran the black command to reformat the code style
* Update unet_2d_blocks.py
Added a docstring similar to the one in the original diffusion repository.
* I removed the dummy variable defined in both the encoder and decoder.
* Now I ran the black package to reformat my file
* Remove the redundant line from the adapter.py file.
* Used the black package to reformat my file
* Replacing the nn.Mish activation function with a get_activation function allows developers to more easily choose the right activation function for their task. Additionally, removing redundant variables can improve code readability and maintainability.
* I tried to fix this: Fast tests for PRs / Fast PyTorch Models & Schedulers CPU tests (pull_request)
* Update src/diffusers/models/resnet.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* Refactor LCMScheduler.step such that prev_sample == denoised at the last timestep in the schedule.
* Make timestep scaling when calculating boundary conditions configurable.
* Reparameterize timestep_scaling to be a multiplicative rather than division scaling.
* make style
* fix dtype conversion
* make style
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* I added a new docstring to the class. This makes it easier for other developers to understand what it does and where it is used.
* Update src/diffusers/models/unet_2d_blocks.py
This change was suggested by the maintainer.
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update src/diffusers/models/unet_2d_blocks.py
Add suggested text
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update unet_2d_blocks.py
I changed the Parameters heading to Args.
* Update unet_2d_blocks.py
Set proper indentation in this file.
* Update unet_2d_blocks.py
A small change to the act_fun argument line.
* I ran the black command to reformat the code style
* Update unet_2d_blocks.py
Added a docstring similar to the one in the original diffusion repository.
* I removed the dummy variable defined in both the encoder and decoder.
* Now I ran the black package to reformat my file
* Remove the redundant line from the adapter.py file.
* Used the black package to reformat my file
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* I added a new docstring to the class. This makes it easier for other developers to understand what it does and where it is used.
* Update src/diffusers/models/unet_2d_blocks.py
This change was suggested by the maintainer.
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update src/diffusers/models/unet_2d_blocks.py
Add suggested text
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update unet_2d_blocks.py
I changed the Parameters heading to Args.
* Update unet_2d_blocks.py
Set proper indentation in this file.
* Update unet_2d_blocks.py
A small change to the act_fun argument line.
* I ran the black command to reformat the code style
* Update unet_2d_blocks.py
Added a docstring similar to the one in the original diffusion repository.
* I removed the dummy variable defined in both the encoder and decoder.
* Now I ran the black package to reformat my file
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* Update final model offload for more pipelines
Add test to ensure all pipeline components are returned to CPU after
execution with model offloading
* Add comment to explain early UNet offload in Text-to-Video pipeline
* Style
* stabilize dpmpp for sdxl by using euler at the final step
* add lu's uniform logsnr time steps
* add test
* fix check_copies
* fix tests
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
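A hedged sketch of enabling the two scheduler options described above on an already-loaded SDXL pipeline:
```python
from diffusers import DPMSolverMultistepScheduler

# assuming `pipe` is an SDXL pipeline
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config,
    euler_at_final=True,   # take a first-order (Euler) step at the final timestep
    use_lu_lambdas=True,   # Lu et al.'s uniform-logSNR timestep spacing
)
```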
* fix 'find_unused_parameters' error reported when running on multiple GPUs or NPUs
* fix code check of importing module by its alphabetic order
---------
Co-authored-by: jiaqiw <wangjiaqi50@huawei.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* I added a new docstring to the class. This makes it easier for other developers to understand what it does and where it is used.
* Update src/diffusers/models/unet_2d_blocks.py
This change was suggested by the maintainer.
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update src/diffusers/models/unet_2d_blocks.py
Add suggested text
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update unet_2d_blocks.py
I changed the Parameters heading to Args.
* Update unet_2d_blocks.py
Set proper indentation in this file.
* Update unet_2d_blocks.py
A small change to the act_fun argument line.
* I ran the black command to reformat the code style
* Update unet_2d_blocks.py
Added a docstring similar to the one in the original diffusion repository.
* I used the lower() method on the activation function name.
* Replace multiple if-else statements with a dictionary of activation functions, and call one if statement to retrieve the appropriate function.
* I used the black package to reformat my file
* I defined the ACTIVATION_FUNCTIONS variable outside of the function
* converted the activation function variable to lower case
* First, I resolved the conflict issue. Then, I ran the Black package to reformat my file.
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* improvement: add typehints and docs to src/diffusers/models/attention_processor.py
* improvement: add typehints and docs to src/diffusers/models/vae.py
* improvement: add missing docs in src/diffusers/models/vq_model.py
* improvement: add typehints and docs to src/diffusers/models/transformer_temporal.py
* improvement: add typehints and docs to src/diffusers/models/t5_film_transformer.py
* improvement: add type hints to src/diffusers/models/unet_1d_blocks.py
* improvement: add missing type hints to src/diffusers/models/unet_2d_blocks.py
* fix: CI error (make fix-copies required)
* fix: CI error (make fix-copies required again)
---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* Add a new community pipeline
examples/community/latent_consistency_img2img.py
which can be called like this
import torch
from PIL import Image
from diffusers import DiffusionPipeline
# the img2img class is defined in examples/community/latent_consistency_img2img.py
from latent_consistency_img2img import LatentConsistencyModelPipeline_img2img
pipe = DiffusionPipeline.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7", custom_pipeline="latent_consistency_txt2img", custom_revision="main")
# To save GPU memory, torch.float16 can be used, but it may compromise image quality.
pipe.to(torch_device="cuda", torch_dtype=torch.float32)
img2img = LatentConsistencyModelPipeline_img2img(
    vae=pipe.vae,
    text_encoder=pipe.text_encoder,
    tokenizer=pipe.tokenizer,
    unet=pipe.unet,
    # scheduler=pipe.scheduler,
    scheduler=None,
    safety_checker=None,
    feature_extractor=pipe.feature_extractor,
    requires_safety_checker=False,
)
prompt = "a prompt describing the edited image"
strength = 0.5  # how strongly to transform the input image
img = Image.open("thisismyimage.png")
result = img2img(prompt, img, strength, num_inference_steps=4)
* Apply suggestions from code review
Fix name formatting for scheduler
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* update readme (and run formatter on latent_consistency_img2img.py)
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix
* fix copies
* remove heun from tests
* add back heun and fix the tests to include 2nd order
* fix the other test too
* Apply suggestions from code review
* Apply suggestions from code review
* Apply suggestions from code review
* make style
* add more comments
---------
Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* initial commit for LatentConsistencyModelPipeline and LCMScheduler based on the community pipeline
* Add callback and freeu support.
* apply suggestions from review
* Clean up LCMScheduler
* Remove timeindex argument to LCMScheduler.step.
* Add support for clipping or thresholding the predicted original sample.
* Remove unused methods and arguments in LCMScheduler.
* Improve comment about (lack of) negative prompt support.
* Change input guidance_scale to match the StableDiffusionPipeline (Imagen) CFG formulation.
* Move lcm_origin_steps from pipeline __call__ to LCMScheduler.__init__/config (as origin_steps).
* Fix typo when clipping/thresholding in LCMScheduler.
* Add some initial LCMScheduler tests.
* add type annotations from review
* Fix type annotation bug.
* Override test_add_noise_device in LCMSchedulerTest since hardcoded timesteps doesn't work under default settings.
* Add generator argument pipeline prepare_latents call.
* Cast LCMScheduler.timesteps to long in set_timesteps.
* Add onestep and multistep full loop scheduler tests.
* Set default height/width to None and don't hardcode guidance scale embedding dim.
* Add initial LatentConsistencyPipeline fast and slow tests.
* Add initial documentation for LatentConsistencyModelPipeline and LCMScheduler.
* Make remaining failing fast tests pass.
* make style
* Make original_inference_steps configurable from pipeline __call__ again.
* make style
* Remove guidance_rescale arg from pipeline __call__ since LCM currently doesn't support CFG.
* Make LCMScheduler defaults match config of LCM_Dreamshaper_v7 checkpoint.
* Fix LatentConsistencyPipeline slow tests and add dummy expected slices.
* Add checks for original_steps in LCMScheduler.set_timesteps.
* make fix-copies
* Improve LatentConsistencyModelPipeline docs.
* Apply suggestions from code review
Co-authored-by: Aryan V S <avs050602@gmail.com>
* Apply suggestions from code review
Co-authored-by: Aryan V S <avs050602@gmail.com>
* Apply suggestions from code review
Co-authored-by: Aryan V S <avs050602@gmail.com>
* Update src/diffusers/schedulers/scheduling_lcm.py
* Apply suggestions from code review
Co-authored-by: Aryan V S <avs050602@gmail.com>
* finish
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Aryan V S <avs050602@gmail.com>
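A minimal usage sketch of the new pipeline (prompt and dtype are illustrative):
```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7", torch_dtype=torch.float32)
pipe.to("cuda")

# LCM only needs a handful of steps; guidance is folded into the distilled model
image = pipe("a photo of a red panda", num_inference_steps=4, guidance_scale=8.0).images[0]
```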
* add
* Update docs/source/en/api/pipelines/controlnet_sdxl.md
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
---------
Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update get_dummy_inputs(...) in T2I-Adapter tests to take image height and width as params.
* Update the T2I-Adapter unit tests to run with the standard number of UNet down blocks so that all T2I-Adapter down blocks get exercised.
* Update the T2I-Adapter down blocks to better match the padding behavior of the UNet.
* Revert "Update the T2I-Adapter unit tests to run with the standard number of UNet down blocks so that all T2I-Adapter down blocks get exercised."
This reverts commit 6d4a060a34.
* Create utility functions for testing the T2I-Adapter downscaling behavior.
* (minor) Improve readability with an intermediate named variable.
* Statically parameterize T2I-Adapter test dimensions rather than generating them dynamically.
* Fix static checks.
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Added args, kwargs to ```U
* Add UNetMidBlock2D as a supported mid block type
* Fix extra init input for UNetMidBlock2D, change allowed types for Mid-block init
* Update unet_2d_condition.py
* Update unet_2d_condition.py
* Update unet_2d_condition.py
* Update unet_2d_condition.py
* Update unet_2d_condition.py
* Update unet_2d_condition.py
* Update unet_2d_condition.py
* Update unet_2d_condition.py
* Update unet_2d_blocks.py
* Update unet_2d_blocks.py
* Update unet_2d_blocks.py
* Update unet_2d_condition.py
* Update unet_2d_blocks.py
* Updated docstring, increased check strictness
Updated the docstring for ```UNet2DConditionModel``` to include ```reverse_transformer_layers_per_block``` and updated checking for nested list type ```transformer_layers_per_block```
* Add basic shape-check test for asymmetrical unets
* Update src/diffusers/models/unet_2d_blocks.py
Removed blank line
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update unet_2d_condition.py
Remove blank space
* Update unet_2d_condition.py
Changed docstring for `mid_block_type`
* Fixed docstring and wrong default value
* Reformat with black
* Reformat with necessary commands
* Add UNetMidBlockFlat to versatile_diffusion/modeling_text_unet.py to ensure consistency
* Removed args, kwargs, use on mid-block type
* Make fix-copies
* Update src/diffusers/models/unet_2d_condition.py
Wrap into single line
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* make fix-copies
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* I added a new docstring to the class. This makes it easier for other developers to understand what it does and where it is used.
* Update src/diffusers/models/unet_2d_blocks.py
This change was suggested by the maintainer.
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update src/diffusers/models/unet_2d_blocks.py
Add suggested text
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update unet_2d_blocks.py
I changed the Parameters heading to Args.
* Update unet_2d_blocks.py
Set proper indentation in this file.
* Update unet_2d_blocks.py
A small change to the act_fun argument line.
* I ran the black command to reformat the code style
* Update unet_2d_blocks.py
Added a docstring similar to the one in the original diffusion repository.
* Update unet_2d_blocks.py
Added a beautiful docstring to the UNetMidBlock2D class.
* Update unet_2d_blocks.py
I replaced the definitions of the resnet_time_scale_shift and resnet_groups parameters.
* Update unet_2d_blocks.py
I removed extra sentences from the resnet_groups argument.
* Update unet_2d_blocks.py
I replaced my definition with the maintainer's definition for the attention_head_dim parameter.
* I used the black package to reformat my file
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
* added TODOs
* Enhanced and reformatted the docstrings of IFPipeline methods.
* Enhanced and fixed the docstrings of IFImg2ImgSuperResolutionPipeline methods.
* Enhanced and fixed the docstrings of IFImg2ImgPipeline methods.
* Enhanced and fixed the docstrings of IFInpaintingSuperResolutionPipeline methods.
* Enhanced and fixed the docstrings of IFInpaintingPipeline methods.
* Enhanced and fixed the docstrings of IFSuperResolutionPipeline methods.
* Update src/diffusers/pipelines/deepfloyd_if/pipeline_if.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/diffusers/pipelines/deepfloyd_if/pipeline_if_img2img.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/diffusers/pipelines/deepfloyd_if/pipeline_if_img2img_superresolution.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/diffusers/pipelines/deepfloyd_if/pipeline_if_inpainting.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/diffusers/pipelines/deepfloyd_if/pipeline_if_superresolution.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update src/diffusers/pipelines/deepfloyd_if/pipeline_if_inpainting_superresolution.py
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* remove redundant code
* fix code style
* revert the ordering to not break backwards compatibility
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* changed channel parameters for UNET and VAE. Decreased hidden layers size with increased attention heads and intermediate size
* changed the assertion check range
* clean up
---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* fix: sdxl pipeline when unet is not available.
* fix more
* account for text
* fix more
* don't make unet optional.
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* split conditionals.
* add optional components to sdxl pipeline
* propagate changes to the rest of the pipelines.
* add: test
* add to all
* fix: rest of the pipelines.
* use pipeline_class variable
* separate pipeline mixin
* use safe_serialization
* fix: test
* access actual output.
* add: optional test to adapter and ip2p sdxl pipeline tests/
* add optional test to controlnet sdxl.
* fix tests
* fix ip2p tests
* fix more
* fix more.
* use np output type.
* fix for StableDiffusionXLMultiControlNetPipelineFastTests.
* fix: SDXLOptionalComponentsTesterMixin
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix tests
* Empty-Commit
* revert previous
* quality
* fix: test
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add ability to mix usage of T2I-Adapter(s) and ControlNet(s).
Previously, the UNet2DConditionModel implementation only allowed use of one or the other.
Adds a new forward() arg down_intrablock_additional_residuals specifically for T2I-Adapters. If down_intrablock_additional_residuals is not used, maintains backward compatibility with prior usage of only T2I-Adapter or ControlNet but not both
* Improving forward() arg docs in src/diffusers/models/unet_2d_condition.py
Co-authored-by: psychedelicious <4822129+psychedelicious@users.noreply.github.com>
* Add deprecation warning if down_block_additional_residues is used for T2I-Adapter (intrablock residuals)
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Oops my bad, fixing last commit.
* Added import of diffusers utils.deprecate
* Conform to max line length
* Modifying T2I-Adapter pipelines to reflect change to UNet forward() arg for T2I-Adapter residuals.
---------
Co-authored-by: psychedelicious <4822129+psychedelicious@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add: freeu to the core sdxl pipeline.
* add: freeu to video2video
* add: freeu to the core SD pipelines.
* add: freeu to image variation for sdxl.
* add: freeu to SD ControlNet pipelines.
* add: freeu to SDXL controlnet pipelines.
* add: freu to t2i adapter pipelines.
* make fix-copies.
* I added a new docstring to the class. This makes it easier for other developers to understand what it does and where it is used.
* Update src/diffusers/models/unet_2d_blocks.py
This change was suggested by the maintainer.
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update src/diffusers/models/unet_2d_blocks.py
Add suggested text
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update unet_2d_blocks.py
I changed the Parameters heading to Args.
* Update unet_2d_blocks.py
Set proper indentation in this file.
* Update unet_2d_blocks.py
A small change to the act_fun argument line.
* I ran the black command to reformat the code style
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* improvement: add missing typehints and docs to diffusers/models/attention.py
* chore: convert doc strings to raw python strings
add missing typehints
* improvement: add missing typehints and docs to diffusers/models/adapter.py
* improvement: add missing typehints and docs to diffusers/models/lora.py
* docs: include suggestion by @sayakpaul in src/diffusers/models/adapter.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* docs: include suggestion by @sayakpaul in src/diffusers/models/adapter.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* docs: include suggestion by @sayakpaul in src/diffusers/models/adapter.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* docs: include suggestion by @sayakpaul in src/diffusers/models/adapter.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update src/diffusers/models/lora.py
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Added mark_step for sdxl to run with pytorch xla. Also updated README with instructions for xla
* adding soft dependency on torch_xla
* fix some styling
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* add missing docstrings
* chore: run make quality
* improvement: include docs suggestion by @yiyixuxu
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* decrease UNet2DConditionModel & ControlNetModel blocks
* decrease UNet2DConditionModel & ControlNetModel blocks
* decrease even more blocks & number of norm groups
* decrease vae block out channels and number of norm groups
* fix code style
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* fix(gligen_inpaint_pipeline): 🐛 Wrap the timestep() 0-d tensor in a list to convert to 1-d tensor. This avoids the TypeError caused by trying to directly iterate over a 0-dimensional tensor in the denoising stage
* test(gligen/gligen_text_image): unit test using the EulerAncestralDiscreteScheduler
---------
Co-authored-by: zhen-hao.chu <zhen-hao.chu@vitrox.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Min-SNR Gamma: correct the fix for SNR weighted loss in v-prediction by adding 1 to SNR rather than the resulting loss weights
Co-authored-by: bghira <bghira@users.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
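A sketch of the corrected weighting described above, assuming `compute_snr` from diffusers.training_utils; the scheduler checkpoint and gamma value are illustrative:
```python
import torch
from diffusers import DDPMScheduler
from diffusers.training_utils import compute_snr

# SD 2.1 uses v-prediction, so the corrected branch applies
noise_scheduler = DDPMScheduler.from_pretrained("stabilityai/stable-diffusion-2-1", subfolder="scheduler")
timesteps = torch.randint(0, noise_scheduler.config.num_train_timesteps, (8,))
snr_gamma = 5.0  # typical min-SNR gamma value

snr = compute_snr(noise_scheduler, timesteps)
if noise_scheduler.config.prediction_type == "v_prediction":
    snr = snr + 1  # the fix: add 1 to the SNR itself, not to the final loss weights
mse_loss_weights = torch.minimum(snr, snr_gamma * torch.ones_like(snr)) / snr
```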
* ✨ Added Fourier filter function to upsample blocks
* 🔧 Update Fourier_filter for float16 support
* ✨ Added UNetFreeUConfig to UNet model for FreeU adaptation 🛠️
* move unet to its original form and add fourier_filter to torch_utils.
* implement freeU enable mechanism
* implement disable mechanism
* resolution index.
* correct resolution idx condition.
* fix copies.
* no need to use resolution_idx in vae.
* spell out the kwargs
* proper config property
* fix attribution setting
* place unet hasattr properly.
* fix: attribute access.
* proper disable
* remove validation method.
* debug
* debug
* debug
* debug
* debug
* debug
* potential fix.
* add: doc.
* fix copies
* add: tests.
* add: support freeU in SDXL.
* set default value of resolution idx.
* set default values for resolution_idx.
* fix copies
* fix rest.
* fix copies
* address PR comments.
* run fix-copies
* move apply_free_u to utils and other minors.
* introduce support for video (unet3D)
* minor ups
* consistent fix-copies.
* consistent stuff
* fix-copies
* add: rest
* add: docs.
* fix: tests
* fix: doc path
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* style up
* move to techniques.
* add: slow test for sd freeu.
* add: slow test for sd freeu.
* add: slow test for sd freeu.
* add: slow test for sd freeu.
* add: slow test for sd freeu.
* add: slow test for sd freeu.
* add: slow test for video with freeu
* add: slow test for video with freeu
* add: slow test for video with freeu
* style
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
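Once merged, FreeU can be toggled per pipeline via `enable_freeu` / `disable_freeu`; the scaling values below are only illustrative defaults for SD 1.5:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Scale the backbone (b1, b2) and skip (s1, s2) features in the UNet's up blocks.
pipe.enable_freeu(s1=0.9, s2=0.2, b1=1.2, b2=1.4)
image = pipe("an astronaut riding a horse").images[0]

pipe.disable_freeu()  # restore the original behaviour
```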
* handle case when controlnet is list
* Update src/diffusers/loaders.py
* Apply suggestions from code review
* Update src/diffusers/loaders.py
* typecheck comment
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* pipeline fetcher
* update script
* clean up
* clean up
* clean up
* new pipeline runner
* rename tests to match modules
* test actions in pr
* change runner to gpu
* clean up
* clean up
* clean up
* fix report
* fix reporting
* clean up
* show test stats in failure reports
* give names to jobs
* add lora tests
* split torch cuda tests and add compile tests
* clean up
* fix tests
* change push to run only on main
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update Unipc einsum to support 1D and 3D diffusion.
* Add unittest
* Update unittest & edge case
* Fix unittest
* Fix testing_utils.py
* Fix unittest file
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add docstring for the AutoencoderKL's encode
#5229
* Support Python 3.8 syntax in AutoencoderKL.decode type hints
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Follow the style guidelines in AutoencoderKL's encode
#5230
---------
Co-authored-by: stano <>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add VAE slicing and tiling methods.
* Switch to using VaeImageProcessing for preprocessing and postprocessing of images.
* Rename the VaeImageProcessor to vae_image_processor to avoid a name clash with the CLIPImageProcessor (image_processor).
* Remove the postprocess() function because we're using a VaeImageProcessor instead.
* Remove UniDiffuserPipeline.decode_image_latents because we're using VaeImageProcessor instead.
* Refactor generating text from text latents into a decode_text_latents method.
* Add enable_full_determinism() to UniDiffuser tests.
* make style
* Add PipelineLatentTesterMixin to UniDiffuserPipelineFastTests.
* Remove enable_model_cpu_offload since it is now part of DiffusionPipeline.
* Rename the VaeImageProcessor instance to self.image_processor for consistency with other pipelines and rename the CLIPImageProcessor instance to clip_image_processor to avoid a name clash.
* Update UniDiffuser conversion script.
* Make safe_serialization configurable in UniDiffuser conversion script.
* Rename image_processor to clip_image_processor in UniDiffuser tests.
* Add PipelineKarrasSchedulerTesterMixin to UniDiffuserPipelineFastTests.
* Add initial test for compiling the UniDiffuser model (not tested yet).
* Update encode_prompt and _encode_prompt to match that of StableDiffusionPipeline.
* Turn off standard classifier-free guidance for now.
* make style
* make fix-copies
* apply suggestions from review
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
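The slicing/tiling switches added above follow the same method names used elsewhere in the library; a typical usage sketch (shown on a Stable Diffusion pipeline for brevity):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

pipe.enable_vae_slicing()  # decode the batch one image at a time
pipe.enable_vae_tiling()   # decode large images tile by tile
images = pipe(["a photo of a cat"] * 4).images
```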
* added docstrings in forward methods of T2IAdapter model and FullAdapter model
* added docstrings in forward methods of FullAdapterXL and AdapterBlock models
* Added docstrings in forward methods of adapter models
* fix ddim inverse scheduler
* update test of ddim inverse scheduler
* update test of pix2pix_zero
* update test of diffedit
* fix typo
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* split_head_dim flax attn
* Make split_head_dim non default
* make style and make quality
* add description for split_head_dim flag
* Update src/diffusers/models/attention_flax.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
---------
Co-authored-by: Juan Acevedo <jfacevedo@google.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Timestep bias for fine-tuning SDXL
* Adjust parameter choices to include "range" and reword the help statements
* Condition our use of weighted timesteps on the value of timestep_bias_strategy
* style
---------
Co-authored-by: bghira <bghira@users.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
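Roughly, the timestep bias amounts to drawing training timesteps from a re-weighted distribution instead of a uniform one; a hedged sketch with illustrative names (`sample_biased_timesteps` and `multiplier` are not the script's exact identifiers):

```python
import torch

def sample_biased_timesteps(batch_size, num_train_timesteps, strategy, multiplier, begin=0, end=None):
    # Build per-timestep sampling weights, then draw from the biased distribution.
    weights = torch.ones(num_train_timesteps)
    if strategy == "later":
        weights[num_train_timesteps // 2 :] *= multiplier
    elif strategy == "earlier":
        weights[: num_train_timesteps // 2] *= multiplier
    elif strategy == "range":
        weights[begin:end] *= multiplier
    # any other strategy keeps the uniform distribution
    weights /= weights.sum()
    return torch.multinomial(weights, batch_size, replacement=True)

print(sample_biased_timesteps(4, 1000, "later", multiplier=2.0))
```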
* Fix FullAdapterXL.total_downscale_factor.
* Fix incorrect error message in T2IAdapter.__init__(...).
* Move IP-Adapter test_total_downscale_factor(...) to pipeline test file (requested in code review).
* Add more info to error message about an unsupported T2I-Adapter adapter_type.
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Make sure the repo_id is valid before sending it to huggingface_hub to get a more understandable error message.
Re #5110
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* SDXL microconditioning documentation should indicate the correct default order of parameters, so that developers know what to expect
* SDXL microconditioning documentation should indicate the correct default order of parameters, so that developers know what to expect
* empty
---------
Co-authored-by: bghira <bghira@users.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* support transformer_layers_per block in flax UNet
* add support for text_time additional embeddings to Flax UNet
* rename attention layers for VAE
* add shape asserts when renaming attention layers
* transpose VAE attention layers
* add pipeline flax SDXL code [WIP]
* continue add pipeline flax SDXL code [WIP]
* cleanup
* Working on JIT support
Fixed prompt embedding shapes so they work in parallel mode. Assuming we
always have both text encoders for now, for simplicity.
* Fixing embeddings (untested)
* Remove spurious line
* Shard guidance_scale when jitting.
* Decode images
* Fix sharding
* style
* Refiner UNet can be loaded.
* Refiner / img2img pipeline
* Allow latent outputs from base and latent inputs in refiner
This makes it possible to chain base + refiner without having to use the
vae decoder in the base model, the vae encoder in the refiner, skipping
conversions to/from PIL, and avoiding TPU <-> CPU memory copies.
* Adapt to FlaxCLIPTextModelOutput
* Update Flax XL pipeline to FlaxCLIPTextModelOutput
* make fix-copies
* make style
* add euler scheduler
* Fix import
* Fix copies, comment unused code.
* Fix SDXL Flax imports
* Fix euler discrete begin
* improve init import
* finish
* put discrete euler in init
* fix flax euler
* Fix more
* make style
* correct init
* correct init
* Temporarily remove FlaxStableDiffusionXLImg2ImgPipeline
* correct pipelines
* finish
---------
Co-authored-by: Martin Müller <martin.muller.me@gmail.com>
Co-authored-by: patil-suraj <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* min-SNR gamma for Dreambooth training
* Align the mse_loss_weights style with SDXL training example
---------
Co-authored-by: bghira <bghira@users.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Resolve v_prediction issue for min-SNR gamma weighted loss function
* Combine MSE loss calculation of epsilon and velocity, with a note about the application of the epsilon code to sample prediction
* style
---------
Co-authored-by: bghira <bghira@users.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* fix test
* initial commit
* change test
* updates:
* fix tests
* test fix
* test fix
* fix tests
* make test faster
* clean up
* fix precision in test
* fix precision
* Fix tests
* Fix logging test
* fix test
* fix test
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* [SDXL] Make sure multi batch prompt embeds works
* [SDXL] Make sure multi batch prompt embeds works
* improve more
* improve more
* Apply suggestions from code review
Fixed `get_word_inds` mistake/typo in P2P community pipeline
The function `get_word_inds` takes a string of text and either a word (str) or a word index (int) and returns the indices of the token(s) the word would be encoded to.
However, there was a typo: in the second `if` branch the word was checked to be a `str` **again** instead of `int`, which caused the [example code from the docs](https://github.com/huggingface/diffusers/tree/main/examples/community#prompt2prompt-pipeline) to fail with an error (see the sketch below).
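A simplified sketch of what such a helper can look like with both branches checked correctly; this is not the community pipeline's exact code, and the word-to-piece bookkeeping is only approximate:

```python
import numpy as np
from transformers import CLIPTokenizer

def get_word_inds(text: str, word_place, tokenizer):
    """Return the indices of the tokens that a word (str) or word position (int) maps to."""
    words = text.split(" ")
    if isinstance(word_place, str):
        word_place = [i for i, w in enumerate(words) if w == word_place]
    elif isinstance(word_place, int):  # the branch that mistakenly re-checked `str`
        word_place = [word_place]
    if not word_place:
        return np.array([])
    out, cur_len, ptr = [], 0, 0
    # Drop BOS/EOS, then walk the sub-word pieces and map them back to whole words.
    pieces = [tokenizer.decode([tok]) for tok in tokenizer.encode(text)][1:-1]
    for i, piece in enumerate(pieces):
        if ptr >= len(words):
            break
        cur_len += len(piece)
        if ptr in word_place:
            out.append(i + 1)  # +1 accounts for the BOS token position
        if cur_len >= len(words[ptr]):
            ptr, cur_len = ptr + 1, 0
    return np.array(out)

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
print(get_word_inds("a photo of a squirrel eating a burger", "squirrel", tokenizer))
```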
* add support for clip skip
* fix condition
* fix
* add clip_output_layer_to_default
* expose
* remove the previous functions.
* correct condition.
* apply final layer norm
* address feedback
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* refactor clip_skip.
* port to the other pipelines.
* fix copies one more time
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
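With this in place, the pipelines accept a `clip_skip` argument at call time, e.g.:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# clip_skip=2 takes the hidden states from the second-to-last CLIP layer;
# the text encoder's final layer norm is still applied to that output.
image = pipe("masterpiece, a portrait of a fox", clip_skip=2).images[0]
```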
* Remove logger.info statement from Unet2DCondition code to ensure torch compile reliably succeeds
* Convert logging statement to a comment for future archaeologists
* Update src/diffusers/models/unet_2d_condition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
---------
Co-authored-by: bghira <bghira@users.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add attn_groups argument to UNet2DMidBlock2D to control the internal Attention block's GroupNorm.
* Add docstring for attn_norm_num_groups in UNet2DModel.
* Since the test UNet config uses resnet_time_scale_shift == 'scale_shift', also set attn_norm_num_groups to 32.
* Add test for attn_norm_num_groups to UNet2DModelTests.
* Fix expected slices for slow tests.
* Also fix tolerances for slow tests.
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Initial commit P2P
* Replaced CrossAttention, added test skeleton
* bug fixes
* Updated docstring
* Removed unused function
* Created tests
* improved tests
- made fast inference tests faster
- corrected image shape assertions
* Corrected expected output shape in tests
* small fix: test inputs
* Update tests
- used conditional unet2d
- set expected image slices
- edit_kwargs are now not popped, so pipe can be run multiple times
* Fixed bug in int tests
* Fixed tests
* Linting
* Create prompt2prompt.md
* Added to docs toc
* Ran make fix-copies
* Fixed code blocks in docs
* Using same interface as StableDiffusionPipeline
* Fixed small test bug
* Added all options SDPipeline.__call__ has
* Fixed docstring; made __call__ like in SD
* Linting
* Added test for multiple prompts
* Improved docs
* Incorporated feedback
* Reverted formatting on unrelated files
* Moved prompt2prompt to community
- Moved prompt2prompt pipeline from main to community
- Deleted tests
- Moved documentation to community and shorted it
* Update src/diffusers/utils/dummy_torch_and_transformers_objects.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* check out dtypes.
* check out dtypes.
* check out dtypes.
* check out dtypes.
* check out dtypes.
* check out dtypes.
* check out dtypes.
* potential fix
* check out dtypes.
* check out dtypes.
* working?
* Fix an unmatched backtick and make description more general for DiffusionPipeline.enable_sequential_cpu_offload.
* make style
* _exclude_from_cpu_offload -> self._exclude_from_cpu_offload
* make style
* apply suggestions from review
* make style
* speed up lora loading
* Apply suggestions from code review
* up
* up
* Fix more
* Correct more
* Apply suggestions from code review
* up
* Fix more
* Fix more -
* up
* up
* [Draft] Refactor model offload
* [Draft] Refactor model offload
* Apply suggestions from code review
* cpu offload updates
* remove model cpu offload from individual pipelines
* add hook to offload models to cpu
* clean up
* model offload
* add model cpu offload string
* make style
* clean up
* fixes for offload issues
* fix tests issues
* resolve merge conflicts
* update src/diffusers/pipelines/pipeline_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* make style
* Update src/diffusers/pipelines/latent_diffusion/pipeline_latent_diffusion.py
---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
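After the refactor, offloading is requested on the pipeline rather than wired into each pipeline individually; typical usage:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)

# Whole-model offload: each component is moved to the GPU only while it is used.
pipe.enable_model_cpu_offload()

# Alternatively, submodule-level offload trades more speed for even lower memory:
# pipe.enable_sequential_cpu_offload()

image = pipe("a watercolor painting of a lighthouse").images[0]
```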
* Revert "Temp Revert "[Core] better support offloading when side loading is enabled… (#4927)"
This reverts commit 2ab170499e.
* tests: install accelerate from main
* add t2i_example script
* remove in channels logic
* remove comments
* remove use_euler arg
* add requirements
* only use canny example
* use datasets
* comments
* make log_validation consistent with other scripts
* add readme
* fix title in readme
* update check_min_version
* change a few minor things.
* add doc entry
* add: test for t2i adapter training
* remove use_auth_token
* fix: logged info.
* remove tests for now.
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Add --vae_precision option to the SDXL pix2pix script so that we have the option of avoiding float32 overhead
* style
---------
Co-authored-by: bghira <bghira@users.github.com>
* Add dropout param to get_down_block/get_up_block and UNet2DModel/UNet2DConditionModel.
* Add dropout param to Versatile Diffusion modeling, which has a copy of UNet2DConditionModel and its own get_down_block/get_up_block functions.
* Change StableDiffusionInpaintPipelineFastTests.get_dummy_inputs to produce a random image and a white mask_image.
* Add dummy expected slices for the test_stable_diffusion_inpaint tests.
* Remove print statement
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* proposal for flaky tests
* more precision fixes
* move more tests to use cosine distance
* more test fixes
* clean up
* use default attn
* clean up
* update expected value
* make style
* make style
* Apply suggestions from code review
* Update src/diffusers/pipelines/stable_diffusion/pipeline_onnx_stable_diffusion_img2img.py
* make style
* fix failing tests
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
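The idea behind the cosine-distance checks for flaky precision-sensitive tests, as a self-contained sketch (not the repository's exact test helper):

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    a, b = a.flatten(), b.flatten()
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

expected = np.array([0.512, 0.498, 0.731, 0.255])
observed = expected + np.random.normal(0.0, 1e-3, size=expected.shape)

# Instead of element-wise allclose with tight absolute tolerances, assert that
# the overall direction of the output slice matches the reference.
assert cosine_distance(expected, observed) < 1e-4
```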
* Initial code to add force_unmasked_unchanged argument to StableDiffusionInpaintPipeline.__call__.
* Try to improve StableDiffusionInpaintPipelineFastTests.get_dummy_inputs.
* Use original mask to preserve unmasked pixels in pixel space rather than latent space.
* make style
* start working on note in docs to force unmasked area to be unchanged
* Add example of forcing the unmasked area to remain unchanged.
* Revert "make style"
This reverts commit fa7759293a.
* Revert "Use original mask to preserve unmasked pixels in pixel space rather than latent space."
This reverts commit 092bd0e9e9.
* Revert "Try to improve StableDiffusionInpaintPipelineFastTests.get_dummy_inputs."
This reverts commit ff41cf43c5.
* Revert "Initial code to add force_unmasked_unchanged argument to StableDiffusionInpaintPipeline.__call__."
This reverts commit 989979752a.
---------
Co-authored-by: Will Berman <wlbberman@gmail.com>
* Fix potential type conversion errors in SDXL pipelines
* make sure vae stays in fp16
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* refactoring of encode_prompt()
* better handling of device.
* fix: device determination
* fix: device determination 2
* handle num_images_per_prompt
* revert changes in loaders.py and give birth to encode_prompt().
* minor refactoring for encode_prompt()
* make backward compatible.
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix: concatenation of the neg and pos embeddings.
* incorporate encode_prompt() in test_stable_diffusion.py
* turn it into big PR.
* make it bigger
* gligen fixes.
* more fixes to gligen
* _encode_prompt -> encode_prompt in tests
* first batch
* second batch
* fix blasphemous mistake
* fix
* fix: hopefully for the final time.
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
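Unlike the old `_encode_prompt`, `encode_prompt` returns the positive and negative embeddings separately; a usage sketch (keyword names reflect the current public signature):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt_embeds, negative_prompt_embeds = pipe.encode_prompt(
    prompt="a cozy cabin in the woods",
    device=pipe.device,
    num_images_per_prompt=1,
    do_classifier_free_guidance=True,
    negative_prompt="low quality",
)
image = pipe(prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_prompt_embeds).images[0]
```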
* adding save and load for MultiAdapter, adding test
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Adding changes from review test_stable_diffusion_adapter
* import sorting fix
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Increase min accelerate ver to avoid OOM when mixed precision
* Rm re-instantiation of VAE
* Rm casting to float32
* Del unused models and free GPU
* Fix style
* Update textual_inversion.py
fixed safe_path bug in textual inversion training
* Update test_examples.py
update test_textual_inversion for updating saved file's name
* Update textual_inversion.py
fixed some formatting issues
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* empty PR
* init
* changes
* starting with the pipeline
* stable diff
* prev
* more things, getting started
* more functions
* making it more readable
* almost done testing
* var changes
* testing
* device
* device support
* maybe
* device malfunctions
* new new
* register
* testing
* exec does not work
* float
* change info
* change of architecture
* might work
* testing with colab
* more attn stuff
* stupid additions
* documenting and testing
* writing tests
* more docs
* tests and docs
* remove test
* empty PR
* init
* changes
* starting with the pipeline
* stable diff
* prev
* more things, getting started
* more functions
* making it more readable
* almost done testing
* var changes
* testing
* device
* device support
* maybe
* device malfunctions
* new new
* register
* testing
* exec does not work
* float
* change info
* change of architecture
* might work
* testing with colab
* more attn stuff
* stupid additions
* documenting and testing
* writing tests
* more docs
* tests and docs
* remove test
* change cross attention
* revert back
* tests
* reverting back to orig
* changes
* test passing
* pipeline changes
* before quality
* quality checks pass
* remove print statements
* doc fixes
* __init__ error something
* update docs, working on dim
* working on encoding
* doc fix
* more fixes
* no longer dependent on 512*512
* update docs
* fixes
* test passing
* remove comment
* fixes and migration
* simpler tests
* doc changes
* green CI
* changes
* more docs
* changes
* new images
* to community examples
* delete
* more fixes
* changes
* fix
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update loaders.py
Solves an error sometimes thrown while iterating over state_dict.keys() caused by using the .pop() method within the loop.
* Update loaders.py
* debugging
* better logic for filtering.
* Update src/diffusers/loaders.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
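The underlying issue is the classic "mutate while iterating" pattern; a minimal sketch of the safe variant:

```python
state_dict = {"unet.mid_block.weight": 1.0, "text_encoder.layer.weight": 2.0}

# Popping inside `for key in state_dict.keys()` mutates the dict mid-iteration and
# can raise "dictionary changed size during iteration". Snapshot the keys first:
for key in list(state_dict.keys()):
    if key.startswith("text_encoder."):
        state_dict.pop(key)

print(state_dict)  # {'unet.mid_block.weight': 1.0}
```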
* dreambooth training
* train_dreambooth validation scheduler
* set a particular scheduler via a string
* modify readme after setting a particular scheduler via a string
* modify readme after setting a particular scheduler
* use importlib to set a particular scheduler
* import with correct sort
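One way a string-selected validation scheduler can be resolved with `importlib` (a sketch, not the training script's exact code):

```python
import importlib

def resolve_scheduler(name: str):
    # e.g. name = "DDPMScheduler" or "DPMSolverMultistepScheduler"
    return getattr(importlib.import_module("diffusers"), name)

scheduler_cls = resolve_scheduler("DDPMScheduler")
scheduler = scheduler_cls.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="scheduler")
print(scheduler.config.num_train_timesteps)
```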
* Fix AutoencoderTiny encoder scaling convention
* Add [-1, 1] -> [0, 1] rescaling to EncoderTiny
* Move [0, 1] -> [-1, 1] rescaling from AutoencoderTiny.decode to DecoderTiny
(i.e. immediately after the final conv, as early as possible)
* Fix missing [0, 255] -> [0, 1] rescaling in AutoencoderTiny.forward
* Update AutoencoderTinyIntegrationTests to protect against scaling issues.
The new test constructs a simple image, round-trips it through AutoencoderTiny,
and confirms the decoded result is approximately equal to the source image.
This test checks behavior with and without tiling enabled.
This test will fail if new AutoencoderTiny scaling issues are introduced.
* Context: Raw TAESD weights expect images in [0, 1], but diffusers'
convention represents images with zero-centered values in [-1, 1],
so AutoencoderTiny needs to scale / unscale images at the start of
encoding and at the end of decoding in order to work with diffusers.
* Re-add existing AutoencoderTiny test, update golden values
* Add comments to AutoencoderTiny.forward
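A rough round-trip check in the spirit of the new integration test; the `.latents` / `.sample` output attributes are assumptions about the current output classes:

```python
import torch
from diffusers import AutoencoderTiny

vae = AutoencoderTiny.from_pretrained("madebyollin/taesd")

# diffusers convention: images are zero-centered in [-1, 1]; raw TAESD weights
# expect [0, 1], so the model rescales internally at encode/decode time.
x = torch.linspace(-1, 1, 512)
image = x.view(1, 1, 1, 512).expand(1, 3, 512, 512).clone()  # smooth gradient in [-1, 1]

with torch.no_grad():
    latents = vae.encode(image).latents
    decoded = vae.decode(latents).sample  # back in [-1, 1]

print((decoded - image).abs().mean())
```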
This is a better method than comparing against a list of supported backends as it allows for supporting any number of backends provided they are installed on the user's system.
This should have no effect on the behaviour of tests in Huggingface's CI workers.
See transformers#25506 where this approach has already been added.
* Update loaders.py
add config_file to from_single_file,
when the download_from_original_stable_diffusion_ckpt use
* Update loaders.py
add config_file to from_single_file,
when the download_from_original_stable_diffusion_ckpt use
* change config_file to original_config_file
* make style && make quality
---------
Co-authored-by: jianghua.zuo <jianghua.zuo@weimob.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
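Typical usage of the new argument; the paths below are placeholders:

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_single_file(
    "path/to/model.safetensors",                        # placeholder checkpoint path
    original_config_file="path/to/v1-inference.yaml",   # used instead of fetching the config
)
```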
* Add SDXL long weighted prompt pipeline
* Add SDXL long weighted prompt pipeline usage sample in the readme document
* Add SDXL long weighted prompt pipeline usage sample in the readme document, add result image
* make safetensors default
* set default save method as safetensors
* update tests
* update to support saving safetensors
* update test to account for safetensors default
* update example tests to use safetensors
* update example to support safetensors
* update unet tests for safetensors
* fix failing loader tests
* fix qc issues
* fix pipeline tests
* fix example test
---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* add: train to text image with sdxl script.
Co-authored-by: CaptnSeraph <s3raph1m@gmail.com>
* fix: partial func.
* fix: default value of output_dir.
* make style
* set num inference steps to 25.
* remove mentions of LoRA.
* up min version
* add: ema cli arg
* run device placement while running step.
* precompute vae encodings too.
* fix
* debug
* should work now.
* debug
* debug
* goes alright?
* style
* debugging
* debugging
* debugging
* debugging
* fix
* reinit scheduler if prediction_type was passed.
* always cast vae to float32
* better handling of snr.
Co-authored-by: bghira <bghira@users.github.com>
* the vae should be also passed
* add: docs.
* add: sdxl t2i tests
* save the pipeline
* autocast.
* fix: save_model_card
* fix: save_model_card.
---------
Co-authored-by: CaptnSeraph <s3raph1m@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: bghira <bghira@users.github.com>
* Fixing repo_id regex validation error on windows platforms
* Validating that a correct URL with a valid prefix is provided
If we are loading from a URL, we don't need to use os.path.join and array slicing to split a repo_id and file path out of an absolute filepath.
Check whether the URL prefix is valid before doing any URL splitting; otherwise we raise a ValueError, since neither a valid filepath nor a valid URL was provided.
* Style fixes
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* move slow pix2pixzero tests to nightly
* move slow panorama tests to nightly
* move txt2video full test to nightly
* clean up
* remove nightly test from text to video pipeline
* add load_lora_weights and save_lora_weights to StableDiffusionXLImg2ImgPipeline
* add load_lora_weights and save_lora_weights to StableDiffusionXLInpaintPipeline
* apply black format
* apply black format
* add copy statement
* fix statements
* fix statements
* fix statements
* run `make fix-copies`
* add pipeline class
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* style
---------
Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* move audioldm tests to nightly
* move kandinsky im2img ddpm test to nightly
* move flax dpm test to nightly
* move diffedit dpm test to nightly
* move fp16 slow tests to nightly
* add train_text_to_image_lora_sdxl.py
* add train_text_to_image_lora_sdxl.py
* add test and minor fix
* Update examples/text_to_image/README_sdxl.md
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* fix unwrap_model rule
* add invisible-watermark in requirements
* del invisible-watermark
* Update examples/text_to_image/README_sdxl.md
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update examples/text_to_image/README_sdxl.md
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update examples/text_to_image/train_text_to_image_lora_sdxl.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* del comment & update readme
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* added placeholder token concatenation during training
* Update examples/textual_inversion/textual_inversion.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Faster controlnet model instantiation, and allow controlnets to be loaded (from ckpt) in a parallel thread with a SD model (ckpt) without tensor errors (race condition)
* type conversion
Default value of `control_guidance_start` and `control_guidance_end` in `StableDiffusionControlNetPipeline.check_inputs` causes `TypeError: object of type 'float' has no len()`
Proposed fix:
Convert `control_guidance_start` and `control_guidance_end` to list if float
* Update src/diffusers/pipelines/controlnet/pipeline_controlnet.py
* Update src/diffusers/pipelines/controlnet/pipeline_controlnet.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/pipelines/controlnet/pipeline_controlnet.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
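The proposed fix boils down to normalizing bare floats into one-element lists before `len()` is called; a minimal sketch:

```python
control_guidance_start = 0.0  # bare floats, as passed by default
control_guidance_end = 1.0

# check_inputs calls len(...) on these, so normalize floats to one-element lists.
if not isinstance(control_guidance_start, (list, tuple)):
    control_guidance_start = [control_guidance_start]
if not isinstance(control_guidance_end, (list, tuple)):
    control_guidance_end = [control_guidance_end]

assert len(control_guidance_start) == len(control_guidance_end) == 1
```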
Prevent online access when desired
- Bypass requests with config files option added to download_from_original_stable_diffusion_ckpt
- Adds local_files_only flags to all from_pretrained requests
* add zero123 pipeline to community
* add community doc
* reformat
* update zero123 pipeline, including cc_projection within diffusers; add convert ckpt scripts; support diffusers weights
* first draft
* tidy api
* apply feedback
* mdx to md
* apply feedback
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* update expected slice so img2img compile tests pass
* use default attn processor
* use default attn processor and update expected slice value to pass test
* use default attn processor
* set default attn processor and update expected slice
* set default attn processor and change precision for check
* set unet to use default attn processor
* fixed typo
* updated doc to be consistent in naming
* make style/quality
* preprocessing for 4 channels and not 6
* make style
* test for 4c
* make style/quality
* fixed test on cpu
* fixed doc typo
* changed default ckpt to 4c
* Update pipeline_stable_diffusion_ldm3d.py
---------
Co-authored-by: Aflalo <estellea@isl-iam1.rr.intel.com>
Co-authored-by: Aflalo <estellea@isl-gpu33.rr.intel.com>
Co-authored-by: Aflalo <estellea@isl-gpu38.rr.intel.com>
Update unet_1d.py
highlighting the order in which the modules are actually fed in the main code, as the order matters because some blocks attach time embeds whilst others do not
* [SDXL-IP2P] Add gif for demonstrating training processes
* [SDXL-IP2P] Add gif for demonstrating training processes
* [SDXL-IP2P] Change gif to URLs
* [SDXL-IP2P] Add URLs in case the gif does not show
---------
Co-authored-by: Harutatsu Akiyama <kf.zy.qin@gmail.com>
* fix_batch_xl
* Fix other pipelines as well
* up
* up
* Update tests/pipelines/stable_diffusion_xl/test_stable_diffusion_xl_inpaint.py
* sort
* up
* Finish it all up
Co-authored-by: Bagheera <bghira@users.github.com>
* Co-authored-by: Bagheera <bghira@users.github.com>
* Co-authored-by: Bagheera <bghira@users.github.com>
* Finish it all up
Co-authored-by: Bagheera <bghira@users.github.com>
* add test for pipeline import.
* Update tests/others/test_dependencies.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* address suggestions
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* initial
* style
* from ...pipelines -> from ..pipeline_util
* make style
* fix-copies
* fix value_guided_sampling oops
* style
* add test
* Show failing test
* update from_pipe
* fix
* add controlnet, additional test and register unused original config
* update for controlnet
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* store unused config as a private attribute and pass it when possible
* add doc
* kandinsky inpaint pipeline does not work with decoder checkpoint
* update doc
* Apply suggestions from code review
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* style
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix
* Apply suggestions from code review
---------
Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* fix: #4206
* add: sdxl controlnet training smoketest.
* remove unnecessary token inits.
* add: licensing to model card.
* include SDXL licensing in the model card and make public visibility default
* debugging
* debugging
* disable local file download.
* fix: training test.
* fix: ckpt prefix.
* Fix the XL ensemble not working for any Karras scheduler sigmas and having an off-by-one bug
* Update src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py
* make style
---------
Co-authored-by: Jimmy <39@🇺🇸.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix bug when no cfg
* style
* fix no cfg for shap-e and cycle
* style
* fix no cfg for sdxl
* fix copies
---------
Co-authored-by: yiyixuxu <yixu310@gmail,com>
* 📄 Renamed File for Better Understanding
Renamed the 'rl' file to 'run_locomotion'. This change was made to improve the clarity and readability of the codebase. The 'rl' name was ambiguous, and 'run_locomotion' provides a clearer description of the file's purpose.
Thanks 🙌
* 📁 [Docs] Renamed Directory for Better Clarity
Renamed the 'rl' directory to 'reinforcement_learning'. This change provides a clearer understanding of the directory's purpose and its contents.
* Update examples/reinforcement_learning/README.md
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* 📝 Update README
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Fix bug in ControlNetPipelines with MultiControlNetModel of length 1
* Add tests for varying number of ControlNet models
* Fix missing indexing for control_guidance_start and control_guidance_end
* Fix code quality
* Separate test for MultiControlNet with one model
* Revert formatting of earlier test
* Add controlnet from single file
* Updates
* make style
* finish
* Apply suggestions from code review
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* feat: add act_fn param to OutValueFunctionBlock
* feat: update unet1d tests to not use mish
* feat: add `mish` as the default activation function
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* feat: drop mish tests from unet1d
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add: controlnet sdxl.
* modifications to controlnet.
* run styling.
* add: __init__.pys
* incorporate https://github.com/huggingface/diffusers/pull/4019 changes.
* run make fix-copies.
* resize the conditioning images.
* remove autocast.
* run styling.
* disable autocast.
* debugging
* device placement.
* back to autocast.
* remove comment.
* save some memory by reusing the vae and unet in the pipeline.
* apply styling.
* Allow low precision sd xl
* finish
* finish
* changes to accommodate the improved VAE.
* modifications to how we handle vae encoding in the training.
* make style
* make existing controlnet fast tests pass.
* change vae checkpoint cli arg.
* fix: vae pretrained paths.
* fix: steps in get_scheduler().
* debugging.
* debugging.
* fix: weight conversion.
* add: docs.
* add: limited tests.
* add: datasets to the requirements.
* update docstrings and incorporate the usage of watermarking.
* incorporate fix from #4083
* fix watermarking dependency handling.
* run make-fix-copies.
* Empty-Commit
* Update requirements_sdxl.txt
* remove vae upcasting part.
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* run make style
* run make fix-copies.
* disable support for multicontrolnet.
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* run make fix-copies.
* style.
* fix-copies.
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add Recent Timestep Scheduling Improvements to DDIM Inverse Scheduler
Roll timesteps by one to reflect origin-destination semantic discrepancy
Restore `set_alpha_to_one` option to handle negative initial timesteps
Remove `set_alpha_to_zero` option not used due to previous truncation
* Bugfix
* Remove unnecessary calls to `detach()`
Use `self.image_processor.preprocess` in DiffEdit pipeline functions
* Preprocess list input for inverted image latents in diffedit pipeline
* Add `timestep_spacing` and `steps_offset` to `DPMSolverMultistepInverseScheduler`
* Update expected test results to account for inverting last forward diffusion step
* Fix inversion progress bar bug
* Add first draft for proper fast tests for DDIMInverseScheduler
* Add deprecated DDIMInverseScheduler kwarg to ConfigMixer registry
* Fix test failure in DPMMultistepInverseScheduler
Invert step specification leads to negative noise variance in SDE-based algs
Add first draft for proper fast tests for DPMMultistepInverseScheduler
* Update expected test results to account for inverting last forward diffusion step
Clean up diffedit fast test
* Quick implementation of t2i-adapter
Load adapter module with from_pretrained
Prototyping generalized adapter framework
Write up doc string for sideload framework (WIP) + some minor updates on implementation
Update adapter models
Remove old adapter optional args in UNet
Add StableDiffusionAdapterPipeline unit test
Handle cpu offload in StableDiffusionAdapterPipeline
Auto correct coding style
Update model repo name to "RzZ/sd-v1-4-adapter-pipeline"
Refactor MultiAdapter to better compatible with config system
Export MultiAdapter
Create pipeline document template from controlnet
Create dummy objects
Supporting new AdapterLight model
Fix StableDiffusionAdapterPipeline common pipeline test
[WIP] Update adapter pipeline document
Handle num_inference_steps in StableDiffusionAdapterPipeline
Update definition of Adapter "channels_in"
Update documents
Apply code style
Fix doc typo and merge error
Update doc string and example
Quality of life improvement
Remove redundant code and file from prototyping
Remove unused package
Remove comments
Fix title
Fix typo
Add conditioning scale arg
Bring back old implementation
Offload sideload
Add supplementary info to the document
Update src/diffusers/models/adapter.py
Co-authored-by: Will Berman <wlbberman@gmail.com>
Update MultiAdapter constructor
Swap out custom checkpoint and update pipeline constructor
Update document
Apply suggestions from code review
Co-authored-by: Will Berman <wlbberman@gmail.com>
Correcting style
Following single-file policy
Update auto size in image preprocess func
Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_adapter.py
Co-authored-by: Will Berman <wlbberman@gmail.com>
fix copies
Update adapter pipeline behavior
Add adapter_conditioning_scale doc string
Add the missing doc string
Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Fix few bugs from suggestion
Handle L-mode PIL image as control image
Rename to differentiate adapter resblock
Update src/diffusers/models/adapter.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Fix typo
Update adapter parameter name
Update test case and code style
Fix copies
Fix typo
Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_adapter.py
Co-authored-by: Will Berman <wlbberman@gmail.com>
Update Adapter class name
Add checkpoint converting script
Fix style
Fix-copies
Remove dev script
Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Updates for parameter rename
Fix convert_adapter
remove main
fix diff
more
refactoring
more
more
small fixes
refactor
tests
more slow tests
more tests
Update docs/source/en/api/pipelines/overview.mdx
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
add community contributor to docs
Update docs/source/en/api/pipelines/stable_diffusion/adapter.mdx
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Update docs/source/en/api/pipelines/stable_diffusion/adapter.mdx
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Update docs/source/en/api/pipelines/stable_diffusion/adapter.mdx
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Update docs/source/en/api/pipelines/stable_diffusion/adapter.mdx
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Update docs/source/en/api/pipelines/stable_diffusion/adapter.mdx
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
fix
remove from_adapters
license
paper link
docs
more url fixes
more docs
fix
fixes
fix
fix
* fix sample inplace add
* additional_kwargs -> additional_residuals
* move t2i adapter pipeline to own module
* preprocess -> _preprocess_adapter_image
* add TencentArc to license
* fix example code links
* add image converter and fix example doc string
* fix links
* clearer additional residual application
---------
Co-authored-by: HimariO <dsfhe49854@gmail.com>
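End-to-end usage of the merged T2I-Adapter pipeline, sketched with one of the published adapter checkpoints (name given for illustration) and a toy L-mode control image in place of a real canny map:

```python
import torch
from PIL import Image, ImageDraw
from diffusers import StableDiffusionAdapterPipeline, T2IAdapter

adapter = T2IAdapter.from_pretrained("TencentARC/t2iadapter_canny_sd15v2", torch_dtype=torch.float16)
pipe = StableDiffusionAdapterPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", adapter=adapter, torch_dtype=torch.float16
).to("cuda")

# Toy L-mode "edge map" standing in for a real canny image.
control = Image.new("L", (512, 512), 0)
ImageDraw.Draw(control).rectangle([128, 128, 384, 384], outline=255, width=4)

image = pipe(
    "a glowing cube on a pedestal",
    image=control,
    adapter_conditioning_scale=0.8,
).images[0]
```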
* 📝 Update doc with more descriptive title and filename for "IF" section
Updated the documentation to provide a more descriptive title and filename for the "IF" section. Previously, having only "IF" as the title was not conveying a clear meaning. By renaming the section to "DeepFloyd IF," we provide users with a more informative and context-specific heading.
Thanks! 🙌
* 📝 Update name for "IF" section in README
Updated the link and name for the "IF" section in the README file to reflect the new heading "DeepFloyd IF."
* 📝 Fix broken link for "Instruct Pix2Pix" section in README
Fixed the broken link for the "Instruct Pix2Pix" section in the README file. Previously, the link was pointing to an incorrect location due to the presence of "stable_diffusion" in the URL. By removing "stable_diffusion" from the URL, I have corrected the error and ensured that users are directed to the correct section.
* 🔧💼 Updated parameters in _toctree.yml file
- ✏️ Updated 'local' parameter to 'api/pipelines/deepfloyd_if'.
- ✏️ Updated 'title' parameter to 'DeepFloyd IF'.
🎯 These changes aim to improve visibility and accessibility in the documentation of the DeepFloyd IF pipeline. 🚀📚
* add noise_sampler to StableDiffusionKDiffusionPipeline
* fix/docs: Fix the broken doc links (#3897)
* fix/docs: Fix the broken doc links
Signed-off-by: GitHub <noreply@github.com>
* Update docs/source/en/using-diffusers/write_own_pipeline.mdx
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
---------
Signed-off-by: GitHub <noreply@github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Add video img2img (#3900)
* Add image to image video
* Improve
* better naming
* make fix copies
* add docs
* finish tests
* trigger tests
* make style
* correct
* finish
* Fix more
* make style
* finish
* fix/doc-code: Updating to the latest version parameters (#3924)
fix/doc-code: update to use the new parameter
Signed-off-by: GitHub <noreply@github.com>
* fix/doc: no import torch issue (#3923)
Fix/doc: no import torch issue
Signed-off-by: GitHub <noreply@github.com>
* Correct controlnet out of list error (#3928)
* Correct controlnet out of list error
* Apply suggestions from code review
* correct tests
* correct tests
* fix
* test all
* Apply suggestions from code review
* test all
* test all
* Apply suggestions from code review
* Apply suggestions from code review
* fix more tests
* Fix more
* Apply suggestions from code review
* finish
* Apply suggestions from code review
* Update src/diffusers/schedulers/scheduling_k_dpm_2_ancestral_discrete.py
* finish
* Adding better way to define multiple concepts and also validation capabilities. (#3807)
* - Added validation parameters
- Changed some parameter descriptions to better explain their use.
- Fixed a few typos.
- Added concept_list parameter for better management of multiple subjects
- changed logic for image validation
* - Fixed bad logic for class data root directories
* Defaulting validation_steps to None for an easier logic
* Fixed multiple validation prompts
* Fixed bug on validation negative prompt
* Changed validation logic for tracker.
* Added uuid for validation image labeling
* Fix error when comparing validation prompts and validation negative prompts
* Improved error message when negative prompts for validation are more than the number of prompts
* - Changed image tracking number from epoch to global_step
- Added Typing for functions
* Added some validations more when using concept_list parameter and the regular ones.
* Fixed error message
* Added more validations for validation parameters
* Improved messaging for errors
* Fixed validation error for parameters with default values
* - Added train step to image name for validation
- reformatted code
* - Added train step to image's name for validation
- reformatted code
* Updated README.md file.
* reverted back original script of train_dreambooth.py
* reverted back original script of train_dreambooth.py
* left one blank line at the eof
* reverted back setup.py
* reverted back setup.py
* added same logic for when parameters for prior preservation are used without enabling the flag while using concept_list parameter.
* Ran black formatter.
* fixed a few strings
* fixed import sort with isort and removed fstrings without placeholder
* fixed import order with ruff (since with isort wasn't ok)
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* [ldm3d] Update code to be functional with the new checkpoints (#3875)
* fixed typo
* updated doc to be consistent in naming
* make style/quality
* preprocessing for 4 channels and not 6
* make style
* test for 4c
* make style/quality
* fixed test on cpu
---------
Co-authored-by: Aflalo <estellea@isl-iam1.rr.intel.com>
Co-authored-by: Aflalo <estellea@isl-gpu33.rr.intel.com>
Co-authored-by: Aflalo <estellea@isl-gpu38.rr.intel.com>
* Improve memory text to video (#3930)
* Improve memory text to video
* Apply suggestions from code review
* add test
* Apply suggestions from code review
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* finish test setup
---------
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* revert automatic chunking (#3934)
* revert automatic chunking
* Apply suggestions from code review
* revert automatic chunking
* avoid upcasting by assigning dtype to noise tensor (#3713)
* avoid upcasting by assigning dtype to noise tensor
* make style
* Update train_unconditional.py
* Update train_unconditional.py
* make style
* add unit test for pickle
* revert change
---------
Co-authored-by: root <root@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Prathik Rao <prathikrao@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
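The gist of the dtype fix in that nested PR, as a toy example:

```python
import torch

latents = torch.randn(4, 4, 64, 64, dtype=torch.float16)

# randn without an explicit dtype returns float32 and silently upcasts the loss path;
# matching the latents' dtype keeps the whole step in half precision.
noise = torch.randn(latents.shape, dtype=latents.dtype, device=latents.device)
noisy = latents + noise
print(noisy.dtype)  # torch.float16
```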
* Fix failing np tests (#3942)
* Fix failing np tests
* Apply suggestions from code review
* Update tests/pipelines/test_pipelines_common.py
* Add `timestep_spacing` and `steps_offset` to schedulers (#3947)
* Add timestep_spacing to DDPM, LMSDiscrete, PNDM.
* Remove spurious line.
* More easy schedulers.
* Add `linspace` to DDIM
* Noise sigma for `trailing`.
* Add timestep_spacing to DEISMultistepScheduler.
Not sure the range is the way it was intended.
* Fix: remove line used to debug.
* Support timestep_spacing in DPMSolverMultistep, DPMSolverSDE, UniPC
* Fix: convert to numpy.
* Use sched. defaults when instantiating from_config
For params not present in the original configuration.
This makes it possible to switch pipeline schedulers even if they use
different timestep_spacing (or any other param).
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Missing args in DPMSolverMultistep
* Test: default args not in config
* Style
* Fix scheduler name in test
* Remove duplicated entries
* Add test for solver_type
This test currently fails in main. When switching from DEIS to UniPC,
solver_type is "logrho" (the default value from DEIS), which gets
translated to "bh1" by UniPC. This is different to the default value for
UniPC: "bh2". This is where the translation happens: 36d22d0709/src/diffusers/schedulers/scheduling_unipc_multistep.py (L171)
* UniPC: use same default for solver_type
Fixes a bug when switching to UniPC from another scheduler (i.e.,
DEIS) that uses a different solver type. The solver is now the same as
if we had instantiated the scheduler directly.
* do not save config values that use defaults
* fix more
* fix all
* fix schedulers
* fix more
* finish for real
* finish for real
* flaky tests
* Update tests/pipelines/stable_diffusion/test_stable_diffusion_pix2pix_zero.py
* Default steps_offset to 0.
* Add missing docstrings
* Apply suggestions from code review
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
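Because unspecified params now fall back to the target scheduler's defaults, swapping schedulers and overriding `timestep_spacing` looks like this:

```python
from diffusers import DDIMScheduler, DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# from_config keeps the values stored in the old config and falls back to the new
# scheduler's defaults for anything the old config never recorded (e.g. timestep_spacing).
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config, timestep_spacing="trailing")
print(pipe.scheduler.config.timestep_spacing)
```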
* Add Consistency Models Pipeline (#3492)
* initial commit
* Improve consistency models sampling implementation.
* Add CMStochasticIterativeScheduler, which implements the multi-step sampler (stochastic_iterative_sampler) in the original code, and make further improvements to sampling.
* Add Unet blocks for consistency models
* Add conversion script for Unet
* Fix bug in new unet blocks
* Fix attention weight loading
* Make design improvements to ConsistencyModelPipeline and CMStochasticIterativeScheduler and add initial version of tests.
* make style
* Make small random test UNet class conditional and set resnet_time_scale_shift to 'scale_shift' to better match consistency model checkpoints.
* Add support for converting a test UNet and non-class-conditional UNets to the consistency models conversion script.
* make style
* Change num_class_embeds to 1000 to better match the original consistency models implementation.
* Add support for distillation in pipeline_consistency_models.py.
* Improve consistency model tests:
- Get small testing checkpoints from hub
- Modify tests to take into account "distillation" parameter of ConsistencyModelPipeline
- Add onestep, multistep tests for distillation and distillation + class conditional
- Add expected image slices for onestep tests
* make style
* Improve ConsistencyModelPipeline:
- Add initial support for class-conditional generation
- Fix initial sigma for onestep generation
- Fix some sigma shape issues
* make style
* Improve ConsistencyModelPipeline:
- add latents __call__ argument and prepare_latents method
- add check_inputs method
- add initial docstrings for ConsistencyModelPipeline.__call__
* make style
* Fix bug when randomly generating class labels for class-conditional generation.
* Switch CMStochasticIterativeScheduler to configuring a sigma schedule and make related changes to the pipeline and tests.
* Remove some unused code and make style.
* Fix small bug in CMStochasticIterativeScheduler.
* Add expected slices for multistep sampling tests and make them pass.
* Work on consistency model fast tests:
- in pipeline, call self.scheduler.scale_model_input before denoising
- get expected slices for Euler and Heun scheduler tests
- make Euler test pass
- mark Heun test as expected fail because it doesn't support prediction_type "sample" yet
- remove DPM and Euler Ancestral tests because they don't support use_karras_sigmas
* make style
* Refactor conversion script to make it easier to add more model architectures to convert in the future.
* Work on ConsistencyModelPipeline tests:
- Fix device bug when handling class labels in ConsistencyModelPipeline.__call__
- Add slow tests for onestep and multistep sampling and make them pass
- Refactor fast tests
- Refactor ConsistencyModelPipeline.__init__
* make style
* Remove the add_noise and add_noise_to_input methods from CMStochasticIterativeScheduler for now.
* Run python utils/check_copies.py --fix_and_overwrite
python utils/check_dummies.py --fix_and_overwrite to make dummy objects for new pipeline and scheduler.
* Make fast tests from PipelineTesterMixin pass.
* make style
* Refactor consistency models pipeline and scheduler:
- Remove support for Karras schedulers (only support CMStochasticIterativeScheduler)
- Move sigma manipulation, input scaling, denoising from pipeline to scheduler
- Make corresponding changes to tests and ensure they pass
* make style
* Add docstrings and further refactor pipeline and scheduler.
* make style
* Add initial version of the consistency models documentation.
* Refactor custom timesteps logic following DDPMScheduler/IFPipeline and temporarily add torch 2.0 SDPA kernel selection logic for debugging.
* make style
* Convert current slow tests to use fp16 and flash attention.
* make style
* Add slow tests for normal attention on cuda device.
* make style
* Fix attention weights loading
* Update consistency model fast tests for new test checkpoints with attention fix.
* make style
* apply suggestions
* Add add_noise method to CMStochasticIterativeScheduler (copied from EulerDiscreteScheduler).
* Conversion script now outputs pipeline instead of UNet and add support for LSUN-256 models and different schedulers.
* When both timesteps and num_inference_steps are supplied, raise warning instead of error (timesteps take precedence).
* make style
* Add remaining diffusers model checkpoints for models in the original consistency model release and update usage example.
* apply suggestions from review
* make style
* fix attention naming
* Add tests for CMStochasticIterativeScheduler.
* make style
* Make CMStochasticIterativeScheduler tests pass.
* make style
* Override test_step_shape in CMStochasticIterativeSchedulerTest instead of modifying it in SchedulerCommonTest.
* make style
* rename some models
* Improve API
* rename some models
* Remove duplicated block
* Add docstring and make torch compile work
* More fixes
* Fixes
* Apply suggestions from code review
* Apply suggestions from code review
* add more docstring
* update consistency conversion script
---------
Co-authored-by: ayushmangal <ayushmangal@microsoft.com>
Co-authored-by: Ayush Mangal <43698245+ayushtues@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add test case for StableDiffusionKDiffusionPipeline noise_sampler
---------
Signed-off-by: GitHub <noreply@github.com>
Co-authored-by: Aisuko <urakiny@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Andrés Mauricio Repetto Ferrero <amd.repetto@gmail.com>
Co-authored-by: estelleafl <estelle.aflalo@intel.com>
Co-authored-by: Aflalo <estellea@isl-iam1.rr.intel.com>
Co-authored-by: Aflalo <estellea@isl-gpu33.rr.intel.com>
Co-authored-by: Aflalo <estellea@isl-gpu38.rr.intel.com>
Co-authored-by: Prathik Rao <prathikr@usc.edu>
Co-authored-by: root <root@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: Prathik Rao <prathikrao@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
Co-authored-by: ayushmangal <ayushmangal@microsoft.com>
Co-authored-by: Ayush Mangal <43698245+ayushtues@users.noreply.github.com>
* Add circular padding option
* Fix style with black
* Fix corner case with small image size
* Add circular padding test cases
* Fix docstring
* Improve docstring for circular padding, remove slow test case
* Update docs for circular padding argument
* Add images comparison for circular padding
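Circular padding wraps the horizontal borders so a panorama's left and right edges line up; the core idea in isolation:

```python
import torch
import torch.nn.functional as F

latent = torch.arange(12.0).reshape(1, 1, 3, 4)

# Circular padding wraps the left/right borders around, so the decoded panorama's
# horizontal edges line up instead of showing a visible seam.
padded = F.pad(latent, (2, 2, 0, 0), mode="circular")
print(padded[0, 0])
```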
* diffusers#4003 - initial implementation of max_inference_steps
* diffusers#4003 - initial implementation of max_inference_steps and first_inference_step for img2img
* diffusers#4003 - use first_inference_step as an input arg for get_timesteps in img2img
* diffusers#4003 Do not add noise during img2img when we have a defined first timestep
* diffusers#4003 Mild updates after revert
* diffusers#4003 Missing change
* Show implementation with denoising_start and end
* Apply suggestions from code review
* Update src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* move to 0.19.0dev
* Apply suggestions from code review
* add exhaustive tests
* add docs
* finish
* Apply suggestions from code review
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* make style
---------
Co-authored-by: bghira <bghira@users.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
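The base/refiner hand-off enabled by `denoising_end` / `denoising_start` (standard ensemble-of-experts usage; the 0.8 split is only an example):

```python
import torch
from diffusers import StableDiffusionXLImg2ImgPipeline, StableDiffusionXLPipeline

base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=base.text_encoder_2,
    vae=base.vae,
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a majestic lion jumping from a big stone at night"
# The base denoises the first 80% of the schedule and hands its latents to the refiner.
latents = base(prompt=prompt, denoising_end=0.8, output_type="latent").images
image = refiner(prompt=prompt, denoising_start=0.8, image=latents).images[0]
```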
* 📝 Fix broken link to models documentation
Corrected the link to the models documentation in the README. Previously, the link was pointing to an incorrect URL. Now, the link directs users to the correct documentation page for more details on the models.
Thanks! 🙌
* Update src/diffusers/models/README.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* refactor to support patching LoRA into T5
instantiate the lora linear layer on the same device as the regular linear layer
get lora rank from state dict
tests
fmt
can create lora layer in float32 even when rest of model is float16
fix loading model hook
remove load_lora_weights_ and T5 dispatching
remove Unet#attn_processors_state_dict
docstrings
* text encoder monkeypatch class method
* fix test
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* refactor prior_transformer
adding conversion script
add pipeline
add step_index from pipeline, + remove permute
add zero pad token
remove copy from statement for betas_for_alpha_bar function
* add
* add
* update conversion script for renderer model
* refactor camera a little bit
* clean up
* style
* fix copies
* Update src/diffusers/schedulers/scheduling_heun_discrete.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/pipelines/shap_e/pipeline_shap_e.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/pipelines/shap_e/pipeline_shap_e.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* alpha_transform_type
* remove step_index argument
* remove get_sigmas_karras
* remove _yiyi_sigma_to_t
* move the rescale prompt_embeds from prior_transformer to pipeline
* replace baddbmm with einsum to match original repo
* Revert "replace baddbmm with einsum to match original repo"
This reverts commit 3f6b435d65.
* add step_index to scale_model_input
* Revert "move the rescale prompt_embeds from prior_transformer to pipeline"
This reverts commit 5b5a8e6be9.
* move rescale from prior_transformer to pipeline
* correct step_index in scale_model_input
* remove print lines
* refactor prior - reduce arguments
* make style
* add prior_image
* arg embedding_proj_norm -> norm_embedding_proj
* add pre-norm for proj_embedding
* move rescale prompt from pipeline to _encode_prompt
* add img2img pipeline
* style
* copies
* Update src/diffusers/models/prior_transformer.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/models/prior_transformer.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/models/prior_transformer.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/models/prior_transformer.py
add arg: encoder_hid_proj
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/models/prior_transformer.py
add new config: norm_in_type
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/models/prior_transformer.py
add new config: added_emb_type
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/models/prior_transformer.py
rename out_dim -> clip_embed_dim
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/models/prior_transformer.py
rename config: out_dim -> clip_embed_dim
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/models/prior_transformer.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/models/prior_transformer.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* finish refactor prior_tranformer
* make style
* refactor renderer
* fix
* make style
* refactor img2img
* remove params_proj
* add test
* add upcast_softmax to prior_transformer
* enable num_images_per_prompt, add save_gif utility
* add
* add fast test
* make style
* add slow test
* style
* add test for img2img
* refactor
* enable batching
* style
* refactor scheduler
* update test
* style
* attempt to solve batch related tests timeout
* add doc
* Update src/diffusers/pipelines/shap_e/pipeline_shap_e.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/pipelines/shap_e/pipeline_shap_e_img2img.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* hardcode rendering related config
* update betas_for_alpha_bar on ddpm_scheduler
* fix copies
* fix
* export_to_gif
* style
* second attempt to speed up batching tests
* add doc page to index
* Remove intermediate clipping
* 3rd attempt to speed up batching tests
* Remove time index
* simplify scheduler
* Fix more
* Fix more
* fix more
* make style
* fix schedulers
* fix some more tests
* finish
* add one more test
* Apply suggestions from code review
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* style
* apply feedbacks
* style
* fix copies
* add one example
* style
* add example for img2img
* fix doc
* fix more doc strings
* size -> frame_size
* style
* update doc
* style
* fix on doc
* update repo name
* improve the usage example in shap-e img2img
* add usage examples in the shap-e docs.
* consolidate examples.
* minor fix.
* update doc
* Apply suggestions from code review
* Apply suggestions from code review
* remove upcast
* Make sure background is white
* Update src/diffusers/pipelines/shap_e/pipeline_shap_e.py
* Apply suggestions from code review
* Finish
* Apply suggestions from code review
* Update src/diffusers/pipelines/shap_e/pipeline_shap_e.py
* Make style
---------
Co-authored-by: yiyixuxu <yixu310@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
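As a quick illustration of the Shap-E text-to-3D flow introduced above, here is a hedged usage sketch; the checkpoint id and the argument values are assumptions and may differ from the final docs.

```python
import torch
from diffusers import ShapEPipeline
from diffusers.utils import export_to_gif

pipe = ShapEPipeline.from_pretrained("openai/shap-e", torch_dtype=torch.float16).to("cuda")

# Each prompt yields a list of rendered frames; frame_size controls the render resolution.
images = pipe(
    "a shark",
    guidance_scale=15.0,
    num_inference_steps=64,
    frame_size=64,
).images

export_to_gif(images[0], "shark_3d.gif")
```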
* Kandinsky2_2
* fix init kandinsky2_2
* kandinsky2_2 fix inpainting
* rename pipelines: remove decoder + 2_2 -> V22
* Update scheduling_unclip.py
* remove text_encoder and tokenizer arguments from doc string
* add test for text2img
* add tests for text2img & img2img
* fix
* add test for inpaint
* add prior tests
* style
* copies
* add controlnet test
* style
* add a test for controlnet_img2img
* update prior_emb2emb api to accept image_embedding or image
* add a test for prior_emb2emb
* style
* remove try except
* example
* fix
* add doc string examples to all kandinsky pipelines
* style
* update doc
* style
* add a tip about 2.2
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* vae -> movq
* vae -> movq
* style
* fix the #copied from
* remove decoder from file name
* update doc: add a section for kandinsky 2.2
* fix
* fix-copies
* add copied from
* add copies from for prior
* add copies from for prior emb2emb
* copy from for img2img
* copied from for inpaint
* more copied from
* more copies from
* more copies
* remove the yiyi comments
* Apply suggestions from code review
* Self-contained example, pipeline order
* Import prior output instead of redefining.
* Style
* Make VQModel compatible with model offload.
* Fix copies
---------
Co-authored-by: Shahmatov Arseniy <62886550+cene555@users.noreply.github.com>
Co-authored-by: yiyixuxu <yixu310@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
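For reference, a minimal sketch of the Kandinsky 2.2 prior + decoder flow described above; the hub repo ids and guidance values are assumptions.

```python
import torch
from diffusers import KandinskyV22Pipeline, KandinskyV22PriorPipeline

prior = KandinskyV22PriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float16
).to("cuda")
decoder = KandinskyV22Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
).to("cuda")

prompt = "a red cat, 4k photo"

# The prior maps text to CLIP image embeddings; the decoder (with the movq VAE) renders pixels.
image_embeds, negative_image_embeds = prior(prompt, guidance_scale=1.0).to_tuple()
image = decoder(
    image_embeds=image_embeds,
    negative_image_embeds=negative_image_embeds,
    height=768,
    width=768,
).images[0]
image.save("cat.png")
```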
* Add new text encoder
* add transformers depth
* More
* Correct conversion script
* Fix more
* Fix more
* Correct more
* correct text encoder
* Finish all
* proof that it works in run local xl
* clean up
* Get refiner to work
* Add red castle
* Fix batch size
* Improve pipelines more
* Finish text2image tests
* Add img2img test
* Fix more
* fix import
* Fix embeddings for classic models (#3888)
Fix embeddings for classic SD models.
* Allow multiple prompts to be passed to the refiner (#3895)
* finish more
* Apply suggestions from code review
* add watermarker
* Model offload (#3889)
* Model offload.
* Model offload for refiner / img2img
* Hardcode encoder offload on img2img vae encode
Saves some GPU RAM in img2img / refiner tasks so it remains below 8 GB.
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* correct
* fix
* clean print
* Update install warning for `invisible-watermark`
* add: missing docstrings.
* fix and simplify the usage example in img2img.
* fix setup for watermarking.
* Revert "fix setup for watermarking."
This reverts commit 491bc9f5a6.
* fix: watermarking setup.
* fix: op.
* run make fix-copies.
* make sure tests pass
* improve convert
* make tests pass
* make tests pass
* better error message
* finish
* finish
* Fix final test
---------
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
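A hedged sketch of the base + refiner usage implied by the SDXL work above; the checkpoint ids and the simple img2img refinement step are assumptions.

```python
import torch
from diffusers import StableDiffusionXLImg2ImgPipeline, StableDiffusionXLPipeline

base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

prompt = "a red castle on a hill, detailed oil painting"

image = base(prompt=prompt).images[0]                   # base text-to-image pass
image = refiner(prompt=prompt, image=image).images[0]   # refiner img2img pass
image.save("red_castle.png")
```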
* use sample directly instead of the dataclass.
* more usage of samples directly instead of dataclasses
* more usage of samples directly instead of dataclasses
* use direct sample in the pipeline.
* direct usage of sample in the img2img case.
* add default to unet output to prevent it from being a required arg
* add unit test
* make style
* adjust unit test
* mark as fast test
* adjust assert statement in test
---------
Co-authored-by: Prathik Rao <prathikrao@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: root <root@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
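The refactor above swaps output-dataclass access for direct tuple access. A small runnable sketch of the two equivalent patterns (the tiny UNet config is only for illustration):

```python
import torch
from diffusers import UNet2DConditionModel

# Tiny, test-sized UNet purely to make the snippet runnable.
unet = UNet2DConditionModel(
    block_out_channels=(32, 64),
    down_block_types=("CrossAttnDownBlock2D", "DownBlock2D"),
    up_block_types=("UpBlock2D", "CrossAttnUpBlock2D"),
    cross_attention_dim=32,
    sample_size=32,
    layers_per_block=1,
)
sample = torch.randn(1, 4, 32, 32)
timestep = torch.tensor([10])
encoder_hidden_states = torch.randn(1, 8, 32)

# Via the output dataclass:
noise_pred = unet(sample, timestep, encoder_hidden_states=encoder_hidden_states).sample

# Direct access, skipping the dataclass construction (what the change above prefers):
noise_pred = unet(sample, timestep, encoder_hidden_states=encoder_hidden_states, return_dict=False)[0]
```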
* initial commit
* Improve consistency models sampling implementation.
* Add CMStochasticIterativeScheduler, which implements the multi-step sampler (stochastic_iterative_sampler) in the original code, and make further improvements to sampling.
* Add Unet blocks for consistency models
* Add conversion script for Unet
* Fix bug in new unet blocks
* Fix attention weight loading
* Make design improvements to ConsistencyModelPipeline and CMStochasticIterativeScheduler and add initial version of tests.
* make style
* Make small random test UNet class conditional and set resnet_time_scale_shift to 'scale_shift' to better match consistency model checkpoints.
* Add support for converting a test UNet and non-class-conditional UNets to the consistency models conversion script.
* make style
* Change num_class_embeds to 1000 to better match the original consistency models implementation.
* Add support for distillation in pipeline_consistency_models.py.
* Improve consistency model tests:
- Get small testing checkpoints from hub
- Modify tests to take into account "distillation" parameter of ConsistencyModelPipeline
- Add onestep, multistep tests for distillation and distillation + class conditional
- Add expected image slices for onestep tests
* make style
* Improve ConsistencyModelPipeline:
- Add initial support for class-conditional generation
- Fix initial sigma for onestep generation
- Fix some sigma shape issues
* make style
* Improve ConsistencyModelPipeline:
- add latents __call__ argument and prepare_latents method
- add check_inputs method
- add initial docstrings for ConsistencyModelPipeline.__call__
* make style
* Fix bug when randomly generating class labels for class-conditional generation.
* Switch CMStochasticIterativeScheduler to configuring a sigma schedule and make related changes to the pipeline and tests.
* Remove some unused code and make style.
* Fix small bug in CMStochasticIterativeScheduler.
* Add expected slices for multistep sampling tests and make them pass.
* Work on consistency model fast tests:
- in pipeline, call self.scheduler.scale_model_input before denoising
- get expected slices for Euler and Heun scheduler tests
- make Euler test pass
- mark Heun test as expected fail because it doesn't support prediction_type "sample" yet
- remove DPM and Euler Ancestral tests because they don't support use_karras_sigmas
* make style
* Refactor conversion script to make it easier to add more model architectures to convert in the future.
* Work on ConsistencyModelPipeline tests:
- Fix device bug when handling class labels in ConsistencyModelPipeline.__call__
- Add slow tests for onestep and multistep sampling and make them pass
- Refactor fast tests
- Refactor ConsistencyModelPipeline.__init__
* make style
* Remove the add_noise and add_noise_to_input methods from CMStochasticIterativeScheduler for now.
* Run python utils/check_copies.py --fix_and_overwrite
python utils/check_dummies.py --fix_and_overwrite to make dummy objects for new pipeline and scheduler.
* Make fast tests from PipelineTesterMixin pass.
* make style
* Refactor consistency models pipeline and scheduler:
- Remove support for Karras schedulers (only support CMStochasticIterativeScheduler)
- Move sigma manipulation, input scaling, denoising from pipeline to scheduler
- Make corresponding changes to tests and ensure they pass
* make style
* Add docstrings and further refactor pipeline and scheduler.
* make style
* Add initial version of the consistency models documentation.
* Refactor custom timesteps logic following DDPMScheduler/IFPipeline and temporarily add torch 2.0 SDPA kernel selection logic for debugging.
* make style
* Convert current slow tests to use fp16 and flash attention.
* make style
* Add slow tests for normal attention on cuda device.
* make style
* Fix attention weights loading
* Update consistency model fast tests for new test checkpoints with attention fix.
* make style
* apply suggestions
* Add add_noise method to CMStochasticIterativeScheduler (copied from EulerDiscreteScheduler).
* Conversion script now outputs pipeline instead of UNet and add support for LSUN-256 models and different schedulers.
* When both timesteps and num_inference_steps are supplied, raise warning instead of error (timesteps take precedence).
* make style
* Add remaining diffusers model checkpoints for models in the original consistency model release and update usage example.
* apply suggestions from review
* make style
* fix attention naming
* Add tests for CMStochasticIterativeScheduler.
* make style
* Make CMStochasticIterativeScheduler tests pass.
* make style
* Override test_step_shape in CMStochasticIterativeSchedulerTest instead of modifying it in SchedulerCommonTest.
* make style
* rename some models
* Improve API
* rename some models
* Remove duplicated block
* Add docstring and make torch compile work
* More fixes
* Fixes
* Apply suggestions from code review
* Apply suggestions from code review
* add more docstring
* update consistency conversion script
---------
Co-authored-by: ayushmangal <ayushmangal@microsoft.com>
Co-authored-by: Ayush Mangal <43698245+ayushtues@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
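A hedged usage sketch for the consistency models pipeline and scheduler described above; the checkpoint id, explicit timesteps, and class label are assumptions based on the general pattern, not a verbatim copy of the final docs.

```python
import torch
from diffusers import ConsistencyModelPipeline

pipe = ConsistencyModelPipeline.from_pretrained(
    "openai/diffusers-cd_imagenet64_l2", torch_dtype=torch.float16
).to("cuda")

# Onestep sampling (the default for consistency models).
image = pipe(num_inference_steps=1).images[0]

# Multistep sampling with custom timesteps; if both are given, timesteps take precedence.
image = pipe(num_inference_steps=None, timesteps=[22, 0]).images[0]

# Class-conditional generation (the ImageNet class id is an arbitrary example).
image = pipe(num_inference_steps=1, class_labels=145).images[0]
image.save("consistency_sample.png")
```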
* Add timestep_spacing to DDPM, LMSDiscrete, PNDM.
* Remove spurious line.
* More easy schedulers.
* Add `linspace` to DDIM
* Noise sigma for `trailing`.
* Add timestep_spacing to DEISMultistepScheduler.
Not sure the range is the way it was intended.
* Fix: remove line used to debug.
* Support timestep_spacing in DPMSolverMultistep, DPMSolverSDE, UniPC
* Fix: convert to numpy.
* Use sched. defaults when instantiating from_config
For params not present in the original configuration.
This makes it possible to switch pipeline schedulers even if they use
different timestep_spacing (or any other param).
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Missing args in DPMSolverMultistep
* Test: default args not in config
* Style
* Fix scheduler name in test
* Remove duplicated entries
* Add test for solver_type
This test currently fails in main. When switching from DEIS to UniPC,
solver_type is "logrho" (the default value from DEIS), which gets
translated to "bh1" by UniPC. This is different to the default value for
UniPC: "bh2". This is where the translation happens: 36d22d0709/src/diffusers/schedulers/scheduling_unipc_multistep.py (L171)
* UniPC: use same default for solver_type
Fixes a bug when switching to UniPC from another scheduler (e.g.,
DEIS) that uses a different solver type. The solver is now the same as
if we had instantiated the scheduler directly.
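To make this behavior concrete, here is a minimal sketch of switching a pipeline to UniPC via `from_config`; parameters absent from the stored config (such as `timestep_spacing` or `solver_type`) fall back to the new scheduler's own defaults, so the result matches instantiating UniPC directly. The checkpoint id is only an example.

```python
from diffusers import DiffusionPipeline, UniPCMultistepScheduler

pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Rebuild the scheduler from the pipeline's current scheduler config. Anything the previous
# scheduler (e.g. DEIS) did not define is filled in with UniPC's own defaults, e.g. solver_type="bh2".
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
```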
* do not save use default values
* fix more
* fix all
* fix schedulers
* fix more
* finish for real
* finish for real
* flaky tests
* Update tests/pipelines/stable_diffusion/test_stable_diffusion_pix2pix_zero.py
* Default steps_offset to 0.
* Add missing docstrings
* Apply suggestions from code review
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Improve memory text to video
* Apply suggestions from code review
* add test
* Apply suggestions from code review
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* finish test setup
---------
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* - Added validation parameters
- Changed some parameter descriptions to better explain their use.
- Fixed a few typos.
- Added concept_list parameter for better management of multiple subjects
- changed logic for image validation
* - Fixed bad logic for class data root directories
* Defaulting validation_steps to None for simpler logic
* Fixed multiple validation prompts
* Fixed bug on validation negative prompt
* Changed validation logic for tracker.
* Added uuid for validation image labeling
* Fix error when comparing validation prompts and validation negative prompts
* Improved error message when there are more validation negative prompts than validation prompts
* - Changed image tracking number from epoch to global_step
- Added Typing for functions
* Added more validations when using the concept_list parameter together with the regular ones.
* Fixed error message
* Added more validations for validation parameters
* Improved messaging for errors
* Fixed validation error for parameters with default values
* - Added train step to image name for validation
- reformatted code
* - Added train step to image's name for validation
- reformatted code
* Updated README.md file.
* reverted back original script of train_dreambooth.py
* reverted back original script of train_dreambooth.py
* left one blank line at the eof
* reverted back setup.py
* reverted back setup.py
* added the same logic for when prior-preservation parameters are used without enabling the flag while using the concept_list parameter.
* Ran black formatter.
* fixed a few strings
* fixed import sort with isort and removed fstrings without placeholder
* fixed import order with ruff (since with isort wasn't ok)
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Correct controlnet out of list error
* Apply suggestions from code review
* correct tests
* correct tests
* fix
* test all
* Apply suggestions from code review
* test all
* test all
* Apply suggestions from code review
* Apply suggestions from code review
* fix more tests
* Fix more
* Apply suggestions from code review
* finish
* Apply suggestions from code review
* Update src/diffusers/schedulers/scheduling_k_dpm_2_ancestral_discrete.py
* finish
* Support for manual CLIP loading in StableDiffusionPipeline - txt2img.
* Update src/diffusers/pipelines/stable_diffusion/convert_from_ckpt.py
* Update variables & corresponding docs to match previous style.
* Updated to match style & quality of 'diffusers'
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add guidance start/stop
* Add guidance start/stop to inpaint class
* Black formatting
* Add support for guidance for multicontrolnet
* Add inclusive end
* Improve design
* correct imports
* Finish
* Finish all
* Correct more
* make style
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
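A hedged sketch of the guidance start/stop arguments added above, restricting ControlNet conditioning to a fraction of the denoising steps; the checkpoint ids and example image URL are assumptions.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

canny_image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/bird_canny.png"
)

# Apply ControlNet conditioning only during the first 80% of the denoising steps.
image = pipe(
    "a bird on a branch, best quality",
    image=canny_image,
    control_guidance_start=0.0,
    control_guidance_end=0.8,
).images[0]
image.save("controlnet_partial_guidance.png")
```

With MultiControlNet, `control_guidance_start` and `control_guidance_end` can also be passed as lists, one value per ControlNet.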
* add paradigms parallel sampling pipeline
* linting
* ran make fix-copies
* add paradigms parallel sampling pipeline
* linting
* ran make fix-copies
* Apply suggestions from code review
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* changes based on review
* add docs for paradigms
* update docs with paradigms abstract
* improve documentation, and add tests for ddim/ddpm batch_step_no_noise
* fix docs and run make fix-copies
* minor changes to docs.
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* move parallel scheduler to new classes for DDPMParallelScheduler and DDIMParallelScheduler
* remove changes for scheduling_ddim, adjust licenses, credits, and commented code
* fix tensor type that is breaking tests
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add entry for safe stable diffusion to the sd overview page.
* add missing pipelines to the broader overview section in the pipelines docs.
* address PR feedback.
* refactor: readme serialized from the example when push_to_hub is True.
* fix: batch size arg.
* a bit better formatting
* minor fixes.
* add note on env.
* Apply suggestions from code review
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* condition wandb info better
* make mixed_precision assignment in cli args explicit.
* separate inference block for sample images.
* Apply suggestions from code review
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* address more comments.
* autocast mode.
* correct None image type problem.
* fix: list assignment.
* minor fix.
---------
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* added ldm3d pipeline and updated image processor to support depth
* added description
* added paper reference
* added docs
* fixed bug
* added test
* Update tests/pipelines/stable_diffusion/test_stable_diffusion_ldm3d.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update tests/pipelines/stable_diffusion/test_stable_diffusion_ldm3d.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_ldm3d.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_ldm3d.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_ldm3d.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_ldm3d.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_ldm3d.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_ldm3d.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_ldm3d.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_ldm3d.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_ldm3d.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_ldm3d.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_ldm3d.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* added reference in index.mdx
* reverted changes to image processor
* added LDM3DOutput
* Fixes with make style
* fix failing tests for make fix-copies
* aligned with our version
* Update pipeline_stable_diffusion_ldm3d.py
updated the guidance scale
* Fix for failing check_code_quality test
* Code review feedback
* Fix typo in ldm3d_diffusion.mdx
* updated the doc accordingly
* copyrights
* fixed test failure
* make style
* added image processor of LDM3D in the documentation:
* added ldm3d doc to toctree
* run make style && make quality
* run make fix-copies
* Update docs/source/en/api/image_processor.mdx
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update docs/source/en/api/pipelines/stable_diffusion/ldm3d_diffusion.mdx
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Update docs/source/en/api/pipelines/stable_diffusion/ldm3d_diffusion.mdx
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* updated the safety checker to accept tuple
* make style and make quality
* Update src/diffusers/pipelines/stable_diffusion/__init__.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_ldm3d.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_ldm3d.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_ldm3d.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* LDM3D output
* up
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Aflalo <estellea@isl-gpu27.rr.intel.com>
Co-authored-by: Anahita Bhiwandiwalla <anahita.bhiwandiwalla@intel.com>
Co-authored-by: Aflalo <estellea@isl-gpu26.rr.intel.com>
Co-authored-by: Aflalo <estellea@isl-iam1.rr.intel.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Aflalo <estellea@isl-gpu42.rr.intel.com>
Co-authored-by: Aflalo <estellea@isl-gpu43.rr.intel.com>
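For reference, a hedged sketch of the LDM3D pipeline usage added above, producing an RGB image and a depth map in one pass; the checkpoint id and output field names follow the pattern described in this PR and are assumptions.

```python
from diffusers import StableDiffusionLDM3DPipeline

pipe = StableDiffusionLDM3DPipeline.from_pretrained("Intel/ldm3d").to("cuda")

output = pipe("a photo of an astronaut riding a horse on mars")
rgb_image, depth_image = output.rgb[0], output.depth[0]

rgb_image.save("astronaut_rgb.png")
depth_image.save("astronaut_depth.png")
```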
* modify the issue template to include core maintainers.
* add: entry for audio.
* Update .github/ISSUE_TEMPLATE/bug-report.yml
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
---------
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Update pipeline_flax_controlnet.py
Change type of images array from jax.numpy.array to numpy.ndarray to permit in-place modification of the array when the safety checker detects an NSFW image.
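A minimal sketch of why this matters: JAX arrays are immutable, so the safety checker cannot black out flagged frames in place; converting to a NumPy array first makes the buffer writable. The flagging logic below is illustrative only.

```python
import jax.numpy as jnp
import numpy as np

images = jnp.zeros((2, 512, 512, 3))   # what the pipeline used to return
images = np.array(images)              # mutable host copy, as in the change above

has_nsfw_concept = [False, True]        # illustrative safety-checker output
for i, flagged in enumerate(has_nsfw_concept):
    if flagged:
        images[i] = np.zeros(images[i].shape)  # in-place replacement now works
```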
* fix docs typos. add frame_ids argument to text2video-zero pipeline call
* make style && make quality
* add support of pytorch 2.0 scaled_dot_product_attention for CrossFrameAttnProcessor
* add chunk-by-chunk processing to text2video-zero docs
* make style && make quality
* Update docs/source/en/api/pipelines/text_to_video_zero.mdx
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
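A hedged sketch of the chunk-by-chunk processing mentioned above: longer videos are generated in chunks while frame 0 is reused as the shared anchor via the new `frame_ids` argument. The checkpoint id and chunking parameters are assumptions.

```python
import numpy as np
import torch
from diffusers import TextToVideoZeroPipeline

pipe = TextToVideoZeroPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a panda is playing guitar on times square"
seed, video_length, chunk_size = 0, 8, 4
generator = torch.Generator(device="cuda")

chunk_starts = np.arange(0, video_length, chunk_size - 1)
result = []
for i, start in enumerate(chunk_starts):
    end = video_length if i == len(chunk_starts) - 1 else chunk_starts[i + 1]
    # Frame 0 is always included so every chunk is conditioned on the same anchor frame.
    frame_ids = [0] + list(range(start, end))
    generator.manual_seed(seed)
    output = pipe(prompt=prompt, video_length=len(frame_ids), generator=generator, frame_ids=frame_ids)
    result.append(output.images[1:])    # drop the repeated anchor frame

video = np.concatenate(result)
```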
* added StableDiffusionCanvasPipeline pipeline
* Added utils codes to pipe_utils file.
* make style
* delete mixture.py and Text2ImageRegion class
* make style
* Added the codes to the readme.md file.
* Moved functions from pipeline_utils to mix_canvas
Thus, issues are of the same importance as pull requests when contributing to this library ❤️.
In order to make your issue as **useful for the community as possible**, let's try to stick to some simple guidelines:
- 1. Please try to be as precise and concise as possible.
*Give your issue a fitting title. Assume that someone with very limited knowledge of diffusers can understand your issue. Add links to the source code, documentation, other issues, pull requests etc...*
*Give your issue a fitting title. Assume that someone with very limited knowledge of Diffusers can understand your issue. Add links to the source code, documentation, other issues, pull requests etc...*
- 2. If your issue is about something not working, **always** provide a reproducible code snippet. The reader should be able to reproduce your issue by **only copy-pasting your code snippet into a Python shell**.
*The community cannot solve your issue if it cannot reproduce it. If your bug is related to training, add your training script and make everything needed to train public. Otherwise, just add a simple Python code snippet.*
- 3. Add the **minimum amount of code / context that is needed to understand, reproduce your issue**.
- 3. Add the **minimum** amount of code / context that is needed to understand, reproduce your issue.
*Make the life of maintainers easy. `diffusers` is getting many issues every day. Make sure your issue is about one bug and one bug only. Make sure you add only the context, code needed to understand your issues - nothing more. Generally, every issue is a way of documenting this library, try to make it a good documentation entry.*
- 4. For issues related to community pipelines (i.e., the pipelines located in the `examples/community` folder), please tag the author of the pipeline in your issue thread as those pipelines are not maintained.
- type: markdown
  attributes:
    value: |
For more in-detail information on how to write good issues you can have a look [here](https://huggingface.co/course/chapter8/5?fw=pt)
For more in-detail information on how to write good issues you can have a look [here](https://huggingface.co/course/chapter8/5?fw=pt).
- type: textarea
  id: bug-description
  attributes:
@@ -46,6 +47,64 @@ body:
attributes:
  label: System Info
  description: Please share your system info with us. You can run the command `diffusers-cli env` and copy-paste its output below.
about: Start a new translation effort in your language
title: '[<languageCode>] Translating docs to <languageName>'
labels: WIP
assignees: ''
---
<!--
Note: Please search to see if an issue already exists for the language you are trying to translate.
-->
Hi!
Let's bring the documentation to all the <languageName>-speaking community 🌐.
Who would want to translate? Please follow the 🤗 [TRANSLATING guide](https://github.com/huggingface/diffusers/blob/main/docs/TRANSLATING.md). Here is a list of the files ready for translation. Let us know in this issue if you'd like to translate any, and we'll add your name to the list.
Some notes:
* Please translate using an informal tone (imagine you are talking with a friend about Diffusers 🤗).
* Please translate in a gender-neutral way.
* Add your translations to the folder called `<languageCode>` inside the [source folder](https://github.com/huggingface/diffusers/tree/main/docs/source).
* Register your translation in `<languageCode>/_toctree.yml`; please follow the order of the [English version](https://github.com/huggingface/diffusers/blob/main/docs/source/en/_toctree.yml).
* Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue. Please ping @stevhliu for review.
* 🙋 If you'd like others to help you with the translation, you can also post in the 🤗 [forums](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63).
Congratulations! You've made it this far! You're not quite done yet though.
Once merged, your PR is going to appear in the release notes with the title you set, so make sure it's a great title that fully reflects the extent of your awesome contribution.
Then, please replace this with a description of the change and which issue is fixed (if applicable). Please also include relevant motivation and context. List any dependencies (if any) that are required for this change.
Once you're done, someone will review your PR shortly (see the section "Who can review?" below to tag some potential reviewers). They may suggest changes to make the code even better. If no one reviewed your PR after a week has passed, don't hesitate to post a new comment @-mentioning the same persons---sometimes notifications get lost.
-->
<!-- Remove if not applicable -->
Fixes # (issue)
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you read the [contributor guideline](https://github.com/huggingface/diffusers/blob/main/CONTRIBUTING.md)?
- [ ] Did you read our [philosophy doc](https://github.com/huggingface/diffusers/blob/main/PHILOSOPHY.md) (important for complex PRs)?
- [ ] Was this discussed/approved via a GitHub issue or the [forum](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63)? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the
[documentation guidelines](https://github.com/huggingface/diffusers/tree/main/docs), and
[here are tips on formatting docstrings](https://github.com/huggingface/diffusers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?
## Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
<!-- Your PR will be replied to more quickly if you can figure out the right person to tag with @.
If you know how to use git blame, that is the easiest way, otherwise, here is a rough guide of **who to tag**.
Please tag fewer than 3 people.
Core library:
- Schedulers: @yiyixuxu
- Pipelines and pipeline callbacks: @yiyixuxu and @asomoza
- Training examples: @sayakpaul
- Docs: @stevhliu and @sayakpaul
- JAX and MPS: @pcuenca
- Audio: @sanchit-gandhi
- General functionalities: @sayakpaul @yiyixuxu @DN6
echo "Quality check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make style && make quality'" >> $GITHUB_STEP_SUMMARY
check_repository_consistency:
  needs: check_code_quality
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v3
    - name: Set up Python
      uses: actions/setup-python@v4
      with:
        python-version: "3.8"
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install .[quality]
    - name: Check repo consistency
      run: |
        python utils/check_copies.py
        python utils/check_dummies.py
        make deps_table_check_updated
    - name: Check if failure
      if: ${{ failure() }}
      run: |
        echo "Repo consistency check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make fix-copies'" >> $GITHUB_STEP_SUMMARY
echo "Quality check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make style && make quality'" >> $GITHUB_STEP_SUMMARY
check_repository_consistency:
needs:check_code_quality
runs-on:ubuntu-latest
steps:
- uses:actions/checkout@v3
- name:Set up Python
uses:actions/setup-python@v4
with:
python-version:"3.8"
- name:Install dependencies
run:|
python -m pip install --upgrade pip
pip install .[quality]
- name:Check repo consistency
run:|
python utils/check_copies.py
python utils/check_dummies.py
make deps_table_check_updated
- name:Check if failure
if:${{ failure() }}
run:|
echo "Repo consistency check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make fix-copies'" >> $GITHUB_STEP_SUMMARY
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -14,7 +14,7 @@ specific language governing permissions and limitations under the License.
We ❤️ contributions from the open-source community! Everyone is welcome, and all types of participation –not just code– are valued and appreciated. Answering questions, helping others, reaching out, and improving the documentation are all immensely valuable to the community, so don't be afraid and get involved if you're up for it!
Everyone is encouraged to start by saying 👋 in our public Discord channel. We discuss the latest trends in diffusion models, ask questions, show off personal projects, help each other with contributions, or just hang out ☕. <a href="https://Discord.gg/G7tWnz98XR"><img alt="Join us on Discord" src="https://img.shields.io/Discord/823813159592001537?color=5865F2&logo=Discord&logoColor=white"></a>
Everyone is encouraged to start by saying 👋 in our public Discord channel. We discuss the latest trends in diffusion models, ask questions, show off personal projects, help each other with contributions, or just hang out ☕. <a href="https://discord.gg/G7tWnz98XR"><img alt="Join us on Discord" src="https://img.shields.io/discord/823813159592001537?color=5865F2&logo=Discord&logoColor=white"></a>
Whichever way you choose to contribute, we strive to be part of an open, welcoming, and kind community. Please, read our [code of conduct](https://github.com/huggingface/diffusers/blob/main/CODE_OF_CONDUCT.md) and be mindful to respect it during your interactions. We also recommend you become familiar with the [ethical guidelines](https://huggingface.co/docs/diffusers/conceptual/ethical_guidelines) that guide our project and ask you to adhere to the same principles of transparency and responsibility.
@@ -28,11 +28,11 @@ the core library.
In the following, we give an overview of different ways to contribute, ranked by difficulty in ascending order. All of them are valuable to the community.
* 1. Asking and answering questions on [the Diffusers discussion forum](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers) or on [Discord](https://discord.gg/G7tWnz98XR).
* 2. Opening new issues on [the GitHub Issues tab](https://github.com/huggingface/diffusers/issues/new/choose)
* 3. Answering issues on [the GitHub Issues tab](https://github.com/huggingface/diffusers/issues)
* 2. Opening new issues on [the GitHub Issues tab](https://github.com/huggingface/diffusers/issues/new/choose).
* 3. Answering issues on [the GitHub Issues tab](https://github.com/huggingface/diffusers/issues).
* 4. Fix a simple issue, marked by the "Good first issue" label, see [here](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22).
* 5. Contribute to the [documentation](https://github.com/huggingface/diffusers/tree/main/docs/source).
* 6. Contribute a [Community Pipeline](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3Acommunity-examples)
* 6. Contribute a [Community Pipeline](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3Acommunity-examples).
* 7. Contribute to the [examples](https://github.com/huggingface/diffusers/tree/main/examples).
* 8. Fix a more difficult issue, marked by the "Good second issue" label, see [here](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22Good+second+issue%22).
* 9. Add a new pipeline, model, or scheduler, see ["New Pipeline/Model"](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+pipeline%2Fmodel%22) and ["New scheduler"](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+scheduler%22) issues. For this contribution, please have a look at [Design Philosophy](https://github.com/huggingface/diffusers/blob/main/PHILOSOPHY.md).
@@ -40,7 +40,7 @@ In the following, we give an overview of different ways to contribute, ranked by
As said before, **all contributions are valuable to the community**.
In the following, we will explain each contribution a bit more in detail.
For all contributions 4.-9. you will need to open a PR. It is explained in detail how to do so in [Opening a pull requst](#how-to-open-a-pr)
For all contributions 4-9, you will need to open a PR. It is explained in detail how to do so in [Opening a pull request](#how-to-open-a-pr).
### 1. Asking and answering questions on the Diffusers discussion forum or on the Diffusers Discord
@@ -63,7 +63,7 @@ In the same spirit, you are of immense help to the community by answering such q
**Please** keep in mind that the more effort you put into asking or answering a question, the higher
the quality of the publicly documented knowledge. In the same way, well-posed and well-answered questions create a high-quality knowledge database accessible to everybody, while badly posed questions or answers reduce the overall quality of the public knowledge database.
In short, a high quality question or answer is *precise*, *concise*, *relevant*, *easy-to-understand*, *accesible*, and *well-formated/well-posed*. For more information, please have a look through the [How to write a good issue](#how-to-write-a-good-issue) section.
In short, a high quality question or answer is *precise*, *concise*, *relevant*, *easy-to-understand*, *accessible*, and *well-formated/well-posed*. For more information, please have a look through the [How to write a good issue](#how-to-write-a-good-issue) section.
**NOTE about channels**:
[*The forum*](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) is much better indexed by search engines, such as Google. Posts are ranked by popularity rather than chronologically. Hence, it's easier to look up questions and answers that we posted some time ago.
@@ -91,12 +91,12 @@ open a new issue nevertheless and link to the related issue.
New issues usually include the following.
#### 2.1. Reproducible, minimal bug reports.
#### 2.1. Reproducible, minimal bug reports
A bug report should always have a reproducible code snippet and be as minimal and concise as possible.
This means in more detail:
- Narrow the bug down as much as you can, **do not just dump your whole code file**
- Format your code
- Narrow the bug down as much as you can, **do not just dump your whole code file**.
- Format your code.
- Do not include any external libraries except for Diffusers depending on them.
- **Always** provide all necessary information about your environment; for this, you can run: `diffusers-cli env` in your shell and copy-paste the displayed information to the issue.
- Explain the issue. If the reader doesn't know what the issue is and why it is an issue, she cannot solve it.
@@ -105,9 +105,9 @@ This means in more detail:
For more information, please have a look through the [How to write a good issue](#how-to-write-a-good-issue) section.
You can open a bug report [here](https://github.com/huggingface/diffusers/issues/new/choose).
You can open a bug report [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=bug&projects=&template=bug-report.yml).
#### 2.2. Feature requests.
#### 2.2. Feature requests
A world-class feature request addresses the following points:
@@ -125,21 +125,21 @@ Awesome! Tell us what problem it solved for you.
You can open a feature request [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feature_request.md&title=).
#### 2.3 Feedback.
#### 2.3 Feedback
Feedback about the library design and why it is good or not good helps the core maintainers immensely to build a user-friendly library. To understand the philosophy behind the current design philosophy, please have a look [here](https://huggingface.co/docs/diffusers/conceptual/philosophy). If you feel like a certain design choice does not fit with the current design philosophy, please explain why and how it should be changed. If a certain design choice follows the design philosophy too much, hence restricting use cases, explain why and how it should be changed.
If a certain design choice is very useful for you, please also leave a note as this is great feedback for future design decisions.
You can open an issue about feedback [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feedback.md&title=).
#### 2.4 Technical questions.
#### 2.4 Technical questions
Technical questions are mainly about why certain code of the library was written in a certain way, or what a certain part of the code does. Please make sure to link to the code in question and please provide detail on
why this part of the code is difficult to understand.
You can open an issue about a technical question [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=bug&template=bug-report.yml).
#### 2.5 Proposal to add a new model, scheduler, or pipeline.
#### 2.5 Proposal to add a new model, scheduler, or pipeline
If the diffusion model community released a new model, pipeline, or scheduler that you would like to see in the Diffusers library, please provide the following information:
@@ -156,19 +156,19 @@ You can open a request for a model/pipeline/scheduler [here](https://github.com/
Answering issues on GitHub might require some technical knowledge of Diffusers, but we encourage everybody to give it a try even if you are not 100% certain that your answer is correct.
Some tips to give a high-quality answer to an issue:
- Be as concise and minimal as possible
- Be as concise and minimal as possible.
- Stay on topic. An answer to the issue should concern the issue and only the issue.
- Provide links to code, papers, or other sources that prove or encourage your point.
- Answer in code. If a simple code snippet is the answer to the issue or shows how the issue can be solved, please provide a fully reproducible code snippet.
Also, many issues tend to be simply off-topic, duplicates of other issues, or irrelevant. It is of great
help to the maintainers if you can answer such issues, encouraging the author of the issue to be
more precise, provide the link to a duplicated issue or redirect them to [the forum](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) or [Discord](https://discord.gg/G7tWnz98XR)
more precise, provide the link to a duplicated issue or redirect them to [the forum](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) or [Discord](https://discord.gg/G7tWnz98XR).
If you have verified that the issued bug report is correct and requires a correction in the source code,
please have a look at the next sections.
For all of the following contributions, you will need to open a PR. It is explained in detail how to do so in the [Opening a pull requst](#how-to-open-a-pr) section.
For all of the following contributions, you will need to open a PR. It is explained in detail how to do so in the [Opening a pull request](#how-to-open-a-pr) section.
### 4. Fixing a "Good first issue"
@@ -202,7 +202,7 @@ Please have a look at [this page](https://github.com/huggingface/diffusers/tree/
### 6. Contribute a community pipeline
[Pipelines](https://huggingface.co/docs/diffusers/api/pipelines/overview) are usually the first point of contact between the Diffusers library and the user.
Pipelines are examples of how to use Diffusers [models](https://huggingface.co/docs/diffusers/api/models) and [schedulers](https://huggingface.co/docs/diffusers/api/schedulers/overview).
Pipelines are examples of how to use Diffusers [models](https://huggingface.co/docs/diffusers/api/models/overview) and [schedulers](https://huggingface.co/docs/diffusers/api/schedulers/overview).
We support two types of pipelines:
- Official Pipelines
@@ -242,27 +242,28 @@ We support two types of training examples:
Research training examples are located in [examples/research_projects](https://github.com/huggingface/diffusers/tree/main/examples/research_projects) whereas official training examples include all folders under [examples](https://github.com/huggingface/diffusers/tree/main/examples) except the `research_projects` and `community` folders.
The official training examples are maintained by the Diffusers' core maintainers whereas the research training examples are maintained by the community.
This is because of the same reasons put forward in [6. Contribute a community pipeline](#contribute-a-community-pipeline) for official pipelines vs. community pipelines: It is not feasible for the core maintainers to maintain all possible training methods for diffusion models.
This is because of the same reasons put forward in [6. Contribute a community pipeline](#6-contribute-a-community-pipeline) for official pipelines vs. community pipelines: It is not feasible for the core maintainers to maintain all possible training methods for diffusion models.
If the Diffusers core maintainers and the community consider a certain training paradigm to be too experimental or not popular enough, the corresponding training code should be put in the `research_projects` folder and maintained by the author.
Both official training and research examples consist of a directory that contains one or more training scripts, a requirements.txt file, and a README.md file. In order for the user to make use of the
Both official training and research examples consist of a directory that contains one or more training scripts, a `requirements.txt` file, and a `README.md` file. In order for the user to make use of the
training examples, it is required to clone the repository:
Therefore when adding an example, the `requirements.txt` file shall define all pip dependencies required for your training example so that once all those are installed, the user can run the example's training script. See, for example, the [DreamBooth `requirements.txt` file](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/requirements.txt).
Training examples of the Diffusers library should adhere to the following philosophy:
- All the code necessary to run the examples should be found in a single Python file
- One should be able to run the example from the command line with `python <your-example>.py --args`
- All the code necessary to run the examples should be found in a single Python file.
- One should be able to run the example from the command line with `python <your-example>.py --args`.
- Examples should be kept simple and serve as **an example** on how to use Diffusers for training. The purpose of example scripts is **not** to create state-of-the-art diffusion models, but rather to reproduce known training schemes without adding too much custom logic. As a byproduct of this point, our examples also strive to serve as good educational materials.
To contribute an example, it is highly recommended to look at already existing examples such as [dreambooth](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth.py) to get an idea of how they should look like.
@@ -281,7 +282,7 @@ If you are contributing to the official training examples, please also make sure
usually more complicated to solve than [Good first issues](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22).
The issue description usually gives less guidance on how to fix the issue and requires
a decent understanding of the library by the interested contributor.
If you are interested in tackling a second good issue, feel free to open a PR to fix it and link the PR to the issue. If you see that a PR has already been opened for this issue but did not get merged, have a look to understand why it wasn't merged and try to open an improved PR.
If you are interested in tackling a good second issue, feel free to open a PR to fix it and link the PR to the issue. If you see that a PR has already been opened for this issue but did not get merged, have a look to understand why it wasn't merged and try to open an improved PR.
Good second issues are usually more difficult to get merged compared to good first issues, so don't hesitate to ask for help from the core maintainers. If your PR is almost finished the core maintainers can also jump into your PR and commit to it in order to get it merged.
### 9. Adding pipelines, models, schedulers
@@ -297,7 +298,7 @@ if you don't know yet what specific component you would like to add:
- [Model or pipeline](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+pipeline%2Fmodel%22)
Before adding any of the three components, it is strongly recommended that you give the [Philosophy guide](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22Good+second+issue%22) a read to better understand the design of any of the three components. Please be aware that
Before adding any of the three components, it is strongly recommended that you give the [Philosophy guide](https://github.com/huggingface/diffusers/blob/main/PHILOSOPHY.md) a read to better understand the design of any of the three components. Please be aware that
we cannot merge model, scheduler, or pipeline additions that strongly diverge from our design philosophy
as it will lead to API inconsistencies. If you fundamentally disagree with a design choice, please
open a [Feedback issue](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feedback.md&title=) instead so that it can be discussed whether a certain design
@@ -337,8 +338,8 @@ to be merged;
9. Add high-coverage tests. No quality testing = no merge.
- If you are adding new `@slow` tests, make sure they pass using
CircleCI does not run the slow tests, but GitHub actions does every night!
10. All public methods must have informative docstrings that work nicely with markdown. See `[pipeline_latent_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion/pipeline_latent_diffusion.py)` for an example.
CircleCI does not run the slow tests, but GitHub Actions does every night!
10. All public methods must have informative docstrings that work nicely with markdown. See [`pipeline_latent_diffusion.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion/pipeline_latent_diffusion.py) for an example.
11. Due to the rapidly growing repository, it is important to make sure that no files that would significantly weigh down the repository are added. This includes images, videos, and other non-text files. We prefer to leverage a hf.co hosted `dataset` like
[`hf-internal-testing`](https://huggingface.co/hf-internal-testing) or [huggingface/documentation-images](https://huggingface.co/datasets/huggingface/documentation-images) to place these files.
If an external contribution, feel free to add the images to your PR and ask a Hugging Face member to migrate your images
@@ -355,7 +356,7 @@ You will need basic `git` proficiency to be able to contribute to
manual. Type `git --help` in a shell and enjoy. If you prefer books, [Pro
Git](https://git-scm.com/book/en/v2) is a very good reference.
Follow these steps to start contributing ([supported Python versions](https://github.com/huggingface/diffusers/blob/main/setup.py#L244)):
Follow these steps to start contributing ([supported Python versions](https://github.com/huggingface/diffusers/blob/42f25d601a910dceadaee6c44345896b4cfa9928/setup.py#L270)):
1. Fork the [repository](https://github.com/huggingface/diffusers) by
clicking on the 'Fork' button on the repository's page. This creates a copy of the code
@@ -364,7 +365,7 @@ under your GitHub user account.
2. Clone your fork to your local disk, and add the base repository as a remote:
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -22,7 +22,7 @@ In a nutshell, Diffusers is built to be a natural extension of PyTorch. Therefor
## Usability over Performance
- While Diffusers has many built-in performance-enhancing features (see [Memory and Speed](https://huggingface.co/docs/diffusers/optimization/fp16)), models are always loaded with the highest precision and lowest optimization. Therefore, by default diffusion pipelines are always instantiated on CPU with float32 precision if not otherwise defined by the user. This ensures usability across different platforms and accelerators and means that no complex installations are required to run the library.
- Diffusers aim at being a **light-weight** package and therefore has very few required dependencies, but many soft dependencies that can improve performance (such as `accelerate`, `safetensors`, `onnx`, etc...). We strive to keep the library as lightweight as possible so that it can be added without much concern as a dependency on other packages.
- Diffusers aims to be a **light-weight** package and therefore has very few required dependencies, but many soft dependencies that can improve performance (such as `accelerate`, `safetensors`, `onnx`, etc...). We strive to keep the library as lightweight as possible so that it can be added without much concern as a dependency on other packages.
- Diffusers prefers simple, self-explainable code over condensed, magic code. This means that short-hand code syntaxes such as lambda functions, and advanced PyTorch operators are often not desired.
## Simple over easy
@@ -31,13 +31,13 @@ As PyTorch states, **explicit is better than implicit** and **simple is better t
- We follow PyTorch's API with methods like [`DiffusionPipeline.to`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.to) to let the user handle device management.
- Raising concise error messages is preferred to silently correct erroneous input. Diffusers aims at teaching the user, rather than making the library as easy to use as possible.
- Complex model vs. scheduler logic is exposed instead of magically handled inside. Schedulers/Samplers are separated from diffusion models with minimal dependencies on each other. This forces the user to write the unrolled denoising loop. However, the separation allows for easier debugging and gives the user more control over adapting the denoising process or switching out diffusion models or schedulers.
- Separately trained components of the diffusion pipeline, *e.g.* the text encoder, the unet, and the variational autoencoder, each have their own model class. This forces the user to handle the interaction between the different model components, and the serialization format separates the model components into different files. However, this allows for easier debugging and customization. Dreambooth or textual inversion training
is very simple thanks to diffusers' ability to separate single components of the diffusion pipeline.
- Separately trained components of the diffusion pipeline, *e.g.* the text encoder, the UNet, and the variational autoencoder, each has their own model class. This forces the user to handle the interaction between the different model components, and the serialization format separates the model components into different files. However, this allows for easier debugging and customization. DreamBooth or Textual Inversion training
is very simple thanks to Diffusers' ability to separate single components of the diffusion pipeline.
## Tweakable, contributor-friendly over abstraction
For large parts of the library, Diffusers adopts an important design principle of the [Transformers library](https://github.com/huggingface/transformers), which is to prefer copy-pasted code over hasty abstractions. This design principle is very opinionated and stands in stark contrast to popular design principles such as [Don't repeat yourself (DRY)](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself).
In short, just like Transformers does for modeling files, diffusers prefers to keep an extremely low level of abstraction and very self-contained code for pipelines and schedulers.
In short, just like Transformers does for modeling files, Diffusers prefers to keep an extremely low level of abstraction and very self-contained code for pipelines and schedulers.
Functions, long code blocks, and even classes can be copied across multiple files which at first can look like a bad, sloppy design choice that makes the library unmaintainable.
**However**, this design has proven to be extremely successful for Transformers and makes a lot of sense for community-driven, open-source machine learning libraries because:
- Machine Learning is an extremely fast-moving field in which paradigms, model architectures, and algorithms are changing rapidly, which therefore makes it very difficult to define long-lasting code abstractions.
@@ -47,30 +47,30 @@ Functions, long code blocks, and even classes can be copied across multiple file
At Hugging Face, we call this design the **single-file policy** which means that almost all of the code of a certain class should be written in a single, self-contained file. To read more about the philosophy, you can have a look
at [this blog post](https://huggingface.co/blog/transformers-design-philosophy).
In diffusers, we follow this philosophy for both pipelines and schedulers, but only partly for diffusion models. The reason we don't follow this design fully for diffusion models is because almost all diffusion pipelines, such
as [DDPM](https://huggingface.co/docs/diffusers/v0.12.0/en/api/pipelines/ddpm), [Stable Diffusion](https://huggingface.co/docs/diffusers/v0.12.0/en/api/pipelines/stable_diffusion/overview#stable-diffusion-pipelines), [UnCLIP (Dalle-2)](https://huggingface.co/docs/diffusers/v0.12.0/en/api/pipelines/unclip#overview) and [Imagen](https://imagen.research.google/) all rely on the same diffusion model, the [UNet](https://huggingface.co/docs/diffusers/api/models#diffusers.UNet2DConditionModel).
In Diffusers, we follow this philosophy for both pipelines and schedulers, but only partly for diffusion models. The reason we don't follow this design fully for diffusion models is because almost all diffusion pipelines, such
as [DDPM](https://huggingface.co/docs/diffusers/api/pipelines/ddpm), [Stable Diffusion](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview#stable-diffusion-pipelines), [unCLIP (DALL·E 2)](https://huggingface.co/docs/diffusers/api/pipelines/unclip) and [Imagen](https://imagen.research.google/) all rely on the same diffusion model, the [UNet](https://huggingface.co/docs/diffusers/api/models/unet2d-cond).
Great, now you should have generally understood why 🧨 Diffusers is designed the way it is 🤗.
We try to apply these design principles consistently across the library. Nevertheless, there are some minor exceptions to the philosophy or some unlucky design choices. If you have feedback regarding the design, we would ❤️ to hear it [directly on GitHub](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feedback.md&title=).
## Design Philosophy in Details
Now, let's look a bit into the nitty-gritty details of the design philosophy. Diffusers essentially consists of three major classes: [pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines), [models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models), and [schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers).
Let's walk through more detailed design decisions for each class.
### Pipelines
Pipelines are designed to be easy to use (therefore do not follow [*Simple over easy*](#simple-over-easy) 100%), are not feature complete, and should loosely be seen as examples of how to use [models](#models) and [schedulers](#schedulers) for inference.
The following design principles are followed:
- Pipelines follow the single-file policy. All pipelines can be found in individual directories under src/diffusers/pipelines. One pipeline folder corresponds to one diffusion paper/project/release. Multiple pipeline files can be gathered in one pipeline folder, as it’s done for [`src/diffusers/pipelines/stable-diffusion`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/stable_diffusion). If pipelines share similar functionality, one can make use of the [#Copied from mechanism](https://github.com/huggingface/diffusers/blob/125d783076e5bd9785beb05367a2d2566843a271/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py#L251).
- Pipelines all inherit from [`DiffusionPipeline`].
- Every pipeline consists of different model and scheduler components that are documented in the [`model_index.json` file](https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/model_index.json), are accessible under the same name as attributes of the pipeline, and can be shared between pipelines with the [`DiffusionPipeline.components`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.components) function (see the sketch after this list).
- Every pipeline should be loadable via the [`DiffusionPipeline.from_pretrained`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.from_pretrained) function.
- Pipelines should be used **only** for inference.
- Pipelines should be very readable, self-explanatory, and easy to tweak.
- Pipelines should be designed to build on top of each other and be easy to integrate into higher-level APIs.
- Pipelines are **not** intended to be feature-complete user interfaces. For feature-complete user interfaces one should rather have a look at [InvokeAI](https://github.com/invoke-ai/InvokeAI), [Diffuzers](https://github.com/abhishekkrthakur/diffuzers), and [lama-cleaner](https://github.com/Sanster/lama-cleaner).
- Every pipeline should have one and only one way to run it via a `__call__` method. The naming of the `__call__` arguments should be shared across all pipelines.
- Pipelines should be named after the task they are intended to solve.
- In almost all cases, novel diffusion pipelines shall be implemented in a new pipeline folder/file.
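As a quick, hedged illustration of these principles (the checkpoint name is only an example), loading a pipeline, inspecting its components, and reusing them in a second pipeline might look like this:

```py
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

# Load a pipeline; every component listed in model_index.json becomes an attribute.
pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
print(pipeline.components.keys())  # e.g. unet, vae, text_encoder, tokenizer, scheduler, ...

# Reuse the same components in a different pipeline without reloading them from disk.
img2img = StableDiffusionImg2ImgPipeline(**pipeline.components)
```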
### Models
Models are designed as configurable toolboxes that are natural extensions of [PyTorch's `Module` class](https://pytorch.org/docs/stable/generated/torch.nn.Module.html). They only partly follow the single-file policy.
The following design principles are followed:
- Models correspond to **a type of model architecture**. *E.g.* the [`UNet2DConditionModel`] class is used for all UNet variations that expect 2D image inputs and are conditioned on some context.
- All models can be found in [`src/diffusers/models`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models) and every model architecture shall be defined in its file, e.g. [`unets/unet_2d_condition.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unets/unet_2d_condition.py), [`transformers/transformer_2d.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/transformers/transformer_2d.py), etc...
- Models **do not** follow the single-file policy and should make use of smaller model building blocks, such as [`attention.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention.py), [`resnet.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/resnet.py), [`embeddings.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/embeddings.py), etc... **Note**: This is in stark contrast to Transformers' modeling files and shows that models do not really follow the single-file policy.
- Models intend to expose complexity, just like PyTorch's `Module` class, and give clear error messages.
- Models all inherit from `ModelMixin` and `ConfigMixin`.
- Models can be optimized for performance when the optimization doesn’t demand major code changes, keeps backward compatibility, and gives a significant memory or compute gain.
- Models should by default have the highest precision and lowest performance setting.
- To integrate new model checkpoints whose general architecture can be classified as an architecture that already exists in Diffusers, the existing model architecture shall be adapted to make it work with the new checkpoint. One should only create a new file if the model architecture is fundamentally different.
- Models should be designed to be easily extendable to future changes. This can be achieved by limiting public function arguments, configuration arguments, and "foreseeing" future changes, *e.g.* it is usually better to add `string` "...type" arguments that can easily be extended to new future types instead of boolean `is_..._type` arguments. Only the minimum amount of changes shall be made to existing architectures to make a new model checkpoint work.
- The model design is a difficult trade-off between keeping code readable and concise and supporting many model checkpoints. For most parts of the modeling code, classes shall be adapted for new model checkpoints, while there are some exceptions where it is preferred to add new classes to make sure the code is kept concise and
readable long-term, such as [UNet blocks](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unets/unet_2d_blocks.py) and [Attention processors](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
### Schedulers
Schedulers are responsible for guiding the denoising process for inference as well as for defining a noise schedule for training.
The following design principles are followed:
- All schedulers are found in [`src/diffusers/schedulers`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers).
- Schedulers are **not** allowed to import from large utils files and shall be kept very self-contained.
- One scheduler Python file corresponds to one scheduler algorithm (as might be defined in a paper).
- If schedulers share similar functionalities, we can make use of the `#Copied from` mechanism.
- Schedulers all inherit from `SchedulerMixin` and `ConfigMixin`.
- Schedulers can be easily swapped out with the [`ConfigMixin.from_config`](https://huggingface.co/docs/diffusers/main/en/api/configuration#diffusers.ConfigMixin.from_config) method as explained in detail [here](./docs/source/en/using-diffusers/schedulers.md) (see the sketch after this list).
- Every scheduler has to have a `set_num_inference_steps` and a `step` function. `set_num_inference_steps(...)` has to be called before every denoising process, *i.e.* before `step(...)` is called.
- Every scheduler exposes the timesteps to be "looped over" via a `timesteps` attribute, which is an array of timesteps the model will be called upon.
- The `step(...)` function takes a predicted model output and the "current" sample (x_t) and returns the "previous", slightly more denoised sample (x_t-1).
- Given the complexity of diffusion schedulers, the `step` function does not expose all the complexity and can be a bit of a "black box".
- In almost all cases, novel schedulers shall be implemented in a new scheduling file.
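A minimal sketch of the scheduler swap mentioned in the list above, assuming an illustrative Stable Diffusion checkpoint:

```py
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Build the new scheduler from the old scheduler's configuration and swap it in.
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
```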
🤗 Diffusers is the go-to library for state-of-the-art pretrained diffusion models for generating images, audio, and even 3D structures of molecules. Whether you're looking for a simple inference solution or training your own diffusion models, 🤗 Diffusers is a modular toolbox that supports both. Our library is designed with a focus on [usability over performance](https://huggingface.co/docs/diffusers/conceptual/philosophy#usability-over-performance), [simple over easy](https://huggingface.co/docs/diffusers/conceptual/philosophy#simple-over-easy), and [customizability over abstractions](https://huggingface.co/docs/diffusers/conceptual/philosophy#tweakable-contributorfriendly-over-abstraction).
- State-of-the-art [diffusion pipelines](https://huggingface.co/docs/diffusers/api/pipelines/overview) that can be run in inference with just a few lines of code.
- Interchangeable noise [schedulers](https://huggingface.co/docs/diffusers/api/schedulers/overview) for different diffusion speeds and output quality.
- Pretrained [models](https://huggingface.co/docs/diffusers/api/models/overview) that can be used as building blocks, and combined with schedulers, for creating your own end-to-end diffusion systems.
## Installation
We recommend installing 🤗 Diffusers in a virtual environment from PyPI or Conda. For more details about installing [PyTorch](https://pytorch.org/get-started/locally/) and [Flax](https://flax.readthedocs.io/en/latest/#installation), please refer to their official documentation.
### PyTorch
## Quickstart
Generating outputs is super easy with 🤗 Diffusers. To generate an image from text, use the `from_pretrained` method to load any pretrained diffusion model (browse the [Hub](https://huggingface.co/models?library=diffusers&sort=downloads) for 27,000+ checkpoints):
```python
from diffusers import DiffusionPipeline
```
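The snippet above only shows the import; a fuller sketch of the quickstart, assuming a CUDA device and using an illustrative checkpoint, might look like:

```python
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipeline.to("cuda")

# Generate an image from a text prompt and save it.
image = pipeline("An image of a squirrel in Picasso style").images[0]
image.save("squirrel.png")
```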
You can also dig into the models and schedulers toolbox to build your own diffusion system.
You can look out for [issues](https://github.com/huggingface/diffusers/issues) you'd like to tackle to contribute to the library.
- See [New model/pipeline](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+pipeline%2Fmodel%22) to contribute exciting new diffusion models / diffusion pipelines
- See [New scheduler](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+scheduler%22)
Also, say 👋 in our public Discord channel <a href="https://discord.gg/G7tWnz98XR"><img alt="Join us on Discord" src="https://img.shields.io/discord/823813159592001537?color=5865F2&logo=discord&logoColor=white"></a>. We discuss the hottest trends about diffusion models, help each other with contributions, personal projects or just hang out ☕.
We also want to thank @heejkoo for the very helpful overview of papers, code, and resources on diffusion models.
```bibtex
@misc{von-platen-etal-2022-diffusers,
author={Patrick von Platen and Suraj Patil and Anton Lozhkov and Pedro Cuenca and Nathan Lambert and Kashif Rasul and Mishig Davaadorj and Dhruv Nair and Sayak Paul and William Berman and Yiyi Xu and Steven Liu and Thomas Wolf},
}
```
Use the relative style to link to the new file so that the versioned docs continue to work.
For an example of a rich moved section set please see the very end of [the transformers Trainer doc](https://github.com/huggingface/transformers/blob/main/docs/source/en/main_classes/trainer.md).
## Writing Documentation - Specification
The documentation follows the Google documentation style for docstrings, although we can write them directly in Markdown.
Adding a new tutorial or section is done in two steps:
- Add a new Markdown (.md) file under `docs/source/<languageCode>`.
- Link that file in `docs/source/<languageCode>/_toctree.yml` on the correct toc-tree.
Make sure to put your new file under the proper section. It's unlikely to go in the first section (*Get Started*), so
depending on the intended targets (beginners, more advanced users, or researchers) it should go in sections two, three, or four.
When adding a new pipeline:
- Create a file `xxx.md` under `docs/source/<languageCode>/api/pipelines` (don't hesitate to copy an existing file as a template).
- Link that file in the (*Diffusers Summary*) section in `docs/source/api/pipelines/overview.md`, along with the link to the paper, and a colab notebook (if available).
- Write a short overview of the diffusion model:
- Overview with paper & authors
- Paper abstract
- Add all the pipeline classes that should be linked in the diffusion model. These classes should be added using our Markdown syntax. By default as follows:
```
## XXXPipeline
[[autodoc]] XXXPipeline
- all
- __call__
```

This will include every public method of the pipeline that is documented, as well as the `__call__` method that is not documented by default. If you just want to add additional methods that are not documented, you can put the list of methods to add in a list that contains `all`:

```
- __call__
- enable_attention_slicing
- disable_attention_slicing
- enable_xformers_memory_efficient_attention
- disable_xformers_memory_efficient_attention
```
You can follow the same process to create a new scheduler under the `docs/source/<languageCode>/api/schedulers` folder.
### Writing source documentation
Values that should be put in `code` should either be surrounded by backticks: \`like so\`. Note that argument names
and objects like True, None, or any strings should usually be put in `code`.
When mentioning a class, function, or method, it is recommended to use our syntax for internal links so that our tool
adds a link to its documentation with this syntax: \[\`XXXClass\`\] or \[\`function\`\]. This requires the class or
function to be in the main package.
If you want to create a link to some internal class or function, you need to
provide its path. For instance: \[\`pipelines.ImagePipelineOutput\`\]. This will be converted into a link with
`pipelines.ImagePipelineOutput` in the description. To get rid of the path and only keep the name of the object you are
linking to in the description, add a ~: \[\`~pipelines.ImagePipelineOutput\`\] will generate a link with `ImagePipelineOutput` in the description.
The same works for methods so you can either use \[\`XXXClass.method\`\] or \[\`~XXXClass.method\`\].
#### Defining arguments in a method
For optional arguments or arguments with defaults we follow the following syntax: imagine we have a function with the
following signature:
```py
def my_function(x: str = None, a: float = 3.14):
```
then its documentation should look like this:
Args:
x (`str`, *optional*):
This argument controls ...
a (`float`, *optional*, defaults to `3.14`):
This argument is used to ...
```
Here's an example of a tuple return, comprising several objects:
```
Returns:
`tuple(torch.Tensor)` comprising various elements depending on the configuration ([`BertConfig`]) and inputs:
- **loss** (*optional*, returned when `masked_lm_labels` is provided) `torch.Tensor` of shape `(1,)` --
Total loss is the sum of the masked language modeling loss and the next sequence prediction (classification) loss.
- **prediction_scores** (`torch.Tensor` of shape `(batch_size, sequence_length, config.vocab_size)`) --
Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
```
We have an automatic script running with the `make style` command that will make sure the documentation formatting stays consistent.
This script may have some weird failures if you made a syntax mistake or if you uncover a bug. Therefore, it's recommended to commit your changes before running `make style`, so you can revert the changes done by that script easily.
### Translating the Diffusers documentation into your language
As part of our mission to democratize machine learning, we'd love to make the Diffusers library available in many more languages! Follow the steps below if you want to help translate the documentation into your language 🙏.
**🗞️ Open an issue**
To get started, navigate to the [Issues](https://github.com/huggingface/diffusers/issues) page of this repo and check if anyone else has opened an issue for your language. If not, open a new issue by selecting the "🌐 Translating a New Language?" template from the "New issue" button.
Once an issue exists, post a comment to indicate which chapters you'd like to work on, and we'll add your name to the list.
Once you've forked the repo, you'll want to get the files on your local machine for editing. You can do that by cloning the fork with Git as follows:
**📋 Copy-paste the English version with a new language code**
You'll only need to copy the files in the `docs/source/en` directory, for example:
```bash
cd ~/path/to/diffusers/docs
cp -r source/en source/<LANG-ID>
```
Here, `<LANG-ID>` should be one of the ISO 639-1 or ISO 639-2 language codes -- see [here](https://www.loc.gov/standards/iso639-2/php/code_list.php) for a handy table.
**✍️ Start translating**
The fun part comes - translating the text!
The first thing we recommend is translating the part of the `_toctree.yml` file that corresponds to your doc chapter. This file is used to render the table of contents on the website.
> 🙋 If the `_toctree.yml` file doesn't yet exist for your language, you can create one by copy-pasting from the English version and deleting the sections unrelated to your chapter. Just make sure it exists in the `docs/source/<LANG-ID>/` directory!
The fields you should add are `local` (with the name of the file containing the translation; e.g. `autoclass_tutorial`), and `title` (with the title of the doc in your language; e.g. `Load pretrained instances with an AutoClass`) -- as a reference, here is the `_toctree.yml` for [English](https://github.com/huggingface/diffusers/blob/main/docs/source/en/_toctree.yml):
# Outpainting
Outpainting extends an image beyond its original boundaries, allowing you to add, replace, or modify visual elements in an image while preserving the original image. Like [inpainting](../using-diffusers/inpaint), you want to fill the white area (in this case, the area outside of the original image) with new visual elements while keeping the original image (represented by a mask of black pixels). There are a couple of ways to outpaint, such as with a [ControlNet](https://hf.co/blog/OzzyGT/outpainting-controlnet) or with [Differential Diffusion](https://hf.co/blog/OzzyGT/outpainting-differential-diffusion).
This guide will show you how to outpaint with an inpainting model, ControlNet, and a ZoeDepth estimator.
Before you begin, make sure you have the [controlnet_aux](https://github.com/huggingface/controlnet_aux) library installed so you can use the ZoeDepth estimator.
```py
!pip install -q controlnet_aux
```
## Image preparation
Start by picking an image to outpaint with and remove the background with a Space like [BRIA-RMBG-1.4](https://hf.co/spaces/briaai/BRIA-RMBG-1.4).
<iframe
src="https://briaai-bria-rmbg-1-4.hf.space"
frameborder="0"
width="850"
height="450"
></iframe>
For example, remove the background from this image of a pair of shoes.
[Stable Diffusion XL (SDXL)](../using-diffusers/sdxl) models work best with 1024x1024 images, but you can resize the image to any size as long as your hardware has enough memory to support it. The transparent background in the image should also be replaced with a white background. Create a function (like the one below) that scales and pastes the image onto a white background.
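The function itself isn't reproduced here; a minimal sketch of such a helper, using PIL and assuming the background-removed image is an RGBA image (sizes and the margin are illustrative choices):

```py
from PIL import Image

def scale_and_paste(original_image, target=1024, margin=0.2):
    # Scale the subject so it fits inside the target canvas with some breathing room.
    scale = (target * (1 - margin)) / max(original_image.size)
    new_size = (int(original_image.width * scale), int(original_image.height * scale))
    resized = original_image.resize(new_size, Image.LANCZOS)

    # Paste it centered onto a white background, using the alpha channel as the paste mask.
    background = Image.new("RGBA", (target, target), "white")
    offset = ((target - new_size[0]) // 2, (target - new_size[1]) // 2)
    background.paste(resized, offset, resized)
    return background.convert("RGB")
```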
To avoid adding unwanted extra details, use the ZoeDepth estimator to provide additional guidance during generation and to ensure the shoes remain consistent with the original image.
Once your image is ready, you can generate content in the white area around the shoes with [controlnet-inpaint-dreamer-sdxl](https://hf.co/destitech/controlnet-inpaint-dreamer-sdxl), a SDXL ControlNet trained for inpainting.
Load the inpainting ControlNet, ZoeDepth model, VAE and pass them to the [`StableDiffusionXLControlNetPipeline`]. Then you can create an optional `generate_image` function (for convenience) to outpaint an initial image.
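A rough, non-authoritative sketch of that setup; the base checkpoint, the fp16-friendly VAE, and the exact arguments are assumptions rather than the guide's exact code, and the depth estimator is only loaded here (the precise wiring for depth guidance may differ):

```py
import torch
from controlnet_aux import ZoeDetector
from diffusers import AutoencoderKL, ControlNetModel, StableDiffusionXLControlNetPipeline

# Inpainting ControlNet mentioned above.
controlnet = ControlNetModel.from_pretrained(
    "destitech/controlnet-inpaint-dreamer-sdxl", torch_dtype=torch.float16
)

# ZoeDepth estimator used to produce a depth map for additional guidance.
zoe = ZoeDetector.from_pretrained("lllyasviel/Annotators")

# Assumed fp16-friendly SDXL VAE and an assumed SDXL base checkpoint.
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipeline = StableDiffusionXLControlNetPipeline.from_pretrained(
    "SG161222/RealVisXL_V4.0", controlnet=controlnet, vae=vae, torch_dtype=torch.float16
).to("cuda")
```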
> Now is a good time to free up some memory if you're running low!
>
> ```py
> pipeline = None
> torch.cuda.empty_cache()
> ```
Now that you have an initial outpainted image, load the [`StableDiffusionXLInpaintPipeline`] with the [RealVisXL](https://hf.co/SG161222/RealVisXL_V4.0) model to generate the final outpainted image with better quality.
Prepare a mask for the final outpainted image. To create a more natural transition between the original image and the outpainted background, blur the mask to help it blend better.
Create a better prompt and pass it to the `generate_outpaint` function to generate the final outpainted image. Again, paste the original image over the final outpainted background.
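A hedged sketch of this final step: `initial_outpaint` stands for the image produced in the previous ControlNet step, the mask rectangle and blur factor are assumptions, and `mask_processor.blur` is the mask-blurring helper exposed by the inpaint pipeline:

```py
import torch
from PIL import Image, ImageDraw
from diffusers import StableDiffusionXLInpaintPipeline

pipeline = StableDiffusionXLInpaintPipeline.from_pretrained(
    "SG161222/RealVisXL_V4.0", torch_dtype=torch.float16
).to("cuda")

# Assumed layout: a 1024x1024 canvas with the original image pasted in a known box.
mask = Image.new("L", (1024, 1024), 255)                       # white = repaint
ImageDraw.Draw(mask).rectangle((256, 128, 768, 896), fill=0)   # black = keep the original
blurred_mask = pipeline.mask_processor.blur(mask, blur_factor=20)

final = pipeline(
    "high quality photo of a pair of sneakers on a city street, natural light",
    image=initial_outpaint,   # output of the previous ControlNet outpainting step
    mask_image=blurred_mask,
    strength=0.8,
).images[0]
```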
# Configuration
Schedulers from [`~schedulers.scheduling_utils.SchedulerMixin`] and models from [`ModelMixin`] inherit from [`ConfigMixin`] which stores all the parameters that are passed to their respective `__init__` methods in a JSON-configuration file.
<Tip>
To use private or [gated](https://huggingface.co/docs/hub/models-gated#gated-models) models, log-in with `huggingface-cli login`.
</Tip>
# Pipelines
The [`DiffusionPipeline`] is the easiest way to load any pretrained diffusion pipeline from the [Hub](https://huggingface.co/models?library=diffusers) and use it for inference.
<Tip>
You shouldn't use the [`DiffusionPipeline`] class for training or finetuning a diffusion model. Individual
components (for example, [`UNet2DModel`] and [`UNet2DConditionModel`]) of diffusion pipelines are usually trained individually, so we suggest directly working with them instead.
</Tip>
The pipeline type (for example [`StableDiffusionPipeline`]) of any diffusion pipeline loaded with [`~DiffusionPipeline.from_pretrained`] is automatically
detected and pipeline components are loaded and passed to the `__init__` function of the pipeline.
Any pipeline object can be saved locally with [`~DiffusionPipeline.save_pretrained`].
# VAE Image Processor
The [`VaeImageProcessor`] provides a unified API for [`StableDiffusionPipeline`]s to prepare image inputs for VAE encoding and post-processing outputs once they're decoded. This includes transformations such as resizing, normalization, and conversion between PIL Image, PyTorch, and NumPy arrays.
All pipelines with [`VaeImageProcessor`] accept PIL Image, PyTorch tensor, or NumPy arrays as image inputs and return outputs based on the `output_type` argument by the user. You can pass encoded image latents directly to the pipeline and return latents from the pipeline as a specific output with the `output_type` argument (for example `output_type="latent"`). This allows you to take the generated latents from one pipeline and pass them to another pipeline as input without leaving the latent space. It also makes it much easier to use multiple pipelines together by passing PyTorch tensors directly between different pipelines.
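A hedged sketch of passing latents between two pipelines without decoding to pixel space (checkpoint names are illustrative):

```py
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionLatentUpscalePipeline

text2img = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
upscaler = StableDiffusionLatentUpscalePipeline.from_pretrained(
    "stabilityai/sd-x2-latent-upscaler", torch_dtype=torch.float16
).to("cuda")

prompt = "a photo of an astronaut riding a horse"

# Keep the output in latent space instead of decoding it with the VAE.
latents = text2img(prompt, output_type="latent").images

# The latent upscaler consumes those latents directly as its image input.
upscaled = upscaler(prompt=prompt, image=latents, num_inference_steps=20).images[0]
```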
## VaeImageProcessor
[[autodoc]] image_processor.VaeImageProcessor
## VaeImageProcessorLDM3D
The [`VaeImageProcessorLDM3D`] accepts RGB and depth inputs and returns RGB and depth outputs.
# DEIS
## Overview
Fast Sampling of Diffusion Models with Exponential Integrator.
Original paper can be found [here](https://arxiv.org/abs/2204.13902). The original implementation can be found [here](https://github.com/qsh-zh/deis).
## DEISMultistepScheduler
[[autodoc]] DEISMultistepScheduler
The APIs in this section are more experimental and prone to breaking changes. Most of them are used internally for development, but they may also be useful to you if you're interested in building a diffusion model with some custom parts or if you're interested in some of our helper utilities for working with 🤗 Diffusers.
# IP-Adapter
[IP-Adapter](https://hf.co/papers/2308.06721) is a lightweight adapter that enables prompting a diffusion model with an image. This method decouples the cross-attention layers of the image and text features. The image features are generated from an image encoder.
<Tip>
Learn how to load an IP-Adapter checkpoint and image in the IP-Adapter [loading](../../using-diffusers/loading_adapters#ip-adapter) guide, and you can see how to use it in the [usage](../../using-diffusers/ip_adapter) guide.
</Tip>
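As a quick, non-authoritative sketch (the checkpoint and weight-file names are the commonly used ones; verify them before relying on this):

```py
import torch
from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load IP-Adapter weights into the pipeline and control how strongly the image prompt is applied.
pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipeline.set_ip_adapter_scale(0.6)
# At inference time, pass the reference image via the `ip_adapter_image` argument of the pipeline call.
```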
# LoRA
LoRA is a fast and lightweight training method that inserts and trains a significantly smaller number of parameters instead of all the model parameters. This produces a smaller file (~100 MBs) and makes it easier to quickly train a model to learn a new concept. LoRA weights are typically loaded into the UNet, text encoder or both. There are two classes for loading LoRA weights:
- [`LoraLoaderMixin`] provides functions for loading and unloading, fusing and unfusing, enabling and disabling, and more functions for managing LoRA weights. This class can be used with any model.
- [`StableDiffusionXLLoraLoaderMixin`] is a [Stable Diffusion (SDXL)](../../api/pipelines/stable_diffusion/stable_diffusion_xl) version of the [`LoraLoaderMixin`] class for loading and saving LoRA weights. It can only be used with the SDXL model.
<Tip>
To learn more about how to load LoRA weights, see the [LoRA](../../using-diffusers/loading_adapters#lora) loading guide.
</Tip>
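To make this concrete, here is a minimal sketch of loading LoRA weights through these mixins (the LoRA repository id is hypothetical):

```py
import torch
from diffusers import StableDiffusionXLPipeline

pipeline = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Inject the LoRA into the UNet (and text encoders, if the checkpoint contains them).
pipeline.load_lora_weights("some-user/some-sdxl-lora")  # hypothetical repo id

# Optionally merge the LoRA into the base weights for faster inference.
pipeline.fuse_lora()
```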
# PEFT
Diffusers supports loading adapters such as [LoRA](../../using-diffusers/loading_adapters) with the [PEFT](https://huggingface.co/docs/peft/index) library with the [`~loaders.peft.PeftAdapterMixin`] class. This allows modeling classes in Diffusers like [`UNet2DConditionModel`] to load an adapter.
<Tip>
Refer to the [Inference with PEFT](../../tutorials/using_peft_for_inference.md) tutorial for an overview of how to use PEFT in Diffusers for inference.
</Tip>
# Single files
The [`~loaders.FromSingleFileMixin.from_single_file`] method allows you to load:
* a model stored in a single file, which is useful if you're working with models from the diffusion ecosystem, like Automatic1111, and commonly rely on a single-file layout to store and share models
* a model stored in their originally distributed layout, which is useful if you're working with models finetuned with other services, and want to load it directly into Diffusers model objects and pipelines
> [!TIP]
> Read the [Model files and layouts](../../using-diffusers/other-formats) guide to learn more about the Diffusers-multifolder layout versus the single-file layout, and how to load models stored in these different layouts.
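As a rough sketch of the first case (the URL is illustrative; substitute any single-file checkpoint, local or remote):

```py
from diffusers import StableDiffusionXLPipeline

# A single-file checkpoint in the original layout, as shared by tools like Automatic1111.
url = "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0.safetensors"
pipeline = StableDiffusionXLPipeline.from_single_file(url)
```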
# Textual Inversion
Textual Inversion is a training method for personalizing models by learning new text embeddings from a few example images. The file produced from training is extremely small (a few KBs) and the new embeddings can be loaded into the text encoder.
[`TextualInversionLoaderMixin`] provides a function for loading Textual Inversion embeddings from Diffusers and Automatic1111 into the text encoder and loading a special token to activate the embeddings.
<Tip>
To learn more about how to load Textual Inversion embeddings, see the [Textual Inversion](../../using-diffusers/loading_adapters#textual-inversion) loading guide.
</Tip>
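A short sketch of loading an embedding this way, using a publicly available example concept (checkpoint and token are illustrative):

```py
from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Loads the learned embedding and registers its activation token (here "<cat-toy>").
pipeline.load_textual_inversion("sd-concepts-library/cat-toy")
image = pipeline("A <cat-toy> sitting on a bench").images[0]
```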
# UNet
Some training methods - like LoRA and Custom Diffusion - typically target the UNet's attention layers, but these training methods can also target other non-attention layers. Instead of training all of a model's parameters, only a subset of the parameters are trained, which is faster and more efficient. This class is useful if you're *only* loading weights into a UNet. If you need to load weights into the text encoder or a text encoder and UNet, try using the [`~loaders.LoraLoaderMixin.load_lora_weights`] function instead.
The [`UNet2DConditionLoadersMixin`] class provides functions for loading and saving weights, fusing and unfusing LoRAs, disabling and enabling LoRAs, and setting and deleting adapters.
<Tip>
To learn more about how to load LoRA weights, see the [LoRA](../../using-diffusers/loading_adapters#lora) loading guide.
</Tip>
| Log level | Int value | Description |
|---|---|---|
| `diffusers.logging.CRITICAL` or `diffusers.logging.FATAL` | 50 | only report the most critical errors |
| `diffusers.logging.ERROR` | 40 | only report errors |
| `diffusers.logging.WARNING` or `diffusers.logging.WARN` | 30 | only report errors and warnings (default) |
| `diffusers.logging.INFO` | 20 | only report errors, warnings, and basic information |
| `diffusers.logging.DEBUG` | 10 | report all information |
By default, `tqdm` progress bars are displayed during model download. [`logging.disable_progress_bar`] and [`logging.enable_progress_bar`] are used to enable or disable this behavior.
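A small sketch of adjusting verbosity and progress bars with the helpers mentioned above:

```py
from diffusers.utils import logging

logging.set_verbosity_info()    # report errors, warnings, and basic information
logging.disable_progress_bar()  # hide tqdm bars during model download
logging.enable_progress_bar()   # show them again
```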
# Models
Diffusers contains pretrained models for popular algorithms and modules for creating the next set of diffusion models.
The primary function of these models is to denoise an input sample, by modeling the distribution \\(p_{\theta}(x_{t-1}|x_{t})\\).
The models are built on the base class [`ModelMixin`], which is a [`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) with basic functionality for saving and loading models both locally and from the Hugging Face Hub.
# AsymmetricAutoencoderKL
Improved larger variational autoencoder (VAE) model with KL loss for inpainting task: [Designing a Better Asymmetric VQGAN for StableDiffusion](https://arxiv.org/abs/2306.04632) by Zixin Zhu, Xuelu Feng, Dongdong Chen, Jianmin Bao, Le Wang, Yinpeng Chen, Lu Yuan, Gang Hua.
The abstract from the paper is:
*StableDiffusion is a revolutionary text-to-image generator that is causing a stir in the world of image generation and editing. Unlike traditional methods that learn a diffusion model in pixel space, StableDiffusion learns a diffusion model in the latent space via a VQGAN, ensuring both efficiency and quality. It not only supports image generation tasks, but also enables image editing for real images, such as image inpainting and local editing. However, we have observed that the vanilla VQGAN used in StableDiffusion leads to significant information loss, causing distortion artifacts even in non-edited image regions. To this end, we propose a new asymmetric VQGAN with two simple designs. Firstly, in addition to the input from the encoder, the decoder contains a conditional branch that incorporates information from task-specific priors, such as the unmasked image region in inpainting. Secondly, the decoder is much heavier than the encoder, allowing for more detailed recovery while only slightly increasing the total inference cost. The training cost of our asymmetric VQGAN is cheap, and we only need to retrain a new asymmetric decoder while keeping the vanilla VQGAN encoder and StableDiffusion unchanged. Our asymmetric VQGAN can be widely used in StableDiffusion-based inpainting and local editing methods. Extensive experiments demonstrate that it can significantly improve the inpainting and editing performance, while maintaining the original text-to-image capability. The code is available at https://github.com/buxiangzhiren/Asymmetric_VQGAN*
Evaluation results can be found in section 4.1 of the original paper.
# Tiny AutoEncoder
Tiny AutoEncoder for Stable Diffusion (TAESD) was introduced in [madebyollin/taesd](https://github.com/madebyollin/taesd) by Ollin Boer Bohan. It is a tiny distilled version of Stable Diffusion's VAE that can quickly decode the latents in a [`StableDiffusionPipeline`] or [`StableDiffusionXLPipeline`] almost instantly.
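A minimal sketch of swapping the full VAE for TAESD in an existing pipeline (checkpoint names are the commonly used ones and should be verified):

```py
import torch
from diffusers import AutoencoderTiny, StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)

# Replace the full VAE with the distilled Tiny AutoEncoder for much faster decoding.
pipeline.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd", torch_dtype=torch.float16)
pipeline.to("cuda")
```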
# AutoencoderKL
The variational autoencoder (VAE) model with KL loss was introduced in [Auto-Encoding Variational Bayes](https://arxiv.org/abs/1312.6114v11) by Diederik P. Kingma and Max Welling. The model is used in 🤗 Diffusers to encode images into latents and to decode latent representations into images.
The abstract from the paper is:
*How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contributions are two-fold. First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods. Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator. Theoretical advantages are reflected in experimental results.*
## Loading from the original format
By default the [`AutoencoderKL`] should be loaded with [`~ModelMixin.from_pretrained`], but it can also be loaded
from the original format using [`FromOriginalModelMixin.from_single_file`] as follows:
```py
from diffusers import AutoencoderKL

url = "https://huggingface.co/stabilityai/sd-vae-ft-mse-original/blob/main/vae-ft-mse-840000-ema-pruned.safetensors"  # can also be a local file
model = AutoencoderKL.from_single_file(url)
```
# Consistency Decoder
Consistency decoder can be used to decode the latents from the denoising UNet in the [`StableDiffusionPipeline`]. This decoder was introduced in the [DALL-E 3 technical report](https://openai.com/dall-e-3).
The original codebase can be found at [openai/consistencydecoder](https://github.com/openai/consistencydecoder).
<Tip warning={true}>
Inference is only supported for 2 iterations as of now.
</Tip>
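A short sketch of plugging the consistency decoder into a Stable Diffusion pipeline (checkpoint names are the commonly used ones):

```py
import torch
from diffusers import ConsistencyDecoderVAE, StableDiffusionPipeline

# Use the consistency decoder in place of the default VAE decoder.
vae = ConsistencyDecoderVAE.from_pretrained("openai/consistency-decoder", torch_dtype=torch.float16)
pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", vae=vae, torch_dtype=torch.float16
).to("cuda")
```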
The pipeline could not have been contributed without the help of [madebyollin](https://github.com/madebyollin) and [mrsteyk](https://github.com/mrsteyk) from [this issue](https://github.com/openai/consistencydecoder/issues/1).
# ControlNetModel
The ControlNet model was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, Maneesh Agrawala. It provides a greater degree of control over text-to-image generation by conditioning the model on additional inputs such as edge maps, depth maps, segmentation maps, and keypoints for pose detection.
The abstract from the paper is:
*We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
## Loading from the original format
By default the [`ControlNetModel`] should be loaded with [`~ModelMixin.from_pretrained`], but it can also be loaded
from the original format using [`FromOriginalModelMixin.from_single_file`] as follows:
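A hedged sketch of that loading path (the URL points at a widely used single-file ControlNet checkpoint and is illustrative):

```py
from diffusers import ControlNetModel

# Original single-file ControlNet checkpoint; any checkpoint in this layout works the same way.
url = "https://huggingface.co/lllyasviel/ControlNet-v1-1/blob/main/control_v11p_sd15_canny.pth"
controlnet = ControlNetModel.from_single_file(url)
```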
# HunyuanDiT2DControlNetModel
HunyuanDiT2DControlNetModel is an implementation of ControlNet for [Hunyuan-DiT](https://arxiv.org/abs/2405.08748).
ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala.
With a ControlNet model, you can provide an additional control image to condition and control Hunyuan-DiT generation. For example, if you provide a depth map, the ControlNet model generates an image that'll preserve the spatial information from the depth map. It is a more flexible and accurate way to control the image generation process.
The abstract from the paper is:
*We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
This code is implemented by Tencent Hunyuan Team. You can find pre-trained checkpoints for Hunyuan-DiT ControlNets on [Tencent Hunyuan](https://huggingface.co/Tencent-Hunyuan).
## Example For Loading HunyuanDiT2DControlNetModel
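A minimal, non-authoritative sketch; the checkpoint name is an assumption based on the Tencent-Hunyuan organization mentioned above and should be checked against the published ControlNets:

```py
import torch
from diffusers import HunyuanDiT2DControlNetModel

# Assumed checkpoint name for a canny-conditioned Hunyuan-DiT ControlNet.
controlnet = HunyuanDiT2DControlNetModel.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-v1.1-ControlNet-Diffusers-Canny", torch_dtype=torch.float16
)
```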
# SD3ControlNetModel
SD3ControlNetModel is an implementation of ControlNet for Stable Diffusion 3.
The ControlNet model was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, Maneesh Agrawala. It provides a greater degree of control over text-to-image generation by conditioning the model on additional inputs such as edge maps, depth maps, segmentation maps, and keypoints for pose detection.
The abstract from the paper is:
*We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
## Loading from the original format
By default the [`SD3ControlNetModel`] should be loaded with [`~ModelMixin.from_pretrained`].
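For example, a sketch of this default loading path (the repository ids below are examples; substitute the ControlNet and base checkpoints you want to use):

```py
from diffusers import SD3ControlNetModel, StableDiffusion3ControlNetPipeline

# Example ControlNet checkpoint for Stable Diffusion 3
controlnet = SD3ControlNetModel.from_pretrained("InstantX/SD3-Controlnet-Canny")

# Example SD3 base checkpoint; the ControlNet is passed to the pipeline
pipe = StableDiffusion3ControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers", controlnet=controlnet
)
```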
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Models
🤗 Diffusers provides pretrained models for popular algorithms and modules to create custom diffusion systems. The primary function of models is to denoise an input sample as modeled by the distribution \\(p_{\theta}(x_{t-1}|x_{t})\\).
All models are built from the base [`ModelMixin`] class which is a [`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) providing basic functionality for saving and loading models, locally and from the Hugging Face Hub.
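As a rough illustration of that shared save/load interface (the repository id and subfolder below are examples only):

```py
from diffusers import UNet2DConditionModel

# Example: load a pretrained UNet from a Hub repository's "unet" subfolder
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

# Save a local copy; it can be reloaded later with from_pretrained
unet.save_pretrained("./sd15-unet")
```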
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# PixArtTransformer2DModel
A Transformer model for image-like data from [PixArt-Alpha](https://huggingface.co/papers/2310.00426) and [PixArt-Sigma](https://huggingface.co/papers/2403.04692).
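A minimal loading sketch, assuming a PixArt checkpoint on the Hub that stores the transformer weights in a `transformer` subfolder (the repository id is an example):

```py
import torch
from diffusers import PixArtTransformer2DModel

# Example PixArt-Alpha repository; transformer weights live in the "transformer" subfolder
transformer = PixArtTransformer2DModel.from_pretrained(
    "PixArt-alpha/PixArt-XL-2-1024-MS", subfolder="transformer", torch_dtype=torch.float16
)
```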
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# PriorTransformer
The Prior Transformer was originally introduced in [Hierarchical Text-Conditional Image Generation with CLIP Latents](https://huggingface.co/papers/2204.06125) by Ramesh et al. It is used to predict CLIP image embeddings from CLIP text embeddings; image embeddings are predicted through a denoising diffusion process.
The abstract from the paper is:
*Contrastive models like CLIP have been shown to learn robust representations of images that capture both semantics and style. To leverage these representations for image generation, we propose a two-stage model: a prior that generates a CLIP image embedding given a text caption, and a decoder that generates an image conditioned on the image embedding. We show that explicitly generating image representations improves image diversity with minimal loss in photorealism and caption similarity. Our decoders conditioned on image representations can also produce variations of an image that preserve both its semantics and style, while varying the non-essential details absent from the image representation. Moreover, the joint embedding space of CLIP enables language-guided image manipulations in a zero-shot fashion. We use diffusion models for the decoder and experiment with both autoregressive and diffusion models for the prior, finding that the latter are computationally more efficient and produce higher-quality samples.*
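As a loading sketch, the Kandinsky 2.1 prior is one pipeline that uses this class; the repository id and subfolder below are examples:

```py
from diffusers import PriorTransformer

# Example: the Kandinsky 2.1 prior stores a PriorTransformer in its "prior" subfolder
prior = PriorTransformer.from_pretrained(
    "kandinsky-community/kandinsky-2-1-prior", subfolder="prior"
)
```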
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Transformer2DModel
A Transformer model for image-like data from [CompVis](https://huggingface.co/CompVis) that is based on the [Vision Transformer](https://huggingface.co/papers/2010.11929) introduced by Dosovitskiy et al. The [`Transformer2DModel`] accepts discrete (classes of vector embeddings) or continuous (actual embeddings) inputs.
When the input is **continuous**:
1. Project the input and reshape it to `(batch_size, sequence_length, feature_dimension)`.
2. Apply the Transformer blocks in the standard way.
3. Reshape to image (a minimal continuous-input configuration is sketched after the discrete case below).
When the input is **discrete**:
<Tip>
It is assumed one of the input classes is the masked latent pixel. The predicted classes of the unnoised image don't contain a prediction for the masked pixel because the unnoised image cannot be masked.
</Tip>
1. Convert input (classes of latent pixels) to embeddings and apply positional embeddings.
2. Apply the Transformer blocks in the standard way.
3. Predict classes of the unnoised image.
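A minimal instantiation sketch for the continuous case described above (the argument values are illustrative, not a recommended configuration):

```py
from diffusers import Transformer2DModel

# Illustrative continuous-input configuration; values are examples only
model = Transformer2DModel(
    num_attention_heads=16,
    attention_head_dim=88,
    in_channels=4,   # channels of the continuous input to project
    num_layers=2,    # number of Transformer blocks
)
```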
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# improved pseudo numerical methods for diffusion models (iPNDM)
## Overview
Original implementation can be found [here](https://github.com/crowsonkb/v-diffusion-pytorch/blob/987f8985e38208345c1959b0ea767a625831cc9b/diffusion/sampling.py#L296).
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# TransformerTemporalModel
A Transformer model for video-like data.
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# UNetMotionModel
The [UNet](https://huggingface.co/papers/1505.04597) model was originally introduced by Ronneberger et al. for biomedical image segmentation, but it is also commonly used in 🤗 Diffusers because it outputs images that are the same size as the input. It is one of the most important components of a diffusion system because it facilitates the actual diffusion process. There are several variants of the UNet model in 🤗 Diffusers, depending on its number of dimensions and whether it is a conditional model or not. [`UNetMotionModel`] is a 2D UNet augmented with motion modules so it can generate sequences of frames.
The abstract from the paper is:
*There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net.*
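As a rough sketch of how a motion UNet can be put together from a 2D UNet and a motion adapter (the repository ids are examples, and the exact checkpoints you use may differ):

```py
import torch
from diffusers import MotionAdapter, UNet2DConditionModel, UNetMotionModel

# Example checkpoints: an SD1.5-style 2D UNet and an AnimateDiff motion adapter
unet2d = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet", torch_dtype=torch.float16
)
adapter = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16
)

# Inflate the 2D UNet with the adapter's motion modules
unet = UNetMotionModel.from_unet2d(unet2d, motion_adapter=adapter)
```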