Compare commits

..

1298 Commits

Author SHA1 Message Date
Aryan
8335a733ba update 2024-11-20 03:34:19 +01:00
Aryan
aac7abaacf update 2024-11-20 03:05:15 +01:00
Aryan
ff13deea58 try fixing attention 2024-11-20 03:03:57 +01:00
Dhruv Nair
30dd9f6845 update 2024-11-18 17:50:51 +01:00
Dhruv Nair
27f81bd54f update 2024-11-18 17:30:24 +01:00
Parag Ekbote
1dbd26fa23 Notebooks for Community Scripts Examples (#9905)
* Add Notebooks on Community Scripts
2024-11-12 14:08:48 -10:00
Eliseu Silva
dac623b59f Feature IP Adapter Xformers Attention Processor (#9881)
* Feature IP Adapter Xformers Attention Processor: this fix error loading incorrect attention processor when setting Xformers attn after load ip adapter scale, issues: #8863 #8872
2024-11-08 15:40:51 -10:00
Sayak Paul
8d6dc2be5d Revert "[Flux] reduce explicit device transfers and typecasting in flux." (#9896)
Revert "[Flux] reduce explicit device transfers and typecasting in flux. (#9817)"

This reverts commit 5588725e8e.
2024-11-08 13:35:38 -10:00
Sayak Paul
d720b2132e [Advanced LoRA v1.5] fix: gradient unscaling problem (#7018)
fix: gradient unscaling problem

Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2024-11-08 19:31:43 -04:00
SahilCarterr
9cc96a64f1 [FIX] Fix TypeError in DreamBooth SDXL when use_dora is False (#9879)
* fix use_dora

* fix style and quality

* fix use_dora with peft version

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-11-08 19:09:24 -04:00
Michael Tkachuk
5b972fbd6a Enabling gradient checkpointing in eval() mode (#9878)
* refactored
2024-11-08 09:03:26 -10:00
SahilCarterr
0be52c07d6 [fix] Replaced shutil.copy with shutil.copyfile (#9885)
fix shutil.copy
2024-11-08 08:32:32 -10:00
Dhruv Nair
1b392544c7 Improve downloads of sharded variants (#9869)
* update

* update

* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-11-08 17:49:00 +05:30
Sayak Paul
5588725e8e [Flux] reduce explicit device transfers and typecasting in flux. (#9817)
reduce explicit device transfers and typecasting in flux.
2024-11-06 22:33:39 -04:00
Sayak Paul
ded3db164b [Core] introduce controlnet module (#8768)
* move vae flax module.

* controlnet module.

* prepare for PR.

* revert a commit

* gracefully deprecate controlnet deps.

* fix

* fix doc path

* fix-copies

* fix path

* style

* style

* conflicts

* fix

* fix-copies

* sparsectrl.

* updates

* fix

* updates

* updates

* updates

* fix

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-11-06 22:08:55 -04:00
SahilCarterr
76b7d86a9a Updated _encode_prompt_with_clip and encode_prompt in train_dreamboth_sd3 (#9800)
* updated encode prompt and clip encod prompt


---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-11-05 15:08:50 -10:00
Sookwan Han
e2b3c248d8 Add new community pipeline for 'Adaptive Mask Inpainting', introduced in [ECCV2024] ComA (#9228)
* Add new community pipeline for 'Adaptive Mask Inpainting', introduced in [ECCV2024] Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models
2024-11-05 15:05:58 -10:00
Vahid Askari
a03bf4a531 Fix: Remove duplicated comma in distributed_inference.md (#9868)
Fix: Remove duplicated comma

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-11-05 23:37:11 +01:00
SahilCarterr
08ac5cbc7f [Fix] Test of sd3 lora (#9843)
* fix test

* fix test asser

* fix format

* Update test_lora_layers_sd3.py
2024-11-05 11:05:20 -10:00
Aryan
3f329a426a [core] Mochi T2V (#9769)
* update

* udpate

* update transformer

* make style

* fix

* add conversion script

* update

* fix

* update

* fix

* update

* fixes

* make style

* update

* update

* update

* init

* update

* update

* add

* up

* up

* up

* update

* mochi transformer

* remove original implementation

* make style

* update inits

* update conversion script

* docs

* Update src/diffusers/pipelines/mochi/pipeline_mochi.py

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* Update src/diffusers/pipelines/mochi/pipeline_mochi.py

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* fix docs

* pipeline fixes

* make style

* invert sigmas in scheduler; fix pipeline

* fix pipeline num_frames

* flip proj and gate in swiglu

* make style

* fix

* make style

* fix tests

* latent mean and std fix

* update

* cherry-pick 1069d210e1

* remove additional sigma already handled by flow match scheduler

* fix

* remove hardcoded value

* replace conv1x1 with linear

* Update src/diffusers/pipelines/mochi/pipeline_mochi.py

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* framewise decoding and conv_cache

* make style

* Apply suggestions from code review

* mochi vae encoder changes

* rebase correctly

* Update scripts/convert_mochi_to_diffusers.py

* fix tests

* fixes

* make style

* update

* make style

* update

* add framewise and tiled encoding

* make style

* make original vae implementation behaviour the default; note: framewise encoding does not work

* remove framewise encoding implementation due to presence of attn layers

* fight test 1

* fight test 2

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail.com>
2024-11-05 20:33:41 +05:30
RogerSinghChugh
a3cc641f78 Refac training utils.py (#9815)
* Refac training utils.py

* quality

---------

Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
2024-11-04 09:40:44 -08:00
Sayak Paul
13e8fdecda [feat] add load_lora_adapter() for compatible models (#9712)
* add first draft.

* fix

* updates.

* updates.

* updates

* updates

* updates.

* fix-copies

* lora constants.

* add tests

* Apply suggestions from code review

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* docstrings.

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2024-11-02 09:50:39 +05:30
Dorsa Rohani
c10f875ff0 Add Diffusion Policy for Reinforcement Learning (#9824)
* enable cpu ability

* model creation + comprehensive testing

* training + tests

* all tests working

* remove unneeded files + clarify docs

* update train tests

* update readme.md

* remove data from gitignore

* undo cpu enabled option

* Update README.md

* update readme

* code quality fixes

* diffusion policy example

* update readme

* add pretrained model weights + doc

* add comment

* add documentation

* add docstrings

* update comments

* update readme

* fix code quality

* Update examples/reinforcement_learning/README.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update examples/reinforcement_learning/diffusion_policy.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* suggestions + safe globals for weights_only=True

* suggestions + safe weights loading

* fix code quality

* reformat file

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-11-02 09:18:44 +05:30
Leo Jiang
a98a839de7 Reduce Memory Cost in Flux Training (#9829)
* Improve NPU performance

* Improve NPU performance

* Improve NPU performance

* Improve NPU performance

* [bugfix] bugfix for npu free memory

* [bugfix] bugfix for npu free memory

* [bugfix] bugfix for npu free memory

* Reduce memory cost for flux training process

---------

Co-authored-by: 蒋硕 <jiangshuo9@h-partners.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-11-01 12:19:32 +05:30
Boseong Jeon
3deed729e6 Handling mixed precision for dreambooth flux lora training (#9565)
Handling mixed precision and add unwarp

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2024-11-01 10:16:05 +05:30
ScilenceForest
7ffbc2525f Update train_controlnet_flux.py,Fix size mismatch issue in validation (#9679)
Update train_controlnet_flux.py

Fix the problem of inconsistency between size of image and size of validation_image which causes np.stack to report error.

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-11-01 10:15:10 +05:30
SahilCarterr
f55f1f7ee5 Fixes EMAModel "from_pretrained" method (#9779)
* fix from_pretrained and added test

* make style

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-11-01 09:20:19 +05:30
Leo Jiang
9dcac83057 NPU Adaption for FLUX (#9751)
* NPU implementation for FLUX

* NPU implementation for FLUX

* NPU implementation for FLUX

* NPU implementation for FLUX

* NPU implementation for FLUX

* NPU implementation for FLUX

* NPU implementation for FLUX

* NPU implementation for FLUX

* NPU implementation for FLUX

* NPU implementation for FLUX

* NPU implementation for FLUX

* NPU implementation for FLUX

* NPU implementation for FLUX

* NPU implementation for FLUX

---------

Co-authored-by: 蒋硕 <jiangshuo9@h-partners.com>
2024-11-01 09:03:15 +05:30
Abhipsha Das
c75431843f [Model Card] standardize advanced diffusion training sd15 lora (#7613)
* modelcard generation edit

* add missed tag

* fix param name

* fix var

* change str to dict

* add use_dora check

* use correct tags for lora

* make style && make quality

---------

Co-authored-by: Aryan <aryan@huggingface.co>
2024-11-01 03:23:00 +05:30
YiYi Xu
d2e5cb3c10 Revert "[LoRA] fix: lora loading when using with a device_mapped mode… (#9823)
Revert "[LoRA] fix: lora loading when using with a device_mapped model. (#9449)"

This reverts commit 41e4779d98.
2024-10-31 08:19:32 -10:00
Sayak Paul
41e4779d98 [LoRA] fix: lora loading when using with a device_mapped model. (#9449)
* fix: lora loading when using with a device_mapped model.

* better attibutung

* empty

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* minors

* better error messages.

* fix-copies

* add: tests, docs.

* add hardware note.

* quality

* Update docs/source/en/training/distributed_inference.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fixes

* skip properly.

* fixes

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-10-31 21:17:41 +05:30
Sayak Paul
ff182ad669 [CI] add a big GPU marker to run memory-intensive tests separately on CI (#9691)
* add a marker for big gpu tests

* update

* trigger on PRs temporarily.

* onnx

* fix

* total memory

* fixes

* reduce memory threshold.

* bigger gpu

* empty

* g6e

* Apply suggestions from code review

* address comments.

* fix

* fix

* fix

* fix

* fix

* okay

* further reduce.

* updates

* remove

* updates

* updates

* updates

* updates

* fixes

* fixes

* updates.

* fix

* workflow fixes.

---------

Co-authored-by: Aryan <aryan@huggingface.co>
2024-10-31 18:44:34 +05:30
Sayak Paul
4adf6affbb [Tests] clean up and refactor gradient checkpointing tests (#9494)
* check.

* fixes

* fixes

* updates

* fixes

* fixes
2024-10-31 18:24:19 +05:30
Sayak Paul
8ce37ab055 [training] use the lr when using 8bit adam. (#9796)
* use the lr when using 8bit adam.

* remove lr as we pack it in params_to_optimize.

---------

Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2024-10-31 15:51:42 +05:30
Sayak Paul
09b8aebd67 [training] fixes to the quantization training script and add AdEMAMix optimizer as an option (#9806)
* fixes

* more fixes.
2024-10-31 15:46:00 +05:30
Sayak Paul
c1d4a0dded [CI] add new runner for testing (#9699)
new runner.
2024-10-31 14:58:05 +05:30
Aryan
9a92b8177c Allegro VAE fix (#9811)
fix
2024-10-30 18:04:15 +05:30
Aryan
0d1d267b12 [core] Allegro T2V (#9736)
* update

* refactor transformer part 1

* refactor part 2

* refactor part 3

* make style

* refactor part 4; modeling tests

* make style

* refactor part 5

* refactor part 6

* gradient checkpointing

* pipeline tests (broken atm)

* update

* add coauthor

Co-Authored-By: Huan Yang <hyang@fastmail.com>

* refactor part 7

* add docs

* make style

* add coauthor

Co-Authored-By: YiYi Xu <yixu310@gmail.com>

* make fix-copies

* undo unrelated change

* revert changes to embeddings, normalization, transformer

* refactor part 8

* make style

* refactor part 9

* make style

* fix

* apply suggestions from review

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update example

* remove attention mask for self-attention

* update

* copied from

* update

* update

---------

Co-authored-by: Huan Yang <hyang@fastmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-10-29 13:14:36 +05:30
Raul Ciotescu
c5376c5695 adds the pipeline for pixart alpha controlnet (#8857)
* add the controlnet pipeline for pixart alpha

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: junsongc <cjs1020440147@icloud.com>
2024-10-28 08:48:04 -10:00
Linoy Tsaban
743a5697f2 [flux dreambooth lora training] make LoRA target modules configurable + small bug fix (#9646)
* make lora target modules configurable and change the default

* style

* make lora target modules configurable and change the default

* fix bug when using prodigy and training te

* fix mixed precision training as  proposed in https://github.com/huggingface/diffusers/pull/9565 for full dreambooth as well

* add test and notes

* style

* address sayaks comments

* style

* fix test

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-10-28 17:27:41 +02:00
Linoy Tsaban
db5b6a9630 [SD 3.5 Dreambooth LoRA] support configurable training block & layers (#9762)
* configurable layers

* configurable layers

* update README

* style

* add test

* style

* add layer test, update readme, add nargs

* readme

* test style

* remove print, change nargs

* test arg change

* style

* revert nargs 2/2

* address sayaks comments

* style

* address sayaks comments
2024-10-28 16:07:54 +02:00
Biswaroop
493aa74312 [Fix] remove setting lr for T5 text encoder when using prodigy in flux dreambooth lora script (#9473)
* fix: removed setting of text encoder lr for T5 as it's not being tuned

* fix: removed setting of text encoder lr for T5 as it's not being tuned

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2024-10-28 13:07:30 +02:00
Vinh H. Pham
3b5b1c5698 [Fix] train_dreambooth_lora_flux_advanced ValueError: unexpected save model: <class 'transformers.models.t5.modeling_t5.T5EncoderModel'> (#9777)
fix save state te T5
2024-10-28 12:52:27 +02:00
Sayak Paul
fddbab7993 [research_projects] Update README.md to include a note about NF5 T5-xxl (#9775)
Update README.md
2024-10-26 22:13:03 +09:00
SahilCarterr
298ab6eb01 Added Support of Xlabs controlnet to FluxControlNetInpaintPipeline (#9770)
* added xlabs support
2024-10-25 11:50:55 -10:00
Ina
73b59f5203 [refactor] enhance readability of flux related pipelines (#9711)
* flux pipline: readability enhancement.
2024-10-25 11:01:51 -10:00
Jingya HUANG
52d4449810 Add a doc for AWS Neuron in Diffusers (#9766)
* start draft

* add doc

* Update docs/source/en/optimization/neuron.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/neuron.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/neuron.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/neuron.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/neuron.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/neuron.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/neuron.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* bref intro of ON

* Update docs/source/en/optimization/neuron.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-10-25 08:24:58 -07:00
Sayak Paul
df073ba137 [research_projects] add flux training script with quantization (#9754)
* add flux training script with quantization

* remove exclamation
2024-10-26 00:07:57 +09:00
Leo Jiang
94643fac8a [bugfix] bugfix for npu free memory (#9640)
* Improve NPU performance

* Improve NPU performance

* Improve NPU performance

* Improve NPU performance

* [bugfix] bugfix for npu free memory

* [bugfix] bugfix for npu free memory

* [bugfix] bugfix for npu free memory

---------

Co-authored-by: 蒋硕 <jiangshuo9@h-partners.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-10-25 23:35:19 +09:00
Zhiyang Shen
435f6b7e47 [Docs] fix docstring typo in SD3 pipeline (#9765)
* fix docstring typo in SD3 pipeline

* fix docstring typo in SD3 pipeline
2024-10-25 16:33:35 +05:30
Sayak Paul
1d1e1a2888 Some minor updates to the nightly and push workflows (#9759)
* move lora integration tests to nightly./

* remove slow marker in the workflow where not needed.
2024-10-24 23:49:09 +09:00
Rachit Shah
24c7d578ba config attribute not foud error for FluxImagetoImage Pipeline for multi controlnet solved (#9586)
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-10-23 10:33:29 -10:00
Linoy Tsaban
bfa0aa4ff2 [SD3-5 dreambooth lora] update model cards (#9749)
* improve readme

* style

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-10-23 23:16:53 +03:00
Álvaro Somoza
ab1b7b2080 [Official callbacks] SDXL Controlnet CFG Cutoff (#9311)
* initial proposal

* style
2024-10-23 13:21:56 -03:00
Fanli Lin
9366c8f84b fix bug in require_accelerate_version_greater (#9746)
fix bug
2024-10-23 10:01:33 +05:30
Sayak Paul
e45c25d03a post-release 0.31.0 (#9742)
* post-release

* style
2024-10-22 20:42:30 +05:30
Dhruv Nair
76c00c7236 is_safetensors_compatible fix (#9741)
update
2024-10-22 19:35:03 +05:30
Dhruv Nair
0d9d98fe5f Fix typos (#9739)
* update

* update

* update

* update

* update

* update
2024-10-22 16:12:28 +05:30
Sayak Paul
60ffa84253 [bitsandbbytes] follow-ups (#9730)
* bnb follow ups.

* add a warning when dtypes mismatch.

* fx-copies

* clear cache.

* check_if_quantized_param

* add a check on shape.

* updates

* docs

* improve readability.

* resources.

* fix
2024-10-22 16:00:05 +05:30
Álvaro Somoza
0f079b932d [Fix] Using sharded checkpoints with gated repositories (#9737)
fix
2024-10-22 01:33:52 -03:00
Yu Zheng
b0ffe92230 Update sd3 controlnet example (#9735)
* use make_image_grid in diffusers.utils

* use checkpoint on the Hub
2024-10-22 09:02:16 +05:30
Tolga Cangöz
1b64772b79 Fix schedule_shifted_power usage in 🪆Matryoshka Diffusion Models (#9723)
* [matryoshka.py] Add schedule_shifted_power attribute and update get_schedule_shifted method
2024-10-21 14:23:50 -10:00
YiYi Xu
2d280f173f fix singlestep dpm tests (#9716)
fix

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-10-21 13:27:01 -10:00
G.O.D
63a0c9e5f7 [bugfix] reduce float value error when adding noise (#9004)
* Update train_controlnet.py

reduce float value error for bfloat16

* Update train_controlnet_sdxl.py

* style

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail.com>
2024-10-21 13:26:05 -10:00
YiYi Xu
e2d037bbf1 minor doc/test update (#9734)
* update some docs and tests!

---------

Co-authored-by: Aryan <contact.aryanvs@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: apolinário <joaopaulo.passos@gmail.com>
2024-10-21 13:06:13 -10:00
timdalxx
bcd61fd349 [docs] add docstrings in pipline_stable_diffusion.py (#9590)
* fix the issue on flux dreambooth lora training

* update : origin main code

* docs: update pipeline_stable_diffusion docstring

* docs: update pipeline_stable_diffusion docstring

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix: style

* fix: style

* fix: copies

* make fix-copies

* remove extra newline

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-10-21 09:39:20 -07:00
Sayak Paul
d27ecc5960 [Docs] docs to xlabs controlnets. (#9688)
* docs to xlabs controlnets.

Co-authored-by: Anzhella Pankratova <son0shad@gmail.com>

* Apply suggestions from code review

Co-authored-by: Anzhella Pankratova <54744846+Anghellia@users.noreply.github.com>

---------

Co-authored-by: Anzhella Pankratova <son0shad@gmail.com>
Co-authored-by: Anzhella Pankratova <54744846+Anghellia@users.noreply.github.com>
2024-10-21 09:38:22 -07:00
Chenyu Li
6b915672f4 Fix typo in cogvideo pipeline (#9722)
Fix type in cogvideo pipeline
2024-10-21 21:39:39 +05:30
Sayak Paul
b821f006d0 [Quantization] Add quantization support for bitsandbytes (#9213)
* quantization config.

* fix-copies

* fix

* modules_to_not_convert

* add bitsandbytes utilities.

* make progress.

* fixes

* quality

* up

* up

rotary embedding refactor 2: update comments, fix dtype for use_real=False (#9312)

fix notes and dtype

up

up

* minor

* up

* up

* fix

* provide credits where due.

* make configurations work.

* fixes

* fix

* update_missing_keys

* fix

* fix

* make it work.

* fix

* provide credits to transformers.

* empty commit

* handle to() better.

* tests

* change to bnb from bitsandbytes

* fix tests

fix slow quality tests

SD3 remark

fix

complete int4 tests

add a readme to the test files.

add model cpu offload tests

warning test

* better safeguard.

* change merging status

* courtesy to transformers.

* move  upper.

* better

* make the unused kwargs warning friendlier.

* harmonize changes with https://github.com/huggingface/transformers/pull/33122

* style

* trainin tests

* feedback part i.

* Add Flux inpainting and Flux Img2Img (#9135)

---------

Co-authored-by: yiyixuxu <yixu310@gmail.com>

Update `UNet2DConditionModel`'s error messages (#9230)

* refactor

[CI] Update Single file Nightly Tests (#9357)

* update

* update

feedback.

improve README for flux dreambooth lora (#9290)

* improve readme

* improve readme

* improve readme

* improve readme

fix one uncaught deprecation warning for accessing vae_latent_channels in VaeImagePreprocessor (#9372)

deprecation warning vae_latent_channels

add mixed int8 tests and more tests to nf4.

[core] Freenoise memory improvements (#9262)

* update

* implement prompt interpolation

* make style

* resnet memory optimizations

* more memory optimizations; todo: refactor

* update

* update animatediff controlnet with latest changes

* refactor chunked inference changes

* remove print statements

* update

* chunk -> split

* remove changes from incorrect conflict resolution

* remove changes from incorrect conflict resolution

* add explanation of SplitInferenceModule

* update docs

* Revert "update docs"

This reverts commit c55a50a271.

* update docstring for freenoise split inference

* apply suggestions from review

* add tests

* apply suggestions from review

quantization docs.

docs.

* Revert "Add Flux inpainting and Flux Img2Img (#9135)"

This reverts commit 5799954dd4.

* tests

* don

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* contribution guide.

* changes

* empty

* fix tests

* harmonize with https://github.com/huggingface/transformers/pull/33546.

* numpy_cosine_distance

* config_dict modification.

* remove if config comment.

* note for load_state_dict changes.

* float8 check.

* quantizer.

* raise an error for non-True low_cpu_mem_usage values when using quant.

* low_cpu_mem_usage shenanigans when using fp32 modules.

* don't re-assign _pre_quantization_type.

* make comments clear.

* remove comments.

* handle mixed types better when moving to cpu.

* add tests to check if we're throwing warning rightly.

* better check.

* fix 8bit test_quality.

* handle dtype more robustly.

* better message when keep_in_fp32_modules.

* handle dtype casting.

* fix dtype checks in pipeline.

* fix warning message.

* Update src/diffusers/models/modeling_utils.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* mitigate the confusing cpu warning

---------

Co-authored-by: Vishnu V Jaddipal <95531133+Gothos@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-10-21 10:11:57 +05:30
Aryan
24281f8036 make deps_table_update to fix CI tests (#9720)
* update

* dummy change to trigger CI; will revert

* no deps peft

* np deps

* todo

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-10-21 09:58:26 +05:30
Sayak Paul
2a1d2f6218 [Docker] pin torch versions in the dockerfiles. (#9721)
* pin torch versions in the dockerfiles.

* more
2024-10-20 10:44:09 +05:30
Aryan
56d6d21bae [CI] pin max torch version to fix CI errors (#9709)
* pin max torch version

* update

* Update setup.py
2024-10-20 01:50:56 +05:30
hlky
89565e9171 Add prompt scheduling callback to community scripts (#9718) 2024-10-19 14:22:22 -03:00
bonlime
5d3e7bdaaa Fix bug in Textual Inversion Unloading (#9304)
* Update textual_inversion.py

* add unload test

* add comment

* fix style

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-10-19 02:37:32 -10:00
Linoy Tsaban
2541d141d5 [advanced flux lora script] minor updates to readme (#9705)
* fix arg naming

* fix arg naming

* fix arg naming

* fix arg naming
2024-10-18 15:35:44 +03:00
Aryan
5704376d03 [refactor] DiffusionPipeline.download (#9557)
* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-10-17 12:38:06 -10:00
Linoy Tsaban
9a7f824645 [Flux] Add advanced training script + support textual inversion inference (#9434)
* add ostris trainer to README & add cache latents of vae

* add ostris trainer to README & add cache latents of vae

* style

* readme

* add test for latent caching

* add ostris noise scheduler
9ee1ef2a0a/toolkit/samplers/custom_flowmatch_sampler.py (L95)

* style

* fix import

* style

* fix tests

* style

* --change upcasting of transformer?

* update readme according to main

* add pivotal tuning for CLIP

* fix imports, encode_prompt call,add TextualInversionLoaderMixin to FluxPipeline for inference

* TextualInversionLoaderMixin support for FluxPipeline for inference

* move changes to advanced flux script, revert canonical

* add latent caching to canonical script

* revert changes to canonical script to keep it separate from https://github.com/huggingface/diffusers/pull/9160

* revert changes to canonical script to keep it separate from https://github.com/huggingface/diffusers/pull/9160

* style

* remove redundant line and change code block placement to align with logic

* add initializer_token arg

* add transformer frac for range support from pure textual inversion to the orig pivotal tuning

* support pure textual inversion - wip

* adjustments to support pure textual inversion and transformer optimization in only part of the epochs

* fix logic when using initializer token

* fix pure_textual_inversion_condition

* fix ti/pivotal loading of last validation run

* remove embeddings loading for ti in final training run (to avoid adding huggingface hub dependency)

* support pivotal for t5

* adapt pivotal for T5 encoder

* adapt pivotal for T5 encoder and support in flux pipeline

* t5 pivotal support + support fo pivotal for clip only or both

* fix param chaining

* fix param chaining

* README first draft

* readme

* readme

* readme

* style

* fix import

* style

* add fix from https://github.com/huggingface/diffusers/pull/9419

* add to readme, change function names

* te lr changes

* readme

* change concept tokens logic

* fix indices

* change arg name

* style

* dummy test

* revert dummy test

* reorder pivoting

* add warning in case the token abstraction is not the instance prompt

* experimental - wip - specific block training

* fix documentation and token abstraction processing

* remove transformer block specification feature (for now)

* style

* fix copies

* fix indexing issue when --initializer_concept has different amounts

* add if TextualInversionLoaderMixin to all flux pipelines

* style

* fix import

* fix imports

* address review comments - remove necessary prints & comments, use pin_memory=True, use free_memory utils, unify warning and prints

* style

* logger info fix

* make lora target modules configurable and change the default

* make lora target modules configurable and change the default

* style

* make lora target modules configurable and change the default, add notes to readme

* style

* add tests

* style

* fix repo id

* add updated requirements for advanced flux

* fix indices of t5 pivotal tuning embeddings

* fix path in test

* remove `pin_memory`

* fix filename of embedding

* fix filename of embedding

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-10-17 12:22:11 +03:00
Aryan
d9029f2c59 [tests] fix name and unskip CogI2V integration test (#9683)
update

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-10-16 16:28:19 +05:30
Aryan
d204e53291 [core] improve VAE encode/decode framewise batching (#9684)
* update

* apply suggestions from review

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-10-16 16:25:41 +05:30
Aryan
8cabd4a0db [pipeline] CogVideoX-Fun Control (#9671)
* cogvideox-fun control

* make style

* make fix-copies

* karras schedulers

* Update src/diffusers/pipelines/cogvideo/pipeline_cogvideox_fun_control.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/api/pipelines/cogvideox.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* apply suggestions from review

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-10-16 16:21:09 +05:30
Jongho Choi
5783286d2b [peft] simple update when unscale (#9689)
Update peft_utils.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-10-16 16:10:19 +05:30
Linoy Tsaban
ee4ab23892 [SD3 dreambooth-lora training] small updates + bug fixes (#9682)
* add latent caching + smol updates

* update license

* replace with free_memory

* add --upcast_before_saving to allow saving transformer weights in lower precision

* fix models to accumulate

* fix mixed precision issue as proposed in https://github.com/huggingface/diffusers/pull/9565

* smol update to readme

* style

* fix caching latents

* style

* add tests for latent caching

* style

* fix latent caching

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-10-16 11:13:37 +03:00
Sayak Paul
cef4f65cf7 [LoRA] log a warning when there are missing keys in the LoRA loading. (#9622)
* log a warning when there are missing keys in the LoRA loading.

* handle missing keys and unexpected keys better.

* add tests

* fix-copies.

* updates

* tests

* concat warning.

* Add Differential Diffusion to Kolors (#9423)

* Added diff diff support for kolors img2img

* Fized relative imports

* Fized relative imports

* Added diff diff support for Kolors

* Fized import issues

* Added map

* Fized import issues

* Fixed naming issues

* Added diffdiff support for Kolors img2img pipeline

* Removed example docstrings

* Added map input

* Updated latents

Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>

* Updated `original_with_noise`

Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>

* Improved code quality

---------

Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>

* FluxMultiControlNetModel (#9647)

* tests

* Update src/diffusers/loaders/lora_pipeline.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* fix

---------

Co-authored-by: M Saqlain <118016760+saqlain2204@users.noreply.github.com>
Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
Co-authored-by: hlky <hlky@hlky.ac>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-10-16 07:46:12 +05:30
Charchit Sharma
29a2c5d1ca Resolves [BUG] 'GatheredParameters' object is not callable (#9614)
* gatherparams bug

* calling context lib object

* fix

---------

Co-authored-by: Aryan <aryan@huggingface.co>
2024-10-16 06:44:10 +05:30
glide-the
0d935df67d Docs: CogVideoX (#9578)
* CogVideoX docs


---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-10-15 14:41:56 -10:00
YiYi Xu
3e9a28a8a1 [authored by @Anghellia) Add support of Xlabs Controlnets #9638 (#9687)
* Add support of Xlabs Controlnets


---------

Co-authored-by: Anzhella Pankratova <son0shad@gmail.com>
2024-10-15 12:10:45 -10:00
Aryan
2ffbb88f1c [training] CogVideoX-I2V LoRA (#9482)
* update

* update

* update

* update

* update

* add coauthor

Co-Authored-By: yuan-shenghai <963658029@qq.com>

* add coauthor

Co-Authored-By: Shenghai Yuan <140951558+SHYuanBest@users.noreply.github.com>

* update

Co-Authored-By: yuan-shenghai <963658029@qq.com>

* update

---------

Co-authored-by: yuan-shenghai <963658029@qq.com>
Co-authored-by: Shenghai Yuan <140951558+SHYuanBest@users.noreply.github.com>
2024-10-16 02:07:07 +05:30
Ahnjj_DEV
d40da7b68a Fix some documentation in ./src/diffusers/models/adapter.py (#9591)
* Fix some documentation in ./src/diffusers/models/adapter.py

* Update src/diffusers/models/adapter.py

* Update src/diffusers/models/adapter.py

* Update src/diffusers/models/adapter.py

* Update src/diffusers/models/adapter.py

* Update src/diffusers/models/adapter.py

* Update src/diffusers/models/adapter.py

* Update src/diffusers/models/adapter.py

* Update src/diffusers/models/adapter.py

* Update src/diffusers/models/adapter.py

* Update src/diffusers/models/adapter.py

* Update src/diffusers/models/adapter.py

* Update src/diffusers/models/adapter.py

* Update src/diffusers/models/adapter.py

* Update src/diffusers/models/adapter.py

* Update src/diffusers/models/adapter.py

* Update src/diffusers/models/adapter.py

* Update src/diffusers/models/adapter.py

* Update src/diffusers/models/adapter.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/diffusers/models/adapter.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/diffusers/models/adapter.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/diffusers/models/adapter.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/diffusers/models/adapter.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/diffusers/models/adapter.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/diffusers/models/adapter.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* run make style

* make style & fix

* make style : 0.1.5 version ruff

* revert changes to examples

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Aryan <aryan@huggingface.co>
2024-10-15 10:27:39 -07:00
wony617
a3e8d3f7de [docs] refactoring docstrings in models/embeddings_flax.py (#9592)
* [docs] refactoring docstrings in `models/embeddings_flax.py`

* Update src/diffusers/models/embeddings_flax.py

* make style

---------

Co-authored-by: Aryan <aryan@huggingface.co>
2024-10-15 19:15:14 +05:30
wony617
fff4be8e23 [docs] refactoring docstrings in community/hd_painter.py (#9593)
* [docs] refactoring docstrings in community/hd_painter.py

* Update examples/community/hd_painter.py

Co-authored-by: Aryan <contact.aryanvs@gmail.com>

* make style

---------

Co-authored-by: Aryan <contact.aryanvs@gmail.com>
Co-authored-by: Aryan <aryan@huggingface.co>
2024-10-15 18:50:12 +05:30
Jiwook Han
355bb641e3 [doc] Fix some docstrings in src/diffusers/training_utils.py (#9606)
* refac: docstrings in training_utils.py

* fix: manual edits

* run make style

* add docstring at cast_training_params

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-10-15 18:47:52 +05:30
Charchit Sharma
92d2baf643 refactor image_processor.py file (#9608)
* refactor image_processor file

* changes as requested

* +1 edits

* quality fix

* indent issue

---------

Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-10-15 17:20:33 +05:30
0x名無し
dccf39f01e Dreambooth lora flux bug 3dtensor to 2dtensor (#9653)
* fixed issue #9350, Tensor is deprecated

* ran make style
2024-10-15 17:18:13 +05:30
Sayak Paul
99d87474fd [Chore] fix import of EntryNotFoundError. (#9676)
fix import of EntryNotFoundError.
2024-10-15 14:07:08 +05:30
Robin
79b118e863 [Fix] when run load pretain with local_files_only, local variable 'cached_folder' referenced before assignment (#9376)
Fix local variable 'cached_folder' referenced before assignment in hub_utils.py

Fix when use `local_files_only=True` with `subfolder`, local variable 'cached_folder' referenced before assignment issue.

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-10-14 20:49:36 -10:00
hlky
9d0616189e Slight performance improvement to Euler, EDMEuler, FlowMatchHeun, KDPM2Ancestral (#9616)
* Slight performance improvement to Euler

* Slight performance improvement to EDMEuler

* Slight performance improvement to FlowMatchHeun

* Slight performance improvement to KDPM2Ancestral

* Update KDPM2AncestralDiscreteSchedulerTest

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-10-14 19:34:25 -10:00
hlky
5f0df17703 Refactor SchedulerOutput and add pred_original_sample in DPMSolverSDE, Heun, KDPM2Ancestral and KDPM2 (#9650)
Refactor SchedulerOutput and add pred_original_sample

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-10-14 18:11:01 -10:00
hlky
957e5cabff Convert list/tuple of HunyuanDiT2DControlNetModel to HunyuanDiT2DMultiControlNetModel (#9651)
Convert list/tuple of HunyuanDiT2DControlNetModel to HunyuanDiT2DMultiControlNetModel

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-10-14 18:09:30 -10:00
hlky
3e4c5707c3 Convert list/tuple of SD3ControlNetModel to SD3MultiControlNetModel (#9652)
Convert list/tuple of SD3ControlNetModel to SD3MultiControlNetModel

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-10-14 17:57:34 -10:00
hlky
1bcd19e4d0 Add pred_original_sample to if not return_dict path (#9649)
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-10-14 17:56:54 -10:00
SahilCarterr
22ed39f571 Added Lora Support to SD3 Img2Img Pipeline (#9659)
* add lora
2024-10-14 11:39:20 -10:00
Tolga Cangöz
56c21150d8 [Community Pipeline] Add 🪆Matryoshka Diffusion Models (#9157) 2024-10-14 11:38:44 -10:00
Leo Jiang
5956b68a69 Improve the performance and suitable for NPU computing (#9642)
* Improve the performance and suitable for NPU

* Improve the performance and suitable for NPU computing

* Improve the performance and suitable for NPU

* Improve the performance and suitable for NPU

* Improve the performance and suitable for NPU

* Improve the performance and suitable for NPU

---------

Co-authored-by: 蒋硕 <jiangshuo9@h-partners.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-10-14 21:39:33 +05:30
Yuxuan.Zhang
8d81564b27 CogView3Plus DiT (#9570)
* merge 9588

* max_shard_size="5GB" for colab running

* conversion script updates; modeling test; refactor transformer

* make fix-copies

* Update convert_cogview3_to_diffusers.py

* initial pipeline draft

* make style

* fight bugs 🐛🪳

* add example

* add tests; refactor

* make style

* make fix-copies

* add co-author

YiYi Xu <yixu310@gmail.com>

* remove files

* add docs

* add co-author

Co-Authored-By: YiYi Xu <yixu310@gmail.com>

* fight docs

* address reviews

* make style

* make model work

* remove qkv fusion

* remove qkv fusion tets

* address review comments

* fix make fix-copies error

* remove None and TODO

* for FP16(draft)

* make style

* remove dynamic cfg

* remove pooled_projection_dim as a parameter

* fix tests

---------

Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-10-14 19:30:36 +05:30
Ryan Lin
68d16f7806 Flux - soft inpainting via differential diffusion (#9268)
* Flux - soft inpainting via differential diffusion

* .

* track changes to FluxInpaintPipeline

* make mask arrangement simplier

* make style

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
Co-authored-by: asomoza <somoza.alvaro@gmail.com>
2024-10-14 10:07:48 -03:00
Sayak Paul
86bcbc389e [Tests] increase transformers version in test_low_cpu_mem_usage_with_loading (#9662)
increase transformers version in test_low_cpu_mem_usage_with_loading
2024-10-13 22:39:38 +05:30
Jinzhe Pan
6a5f06488c [docs] Fix xDiT doc image damage (#9655)
* docs: fix xDiT doc image damage

* doc: move xdit images to hf dataset

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-10-12 13:05:07 +05:30
Sayak Paul
c7a6d77b5f [CI] replace ubuntu version to 22.04. (#9656)
replace ubuntu version to 22.04.
2024-10-12 11:55:36 +05:30
hlky
0f8fb75c7b FluxMultiControlNetModel (#9647) 2024-10-11 14:39:19 -03:00
M Saqlain
3033f08201 Add Differential Diffusion to Kolors (#9423)
* Added diff diff support for kolors img2img

* Fized relative imports

* Fized relative imports

* Added diff diff support for Kolors

* Fized import issues

* Added map

* Fized import issues

* Fixed naming issues

* Added diffdiff support for Kolors img2img pipeline

* Removed example docstrings

* Added map input

* Updated latents

Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>

* Updated `original_with_noise`

Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>

* Improved code quality

---------

Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
2024-10-11 10:47:31 -03:00
GSSun
164ec9f423 fix IsADirectoryError when running the training code for sd3_dreambooth_lora_16gb.ipynb (#9634)
Add files via upload

fix IsADirectoryError when running the training code
2024-10-11 13:33:39 +05:30
Subho Ghosh
38a3e4df92 flux controlnet control_guidance_start and control_guidance_end implement (#9571)
* flux controlnet control_guidance_start and control_guidance_end implement

* minor fix - added docstrings, consistent controlnet scale flux and SD3
2024-10-10 09:29:02 -03:00
Sayak Paul
e16fd93d0a [LoRA] fix dora test to catch the warning properly. (#9627)
fix dora test.
2024-10-10 11:47:49 +05:30
Pakkapon Phongthawee
07bd2fabb6 make controlnet support interrupt (#9620)
* make controlnet support interrupt

* remove white space in controlnet interrupt
2024-10-09 12:03:13 -10:00
SahilCarterr
af28ae2d5b add PAG support for SD Img2Img (#9463)
* added pag to sd img2img pipeline


---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-10-09 10:40:58 -10:00
Sayak Paul
31058cdaef [LoRA] allow loras to be loaded with low_cpu_mem_usage. (#9510)
* allow loras to be loaded with low_cpu_mem_usage.

* add flux support but note https://github.com/huggingface/diffusers/pull/9510\#issuecomment-2378316687

* low_cpu_mem_usage.

* fix-copies

* fix-copies again

* tests

* _LOW_CPU_MEM_USAGE_DEFAULT_LORA

* _peft_version default.

* version checks.

* version check.

* version check.

* version check.

* require peft 0.13.1.

* explicitly specify low_cpu_mem_usage=False.

* docs.

* transformers version 4.45.2.

* update

* fix

* empty

* better name initialize_dummy_state_dict.

* doc todos.

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* style

* fix-copies

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-10-09 10:57:16 +05:30
Yijun Lee
ec9e5264c0 refac/pipeline_output (#9582) 2024-10-08 16:11:13 -10:00
sanaka
acd6d2c42f Fix the bug that joint_attention_kwargs is not passed to the FLUX's transformer attention processors (#9517)
* Update transformer_flux.py
2024-10-08 11:25:48 -10:00
v2ray
86bd991ee5 Fixed noise_pred_text referenced before assignment. (#9537)
* Fixed local variable noise_pred_text referenced before assignment when using PAG with guidance scale and guidance rescale at the same time.

* Fixed style.

* Made returning text pred noise an argument.
2024-10-08 09:27:10 -10:00
Sayak Paul
02eeb8e77e [LoRA] Handle DoRA better (#9547)
* handle dora.

* print test

* debug

* fix

* fix-copies

* update logits

* add warning in the test.

* make is_dora check consistent.

* fix-copies
2024-10-08 21:47:44 +05:30
glide-the
66eef9a6dc fix: CogVideox train dataset _preprocess_data crop video (#9574)
* Removed int8 to float32 conversion (`* 2.0 - 1.0`) from `train_transforms` as it caused image overexposure.

Added `_resize_for_rectangle_crop` function to enable video cropping functionality. The cropping mode can be configured via `video_reshape_mode`, supporting options: ['center', 'random', 'none'].

* The number 127.5 may experience precision loss during division operations.

* wandb request pil image Type

* Resizing bug

* del jupyter

* make style

* Update examples/cogvideo/README.md

* make style

---------

Co-authored-by: --unset <--unset>
Co-authored-by: Aryan <aryan@huggingface.co>
2024-10-08 12:52:52 +05:30
Sayak Paul
63a5c8742a Update distributed_inference.md to include transformer.device_map (#9553)
* Update distributed_inference.md to include `transformer.device_map`

* Update docs/source/en/training/distributed_inference.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-10-08 08:03:51 +05:30
Eliseu Silva
1287822973 Fix for use_safetensors parameters, allow use of parameter on loading submodels (#9576) (#9587)
* Fix for use_safetensors parameters, allow use of parameter on loading submodels (#9576)
2024-10-07 10:41:32 -10:00
Yijun Lee
a80f689200 refac: docstrings in import_utils.py (#9583)
* refac: docstrings in import_utils.py

* Update import_utils.py
2024-10-07 13:27:35 -07:00
captainzz
2cb383f591 fix vae dtype when accelerate config using --mixed_precision="fp16" (#9601)
* fix vae dtype when accelerate config using --mixed_precision="fp16"

* Add param for upcast vae
2024-10-07 21:00:25 +05:30
Sayak Paul
31010ecc45 [Chore] add a note on the versions in Flux LoRA integration tests (#9598)
add a note on the versions.
2024-10-07 17:43:48 +05:30
Clem
3159e60d59 fix xlabs FLUX lora conversion typo (#9581)
* fix startswith syntax in xlabs lora conversion

* Trigger CI

https://github.com/huggingface/diffusers/pull/9581#issuecomment-2395530360
2024-10-07 10:47:54 +05:30
YiYi Xu
99f608218c [sd3] make sure height and size are divisible by 16 (#9573)
* check size

* up
2024-10-03 08:36:26 -10:00
Xiangchendong
7f323f0f31 fix cogvideox autoencoder decode (#9569)
Co-authored-by: Aryan <aryan@huggingface.co>
2024-10-02 09:07:06 -10:00
Darren Hsu
61d37640ad Support bfloat16 for Upsample2D (#9480)
* Support bfloat16 for Upsample2D

* Add test and use is_torch_version

* Resolve comments and add decorator

* Simplify require_torch_version_greater_equal decorator

* Run make style

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-10-01 16:08:12 -10:00
JuanCarlosPi
33fafe3d14 Add PAG support to StableDiffusionControlNetPAGInpaintPipeline (#8875)
* Add pag to controlnet inpainting pipeline


---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-09-30 20:04:42 -10:00
hlky
c4a8979f30 Add beta sigmas to other schedulers and update docs (#9538) 2024-09-30 09:00:54 -10:00
Sayak Paul
f9fd511466 [LoRA] support Kohya Flux LoRAs that have text encoders as well (#9542)
* support kohya flux loras that have tes.
2024-09-30 07:59:39 -10:00
Sayak Paul
8e7d6c03a3 [chore] fix: retain memory utility. (#9543)
* fix: retain memory utility.

* fix

* quality

* free_memory.
2024-09-28 21:08:45 +05:30
Anand Kumar
b28675c605 [train_instruct_pix2pix.py]Fix the LR schedulers when num_train_epochs is passed in a distributed training env (#9316)
Fixed pix2pix lr scheduler

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-09-28 21:01:37 +05:30
Aryan
bd4df2856a [refactor] remove conv_cache from CogVideoX VAE (#9524)
* remove conv cache from the layer and pass as arg instead

* make style

* yiyi's cleaner implementation

Co-Authored-By: YiYi Xu <yixu310@gmail.com>

* sayak's compiled implementation

Co-Authored-By: Sayak Paul <spsayakpaul@gmail.com>

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-09-28 17:09:30 +05:30
Sayak Paul
11542431a5 [Core] fix variant-identification. (#9253)
* fix variant-idenitification.

* fix variant

* fix sharded variant checkpoint loading.

* Apply suggestions from code review

* fixes.

* more fixes.

* remove print.

* fixes

* fixes

* comments

* fixes

* apply suggestions.

* hub_utils.py

* fix test

* updates

* fixes

* fixes

* Apply suggestions from code review

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* updates.

* removep patch file.

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-09-28 09:57:31 +05:30
Sayak Paul
81cf3b2f15 [Tests] [LoRA] clean up the serialization stuff. (#9512)
* clean up the serialization stuff.

* better
2024-09-27 07:57:09 -10:00
PromeAI
534848c370 [examples] add train flux-controlnet scripts in example. (#9324)
* add train flux-controlnet scripts in example.

* fix error

* fix subfolder error

* fix preprocess error

* Update examples/controlnet/README_flux.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update examples/controlnet/README_flux.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* fix readme

* fix note error

* add some Tutorial for deepspeed

* fix some Format Error

* add dataset_path example

* remove print, add guidance_scale CLI, readable apply

* Update examples/controlnet/README_flux.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* update,push_to_hub,save_weight_dtype,static method,clear_objs_and_retain_memory,report_to=wandb

* add push to hub in readme

* apply weighting schemes

* add note

* Update examples/controlnet/README_flux.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* make code style and quality

* fix some unnoticed error

* make code style and quality

* add example controlnet in readme

* add test controlnet

* rm Remove duplicate notes

* Fix formatting errors

* add new control image

* add model cpu offload

* update help for adafactor

* make quality & style

* make quality and style

* rename flux_controlnet_model_name_or_path

* fix back src/diffusers/pipelines/flux/pipeline_flux_controlnet.py

* fix dtype error by pre calculate text emb

* rm image save

* quality fix

* fix test

* fix tiny flux train error

* change report to to tensorboard

* fix save name error when test

* Fix shrinking errors

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Your Name <you@example.com>
2024-09-27 13:31:47 +05:30
Sayak Paul
2daedc0ad3 [LoRA] make set_adapters() method more robust. (#9535)
* make set_adapters() method more robust.

* remove patch

* better and concise code.

* Update src/diffusers/loaders/lora_base.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-09-27 07:32:43 +05:30
Aryan
665c6b47a2 [bug] Precedence of operations in VAE should be slicing -> tiling (#9342)
* bugfix: precedence of operations should be slicing -> tiling

* fix typo

* fix another typo

* deprecate current implementation of tiled_encode and use new impl

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-09-26 22:12:07 +05:30
Álvaro Somoza
066ea374c8 [Tests] Fix ChatGLMTokenizer (#9536)
fix
2024-09-25 22:10:15 -10:00
YiYi Xu
9cd37557d5 flux controlnet fix (control_modes batch & others) (#9507)
* flux controlnet mode to take into account batch size

* incorporate yiyixuxu's suggestions (cleaner logic) as well as clean up control mode handling for multi case

* fix

* fix use_guidance when controlnet is a multi and does not have config

---------

Co-authored-by: Christopher Beckham <christopher.j.beckham@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-09-25 19:09:54 -10:00
hlky
1c6ede9371 [Schedulers] Add beta sigmas / beta noise schedule (#9509)
Add beta sigmas / beta noise schedule
2024-09-25 13:30:32 -10:00
v2ray
aa3c46d99a [Doc] Improved level of clarity for latents_to_rgb. (#9529)
Fixed latents_to_rgb doc.

Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
2024-09-25 19:26:58 -03:00
YiYi Xu
c76e88405c update get_parameter_dtype (#9526)
* up

* Update src/diffusers/models/modeling_utils.py

Co-authored-by: Aryan <aryan@huggingface.co>

---------

Co-authored-by: Aryan <aryan@huggingface.co>
2024-09-25 11:00:57 -10:00
Steven Liu
d9c969172d [docs] Model sharding (#9521)
* flux shard

* feedback
2024-09-25 09:33:54 -07:00
Lee Penkman
065ce07ac3 Update community_projects.md (#9266) 2024-09-25 08:54:36 -07:00
Sayak Paul
6ca5a58e43 [Community Pipeline] Batched implementation of Flux with CFG (#9513)
* batched implementation of flux cfg.

* style.

* readme

* remove comments.
2024-09-25 15:25:15 +05:30
hlky
b52684c3ed Add exponential sigmas to other schedulers and update docs (#9518) 2024-09-24 14:50:12 -10:00
YiYi Xu
bac8a2412d a few fix for SingleFile tests (#9522)
* update sd15 repo

* update more
2024-09-24 13:36:53 -10:00
Sayak Paul
28f9d84549 [CI] allow faster downloads from the Hub in CI. (#9478)
* allow faster downloads from the Hub in CI.

* HF_HUB_ENABLE_HF_TRANSFER: 1

* empty

* empty

* remove ENV HF_HUB_ENABLE_HF_TRANSFER=1.

* empty
2024-09-24 09:42:11 +05:30
LukeLin
2b5bc5be0b [Doc] Fix path and and also import imageio (#9506)
* Fix bug

* import imageio
2024-09-23 16:47:34 -07:00
captainzz
bab17789b5 fix bugs for sd3 controlnet training (#9489)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-09-23 13:40:44 -10:00
hlky
19547a5734 Add Noise Schedule/Schedule Type to Schedulers Overview documentation (#9504)
* Add Noise Schedule/Schedule Type to Schedulers Overview docs

* Update docs/source/en/api/schedulers/overview.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-09-23 16:39:55 -07:00
Seongbin Lim
3e69e241f7 Allow DDPMPipeline half precision (#9222)
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-09-23 13:28:14 -10:00
hlky
65f9439b56 [Schedulers] Add exponential sigmas / exponential noise schedule (#9499)
* exponential sigmas

* Apply suggestions from code review

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* make style

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-09-23 13:12:51 -10:00
pibbo88
00f5b41862 Fix the bug of sd3 controlnet training when using gradient checkpointing. (#9498)
Fix the bug of sd3 controlnet training when using gradient_checkpointing. Refer to issue #9496
2024-09-23 12:30:24 -10:00
M Saqlain
14f6464bef [Tests] Reduce the model size in the lumina test (#8985)
* Reduced model size for lumina-tests

* Handled failing tests
2024-09-23 20:35:50 +05:30
Sayak Paul
ba5af5aebb [Cog] some minor fixes and nits (#9466)
* fix positional arguments in check_inputs().

* add video and latetns to check_inputs().

* prep latents_in_channels.

* quality

* multiple fixes.

* fix
2024-09-23 11:27:05 +05:30
Sayak Paul
aa73072f1f [CI] fix nightly model tests (#9483)
* check if default attn procs fix it.

* print

* print

* replace

* style./

* replace revision with variant.

* replace with stable-diffusion-v1-5/stable-diffusion-inpainting.

* replace with stable-diffusion-v1-5/stable-diffusion-v1-5.

* fix
2024-09-21 07:44:47 +05:30
Aryan
e5d0a328d6 [refactor] LoRA tests (#9481)
* refactor scheduler class usage

* reorder to make tests more readable

* remove pipeline specific checks and skip tests directly

* rewrite denoiser conditions cleaner

* bump tolerance for cog test
2024-09-21 07:10:36 +05:30
Vladimir Mandic
14a1b86fc7 Several fixes to Flux ControlNet pipelines (#9472)
* fix flux controlnet pipelines

---------

Co-authored-by: yiyixuxu <yixu310@gmail.com>
2024-09-19 15:49:36 -10:00
Aryan
2b443a5d62 [training] CogVideoX Lora (#9302)
* cogvideox lora training draft

* update

* update

* update

* update

* update

* make fix-copies

* update

* update

* apply suggestions from review

* apply suggestions from reveiw

* fix typo

* Update examples/cogvideo/train_cogvideox_lora.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* fix lora alpha

* use correct lora scaling for final test pipeline

* Update examples/cogvideo/train_cogvideox_lora.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* apply suggestions from review; prodigy optimizer

YiYi Xu <yixu310@gmail.com>

* add tests

* make style

* add README

* update

* update

* make style

* fix

* update

* add test skeleton

* revert lora utils changes

* add cleaner modifications to lora testing utils

* update lora tests

* deepspeed stuff

* add requirements.txt

* deepspeed refactor

* add lora stuff to img2vid pipeline to fix tests

* fight tests

* add co-authors

Co-Authored-By: Fu-Yun Wang <1697256461@qq.com>

Co-Authored-By: zR <2448370773@qq.com>

* fight lora runner tests

* import Dummy optim and scheduler only wheh required

* update docs

* add coauthors

Co-Authored-By: Fu-Yun Wang <1697256461@qq.com>

* remove option to train text encoder

Co-Authored-By: bghira <bghira@users.github.com>

* update tests

* fight more tests

* update

* fix vid2vid

* fix typo

* remove lora tests; todo in follow-up PR

* undo img2vid changes

* remove text encoder related changes in lora loader mixin

* Revert "remove text encoder related changes in lora loader mixin"

This reverts commit f8a8444487.

* update

* round 1 of fighting tests

* round 2 of fighting tests

* fix copied from comment

* fix typo in lora test

* update styling

Co-Authored-By: YiYi Xu <yixu310@gmail.com>

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: zR <2448370773@qq.com>
Co-authored-by: Fu-Yun Wang <1697256461@qq.com>
Co-authored-by: bghira <bghira@users.github.com>
2024-09-19 14:37:57 +05:30
Sayak Paul
d13b0d63c0 [Flux] add lora integration tests. (#9353)
* add lora integration tests.

* internal note

* add a skip marker.
2024-09-19 09:21:28 +05:30
Anatoly Belikov
5d476f57c5 adapt masked im2im pipeline for SDXL (#7790)
* adapt masked im2im pipeline for SDXL

* usage for masked im2im stable diffusion XL pipeline

* style

* style

* style

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-09-17 16:55:49 -10:00
Aryan
da18fbd54c set max_shard_size to None for pipeline save_pretrained (#9447)
* update default max_shard_size

* add None check to fix tests

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-09-17 10:15:18 -10:00
Aryan
ba06124e4a Remove CogVideoX mentions from single file docs; Test updates (#9444)
* remove mentions from single file

* update tests

* update
2024-09-17 10:05:45 -10:00
Subho Ghosh
bb1b0fa1f9 Feature flux controlnet img2img and inpaint pipeline (#9408)
* Implemented FLUX controlnet support to Img2Img pipeline
2024-09-17 09:43:54 -10:00
Linoy Tsaban
8fcfb2a456 [Flux with CFG] add flux pipeline with cfg support (#9445)
* true_cfg

* add check negative prompt/embeds inputs

* move to community pipelines

* move to community pipelines

* revert true cfg changes to the orig pipline

* style

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-09-16 12:09:34 -10:00
Sayak Paul
5440cbd34e [CI] updates to the CI report naming, and accelerate installation (#9429)
* chore: id accordingly to avoid duplicates.

* update properly.

* updates

* updates

* empty

* updates

* changing order helps?
2024-09-16 11:29:07 -10:00
suzukimain
b52119ae92 [docs] Replace runwayml/stable-diffusion-v1-5 with Lykon/dreamshaper-8 (#9428)
* [docs] Replace runwayml/stable-diffusion-v1-5 with Lykon/dreamshaper-8

Updated documentation as runwayml/stable-diffusion-v1-5 has been removed from Huggingface.

* Update docs/source/en/using-diffusers/inpaint.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Replace with stable-diffusion-v1-5/stable-diffusion-v1-5

* Update inpaint.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-09-16 10:18:45 -07:00
Yuxuan.Zhang
8336405e50 CogVideoX-5b-I2V support (#9418)
* draft Init

* draft

* vae encode image

* make style

* image latents preparation

* remove image encoder from conversion script

* fix minor bugs

* make pipeline work

* make style

* remove debug prints

* fix imports

* update example

* make fix-copies

* add fast tests

* fix import

* update vae

* update docs

* update image link

* apply suggestions from review

* apply suggestions from review

* add slow test

* make use of learned positional embeddings

* apply suggestions from review

* doc change

* Update convert_cogvideox_to_diffusers.py

* make style

* final changes

* make style

* fix tests

---------

Co-authored-by: Aryan <aryan@huggingface.co>
2024-09-16 14:46:24 +05:30
Sayak Paul
2171f77ac5 [CI] make runner_type restricted. (#9441)
make runner_type restricted.
2024-09-16 12:09:31 +05:30
Aryan
2454b98af4 Allow max shard size to be specified when saving pipeline (#9440)
allow max shard size to be specified when saving pipeline
2024-09-16 08:36:07 +05:30
Linoy Tsaban
37e3603c4a [Flux Dreambooth lora] add latent caching (#9160)
* add ostris trainer to README & add cache latents of vae

* add ostris trainer to README & add cache latents of vae

* style

* readme

* add test for latent caching

* add ostris noise scheduler
9ee1ef2a0a/toolkit/samplers/custom_flowmatch_sampler.py (L95)

* style

* fix import

* style

* fix tests

* style

* --change upcasting of transformer?

* update readme according to main

* keep only latent caching

* add configurable param for final saving of trained layers- --upcast_before_saving

* style

* Update examples/dreambooth/README_flux.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update examples/dreambooth/README_flux.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* use clear_objs_and_retain_memory from utilities

* style

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-09-15 15:30:31 +03:00
Leo Jiang
e2ead7cdcc Fix the issue on sd3 dreambooth w./w.t. lora training (#9419)
* Fix dtype error

* [bugfix] Fixed the issue on sd3 dreambooth training

* [bugfix] Fixed the issue on sd3 dreambooth training

---------

Co-authored-by: 蒋硕 <jiangshuo9@h-partners.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-09-14 16:29:38 +05:30
Benjamin Bossan
48e36353d8 MAINT Permission for GH token in stale.yml (#9427)
* MAINT Permission for GH token in stale.yml

See https://github.com/huggingface/peft/pull/2061 for the equivalent PR
in PEFT.

This restores the functionality of the stale bot after permissions for
the token have been limited. The action still shows errors for PEFT but
the bot appears to work fine.

* Also add write permissions for PRs

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-09-13 21:00:57 +05:30
Sayak Paul
6dc6486565 [LoRA] fix adapter movement when using DoRA. (#9411)
fix adapter movement when using DoRA.
2024-09-13 07:31:53 +05:30
Dhruv Nair
1e8cf2763d [CI] Nightly Test Updates (#9380)
* update

* update

* update

* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-09-12 20:21:28 +05:30
Sayak Paul
6cf8d98ce1 [CI] update artifact uploader version (#9426)
update artifact uploader version
2024-09-12 19:26:09 +05:30
Juan Acevedo
45aa8bb187 Ptxla sd training (#9381)
* enable pxla training of stable diffusion 2.x models.

* run linter/style and run pipeline test for stable diffusion and fix issues.

* update xla libraries

* fix read me newline.

* move files to research folder.

* update per comments.

* rename readme.

---------

Co-authored-by: Juan Acevedo <jfacevedo@google.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-09-12 08:35:06 +05:30
Aryan
5e1427a7da [docs] AnimateDiff FreeNoise (#9414)
* update docs

* apply suggestions from review

* Update docs/source/en/api/pipelines/animatediff.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/api/pipelines/animatediff.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/api/pipelines/animatediff.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* apply suggestions from review

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-09-11 12:59:58 -07:00
asfiyab-nvidia
b9e2f886cd FluxPosEmbed: Remove Squeeze No-op (#9409)
Remove Squeeze op

Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-09-10 19:12:36 -10:00
dianyo
b19827f6b4 Migrate the BrownianTree to BrownianInterval in DPM solver (#9335)
migrate the BrownianTree to BrownianInterval

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-09-10 18:29:15 -10:00
Yu Zheng
c002731d93 [examples] add controlnet sd3 example (#9249)
* add controlnet sd3 example

* add controlnet sd3 example

* update controlnet sd3 example

* add controlnet sd3 example test

* fix quality and style

* update test

* update test

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-09-11 07:04:37 +05:30
Sayak Paul
adf1f911f0 [Tests] fix some fast gpu tests. (#9379)
fix some fast gpu tests.
2024-09-11 06:50:02 +05:30
captainzz
f28a8c257a fix from_transformer() with extra conditioning channels (#9364)
* fix from_transformer() with extra conditioning channels

* style fix

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Álvaro Somoza <somoza.alvaro@gmail.com>
2024-09-09 07:51:48 -10:00
Jinzhe Pan
2c6a6c97b3 [docs] Add xDiT in section optimization (#9365)
* docs: add xDiT to optimization methods

* fix: picture layout problem

* docs: add more introduction about xdit & apply suggestions

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-09-09 10:31:07 -07:00
Igor Filippov
a7361dccdc [Pipeline] animatediff + vid2vid + controlnet (#9337)
* add animatediff + vid2vide + controlnet

* post tests fixes

* PR discussion fixes

* update docs

* change input video to links on HF + update an example

* make quality fix

* fix ip adapter test

* fix ip adapter test input

* update ip adapter test
2024-09-09 22:48:21 +05:30
YiYi Xu
485b8bb000 refactor get_timesteps for SDXL img2img + add set_begin_index (#9375)
* refator + add begin_index

* add kolors img2img to doc
2024-09-09 06:38:22 -10:00
Sayak Paul
d08ad65819 modify benchmarks to replace sdv1.5 with dreamshaper. (#9334) 2024-09-09 20:54:56 +05:30
YiYi Xu
8cdcdd9e32 add flux inpaint + img2img + controlnet to auto pipeline (#9367) 2024-09-06 07:14:48 -10:00
Dhruv Nair
d269cc8a4e [CI] Quick fix for Cog Video Test (#9373)
update
2024-09-06 15:25:53 +05:30
Aryan
6dfa49963c [core] Freenoise memory improvements (#9262)
* update

* implement prompt interpolation

* make style

* resnet memory optimizations

* more memory optimizations; todo: refactor

* update

* update animatediff controlnet with latest changes

* refactor chunked inference changes

* remove print statements

* update

* chunk -> split

* remove changes from incorrect conflict resolution

* remove changes from incorrect conflict resolution

* add explanation of SplitInferenceModule

* update docs

* Revert "update docs"

This reverts commit c55a50a271.

* update docstring for freenoise split inference

* apply suggestions from review

* add tests

* apply suggestions from review
2024-09-06 12:51:20 +05:30
Haruya Ishikawa
5249a2666e fix one uncaught deprecation warning for accessing vae_latent_channels in VaeImagePreprocessor (#9372)
deprecation warning vae_latent_channels
2024-09-05 07:32:27 -10:00
Linoy Tsaban
55ac421f7b improve README for flux dreambooth lora (#9290)
* improve readme

* improve readme

* improve readme

* improve readme
2024-09-05 17:53:23 +05:30
Dhruv Nair
53051cf282 [CI] Update Single file Nightly Tests (#9357)
* update

* update
2024-09-05 14:33:44 +05:30
Tolga Cangöz
3000551729 Update UNet2DConditionModel's error messages (#9230)
* refactor
2024-09-04 10:49:56 -10:00
Vishnu V Jaddipal
249a9e48e8 Add Flux inpainting and Flux Img2Img (#9135)
---------

Co-authored-by: yiyixuxu <yixu310@gmail.com>
2024-09-04 10:31:43 -10:00
Fanli Lin
2ee3215949 [tests] make 2 tests device-agnostic (#9347)
* enabel on xpu

* fix style
2024-09-03 16:34:03 -10:00
Eduardo Escobar
8ecf499d8b Enable load_lora_weights for StableDiffusion3InpaintPipeline (#9330)
Enable load_lora_weights for StableDiffusion3InpaintPipeline

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-09-03 15:19:37 -10:00
YiYi Xu
dcf320f293 small update on rotary embedding (#9354)
* update

* fix

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-09-03 07:18:33 -10:00
Sayak Paul
8ba90aa706 chore: add a cleaning utility to be useful during training. (#9240) 2024-09-03 15:00:17 +05:30
Aryan
9d49b45b19 [refactor] move positional embeddings to patch embed layer for CogVideoX (#9263)
* remove frame limit in cogvideox

* remove debug prints

* Update src/diffusers/models/transformers/cogvideox_transformer_3d.py

* revert pipeline; remove frame limitation

* revert transformer changes

* address review comments

* add error message

* apply suggestions from review
2024-09-03 14:45:12 +05:30
Dhruv Nair
81da2e1c95 [CI] Add option to dispatch Fast GPU tests on main (#9355)
update
2024-09-03 14:35:13 +05:30
Aryan
24053832b5 [tests] remove/speedup some low signal tests (#9285)
* remove 2 shapes from SDFunctionTesterMixin::test_vae_tiling

* combine freeu enable/disable test to reduce many inference runs

* remove low signal unet test for signature

* remove low signal embeddings test

* remove low signal progress bar test from PipelineTesterMixin

* combine ip-adapter single and multi tests to save many inferences

* fix broken tests

* Update tests/pipelines/test_pipelines_common.py

* Update tests/pipelines/test_pipelines_common.py

* add progress bar tests
2024-09-03 13:59:18 +05:30
Dhruv Nair
f6f16a0c11 [CI] More Fast GPU Test Fixes (#9346)
* update

* update

* update

* update
2024-09-03 13:22:38 +05:30
Vishnu V Jaddipal
1c1ccaa03f Xlabs lora fix (#9348)
* Fix ```from_single_file``` for xl_inpaint

* Add basic flux inpaint pipeline

* style, quality, stray print

* Fix stray changes

* Add inpainting model support

* Change lora conversion for xlabs

* Fix stray changes

* Apply suggestions from code review

* style

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-09-03 10:43:43 +05:30
Dhruv Nair
007ad0e2aa [CI] More fixes for Fast GPU Tests on main (#9300)
update
2024-09-02 17:51:48 +05:30
Aryan
0e6a8403f6 [core] Support VideoToVideo with CogVideoX (#9333)
* add vid2vid pipeline for cogvideox

* make fix-copies

* update docs

* fake context parallel cache, vae encode tiling

* add test for cog vid2vid

* use video link from HF docs repo

* add copied from comments; correctly rename test class
2024-09-02 16:54:58 +05:30
Aryan
af6c0fb766 [core] CogVideoX memory optimizations in VAE encode (#9340)
fake context parallel cache, vae encode tiling

(cherry picked from commit bf890bca0e)
2024-09-02 15:48:37 +05:30
YiYi Xu
d8a16635f4 update runway repo for single_file (#9323)
update to a place holder
2024-08-30 08:51:21 -10:00
Aryan
e417d02811 [docs] Add a note on torchao/quanto benchmarks for CogVideoX and memory-efficient inference (#9296)
* add a note on torchao/quanto benchmarks and memory-efficient inference

* apply suggestions from review

* update

* Update docs/source/en/api/pipelines/cogvideox.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/api/pipelines/cogvideox.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* add note on enable sequential cpu offload

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-08-30 13:53:25 +05:30
Dhruv Nair
1d4d71875b [CI] Update Hub Token on nightly tests (#9318)
update
2024-08-30 10:23:50 +05:30
YiYi Xu
61d96c3ae7 refactor rotary embedding 3: so it is not on cpu (#9307)
change get_1d_rotary to accept pos as torch tensors
2024-08-30 01:07:15 +05:30
YiYi Xu
4f495b06dc rotary embedding refactor 2: update comments, fix dtype for use_real=False (#9312)
fix notes and dtype
2024-08-28 23:31:47 -10:00
Anand Kumar
40c13fe5b4 [train_custom_diffusion.py] Fix the LR schedulers when num_train_epochs is passed in a distributed training env (#9308)
* Update train_custom_diffusion.py to fix the LR schedulers for `num_train_epochs`

* Fix saving text embeddings during safe serialization

* Fixed formatting
2024-08-29 14:23:36 +05:30
Sayak Paul
2a3fbc2cc2 [LoRA] support kohya and xlabs loras for flux. (#9295)
* support kohya lora in flux.

* format

* support xlabs

* diffusion_model prefix.

* Apply suggestions from code review

Co-authored-by: apolinário <joaopaulo.passos@gmail.com>

* empty commit.

Co-authored-by: Leommm-byte <leom20031@gmail.com>

---------

Co-authored-by: apolinário <joaopaulo.passos@gmail.com>
Co-authored-by: Leommm-byte <leom20031@gmail.com>
2024-08-29 07:41:46 +05:30
apolinário
089cf798eb Change default for guidance_scalein FLUX (#9305)
To match the original code, 7.0 is too high
2024-08-28 07:39:45 -10:00
Aryan
cbc2ec8f44 AnimateDiff prompt travel (#9231)
* update

* implement prompt interpolation

* make style

* resnet memory optimizations

* more memory optimizations; todo: refactor

* update

* update animatediff controlnet with latest changes

* refactor chunked inference changes

* remove print statements

* undo memory optimization changes

* update docstrings

* fix tests

* fix pia tests

* apply suggestions from review

* add tests

* update comment
2024-08-28 14:48:12 +05:30
Frank (Haofan) Wang
b5f591fea8 Update __init__.py (#9286) 2024-08-27 07:57:25 -10:00
Dhruv Nair
05b38c3c0d Fix Flux CLIP prompt embeds repeat for num_images_per_prompt > 1 (#9280)
update
2024-08-27 07:41:12 -10:00
Dhruv Nair
8f7fde5701 [CI] Update Release Tests (#9274)
* update

* update
2024-08-27 18:34:00 +05:30
Dhruv Nair
a59672655b Fix Freenoise for AnimateDiff V3 checkpoint. (#9288)
update
2024-08-27 18:30:39 +05:30
Marçal Comajoan Cara
9aca79f2b8 Replace transformers.deepspeed with transformers.integrations.deepspeed (#9281)
to avoid "FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations"

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-08-27 18:08:23 +05:30
Steven Liu
bbcf2a8589 [docs] Add pipelines to table (#9282)
update pipelines
2024-08-27 12:15:30 +05:30
Álvaro Somoza
4cfb2164fb [IP Adapter] Fix cache_dir and local_files_only for image encoder (#9272)
initial fix
2024-08-26 09:03:08 -10:00
Linoy Tsaban
c977966502 [Dreambooth flux] bug fix for dreambooth script (align with dreambooth lora) (#9257)
* fix shape

* fix prompt encoding

* style

* fix device

* add comment
2024-08-26 17:29:58 +05:30
YiYi Xu
1ca0a75567 refactor 3d rope for cogvideox (#9269)
* refactor 3d rope

* repeat -> expand
2024-08-25 11:57:12 -10:00
王奇勋
c1e6a32ae4 [Flux] Support Union ControlNet (#9175)
* refactor
---------

Co-authored-by: haofanwang <haofanwang.ai@gmail.com>
2024-08-25 00:24:21 -10:00
yangpei-comp
77b2162817 Bugfix in pipeline_kandinsky2_2_combined.py: Image type check mismatch (#9256)
Update pipeline_kandinsky2_2_combined.py

Bugfix on image type check mismatch
2024-08-23 08:38:47 -10:00
Dhruv Nair
4e66513a74 [CI] Run Fast + Fast GPU Tests on release branches. (#9255)
* update

* update
2024-08-23 19:34:37 +05:30
Dhruv Nair
4e74206b0c [Single File] Add Flux Pipeline Support (#9244)
update
2024-08-23 14:40:43 +05:30
Dhruv Nair
255ac592c2 [Single File] Support loading Comfy UI Flux checkpoints (#9243)
update
2024-08-23 14:40:29 +05:30
Sayak Paul
2d9ccf39b5 [Core] fuse_qkv_projection() to Flux (#9185)
* start fusing flux.

* test

* finish fusion

* fix-copues
2024-08-23 10:54:13 +05:30
zR
960c149c77 Cogvideox-5B Model adapter change (#9203)
* draft of embedding

---------

Co-authored-by: Aryan <aryan@huggingface.co>
2024-08-22 16:03:29 -10:00
YiYi Xu
dc07fc29da fix _identify_model_variants (#9247)
merge now, will add/fix test next
2024-08-22 12:00:17 -10:00
Elias Rad
805bf33fa7 Docs fix spelling issues (#9219)
* fix PHILOSOPHY.md

* fix CONTRIBUTING.md

* fix tutorial_overview.md

* fix stable_diffusion.md

* Update tutorial_overview.md
2024-08-22 13:38:07 -07:00
Aryan
0ec64fe9fc [tests] fix broken xformers tests (#9206)
* fix xformers tests

* remove unnecessary modifications to cogvideox tests

* update
2024-08-22 15:17:47 +05:30
Sayak Paul
5090b09d48 [Flux LoRA] support parsing alpha from a flux lora state dict. (#9236)
* support parsing alpha from a flux lora state dict.

* conditional import.

* fix breaking changes.

* safeguard alpha.

* fix
2024-08-22 07:01:52 +05:30
Sayak Paul
32d6492c7b [Core] Tear apart from_pretrained() of DiffusionPipeline (#8967)
* break from_pretrained part i.

* part ii.

* init_kwargs

* remove _fetch_init_kwargs

* type annotation

* dtyle

* switch to _check_and_update_init_kwargs_for_missing_modules.

* remove _check_and_update_init_kwargs_for_missing_modules.

* use pipeline_loading_kwargs.

* remove _determine_current_device_map.

* remove _filter_null_components.

* device_map fix.

* fix _update_init_kwargs_with_connected_pipeline.

* better handle custom pipeline.

* explain _maybe_raise_warning_for_inpainting.

* add example for model variant.

* fix
2024-08-22 06:50:57 +05:30
Steven Liu
43f1090a0f [docs] Network alpha docstring (#9238)
fix docstring

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-08-22 06:46:29 +05:30
YiYi Xu
c291617518 Flux followup (#9074)
* refactor rotary embeds

* adding jsmidt as co-author of this PR for https://github.com/huggingface/diffusers/pull/9133

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Joseph Smidt <josephsmidt@gmail.com>
2024-08-21 08:44:58 -10:00
satani99
9003d75f20 Add StableDiffusionXLControlNetPAGImg2ImgPipeline (#8990)
* Added pad controlnet sdxl img2img pipeline

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-08-21 07:24:22 -10:00
Dhruv Nair
750bd79206 [Single File] Fix configuring scheduler via legacy kwargs (#9229)
update
2024-08-21 21:15:20 +05:30
YiYi Xu
214372aa99 fix a regression in is_safetensors_compatible (#9234)
fix
2024-08-21 18:56:55 +05:30
Vinh H. Pham
867e0c919e StableDiffusionLatentUpscalePipeline - positive/negative prompt embeds support (#8947)
* make latent upscaler accept prompt embeds

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-08-20 18:00:55 -10:00
Sangwon Lee
16a3dad474 Fix StableDiffusionXLPAGInpaintPipeline (#9128) 2024-08-20 11:54:27 -10:00
Disty0
21682bab7e Custom sampler support for Stable Cascade Decoder (#9132)
Custom sampler support Stable Cascade Decoder
2024-08-20 09:56:53 -10:00
Vishnu V Jaddipal
214990e5f2 Fix ``from_single_file`` for xl_inpaint (#9054) 2024-08-20 12:09:01 +05:30
Dhruv Nair
cf2c49b179 Remove M1 runner from Nightly Test (#9193)
* update

* update
2024-08-20 11:44:58 +05:30
Leo Jiang
eda36c4c28 Fix dtype error for StableDiffusionXL (#9217)
Fix dtype error

Co-authored-by: 蒋硕 <jiangshuo9@h-partners.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-08-20 06:58:50 +05:30
Zoltan
803e817e3e Add vae slicing and tiling to flux pipeline (#9122)
add vae slicing and tiling to flux pipeline

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-08-19 10:39:45 -10:00
YiYi Xu
67f5cce294 fix autopipeline for kolors img2img (#9212)
fix
2024-08-19 07:40:15 -10:00
Jiwook Han
d72bbc68a9 Reflect few contributions on contribution.md that were not reflected on #8294 (#8938)
* incorrect_number_fix

* add_TOC

* Update docs/source/ko/conceptual/contribution.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conceptual/contribution.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* fix: manual edits

* fix: manual edtis

* fix: manual edits

* Update docs/source/ko/conceptual/contribution.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conceptual/contribution.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conceptual/contribution.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conceptual/contribution.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conceptual/contribution.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conceptual/contribution.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conceptual/contribution.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conceptual/contribution.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conceptual/contribution.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* fix: manual edits

---------

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>
2024-08-19 09:54:38 -07:00
Dhruv Nair
9ab80a99a4 [CI] Add fail-fast=False to CUDA nightly and slow tests (#9214)
* update

* update
2024-08-19 16:08:35 +05:30
Dhruv Nair
940b8e0358 [CI] Multiple Slow Test fixes. (#9198)
* update

* update

* update

* update
2024-08-19 13:31:09 +05:30
Dhruv Nair
b2add10d13 Update is_safetensors_compatible check (#8991)
* update

* update

* update

* update

* update
2024-08-19 11:35:22 +05:30
Wenlong Wu
815d882217 Add loading text inversion (#9130) 2024-08-19 09:26:27 +05:30
M Saqlain
ba4348d9a7 [Tests] Improve transformers model test suite coverage - Lumina (#8987)
* Added test suite for lumina

* Fixed failing tests

* Improved code quality

* Added function docstrings

* Improved formatting
2024-08-19 08:29:03 +05:30
townwish4git
d25eb5d385 fix(sd3): fix deletion of text_encoders etc (#8951)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-08-18 15:37:40 -10:00
Tolga Cangöz
7ef8a46523 [Docs] Fix CPU offloading usage (#9207)
* chore: Fix cpu offloading usage

* Trim trailing white space

* docs: update Kolors model link in kolors.md
2024-08-18 13:12:12 -10:00
Sayak Paul
f848febacd feat: allow sharding for auraflow. (#8853) 2024-08-18 08:47:26 +05:30
Beinsezii
b38255006a Add Lumina T2I Auto Pipe Mapping (#8962) 2024-08-16 23:14:17 -10:00
Jianqi Pan
cba548d8a3 fix(pipeline): k sampler sigmas device (#9189)
If Karras is not enabled, a device inconsistency error will occur. This is due to the fact that sigmas were not moved to the specified device.
2024-08-16 22:43:42 -10:00
Álvaro Somoza
db829a4be4 [IP Adapter] Fix object has no attribute with image encoder (#9194)
* fix

* apply suggestion
2024-08-17 02:00:04 -04:00
Sayak Paul
e780c05cc3 [Chore] add set_default_attn_processor to pixart. (#9196)
add set_default_attn_processor to pixart.
2024-08-16 13:07:06 +05:30
C
e649678bf5 [Flux] Optimize guidance creation in flux pipeline by moving it outside the loop (#9153)
* optimize guidance creation in flux pipeline by moving it outside the loop

* use torch.full instead of torch.tensor to create a tensor with a single value

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-08-16 10:14:05 +05:30
Sayak Paul
39b87b14b5 feat: allow flux transformer to be sharded during inference (#9159)
* feat: support sharding for flux.

* tests
2024-08-16 10:00:51 +05:30
Dhruv Nair
3e46043223 Small improvements for video loading (#9183)
* update

* update
2024-08-16 09:36:58 +05:30
Simo Ryu
1a92bc05a7 Add Learned PE selection for Auraflow (#9182)
* add pe

* Update src/diffusers/models/transformers/auraflow_transformer_2d.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update src/diffusers/models/transformers/auraflow_transformer_2d.py

* beauty

* retrigger ci.

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-08-15 16:30:24 +05:30
dependabot[bot]
0c1e63bd11 Bump jinja2 from 3.1.3 to 3.1.4 in /examples/research_projects/realfill (#7873)
Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.3 to 3.1.4.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/jinja/compare/3.1.3...3.1.4)

---
updated-dependencies:
- dependency-name: jinja2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-08-14 16:36:59 +05:30
dependabot[bot]
e7e45bd127 Bump torch from 2.0.1 to 2.2.0 in /examples/research_projects/realfill (#8971)
Bumps [torch](https://github.com/pytorch/pytorch) from 2.0.1 to 2.2.0.
- [Release notes](https://github.com/pytorch/pytorch/releases)
- [Changelog](https://github.com/pytorch/pytorch/blob/main/RELEASE.md)
- [Commits](https://github.com/pytorch/pytorch/compare/v2.0.1...v2.2.0)

---
updated-dependencies:
- dependency-name: torch
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-08-14 16:36:46 +05:30
Álvaro Somoza
82058a5413 post release 0.30.0 (#9173)
* post release

* fix quality
2024-08-14 12:55:55 +05:30
Aryan
a85b34e7fd [refactor] CogVideoX followups + tiled decoding support (#9150)
* refactor context parallel cache; update torch compile time benchmark

* add tiling support

* make style

* remove num_frames % 8 == 0 requirement

* update default num_frames to original value

* add explanations + refactor

* update torch compile example

* update docs

* update

* clean up if-statements

* address review comments

* add test for vae tiling

* update docs

* update docs

* update docstrings

* add modeling test for cogvideox transformer

* make style
2024-08-14 03:53:21 +05:30
王奇勋
5ffbe14c32 [FLUX] Support ControlNet (#9126)
* cnt model

* cnt model

* cnt model

* fix Loader "Copied"

* format

* txt_ids for  multiple images

* add test and format

* typo

* Update pipeline_flux_controlnet.py

* remove

* make quality

* fix copy

* Update src/diffusers/pipelines/flux/pipeline_flux_controlnet.py

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* Update src/diffusers/pipelines/flux/pipeline_flux_controlnet.py

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* Update src/diffusers/pipelines/flux/pipeline_flux_controlnet.py

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* Update src/diffusers/pipelines/flux/pipeline_flux_controlnet.py

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* Update src/diffusers/models/controlnet_flux.py

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* fix

* make copies

* test

* bs

---------

Co-authored-by: haofanwang <haofanwang.ai@gmail.com>
Co-authored-by: haofanwang <haofan@HaofandeMBP.lan>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-08-13 18:17:40 +05:30
林金鹏
cc0513091a Support SD3 controlnet inpainting (#9099)
* add controlnet inpainting pipeline

* [SD3] add controlnet inpaint example

* update example and fix code style

* fix code style with ruff

* Update controlnet_sd3.md : add control inpaint pipeline

* Update docs/source/en/api/pipelines/controlnet_sd3.md

Co-authored-by: Aryan <contact.aryanvs@gmail.com>

* Update docs/source/en/api/pipelines/controlnet_sd3.md

Co-authored-by: Aryan <contact.aryanvs@gmail.com>

* Update docs/source/en/api/pipelines/controlnet_sd3.md

Co-authored-by: Aryan <contact.aryanvs@gmail.com>

* Update src/diffusers/pipelines/controlnet_sd3/pipeline_stable_diffusion_3_controlnet_inpainting.py

Co-authored-by: Aryan <contact.aryanvs@gmail.com>

* Update __init__.py : add sd3 control pipelines

* Update pipeline : add new param doc & check input reference.

* fix typo

* make style & make quality

* add unittest for sd3 controlnet inpaint

---------

Co-authored-by: 鹏徙 <linjinpeng.ljp@alibaba-inc.com>
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
2024-08-13 17:30:46 +05:30
Sayak Paul
15eb77bc4c Update distributed_inference.md to include a fuller example on distributed inference (#9152)
* Update distributed_inference.md

* Update docs/source/en/training/distributed_inference.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-08-12 09:56:03 -07:00
Linoy Tsaban
413ca29b71 [Flux Dreambooth LoRA] - te bug fixes & updates (#9139)
* add requirements + fix link to bghira's guide

* text ecnoder training fixes

* text encoder training fixes

* text encoder training fixes

* text encoder training fixes

* style

* add tests

* fix encode_prompt call

* style

* unpack_latents test

* fix lora saving

* remove default val for max_sequenece_length in encode_prompt

* remove default val for max_sequenece_length in encode_prompt

* style

* testing

* style

* testing

* testing

* style

* fix sizing issue

* style

* revert scaling

* style

* style

* scaling test

* style

* scaling test

* remove model pred operation left from pre-conditioning

* remove model pred operation left from pre-conditioning

* fix trainable params

* remove te2 from casting

* transformer to accelerator

* remove prints

* empty commit
2024-08-12 11:58:03 +05:30
Dhruv Nair
10dc06c8d9 Update Video Loading/Export to use imageio (#9094)
* update

* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-08-12 10:19:53 +05:30
Dibbla!
3ece143308 Errata - fix typo (#9100) 2024-08-12 07:30:19 +05:30
Steven Liu
98930ee131 [docs] Resolve internal links to PEFT (#9144)
* resolve peft links

* fuse_lora
2024-08-10 06:37:46 +05:30
Daniel Socek
c1079f0887 Fix textual inversion SDXL and add support for 2nd text encoder (#9010)
* Fix textual inversion SDXL and add support for 2nd text encoder

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

* Fix style/quality of text inv for sdxl

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

---------

Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-08-09 20:23:06 +05:30
Linoy Tsaban
65e30907b5 [Flux] Dreambooth LoRA training scripts (#9086)
* initial commit - dreambooth for flux

* update transformer to be FluxTransformer2DModel

* update training loop and validation inference

* fix sd3->flux docs

* add guidance handling, not sure if it makes sense(?)

* inital dreambooth lora commit

* fix text_ids in compute_text_embeddings

* fix imports of static methods

* fix pipeline loading in readme, remove auto1111 docs for now

* fix pipeline loading in readme, remove auto1111 docs for now, remove some irrelevant text_encoder_3 refs

* Update examples/dreambooth/train_dreambooth_flux.py

Co-authored-by: Bagheera <59658056+bghira@users.noreply.github.com>

* fix te2 loading and remove te2 refs from text encoder training

* fix tokenizer_2 initialization

* remove text_encoder training refs from lora script (for now)

* try with vae in bfloat16, fix model hook save

* fix tokenization

* fix static imports

* fix CLIP import

* remove text_encoder training refs (for now) from lora script

* fix minor bug in encode_prompt, add guidance def in lora script, ...

* fix unpack_latents args

* fix license in readme

* add "none" to weighting_scheme options for uniform sampling

* style

* adapt model saving - remove text encoder refs

* adapt model loading - remove text encoder refs

* initial commit for readme

* Update examples/dreambooth/train_dreambooth_lora_flux.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update examples/dreambooth/train_dreambooth_lora_flux.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* fix vae casting

* remove precondition_outputs

* readme

* readme

* style

* readme

* readme

* update weighting scheme default & docs

* style

* add text_encoder training to lora script, change vae_scale_factor value in both

* style

* text encoder training fixes

* style

* update readme

* minor fixes

* fix te params

* fix te params

---------

Co-authored-by: Bagheera <59658056+bghira@users.noreply.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-08-09 07:31:04 +05:30
Sayak Paul
cee7c1b0fb Update README.md to include InstantID (#8770)
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-08-08 10:14:12 -07:00
Monjoy Narayan Choudhury
1fcb811a8e Add Differential Diffusion to HunyuanDiT. (#9040)
* Add Differential Pipeline.

* Fix Styling Issue using ruff -fix

* Add details to Contributing.md

* Revert "Fix Styling Issue using ruff -fix"

This reverts commit d347de162d.

* Revert "Revert "Fix Styling Issue using ruff -fix""

This reverts commit ce7c3ff216.

* Revert README changes

* Restore README.md

* Update README.md

* Resolved Comments:

* Fix Readme based on review

* Fix formatting after make style

---------

Co-authored-by: Aryan <aryan@huggingface.co>
2024-08-08 18:53:39 +05:30
David Steinberg
ae026db7aa Fix a dead link (#9116)
Co-authored-by: Aryan <aryan@huggingface.co>
2024-08-08 18:46:50 +05:30
sayantan sadhu
8e3affc669 fix for lr scheduler in distributed training (#9103)
* fix for lr scheduler in distributed training

* Fixed the recalculation of the total training step section

* Fixed lint error

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-08-08 08:45:48 +05:30
Steven Liu
ba7e48455a [docs] Organize model toctree (#9118)
* toctree

* fix
2024-08-08 08:31:58 +05:30
zR
2dad462d9b Add CogVideoX text-to-video generation model (#9082)
* add CogVideoX

---------

Co-authored-by: Aryan <aryan@huggingface.co>
Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-08-06 21:23:57 -10:00
Dhruv Nair
e3568d14ba Freenoise change vae_batch_size to decode_chunk_size (#9110)
* update

* update
2024-08-07 12:47:18 +05:30
Aryan
f6df22447c [feat] allow sparsectrl to be loaded from single file (#9073)
* allow sparsectrl to be loaded with single file

* update

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-08-07 11:12:30 +05:30
latentCall145
9b5180cb5f Flux fp16 inference fix (#9097)
* clipping for fp16

* fix typo

* added fp16 inference to docs

* fix docs typo

* include link for fp16 investigation

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-08-07 10:54:20 +05:30
Aryan
16a93f1a25 [core] FreeNoise (#8948)
* initial work draft for freenoise; needs massive cleanup

* fix freeinit bug

* add animatediff controlnet implementation

* revert attention changes

* add freenoise

* remove old helper functions

* add decode batch size param to all pipelines

* make style

* fix copied from comments

* make fix-copies

* make style

* copy animatediff controlnet implementation from #8972

* add experimental support for num_frames not perfectly fitting context length, ocntext stride

* make unet motion model lora work again based on #8995

* copy load video utils from #8972

* copied from AnimateDiff::prepare_latents

* address the case where last batch of frames does not match length of indices in prepare latents

* decode_batch_size->vae_batch_size; batch vae encode support in animatediff vid2vid

* revert sparsectrl and sdxl freenoise changes

* revert pia

* add freenoise tests

* make fix-copies

* improve docstrings

* add freenoise tests to animatediff controlnet

* update tests

* Update src/diffusers/models/unets/unet_motion_model.py

* add freenoise to animatediff pag

* address review comments

* make style

* update tests

* make fix-copies

* fix error message

* remove copied from comment

* fix imports in tests

* update

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-08-07 10:35:18 +05:30
Sayak Paul
2d753b6fb5 fix train_dreambooth_lora_sd3.py loading hook (#9107) 2024-08-07 10:09:47 +05:30
Álvaro Somoza
39e1f7eaa4 [Kolors] Add PAG (#8934)
* txt2img pag added

* autopipe added, fixed case

* style

* apply suggestions

* added fast tests, added todo tests

* revert dummy objects for kolors

* fix pag dummies

* fix test imports

* update pag tests

* add kolor pag to docs

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-08-07 09:29:52 +05:30
Dhruv Nair
e1b603dc2e [Single File] Add single file support for Flux Transformer (#9083)
* update

* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-08-07 08:49:57 +05:30
Marc Sun
e4325606db Fix loading sharded checkpoints when we have variants (#9061)
* Fix loading sharded checkpoint when we have variant

* add test

* remote print

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-08-06 13:38:44 -10:00
Ahn Donghoon (안동훈 / suno)
926daa30f9 add PAG support for Stable Diffusion 3 (#8861)
add pag sd3


---------

Co-authored-by: HyoungwonCho <jhw9811@korea.ac.kr>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: crepejung00 <jaewoojung00@naver.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
Co-authored-by: Aryan <aryan@huggingface.co>
2024-08-06 09:11:35 -10:00
Dhruv Nair
325a5de3a9 [Docs] Add community projects section to docs (#9013)
* update

* update

* update
2024-08-06 08:59:39 -07:00
Dhruv Nair
4c6152c2fb update 2024-08-06 12:00:14 +00:00
Vinh H. Pham
87e50a2f1d [Tests] Improve transformers model test suite coverage - Hunyuan DiT (#8916)
* add hunyuan model test

* apply suggestions

* reduce dims further

* reduce dims further

* run make style

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-08-06 12:59:30 +05:30
Aryan
a57a7af45c [bug] remove unreachable norm_type=ada_norm_continuous from norm3 initialization conditions (#9006)
remove ada_norm_continuous from norm3 list

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-08-06 07:23:48 +05:30
Sayak Paul
52f1378e64 [Core] add QKV fusion to AuraFlow and PixArt Sigma (#8952)
* add fusion support to pixart

* add to auraflow.

* add tests

* apply review feedback.

* add back args and kwargs

* style
2024-08-05 14:09:37 -10:00
Tolga Cangöz
3dc97bd148 Update CLIPFeatureExtractor to CLIPImageProcessor and DPTFeatureExtractor to DPTImageProcessor (#9002)
* fix: update `CLIPFeatureExtractor` to `CLIPImageProcessor` in codebase

* `make style && make quality`

* Update `DPTFeatureExtractor` to `DPTImageProcessor` in codebase

* `make style`

---------

Co-authored-by: Aryan <aryan@huggingface.co>
2024-08-05 09:20:29 -10:00
omahs
6d32b29239 Fix typos (#9077)
* fix typo
2024-08-05 09:00:08 -10:00
YiYi Xu
bc3c73ad0b add sentencepiece as a soft dependency (#9065)
* add sentencepiece as  soft dependency for kolors

* up

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-08-05 08:04:51 -10:00
Sayak Paul
5934873b8f [Docs] add stable cascade unet doc. (#9066)
* add stable cascade unet doc.

* fix path
2024-08-05 21:28:48 +05:30
Aryan
b7058d142c PAG variant for HunyuanDiT, PAG refactor (#8936)
* copy hunyuandit pipeline

* pag variant of hunyuan dit

* add tests

* update docs

* make style

* make fix-copies

* Update src/diffusers/pipelines/pag/pag_utils.py

* remove incorrect copied from

* remove pag hunyuan attn procs to resolve conflicts

* add pag attn procs again

* new implementation for pag_utils

* revert pag changes

* add pag refactor back; update pixart sigma

* update pixart pag tests

* apply suggestions from review

Co-Authored-By: yixu310@gmail.com

* make style

* update docs, fix tests

* fix tests

* fix test_components_function since list not accepted as valid __init__ param

* apply patch to fix broken tests

Co-Authored-By: Sayak Paul <spsayakpaul@gmail.com>

* make style

* fix hunyuan tests

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-08-05 17:56:09 +05:30
Vinh H. Pham
e1d508ae92 [Tests] Improve transformers model test suite coverage - Latte (#8919)
* add LatteTransformer3DModel model test

* change patch_size to 1

* reduce req len

* reduce channel dims

* increase num_layers

* reduce dims further

* run make style

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Aryan <aryan@huggingface.co>
2024-08-05 17:13:03 +05:30
Sayak Paul
fc6a91e383 [FLUX] support LoRA (#9057)
* feat: lora support for Flux.

add tests

fix imports

major fixes.

* fix

fixes

final fixes?

* fix

* remove is_peft_available.
2024-08-05 10:24:05 +05:30
Aryan
2b76099610 [refactor] apply qk norm in attention processors (#9071)
* apply qk norm in attention processors

* revert attention processor

* qk-norm in only attention proc 2.0 and fused variant
2024-08-04 05:42:46 -10:00
psychedelicious
4f0d01d387 type get_attention_scores as optional in get_attention_scores (#9075)
`None` is valid for `get_attention_scores`, should be typed as such
2024-08-04 17:19:05 +05:30
asfiyab-nvidia
3dc10a535f Update TensorRT txt2img and inpaint community pipelines (#9037)
* Update TensorRT txt2img and inpaint community pipelines

Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>

* update tensorrt install instructions

Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>

---------

Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-08-04 16:00:40 +05:30
Sayak Paul
c370b90ff1 [Flux] minor documentation fixes for flux. (#9048)
* minor documentation fixes for flux.

* clipskip

* add gist
2024-08-04 15:53:01 +05:30
Philip Rideout
ebf3ab1477 Fix grammar mistake. (#9072) 2024-08-04 04:32:03 +05:30
Aryan
fbe29c6298 [refactor] create modeling blocks specific to AnimateDiff (#8979)
* animatediff specific transformer model

* make style

* make fix-copies

* move blocks to unet motion model

* make style

* remove dummy object

* fix incorrectly passed param causing test failures

* rename model and output class

* fix sparsectrl imports

* remove todo comments

* remove temporal double self attn param from controlnet sparsectrl

* add deprecated versions of blocks

* apply suggestions from review

* update

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-08-03 13:03:39 +05:30
Tolga Cangöz
7071b7461b Errata: Fix typos & \s+$ (#9008)
* Fix typos

* chore: Fix typos

* chore: Update README.md for promptdiffusion example

* Trim trailing white spaces

* Fix a typo

* update number

* chore: update number

* Trim trailing white space

* Update README.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update README.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-08-02 21:24:25 -07:00
Frank (Haofan) Wang
a054c78495 Update transformer_flux.py (#9060) 2024-08-03 08:58:32 +05:30
Dhruv Nair
b1f43d7189 Fix Nightly Deps (#9036)
update
2024-08-02 15:18:54 +05:30
Sayak Paul
0e460675e2 [Flux] allow tests to run (#9050)
* fix tests

* fix

* float64 skip

* remove sample_size.

* remove

* remove more

* default_sample_size.

* credit black forest for flux model.

* skip

* fix: tests

* remove OriginalModelMixin

* add transformer model test

* add: transformer model tests
2024-08-02 11:49:59 +05:30
Sayak Paul
7b98c4cc67 [Core] Add PAG support for PixArtSigma (#8921)
* feat: add pixart sigma pag.

* inits.

* fixes

* fix

* remove print.

* copy paste methods to the pixart pag mixin

* fix-copies

* add documentation.

* add tests.

* remove correction file.

* remove pag_applied_layers

* empty
2024-08-02 07:12:41 +05:30
Sayak Paul
27637a5402 Flux pipeline (#9043)
add flux!

Signed-off-by: Adrien <adrien@huggingface.co>
Co-authored-by: Adrien <adrien.69740@gmail.com>
Co-authored-by: Anatoly Belikov <abelikov@singularitynet.io>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail.com>
2024-08-01 11:30:52 -10:00
Aryan
2ea22e1cc7 [docs] fix pia example (#9015)
fix pia example docstring
2024-08-02 02:47:40 +05:30
YiYi Xu
95a7832879 fix load sharded checkpoint from a subfolder (local path) (#8913)
fix

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-08-01 20:15:42 +05:30
Dhruv Nair
c646fbc124 Updates deps for pipeline test fetcher (#9033)
update
2024-08-01 13:22:13 +05:30
Aryan
05b706c003 PAG variant for AnimateDiff (#8789)
* add animatediff pag pipeline

* remove unnecessary print

* make fix-copies

* fix ip-adapter bug

* update docs

* add fast tests and fix bugs

* update

* update

* address review comments

* update ip adapter single test expected slice

* implement test_from_pipe_consistent_config; fix expected slice values

* LoraLoaderMixin->StableDiffusionLoraLoaderMixin; add latest freeinit test
2024-08-01 12:39:39 +05:30
Yoach Lacombe
ea1b4ea7ca Fix Stable Audio repository id (#9016)
Fix Stable Audio repo id
2024-07-30 23:17:44 +05:30
Aryan
e5b94b4c57 [core] Move community AnimateDiff ControlNet to core (#8972)
* add animatediff controlnet to core

* make style; remove unused method

* fix copied from comment

* add tests

* changes to make tests work

* add utility function to load videos

* update docs

* update pipeline example

* make style

* update docs with example

* address review comments

* add latest freeinit test from #8969

* LoraLoaderMixin -> StableDiffusionLoraLoaderMixin

* fix docs

* Update src/diffusers/utils/loading_utils.py

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* fix: variable out of scope

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-07-30 17:10:37 +05:30
Yoach Lacombe
69e72b1dd1 Stable Audio integration (#8716)
* WIP modeling code and pipeline

* add custom attention processor + custom activation + add to init

* correct ProjectionModel forward

* add stable audio to __initèè

* add autoencoder and update pipeline and modeling code

* add half Rope

* add partial rotary v2

* add temporary modfis to scheduler

* add EDM DPM Solver

* remove TODOs

* clean GLU

* remove att.group_norm to attn processor

* revert back src/diffusers/schedulers/scheduling_dpmsolver_multistep.py

* refactor GLU -> SwiGLU

* remove redundant args

* add channel multiples in autoencoder docstrings

* changes in docsrtings and copyright headers

* clean pipeline

* further cleaning

* remove peft and lora and fromoriginalmodel

* Delete src/diffusers/pipelines/stable_audio/diffusers.code-workspace

* make style

* dummy models

* fix copied from

* add fast oobleck tests

* add brownian tree

* oobleck autoencoder slow tests

* remove TODO

* fast stable audio pipeline tests

* add slow tests

* make style

* add first version of docs

* wrap is_torchsde_available to the scheduler

* fix slow test

* test with input waveform

* add input waveform

* remove some todos

* create stableaudio gaussian projection + make style

* add pipeline to toctree

* fix copied from

* make quality

* refactor timestep_features->time_proj

* refactor joint_attention_kwargs->cross_attention_kwargs

* remove forward_chunk

* move StableAudioDitModel to transformers folder

* correct convert + remove partial rotary embed

* apply suggestions from yiyixuxu -> removing attn.kv_heads

* remove temb

* remove cross_attention_kwargs

* further removal of cross_attention_kwargs

* remove text encoder autocast to fp16

* continue removing autocast

* make style

* refactor how text and audio are embedded

* add paper

* update example code

* make style

* unify projection model forward + fix device placement

* make style

* remove fuse qkv

* apply suggestions from review

* Update src/diffusers/pipelines/stable_audio/pipeline_stable_audio.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* make style

* smaller models in fast tests

* pass sequential offloading fast tests

* add docs for vae and autoencoder

* make style and update example

* remove useless import

* add cosine scheduler

* dummy classes

* cosine scheduler docs

* better description of scheduler

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-07-30 15:29:06 +05:30
Sayak Paul
8c4856cd6c [LoRA] fix: animate diff lora stuff. (#8995)
* fix: animate diff lora stuff.

* fix scaling function for UNetMotionModel

* emoty
2024-07-30 09:18:41 +05:30
Anatoly Belikov
f240a936da handle lora scale and clip skip in lpw sd and sdxl community pipelines (#8988)
* handle lora scale and clip skip in lpw sd and sdxl

* use StableDiffusionLoraLoaderMixin

* use StableDiffusionXLLoraLoaderMixin

* style

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-07-30 07:19:28 +05:30
Sayak Paul
00d8d46e23 [Docs] credit where it's due for Lumina and Latte. (#9000)
credit where it's due for Lumina and Latte.
2024-07-29 10:02:03 -07:00
Adrien
bfc9369f0a [CI] Update runner configuration for setup and nightly tests (#9005)
* [CI] Update runner configuration for setup and nightly tests

Signed-off-by: Adrien <adrien@huggingface.co>

* fix group

Signed-off-by: Adrien <adrien@huggingface.co>

* update for t4

Signed-off-by: Adrien <adrien@huggingface.co>

---------

Signed-off-by: Adrien <adrien@huggingface.co>
2024-07-29 21:14:31 +05:30
Álvaro Somoza
73acebb8cf [Kolors] Add IP Adapter (#8901)
* initial draft

* apply suggestions

* fix failing test

* added ipa to img2img

* add docs

* apply suggestions
2024-07-26 14:25:44 -04:00
Aryan
ca0747a07e remove unused code from pag attn procs (#8928) 2024-07-26 07:58:40 -10:00
Aryan
5c53ca5ed8 [core] AnimateDiff SparseCtrl (#8897)
* initial sparse control model draft

* remove unnecessary implementation

* copy animatediff pipeline

* remove deprecated callbacks

* update

* update pipeline implementation progress

* make style

* make fix-copies

* update progress

* add partially working pipeline

* remove debug prints

* add model docs

* dummy objects

* improve motion lora conversion script

* fix bugs

* update docstrings

* remove unnecessary model params; docs

* address review comment

* add copied from to zero_module

* copy animatediff test

* add fast tests

* update docs

* update

* update pipeline docs

* fix expected slice values

* fix license

* remove get_down_block usage

* remove temporal_double_self_attention from get_down_block

* update

* update docs with org and documentation images

* make from_unet work in sparsecontrolnetmodel

* add latest freeinit test from #8969

* make fix-copies

* LoraLoaderMixin -> StableDiffsuionLoraLoaderMixin
2024-07-26 17:46:05 +05:30
Aryan
57a021d5e4 [fix] FreeInit step index out of bounds (#8969)
* fix step index out of bounds

* add test for free_init with different schedulers

* add test to vid2vid and pia
2024-07-26 16:45:55 +05:30
Dhruv Nair
1168eaaadd [CI] Nightly Test Runner explicitly set runner for Setup Pipeline Matrix (#8986)
* update

* update

* update
2024-07-26 13:20:35 +05:30
Dhruv Nair
bce9105ac7 [CI] Fix parallelism in nightly tests (#8983)
update
2024-07-26 10:04:01 +05:30
RandomGamingDev
2afb2e0aac Added accelerator based gradient accumulation for basic_example (#8966)
added accelerator based gradient accumulation for basic_example

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-07-26 09:35:52 +05:30
Sayak Paul
d87fe95f90 [Chore] add LoraLoaderMixin to the inits (#8981)
* introduce  to promote reusability.

* up

* add more tests

* up

* remove comments.

* fix fuse_nan test

* clarify the scope of fuse_lora and unfuse_lora

* remove space

* rewrite fuse_lora a bit.

* feedback

* copy over load_lora_into_text_encoder.

* address dhruv's feedback.

* fix-copies

* fix issubclass.

* num_fused_loras

* fix

* fix

* remove mapping

* up

* fix

* style

* fix-copies

* change to SD3TransformerLoRALoadersMixin

* Apply suggestions from code review

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* up

* handle wuerstchen

* up

* move lora to lora_pipeline.py

* up

* fix-copies

* fix documentation.

* comment set_adapters().

* fix-copies

* fix set_adapters() at the model level.

* fix?

* fix

* loraloadermixin.

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-07-26 08:59:33 +05:30
Sayak Paul
50e66f2f95 [Chore] remove all is from auraflow. (#8980)
remove all is from auraflow.
2024-07-26 07:31:06 +05:30
efwfe
9b8c8605d1 fix guidance_scale value not equal to the value in comments (#8941)
fix guidance_scale value not equal with the value in comments
2024-07-25 12:31:37 -10:00
YiYi Xu
62863bb1ea Revert "[LoRA] introduce LoraBaseMixin to promote reusability." (#8976)
Revert "[LoRA] introduce LoraBaseMixin to promote reusability. (#8774)"

This reverts commit 527430d0a4.
2024-07-25 09:10:35 -10:00
mazharosama
1fd647f2a0 Enable CivitAI SDXL Inpainting Models Conversion (#8795)
modify in_channels in network_config params

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-07-25 07:44:57 -10:00
asfiyab-nvidia
0bda1d7b89 Update TensorRT img2img community pipeline (#8899)
* Update TensorRT img2img pipeline

Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>

* Update TensorRT version installed

Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>

* make style and quality

Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>

* Update examples/community/stable_diffusion_tensorrt_img2img.py

Co-authored-by: Tolga Cangöz <46008593+tolgacangoz@users.noreply.github.com>

* Update examples/community/README.md

Co-authored-by: Tolga Cangöz <46008593+tolgacangoz@users.noreply.github.com>

* Apply style and quality using ruff 0.1.5

Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>

---------

Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
Co-authored-by: Tolga Cangöz <46008593+tolgacangoz@users.noreply.github.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-07-25 21:58:21 +05:30
Sayak Paul
527430d0a4 [LoRA] introduce LoraBaseMixin to promote reusability. (#8774)
* introduce  to promote reusability.

* up

* add more tests

* up

* remove comments.

* fix fuse_nan test

* clarify the scope of fuse_lora and unfuse_lora

* remove space

* rewrite fuse_lora a bit.

* feedback

* copy over load_lora_into_text_encoder.

* address dhruv's feedback.

* fix-copies

* fix issubclass.

* num_fused_loras

* fix

* fix

* remove mapping

* up

* fix

* style

* fix-copies

* change to SD3TransformerLoRALoadersMixin

* Apply suggestions from code review

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* up

* handle wuerstchen

* up

* move lora to lora_pipeline.py

* up

* fix-copies

* fix documentation.

* comment set_adapters().

* fix-copies

* fix set_adapters() at the model level.

* fix?

* fix

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-07-25 21:40:58 +05:30
Aryan
3ae0ee88d3 [tests] speed up animatediff tests (#8846)
* speed up animatediff tests

* fix pia test_ip_adapter_single

* fix tests/pipelines/pia/test_pia.py::PIAPipelineFastTests::test_dict_tuple_outputs_equivalent

* update

* fix ip adapter tests

* skip test_from_pipe_consistent_config tests

* fix prompt_embeds test

* update test_from_pipe_consistent_config tests

* fix expected_slice values

* remove temporal_norm_num_groups from UpBlockMotion

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-07-25 17:35:43 +05:30
Dhruv Nair
5fbb4d32d5 [CI] Slow Test Updates (#8870)
* update

* update

* update
2024-07-25 16:00:43 +05:30
Sayak Paul
d8bcb33f4b [Tests] fix slices of 26 tests (first half) (#8959)
* check for assertions.

* update with correct slices.

* okay

* style

* get it ready

* update

* update

* update

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-07-25 14:56:49 +05:30
Sanchit Gandhi
4a782f462a [AudioLDM2] Fix cache pos for GPT-2 generation (#8964)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-07-25 09:21:49 +05:30
RandomGamingDev
cdd12bde17 Added Code for Gradient Accumulation to work for basic_training (#8961)
added line allowing gradient accumulation to work for basic_training example
2024-07-25 08:40:53 +05:30
Sayak Paul
2c25b98c8e [AuraFlow] fix long prompt handling (#8937)
fix
2024-07-24 11:19:30 +05:30
Dhruv Nair
93983b6780 [CI] Skip flaky download tests in PR CI (#8945)
update
2024-07-24 09:25:06 +05:30
Sayak Paul
41b705f42d remove residual i from auraflow. (#8949)
* remove residual i.

* rename to aura_flow in pipeline test
2024-07-24 07:31:54 +05:30
Sayak Paul
50d21f7c6a [Core] fix QKV fusion for attention (#8829)
* start debugging the problem,

* start

* fix

* fix

* fix imports.

* handle hunyuan

* remove residuals.

* add a check for making sure there's appropriate procs.

* add more rigor to the tests.

* fix test

* remove redundant check

* fix-copies

* move check_qkv_fusion_matches_attn_procs_length and check_qkv_fusion_processors_exist.
2024-07-24 06:52:19 +05:30
Dhruv Nair
3bb1fd6fc0 Fix name when saving text inversion embeddings in dreambooth advanced scripts (#8927)
update
2024-07-23 19:51:20 +05:30
Tolga Cangöz
cf55dcf0ff Fix Colab and Notebook checks for diffusers-cli env (#8408)
* chore: Update is_google_colab check to use environment variable

* Check Colab with all possible COLAB_* env variables

* Remove unnecessary word

* Make `_is_google_colab` more inclusive

* Revert "Make `_is_google_colab` more inclusive"

This reverts commit 6406db21ac.

* Make `_is_google_colab` more inclusive.

* chore: Update import_utils.py with notebook check improvement

* Refactor import_utils.py to improve notebook detection for VS Code's notebook

* chore: Remove `is_notebook()` function and related code

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-07-23 18:04:20 +05:30
Vinh H. Pham
7a95f8d9d8 [Tests] Improve transformers model test suite coverage - Temporal Transformer (#8932)
* add test for temporal transformer

* remove unused variable

* fix code quality

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-07-23 15:36:30 +05:30
akbaig
7710415baf fix: checkpoint save issue in advanced dreambooth lora sdxl script (#8926)
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2024-07-23 14:44:56 +05:30
Aritra Roy Gosthipaty
8b21feed42 [Tests] reduce the model size in the audioldm2 fast test (#7846)
* chore: initial model size reduction

* chore: fixing expected values for failing tests

* requested edits

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-07-23 14:34:07 +05:30
Dhruv Nair
f57b27d2ad Update pipeline test fetcher (#8931)
update
2024-07-23 10:02:22 +05:30
Sayak Paul
c5fdf33a10 [Benchmarking] check if runner helps to restore benchmarking (#8929)
* check if runner helps.

* remove caching

* gpus

* update runner group
2024-07-23 06:38:13 +05:30
Vishnu V Jaddipal
77c5de2e05 Add attentionless VAE support (#8769)
* Add attentionless VAE support

* make style and quality, fix-copies

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-07-22 14:16:58 -10:00
Sayak Paul
af400040f5 [Tests] proper skipping of request caching test (#8908)
proper skipping of request caching test
2024-07-22 12:52:57 -10:00
Jiwook Han
5802c2e3f2 Reflect few contributions on ethical_guidelines.md that were not reflected on #8294 (#8914)
fix_ethical_guidelines.md
2024-07-22 08:48:23 -07:00
Sayak Paul
f4af03b350 [Docs] small fixes to pag guide. (#8920)
small fixes to pag guide.
2024-07-22 08:35:01 -07:00
Seongsu Park
267bf65707 🌐 [i18n-KO] Translated docs to Korean (added 7 docs and etc) (#8804)
* remove unused docs

* add ko-18n docs

* docs typo, edit etc

* reorder list, add `in translation` in toctree

* fix minor translation

* fix docs minor tone, etc
2024-07-22 08:08:44 -07:00
Sayak Paul
1a8b3c2ee8 [Training] SD3 training fixes (#8917)
* SD3 training fixes

Co-authored-by: bghira <59658056+bghira@users.noreply.github.com>

* rewrite noise addition part to respect the eqn.

* styler

* Update examples/dreambooth/README_sd3.md

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>

---------

Co-authored-by: bghira <59658056+bghira@users.noreply.github.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2024-07-21 16:24:04 +05:30
Lucain
56e772ab7e Use model_info.id instead of model_info.modelId (#8912)
Mention model_info.id instead of model_info.modelId
2024-07-20 20:01:21 +05:30
Pierre Chapuis
fe7948941d allow tensors in several schedulers step() call (#8905) 2024-07-19 18:58:06 -10:00
王奇勋
461efc57c5 [fix code annotation] Adjust the dimensions of the rotary positional embedding. (#8890)
* 2d rotary pos emb dim

* make style

---------

Co-authored-by: haofanwang <haofanwang.ai@gmail.com>
2024-07-19 18:57:36 -10:00
shinetzh
3b04cdc816 fix loop bug in SlicedAttnProcessor (#8836)
* fix loop bug in SlicedAttnProcessor


---------

Co-authored-by: neoshang <neoshang@tencent.com>
2024-07-19 18:14:29 -10:00
Álvaro Somoza
c009c203be [SDXL] Fix uncaught error with image to image (#8856)
* initial commit

* apply suggestion to sdxl pipelines

* apply fix to sd pipelines
2024-07-19 12:06:36 -10:00
Dhruv Nair
3f1411767b SSH into cpu runner additional fix (#8893)
* update

* update

* update
2024-07-18 16:18:45 +05:30
Dhruv Nair
588fb5c105 SSH into cpu runner fix (#8888)
* update

* update
2024-07-18 11:00:05 +05:30
Dhruv Nair
eb24e4bdb2 Add option to SSH into CPU runner. (#8884)
update
2024-07-18 10:20:24 +05:30
Sayak Paul
e02ec27e51 [Core] remove resume_download from Hub related stuff (#8648)
* remove resume_download

* fix: _fetch_index_file call.

* remove resume_download from docs.
2024-07-18 09:48:42 +05:30
Sayak Paul
a41e4c506b [Chore] add disable forward chunking to SD3 transformer. (#8838)
add disable forward chunking to SD3 transformer.
2024-07-18 09:30:18 +05:30
Aryan
12625c1c9c [docs] pipeline docs for latte (#8844)
* add pipeline docs for latte

* add inference time to latte docs

* apply review suggestions
2024-07-18 09:27:48 +05:30
Tolga Cangöz
c1dc2ae619 Fix multi-gpu case for train_cm_ct_unconditional.py (#8653)
* Fix multi-gpu case

* Prefer previously created `unwrap_model()` function

For `torch.compile()` generalizability

* `chore: update unwrap_model() function to use accelerator.unwrap_model()`
2024-07-17 19:03:12 +05:30
Beinsezii
e15a8e7f17 Add AuraFlowPipeline and KolorsPipeline to auto map (#8849)
* Add AuraFlowPipeline and KolorsPipeline to auto map

Just T2I. Validated using `quickdif`

* Add Kolors I2I and SD3 Inpaint auto maps

* style

---------

Co-authored-by: yiyixuxu <yixu310@gmail.com>
2024-07-16 17:13:28 -10:00
Sayak Paul
c2fbf8da02 [Chore] allow auraflow latest to be torch compile compatible. (#8859)
* allow auraflow latest to be torch compile compatible.

* default to 1024 1024.
2024-07-17 08:26:36 +05:30
Sayak Paul
0f09b01ab3 [Core] fix: shard loading and saving when variant is provided. (#8869)
fix: shard loading and saving when variant is provided.
2024-07-17 08:26:28 +05:30
Sayak Paul
f6cfe0a1e5 modify pocs. (#8867) 2024-07-17 08:26:13 +05:30
Tolga Cangöz
e87bf62940 [Cont'd] Add the SDE variant of ~~DPM-Solver~~ and DPM-Solver++ to DPM Single Step (#8269)
* Add the SDE variant of DPM-Solver and DPM-Solver++ to DPM Single Step


---------

Co-authored-by: cmdr2 <secondary.cmdr2@gmail.com>
2024-07-16 15:40:02 -10:00
Sayak Paul
3b37fefee9 [Docker] include python3.10 dev and solve header missing problem (#8865)
include python3.10 dev and solve header missing problem
2024-07-16 16:02:39 +05:30
Aryan
bbd2f9d4e9 [tests] fix typo in pag tests (#8845)
* fix typo in pag tests

* fix typo
2024-07-12 17:41:34 +05:30
Nguyễn Công Tú Anh
d704b3bf8c add PAG support sd15 controlnet (#8820)
* add pag support sd15 controlnet

* fix quality import

* remove unecessary import

* remove if state

* fix tests

* remove useless function

* add sd1.5 controlnet pag docs

---------

Co-authored-by: anhnct8 <anhnct8@fpt.com>
2024-07-12 15:42:56 +05:30
ustcuna
9f963e7349 [Community Pipelines] Accelerate inference of AnimateDiff by IPEX on CPU (#8643)
* add animatediff_ipex community pipeline

* address the 1st round review comments
2024-07-12 14:31:15 +05:30
Sayak Paul
973a62d408 [Docs] add AuraFlow docs (#8851)
* add pipeline documentation.

* add api spec for pipeline

* model documentation

* model spec
2024-07-12 09:52:18 +02:00
Dhruv Nair
11d18f3217 Add single file loading support for AnimateDiff (#8819)
* update

* update

* update

* update
2024-07-12 09:51:57 +05:30
Dhruv Nair
d2df40c6f3 Add VAE tiling option for SD3 (#8791)
update
2024-07-11 09:49:39 -10:00
Sayak Paul
2261510bbc [Core] Add AuraFlow (#8796)
* add lavender flow transformer

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-07-11 08:50:19 -10:00
Álvaro Somoza
87b9db644b [Core] Add Kolors (#8812)
* initial draft
2024-07-11 06:09:17 -10:00
Xin Ma
b8cf84a3f9 Latte: Latent Diffusion Transformer for Video Generation (#8404)
* add Latte to diffusers

* remove print

* remove print

* remove print

* remove unuse codes

* remove layer_norm_latte and add a flag

* remove layer_norm_latte and add a flag

* update latte_pipeline

* update latte_pipeline

* remove unuse squeeze

* add norm_hidden_states.ndim == 2: # for Latte

* fixed test latte pipeline bugs

* fixed test latte pipeline bugs

* delete sh

* add doc for latte

* add licensing

* Move Transformer3DModelOutput to modeling_outputs

* give a default value to sample_size

* remove the einops dependency

* change norm2 for latte

* modify pipeline of latte

* update test for Latte

* modify some codes for latte

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* modify for Latte pipeline

* video_length -> num_frames; update prepare_latents copied from

* make fix-copies

* make style

* typo: videe -> video

* update

* modify for Latte pipeline

* modify latte pipeline

* modify latte pipeline

* modify latte pipeline

* modify latte pipeline

* modify for Latte pipeline

* Delete .vscode directory

* make style

* make fix-copies

* add latte transformer 3d to docs _toctree.yml

* update example

* reduce frames for test

* fixed bug of _text_preprocessing

* set num frame to 1 for testing

* remove unuse print

* add text = self._clean_caption(text) again

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Aryan <contact.aryanvs@gmail.com>
Co-authored-by: Aryan <aryan@huggingface.co>
2024-07-11 15:06:22 +05:30
Alan Du
673eb60f1c Reformat docstring for get_timestep_embedding (#8811)
* Reformat docstring for `get_timestep_embedding`


---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-07-10 15:54:44 -10:00
Sayak Paul
a785992c1d [Tests] fix more sharding tests (#8797)
* fix

* fix

* ugly

* okay

* fix more

* fix oops
2024-07-09 13:09:36 +05:30
Xu Cao
35cc66dc4c Add pipeline_stable_diffusion_3_inpaint.py for SD3 Inference (#8709)
* Add pipeline_stable_diffusion_3_inpaint


---------

Co-authored-by: Xu Cao <xucao2@jrehg-work-01.cs.illinois.edu>
Co-authored-by: IrohXu <irohcao@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-07-08 15:53:02 -10:00
Tolga Cangöz
57084dacc5 Remove unnecessary lines (#8569)
* Remove unused line


---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-07-08 10:42:02 -10:00
Zhuoqun(Jack) Chen
70611a1068 Fix static typing and doc typos (#8807)
* Fix static typing and doc typos

* Fix more same type hint typos with make fix-copies
2024-07-08 09:09:33 -10:00
PommesPeter
98388670d2 [Alpha-VLLM Team] Add Lumina-T2X to diffusers (#8652)
---------

Co-authored-by: zhuole1025 <zhuole1025@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-07-07 17:12:09 -10:00
YiYi Xu
9e9ed353a2 fix loading sharded checkpoints from subfolder (#8798)
* fix load sharded checkpoints from subfolder{

* style

* os.path.join

* add a small test

---------

Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
2024-07-06 11:32:04 -10:00
apolinário
7833ed957b Improve model card for push_to_hub trainers (#8697)
* Improve trainer model cards

* Update train_dreambooth_sd3.py

* Update train_dreambooth_lora_sd3.py

* add link to adapters loading doc

* Update train_dreambooth_lora_sd3.py

---------

Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2024-07-05 12:18:41 +05:30
Dhruv Nair
85c4a326e0 Fix saving text encoder weights and kohya weights in advanced dreambooth lora script (#8766)
* update

* update

* update
2024-07-05 11:28:50 +05:30
Dhruv Nair
0bab9d6be7 [Single File] Allow loading T5 encoder in mixed precision (#8778)
* update

* update

* update

* update
2024-07-05 10:29:38 +05:30
Thomas Eding
2e2684f014 Add vae_roundtrip.py example (#7104)
* Add vae_roundtrip.py example

* Add cuda support to vae_roundtrip

* Move vae_roundtrip.py into research_projects/vae

* Fix channel scaling in vae roundrip and also support taesd.

* Apply ruff --fix for CI gatekeep check

---------

Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
2024-07-04 01:53:09 -04:00
Sayak Paul
31adeb41cd [Tests] fix sharding tests (#8764)
fix sharding tests
2024-07-04 08:50:59 +05:30
Aryan
a7b9634e95 Fix minor bug in SD3 img2img test (#8779)
fix minor bug in sd3 img2img
2024-07-03 07:45:37 -10:00
XCL
6b6b4bcffe [Tencent Hunyuan Team] Add checkpoint conversion scripts and changed controlnet (#8783)
* add conversion files; changed controlnet for hunyuandit

* style

---------

Co-authored-by: xingchaoliu <xingchaoliu@tencent.com>
Co-authored-by: yiyixuxu <yixu310@gmail.com>
2024-07-03 07:45:18 -10:00
Linoy Tsaban
beb1c017ad [advanced dreambooth lora] add clip_skip arg (#8715)
* add clip_skip

* style

* smol fix

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-07-03 12:15:16 -05:00
Sayak Paul
06ee4db3e7 [Chore] add dummy lora attention processors to prevent failures in other libs (#8777)
add dummy lora attention processors to prevent failures in other libs
2024-07-03 13:11:00 +05:30
Sayak Paul
84bbd2f4ce Update README.md to include Colab link (#8775) 2024-07-03 07:46:38 +05:30
Sayak Paul
600ef8a4dc Allow SD3 DreamBooth LoRA fine-tuning on a free-tier Colab (#8762)
* add experimental scripts to train SD3 transformer lora on colab

* add readme

* add colab

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix link in the notebook.

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-07-03 07:07:47 +05:30
Sayak Paul
984d340534 Revert "[LoRA] introduce LoraBaseMixin to promote reusability." (#8773)
Revert "[LoRA] introduce `LoraBaseMixin` to promote reusability. (#8670)"

This reverts commit a2071a1837.
2024-07-03 07:05:01 +05:30
Sayak Paul
a2071a1837 [LoRA] introduce LoraBaseMixin to promote reusability. (#8670)
* introduce  to promote reusability.

* up

* add more tests

* up

* remove comments.

* fix fuse_nan test

* clarify the scope of fuse_lora and unfuse_lora

* remove space
2024-07-03 07:04:37 +05:30
YiYi Xu
d9f71ab3c3 correct attention_head_dim for JointTransformerBlock (#8608)
* add

* update sd3 controlnet

* Update src/diffusers/models/controlnet_sd3.py

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-07-02 07:42:25 -10:00
Jiwook Han
dd4b731e68 Reflect few contributions on philosophy.md that were not reflected on #8294 (#8690)
* Update philosophy.md 

Some contributions were not reflected previously, so I am resubmitting them.

* Update docs/source/ko/conceptual/philosophy.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ko/conceptual/philosophy.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-07-02 08:43:56 -07:00
Dhruv Nair
31b211bfe3 Fix mistake in Single File Docs page (#8765)
update
2024-07-02 12:45:49 +05:30
Dhruv Nair
610a71d7d4 Fix indent in dreambooth lora advanced SD 15 script (#8753)
update
2024-07-02 11:07:34 +05:30
Dhruv Nair
c104482b9c Fix warning in UNetMotionModel (#8756)
* update

* Update src/diffusers/models/unets/unet_motion_model.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-07-02 11:07:13 +05:30
Dhruv Nair
c7a84ba2f4 Enforce ordering when running Pipeline slow tests (#8763)
update
2024-07-02 10:55:50 +05:30
YiYi Xu
8b1e3ec93e [hunyuan-dit] refactor HunyuanCombinedTimestepTextSizeStyleEmbedding (#8761)
up

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-07-02 10:11:04 +05:30
Sayak Paul
4e57aeff1f [Tests] add test suite for SD3 DreamBooth (#8650)
* add a test suite for SD3 DreamBooth

* lora suite

* style

* add checkpointing tests for LoRA

* add test to cover train_text_encoder.
2024-07-02 07:00:22 +05:30
Álvaro Somoza
af92869d9b [SD3 LoRA Training] Fix errors when not training text encoders (#8743)
* fix

* fix things.

Co-authored-by: Linoy Tsaban <linoy.tsaban@gmail.com>

* remove patch

* apply suggestions

---------

Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
Co-authored-by: Linoy Tsaban <linoy.tsaban@gmail.com>
2024-07-02 06:21:16 +05:30
Haofan Wang
0bae6e447c Allow from_transformer in SD3ControlNetModel (#8749)
* Update controlnet_sd3.py

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-07-01 07:38:38 -10:00
Dhruv Nair
0368483b61 Remove legacy single file model loading mixins (#8754)
update
2024-07-01 07:20:19 -10:00
YiYi Xu
ddb9d8548c [doc] add a tip about using SDXL refiner with hunyuan-dit and pixart (#8735)
* up

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-07-01 06:30:09 -10:00
Lucain
49979753e1 Always raise from previous error (#8751) 2024-07-01 14:22:30 +05:30
XCL
a3904d7e34 [Tencent Hunyuan Team] Add HunyuanDiT-v1.2 Support (#8747)
* add v1.2 support

---------

Co-authored-by: xingchaoliu <xingchaoliu@tencent.com>
Co-authored-by: yiyixuxu <yixu310@gmail.com>
2024-06-30 21:33:38 -10:00
WenheLI
7bfc1ee1b2 fix the LR schedulers for dreambooth_lora (#8510)
* update training

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2024-07-01 08:14:57 +05:30
Bhavay Malhotra
71c046102b [train_controlnet_sdxl.py] Fix the LR schedulers when num_train_epochs is passed in a distributed training env (#8476)
* Create diffusers.yml

* num_train_epochs

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-07-01 07:21:40 +05:30
Sayak Paul
83b112a145 shift cache in benchmarking. (#8740)
* shift cache.

* comment
2024-07-01 07:14:05 +05:30
Shauray Singh
8690e8b9d6 add PAG support for SD architecture (#8725)
* add pag to sd pipelines
2024-06-29 09:26:11 -10:00
Sayak Paul
7db8c3ec40 Benchmarking workflow fix (#8389)
* fix

* fixes

* add back the deadsnakes

* better messaging

* disable IP adapter tests for the moment.

* style

* up

* empty
2024-06-29 09:06:32 +05:30
Álvaro Somoza
9b7acc7cf2 [Community pipeline] SD3 Differential Diffusion Img2Img Pipeline (#8679)
* new pipeline
2024-06-28 17:12:39 -10:00
Luo Chaofan
a216b0bb7f fix: ValueError when using FromOriginalModelMixin in subclasses #8440 (#8454)
* fix: ValueError when using FromOriginalModelMixin in subclasses #8440

(cherry picked from commit 9285997843)

* Update src/diffusers/loaders/single_file_model.py

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* Update single_file_model.py

* Update single_file_model.py

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-28 17:15:46 +05:30
Dhruv Nair
150142c537 [Tests] Fix precision related issues in slow pipeline tests (#8720)
update
2024-06-28 08:13:46 +05:30
Linoy Tsaban
35f45ecd71 [Advanced dreambooth lora] adjustments to align with canonical script (#8406)
* minor changes

* minor changes

* minor changes

* minor changes

* minor changes

* minor changes

* minor changes

* fix

* fix

* aligning with blora script

* aligning with blora script

* aligning with blora script

* aligning with blora script

* aligning with blora script

* remove prints

* style

* default val

* license

* move save_model_card to outside push_to_hub

* Update train_dreambooth_lora_sdxl_advanced.py

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-27 13:27:37 +05:30
Sayak Paul
d5dd8df3b4 [Chore] perform better deprecation for vqmodeloutput (#8719)
perform better deprecation for vqmodeloutput
2024-06-27 12:16:37 +05:30
Mathis Koroglu
3e0d128da7 Motion Model / Adapter versatility (#8301)
* Motion Model / Adapter versatility

- allow to use a different number of layers per block
- allow to use a different number of transformer per layers per block
- allow a different number of motion attention head per block
- use dropout argument in get_down/up_block in 3d blocks

* Motion Model added arguments renamed & refactoring

* Add test for asymmetric UNetMotionModel
2024-06-27 11:11:29 +05:30
vincedovy
a536e775fb Fix json WindowsPath crash (#8662)
* Add check for WindowsPath in to_json_string

On Windows, os.path.join returns a WindowsPath. to_json_string does not convert this from a WindowsPath to a string. Added check for WindowsPath to to_json_saveable.

* Remove extraneous convert to string in test_check_path_types (tests/others/test_config.py)

* Fix style issues in tests/others/test_config.py

* Add unit test to test_config.py to verify that PosixPath and WindowsPath (depending on system) both work when converted to JSON

* Remove distinction between PosixPath and WindowsPath in ConfigMixIn.to_json_string(). Conditional now tests for Path, and uses Path.as_posix() to convert to string.

---------

Co-authored-by: Vincent Dovydaitis <vincedovy@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-27 10:30:55 +05:30
Álvaro Somoza
3b01d72a64 Modify FlowMatch Scale Noise (#8678)
* initial fix

* apply suggestion

* delete step_index line
2024-06-27 00:36:33 -04:00
Sayak Paul
e2a4a46e99 [Release notification] add some info when there is an error. (#8718)
add some info when there is an error.
2024-06-27 09:49:15 +05:30
Sayak Paul
eda560d34c modify PR and issue templates (#8687)
* modify PR and issue templates

* add single file poc.
2024-06-27 09:01:47 +05:30
Sayak Paul
adbb04864d [LoRA] fix conversion utility so that lora dora loads correctly (#8688)
fix conversion utility so that lora dora loads correctly
2024-06-27 08:58:32 +05:30
Dhruv Nair
effe4b9784 Update xformers SD3 test (#8712)
update
2024-06-26 10:24:27 -10:00
Sayak Paul
5b51ad0052 [LoRA] fix vanilla fine-tuned lora loading. (#8691)
fix vanilla fine-tuned lora loading.
2024-06-26 07:38:57 -10:00
Sayak Paul
10b4e354b6 [Chore] remove deprecation from transformer2d regarding the output class. (#8698)
* remove deprecation from transformer2d regarding the output class.

* up

* deprecate more
2024-06-26 07:35:36 -10:00
Donald.Lee
ea6938aea5 Fix: unet save_attn_procs at UNet2DconditionLoadersMixin (#8699)
* fix: unet save_attn_procs at custom diffusion

* style: recover unchanaged parts(max line length 119) / mod: add condition

* style: recover unchanaged parts(max line length 119)

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-26 22:30:49 +05:30
Sayak Paul
8ef0d9deff [Observability] add reporting mechanism when mirroring community pipelines. (#8676)
* add reporting mechanism when mirroring community pipelines.

* remove unneeded argument

* get the actual PATH_IN_REPO

* don't need tag
2024-06-26 22:11:33 +05:30
XCL
fa2abfdb03 [Tencent Hunyuan Team] Add Hunyuan-DiT ControlNet Inference (#8694)
* add controlnet support

---------

Co-authored-by: xingchaoliu <xingchaoliu@tencent.com>
Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-06-26 00:43:03 -10:00
YiYi Xu
1d3ef67b09 [doc] add more about from_pipe API for PAG doc (#8701)
* add more about from_pipe API

* Update docs/source/en/using-diffusers/pag.md

* Update docs/source/en/using-diffusers/pag.md

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-06-25 22:26:12 -10:00
Dhruv Nair
0f0b531827 Add decorator for compile tests (#8703)
* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-26 11:26:47 +05:30
Sayak Paul
e8284281c1 add docs on model sharding (#8658)
* add docs on model sharding

* add entry to _toctree.

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* simplify wording

* add a note on transformer library handling

* move device placement section

* Update docs/source/en/training/distributed_inference.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-06-26 07:35:11 +05:30
YiYi Xu
715a7da1b2 add sd3 conversion script (#8702)
add conversion script
2024-06-25 14:24:58 -10:00
Álvaro Somoza
14d224d4e6 [Docs] SD3 T5 Token limit doc (#8654)
* doc for max_sequence_length

* better position and changed note to tip

* apply suggestions

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-25 14:41:27 -04:00
YiYi Xu
540399f540 add PAG support (#7944)
* first draft


---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Junhwa Song <ethan9867@gmail.com>
Co-authored-by: Ahn Donghoon (안동훈 / suno) <suno.vivid@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-06-25 08:40:02 -10:00
Sayak Paul
f088027e93 [Marigold tests] add is_flaky decorator to some Marigold tests (#8696)
okay
2024-06-25 06:27:28 -10:00
Linoy Tsaban
c6e08ecd46 [Sd3 Dreambooth LoRA] Add text encoder training for the clip encoders (#8630)
* add clip text-encoder training

* no dora

* text encoder traing fixes

* text encoder traing fixes

* text encoder training fixes

* text encoder training fixes

* text encoder training fixes

* text encoder training fixes

* add text_encoder layers to save_lora

* style

* fix imports

* style

* fix text encoder

* review changes

* review changes

* review changes

* minor change

* add lora tag

* style

* add readme notes

* add tests for clip encoders

* style

* typo

* fixes

* style

* Update tests/lora/test_lora_layers_sd3.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update examples/dreambooth/README_sd3.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* minor readme change

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-25 18:00:19 +05:30
Sayak Paul
4ad7a1f5fd [Chore] create a utility for calculating the expected number of shards. (#8692)
create a utility for calculating the expected number of shards.
2024-06-25 17:05:39 +05:30
Hammond Liu
1f81fbe274 Fix redundant pipe init in sd3 lora (#8680)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-25 07:31:20 +05:30
Tolga Cangöz
589931ca79 Errata - Update class method convention to use cls (#8574)
* Class methods are supposed to use `cls` conventionally

* `make style && make quality`

* An Empty commit

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-24 10:35:45 -07:00
Steven Liu
675be88f00 [docs] Add note for float8 (#8685)
add note
2024-06-24 10:13:34 -07:00
Steven Liu
df4ad6f4ac [docs] Fix Pillow import (#8684)
fix import error
2024-06-24 10:13:15 -07:00
Sayak Paul
bc90c28bc9 [Docs] add note on caching in fast diffusion (#8675)
* add note on caching in fast diffusion

* formatting

* Update docs/source/en/tutorials/fast_diffusion.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-06-24 10:10:45 -07:00
Tolga Cangöz
f040c27d4c Errata - Fix typos and improve style (#8571)
* Fix typos

* Fix typos & up style

* chore: Update numbers

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-24 10:07:22 -07:00
Tolga Cangöz
138fac703a Discourage using deprecated revision parameter (#8573)
* Discourage using `revision`

* `make style && make quality`

* Refactor code to use 'variant' instead of 'revision'

* `revision="bf16"` -> `variant="bf16"`

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-24 10:06:49 -07:00
Tolga Cangöz
468ae09ed8 Errata - Trim trailing white space in the whole repo (#8575)
* Trim all the trailing white space in the whole repo

* Remove unnecessary empty places

* make style && make quality

* Trim trailing white space

* trim

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-24 18:39:15 +05:30
Dong
3fca52022f 🎨 fix xl playground device (#8550)
* 🎨 fix xl playground device

* 🎨 run `make fix-copies`

* 🎨 run `make fix-copies`

* edit xl_controlnet_img2img file

* edit playground img2img test slow

* Update tests/pipelines/stable_diffusion_xl/test_stable_diffusion_xl_img2img.py

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-24 16:49:55 +05:30
Tolga Cangöz
c375903db5 Errata - Fix typos & improve contributing page (#8572)
* Fix typos & improve contributing page

* `make style && make quality`

* fix typos

* Fix typo

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-24 14:13:03 +05:30
Vinh H. Pham
b9d52fca1d [train_lcm_distill_lora_sdxl.py] Fix the LR schedulers when num_train_epochs is passed in a distributed training env (#8446)
fix num_train_epochs

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-24 14:09:28 +05:30
drhead
2ada094bff Add extra performance features for EMAModel, torch._foreach operations and better support for non-blocking CPU offloading (#7685)
* Add support for _foreach operations and non-blocking to EMAModel

* default foreach to false

* add non-blocking EMA offloading to SD1.5 T2I example script

* fix whitespace

* move foreach to cli argument

* linting

* Update README.md re: EMA weight training

* correct args.foreach_ema

* add tests for foreach ema

* code quality

* add foreach to from_pretrained

* default foreach false

* fix linting

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: drhead <a@a.a>
2024-06-24 14:03:47 +05:30
Haofan Wang
f1f542bdd4 Update pipeline_stable_diffusion_3_controlnet.py (#8660)
Co-authored-by: YiYi Xu <yixu310@gmail,com>
2024-06-23 15:27:59 +05:30
Sayak Paul
a9c403c001 [LoRA] refactor lora conversion utility. (#8295)
* refactor lora conversion utility.

* remove error raises.

* add onetrainer support too.
2024-06-22 08:29:12 +05:30
Álvaro Somoza
e7b9a0762b [SD3 LoRA] Fix list index out of range (#8584)
* fix

* add check

* key present is checked before

* test case draft

* aply suggestions

* changed testing repo, back to old class

* forgot docstring

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-21 21:17:34 +05:30
Sayak Paul
8eb17315c8 [LoRA] get rid of the legacy lora remnants and make our codebase lighter (#8623)
* get rid of the legacy lora remnants and make our codebase lighter

* fix depcrecated lora argument

* fix

* empty commit to trigger ci

* remove print

* empty
2024-06-21 16:36:05 +05:30
YiYi Xu
c71c19c5e6 a few fix for shard checkpoints (#8656)
fix

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-06-21 12:50:58 +05:30
Steaunk
adc31940a9 Fix Typo in StableDiffusion3 (#8642)
* fix typo in __call__ of pipeline_stable_diffusion_3.py

* fix typo in __call__ of pipeline_stable_diffusion_3_img2img.py

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-21 08:45:48 +05:30
satani99
963ee05d16 Update train_dreambooth_lora_sd3.py (#8600)
* Update train_dreambooth_lora_sd3.py

* Update train_dreambooth_lora_sd3.py

* Update train_dreambooth_sd3.py

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-20 17:42:24 +05:30
Sayak Paul
668e34c6e0 [LoRA SD3] add support for lora fusion in sd3 (#8616)
* add support for lora fusion in sd3

* add test to ensure fused lora and effective lora produce same outpouts
2024-06-20 14:25:51 +05:30
Sayak Paul
25d7bb3ea6 [Flax tests] reduce tolerance for a flax test (#8640)
reduce tolerance for a flax test
2024-06-20 00:48:08 +04:00
YiYi Xu
394b8fb996 fix from_single_file for checkpoints with t5 (#8631)
fix single file
2024-06-19 08:23:35 -10:00
Sayak Paul
a1d55e14ba Change the default weighting_scheme in the SD3 scripts (#8639)
* change to logit_normal as the weighting scheme

* sensible default mote
2024-06-19 13:05:26 +01:00
王奇勋
e5564d45bf Support SD3 ControlNet and Multi-ControlNet. (#8566)
* sd3 controlnet



---------

Co-authored-by: haofanwang <haofanwang.ai@gmail.com>
2024-06-18 14:59:22 -10:00
Nan
2921a20194 [SD3] Fix mis-matched shape when num_images_per_prompt > 1 using without T5 (text_encoder_3=None) (#8558)
* fix shape mismatch when num_images_per_prompt > 1 and text_encoder_3=None

* style

* fix copies

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-06-18 12:41:18 -10:00
Carolinabanana
3376252d71 Fix gradient checkpointing issue for Stable Diffusion 3 (#8542)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-06-18 11:36:23 -10:00
Yongsen Mao
16170c69ae add sd1.5 compatibility to controlnet-xs and fix unused_parameters error during training (#8606)
* add sd1.5 compatibility to controlnet-xs

* set use_linear_projection by base_block

* refine code style
2024-06-18 11:35:34 -10:00
kkj15dk
4408047ac5 self.upsample = Upsample1D (#8580)
Making self.upsample actually be Upsample1D
2024-06-18 11:34:07 -10:00
Vasco Ramos
34fab8b511 [SD3 Docs] Corrected title about loading model with T5 "without" -> "with" (#8602)
[SD3 Docs] Corrected title about loading model with T5

Corrected the documentation title to "Loading the single file checkpoint with T5" Previously, it incorrectly stated "Loading the single file checkpoint without T5" which contradicted the code snippet showing how to load the SD3 checkpoint with the T5 model
2024-06-18 11:33:43 -10:00
Gæros
298ce67999 [LoRA] text encoder: read the ranks for all the attn modules (#8324)
* [LoRA] text encoder: read the ranks for all the attn modules

 * In addition to out_proj, read the ranks of adapters for q_proj, k_proj, and  v_proj

 * Allow missing adapters (UNet already supports this)

* ruff format loaders.lora

* [LoRA] add tests for partial text encoders LoRAs

* [LoRA] update test_simple_inference_with_partial_text_lora to be deterministic

* [LoRA] comment justifying test_simple_inference_with_partial_text_lora

* style

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-18 21:10:50 +01:00
Andrew Hong
d2e7a19fd5 Remove underlines between badges (#8484)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-18 10:40:12 -07:00
Sayak Paul
cd3082008e [Core] Add shift_factor to SD3 tiny autoencoder (#8618)
* shift factor argument to tiny

* remove shift factor rejigging from the sd3 docs
2024-06-18 18:28:02 +01:00
Álvaro Somoza
f3209b5b55 [SD3 Inference] T5 Token limit (#8506)
* max_sequence_length for the T5

* updated img2img

* apply suggestions

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-06-18 06:46:38 -10:00
Marc Sun
96399c3ec6 Fix sharding when no device_map is passed (#8531)
* Fix sharding when no device_map is passed

* style

* add tests

* align

* add docstring

* format

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-18 05:47:23 -10:00
MaoXianXin
10d3220abe A backslash is missing from the run command (#8471) 2024-06-18 16:44:34 +01:00
Dhruv Nair
f69511ecc6 [Single File Loading] Handle unexpected keys in CLIP models when accelerate isn't installed. (#8462)
* update

* update

* update

* update

* update

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-18 16:39:30 +01:00
Álvaro Somoza
d2b10b1f4f [SD3] TAESD3 docs (#8607)
* tased3 docs

* apply suggestion

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-18 15:56:38 +01:00
Sayak Paul
23a2cd3337 [LoRA] training fix the position of param casting when loading them (#8460)
fix the position of param casting when loading them
2024-06-18 14:57:34 +01:00
Sayak Paul
4edde134f6 [SD3 training] refactor the density and weighting utilities. (#8591)
refactor the density and weighting utilities.
2024-06-18 14:44:38 +01:00
Bagheera
074a7cc3c5 SD3: update default training timestep / loss weighting distribution to logit_normal (#8592)
Co-authored-by: bghira <bghira@users.github.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2024-06-18 14:15:19 +01:00
Álvaro Somoza
6bfd13f07a [SD3 Training] T5 token limit (#8564)
* initial commit

* default back to 77

* better text

* text correction

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-17 16:32:56 -04:00
AmosDinh
eeb70033a6 Syntax error in readme example "pipe" -> "pipeline" (#8601)
Update controlnet.md

Syntax error pipe -> pipeline
2024-06-17 11:02:07 -07:00
Dhruv Nair
c4a4750cb3 Temporarily pin Numpy in the CI (#8603)
temp pin numpy
2024-06-17 19:32:38 +05:30
YiYi Xu
a6375d4101 Image processor latent (#8513)
* fix

* up

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-06-16 22:34:55 -10:00
spacepxl
8e1b7a084a Fix the deletion of SD3 text encoders for Dreambooth/LoRA training if the text encoders are not being trained (#8536)
* Update train_dreambooth_sd3.py to fix TE garbage collection

* Update train_dreambooth_lora_sd3.py to fix TE garbage collection

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-16 20:52:33 +01:00
Rafie Walker
6946facf69 Implement SD3 loss weighting (#8528)
* Add lognorm and cosmap weighting

* Implement mode sampling

* Update examples/dreambooth/train_dreambooth_lora_sd3.py

* Update examples/dreambooth/train_dreambooth_lora_sd3.py

* Update examples/dreambooth/train_dreambooth_sd3.py

* Update examples/dreambooth/train_dreambooth_sd3.py

* Update examples/dreambooth/train_dreambooth_sd3.py

* Update examples/dreambooth/train_dreambooth_lora_sd3.py

* Update examples/dreambooth/train_dreambooth_sd3.py

* Update examples/dreambooth/train_dreambooth_sd3.py

* Update examples/dreambooth/train_dreambooth_lora_sd3.py

* keep timestamp sampling fully on cpu

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-16 20:15:50 +01:00
Sayak Paul
130dd936bb pin accelerate to 0.31.0 (#8563)
* pin accelerate to 0.31.0

* update dep table

* empty
2024-06-16 08:37:00 -10:00
Jonathan Rahn
a899e42fc7 add sentencepiece to requirements.txt for SD3 dreambooth (#8538)
* add `sentencepiece` requirement for SD3

add `sentencepiece` requirement

* Empty-Commit

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-14 22:48:36 +01:00
Sayak Paul
f96e4a16ad pin transformers to the latest (#8522)
thanks!
2024-06-13 07:39:24 -10:00
Tolga Cangöz
9c6e9684a2 Refactor StableDiffusion3Img2ImgPipeline to remove redundant code (#8533) 2024-06-13 07:36:46 -10:00
Sayak Paul
2e4841ef1e post release 0.29.0 (#8492)
post release
2024-06-13 06:14:20 -10:00
Haofan Wang
8bea943714 Update requirements_sd3.txt (#8521) 2024-06-13 17:02:17 +01:00
YiYi Xu
614d0c64e9 remove the deprecated prepare_mask_and_masked_image function (#8512)
remove prepare mask fn

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-13 14:59:21 +01:00
Dhruv Nair
b1a2c0d577 Expand Single File support in SD3 Pipeline (#8517)
* update

* update
2024-06-13 18:29:19 +05:30
Lucain
06ee907b73 Fix PATH_IN_REPO on new release in mirror_community_pipeline.yaml (#8519)
Fix PATH_IN_REPO in mirror workflow
2024-06-13 10:25:24 +02:00
ちくわぶ
896fb6d8d7 Fix duplicate variable assignments in SD3's JointAttnProcessor (#8516)
* Fix duplicate variable assignments.

* Fix duplicate variable assignments.
2024-06-12 21:52:35 -10:00
Beinsezii
7f51f286a5 Add Hunyuan AutoPipe mapping (#8505) 2024-06-12 16:11:55 -10:00
kkj15dk
829f6defa4 Fix spelling in scheduling_flow_match_euler_discrete.py (#8497)
Update scheduling_flow_match_euler_discrete.py

Spelling:
Foward -> Forward

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-06-12 12:37:47 -10:00
Beinsezii
24bdf4b215 Add SD3 AutoPipeline mappings (#8489) 2024-06-12 12:31:36 -10:00
Radamés Ajna
95e0c3757d Fix small typo (#8498) 2024-06-12 15:30:58 -07:00
Sayak Paul
6cf0be5d3d fix warning log for Transformer SD3 (#8496)
fix warning log
2024-06-12 12:25:18 -10:00
Sayak Paul
ec068f9b5b fix dual transformer2d import (#8491)
fix
2024-06-12 21:10:27 +01:00
Ameer Azam
0240d4191a Update README_sd3.md (#8490)
becasue in  Readme  it was not correct
train_dreambooth_sd3.py to train_dreambooth_lora_sd3
2024-06-12 21:08:36 +01:00
Dhruv Nair
04717fd861 Add Stable Diffusion 3 (#8483)
* up

* add sd3

* update

* update

* add tests

* fix copies

* fix docs

* update

* add dreambooth lora

* add LoRA

* update

* update

* update

* update

* import fix

* update

* Update src/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* import fix 2

* update

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* update

* update

* update

* fix ckpt id

* fix more ids

* update

* missing doc

* Update src/diffusers/schedulers/scheduling_flow_match_euler_discrete.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/schedulers/scheduling_flow_match_euler_discrete.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_3.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update docs/source/en/api/pipelines/stable_diffusion/stable_diffusion_3.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* update'

* fix

* update

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

* Update src/diffusers/models/autoencoders/autoencoder_kl.py

* note on gated access.

* requirements

* licensing

---------

Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-06-12 20:44:00 +01:00
Jiwook Han
6fd458e99d 🌐 [i18n-KO] Translated conceptual/philosophy.md and 3 other documents to Korean (#8294)
* translation about 3 documents into Korean

* evaluation doc korean translation

* _toctree.yml modify

* doc title fix : philosopy->philosophy

* Update docs/source/ko/conceptual/ethical_guidelines.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conceptual/ethical_guidelines.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conceptual/ethical_guidelines.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conceptual/ethical_guidelines.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conceptual/ethical_guidelines.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conceptual/ethical_guidelines.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conceptual/ethical_guidelines.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conceptual/ethical_guidelines.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conceptual/ethical_guidelines.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conceptual/ethical_guidelines.md

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* Update docs/source/ko/conceptual/evaluation.md

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

* Update docs/source/ko/conceptual/evaluation.md

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

* Update docs/source/ko/conceptual/evaluation.md

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

* Update docs/source/ko/conceptual/evaluation.md

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

* Update docs/source/ko/conceptual/evaluation.md

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

* Update docs/source/ko/conceptual/evaluation.md

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

* Update docs/source/ko/conceptual/evaluation.md

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

* Update philosophy.md (from jungnerd)

---------

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>
Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>
2024-06-12 09:40:37 -07:00
Greg Hunkins
1066fe4cbc 🤫 Quiet IP Adapter Mask Warning (#8475)
* quiet attn parameters

* fix lint

* make style && make quality

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-12 16:50:13 +01:00
Sayak Paul
d38f69ea25 change max_shard_size to 10GB (#8445)
* change max_shard_size to 10GB

* add notes to the documentation

* Update src/diffusers/models/modeling_utils.py

Co-authored-by: Lucain <lucainp@gmail.com>

* change to abs limit

---------

Co-authored-by: Lucain <lucainp@gmail.com>
2024-06-12 13:49:13 +01:00
Patrick
0a1c13af79 image_processor.py: Fixed an error in ValueError's message (#8447)
* image_processor.py: Fixed an error in ValueError's message , as the string's join method tried to join types, instead of strings

Bug that occurred:

f"Input is in incorrect format. Currently, we only support {', '.join(supported_formats)}"
TypeError: sequence item 0: expected str instance, type found

* Fixed: C417 Unnecessary `map` usage (rewrite using a generator expression)

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-11 08:09:24 -10:00
YiYi Xu
0028c34432 fix SEGA pipeline (#8467)
* fix

* style

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-11 06:37:49 -10:00
Sayak Paul
d457beed92 Update README.md to update the MaPO project (#8470)
Update README.md
2024-06-11 10:10:45 +01:00
Jianqi Pan
1d9a6a81b9 🔧 chore: use modeling_outputs.Transformer2DModelOutput (#8436)
* 🔧 chore: use modeling_outputs.Transformer2DModelOutput

* 🔧 chore: isort

* 🔧 chore: isort

* style

---------

Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
2024-06-10 12:11:41 +01:00
Luc Georges
4e0984db6c fix(ci): remove unnecessary permissions (#8457) 2024-06-10 10:49:29 +01:00
Luc Georges
83bc6c94ea feat(ci): add trufflehog secrets detection (#8430) 2024-06-08 07:56:47 +05:30
Lucain
0d68ddf327 Move away from cached_download (#8419)
* Move away from

* unused constant

* Add custom error
2024-06-07 15:43:00 +05:30
Sayak Paul
7d887118b9 [Core] support saving and loading of sharded checkpoints (#7830)
* feat: support saving a model in sharded checkpoints.

* feat: make loading of sharded checkpoints work.

* add tests

* cleanse the loading logic a bit more.

* more resilience while loading from the Hub.

* parallelize shard downloads by using snapshot_download()/

* default to a shard size.

* more fix

* Empty-Commit

* debug

* fix

* uality

* more debugging

* fix more

* initial comments from Benjamin

* move certain methods to loading_utils

* add test to check if the correct number of shards are present.

* add a test to check if loading of sharded checkpoints from the Hub is okay

* clarify the unit when passed as an int.

* use hf_hub for sharding.

* remove unnecessary code

* remove unnecessary function

* lucain's comments.

* fixes

* address high-level comments.

* fix test

* subfolder shenanigans./

* Update src/diffusers/utils/hub_utils.py

Co-authored-by: Lucain <lucainp@gmail.com>

* Apply suggestions from code review

Co-authored-by: Lucain <lucainp@gmail.com>

* remove _huggingface_hub_version as not needed.

* address more feedback.

* add a test for local_files_only=True/

* need hf hub to be at least 0.23.2

* style

* final comment.

* clean up subfolder.

* deal with suffixes in code.

* _add_variant default.

* use weights_name_pattern

* remove add_suffix_keyword

* clean up downloading of sharded ckpts.

* don't return something special when using index.json

* fix more

* don't use bare except

* remove comments and catch the errors better

* fix a couple of things when using is_file()

* empty

---------

Co-authored-by: Lucain <lucainp@gmail.com>
2024-06-07 14:49:10 +05:30
Lucain
b63c956860 Final fix for mirror community pipeline (#8427) 2024-06-07 11:08:33 +02:00
Lucain
716b2062bf Fix mirror community pipeline (#8426) 2024-06-07 11:03:48 +02:00
Lucain
5fd6825d25 Fix mirror_community_pipeline.yml name (#8425) 2024-06-07 11:00:05 +02:00
Lucain
e0fae6fd73 Mirror ./examples/community folder on HF (#8417)
* first draft

* secret

* tiktok

* capital matters

* dataset matter

* don't be a prick

* refact

* only on main or tag

* document with an example

* Update destination dataset

* link

* allow manual trigger

* better

* lin

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-07 10:56:05 +02:00
Tolga Cangöz
ec1aded12e Optimize test files by fixing CPU-offloading usage (#8409)
* Refactor code to remove unnecessary calls to `to(torch_device)`

* Refactor code to remove unnecessary calls to `to("cuda")`

* Update pipeline_stable_diffusion_diffedit.py
2024-06-06 09:51:26 -10:00
Steven Liu
151a56b80e [docs] Single file usage (#8412)
* single file usage

* edit
2024-06-06 12:40:34 -07:00
Sayak Paul
a3faf3f260 [Core] fix: legacy model mapping (#8416)
* fix: legacy model mapping

* remove print
2024-06-06 20:35:05 +05:30
Sayak Paul
867a2b0cf9 [Hunyuan] add optimization related sections to the hunyuan dit docs. (#8402)
* optimizations to the hunyuan dit docs.

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/api/pipelines/hunyuandit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-06-06 05:41:38 +05:30
Tolga Cangöz
98730c5dd7 Errata (#8322)
* Fix typos

* Trim trailing whitespaces

* Remove a trailing whitespace

* chore: Update MarigoldDepthPipeline checkpoint to prs-eth/marigold-lcm-v1-0

* Revert "chore: Update MarigoldDepthPipeline checkpoint to prs-eth/marigold-lcm-v1-0"

This reverts commit fd742b30b4.

* pokemon -> naruto

* `DPMSolverMultistep` -> `DPMSolverMultistepScheduler`

* Improve Markdown stylization

* Improve style

* Improve style

* Refactor pipeline variable names for consistency

* up style
2024-06-05 13:59:09 -07:00
Guillaume LEGENDRE
7ebd359446 Update tailscale action to main (#8403) 2024-06-05 18:53:33 +05:30
Hzzone
d3881f35b7 Gligen training (#7906)
* add training code of gligen

* fix code quality tests.

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-06-05 16:26:42 +04:00
Sayak Paul
48207d6689 [Scheduler] fix: EDM schedulers when using the exp sigma schedule. (#8385)
* fix: euledm when using the exp sigma schedule.

* fix-copies

* remove print.

* reduce friction

* yiyi's suggestioms
2024-06-04 19:31:43 -10:00
Sayak Paul
2f6f426f66 [Hunyuan] allow Hunyuan DiT to run under 6GB for GPU VRAM (#8399)
* allow hunyuan dit to run under 6GB for GPU VRAM

* add section in the docs/
2024-06-05 08:24:19 +04:00
Sayak Paul
a0542c1917 [LoRA] Remove legacy LoRA code and related adjustments (#8316)
* remove legacy code from load_attn_procs.

* finish first draft

* fix more.

* fix more

* add test

* add serialization support.

* fix-copies

* require peft backend for lora tests

* style

* fix test

* fix loading.

* empty

* address benjamin's feedback.
2024-06-05 08:15:30 +04:00
Sayak Paul
a8ad6664c2 [Hunyuan] feat: support chunked ff. (#8397)
feat: support chunked ff.
2024-06-05 08:12:18 +04:00
Sayak Paul
14f7b545bd [Hunyuan DiT] feat: enable fusing qkv projections when doing attention (#8396)
* feat: introduce qkv fusion for Hunyuan

* fix copies
2024-06-05 07:58:03 +04:00
leaps
07cd20041c Update code example in pipeline_stable_unclip_img2img.py EXAMPLE_DOC_STRING (#8401)
Update code example in pipeline_stable_unclip_img2img.py

Previous code caused an error when run
2024-06-04 17:22:46 -10:00
Sayak Paul
6ddbf6222c [Transformer2DModel] Handle norm_type safely while remapping (#8370)
* handle norm_type of transformer2d_model safely.

* log an info when old model class is being returned.

* Apply suggestions from code review

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* remove extra stuff

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-06-04 13:39:19 +04:00
Sayak Paul
3ff39e8e86 [HunyuanDiT] minor docs changes in hunyuandit (#8395)
minor docs changes in hunyuandit
2024-06-04 12:18:53 +04:00
townwish4git
6be43bd855 Fix AsymmetricAutoencoderKL forward (#8378) 2024-06-03 17:25:11 -10:00
Marçal Comajoan Cara
dc89434bdc Update transformer2d.md title (#8375)
* Update transformer2d.md title

For the other classes (e.g., UNet2DModel) the title of the documentation coincides with the name of the class, but that was not the case for Transformer2DModel.

* Update model docs titles for consistency with class names
2024-06-03 17:01:21 -07:00
Dhruv Nair
4d633bfe9a Update slow test actions (#8381)
* update

* update

* update

* update
2024-06-03 18:32:34 +05:30
XCL
174cf868ea Tencent Hunyuan Team - Updated Doc for HunyuanDiT (#8383)
* add hunyuandit doc

* update hunyuandit doc

* update hunyuandit 2d model

* update toctree.yml for hunyuandit
2024-06-03 14:02:46 +04:00
XCL
413604405f Tencent Hunyuan Team: add HunyuanDiT related updates (#8240)
* Hunyuan Team: add HunyuanDiT related updates


---------

Co-authored-by: XCLiu <liuxc1996@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail.com>
2024-06-01 12:41:21 -10:00
39th president of the United States, probably
bc108e1533 Fix DREAM training (#8302)
Co-authored-by: Jimmy <39@🇺🇸.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-06-01 11:27:57 +04:00
Anton Obukhov
86555c9f59 Fix marigold documentation (#8372)
* rename prs-eth/marigold-lcm-v1-0 into prs-eth/marigold-depth-lcm-v1-0

* update image paths in https://huggingface.co/datasets/huggingface/documentation-images to use main branch

* fix relative paths to other diffusers pages

* Update docs/source/en/using-diffusers/marigold_usage.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-05-31 12:10:05 -10:00
Sayak Paul
983dec3bf7 [Core] Introduce class variants for Transformer2DModel (#7647)
* init for patches

* finish patched model.

* continuous transformer

* vectorized transformer2d.

* style.

* inits.

* fix-copies.

* introduce DiTTransformer2DModel.

* fixes

* use REMAPPING as suggested by @DN6

* better logging.

* add pixart transformer model.

* inits.

* caption_channels.

* attention masking.

* fix use_additional_conditions.

* remove print.

* debug

* flatten

* fix: assertion for sigma

* handle remapping for modeling_utils

* add tests for dit transformer2d

* quality

* placeholder for pixart tests

* pixart tests

* add _no_split_modules

* add docs.

* check

* check

* check

* check

* fix tests

* fix tests

* move Transformer output to modeling_output

* move errors better and bring back use_additional_conditions attribute.

* add unnecessary things from DiT.

* clean up pixart

* fix remapping

* fix device_map things in pixart2d.

* replace Transformer2DModel with appropriate classes in dit, pixart tests

* empty

* legacy mixin classes./

* use a remapping dict for fetching class names.

* change to specifc model types in the pipeline implementations.

* move _fetch_remapped_cls_from_config to modeling_loading_utils.py

* fix dependency problems.

* add deprecation note.
2024-05-31 13:40:27 +05:30
Dhruv Nair
f9fa8a868c Change checkpoint key used to identify CLIP models in single file checkpoints (#8319)
update
2024-05-31 11:20:31 +05:30
Jonah
05be622b1c Fix depth pipeline "input/weight type should be the same" error at fp16 (#8321)
Fix "input/weight type should be the same"

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-05-30 13:59:49 -10:00
satani99
352d96eb82 Modularize train_text_to_image_lora_sdxl inferencing during and after training in example (#8335)
* Modularized the train_lora_sdxl file

* Modularized the train_lora_sdxl file

* Modularized the train_lora_sdxl file

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-05-31 04:52:22 +05:30
Genius Patrick
3511a9623f fix(training): lr scheduler doesn't work properly in distributed scenarios (#8312) 2024-05-30 15:23:19 +05:30
Dhruv Nair
42cae93b94 Fix StableDiffusionPipeline when text_encoder=None (#8297)
* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-05-29 09:00:51 -10:00
Tolga Cangöz
a2ecce26bc Fix Copying Mechanism typo/bug (#8232)
* Fix copying mechanism typos

* fix copying mecha

* Revert, since they are in TODO

* Fix copying mechanism
2024-05-29 09:37:18 -07:00
Steven Liu
9e00b727ad [docs] Files and formats (#7874)
* files and formats

* fix callout

* feedback

* code sample

* feedback
2024-05-29 09:31:32 -07:00
Steven Liu
f7a4626f4b [docs] DeepFloyd training (#8224)
deepfloyd training

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-05-29 09:27:37 -07:00
Tolga Cangöz
f4a44b7707 Simplify platform_info assignment in diffusers-cli env (#8298)
chore: Simplify `platform_info` assignment
2024-05-29 17:57:42 +05:30
satani99
3bc3b48c10 Modularize train_text_to_image_lora SD inferencing during and after training in example (#8283)
* Modularized the train_lora file

* Modularized the train_lora file

* Modularized the train_lora file

* Modularized the train_lora file

* Modularized the train_lora file

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-05-29 10:08:02 +05:30
Sayak Paul
581d8aacf7 post release v0.28.0 (#8286)
* post release v0.28.0

* style
2024-05-29 07:13:22 +05:30
Sayak Paul
ba1bfac20b [Core] Refactor IPAdapterPlusImageProjection a bit (#7994)
* use IPAdapterPlusImageProjectionBlock in IPAdapterPlusImageProjection

* reposition IPAdapterPlusImageProjection

* refactor complete?

* fix heads param retrieval.

* update test dict creation method.
2024-05-29 06:30:47 +05:30
Sayak Paul
5edd0b34fa move vqmodel to models.autoencoders. (#8292)
move vqmodel to models.autoencoders.
2024-05-29 06:30:35 +05:30
Sayak Paul
3a28e36aa1 [Post release 0.28.0] remove deprecated blocks. (#8291)
* remove deprecated blocks.

* update the location paths.
2024-05-29 06:29:43 +05:30
Vladimir Mandic
3393c01c9d fix pixart-sigma negative prompt handling (#8299)
* fix negative prompt

* fix

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-05-28 13:10:35 -10:00
Steven Liu
1fa8dbc63a [docs] Outpaint (#7964)
* first draft

* edits
2024-05-28 14:42:03 -07:00
Steven Liu
0ab6dc0f23 [docs] Scheduler features (#7990)
* noise schedule

* sigmas and zero snr

* feedback

* feedback
2024-05-28 14:41:22 -07:00
Álvaro Somoza
b2030a249c Fix object has no attribute 'flush' when using without a console (#8271)
fix
2024-05-28 11:19:01 -10:00
Sajad Norouzi
67bef2027c Add Kohya fix to SD pipeline for high resolution generation (#7633)
add kohya high resolution fix.
2024-05-28 10:00:04 -10:00
Sayak Paul
aa676c641f change to yiyi's address. (#7981)
* change to yiyi's address.

* update to diffusers@huggingface.co
2024-05-28 08:28:55 -10:00
Sayak Paul
e6df8edadc [LoRA] attempt at fixing onetrainer lora. (#8242)
* attempt at fixing onetrainer lora.

* fix
2024-05-28 08:25:54 -10:00
Jiwook Han
80cfaebaa1 Fix typo in philosophy.md (#8303)
fix typo in philosophy.md
2024-05-28 10:38:48 -07:00
Álvaro Somoza
ba82414106 [docs] Add controlnet example to marigold (#8289)
* initial doc

* fix wrong LCM sentence

* implement binary colormap without requiring matplotlib
update section about Marigold for ControlNet
update formatting of marigold_usage.md

* fix indentation

---------

Co-authored-by: anton <anton.obukhov@gmail.com>
2024-05-28 11:58:06 -04:00
Sayak Paul
fe5f035f79 install wget. (#8285) 2024-05-27 18:06:07 +05:30
Anton Obukhov
b3d10d6d65 [Pipeline] Marigold depth and normals estimation (#7847)
* implement marigold depth and normals pipelines in diffusers core

* remove bibtex

* remove deprecations

* remove save_memory argument

* remove validate_vae

* remove config output

* remove batch_size autodetection

* remove presets logic
move default denoising_steps and processing_resolution into the model config
make default ensemble_size 1

* remove no_grad

* add fp16 to the example usage

* implement is_matplotlib_available
use is_matplotlib_available, is_scipy_available for conditional imports in the marigold depth pipeline

* move colormap, visualize_depth, and visualize_normals into export_utils.py

* make the denoising loop more lucid
fix the outputs to always be 4d tensors or lists of pil images
support a 4d input_image case
attempt to support model_cpu_offload_seq
move check_inputs into a separate function
change default batch_size to 1, remove any logic to make it bigger implicitly

* style

* rename denoising_steps into num_inference_steps

* rename input_image into image

* rename input_latent into latents

* remove decode_image
change decode_prediction to use the AutoencoderKL.decode method

* move clean_latent outside of progress_bar

* refactor marigold-reusable image processing bits into MarigoldImageProcessor class

* clean up the usage example docstring

* make ensemble functions members of the pipelines

* add early checks in check_inputs
rename E into ensemble_size in depth ensembling

* fix vae_scale_factor computation

* better compatibility with torch.compile
better variable naming

* move export_depth_to_png to export_utils

* remove encode_prediction

* improve visualize_depth and visualize_normals to accept multi-dimensional data and lists
remove visualization functions from the pipelines
move exporting depth as 16-bit PNGs functionality from the depth pipeline
update example docstrings

* do not shortcut vae.config variables

* change all asserts to raise ValueError

* rename output_prediction_type to output_type

* better variable names
clean up variable deletion code

* better variable names

* pass desc and leave kwargs into the diffusers progress_bar
implement nested progress bar for images and steps loops

* implement scale_invariant and shift_invariant flags in the ensemble_depth function
add scale_invariant and shift_invariant flags readout from the model config
further refactor ensemble_depth
support ensembling without alignment
add ensemble_depth docstring

* fix generator device placement checks

* move encode_empty_text body into the pipeline call

* minor empty text encoding simplifications

* adjust pipelines' class docstrings to explain the added construction arguments

* improve the scipy failure condition
add comments
improve docstrings
change the default use_full_z_range to True

* make input image values range check configurable in the preprocessor
refactor load_image_canonical in preprocessor to reject unknown types and return the image in the expected 4D format of tensor and on right device
support a list of everything as inputs to the pipeline, change type to PipelineImageInput
implement a check that all input list elements have the same dimensions
improve docstrings of pipeline outputs
remove check_input pipeline argument

* remove forgotten print

* add prediction_type model config

* add uncertainty visualization into export utils
fix NaN values in normals uncertainties

* change default of output_uncertainty to False
better handle the case of an attempt to export or visualize none

* fix `output_uncertainty=False`

* remove kwargs
fix check_inputs according to the new inputs of the pipeline

* rename prepare_latent into prepare_latents as in other pipelines
annotate prepare_latents in normals pipeline with "Copied from"
annotate encode_image in normals pipeline with "Copied from"

* move nested-capable `progress_bar` method into the pipelines
revert the original `progress_bar` method in pipeline_utils

* minor message improvement

* fix cpu offloading

* move colormap, visualize_depth, export_depth_to_16bit_png, visualize_normals, visualize_uncertainty to marigold_image_processing.py
update example docstrings

* fix missing comma

* change torch.FloatTensor to torch.Tensor

* fix importing of MarigoldImageProcessor

* fix vae offloading
fix batched image encoding
remove separate encode_image function and use vae.encode instead

* implement marigold's intial tests
relax generator checks in line with other pipelines
implement return_dict __call__ argument in line with other pipelines

* fix num_images computation

* remove MarigoldImageProcessor and outputs from import structure
update tests

* update docstrings

* update init

* update

* style

* fix

* fix

* up

* up

* up

* add simple test

* up

* update expected np input/output to be channel last

* move expand_tensor_or_array into the MarigoldImageProcessor

* rewrite tests to follow conventions - hardcoded slices instead of image artifacts
write more smoke tests

* add basic docs.

* add anton's contribution statement

* remove todos.

* fix assertion values for marigold depth slow tests

* fix assertion values for depth normals.

* remove print

* support AutoencoderTiny in the pipelines

* update documentation page
add Available Pipelines section
add Available Checkpoints section
add warning about num_inference_steps

* fix missing import in docstring
fix wrong value in visualize_depth docstring

* [doc] add marigold to pipelines overview

* [doc] add section "usage examples"

* fix an issue with latents check in the pipelines

* add "Frame-by-frame Video Processing with Consistency" section

* grammarly

* replace tables with images with css-styled images (blindly)

* style

* print

* fix the assertions.

* take from the github runner.

* take the slices from action artifacts

* style.

* update with the slices from the runner.

* remove unnecessary code blocks.

* Revert "[doc] add marigold to pipelines overview"

This reverts commit a505165150afd8dab23c474d1a054ea505a56a5f.

* remove invitation for new modalities

* split out marigold usage examples

* doc cleanup

---------

Co-authored-by: yiyixuxu <yixu310@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
2024-05-27 17:21:49 +05:30
Dhruv Nair
b82f9f5666 Add zip package to doc builder image (#8284)
update
2024-05-27 15:50:00 +05:30
Sayak Paul
6a5ba1b719 [Workflows] add a more secure way to run tests from a PR. (#7969)
* add a more secure way to run tests from a PR.

* make pytest more secure.

* address dhruv's comments.

* improve validation check.

* Update .github/workflows/run_tests_from_a_pr.yml

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-05-27 13:47:50 +05:30
Dhaivat Bhatt
4d40c9140c Add details about 1-stage implementation in I2VGen-XL docs (#8282)
* Add details about 1-stage implementation

* Add details about 1-stage implementation
2024-05-27 09:56:32 +05:30
Tolga Cangöz
0ab63ff647 Fix CPU Offloading Usage & Typos (#8230)
* Fix typos

* Fix `pipe.enable_model_cpu_offload()` usage

* Fix cpu offloading

* Update numbers
2024-05-24 11:25:29 -07:00
Tolga Cangöz
db33af065b Fix a grammatical error in the raise messages (#8272)
Fix grammatical error
2024-05-24 11:15:00 -07:00
Yue Wu
1096f88e2b sampling bug fix in diffusers tutorial "basic_training.md" (#8223)
sampling bug fix in basic_training.md

In the diffusers basic training tutorial, setting the manual seed argument (generator=torch.manual_seed(config.seed)) in the pipeline call inside evaluate() function rewinds the dataloader shuffling, leading to overfitting due to the model seeing same sequence of training examples after every evaluation call. Using generator=torch.Generator(device='cpu').manual_seed(config.seed) avoids this.
2024-05-24 11:14:32 -07:00
Dhruv Nair
cef4a51223 Clean up from_single_file docs (#8268)
* update

* update
2024-05-24 17:43:51 +05:30
Lucain
edf5ba6a17 Respect resume_download deprecation V2 (#8267)
* Fix resume_downoad FutureWarning

* only resume download
2024-05-24 12:11:03 +02:00
Sayak Paul
9941f1f61b [Chore] run the documentation workflow in a custom container. (#8266)
run the documentation workflow in a custom container.
2024-05-24 15:10:02 +05:30
Yifan Zhou
46a9db0336 [Community Pipeline] FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation (#8239)
* code and doc

* update paper link

* remove redundant codes

* add example video

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-05-24 14:44:20 +05:30
Dhruv Nair
370146e4e0 Use freedesktop_os_release() in diffusers cli for Python >=3.10 (#8235)
* update

* update
2024-05-24 13:30:40 +05:30
Dhruv Nair
5cd45c24bf Create custom container for doc builder (#8263)
* update

* update
2024-05-24 12:53:48 +05:30
Dhruv Nair
67b3fe0aae Fix resize issue in SVD pipeline with VideoProcessor (#8229)
update

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-05-23 11:57:34 +05:30
Dhruv Nair
baab065679 Remove unnecessary single file tests for SD Cascade UNet (#7996)
update
2024-05-22 12:29:59 +05:30
BootesVoid
509741aea7 fix: Attribute error in Logger object (logger.warning) (#8183) 2024-05-22 12:29:11 +05:30
Lucain
e1df77ee1e Use HF_TOKEN env var in CI (#7993) 2024-05-21 14:58:10 +05:30
Steven Liu
fdb1baa05c [docs] VideoProcessor (#7965)
* fix?

* fix?

* fix
2024-05-21 08:18:21 +05:30
Vinh H. Pham
6529ee67ec Make VAE compatible to torch.compile() (#7984)
make VAE compatible to torch.compile()

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-05-20 13:43:59 -04:00
Sai-Suraj-27
df2bc5ef28 fix: Fixed few docstrings according to the Google Style Guide (#7717)
Fixed few docstrings according to the Google Style Guide.
2024-05-20 10:26:05 -07:00
Aleksei Zhuravlev
a7bf77fc28 Passing cross_attention_kwargs to StableDiffusionInstructPix2PixPipeline (#7961)
* Update pipeline_stable_diffusion_instruct_pix2pix.py

Add `cross_attention_kwargs` to `__call__` method of `StableDiffusionInstructPix2PixPipeline`, which are passed to UNet.

* Update documentation for pipeline_stable_diffusion_instruct_pix2pix.py

* Update docstring

* Update docstring

* Fix typing import
2024-05-20 13:14:34 -04:00
Junsong Chen
0f0defdb65 [docs] add doc for PixArtSigmaPipeline (#7857)
* 1. add doc for PixArtSigmaPipeline;

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Guillaume LEGENDRE <glegendre01@gmail.com>
Co-authored-by: Álvaro Somoza <asomoza@users.noreply.github.com>
Co-authored-by: Bagheera <59658056+bghira@users.noreply.github.com>
Co-authored-by: bghira <bghira@users.github.com>
Co-authored-by: Hyoungwon Cho <jhw9811@korea.ac.kr>
Co-authored-by: yiyixuxu <yixu310@gmail.com>
Co-authored-by: Tolga Cangöz <46008593+standardAI@users.noreply.github.com>
Co-authored-by: Philip Pham <phillypham@google.com>
2024-05-20 12:40:57 -04:00
Nikita
19df9f3ec0 Update pipeline_controlnet_inpaint_sd_xl.py (#7983) 2024-05-20 12:24:49 -04:00
Jacob Marks
d6ca120987 Fix typo in "attention" (#7977) 2024-05-20 11:54:29 -04:00
Sayak Paul
fb7ae0184f [tests] fix Pixart Sigma tests (#7966)
* checking tests

* checking ii.

* remove prints.

* test_pixart_1024

* fix 1024.
2024-05-19 20:56:31 +05:30
Sayak Paul
70f8d4b488 remove unsafe workflow. (#7967) 2024-05-17 13:46:24 +05:30
Álvaro Somoza
6c60e430ee Consistent SDXL Controlnet callback tensor inputs (#7958)
* make _callback_tensor_inputs consistent between sdxl pipelines

* forgot this one

* fix failing test

* fix test_components_function

* fix controlnet inpaint tests
2024-05-16 07:15:10 -10:00
Alphin Jain
1221b28eac Fix AttributeError in train_lcm_distill_lora_sdxl_wds.py (#7923)
Fix conditional teacher model check in train_lcm_distill_lora_sdxl_wds.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-05-16 15:49:54 +05:30
Liang Hou
746f603b20 Fix the text tokenizer name in logger warning of PixArt pipelines (#7912)
Fix CLIP to T5 in logger warning
2024-05-15 18:49:29 -10:00
Sai-Suraj-27
2afea72d29 refactor: Refactored code by Merging isinstance calls (#7710)
* Merged isinstance calls to make the code simpler.

* Corrected formatting errors using ruff.

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-05-15 18:33:19 -10:00
Sayak Paul
0f111ab794 [Workflows] add a workflow that can be manually triggered on a PR. (#7942)
* add a workflow that can be manually triggered on a PR.

* remove sudo

* add command

* small fixes.
2024-05-15 17:18:56 +05:30
Guillaume LEGENDRE
4dd7aaa06f move to GH hosted M1 runner (#7949) 2024-05-15 13:47:36 +05:30
Isamu Isozaki
d27e996ccd Adding VQGAN Training script (#5483)
* Init commit

* Removed einops

* Added default movq config for training

* Update explanation of prompts

* Fixed inheritance of discriminator and init_tracker

* Fixed incompatible api between muse and here

* Fixed output

* Setup init training

* Basic structure done

* Removed attention for quick tests

* Style fixes

* Fixed vae/vqgan styles

* Removed redefinition of wandb

* Fixed log_validation and tqdm

* Nothing commit

* Added commit loss to lookup_from_codebook

* Update src/diffusers/models/vq_model.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Adding perliminary README

* Fixed one typo

* Local changes

* Fixed main issues

* Merging

* Update src/diffusers/models/vq_model.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Testing+Fixed bugs in training script

* Some style fixes

* Added wandb to docs

* Fixed timm test

* get testing suite ready.

* remove return loss

* remove return_loss

* Remove diffs

* Remove diffs

* fix ruff format

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-05-15 08:47:12 +05:30
Sayak Paul
72780ff5b1 [tests] decorate StableDiffusion21PipelineSingleFileSlowTests with slow. (#7941)
decorate StableDiffusion21PipelineSingleFileSlowTests with slow.
2024-05-14 14:26:21 -10:00
Jingyang Zhang
69fdb8720f [Pipeline] Adding BoxDiff to community examples (#7947)
add boxdiff to community examples
2024-05-14 11:18:29 -10:00
Nikita
b2140a895b Fix added_cond_kwargs when using IP-Adapter in StableDiffusionXLControlNetInpaintPipeline (#7924)
Fix `added_cond_kwargs` when using IP-Adapter

Fix error when using IP-Adapter in pipeline and passing `ip_adapter_image_embeds` instead of `ip_adapter_image`

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-05-14 10:32:08 -10:00
Sayak Paul
e0e8c58f64 [Core] separate the loading utilities in modeling similar to pipelines. (#7943)
separate the loading utilities in modeling similar to pipelines.
2024-05-14 22:33:43 +05:30
Sayak Paul
cbea5d1725 update to use hf-workflows for reporting the Docker build statuses (#7938)
update to use hf-workflows for reporting
2024-05-14 09:25:13 +05:30
Tolga Cangöz
a1245c2c61 Expansion proposal of diffusers-cli env (#7403)
* Expand `diffusers-cli env`

* SafeTensors -> Safetensors

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Move `safetensors_version = "not installed"` to `else`

* Update `safetensors_version` checking

* Add GPU detection for Linux, Mac OS, and Windows

* Add accelerator detection to environment command

* Add is_peft_version to import_utils

* Update env.py

* Add `huggingface_hub` reference

* Add `transformers` reference

* Add reference for `huggingface_hub`

* Fix print statement in env.py for unusual OS

* Up

* Fix platform information in env.py

* up

* Fix import order in env.py

* ruff

* make style

* Fix platform system check in env.py

* Fix run method return type in env.py

* 🤗

* No need f-string

* Remove location info

* Remove accelerate config

* Refactor env.py to remove accelerate config

* feat: Add support for `bitsandbytes` library in environment command

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-05-14 08:20:24 +05:30
bssrdf
cdda94f412 fix VAE loading issue in train_dreambooth (#7632)
* fixed vae loading issue #7619

* rerun make style && make quality

* bring back model_has_vae and add change \ to / in config_file_name on windows os to make match work

* add missing import platform

* bring back import model_info

* make config_file_name OS independent

* switch to using Path.as_posix() to resolve OS dependence

* improve style

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: bssrdf <bssrdf@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-05-14 08:19:53 +05:30
dependabot[bot]
5b830aa356 Bump transformers from 4.36.0 to 4.38.0 in /examples/research_projects/realfill (#7635)
Bump transformers in /examples/research_projects/realfill

Bumps [transformers](https://github.com/huggingface/transformers) from 4.36.0 to 4.38.0.
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](https://github.com/huggingface/transformers/compare/v4.36.0...v4.38.0)

---
updated-dependencies:
- dependency-name: transformers
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-14 08:17:06 +05:30
Kohei
9e7bae9881 Update requirements.txt for text_to_image (#7892)
Update requirements.txt

If the datasets library is old, it will not read the metadata.jsonl and the label will default to an integer of type int.

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-05-14 08:09:12 +05:30
rebel-kblee
b41ce1e090 fix multicontrolnet save_pretrained logic for compatibility (#7821)
fix multicontrolnet save_pretrained logic for compatibility

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-05-13 09:32:06 -10:00
Sayak Paul
95d3748453 [LoRA] Fix LoRA tests (side effects of RGB ordering) part ii (#7932)
* check

* check 2.

* update slices
2024-05-13 09:23:48 -10:00
Fabio Rigano
44aa9e566d fix AnimateDiff creation with a unet loaded with IP Adapter (#7791)
* Fix loading from_pipe

* Fix style

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-05-13 08:15:01 -10:00
Álvaro Somoza
fdb05f54ef Official callbacks (#7761) 2024-05-12 17:10:29 -10:00
HelloWorldBeginner
98ba18ba55 Add Ascend NPU support for SDXL. (#7916)
Co-authored-by: mhh001 <mahonghao1@huawei.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-05-12 13:34:23 +02:00
Sayak Paul
5bb38586a9 [Core] fix offload behaviour when device_map is enabled. (#7919)
fix offload behaviour when device_map is enabled.
2024-05-12 13:29:43 +02:00
Sai-Suraj-27
ec9e88139a fix: Fixed a wrong link to supported python versions in contributing.md file (#7638)
* Fixed a wrong link to python versions in contributing.md file.

* Updated the link to a permalink, so that it will permanently point to the specific line.
2024-05-12 13:21:18 +02:00
momo
e4f8dca9a0 add custom sigmas and timesteps for StableDiffusionXLControlNet pipeline (#7913)
add custom sigmas and timesteps
2024-05-11 23:33:19 -10:00
HelloWorldBeginner
0267c5233a fix bugs when using deepspeed in sdxl (#7917)
fix bugs when using deepspeed

Co-authored-by: mhh001 <mahonghao1@huawei.com>
2024-05-11 20:49:09 +02:00
Mark Van Aken
be4afa0bb4 #7535 Update FloatTensor type hints to Tensor (#7883)
* find & replace all FloatTensors to Tensor

* apply formatting

* Update torch.FloatTensor to torch.Tensor in the remaining files

* formatting

* Fix the rest of the places where FloatTensor is used as well as in documentation

* formatting

* Update new file from FloatTensor to Tensor
2024-05-10 09:53:31 -10:00
Sayak Paul
04f4bd54ea [Core] introduce videoprocessor. (#7776)
* introduce videoprocessor.

* fix quality

* address yiyi's feedback

* fix preprocess_video call.

* video_processor -> image_processor

* fix

* fix more.

* quality

* image_processor -> video_processor

* support List[List[PIL.Image.Image]]

* change to video_processor.

* documentation

* Apply suggestions from code review

* changes

* remove print.

* refactor video processor (part # 7776) (#7861)

* update

* update remove deprecate

* Update src/diffusers/video_processor.py

* update

* Apply suggestions from code review

* deprecate list of 5d for video and list of 4d for image + apply other feedbacks

* up

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* add doc.

* tensor2vid -> postprocess_video.

* refactor preprocess with preprocess_video

* set default values.

* empty commit

* more refactoring of prepare_latents in animatediff vid2vid

* checking documentation

* remove documentation for now.

* fix animatediff sdxl

* fix test failure [part of video processor PR] (#7905)

up

* remove preceed_with_frames.

* doc

* fix

* fix

* remove video input as a single-frame video.

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-05-10 21:02:36 +02:00
Sayak Paul
82be58c512 add missing image processors to the docs (#7910)
add missing processors.
2024-05-10 14:53:57 +02:00
Sayak Paul
6695635696 upgrade to python 3.10 in the Dockerfiles (#7893)
* upgrade to python 3.10

* fix

* try https://askubuntu.com/questions/1459694/can-not-find-python3-10-after-apt-get-installation

* fix

* up

* yes

* okay

* up

* up

* up

* up

* up

* check

* okay

* up

* i[

* fix
2024-05-10 14:29:08 +02:00
YiYi Xu
b934215d4c [scheduler] support custom timesteps and sigmas (#7817)
* support custom sigmas and timesteps, dpm euler

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-05-09 11:07:43 -10:00
YiYi Xu
5ed3abd371 fix _optional_components in StableCascadeCombinedPipeline (#7894)
* fix

* up
2024-05-09 06:32:55 -10:00
Dhruv Nair
1087a510b5 Set max parallel jobs on slow test runners (#7878)
* set max parallel

* update

* update

* update
2024-05-09 19:42:18 +05:30
Sayak Paul
305f2b4498 [Tests] fix things after #7013 (#7899)
* debugging

* save the resulting image

* check if order reversing works.

* checking values.

* up

* okay

* checking

* fix

* remove print
2024-05-09 16:05:35 +02:00
Dhruv Nair
cb0f3b49cb [Refactor] Better align from_single_file logic with from_pretrained (#7496)
* refactor unet single file loading a bit.

* retrieve the unet from create_diffusers_unet_model_from_ldm

* update

* update

* updae

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* tests

* update

* update

* update

* Update docs/source/en/api/single_file.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update docs/source/en/api/single_file.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* Update docs/source/en/api/loaders/single_file.md

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/loaders/single_file.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update docs/source/en/api/loaders/single_file.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update docs/source/en/api/loaders/single_file.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update docs/source/en/api/loaders/single_file.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update docs/source/en/api/loaders/single_file.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

---------

Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-05-09 19:00:19 +05:30
Tolga Cangöz
caf9e985df Fix several imports (#7712)
Fix imports
2024-05-09 07:34:44 +02:00
Tolga Cangöz
c1c42698c9 Remove dead code and fix f-string issue (#7720)
* Remove dead code

* PylancereportGeneralTypeIssues: Strings nested within an f-string cannot use the same quote character as the f-string prior to Python 3.12.

* Remove dead code
2024-05-08 13:15:28 -10:00
Pierre Dulac
75aab34675 Allow users to save SDXL LoRA weights for only one text encoder (#7607)
SDXL LoRA weights for text encoders should be decoupled on save

The method checks if at least one of unet, text_encoder and
text_encoder_2 lora weights are passed, which was not reflected in the
implentation.

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-05-08 10:41:58 -10:00
YiYi Xu
35358a2dec fix offload test (#7868)
fix

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-05-08 07:59:08 -10:00
Aryan
818f760732 [Pipeline] AnimateDiff SDXL (#6721)
* update conversion script to handle motion adapter sdxl checkpoint

* add animatediff xl

* handle addition_embed_type

* fix output

* update

* add imports

* make fix-copies

* add decode latents

* update docstrings

* add animatediff sdxl to docs

* remove unnecessary lines

* update example

* add test

* revert conv_in conv_out kernel param

* remove unused param addition_embed_type_num_heads

* latest IPAdapter impl

* make fix-copies

* fix return

* add IPAdapterTesterMixin to tests

* fix return

* revert based on suggestion

* add freeinit

* fix test_to_dtype test

* use StableDiffusionMixin instead of different helper methods

* fix progress bar iterations

* apply suggestions from review

* hardcode flip_sin_to_cos and freq_shift

* make fix-copies

* fix ip adapter implementation

* fix last failing test

* make style

* Update docs/source/en/api/pipelines/animatediff.md

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* remove todo

* fix doc-builder errors

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-05-08 21:27:14 +05:30
Philip Pham
f29b93488d Check shape and remove deprecated APIs in scheduling_ddpm_flax.py (#7703)
`model_output.shape` may only have rank 1.

There are warnings related to use of random keys.

```
tests/schedulers/test_scheduler_flax.py: 13 warnings
  /Users/phillypham/diffusers/src/diffusers/schedulers/scheduling_ddpm_flax.py:268: FutureWarning: normal accepts a single key, but was given a key array of shape (1, 2) != (). Use jax.vmap for batching. In a future JAX version, this will be an error.
    noise = jax.random.normal(split_key, shape=model_output.shape, dtype=self.dtype)

tests/schedulers/test_scheduler_flax.py::FlaxDDPMSchedulerTest::test_betas
  /Users/phillypham/virtualenv/diffusers/lib/python3.9/site-packages/jax/_src/random.py:731: FutureWarning: uniform accepts a single key, but was given a key array of shape (1,) != (). Use jax.vmap for batching. In a future JAX version, this will be an error.
    u = uniform(key, shape, dtype, lo, hi)  # type: ignore[arg-type]
```
2024-05-08 13:57:19 +02:00
Tolga Cangöz
d50baf0c63 Fix image upcasting (#7858)
Fix image's upcasting before `vae.encode()` when using `fp16`

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-05-07 16:45:02 -10:00
Hyoungwon Cho
c2217142bd Modification on the PAG community pipeline (re) (#7876)
* edited_pag_implementation

* update

---------

Co-authored-by: yiyixuxu <yixu310@gmail.com>
2024-05-07 16:35:15 -10:00
Bagheera
8edaf3b79c 7879 - adjust documentation to use naruto dataset, since pokemon is now gated (#7880)
* 7879 - adjust documentation to use naruto dataset, since pokemon is now gated

* replace references to pokemon in docs

* more references to pokemon replaced

* Japanese translation update

---------

Co-authored-by: bghira <bghira@users.github.com>
2024-05-07 09:36:39 -07:00
Álvaro Somoza
23e091564f Fix for "no lora weight found module" with some loras (#7875)
* return layer weight if not found

* better system and test

* key example and typo
2024-05-07 13:54:57 +02:00
Steven Liu
0d23645bd1 [docs] Distilled inference (#7834)
* combine

* edits
2024-05-06 15:07:25 -07:00
Guillaume LEGENDRE
7fa3e5b0f6 Ci - change cache folder (#7867) 2024-05-06 17:55:24 +05:30
Steven Liu
49b959b540 [docs] LCM (#7829)
* lcm

* lcm lora

* fix

* fix hfoption

* edits
2024-05-03 16:08:27 -07:00
HelloWorldBeginner
58237364b1 Add Ascend NPU support for SDXL fine-tuning and fix the model saving bug when using DeepSpeed. (#7816)
* Add Ascend NPU support for SDXL fine-tuning and fix the model saving bug when using DeepSpeed.

* fix check code quality

* Decouple the NPU flash attention and make it an independent module.

* add doc and unit tests for npu flash attention.

---------

Co-authored-by: mhh001 <mahonghao1@huawei.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-05-03 08:14:34 -10:00
Dhruv Nair
3e35628873 Remove installing python again in container (#7852)
update
2024-05-03 15:09:15 +05:30
Lucain
6a479588db Respect resume_download deprecation (#7843)
* Deprecate resume_download

* align docstring with transformers

* style

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-05-03 08:42:57 +02:00
Aritra Roy Gosthipaty
fa489eaed6 [Tests] reduce the model size in the blipdiffusion fast test (#7849)
reducing model size
2024-05-03 07:46:48 +05:30
Dhruv Nair
0d7c479023 Update deps for pipe test fetcher (#7838)
update

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-05-02 20:36:47 +05:30
Guillaume LEGENDRE
ce97d7e19b Change GPU Runners (#7840)
* Move to new GPU Runners for slow tests

* Move to new GPU Runners for nightly tests
2024-05-02 18:48:46 +05:30
Guillaume LEGENDRE
44ba90caff move to new runners (#7839) 2024-05-02 14:53:38 +02:00
Dhruv Nair
3c85a57297 Update CI cache (#7832)
update

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-05-02 14:03:35 +05:30
Dhruv Nair
03ca11318e Update download diff format tests (#7831)
update

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-05-02 13:15:38 +05:30
Dhruv Nair
3ffa7b46e5 Fix hanging pipeline fetching (#7837)
update
2024-05-02 13:08:57 +05:30
yunseong Cho
c1b2a89e34 Fix key error for dictionary with randomized order in convert_ldm_unet_checkpoint (#7680)
fix key error for different order

Co-authored-by: yunseong <yunseong.cho@superlabs.us>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-05-02 10:29:55 +05:30
Aritra Roy Gosthipaty
435d37ce5a [Tests] reduce the model size in the audioldm fast test (#7833)
chore: initial size reduction of models
2024-05-02 06:03:52 +05:30
YiYi Xu
5915c2985d [ip-adapter] fix ip-adapter for StableDiffusionInstructPix2PixPipeline (#7820)
update prepare_ip_adapter_ for pix2pix
2024-05-01 06:27:43 -10:00
YiYi Xu
21a7ff12a7 update the logic of is_sequential_cpu_offload (#7788)
* up

* add comment to the tests + fix dit

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-05-01 06:25:57 -10:00
Sayak Paul
8909ab4b19 [Tests] fix: device map tests for models (#7825)
* fix: device module tests

* remove patch file

* Empty-Commit
2024-05-01 18:45:47 +05:30
Dhruv Nair
c1edb03c37 Fix for pipeline slow test fetcher (#7824)
* update

* update
2024-05-01 17:36:54 +05:30
Steven Liu
0d08370263 [docs] Community pipelines (#7819)
* community pipelines

* feedback

* consolidate
2024-04-30 14:10:14 -07:00
Tolga Cangöz
b8ccb46259 Fix CPU offload in docstring (#7827)
Fix cpu offload
2024-04-30 10:53:27 -07:00
Dhruv Nair
725ead2f5e SSH Runner Workflow Update (#7822)
* add debug workflow

* update
2024-04-30 20:14:18 +05:30
Linoy Tsaban
26a7851e1e Add B-Lora training option to the advanced dreambooth lora script (#7741)
* add blora

* add blora

* add blora

* add blora

* little changes

* little changes

* remove redundancies

* fixes

* add B LoRA to readme

* style

* inference

* defaults + path to loras+ generation

* minor changes

* style

* minor changes

* minor changes

* blora arg

* added --lora_unet_blocks

* style

* Update examples/advanced_diffusion_training/README.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* add commit hash to B-LoRA repo cloneing

* change inference, remove cloning

* change inference, remove cloning
add section about configureable unet blocks

* change inference, remove cloning
add section about configureable unet blocks

* Apply suggestions from code review

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-30 09:46:30 +05:30
Sayak Paul
3fd31eef51 [Core] introduce _no_split_modules to ModelMixin (#6396)
* introduce _no_split_modules.

* unnecessary spaces.

* remove unnecessary kwargs and style

* fix: accelerate imports.

* change to _determine_device_map

* add the blocks that have residual connections.

* add: CrossAttnUpBlock2D

* add: testin

* style

* line-spaces

* quality

* add disk offload test without safetensors.

* checking disk offloading percentages.

* change model split

* add: utility for checking multi-gpu requirement.

* model parallelism test

* splits.

* splits.

* splits

* splits.

* splits.

* splits.

* offload folder to test_disk_offload_with_safetensors

* add _no_split_modules

* fix-copies
2024-04-30 08:46:51 +05:30
Aritra Roy Gosthipaty
b02e2113ff [Tests] reduce the model size in the amused fast test (#7804)
* chore: reducing model sizes

* chore: shrinks further

* chore: shrinks further

* chore: shrinking model for img2img pipeline

* chore: reducing size of model for inpaint pipeline

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-30 08:11:26 +05:30
Aritra Roy Gosthipaty
21f023ec1a [Tests] reduce the model size in the ddpm fast test (#7797)
* chore: reducing unet size for faster tests

* review suggestions

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-30 08:11:03 +05:30
Aritra Roy Gosthipaty
31d9f9ea77 [Tests] reduce the model size in the ddim fast test (#7803)
chore: reducing model size for ddim fast pipeline

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-30 07:54:38 +05:30
Clint Adams
f53352f750 Set main_input_name in StableDiffusionSafetyChecker to "clip_input" (#7500)
FlaxStableDiffusionSafetyChecker sets main_input_name to "clip_input".
This makes StableDiffusionSafetyChecker consistent.

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-04-29 11:39:59 -10:00
RuiningLi
83ae24ce2d Added get_velocity function to EulerDiscreteScheduler. (#7733)
* Added get_velocity function to EulerDiscreteScheduler.

* Fix white space on blank lines

* Added copied from statement

* back to the original.

---------

Co-authored-by: Ruining Li <ruining@robots.ox.ac.uk>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-29 10:32:13 -10:00
jschoormans
8af793b2d4 Adding TextualInversionLoaderMixin for the controlnet_inpaint_sd_xl pipeline (#7288)
* added TextualInversionMixIn to controlnet_inpaint_sd_xl pipeline


---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-04-29 09:00:53 -10:00
Dhruv Nair
eb96ff0d59 Safetensor loading in AnimateDiff conversion scripts (#7764)
* update

* update
2024-04-29 17:36:50 +05:30
Yushu
a38dd79512 [Pipeline] Fix error of SVD pipeline when num_videos_per_prompt > 1 (#7786)
swap the order for do_classifier_free_guidance concat with repeat

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-04-29 16:24:16 +05:30
Dhruv Nair
b1c5817a89 Add debugging workflow (#7778)
add debug workflow

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-29 13:44:39 +05:30
Nilesh
235d34cf56 Check for latents, before calling prepare_latents - sdxlImg2Img (#7582)
* Check for latents, before calling prepare_latents - sdxlImg2Img

* Added latents check for all the img2img pipeline

* Fixed silly mistake while checking latents as None
2024-04-28 14:53:29 -10:00
Jenyuan-Huang
5029673987 Update InstantStyle usage in IP-Adapter documentation (#7806)
* enable control ip-adapter per-transformer block on-the-fly


---------

Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
Co-authored-by: ResearcherXman <xhs.research@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-04-28 10:34:57 -10:00
Sayak Paul
56bd7e67c2 [Scheduler] introduce sigma schedule. (#7649)
* introduce sigma schedule.

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* address yiyi

* update docstrings.

* implement the schedule for EDMDPMSolverMultistepScheduler

---------

Co-authored-by: Suraj Patil <surajp815@gmail.com>
2024-04-27 07:40:35 +05:30
39th president of the United States, probably
9d16daaf64 Add DREAM training (#6381)
A new function compute_dream_and_update_latents has been added to the
training utilities that allows you to do DREAM rectified training in line
with the paper https://arxiv.org/abs/2312.00210. The method can be used
with an extra argument in the train_text_to_image.py script.

Co-authored-by: Jimmy <39@🇺🇸.com>
2024-04-27 07:19:15 +05:30
Fabio Rigano
8e4ca1b6b2 [Docs] Update image masking and face id example (#7780)
* [Docs] Update image masking and face id example

* Update docs

* Fix docs
2024-04-26 12:51:11 -10:00
Beinsezii
0d2d424fbe Add PixArtSigmaPipeline to AutoPipeline mapping (#7783) 2024-04-26 09:10:20 -10:00
Steven Liu
e24e54fdfa [docs] Fix AutoPipeline docstring (#7779)
fix

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-26 10:09:36 -07:00
btlorch
ebc99a77aa Convert RGB to BGR for the SDXL watermark encoder (#7013)
* Convert channel order to BGR for the watermark encoder. Convert the watermarked BGR images back to RGB. Fixes #6292

* Revert channel order before stacking images to overcome limitations that negative strides are currently not supported

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-25 14:44:53 -10:00
Steven Liu
fa750a15bd [docs] Refactor image quality docs (#7758)
* refactor

* code snippets

* fix path

* fix path in guide

* code outputs

* align toctree title

* title

* fix title
2024-04-25 16:55:35 -07:00
Steven Liu
181688012a [docs] Reproducible pipelines (#7769)
* reproducibility

* feedback

* feedback

* fix path

* github link
2024-04-25 16:15:12 -07:00
Sayak Paul
142f353e1c Fix lora device test (#7738)
* fix lora device test

* fix more.

* fix more/

* quality

* empty

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-04-25 18:05:27 +05:30
Sayak Paul
b833d0fc80 [Tests] mark UNetControlNetXSModelTests::test_forward_no_control to be flaky (#7771)
decorate UNetControlNetXSModelTests::test_forward_no_control with is_flaky
2024-04-25 07:29:04 +05:30
Sayak Paul
e963621649 [PixArt] fix small nits in pixart sigma (#7767)
fix small nits in pixart sigma
2024-04-25 06:37:35 +05:30
Junsong Chen
39215aa30e PixArt-Sigma Implementation (#7654)
* support PixArt-DMD

---------

Co-authored-by: jschen <chenjunsong4@h-partners.com>
Co-authored-by: badayvedat <badayvedat@gmail.com>
Co-authored-by: Vedat Baday <54285744+badayvedat@users.noreply.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-04-23 22:33:08 -10:00
Dhruv Nair
9ef43f38d4 Fix test for consistency decoder. (#7746)
update
2024-04-24 12:28:11 +05:30
Dhruv Nair
88018fcf20 Fix failing VAE tiling test (#7747)
update
2024-04-24 12:27:45 +05:30
Steven Liu
7404f1e9dc [docs] Clean up toctree (#7715)
* toctree

* optim

* feedback

* improve overview
2024-04-23 09:30:33 -07:00
Sayak Paul
5a69227863 [Metadat utils] fix: json lines ordering. (#7744)
fix: json lines ordering.
2024-04-23 14:32:30 +05:30
Sai-Suraj-27
fc9fecc217 fix: Fixed a wrong decorator by modifying it to @classmethod (#7653)
* Fixed wrong decorator by modifying it to @classmethod.

* Updated the method and it's argument.

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-22 14:41:35 -10:00
Fabio Rigano
065f251766 Restore AttnProcessor2_0 in unload_ip_adapter (#7727)
* Restore AttnProcessor2_0 in unload_ip_adapter

* Fix style

* Update test

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-04-22 13:59:03 -10:00
Jenyuan-Huang
21c747fa0f Support InstantStyle (#7668)
* enable control ip-adapter per-transformer block on-the-fly

---------

Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
Co-authored-by: ResearcherXman <xhs.research@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-04-22 13:20:19 -10:00
Phil Butler
09129842e7 Remove redundant lines (#7396)
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-04-22 09:32:16 -10:00
Steven Liu
33b363edfa [docs] AutoPipeline (#7714)
* autopipeline

* edits

* feedback

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-22 10:15:07 -07:00
Dhruv Nair
a9dd86029e Fix Kandinksy V22 tests (#7699)
update
2024-04-22 15:41:59 +05:30
Dhruv Nair
9100652494 Update Wuerschten Test (#7700)
update
2024-04-22 15:41:39 +05:30
Abhinav Gopal
d1e3f489e9 Animatediff Controlnet Community Pipeline IP Adapter Fix (#7413)
* fixed encode_image function signature in controlnet animatediff

* copied encode_image from stable diffusion pipeline

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-04-19 15:35:07 -10:00
Guillaume LEGENDRE
ae05050db9 fix/add tailscale key in case of failure (#7719)
add tailscale key in case of failure
2024-04-19 10:56:40 +02:00
Sai-Suraj-27
db969cc16d fix: Fixed type annotations for compatability with python 3.8 (#7648)
* Fixed type annotations for compatability with python 3.8

* Add required imports.
2024-04-18 19:34:09 -10:00
Dhruv Nair
3cfe187dc7 Cleanup ControlnetXS (#7701)
* update

* update
2024-04-18 19:32:00 -10:00
Dhruv Nair
90250d9e48 Cast height, width to int inside prepare latents (#7691)
update
2024-04-18 19:30:39 -10:00
YiYi Xu
e5674015f3 adding back test_conversion_when_using_device_map (#7704)
* style


* Fix device map nits (#7705)


---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-18 19:21:32 -10:00
Fabio Rigano
b5c8b555d7 Move IP Adapter Face ID to core (#7186)
* Switch to peft and multi proj layers

* Move Face ID loading and inference to core

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-18 14:13:27 -10:00
Guillaume LEGENDRE
e23c27e905 Add tailscale action to push_test (#7709) 2024-04-18 18:48:39 +05:30
Steven Liu
7635d3d37f [docs] Pipeline loading (#7684)
* pipelines

* schedulers and models

* community pipelines

* feedback
2024-04-17 15:42:27 -07:00
Wentian
9132ce7c58 [Docs] Update TGATE in section optimization. (#7698)
Update tgate.md
2024-04-17 09:37:24 -07:00
Sayak Paul
30c977d1f5 [Workflows] remove installation of redundant modules from flax PR tests (#7662)
remove installation of redundant modules from flax PR tests
2024-04-17 15:16:04 +05:30
Dhruv Nair
f0fa17dd8e Don't install PEFT with UV in slow tests (#7697)
* update

* update
2024-04-17 15:10:38 +05:30
Sai-Suraj-27
c726d02beb fix: Updated ruff configuration to avoid deprecated configuration warning (#7637)
Updated ruff configuration to avoid depreceated config.

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-16 15:02:55 -10:00
Wentian
a68503f221 [Docs] Add TGATE in section optimization (#7639)
* Create tgate.md

* Update _toctree.yml

* Update docs/source/en/optimization/tgate.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/tgate.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/tgate.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/tgate.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/tgate.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/tgate.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/tgate.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/tgate.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/tgate.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/optimization/tgate.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update tgate.md

* Update tgate.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-04-16 17:58:27 -07:00
Sayak Paul
9d50f7eec1 [Core] is_cosxl_edit arg in SDXL ip2p. (#7650)
* is_cosxl_edit arg in SDXL ip2p.

* Empty-Commit

Co-authored-by: Yiyi Xu <yixu310@gmail.com>

* doc

* remove redundant logic.

* reflect drhuv's comments.

---------

Co-authored-by: Yiyi Xu <yixu310@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-04-16 22:15:55 +05:30
UmerHA
fda1531d8a Fixing implementation of ControlNet-XS (#6772)
* CheckIn - created DownSubBlocks

* Added extra channels, implemented subblock fwd

* Fixed connection sizes

* checkin

* Removed iter, next in forward

* Models for SD21 & SDXL run through

* Added back pipelines, cleared up connections

* Cleaned up connection creation

* added debug logs

* updated logs

* logs: added input loading

* Update umer_debug_logger.py

* log: Loading hint

* Update umer_debug_logger.py

* added logs

* Changed debug logging

* debug: added more logs

* Fixed num_norm_groups

* Debug: Logging all of SDXL input

* Update umer_debug_logger.py

* debug: updated logs

* checkim

* Readded tests

* Removed debug logs

* Fixed Slow Tests

* Added value ckecks | Updated model_cpu_offload_seq

* accelerate-offloading works ; fast tests work

* Made unet & addon explicit in controlnet

* Updated slow tests

* Added dtype/device to ControlNetXS

* Filled in test model paths

* Added image_encoder/feature_extractor to XL pipe

* Fixed fast tests

* Added comments and docstrings

* Fixed copies

* Added docs ; Updates slow tests

* Moved changes to UNetMidBlock2DCrossAttn

* tiny cleanups

* Removed stray prints

* Removed ip adapters + freeU

- Removed ip adapters + freeU as they don't make sense for ControlNet-XS
- Fixed imports of UNet components

* Fixed test_save_load_float16

* Make style, quality, fix-copies

* Changed loading/saving API for ControlNetXS

- Changed loading/saving API for ControlNetXS
- other small fixes

* Removed ControlNet-XS from research examples

* Make style, quality, fix-copies

* Small fixes

- deleted ControlNetXSModel.init_original
- added time_embedding_mix to StableDiffusionControlNetXSPipeline .from_pretrained / StableDiffusionXLControlNetXSPipeline.from_pretrained
- fixed copy hints

* checkin May 11 '23

* CheckIn Mar 12 '24

* Fixed tests for SD

* Added tests for UNetControlNetXSModel

* Fixed SDXL tests

* cleanup

* Delete Pipfile

* CheckIn Mar 20

Started replacing sub blocks  by `ControlNetXSCrossAttnDownBlock2D` and `ControlNetXSCrossAttnUplock2D`

* check-in Mar 23

* checkin 24 Mar

* Created init for UNetCnxs and CnxsAddon

* CheckIn

* Made from_modules, from_unet and no_control work

* make style,quality,fix-copies & small changes

* Fixed freezing

* Added gradient ckpt'ing; fixed tests

* Fix slow tests(+compile) ; clear naming confusion

* Don't create UNet in init ; removed class_emb

* Incorporated review feedback

- Deleted get_base_pipeline /  get_controlnet_addon for pipes
- Pipes inherit from StableDiffusionXLPipeline
- Made module dicts for cnxs-addon's down/mid/up classes
- Added support for qkv fusion and freeU

* Make style, quality, fix-copies

* Implemented review feedback

* Removed compatibility check for vae/ctrl embedding

* make style, quality, fix-copies

* Delete Pipfile

* Integrated review feedback

- Importing ControlNetConditioningEmbedding now
- get_down/mid/up_block_addon now outside class
- renamed `do_control` to `apply_control`

* Reduced size of test tensors

For this, added `norm_num_groups` as parameter everywhere

* Renamed cnxs-`Addon` to cnxs-`Adapter`

- `ControlNetXSAddon` -> `ControlNetXSAdapter`
- `ControlNetXSAddonDownBlockComponents` -> `DownBlockControlNetXSAdapter`, and similarly for mid/up
- `get_mid_block_addon` -> `get_mid_block_adapter`, and similarly for mid/up

* Fixed save_pretrained/from_pretrained bug

* Removed redundant code

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-04-16 21:56:20 +05:30
Sayak Paul
cf6e0407e0 don't install peft from the source with uv for now. (#7679) 2024-04-15 09:33:02 +05:30
Sayak Paul
1c000d46e1 fix: metadata token (#7631) 2024-04-15 08:32:27 +05:30
Sayak Paul
08bf754507 make docker-buildx mandatory. (#7652) 2024-04-13 07:26:34 +05:30
kabachuha
2f23437618 Add (Scheduled) Pseudo-Huber Loss training scripts to research projects (#7527)
* add scheduled pseudo-huber loss training scripts

See #7488

* add reduction modes to huber loss

* [DB Lora] *2 multiplier to huber loss cause of 1/2 a^2 conv.

pairing of c6495def1f

* [DB Lora] add option for smooth l1 (huber / delta)

Pairing of dd22958caa

* [DB Lora] unify huber scheduling

Pairing of 19a834c3ab

* [DB Lora] add snr huber scheduler

Pairing of 47fb1a6854

* fixup examples link

* use snr schedule by default in DB

* update all huber scripts with snr

* code quality

* huber: make style && make quality

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-13 07:26:08 +05:30
Benjamin Bossan
2523390c26 FIX Setting device for DoRA parameters (#7655)
Fix a bug that causes the the call to set_lora_device to ignore the DoRA
parameters.
2024-04-12 13:55:46 +02:00
Sai-Suraj-27
279de3c3ff fix: Replaced deprecated logger.warn with logger.warning (#7643)
Fixed deprecated logger.warn with logger.warning.
2024-04-11 09:43:01 -10:00
Yiqin Zhao
8e14535708 Fixed YAML loading. (#7579) 2024-04-11 09:08:42 -10:00
dg845
0bee4d336b LCM Distill Scripts Fix Bug when Initializing Target U-Net (#6848)
* Initialize target_unet from unet rather than teacher_unet so that we correctly add time_embedding.cond_proj if necessary.

* Use UNet2DConditionModel.from_config to initialize target_unet from unet's config.

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-11 07:52:12 -10:00
Steven Munn
42f25d601a Skip PEFT LoRA Scaling if the scale is 1.0 (#7576)
* Skip scaling if scale is identity

* move check for weight one to scale and unscale lora

* fix code style/quality

* Empty-Commit

---------

Co-authored-by: Steven Munn <stevenjmunn@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Steven Munn <5297082+stevenjlm@users.noreply.github.com>
2024-04-11 11:02:31 +05:30
Sayak Paul
33c5d125cb [Core] fix img2img pipeline for Playground (#7627)
* playground vae encoding should use std and mean of the vae.

* style.

* fix-copies.
2024-04-11 09:07:38 +05:30
YiYi Xu
aa1f00fd01 Fix cpu offload related slow tests (#7618)
* fix

* up

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-04-10 14:53:45 -10:00
Steven Liu
d95b993427 [docs] T2I (#7623)
* refactor t2i

* add code snippets
2024-04-10 17:10:41 -07:00
Steven Liu
1d480298c1 [docs] Prompt enhancer (#7565)
* prompt enhance

* edits

* align titles

* feedback

* feedback

* feedback

* link to style
2024-04-10 16:09:06 -07:00
Sayak Paul
b2323aa2b7 [Tests] reduce the model sizes in the SD fast tests (#7580)
* give it a shot.

* print.

* correct assertion.

* gather results from the rest of the tests.

* change the assertion values where needed.

* remove print statements.
2024-04-10 11:36:28 -10:00
satani99
37e9d695af Modularize instruct_pix2pix SD inferencing during and after training in examples (#7603)
* Modularize instruct_pix2pix code

* quality check

* quality check

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-10 11:19:16 +05:30
Sayak Paul
a402431de0 [docs] remove duplicate tip block. (#7625)
remove duplicate tip block.
2024-04-10 10:31:11 +05:30
IDKiro
b99b1617cf add the option of upsample function for tiny vae (#7604)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-04-10 09:27:39 +05:30
Sayak Paul
3e4a6bd2d4 [Core] add "balanced" device_map support to pipelines (#6857)
* get device <-> component mapping when using multiple gpus.

* condition the device_map bits.

* relax condition

* device_map progress.

* device_map enhancement

* some cleaning up and debugging

* Apply suggestions from code review

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* incorporate suggestions from PR.

* remove multi-gpu condition for now.

* guard check the component -> device mapping

* fix: device_memory variable

* dispatching transformers model to have force_hooks=True

* better guarding for transformers device_map

* introduce support balanced_low_memory and balanced_ultra_low_memory.

* remove device_map patch.

* fix: intermediate variable scoping.

* fix: condition in cpu offload.

* fix: flax class restrictions.

* remove modifications from cpu_offload and model_offload

* incorporate changes.

* add a simple forward pass test

* add: torch_device in get_inputs()

* add: tests

* remove print

* safe-guard to(), model offloading and cpu offloading when balanced is used as a device_map.

* style

* remove .

* safeguard device_map with more checks and remove invalid device_mapping strategues.

* make  a class attribute and adjust tests accordingly.

* fix device_map check

* fix test

* adjust comment

* fix: device_map attribute

* fix: dispatching.

* max_memory test for pipeline

* version guard the tests

* fix guard.

* address review feedback.

* reset_device_map method.

* add: test for reset_hf_device_map

* fix a couple things.

* add reset_device_map() in the error message.

* add tests for checking reset_device_map doesn't have unintended consequences.

* fix reset_device_map and offloading tests.

* create _get_final_device_map utility.

* hf_device_map -> _hf_device_map

* add documentation

* add notes suggested by Marc.

* styling.

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* move updates within gpu condition.

* other docs related things

* note on ignore a device not specified in .

* provide a suggestion if device mapping errors out.

* fix: typo.

* _hf_device_map -> hf_device_map

* Empty-Commit

* add: example hf_device_map.

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2024-04-10 08:59:05 +05:30
Sayak Paul
c827e94da0 [Workflows] remove installation of libsndfile1-dev and libgl1 from workflows (#7543)
* remove libsndfile1-dev and libgl1 from workflows and ensure that re present in the respective dockerfiles.

* change to self-hosted runner; let's see 🤞

* add libsndfile1-dev libgl1 for now

* use self-hosted runners for building and push too.
2024-04-10 08:34:56 +05:30
Sayak Paul
44f6b859bf [Core] refactor transformer_2d forward logic into meaningful conditions. (#7489)
* refactor transformer_2d forward logic into meaningful conditions.

* Empty-Commit

* fix: _operate_on_patched_inputs

* fix: _operate_on_patched_inputs

* check

* fix: patch output computation block.

* fix: _operate_on_patched_inputs.

* remove print.

* move operations to blocks.

* more readability neats.

* empty commit

* Apply suggestions from code review

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>

* Revert "Apply suggestions from code review"

This reverts commit 12178b1aa0.

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-04-10 08:33:19 +05:30
Sayak Paul
ac7ff7d4a3 add utilities for updating diffusers pipeline metadata. (#7573)
* add utilities for updating diffusers pipeline metadata.

* style

* remove first empty line
2024-04-10 08:28:49 +05:30
Fabio Rigano
a0cf607667 Multi-image masking for single IP Adapter (#7499)
* Support multiimage masking

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-04-09 09:20:57 -10:00
YiYi Xu
a341b536a8 disable test_conversion_when_using_device_map (#7620)
* disable test

* update

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-04-09 09:01:19 -10:00
Christopher Beckham
8e46d97cd8 Add missing restore() EMA call in train SDXL script (#7599)
* Restore unet params back to normal from EMA when validation call is finished

* empty commit

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-09 18:07:55 +05:30
Junjie
7e808e768a [Docs] fix bugs in callback docs (#7594) 2024-04-08 08:46:30 -10:00
w4ffl35
7e39516627 Allow more arguments to be passed to convert_from_ckpt (#7222)
Allow safety and feature extractor arguments to be passed to convert_from_ckpt

Allows management of safety checker and feature extractor
from outside of the convert ckpt class.

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-08 10:13:48 +05:30
Nguyễn Công Tú Anh
56a76082ed Add AudioLDM2 TTS (#5381)
* add audioldm2 tts

* change gpt2 max new tokens

* remove unnecessary pipeline and class

* add TTS to AudioLDM2Pipeline

* add TTS docs

* delete unnecessary file

* remove unnecessary import

* add audioldm2 slow testcase

* fix code quality

* remove AudioLDMLearnablePositionalEmbedding

* add variable check vits encoder

* add use_learned_position_embedding

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-04-08 10:11:24 +05:30
YiYi Xu
6133d98ff7 [IF| add set_begin_index for all IF pipelines (#7577)
add set_begin_index for all if pipelines
2024-04-05 06:54:07 -10:00
Sayak Paul
1c60e094de [Tests] reduce block sizes of UNet and VAE tests (#7560)
* reduce block sizes for unet1d.

* reduce blocks for unet_2d.

* reduce block size for unet_motion

* increase channels.

* correctly increase channels.

* reduce number of layers in unet2dconditionmodel tests.

* reduce block sizes for unet2dconditionmodel tests

* reduce block sizes for unet3dconditionmodel.

* fix: test_feed_forward_chunking

* fix: test_forward_with_norm_groups

* skip spatiotemporal tests on MPS.

* reduce block size in AutoencoderKL.

* reduce block sizes for vqmodel.

* further reduce block size.

* make style.

* Empty-Commit

* reduce sizes for ConsistencyDecoderVAETests

* further reduction.

* further block reductions in AutoencoderKL and AssymetricAutoencoderKL.

* massively reduce the block size in unet2dcontionmodel.

* reduce sizes for unet3d

* fix tests in unet3d.

* reduce blocks further in motion unet.

* fix: output shape

* add attention_head_dim to the test configuration.

* remove unexpected keyword arg

* up a bit.

* groups.

* up again

* fix
2024-04-05 10:08:32 +05:30
UmerHA
71f49a5d2a Skip test_freeu_enabled on MPS (#7570)
* Skip `test_freeu_enabled ` on MPS

* Small fixes

- import skip_mps correctly
- disable all instances of test_freeu_enabled

* Empty commit to trigger tests

* Empty commit to trigger CI
2024-04-04 12:16:04 +02:00
Abhinav Gopal
35db2fdea9 Update pipeline_animatediff_video2video.py (#7457)
* Update pipeline_animatediff_video2video.py

* commit with test for whether latent input can be passed into animatediffvid2vid
2024-04-03 19:34:28 +05:30
Sayak Paul
ad55ce6100 [Chore] increase number of workers for the tests. (#7558)
* increase number of workers for the tests.

* move to beefier runner.

* improve the fast push tests too.

* use a beefy machine for pytorch pipeline tests

* up the number of workers further.
2024-04-03 17:11:42 +05:30
Sayak Paul
a9a5b14f35 [Core] refactor transformers 2d into multiple init variants. (#7491)
* refactor transformers 2d into multiple legacy variants.

* fix: init.

* fix recursive init.

* add inits.

* make transformer block creation more modular.

* complete refactor.

* remove forward

* debug

* remove legacy blocks and refactor within the module itself.

* remove print

* guard caption projection

* remove fetcher.

* reduce the number of args.

* fix: norm_type

* group variables that are shared.

* remove _get_transformer_blocks

* harmonize the init function signatures.

* transformer_blocks to common

* repeat .
2024-04-03 12:56:17 +05:30
Beinsezii
aa19025989 UniPC Multistep add rescale_betas_zero_snr (#7531)
* UniPC Multistep add `rescale_betas_zero_snr`

Same patch as DPM and Euler with the patched final alpha cumprod

BF16 doesn't seem to break down, I think cause UniPC upcasts during some
phases already? We could still force an upcast since it only
loses ≈ 0.005 it/s for me but the difference in output is very small. A
better endeavor might upcasting in step() and removing all the other
upcasts elsewhere?

* UniPC ZSNR UT

* Re-add `rescale_betas_zsnr` doc oops
2024-04-02 17:23:55 -10:00
Beinsezii
19ab04ff56 UniPC Multistep fix tensor dtype/device on order=3 (#7532)
* UniPC UTs iterate solvers on FP16

It wasn't catching errs on order==3. Might be excessive?

* UniPC Multistep fix tensor dtype/device on order=3

* UniPC UTs Add v_pred to fp16 test iter

For completions sake. Probably overkill?
2024-04-02 15:41:29 -10:00
Sayak Paul
4a34307702 add: utility to format our docs too 📜 (#7314)
* add: utility to format our docs too 📜

* debugging saga

* fix: message

* checking

* should be fixed.

* revert pipeline_fixture

* remove empty line

* make style

* fix: setup.py

* style.
2024-04-02 20:49:43 +05:30
Bagheera
8e963d1c2a 7529 do not disable autocast for cuda devices (#7530)
* 7529 do not disable autocast for cuda devices

* Remove typecasting error check for non-mps platforms, as a correct autocast implementation makes it a non-issue

* add autocast fix to other training examples

* disable native_amp for dreambooth (sdxl)

* disable native_amp for pix2pix (sdxl)

* remove tests from remaining files

* disable native_amp on huggingface accelerator for every training example that uses it

* convert more usages of autocast to nullcontext, make style fixes

* make style fixes

* style.

* Empty-Commit

---------

Co-authored-by: bghira <bghira@users.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-02 20:15:06 +05:30
Sayak Paul
2b04ec2ff7 [Tests] Speed up fast pipelines part II (#7521)
* start printing the tensors.

* print full throttle

* set static slices for 7 tests.

* remove printing.

* flatten

* disable test for controlnet

* what happens when things are seeded properly?

* set the right value

* style./

* make pia test fail to check things

* print.

* fix pia.

* checking for animatediff.

* fix: animatediff.

* video synthesis

* final piece.

* style.

* print guess.

* fix: assertion for control guess.

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-04-02 13:24:56 +05:30
Sayak Paul
000fa82a1e [Chore] remove class assignments for linear and conv. (#7553)
* remove class assignments for linear and conv.

* fix: self.nn
2024-04-02 13:01:04 +05:30
Sayak Paul
5d83f50c23 [Release tests] make nightly workflow dispatchable. (#7541)
* make nightly workflow dispatchable.

* add a note about running the release tests to setup.py
2024-04-02 12:21:17 +05:30
Dhruv Nair
5d21d4a204 Fix FreeU tests (#7540)
update
2024-04-02 11:05:50 +05:30
Álvaro Somoza
73ba81090e [Community pipeline] SDXL Differential Diffusion Img2Img Pipeline (#7550)
* initial-commit pipeline created

* updated README.md
2024-04-01 18:15:30 -10:00
YiYi Xu
7956c36aaa add a from_pipe method to DiffusionPipeline (#7241)
* add from_pipe



---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-04-01 13:02:00 -10:00
haikmanukyan
5266ab7935 add HD-Painter pipeline (#7520)
* add HD-Painter pipeline

* style fixing

* refactor, change doc, fix ruff

* fix docs

* used correct ruff version

---------

Co-authored-by: Hayk Manukyan <youremail@yourdomain.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-04-01 15:10:44 +05:30
YiYi Xu
7f724a930e fix the cpu offload tests (#7544)
fix
2024-04-01 14:27:14 +05:30
Jianbing Wu
9bef9f4be7 Fix SVD bug (shape of time_context) (#7268)
* Fix SVD bug (shape of `time_context`)

* Formatting code

* Formatting src/diffusers/models/transformers/transformer_temporal.py by `make style && make quality`

---------

Co-authored-by: kevinkhwu <kevinkhwu@tencent.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-04-01 14:05:52 +05:30
Dhruv Nair
7aa4514260 Fix typo in CPU offload test (#7542)
update
2024-03-31 22:07:17 -10:00
Bingxin Ke
c2e87869be [Community pipeline] Marigold depth estimation update -- align with marigold v0.1.5 (#7524)
* add resample option; check denoise_step; update ckpt path

* Add seeding in pipeline to increase reproducibility

* fix typo

* fix typo
2024-03-30 07:09:02 -10:00
Stephen
ca61287daa Fix IP Adapter Support for SAG Pipeline (#7260)
* fix ip adapter support

* Update sag pipelines tests, adjust sag pipeline to pass tests

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-03-30 06:15:29 -10:00
Beinsezii
f0c81562a4 Add final_sigma_zero to UniPCMultistep (#7517)
* Add `final_sigma_zero` to UniPCMultistep

Effectively the same trick as DDIM's `set_alpha_to_one` and
DPM's `final_sigma_type='zero'`.
Currently False by default but maybe this should be True?

* `final_sigma_zero: bool` -> `final_sigmas_type: str`

Should 1:1 match DPM Multistep now.

* Set `final_sigmas_type='sigma_min'` in UniPC UTs
2024-03-29 22:23:45 -10:00
Hyoungwon Cho
9d20ed37a2 Perturbed-Attention Guidance (#7512)
* pag_initial

* pag_docs

* edit_docs

* custom

* typo

* delete_docs

* whitespace

* make style

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-30 10:52:51 +05:30
Linoy Tsaban
bda1d4faf8 add Instant id sdxl image2image pipeline (#7507)
* initial commit - instantid img2img

* adapting to img2img

* change add_time_ids

* change add_time_ids

* WIP changes

* add strength to timesteps

* check insightface import

* style

* check insightface import changed to warning

* check insightface import changed to warning

* style

---------

Co-authored-by: apolinário <joaopaulo.passos@gmail.com>
2024-03-30 10:25:21 +05:30
UmerHA
77103d71ca Quick-Fix for #7352 block-lora (#7523)
Fixed important typo
2024-03-30 06:42:28 +05:30
UmerHA
0302446819 Implements Blockwise lora (#7352)
* Initial commit

* Implemented block lora

- implemented block lora
- updated docs
- added tests

* Finishing up

* Reverted unrelated changes made by make style

* Fixed typo

* Fixed bug + Made text_encoder_2 scalable

* Integrated some review feedback

* Incorporated review feedback

* Fix tests

* Made every module configurable

* Adapter to new lora test structure

* Final cleanup

* Some more final fixes

- Included examples in `using_peft_for_inference.md`
- Added hint that only attns are scaled
- Removed NoneTypes
- Added test to check mismatching lens of adapter names / weights raise error

* Update using_peft_for_inference.md

* Update using_peft_for_inference.md

* Make style, quality, fix-copies

* Updated tutorial;Warning if scale/adapter mismatch

* floats are forwarded as-is; changed tutorial scale

* make style, quality, fix-copies

* Fixed typo in tutorial

* Moved some warnings into `lora_loader_utils.py`

* Moved scale/lora mismatch warnings back

* Integrated final review suggestions

* Empty commit to trigger CI

* Reverted emoty commit to trigger CI

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-29 21:15:57 +05:30
Dhruv Nair
4d39b7483d Memory clean up on all Slow Tests (#7514)
* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-29 14:23:28 +05:30
Sayak Paul
fac761694a [Tests] Speed up some fast pipeline tests (#7477)
* speed up test_vae_slicing in animatediff

* speed up test_karras_schedulers_shape for attend and excite.

* style.

* get the static slices out.

* specify torch print options.

* modify

* test run with controlnet

* specify kwarg

* fix: things

* not None

* flatten

* controlnet img2img

* complete controlet sd

* finish more

* finish more

* finish more

* finish more

* finish the final batch

* add cpu check for expected_pipe_slice.

* finish the rest

* remove print

* style

* fix ssd1b controlnet test

* checking ssd1b

* disable the test.

* make the test_ip_adapter_single controlnet test more robust

* fix: simple inpaint

* multi

* disable panorama

* enable again

* panorama is shaky so leave it for now

* remove print

* raise tolerance.
2024-03-29 14:11:38 +05:30
YiYi Xu
34c90dbb31 fix OOM for test_vae_tiling (#7510)
use float16 and add torch.no_grad()
2024-03-29 08:22:39 +05:30
Lvkesheng Shen
e49c04d5d6 Bug fix for controlnetpipeline check_image (#7103)
* Bug fix for controlnetpipeline check_image

Bug fix for controlnetpipeline check_image when using multicontrolnet and prompt list

* Update test_inference_multiple_prompt_input function

* Update test_controlnet.py

add test for multiple prompts and multiple image conditioning

* Update test_controlnet.py

Fix format error

---------

Co-authored-by: Lvkesheng Shen <45848260+Fantast416@users.noreply.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-28 08:25:18 -10:00
YiYi Xu
f238cb0736 cpu_offload: remove all hooks before offload (#7448)
* add remove_all_hooks

* a few more fix and tests

* up

* Update src/diffusers/pipelines/pipeline_utils.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* split tests

* add

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2024-03-28 08:23:02 -10:00
Bagheera
d78acdedc1 apple mps: training support for SDXL (ControlNet, LoRA, Dreambooth, T2I) (#7447)
* apple mps: training support for SDXL LoRA

* sdxl: support training lora, dreambooth, t2i, pix2pix, and controlnet on apple mps

---------

Co-authored-by: bghira <bghira@users.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-28 14:26:18 +05:30
Sayak Paul
6df103deba add: a helpful message when quality and repo consistency checks fail. (#7475) 2024-03-28 13:51:56 +05:30
Sayak Paul
73f28708be Improve nightly tests (#7385)
* flesh out the nightly tests

* address feedback.
2024-03-28 13:26:34 +05:30
Sayak Paul
0cbc78f04c [Modeling utils chore] import load_model_dict_into_meta only once (#7437)
import load_model_dict_into_meta only once
2024-03-28 13:01:53 +05:30
Thomas Liang
0cc5630945 [Chore] Fix Colab notebook links in README.md (#7495) 2024-03-27 12:36:36 -10:00
UmerHA
0b8e29289d Skip test_lora_fuse_nan on mps (#7481)
Skipping test_lora_fuse_nan on mps

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-27 14:35:59 +05:30
Sayak Paul
ab38ddf64f [chore] make the istructions on fetching all commits clearer. (#7474)
* make the istructions on fetching all commits clearer.

* Update setup.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-03-27 08:16:46 +05:30
YiYi Xu
ead82fedea fix torch.compile for multi-controlnet of sdxl inpaint (#7476)
fix

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-27 08:08:32 +05:30
Disty0
45b42d1203 Add device arg to offloading with combined pipelines (#7471)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-26 13:45:16 -10:00
Long(Tony) Lian
5199ee4f7b Fix missing raise statements in check_inputs (#7473)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-26 13:34:28 -10:00
Bagheera
544710ef0f diffusers#7426 fix stable diffusion xl inference on MPS when dtypes shift unexpectedly due to pytorch bugs (#7446)
* mps: fix XL pipeline inference at training time due to upstream pytorch bug

* Update src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* apply the safe-guarding logic elsewhere.

---------

Co-authored-by: bghira <bghira@users.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-26 20:05:49 +05:30
M. Tolga Cangöz
443aa14e41 Fix Tiling in ConsistencyDecoderVAE (#7290)
* Fix typos

* Add docstring to `decode` method in `ConsistencyDecoderVAE`

* Fix tiling

* Enable tiled VAE decoding with customizable tile sample size and overlap factor

* Revert "Enable tiled VAE decoding with customizable tile sample size and overlap factor"

This reverts commit 181049675e.

* Add VAE tiling test for `ConsistencyDecoderVAE`

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-26 17:59:08 +05:30
Sayak Paul
288632adf6 [Training utils] add kohya conversion dict. (#7435)
* add kohya conversion dict.

* update readme

* typo

* add filename
2024-03-26 17:31:22 +05:30
Ernie Chu
5ce79cbded Update train_dreambooth_lora_sd15_advanced.py (#7433)
you cannot specify `type="bool"` and `action="store_true"` at the same time.
remove excessive and buggy `type=bool`.

Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2024-03-26 12:53:02 +02:00
Marçal Comajoan Cara
d52f3e30f8 Fix broken link (#7472)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-26 10:29:08 +05:30
Sayak Paul
699dfb084c feat: support DoRA LoRA from community (#7371)
* feat: support dora loras from community

* safe-guard dora operations under peft version.

* pop use_dora when False

* make dora lora from kohya work.

* fix: kohya conversion utils.

* add a fast test for DoRA compatibility..

* add a nightly test.
2024-03-26 09:37:33 +05:30
Sayak Paul
484c8ef399 [tests] skip dynamo tests when python is 3.12. (#7458)
skip dynamo tests when python is 3.12.
2024-03-26 08:39:48 +05:30
estelleafl
0dd0528851 Small ldm3d fix (#7464)
* fixed typo

* updated doc to be consistent in naming

* make style/quality

* preprocessing for 4 channels and not 6

* make style

* test for 4c

* make style/quality

* fixed test on cpu

* fixed doc typo

* changed default ckpt to 4c

* Update pipeline_stable_diffusion_ldm3d.py

* fix bug

---------

Co-authored-by: Aflalo <estellea@isl-iam1.rr.intel.com>
Co-authored-by: Aflalo <estellea@isl-gpu33.rr.intel.com>
Co-authored-by: Aflalo <estellea@isl-gpu38.rr.intel.com>
2024-03-25 15:33:43 -10:00
UmerHA
1cd4732e7f Fixed minor error in test_lora_layers_peft.py (#7394)
* Update test_lora_layers_peft.py

* Update utils.py
2024-03-25 11:35:27 -10:00
M. Tolga Cangöz
a51b6cc86a [Docs] Fix typos (#7451)
* Fix typos

* Fix typos

* Fix typos

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-25 11:48:02 -07:00
Dhruv Nair
3bce0f3da1 Fix for str_to_bool definition in testing utils (#7461)
update
2024-03-25 13:33:09 +05:30
Dhruv Nair
9a34953823 Additional Memory clean up for slow tests (#7436)
* update

* update

* update
2024-03-25 12:19:21 +05:30
Sayak Paul
e29f16cfaa [Research Projects] ORPO diffusion for alignment (#7423)
* barebones orpo

* remove reference model.

* full implementation

* change default of beta_orpo

* add a training command.

* fix: dataloading issues.

* interpreting the formulation.

* revert styling

* add: wds full blown version

* fix: per_gpu_batch_siz

* start debuggin

* debugging

* remove print

* fix

* remove filter keys.

* turn on non-blocking calls.

* device_placement

* let's see.

* add bigger training run command

* reinitialize generator for fair repro

* add: detailed readme and requirements

---------

Co-authored-by: Sayak Paul <sayakpaul@Sayaks-MacBook-Pro-2.local>
2024-03-25 08:37:41 +05:30
M. Tolga Cangöz
f7dfcfd971 [IP-Adapter] Fix IP-Adapter Support and Refactor Callback for StableDiffusionPanoramaPipeline (#7262)
* Add properties and `IPAdapterTesterMixin` tests for `StableDiffusionPanoramaPipeline`

* Update torch manual seed to use `torch.Generator(device=device)`

* Refactor 📞🔙 to support `callback_on_step_end`

* make fix-copies
2024-03-24 16:07:02 -10:00
Sayak Paul
3c67864c5a Remove distutils (#7455)
* strtobool

* replace Command from setuptools.
2024-03-25 06:44:53 +05:30
Aryan
363699044e [refactor] Fix FreeInit behaviour (#7410)
* fix freeinit impl

* fix progress bar

* fix progress bar and remove old code

* fix num_inference_steps==1 case for freeinit by atleast running 1 step when fast sampling enabled
2024-03-22 19:20:00 +05:30
Sayak Paul
9613576191 add: space for calculating memory usagee. (#7414)
* add: space for calculating memory usahe.

* Update docs/source/en/using-diffusers/loading.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-03-22 08:43:21 +05:30
YiYi Xu
e4356d6488 add a "Community Scripts" section (#7358)
* add

* add tiling

* fix

* fix

* fix

* give community script its own readme

* Update examples/community/README_community_scripts.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update examples/community/README_community_scripts.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update examples/community/README_community_scripts.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update examples/community/README_community_scripts.md

---------

Co-authored-by: Alexis Rolland <alexis.rolland@ubisoft.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-03-21 10:05:07 -10:00
Sayak Paul
82441460ef [Docs] add missing output image (#7425)
add missing output image
2024-03-21 09:22:06 -07:00
sayakpaul
3e1097cb63 Revert "add: space within docs to calculate mememory usage."
This reverts commit 78990dd960.
2024-03-21 08:33:02 +05:30
sayakpaul
78990dd960 add: space within docs to calculate mememory usage. 2024-03-21 08:32:37 +05:30
Yuanhao Zhai
405a1facd2 fix: enable unet_3d_condition to support time_cond_proj_dim (#7364)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-21 07:46:32 +05:30
M. Tolga Cangöz
3028089e5e Fix typos (#7411)
* Fix typos

* Fix typo in SVD.md
2024-03-20 18:46:47 -07:00
Sayak Paul
b536f39818 [Custom Pipelines with Custom Components] fix multiple things (#7304)
* checking to improve pipelines.

* more fixes.

* add: tip to encourage the usage of revision

* Apply suggestions from code review

* retrigger ci

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-03-20 18:49:00 +05:30
Sayak Paul
e25e525fde [LoRA test suite] refactor the test suite and cleanse it (#7316)
* cleanse and refactor lora testing suite.

* more cleanup.

* make check_if_lora_correctly_set a utility function

* fix: typo

* retrigger ci

* style
2024-03-20 17:13:52 +05:30
Sayak Paul
de9adb907c clean dep installation step in push_tests (#7382)
* clean dep installation step in push_tests

* fix: deps
2024-03-20 07:30:43 +05:30
Sayak Paul
bf861e65dc [Chore] add: fives names to citations. (#7395)
* add: four names to citations.

* add: steven
2024-03-20 06:37:57 +05:30
Dhruv Nair
4da810b943 Remove insecure torch.load calls (#7393)
update
2024-03-19 12:41:50 -10:00
Stephen
161c6e14b6 Change path to posix (modeling_utils.py) (#6781)
* Change path to posix

* running isort

* run style and quality checks

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-19 11:50:34 -10:00
laksjdjf
a6c9015c4e Fix ControlNetModel.from_unet do not load add_embedding (#7269)
* Fix ControlNetModel.from_unet do not load add_embedding

* delete white space in blank line

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-19 09:45:08 -10:00
PJC
e6a5f99e5c Update pipeline_controlnet_sd_xl_img2img.py (#7353)
* Update pipeline_controlnet_sd_xl_img2img.py

fix: safetensors load error

* fix for pass test

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-03-19 09:29:39 -10:00
Dhruv Nair
80ff4ba63e Fix issue with prompt embeds and latents in SD Cascade Decoder with multiple image embeddings for a single prompt. (#7381)
* fix

* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-19 07:40:14 -10:00
Sayak Paul
b09a2aa308 [LoRA] fix cross_attention_kwargs problems and tighten tests (#7388)
* debugging

* let's see the numbers

* let's see the numbers

* let's see the numbers

* restrict tolerance.

* increase inference steps.

* shallow copy of cross_attentionkwargs

* remove print
2024-03-19 17:53:38 +05:30
YiYi Xu
63b6846849 [scheduler] fix a bug in add_noise (#7386)
* fix

* fix

* add a tests

* fix

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-03-19 00:50:58 -10:00
lawfordp2017
139f707e6e Correction for non-integral image resolutions with quantizations other than float32 (#7356)
* Correction for non-integral image resolutions with quantizations other than float32.

* Support for training, and use of diffusers-style casting.
2024-03-19 16:17:44 +05:30
Aryan
e4546fd5bb [docs] Add missing copied from statements in TCD Scheduler (#7360)
* add missing copied from statements in tcd scheduler

* update docstring

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-19 00:45:36 -10:00
Dhruv Nair
d44e31aec2 Add FreeInit Outputs to Docs Page (#7384)
* update

* fix
2024-03-19 14:13:41 +05:30
Sayak Paul
ce9825b56b [LoRA] pop the LoRA scale so that it doesn't get propagated to the weeds (#7338)
* pop scale from the top-level unet instead of getting it.

* improve readability.

* Apply suggestions from code review

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* fix a little bit.

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-03-19 09:12:05 +05:30
M. Tolga Cangöz
85f9d92883 Fix conditional statement in test_schedulers.py (#7323)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-19 08:28:47 +05:30
M. Tolga Cangöz
916d9812a8 Update loading of config from a file in test_config.py (#7344)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-18 11:47:36 -10:00
M. Tolga Cangöz
e6a8492242 Use PyTorch's conventional inplace functions (#7332)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-18 09:12:15 -10:00
Beinsezii
ad0308b3f1 Add Cascade to Auto T2I + Decoder mappings (#7362)
* Add Cascade to Auto T2I + Decoder mappings

* ruff autofix

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-18 08:58:20 -10:00
M. Tolga Cangöz
e97a633b63 Update access of configuration attributes (#7343)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-18 08:53:29 -10:00
Sayak Paul
01ac37b331 [LoRA] Clean Kohya conversion utils (#7374)
* clean up the kohya_conversion utility

* state dict assignment
2024-03-18 06:53:37 -10:00
M. Tolga Cangöz
6a05b274cc Fix Typos (#7325)
* Fix PyTorch's convention for inplace functions

* Fix import structure in __init__.py and update config loading logic in test_config.py

* Update configuration access

* Fix typos

* Trim trailing white spaces

* Fix typo in logger name

* Revert "Fix PyTorch's convention for inplace functions"

This reverts commit f65dc4afcb.

* Fix typo in step_index property description

* Revert "Update configuration access"

This reverts commit 8d44e870b8.

* Revert "Fix import structure in __init__.py and update config loading logic in test_config.py"

This reverts commit 2ad5e8bca2.

* Fix typos

* Fix typos

* Fix typos

* Fix a typo: tranform -> transform
2024-03-18 09:48:40 -07:00
Anatoly Belikov
98d46a3f08 delete vae and text encoders after use in SDXL training script (#6693)
delete vae and text encoders after use

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-18 20:03:53 +05:30
Dhruv Nair
4330a747d4 [Tests] Fix ControlNet Single File tests (#7315)
* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-18 11:28:59 +05:30
Sayak Paul
76de6a09fb post-release v0.27.0 (#7329)
* post-release

* quality
2024-03-18 10:52:20 +05:30
Sayak Paul
25caf24ef9 Fix release workflow deps (#7339)
* pop scale from the top-level unet instead of getting it.

* improve readability.

* fix: pypi workflow deps

* revert
2024-03-16 07:18:11 +05:30
Abubakar Abid
8db3c9bc9f Adds docs for gradio.Interface.from_pipeline() (#7346)
* gradio docs

* Update docs/source/en/api/pipelines/stable_diffusion/overview.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* changes

* changes

* changes

* Update docs/source/en/api/pipelines/stable_diffusion/overview.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-16 07:11:28 +05:30
Sayak Paul
e0e9f81971 add: torch to the pypi step. (#7328) 2024-03-15 12:28:12 +05:30
M. Tolga Cangöz
5d848ec07c [Tests] Update a deprecated parameter in test files and fix several typos (#7277)
* Add properties and `IPAdapterTesterMixin` tests for `StableDiffusionPanoramaPipeline`

* Fix variable name typo and update comments

* Update deprecated `output_type="numpy"` to "np" in test files

* Discard changes to src/diffusers/pipelines/stable_diffusion_panorama/pipeline_stable_diffusion_panorama.py

* Update test_stable_diffusion_panorama.py

* Update numbers in README.md

* Update get_guidance_scale_embedding method to use timesteps instead of w

* Update number of checkpoints in README.md

* Add type hints and fix var name

* Fix PyTorch's convention for inplace functions

* Fix a typo

* Revert "Fix PyTorch's convention for inplace functions"

This reverts commit 74350cf65b.

* Fix typos

* Indent

* Refactor get_guidance_scale_embedding method in LEditsPPPipelineStableDiffusionXL class
2024-03-14 12:17:35 -07:00
Dhruv Nair
4974b84564 Update Cascade Tests (#7324)
* update

* update

* update
2024-03-14 20:51:22 +05:30
Linoy Tsaban
83062fb872 [Advanced DreamBooth LoRA SDXL] Support EDM-style training (follow up of #7126) (#7182)
* add edm style training

* style

* finish adding edm training feature

* import fix

* fix latents mean

* minor adjustments

* add edm to readme

* style

* fix autocast and scheduler config issues when using edm

* style

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-14 18:40:14 +05:30
Suraj Patil
b6d7e31d10 add edm schedulers in doc (#7319)
* add edm schedulers in doc

* add in toctree

* address reviewe comments
2024-03-14 11:52:25 +01:00
Anatoly Belikov
53e9aacc10 log loss per image (#7278)
* log loss per image

* add commandline param for per image loss logging

* style

* debug-loss -> debug_loss

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-14 11:41:43 +05:30
Dhruv Nair
41424466e3 [Tests] Fix incorrect constant in VAE scaling test. (#7301)
update
2024-03-14 10:24:01 +05:30
Sayak Paul
95de1981c9 add: pytest log installation (#7313) 2024-03-14 10:01:16 +05:30
Kenneth Gerald Hamilton
0b45b58867 update get_order_list if statement (#7309)
* update get_order_list if statement

* revery
2024-03-13 18:29:42 -10:00
Beinsezii
d3986f18be Change step_offset scheduler docstrings (#7128)
* Change step_offset scheduler docstrings

* Mention it may be needed by some models

* More docstrings

These ones failed literal S&R because I performed it case-sensitive
which is fun.

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-13 15:12:00 -10:00
Alexander Bonnet
ee6a3a993d Fix typos in UNet2DConditionModel documentation (#7291)
* fix typo in UNet2DConditionModel documentation

* Fix indentation that may fix doc rendering

* Fix squished doc lines
2024-03-13 09:31:29 -07:00
Michael
b300517305 Add Intro page of TCD (#7259)
* add tcd intro

* resolve repos

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* revise NFEs related

* change inpainting location

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-03-13 09:21:51 -07:00
jnhuang
ac07b6dc6a Fix Wrong Text-encoder Grad Setting in Custom_Diffusion Training (#7302)
fix index in set textencoder grad

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-13 20:22:44 +05:30
Sayak Paul
46ab56a468 add: support for notifying maintainers about the nightly test status (#7117)
* add: support for notifying maintainers about the nightly test status

* add: a tempoerary workflow for validation.

* cancel in progress.

* runs-on

* clean up

* add: peft dep

* change device.

* multiple edits.

* remove temp workflow.
2024-03-13 16:48:11 +05:30
Sayak Paul
038ff70023 [PyPI publishing] feat: automate the process of pypi publication to some extent. (#7270)
* feat: automate the process of pypi publication to some extent.

* utility to fetch the latest release branch

* correct package name.
2024-03-13 16:27:59 +05:30
Manuel Brack
00eca4b887 [Pipeline] Add LEDITS++ pipelines (#6074)
* Setup LEdits++ file structure

* Fix import

* LEditsPP Stable Diffusion pipeline

* Include variable image aspect ratios

* Implement LEDITS++ for SDXL

* clean up LEditsPPPipelineStableDiffusion

* Adjust inversion output

* Added docu, more cleanup for LEditsPPPipelineStableDiffusion

* clean up LEditsPPPipelineStableDiffusionXL

* Update documentation

* Fix documentation import

* Add skeleton IF implementation

* Fix documentation typo

* Add LEDTIS docu to toctree

* Add missing title

* Finalize SD documentation

* Finalize SD-XL documentation

* Fix code style and quality

* Fix typo

* Fix return types

* added LEditsPPPipelineIF; minor changes for LEditsPPPipelineStableDiffusion and LEditsPPPipelineStableDiffusionXL

* Fix copy reference

* add documentation for IF

* Add first tests

* Fix batching for SD-XL

* Fix text encoding and perfect reconstruction for SD-XL

* Add tests for SD-XL, minor changes

* move user_mask to correct device, use cross_attention_kwargs also for inversion

* Example docstring

* Fix attention resolution for non-square images

* Refactoring for PR review

* Safely remove ledits_utils.py

* Style fixes

* Replace assertions with ValueError

* Remove LEditsPPPipelineIF

* Remove unecessary input checks

* Refactoring of CrossAttnProcessor

* Revert unecessary changes to scheduler

* Remove first progress-bar in inversion

* Refactor scheduler usage and reset

* Use imageprocessor instead of custom logic

* Fix scheduler init warning

* Fix error when running the pipeline in fp16

* Update documentation wrt perfect inversion

* Update tests

* Fix code quality and copy consistency

* Update LEditsPP import

* Remove enable/disable methods that are now in StableDiffusionMixin

* Change import in docs

* Revert import structure change

* Fix ledits imports

---------

Co-authored-by: Katharina Kornmeier <katharina.kornmeier@stud.tu-darmstadt.de>
2024-03-13 12:43:47 +02:00
Dhruv Nair
30132aba30 Update Stable Cascade Conversion Scripts (#7271)
* update

* update

* update

* update

* update

* update

* update

* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-13 12:35:44 +05:30
Dhruv Nair
a17d6d6858 Update Cascade documentation (#7257)
* updates

* update

* update

* Update docs/source/en/api/pipelines/stable_cascade.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* update

* update

* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2024-03-13 11:29:59 +05:30
Sayak Paul
8efd9ce787 [Chore] clean residue from copy-pasting in the UNet single file loader (#7295)
clean residue from copy-pasting
2024-03-13 11:20:13 +05:30
Dhruv Nair
299c16d0f5 Fix loading Img2Img refiner components in from_single_file (#7282)
* update

* update

* update

* update
2024-03-13 09:25:53 +05:30
Dhruv Nair
69f49195ac Fix passing pooled prompt embeds to Cascade Decoder and Combined Pipeline (#7287)
* update

* update

* update

* update
2024-03-13 09:21:41 +05:30
Dhruv Nair
ed224f94ba Add single file support for Stable Cascade (#7274)
* update

* update

* update

* update

* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-13 08:37:31 +05:30
Sayak Paul
531e719163 [LoRA] use the PyTorch classes wherever needed and start depcrecation cycles (#7204)
* fix PyTorch classes and start deprecsation cycles.

* remove args crafting for accommodating scale.

* remove scale check in feedforward.

* assert against nn.Linear and not CompatibleLinear.

* remove conv_cls and lineaR_cls.

* remove scale

* 👋 scale.

* fix: unet2dcondition

* fix attention.py

* fix: attention.py again

* fix: unet_2d_blocks.

* fix-copies.

* more fixes.

* fix: resnet.py

* more fixes

* fix i2vgenxl unet.

* depcrecate scale gently.

* fix-copies

* Apply suggestions from code review

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* quality

* throw warning when scale is passed to the the BasicTransformerBlock class.

* remove scale from signature.

* cross_attention_kwargs, very nice catch by Yiyi

* fix: logger.warn

* make deprecation message clearer.

* address final comments.

* maintain same depcrecation message and also add it to activations.

* address yiyi

* fix copies

* Apply suggestions from code review

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* more depcrecation

* fix-copies

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-03-13 07:56:19 +05:30
Sayak Paul
4fbd310fd2 [Chore] switch to logger.warning (#7289)
switch to logger.warning
2024-03-13 06:56:43 +05:30
Dhruv Nair
2ea28d69dc Change export_to_video default (#6990)
update

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-12 17:13:12 +05:30
erliding
a1cb106459 instruct pix2pix pipeline: remove sigma scaling when computing classifier free guidance (#7006)
remove sigma scaling when computing cfg

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-11 10:23:31 +01:00
Sayak Paul
5dd8e04d4b [Dockerfiles] add: a workflow to check if docker containers can be built in case of modifications (#7129)
* add: a workflow to check if docker containers can be built if the files are modified.

* type

* unify docker image build test and push

* make it run on prs too.

* check

* check

* check

* check again.

* remove docker test build file.

* remove extra dependencies./

* check
2024-03-11 08:54:00 +05:30
pravdomil
165af7edd3 Inline InputPadder (#6582)
inline InputPadder
2024-03-09 11:24:07 -10:00
Haofan Wang
6c5f0de713 Support latents_mean and latents_std (#7132)
* update latents_mean and latents_std

* fix typos

* Update src/diffusers/pipelines/controlnet/pipeline_controlnet_sd_xl_img2img.py

* format

---------

Co-authored-by: ResearcherXman <xhs.research@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-03-09 08:54:19 -10:00
pravdomil
e64fdcf2ce Fix gmflow_dir (#6583)
* remove sys.path

* update readme
2024-03-09 08:53:17 -10:00
Sayak Paul
ec64f371b1 [Chore] remove tf mention (#7245)
remove tf mention
2024-03-09 11:39:04 +05:30
Aryan
cd6e1f1171 [docs/nits] Fix return values based on return_dict and minor doc updates (#7105)
* fix returns and docs

* handle latent output_type correctly

* revert to old tensor2vid impl

* make fix-copies

* fix return in community animatediff pipes

* fix return docstring

* fix return docs

* add missing quote

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-03-08 18:47:24 -10:00
Xiaodong Wang
6f2b310a17 [UNet_Spatio_Temporal_Condition] fix default num_attention_heads in unet_spatio_temporal_condition (#7205)
fix default num_attention_heads in unet_spatio_temporal_condition

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-08 18:29:06 -10:00
YiYi Xu
e3cd6cae50 update the signature of from_single_file (#7216)
* update the signature of from_single_file

* Update src/diffusers/loaders/single_file.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/diffusers/loaders/single_file.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/diffusers/loaders/single_file.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/diffusers/loaders/single_file.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-03-08 17:51:51 -10:00
qqii
e5ee05da76 [Community Pipeline] Skip Marigold depth_colored with color_map=None (#7170)
[Community Pipeline] Skip Marigold depth_colored generation by passing color_map=None
2024-03-08 17:51:11 -10:00
Mengqing Cao
e6ff752840 Add npu support (#7144)
* Add npu support

* fix for code quality check

* fix for code quality check
2024-03-08 17:12:55 -10:00
UmerHA
3f9c746fb2 Adds denoising_end parameter to ControlNetPipeline for SDXL (#6175)
* Initial commit

* Removed copy hints, as in original SDXLControlNetPipeline

Removed copy hints, as in original SDXLControlNetPipeline, as the `make fix-copies` seems to have issues with the @property decorator.

* Reverted changes to ControlNetXS

* Addendum to: Removed changes to ControlNetXS

* Added test+docs for mixture of denoiser

* Update docs/source/en/using-diffusers/controlnet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/using-diffusers/controlnet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-03-08 16:42:02 -10:00
Steven Liu
1f22c98820 [docs] IP-Adapter image embedding (#7226)
* update

* fix parameter name

* feedback

* add no mask version

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-08 08:49:58 -08:00
Sayak Paul
b4226bd6a7 [Tests] fix config checking tests (#7247)
* debig

* cast tuples to lists.

* debug

* handle upcast attention

* handle downblock types for vae.

* remove print.

* address Dhruv's comments.

* fix: upblock types.

* upcast attention

* debug

* debug

* debug

* better guarding.

* style
2024-03-08 18:53:07 +05:30
Chi
46fac824be Solve missing clip_sample implementation in FlaxDDIMScheduler. (#7017)
* I added a new doc string to the class. This is more flexible to understanding other developers what are doing and where it's using.

* Update src/diffusers/models/unet_2d_blocks.py

This changes suggest by maintener.

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update src/diffusers/models/unet_2d_blocks.py

Add suggested text

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update unet_2d_blocks.py

I changed the Parameter to Args text.

* Update unet_2d_blocks.py

proper indentation set in this file.

* Update unet_2d_blocks.py

a little bit of change in the act_fun argument line.

* I run the black command to reformat style in the code

* Update unet_2d_blocks.py

similar doc-string add to have in the original diffusion repository.

* Fix bug for mention in this issue section #6901

* Update src/diffusers/schedulers/scheduling_ddim_flax.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Fix linter

* Restore empty line

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2024-03-08 12:01:59 +01:00
Martin Müller
b33b64f595 Make mid block optional for flax UNet (#7083)
* make mid block optional for flax UNet

* make style
2024-03-08 11:35:06 +01:00
Sayak Paul
9d9744075e [Easy] fix: save_model_card utility of the DreamBooth SDXL LoRA script (#7258)
* fix: save_model_card utility.

* fix a little more to make it more lenient.

* remove lower()
2024-03-08 15:22:23 +05:30
Sayak Paul
d9a3b69806 [Utils] Improve " # Copied from ..." statements in the pipelines (#6917)
* copied from for t2i pipelines without ip adapter support.

* two more pipelines with proper copied from comments.

* revert to the original implementation
2024-03-08 14:42:26 +05:30
Sayak Paul
f7e5954d5e [Tests] fix: VAE tiling tests when setting the right device (#7246)
* debug

* checking

* fix more

* remove device.

* fix-copies
2024-03-08 10:01:25 +05:30
Sayak Paul
8e19c073e5 [Core] throw error when patch inputs and layernorm are provided for Transformers2D (#7200)
* throw error when patch inputs and layernorm are provided for transformers2d.

* add comment on supported norm_types in transformers2d

* more check

* fix: norm _type handling
2024-03-08 09:41:02 +05:30
Steven Liu
f6df16cbb8 [docs] Community tips (#7137)
* tips

* feedback

* callback only
2024-03-07 15:17:26 -08:00
pravdomil
b24f78349c use self.device (#6595)
* use self.device

* use device

* fix

* fix
2024-03-07 12:46:23 -10:00
Steven Liu
3ce905c9d0 [docs] Merge LoRAs (#7213)
* merge loras

* feedback

* torch.compile

* feedback
2024-03-07 11:28:50 -08:00
bimsarapathiraja
f539497ab4 Remove the line. Using it create wrong output (#7075)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-07 10:04:31 -08:00
Dhruv Nair
39dfb7abbd Raise an error when trying to use SD Cascade Decoder with dtype bfloat16 and torch < 2.2 (#7244)
update
2024-03-07 17:55:46 +05:30
Sayak Paul
196835695e fix: support for loading playground v2.5 single file checkpoint. (#7230)
* fix: support for loading playground v2.5 single file checkpoint.

* remove is_playground_model.

* fix: edm key

* apply Dhruv's comments but errors.

* fix: things.

* delegate model_type inference to a function.

* address Dhruv's comment.

* address rest of the comments.

* fix: kwargs

* fix

* update

---------

Co-authored-by: DN6 <dhruv.nair@gmail.com>
2024-03-07 15:31:03 +05:30
Sayak Paul
0d4dfbbd0a [Examples] fix: prior preservation setting in DreamBooth LoRA SDXL script. (#7242)
fix: prior preservation setting in DreamBooth LoRA SDXL script.

Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2024-03-07 15:19:58 +05:30
Rinne
ada3bb941b fix: remove duplicated code in TemporalBasicTransformerBlock. (#7212)
fix: remove duplicate code in TemporalBasicTransformerBlock.

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-07 13:25:22 +05:30
Linoy Tsaban
b5814c5555 add DoRA training feature to sdxl dreambooth lora script (#7235)
* dora in canonical script

* add mention of DoRA to readme
2024-03-07 11:43:37 +05:30
Paakhhi
9940573618 Refactor Prompt2Prompt: Inherit from DiffusionPipeline (#7211)
refactor: inherit from DiffusionPipeline instead of StableDiffusionPipeline
2024-03-06 19:34:40 -10:00
Sayak Paul
59433ca1ae [Core] move out the utilities from pipeline_utils.py (#7234)
move out the utilities from pipeline_utils.py
2024-03-07 08:45:24 +05:30
Nate Landman
534f5d54fa Update train_dreambooth_lora_sdxl_advanced.py (#7227)
adding the type gives you

```
TypeError: _StoreTrueAction.__init__() got an unexpected keyword argument 'type'
```
2024-03-06 12:41:48 +01:00
Kashif Rasul
40aa47b998 [Pipiline] Wuerstchen v3 aka Stable Cascasde pipeline (#6487)
* initial diffNext v3

* move to v3 folder

* imports

* dry up the unets

* no switch_level

* fix init

* add switch_level tp config

* Fixed some things

* Added pooled text embeddings

* Initial work on adding image encoder

* changes from @dome272

* Stuff for the image encoder processing and variable naming in decoder

* fix arg name

* inference fixes

* inference fixes

* default TimestepBlock without conds

* c_skip=0 by default

* fix bfloat16 to cpu

* use config

* undo temp change

* fix gen_c_embeddings args

* change text encoding

* text encoding

* undo print

* undo .gitignore change

* Allow WuerstchenV3PriorPipeline to use the base DDPM & DDIM schedulers

* use WuerstchenV3Unet in both pipelines

* fix imports

* initial failing tests

* cleanup

* use scheduler.timesterps

* some fixes to the tests, still not fully working

* fix tests

* fix prior tests

* add dropout to the model_kwargs

* more tests passing

* update expected_slice

* initial rename

* rename tests

* rename class names

* make fix-copies

* initial docs

* autodocs

* typos

* fix arg docs

* add text_encoder info

* combined pipeline has optional image arg

* fix documentation

* Update src/diffusers/pipelines/stable_cascade/modeling_stable_cascade_common.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/stable_cascade/modeling_stable_cascade_common.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/stable_cascade/modeling_stable_cascade_common.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/stable_cascade/modeling_stable_cascade_common.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/stable_cascade/pipeline_stable_cascade.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/stable_cascade/modeling_stable_cascade_common.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* use self.config

* Update src/diffusers/pipelines/stable_cascade/modeling_stable_cascade_common.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* c_in -> in_channels

* removed kwargs from unet's forward

* Update src/diffusers/pipelines/stable_cascade/pipeline_stable_cascade.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/stable_cascade/pipeline_stable_cascade.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* remove older callback api

* removed kwargs and fixed decoder guidance > 1

* decoder takes emeds

* check and use image_embeds

* fixed all but one decoder test

* fix decoder tests

* update callback api

* fix some more combined tests

* push combined pipeline

* initial docs

* fix doc_string

* update combined api

* no test_callback_inputs test for combined pipeline

* add optional components

* fix ordering of components

* fix combined tests

* update convert script

* Update src/diffusers/pipelines/stable_cascade/pipeline_stable_cascade_prior.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/stable_cascade/pipeline_stable_cascade_prior.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/stable_cascade/pipeline_stable_cascade_prior.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* fix imports

* move effnet out of deniosing loop

* prompt_embeds_pooled only when doing guidance

* Fix repeat shape

* move StableCascadeUnet to models/unets/

* more descriptive names

* converted when numpy()

* StableCascadePriorPipelineOutput docs

* rename StableCascadeUNet

* add slow tests

* fix slow tests

* update

* update

* updated model_path

* add args for weights

* set push_to_hub to false

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

---------

Co-authored-by: Dominic Rampas <d6582533@gmail.com>
Co-authored-by: Pablo Pernias <pablo@pernias.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: 99991 <99991@users.noreply.github.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-03-06 15:07:25 +05:30
Jinay Jain
1bc0d37ffe [bug] Fix float/int guidance scale not working in StableVideoDiffusionPipeline (#7143)
* [bug] Fix float/int guidance scale not working in `StableVideoDiffusionPipeline`

* Add test to disable CFG on SVD

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-06 14:05:07 +05:30
bram-w
eb942b866a SDXL Turbo support and example launch (#6473)
* support and example launch for sdxl turbo

* White space fixes

* Trailing whitespace character

* ruff format

* fix guidance_scale and steps for turbo mode

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Radames Ajna <radamajna@gmail.com>
2024-03-06 11:51:01 +05:30
Michael
687bc27727 add TCD Scheduler (#7174)
* add: support TCD scheduler


---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-04 19:43:34 -10:00
iczaw
6246c70d21 [Community] PromptDiffusion Pipeline (#6752)
* Create promptdiffusioncontrolnet.py

* Update __init__.py

Added PromptDiffusionControlNetModel

* Update __init__.py

Added PromptDiffusionControlNetModel

* Update promptdiffusioncontrolnet.py

* Create pipeline_prompt_diffusion.py

Added Prompt Diffusion pipeline.

* Create convert_original_promptdiffusion_to_diffusers.py

* Update convert_from_ckpt.py

Added download_promptdiffusion_from_original_ckpt, convert_promptdiffusion_checkpoint

* Update promptdiffusioncontrolnet.py

* Update pipeline_prompt_diffusion.py

* Update README.md

* Update pipeline_prompt_diffusion.py

* Delete src/diffusers/models/promptdiffusioncontrolnet.py

* Update __init__.py

* Update __init__.py

* Delete scripts/convert_original_promptdiffusion_to_diffusers.py

* Update convert_from_ckpt.py

* Update README.md

* Delete examples/community/pipeline_prompt_diffusion.py

* Create README.md

* Create promptdiffusioncontrolnet.py

* Create convert_original_promptdiffusion_to_diffusers.py

* Create pipeline_prompt_diffusion.py

* Update README.md

* Update pipeline_prompt_diffusion.py

* Update README.md

* Update pipeline_prompt_diffusion.py

* Update convert_original_promptdiffusion_to_diffusers.py

* Update promptdiffusioncontrolnet.py

* Update README.md

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-05 09:06:02 +05:30
Sayak Paul
577b8a2783 [Core] errors should be caught as soon as possible. (#7203)
errors should be caught as soon as possible.
2024-03-05 08:57:38 +05:30
Vinh H. Pham
13f0c8b219 [Docs] Update callback.md code example (#7150)
Update callback.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-05 08:44:19 +05:30
Aryan
fa1bdce3d4 [docs] Improve SVD pipeline docs (#7087)
* update svd docs

* fix example doc string

* update return type hints/docs

* update type hints

* Fix typos in pipeline_stable_video_diffusion.py

* make style && make fix-copies

* Update src/diffusers/pipelines/stable_video_diffusion/pipeline_stable_video_diffusion.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/diffusers/pipelines/stable_video_diffusion/pipeline_stable_video_diffusion.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update based on suggestion

---------

Co-authored-by: M. Tolga Cangöz <mtcangoz@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-03-04 12:12:38 -08:00
Thiago Crepaldi
ca6cdc77a9 Enable PyTorch's FakeTensorMode for EulerDiscreteScheduler scheduler (#7151)
* Enable FakeTensorMode for EulerDiscreteScheduler scheduler

PyTorch's FakeTensorMode does not support `.numpy()` or `numpy.array()`
calls.

This PR replaces `sigmas` numpy tensor by a PyTorch tensor equivalent

Repro

```python
with torch._subclasses.FakeTensorMode() as fake_mode, ONNXTorchPatcher():
    fake_model = DiffusionPipeline.from_pretrained(model_name, low_cpu_mem_usage=False)
```

that otherwise would fail with
`RuntimeError: .numpy() is not supported for tensor subclasses.`

* Address comments
2024-03-04 09:19:59 -10:00
M. Tolga Cangöz
f4977abcd8 Fix typos (#7181)
* Fix typos

* Fix typos

* Fix typos and update documentation in lora.md
2024-03-04 10:28:23 -08:00
fpgaminer
df8559a7f9 Fix: UNet2DModel::__init__ type hints; fixes issue #4806 (#7175)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-04 09:50:34 -08:00
YiYi Xu
8f206a5873 fix a bug in from_config (#7192)
* fix

* fix

* update comment

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-03-04 07:18:59 -10:00
Linoy Tsaban
8da360aa12 [training scripts] add tags of diffusers-training (#7206)
* add tags for diffusers training

* add tags for diffusers training

* add tags for diffusers training

* add tags for diffusers training

* add tags for diffusers training

* add tags for diffusers training

* add dora tags for drambooth lora scripts

* style
2024-03-04 22:17:25 +05:30
Dhruv Nair
869bad3e52 FIx torch and cuda version in ONNX tests (#7164)
update

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-04 15:46:39 +05:30
Linoy Tsaban
01ee0978cc [advanced dreambooth lora sdxl] add DoRA training feature (#7072)
* add is_dora arg

* style

* add dora training feature to sd 1.5 script

* added notes about DoRA training

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-04 11:07:54 +01:00
Sayak Paul
56b68459f5 Update requirements.txt to remove huggingface-cli (#7202)
Internal message: https://huggingface.slack.com/archives/C03Q18WK18T/p1709529892062479
2024-03-04 11:29:01 +05:30
Vinh H. Pham
2ca264244b adding callback_on_step_end for StableDiffusionLDM3DPipeline (#7149)
* refactor callback

* run make style

* add copy

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-03 18:44:14 -10:00
Sayak Paul
b9e1c30d0e [Docs] more elaborate example for peft torch.compile (#7161)
more elaborate example for peft torch.compile
2024-03-04 08:55:30 +05:30
Sayak Paul
03cd62520f feat: add ip adapter benchmark (#6936)
* feat: add ip adapter benchmark

* sdxl support too.

* Empty-Commit
2024-03-03 15:10:55 +05:30
Álvaro Somoza
001b14023e [ip-adapter] fix problem using embeds with the plus version of ip adapters (#7189)
* initial

* check_inputs fix to the rest of pipelines

* add fix for no cfg too

* use of variable

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-03-02 21:37:00 -10:00
Junsong Chen
f55873b783 Fix PixArt 256px inference (#6789)
* feat 256px diffusers inference bug

* change the max_length of T5 to pipeline config file

* fix bug in convert_pixart_alpha_to_diffusers.py

* Update scripts/convert_pixart_alpha_to_diffusers.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* remove multi_scale_train parser

* Update src/diffusers/pipelines/pixart_alpha/pipeline_pixart_alpha.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/pipelines/pixart_alpha/pipeline_pixart_alpha.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* styling

* change `model_token_max_length` to call argument.

* Refactoring

* add: max_sequence_length to the docstring.

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-03-03 10:31:21 +05:30
Sayak Paul
ccb93dcad1 Support EDM-style training in DreamBooth LoRA SDXL script (#7126)
* add: dreambooth lora script for Playground v2.5

* fix: kwarg

* address suraj's comments.

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* apply suraj's suggestion

* incorporate changes in the canonical script./

* tracker naming

* fix: schedule determination

* add: two simple tests

* remove playground script

* note about edm-style training

* address pedro's comments.

* address part of Suraj's comments.

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* remove guidance_scale.

* use mse_loss.

* add comments for preconditioning.

* quality

* Update examples/dreambooth/train_dreambooth_lora_sdxl.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* tackle v-pred.

* Empty-Commit

* support edm for sdxl too.

* address suraj's comments.

* Empty-Commit

---------

Co-authored-by: Suraj Patil <surajp815@gmail.com>
2024-03-03 09:28:57 +05:30
YiYi Xu
ec953047bc [stalebot] fix a bug (#7156)
fix

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-03-01 09:44:00 -10:00
Oleh
9a2600ede9 Map speedup (#6745)
* Speed up dataset mapping

* Fix missing columns

* Remove cache files cleanup

* Update examples/text_to_image/train_text_to_image_sdxl.py

* make style

* Fix code style

* style

* Empty-Commit

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>
2024-03-01 21:38:21 +05:30
Sayak Paul
5f150c4cef fix: loading problem for sdxl lora dreambooth (#7166) 2024-03-01 19:30:48 +05:30
Quentin Lhoest
66f8bd6869 Fix vae_encodings_fn hash in train_text_to_image_sdxl.py (#7171)
Update train_text_to_image_sdxl.py
2024-03-01 16:56:54 +05:30
Dhruv Nair
64a8cd627a [CI] Remove max parallel flag on slow test runners (#7162)
update
2024-03-01 11:38:21 +05:30
Sayak Paul
5d3923b670 Fix LCM benchmark test (#7158)
* make workflow dispatchable.

* fix: lcm lora compile
2024-03-01 10:33:44 +05:30
Sayak Paul
9451235e5a [Urgent][Docker CI] pin uv version for now and a minor change in the Slack notification (#7155)
pin uv version for now.
2024-03-01 10:11:07 +05:30
Sayak Paul
c2b6ac4e34 [CI] fix path filtering in the documentation workflows (#7153)
fix: path
2024-03-01 07:18:49 +05:30
YiYi Xu
06b01ea87e [ip-adapter] refactor prepare_ip_adapter_image_embeds and skip load image_encoder (#7016)
* add

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-02-29 15:38:49 -10:00
M. Tolga Cangöz
f4fc75035f [Docs] Fix typos (#7131)
* Add copyright notice to relevant files and fix typos

* Set `timestep_spacing` parameter of `StableDiffusionXLPipeline`'s scheduler to `'trailing'`.

* Update `StableDiffusionXLPipeline.from_single_file` by including EulerAncestralDiscreteScheduler with `timestep_spacing="trailing"` param.

* Update model loading method in SDXL Turbo documentation
2024-02-29 13:03:01 -08:00
Dhruv Nair
8f2d13c684 Fix setting fp16 dtype in AnimateDiff convert script. (#7127)
* update

* update
2024-02-29 22:47:39 +05:30
Sayak Paul
fcfa270fbd add: support for notifying the maintainers about the docker ci status. (#7113) 2024-02-29 19:28:52 +05:30
Sayak Paul
56dac1cedc limit documentation workflow runs for relevant changes. (#7125) 2024-02-29 19:01:54 +05:30
Sayak Paul
3daebe2b44 use uv for installing stuff in the workflows. (#7116)
* use uv for installing stuff in the workflows.

* fix: from source installation command when using uv.

* fix uv venv issue

* edit editable installation.

* fix quality installation

* checking

* make editable.

* more check

* check

* add: export step

* venv handling.

* checking.

* fix: dependency workflows.

* peft tests.

* proper way to initialize env.

* Empty-Commit

* Empty-Commit
2024-02-29 08:27:24 +05:30
Aryan
abd922bd0c [docs] unet type hints (#7134)
update
2024-02-28 10:39:01 -10:00
elucida
fa633ed6de refactor: move model helper function in pipeline to a mixin class (#6571)
* move model helper function in pipeline to EfficiencyMixin

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-02-28 09:26:39 -10:00
YiYi Xu
2cad1a8465 [stalebot] don't close the issue if the stale label is removed (#7106)
* fix

* fix

* remove stalebot's ability to close issues

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-02-28 08:04:28 -10:00
Dhruv Nair
e6cf21906d [Diffusers CI] Switch slow test runners (#7123)
update
2024-02-28 22:15:04 +05:30
Sayak Paul
7db935a141 fix kwarg in the SDXL LoRA DreamBooth (#7124)
* fix kwarg

* Empty-Commit
2024-02-28 10:21:28 +05:30
Sayak Paul
fa9bc029b4 [Tests] make test steps dependent on certain things and general cleanup of the workflows (#7026)
make tests conditional and other things.
2024-02-28 08:26:46 +05:30
Beinsezii
2e31a759b5 DPMSolverMultistep add rescale_betas_zero_snr (#7097)
* DPMMultistep rescale_betas_zero_snr

* DPM upcast samples in step()

* DPM rescale_betas_zero_snr UT

* DPMSolverMulti move sample upcast after model convert

Avoids having to re-use the dtype.

* Add a newline for Ruff
2024-02-27 11:37:34 -10:00
M. Tolga Cangöz
e51862bbed [Docs] Fix typos (#7118)
Fix typos, formatting and remove trailing whitespace
2024-02-27 12:38:00 -08:00
Suraj Patil
8492db2332 add DPM scheduler with EDM formulation (#7120)
* add DPM scheduler with EDM formulation

* set sigmas in init

* add _compute_sigmas

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* address some review comments

* up,

* add tests

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2024-02-27 23:39:38 +05:30
Suraj Patil
f57e7bd92c Add EDMEulerScheduler (#7109)
* Add EDMEulerScheduler

* address review comments

* fix import

* fix test

* add tests

* add co-author

Co-authored-by:  @dg845 dgu8957@gmail.com
2024-02-27 17:51:19 +05:30
Sayak Paul
3e3d46924b [Dockerfile] remove uv from docker jax tpu (#7115)
remove uv from docker jax tpu
2024-02-27 16:25:50 +05:30
Suraj Patil
d71ecad8cd denormalize latents with the mean and std if available (#7111)
* denormalize latents with the mean and std if available

* fix denormalize

* add latent mean and std in vae config

* address sayak's comment
2024-02-27 16:20:05 +05:30
Dhruv Nair
ac49f97a75 Add tests to check configs when using single file loading (#7099)
* update

* update

* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-02-27 15:47:23 +05:30
Sayak Paul
04bafcbbc2 move to uv in the Dockerfiles. (#7094)
move to uv in the Dockerfiles.
2024-02-27 15:11:28 +05:30
Sayak Paul
7081a25618 [Examples] Multiple enhancements to the ControlNet training scripts (#7096)
* log_validation unification for controlnet.

* additional fixes.

* remove print.

* better reuse and loading

* make final inference run conditional.

* Update examples/controlnet/README_sdxl.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* resize the control image in the snippet.

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2024-02-27 09:18:46 +05:30
Sayak Paul
848f9fe6ce [Core] pass revision in the loading_kwargs. (#7019)
* pass revision in the loading_kwarhs.

* remove revision from load_sub_model.
2024-02-27 08:52:38 +05:30
Younes Belkada
8a692739c0 FIX [PEFT / Core] Copy the state dict when passing it to load_lora_weights (#7058)
* copy the state dict in load lora weights

* fixup
2024-02-27 02:42:23 +01:00
Sayak Paul
5aa31bd674 [Easy] edit issue and PR templates (#7092)
edit templates to remove patrick's name.
2024-02-27 07:10:03 +05:30
jinghuan-Chen
88aa7f6ebf Make LoRACompatibleConv padding_mode work. (#6031)
* Make LoRACompatibleConv padding_mode work.

* Format code style.

* add fast test

* Update src/diffusers/models/lora.py

Simplify the code by patrickvonplaten.

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* code refactor

* apply patrickvonplaten suggestion to simplify the code.

* rm test_lora_layers_old_backend.py and add test case in test_lora_layers_peft.py

* update test case.

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-02-26 14:05:13 -10:00
M. Tolga Cangöz
ad310af0d6 Fix EMA in train_text_to_image_sdxl.py (#7048)
* Fix typos
2024-02-26 10:39:57 -10:00
Dhruv Nair
d603ccb614 Small change to download in dance diffusion convert script (#7070)
* update

* make style
2024-02-26 12:05:19 +05:30
jiqing-feng
fd0f469568 Resize image before crop (#7095)
resize first

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-02-26 11:14:08 +05:30
Stephen
ae84e405a3 Pass use_linear_projection parameter to mid block in UNetMotionModel (#7035)
* pass linear projection parameter to mid block

* add cond_proj_dim to motion UNet

* run style and quality checks
2024-02-26 10:49:14 +05:30
Aryan
3a66113306 [Community] Bug fix + Latest IP-Adapter impl. for AnimateDiff img2vid/controlnet (#7086)
* fix img2vid; update to latest ip-adapter impl

* update README

* update animatediff controlnet to latest impl
2024-02-26 10:27:42 +05:30
Vinh H. Pham
7f16187182 Modularize Dreambooth LoRA SDXL inferencing during and after training (#6655)
* modularize log validation

* run make style

* revert import wandb

* fix code quality & import wandb

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-02-26 09:53:12 +05:30
Vinh H. Pham
f11b922b4f Modularize Dreambooth LoRA SD inferencing during and after training (#6654)
* modulize log validation

* run make style and refactor wanddb support

* remove redundant initialization

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-02-26 09:01:39 +05:30
Steven Liu
3dd4168d4c [docs] Minor updates (#7063)
* updates

* feedback
2024-02-25 09:38:02 -08:00
Fabio Rigano
1c47d1fc05 Fix head_to_batch_dim for IPAdapterAttnProcessor (#7077)
* Fix IPAdapterAttnProcessor

* Fix batch_to_head_dim and revert reshape
2024-02-25 00:05:46 -10:00
Aryan
bbf70c8739 Fix truthy-ness condition in pipelines that use denoising_start (#6912)
* fix denoising start

* fix tests

* remove debug
2024-02-24 23:39:22 -10:00
M. Tolga Cangöz
738c986957 [Refactor] StableDiffusionReferencePipeline inheriting from DiffusionPipeline (#7071)
Refactor StableDiffusionReferencePipeline to inherit from DiffusionPipeline rather than StableDiffusionPipeline
2024-02-23 22:04:31 -10:00
caiyueliang
c09bb588d3 fix: TensorRTStableDiffusionPipeline cannot set guidance_scale (#7065) 2024-02-23 14:59:02 -10:00
bimsarapathiraja
66a7160f9d Change images to image. The variable images is not used anywhere (#7074) 2024-02-23 10:40:21 -10:00
Chong-U Lim
f05ee56b2f Fix docstring of community pipeline imagic (#7062) 2024-02-23 09:37:52 -08:00
M. Tolga Cangöz
34cc7f9b98 Fix typos (#7068) 2024-02-23 09:24:51 -08:00
M. Tolga Cangöz
53605ed00a [Refactor] save_model_card function in text_to_image examples (#7051)
* Refactor save_model_card function to handle images and repo_folder parameters

* Discard changes to examples/text_to_image/train_text_to_image.py

* Discard changes to examples/text_to_image/train_text_to_image_lora_sdxl.py

* Update train_text_to_image_lora.py

* Update train_text_to_image_sdxl.py
2024-02-23 20:57:37 +05:30
Aryan
bb1b76d3bf IPAdapterTesterMixin (#6862)
* begin IPAdapterTesterMixin



---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-02-22 14:25:33 -10:00
YiYi Xu
e4b8f173b9 re-add unet refactor PR (#7044)
* add

* remove copied from

---------

Co-authored-by: ultranity <1095429904@qq.com>
Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-02-21 22:00:32 -10:00
Hezi Zisman
f0216b7756 allow explicit tokenizer & text_encoder in unload_textual_inversion (#6977)
* allow passing tokenizer & text_encoder to unload_textual_inversion


---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Fabio Rigano <fabio2rigano@gmail.com>
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2024-02-21 16:53:25 -10:00
Lincoln Stein
d5f444de4b Update checkpoint_merger pipeline to pass the "variant" argument (#6670)
* make checkpoint_merger pipeline pass the "variant" argument to from_pretrained()

* make style

---------

Co-authored-by: Lincoln Stein <lstein@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-02-21 15:45:50 -10:00
M. Tolga Cangöz
5a54dc9e95 Fix typos in text_to_image examples (#7050)
Update copyright information and fix typos in text_to_image examples
2024-02-21 16:40:45 -08:00
YiYi Xu
6fedbd850a fix doc example for fom_single_file (#7015)
* fix doc

* remove use_safetensors from signature

* more

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-02-21 19:11:21 +05:30
pravdomil
1b3cfb1b10 update header (#6596)
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-02-20 10:09:15 -08:00
Dhruv Nair
af13a90ebd Remove disable_full_determinism from StableVideoDiffusion xformers test. (#7039)
* update

* update
2024-02-20 21:43:23 +05:30
Dhruv Nair
3067da1261 Fix load_model_dict_into_meta for ControlNet from_single_file (#7034)
update
2024-02-20 11:01:02 +05:30
Dhruv Nair
6bceaea3fe Update ControlNet Inpaint single file test (#7022)
update
2024-02-20 10:58:30 +05:30
Dhruv Nair
baf9924be7 Fix alt text and image links in AnimateLCM docs (#7029)
update
2024-02-20 08:30:44 +05:30
Nontapat Kaewamporn
d8d208acde Supper IP Adapter weight loading in StableDiffusionXLControlNetInpaintPipeline (#7031)
* support ip adapter loading

* fix style
2024-02-19 09:44:35 -10:00
Vinh H. Pham
e0f33dfca4 IP-Adapter support for StableDiffusionXLControlNetInpaintPipeline (#6941)
* add ip-adapter support

* support ip image embeds

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-02-19 08:20:24 -10:00
Dhruv Nair
15b125bb0e Add section on AnimateLCM to docs (#7024)
* update

* update

* update
2024-02-19 22:20:37 +05:30
ustcuna
12004bf3a7 [Community Pipelines]Accelerate inference of stable diffusion xl (SDXL) by IPEX on CPU (#6683)
* add stable_diffusion_xl_ipex community pipeline

* make style for code quality check

* update docs as suggested

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2024-02-19 13:39:08 +01:00
Dhruv Nair
d2fc5ebb95 [Refactor] FreeInit for AnimateDiff based pipelines (#6874)
* update

* update

* update

* update

* update

* update

* update

* update

* update

* update
2024-02-19 11:11:42 +05:30
YiYi Xu
779eef95b4 [from_single_file] pass torch_dtype to set_module_tensor_to_device (#6994)
fix

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-02-19 10:09:19 +05:30
Dhruv Nair
d5b8d1ca04 Fix Pixart Slow Tests (#6962)
* update

* update
2024-02-19 09:50:18 +05:30
Fabio Rigano
eba7e7a6d7 IP-Adapter attention masking (#6847)
* Add attention masking to attn processors

* Update tensor conversion

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-02-18 18:06:14 -10:00
Sayak Paul
31de879fb4 [IP2P] Make text encoder truly optional in InstructPi2Pix (#6995)
* make text encoder component truly optional.

* more fixes

* Apply suggestions from code review

Co-authored-by: YiYi Xu <yixu310@gmail.com>

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-02-18 20:21:22 +05:30
nbpppp
07349c25fe Fix deprecation warning for torch.utils._pytree._register_pytree_node in PyTorch 2.2 (#7008)
Fixed deprecation warning for torch.utils._pytree._register_pytree_node in PyTorch 2.2

Co-authored-by: Yinghua <yzho0423@uni.sydney.edu.au>
2024-02-18 20:11:33 +05:30
YiYi Xu
8974c50bff [SVD] fix a bug when passing image as tensor (#6999)
* fix

* update docstring

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-02-17 23:10:31 -10:00
Mikhail Koltakov
c18058b405 Fixed typos in dosctrings of __init__() and in forward() of Unet3DConditionModel (#6663)
* Fixed typos in __init__ and in forward of Unet3DConditionModel

* Resolving conflicts

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-02-17 21:56:16 -10:00
Thomas Lips
2938d5a672 Add documentation for strength parameter in Controlnet_img2img pipelines (#6951)
copy docstring for `strength` from  stablediffusion img2img pipeline to controlnet img2img pipelines
2024-02-17 21:47:02 -10:00
Sayak Paul
d4ade821cd start depcrecation cycle for lora_attention_proc 👋 (#7007) 2024-02-18 13:11:30 +05:30
Steven Liu
3a7e481611 [docs] Video generation (#6701)
* first draft

* fix path

* fix path

* i2vgen-xl

* review

* modelscopet2v

* feedback
2024-02-16 16:35:37 -08:00
Steven Liu
d649d6c6f3 [docs] Fix callout (#6998)
Update ip_adapter.md
2024-02-16 10:37:12 -10:00
Bhavay Malhotra
777063e1bf Update textual_inversion.py (#6952)
* Update textual_inversion.py

* Apply suggestions from code review

* Update textual_inversion.py

* Update textual_inversion.py

* Update textual_inversion.py

* Update textual_inversion.py

* Update examples/textual_inversion/textual_inversion.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update textual_inversion.py

* styling

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-02-16 15:39:51 +05:30
Stephen
104afbce84 Standardize model card for textual inversion sdxl (#6963)
* standardize model card

* fix tags

* correct import styling and update tags

* run make style and make quality

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-02-16 14:27:11 +05:30
co63oc
c0f5346a20 Fix procecss process (#6591)
* Fix words

* Fix

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-02-15 19:06:33 -10:00
Sayak Paul
087daee2f0 add: peft to the benchmark workflow (#6989) 2024-02-16 09:29:10 +05:30
Paakhhi
7e164d98a8 Fix diffusers import prompt2prompt (#6927)
* Bugfix: correct import for diffusers

* Fix: Prompt2Prompt example

* Format style

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-02-15 15:30:16 -10:00
Sayak Paul
e6d1728e0a [IP Adapters] feat: allow low_cpu_mem_usage in ip adapter loading (#6946)
* feat: allow low_cpu_mem_usage in ip adapter loading

* reduce the number of device placements.

* documentation.

* throw low_cpu_mem_usage warning only once from the main entry point.
2024-02-15 15:37:17 +05:30
Linoy Tsaban
8f2c7b4df0 [advanced sdxl lora script] - fix #6967 bug when using prior preservation loss (#6968)
* fix bug in micro-conditioning of class images

* fix bug in micro-conditioning of class images

* style
2024-02-15 12:20:05 +05:30
YiYi Xu
2e387dad5f fix IPAdapter unload_ip_adapter test (#6972)
add

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-02-14 20:42:40 -10:00
Steven Liu
9efe1e52c3 [docs] IP-Adapter (#6897)
* use cases

* first draft

* fix image links

* lcm-lora

* feedback

* review

* feedback

* feedback
2024-02-14 13:23:37 -08:00
Sayak Paul
37b09517b9 fix: controlnet inpaint single file. (#6975) 2024-02-14 19:04:57 +05:30
Sayak Paul
4343ce2c8e [Core] Harmonize single file ckpt model loading (#6971)
* use load_model_into_meta in single file utils

* propagate to autoencoder and controlnet.

* correct class name access behaviour.

* remove torch_dtype from load_model_into_meta; seems unncessary

* remove incorrect kwarg

* style to avoid extra unnecessary line breaks
2024-02-14 10:49:06 +05:30
Younes Belkada
0ca7b68198 [PEFT / docs] Add a note about torch.compile (#6864)
* Update using_peft_for_inference.md

* add more explanation
2024-02-14 02:29:29 +01:00
Dhruv Nair
3cf4f9c735 Allow passing config_file argument to ControlNetModel when using from_single_file (#6959)
* update

* update

* update
2024-02-13 18:54:53 +05:30
Dhruv Nair
40dd9cb2bd Move SDXL T2I Adapter lora test into PEFT workflow (#6965)
update
2024-02-13 17:08:53 +05:30
Dhruv Nair
30bcda7de6 Fix flaky IP Adapter test (#6960)
update
2024-02-13 17:07:39 +05:30
YiYi Xu
9ea62d119a [DPMSolverSinglestepScheduler] correct get_order_list for solver_order=2and lower_order_final=True (#6953)
* add

* change default

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-02-12 22:10:33 -10:00
Dhruv Nair
a326d61118 Fix configuring VAE from single file mixin (#6950)
* update
2024-02-12 22:10:05 -10:00
Alex Umnov
e7696e20f9 Updated lora inference instructions (#6913)
* Updated lora inference instructions

* Update examples/dreambooth/README.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update README.md

* Update README.md

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-02-13 09:35:20 +05:30
Piyush Thakur
4b89aeffe1 [Type annotations] fixed in save_model_card (#6948)
fixed type annotations

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-02-13 08:56:45 +05:30
Steven Liu
0a1daadef8 [docs] Community pipelines (#6929)
fix
2024-02-12 10:38:13 -08:00
Sayak Paul
371f765908 [Diffusers -> Original SD conversion] fix things (#6933)
* fix: bias loading bug

* fixes for SDXL

* apply changes to the conversion script to match single_file_utils.py

* do transpose to match the single file loading logic.
2024-02-12 17:30:22 +05:30
Piyush Thakur
75aee39eac [Model Card] standardize T2I Adapter Sdxl model card (#6947)
standardize model card template t21-adapter-sdxl
2024-02-12 16:43:20 +05:30
Dhruv Nair
215e6804d3 Unpin torch versions in CI (#6945)
* update

* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-02-12 16:01:05 +05:30
Disty0
9254d1f39a Pass device to enable_model_cpu_offload in maybe_free_model_hooks (#6937) 2024-02-12 13:42:32 +05:30
Piyush Thakur
e1bdcc7af3 [Model Card] standardize T2I Sdxl Lora model card (#6944)
* standardize model card template t2i-lora-sdxl

* type annotations
2024-02-12 11:45:40 +05:30
Dhruv Nair
84905ca728 Update PixArt Alpha test module to match src module (#6943)
update
2024-02-12 11:01:33 +05:30
Piyush Thakur
6f336650c3 [Model Card] standardize T2I Sdxl model card (#6942)
standardize model card template t2i-sdxl
2024-02-12 10:01:20 +05:30
Piyush Thakur
06a042cd0e [Model Card] standardize T2I Lora model card (#6940)
standardize model card t2i-lora
2024-02-12 10:01:13 +05:30
Piyush Thakur
8772496586 [Model Card] standardize T2I model card (#6939)
* standardize model card

* fix base_model
2024-02-12 10:00:41 +05:30
dg845
35fd84be27 Replace hardcoded values in SchedulerCommonTest with properties (#5479)
---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-02-10 21:34:20 -10:00
YiYi Xu
f2756253e6 Fix a bug in AutoPipeline.from_pipe when switching pipeline with optional components (#6820)
* fix

* add tests

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-02-09 21:56:52 -10:00
YiYi Xu
0071478d9e allow attention processors to have different signatures (#6915)
add

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-02-09 21:56:19 -10:00
Sayak Paul
7c8cab313e post release 0.26.2 (#6885)
* post release

* style

* Empty-Commit

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2024-02-09 07:36:38 -10:00
Sayak Paul
ca9ed5e8d1 [LoRA] deprecate certain lora methods from the old backend. (#6889)
* deprecate certain lora methods from the old backend.

* uncomment necessary things.

* safe remove old lora backend 👋
2024-02-09 17:14:32 +01:00
Bingxin Ke
98b6bee1a1 [Community Pipeline][Bug Fix] marigold_depth_estimation: input image value range (#6787)
[FIX] IMPORTANT: rgb normalization
2024-02-09 16:55:37 +01:00
Dhruv Nair
ab7113487c More IPAdapter test fixes (#6888)
update
2024-02-09 13:54:58 +05:30
camaro
59c307f1d5 Standardize model card for Controlnet (#6910)
* controlnet

* style

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-02-09 12:29:08 +05:30
Sayak Paul
159885adc6 correct hub_token exposition behaviour (thanks to @bghira). (#6918) 2024-02-08 18:38:27 -10:00
Aryan
7337eea59b [refactor] unnecessary lines in decode_latents in video pipelines (#6682)
* refactor decode latents in video pipelines

* make fix-copies
2024-02-08 17:10:52 -10:00
camaro
f07899a57c Standardize model card for Controlnet SDXL (#6908)
controlnet-sdxl
2024-02-09 07:53:39 +05:30
camaro
a83cc0c0bc Standardize model card for Controlnet flax (#6909)
* controlnet-flax

* style

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-02-09 07:52:56 +05:30
C Q
db5194a45d Fix Compatibility Issues in stable_diffusion_xl_reference.py (#6251)
* Fix Compatibility Issues in stable_diffusion_xl_reference.py

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-02-08 10:52:59 -10:00
shaoxiaowang
e6c9c2513f fix examples/community/pipeline_stable_diffusion_xl_instantid.py (#6759)
Co-authored-by: wangshaoxiao <wangshaoxiao@xiaomi.com>
2024-02-08 10:09:40 -10:00
Kyunghwan Kim
d643b6691f Add vae tiling and slicing in img2img and inpaint (#6871)
* Add vae tiling in img2img and inpaint

* Add vae tiling not slicing
2024-02-08 09:50:00 -10:00
Michael
f5c9be3a0a Remove <cat-toy> validation prompt from examples/textual_inversion/textual_inversion_sdxl.py (#6877)
Remove <cat-toy> validation prompt from textual_inversion_sdxl.py

The `<cat-toy>` validation prompt is a default choice for the example task in the README. But no other part of `textual_inversion_sdxl.py` references the cat toy and `textual_inversion.py` has a default validation prompt of `None` as well.

So bring `textual_inversion_sdxl.py` in line with `textual_inversion.py` and change default validation prompt to `None`
2024-02-08 09:46:06 -10:00
Laisky.Cai
1824d0050e fix: load_image should support PIL Image (#6904)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-02-08 08:25:37 -10:00
Sayak Paul
30e5e81d58 change to 2024 in the license (#6902)
change to 2024
2024-02-08 08:19:31 -10:00
Masamune Ishihara
8de78001df Add fps argument to export_to_gif function. (#6786) 2024-02-08 21:59:51 +05:30
Patryk Bartkowiak
3ac2357794 changed positional parameters to named parameters like in docs (#6905)
Co-authored-by: Patryk Bartkowiak <patryk.bartkowiak@tcl.com>
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2024-02-08 21:39:03 +05:30
Ehsan Akhgari
17808a091e Fix bug when converting checkpoint to diffusers format (#6900)
This fixes #6899.
2024-02-08 18:52:11 +05:30
Sayak Paul
491a933a1b [I2VGenXL] attention_head_dim in the UNet (#6872)
* attention_head_dim

* debug

* print more info

* correct num_attention_heads behaviour

* down_block_num_attention_heads -> num_attention_heads.

* correct the image link in doc.

* add: deprecation for num_attention_head

* fix: test argument to use attention_head_dim

* more fixes.

* quality

* address comments.

* remove depcrecation.
2024-02-08 12:30:14 +05:30
Sayak Paul
aa82df52e7 [IP Adapters] introduce ip_adapter_image_embeds in the SD pipeline call (#6868)
* add: support for passing ip adapter image embeddings

* debugging

* make feature_extractor unloading conditioned on safety_checker

* better condition

* type annotation

* index to look into value slices

* more debugging

* debugging

* serialize embeddings dict

* better conditioning

* remove unnecessary prints.

* Update src/diffusers/loaders/ip_adapter.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* make fix-copies and styling.

* styling and further copy fixing.

* fix: check_inputs call in controlnet sdxl img2img pipeline

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-02-08 11:10:10 +05:30
Srimanth Agastyaraju
a11b0f83b7 Fix: training resume from fp16 for SDXL Consistency Distillation (#6840)
* Fix: training resume from fp16 for lcm distill lora sdxl

* Fix coding quality - run linter

* Fix 1 - shift mixed precision cast before optimizer

* Fix 2 - State dict errors by removing load_lora_into_unet

* Update train_lcm_distill_lora_sdxl.py - Revert default cache dir to None

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-02-08 11:09:29 +05:30
Sayak Paul
1835510524 Remove torch_dtype in to() to end deprecation (#6886)
* remove torch_dtype from to()

* remove torch_dtype from usage scripts.

* remove old lora backend

* Revert "remove old lora backend"

This reverts commit adcddf6ba4.
2024-02-08 09:38:57 +05:30
camaro
4a3d52850b fix: keyword argument mismatch (#6895) 2024-02-08 09:37:56 +05:30
YiYi Xu
97d004b9b4 [ip-adapter] make sure length of scale is same as number of ip-adapters when using set_ip_adapter_scale (#6884)
add

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-02-07 10:13:12 -10:00
Sayak Paul
76696dca55 [Model Card] standardize dreambooth model card (#6729)
* feat: standarize model card creation for dreambooth training.

* correct 'inference

* remove comments.

* take component out of kwargs

* style

* add: card template to have a leaner description.

* widget support.

* propagate changes to train_dreambooth_lora

* propagate changes to custom diffusion

* make widget properly type-annotated
2024-02-07 15:07:11 +05:30
Félix Sanz
17612de451 fix: typo in callback function name and property (#6834)
* fix: callback function name is incorrect

On this tutorial there is a function defined and then used inside `callback_on_step_end` argument, but the name was not correct (mismatch)

* fix: typo in num_timestep (correct is num_timesteps)

fixed property name
2024-02-06 12:05:40 -08:00
Dhruv Nair
994360f7a5 Fix last IP Adapter test (#6875)
update
2024-02-06 08:53:40 -10:00
Dhruv Nair
e6a48db633 Refactor Deepfloyd IF tests. (#6855)
* update

* update

* update
2024-02-06 16:43:17 +05:30
sayakpaul
4f1df69d1a Revert "add attention_head_dim"
This reverts commit 15f6b22466.
2024-02-06 14:48:49 +05:30
sayakpaul
15f6b22466 add attention_head_dim 2024-02-06 14:48:07 +05:30
Sayak Paul
e6fd9ada3a [I2vGenXL] clean up things (#6845)
* remove _to_tensor

* remove _to_tensor definition

* remove _collapse_frames_into_batch

* remove lora for not bloating the code.

* remove sample_size.

* simplify code a bit more

* ensure timesteps are always in tensor.
2024-02-06 09:22:07 +05:30
Edward Li
493228a708 Fix AutoencoderTiny with use_slicing (#6850)
* Fix `AutoencoderTiny` with `use_slicing`

When using slicing with AutoencoderTiny, the encoder mistakenly encodes the entire batch for every image in the batch.

* Fixed formatting issue
2024-02-05 09:18:22 -10:00
Dhruv Nair
8bf046b7fb Add single file and IP Adapter support to PIA Pipeline (#6851)
update
2024-02-05 16:23:18 +05:30
Dhruv Nair
bb99623d09 Update IP Adapter tests to use cosine similarity distance (#6806)
* update

* update
2024-02-05 16:22:59 +05:30
Dhruv Nair
fdf55b1f1c Fix posix path issue in testing utils (#6849)
update
2024-02-05 08:57:18 +05:30
小咩Goat
c6f8c310c3 Fix forward pass in UNetMotionModel when gradient checkpoint is enabled (#6744)
fix #6742

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-02-05 08:04:01 +05:30
YiYi Xu
64909f17b7 update IP-adapter code in UNetMotionModel (#6828)
fix

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-02-05 07:26:46 +05:30
Dhruv Nair
f09ca909c8 Multiple small fixes to Video Pipeline docs (#6805)
* update

* update

* update

* Update src/diffusers/pipelines/i2vgen_xl/pipeline_i2vgen_xl.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* update

* update

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-02-05 07:24:38 +05:30
YiYi Xu
a5fc62f819 add self.use_ada_layer_norm_* params back to BasicTransformerBlock (#6841)
fix sd reference community ppeline

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-02-04 11:16:44 -10:00
Linoy Tsaban
fbdf26bac5 [dreambooth lora sdxl] add sdxl micro conditioning (#6795)
* add micro conditioning

* remove redundant lines

* style

* fix missing 's'

* fix missing shape bug due to missing RGB if statement

* remove redundant if, change arg order

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-02-04 16:00:09 +02:00
Fabio Rigano
13001ee315 Bugfix in IPAdapterFaceID (#6835) 2024-02-03 08:56:55 -10:00
Linoy Tsaban
65329aed98 [advanced dreambooth lora sdxl script] new features + bug fixes (#6691)
* add noise_offset param

* micro conditioning - wip

* image processing adjusted and moved to support micro conditioning

* change time ids to be computed inside train loop

* change time ids to be computed inside train loop

* change time ids to be computed inside train loop

* time ids shape fix

* move token replacement of validation prompt to the same section of instance prompt and class prompt

* add offset noise to sd15 advanced script

* fix token loading during validation

* fix token loading during validation in sdxl script

* a little clean

* style

* a little clean

* style

* sdxl script - a little clean + minor path fix

sd 1.5 script - change default resolution value

* ad 1.5 script - minor path fix

* fix missing comma in code example in model card

* clean up commented lines

* style

* remove time ids computed outside training loop - no longer used now that we utilize micro-conditioning, as all time ids are now computed inside the training loop

* style

* [WIP] - added draft readme, building off of examples/dreambooth/README.md

* readme

* readme

* readme

* readme

* readme

* readme

* readme

* readme

* removed --crops_coords_top_left from CLI args

* style

* fix missing shape bug due to missing RGB if statement

* add blog mention at the start of the reamde as well

* Update examples/advanced_diffusion_training/README.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* change note to render nicely as well

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-02-03 17:33:43 +02:00
Stephen
02338c9317 Change path to posix (testing_utils.py) (#6803)
change path to pathlib as_posix

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-02-03 12:44:13 +05:30
Younes Belkada
15ed53d272 Fixes LoRA SDXL training script with DDP + PEFT (#6816)
Update train_dreambooth_lora_sdxl.py
2024-02-03 09:46:32 +05:30
UmerHA
9cc59ba089 [Contributor Experience] Fix test collection on MPS (#6808)
* Update testing_utils.py

* Update testing_utils.py
2024-02-02 20:59:00 +05:30
YiYi Xu
adcbe674a4 [refactor]Scheduler.set_begin_index (#6728) 2024-02-01 09:51:02 -10:00
Sayak Paul
ec9840a5db [Refactor] harmonize the module structure for models in tests (#6738)
* harmonize the module structure for models in tests

* make the folders modules.

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-02-01 14:23:39 +05:30
YiYi Xu
093a03a1a1 add is_torchvision_available (#6800)
* add

* remove transformer

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-01-31 20:01:44 -10:00
Patrick von Platen
c3369f5673 fix torchvision import (#6796) 2024-01-31 12:13:10 -10:00
Sayak Paul
04cd6adf8c [Feat] add I2VGenXL for image-to-video generation (#6665)
---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-01-31 10:38:51 -10:00
YiYi Xu
66722dbea7 [sdxl k-diffusion pipeline]move sigma to device (#6757)
move sigma to device

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-31 09:29:15 -10:00
YiYi Xu
2e8d18e699 [IP-Adapter] Support multiple IP-Adapters (#6573)
---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Alvaro Somoza <somoza.alvaro@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2024-01-31 07:11:15 -10:00
Steven Liu
03373de0db [docs] Add missing parameter (#6775)
add missing param
2024-01-31 08:53:40 -08:00
Dhruv Nair
56bea6b4a1 Add PIA Model/Pipeline (#6698)
* update

* update

* updaet

* add tests and docs

* clean up

* add to toctree

* fix copies

* pr review feedback

* fix copies

* fix tests

* update docs

* update

* update

* update docs

* update

* update

* update

* update
2024-01-31 18:00:17 +02:00
Dhruv Nair
d7dc0ffd79 Fix setting scaling factor in VAE config (#6779)
fix
2024-01-31 19:47:22 +05:30
Kashif Rasul
97ee616971 add ipo, hinge and cpo loss to dpo trainer (#6788)
add ipo and hinge loss to dpo trainer
2024-01-31 16:41:31 +05:30
Sayak Paul
0fc62d1702 [Kandinsky tests] add is_flaky to test_model_cpu_offload_forward_pass (#6762)
* add is_flaky to test_model_cpu_offload_forward_pass

* style

* update

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-01-31 14:51:12 +05:30
Dhruv Nair
f4d3f913f4 Pin torch < 2.2.0 in test runners (#6780)
* update

* update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-31 13:41:18 +05:30
Viet Nguyen
1cab64b3be Update train_diffusion_dpo.py (#6754)
* Update train_diffusion_dpo.py

Address #6702

* Update train_diffusion_dpo_sdxl.py

* Empty-Commit

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-31 12:46:23 +05:30
Sayak Paul
8d7dc85312 add note about serialization (#6764) 2024-01-31 12:45:40 +05:30
dg845
87a92f779c Fix bug in ResnetBlock2D.forward where LoRA Scale gets Overwritten (#6736)
Fix bug in ResnetBlock2D.forward when not USE_PEFT_BACKEND and using scale_shift for time emb where the lora scale  gets overwritten.

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-30 14:43:48 -10:00
Yunxuan Xiao
0db766ba77 [DDPMScheduler] Load alpha_cumprod to device to avoid redundant data movement. (#6704)
* load cumprod tensor to device

Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>

* fixing ci

Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>

* make fix-copies

Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>

---------

Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>
2024-01-30 13:19:37 -10:00
Dhruv Nair
8e94663503 Update export to video to support new tensor_to_vid function in video pipelines (#6715)
update
2024-01-30 19:43:33 +05:30
YiYi Xu
b09b90e24c udpate ip-adapter slow tests (#6760)
* udpate slices

* up

* hopefully last one

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-01-29 17:55:41 -10:00
Sajad Norouzi
058b47553e Fix mixed precision fine-tuning for text-to-image-lora-sdxl example. (#6751)
* Fix mixed precision fine-tuning for text-to-image-lora-sdxl example.

* fix text_encoder_two bug.

---------

Co-authored-by: Sajad Norouzi <sajadn@dev-dsk-sajadn-2a-87239470.us-west-2.amazon.com>
2024-01-30 06:55:02 +05:30
xhedit
7f58a76f48 Update lora.md with a more accurate description of rank (#6724)
* Update lora.md with a more accurate description of rank

* Update docs/source/en/training/lora.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-01-29 09:41:51 -08:00
Sayak Paul
09b7bfce91 [Core] move transformer scripts to transformers modules (#6747)
* move transformer scripts to transformers modules

* move transformer model test

* move prior transformer test to  directory

* fix doc path

* correct doc path

* add: __init__.py
2024-01-29 22:28:28 +05:30
Fabio Rigano
5d8b1987ec Add unload_textual_inversion method (#6656)
* Add unload_textual_inversion

* Fix dicts in tokenizer

* Fix quality

* Apply suggestions from code review

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Fix variable name after last update

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-01-29 06:26:22 -10:00
Sayak Paul
acd1962769 correct hflip arg (#6743) 2024-01-28 17:42:34 +05:30
Stephen
5b1b80a5b6 Change os.path to pathlib Path (#6737)
Change os.path to pathlib
2024-01-28 10:24:13 +05:30
gzguevara
8581d9bce4 changed to posix unet (#6719)
changed to posix

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-27 16:31:52 +05:30
Yingtian Liu
c101066227 Correct SNR weighted loss in v-prediction case by only adding 1 to SNR on the denominator (#6307)
* fix minsnr implementation for v-prediction case

* format code

* always compute snr when snr_gamma is specified

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-27 09:18:09 +05:30
Sayak Paul
d4c7ab7bf1 [Hub] feat: explicitly tag to diffusers when using push_to_hub (#6678)
* feat: explicitly tag to diffusers when using push_to_hub

* remove tags.

* reset repo.

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix: tests

* fix: push_to_hub behaviour for tagging from save_pretrained

* Apply suggestions from code review

Co-authored-by: Lucain <lucainp@gmail.com>

* Apply suggestions from code review

Co-authored-by: Lucain <lucainp@gmail.com>

* import fixes.

* add library name to existing model card.

* add: standalone test for generate_model_card

* fix tests for standalone method

* moved library_name to a better place.

* merge create_model_card and generate_model_card.

* fix test

* address lucain's comments

* fix return identation

* Apply suggestions from code review

Co-authored-by: Lucain <lucainp@gmail.com>

* address further comments.

* Update src/diffusers/pipelines/pipeline_utils.py

Co-authored-by: Lucain <lucainp@gmail.com>

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Lucain <lucainp@gmail.com>
2024-01-26 23:01:48 +05:30
dg845
ea9dc3fa90 Add UFOGenScheduler to Community Examples (#6650)
* Add UFOGenScheduler with diffusers imports changed from relative to absolute.

* make style

* Add community README entry for UFOGenScheduler.
2024-01-26 15:11:14 +02:00
dg845
b4220e97b1 Add Community Example Consistency Training Script (#6717)
* initial commit for unconditional/class-conditional consistency training script

* make style

* Add entry for consistency training script in community README.

* Move consistency training script from community to research_projects/consistency_training

* Add requirements.txt and README to research_projects/consistency_training directory.

* Manually revert community README changes for consistency training.

* Fix path to script after moving script to research projects.

* Add option to load U-Net weights from pretrained model.

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2024-01-26 15:10:57 +02:00
Dhruv Nair
dc85b578c2 Move tests for SD inference variant pipelines into their own modules (#6707)
* update

* update

* update
2024-01-26 14:09:41 +02:00
Dhruv Nair
0d927c7542 Add IP Adapters to slow tests (#6714)
update
2024-01-25 19:52:50 -10:00
Andrew Ishutin
5b93338235 fix custom diffusion training with concept list (#6710)
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-26 11:17:51 +08:00
Aryan V S
7c1c705f60 fix community README (#6645) 2024-01-25 17:52:52 -08:00
Aryan V S
9e72016468 [docs] AnimateDiff Video-to-Video (#6712)
* add animatediff vid2vid to docs

* Update docs/source/en/api/pipelines/animatediff.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* apply suggestions from review

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-01-25 17:51:43 -08:00
Patrick von Platen
3e9716f22b Correct sigmas cpu settings (#6708) 2024-01-26 09:32:24 +08:00
Steven Liu
87bfbc320d [docs] UViT2D (#6643)
* uvit2d

* fix

* fix?

* add correct paper

* fix paths

* update abstract
2024-01-25 09:37:28 -08:00
Aryan V S
a517f665a4 AnimateDiff Video to Video (#6328)
* begin animatediff img2video and video2video

* revert animatediff to original implementation

* add img2video as pipeline

* update

* add vid2vid pipeline

* update imports

* update

* remove copied from line for check_inputs

* update

* update examples

* add multi-batch support

* fix __init__.py files

* move img2vid to community

* update community readme and examples

* fix

* make fix-copies

* add vid2vid batch params

* apply suggestions from review

Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>

* add test for animatediff vid2vid

* torch.stack -> torch.cat

Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>

* make style

* docs for vid2vid

* update

* fix prepare_latents

* fix docs

* remove img2vid

* update README to :main

* remove slow test

* refactor pipeline output

* update docs

* update docs

* merge community readme from :main

* final fix i promise

* add support for url in animatediff example

* update example

* update callbacks to latest implementation

* Update src/diffusers/pipelines/animatediff/pipeline_animatediff_video2video.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/diffusers/pipelines/animatediff/pipeline_animatediff_video2video.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix merge

* Apply suggestions from code review

* remove callback and callback_steps as suggested in review

* Update tests/pipelines/animatediff/test_animatediff_video2video.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix import error caused due to unet refactor in #6630

* fix numpy import error after tensor2vid refactor in #6626

* make fix-copies

* fix numpy error

* fix progress bar test

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2024-01-24 18:22:26 +05:30
Brandon Strong
16748d1eba SD 1.5 Support For Advanced Lora Training (train_dreambooth_lora_sdxl_advanced.py) (#6449)
* sd1.5 support in separate script

A quick adaptation to support people interested in using this method on 1.5 models.

* sd15 prompt text encoding and unet conversions

as per @linoytsaban 's recommendations. Testing would be appreciated,

* Readability and quality improvements

Removed some mentions of SDXL, and some arguments that don't apply to sd 1.5, and cleaned up some comments.

* make style/quality commands

* tracker rename and run-it doc

* Update examples/advanced_diffusion_training/train_dreambooth_lora_sd15_advanced.py

* Update examples/advanced_diffusion_training/train_dreambooth_lora_sd15_advanced.py

---------

Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
2024-01-24 11:20:08 +02:00
Haofan Wang
c9081a8abd [Fix bugs] pipeline_controlnet_sd_xl.py (#6653)
* Update pipeline_controlnet_sd_xl.py

* Update pipeline_controlnet_xs_sd_xl.py
2024-01-24 09:18:12 +05:30
Yasuna
0eb68d9ddb [Docs] update: tutorials ja | AUTOPIPELINE.md (#6629)
* add en file

* translate 1-118 lines

* add text

* add toctree

* fix

* fix typo

* fix link
2024-01-23 09:19:55 -08:00
Haofan Wang
9941b3f124 Add InstantID Pipeline (#6673)
* add instantid pipeline

* format

* Update README.md

* Update README.md

* format

---------

Co-authored-by: ResearcherXman <xhs.research@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-23 17:25:23 +05:30
Ayush Mangal
16b9f98b48 [WIP][Community Pipeline] InstaFlow! One-Step Stable Diffusion with Rectified Flow (#6057)
* Add instaflow community pipeline

* Make styling fixes

* Add lora

* Fix formatting

* Add docs

* Update README.md

* Update README.md

* Remove do LORA

* Update readme

* Update README.md

* Update README.md

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2024-01-23 13:52:50 +02:00
Dhruv Nair
fee93c81eb [Refactor] Update from single file (#6428)
* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update'

* update

* update

* update

* update

* update

* update

* up

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* up

* update

* update

* update

* update

* update'

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* clean

* update

* update

* clean up

* clean up

* update

* clean

* clean

* update

* updaet

* clean up

* fix docs

* update

* update

* Revert "update"

This reverts commit dbfb8f1ea9.

* update

* update

* update

* update

* fix controlnet

* fix scheduler

* fix controlnet tests
2024-01-23 14:42:03 +05:30
Sayak Paul
5308cce994 [Tests] Test for passing local config file to from_single_file() (#6638)
make config file local too.
2024-01-23 14:21:23 +05:30
YiYi Xu
318556b20e fix dpm related slow test failure (#6680)
fix

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-01-22 18:52:05 -10:00
Dhruv Nair
6620eda357 Standardise outputs for video pipelines (#6626)
* update

* update

* update

* update

* update

* update

* update

* clean up

* clean up
2024-01-23 10:07:07 +05:30
Sayak Paul
1f0705adcf [Big refactor] move unets to unets module 🦋 (#6630)
* move unets to  module 🦋

* parameterize unet-level import.

* fix flax unet2dcondition model import

* models __init__

* mildly depcrecating models.unet_2d_blocks in favor of models.unets.unet_2d_blocks.

* noqa

* correct depcrecation behaviour

* inherit from the actual classes.

* Empty-Commit

* backwards compatibility for unet_2d.py

* backward compatibility for unet_2d_condition

* bc for unet_1d

* bc for unet_1d_blocks
2024-01-23 08:57:58 +05:30
M. Tolga Cangöz
5e96333cb2 Update README (#6669)
Update number of checkpoints and repositories in README
2024-01-22 08:08:07 -08:00
Sayak Paul
da95a28ff6 [Diffusion DPO] apply fixes from #6547 (#6668)
apply fixes from #6547
2024-01-22 20:14:54 +05:30
Dhruv Nair
d66d554dc2 Add tearDown method to LoRA tests. (#6660)
* update

* update
2024-01-22 14:00:37 +05:30
Junsong Chen
c7df846dec add Sa-Solver (#5975)
* add Sa-Solver



---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: scxue <xueshuchen17@mails.ucas.edu.cn>
Co-authored-by: jschen <chenjunsong4@h-partners.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-01-21 21:37:44 -10:00
Vinh H. Pham
8e7bbfbe5a add padding_mask_crop to all inpaint pipelines (#6360)
* add padding_mask_crop
---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-01-21 19:42:22 -10:00
YiYi Xu
e2773c6255 fix SDXL-kdiffusion tests (#6647)
🤞🤞🤞

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-01-19 17:37:29 -10:00
YiYi Xu
ac61eefc9f fix DPM Scheduler with use_karras_sigmas option (#6477)
* fix

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2024-01-19 07:08:22 -10:00
HelloWorldBeginner
f95615b823 Fixed the bug related to saving DeepSpeed models. (#6628)
* Fixed the bug related to saving DeepSpeed models.

* Add information about training SD models using DeepSpeed to the README.

* Apply suggestions from code review

---------

Co-authored-by: mhh001 <mahonghao1@huawei.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-19 19:21:57 +05:30
SangKim
a9288b49c9 Modularize InstructPix2Pix SDXL inferencing during and after training in examples (#6569) 2024-01-19 15:47:34 +05:30
elucida
c54419658b refactor: extract init/forward function in UNet2DConditionModel (#6478)
* - extract function for stage in UNet2DConditionModel init & forward
- Add new function get_mid_block() to unet_2d_blocks.py

* add type hint to get_mid_block aligned with get_up_block and get_down_block; rename _set_xxx function

* add type hint and  use keyword arguments

* remove `copy from` in versatile diffusion
2024-01-19 12:13:34 +02:00
Aryan V S
6382663dc8 [Community] Experimental AnimateDiff Image to Video (open to improvements) (#6509)
* add animatediff img2vid

* fix

* Update examples/community/README.md

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix code snippet between ip adapter face id and animatediff img2vid

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2024-01-19 12:05:41 +02:00
spezialspezial
58b8dce129 Update convert_from_ckpt.py / read checkpoint config yaml contents (#6633)
Update convert_from_ckpt.py / read config yaml contents

Added missing reading of config yaml file contents
2024-01-19 08:25:03 +05:30
Dhruv Nair
a65ca8a059 Fix failing tests due to Posix Path (#6627)
update
2024-01-18 19:19:57 +05:30
Steven Liu
5ca062e011 [docs] Fix missing API function (#6604)
fix?
2024-01-17 13:59:09 -08:00
Linoy Tsaban
619e3ab6f6 [bug fix] advanced dreambooth lora sdxl - fixes bugs described in #6486 (#6599)
* fixes bugs:
1. redundant retraction
2. param clone
3. stopping optimization of text encoder params

* param upscaling

* style
2024-01-17 20:11:45 +05:30
Patrick von Platen
9e2804f720 Update pr_test_peft_backend.yml to use 1 process for testing (#6613) 2024-01-17 19:25:30 +05:30
Aryan V S
9112028ed8 FreeInit (#6315)
* freeinit

* update freeinit implementation based on review

Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>

* fix

* another fix

* refactor

* fix timesteps missing bug

* apply suggestions from review

Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>

* add test for freeinit

* apply suggestions from review

Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>

* refactor

* fix test

* fix tensor not on same device

* update

* remove return_intermediate_results

* fix broken freeinit test

* update animatediff docs

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
2024-01-17 17:17:07 +05:30
Steve Rhoades
dce06680d2 Fixes torch.compile() compatible training (#6589)
resolve conflicts
2024-01-17 07:47:03 +05:30
Bhavay Malhotra
dd63168319 Update installation.md (#6438)
* Update installation.md

* Update installation.md

* Update installation.md
2024-01-16 13:41:38 -08:00
Celestial Phineas
1040dfd9cc [Fix] Multiple image conditionings in a single batch for StableDiffusionControlNetPipeline (#6334)
* [Fix] Multiple image conditionings in a single batch for `StableDiffusionControlNetPipeline`.

* Refactor `check_inputs` in `StableDiffusionControlNetPipeline` to avoid redundant codes.

* Make the behavior of MultiControlNetModel to be the same to the original ControlNetModel

* Keep the code change minimum for nested list support

* Add fast test `test_inference_nested_image_input`

* Remove redundant check for nested image condition in `check_inputs`

Remove `len(image) == len(prompt)` check out of `check_image()`

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Better `ValueError` message for incompatible nested image list size

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Fix syntax error in `check_inputs`

* Remove warning message for multi-ControlNets with multiple prompts

* Fix a typo in test_controlnet.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Add test case for multiple prompts, single image conditioning in `StableDiffusionMultiControlNetPipelineFastTests`

* Improved `ValueError` message for nested `controlnet_conditioning_scale`

* Documenting the behavior of image list as `StableDiffusionControlNetPipeline` input

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-01-16 10:40:55 -10:00
Sayak Paul
49a4b377c1 remove omegaconf from the residues 👋 (#6600)
remove omegaconf 👋
2024-01-16 21:55:41 +05:30
JuanCarlosPi
dff35a86e4 Change in ip-adapter docs. CLIPVisionModelWithProjection should be im… (#6597)
Change in ip-adapter docs. CLIPVisionModelWithProjection should be imported from transformers, not diffusers
2024-01-16 08:18:13 -08:00
Yondon Fu
8842bcadb9 [SVD] Return np.ndarray when output_type="np" (#6507)
[SVD] Fix output_type="np"
2024-01-16 14:51:36 +05:30
Steve Rhoades
181280baba Fixes training resuming: Advanced Dreambooth LoRa Training (#6566)
* Fixes #6418 Advanced Dreambooth LoRa Training

* change order of import to fix nit

* fix nit, use cast_training_params

* remove torch.compile fix, will move to a new PR

* remove unnecessary import
2024-01-16 14:30:49 +05:30
Charchit Sharma
53f498d2a4 Use of Posix to better support Windows compatibility in testing_utils (#6587)
* changes in utils

* removed loc
2024-01-16 10:27:29 +02:00
Charchit Sharma
990860911f change to posix for better Windows support for lora loaders (#6590)
* posix lora

* changes and style fix
2024-01-16 13:46:29 +05:30
Fabio Rigano
23eed39702 Fix path generation in IP Adapter (#6564)
* Fix path generation on Windows

* Update set_default_attn_processors

* Use pathlib

* Fix quality

* Fix copy

* Revert changes in set_default_attn_processors

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-16 11:25:22 +05:30
YiYi Xu
fefed44543 update slow test for SDXL k-diffusion pipeline (#6588)
update expected slice
2024-01-15 18:54:33 -10:00
Dong
814f56d2fe 🐛 fix ip-adapter controlnet img2img missing code (#6528)
* 🐛 fix ip-adapter controlnet img2img missing code

* 📝 edit test

* 📝 edit test

* 📝 run make style and quality

* 🎨 remove slow tests
2024-01-15 16:41:09 -10:00
SangKim
96d6e16550 Enable image resizing to adjust its height and width in StableDiffusionXLInstructPix2PixPipeline (#6581)
* Enable image resizing to adjust its height and width in StableDiffusionXLInstructPix2PixPipeline

* Ensure that validation is performed at every 'validation_step', not at every step
2024-01-16 07:50:34 +05:30
Aryan V S
c11de13588 [training] fix training resuming problem for fp16 (SD LoRA DreamBooth) (#6554)
* fix training resume

* update

* update
2024-01-16 07:27:06 +05:30
Patrick von Platen
357855f8fc [Docs] Fix controlnet indent (#6578) 2024-01-15 18:12:55 +02:00
Fabio Rigano
f825221b5d [Community Pipeline] IPAdapter FaceID (#6276)
* Add support for IPAdapter FaceID

* Add docs

* Move subfolder to kwargs

* Fix quality

* Fix image encoder loading

* Fix loading + add test

* Move to community folder

* Fix style

* Revert constant update

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-15 16:43:54 +02:00
Aryan V S
119d734f6e [AnimateDiff+Controlnet] Fix multicontrolnet support (#6551)
* fix multicontrolnet support

* update README with multicontrolnet example
2024-01-15 16:36:54 +02:00
Sayak Paul
cb4b3f0b78 [OmegaConf] replace it with yaml (#6488)
* remove omegaconf from convert_from_ckpt.

* remove from single_file.

* change to string based ubscription.

* style

* okay

* fix: vae_param

* no . indexing.

* style

* style

* turn getattrs into explicit if/else

* style

* propagate changes to ldm_uncond.

* propagate to gligen

* propagate to if.

* fix: quotes.

* propagate to audioldm.

* propagate to audioldm2

* propagate to musicldm.

* propagate to vq_diffusion

* propagate to zero123.

* remove omegaconf from diffusers codebase.
2024-01-15 20:02:10 +05:30
Haofan Wang
3d574b3bbe Fix a bug of flip in SDXL training script (#6547)
* Update train_text_to_image_sdxl.py

* Update train_text_to_image_lora_sdxl.py
2024-01-15 16:28:04 +02:00
Charchit Sharma
09903774d9 Make T2I Adapter SDXL Training Script torch.compile compatible (#6577)
update for t2i_adapter
2024-01-15 19:42:56 +05:30
dependabot[bot]
d6a70d8ba8 Bump jinja2 from 3.1.2 to 3.1.3 in /examples/research_projects/realfill (#6539)
Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.2 to 3.1.3.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/jinja/compare/3.1.2...3.1.3)

---
updated-dependencies:
- dependency-name: jinja2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-15 16:10:10 +02:00
Charchit Sharma
e3103e171f Make InstructPix2Pix SDXL Training Script torch.compile compatible (#6576)
* changes for pix2pix_sdxl

* style fix
2024-01-15 17:54:03 +05:30
Charchit Sharma
b053053ac9 Make InstructPix2Pix Training Script torch.compile compatible (#6558)
* added torch.compile for pix2pix

* required changes
2024-01-15 17:03:22 +05:30
Vinh H. Pham
08702fc1cb Make text-to-image SDXL LoRA Training Script torch.compile compatible (#6556)
make compile compatible
2024-01-15 16:58:16 +05:30
Vinh H. Pham
7ce89e979c Make text-to-image SD LoRA Training Script torch.compile compatible (#6555)
make compile compatible
2024-01-15 16:55:08 +05:30
gzguevara
05faf3263b SDXL text-to-image torch compatible (#6550)
* torch compatible

* code quality fix

* ruff style

* ruff format
2024-01-15 16:49:11 +05:30
Sayak Paul
a080f0d3a2 [Training Utils] create a utility for casting the lora params during training. (#6553)
create a utility for casting the lora params during training.
2024-01-15 13:51:13 +05:30
Sayak Paul
79df50388d [Training] fix training resuming problem when using FP16 (SDXL LoRA DreamBooth) (#6514)
* fix: training resume from fp16.

* add: comment

* remove residue from another branch.

* remove more residues.

* thanks to Younes; no hacks.

* style.

* clean things a bit and modularize _set_state_dict_into_text_encoder

* add comment about the fix detailed.
2024-01-12 17:11:06 +05:30
Vinh H. Pham
7d631825b0 Make Dreambooth SD Training Script torch.compile compatible (#6532)
* support compile

* make style

* move unwrap_model inside function

* change unwrap call

* run make style

* Update examples/dreambooth/train_dreambooth.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Revert "Update examples/dreambooth/train_dreambooth.py"

This reverts commit 70ab09732e.

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-12 12:50:15 +05:30
gzguevara
33d2b5b087 SD text-to-image torch compile compatible (#6519)
* added unwrapper

* fiz typo
2024-01-12 09:28:35 +05:30
Suvaditya Mukherjee
f486d34b04 Make ControlNet SD Training Script torch.compile compatible (#6525)
* update: make controlnet script torch compile compatible

Signed-off-by: Suvaditya Mukherjee <suvadityamuk@gmail.com>

* update: correct earlier mistakes for compilation

Signed-off-by: Suvaditya Mukherjee <suvadityamuk@gmail.com>

* update: fix code style issues

Signed-off-by: Suvaditya Mukherjee <suvadityamuk@gmail.com>

---------

Signed-off-by: Suvaditya Mukherjee <suvadityamuk@gmail.com>
2024-01-12 09:27:26 +05:30
Charchit Sharma
e44b205e0b Make ControlNet SDXL Training Script torch.compile compatible (#6526)
* make torch.compile compatible

* fix quality
2024-01-12 09:25:09 +05:30
Vinh H. Pham
60cb44323d Make Dreambooth SD LoRA Training Script torch.compile compatible (#6534)
support compile
2024-01-12 09:24:03 +05:30
Radamés Ajna
1dd0ac9401 [DPO Training] pass tracker name as argument (#6542)
pass tracker name as argumentw
2024-01-12 09:15:39 +05:30
Yassine El Boudouri
c6b04589b6 Remove conversion to RGB (#6479)
* Remove conversion to RGB

* Add a Conversion Function

* Add type hint for convert_method

* Update src/diffusers/utils/loading_utils.py

Update docstring

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update docstring

* Optimize imports

* Optimize imports (2)

* Reformat code

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-12 07:20:24 +05:30
Sayak Paul
5dc3471380 [SVD] support generators that are created on GPU (#6484)
* debug generator

* fix?

* fix?

* fix

* remove print.

* revert none check
2024-01-11 20:08:18 +05:30
Aryan V S
9df566e6da [Community] StyleAligned Pipeline (#6489)
* add stylealigned sdxl pipeline

* bugfix

* update docs

* remove einops dependency

* update README

* update example docstring
2024-01-11 14:35:55 +01:00
Sayak Paul
be0b425762 [Training] make checkpointing compatible when using torch.compile (part II) (#6511)
make checkpointing compatible when using torch.compile.
2024-01-11 18:37:30 +05:30
jquintanilla4
da843b3d53 .load_ip_adapter in StableDiffusionXLAdapterPipeline (#6246)
* Added testing notebook and .load_ip_adapter to XLAdapterPipeline

* Added annotations

* deleted testing notebook

* Update src/diffusers/pipelines/t2i_adapter/pipeline_stable_diffusion_xl_adapter.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

* code clean up

* Add feature_extractor and image_encoder to components

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-01-11 13:04:08 +05:30
dg845
17cece072a Fix bug in LCM Distillation Scripts when args.unet_time_cond_proj_dim is used (#6523)
* Fix bug where unet's time_cond_proj_dim is not set correctly if using args.unet_time_cond_proj_dim.

* make style
2024-01-11 08:21:07 +05:30
Steven Liu
a551ddf928 [docs] mask_blur and padding_mask_crop (#6498)
new inpaint features
2024-01-10 08:14:34 -08:00
Steven Liu
1d57892980 [docs] Callbacks (#6471)
edits
2024-01-10 08:14:07 -08:00
antoine-scenario
3e8b63216e Add IP-Adapter to StableDiffusionXLControlNetImg2ImgPipeline (#6293)
* add IP-Adapter to StableDiffusionXLControlNetImg2ImgPipeline

Update src/diffusers/pipelines/controlnet/pipeline_controlnet_sd_xl_img2img.py

Co-authored-by: YiYi Xu <yixu310@gmail.com>

fix tests

* fix failing test

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-09 22:02:11 -10:00
YiYi Xu
dd4459ad79 [Refactor] splitingResnetBlock2D into multiple blocks (#6166)
---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-09 21:38:05 -10:00
YiYi Xu
6313645b6b add StableDiffusionXLKDiffusionPipeline (#6447)
---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2024-01-09 16:29:01 -10:00
Rahul Raman
2d1f2182cc example: Train Instruct pix2 pix with lora implementation (#6469)
* base template file - train_instruct_pix2pix.py

* additional import and parser argument requried for lora

* finetune only instructpix2pix model -- no need to include these layers

* inject lora layers

* freeze unet model -- only lora layers are trained

* training modifications to train only lora parameters

* store only lora parameters

* move train script to research project

* run quality and style code checks

* move train script to a new folder

* add README

* update README

* update references in README

---------

Co-authored-by: Rahul Raman <rahulraman@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-10 06:38:19 +05:30
Steven Liu
3be7c96e28 [docs] Stable video diffusion (#6472)
svd
2024-01-09 09:21:58 -08:00
Steven Liu
3c79dd9dbe [docs] PEFT adapter API (#6499)
follow up
2024-01-09 08:09:15 -08:00
Steven Liu
9d767916da [docs] Fast diffusion (#6470)
* edits

* fix

* feedback
2024-01-09 08:08:31 -08:00
Patrick von Platen
ae7cd5ad4c Link issue template to discussions 2024-01-09 17:00:15 +01:00
Sayak Paul
4497b3ec98 [Training] make DreamBooth SDXL LoRA training script compatible with torch.compile (#6483)
* make it torch.compile comaptible

* make the text encoder compatible too.

* style
2024-01-09 20:11:26 +05:30
Yifan Zhou
fc63ebdd3a [Community Pipeline] Rerender-A-Video: Zero-Shot Video-to-Video Translation (#6332)
* upload codes and doc

* lint

* lint

* lint

* update code

* remove blank lines

* Fix load url
2024-01-09 14:55:34 +01:00
Sayak Paul
8b6fae4ce5 [SVD] fix: vae type (#6475)
fix: vae type
2024-01-09 14:50:02 +01:00
jiqing-feng
aa1797e109 enable stable-xl textual inversion (#6421)
* enable stable-xl textual inversion

* check if optimizer_2 exists

* check text_encoder_2 before using

* add textual inversion for sdxl in a single file

* fix style

* fix example style

* reset for error changes

* add readme for sdxl

* fix style

* disable autocast as it will cause cast error when weight_dtype=bf16

* fix spelling error

* fix style and readme and 8bit optimizer

* add README_sdxl.md link

* add tracker key on log_validation

* run style

* rm the second center crop

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-09 15:12:33 +05:30
Patrick von Platen
5bacc2f5af [SAG] Support more schedulers, add better error message and make tests faster (#6465)
* finish

* finish

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-09 09:24:38 +05:30
Yasuna
6ae7e8112a [Docs] update: tutorials ja | INDEX.md, TUTORIAL_OVERVIEW.md, TOCTREE.yml (#6338)
* add tutorials to toctree.yml

* fix title

* fix words

* add overview ja

* fix diffusion to 拡散

* fix line 21

* add space

* delete supported pipline

* fix tutorial_overview.md

* fix space

* fix typo

* Delete docs/source/ja/tutorials/using_peft_for_inference.md

this file is not translated

* Delete docs/source/ja/tutorials/basic_training.md

this file is not translated

* Delete docs/source/ja/tutorials/autopipeline.md

this file is not translated

* fix toctree
2024-01-08 09:06:46 -08:00
Sayak Paul
e0f349c2b0 correct reviewers in PR template (#6485) 2024-01-08 19:12:04 +05:30
Sayak Paul
774f5c4581 minor changes to the SVD doc (#6466)
minor changes
2024-01-06 08:40:46 +05:30
Sayak Paul
a483a8eddf Rename REAMDE.md to README.md 2024-01-05 20:49:44 +05:30
Lucain
9a9daee724 Fix offline mode import (#6467) 2024-01-05 15:34:40 +01:00
Sayak Paul
585f941366 [Core] introduce PeftAdapterMixin module. (#6416)
* introduce integrations module.

* remove duplicate methods.

* better imports.

* move to loaders.py

* remove peftadaptermixin from modelmixin.

* add: peftadaptermixin selectively.

* add: entry to _toctree

* Empty-Commit
2024-01-05 18:18:28 +05:30
Dhruv Nair
86a26761ac Correctly handle creating model index json files when setting compiled modules in pipelines. (#6436)
update
2024-01-05 18:02:09 +05:30
Liang Hou
6ef2b8a92f Fix amused paper link (#6462) 2024-01-05 13:12:09 +01:00
Vinh H. Pham
3848606c7e [Community Pipeline] Add gluegen (#6433)
* init works

* add gluegen pipeline

* add gluegen code

* add another way to load language adapter

* make style

* Update README.md

* change doc
2024-01-05 13:05:40 +01:00
Sayak Paul
2a97067b84 [Experimental] Diffusion LoRA DPO training (#6422)
* add: experimental script for diffusion dpo training.

* random_crop cli.

* fix: caption tokenization.

* fix: pixel_values index.

* fix: grad?

* debug

* fix: reduction.

* fixes in the loss calculation.

* style

* fix: unwrap call.

* fix: validation inference.

* add: initial sdxl script

* debug

* make sure images in the tuple are of same res

* fix model_max_length

* report print

* boom

* fix: numerical issues.

* fix: resolution

* comment about resize.

* change the order of the training transformation.

* save call.

* debug

* remove print

* manually detaching necessary?

* use the same vae for validation.

* add: readme.
2024-01-05 16:40:06 +05:30
Sayak Paul
ae060fc4f1 [feat] introduce unload_lora(). (#6451)
* introduce unload_lora.

* fix-copies
2024-01-05 16:22:11 +05:30
Sayak Paul
9d945b2b90 0.25.0 post release (#6358)
* post release

* style

---------

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2024-01-05 16:13:27 +05:30
Junsheng121
d184291c7d null-text-inversion-pipeline-implementation (#6329)
* null-text-inversion-implementation

* edited

* edited

* edited

* edited

* edited

* edit

* makestyle

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-05 11:35:21 +01:00
Sayak Paul
0a0bb526aa [LoRA depcrecation] LoRA depcrecation trilogy (#6450)
* edebug

* debug

* more debug

* more more debug

* remove tests for LoRAAttnProcessors.

* rename
2024-01-05 15:48:20 +05:30
Linoy Tsaban
2fada8dc1b [bug fix] fixes #6444 - checkpointing save issue in advanced dreambooth lora sdxl script (#6464)
* unwrap text encoder when saving hook only for full text encoder tuning

* unwrap text encoder when saving hook only for full text encoder tuning

* save embeddings in each checkpoint as well

* save embeddings in each checkpoint as well

* save embeddings in each checkpoint as well

* Update examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-05 15:35:24 +05:30
jiqing-feng
f2d51a28f7 Intel Gen 4 Xeon and later support bf16 (#6367)
* Intel Gen 4 Xeon and later support bf16

* fix bf16 notes
2024-01-05 11:47:28 +05:30
Horseee
811fd06292 [Doc] Add DeepCache in section optimization/General optimizations (#6390)
* add documentation for DeepCache

* fix typo

* add wandb url for DeepCache

* fix some typos

* add item in _toctree.yml

* update formats for arguments

* Update deepcache.md

* Update docs/source/en/optimization/deepcache.md

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* add StableDiffusionXLPipeline in doc

* Separate SDPipeline and SDXLPipeline

* Add the paper link of ablation experiments for hyper-parameters

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-01-05 09:57:08 +05:30
dg845
f3d1333e02 Improve LCM(-LoRA) Distillation Scripts (#6420)
* Make WDS pipeline interpolation type configurable.

* Make the VAE encoding batch size configurable.

* Make lora_alpha and lora_dropout configurable for LCM LoRA scripts.

* Generalize scalings_for_boundary_conditions function and make the timestep scaling configurable.

* Make LoRA target modules configurable for LCM-LoRA scripts.

* Move resolve_interpolation_mode to src/diffusers/training_utils.py and make interpolation type configurable in non-WDS script.

* apply suggestions from review
2024-01-05 06:55:13 +05:30
Steven Liu
acd926f4f2 [docs] Fix local links (#6440)
fix local links
2024-01-04 09:59:11 -08:00
Lucain
691d8d3e15 Respect offline mode when loading pipeline (#6456)
* Respect offline mode when loading model

* default to local entry if connectionerror
2024-01-04 17:05:55 +01:00
Sayak Paul
e7c0af5e71 add: amused paper link. (#6453) 2024-01-04 13:44:54 +05:30
Sayak Paul
107e02160a [LoRA tests] fix stuff related to assertions arising from the recent changes. (#6448)
* debug

* debug test_with_different_scales_fusion_equivalence

* use the right method.

* place it right.

* let's see.

* let's see again

* alright then.

* add a comment.
2024-01-04 12:55:15 +05:30
sayakpaul
6dbef45e6e Revert "debug"
This reverts commit 7715e6c31c.
2024-01-04 10:39:38 +05:30
sayakpaul
7715e6c31c debug 2024-01-04 10:39:00 +05:30
sayakpaul
05b3d36a25 Revert "debug"
This reverts commit fb4aec0ce3.
2024-01-04 10:38:04 +05:30
sayakpaul
fb4aec0ce3 debug 2024-01-04 10:37:28 +05:30
Sayak Paul
63de23e3db disable running peft non-peft lora test in the peft env. (#6437)
* disable running peft non-peft lora test in the peft env.

* Empty-Commit
2024-01-04 10:18:46 +05:30
Chi
2993257f2a Batter way to write binarize() function. (#6394)
* I added a new doc string to the class. This is more flexible to understanding other developers what are doing and where it's using.

* Update src/diffusers/models/unet_2d_blocks.py

This changes suggest by maintener.

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update src/diffusers/models/unet_2d_blocks.py

Add suggested text

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update unet_2d_blocks.py

I changed the Parameter to Args text.

* Update unet_2d_blocks.py

proper indentation set in this file.

* Update unet_2d_blocks.py

a little bit of change in the act_fun argument line.

* I run the black command to reformat style in the code

* Update unet_2d_blocks.py

similar doc-string add to have in the original diffusion repository.

* Batter way to write binarize function

* Solve check_code_quality error

* My mistake to run pull request but not reformated file

* Update image_processor.py

* remove extra variable and space

* Update image_processor.py

* Run ruff libarary to reformat my file

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
2024-01-04 09:32:08 +05:30
Sayak Paul
aad18faa3e Update README_sdxl.md to update the LR (#6432)
Update README_sdxl.md
2024-01-03 20:55:51 +05:30
Sayak Paul
d700140076 [LoRA deprecation] handle rest of the stuff related to deprecated lora stuff. (#6426)
* handle rest of the stuff related to deprecated lora stuff.

* fix: copies

* don't modify the uNet in-place.

* fix: temporal autoencoder.

* manually remove lora layers.

* don't copy unet.

* alright

* remove lora attn processors from unet3d

* fix: unet3d.

* styl

* Empty-Commit
2024-01-03 20:54:09 +05:30
Sayak Paul
2e4dc3e25d [LoRA] add: test to check if peft loras are loadable in non-peft envs. (#6400)
* add: test to check if peft loras are loadable in non-peft envs.

* add torch_device approrpiately.

* fix: get_dummy_inputs().

* test logits.

* rename

* debug

* debug

* fix: generator

* new assertion values after fixing the seed.

* shape

* remove print statements and settle this.

* to update values.

* change values when lora config is initialized under a fixed seed.

* update colab link

* update notebook link

* sanity restored by getting the exact same values without peft.
2024-01-03 09:57:49 +05:30
YiYi Xu
3e2961f0b4 [doc] update inpaint doc to use apply_overlay (#6364)
add doc

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2024-01-02 11:16:36 -10:00
Vinh H. Pham
79c380bc80 Correct how apply_overlay read crop_coords (#6417)
correct reading variables
2024-01-02 19:31:12 +01:00
Aryan V S
e30b661437 Update lpw_xl pipeline to latest diffusers (#6411)
* add clip_skip, freeu, qkv

* fix

* add ip-adapter support

* callback on step end

* update

* fix NoneType bug

* fix

* add guidance scale embedding

* add textual inversion
2024-01-02 16:28:45 +01:00
Linoy Tsaban
b4077af212 [bug fix] using snr gamma and prior preservation loss in the dreambooth lora sdxl training scripts (#6356)
* change timesteps used to calculate snr when --with_prior_preservation is enabled

* change timesteps used to calculate snr when --with_prior_preservation is enabled (canonical script)

* style

* revert canonical script to before snr gamma change

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-02 09:21:39 -06:00
Daniel Socek
9f2bff502e [svd] fix noise_aug_strength type in svd pipe (#6389) 2024-01-02 14:45:07 +01:00
CyrusVorwald
0cb92717f9 add StableDiffusionXLControlNetInpaintPipeline to auto pipeline (#6302)
* add StableDiffusionXLControlNetInpaintPipeline to auto pipeline

* fixed style
2024-01-02 14:44:48 +01:00
Fabio Rigano
86714b72d0 Add unload_ip_adapter method (#6192)
* Add unload_ip_adapter method

* Update attn_processors with original layers

* Add test

* Use set_default_attn_processor

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-02 14:40:46 +01:00
Sayak Paul
61f6c5472a [LoRA] Remove the use of depcrecated loRA functionalities such as LoRAAttnProcessor (#6369)
* start deprecating loraattn.

* fix

* wrap into unet_lora_state_dict

* utilize text_encoder_lora_params

* utilize text_encoder_attn_modules

* debug

* debug

* remove print

* don't use text encoder for test_stable_diffusion_lora

* load the procs.

* set_default_attn_processor

* fix: set_default_attn_processor call.

* fix: lora_components[unet_lora_params]

* checking for 3d.

* 3d.

* more fixes.

* debug

* debug

* debug

* debug

* more debug

* more debug

* more debug

* more debug

* more debug

* more debug

* hack.

* remove comments and prep for a PR.

* appropriate set_lora_weights()

* fix

* fix: test_unload_lora_sd

* fix: test_unload_lora_sd

* use dfault attebtion processors.

* debu

* debug nan

* debug nan

* debug nan

* use NaN instead of inf

* remove comments.

* fix: test_text_encoder_lora_state_dict_unchanged

* attention processor default

* default attention processors.

* default

* style
2024-01-02 18:14:04 +05:30
lookas
17546020fc Fix #6409 (#6410)
* Update value_guided_sampling.py

Fix #6409

* Comply code style

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-02 11:16:21 +05:30
2510
8a366b835c Fix gradient-checkpointing option is ignored in SDXL+LoRA training. (#6388) (#6402)
* Fix gradient-checkpointing option is ignored in SDXL+LoRA training. (#6388)

* Fix gradient-checkpointing option is ignored in SD+LoRA training.

* Fix gradient checkpoint is not applied to text encoders. (SDXL+LoRA)

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-01-01 08:51:04 +05:30
Sayak Paul
61d223c884 add: CUDA graph details. (#6408) 2023-12-31 13:43:26 +05:30
apolinário
bf725e044e Add new WebUI conversion state_dict_utils to __init__ utils (#6404)
* Add new state_dict_utils to __init__ utils

* style

---------

Co-authored-by: multimodalart <joaopaulo.passos+multimodal@gmail.com>
2023-12-30 09:31:39 -06:00
apolinário
1622265e13 Add WebUI format support to Advanced Training Script (#6403)
* Add WebUI format support to Advanced Training Script

* style

---------

Co-authored-by: multimodalart <joaopaulo.passos+multimodal@gmail.com>
2023-12-30 08:45:49 -06:00
apolinário
0b63ad5ad5 Create convert_diffusers_sdxl_lora_to_webui.py (#6395)
* Create convert_diffusers_sdxl_lora_to_webui.py

* Move some conversion logic to utils

* fix logging import

* Add usage example

---------

Co-authored-by: multimodalart <joaopaulo.passos+multimodal@gmail.com>
2023-12-30 08:15:11 -06:00
Sayak Paul
6a376ceea2 [LoRA] remove unnecessary components from lora peft test suite (#6401)
remove unnecessary components from lora peft suite/
2023-12-30 18:25:40 +05:30
gzguevara
9f283b01d2 changed w&b report link (#6387) 2023-12-29 19:49:11 +05:30
Sayak Paul
203724e9d9 [Docs] add note on fp16 in fast diffusion (#6380)
add note on fp16
2023-12-29 09:38:50 +05:30
gzguevara
e7044a4221 multi-subject-dreambooth-inpainting with 🤗 datasets (#6378)
* files added

* fixing code quality

* fixing code quality

* fixing code quality

* fixing code quality

* sorted import block

* seperated import wandb

* ruff on script

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-29 09:33:49 +05:30
Sayak Paul
034b39b8cb [docs] add details concerning diffusers-specific bits. (#6375)
add details concerning diffusers-specific bits.
2023-12-28 23:12:49 +05:30
Sayak Paul
2db73f4a50 remove delete documentation trigger workflows. (#6373) 2023-12-28 18:26:14 +05:30
Adrian Punga
84d7faebe4 Fix support for MPS in KDPM2AncestralDiscreteScheduler (#6365)
Fix support for MPS

MPS doesn't support float64
2023-12-28 10:22:02 +01:00
YiYi Xu
4c483deb90 [refactor embeddings] gligen + ip-adapter (#6244)
* refactor ip-adapter-imageproj, gligen

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
2023-12-27 18:48:42 -10:00
Sayak Paul
1ac07d8a8d [Training examples] Follow up of #6306 (#6346)
* add to dreambooth lora.

* add: t2i lora.

* add: sdxl t2i lora.

* style

* lcm lora sdxl.

* unwrap

* fix: enable_adapters().
2023-12-28 07:37:50 +05:30
apolinário
1fff527702 Fix keys for lora format on advanced training scripts (#6361)
fix keys for lora format on advanced training scripts
2023-12-27 11:38:03 -06:00
apolinário
645a62bf3b Add PEFT to advanced training script (#6294)
* Fix ProdigyOPT in SDXL Dreambooth script

* style

* style

* Add PEFT to Advanced Training Script

* style

* style

*  style 

* change order for logic operation

* add lora alpha

* style

* Align PEFT to new format

* Update train_dreambooth_lora_sdxl_advanced.py

Apply #6355 fix

---------

Co-authored-by: multimodalart <joaopaulo.passos+multimodal@gmail.com>
2023-12-27 10:00:32 -03:00
Dhruv Nair
6414d4e4f9 Fix chunking in SVD (#6350)
fix
2023-12-27 13:07:41 +01:00
Andy W
43672b4a22 Fix "push_to_hub only create repo in consistency model lora SDXL training script" (#6102)
* fix

* style fix

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-27 15:25:19 +05:30
dg845
9df3d84382 Fix LCM distillation bug when creating the guidance scale embeddings using multiple GPUs. (#6279)
Fix bug when creating the guidance embeddings using multiple GPUs.

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-27 14:25:21 +05:30
Jianqi Pan
c751449011 fix: use retrieve_latents (#6337) 2023-12-27 10:44:26 +05:30
Dhruv Nair
c1e8bdf1d4 Move ControlNetXS into Community Folder (#6316)
* update

* update

* update

* update

* update

* make style

* remove docs

* update

* move to research folder.

* fix-copies

* remove _toctree entry.

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-27 08:15:23 +05:30
Sayak Paul
78b87dc25a [LoRA] make LoRAs trained with peft loadable when peft isn't installed (#6306)
* spit diffusers-native format from the get go.

* rejig the peft_to_diffusers mapping.
2023-12-27 08:01:10 +05:30
Will Berman
0af12f1f8a amused update links to new repo (#6344)
* amused update links to new repo

* lint
2023-12-26 22:46:28 +01:00
Justin Ruan
6e123688dc Remove unused parameters and fixed FutureWarning (#6317)
* Remove unused parameters and fixed `FutureWarning`

* Fixed wrong config instance

* update unittest for `DDIMInverseScheduler`
2023-12-26 22:09:10 +01:00
YiYi Xu
f0a588b8e2 adding auto1111 features to inpainting pipeline (#6072)
* add inpaint_full_res

* fix

* update

* move get_crop_region to image processor

* Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* move apply_overlay to image processor

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-12-26 10:20:29 -10:00
priprapre
fa31704420 [SDXL-IP2P] Update README_sdxl, Replace the link for wandb log with the correct run (#6270)
Replace the link for wandb log with the correct run
2023-12-26 21:13:11 +01:00
Sayak Paul
9d79991da0 [Docs] fix: video rendering on svd. (#6330)
fix: video rendering on svd.
2023-12-26 21:05:22 +01:00
Will Berman
7d865ac9c6 amused other pipelines docs (#6343)
other pipelines
2023-12-26 20:20:32 +01:00
Dhruv Nair
fb02316db8 Add AnimateDiff conversion scripts (#6340)
* add scripts

* update
2023-12-26 22:40:00 +05:30
Dhruv Nair
98a2b3d2d8 Update Animatediff docs (#6341)
* update

* update

* update
2023-12-26 22:39:46 +05:30
Dhruv Nair
2026ec0a02 Interruptable Pipelines (#5867)
* add interruptable pipelines

* add tests

* updatemsmq

* add interrupt property

* make fix copies

* Revert "make fix copies"

This reverts commit 914b35332b.

* add docs

* add tutorial

* Update docs/source/en/tutorials/interrupting_diffusion_process.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/tutorials/interrupting_diffusion_process.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update

* fix quality issues

* fix

* update

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-26 22:39:26 +05:30
dg845
3706aa3305 Add rescale_betas_zero_snr Argument to DDPMScheduler (#6305)
* Add rescale_betas_zero_snr argument to DDPMScheduler.

* Propagate rescale_betas_zero_snr changes to DDPMParallelScheduler.

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-26 17:54:30 +01:00
Sayak Paul
d4f10ea362 [Diffusion fast] add doc for diffusion fast (#6311)
* add doc for diffusion fast

* add entry to _toctree

* Apply suggestions from code review

* fix titlew

* fix: title entry

* add note about fuse_qkv_projections
2023-12-26 22:19:55 +05:30
Younes Belkada
3aba99af8f [Peft / Lora] Add adapter_names in fuse_lora (#5823)
* add adapter_name in fuse

* add tesrt

* up

* fix CI

* adapt from suggestion

* Update src/diffusers/utils/testing_utils.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* change to `require_peft_version_greater`

* change variable names in test

* Update src/diffusers/loaders/lora.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* break into 2 lines

* final comments

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-12-26 16:54:47 +01:00
Sayak Paul
6683f97959 [Training] Add datasets version of LCM LoRA SDXL (#5778)
* add: script to train lcm lora for sdxl with 🤗 datasets

* suit up the args.

* remove comments.

* fix num_update_steps

* fix batch unmarshalling

* fix num_update_steps_per_epoch

* fix; dataloading.

* fix microconditions.

* unconditional predictions debug

* fix batch size.

* no need to use use_auth_token

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* make vae encoding batch size an arg

* final serialization in kohya

* style

* state dict rejigging

* feat: no separate teacher unet.

* debug

* fix state dict serialization

* debug

* debug

* debug

* remove prints.

* remove kohya utility and make style

* fix serialization

* fix

* add test

* add peft dependency.

* add: peft

* remove peft

* autocast device determination from accelerator

* autocast

* reduce lora rank.

* remove unneeded space

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* style

* remove prompt dropout.

* also save in native diffusers ckpt format.

* debug

* debug

* debug

* better formation of the null embeddings.

* remove space.

* autocast fixes.

* autocast fix.

* hacky

* remove lora_sayak

* Apply suggestions from code review

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* style

* make log validation leaner.

* move back enabled in.

* fix: log_validation call.

* add: checkpointing tests

* taking my chances to see if disabling autocasting has any effect?

* start debugging

* name

* name

* name

* more debug

* more debug

* index

* remove index.

* print length

* print length

* print length

* move unet.train() after add_adapter()

* disable some prints.

* enable_adapters() manually.

* remove prints.

* some changes.

* fix params_to_optimize

* more fixes

* debug

* debug

* remove print

* disable grad for certain contexts.

* Add support for IPAdapterFull (#5911)

* Add support for IPAdapterFull


Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Fix a bug in `add_noise` function  (#6085)

* fix

* copies

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>

* [Advanced Diffusion Script] Add Widget default text (#6100)

add widget

* [Advanced Training Script] Fix pipe example (#6106)

* IP-Adapter for StableDiffusionControlNetImg2ImgPipeline (#5901)

* adapter for StableDiffusionControlNetImg2ImgPipeline

* fix-copies

* fix-copies

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* IP adapter support for most pipelines (#5900)

* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_upscale.py

* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_attend_and_excite.py

* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_instruct_pix2pix.py

* update tests

* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_panorama.py

* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_sag.py

* support ip-adapter in src/diffusers/pipelines/stable_diffusion_safe/pipeline_stable_diffusion_safe.py

* support ip-adapter in src/diffusers/pipelines/latent_consistency_models/pipeline_latent_consistency_text2img.py

* support ip-adapter in src/diffusers/pipelines/latent_consistency_models/pipeline_latent_consistency_img2img.py

* support ip-adapter in src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_ldm3d.py

* revert changes to sd_attend_and_excite and sd_upscale

* make style

* fix broken tests

* update ip-adapter implementation to latest

* apply suggestions from review

---------

Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* fix: lora_alpha

* make vae casting conditional/

* param upcasting

* propagate comments from https://github.com/huggingface/diffusers/pull/6145

Co-authored-by: dg845 <dgu8957@gmail.com>

* [Peft] fix saving / loading when unet is not "unet" (#6046)

* [Peft] fix saving / loading when unet is not "unet"

* Update src/diffusers/loaders/lora.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* undo stablediffusion-xl changes

* use unet_name to get unet for lora helpers

* use unet_name

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* [Wuerstchen] fix fp16 training and correct lora args (#6245)

fix fp16 training

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* [docs] fix: animatediff docs (#6339)

fix: animatediff docs

* add: note about the new script in readme_sdxl.

* Revert "[Peft] fix saving / loading when unet is not "unet" (#6046)"

This reverts commit 4c7e983bb5.

* Revert "[Wuerstchen] fix fp16 training and correct lora args (#6245)"

This reverts commit 0bb9cf0216.

* Revert "[docs] fix: animatediff docs (#6339)"

This reverts commit 11659a6f74.

* remove tokenize_prompt().

* assistive comments around enable_adapters() and diable_adapters().

---------

Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Fabio Rigano <57982783+fabiorigano@users.noreply.github.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: apolinário <joaopaulo.passos@gmail.com>
Co-authored-by: Charchit Sharma <charchitsharma11@gmail.com>
Co-authored-by: Aryan V S <contact.aryanvs@gmail.com>
Co-authored-by: dg845 <dgu8957@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2023-12-26 21:22:05 +05:30
Sayak Paul
4e7b0cb396 [docs] fix: animatediff docs (#6339)
fix: animatediff docs
2023-12-26 19:13:49 +05:30
Kashif Rasul
35b81fffae [Wuerstchen] fix fp16 training and correct lora args (#6245)
fix fp16 training

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-26 11:40:04 +01:00
Kashif Rasul
e0d8c910e9 [Peft] fix saving / loading when unet is not "unet" (#6046)
* [Peft] fix saving / loading when unet is not "unet"

* Update src/diffusers/loaders/lora.py

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* undo stablediffusion-xl changes

* use unet_name to get unet for lora helpers

* use unet_name

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-26 11:39:28 +01:00
dg845
a3d31e3a3e Change LCM-LoRA README Script Example Learning Rates to 1e-4 (#6304)
Change README LCM-LoRA example learning rates to 1e-4.
2023-12-25 21:29:20 +05:30
Jianqi Pan
84c403aedb fix: cannot set guidance_scale (#6326)
fix: set guidance_scale
2023-12-25 21:16:57 +05:30
Sayak Paul
f4b0b26f7e [Tests] Speed up example tests (#6319)
* remove validation args from textual onverson tests

* reduce number of train steps in textual inversion tests

* fix: directories.

* debig

* fix: directories.

* remove validation tests from textual onversion

* try reducing the time of test_text_to_image_checkpointing_use_ema

* fix: directories

* speed up test_text_to_image_checkpointing

* speed up test_text_to_image_checkpointing_checkpoints_total_limit_removes_multiple_checkpoints

* fix

* speed up test_instruct_pix2pix_checkpointing_checkpoints_total_limit_removes_multiple_checkpoints

* set checkpoints_total_limit to 2.

* test_text_to_image_lora_checkpointing_checkpoints_total_limit_removes_multiple_checkpoints speed up

* speed up test_unconditional_checkpointing_checkpoints_total_limit_removes_multiple_checkpoints

* debug

* fix: directories.

* speed up test_instruct_pix2pix_checkpointing_checkpoints_total_limit

* speed up: test_controlnet_checkpointing_checkpoints_total_limit_removes_multiple_checkpoints

* speed up test_controlnet_sdxl

* speed up dreambooth tests

* speed up test_dreambooth_lora_checkpointing_checkpoints_total_limit_removes_multiple_checkpoints

* speed up test_custom_diffusion_checkpointing_checkpoints_total_limit_removes_multiple_checkpoints

* speed up test_text_to_image_lora_sdxl_text_encoder_checkpointing_checkpoints_total_limit

* speed up # checkpoint-2 should have been deleted

* speed up examples/text_to_image/test_text_to_image.py::TextToImage::test_text_to_image_checkpointing_checkpoints_total_limit

* additional speed ups

* style
2023-12-25 19:50:48 +05:30
Sayak Paul
89459a5d56 fix: lora peft dummy components (#6308)
* fix: lora peft dummy components

* fix: dummy components
2023-12-25 11:26:45 +05:30
Sayak Paul
008d9818a2 fix: t2i apdater paper link (#6314) 2023-12-25 10:45:14 +05:30
mwkldeveloper
2d43094ffc fix RuntimeError: Input type (float) and bias type (c10::Half) should be the same in train_text_to_image_lora.py (#6259)
* fix RuntimeError: Input type (float) and bias type (c10::Half) should be the same

* format source code

* format code

* remove the autocast blocks within the pipeline

* add autocast blocks to pipeline caller in train_text_to_image_lora.py
2023-12-24 14:34:35 +05:30
Celestial Phineas
7c05b975b7 Fix typos in the ValueError for a nested image list as StableDiffusionControlNetPipeline input. (#6286)
Fixed typos in the `ValueError` for a nested image list as input.
2023-12-24 14:32:24 +05:30
Dhruv Nair
fe574c8b29 LoRA Unfusion test fix (#6291)
update

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-24 14:31:48 +05:30
Sayak Paul
90b9479903 [LoRA PEFT] fix LoRA loading so that correct alphas are parsed (#6225)
* initialize alpha too.

* add: test

* remove config parsing

* store rank

* debug

* remove faulty test
2023-12-24 09:59:41 +05:30
apolinário
df76a39e1b Fix Prodigy optimizer in SDXL Dreambooth script (#6290)
* Fix ProdigyOPT in SDXL Dreambooth script

* style

* style
2023-12-22 06:42:04 -06:00
Bingxin Ke
3369bc810a [Community Pipeline] Add Marigold Monocular Depth Estimation (#6249)
* [Community Pipeline] Add Marigold Monocular Depth Estimation

- add single-file pipeline
- update README

* fix format - add one blank line

* format script with ruff

* use direct image link in example code

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-22 15:41:46 +05:30
Pedro Cuenca
7fe47596af Allow diffusers to load with Flax, w/o PyTorch (#6272) 2023-12-22 09:37:30 +01:00
Dhruv Nair
59d1caa238 Remove peft tests from old lora backend tests (#6273)
update
2023-12-22 13:35:52 +05:30
Dhruv Nair
c022e52923 Remove ONNX inpaint legacy (#6269)
update

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2023-12-22 13:35:21 +05:30
Will Berman
4039815276 open muse (#5437)
amused

rename

Update docs/source/en/api/pipelines/amused.md

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

AdaLayerNormContinuous default values

custom micro conditioning

micro conditioning docs

put lookup from codebook in constructor

fix conversion script

remove manual fused flash attn kernel

add training script

temp remove training script

add dummy gradient checkpointing func

clarify temperatures is an instance variable by setting it

remove additional SkipFF block args

hardcode norm args

rename tests folder

fix paths and samples

fix tests

add training script

training readme

lora saving and loading

non-lora saving/loading

some readme fixes

guards

Update docs/source/en/api/pipelines/amused.md

Co-authored-by: Suraj Patil <surajp815@gmail.com>

Update examples/amused/README.md

Co-authored-by: Suraj Patil <surajp815@gmail.com>

Update examples/amused/train_amused.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

vae upcasting

add fp16 integration tests

use tuple for micro cond

copyrights

remove casts

delegate to torch.nn.LayerNorm

move temperature to pipeline call

upsampling/downsampling changes
2023-12-21 11:40:55 -08:00
Sayak Paul
5b186b7128 [Refactor] move ldm3d out of stable_diffusion. (#6263)
ldm3d.
2023-12-21 18:59:55 +05:30
411 changed files with 3318 additions and 56577 deletions

View File

@@ -238,13 +238,12 @@ jobs:
run_flax_tpu_tests:
name: Nightly Flax TPU Tests
runs-on:
group: gcp-ct5lp-hightpu-8t
runs-on: docker-tpu
if: github.event_name == 'schedule'
container:
image: diffusers/diffusers-flax-tpu
options: --shm-size "16gb" --ipc host --privileged ${{ vars.V5_LITEPOD_8_ENV}} -v /mnt/hf_cache:/mnt/hf_cache
options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ --privileged
defaults:
run:
shell: bash
@@ -348,66 +347,6 @@ jobs:
pip install slack_sdk tabulate
python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
run_nightly_quantization_tests:
name: Torch quantization nightly tests
strategy:
fail-fast: false
max-parallel: 2
matrix:
config:
- backend: "bitsandbytes"
test_location: "bnb"
- backend: "gguf"
test_location: "gguf"
runs-on:
group: aws-g6e-xlarge-plus
container:
image: diffusers/diffusers-pytorch-cuda
options: --shm-size "20gb" --ipc host --gpus 0
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2
- name: NVIDIA-SMI
run: nvidia-smi
- name: Install dependencies
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
python -m uv pip install -U ${{ matrix.config.backend }}
python -m uv pip install pytest-reportlog
- name: Environment
run: |
python utils/print_env.py
- name: ${{ matrix.config.backend }} quantization tests on GPU
env:
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
# https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
CUBLAS_WORKSPACE_CONFIG: :16:8
BIG_GPU_MEMORY: 40
run: |
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
--make-reports=tests_${{ matrix.config.backend }}_torch_cuda \
--report-log=tests_${{ matrix.config.backend }}_torch_cuda.log \
tests/quantization/${{ matrix.config.test_location }}
- name: Failure short reports
if: ${{ failure() }}
run: |
cat reports/tests_${{ matrix.config.backend }}_torch_cuda_stats.txt
cat reports/tests_${{ matrix.config.backend }}_torch_cuda_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: torch_cuda_${{ matrix.config.backend }}_reports
path: reports
- name: Generate Report and Notify Channel
if: always()
run: |
pip install slack_sdk tabulate
python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
# M1 runner currently not well supported
# TODO: (Dhruv) add these back when we setup better testing for Apple Silicon
# run_nightly_tests_apple_m1:
@@ -522,4 +461,4 @@ jobs:
# if: always()
# run: |
# pip install slack_sdk tabulate
# python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
# python utils/log_reports.py >> $GITHUB_STEP_SUMMARY

View File

@@ -0,0 +1,134 @@
name: Fast tests for PRs - PEFT backend
on:
pull_request:
branches:
- main
paths:
- "src/diffusers/**.py"
- "tests/**.py"
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true
env:
DIFFUSERS_IS_CI: yes
OMP_NUM_THREADS: 4
MKL_NUM_THREADS: 4
PYTEST_TIMEOUT: 60
jobs:
check_code_quality:
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.8"
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install .[quality]
- name: Check quality
run: make quality
- name: Check if failure
if: ${{ failure() }}
run: |
echo "Quality check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make style && make quality'" >> $GITHUB_STEP_SUMMARY
check_repository_consistency:
needs: check_code_quality
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.8"
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install .[quality]
- name: Check repo consistency
run: |
python utils/check_copies.py
python utils/check_dummies.py
make deps_table_check_updated
- name: Check if failure
if: ${{ failure() }}
run: |
echo "Repo consistency check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make fix-copies'" >> $GITHUB_STEP_SUMMARY
run_fast_tests:
needs: [check_code_quality, check_repository_consistency]
strategy:
fail-fast: false
matrix:
lib-versions: ["main", "latest"]
name: LoRA - ${{ matrix.lib-versions }}
runs-on:
group: aws-general-8-plus
container:
image: diffusers/diffusers-pytorch-cpu
options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
defaults:
run:
shell: bash
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2
- name: Install dependencies
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
# TODO (sayakpaul, DN6): revisit `--no-deps`
if [ "${{ matrix.lib-versions }}" == "main" ]; then
python -m pip install -U peft@git+https://github.com/huggingface/peft.git --no-deps
python -m uv pip install -U transformers@git+https://github.com/huggingface/transformers.git --no-deps
pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git --no-deps
else
python -m uv pip install -U peft --no-deps
python -m uv pip install -U transformers accelerate --no-deps
fi
- name: Environment
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python utils/print_env.py
- name: Run fast PyTorch LoRA CPU tests with PEFT backend
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
-s -v \
--make-reports=tests_${{ matrix.lib-versions }} \
tests/lora/
python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
-s -v \
--make-reports=tests_models_lora_${{ matrix.lib-versions }} \
tests/models/ -k "lora"
- name: Failure short reports
if: ${{ failure() }}
run: |
cat reports/tests_${{ matrix.lib-versions }}_failures_short.txt
cat reports/tests_models_lora_${{ matrix.lib-versions }}_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: pr_${{ matrix.lib-versions }}_test_reports
path: reports

View File

@@ -234,67 +234,3 @@ jobs:
with:
name: pr_${{ matrix.config.report }}_test_reports
path: reports
run_lora_tests:
needs: [check_code_quality, check_repository_consistency]
strategy:
fail-fast: false
name: LoRA tests with PEFT main
runs-on:
group: aws-general-8-plus
container:
image: diffusers/diffusers-pytorch-cpu
options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
defaults:
run:
shell: bash
steps:
- name: Checkout diffusers
uses: actions/checkout@v3
with:
fetch-depth: 2
- name: Install dependencies
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m uv pip install -e [quality,test]
# TODO (sayakpaul, DN6): revisit `--no-deps`
python -m pip install -U peft@git+https://github.com/huggingface/peft.git --no-deps
python -m uv pip install -U transformers@git+https://github.com/huggingface/transformers.git --no-deps
pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git --no-deps
- name: Environment
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python utils/print_env.py
- name: Run fast PyTorch LoRA tests with PEFT
run: |
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
-s -v \
--make-reports=tests_peft_main \
tests/lora/
python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
-s -v \
--make-reports=tests_models_lora_peft_main \
tests/models/ -k "lora"
- name: Failure short reports
if: ${{ failure() }}
run: |
cat reports/tests_lora_failures_short.txt
cat reports/tests_models_lora_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: pr_main_test_reports
path: reports

View File

@@ -161,11 +161,10 @@ jobs:
flax_tpu_tests:
name: Flax TPU Tests
runs-on:
group: gcp-ct5lp-hightpu-8t
runs-on: docker-tpu
container:
image: diffusers/diffusers-flax-tpu
options: --shm-size "16gb" --ipc host --privileged ${{ vars.V5_LITEPOD_8_ENV}} -v /mnt/hf_cache:/mnt/hf_cache
options: --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ --privileged
defaults:
run:
shell: bash

View File

@@ -46,7 +46,7 @@ jobs:
shell: arch -arch arm64 bash {0}
run: |
${CONDA_RUN} python -m pip install --upgrade pip uv
${CONDA_RUN} python -m uv pip install -e ".[quality,test]"
${CONDA_RUN} python -m uv pip install -e [quality,test]
${CONDA_RUN} python -m uv pip install torch torchvision torchaudio
${CONDA_RUN} python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate.git
${CONDA_RUN} python -m uv pip install transformers --upgrade

View File

@@ -112,9 +112,9 @@ Check out the [Quickstart](https://huggingface.co/docs/diffusers/quicktour) to l
| **Documentation** | **What can I learn?** |
|---------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [Tutorial](https://huggingface.co/docs/diffusers/tutorials/tutorial_overview) | A basic crash course for learning how to use the library's most important features like using models and schedulers to build your own diffusion system, and training your own diffusion model. |
| [Loading](https://huggingface.co/docs/diffusers/using-diffusers/loading) | Guides for how to load and configure all the components (pipelines, models, and schedulers) of the library, as well as how to use different schedulers. |
| [Pipelines for inference](https://huggingface.co/docs/diffusers/using-diffusers/overview_techniques) | Guides for how to use pipelines for different inference tasks, batched generation, controlling generated outputs and randomness, and how to contribute a pipeline to the library. |
| [Optimization](https://huggingface.co/docs/diffusers/optimization/fp16) | Guides for how to optimize your diffusion model to run faster and consume less memory. |
| [Loading](https://huggingface.co/docs/diffusers/using-diffusers/loading_overview) | Guides for how to load and configure all the components (pipelines, models, and schedulers) of the library, as well as how to use different schedulers. |
| [Pipelines for inference](https://huggingface.co/docs/diffusers/using-diffusers/pipeline_overview) | Guides for how to use pipelines for different inference tasks, batched generation, controlling generated outputs and randomness, and how to contribute a pipeline to the library. |
| [Optimization](https://huggingface.co/docs/diffusers/optimization/opt_overview) | Guides for how to optimize your diffusion model to run faster and consume less memory. |
| [Training](https://huggingface.co/docs/diffusers/training/overview) | Guides for how to train a diffusion model for different tasks with different training techniques. |
## Contribution

View File

@@ -28,7 +28,7 @@ ENV PATH="/opt/venv/bin:$PATH"
# pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
RUN python3.10 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
python3.10 -m uv pip install --no-cache-dir \
torch \
"torch<2.5.0" \
torchvision \
torchaudio \
"onnxruntime-gpu>=1.13.1" \

View File

@@ -29,7 +29,7 @@ ENV PATH="/opt/venv/bin:$PATH"
# pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
RUN python3.10 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
python3.10 -m uv pip install --no-cache-dir \
torch \
"torch<2.5.0" \
torchvision \
torchaudio \
invisible_watermark && \

View File

@@ -29,7 +29,7 @@ ENV PATH="/opt/venv/bin:$PATH"
# pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
RUN python3.10 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
python3.10 -m uv pip install --no-cache-dir \
torch \
"torch<2.5.0" \
torchvision \
torchaudio \
invisible_watermark \

View File

@@ -29,7 +29,7 @@ ENV PATH="/opt/venv/bin:$PATH"
# pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
RUN python3.10 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
python3.10 -m uv pip install --no-cache-dir \
torch \
"torch<2.5.0" \
torchvision \
torchaudio \
invisible_watermark && \

View File

@@ -29,7 +29,7 @@ ENV PATH="/opt/venv/bin:$PATH"
# pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
RUN python3.10 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
python3.10 -m pip install --no-cache-dir \
torch \
"torch<2.5.0" \
torchvision \
torchaudio \
invisible_watermark && \

View File

@@ -55,8 +55,6 @@
- sections:
- local: using-diffusers/overview_techniques
title: Overview
- local: using-diffusers/create_a_server
title: Create a server
- local: training/distributed_inference
title: Distributed inference
- local: using-diffusers/merge_loras
@@ -157,10 +155,6 @@
title: Getting Started
- local: quantization/bitsandbytes
title: bitsandbytes
- local: quantization/gguf
title: gguf
- local: quantization/torchao
title: torchao
title: Quantization Methods
- sections:
- local: optimization/fp16
@@ -238,8 +232,6 @@
title: Textual Inversion
- local: api/loaders/unet
title: UNet
- local: api/loaders/transformer_sd3
title: SD3Transformer2D
- local: api/loaders/peft
title: PEFT
title: Loaders
@@ -258,8 +250,6 @@
title: SD3ControlNetModel
- local: api/models/controlnet_sparsectrl
title: SparseControlNetModel
- local: api/models/controlnet_union
title: ControlNetUnionModel
title: ControlNets
- sections:
- local: api/models/allegro_transformer3d
@@ -276,14 +266,10 @@
title: FluxTransformer2DModel
- local: api/models/hunyuan_transformer2d
title: HunyuanDiT2DModel
- local: api/models/hunyuan_video_transformer_3d
title: HunyuanVideoTransformer3DModel
- local: api/models/latte_transformer3d
title: LatteTransformer3DModel
- local: api/models/lumina_nextdit2d
title: LuminaNextDiT2DModel
- local: api/models/ltx_video_transformer3d
title: LTXVideoTransformer3DModel
- local: api/models/mochi_transformer3d
title: MochiTransformer3DModel
- local: api/models/pixart_transformer2d
@@ -292,8 +278,6 @@
title: PriorTransformer
- local: api/models/sd3_transformer2d
title: SD3Transformer2DModel
- local: api/models/sana_transformer2d
title: SanaTransformer2DModel
- local: api/models/stable_audio_transformer
title: StableAudioDiTModel
- local: api/models/transformer2d
@@ -324,16 +308,10 @@
title: AutoencoderKLAllegro
- local: api/models/autoencoderkl_cogvideox
title: AutoencoderKLCogVideoX
- local: api/models/autoencoder_kl_hunyuan_video
title: AutoencoderKLHunyuanVideo
- local: api/models/autoencoderkl_ltx_video
title: AutoencoderKLLTXVideo
- local: api/models/autoencoderkl_mochi
title: AutoencoderKLMochi
- local: api/models/asymmetricautoencoderkl
title: AsymmetricAutoencoderKL
- local: api/models/autoencoder_dc
title: AutoencoderDC
- local: api/models/consistency_decoder_vae
title: ConsistencyDecoderVAE
- local: api/models/autoencoder_oobleck
@@ -386,8 +364,6 @@
title: ControlNet-XS
- local: api/pipelines/controlnetxs_sdxl
title: ControlNet-XS with Stable Diffusion XL
- local: api/pipelines/controlnet_union
title: ControlNetUnion
- local: api/pipelines/dance_diffusion
title: Dance Diffusion
- local: api/pipelines/ddim
@@ -402,12 +378,8 @@
title: DiT
- local: api/pipelines/flux
title: Flux
- local: api/pipelines/control_flux_inpaint
title: FluxControlInpaint
- local: api/pipelines/hunyuandit
title: Hunyuan-DiT
- local: api/pipelines/hunyuan_video
title: HunyuanVideo
- local: api/pipelines/i2vgenxl
title: I2VGen-XL
- local: api/pipelines/pix2pix
@@ -428,8 +400,6 @@
title: Latte
- local: api/pipelines/ledits_pp
title: LEDITS++
- local: api/pipelines/ltx_video
title: LTX
- local: api/pipelines/lumina
title: Lumina-T2X
- local: api/pipelines/marigold
@@ -450,8 +420,6 @@
title: PixArt-α
- local: api/pipelines/pixart_sigma
title: PixArt-Σ
- local: api/pipelines/sana
title: Sana
- local: api/pipelines/self_attention_guidance
title: Self-Attention Guidance
- local: api/pipelines/semantic_stable_diffusion

View File

@@ -15,135 +15,40 @@ specific language governing permissions and limitations under the License.
An attention processor is a class for applying different types of attention mechanisms.
## AttnProcessor
[[autodoc]] models.attention_processor.AttnProcessor
## AttnProcessor2_0
[[autodoc]] models.attention_processor.AttnProcessor2_0
## AttnAddedKVProcessor
[[autodoc]] models.attention_processor.AttnAddedKVProcessor
## AttnAddedKVProcessor2_0
[[autodoc]] models.attention_processor.AttnAddedKVProcessor2_0
[[autodoc]] models.attention_processor.AttnProcessorNPU
[[autodoc]] models.attention_processor.FusedAttnProcessor2_0
## Allegro
[[autodoc]] models.attention_processor.AllegroAttnProcessor2_0
## AuraFlow
[[autodoc]] models.attention_processor.AuraFlowAttnProcessor2_0
[[autodoc]] models.attention_processor.FusedAuraFlowAttnProcessor2_0
## CogVideoX
[[autodoc]] models.attention_processor.CogVideoXAttnProcessor2_0
[[autodoc]] models.attention_processor.FusedCogVideoXAttnProcessor2_0
## CrossFrameAttnProcessor
[[autodoc]] pipelines.text_to_video_synthesis.pipeline_text_to_video_zero.CrossFrameAttnProcessor
## Custom Diffusion
## CustomDiffusionAttnProcessor
[[autodoc]] models.attention_processor.CustomDiffusionAttnProcessor
## CustomDiffusionAttnProcessor2_0
[[autodoc]] models.attention_processor.CustomDiffusionAttnProcessor2_0
## CustomDiffusionXFormersAttnProcessor
[[autodoc]] models.attention_processor.CustomDiffusionXFormersAttnProcessor
## Flux
[[autodoc]] models.attention_processor.FluxAttnProcessor2_0
[[autodoc]] models.attention_processor.FusedFluxAttnProcessor2_0
[[autodoc]] models.attention_processor.FluxSingleAttnProcessor2_0
## Hunyuan
[[autodoc]] models.attention_processor.HunyuanAttnProcessor2_0
[[autodoc]] models.attention_processor.FusedHunyuanAttnProcessor2_0
[[autodoc]] models.attention_processor.PAGHunyuanAttnProcessor2_0
[[autodoc]] models.attention_processor.PAGCFGHunyuanAttnProcessor2_0
## IdentitySelfAttnProcessor2_0
[[autodoc]] models.attention_processor.PAGIdentitySelfAttnProcessor2_0
[[autodoc]] models.attention_processor.PAGCFGIdentitySelfAttnProcessor2_0
## IP-Adapter
[[autodoc]] models.attention_processor.IPAdapterAttnProcessor
[[autodoc]] models.attention_processor.IPAdapterAttnProcessor2_0
[[autodoc]] models.attention_processor.SD3IPAdapterJointAttnProcessor2_0
## JointAttnProcessor2_0
[[autodoc]] models.attention_processor.JointAttnProcessor2_0
[[autodoc]] models.attention_processor.PAGJointAttnProcessor2_0
[[autodoc]] models.attention_processor.PAGCFGJointAttnProcessor2_0
[[autodoc]] models.attention_processor.FusedJointAttnProcessor2_0
## LoRA
[[autodoc]] models.attention_processor.LoRAAttnProcessor
[[autodoc]] models.attention_processor.LoRAAttnProcessor2_0
[[autodoc]] models.attention_processor.LoRAAttnAddedKVProcessor
[[autodoc]] models.attention_processor.LoRAXFormersAttnProcessor
## Lumina-T2X
[[autodoc]] models.attention_processor.LuminaAttnProcessor2_0
## Mochi
[[autodoc]] models.attention_processor.MochiAttnProcessor2_0
[[autodoc]] models.attention_processor.MochiVaeAttnProcessor2_0
## Sana
[[autodoc]] models.attention_processor.SanaLinearAttnProcessor2_0
[[autodoc]] models.attention_processor.SanaMultiscaleAttnProcessor2_0
[[autodoc]] models.attention_processor.PAGCFGSanaLinearAttnProcessor2_0
[[autodoc]] models.attention_processor.PAGIdentitySanaLinearAttnProcessor2_0
## Stable Audio
[[autodoc]] models.attention_processor.StableAudioAttnProcessor2_0
## FusedAttnProcessor2_0
[[autodoc]] models.attention_processor.FusedAttnProcessor2_0
## SlicedAttnProcessor
[[autodoc]] models.attention_processor.SlicedAttnProcessor
## SlicedAttnAddedKVProcessor
[[autodoc]] models.attention_processor.SlicedAttnAddedKVProcessor
## XFormersAttnProcessor
[[autodoc]] models.attention_processor.XFormersAttnProcessor
[[autodoc]] models.attention_processor.XFormersAttnAddedKVProcessor
## XLAFlashAttnProcessor2_0
[[autodoc]] models.attention_processor.XLAFlashAttnProcessor2_0
## AttnProcessorNPU
[[autodoc]] models.attention_processor.AttnProcessorNPU

View File

@@ -24,12 +24,6 @@ Learn how to load an IP-Adapter checkpoint and image in the IP-Adapter [loading]
[[autodoc]] loaders.ip_adapter.IPAdapterMixin
## SD3IPAdapterMixin
[[autodoc]] loaders.ip_adapter.SD3IPAdapterMixin
- all
- is_ip_adapter_active
## IPAdapterMaskProcessor
[[autodoc]] image_processor.IPAdapterMaskProcessor

View File

@@ -17,9 +17,6 @@ LoRA is a fast and lightweight training method that inserts and trains a signifi
- [`StableDiffusionLoraLoaderMixin`] provides functions for loading and unloading, fusing and unfusing, enabling and disabling, and more functions for managing LoRA weights. This class can be used with any model.
- [`StableDiffusionXLLoraLoaderMixin`] is a [Stable Diffusion (SDXL)](../../api/pipelines/stable_diffusion/stable_diffusion_xl) version of the [`StableDiffusionLoraLoaderMixin`] class for loading and saving LoRA weights. It can only be used with the SDXL model.
- [`SD3LoraLoaderMixin`] provides similar functions for [Stable Diffusion 3](https://huggingface.co/blog/sd3).
- [`FluxLoraLoaderMixin`] provides similar functions for [Flux](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux).
- [`CogVideoXLoraLoaderMixin`] provides similar functions for [CogVideoX](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogvideox).
- [`Mochi1LoraLoaderMixin`] provides similar functions for [Mochi](https://huggingface.co/docs/diffusers/main/en/api/pipelines/mochi).
- [`AmusedLoraLoaderMixin`] is for the [`AmusedPipeline`].
- [`LoraBaseMixin`] provides a base class with several utility methods to fuse, unfuse, unload, LoRAs and more.
@@ -41,18 +38,6 @@ To learn more about how to load LoRA weights, see the [LoRA](../../using-diffuse
[[autodoc]] loaders.lora_pipeline.SD3LoraLoaderMixin
## FluxLoraLoaderMixin
[[autodoc]] loaders.lora_pipeline.FluxLoraLoaderMixin
## CogVideoXLoraLoaderMixin
[[autodoc]] loaders.lora_pipeline.CogVideoXLoraLoaderMixin
## Mochi1LoraLoaderMixin
[[autodoc]] loaders.lora_pipeline.Mochi1LoraLoaderMixin
## AmusedLoraLoaderMixin
[[autodoc]] loaders.lora_pipeline.AmusedLoraLoaderMixin

View File

@@ -1,29 +0,0 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# SD3Transformer2D
This class is useful when *only* loading weights into a [`SD3Transformer2DModel`]. If you need to load weights into the text encoder or a text encoder and SD3Transformer2DModel, check [`SD3LoraLoaderMixin`](lora#diffusers.loaders.SD3LoraLoaderMixin) class instead.
The [`SD3Transformer2DLoadersMixin`] class currently only loads IP-Adapter weights, but will be used in the future to save weights and load LoRAs.
<Tip>
To learn more about how to load LoRA weights, see the [LoRA](../../using-diffusers/loading_adapters#lora) loading guide.
</Tip>
## SD3Transformer2DLoadersMixin
[[autodoc]] loaders.transformer_sd3.SD3Transformer2DLoadersMixin
- all
- _load_ip_adapter_weights

View File

@@ -1,72 +0,0 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# AutoencoderDC
The 2D Autoencoder model used in [SANA](https://huggingface.co/papers/2410.10629) and introduced in [DCAE](https://huggingface.co/papers/2410.10733) by authors Junyu Chen\*, Han Cai\*, Junsong Chen, Enze Xie, Shang Yang, Haotian Tang, Muyang Li, Yao Lu, Song Han from MIT HAN Lab.
The abstract from the paper is:
*We present Deep Compression Autoencoder (DC-AE), a new family of autoencoder models for accelerating high-resolution diffusion models. Existing autoencoder models have demonstrated impressive results at a moderate spatial compression ratio (e.g., 8x), but fail to maintain satisfactory reconstruction accuracy for high spatial compression ratios (e.g., 64x). We address this challenge by introducing two key techniques: (1) Residual Autoencoding, where we design our models to learn residuals based on the space-to-channel transformed features to alleviate the optimization difficulty of high spatial-compression autoencoders; (2) Decoupled High-Resolution Adaptation, an efficient decoupled three-phases training strategy for mitigating the generalization penalty of high spatial-compression autoencoders. With these designs, we improve the autoencoder's spatial compression ratio up to 128 while maintaining the reconstruction quality. Applying our DC-AE to latent diffusion models, we achieve significant speedup without accuracy drop. For example, on ImageNet 512x512, our DC-AE provides 19.1x inference speedup and 17.9x training speedup on H100 GPU for UViT-H while achieving a better FID, compared with the widely used SD-VAE-f8 autoencoder. Our code is available at [this https URL](https://github.com/mit-han-lab/efficientvit).*
The following DCAE models are released and supported in Diffusers.
| Diffusers format | Original format |
|:----------------:|:---------------:|
| [`mit-han-lab/dc-ae-f32c32-sana-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f32c32-sana-1.0-diffusers) | [`mit-han-lab/dc-ae-f32c32-sana-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f32c32-sana-1.0)
| [`mit-han-lab/dc-ae-f32c32-in-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f32c32-in-1.0-diffusers) | [`mit-han-lab/dc-ae-f32c32-in-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f32c32-in-1.0)
| [`mit-han-lab/dc-ae-f32c32-mix-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f32c32-mix-1.0-diffusers) | [`mit-han-lab/dc-ae-f32c32-mix-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f32c32-mix-1.0)
| [`mit-han-lab/dc-ae-f64c128-in-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f64c128-in-1.0-diffusers) | [`mit-han-lab/dc-ae-f64c128-in-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f64c128-in-1.0)
| [`mit-han-lab/dc-ae-f64c128-mix-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f64c128-mix-1.0-diffusers) | [`mit-han-lab/dc-ae-f64c128-mix-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f64c128-mix-1.0)
| [`mit-han-lab/dc-ae-f128c512-in-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-in-1.0-diffusers) | [`mit-han-lab/dc-ae-f128c512-in-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-in-1.0)
| [`mit-han-lab/dc-ae-f128c512-mix-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-mix-1.0-diffusers) | [`mit-han-lab/dc-ae-f128c512-mix-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-mix-1.0)
This model was contributed by [lawrence-cj](https://github.com/lawrence-cj).
Load a model in Diffusers format with [`~ModelMixin.from_pretrained`].
```python
from diffusers import AutoencoderDC
ae = AutoencoderDC.from_pretrained("mit-han-lab/dc-ae-f32c32-sana-1.0-diffusers", torch_dtype=torch.float32).to("cuda")
```
## Load a model in Diffusers via `from_single_file`
```python
from difusers import AutoencoderDC
ckpt_path = "https://huggingface.co/mit-han-lab/dc-ae-f32c32-sana-1.0/blob/main/model.safetensors"
model = AutoencoderDC.from_single_file(ckpt_path)
```
The `AutoencoderDC` model has `in` and `mix` single file checkpoint variants that have matching checkpoint keys, but use different scaling factors. It is not possible for Diffusers to automatically infer the correct config file to use with the model based on just the checkpoint and will default to configuring the model using the `mix` variant config file. To override the automatically determined config, please use the `config` argument when using single file loading with `in` variant checkpoints.
```python
from diffusers import AutoencoderDC
ckpt_path = "https://huggingface.co/mit-han-lab/dc-ae-f128c512-in-1.0/blob/main/model.safetensors"
model = AutoencoderDC.from_single_file(ckpt_path, config="mit-han-lab/dc-ae-f128c512-in-1.0-diffusers")
```
## AutoencoderDC
[[autodoc]] AutoencoderDC
- encode
- decode
- all
## DecoderOutput
[[autodoc]] models.autoencoders.vae.DecoderOutput

View File

@@ -1,32 +0,0 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# AutoencoderKLHunyuanVideo
The 3D variational autoencoder (VAE) model with KL loss used in [HunyuanVideo](https://github.com/Tencent/HunyuanVideo/), which was introduced in [HunyuanVideo: A Systematic Framework For Large Video Generative Models](https://huggingface.co/papers/2412.03603) by Tencent.
The model can be loaded with the following code snippet.
```python
from diffusers import AutoencoderKLHunyuanVideo
vae = AutoencoderKLHunyuanVideo.from_pretrained("hunyuanvideo-community/HunyuanVideo", subfolder="vae", torch_dtype=torch.float16)
```
## AutoencoderKLHunyuanVideo
[[autodoc]] AutoencoderKLHunyuanVideo
- decode
- all
## DecoderOutput
[[autodoc]] models.autoencoders.vae.DecoderOutput

View File

@@ -1,37 +0,0 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# AutoencoderKLLTXVideo
The 3D variational autoencoder (VAE) model with KL loss used in [LTX](https://huggingface.co/Lightricks/LTX-Video) was introduced by Lightricks.
The model can be loaded with the following code snippet.
```python
from diffusers import AutoencoderKLLTXVideo
vae = AutoencoderKLLTXVideo.from_pretrained("TODO/TODO", subfolder="vae", torch_dtype=torch.float32).to("cuda")
```
## AutoencoderKLLTXVideo
[[autodoc]] AutoencoderKLLTXVideo
- decode
- encode
- all
## AutoencoderKLOutput
[[autodoc]] models.autoencoders.autoencoder_kl.AutoencoderKLOutput
## DecoderOutput
[[autodoc]] models.autoencoders.vae.DecoderOutput

View File

@@ -1,35 +0,0 @@
<!--Copyright 2024 The HuggingFace Team and The InstantX Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# ControlNetUnionModel
ControlNetUnionModel is an implementation of ControlNet for Stable Diffusion XL.
The ControlNet model was introduced in [ControlNetPlus](https://github.com/xinsir6/ControlNetPlus) by xinsir6. It supports multiple conditioning inputs without increasing computation.
*We design a new architecture that can support 10+ control types in condition text-to-image generation and can generate high resolution images visually comparable with midjourney. The network is based on the original ControlNet architecture, we propose two new modules to: 1 Extend the original ControlNet to support different image conditions using the same network parameter. 2 Support multiple conditions input without increasing computation offload, which is especially important for designers who want to edit image in detail, different conditions use the same condition encoder, without adding extra computations or parameters.*
## Loading
By default the [`ControlNetUnionModel`] should be loaded with [`~ModelMixin.from_pretrained`].
```py
from diffusers import StableDiffusionXLControlNetUnionPipeline, ControlNetUnionModel
controlnet = ControlNetUnionModel.from_pretrained("xinsir/controlnet-union-sdxl-1.0")
pipe = StableDiffusionXLControlNetUnionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet)
```
## ControlNetUnionModel
[[autodoc]] ControlNetUnionModel

View File

@@ -1,30 +0,0 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# HunyuanVideoTransformer3DModel
A Diffusion Transformer model for 3D video-like data was introduced in [HunyuanVideo: A Systematic Framework For Large Video Generative Models](https://huggingface.co/papers/2412.03603) by Tencent.
The model can be loaded with the following code snippet.
```python
from diffusers import HunyuanVideoTransformer3DModel
transformer = HunyuanVideoTransformer3DModel.from_pretrained("hunyuanvideo-community/HunyuanVideo", subfolder="transformer", torch_dtype=torch.bfloat16)
```
## HunyuanVideoTransformer3DModel
[[autodoc]] HunyuanVideoTransformer3DModel
## Transformer2DModelOutput
[[autodoc]] models.modeling_outputs.Transformer2DModelOutput

View File

@@ -1,30 +0,0 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# LTXVideoTransformer3DModel
A Diffusion Transformer model for 3D data from [LTX](https://huggingface.co/Lightricks/LTX-Video) was introduced by Lightricks.
The model can be loaded with the following code snippet.
```python
from diffusers import LTXVideoTransformer3DModel
transformer = LTXVideoTransformer3DModel.from_pretrained("TODO/TODO", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")
```
## LTXVideoTransformer3DModel
[[autodoc]] LTXVideoTransformer3DModel
## Transformer2DModelOutput
[[autodoc]] models.modeling_outputs.Transformer2DModelOutput

View File

@@ -1,34 +0,0 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# SanaTransformer2DModel
A Diffusion Transformer model for 2D data from [SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers](https://huggingface.co/papers/2410.10629) was introduced from NVIDIA and MIT HAN Lab, by Enze Xie, Junsong Chen, Junyu Chen, Han Cai, Haotian Tang, Yujun Lin, Zhekai Zhang, Muyang Li, Ligeng Zhu, Yao Lu, Song Han.
The abstract from the paper is:
*We introduce Sana, a text-to-image framework that can efficiently generate images up to 4096×4096 resolution. Sana can synthesize high-resolution, high-quality images with strong text-image alignment at a remarkably fast speed, deployable on laptop GPU. Core designs include: (1) Deep compression autoencoder: unlike traditional AEs, which compress images only 8×, we trained an AE that can compress images 32×, effectively reducing the number of latent tokens. (2) Linear DiT: we replace all vanilla attention in DiT with linear attention, which is more efficient at high resolutions without sacrificing quality. (3) Decoder-only text encoder: we replaced T5 with modern decoder-only small LLM as the text encoder and designed complex human instruction with in-context learning to enhance the image-text alignment. (4) Efficient training and sampling: we propose Flow-DPM-Solver to reduce sampling steps, with efficient caption labeling and selection to accelerate convergence. As a result, Sana-0.6B is very competitive with modern giant diffusion model (e.g. Flux-12B), being 20 times smaller and 100+ times faster in measured throughput. Moreover, Sana-0.6B can be deployed on a 16GB laptop GPU, taking less than 1 second to generate a 1024×1024 resolution image. Sana enables content creation at low cost. Code and model will be publicly released.*
The model can be loaded with the following code snippet.
```python
from diffusers import SanaTransformer2DModel
transformer = SanaTransformer2DModel.from_pretrained("Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers", subfolder="transformer", torch_dtype=torch.bfloat16)
```
## SanaTransformer2DModel
[[autodoc]] SanaTransformer2DModel
## Transformer2DModelOutput
[[autodoc]] models.modeling_outputs.Transformer2DModelOutput

View File

@@ -29,32 +29,16 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.m
This pipeline was contributed by [zRzRzRzRzRzRzR](https://github.com/zRzRzRzRzRzRzR). The original codebase can be found [here](https://huggingface.co/THUDM). The original weights can be found under [hf.co/THUDM](https://huggingface.co/THUDM).
There are three official CogVideoX checkpoints for text-to-video and video-to-video.
There are two models available that can be used with the text-to-video and video-to-video CogVideoX pipelines:
- [`THUDM/CogVideoX-2b`](https://huggingface.co/THUDM/CogVideoX-2b): The recommended dtype for running this model is `fp16`.
- [`THUDM/CogVideoX-5b`](https://huggingface.co/THUDM/CogVideoX-5b): The recommended dtype for running this model is `bf16`.
| checkpoints | recommended inference dtype |
|:---:|:---:|
| [`THUDM/CogVideoX-2b`](https://huggingface.co/THUDM/CogVideoX-2b) | torch.float16 |
| [`THUDM/CogVideoX-5b`](https://huggingface.co/THUDM/CogVideoX-5b) | torch.bfloat16 |
| [`THUDM/CogVideoX1.5-5b`](https://huggingface.co/THUDM/CogVideoX1.5-5b) | torch.bfloat16 |
There is one model available that can be used with the image-to-video CogVideoX pipeline:
- [`THUDM/CogVideoX-5b-I2V`](https://huggingface.co/THUDM/CogVideoX-5b-I2V): The recommended dtype for running this model is `bf16`.
There are two official CogVideoX checkpoints available for image-to-video.
| checkpoints | recommended inference dtype |
|:---:|:---:|
| [`THUDM/CogVideoX-5b-I2V`](https://huggingface.co/THUDM/CogVideoX-5b-I2V) | torch.bfloat16 |
| [`THUDM/CogVideoX-1.5-5b-I2V`](https://huggingface.co/THUDM/CogVideoX-1.5-5b-I2V) | torch.bfloat16 |
For the CogVideoX 1.5 series:
- Text-to-video (T2V) works best at a resolution of 1360x768 because it was trained with that specific resolution.
- Image-to-video (I2V) works for multiple resolutions. The width can vary from 768 to 1360, but the height must be 768. The height/width must be divisible by 16.
- Both T2V and I2V models support generation with 81 and 161 frames and work best at this value. Exporting videos at 16 FPS is recommended.
There are two official CogVideoX checkpoints that support pose controllable generation (by the [Alibaba-PAI](https://huggingface.co/alibaba-pai) team).
| checkpoints | recommended inference dtype |
|:---:|:---:|
| [`alibaba-pai/CogVideoX-Fun-V1.1-2b-Pose`](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-2b-Pose) | torch.bfloat16 |
| [`alibaba-pai/CogVideoX-Fun-V1.1-5b-Pose`](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-5b-Pose) | torch.bfloat16 |
There are two models that support pose controllable generation (by the [Alibaba-PAI](https://huggingface.co/alibaba-pai) team):
- [`alibaba-pai/CogVideoX-Fun-V1.1-2b-Pose`](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-2b-Pose): The recommended dtype for running this model is `bf16`.
- [`alibaba-pai/CogVideoX-Fun-V1.1-5b-Pose`](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-5b-Pose): The recommended dtype for running this model is `bf16`.
## Inference

View File

@@ -1,89 +0,0 @@
<!--Copyright 2024 The HuggingFace Team, The Black Forest Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# FluxControlInpaint
FluxControlInpaintPipeline is an implementation of Inpainting for Flux.1 Depth/Canny models. It is a pipeline that allows you to inpaint images using the Flux.1 Depth/Canny models. The pipeline takes an image and a mask as input and returns the inpainted image.
FLUX.1 Depth and Canny [dev] is a 12 billion parameter rectified flow transformer capable of generating an image based on a text description while following the structure of a given input image. **This is not a ControlNet model**.
| Control type | Developer | Link |
| -------- | ---------- | ---- |
| Depth | [Black Forest Labs](https://huggingface.co/black-forest-labs) | [Link](https://huggingface.co/black-forest-labs/FLUX.1-Depth-dev) |
| Canny | [Black Forest Labs](https://huggingface.co/black-forest-labs) | [Link](https://huggingface.co/black-forest-labs/FLUX.1-Canny-dev) |
<Tip>
Flux can be quite expensive to run on consumer hardware devices. However, you can perform a suite of optimizations to run it faster and in a more memory-friendly manner. Check out [this section](https://huggingface.co/blog/sd3#memory-optimizations-for-sd3) for more details. Additionally, Flux can benefit from quantization for memory efficiency with a trade-off in inference latency. Refer to [this blog post](https://huggingface.co/blog/quanto-diffusers) to learn more. For an exhaustive list of resources, check out [this gist](https://gist.github.com/sayakpaul/b664605caf0aa3bf8585ab109dd5ac9c).
</Tip>
```python
import torch
from diffusers import FluxControlInpaintPipeline
from diffusers.models.transformers import FluxTransformer2DModel
from transformers import T5EncoderModel
from diffusers.utils import load_image, make_image_grid
from image_gen_aux import DepthPreprocessor # https://github.com/huggingface/image_gen_aux
from PIL import Image
import numpy as np
pipe = FluxControlInpaintPipeline.from_pretrained(
"black-forest-labs/FLUX.1-Depth-dev",
torch_dtype=torch.bfloat16,
)
# use following lines if you have GPU constraints
# ---------------------------------------------------------------
transformer = FluxTransformer2DModel.from_pretrained(
"sayakpaul/FLUX.1-Depth-dev-nf4", subfolder="transformer", torch_dtype=torch.bfloat16
)
text_encoder_2 = T5EncoderModel.from_pretrained(
"sayakpaul/FLUX.1-Depth-dev-nf4", subfolder="text_encoder_2", torch_dtype=torch.bfloat16
)
pipe.transformer = transformer
pipe.text_encoder_2 = text_encoder_2
pipe.enable_model_cpu_offload()
# ---------------------------------------------------------------
pipe.to("cuda")
prompt = "a blue robot singing opera with human-like expressions"
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")
head_mask = np.zeros_like(image)
head_mask[65:580,300:642] = 255
mask_image = Image.fromarray(head_mask)
processor = DepthPreprocessor.from_pretrained("LiheYoung/depth-anything-large-hf")
control_image = processor(image)[0].convert("RGB")
output = pipe(
prompt=prompt,
image=image,
control_image=control_image,
mask_image=mask_image,
num_inference_steps=30,
strength=0.9,
guidance_scale=10.0,
generator=torch.Generator().manual_seed(42),
).images[0]
make_image_grid([image, control_image, mask_image, output.resize(image.size)], rows=1, cols=4).save("output.png")
```
## FluxControlInpaintPipeline
[[autodoc]] FluxControlInpaintPipeline
- all
- __call__
## FluxPipelineOutput
[[autodoc]] pipelines.flux.pipeline_output.FluxPipelineOutput

View File

@@ -28,7 +28,6 @@ This controlnet code is mainly implemented by [The InstantX Team](https://huggin
| ControlNet type | Developer | Link |
| -------- | ---------- | ---- |
| Canny | [The InstantX Team](https://huggingface.co/InstantX) | [Link](https://huggingface.co/InstantX/SD3-Controlnet-Canny) |
| Depth | [The InstantX Team](https://huggingface.co/InstantX) | [Link](https://huggingface.co/InstantX/SD3-Controlnet-Depth) |
| Pose | [The InstantX Team](https://huggingface.co/InstantX) | [Link](https://huggingface.co/InstantX/SD3-Controlnet-Pose) |
| Tile | [The InstantX Team](https://huggingface.co/InstantX) | [Link](https://huggingface.co/InstantX/SD3-Controlnet-Tile) |
| Inpainting | [The AlimamaCreative Team](https://huggingface.co/alimama-creative) | [link](https://huggingface.co/alimama-creative/SD3-Controlnet-Inpainting) |

View File

@@ -1,35 +0,0 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# ControlNetUnion
ControlNetUnionModel is an implementation of ControlNet for Stable Diffusion XL.
The ControlNet model was introduced in [ControlNetPlus](https://github.com/xinsir6/ControlNetPlus) by xinsir6. It supports multiple conditioning inputs without increasing computation.
*We design a new architecture that can support 10+ control types in condition text-to-image generation and can generate high resolution images visually comparable with midjourney. The network is based on the original ControlNet architecture, we propose two new modules to: 1 Extend the original ControlNet to support different image conditions using the same network parameter. 2 Support multiple conditions input without increasing computation offload, which is especially important for designers who want to edit image in detail, different conditions use the same condition encoder, without adding extra computations or parameters.*
## StableDiffusionXLControlNetUnionPipeline
[[autodoc]] StableDiffusionXLControlNetUnionPipeline
- all
- __call__
## StableDiffusionXLControlNetUnionImg2ImgPipeline
[[autodoc]] StableDiffusionXLControlNetUnionImg2ImgPipeline
- all
- __call__
## StableDiffusionXLControlNetUnionInpaintPipeline
[[autodoc]] StableDiffusionXLControlNetUnionInpaintPipeline
- all
- __call__

View File

@@ -22,20 +22,12 @@ Flux can be quite expensive to run on consumer hardware devices. However, you ca
</Tip>
Flux comes in the following variants:
Flux comes in two variants:
| model type | model id |
|:----------:|:--------:|
| Timestep-distilled | [`black-forest-labs/FLUX.1-schnell`](https://huggingface.co/black-forest-labs/FLUX.1-schnell) |
| Guidance-distilled | [`black-forest-labs/FLUX.1-dev`](https://huggingface.co/black-forest-labs/FLUX.1-dev) |
| Fill Inpainting/Outpainting (Guidance-distilled) | [`black-forest-labs/FLUX.1-Fill-dev`](https://huggingface.co/black-forest-labs/FLUX.1-Fill-dev) |
| Canny Control (Guidance-distilled) | [`black-forest-labs/FLUX.1-Canny-dev`](https://huggingface.co/black-forest-labs/FLUX.1-Canny-dev) |
| Depth Control (Guidance-distilled) | [`black-forest-labs/FLUX.1-Depth-dev`](https://huggingface.co/black-forest-labs/FLUX.1-Depth-dev) |
| Canny Control (LoRA) | [`black-forest-labs/FLUX.1-Canny-dev-lora`](https://huggingface.co/black-forest-labs/FLUX.1-Canny-dev-lora) |
| Depth Control (LoRA) | [`black-forest-labs/FLUX.1-Depth-dev-lora`](https://huggingface.co/black-forest-labs/FLUX.1-Depth-dev-lora) |
| Redux (Adapter) | [`black-forest-labs/FLUX.1-Redux-dev`](https://huggingface.co/black-forest-labs/FLUX.1-Redux-dev) |
* Timestep-distilled (`black-forest-labs/FLUX.1-schnell`)
* Guidance-distilled (`black-forest-labs/FLUX.1-dev`)
All checkpoints have different usage which we detail below.
Both checkpoints have slightly difference usage which we detail below.
### Timestep-distilled
@@ -85,228 +77,7 @@ out = pipe(
out.save("image.png")
```
### Fill Inpainting/Outpainting
* Flux Fill pipeline does not require `strength` as an input like regular inpainting pipelines.
* It supports both inpainting and outpainting.
```python
import torch
from diffusers import FluxFillPipeline
from diffusers.utils import load_image
image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/cup.png")
mask = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/cup_mask.png")
repo_id = "black-forest-labs/FLUX.1-Fill-dev"
pipe = FluxFillPipeline.from_pretrained(repo_id, torch_dtype=torch.bfloat16).to("cuda")
image = pipe(
prompt="a white paper cup",
image=image,
mask_image=mask,
height=1632,
width=1232,
max_sequence_length=512,
generator=torch.Generator("cpu").manual_seed(0)
).images[0]
image.save(f"output.png")
```
### Canny Control
**Note:** `black-forest-labs/Flux.1-Canny-dev` is _not_ a [`ControlNetModel`] model. ControlNet models are a separate component from the UNet/Transformer whose residuals are added to the actual underlying model. Canny Control is an alternate architecture that achieves effectively the same results as a ControlNet model would, by using channel-wise concatenation with input control condition and ensuring the transformer learns structure control by following the condition as closely as possible.
```python
# !pip install -U controlnet-aux
import torch
from controlnet_aux import CannyDetector
from diffusers import FluxControlPipeline
from diffusers.utils import load_image
pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-Canny-dev", torch_dtype=torch.bfloat16).to("cuda")
prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")
processor = CannyDetector()
control_image = processor(control_image, low_threshold=50, high_threshold=200, detect_resolution=1024, image_resolution=1024)
image = pipe(
prompt=prompt,
control_image=control_image,
height=1024,
width=1024,
num_inference_steps=50,
guidance_scale=30.0,
).images[0]
image.save("output.png")
```
Canny Control is also possible with a LoRA variant of this condition. The usage is as follows:
```python
# !pip install -U controlnet-aux
import torch
from controlnet_aux import CannyDetector
from diffusers import FluxControlPipeline
from diffusers.utils import load_image
pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16).to("cuda")
pipe.load_lora_weights("black-forest-labs/FLUX.1-Canny-dev-lora")
prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")
processor = CannyDetector()
control_image = processor(control_image, low_threshold=50, high_threshold=200, detect_resolution=1024, image_resolution=1024)
image = pipe(
prompt=prompt,
control_image=control_image,
height=1024,
width=1024,
num_inference_steps=50,
guidance_scale=30.0,
).images[0]
image.save("output.png")
```
### Depth Control
**Note:** `black-forest-labs/Flux.1-Depth-dev` is _not_ a ControlNet model. [`ControlNetModel`] models are a separate component from the UNet/Transformer whose residuals are added to the actual underlying model. Depth Control is an alternate architecture that achieves effectively the same results as a ControlNet model would, by using channel-wise concatenation with input control condition and ensuring the transformer learns structure control by following the condition as closely as possible.
```python
# !pip install git+https://github.com/huggingface/image_gen_aux
import torch
from diffusers import FluxControlPipeline, FluxTransformer2DModel
from diffusers.utils import load_image
from image_gen_aux import DepthPreprocessor
pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-Depth-dev", torch_dtype=torch.bfloat16).to("cuda")
prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")
processor = DepthPreprocessor.from_pretrained("LiheYoung/depth-anything-large-hf")
control_image = processor(control_image)[0].convert("RGB")
image = pipe(
prompt=prompt,
control_image=control_image,
height=1024,
width=1024,
num_inference_steps=30,
guidance_scale=10.0,
generator=torch.Generator().manual_seed(42),
).images[0]
image.save("output.png")
```
Depth Control is also possible with a LoRA variant of this condition. The usage is as follows:
```python
# !pip install git+https://github.com/huggingface/image_gen_aux
import torch
from diffusers import FluxControlPipeline, FluxTransformer2DModel
from diffusers.utils import load_image
from image_gen_aux import DepthPreprocessor
pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16).to("cuda")
pipe.load_lora_weights("black-forest-labs/FLUX.1-Depth-dev-lora")
prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")
processor = DepthPreprocessor.from_pretrained("LiheYoung/depth-anything-large-hf")
control_image = processor(control_image)[0].convert("RGB")
image = pipe(
prompt=prompt,
control_image=control_image,
height=1024,
width=1024,
num_inference_steps=30,
guidance_scale=10.0,
generator=torch.Generator().manual_seed(42),
).images[0]
image.save("output.png")
```
### Redux
* Flux Redux pipeline is an adapter for FLUX.1 base models. It can be used with both flux-dev and flux-schnell, for image-to-image generation.
* You can first use the `FluxPriorReduxPipeline` to get the `prompt_embeds` and `pooled_prompt_embeds`, and then feed them into the `FluxPipeline` for image-to-image generation.
* When use `FluxPriorReduxPipeline` with a base pipeline, you can set `text_encoder=None` and `text_encoder_2=None` in the base pipeline, in order to save VRAM.
```python
import torch
from diffusers import FluxPriorReduxPipeline, FluxPipeline
from diffusers.utils import load_image
device = "cuda"
dtype = torch.bfloat16
repo_redux = "black-forest-labs/FLUX.1-Redux-dev"
repo_base = "black-forest-labs/FLUX.1-dev"
pipe_prior_redux = FluxPriorReduxPipeline.from_pretrained(repo_redux, torch_dtype=dtype).to(device)
pipe = FluxPipeline.from_pretrained(
repo_base,
text_encoder=None,
text_encoder_2=None,
torch_dtype=torch.bfloat16
).to(device)
image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/style_ziggy/img5.png")
pipe_prior_output = pipe_prior_redux(image)
images = pipe(
guidance_scale=2.5,
num_inference_steps=50,
generator=torch.Generator("cpu").manual_seed(0),
**pipe_prior_output,
).images
images[0].save("flux-redux.png")
```
## Combining Flux Turbo LoRAs with Flux Control, Fill, and Redux
We can combine Flux Turbo LoRAs with Flux Control and other pipelines like Fill and Redux to enable few-steps' inference. The example below shows how to do that for Flux Control LoRA for depth and turbo LoRA from [`ByteDance/Hyper-SD`](https://hf.co/ByteDance/Hyper-SD).
```py
from diffusers import FluxControlPipeline
from image_gen_aux import DepthPreprocessor
from diffusers.utils import load_image
from huggingface_hub import hf_hub_download
import torch
control_pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
control_pipe.load_lora_weights("black-forest-labs/FLUX.1-Depth-dev-lora", adapter_name="depth")
control_pipe.load_lora_weights(
hf_hub_download("ByteDance/Hyper-SD", "Hyper-FLUX.1-dev-8steps-lora.safetensors"), adapter_name="hyper-sd"
)
control_pipe.set_adapters(["depth", "hyper-sd"], adapter_weights=[0.85, 0.125])
control_pipe.enable_model_cpu_offload()
prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")
processor = DepthPreprocessor.from_pretrained("LiheYoung/depth-anything-large-hf")
control_image = processor(control_image)[0].convert("RGB")
image = control_pipe(
prompt=prompt,
control_image=control_image,
height=1024,
width=1024,
num_inference_steps=8,
guidance_scale=10.0,
generator=torch.Generator().manual_seed(42),
).images[0]
image.save("output.png")
```
## Running FP16 inference
Flux can generate high-quality images with FP16 (i.e. to accelerate inference on Turing/Volta GPUs) but produces different outputs compared to FP32/BF16. The issue is that some activations in the text encoders have to be clipped when running in FP16, which affects the overall image. Forcing text encoders to run with FP32 inference thus removes this output difference. See [here](https://github.com/huggingface/diffusers/pull/9097#issuecomment-2272292516) for details.
FP16 inference code:
@@ -417,27 +188,3 @@ image.save("flux-fp8-dev.png")
[[autodoc]] FluxControlNetImg2ImgPipeline
- all
- __call__
## FluxControlPipeline
[[autodoc]] FluxControlPipeline
- all
- __call__
## FluxControlImg2ImgPipeline
[[autodoc]] FluxControlImg2ImgPipeline
- all
- __call__
## FluxPriorReduxPipeline
[[autodoc]] FluxPriorReduxPipeline
- all
- __call__
## FluxFillPipeline
[[autodoc]] FluxFillPipeline
- all
- __call__

View File

@@ -1,43 +0,0 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License. -->
# HunyuanVideo
[HunyuanVideo](https://www.arxiv.org/abs/2412.03603) by Tencent.
*Recent advancements in video generation have significantly impacted daily life for both individuals and industries. However, the leading video generation models remain closed-source, resulting in a notable performance gap between industry capabilities and those available to the public. In this report, we introduce HunyuanVideo, an innovative open-source video foundation model that demonstrates performance in video generation comparable to, or even surpassing, that of leading closed-source models. HunyuanVideo encompasses a comprehensive framework that integrates several key elements, including data curation, advanced architectural design, progressive model scaling and training, and an efficient infrastructure tailored for large-scale model training and inference. As a result, we successfully trained a video generative model with over 13 billion parameters, making it the largest among all open-source models. We conducted extensive experiments and implemented a series of targeted designs to ensure high visual quality, motion dynamics, text-video alignment, and advanced filming techniques. According to evaluations by professionals, HunyuanVideo outperforms previous state-of-the-art models, including Runway Gen-3, Luma 1.6, and three top-performing Chinese video generative models. By releasing the code for the foundation model and its applications, we aim to bridge the gap between closed-source and open-source communities. This initiative will empower individuals within the community to experiment with their ideas, fostering a more dynamic and vibrant video generation ecosystem. The code is publicly available at [this https URL](https://github.com/Tencent/HunyuanVideo).*
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
Recommendations for inference:
- Both text encoders should be in `torch.float16`.
- Transformer should be in `torch.bfloat16`.
- VAE should be in `torch.float16`.
- `num_frames` should be of the form `4 * k + 1`, for example `49` or `129`.
- For smaller resolution videos, try lower values of `shift` (between `2.0` to `5.0`) in the [Scheduler](https://huggingface.co/docs/diffusers/main/en/api/schedulers/flow_match_euler_discrete#diffusers.FlowMatchEulerDiscreteScheduler.shift). For larger resolution images, try higher values (between `7.0` and `12.0`). The default value is `7.0` for HunyuanVideo.
- For more information about supported resolutions and other details, please refer to the original repository [here](https://github.com/Tencent/HunyuanVideo/).
## HunyuanVideoPipeline
[[autodoc]] HunyuanVideoPipeline
- all
- __call__
## HunyuanVideoPipelineOutput
[[autodoc]] pipelines.hunyuan_video.pipeline_output.HunyuanVideoPipelineOutput

View File

@@ -1,118 +0,0 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License. -->
# LTX
[LTX Video](https://huggingface.co/Lightricks/LTX-Video) is the first DiT-based video generation model capable of generating high-quality videos in real-time. It produces 24 FPS videos at a 768x512 resolution faster than they can be watched. Trained on a large-scale dataset of diverse videos, the model generates high-resolution videos with realistic and varied content. We provide a model for both text-to-video as well as image + text-to-video usecases.
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
## Loading Single Files
Loading the original LTX Video checkpoints is also possible with [`~ModelMixin.from_single_file`].
```python
import torch
from diffusers import AutoencoderKLLTXVideo, LTXImageToVideoPipeline, LTXVideoTransformer3DModel
single_file_url = "https://huggingface.co/Lightricks/LTX-Video/ltx-video-2b-v0.9.safetensors"
transformer = LTXVideoTransformer3DModel.from_single_file(
single_file_url, torch_dtype=torch.bfloat16
)
vae = AutoencoderKLLTXVideo.from_single_file(single_file_url, torch_dtype=torch.bfloat16)
pipe = LTXImageToVideoPipeline.from_pretrained(
"Lightricks/LTX-Video", transformer=transformer, vae=vae, torch_dtype=torch.bfloat16
)
# ... inference code ...
```
Alternatively, the pipeline can be used to load the weights with [`~FromSingleFileMixin.from_single_file`].
```python
import torch
from diffusers import LTXImageToVideoPipeline
from transformers import T5EncoderModel, T5Tokenizer
single_file_url = "https://huggingface.co/Lightricks/LTX-Video/ltx-video-2b-v0.9.safetensors"
text_encoder = T5EncoderModel.from_pretrained(
"Lightricks/LTX-Video", subfolder="text_encoder", torch_dtype=torch.bfloat16
)
tokenizer = T5Tokenizer.from_pretrained(
"Lightricks/LTX-Video", subfolder="tokenizer", torch_dtype=torch.bfloat16
)
pipe = LTXImageToVideoPipeline.from_single_file(
single_file_url, text_encoder=text_encoder, tokenizer=tokenizer, torch_dtype=torch.bfloat16
)
```
Loading [LTX GGUF checkpoints](https://huggingface.co/city96/LTX-Video-gguf) are also supported:
```py
import torch
from diffusers.utils import export_to_video
from diffusers import LTXPipeline, LTXVideoTransformer3DModel, GGUFQuantizationConfig
ckpt_path = (
"https://huggingface.co/city96/LTX-Video-gguf/blob/main/ltx-video-2b-v0.9-Q3_K_S.gguf"
)
transformer = LTXVideoTransformer3DModel.from_single_file(
ckpt_path,
quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
torch_dtype=torch.bfloat16,
)
pipe = LTXPipeline.from_pretrained(
"Lightricks/LTX-Video",
transformer=transformer,
torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()
prompt = "A woman with long brown hair and light skin smiles at another woman with long blonde hair. The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek. The camera angle is a close-up, focused on the woman with brown hair's face. The lighting is warm and natural, likely from the setting sun, casting a soft glow on the scene. The scene appears to be real-life footage"
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"
video = pipe(
prompt=prompt,
negative_prompt=negative_prompt,
width=704,
height=480,
num_frames=161,
num_inference_steps=50,
).frames[0]
export_to_video(video, "output_gguf_ltx.mp4", fps=24)
```
Make sure to read the [documentation on GGUF](../../quantization/gguf) to learn more about our GGUF support.
Refer to [this section](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogvideox#memory-optimization) to learn more about optimizing memory consumption.
## LTXPipeline
[[autodoc]] LTXPipeline
- all
- __call__
## LTXImageToVideoPipeline
[[autodoc]] LTXImageToVideoPipeline
- all
- __call__
## LTXPipelineOutput
[[autodoc]] pipelines.ltx.pipeline_output.LTXPipelineOutput

View File

@@ -13,7 +13,7 @@
# limitations under the License.
-->
# Mochi 1 Preview
# Mochi
[Mochi 1 Preview](https://huggingface.co/genmo/mochi-1-preview) from Genmo.
@@ -25,201 +25,6 @@ Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.m
</Tip>
## Generating videos with Mochi-1 Preview
The following example will download the full precision `mochi-1-preview` weights and produce the highest quality results but will require at least 42GB VRAM to run.
```python
import torch
from diffusers import MochiPipeline
from diffusers.utils import export_to_video
pipe = MochiPipeline.from_pretrained("genmo/mochi-1-preview")
# Enable memory savings
pipe.enable_model_cpu_offload()
pipe.enable_vae_tiling()
prompt = "Close-up of a chameleon's eye, with its scaly skin changing color. Ultra high resolution 4k."
with torch.autocast("cuda", torch.bfloat16, cache_enabled=False):
frames = pipe(prompt, num_frames=85).frames[0]
export_to_video(frames, "mochi.mp4", fps=30)
```
## Using a lower precision variant to save memory
The following example will use the `bfloat16` variant of the model and requires 22GB VRAM to run. There is a slight drop in the quality of the generated video as a result.
```python
import torch
from diffusers import MochiPipeline
from diffusers.utils import export_to_video
pipe = MochiPipeline.from_pretrained("genmo/mochi-1-preview", variant="bf16", torch_dtype=torch.bfloat16)
# Enable memory savings
pipe.enable_model_cpu_offload()
pipe.enable_vae_tiling()
prompt = "Close-up of a chameleon's eye, with its scaly skin changing color. Ultra high resolution 4k."
frames = pipe(prompt, num_frames=85).frames[0]
export_to_video(frames, "mochi.mp4", fps=30)
```
## Reproducing the results from the Genmo Mochi repo
The [Genmo Mochi implementation](https://github.com/genmoai/mochi/tree/main) uses different precision values for each stage in the inference process. The text encoder and VAE use `torch.float32`, while the DiT uses `torch.bfloat16` with the [attention kernel](https://pytorch.org/docs/stable/generated/torch.nn.attention.sdpa_kernel.html#torch.nn.attention.sdpa_kernel) set to `EFFICIENT_ATTENTION`. Diffusers pipelines currently do not support setting different `dtypes` for different stages of the pipeline. In order to run inference in the same way as the the original implementation, please refer to the following example.
<Tip>
The original Mochi implementation zeros out empty prompts. However, enabling this option and placing the entire pipeline under autocast can lead to numerical overflows with the T5 text encoder.
When enabling `force_zeros_for_empty_prompt`, it is recommended to run the text encoding step outside the autocast context in full precision.
</Tip>
<Tip>
Decoding the latents in full precision is very memory intensive. You will need at least 70GB VRAM to generate the 163 frames in this example. To reduce memory, either reduce the number of frames or run the decoding step in `torch.bfloat16`.
</Tip>
```python
import torch
from torch.nn.attention import SDPBackend, sdpa_kernel
from diffusers import MochiPipeline
from diffusers.utils import export_to_video
from diffusers.video_processor import VideoProcessor
pipe = MochiPipeline.from_pretrained("genmo/mochi-1-preview", force_zeros_for_empty_prompt=True)
pipe.enable_vae_tiling()
pipe.enable_model_cpu_offload()
prompt = "An aerial shot of a parade of elephants walking across the African savannah. The camera showcases the herd and the surrounding landscape."
with torch.no_grad():
prompt_embeds, prompt_attention_mask, negative_prompt_embeds, negative_prompt_attention_mask = (
pipe.encode_prompt(prompt=prompt)
)
with torch.autocast("cuda", torch.bfloat16):
with sdpa_kernel(SDPBackend.EFFICIENT_ATTENTION):
frames = pipe(
prompt_embeds=prompt_embeds,
prompt_attention_mask=prompt_attention_mask,
negative_prompt_embeds=negative_prompt_embeds,
negative_prompt_attention_mask=negative_prompt_attention_mask,
guidance_scale=4.5,
num_inference_steps=64,
height=480,
width=848,
num_frames=163,
generator=torch.Generator("cuda").manual_seed(0),
output_type="latent",
return_dict=False,
)[0]
video_processor = VideoProcessor(vae_scale_factor=8)
has_latents_mean = hasattr(pipe.vae.config, "latents_mean") and pipe.vae.config.latents_mean is not None
has_latents_std = hasattr(pipe.vae.config, "latents_std") and pipe.vae.config.latents_std is not None
if has_latents_mean and has_latents_std:
latents_mean = (
torch.tensor(pipe.vae.config.latents_mean).view(1, 12, 1, 1, 1).to(frames.device, frames.dtype)
)
latents_std = (
torch.tensor(pipe.vae.config.latents_std).view(1, 12, 1, 1, 1).to(frames.device, frames.dtype)
)
frames = frames * latents_std / pipe.vae.config.scaling_factor + latents_mean
else:
frames = frames / pipe.vae.config.scaling_factor
with torch.no_grad():
video = pipe.vae.decode(frames.to(pipe.vae.dtype), return_dict=False)[0]
video = video_processor.postprocess_video(video)[0]
export_to_video(video, "mochi.mp4", fps=30)
```
## Running inference with multiple GPUs
It is possible to split the large Mochi transformer across multiple GPUs using the `device_map` and `max_memory` options in `from_pretrained`. In the following example we split the model across two GPUs, each with 24GB of VRAM.
```python
import torch
from diffusers import MochiPipeline, MochiTransformer3DModel
from diffusers.utils import export_to_video
model_id = "genmo/mochi-1-preview"
transformer = MochiTransformer3DModel.from_pretrained(
model_id,
subfolder="transformer",
device_map="auto",
max_memory={0: "24GB", 1: "24GB"}
)
pipe = MochiPipeline.from_pretrained(model_id, transformer=transformer)
pipe.enable_model_cpu_offload()
pipe.enable_vae_tiling()
with torch.autocast(device_type="cuda", dtype=torch.bfloat16, cache_enabled=False):
frames = pipe(
prompt="Close-up of a chameleon's eye, with its scaly skin changing color. Ultra high resolution 4k.",
negative_prompt="",
height=480,
width=848,
num_frames=85,
num_inference_steps=50,
guidance_scale=4.5,
num_videos_per_prompt=1,
generator=torch.Generator(device="cuda").manual_seed(0),
max_sequence_length=256,
output_type="pil",
).frames[0]
export_to_video(frames, "output.mp4", fps=30)
```
## Using single file loading with the Mochi Transformer
You can use `from_single_file` to load the Mochi transformer in its original format.
<Tip>
Diffusers currently doesn't support using the FP8 scaled versions of the Mochi single file checkpoints.
</Tip>
```python
import torch
from diffusers import MochiPipeline, MochiTransformer3DModel
from diffusers.utils import export_to_video
model_id = "genmo/mochi-1-preview"
ckpt_path = "https://huggingface.co/Comfy-Org/mochi_preview_repackaged/blob/main/split_files/diffusion_models/mochi_preview_bf16.safetensors"
transformer = MochiTransformer3DModel.from_pretrained(ckpt_path, torch_dtype=torch.bfloat16)
pipe = MochiPipeline.from_pretrained(model_id, transformer=transformer)
pipe.enable_model_cpu_offload()
pipe.enable_vae_tiling()
with torch.autocast(device_type="cuda", dtype=torch.bfloat16, cache_enabled=False):
frames = pipe(
prompt="Close-up of a chameleon's eye, with its scaly skin changing color. Ultra high resolution 4k.",
negative_prompt="",
height=480,
width=848,
num_frames=85,
num_inference_steps=50,
guidance_scale=4.5,
num_videos_per_prompt=1,
generator=torch.Generator(device="cuda").manual_seed(0),
max_sequence_length=256,
output_type="pil",
).frames[0]
export_to_video(frames, "output.mp4", fps=30)
```
## MochiPipeline
[[autodoc]] MochiPipeline

View File

@@ -48,11 +48,6 @@ Since RegEx is supported as a way for matching layer identifiers, it is crucial
- all
- __call__
## StableDiffusionPAGInpaintPipeline
[[autodoc]] StableDiffusionPAGInpaintPipeline
- all
- __call__
## StableDiffusionPAGPipeline
[[autodoc]] StableDiffusionPAGPipeline
- all
@@ -101,10 +96,6 @@ Since RegEx is supported as a way for matching layer identifiers, it is crucial
- all
- __call__
## StableDiffusion3PAGImg2ImgPipeline
[[autodoc]] StableDiffusion3PAGImg2ImgPipeline
- all
- __call__
## PixArtSigmaPAGPipeline
[[autodoc]] PixArtSigmaPAGPipeline

View File

@@ -1,67 +0,0 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License. -->
# SanaPipeline
[SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers](https://huggingface.co/papers/2410.10629) from NVIDIA and MIT HAN Lab, by Enze Xie, Junsong Chen, Junyu Chen, Han Cai, Haotian Tang, Yujun Lin, Zhekai Zhang, Muyang Li, Ligeng Zhu, Yao Lu, Song Han.
The abstract from the paper is:
*We introduce Sana, a text-to-image framework that can efficiently generate images up to 4096×4096 resolution. Sana can synthesize high-resolution, high-quality images with strong text-image alignment at a remarkably fast speed, deployable on laptop GPU. Core designs include: (1) Deep compression autoencoder: unlike traditional AEs, which compress images only 8×, we trained an AE that can compress images 32×, effectively reducing the number of latent tokens. (2) Linear DiT: we replace all vanilla attention in DiT with linear attention, which is more efficient at high resolutions without sacrificing quality. (3) Decoder-only text encoder: we replaced T5 with modern decoder-only small LLM as the text encoder and designed complex human instruction with in-context learning to enhance the image-text alignment. (4) Efficient training and sampling: we propose Flow-DPM-Solver to reduce sampling steps, with efficient caption labeling and selection to accelerate convergence. As a result, Sana-0.6B is very competitive with modern giant diffusion model (e.g. Flux-12B), being 20 times smaller and 100+ times faster in measured throughput. Moreover, Sana-0.6B can be deployed on a 16GB laptop GPU, taking less than 1 second to generate a 1024×1024 resolution image. Sana enables content creation at low cost. Code and model will be publicly released.*
<Tip>
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
</Tip>
This pipeline was contributed by [lawrence-cj](https://github.com/lawrence-cj) and [chenjy2003](https://github.com/chenjy2003). The original codebase can be found [here](https://github.com/NVlabs/Sana). The original weights can be found under [hf.co/Efficient-Large-Model](https://huggingface.co/Efficient-Large-Model).
Available models:
| Model | Recommended dtype |
|:-----:|:-----------------:|
| [`Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers`](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers) | `torch.bfloat16` |
| [`Efficient-Large-Model/Sana_1600M_1024px_diffusers`](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_diffusers) | `torch.float16` |
| [`Efficient-Large-Model/Sana_1600M_1024px_MultiLing_diffusers`](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_MultiLing_diffusers) | `torch.float16` |
| [`Efficient-Large-Model/Sana_1600M_512px_diffusers`](https://huggingface.co/Efficient-Large-Model/Sana_1600M_512px_diffusers) | `torch.float16` |
| [`Efficient-Large-Model/Sana_1600M_512px_MultiLing_diffusers`](https://huggingface.co/Efficient-Large-Model/Sana_1600M_512px_MultiLing_diffusers) | `torch.float16` |
| [`Efficient-Large-Model/Sana_600M_1024px_diffusers`](https://huggingface.co/Efficient-Large-Model/Sana_600M_1024px_diffusers) | `torch.float16` |
| [`Efficient-Large-Model/Sana_600M_512px_diffusers`](https://huggingface.co/Efficient-Large-Model/Sana_600M_512px_diffusers) | `torch.float16` |
Refer to [this](https://huggingface.co/collections/Efficient-Large-Model/sana-673efba2a57ed99843f11f9e) collection for more information.
Note: The recommended dtype mentioned is for the transformer weights. The text encoder and VAE weights must stay in `torch.bfloat16` or `torch.float32` for the model to work correctly. Please refer to the inference example below to see how to load the model with the recommended dtype.
<Tip>
Make sure to pass the `variant` argument for downloaded checkpoints to use lower disk space. Set it to `"fp16"` for models with recommended dtype as `torch.float16`, and `"bf16"` for models with recommended dtype as `torch.bfloat16`. By default, `torch.float32` weights are downloaded, which use twice the amount of disk storage. Additionally, `torch.float32` weights can be downcasted on-the-fly by specifying the `torch_dtype` argument. Read about it in the [docs](https://huggingface.co/docs/diffusers/v0.31.0/en/api/pipelines/overview#diffusers.DiffusionPipeline.from_pretrained).
</Tip>
## SanaPipeline
[[autodoc]] SanaPipeline
- all
- __call__
## SanaPAGPipeline
[[autodoc]] SanaPAGPipeline
- all
- __call__
## SanaPipelineOutput
[[autodoc]] pipelines.sana.pipeline_output.SanaPipelineOutput

View File

@@ -59,76 +59,9 @@ image.save("sd3_hello_world.png")
- [`stabilityai/stable-diffusion-3.5-large`](https://huggingface.co/stabilityai/stable-diffusion-3-5-large)
- [`stabilityai/stable-diffusion-3.5-large-turbo`](https://huggingface.co/stabilityai/stable-diffusion-3-5-large-turbo)
## Image Prompting with IP-Adapters
An IP-Adapter lets you prompt SD3 with images, in addition to the text prompt. This is especially useful when describing complex concepts that are difficult to articulate through text alone and you have reference images. To load and use an IP-Adapter, you need:
- `image_encoder`: Pre-trained vision model used to obtain image features, usually a CLIP image encoder.
- `feature_extractor`: Image processor that prepares the input image for the chosen `image_encoder`.
- `ip_adapter_id`: Checkpoint containing parameters of image cross attention layers and image projection.
IP-Adapters are trained for a specific model architecture, so they also work in finetuned variations of the base model. You can use the [`~SD3IPAdapterMixin.set_ip_adapter_scale`] function to adjust how strongly the output aligns with the image prompt. The higher the value, the more closely the model follows the image prompt. A default value of 0.5 is typically a good balance, ensuring the model considers both the text and image prompts equally.
```python
import torch
from PIL import Image
from diffusers import StableDiffusion3Pipeline
from transformers import SiglipVisionModel, SiglipImageProcessor
image_encoder_id = "google/siglip-so400m-patch14-384"
ip_adapter_id = "InstantX/SD3.5-Large-IP-Adapter"
feature_extractor = SiglipImageProcessor.from_pretrained(
image_encoder_id,
torch_dtype=torch.float16
)
image_encoder = SiglipVisionModel.from_pretrained(
image_encoder_id,
torch_dtype=torch.float16
).to( "cuda")
pipe = StableDiffusion3Pipeline.from_pretrained(
"stabilityai/stable-diffusion-3.5-large",
torch_dtype=torch.float16,
feature_extractor=feature_extractor,
image_encoder=image_encoder,
).to("cuda")
pipe.load_ip_adapter(ip_adapter_id)
pipe.set_ip_adapter_scale(0.6)
ref_img = Image.open("image.jpg").convert('RGB')
image = pipe(
width=1024,
height=1024,
prompt="a cat",
negative_prompt="lowres, low quality, worst quality",
num_inference_steps=24,
guidance_scale=5.0,
ip_adapter_image=ref_img
).images[0]
image.save("result.jpg")
```
<div class="justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sd3_ip_adapter_example.png"/>
<figcaption class="mt-2 text-sm text-center text-gray-500">IP-Adapter examples with prompt "a cat"</figcaption>
</div>
<Tip>
Check out [IP-Adapter](../../../using-diffusers/ip_adapter) to learn more about how IP-Adapters work.
</Tip>
## Memory Optimisations for SD3
SD3 uses three text encoders, one of which is the very large T5-XXL model. This makes it challenging to run the model on GPUs with less than 24GB of VRAM, even when using `fp16` precision. The following section outlines a few memory optimizations in Diffusers that make it easier to run SD3 on low resource hardware.
SD3 uses three text encoders, one if which is the very large T5-XXL model. This makes it challenging to run the model on GPUs with less than 24GB of VRAM, even when using `fp16` precision. The following section outlines a few memory optimizations in Diffusers that make it easier to run SD3 on low resource hardware.
### Running Inference with Model Offloading

View File

@@ -28,13 +28,6 @@ Learn how to quantize models in the [Quantization](../quantization/overview) gui
[[autodoc]] BitsAndBytesConfig
## GGUFQuantizationConfig
[[autodoc]] GGUFQuantizationConfig
## TorchAoConfig
[[autodoc]] TorchAoConfig
## DiffusersQuantizer
[[autodoc]] quantizers.base.DiffusersQuantizer

View File

@@ -181,7 +181,7 @@ Then we load the [v1-5 checkpoint](https://huggingface.co/stable-diffusion-v1-5/
```python
model_ckpt_1_5 = "stable-diffusion-v1-5/stable-diffusion-v1-5"
sd_pipeline_1_5 = StableDiffusionPipeline.from_pretrained(model_ckpt_1_5, torch_dtype=torch.float16).to("cuda")
sd_pipeline_1_5 = StableDiffusionPipeline.from_pretrained(model_ckpt_1_5, torch_dtype=weight_dtype).to(device)
images_1_5 = sd_pipeline_1_5(prompts, num_images_per_prompt=1, generator=generator, output_type="np").images
```
@@ -280,7 +280,7 @@ from diffusers import StableDiffusionInstructPix2PixPipeline
instruct_pix2pix_pipeline = StableDiffusionInstructPix2PixPipeline.from_pretrained(
"timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")
).to(device)
```
Now, we perform the edits:
@@ -326,9 +326,9 @@ from transformers import (
clip_id = "openai/clip-vit-large-patch14"
tokenizer = CLIPTokenizer.from_pretrained(clip_id)
text_encoder = CLIPTextModelWithProjection.from_pretrained(clip_id).to("cuda")
text_encoder = CLIPTextModelWithProjection.from_pretrained(clip_id).to(device)
image_processor = CLIPImageProcessor.from_pretrained(clip_id)
image_encoder = CLIPVisionModelWithProjection.from_pretrained(clip_id).to("cuda")
image_encoder = CLIPVisionModelWithProjection.from_pretrained(clip_id).to(device)
```
Notice that we are using a particular CLIP checkpoint, i.e., `openai/clip-vit-large-patch14`. This is because the Stable Diffusion pre-training was performed with this CLIP variant. For more details, refer to the [documentation](https://huggingface.co/docs/transformers/model_doc/clip).
@@ -350,7 +350,7 @@ class DirectionalSimilarity(nn.Module):
def preprocess_image(self, image):
image = self.image_processor(image, return_tensors="pt")["pixel_values"]
return {"pixel_values": image.to("cuda")}
return {"pixel_values": image.to(device)}
def tokenize_text(self, text):
inputs = self.tokenizer(
@@ -360,7 +360,7 @@ class DirectionalSimilarity(nn.Module):
truncation=True,
return_tensors="pt",
)
return {"input_ids": inputs.input_ids.to("cuda")}
return {"input_ids": inputs.input_ids.to(device)}
def encode_image(self, image):
preprocessed_image = self.preprocess_image(image)
@@ -459,7 +459,6 @@ with ZipFile(local_filepath, "r") as zipper:
```python
from PIL import Image
import os
import numpy as np
dataset_path = "sample-imagenet-images"
image_paths = sorted([os.path.join(dataset_path, x) for x in os.listdir(dataset_path)])
@@ -478,7 +477,6 @@ Now that the images are loaded, let's apply some lightweight pre-processing on t
```python
from torchvision.transforms import functional as F
import torch
def preprocess_image(image):
@@ -500,10 +498,6 @@ dit_pipeline = DiTPipeline.from_pretrained("facebook/DiT-XL-2-256", torch_dtype=
dit_pipeline.scheduler = DPMSolverMultistepScheduler.from_config(dit_pipeline.scheduler.config)
dit_pipeline = dit_pipeline.to("cuda")
seed = 0
generator = torch.manual_seed(seed)
words = [
"cassette player",
"chainsaw",

View File

@@ -17,12 +17,6 @@ specific language governing permissions and limitations under the License.
4-bit quantization compresses a model even further, and it is commonly used with [QLoRA](https://hf.co/papers/2305.14314) to finetune quantized LLMs.
This guide demonstrates how quantization can enable running
[FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev)
on less than 16GB of VRAM and even on a free Google
Colab instance.
![comparison image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/quant-bnb/comparison.png)
To use bitsandbytes, make sure you have the following libraries installed:
@@ -37,167 +31,70 @@ Now you can quantize a model by passing a [`BitsAndBytesConfig`] to [`~ModelMixi
Quantizing a model in 8-bit halves the memory-usage:
bitsandbytes is supported in both Transformers and Diffusers, so you can quantize both the
[`FluxTransformer2DModel`] and [`~transformers.T5EncoderModel`].
```py
from diffusers import FluxTransformer2DModel, BitsAndBytesConfig
For Ada and higher-series GPUs. we recommend changing `torch_dtype` to `torch.bfloat16`.
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
> [!TIP]
> The [`CLIPTextModel`] and [`AutoencoderKL`] aren't quantized because they're already small in size and because [`AutoencoderKL`] only has a few `torch.nn.Linear` layers.
model_8bit = FluxTransformer2DModel.from_pretrained(
"black-forest-labs/FLUX.1-dev",
subfolder="transformer",
quantization_config=quantization_config
)
```
By default, all the other modules such as `torch.nn.LayerNorm` are converted to `torch.float16`. You can change the data type of these modules with the `torch_dtype` parameter if you want:
```py
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
from diffusers import FluxTransformer2DModel, BitsAndBytesConfig
from diffusers import FluxTransformer2DModel
from transformers import T5EncoderModel
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
quant_config = TransformersBitsAndBytesConfig(load_in_8bit=True,)
text_encoder_2_8bit = T5EncoderModel.from_pretrained(
"black-forest-labs/FLUX.1-dev",
subfolder="text_encoder_2",
quantization_config=quant_config,
torch_dtype=torch.float16,
)
quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True,)
transformer_8bit = FluxTransformer2DModel.from_pretrained(
"black-forest-labs/FLUX.1-dev",
model_8bit = FluxTransformer2DModel.from_pretrained(
"black-forest-labs/FLUX.1-dev",
subfolder="transformer",
quantization_config=quant_config,
torch_dtype=torch.float16,
quantization_config=quantization_config,
torch_dtype=torch.float32
)
model_8bit.transformer_blocks.layers[-1].norm2.weight.dtype
```
By default, all the other modules such as `torch.nn.LayerNorm` are converted to `torch.float16`. You can change the data type of these modules with the `torch_dtype` parameter.
```diff
transformer_8bit = FluxTransformer2DModel.from_pretrained(
"black-forest-labs/FLUX.1-dev",
subfolder="transformer",
quantization_config=quant_config,
+ torch_dtype=torch.float32,
)
```
Let's generate an image using our quantized models.
Setting `device_map="auto"` automatically fills all available space on the GPU(s) first, then the
CPU, and finally, the hard drive (the absolute slowest option) if there is still not enough memory.
```py
pipe = FluxPipeline.from_pretrained(
"black-forest-labs/FLUX.1-dev",
transformer=transformer_8bit,
text_encoder_2=text_encoder_2_8bit,
torch_dtype=torch.float16,
device_map="auto",
)
pipe_kwargs = {
"prompt": "A cat holding a sign that says hello world",
"height": 1024,
"width": 1024,
"guidance_scale": 3.5,
"num_inference_steps": 50,
"max_sequence_length": 512,
}
image = pipe(**pipe_kwargs, generator=torch.manual_seed(0),).images[0]
```
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/quant-bnb/8bit.png"/>
</div>
When there is enough memory, you can also directly move the pipeline to the GPU with `.to("cuda")` and apply [`~DiffusionPipeline.enable_model_cpu_offload`] to optimize GPU memory usage.
Once a model is quantized, you can push the model to the Hub with the [`~ModelMixin.push_to_hub`] method. The quantization `config.json` file is pushed first, followed by the quantized model weights. You can also save the serialized 8-bit models locally with [`~ModelMixin.save_pretrained`].
Once a model is quantized, you can push the model to the Hub with the [`~ModelMixin.push_to_hub`] method. The quantization `config.json` file is pushed first, followed by the quantized model weights. You can also save the serialized 4-bit models locally with [`~ModelMixin.save_pretrained`].
</hfoption>
<hfoption id="4-bit">
Quantizing a model in 4-bit reduces your memory-usage by 4x:
bitsandbytes is supported in both Transformers and Diffusers, so you can can quantize both the
[`FluxTransformer2DModel`] and [`~transformers.T5EncoderModel`].
```py
from diffusers import FluxTransformer2DModel, BitsAndBytesConfig
For Ada and higher-series GPUs. we recommend changing `torch_dtype` to `torch.bfloat16`.
quantization_config = BitsAndBytesConfig(load_in_4bit=True)
> [!TIP]
> The [`CLIPTextModel`] and [`AutoencoderKL`] aren't quantized because they're already small in size and because [`AutoencoderKL`] only has a few `torch.nn.Linear` layers.
model_4bit = FluxTransformer2DModel.from_pretrained(
"black-forest-labs/FLUX.1-dev",
subfolder="transformer",
quantization_config=quantization_config
)
```
By default, all the other modules such as `torch.nn.LayerNorm` are converted to `torch.float16`. You can change the data type of these modules with the `torch_dtype` parameter if you want:
```py
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
from diffusers import FluxTransformer2DModel, BitsAndBytesConfig
from diffusers import FluxTransformer2DModel
from transformers import T5EncoderModel
quantization_config = BitsAndBytesConfig(load_in_4bit=True)
quant_config = TransformersBitsAndBytesConfig(load_in_4bit=True,)
text_encoder_2_4bit = T5EncoderModel.from_pretrained(
"black-forest-labs/FLUX.1-dev",
subfolder="text_encoder_2",
quantization_config=quant_config,
torch_dtype=torch.float16,
)
quant_config = DiffusersBitsAndBytesConfig(load_in_4bit=True,)
transformer_4bit = FluxTransformer2DModel.from_pretrained(
"black-forest-labs/FLUX.1-dev",
model_4bit = FluxTransformer2DModel.from_pretrained(
"black-forest-labs/FLUX.1-dev",
subfolder="transformer",
quantization_config=quant_config,
torch_dtype=torch.float16,
quantization_config=quantization_config,
torch_dtype=torch.float32
)
model_4bit.transformer_blocks.layers[-1].norm2.weight.dtype
```
By default, all the other modules such as `torch.nn.LayerNorm` are converted to `torch.float16`. You can change the data type of these modules with the `torch_dtype` parameter.
```diff
transformer_4bit = FluxTransformer2DModel.from_pretrained(
"black-forest-labs/FLUX.1-dev",
subfolder="transformer",
quantization_config=quant_config,
+ torch_dtype=torch.float32,
)
```
Let's generate an image using our quantized models.
Setting `device_map="auto"` automatically fills all available space on the GPU(s) first, then the CPU, and finally, the hard drive (the absolute slowest option) if there is still not enough memory.
```py
pipe = FluxPipeline.from_pretrained(
"black-forest-labs/FLUX.1-dev",
transformer=transformer_4bit,
text_encoder_2=text_encoder_2_4bit,
torch_dtype=torch.float16,
device_map="auto",
)
pipe_kwargs = {
"prompt": "A cat holding a sign that says hello world",
"height": 1024,
"width": 1024,
"guidance_scale": 3.5,
"num_inference_steps": 50,
"max_sequence_length": 512,
}
image = pipe(**pipe_kwargs, generator=torch.manual_seed(0),).images[0]
```
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/quant-bnb/4bit.png"/>
</div>
When there is enough memory, you can also directly move the pipeline to the GPU with `.to("cuda")` and apply [`~DiffusionPipeline.enable_model_cpu_offload`] to optimize GPU memory usage.
Once a model is quantized, you can push the model to the Hub with the [`~ModelMixin.push_to_hub`] method. The quantization `config.json` file is pushed first, followed by the quantized model weights. You can also save the serialized 4-bit models locally with [`~ModelMixin.save_pretrained`].
Call [`~ModelMixin.push_to_hub`] after loading it in 4-bit precision. You can also save the serialized 4-bit models locally with [`~ModelMixin.save_pretrained`].
</hfoption>
</hfoptions>
@@ -302,34 +199,17 @@ quantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dty
NF4 is a 4-bit data type from the [QLoRA](https://hf.co/papers/2305.14314) paper, adapted for weights initialized from a normal distribution. You should use NF4 for training 4-bit base models. This can be configured with the `bnb_4bit_quant_type` parameter in the [`BitsAndBytesConfig`]:
```py
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
from diffusers import BitsAndBytesConfig
from diffusers import FluxTransformer2DModel
from transformers import T5EncoderModel
quant_config = TransformersBitsAndBytesConfig(
nf4_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
)
text_encoder_2_4bit = T5EncoderModel.from_pretrained(
"black-forest-labs/FLUX.1-dev",
subfolder="text_encoder_2",
quantization_config=quant_config,
torch_dtype=torch.float16,
)
quant_config = DiffusersBitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
)
transformer_4bit = FluxTransformer2DModel.from_pretrained(
"black-forest-labs/FLUX.1-dev",
model_nf4 = SD3Transformer2DModel.from_pretrained(
"stabilityai/stable-diffusion-3-medium-diffusers",
subfolder="transformer",
quantization_config=quant_config,
torch_dtype=torch.float16,
quantization_config=nf4_config,
)
```
@@ -340,74 +220,38 @@ For inference, the `bnb_4bit_quant_type` does not have a huge impact on performa
Nested quantization is a technique that can save additional memory at no additional performance cost. This feature performs a second quantization of the already quantized weights to save an additional 0.4 bits/parameter.
```py
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
from diffusers import BitsAndBytesConfig
from diffusers import FluxTransformer2DModel
from transformers import T5EncoderModel
quant_config = TransformersBitsAndBytesConfig(
double_quant_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_use_double_quant=True,
)
text_encoder_2_4bit = T5EncoderModel.from_pretrained(
"black-forest-labs/FLUX.1-dev",
subfolder="text_encoder_2",
quantization_config=quant_config,
torch_dtype=torch.float16,
)
quant_config = DiffusersBitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_use_double_quant=True,
)
transformer_4bit = FluxTransformer2DModel.from_pretrained(
"black-forest-labs/FLUX.1-dev",
double_quant_model = SD3Transformer2DModel.from_pretrained(
"stabilityai/stable-diffusion-3-medium-diffusers",
subfolder="transformer",
quantization_config=quant_config,
torch_dtype=torch.float16,
quantization_config=double_quant_config,
)
```
## Dequantizing `bitsandbytes` models
Once quantized, you can dequantize a model to its original precision, but this might result in a small loss of quality. Make sure you have enough GPU RAM to fit the dequantized model.
Once quantized, you can dequantize the model to the original precision but this might result in a small quality loss of the model. Make sure you have enough GPU RAM to fit the dequantized model.
```python
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
from diffusers import BitsAndBytesConfig
from diffusers import FluxTransformer2DModel
from transformers import T5EncoderModel
quant_config = TransformersBitsAndBytesConfig(
double_quant_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_use_double_quant=True,
)
text_encoder_2_4bit = T5EncoderModel.from_pretrained(
"black-forest-labs/FLUX.1-dev",
subfolder="text_encoder_2",
quantization_config=quant_config,
torch_dtype=torch.float16,
)
quant_config = DiffusersBitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_use_double_quant=True,
)
transformer_4bit = FluxTransformer2DModel.from_pretrained(
"black-forest-labs/FLUX.1-dev",
double_quant_model = SD3Transformer2DModel.from_pretrained(
"stabilityai/stable-diffusion-3-medium-diffusers",
subfolder="transformer",
quantization_config=quant_config,
torch_dtype=torch.float16,
quantization_config=double_quant_config,
)
text_encoder_2_4bit.dequantize()
transformer_4bit.dequantize()
model.dequantize()
```
## Resources

View File

@@ -1,69 +0,0 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# GGUF
The GGUF file format is typically used to store models for inference with [GGML](https://github.com/ggerganov/ggml) and supports a variety of block wise quantization options. Diffusers supports loading checkpoints prequantized and saved in the GGUF format via `from_single_file` loading with Model classes. Loading GGUF checkpoints via Pipelines is currently not supported.
The following example will load the [FLUX.1 DEV](https://huggingface.co/black-forest-labs/FLUX.1-dev) transformer model using the GGUF Q2_K quantization variant.
Before starting please install gguf in your environment
```shell
pip install -U gguf
```
Since GGUF is a single file format, use [`~FromSingleFileMixin.from_single_file`] to load the model and pass in the [`GGUFQuantizationConfig`].
When using GGUF checkpoints, the quantized weights remain in a low memory `dtype`(typically `torch.uint8`) and are dynamically dequantized and cast to the configured `compute_dtype` during each module's forward pass through the model. The `GGUFQuantizationConfig` allows you to set the `compute_dtype`.
The functions used for dynamic dequantizatation are based on the great work done by [city96](https://github.com/city96/ComfyUI-GGUF), who created the Pytorch ports of the original [`numpy`](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/gguf/quants.py) implementation by [compilade](https://github.com/compilade).
```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig
ckpt_path = (
"https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q2_K.gguf"
)
transformer = FluxTransformer2DModel.from_single_file(
ckpt_path,
quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
"black-forest-labs/FLUX.1-dev",
transformer=transformer,
torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()
prompt = "A cat holding a sign that says hello world"
image = pipe(prompt, generator=torch.manual_seed(0)).images[0]
image.save("flux-gguf.png")
```
## Supported Quantization Types
- BF16
- Q4_0
- Q4_1
- Q5_0
- Q5_1
- Q8_0
- Q2_K
- Q3_K
- Q4_K
- Q5_K
- Q6_K

View File

@@ -17,7 +17,7 @@ Quantization techniques focus on representing data with less information while a
<Tip>
Interested in adding a new quantization method to Diffusers? Refer to the [Contribute new quantization method guide](https://huggingface.co/docs/transformers/main/en/quantization/contribute) to learn more about adding a new quantization method.
Interested in adding a new quantization method to Transformers? Refer to the [Contribute new quantization method guide](https://huggingface.co/docs/transformers/main/en/quantization/contribute) to learn more about adding a new quantization method.
</Tip>
@@ -32,9 +32,4 @@ If you are new to the quantization field, we recommend you to check out these be
## When to use what?
Diffusers currently supports the following quantization methods.
- [BitsandBytes](./bitsandbytes)
- [TorchAO](./torchao)
- [GGUF](./gguf)
[This resource](https://huggingface.co/docs/transformers/main/en/quantization/overview#when-to-use-what) provides a good overview of the pros and cons of different quantization techniques.
This section will be expanded once Diffusers has multiple quantization backends. Currently, we only support `bitsandbytes`. [This resource](https://huggingface.co/docs/transformers/main/en/quantization/overview#when-to-use-what) provides a good overview of the pros and cons of different quantization techniques.

View File

@@ -1,94 +0,0 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# torchao
[TorchAO](https://github.com/pytorch/ao) is an architecture optimization library for PyTorch. It provides high-performance dtypes, optimization techniques, and kernels for inference and training, featuring composability with native PyTorch features like [torch.compile](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html), FullyShardedDataParallel (FSDP), and more.
Before you begin, make sure you have Pytorch 2.5+ and TorchAO installed.
```bash
pip install -U torch torchao
```
Quantize a model by passing [`TorchAoConfig`] to [`~ModelMixin.from_pretrained`] (you can also load pre-quantized models). This works for any model in any modality, as long as it supports loading with [Accelerate](https://hf.co/docs/accelerate/index) and contains `torch.nn.Linear` layers.
The example below only quantizes the weights to int8.
```python
from diffusers import FluxPipeline, FluxTransformer2DModel, TorchAoConfig
model_id = "black-forest-labs/FLUX.1-dev"
dtype = torch.bfloat16
quantization_config = TorchAoConfig("int8wo")
transformer = FluxTransformer2DModel.from_pretrained(
model_id,
subfolder="transformer",
quantization_config=quantization_config,
torch_dtype=dtype,
)
pipe = FluxPipeline.from_pretrained(
model_id,
transformer=transformer,
torch_dtype=dtype,
)
pipe.to("cuda")
prompt = "A cat holding a sign that says hello world"
image = pipe(
prompt, num_inference_steps=50, guidance_scale=4.5, max_sequence_length=512
).images[0]
image.save("output.png")
```
TorchAO is fully compatible with [torch.compile](./optimization/torch2.0#torchcompile), setting it apart from other quantization methods. This makes it easy to speed up inference with just one line of code.
```python
# In the above code, add the following after initializing the transformer
transformer = torch.compile(transformer, mode="max-autotune", fullgraph=True)
```
For speed and memory benchmarks on Flux and CogVideoX, please refer to the table [here](https://github.com/huggingface/diffusers/pull/10009#issue-2688781450). You can also find some torchao [benchmarks](https://github.com/pytorch/ao/tree/main/torchao/quantization#benchmarks) numbers for various hardware.
torchao also supports an automatic quantization API through [autoquant](https://github.com/pytorch/ao/blob/main/torchao/quantization/README.md#autoquantization). Autoquantization determines the best quantization strategy applicable to a model by comparing the performance of each technique on chosen input types and shapes. Currently, this can be used directly on the underlying modeling components. Diffusers will also expose an autoquant configuration option in the future.
The `TorchAoConfig` class accepts three parameters:
- `quant_type`: A string value mentioning one of the quantization types below.
- `modules_to_not_convert`: A list of module full/partial module names for which quantization should not be performed. For example, to not perform any quantization of the [`FluxTransformer2DModel`]'s first block, one would specify: `modules_to_not_convert=["single_transformer_blocks.0"]`.
- `kwargs`: A dict of keyword arguments to pass to the underlying quantization method which will be invoked based on `quant_type`.
## Supported quantization types
torchao supports weight-only quantization and weight and dynamic-activation quantization for int8, float3-float8, and uint1-uint7.
Weight-only quantization stores the model weights in a specific low-bit data type but performs computation with a higher-precision data type, like `bfloat16`. This lowers the memory requirements from model weights but retains the memory peaks for activation computation.
Dynamic activation quantization stores the model weights in a low-bit dtype, while also quantizing the activations on-the-fly to save additional memory. This lowers the memory requirements from model weights, while also lowering the memory overhead from activation computations. However, this may come at a quality tradeoff at times, so it is recommended to test different models thoroughly.
The quantization methods supported are as follows:
| **Category** | **Full Function Names** | **Shorthands** |
|--------------|-------------------------|----------------|
| **Integer quantization** | `int4_weight_only`, `int8_dynamic_activation_int4_weight`, `int8_weight_only`, `int8_dynamic_activation_int8_weight` | `int4wo`, `int4dq`, `int8wo`, `int8dq` |
| **Floating point 8-bit quantization** | `float8_weight_only`, `float8_dynamic_activation_float8_weight`, `float8_static_activation_float8_weight` | `float8wo`, `float8wo_e5m2`, `float8wo_e4m3`, `float8dq`, `float8dq_e4m3`, `float8_e4m3_tensor`, `float8_e4m3_row` |
| **Floating point X-bit quantization** | `fpx_weight_only` | `fpX_eAwB` where `X` is the number of bits (1-7), `A` is exponent bits, and `B` is mantissa bits. Constraint: `X == A + B + 1` |
| **Unsigned Integer quantization** | `uintx_weight_only` | `uint1wo`, `uint2wo`, `uint3wo`, `uint4wo`, `uint5wo`, `uint6wo`, `uint7wo` |
Some quantization methods are aliases (for example, `int8wo` is the commonly used shorthand for `int8_weight_only`). This allows using the quantization methods described in the torchao docs as-is, while also making it convenient to remember their shorthand notations.
Refer to the official torchao documentation for a better understanding of the available quantization methods and the exhaustive list of configuration options available.
## Resources
- [TorchAO Quantization API](https://github.com/pytorch/ao/blob/main/torchao/quantization/README.md)
- [Diffusers-TorchAO examples](https://github.com/sayakpaul/diffusers-torchao)

View File

@@ -1,6 +1,6 @@
# Create a dataset for training
There are many datasets on the [Hub](https://huggingface.co/datasets?task_categories=task_categories:text-to-image&sort=downloads) to train a model on, but if you can't find one you're interested in or want to use your own, you can create a dataset with the 🤗 [Datasets](https://huggingface.co/docs/datasets) library. The dataset structure depends on the task you want to train your model on. The most basic dataset structure is a directory of images for tasks like unconditional image generation. Another dataset structure may be a directory of images and a text file containing their corresponding text captions for tasks like text-to-image generation.
There are many datasets on the [Hub](https://huggingface.co/datasets?task_categories=task_categories:text-to-image&sort=downloads) to train a model on, but if you can't find one you're interested in or want to use your own, you can create a dataset with the 🤗 [Datasets](hf.co/docs/datasets) library. The dataset structure depends on the task you want to train your model on. The most basic dataset structure is a directory of images for tasks like unconditional image generation. Another dataset structure may be a directory of images and a text file containing their corresponding text captions for tasks like text-to-image generation.
This guide will show you two ways to create a dataset to finetune on:
@@ -87,4 +87,4 @@ accelerate launch --mixed_precision="fp16" train_text_to_image.py \
Now that you've created a dataset, you can plug it into the `train_data_dir` (if your dataset is local) or `dataset_name` (if your dataset is on the Hub) arguments of a training script.
For your next steps, feel free to try and use your dataset to train a model for [unconditional generation](unconditional_training) or [text-to-image generation](text2image)!
For your next steps, feel free to try and use your dataset to train a model for [unconditional generation](unconditional_training) or [text-to-image generation](text2image)!

View File

@@ -75,7 +75,7 @@ For convenience, create a `TrainingConfig` class containing the training hyperpa
... push_to_hub = True # whether to upload the saved model to the HF Hub
... hub_model_id = "<your-username>/<my-awesome-model>" # the name of the repository to create on the HF Hub
... hub_private_repo = None
... hub_private_repo = False
... overwrite_output_dir = True # overwrite the old model when re-running the notebook
... seed = 0

View File

@@ -56,7 +56,7 @@ image
With the `adapter_name` parameter, it is really easy to use another adapter for inference! Load the [nerijs/pixel-art-xl](https://huggingface.co/nerijs/pixel-art-xl) adapter that has been fine-tuned to generate pixel art images and call it `"pixel"`.
The pipeline automatically sets the first loaded adapter (`"toy"`) as the active adapter, but you can activate the `"pixel"` adapter with the [`~PeftAdapterMixin.set_adapters`] method:
The pipeline automatically sets the first loaded adapter (`"toy"`) as the active adapter, but you can activate the `"pixel"` adapter with the [`~diffusers.loaders.UNet2DConditionLoadersMixin.set_adapters`] method:
```python
pipe.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel")
@@ -85,7 +85,7 @@ By default, if the most up-to-date versions of PEFT and Transformers are detecte
You can also merge different adapter checkpoints for inference to blend their styles together.
Once again, use the [`~PeftAdapterMixin.set_adapters`] method to activate the `pixel` and `toy` adapters and specify the weights for how they should be merged.
Once again, use the [`~diffusers.loaders.UNet2DConditionLoadersMixin.set_adapters`] method to activate the `pixel` and `toy` adapters and specify the weights for how they should be merged.
```python
pipe.set_adapters(["pixel", "toy"], adapter_weights=[0.5, 1.0])
@@ -114,7 +114,7 @@ Impressive! As you can see, the model generated an image that mixed the characte
> [!TIP]
> Through its PEFT integration, Diffusers also offers more efficient merging methods which you can learn about in the [Merge LoRAs](../using-diffusers/merge_loras) guide!
To return to only using one adapter, use the [`~PeftAdapterMixin.set_adapters`] method to activate the `"toy"` adapter:
To return to only using one adapter, use the [`~diffusers.loaders.UNet2DConditionLoadersMixin.set_adapters`] method to activate the `"toy"` adapter:
```python
pipe.set_adapters("toy")
@@ -127,7 +127,7 @@ image = pipe(
image
```
Or to disable all adapters entirely, use the [`~PeftAdapterMixin.disable_lora`] method to return the base model.
Or to disable all adapters entirely, use the [`~diffusers.loaders.UNet2DConditionLoadersMixin.disable_lora`] method to return the base model.
```python
pipe.disable_lora()
@@ -140,8 +140,7 @@ image
![no-lora](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/peft_integration/diffusers_peft_lora_inference_20_1.png)
### Customize adapters strength
For even more customization, you can control how strongly the adapter affects each part of the pipeline. For this, pass a dictionary with the control strengths (called "scales") to [`~PeftAdapterMixin.set_adapters`].
For even more customization, you can control how strongly the adapter affects each part of the pipeline. For this, pass a dictionary with the control strengths (called "scales") to [`~diffusers.loaders.UNet2DConditionLoadersMixin.set_adapters`].
For example, here's how you can turn on the adapter for the `down` parts, but turn it off for the `mid` and `up` parts:
```python
@@ -196,7 +195,7 @@ image
![block-lora-mixed](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/peft_integration/diffusers_peft_lora_inference_block_mixed.png)
## Manage adapters
## Manage active adapters
You have attached multiple adapters in this tutorial, and if you're feeling a bit lost on what adapters have been attached to the pipeline's components, use the [`~diffusers.loaders.StableDiffusionLoraLoaderMixin.get_active_adapters`] method to check the list of active adapters:
@@ -213,11 +212,3 @@ list_adapters_component_wise = pipe.get_list_adapters()
list_adapters_component_wise
{"text_encoder": ["toy", "pixel"], "unet": ["toy", "pixel"], "text_encoder_2": ["toy", "pixel"]}
```
The [`~PeftAdapterMixin.delete_adapters`] function completely removes an adapter and their LoRA layers from a model.
```py
pipe.delete_adapters("toy")
pipe.get_active_adapters()
["pixel"]
```

View File

@@ -1,61 +0,0 @@
# Create a server
Diffusers' pipelines can be used as an inference engine for a server. It supports concurrent and multithreaded requests to generate images that may be requested by multiple users at the same time.
This guide will show you how to use the [`StableDiffusion3Pipeline`] in a server, but feel free to use any pipeline you want.
Start by navigating to the `examples/server` folder and installing all of the dependencies.
```py
pip install .
pip install -f requirements.txt
```
Launch the server with the following command.
```py
python server.py
```
The server is accessed at http://localhost:8000. You can curl this model with the following command.
```
curl -X POST -H "Content-Type: application/json" --data '{"model": "something", "prompt": "a kitten in front of a fireplace"}' http://localhost:8000/v1/images/generations
```
If you need to upgrade some dependencies, you can use either [pip-tools](https://github.com/jazzband/pip-tools) or [uv](https://github.com/astral-sh/uv). For example, upgrade the dependencies with `uv` using the following command.
```
uv pip compile requirements.in -o requirements.txt
```
The server is built with [FastAPI](https://fastapi.tiangolo.com/async/). The endpoint for `v1/images/generations` is shown below.
```py
@app.post("/v1/images/generations")
async def generate_image(image_input: TextToImageInput):
try:
loop = asyncio.get_event_loop()
scheduler = shared_pipeline.pipeline.scheduler.from_config(shared_pipeline.pipeline.scheduler.config)
pipeline = StableDiffusion3Pipeline.from_pipe(shared_pipeline.pipeline, scheduler=scheduler)
generator = torch.Generator(device="cuda")
generator.manual_seed(random.randint(0, 10000000))
output = await loop.run_in_executor(None, lambda: pipeline(image_input.prompt, generator = generator))
logger.info(f"output: {output}")
image_url = save_image(output.images[0])
return {"data": [{"url": image_url}]}
except Exception as e:
if isinstance(e, HTTPException):
raise e
elif hasattr(e, 'message'):
raise HTTPException(status_code=500, detail=e.message + traceback.format_exc())
raise HTTPException(status_code=500, detail=str(e) + traceback.format_exc())
```
The `generate_image` function is defined as asynchronous with the [async](https://fastapi.tiangolo.com/async/) keyword so that FastAPI knows that whatever is happening in this function won't necessarily return a result right away. Once it hits some point in the function that it needs to await some other [Task](https://docs.python.org/3/library/asyncio-task.html#asyncio.Task), the main thread goes back to answering other HTTP requests. This is shown in the code below with the [await](https://fastapi.tiangolo.com/async/#async-and-await) keyword.
```py
output = await loop.run_in_executor(None, lambda: pipeline(image_input.prompt, generator = generator))
```
At this point, the execution of the pipeline function is placed onto a [new thread](https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.run_in_executor), and the main thread performs other things until a result is returned from the `pipeline`.
Another important aspect of this implementation is creating a `pipeline` from `shared_pipeline`. The goal behind this is to avoid loading the underlying model more than once onto the GPU while still allowing for each new request that is running on a separate thread to have its own generator and scheduler. The scheduler, in particular, is not thread-safe, and it will cause errors like: `IndexError: index 21 is out of bounds for dimension 0 with size 21` if you try to use the same scheduler across multiple threads.

View File

@@ -134,16 +134,14 @@ The [`~loaders.StableDiffusionLoraLoaderMixin.load_lora_weights`] method loads L
- the LoRA weights don't have separate identifiers for the UNet and text encoder
- the LoRA weights have separate identifiers for the UNet and text encoder
To directly load (and save) a LoRA adapter at the *model-level*, use [`~PeftAdapterMixin.load_lora_adapter`], which builds and prepares the necessary model configuration for the adapter. Like [`~loaders.StableDiffusionLoraLoaderMixin.load_lora_weights`], [`PeftAdapterMixin.load_lora_adapter`] can load LoRAs for both the UNet and text encoder. For example, if you're loading a LoRA for the UNet, [`PeftAdapterMixin.load_lora_adapter`] ignores the keys for the text encoder.
Use the `weight_name` parameter to specify the specific weight file and the `prefix` parameter to filter for the appropriate state dicts (`"unet"` in this case) to load.
But if you only need to load LoRA weights into the UNet, then you can use the [`~loaders.UNet2DConditionLoadersMixin.load_attn_procs`] method. Let's load the [jbilcke-hf/sdxl-cinematic-1](https://huggingface.co/jbilcke-hf/sdxl-cinematic-1) LoRA:
```py
from diffusers import AutoPipelineForText2Image
import torch
pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
pipeline.unet.load_lora_adapter("jbilcke-hf/sdxl-cinematic-1", weight_name="pytorch_lora_weights.safetensors", prefix="unet")
pipeline.unet.load_attn_procs("jbilcke-hf/sdxl-cinematic-1", weight_name="pytorch_lora_weights.safetensors")
# use cnmt in the prompt to trigger the LoRA
prompt = "A cute cnmt eating a slice of pizza, stunning color scheme, masterpiece, illustration"
@@ -155,8 +153,6 @@ image
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/load_attn_proc.png" />
</div>
Save an adapter with [`~PeftAdapterMixin.save_lora_adapter`].
To unload the LoRA weights, use the [`~loaders.StableDiffusionLoraLoaderMixin.unload_lora_weights`] method to discard the LoRA weights and restore the model to its original weights:
```py

View File

@@ -121,7 +121,7 @@ image = pipe(prompt=prompt, image=init_image, mask_image=mask_image, num_inferen
### 이미지 결과물을 정제하기
[base 모델 체크포인트](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)에서, StableDiffusion-XL 또한 고주파 품질을 향상시키는 이미지를 생성하기 위해 낮은 노이즈 단계 이미지를 제거하는데 특화된 [refiner 체크포인트](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0)를 포함하고 있습니다. 이 refiner 체크포인트는 이미지 품질을 향상시키기 위해 base 체크포인트를 실행한 후 "두 번째 단계" 파이프라인에 사용될 수 있습니다.
[base 모델 체크포인트](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)에서, StableDiffusion-XL 또한 고주파 품질을 향상시키는 이미지를 생성하기 위해 낮은 노이즈 단계 이미지를 제거하는데 특화된 [refiner 체크포인트](huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0)를 포함하고 있습니다. 이 refiner 체크포인트는 이미지 품질을 향상시키기 위해 base 체크포인트를 실행한 후 "두 번째 단계" 파이프라인에 사용될 수 있습니다.
refiner를 사용할 때, 쉽게 사용할 수 있습니다
- 1.) base 모델과 refiner을 사용하는데, 이는 *Denoisers의 앙상블*을 위한 첫 번째 제안된 [eDiff-I](https://research.nvidia.com/labs/dir/eDiff-I/)를 사용하거나
@@ -215,7 +215,7 @@ image = refiner(
#### 2.) 노이즈가 완전히 제거된 기본 이미지에서 이미지 출력을 정제하기
일반적인 [`StableDiffusionImg2ImgPipeline`] 방식에서, 기본 모델에서 생성된 완전히 노이즈가 제거된 이미지는 [refiner checkpoint](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0)를 사용해 더 향상시킬 수 있습니다.
일반적인 [`StableDiffusionImg2ImgPipeline`] 방식에서, 기본 모델에서 생성된 완전히 노이즈가 제거된 이미지는 [refiner checkpoint](huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0)를 사용해 더 향상시킬 수 있습니다.
이를 위해, 보통의 "base" text-to-image 파이프라인을 수행 후에 image-to-image 파이프라인으로써 refiner를 실행시킬 수 있습니다. base 모델의 출력을 잠재 공간에 남겨둘 수 있습니다.

View File

@@ -1,7 +1,7 @@
# 학습을 위한 데이터셋 만들기
[Hub](https://huggingface.co/datasets?task_categories=task_categories:text-to-image&sort=downloads) 에는 모델 교육을 위한 많은 데이터셋이 있지만,
관심이 있거나 사용하고 싶은 데이터셋을 찾을 수 없는 경우 🤗 [Datasets](https://huggingface.co/docs/datasets) 라이브러리를 사용하여 데이터셋을 만들 수 있습니다.
관심이 있거나 사용하고 싶은 데이터셋을 찾을 수 없는 경우 🤗 [Datasets](hf.co/docs/datasets) 라이브러리를 사용하여 데이터셋을 만들 수 있습니다.
데이터셋 구조는 모델을 학습하려는 작업에 따라 달라집니다.
가장 기본적인 데이터셋 구조는 unconditional 이미지 생성과 같은 작업을 위한 이미지 디렉토리입니다.
또 다른 데이터셋 구조는 이미지 디렉토리와 text-to-image 생성과 같은 작업에 해당하는 텍스트 캡션이 포함된 텍스트 파일일 수 있습니다.

View File

@@ -36,7 +36,7 @@ specific language governing permissions and limitations under the License.
[cloneofsimo](https://github.com/cloneofsimo)는 인기 있는 [lora](https://github.com/cloneofsimo/lora) GitHub 리포지토리에서 Stable Diffusion을 위한 LoRA 학습을 최초로 시도했습니다. 🧨 Diffusers는 [text-to-image 생성](https://github.com/huggingface/diffusers/tree/main/examples/text_to_image#training-with-lora) 및 [DreamBooth](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth#training-with-low-rank-adaptation-of-large-language-models-lora)을 지원합니다. 이 가이드는 두 가지를 모두 수행하는 방법을 보여줍니다.
모델을 저장하거나 커뮤니티와 공유하려면 Hugging Face 계정에 로그인하세요(아직 계정이 없는 경우 [생성](https://huggingface.co/join)하세요):
모델을 저장하거나 커뮤니티와 공유하려면 Hugging Face 계정에 로그인하세요(아직 계정이 없는 경우 [생성](hf.co/join)하세요):
```bash
huggingface-cli login

View File

@@ -76,7 +76,7 @@ huggingface-cli login
... output_dir = "ddpm-butterflies-128" # 로컬 및 HF Hub에 저장되는 모델명
... push_to_hub = True # 저장된 모델을 HF Hub에 업로드할지 여부
... hub_private_repo = None
... hub_private_repo = False
... overwrite_output_dir = True # 노트북을 다시 실행할 때 이전 모델에 덮어씌울지
... seed = 0

View File

@@ -2154,7 +2154,6 @@ def main(args):
# encode batch prompts when custom prompts are provided for each image -
if train_dataset.custom_instance_prompts:
elems_to_repeat = 1
if freeze_text_encoder:
prompt_embeds, pooled_prompt_embeds, text_ids = compute_text_embeddings(
prompts, text_encoders, tokenizers
@@ -2169,21 +2168,17 @@ def main(args):
max_sequence_length=args.max_sequence_length,
add_special_tokens=add_special_tokens_t5,
)
else:
elems_to_repeat = len(prompts)
if not freeze_text_encoder:
prompt_embeds, pooled_prompt_embeds, text_ids = encode_prompt(
text_encoders=[text_encoder_one, text_encoder_two],
tokenizers=[None, None],
text_input_ids_list=[
tokens_one.repeat(elems_to_repeat, 1),
tokens_two.repeat(elems_to_repeat, 1),
],
text_input_ids_list=[tokens_one, tokens_two],
max_sequence_length=args.max_sequence_length,
device=accelerator.device,
prompt=prompts,
)
# Convert images to latent space
if args.cache_latents:
model_input = latents_cache[step].sample()
@@ -2376,9 +2371,6 @@ def main(args):
epoch=epoch,
torch_dtype=weight_dtype,
)
images = None
del pipeline
if freeze_text_encoder:
del text_encoder_one, text_encoder_two
free_memory()
@@ -2456,8 +2448,6 @@ def main(args):
commit_message="End of training",
ignore_patterns=["step_*", "epoch_*"],
)
images = None
del pipeline
accelerator.end_training()

View File

@@ -872,9 +872,10 @@ def prepare_rotary_positional_embeddings(
crops_coords=grid_crops_coords,
grid_size=(grid_height, grid_width),
temporal_size=num_frames,
device=device,
)
freqs_cos = freqs_cos.to(device=device)
freqs_sin = freqs_sin.to(device=device)
return freqs_cos, freqs_sin

View File

@@ -894,9 +894,10 @@ def prepare_rotary_positional_embeddings(
crops_coords=grid_crops_coords,
grid_size=(grid_height, grid_width),
temporal_size=num_frames,
device=device,
)
freqs_cos = freqs_cos.to(device=device)
freqs_sin = freqs_sin.to(device=device)
return freqs_cos, freqs_sin

View File

@@ -11,22 +11,22 @@ Please also check out our [Community Scripts](https://github.com/huggingface/dif
| Example | Description | Code Example | Colab | Author |
|:--------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------:|
|Adaptive Mask Inpainting|Adaptive Mask Inpainting algorithm from [Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models](https://github.com/snuvclab/coma) (ECCV '24, Oral) provides a way to insert human inside the scene image without altering the background, by inpainting with adapting mask.|[Adaptive Mask Inpainting](#adaptive-mask-inpainting)|-|[Hyeonwoo Kim](https://sshowbiz.xyz),[Sookwan Han](https://jellyheadandrew.github.io)|
|Flux with CFG|[Flux with CFG](https://github.com/ToTheBeginning/PuLID/blob/main/docs/pulid_for_flux.md) provides an implementation of using CFG in [Flux](https://blackforestlabs.ai/announcing-black-forest-labs/).|[Flux with CFG](#flux-with-cfg)|[Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/flux_with_cfg.ipynb)|[Linoy Tsaban](https://github.com/linoytsaban), [Apolinário](https://github.com/apolinario), and [Sayak Paul](https://github.com/sayakpaul)|
|Flux with CFG|[Flux with CFG](https://github.com/ToTheBeginning/PuLID/blob/main/docs/pulid_for_flux.md) provides an implementation of using CFG in [Flux](https://blackforestlabs.ai/announcing-black-forest-labs/).|[Flux with CFG](#flux-with-cfg)|NA|[Linoy Tsaban](https://github.com/linoytsaban), [Apolinário](https://github.com/apolinario), and [Sayak Paul](https://github.com/sayakpaul)|
|Differential Diffusion|[Differential Diffusion](https://github.com/exx8/differential-diffusion) modifies an image according to a text prompt, and according to a map that specifies the amount of change in each region.|[Differential Diffusion](#differential-diffusion)|[![Hugging Face Space](https://img.shields.io/badge/🤗%20Hugging%20Face-Space-yellow)](https://huggingface.co/spaces/exx8/differential-diffusion) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/exx8/differential-diffusion/blob/main/examples/SD2.ipynb)|[Eran Levin](https://github.com/exx8) and [Ohad Fried](https://www.ohadf.com/)|
| HD-Painter | [HD-Painter](https://github.com/Picsart-AI-Research/HD-Painter) enables prompt-faithfull and high resolution (up to 2k) image inpainting upon any diffusion-based image inpainting method. | [HD-Painter](#hd-painter) | [![Hugging Face Space](https://img.shields.io/badge/🤗%20Hugging%20Face-Space-yellow)](https://huggingface.co/spaces/PAIR/HD-Painter) | [Manukyan Hayk](https://github.com/haikmanukyan) and [Sargsyan Andranik](https://github.com/AndranikSargsyan) |
| Marigold Monocular Depth Estimation | A universal monocular depth estimator, utilizing Stable Diffusion, delivering sharp predictions in the wild. (See the [project page](https://marigoldmonodepth.github.io) and [full codebase](https://github.com/prs-eth/marigold) for more details.) | [Marigold Depth Estimation](#marigold-depth-estimation) | [![Hugging Face Space](https://img.shields.io/badge/🤗%20Hugging%20Face-Space-yellow)](https://huggingface.co/spaces/toshas/marigold) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/12G8reD13DdpMie5ZQlaFNo2WCGeNUH-u?usp=sharing) | [Bingxin Ke](https://github.com/markkua) and [Anton Obukhov](https://github.com/toshas) |
| LLM-grounded Diffusion (LMD+) | LMD greatly improves the prompt following ability of text-to-image generation models by introducing an LLM as a front-end prompt parser and layout planner. [Project page.](https://llm-grounded-diffusion.github.io/) [See our full codebase (also with diffusers).](https://github.com/TonyLianLong/LLM-groundedDiffusion) | [LLM-grounded Diffusion (LMD+)](#llm-grounded-diffusion) | [Huggingface Demo](https://huggingface.co/spaces/longlian/llm-grounded-diffusion) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1SXzMSeAB-LJYISb2yrUOdypLz4OYWUKj) | [Long (Tony) Lian](https://tonylian.com/) |
| CLIP Guided Stable Diffusion | Doing CLIP guidance for text to image generation with Stable Diffusion | [CLIP Guided Stable Diffusion](#clip-guided-stable-diffusion) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/CLIP_Guided_Stable_diffusion_with_diffusers.ipynb) | [Suraj Patil](https://github.com/patil-suraj/) |
| One Step U-Net (Dummy) | Example showcasing of how to use Community Pipelines (see <https://github.com/huggingface/diffusers/issues/841>) | [One Step U-Net](#one-step-unet) | - | [Patrick von Platen](https://github.com/patrickvonplaten/) |
| Stable Diffusion Interpolation | Interpolate the latent space of Stable Diffusion between different prompts/seeds | [Stable Diffusion Interpolation](#stable-diffusion-interpolation) | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/stable_diffusion_interpolation.ipynb) | [Nate Raw](https://github.com/nateraw/) |
| Stable Diffusion Mega | **One** Stable Diffusion Pipeline with all functionalities of [Text2Image](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py), [Image2Image](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py) and [Inpainting](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py) | [Stable Diffusion Mega](#stable-diffusion-mega) | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/stable_diffusion_mega.ipynb) | [Patrick von Platen](https://github.com/patrickvonplaten/) |
| Long Prompt Weighting Stable Diffusion | **One** Stable Diffusion Pipeline without tokens length limit, and support parsing weighting in prompt. | [Long Prompt Weighting Stable Diffusion](#long-prompt-weighting-stable-diffusion) | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/long_prompt_weighting_stable_diffusion.ipynb) | [SkyTNT](https://github.com/SkyTNT) |
| Speech to Image | Using automatic-speech-recognition to transcribe text and Stable Diffusion to generate images | [Speech to Image](#speech-to-image) |[Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/speech_to_image.ipynb) | [Mikail Duzenli](https://github.com/MikailINTech)
| Wild Card Stable Diffusion | Stable Diffusion Pipeline that supports prompts that contain wildcard terms (indicated by surrounding double underscores), with values instantiated randomly from a corresponding txt file or a dictionary of possible values | [Wildcard Stable Diffusion](#wildcard-stable-diffusion) | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/wildcard_stable_diffusion.ipynb) | [Shyam Sudhakaran](https://github.com/shyamsn97) |
| Stable Diffusion Interpolation | Interpolate the latent space of Stable Diffusion between different prompts/seeds | [Stable Diffusion Interpolation](#stable-diffusion-interpolation) | - | [Nate Raw](https://github.com/nateraw/) |
| Stable Diffusion Mega | **One** Stable Diffusion Pipeline with all functionalities of [Text2Image](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py), [Image2Image](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py) and [Inpainting](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py) | [Stable Diffusion Mega](#stable-diffusion-mega) | - | [Patrick von Platen](https://github.com/patrickvonplaten/) |
| Long Prompt Weighting Stable Diffusion | **One** Stable Diffusion Pipeline without tokens length limit, and support parsing weighting in prompt. | [Long Prompt Weighting Stable Diffusion](#long-prompt-weighting-stable-diffusion) | - | [SkyTNT](https://github.com/SkyTNT) |
| Speech to Image | Using automatic-speech-recognition to transcribe text and Stable Diffusion to generate images | [Speech to Image](#speech-to-image) | - | [Mikail Duzenli](https://github.com/MikailINTech)
| Wild Card Stable Diffusion | Stable Diffusion Pipeline that supports prompts that contain wildcard terms (indicated by surrounding double underscores), with values instantiated randomly from a corresponding txt file or a dictionary of possible values | [Wildcard Stable Diffusion](#wildcard-stable-diffusion) | - | [Shyam Sudhakaran](https://github.com/shyamsn97) |
| [Composable Stable Diffusion](https://energy-based-model.github.io/Compositional-Visual-Generation-with-Composable-Diffusion-Models/) | Stable Diffusion Pipeline that supports prompts that contain "&#124;" in prompts (as an AND condition) and weights (separated by "&#124;" as well) to positively / negatively weight prompts. | [Composable Stable Diffusion](#composable-stable-diffusion) | - | [Mark Rich](https://github.com/MarkRich) |
| Seed Resizing Stable Diffusion | Stable Diffusion Pipeline that supports resizing an image and retaining the concepts of the 512 by 512 generation. | [Seed Resizing](#seed-resizing) | - | [Mark Rich](https://github.com/MarkRich) |
| Imagic Stable Diffusion | Stable Diffusion Pipeline that enables writing a text prompt to edit an existing image | [Imagic Stable Diffusion](#imagic-stable-diffusion) | - | [Mark Rich](https://github.com/MarkRich) |
| Multilingual Stable Diffusion | Stable Diffusion Pipeline that supports prompts in 50 different languages. | [Multilingual Stable Diffusion](#multilingual-stable-diffusion-pipeline) | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/multilingual_stable_diffusion.ipynb) | [Juan Carlos Piñeros](https://github.com/juancopi81) |
| Multilingual Stable Diffusion | Stable Diffusion Pipeline that supports prompts in 50 different languages. | [Multilingual Stable Diffusion](#multilingual-stable-diffusion-pipeline) | - | [Juan Carlos Piñeros](https://github.com/juancopi81) |
| GlueGen Stable Diffusion | Stable Diffusion Pipeline that supports prompts in different languages using GlueGen adapter. | [GlueGen Stable Diffusion](#gluegen-stable-diffusion-pipeline) | - | [Phạm Hồng Vinh](https://github.com/rootonchair) |
| Image to Image Inpainting Stable Diffusion | Stable Diffusion Pipeline that enables the overlaying of two images and subsequent inpainting | [Image to Image Inpainting Stable Diffusion](#image-to-image-inpainting-stable-diffusion) | - | [Alex McKinney](https://github.com/vvvm23) |
| Text Based Inpainting Stable Diffusion | Stable Diffusion Inpainting Pipeline that enables passing a text prompt to generate the mask for inpainting | [Text Based Inpainting Stable Diffusion](#text-based-inpainting-stable-diffusion) | - | [Dhruv Karan](https://github.com/unography) |
@@ -41,8 +41,8 @@ Please also check out our [Community Scripts](https://github.com/huggingface/dif
| DDIM Noise Comparative Analysis Pipeline | Investigating how the diffusion models learn visual concepts from each noise level (which is a contribution of [P2 weighting (CVPR 2022)](https://arxiv.org/abs/2204.00227)) | [DDIM Noise Comparative Analysis Pipeline](#ddim-noise-comparative-analysis-pipeline) | - | [Aengus (Duc-Anh)](https://github.com/aengusng8) |
| CLIP Guided Img2Img Stable Diffusion Pipeline | Doing CLIP guidance for image to image generation with Stable Diffusion | [CLIP Guided Img2Img Stable Diffusion](#clip-guided-img2img-stable-diffusion) | - | [Nipun Jindal](https://github.com/nipunjindal/) |
| TensorRT Stable Diffusion Text to Image Pipeline | Accelerates the Stable Diffusion Text2Image Pipeline using TensorRT | [TensorRT Stable Diffusion Text to Image Pipeline](#tensorrt-text2image-stable-diffusion-pipeline) | - | [Asfiya Baig](https://github.com/asfiyab-nvidia) |
| EDICT Image Editing Pipeline | Diffusion pipeline for text-guided image editing | [EDICT Image Editing Pipeline](#edict-image-editing-pipeline) | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/edict_image_pipeline.ipynb) | [Joqsan Azocar](https://github.com/Joqsan) |
| Stable Diffusion RePaint | Stable Diffusion pipeline using [RePaint](https://arxiv.org/abs/2201.09865) for inpainting. | [Stable Diffusion RePaint](#stable-diffusion-repaint )|[Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/stable_diffusion_repaint.ipynb)| [Markus Pobitzer](https://github.com/Markus-Pobitzer) |
| EDICT Image Editing Pipeline | Diffusion pipeline for text-guided image editing | [EDICT Image Editing Pipeline](#edict-image-editing-pipeline) | - | [Joqsan Azocar](https://github.com/Joqsan) |
| Stable Diffusion RePaint | Stable Diffusion pipeline using [RePaint](https://arxiv.org/abs/2201.09865) for inpainting. | [Stable Diffusion RePaint](#stable-diffusion-repaint ) | - | [Markus Pobitzer](https://github.com/Markus-Pobitzer) |
| TensorRT Stable Diffusion Image to Image Pipeline | Accelerates the Stable Diffusion Image2Image Pipeline using TensorRT | [TensorRT Stable Diffusion Image to Image Pipeline](#tensorrt-image2image-stable-diffusion-pipeline) | - | [Asfiya Baig](https://github.com/asfiyab-nvidia) |
| Stable Diffusion IPEX Pipeline | Accelerate Stable Diffusion inference pipeline with BF16/FP32 precision on Intel Xeon CPUs with [IPEX](https://github.com/intel/intel-extension-for-pytorch) | [Stable Diffusion on IPEX](#stable-diffusion-on-ipex) | - | [Yingjie Han](https://github.com/yingjie-han/) |
| CLIP Guided Images Mixing Stable Diffusion Pipeline | Сombine images using usual diffusion models. | [CLIP Guided Images Mixing Using Stable Diffusion](#clip-guided-images-mixing-with-stable-diffusion) | - | [Karachev Denis](https://github.com/TheDenk) |
@@ -61,13 +61,13 @@ Please also check out our [Community Scripts](https://github.com/huggingface/dif
| Regional Prompting Pipeline | Assign multiple prompts for different regions | [Regional Prompting Pipeline](#regional-prompting-pipeline) | - | [hako-mikan](https://github.com/hako-mikan) |
| LDM3D-sr (LDM3D upscaler) | Upscale low resolution RGB and depth inputs to high resolution | [StableDiffusionUpscaleLDM3D Pipeline](https://github.com/estelleafl/diffusers/tree/ldm3d_upscaler_community/examples/community#stablediffusionupscaleldm3d-pipeline) | - | [Estelle Aflalo](https://github.com/estelleafl) |
| AnimateDiff ControlNet Pipeline | Combines AnimateDiff with precise motion control using ControlNets | [AnimateDiff ControlNet Pipeline](#animatediff-controlnet-pipeline) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1SKboYeGjEQmQPWoFC0aLYpBlYdHXkvAu?usp=sharing) | [Aryan V S](https://github.com/a-r-r-o-w) and [Edoardo Botta](https://github.com/EdoardoBotta) |
| DemoFusion Pipeline | Implementation of [DemoFusion: Democratising High-Resolution Image Generation With No $$$](https://arxiv.org/abs/2311.16973) | [DemoFusion Pipeline](#demofusion) | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/demo_fusion.ipynb) | [Ruoyi Du](https://github.com/RuoyiDu) |
| Instaflow Pipeline | Implementation of [InstaFlow! One-Step Stable Diffusion with Rectified Flow](https://arxiv.org/abs/2309.06380) | [Instaflow Pipeline](#instaflow-pipeline) | [Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/insta_flow.ipynb) | [Ayush Mangal](https://github.com/ayushtues) |
| DemoFusion Pipeline | Implementation of [DemoFusion: Democratising High-Resolution Image Generation With No $$$](https://arxiv.org/abs/2311.16973) | [DemoFusion Pipeline](#demofusion) | - | [Ruoyi Du](https://github.com/RuoyiDu) |
| Instaflow Pipeline | Implementation of [InstaFlow! One-Step Stable Diffusion with Rectified Flow](https://arxiv.org/abs/2309.06380) | [Instaflow Pipeline](#instaflow-pipeline) | - | [Ayush Mangal](https://github.com/ayushtues) |
| Null-Text Inversion Pipeline | Implement [Null-text Inversion for Editing Real Images using Guided Diffusion Models](https://arxiv.org/abs/2211.09794) as a pipeline. | [Null-Text Inversion](https://github.com/google/prompt-to-prompt/) | - | [Junsheng Luan](https://github.com/Junsheng121) |
| Rerender A Video Pipeline | Implementation of [[SIGGRAPH Asia 2023] Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation](https://arxiv.org/abs/2306.07954) | [Rerender A Video Pipeline](#rerender-a-video) | - | [Yifan Zhou](https://github.com/SingleZombie) |
| StyleAligned Pipeline | Implementation of [Style Aligned Image Generation via Shared Attention](https://arxiv.org/abs/2312.02133) | [StyleAligned Pipeline](#stylealigned-pipeline) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://drive.google.com/file/d/15X2E0jFPTajUIjS0FzX50OaHsCbP2lQ0/view?usp=sharing) | [Aryan V S](https://github.com/a-r-r-o-w) |
| AnimateDiff Image-To-Video Pipeline | Experimental Image-To-Video support for AnimateDiff (open to improvements) | [AnimateDiff Image To Video Pipeline](#animatediff-image-to-video-pipeline) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://drive.google.com/file/d/1TvzCDPHhfFtdcJZe4RLloAwyoLKuttWK/view?usp=sharing) | [Aryan V S](https://github.com/a-r-r-o-w) |
| IP Adapter FaceID Stable Diffusion | Stable Diffusion Pipeline that supports IP Adapter Face ID | [IP Adapter Face ID](#ip-adapter-face-id) |[Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/ip_adapter_face_id.ipynb)| [Fabio Rigano](https://github.com/fabiorigano) |
| IP Adapter FaceID Stable Diffusion | Stable Diffusion Pipeline that supports IP Adapter Face ID | [IP Adapter Face ID](#ip-adapter-face-id) | - | [Fabio Rigano](https://github.com/fabiorigano) |
| InstantID Pipeline | Stable Diffusion XL Pipeline that supports InstantID | [InstantID Pipeline](#instantid-pipeline) | [![Hugging Face Space](https://img.shields.io/badge/🤗%20Hugging%20Face-Space-yellow)](https://huggingface.co/spaces/InstantX/InstantID) | [Haofan Wang](https://github.com/haofanwang) |
| UFOGen Scheduler | Scheduler for UFOGen Model (compatible with Stable Diffusion pipelines) | [UFOGen Scheduler](#ufogen-scheduler) | - | [dg845](https://github.com/dg845) |
| Stable Diffusion XL IPEX Pipeline | Accelerate Stable Diffusion XL inference pipeline with BF16/FP32 precision on Intel Xeon CPUs with [IPEX](https://github.com/intel/intel-extension-for-pytorch) | [Stable Diffusion XL on IPEX](#stable-diffusion-xl-on-ipex) | - | [Dan Li](https://github.com/ustcuna/) |
@@ -251,30 +251,24 @@ Example usage:
from diffusers import DiffusionPipeline
import torch
model_name = "black-forest-labs/FLUX.1-dev"
prompt = "a watercolor painting of a unicorn"
negative_prompt = "pink"
# Load the diffusion pipeline
pipeline = DiffusionPipeline.from_pretrained(
model_name,
"black-forest-labs/FLUX.1-dev",
torch_dtype=torch.bfloat16,
custom_pipeline="pipeline_flux_with_cfg"
)
pipeline.enable_model_cpu_offload()
prompt = "a watercolor painting of a unicorn"
negative_prompt = "pink"
# Generate the image
img = pipeline(
prompt=prompt,
negative_prompt=negative_prompt,
true_cfg=1.5,
guidance_scale=3.5,
num_images_per_prompt=1,
generator=torch.manual_seed(0)
).images[0]
# Save the generated image
img.save("cfg_flux.png")
print("Image generated and saved successfully.")
```
### Differential Diffusion
@@ -847,8 +841,6 @@ out = pipe(
wildcard_files=["object.txt", "animal.txt"],
num_prompt_samples=1
)
out.images[0].save("image.png")
torch.cuda.empty_cache()
```
### Composable Stable diffusion
@@ -2625,17 +2617,16 @@ for obj in range(bs):
### Stable Diffusion XL Reference
This pipeline uses the Reference. Refer to the [Stable Diffusion Reference](https://github.com/huggingface/diffusers/blob/main/examples/community/README.md#stable-diffusion-reference) section for more information.
This pipeline uses the Reference. Refer to the [stable_diffusion_reference](https://github.com/huggingface/diffusers/blob/main/examples/community/README.md#stable-diffusion-reference).
```py
import torch
# from diffusers import DiffusionPipeline
from PIL import Image
from diffusers.utils import load_image
from diffusers import DiffusionPipeline
from diffusers.schedulers import UniPCMultistepScheduler
from .stable_diffusion_xl_reference import StableDiffusionXLReferencePipeline
input_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl_reference_input_cat.jpg")
input_image = load_image("https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png")
# pipe = DiffusionPipeline.from_pretrained(
# "stabilityai/stable-diffusion-xl-base-1.0",
@@ -2653,7 +2644,7 @@ pipe = StableDiffusionXLReferencePipeline.from_pretrained(
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
result_img = pipe(ref_image=input_image,
prompt="a dog",
prompt="1girl",
num_inference_steps=20,
reference_attn=True,
reference_adain=True).images[0]
@@ -2661,14 +2652,14 @@ result_img = pipe(ref_image=input_image,
Reference Image
![reference_image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl_reference_input_cat.jpg)
![reference_image](https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png)
Output Image
`prompt: a dog`
`prompt: 1 girl`
`reference_attn=False, reference_adain=True, num_inference_steps=20`
![Output_image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl_reference_adain_dog.png)
`reference_attn=True, reference_adain=True, num_inference_steps=20`
![Output_image](https://github.com/zideliu/diffusers/assets/34944964/743848da-a215-48f9-ae39-b5e2ae49fb13)
Reference Image
![reference_image](https://github.com/huggingface/diffusers/assets/34944964/449bdab6-e744-4fb2-9620-d4068d9a741b)
@@ -2690,88 +2681,6 @@ Output Image
`reference_attn=True, reference_adain=True, num_inference_steps=20`
![output_image](https://github.com/huggingface/diffusers/assets/34944964/9b2f1aca-886f-49c3-89ec-d2031c8e3670)
### Stable Diffusion XL ControlNet Reference
This pipeline uses the Reference Control and with ControlNet. Refer to the [Stable Diffusion ControlNet Reference](https://github.com/huggingface/diffusers/blob/main/examples/community/README.md#stable-diffusion-controlnet-reference) and [Stable Diffusion XL Reference](https://github.com/huggingface/diffusers/blob/main/examples/community/README.md#stable-diffusion-xl-reference) sections for more information.
```py
from diffusers import ControlNetModel, AutoencoderKL
from diffusers.schedulers import UniPCMultistepScheduler
from diffusers.utils import load_image
import numpy as np
import torch
import cv2
from PIL import Image
from .stable_diffusion_xl_controlnet_reference import StableDiffusionXLControlNetReferencePipeline
# download an image
canny_image = load_image(
"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl_reference_input_cat.jpg"
)
ref_image = load_image(
"https://hf.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/hf-logo.png"
)
# initialize the models and pipeline
controlnet_conditioning_scale = 0.5 # recommended for good generalization
controlnet = ControlNetModel.from_pretrained(
"diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
)
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = StableDiffusionXLControlNetReferencePipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet, vae=vae, torch_dtype=torch.float16
).to("cuda:0")
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
# get canny image
image = np.array(canny_image)
image = cv2.Canny(image, 100, 200)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)
# generate image
image = pipe(
prompt="a cat",
num_inference_steps=20,
controlnet_conditioning_scale=controlnet_conditioning_scale,
image=canny_image,
ref_image=ref_image,
reference_attn=False,
reference_adain=True,
style_fidelity=1.0,
generator=torch.Generator("cuda").manual_seed(42)
).images[0]
```
Canny ControlNet Image
![canny_image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl_reference_input_cat.jpg)
Reference Image
![ref_image](https://hf.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/hf-logo.png)
Output Image
`prompt: a cat`
`reference_attn=True, reference_adain=True, num_inference_steps=20, style_fidelity=1.0`
![Output_image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl_reference_attn_adain_canny_cat.png)
`reference_attn=False, reference_adain=True, num_inference_steps=20, style_fidelity=1.0`
![Output_image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl_reference_adain_canny_cat.png)
`reference_attn=True, reference_adain=False, num_inference_steps=20, style_fidelity=1.0`
![Output_image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl_reference_attn_canny_cat.png)
### Stable diffusion fabric pipeline
FABRIC approach applicable to a wide range of popular diffusion models, which exploits
@@ -3467,20 +3376,6 @@ best quality, 3persons in garden, a boy blue shirt BREAK
best quality, 3persons in garden, an old man red suit
```
### Use base prompt
You can use a base prompt to apply the prompt to all areas. You can set a base prompt by adding `ADDBASE` at the end. Base prompts can also be combined with common prompts, but the base prompt must be specified first.
```
2d animation style ADDBASE
masterpiece, high quality ADDCOMM
(blue sky)++ BREAK
green hair twintail BREAK
book shelf BREAK
messy desk BREAK
orange++ dress and sofa
```
### Negative prompt
Negative prompts are equally effective across all regions, but it is possible to set region-specific prompts for negative prompts as well. The number of BREAKs must be the same as the number of prompts. If the number of prompts does not match, the negative prompts will be used without being divided into regions.
@@ -3511,7 +3406,6 @@ pipe(prompt=prompt, rp_args=rp_args)
### Optional Parameters
- `save_mask`: In `Prompt` mode, choose whether to output the generated mask along with the image. The default is `False`.
- `base_ratio`: Used with `ADDBASE`. Sets the ratio of the base prompt; if base ratio is set to 0.2, then resulting images will consist of `20%*BASE_PROMPT + 80%*REGION_PROMPT`
The Pipeline supports `compel` syntax. Input prompts using the `compel` structure will be automatically applied and processed.
@@ -3840,7 +3734,6 @@ The original repo can be found at [repo](https://github.com/PRIS-CV/DemoFusion).
```py
from diffusers import DiffusionPipeline
import torch
pipe = DiffusionPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0",
@@ -3964,10 +3857,9 @@ You can also combine it with LORA out of the box, like <https://huggingface.co/a
from diffusers import DiffusionPipeline
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
pipe = DiffusionPipeline.from_pretrained("XCLIU/instaflow_0_9B_from_sd_1_5", torch_dtype=torch.float16, custom_pipeline="instaflow_one_step")
pipe.to(device) ### if GPU is not available, comment this line
pipe.to("cuda") ### if GPU is not available, comment this line
pipe.load_lora_weights("artificialguybr/logo-redmond-1-5v-logo-lora-for-liberteredmond-sd-1-5")
prompt = "logo, A logo for a fitness app, dynamic running figure, energetic colors (red, orange) ),LogoRedAF ,"
images = pipe(prompt=prompt,
@@ -4800,4 +4692,4 @@ with torch.no_grad():
```
In the folder examples/pixart there is also a script that can be used to train new models.
Please check the script `train_controlnet_hf_diffusers.sh` on how to start the training.
Please check the script `train_controlnet_hf_diffusers.sh` on how to start the training.

View File

@@ -6,9 +6,9 @@ If a community script doesn't work as expected, please open an issue and ping th
| Example | Description | Code Example | Colab | Author |
|:--------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------:|
| Using IP-Adapter with Negative Noise | Using negative noise with IP-adapter to better control the generation (see the [original post](https://github.com/huggingface/diffusers/discussions/7167) on the forum for more details) | [IP-Adapter Negative Noise](#ip-adapter-negative-noise) |[Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/ip_adapter_negative_noise.ipynb) | [Álvaro Somoza](https://github.com/asomoza)|
| Asymmetric Tiling |configure seamless image tiling independently for the X and Y axes | [Asymmetric Tiling](#Asymmetric-Tiling ) |[Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/asymetric_tiling.ipynb) | [alexisrolland](https://github.com/alexisrolland)|
| Prompt Scheduling Callback |Allows changing prompts during a generation | [Prompt Scheduling-Callback](#Prompt-Scheduling-Callback ) |[Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/prompt_scheduling_callback.ipynb) | [hlky](https://github.com/hlky)|
| Using IP-Adapter with Negative Noise | Using negative noise with IP-adapter to better control the generation (see the [original post](https://github.com/huggingface/diffusers/discussions/7167) on the forum for more details) | [IP-Adapter Negative Noise](#ip-adapter-negative-noise) | https://github.com/huggingface/notebooks/blob/main/diffusers/ip_adapter_negative_noise.ipynb | [Álvaro Somoza](https://github.com/asomoza)|
| Asymmetric Tiling |configure seamless image tiling independently for the X and Y axes | [Asymmetric Tiling](#Asymmetric-Tiling ) |https://github.com/huggingface/notebooks/blob/main/diffusers/asymetric_tiling.ipynb | [alexisrolland](https://github.com/alexisrolland)|
| Prompt Scheduling Callback |Allows changing prompts during a generation | [Prompt Scheduling-Callback](#Prompt-Scheduling-Callback ) |https://github.com/huggingface/notebooks/blob/main/diffusers/prompt_scheduling_callback.ipynb | [hlky](https://github.com/hlky)|
## Example usages
@@ -241,15 +241,27 @@ from diffusers import StableDiffusionPipeline
from diffusers.callbacks import PipelineCallback, MultiPipelineCallbacks
from diffusers.configuration_utils import register_to_config
import torch
from typing import Any, Dict, Tuple, Union
from typing import Any, Dict, Optional
class SDPromptSchedulingCallback(PipelineCallback):
pipeline: StableDiffusionPipeline = StableDiffusionPipeline.from_pretrained(
"stable-diffusion-v1-5/stable-diffusion-v1-5",
torch_dtype=torch.float16,
variant="fp16",
use_safetensors=True,
).to("cuda")
pipeline.safety_checker = None
pipeline.requires_safety_checker = False
class SDPromptScheduleCallback(PipelineCallback):
@register_to_config
def __init__(
self,
encoded_prompt: Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]],
cutoff_step_ratio=None,
prompt: str,
negative_prompt: Optional[str] = None,
num_images_per_prompt: int = 1,
cutoff_step_ratio=1.0,
cutoff_step_index=None,
):
super().__init__(
@@ -263,10 +275,6 @@ class SDPromptSchedulingCallback(PipelineCallback):
) -> Dict[str, Any]:
cutoff_step_ratio = self.config.cutoff_step_ratio
cutoff_step_index = self.config.cutoff_step_index
if isinstance(self.config.encoded_prompt, tuple):
prompt_embeds, negative_prompt_embeds = self.config.encoded_prompt
else:
prompt_embeds = self.config.encoded_prompt
# Use cutoff_step_index if it's not None, otherwise use cutoff_step_ratio
cutoff_step = (
@@ -276,164 +284,34 @@ class SDPromptSchedulingCallback(PipelineCallback):
)
if step_index == cutoff_step:
prompt_embeds, negative_prompt_embeds = pipeline.encode_prompt(
prompt=self.config.prompt,
negative_prompt=self.config.negative_prompt,
device=pipeline._execution_device,
num_images_per_prompt=self.config.num_images_per_prompt,
do_classifier_free_guidance=pipeline.do_classifier_free_guidance,
)
if pipeline.do_classifier_free_guidance:
prompt_embeds = torch.cat([negative_prompt_embeds, prompt_embeds])
callback_kwargs[self.tensor_inputs[0]] = prompt_embeds
return callback_kwargs
pipeline: StableDiffusionPipeline = StableDiffusionPipeline.from_pretrained(
"stable-diffusion-v1-5/stable-diffusion-v1-5",
torch_dtype=torch.float16,
variant="fp16",
use_safetensors=True,
).to("cuda")
pipeline.safety_checker = None
pipeline.requires_safety_checker = False
callback = MultiPipelineCallbacks(
[
SDPromptSchedulingCallback(
encoded_prompt=pipeline.encode_prompt(
prompt=f"prompt {index}",
negative_prompt=f"negative prompt {index}",
device=pipeline._execution_device,
num_images_per_prompt=1,
# pipeline.do_classifier_free_guidance can't be accessed until after pipeline is ran
do_classifier_free_guidance=True,
),
cutoff_step_index=index,
) for index in range(1, 20)
SDPromptScheduleCallback(
prompt="Official portrait of a smiling world war ii general, female, cheerful, happy, detailed face, 20th century, highly detailed, cinematic lighting, digital art painting by Greg Rutkowski",
negative_prompt="Deformed, ugly, bad anatomy",
cutoff_step_ratio=0.25,
)
]
)
image = pipeline(
prompt="prompt"
negative_prompt="negative prompt",
prompt="Official portrait of a smiling world war ii general, male, cheerful, happy, detailed face, 20th century, highly detailed, cinematic lighting, digital art painting by Greg Rutkowski",
negative_prompt="Deformed, ugly, bad anatomy",
callback_on_step_end=callback,
callback_on_step_end_tensor_inputs=["prompt_embeds"],
).images[0]
torch.cuda.empty_cache()
image.save('image.png')
```
```python
from diffusers import StableDiffusionXLPipeline
from diffusers.callbacks import PipelineCallback, MultiPipelineCallbacks
from diffusers.configuration_utils import register_to_config
import torch
from typing import Any, Dict, Tuple, Union
class SDXLPromptSchedulingCallback(PipelineCallback):
@register_to_config
def __init__(
self,
encoded_prompt: Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]],
add_text_embeds: Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]],
add_time_ids: Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]],
cutoff_step_ratio=None,
cutoff_step_index=None,
):
super().__init__(
cutoff_step_ratio=cutoff_step_ratio, cutoff_step_index=cutoff_step_index
)
tensor_inputs = ["prompt_embeds", "add_text_embeds", "add_time_ids"]
def callback_fn(
self, pipeline, step_index, timestep, callback_kwargs
) -> Dict[str, Any]:
cutoff_step_ratio = self.config.cutoff_step_ratio
cutoff_step_index = self.config.cutoff_step_index
if isinstance(self.config.encoded_prompt, tuple):
prompt_embeds, negative_prompt_embeds = self.config.encoded_prompt
else:
prompt_embeds = self.config.encoded_prompt
if isinstance(self.config.add_text_embeds, tuple):
add_text_embeds, negative_add_text_embeds = self.config.add_text_embeds
else:
add_text_embeds = self.config.add_text_embeds
if isinstance(self.config.add_time_ids, tuple):
add_time_ids, negative_add_time_ids = self.config.add_time_ids
else:
add_time_ids = self.config.add_time_ids
# Use cutoff_step_index if it's not None, otherwise use cutoff_step_ratio
cutoff_step = (
cutoff_step_index
if cutoff_step_index is not None
else int(pipeline.num_timesteps * cutoff_step_ratio)
)
if step_index == cutoff_step:
if pipeline.do_classifier_free_guidance:
prompt_embeds = torch.cat([negative_prompt_embeds, prompt_embeds])
add_text_embeds = torch.cat([negative_add_text_embeds, add_text_embeds])
add_time_ids = torch.cat([negative_add_time_ids, add_time_ids])
callback_kwargs[self.tensor_inputs[0]] = prompt_embeds
callback_kwargs[self.tensor_inputs[1]] = add_text_embeds
callback_kwargs[self.tensor_inputs[2]] = add_time_ids
return callback_kwargs
pipeline: StableDiffusionXLPipeline = StableDiffusionXLPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0",
torch_dtype=torch.float16,
variant="fp16",
use_safetensors=True,
).to("cuda")
callbacks = []
for index in range(1, 20):
(
prompt_embeds,
negative_prompt_embeds,
pooled_prompt_embeds,
negative_pooled_prompt_embeds,
) = pipeline.encode_prompt(
prompt=f"prompt {index}",
negative_prompt=f"prompt {index}",
device=pipeline._execution_device,
num_images_per_prompt=1,
# pipeline.do_classifier_free_guidance can't be accessed until after pipeline is ran
do_classifier_free_guidance=True,
)
text_encoder_projection_dim = int(pooled_prompt_embeds.shape[-1])
add_time_ids = pipeline._get_add_time_ids(
(1024, 1024),
(0, 0),
(1024, 1024),
dtype=prompt_embeds.dtype,
text_encoder_projection_dim=text_encoder_projection_dim,
)
negative_add_time_ids = pipeline._get_add_time_ids(
(1024, 1024),
(0, 0),
(1024, 1024),
dtype=prompt_embeds.dtype,
text_encoder_projection_dim=text_encoder_projection_dim,
)
callbacks.append(
SDXLPromptSchedulingCallback(
encoded_prompt=(prompt_embeds, negative_prompt_embeds),
add_text_embeds=(pooled_prompt_embeds, negative_pooled_prompt_embeds),
add_time_ids=(add_time_ids, negative_add_time_ids),
cutoff_step_index=index,
)
)
callback = MultiPipelineCallbacks(callbacks)
image = pipeline(
prompt="prompt",
negative_prompt="negative prompt",
callback_on_step_end=callback,
callback_on_step_end_tensor_inputs=[
"prompt_embeds",
"add_text_embeds",
"add_time_ids",
],
).images[0]
```

File diff suppressed because it is too large Load Diff

View File

@@ -1008,8 +1008,6 @@ class HunyuanDiTDifferentialImg2ImgPipeline(DiffusionPipeline):
self.transformer.inner_dim // self.transformer.num_heads,
grid_crops_coords,
(grid_height, grid_width),
device=device,
output_type="pt",
)
style = torch.tensor([0], device=device)

View File

@@ -3,12 +3,13 @@ from typing import Dict, Optional
import torch
import torchvision.transforms.functional as FF
from transformers import CLIPImageProcessor, CLIPTextModel, CLIPTokenizer, CLIPVisionModelWithProjection
from transformers import CLIPImageProcessor, CLIPTextModel, CLIPTokenizer
from diffusers import StableDiffusionPipeline
from diffusers.models import AutoencoderKL, UNet2DConditionModel
from diffusers.pipelines.stable_diffusion.safety_checker import StableDiffusionSafetyChecker
from diffusers.schedulers import KarrasDiffusionSchedulers
from diffusers.utils import USE_PEFT_BACKEND
try:
@@ -16,7 +17,6 @@ try:
except ImportError:
Compel = None
KBASE = "ADDBASE"
KCOMM = "ADDCOMM"
KBRK = "BREAK"
@@ -34,11 +34,6 @@ class RegionalPromptingStableDiffusionPipeline(StableDiffusionPipeline):
Optional
rp_args["save_mask"]: True/False (save masks in prompt mode)
rp_args["power"]: int (power for attention maps in prompt mode)
rp_args["base_ratio"]:
float (Sets the ratio of the base prompt)
ex) 0.2 (20%*BASE_PROMPT + 80%*REGION_PROMPT)
[Use base prompt](https://github.com/hako-mikan/sd-webui-regional-prompter?tab=readme-ov-file#use-base-prompt)
Pipeline for text-to-image generation using Stable Diffusion.
@@ -75,7 +70,6 @@ class RegionalPromptingStableDiffusionPipeline(StableDiffusionPipeline):
scheduler: KarrasDiffusionSchedulers,
safety_checker: StableDiffusionSafetyChecker,
feature_extractor: CLIPImageProcessor,
image_encoder: CLIPVisionModelWithProjection = None,
requires_safety_checker: bool = True,
):
super().__init__(
@@ -86,7 +80,6 @@ class RegionalPromptingStableDiffusionPipeline(StableDiffusionPipeline):
scheduler,
safety_checker,
feature_extractor,
image_encoder,
requires_safety_checker,
)
self.register_modules(
@@ -97,7 +90,6 @@ class RegionalPromptingStableDiffusionPipeline(StableDiffusionPipeline):
scheduler=scheduler,
safety_checker=safety_checker,
feature_extractor=feature_extractor,
image_encoder=image_encoder,
)
@torch.no_grad()
@@ -118,40 +110,17 @@ class RegionalPromptingStableDiffusionPipeline(StableDiffusionPipeline):
rp_args: Dict[str, str] = None,
):
active = KBRK in prompt[0] if isinstance(prompt, list) else KBRK in prompt
use_base = KBASE in prompt[0] if isinstance(prompt, list) else KBASE in prompt
if negative_prompt is None:
negative_prompt = "" if isinstance(prompt, str) else [""] * len(prompt)
device = self._execution_device
regions = 0
self.base_ratio = float(rp_args["base_ratio"]) if "base_ratio" in rp_args else 0.0
self.power = int(rp_args["power"]) if "power" in rp_args else 1
prompts = prompt if isinstance(prompt, list) else [prompt]
n_prompts = negative_prompt if isinstance(negative_prompt, list) else [negative_prompt]
n_prompts = negative_prompt if isinstance(prompt, str) else [negative_prompt]
self.batch = batch = num_images_per_prompt * len(prompts)
if use_base:
bases = prompts.copy()
n_bases = n_prompts.copy()
for i, prompt in enumerate(prompts):
parts = prompt.split(KBASE)
if len(parts) == 2:
bases[i], prompts[i] = parts
elif len(parts) > 2:
raise ValueError(f"Multiple instances of {KBASE} found in prompt: {prompt}")
for i, prompt in enumerate(n_prompts):
n_parts = prompt.split(KBASE)
if len(n_parts) == 2:
n_bases[i], n_prompts[i] = n_parts
elif len(n_parts) > 2:
raise ValueError(f"Multiple instances of {KBASE} found in negative prompt: {prompt}")
all_bases_cn, _ = promptsmaker(bases, num_images_per_prompt)
all_n_bases_cn, _ = promptsmaker(n_bases, num_images_per_prompt)
all_prompts_cn, all_prompts_p = promptsmaker(prompts, num_images_per_prompt)
all_n_prompts_cn, _ = promptsmaker(n_prompts, num_images_per_prompt)
@@ -168,16 +137,8 @@ class RegionalPromptingStableDiffusionPipeline(StableDiffusionPipeline):
conds = getcompelembs(all_prompts_cn)
unconds = getcompelembs(all_n_prompts_cn)
base_embs = getcompelembs(all_bases_cn) if use_base else None
base_n_embs = getcompelembs(all_n_bases_cn) if use_base else None
# When using base, it seems more reasonable to use base prompts as prompt_embeddings rather than regional prompts
embs = getcompelembs(prompts) if not use_base else base_embs
n_embs = getcompelembs(n_prompts) if not use_base else base_n_embs
if use_base and self.base_ratio > 0:
conds = self.base_ratio * base_embs + (1 - self.base_ratio) * conds
unconds = self.base_ratio * base_n_embs + (1 - self.base_ratio) * unconds
embs = getcompelembs(prompts)
n_embs = getcompelembs(n_prompts)
prompt = negative_prompt = None
else:
conds = self.encode_prompt(prompts, device, 1, True)[0]
@@ -186,18 +147,6 @@ class RegionalPromptingStableDiffusionPipeline(StableDiffusionPipeline):
if equal
else self.encode_prompt(all_n_prompts_cn, device, 1, True)[0]
)
if use_base and self.base_ratio > 0:
base_embs = self.encode_prompt(bases, device, 1, True)[0]
base_n_embs = (
self.encode_prompt(n_bases, device, 1, True)[0]
if equal
else self.encode_prompt(all_n_bases_cn, device, 1, True)[0]
)
conds = self.base_ratio * base_embs + (1 - self.base_ratio) * conds
unconds = self.base_ratio * base_n_embs + (1 - self.base_ratio) * unconds
embs = n_embs = None
if not active:
@@ -276,6 +225,8 @@ class RegionalPromptingStableDiffusionPipeline(StableDiffusionPipeline):
residual = hidden_states
args = () if USE_PEFT_BACKEND else (scale,)
if attn.spatial_norm is not None:
hidden_states = attn.spatial_norm(hidden_states, temb)
@@ -296,15 +247,16 @@ class RegionalPromptingStableDiffusionPipeline(StableDiffusionPipeline):
if attn.group_norm is not None:
hidden_states = attn.group_norm(hidden_states.transpose(1, 2)).transpose(1, 2)
query = attn.to_q(hidden_states)
args = () if USE_PEFT_BACKEND else (scale,)
query = attn.to_q(hidden_states, *args)
if encoder_hidden_states is None:
encoder_hidden_states = hidden_states
elif attn.norm_cross:
encoder_hidden_states = attn.norm_encoder_hidden_states(encoder_hidden_states)
key = attn.to_k(encoder_hidden_states)
value = attn.to_v(encoder_hidden_states)
key = attn.to_k(encoder_hidden_states, *args)
value = attn.to_v(encoder_hidden_states, *args)
inner_dim = key.shape[-1]
head_dim = inner_dim // attn.heads
@@ -331,7 +283,7 @@ class RegionalPromptingStableDiffusionPipeline(StableDiffusionPipeline):
hidden_states = hidden_states.to(query.dtype)
# linear proj
hidden_states = attn.to_out[0](hidden_states)
hidden_states = attn.to_out[0](hidden_states, *args)
# dropout
hidden_states = attn.to_out[1](hidden_states)
@@ -458,9 +410,9 @@ def promptsmaker(prompts, batch):
add = ""
if KCOMM in prompt:
add, prompt = prompt.split(KCOMM)
add = add.strip() + " "
prompts = [p.strip() for p in prompt.split(KBRK)]
out_p.append([add + p for i, p in enumerate(prompts)])
add = add + " "
prompts = prompt.split(KBRK)
out_p.append([add + p for p in prompts])
out = [None] * batch * len(out_p[0]) * len(out_p)
for p, prs in enumerate(out_p): # inputs prompts
for r, pr in enumerate(prs): # prompts for regions
@@ -497,6 +449,7 @@ def make_cells(ratios):
add = []
startend(add, inratios[1:])
icells.append(add)
return ocells, icells, sum(len(cell) for cell in icells)

File diff suppressed because it is too large Load Diff

View File

@@ -1,6 +1,5 @@
# Based on stable_diffusion_reference.py
import inspect
from typing import Any, Callable, Dict, List, Optional, Tuple, Union
import numpy as np
@@ -8,33 +7,28 @@ import PIL.Image
import torch
from diffusers import StableDiffusionXLPipeline
from diffusers.callbacks import MultiPipelineCallbacks, PipelineCallback
from diffusers.image_processor import PipelineImageInput
from diffusers.models.attention import BasicTransformerBlock
from diffusers.models.unets.unet_2d_blocks import CrossAttnDownBlock2D, CrossAttnUpBlock2D, DownBlock2D, UpBlock2D
from diffusers.pipelines.stable_diffusion_xl.pipeline_output import StableDiffusionXLPipelineOutput
from diffusers.utils import PIL_INTERPOLATION, deprecate, is_torch_xla_available, logging, replace_example_docstring
from diffusers.models.unets.unet_2d_blocks import (
CrossAttnDownBlock2D,
CrossAttnUpBlock2D,
DownBlock2D,
UpBlock2D,
)
from diffusers.pipelines.stable_diffusion_xl import StableDiffusionXLPipelineOutput
from diffusers.utils import PIL_INTERPOLATION, logging
from diffusers.utils.torch_utils import randn_tensor
if is_torch_xla_available():
import torch_xla.core.xla_model as xm # type: ignore
XLA_AVAILABLE = True
else:
XLA_AVAILABLE = False
logger = logging.get_logger(__name__) # pylint: disable=invalid-name
EXAMPLE_DOC_STRING = """
Examples:
```py
>>> import torch
>>> from diffusers.schedulers import UniPCMultistepScheduler
>>> from diffusers import UniPCMultistepScheduler
>>> from diffusers.utils import load_image
>>> input_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl_reference_input_cat.jpg")
>>> input_image = load_image("https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png")
>>> pipe = StableDiffusionXLReferencePipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0",
@@ -44,7 +38,7 @@ EXAMPLE_DOC_STRING = """
>>> pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
>>> result_img = pipe(ref_image=input_image,
prompt="a dog",
prompt="1girl",
num_inference_steps=20,
reference_attn=True,
reference_adain=True).images[0]
@@ -62,6 +56,8 @@ def torch_dfs(model: torch.nn.Module):
# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.rescale_noise_cfg
def rescale_noise_cfg(noise_cfg, noise_pred_text, guidance_rescale=0.0):
"""
Rescale `noise_cfg` according to `guidance_rescale`. Based on findings of [Common Diffusion Noise Schedules and
@@ -76,102 +72,33 @@ def rescale_noise_cfg(noise_cfg, noise_pred_text, guidance_rescale=0.0):
return noise_cfg
# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.retrieve_timesteps
def retrieve_timesteps(
scheduler,
num_inference_steps: Optional[int] = None,
device: Optional[Union[str, torch.device]] = None,
timesteps: Optional[List[int]] = None,
sigmas: Optional[List[float]] = None,
**kwargs,
):
r"""
Calls the scheduler's `set_timesteps` method and retrieves timesteps from the scheduler after the call. Handles
custom timesteps. Any kwargs will be supplied to `scheduler.set_timesteps`.
Args:
scheduler (`SchedulerMixin`):
The scheduler to get timesteps from.
num_inference_steps (`int`):
The number of diffusion steps used when generating samples with a pre-trained model. If used, `timesteps`
must be `None`.
device (`str` or `torch.device`, *optional*):
The device to which the timesteps should be moved to. If `None`, the timesteps are not moved.
timesteps (`List[int]`, *optional*):
Custom timesteps used to override the timestep spacing strategy of the scheduler. If `timesteps` is passed,
`num_inference_steps` and `sigmas` must be `None`.
sigmas (`List[float]`, *optional*):
Custom sigmas used to override the timestep spacing strategy of the scheduler. If `sigmas` is passed,
`num_inference_steps` and `timesteps` must be `None`.
Returns:
`Tuple[torch.Tensor, int]`: A tuple where the first element is the timestep schedule from the scheduler and the
second element is the number of inference steps.
"""
if timesteps is not None and sigmas is not None:
raise ValueError("Only one of `timesteps` or `sigmas` can be passed. Please choose one to set custom values")
if timesteps is not None:
accepts_timesteps = "timesteps" in set(inspect.signature(scheduler.set_timesteps).parameters.keys())
if not accepts_timesteps:
raise ValueError(
f"The current scheduler class {scheduler.__class__}'s `set_timesteps` does not support custom"
f" timestep schedules. Please check whether you are using the correct scheduler."
)
scheduler.set_timesteps(timesteps=timesteps, device=device, **kwargs)
timesteps = scheduler.timesteps
num_inference_steps = len(timesteps)
elif sigmas is not None:
accept_sigmas = "sigmas" in set(inspect.signature(scheduler.set_timesteps).parameters.keys())
if not accept_sigmas:
raise ValueError(
f"The current scheduler class {scheduler.__class__}'s `set_timesteps` does not support custom"
f" sigmas schedules. Please check whether you are using the correct scheduler."
)
scheduler.set_timesteps(sigmas=sigmas, device=device, **kwargs)
timesteps = scheduler.timesteps
num_inference_steps = len(timesteps)
else:
scheduler.set_timesteps(num_inference_steps, device=device, **kwargs)
timesteps = scheduler.timesteps
return timesteps, num_inference_steps
class StableDiffusionXLReferencePipeline(StableDiffusionXLPipeline):
def prepare_ref_latents(self, refimage, batch_size, dtype, device, generator, do_classifier_free_guidance):
refimage = refimage.to(device=device)
if self.vae.dtype == torch.float16 and self.vae.config.force_upcast:
self.upcast_vae()
refimage = refimage.to(next(iter(self.vae.post_quant_conv.parameters())).dtype)
if refimage.dtype != self.vae.dtype:
refimage = refimage.to(dtype=self.vae.dtype)
# encode the mask image into latents space so we can concatenate it to the latents
if isinstance(generator, list):
ref_image_latents = [
self.vae.encode(refimage[i : i + 1]).latent_dist.sample(generator=generator[i])
for i in range(batch_size)
]
ref_image_latents = torch.cat(ref_image_latents, dim=0)
else:
ref_image_latents = self.vae.encode(refimage).latent_dist.sample(generator=generator)
ref_image_latents = self.vae.config.scaling_factor * ref_image_latents
def _default_height_width(self, height, width, image):
# NOTE: It is possible that a list of images have different
# dimensions for each image, so just checking the first image
# is not _exactly_ correct, but it is simple.
while isinstance(image, list):
image = image[0]
# duplicate mask and ref_image_latents for each generation per prompt, using mps friendly method
if ref_image_latents.shape[0] < batch_size:
if not batch_size % ref_image_latents.shape[0] == 0:
raise ValueError(
"The passed images and the required batch size don't match. Images are supposed to be duplicated"
f" to a total batch size of {batch_size}, but {ref_image_latents.shape[0]} images were passed."
" Make sure the number of images that you pass is divisible by the total requested batch size."
)
ref_image_latents = ref_image_latents.repeat(batch_size // ref_image_latents.shape[0], 1, 1, 1)
if height is None:
if isinstance(image, PIL.Image.Image):
height = image.height
elif isinstance(image, torch.Tensor):
height = image.shape[2]
ref_image_latents = torch.cat([ref_image_latents] * 2) if do_classifier_free_guidance else ref_image_latents
height = (height // 8) * 8 # round down to nearest multiple of 8
# aligning device to prevent device errors when concating it with the latent model input
ref_image_latents = ref_image_latents.to(device=device, dtype=dtype)
return ref_image_latents
if width is None:
if isinstance(image, PIL.Image.Image):
width = image.width
elif isinstance(image, torch.Tensor):
width = image.shape[3]
def prepare_ref_image(
width = (width // 8) * 8
return height, width
def prepare_image(
self,
image,
width,
@@ -224,42 +151,41 @@ class StableDiffusionXLReferencePipeline(StableDiffusionXLPipeline):
return image
def check_ref_inputs(
self,
ref_image,
reference_guidance_start,
reference_guidance_end,
style_fidelity,
reference_attn,
reference_adain,
):
ref_image_is_pil = isinstance(ref_image, PIL.Image.Image)
ref_image_is_tensor = isinstance(ref_image, torch.Tensor)
def prepare_ref_latents(self, refimage, batch_size, dtype, device, generator, do_classifier_free_guidance):
refimage = refimage.to(device=device)
if self.vae.dtype == torch.float16 and self.vae.config.force_upcast:
self.upcast_vae()
refimage = refimage.to(next(iter(self.vae.post_quant_conv.parameters())).dtype)
if refimage.dtype != self.vae.dtype:
refimage = refimage.to(dtype=self.vae.dtype)
# encode the mask image into latents space so we can concatenate it to the latents
if isinstance(generator, list):
ref_image_latents = [
self.vae.encode(refimage[i : i + 1]).latent_dist.sample(generator=generator[i])
for i in range(batch_size)
]
ref_image_latents = torch.cat(ref_image_latents, dim=0)
else:
ref_image_latents = self.vae.encode(refimage).latent_dist.sample(generator=generator)
ref_image_latents = self.vae.config.scaling_factor * ref_image_latents
if not ref_image_is_pil and not ref_image_is_tensor:
raise TypeError(
f"ref image must be passed and be one of PIL image or torch tensor, but is {type(ref_image)}"
)
# duplicate mask and ref_image_latents for each generation per prompt, using mps friendly method
if ref_image_latents.shape[0] < batch_size:
if not batch_size % ref_image_latents.shape[0] == 0:
raise ValueError(
"The passed images and the required batch size don't match. Images are supposed to be duplicated"
f" to a total batch size of {batch_size}, but {ref_image_latents.shape[0]} images were passed."
" Make sure the number of images that you pass is divisible by the total requested batch size."
)
ref_image_latents = ref_image_latents.repeat(batch_size // ref_image_latents.shape[0], 1, 1, 1)
if not reference_attn and not reference_adain:
raise ValueError("`reference_attn` or `reference_adain` must be True.")
ref_image_latents = torch.cat([ref_image_latents] * 2) if do_classifier_free_guidance else ref_image_latents
if style_fidelity < 0.0:
raise ValueError(f"style fidelity: {style_fidelity} can't be smaller than 0.")
if style_fidelity > 1.0:
raise ValueError(f"style fidelity: {style_fidelity} can't be larger than 1.0.")
if reference_guidance_start >= reference_guidance_end:
raise ValueError(
f"reference guidance start: {reference_guidance_start} cannot be larger or equal to reference guidance end: {reference_guidance_end}."
)
if reference_guidance_start < 0.0:
raise ValueError(f"reference guidance start: {reference_guidance_start} can't be smaller than 0.")
if reference_guidance_end > 1.0:
raise ValueError(f"reference guidance end: {reference_guidance_end} can't be larger than 1.0.")
# aligning device to prevent device errors when concating it with the latent model input
ref_image_latents = ref_image_latents.to(device=device, dtype=dtype)
return ref_image_latents
@torch.no_grad()
@replace_example_docstring(EXAMPLE_DOC_STRING)
def __call__(
self,
prompt: Union[str, List[str]] = None,
@@ -268,8 +194,6 @@ class StableDiffusionXLReferencePipeline(StableDiffusionXLPipeline):
height: Optional[int] = None,
width: Optional[int] = None,
num_inference_steps: int = 50,
timesteps: List[int] = None,
sigmas: List[float] = None,
denoising_end: Optional[float] = None,
guidance_scale: float = 5.0,
negative_prompt: Optional[Union[str, List[str]]] = None,
@@ -282,220 +206,28 @@ class StableDiffusionXLReferencePipeline(StableDiffusionXLPipeline):
negative_prompt_embeds: Optional[torch.Tensor] = None,
pooled_prompt_embeds: Optional[torch.Tensor] = None,
negative_pooled_prompt_embeds: Optional[torch.Tensor] = None,
ip_adapter_image: Optional[PipelineImageInput] = None,
ip_adapter_image_embeds: Optional[List[torch.Tensor]] = None,
output_type: Optional[str] = "pil",
return_dict: bool = True,
callback: Optional[Callable[[int, int, torch.Tensor], None]] = None,
callback_steps: int = 1,
cross_attention_kwargs: Optional[Dict[str, Any]] = None,
guidance_rescale: float = 0.0,
original_size: Optional[Tuple[int, int]] = None,
crops_coords_top_left: Tuple[int, int] = (0, 0),
target_size: Optional[Tuple[int, int]] = None,
negative_original_size: Optional[Tuple[int, int]] = None,
negative_crops_coords_top_left: Tuple[int, int] = (0, 0),
negative_target_size: Optional[Tuple[int, int]] = None,
clip_skip: Optional[int] = None,
callback_on_step_end: Optional[
Union[Callable[[int, int, Dict], None], PipelineCallback, MultiPipelineCallbacks]
] = None,
callback_on_step_end_tensor_inputs: List[str] = ["latents"],
attention_auto_machine_weight: float = 1.0,
gn_auto_machine_weight: float = 1.0,
reference_guidance_start: float = 0.0,
reference_guidance_end: float = 1.0,
style_fidelity: float = 0.5,
reference_attn: bool = True,
reference_adain: bool = True,
**kwargs,
):
r"""
Function invoked when calling the pipeline for generation.
Args:
prompt (`str` or `List[str]`, *optional*):
The prompt or prompts to guide the image generation. If not defined, one has to pass `prompt_embeds`.
instead.
prompt_2 (`str` or `List[str]`, *optional*):
The prompt or prompts to be sent to the `tokenizer_2` and `text_encoder_2`. If not defined, `prompt` is
used in both text-encoders
ref_image (`torch.Tensor`, `PIL.Image.Image`):
The Reference Control input condition. Reference Control uses this input condition to generate guidance to Unet. If
the type is specified as `Torch.Tensor`, it is passed to Reference Control as is. `PIL.Image.Image` can
also be accepted as an image.
height (`int`, *optional*, defaults to self.unet.config.sample_size * self.vae_scale_factor):
The height in pixels of the generated image. This is set to 1024 by default for the best results.
Anything below 512 pixels won't work well for
[stabilityai/stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
and checkpoints that are not specifically fine-tuned on low resolutions.
width (`int`, *optional*, defaults to self.unet.config.sample_size * self.vae_scale_factor):
The width in pixels of the generated image. This is set to 1024 by default for the best results.
Anything below 512 pixels won't work well for
[stabilityai/stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
and checkpoints that are not specifically fine-tuned on low resolutions.
num_inference_steps (`int`, *optional*, defaults to 50):
The number of denoising steps. More denoising steps usually lead to a higher quality image at the
expense of slower inference.
timesteps (`List[int]`, *optional*):
Custom timesteps to use for the denoising process with schedulers which support a `timesteps` argument
in their `set_timesteps` method. If not defined, the default behavior when `num_inference_steps` is
passed will be used. Must be in descending order.
sigmas (`List[float]`, *optional*):
Custom sigmas to use for the denoising process with schedulers which support a `sigmas` argument in
their `set_timesteps` method. If not defined, the default behavior when `num_inference_steps` is passed
will be used.
denoising_end (`float`, *optional*):
When specified, determines the fraction (between 0.0 and 1.0) of the total denoising process to be
completed before it is intentionally prematurely terminated. As a result, the returned sample will
still retain a substantial amount of noise as determined by the discrete timesteps selected by the
scheduler. The denoising_end parameter should ideally be utilized when this pipeline forms a part of a
"Mixture of Denoisers" multi-pipeline setup, as elaborated in [**Refining the Image
Output**](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/stable_diffusion_xl#refining-the-image-output)
guidance_scale (`float`, *optional*, defaults to 5.0):
Guidance scale as defined in [Classifier-Free Diffusion Guidance](https://arxiv.org/abs/2207.12598).
`guidance_scale` is defined as `w` of equation 2. of [Imagen
Paper](https://arxiv.org/pdf/2205.11487.pdf). Guidance scale is enabled by setting `guidance_scale >
1`. Higher guidance scale encourages to generate images that are closely linked to the text `prompt`,
usually at the expense of lower image quality.
negative_prompt (`str` or `List[str]`, *optional*):
The prompt or prompts not to guide the image generation. If not defined, one has to pass
`negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is
less than `1`).
negative_prompt_2 (`str` or `List[str]`, *optional*):
The prompt or prompts not to guide the image generation to be sent to `tokenizer_2` and
`text_encoder_2`. If not defined, `negative_prompt` is used in both text-encoders
num_images_per_prompt (`int`, *optional*, defaults to 1):
The number of images to generate per prompt.
eta (`float`, *optional*, defaults to 0.0):
Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to
[`schedulers.DDIMScheduler`], will be ignored for others.
generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html)
to make generation deterministic.
latents (`torch.Tensor`, *optional*):
Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image
generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
tensor will ge generated by sampling using the supplied random `generator`.
prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not
provided, text embeddings will be generated from `prompt` input argument.
negative_prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt
weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input
argument.
pooled_prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated pooled text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting.
If not provided, pooled text embeddings will be generated from `prompt` input argument.
negative_pooled_prompt_embeds (`torch.Tensor`, *optional*):
Pre-generated negative pooled text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt
weighting. If not provided, pooled negative_prompt_embeds will be generated from `negative_prompt`
input argument.
ip_adapter_image: (`PipelineImageInput`, *optional*): Optional image input to work with IP Adapters.
ip_adapter_image_embeds (`List[torch.Tensor]`, *optional*):
Pre-generated image embeddings for IP-Adapter. It should be a list of length same as number of
IP-adapters. Each element should be a tensor of shape `(batch_size, num_images, emb_dim)`. It should
contain the negative image embedding if `do_classifier_free_guidance` is set to `True`. If not
provided, embeddings are computed from the `ip_adapter_image` input argument.
output_type (`str`, *optional*, defaults to `"pil"`):
The output format of the generate image. Choose between
[PIL](https://pillow.readthedocs.io/en/stable/): `PIL.Image.Image` or `np.array`.
return_dict (`bool`, *optional*, defaults to `True`):
Whether or not to return a [`~pipelines.stable_diffusion_xl.StableDiffusionXLPipelineOutput`] instead
of a plain tuple.
cross_attention_kwargs (`dict`, *optional*):
A kwargs dictionary that if specified is passed along to the `AttentionProcessor` as defined under
`self.processor` in
[diffusers.models.attention_processor](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
guidance_rescale (`float`, *optional*, defaults to 0.0):
Guidance rescale factor proposed by [Common Diffusion Noise Schedules and Sample Steps are
Flawed](https://arxiv.org/pdf/2305.08891.pdf) `guidance_scale` is defined as `φ` in equation 16. of
[Common Diffusion Noise Schedules and Sample Steps are Flawed](https://arxiv.org/pdf/2305.08891.pdf).
Guidance rescale factor should fix overexposure when using zero terminal SNR.
original_size (`Tuple[int]`, *optional*, defaults to (1024, 1024)):
If `original_size` is not the same as `target_size` the image will appear to be down- or upsampled.
`original_size` defaults to `(height, width)` if not specified. Part of SDXL's micro-conditioning as
explained in section 2.2 of
[https://huggingface.co/papers/2307.01952](https://huggingface.co/papers/2307.01952).
crops_coords_top_left (`Tuple[int]`, *optional*, defaults to (0, 0)):
`crops_coords_top_left` can be used to generate an image that appears to be "cropped" from the position
`crops_coords_top_left` downwards. Favorable, well-centered images are usually achieved by setting
`crops_coords_top_left` to (0, 0). Part of SDXL's micro-conditioning as explained in section 2.2 of
[https://huggingface.co/papers/2307.01952](https://huggingface.co/papers/2307.01952).
target_size (`Tuple[int]`, *optional*, defaults to (1024, 1024)):
For most cases, `target_size` should be set to the desired height and width of the generated image. If
not specified it will default to `(height, width)`. Part of SDXL's micro-conditioning as explained in
section 2.2 of [https://huggingface.co/papers/2307.01952](https://huggingface.co/papers/2307.01952).
negative_original_size (`Tuple[int]`, *optional*, defaults to (1024, 1024)):
To negatively condition the generation process based on a specific image resolution. Part of SDXL's
micro-conditioning as explained in section 2.2 of
[https://huggingface.co/papers/2307.01952](https://huggingface.co/papers/2307.01952). For more
information, refer to this issue thread: https://github.com/huggingface/diffusers/issues/4208.
negative_crops_coords_top_left (`Tuple[int]`, *optional*, defaults to (0, 0)):
To negatively condition the generation process based on a specific crop coordinates. Part of SDXL's
micro-conditioning as explained in section 2.2 of
[https://huggingface.co/papers/2307.01952](https://huggingface.co/papers/2307.01952). For more
information, refer to this issue thread: https://github.com/huggingface/diffusers/issues/4208.
negative_target_size (`Tuple[int]`, *optional*, defaults to (1024, 1024)):
To negatively condition the generation process based on a target image resolution. It should be as same
as the `target_size` for most cases. Part of SDXL's micro-conditioning as explained in section 2.2 of
[https://huggingface.co/papers/2307.01952](https://huggingface.co/papers/2307.01952). For more
information, refer to this issue thread: https://github.com/huggingface/diffusers/issues/4208.
callback_on_step_end (`Callable`, `PipelineCallback`, `MultiPipelineCallbacks`, *optional*):
A function or a subclass of `PipelineCallback` or `MultiPipelineCallbacks` that is called at the end of
each denoising step during the inference. with the following arguments: `callback_on_step_end(self:
DiffusionPipeline, step: int, timestep: int, callback_kwargs: Dict)`. `callback_kwargs` will include a
list of all tensors as specified by `callback_on_step_end_tensor_inputs`.
callback_on_step_end_tensor_inputs (`List`, *optional*):
The list of tensor inputs for the `callback_on_step_end` function. The tensors specified in the list
will be passed as `callback_kwargs` argument. You will only be able to include variables listed in the
`._callback_tensor_inputs` attribute of your pipeline class.
attention_auto_machine_weight (`float`):
Weight of using reference query for self attention's context.
If attention_auto_machine_weight=1.0, use reference query for all self attention's context.
gn_auto_machine_weight (`float`):
Weight of using reference adain. If gn_auto_machine_weight=2.0, use all reference adain plugins.
reference_guidance_start (`float`, *optional*, defaults to 0.0):
The percentage of total steps at which the reference ControlNet starts applying.
reference_guidance_end (`float`, *optional*, defaults to 1.0):
The percentage of total steps at which the reference ControlNet stops applying.
style_fidelity (`float`):
style fidelity of ref_uncond_xt. If style_fidelity=1.0, control more important,
elif style_fidelity=0.0, prompt more important, else balanced.
reference_attn (`bool`):
Whether to use reference query for self attention's context.
reference_adain (`bool`):
Whether to use reference adain.
Examples:
Returns:
[`~pipelines.stable_diffusion_xl.StableDiffusionXLPipelineOutput`] or `tuple`:
[`~pipelines.stable_diffusion_xl.StableDiffusionXLPipelineOutput`] if `return_dict` is True, otherwise a
`tuple`. When returning a tuple, the first element is a list with the generated images.
"""
callback = kwargs.pop("callback", None)
callback_steps = kwargs.pop("callback_steps", None)
if callback is not None:
deprecate(
"callback",
"1.0.0",
"Passing `callback` as an input argument to `__call__` is deprecated, consider use `callback_on_step_end`",
)
if callback_steps is not None:
deprecate(
"callback_steps",
"1.0.0",
"Passing `callback_steps` as an input argument to `__call__` is deprecated, consider use `callback_on_step_end`",
)
if isinstance(callback_on_step_end, (PipelineCallback, MultiPipelineCallbacks)):
callback_on_step_end_tensor_inputs = callback_on_step_end.tensor_inputs
assert reference_attn or reference_adain, "`reference_attn` or `reference_adain` must be True."
# 0. Default height and width to unet
# height, width = self._default_height_width(height, width, ref_image)
height = height or self.default_sample_size * self.vae_scale_factor
width = width or self.default_sample_size * self.vae_scale_factor
original_size = original_size or (height, width)
target_size = target_size or (height, width)
@@ -512,27 +244,8 @@ class StableDiffusionXLReferencePipeline(StableDiffusionXLPipeline):
negative_prompt_embeds,
pooled_prompt_embeds,
negative_pooled_prompt_embeds,
ip_adapter_image,
ip_adapter_image_embeds,
callback_on_step_end_tensor_inputs,
)
self.check_ref_inputs(
ref_image,
reference_guidance_start,
reference_guidance_end,
style_fidelity,
reference_attn,
reference_adain,
)
self._guidance_scale = guidance_scale
self._guidance_rescale = guidance_rescale
self._clip_skip = clip_skip
self._cross_attention_kwargs = cross_attention_kwargs
self._denoising_end = denoising_end
self._interrupt = False
# 2. Define call parameters
if prompt is not None and isinstance(prompt, str):
batch_size = 1
@@ -543,11 +256,15 @@ class StableDiffusionXLReferencePipeline(StableDiffusionXLPipeline):
device = self._execution_device
# 3. Encode input prompt
lora_scale = (
self.cross_attention_kwargs.get("scale", None) if self.cross_attention_kwargs is not None else None
)
# here `guidance_scale` is defined analog to the guidance weight `w` of equation (2)
# of the Imagen paper: https://arxiv.org/pdf/2205.11487.pdf . `guidance_scale = 1`
# corresponds to doing no classifier free guidance.
do_classifier_free_guidance = guidance_scale > 1.0
# 3. Encode input prompt
text_encoder_lora_scale = (
cross_attention_kwargs.get("scale", None) if cross_attention_kwargs is not None else None
)
(
prompt_embeds,
negative_prompt_embeds,
@@ -558,19 +275,17 @@ class StableDiffusionXLReferencePipeline(StableDiffusionXLPipeline):
prompt_2=prompt_2,
device=device,
num_images_per_prompt=num_images_per_prompt,
do_classifier_free_guidance=self.do_classifier_free_guidance,
do_classifier_free_guidance=do_classifier_free_guidance,
negative_prompt=negative_prompt,
negative_prompt_2=negative_prompt_2,
prompt_embeds=prompt_embeds,
negative_prompt_embeds=negative_prompt_embeds,
pooled_prompt_embeds=pooled_prompt_embeds,
negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
lora_scale=lora_scale,
clip_skip=self.clip_skip,
lora_scale=text_encoder_lora_scale,
)
# 4. Preprocess reference image
ref_image = self.prepare_ref_image(
ref_image = self.prepare_image(
image=ref_image,
width=width,
height=height,
@@ -581,9 +296,9 @@ class StableDiffusionXLReferencePipeline(StableDiffusionXLPipeline):
)
# 5. Prepare timesteps
timesteps, num_inference_steps = retrieve_timesteps(
self.scheduler, num_inference_steps, device, timesteps, sigmas
)
self.scheduler.set_timesteps(num_inference_steps, device=device)
timesteps = self.scheduler.timesteps
# 6. Prepare latent variables
num_channels_latents = self.unet.config.in_channels
@@ -597,7 +312,6 @@ class StableDiffusionXLReferencePipeline(StableDiffusionXLPipeline):
generator,
latents,
)
# 7. Prepare reference latent variables
ref_image_latents = self.prepare_ref_latents(
ref_image,
@@ -605,21 +319,13 @@ class StableDiffusionXLReferencePipeline(StableDiffusionXLPipeline):
prompt_embeds.dtype,
device,
generator,
self.do_classifier_free_guidance,
do_classifier_free_guidance,
)
# 8. Prepare extra step kwargs. TODO: Logic should ideally just be moved out of the pipeline
extra_step_kwargs = self.prepare_extra_step_kwargs(generator, eta)
# 8.1 Create tensor stating which reference controlnets to keep
reference_keeps = []
for i in range(len(timesteps)):
reference_keep = 1.0 - float(
i / len(timesteps) < reference_guidance_start or (i + 1) / len(timesteps) > reference_guidance_end
)
reference_keeps.append(reference_keep)
# 8.2 Modify self attention and group norm
# 9. Modify self attebtion and group norm
MODE = "write"
uc_mask = (
torch.Tensor([1] * batch_size * num_images_per_prompt + [0] * batch_size * num_images_per_prompt)
@@ -627,8 +333,6 @@ class StableDiffusionXLReferencePipeline(StableDiffusionXLPipeline):
.bool()
)
do_classifier_free_guidance = self.do_classifier_free_guidance
def hacked_basic_transformer_inner_forward(
self,
hidden_states: torch.Tensor,
@@ -900,7 +604,7 @@ class StableDiffusionXLReferencePipeline(StableDiffusionXLPipeline):
return hidden_states
def hacked_UpBlock2D_forward(
self, hidden_states, res_hidden_states_tuple, temb=None, upsample_size=None, *args, **kwargs
self, hidden_states, res_hidden_states_tuple, temb=None, upsample_size=None, **kwargs
):
eps = 1e-6
for i, resnet in enumerate(self.resnets):
@@ -980,7 +684,7 @@ class StableDiffusionXLReferencePipeline(StableDiffusionXLPipeline):
module.var_bank = []
module.gn_weight *= 2
# 9. Prepare added time ids & embeddings
# 10. Prepare added time ids & embeddings
add_text_embeds = pooled_prompt_embeds
if self.text_encoder_2 is None:
text_encoder_projection_dim = int(pooled_prompt_embeds.shape[-1])
@@ -994,101 +698,62 @@ class StableDiffusionXLReferencePipeline(StableDiffusionXLPipeline):
dtype=prompt_embeds.dtype,
text_encoder_projection_dim=text_encoder_projection_dim,
)
if negative_original_size is not None and negative_target_size is not None:
negative_add_time_ids = self._get_add_time_ids(
negative_original_size,
negative_crops_coords_top_left,
negative_target_size,
dtype=prompt_embeds.dtype,
text_encoder_projection_dim=text_encoder_projection_dim,
)
else:
negative_add_time_ids = add_time_ids
if self.do_classifier_free_guidance:
if do_classifier_free_guidance:
prompt_embeds = torch.cat([negative_prompt_embeds, prompt_embeds], dim=0)
add_text_embeds = torch.cat([negative_pooled_prompt_embeds, add_text_embeds], dim=0)
add_time_ids = torch.cat([negative_add_time_ids, add_time_ids], dim=0)
add_time_ids = torch.cat([add_time_ids, add_time_ids], dim=0)
prompt_embeds = prompt_embeds.to(device)
add_text_embeds = add_text_embeds.to(device)
add_time_ids = add_time_ids.to(device).repeat(batch_size * num_images_per_prompt, 1)
if ip_adapter_image is not None or ip_adapter_image_embeds is not None:
image_embeds = self.prepare_ip_adapter_image_embeds(
ip_adapter_image,
ip_adapter_image_embeds,
device,
batch_size * num_images_per_prompt,
self.do_classifier_free_guidance,
)
# 10. Denoising loop
# 11. Denoising loop
num_warmup_steps = max(len(timesteps) - num_inference_steps * self.scheduler.order, 0)
# 10.1 Apply denoising_end
if (
self.denoising_end is not None
and isinstance(self.denoising_end, float)
and self.denoising_end > 0
and self.denoising_end < 1
):
if denoising_end is not None and isinstance(denoising_end, float) and denoising_end > 0 and denoising_end < 1:
discrete_timestep_cutoff = int(
round(
self.scheduler.config.num_train_timesteps
- (self.denoising_end * self.scheduler.config.num_train_timesteps)
- (denoising_end * self.scheduler.config.num_train_timesteps)
)
)
num_inference_steps = len(list(filter(lambda ts: ts >= discrete_timestep_cutoff, timesteps)))
timesteps = timesteps[:num_inference_steps]
# 11. Optionally get Guidance Scale Embedding
timestep_cond = None
if self.unet.config.time_cond_proj_dim is not None:
guidance_scale_tensor = torch.tensor(self.guidance_scale - 1).repeat(batch_size * num_images_per_prompt)
timestep_cond = self.get_guidance_scale_embedding(
guidance_scale_tensor, embedding_dim=self.unet.config.time_cond_proj_dim
).to(device=device, dtype=latents.dtype)
self._num_timesteps = len(timesteps)
with self.progress_bar(total=num_inference_steps) as progress_bar:
for i, t in enumerate(timesteps):
if self.interrupt:
continue
# expand the latents if we are doing classifier free guidance
latent_model_input = torch.cat([latents] * 2) if self.do_classifier_free_guidance else latents
latent_model_input = torch.cat([latents] * 2) if do_classifier_free_guidance else latents
latent_model_input = self.scheduler.scale_model_input(latent_model_input, t)
# predict the noise residual
added_cond_kwargs = {"text_embeds": add_text_embeds, "time_ids": add_time_ids}
if ip_adapter_image is not None or ip_adapter_image_embeds is not None:
added_cond_kwargs["image_embeds"] = image_embeds
# ref only part
if reference_keeps[i] > 0:
noise = randn_tensor(
ref_image_latents.shape, generator=generator, device=device, dtype=ref_image_latents.dtype
)
ref_xt = self.scheduler.add_noise(
ref_image_latents,
noise,
t.reshape(
1,
),
)
ref_xt = self.scheduler.scale_model_input(ref_xt, t)
noise = randn_tensor(
ref_image_latents.shape, generator=generator, device=device, dtype=ref_image_latents.dtype
)
ref_xt = self.scheduler.add_noise(
ref_image_latents,
noise,
t.reshape(
1,
),
)
ref_xt = self.scheduler.scale_model_input(ref_xt, t)
MODE = "write"
self.unet(
ref_xt,
t,
encoder_hidden_states=prompt_embeds,
cross_attention_kwargs=cross_attention_kwargs,
added_cond_kwargs=added_cond_kwargs,
return_dict=False,
)
MODE = "write"
self.unet(
ref_xt,
t,
encoder_hidden_states=prompt_embeds,
cross_attention_kwargs=cross_attention_kwargs,
added_cond_kwargs=added_cond_kwargs,
return_dict=False,
)
# predict the noise residual
MODE = "read"
@@ -1096,44 +761,22 @@ class StableDiffusionXLReferencePipeline(StableDiffusionXLPipeline):
latent_model_input,
t,
encoder_hidden_states=prompt_embeds,
timestep_cond=timestep_cond,
cross_attention_kwargs=self.cross_attention_kwargs,
cross_attention_kwargs=cross_attention_kwargs,
added_cond_kwargs=added_cond_kwargs,
return_dict=False,
)[0]
# perform guidance
if self.do_classifier_free_guidance:
if do_classifier_free_guidance:
noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
noise_pred = noise_pred_uncond + self.guidance_scale * (noise_pred_text - noise_pred_uncond)
noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)
if self.do_classifier_free_guidance and self.guidance_rescale > 0.0:
if do_classifier_free_guidance and guidance_rescale > 0.0:
# Based on 3.4. in https://arxiv.org/pdf/2305.08891.pdf
noise_pred = rescale_noise_cfg(noise_pred, noise_pred_text, guidance_rescale=self.guidance_rescale)
noise_pred = rescale_noise_cfg(noise_pred, noise_pred_text, guidance_rescale=guidance_rescale)
# compute the previous noisy sample x_t -> x_t-1
latents_dtype = latents.dtype
latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs, return_dict=False)[0]
if latents.dtype != latents_dtype:
if torch.backends.mps.is_available():
# some platforms (eg. apple mps) misbehave due to a pytorch bug: https://github.com/pytorch/pytorch/pull/99272
latents = latents.to(latents_dtype)
if callback_on_step_end is not None:
callback_kwargs = {}
for k in callback_on_step_end_tensor_inputs:
callback_kwargs[k] = locals()[k]
callback_outputs = callback_on_step_end(self, i, t, callback_kwargs)
latents = callback_outputs.pop("latents", latents)
prompt_embeds = callback_outputs.pop("prompt_embeds", prompt_embeds)
negative_prompt_embeds = callback_outputs.pop("negative_prompt_embeds", negative_prompt_embeds)
add_text_embeds = callback_outputs.pop("add_text_embeds", add_text_embeds)
negative_pooled_prompt_embeds = callback_outputs.pop(
"negative_pooled_prompt_embeds", negative_pooled_prompt_embeds
)
add_time_ids = callback_outputs.pop("add_time_ids", add_time_ids)
negative_add_time_ids = callback_outputs.pop("negative_add_time_ids", negative_add_time_ids)
# call the callback, if provided
if i == len(timesteps) - 1 or ((i + 1) > num_warmup_steps and (i + 1) % self.scheduler.order == 0):
@@ -1142,9 +785,6 @@ class StableDiffusionXLReferencePipeline(StableDiffusionXLPipeline):
step_idx = i // getattr(self.scheduler, "order", 1)
callback(step_idx, t, latents)
if XLA_AVAILABLE:
xm.mark_step()
if not output_type == "latent":
# make sure the VAE is in float32 mode, as it overflows in float16
needs_upcasting = self.vae.dtype == torch.float16 and self.vae.config.force_upcast
@@ -1152,43 +792,25 @@ class StableDiffusionXLReferencePipeline(StableDiffusionXLPipeline):
if needs_upcasting:
self.upcast_vae()
latents = latents.to(next(iter(self.vae.post_quant_conv.parameters())).dtype)
elif latents.dtype != self.vae.dtype:
if torch.backends.mps.is_available():
# some platforms (eg. apple mps) misbehave due to a pytorch bug: https://github.com/pytorch/pytorch/pull/99272
self.vae = self.vae.to(latents.dtype)
# unscale/denormalize the latents
# denormalize with the mean and std if available and not None
has_latents_mean = hasattr(self.vae.config, "latents_mean") and self.vae.config.latents_mean is not None
has_latents_std = hasattr(self.vae.config, "latents_std") and self.vae.config.latents_std is not None
if has_latents_mean and has_latents_std:
latents_mean = (
torch.tensor(self.vae.config.latents_mean).view(1, 4, 1, 1).to(latents.device, latents.dtype)
)
latents_std = (
torch.tensor(self.vae.config.latents_std).view(1, 4, 1, 1).to(latents.device, latents.dtype)
)
latents = latents * latents_std / self.vae.config.scaling_factor + latents_mean
else:
latents = latents / self.vae.config.scaling_factor
image = self.vae.decode(latents, return_dict=False)[0]
image = self.vae.decode(latents / self.vae.config.scaling_factor, return_dict=False)[0]
# cast back to fp16 if needed
if needs_upcasting:
self.vae.to(dtype=torch.float16)
else:
image = latents
return StableDiffusionXLPipelineOutput(images=image)
if not output_type == "latent":
# apply watermark if available
if self.watermark is not None:
image = self.watermark.apply_watermark(image)
# apply watermark if available
if self.watermark is not None:
image = self.watermark.apply_watermark(image)
image = self.image_processor.postprocess(image, output_type=output_type)
image = self.image_processor.postprocess(image, output_type=output_type)
# Offload all models
self.maybe_free_model_hooks()
# Offload last model to CPU
if hasattr(self, "final_offload_hook") and self.final_offload_hook is not None:
self.final_offload_hook.offload()
if not return_dict:
return (image,)

View File

@@ -1,6 +1,6 @@
# ControlNet training example for Stable Diffusion 3/3.5 (SD3/3.5)
# ControlNet training example for Stable Diffusion 3 (SD3)
The `train_controlnet_sd3.py` script shows how to implement the ControlNet training procedure and adapt it for [Stable Diffusion 3](https://arxiv.org/abs/2403.03206) and [Stable Diffusion 3.5](https://stability.ai/news/introducing-stable-diffusion-3-5).
The `train_controlnet_sd3.py` script shows how to implement the ControlNet training procedure and adapt it for [Stable Diffusion 3](https://arxiv.org/abs/2403.03206).
## Running locally with PyTorch
@@ -51,9 +51,9 @@ Please download the dataset and unzip it in the directory `fill50k` in the `exam
## Training
First download the SD3 model from [Hugging Face Hub](https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers) or the SD3.5 model from [Hugging Face Hub](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium). We will use it as a base model for the ControlNet training.
First download the SD3 model from [Hugging Face Hub](https://huggingface.co/stabilityai/stable-diffusion-3-medium). We will use it as a base model for the ControlNet training.
> [!NOTE]
> As the model is gated, before using it with diffusers you first need to go to the [Stable Diffusion 3 Medium Hugging Face page](https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers) or [Stable Diffusion 3.5 Large Hugging Face page](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium), fill in the form and accept the gate. Once you are in, you need to log in so that your system knows youve accepted the gate. Use the command below to log in:
> As the model is gated, before using it with diffusers you first need to go to the [Stable Diffusion 3 Medium Hugging Face page](https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers), fill in the form and accept the gate. Once you are in, you need to log in so that your system knows youve accepted the gate. Use the command below to log in:
```bash
huggingface-cli login
@@ -90,8 +90,6 @@ accelerate launch train_controlnet_sd3.py \
--gradient_accumulation_steps=4
```
To train a ControlNet model for Stable Diffusion 3.5, replace the `MODEL_DIR` with `stabilityai/stable-diffusion-3.5-medium`.
To better track our training experiments, we're using flags `validation_image`, `validation_prompt`, and `validation_steps` to allow the script to do a few validation inference runs. This allows us to qualitatively check if the training is progressing as expected.
Our experiments were conducted on a single 40GB A100 GPU.
@@ -126,8 +124,6 @@ image = pipe(
image.save("./output.png")
```
Similarly, for SD3.5, replace the `base_model_path` with `stabilityai/stable-diffusion-3.5-medium` and controlnet_path `DavyMorgan/sd35-controlnet-out'.
## Notes
### GPU usage
@@ -139,8 +135,6 @@ Make sure to use the right GPU when configuring the [accelerator](https://huggin
## Example results
### SD3
#### After 500 steps with batch size 8
| | |
@@ -156,20 +150,3 @@ Make sure to use the right GPU when configuring the [accelerator](https://huggin
|| pale golden rod circle with old lace background |
![conditioning image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_1.png) | ![pale golden rod circle with old lace background](https://huggingface.co/datasets/DavyMorgan/sd3-controlnet-results/resolve/main/step-6500.png) |
### SD3.5
#### After 500 steps with batch size 8
| | |
|-------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------:|
|| pale golden rod circle with old lace background |
![conditioning image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_1.png) | ![pale golden rod circle with old lace background](https://huggingface.co/datasets/DavyMorgan/sd3-controlnet-results/resolve/main/step-500-3.5.png) |
#### After 3000 steps with batch size 8:
| | |
|-------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------:|
|| pale golden rod circle with old lace background |
![conditioning image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_1.png) | ![pale golden rod circle with old lace background](https://huggingface.co/datasets/DavyMorgan/sd3-controlnet-results/resolve/main/step-3000-3.5.png) |

View File

@@ -138,27 +138,6 @@ class ControlNetSD3(ExamplesTestsAccelerate):
self.assertTrue(os.path.isfile(os.path.join(tmpdir, "diffusion_pytorch_model.safetensors")))
class ControlNetSD35(ExamplesTestsAccelerate):
def test_controlnet_sd3(self):
with tempfile.TemporaryDirectory() as tmpdir:
test_args = f"""
examples/controlnet/train_controlnet_sd3.py
--pretrained_model_name_or_path=hf-internal-testing/tiny-sd35-pipe
--dataset_name=hf-internal-testing/fill10
--output_dir={tmpdir}
--resolution=64
--train_batch_size=1
--gradient_accumulation_steps=1
--controlnet_model_name_or_path=DavyMorgan/tiny-controlnet-sd35
--max_train_steps=4
--checkpointing_steps=2
""".split()
run_command(self._launch_args + test_args)
self.assertTrue(os.path.isfile(os.path.join(tmpdir, "diffusion_pytorch_model.safetensors")))
class ControlNetflux(ExamplesTestsAccelerate):
def test_controlnet_flux(self):
with tempfile.TemporaryDirectory() as tmpdir:

View File

@@ -571,6 +571,9 @@ def parse_args(input_args=None):
if args.dataset_name is None and args.train_data_dir is None:
raise ValueError("Specify either `--dataset_name` or `--train_data_dir`")
if args.dataset_name is not None and args.train_data_dir is not None:
raise ValueError("Specify only one of `--dataset_name` or `--train_data_dir`")
if args.proportion_empty_prompts < 0 or args.proportion_empty_prompts > 1:
raise ValueError("`--proportion_empty_prompts` must be in the range [0, 1].")
@@ -612,7 +615,6 @@ def make_train_dataset(args, tokenizer, accelerator):
args.dataset_name,
args.dataset_config_name,
cache_dir=args.cache_dir,
data_dir=args.train_data_dir,
)
else:
if args.train_data_dir is not None:

View File

@@ -263,12 +263,6 @@ def parse_args(input_args=None):
help="Path to pretrained controlnet model or model identifier from huggingface.co/models."
" If not specified controlnet weights are initialized from unet.",
)
parser.add_argument(
"--num_extra_conditioning_channels",
type=int,
default=0,
help="Number of extra conditioning channels for controlnet.",
)
parser.add_argument(
"--revision",
type=str,
@@ -545,9 +539,6 @@ def parse_args(input_args=None):
default=77,
help="Maximum sequence length to use with with the T5 text encoder",
)
parser.add_argument(
"--dataset_preprocess_batch_size", type=int, default=1000, help="Batch size for preprocessing dataset."
)
parser.add_argument(
"--validation_prompt",
type=str,
@@ -995,9 +986,7 @@ def main(args):
controlnet = SD3ControlNetModel.from_pretrained(args.controlnet_model_name_or_path)
else:
logger.info("Initializing controlnet weights from transformer")
controlnet = SD3ControlNetModel.from_transformer(
transformer, num_extra_conditioning_channels=args.num_extra_conditioning_channels
)
controlnet = SD3ControlNetModel.from_transformer(transformer)
transformer.requires_grad_(False)
vae.requires_grad_(False)
@@ -1134,12 +1123,7 @@ def main(args):
# fingerprint used by the cache for the other processes to load the result
# details: https://github.com/huggingface/diffusers/pull/4038#discussion_r1266078401
new_fingerprint = Hasher.hash(args)
train_dataset = train_dataset.map(
compute_embeddings_fn,
batched=True,
batch_size=args.dataset_preprocess_batch_size,
new_fingerprint=new_fingerprint,
)
train_dataset = train_dataset.map(compute_embeddings_fn, batched=True, new_fingerprint=new_fingerprint)
del text_encoder_one, text_encoder_two, text_encoder_three
del tokenizer_one, tokenizer_two, tokenizer_three

View File

@@ -598,6 +598,9 @@ def parse_args(input_args=None):
if args.dataset_name is None and args.train_data_dir is None:
raise ValueError("Specify either `--dataset_name` or `--train_data_dir`")
if args.dataset_name is not None and args.train_data_dir is not None:
raise ValueError("Specify only one of `--dataset_name` or `--train_data_dir`")
if args.proportion_empty_prompts < 0 or args.proportion_empty_prompts > 1:
raise ValueError("`--proportion_empty_prompts` must be in the range [0, 1].")
@@ -639,7 +642,6 @@ def get_train_dataset(args, accelerator):
args.dataset_name,
args.dataset_config_name,
cache_dir=args.cache_dir,
data_dir=args.train_data_dir,
)
else:
if args.train_data_dir is not None:

View File

@@ -118,7 +118,7 @@ accelerate launch train_dreambooth_flux.py \
To better track our training experiments, we're using the following flags in the command above:
* `report_to="wandb` will ensure the training runs are tracked on [Weights and Biases](https://wandb.ai/site). To use it, be sure to install `wandb` with `pip install wandb`. Don't forget to call `wandb login <your_api_key>` before training if you haven't done it before.
* `report_to="wandb` will ensure the training runs are tracked on Weights and Biases. To use it, be sure to install `wandb` with `pip install wandb`.
* `validation_prompt` and `validation_epochs` to allow the script to do a few validation inference runs. This allows us to qualitatively check if the training is progressing as expected.
> [!NOTE]

View File

@@ -1,127 +0,0 @@
# DreamBooth training example for SANA
[DreamBooth](https://arxiv.org/abs/2208.12242) is a method to personalize text2image models like stable diffusion given just a few (3~5) images of a subject.
The `train_dreambooth_lora_sana.py` script shows how to implement the training procedure with [LoRA](https://huggingface.co/docs/peft/conceptual_guides/adapter#low-rank-adaptation-lora) and adapt it for [SANA](https://arxiv.org/abs/2410.10629).
This will also allow us to push the trained model parameters to the Hugging Face Hub platform.
## Running locally with PyTorch
### Installing the dependencies
Before running the scripts, make sure to install the library's training dependencies:
**Important**
To make sure you can successfully run the latest versions of the example scripts, we highly recommend **installing from source** and keeping the install up to date as we update the example scripts frequently and install some example-specific requirements. To do this, execute the following steps in a new virtual environment:
```bash
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install -e .
```
Then cd in the `examples/dreambooth` folder and run
```bash
pip install -r requirements_sana.txt
```
And initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) environment with:
```bash
accelerate config
```
Or for a default accelerate configuration without answering questions about your environment
```bash
accelerate config default
```
Or if your environment doesn't support an interactive shell (e.g., a notebook)
```python
from accelerate.utils import write_basic_config
write_basic_config()
```
When running `accelerate config`, if we specify torch compile mode to True there can be dramatic speedups.
Note also that we use PEFT library as backend for LoRA training, make sure to have `peft>=0.14.0` installed in your environment.
### Dog toy example
Now let's get our dataset. For this example we will use some dog images: https://huggingface.co/datasets/diffusers/dog-example.
Let's first download it locally:
```python
from huggingface_hub import snapshot_download
local_dir = "./dog"
snapshot_download(
"diffusers/dog-example",
local_dir=local_dir, repo_type="dataset",
ignore_patterns=".gitattributes",
)
```
This will also allow us to push the trained LoRA parameters to the Hugging Face Hub platform.
Now, we can launch training using:
```bash
export MODEL_NAME="Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers"
export INSTANCE_DIR="dog"
export OUTPUT_DIR="trained-sana-lora"
accelerate launch train_dreambooth_lora_sana.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--output_dir=$OUTPUT_DIR \
--mixed_precision="bf16" \
--instance_prompt="a photo of sks dog" \
--resolution=1024 \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--use_8bit_adam \
--learning_rate=1e-4 \
--report_to="wandb" \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--max_train_steps=500 \
--validation_prompt="A photo of sks dog in a bucket" \
--validation_epochs=25 \
--seed="0" \
--push_to_hub
```
For using `push_to_hub`, make you're logged into your Hugging Face account:
```bash
huggingface-cli login
```
To better track our training experiments, we're using the following flags in the command above:
* `report_to="wandb` will ensure the training runs are tracked on [Weights and Biases](https://wandb.ai/site). To use it, be sure to install `wandb` with `pip install wandb`. Don't forget to call `wandb login <your_api_key>` before training if you haven't done it before.
* `validation_prompt` and `validation_epochs` to allow the script to do a few validation inference runs. This allows us to qualitatively check if the training is progressing as expected.
## Notes
Additionally, we welcome you to explore the following CLI arguments:
* `--lora_layers`: The transformer modules to apply LoRA training on. Please specify the layers in a comma seperated. E.g. - "to_k,to_q,to_v" will result in lora training of attention layers only.
* `--complex_human_instruction`: Instructions for complex human attention as shown in [here](https://github.com/NVlabs/Sana/blob/main/configs/sana_app_config/Sana_1600M_app.yaml#L55).
* `--max_sequence_length`: Maximum sequence length to use for text embeddings.
We provide several options for optimizing memory optimization:
* `--offload`: When enabled, we will offload the text encoder and VAE to CPU, when they are not used.
* `cache_latents`: When enabled, we will pre-compute the latents from the input images with the VAE and remove the VAE from memory once done.
* `--use_8bit_adam`: When enabled, we will use the 8bit version of AdamW provided by the `bitsandbytes` library.
Refer to the [official documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/sana) of the `SanaPipeline` to know more about the models available under the SANA family and their preferred dtypes during inference.

View File

@@ -105,7 +105,7 @@ accelerate launch train_dreambooth_sd3.py \
To better track our training experiments, we're using the following flags in the command above:
* `report_to="wandb` will ensure the training runs are tracked on [Weights and Biases](https://wandb.ai/site). To use it, be sure to install `wandb` with `pip install wandb`. Don't forget to call `wandb login <your_api_key>` before training if you haven't done it before.
* `report_to="wandb` will ensure the training runs are tracked on Weights and Biases. To use it, be sure to install `wandb` with `pip install wandb`.
* `validation_prompt` and `validation_epochs` to allow the script to do a few validation inference runs. This allows us to qualitatively check if the training is progressing as expected.
> [!NOTE]

View File

@@ -99,7 +99,7 @@ accelerate launch train_dreambooth_lora_sdxl.py \
To better track our training experiments, we're using the following flags in the command above:
* `report_to="wandb` will ensure the training runs are tracked on [Weights and Biases](https://wandb.ai/site). To use it, be sure to install `wandb` with `pip install wandb`. Don't forget to call `wandb login <your_api_key>` before training if you haven't done it before.
* `report_to="wandb` will ensure the training runs are tracked on Weights and Biases. To use it, be sure to install `wandb` with `pip install wandb`.
* `validation_prompt` and `validation_epochs` to allow the script to do a few validation inference runs. This allows us to qualitatively check if the training is progressing as expected.
Our experiments were conducted on a single 40GB A100 GPU.

View File

@@ -1,8 +0,0 @@
accelerate>=1.0.0
torchvision
transformers>=4.47.0
ftfy
tensorboard
Jinja2
peft>=0.14.0
sentencepiece

View File

@@ -1,206 +0,0 @@
# coding=utf-8
# Copyright 2024 HuggingFace Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import os
import sys
import tempfile
import safetensors
sys.path.append("..")
from test_examples_utils import ExamplesTestsAccelerate, run_command # noqa: E402
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger()
stream_handler = logging.StreamHandler(sys.stdout)
logger.addHandler(stream_handler)
class DreamBoothLoRASANA(ExamplesTestsAccelerate):
instance_data_dir = "docs/source/en/imgs"
pretrained_model_name_or_path = "hf-internal-testing/tiny-sana-pipe"
script_path = "examples/dreambooth/train_dreambooth_lora_sana.py"
transformer_layer_type = "transformer_blocks.0.attn1.to_k"
def test_dreambooth_lora_sana(self):
with tempfile.TemporaryDirectory() as tmpdir:
test_args = f"""
{self.script_path}
--pretrained_model_name_or_path {self.pretrained_model_name_or_path}
--instance_data_dir {self.instance_data_dir}
--resolution 32
--train_batch_size 1
--gradient_accumulation_steps 1
--max_train_steps 2
--learning_rate 5.0e-04
--scale_lr
--lr_scheduler constant
--lr_warmup_steps 0
--output_dir {tmpdir}
--max_sequence_length 16
""".split()
test_args.extend(["--instance_prompt", ""])
run_command(self._launch_args + test_args)
# save_pretrained smoke test
self.assertTrue(os.path.isfile(os.path.join(tmpdir, "pytorch_lora_weights.safetensors")))
# make sure the state_dict has the correct naming in the parameters.
lora_state_dict = safetensors.torch.load_file(os.path.join(tmpdir, "pytorch_lora_weights.safetensors"))
is_lora = all("lora" in k for k in lora_state_dict.keys())
self.assertTrue(is_lora)
# when not training the text encoder, all the parameters in the state dict should start
# with `"transformer"` in their names.
starts_with_transformer = all(key.startswith("transformer") for key in lora_state_dict.keys())
self.assertTrue(starts_with_transformer)
def test_dreambooth_lora_latent_caching(self):
with tempfile.TemporaryDirectory() as tmpdir:
test_args = f"""
{self.script_path}
--pretrained_model_name_or_path {self.pretrained_model_name_or_path}
--instance_data_dir {self.instance_data_dir}
--resolution 32
--train_batch_size 1
--gradient_accumulation_steps 1
--max_train_steps 2
--cache_latents
--learning_rate 5.0e-04
--scale_lr
--lr_scheduler constant
--lr_warmup_steps 0
--output_dir {tmpdir}
--max_sequence_length 16
""".split()
test_args.extend(["--instance_prompt", ""])
run_command(self._launch_args + test_args)
# save_pretrained smoke test
self.assertTrue(os.path.isfile(os.path.join(tmpdir, "pytorch_lora_weights.safetensors")))
# make sure the state_dict has the correct naming in the parameters.
lora_state_dict = safetensors.torch.load_file(os.path.join(tmpdir, "pytorch_lora_weights.safetensors"))
is_lora = all("lora" in k for k in lora_state_dict.keys())
self.assertTrue(is_lora)
# when not training the text encoder, all the parameters in the state dict should start
# with `"transformer"` in their names.
starts_with_transformer = all(key.startswith("transformer") for key in lora_state_dict.keys())
self.assertTrue(starts_with_transformer)
def test_dreambooth_lora_layers(self):
with tempfile.TemporaryDirectory() as tmpdir:
test_args = f"""
{self.script_path}
--pretrained_model_name_or_path {self.pretrained_model_name_or_path}
--instance_data_dir {self.instance_data_dir}
--resolution 32
--train_batch_size 1
--gradient_accumulation_steps 1
--max_train_steps 2
--cache_latents
--learning_rate 5.0e-04
--scale_lr
--lora_layers {self.transformer_layer_type}
--lr_scheduler constant
--lr_warmup_steps 0
--output_dir {tmpdir}
--max_sequence_length 16
""".split()
test_args.extend(["--instance_prompt", ""])
run_command(self._launch_args + test_args)
# save_pretrained smoke test
self.assertTrue(os.path.isfile(os.path.join(tmpdir, "pytorch_lora_weights.safetensors")))
# make sure the state_dict has the correct naming in the parameters.
lora_state_dict = safetensors.torch.load_file(os.path.join(tmpdir, "pytorch_lora_weights.safetensors"))
is_lora = all("lora" in k for k in lora_state_dict.keys())
self.assertTrue(is_lora)
# when not training the text encoder, all the parameters in the state dict should start
# with `"transformer"` in their names. In this test, we only params of
# `self.transformer_layer_type` should be in the state dict.
starts_with_transformer = all(self.transformer_layer_type in key for key in lora_state_dict)
self.assertTrue(starts_with_transformer)
def test_dreambooth_lora_sana_checkpointing_checkpoints_total_limit(self):
with tempfile.TemporaryDirectory() as tmpdir:
test_args = f"""
{self.script_path}
--pretrained_model_name_or_path={self.pretrained_model_name_or_path}
--instance_data_dir={self.instance_data_dir}
--output_dir={tmpdir}
--resolution=32
--train_batch_size=1
--gradient_accumulation_steps=1
--max_train_steps=6
--checkpoints_total_limit=2
--checkpointing_steps=2
--max_sequence_length 16
""".split()
test_args.extend(["--instance_prompt", ""])
run_command(self._launch_args + test_args)
self.assertEqual(
{x for x in os.listdir(tmpdir) if "checkpoint" in x},
{"checkpoint-4", "checkpoint-6"},
)
def test_dreambooth_lora_sana_checkpointing_checkpoints_total_limit_removes_multiple_checkpoints(self):
with tempfile.TemporaryDirectory() as tmpdir:
test_args = f"""
{self.script_path}
--pretrained_model_name_or_path={self.pretrained_model_name_or_path}
--instance_data_dir={self.instance_data_dir}
--output_dir={tmpdir}
--resolution=32
--train_batch_size=1
--gradient_accumulation_steps=1
--max_train_steps=4
--checkpointing_steps=2
--max_sequence_length 166
""".split()
test_args.extend(["--instance_prompt", ""])
run_command(self._launch_args + test_args)
self.assertEqual({x for x in os.listdir(tmpdir) if "checkpoint" in x}, {"checkpoint-2", "checkpoint-4"})
resume_run_args = f"""
{self.script_path}
--pretrained_model_name_or_path={self.pretrained_model_name_or_path}
--instance_data_dir={self.instance_data_dir}
--output_dir={tmpdir}
--resolution=32
--train_batch_size=1
--gradient_accumulation_steps=1
--max_train_steps=8
--checkpointing_steps=2
--resume_from_checkpoint=checkpoint-4
--checkpoints_total_limit=2
--max_sequence_length 16
""".split()
resume_run_args.extend(["--instance_prompt", ""])
run_command(self._launch_args + resume_run_args)
self.assertEqual({x for x in os.listdir(tmpdir) if "checkpoint" in x}, {"checkpoint-6", "checkpoint-8"})

View File

@@ -1300,17 +1300,16 @@ def main(args):
# Since we predict the noise instead of x_0, the original formulation is slightly changed.
# This is discussed in Section 4.2 of the same paper.
snr = compute_snr(noise_scheduler, timesteps)
base_weight = (
torch.stack([snr, args.snr_gamma * torch.ones_like(timesteps)], dim=1).min(dim=1)[0] / snr
)
if noise_scheduler.config.prediction_type == "v_prediction":
# Velocity objective needs to be floored to an SNR weight of one.
divisor = snr + 1
mse_loss_weights = base_weight + 1
else:
divisor = snr
mse_loss_weights = (
torch.stack([snr, args.snr_gamma * torch.ones_like(timesteps)], dim=1).min(dim=1)[0] / divisor
)
# Epsilon and sample both use the same loss weights.
mse_loss_weights = base_weight
loss = F.mse_loss(model_pred.float(), target.float(), reduction="none")
loss = loss.mean(dim=list(range(1, len(loss.shape)))) * mse_loss_weights
loss = loss.mean()

View File

@@ -1648,15 +1648,11 @@ def main(args):
prompt=prompts,
)
else:
elems_to_repeat = len(prompts)
if args.train_text_encoder:
prompt_embeds, pooled_prompt_embeds, text_ids = encode_prompt(
text_encoders=[text_encoder_one, text_encoder_two],
tokenizers=[None, None],
text_input_ids_list=[
tokens_one.repeat(elems_to_repeat, 1),
tokens_two.repeat(elems_to_repeat, 1),
],
text_input_ids_list=[tokens_one, tokens_two],
max_sequence_length=args.max_sequence_length,
device=accelerator.device,
prompt=args.instance_prompt,

File diff suppressed because it is too large Load Diff

View File

@@ -1294,13 +1294,10 @@ def main(args):
for model in models:
if isinstance(model, type(unwrap_model(transformer))):
transformer_lora_layers_to_save = get_peft_model_state_dict(model)
elif isinstance(model, type(unwrap_model(text_encoder_one))): # or text_encoder_two
# both text encoders are of the same class, so we check hidden size to distinguish between the two
hidden_size = unwrap_model(model).config.hidden_size
if hidden_size == 768:
text_encoder_one_lora_layers_to_save = get_peft_model_state_dict(model)
elif hidden_size == 1280:
text_encoder_two_lora_layers_to_save = get_peft_model_state_dict(model)
elif isinstance(model, type(unwrap_model(text_encoder_one))):
text_encoder_one_lora_layers_to_save = get_peft_model_state_dict(model)
elif isinstance(model, type(unwrap_model(text_encoder_two))):
text_encoder_two_lora_layers_to_save = get_peft_model_state_dict(model)
else:
raise ValueError(f"unexpected save model: {model.__class__}")

View File

@@ -1,204 +0,0 @@
# Training Flux Control
This (experimental) example shows how to train Control LoRAs with [Flux](https://huggingface.co/black-forest-labs/FLUX.1-dev) by conditioning it with additional structural controls (like depth maps, poses, etc.). We provide a script for full fine-tuning, too, refer to [this section](#full-fine-tuning). To know more about Flux Control family, refer to the following resources:
* [Docs](https://github.com/black-forest-labs/flux/blob/main/docs/structural-conditioning.md) by Black Forest Labs
* Diffusers docs ([1](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux#canny-control), [2](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux#depth-control))
To incorporate additional condition latents, we expand the input features of Flux.1-Dev from 64 to 128. The first 64 channels correspond to the original input latents to be denoised, while the latter 64 channels correspond to control latents. This expansion happens on the `x_embedder` layer, where the combined latents are projected to the expected feature dimension of rest of the network. Inference is performed using the `FluxControlPipeline`.
> [!NOTE]
> **Gated model**
>
> As the model is gated, before using it with diffusers you first need to go to the [FLUX.1 [dev] Hugging Face page](https://huggingface.co/black-forest-labs/FLUX.1-dev), fill in the form and accept the gate. Once you are in, you need to log in so that your system knows youve accepted the gate. Use the command below to log in:
```bash
huggingface-cli login
```
The example command below shows how to launch fine-tuning for pose conditions. The dataset ([`raulc0399/open_pose_controlnet`](https://huggingface.co/datasets/raulc0399/open_pose_controlnet)) being used here already has the pose conditions of the original images, so we don't have to compute them.
```bash
accelerate launch train_control_lora_flux.py \
--pretrained_model_name_or_path="black-forest-labs/FLUX.1-dev" \
--dataset_name="raulc0399/open_pose_controlnet" \
--output_dir="pose-control-lora" \
--mixed_precision="bf16" \
--train_batch_size=1 \
--rank=64 \
--gradient_accumulation_steps=4 \
--gradient_checkpointing \
--use_8bit_adam \
--learning_rate=1e-4 \
--report_to="wandb" \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--max_train_steps=5000 \
--validation_image="openpose.png" \
--validation_prompt="A couple, 4k photo, highly detailed" \
--offload \
--seed="0" \
--push_to_hub
```
`openpose.png` comes from [here](https://huggingface.co/Adapter/t2iadapter/resolve/main/openpose.png).
You need to install `diffusers` from the branch of [this PR](https://github.com/huggingface/diffusers/pull/9999). When it's merged, you should install `diffusers` from the `main`.
The training script exposes additional CLI args that might be useful to experiment with:
* `use_lora_bias`: When set, additionally trains the biases of the `lora_B` layer.
* `train_norm_layers`: When set, additionally trains the normalization scales. Takes care of saving and loading.
* `lora_layers`: Specify the layers you want to apply LoRA to. If you specify "all-linear", all the linear layers will be LoRA-attached.
### Training with DeepSpeed
It's possible to train with [DeepSpeed](https://github.com/microsoft/DeepSpeed), specifically leveraging the Zero2 system optimization. To use it, save the following config to an YAML file (feel free to modify as needed):
```yaml
compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
gradient_accumulation_steps: 1
gradient_clipping: 1.0
offload_optimizer_device: cpu
offload_param_device: cpu
zero3_init_flag: false
zero_stage: 2
distributed_type: DEEPSPEED
downcast_bf16: 'no'
enable_cpu_affinity: false
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```
And then while launching training, pass the config file:
```bash
accelerate launch --config_file=CONFIG_FILE.yaml ...
```
### Inference
The pose images in our dataset were computed using the [`controlnet_aux`](https://github.com/huggingface/controlnet_aux) library. Let's install it first:
```bash
pip install controlnet_aux
```
And then we are ready:
```py
from controlnet_aux import OpenposeDetector
from diffusers import FluxControlPipeline
from diffusers.utils import load_image
from PIL import Image
import numpy as np
import torch
pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16).to("cuda")
pipe.load_lora_weights("...") # change this.
open_pose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
# prepare pose condition.
url = "https://huggingface.co/Adapter/t2iadapter/resolve/main/people.jpg"
image = load_image(url)
image = open_pose(image, detect_resolution=512, image_resolution=1024)
image = np.array(image)[:, :, ::-1]
image = Image.fromarray(np.uint8(image))
prompt = "A couple, 4k photo, highly detailed"
gen_images = pipe(
prompt=prompt,
condition_image=image,
num_inference_steps=50,
joint_attention_kwargs={"scale": 0.9},
guidance_scale=25.,
).images[0]
gen_images.save("output.png")
```
## Full fine-tuning
We provide a non-LoRA version of the training script `train_control_flux.py`. Here is an example command:
```bash
accelerate launch --config_file=accelerate_ds2.yaml train_control_flux.py \
--pretrained_model_name_or_path="black-forest-labs/FLUX.1-dev" \
--dataset_name="raulc0399/open_pose_controlnet" \
--output_dir="pose-control" \
--mixed_precision="bf16" \
--train_batch_size=2 \
--dataloader_num_workers=4 \
--gradient_accumulation_steps=4 \
--gradient_checkpointing \
--use_8bit_adam \
--proportion_empty_prompts=0.2 \
--learning_rate=5e-5 \
--adam_weight_decay=1e-4 \
--report_to="wandb" \
--lr_scheduler="cosine" \
--lr_warmup_steps=1000 \
--checkpointing_steps=1000 \
--max_train_steps=10000 \
--validation_steps=200 \
--validation_image "2_pose_1024.jpg" "3_pose_1024.jpg" \
--validation_prompt "two friends sitting by each other enjoying a day at the park, full hd, cinematic" "person enjoying a day at the park, full hd, cinematic" \
--offload \
--seed="0" \
--push_to_hub
```
Change the `validation_image` and `validation_prompt` as needed.
For inference, this time, we will run:
```py
from controlnet_aux import OpenposeDetector
from diffusers import FluxControlPipeline, FluxTransformer2DModel
from diffusers.utils import load_image
from PIL import Image
import numpy as np
import torch
transformer = FluxTransformer2DModel.from_pretrained("...") # change this.
pipe = FluxControlPipeline.from_pretrained(
"black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")
open_pose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
# prepare pose condition.
url = "https://huggingface.co/Adapter/t2iadapter/resolve/main/people.jpg"
image = load_image(url)
image = open_pose(image, detect_resolution=512, image_resolution=1024)
image = np.array(image)[:, :, ::-1]
image = Image.fromarray(np.uint8(image))
prompt = "A couple, 4k photo, highly detailed"
gen_images = pipe(
prompt=prompt,
condition_image=image,
num_inference_steps=50,
guidance_scale=25.,
).images[0]
gen_images.save("output.png")
```
## Things to note
* The scripts provided in this directory are experimental and educational. This means we may have to tweak things around to get good results on a given condition. We believe this is best done with the community 🤗
* The scripts are not memory-optimized but we offload the VAE and the text encoders to CPU when they are not used.
* We can extract LoRAs from the fully fine-tuned model. While we currently don't provide any utilities for that, users are welcome to refer to [this script](https://github.com/Stability-AI/stability-ComfyUI-nodes/blob/master/control_lora_create.py) that provides a similar functionality.

View File

@@ -1,6 +0,0 @@
transformers==4.47.0
wandb
torch
torchvision
accelerate==1.2.0
peft>=0.14.0

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@@ -1,175 +0,0 @@
# Search models on Civitai and Hugging Face
The [auto_diffusers](https://github.com/suzukimain/auto_diffusers) library provides additional functionalities to Diffusers such as searching for models on Civitai and the Hugging Face Hub.
Please refer to the original library [here](https://pypi.org/project/auto-diffusers/)
## Installation
Before running the scripts, make sure to install the library's training dependencies:
> [!IMPORTANT]
> To make sure you can successfully run the latest versions of the example scripts, we highly recommend **installing from source** and keeping the installation up to date as we update the example scripts frequently and install some example-specific requirements. To do this, execute the following steps in a new virtual environment.
```bash
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .
```
Set up the pipeline. You can also cd to this folder and run it.
```bash
!wget https://raw.githubusercontent.com/suzukimain/auto_diffusers/refs/heads/master/src/auto_diffusers/pipeline_easy.py
```
## Load from Civitai
```python
from pipeline_easy import (
EasyPipelineForText2Image,
EasyPipelineForImage2Image,
EasyPipelineForInpainting,
)
# Text-to-Image
pipeline = EasyPipelineForText2Image.from_civitai(
"search_word",
base_model="SD 1.5",
).to("cuda")
# Image-to-Image
pipeline = EasyPipelineForImage2Image.from_civitai(
"search_word",
base_model="SD 1.5",
).to("cuda")
# Inpainting
pipeline = EasyPipelineForInpainting.from_civitai(
"search_word",
base_model="SD 1.5",
).to("cuda")
```
## Load from Hugging Face
```python
from pipeline_easy import (
EasyPipelineForText2Image,
EasyPipelineForImage2Image,
EasyPipelineForInpainting,
)
# Text-to-Image
pipeline = EasyPipelineForText2Image.from_huggingface(
"search_word",
checkpoint_format="diffusers",
).to("cuda")
# Image-to-Image
pipeline = EasyPipelineForImage2Image.from_huggingface(
"search_word",
checkpoint_format="diffusers",
).to("cuda")
# Inpainting
pipeline = EasyPipelineForInpainting.from_huggingface(
"search_word",
checkpoint_format="diffusers",
).to("cuda")
```
## Search Civitai and Huggingface
```python
from pipeline_easy import (
search_huggingface,
search_civitai,
)
# Search Lora
Lora = search_civitai(
"Keyword_to_search_Lora",
model_type="LORA",
base_model = "SD 1.5",
download=True,
)
# Load Lora into the pipeline.
pipeline.load_lora_weights(Lora)
# Search TextualInversion
TextualInversion = search_civitai(
"EasyNegative",
model_type="TextualInversion",
base_model = "SD 1.5",
download=True
)
# Load TextualInversion into the pipeline.
pipeline.load_textual_inversion(TextualInversion, token="EasyNegative")
```
### Search Civitai
> [!TIP]
> **If an error occurs, insert the `token` and run again.**
#### `EasyPipeline.from_civitai` parameters
| Name | Type | Default | Description |
|:---------------:|:----------------------:|:-------------:|:-----------------------------------------------------------------------------------:|
| search_word | string, Path | ー | The search query string. Can be a keyword, Civitai URL, local directory or file path. |
| model_type | string | `Checkpoint` | The type of model to search for. <br>(for example `Checkpoint`, `TextualInversion`, `Controlnet`, `LORA`, `Hypernetwork`, `AestheticGradient`, `Poses`) |
| base_model | string | None | Trained model tag (for example `SD 1.5`, `SD 3.5`, `SDXL 1.0`) |
| torch_dtype | string, torch.dtype | None | Override the default `torch.dtype` and load the model with another dtype. |
| force_download | bool | False | Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. |
| cache_dir | string, Path | None | Path to the folder where cached files are stored. |
| resume | bool | False | Whether to resume an incomplete download. |
| token | string | None | API token for Civitai authentication. |
#### `search_civitai` parameters
| Name | Type | Default | Description |
|:---------------:|:--------------:|:-------------:|:-----------------------------------------------------------------------------------:|
| search_word | string, Path | ー | The search query string. Can be a keyword, Civitai URL, local directory or file path. |
| model_type | string | `Checkpoint` | The type of model to search for. <br>(for example `Checkpoint`, `TextualInversion`, `Controlnet`, `LORA`, `Hypernetwork`, `AestheticGradient`, `Poses`) |
| base_model | string | None | Trained model tag (for example `SD 1.5`, `SD 3.5`, `SDXL 1.0`) |
| download | bool | False | Whether to download the model. |
| force_download | bool | False | Whether to force the download if the model already exists. |
| cache_dir | string, Path | None | Path to the folder where cached files are stored. |
| resume | bool | False | Whether to resume an incomplete download. |
| token | string | None | API token for Civitai authentication. |
| include_params | bool | False | Whether to include parameters in the returned data. |
| skip_error | bool | False | Whether to skip errors and return None. |
### Search Huggingface
> [!TIP]
> **If an error occurs, insert the `token` and run again.**
#### `EasyPipeline.from_huggingface` parameters
| Name | Type | Default | Description |
|:---------------------:|:-------------------:|:--------------:|:----------------------------------------------------------------:|
| search_word | string, Path | ー | The search query string. Can be a keyword, Hugging Face URL, local directory or file path, or a Hugging Face path (`<creator>/<repo>`). |
| checkpoint_format | string | `single_file` | The format of the model checkpoint.<br>● `single_file` to search for `single file checkpoint` <br>●`diffusers` to search for `multifolder diffusers format checkpoint` |
| torch_dtype | string, torch.dtype | None | Override the default `torch.dtype` and load the model with another dtype. |
| force_download | bool | False | Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. |
| cache_dir | string, Path | None | Path to a directory where a downloaded pretrained model configuration is cached if the standard cache is not used. |
| token | string, bool | None | The token to use as HTTP bearer authorization for remote files. |
#### `search_huggingface` parameters
| Name | Type | Default | Description |
|:---------------------:|:-------------------:|:--------------:|:----------------------------------------------------------------:|
| search_word | string, Path | ー | The search query string. Can be a keyword, Hugging Face URL, local directory or file path, or a Hugging Face path (`<creator>/<repo>`). |
| checkpoint_format | string | `single_file` | The format of the model checkpoint. <br>● `single_file` to search for `single file checkpoint` <br>●`diffusers` to search for `multifolder diffusers format checkpoint` |
| pipeline_tag | string | None | Tag to filter models by pipeline. |
| download | bool | False | Whether to download the model. |
| force_download | bool | False | Whether or not to force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist. |
| cache_dir | string, Path | None | Path to a directory where a downloaded pretrained model configuration is cached if the standard cache is not used. |
| token | string, bool | None | The token to use as HTTP bearer authorization for remote files. |
| include_params | bool | False | Whether to include parameters in the returned data. |
| skip_error | bool | False | Whether to skip errors and return None. |

File diff suppressed because it is too large Load Diff

View File

@@ -1 +0,0 @@
huggingface-hub>=0.26.2

View File

@@ -1,226 +0,0 @@
# IP Adapter Training Example
[IP Adapter](https://arxiv.org/abs/2308.06721) is a novel approach designed to enhance text-to-image models such as Stable Diffusion by enabling them to generate images based on image prompts rather than text prompts alone. Unlike traditional methods that rely solely on complex text prompts, IP Adapter introduces the concept of using image prompts, leveraging the idea that "an image is worth a thousand words." By decoupling cross-attention layers for text and image features, IP Adapter effectively integrates image prompts into the generation process without the need for extensive fine-tuning or large computing resources.
## Training locally with PyTorch
### Installing the dependencies
Before running the scripts, make sure to install the library's training dependencies:
**Important**
To make sure you can successfully run the latest versions of the example scripts, we highly recommend **installing from source** and keeping the install up to date as we update the example scripts frequently and install some example-specific requirements. To do this, execute the following steps in a new virtual environment:
```bash
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install -e .
```
Then cd in the example folder and run
```bash
pip install -r requirements.txt
```
And initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) environment with:
```bash
accelerate config
```
Or for a default accelerate configuration without answering questions about your environment
```bash
accelerate config default
```
Or if your environment doesn't support an interactive shell e.g. a notebook
```python
from accelerate.utils import write_basic_config
write_basic_config()
```
Certainly! Below is the documentation in pure Markdown format:
### Accelerate Launch Command Documentation
#### Description:
The Accelerate launch command is used to train a model using multiple GPUs and mixed precision training. It launches the training script `tutorial_train_ip-adapter.py` with specified parameters and configurations.
#### Usage Example:
```
accelerate launch --mixed_precision "fp16" \
tutorial_train_ip-adapter.py \
--pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5/" \
--image_encoder_path="{image_encoder_path}" \
--data_json_file="{data.json}" \
--data_root_path="{image_path}" \
--mixed_precision="fp16" \
--resolution=512 \
--train_batch_size=8 \
--dataloader_num_workers=4 \
--learning_rate=1e-04 \
--weight_decay=0.01 \
--output_dir="{output_dir}" \
--save_steps=10000
```
### Multi-GPU Script:
```
accelerate launch --num_processes 8 --multi_gpu --mixed_precision "fp16" \
tutorial_train_ip-adapter.py \
--pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5/" \
--image_encoder_path="{image_encoder_path}" \
--data_json_file="{data.json}" \
--data_root_path="{image_path}" \
--mixed_precision="fp16" \
--resolution=512 \
--train_batch_size=8 \
--dataloader_num_workers=4 \
--learning_rate=1e-04 \
--weight_decay=0.01 \
--output_dir="{output_dir}" \
--save_steps=10000
```
#### Parameters:
- `--num_processes`: Number of processes to launch for distributed training (in this example, 8 processes).
- `--multi_gpu`: Flag indicating the usage of multiple GPUs for training.
- `--mixed_precision "fp16"`: Enables mixed precision training with 16-bit floating-point precision.
- `tutorial_train_ip-adapter.py`: Name of the training script to be executed.
- `--pretrained_model_name_or_path`: Path or identifier for a pretrained model.
- `--image_encoder_path`: Path to the CLIP image encoder.
- `--data_json_file`: Path to the training data in JSON format.
- `--data_root_path`: Root path where training images are located.
- `--resolution`: Resolution of input images (512x512 in this example).
- `--train_batch_size`: Batch size for training data (8 in this example).
- `--dataloader_num_workers`: Number of subprocesses for data loading (4 in this example).
- `--learning_rate`: Learning rate for training (1e-04 in this example).
- `--weight_decay`: Weight decay for regularization (0.01 in this example).
- `--output_dir`: Directory to save model checkpoints and predictions.
- `--save_steps`: Frequency of saving checkpoints during training (10000 in this example).
### Inference
#### Description:
The provided inference code is used to load a trained model checkpoint and extract the components related to image projection and IP (Image Processing) adapter. These components are then saved into a binary file for later use in inference.
#### Usage Example:
```python
from safetensors.torch import load_file, save_file
# Load the trained model checkpoint in safetensors format
ckpt = "checkpoint-50000/pytorch_model.safetensors"
sd = load_file(ckpt) # Using safetensors load function
# Extract image projection and IP adapter components
image_proj_sd = {}
ip_sd = {}
for k in sd:
if k.startswith("unet"):
pass # Skip unet-related keys
elif k.startswith("image_proj_model"):
image_proj_sd[k.replace("image_proj_model.", "")] = sd[k]
elif k.startswith("adapter_modules"):
ip_sd[k.replace("adapter_modules.", "")] = sd[k]
# Save the components into separate safetensors files
save_file(image_proj_sd, "image_proj.safetensors")
save_file(ip_sd, "ip_adapter.safetensors")
```
### Sample Inference Script using the CLIP Model
```python
import torch
from safetensors.torch import load_file
from transformers import CLIPProcessor, CLIPModel # Using the Hugging Face CLIP model
# Load model components from safetensors
image_proj_ckpt = "image_proj.safetensors"
ip_adapter_ckpt = "ip_adapter.safetensors"
# Load the saved weights
image_proj_sd = load_file(image_proj_ckpt)
ip_adapter_sd = load_file(ip_adapter_ckpt)
# Define the model Parameters
class ImageProjectionModel(torch.nn.Module):
def __init__(self, input_dim=768, output_dim=512): # CLIP's default embedding size is 768
super().__init__()
self.model = torch.nn.Linear(input_dim, output_dim)
def forward(self, x):
return self.model(x)
class IPAdapterModel(torch.nn.Module):
def __init__(self, input_dim=512, output_dim=10): # Example for 10 classes
super().__init__()
self.model = torch.nn.Linear(input_dim, output_dim)
def forward(self, x):
return self.model(x)
# Initialize models
image_proj_model = ImageProjectionModel()
ip_adapter_model = IPAdapterModel()
# Load weights into models
image_proj_model.load_state_dict(image_proj_sd)
ip_adapter_model.load_state_dict(ip_adapter_sd)
# Set models to evaluation mode
image_proj_model.eval()
ip_adapter_model.eval()
#Inference pipeline
def inference(image_tensor):
"""
Run inference using the loaded models.
Args:
image_tensor: Preprocessed image tensor from CLIPProcessor
Returns:
Final inference results
"""
with torch.no_grad():
# Step 1: Project the image features
image_proj = image_proj_model(image_tensor)
# Step 2: Pass the projected features through the IP Adapter
result = ip_adapter_model(image_proj)
return result
# Using CLIP for image preprocessing
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
#Image file path
image_path = "path/to/image.jpg"
# Preprocess the image
inputs = processor(images=image_path, return_tensors="pt")
image_features = clip_model.get_image_features(inputs["pixel_values"])
# Normalize the image features as per CLIP's recommendations
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
# Run inference
output = inference(image_features)
print("Inference output:", output)
```
#### Parameters:
- `ckpt`: Path to the trained model checkpoint file.
- `map_location="cpu"`: Specifies that the model should be loaded onto the CPU.
- `image_proj_sd`: Dictionary to store the components related to image projection.
- `ip_sd`: Dictionary to store the components related to the IP adapter.
- `"unet"`, `"image_proj_model"`, `"adapter_modules"`: Prefixes indicating components of the model.

View File

@@ -1,4 +0,0 @@
accelerate
torchvision
transformers>=4.25.1
ip_adapter

View File

@@ -1,415 +0,0 @@
import argparse
import itertools
import json
import os
import random
import time
from pathlib import Path
import torch
import torch.nn.functional as F
from accelerate import Accelerator
from accelerate.utils import ProjectConfiguration
from ip_adapter.attention_processor_faceid import LoRAAttnProcessor, LoRAIPAttnProcessor
from ip_adapter.ip_adapter_faceid import MLPProjModel
from PIL import Image
from torchvision import transforms
from transformers import CLIPTextModel, CLIPTokenizer
from diffusers import AutoencoderKL, DDPMScheduler, UNet2DConditionModel
# Dataset
class MyDataset(torch.utils.data.Dataset):
def __init__(
self, json_file, tokenizer, size=512, t_drop_rate=0.05, i_drop_rate=0.05, ti_drop_rate=0.05, image_root_path=""
):
super().__init__()
self.tokenizer = tokenizer
self.size = size
self.i_drop_rate = i_drop_rate
self.t_drop_rate = t_drop_rate
self.ti_drop_rate = ti_drop_rate
self.image_root_path = image_root_path
self.data = json.load(
open(json_file)
) # list of dict: [{"image_file": "1.png", "id_embed_file": "faceid.bin"}]
self.transform = transforms.Compose(
[
transforms.Resize(self.size, interpolation=transforms.InterpolationMode.BILINEAR),
transforms.CenterCrop(self.size),
transforms.ToTensor(),
transforms.Normalize([0.5], [0.5]),
]
)
def __getitem__(self, idx):
item = self.data[idx]
text = item["text"]
image_file = item["image_file"]
# read image
raw_image = Image.open(os.path.join(self.image_root_path, image_file))
image = self.transform(raw_image.convert("RGB"))
face_id_embed = torch.load(item["id_embed_file"], map_location="cpu")
face_id_embed = torch.from_numpy(face_id_embed)
# drop
drop_image_embed = 0
rand_num = random.random()
if rand_num < self.i_drop_rate:
drop_image_embed = 1
elif rand_num < (self.i_drop_rate + self.t_drop_rate):
text = ""
elif rand_num < (self.i_drop_rate + self.t_drop_rate + self.ti_drop_rate):
text = ""
drop_image_embed = 1
if drop_image_embed:
face_id_embed = torch.zeros_like(face_id_embed)
# get text and tokenize
text_input_ids = self.tokenizer(
text,
max_length=self.tokenizer.model_max_length,
padding="max_length",
truncation=True,
return_tensors="pt",
).input_ids
return {
"image": image,
"text_input_ids": text_input_ids,
"face_id_embed": face_id_embed,
"drop_image_embed": drop_image_embed,
}
def __len__(self):
return len(self.data)
def collate_fn(data):
images = torch.stack([example["image"] for example in data])
text_input_ids = torch.cat([example["text_input_ids"] for example in data], dim=0)
face_id_embed = torch.stack([example["face_id_embed"] for example in data])
drop_image_embeds = [example["drop_image_embed"] for example in data]
return {
"images": images,
"text_input_ids": text_input_ids,
"face_id_embed": face_id_embed,
"drop_image_embeds": drop_image_embeds,
}
class IPAdapter(torch.nn.Module):
"""IP-Adapter"""
def __init__(self, unet, image_proj_model, adapter_modules, ckpt_path=None):
super().__init__()
self.unet = unet
self.image_proj_model = image_proj_model
self.adapter_modules = adapter_modules
if ckpt_path is not None:
self.load_from_checkpoint(ckpt_path)
def forward(self, noisy_latents, timesteps, encoder_hidden_states, image_embeds):
ip_tokens = self.image_proj_model(image_embeds)
encoder_hidden_states = torch.cat([encoder_hidden_states, ip_tokens], dim=1)
# Predict the noise residual
noise_pred = self.unet(noisy_latents, timesteps, encoder_hidden_states).sample
return noise_pred
def load_from_checkpoint(self, ckpt_path: str):
# Calculate original checksums
orig_ip_proj_sum = torch.sum(torch.stack([torch.sum(p) for p in self.image_proj_model.parameters()]))
orig_adapter_sum = torch.sum(torch.stack([torch.sum(p) for p in self.adapter_modules.parameters()]))
state_dict = torch.load(ckpt_path, map_location="cpu")
# Load state dict for image_proj_model and adapter_modules
self.image_proj_model.load_state_dict(state_dict["image_proj"], strict=True)
self.adapter_modules.load_state_dict(state_dict["ip_adapter"], strict=True)
# Calculate new checksums
new_ip_proj_sum = torch.sum(torch.stack([torch.sum(p) for p in self.image_proj_model.parameters()]))
new_adapter_sum = torch.sum(torch.stack([torch.sum(p) for p in self.adapter_modules.parameters()]))
# Verify if the weights have changed
assert orig_ip_proj_sum != new_ip_proj_sum, "Weights of image_proj_model did not change!"
assert orig_adapter_sum != new_adapter_sum, "Weights of adapter_modules did not change!"
print(f"Successfully loaded weights from checkpoint {ckpt_path}")
def parse_args():
parser = argparse.ArgumentParser(description="Simple example of a training script.")
parser.add_argument(
"--pretrained_model_name_or_path",
type=str,
default=None,
required=True,
help="Path to pretrained model or model identifier from huggingface.co/models.",
)
parser.add_argument(
"--pretrained_ip_adapter_path",
type=str,
default=None,
help="Path to pretrained ip adapter model. If not specified weights are initialized randomly.",
)
parser.add_argument(
"--data_json_file",
type=str,
default=None,
required=True,
help="Training data",
)
parser.add_argument(
"--data_root_path",
type=str,
default="",
required=True,
help="Training data root path",
)
parser.add_argument(
"--image_encoder_path",
type=str,
default=None,
required=True,
help="Path to CLIP image encoder",
)
parser.add_argument(
"--output_dir",
type=str,
default="sd-ip_adapter",
help="The output directory where the model predictions and checkpoints will be written.",
)
parser.add_argument(
"--logging_dir",
type=str,
default="logs",
help=(
"[TensorBoard](https://www.tensorflow.org/tensorboard) log directory. Will default to"
" *output_dir/runs/**CURRENT_DATETIME_HOSTNAME***."
),
)
parser.add_argument(
"--resolution",
type=int,
default=512,
help=("The resolution for input images"),
)
parser.add_argument(
"--learning_rate",
type=float,
default=1e-4,
help="Learning rate to use.",
)
parser.add_argument("--weight_decay", type=float, default=1e-2, help="Weight decay to use.")
parser.add_argument("--num_train_epochs", type=int, default=100)
parser.add_argument(
"--train_batch_size", type=int, default=8, help="Batch size (per device) for the training dataloader."
)
parser.add_argument(
"--dataloader_num_workers",
type=int,
default=0,
help=(
"Number of subprocesses to use for data loading. 0 means that the data will be loaded in the main process."
),
)
parser.add_argument(
"--save_steps",
type=int,
default=2000,
help=("Save a checkpoint of the training state every X updates"),
)
parser.add_argument(
"--mixed_precision",
type=str,
default=None,
choices=["no", "fp16", "bf16"],
help=(
"Whether to use mixed precision. Choose between fp16 and bf16 (bfloat16). Bf16 requires PyTorch >="
" 1.10.and an Nvidia Ampere GPU. Default to the value of accelerate config of the current system or the"
" flag passed with the `accelerate.launch` command. Use this argument to override the accelerate config."
),
)
parser.add_argument(
"--report_to",
type=str,
default="tensorboard",
help=(
'The integration to report the results and logs to. Supported platforms are `"tensorboard"`'
' (default), `"wandb"` and `"comet_ml"`. Use `"all"` to report to all integrations.'
),
)
parser.add_argument("--local_rank", type=int, default=-1, help="For distributed training: local_rank")
args = parser.parse_args()
env_local_rank = int(os.environ.get("LOCAL_RANK", -1))
if env_local_rank != -1 and env_local_rank != args.local_rank:
args.local_rank = env_local_rank
return args
def main():
args = parse_args()
logging_dir = Path(args.output_dir, args.logging_dir)
accelerator_project_config = ProjectConfiguration(project_dir=args.output_dir, logging_dir=logging_dir)
accelerator = Accelerator(
mixed_precision=args.mixed_precision,
log_with=args.report_to,
project_config=accelerator_project_config,
)
if accelerator.is_main_process:
if args.output_dir is not None:
os.makedirs(args.output_dir, exist_ok=True)
# Load scheduler, tokenizer and models.
noise_scheduler = DDPMScheduler.from_pretrained(args.pretrained_model_name_or_path, subfolder="scheduler")
tokenizer = CLIPTokenizer.from_pretrained(args.pretrained_model_name_or_path, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(args.pretrained_model_name_or_path, subfolder="text_encoder")
vae = AutoencoderKL.from_pretrained(args.pretrained_model_name_or_path, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(args.pretrained_model_name_or_path, subfolder="unet")
# image_encoder = CLIPVisionModelWithProjection.from_pretrained(args.image_encoder_path)
# freeze parameters of models to save more memory
unet.requires_grad_(False)
vae.requires_grad_(False)
text_encoder.requires_grad_(False)
# image_encoder.requires_grad_(False)
# ip-adapter
image_proj_model = MLPProjModel(
cross_attention_dim=unet.config.cross_attention_dim,
id_embeddings_dim=512,
num_tokens=4,
)
# init adapter modules
lora_rank = 128
attn_procs = {}
unet_sd = unet.state_dict()
for name in unet.attn_processors.keys():
cross_attention_dim = None if name.endswith("attn1.processor") else unet.config.cross_attention_dim
if name.startswith("mid_block"):
hidden_size = unet.config.block_out_channels[-1]
elif name.startswith("up_blocks"):
block_id = int(name[len("up_blocks.")])
hidden_size = list(reversed(unet.config.block_out_channels))[block_id]
elif name.startswith("down_blocks"):
block_id = int(name[len("down_blocks.")])
hidden_size = unet.config.block_out_channels[block_id]
if cross_attention_dim is None:
attn_procs[name] = LoRAAttnProcessor(
hidden_size=hidden_size, cross_attention_dim=cross_attention_dim, rank=lora_rank
)
else:
layer_name = name.split(".processor")[0]
weights = {
"to_k_ip.weight": unet_sd[layer_name + ".to_k.weight"],
"to_v_ip.weight": unet_sd[layer_name + ".to_v.weight"],
}
attn_procs[name] = LoRAIPAttnProcessor(
hidden_size=hidden_size, cross_attention_dim=cross_attention_dim, rank=lora_rank
)
attn_procs[name].load_state_dict(weights, strict=False)
unet.set_attn_processor(attn_procs)
adapter_modules = torch.nn.ModuleList(unet.attn_processors.values())
ip_adapter = IPAdapter(unet, image_proj_model, adapter_modules, args.pretrained_ip_adapter_path)
weight_dtype = torch.float32
if accelerator.mixed_precision == "fp16":
weight_dtype = torch.float16
elif accelerator.mixed_precision == "bf16":
weight_dtype = torch.bfloat16
# unet.to(accelerator.device, dtype=weight_dtype)
vae.to(accelerator.device, dtype=weight_dtype)
text_encoder.to(accelerator.device, dtype=weight_dtype)
# image_encoder.to(accelerator.device, dtype=weight_dtype)
# optimizer
params_to_opt = itertools.chain(ip_adapter.image_proj_model.parameters(), ip_adapter.adapter_modules.parameters())
optimizer = torch.optim.AdamW(params_to_opt, lr=args.learning_rate, weight_decay=args.weight_decay)
# dataloader
train_dataset = MyDataset(
args.data_json_file, tokenizer=tokenizer, size=args.resolution, image_root_path=args.data_root_path
)
train_dataloader = torch.utils.data.DataLoader(
train_dataset,
shuffle=True,
collate_fn=collate_fn,
batch_size=args.train_batch_size,
num_workers=args.dataloader_num_workers,
)
# Prepare everything with our `accelerator`.
ip_adapter, optimizer, train_dataloader = accelerator.prepare(ip_adapter, optimizer, train_dataloader)
global_step = 0
for epoch in range(0, args.num_train_epochs):
begin = time.perf_counter()
for step, batch in enumerate(train_dataloader):
load_data_time = time.perf_counter() - begin
with accelerator.accumulate(ip_adapter):
# Convert images to latent space
with torch.no_grad():
latents = vae.encode(
batch["images"].to(accelerator.device, dtype=weight_dtype)
).latent_dist.sample()
latents = latents * vae.config.scaling_factor
# Sample noise that we'll add to the latents
noise = torch.randn_like(latents)
bsz = latents.shape[0]
# Sample a random timestep for each image
timesteps = torch.randint(0, noise_scheduler.num_train_timesteps, (bsz,), device=latents.device)
timesteps = timesteps.long()
# Add noise to the latents according to the noise magnitude at each timestep
# (this is the forward diffusion process)
noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
image_embeds = batch["face_id_embed"].to(accelerator.device, dtype=weight_dtype)
with torch.no_grad():
encoder_hidden_states = text_encoder(batch["text_input_ids"].to(accelerator.device))[0]
noise_pred = ip_adapter(noisy_latents, timesteps, encoder_hidden_states, image_embeds)
loss = F.mse_loss(noise_pred.float(), noise.float(), reduction="mean")
# Gather the losses across all processes for logging (if we use distributed training).
avg_loss = accelerator.gather(loss.repeat(args.train_batch_size)).mean().item()
# Backpropagate
accelerator.backward(loss)
optimizer.step()
optimizer.zero_grad()
if accelerator.is_main_process:
print(
"Epoch {}, step {}, data_time: {}, time: {}, step_loss: {}".format(
epoch, step, load_data_time, time.perf_counter() - begin, avg_loss
)
)
global_step += 1
if global_step % args.save_steps == 0:
save_path = os.path.join(args.output_dir, f"checkpoint-{global_step}")
accelerator.save_state(save_path)
begin = time.perf_counter()
if __name__ == "__main__":
main()

View File

@@ -1,422 +0,0 @@
import argparse
import itertools
import json
import os
import random
import time
from pathlib import Path
import torch
import torch.nn.functional as F
from accelerate import Accelerator
from accelerate.utils import ProjectConfiguration
from ip_adapter.ip_adapter import ImageProjModel
from ip_adapter.utils import is_torch2_available
from PIL import Image
from torchvision import transforms
from transformers import CLIPImageProcessor, CLIPTextModel, CLIPTokenizer, CLIPVisionModelWithProjection
from diffusers import AutoencoderKL, DDPMScheduler, UNet2DConditionModel
if is_torch2_available():
from ip_adapter.attention_processor import AttnProcessor2_0 as AttnProcessor
from ip_adapter.attention_processor import IPAttnProcessor2_0 as IPAttnProcessor
else:
from ip_adapter.attention_processor import AttnProcessor, IPAttnProcessor
# Dataset
class MyDataset(torch.utils.data.Dataset):
def __init__(
self, json_file, tokenizer, size=512, t_drop_rate=0.05, i_drop_rate=0.05, ti_drop_rate=0.05, image_root_path=""
):
super().__init__()
self.tokenizer = tokenizer
self.size = size
self.i_drop_rate = i_drop_rate
self.t_drop_rate = t_drop_rate
self.ti_drop_rate = ti_drop_rate
self.image_root_path = image_root_path
self.data = json.load(open(json_file)) # list of dict: [{"image_file": "1.png", "text": "A dog"}]
self.transform = transforms.Compose(
[
transforms.Resize(self.size, interpolation=transforms.InterpolationMode.BILINEAR),
transforms.CenterCrop(self.size),
transforms.ToTensor(),
transforms.Normalize([0.5], [0.5]),
]
)
self.clip_image_processor = CLIPImageProcessor()
def __getitem__(self, idx):
item = self.data[idx]
text = item["text"]
image_file = item["image_file"]
# read image
raw_image = Image.open(os.path.join(self.image_root_path, image_file))
image = self.transform(raw_image.convert("RGB"))
clip_image = self.clip_image_processor(images=raw_image, return_tensors="pt").pixel_values
# drop
drop_image_embed = 0
rand_num = random.random()
if rand_num < self.i_drop_rate:
drop_image_embed = 1
elif rand_num < (self.i_drop_rate + self.t_drop_rate):
text = ""
elif rand_num < (self.i_drop_rate + self.t_drop_rate + self.ti_drop_rate):
text = ""
drop_image_embed = 1
# get text and tokenize
text_input_ids = self.tokenizer(
text,
max_length=self.tokenizer.model_max_length,
padding="max_length",
truncation=True,
return_tensors="pt",
).input_ids
return {
"image": image,
"text_input_ids": text_input_ids,
"clip_image": clip_image,
"drop_image_embed": drop_image_embed,
}
def __len__(self):
return len(self.data)
def collate_fn(data):
images = torch.stack([example["image"] for example in data])
text_input_ids = torch.cat([example["text_input_ids"] for example in data], dim=0)
clip_images = torch.cat([example["clip_image"] for example in data], dim=0)
drop_image_embeds = [example["drop_image_embed"] for example in data]
return {
"images": images,
"text_input_ids": text_input_ids,
"clip_images": clip_images,
"drop_image_embeds": drop_image_embeds,
}
class IPAdapter(torch.nn.Module):
"""IP-Adapter"""
def __init__(self, unet, image_proj_model, adapter_modules, ckpt_path=None):
super().__init__()
self.unet = unet
self.image_proj_model = image_proj_model
self.adapter_modules = adapter_modules
if ckpt_path is not None:
self.load_from_checkpoint(ckpt_path)
def forward(self, noisy_latents, timesteps, encoder_hidden_states, image_embeds):
ip_tokens = self.image_proj_model(image_embeds)
encoder_hidden_states = torch.cat([encoder_hidden_states, ip_tokens], dim=1)
# Predict the noise residual
noise_pred = self.unet(noisy_latents, timesteps, encoder_hidden_states).sample
return noise_pred
def load_from_checkpoint(self, ckpt_path: str):
# Calculate original checksums
orig_ip_proj_sum = torch.sum(torch.stack([torch.sum(p) for p in self.image_proj_model.parameters()]))
orig_adapter_sum = torch.sum(torch.stack([torch.sum(p) for p in self.adapter_modules.parameters()]))
state_dict = torch.load(ckpt_path, map_location="cpu")
# Load state dict for image_proj_model and adapter_modules
self.image_proj_model.load_state_dict(state_dict["image_proj"], strict=True)
self.adapter_modules.load_state_dict(state_dict["ip_adapter"], strict=True)
# Calculate new checksums
new_ip_proj_sum = torch.sum(torch.stack([torch.sum(p) for p in self.image_proj_model.parameters()]))
new_adapter_sum = torch.sum(torch.stack([torch.sum(p) for p in self.adapter_modules.parameters()]))
# Verify if the weights have changed
assert orig_ip_proj_sum != new_ip_proj_sum, "Weights of image_proj_model did not change!"
assert orig_adapter_sum != new_adapter_sum, "Weights of adapter_modules did not change!"
print(f"Successfully loaded weights from checkpoint {ckpt_path}")
def parse_args():
parser = argparse.ArgumentParser(description="Simple example of a training script.")
parser.add_argument(
"--pretrained_model_name_or_path",
type=str,
default=None,
required=True,
help="Path to pretrained model or model identifier from huggingface.co/models.",
)
parser.add_argument(
"--pretrained_ip_adapter_path",
type=str,
default=None,
help="Path to pretrained ip adapter model. If not specified weights are initialized randomly.",
)
parser.add_argument(
"--data_json_file",
type=str,
default=None,
required=True,
help="Training data",
)
parser.add_argument(
"--data_root_path",
type=str,
default="",
required=True,
help="Training data root path",
)
parser.add_argument(
"--image_encoder_path",
type=str,
default=None,
required=True,
help="Path to CLIP image encoder",
)
parser.add_argument(
"--output_dir",
type=str,
default="sd-ip_adapter",
help="The output directory where the model predictions and checkpoints will be written.",
)
parser.add_argument(
"--logging_dir",
type=str,
default="logs",
help=(
"[TensorBoard](https://www.tensorflow.org/tensorboard) log directory. Will default to"
" *output_dir/runs/**CURRENT_DATETIME_HOSTNAME***."
),
)
parser.add_argument(
"--resolution",
type=int,
default=512,
help=("The resolution for input images"),
)
parser.add_argument(
"--learning_rate",
type=float,
default=1e-4,
help="Learning rate to use.",
)
parser.add_argument("--weight_decay", type=float, default=1e-2, help="Weight decay to use.")
parser.add_argument("--num_train_epochs", type=int, default=100)
parser.add_argument(
"--train_batch_size", type=int, default=8, help="Batch size (per device) for the training dataloader."
)
parser.add_argument(
"--dataloader_num_workers",
type=int,
default=0,
help=(
"Number of subprocesses to use for data loading. 0 means that the data will be loaded in the main process."
),
)
parser.add_argument(
"--save_steps",
type=int,
default=2000,
help=("Save a checkpoint of the training state every X updates"),
)
parser.add_argument(
"--mixed_precision",
type=str,
default=None,
choices=["no", "fp16", "bf16"],
help=(
"Whether to use mixed precision. Choose between fp16 and bf16 (bfloat16). Bf16 requires PyTorch >="
" 1.10.and an Nvidia Ampere GPU. Default to the value of accelerate config of the current system or the"
" flag passed with the `accelerate.launch` command. Use this argument to override the accelerate config."
),
)
parser.add_argument(
"--report_to",
type=str,
default="tensorboard",
help=(
'The integration to report the results and logs to. Supported platforms are `"tensorboard"`'
' (default), `"wandb"` and `"comet_ml"`. Use `"all"` to report to all integrations.'
),
)
parser.add_argument("--local_rank", type=int, default=-1, help="For distributed training: local_rank")
args = parser.parse_args()
env_local_rank = int(os.environ.get("LOCAL_RANK", -1))
if env_local_rank != -1 and env_local_rank != args.local_rank:
args.local_rank = env_local_rank
return args
def main():
args = parse_args()
logging_dir = Path(args.output_dir, args.logging_dir)
accelerator_project_config = ProjectConfiguration(project_dir=args.output_dir, logging_dir=logging_dir)
accelerator = Accelerator(
mixed_precision=args.mixed_precision,
log_with=args.report_to,
project_config=accelerator_project_config,
)
if accelerator.is_main_process:
if args.output_dir is not None:
os.makedirs(args.output_dir, exist_ok=True)
# Load scheduler, tokenizer and models.
noise_scheduler = DDPMScheduler.from_pretrained(args.pretrained_model_name_or_path, subfolder="scheduler")
tokenizer = CLIPTokenizer.from_pretrained(args.pretrained_model_name_or_path, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(args.pretrained_model_name_or_path, subfolder="text_encoder")
vae = AutoencoderKL.from_pretrained(args.pretrained_model_name_or_path, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(args.pretrained_model_name_or_path, subfolder="unet")
image_encoder = CLIPVisionModelWithProjection.from_pretrained(args.image_encoder_path)
# freeze parameters of models to save more memory
unet.requires_grad_(False)
vae.requires_grad_(False)
text_encoder.requires_grad_(False)
image_encoder.requires_grad_(False)
# ip-adapter
image_proj_model = ImageProjModel(
cross_attention_dim=unet.config.cross_attention_dim,
clip_embeddings_dim=image_encoder.config.projection_dim,
clip_extra_context_tokens=4,
)
# init adapter modules
attn_procs = {}
unet_sd = unet.state_dict()
for name in unet.attn_processors.keys():
cross_attention_dim = None if name.endswith("attn1.processor") else unet.config.cross_attention_dim
if name.startswith("mid_block"):
hidden_size = unet.config.block_out_channels[-1]
elif name.startswith("up_blocks"):
block_id = int(name[len("up_blocks.")])
hidden_size = list(reversed(unet.config.block_out_channels))[block_id]
elif name.startswith("down_blocks"):
block_id = int(name[len("down_blocks.")])
hidden_size = unet.config.block_out_channels[block_id]
if cross_attention_dim is None:
attn_procs[name] = AttnProcessor()
else:
layer_name = name.split(".processor")[0]
weights = {
"to_k_ip.weight": unet_sd[layer_name + ".to_k.weight"],
"to_v_ip.weight": unet_sd[layer_name + ".to_v.weight"],
}
attn_procs[name] = IPAttnProcessor(hidden_size=hidden_size, cross_attention_dim=cross_attention_dim)
attn_procs[name].load_state_dict(weights)
unet.set_attn_processor(attn_procs)
adapter_modules = torch.nn.ModuleList(unet.attn_processors.values())
ip_adapter = IPAdapter(unet, image_proj_model, adapter_modules, args.pretrained_ip_adapter_path)
weight_dtype = torch.float32
if accelerator.mixed_precision == "fp16":
weight_dtype = torch.float16
elif accelerator.mixed_precision == "bf16":
weight_dtype = torch.bfloat16
# unet.to(accelerator.device, dtype=weight_dtype)
vae.to(accelerator.device, dtype=weight_dtype)
text_encoder.to(accelerator.device, dtype=weight_dtype)
image_encoder.to(accelerator.device, dtype=weight_dtype)
# optimizer
params_to_opt = itertools.chain(ip_adapter.image_proj_model.parameters(), ip_adapter.adapter_modules.parameters())
optimizer = torch.optim.AdamW(params_to_opt, lr=args.learning_rate, weight_decay=args.weight_decay)
# dataloader
train_dataset = MyDataset(
args.data_json_file, tokenizer=tokenizer, size=args.resolution, image_root_path=args.data_root_path
)
train_dataloader = torch.utils.data.DataLoader(
train_dataset,
shuffle=True,
collate_fn=collate_fn,
batch_size=args.train_batch_size,
num_workers=args.dataloader_num_workers,
)
# Prepare everything with our `accelerator`.
ip_adapter, optimizer, train_dataloader = accelerator.prepare(ip_adapter, optimizer, train_dataloader)
global_step = 0
for epoch in range(0, args.num_train_epochs):
begin = time.perf_counter()
for step, batch in enumerate(train_dataloader):
load_data_time = time.perf_counter() - begin
with accelerator.accumulate(ip_adapter):
# Convert images to latent space
with torch.no_grad():
latents = vae.encode(
batch["images"].to(accelerator.device, dtype=weight_dtype)
).latent_dist.sample()
latents = latents * vae.config.scaling_factor
# Sample noise that we'll add to the latents
noise = torch.randn_like(latents)
bsz = latents.shape[0]
# Sample a random timestep for each image
timesteps = torch.randint(0, noise_scheduler.num_train_timesteps, (bsz,), device=latents.device)
timesteps = timesteps.long()
# Add noise to the latents according to the noise magnitude at each timestep
# (this is the forward diffusion process)
noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
with torch.no_grad():
image_embeds = image_encoder(
batch["clip_images"].to(accelerator.device, dtype=weight_dtype)
).image_embeds
image_embeds_ = []
for image_embed, drop_image_embed in zip(image_embeds, batch["drop_image_embeds"]):
if drop_image_embed == 1:
image_embeds_.append(torch.zeros_like(image_embed))
else:
image_embeds_.append(image_embed)
image_embeds = torch.stack(image_embeds_)
with torch.no_grad():
encoder_hidden_states = text_encoder(batch["text_input_ids"].to(accelerator.device))[0]
noise_pred = ip_adapter(noisy_latents, timesteps, encoder_hidden_states, image_embeds)
loss = F.mse_loss(noise_pred.float(), noise.float(), reduction="mean")
# Gather the losses across all processes for logging (if we use distributed training).
avg_loss = accelerator.gather(loss.repeat(args.train_batch_size)).mean().item()
# Backpropagate
accelerator.backward(loss)
optimizer.step()
optimizer.zero_grad()
if accelerator.is_main_process:
print(
"Epoch {}, step {}, data_time: {}, time: {}, step_loss: {}".format(
epoch, step, load_data_time, time.perf_counter() - begin, avg_loss
)
)
global_step += 1
if global_step % args.save_steps == 0:
save_path = os.path.join(args.output_dir, f"checkpoint-{global_step}")
accelerator.save_state(save_path)
begin = time.perf_counter()
if __name__ == "__main__":
main()

View File

@@ -1,445 +0,0 @@
import argparse
import itertools
import json
import os
import random
import time
from pathlib import Path
import torch
import torch.nn.functional as F
from accelerate import Accelerator
from accelerate.utils import ProjectConfiguration
from ip_adapter.resampler import Resampler
from ip_adapter.utils import is_torch2_available
from PIL import Image
from torchvision import transforms
from transformers import CLIPImageProcessor, CLIPTextModel, CLIPTokenizer, CLIPVisionModelWithProjection
from diffusers import AutoencoderKL, DDPMScheduler, UNet2DConditionModel
if is_torch2_available():
from ip_adapter.attention_processor import AttnProcessor2_0 as AttnProcessor
from ip_adapter.attention_processor import IPAttnProcessor2_0 as IPAttnProcessor
else:
from ip_adapter.attention_processor import AttnProcessor, IPAttnProcessor
# Dataset
class MyDataset(torch.utils.data.Dataset):
def __init__(
self, json_file, tokenizer, size=512, t_drop_rate=0.05, i_drop_rate=0.05, ti_drop_rate=0.05, image_root_path=""
):
super().__init__()
self.tokenizer = tokenizer
self.size = size
self.i_drop_rate = i_drop_rate
self.t_drop_rate = t_drop_rate
self.ti_drop_rate = ti_drop_rate
self.image_root_path = image_root_path
self.data = json.load(open(json_file)) # list of dict: [{"image_file": "1.png", "text": "A dog"}]
self.transform = transforms.Compose(
[
transforms.Resize(self.size, interpolation=transforms.InterpolationMode.BILINEAR),
transforms.CenterCrop(self.size),
transforms.ToTensor(),
transforms.Normalize([0.5], [0.5]),
]
)
self.clip_image_processor = CLIPImageProcessor()
def __getitem__(self, idx):
item = self.data[idx]
text = item["text"]
image_file = item["image_file"]
# read image
raw_image = Image.open(os.path.join(self.image_root_path, image_file))
image = self.transform(raw_image.convert("RGB"))
clip_image = self.clip_image_processor(images=raw_image, return_tensors="pt").pixel_values
# drop
drop_image_embed = 0
rand_num = random.random()
if rand_num < self.i_drop_rate:
drop_image_embed = 1
elif rand_num < (self.i_drop_rate + self.t_drop_rate):
text = ""
elif rand_num < (self.i_drop_rate + self.t_drop_rate + self.ti_drop_rate):
text = ""
drop_image_embed = 1
# get text and tokenize
text_input_ids = self.tokenizer(
text,
max_length=self.tokenizer.model_max_length,
padding="max_length",
truncation=True,
return_tensors="pt",
).input_ids
return {
"image": image,
"text_input_ids": text_input_ids,
"clip_image": clip_image,
"drop_image_embed": drop_image_embed,
}
def __len__(self):
return len(self.data)
def collate_fn(data):
images = torch.stack([example["image"] for example in data])
text_input_ids = torch.cat([example["text_input_ids"] for example in data], dim=0)
clip_images = torch.cat([example["clip_image"] for example in data], dim=0)
drop_image_embeds = [example["drop_image_embed"] for example in data]
return {
"images": images,
"text_input_ids": text_input_ids,
"clip_images": clip_images,
"drop_image_embeds": drop_image_embeds,
}
class IPAdapter(torch.nn.Module):
"""IP-Adapter"""
def __init__(self, unet, image_proj_model, adapter_modules, ckpt_path=None):
super().__init__()
self.unet = unet
self.image_proj_model = image_proj_model
self.adapter_modules = adapter_modules
if ckpt_path is not None:
self.load_from_checkpoint(ckpt_path)
def forward(self, noisy_latents, timesteps, encoder_hidden_states, image_embeds):
ip_tokens = self.image_proj_model(image_embeds)
encoder_hidden_states = torch.cat([encoder_hidden_states, ip_tokens], dim=1)
# Predict the noise residual
noise_pred = self.unet(noisy_latents, timesteps, encoder_hidden_states).sample
return noise_pred
def load_from_checkpoint(self, ckpt_path: str):
# Calculate original checksums
orig_ip_proj_sum = torch.sum(torch.stack([torch.sum(p) for p in self.image_proj_model.parameters()]))
orig_adapter_sum = torch.sum(torch.stack([torch.sum(p) for p in self.adapter_modules.parameters()]))
state_dict = torch.load(ckpt_path, map_location="cpu")
# Check if 'latents' exists in both the saved state_dict and the current model's state_dict
strict_load_image_proj_model = True
if "latents" in state_dict["image_proj"] and "latents" in self.image_proj_model.state_dict():
# Check if the shapes are mismatched
if state_dict["image_proj"]["latents"].shape != self.image_proj_model.state_dict()["latents"].shape:
print(f"Shapes of 'image_proj.latents' in checkpoint {ckpt_path} and current model do not match.")
print("Removing 'latents' from checkpoint and loading the rest of the weights.")
del state_dict["image_proj"]["latents"]
strict_load_image_proj_model = False
# Load state dict for image_proj_model and adapter_modules
self.image_proj_model.load_state_dict(state_dict["image_proj"], strict=strict_load_image_proj_model)
self.adapter_modules.load_state_dict(state_dict["ip_adapter"], strict=True)
# Calculate new checksums
new_ip_proj_sum = torch.sum(torch.stack([torch.sum(p) for p in self.image_proj_model.parameters()]))
new_adapter_sum = torch.sum(torch.stack([torch.sum(p) for p in self.adapter_modules.parameters()]))
# Verify if the weights have changed
assert orig_ip_proj_sum != new_ip_proj_sum, "Weights of image_proj_model did not change!"
assert orig_adapter_sum != new_adapter_sum, "Weights of adapter_modules did not change!"
print(f"Successfully loaded weights from checkpoint {ckpt_path}")
def parse_args():
parser = argparse.ArgumentParser(description="Simple example of a training script.")
parser.add_argument(
"--pretrained_model_name_or_path",
type=str,
default=None,
required=True,
help="Path to pretrained model or model identifier from huggingface.co/models.",
)
parser.add_argument(
"--pretrained_ip_adapter_path",
type=str,
default=None,
help="Path to pretrained ip adapter model. If not specified weights are initialized randomly.",
)
parser.add_argument(
"--num_tokens",
type=int,
default=16,
help="Number of tokens to query from the CLIP image encoding.",
)
parser.add_argument(
"--data_json_file",
type=str,
default=None,
required=True,
help="Training data",
)
parser.add_argument(
"--data_root_path",
type=str,
default="",
required=True,
help="Training data root path",
)
parser.add_argument(
"--image_encoder_path",
type=str,
default=None,
required=True,
help="Path to CLIP image encoder",
)
parser.add_argument(
"--output_dir",
type=str,
default="sd-ip_adapter",
help="The output directory where the model predictions and checkpoints will be written.",
)
parser.add_argument(
"--logging_dir",
type=str,
default="logs",
help=(
"[TensorBoard](https://www.tensorflow.org/tensorboard) log directory. Will default to"
" *output_dir/runs/**CURRENT_DATETIME_HOSTNAME***."
),
)
parser.add_argument(
"--resolution",
type=int,
default=512,
help=("The resolution for input images"),
)
parser.add_argument(
"--learning_rate",
type=float,
default=1e-4,
help="Learning rate to use.",
)
parser.add_argument("--weight_decay", type=float, default=1e-2, help="Weight decay to use.")
parser.add_argument("--num_train_epochs", type=int, default=100)
parser.add_argument(
"--train_batch_size", type=int, default=8, help="Batch size (per device) for the training dataloader."
)
parser.add_argument(
"--dataloader_num_workers",
type=int,
default=0,
help=(
"Number of subprocesses to use for data loading. 0 means that the data will be loaded in the main process."
),
)
parser.add_argument(
"--save_steps",
type=int,
default=2000,
help=("Save a checkpoint of the training state every X updates"),
)
parser.add_argument(
"--mixed_precision",
type=str,
default=None,
choices=["no", "fp16", "bf16"],
help=(
"Whether to use mixed precision. Choose between fp16 and bf16 (bfloat16). Bf16 requires PyTorch >="
" 1.10.and an Nvidia Ampere GPU. Default to the value of accelerate config of the current system or the"
" flag passed with the `accelerate.launch` command. Use this argument to override the accelerate config."
),
)
parser.add_argument(
"--report_to",
type=str,
default="tensorboard",
help=(
'The integration to report the results and logs to. Supported platforms are `"tensorboard"`'
' (default), `"wandb"` and `"comet_ml"`. Use `"all"` to report to all integrations.'
),
)
parser.add_argument("--local_rank", type=int, default=-1, help="For distributed training: local_rank")
args = parser.parse_args()
env_local_rank = int(os.environ.get("LOCAL_RANK", -1))
if env_local_rank != -1 and env_local_rank != args.local_rank:
args.local_rank = env_local_rank
return args
def main():
args = parse_args()
logging_dir = Path(args.output_dir, args.logging_dir)
accelerator_project_config = ProjectConfiguration(project_dir=args.output_dir, logging_dir=logging_dir)
accelerator = Accelerator(
mixed_precision=args.mixed_precision,
log_with=args.report_to,
project_config=accelerator_project_config,
)
if accelerator.is_main_process:
if args.output_dir is not None:
os.makedirs(args.output_dir, exist_ok=True)
# Load scheduler, tokenizer and models.
noise_scheduler = DDPMScheduler.from_pretrained(args.pretrained_model_name_or_path, subfolder="scheduler")
tokenizer = CLIPTokenizer.from_pretrained(args.pretrained_model_name_or_path, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(args.pretrained_model_name_or_path, subfolder="text_encoder")
vae = AutoencoderKL.from_pretrained(args.pretrained_model_name_or_path, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(args.pretrained_model_name_or_path, subfolder="unet")
image_encoder = CLIPVisionModelWithProjection.from_pretrained(args.image_encoder_path)
# freeze parameters of models to save more memory
unet.requires_grad_(False)
vae.requires_grad_(False)
text_encoder.requires_grad_(False)
image_encoder.requires_grad_(False)
# ip-adapter-plus
image_proj_model = Resampler(
dim=unet.config.cross_attention_dim,
depth=4,
dim_head=64,
heads=12,
num_queries=args.num_tokens,
embedding_dim=image_encoder.config.hidden_size,
output_dim=unet.config.cross_attention_dim,
ff_mult=4,
)
# init adapter modules
attn_procs = {}
unet_sd = unet.state_dict()
for name in unet.attn_processors.keys():
cross_attention_dim = None if name.endswith("attn1.processor") else unet.config.cross_attention_dim
if name.startswith("mid_block"):
hidden_size = unet.config.block_out_channels[-1]
elif name.startswith("up_blocks"):
block_id = int(name[len("up_blocks.")])
hidden_size = list(reversed(unet.config.block_out_channels))[block_id]
elif name.startswith("down_blocks"):
block_id = int(name[len("down_blocks.")])
hidden_size = unet.config.block_out_channels[block_id]
if cross_attention_dim is None:
attn_procs[name] = AttnProcessor()
else:
layer_name = name.split(".processor")[0]
weights = {
"to_k_ip.weight": unet_sd[layer_name + ".to_k.weight"],
"to_v_ip.weight": unet_sd[layer_name + ".to_v.weight"],
}
attn_procs[name] = IPAttnProcessor(
hidden_size=hidden_size, cross_attention_dim=cross_attention_dim, num_tokens=args.num_tokens
)
attn_procs[name].load_state_dict(weights)
unet.set_attn_processor(attn_procs)
adapter_modules = torch.nn.ModuleList(unet.attn_processors.values())
ip_adapter = IPAdapter(unet, image_proj_model, adapter_modules, args.pretrained_ip_adapter_path)
weight_dtype = torch.float32
if accelerator.mixed_precision == "fp16":
weight_dtype = torch.float16
elif accelerator.mixed_precision == "bf16":
weight_dtype = torch.bfloat16
# unet.to(accelerator.device, dtype=weight_dtype)
vae.to(accelerator.device, dtype=weight_dtype)
text_encoder.to(accelerator.device, dtype=weight_dtype)
image_encoder.to(accelerator.device, dtype=weight_dtype)
# optimizer
params_to_opt = itertools.chain(ip_adapter.image_proj_model.parameters(), ip_adapter.adapter_modules.parameters())
optimizer = torch.optim.AdamW(params_to_opt, lr=args.learning_rate, weight_decay=args.weight_decay)
# dataloader
train_dataset = MyDataset(
args.data_json_file, tokenizer=tokenizer, size=args.resolution, image_root_path=args.data_root_path
)
train_dataloader = torch.utils.data.DataLoader(
train_dataset,
shuffle=True,
collate_fn=collate_fn,
batch_size=args.train_batch_size,
num_workers=args.dataloader_num_workers,
)
# Prepare everything with our `accelerator`.
ip_adapter, optimizer, train_dataloader = accelerator.prepare(ip_adapter, optimizer, train_dataloader)
global_step = 0
for epoch in range(0, args.num_train_epochs):
begin = time.perf_counter()
for step, batch in enumerate(train_dataloader):
load_data_time = time.perf_counter() - begin
with accelerator.accumulate(ip_adapter):
# Convert images to latent space
with torch.no_grad():
latents = vae.encode(
batch["images"].to(accelerator.device, dtype=weight_dtype)
).latent_dist.sample()
latents = latents * vae.config.scaling_factor
# Sample noise that we'll add to the latents
noise = torch.randn_like(latents)
bsz = latents.shape[0]
# Sample a random timestep for each image
timesteps = torch.randint(0, noise_scheduler.num_train_timesteps, (bsz,), device=latents.device)
timesteps = timesteps.long()
# Add noise to the latents according to the noise magnitude at each timestep
# (this is the forward diffusion process)
noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
clip_images = []
for clip_image, drop_image_embed in zip(batch["clip_images"], batch["drop_image_embeds"]):
if drop_image_embed == 1:
clip_images.append(torch.zeros_like(clip_image))
else:
clip_images.append(clip_image)
clip_images = torch.stack(clip_images, dim=0)
with torch.no_grad():
image_embeds = image_encoder(
clip_images.to(accelerator.device, dtype=weight_dtype), output_hidden_states=True
).hidden_states[-2]
with torch.no_grad():
encoder_hidden_states = text_encoder(batch["text_input_ids"].to(accelerator.device))[0]
noise_pred = ip_adapter(noisy_latents, timesteps, encoder_hidden_states, image_embeds)
loss = F.mse_loss(noise_pred.float(), noise.float(), reduction="mean")
# Gather the losses across all processes for logging (if we use distributed training).
avg_loss = accelerator.gather(loss.repeat(args.train_batch_size)).mean().item()
# Backpropagate
accelerator.backward(loss)
optimizer.step()
optimizer.zero_grad()
if accelerator.is_main_process:
print(
"Epoch {}, step {}, data_time: {}, time: {}, step_loss: {}".format(
epoch, step, load_data_time, time.perf_counter() - begin, avg_loss
)
)
global_step += 1
if global_step % args.save_steps == 0:
save_path = os.path.join(args.output_dir, f"checkpoint-{global_step}")
accelerator.save_state(save_path)
begin = time.perf_counter()
if __name__ == "__main__":
main()

View File

@@ -1,520 +0,0 @@
import argparse
import itertools
import json
import os
import random
import time
from pathlib import Path
import numpy as np
import torch
import torch.nn.functional as F
from accelerate import Accelerator
from accelerate.utils import ProjectConfiguration
from ip_adapter.ip_adapter import ImageProjModel
from ip_adapter.utils import is_torch2_available
from PIL import Image
from torchvision import transforms
from transformers import (
CLIPImageProcessor,
CLIPTextModel,
CLIPTextModelWithProjection,
CLIPTokenizer,
CLIPVisionModelWithProjection,
)
from diffusers import AutoencoderKL, DDPMScheduler, UNet2DConditionModel
if is_torch2_available():
from ip_adapter.attention_processor import AttnProcessor2_0 as AttnProcessor
from ip_adapter.attention_processor import IPAttnProcessor2_0 as IPAttnProcessor
else:
from ip_adapter.attention_processor import AttnProcessor, IPAttnProcessor
# Dataset
class MyDataset(torch.utils.data.Dataset):
def __init__(
self,
json_file,
tokenizer,
tokenizer_2,
size=1024,
center_crop=True,
t_drop_rate=0.05,
i_drop_rate=0.05,
ti_drop_rate=0.05,
image_root_path="",
):
super().__init__()
self.tokenizer = tokenizer
self.tokenizer_2 = tokenizer_2
self.size = size
self.center_crop = center_crop
self.i_drop_rate = i_drop_rate
self.t_drop_rate = t_drop_rate
self.ti_drop_rate = ti_drop_rate
self.image_root_path = image_root_path
self.data = json.load(open(json_file)) # list of dict: [{"image_file": "1.png", "text": "A dog"}]
self.transform = transforms.Compose(
[
transforms.Resize(self.size, interpolation=transforms.InterpolationMode.BILINEAR),
transforms.ToTensor(),
transforms.Normalize([0.5], [0.5]),
]
)
self.clip_image_processor = CLIPImageProcessor()
def __getitem__(self, idx):
item = self.data[idx]
text = item["text"]
image_file = item["image_file"]
# read image
raw_image = Image.open(os.path.join(self.image_root_path, image_file))
# original size
original_width, original_height = raw_image.size
original_size = torch.tensor([original_height, original_width])
image_tensor = self.transform(raw_image.convert("RGB"))
# random crop
delta_h = image_tensor.shape[1] - self.size
delta_w = image_tensor.shape[2] - self.size
assert not all([delta_h, delta_w])
if self.center_crop:
top = delta_h // 2
left = delta_w // 2
else:
top = np.random.randint(0, delta_h + 1)
left = np.random.randint(0, delta_w + 1)
image = transforms.functional.crop(image_tensor, top=top, left=left, height=self.size, width=self.size)
crop_coords_top_left = torch.tensor([top, left])
clip_image = self.clip_image_processor(images=raw_image, return_tensors="pt").pixel_values
# drop
drop_image_embed = 0
rand_num = random.random()
if rand_num < self.i_drop_rate:
drop_image_embed = 1
elif rand_num < (self.i_drop_rate + self.t_drop_rate):
text = ""
elif rand_num < (self.i_drop_rate + self.t_drop_rate + self.ti_drop_rate):
text = ""
drop_image_embed = 1
# get text and tokenize
text_input_ids = self.tokenizer(
text,
max_length=self.tokenizer.model_max_length,
padding="max_length",
truncation=True,
return_tensors="pt",
).input_ids
text_input_ids_2 = self.tokenizer_2(
text,
max_length=self.tokenizer_2.model_max_length,
padding="max_length",
truncation=True,
return_tensors="pt",
).input_ids
return {
"image": image,
"text_input_ids": text_input_ids,
"text_input_ids_2": text_input_ids_2,
"clip_image": clip_image,
"drop_image_embed": drop_image_embed,
"original_size": original_size,
"crop_coords_top_left": crop_coords_top_left,
"target_size": torch.tensor([self.size, self.size]),
}
def __len__(self):
return len(self.data)
def collate_fn(data):
images = torch.stack([example["image"] for example in data])
text_input_ids = torch.cat([example["text_input_ids"] for example in data], dim=0)
text_input_ids_2 = torch.cat([example["text_input_ids_2"] for example in data], dim=0)
clip_images = torch.cat([example["clip_image"] for example in data], dim=0)
drop_image_embeds = [example["drop_image_embed"] for example in data]
original_size = torch.stack([example["original_size"] for example in data])
crop_coords_top_left = torch.stack([example["crop_coords_top_left"] for example in data])
target_size = torch.stack([example["target_size"] for example in data])
return {
"images": images,
"text_input_ids": text_input_ids,
"text_input_ids_2": text_input_ids_2,
"clip_images": clip_images,
"drop_image_embeds": drop_image_embeds,
"original_size": original_size,
"crop_coords_top_left": crop_coords_top_left,
"target_size": target_size,
}
class IPAdapter(torch.nn.Module):
"""IP-Adapter"""
def __init__(self, unet, image_proj_model, adapter_modules, ckpt_path=None):
super().__init__()
self.unet = unet
self.image_proj_model = image_proj_model
self.adapter_modules = adapter_modules
if ckpt_path is not None:
self.load_from_checkpoint(ckpt_path)
def forward(self, noisy_latents, timesteps, encoder_hidden_states, unet_added_cond_kwargs, image_embeds):
ip_tokens = self.image_proj_model(image_embeds)
encoder_hidden_states = torch.cat([encoder_hidden_states, ip_tokens], dim=1)
# Predict the noise residual
noise_pred = self.unet(
noisy_latents, timesteps, encoder_hidden_states, added_cond_kwargs=unet_added_cond_kwargs
).sample
return noise_pred
def load_from_checkpoint(self, ckpt_path: str):
# Calculate original checksums
orig_ip_proj_sum = torch.sum(torch.stack([torch.sum(p) for p in self.image_proj_model.parameters()]))
orig_adapter_sum = torch.sum(torch.stack([torch.sum(p) for p in self.adapter_modules.parameters()]))
state_dict = torch.load(ckpt_path, map_location="cpu")
# Load state dict for image_proj_model and adapter_modules
self.image_proj_model.load_state_dict(state_dict["image_proj"], strict=True)
self.adapter_modules.load_state_dict(state_dict["ip_adapter"], strict=True)
# Calculate new checksums
new_ip_proj_sum = torch.sum(torch.stack([torch.sum(p) for p in self.image_proj_model.parameters()]))
new_adapter_sum = torch.sum(torch.stack([torch.sum(p) for p in self.adapter_modules.parameters()]))
# Verify if the weights have changed
assert orig_ip_proj_sum != new_ip_proj_sum, "Weights of image_proj_model did not change!"
assert orig_adapter_sum != new_adapter_sum, "Weights of adapter_modules did not change!"
print(f"Successfully loaded weights from checkpoint {ckpt_path}")
def parse_args():
parser = argparse.ArgumentParser(description="Simple example of a training script.")
parser.add_argument(
"--pretrained_model_name_or_path",
type=str,
default=None,
required=True,
help="Path to pretrained model or model identifier from huggingface.co/models.",
)
parser.add_argument(
"--pretrained_ip_adapter_path",
type=str,
default=None,
help="Path to pretrained ip adapter model. If not specified weights are initialized randomly.",
)
parser.add_argument(
"--data_json_file",
type=str,
default=None,
required=True,
help="Training data",
)
parser.add_argument(
"--data_root_path",
type=str,
default="",
required=True,
help="Training data root path",
)
parser.add_argument(
"--image_encoder_path",
type=str,
default=None,
required=True,
help="Path to CLIP image encoder",
)
parser.add_argument(
"--output_dir",
type=str,
default="sd-ip_adapter",
help="The output directory where the model predictions and checkpoints will be written.",
)
parser.add_argument(
"--logging_dir",
type=str,
default="logs",
help=(
"[TensorBoard](https://www.tensorflow.org/tensorboard) log directory. Will default to"
" *output_dir/runs/**CURRENT_DATETIME_HOSTNAME***."
),
)
parser.add_argument(
"--resolution",
type=int,
default=512,
help=("The resolution for input images"),
)
parser.add_argument(
"--learning_rate",
type=float,
default=1e-4,
help="Learning rate to use.",
)
parser.add_argument("--weight_decay", type=float, default=1e-2, help="Weight decay to use.")
parser.add_argument("--num_train_epochs", type=int, default=100)
parser.add_argument(
"--train_batch_size", type=int, default=8, help="Batch size (per device) for the training dataloader."
)
parser.add_argument("--noise_offset", type=float, default=None, help="noise offset")
parser.add_argument(
"--dataloader_num_workers",
type=int,
default=0,
help=(
"Number of subprocesses to use for data loading. 0 means that the data will be loaded in the main process."
),
)
parser.add_argument(
"--save_steps",
type=int,
default=2000,
help=("Save a checkpoint of the training state every X updates"),
)
parser.add_argument(
"--mixed_precision",
type=str,
default=None,
choices=["no", "fp16", "bf16"],
help=(
"Whether to use mixed precision. Choose between fp16 and bf16 (bfloat16). Bf16 requires PyTorch >="
" 1.10.and an Nvidia Ampere GPU. Default to the value of accelerate config of the current system or the"
" flag passed with the `accelerate.launch` command. Use this argument to override the accelerate config."
),
)
parser.add_argument(
"--report_to",
type=str,
default="tensorboard",
help=(
'The integration to report the results and logs to. Supported platforms are `"tensorboard"`'
' (default), `"wandb"` and `"comet_ml"`. Use `"all"` to report to all integrations.'
),
)
parser.add_argument("--local_rank", type=int, default=-1, help="For distributed training: local_rank")
args = parser.parse_args()
env_local_rank = int(os.environ.get("LOCAL_RANK", -1))
if env_local_rank != -1 and env_local_rank != args.local_rank:
args.local_rank = env_local_rank
return args
def main():
args = parse_args()
logging_dir = Path(args.output_dir, args.logging_dir)
accelerator_project_config = ProjectConfiguration(project_dir=args.output_dir, logging_dir=logging_dir)
accelerator = Accelerator(
mixed_precision=args.mixed_precision,
log_with=args.report_to,
project_config=accelerator_project_config,
)
if accelerator.is_main_process:
if args.output_dir is not None:
os.makedirs(args.output_dir, exist_ok=True)
# Load scheduler, tokenizer and models.
noise_scheduler = DDPMScheduler.from_pretrained(args.pretrained_model_name_or_path, subfolder="scheduler")
tokenizer = CLIPTokenizer.from_pretrained(args.pretrained_model_name_or_path, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(args.pretrained_model_name_or_path, subfolder="text_encoder")
tokenizer_2 = CLIPTokenizer.from_pretrained(args.pretrained_model_name_or_path, subfolder="tokenizer_2")
text_encoder_2 = CLIPTextModelWithProjection.from_pretrained(
args.pretrained_model_name_or_path, subfolder="text_encoder_2"
)
vae = AutoencoderKL.from_pretrained(args.pretrained_model_name_or_path, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(args.pretrained_model_name_or_path, subfolder="unet")
image_encoder = CLIPVisionModelWithProjection.from_pretrained(args.image_encoder_path)
# freeze parameters of models to save more memory
unet.requires_grad_(False)
vae.requires_grad_(False)
text_encoder.requires_grad_(False)
text_encoder_2.requires_grad_(False)
image_encoder.requires_grad_(False)
# ip-adapter
num_tokens = 4
image_proj_model = ImageProjModel(
cross_attention_dim=unet.config.cross_attention_dim,
clip_embeddings_dim=image_encoder.config.projection_dim,
clip_extra_context_tokens=num_tokens,
)
# init adapter modules
attn_procs = {}
unet_sd = unet.state_dict()
for name in unet.attn_processors.keys():
cross_attention_dim = None if name.endswith("attn1.processor") else unet.config.cross_attention_dim
if name.startswith("mid_block"):
hidden_size = unet.config.block_out_channels[-1]
elif name.startswith("up_blocks"):
block_id = int(name[len("up_blocks.")])
hidden_size = list(reversed(unet.config.block_out_channels))[block_id]
elif name.startswith("down_blocks"):
block_id = int(name[len("down_blocks.")])
hidden_size = unet.config.block_out_channels[block_id]
if cross_attention_dim is None:
attn_procs[name] = AttnProcessor()
else:
layer_name = name.split(".processor")[0]
weights = {
"to_k_ip.weight": unet_sd[layer_name + ".to_k.weight"],
"to_v_ip.weight": unet_sd[layer_name + ".to_v.weight"],
}
attn_procs[name] = IPAttnProcessor(
hidden_size=hidden_size, cross_attention_dim=cross_attention_dim, num_tokens=num_tokens
)
attn_procs[name].load_state_dict(weights)
unet.set_attn_processor(attn_procs)
adapter_modules = torch.nn.ModuleList(unet.attn_processors.values())
ip_adapter = IPAdapter(unet, image_proj_model, adapter_modules, args.pretrained_ip_adapter_path)
weight_dtype = torch.float32
if accelerator.mixed_precision == "fp16":
weight_dtype = torch.float16
elif accelerator.mixed_precision == "bf16":
weight_dtype = torch.bfloat16
# unet.to(accelerator.device, dtype=weight_dtype)
vae.to(accelerator.device) # use fp32
text_encoder.to(accelerator.device, dtype=weight_dtype)
text_encoder_2.to(accelerator.device, dtype=weight_dtype)
image_encoder.to(accelerator.device, dtype=weight_dtype)
# optimizer
params_to_opt = itertools.chain(ip_adapter.image_proj_model.parameters(), ip_adapter.adapter_modules.parameters())
optimizer = torch.optim.AdamW(params_to_opt, lr=args.learning_rate, weight_decay=args.weight_decay)
# dataloader
train_dataset = MyDataset(
args.data_json_file,
tokenizer=tokenizer,
tokenizer_2=tokenizer_2,
size=args.resolution,
image_root_path=args.data_root_path,
)
train_dataloader = torch.utils.data.DataLoader(
train_dataset,
shuffle=True,
collate_fn=collate_fn,
batch_size=args.train_batch_size,
num_workers=args.dataloader_num_workers,
)
# Prepare everything with our `accelerator`.
ip_adapter, optimizer, train_dataloader = accelerator.prepare(ip_adapter, optimizer, train_dataloader)
global_step = 0
for epoch in range(0, args.num_train_epochs):
begin = time.perf_counter()
for step, batch in enumerate(train_dataloader):
load_data_time = time.perf_counter() - begin
with accelerator.accumulate(ip_adapter):
# Convert images to latent space
with torch.no_grad():
# vae of sdxl should use fp32
latents = vae.encode(batch["images"].to(accelerator.device, dtype=vae.dtype)).latent_dist.sample()
latents = latents * vae.config.scaling_factor
latents = latents.to(accelerator.device, dtype=weight_dtype)
# Sample noise that we'll add to the latents
noise = torch.randn_like(latents)
if args.noise_offset:
# https://www.crosslabs.org//blog/diffusion-with-offset-noise
noise += args.noise_offset * torch.randn((latents.shape[0], latents.shape[1], 1, 1)).to(
accelerator.device, dtype=weight_dtype
)
bsz = latents.shape[0]
# Sample a random timestep for each image
timesteps = torch.randint(0, noise_scheduler.num_train_timesteps, (bsz,), device=latents.device)
timesteps = timesteps.long()
# Add noise to the latents according to the noise magnitude at each timestep
# (this is the forward diffusion process)
noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
with torch.no_grad():
image_embeds = image_encoder(
batch["clip_images"].to(accelerator.device, dtype=weight_dtype)
).image_embeds
image_embeds_ = []
for image_embed, drop_image_embed in zip(image_embeds, batch["drop_image_embeds"]):
if drop_image_embed == 1:
image_embeds_.append(torch.zeros_like(image_embed))
else:
image_embeds_.append(image_embed)
image_embeds = torch.stack(image_embeds_)
with torch.no_grad():
encoder_output = text_encoder(
batch["text_input_ids"].to(accelerator.device), output_hidden_states=True
)
text_embeds = encoder_output.hidden_states[-2]
encoder_output_2 = text_encoder_2(
batch["text_input_ids_2"].to(accelerator.device), output_hidden_states=True
)
pooled_text_embeds = encoder_output_2[0]
text_embeds_2 = encoder_output_2.hidden_states[-2]
text_embeds = torch.concat([text_embeds, text_embeds_2], dim=-1) # concat
# add cond
add_time_ids = [
batch["original_size"].to(accelerator.device),
batch["crop_coords_top_left"].to(accelerator.device),
batch["target_size"].to(accelerator.device),
]
add_time_ids = torch.cat(add_time_ids, dim=1).to(accelerator.device, dtype=weight_dtype)
unet_added_cond_kwargs = {"text_embeds": pooled_text_embeds, "time_ids": add_time_ids}
noise_pred = ip_adapter(noisy_latents, timesteps, text_embeds, unet_added_cond_kwargs, image_embeds)
loss = F.mse_loss(noise_pred.float(), noise.float(), reduction="mean")
# Gather the losses across all processes for logging (if we use distributed training).
avg_loss = accelerator.gather(loss.repeat(args.train_batch_size)).mean().item()
# Backpropagate
accelerator.backward(loss)
optimizer.step()
optimizer.zero_grad()
if accelerator.is_main_process:
print(
"Epoch {}, step {}, data_time: {}, time: {}, step_loss: {}".format(
epoch, step, load_data_time, time.perf_counter() - begin, avg_loss
)
)
global_step += 1
if global_step % args.save_steps == 0:
save_path = os.path.join(args.output_dir, f"checkpoint-{global_step}")
accelerator.save_state(save_path)
begin = time.perf_counter()
if __name__ == "__main__":
main()

View File

@@ -7,14 +7,13 @@ It has been tested on v4 and v5p TPU versions. Training code has been tested on
This script implements Distributed Data Parallel using GSPMD feature in XLA compiler
where we shard the input batches over the TPU devices.
As of 10-31-2024, these are some expected step times.
As of 9-11-2024, these are some expected step times.
| accelerator | global batch size | step time (seconds) |
| ----------- | ----------------- | --------- |
| v5p-512 | 16384 | 1.01 |
| v5p-256 | 8192 | 1.01 |
| v5p-128 | 4096 | 1.0 |
| v5p-64 | 2048 | 1.01 |
| v5p-128 | 1024 | 0.245 |
| v5p-256 | 2048 | 0.234 |
| v5p-512 | 4096 | 0.2498 |
## Create TPU
@@ -44,9 +43,8 @@ Install PyTorch and PyTorch/XLA nightly versions:
gcloud compute tpus tpu-vm ssh ${TPU_NAME} \
--project=${PROJECT_ID} --zone=${ZONE} --worker=all \
--command='
pip3 install --pre torch==2.6.0.dev20241031+cpu torchvision --index-url https://download.pytorch.org/whl/nightly/cpu
pip3 install "torch_xla[tpu] @ https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.6.0.dev20241031.cxx11-cp310-cp310-linux_x86_64.whl" -f https://storage.googleapis.com/libtpu-releases/index.html
pip install torch_xla[pallas] -f https://storage.googleapis.com/jax-releases/jax_nightly_releases.html -f https://storage.googleapis.com/jax-releases/jaxlib_nightly_releases.html
pip3 install --pre torch==2.5.0.dev20240905+cpu torchvision==0.20.0.dev20240905+cpu --index-url https://download.pytorch.org/whl/nightly/cpu
pip3 install "torch_xla[tpu] @ https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.5.0.dev20240905-cp310-cp310-linux_x86_64.whl" -f https://storage.googleapis.com/libtpu-releases/index.html
'
```
@@ -90,18 +88,17 @@ are fixed.
gcloud compute tpus tpu-vm ssh ${TPU_NAME} \
--project=${PROJECT_ID} --zone=${ZONE} --worker=all \
--command='
export XLA_DISABLE_FUNCTIONALIZATION=0
export XLA_DISABLE_FUNCTIONALIZATION=1
export PROFILE_DIR=/tmp/
export CACHE_DIR=/tmp/
export DATASET_NAME=lambdalabs/naruto-blip-captions
export PER_HOST_BATCH_SIZE=32 # This is known to work on TPU v4. Can set this to 64 for TPU v5p
export TRAIN_STEPS=50
export OUTPUT_DIR=/tmp/trained-model/
python diffusers/examples/research_projects/pytorch_xla/train_text_to_image_xla.py --pretrained_model_name_or_path=stabilityai/stable-diffusion-2-base --dataset_name=$DATASET_NAME --resolution=512 --center_crop --random_flip --train_batch_size=$PER_HOST_BATCH_SIZE --max_train_steps=$TRAIN_STEPS --learning_rate=1e-06 --mixed_precision=bf16 --profile_duration=80000 --output_dir=$OUTPUT_DIR --dataloader_num_workers=8 --loader_prefetch_size=4 --device_prefetch_size=4'
python diffusers/examples/research_projects/pytorch_xla/train_text_to_image_xla.py --pretrained_model_name_or_path=stabilityai/stable-diffusion-2-base --dataset_name=$DATASET_NAME --resolution=512 --center_crop --random_flip --train_batch_size=$PER_HOST_BATCH_SIZE --max_train_steps=$TRAIN_STEPS --learning_rate=1e-06 --mixed_precision=bf16 --profile_duration=80000 --output_dir=$OUTPUT_DIR --dataloader_num_workers=4 --loader_prefetch_size=4 --device_prefetch_size=4'
```
Pass `--print_loss` if you would like to see the loss printed at every step. Be aware that printing the loss at every step disrupts the optimized flow execution, thus the step time will be longer.
### Environment Envs Explained
* `XLA_DISABLE_FUNCTIONALIZATION`: To optimize the performance for AdamW optimizer.

View File

@@ -140,43 +140,33 @@ class TrainSD:
self.optimizer.step()
def start_training(self):
dataloader_exception = False
measure_start_step = args.measure_start_step
assert measure_start_step < self.args.max_train_steps
total_time = 0
for step in range(0, self.args.max_train_steps):
times = []
last_time = time.time()
step = 0
while True:
if self.global_step >= self.args.max_train_steps:
xm.mark_step()
break
if step == 4 and PROFILE_DIR is not None:
xm.wait_device_ops()
xp.trace_detached(f"localhost:{PORT}", PROFILE_DIR, duration_ms=args.profile_duration)
try:
batch = next(self.dataloader)
except Exception as e:
dataloader_exception = True
print(e)
break
if step == measure_start_step and PROFILE_DIR is not None:
xm.wait_device_ops()
xp.trace_detached(f"localhost:{PORT}", PROFILE_DIR, duration_ms=args.profile_duration)
last_time = time.time()
loss = self.step_fn(batch["pixel_values"], batch["input_ids"])
step_time = time.time() - last_time
if step >= 10:
times.append(step_time)
print(f"step: {step}, step_time: {step_time}")
if step % 5 == 0:
print(f"step: {step}, loss: {loss}")
last_time = time.time()
self.global_step += 1
def print_loss_closure(step, loss):
print(f"Step: {step}, Loss: {loss}")
if args.print_loss:
xm.add_step_closure(
print_loss_closure,
args=(
self.global_step,
loss,
),
)
xm.mark_step()
if not dataloader_exception:
xm.wait_device_ops()
total_time = time.time() - last_time
print(f"Average step time: {total_time/(self.args.max_train_steps-measure_start_step)}")
else:
print("dataloader exception happen, skip result")
return
step += 1
# print(f"Average step time: {sum(times)/len(times)}")
xm.wait_device_ops()
def step_fn(
self,
@@ -190,10 +180,7 @@ class TrainSD:
noise = torch.randn_like(latents).to(self.device, dtype=self.weight_dtype)
bsz = latents.shape[0]
timesteps = torch.randint(
0,
self.noise_scheduler.config.num_train_timesteps,
(bsz,),
device=latents.device,
0, self.noise_scheduler.config.num_train_timesteps, (bsz,), device=latents.device
)
timesteps = timesteps.long()
@@ -237,6 +224,9 @@ class TrainSD:
def parse_args():
parser = argparse.ArgumentParser(description="Simple example of a training script.")
parser.add_argument(
"--input_perturbation", type=float, default=0, help="The scale of input perturbation. Recommended 0.1."
)
parser.add_argument("--profile_duration", type=int, default=10000, help="Profile duration in ms")
parser.add_argument(
"--pretrained_model_name_or_path",
@@ -268,6 +258,12 @@ def parse_args():
" or to a folder containing files that 🤗 Datasets can understand."
),
)
parser.add_argument(
"--dataset_config_name",
type=str,
default=None,
help="The config of the Dataset, leave as None if there's only one config.",
)
parser.add_argument(
"--train_data_dir",
type=str,
@@ -287,6 +283,15 @@ def parse_args():
default="text",
help="The column of the dataset containing a caption or a list of captions.",
)
parser.add_argument(
"--max_train_samples",
type=int,
default=None,
help=(
"For debugging purposes or quicker training, truncate the number of training examples to this "
"value if set."
),
)
parser.add_argument(
"--output_dir",
type=str,
@@ -299,6 +304,7 @@ def parse_args():
default=None,
help="The directory where the downloaded models and datasets will be stored.",
)
parser.add_argument("--seed", type=int, default=None, help="A seed for reproducible training.")
parser.add_argument(
"--resolution",
type=int,
@@ -368,19 +374,12 @@ def parse_args():
default=1,
help=("Number of subprocesses to use for data loading to cpu."),
)
parser.add_argument(
"--loader_prefetch_factor",
type=int,
default=2,
help=("Number of batches loaded in advance by each worker."),
)
parser.add_argument(
"--device_prefetch_size",
type=int,
default=1,
help=("Number of subprocesses to use for data loading to tpu from cpu. "),
)
parser.add_argument("--measure_start_step", type=int, default=10, help="Step to start profiling.")
parser.add_argument("--adam_beta1", type=float, default=0.9, help="The beta1 parameter for the Adam optimizer.")
parser.add_argument("--adam_beta2", type=float, default=0.999, help="The beta2 parameter for the Adam optimizer.")
parser.add_argument("--adam_weight_decay", type=float, default=1e-2, help="Weight decay to use.")
@@ -395,8 +394,12 @@ def parse_args():
"--mixed_precision",
type=str,
default=None,
choices=["no", "bf16"],
help=("Whether to use mixed precision. Bf16 requires PyTorch >= 1.10"),
choices=["no", "fp16", "bf16"],
help=(
"Whether to use mixed precision. Choose between fp16 and bf16 (bfloat16). Bf16 requires PyTorch >="
" 1.10.and an Nvidia Ampere GPU. Default to the value of accelerate config of the current system or the"
" flag passed with the `accelerate.launch` command. Use this argument to override the accelerate config."
),
)
parser.add_argument("--push_to_hub", action="store_true", help="Whether or not to push the model to the Hub.")
parser.add_argument("--hub_token", type=str, default=None, help="The token to use to push to the Model Hub.")
@@ -406,12 +409,6 @@ def parse_args():
default=None,
help="The name of the repository to keep in sync with the local `output_dir`.",
)
parser.add_argument(
"--print_loss",
default=False,
action="store_true",
help=("Print loss at every step."),
)
args = parser.parse_args()
@@ -439,6 +436,7 @@ def load_dataset(args):
# Downloading and loading a dataset from the hub.
dataset = datasets.load_dataset(
args.dataset_name,
args.dataset_config_name,
cache_dir=args.cache_dir,
data_dir=args.train_data_dir,
)
@@ -483,7 +481,9 @@ def main(args):
_ = xp.start_server(PORT)
num_devices = xr.global_runtime_device_count()
mesh = xs.get_1d_mesh("data")
device_ids = np.arange(num_devices)
mesh_shape = (num_devices, 1)
mesh = xs.Mesh(device_ids, mesh_shape, ("x", "y"))
xs.set_global_mesh(mesh)
text_encoder = CLIPTextModel.from_pretrained(
@@ -520,7 +520,6 @@ def main(args):
from torch_xla.distributed.fsdp.utils import apply_xla_patch_to_nn_linear
unet = apply_xla_patch_to_nn_linear(unet, xs.xla_patched_nn_linear_forward)
unet.enable_xla_flash_attention(partition_spec=("data", None, None, None))
vae.requires_grad_(False)
text_encoder.requires_grad_(False)
@@ -531,12 +530,15 @@ def main(args):
# as these weights are only used for inference, keeping weights in full
# precision is not required.
weight_dtype = torch.float32
if args.mixed_precision == "bf16":
if args.mixed_precision == "fp16":
weight_dtype = torch.float16
elif args.mixed_precision == "bf16":
weight_dtype = torch.bfloat16
device = xm.xla_device()
print("device: ", device)
print("weight_dtype: ", weight_dtype)
# Move text_encode and vae to device and cast to weight_dtype
text_encoder = text_encoder.to(device, dtype=weight_dtype)
vae = vae.to(device, dtype=weight_dtype)
unet = unet.to(device, dtype=weight_dtype)
@@ -604,27 +606,24 @@ def main(args):
collate_fn=collate_fn,
num_workers=args.dataloader_num_workers,
batch_size=args.train_batch_size,
prefetch_factor=args.loader_prefetch_factor,
)
train_dataloader = pl.MpDeviceLoader(
train_dataloader,
device,
input_sharding={
"pixel_values": xs.ShardingSpec(mesh, ("data", None, None, None), minibatch=True),
"input_ids": xs.ShardingSpec(mesh, ("data", None), minibatch=True),
"pixel_values": xs.ShardingSpec(mesh, ("x", None, None, None), minibatch=True),
"input_ids": xs.ShardingSpec(mesh, ("x", None), minibatch=True),
},
loader_prefetch_size=args.loader_prefetch_size,
device_prefetch_size=args.device_prefetch_size,
)
num_hosts = xr.process_count()
num_devices_per_host = num_devices // num_hosts
if xm.is_master_ordinal():
print("***** Running training *****")
print(f"Instantaneous batch size per device = {args.train_batch_size // num_devices_per_host }")
print(f"Instantaneous batch size per device = {args.train_batch_size}")
print(
f"Total train batch size (w. parallel, distributed & accumulation) = {args.train_batch_size * num_hosts}"
f"Total train batch size (w. parallel, distributed & accumulation) = {args.train_batch_size * num_devices}"
)
print(f" Total optimization steps = {args.max_train_steps}")

View File

@@ -1,61 +0,0 @@
# Create a server
Diffusers' pipelines can be used as an inference engine for a server. It supports concurrent and multithreaded requests to generate images that may be requested by multiple users at the same time.
This guide will show you how to use the [`StableDiffusion3Pipeline`] in a server, but feel free to use any pipeline you want.
Start by navigating to the `examples/server` folder and installing all of the dependencies.
```py
pip install .
pip install -f requirements.txt
```
Launch the server with the following command.
```py
python server.py
```
The server is accessed at http://localhost:8000. You can curl this model with the following command.
```
curl -X POST -H "Content-Type: application/json" --data '{"model": "something", "prompt": "a kitten in front of a fireplace"}' http://localhost:8000/v1/images/generations
```
If you need to upgrade some dependencies, you can use either [pip-tools](https://github.com/jazzband/pip-tools) or [uv](https://github.com/astral-sh/uv). For example, upgrade the dependencies with `uv` using the following command.
```
uv pip compile requirements.in -o requirements.txt
```
The server is built with [FastAPI](https://fastapi.tiangolo.com/async/). The endpoint for `v1/images/generations` is shown below.
```py
@app.post("/v1/images/generations")
async def generate_image(image_input: TextToImageInput):
try:
loop = asyncio.get_event_loop()
scheduler = shared_pipeline.pipeline.scheduler.from_config(shared_pipeline.pipeline.scheduler.config)
pipeline = StableDiffusion3Pipeline.from_pipe(shared_pipeline.pipeline, scheduler=scheduler)
generator = torch.Generator(device="cuda")
generator.manual_seed(random.randint(0, 10000000))
output = await loop.run_in_executor(None, lambda: pipeline(image_input.prompt, generator = generator))
logger.info(f"output: {output}")
image_url = save_image(output.images[0])
return {"data": [{"url": image_url}]}
except Exception as e:
if isinstance(e, HTTPException):
raise e
elif hasattr(e, 'message'):
raise HTTPException(status_code=500, detail=e.message + traceback.format_exc())
raise HTTPException(status_code=500, detail=str(e) + traceback.format_exc())
```
The `generate_image` function is defined as asynchronous with the [async](https://fastapi.tiangolo.com/async/) keyword so that FastAPI knows that whatever is happening in this function won't necessarily return a result right away. Once it hits some point in the function that it needs to await some other [Task](https://docs.python.org/3/library/asyncio-task.html#asyncio.Task), the main thread goes back to answering other HTTP requests. This is shown in the code below with the [await](https://fastapi.tiangolo.com/async/#async-and-await) keyword.
```py
output = await loop.run_in_executor(None, lambda: pipeline(image_input.prompt, generator = generator))
```
At this point, the execution of the pipeline function is placed onto a [new thread](https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.run_in_executor), and the main thread performs other things until a result is returned from the `pipeline`.
Another important aspect of this implementation is creating a `pipeline` from `shared_pipeline`. The goal behind this is to avoid loading the underlying model more than once onto the GPU while still allowing for each new request that is running on a separate thread to have its own generator and scheduler. The scheduler, in particular, is not thread-safe, and it will cause errors like: `IndexError: index 21 is out of bounds for dimension 0 with size 21` if you try to use the same scheduler across multiple threads.

View File

@@ -1,9 +0,0 @@
torch~=2.4.0
transformers==4.46.1
sentencepiece
aiohttp
py-consul
prometheus_client >= 0.18.0
prometheus-fastapi-instrumentator >= 7.0.0
fastapi
uvicorn

View File

@@ -1,124 +0,0 @@
# This file was autogenerated by uv via the following command:
# uv pip compile requirements.in -o requirements.txt
aiohappyeyeballs==2.4.3
# via aiohttp
aiohttp==3.10.10
# via -r requirements.in
aiosignal==1.3.1
# via aiohttp
annotated-types==0.7.0
# via pydantic
anyio==4.6.2.post1
# via starlette
attrs==24.2.0
# via aiohttp
certifi==2024.8.30
# via requests
charset-normalizer==3.4.0
# via requests
click==8.1.7
# via uvicorn
fastapi==0.115.3
# via -r requirements.in
filelock==3.16.1
# via
# huggingface-hub
# torch
# transformers
frozenlist==1.5.0
# via
# aiohttp
# aiosignal
fsspec==2024.10.0
# via
# huggingface-hub
# torch
h11==0.14.0
# via uvicorn
huggingface-hub==0.26.1
# via
# tokenizers
# transformers
idna==3.10
# via
# anyio
# requests
# yarl
jinja2==3.1.4
# via torch
markupsafe==3.0.2
# via jinja2
mpmath==1.3.0
# via sympy
multidict==6.1.0
# via
# aiohttp
# yarl
networkx==3.4.2
# via torch
numpy==2.1.2
# via transformers
packaging==24.1
# via
# huggingface-hub
# transformers
prometheus-client==0.21.0
# via
# -r requirements.in
# prometheus-fastapi-instrumentator
prometheus-fastapi-instrumentator==7.0.0
# via -r requirements.in
propcache==0.2.0
# via yarl
py-consul==1.5.3
# via -r requirements.in
pydantic==2.9.2
# via fastapi
pydantic-core==2.23.4
# via pydantic
pyyaml==6.0.2
# via
# huggingface-hub
# transformers
regex==2024.9.11
# via transformers
requests==2.32.3
# via
# huggingface-hub
# py-consul
# transformers
safetensors==0.4.5
# via transformers
sentencepiece==0.2.0
# via -r requirements.in
sniffio==1.3.1
# via anyio
starlette==0.41.0
# via
# fastapi
# prometheus-fastapi-instrumentator
sympy==1.13.3
# via torch
tokenizers==0.20.1
# via transformers
torch==2.4.1
# via -r requirements.in
tqdm==4.66.5
# via
# huggingface-hub
# transformers
transformers==4.46.1
# via -r requirements.in
typing-extensions==4.12.2
# via
# fastapi
# huggingface-hub
# pydantic
# pydantic-core
# torch
urllib3==2.2.3
# via requests
uvicorn==0.32.0
# via -r requirements.in
yarl==1.16.0
# via aiohttp

View File

@@ -1,133 +0,0 @@
import asyncio
import logging
import os
import random
import tempfile
import traceback
import uuid
import aiohttp
import torch
from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from fastapi.staticfiles import StaticFiles
from pydantic import BaseModel
from diffusers.pipelines.stable_diffusion_3 import StableDiffusion3Pipeline
logger = logging.getLogger(__name__)
class TextToImageInput(BaseModel):
model: str
prompt: str
size: str | None = None
n: int | None = None
class HttpClient:
session: aiohttp.ClientSession = None
def start(self):
self.session = aiohttp.ClientSession()
async def stop(self):
await self.session.close()
self.session = None
def __call__(self) -> aiohttp.ClientSession:
assert self.session is not None
return self.session
class TextToImagePipeline:
pipeline: StableDiffusion3Pipeline = None
device: str = None
def start(self):
if torch.cuda.is_available():
model_path = os.getenv("MODEL_PATH", "stabilityai/stable-diffusion-3.5-large")
logger.info("Loading CUDA")
self.device = "cuda"
self.pipeline = StableDiffusion3Pipeline.from_pretrained(
model_path,
torch_dtype=torch.bfloat16,
).to(device=self.device)
elif torch.backends.mps.is_available():
model_path = os.getenv("MODEL_PATH", "stabilityai/stable-diffusion-3.5-medium")
logger.info("Loading MPS for Mac M Series")
self.device = "mps"
self.pipeline = StableDiffusion3Pipeline.from_pretrained(
model_path,
torch_dtype=torch.bfloat16,
).to(device=self.device)
else:
raise Exception("No CUDA or MPS device available")
app = FastAPI()
service_url = os.getenv("SERVICE_URL", "http://localhost:8000")
image_dir = os.path.join(tempfile.gettempdir(), "images")
if not os.path.exists(image_dir):
os.makedirs(image_dir)
app.mount("/images", StaticFiles(directory=image_dir), name="images")
http_client = HttpClient()
shared_pipeline = TextToImagePipeline()
# Configure CORS settings
app.add_middleware(
CORSMiddleware,
allow_origins=["*"], # Allows all origins
allow_credentials=True,
allow_methods=["*"], # Allows all methods, e.g., GET, POST, OPTIONS, etc.
allow_headers=["*"], # Allows all headers
)
@app.on_event("startup")
def startup():
http_client.start()
shared_pipeline.start()
def save_image(image):
filename = "draw" + str(uuid.uuid4()).split("-")[0] + ".png"
image_path = os.path.join(image_dir, filename)
# write image to disk at image_path
logger.info(f"Saving image to {image_path}")
image.save(image_path)
return os.path.join(service_url, "images", filename)
@app.get("/")
@app.post("/")
@app.options("/")
async def base():
return "Welcome to Diffusers! Where you can use diffusion models to generate images"
@app.post("/v1/images/generations")
async def generate_image(image_input: TextToImageInput):
try:
loop = asyncio.get_event_loop()
scheduler = shared_pipeline.pipeline.scheduler.from_config(shared_pipeline.pipeline.scheduler.config)
pipeline = StableDiffusion3Pipeline.from_pipe(shared_pipeline.pipeline, scheduler=scheduler)
generator = torch.Generator(device=shared_pipeline.device)
generator.manual_seed(random.randint(0, 10000000))
output = await loop.run_in_executor(None, lambda: pipeline(image_input.prompt, generator=generator))
logger.info(f"output: {output}")
image_url = save_image(output.images[0])
return {"data": [{"url": image_url}]}
except Exception as e:
if isinstance(e, HTTPException):
raise e
elif hasattr(e, "message"):
raise HTTPException(status_code=500, detail=e.message + traceback.format_exc())
raise HTTPException(status_code=500, detail=str(e) + traceback.format_exc())
if __name__ == "__main__":
import uvicorn
uvicorn.run(app, host="0.0.0.0", port=8000)

View File

@@ -483,6 +483,7 @@ def parse_args(input_args=None):
# Sanity checks
if args.dataset_name is None and args.train_data_dir is None:
raise ValueError("Need either a dataset name or a training folder.")
if args.proportion_empty_prompts < 0 or args.proportion_empty_prompts > 1:
raise ValueError("`--proportion_empty_prompts` must be in the range [0, 1].")
@@ -823,7 +824,9 @@ def main(args):
if args.dataset_name is not None:
# Downloading and loading a dataset from the hub.
dataset = load_dataset(
args.dataset_name, args.dataset_config_name, cache_dir=args.cache_dir, data_dir=args.train_data_dir
args.dataset_name,
args.dataset_config_name,
cache_dir=args.cache_dir,
)
else:
data_files = {}

Some files were not shown because too many files have changed in this diff Show More