* docs: introduce cache-dit to diffusers
* docs: introduce cache-dit to diffusers
* docs: introduce cache-dit to diffusers
* docs: introduce cache-dit to diffusers
* docs: introduce cache-dit to diffusers
* docs: introduce cache-dit to diffusers
* docs: introduce cache-dit to diffusers
* misc: update examples link
* misc: update examples link
* docs: introduce cache-dit to diffusers
* docs: introduce cache-dit to diffusers
* docs: introduce cache-dit to diffusers
* docs: introduce cache-dit to diffusers
* docs: introduce cache-dit to diffusers
* Refine documentation for CacheDiT features
Updated the wording for clarity and consistency in the documentation. Adjusted sections on cache acceleration, automatic block adapter, patch functor, and hybrid cache configuration.
* Add Pruna optimization framework documentation
- Introduced a new section for Pruna in the table of contents.
- Added comprehensive documentation for Pruna, detailing its optimization techniques, installation instructions, and examples for optimizing and evaluating models
* Enhance Pruna documentation with image alt text and code block formatting
- Added alt text to images for better accessibility and context.
- Changed code block syntax from diff to python for improved clarity.
* Add installation section to Pruna documentation
- Introduced a new installation section in the Pruna documentation to guide users on how to install the framework.
- Enhanced the overall clarity and usability of the documentation for new users.
* Update pruna.md
* Update pruna.md
* Update Pruna documentation for model optimization and evaluation
- Changed section titles for consistency and clarity, from "Optimizing models" to "Optimize models" and "Evaluating and benchmarking optimized models" to "Evaluate and benchmark models".
- Enhanced descriptions to clarify the use of `diffusers` models and the evaluation process.
- Added a new example for evaluating standalone `diffusers` models.
- Updated references and links for better navigation within the documentation.
* Refactor Pruna documentation for clarity and consistency
- Removed outdated references to FLUX-juiced and streamlined the explanation of benchmarking.
- Enhanced the description of evaluating standalone `diffusers` models.
- Cleaned up code examples by removing unnecessary imports and comments for better readability.
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Enhance Pruna documentation with new examples and clarifications
- Added an image to illustrate the optimization process.
- Updated the explanation for sharing and loading optimized models on the Hugging Face Hub.
- Clarified the evaluation process for optimized models using the EvaluationAgent.
- Improved descriptions for defining metrics and evaluating standalone diffusers models.
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update
* fix
* non_blocking; handle parameters and buffers
* update
* Group offloading with cuda stream prefetching (#10516)
* cuda stream prefetch
* remove breakpoints
* update
* copy model hook implementation from pab
* update; ~very workaround based implementation but it seems to work as expected; needs cleanup and rewrite
* more workarounds to make it actually work
* cleanup
* rewrite
* update
* make sure to sync current stream before overwriting with pinned params
not doing so will lead to erroneous computations on the GPU and cause bad results
* better check
* update
* remove hook implementation to not deal with merge conflict
* re-add hook changes
* why use more memory when less memory do trick
* why still use slightly more memory when less memory do trick
* optimise
* add model tests
* add pipeline tests
* update docs
* add layernorm and groupnorm
* address review comments
* improve tests; add docs
* improve docs
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* apply suggestions from code review
* update tests
* apply suggestions from review
* enable_group_offloading -> enable_group_offload for naming consistency
* raise errors if multiple offloading strategies used; add relevant tests
* handle .to() when group offload applied
* refactor some repeated code
* remove unintentional change from merge conflict
* handle .cuda()
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update
* update
* make style
* remove dynamo disable
* add coauthor
Co-Authored-By: Dhruv Nair <dhruv.nair@gmail.com>
* update
* update
* update
* update mixin
* add some basic tests
* update
* update
* non_blocking
* improvements
* update
* norm.* -> norm
* apply suggestions from review
* add example
* update hook implementation to the latest changes from pyramid attention broadcast
* deinitialize should raise an error
* update doc page
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* update docs
* update
* refactor
* fix _always_upcast_modules for asym ae and vq_model
* fix lumina embedding forward to not depend on weight dtype
* refactor tests
* add simple lora inference tests
* _always_upcast_modules -> _precision_sensitive_module_patterns
* remove todo comments about review; revert changes to self.dtype in unets because .dtype on ModelMixin should be able to handle fp8 weight case
* check layer dtypes in lora test
* fix UNet1DModelTests::test_layerwise_upcasting_inference
* _precision_sensitive_module_patterns -> _skip_layerwise_casting_patterns based on feedback
* skip test in NCSNppModelTests
* skip tests for AutoencoderTinyTests
* skip tests for AutoencoderOobleckTests
* skip tests for UNet1DModelTests - unsupported pytorch operations
* layerwise_upcasting -> layerwise_casting
* skip tests for UNetRLModelTests; needs next pytorch release for currently unimplemented operation support
* add layerwise fp8 pipeline test
* use xfail
* Apply suggestions from code review
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
* add assertion with fp32 comparison; add tolerance to fp8-fp32 vs fp32-fp32 comparison (required for a few models' test to pass)
* add note about memory consumption on tesla CI runner for failing test
---------
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* [docs] Replace runwayml/stable-diffusion-v1-5 with Lykon/dreamshaper-8
Updated documentation as runwayml/stable-diffusion-v1-5 has been removed from Huggingface.
* Update docs/source/en/using-diffusers/inpaint.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Replace with stable-diffusion-v1-5/stable-diffusion-v1-5
* Update inpaint.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* find & replace all FloatTensors to Tensor
* apply formatting
* Update torch.FloatTensor to torch.Tensor in the remaining files
* formatting
* Fix the rest of the places where FloatTensor is used as well as in documentation
* formatting
* Update new file from FloatTensor to Tensor
* add documentation for DeepCache
* fix typo
* add wandb url for DeepCache
* fix some typos
* add item in _toctree.yml
* update formats for arguments
* Update deepcache.md
* Update docs/source/en/optimization/deepcache.md
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* add StableDiffusionXLPipeline in doc
* Separate SDPipeline and SDXLPipeline
* Add the paper link of ablation experiments for hyper-parameters
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>