Mirror of https://github.com/huggingface/diffusers.git, synced 2025-12-09 05:54:24 +08:00

Compare commits — 1 commit: 0dc0f98526
57  .github/ISSUE_TEMPLATE/bug-report.yml (vendored)

@@ -13,9 +13,8 @@ body:
  *Give your issue a fitting title. Assume that someone which very limited knowledge of diffusers can understand your issue. Add links to the source code, documentation other issues, pull requests etc...*
  - 2. If your issue is about something not working, **always** provide a reproducible code snippet. The reader should be able to reproduce your issue by **only copy-pasting your code snippet into a Python shell**.
  *The community cannot solve your issue if it cannot reproduce it. If your bug is related to training, add your training script and make everything needed to train public. Otherwise, just add a simple Python code snippet.*
- - 3. Add the **minimum** amount of code / context that is needed to understand, reproduce your issue.
+ - 3. Add the **minimum amount of code / context that is needed to understand, reproduce your issue**.
  *Make the life of maintainers easy. `diffusers` is getting many issues every day. Make sure your issue is about one bug and one bug only. Make sure you add only the context, code needed to understand your issues - nothing more. Generally, every issue is a way of documenting this library, try to make it a good documentation entry.*
- - 4. For issues related to community pipelines (i.e., the pipelines located in the `examples/community` folder), please tag the author of the pipeline in your issue thread as those pipelines are not maintained.
  - type: markdown
  attributes:
  value: |

@@ -50,57 +49,3 @@ body:
  placeholder: diffusers version, platform, python version, ...
  validations:
  required: true
- - type: textarea
- id: who-can-help
- attributes:
- label: Who can help?
- description: |
- Your issue will be replied to more quickly if you can figure out the right person to tag with @
- If you know how to use git blame, that is the easiest way, otherwise, here is a rough guide of **who to tag**.
-
- All issues are read by one of the core maintainers, so if you don't know who to tag, just leave this blank and
- a core maintainer will ping the right person.
-
- Please tag a maximum of 2 people.
-
- Questions on DiffusionPipeline (Saving, Loading, From pretrained, ...):
-
- Questions on pipelines:
- - Stable Diffusion @yiyixuxu @DN6 @patrickvonplaten @sayakpaul @patrickvonplaten
- - Stable Diffusion XL @yiyixuxu @sayakpaul @DN6 @patrickvonplaten
- - Kandinsky @yiyixuxu @patrickvonplaten
- - ControlNet @sayakpaul @yiyixuxu @DN6 @patrickvonplaten
- - T2I Adapter @sayakpaul @yiyixuxu @DN6 @patrickvonplaten
- - IF @DN6 @patrickvonplaten
- - Text-to-Video / Video-to-Video @DN6 @sayakpaul @patrickvonplaten
- - Wuerstchen @DN6 @patrickvonplaten
- - Other: @yiyixuxu @DN6
-
- Questions on models:
- - UNet @DN6 @yiyixuxu @sayakpaul @patrickvonplaten
- - VAE @sayakpaul @DN6 @yiyixuxu @patrickvonplaten
- - Transformers/Attention @DN6 @yiyixuxu @sayakpaul @DN6 @patrickvonplaten
-
- Questions on Schedulers: @yiyixuxu @patrickvonplaten
-
- Questions on LoRA: @sayakpaul @patrickvonplaten
-
- Questions on Textual Inversion: @sayakpaul @patrickvonplaten
-
- Questions on Training:
- - DreamBooth @sayakpaul @patrickvonplaten
- - Text-to-Image Fine-tuning @sayakpaul @patrickvonplaten
- - Textual Inversion @sayakpaul @patrickvonplaten
- - ControlNet @sayakpaul @patrickvonplaten
-
- Questions on Tests: @DN6 @sayakpaul @yiyixuxu
-
- Questions on Documentation: @stevhliu
-
- Questions on JAX- and MPS-related things: @pcuenca
-
- Questions on audio pipelines: @DN6 @patrickvonplaten
-
-
-
- placeholder: "@Username ..."
60  .github/PULL_REQUEST_TEMPLATE.md (vendored)

@@ -1,60 +0,0 @@
- # What does this PR do?
-
- <!--
- Congratulations! You've made it this far! You're not quite done yet though.
-
- Once merged, your PR is going to appear in the release notes with the title you set, so make sure it's a great title that fully reflects the extent of your awesome contribution.
-
- Then, please replace this with a description of the change and which issue is fixed (if applicable). Please also include relevant motivation and context. List any dependencies (if any) that are required for this change.
-
- Once you're done, someone will review your PR shortly (see the section "Who can review?" below to tag some potential reviewers). They may suggest changes to make the code even better. If no one reviewed your PR after a week has passed, don't hesitate to post a new comment @-mentioning the same persons---sometimes notifications get lost.
- -->
-
- <!-- Remove if not applicable -->
-
- Fixes # (issue)
-
-
- ## Before submitting
- - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- - [ ] Did you read the [contributor guideline](https://github.com/huggingface/diffusers/blob/main/CONTRIBUTING.md)?
- - [ ] Did you read our [philosophy doc](https://github.com/huggingface/diffusers/blob/main/PHILOSOPHY.md) (important for complex PRs)?
- - [ ] Was this discussed/approved via a Github issue or the [forum](https://discuss.huggingface.co/)? Please add a link to it if that's the case.
- - [ ] Did you make sure to update the documentation with your changes? Here are the
- [documentation guidelines](https://github.com/huggingface/diffusers/tree/main/docs), and
- [here are tips on formatting docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- - [ ] Did you write any new necessary tests?
-
-
- ## Who can review?
-
- Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
- members/contributors who may be interested in your PR.
-
- <!-- Your PR will be replied to more quickly if you can figure out the right person to tag with @
-
- If you know how to use git blame, that is the easiest way, otherwise, here is a rough guide of **who to tag**.
- Please tag fewer than 3 people.
-
- Core library:
-
- - Schedulers: @williamberman and @patrickvonplaten
- - Pipelines: @patrickvonplaten and @sayakpaul
- - Training examples: @sayakpaul and @patrickvonplaten
- - Docs: @stevhliu and @yiyixuxu
- - JAX and MPS: @pcuenca
- - Audio: @sanchit-gandhi
- - General functionalities: @patrickvonplaten and @sayakpaul
-
- Integrations:
-
- - deepspeed: HF Trainer/Accelerate: @pacman100
-
- HF projects:
-
- - accelerate: [different repo](https://github.com/huggingface/accelerate)
- - datasets: [different repo](https://github.com/huggingface/datasets)
- - transformers: [different repo](https://github.com/huggingface/transformers)
- - safetensors: [different repo](https://github.com/huggingface/safetensors)
-
- -->
4  .github/actions/setup-miniconda/action.yml (vendored)

@@ -27,7 +27,7 @@ runs:
  - name: Get date
  id: get-date
  shell: bash
- run: echo "today=$(/bin/date -u '+%Y%m%d')d" >> $GITHUB_OUTPUT
+ run: echo "::set-output name=today::$(/bin/date -u '+%Y%m%d')d"
  - name: Setup miniconda cache
  id: miniconda-cache
  uses: actions/cache@v2

@@ -143,4 +143,4 @@ runs:
  echo "There is ${AVAIL}KB free space left in $MOUNT, continue"
  fi
  fi
  done
2  .github/workflows/build_docker_images.yml (vendored)

@@ -26,8 +26,6 @@ jobs:
  image-name:
  - diffusers-pytorch-cpu
  - diffusers-pytorch-cuda
- - diffusers-pytorch-compile-cuda
- - diffusers-pytorch-xformers-cuda
  - diffusers-flax-cpu
  - diffusers-flax-tpu
  - diffusers-onnxruntime-cpu
8  .github/workflows/build_documentation.yml (vendored)

@@ -6,18 +6,14 @@ on:
  - main
  - doc-builder*
  - v*-release
- - v*-patch
-
  jobs:
  build:
  uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main
  with:
  commit_sha: ${{ github.sha }}
- install_libgl1: true
  package: diffusers
  notebook_folder: diffusers_doc
- languages: en ko zh ja pt
+ languages: en ko

  secrets:
  token: ${{ secrets.HUGGINGFACE_PUSH }}
- hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
3  .github/workflows/build_pr_documentation.yml (vendored)

@@ -13,6 +13,5 @@ jobs:
  with:
  commit_sha: ${{ github.event.pull_request.head.sha }}
  pr_number: ${{ github.event.number }}
- install_libgl1: true
  package: diffusers
- languages: en ko zh ja pt
+ languages: en ko
13  .github/workflows/delete_doc_comment.yml (vendored)

@@ -1,14 +1,13 @@
- name: Delete doc comment
+ name: Delete dev documentation

  on:
- workflow_run:
+ pull_request:
- workflows: ["Delete doc comment trigger"]
+ types: [ closed ]
- types:
- - completed


  jobs:
  delete:
  uses: huggingface/doc-builder/.github/workflows/delete_doc_comment.yml@main
- secrets:
+ with:
- comment_bot_token: ${{ secrets.COMMENT_BOT_TOKEN }}
+ pr_number: ${{ github.event.number }}
+ package: diffusers
12  .github/workflows/delete_doc_comment_trigger.yml (vendored)

@@ -1,12 +0,0 @@
- name: Delete doc comment trigger
-
- on:
- pull_request:
- types: [ closed ]
-
-
- jobs:
- delete:
- uses: huggingface/doc-builder/.github/workflows/delete_doc_comment_trigger.yml@main
- with:
- pr_number: ${{ github.event.number }}
32  .github/workflows/pr_dependency_test.yml (vendored)

@@ -1,32 +0,0 @@
- name: Run dependency tests
-
- on:
- pull_request:
- branches:
- - main
- push:
- branches:
- - main
-
- concurrency:
- group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
- cancel-in-progress: true
-
- jobs:
- check_dependencies:
- runs-on: ubuntu-latest
- steps:
- - uses: actions/checkout@v3
- - name: Set up Python
- uses: actions/setup-python@v4
- with:
- python-version: "3.8"
- - name: Install dependencies
- run: |
- python -m pip install --upgrade pip
- pip install -e .
- pip install pytest
- - name: Check for soft dependencies
- run: |
- pytest tests/others/test_dependencies.py
-
4  .github/workflows/pr_quality.yml (vendored)

@@ -20,7 +20,7 @@ jobs:
  - name: Set up Python
  uses: actions/setup-python@v4
  with:
- python-version: "3.8"
+ python-version: "3.7"
  - name: Install dependencies
  run: |
  python -m pip install --upgrade pip

@@ -38,7 +38,7 @@ jobs:
  - name: Set up Python
  uses: actions/setup-python@v4
  with:
- python-version: "3.8"
+ python-version: "3.7"
  - name: Install dependencies
  run: |
  python -m pip install --upgrade pip
67  .github/workflows/pr_test_peft_backend.yml (vendored)

@@ -1,67 +0,0 @@
- name: Fast tests for PRs - PEFT backend
-
- on:
- pull_request:
- branches:
- - main
-
- concurrency:
- group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
- cancel-in-progress: true
-
- env:
- DIFFUSERS_IS_CI: yes
- OMP_NUM_THREADS: 4
- MKL_NUM_THREADS: 4
- PYTEST_TIMEOUT: 60
-
- jobs:
- run_fast_tests:
- strategy:
- fail-fast: false
- matrix:
- config:
- - name: LoRA
- framework: lora
- runner: docker-cpu
- image: diffusers/diffusers-pytorch-cpu
- report: torch_cpu_lora
-
-
- name: ${{ matrix.config.name }}
-
- runs-on: ${{ matrix.config.runner }}
-
- container:
- image: ${{ matrix.config.image }}
- options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
-
- defaults:
- run:
- shell: bash
-
- steps:
- - name: Checkout diffusers
- uses: actions/checkout@v3
- with:
- fetch-depth: 2
-
- - name: Install dependencies
- run: |
- apt-get update && apt-get install libsndfile1-dev libgl1 -y
- python -m pip install -e .[quality,test]
- python -m pip install git+https://github.com/huggingface/accelerate.git
- python -m pip install -U git+https://github.com/huggingface/transformers.git
- python -m pip install -U git+https://github.com/huggingface/peft.git
-
- - name: Environment
- run: |
- python utils/print_env.py
-
- - name: Run fast PyTorch LoRA CPU tests with PEFT backend
- if: ${{ matrix.config.framework == 'lora' }}
- run: |
- python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
- -s -v \
- --make-reports=tests_${{ matrix.config.report }} \
- tests/lora/test_lora_layers_peft.py
100  .github/workflows/pr_tests.yml (vendored)

@@ -4,9 +4,6 @@ on:
  pull_request:
  branches:
  - main
- push:
- branches:
- - ci-*

  concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}

@@ -34,16 +31,16 @@ jobs:
  runner: docker-cpu
  image: diffusers/diffusers-pytorch-cpu
  report: torch_cpu_models_schedulers
- - name: LoRA
- framework: lora
- runner: docker-cpu
- image: diffusers/diffusers-pytorch-cpu
- report: torch_cpu_lora
  - name: Fast Flax CPU tests
  framework: flax
  runner: docker-cpu
  image: diffusers/diffusers-flax-cpu
  report: flax_cpu
+ - name: Fast ONNXRuntime CPU tests
+ framework: onnxruntime
+ runner: docker-cpu
+ image: diffusers/diffusers-onnxruntime-cpu
+ report: onnx_cpu
  - name: PyTorch Example CPU tests
  framework: pytorch_examples
  runner: docker-cpu

@@ -70,9 +67,10 @@ jobs:

  - name: Install dependencies
  run: |
- apt-get update && apt-get install libsndfile1-dev libgl1 -y
+ apt-get update && apt-get install libsndfile1-dev -y
  python -m pip install -e .[quality,test]
- python -m pip install git+https://github.com/huggingface/accelerate.git
+ python -m pip install -U git+https://github.com/huggingface/transformers
+ python -m pip install git+https://github.com/huggingface/accelerate

  - name: Environment
  run: |

@@ -90,18 +88,10 @@ jobs:
  if: ${{ matrix.config.framework == 'pytorch_models' }}
  run: |
  python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
- -s -v -k "not Flax and not Onnx and not Dependency" \
+ -s -v -k "not Flax and not Onnx" \
  --make-reports=tests_${{ matrix.config.report }} \
  tests/models tests/schedulers tests/others

- - name: Run fast PyTorch LoRA CPU tests
- if: ${{ matrix.config.framework == 'lora' }}
- run: |
- python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
- -s -v -k "not Flax and not Onnx and not Dependency" \
- --make-reports=tests_${{ matrix.config.report }} \
- tests/lora
-
  - name: Run fast Flax TPU tests
  if: ${{ matrix.config.framework == 'flax' }}
  run: |

@@ -110,6 +100,14 @@ jobs:
  --make-reports=tests_${{ matrix.config.report }} \
  tests

+ - name: Run fast ONNXRuntime CPU tests
+ if: ${{ matrix.config.framework == 'onnxruntime' }}
+ run: |
+ python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
+ -s -v -k "Onnx" \
+ --make-reports=tests_${{ matrix.config.report }} \
+ tests/
+
  - name: Run example PyTorch CPU tests
  if: ${{ matrix.config.framework == 'pytorch_examples' }}
  run: |

@@ -128,28 +126,9 @@ jobs:
  name: pr_${{ matrix.config.report }}_test_reports
  path: reports

- run_staging_tests:
+ run_fast_tests_apple_m1:
- strategy:
+ name: Fast PyTorch MPS tests on MacOS
- fail-fast: false
+ runs-on: [ self-hosted, apple-m1 ]
- matrix:
- config:
- - name: Hub tests for models, schedulers, and pipelines
- framework: hub_tests_pytorch
- runner: docker-cpu
- image: diffusers/diffusers-pytorch-cpu
- report: torch_hub
-
- name: ${{ matrix.config.name }}
-
- runs-on: ${{ matrix.config.runner }}
-
- container:
- image: ${{ matrix.config.image }}
- options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
-
- defaults:
- run:
- shell: bash

  steps:
  - name: Checkout diffusers

@@ -157,30 +136,45 @@ jobs:
  with:
  fetch-depth: 2

- - name: Install dependencies
+ - name: Clean checkout
+ shell: arch -arch arm64 bash {0}
  run: |
- apt-get update && apt-get install libsndfile1-dev libgl1 -y
+ git clean -fxd
- python -m pip install -e .[quality,test]
+ - name: Setup miniconda
+ uses: ./.github/actions/setup-miniconda
+ with:
+ python-version: 3.9
+
+ - name: Install dependencies
+ shell: arch -arch arm64 bash {0}
+ run: |
+ ${CONDA_RUN} python -m pip install --upgrade pip
+ ${CONDA_RUN} python -m pip install -e .[quality,test]
+ ${CONDA_RUN} python -m pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu
+ ${CONDA_RUN} python -m pip install git+https://github.com/huggingface/accelerate
+ ${CONDA_RUN} python -m pip install -U git+https://github.com/huggingface/transformers

  - name: Environment
+ shell: arch -arch arm64 bash {0}
  run: |
- python utils/print_env.py
+ ${CONDA_RUN} python utils/print_env.py

- - name: Run Hub tests for models, schedulers, and pipelines on a staging env
+ - name: Run fast PyTorch tests on M1 (MPS)
- if: ${{ matrix.config.framework == 'hub_tests_pytorch' }}
+ shell: arch -arch arm64 bash {0}
+ env:
+ HF_HOME: /System/Volumes/Data/mnt/cache
+ HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
  run: |
- HUGGINGFACE_CO_STAGING=true python -m pytest \
+ ${CONDA_RUN} python -m pytest -n 0 -s -v --make-reports=tests_torch_mps tests/
- -m "is_staging_test" \
- --make-reports=tests_${{ matrix.config.report }} \
- tests

  - name: Failure short reports
  if: ${{ failure() }}
- run: cat reports/tests_${{ matrix.config.report }}_failures_short.txt
+ run: cat reports/tests_torch_mps_failures_short.txt

  - name: Test suite reports artifacts
  if: ${{ always() }}
  uses: actions/upload-artifact@v2
  with:
- name: pr_${{ matrix.config.report }}_test_reports
+ name: pr_torch_mps_test_reports
  path: reports
361  .github/workflows/push_tests.yml (vendored)

@@ -1,11 +1,10 @@
- name: Slow Tests on main
+ name: Slow tests on main

  on:
  push:
  branches:
  - main
-

  env:
  DIFFUSERS_IS_CI: yes
  HF_HOME: /mnt/cache

@@ -13,371 +12,101 @@ env:
  MKL_NUM_THREADS: 8
  PYTEST_TIMEOUT: 600
  RUN_SLOW: yes
- PIPELINE_USAGE_CUTOFF: 50000

  jobs:
- setup_torch_cuda_pipeline_matrix:
+ run_slow_tests:
- name: Setup Torch Pipelines CUDA Slow Tests Matrix
- runs-on: docker-gpu
- container:
- image: diffusers/diffusers-pytorch-cpu # this is a CPU image, but we need it to fetch the matrix
- options: --shm-size "16gb" --ipc host
- outputs:
- pipeline_test_matrix: ${{ steps.fetch_pipeline_matrix.outputs.pipeline_test_matrix }}
- steps:
- - name: Checkout diffusers
- uses: actions/checkout@v3
- with:
- fetch-depth: 2
- - name: Install dependencies
- run: |
- apt-get update && apt-get install libsndfile1-dev libgl1 -y
- python -m pip install -e .[quality,test]
- python -m pip install git+https://github.com/huggingface/accelerate.git
-
- - name: Environment
- run: |
- python utils/print_env.py
-
- - name: Fetch Pipeline Matrix
- id: fetch_pipeline_matrix
- run: |
- matrix=$(python utils/fetch_torch_cuda_pipeline_test_matrix.py)
- echo $matrix
- echo "pipeline_test_matrix=$matrix" >> $GITHUB_OUTPUT
-
- - name: Pipeline Tests Artifacts
- if: ${{ always() }}
- uses: actions/upload-artifact@v2
- with:
- name: test-pipelines.json
- path: reports
-
- torch_pipelines_cuda_tests:
- name: Torch Pipelines CUDA Slow Tests
- needs: setup_torch_cuda_pipeline_matrix
  strategy:
  fail-fast: false
- max-parallel: 1
  matrix:
- module: ${{ fromJson(needs.setup_torch_cuda_pipeline_matrix.outputs.pipeline_test_matrix) }}
+ config:
- runs-on: docker-gpu
+ - name: Slow PyTorch CUDA tests on Ubuntu
- container:
+ framework: pytorch
- image: diffusers/diffusers-pytorch-cuda
+ runner: docker-gpu
- options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ --gpus 0
+ image: diffusers/diffusers-pytorch-cuda
- steps:
+ report: torch_cuda
- - name: Checkout diffusers
+ - name: Slow Flax TPU tests on Ubuntu
- uses: actions/checkout@v3
+ framework: flax
- with:
+ runner: docker-tpu
- fetch-depth: 2
+ image: diffusers/diffusers-flax-tpu
- - name: NVIDIA-SMI
+ report: flax_tpu
- run: |
+ - name: Slow ONNXRuntime CUDA tests on Ubuntu
- nvidia-smi
+ framework: onnxruntime
- - name: Install dependencies
+ runner: docker-gpu
- run: |
+ image: diffusers/diffusers-onnxruntime-cuda
- apt-get update && apt-get install libsndfile1-dev libgl1 -y
+ report: onnx_cuda
- python -m pip install -e .[quality,test]
- python -m pip install git+https://github.com/huggingface/accelerate.git
- - name: Environment
- run: |
- python utils/print_env.py
- - name: Slow PyTorch CUDA checkpoint tests on Ubuntu
- env:
- HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
- # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
- CUBLAS_WORKSPACE_CONFIG: :16:8
- run: |
- python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
- -s -v -k "not Flax and not Onnx" \
- --make-reports=tests_pipeline_${{ matrix.module }}_cuda \
- tests/pipelines/${{ matrix.module }}
- - name: Failure short reports
- if: ${{ failure() }}
- run: |
- cat reports/tests_pipeline_${{ matrix.module }}_cuda_stats.txt
- cat reports/tests_pipeline_${{ matrix.module }}_cuda_failures_short.txt
-
- - name: Test suite reports artifacts
+ name: ${{ matrix.config.name }}
- if: ${{ always() }}
- uses: actions/upload-artifact@v2
+ runs-on: ${{ matrix.config.runner }}
- with:
- name: pipeline_${{ matrix.module }}_test_reports
- path: reports
-
- torch_cuda_tests:
- name: Torch CUDA Tests
- runs-on: docker-gpu
  container:
- image: diffusers/diffusers-pytorch-cuda
+ image: ${{ matrix.config.image }}
- options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ --gpus 0
+ options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ ${{ matrix.config.runner == 'docker-tpu' && '--privileged' || '--gpus 0'}}

  defaults:
  run:
  shell: bash
- strategy:
- matrix:
- module: [models, schedulers, lora, others]
  steps:
  - name: Checkout diffusers
  uses: actions/checkout@v3
  with:
  fetch-depth: 2

+ - name: NVIDIA-SMI
+ if : ${{ matrix.config.runner == 'docker-gpu' }}
+ run: |
+ nvidia-smi

  - name: Install dependencies
  run: |
- apt-get update && apt-get install libsndfile1-dev libgl1 -y
  python -m pip install -e .[quality,test]
- python -m pip install git+https://github.com/huggingface/accelerate.git
+ python -m pip install -U git+https://github.com/huggingface/transformers
+ python -m pip install git+https://github.com/huggingface/accelerate

  - name: Environment
  run: |
  python utils/print_env.py

  - name: Run slow PyTorch CUDA tests
+ if: ${{ matrix.config.framework == 'pytorch' }}
  env:
  HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
- # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
- CUBLAS_WORKSPACE_CONFIG: :16:8
  run: |
  python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
  -s -v -k "not Flax and not Onnx" \
- --make-reports=tests_torch_cuda \
+ --make-reports=tests_${{ matrix.config.report }} \
- tests/${{ matrix.module }}
+ tests/

- - name: Failure short reports
- if: ${{ failure() }}
- run: |
- cat reports/tests_torch_cuda_stats.txt
- cat reports/tests_torch_cuda_failures_short.txt
-
- - name: Test suite reports artifacts
- if: ${{ always() }}
- uses: actions/upload-artifact@v2
- with:
- name: torch_cuda_test_reports
- path: reports
-
- peft_cuda_tests:
- name: PEFT CUDA Tests
- runs-on: docker-gpu
- container:
- image: diffusers/diffusers-pytorch-cuda
- options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ --gpus 0
- defaults:
- run:
- shell: bash
- steps:
- - name: Checkout diffusers
- uses: actions/checkout@v3
- with:
- fetch-depth: 2
-
- - name: Install dependencies
- run: |
- apt-get update && apt-get install libsndfile1-dev libgl1 -y
- python -m pip install -e .[quality,test]
- python -m pip install git+https://github.com/huggingface/accelerate.git
- python -m pip install git+https://github.com/huggingface/peft.git
-
- - name: Environment
- run: |
- python utils/print_env.py
-
- - name: Run slow PEFT CUDA tests
- env:
- HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
- # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
- CUBLAS_WORKSPACE_CONFIG: :16:8
- run: |
- python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
- -s -v -k "not Flax and not Onnx" \
- --make-reports=tests_peft_cuda \
- tests/lora/
-
- - name: Failure short reports
- if: ${{ failure() }}
- run: |
- cat reports/tests_peft_cuda_stats.txt
- cat reports/tests_peft_cuda_failures_short.txt
-
- - name: Test suite reports artifacts
- if: ${{ always() }}
- uses: actions/upload-artifact@v2
- with:
- name: torch_peft_test_reports
- path: reports
-
- flax_tpu_tests:
- name: Flax TPU Tests
- runs-on: docker-tpu
- container:
- image: diffusers/diffusers-flax-tpu
- options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ --privileged
- defaults:
- run:
- shell: bash
- steps:
- - name: Checkout diffusers
- uses: actions/checkout@v3
- with:
- fetch-depth: 2
-
- - name: Install dependencies
- run: |
- apt-get update && apt-get install libsndfile1-dev libgl1 -y
- python -m pip install -e .[quality,test]
- python -m pip install git+https://github.com/huggingface/accelerate.git
-
- - name: Environment
- run: |
- python utils/print_env.py
-
  - name: Run slow Flax TPU tests
+ if: ${{ matrix.config.framework == 'flax' }}
  env:
  HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
  run: |
  python -m pytest -n 0 \
  -s -v -k "Flax" \
- --make-reports=tests_flax_tpu \
+ --make-reports=tests_${{ matrix.config.report }} \
  tests/

- - name: Failure short reports
- if: ${{ failure() }}
- run: |
- cat reports/tests_flax_tpu_stats.txt
- cat reports/tests_flax_tpu_failures_short.txt
-
- - name: Test suite reports artifacts
- if: ${{ always() }}
- uses: actions/upload-artifact@v2
- with:
- name: flax_tpu_test_reports
- path: reports
-
- onnx_cuda_tests:
- name: ONNX CUDA Tests
- runs-on: docker-gpu
- container:
- image: diffusers/diffusers-onnxruntime-cuda
- options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ --gpus 0
- defaults:
- run:
- shell: bash
- steps:
- - name: Checkout diffusers
- uses: actions/checkout@v3
- with:
- fetch-depth: 2
-
- - name: Install dependencies
- run: |
- apt-get update && apt-get install libsndfile1-dev libgl1 -y
- python -m pip install -e .[quality,test]
- python -m pip install git+https://github.com/huggingface/accelerate.git
-
- - name: Environment
- run: |
- python utils/print_env.py
-
  - name: Run slow ONNXRuntime CUDA tests
+ if: ${{ matrix.config.framework == 'onnxruntime' }}
  env:
  HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
  run: |
  python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
  -s -v -k "Onnx" \
- --make-reports=tests_onnx_cuda \
+ --make-reports=tests_${{ matrix.config.report }} \
  tests/

  - name: Failure short reports
  if: ${{ failure() }}
- run: |
+ run: cat reports/tests_${{ matrix.config.report }}_failures_short.txt
- cat reports/tests_onnx_cuda_stats.txt
- cat reports/tests_onnx_cuda_failures_short.txt
-
  - name: Test suite reports artifacts
  if: ${{ always() }}
  uses: actions/upload-artifact@v2
  with:
- name: onnx_cuda_test_reports
+ name: ${{ matrix.config.report }}_test_reports
- path: reports
-
- run_torch_compile_tests:
- name: PyTorch Compile CUDA tests
-
- runs-on: docker-gpu
-
- container:
- image: diffusers/diffusers-pytorch-compile-cuda
- options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
-
- steps:
- - name: Checkout diffusers
- uses: actions/checkout@v3
- with:
- fetch-depth: 2
-
- - name: NVIDIA-SMI
- run: |
- nvidia-smi
- - name: Install dependencies
- run: |
- python -m pip install -e .[quality,test,training]
- - name: Environment
- run: |
- python utils/print_env.py
- - name: Run example tests on GPU
- env:
- HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
- run: |
- python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "compile" --make-reports=tests_torch_compile_cuda tests/
- - name: Failure short reports
- if: ${{ failure() }}
- run: cat reports/tests_torch_compile_cuda_failures_short.txt
-
- - name: Test suite reports artifacts
- if: ${{ always() }}
- uses: actions/upload-artifact@v2
- with:
- name: torch_compile_test_reports
- path: reports
-
- run_xformers_tests:
- name: PyTorch xformers CUDA tests
-
- runs-on: docker-gpu
-
- container:
- image: diffusers/diffusers-pytorch-xformers-cuda
- options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
-
- steps:
- - name: Checkout diffusers
- uses: actions/checkout@v3
- with:
- fetch-depth: 2
-
- - name: NVIDIA-SMI
- run: |
- nvidia-smi
- - name: Install dependencies
- run: |
- python -m pip install -e .[quality,test,training]
- - name: Environment
- run: |
- python utils/print_env.py
- - name: Run example tests on GPU
- env:
- HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
- run: |
- python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "xformers" --make-reports=tests_torch_xformers_cuda tests/
- - name: Failure short reports
- if: ${{ failure() }}
- run: cat reports/tests_torch_xformers_cuda_failures_short.txt
-
- - name: Test suite reports artifacts
- if: ${{ always() }}
- uses: actions/upload-artifact@v2
- with:
- name: torch_xformers_test_reports
  path: reports

  run_examples_tests:

@@ -402,6 +131,8 @@ jobs:
  - name: Install dependencies
  run: |
  python -m pip install -e .[quality,test,training]
+ python -m pip install git+https://github.com/huggingface/accelerate
+ python -m pip install -U git+https://github.com/huggingface/transformers

  - name: Environment
  run: |

@@ -415,13 +146,11 @@ jobs:

  - name: Failure short reports
  if: ${{ failure() }}
- run: |
+ run: cat reports/examples_torch_cuda_failures_short.txt
- cat reports/examples_torch_cuda_stats.txt
- cat reports/examples_torch_cuda_failures_short.txt

  - name: Test suite reports artifacts
  if: ${{ always() }}
  uses: actions/upload-artifact@v2
  with:
  name: examples_test_reports
  path: reports
59  .github/workflows/push_tests_fast.yml (vendored)

@@ -1,4 +1,4 @@
- name: Fast tests on main
+ name: Slow tests on main

  on:
  push:

@@ -60,8 +60,10 @@ jobs:

  - name: Install dependencies
  run: |
- apt-get update && apt-get install libsndfile1-dev libgl1 -y
+ apt-get update && apt-get install libsndfile1-dev -y
  python -m pip install -e .[quality,test]
+ python -m pip install -U git+https://github.com/huggingface/transformers
+ python -m pip install git+https://github.com/huggingface/accelerate

  - name: Environment
  run: |

@@ -108,3 +110,56 @@ jobs:
  with:
  name: pr_${{ matrix.config.report }}_test_reports
  path: reports
+
+ run_fast_tests_apple_m1:
+ name: Fast PyTorch MPS tests on MacOS
+ runs-on: [ self-hosted, apple-m1 ]
+
+ steps:
+ - name: Checkout diffusers
+ uses: actions/checkout@v3
+ with:
+ fetch-depth: 2
+
+ - name: Clean checkout
+ shell: arch -arch arm64 bash {0}
+ run: |
+ git clean -fxd
+
+ - name: Setup miniconda
+ uses: ./.github/actions/setup-miniconda
+ with:
+ python-version: 3.9
+
+ - name: Install dependencies
+ shell: arch -arch arm64 bash {0}
+ run: |
+ ${CONDA_RUN} python -m pip install --upgrade pip
+ ${CONDA_RUN} python -m pip install -e .[quality,test]
+ ${CONDA_RUN} python -m pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu
+ ${CONDA_RUN} python -m pip install git+https://github.com/huggingface/accelerate
+ ${CONDA_RUN} python -m pip install -U git+https://github.com/huggingface/transformers
+
+ - name: Environment
+ shell: arch -arch arm64 bash {0}
+ run: |
+ ${CONDA_RUN} python utils/print_env.py
+
+ - name: Run fast PyTorch tests on M1 (MPS)
+ shell: arch -arch arm64 bash {0}
+ env:
+ HF_HOME: /System/Volumes/Data/mnt/cache
+ HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
+ run: |
+ ${CONDA_RUN} python -m pytest -n 0 -s -v --make-reports=tests_torch_mps tests/
+
+ - name: Failure short reports
+ if: ${{ failure() }}
+ run: cat reports/tests_torch_mps_failures_short.txt
+
+ - name: Test suite reports artifacts
+ if: ${{ always() }}
+ uses: actions/upload-artifact@v2
+ with:
+ name: pr_torch_mps_test_reports
+ path: reports
68  .github/workflows/push_tests_mps.yml (vendored)

@@ -1,68 +0,0 @@
- name: Fast mps tests on main
-
- on:
- push:
- branches:
- - main
-
- env:
- DIFFUSERS_IS_CI: yes
- HF_HOME: /mnt/cache
- OMP_NUM_THREADS: 8
- MKL_NUM_THREADS: 8
- PYTEST_TIMEOUT: 600
- RUN_SLOW: no
-
- jobs:
- run_fast_tests_apple_m1:
- name: Fast PyTorch MPS tests on MacOS
- runs-on: [ self-hosted, apple-m1 ]
-
- steps:
- - name: Checkout diffusers
- uses: actions/checkout@v3
- with:
- fetch-depth: 2
-
- - name: Clean checkout
- shell: arch -arch arm64 bash {0}
- run: |
- git clean -fxd
-
- - name: Setup miniconda
- uses: ./.github/actions/setup-miniconda
- with:
- python-version: 3.9
-
- - name: Install dependencies
- shell: arch -arch arm64 bash {0}
- run: |
- ${CONDA_RUN} python -m pip install --upgrade pip
- ${CONDA_RUN} python -m pip install -e .[quality,test]
- ${CONDA_RUN} python -m pip install torch torchvision torchaudio
- ${CONDA_RUN} python -m pip install git+https://github.com/huggingface/accelerate.git
- ${CONDA_RUN} python -m pip install transformers --upgrade
-
- - name: Environment
- shell: arch -arch arm64 bash {0}
- run: |
- ${CONDA_RUN} python utils/print_env.py
-
- - name: Run fast PyTorch tests on M1 (MPS)
- shell: arch -arch arm64 bash {0}
- env:
- HF_HOME: /System/Volumes/Data/mnt/cache
- HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
- run: |
- ${CONDA_RUN} python -m pytest -n 0 -s -v --make-reports=tests_torch_mps tests/
-
- - name: Failure short reports
- if: ${{ failure() }}
- run: cat reports/tests_torch_mps_failures_short.txt
-
- - name: Test suite reports artifacts
- if: ${{ always() }}
- uses: actions/upload-artifact@v2
- with:
- name: pr_torch_mps_test_reports
- path: reports
2  .github/workflows/stale.yml (vendored)

@@ -17,7 +17,7 @@ jobs:
  - name: Setup Python
  uses: actions/setup-python@v1
  with:
- python-version: 3.8
+ python-version: 3.7

  - name: Install requirements
  run: |
16  .github/workflows/upload_pr_documentation.yml (vendored)

@@ -1,16 +0,0 @@
- name: Upload PR Documentation
-
- on:
- workflow_run:
- workflows: ["Build PR Documentation"]
- types:
- - completed
-
- jobs:
- build:
- uses: huggingface/doc-builder/.github/workflows/upload_pr_documentation.yml@main
- with:
- package_name: diffusers
- secrets:
- hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
- comment_bot_token: ${{ secrets.COMMENT_BOT_TOKEN }}
@@ -40,7 +40,7 @@ In the following, we give an overview of different ways to contribute, ranked by
|
|||||||
As said before, **all contributions are valuable to the community**.
|
As said before, **all contributions are valuable to the community**.
|
||||||
In the following, we will explain each contribution a bit more in detail.
|
In the following, we will explain each contribution a bit more in detail.
|
||||||
|
|
||||||
For all contributions 4.-9. you will need to open a PR. It is explained in detail how to do so in [Opening a pull request](#how-to-open-a-pr)
|
For all contributions 4.-9. you will need to open a PR. It is explained in detail how to do so in [Opening a pull requst](#how-to-open-a-pr)
|
||||||
|
|
||||||
### 1. Asking and answering questions on the Diffusers discussion forum or on the Diffusers Discord
|
### 1. Asking and answering questions on the Diffusers discussion forum or on the Diffusers Discord
|
||||||
|
|
||||||
@@ -63,7 +63,7 @@ In the same spirit, you are of immense help to the community by answering such q
|
|||||||
|
|
||||||
**Please** keep in mind that the more effort you put into asking or answering a question, the higher
|
**Please** keep in mind that the more effort you put into asking or answering a question, the higher
|
||||||
the quality of the publicly documented knowledge. In the same way, well-posed and well-answered questions create a high-quality knowledge database accessible to everybody, while badly posed questions or answers reduce the overall quality of the public knowledge database.
|
the quality of the publicly documented knowledge. In the same way, well-posed and well-answered questions create a high-quality knowledge database accessible to everybody, while badly posed questions or answers reduce the overall quality of the public knowledge database.
|
||||||
In short, a high quality question or answer is *precise*, *concise*, *relevant*, *easy-to-understand*, *accessible*, and *well-formated/well-posed*. For more information, please have a look through the [How to write a good issue](#how-to-write-a-good-issue) section.
|
In short, a high quality question or answer is *precise*, *concise*, *relevant*, *easy-to-understand*, *accesible*, and *well-formated/well-posed*. For more information, please have a look through the [How to write a good issue](#how-to-write-a-good-issue) section.
|
||||||
|
|
||||||
**NOTE about channels**:
|
**NOTE about channels**:
|
||||||
[*The forum*](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) is much better indexed by search engines, such as Google. Posts are ranked by popularity rather than chronologically. Hence, it's easier to look up questions and answers that we posted some time ago.
|
[*The forum*](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) is much better indexed by search engines, such as Google. Posts are ranked by popularity rather than chronologically. Hence, it's easier to look up questions and answers that we posted some time ago.
|
||||||
@@ -125,14 +125,14 @@ Awesome! Tell us what problem it solved for you.
|
|||||||
|
|
||||||
You can open a feature request [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feature_request.md&title=).
|
You can open a feature request [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feature_request.md&title=).
|
||||||
|
|
||||||
#### 2.3 Feedback.
|
#### 2.3 Feedback.
|
||||||
|
|
||||||
Feedback about the library design and why it is good or not good helps the core maintainers immensely to build a user-friendly library. To understand the philosophy behind the current design philosophy, please have a look [here](https://huggingface.co/docs/diffusers/conceptual/philosophy). If you feel like a certain design choice does not fit with the current design philosophy, please explain why and how it should be changed. If a certain design choice follows the design philosophy too much, hence restricting use cases, explain why and how it should be changed.
|
Feedback about the library design and why it is good or not good helps the core maintainers immensely to build a user-friendly library. To understand the philosophy behind the current design philosophy, please have a look [here](https://huggingface.co/docs/diffusers/conceptual/philosophy). If you feel like a certain design choice does not fit with the current design philosophy, please explain why and how it should be changed. If a certain design choice follows the design philosophy too much, hence restricting use cases, explain why and how it should be changed.
|
||||||
If a certain design choice is very useful for you, please also leave a note as this is great feedback for future design decisions.
|
If a certain design choice is very useful for you, please also leave a note as this is great feedback for future design decisions.
|
||||||
|
|
||||||
You can open an issue about feedback [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feedback.md&title=).
|
You can open an issue about feedback [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feedback.md&title=).
|
||||||
|
|
||||||
#### 2.4 Technical questions.
|
#### 2.4 Technical questions.
|
||||||
|
|
||||||
Technical questions are mainly about why certain code of the library was written in a certain way, or what a certain part of the code does. Please make sure to link to the code in question and please provide detail on
|
Technical questions are mainly about why certain code of the library was written in a certain way, or what a certain part of the code does. Please make sure to link to the code in question and please provide detail on
|
||||||
why this part of the code is difficult to understand.
|
why this part of the code is difficult to understand.
|
||||||
@@ -168,7 +168,7 @@ more precise, provide the link to a duplicated issue or redirect them to [the fo
|
|||||||
If you have verified that the issued bug report is correct and requires a correction in the source code,
|
If you have verified that the issued bug report is correct and requires a correction in the source code,
|
||||||
please have a look at the next sections.
|
please have a look at the next sections.
|
||||||
|
|
||||||
For all of the following contributions, you will need to open a PR. It is explained in detail how to do so in the [Opening a pull request](#how-to-open-a-pr) section.
|
For all of the following contributions, you will need to open a PR. It is explained in detail how to do so in the [Opening a pull requst](#how-to-open-a-pr) section.
|
||||||
|
|
||||||
### 4. Fixing a "Good first issue"
|
### 4. Fixing a "Good first issue"
|
||||||
|
|
||||||
@@ -297,7 +297,7 @@ if you don't know yet what specific component you would like to add:

- [Model or pipeline](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+pipeline%2Fmodel%22)
- [Scheduler](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+scheduler%22)

Before adding any of the three components, it is strongly recommended that you give the [Philosophy guide](https://github.com/huggingface/diffusers/blob/main/PHILOSOPHY.md) a read to better understand the design of any of the three components. Please be aware that
we cannot merge model, scheduler, or pipeline additions that strongly diverge from our design philosophy
as it will lead to API inconsistencies. If you fundamentally disagree with a design choice, please
open a [Feedback issue](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feedback.md&title=) instead so that it can be discussed whether a certain design
@@ -394,8 +394,8 @@ passes. You should run the tests impacted by your changes like this:

```bash
$ pytest tests/<TEST_TO_RUN>.py
```

Before you run the tests, please make sure you install the dependencies required for testing. You can do so
with this command:

```bash
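# NOTE: the exact install command is cut off in this view; an editable install
# with the test extras is assumed here.
$ pip install -e ".[test]"
```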
Makefile
@@ -78,7 +78,7 @@ test:

# Run tests for examples

test-examples:
	python -m pytest -n auto --dist=loadfile -s -v ./examples/

# Release stuff
@@ -27,18 +27,18 @@ In a nutshell, Diffusers is built to be a natural extension of PyTorch. Therefor

## Simple over easy

As PyTorch states, **explicit is better than implicit** and **simple is better than complex**. This design philosophy is reflected in multiple parts of the library:
- We follow PyTorch's API with methods like [`DiffusionPipeline.to`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.to) to let the user handle device management.
- Raising concise error messages is preferred to silently correcting erroneous input. Diffusers aims at teaching the user, rather than making the library as easy to use as possible.
- Complex model vs. scheduler logic is exposed instead of magically handled inside. Schedulers/Samplers are separated from diffusion models with minimal dependencies on each other. This forces the user to write the unrolled denoising loop (see the sketch below). However, the separation allows for easier debugging and gives the user more control over adapting the denoising process or switching out diffusion models or schedulers.
- Separately trained components of the diffusion pipeline, *e.g.* the text encoder, the UNet, and the variational autoencoder, each have their own model class. This forces the user to handle the interaction between the different model components, and the serialization format separates the model components into different files. However, this allows for easier debugging and customization. DreamBooth or textual inversion training
is very simple thanks to Diffusers' ability to separate single components of the diffusion pipeline.
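To make that separation concrete, here is a minimal, self-contained sketch of such an unrolled denoising loop. The tiny UNet configuration below is only an illustrative assumption (randomly initialized, so it denoises noise into noise), not a recommended setup:

```python
import torch
from diffusers import DDPMScheduler, UNet2DModel

# A tiny, randomly initialized UNet and a default DDPM scheduler -- no checkpoint needed.
model = UNet2DModel(
    sample_size=32,
    in_channels=3,
    out_channels=3,
    layers_per_block=1,
    block_out_channels=(32, 64),
    down_block_types=("DownBlock2D", "AttnDownBlock2D"),
    up_block_types=("AttnUpBlock2D", "UpBlock2D"),
)
scheduler = DDPMScheduler(num_train_timesteps=1000)
scheduler.set_timesteps(50)  # how many denoising steps the loop below unrolls

sample = torch.randn(1, 3, 32, 32)

# The user writes the loop explicitly: the model predicts the noise,
# the scheduler computes the slightly less noisy sample x_{t-1} from x_t.
for t in scheduler.timesteps:
    with torch.no_grad():
        noise_pred = model(sample, t).sample
    sample = scheduler.step(noise_pred, t, sample).prev_sample
```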
## Tweakable, contributor-friendly over abstraction

For large parts of the library, Diffusers adopts an important design principle of the [Transformers library](https://github.com/huggingface/transformers), which is to prefer copy-pasted code over hasty abstractions. This design principle is very opinionated and stands in stark contrast to popular design principles such as [Don't repeat yourself (DRY)](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself).
In short, just like Transformers does for modeling files, Diffusers prefers to keep an extremely low level of abstraction and very self-contained code for pipelines and schedulers.
Functions, long code blocks, and even classes can be copied across multiple files, which at first can look like a bad, sloppy design choice that makes the library unmaintainable.
**However**, this design has proven to be extremely successful for Transformers and makes a lot of sense for community-driven, open-source machine learning libraries because:
- Machine Learning is an extremely fast-moving field in which paradigms, model architectures, and algorithms are changing rapidly, which therefore makes it very difficult to define long-lasting code abstractions.
- Machine Learning practitioners like to be able to quickly tweak existing code for ideation and research and therefore prefer self-contained code over one that contains many abstractions.
@@ -47,10 +47,10 @@ Functions, long code blocks, and even classes can be copied across multiple file

At Hugging Face, we call this design the **single-file policy**, which means that almost all of the code of a certain class should be written in a single, self-contained file. To read more about the philosophy, you can have a look
at [this blog post](https://huggingface.co/blog/transformers-design-philosophy).

In Diffusers, we follow this philosophy for both pipelines and schedulers, but only partly for diffusion models. The reason we don't follow this design fully for diffusion models is that almost all diffusion pipelines, such
as [DDPM](https://huggingface.co/docs/diffusers/v0.12.0/en/api/pipelines/ddpm), [Stable Diffusion](https://huggingface.co/docs/diffusers/v0.12.0/en/api/pipelines/stable_diffusion/overview#stable-diffusion-pipelines), [UnCLIP (Dalle-2)](https://huggingface.co/docs/diffusers/v0.12.0/en/api/pipelines/unclip#overview), and [Imagen](https://imagen.research.google/) all rely on the same diffusion model, the [UNet](https://huggingface.co/docs/diffusers/api/models#diffusers.UNet2DConditionModel).

Great, now you should have generally understood why 🧨 Diffusers is designed the way it is 🤗.
We try to apply these design principles consistently across the library. Nevertheless, there are some minor exceptions to the philosophy or some unlucky design choices. If you have feedback regarding the design, we would ❤️ to hear it [directly on GitHub](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feedback.md&title=).

## Design Philosophy in Details
@@ -70,7 +70,7 @@ The following design principles are followed:
- Pipelines should be used **only** for inference.
- Pipelines should be very readable, self-explanatory, and easy to tweak.
- Pipelines should be designed to build on top of each other and be easy to integrate into higher-level APIs.
- Pipelines are **not** intended to be feature-complete user interfaces. For feature-complete user interfaces one should rather have a look at [InvokeAI](https://github.com/invoke-ai/InvokeAI), [Diffuzers](https://github.com/abhishekkrthakur/diffuzers), and [lama-cleaner](https://github.com/Sanster/lama-cleaner).
- Every pipeline should have one and only one way to run it via a `__call__` method. The naming of the `__call__` arguments should be shared across all pipelines (see the sketch below).
- Pipelines should be named after the task they are intended to solve.
- In almost all cases, novel diffusion pipelines shall be implemented in a new pipeline folder/file.
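As a small illustration of the shared `__call__` naming, consider the following sketch; the checkpoint name is only an assumption, and any compatible text-to-image / image-to-image checkpoints would work the same way:

```python
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

# Two different pipelines, one calling convention: `prompt`, `num_inference_steps`,
# `generator`, ... are named identically across pipelines.
text2img = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
img2img = StableDiffusionImg2ImgPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

generator = torch.Generator().manual_seed(0)
image = text2img(prompt="an astronaut riding a horse", num_inference_steps=25, generator=generator).images[0]
image = img2img(prompt="the same scene in winter", image=image, num_inference_steps=25, generator=generator).images[0]
```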
@@ -89,22 +89,22 @@ The following design principles are followed:
- Models should by default have the highest precision and lowest performance setting.
- To integrate new model checkpoints whose general architecture can be classified as an architecture that already exists in Diffusers, the existing model architecture shall be adapted to make it work with the new checkpoint. One should only create a new file if the model architecture is fundamentally different.
- Models should be designed to be easily extendable to future changes. This can be achieved by limiting public function arguments, configuration arguments, and "foreseeing" future changes, *e.g.* it is usually better to add `string` "...type" arguments that can easily be extended to new future types instead of boolean `is_..._type` arguments (see the sketch below). Only the minimum amount of changes shall be made to existing architectures to make a new model checkpoint work.
- The model design is a difficult trade-off between keeping code readable and concise and supporting many model checkpoints. For most parts of the modeling code, classes shall be adapted for new model checkpoints, while there are some exceptions where it is preferred to add new classes to make sure the code is kept concise and
readable long-term, such as [UNet blocks](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unet_2d_blocks.py) and [Attention processors](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
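A tiny, hypothetical sketch of this guideline (the class and argument names below are made up purely for illustration):

```python
# Preferred: a string "...type" argument can grow new values without breaking the API.
class ToyBlock:
    def __init__(self, norm_type: str = "layer_norm"):
        if norm_type == "layer_norm":
            ...  # build a LayerNorm variant
        elif norm_type == "group_norm":  # added later for a new checkpoint
            ...  # build a GroupNorm variant
        else:
            raise ValueError(f"Unknown norm_type: {norm_type}")


# Discouraged: a boolean flag locks the configuration into exactly two cases.
class ToyBlockWithFlag:
    def __init__(self, is_group_norm: bool = False):
        ...
```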
### Schedulers

Schedulers are responsible for guiding the denoising process during inference as well as for defining a noise schedule for training. They are designed as individual classes with loadable configuration files and strongly follow the **single-file policy**.

The following design principles are followed:
- All schedulers are found in [`src/diffusers/schedulers`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers).
- Schedulers are **not** allowed to import from large utils files and shall be kept very self-contained.
- One scheduler Python file corresponds to one scheduler algorithm (as might be defined in a paper).
- If schedulers share similar functionalities, we can make use of the `# Copied from` mechanism.
- Schedulers all inherit from `SchedulerMixin` and `ConfigMixin`.
- Schedulers can be easily swapped out with the [`ConfigMixin.from_config`](https://huggingface.co/docs/diffusers/main/en/api/configuration#diffusers.ConfigMixin.from_config) method, as explained in detail [here](./using-diffusers/schedulers.md) and shown in the sketch below.
- Every scheduler has to have a `set_num_inference_steps` and a `step` function. `set_num_inference_steps(...)` has to be called before every denoising process, *i.e.* before `step(...)` is called.
- Every scheduler exposes the timesteps to be "looped over" via a `timesteps` attribute, which is an array of timesteps the model will be called upon.
- The `step(...)` function takes a predicted model output and the "current" sample (x_t) and returns the "previous", slightly more denoised sample (x_t-1).
- Given the complexity of diffusion schedulers, the `step` function does not expose all the complexity and can be a bit of a "black box".
- In almost all cases, novel schedulers shall be implemented in a new scheduling file.
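To make the last few points concrete, the sketch below swaps the scheduler of a loaded pipeline via `from_config` and shows the `timesteps`/`step` contract (in the current code base the step-count setter is called `set_timesteps`; the checkpoint name is an illustrative assumption):

```python
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Swap the scheduler: the replacement is built from the old scheduler's config.
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)

scheduler = pipeline.scheduler
scheduler.set_timesteps(25)   # must be called before the denoising loop
print(scheduler.timesteps)    # the timesteps the model will be called upon
# inside the loop, one step of denoising is:
#   sample = scheduler.step(model_output, t, sample).prev_sample
```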
README.md
@@ -1,6 +1,6 @@

<p align="center">
    <br>
    <img src="https://raw.githubusercontent.com/huggingface/diffusers/main/docs/source/en/imgs/diffusers_library.jpg" width="400"/>
    <br>
<p>
<p align="center">
@@ -10,9 +10,6 @@
    <a href="https://github.com/huggingface/diffusers/releases">
        <img alt="GitHub release" src="https://img.shields.io/github/release/huggingface/diffusers.svg">
    </a>
    <a href="https://pepy.tech/project/diffusers">
        <img alt="GitHub release" src="https://static.pepy.tech/badge/diffusers/month">
    </a>
    <a href="CODE_OF_CONDUCT.md">
        <img alt="Contributor Covenant" src="https://img.shields.io/badge/Contributor%20Covenant-2.0-4baaaa.svg">
    </a>
@@ -28,12 +25,12 @@

## Installation

We recommend installing 🤗 Diffusers in a virtual environment from PyPI or Conda. For more details about installing [PyTorch](https://pytorch.org/get-started/locally/) and [Flax](https://flax.readthedocs.io/en/latest/#installation), please refer to their official documentation.

### PyTorch

With `pip` (official package):

```bash
pip install --upgrade diffusers[torch]
```
@@ -62,9 +59,8 @@ Generating outputs is super easy with 🤗 Diffusers. To generate an image from

```python
from diffusers import DiffusionPipeline
import torch

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipeline.to("cuda")
pipeline("An image of a squirrel in Picasso style").images[0]
```
@@ -103,14 +99,58 @@ Check out the [Quickstart](https://huggingface.co/docs/diffusers/quicktour) to l

| **Documentation** | **What can I learn?** |
|---|---|
| [Tutorial](https://huggingface.co/docs/diffusers/tutorials/tutorial_overview) | A basic crash course for learning how to use the library's most important features like using models and schedulers to build your own diffusion system, and training your own diffusion model. |
| [Loading](https://huggingface.co/docs/diffusers/using-diffusers/loading_overview) | Guides for how to load and configure all the components (pipelines, models, and schedulers) of the library, as well as how to use different schedulers. |
| [Pipelines for inference](https://huggingface.co/docs/diffusers/using-diffusers/pipeline_overview) | Guides for how to use pipelines for different inference tasks, batched generation, controlling generated outputs and randomness, and how to contribute a pipeline to the library. |
| [Optimization](https://huggingface.co/docs/diffusers/optimization/opt_overview) | Guides for how to optimize your diffusion model to run faster and consume less memory. |
| [Training](https://huggingface.co/docs/diffusers/training/overview) | Guides for how to train a diffusion model for different tasks with different training techniques. |
## Supported pipelines

| Pipeline | Paper | Tasks |
|---|---|:---:|
| [alt_diffusion](./api/pipelines/alt_diffusion) | [**AltDiffusion**](https://arxiv.org/abs/2211.06679) | Image-to-Image Text-Guided Generation |
| [audio_diffusion](./api/pipelines/audio_diffusion) | [**Audio Diffusion**](https://github.com/teticio/audio-diffusion.git) | Unconditional Audio Generation |
| [controlnet](./api/pipelines/stable_diffusion/controlnet) | [**ControlNet with Stable Diffusion**](https://arxiv.org/abs/2302.05543) | Image-to-Image Text-Guided Generation |
| [cycle_diffusion](./api/pipelines/cycle_diffusion) | [**Cycle Diffusion**](https://arxiv.org/abs/2210.05559) | Image-to-Image Text-Guided Generation |
| [dance_diffusion](./api/pipelines/dance_diffusion) | [**Dance Diffusion**](https://github.com/williamberman/diffusers.git) | Unconditional Audio Generation |
| [ddpm](./api/pipelines/ddpm) | [**Denoising Diffusion Probabilistic Models**](https://arxiv.org/abs/2006.11239) | Unconditional Image Generation |
| [ddim](./api/pipelines/ddim) | [**Denoising Diffusion Implicit Models**](https://arxiv.org/abs/2010.02502) | Unconditional Image Generation |
| [latent_diffusion](./api/pipelines/latent_diffusion) | [**High-Resolution Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2112.10752) | Text-to-Image Generation |
| [latent_diffusion](./api/pipelines/latent_diffusion) | [**High-Resolution Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2112.10752) | Super Resolution Image-to-Image |
| [latent_diffusion_uncond](./api/pipelines/latent_diffusion_uncond) | [**High-Resolution Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2112.10752) | Unconditional Image Generation |
| [paint_by_example](./api/pipelines/paint_by_example) | [**Paint by Example: Exemplar-based Image Editing with Diffusion Models**](https://arxiv.org/abs/2211.13227) | Image-Guided Image Inpainting |
| [pndm](./api/pipelines/pndm) | [**Pseudo Numerical Methods for Diffusion Models on Manifolds**](https://arxiv.org/abs/2202.09778) | Unconditional Image Generation |
| [score_sde_ve](./api/pipelines/score_sde_ve) | [**Score-Based Generative Modeling through Stochastic Differential Equations**](https://openreview.net/forum?id=PxTIG12RRHS) | Unconditional Image Generation |
| [score_sde_vp](./api/pipelines/score_sde_vp) | [**Score-Based Generative Modeling through Stochastic Differential Equations**](https://openreview.net/forum?id=PxTIG12RRHS) | Unconditional Image Generation |
| [semantic_stable_diffusion](./api/pipelines/semantic_stable_diffusion) | [**Semantic Guidance**](https://arxiv.org/abs/2301.12247) | Text-Guided Generation |
| [stable_diffusion_text2img](./api/pipelines/stable_diffusion/text2img) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Text-to-Image Generation |
| [stable_diffusion_img2img](./api/pipelines/stable_diffusion/img2img) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Image-to-Image Text-Guided Generation |
| [stable_diffusion_inpaint](./api/pipelines/stable_diffusion/inpaint) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Text-Guided Image Inpainting |
| [stable_diffusion_panorama](./api/pipelines/stable_diffusion/panorama) | [**MultiDiffusion**](https://multidiffusion.github.io/) | Text-to-Panorama Generation |
| [stable_diffusion_pix2pix](./api/pipelines/stable_diffusion/pix2pix) | [**InstructPix2Pix**](https://github.com/timothybrooks/instruct-pix2pix) | Text-Guided Image Editing |
| [stable_diffusion_pix2pix_zero](./api/pipelines/stable_diffusion/pix2pix_zero) | [**Zero-shot Image-to-Image Translation**](https://pix2pixzero.github.io/) | Text-Guided Image Editing |
| [stable_diffusion_attend_and_excite](./api/pipelines/stable_diffusion/attend_and_excite) | [**Attend and Excite for Stable Diffusion**](https://attendandexcite.github.io/Attend-and-Excite/) | Text-to-Image Generation |
| [stable_diffusion_self_attention_guidance](./api/pipelines/stable_diffusion/self_attention_guidance) | [**Self-Attention Guidance**](https://ku-cvlab.github.io/Self-Attention-Guidance) | Text-to-Image Generation |
| [stable_diffusion_image_variation](./stable_diffusion/image_variation) | [**Stable Diffusion Image Variations**](https://github.com/LambdaLabsML/lambda-diffusers#stable-diffusion-image-variations) | Image-to-Image Generation |
| [stable_diffusion_latent_upscale](./stable_diffusion/latent_upscale) | [**Stable Diffusion Latent Upscaler**](https://twitter.com/StabilityAI/status/1590531958815064065) | Text-Guided Super Resolution Image-to-Image |
| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [**Stable Diffusion 2**](https://stability.ai/blog/stable-diffusion-v2-release) | Text-to-Image Generation |
| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [**Stable Diffusion 2**](https://stability.ai/blog/stable-diffusion-v2-release) | Text-Guided Image Inpainting |
| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [**Depth-Conditional Stable Diffusion**](https://github.com/Stability-AI/stablediffusion#depth-conditional-stable-diffusion) | Depth-to-Image Generation |
| [stable_diffusion_2](./api/pipelines/stable_diffusion_2) | [**Stable Diffusion 2**](https://stability.ai/blog/stable-diffusion-v2-release) | Text-Guided Super Resolution Image-to-Image |
| [stable_diffusion_safe](./api/pipelines/stable_diffusion_safe) | [**Safe Stable Diffusion**](https://arxiv.org/abs/2211.05105) | Text-Guided Generation |
| [stable_unclip](./stable_unclip) | **Stable unCLIP** | Text-to-Image Generation |
| [stable_unclip](./stable_unclip) | **Stable unCLIP** | Image-to-Image Text-Guided Generation |
| [stochastic_karras_ve](./api/pipelines/stochastic_karras_ve) | [**Elucidating the Design Space of Diffusion-Based Generative Models**](https://arxiv.org/abs/2206.00364) | Unconditional Image Generation |
| [unclip](./api/pipelines/unclip) | [Hierarchical Text-Conditional Image Generation with CLIP Latents](https://arxiv.org/abs/2204.06125) | Text-to-Image Generation |
| [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Text-to-Image Generation |
| [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Image Variations Generation |
| [versatile_diffusion](./api/pipelines/versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Dual Image and Text Guided Generation |
| [vq_diffusion](./api/pipelines/vq_diffusion) | [Vector Quantized Diffusion Model for Text-to-Image Synthesis](https://arxiv.org/abs/2111.14822) | Text-to-Image Generation |
## Contribution

We ❤️ contributions from the open-source community!
If you want to contribute to this library, please check out our [Contribution guide](https://github.com/huggingface/diffusers/blob/main/CONTRIBUTING.md).
You can look out for [issues](https://github.com/huggingface/diffusers/issues) you'd like to tackle to contribute to the library.
- See [Good first issues](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22) for general opportunities to contribute
@@ -120,92 +160,6 @@ You can look out for [issues](https://github.com/huggingface/diffusers/issues) y

Also, say 👋 in our public Discord channel <a href="https://discord.gg/G7tWnz98XR"><img alt="Join us on Discord" src="https://img.shields.io/discord/823813159592001537?color=5865F2&logo=discord&logoColor=white"></a>. We discuss the hottest trends about diffusion models, help each other with contributions, personal projects or
just hang out ☕.
## Popular Tasks & Pipelines

| Task | Pipeline | 🤗 Hub |
|---|---|---|
| Unconditional Image Generation | [DDPM](https://huggingface.co/docs/diffusers/api/pipelines/ddpm) | [google/ddpm-ema-church-256](https://huggingface.co/google/ddpm-ema-church-256) |
| Text-to-Image | [Stable Diffusion Text-to-Image](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/text2img) | [runwayml/stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5) |
| Text-to-Image | [unclip](https://huggingface.co/docs/diffusers/api/pipelines/unclip) | [kakaobrain/karlo-v1-alpha](https://huggingface.co/kakaobrain/karlo-v1-alpha) |
| Text-to-Image | [DeepFloyd IF](https://huggingface.co/docs/diffusers/api/pipelines/if) | [DeepFloyd/IF-I-XL-v1.0](https://huggingface.co/DeepFloyd/IF-I-XL-v1.0) |
| Text-to-Image | [Kandinsky](https://huggingface.co/docs/diffusers/api/pipelines/kandinsky) | [kandinsky-community/kandinsky-2-2-decoder](https://huggingface.co/kandinsky-community/kandinsky-2-2-decoder) |
| Text-guided Image-to-Image | [Controlnet](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/controlnet) | [lllyasviel/sd-controlnet-canny](https://huggingface.co/lllyasviel/sd-controlnet-canny) |
| Text-guided Image-to-Image | [Instruct Pix2Pix](https://huggingface.co/docs/diffusers/api/pipelines/pix2pix) | [timbrooks/instruct-pix2pix](https://huggingface.co/timbrooks/instruct-pix2pix) |
| Text-guided Image-to-Image | [Stable Diffusion Image-to-Image](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/img2img) | [runwayml/stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5) |
| Text-guided Image Inpainting | [Stable Diffusion Inpaint](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/inpaint) | [runwayml/stable-diffusion-inpainting](https://huggingface.co/runwayml/stable-diffusion-inpainting) |
| Image Variation | [Stable Diffusion Image Variation](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/image_variation) | [lambdalabs/sd-image-variations-diffusers](https://huggingface.co/lambdalabs/sd-image-variations-diffusers) |
| Super Resolution | [Stable Diffusion Upscale](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/upscale) | [stabilityai/stable-diffusion-x4-upscaler](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler) |
| Super Resolution | [Stable Diffusion Latent Upscale](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/latent_upscale) | [stabilityai/sd-x2-latent-upscaler](https://huggingface.co/stabilityai/sd-x2-latent-upscaler) |
## Popular libraries using 🧨 Diffusers

- https://github.com/microsoft/TaskMatrix
- https://github.com/invoke-ai/InvokeAI
- https://github.com/apple/ml-stable-diffusion
- https://github.com/Sanster/lama-cleaner
- https://github.com/IDEA-Research/Grounded-Segment-Anything
- https://github.com/ashawkey/stable-dreamfusion
- https://github.com/deep-floyd/IF
- https://github.com/bentoml/BentoML
- https://github.com/bmaltais/kohya_ss
- +3000 other amazing GitHub repositories 💪

Thank you for using us ❤️
## Credits

This library concretizes previous work by many different authors and would not have been possible without their great research and implementations. We'd like to thank, in particular, the following implementations which have helped us in our development and without which the API could not have been as polished today:
@@ -1,46 +0,0 @@
FROM nvidia/cuda:12.1.0-runtime-ubuntu20.04
LABEL maintainer="Hugging Face"
LABEL repository="diffusers"

ENV DEBIAN_FRONTEND=noninteractive

RUN apt update && \
    apt install -y bash \
    build-essential \
    git \
    git-lfs \
    curl \
    ca-certificates \
    libsndfile1-dev \
    libgl1 \
    python3.9 \
    python3.9-dev \
    python3-pip \
    python3.9-venv && \
    rm -rf /var/lib/apt/lists

# make sure to use venv
RUN python3.9 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
RUN python3.9 -m pip install --no-cache-dir --upgrade pip && \
    python3.9 -m pip install --no-cache-dir \
    torch \
    torchvision \
    torchaudio \
    invisible_watermark && \
    python3.9 -m pip install --no-cache-dir \
    accelerate \
    datasets \
    hf-doc-builder \
    huggingface-hub \
    Jinja2 \
    librosa \
    numpy \
    scipy \
    tensorboard \
    transformers \
    omegaconf

CMD ["/bin/bash"]
@@ -14,7 +14,6 @@ RUN apt update && \
    libsndfile1-dev \
    python3.8 \
    python3-pip \
    libgl1 \
    python3.8-venv && \
    rm -rf /var/lib/apt/lists
@@ -28,7 +27,6 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip && \
    torch \
    torchvision \
    torchaudio \
    invisible_watermark \
    --extra-index-url https://download.pytorch.org/whl/cpu && \
    python3 -m pip install --no-cache-dir \
    accelerate \
@@ -42,4 +40,4 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip && \
    tensorboard \
    transformers

CMD ["/bin/bash"]
@@ -1,4 +1,4 @@
FROM nvidia/cuda:12.1.0-runtime-ubuntu20.04
LABEL maintainer="Hugging Face"
LABEL repository="diffusers"
@@ -6,16 +6,15 @@ ENV DEBIAN_FRONTEND=noninteractive

RUN apt update && \
    apt install -y bash \
    build-essential \
    git \
    git-lfs \
    curl \
    ca-certificates \
    libsndfile1-dev \
    libgl1 \
    python3.8 \
    python3-pip \
    python3.8-venv && \
    rm -rf /var/lib/apt/lists

# make sure to use venv
@@ -25,22 +24,19 @@ ENV PATH="/opt/venv/bin:$PATH"
# pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
RUN python3 -m pip install --no-cache-dir --upgrade pip && \
    python3 -m pip install --no-cache-dir \
    torch \
    torchvision \
    torchaudio \
    invisible_watermark && \
    python3 -m pip install --no-cache-dir \
    accelerate \
    datasets \
    hf-doc-builder \
    huggingface-hub \
    Jinja2 \
    librosa \
    numpy \
    scipy \
    tensorboard \
    transformers \
    omegaconf \
    pytorch-lightning

CMD ["/bin/bash"]
@@ -1,46 +0,0 @@
FROM nvidia/cuda:12.1.0-runtime-ubuntu20.04
LABEL maintainer="Hugging Face"
LABEL repository="diffusers"

ENV DEBIAN_FRONTEND=noninteractive

RUN apt update && \
    apt install -y bash \
    build-essential \
    git \
    git-lfs \
    curl \
    ca-certificates \
    libsndfile1-dev \
    libgl1 \
    python3.8 \
    python3-pip \
    python3.8-venv && \
    rm -rf /var/lib/apt/lists

# make sure to use venv
RUN python3 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
RUN python3 -m pip install --no-cache-dir --upgrade pip && \
    python3 -m pip install --no-cache-dir \
    torch \
    torchvision \
    torchaudio \
    invisible_watermark && \
    python3 -m pip install --no-cache-dir \
    accelerate \
    datasets \
    hf-doc-builder \
    huggingface-hub \
    Jinja2 \
    librosa \
    numpy \
    scipy \
    tensorboard \
    transformers \
    omegaconf \
    xformers

CMD ["/bin/bash"]
@@ -68,10 +68,10 @@ The `preview` command only works with existing doc files. When you add a complet

## Adding a new element to the navigation bar

Accepted files are Markdown (.md).

Create a file with its extension and put it in the source directory. You can then link it to the toc-tree by putting
the filename without the extension in the [`_toctree.yml`](https://github.com/huggingface/diffusers/blob/main/docs/source/en/_toctree.yml) file.

## Renaming section headers and moving sections
@@ -81,14 +81,14 @@ Therefore, we simply keep a little map of moved sections at the end of the docum

So if you renamed a section from: "Section A" to "Section B", then you can add at the end of the file:

```md
Sections that were moved:

[ <a href="#section-b">Section A</a><a id="section-a"></a> ]
```
and of course, if you moved it to another file, then:

```md
Sections that were moved:

[ <a href="../new-file#section-b">Section A</a><a id="section-a"></a> ]
```
@@ -96,7 +96,7 @@ Sections that were moved:

Use the relative style to link to the new file so that the versioned docs continue to work.

For an example of a rich moved section set please see the very end of [the transformers Trainer doc](https://github.com/huggingface/transformers/blob/main/docs/source/en/main_classes/trainer.md).

## Writing Documentation - Specification
@@ -109,8 +109,8 @@ although we can write them directly in Markdown.

Adding a new tutorial or section is done in two steps:

- Add a new Markdown (.md) file under `docs/source/<languageCode>`.
- Link that file in `docs/source/<languageCode>/_toctree.yml` on the correct toc-tree.

Make sure to put your new file under the proper section. It's unlikely to go in the first section (*Get Started*), so
depending on the intended targets (beginners, more advanced users, or researchers) it should go in sections two, three, or four.
@@ -119,8 +119,8 @@ depending on the intended targets (beginners, more advanced users, or researcher

When adding a new pipeline:

- Create a file `xxx.md` under `docs/source/<languageCode>/api/pipelines` (don't hesitate to copy an existing file as template).
- Link that file in the (*Diffusers Summary*) section in `docs/source/api/pipelines/overview.md`, along with the link to the paper, and a colab notebook (if available).
- Write a short overview of the diffusion model:
  - Overview with paper & authors
  - Paper abstract
@@ -129,6 +129,8 @@ When adding a new pipeline:
- Add all the pipeline classes that should be linked in the diffusion model. These classes should be added using our Markdown syntax. By default as follows:

```
## XXXPipeline

[[autodoc]] XXXPipeline
    - all
    - __call__
```
@@ -146,7 +148,7 @@ This will include every public method of the pipeline that is documented, as wel
```
    - disable_xformers_memory_efficient_attention
```

You can follow the same process to create a new scheduler under the `docs/source/<languageCode>/api/schedulers` folder.

### Writing source documentation
@@ -162,7 +164,7 @@ provide its path. For instance: \[\`pipelines.ImagePipelineOutput\`\]. This will
`pipelines.ImagePipelineOutput` in the description. To get rid of the path and only keep the name of the object you are
linking to in the description, add a ~: \[\`~pipelines.ImagePipelineOutput\`\] will generate a link with `ImagePipelineOutput` in the description.

The same works for methods so you can either use \[\`XXXClass.method\`\] or \[\`~XXXClass.method\`\].

#### Defining arguments in a method
@@ -194,8 +196,8 @@ Here's an example showcasing everything so far:
For optional arguments or arguments with defaults we follow the following syntax: imagine we have a function with the
following signature:

```py
def my_function(x: str = None, a: float = 3.14):
```

then its documentation should look like this:
@@ -204,7 +206,7 @@ then its documentation should look like this:

```
Args:
    x (`str`, *optional*):
        This argument controls ...
    a (`float`, *optional*, defaults to `3.14`):
        This argument is used to ...
```
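Putting the signature and the `Args:` block together, a fully documented version of the hypothetical `my_function` would look like this:

```py
def my_function(x: str = None, a: float = 3.14):
    r"""
    A hypothetical example combining the conventions above.

    Args:
        x (`str`, *optional*):
            This argument controls ...
        a (`float`, *optional*, defaults to `3.14`):
            This argument is used to ...
    """
    ...
```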
@@ -266,3 +268,4 @@ We have an automatic script running with the `make style` command that will make
This script may have some weird failures if you made a syntax mistake or if you uncover a bug. Therefore, it's
recommended to commit your changes before running `make style`, so you can revert the changes done by that script
easily.

@@ -6,4 +6,4 @@ INSTALL_CONTENT = """
# ! pip install git+https://github.com/huggingface/diffusers.git
"""

notebook_first_cells = [{"type": "code", "content": INSTALL_CONTENT}]
@@ -12,13 +12,9 @@
- local: tutorials/tutorial_overview
  title: Overview
- local: using-diffusers/write_own_pipeline
  title: Understanding pipelines, models and schedulers
- local: tutorials/autopipeline
  title: AutoPipeline
- local: tutorials/basic_training
  title: Train a diffusion model
- local: tutorials/using_peft_for_inference
  title: Inference with PEFT
  title: Tutorials
- sections:
  - sections:
@@ -29,15 +25,9 @@
    - local: using-diffusers/schedulers
      title: Load and compare different schedulers
    - local: using-diffusers/custom_pipeline_overview
      title: Load community pipelines and components
    - local: using-diffusers/using_safetensors
      title: Load safetensors
    - local: using-diffusers/other-formats
      title: Load different Stable Diffusion formats
    - local: using-diffusers/loading_adapters
      title: Load adapters
    - local: using-diffusers/push_to_hub
      title: Push files to the Hub
    title: Loading & Hub
  - sections:
    - local: using-diffusers/pipeline_overview
@@ -45,59 +35,31 @@
    - local: using-diffusers/unconditional_image_generation
      title: Unconditional image generation
    - local: using-diffusers/conditional_image_generation
      title: Text-to-image
    - local: using-diffusers/img2img
      title: Image-to-image
    - local: using-diffusers/inpaint
      title: Inpainting
    - local: using-diffusers/depth2img
      title: Depth-to-image
    title: Tasks
  - sections:
    - local: using-diffusers/textual_inversion_inference
      title: Textual inversion
    - local: training/distributed_inference
      title: Distributed inference with multiple GPUs
    - local: using-diffusers/reusing_seeds
      title: Improve image quality with deterministic generation
    - local: using-diffusers/control_brightness
      title: Control image brightness
    - local: using-diffusers/weighted_prompts
      title: Prompt weighting
    - local: using-diffusers/freeu
      title: Improve generation quality with FreeU
    title: Techniques
  - sections:
    - local: using-diffusers/pipeline_overview
      title: Overview
    - local: using-diffusers/sdxl
      title: Stable Diffusion XL
    - local: using-diffusers/kandinsky
      title: Kandinsky
    - local: using-diffusers/controlnet
      title: ControlNet
    - local: using-diffusers/callback
      title: Callback
    - local: using-diffusers/shap-e
      title: Shap-E
    - local: using-diffusers/diffedit
      title: DiffEdit
    - local: using-diffusers/distilled_sd
      title: Distilled Stable Diffusion inference
||||||
- local: using-diffusers/reproducibility
|
- local: using-diffusers/reproducibility
|
||||||
title: Create reproducible pipelines
|
title: Create reproducible pipelines
|
||||||
- local: using-diffusers/custom_pipeline_examples
|
- local: using-diffusers/custom_pipeline_examples
|
||||||
title: Community pipelines
|
title: Community pipelines
|
||||||
- local: using-diffusers/contribute_pipeline
|
- local: using-diffusers/contribute_pipeline
|
||||||
title: Contribute a community pipeline
|
title: How to contribute a community pipeline
|
||||||
title: Specific pipeline examples
|
- local: using-diffusers/using_safetensors
|
||||||
|
title: Using safetensors
|
||||||
|
- local: using-diffusers/stable_diffusion_jax_how_to
|
||||||
|
title: Stable Diffusion in JAX/Flax
|
||||||
|
- local: using-diffusers/weighted_prompts
|
||||||
|
title: Weighting Prompts
|
||||||
|
title: Pipelines for Inference
|
||||||
- sections:
|
- sections:
|
||||||
- local: training/overview
|
- local: training/overview
|
||||||
title: Overview
|
title: Overview
|
||||||
- local: training/create_dataset
|
|
||||||
title: Create a dataset for training
|
|
||||||
- local: training/adapt_a_model
|
|
||||||
title: Adapt a model to a new task
|
|
||||||
- local: training/unconditional_training
|
- local: training/unconditional_training
|
||||||
title: Unconditional image generation
|
title: Unconditional image generation
|
||||||
- local: training/text_inversion
|
- local: training/text_inversion
|
||||||
@@ -114,12 +76,12 @@
|
|||||||
title: InstructPix2Pix Training
|
title: InstructPix2Pix Training
|
||||||
- local: training/custom_diffusion
|
- local: training/custom_diffusion
|
||||||
title: Custom Diffusion
|
title: Custom Diffusion
|
||||||
- local: training/t2i_adapters
|
|
||||||
title: T2I-Adapters
|
|
||||||
- local: training/ddpo
|
|
||||||
title: Reinforcement learning training with DDPO
|
|
||||||
title: Training
|
title: Training
|
||||||
- sections:
|
- sections:
|
||||||
|
- local: using-diffusers/rl
|
||||||
|
title: Reinforcement Learning
|
||||||
|
- local: using-diffusers/audio
|
||||||
|
title: Audio
|
||||||
- local: using-diffusers/other-modalities
|
- local: using-diffusers/other-modalities
|
||||||
title: Other Modalities
|
title: Other Modalities
|
||||||
title: Taking Diffusers Beyond Images
|
title: Taking Diffusers Beyond Images
|
||||||
@@ -127,35 +89,23 @@
|
|||||||
- sections:
|
- sections:
|
||||||
- local: optimization/opt_overview
|
- local: optimization/opt_overview
|
||||||
title: Overview
|
title: Overview
|
||||||
- sections:
|
- local: optimization/fp16
|
||||||
- local: optimization/fp16
|
title: Memory and Speed
|
||||||
title: Speed up inference
|
- local: optimization/torch2.0
|
||||||
- local: optimization/memory
|
title: Torch2.0 support
|
||||||
title: Reduce memory usage
|
- local: optimization/xformers
|
||||||
- local: optimization/torch2.0
|
title: xFormers
|
||||||
title: Torch 2.0
|
- local: optimization/onnx
|
||||||
- local: optimization/xformers
|
title: ONNX
|
||||||
title: xFormers
|
- local: optimization/open_vino
|
||||||
- local: optimization/tome
|
title: OpenVINO
|
||||||
title: Token merging
|
- local: optimization/coreml
|
||||||
title: General optimizations
|
title: Core ML
|
||||||
- sections:
|
- local: optimization/mps
|
||||||
- local: using-diffusers/stable_diffusion_jax_how_to
|
title: MPS
|
||||||
title: JAX/Flax
|
- local: optimization/habana
|
||||||
- local: optimization/onnx
|
title: Habana Gaudi
|
||||||
title: ONNX
|
title: Optimization/Special Hardware
|
||||||
- local: optimization/open_vino
|
|
||||||
title: OpenVINO
|
|
||||||
- local: optimization/coreml
|
|
||||||
title: Core ML
|
|
||||||
title: Optimized model types
|
|
||||||
- sections:
|
|
||||||
- local: optimization/mps
|
|
||||||
title: Metal Performance Shaders (MPS)
|
|
||||||
- local: optimization/habana
|
|
||||||
title: Habana Gaudi
|
|
||||||
title: Optimized hardware
|
|
||||||
title: Optimization
|
|
||||||
- sections:
|
- sections:
|
||||||
- local: conceptual/philosophy
|
- local: conceptual/philosophy
|
||||||
title: Philosophy
|
title: Philosophy
|
||||||
@@ -170,70 +120,28 @@
|
|||||||
title: Conceptual Guides
|
title: Conceptual Guides
|
||||||
- sections:
|
- sections:
|
||||||
- sections:
|
- sections:
|
||||||
- local: api/configuration
|
- local: api/models
|
||||||
title: Configuration
|
title: Models
|
||||||
- local: api/loaders
|
- local: api/diffusion_pipeline
|
||||||
title: Loaders
|
title: Diffusion Pipeline
|
||||||
- local: api/logging
|
- local: api/logging
|
||||||
title: Logging
|
title: Logging
|
||||||
|
- local: api/configuration
|
||||||
|
title: Configuration
|
||||||
- local: api/outputs
|
- local: api/outputs
|
||||||
title: Outputs
|
title: Outputs
|
||||||
|
- local: api/loaders
|
||||||
|
title: Loaders
|
||||||
title: Main Classes
|
title: Main Classes
|
||||||
- sections:
|
|
||||||
- local: api/models/overview
|
|
||||||
title: Overview
|
|
||||||
- local: api/models/unet
|
|
||||||
title: UNet1DModel
|
|
||||||
- local: api/models/unet2d
|
|
||||||
title: UNet2DModel
|
|
||||||
- local: api/models/unet2d-cond
|
|
||||||
title: UNet2DConditionModel
|
|
||||||
- local: api/models/unet3d-cond
|
|
||||||
title: UNet3DConditionModel
|
|
||||||
- local: api/models/unet-motion
|
|
||||||
title: UNetMotionModel
|
|
||||||
- local: api/models/vq
|
|
||||||
title: VQModel
|
|
||||||
- local: api/models/autoencoderkl
|
|
||||||
title: AutoencoderKL
|
|
||||||
- local: api/models/asymmetricautoencoderkl
|
|
||||||
title: AsymmetricAutoencoderKL
|
|
||||||
- local: api/models/autoencoder_tiny
|
|
||||||
title: Tiny AutoEncoder
|
|
||||||
- local: api/models/transformer2d
|
|
||||||
title: Transformer2D
|
|
||||||
- local: api/models/transformer_temporal
|
|
||||||
title: Transformer Temporal
|
|
||||||
- local: api/models/prior_transformer
|
|
||||||
title: Prior Transformer
|
|
||||||
- local: api/models/controlnet
|
|
||||||
title: ControlNet
|
|
||||||
title: Models
|
|
||||||
- sections:
|
- sections:
|
||||||
- local: api/pipelines/overview
|
- local: api/pipelines/overview
|
||||||
title: Overview
|
title: Overview
|
||||||
- local: api/pipelines/alt_diffusion
|
- local: api/pipelines/alt_diffusion
|
||||||
title: AltDiffusion
|
title: AltDiffusion
|
||||||
- local: api/pipelines/animatediff
|
|
||||||
title: AnimateDiff
|
|
||||||
- local: api/pipelines/attend_and_excite
|
|
||||||
title: Attend-and-Excite
|
|
||||||
- local: api/pipelines/audio_diffusion
|
- local: api/pipelines/audio_diffusion
|
||||||
title: Audio Diffusion
|
title: Audio Diffusion
|
||||||
- local: api/pipelines/audioldm
|
- local: api/pipelines/audioldm
|
||||||
title: AudioLDM
|
title: AudioLDM
|
||||||
- local: api/pipelines/audioldm2
|
|
||||||
title: AudioLDM 2
|
|
||||||
- local: api/pipelines/auto_pipeline
|
|
||||||
title: AutoPipeline
|
|
||||||
- local: api/pipelines/blip_diffusion
|
|
||||||
title: BLIP Diffusion
|
|
||||||
- local: api/pipelines/consistency_models
|
|
||||||
title: Consistency Models
|
|
||||||
- local: api/pipelines/controlnet
|
|
||||||
title: ControlNet
|
|
||||||
- local: api/pipelines/controlnet_sdxl
|
|
||||||
title: ControlNet with Stable Diffusion XL
|
|
||||||
- local: api/pipelines/cycle_diffusion
|
- local: api/pipelines/cycle_diffusion
|
||||||
title: Cycle Diffusion
|
title: Cycle Diffusion
|
||||||
- local: api/pipelines/dance_diffusion
|
- local: api/pipelines/dance_diffusion
|
||||||
@@ -242,167 +150,121 @@
|
|||||||
title: DDIM
|
title: DDIM
|
||||||
- local: api/pipelines/ddpm
|
- local: api/pipelines/ddpm
|
||||||
title: DDPM
|
title: DDPM
|
||||||
- local: api/pipelines/deepfloyd_if
|
|
||||||
title: DeepFloyd IF
|
|
||||||
- local: api/pipelines/diffedit
|
|
||||||
title: DiffEdit
|
|
||||||
- local: api/pipelines/dit
|
- local: api/pipelines/dit
|
||||||
title: DiT
|
title: DiT
|
||||||
- local: api/pipelines/pix2pix
|
|
||||||
title: InstructPix2Pix
|
|
||||||
- local: api/pipelines/kandinsky
|
|
||||||
title: Kandinsky 2.1
|
|
||||||
- local: api/pipelines/kandinsky_v22
|
|
||||||
title: Kandinsky 2.2
|
|
||||||
- local: api/pipelines/latent_consistency_models
|
|
||||||
title: Latent Consistency Models
|
|
||||||
- local: api/pipelines/latent_diffusion
|
- local: api/pipelines/latent_diffusion
|
||||||
title: Latent Diffusion
|
title: Latent Diffusion
|
||||||
- local: api/pipelines/panorama
|
|
||||||
title: MultiDiffusion
|
|
||||||
- local: api/pipelines/musicldm
|
|
||||||
title: MusicLDM
|
|
||||||
- local: api/pipelines/paint_by_example
|
- local: api/pipelines/paint_by_example
|
||||||
title: Paint By Example
|
title: PaintByExample
|
||||||
- local: api/pipelines/paradigms
|
|
||||||
title: Parallel Sampling of Diffusion Models
|
|
||||||
- local: api/pipelines/pix2pix_zero
|
|
||||||
title: Pix2Pix Zero
|
|
||||||
- local: api/pipelines/pixart
|
|
||||||
title: PixArt
|
|
||||||
- local: api/pipelines/pndm
|
- local: api/pipelines/pndm
|
||||||
title: PNDM
|
title: PNDM
|
||||||
- local: api/pipelines/repaint
|
- local: api/pipelines/repaint
|
||||||
title: RePaint
|
title: RePaint
|
||||||
|
- local: api/pipelines/stable_diffusion_safe
|
||||||
|
title: Safe Stable Diffusion
|
||||||
- local: api/pipelines/score_sde_ve
|
- local: api/pipelines/score_sde_ve
|
||||||
title: Score SDE VE
|
title: Score SDE VE
|
||||||
- local: api/pipelines/self_attention_guidance
|
|
||||||
title: Self-Attention Guidance
|
|
||||||
- local: api/pipelines/semantic_stable_diffusion
|
- local: api/pipelines/semantic_stable_diffusion
|
||||||
title: Semantic Guidance
|
title: Semantic Guidance
|
||||||
- local: api/pipelines/shap_e
|
|
||||||
title: Shap-E
|
|
||||||
- local: api/pipelines/spectrogram_diffusion
|
- local: api/pipelines/spectrogram_diffusion
|
||||||
title: Spectrogram Diffusion
|
title: "Spectrogram Diffusion"
|
||||||
- sections:
|
- sections:
|
||||||
- local: api/pipelines/stable_diffusion/overview
|
- local: api/pipelines/stable_diffusion/overview
|
||||||
title: Overview
|
title: Overview
|
||||||
- local: api/pipelines/stable_diffusion/text2img
|
- local: api/pipelines/stable_diffusion/text2img
|
||||||
title: Text-to-image
|
title: Text-to-Image
|
||||||
- local: api/pipelines/stable_diffusion/img2img
|
- local: api/pipelines/stable_diffusion/img2img
|
||||||
title: Image-to-image
|
title: Image-to-Image
|
||||||
- local: api/pipelines/stable_diffusion/inpaint
|
- local: api/pipelines/stable_diffusion/inpaint
|
||||||
title: Inpainting
|
title: Inpaint
|
||||||
- local: api/pipelines/stable_diffusion/depth2img
|
- local: api/pipelines/stable_diffusion/depth2img
|
||||||
title: Depth-to-image
|
title: Depth-to-Image
|
||||||
- local: api/pipelines/stable_diffusion/image_variation
|
- local: api/pipelines/stable_diffusion/image_variation
|
||||||
title: Image variation
|
title: Image-Variation
|
||||||
- local: api/pipelines/stable_diffusion/stable_diffusion_safe
|
|
||||||
title: Safe Stable Diffusion
|
|
||||||
- local: api/pipelines/stable_diffusion/stable_diffusion_2
|
|
||||||
title: Stable Diffusion 2
|
|
||||||
- local: api/pipelines/stable_diffusion/stable_diffusion_xl
|
|
||||||
title: Stable Diffusion XL
|
|
||||||
- local: api/pipelines/stable_diffusion/latent_upscale
|
|
||||||
title: Latent upscaler
|
|
||||||
- local: api/pipelines/stable_diffusion/upscale
|
- local: api/pipelines/stable_diffusion/upscale
|
||||||
title: Super-resolution
|
title: Super-Resolution
|
||||||
- local: api/pipelines/stable_diffusion/ldm3d_diffusion
|
- local: api/pipelines/stable_diffusion/latent_upscale
|
||||||
title: LDM3D Text-to-(RGB, Depth)
|
title: Stable-Diffusion-Latent-Upscaler
|
||||||
- local: api/pipelines/stable_diffusion/adapter
|
- local: api/pipelines/stable_diffusion/pix2pix
|
||||||
title: Stable Diffusion T2I-Adapter
|
title: InstructPix2Pix
|
||||||
- local: api/pipelines/stable_diffusion/gligen
|
- local: api/pipelines/stable_diffusion/attend_and_excite
|
||||||
title: GLIGEN (Grounded Language-to-Image Generation)
|
title: Attend and Excite
|
||||||
|
- local: api/pipelines/stable_diffusion/pix2pix_zero
|
||||||
|
title: Pix2Pix Zero
|
||||||
|
- local: api/pipelines/stable_diffusion/self_attention_guidance
|
||||||
|
title: Self-Attention Guidance
|
||||||
|
- local: api/pipelines/stable_diffusion/panorama
|
||||||
|
title: MultiDiffusion Panorama
|
||||||
|
- local: api/pipelines/stable_diffusion/controlnet
|
||||||
|
title: Text-to-Image Generation with ControlNet Conditioning
|
||||||
|
- local: api/pipelines/stable_diffusion/model_editing
|
||||||
|
title: Text-to-Image Model Editing
|
||||||
title: Stable Diffusion
|
title: Stable Diffusion
|
||||||
|
- local: api/pipelines/stable_diffusion_2
|
||||||
|
title: Stable Diffusion 2
|
||||||
- local: api/pipelines/stable_unclip
|
- local: api/pipelines/stable_unclip
|
||||||
title: Stable unCLIP
|
title: Stable unCLIP
|
||||||
- local: api/pipelines/stochastic_karras_ve
|
- local: api/pipelines/stochastic_karras_ve
|
||||||
title: Stochastic Karras VE
|
title: Stochastic Karras VE
|
||||||
- local: api/pipelines/model_editing
|
|
||||||
title: Text-to-image model editing
|
|
||||||
- local: api/pipelines/text_to_video
|
- local: api/pipelines/text_to_video
|
||||||
title: Text-to-video
|
title: Text-to-Video
|
||||||
- local: api/pipelines/text_to_video_zero
|
- local: api/pipelines/text_to_video_zero
|
||||||
title: Text2Video-Zero
|
title: Text-to-Video Zero
|
||||||
- local: api/pipelines/unclip
|
- local: api/pipelines/unclip
|
||||||
title: unCLIP
|
title: UnCLIP
|
||||||
- local: api/pipelines/latent_diffusion_uncond
|
- local: api/pipelines/latent_diffusion_uncond
|
||||||
title: Unconditional Latent Diffusion
|
title: Unconditional Latent Diffusion
|
||||||
- local: api/pipelines/unidiffuser
|
|
||||||
title: UniDiffuser
|
|
||||||
- local: api/pipelines/value_guided_sampling
|
|
||||||
title: Value-guided sampling
|
|
||||||
- local: api/pipelines/versatile_diffusion
|
- local: api/pipelines/versatile_diffusion
|
||||||
title: Versatile Diffusion
|
title: Versatile Diffusion
|
||||||
- local: api/pipelines/vq_diffusion
|
- local: api/pipelines/vq_diffusion
|
||||||
title: VQ Diffusion
|
title: VQ Diffusion
|
||||||
- local: api/pipelines/wuerstchen
|
|
||||||
title: Wuerstchen
|
|
||||||
title: Pipelines
|
title: Pipelines
|
||||||
- sections:
|
- sections:
|
||||||
- local: api/schedulers/overview
|
- local: api/schedulers/overview
|
||||||
title: Overview
|
title: Overview
|
||||||
- local: api/schedulers/cm_stochastic_iterative
|
|
||||||
title: CMStochasticIterativeScheduler
|
|
||||||
- local: api/schedulers/ddim_inverse
|
|
||||||
title: DDIMInverseScheduler
|
|
||||||
- local: api/schedulers/ddim
|
- local: api/schedulers/ddim
|
||||||
title: DDIMScheduler
|
title: DDIM
|
||||||
|
- local: api/schedulers/ddim_inverse
|
||||||
|
title: DDIMInverse
|
||||||
- local: api/schedulers/ddpm
|
- local: api/schedulers/ddpm
|
||||||
title: DDPMScheduler
|
title: DDPM
|
||||||
- local: api/schedulers/deis
|
- local: api/schedulers/deis
|
||||||
title: DEISMultistepScheduler
|
title: DEIS
|
||||||
- local: api/schedulers/multistep_dpm_solver_inverse
|
|
||||||
title: DPMSolverMultistepInverse
|
|
||||||
- local: api/schedulers/multistep_dpm_solver
|
|
||||||
title: DPMSolverMultistepScheduler
|
|
||||||
- local: api/schedulers/dpm_sde
|
|
||||||
title: DPMSolverSDEScheduler
|
|
||||||
- local: api/schedulers/singlestep_dpm_solver
|
|
||||||
title: DPMSolverSinglestepScheduler
|
|
||||||
- local: api/schedulers/euler_ancestral
|
|
||||||
title: EulerAncestralDiscreteScheduler
|
|
||||||
- local: api/schedulers/euler
|
|
||||||
title: EulerDiscreteScheduler
|
|
||||||
- local: api/schedulers/heun
|
|
||||||
title: HeunDiscreteScheduler
|
|
||||||
- local: api/schedulers/ipndm
|
|
||||||
title: IPNDMScheduler
|
|
||||||
- local: api/schedulers/stochastic_karras_ve
|
|
||||||
title: KarrasVeScheduler
|
|
||||||
- local: api/schedulers/dpm_discrete_ancestral
|
|
||||||
title: KDPM2AncestralDiscreteScheduler
|
|
||||||
- local: api/schedulers/dpm_discrete
|
- local: api/schedulers/dpm_discrete
|
||||||
title: KDPM2DiscreteScheduler
|
title: DPM Discrete Scheduler
|
||||||
- local: api/schedulers/lcm
|
- local: api/schedulers/dpm_discrete_ancestral
|
||||||
title: LCMScheduler
|
title: DPM Discrete Scheduler with ancestral sampling
|
||||||
|
- local: api/schedulers/euler_ancestral
|
||||||
|
title: Euler Ancestral Scheduler
|
||||||
|
- local: api/schedulers/euler
|
||||||
|
title: Euler scheduler
|
||||||
|
- local: api/schedulers/heun
|
||||||
|
title: Heun Scheduler
|
||||||
|
- local: api/schedulers/ipndm
|
||||||
|
title: IPNDM
|
||||||
- local: api/schedulers/lms_discrete
|
- local: api/schedulers/lms_discrete
|
||||||
title: LMSDiscreteScheduler
|
title: Linear Multistep
|
||||||
|
- local: api/schedulers/multistep_dpm_solver
|
||||||
|
title: Multistep DPM-Solver
|
||||||
- local: api/schedulers/pndm
|
- local: api/schedulers/pndm
|
||||||
title: PNDMScheduler
|
title: PNDM
|
||||||
- local: api/schedulers/repaint
|
- local: api/schedulers/repaint
|
||||||
title: RePaintScheduler
|
title: RePaint Scheduler
|
||||||
- local: api/schedulers/score_sde_ve
|
- local: api/schedulers/singlestep_dpm_solver
|
||||||
title: ScoreSdeVeScheduler
|
title: Singlestep DPM-Solver
|
||||||
- local: api/schedulers/score_sde_vp
|
- local: api/schedulers/stochastic_karras_ve
|
||||||
title: ScoreSdeVpScheduler
|
title: Stochastic Kerras VE
|
||||||
- local: api/schedulers/unipc
|
- local: api/schedulers/unipc
|
||||||
title: UniPCMultistepScheduler
|
title: UniPCMultistepScheduler
|
||||||
|
- local: api/schedulers/score_sde_ve
|
||||||
|
title: VE-SDE
|
||||||
|
- local: api/schedulers/score_sde_vp
|
||||||
|
title: VP-SDE
|
||||||
- local: api/schedulers/vq_diffusion
|
- local: api/schedulers/vq_diffusion
|
||||||
title: VQDiffusionScheduler
|
title: VQDiffusionScheduler
|
||||||
title: Schedulers
|
title: Schedulers
|
||||||
- sections:
|
- sections:
|
||||||
- local: api/internal_classes_overview
|
- local: api/experimental/rl
|
||||||
title: Overview
|
title: RL Planning
|
||||||
- local: api/attnprocessor
|
title: Experimental Features
|
||||||
title: Attention Processor
|
|
||||||
- local: api/activations
|
|
||||||
title: Custom activation functions
|
|
||||||
- local: api/normalization
|
|
||||||
title: Custom normalization layers
|
|
||||||
- local: api/utilities
|
|
||||||
title: Utilities
|
|
||||||
- local: api/image_processor
|
|
||||||
title: VAE Image Processor
|
|
||||||
title: Internal classes
|
|
||||||
title: API
|
title: API
|
||||||
|
|||||||
@@ -1,15 +0,0 @@
-# Activation functions
-
-Customized activation functions for supporting various models in 🤗 Diffusers.
-
-## GELU
-
-[[autodoc]] models.activations.GELU
-
-## GEGLU
-
-[[autodoc]] models.activations.GEGLU
-
-## ApproximateGELU
-
-[[autodoc]] models.activations.ApproximateGELU
@@ -1,45 +0,0 @@
-# Attention Processor
-
-An attention processor is a class for applying different types of attention mechanisms.
-
-## AttnProcessor
-[[autodoc]] models.attention_processor.AttnProcessor
-
-## AttnProcessor2_0
-[[autodoc]] models.attention_processor.AttnProcessor2_0
-
-## LoRAAttnProcessor
-[[autodoc]] models.attention_processor.LoRAAttnProcessor
-
-## LoRAAttnProcessor2_0
-[[autodoc]] models.attention_processor.LoRAAttnProcessor2_0
-
-## CustomDiffusionAttnProcessor
-[[autodoc]] models.attention_processor.CustomDiffusionAttnProcessor
-
-## CustomDiffusionAttnProcessor2_0
-[[autodoc]] models.attention_processor.CustomDiffusionAttnProcessor2_0
-
-## AttnAddedKVProcessor
-[[autodoc]] models.attention_processor.AttnAddedKVProcessor
-
-## AttnAddedKVProcessor2_0
-[[autodoc]] models.attention_processor.AttnAddedKVProcessor2_0
-
-## LoRAAttnAddedKVProcessor
-[[autodoc]] models.attention_processor.LoRAAttnAddedKVProcessor
-
-## XFormersAttnProcessor
-[[autodoc]] models.attention_processor.XFormersAttnProcessor
-
-## LoRAXFormersAttnProcessor
-[[autodoc]] models.attention_processor.LoRAXFormersAttnProcessor
-
-## CustomDiffusionXFormersAttnProcessor
-[[autodoc]] models.attention_processor.CustomDiffusionXFormersAttnProcessor
-
-## SlicedAttnProcessor
-[[autodoc]] models.attention_processor.SlicedAttnProcessor
-
-## SlicedAttnAddedKVProcessor
-[[autodoc]] models.attention_processor.SlicedAttnAddedKVProcessor
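As a rough illustration of how these processor classes are used (a sketch only; it assumes the `runwayml/stable-diffusion-v1-5` checkpoint and a CUDA device are available):

```python
import torch
from diffusers import StableDiffusionPipeline
from diffusers.models.attention_processor import AttnProcessor2_0

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Swap every attention module in the UNet over to the PyTorch 2.0
# scaled-dot-product-attention processor.
pipe.unet.set_attn_processor(AttnProcessor2_0())

image = pipe("an astronaut riding a horse on mars").images[0]
```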
@@ -12,13 +12,8 @@ specific language governing permissions and limitations under the License.
 
 # Configuration
 
-Schedulers from [`~schedulers.scheduling_utils.SchedulerMixin`] and models from [`ModelMixin`] inherit from [`ConfigMixin`] which stores all the parameters that are passed to their respective `__init__` methods in a JSON-configuration file.
+Schedulers from [`~schedulers.scheduling_utils.SchedulerMixin`] and models from [`ModelMixin`] inherit from [`ConfigMixin`] which conveniently takes care of storing all the parameters that are
+passed to their respective `__init__` methods in a JSON-configuration file.
-<Tip>
-
-To use private or [gated](https://huggingface.co/docs/hub/models-gated#gated-models) models, log-in with `huggingface-cli login`.
-
-</Tip>
-
 ## ConfigMixin
 
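A minimal sketch of the configuration mechanism described in this hunk, using `DDPMScheduler` purely as an example:

```python
from diffusers import DDPMScheduler

# The arguments passed to __init__ are captured by ConfigMixin.
scheduler = DDPMScheduler(num_train_timesteps=1000, beta_schedule="squaredcos_cap_v2")
print(scheduler.config)  # frozen dict of the init parameters

# save_pretrained writes them to scheduler_config.json; from_pretrained reads them back.
scheduler.save_pretrained("./ddpm-scheduler")
reloaded = DDPMScheduler.from_pretrained("./ddpm-scheduler")
```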
docs/source/en/api/diffusion_pipeline.mdx (new file, 47 lines)
@@ -0,0 +1,47 @@
+<!--Copyright 2023 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# Pipelines
+
+The [`DiffusionPipeline`] is the easiest way to load any pretrained diffusion pipeline from the [Hub](https://huggingface.co/models?library=diffusers) and to use it in inference.
+
+<Tip>
+
+One should not use the Diffusion Pipeline class for training or fine-tuning a diffusion model. Individual
+components of diffusion pipelines are usually trained individually, so we suggest to directly work
+with [`UNetModel`] and [`UNetConditionModel`].
+
+</Tip>
+
+Any diffusion pipeline that is loaded with [`~DiffusionPipeline.from_pretrained`] will automatically
+detect the pipeline type, *e.g.* [`StableDiffusionPipeline`] and consequently load each component of the
+pipeline and pass them into the `__init__` function of the pipeline, *e.g.* [`~StableDiffusionPipeline.__init__`].
+
+Any pipeline object can be saved locally with [`~DiffusionPipeline.save_pretrained`].
+
+## DiffusionPipeline
+[[autodoc]] DiffusionPipeline
+  - all
+  - __call__
+  - device
+  - to
+  - components
+
+## ImagePipelineOutput
+By default diffusion pipelines return an object of class
+
+[[autodoc]] pipelines.ImagePipelineOutput
+
+## AudioPipelineOutput
+By default diffusion pipelines return an object of class
+
+[[autodoc]] pipelines.AudioPipelineOutput
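A minimal usage sketch of the pipeline API documented in this file (it assumes the `runwayml/stable-diffusion-v1-5` checkpoint and a CUDA device are available):

```python
import torch
from diffusers import DiffusionPipeline

# from_pretrained detects the concrete pipeline class (here StableDiffusionPipeline)
# and loads every component: UNet, VAE, scheduler, text encoder, tokenizer, ...
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe("a photo of an astronaut riding a horse on mars").images[0]

# save_pretrained writes the whole pipeline (weights plus configs) back to disk.
pipe.save_pretrained("./my-stable-diffusion")
```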
docs/source/en/api/experimental/rl.mdx (new file, 15 lines)
@@ -0,0 +1,15 @@
+<!--Copyright 2023 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# TODO
+
+Coming soon!
@@ -1,27 +0,0 @@
-<!--Copyright 2023 The HuggingFace Team. All rights reserved.
-
-Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
-the License. You may obtain a copy of the License at
-
-http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
-an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
-specific language governing permissions and limitations under the License.
--->
-
-# VAE Image Processor
-
-The [`VaeImageProcessor`] provides a unified API for [`StableDiffusionPipeline`]'s to prepare image inputs for VAE encoding and post-processing outputs once they're decoded. This includes transformations such as resizing, normalization, and conversion between PIL Image, PyTorch, and NumPy arrays.
-
-All pipelines with [`VaeImageProcessor`] accepts PIL Image, PyTorch tensor, or NumPy arrays as image inputs and returns outputs based on the `output_type` argument by the user. You can pass encoded image latents directly to the pipeline and return latents from the pipeline as a specific output with the `output_type` argument (for example `output_type="pt"`). This allows you to take the generated latents from one pipeline and pass it to another pipeline as input without leaving the latent space. It also makes it much easier to use multiple pipelines together by passing PyTorch tensors directly between different pipelines.
-
-## VaeImageProcessor
-
-[[autodoc]] image_processor.VaeImageProcessor
-
-## VaeImageProcessorLDM3D
-
-The [`VaeImageProcessorLDM3D`] accepts RGB and depth inputs and returns RGB and depth outputs.
-
-[[autodoc]] image_processor.VaeImageProcessorLDM3D
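To make the `output_type` behaviour described above concrete, a small sketch (assuming the `runwayml/stable-diffusion-v1-5` checkpoint and a CUDA device):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a photo of an astronaut riding a horse"

# "pil" (default), "np" and "pt" control what the VaeImageProcessor returns
# after the latents have been decoded by the VAE.
images_pt = pipe(prompt, output_type="pt").images      # torch.Tensor

# "latent" skips VAE decoding entirely, so the result can be handed to
# another pipeline without leaving the latent space.
latents = pipe(prompt, output_type="latent").images
```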
@@ -1,3 +0,0 @@
-# Overview
-
-The APIs in this section are more experimental and prone to breaking changes. Most of them are used internally for development, but they may also be useful to you if you're interested in building a diffusion model with some custom parts or if you're interested in some of our helper utilities for working with 🤗 Diffusers.
@@ -1,49 +0,0 @@
-<!--Copyright 2023 The HuggingFace Team. All rights reserved.
-
-Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
-the License. You may obtain a copy of the License at
-
-http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
-an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
-specific language governing permissions and limitations under the License.
--->
-
-# Loaders
-
-Adapters (textual inversion, LoRA, hypernetworks) allow you to modify a diffusion model to generate images in a specific style without training or finetuning the entire model. The adapter weights are typically only a tiny fraction of the pretrained model's which making them very portable. 🤗 Diffusers provides an easy-to-use `LoaderMixin` API to load adapter weights.
-
-<Tip warning={true}>
-
-🧪 The `LoaderMixins` are highly experimental and prone to future changes. To use private or [gated](https://huggingface.co/docs/hub/models-gated#gated-models) models, log-in with `huggingface-cli login`.
-
-</Tip>
-
-## UNet2DConditionLoadersMixin
-
-[[autodoc]] loaders.UNet2DConditionLoadersMixin
-
-## TextualInversionLoaderMixin
-
-[[autodoc]] loaders.TextualInversionLoaderMixin
-
-## StableDiffusionXLLoraLoaderMixin
-
-[[autodoc]] loaders.StableDiffusionXLLoraLoaderMixin
-
-## LoraLoaderMixin
-
-[[autodoc]] loaders.LoraLoaderMixin
-
-## FromSingleFileMixin
-
-[[autodoc]] loaders.FromSingleFileMixin
-
-## FromOriginalControlnetMixin
-
-[[autodoc]] loaders.FromOriginalControlnetMixin
-
-## FromOriginalVAEMixin
-
-[[autodoc]] loaders.FromOriginalVAEMixin
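As a rough sketch of the single-file loaders listed above (the checkpoint URL is the same one used in the ControlNet example later in this diff and is only illustrative):

```python
from diffusers import StableDiffusionPipeline

# FromSingleFileMixin: build a complete pipeline from a single original-format checkpoint.
url = "https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/v1-5-pruned.safetensors"
pipe = StableDiffusionPipeline.from_single_file(url)
```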
docs/source/en/api/loaders.mdx (new file, 42 lines)
@@ -0,0 +1,42 @@
+<!--Copyright 2023 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# Loaders
+
+There are many ways to train adapter neural networks for diffusion models, such as
+- [Textual Inversion](./training/text_inversion.mdx)
+- [LoRA](https://github.com/cloneofsimo/lora)
+- [Hypernetworks](https://arxiv.org/abs/1609.09106)
+
+Such adapter neural networks often only consist of a fraction of the number of weights compared
+to the pretrained model and as such are very portable. The Diffusers library offers an easy-to-use
+API to load such adapter neural networks via the [`loaders.py` module](https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders.py).
+
+**Note**: This module is still highly experimental and prone to future changes.
+
+## LoaderMixins
+
+### UNet2DConditionLoadersMixin
+
+[[autodoc]] loaders.UNet2DConditionLoadersMixin
+
+### TextualInversionLoaderMixin
+
+[[autodoc]] loaders.TextualInversionLoaderMixin
+
+### LoraLoaderMixin
+
+[[autodoc]] loaders.LoraLoaderMixin
+
+### FromCkptMixin
+
+[[autodoc]] loaders.FromCkptMixin
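And a minimal sketch of loading an adapter with the mixins above; the `sd-concepts-library/cat-toy` textual-inversion embedding is used only as an example:

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# TextualInversionLoaderMixin: pull a learned concept embedding into the text encoder.
pipe.load_textual_inversion("sd-concepts-library/cat-toy")

image = pipe("A <cat-toy> themed backpack").images[0]
```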
@@ -1,96 +0,0 @@
|
|||||||
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
|
|
||||||
|
|
||||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
|
||||||
the License. You may obtain a copy of the License at
|
|
||||||
|
|
||||||
http://www.apache.org/licenses/LICENSE-2.0
|
|
||||||
|
|
||||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
|
||||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
|
||||||
specific language governing permissions and limitations under the License.
|
|
||||||
-->
|
|
||||||
|
|
||||||
# Logging
|
|
||||||
|
|
||||||
🤗 Diffusers has a centralized logging system to easily manage the verbosity of the library. The default verbosity is set to `WARNING`.
|
|
||||||
|
|
||||||
To change the verbosity level, use one of the direct setters. For instance, to change the verbosity to the `INFO` level.
|
|
||||||
|
|
||||||
```python
|
|
||||||
import diffusers
|
|
||||||
|
|
||||||
diffusers.logging.set_verbosity_info()
|
|
||||||
```
|
|
||||||
|
|
||||||
You can also use the environment variable `DIFFUSERS_VERBOSITY` to override the default verbosity. You can set it
|
|
||||||
to one of the following: `debug`, `info`, `warning`, `error`, `critical`. For example:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
DIFFUSERS_VERBOSITY=error ./myprogram.py
|
|
||||||
```
|
|
||||||
|
|
||||||
Additionally, some `warnings` can be disabled by setting the environment variable
|
|
||||||
`DIFFUSERS_NO_ADVISORY_WARNINGS` to a true value, like `1`. This disables any warning logged by
|
|
||||||
[`logger.warning_advice`]. For example:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
DIFFUSERS_NO_ADVISORY_WARNINGS=1 ./myprogram.py
|
|
||||||
```
|
|
||||||
|
|
||||||
Here is an example of how to use the same logger as the library in your own module or script:
|
|
||||||
|
|
||||||
```python
|
|
||||||
from diffusers.utils import logging
|
|
||||||
|
|
||||||
logging.set_verbosity_info()
|
|
||||||
logger = logging.get_logger("diffusers")
|
|
||||||
logger.info("INFO")
|
|
||||||
logger.warning("WARN")
|
|
||||||
```
|
|
||||||
|
|
||||||
|
|
||||||
All methods of the logging module are documented below. The main methods are
|
|
||||||
[`logging.get_verbosity`] to get the current level of verbosity in the logger and
|
|
||||||
[`logging.set_verbosity`] to set the verbosity to the level of your choice.
|
|
||||||
|
|
||||||
In order from the least verbose to the most verbose:
|
|
||||||
|
|
||||||
| Method | Integer value | Description |
|
|
||||||
|----------------------------------------------------------:|--------------:|----------------------------------------------------:|
|
|
||||||
| `diffusers.logging.CRITICAL` or `diffusers.logging.FATAL` | 50 | only report the most critical errors |
|
|
||||||
| `diffusers.logging.ERROR` | 40 | only report errors |
|
|
||||||
| `diffusers.logging.WARNING` or `diffusers.logging.WARN` | 30 | only report errors and warnings (default) |
|
|
||||||
| `diffusers.logging.INFO` | 20 | only report errors, warnings, and basic information |
|
|
||||||
| `diffusers.logging.DEBUG` | 10 | report all information |
|
|
||||||
|
|
||||||
By default, `tqdm` progress bars are displayed during model download. [`logging.disable_progress_bar`] and [`logging.enable_progress_bar`] are used to enable or disable this behavior.
|
|
||||||
|
|
||||||
## Base setters
|
|
||||||
|
|
||||||
[[autodoc]] utils.logging.set_verbosity_error
|
|
||||||
|
|
||||||
[[autodoc]] utils.logging.set_verbosity_warning
|
|
||||||
|
|
||||||
[[autodoc]] utils.logging.set_verbosity_info
|
|
||||||
|
|
||||||
[[autodoc]] utils.logging.set_verbosity_debug
|
|
||||||
|
|
||||||
## Other functions
|
|
||||||
|
|
||||||
[[autodoc]] utils.logging.get_verbosity
|
|
||||||
|
|
||||||
[[autodoc]] utils.logging.set_verbosity
|
|
||||||
|
|
||||||
[[autodoc]] utils.logging.get_logger
|
|
||||||
|
|
||||||
[[autodoc]] utils.logging.enable_default_handler
|
|
||||||
|
|
||||||
[[autodoc]] utils.logging.disable_default_handler
|
|
||||||
|
|
||||||
[[autodoc]] utils.logging.enable_explicit_format
|
|
||||||
|
|
||||||
[[autodoc]] utils.logging.reset_format
|
|
||||||
|
|
||||||
[[autodoc]] utils.logging.enable_progress_bar
|
|
||||||
|
|
||||||
[[autodoc]] utils.logging.disable_progress_bar
|
|
||||||
98
docs/source/en/api/logging.mdx
Normal file
98
docs/source/en/api/logging.mdx
Normal file
@@ -0,0 +1,98 @@
|
|||||||
|
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
|
||||||
|
|
||||||
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||||
|
the License. You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||||
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||||
|
specific language governing permissions and limitations under the License.
|
||||||
|
-->
|
||||||
|
|
||||||
|
# Logging
|
||||||
|
|
||||||
|
🧨 Diffusers has a centralized logging system, so that you can setup the verbosity of the library easily.
|
||||||
|
|
||||||
|
Currently the default verbosity of the library is `WARNING`.
|
||||||
|
|
||||||
|
To change the level of verbosity, just use one of the direct setters. For instance, here is how to change the verbosity
|
||||||
|
to the INFO level.
|
||||||
|
|
||||||
|
```python
|
||||||
|
import diffusers
|
||||||
|
|
||||||
|
diffusers.logging.set_verbosity_info()
|
||||||
|
```
|
||||||
|
|
||||||
|
You can also use the environment variable `DIFFUSERS_VERBOSITY` to override the default verbosity. You can set it
|
||||||
|
to one of the following: `debug`, `info`, `warning`, `error`, `critical`. For example:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
DIFFUSERS_VERBOSITY=error ./myprogram.py
|
||||||
|
```
|
||||||
|
|
||||||
|
Additionally, some `warnings` can be disabled by setting the environment variable
|
||||||
|
`DIFFUSERS_NO_ADVISORY_WARNINGS` to a true value, like *1*. This will disable any warning that is logged using
|
||||||
|
[`logger.warning_advice`]. For example:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
DIFFUSERS_NO_ADVISORY_WARNINGS=1 ./myprogram.py
|
||||||
|
```
|
||||||
|
|
||||||
|
Here is an example of how to use the same logger as the library in your own module or script:
|
||||||
|
|
||||||
|
```python
|
||||||
|
from diffusers.utils import logging
|
||||||
|
|
||||||
|
logging.set_verbosity_info()
|
||||||
|
logger = logging.get_logger("diffusers")
|
||||||
|
logger.info("INFO")
|
||||||
|
logger.warning("WARN")
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
All the methods of this logging module are documented below, the main ones are
|
||||||
|
[`logging.get_verbosity`] to get the current level of verbosity in the logger and
|
||||||
|
[`logging.set_verbosity`] to set the verbosity to the level of your choice. In order (from the least
|
||||||
|
verbose to the most verbose), those levels (with their corresponding int values in parenthesis) are:
|
||||||
|
|
||||||
|
- `diffusers.logging.CRITICAL` or `diffusers.logging.FATAL` (int value, 50): only report the most
|
||||||
|
critical errors.
|
||||||
|
- `diffusers.logging.ERROR` (int value, 40): only report errors.
|
||||||
|
- `diffusers.logging.WARNING` or `diffusers.logging.WARN` (int value, 30): only reports error and
|
||||||
|
warnings. This the default level used by the library.
|
||||||
|
- `diffusers.logging.INFO` (int value, 20): reports error, warnings and basic information.
|
||||||
|
- `diffusers.logging.DEBUG` (int value, 10): report all information.
|
||||||
|
|
||||||
|
By default, `tqdm` progress bars will be displayed during model download. [`logging.disable_progress_bar`] and [`logging.enable_progress_bar`] can be used to suppress or unsuppress this behavior.
|
||||||
|
|
||||||
|
## Base setters
|
||||||
|
|
||||||
|
[[autodoc]] logging.set_verbosity_error
|
||||||
|
|
||||||
|
[[autodoc]] logging.set_verbosity_warning
|
||||||
|
|
||||||
|
[[autodoc]] logging.set_verbosity_info
|
||||||
|
|
||||||
|
[[autodoc]] logging.set_verbosity_debug
|
||||||
|
|
||||||
|
## Other functions
|
||||||
|
|
||||||
|
[[autodoc]] logging.get_verbosity
|
||||||
|
|
||||||
|
[[autodoc]] logging.set_verbosity
|
||||||
|
|
||||||
|
[[autodoc]] logging.get_logger
|
||||||
|
|
||||||
|
[[autodoc]] logging.enable_default_handler
|
||||||
|
|
||||||
|
[[autodoc]] logging.disable_default_handler
|
||||||
|
|
||||||
|
[[autodoc]] logging.enable_explicit_format
|
||||||
|
|
||||||
|
[[autodoc]] logging.reset_format
|
||||||
|
|
||||||
|
[[autodoc]] logging.enable_progress_bar
|
||||||
|
|
||||||
|
[[autodoc]] logging.disable_progress_bar
|
||||||
107
docs/source/en/api/models.mdx
Normal file
107
docs/source/en/api/models.mdx
Normal file
@@ -0,0 +1,107 @@
|
|||||||
|
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
|
||||||
|
|
||||||
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||||
|
the License. You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||||
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||||
|
specific language governing permissions and limitations under the License.
|
||||||
|
-->
|
||||||
|
|
||||||
|
# Models
|
||||||
|
|
||||||
|
Diffusers contains pretrained models for popular algorithms and modules for creating the next set of diffusion models.
|
||||||
|
The primary function of these models is to denoise an input sample, by modeling the distribution $p_\theta(\mathbf{x}_{t-1}|\mathbf{x}_t)$.
|
||||||
|
The models are built on the base class ['ModelMixin'] that is a `torch.nn.module` with basic functionality for saving and loading models both locally and from the HuggingFace hub.
|
||||||
|
|
||||||
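A minimal sketch of the save/load round trip provided by `ModelMixin` (the repository id and subfolder are only examples):

```python
from diffusers import UNet2DConditionModel

# from_pretrained pulls the weights and config from the Hub (or a local path).
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

# save_pretrained writes config.json plus the weights to a local directory,
# which can later be reloaded with from_pretrained.
unet.save_pretrained("./my-unet")
```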
|
## ModelMixin
|
||||||
|
[[autodoc]] ModelMixin
|
||||||
|
|
||||||
|
## UNet2DOutput
|
||||||
|
[[autodoc]] models.unet_2d.UNet2DOutput
|
||||||
|
|
||||||
|
## UNet2DModel
|
||||||
|
[[autodoc]] UNet2DModel
|
||||||
|
|
||||||
|
## UNet1DOutput
|
||||||
|
[[autodoc]] models.unet_1d.UNet1DOutput
|
||||||
|
|
||||||
|
## UNet1DModel
|
||||||
|
[[autodoc]] UNet1DModel
|
||||||
|
|
||||||
|
## UNet2DConditionOutput
|
||||||
|
[[autodoc]] models.unet_2d_condition.UNet2DConditionOutput
|
||||||
|
|
||||||
|
## UNet2DConditionModel
|
||||||
|
[[autodoc]] UNet2DConditionModel
|
||||||
|
|
||||||
|
## UNet3DConditionOutput
|
||||||
|
[[autodoc]] models.unet_3d_condition.UNet3DConditionOutput
|
||||||
|
|
||||||
|
## UNet3DConditionModel
|
||||||
|
[[autodoc]] UNet3DConditionModel
|
||||||
|
|
||||||
|
## DecoderOutput
|
||||||
|
[[autodoc]] models.vae.DecoderOutput
|
||||||
|
|
||||||
|
## VQEncoderOutput
|
||||||
|
[[autodoc]] models.vq_model.VQEncoderOutput
|
||||||
|
|
||||||
|
## VQModel
|
||||||
|
[[autodoc]] VQModel
|
||||||
|
|
||||||
|
## AutoencoderKLOutput
|
||||||
|
[[autodoc]] models.autoencoder_kl.AutoencoderKLOutput
|
||||||
|
|
||||||
|
## AutoencoderKL
|
||||||
|
[[autodoc]] AutoencoderKL
|
||||||
|
|
||||||
|
## Transformer2DModel
|
||||||
|
[[autodoc]] Transformer2DModel
|
||||||
|
|
||||||
|
## Transformer2DModelOutput
|
||||||
|
[[autodoc]] models.transformer_2d.Transformer2DModelOutput
|
||||||
|
|
||||||
|
## TransformerTemporalModel
|
||||||
|
[[autodoc]] models.transformer_temporal.TransformerTemporalModel
|
||||||
|
|
||||||
|
## Transformer2DModelOutput
|
||||||
|
[[autodoc]] models.transformer_temporal.TransformerTemporalModelOutput
|
||||||
|
|
||||||
|
## PriorTransformer
|
||||||
|
[[autodoc]] models.prior_transformer.PriorTransformer
|
||||||
|
|
||||||
|
## PriorTransformerOutput
|
||||||
|
[[autodoc]] models.prior_transformer.PriorTransformerOutput
|
||||||
|
|
||||||
|
## ControlNetOutput
|
||||||
|
[[autodoc]] models.controlnet.ControlNetOutput
|
||||||
|
|
||||||
|
## ControlNetModel
|
||||||
|
[[autodoc]] ControlNetModel
|
||||||
|
|
||||||
|
## FlaxModelMixin
|
||||||
|
[[autodoc]] FlaxModelMixin
|
||||||
|
|
||||||
|
## FlaxUNet2DConditionOutput
|
||||||
|
[[autodoc]] models.unet_2d_condition_flax.FlaxUNet2DConditionOutput
|
||||||
|
|
||||||
|
## FlaxUNet2DConditionModel
|
||||||
|
[[autodoc]] FlaxUNet2DConditionModel
|
||||||
|
|
||||||
|
## FlaxDecoderOutput
|
||||||
|
[[autodoc]] models.vae_flax.FlaxDecoderOutput
|
||||||
|
|
||||||
|
## FlaxAutoencoderKLOutput
|
||||||
|
[[autodoc]] models.vae_flax.FlaxAutoencoderKLOutput
|
||||||
|
|
||||||
|
## FlaxAutoencoderKL
|
||||||
|
[[autodoc]] FlaxAutoencoderKL
|
||||||
|
|
||||||
|
## FlaxControlNetOutput
|
||||||
|
[[autodoc]] models.controlnet_flax.FlaxControlNetOutput
|
||||||
|
|
||||||
|
## FlaxControlNetModel
|
||||||
|
[[autodoc]] FlaxControlNetModel
|
||||||
@@ -1,55 +0,0 @@
|
|||||||
# AsymmetricAutoencoderKL
|
|
||||||
|
|
||||||
Improved larger variational autoencoder (VAE) model with KL loss for inpainting task: [Designing a Better Asymmetric VQGAN for StableDiffusion](https://arxiv.org/abs/2306.04632) by Zixin Zhu, Xuelu Feng, Dongdong Chen, Jianmin Bao, Le Wang, Yinpeng Chen, Lu Yuan, Gang Hua.
|
|
||||||
|
|
||||||
The abstract from the paper is:
|
|
||||||
|
|
||||||
*StableDiffusion is a revolutionary text-to-image generator that is causing a stir in the world of image generation and editing. Unlike traditional methods that learn a diffusion model in pixel space, StableDiffusion learns a diffusion model in the latent space via a VQGAN, ensuring both efficiency and quality. It not only supports image generation tasks, but also enables image editing for real images, such as image inpainting and local editing. However, we have observed that the vanilla VQGAN used in StableDiffusion leads to significant information loss, causing distortion artifacts even in non-edited image regions. To this end, we propose a new asymmetric VQGAN with two simple designs. Firstly, in addition to the input from the encoder, the decoder contains a conditional branch that incorporates information from task-specific priors, such as the unmasked image region in inpainting. Secondly, the decoder is much heavier than the encoder, allowing for more detailed recovery while only slightly increasing the total inference cost. The training cost of our asymmetric VQGAN is cheap, and we only need to retrain a new asymmetric decoder while keeping the vanilla VQGAN encoder and StableDiffusion unchanged. Our asymmetric VQGAN can be widely used in StableDiffusion-based inpainting and local editing methods. Extensive experiments demonstrate that it can significantly improve the inpainting and editing performance, while maintaining the original text-to-image capability. The code is available at https://github.com/buxiangzhiren/Asymmetric_VQGAN*
|
|
||||||
|
|
||||||
Evaluation results can be found in section 4.1 of the original paper.
|
|
||||||
|
|
||||||
## Available checkpoints
|
|
||||||
|
|
||||||
* [https://huggingface.co/cross-attention/asymmetric-autoencoder-kl-x-1-5](https://huggingface.co/cross-attention/asymmetric-autoencoder-kl-x-1-5)
|
|
||||||
* [https://huggingface.co/cross-attention/asymmetric-autoencoder-kl-x-2](https://huggingface.co/cross-attention/asymmetric-autoencoder-kl-x-2)
|
|
||||||
|
|
||||||
## Example Usage
|
|
||||||
|
|
||||||
```python
|
|
||||||
from io import BytesIO
|
|
||||||
from PIL import Image
|
|
||||||
import requests
|
|
||||||
from diffusers import AsymmetricAutoencoderKL, StableDiffusionInpaintPipeline
|
|
||||||
|
|
||||||
|
|
||||||
def download_image(url: str) -> Image.Image:
|
|
||||||
response = requests.get(url)
|
|
||||||
return Image.open(BytesIO(response.content)).convert("RGB")
|
|
||||||
|
|
||||||
|
|
||||||
prompt = "a photo of a person"
|
|
||||||
img_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/celeba_hq_256.png"
|
|
||||||
mask_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/mask_256.png"
|
|
||||||
|
|
||||||
image = download_image(img_url).resize((256, 256))
|
|
||||||
mask_image = download_image(mask_url).resize((256, 256))
|
|
||||||
|
|
||||||
pipe = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting")
|
|
||||||
pipe.vae = AsymmetricAutoencoderKL.from_pretrained("cross-attention/asymmetric-autoencoder-kl-x-1-5")
|
|
||||||
pipe.to("cuda")
|
|
||||||
|
|
||||||
image = pipe(prompt=prompt, image=image, mask_image=mask_image).images[0]
|
|
||||||
image.save("image.jpeg")
|
|
||||||
```
|
|
||||||
|
|
||||||
## AsymmetricAutoencoderKL
|
|
||||||
|
|
||||||
[[autodoc]] models.autoencoder_asym_kl.AsymmetricAutoencoderKL
|
|
||||||
|
|
||||||
## AutoencoderKLOutput
|
|
||||||
|
|
||||||
[[autodoc]] models.autoencoder_kl.AutoencoderKLOutput
|
|
||||||
|
|
||||||
## DecoderOutput
|
|
||||||
|
|
||||||
[[autodoc]] models.vae.DecoderOutput
|
|
||||||
@@ -1,45 +0,0 @@
|
|||||||
# Tiny AutoEncoder
|
|
||||||
|
|
||||||
Tiny AutoEncoder for Stable Diffusion (TAESD) was introduced in [madebyollin/taesd](https://github.com/madebyollin/taesd) by Ollin Boer Bohan. It is a tiny distilled version of Stable Diffusion's VAE that can quickly decode the latents in a [`StableDiffusionPipeline`] or [`StableDiffusionXLPipeline`] almost instantly.
|
|
||||||
|
|
||||||
To use with Stable Diffusion v-2.1:
|
|
||||||
|
|
||||||
```python
|
|
||||||
import torch
|
|
||||||
from diffusers import DiffusionPipeline, AutoencoderTiny
|
|
||||||
|
|
||||||
pipe = DiffusionPipeline.from_pretrained(
|
|
||||||
"stabilityai/stable-diffusion-2-1-base", torch_dtype=torch.float16
|
|
||||||
)
|
|
||||||
pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd", torch_dtype=torch.float16)
|
|
||||||
pipe = pipe.to("cuda")
|
|
||||||
|
|
||||||
prompt = "slice of delicious New York-style berry cheesecake"
|
|
||||||
image = pipe(prompt, num_inference_steps=25).images[0]
|
|
||||||
image.save("cheesecake.png")
|
|
||||||
```
|
|
||||||
|
|
||||||
To use with Stable Diffusion XL 1.0
|
|
||||||
|
|
||||||
```python
|
|
||||||
import torch
|
|
||||||
from diffusers import DiffusionPipeline, AutoencoderTiny
|
|
||||||
|
|
||||||
pipe = DiffusionPipeline.from_pretrained(
|
|
||||||
"stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
|
|
||||||
)
|
|
||||||
pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesdxl", torch_dtype=torch.float16)
|
|
||||||
pipe = pipe.to("cuda")
|
|
||||||
|
|
||||||
prompt = "slice of delicious New York-style berry cheesecake"
|
|
||||||
image = pipe(prompt, num_inference_steps=25).images[0]
|
|
||||||
image.save("cheesecake_sdxl.png")
|
|
||||||
```
|
|
||||||
|
|
||||||
## AutoencoderTiny
|
|
||||||
|
|
||||||
[[autodoc]] AutoencoderTiny
|
|
||||||
|
|
||||||
## AutoencoderTinyOutput
|
|
||||||
|
|
||||||
[[autodoc]] models.autoencoder_tiny.AutoencoderTinyOutput
|
|
||||||
@@ -1,43 +0,0 @@
# AutoencoderKL

The variational autoencoder (VAE) model with KL loss was introduced in [Auto-Encoding Variational Bayes](https://arxiv.org/abs/1312.6114v11) by Diederik P. Kingma and Max Welling. The model is used in 🤗 Diffusers to encode images into latents and to decode latent representations into images.

The abstract from the paper is:

*How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contributions are two-fold. First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods. Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator. Theoretical advantages are reflected in experimental results.*

## Loading from the original format

By default the [`AutoencoderKL`] should be loaded with [`~ModelMixin.from_pretrained`], but it can also be loaded
from the original format using [`FromOriginalVAEMixin.from_single_file`] as follows:

```py
from diffusers import AutoencoderKL

url = "https://huggingface.co/stabilityai/sd-vae-ft-mse-original/blob/main/vae-ft-mse-840000-ema-pruned.safetensors"  # can also be a local file
model = AutoencoderKL.from_single_file(url)
```
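
However it is loaded, the encode/decode roles described above look roughly like the following sketch (the checkpoint id and tensor shapes are illustrative only):

```python
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

# encode an image tensor scaled to [-1, 1] into a latent distribution and sample from it
image = torch.randn(1, 3, 512, 512)
latents = vae.encode(image).latent_dist.sample()

# decode the latents back into image space
reconstruction = vae.decode(latents).sample
```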
## AutoencoderKL

[[autodoc]] AutoencoderKL

## AutoencoderKLOutput

[[autodoc]] models.autoencoder_kl.AutoencoderKLOutput

## DecoderOutput

[[autodoc]] models.vae.DecoderOutput

## FlaxAutoencoderKL

[[autodoc]] FlaxAutoencoderKL

## FlaxAutoencoderKLOutput

[[autodoc]] models.vae_flax.FlaxAutoencoderKLOutput

## FlaxDecoderOutput

[[autodoc]] models.vae_flax.FlaxDecoderOutput
@@ -1,38 +0,0 @@
# ControlNet

The ControlNet model was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang and Maneesh Agrawala. It provides a greater degree of control over text-to-image generation by conditioning the model on additional inputs such as edge maps, depth maps, segmentation maps, and keypoints for pose detection.

The abstract from the paper is:

*We present a neural network structure, ControlNet, to control pretrained large diffusion models to support additional input conditions. The ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k). Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on a personal devices. Alternatively, if powerful computation clusters are available, the model can scale to large amounts (millions to billions) of data. We report that large diffusion models like Stable Diffusion can be augmented with ControlNets to enable conditional inputs like edge maps, segmentation maps, keypoints, etc. This may enrich the methods to control large diffusion models and further facilitate related applications.*

## Loading from the original format

By default the [`ControlNetModel`] should be loaded with [`~ModelMixin.from_pretrained`], but it can also be loaded
from the original format using [`FromOriginalControlnetMixin.from_single_file`] as follows:

```py
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

url = "https://huggingface.co/lllyasviel/ControlNet-v1-1/blob/main/control_v11p_sd15_canny.pth"  # can also be a local path
controlnet = ControlNetModel.from_single_file(url)

url = "https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/v1-5-pruned.safetensors"  # can also be a local path
pipe = StableDiffusionControlNetPipeline.from_single_file(url, controlnet=controlnet)
```
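
Loading from the Diffusers format follows the usual `from_pretrained` pattern; a minimal sketch, assuming a canny ControlNet checkpoint and an SD 1.5 base model:

```python
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

# load the conditioning model and plug it into the pipeline
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny")
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet
)
```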
## ControlNetModel

[[autodoc]] ControlNetModel

## ControlNetOutput

[[autodoc]] models.controlnet.ControlNetOutput

## FlaxControlNetModel

[[autodoc]] FlaxControlNetModel

## FlaxControlNetOutput

[[autodoc]] models.controlnet_flax.FlaxControlNetOutput
@@ -1,16 +0,0 @@
# Models

🤗 Diffusers provides pretrained models for popular algorithms and modules to create custom diffusion systems. The primary function of models is to denoise an input sample as modeled by the distribution \\(p_{\theta}(x_{t-1}|x_{t})\\).

All models are built from the base [`ModelMixin`] class which is a [`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) providing basic functionality for saving and loading models, locally and from the Hugging Face Hub.
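
Because every model inherits from [`ModelMixin`], the same `save_pretrained`/`from_pretrained` workflow applies to all of them; a minimal sketch (the repo id and local path are only illustrative):

```python
from diffusers import UNet2DConditionModel

# download a model from the Hub (here the UNet stored in a pipeline repository)
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

# save it locally, then reload it from disk
unet.save_pretrained("./my-unet")
unet = UNet2DConditionModel.from_pretrained("./my-unet")
```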
## ModelMixin
[[autodoc]] ModelMixin

## FlaxModelMixin

[[autodoc]] FlaxModelMixin

## PushToHubMixin

[[autodoc]] utils.PushToHubMixin
@@ -1,16 +0,0 @@
# Prior Transformer

The Prior Transformer was originally introduced in [Hierarchical Text-Conditional Image Generation with CLIP Latents](https://huggingface.co/papers/2204.06125) by Ramesh et al. It is used to predict CLIP image embeddings from CLIP text embeddings; image embeddings are predicted through a denoising diffusion process.

The abstract from the paper is:

*Contrastive models like CLIP have been shown to learn robust representations of images that capture both semantics and style. To leverage these representations for image generation, we propose a two-stage model: a prior that generates a CLIP image embedding given a text caption, and a decoder that generates an image conditioned on the image embedding. We show that explicitly generating image representations improves image diversity with minimal loss in photorealism and caption similarity. Our decoders conditioned on image representations can also produce variations of an image that preserve both its semantics and style, while varying the non-essential details absent from the image representation. Moreover, the joint embedding space of CLIP enables language-guided image manipulations in a zero-shot fashion. We use diffusion models for the decoder and experiment with both autoregressive and diffusion models for the prior, finding that the latter are computationally more efficient and produce higher-quality samples.*
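
In 🤗 Diffusers the prior is used, for example, by the Kandinsky pipelines; a minimal loading sketch (assuming the `kandinsky-community/kandinsky-2-1-prior` repository, whose `prior` subfolder stores a `PriorTransformer`):

```python
from diffusers import PriorTransformer

# load the diffusion prior that maps CLIP text embeddings to CLIP image embeddings
prior = PriorTransformer.from_pretrained(
    "kandinsky-community/kandinsky-2-1-prior", subfolder="prior"
)
```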
## PriorTransformer

[[autodoc]] PriorTransformer

## PriorTransformerOutput

[[autodoc]] models.prior_transformer.PriorTransformerOutput
@@ -1,29 +0,0 @@
# Transformer2D

A Transformer model for image-like data from [CompVis](https://huggingface.co/CompVis) that is based on the [Vision Transformer](https://huggingface.co/papers/2010.11929) introduced by Dosovitskiy et al. The [`Transformer2DModel`] accepts discrete (classes of vector embeddings) or continuous (actual embeddings) inputs.

When the input is **continuous**:

1. Project the input and reshape it to `(batch_size, sequence_length, feature_dimension)`.
2. Apply the Transformer blocks in the standard way.
3. Reshape to image.

When the input is **discrete**:

<Tip>

It is assumed one of the input classes is the masked latent pixel. The predicted classes of the unnoised image don't contain a prediction for the masked pixel because the unnoised image cannot be masked.

</Tip>

1. Convert input (classes of latent pixels) to embeddings and apply positional embeddings.
2. Apply the Transformer blocks in the standard way.
3. Predict classes of unnoised image.
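
A minimal sketch of the continuous path described above (the configuration values are illustrative; note that `in_channels` should be divisible by the default `norm_num_groups` of 32):

```python
import torch
from diffusers import Transformer2DModel

# a small self-attention-only transformer over 64-channel feature maps
model = Transformer2DModel(
    num_attention_heads=8,
    attention_head_dim=32,
    in_channels=64,
    num_layers=1,
)

hidden_states = torch.randn(1, 64, 32, 32)  # (batch, channels, height, width)
output = model(hidden_states).sample        # reshaped back to the input's spatial shape
```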
## Transformer2DModel

[[autodoc]] Transformer2DModel

## Transformer2DModelOutput

[[autodoc]] models.transformer_2d.Transformer2DModelOutput
@@ -1,11 +0,0 @@
# Transformer Temporal

A Transformer model for video-like data.

## TransformerTemporalModel

[[autodoc]] models.transformer_temporal.TransformerTemporalModel

## TransformerTemporalModelOutput

[[autodoc]] models.transformer_temporal.TransformerTemporalModelOutput
@@ -1,13 +0,0 @@
# UNetMotionModel

The [UNet](https://huggingface.co/papers/1505.04597) model was originally introduced by Ronneberger et al. for biomedical image segmentation, but it is also commonly used in 🤗 Diffusers because it outputs images that are the same size as the input. It is one of the most important components of a diffusion system because it facilitates the actual diffusion process. There are several variants of the UNet model in 🤗 Diffusers, depending on its number of dimensions and whether it is a conditional model or not. This is a 2D UNet model extended with motion modules for video generation.

The abstract from the paper is:

*There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net.*

## UNetMotionModel
[[autodoc]] UNetMotionModel

## UNet3DConditionOutput
[[autodoc]] models.unet_3d_condition.UNet3DConditionOutput
@@ -1,13 +0,0 @@
# UNet1DModel

The [UNet](https://huggingface.co/papers/1505.04597) model was originally introduced by Ronneberger et al. for biomedical image segmentation, but it is also commonly used in 🤗 Diffusers because it outputs images that are the same size as the input. It is one of the most important components of a diffusion system because it facilitates the actual diffusion process. There are several variants of the UNet model in 🤗 Diffusers, depending on its number of dimensions and whether it is a conditional model or not. This is a 1D UNet model.

The abstract from the paper is:

*There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net.*

## UNet1DModel
[[autodoc]] UNet1DModel

## UNet1DOutput
[[autodoc]] models.unet_1d.UNet1DOutput
@@ -1,19 +0,0 @@
# UNet2DConditionModel

The [UNet](https://huggingface.co/papers/1505.04597) model was originally introduced by Ronneberger et al. for biomedical image segmentation, but it is also commonly used in 🤗 Diffusers because it outputs images that are the same size as the input. It is one of the most important components of a diffusion system because it facilitates the actual diffusion process. There are several variants of the UNet model in 🤗 Diffusers, depending on its number of dimensions and whether it is a conditional model or not. This is a 2D UNet conditional model.

The abstract from the paper is:

*There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net.*
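
A minimal sketch of loading the UNet from a pipeline repository and predicting a noise residual for conditional generation (the repo id and tensor shapes are illustrative; SD 1.5 uses 4 latent channels and 768-dimensional text embeddings):

```python
import torch
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="unet")

sample = torch.randn(1, unet.config.in_channels, 64, 64)                       # noisy latents
encoder_hidden_states = torch.randn(1, 77, unet.config.cross_attention_dim)    # text embeddings
with torch.no_grad():
    noise_pred = unet(sample, timestep=10, encoder_hidden_states=encoder_hidden_states).sample
```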
## UNet2DConditionModel
[[autodoc]] UNet2DConditionModel

## UNet2DConditionOutput
[[autodoc]] models.unet_2d_condition.UNet2DConditionOutput

## FlaxUNet2DConditionModel
[[autodoc]] models.unet_2d_condition_flax.FlaxUNet2DConditionModel

## FlaxUNet2DConditionOutput
[[autodoc]] models.unet_2d_condition_flax.FlaxUNet2DConditionOutput
@@ -1,13 +0,0 @@
# UNet2DModel

The [UNet](https://huggingface.co/papers/1505.04597) model was originally introduced by Ronneberger et al. for biomedical image segmentation, but it is also commonly used in 🤗 Diffusers because it outputs images that are the same size as the input. It is one of the most important components of a diffusion system because it facilitates the actual diffusion process. There are several variants of the UNet model in 🤗 Diffusers, depending on its number of dimensions and whether it is a conditional model or not. This is a 2D UNet model.

The abstract from the paper is:

*There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net.*
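
A minimal sketch of loading an unconditional 2D UNet checkpoint and predicting the noise residual for a noisy sample (the repo id is illustrative):

```python
import torch
from diffusers import UNet2DModel

model = UNet2DModel.from_pretrained("google/ddpm-cat-256")

# a random "noisy" sample with the model's expected shape
noisy_sample = torch.randn(
    1, model.config.in_channels, model.config.sample_size, model.config.sample_size
)
with torch.no_grad():
    noise_pred = model(sample=noisy_sample, timestep=2).sample
```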
## UNet2DModel
[[autodoc]] UNet2DModel

## UNet2DOutput
[[autodoc]] models.unet_2d.UNet2DOutput
@@ -1,13 +0,0 @@
# UNet3DConditionModel

The [UNet](https://huggingface.co/papers/1505.04597) model was originally introduced by Ronneberger et al. for biomedical image segmentation, but it is also commonly used in 🤗 Diffusers because it outputs images that are the same size as the input. It is one of the most important components of a diffusion system because it facilitates the actual diffusion process. There are several variants of the UNet model in 🤗 Diffusers, depending on its number of dimensions and whether it is a conditional model or not. This is a 3D UNet conditional model.

The abstract from the paper is:

*There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net.*

## UNet3DConditionModel
[[autodoc]] UNet3DConditionModel

## UNet3DConditionOutput
[[autodoc]] models.unet_3d_condition.UNet3DConditionOutput
@@ -1,15 +0,0 @@
# VQModel

The VQ-VAE model was introduced in [Neural Discrete Representation Learning](https://huggingface.co/papers/1711.00937) by Aaron van den Oord, Oriol Vinyals and Koray Kavukcuoglu. The model is used in 🤗 Diffusers to decode latent representations into images. Unlike [`AutoencoderKL`], the [`VQModel`] works in a quantized latent space.

The abstract from the paper is:

*Learning useful representations without supervision remains a key challenge in machine learning. In this paper, we propose a simple yet powerful generative model that learns such discrete representations. Our model, the Vector Quantised-Variational AutoEncoder (VQ-VAE), differs from VAEs in two key ways: the encoder network outputs discrete, rather than continuous, codes; and the prior is learnt rather than static. In order to learn a discrete latent representation, we incorporate ideas from vector quantisation (VQ). Using the VQ method allows the model to circumvent issues of "posterior collapse" -- where the latents are ignored when they are paired with a powerful autoregressive decoder -- typically observed in the VAE framework. Pairing these representations with an autoregressive prior, the model can generate high quality images, videos, and speech as well as doing high quality speaker conversion and unsupervised learning of phonemes, providing further evidence of the utility of the learnt representations.*
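
A minimal sketch of the quantized encode/decode round trip (the checkpoint id is an assumption here; `CompVis/ldm-celebahq-256` stores its VQ-VAE under the `vqvae` subfolder):

```python
import torch
from diffusers import VQModel

vqvae = VQModel.from_pretrained("CompVis/ldm-celebahq-256", subfolder="vqvae")

image = torch.randn(1, 3, 256, 256)
latents = vqvae.encode(image).latents          # continuous encoder output
reconstruction = vqvae.decode(latents).sample  # quantized, then decoded back to image space
```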
## VQModel

[[autodoc]] VQModel

## VQEncoderOutput

[[autodoc]] models.vq_model.VQEncoderOutput
@@ -1,15 +0,0 @@
# Normalization layers

Customized normalization layers for supporting various models in 🤗 Diffusers.

## AdaLayerNorm

[[autodoc]] models.normalization.AdaLayerNorm

## AdaLayerNormZero

[[autodoc]] models.normalization.AdaLayerNormZero

## AdaGroupNorm

[[autodoc]] models.normalization.AdaGroupNorm
@@ -1,67 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Outputs

All model outputs are subclasses of [`~utils.BaseOutput`], data structures containing all the information returned by the model. The outputs can also be used as tuples or dictionaries.

For example:

```python
from diffusers import DDIMPipeline

pipeline = DDIMPipeline.from_pretrained("google/ddpm-cifar10-32")
outputs = pipeline()
```

The `outputs` object is a [`~pipelines.ImagePipelineOutput`], which means it has an `images` attribute.

You can access each attribute as you normally would or with a keyword lookup, and if that attribute is not returned by the model, you will get `None`:

```python
outputs.images
outputs["images"]
```

When considering the `outputs` object as a tuple, it only considers the attributes that don't have `None` values.
For instance, retrieving the images by slicing it returns the tuple `(outputs.images,)`:

```python
outputs[:1]
```

<Tip>

To check a specific pipeline or model output, refer to its corresponding API documentation.

</Tip>

## BaseOutput

[[autodoc]] utils.BaseOutput
- to_tuple

## ImagePipelineOutput

[[autodoc]] pipelines.ImagePipelineOutput

## FlaxImagePipelineOutput

[[autodoc]] pipelines.pipeline_flax_utils.FlaxImagePipelineOutput

## AudioPipelineOutput

[[autodoc]] pipelines.AudioPipelineOutput

## ImageTextPipelineOutput

[[autodoc]] ImageTextPipelineOutput
55
docs/source/en/api/outputs.mdx
Normal file
@@ -0,0 +1,55 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# BaseOutputs

All models have outputs that are instances of subclasses of [`~utils.BaseOutput`]. Those are
data structures containing all the information returned by the model, but they can also be used as tuples or
dictionaries.

Let's see how this looks in an example:

```python
from diffusers import DDIMPipeline

pipeline = DDIMPipeline.from_pretrained("google/ddpm-cifar10-32")
outputs = pipeline()
```

The `outputs` object is a [`~pipelines.ImagePipelineOutput`]; as we can see in the
documentation of that class below, it has an `images` attribute.

You can access each attribute as you would usually do, and if that attribute has not been returned by the model, you will get `None`:

```python
outputs.images
```

or via keyword lookup:

```python
outputs["images"]
```

When considering our `outputs` object as a tuple, it only considers the attributes that don't have `None` values.
Here, for instance, we could retrieve images via indexing:

```python
outputs[:1]
```

which will return the tuple `(outputs.images,)`.

## BaseOutput

[[autodoc]] utils.BaseOutput
- to_tuple
@@ -1,47 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# AltDiffusion

AltDiffusion was proposed in [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://huggingface.co/papers/2211.06679) by Zhongzhi Chen, Guang Liu, Bo-Wen Zhang, Fulong Ye, Qinghong Yang, Ledell Wu.

The abstract from the paper is:

*In this work, we present a conceptually simple and effective method to train a strong bilingual multimodal representation model. Starting from the pretrained multimodal representation model CLIP released by OpenAI, we switched its text encoder with a pretrained multilingual text encoder XLM-R, and aligned both languages and image representations by a two-stage training schema consisting of teacher learning and contrastive learning. We validate our method through evaluations of a wide range of tasks. We set new state-of-the-art performances on a bunch of tasks including ImageNet-CN, Flicker30k- CN, and COCO-CN. Further, we obtain very close performances with CLIP on almost all tasks, suggesting that one can simply alter the text encoder in CLIP for extended capabilities such as multilingual understanding.*

## Tips

`AltDiffusion` is conceptually the same as [Stable Diffusion](./stable_diffusion/overview).

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>
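
Since the pipeline API mirrors [`StableDiffusionPipeline`], a minimal text-to-image sketch looks like this (the prompt and dtype are illustrative):

```python
import torch
from diffusers import AltDiffusionPipeline

pipe = AltDiffusionPipeline.from_pretrained("BAAI/AltDiffusion-m9", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# the text encoder is multilingual, so non-English prompts work as well
image = pipe("a photo of an astronaut riding a horse on mars").images[0]
image.save("astronaut.png")
```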
## AltDiffusionPipeline

[[autodoc]] AltDiffusionPipeline
- all
- __call__

## AltDiffusionImg2ImgPipeline

[[autodoc]] AltDiffusionImg2ImgPipeline
- all
- __call__

## AltDiffusionPipelineOutput

[[autodoc]] pipelines.alt_diffusion.AltDiffusionPipelineOutput
- all
- __call__
83
docs/source/en/api/pipelines/alt_diffusion.mdx
Normal file
@@ -0,0 +1,83 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# AltDiffusion

AltDiffusion was proposed in [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) by Zhongzhi Chen, Guang Liu, Bo-Wen Zhang, Fulong Ye, Qinghong Yang, Ledell Wu.

The abstract of the paper is the following:

*In this work, we present a conceptually simple and effective method to train a strong bilingual multimodal representation model. Starting from the pretrained multimodal representation model CLIP released by OpenAI, we switched its text encoder with a pretrained multilingual text encoder XLM-R, and aligned both languages and image representations by a two-stage training schema consisting of teacher learning and contrastive learning. We validate our method through evaluations of a wide range of tasks. We set new state-of-the-art performances on a bunch of tasks including ImageNet-CN, Flicker30k- CN, and COCO-CN. Further, we obtain very close performances with CLIP on almost all tasks, suggesting that one can simply alter the text encoder in CLIP for extended capabilities such as multilingual understanding.*

*Overview*:

| Pipeline | Tasks | Colab | Demo |
|---|---|:---:|:---:|
| [pipeline_alt_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/alt_diffusion/pipeline_alt_diffusion.py) | *Text-to-Image Generation* | - | - |
| [pipeline_alt_diffusion_img2img.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/alt_diffusion/pipeline_alt_diffusion_img2img.py) | *Image-to-Image Text-Guided Generation* | - | - |

## Tips

- AltDiffusion is conceptually exactly the same as [Stable Diffusion](./stable_diffusion/overview).

- *Run AltDiffusion*

AltDiffusion can be tested very easily with the [`AltDiffusionPipeline`], [`AltDiffusionImg2ImgPipeline`] and the `"BAAI/AltDiffusion-m9"` checkpoint exactly in the same way as shown in the [Conditional Image Generation Guide](../../using-diffusers/conditional_image_generation) and the [Image-to-Image Generation Guide](../../using-diffusers/img2img).

- *How to load and use different schedulers*

The AltDiffusion pipeline uses the [`DDIMScheduler`] by default, but `diffusers` provides many other schedulers that can be used with the AltDiffusion pipeline, such as [`PNDMScheduler`], [`LMSDiscreteScheduler`], [`EulerDiscreteScheduler`], [`EulerAncestralDiscreteScheduler`], etc.
To use a different scheduler, you can either change it via the [`ConfigMixin.from_config`] method or pass the `scheduler` argument to the `from_pretrained` method of the pipeline. For example, to use the [`EulerDiscreteScheduler`], you can do the following:

```python
>>> from diffusers import AltDiffusionPipeline, EulerDiscreteScheduler

>>> pipeline = AltDiffusionPipeline.from_pretrained("BAAI/AltDiffusion-m9")
>>> pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config)

>>> # or
>>> euler_scheduler = EulerDiscreteScheduler.from_pretrained("BAAI/AltDiffusion-m9", subfolder="scheduler")
>>> pipeline = AltDiffusionPipeline.from_pretrained("BAAI/AltDiffusion-m9", scheduler=euler_scheduler)
```

- *How to cover all use cases with a single or multiple pipelines*

If you want to use all possible use cases in a single `DiffusionPipeline`, we recommend using the `components` functionality to instantiate all components in the most memory-efficient way:

```python
>>> from diffusers import (
...     AltDiffusionPipeline,
...     AltDiffusionImg2ImgPipeline,
... )

>>> text2img = AltDiffusionPipeline.from_pretrained("BAAI/AltDiffusion-m9")
>>> img2img = AltDiffusionImg2ImgPipeline(**text2img.components)

>>> # now you can use text2img(...) and img2img(...) just like the call methods of each respective pipeline
```

## AltDiffusionPipelineOutput
[[autodoc]] pipelines.alt_diffusion.AltDiffusionPipelineOutput
- all
- __call__

## AltDiffusionPipeline
[[autodoc]] AltDiffusionPipeline
- all
- __call__

## AltDiffusionImg2ImgPipeline
[[autodoc]] AltDiffusionImg2ImgPipeline
- all
- __call__
@@ -1,230 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Text-to-Video Generation with AnimateDiff

## Overview

[AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning](https://arxiv.org/abs/2307.04725) by Yuwei Guo, Ceyuan Yang*, Anyi Rao, Yaohui Wang, Yu Qiao, Dahua Lin, Bo Dai

The abstract of the paper is the following:

*With the advance of text-to-image models (e.g., Stable Diffusion) and corresponding personalization techniques such as DreamBooth and LoRA, everyone can manifest their imagination into high-quality images at an affordable cost. Subsequently, there is a great demand for image animation techniques to further combine generated static images with motion dynamics. In this report, we propose a practical framework to animate most of the existing personalized text-to-image models once and for all, saving efforts in model-specific tuning. At the core of the proposed framework is to insert a newly initialized motion modeling module into the frozen text-to-image model and train it on video clips to distill reasonable motion priors. Once trained, by simply injecting this motion modeling module, all personalized versions derived from the same base T2I readily become text-driven models that produce diverse and personalized animated images. We conduct our evaluation on several public representative personalized text-to-image models across anime pictures and realistic photographs, and demonstrate that our proposed framework helps these models generate temporally smooth animation clips while preserving the domain and diversity of their outputs. Code and pre-trained weights will be publicly available at this https URL.*

## Available Pipelines

| Pipeline | Tasks | Demo |
|---|---|:---:|
| [AnimateDiffPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/animatediff/pipeline_animatediff.py) | *Text-to-Video Generation with AnimateDiff* | |

## Available checkpoints

Motion Adapter checkpoints can be found under [guoyww](https://huggingface.co/guoyww/). These checkpoints are meant to work with any model based on Stable Diffusion 1.4/1.5.

## Usage example

AnimateDiff works with a MotionAdapter checkpoint and a Stable Diffusion model checkpoint. The MotionAdapter is a collection of Motion Modules that are responsible for adding coherent motion across image frames. These modules are applied after the Resnet and Attention blocks in the Stable Diffusion UNet.

The following example demonstrates how to use a *MotionAdapter* checkpoint with Diffusers for inference based on StableDiffusion-1.4/1.5.
```python
import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler
from diffusers.utils import export_to_gif

# Load the motion adapter
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")
# load SD 1.5 based finetuned model
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter)
scheduler = DDIMScheduler.from_pretrained(
    model_id, subfolder="scheduler", clip_sample=False, timestep_spacing="linspace", steps_offset=1
)
pipe.scheduler = scheduler

# enable memory savings
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()

output = pipe(
    prompt=(
        "masterpiece, bestquality, highlydetailed, ultradetailed, sunset, "
        "orange sky, warm lighting, fishing boats, ocean waves seagulls, "
        "rippling water, wharf, silhouette, serene atmosphere, dusk, evening glow, "
        "golden hour, coastal landscape, seaside scenery"
    ),
    negative_prompt="bad quality, worse quality",
    num_frames=16,
    guidance_scale=7.5,
    num_inference_steps=25,
    generator=torch.Generator("cpu").manual_seed(42),
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```
Here are some sample outputs:

<table>
    <tr>
        <td><center>
        masterpiece, bestquality, sunset.
        <br>
        <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-realistic-doc.gif"
            alt="masterpiece, bestquality, sunset"
            style="width: 300px;" />
        </center></td>
    </tr>
</table>

<Tip>

AnimateDiff tends to work better with finetuned Stable Diffusion models. If you plan on using a scheduler that can clip samples, make sure to disable it by setting `clip_sample=False` in the scheduler as this can also have an adverse effect on generated samples.

</Tip>

## Using Motion LoRAs

Motion LoRAs are a collection of LoRAs that work with the `guoyww/animatediff-motion-adapter-v1-5-2` checkpoint. These LoRAs are responsible for adding specific types of motion to the animations.
```python
import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler
from diffusers.utils import export_to_gif

# Load the motion adapter
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")
# load SD 1.5 based finetuned model
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter)
pipe.load_lora_weights("guoyww/animatediff-motion-lora-zoom-out", adapter_name="zoom-out")

scheduler = DDIMScheduler.from_pretrained(
    model_id, subfolder="scheduler", clip_sample=False, timestep_spacing="linspace", steps_offset=1
)
pipe.scheduler = scheduler

# enable memory savings
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()

output = pipe(
    prompt=(
        "masterpiece, bestquality, highlydetailed, ultradetailed, sunset, "
        "orange sky, warm lighting, fishing boats, ocean waves seagulls, "
        "rippling water, wharf, silhouette, serene atmosphere, dusk, evening glow, "
        "golden hour, coastal landscape, seaside scenery"
    ),
    negative_prompt="bad quality, worse quality",
    num_frames=16,
    guidance_scale=7.5,
    num_inference_steps=25,
    generator=torch.Generator("cpu").manual_seed(42),
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```
<table>
    <tr>
        <td><center>
        masterpiece, bestquality, sunset.
        <br>
        <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-zoom-out-lora.gif"
            alt="masterpiece, bestquality, sunset"
            style="width: 300px;" />
        </center></td>
    </tr>
</table>

## Using Motion LoRAs with PEFT

You can also leverage the [PEFT](https://github.com/huggingface/peft) backend to combine Motion LoRAs and create more complex animations.

First install PEFT with

```shell
pip install peft
```

Then you can use the following code to combine Motion LoRAs.
```python
import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler
from diffusers.utils import export_to_gif

# Load the motion adapter
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")
# load SD 1.5 based finetuned model
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter)

pipe.load_lora_weights("diffusers/animatediff-motion-lora-zoom-out", adapter_name="zoom-out")
pipe.load_lora_weights("diffusers/animatediff-motion-lora-pan-left", adapter_name="pan-left")
pipe.set_adapters(["zoom-out", "pan-left"], adapter_weights=[1.0, 1.0])

scheduler = DDIMScheduler.from_pretrained(
    model_id, subfolder="scheduler", clip_sample=False, timestep_spacing="linspace", steps_offset=1
)
pipe.scheduler = scheduler

# enable memory savings
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()

output = pipe(
    prompt=(
        "masterpiece, bestquality, highlydetailed, ultradetailed, sunset, "
        "orange sky, warm lighting, fishing boats, ocean waves seagulls, "
        "rippling water, wharf, silhouette, serene atmosphere, dusk, evening glow, "
        "golden hour, coastal landscape, seaside scenery"
    ),
    negative_prompt="bad quality, worse quality",
    num_frames=16,
    guidance_scale=7.5,
    num_inference_steps=25,
    generator=torch.Generator("cpu").manual_seed(42),
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```
<table>
    <tr>
        <td><center>
        masterpiece, bestquality, sunset.
        <br>
        <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-zoom-out-pan-left-lora.gif"
            alt="masterpiece, bestquality, sunset"
            style="width: 300px;" />
        </center></td>
    </tr>
</table>

## AnimateDiffPipeline

[[autodoc]] AnimateDiffPipeline
- all
- __call__
- enable_freeu
- disable_freeu
- enable_vae_slicing
- disable_vae_slicing
- enable_vae_tiling
- disable_vae_tiling

## AnimateDiffPipelineOutput

[[autodoc]] pipelines.animatediff.AnimateDiffPipelineOutput
@@ -1,37 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Attend-and-Excite

Attend-and-Excite for Stable Diffusion was proposed in [Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models](https://attendandexcite.github.io/Attend-and-Excite/) and provides textual attention control over image generation.

The abstract from the paper is:

*Text-to-image diffusion models have recently received a lot of interest for their astonishing ability to produce high-fidelity images from text only. However, achieving one-shot generation that aligns with the user's intent is nearly impossible, yet small changes to the input prompt often result in very different images. This leaves the user with little semantic control. To put the user in control, we show how to interact with the diffusion process to flexibly steer it along semantic directions. This semantic guidance (SEGA) allows for subtle and extensive edits, changes in composition and style, as well as optimizing the overall artistic conception. We demonstrate SEGA's effectiveness on a variety of tasks and provide evidence for its versatility and flexibility.*

You can find additional information about Attend-and-Excite on the [project page](https://attendandexcite.github.io/Attend-and-Excite/), the [original codebase](https://github.com/AttendAndExcite/Attend-and-Excite), or try it out in a [demo](https://huggingface.co/spaces/AttendAndExcite/Attend-and-Excite).

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>
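
A minimal usage sketch: the pipeline takes the indices of the prompt tokens to attend to and excite. The checkpoint id and indices below are illustrative; the pipeline's `get_indices` helper can be used to look up token positions for a given prompt.

```python
import torch
from diffusers import StableDiffusionAttendAndExcitePipeline

pipe = StableDiffusionAttendAndExcitePipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

prompt = "a cat and a frog"
token_indices = [2, 5]  # positions of "cat" and "frog" in the tokenized prompt

image = pipe(
    prompt=prompt,
    token_indices=token_indices,
    guidance_scale=7.5,
    num_inference_steps=50,
).images[0]
```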
## StableDiffusionAttendAndExcitePipeline

[[autodoc]] StableDiffusionAttendAndExcitePipeline
- all
- __call__

## StableDiffusionPipelineOutput

[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput
@@ -1,37 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Audio Diffusion

[Audio Diffusion](https://github.com/teticio/audio-diffusion) is by Robert Dargavel Smith, and it leverages the recent advances in image generation from diffusion models by converting audio samples to and from Mel spectrogram images.

The original codebase, training scripts and example notebooks can be found at [teticio/audio-diffusion](https://github.com/teticio/audio-diffusion).

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>

## AudioDiffusionPipeline
[[autodoc]] AudioDiffusionPipeline
- all
- __call__

## AudioPipelineOutput
[[autodoc]] pipelines.AudioPipelineOutput

## ImagePipelineOutput
[[autodoc]] pipelines.ImagePipelineOutput

## Mel
[[autodoc]] Mel
98
docs/source/en/api/pipelines/audio_diffusion.mdx
Normal file
@@ -0,0 +1,98 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Audio Diffusion

## Overview

[Audio Diffusion](https://github.com/teticio/audio-diffusion) by Robert Dargavel Smith.

Audio Diffusion leverages the recent advances in image generation using diffusion models by converting audio samples to
and from mel spectrogram images.

The original codebase of this implementation can be found [here](https://github.com/teticio/audio-diffusion), including
training scripts and example notebooks.

## Available Pipelines:

| Pipeline | Tasks | Colab |
|---|---|:---:|
| [pipeline_audio_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/audio_diffusion/pipeline_audio_diffusion.py) | *Unconditional Audio Generation* | [Open in Colab](https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/audio_diffusion_pipeline.ipynb) |

## Examples:

### Audio Diffusion

```python
import torch
from IPython.display import Audio, display
from diffusers import DiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-256").to(device)

output = pipe()
display(output.images[0])
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
```

### Latent Audio Diffusion

```python
import torch
from IPython.display import Audio, display
from diffusers import DiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = DiffusionPipeline.from_pretrained("teticio/latent-audio-diffusion-256").to(device)

output = pipe()
display(output.images[0])
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
```

### Audio Diffusion with DDIM (faster)

```python
import torch
from IPython.display import Audio, display
from diffusers import DiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-ddim-256").to(device)

output = pipe()
display(output.images[0])
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
```

### Variations, in-painting, out-painting, etc.

```python
output = pipe(
    raw_audio=output.audios[0, 0],
    start_step=int(pipe.get_default_steps() / 2),
    mask_start_secs=1,
    mask_end_secs=1,
)
display(output.images[0])
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
```

## AudioDiffusionPipeline
[[autodoc]] AudioDiffusionPipeline
- all
- __call__

## Mel
[[autodoc]] Mel
@@ -1,50 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# AudioLDM

AudioLDM was proposed in [AudioLDM: Text-to-Audio Generation with Latent Diffusion Models](https://huggingface.co/papers/2301.12503) by Haohe Liu et al. Inspired by [Stable Diffusion](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview), AudioLDM
is a text-to-audio _latent diffusion model (LDM)_ that learns continuous audio representations from [CLAP](https://huggingface.co/docs/transformers/main/model_doc/clap)
latents. AudioLDM takes a text prompt as input and predicts the corresponding audio. It can generate text-conditional
sound effects, human speech and music.

The abstract from the paper is:

*Text-to-audio (TTA) system has recently gained attention for its ability to synthesize general audio based on text descriptions. However, previous studies in TTA have limited generation quality with high computational costs. In this study, we propose AudioLDM, a TTA system that is built on a latent space to learn the continuous audio representations from contrastive language-audio pretraining (CLAP) latents. The pretrained CLAP models enable us to train LDMs with audio embedding while providing text embedding as a condition during sampling. By learning the latent representations of audio signals and their compositions without modeling the cross-modal relationship, AudioLDM is advantageous in both generation quality and computational efficiency. Trained on AudioCaps with a single GPU, AudioLDM achieves state-of-the-art TTA performance measured by both objective and subjective metrics (e.g., frechet distance). Moreover, AudioLDM is the first TTA system that enables various text-guided audio manipulations (e.g., style transfer) in a zero-shot fashion. Our implementation and demos are available at https://audioldm.github.io.*

The original codebase can be found at [haoheliu/AudioLDM](https://github.com/haoheliu/AudioLDM).

## Tips

When constructing a prompt, keep in mind:

* Descriptive prompt inputs work best; you can use adjectives to describe the sound (for example, "high quality" or "clear") and make the prompt context specific (for example, "water stream in a forest" instead of "stream").
* It's best to use general terms like "cat" or "dog" instead of specific names or abstract objects the model may not be familiar with.

During inference:

* The _quality_ of the predicted audio sample can be controlled by the `num_inference_steps` argument; higher steps give higher quality audio at the expense of slower inference.
* The _length_ of the predicted audio sample can be controlled by varying the `audio_length_in_s` argument.

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>
## AudioLDMPipeline
|
|
||||||
[[autodoc]] AudioLDMPipeline
|
|
||||||
- all
|
|
||||||
- __call__
|
|
||||||
|
|
||||||
## AudioPipelineOutput
|
|
||||||
[[autodoc]] pipelines.AudioPipelineOutput
|
|
||||||
82 docs/source/en/api/pipelines/audioldm.mdx Normal file
@@ -0,0 +1,82 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# AudioLDM

## Overview

AudioLDM was proposed in [AudioLDM: Text-to-Audio Generation with Latent Diffusion Models](https://arxiv.org/abs/2301.12503) by Haohe Liu et al.

Inspired by [Stable Diffusion](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview), AudioLDM
is a text-to-audio _latent diffusion model (LDM)_ that learns continuous audio representations from [CLAP](https://huggingface.co/docs/transformers/main/model_doc/clap)
latents. AudioLDM takes a text prompt as input and predicts the corresponding audio. It can generate text-conditional
sound effects, human speech and music.

This pipeline was contributed by [sanchit-gandhi](https://huggingface.co/sanchit-gandhi). The original codebase can be found [here](https://github.com/haoheliu/AudioLDM).

## Text-to-Audio

The [`AudioLDMPipeline`] can be used to load pre-trained weights from [cvssp/audioldm](https://huggingface.co/cvssp/audioldm) and generate text-conditional audio outputs:

```python
from diffusers import AudioLDMPipeline
import torch
import scipy

repo_id = "cvssp/audioldm"
pipe = AudioLDMPipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "Techno music with a strong, upbeat tempo and high melodic riffs"
audio = pipe(prompt, num_inference_steps=10, audio_length_in_s=5.0).audios[0]

# save the audio sample as a .wav file
scipy.io.wavfile.write("techno.wav", rate=16000, data=audio)
```

### Tips

Prompts:
* Descriptive prompt inputs work best: you can use adjectives to describe the sound (e.g. "high quality" or "clear") and make the prompt context specific (e.g. "water stream in a forest" instead of "stream").
* It's best to use general terms like "cat" or "dog" instead of specific names or abstract objects that the model may not be familiar with.

Inference:
* The _quality_ of the predicted audio sample can be controlled by the `num_inference_steps` argument: higher steps give higher quality audio at the expense of slower inference.
* The _length_ of the predicted audio sample can be controlled by varying the `audio_length_in_s` argument.

### How to load and use different schedulers

The AudioLDM pipeline uses the [`DDIMScheduler`] by default, but `diffusers` provides many other schedulers
that can be used with the AudioLDM pipeline such as [`PNDMScheduler`], [`LMSDiscreteScheduler`], [`EulerDiscreteScheduler`],
[`EulerAncestralDiscreteScheduler`] etc. We recommend using the [`DPMSolverMultistepScheduler`] as it's currently the fastest
scheduler available.

To use a different scheduler, you can either change it via the [`ConfigMixin.from_config`]
method, or pass the `scheduler` argument to the `from_pretrained` method of the pipeline. For example, to use the
[`DPMSolverMultistepScheduler`], you can do the following:

```python
>>> from diffusers import AudioLDMPipeline, DPMSolverMultistepScheduler
>>> import torch

>>> pipeline = AudioLDMPipeline.from_pretrained("cvssp/audioldm", torch_dtype=torch.float16)
>>> pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)

>>> # or
>>> dpm_scheduler = DPMSolverMultistepScheduler.from_pretrained("cvssp/audioldm", subfolder="scheduler")
>>> pipeline = AudioLDMPipeline.from_pretrained("cvssp/audioldm", scheduler=dpm_scheduler, torch_dtype=torch.float16)
```

## AudioLDMPipeline
[[autodoc]] AudioLDMPipeline
	- all
	- __call__
@@ -1,91 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# AudioLDM 2

AudioLDM 2 was proposed in [AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining](https://arxiv.org/abs/2308.05734)
by Haohe Liu et al. AudioLDM 2 takes a text prompt as input and predicts the corresponding audio. It can generate
text-conditional sound effects, human speech and music.

Inspired by [Stable Diffusion](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview), AudioLDM 2
is a text-to-audio _latent diffusion model (LDM)_ that learns continuous audio representations from text embeddings. Two
text encoder models are used to compute the text embeddings from a prompt input: the text-branch of [CLAP](https://huggingface.co/docs/transformers/main/en/model_doc/clap)
and the encoder of [Flan-T5](https://huggingface.co/docs/transformers/main/en/model_doc/flan-t5). These text embeddings
are then projected to a shared embedding space by an [AudioLDM2ProjectionModel](https://huggingface.co/docs/diffusers/main/api/pipelines/audioldm2#diffusers.AudioLDM2ProjectionModel).
A [GPT2](https://huggingface.co/docs/transformers/main/en/model_doc/gpt2) _language model (LM)_ is used to auto-regressively
predict eight new embedding vectors, conditional on the projected CLAP and Flan-T5 embeddings. The generated embedding
vectors and Flan-T5 text embeddings are used as cross-attention conditioning in the LDM. The [UNet](https://huggingface.co/docs/diffusers/main/en/api/pipelines/audioldm2#diffusers.AudioLDM2UNet2DConditionModel)
of AudioLDM 2 is unique in the sense that it takes **two** cross-attention embeddings, as opposed to one cross-attention
conditioning, as in most other LDMs.

The abstract of the paper is the following:

*Although audio generation shares commonalities across different types of audio, such as speech, music, and sound effects, designing models for each type requires careful consideration of specific objectives and biases that can significantly differ from those of other types. To bring us closer to a unified perspective of audio generation, this paper proposes a framework that utilizes the same learning method for speech, music, and sound effect generation. Our framework introduces a general representation of audio, called language of audio (LOA). Any audio can be translated into LOA based on AudioMAE, a self-supervised pre-trained representation learning model. In the generation process, we translate any modalities into LOA by using a GPT-2 model, and we perform self-supervised audio generation learning with a latent diffusion model conditioned on LOA. The proposed framework naturally brings advantages such as in-context learning abilities and reusable self-supervised pretrained AudioMAE and latent diffusion models. Experiments on the major benchmarks of text-to-audio, text-to-music, and text-to-speech demonstrate new state-of-the-art or competitive performance to previous approaches.*

This pipeline was contributed by [sanchit-gandhi](https://huggingface.co/sanchit-gandhi). The original codebase can be
found at [haoheliu/audioldm2](https://github.com/haoheliu/audioldm2).

## Tips

### Choosing a checkpoint

AudioLDM2 comes in three variants. Two of these checkpoints are applicable to the general task of text-to-audio
generation. The third checkpoint is trained exclusively on text-to-music generation.

All checkpoints share the same model size for the text encoders and VAE. They differ in the size and depth of the UNet.
See the table below for details on the three checkpoints:

| Checkpoint | Task | UNet Model Size | Total Model Size | Training Data / h |
|-----------------------------------------------------------------|---------------|-----------------|------------------|-------------------|
| [audioldm2](https://huggingface.co/cvssp/audioldm2) | Text-to-audio | 350M | 1.1B | 1150k |
| [audioldm2-large](https://huggingface.co/cvssp/audioldm2-large) | Text-to-audio | 750M | 1.5B | 1150k |
| [audioldm2-music](https://huggingface.co/cvssp/audioldm2-music) | Text-to-music | 350M | 1.1B | 665k |

### Constructing a prompt

* Descriptive prompt inputs work best: use adjectives to describe the sound (e.g. "high quality" or "clear") and make the prompt context specific (e.g. "water stream in a forest" instead of "stream").
* It's best to use general terms like "cat" or "dog" instead of specific names or abstract objects the model may not be familiar with.
* Using a **negative prompt** can significantly improve the quality of the generated waveform, by guiding the generation away from terms that correspond to poor quality audio. Try using a negative prompt of "Low quality."

### Controlling inference

* The _quality_ of the predicted audio sample can be controlled by the `num_inference_steps` argument; higher steps give higher quality audio at the expense of slower inference.
* The _length_ of the predicted audio sample can be controlled by varying the `audio_length_in_s` argument.

### Evaluating generated waveforms

* The quality of the generated waveforms can vary significantly based on the seed. Try generating with different seeds until you find a satisfactory generation.
* Multiple waveforms can be generated in one go: set `num_waveforms_per_prompt` to a value greater than 1. Automatic scoring will be performed between the generated waveforms and prompt text, and the audios ranked from best to worst accordingly.

The following example demonstrates how to construct good music generation using the aforementioned tips: [example](https://huggingface.co/docs/diffusers/main/en/api/pipelines/audioldm2#diffusers.AudioLDM2Pipeline.__call__.example).
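As a rough illustration of those tips, here is a minimal sketch combining a descriptive prompt, a negative prompt, and `num_waveforms_per_prompt`; the `cvssp/audioldm2` checkpoint, the prompts, the seed and the 16 kHz write rate are example assumptions, not requirements:

```python
import scipy.io.wavfile
import torch
from diffusers import AudioLDM2Pipeline

pipe = AudioLDM2Pipeline.from_pretrained("cvssp/audioldm2", torch_dtype=torch.float16).to("cuda")

prompt = "Techno music with a strong, upbeat tempo and high melodic riffs"
negative_prompt = "Low quality."

# fix the seed so re-runs are comparable, and generate several candidate waveforms
generator = torch.Generator("cuda").manual_seed(0)
audio = pipe(
    prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=200,
    audio_length_in_s=10.0,
    num_waveforms_per_prompt=3,  # candidates are scored against the prompt; index 0 is the best match
    generator=generator,
).audios

# AudioLDM 2 generates 16 kHz waveforms
scipy.io.wavfile.write("techno.wav", rate=16000, data=audio[0])
```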

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>

## AudioLDM2Pipeline
[[autodoc]] AudioLDM2Pipeline
	- all
	- __call__

## AudioLDM2ProjectionModel
[[autodoc]] AudioLDM2ProjectionModel
	- forward

## AudioLDM2UNet2DConditionModel
[[autodoc]] AudioLDM2UNet2DConditionModel
	- forward

## AudioPipelineOutput
[[autodoc]] pipelines.AudioPipelineOutput
@@ -1,74 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# AutoPipeline

`AutoPipeline` is designed to:

1. make it easy for you to load a checkpoint for a task without knowing the specific pipeline class to use
2. use multiple pipelines in your workflow

Based on the task, the `AutoPipeline` class automatically retrieves the relevant pipeline given the name or path to the pretrained weights with the `from_pretrained()` method.

To seamlessly switch between tasks with the same checkpoint without reallocating additional memory, use the `from_pipe()` method to transfer the components from the original pipeline to the new one.

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True
).to("cuda")
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"

image = pipeline(prompt, num_inference_steps=25).images[0]
```
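
Continuing from the snippet above, here is a minimal sketch of the `from_pipe()` path for switching to image-to-image without loading the checkpoint a second time; the prompt and `strength` value are arbitrary example choices:

```py
from diffusers import AutoPipelineForImage2Image

# reuse the components that are already in memory instead of loading a second copy
pipeline_img2img = AutoPipelineForImage2Image.from_pipe(pipeline)

image = pipeline_img2img(
    "Astronaut in a jungle, oil painting style",
    image=image,  # the image generated by the text-to-image pipeline above
    strength=0.6,
).images[0]
```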

<Tip>

Check out the [AutoPipeline](/tutorials/autopipeline) tutorial to learn how to use this API!

</Tip>

`AutoPipeline` supports text-to-image, image-to-image, and inpainting for the following diffusion models:

- [Stable Diffusion](./stable_diffusion)
- [ControlNet](./controlnet)
- [Stable Diffusion XL (SDXL)](./stable_diffusion/stable_diffusion_xl)
- [DeepFloyd IF](./if)
- [Kandinsky](./kandinsky)
- [Kandinsky 2.2](./kandinsky#kandinsky-22)

## AutoPipelineForText2Image

[[autodoc]] AutoPipelineForText2Image
	- all
	- from_pretrained
	- from_pipe

## AutoPipelineForImage2Image

[[autodoc]] AutoPipelineForImage2Image
	- all
	- from_pretrained
	- from_pipe

## AutoPipelineForInpainting

[[autodoc]] AutoPipelineForInpainting
	- all
	- from_pretrained
	- from_pipe
@@ -1,29 +0,0 @@
# Blip Diffusion

Blip Diffusion was proposed in [BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing](https://arxiv.org/abs/2305.14720). It enables zero-shot subject-driven generation and control-guided zero-shot generation.

The abstract from the paper is:

*Subject-driven text-to-image generation models create novel renditions of an input subject based on text prompts. Existing models suffer from lengthy fine-tuning and difficulties preserving the subject fidelity. To overcome these limitations, we introduce BLIP-Diffusion, a new subject-driven image generation model that supports multimodal control which consumes inputs of subject images and text prompts. Unlike other subject-driven generation models, BLIP-Diffusion introduces a new multimodal encoder which is pre-trained to provide subject representation. We first pre-train the multimodal encoder following BLIP-2 to produce visual representation aligned with the text. Then we design a subject representation learning task which enables a diffusion model to leverage such visual representation and generates new subject renditions. Compared with previous methods such as DreamBooth, our model enables zero-shot subject-driven generation, and efficient fine-tuning for customized subject with up to 20x speedup. We also demonstrate that BLIP-Diffusion can be flexibly combined with existing techniques such as ControlNet and prompt-to-prompt to enable novel subject-driven generation and editing applications.*

The original codebase can be found at [salesforce/LAVIS](https://github.com/salesforce/LAVIS/tree/main/projects/blip-diffusion). You can find the official BLIP Diffusion checkpoints under the [hf.co/SalesForce](https://hf.co/SalesForce) organization.

`BlipDiffusionPipeline` and `BlipDiffusionControlNetPipeline` were contributed by [`ayushtues`](https://github.com/ayushtues/).

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>

## BlipDiffusionPipeline
[[autodoc]] BlipDiffusionPipeline
	- all
	- __call__

## BlipDiffusionControlNetPipeline
[[autodoc]] BlipDiffusionControlNetPipeline
	- all
	- __call__
@@ -1,43 +0,0 @@
# Consistency Models

Consistency Models were proposed in [Consistency Models](https://huggingface.co/papers/2303.01469) by Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever.

The abstract from the paper is:

*Diffusion models have significantly advanced the fields of image, audio, and video generation, but they depend on an iterative sampling process that causes slow generation. To overcome this limitation, we propose consistency models, a new family of models that generate high quality samples by directly mapping noise to data. They support fast one-step generation by design, while still allowing multistep sampling to trade compute for sample quality. They also support zero-shot data editing, such as image inpainting, colorization, and super-resolution, without requiring explicit training on these tasks. Consistency models can be trained either by distilling pre-trained diffusion models, or as standalone generative models altogether. Through extensive experiments, we demonstrate that they outperform existing distillation techniques for diffusion models in one- and few-step sampling, achieving the new state-of-the-art FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 for one-step generation. When trained in isolation, consistency models become a new family of generative models that can outperform existing one-step, non-adversarial generative models on standard benchmarks such as CIFAR-10, ImageNet 64x64 and LSUN 256x256.*

The original codebase can be found at [openai/consistency_models](https://github.com/openai/consistency_models), and additional checkpoints are available at [openai](https://huggingface.co/openai).

The pipeline was contributed by [dg845](https://github.com/dg845) and [ayushtues](https://huggingface.co/ayushtues). ❤️

## Tips

For an additional speed-up, use `torch.compile` to generate multiple images in <1 second:

```diff
  import torch
  from diffusers import ConsistencyModelPipeline

  device = "cuda"
  # Load the cd_bedroom256_lpips checkpoint.
  model_id_or_path = "openai/diffusers-cd_bedroom256_lpips"
  pipe = ConsistencyModelPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
  pipe.to(device)

+ pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

  # Multistep sampling
  # Timesteps can be explicitly specified; the particular timesteps below are from the original Github repo:
  # https://github.com/openai/consistency_models/blob/main/scripts/launch.sh#L83
  for _ in range(10):
      image = pipe(timesteps=[17, 0]).images[0]
      image.show()
```
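
The same checkpoint also supports one-step generation, which is the headline feature of consistency models. A minimal sketch (the checkpoint is the one above and the seed is arbitrary):

```python
import torch
from diffusers import ConsistencyModelPipeline

pipe = ConsistencyModelPipeline.from_pretrained(
    "openai/diffusers-cd_bedroom256_lpips", torch_dtype=torch.float16
).to("cuda")

# one-step sampling: the model maps noise directly to an image
image = pipe(num_inference_steps=1, generator=torch.manual_seed(0)).images[0]
image.save("cd_bedroom256_onestep.png")
```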

## ConsistencyModelPipeline
[[autodoc]] ConsistencyModelPipeline
	- all
	- __call__

## ImagePipelineOutput
[[autodoc]] pipelines.ImagePipelineOutput
@@ -1,80 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# ControlNet

ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang and Maneesh Agrawala.

With a ControlNet model, you can provide an additional control image to condition and control Stable Diffusion generation. For example, if you provide a depth map, the ControlNet model generates an image that'll preserve the spatial information from the depth map. It is a more flexible and accurate way to control the image generation process.
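
To make that concrete, here is a minimal sketch that conditions Stable Diffusion on a Canny edge map; the input image URL, the `lllyasviel/sd-controlnet-canny` checkpoint, the scheduler swap and the prompt are example choices, not the only supported setup:

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline, UniPCMultistepScheduler
from diffusers.utils import load_image

# derive a Canny edge map from any RGB image to use as the control image
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
)
canny_image = Image.fromarray(np.stack([cv2.Canny(np.array(image), 100, 200)] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

# the generated image follows the edge structure of the control image
image = pipe("a futuristic cityscape, best quality", image=canny_image, num_inference_steps=20).images[0]
image.save("controlnet_canny.png")
```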

The abstract from the paper is:

*We present a neural network structure, ControlNet, to control pretrained large diffusion models to support additional input conditions. The ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k). Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on a personal devices. Alternatively, if powerful computation clusters are available, the model can scale to large amounts (millions to billions) of data. We report that large diffusion models like Stable Diffusion can be augmented with ControlNets to enable conditional inputs like edge maps, segmentation maps, keypoints, etc. This may enrich the methods to control large diffusion models and further facilitate related applications.*

This model was contributed by [takuma104](https://huggingface.co/takuma104). ❤️

The original codebase can be found at [lllyasviel/ControlNet](https://github.com/lllyasviel/ControlNet), and you can find official ControlNet checkpoints on [lllyasviel's](https://huggingface.co/lllyasviel) Hub profile.

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>

## StableDiffusionControlNetPipeline
[[autodoc]] StableDiffusionControlNetPipeline
	- all
	- __call__
	- enable_attention_slicing
	- disable_attention_slicing
	- enable_vae_slicing
	- disable_vae_slicing
	- enable_xformers_memory_efficient_attention
	- disable_xformers_memory_efficient_attention
	- load_textual_inversion

## StableDiffusionControlNetImg2ImgPipeline
[[autodoc]] StableDiffusionControlNetImg2ImgPipeline
	- all
	- __call__
	- enable_attention_slicing
	- disable_attention_slicing
	- enable_vae_slicing
	- disable_vae_slicing
	- enable_xformers_memory_efficient_attention
	- disable_xformers_memory_efficient_attention
	- load_textual_inversion

## StableDiffusionControlNetInpaintPipeline
[[autodoc]] StableDiffusionControlNetInpaintPipeline
	- all
	- __call__
	- enable_attention_slicing
	- disable_attention_slicing
	- enable_vae_slicing
	- disable_vae_slicing
	- enable_xformers_memory_efficient_attention
	- disable_xformers_memory_efficient_attention
	- load_textual_inversion

## StableDiffusionPipelineOutput

[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput

## FlaxStableDiffusionControlNetPipeline
[[autodoc]] FlaxStableDiffusionControlNetPipeline
	- all
	- __call__

## FlaxStableDiffusionControlNetPipelineOutput

[[autodoc]] pipelines.stable_diffusion.FlaxStableDiffusionPipelineOutput
@@ -1,55 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# ControlNet with Stable Diffusion XL

ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang and Maneesh Agrawala.

With a ControlNet model, you can provide an additional control image to condition and control Stable Diffusion generation. For example, if you provide a depth map, the ControlNet model generates an image that'll preserve the spatial information from the depth map. It is a more flexible and accurate way to control the image generation process.
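
The SDXL variant is wired up the same way, just with SDXL-sized components. A minimal sketch follows; the Canny ControlNet checkpoint, the fp16-friendly VAE, the input image and the conditioning scale are example choices:

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import AutoencoderKL, ControlNetModel, StableDiffusionXLControlNetPipeline
from diffusers.utils import load_image

# build a Canny edge map to use as the control image
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
)
canny_image = Image.fromarray(np.stack([cv2.Canny(np.array(image), 100, 200)] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained("diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16)
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet, vae=vae, torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "aerial view, a futuristic research complex in a bright foggy jungle, hard lighting",
    image=canny_image,
    controlnet_conditioning_scale=0.5,  # lower values weaken the control signal
    num_inference_steps=30,
).images[0]
```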

The abstract from the paper is:

*We present a neural network structure, ControlNet, to control pretrained large diffusion models to support additional input conditions. The ControlNet learns task-specific conditions in an end-to-end way, and the learning is robust even when the training dataset is small (< 50k). Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on a personal devices. Alternatively, if powerful computation clusters are available, the model can scale to large amounts (millions to billions) of data. We report that large diffusion models like Stable Diffusion can be augmented with ControlNets to enable conditional inputs like edge maps, segmentation maps, keypoints, etc. This may enrich the methods to control large diffusion models and further facilitate related applications.*

You can find additional smaller Stable Diffusion XL (SDXL) ControlNet checkpoints from the 🤗 [Diffusers](https://huggingface.co/diffusers) Hub organization, and browse [community-trained](https://huggingface.co/models?other=stable-diffusion-xl&other=controlnet) checkpoints on the Hub.

<Tip warning={true}>

🧪 Many of the SDXL ControlNet checkpoints are experimental, and there is a lot of room for improvement. Feel free to open an [Issue](https://github.com/huggingface/diffusers/issues/new/choose) and leave us feedback on how we can improve!

</Tip>

If you don't see a checkpoint you're interested in, you can train your own SDXL ControlNet with our [training script](https://github.com/huggingface/diffusers/blob/main/examples/controlnet/README_sdxl.md).

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>

## StableDiffusionXLControlNetPipeline
[[autodoc]] StableDiffusionXLControlNetPipeline
	- all
	- __call__

## StableDiffusionXLControlNetImg2ImgPipeline
[[autodoc]] StableDiffusionXLControlNetImg2ImgPipeline
	- all
	- __call__

## StableDiffusionXLControlNetInpaintPipeline
[[autodoc]] StableDiffusionXLControlNetInpaintPipeline
	- all
	- __call__

## StableDiffusionPipelineOutput

[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput
@@ -1,33 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Cycle Diffusion

Cycle Diffusion is a text guided image-to-image generation model proposed in [Unifying Diffusion Models' Latent Space, with Applications to CycleDiffusion and Guidance](https://huggingface.co/papers/2210.05559) by Chen Henry Wu, Fernando De la Torre.

The abstract from the paper is:

*Diffusion models have achieved unprecedented performance in generative modeling. The commonly-adopted formulation of the latent code of diffusion models is a sequence of gradually denoised samples, as opposed to the simpler (e.g., Gaussian) latent space of GANs, VAEs, and normalizing flows. This paper provides an alternative, Gaussian formulation of the latent space of various diffusion models, as well as an invertible DPM-Encoder that maps images into the latent space. While our formulation is purely based on the definition of diffusion models, we demonstrate several intriguing consequences. (1) Empirically, we observe that a common latent space emerges from two diffusion models trained independently on related domains. In light of this finding, we propose CycleDiffusion, which uses DPM-Encoder for unpaired image-to-image translation. Furthermore, applying CycleDiffusion to text-to-image diffusion models, we show that large-scale text-to-image diffusion models can be used as zero-shot image-to-image editors. (2) One can guide pre-trained diffusion models and GANs by controlling the latent codes in a unified, plug-and-play formulation based on energy-based models. Using the CLIP model and a face recognition model as guidance, we demonstrate that diffusion models have better coverage of low-density sub-populations and individuals than GANs.*

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>

## CycleDiffusionPipeline
[[autodoc]] CycleDiffusionPipeline
	- all
	- __call__

## StableDiffusionPipelineOutput
[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput
100 docs/source/en/api/pipelines/cycle_diffusion.mdx Normal file
@@ -0,0 +1,100 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Cycle Diffusion

## Overview

Cycle Diffusion is a Text-Guided Image-to-Image Generation model proposed in [Unifying Diffusion Models' Latent Space, with Applications to CycleDiffusion and Guidance](https://arxiv.org/abs/2210.05559) by Chen Henry Wu, Fernando De la Torre.

The abstract of the paper is the following:

*Diffusion models have achieved unprecedented performance in generative modeling. The commonly-adopted formulation of the latent code of diffusion models is a sequence of gradually denoised samples, as opposed to the simpler (e.g., Gaussian) latent space of GANs, VAEs, and normalizing flows. This paper provides an alternative, Gaussian formulation of the latent space of various diffusion models, as well as an invertible DPM-Encoder that maps images into the latent space. While our formulation is purely based on the definition of diffusion models, we demonstrate several intriguing consequences. (1) Empirically, we observe that a common latent space emerges from two diffusion models trained independently on related domains. In light of this finding, we propose CycleDiffusion, which uses DPM-Encoder for unpaired image-to-image translation. Furthermore, applying CycleDiffusion to text-to-image diffusion models, we show that large-scale text-to-image diffusion models can be used as zero-shot image-to-image editors. (2) One can guide pre-trained diffusion models and GANs by controlling the latent codes in a unified, plug-and-play formulation based on energy-based models. Using the CLIP model and a face recognition model as guidance, we demonstrate that diffusion models have better coverage of low-density sub-populations and individuals than GANs.*

*Tips*:
- The Cycle Diffusion pipeline is fully compatible with any [Stable Diffusion](./stable_diffusion) checkpoints.
- Currently Cycle Diffusion only works with the [`DDIMScheduler`].

*Example*:

In the following we show how to best use the [`CycleDiffusionPipeline`]:

```python
import requests
import torch
from PIL import Image
from io import BytesIO

from diffusers import CycleDiffusionPipeline, DDIMScheduler

# load the pipeline
# make sure you're logged in with `huggingface-cli login`
model_id_or_path = "CompVis/stable-diffusion-v1-4"
scheduler = DDIMScheduler.from_pretrained(model_id_or_path, subfolder="scheduler")
pipe = CycleDiffusionPipeline.from_pretrained(model_id_or_path, scheduler=scheduler).to("cuda")

# let's download an initial image
url = "https://raw.githubusercontent.com/ChenWu98/cycle-diffusion/main/data/dalle2/An%20astronaut%20riding%20a%20horse.png"
response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((512, 512))
init_image.save("horse.png")

# let's specify a prompt
source_prompt = "An astronaut riding a horse"
prompt = "An astronaut riding an elephant"

# call the pipeline
image = pipe(
    prompt=prompt,
    source_prompt=source_prompt,
    image=init_image,
    num_inference_steps=100,
    eta=0.1,
    strength=0.8,
    guidance_scale=2,
    source_guidance_scale=1,
).images[0]

image.save("horse_to_elephant.png")

# let's try another example
# See more samples at the original repo: https://github.com/ChenWu98/cycle-diffusion
url = "https://raw.githubusercontent.com/ChenWu98/cycle-diffusion/main/data/dalle2/A%20black%20colored%20car.png"
response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((512, 512))
init_image.save("black.png")

source_prompt = "A black colored car"
prompt = "A blue colored car"

# call the pipeline
torch.manual_seed(0)
image = pipe(
    prompt=prompt,
    source_prompt=source_prompt,
    image=init_image,
    num_inference_steps=100,
    eta=0.1,
    strength=0.85,
    guidance_scale=3,
    source_guidance_scale=1,
).images[0]

image.save("black_to_blue.png")
```

## CycleDiffusionPipeline
[[autodoc]] CycleDiffusionPipeline
	- all
	- __call__
@@ -12,22 +12,23 @@ specific language governing permissions and limitations under the License.
 # Dance Diffusion
 
-[Dance Diffusion](https://github.com/Harmonai-org/sample-generator) is by Zach Evans.
+## Overview
 
-Dance Diffusion is the first in a suite of generative audio tools for producers and musicians released by [Harmonai](https://github.com/Harmonai-org).
+[Dance Diffusion](https://github.com/Harmonai-org/sample-generator) by Zach Evans.
 
-The original codebase of this implementation can be found at [Harmonai-org](https://github.com/Harmonai-org/sample-generator).
+Dance Diffusion is the first in a suite of generative audio tools for producers and musicians to be released by Harmonai.
+For more info or to get involved in the development of these tools, please visit https://harmonai.org and fill out the form on the front page.
 
-<Tip>
+The original codebase of this implementation can be found [here](https://github.com/Harmonai-org/sample-generator).
 
-Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
+## Available Pipelines:
 
-</Tip>
+| Pipeline | Tasks | Colab
+|---|---|:---:|
+| [pipeline_dance_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/dance_diffusion/pipeline_dance_diffusion.py) | *Unconditional Audio Generation* | - |
 
 ## DanceDiffusionPipeline
 [[autodoc]] DanceDiffusionPipeline
 	- all
 	- __call__
-
-## AudioPipelineOutput
-[[autodoc]] pipelines.AudioPipelineOutput
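
For context on the pipeline touched by this diff, here is a minimal unconditional audio generation sketch; the `harmonai/maestro-150k` checkpoint, the clip length and the output handling are example assumptions:

```python
import scipy.io.wavfile
import torch
from diffusers import DanceDiffusionPipeline

pipe = DanceDiffusionPipeline.from_pretrained("harmonai/maestro-150k", torch_dtype=torch.float16).to("cuda")

# generate roughly 4.5 seconds of audio; the output is a (channels, samples) array
audio = pipe(audio_length_in_s=4.5).audios[0]

# scipy expects (samples, channels), so transpose before writing
scipy.io.wavfile.write("maestro_sample.wav", rate=pipe.unet.config.sample_rate, data=audio.T)
```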
@@ -1,29 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# DDIM

[Denoising Diffusion Implicit Models](https://huggingface.co/papers/2010.02502) (DDIM) by Jiaming Song, Chenlin Meng and Stefano Ermon.

The abstract from the paper is:

*Denoising diffusion probabilistic models (DDPMs) have achieved high quality image generation without adversarial training, yet they require simulating a Markov chain for many steps to produce a sample. To accelerate sampling, we present denoising diffusion implicit models (DDIMs), a more efficient class of iterative implicit probabilistic models with the same training procedure as DDPMs. In DDPMs, the generative process is defined as the reverse of a Markovian diffusion process. We construct a class of non-Markovian diffusion processes that lead to the same training objective, but whose reverse process can be much faster to sample from. We empirically demonstrate that DDIMs can produce high quality samples 10× to 50× faster in terms of wall-clock time compared to DDPMs, allow us to trade off computation for sample quality, and can perform semantically meaningful image interpolation directly in the latent space.*

The original codebase can be found at [ermongroup/ddim](https://github.com/ermongroup/ddim).
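
To make the pipeline reference below concrete, here is a minimal unconditional sampling sketch; the `google/ddpm-cifar10-32` checkpoint, the step count and the seed are example choices:

```python
import torch
from diffusers import DDIMPipeline

pipe = DDIMPipeline.from_pretrained("google/ddpm-cifar10-32").to("cuda")

# DDIM trades compute for quality: fewer steps is faster, more steps is usually sharper
image = pipe(num_inference_steps=50, eta=0.0, generator=torch.manual_seed(0)).images[0]
image.save("ddim_sample.png")
```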

## DDIMPipeline
[[autodoc]] DDIMPipeline
	- all
	- __call__

## ImagePipelineOutput
[[autodoc]] pipelines.ImagePipelineOutput
36 docs/source/en/api/pipelines/ddim.mdx Normal file
@@ -0,0 +1,36 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# DDIM

## Overview

[Denoising Diffusion Implicit Models](https://arxiv.org/abs/2010.02502) (DDIM) by Jiaming Song, Chenlin Meng and Stefano Ermon.

The abstract of the paper is the following:

Denoising diffusion probabilistic models (DDPMs) have achieved high quality image generation without adversarial training, yet they require simulating a Markov chain for many steps to produce a sample. To accelerate sampling, we present denoising diffusion implicit models (DDIMs), a more efficient class of iterative implicit probabilistic models with the same training procedure as DDPMs. In DDPMs, the generative process is defined as the reverse of a Markovian diffusion process. We construct a class of non-Markovian diffusion processes that lead to the same training objective, but whose reverse process can be much faster to sample from. We empirically demonstrate that DDIMs can produce high quality samples 10× to 50× faster in terms of wall-clock time compared to DDPMs, allow us to trade off computation for sample quality, and can perform semantically meaningful image interpolation directly in the latent space.

The original codebase of this paper can be found here: [ermongroup/ddim](https://github.com/ermongroup/ddim).
For questions, feel free to contact the author on [tsong.me](https://tsong.me/).

## Available Pipelines:

| Pipeline | Tasks | Colab |
|---|---|:---:|
| [pipeline_ddim.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/ddim/pipeline_ddim.py) | *Unconditional Image Generation* | - |

## DDIMPipeline
[[autodoc]] DDIMPipeline
	- all
	- __call__
@@ -1,35 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# DDPM

[Denoising Diffusion Probabilistic Models](https://huggingface.co/papers/2006.11239) (DDPM) by Jonathan Ho, Ajay Jain and Pieter Abbeel proposes a diffusion based model of the same name. In the 🤗 Diffusers library, DDPM refers to the *discrete denoising scheduler* from the paper as well as the pipeline.

The abstract from the paper is:

*We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics, and our models naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding. On the unconditional CIFAR10 dataset, we obtain an Inception score of 9.46 and a state-of-the-art FID score of 3.17. On 256x256 LSUN, we obtain sample quality similar to ProgressiveGAN.*

The original codebase can be found at [hojonathanho/diffusion](https://github.com/hojonathanho/diffusion).
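
A minimal unconditional sampling sketch with [`DDPMPipeline`] follows; the `google/ddpm-cat-256` checkpoint and the seed are example choices, and DDPM runs the full reverse chain, so it is noticeably slower than DDIM:

```python
import torch
from diffusers import DDPMPipeline

pipe = DDPMPipeline.from_pretrained("google/ddpm-cat-256").to("cuda")

# runs the full ancestral sampling chain (1000 steps by default)
image = pipe(generator=torch.manual_seed(0)).images[0]
image.save("ddpm_sample.png")
```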

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>

## DDPMPipeline
[[autodoc]] DDPMPipeline
	- all
	- __call__

## ImagePipelineOutput
[[autodoc]] pipelines.ImagePipelineOutput
37 docs/source/en/api/pipelines/ddpm.mdx Normal file
@@ -0,0 +1,37 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# DDPM

## Overview

[Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239)
(DDPM) by Jonathan Ho, Ajay Jain and Pieter Abbeel proposes the diffusion based model of the same name, but in the context of the 🤗 Diffusers library, DDPM refers to the discrete denoising scheduler from the paper as well as the pipeline.

The abstract of the paper is the following:

We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics, and our models naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding. On the unconditional CIFAR10 dataset, we obtain an Inception score of 9.46 and a state-of-the-art FID score of 3.17. On 256x256 LSUN, we obtain sample quality similar to ProgressiveGAN.

The original codebase of this paper can be found [here](https://github.com/hojonathanho/diffusion).

## Available Pipelines:

| Pipeline | Tasks | Colab |
|---|---|:---:|
| [pipeline_ddpm.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/ddpm/pipeline_ddpm.py) | *Unconditional Image Generation* | - |

## DDPMPipeline
[[autodoc]] DDPMPipeline
	- all
	- __call__
@@ -1,523 +0,0 @@
|
|||||||
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
|
|
||||||
|
|
||||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
|
||||||
the License. You may obtain a copy of the License at
|
|
||||||
|
|
||||||
http://www.apache.org/licenses/LICENSE-2.0
|
|
||||||
|
|
||||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
|
||||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
|
||||||
specific language governing permissions and limitations under the License.
|
|
||||||
-->
|
|
||||||
|
|
||||||
# DeepFloyd IF
|
|
||||||
|
|
||||||
## Overview
|
|
||||||
|
|
||||||
DeepFloyd IF is a novel state-of-the-art open-source text-to-image model with a high degree of photorealism and language understanding.
|
|
||||||
The model is a modular composed of a frozen text encoder and three cascaded pixel diffusion modules:
|
|
||||||
- Stage 1: a base model that generates 64x64 px image based on text prompt,
|
|
||||||
- Stage 2: a 64x64 px => 256x256 px super-resolution model, and a
|
|
||||||
- Stage 3: a 256x256 px => 1024x1024 px super-resolution model
|
|
||||||
Stage 1 and Stage 2 utilize a frozen text encoder based on the T5 transformer to extract text embeddings,
|
|
||||||
which are then fed into a UNet architecture enhanced with cross-attention and attention pooling.
|
|
||||||
Stage 3 is [Stability's x4 Upscaling model](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler).
|
|
||||||
The result is a highly efficient model that outperforms current state-of-the-art models, achieving a zero-shot FID score of 6.66 on the COCO dataset.
|
|
||||||
Our work underscores the potential of larger UNet architectures in the first stage of cascaded diffusion models and depicts a promising future for text-to-image synthesis.

## Usage

Before you can use IF, you need to accept its usage conditions. To do so:
1. Make sure to have a [Hugging Face account](https://huggingface.co/join) and be logged in.
2. Accept the license on the model card of [DeepFloyd/IF-I-XL-v1.0](https://huggingface.co/DeepFloyd/IF-I-XL-v1.0). Accepting the license on the Stage 1 model card automatically accepts it for the other IF models.
3. Make sure to log in locally. Install `huggingface_hub`:

```sh
pip install huggingface_hub --upgrade
```

Then run the login function in a Python shell,

```py
from huggingface_hub import login

login()
```

and enter your [Hugging Face Hub access token](https://huggingface.co/docs/hub/security-tokens#what-are-user-access-tokens).

Next we install `diffusers` and dependencies:

```sh
pip install diffusers accelerate transformers safetensors
```

The following sections give more in-detail examples of how to use IF. Specifically:

- [Text-to-Image Generation](#text-to-image-generation)
- [Image-to-Image Generation](#text-guided-image-to-image-generation)
- [Inpainting](#text-guided-inpainting-generation)
- [Reusing model weights](#converting-between-different-pipelines)
- [Speed optimization](#optimizing-for-speed)
- [Memory optimization](#optimizing-for-memory)

**Available checkpoints**
- *Stage-1*
  - [DeepFloyd/IF-I-XL-v1.0](https://huggingface.co/DeepFloyd/IF-I-XL-v1.0)
  - [DeepFloyd/IF-I-L-v1.0](https://huggingface.co/DeepFloyd/IF-I-L-v1.0)
  - [DeepFloyd/IF-I-M-v1.0](https://huggingface.co/DeepFloyd/IF-I-M-v1.0)

- *Stage-2*
  - [DeepFloyd/IF-II-L-v1.0](https://huggingface.co/DeepFloyd/IF-II-L-v1.0)
  - [DeepFloyd/IF-II-M-v1.0](https://huggingface.co/DeepFloyd/IF-II-M-v1.0)

- *Stage-3*
  - [stabilityai/stable-diffusion-x4-upscaler](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler)

**Demo**: [DeepFloyd IF Space](https://huggingface.co/spaces/DeepFloyd/IF)

**Google Colab**: [deepfloyd_if_free_tier_google_colab.ipynb](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/deepfloyd_if_free_tier_google_colab.ipynb)
### Text-to-Image Generation

By default diffusers makes use of [model cpu offloading](https://huggingface.co/docs/diffusers/optimization/fp16#model-offloading-for-fast-inference-and-memory-savings) to run the whole IF pipeline with as little as 14 GB of VRAM.

```python
from diffusers import DiffusionPipeline
from diffusers.utils import pt_to_pil
import torch

# stage 1
stage_1 = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
stage_1.enable_model_cpu_offload()

# stage 2
stage_2 = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-II-L-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16
)
stage_2.enable_model_cpu_offload()

# stage 3
safety_modules = {
    "feature_extractor": stage_1.feature_extractor,
    "safety_checker": stage_1.safety_checker,
    "watermarker": stage_1.watermarker,
}
stage_3 = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", **safety_modules, torch_dtype=torch.float16
)
stage_3.enable_model_cpu_offload()

prompt = 'a photo of a kangaroo wearing an orange hoodie and blue sunglasses standing in front of the eiffel tower holding a sign that says "very deep learning"'
generator = torch.manual_seed(1)

# text embeds
prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)

# stage 1
image = stage_1(
    prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type="pt"
).images
pt_to_pil(image)[0].save("./if_stage_I.png")

# stage 2
image = stage_2(
    image=image,
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    generator=generator,
    output_type="pt",
).images
pt_to_pil(image)[0].save("./if_stage_II.png")

# stage 3
image = stage_3(prompt=prompt, image=image, noise_level=100, generator=generator).images
image[0].save("./if_stage_III.png")
```

### Text Guided Image-to-Image Generation

The same IF model weights can be used for text-guided image-to-image translation or image variation.
In this case just make sure to load the weights using the [`IFImg2ImgPipeline`] and [`IFImg2ImgSuperResolutionPipeline`] pipelines.

**Note**: You can also directly move the weights of the text-to-image pipelines to the image-to-image pipelines
without loading them twice by making use of the [`~DiffusionPipeline.components()`] function as explained [here](#converting-between-different-pipelines).

```python
from diffusers import IFImg2ImgPipeline, IFImg2ImgSuperResolutionPipeline, DiffusionPipeline
from diffusers.utils import pt_to_pil

import torch

from PIL import Image
import requests
from io import BytesIO

# download image
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
response = requests.get(url)
original_image = Image.open(BytesIO(response.content)).convert("RGB")
original_image = original_image.resize((768, 512))

# stage 1
stage_1 = IFImg2ImgPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
stage_1.enable_model_cpu_offload()

# stage 2
stage_2 = IFImg2ImgSuperResolutionPipeline.from_pretrained(
    "DeepFloyd/IF-II-L-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16
)
stage_2.enable_model_cpu_offload()

# stage 3
safety_modules = {
    "feature_extractor": stage_1.feature_extractor,
    "safety_checker": stage_1.safety_checker,
    "watermarker": stage_1.watermarker,
}
stage_3 = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", **safety_modules, torch_dtype=torch.float16
)
stage_3.enable_model_cpu_offload()

prompt = "A fantasy landscape in style minecraft"
generator = torch.manual_seed(1)

# text embeds
prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)

# stage 1
image = stage_1(
    image=original_image,
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    generator=generator,
    output_type="pt",
).images
pt_to_pil(image)[0].save("./if_stage_I.png")

# stage 2
image = stage_2(
    image=image,
    original_image=original_image,
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    generator=generator,
    output_type="pt",
).images
pt_to_pil(image)[0].save("./if_stage_II.png")

# stage 3
image = stage_3(prompt=prompt, image=image, generator=generator, noise_level=100).images
image[0].save("./if_stage_III.png")
```

### Text Guided Inpainting Generation

The same IF model weights can also be used for text-guided inpainting.
In this case just make sure to load the weights using the [`IFInpaintingPipeline`] and [`IFInpaintingSuperResolutionPipeline`] pipelines.

**Note**: You can also directly move the weights of the text-to-image pipelines to the inpainting pipelines
without loading them twice by making use of the [`~DiffusionPipeline.components()`] function as explained [here](#converting-between-different-pipelines).

```python
from diffusers import IFInpaintingPipeline, IFInpaintingSuperResolutionPipeline, DiffusionPipeline
from diffusers.utils import pt_to_pil
import torch

from PIL import Image
import requests
from io import BytesIO

# download image
url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/if/person.png"
response = requests.get(url)
original_image = Image.open(BytesIO(response.content)).convert("RGB")

# download mask
url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/if/glasses_mask.png"
response = requests.get(url)
mask_image = Image.open(BytesIO(response.content))

# stage 1
stage_1 = IFInpaintingPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
stage_1.enable_model_cpu_offload()

# stage 2
stage_2 = IFInpaintingSuperResolutionPipeline.from_pretrained(
    "DeepFloyd/IF-II-L-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16
)
stage_2.enable_model_cpu_offload()

# stage 3
safety_modules = {
    "feature_extractor": stage_1.feature_extractor,
    "safety_checker": stage_1.safety_checker,
    "watermarker": stage_1.watermarker,
}
stage_3 = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", **safety_modules, torch_dtype=torch.float16
)
stage_3.enable_model_cpu_offload()

prompt = "blue sunglasses"
generator = torch.manual_seed(1)

# text embeds
prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)

# stage 1
image = stage_1(
    image=original_image,
    mask_image=mask_image,
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    generator=generator,
    output_type="pt",
).images
pt_to_pil(image)[0].save("./if_stage_I.png")

# stage 2
image = stage_2(
    image=image,
    original_image=original_image,
    mask_image=mask_image,
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    generator=generator,
    output_type="pt",
).images
pt_to_pil(image)[0].save("./if_stage_II.png")

# stage 3
image = stage_3(prompt=prompt, image=image, generator=generator, noise_level=100).images
image[0].save("./if_stage_III.png")
```

### Converting between different pipelines

In addition to being loaded with `from_pretrained`, pipelines can also be loaded directly from each other.

```python
from diffusers import IFPipeline, IFSuperResolutionPipeline

pipe_1 = IFPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0")
pipe_2 = IFSuperResolutionPipeline.from_pretrained("DeepFloyd/IF-II-L-v1.0")


from diffusers import IFImg2ImgPipeline, IFImg2ImgSuperResolutionPipeline

pipe_1 = IFImg2ImgPipeline(**pipe_1.components)
pipe_2 = IFImg2ImgSuperResolutionPipeline(**pipe_2.components)


from diffusers import IFInpaintingPipeline, IFInpaintingSuperResolutionPipeline

pipe_1 = IFInpaintingPipeline(**pipe_1.components)
pipe_2 = IFInpaintingSuperResolutionPipeline(**pipe_2.components)
```

### Optimizing for speed

The simplest optimization to run IF faster is to move all model components to the GPU.

```py
pipe = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
pipe.to("cuda")
```

You can also run the diffusion process for a smaller number of timesteps.

This can either be done with the `num_inference_steps` argument:

```py
pipe("<prompt>", num_inference_steps=30)
```

Or with the `timesteps` argument:

```py
from diffusers.pipelines.deepfloyd_if import fast27_timesteps

pipe("<prompt>", timesteps=fast27_timesteps)
```

When doing image variation or inpainting, you can also decrease the number of timesteps
with the `strength` argument. The `strength` argument is the amount of noise to add to
the input image, which also determines how many steps to run in the denoising process.
A smaller number varies the image less but runs faster.

```py
pipe = IFImg2ImgPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
pipe.to("cuda")

image = pipe(image=image, prompt="<prompt>", strength=0.3).images
```

You can also use [`torch.compile`](../../optimization/torch2.0). Note that we have not exhaustively tested `torch.compile`
with IF and it might not give expected results.

```py
import torch

pipe = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
pipe.to("cuda")

pipe.text_encoder = torch.compile(pipe.text_encoder)
pipe.unet = torch.compile(pipe.unet)
```

### Optimizing for memory

When optimizing for GPU memory, we can use the standard diffusers CPU offloading APIs:
either model-based CPU offloading,

```py
pipe = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()
```

or the more aggressive layer-based CPU offloading.

```py
pipe = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
pipe.enable_sequential_cpu_offload()
```

Additionally, T5 can be loaded in 8-bit precision:

```py
from transformers import T5EncoderModel

text_encoder = T5EncoderModel.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0", subfolder="text_encoder", device_map="auto", load_in_8bit=True, variant="8bit"
)

from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0",
    text_encoder=text_encoder,  # pass the previously instantiated 8bit text encoder
    unet=None,
    device_map="auto",
)

prompt_embeds, negative_embeds = pipe.encode_prompt("<prompt>")
```
On machines with constrained CPU RAM, such as the Google Colab free tier, where we can't load all
model components to the CPU at once, we can manually load the pipeline with only
the text encoder or UNet when the respective model components are needed.

```py
from diffusers import DiffusionPipeline, IFPipeline, IFSuperResolutionPipeline
import torch
import gc
from transformers import T5EncoderModel
from diffusers.utils import pt_to_pil

text_encoder = T5EncoderModel.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0", subfolder="text_encoder", device_map="auto", load_in_8bit=True, variant="8bit"
)

# text to image

pipe = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0",
    text_encoder=text_encoder,  # pass the previously instantiated 8bit text encoder
    unet=None,
    device_map="auto",
)

prompt = 'a photo of a kangaroo wearing an orange hoodie and blue sunglasses standing in front of the eiffel tower holding a sign that says "very deep learning"'
prompt_embeds, negative_embeds = pipe.encode_prompt(prompt)

# Remove the pipeline so we can re-load the pipeline with the unet
del text_encoder
del pipe
gc.collect()
torch.cuda.empty_cache()

pipe = IFPipeline.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16, device_map="auto"
)

generator = torch.Generator().manual_seed(0)
image = pipe(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    output_type="pt",
    generator=generator,
).images

pt_to_pil(image)[0].save("./if_stage_I.png")

# Remove the pipeline so we can load the super-resolution pipeline
del pipe
gc.collect()
torch.cuda.empty_cache()

# First super resolution

pipe = IFSuperResolutionPipeline.from_pretrained(
    "DeepFloyd/IF-II-L-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16, device_map="auto"
)

generator = torch.Generator().manual_seed(0)
image = pipe(
    image=image,
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    output_type="pt",
    generator=generator,
).images

pt_to_pil(image)[0].save("./if_stage_II.png")
```

## Available Pipelines:

| Pipeline | Tasks | Colab |
|---|---|:---:|
| [pipeline_if.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/deepfloyd_if/pipeline_if.py) | *Text-to-Image Generation* | - |
| [pipeline_if_superresolution.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/deepfloyd_if/pipeline_if_superresolution.py) | *Text-to-Image Generation* | - |
| [pipeline_if_img2img.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/deepfloyd_if/pipeline_if_img2img.py) | *Image-to-Image Generation* | - |
| [pipeline_if_img2img_superresolution.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/deepfloyd_if/pipeline_if_img2img_superresolution.py) | *Image-to-Image Generation* | - |
| [pipeline_if_inpainting.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/deepfloyd_if/pipeline_if_inpainting.py) | *Inpainting* | - |
| [pipeline_if_inpainting_superresolution.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/deepfloyd_if/pipeline_if_inpainting_superresolution.py) | *Inpainting* | - |

## IFPipeline
[[autodoc]] IFPipeline
- all
- __call__

## IFSuperResolutionPipeline
[[autodoc]] IFSuperResolutionPipeline
- all
- __call__

## IFImg2ImgPipeline
[[autodoc]] IFImg2ImgPipeline
- all
- __call__

## IFImg2ImgSuperResolutionPipeline
[[autodoc]] IFImg2ImgSuperResolutionPipeline
- all
- __call__

## IFInpaintingPipeline
[[autodoc]] IFInpaintingPipeline
- all
- __call__

## IFInpaintingSuperResolutionPipeline
[[autodoc]] IFInpaintingSuperResolutionPipeline
- all
- __call__
# DiffEdit

[DiffEdit: Diffusion-based semantic image editing with mask guidance](https://huggingface.co/papers/2210.11427) is by Guillaume Couairon, Jakob Verbeek, Holger Schwenk, and Matthieu Cord.

The abstract from the paper is:

*Image generation has recently seen tremendous advances, with diffusion models allowing to synthesize convincing images for a large variety of text prompts. In this article, we propose DiffEdit, a method to take advantage of text-conditioned diffusion models for the task of semantic image editing, where the goal is to edit an image based on a text query. Semantic image editing is an extension of image generation, with the additional constraint that the generated image should be as similar as possible to a given input image. Current editing methods based on diffusion models usually require to provide a mask, making the task much easier by treating it as a conditional inpainting task. In contrast, our main contribution is able to automatically generate a mask highlighting regions of the input image that need to be edited, by contrasting predictions of a diffusion model conditioned on different text prompts. Moreover, we rely on latent inference to preserve content in those regions of interest and show excellent synergies with mask-based diffusion. DiffEdit achieves state-of-the-art editing performance on ImageNet. In addition, we evaluate semantic image editing in more challenging settings, using images from the COCO dataset as well as text-based generated images.*

The original codebase can be found at [Xiang-cd/DiffEdit-stable-diffusion](https://github.com/Xiang-cd/DiffEdit-stable-diffusion), and you can try it out in this [demo](https://blog.problemsolversguild.com/technical/research/2022/11/02/DiffEdit-Implementation.html).

This pipeline was contributed by [clarencechen](https://github.com/clarencechen). ❤️

## Tips

* The pipeline can generate masks that can be fed into other inpainting pipelines.
* In order to generate an image with this pipeline, both an image mask (manually specified or generated with [`~StableDiffusionDiffEditPipeline.generate_mask`] from a source and a target prompt)
and a set of partially inverted latents (generated using [`~StableDiffusionDiffEditPipeline.invert`]) _must_ be provided as arguments when calling the pipeline to generate the final edited image.
* The function [`~StableDiffusionDiffEditPipeline.generate_mask`] exposes two prompt arguments, `source_prompt` and `target_prompt`,
that let you control the locations of the semantic edits in the final image to be generated. Let's say,
you wanted to translate from "cat" to "dog". In this case, the edit direction will be "cat -> dog". To reflect
this in the generated mask, you simply have to set the embeddings related to the phrases including "cat" to
`source_prompt` and "dog" to `target_prompt`.
* When generating partially inverted latents using `invert`, assign a caption or text embedding describing the
overall image to the `prompt` argument to help guide the inverse latent sampling process. In most cases, the
source concept is sufficiently descriptive to yield good results, but feel free to explore alternatives.
* When calling the pipeline to generate the final edited image, assign the source concept to `negative_prompt`
and the target concept to `prompt`. Taking the above example, you simply have to set the embeddings related to
the phrases including "cat" to `negative_prompt` and "dog" to `prompt`.
* If you wanted to reverse the direction in the example above, i.e., "dog -> cat", then it's recommended to:
    * Swap the `source_prompt` and `target_prompt` in the arguments to `generate_mask`.
    * Change the input prompt in [`~StableDiffusionDiffEditPipeline.invert`] to include "dog".
    * Swap the `prompt` and `negative_prompt` in the arguments to call the pipeline to generate the final edited image.
* The source and target prompts, or their corresponding embeddings, can also be automatically generated. Please refer to the [DiffEdit](/using-diffusers/diffedit) guide for more details. The whole mask -> invert -> edit workflow is sketched in the example right after this list.
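
The snippet below is a minimal sketch of that three-step workflow (mask generation, latent inversion, final edit). The `stabilityai/stable-diffusion-2-1` checkpoint and the fruit-bowl image URL are assumptions for illustration; swap in your own checkpoint and image.

```python
import torch
from diffusers import DDIMInverseScheduler, DDIMScheduler, StableDiffusionDiffEditPipeline
from diffusers.utils import load_image

# assumed checkpoint; other Stable Diffusion checkpoints should also work
pipeline = StableDiffusionDiffEditPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16, safety_checker=None
)
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
pipeline.inverse_scheduler = DDIMInverseScheduler.from_config(pipeline.scheduler.config)
pipeline.enable_model_cpu_offload()

# assumed example image from the original DiffEdit repository
img_url = "https://github.com/Xiang-cd/DiffEdit-stable-diffusion/raw/main/assets/origin.png"
raw_image = load_image(img_url).resize((768, 768))

source_prompt = "a bowl of fruits"
target_prompt = "a bowl of pears"

# 1. generate the mask from the source -> target edit direction
mask_image = pipeline.generate_mask(image=raw_image, source_prompt=source_prompt, target_prompt=target_prompt)

# 2. partially invert the input image, conditioned on a caption of the source image
inv_latents = pipeline.invert(prompt=source_prompt, image=raw_image).latents

# 3. generate the edited image: target concept as `prompt`, source concept as `negative_prompt`
image = pipeline(
    prompt=target_prompt,
    mask_image=mask_image,
    image_latents=inv_latents,
    negative_prompt=source_prompt,
).images[0]
image.save("diffedit_pears.png")
```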

## StableDiffusionDiffEditPipeline
[[autodoc]] StableDiffusionDiffEditPipeline
- all
- generate_mask
- invert
- __call__

## StableDiffusionPipelineOutput
[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput
# DiT

[Scalable Diffusion Models with Transformers](https://huggingface.co/papers/2212.09748) (DiT) is by William Peebles and Saining Xie.

The abstract from the paper is:

*We explore a new class of diffusion models based on the transformer architecture. We train latent diffusion models of images, replacing the commonly-used U-Net backbone with a transformer that operates on latent patches. We analyze the scalability of our Diffusion Transformers (DiTs) through the lens of forward pass complexity as measured by Gflops. We find that DiTs with higher Gflops -- through increased transformer depth/width or increased number of input tokens -- consistently have lower FID. In addition to possessing good scalability properties, our largest DiT-XL/2 models outperform all prior diffusion models on the class-conditional ImageNet 512x512 and 256x256 benchmarks, achieving a state-of-the-art FID of 2.27 on the latter.*

The original codebase can be found at [facebookresearch/dit](https://github.com/facebookresearch/dit).

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>

## Available Pipelines:

| Pipeline | Tasks | Colab |
|---|---|:---:|
| [pipeline_dit.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/dit/pipeline_dit.py) | *Conditional Image Generation* | - |

## Usage example

```python
from diffusers import DiTPipeline, DPMSolverMultistepScheduler
import torch

pipe = DiTPipeline.from_pretrained("facebook/DiT-XL-2-256", torch_dtype=torch.float16)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

# print all available ImageNet class labels
pipe.labels

# pick words that exist in ImageNet
words = ["white shark", "umbrella"]

class_ids = pipe.get_label_ids(words)

generator = torch.manual_seed(33)
output = pipe(class_labels=class_ids, num_inference_steps=25, generator=generator)

image = output.images[0]  # label 'white shark'
```

## DiTPipeline
[[autodoc]] DiTPipeline
- all
- __call__

## ImagePipelineOutput
[[autodoc]] pipelines.ImagePipelineOutput
# Kandinsky 2.1

Kandinsky 2.1 is created by [Arseniy Shakhmatov](https://github.com/cene555), [Anton Razzhigaev](https://github.com/razzant), [Aleksandr Nikolich](https://github.com/AlexWortega), [Igor Pavlov](https://github.com/boomb0om), [Andrey Kuznetsov](https://github.com/kuznetsoffandrey) and [Denis Dimitrov](https://github.com/denndimitrov).

The description from its GitHub page is:

*Kandinsky 2.1 inherits best practicies from Dall-E 2 and Latent diffusion, while introducing some new ideas. As text and image encoder it uses CLIP model and diffusion image prior (mapping) between latent spaces of CLIP modalities. This approach increases the visual performance of the model and unveils new horizons in blending images and text-guided image manipulation.*

The original codebase can be found at [ai-forever/Kandinsky-2](https://github.com/ai-forever/Kandinsky-2).

<Tip>

Check out the [Kandinsky Community](https://huggingface.co/kandinsky-community) organization on the Hub for the official model checkpoints for tasks like text-to-image, image-to-image, and inpainting.

</Tip>
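
As a quick orientation, here is a minimal text-to-image sketch using the auto pipeline, which resolves to the combined prior + decoder pipeline. The `kandinsky-community/kandinsky-2-1` checkpoint name is an assumption based on the community organization linked above, and the prompt is purely illustrative.

```python
import torch
from diffusers import AutoPipelineForText2Image

# assumed checkpoint from the kandinsky-community organization
pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

prompt = "A portrait of a young woman with a colorful scarf, 4k photo"
negative_prompt = "low quality, bad quality"

image = pipe(prompt=prompt, negative_prompt=negative_prompt, height=768, width=768).images[0]
image.save("kandinsky_2_1.png")
```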

## KandinskyPriorPipeline

[[autodoc]] KandinskyPriorPipeline
- all
- __call__
- interpolate

## KandinskyPipeline

[[autodoc]] KandinskyPipeline
- all
- __call__

## KandinskyCombinedPipeline

[[autodoc]] KandinskyCombinedPipeline
- all
- __call__

## KandinskyImg2ImgPipeline

[[autodoc]] KandinskyImg2ImgPipeline
- all
- __call__

## KandinskyImg2ImgCombinedPipeline

[[autodoc]] KandinskyImg2ImgCombinedPipeline
- all
- __call__

## KandinskyInpaintPipeline

[[autodoc]] KandinskyInpaintPipeline
- all
- __call__

## KandinskyInpaintCombinedPipeline

[[autodoc]] KandinskyInpaintCombinedPipeline
- all
- __call__
# Kandinsky 2.2

Kandinsky 2.2 is created by [Arseniy Shakhmatov](https://github.com/cene555), [Anton Razzhigaev](https://github.com/razzant), [Aleksandr Nikolich](https://github.com/AlexWortega), [Igor Pavlov](https://github.com/boomb0om), [Andrey Kuznetsov](https://github.com/kuznetsoffandrey) and [Denis Dimitrov](https://github.com/denndimitrov).

The description from its GitHub page is:

*Kandinsky 2.2 brings substantial improvements upon its predecessor, Kandinsky 2.1, by introducing a new, more powerful image encoder - CLIP-ViT-G and the ControlNet support. The switch to CLIP-ViT-G as the image encoder significantly increases the model's capability to generate more aesthetic pictures and better understand text, thus enhancing the model's overall performance. The addition of the ControlNet mechanism allows the model to effectively control the process of generating images. This leads to more accurate and visually appealing outputs and opens new possibilities for text-guided image manipulation.*

The original codebase can be found at [ai-forever/Kandinsky-2](https://github.com/ai-forever/Kandinsky-2).

<Tip>

Check out the [Kandinsky Community](https://huggingface.co/kandinsky-community) organization on the Hub for the official model checkpoints for tasks like text-to-image, image-to-image, and inpainting.

</Tip>
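
A minimal text-to-image sketch follows; the `kandinsky-community/kandinsky-2-2-decoder` checkpoint name is an assumption based on the community organization linked above, and the auto pipeline is used so the prior is pulled in automatically.

```python
import torch
from diffusers import AutoPipelineForText2Image

# assumed checkpoint from the kandinsky-community organization
pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

prompt = "A portrait of a young woman with a colorful scarf, 4k photo"
negative_prompt = "low quality, bad quality"

image = pipe(prompt=prompt, negative_prompt=negative_prompt, height=768, width=768).images[0]
image.save("kandinsky_2_2.png")
```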

## KandinskyV22PriorPipeline

[[autodoc]] KandinskyV22PriorPipeline
- all
- __call__
- interpolate

## KandinskyV22Pipeline

[[autodoc]] KandinskyV22Pipeline
- all
- __call__

## KandinskyV22CombinedPipeline

[[autodoc]] KandinskyV22CombinedPipeline
- all
- __call__

## KandinskyV22ControlnetPipeline

[[autodoc]] KandinskyV22ControlnetPipeline
- all
- __call__

## KandinskyV22PriorEmb2EmbPipeline

[[autodoc]] KandinskyV22PriorEmb2EmbPipeline
- all
- __call__
- interpolate

## KandinskyV22Img2ImgPipeline

[[autodoc]] KandinskyV22Img2ImgPipeline
- all
- __call__

## KandinskyV22Img2ImgCombinedPipeline

[[autodoc]] KandinskyV22Img2ImgCombinedPipeline
- all
- __call__

## KandinskyV22ControlnetImg2ImgPipeline

[[autodoc]] KandinskyV22ControlnetImg2ImgPipeline
- all
- __call__

## KandinskyV22InpaintPipeline

[[autodoc]] KandinskyV22InpaintPipeline
- all
- __call__

## KandinskyV22InpaintCombinedPipeline

[[autodoc]] KandinskyV22InpaintCombinedPipeline
- all
- __call__
# Latent Consistency Models

Latent Consistency Models (LCMs) were proposed in [Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference](https://arxiv.org/abs/2310.04378) by Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, and Hang Zhao.

The abstract of the [paper](https://arxiv.org/pdf/2310.04378.pdf) is as follows:

*Latent Diffusion models (LDMs) have achieved remarkable results in synthesizing high-resolution images. However, the iterative sampling process is computationally intensive and leads to slow generation. Inspired by Consistency Models (song et al.), we propose Latent Consistency Models (LCMs), enabling swift inference with minimal steps on any pre-trained LDMs, including Stable Diffusion (rombach et al). Viewing the guided reverse diffusion process as solving an augmented probability flow ODE (PF-ODE), LCMs are designed to directly predict the solution of such ODE in latent space, mitigating the need for numerous iterations and allowing rapid, high-fidelity sampling. Efficiently distilled from pre-trained classifier-free guided diffusion models, a high-quality 768 x 768 2~4-step LCM takes only 32 A100 GPU hours for training. Furthermore, we introduce Latent Consistency Fine-tuning (LCF), a novel method that is tailored for fine-tuning LCMs on customized image datasets. Evaluation on the LAION-5B-Aesthetics dataset demonstrates that LCMs achieve state-of-the-art text-to-image generation performance with few-step inference.*

A demo for the [SimianLuo/LCM_Dreamshaper_v7](https://huggingface.co/SimianLuo/LCM_Dreamshaper_v7) checkpoint can be found [here](https://huggingface.co/spaces/SimianLuo/Latent_Consistency_Model).

The pipelines were contributed by [luosiallen](https://luosiallen.github.io/), [nagolinc](https://github.com/nagolinc), and [dg845](https://github.com/dg845).
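
As a quick sketch of the few-step inference described above, the snippet below loads the `SimianLuo/LCM_Dreamshaper_v7` checkpoint mentioned in the demo link and samples with only 4 inference steps; the prompt, step count, and guidance value are illustrative assumptions.

```python
import torch
from diffusers import DiffusionPipeline

# the checkpoint resolves to a LatentConsistencyModelPipeline
pipe = DiffusionPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7", torch_dtype=torch.float16)
pipe.to("cuda")

prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k"

# LCMs only need a handful of steps; guidance is folded into the distilled model
image = pipe(prompt=prompt, num_inference_steps=4, guidance_scale=8.0).images[0]
image.save("lcm_image.png")
```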

## LatentConsistencyModelPipeline

[[autodoc]] LatentConsistencyModelPipeline
- all
- __call__
- enable_freeu
- disable_freeu
- enable_vae_slicing
- disable_vae_slicing
- enable_vae_tiling
- disable_vae_tiling

## LatentConsistencyModelImg2ImgPipeline

[[autodoc]] LatentConsistencyModelImg2ImgPipeline
- all
- __call__
- enable_freeu
- disable_freeu
- enable_vae_slicing
- disable_vae_slicing
- enable_vae_tiling
- disable_vae_tiling

## StableDiffusionPipelineOutput

[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput
# Latent Diffusion

Latent Diffusion was proposed in [High-Resolution Image Synthesis with Latent Diffusion Models](https://huggingface.co/papers/2112.10752) by Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer.

The abstract from the paper is:

*By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. Additionally, their formulation allows for a guiding mechanism to control the image generation process without retraining. However, since these models typically operate directly in pixel space, optimization of powerful DMs often consumes hundreds of GPU days and inference is expensive due to sequential evaluations. To enable DM training on limited computational resources while retaining their quality and flexibility, we apply them in the latent space of powerful pretrained autoencoders. In contrast to previous work, training diffusion models on such a representation allows for the first time to reach a near-optimal point between complexity reduction and detail preservation, greatly boosting visual fidelity. By introducing cross-attention layers into the model architecture, we turn diffusion models into powerful and flexible generators for general conditioning inputs such as text or bounding boxes and high-resolution synthesis becomes possible in a convolutional manner. Our latent diffusion models (LDMs) achieve a new state of the art for image inpainting and highly competitive performance on various tasks, including unconditional image generation, semantic scene synthesis, and super-resolution, while significantly reducing computational requirements compared to pixel-based DMs.*

The original codebase can be found at [CompVis/latent-diffusion](https://github.com/CompVis/latent-diffusion).

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>

## Available Pipelines:

| Pipeline | Tasks | Colab |
|---|---|:---:|
| [pipeline_latent_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion/pipeline_latent_diffusion.py) | *Text-to-Image Generation* | - |
| [pipeline_latent_diffusion_superresolution.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion/pipeline_latent_diffusion_superresolution.py) | *Super Resolution* | - |
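
A minimal text-to-image sketch with [`LDMTextToImagePipeline`] is shown below; the `CompVis/ldm-text2im-large-256` checkpoint and the generation settings are illustrative assumptions.

```python
from diffusers import DiffusionPipeline

# assumed LDM text-to-image checkpoint; resolves to LDMTextToImagePipeline
pipe = DiffusionPipeline.from_pretrained("CompVis/ldm-text2im-large-256")
pipe = pipe.to("cuda")

prompt = "a painting of a squirrel eating a burger"
image = pipe(prompt, num_inference_steps=50, eta=0.3, guidance_scale=6).images[0]
image.save("ldm_squirrel.png")
```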

## LDMTextToImagePipeline
[[autodoc]] LDMTextToImagePipeline
- all
- __call__

## LDMSuperResolutionPipeline
[[autodoc]] LDMSuperResolutionPipeline
- all
- __call__

## ImagePipelineOutput
[[autodoc]] pipelines.ImagePipelineOutput
# Unconditional Latent Diffusion

Unconditional Latent Diffusion was proposed in [High-Resolution Image Synthesis with Latent Diffusion Models](https://huggingface.co/papers/2112.10752) by Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer.

The abstract from the paper is:

*By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. Additionally, their formulation allows for a guiding mechanism to control the image generation process without retraining. However, since these models typically operate directly in pixel space, optimization of powerful DMs often consumes hundreds of GPU days and inference is expensive due to sequential evaluations. To enable DM training on limited computational resources while retaining their quality and flexibility, we apply them in the latent space of powerful pretrained autoencoders. In contrast to previous work, training diffusion models on such a representation allows for the first time to reach a near-optimal point between complexity reduction and detail preservation, greatly boosting visual fidelity. By introducing cross-attention layers into the model architecture, we turn diffusion models into powerful and flexible generators for general conditioning inputs such as text or bounding boxes and high-resolution synthesis becomes possible in a convolutional manner. Our latent diffusion models (LDMs) achieve a new state of the art for image inpainting and highly competitive performance on various tasks, including unconditional image generation, semantic scene synthesis, and super-resolution, while significantly reducing computational requirements compared to pixel-based DMs.*

The original codebase can be found at [CompVis/latent-diffusion](https://github.com/CompVis/latent-diffusion).

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>

## Available Pipelines:

| Pipeline | Tasks | Colab |
|---|---|:---:|
| [pipeline_latent_diffusion_uncond.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion_uncond/pipeline_latent_diffusion_uncond.py) | *Unconditional Image Generation* | - |
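
A minimal unconditional sampling sketch with [`LDMPipeline`] follows; the `CompVis/ldm-celebahq-256` checkpoint and the step count are illustrative assumptions.

```python
from diffusers import LDMPipeline

# assumed unconditional LDM checkpoint trained on CelebA-HQ
pipe = LDMPipeline.from_pretrained("CompVis/ldm-celebahq-256")
pipe = pipe.to("cuda")

image = pipe(num_inference_steps=200).images[0]
image.save("ldm_generated_image.png")
```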

## LDMPipeline
[[autodoc]] LDMPipeline
- all
- __call__

## ImagePipelineOutput
[[autodoc]] pipelines.ImagePipelineOutput
# MusicLDM

MusicLDM was proposed in [MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies](https://huggingface.co/papers/2308.01546) by Ke Chen, Yusong Wu, Haohe Liu, Marianna Nezhurina, Taylor Berg-Kirkpatrick, Shlomo Dubnov.
MusicLDM takes a text prompt as input and predicts the corresponding music sample.

Inspired by [Stable Diffusion](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview) and [AudioLDM](https://huggingface.co/docs/diffusers/api/pipelines/audioldm/overview),
MusicLDM is a text-to-music _latent diffusion model (LDM)_ that learns continuous audio representations from [CLAP](https://huggingface.co/docs/transformers/main/model_doc/clap)
latents.

MusicLDM is trained on a corpus of 466 hours of music data. Beat-synchronous data augmentation strategies are applied to
the music samples, both in the time domain and in the latent space. Using beat-synchronous data augmentation strategies
encourages the model to interpolate between the training samples, but stay within the domain of the training data. The
result is generated music that is more diverse while staying faithful to the corresponding style.

The abstract of the paper is the following:

*In this paper, we present MusicLDM, a state-of-the-art text-to-music model that adapts Stable Diffusion and AudioLDM architectures to the music domain. We achieve this by retraining the contrastive language-audio pretraining model (CLAP) and the Hifi-GAN vocoder, as components of MusicLDM, on a collection of music data samples. Then, we leverage a beat tracking model and propose two different mixup strategies for data augmentation: beat-synchronous audio mixup and beat-synchronous latent mixup, to encourage the model to generate music more diverse while still staying faithful to the corresponding style.*

This pipeline was contributed by [sanchit-gandhi](https://huggingface.co/sanchit-gandhi).

## Tips

When constructing a prompt, keep in mind:

* Descriptive prompt inputs work best; use adjectives to describe the sound (for example, "high quality" or "clear") and make the prompt context specific where possible (e.g. "melodic techno with a fast beat and synths" works better than "techno").
* Using a *negative prompt* can significantly improve the quality of the generated audio. Try using a negative prompt of "low quality, average quality".

During inference:

* The _quality_ of the generated audio sample can be controlled by the `num_inference_steps` argument; higher steps give higher quality audio at the expense of slower inference.
* Multiple waveforms can be generated in one go: set `num_waveforms_per_prompt` to a value greater than 1 to enable it. Automatic scoring is performed between the generated waveforms and the prompt text, and the audios are ranked from best to worst accordingly.
* The _length_ of the generated audio sample can be controlled by varying the `audio_length_in_s` argument.

An end-to-end example that applies these tips is shown after the tip below.

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>
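
The sketch below is a minimal text-to-music example; the `ucsd-reach/musicldm` checkpoint name, the prompts, the generation settings, and the 16 kHz output sampling rate are assumptions for illustration.

```python
import scipy.io.wavfile
import torch
from diffusers import MusicLDMPipeline

# assumed MusicLDM checkpoint
pipe = MusicLDMPipeline.from_pretrained("ucsd-reach/musicldm", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "Techno music with a strong, upbeat tempo and high melodic riffs"
negative_prompt = "low quality, average quality"

audio = pipe(
    prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=200,
    audio_length_in_s=10.0,
).audios[0]

# assumed 16 kHz sampling rate for the generated waveform
scipy.io.wavfile.write("techno.wav", rate=16000, data=audio)
```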

## MusicLDMPipeline

[[autodoc]] MusicLDMPipeline
- all
- __call__
# Pipelines

Pipelines provide a simple way to run state-of-the-art diffusion models in inference by bundling all of the necessary components (multiple independently-trained models, schedulers, and processors) into a single end-to-end class. Pipelines are flexible and they can be adapted to use different schedulers or even model components.

All pipelines are built from the base [`DiffusionPipeline`] class which provides basic functionality for loading, downloading, and saving all the components. Specific pipeline types (for example [`StableDiffusionPipeline`]) loaded with [`~DiffusionPipeline.from_pretrained`] are automatically detected and the pipeline components are loaded and passed to the `__init__` function of the pipeline.
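
For example, here is a minimal sketch of this automatic detection; the `runwayml/stable-diffusion-v1-5` checkpoint is used purely for illustration.

```python
from diffusers import DiffusionPipeline

# the repository's model_index.json determines which pipeline class is instantiated;
# here the generic DiffusionPipeline resolves to a StableDiffusionPipeline
pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

print(pipeline.__class__.__name__)  # StableDiffusionPipeline
print(list(pipeline.components))    # e.g. ['vae', 'text_encoder', 'tokenizer', 'unet', 'scheduler', ...]
```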

<Tip warning={true}>

You shouldn't use the [`DiffusionPipeline`] class for training. Individual components (for example, [`UNet2DModel`] and [`UNet2DConditionModel`]) of diffusion pipelines are usually trained individually, so we suggest directly working with them instead.

<br>

Pipelines do not offer any training functionality. You'll notice PyTorch's autograd is disabled by decorating the [`~DiffusionPipeline.__call__`] method with a [`torch.no_grad`](https://pytorch.org/docs/stable/generated/torch.no_grad.html) decorator because pipelines should not be used for training. If you're interested in training, please take a look at the [Training](../../training/overview) guides instead!

</Tip>

The table below lists all the pipelines currently available in 🤗 Diffusers and the tasks they support. Click on a pipeline to view its abstract and published paper.

| Pipeline | Tasks |
|---|---|
| [AltDiffusion](alt_diffusion) | image2image |
| [Attend-and-Excite](attend_and_excite) | text2image |
| [Audio Diffusion](audio_diffusion) | image2audio |
| [AudioLDM](audioldm) | text2audio |
| [AudioLDM2](audioldm2) | text2audio |
| [BLIP Diffusion](blip_diffusion) | text2image |
| [Consistency Models](consistency_models) | unconditional image generation |
| [ControlNet](controlnet) | text2image, image2image, inpainting |
| [ControlNet with Stable Diffusion XL](controlnet_sdxl) | text2image |
| [Cycle Diffusion](cycle_diffusion) | image2image |
| [Dance Diffusion](dance_diffusion) | unconditional audio generation |
| [DDIM](ddim) | unconditional image generation |
| [DDPM](ddpm) | unconditional image generation |
| [DeepFloyd IF](deepfloyd_if) | text2image, image2image, inpainting, super-resolution |
| [DiffEdit](diffedit) | inpainting |
| [DiT](dit) | class-conditional image generation |
| [GLIGEN](gligen) | text2image |
| [InstructPix2Pix](pix2pix) | image editing |
| [Kandinsky](kandinsky) | text2image, image2image, inpainting, interpolation |
| [Kandinsky 2.2](kandinsky_v22) | text2image, image2image, inpainting |
| [Latent Diffusion](latent_diffusion) | text2image, super-resolution |
| [LDM3D](ldm3d_diffusion) | text2image, text-to-3D |
| [MultiDiffusion](panorama) | text2image |
| [MusicLDM](musicldm) | text2audio |
| [PaintByExample](paint_by_example) | inpainting |
| [ParaDiGMS](paradigms) | text2image |
| [Pix2Pix Zero](pix2pix_zero) | image editing |
| [PNDM](pndm) | unconditional image generation |
| [RePaint](repaint) | inpainting |
| [ScoreSdeVe](score_sde_ve) | unconditional image generation |
| [Self-Attention Guidance](self_attention_guidance) | text2image |
| [Semantic Guidance](semantic_stable_diffusion) | text2image |
| [Shap-E](shap_e) | text-to-3D, image-to-3D |
| [Spectrogram Diffusion](spectrogram_diffusion) | |
| [Stable Diffusion](stable_diffusion/overview) | text2image, image2image, depth2image, inpainting, image variation, latent upscaler, super-resolution |
| [Stable Diffusion Model Editing](model_editing) | model editing |
| [Stable Diffusion XL](stable_diffusion_xl) | text2image, image2image, inpainting |
| [Stable unCLIP](stable_unclip) | text2image, image variation |
| [KarrasVe](karras_ve) | unconditional image generation |
| [T2I Adapter](adapter) | text2image |
| [Text2Video](text_to_video) | text2video, video2video |
|
|
||||||
| [Text2Video Zero](text_to_video_zero) | text2video |
|
|
||||||
| [UnCLIP](unclip) | text2image, image variation |
|
|
||||||
| [Unconditional Latent Diffusion](latent_diffusion_uncond) | unconditional image generation |
|
|
||||||
| [UniDiffuser](unidiffuser) | text2image, image2text, image variation, text variation, unconditional image generation, unconditional audio generation |
|
|
||||||
| [Value-guided planning](value_guided_sampling) | value guided sampling |
|
|
||||||
| [Versatile Diffusion](versatile_diffusion) | text2image, image variation |
|
|
||||||
| [VQ Diffusion](vq_diffusion) | text2image |
|
|
||||||
| [Wuerstchen](wuerstchen) | text2image |
|
|
||||||
|
|
||||||
## DiffusionPipeline
|
|
||||||
|
|
||||||
[[autodoc]] DiffusionPipeline
|
|
||||||
- all
|
|
||||||
- __call__
|
|
||||||
- device
|
|
||||||
- to
|
|
||||||
- components
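As a quick illustration of the flexibility described above, the sketch below swaps the scheduler of a loaded pipeline and reuses its components in a second pipeline without reloading any weights; it assumes the `runwayml/stable-diffusion-v1-5` checkpoint used elsewhere in these docs and a CUDA device.

```python
import torch
from diffusers import (
    StableDiffusionPipeline,
    StableDiffusionImg2ImgPipeline,
    DPMSolverMultistepScheduler,
)

# load a text-to-image pipeline once
text2img = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# swap in a different scheduler without reloading the model weights
text2img.scheduler = DPMSolverMultistepScheduler.from_config(text2img.scheduler.config)

# reuse the already-loaded components in an image-to-image pipeline (no extra memory)
img2img = StableDiffusionImg2ImgPipeline(**text2img.components)

image = text2img("a photo of an astronaut riding a horse on mars").images[0]
```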
## FlaxDiffusionPipeline

[[autodoc]] pipelines.pipeline_flax_utils.FlaxDiffusionPipeline

## PushToHubMixin

[[autodoc]] utils.PushToHubMixin

214
docs/source/en/api/pipelines/overview.mdx
Normal file
@@ -0,0 +1,214 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Pipelines

Pipelines provide a simple way to run state-of-the-art diffusion models in inference.
Most diffusion systems consist of multiple independently-trained models and highly adaptable scheduler
components - all of which are needed to have a functioning end-to-end diffusion system.

As an example, [Stable Diffusion](https://huggingface.co/blog/stable_diffusion) is built from three independently trained models plus a few supporting components:
- an [Autoencoder](./api/models#vae)
- a [Conditional UNet](./api/models#UNet2DConditionModel)
- a [CLIP text encoder](https://huggingface.co/docs/transformers/v4.27.1/en/model_doc/clip#transformers.CLIPTextModel)
- a scheduler component, [scheduler](./api/scheduler#pndm),
- a [CLIPImageProcessor](https://huggingface.co/docs/transformers/v4.27.1/en/model_doc/clip#transformers.CLIPImageProcessor),
- as well as a [safety checker](./stable_diffusion#safety_checker).

All of these components are necessary to run Stable Diffusion in inference even though they were trained
or created independently from each other.

To that end, we strive to offer all open-sourced, state-of-the-art diffusion systems under a unified API.
More specifically, we strive to provide pipelines that
- 1. can load the officially published weights and yield 1-to-1 the same outputs as the original implementation according to the corresponding paper (*e.g.* [LDMTextToImagePipeline](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/latent_diffusion), which uses the officially released weights of [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752)),
- 2. have a simple user interface to run the model in inference (see the [Pipelines API](#pipelines-api) section),
- 3. are easy to understand with code that is self-explanatory and can be read alongside the official paper (see [Pipelines summary](#pipelines-summary)),
- 4. can easily be contributed by the community (see the [Contribution](#contribution) section).

**Note** that pipelines do not (and should not) offer any training functionality.
If you are looking for *official* training examples, please have a look at [examples](https://github.com/huggingface/diffusers/tree/main/examples).
## 🧨 Diffusers Summary

The following table summarizes all officially supported pipelines, their corresponding paper, and if
available a colab notebook to directly try them out.

| Pipeline | Paper | Tasks | Colab |
|---|---|:---:|:---:|
| [alt_diffusion](./alt_diffusion) | [**AltDiffusion**](https://arxiv.org/abs/2211.06679) | Image-to-Image Text-Guided Generation | -
| [audio_diffusion](./audio_diffusion) | [**Audio Diffusion**](https://github.com/teticio/audio_diffusion.git) | Unconditional Audio Generation |
| [controlnet](./api/pipelines/stable_diffusion/controlnet) | [**ControlNet with Stable Diffusion**](https://arxiv.org/abs/2302.05543) | Image-to-Image Text-Guided Generation | [](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/controlnet.ipynb)
| [cycle_diffusion](./cycle_diffusion) | [**Cycle Diffusion**](https://arxiv.org/abs/2210.05559) | Image-to-Image Text-Guided Generation |
| [dance_diffusion](./dance_diffusion) | [**Dance Diffusion**](https://github.com/williamberman/diffusers.git) | Unconditional Audio Generation |
| [ddpm](./ddpm) | [**Denoising Diffusion Probabilistic Models**](https://arxiv.org/abs/2006.11239) | Unconditional Image Generation |
| [ddim](./ddim) | [**Denoising Diffusion Implicit Models**](https://arxiv.org/abs/2010.02502) | Unconditional Image Generation |
| [latent_diffusion](./latent_diffusion) | [**High-Resolution Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2112.10752)| Text-to-Image Generation |
| [latent_diffusion](./latent_diffusion) | [**High-Resolution Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2112.10752)| Super Resolution Image-to-Image |
| [latent_diffusion_uncond](./latent_diffusion_uncond) | [**High-Resolution Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2112.10752) | Unconditional Image Generation |
| [paint_by_example](./paint_by_example) | [**Paint by Example: Exemplar-based Image Editing with Diffusion Models**](https://arxiv.org/abs/2211.13227) | Image-Guided Image Inpainting |
| [pndm](./pndm) | [**Pseudo Numerical Methods for Diffusion Models on Manifolds**](https://arxiv.org/abs/2202.09778) | Unconditional Image Generation |
| [score_sde_ve](./score_sde_ve) | [**Score-Based Generative Modeling through Stochastic Differential Equations**](https://openreview.net/forum?id=PxTIG12RRHS) | Unconditional Image Generation |
| [score_sde_vp](./score_sde_vp) | [**Score-Based Generative Modeling through Stochastic Differential Equations**](https://openreview.net/forum?id=PxTIG12RRHS) | Unconditional Image Generation |
| [semantic_stable_diffusion](./semantic_stable_diffusion) | [**SEGA: Instructing Diffusion using Semantic Dimensions**](https://arxiv.org/abs/2301.12247) | Text-to-Image Generation |
| [stable_diffusion_text2img](./stable_diffusion/text2img) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Text-to-Image Generation | [](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb)
| [stable_diffusion_img2img](./stable_diffusion/img2img) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Image-to-Image Text-Guided Generation | [](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/image_2_image_using_diffusers.ipynb)
| [stable_diffusion_inpaint](./stable_diffusion/inpaint) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | Text-Guided Image Inpainting | [](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/in_painting_with_stable_diffusion_using_diffusers.ipynb)
| [stable_diffusion_panorama](./stable_diffusion/panorama) | [**MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation**](https://arxiv.org/abs/2302.08113) | Text-Guided Panorama View Generation |
| [stable_diffusion_pix2pix](./stable_diffusion/pix2pix) | [**InstructPix2Pix: Learning to Follow Image Editing Instructions**](https://arxiv.org/abs/2211.09800) | Text-Based Image Editing |
| [stable_diffusion_pix2pix_zero](./stable_diffusion/pix2pix_zero) | [**Zero-shot Image-to-Image Translation**](https://arxiv.org/abs/2302.03027) | Text-Based Image Editing |
| [stable_diffusion_attend_and_excite](./stable_diffusion/attend_and_excite) | [**Attend and Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models**](https://arxiv.org/abs/2301.13826) | Text-to-Image Generation |
| [stable_diffusion_self_attention_guidance](./stable_diffusion/self_attention_guidance) | [**Self-Attention Guidance**](https://arxiv.org/abs/2210.00939) | Text-to-Image Generation |
| [stable_diffusion_image_variation](./stable_diffusion/image_variation) | [**Stable Diffusion Image Variations**](https://github.com/LambdaLabsML/lambda-diffusers#stable-diffusion-image-variations) | Image-to-Image Generation |
| [stable_diffusion_latent_upscale](./stable_diffusion/latent_upscale) | [**Stable Diffusion Latent Upscaler**](https://twitter.com/StabilityAI/status/1590531958815064065) | Text-Guided Super Resolution Image-to-Image |
| [stable_diffusion_2](./stable_diffusion_2/) | [**Stable Diffusion 2**](https://stability.ai/blog/stable-diffusion-v2-release) | Text-to-Image Generation |
| [stable_diffusion_2](./stable_diffusion_2) | [**Stable Diffusion 2**](https://stability.ai/blog/stable-diffusion-v2-release) | Text-Guided Image Inpainting |
| [stable_diffusion_2](./stable_diffusion_2) | [**Stable Diffusion 2**](https://stability.ai/blog/stable-diffusion-v2-release) | Depth-to-Image Text-Guided Generation |
| [stable_diffusion_2](./stable_diffusion_2) | [**Stable Diffusion 2**](https://stability.ai/blog/stable-diffusion-v2-release) | Text-Guided Super Resolution Image-to-Image |
| [stable_diffusion_safe](./stable_diffusion_safe) | [**Safe Stable Diffusion**](https://arxiv.org/abs/2211.05105) | Text-Guided Generation | [](https://colab.research.google.com/github/ml-research/safe-latent-diffusion/blob/main/examples/Safe%20Latent%20Diffusion.ipynb)
| [stable_unclip](./stable_unclip) | **Stable unCLIP** | Text-to-Image Generation |
| [stable_unclip](./stable_unclip) | **Stable unCLIP** | Image-to-Image Text-Guided Generation |
| [stochastic_karras_ve](./stochastic_karras_ve) | [**Elucidating the Design Space of Diffusion-Based Generative Models**](https://arxiv.org/abs/2206.00364) | Unconditional Image Generation |
| [text_to_video_sd](./api/pipelines/text_to_video) | [Modelscope's Text-to-video-synthesis Model in Open Domain](https://modelscope.cn/models/damo/text-to-video-synthesis/summary) | Text-to-Video Generation |
| [unclip](./unclip) | [Hierarchical Text-Conditional Image Generation with CLIP Latents](https://arxiv.org/abs/2204.06125) | Text-to-Image Generation |
| [versatile_diffusion](./versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Text-to-Image Generation |
| [versatile_diffusion](./versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Image Variations Generation |
| [versatile_diffusion](./versatile_diffusion) | [Versatile Diffusion: Text, Images and Variations All in One Diffusion Model](https://arxiv.org/abs/2211.08332) | Dual Image and Text Guided Generation |
| [vq_diffusion](./vq_diffusion) | [Vector Quantized Diffusion Model for Text-to-Image Synthesis](https://arxiv.org/abs/2111.14822) | Text-to-Image Generation |
| [text_to_video_zero](./text_to_video_zero) | [Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators](https://arxiv.org/abs/2303.13439) | Text-to-Video Generation |
**Note**: Pipelines are simple examples of how to play around with the diffusion systems as described in the corresponding papers.

However, most of them can be adapted to use different scheduler components or even different model components. Some pipeline examples are shown in the [Examples](#examples) below.

## Pipelines API

Diffusion models often consist of multiple independently-trained models or other previously existing components.

Each model has been trained independently on a different task and the scheduler can easily be swapped out and replaced with a different one.
During inference, however, we want to be able to easily load all components and use them in inference - even if one component, *e.g.* CLIP's text encoder, originates from a different library, such as [Transformers](https://github.com/huggingface/transformers). To that end, all pipelines provide the following functionality (a short sketch follows the list):

- [`from_pretrained` method](../diffusion_pipeline) that accepts a Hugging Face Hub repository id, *e.g.* [runwayml/stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5), or a path to a local directory, *e.g.*
"./stable-diffusion". To correctly retrieve which models and components should be loaded, one has to provide a `model_index.json` file, *e.g.* [runwayml/stable-diffusion-v1-5/model_index.json](https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/model_index.json), which defines all components that should be
loaded into the pipelines. More specifically, for each model/component one needs to define the format `<name>: ["<library>", "<class name>"]`. `<name>` is the attribute name given to the loaded instance of `<class name>` which can be found in the library or pipeline folder called `"<library>"`.
- [`save_pretrained`](../diffusion_pipeline) that accepts a local path, *e.g.* `./stable-diffusion`, under which all models/components of the pipeline will be saved. For each component/model a folder is created inside the local path that is named after the given attribute name, *e.g.* `./stable_diffusion/unet`.
In addition, a `model_index.json` file is created at the root of the local path, *e.g.* `./stable_diffusion/model_index.json`, so that the complete pipeline can again be instantiated
from the local path.
- [`to`](../diffusion_pipeline) which accepts a `string` or `torch.device` to move all models that are of type `torch.nn.Module` to the passed device. The behavior is fully analogous to [PyTorch's `to` method](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.to).
- [`__call__`] method to use the pipeline in inference. `__call__` defines the inference logic of the pipeline and should ideally encompass all aspects of it, from pre-processing to forwarding tensors to the different models and schedulers, as well as post-processing. The API of the `__call__` method can vary strongly from pipeline to pipeline. *E.g.* a text-to-image pipeline, such as [`StableDiffusionPipeline`](./stable_diffusion), should accept among other things the text prompt to generate the image. A pure image generation pipeline, such as [DDPMPipeline](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/ddpm), on the other hand can be run without providing any inputs. To better understand what inputs can be adapted for
each pipeline, one should look directly into the respective pipeline.
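A minimal sketch of these four pieces of functionality, assuming the `runwayml/stable-diffusion-v1-5` checkpoint mentioned above and a CUDA device:

```python
import torch
from diffusers import DiffusionPipeline

# from_pretrained: download the components defined in model_index.json from the Hub
pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)

# to: move every torch.nn.Module component to the GPU
pipe = pipe.to("cuda")

# __call__: run inference; the accepted arguments depend on the concrete pipeline class
image = pipe("a photo of an astronaut riding a horse on mars").images[0]

# save_pretrained: write each component to its own subfolder plus a model_index.json
pipe.save_pretrained("./stable-diffusion")

# the saved directory can be loaded again with from_pretrained
pipe = DiffusionPipeline.from_pretrained("./stable-diffusion", torch_dtype=torch.float16)
```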
**Note**: All pipelines have PyTorch's autograd disabled by decorating the `__call__` method with a [`torch.no_grad`](https://pytorch.org/docs/stable/generated/torch.no_grad.html) decorator because pipelines should
not be used for training. If you want to store the gradients during the forward pass, we recommend writing your own pipeline; see also our [community-examples](https://github.com/huggingface/diffusers/tree/main/examples/community).

## Contribution

We are more than happy about any contribution to the officially supported pipelines 🤗. We aspire
for all of our pipelines to be **self-contained**, **easy-to-tweak**, **beginner-friendly**, and **one-purpose-only**.

- **Self-contained**: A pipeline shall be as self-contained as possible. More specifically, this means that all functionality should be either directly defined in the pipeline file itself, should be inherited from (and only from) the [`DiffusionPipeline` class](../diffusion_pipeline), or be directly attached to the model and scheduler components of the pipeline.
- **Easy-to-use**: Pipelines should be extremely easy to use - one should be able to load the pipeline and
use it for its designated task, *e.g.* text-to-image generation, in just a couple of lines of code. Most
logic including pre-processing, an unrolled diffusion loop, and post-processing should all happen inside the `__call__` method.
- **Easy-to-tweak**: Certain pipelines will not be able to handle all use cases and tasks that you might like them to. If you want to use a certain pipeline for a specific use case that is not yet supported, you might have to copy the pipeline file and tweak the code to your needs. We try to make the pipeline code as readable as possible so that each part, from pre-processing to diffusing to post-processing, can easily be adapted. If you would like the community to benefit from your customized pipeline, we would love to see a contribution to our [community-examples](https://github.com/huggingface/diffusers/tree/main/examples/community). If you feel that an important pipeline should be part of the official pipelines but isn't, a contribution to the [official pipelines](./overview) would be even better.
- **One-purpose-only**: Pipelines should be used for one task and one task only. Even if two tasks are very similar from a modeling point of view, *e.g.* image2image translation and in-painting, pipelines shall be used for one task only to keep them *easy-to-tweak* and *readable*.
## Examples

### Text-to-Image generation with Stable Diffusion

```python
# make sure you're logged in with `huggingface-cli login`
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]

image.save("astronaut_rides_horse.png")
```
### Image-to-Image text-guided generation with Stable Diffusion

The `StableDiffusionImg2ImgPipeline` lets you pass a text prompt and an initial image to condition the generation of new images.

```python
import requests
import torch
from PIL import Image
from io import BytesIO

from diffusers import StableDiffusionImg2ImgPipeline

# load the pipeline
device = "cuda"
pipe = StableDiffusionImg2ImgPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to(
    device
)

# let's download an initial image
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"

response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((768, 512))

prompt = "A fantasy landscape, trending on artstation"

images = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images

images[0].save("fantasy_landscape.png")
```

You can also run this example on colab [](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/image_2_image_using_diffusers.ipynb)
### Tweak prompts reusing seeds and latents

You can generate your own latents to reproduce results, or tweak your prompt on a specific result you liked. [This notebook](https://github.com/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb) shows how to do it step by step. You can also run it in Google Colab [](https://colab.research.google.com/github/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb)

### In-painting using Stable Diffusion

The `StableDiffusionInpaintPipeline` lets you edit specific parts of an image by providing a mask and text prompt.

```python
import PIL
import requests
import torch
from io import BytesIO

from diffusers import StableDiffusionInpaintPipeline


def download_image(url):
    response = requests.get(url)
    return PIL.Image.open(BytesIO(response.content)).convert("RGB")


img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

init_image = download_image(img_url).resize((512, 512))
mask_image = download_image(mask_url).resize((512, 512))

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
image = pipe(prompt=prompt, image=init_image, mask_image=mask_image).images[0]
```

You can also run this example on colab [](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/in_painting_with_stable_diffusion_using_diffusers.ipynb)
@@ -1,39 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Paint By Example

[Paint by Example: Exemplar-based Image Editing with Diffusion Models](https://huggingface.co/papers/2211.13227) is by Binxin Yang, Shuyang Gu, Bo Zhang, Ting Zhang, Xuejin Chen, Xiaoyan Sun, Dong Chen, Fang Wen.

The abstract from the paper is:

*Language-guided image editing has achieved great success recently. In this paper, for the first time, we investigate exemplar-guided image editing for more precise control. We achieve this goal by leveraging self-supervised training to disentangle and re-organize the source image and the exemplar. However, the naive approach will cause obvious fusing artifacts. We carefully analyze it and propose an information bottleneck and strong augmentations to avoid the trivial solution of directly copying and pasting the exemplar image. Meanwhile, to ensure the controllability of the editing process, we design an arbitrary shape mask for the exemplar image and leverage the classifier-free guidance to increase the similarity to the exemplar image. The whole framework involves a single forward of the diffusion model without any iterative optimization. We demonstrate that our method achieves an impressive performance and enables controllable editing on in-the-wild images with high fidelity.*

The original codebase can be found at [Fantasy-Studio/Paint-by-Example](https://github.com/Fantasy-Studio/Paint-by-Example), and you can try it out in a [demo](https://huggingface.co/spaces/Fantasy-Studio/Paint-by-Example).

## Tips

PaintByExample is supported by the official [Fantasy-Studio/Paint-by-Example](https://huggingface.co/Fantasy-Studio/Paint-by-Example) checkpoint. The checkpoint is warm-started from [CompVis/stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4) to inpaint partly masked images conditioned on example and reference images.

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>

## PaintByExamplePipeline

[[autodoc]] PaintByExamplePipeline
- all
- __call__

## StableDiffusionPipelineOutput

[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput

74
docs/source/en/api/pipelines/paint_by_example.mdx
Normal file
@@ -0,0 +1,74 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# PaintByExample

## Overview

[Paint by Example: Exemplar-based Image Editing with Diffusion Models](https://arxiv.org/abs/2211.13227) by Binxin Yang, Shuyang Gu, Bo Zhang, Ting Zhang, Xuejin Chen, Xiaoyan Sun, Dong Chen, Fang Wen.

The abstract of the paper is the following:

*Language-guided image editing has achieved great success recently. In this paper, for the first time, we investigate exemplar-guided image editing for more precise control. We achieve this goal by leveraging self-supervised training to disentangle and re-organize the source image and the exemplar. However, the naive approach will cause obvious fusing artifacts. We carefully analyze it and propose an information bottleneck and strong augmentations to avoid the trivial solution of directly copying and pasting the exemplar image. Meanwhile, to ensure the controllability of the editing process, we design an arbitrary shape mask for the exemplar image and leverage the classifier-free guidance to increase the similarity to the exemplar image. The whole framework involves a single forward of the diffusion model without any iterative optimization. We demonstrate that our method achieves an impressive performance and enables controllable editing on in-the-wild images with high fidelity.*

The original codebase can be found [here](https://github.com/Fantasy-Studio/Paint-by-Example).

## Available Pipelines:

| Pipeline | Tasks | Colab |
|---|---|:---:|
| [pipeline_paint_by_example.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/paint_by_example/pipeline_paint_by_example.py) | *Image-Guided Image Painting* | - |

## Tips

- PaintByExample is supported by the official [Fantasy-Studio/Paint-by-Example](https://huggingface.co/Fantasy-Studio/Paint-by-Example) checkpoint. The checkpoint has been warm-started from [CompVis/stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4) with the objective of inpainting partly masked images conditioned on example / reference images.
- To quickly demo *PaintByExample*, please have a look at [this demo](https://huggingface.co/spaces/Fantasy-Studio/Paint-by-Example).
- You can run the following code snippet as an example:


```python
# !pip install diffusers transformers

import PIL
import requests
import torch
from io import BytesIO
from diffusers import DiffusionPipeline


def download_image(url):
    response = requests.get(url)
    return PIL.Image.open(BytesIO(response.content)).convert("RGB")


img_url = "https://raw.githubusercontent.com/Fantasy-Studio/Paint-by-Example/main/examples/image/example_1.png"
mask_url = "https://raw.githubusercontent.com/Fantasy-Studio/Paint-by-Example/main/examples/mask/example_1.png"
example_url = "https://raw.githubusercontent.com/Fantasy-Studio/Paint-by-Example/main/examples/reference/example_1.jpg"

init_image = download_image(img_url).resize((512, 512))
mask_image = download_image(mask_url).resize((512, 512))
example_image = download_image(example_url).resize((512, 512))

pipe = DiffusionPipeline.from_pretrained(
    "Fantasy-Studio/Paint-by-Example",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

image = pipe(image=init_image, mask_image=mask_image, example_image=example_image).images[0]
image
```

## PaintByExamplePipeline

[[autodoc]] PaintByExamplePipeline
- all
- __call__
@@ -1,57 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# MultiDiffusion

[MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation](https://huggingface.co/papers/2302.08113) is by Omer Bar-Tal, Lior Yariv, Yaron Lipman, and Tali Dekel.

The abstract from the paper is:

*Recent advances in text-to-image generation with diffusion models present transformative capabilities in image quality. However, user controllability of the generated image, and fast adaptation to new tasks still remains an open challenge, currently mostly addressed by costly and long re-training and fine-tuning or ad-hoc adaptations to specific image generation tasks. In this work, we present MultiDiffusion, a unified framework that enables versatile and controllable image generation, using a pre-trained text-to-image diffusion model, without any further training or finetuning. At the center of our approach is a new generation process, based on an optimization task that binds together multiple diffusion generation processes with a shared set of parameters or constraints. We show that MultiDiffusion can be readily applied to generate high quality and diverse images that adhere to user-provided controls, such as desired aspect ratio (e.g., panorama), and spatial guiding signals, ranging from tight segmentation masks to bounding boxes.*

You can find additional information about MultiDiffusion on the [project page](https://multidiffusion.github.io/), [original codebase](https://github.com/omerbt/MultiDiffusion), and try it out in a [demo](https://huggingface.co/spaces/weizmannscience/MultiDiffusion).

## Tips

While calling [`StableDiffusionPanoramaPipeline`], it's possible to specify the `view_batch_size` parameter to be > 1.
On high-performance GPUs, this can speed up the generation process at the cost of increased VRAM usage.

To generate panorama-like images, make sure you pass the `width` parameter accordingly. We recommend a width value of 2048, which is the default.

Circular padding is applied to avoid stitching artifacts when working with
panoramas and to ensure a seamless transition from the rightmost part to the leftmost part.
By enabling circular padding (set `circular_padding=True`), the operation applies additional
crops after the rightmost point of the image, allowing the model to "see" the transition
from the rightmost part to the leftmost part. This helps maintain visual consistency in
a 360-degree sense and creates a proper "panorama" that can be viewed using 360-degree
panorama viewers. When decoding latents in Stable Diffusion, circular padding is applied
to ensure that the decoded latents match in the RGB space.

For example, without circular padding, there is a stitching artifact (default):


But with circular padding, the right and the left parts are matching (`circular_padding=True`):

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>

## StableDiffusionPanoramaPipeline

[[autodoc]] StableDiffusionPanoramaPipeline
- __call__
- all

## StableDiffusionPipelineOutput

[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput
@@ -1,54 +0,0 @@
<!--Copyright 2023 ParaDiGMS authors and The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Parallel Sampling of Diffusion Models

[Parallel Sampling of Diffusion Models](https://huggingface.co/papers/2305.16317) is by Andy Shih, Suneel Belkhale, Stefano Ermon, Dorsa Sadigh, Nima Anari.

The abstract from the paper is:

*Diffusion models are powerful generative models but suffer from slow sampling, often taking 1000 sequential denoising steps for one sample. As a result, considerable efforts have been directed toward reducing the number of denoising steps, but these methods hurt sample quality. Instead of reducing the number of denoising steps (trading quality for speed), in this paper we explore an orthogonal approach: can we run the denoising steps in parallel (trading compute for speed)? In spite of the sequential nature of the denoising steps, we show that surprisingly it is possible to parallelize sampling via Picard iterations, by guessing the solution of future denoising steps and iteratively refining until convergence. With this insight, we present ParaDiGMS, a novel method to accelerate the sampling of pretrained diffusion models by denoising multiple steps in parallel. ParaDiGMS is the first diffusion sampling method that enables trading compute for speed and is even compatible with existing fast sampling techniques such as DDIM and DPMSolver. Using ParaDiGMS, we improve sampling speed by 2-4x across a range of robotics and image generation models, giving state-of-the-art sampling speeds of 0.2s on 100-step DiffusionPolicy and 16s on 1000-step StableDiffusion-v2 with no measurable degradation of task reward, FID score, or CLIP score.*

The original codebase can be found at [AndyShih12/paradigms](https://github.com/AndyShih12/paradigms), and the pipeline was contributed by [AndyShih12](https://github.com/AndyShih12). ❤️

## Tips

This pipeline improves sampling speed by running denoising steps in parallel, at the cost of increased total FLOPs.
Therefore, it is better to call this pipeline when running on multiple GPUs. Otherwise, without enough GPU bandwidth,
sampling may be even slower than sequential sampling.

The two parameters to play with are `parallel` (batch size) and `tolerance`.
- If it fits in memory, for a 1000-step DDPM you can aim for a batch size of around 100
(for example, 8 GPUs and `batch_per_device=12` to get `parallel=96`). A higher batch size
may not fit in memory, and a lower batch size gives less parallelism.
- For tolerance, using a higher tolerance may get better speedups but can risk sample quality degradation.
If there is quality degradation with the default tolerance, then use a lower tolerance like `0.001`.

For a 1000-step DDPM on 8 A100 GPUs, you can expect around a 3x speedup from [`StableDiffusionParadigmsPipeline`] compared to the [`StableDiffusionPipeline`]
by setting `parallel=80` and `tolerance=0.1`.

🤗 Diffusers offers [distributed inference support](../training/distributed_inference) for generating multiple prompts
in parallel on multiple GPUs. But [`StableDiffusionParadigmsPipeline`] is designed for speeding up sampling of a single prompt by using multiple GPUs.
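The sketch below shows how `parallel` and `tolerance` are typically wired up. The `stabilityai/stable-diffusion-2` checkpoint, the `DDPMParallelScheduler`, and the `wrapped_unet` data-parallel attribute are assumptions based on this pipeline's usage pattern; verify them against the API reference below before relying on them.

```python
import torch
from diffusers import DDPMParallelScheduler, StableDiffusionParadigmsPipeline

model_id = "stabilityai/stable-diffusion-2"  # assumed checkpoint
scheduler = DDPMParallelScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionParadigmsPipeline.from_pretrained(
    model_id, scheduler=scheduler, torch_dtype=torch.float16
).to("cuda")

# shard the parallel batch of denoising steps across all visible GPUs (attribute name assumed)
ngpu, batch_per_device = torch.cuda.device_count(), 5
pipe.wrapped_unet = torch.nn.DataParallel(pipe.unet, device_ids=list(range(ngpu)))

image = pipe(
    "a beautiful castle, matte painting",
    parallel=ngpu * batch_per_device,  # how many denoising steps to attempt in parallel
    tolerance=0.1,                     # looser tolerance -> bigger speedup, possible quality loss
    num_inference_steps=1000,
).images[0]
```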
<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>

## StableDiffusionParadigmsPipeline

[[autodoc]] StableDiffusionParadigmsPipeline
- __call__
- all

## StableDiffusionPipelineOutput

[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput
@@ -1,36 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# PixArt



[PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis](https://huggingface.co/papers/2310.00426) is by Junsong Chen, Jincheng Yu, Chongjian Ge, Lewei Yao, Enze Xie, Yue Wu, Zhongdao Wang, James Kwok, Ping Luo, Huchuan Lu, and Zhenguo Li.

The abstract from the paper is:

*The most advanced text-to-image (T2I) models require significant training costs (e.g., millions of GPU hours), seriously hindering the fundamental innovation for the AIGC community while increasing CO2 emissions. This paper introduces PIXART-α, a Transformer-based T2I diffusion model whose image generation quality is competitive with state-of-the-art image generators (e.g., Imagen, SDXL, and even Midjourney), reaching near-commercial application standards. Additionally, it supports high-resolution image synthesis up to 1024px resolution with low training cost, as shown in Figure 1 and 2. To achieve this goal, three core designs are proposed: (1) Training strategy decomposition: We devise three distinct training steps that separately optimize pixel dependency, text-image alignment, and image aesthetic quality; (2) Efficient T2I Transformer: We incorporate cross-attention modules into Diffusion Transformer (DiT) to inject text conditions and streamline the computation-intensive class-condition branch; (3) High-informative data: We emphasize the significance of concept density in text-image pairs and leverage a large Vision-Language model to auto-label dense pseudo-captions to assist text-image alignment learning. As a result, PIXART-α's training speed markedly surpasses existing large-scale T2I models, e.g., PIXART-α only takes 10.8% of Stable Diffusion v1.5's training time (675 vs. 6,250 A100 GPU days), saving nearly $300,000 ($26,000 vs. $320,000) and reducing 90% CO2 emissions. Moreover, compared with a larger SOTA model, RAPHAEL, our training cost is merely 1%. Extensive experiments demonstrate that PIXART-α excels in image quality, artistry, and semantic control. We hope PIXART-α will provide new insights to the AIGC community and startups to accelerate building their own high-quality yet low-cost generative models from scratch.*

You can find the original codebase at [PixArt-alpha/PixArt-alpha](https://github.com/PixArt-alpha/PixArt-alpha) and all the available checkpoints at [PixArt-alpha](https://huggingface.co/PixArt-alpha).

Some notes about this pipeline (a short usage sketch follows the list):

* It uses a Transformer backbone (instead of a UNet) for denoising. As such, it has a similar architecture to [DiT](./dit).
* It was trained using text conditions computed from T5. This aspect makes the pipeline better at following complex text prompts with intricate details.
* It is good at producing high-resolution images at different aspect ratios. To get the best results, the authors recommend some size brackets which can be found [here](https://github.com/PixArt-alpha/PixArt-alpha/blob/08fbbd281ec96866109bdd2cdb75f2f58fb17610/diffusion/data/datasets/utils.py).
* It rivals the quality of state-of-the-art text-to-image generation systems (as of this writing) such as Stable Diffusion XL, Imagen, and DALL-E 2, while being more efficient than them.
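A minimal text-to-image sketch; the `PixArt-alpha/PixArt-XL-2-1024-MS` checkpoint id is an assumption based on the released PixArt-α weights, so double-check it on the Hub.

```python
import torch
from diffusers import PixArtAlphaPipeline

# checkpoint id assumed; the T5 text encoder and Transformer backbone are loaded automatically
pipe = PixArtAlphaPipeline.from_pretrained("PixArt-alpha/PixArt-XL-2-1024-MS", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "A small cactus with a happy face in the Sahara desert, intricate details"
image = pipe(prompt).images[0]
image.save("cactus.png")
```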
## PixArtAlphaPipeline

[[autodoc]] PixArtAlphaPipeline
- all
- __call__
@@ -1,35 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# PNDM

[Pseudo Numerical methods for Diffusion Models on manifolds](https://huggingface.co/papers/2202.09778) (PNDM) is by Luping Liu, Yi Ren, Zhijie Lin and Zhou Zhao.

The abstract from the paper is:

*Denoising Diffusion Probabilistic Models (DDPMs) can generate high-quality samples such as image and audio samples. However, DDPMs require hundreds to thousands of iterations to produce final samples. Several prior works have successfully accelerated DDPMs through adjusting the variance schedule (e.g., Improved Denoising Diffusion Probabilistic Models) or the denoising equation (e.g., Denoising Diffusion Implicit Models (DDIMs)). However, these acceleration methods cannot maintain the quality of samples and even introduce new noise at a high speedup rate, which limit their practicability. To accelerate the inference process while keeping the sample quality, we provide a fresh perspective that DDPMs should be treated as solving differential equations on manifolds. Under such a perspective, we propose pseudo numerical methods for diffusion models (PNDMs). Specifically, we figure out how to solve differential equations on manifolds and show that DDIMs are simple cases of pseudo numerical methods. We change several classical numerical methods to corresponding pseudo numerical methods and find that the pseudo linear multi-step method is the best in most situations. According to our experiments, by directly using pre-trained models on Cifar10, CelebA and LSUN, PNDMs can generate higher quality synthetic images with only 50 steps compared with 1000-step DDIMs (20x speedup), significantly outperform DDIMs with 250 steps (by around 0.4 in FID) and have good generalization on different variance schedules.*

The original codebase can be found at [luping-liu/PNDM](https://github.com/luping-liu/PNDM).

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>

## PNDMPipeline

[[autodoc]] PNDMPipeline
- all
- __call__

## ImagePipelineOutput

[[autodoc]] pipelines.ImagePipelineOutput

35
docs/source/en/api/pipelines/pndm.mdx
Normal file
@@ -0,0 +1,35 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# PNDM

## Overview

[Pseudo Numerical methods for Diffusion Models on manifolds](https://arxiv.org/abs/2202.09778) (PNDM) by Luping Liu, Yi Ren, Zhijie Lin and Zhou Zhao.

The abstract of the paper is the following:

Denoising Diffusion Probabilistic Models (DDPMs) can generate high-quality samples such as image and audio samples. However, DDPMs require hundreds to thousands of iterations to produce final samples. Several prior works have successfully accelerated DDPMs through adjusting the variance schedule (e.g., Improved Denoising Diffusion Probabilistic Models) or the denoising equation (e.g., Denoising Diffusion Implicit Models (DDIMs)). However, these acceleration methods cannot maintain the quality of samples and even introduce new noise at a high speedup rate, which limit their practicability. To accelerate the inference process while keeping the sample quality, we provide a fresh perspective that DDPMs should be treated as solving differential equations on manifolds. Under such a perspective, we propose pseudo numerical methods for diffusion models (PNDMs). Specifically, we figure out how to solve differential equations on manifolds and show that DDIMs are simple cases of pseudo numerical methods. We change several classical numerical methods to corresponding pseudo numerical methods and find that the pseudo linear multi-step method is the best in most situations. According to our experiments, by directly using pre-trained models on Cifar10, CelebA and LSUN, PNDMs can generate higher quality synthetic images with only 50 steps compared with 1000-step DDIMs (20x speedup), significantly outperform DDIMs with 250 steps (by around 0.4 in FID) and have good generalization on different variance schedules.

The original codebase can be found [here](https://github.com/luping-liu/PNDM).

## Available Pipelines:

| Pipeline | Tasks | Colab |
|---|---|:---:|
| [pipeline_pndm.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pndm/pipeline_pndm.py) | *Unconditional Image Generation* | - |
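A minimal unconditional-generation sketch: it assembles a `PNDMPipeline` from a separately loaded UNet and a fresh PNDM scheduler. The `google/ddpm-cifar10-32` checkpoint is an assumption used only to have concrete weights to plug in; any compatible `UNet2DModel` checkpoint works.

```python
from diffusers import PNDMPipeline, PNDMScheduler, UNet2DModel

# load a pretrained unconditional UNet (checkpoint assumed) and pair it with a PNDM scheduler
unet = UNet2DModel.from_pretrained("google/ddpm-cifar10-32", subfolder="unet")
scheduler = PNDMScheduler()

pipe = PNDMPipeline(unet=unet, scheduler=scheduler).to("cuda")

# PNDM needs far fewer steps than DDPM for comparable quality
image = pipe(num_inference_steps=50).images[0]
image.save("pndm_sample.png")
```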
## PNDMPipeline

[[autodoc]] PNDMPipeline
- all
- __call__
@@ -1,37 +0,0 @@
|
|||||||
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
|
|
||||||
|
|
||||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
|
||||||
the License. You may obtain a copy of the License at
|
|
||||||
|
|
||||||
http://www.apache.org/licenses/LICENSE-2.0
|
|
||||||
|
|
||||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
|
||||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
|
||||||
specific language governing permissions and limitations under the License.
|
|
||||||
-->

# RePaint

[RePaint: Inpainting using Denoising Diffusion Probabilistic Models](https://huggingface.co/papers/2201.09865) is by Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, Luc Van Gool.

The abstract from the paper is:

*Free-form inpainting is the task of adding new content to an image in the regions specified by an arbitrary binary mask. Most existing approaches train for a certain distribution of masks, which limits their generalization capabilities to unseen mask types. Furthermore, training with pixel-wise and perceptual losses often leads to simple textural extensions towards the missing areas instead of semantically meaningful generation. In this work, we propose RePaint: A Denoising Diffusion Probabilistic Model (DDPM) based inpainting approach that is applicable to even extreme masks. We employ a pretrained unconditional DDPM as the generative prior. To condition the generation process, we only alter the reverse diffusion iterations by sampling the unmasked regions using the given image information. Since this technique does not modify or condition the original DDPM network itself, the model produces high-quality and diverse output images for any inpainting form. We validate our method for both faces and general-purpose image inpainting using standard and extreme masks. RePaint outperforms state-of-the-art Autoregressive, and GAN approaches for at least five out of six mask distributions.*
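
To make the conditioning idea concrete, here is a schematic sketch of a single RePaint reverse step; it is illustrative only and not the library implementation. The `ddpm_reverse_step` helper and the `alphas_cumprod` schedule are assumptions standing in for an ordinary DDPM sampler.

```python
import torch


def repaint_step(x_t, x_original, mask, t, alphas_cumprod, ddpm_reverse_step):
    """Schematic RePaint reverse step (illustrative only).

    `mask` is 1 for known pixels taken from `x_original` and 0 for the region
    to be inpainted; `alphas_cumprod` is the cumulative alpha schedule (a 1-D tensor).
    """
    # Known region: forward-diffuse the original image down to noise level t-1.
    noise = torch.randn_like(x_original)
    a_prev = alphas_cumprod[t - 1]
    x_known = a_prev.sqrt() * x_original + (1 - a_prev).sqrt() * noise

    # Unknown region: one ordinary DDPM reverse step from x_t (assumed helper).
    x_unknown = ddpm_reverse_step(x_t, t)

    # Stitch the two regions together with the mask.
    return mask * x_known + (1 - mask) * x_unknown
```

In the actual pipeline this step is interleaved with the resampling ("jump") schedule controlled by `jump_length` and `jump_n_sample`, which repeatedly re-noises and re-denoises the intermediate result to harmonize the known and generated regions.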

The original codebase can be found at [andreas128/RePaint](https://github.com/andreas128/RePaint).

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>

## RePaintPipeline

[[autodoc]] RePaintPipeline
	- all
	- __call__

## ImagePipelineOutput

[[autodoc]] pipelines.ImagePipelineOutput

docs/source/en/api/pipelines/repaint.mdx (new file, 77 lines)
@@ -0,0 +1,77 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# RePaint

## Overview

[RePaint: Inpainting using Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2201.09865) by Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, Luc Van Gool.

The abstract of the paper is the following:

Free-form inpainting is the task of adding new content to an image in the regions specified by an arbitrary binary mask. Most existing approaches train for a certain distribution of masks, which limits their generalization capabilities to unseen mask types. Furthermore, training with pixel-wise and perceptual losses often leads to simple textural extensions towards the missing areas instead of semantically meaningful generation. In this work, we propose RePaint: A Denoising Diffusion Probabilistic Model (DDPM) based inpainting approach that is applicable to even extreme masks. We employ a pretrained unconditional DDPM as the generative prior. To condition the generation process, we only alter the reverse diffusion iterations by sampling the unmasked regions using the given image information. Since this technique does not modify or condition the original DDPM network itself, the model produces high-quality and diverse output images for any inpainting form. We validate our method for both faces and general-purpose image inpainting using standard and extreme masks. RePaint outperforms state-of-the-art Autoregressive, and GAN approaches for at least five out of six mask distributions.

The original codebase can be found [here](https://github.com/andreas128/RePaint).

## Available Pipelines:

| Pipeline | Tasks | Colab |
|-------------------------------------------------------------------------------------------------------------------------------|--------------------|:---:|
| [pipeline_repaint.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/repaint/pipeline_repaint.py) | *Image Inpainting* | - |

## Usage example

```python
from io import BytesIO

import torch

import PIL
import requests
from diffusers import RePaintPipeline, RePaintScheduler


def download_image(url):
    response = requests.get(url)
    return PIL.Image.open(BytesIO(response.content)).convert("RGB")


img_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/celeba_hq_256.png"
mask_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/mask_256.png"

# Load the original image and the mask as PIL images
original_image = download_image(img_url).resize((256, 256))
mask_image = download_image(mask_url).resize((256, 256))

# Load the RePaint scheduler and pipeline based on a pretrained DDPM model
scheduler = RePaintScheduler.from_pretrained("google/ddpm-ema-celebahq-256")
pipe = RePaintPipeline.from_pretrained("google/ddpm-ema-celebahq-256", scheduler=scheduler)
pipe = pipe.to("cuda")

generator = torch.Generator(device="cuda").manual_seed(0)
output = pipe(
    original_image=original_image,
    mask_image=mask_image,
    num_inference_steps=250,
    eta=0.0,
    jump_length=10,
    jump_n_sample=10,
    generator=generator,
)
inpainted_image = output.images[0]
```

## RePaintPipeline

[[autodoc]] RePaintPipeline
	- all
	- __call__

@@ -1,35 +0,0 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Score SDE VE

[Score-Based Generative Modeling through Stochastic Differential Equations](https://huggingface.co/papers/2011.13456) (Score SDE) is by Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon and Ben Poole. This pipeline implements the variance expanding (VE) variant of the stochastic differential equation method.

The abstract from the paper is:

*Creating noise from data is easy; creating data from noise is generative modeling. We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise. Crucially, the reverse-time SDE depends only on the time-dependent gradient field (\aka, score) of the perturbed data distribution. By leveraging advances in score-based generative modeling, we can accurately estimate these scores with neural networks, and use numerical SDE solvers to generate samples. We show that this framework encapsulates previous approaches in score-based generative modeling and diffusion probabilistic modeling, allowing for new sampling procedures and new modeling capabilities. In particular, we introduce a predictor-corrector framework to correct errors in the evolution of the discretized reverse-time SDE. We also derive an equivalent neural ODE that samples from the same distribution as the SDE, but additionally enables exact likelihood computation, and improved sampling efficiency. In addition, we provide a new way to solve inverse problems with score-based models, as demonstrated with experiments on class-conditional generation, image inpainting, and colorization. Combined with multiple architectural improvements, we achieve record-breaking performance for unconditional image generation on CIFAR-10 with an Inception score of 9.89 and FID of 2.20, a competitive likelihood of 2.99 bits/dim, and demonstrate high fidelity generation of 1024 x 1024 images for the first time from a score-based generative model.*
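
The predictor-corrector loop mentioned in the abstract can be summarized in a few lines. The snippet below is a schematic sketch under stated assumptions (a `score_fn(x, sigma)` callable returning the estimated score, and a decreasing 1-D tensor of VE noise levels `sigmas`); it is not the library implementation, which roughly corresponds to the scheduler's `step_correct` and `step_pred` methods driven by the pipeline.

```python
import torch


@torch.no_grad()
def pc_sample(score_fn, shape, sigmas, snr=0.16, corrector_steps=1):
    """Schematic predictor-corrector sampling for a VE SDE (illustrative only)."""
    # Start from the prior at the largest noise level.
    x = torch.randn(shape) * sigmas[0]
    for i in range(len(sigmas) - 1):
        sigma, sigma_next = sigmas[i], sigmas[i + 1]

        # Corrector: a few Langevin MCMC steps at the current noise level.
        for _ in range(corrector_steps):
            grad = score_fn(x, sigma)
            noise = torch.randn_like(x)
            grad_norm = grad.flatten(1).norm(dim=-1).mean()
            noise_norm = noise.flatten(1).norm(dim=-1).mean()
            step_size = 2 * (snr * noise_norm / grad_norm) ** 2
            x = x + step_size * grad + (2 * step_size).sqrt() * noise

        # Predictor: one reverse-diffusion step from sigma to sigma_next.
        grad = score_fn(x, sigma)
        x_mean = x + (sigma**2 - sigma_next**2) * grad
        x = x_mean + (sigma**2 - sigma_next**2).sqrt() * torch.randn_like(x)
    # Return the final denoised mean as the sample.
    return x_mean
```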

The original codebase can be found at [yang-song/score_sde_pytorch](https://github.com/yang-song/score_sde_pytorch).

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>

## ScoreSdeVePipeline

[[autodoc]] ScoreSdeVePipeline
	- all
	- __call__

## ImagePipelineOutput

[[autodoc]] pipelines.ImagePipelineOutput

docs/source/en/api/pipelines/score_sde_ve.mdx (new file, 36 lines)
@@ -0,0 +1,36 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Score SDE VE

## Overview

[Score-Based Generative Modeling through Stochastic Differential Equations](https://arxiv.org/abs/2011.13456) (Score SDE) by Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon and Ben Poole.

The abstract of the paper is the following:

Creating noise from data is easy; creating data from noise is generative modeling. We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise. Crucially, the reverse-time SDE depends only on the time-dependent gradient field (\aka, score) of the perturbed data distribution. By leveraging advances in score-based generative modeling, we can accurately estimate these scores with neural networks, and use numerical SDE solvers to generate samples. We show that this framework encapsulates previous approaches in score-based generative modeling and diffusion probabilistic modeling, allowing for new sampling procedures and new modeling capabilities. In particular, we introduce a predictor-corrector framework to correct errors in the evolution of the discretized reverse-time SDE. We also derive an equivalent neural ODE that samples from the same distribution as the SDE, but additionally enables exact likelihood computation, and improved sampling efficiency. In addition, we provide a new way to solve inverse problems with score-based models, as demonstrated with experiments on class-conditional generation, image inpainting, and colorization. Combined with multiple architectural improvements, we achieve record-breaking performance for unconditional image generation on CIFAR-10 with an Inception score of 9.89 and FID of 2.20, a competitive likelihood of 2.99 bits/dim, and demonstrate high fidelity generation of 1024 x 1024 images for the first time from a score-based generative model.

The original codebase can be found [here](https://github.com/yang-song/score_sde_pytorch).

This pipeline implements the Variance Expanding (VE) variant of the method.

## Available Pipelines:

| Pipeline | Tasks | Colab |
|---|---|:---:|
| [pipeline_score_sde_ve.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/score_sde_ve/pipeline_score_sde_ve.py) | *Unconditional Image Generation* | - |
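
This page does not ship a usage example, so here is a minimal, illustrative sketch of unconditional sampling with `ScoreSdeVePipeline`; the checkpoint name is an assumption (any unconditional Score SDE VE / NCSN++ checkpoint should work), and sampling is slow because of the large number of predictor-corrector steps.

```python
from diffusers import ScoreSdeVePipeline

# Example checkpoint (assumption): an unconditional NCSN++ model trained on CelebA-HQ.
pipe = ScoreSdeVePipeline.from_pretrained("google/ncsnpp-celebahq-256")
pipe = pipe.to("cuda")

# Score SDE VE sampling typically uses a large number of predictor-corrector steps.
image = pipe(num_inference_steps=2000).images[0]
image.save("score_sde_ve_sample.png")
```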

## ScoreSdeVePipeline

[[autodoc]] ScoreSdeVePipeline
	- all
	- __call__