Mirror of https://github.com/huggingface/diffusers.git, synced 2025-12-08 13:34:27 +08:00
Compare commits: kandinsky- ... dduf-with- (801 commits)
.github/ISSUE_TEMPLATE/bug-report.yml (14 changed lines)
@@ -63,23 +63,27 @@ body:

 Please tag a maximum of 2 people.

-Questions on DiffusionPipeline (Saving, Loading, From pretrained, ...):
+Questions on DiffusionPipeline (Saving, Loading, From pretrained, ...): @sayakpaul @DN6

 Questions on pipelines:
-- Stable Diffusion @yiyixuxu @DN6 @sayakpaul
+- Stable Diffusion @yiyixuxu @asomoza
 - Stable Diffusion XL @yiyixuxu @sayakpaul @DN6
+- Stable Diffusion 3: @yiyixuxu @sayakpaul @DN6 @asomoza
 - Kandinsky @yiyixuxu
 - ControlNet @sayakpaul @yiyixuxu @DN6
 - T2I Adapter @sayakpaul @yiyixuxu @DN6
 - IF @DN6
-- Text-to-Video / Video-to-Video @DN6 @sayakpaul
+- Text-to-Video / Video-to-Video @DN6 @a-r-r-o-w
 - Wuerstchen @DN6
 - Other: @yiyixuxu @DN6
+- Improving generation quality: @asomoza

 Questions on models:
 - UNet @DN6 @yiyixuxu @sayakpaul
 - VAE @sayakpaul @DN6 @yiyixuxu
-- Transformers/Attention @DN6 @yiyixuxu @sayakpaul @DN6
+- Transformers/Attention @DN6 @yiyixuxu @sayakpaul
+
+Questions on single file checkpoints: @DN6

 Questions on Schedulers: @yiyixuxu

@@ -99,7 +103,7 @@ body:

 Questions on JAX- and MPS-related things: @pcuenca

-Questions on audio pipelines: @DN6
+Questions on audio pipelines: @sanchit-gandhi

.github/PULL_REQUEST_TEMPLATE.md (5 changed lines)
@@ -39,7 +39,7 @@ members/contributors who may be interested in your PR.
 Core library:

 - Schedulers: @yiyixuxu
-- Pipelines: @sayakpaul @yiyixuxu @DN6
+- Pipelines and pipeline callbacks: @yiyixuxu and @asomoza
 - Training examples: @sayakpaul
 - Docs: @stevhliu and @sayakpaul
 - JAX and MPS: @pcuenca
@@ -48,7 +48,8 @@ Core library:

 Integrations:

-- deepspeed: HF Trainer/Accelerate: @pacman100
+- deepspeed: HF Trainer/Accelerate: @SunMarc
+- PEFT: @sayakpaul @BenjaminBossan

 HF projects:

.github/workflows/benchmark.yml (24 changed lines)
@@ -7,20 +7,24 @@ on:

 env:
 DIFFUSERS_IS_CI: yes
+HF_HUB_ENABLE_HF_TRANSFER: 1
 HF_HOME: /mnt/cache
 OMP_NUM_THREADS: 8
 MKL_NUM_THREADS: 8

 jobs:
 torch_pipelines_cuda_benchmark_tests:
+env:
+SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL_BENCHMARK }}
 name: Torch Core Pipelines CUDA Benchmarking Tests
 strategy:
 fail-fast: false
 max-parallel: 1
-runs-on: [single-gpu, nvidia-gpu, a10, ci]
+runs-on:
+group: aws-g6-4xlarge-plus
 container:
-image: diffusers/diffusers-pytorch-cuda
+image: diffusers/diffusers-pytorch-compile-cuda
-options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ --gpus 0
+options: --shm-size "16gb" --ipc host --gpus 0
 steps:
 - name: Checkout diffusers
 uses: actions/checkout@v3
@@ -39,7 +43,7 @@ jobs:
 python utils/print_env.py
 - name: Diffusers Benchmarking
 env:
-HUGGING_FACE_HUB_TOKEN: ${{ secrets.DIFFUSERS_BOT_TOKEN }}
+HF_TOKEN: ${{ secrets.DIFFUSERS_BOT_TOKEN }}
 BASE_PATH: benchmark_outputs
 run: |
 export TOTAL_GPU_MEMORY=$(python -c "import torch; print(torch.cuda.get_device_properties(0).total_memory / (1024**3))")
@@ -47,7 +51,17 @@ jobs:

 - name: Test suite reports artifacts
 if: ${{ always() }}
-uses: actions/upload-artifact@v2
+uses: actions/upload-artifact@v4
 with:
 name: benchmark_test_reports
 path: benchmarks/benchmark_outputs
+
+- name: Report success status
+if: ${{ success() }}
+run: |
+pip install requests && python utils/notify_benchmarking_status.py --status=success
+
+- name: Report failure status
+if: ${{ failure() }}
+run: |
+pip install requests && python utils/notify_benchmarking_status.py --status=failure
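The benchmarking job derives TOTAL_GPU_MEMORY from torch.cuda before running the suite. A minimal local sketch of that probe, assuming only that PyTorch is installed and a CUDA device is visible (the echo line is added here purely for inspection):

# Same probe the workflow runs: report total memory of GPU 0 in GiB,
# then export it the way the benchmarking step does.
export TOTAL_GPU_MEMORY=$(python -c "import torch; print(torch.cuda.get_device_properties(0).total_memory / (1024**3))")
echo "TOTAL_GPU_MEMORY=${TOTAL_GPU_MEMORY}"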
.github/workflows/build_docker_images.yml (30 changed lines)
@@ -20,7 +20,8 @@ env:

 jobs:
 test-build-docker-images:
-runs-on: [ self-hosted, intel-cpu, 8-cpu, ci ]
+runs-on:
+group: aws-general-8-plus
 if: github.event_name == 'pull_request'
 steps:
 - name: Set up Docker Buildx
@@ -50,7 +51,8 @@ jobs:
 if: steps.file_changes.outputs.all != ''

 build-and-push-docker-images:
-runs-on: [ self-hosted, intel-cpu, 8-cpu, ci ]
+runs-on:
+group: aws-general-8-plus
 if: github.event_name != 'pull_request'

 permissions:
@@ -69,6 +71,7 @@ jobs:
 - diffusers-flax-tpu
 - diffusers-onnxruntime-cpu
 - diffusers-onnxruntime-cuda
+- diffusers-doc-builder

 steps:
 - name: Checkout repository
@@ -90,24 +93,11 @@ jobs:

 - name: Post to a Slack channel
 id: slack
-uses: slackapi/slack-github-action@6c661ce58804a1a20f6dc5fbee7f0381b469e001
+uses: huggingface/hf-workflows/.github/actions/post-slack@main
 with:
 # Slack channel id, channel name, or user id to post message.
 # See also: https://api.slack.com/methods/chat.postMessage#channels
-channel-id: ${{ env.CI_SLACK_CHANNEL }}
-# For posting a rich message using Block Kit
-payload: |
-{
-"text": "${{ matrix.image-name }} Docker Image build result: ${{ job.status }}\n${{ github.event.head_commit.url }}",
-"blocks": [
-{
-"type": "section",
-"text": {
-"type": "mrkdwn",
-"text": "${{ matrix.image-name }} Docker Image build result: ${{ job.status }}\n${{ github.event.head_commit.url }}"
-}
-}
-]
-}
-env:
-SLACK_BOT_TOKEN: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
+slack_channel: ${{ env.CI_SLACK_CHANNEL }}
+title: "🤗 Results of the ${{ matrix.image-name }} Docker Image build"
+status: ${{ job.status }}
+slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
.github/workflows/build_documentation.yml (2 changed lines)
@@ -21,7 +21,7 @@ jobs:
 package: diffusers
 notebook_folder: diffusers_doc
 languages: en ko zh ja pt
+custom_container: diffusers/diffusers-doc-builder
 secrets:
 token: ${{ secrets.HUGGINGFACE_PUSH }}
 hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
.github/workflows/build_pr_documentation.yml (1 changed line)
@@ -20,3 +20,4 @@ jobs:
 install_libgl1: true
 package: diffusers
 languages: en ko zh ja pt
+custom_container: diffusers/diffusers-doc-builder
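Both documentation workflows now build inside the diffusers/diffusers-doc-builder container referenced above. A hedged sketch, not part of the workflows themselves, for pulling that image and opening a shell in it locally (it assumes the image resolves from a public registry and ships a bash shell; the mount path is only illustrative):

# Pull the doc-builder image named in the diff and enter it with the
# current checkout mounted at /workspace (an assumed, illustrative path).
docker pull diffusers/diffusers-doc-builder
docker run --rm -it -v "$PWD":/workspace -w /workspace diffusers/diffusers-doc-builder bash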
.github/workflows/mirror_community_pipeline.yml (new file, 102 lines)
@@ -0,0 +1,102 @@
+name: Mirror Community Pipeline
+
+on:
+# Push changes on the main branch
+push:
+branches:
+- main
+paths:
+- 'examples/community/**.py'
+
+# And on tag creation (e.g. `v0.28.1`)
+tags:
+- '*'
+
+# Manual trigger with ref input
+workflow_dispatch:
+inputs:
+ref:
+description: "Either 'main' or a tag ref"
+required: true
+default: 'main'
+
+jobs:
+mirror_community_pipeline:
+env:
+SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL_COMMUNITY_MIRROR }}
+
+runs-on: ubuntu-22.04
+steps:
+# Checkout to correct ref
+# If workflow dispatch
+# If ref is 'main', set:
+# CHECKOUT_REF=refs/heads/main
+# PATH_IN_REPO=main
+# Else it must be a tag. Set:
+# CHECKOUT_REF=refs/tags/{tag}
+# PATH_IN_REPO={tag}
+# If not workflow dispatch
+# If ref is 'refs/heads/main' => set 'main'
+# Else it must be a tag => set {tag}
+- name: Set checkout_ref and path_in_repo
+run: |
+if [ "${{ github.event_name }}" == "workflow_dispatch" ]; then
+if [ -z "${{ github.event.inputs.ref }}" ]; then
+echo "Error: Missing ref input"
+exit 1
+elif [ "${{ github.event.inputs.ref }}" == "main" ]; then
+echo "CHECKOUT_REF=refs/heads/main" >> $GITHUB_ENV
+echo "PATH_IN_REPO=main" >> $GITHUB_ENV
+else
+echo "CHECKOUT_REF=refs/tags/${{ github.event.inputs.ref }}" >> $GITHUB_ENV
+echo "PATH_IN_REPO=${{ github.event.inputs.ref }}" >> $GITHUB_ENV
+fi
+elif [ "${{ github.ref }}" == "refs/heads/main" ]; then
+echo "CHECKOUT_REF=${{ github.ref }}" >> $GITHUB_ENV
+echo "PATH_IN_REPO=main" >> $GITHUB_ENV
+else
+# e.g. refs/tags/v0.28.1 -> v0.28.1
+echo "CHECKOUT_REF=${{ github.ref }}" >> $GITHUB_ENV
+echo "PATH_IN_REPO=$(echo ${{ github.ref }} | sed 's/^refs\/tags\///')" >> $GITHUB_ENV
+fi
+- name: Print env vars
+run: |
+echo "CHECKOUT_REF: ${{ env.CHECKOUT_REF }}"
+echo "PATH_IN_REPO: ${{ env.PATH_IN_REPO }}"
+- uses: actions/checkout@v3
+with:
+ref: ${{ env.CHECKOUT_REF }}
+
+# Setup + install dependencies
+- name: Set up Python
+uses: actions/setup-python@v4
+with:
+python-version: "3.10"
+- name: Install dependencies
+run: |
+python -m pip install --upgrade pip
+pip install --upgrade huggingface_hub
+
+# Check secret is set
+- name: whoami
+run: huggingface-cli whoami
+env:
+HF_TOKEN: ${{ secrets.HF_TOKEN_MIRROR_COMMUNITY_PIPELINES }}
+
+# Push to HF! (under subfolder based on checkout ref)
+# https://huggingface.co/datasets/diffusers/community-pipelines-mirror
+- name: Mirror community pipeline to HF
+run: huggingface-cli upload diffusers/community-pipelines-mirror ./examples/community ${PATH_IN_REPO} --repo-type dataset
+env:
+PATH_IN_REPO: ${{ env.PATH_IN_REPO }}
+HF_TOKEN: ${{ secrets.HF_TOKEN_MIRROR_COMMUNITY_PIPELINES }}
+
+- name: Report success status
+if: ${{ success() }}
+run: |
+pip install requests && python utils/notify_community_pipelines_mirror.py --status=success
+
+- name: Report failure status
+if: ${{ failure() }}
+run: |
+pip install requests && python utils/notify_community_pipelines_mirror.py --status=failure
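The first step of this new workflow maps the triggering ref onto a subfolder of the mirror dataset: 'main' stays 'main', and a tag ref keeps only its tag name. A small sketch of that mapping, reusing the workflow's own sed expression on the example ref from its comment (the variable names here are only illustrative):

# Reproduce the PATH_IN_REPO computation for a tag push.
ref="refs/tags/v0.28.1"                      # example value of github.ref on a tag push
path_in_repo=$(echo "$ref" | sed 's/^refs\/tags\///')
echo "$path_in_repo"                         # prints: v0.28.1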
.github/workflows/nightly_tests.yml (388 changed lines)
@@ -7,7 +7,7 @@ on:

 env:
 DIFFUSERS_IS_CI: yes
-HF_HOME: /mnt/cache
+HF_HUB_ENABLE_HF_TRANSFER: 1
 OMP_NUM_THREADS: 8
 MKL_NUM_THREADS: 8
 PYTEST_TIMEOUT: 600
@@ -18,8 +18,11 @@ env:

 jobs:
 setup_torch_cuda_pipeline_matrix:
-name: Setup Torch Pipelines Matrix
+name: Setup Torch Pipelines CUDA Slow Tests Matrix
-runs-on: ubuntu-latest
+runs-on:
+group: aws-general-8-plus
+container:
+image: diffusers/diffusers-pytorch-cpu
 outputs:
 pipeline_test_matrix: ${{ steps.fetch_pipeline_matrix.outputs.pipeline_test_matrix }}
 steps:
@@ -27,13 +30,9 @@ jobs:
 uses: actions/checkout@v3
 with:
 fetch-depth: 2
-- name: Set up Python
-uses: actions/setup-python@v4
-with:
-python-version: "3.8"
 - name: Install dependencies
 run: |
-pip install -e .
+pip install -e .[test]
 pip install huggingface_hub
 - name: Fetch Pipeline Matrix
 id: fetch_pipeline_matrix
@@ -44,22 +43,24 @@ jobs:

 - name: Pipeline Tests Artifacts
 if: ${{ always() }}
-uses: actions/upload-artifact@v2
+uses: actions/upload-artifact@v4
 with:
 name: test-pipelines.json
 path: reports

 run_nightly_tests_for_torch_pipelines:
-name: Torch Pipelines CUDA Nightly Tests
+name: Nightly Torch Pipelines CUDA Tests
 needs: setup_torch_cuda_pipeline_matrix
 strategy:
 fail-fast: false
+max-parallel: 8
 matrix:
 module: ${{ fromJson(needs.setup_torch_cuda_pipeline_matrix.outputs.pipeline_test_matrix) }}
-runs-on: [single-gpu, nvidia-gpu, t4, ci]
+runs-on:
+group: aws-g4dn-2xlarge
 container:
 image: diffusers/diffusers-pytorch-cuda
-options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ --gpus 0
+options: --shm-size "16gb" --ipc host --gpus 0
 steps:
 - name: Checkout diffusers
 uses: actions/checkout@v3
@@ -67,21 +68,18 @@ jobs:
 fetch-depth: 2
 - name: NVIDIA-SMI
 run: nvidia-smi

 - name: Install dependencies
 run: |
 python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
 python -m uv pip install -e [quality,test]
-python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate.git
+pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
 python -m uv pip install pytest-reportlog

 - name: Environment
 run: |
 python utils/print_env.py
-- name: Nightly PyTorch CUDA checkpoint (pipelines) tests
+- name: Pipeline CUDA Test
 env:
-HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
+HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
 # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
 CUBLAS_WORKSPACE_CONFIG: :16:8
 run: |
@@ -90,38 +88,38 @@ jobs:
 --make-reports=tests_pipeline_${{ matrix.module }}_cuda \
 --report-log=tests_pipeline_${{ matrix.module }}_cuda.log \
 tests/pipelines/${{ matrix.module }}

 - name: Failure short reports
 if: ${{ failure() }}
 run: |
 cat reports/tests_pipeline_${{ matrix.module }}_cuda_stats.txt
 cat reports/tests_pipeline_${{ matrix.module }}_cuda_failures_short.txt

 - name: Test suite reports artifacts
 if: ${{ always() }}
-uses: actions/upload-artifact@v2
+uses: actions/upload-artifact@v4
 with:
 name: pipeline_${{ matrix.module }}_test_reports
 path: reports

 - name: Generate Report and Notify Channel
 if: always()
 run: |
 pip install slack_sdk tabulate
-python scripts/log_reports.py >> $GITHUB_STEP_SUMMARY
+python utils/log_reports.py >> $GITHUB_STEP_SUMMARY

 run_nightly_tests_for_other_torch_modules:
-name: Torch Non-Pipelines CUDA Nightly Tests
+name: Nightly Torch CUDA Tests
-runs-on: docker-gpu
+runs-on:
+group: aws-g4dn-2xlarge
 container:
 image: diffusers/diffusers-pytorch-cuda
-options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ --gpus 0
+options: --shm-size "16gb" --ipc host --gpus 0
 defaults:
 run:
 shell: bash
 strategy:
+fail-fast: false
+max-parallel: 2
 matrix:
-module: [models, schedulers, others, examples]
+module: [models, schedulers, lora, others, single_file, examples]
 steps:
 - name: Checkout diffusers
 uses: actions/checkout@v3
@@ -132,16 +130,16 @@ jobs:
 run: |
 python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
 python -m uv pip install -e [quality,test]
-python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate.git
+python -m uv pip install peft@git+https://github.com/huggingface/peft.git
+pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
 python -m uv pip install pytest-reportlog

 - name: Environment
 run: python utils/print_env.py

 - name: Run nightly PyTorch CUDA tests for non-pipeline modules
 if: ${{ matrix.module != 'examples'}}
 env:
-HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
+HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
 # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
 CUBLAS_WORKSPACE_CONFIG: :16:8
 run: |
@@ -154,11 +152,10 @@ jobs:
 - name: Run nightly example tests with Torch
 if: ${{ matrix.module == 'examples' }}
 env:
-HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
+HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
 # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
 CUBLAS_WORKSPACE_CONFIG: :16:8
 run: |
-python -m uv pip install peft@git+https://github.com/huggingface/peft.git
 python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
 -s -v --make-reports=examples_torch_cuda \
 --report-log=examples_torch_cuda.log \
@@ -172,7 +169,7 @@ jobs:

 - name: Test suite reports artifacts
 if: ${{ always() }}
-uses: actions/upload-artifact@v2
+uses: actions/upload-artifact@v4
 with:
 name: torch_${{ matrix.module }}_cuda_test_reports
 path: reports
@@ -181,64 +178,63 @@ jobs:
 if: always()
 run: |
 pip install slack_sdk tabulate
-python scripts/log_reports.py >> $GITHUB_STEP_SUMMARY
+python utils/log_reports.py >> $GITHUB_STEP_SUMMARY

-run_lora_nightly_tests:
-name: Nightly LoRA Tests with PEFT and TORCH
-runs-on: docker-gpu
+run_big_gpu_torch_tests:
+name: Torch tests on big GPU
+strategy:
+fail-fast: false
+max-parallel: 2
+runs-on:
+group: aws-g6e-xlarge-plus
 container:
 image: diffusers/diffusers-pytorch-cuda
-options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ --gpus 0
+options: --shm-size "16gb" --ipc host --gpus 0
-defaults:
-run:
-shell: bash
 steps:
 - name: Checkout diffusers
 uses: actions/checkout@v3
 with:
 fetch-depth: 2
-- name: Install dependencies
-run: |
-python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
-python -m uv pip install -e [quality,test]
-python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate.git
+- name: NVIDIA-SMI
+run: nvidia-smi
+- name: Install dependencies
+run: |
+python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+python -m uv pip install -e [quality,test]
 python -m uv pip install peft@git+https://github.com/huggingface/peft.git
-python -m uv pip install pytest-reportlog
+pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
+python -m uv pip install pytest-reportlog
 - name: Environment
-run: python utils/print_env.py
-- name: Run nightly LoRA tests with PEFT and Torch
+run: |
+python utils/print_env.py
+- name: Selected Torch CUDA Test on big GPU
 env:
-HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
+HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
 # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
 CUBLAS_WORKSPACE_CONFIG: :16:8
-run: |
-python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
--s -v -k "not Flax and not Onnx" \
---make-reports=tests_torch_lora_cuda \
---report-log=tests_torch_lora_cuda.log \
-tests/lora
+BIG_GPU_MEMORY: 40
+run: |
+python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
+-m "big_gpu_with_torch_cuda" \
+--make-reports=tests_big_gpu_torch_cuda \
+--report-log=tests_big_gpu_torch_cuda.log \
+tests/
 - name: Failure short reports
 if: ${{ failure() }}
 run: |
-cat reports/tests_torch_lora_cuda_stats.txt
-cat reports/tests_torch_lora_cuda_failures_short.txt
+cat reports/tests_big_gpu_torch_cuda_stats.txt
+cat reports/tests_big_gpu_torch_cuda_failures_short.txt
-- name: Test suite reports artifacts
-if: ${{ always() }}
-uses: actions/upload-artifact@v2
-with:
-name: torch_lora_cuda_test_reports
-path: reports
-- name: Generate Report and Notify Channel
-if: always()
-run: |
-pip install slack_sdk tabulate
-python scripts/log_reports.py >> $GITHUB_STEP_SUMMARY
+- name: Test suite reports artifacts
+if: ${{ always() }}
+uses: actions/upload-artifact@v4
+with:
+name: torch_cuda_big_gpu_test_reports
+path: reports
+- name: Generate Report and Notify Channel
+if: always()
+run: |
+pip install slack_sdk tabulate
+python utils/log_reports.py >> $GITHUB_STEP_SUMMARY

 run_flax_tpu_tests:
 name: Nightly Flax TPU Tests
@@ -261,7 +257,7 @@ jobs:
 run: |
 python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
 python -m uv pip install -e [quality,test]
-python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate.git
+pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
 python -m uv pip install pytest-reportlog

 - name: Environment
@@ -269,7 +265,7 @@ jobs:

 - name: Run nightly Flax TPU tests
 env:
-HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
+HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
 run: |
 python -m pytest -n 0 \
 -s -v -k "Flax" \
@@ -285,7 +281,7 @@ jobs:

 - name: Test suite reports artifacts
 if: ${{ always() }}
-uses: actions/upload-artifact@v2
+uses: actions/upload-artifact@v4
 with:
 name: flax_tpu_test_reports
 path: reports
@@ -294,14 +290,15 @@ jobs:
 if: always()
 run: |
 pip install slack_sdk tabulate
-python scripts/log_reports.py >> $GITHUB_STEP_SUMMARY
+python utils/log_reports.py >> $GITHUB_STEP_SUMMARY

 run_nightly_onnx_tests:
 name: Nightly ONNXRuntime CUDA tests on Ubuntu
-runs-on: docker-gpu
+runs-on:
+group: aws-g4dn-2xlarge
 container:
 image: diffusers/diffusers-onnxruntime-cuda
-options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
+options: --gpus 0 --shm-size "16gb" --ipc host

 steps:
 - name: Checkout diffusers
@@ -316,15 +313,14 @@ jobs:
 run: |
 python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
 python -m uv pip install -e [quality,test]
-python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate.git
+pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
 python -m uv pip install pytest-reportlog

 - name: Environment
 run: python utils/print_env.py

-- name: Run nightly ONNXRuntime CUDA tests
+- name: Run Nightly ONNXRuntime CUDA tests
 env:
-HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
+HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
 run: |
 python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
 -s -v -k "Onnx" \
@@ -340,75 +336,187 @@ jobs:

 - name: Test suite reports artifacts
 if: ${{ always() }}
-uses: actions/upload-artifact@v2
+uses: actions/upload-artifact@v4
 with:
-name: ${{ matrix.config.report }}_test_reports
+name: tests_onnx_cuda_reports
 path: reports

 - name: Generate Report and Notify Channel
 if: always()
 run: |
 pip install slack_sdk tabulate
-python scripts/log_reports.py >> $GITHUB_STEP_SUMMARY
+python utils/log_reports.py >> $GITHUB_STEP_SUMMARY

-run_nightly_tests_apple_m1:
-name: Nightly PyTorch MPS tests on MacOS
-runs-on: [ self-hosted, apple-m1 ]
-if: github.event_name == 'schedule'
+run_nightly_quantization_tests:
+name: Torch quantization nightly tests
+strategy:
+fail-fast: false
+max-parallel: 2
+matrix:
+config:
+- backend: "bitsandbytes"
+test_location: "bnb"
+runs-on:
+group: aws-g6e-xlarge-plus
+container:
+image: diffusers/diffusers-pytorch-cuda
+options: --shm-size "20gb" --ipc host --gpus 0
 steps:
 - name: Checkout diffusers
 uses: actions/checkout@v3
 with:
 fetch-depth: 2
-- name: Clean checkout
-shell: arch -arch arm64 bash {0}
-run: |
-git clean -fxd
-- name: Setup miniconda
-uses: ./.github/actions/setup-miniconda
-with:
-python-version: 3.9
+- name: NVIDIA-SMI
+run: nvidia-smi
 - name: Install dependencies
-shell: arch -arch arm64 bash {0}
 run: |
-${CONDA_RUN} python -m pip install --upgrade pip uv
-${CONDA_RUN} python -m uv pip install -e [quality,test]
-${CONDA_RUN} python -m uv pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu
-${CONDA_RUN} python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate
-${CONDA_RUN} python -m uv pip install pytest-reportlog
+python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+python -m uv pip install -e [quality,test]
+python -m uv pip install -U ${{ matrix.config.backend }}
+python -m uv pip install pytest-reportlog
 - name: Environment
-shell: arch -arch arm64 bash {0}
 run: |
-${CONDA_RUN} python utils/print_env.py
-- name: Run nightly PyTorch tests on M1 (MPS)
-shell: arch -arch arm64 bash {0}
+python utils/print_env.py
+- name: ${{ matrix.config.backend }} quantization tests on GPU
 env:
-HF_HOME: /System/Volumes/Data/mnt/cache
-HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
+HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+# https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
+CUBLAS_WORKSPACE_CONFIG: :16:8
+BIG_GPU_MEMORY: 40
 run: |
-${CONDA_RUN} python -m pytest -n 1 -s -v --make-reports=tests_torch_mps \
---report-log=tests_torch_mps.log \
-tests/
+python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
+--make-reports=tests_${{ matrix.config.backend }}_torch_cuda \
+--report-log=tests_${{ matrix.config.backend }}_torch_cuda.log \
+tests/quantization/${{ matrix.config.test_location }}
 - name: Failure short reports
 if: ${{ failure() }}
-run: cat reports/tests_torch_mps_failures_short.txt
+run: |
+cat reports/tests_${{ matrix.config.backend }}_torch_cuda_stats.txt
+cat reports/tests_${{ matrix.config.backend }}_torch_cuda_failures_short.txt
 - name: Test suite reports artifacts
 if: ${{ always() }}
-uses: actions/upload-artifact@v2
+uses: actions/upload-artifact@v4
 with:
-name: torch_mps_test_reports
+name: torch_cuda_${{ matrix.config.backend }}_reports
 path: reports

 - name: Generate Report and Notify Channel
 if: always()
 run: |
 pip install slack_sdk tabulate
-python scripts/log_reports.py >> $GITHUB_STEP_SUMMARY
+python utils/log_reports.py >> $GITHUB_STEP_SUMMARY

+# M1 runner currently not well supported
+# TODO: (Dhruv) add these back when we setup better testing for Apple Silicon
+# run_nightly_tests_apple_m1:
+# name: Nightly PyTorch MPS tests on MacOS
+# runs-on: [ self-hosted, apple-m1 ]
+# if: github.event_name == 'schedule'
+#
+# steps:
+# - name: Checkout diffusers
+# uses: actions/checkout@v3
+# with:
+# fetch-depth: 2
+#
+# - name: Clean checkout
+# shell: arch -arch arm64 bash {0}
+# run: |
+# git clean -fxd
+# - name: Setup miniconda
+# uses: ./.github/actions/setup-miniconda
+# with:
+# python-version: 3.9
+#
+# - name: Install dependencies
+# shell: arch -arch arm64 bash {0}
+# run: |
+# ${CONDA_RUN} python -m pip install --upgrade pip uv
+# ${CONDA_RUN} python -m uv pip install -e [quality,test]
+# ${CONDA_RUN} python -m uv pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu
+# ${CONDA_RUN} python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate
+# ${CONDA_RUN} python -m uv pip install pytest-reportlog
+# - name: Environment
+# shell: arch -arch arm64 bash {0}
+# run: |
+# ${CONDA_RUN} python utils/print_env.py
+# - name: Run nightly PyTorch tests on M1 (MPS)
+# shell: arch -arch arm64 bash {0}
+# env:
+# HF_HOME: /System/Volumes/Data/mnt/cache
+# HF_TOKEN: ${{ secrets.HF_TOKEN }}
+# run: |
+# ${CONDA_RUN} python -m pytest -n 1 -s -v --make-reports=tests_torch_mps \
+# --report-log=tests_torch_mps.log \
+# tests/
+# - name: Failure short reports
+# if: ${{ failure() }}
+# run: cat reports/tests_torch_mps_failures_short.txt
+#
+# - name: Test suite reports artifacts
+# if: ${{ always() }}
+# uses: actions/upload-artifact@v4
+# with:
+# name: torch_mps_test_reports
+# path: reports
+#
+# - name: Generate Report and Notify Channel
+# if: always()
+# run: |
+# pip install slack_sdk tabulate
+# python utils/log_reports.py >> $GITHUB_STEP_SUMMARY run_nightly_tests_apple_m1:
+# name: Nightly PyTorch MPS tests on MacOS
+# runs-on: [ self-hosted, apple-m1 ]
+# if: github.event_name == 'schedule'
+#
+# steps:
+# - name: Checkout diffusers
+# uses: actions/checkout@v3
+# with:
+# fetch-depth: 2
+#
+# - name: Clean checkout
+# shell: arch -arch arm64 bash {0}
+# run: |
+# git clean -fxd
+# - name: Setup miniconda
+# uses: ./.github/actions/setup-miniconda
+# with:
+# python-version: 3.9
+#
+# - name: Install dependencies
+# shell: arch -arch arm64 bash {0}
+# run: |
+# ${CONDA_RUN} python -m pip install --upgrade pip uv
+# ${CONDA_RUN} python -m uv pip install -e [quality,test]
+# ${CONDA_RUN} python -m uv pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu
+# ${CONDA_RUN} python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate
+# ${CONDA_RUN} python -m uv pip install pytest-reportlog
+# - name: Environment
+# shell: arch -arch arm64 bash {0}
+# run: |
+# ${CONDA_RUN} python utils/print_env.py
+# - name: Run nightly PyTorch tests on M1 (MPS)
+# shell: arch -arch arm64 bash {0}
+# env:
+# HF_HOME: /System/Volumes/Data/mnt/cache
+# HF_TOKEN: ${{ secrets.HF_TOKEN }}
+# run: |
+# ${CONDA_RUN} python -m pytest -n 1 -s -v --make-reports=tests_torch_mps \
+# --report-log=tests_torch_mps.log \
+# tests/
+# - name: Failure short reports
+# if: ${{ failure() }}
+# run: cat reports/tests_torch_mps_failures_short.txt
+#
+# - name: Test suite reports artifacts
+# if: ${{ always() }}
+# uses: actions/upload-artifact@v4
+# with:
+# name: torch_mps_test_reports
+# path: reports
+#
+# - name: Generate Report and Notify Channel
+# if: always()
+# run: |
+# pip install slack_sdk tabulate
+# python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
@@ -7,7 +7,7 @@ on:
 
 jobs:
   build:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-22.04
 
     steps:
       - uses: actions/checkout@v3
3  .github/workflows/pr_dependency_test.yml  vendored

@@ -16,7 +16,7 @@ concurrency:
 
 jobs:
   check_dependencies:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-22.04
     steps:
       - uses: actions/checkout@v3
       - name: Set up Python
@@ -33,4 +33,3 @@ jobs:
       run: |
         python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
         pytest tests/others/test_dependencies.py
@@ -16,7 +16,7 @@ concurrency:
 
 jobs:
   check_flax_dependencies:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-22.04
     steps:
       - uses: actions/checkout@v3
       - name: Set up Python
13  .github/workflows/pr_test_fetcher.yml  vendored

@@ -15,7 +15,8 @@ concurrency:
 jobs:
   setup_pr_tests:
     name: Setup PR Tests
-    runs-on: docker-cpu
+    runs-on:
+      group: aws-general-8-plus
     container:
       image: diffusers/diffusers-pytorch-cpu
       options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
@@ -73,7 +74,8 @@ jobs:
       max-parallel: 2
       matrix:
         modules: ${{ fromJson(needs.setup_pr_tests.outputs.matrix) }}
-    runs-on: docker-cpu
+    runs-on:
+      group: aws-general-8-plus
     container:
       image: diffusers/diffusers-pytorch-cpu
       options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
@@ -123,12 +125,13 @@ jobs:
       config:
         - name: Hub tests for models, schedulers, and pipelines
          framework: hub_tests_pytorch
-          runner: docker-cpu
+          runner: aws-general-8-plus
          image: diffusers/diffusers-pytorch-cpu
          report: torch_hub
 
     name: ${{ matrix.config.name }}
-    runs-on: ${{ matrix.config.runner }}
+    runs-on:
+      group: ${{ matrix.config.runner }}
     container:
       image: ${{ matrix.config.image }}
       options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
@@ -168,7 +171,7 @@ jobs:
 
       - name: Test suite reports artifacts
        if: ${{ always() }}
-        uses: actions/upload-artifact@v2
+        uses: actions/upload-artifact@v4
        with:
          name: pr_${{ matrix.config.report }}_test_reports
          path: reports
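For context, the `runs-on:` changes in this file (and in several workflows below) move from a single self-hosted runner label to a GitHub Actions runner group. A minimal sketch of the target syntax, with the group name taken from the diff and the job name purely illustrative:

jobs:
  example_cpu_job:
    # Request any machine from the named runner group instead of matching
    # self-hosted labels such as [ self-hosted, intel-cpu, 8-cpu, ci ].
    runs-on:
      group: aws-general-8-plus
    container:
      image: diffusers/diffusers-pytorch-cpu
      options: --shm-size "16gb" --ipc host
    steps:
      - uses: actions/checkout@v3
      - name: Environment
        run: python utils/print_env.py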
37  .github/workflows/pr_test_peft_backend.yml  vendored

@@ -20,7 +20,7 @@ env:
 
 jobs:
   check_code_quality:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-22.04
     steps:
       - uses: actions/checkout@v3
       - name: Set up Python
@@ -40,7 +40,7 @@ jobs:
 
   check_repository_consistency:
     needs: check_code_quality
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-22.04
     steps:
       - uses: actions/checkout@v3
       - name: Set up Python
@@ -71,7 +71,8 @@ jobs:
 
     name: LoRA - ${{ matrix.lib-versions }}
 
-    runs-on: [ self-hosted, intel-cpu, 8-cpu, ci ]
+    runs-on:
+      group: aws-general-8-plus
 
     container:
       image: diffusers/diffusers-pytorch-cpu
@@ -91,12 +92,14 @@ jobs:
         run: |
           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
           python -m uv pip install -e [quality,test]
+          # TODO (sayakpaul, DN6): revisit `--no-deps`
           if [ "${{ matrix.lib-versions }}" == "main" ]; then
-            python -m pip install -U peft@git+https://github.com/huggingface/peft.git
-            python -m uv pip install -U transformers@git+https://github.com/huggingface/transformers.git
-            python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
+            python -m pip install -U peft@git+https://github.com/huggingface/peft.git --no-deps
+            python -m uv pip install -U transformers@git+https://github.com/huggingface/transformers.git --no-deps
+            pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git --no-deps
           else
-            python -m uv pip install -U peft transformers accelerate
+            python -m uv pip install -U peft --no-deps
+            python -m uv pip install -U transformers accelerate --no-deps
           fi
 
       - name: Environment
@@ -109,5 +112,23 @@ jobs:
           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
           python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
             -s -v \
-            --make-reports=tests_${{ matrix.config.report }} \
+            --make-reports=tests_${{ matrix.lib-versions }} \
             tests/lora/
+          python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
+            -s -v \
+            --make-reports=tests_models_lora_${{ matrix.lib-versions }} \
+            tests/models/ -k "lora"
+
+      - name: Failure short reports
+        if: ${{ failure() }}
+        run: |
+          cat reports/tests_${{ matrix.lib-versions }}_failures_short.txt
+          cat reports/tests_models_lora_${{ matrix.lib-versions }}_failures_short.txt
+
+      - name: Test suite reports artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: pr_${{ matrix.lib-versions }}_test_reports
+          path: reports
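The install step above pins peft, transformers, and accelerate to their git `main` branches with `--no-deps`, presumably so their own dependency pins do not override what the `[quality,test]` extras resolved. A minimal sketch of the same pattern as a standalone step, using only commands that appear in the diff (the step name is illustrative):

      - name: Install library mains without dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test]
          # Upgrade only the libraries themselves; --no-deps leaves their transitive pins alone.
          python -m pip install -U peft@git+https://github.com/huggingface/peft.git --no-deps
          python -m uv pip install -U transformers@git+https://github.com/huggingface/transformers.git --no-deps
          # accelerate is removed first so the fresh build from main is the only copy on the path.
          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git --no-deps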
27  .github/workflows/pr_tests.yml  vendored

@@ -22,13 +22,14 @@ concurrency:
 
 env:
   DIFFUSERS_IS_CI: yes
+  HF_HUB_ENABLE_HF_TRANSFER: 1
   OMP_NUM_THREADS: 4
   MKL_NUM_THREADS: 4
   PYTEST_TIMEOUT: 60
 
 jobs:
   check_code_quality:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-22.04
     steps:
       - uses: actions/checkout@v3
       - name: Set up Python
@@ -48,7 +49,7 @@ jobs:
 
   check_repository_consistency:
     needs: check_code_quality
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-22.04
     steps:
       - uses: actions/checkout@v3
       - name: Set up Python
@@ -77,28 +78,29 @@ jobs:
       config:
         - name: Fast PyTorch Pipeline CPU tests
          framework: pytorch_pipelines
-          runner: [ self-hosted, intel-cpu, 32-cpu, 256-ram, ci ]
+          runner: aws-highmemory-32-plus
          image: diffusers/diffusers-pytorch-cpu
          report: torch_cpu_pipelines
        - name: Fast PyTorch Models & Schedulers CPU tests
          framework: pytorch_models
-          runner: [ self-hosted, intel-cpu, 8-cpu, ci ]
+          runner: aws-general-8-plus
          image: diffusers/diffusers-pytorch-cpu
          report: torch_cpu_models_schedulers
        - name: Fast Flax CPU tests
          framework: flax
-          runner: [ self-hosted, intel-cpu, 8-cpu, ci ]
+          runner: aws-general-8-plus
          image: diffusers/diffusers-flax-cpu
          report: flax_cpu
        - name: PyTorch Example CPU tests
          framework: pytorch_examples
-          runner: [ self-hosted, intel-cpu, 8-cpu, ci ]
+          runner: aws-general-8-plus
          image: diffusers/diffusers-pytorch-cpu
          report: torch_example_cpu
 
     name: ${{ matrix.config.name }}
 
-    runs-on: ${{ matrix.config.runner }}
+    runs-on:
+      group: ${{ matrix.config.runner }}
 
     container:
       image: ${{ matrix.config.image }}
@@ -156,7 +158,7 @@ jobs:
        if: ${{ matrix.config.framework == 'pytorch_examples' }}
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
-          python -m uv pip install peft
+          python -m uv pip install peft timm
          python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
            --make-reports=tests_${{ matrix.config.report }} \
            examples
@@ -167,9 +169,9 @@ jobs:
 
      - name: Test suite reports artifacts
        if: ${{ always() }}
-        uses: actions/upload-artifact@v2
+        uses: actions/upload-artifact@v4
        with:
-          name: pr_${{ matrix.config.report }}_test_reports
+          name: pr_${{ matrix.config.framework }}_${{ matrix.config.report }}_test_reports
          path: reports
 
  run_staging_tests:
@@ -180,7 +182,8 @@ jobs:
      config:
        - name: Hub tests for models, schedulers, and pipelines
          framework: hub_tests_pytorch
-          runner: [ self-hosted, intel-cpu, 8-cpu, ci ]
+          runner:
+            group: aws-general-8-plus
          image: diffusers/diffusers-pytorch-cpu
          report: torch_hub
 
@@ -227,7 +230,7 @@ jobs:
 
      - name: Test suite reports artifacts
        if: ${{ always() }}
-        uses: actions/upload-artifact@v2
+        uses: actions/upload-artifact@v4
        with:
          name: pr_${{ matrix.config.report }}_test_reports
          path: reports
@@ -16,7 +16,7 @@ concurrency:
 
 jobs:
   check_torch_dependencies:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-22.04
     steps:
       - uses: actions/checkout@v3
       - name: Set up Python
169  .github/workflows/push_tests.yml  vendored

@@ -1,6 +1,7 @@
-name: Slow Tests on main
+name: Fast GPU Tests on main
 
 on:
+  workflow_dispatch:
   push:
     branches:
       - main
@@ -11,17 +12,19 @@ on:
 
 env:
   DIFFUSERS_IS_CI: yes
-  HF_HOME: /mnt/cache
   OMP_NUM_THREADS: 8
   MKL_NUM_THREADS: 8
+  HF_HUB_ENABLE_HF_TRANSFER: 1
   PYTEST_TIMEOUT: 600
-  RUN_SLOW: yes
   PIPELINE_USAGE_CUTOFF: 50000
 
 jobs:
   setup_torch_cuda_pipeline_matrix:
     name: Setup Torch Pipelines CUDA Slow Tests Matrix
-    runs-on: ubuntu-latest
+    runs-on:
+      group: aws-general-8-plus
+    container:
+      image: diffusers/diffusers-pytorch-cpu
     outputs:
       pipeline_test_matrix: ${{ steps.fetch_pipeline_matrix.outputs.pipeline_test_matrix }}
     steps:
@@ -29,14 +32,13 @@ jobs:
         uses: actions/checkout@v3
         with:
           fetch-depth: 2
-      - name: Set up Python
-        uses: actions/setup-python@v4
-        with:
-          python-version: "3.8"
       - name: Install dependencies
         run: |
-          pip install -e .
-          pip install huggingface_hub
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test]
+      - name: Environment
+        run: |
+          python utils/print_env.py
       - name: Fetch Pipeline Matrix
         id: fetch_pipeline_matrix
         run: |
@@ -45,22 +47,24 @@ jobs:
           echo "pipeline_test_matrix=$matrix" >> $GITHUB_OUTPUT
       - name: Pipeline Tests Artifacts
         if: ${{ always() }}
-        uses: actions/upload-artifact@v2
+        uses: actions/upload-artifact@v4
         with:
           name: test-pipelines.json
           path: reports
 
   torch_pipelines_cuda_tests:
-    name: Torch Pipelines CUDA Slow Tests
+    name: Torch Pipelines CUDA Tests
     needs: setup_torch_cuda_pipeline_matrix
     strategy:
       fail-fast: false
+      max-parallel: 8
       matrix:
         module: ${{ fromJson(needs.setup_torch_cuda_pipeline_matrix.outputs.pipeline_test_matrix) }}
-    runs-on: [single-gpu, nvidia-gpu, t4, ci]
+    runs-on:
+      group: aws-g4dn-2xlarge
     container:
       image: diffusers/diffusers-pytorch-cuda
-      options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ --gpus 0
+      options: --shm-size "16gb" --ipc host --gpus 0
     steps:
       - name: Checkout diffusers
         uses: actions/checkout@v3
@@ -73,13 +77,13 @@ jobs:
         run: |
           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
           python -m uv pip install -e [quality,test]
-          python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate.git
+          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
       - name: Environment
         run: |
           python utils/print_env.py
-      - name: Slow PyTorch CUDA checkpoint tests on Ubuntu
+      - name: PyTorch CUDA checkpoint tests on Ubuntu
         env:
-          HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
+          HF_TOKEN: ${{ secrets.HF_TOKEN }}
           # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
           CUBLAS_WORKSPACE_CONFIG: :16:8
         run: |
@@ -92,26 +96,28 @@ jobs:
         run: |
           cat reports/tests_pipeline_${{ matrix.module }}_cuda_stats.txt
           cat reports/tests_pipeline_${{ matrix.module }}_cuda_failures_short.txt
 
       - name: Test suite reports artifacts
         if: ${{ always() }}
-        uses: actions/upload-artifact@v2
+        uses: actions/upload-artifact@v4
         with:
           name: pipeline_${{ matrix.module }}_test_reports
           path: reports
 
   torch_cuda_tests:
     name: Torch CUDA Tests
-    runs-on: docker-gpu
+    runs-on:
+      group: aws-g4dn-2xlarge
     container:
       image: diffusers/diffusers-pytorch-cuda
-      options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ --gpus 0
+      options: --shm-size "16gb" --ipc host --gpus 0
     defaults:
       run:
         shell: bash
     strategy:
+      fail-fast: false
+      max-parallel: 2
       matrix:
-        module: [models, schedulers, lora, others]
+        module: [models, schedulers, lora, others, single_file]
     steps:
       - name: Checkout diffusers
         uses: actions/checkout@v3
@@ -122,84 +128,35 @@ jobs:
         run: |
           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
           python -m uv pip install -e [quality,test]
-          python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate.git
+          python -m uv pip install peft@git+https://github.com/huggingface/peft.git
+          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
 
       - name: Environment
         run: |
           python utils/print_env.py
 
-      - name: Run slow PyTorch CUDA tests
+      - name: Run PyTorch CUDA tests
         env:
-          HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
+          HF_TOKEN: ${{ secrets.HF_TOKEN }}
           # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
           CUBLAS_WORKSPACE_CONFIG: :16:8
         run: |
           python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
             -s -v -k "not Flax and not Onnx" \
-            --make-reports=tests_torch_cuda \
+            --make-reports=tests_torch_cuda_${{ matrix.module }} \
             tests/${{ matrix.module }}
 
       - name: Failure short reports
         if: ${{ failure() }}
         run: |
-          cat reports/tests_torch_cuda_stats.txt
-          cat reports/tests_torch_cuda_failures_short.txt
+          cat reports/tests_torch_cuda_${{ matrix.module }}_stats.txt
+          cat reports/tests_torch_cuda_${{ matrix.module }}_failures_short.txt
 
       - name: Test suite reports artifacts
         if: ${{ always() }}
-        uses: actions/upload-artifact@v2
+        uses: actions/upload-artifact@v4
         with:
-          name: torch_cuda_test_reports
-          path: reports
-
-  peft_cuda_tests:
-    name: PEFT CUDA Tests
-    runs-on: docker-gpu
-    container:
-      image: diffusers/diffusers-pytorch-cuda
-      options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ --gpus 0
-    defaults:
-      run:
-        shell: bash
-    steps:
-      - name: Checkout diffusers
-        uses: actions/checkout@v3
-        with:
-          fetch-depth: 2
-
-      - name: Install dependencies
-        run: |
-          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
-          python -m uv pip install -e [quality,test]
-          python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate.git
-          python -m pip install -U peft@git+https://github.com/huggingface/peft.git
-
-      - name: Environment
-        run: |
-          python utils/print_env.py
-
-      - name: Run slow PEFT CUDA tests
-        env:
-          HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
-          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
-          CUBLAS_WORKSPACE_CONFIG: :16:8
-        run: |
-          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
-            -s -v -k "not Flax and not Onnx and not PEFTLoRALoading" \
-            --make-reports=tests_peft_cuda \
-            tests/lora/
-
-      - name: Failure short reports
-        if: ${{ failure() }}
-        run: |
-          cat reports/tests_peft_cuda_stats.txt
-          cat reports/tests_peft_cuda_failures_short.txt
-
-      - name: Test suite reports artifacts
-        if: ${{ always() }}
-        uses: actions/upload-artifact@v2
-        with:
-          name: torch_peft_test_reports
+          name: torch_cuda_test_reports_${{ matrix.module }}
           path: reports
 
   flax_tpu_tests:
@@ -207,7 +164,7 @@ jobs:
     runs-on: docker-tpu
     container:
       image: diffusers/diffusers-flax-tpu
-      options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ --privileged
+      options: --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ --privileged
     defaults:
       run:
         shell: bash
@@ -221,15 +178,15 @@ jobs:
         run: |
           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
           python -m uv pip install -e [quality,test]
-          python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate.git
+          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
 
       - name: Environment
         run: |
           python utils/print_env.py
 
-      - name: Run slow Flax TPU tests
+      - name: Run Flax TPU tests
         env:
-          HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
+          HF_TOKEN: ${{ secrets.HF_TOKEN }}
         run: |
           python -m pytest -n 0 \
             -s -v -k "Flax" \
@@ -244,17 +201,18 @@ jobs:
 
       - name: Test suite reports artifacts
         if: ${{ always() }}
-        uses: actions/upload-artifact@v2
+        uses: actions/upload-artifact@v4
         with:
           name: flax_tpu_test_reports
           path: reports
 
   onnx_cuda_tests:
     name: ONNX CUDA Tests
-    runs-on: docker-gpu
+    runs-on:
+      group: aws-g4dn-2xlarge
     container:
       image: diffusers/diffusers-onnxruntime-cuda
-      options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/ --gpus 0
+      options: --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ --gpus 0
     defaults:
       run:
         shell: bash
@@ -268,15 +226,15 @@ jobs:
         run: |
           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
           python -m uv pip install -e [quality,test]
-          python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate.git
+          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
 
       - name: Environment
         run: |
           python utils/print_env.py
 
-      - name: Run slow ONNXRuntime CUDA tests
+      - name: Run ONNXRuntime CUDA tests
         env:
-          HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
+          HF_TOKEN: ${{ secrets.HF_TOKEN }}
         run: |
           python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
             -s -v -k "Onnx" \
@@ -291,7 +249,7 @@ jobs:
 
       - name: Test suite reports artifacts
         if: ${{ always() }}
-        uses: actions/upload-artifact@v2
+        uses: actions/upload-artifact@v4
         with:
           name: onnx_cuda_test_reports
           path: reports
@@ -299,11 +257,12 @@ jobs:
   run_torch_compile_tests:
     name: PyTorch Compile CUDA tests
 
-    runs-on: docker-gpu
+    runs-on:
+      group: aws-g4dn-2xlarge
 
     container:
       image: diffusers/diffusers-pytorch-compile-cuda
-      options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
+      options: --gpus 0 --shm-size "16gb" --ipc host
 
     steps:
       - name: Checkout diffusers
@@ -323,7 +282,8 @@ jobs:
           python utils/print_env.py
       - name: Run example tests on GPU
         env:
-          HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
+          HF_TOKEN: ${{ secrets.HF_TOKEN }}
+          RUN_COMPILE: yes
         run: |
           python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "compile" --make-reports=tests_torch_compile_cuda tests/
       - name: Failure short reports
@@ -332,7 +292,7 @@ jobs:
 
       - name: Test suite reports artifacts
         if: ${{ always() }}
-        uses: actions/upload-artifact@v2
+        uses: actions/upload-artifact@v4
         with:
           name: torch_compile_test_reports
           path: reports
@@ -340,11 +300,12 @@ jobs:
   run_xformers_tests:
     name: PyTorch xformers CUDA tests
 
-    runs-on: docker-gpu
+    runs-on:
+      group: aws-g4dn-2xlarge
 
     container:
       image: diffusers/diffusers-pytorch-xformers-cuda
-      options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
+      options: --gpus 0 --shm-size "16gb" --ipc host
 
     steps:
       - name: Checkout diffusers
@@ -364,7 +325,7 @@ jobs:
           python utils/print_env.py
       - name: Run example tests on GPU
         env:
-          HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
+          HF_TOKEN: ${{ secrets.HF_TOKEN }}
         run: |
           python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "xformers" --make-reports=tests_torch_xformers_cuda tests/
       - name: Failure short reports
@@ -373,7 +334,7 @@ jobs:
 
       - name: Test suite reports artifacts
         if: ${{ always() }}
-        uses: actions/upload-artifact@v2
+        uses: actions/upload-artifact@v4
         with:
           name: torch_xformers_test_reports
           path: reports
@@ -381,11 +342,12 @@ jobs:
   run_examples_tests:
     name: Examples PyTorch CUDA tests on Ubuntu
 
-    runs-on: docker-gpu
+    runs-on:
+      group: aws-g4dn-2xlarge
 
     container:
       image: diffusers/diffusers-pytorch-cuda
-      options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
+      options: --gpus 0 --shm-size "16gb" --ipc host
 
     steps:
       - name: Checkout diffusers
@@ -409,9 +371,10 @@ jobs:
 
       - name: Run example tests on GPU
         env:
-          HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
+          HF_TOKEN: ${{ secrets.HF_TOKEN }}
         run: |
           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install timm
           python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v --make-reports=examples_torch_cuda examples/
 
       - name: Failure short reports
@@ -422,7 +385,7 @@ jobs:
 
       - name: Test suite reports artifacts
         if: ${{ always() }}
-        uses: actions/upload-artifact@v2
+        uses: actions/upload-artifact@v4
         with:
           name: examples_test_reports
           path: reports
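A side effect of moving to actions/upload-artifact@v4 is that an artifact name can only be written once per workflow run, which is presumably why the report and artifact names above now embed the matrix key. A minimal sketch of the pattern, trimmed to the relevant keys and with steps reduced for illustration:

  torch_cuda_tests:
    strategy:
      fail-fast: false
      max-parallel: 2
      matrix:
        module: [models, schedulers, lora, others, single_file]
    runs-on:
      group: aws-g4dn-2xlarge
    container:
      image: diffusers/diffusers-pytorch-cuda
      options: --shm-size "16gb" --ipc host --gpus 0
    steps:
      - name: Run PyTorch CUDA tests
        run: |
          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
            --make-reports=tests_torch_cuda_${{ matrix.module }} \
            tests/${{ matrix.module }}
      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          # One uniquely named artifact per matrix entry; v4 rejects duplicate names within a run.
          name: torch_cuda_test_reports_${{ matrix.module }}
          path: reports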
16  .github/workflows/push_tests_fast.yml  vendored

@@ -18,6 +18,7 @@ env:
   HF_HOME: /mnt/cache
   OMP_NUM_THREADS: 8
   MKL_NUM_THREADS: 8
+  HF_HUB_ENABLE_HF_TRANSFER: 1
   PYTEST_TIMEOUT: 600
   RUN_SLOW: no
 
@@ -29,28 +30,29 @@ jobs:
       config:
        - name: Fast PyTorch CPU tests on Ubuntu
          framework: pytorch
-          runner: [ self-hosted, intel-cpu, 8-cpu, ci ]
+          runner: aws-general-8-plus
          image: diffusers/diffusers-pytorch-cpu
          report: torch_cpu
        - name: Fast Flax CPU tests on Ubuntu
          framework: flax
-          runner: [ self-hosted, intel-cpu, 8-cpu, ci ]
+          runner: aws-general-8-plus
          image: diffusers/diffusers-flax-cpu
          report: flax_cpu
        - name: Fast ONNXRuntime CPU tests on Ubuntu
          framework: onnxruntime
-          runner: [ self-hosted, intel-cpu, 8-cpu, ci ]
+          runner: aws-general-8-plus
          image: diffusers/diffusers-onnxruntime-cpu
          report: onnx_cpu
        - name: PyTorch Example CPU tests on Ubuntu
          framework: pytorch_examples
-          runner: [ self-hosted, intel-cpu, 8-cpu, ci ]
+          runner: aws-general-8-plus
          image: diffusers/diffusers-pytorch-cpu
          report: torch_example_cpu
 
     name: ${{ matrix.config.name }}
 
-    runs-on: ${{ matrix.config.runner }}
+    runs-on:
+      group: ${{ matrix.config.runner }}
 
     container:
       image: ${{ matrix.config.image }}
@@ -107,7 +109,7 @@ jobs:
        if: ${{ matrix.config.framework == 'pytorch_examples' }}
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
-          python -m uv pip install peft
+          python -m uv pip install peft timm
          python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
            --make-reports=tests_${{ matrix.config.report }} \
            examples
@@ -118,7 +120,7 @@ jobs:
 
      - name: Test suite reports artifacts
        if: ${{ always() }}
-        uses: actions/upload-artifact@v2
+        uses: actions/upload-artifact@v4
        with:
          name: pr_${{ matrix.config.report }}_test_reports
          path: reports
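HF_HUB_ENABLE_HF_TRANSFER: 1, added to the env block here and in several other workflows, tells huggingface_hub to download through the Rust-based hf_transfer backend; this assumes the hf_transfer package is available in the Docker image or installed extras, which is not visible in this diff. A minimal sketch with illustrative job and step names:

env:
  DIFFUSERS_IS_CI: yes
  # Faster Hub downloads; requires the hf_transfer package to be installed in the environment.
  HF_HUB_ENABLE_HF_TRANSFER: 1

jobs:
  fast_cpu_tests:
    runs-on:
      group: aws-general-8-plus
    container:
      image: diffusers/diffusers-pytorch-cpu
    steps:
      - uses: actions/checkout@v3
      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test] hf_transfer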
7  .github/workflows/push_tests_mps.yml  vendored

@@ -13,6 +13,7 @@ env:
   HF_HOME: /mnt/cache
   OMP_NUM_THREADS: 8
   MKL_NUM_THREADS: 8
+  HF_HUB_ENABLE_HF_TRANSFER: 1
   PYTEST_TIMEOUT: 600
   RUN_SLOW: no
 
@@ -23,7 +24,7 @@ concurrency:
 jobs:
   run_fast_tests_apple_m1:
     name: Fast PyTorch MPS tests on MacOS
-    runs-on: [ self-hosted, apple-m1 ]
+    runs-on: macos-13-xlarge
 
     steps:
       - name: Checkout diffusers
@@ -59,7 +60,7 @@ jobs:
        shell: arch -arch arm64 bash {0}
        env:
          HF_HOME: /System/Volumes/Data/mnt/cache
-          HUGGING_FACE_HUB_TOKEN: ${{ secrets.HUGGING_FACE_HUB_TOKEN }}
+          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          ${CONDA_RUN} python -m pytest -n 0 -s -v --make-reports=tests_torch_mps tests/
 
@@ -69,7 +70,7 @@ jobs:
 
      - name: Test suite reports artifacts
        if: ${{ always() }}
-        uses: actions/upload-artifact@v2
+        uses: actions/upload-artifact@v4
        with:
          name: pr_torch_mps_test_reports
          path: reports
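The fast MPS job now targets macos-13-xlarge, a GitHub-hosted Apple Silicon (arm64) runner, instead of the self-hosted [ self-hosted, apple-m1 ] machine. A minimal sketch of an MPS availability check on such a runner; the step contents are illustrative and not taken from the workflow:

jobs:
  mps_smoke_test:
    runs-on: macos-13-xlarge  # GitHub-hosted arm64 (Apple Silicon) runner
    steps:
      - uses: actions/checkout@v3
      - name: Check that PyTorch sees the MPS backend
        shell: arch -arch arm64 bash {0}
        run: |
          python3 -m pip install --upgrade pip
          python3 -m pip install torch
          python3 -c "import torch; print('MPS available:', torch.backends.mps.is_available())"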
4  .github/workflows/pypi_publish.yaml  vendored

@@ -10,7 +10,7 @@ on:
 
 jobs:
   find-and-checkout-latest-branch:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-22.04
     outputs:
       latest_branch: ${{ steps.set_latest_branch.outputs.latest_branch }}
     steps:
@@ -36,7 +36,7 @@ jobs:
 
   release:
     needs: find-and-checkout-latest-branch
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-22.04
 
     steps:
       - name: Checkout Repo
389  .github/workflows/release_tests_fast.yml  vendored  Normal file

@@ -0,0 +1,389 @@
# Duplicate workflow to push_tests.yml that is meant to run on release/patch branches as a final check
# Creating a duplicate workflow here is simpler than adding complex path/branch parsing logic to push_tests.yml
# Needs to be updated if push_tests.yml updated
name: (Release) Fast GPU Tests on main

on:
  push:
    branches:
      - "v*.*.*-release"
      - "v*.*.*-patch"

env:
  DIFFUSERS_IS_CI: yes
  OMP_NUM_THREADS: 8
  MKL_NUM_THREADS: 8
  PYTEST_TIMEOUT: 600
  PIPELINE_USAGE_CUTOFF: 50000

jobs:
  setup_torch_cuda_pipeline_matrix:
    name: Setup Torch Pipelines CUDA Slow Tests Matrix
    runs-on:
      group: aws-general-8-plus
    container:
      image: diffusers/diffusers-pytorch-cpu
    outputs:
      pipeline_test_matrix: ${{ steps.fetch_pipeline_matrix.outputs.pipeline_test_matrix }}
    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2
      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test]
      - name: Environment
        run: |
          python utils/print_env.py
      - name: Fetch Pipeline Matrix
        id: fetch_pipeline_matrix
        run: |
          matrix=$(python utils/fetch_torch_cuda_pipeline_test_matrix.py)
          echo $matrix
          echo "pipeline_test_matrix=$matrix" >> $GITHUB_OUTPUT
      - name: Pipeline Tests Artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: test-pipelines.json
          path: reports

  torch_pipelines_cuda_tests:
    name: Torch Pipelines CUDA Tests
    needs: setup_torch_cuda_pipeline_matrix
    strategy:
      fail-fast: false
      max-parallel: 8
      matrix:
        module: ${{ fromJson(needs.setup_torch_cuda_pipeline_matrix.outputs.pipeline_test_matrix) }}
    runs-on:
      group: aws-g4dn-2xlarge
    container:
      image: diffusers/diffusers-pytorch-cuda
      options: --shm-size "16gb" --ipc host --gpus 0
    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2
      - name: NVIDIA-SMI
        run: |
          nvidia-smi
      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test]
          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
      - name: Environment
        run: |
          python utils/print_env.py
      - name: Slow PyTorch CUDA checkpoint tests on Ubuntu
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
          CUBLAS_WORKSPACE_CONFIG: :16:8
        run: |
          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
            -s -v -k "not Flax and not Onnx" \
            --make-reports=tests_pipeline_${{ matrix.module }}_cuda \
            tests/pipelines/${{ matrix.module }}
      - name: Failure short reports
        if: ${{ failure() }}
        run: |
          cat reports/tests_pipeline_${{ matrix.module }}_cuda_stats.txt
          cat reports/tests_pipeline_${{ matrix.module }}_cuda_failures_short.txt
      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: pipeline_${{ matrix.module }}_test_reports
          path: reports

  torch_cuda_tests:
    name: Torch CUDA Tests
    runs-on:
      group: aws-g4dn-2xlarge
    container:
      image: diffusers/diffusers-pytorch-cuda
      options: --shm-size "16gb" --ipc host --gpus 0
    defaults:
      run:
        shell: bash
    strategy:
      fail-fast: false
      max-parallel: 2
      matrix:
        module: [models, schedulers, lora, others, single_file]
    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2

      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test]
          python -m uv pip install peft@git+https://github.com/huggingface/peft.git
          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git

      - name: Environment
        run: |
          python utils/print_env.py

      - name: Run PyTorch CUDA tests
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
          CUBLAS_WORKSPACE_CONFIG: :16:8
        run: |
          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
            -s -v -k "not Flax and not Onnx" \
            --make-reports=tests_torch_${{ matrix.module }}_cuda \
            tests/${{ matrix.module }}

      - name: Failure short reports
        if: ${{ failure() }}
        run: |
          cat reports/tests_torch_${{ matrix.module }}_cuda_stats.txt
          cat reports/tests_torch_${{ matrix.module }}_cuda_failures_short.txt

      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: torch_cuda_${{ matrix.module }}_test_reports
          path: reports

  flax_tpu_tests:
    name: Flax TPU Tests
    runs-on: docker-tpu
    container:
      image: diffusers/diffusers-flax-tpu
      options: --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ --privileged
    defaults:
      run:
        shell: bash
    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2

      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test]
          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git

      - name: Environment
        run: |
          python utils/print_env.py

      - name: Run slow Flax TPU tests
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          python -m pytest -n 0 \
            -s -v -k "Flax" \
            --make-reports=tests_flax_tpu \
            tests/

      - name: Failure short reports
        if: ${{ failure() }}
        run: |
          cat reports/tests_flax_tpu_stats.txt
          cat reports/tests_flax_tpu_failures_short.txt

      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: flax_tpu_test_reports
          path: reports

  onnx_cuda_tests:
    name: ONNX CUDA Tests
    runs-on:
      group: aws-g4dn-2xlarge
    container:
      image: diffusers/diffusers-onnxruntime-cuda
      options: --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ --gpus 0
    defaults:
      run:
        shell: bash
    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2

      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test]
          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git

      - name: Environment
        run: |
          python utils/print_env.py

      - name: Run slow ONNXRuntime CUDA tests
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
            -s -v -k "Onnx" \
            --make-reports=tests_onnx_cuda \
            tests/

      - name: Failure short reports
        if: ${{ failure() }}
        run: |
          cat reports/tests_onnx_cuda_stats.txt
          cat reports/tests_onnx_cuda_failures_short.txt

      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: onnx_cuda_test_reports
          path: reports

  run_torch_compile_tests:
    name: PyTorch Compile CUDA tests

    runs-on:
      group: aws-g4dn-2xlarge

    container:
      image: diffusers/diffusers-pytorch-compile-cuda
      options: --gpus 0 --shm-size "16gb" --ipc host

    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2

      - name: NVIDIA-SMI
        run: |
          nvidia-smi
      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test,training]
      - name: Environment
        run: |
          python utils/print_env.py
      - name: Run example tests on GPU
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
          RUN_COMPILE: yes
        run: |
          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "compile" --make-reports=tests_torch_compile_cuda tests/
      - name: Failure short reports
        if: ${{ failure() }}
        run: cat reports/tests_torch_compile_cuda_failures_short.txt

      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: torch_compile_test_reports
          path: reports

  run_xformers_tests:
    name: PyTorch xformers CUDA tests

    runs-on:
      group: aws-g4dn-2xlarge

    container:
      image: diffusers/diffusers-pytorch-xformers-cuda
      options: --gpus 0 --shm-size "16gb" --ipc host

    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2

      - name: NVIDIA-SMI
        run: |
          nvidia-smi
      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test,training]
      - name: Environment
        run: |
          python utils/print_env.py
      - name: Run example tests on GPU
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "xformers" --make-reports=tests_torch_xformers_cuda tests/
      - name: Failure short reports
        if: ${{ failure() }}
        run: cat reports/tests_torch_xformers_cuda_failures_short.txt

      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: torch_xformers_test_reports
          path: reports

  run_examples_tests:
    name: Examples PyTorch CUDA tests on Ubuntu

    runs-on:
      group: aws-g4dn-2xlarge

    container:
      image: diffusers/diffusers-pytorch-cuda
      options: --gpus 0 --shm-size "16gb" --ipc host

    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2

      - name: NVIDIA-SMI
        run: |
          nvidia-smi

      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test,training]

      - name: Environment
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python utils/print_env.py

      - name: Run example tests on GPU
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install timm
          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v --make-reports=examples_torch_cuda examples/

      - name: Failure short reports
        if: ${{ failure() }}
        run: |
          cat reports/examples_torch_cuda_stats.txt
          cat reports/examples_torch_cuda_failures_short.txt

      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: examples_test_reports
          path: reports
.github/workflows/run_tests_from_a_pr.yml (new file, +74 lines)
@@ -0,0 +1,74 @@
+name: Check running SLOW tests from a PR (only GPU)
+
+on:
+  workflow_dispatch:
+    inputs:
+      docker_image:
+        default: 'diffusers/diffusers-pytorch-cuda'
+        description: 'Name of the Docker image'
+        required: true
+      branch:
+        description: 'PR Branch to test on'
+        required: true
+      test:
+        description: 'Tests to run (e.g.: `tests/models`).'
+        required: true
+
+env:
+  DIFFUSERS_IS_CI: yes
+  IS_GITHUB_CI: "1"
+  HF_HOME: /mnt/cache
+  OMP_NUM_THREADS: 8
+  MKL_NUM_THREADS: 8
+  PYTEST_TIMEOUT: 600
+  RUN_SLOW: yes
+
+jobs:
+  run_tests:
+    name: "Run a test on our runner from a PR"
+    runs-on:
+      group: aws-g4dn-2xlarge
+    container:
+      image: ${{ github.event.inputs.docker_image }}
+      options: --gpus 0 --privileged --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
+
+    steps:
+      - name: Validate test files input
+        id: validate_test_files
+        env:
+          PY_TEST: ${{ github.event.inputs.test }}
+        run: |
+          if [[ ! "$PY_TEST" =~ ^tests/ ]]; then
+            echo "Error: The input string must start with 'tests/'."
+            exit 1
+          fi
+
+          if [[ ! "$PY_TEST" =~ ^tests/(models|pipelines) ]]; then
+            echo "Error: The input string must contain either 'models' or 'pipelines' after 'tests/'."
+            exit 1
+          fi
+
+          if [[ "$PY_TEST" == *";"* ]]; then
+            echo "Error: The input string must not contain ';'."
+            exit 1
+          fi
+          echo "$PY_TEST"
+
+      - name: Checkout PR branch
+        uses: actions/checkout@v4
+        with:
+          ref: ${{ github.event.inputs.branch }}
+          repository: ${{ github.event.pull_request.head.repo.full_name }}
+
+
+      - name: Install pytest
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test]
+          python -m uv pip install peft
+
+      - name: Run tests
+        env:
+          PY_TEST: ${{ github.event.inputs.test }}
+        run: |
+          pytest "$PY_TEST"
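The `Validate test files input` step above is what keeps this manually dispatched workflow from running arbitrary input on the GPU runner. Purely as an illustration (not part of the diff), the same three checks can be written in Python; the function name below is hypothetical:

```python
import re


def validate_test_path(py_test: str) -> None:
    # Hypothetical re-implementation of the workflow's bash validation.
    if not py_test.startswith("tests/"):
        raise ValueError("The input string must start with 'tests/'.")
    # Only the models and pipelines test suites may be targeted.
    if re.match(r"^tests/(models|pipelines)", py_test) is None:
        raise ValueError("The input string must contain 'models' or 'pipelines' after 'tests/'.")
    # Reject ';' so the value cannot chain extra shell commands.
    if ";" in py_test:
        raise ValueError("The input string must not contain ';'.")


validate_test_path("tests/models")  # passes; "tests/others" or "tests/models; rm -rf /" would raise
```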
.github/workflows/ssh-pr-runner.yml (new file, +40 lines)
@@ -0,0 +1,40 @@
+name: SSH into PR runners
+
+on:
+  workflow_dispatch:
+    inputs:
+      docker_image:
+        description: 'Name of the Docker image'
+        required: true
+
+env:
+  IS_GITHUB_CI: "1"
+  HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
+  HF_HOME: /mnt/cache
+  DIFFUSERS_IS_CI: yes
+  OMP_NUM_THREADS: 8
+  MKL_NUM_THREADS: 8
+  RUN_SLOW: yes
+
+jobs:
+  ssh_runner:
+    name: "SSH"
+    runs-on:
+      group: aws-highmemory-32-plus
+    container:
+      image: ${{ github.event.inputs.docker_image }}
+      options: --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface/diffusers:/mnt/cache/ --privileged
+
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+
+      - name: Tailscale # In order to be able to SSH when a test fails
+        uses: huggingface/tailscale-action@main
+        with:
+          authkey: ${{ secrets.TAILSCALE_SSH_AUTHKEY }}
+          slackChannel: ${{ secrets.SLACK_CIFEEDBACK_CHANNEL }}
+          slackToken: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
+          waitForSSH: true
.github/workflows/ssh-runner.yml (new file, +52 lines)
@@ -0,0 +1,52 @@
+name: SSH into GPU runners
+
+on:
+  workflow_dispatch:
+    inputs:
+      runner_type:
+        description: 'Type of runner to test (aws-g6-4xlarge-plus: a10, aws-g4dn-2xlarge: t4, aws-g6e-xlarge-plus: L40)'
+        type: choice
+        required: true
+        options:
+          - aws-g6-4xlarge-plus
+          - aws-g4dn-2xlarge
+          - aws-g6e-xlarge-plus
+      docker_image:
+        description: 'Name of the Docker image'
+        required: true
+
+env:
+  IS_GITHUB_CI: "1"
+  HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
+  HF_HOME: /mnt/cache
+  DIFFUSERS_IS_CI: yes
+  OMP_NUM_THREADS: 8
+  MKL_NUM_THREADS: 8
+  RUN_SLOW: yes
+
+jobs:
+  ssh_runner:
+    name: "SSH"
+    runs-on:
+      group: "${{ github.event.inputs.runner_type }}"
+    container:
+      image: ${{ github.event.inputs.docker_image }}
+      options: --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface/diffusers:/mnt/cache/ --gpus 0 --privileged
+
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+
+      - name: NVIDIA-SMI
+        run: |
+          nvidia-smi
+
+      - name: Tailscale # In order to be able to SSH when a test fails
+        uses: huggingface/tailscale-action@main
+        with:
+          authkey: ${{ secrets.TAILSCALE_SSH_AUTHKEY }}
+          slackChannel: ${{ secrets.SLACK_CIFEEDBACK_CHANNEL }}
+          slackToken: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
+          waitForSSH: true
.github/workflows/stale.yml (5 changes)
@@ -8,7 +8,10 @@ jobs:
   close_stale_issues:
     name: Close Stale Issues
     if: github.repository == 'huggingface/diffusers'
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-22.04
+    permissions:
+      issues: write
+      pull-requests: write
     env:
       GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
     steps:
.github/workflows/trufflehog.yml (new file, +15 lines)
@@ -0,0 +1,15 @@
+on:
+  push:
+
+name: Secret Leaks
+
+jobs:
+  trufflehog:
+    runs-on: ubuntu-22.04
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+      - name: Secret Scanning
+        uses: trufflesecurity/trufflehog@main
.github/workflows/typos.yml (2 changes)
@@ -5,7 +5,7 @@ on:
 
 jobs:
   build:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-22.04
 
     steps:
     - uses: actions/checkout@v3
.github/workflows/update_metadata.yml (2 changes)
@@ -25,6 +25,6 @@ jobs:
 
       - name: Update metadata
         env:
-          HUGGING_FACE_HUB_TOKEN: ${{ secrets.SAYAK_HF_TOKEN }}
+          HF_TOKEN: ${{ secrets.SAYAK_HF_TOKEN }}
         run: |
           python utils/update_metadata.py --commit_sha ${{ github.sha }}
CONTRIBUTING.md
@@ -57,13 +57,13 @@ Any question or comment related to the Diffusers library can be asked on the [di
 - ...
 
 Every question that is asked on the forum or on Discord actively encourages the community to publicly
-share knowledge and might very well help a beginner in the future that has the same question you're
+share knowledge and might very well help a beginner in the future who has the same question you're
 having. Please do pose any questions you might have.
 In the same spirit, you are of immense help to the community by answering such questions because this way you are publicly documenting knowledge for everybody to learn from.
 
 **Please** keep in mind that the more effort you put into asking or answering a question, the higher
 the quality of the publicly documented knowledge. In the same way, well-posed and well-answered questions create a high-quality knowledge database accessible to everybody, while badly posed questions or answers reduce the overall quality of the public knowledge database.
-In short, a high quality question or answer is *precise*, *concise*, *relevant*, *easy-to-understand*, *accessible*, and *well-formated/well-posed*. For more information, please have a look through the [How to write a good issue](#how-to-write-a-good-issue) section.
+In short, a high quality question or answer is *precise*, *concise*, *relevant*, *easy-to-understand*, *accessible*, and *well-formatted/well-posed*. For more information, please have a look through the [How to write a good issue](#how-to-write-a-good-issue) section.
 
 **NOTE about channels**:
 [*The forum*](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) is much better indexed by search engines, such as Google. Posts are ranked by popularity rather than chronologically. Hence, it's easier to look up questions and answers that we posted some time ago.
@@ -245,7 +245,7 @@ The official training examples are maintained by the Diffusers' core maintainers
 This is because of the same reasons put forward in [6. Contribute a community pipeline](#6-contribute-a-community-pipeline) for official pipelines vs. community pipelines: It is not feasible for the core maintainers to maintain all possible training methods for diffusion models.
 If the Diffusers core maintainers and the community consider a certain training paradigm to be too experimental or not popular enough, the corresponding training code should be put in the `research_projects` folder and maintained by the author.
 
-Both official training and research examples consist of a directory that contains one or more training scripts, a requirements.txt file, and a README.md file. In order for the user to make use of the
+Both official training and research examples consist of a directory that contains one or more training scripts, a `requirements.txt` file, and a `README.md` file. In order for the user to make use of the
 training examples, it is required to clone the repository:
 
 ```bash
@@ -255,7 +255,8 @@ git clone https://github.com/huggingface/diffusers
 as well as to install all additional dependencies required for training:
 
 ```bash
-pip install -r /examples/<your-example-folder>/requirements.txt
+cd diffusers
+pip install -r examples/<your-example-folder>/requirements.txt
 ```
 
 Therefore when adding an example, the `requirements.txt` file shall define all pip dependencies required for your training example so that once all those are installed, the user can run the example's training script. See, for example, the [DreamBooth `requirements.txt` file](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/requirements.txt).
@@ -355,7 +356,7 @@ You will need basic `git` proficiency to be able to contribute to
 manual. Type `git --help` in a shell and enjoy. If you prefer books, [Pro
 Git](https://git-scm.com/book/en/v2) is a very good reference.
 
-Follow these steps to start contributing ([supported Python versions](https://github.com/huggingface/diffusers/blob/main/setup.py#L265)):
+Follow these steps to start contributing ([supported Python versions](https://github.com/huggingface/diffusers/blob/42f25d601a910dceadaee6c44345896b4cfa9928/setup.py#L270)):
 
 1. Fork the [repository](https://github.com/huggingface/diffusers) by
 clicking on the 'Fork' button on the repository's page. This creates a copy of the code
PHILOSOPHY.md
@@ -15,7 +15,7 @@ specific language governing permissions and limitations under the License.
 🧨 Diffusers provides **state-of-the-art** pretrained diffusion models across multiple modalities.
 Its purpose is to serve as a **modular toolbox** for both inference and training.
 
-We aim at building a library that stands the test of time and therefore take API design very seriously.
+We aim to build a library that stands the test of time and therefore take API design very seriously.
 
 In a nutshell, Diffusers is built to be a natural extension of PyTorch. Therefore, most of our design choices are based on [PyTorch's Design Principles](https://pytorch.org/docs/stable/community/design.html#pytorch-design-philosophy). Let's go over the most important ones:
 
@@ -63,14 +63,14 @@ Let's walk through more detailed design decisions for each class.
 Pipelines are designed to be easy to use (therefore do not follow [*Simple over easy*](#simple-over-easy) 100%), are not feature complete, and should loosely be seen as examples of how to use [models](#models) and [schedulers](#schedulers) for inference.
 
 The following design principles are followed:
-- Pipelines follow the single-file policy. All pipelines can be found in individual directories under src/diffusers/pipelines. One pipeline folder corresponds to one diffusion paper/project/release. Multiple pipeline files can be gathered in one pipeline folder, as it’s done for [`src/diffusers/pipelines/stable-diffusion`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/stable_diffusion). If pipelines share similar functionality, one can make use of the [#Copied from mechanism](https://github.com/huggingface/diffusers/blob/125d783076e5bd9785beb05367a2d2566843a271/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py#L251).
+- Pipelines follow the single-file policy. All pipelines can be found in individual directories under src/diffusers/pipelines. One pipeline folder corresponds to one diffusion paper/project/release. Multiple pipeline files can be gathered in one pipeline folder, as it’s done for [`src/diffusers/pipelines/stable-diffusion`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/stable_diffusion). If pipelines share similar functionality, one can make use of the [# Copied from mechanism](https://github.com/huggingface/diffusers/blob/125d783076e5bd9785beb05367a2d2566843a271/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py#L251).
 - Pipelines all inherit from [`DiffusionPipeline`].
-- Every pipeline consists of different model and scheduler components, that are documented in the [`model_index.json` file](https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/model_index.json), are accessible under the same name as attributes of the pipeline and can be shared between pipelines with [`DiffusionPipeline.components`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.components) function.
+- Every pipeline consists of different model and scheduler components, that are documented in the [`model_index.json` file](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5/blob/main/model_index.json), are accessible under the same name as attributes of the pipeline and can be shared between pipelines with [`DiffusionPipeline.components`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.components) function.
 - Every pipeline should be loadable via the [`DiffusionPipeline.from_pretrained`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.from_pretrained) function.
 - Pipelines should be used **only** for inference.
 - Pipelines should be very readable, self-explanatory, and easy to tweak.
 - Pipelines should be designed to build on top of each other and be easy to integrate into higher-level APIs.
-- Pipelines are **not** intended to be feature-complete user interfaces. For future complete user interfaces one should rather have a look at [InvokeAI](https://github.com/invoke-ai/InvokeAI), [Diffuzers](https://github.com/abhishekkrthakur/diffuzers), and [lama-cleaner](https://github.com/Sanster/lama-cleaner).
+- Pipelines are **not** intended to be feature-complete user interfaces. For feature-complete user interfaces one should rather have a look at [InvokeAI](https://github.com/invoke-ai/InvokeAI), [Diffuzers](https://github.com/abhishekkrthakur/diffuzers), and [lama-cleaner](https://github.com/Sanster/lama-cleaner).
 - Every pipeline should have one and only one way to run it via a `__call__` method. The naming of the `__call__` arguments should be shared across all pipelines.
 - Pipelines should be named after the task they are intended to solve.
 - In almost all cases, novel diffusion pipelines shall be implemented in a new pipeline folder/file.
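As a minimal sketch of what the `DiffusionPipeline.from_pretrained` and `DiffusionPipeline.components` bullets mean in practice (not part of the diff; it assumes a Stable Diffusion checkpoint and a GPU with enough memory):

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline, StableDiffusionPipeline

# Every component listed in model_index.json (unet, vae, text_encoder, scheduler, ...)
# becomes an attribute of the loaded pipeline.
pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
)

# Reuse the exact same components in a second pipeline instead of loading them twice.
img2img = StableDiffusionImg2ImgPipeline(**pipe.components)
```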
@@ -81,7 +81,7 @@ Models are designed as configurable toolboxes that are natural extensions of [Py
 
 The following design principles are followed:
 - Models correspond to **a type of model architecture**. *E.g.* the [`UNet2DConditionModel`] class is used for all UNet variations that expect 2D image inputs and are conditioned on some context.
-- All models can be found in [`src/diffusers/models`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models) and every model architecture shall be defined in its file, e.g. [`unet_2d_condition.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unet_2d_condition.py), [`transformer_2d.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/transformer_2d.py), etc...
+- All models can be found in [`src/diffusers/models`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models) and every model architecture shall be defined in its file, e.g. [`unets/unet_2d_condition.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unets/unet_2d_condition.py), [`transformers/transformer_2d.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/transformers/transformer_2d.py), etc...
 - Models **do not** follow the single-file policy and should make use of smaller model building blocks, such as [`attention.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention.py), [`resnet.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/resnet.py), [`embeddings.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/embeddings.py), etc... **Note**: This is in stark contrast to Transformers' modeling files and shows that models do not really follow the single-file policy.
 - Models intend to expose complexity, just like PyTorch's `Module` class, and give clear error messages.
 - Models all inherit from `ModelMixin` and `ConfigMixin`.
@@ -90,7 +90,7 @@ The following design principles are followed:
 - To integrate new model checkpoints whose general architecture can be classified as an architecture that already exists in Diffusers, the existing model architecture shall be adapted to make it work with the new checkpoint. One should only create a new file if the model architecture is fundamentally different.
 - Models should be designed to be easily extendable to future changes. This can be achieved by limiting public function arguments, configuration arguments, and "foreseeing" future changes, *e.g.* it is usually better to add `string` "...type" arguments that can easily be extended to new future types instead of boolean `is_..._type` arguments. Only the minimum amount of changes shall be made to existing architectures to make a new model checkpoint work.
 - The model design is a difficult trade-off between keeping code readable and concise and supporting many model checkpoints. For most parts of the modeling code, classes shall be adapted for new model checkpoints, while there are some exceptions where it is preferred to add new classes to make sure the code is kept concise and
-readable long-term, such as [UNet blocks](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unet_2d_blocks.py) and [Attention processors](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
+readable long-term, such as [UNet blocks](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unets/unet_2d_blocks.py) and [Attention processors](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
 
 ### Schedulers
 
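To make the point about string `"...type"` arguments concrete, here is a hypothetical building block (not taken from the library) showing why a string argument ages better than a boolean flag: a third option can be added later without breaking the signature.

```python
import torch.nn as nn


class ConfigurableBlock(nn.Module):
    # `act_type` can grow new values later without an API break,
    # whereas a boolean `is_gelu` flag could not.
    def __init__(self, channels: int, act_type: str = "silu"):
        super().__init__()
        self.proj = nn.Linear(channels, channels)
        if act_type == "silu":
            self.act = nn.SiLU()
        elif act_type == "gelu":
            self.act = nn.GELU()
        else:
            raise ValueError(f"Unknown act_type: {act_type}")

    def forward(self, x):
        return self.act(self.proj(x))
```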
@@ -100,7 +100,7 @@ The following design principles are followed:
 - All schedulers are found in [`src/diffusers/schedulers`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers).
 - Schedulers are **not** allowed to import from large utils files and shall be kept very self-contained.
 - One scheduler Python file corresponds to one scheduler algorithm (as might be defined in a paper).
-- If schedulers share similar functionalities, we can make use of the `#Copied from` mechanism.
+- If schedulers share similar functionalities, we can make use of the `# Copied from` mechanism.
 - Schedulers all inherit from `SchedulerMixin` and `ConfigMixin`.
 - Schedulers can be easily swapped out with the [`ConfigMixin.from_config`](https://huggingface.co/docs/diffusers/main/en/api/configuration#diffusers.ConfigMixin.from_config) method as explained in detail [here](./docs/source/en/using-diffusers/schedulers.md).
 - Every scheduler has to have a `set_num_inference_steps`, and a `step` function. `set_num_inference_steps(...)` has to be called before every denoising process, *i.e.* before `step(...)` is called.
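Because schedulers inherit from `ConfigMixin`, the swap described above is a one-liner. A short sketch (not part of the diff; it assumes a Stable Diffusion checkpoint):

```python
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

pipe = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5")
# Rebuild a different scheduler from the current scheduler's config and drop it in.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
image = pipe("an astronaut riding a horse", num_inference_steps=20).images[0]
```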
README.md (33 changes)
@@ -20,21 +20,11 @@ limitations under the License.
 <br>
 <p>
 <p align="center">
-    <a href="https://github.com/huggingface/diffusers/blob/main/LICENSE">
-        <img alt="GitHub" src="https://img.shields.io/github/license/huggingface/datasets.svg?color=blue">
-    </a>
-    <a href="https://github.com/huggingface/diffusers/releases">
-        <img alt="GitHub release" src="https://img.shields.io/github/release/huggingface/diffusers.svg">
-    </a>
-    <a href="https://pepy.tech/project/diffusers">
-        <img alt="GitHub release" src="https://static.pepy.tech/badge/diffusers/month">
-    </a>
-    <a href="CODE_OF_CONDUCT.md">
-        <img alt="Contributor Covenant" src="https://img.shields.io/badge/Contributor%20Covenant-2.1-4baaaa.svg">
-    </a>
-    <a href="https://twitter.com/diffuserslib">
-        <img alt="X account" src="https://img.shields.io/twitter/url/https/twitter.com/diffuserslib.svg?style=social&label=Follow%20%40diffuserslib">
-    </a>
+    <a href="https://github.com/huggingface/diffusers/blob/main/LICENSE"><img alt="GitHub" src="https://img.shields.io/github/license/huggingface/datasets.svg?color=blue"></a>
+    <a href="https://github.com/huggingface/diffusers/releases"><img alt="GitHub release" src="https://img.shields.io/github/release/huggingface/diffusers.svg"></a>
+    <a href="https://pepy.tech/project/diffusers"><img alt="GitHub release" src="https://static.pepy.tech/badge/diffusers/month"></a>
+    <a href="CODE_OF_CONDUCT.md"><img alt="Contributor Covenant" src="https://img.shields.io/badge/Contributor%20Covenant-2.1-4baaaa.svg"></a>
+    <a href="https://twitter.com/diffuserslib"><img alt="X account" src="https://img.shields.io/twitter/url/https/twitter.com/diffuserslib.svg?style=social&label=Follow%20%40diffuserslib"></a>
 </p>
 
 🤗 Diffusers is the go-to library for state-of-the-art pretrained diffusion models for generating images, audio, and even 3D structures of molecules. Whether you're looking for a simple inference solution or training your own diffusion models, 🤗 Diffusers is a modular toolbox that supports both. Our library is designed with a focus on [usability over performance](https://huggingface.co/docs/diffusers/conceptual/philosophy#usability-over-performance), [simple over easy](https://huggingface.co/docs/diffusers/conceptual/philosophy#simple-over-easy), and [customizability over abstractions](https://huggingface.co/docs/diffusers/conceptual/philosophy#tweakable-contributorfriendly-over-abstraction).
@@ -77,13 +67,13 @@ Please refer to the [How to use Stable Diffusion in Apple Silicon](https://huggi
 
 ## Quickstart
 
-Generating outputs is super easy with 🤗 Diffusers. To generate an image from text, use the `from_pretrained` method to load any pretrained diffusion model (browse the [Hub](https://huggingface.co/models?library=diffusers&sort=downloads) for 22000+ checkpoints):
+Generating outputs is super easy with 🤗 Diffusers. To generate an image from text, use the `from_pretrained` method to load any pretrained diffusion model (browse the [Hub](https://huggingface.co/models?library=diffusers&sort=downloads) for 30,000+ checkpoints):
 
 ```python
 from diffusers import DiffusionPipeline
 import torch
 
-pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
+pipeline = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16)
 pipeline.to("cuda")
 pipeline("An image of a squirrel in Picasso style").images[0]
 ```
@@ -124,7 +114,7 @@ Check out the [Quickstart](https://huggingface.co/docs/diffusers/quicktour) to l
 | [Tutorial](https://huggingface.co/docs/diffusers/tutorials/tutorial_overview) | A basic crash course for learning how to use the library's most important features like using models and schedulers to build your own diffusion system, and training your own diffusion model. |
 | [Loading](https://huggingface.co/docs/diffusers/using-diffusers/loading_overview) | Guides for how to load and configure all the components (pipelines, models, and schedulers) of the library, as well as how to use different schedulers. |
 | [Pipelines for inference](https://huggingface.co/docs/diffusers/using-diffusers/pipeline_overview) | Guides for how to use pipelines for different inference tasks, batched generation, controlling generated outputs and randomness, and how to contribute a pipeline to the library. |
-| [Optimization](https://huggingface.co/docs/diffusers/optimization/opt_overview) | Guides for how to optimize your diffusion model to run faster and consume less memory. |
+| [Optimization](https://huggingface.co/docs/diffusers/optimization/fp16) | Guides for how to optimize your diffusion model to run faster and consume less memory. |
 | [Training](https://huggingface.co/docs/diffusers/training/overview) | Guides for how to train a diffusion model for different tasks with different training techniques. |
 ## Contribution
 
@@ -154,7 +144,7 @@ Also, say 👋 in our public Discord channel <a href="https://discord.gg/G7tWnz9
   <tr style="border-top: 2px solid black">
     <td>Text-to-Image</td>
     <td><a href="https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/text2img">Stable Diffusion Text-to-Image</a></td>
-    <td><a href="https://huggingface.co/runwayml/stable-diffusion-v1-5"> runwayml/stable-diffusion-v1-5 </a></td>
+    <td><a href="https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5"> stable-diffusion-v1-5/stable-diffusion-v1-5 </a></td>
   </tr>
   <tr>
     <td>Text-to-Image</td>
@@ -184,7 +174,7 @@ Also, say 👋 in our public Discord channel <a href="https://discord.gg/G7tWnz9
   <tr>
     <td>Text-guided Image-to-Image</td>
     <td><a href="https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/img2img">Stable Diffusion Image-to-Image</a></td>
-    <td><a href="https://huggingface.co/runwayml/stable-diffusion-v1-5"> runwayml/stable-diffusion-v1-5 </a></td>
+    <td><a href="https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5"> stable-diffusion-v1-5/stable-diffusion-v1-5 </a></td>
   </tr>
   <tr style="border-top: 2px solid black">
     <td>Text-guided Image Inpainting</td>
@@ -212,6 +202,7 @@ Also, say 👋 in our public Discord channel <a href="https://discord.gg/G7tWnz9
 
 - https://github.com/microsoft/TaskMatrix
 - https://github.com/invoke-ai/InvokeAI
+- https://github.com/InstantID/InstantID
 - https://github.com/apple/ml-stable-diffusion
 - https://github.com/Sanster/lama-cleaner
 - https://github.com/IDEA-Research/Grounded-Segment-Anything
@@ -219,7 +210,7 @@ Also, say 👋 in our public Discord channel <a href="https://discord.gg/G7tWnz9
 - https://github.com/deep-floyd/IF
 - https://github.com/bentoml/BentoML
 - https://github.com/bmaltais/kohya_ss
-- +9000 other amazing GitHub repositories 💪
+- +14,000 other amazing GitHub repositories 💪
 
 Thank you for using us ❤️.
 
@@ -34,7 +34,7 @@ from utils import ( # noqa: E402
 
 
 RESOLUTION_MAPPING = {
-    "runwayml/stable-diffusion-v1-5": (512, 512),
+    "Lykon/DreamShaper": (512, 512),
     "lllyasviel/sd-controlnet-canny": (512, 512),
     "diffusers/controlnet-canny-sdxl-1.0": (1024, 1024),
     "TencentARC/t2iadapter_canny_sd14v1": (512, 512),
@@ -268,7 +268,7 @@ class IPAdapterTextToImageBenchmark(TextToImageBenchmark):
 class ControlNetBenchmark(TextToImageBenchmark):
     pipeline_class = StableDiffusionControlNetPipeline
     aux_network_class = ControlNetModel
-    root_ckpt = "runwayml/stable-diffusion-v1-5"
+    root_ckpt = "Lykon/DreamShaper"
 
     url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/benchmarking/canny_image_condition.png"
     image = load_image(url).convert("RGB")
@@ -311,7 +311,7 @@ class ControlNetSDXLBenchmark(ControlNetBenchmark):
 class T2IAdapterBenchmark(ControlNetBenchmark):
     pipeline_class = StableDiffusionAdapterPipeline
     aux_network_class = T2IAdapter
-    root_ckpt = "CompVis/stable-diffusion-v1-4"
+    root_ckpt = "Lykon/DreamShaper"
 
     url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/benchmarking/canny_for_adapter.png"
     image = load_image(url).convert("L")
@@ -7,7 +7,8 @@ from base_classes import IPAdapterTextToImageBenchmark # noqa: E402
 
 
 IP_ADAPTER_CKPTS = {
-    "runwayml/stable-diffusion-v1-5": ("h94/IP-Adapter", "ip-adapter_sd15.bin"),
+    # because original SD v1.5 has been taken down.
+    "Lykon/DreamShaper": ("h94/IP-Adapter", "ip-adapter_sd15.bin"),
     "stabilityai/stable-diffusion-xl-base-1.0": ("h94/IP-Adapter", "ip-adapter_sdxl.bin"),
 }
 
@@ -17,7 +18,7 @@ if __name__ == "__main__":
     parser.add_argument(
         "--ckpt",
         type=str,
-        default="runwayml/stable-diffusion-v1-5",
+        default="rstabilityai/stable-diffusion-xl-base-1.0",
         choices=list(IP_ADAPTER_CKPTS.keys()),
     )
     parser.add_argument("--batch_size", type=int, default=1)
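For context, the `(repository, weight name)` pairs in `IP_ADAPTER_CKPTS` are the arguments the benchmark ultimately passes to `load_ip_adapter`. A rough usage sketch, not part of the benchmark code (the prompt and reference image are placeholders; the image URL is reused from the benchmark assets above):

```python
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image

pipe = AutoPipelineForText2Image.from_pretrained("Lykon/DreamShaper", torch_dtype=torch.float16).to("cuda")
# Attach the SD 1.5 IP-Adapter weights referenced in IP_ADAPTER_CKPTS.
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.6)

reference = load_image("https://huggingface.co/datasets/diffusers/docs-images/resolve/main/benchmarking/canny_image_condition.png")
image = pipe("a scenic landscape", ip_adapter_image=reference, num_inference_steps=25).images[0]
```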
@@ -11,9 +11,9 @@ if __name__ == "__main__":
     parser.add_argument(
         "--ckpt",
         type=str,
-        default="runwayml/stable-diffusion-v1-5",
+        default="Lykon/DreamShaper",
         choices=[
-            "runwayml/stable-diffusion-v1-5",
+            "Lykon/DreamShaper",
             "stabilityai/stable-diffusion-2-1",
             "stabilityai/stable-diffusion-xl-refiner-1.0",
             "stabilityai/sdxl-turbo",
@@ -11,9 +11,9 @@ if __name__ == "__main__":
     parser.add_argument(
         "--ckpt",
         type=str,
-        default="runwayml/stable-diffusion-v1-5",
+        default="Lykon/DreamShaper",
         choices=[
-            "runwayml/stable-diffusion-v1-5",
+            "Lykon/DreamShaper",
             "stabilityai/stable-diffusion-2-1",
             "stabilityai/stable-diffusion-xl-base-1.0",
         ],
@@ -7,7 +7,7 @@ from base_classes import TextToImageBenchmark, TurboTextToImageBenchmark # noqa
 
 
 ALL_T2I_CKPTS = [
-    "runwayml/stable-diffusion-v1-5",
+    "Lykon/DreamShaper",
     "segmind/SSD-1B",
     "stabilityai/stable-diffusion-xl-base-1.0",
     "kandinsky-community/kandinsky-2-2-decoder",
@@ -21,7 +21,7 @@ if __name__ == "__main__":
     parser.add_argument(
         "--ckpt",
         type=str,
-        default="runwayml/stable-diffusion-v1-5",
+        default="Lykon/DreamShaper",
         choices=ALL_T2I_CKPTS,
     )
     parser.add_argument("--batch_size", type=int, default=1)
@@ -3,7 +3,7 @@ import sys
 
 import pandas as pd
 from huggingface_hub import hf_hub_download, upload_file
-from huggingface_hub.utils._errors import EntryNotFoundError
+from huggingface_hub.utils import EntryNotFoundError
 
 
 sys.path.append(".")
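Newer `huggingface_hub` releases re-export `EntryNotFoundError` from `huggingface_hub.utils` rather than the private `_errors` module, which is what the change above tracks. If a script had to support both old and new releases, a small compatibility shim could look like this (a sketch, not part of the diff):

```python
try:
    # Public location in current huggingface_hub releases.
    from huggingface_hub.utils import EntryNotFoundError
except ImportError:
    # Older releases only exposed the exception through the private module.
    from huggingface_hub.utils._errors import EntryNotFoundError
```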
@@ -40,7 +40,7 @@ def main():
         print(f"****** Running file: {file} ******")
 
         # Run with canonical settings.
-        if file != "benchmark_text_to_image.py":
+        if file != "benchmark_text_to_image.py" and file != "benchmark_ip_adapters.py":
             command = f"python {file}"
             run_command(command.split())
 
@@ -49,6 +49,10 @@ def main():
 
     # Run variants.
     for file in python_files:
+        # See: https://github.com/pytorch/pytorch/issues/129637
+        if file == "benchmark_ip_adapters.py":
+            continue
+
         if file == "benchmark_text_to_image.py":
             for ckpt in ALL_T2I_CKPTS:
                 command = f"python {file} --ckpt {ckpt}"
docker/diffusers-doc-builder/Dockerfile (new file, +52 lines)
@@ -0,0 +1,52 @@
+FROM ubuntu:20.04
+LABEL maintainer="Hugging Face"
+LABEL repository="diffusers"
+
+ENV DEBIAN_FRONTEND=noninteractive
+
+RUN apt-get -y update \
+  && apt-get install -y software-properties-common \
+  && add-apt-repository ppa:deadsnakes/ppa
+
+RUN apt install -y bash \
+  build-essential \
+  git \
+  git-lfs \
+  curl \
+  ca-certificates \
+  libsndfile1-dev \
+  python3.10 \
+  python3-pip \
+  libgl1 \
+  zip \
+  wget \
+  python3.10-venv && \
+  rm -rf /var/lib/apt/lists
+
+# make sure to use venv
+RUN python3.10 -m venv /opt/venv
+ENV PATH="/opt/venv/bin:$PATH"
+
+# pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
+RUN python3.10 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
+  python3.10 -m uv pip install --no-cache-dir \
+  torch \
+  torchvision \
+  torchaudio \
+  invisible_watermark \
+  --extra-index-url https://download.pytorch.org/whl/cpu && \
+  python3.10 -m uv pip install --no-cache-dir \
+  accelerate \
+  datasets \
+  hf-doc-builder \
+  huggingface-hub \
+  Jinja2 \
+  librosa \
+  numpy==1.26.4 \
+  scipy \
+  tensorboard \
+  transformers \
+  matplotlib \
+  setuptools==69.5.1
+
+CMD ["/bin/bash"]
@@ -4,22 +4,25 @@ LABEL repository="diffusers"
 
 ENV DEBIAN_FRONTEND=noninteractive
 
-RUN apt update && \
-  apt install -y bash \
-  build-essential \
-  git \
-  git-lfs \
-  curl \
-  ca-certificates \
-  libsndfile1-dev \
-  libgl1 \
-  python3.8 \
-  python3-pip \
-  python3.8-venv && \
+RUN apt-get -y update \
+  && apt-get install -y software-properties-common \
+  && add-apt-repository ppa:deadsnakes/ppa
+
+RUN apt install -y bash \
+  build-essential \
+  git \
+  git-lfs \
+  curl \
+  ca-certificates \
+  libsndfile1-dev \
+  libgl1 \
+  python3.10 \
+  python3-pip \
+  python3.10-venv && \
   rm -rf /var/lib/apt/lists
 
 # make sure to use venv
-RUN python3 -m venv /opt/venv
+RUN python3.10 -m venv /opt/venv
 ENV PATH="/opt/venv/bin:$PATH"
 
 # pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
@@ -37,9 +40,10 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
   huggingface-hub \
   Jinja2 \
   librosa \
-  numpy \
+  numpy==1.26.4 \
   scipy \
   tensorboard \
-  transformers
+  transformers \
+  hf_transfer
 
 CMD ["/bin/bash"]
@@ -4,8 +4,11 @@ LABEL repository="diffusers"
 
 ENV DEBIAN_FRONTEND=noninteractive
 
-RUN apt update && \
-  apt install -y bash \
+RUN apt-get -y update \
+  && apt-get install -y software-properties-common \
+  && add-apt-repository ppa:deadsnakes/ppa
+
+RUN apt install -y bash \
   build-essential \
   git \
   git-lfs \
@@ -13,13 +16,13 @@ RUN apt update && \
   ca-certificates \
   libsndfile1-dev \
   libgl1 \
-  python3.8 \
+  python3.10 \
   python3-pip \
-  python3.8-venv && \
+  python3.10-venv && \
   rm -rf /var/lib/apt/lists
 
 # make sure to use venv
-RUN python3 -m venv /opt/venv
+RUN python3.10 -m venv /opt/venv
 ENV PATH="/opt/venv/bin:$PATH"
 
 # pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
@@ -37,9 +40,10 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
   huggingface-hub \
   Jinja2 \
   librosa \
-  numpy \
+  numpy==1.26.4 \
   scipy \
   tensorboard \
-  transformers
+  transformers \
+  hf_transfer
 
 CMD ["/bin/bash"]
@@ -4,8 +4,11 @@ LABEL repository="diffusers"
 
 ENV DEBIAN_FRONTEND=noninteractive
 
-RUN apt update && \
-  apt install -y bash \
+RUN apt-get -y update \
+  && apt-get install -y software-properties-common \
+  && add-apt-repository ppa:deadsnakes/ppa
+
+RUN apt install -y bash \
   build-essential \
   git \
   git-lfs \
@@ -13,13 +16,13 @@ RUN apt update && \
   ca-certificates \
   libsndfile1-dev \
   libgl1 \
-  python3.8 \
+  python3.10 \
   python3-pip \
-  python3.8-venv && \
+  python3.10-venv && \
   rm -rf /var/lib/apt/lists
 
 # make sure to use venv
-RUN python3 -m venv /opt/venv
+RUN python3.10 -m venv /opt/venv
 ENV PATH="/opt/venv/bin:$PATH"
 
 # pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
@@ -39,9 +42,10 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
   huggingface-hub \
   Jinja2 \
   librosa \
-  numpy \
+  numpy==1.26.4 \
   scipy \
   tensorboard \
-  transformers
+  transformers \
+  hf_transfer
 
 CMD ["/bin/bash"]
@@ -4,8 +4,11 @@ LABEL repository="diffusers"
 
 ENV DEBIAN_FRONTEND=noninteractive
 
-RUN apt update && \
-  apt install -y bash \
+RUN apt-get -y update \
+  && apt-get install -y software-properties-common \
+  && add-apt-repository ppa:deadsnakes/ppa
+
+RUN apt install -y bash \
   build-essential \
   git \
   git-lfs \
@@ -13,33 +16,35 @@ RUN apt update && \
   ca-certificates \
   libsndfile1-dev \
   libgl1 \
-  python3.8 \
+  python3.10 \
   python3-pip \
-  python3.8-venv && \
+  python3.10-venv && \
   rm -rf /var/lib/apt/lists
 
 # make sure to use venv
-RUN python3 -m venv /opt/venv
+RUN python3.10 -m venv /opt/venv
 ENV PATH="/opt/venv/bin:$PATH"
 
 # pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
-RUN python3 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
-  python3 -m uv pip install --no-cache-dir \
+RUN python3.10 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
+  python3.10 -m uv pip install --no-cache-dir \
   torch \
   torchvision \
   torchaudio \
   "onnxruntime-gpu>=1.13.1" \
   --extra-index-url https://download.pytorch.org/whl/cu117 && \
-  python3 -m uv pip install --no-cache-dir \
+  python3.10 -m uv pip install --no-cache-dir \
   accelerate \
   datasets \
   hf-doc-builder \
   huggingface-hub \
+  hf_transfer \
   Jinja2 \
   librosa \
-  numpy \
+  numpy==1.26.4 \
   scipy \
   tensorboard \
-  transformers
+  transformers \
+  hf_transfer
 
 CMD ["/bin/bash"]
@@ -4,8 +4,11 @@ LABEL repository="diffusers"
 
 ENV DEBIAN_FRONTEND=noninteractive
 
-RUN apt update && \
-  apt install -y bash \
+RUN apt-get -y update \
+  && apt-get install -y software-properties-common \
+  && add-apt-repository ppa:deadsnakes/ppa
+
+RUN apt install -y bash \
   build-essential \
   git \
   git-lfs \
@@ -13,33 +16,35 @@ RUN apt update && \
   ca-certificates \
   libsndfile1-dev \
   libgl1 \
-  python3.9 \
-  python3.9-dev \
+  python3.10 \
+  python3.10-dev \
   python3-pip \
-  python3.9-venv && \
+  python3.10-venv && \
   rm -rf /var/lib/apt/lists
 
 # make sure to use venv
-RUN python3.9 -m venv /opt/venv
+RUN python3.10 -m venv /opt/venv
 ENV PATH="/opt/venv/bin:$PATH"
 
 # pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
-RUN python3.9 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
-  python3.9 -m uv pip install --no-cache-dir \
+RUN python3.10 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
+  python3.10 -m uv pip install --no-cache-dir \
  torch \
  torchvision \
  torchaudio \
  invisible_watermark && \
-  python3.9 -m pip install --no-cache-dir \
+  python3.10 -m pip install --no-cache-dir \
  accelerate \
  datasets \
  hf-doc-builder \
  huggingface-hub \
+  hf_transfer \
 Jinja2 \
 librosa \
-  numpy \
+  numpy==1.26.4 \
 scipy \
 tensorboard \
-  transformers
+  transformers \
+  hf_transfer
 
 CMD ["/bin/bash"]
@@ -4,42 +4,47 @@ LABEL repository="diffusers"
 
 ENV DEBIAN_FRONTEND=noninteractive
 
-RUN apt update && \
-  apt install -y bash \
+RUN apt-get -y update \
+  && apt-get install -y software-properties-common \
+  && add-apt-repository ppa:deadsnakes/ppa
+
+RUN apt install -y bash \
   build-essential \
   git \
   git-lfs \
   curl \
   ca-certificates \
   libsndfile1-dev \
-  python3.8 \
+  python3.10 \
+  python3.10-dev \
   python3-pip \
   libgl1 \
-  python3.8-venv && \
+  python3.10-venv && \
   rm -rf /var/lib/apt/lists
 
 # make sure to use venv
-RUN python3 -m venv /opt/venv
+RUN python3.10 -m venv /opt/venv
 ENV PATH="/opt/venv/bin:$PATH"
 
 # pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
-RUN python3 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
-  python3 -m uv pip install --no-cache-dir \
+RUN python3.10 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
+  python3.10 -m uv pip install --no-cache-dir \
   torch \
   torchvision \
   torchaudio \
   invisible_watermark \
   --extra-index-url https://download.pytorch.org/whl/cpu && \
-  python3 -m uv pip install --no-cache-dir \
+  python3.10 -m uv pip install --no-cache-dir \
   accelerate \
   datasets \
   hf-doc-builder \
   huggingface-hub \
   Jinja2 \
   librosa \
-  numpy \
+  numpy==1.26.4 \
   scipy \
   tensorboard \
-  transformers matplotlib
+  transformers matplotlib \
+  hf_transfer
 
 CMD ["/bin/bash"]
|
|||||||
@@ -4,8 +4,11 @@ LABEL repository="diffusers"

  ENV DEBIAN_FRONTEND=noninteractive

- RUN apt update && \
- apt install -y bash \
+ RUN apt-get -y update \
+ && apt-get install -y software-properties-common \
+ && add-apt-repository ppa:deadsnakes/ppa
+
+ RUN apt install -y bash \
  build-essential \
  git \
  git-lfs \
@@ -13,33 +16,36 @@ RUN apt update && \
  ca-certificates \
  libsndfile1-dev \
  libgl1 \
- python3.8 \
+ python3.10 \
+ python3.10-dev \
  python3-pip \
- python3.8-venv && \
+ python3.10-venv && \
  rm -rf /var/lib/apt/lists

  # make sure to use venv
- RUN python3 -m venv /opt/venv
+ RUN python3.10 -m venv /opt/venv
  ENV PATH="/opt/venv/bin:$PATH"

  # pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
- RUN python3 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
- python3 -m uv pip install --no-cache-dir \
+ RUN python3.10 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
+ python3.10 -m uv pip install --no-cache-dir \
  torch \
  torchvision \
  torchaudio \
  invisible_watermark && \
- python3 -m pip install --no-cache-dir \
+ python3.10 -m pip install --no-cache-dir \
  accelerate \
  datasets \
  hf-doc-builder \
  huggingface-hub \
+ hf_transfer \
  Jinja2 \
  librosa \
- numpy \
+ numpy==1.26.4 \
  scipy \
  tensorboard \
  transformers \
- pytorch-lightning
+ pytorch-lightning \
+ hf_transfer

  CMD ["/bin/bash"]
@@ -4,8 +4,11 @@ LABEL repository="diffusers"

  ENV DEBIAN_FRONTEND=noninteractive

- RUN apt update && \
- apt install -y bash \
+ RUN apt-get -y update \
+ && apt-get install -y software-properties-common \
+ && add-apt-repository ppa:deadsnakes/ppa
+
+ RUN apt install -y bash \
  build-essential \
  git \
  git-lfs \
@@ -13,33 +16,36 @@ RUN apt update && \
  ca-certificates \
  libsndfile1-dev \
  libgl1 \
- python3.8 \
+ python3.10 \
+ python3.10-dev \
  python3-pip \
- python3.8-venv && \
+ python3.10-venv && \
  rm -rf /var/lib/apt/lists

  # make sure to use venv
- RUN python3 -m venv /opt/venv
+ RUN python3.10 -m venv /opt/venv
  ENV PATH="/opt/venv/bin:$PATH"

  # pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
- RUN python3 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
- python3 -m pip install --no-cache-dir \
+ RUN python3.10 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
+ python3.10 -m pip install --no-cache-dir \
  torch \
  torchvision \
  torchaudio \
  invisible_watermark && \
- python3 -m uv pip install --no-cache-dir \
+ python3.10 -m uv pip install --no-cache-dir \
  accelerate \
  datasets \
  hf-doc-builder \
  huggingface-hub \
+ hf_transfer \
  Jinja2 \
  librosa \
- numpy \
+ numpy==1.26.4 \
  scipy \
  tensorboard \
  transformers \
- xformers
+ xformers \
+ hf_transfer

  CMD ["/bin/bash"]
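The image above ships xformers alongside PyTorch. A hedged sketch of how that dependency is typically exercised at runtime (the model id is only an example):

```py
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Switch attention to the memory-efficient xformers kernels installed in the image.
pipe.enable_xformers_memory_efficient_attention()
image = pipe("an astronaut riding a horse").images[0]
```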
@@ -242,10 +242,10 @@ Here's an example of a tuple return, comprising several objects:

  ```
  Returns:
- `tuple(torch.FloatTensor)` comprising various elements depending on the configuration ([`BertConfig`]) and inputs:
- - ** loss** (*optional*, returned when `masked_lm_labels` is provided) `torch.FloatTensor` of shape `(1,)` --
+ `tuple(torch.Tensor)` comprising various elements depending on the configuration ([`BertConfig`]) and inputs:
+ - ** loss** (*optional*, returned when `masked_lm_labels` is provided) `torch.Tensor` of shape `(1,)` --
  Total loss is the sum of the masked language modeling loss and the next sequence prediction (classification) loss.
- - **prediction_scores** (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.vocab_size)`) --
+ - **prediction_scores** (`torch.Tensor` of shape `(batch_size, sequence_length, config.vocab_size)`) --
  Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
  ```

@@ -21,160 +21,160 @@
  title: Load LoRAs for inference
  - local: tutorials/fast_diffusion
  title: Accelerate inference of text-to-image diffusion models
+ - local: tutorials/inference_with_big_models
+ title: Working with big models
  title: Tutorials
  - sections:
- - sections:
- - local: using-diffusers/loading_overview
- title: Overview
- - local: using-diffusers/loading
- title: Load pipelines, models, and schedulers
- - local: using-diffusers/schedulers
- title: Load and compare different schedulers
- - local: using-diffusers/custom_pipeline_overview
- title: Load community pipelines and components
- - local: using-diffusers/using_safetensors
- title: Load safetensors
- - local: using-diffusers/other-formats
- title: Load different Stable Diffusion formats
- - local: using-diffusers/loading_adapters
- title: Load adapters
- - local: using-diffusers/push_to_hub
- title: Push files to the Hub
- title: Loading & Hub
- - sections:
- - local: using-diffusers/pipeline_overview
- title: Overview
- - local: using-diffusers/unconditional_image_generation
- title: Unconditional image generation
- - local: using-diffusers/conditional_image_generation
- title: Text-to-image
- - local: using-diffusers/img2img
- title: Image-to-image
- - local: using-diffusers/inpaint
- title: Inpainting
- - local: using-diffusers/text-img2vid
- title: Text or image-to-video
- - local: using-diffusers/depth2img
- title: Depth-to-image
- title: Tasks
- - sections:
- - local: using-diffusers/textual_inversion_inference
- title: Textual inversion
- - local: using-diffusers/ip_adapter
- title: IP-Adapter
- - local: using-diffusers/merge_loras
- title: Merge LoRAs
- - local: training/distributed_inference
- title: Distributed inference with multiple GPUs
- - local: using-diffusers/reusing_seeds
- title: Improve image quality with deterministic generation
- - local: using-diffusers/control_brightness
- title: Control image brightness
- - local: using-diffusers/weighted_prompts
- title: Prompt techniques
- - local: using-diffusers/freeu
- title: Improve generation quality with FreeU
- title: Techniques
- - sections:
- - local: using-diffusers/pipeline_overview
- title: Overview
- - local: using-diffusers/sdxl
- title: Stable Diffusion XL
- - local: using-diffusers/sdxl_turbo
- title: SDXL Turbo
- - local: using-diffusers/kandinsky
- title: Kandinsky
- - local: using-diffusers/controlnet
- title: ControlNet
- - local: using-diffusers/t2i_adapter
- title: T2I-Adapter
- - local: using-diffusers/shap-e
- title: Shap-E
- - local: using-diffusers/diffedit
- title: DiffEdit
- - local: using-diffusers/distilled_sd
- title: Distilled Stable Diffusion inference
- - local: using-diffusers/callback
- title: Pipeline callbacks
- - local: using-diffusers/reproducibility
- title: Create reproducible pipelines
- - local: using-diffusers/custom_pipeline_examples
- title: Community pipelines
- - local: using-diffusers/contribute_pipeline
- title: Contribute a community pipeline
- - local: using-diffusers/inference_with_lcm_lora
- title: Latent Consistency Model-LoRA
- - local: using-diffusers/inference_with_lcm
- title: Latent Consistency Model
- - local: using-diffusers/inference_with_tcd_lora
- title: Trajectory Consistency Distillation-LoRA
- - local: using-diffusers/svd
- title: Stable Video Diffusion
- title: Specific pipeline examples
- - sections:
- - local: training/overview
- title: Overview
- - local: training/create_dataset
- title: Create a dataset for training
- - local: training/adapt_a_model
- title: Adapt a model to a new task
- - sections:
- - local: training/unconditional_training
- title: Unconditional image generation
- - local: training/text2image
- title: Text-to-image
- - local: training/sdxl
- title: Stable Diffusion XL
- - local: training/kandinsky
- title: Kandinsky 2.2
- - local: training/wuerstchen
- title: Wuerstchen
- - local: training/controlnet
- title: ControlNet
- - local: training/t2i_adapters
- title: T2I-Adapters
- - local: training/instructpix2pix
- title: InstructPix2Pix
- title: Models
- - sections:
- - local: training/text_inversion
- title: Textual Inversion
- - local: training/dreambooth
- title: DreamBooth
- - local: training/lora
- title: LoRA
- - local: training/custom_diffusion
- title: Custom Diffusion
- - local: training/lcm_distill
- title: Latent Consistency Distillation
- - local: training/ddpo
- title: Reinforcement learning training with DDPO
- title: Methods
- title: Training
- - sections:
- - local: using-diffusers/other-modalities
- title: Other Modalities
- title: Taking Diffusers Beyond Images
- title: Using Diffusers
+ - local: using-diffusers/loading
+ title: Load pipelines
+ - local: using-diffusers/custom_pipeline_overview
+ title: Load community pipelines and components
+ - local: using-diffusers/schedulers
+ title: Load schedulers and models
+ - local: using-diffusers/other-formats
+ title: Model files and layouts
+ - local: using-diffusers/loading_adapters
+ title: Load adapters
+ - local: using-diffusers/push_to_hub
+ title: Push files to the Hub
+ title: Load pipelines and adapters
  - sections:
- - local: optimization/opt_overview
+ - local: using-diffusers/unconditional_image_generation
+ title: Unconditional image generation
+ - local: using-diffusers/conditional_image_generation
+ title: Text-to-image
+ - local: using-diffusers/img2img
+ title: Image-to-image
+ - local: using-diffusers/inpaint
+ title: Inpainting
+ - local: using-diffusers/text-img2vid
+ title: Text or image-to-video
+ - local: using-diffusers/depth2img
+ title: Depth-to-image
+ title: Generative tasks
+ - sections:
+ - local: using-diffusers/overview_techniques
  title: Overview
- - sections:
- - local: optimization/fp16
- title: Speed up inference
- - local: optimization/memory
- title: Reduce memory usage
- - local: optimization/torch2.0
- title: PyTorch 2.0
- - local: optimization/xformers
- title: xFormers
- - local: optimization/tome
- title: Token merging
- - local: optimization/deepcache
- title: DeepCache
- - local: optimization/tgate
- title: TGATE
- title: General optimizations
+ - local: using-diffusers/create_a_server
+ title: Create a server
+ - local: training/distributed_inference
+ title: Distributed inference
+ - local: using-diffusers/merge_loras
+ title: Merge LoRAs
+ - local: using-diffusers/scheduler_features
+ title: Scheduler features
+ - local: using-diffusers/callback
+ title: Pipeline callbacks
+ - local: using-diffusers/reusing_seeds
+ title: Reproducible pipelines
+ - local: using-diffusers/image_quality
+ title: Controlling image quality
+ - local: using-diffusers/weighted_prompts
+ title: Prompt techniques
+ title: Inference techniques
+ - sections:
+ - local: advanced_inference/outpaint
+ title: Outpainting
+ title: Advanced inference
+ - sections:
+ - local: using-diffusers/cogvideox
+ title: CogVideoX
+ - local: using-diffusers/sdxl
+ title: Stable Diffusion XL
+ - local: using-diffusers/sdxl_turbo
+ title: SDXL Turbo
+ - local: using-diffusers/kandinsky
+ title: Kandinsky
+ - local: using-diffusers/ip_adapter
+ title: IP-Adapter
+ - local: using-diffusers/pag
+ title: PAG
+ - local: using-diffusers/controlnet
+ title: ControlNet
+ - local: using-diffusers/t2i_adapter
+ title: T2I-Adapter
+ - local: using-diffusers/inference_with_lcm
+ title: Latent Consistency Model
+ - local: using-diffusers/textual_inversion_inference
+ title: Textual inversion
+ - local: using-diffusers/shap-e
+ title: Shap-E
+ - local: using-diffusers/diffedit
+ title: DiffEdit
+ - local: using-diffusers/inference_with_tcd_lora
+ title: Trajectory Consistency Distillation-LoRA
+ - local: using-diffusers/svd
+ title: Stable Video Diffusion
+ - local: using-diffusers/marigold_usage
+ title: Marigold Computer Vision
+ title: Specific pipeline examples
+ - sections:
+ - local: training/overview
+ title: Overview
+ - local: training/create_dataset
+ title: Create a dataset for training
+ - local: training/adapt_a_model
+ title: Adapt a model to a new task
+ - isExpanded: false
+ sections:
+ - local: training/unconditional_training
+ title: Unconditional image generation
+ - local: training/text2image
+ title: Text-to-image
+ - local: training/sdxl
+ title: Stable Diffusion XL
+ - local: training/kandinsky
+ title: Kandinsky 2.2
+ - local: training/wuerstchen
+ title: Wuerstchen
+ - local: training/controlnet
+ title: ControlNet
+ - local: training/t2i_adapters
+ title: T2I-Adapters
+ - local: training/instructpix2pix
+ title: InstructPix2Pix
+ - local: training/cogvideox
+ title: CogVideoX
+ title: Models
+ - isExpanded: false
+ sections:
+ - local: training/text_inversion
+ title: Textual Inversion
+ - local: training/dreambooth
+ title: DreamBooth
+ - local: training/lora
+ title: LoRA
+ - local: training/custom_diffusion
+ title: Custom Diffusion
+ - local: training/lcm_distill
+ title: Latent Consistency Distillation
+ - local: training/ddpo
+ title: Reinforcement learning training with DDPO
+ title: Methods
+ title: Training
+ - sections:
+ - local: quantization/overview
+ title: Getting Started
+ - local: quantization/bitsandbytes
+ title: bitsandbytes
+ title: Quantization Methods
+ - sections:
+ - local: optimization/fp16
+ title: Speed up inference
+ - local: optimization/memory
+ title: Reduce memory usage
+ - local: optimization/torch2.0
+ title: PyTorch 2.0
+ - local: optimization/xformers
+ title: xFormers
+ - local: optimization/tome
+ title: Token merging
+ - local: optimization/deepcache
+ title: DeepCache
+ - local: optimization/tgate
+ title: TGATE
+ - local: optimization/xdit
+ title: xDiT
  - sections:
  - local: using-diffusers/stable_diffusion_jax_how_to
  title: JAX/Flax
@@ -184,14 +184,16 @@
  title: OpenVINO
  - local: optimization/coreml
  title: Core ML
- title: Optimized model types
+ title: Optimized model formats
  - sections:
  - local: optimization/mps
  title: Metal Performance Shaders (MPS)
  - local: optimization/habana
  title: Habana Gaudi
+ - local: optimization/neuron
+ title: AWS Neuron
  title: Optimized hardware
- title: Optimization
+ title: Accelerate inference and reduce memory
  - sections:
  - local: conceptual/philosophy
  title: Philosophy
@@ -205,15 +207,23 @@
  title: Evaluating Diffusion Models
  title: Conceptual Guides
  - sections:
- - sections:
+ - local: community_projects
+ title: Projects built with Diffusers
+ title: Community Projects
+ - sections:
+ - isExpanded: false
+ sections:
  - local: api/configuration
  title: Configuration
  - local: api/logging
  title: Logging
  - local: api/outputs
  title: Outputs
+ - local: api/quantization
+ title: Quantization
  title: Main Classes
- - sections:
+ - isExpanded: false
+ sections:
  - local: api/loaders/ip_adapter
  title: IP-Adapter
  - local: api/loaders/lora
@@ -227,43 +237,99 @@
  - local: api/loaders/peft
  title: PEFT
  title: Loaders
- - sections:
+ - isExpanded: false
+ sections:
  - local: api/models/overview
  title: Overview
- - local: api/models/unet
- title: UNet1DModel
- - local: api/models/unet2d
- title: UNet2DModel
- - local: api/models/unet2d-cond
- title: UNet2DConditionModel
- - local: api/models/unet3d-cond
- title: UNet3DConditionModel
- - local: api/models/unet-motion
- title: UNetMotionModel
- - local: api/models/uvit2d
- title: UViT2DModel
- - local: api/models/vq
- title: VQModel
- - local: api/models/autoencoderkl
- title: AutoencoderKL
- - local: api/models/asymmetricautoencoderkl
- title: AsymmetricAutoencoderKL
- - local: api/models/autoencoder_tiny
- title: Tiny AutoEncoder
- - local: api/models/consistency_decoder_vae
- title: ConsistencyDecoderVAE
- - local: api/models/transformer2d
- title: Transformer2D
- - local: api/models/transformer_temporal
- title: Transformer Temporal
- - local: api/models/prior_transformer
- title: Prior Transformer
- - local: api/models/controlnet
- title: ControlNet
+ - sections:
+ - local: api/models/controlnet
+ title: ControlNetModel
+ - local: api/models/controlnet_flux
+ title: FluxControlNetModel
+ - local: api/models/controlnet_hunyuandit
+ title: HunyuanDiT2DControlNetModel
+ - local: api/models/controlnet_sd3
+ title: SD3ControlNetModel
+ - local: api/models/controlnet_sparsectrl
+ title: SparseControlNetModel
+ title: ControlNets
+ - sections:
+ - local: api/models/allegro_transformer3d
+ title: AllegroTransformer3DModel
+ - local: api/models/aura_flow_transformer2d
+ title: AuraFlowTransformer2DModel
+ - local: api/models/cogvideox_transformer3d
+ title: CogVideoXTransformer3DModel
+ - local: api/models/cogview3plus_transformer2d
+ title: CogView3PlusTransformer2DModel
+ - local: api/models/dit_transformer2d
+ title: DiTTransformer2DModel
+ - local: api/models/flux_transformer
+ title: FluxTransformer2DModel
+ - local: api/models/hunyuan_transformer2d
+ title: HunyuanDiT2DModel
+ - local: api/models/latte_transformer3d
+ title: LatteTransformer3DModel
+ - local: api/models/lumina_nextdit2d
+ title: LuminaNextDiT2DModel
+ - local: api/models/mochi_transformer3d
+ title: MochiTransformer3DModel
+ - local: api/models/pixart_transformer2d
+ title: PixArtTransformer2DModel
+ - local: api/models/prior_transformer
+ title: PriorTransformer
+ - local: api/models/sd3_transformer2d
+ title: SD3Transformer2DModel
+ - local: api/models/stable_audio_transformer
+ title: StableAudioDiTModel
+ - local: api/models/transformer2d
+ title: Transformer2DModel
+ - local: api/models/transformer_temporal
+ title: TransformerTemporalModel
+ title: Transformers
+ - sections:
+ - local: api/models/stable_cascade_unet
+ title: StableCascadeUNet
+ - local: api/models/unet
+ title: UNet1DModel
+ - local: api/models/unet2d
+ title: UNet2DModel
+ - local: api/models/unet2d-cond
+ title: UNet2DConditionModel
+ - local: api/models/unet3d-cond
+ title: UNet3DConditionModel
+ - local: api/models/unet-motion
+ title: UNetMotionModel
+ - local: api/models/uvit2d
+ title: UViT2DModel
+ title: UNets
+ - sections:
+ - local: api/models/autoencoderkl
+ title: AutoencoderKL
+ - local: api/models/autoencoderkl_allegro
+ title: AutoencoderKLAllegro
+ - local: api/models/autoencoderkl_cogvideox
+ title: AutoencoderKLCogVideoX
+ - local: api/models/autoencoderkl_mochi
+ title: AutoencoderKLMochi
+ - local: api/models/asymmetricautoencoderkl
+ title: AsymmetricAutoencoderKL
+ - local: api/models/consistency_decoder_vae
+ title: ConsistencyDecoderVAE
+ - local: api/models/autoencoder_oobleck
+ title: Oobleck AutoEncoder
+ - local: api/models/autoencoder_tiny
+ title: Tiny AutoEncoder
+ - local: api/models/vq
+ title: VQModel
+ title: VAEs
  title: Models
- - sections:
+ - isExpanded: false
+ sections:
  - local: api/pipelines/overview
  title: Overview
+ - local: api/pipelines/allegro
+ title: Allegro
  - local: api/pipelines/amused
  title: aMUSEd
  - local: api/pipelines/animatediff
@@ -274,14 +340,26 @@
  title: AudioLDM
  - local: api/pipelines/audioldm2
  title: AudioLDM 2
+ - local: api/pipelines/aura_flow
+ title: AuraFlow
  - local: api/pipelines/auto_pipeline
  title: AutoPipeline
  - local: api/pipelines/blip_diffusion
  title: BLIP-Diffusion
+ - local: api/pipelines/cogvideox
+ title: CogVideoX
+ - local: api/pipelines/cogview3
+ title: CogView3
  - local: api/pipelines/consistency_models
  title: Consistency Models
  - local: api/pipelines/controlnet
  title: ControlNet
+ - local: api/pipelines/controlnet_flux
+ title: ControlNet with Flux.1
+ - local: api/pipelines/controlnet_hunyuandit
+ title: ControlNet with Hunyuan-DiT
+ - local: api/pipelines/controlnet_sd3
+ title: ControlNet with Stable Diffusion 3
  - local: api/pipelines/controlnet_sdxl
  title: ControlNet with Stable Diffusion XL
  - local: api/pipelines/controlnetxs
@@ -300,6 +378,10 @@
  title: DiffEdit
  - local: api/pipelines/dit
  title: DiT
+ - local: api/pipelines/flux
+ title: Flux
+ - local: api/pipelines/hunyuandit
+ title: Hunyuan-DiT
  - local: api/pipelines/i2vgenxl
  title: I2VGen-XL
  - local: api/pipelines/pix2pix
@@ -310,28 +392,44 @@
  title: Kandinsky 2.2
  - local: api/pipelines/kandinsky3
  title: Kandinsky 3
+ - local: api/pipelines/kolors
+ title: Kolors
  - local: api/pipelines/latent_consistency_models
  title: Latent Consistency Models
  - local: api/pipelines/latent_diffusion
  title: Latent Diffusion
+ - local: api/pipelines/latte
+ title: Latte
  - local: api/pipelines/ledits_pp
  title: LEDITS++
+ - local: api/pipelines/lumina
+ title: Lumina-T2X
+ - local: api/pipelines/marigold
+ title: Marigold
+ - local: api/pipelines/mochi
+ title: Mochi
  - local: api/pipelines/panorama
  title: MultiDiffusion
  - local: api/pipelines/musicldm
  title: MusicLDM
+ - local: api/pipelines/pag
+ title: PAG
  - local: api/pipelines/paint_by_example
  title: Paint by Example
  - local: api/pipelines/pia
  title: Personalized Image Animator (PIA)
  - local: api/pipelines/pixart
  title: PixArt-α
+ - local: api/pipelines/pixart_sigma
+ title: PixArt-Σ
  - local: api/pipelines/self_attention_guidance
  title: Self-Attention Guidance
  - local: api/pipelines/semantic_stable_diffusion
  title: Semantic Guidance
  - local: api/pipelines/shap_e
  title: Shap-E
+ - local: api/pipelines/stable_audio
+ title: Stable Audio
  - local: api/pipelines/stable_cascade
  title: Stable Cascade
  - sections:
@@ -353,6 +451,8 @@
  title: Safe Stable Diffusion
  - local: api/pipelines/stable_diffusion/stable_diffusion_2
  title: Stable Diffusion 2
+ - local: api/pipelines/stable_diffusion/stable_diffusion_3
+ title: Stable Diffusion 3
  - local: api/pipelines/stable_diffusion/stable_diffusion_xl
  title: Stable Diffusion XL
  - local: api/pipelines/stable_diffusion/sdxl_turbo
@@ -385,13 +485,16 @@
  - local: api/pipelines/wuerstchen
  title: Wuerstchen
  title: Pipelines
- - sections:
+ - isExpanded: false
+ sections:
  - local: api/schedulers/overview
  title: Overview
  - local: api/schedulers/cm_stochastic_iterative
  title: CMStochasticIterativeScheduler
  - local: api/schedulers/consistency_decoder
  title: ConsistencyDecoderScheduler
+ - local: api/schedulers/cosine_dpm
+ title: CosineDPMSolverMultistepScheduler
  - local: api/schedulers/ddim_inverse
  title: DDIMInverseScheduler
  - local: api/schedulers/ddim
@@ -416,6 +519,10 @@
  title: EulerAncestralDiscreteScheduler
  - local: api/schedulers/euler
  title: EulerDiscreteScheduler
+ - local: api/schedulers/flow_match_euler_discrete
+ title: FlowMatchEulerDiscreteScheduler
+ - local: api/schedulers/flow_match_heun_discrete
+ title: FlowMatchHeunDiscreteScheduler
  - local: api/schedulers/heun
  title: HeunDiscreteScheduler
  - local: api/schedulers/ipndm
@@ -445,7 +552,8 @@
  - local: api/schedulers/vq_diffusion
  title: VQDiffusionScheduler
  title: Schedulers
- - sections:
+ - isExpanded: false
+ sections:
  - local: api/internal_classes_overview
  title: Overview
  - local: api/attnprocessor
@@ -458,5 +566,7 @@
  title: Utilities
  - local: api/image_processor
  title: VAE Image Processor
+ - local: api/video_processor
+ title: Video Processor
  title: Internal classes
  title: API
docs/source/en/advanced_inference/outpaint.md (new file):

<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Outpainting

Outpainting extends an image beyond its original boundaries, allowing you to add, replace, or modify visual elements in an image while preserving the original image. Like [inpainting](../using-diffusers/inpaint), you want to fill the white area (in this case, the area outside of the original image) with new visual elements while keeping the original image (represented by a mask of black pixels). There are a couple of ways to outpaint, such as with a [ControlNet](https://hf.co/blog/OzzyGT/outpainting-controlnet) or with [Differential Diffusion](https://hf.co/blog/OzzyGT/outpainting-differential-diffusion).

This guide will show you how to outpaint with an inpainting model, ControlNet, and a ZoeDepth estimator.

Before you begin, make sure you have the [controlnet_aux](https://github.com/huggingface/controlnet_aux) library installed so you can use the ZoeDepth estimator.

```py
!pip install -q controlnet_aux
```

## Image preparation

Start by picking an image to outpaint with and remove the background with a Space like [BRIA-RMBG-1.4](https://hf.co/spaces/briaai/BRIA-RMBG-1.4).

<iframe
	src="https://briaai-bria-rmbg-1-4.hf.space"
	frameborder="0"
	width="850"
	height="450"
></iframe>

For example, remove the background from this image of a pair of shoes.

<div class="flex flex-row gap-4">
  <div class="flex-1">
    <img class="rounded-xl" src="https://huggingface.co/datasets/stevhliu/testing-images/resolve/main/original-jordan.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">original image</figcaption>
  </div>
  <div class="flex-1">
    <img class="rounded-xl" src="https://huggingface.co/datasets/stevhliu/testing-images/resolve/main/no-background-jordan.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">background removed</figcaption>
  </div>
</div>

[Stable Diffusion XL (SDXL)](../using-diffusers/sdxl) models work best with 1024x1024 images, but you can resize the image to any size as long as your hardware has enough memory to support it. The transparent background in the image should also be replaced with a white background. Create a function (like the one below) that scales and pastes the image onto a white background.

```py
import random

import requests
import torch
from controlnet_aux import ZoeDetector
from PIL import Image, ImageOps

from diffusers import (
    AutoencoderKL,
    ControlNetModel,
    StableDiffusionXLControlNetPipeline,
    StableDiffusionXLInpaintPipeline,
)

def scale_and_paste(original_image):
    aspect_ratio = original_image.width / original_image.height

    if original_image.width > original_image.height:
        new_width = 1024
        new_height = round(new_width / aspect_ratio)
    else:
        new_height = 1024
        new_width = round(new_height * aspect_ratio)

    resized_original = original_image.resize((new_width, new_height), Image.LANCZOS)
    white_background = Image.new("RGBA", (1024, 1024), "white")
    x = (1024 - new_width) // 2
    y = (1024 - new_height) // 2
    white_background.paste(resized_original, (x, y), resized_original)

    return resized_original, white_background

original_image = Image.open(
    requests.get(
        "https://huggingface.co/datasets/stevhliu/testing-images/resolve/main/no-background-jordan.png",
        stream=True,
    ).raw
).convert("RGBA")
resized_img, white_bg_image = scale_and_paste(original_image)
```

To avoid adding unwanted extra details, use the ZoeDepth estimator to provide additional guidance during generation and to ensure the shoes remain consistent with the original image.

```py
zoe = ZoeDetector.from_pretrained("lllyasviel/Annotators")
image_zoe = zoe(white_bg_image, detect_resolution=512, image_resolution=1024)
image_zoe
```

<div class="flex justify-center">
  <img src="https://huggingface.co/datasets/stevhliu/testing-images/resolve/main/zoedepth-jordan.png"/>
</div>

## Outpaint

Once your image is ready, you can generate content in the white area around the shoes with [controlnet-inpaint-dreamer-sdxl](https://hf.co/destitech/controlnet-inpaint-dreamer-sdxl), a SDXL ControlNet trained for inpainting.

Load the inpainting ControlNet, ZoeDepth model, VAE and pass them to the [`StableDiffusionXLControlNetPipeline`]. Then you can create an optional `generate_image` function (for convenience) to outpaint an initial image.

```py
controlnets = [
    ControlNetModel.from_pretrained(
        "destitech/controlnet-inpaint-dreamer-sdxl", torch_dtype=torch.float16, variant="fp16"
    ),
    ControlNetModel.from_pretrained(
        "diffusers/controlnet-zoe-depth-sdxl-1.0", torch_dtype=torch.float16
    ),
]
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16).to("cuda")
pipeline = StableDiffusionXLControlNetPipeline.from_pretrained(
    "SG161222/RealVisXL_V4.0", torch_dtype=torch.float16, variant="fp16", controlnet=controlnets, vae=vae
).to("cuda")

def generate_image(prompt, negative_prompt, inpaint_image, zoe_image, seed: int = None):
    if seed is None:
        seed = random.randint(0, 2**32 - 1)

    generator = torch.Generator(device="cpu").manual_seed(seed)

    image = pipeline(
        prompt,
        negative_prompt=negative_prompt,
        image=[inpaint_image, zoe_image],
        guidance_scale=6.5,
        num_inference_steps=25,
        generator=generator,
        controlnet_conditioning_scale=[0.5, 0.8],
        control_guidance_end=[0.9, 0.6],
    ).images[0]

    return image

prompt = "nike air jordans on a basketball court"
negative_prompt = ""

temp_image = generate_image(prompt, negative_prompt, white_bg_image, image_zoe, 908097)
```

Paste the original image over the initial outpainted image. You'll improve the outpainted background in a later step.

```py
x = (1024 - resized_img.width) // 2
y = (1024 - resized_img.height) // 2
temp_image.paste(resized_img, (x, y), resized_img)
temp_image
```

<div class="flex justify-center">
  <img src="https://huggingface.co/datasets/stevhliu/testing-images/resolve/main/initial-outpaint.png"/>
</div>

> [!TIP]
> Now is a good time to free up some memory if you're running low!
>
> ```py
> pipeline=None
> torch.cuda.empty_cache()
> ```

Now that you have an initial outpainted image, load the [`StableDiffusionXLInpaintPipeline`] with the [RealVisXL](https://hf.co/SG161222/RealVisXL_V4.0) model to generate the final outpainted image with better quality.

```py
pipeline = StableDiffusionXLInpaintPipeline.from_pretrained(
    "OzzyGT/RealVisXL_V4.0_inpainting",
    torch_dtype=torch.float16,
    variant="fp16",
    vae=vae,
).to("cuda")
```

Prepare a mask for the final outpainted image. To create a more natural transition between the original image and the outpainted background, blur the mask to help it blend better.

```py
mask = Image.new("L", temp_image.size)
mask.paste(resized_img.split()[3], (x, y))
mask = ImageOps.invert(mask)
final_mask = mask.point(lambda p: p > 128 and 255)
mask_blurred = pipeline.mask_processor.blur(final_mask, blur_factor=20)
mask_blurred
```

<div class="flex justify-center">
  <img src="https://huggingface.co/datasets/stevhliu/testing-images/resolve/main/blurred-mask.png"/>
</div>

Create a better prompt and pass it to the `generate_outpaint` function to generate the final outpainted image. Again, paste the original image over the final outpainted background.

```py
def generate_outpaint(prompt, negative_prompt, image, mask, seed: int = None):
    if seed is None:
        seed = random.randint(0, 2**32 - 1)

    generator = torch.Generator(device="cpu").manual_seed(seed)

    image = pipeline(
        prompt,
        negative_prompt=negative_prompt,
        image=image,
        mask_image=mask,
        guidance_scale=10.0,
        strength=0.8,
        num_inference_steps=30,
        generator=generator,
    ).images[0]

    return image

prompt = "high quality photo of nike air jordans on a basketball court, highly detailed"
negative_prompt = ""

final_image = generate_outpaint(prompt, negative_prompt, temp_image, mask_blurred, 7688778)
x = (1024 - resized_img.width) // 2
y = (1024 - resized_img.height) // 2
final_image.paste(resized_img, (x, y), resized_img)
final_image
```

<div class="flex justify-center">
  <img src="https://huggingface.co/datasets/stevhliu/testing-images/resolve/main/final-outpaint.png"/>
</div>
@@ -41,12 +41,6 @@ An attention processor is a class for applying different types of attention mech
  ## FusedAttnProcessor2_0
  [[autodoc]] models.attention_processor.FusedAttnProcessor2_0

- ## LoRAAttnAddedKVProcessor
- [[autodoc]] models.attention_processor.LoRAAttnAddedKVProcessor
-
- ## LoRAXFormersAttnProcessor
- [[autodoc]] models.attention_processor.LoRAXFormersAttnProcessor
-
  ## SlicedAttnProcessor
  [[autodoc]] models.attention_processor.SlicedAttnProcessor

@@ -55,3 +49,6 @@ An attention processor is a class for applying different types of attention mech

  ## XFormersAttnProcessor
  [[autodoc]] models.attention_processor.XFormersAttnProcessor
+
+ ## AttnProcessorNPU
+ [[autodoc]] models.attention_processor.AttnProcessorNPU
@@ -25,3 +25,11 @@ All pipelines with [`VaeImageProcessor`] accept PIL Image, PyTorch tensor, or Nu
  The [`VaeImageProcessorLDM3D`] accepts RGB and depth inputs and returns RGB and depth outputs.

  [[autodoc]] image_processor.VaeImageProcessorLDM3D
+
+ ## PixArtImageProcessor
+
+ [[autodoc]] image_processor.PixArtImageProcessor
+
+ ## IPAdapterMaskProcessor
+
+ [[autodoc]] image_processor.IPAdapterMaskProcessor
@@ -12,10 +12,13 @@ specific language governing permissions and limitations under the License.

  # LoRA

- LoRA is a fast and lightweight training method that inserts and trains a significantly smaller number of parameters instead of all the model parameters. This produces a smaller file (~100 MBs) and makes it easier to quickly train a model to learn a new concept. LoRA weights are typically loaded into the UNet, text encoder or both. There are two classes for loading LoRA weights:
+ LoRA is a fast and lightweight training method that inserts and trains a significantly smaller number of parameters instead of all the model parameters. This produces a smaller file (~100 MBs) and makes it easier to quickly train a model to learn a new concept. LoRA weights are typically loaded into the denoiser, text encoder or both. The denoiser usually corresponds to a UNet ([`UNet2DConditionModel`], for example) or a Transformer ([`SD3Transformer2DModel`], for example). There are several classes for loading LoRA weights:

- - [`LoraLoaderMixin`] provides functions for loading and unloading, fusing and unfusing, enabling and disabling, and more functions for managing LoRA weights. This class can be used with any model.
- - [`StableDiffusionXLLoraLoaderMixin`] is a [Stable Diffusion (SDXL)](../../api/pipelines/stable_diffusion/stable_diffusion_xl) version of the [`LoraLoaderMixin`] class for loading and saving LoRA weights. It can only be used with the SDXL model.
+ - [`StableDiffusionLoraLoaderMixin`] provides functions for loading and unloading, fusing and unfusing, enabling and disabling, and more functions for managing LoRA weights. This class can be used with any model.
+ - [`StableDiffusionXLLoraLoaderMixin`] is a [Stable Diffusion (SDXL)](../../api/pipelines/stable_diffusion/stable_diffusion_xl) version of the [`StableDiffusionLoraLoaderMixin`] class for loading and saving LoRA weights. It can only be used with the SDXL model.
+ - [`SD3LoraLoaderMixin`] provides similar functions for [Stable Diffusion 3](https://huggingface.co/blog/sd3).
+ - [`AmusedLoraLoaderMixin`] is for the [`AmusedPipeline`].
+ - [`LoraBaseMixin`] provides a base class with several utility methods to fuse, unfuse, unload, LoRAs and more.

  <Tip>

@@ -23,10 +26,22 @@ To learn more about how to load LoRA weights, see the [LoRA](../../using-diffuse

  </Tip>

- ## LoraLoaderMixin
+ ## StableDiffusionLoraLoaderMixin

- [[autodoc]] loaders.lora.LoraLoaderMixin
+ [[autodoc]] loaders.lora_pipeline.StableDiffusionLoraLoaderMixin

  ## StableDiffusionXLLoraLoaderMixin

- [[autodoc]] loaders.lora.StableDiffusionXLLoraLoaderMixin
+ [[autodoc]] loaders.lora_pipeline.StableDiffusionXLLoraLoaderMixin
+
+ ## SD3LoraLoaderMixin
+
+ [[autodoc]] loaders.lora_pipeline.SD3LoraLoaderMixin
+
+ ## AmusedLoraLoaderMixin
+
+ [[autodoc]] loaders.lora_pipeline.AmusedLoraLoaderMixin
+
+ ## LoraBaseMixin
+
+ [[autodoc]] loaders.lora_base.LoraBaseMixin
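A hedged sketch of the loader API the page above documents: `load_lora_weights` on a pipeline dispatches the weights to the denoiser and, if present, the text encoder(s). The adapter repo and file name here are examples.

```py
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Example SDXL LoRA in safetensors format; any compatible adapter loads the same way.
pipe.load_lora_weights("ostris/super-cereal-sdxl-lora", weight_name="cereal_box_sdxl_v1.safetensors")
image = pipe("a box of cereal on a kitchen table").images[0]
```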
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.

  # PEFT

- Diffusers supports loading adapters such as [LoRA](../../using-diffusers/loading_adapters) with the [PEFT](https://huggingface.co/docs/peft/index) library with the [`~loaders.peft.PeftAdapterMixin`] class. This allows modeling classes in Diffusers like [`UNet2DConditionModel`] to load an adapter.
+ Diffusers supports loading adapters such as [LoRA](../../using-diffusers/loading_adapters) with the [PEFT](https://huggingface.co/docs/peft/index) library with the [`~loaders.peft.PeftAdapterMixin`] class. This allows modeling classes in Diffusers like [`UNet2DConditionModel`], [`SD3Transformer2DModel`] to operate with an adapter.

  <Tip>

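A minimal sketch of what `PeftAdapterMixin` enables, assuming the `peft` library is installed; the `LoraConfig` values and target modules below are illustrative, not prescribed by the diff.

```py
from diffusers import UNet2DConditionModel
from peft import LoraConfig

# Load only the denoiser from an SDXL checkpoint (subfolder layout of the Hub repo).
unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet"
)

# Attach a fresh LoRA adapter to the attention projections via PeftAdapterMixin.add_adapter.
lora_config = LoraConfig(
    r=4,
    lora_alpha=4,
    init_lora_weights="gaussian",
    target_modules=["to_k", "to_q", "to_v", "to_out.0"],
)
unet.add_adapter(lora_config)
```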
@@ -12,26 +12,51 @@ specific language governing permissions and limitations under the License.

  # Single files

- Diffusers supports loading pretrained pipeline (or model) weights stored in a single file, such as a `ckpt` or `safetensors` file. These single file types are typically produced from community trained models. There are three classes for loading single file weights:
+ The [`~loaders.FromSingleFileMixin.from_single_file`] method allows you to load:

- - [`FromSingleFileMixin`] supports loading pretrained pipeline weights stored in a single file, which can either be a `ckpt` or `safetensors` file.
- - [`FromOriginalVAEMixin`] supports loading a pretrained [`AutoencoderKL`] from pretrained ControlNet weights stored in a single file, which can either be a `ckpt` or `safetensors` file.
- - [`FromOriginalControlnetMixin`] supports loading pretrained ControlNet weights stored in a single file, which can either be a `ckpt` or `safetensors` file.
+ * a model stored in a single file, which is useful if you're working with models from the diffusion ecosystem, like Automatic1111, and commonly rely on a single-file layout to store and share models
+ * a model stored in their originally distributed layout, which is useful if you're working with models finetuned with other services, and want to load it directly into Diffusers model objects and pipelines

- <Tip>
+ > [!TIP]
+ > Read the [Model files and layouts](../../using-diffusers/other-formats) guide to learn more about the Diffusers-multifolder layout versus the single-file layout, and how to load models stored in these different layouts.

- To learn more about how to load single file weights, see the [Load different Stable Diffusion formats](../../using-diffusers/other-formats) loading guide.
+ ## Supported pipelines

- </Tip>
+ - [`StableDiffusionPipeline`]
+ - [`StableDiffusionImg2ImgPipeline`]
+ - [`StableDiffusionInpaintPipeline`]
+ - [`StableDiffusionControlNetPipeline`]
+ - [`StableDiffusionControlNetImg2ImgPipeline`]
+ - [`StableDiffusionControlNetInpaintPipeline`]
+ - [`StableDiffusionUpscalePipeline`]
+ - [`StableDiffusionXLPipeline`]
+ - [`StableDiffusionXLImg2ImgPipeline`]
+ - [`StableDiffusionXLInpaintPipeline`]
+ - [`StableDiffusionXLInstructPix2PixPipeline`]
+ - [`StableDiffusionXLControlNetPipeline`]
+ - [`StableDiffusionXLKDiffusionPipeline`]
+ - [`StableDiffusion3Pipeline`]
+ - [`LatentConsistencyModelPipeline`]
+ - [`LatentConsistencyModelImg2ImgPipeline`]
+ - [`StableDiffusionControlNetXSPipeline`]
+ - [`StableDiffusionXLControlNetXSPipeline`]
+ - [`LEditsPPPipelineStableDiffusion`]
+ - [`LEditsPPPipelineStableDiffusionXL`]
+ - [`PIAPipeline`]

+ ## Supported models
+
+ - [`UNet2DConditionModel`]
+ - [`StableCascadeUNet`]
+ - [`AutoencoderKL`]
+ - [`ControlNetModel`]
+ - [`SD3Transformer2DModel`]
+ - [`FluxTransformer2DModel`]
+
  ## FromSingleFileMixin

  [[autodoc]] loaders.single_file.FromSingleFileMixin

- ## FromOriginalVAEMixin
+ ## FromOriginalModelMixin

- [[autodoc]] loaders.autoencoder.FromOriginalVAEMixin
+ [[autodoc]] loaders.single_file_model.FromOriginalModelMixin

- ## FromOriginalControlnetMixin
-
- [[autodoc]] loaders.controlnet.FromOriginalControlNetMixin
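A hedged sketch of the single-file loading path described above; the checkpoint URL is just an example of a single-file `safetensors` checkpoint hosted on the Hub.

```py
import torch
from diffusers import StableDiffusionXLPipeline

# from_single_file accepts a local path or a Hub file URL to a .ckpt/.safetensors checkpoint.
pipe = StableDiffusionXLPipeline.from_single_file(
    "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0.safetensors",
    torch_dtype=torch.float16,
).to("cuda")
```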
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
|
|||||||
|
|
||||||
# UNet
|
# UNet
|
||||||
|
|
||||||
Some training methods - like LoRA and Custom Diffusion - typically target the UNet's attention layers, but these training methods can also target other non-attention layers. Instead of training all of a model's parameters, only a subset of the parameters are trained, which is faster and more efficient. This class is useful if you're *only* loading weights into a UNet. If you need to load weights into the text encoder or a text encoder and UNet, try using the [`~loaders.LoraLoaderMixin.load_lora_weights`] function instead.
|
Some training methods - like LoRA and Custom Diffusion - typically target the UNet's attention layers, but these training methods can also target other non-attention layers. Instead of training all of a model's parameters, only a subset of the parameters are trained, which is faster and more efficient. This class is useful if you're *only* loading weights into a UNet. If you need to load weights into the text encoder or a text encoder and UNet, try using the [`~loaders.StableDiffusionLoraLoaderMixin.load_lora_weights`] function instead.
|
||||||
|
|
||||||
The [`UNet2DConditionLoadersMixin`] class provides functions for loading and saving weights, fusing and unfusing LoRAs, disabling and enabling LoRAs, and setting and deleting adapters.
|
The [`UNet2DConditionLoadersMixin`] class provides functions for loading and saving weights, fusing and unfusing LoRAs, disabling and enabling LoRAs, and setting and deleting adapters.
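
A minimal sketch of these functions, assuming a LoRA checkpoint in the standard `safetensors` layout; the LoRA repository and weight name below are placeholders:

```py
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# load LoRA weights into the UNet only (placeholder repository and weight name)
pipe.unet.load_attn_procs("your-username/your-lora", weight_name="pytorch_lora_weights.safetensors")

# fuse the LoRA into the UNet weights for inference, then unfuse to restore the original weights
pipe.unet.fuse_lora()
pipe.unet.unfuse_lora()
```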
|
||||||
|
|
||||||
|
|||||||
30
docs/source/en/api/models/allegro_transformer3d.md
Normal file
30
docs/source/en/api/models/allegro_transformer3d.md
Normal file
@@ -0,0 +1,30 @@
|
|||||||
|
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
|
||||||
|
|
||||||
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||||
|
the License. You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||||
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||||
|
specific language governing permissions and limitations under the License. -->
|
||||||
|
|
||||||
|
# AllegroTransformer3DModel
|
||||||
|
|
||||||
|
A Diffusion Transformer model for 3D data from [Allegro](https://github.com/rhymes-ai/Allegro) was introduced in [Allegro: Open the Black Box of Commercial-Level Video Generation Model](https://huggingface.co/papers/2410.15458) by RhymesAI.
|
||||||
|
|
||||||
|
The model can be loaded with the following code snippet.
|
||||||
|
|
||||||
|
```python
|
||||||
|
import torch
from diffusers import AllegroTransformer3DModel

transformer = AllegroTransformer3DModel.from_pretrained("rhymes-ai/Allegro", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")
|
||||||
|
```
|
||||||
|
|
||||||
|
## AllegroTransformer3DModel
|
||||||
|
|
||||||
|
[[autodoc]] AllegroTransformer3DModel
|
||||||
|
|
||||||
|
## Transformer2DModelOutput
|
||||||
|
|
||||||
|
[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
|
||||||
19
docs/source/en/api/models/aura_flow_transformer2d.md
Normal file
19
docs/source/en/api/models/aura_flow_transformer2d.md
Normal file
@@ -0,0 +1,19 @@
|
|||||||
|
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
|
||||||
|
|
||||||
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||||
|
the License. You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||||
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||||
|
specific language governing permissions and limitations under the License.
|
||||||
|
-->
|
||||||
|
|
||||||
|
# AuraFlowTransformer2DModel
|
||||||
|
|
||||||
|
A Transformer model for image-like data from [AuraFlow](https://blog.fal.ai/auraflow/).
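
The model can be loaded with a snippet like the following; the `fal/AuraFlow` repository and `transformer` subfolder are assumptions rather than values given on this page.

```python
import torch
from diffusers import AuraFlowTransformer2DModel

# repository and subfolder are assumed; adjust to your checkpoint
transformer = AuraFlowTransformer2DModel.from_pretrained("fal/AuraFlow", subfolder="transformer", torch_dtype=torch.float16).to("cuda")
```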
|
||||||
|
|
||||||
|
## AuraFlowTransformer2DModel
|
||||||
|
|
||||||
|
[[autodoc]] AuraFlowTransformer2DModel
|
||||||
38
docs/source/en/api/models/autoencoder_oobleck.md
Normal file
38
docs/source/en/api/models/autoencoder_oobleck.md
Normal file
@@ -0,0 +1,38 @@
|
|||||||
|
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
|
||||||
|
|
||||||
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||||
|
the License. You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||||
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||||
|
specific language governing permissions and limitations under the License.
|
||||||
|
-->
|
||||||
|
|
||||||
|
# AutoencoderOobleck
|
||||||
|
|
||||||
|
The Oobleck variational autoencoder (VAE) model with KL loss was introduced in [Stability-AI/stable-audio-tools](https://github.com/Stability-AI/stable-audio-tools) and [Stable Audio Open](https://huggingface.co/papers/2407.14358) by Stability AI. The model is used in 🤗 Diffusers to encode audio waveforms into latents and to decode latent representations into audio waveforms.
|
||||||
|
|
||||||
|
The abstract from the paper is:
|
||||||
|
|
||||||
|
*Open generative models are vitally important for the community, allowing for fine-tunes and serving as baselines when presenting new models. However, most current text-to-audio models are private and not accessible for artists and researchers to build upon. Here we describe the architecture and training process of a new open-weights text-to-audio model trained with Creative Commons data. Our evaluation shows that the model's performance is competitive with the state-of-the-art across various metrics. Notably, the reported FDopenl3 results (measuring the realism of the generations) showcase its potential for high-quality stereo sound synthesis at 44.1kHz.*
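
A minimal loading sketch, assuming a Diffusers-format checkpoint at `stabilityai/stable-audio-open-1.0` with a `vae` subfolder:

```python
import torch
from diffusers import AutoencoderOobleck

# repository and subfolder are assumed; adjust to your checkpoint
vae = AutoencoderOobleck.from_pretrained("stabilityai/stable-audio-open-1.0", subfolder="vae", torch_dtype=torch.float16).to("cuda")
```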
|
||||||
|
|
||||||
|
## AutoencoderOobleck
|
||||||
|
|
||||||
|
[[autodoc]] AutoencoderOobleck
|
||||||
|
- decode
|
||||||
|
- encode
|
||||||
|
- all
|
||||||
|
|
||||||
|
## OobleckDecoderOutput
|
||||||
|
|
||||||
|
[[autodoc]] models.autoencoders.autoencoder_oobleck.OobleckDecoderOutput
|
|
||||||
|
|
||||||
|
## AutoencoderOobleckOutput
|
||||||
|
|
||||||
|
[[autodoc]] models.autoencoders.autoencoder_oobleck.AutoencoderOobleckOutput
|
||||||
@@ -21,7 +21,7 @@ The abstract from the paper is:
|
|||||||
## Loading from the original format
|
## Loading from the original format
|
||||||
|
|
||||||
By default the [`AutoencoderKL`] should be loaded with [`~ModelMixin.from_pretrained`], but it can also be loaded
|
By default the [`AutoencoderKL`] should be loaded with [`~ModelMixin.from_pretrained`], but it can also be loaded
|
||||||
from the original format using [`FromOriginalVAEMixin.from_single_file`] as follows:
|
from the original format using [`FromOriginalModelMixin.from_single_file`] as follows:
|
||||||
|
|
||||||
```py
|
```py
|
||||||
from diffusers import AutoencoderKL
|
from diffusers import AutoencoderKL
|
||||||
|
|||||||
37
docs/source/en/api/models/autoencoderkl_allegro.md
Normal file
37
docs/source/en/api/models/autoencoderkl_allegro.md
Normal file
@@ -0,0 +1,37 @@
|
|||||||
|
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
|
||||||
|
|
||||||
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||||
|
the License. You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||||
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||||
|
specific language governing permissions and limitations under the License. -->
|
||||||
|
|
||||||
|
# AutoencoderKLAllegro
|
||||||
|
|
||||||
|
The 3D variational autoencoder (VAE) model with KL loss used in [Allegro](https://github.com/rhymes-ai/Allegro) was introduced in [Allegro: Open the Black Box of Commercial-Level Video Generation Model](https://huggingface.co/papers/2410.15458) by RhymesAI.
|
||||||
|
|
||||||
|
The model can be loaded with the following code snippet.
|
||||||
|
|
||||||
|
```python
|
||||||
|
import torch
from diffusers import AutoencoderKLAllegro

vae = AutoencoderKLAllegro.from_pretrained("rhymes-ai/Allegro", subfolder="vae", torch_dtype=torch.float32).to("cuda")
|
||||||
|
```
|
||||||
|
|
||||||
|
## AutoencoderKLAllegro
|
||||||
|
|
||||||
|
[[autodoc]] AutoencoderKLAllegro
|
||||||
|
- decode
|
||||||
|
- encode
|
||||||
|
- all
|
||||||
|
|
||||||
|
## AutoencoderKLOutput
|
||||||
|
|
||||||
|
[[autodoc]] models.autoencoders.autoencoder_kl.AutoencoderKLOutput
|
||||||
|
|
||||||
|
## DecoderOutput
|
||||||
|
|
||||||
|
[[autodoc]] models.autoencoders.vae.DecoderOutput
|
||||||
37
docs/source/en/api/models/autoencoderkl_cogvideox.md
Normal file
37
docs/source/en/api/models/autoencoderkl_cogvideox.md
Normal file
@@ -0,0 +1,37 @@
|
|||||||
|
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
|
||||||
|
|
||||||
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||||
|
the License. You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||||
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||||
|
specific language governing permissions and limitations under the License. -->
|
||||||
|
|
||||||
|
# AutoencoderKLCogVideoX
|
||||||
|
|
||||||
|
The 3D variational autoencoder (VAE) model with KL loss used in [CogVideoX](https://github.com/THUDM/CogVideo) was introduced in [CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer](https://github.com/THUDM/CogVideo/blob/main/resources/CogVideoX.pdf) by Tsinghua University & ZhipuAI.
|
||||||
|
|
||||||
|
The model can be loaded with the following code snippet.
|
||||||
|
|
||||||
|
```python
|
||||||
|
import torch
from diffusers import AutoencoderKLCogVideoX

vae = AutoencoderKLCogVideoX.from_pretrained("THUDM/CogVideoX-2b", subfolder="vae", torch_dtype=torch.float16).to("cuda")
|
||||||
|
```
|
||||||
|
|
||||||
|
## AutoencoderKLCogVideoX
|
||||||
|
|
||||||
|
[[autodoc]] AutoencoderKLCogVideoX
|
||||||
|
- decode
|
||||||
|
- encode
|
||||||
|
- all
|
||||||
|
|
||||||
|
## AutoencoderKLOutput
|
||||||
|
|
||||||
|
[[autodoc]] models.autoencoders.autoencoder_kl.AutoencoderKLOutput
|
||||||
|
|
||||||
|
## DecoderOutput
|
||||||
|
|
||||||
|
[[autodoc]] models.autoencoders.vae.DecoderOutput
|
||||||
32
docs/source/en/api/models/autoencoderkl_mochi.md
Normal file
32
docs/source/en/api/models/autoencoderkl_mochi.md
Normal file
@@ -0,0 +1,32 @@
|
|||||||
|
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
|
||||||
|
|
||||||
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||||
|
the License. You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||||
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||||
|
specific language governing permissions and limitations under the License. -->
|
||||||
|
|
||||||
|
# AutoencoderKLMochi
|
||||||
|
|
||||||
|
The 3D variational autoencoder (VAE) model with KL loss used in [Mochi](https://github.com/genmoai/models) was introduced in [Mochi 1 Preview](https://huggingface.co/genmo/mochi-1-preview) by Genmo.
|
||||||
|
|
||||||
|
The model can be loaded with the following code snippet.
|
||||||
|
|
||||||
|
```python
|
||||||
|
import torch
from diffusers import AutoencoderKLMochi

vae = AutoencoderKLMochi.from_pretrained("genmo/mochi-1-preview", subfolder="vae", torch_dtype=torch.float32).to("cuda")
|
||||||
|
```
|
||||||
|
|
||||||
|
## AutoencoderKLMochi
|
||||||
|
|
||||||
|
[[autodoc]] AutoencoderKLMochi
|
||||||
|
- decode
|
||||||
|
- all
|
||||||
|
|
||||||
|
## DecoderOutput
|
||||||
|
|
||||||
|
[[autodoc]] models.autoencoders.vae.DecoderOutput
|
||||||
30
docs/source/en/api/models/cogvideox_transformer3d.md
Normal file
30
docs/source/en/api/models/cogvideox_transformer3d.md
Normal file
@@ -0,0 +1,30 @@
|
|||||||
|
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
|
||||||
|
|
||||||
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||||
|
the License. You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||||
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||||
|
specific language governing permissions and limitations under the License. -->
|
||||||
|
|
||||||
|
# CogVideoXTransformer3DModel
|
||||||
|
|
||||||
|
A Diffusion Transformer model for 3D data from [CogVideoX](https://github.com/THUDM/CogVideo) was introduced in [CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer](https://github.com/THUDM/CogVideo/blob/main/resources/CogVideoX.pdf) by Tsinghua University & ZhipuAI.
|
||||||
|
|
||||||
|
The model can be loaded with the following code snippet.
|
||||||
|
|
||||||
|
```python
|
||||||
|
import torch
from diffusers import CogVideoXTransformer3DModel

transformer = CogVideoXTransformer3DModel.from_pretrained("THUDM/CogVideoX-2b", subfolder="transformer", torch_dtype=torch.float16).to("cuda")
|
||||||
|
```
|
||||||
|
|
||||||
|
## CogVideoXTransformer3DModel
|
||||||
|
|
||||||
|
[[autodoc]] CogVideoXTransformer3DModel
|
||||||
|
|
||||||
|
## Transformer2DModelOutput
|
||||||
|
|
||||||
|
[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
|
||||||
30
docs/source/en/api/models/cogview3plus_transformer2d.md
Normal file
30
docs/source/en/api/models/cogview3plus_transformer2d.md
Normal file
@@ -0,0 +1,30 @@
|
|||||||
|
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
|
||||||
|
|
||||||
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||||
|
the License. You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||||
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||||
|
specific language governing permissions and limitations under the License. -->
|
||||||
|
|
||||||
|
# CogView3PlusTransformer2DModel
|
||||||
|
|
||||||
|
A Diffusion Transformer model for 2D data from [CogView3Plus](https://github.com/THUDM/CogView3) was introduced in [CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion](https://huggingface.co/papers/2403.05121) by Tsinghua University & ZhipuAI.
|
||||||
|
|
||||||
|
The model can be loaded with the following code snippet.
|
||||||
|
|
||||||
|
```python
|
||||||
|
import torch
from diffusers import CogView3PlusTransformer2DModel

transformer = CogView3PlusTransformer2DModel.from_pretrained("THUDM/CogView3Plus-3b", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")
|
||||||
|
```
|
||||||
|
|
||||||
|
## CogView3PlusTransformer2DModel
|
||||||
|
|
||||||
|
[[autodoc]] CogView3PlusTransformer2DModel
|
||||||
|
|
||||||
|
## Transformer2DModelOutput
|
||||||
|
|
||||||
|
[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
|
||||||
@@ -10,7 +10,7 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
|
|||||||
specific language governing permissions and limitations under the License.
|
specific language governing permissions and limitations under the License.
|
||||||
-->
|
-->
|
||||||
|
|
||||||
# ControlNet
|
# ControlNetModel
|
||||||
|
|
||||||
The ControlNet model was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, Maneesh Agrawala. It provides a greater degree of control over text-to-image generation by conditioning the model on additional inputs such as edge maps, depth maps, segmentation maps, and keypoints for pose detection.
|
The ControlNet model was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, Maneesh Agrawala. It provides a greater degree of control over text-to-image generation by conditioning the model on additional inputs such as edge maps, depth maps, segmentation maps, and keypoints for pose detection.
|
||||||
|
|
||||||
@@ -21,7 +21,7 @@ The abstract from the paper is:
|
|||||||
## Loading from the original format
|
## Loading from the original format
|
||||||
|
|
||||||
By default the [`ControlNetModel`] should be loaded with [`~ModelMixin.from_pretrained`], but it can also be loaded
|
By default the [`ControlNetModel`] should be loaded with [`~ModelMixin.from_pretrained`], but it can also be loaded
|
||||||
from the original format using [`FromOriginalControlnetMixin.from_single_file`] as follows:
|
from the original format using [`FromOriginalModelMixin.from_single_file`] as follows:
|
||||||
|
|
||||||
```py
|
```py
|
||||||
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
|
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
|
||||||
@@ -29,7 +29,7 @@ from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
|
|||||||
url = "https://huggingface.co/lllyasviel/ControlNet-v1-1/blob/main/control_v11p_sd15_canny.pth" # can also be a local path
|
url = "https://huggingface.co/lllyasviel/ControlNet-v1-1/blob/main/control_v11p_sd15_canny.pth" # can also be a local path
|
||||||
controlnet = ControlNetModel.from_single_file(url)
|
controlnet = ControlNetModel.from_single_file(url)
|
||||||
|
|
||||||
url = "https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/v1-5-pruned.safetensors" # can also be a local path
|
url = "https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5/blob/main/v1-5-pruned.safetensors" # can also be a local path
|
||||||
pipe = StableDiffusionControlNetPipeline.from_single_file(url, controlnet=controlnet)
|
pipe = StableDiffusionControlNetPipeline.from_single_file(url, controlnet=controlnet)
|
||||||
```
|
```
|
||||||
|
|
||||||
@@ -39,7 +39,7 @@ pipe = StableDiffusionControlNetPipeline.from_single_file(url, controlnet=contro
|
|||||||
|
|
||||||
## ControlNetOutput
|
## ControlNetOutput
|
||||||
|
|
||||||
[[autodoc]] models.controlnet.ControlNetOutput
|
[[autodoc]] models.controlnets.controlnet.ControlNetOutput
|
||||||
|
|
||||||
## FlaxControlNetModel
|
## FlaxControlNetModel
|
||||||
|
|
||||||
@@ -47,4 +47,4 @@ pipe = StableDiffusionControlNetPipeline.from_single_file(url, controlnet=contro
|
|||||||
|
|
||||||
## FlaxControlNetOutput
|
## FlaxControlNetOutput
|
||||||
|
|
||||||
[[autodoc]] models.controlnet_flax.FlaxControlNetOutput
|
[[autodoc]] models.controlnets.controlnet_flax.FlaxControlNetOutput
|
||||||
|
|||||||
45
docs/source/en/api/models/controlnet_flux.md
Normal file
45
docs/source/en/api/models/controlnet_flux.md
Normal file
@@ -0,0 +1,45 @@
|
|||||||
|
<!--Copyright 2024 The HuggingFace Team and The InstantX Team. All rights reserved.
|
||||||
|
|
||||||
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||||
|
the License. You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||||
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||||
|
specific language governing permissions and limitations under the License.
|
||||||
|
-->
|
||||||
|
|
||||||
|
# FluxControlNetModel
|
||||||
|
|
||||||
|
FluxControlNetModel is an implementation of ControlNet for Flux.1.
|
||||||
|
|
||||||
|
The ControlNet model was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, Maneesh Agrawala. It provides a greater degree of control over text-to-image generation by conditioning the model on additional inputs such as edge maps, depth maps, segmentation maps, and keypoints for pose detection.
|
||||||
|
|
||||||
|
The abstract from the paper is:
|
||||||
|
|
||||||
|
*We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
|
||||||
|
|
||||||
|
## Loading from the original format
|
||||||
|
|
||||||
|
By default the [`FluxControlNetModel`] should be loaded with [`~ModelMixin.from_pretrained`].
|
||||||
|
|
||||||
|
```py
|
||||||
|
from diffusers import FluxControlNetPipeline
|
||||||
|
from diffusers.models import FluxControlNetModel, FluxMultiControlNetModel
|
||||||
|
|
||||||
|
# load a single ControlNet
controlnet = FluxControlNetModel.from_pretrained("InstantX/FLUX.1-dev-Controlnet-Canny")
pipe = FluxControlNetPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", controlnet=controlnet)

# or wrap one or more ControlNets in a FluxMultiControlNetModel
controlnet = FluxControlNetModel.from_pretrained("InstantX/FLUX.1-dev-Controlnet-Canny")
controlnet = FluxMultiControlNetModel([controlnet])
pipe = FluxControlNetPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", controlnet=controlnet)
|
||||||
|
```
|
||||||
|
|
||||||
|
## FluxControlNetModel
|
||||||
|
|
||||||
|
[[autodoc]] FluxControlNetModel
|
||||||
|
|
||||||
|
## FluxControlNetOutput
|
||||||
|
|
||||||
|
[[autodoc]] models.controlnet_flux.FluxControlNetOutput
|
||||||
37
docs/source/en/api/models/controlnet_hunyuandit.md
Normal file
37
docs/source/en/api/models/controlnet_hunyuandit.md
Normal file
@@ -0,0 +1,37 @@
|
|||||||
|
<!--Copyright 2024 The HuggingFace Team and Tencent Hunyuan Team. All rights reserved.
|
||||||
|
|
||||||
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||||
|
the License. You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||||
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||||
|
specific language governing permissions and limitations under the License.
|
||||||
|
-->
|
||||||
|
|
||||||
|
# HunyuanDiT2DControlNetModel
|
||||||
|
|
||||||
|
HunyuanDiT2DControlNetModel is an implementation of ControlNet for [Hunyuan-DiT](https://arxiv.org/abs/2405.08748).
|
||||||
|
|
||||||
|
ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala.
|
||||||
|
|
||||||
|
With a ControlNet model, you can provide an additional control image to condition and control Hunyuan-DiT generation. For example, if you provide a depth map, the ControlNet model generates an image that'll preserve the spatial information from the depth map. It is a more flexible and accurate way to control the image generation process.
|
||||||
|
|
||||||
|
The abstract from the paper is:
|
||||||
|
|
||||||
|
*We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
|
||||||
|
|
||||||
|
This code is implemented by Tencent Hunyuan Team. You can find pre-trained checkpoints for Hunyuan-DiT ControlNets on [Tencent Hunyuan](https://huggingface.co/Tencent-Hunyuan).
|
||||||
|
|
||||||
|
## Example for loading HunyuanDiT2DControlNetModel
|
||||||
|
|
||||||
|
```py
|
||||||
|
from diffusers import HunyuanDiT2DControlNetModel
|
||||||
|
import torch
|
||||||
|
controlnet = HunyuanDiT2DControlNetModel.from_pretrained("Tencent-Hunyuan/HunyuanDiT-v1.1-ControlNet-Diffusers-Pose", torch_dtype=torch.float16)
|
||||||
|
```
|
||||||
|
|
||||||
|
## HunyuanDiT2DControlNetModel
|
||||||
|
|
||||||
|
[[autodoc]] HunyuanDiT2DControlNetModel
|
||||||
42
docs/source/en/api/models/controlnet_sd3.md
Normal file
42
docs/source/en/api/models/controlnet_sd3.md
Normal file
@@ -0,0 +1,42 @@
|
|||||||
|
<!--Copyright 2024 The HuggingFace Team and The InstantX Team. All rights reserved.
|
||||||
|
|
||||||
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||||
|
the License. You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||||
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||||
|
specific language governing permissions and limitations under the License.
|
||||||
|
-->
|
||||||
|
|
||||||
|
# SD3ControlNetModel
|
||||||
|
|
||||||
|
SD3ControlNetModel is an implementation of ControlNet for Stable Diffusion 3.
|
||||||
|
|
||||||
|
The ControlNet model was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, Maneesh Agrawala. It provides a greater degree of control over text-to-image generation by conditioning the model on additional inputs such as edge maps, depth maps, segmentation maps, and keypoints for pose detection.
|
||||||
|
|
||||||
|
The abstract from the paper is:
|
||||||
|
|
||||||
|
*We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
|
||||||
|
|
||||||
|
## Loading from the original format
|
||||||
|
|
||||||
|
By default the [`SD3ControlNetModel`] should be loaded with [`~ModelMixin.from_pretrained`].
|
||||||
|
|
||||||
|
```py
|
||||||
|
from diffusers import StableDiffusion3ControlNetPipeline
|
||||||
|
from diffusers.models import SD3ControlNetModel, SD3MultiControlNetModel
|
||||||
|
|
||||||
|
controlnet = SD3ControlNetModel.from_pretrained("InstantX/SD3-Controlnet-Canny")
|
||||||
|
pipe = StableDiffusion3ControlNetPipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers", controlnet=controlnet)
|
||||||
|
```
|
||||||
|
|
||||||
|
## SD3ControlNetModel
|
||||||
|
|
||||||
|
[[autodoc]] SD3ControlNetModel
|
||||||
|
|
||||||
|
## SD3ControlNetOutput
|
||||||
|
|
||||||
|
[[autodoc]] models.controlnets.controlnet_sd3.SD3ControlNetOutput
|
||||||
|
|
||||||
46
docs/source/en/api/models/controlnet_sparsectrl.md
Normal file
46
docs/source/en/api/models/controlnet_sparsectrl.md
Normal file
@@ -0,0 +1,46 @@
|
|||||||
|
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
|
||||||
|
|
||||||
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||||
|
the License. You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||||
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||||
|
specific language governing permissions and limitations under the License. -->
|
||||||
|
|
||||||
|
# SparseControlNetModel
|
||||||
|
|
||||||
|
SparseControlNetModel is an implementation of ControlNet for [AnimateDiff](https://arxiv.org/abs/2307.04725).
|
||||||
|
|
||||||
|
ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala.
|
||||||
|
|
||||||
|
The SparseCtrl version of ControlNet was introduced in [SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models](https://arxiv.org/abs/2311.16933) for achieving controlled generation in text-to-video diffusion models by Yuwei Guo, Ceyuan Yang, Anyi Rao, Maneesh Agrawala, Dahua Lin, and Bo Dai.
|
||||||
|
|
||||||
|
The abstract from the paper is:
|
||||||
|
|
||||||
|
*The development of text-to-video (T2V), i.e., generating videos with a given text prompt, has been significantly advanced in recent years. However, relying solely on text prompts often results in ambiguous frame composition due to spatial uncertainty. The research community thus leverages the dense structure signals, e.g., per-frame depth/edge sequences, to enhance controllability, whose collection accordingly increases the burden of inference. In this work, we present SparseCtrl to enable flexible structure control with temporally sparse signals, requiring only one or a few inputs, as shown in Figure 1. It incorporates an additional condition encoder to process these sparse signals while leaving the pre-trained T2V model untouched. The proposed approach is compatible with various modalities, including sketches, depth maps, and RGB images, providing more practical control for video generation and promoting applications such as storyboarding, depth rendering, keyframe animation, and interpolation. Extensive experiments demonstrate the generalization of SparseCtrl on both original and personalized T2V generators. Codes and models will be publicly available at [this https URL](https://guoyww.github.io/projects/SparseCtrl).*
|
||||||
|
|
||||||
|
## Example for loading SparseControlNetModel
|
||||||
|
|
||||||
|
```python
|
||||||
|
import torch
|
||||||
|
from diffusers import SparseControlNetModel
|
||||||
|
|
||||||
|
# Load the fp32 variant and cast the weights to float16
|
||||||
|
# 1. Scribble checkpoint
|
||||||
|
controlnet = SparseControlNetModel.from_pretrained("guoyww/animatediff-sparsectrl-scribble", torch_dtype=torch.float16)
|
||||||
|
|
||||||
|
# 2. RGB checkpoint
|
||||||
|
controlnet = SparseControlNetModel.from_pretrained("guoyww/animatediff-sparsectrl-rgb", torch_dtype=torch.float16)
|
||||||
|
|
||||||
|
# For loading fp16 variant, pass `variant="fp16"` as an additional parameter
|
||||||
|
```
|
||||||
|
|
||||||
|
## SparseControlNetModel
|
||||||
|
|
||||||
|
[[autodoc]] SparseControlNetModel
|
||||||
|
|
||||||
|
## SparseControlNetOutput
|
||||||
|
|
||||||
|
[[autodoc]] models.controlnet_sparsectrl.SparseControlNetOutput
|
||||||
19
docs/source/en/api/models/dit_transformer2d.md
Normal file
19
docs/source/en/api/models/dit_transformer2d.md
Normal file
@@ -0,0 +1,19 @@
|
|||||||
|
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
|
||||||
|
|
||||||
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||||
|
the License. You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||||
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||||
|
specific language governing permissions and limitations under the License.
|
||||||
|
-->
|
||||||
|
|
||||||
|
# DiTTransformer2DModel
|
||||||
|
|
||||||
|
A Transformer model for image-like data from [DiT](https://huggingface.co/papers/2212.09748).
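
The model can be loaded with a snippet like the following; the `facebook/DiT-XL-2-256` repository and `transformer` subfolder are assumptions rather than values given on this page.

```python
import torch
from diffusers import DiTTransformer2DModel

# repository and subfolder are assumed; adjust to your checkpoint
transformer = DiTTransformer2DModel.from_pretrained("facebook/DiT-XL-2-256", subfolder="transformer", torch_dtype=torch.float16).to("cuda")
```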
|
||||||
|
|
||||||
|
## DiTTransformer2DModel
|
||||||
|
|
||||||
|
[[autodoc]] DiTTransformer2DModel
|
||||||
19
docs/source/en/api/models/flux_transformer.md
Normal file
19
docs/source/en/api/models/flux_transformer.md
Normal file
@@ -0,0 +1,19 @@
|
|||||||
|
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
|
||||||
|
|
||||||
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||||
|
the License. You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||||
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||||
|
specific language governing permissions and limitations under the License.
|
||||||
|
-->
|
||||||
|
|
||||||
|
# FluxTransformer2DModel
|
||||||
|
|
||||||
|
A Transformer model for image-like data from [Flux](https://blackforestlabs.ai/announcing-black-forest-labs/).
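
The model can be loaded with a snippet like the following; the `black-forest-labs/FLUX.1-dev` checkpoint is referenced elsewhere in these docs, while the `transformer` subfolder is an assumption.

```python
import torch
from diffusers import FluxTransformer2DModel

# subfolder is assumed; adjust to your checkpoint
transformer = FluxTransformer2DModel.from_pretrained("black-forest-labs/FLUX.1-dev", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")
```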
|
||||||
|
|
||||||
|
## FluxTransformer2DModel
|
||||||
|
|
||||||
|
[[autodoc]] FluxTransformer2DModel
|
||||||
20
docs/source/en/api/models/hunyuan_transformer2d.md
Normal file
20
docs/source/en/api/models/hunyuan_transformer2d.md
Normal file
@@ -0,0 +1,20 @@
|
|||||||
|
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
|
||||||
|
|
||||||
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||||
|
the License. You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||||
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||||
|
specific language governing permissions and limitations under the License.
|
||||||
|
-->
|
||||||
|
|
||||||
|
# HunyuanDiT2DModel
|
||||||
|
|
||||||
|
A Diffusion Transformer model for 2D data from [Hunyuan-DiT](https://github.com/Tencent/HunyuanDiT).
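
The model can be loaded with a snippet like the following; the `Tencent-Hunyuan/HunyuanDiT-v1.2-Diffusers` repository and `transformer` subfolder are assumptions rather than values given on this page.

```python
import torch
from diffusers import HunyuanDiT2DModel

# repository and subfolder are assumed; adjust to your checkpoint
transformer = HunyuanDiT2DModel.from_pretrained("Tencent-Hunyuan/HunyuanDiT-v1.2-Diffusers", subfolder="transformer", torch_dtype=torch.float16).to("cuda")
```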
|
||||||
|
|
||||||
|
## HunyuanDiT2DModel
|
||||||
|
|
||||||
|
[[autodoc]] HunyuanDiT2DModel
|
||||||
|
|
||||||
19
docs/source/en/api/models/latte_transformer3d.md
Normal file
19
docs/source/en/api/models/latte_transformer3d.md
Normal file
@@ -0,0 +1,19 @@
|
|||||||
|
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
|
||||||
|
|
||||||
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||||
|
the License. You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||||
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||||
|
specific language governing permissions and limitations under the License.
|
||||||
|
-->
|
||||||
|
|
||||||
|
# LatteTransformer3DModel
|
||||||
|
|
||||||
|
A Diffusion Transformer model for 3D data from [Latte](https://github.com/Vchitect/Latte).
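
The model can be loaded with a snippet like the following; the `maxin-cn/Latte-1` repository and `transformer` subfolder are assumptions rather than values given on this page.

```python
import torch
from diffusers import LatteTransformer3DModel

# repository and subfolder are assumed; adjust to your checkpoint
transformer = LatteTransformer3DModel.from_pretrained("maxin-cn/Latte-1", subfolder="transformer", torch_dtype=torch.float16).to("cuda")
```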
|
||||||
|
|
||||||
|
## LatteTransformer3DModel
|
||||||
|
|
||||||
|
[[autodoc]] LatteTransformer3DModel
|
||||||
20
docs/source/en/api/models/lumina_nextdit2d.md
Normal file
20
docs/source/en/api/models/lumina_nextdit2d.md
Normal file
@@ -0,0 +1,20 @@
|
|||||||
|
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
|
||||||
|
|
||||||
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||||
|
the License. You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||||
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||||
|
specific language governing permissions and limitations under the License.
|
||||||
|
-->
|
||||||
|
|
||||||
|
# LuminaNextDiT2DModel
|
||||||
|
|
||||||
|
A next-generation (Next-DiT) Diffusion Transformer model for 2D data from [Lumina-T2X](https://github.com/Alpha-VLLM/Lumina-T2X).
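
The model can be loaded with a snippet like the following; the `Alpha-VLLM/Lumina-Next-SFT-diffusers` repository and `transformer` subfolder are assumptions rather than values given on this page.

```python
import torch
from diffusers import LuminaNextDiT2DModel

# repository and subfolder are assumed; adjust to your checkpoint
transformer = LuminaNextDiT2DModel.from_pretrained("Alpha-VLLM/Lumina-Next-SFT-diffusers", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")
```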
|
||||||
|
|
||||||
|
## LuminaNextDiT2DModel
|
||||||
|
|
||||||
|
[[autodoc]] LuminaNextDiT2DModel
|
||||||
|
|
||||||
30
docs/source/en/api/models/mochi_transformer3d.md
Normal file
30
docs/source/en/api/models/mochi_transformer3d.md
Normal file
@@ -0,0 +1,30 @@
|
|||||||
|
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
|
||||||
|
|
||||||
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||||
|
the License. You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||||
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||||
|
specific language governing permissions and limitations under the License. -->
|
||||||
|
|
||||||
|
# MochiTransformer3DModel
|
||||||
|
|
||||||
|
A Diffusion Transformer model for 3D video-like data was introduced in [Mochi-1 Preview](https://huggingface.co/genmo/mochi-1-preview) by Genmo.
|
||||||
|
|
||||||
|
The model can be loaded with the following code snippet.
|
||||||
|
|
||||||
|
```python
|
||||||
|
import torch
from diffusers import MochiTransformer3DModel

transformer = MochiTransformer3DModel.from_pretrained("genmo/mochi-1-preview", subfolder="transformer", torch_dtype=torch.float16).to("cuda")
|
||||||
|
```
|
||||||
|
|
||||||
|
## MochiTransformer3DModel
|
||||||
|
|
||||||
|
[[autodoc]] MochiTransformer3DModel
|
||||||
|
|
||||||
|
## Transformer2DModelOutput
|
||||||
|
|
||||||
|
[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
|
||||||
19
docs/source/en/api/models/pixart_transformer2d.md
Normal file
19
docs/source/en/api/models/pixart_transformer2d.md
Normal file
@@ -0,0 +1,19 @@
|
|||||||
|
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
|
||||||
|
|
||||||
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||||
|
the License. You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||||
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||||
|
specific language governing permissions and limitations under the License.
|
||||||
|
-->
|
||||||
|
|
||||||
|
# PixArtTransformer2DModel
|
||||||
|
|
||||||
|
A Transformer model for image-like data from [PixArt-Alpha](https://huggingface.co/papers/2310.00426) and [PixArt-Sigma](https://huggingface.co/papers/2403.04692).
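
The model can be loaded with a snippet like the following; the `PixArt-alpha/PixArt-XL-2-1024-MS` repository and `transformer` subfolder are assumptions rather than values given on this page.

```python
import torch
from diffusers import PixArtTransformer2DModel

# repository and subfolder are assumed; adjust to your checkpoint
transformer = PixArtTransformer2DModel.from_pretrained("PixArt-alpha/PixArt-XL-2-1024-MS", subfolder="transformer", torch_dtype=torch.float16).to("cuda")
```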
|
||||||
|
|
||||||
|
## PixArtTransformer2DModel
|
||||||
|
|
||||||
|
[[autodoc]] PixArtTransformer2DModel
|
||||||
@@ -10,7 +10,7 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
|
|||||||
specific language governing permissions and limitations under the License.
|
specific language governing permissions and limitations under the License.
|
||||||
-->
|
-->
|
||||||
|
|
||||||
# Prior Transformer
|
# PriorTransformer
|
||||||
|
|
||||||
The Prior Transformer was originally introduced in [Hierarchical Text-Conditional Image Generation with CLIP Latents](https://huggingface.co/papers/2204.06125) by Ramesh et al. It is used to predict CLIP image embeddings from CLIP text embeddings; image embeddings are predicted through a denoising diffusion process.
|
The Prior Transformer was originally introduced in [Hierarchical Text-Conditional Image Generation with CLIP Latents](https://huggingface.co/papers/2204.06125) by Ramesh et al. It is used to predict CLIP image embeddings from CLIP text embeddings; image embeddings are predicted through a denoising diffusion process.
|
||||||
|
|
||||||
|
|||||||
19
docs/source/en/api/models/sd3_transformer2d.md
Normal file
19
docs/source/en/api/models/sd3_transformer2d.md
Normal file
@@ -0,0 +1,19 @@
|
|||||||
|
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
|
||||||
|
|
||||||
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||||
|
the License. You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||||
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||||
|
specific language governing permissions and limitations under the License.
|
||||||
|
-->
|
||||||
|
|
||||||
|
# SD3 Transformer Model
|
||||||
|
|
||||||
|
The Transformer model introduced in [Stable Diffusion 3](https://hf.co/papers/2403.03206). Its novelty lies in the MMDiT transformer block.
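
The model can be loaded with a snippet like the following; the `stabilityai/stable-diffusion-3-medium-diffusers` checkpoint is referenced elsewhere in these docs, while the `transformer` subfolder is an assumption.

```python
import torch
from diffusers import SD3Transformer2DModel

# subfolder is assumed; adjust to your checkpoint
transformer = SD3Transformer2DModel.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers", subfolder="transformer", torch_dtype=torch.float16).to("cuda")
```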
|
||||||
|
|
||||||
|
## SD3Transformer2DModel
|
||||||
|
|
||||||
|
[[autodoc]] SD3Transformer2DModel
|
||||||
19
docs/source/en/api/models/stable_audio_transformer.md
Normal file
19
docs/source/en/api/models/stable_audio_transformer.md
Normal file
@@ -0,0 +1,19 @@
|
|||||||
|
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
|
||||||
|
|
||||||
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||||
|
the License. You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||||
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||||
|
specific language governing permissions and limitations under the License.
|
||||||
|
-->
|
||||||
|
|
||||||
|
# StableAudioDiTModel
|
||||||
|
|
||||||
|
A Transformer model for audio waveforms from [Stable Audio Open](https://huggingface.co/papers/2407.14358).
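
The model can be loaded with a snippet like the following; the `stabilityai/stable-audio-open-1.0` repository and `transformer` subfolder are assumptions rather than values given on this page.

```python
import torch
from diffusers import StableAudioDiTModel

# repository and subfolder are assumed; adjust to your checkpoint
transformer = StableAudioDiTModel.from_pretrained("stabilityai/stable-audio-open-1.0", subfolder="transformer", torch_dtype=torch.float16).to("cuda")
```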
|
||||||
|
|
||||||
|
## StableAudioDiTModel
|
||||||
|
|
||||||
|
[[autodoc]] StableAudioDiTModel
|
||||||
19
docs/source/en/api/models/stable_cascade_unet.md
Normal file
19
docs/source/en/api/models/stable_cascade_unet.md
Normal file
@@ -0,0 +1,19 @@
|
|||||||
|
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
|
||||||
|
|
||||||
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||||
|
the License. You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||||
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||||
|
specific language governing permissions and limitations under the License.
|
||||||
|
-->
|
||||||
|
|
||||||
|
# StableCascadeUNet
|
||||||
|
|
||||||
|
A UNet model from the [Stable Cascade pipeline](../pipelines/stable_cascade.md).
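
A minimal loading sketch for the prior-stage UNet; the `stabilityai/stable-cascade-prior` repository and `prior` subfolder are assumptions (the decoder-stage UNet lives in a separate checkpoint).

```python
import torch
from diffusers import StableCascadeUNet

# repository and subfolder are assumed; adjust to your checkpoint
prior_unet = StableCascadeUNet.from_pretrained("stabilityai/stable-cascade-prior", subfolder="prior", torch_dtype=torch.bfloat16).to("cuda")
```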
|
||||||
|
|
||||||
|
## StableCascadeUNet
|
||||||
|
|
||||||
|
[[autodoc]] models.unets.unet_stable_cascade.StableCascadeUNet
|
||||||
@@ -10,7 +10,7 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
|
|||||||
specific language governing permissions and limitations under the License.
|
specific language governing permissions and limitations under the License.
|
||||||
-->
|
-->
|
||||||
|
|
||||||
# Transformer2D
|
# Transformer2DModel
|
||||||
|
|
||||||
A Transformer model for image-like data from [CompVis](https://huggingface.co/CompVis) that is based on the [Vision Transformer](https://huggingface.co/papers/2010.11929) introduced by Dosovitskiy et al. The [`Transformer2DModel`] accepts discrete (classes of vector embeddings) or continuous (actual embeddings) inputs.
|
A Transformer model for image-like data from [CompVis](https://huggingface.co/CompVis) that is based on the [Vision Transformer](https://huggingface.co/papers/2010.11929) introduced by Dosovitskiy et al. The [`Transformer2DModel`] accepts discrete (classes of vector embeddings) or continuous (actual embeddings) inputs.
|
||||||
|
|
||||||
@@ -38,4 +38,4 @@ It is assumed one of the input classes is the masked latent pixel. The predicted
|
|||||||
|
|
||||||
## Transformer2DModelOutput
|
## Transformer2DModelOutput
|
||||||
|
|
||||||
[[autodoc]] models.transformers.transformer_2d.Transformer2DModelOutput
|
[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
|
||||||
|
|||||||
@@ -10,7 +10,7 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
|
|||||||
specific language governing permissions and limitations under the License.
|
specific language governing permissions and limitations under the License.
|
||||||
-->
|
-->
|
||||||
|
|
||||||
# Transformer Temporal
|
# TransformerTemporalModel
|
||||||
|
|
||||||
A Transformer model for video-like data.
|
A Transformer model for video-like data.
|
||||||
|
|
||||||
|
|||||||
@@ -24,4 +24,4 @@ The abstract from the paper is:
|
|||||||
|
|
||||||
## VQEncoderOutput
|
## VQEncoderOutput
|
||||||
|
|
||||||
[[autodoc]] models.vq_model.VQEncoderOutput
|
[[autodoc]] models.autoencoders.vq_model.VQEncoderOutput
|
||||||
|
|||||||
34
docs/source/en/api/pipelines/allegro.md
Normal file
34
docs/source/en/api/pipelines/allegro.md
Normal file
@@ -0,0 +1,34 @@
|
|||||||
|
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
|
||||||
|
|
||||||
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||||
|
the License. You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||||
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||||
|
specific language governing permissions and limitations under the License. -->
|
||||||
|
|
||||||
|
# Allegro
|
||||||
|
|
||||||
|
[Allegro: Open the Black Box of Commercial-Level Video Generation Model](https://huggingface.co/papers/2410.15458) from RhymesAI, by Yuan Zhou, Qiuyue Wang, Yuxuan Cai, Huan Yang.
|
||||||
|
|
||||||
|
The abstract from the paper is:
|
||||||
|
|
||||||
|
*Significant advancements have been made in the field of video generation, with the open-source community contributing a wealth of research papers and tools for training high-quality models. However, despite these efforts, the available information and resources remain insufficient for achieving commercial-level performance. In this report, we open the black box and introduce Allegro, an advanced video generation model that excels in both quality and temporal consistency. We also highlight the current limitations in the field and present a comprehensive methodology for training high-performance, commercial-level video generation models, addressing key aspects such as data, model architecture, training pipeline, and evaluation. Our user study shows that Allegro surpasses existing open-source models and most commercial models, ranking just behind Hailuo and Kling. Code: https://github.com/rhymes-ai/Allegro , Model: https://huggingface.co/rhymes-ai/Allegro , Gallery: https://rhymes.ai/allegro_gallery .*
|
||||||
|
|
||||||
|
<Tip>
|
||||||
|
|
||||||
|
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
|
||||||
|
|
||||||
|
</Tip>
|
||||||
|
|
||||||
|
## AllegroPipeline
|
||||||
|
|
||||||
|
[[autodoc]] AllegroPipeline
|
||||||
|
- all
|
||||||
|
- __call__
|
||||||
|
|
||||||
|
## AllegroPipelineOutput
|
||||||
|
|
||||||
|
[[autodoc]] pipelines.allegro.pipeline_output.AllegroPipelineOutput
|
||||||
@@ -25,7 +25,11 @@ The abstract of the paper is the following:
|
|||||||
| Pipeline | Tasks | Demo
|
| Pipeline | Tasks | Demo
|
||||||
|---|---|:---:|
|
|---|---|:---:|
|
||||||
| [AnimateDiffPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/animatediff/pipeline_animatediff.py) | *Text-to-Video Generation with AnimateDiff* |
|
| [AnimateDiffPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/animatediff/pipeline_animatediff.py) | *Text-to-Video Generation with AnimateDiff* |
|
||||||
|
| [AnimateDiffControlNetPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/animatediff/pipeline_animatediff_controlnet.py) | *Controlled Video-to-Video Generation with AnimateDiff using ControlNet* |
|
||||||
|
| [AnimateDiffSparseControlNetPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/animatediff/pipeline_animatediff_sparsectrl.py) | *Controlled Video-to-Video Generation with AnimateDiff using SparseCtrl* |
|
||||||
|
| [AnimateDiffSDXLPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/animatediff/pipeline_animatediff_sdxl.py) | *Video-to-Video Generation with AnimateDiff* |
|
||||||
| [AnimateDiffVideoToVideoPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/animatediff/pipeline_animatediff_video2video.py) | *Video-to-Video Generation with AnimateDiff* |
|
| [AnimateDiffVideoToVideoPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/animatediff/pipeline_animatediff_video2video.py) | *Video-to-Video Generation with AnimateDiff* |
|
||||||
|
| [AnimateDiffVideoToVideoControlNetPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/animatediff/pipeline_animatediff_video2video_controlnet.py) | *Video-to-Video Generation with AnimateDiff using ControlNet* |
|
||||||
|
|
||||||
## Available checkpoints
|
## Available checkpoints
|
||||||
|
|
||||||
@@ -78,7 +82,6 @@ output = pipe(
|
|||||||
)
|
)
|
||||||
frames = output.frames[0]
|
frames = output.frames[0]
|
||||||
export_to_gif(frames, "animation.gif")
|
export_to_gif(frames, "animation.gif")
|
||||||
|
|
||||||
```
|
```
|
||||||
|
|
||||||
Here are some sample outputs:
|
Here are some sample outputs:
|
||||||
@@ -101,6 +104,313 @@ AnimateDiff tends to work better with finetuned Stable Diffusion models. If you
|
|||||||
|
|
||||||
</Tip>
|
</Tip>
|
||||||
|
|
||||||
|
### AnimateDiffControlNetPipeline
|
||||||
|
|
||||||
|
AnimateDiff can also be used with ControlNets. ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. With a ControlNet model, you can provide an additional control image to condition and control Stable Diffusion generation. For example, if you provide depth maps, the ControlNet model generates a video that preserves the spatial information from the depth maps. It is a more flexible and accurate way to control the video generation process.
|
||||||
|
|
||||||
|
```python
|
||||||
|
import torch
|
||||||
|
from diffusers import AnimateDiffControlNetPipeline, AutoencoderKL, ControlNetModel, MotionAdapter, LCMScheduler
|
||||||
|
from diffusers.utils import export_to_gif, load_video
|
||||||
|
|
||||||
|
# Additionally, you will need to preprocess videos before they can be used with the ControlNet
|
||||||
|
# HF maintains just the right package for it: `pip install controlnet_aux`
|
||||||
|
from controlnet_aux.processor import ZoeDetector
|
||||||
|
|
||||||
|
# Download controlnets from https://huggingface.co/lllyasviel/ControlNet-v1-1 to use .from_single_file
|
||||||
|
# Download Diffusers-format controlnets, such as https://huggingface.co/lllyasviel/sd-controlnet-depth, to use .from_pretrained()
|
||||||
|
controlnet = ControlNetModel.from_single_file("control_v11f1p_sd15_depth.pth", torch_dtype=torch.float16)
|
||||||
|
|
||||||
|
# We use AnimateLCM for this example but one can use the original motion adapters as well (for example, https://huggingface.co/guoyww/animatediff-motion-adapter-v1-5-3)
|
||||||
|
motion_adapter = MotionAdapter.from_pretrained("wangfuyun/AnimateLCM")
|
||||||
|
|
||||||
|
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16)
|
||||||
|
pipe: AnimateDiffControlNetPipeline = AnimateDiffControlNetPipeline.from_pretrained(
|
||||||
|
"SG161222/Realistic_Vision_V5.1_noVAE",
|
||||||
|
motion_adapter=motion_adapter,
|
||||||
|
controlnet=controlnet,
|
||||||
|
vae=vae,
|
||||||
|
).to(device="cuda", dtype=torch.float16)
|
||||||
|
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config, beta_schedule="linear")
|
||||||
|
pipe.load_lora_weights("wangfuyun/AnimateLCM", weight_name="AnimateLCM_sd15_t2v_lora.safetensors", adapter_name="lcm-lora")
|
||||||
|
pipe.set_adapters(["lcm-lora"], [0.8])
|
||||||
|
|
||||||
|
depth_detector = ZoeDetector.from_pretrained("lllyasviel/Annotators").to("cuda")
|
||||||
|
video = load_video("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-vid2vid-input-1.gif")
|
||||||
|
conditioning_frames = []
|
||||||
|
|
||||||
|
with pipe.progress_bar(total=len(video)) as progress_bar:
|
||||||
|
for frame in video:
|
||||||
|
conditioning_frames.append(depth_detector(frame))
|
||||||
|
progress_bar.update()
|
||||||
|
|
||||||
|
prompt = "a panda, playing a guitar, sitting in a pink boat, in the ocean, mountains in background, realistic, high quality"
|
||||||
|
negative_prompt = "bad quality, worst quality"
|
||||||
|
|
||||||
|
video = pipe(
|
||||||
|
prompt=prompt,
|
||||||
|
negative_prompt=negative_prompt,
|
||||||
|
num_frames=len(video),
|
||||||
|
num_inference_steps=10,
|
||||||
|
guidance_scale=2.0,
|
||||||
|
conditioning_frames=conditioning_frames,
|
||||||
|
generator=torch.Generator().manual_seed(42),
|
||||||
|
).frames[0]
|
||||||
|
|
||||||
|
export_to_gif(video, "animatediff_controlnet.gif", fps=8)
|
||||||
|
```
|
||||||
|
|
||||||
|
Here are some sample outputs:
|
||||||
|
|
||||||
|
<table align="center">
|
||||||
|
<tr>
|
||||||
|
<th align="center">Source Video</th>
|
||||||
|
<th align="center">Output Video</th>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td align="center">
|
||||||
|
raccoon playing a guitar
|
||||||
|
<br />
|
||||||
|
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-vid2vid-input-1.gif" alt="racoon playing a guitar" />
|
||||||
|
</td>
|
||||||
|
<td align="center">
|
||||||
|
a panda, playing a guitar, sitting in a pink boat, in the ocean, mountains in background, realistic, high quality
|
||||||
|
<br/>
|
||||||
|
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-controlnet-output.gif" alt="a panda, playing a guitar, sitting in a pink boat, in the ocean, mountains in background, realistic, high quality" />
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
</table>
|
||||||
|
|
||||||
|
### AnimateDiffSparseControlNetPipeline
|
||||||
|
|
||||||
|
[SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models](https://arxiv.org/abs/2311.16933), by Yuwei Guo, Ceyuan Yang, Anyi Rao, Maneesh Agrawala, Dahua Lin, and Bo Dai, introduces a method for achieving controlled generation in text-to-video diffusion models.
|
||||||
|
|
||||||
|
The abstract from the paper is:
|
||||||
|
|
||||||
|
*The development of text-to-video (T2V), i.e., generating videos with a given text prompt, has been significantly advanced in recent years. However, relying solely on text prompts often results in ambiguous frame composition due to spatial uncertainty. The research community thus leverages the dense structure signals, e.g., per-frame depth/edge sequences, to enhance controllability, whose collection accordingly increases the burden of inference. In this work, we present SparseCtrl to enable flexible structure control with temporally sparse signals, requiring only one or a few inputs, as shown in Figure 1. It incorporates an additional condition encoder to process these sparse signals while leaving the pre-trained T2V model untouched. The proposed approach is compatible with various modalities, including sketches, depth maps, and RGB images, providing more practical control for video generation and promoting applications such as storyboarding, depth rendering, keyframe animation, and interpolation. Extensive experiments demonstrate the generalization of SparseCtrl on both original and personalized T2V generators. Codes and models will be publicly available at [this https URL](https://guoyww.github.io/projects/SparseCtrl).*
|
||||||
|
|
||||||
|
SparseCtrl introduces the following checkpoints for controlled text-to-video generation:
|
||||||
|
|
||||||
|
- [SparseCtrl Scribble](https://huggingface.co/guoyww/animatediff-sparsectrl-scribble)
|
||||||
|
- [SparseCtrl RGB](https://huggingface.co/guoyww/animatediff-sparsectrl-rgb)
|
||||||
|
|
||||||
|
#### Using SparseCtrl Scribble
|
||||||
|
|
||||||
|
```python
|
||||||
|
import torch
|
||||||
|
|
||||||
|
from diffusers import AnimateDiffSparseControlNetPipeline
|
||||||
|
from diffusers.models import AutoencoderKL, MotionAdapter, SparseControlNetModel
|
||||||
|
from diffusers.schedulers import DPMSolverMultistepScheduler
|
||||||
|
from diffusers.utils import export_to_gif, load_image
|
||||||
|
|
||||||
|
|
||||||
|
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
|
||||||
|
motion_adapter_id = "guoyww/animatediff-motion-adapter-v1-5-3"
|
||||||
|
controlnet_id = "guoyww/animatediff-sparsectrl-scribble"
|
||||||
|
lora_adapter_id = "guoyww/animatediff-motion-lora-v1-5-3"
|
||||||
|
vae_id = "stabilityai/sd-vae-ft-mse"
|
||||||
|
device = "cuda"
|
||||||
|
|
||||||
|
motion_adapter = MotionAdapter.from_pretrained(motion_adapter_id, torch_dtype=torch.float16).to(device)
|
||||||
|
controlnet = SparseControlNetModel.from_pretrained(controlnet_id, torch_dtype=torch.float16).to(device)
|
||||||
|
vae = AutoencoderKL.from_pretrained(vae_id, torch_dtype=torch.float16).to(device)
|
||||||
|
scheduler = DPMSolverMultistepScheduler.from_pretrained(
|
||||||
|
model_id,
|
||||||
|
subfolder="scheduler",
|
||||||
|
beta_schedule="linear",
|
||||||
|
algorithm_type="dpmsolver++",
|
||||||
|
use_karras_sigmas=True,
|
||||||
|
)
|
||||||
|
pipe = AnimateDiffSparseControlNetPipeline.from_pretrained(
|
||||||
|
model_id,
|
||||||
|
motion_adapter=motion_adapter,
|
||||||
|
controlnet=controlnet,
|
||||||
|
vae=vae,
|
||||||
|
scheduler=scheduler,
|
||||||
|
torch_dtype=torch.float16,
|
||||||
|
).to(device)
|
||||||
|
pipe.load_lora_weights(lora_adapter_id, adapter_name="motion_lora")
|
||||||
|
pipe.fuse_lora(lora_scale=1.0)
|
||||||
|
|
||||||
|
prompt = "an aerial view of a cyberpunk city, night time, neon lights, masterpiece, high quality"
|
||||||
|
negative_prompt = "low quality, worst quality, letterboxed"
|
||||||
|
|
||||||
|
image_files = [
|
||||||
|
"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-scribble-1.png",
|
||||||
|
"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-scribble-2.png",
|
||||||
|
"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-scribble-3.png"
|
||||||
|
]
|
||||||
|
condition_frame_indices = [0, 8, 15]
|
||||||
|
conditioning_frames = [load_image(img_file) for img_file in image_files]
|
||||||
|
|
||||||
|
video = pipe(
|
||||||
|
prompt=prompt,
|
||||||
|
negative_prompt=negative_prompt,
|
||||||
|
num_inference_steps=25,
|
||||||
|
conditioning_frames=conditioning_frames,
|
||||||
|
controlnet_conditioning_scale=1.0,
|
||||||
|
controlnet_frame_indices=condition_frame_indices,
|
||||||
|
generator=torch.Generator().manual_seed(1337),
|
||||||
|
).frames[0]
|
||||||
|
export_to_gif(video, "output.gif")
|
||||||
|
```
|
||||||
|
|
||||||
|
Here are some sample outputs:
|
||||||
|
|
||||||
|
<table align="center">
|
||||||
|
<tr>
|
||||||
|
<center>
|
||||||
|
<b>an aerial view of a cyberpunk city, night time, neon lights, masterpiece, high quality</b>
|
||||||
|
</center>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>
|
||||||
|
<center>
|
||||||
|
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-scribble-1.png" alt="scribble-1" />
|
||||||
|
</center>
|
||||||
|
</td>
|
||||||
|
<td>
|
||||||
|
<center>
|
||||||
|
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-scribble-2.png" alt="scribble-2" />
|
||||||
|
</center>
|
||||||
|
</td>
|
||||||
|
<td>
|
||||||
|
<center>
|
||||||
|
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-scribble-3.png" alt="scribble-3" />
|
||||||
|
</center>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td colspan=3>
|
||||||
|
<center>
|
||||||
|
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-sparsectrl-scribble-results.gif" alt="an aerial view of a cyberpunk city, night time, neon lights, masterpiece, high quality" />
|
||||||
|
</center>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
</table>
|
||||||
|
|
||||||
|
#### Using SparseCtrl RGB
|
||||||
|
|
||||||
|
```python
|
||||||
|
import torch
|
||||||
|
|
||||||
|
from diffusers import AnimateDiffSparseControlNetPipeline
|
||||||
|
from diffusers.models import AutoencoderKL, MotionAdapter, SparseControlNetModel
|
||||||
|
from diffusers.schedulers import DPMSolverMultistepScheduler
|
||||||
|
from diffusers.utils import export_to_gif, load_image
|
||||||
|
|
||||||
|
|
||||||
|
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
|
||||||
|
motion_adapter_id = "guoyww/animatediff-motion-adapter-v1-5-3"
|
||||||
|
controlnet_id = "guoyww/animatediff-sparsectrl-rgb"
|
||||||
|
lora_adapter_id = "guoyww/animatediff-motion-lora-v1-5-3"
|
||||||
|
vae_id = "stabilityai/sd-vae-ft-mse"
|
||||||
|
device = "cuda"
|
||||||
|
|
||||||
|
motion_adapter = MotionAdapter.from_pretrained(motion_adapter_id, torch_dtype=torch.float16).to(device)
|
||||||
|
controlnet = SparseControlNetModel.from_pretrained(controlnet_id, torch_dtype=torch.float16).to(device)
|
||||||
|
vae = AutoencoderKL.from_pretrained(vae_id, torch_dtype=torch.float16).to(device)
|
||||||
|
scheduler = DPMSolverMultistepScheduler.from_pretrained(
|
||||||
|
model_id,
|
||||||
|
subfolder="scheduler",
|
||||||
|
beta_schedule="linear",
|
||||||
|
algorithm_type="dpmsolver++",
|
||||||
|
use_karras_sigmas=True,
|
||||||
|
)
|
||||||
|
pipe = AnimateDiffSparseControlNetPipeline.from_pretrained(
|
||||||
|
model_id,
|
||||||
|
motion_adapter=motion_adapter,
|
||||||
|
controlnet=controlnet,
|
||||||
|
vae=vae,
|
||||||
|
scheduler=scheduler,
|
||||||
|
torch_dtype=torch.float16,
|
||||||
|
).to(device)
|
||||||
|
pipe.load_lora_weights(lora_adapter_id, adapter_name="motion_lora")
|
||||||
|
|
||||||
|
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-firework.png")
|
||||||
|
|
||||||
|
video = pipe(
|
||||||
|
prompt="closeup face photo of man in black clothes, night city street, bokeh, fireworks in background",
|
||||||
|
negative_prompt="low quality, worst quality",
|
||||||
|
num_inference_steps=25,
|
||||||
|
conditioning_frames=image,
|
||||||
|
controlnet_frame_indices=[0],
|
||||||
|
controlnet_conditioning_scale=1.0,
|
||||||
|
generator=torch.Generator().manual_seed(42),
|
||||||
|
).frames[0]
|
||||||
|
export_to_gif(video, "output.gif")
|
||||||
|
```
|
||||||
|
|
||||||
|
Here are some sample outputs:
|
||||||
|
|
||||||
|
<table align="center">
|
||||||
|
<tr>
|
||||||
|
<center>
|
||||||
|
<b>closeup face photo of man in black clothes, night city street, bokeh, fireworks in background</b>
|
||||||
|
</center>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td>
|
||||||
|
<center>
|
||||||
|
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-firework.png" alt="closeup face photo of man in black clothes, night city street, bokeh, fireworks in background" />
|
||||||
|
</center>
|
||||||
|
</td>
|
||||||
|
<td>
|
||||||
|
<center>
|
||||||
|
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-sparsectrl-rgb-result.gif" alt="closeup face photo of man in black clothes, night city street, bokeh, fireworks in background" />
|
||||||
|
</center>
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
</table>
|
||||||
|
|
||||||
|
### AnimateDiffSDXLPipeline
|
||||||
|
|
||||||
|
AnimateDiff can also be used with SDXL models. This is currently an experimental feature as only a beta release of the motion adapter checkpoint is available.
|
||||||
|
|
||||||
|
```python
|
||||||
|
import torch
|
||||||
|
from diffusers.models import MotionAdapter
|
||||||
|
from diffusers import AnimateDiffSDXLPipeline, DDIMScheduler
|
||||||
|
from diffusers.utils import export_to_gif
|
||||||
|
|
||||||
|
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-sdxl-beta", torch_dtype=torch.float16)
|
||||||
|
|
||||||
|
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
|
||||||
|
scheduler = DDIMScheduler.from_pretrained(
|
||||||
|
model_id,
|
||||||
|
subfolder="scheduler",
|
||||||
|
clip_sample=False,
|
||||||
|
timestep_spacing="linspace",
|
||||||
|
beta_schedule="linear",
|
||||||
|
steps_offset=1,
|
||||||
|
)
|
||||||
|
pipe = AnimateDiffSDXLPipeline.from_pretrained(
|
||||||
|
model_id,
|
||||||
|
motion_adapter=adapter,
|
||||||
|
scheduler=scheduler,
|
||||||
|
torch_dtype=torch.float16,
|
||||||
|
variant="fp16",
|
||||||
|
).to("cuda")
|
||||||
|
|
||||||
|
# enable memory savings
|
||||||
|
pipe.enable_vae_slicing()
|
||||||
|
pipe.enable_vae_tiling()
|
||||||
|
|
||||||
|
output = pipe(
|
||||||
|
prompt="a panda surfing in the ocean, realistic, high quality",
|
||||||
|
negative_prompt="low quality, worst quality",
|
||||||
|
num_inference_steps=20,
|
||||||
|
guidance_scale=8,
|
||||||
|
width=1024,
|
||||||
|
height=1024,
|
||||||
|
num_frames=16,
|
||||||
|
)
|
||||||
|
|
||||||
|
frames = output.frames[0]
|
||||||
|
export_to_gif(frames, "animation.gif")
|
||||||
|
```
|
||||||
|
|
||||||
### AnimateDiffVideoToVideoPipeline

AnimateDiff can also be used to generate visually similar videos or enable style/character/background or other edits starting from an initial video, allowing you to seamlessly explore creative possibilities.
@@ -118,7 +428,7 @@ from PIL import Image
|
|||||||
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16)
|
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16)
|
||||||
# load SD 1.5 based finetuned model
|
# load SD 1.5 based finetuned model
|
||||||
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
|
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
|
||||||
pipe = AnimateDiffVideoToVideoPipeline.from_pretrained(model_id, motion_adapter=adapter, torch_dtype=torch.float16).to("cuda")
|
pipe = AnimateDiffVideoToVideoPipeline.from_pretrained(model_id, motion_adapter=adapter, torch_dtype=torch.float16)
|
||||||
scheduler = DDIMScheduler.from_pretrained(
|
scheduler = DDIMScheduler.from_pretrained(
|
||||||
model_id,
|
model_id,
|
||||||
subfolder="scheduler",
|
subfolder="scheduler",
|
||||||
@@ -209,6 +519,97 @@ Here are some sample outputs:
|
|||||||
</tr>
</table>
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
### AnimateDiffVideoToVideoControlNetPipeline
|
||||||
|
|
||||||
|
AnimateDiff can be used together with ControlNets to enhance video-to-video generation by allowing for precise control over the output. ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala, and allows you to condition Stable Diffusion with an additional control image to ensure that the spatial information is preserved throughout the video.
|
||||||
|
|
||||||
|
This pipeline allows you to condition your generation both on the original video and on a sequence of control images.
|
||||||
|
|
||||||
|
```python
|
||||||
|
import torch
|
||||||
|
from PIL import Image
|
||||||
|
from tqdm.auto import tqdm
|
||||||
|
|
||||||
|
from controlnet_aux.processor import OpenposeDetector
|
||||||
|
from diffusers import AnimateDiffVideoToVideoControlNetPipeline
|
||||||
|
from diffusers.utils import export_to_gif, load_video
|
||||||
|
from diffusers import AutoencoderKL, ControlNetModel, MotionAdapter, LCMScheduler
|
||||||
|
|
||||||
|
# Load the ControlNet
|
||||||
|
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16)
|
||||||
|
# Load the motion adapter
|
||||||
|
motion_adapter = MotionAdapter.from_pretrained("wangfuyun/AnimateLCM")
|
||||||
|
# Load SD 1.5 based finetuned model
|
||||||
|
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16)
|
||||||
|
pipe = AnimateDiffVideoToVideoControlNetPipeline.from_pretrained(
|
||||||
|
"SG161222/Realistic_Vision_V5.1_noVAE",
|
||||||
|
motion_adapter=motion_adapter,
|
||||||
|
controlnet=controlnet,
|
||||||
|
vae=vae,
|
||||||
|
).to(device="cuda", dtype=torch.float16)
|
||||||
|
|
||||||
|
# Enable LCM to speed up inference
|
||||||
|
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config, beta_schedule="linear")
|
||||||
|
pipe.load_lora_weights("wangfuyun/AnimateLCM", weight_name="AnimateLCM_sd15_t2v_lora.safetensors", adapter_name="lcm-lora")
|
||||||
|
pipe.set_adapters(["lcm-lora"], [0.8])
|
||||||
|
|
||||||
|
video = load_video("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/dance.gif")
|
||||||
|
video = [frame.convert("RGB") for frame in video]
|
||||||
|
|
||||||
|
prompt = "astronaut in space, dancing"
|
||||||
|
negative_prompt = "bad quality, worst quality, jpeg artifacts, ugly"
|
||||||
|
|
||||||
|
# Create controlnet preprocessor
|
||||||
|
open_pose = OpenposeDetector.from_pretrained("lllyasviel/Annotators").to("cuda")
|
||||||
|
|
||||||
|
# Preprocess controlnet images
|
||||||
|
conditioning_frames = []
|
||||||
|
for frame in tqdm(video):
|
||||||
|
conditioning_frames.append(open_pose(frame))
|
||||||
|
|
||||||
|
strength = 0.8
|
||||||
|
with torch.inference_mode():
|
||||||
|
video = pipe(
|
||||||
|
video=video,
|
||||||
|
prompt=prompt,
|
||||||
|
negative_prompt=negative_prompt,
|
||||||
|
num_inference_steps=10,
|
||||||
|
guidance_scale=2.0,
|
||||||
|
controlnet_conditioning_scale=0.75,
|
||||||
|
conditioning_frames=conditioning_frames,
|
||||||
|
strength=strength,
|
||||||
|
generator=torch.Generator().manual_seed(42),
|
||||||
|
).frames[0]
|
||||||
|
|
||||||
|
video = [frame.resize(conditioning_frames[0].size) for frame in video]
|
||||||
|
export_to_gif(video, f"animatediff_vid2vid_controlnet.gif", fps=8)
|
||||||
|
```
|
||||||
|
|
||||||
|
Here are some sample outputs:
|
||||||
|
|
||||||
|
<table align="center">
|
||||||
|
<tr>
|
||||||
|
<th align="center">Source Video</th>
|
||||||
|
<th align="center">Output Video</th>
|
||||||
|
</tr>
|
||||||
|
<tr>
|
||||||
|
<td align="center">
|
||||||
|
anime girl, dancing
|
||||||
|
<br />
|
||||||
|
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/dance.gif" alt="anime girl, dancing" />
|
||||||
|
</td>
|
||||||
|
<td align="center">
|
||||||
|
astronaut in space, dancing
|
||||||
|
<br/>
|
||||||
|
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff_vid2vid_controlnet.gif" alt="astronaut in space, dancing" />
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
</table>
|
||||||
|
|
||||||
|
**The lights and composition were transferred from the Source Video.**
|
||||||
|
|
||||||
## Using Motion LoRAs

Motion LoRAs are a collection of LoRAs that work with the `guoyww/animatediff-motion-adapter-v1-5-2` checkpoint. These LoRAs are responsible for adding specific types of motion to the animations.
@@ -256,7 +657,6 @@ output = pipe(
|
|||||||
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```

<table>
@@ -331,7 +731,6 @@ output = pipe(
|
|||||||
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```

<table>
@@ -515,6 +914,102 @@ export_to_gif(frames, "animatelcm-motion-lora.gif")
|
|||||||
</tr>
|
</tr>
|
||||||
</table>
|
</table>
|
||||||
|
|
||||||
|
## Using FreeNoise
|
||||||
|
|
||||||
|
[FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling](https://arxiv.org/abs/2310.15169) by Haonan Qiu, Menghan Xia, Yong Zhang, Yingqing He, Xintao Wang, Ying Shan, Ziwei Liu.
|
||||||
|
|
||||||
|
FreeNoise is a sampling mechanism that can generate longer videos with short-video generation models by employing noise rescheduling, temporal attention over sliding windows, and weighted averaging of latent frames. It can also be used with multiple prompts to allow for interpolated video generation. More details are available in the paper.
|
||||||
|
|
||||||
|
The currently supported AnimateDiff pipelines that can be used with FreeNoise are:
|
||||||
|
- [`AnimateDiffPipeline`]
|
||||||
|
- [`AnimateDiffControlNetPipeline`]
|
||||||
|
- [`AnimateDiffVideoToVideoPipeline`]
|
||||||
|
- [`AnimateDiffVideoToVideoControlNetPipeline`]
|
||||||
|
|
||||||
|
In order to use FreeNoise, a single line needs to be added to the inference code after loading your pipelines.
|
||||||
|
|
||||||
|
```diff
|
||||||
|
+ pipe.enable_free_noise()
|
||||||
|
```
|
||||||
|
|
||||||
|
After this, either a single prompt can be used, or multiple prompts can be passed as a dictionary of integer-string pairs. The integer keys of the dictionary correspond to the frame index at which the influence of that prompt is maximum, and each frame index should map to a single string prompt. Prompts for intermediate frame indices that are not passed in the dictionary are created by interpolating between the frame prompts that are passed. By default, simple linear interpolation is used, but you can customize this behaviour by passing a callback to the `prompt_interpolation_callback` parameter when enabling FreeNoise.
|
||||||
|
|
||||||
|
Full example:
|
||||||
|
|
||||||
|
```python
|
||||||
|
import torch
|
||||||
|
from diffusers import AutoencoderKL, AnimateDiffPipeline, LCMScheduler, MotionAdapter
|
||||||
|
from diffusers.utils import export_to_video, load_image
|
||||||
|
|
||||||
|
# Load pipeline
|
||||||
|
dtype = torch.float16
|
||||||
|
motion_adapter = MotionAdapter.from_pretrained("wangfuyun/AnimateLCM", torch_dtype=dtype)
|
||||||
|
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=dtype)
|
||||||
|
|
||||||
|
pipe = AnimateDiffPipeline.from_pretrained("emilianJR/epiCRealism", motion_adapter=motion_adapter, vae=vae, torch_dtype=dtype)
|
||||||
|
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config, beta_schedule="linear")
|
||||||
|
|
||||||
|
pipe.load_lora_weights(
|
||||||
|
"wangfuyun/AnimateLCM", weight_name="AnimateLCM_sd15_t2v_lora.safetensors", adapter_name="lcm_lora"
|
||||||
|
)
|
||||||
|
pipe.set_adapters(["lcm_lora"], [0.8])
|
||||||
|
|
||||||
|
# Enable FreeNoise for long prompt generation
|
||||||
|
pipe.enable_free_noise(context_length=16, context_stride=4)
|
||||||
|
pipe.to("cuda")
|
||||||
|
|
||||||
|
# Can be a single prompt, or a dictionary with frame timesteps
|
||||||
|
prompt = {
|
||||||
|
0: "A caterpillar on a leaf, high quality, photorealistic",
|
||||||
|
40: "A caterpillar transforming into a cocoon, on a leaf, near flowers, photorealistic",
|
||||||
|
80: "A cocoon on a leaf, flowers in the backgrond, photorealistic",
|
||||||
|
120: "A cocoon maturing and a butterfly being born, flowers and leaves visible in the background, photorealistic",
|
||||||
|
160: "A beautiful butterfly, vibrant colors, sitting on a leaf, flowers in the background, photorealistic",
|
||||||
|
200: "A beautiful butterfly, flying away in a forest, photorealistic",
|
||||||
|
240: "A cyberpunk butterfly, neon lights, glowing",
|
||||||
|
}
|
||||||
|
negative_prompt = "bad quality, worst quality, jpeg artifacts"
|
||||||
|
|
||||||
|
# Run inference
|
||||||
|
output = pipe(
|
||||||
|
prompt=prompt,
|
||||||
|
negative_prompt=negative_prompt,
|
||||||
|
num_frames=256,
|
||||||
|
guidance_scale=2.5,
|
||||||
|
num_inference_steps=10,
|
||||||
|
generator=torch.Generator("cpu").manual_seed(0),
|
||||||
|
)
|
||||||
|
|
||||||
|
# Save video
|
||||||
|
frames = output.frames[0]
|
||||||
|
export_to_video(frames, "output.mp4", fps=16)
|
||||||
|
```
|
||||||
|
|
||||||
|
### FreeNoise memory savings
|
||||||
|
|
||||||
|
Since FreeNoise processes multiple frames together, there are parts of the model where the required memory exceeds what is available on typical consumer GPUs. The main memory bottlenecks that we identified are the spatial and temporal attention blocks, the upsampling and downsampling blocks, the resnet blocks, and the feed-forward layers. Since most of these blocks operate effectively only on the channel/embedding dimension, one can perform chunked inference across the batch dimensions. The batch dimensions in AnimateDiff are either spatial (`[B x F, H x W, C]`) or temporal (`[B x H x W, F, C]`) in nature (this may seem counter-intuitive, but these batch dimensions are correct, because spatial blocks process across the `B x F` dimension while temporal blocks process across the `B x H x W` dimension). We introduce a `SplitInferenceModule` that makes it easier to chunk across any dimension and perform inference. This saves a lot of memory but comes at the cost of longer inference times.
|
||||||
|
|
||||||
|
```diff
|
||||||
|
# Load pipeline and adapters
|
||||||
|
# ...
|
||||||
|
+ pipe.enable_free_noise_split_inference()
|
||||||
|
+ pipe.unet.enable_forward_chunking(16)
|
||||||
|
```
|
||||||
|
|
||||||
|
The `pipe.enable_free_noise_split_inference` method accepts two parameters: `spatial_split_size` (defaults to `256`) and `temporal_split_size` (defaults to `16`). These can be configured based on how much VRAM you have available. A lower split size results in lower memory usage but slower inference, whereas a larger split size results in faster inference at the cost of more memory.
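For instance, on a GPU with limited VRAM you might lower both split sizes and use a smaller forward-chunking size. A minimal sketch is shown below; the exact values are illustrative and should be tuned to your hardware:

```python
# Smaller split sizes lower peak memory at the cost of slower inference
# (values here are illustrative, not recommendations).
pipe.enable_free_noise_split_inference(spatial_split_size=128, temporal_split_size=8)
pipe.unet.enable_forward_chunking(8)
```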
|
||||||
|
|
||||||
|
## Using `from_single_file` with the MotionAdapter
|
||||||
|
|
||||||
|
`diffusers>=0.30.0` supports loading AnimateDiff checkpoints into the `MotionAdapter` in their original format via `from_single_file`.
|
||||||
|
|
||||||
|
```python
|
||||||
|
import torch

from diffusers import AnimateDiffPipeline, MotionAdapter
|
||||||
|
|
||||||
|
ckpt_path = "https://huggingface.co/Lightricks/LongAnimateDiff/blob/main/lt_long_mm_32_frames.ckpt"
|
||||||
|
|
||||||
|
adapter = MotionAdapter.from_single_file(ckpt_path, torch_dtype=torch.float16)
|
||||||
|
pipe = AnimateDiffPipeline.from_pretrained("emilianJR/epiCRealism", motion_adapter=adapter)
|
||||||
|
```
|
||||||
|
|
||||||
## AnimateDiffPipeline
@@ -522,12 +1017,36 @@ export_to_gif(frames, "animatelcm-motion-lora.gif")
|
|||||||
- all
- __call__
|
||||||
|
|
||||||
|
## AnimateDiffControlNetPipeline
|
||||||
|
|
||||||
|
[[autodoc]] AnimateDiffControlNetPipeline
|
||||||
|
- all
|
||||||
|
- __call__
|
||||||
|
|
||||||
|
## AnimateDiffSparseControlNetPipeline
|
||||||
|
|
||||||
|
[[autodoc]] AnimateDiffSparseControlNetPipeline
|
||||||
|
- all
|
||||||
|
- __call__
|
||||||
|
|
||||||
|
## AnimateDiffSDXLPipeline
|
||||||
|
|
||||||
|
[[autodoc]] AnimateDiffSDXLPipeline
|
||||||
|
- all
|
||||||
|
- __call__
|
||||||
|
|
||||||
## AnimateDiffVideoToVideoPipeline

[[autodoc]] AnimateDiffVideoToVideoPipeline
- all
- __call__
|
|
||||||
|
## AnimateDiffVideoToVideoControlNetPipeline
|
||||||
|
|
||||||
|
[[autodoc]] AnimateDiffVideoToVideoControlNetPipeline
|
||||||
|
- all
|
||||||
|
- __call__
|
||||||
|
|
||||||
## AnimateDiffPipelineOutput

[[autodoc]] pipelines.animatediff.AnimateDiffPipelineOutput
|
||||||
|
|||||||
29
docs/source/en/api/pipelines/aura_flow.md
Normal file
@@ -0,0 +1,29 @@
|
|||||||
|
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
|
||||||
|
|
||||||
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||||
|
the License. You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||||
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||||
|
specific language governing permissions and limitations under the License.
|
||||||
|
-->
|
||||||
|
|
||||||
|
# AuraFlow
|
||||||
|
|
||||||
|
AuraFlow is inspired by [Stable Diffusion 3](../pipelines/stable_diffusion/stable_diffusion_3.md) and is by far the largest text-to-image generation model that comes with an Apache 2.0 license. This model achieves state-of-the-art results on the [GenEval](https://github.com/djghosh13/geneval) benchmark.
|
||||||
|
|
||||||
|
It was developed by the Fal team and more details about it can be found in [this blog post](https://blog.fal.ai/auraflow/).
|
||||||
|
|
||||||
|
<Tip>
|
||||||
|
|
||||||
|
AuraFlow can be quite expensive to run on consumer hardware devices. However, you can perform a suite of optimizations to run it faster and in a more memory-friendly manner. Check out [this section](https://huggingface.co/blog/sd3#memory-optimizations-for-sd3) for more details.
|
||||||
|
|
||||||
|
</Tip>
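A minimal text-to-image sketch is shown below. The checkpoint id `fal/AuraFlow` and the call arguments are assumptions here; substitute the checkpoint and settings you intend to use:

```python
import torch
from diffusers import AuraFlowPipeline

# Load in half precision to reduce memory usage (checkpoint id assumed)
pipe = AuraFlowPipeline.from_pretrained("fal/AuraFlow", torch_dtype=torch.float16).to("cuda")

image = pipe(
    prompt="a photo of a corgi astronaut, studio lighting, high quality",
    num_inference_steps=50,
    guidance_scale=3.5,
).images[0]
image.save("auraflow.png")
```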
|
||||||
|
|
||||||
|
## AuraFlowPipeline
|
||||||
|
|
||||||
|
[[autodoc]] AuraFlowPipeline
|
||||||
|
- all
|
||||||
|
- __call__
|
||||||
@@ -12,42 +12,10 @@ specific language governing permissions and limitations under the License.
|
|||||||
|
|
||||||
# AutoPipeline
|
||||||
|
|
||||||
`AutoPipeline` is designed to:
|
The `AutoPipeline` is designed to make it easy to load a checkpoint for a task without needing to know the specific pipeline class. Based on the task, the `AutoPipeline` automatically retrieves the correct pipeline class from the checkpoint `model_index.json` file.
|
||||||
|
|
||||||
1. make it easy for you to load a checkpoint for a task without knowing the specific pipeline class to use
|
|
||||||
2. use multiple pipelines in your workflow
|
|
||||||
|
|
||||||
Based on the task, the `AutoPipeline` class automatically retrieves the relevant pipeline given the name or path to the pretrained weights with the `from_pretrained()` method.
|
|
||||||
|
|
||||||
To seamlessly switch between tasks with the same checkpoint without reallocating additional memory, use the `from_pipe()` method to transfer the components from the original pipeline to the new one.
|
|
||||||
|
|
||||||
```py
|
|
||||||
from diffusers import AutoPipelineForText2Image
|
|
||||||
import torch
|
|
||||||
|
|
||||||
pipeline = AutoPipelineForText2Image.from_pretrained(
|
|
||||||
"runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True
|
|
||||||
).to("cuda")
|
|
||||||
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
|
|
||||||
|
|
||||||
image = pipeline(prompt, num_inference_steps=25).images[0]
|
|
||||||
```
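To reuse the same components for another task without allocating additional memory, a `from_pipe()` sketch might look like this, continuing from the `pipeline` object above:

```py
from diffusers import AutoPipelineForImage2Image

# Reuse the already-loaded components for image-to-image without extra memory
pipeline_img2img = AutoPipelineForImage2Image.from_pipe(pipeline)
```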
|
|
||||||
|
|
||||||
<Tip>
|
|
||||||
|
|
||||||
Check out the [AutoPipeline](../../tutorials/autopipeline) tutorial to learn how to use this API!
|
|
||||||
|
|
||||||
</Tip>
|
|
||||||
|
|
||||||
`AutoPipeline` supports text-to-image, image-to-image, and inpainting for the following diffusion models:
|
|
||||||
|
|
||||||
- [Stable Diffusion](./stable_diffusion/overview)
|
|
||||||
- [ControlNet](./controlnet)
|
|
||||||
- [Stable Diffusion XL (SDXL)](./stable_diffusion/stable_diffusion_xl)
|
|
||||||
- [DeepFloyd IF](./deepfloyd_if)
|
|
||||||
- [Kandinsky 2.1](./kandinsky)
|
|
||||||
- [Kandinsky 2.2](./kandinsky_v22)
|
|
||||||
|
|
||||||
|
> [!TIP]
|
||||||
|
> Check out the [AutoPipeline](../../tutorials/autopipeline) tutorial to learn how to use this API!
|
||||||
|
|
||||||
## AutoPipelineForText2Image
|
||||||
|
|
||||||
|
|||||||
149
docs/source/en/api/pipelines/cogvideox.md
Normal file
@@ -0,0 +1,149 @@
|
|||||||
|
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
-->
|
||||||
|
|
||||||
|
# CogVideoX
|
||||||
|
|
||||||
|
[CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer](https://arxiv.org/abs/2408.06072) from Tsinghua University & ZhipuAI, by Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong, Jie Tang.
|
||||||
|
|
||||||
|
The abstract from the paper is:
|
||||||
|
|
||||||
|
*We introduce CogVideoX, a large-scale diffusion transformer model designed for generating videos based on text prompts. To efficiently model video data, we propose to leverage a 3D Variational Autoencoder (VAE) to compress videos along both spatial and temporal dimensions. To improve the text-video alignment, we propose an expert transformer with the expert adaptive LayerNorm to facilitate the deep fusion between the two modalities. By employing a progressive training technique, CogVideoX is adept at producing coherent, long-duration videos characterized by significant motion. In addition, we develop an effective text-video data processing pipeline that includes various data preprocessing strategies and a video captioning method. It significantly helps enhance the performance of CogVideoX, improving both generation quality and semantic alignment. Results show that CogVideoX demonstrates state-of-the-art performance across both multiple machine metrics and human evaluations. The model weight of CogVideoX-2B is publicly available at https://github.com/THUDM/CogVideo.*
|
||||||
|
|
||||||
|
<Tip>
|
||||||
|
|
||||||
|
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
|
||||||
|
|
||||||
|
</Tip>
|
||||||
|
|
||||||
|
This pipeline was contributed by [zRzRzRzRzRzRzR](https://github.com/zRzRzRzRzRzRzR). The original codebase can be found [here](https://huggingface.co/THUDM). The original weights can be found under [hf.co/THUDM](https://huggingface.co/THUDM).
|
||||||
|
|
||||||
|
There are three official CogVideoX checkpoints for text-to-video and video-to-video.
|
||||||
|
|
||||||
|
| checkpoints | recommended inference dtype |
|
||||||
|
|:---:|:---:|
|
||||||
|
| [`THUDM/CogVideoX-2b`](https://huggingface.co/THUDM/CogVideoX-2b) | torch.float16 |
|
||||||
|
| [`THUDM/CogVideoX-5b`](https://huggingface.co/THUDM/CogVideoX-5b) | torch.bfloat16 |
|
||||||
|
| [`THUDM/CogVideoX1.5-5b`](https://huggingface.co/THUDM/CogVideoX1.5-5b) | torch.bfloat16 |
|
||||||
|
|
||||||
|
There are two official CogVideoX checkpoints available for image-to-video.
|
||||||
|
|
||||||
|
| checkpoints | recommended inference dtype |
|
||||||
|
|:---:|:---:|
|
||||||
|
| [`THUDM/CogVideoX-5b-I2V`](https://huggingface.co/THUDM/CogVideoX-5b-I2V) | torch.bfloat16 |
|
||||||
|
| [`THUDM/CogVideoX-1.5-5b-I2V`](https://huggingface.co/THUDM/CogVideoX-1.5-5b-I2V) | torch.bfloat16 |
|
||||||
|
|
||||||
|
For the CogVideoX 1.5 series:
|
||||||
|
- Text-to-video (T2V) works best at a resolution of 1360x768 because it was trained with that specific resolution.
|
||||||
|
- Image-to-video (I2V) works for multiple resolutions. The width can vary from 768 to 1360, but the height must be 768. The height/width must be divisible by 16.
|
||||||
|
- Both T2V and I2V models support generation with 81 and 161 frames and work best at this value. Exporting videos at 16 FPS is recommended.
|
||||||
|
|
||||||
|
There are two official CogVideoX checkpoints that support pose controllable generation (by the [Alibaba-PAI](https://huggingface.co/alibaba-pai) team).
|
||||||
|
|
||||||
|
| checkpoints | recommended inference dtype |
|
||||||
|
|:---:|:---:|
|
||||||
|
| [`alibaba-pai/CogVideoX-Fun-V1.1-2b-Pose`](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-2b-Pose) | torch.bfloat16 |
|
||||||
|
| [`alibaba-pai/CogVideoX-Fun-V1.1-5b-Pose`](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-5b-Pose) | torch.bfloat16 |
|
||||||
|
|
||||||
|
## Inference
|
||||||
|
|
||||||
|
Use [`torch.compile`](https://huggingface.co/docs/diffusers/main/en/tutorials/fast_diffusion#torchcompile) to reduce the inference latency.
|
||||||
|
|
||||||
|
First, load the pipeline:
|
||||||
|
|
||||||
|
```python
|
||||||
|
import torch
|
||||||
|
from diffusers import CogVideoXPipeline, CogVideoXImageToVideoPipeline
|
||||||
|
from diffusers.utils import export_to_video,load_image
|
||||||
|
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b").to("cuda") # or "THUDM/CogVideoX-2b"
|
||||||
|
```
|
||||||
|
|
||||||
|
If you are using the image-to-video pipeline, load it as follows:
|
||||||
|
|
||||||
|
```python
|
||||||
|
pipe = CogVideoXImageToVideoPipeline.from_pretrained("THUDM/CogVideoX-5b-I2V").to("cuda")
|
||||||
|
```
|
||||||
|
|
||||||
|
Then change the memory layout of the pipeline's `transformer` component to `torch.channels_last`:
|
||||||
|
|
||||||
|
```python
|
||||||
|
pipe.transformer.to(memory_format=torch.channels_last)
|
||||||
|
```
|
||||||
|
|
||||||
|
Compile the components and run inference:
|
||||||
|
|
||||||
|
```python
|
||||||
|
pipe.transformer = torch.compile(pipe.transformer, mode="max-autotune", fullgraph=True)
|
||||||
|
|
||||||
|
# CogVideoX works well with long and well-described prompts
|
||||||
|
prompt = "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical atmosphere of this unique musical performance."
|
||||||
|
video = pipe(prompt=prompt, guidance_scale=6, num_inference_steps=50).frames[0]
|
||||||
|
```
|
||||||
|
|
||||||
|
The [T2V benchmark](https://gist.github.com/a-r-r-o-w/5183d75e452a368fd17448fcc810bd3f) results on an 80GB A100 machine are:
|
||||||
|
|
||||||
|
```
|
||||||
|
Without torch.compile(): Average inference time: 96.89 seconds.
|
||||||
|
With torch.compile(): Average inference time: 76.27 seconds.
|
||||||
|
```
|
||||||
|
|
||||||
|
### Memory optimization
|
||||||
|
|
||||||
|
CogVideoX-2b requires about 19 GB of GPU memory to decode 49 frames (6 seconds of video at 8 FPS) with an output resolution of 720x480 (W x H), which makes it impossible to run on consumer GPUs or the free-tier T4 Colab. The following memory optimizations can be used to reduce the memory footprint (they are combined in the sketch after the list). For replication, you can refer to [this](https://gist.github.com/a-r-r-o-w/3959a03f15be5c9bd1fe545b09dfcc93) script.
|
||||||
|
|
||||||
|
- `pipe.enable_model_cpu_offload()`:
|
||||||
|
- Without enabling cpu offloading, memory usage is `33 GB`
|
||||||
|
- With enabling cpu offloading, memory usage is `19 GB`
|
||||||
|
- `pipe.enable_sequential_cpu_offload()`:
|
||||||
|
- Similar to `enable_model_cpu_offload` but can significantly reduce memory usage at the cost of slow inference
|
||||||
|
- When enabled, memory usage is under `4 GB`
|
||||||
|
- `pipe.vae.enable_tiling()`:
|
||||||
|
- With enabling cpu offloading and tiling, memory usage is `11 GB`
|
||||||
|
- `pipe.vae.enable_slicing()`
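As a rough sketch, these options (shown here with the 2b checkpoint from the table above; prompt and step count are illustrative) can be combined like this:

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", torch_dtype=torch.float16)

# Offload submodules to the CPU when idle and reduce VAE decode memory
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()

prompt = "A panda playing a guitar in a serene bamboo forest"
video = pipe(prompt=prompt, guidance_scale=6, num_inference_steps=50).frames[0]
export_to_video(video, "output.mp4", fps=8)
```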
|
||||||
|
|
||||||
|
### Quantized inference
|
||||||
|
|
||||||
|
[torchao](https://github.com/pytorch/ao) and [optimum-quanto](https://github.com/huggingface/optimum-quanto/) can be used to quantize the text encoder, transformer and VAE modules to lower the memory requirements. This makes it possible to run the model on a free-tier T4 Colab or lower VRAM GPUs!
|
||||||
|
|
||||||
|
It is also worth noting that torchao quantization is fully compatible with [torch.compile](/optimization/torch2.0#torchcompile), which allows for much faster inference speed. Additionally, models can be serialized and stored in a quantized datatype to save disk space with torchao. Find examples and benchmarks in the gists below.
|
||||||
|
- [torchao](https://gist.github.com/a-r-r-o-w/4d9732d17412888c885480c6521a9897)
|
||||||
|
- [quanto](https://gist.github.com/a-r-r-o-w/31be62828b00a9292821b85c1017effa)
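As a rough illustration of the torchao route, a weight-only int8 sketch is shown below. The function names are torchao's (not diffusers') and may vary with the torchao version; see the gists above for complete, benchmarked scripts:

```python
import torch
from diffusers import CogVideoXPipeline
from torchao.quantization import quantize_, int8_weight_only

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)

# Quantize the heaviest components in place with weight-only int8
quantize_(pipe.text_encoder, int8_weight_only())
quantize_(pipe.transformer, int8_weight_only())
quantize_(pipe.vae, int8_weight_only())

pipe.enable_model_cpu_offload()
```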
|
||||||
|
|
||||||
|
## CogVideoXPipeline
|
||||||
|
|
||||||
|
[[autodoc]] CogVideoXPipeline
|
||||||
|
- all
|
||||||
|
- __call__
|
||||||
|
|
||||||
|
## CogVideoXImageToVideoPipeline
|
||||||
|
|
||||||
|
[[autodoc]] CogVideoXImageToVideoPipeline
|
||||||
|
- all
|
||||||
|
- __call__
|
||||||
|
|
||||||
|
## CogVideoXVideoToVideoPipeline
|
||||||
|
|
||||||
|
[[autodoc]] CogVideoXVideoToVideoPipeline
|
||||||
|
- all
|
||||||
|
- __call__
|
||||||
|
|
||||||
|
## CogVideoXFunControlPipeline
|
||||||
|
|
||||||
|
[[autodoc]] CogVideoXFunControlPipeline
|
||||||
|
- all
|
||||||
|
- __call__
|
||||||
|
|
||||||
|
## CogVideoXPipelineOutput
|
||||||
|
|
||||||
|
[[autodoc]] pipelines.cogvideo.pipeline_output.CogVideoXPipelineOutput
|
||||||
40
docs/source/en/api/pipelines/cogview3.md
Normal file
@@ -0,0 +1,40 @@
|
|||||||
|
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
-->
|
||||||
|
|
||||||
|
# CogView3Plus
|
||||||
|
|
||||||
|
[CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion](https://huggingface.co/papers/2403.05121) from Tsinghua University & ZhipuAI, by Wendi Zheng, Jiayan Teng, Zhuoyi Yang, Weihan Wang, Jidong Chen, Xiaotao Gu, Yuxiao Dong, Ming Ding, Jie Tang.
|
||||||
|
|
||||||
|
The abstract from the paper is:
|
||||||
|
|
||||||
|
*Recent advancements in text-to-image generative systems have been largely driven by diffusion models. However, single-stage text-to-image diffusion models still face challenges, in terms of computational efficiency and the refinement of image details. To tackle the issue, we propose CogView3, an innovative cascaded framework that enhances the performance of text-to-image diffusion. CogView3 is the first model implementing relay diffusion in the realm of text-to-image generation, executing the task by first creating low-resolution images and subsequently applying relay-based super-resolution. This methodology not only results in competitive text-to-image outputs but also greatly reduces both training and inference costs. Our experimental results demonstrate that CogView3 outperforms SDXL, the current state-of-the-art open-source text-to-image diffusion model, by 77.0% in human evaluations, all while requiring only about 1/2 of the inference time. The distilled variant of CogView3 achieves comparable performance while only utilizing 1/10 of the inference time by SDXL.*
|
||||||
|
|
||||||
|
<Tip>
|
||||||
|
|
||||||
|
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
|
||||||
|
|
||||||
|
</Tip>
|
||||||
|
|
||||||
|
This pipeline was contributed by [zRzRzRzRzRzRzR](https://github.com/zRzRzRzRzRzRzR). The original codebase can be found [here](https://huggingface.co/THUDM). The original weights can be found under [hf.co/THUDM](https://huggingface.co/THUDM).
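A minimal usage sketch follows. The checkpoint id `THUDM/CogView3-Plus-3B` and the call arguments are assumptions; check the THUDM organization on the Hub for the exact published checkpoint name:

```python
import torch
from diffusers import CogView3PlusPipeline

# Checkpoint id assumed; replace with the published CogView3-Plus checkpoint
pipe = CogView3PlusPipeline.from_pretrained("THUDM/CogView3-Plus-3B", torch_dtype=torch.bfloat16).to("cuda")

image = pipe(
    prompt="a serene lakeside cabin at sunrise, detailed, photorealistic",
    guidance_scale=7.0,
    num_inference_steps=50,
).images[0]
image.save("cogview3.png")
```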
|
||||||
|
|
||||||
|
## CogView3PlusPipeline
|
||||||
|
|
||||||
|
[[autodoc]] CogView3PlusPipeline
|
||||||
|
- all
|
||||||
|
- __call__
|
||||||
|
|
||||||
|
## CogView3PipelineOutput
|
||||||
|
|
||||||
|
[[autodoc]] pipelines.cogview3.pipeline_output.CogView3PipelineOutput
|
||||||
56
docs/source/en/api/pipelines/controlnet_flux.md
Normal file
@@ -0,0 +1,56 @@
|
|||||||
|
<!--Copyright 2024 The HuggingFace Team, The InstantX Team, and the XLabs Team. All rights reserved.
|
||||||
|
|
||||||
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||||
|
the License. You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||||
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||||
|
specific language governing permissions and limitations under the License.
|
||||||
|
-->
|
||||||
|
|
||||||
|
# ControlNet with Flux.1
|
||||||
|
|
||||||
|
FluxControlNetPipeline is an implementation of ControlNet for Flux.1.
|
||||||
|
|
||||||
|
ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala.
|
||||||
|
|
||||||
|
With a ControlNet model, you can provide an additional control image to condition and control Stable Diffusion generation. For example, if you provide a depth map, the ControlNet model generates an image that'll preserve the spatial information from the depth map. It is a more flexible and accurate way to control the image generation process.
|
||||||
|
|
||||||
|
The abstract from the paper is:
|
||||||
|
|
||||||
|
*We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
|
||||||
|
|
||||||
|
This controlnet code is implemented by [The InstantX Team](https://huggingface.co/InstantX). You can find pre-trained checkpoints for Flux-ControlNet in the table below:
|
||||||
|
|
||||||
|
|
||||||
|
| ControlNet type | Developer | Link |
|
||||||
|
| -------- | ---------- | ---- |
|
||||||
|
| Canny | [The InstantX Team](https://huggingface.co/InstantX) | [Link](https://huggingface.co/InstantX/FLUX.1-dev-Controlnet-Canny) |
|
||||||
|
| Depth | [The InstantX Team](https://huggingface.co/InstantX) | [Link](https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Depth) |
|
||||||
|
| Union | [The InstantX Team](https://huggingface.co/InstantX) | [Link](https://huggingface.co/InstantX/FLUX.1-dev-Controlnet-Union) |
|
||||||
|
|
||||||
|
XLabs ControlNets are also supported; these were contributed by the [XLabs team](https://huggingface.co/XLabs-AI).
|
||||||
|
|
||||||
|
| ControlNet type | Developer | Link |
|
||||||
|
| -------- | ---------- | ---- |
|
||||||
|
| Canny | [The XLabs Team](https://huggingface.co/XLabs-AI) | [Link](https://huggingface.co/XLabs-AI/flux-controlnet-canny-diffusers) |
|
||||||
|
| Depth | [The XLabs Team](https://huggingface.co/XLabs-AI) | [Link](https://huggingface.co/XLabs-AI/flux-controlnet-depth-diffusers) |
|
||||||
|
| HED | [The XLabs Team](https://huggingface.co/XLabs-AI) | [Link](https://huggingface.co/XLabs-AI/flux-controlnet-hed-diffusers) |
|
||||||
|
|
||||||
|
|
||||||
|
<Tip>
|
||||||
|
|
||||||
|
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
|
||||||
|
|
||||||
|
</Tip>
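A rough sketch of using the InstantX Canny checkpoint from the table above is shown below. The base model id, conditioning scale, and other call arguments are illustrative assumptions, and you must supply your own pre-computed canny edge map:

```python
import torch
from diffusers import FluxControlNetModel, FluxControlNetPipeline
from diffusers.utils import load_image

# Canny ControlNet from the InstantX table above
controlnet = FluxControlNetModel.from_pretrained(
    "InstantX/FLUX.1-dev-Controlnet-Canny", torch_dtype=torch.bfloat16
)
pipe = FluxControlNetPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", controlnet=controlnet, torch_dtype=torch.bfloat16
).to("cuda")

# Supply your own pre-computed canny edge map (path is a placeholder)
control_image = load_image("path/to/canny_edge_map.png")

image = pipe(
    prompt="a futuristic city skyline at dusk, cinematic lighting",
    control_image=control_image,
    controlnet_conditioning_scale=0.6,
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("flux_controlnet.png")
```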
|
||||||
|
|
||||||
|
## FluxControlNetPipeline
|
||||||
|
[[autodoc]] FluxControlNetPipeline
|
||||||
|
- all
|
||||||
|
- __call__
|
||||||
|
|
||||||
|
|
||||||
|
## FluxPipelineOutput
|
||||||
|
[[autodoc]] pipelines.flux.pipeline_output.FluxPipelineOutput
|
||||||
36
docs/source/en/api/pipelines/controlnet_hunyuandit.md
Normal file
@@ -0,0 +1,36 @@
|
|||||||
|
<!--Copyright 2024 The HuggingFace Team and Tencent Hunyuan Team. All rights reserved.
|
||||||
|
|
||||||
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||||
|
the License. You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||||
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||||
|
specific language governing permissions and limitations under the License.
|
||||||
|
-->
|
||||||
|
|
||||||
|
# ControlNet with Hunyuan-DiT
|
||||||
|
|
||||||
|
HunyuanDiTControlNetPipeline is an implementation of ControlNet for [Hunyuan-DiT](https://arxiv.org/abs/2405.08748).
|
||||||
|
|
||||||
|
ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala.
|
||||||
|
|
||||||
|
With a ControlNet model, you can provide an additional control image to condition and control Hunyuan-DiT generation. For example, if you provide a depth map, the ControlNet model generates an image that'll preserve the spatial information from the depth map. It is a more flexible and accurate way to control the image generation process.
|
||||||
|
|
||||||
|
The abstract from the paper is:
|
||||||
|
|
||||||
|
*We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*
|
||||||
|
|
||||||
|
This code is implemented by Tencent Hunyuan Team. You can find pre-trained checkpoints for Hunyuan-DiT ControlNets on [Tencent Hunyuan](https://huggingface.co/Tencent-Hunyuan).
|
||||||
|
|
||||||
|
<Tip>
|
||||||
|
|
||||||
|
Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
|
||||||
|
|
||||||
|
</Tip>
|
||||||
|
|
||||||
|
## HunyuanDiTControlNetPipeline
|
||||||
|
[[autodoc]] HunyuanDiTControlNetPipeline
|
||||||
|
- all
|
||||||
|
- __call__
|
||||||
54
docs/source/en/api/pipelines/controlnet_sd3.md
Normal file
@@ -0,0 +1,54 @@
|
|||||||
|
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
|
||||||
|
|
||||||
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||||
|
the License. You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||||
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||||
|
specific language governing permissions and limitations under the License.
|
||||||
|
-->
|
||||||
|
|
||||||
|
# ControlNet with Stable Diffusion 3
|
||||||
|
|
||||||
|
StableDiffusion3ControlNetPipeline is an implementation of ControlNet for Stable Diffusion 3.
|
||||||
|
|
||||||
|
ControlNet was introduced in [Adding Conditional Control to Text-to-Image Diffusion Models](https://huggingface.co/papers/2302.05543) by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala.
|
||||||
|
|
||||||
|
With a ControlNet model, you can provide an additional control image to condition and control Stable Diffusion generation. For example, if you provide a depth map, the ControlNet model generates an image that'll preserve the spatial information from the depth map. It is a more flexible and accurate way to control the image generation process.
|
||||||
|
|
||||||
|
The abstract from the paper is:
|
||||||
|
|
||||||
|
*We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.*

This ControlNet code is mainly implemented by [The InstantX Team](https://huggingface.co/InstantX). The inpainting-related code was developed by [The Alimama Creative Team](https://huggingface.co/alimama-creative). You can find pre-trained checkpoints for SD3-ControlNet in the table below:

| ControlNet type | Developer | Link |
| -------- | ---------- | ---- |
| Canny | [The InstantX Team](https://huggingface.co/InstantX) | [Link](https://huggingface.co/InstantX/SD3-Controlnet-Canny) |
| Depth | [The InstantX Team](https://huggingface.co/InstantX) | [Link](https://huggingface.co/InstantX/SD3-Controlnet-Depth) |
| Pose | [The InstantX Team](https://huggingface.co/InstantX) | [Link](https://huggingface.co/InstantX/SD3-Controlnet-Pose) |
| Tile | [The InstantX Team](https://huggingface.co/InstantX) | [Link](https://huggingface.co/InstantX/SD3-Controlnet-Tile) |
| Inpainting | [The Alimama Creative Team](https://huggingface.co/alimama-creative) | [Link](https://huggingface.co/alimama-creative/SD3-Controlnet-Inpainting) |

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>
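As a quick orientation, the sketch below wires the Canny checkpoint from the table into the pipeline. It is an illustrative addition: the base-model id, control image path, and parameter values are assumptions, not settings prescribed by this page.

```python
import torch
from diffusers import SD3ControlNetModel, StableDiffusion3ControlNetPipeline
from diffusers.utils import load_image

# Assumed checkpoints: the InstantX Canny ControlNet and the SD3 medium base model.
controlnet = SD3ControlNetModel.from_pretrained("InstantX/SD3-Controlnet-Canny", torch_dtype=torch.float16)
pipe = StableDiffusion3ControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# Hypothetical pre-computed Canny edge image; substitute your own condition.
control_image = load_image("path/to/canny_edges.png")

image = pipe(
    prompt="A bird perched on a mossy branch, soft morning light, photorealistic",
    control_image=control_image,
    controlnet_conditioning_scale=0.7,
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
image.save("sd3_controlnet.png")
```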
## StableDiffusion3ControlNetPipeline
[[autodoc]] StableDiffusion3ControlNetPipeline
- all
- __call__

## StableDiffusion3ControlNetInpaintingPipeline
[[autodoc]] pipelines.controlnet_sd3.pipeline_stable_diffusion_3_controlnet_inpainting.StableDiffusion3ControlNetInpaintingPipeline
- all
- __call__

## StableDiffusion3PipelineOutput
[[autodoc]] pipelines.stable_diffusion_3.pipeline_output.StableDiffusion3PipelineOutput
347  docs/source/en/api/pipelines/flux.md  Normal file
@@ -0,0 +1,347 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Flux

Flux is a series of text-to-image generation models based on diffusion transformers. To learn more about Flux, check out the original [blog post](https://blackforestlabs.ai/announcing-black-forest-labs/) by the creators of Flux, Black Forest Labs.

Original model checkpoints for Flux can be found [here](https://huggingface.co/black-forest-labs). Original inference code can be found [here](https://github.com/black-forest-labs/flux).

<Tip>

Flux can be quite expensive to run on consumer hardware devices. However, you can perform a suite of optimizations to run it faster and in a more memory-friendly manner. Check out [this section](https://huggingface.co/blog/sd3#memory-optimizations-for-sd3) for more details. Additionally, Flux can benefit from quantization for memory efficiency with a trade-off in inference latency. Refer to [this blog post](https://huggingface.co/blog/quanto-diffusers) to learn more. For an exhaustive list of resources, check out [this gist](https://gist.github.com/sayakpaul/b664605caf0aa3bf8585ab109dd5ac9c).

</Tip>

Flux comes in the following variants:

| model type | model id |
|:----------:|:--------:|
| Timestep-distilled | [`black-forest-labs/FLUX.1-schnell`](https://huggingface.co/black-forest-labs/FLUX.1-schnell) |
| Guidance-distilled | [`black-forest-labs/FLUX.1-dev`](https://huggingface.co/black-forest-labs/FLUX.1-dev) |
| Fill Inpainting/Outpainting (Guidance-distilled) | [`black-forest-labs/FLUX.1-Fill-dev`](https://huggingface.co/black-forest-labs/FLUX.1-Fill-dev) |
| Canny Control (Guidance-distilled) | [`black-forest-labs/FLUX.1-Canny-dev`](https://huggingface.co/black-forest-labs/FLUX.1-Canny-dev) |
| Depth Control (Guidance-distilled) | [`black-forest-labs/FLUX.1-Depth-dev`](https://huggingface.co/black-forest-labs/FLUX.1-Depth-dev) |
| Canny Control (LoRA) | [`black-forest-labs/FLUX.1-Canny-dev-lora`](https://huggingface.co/black-forest-labs/FLUX.1-Canny-dev-lora) |
| Depth Control (LoRA) | [`black-forest-labs/FLUX.1-Depth-dev-lora`](https://huggingface.co/black-forest-labs/FLUX.1-Depth-dev-lora) |
| Redux (Adapter) | [`black-forest-labs/FLUX.1-Redux-dev`](https://huggingface.co/black-forest-labs/FLUX.1-Redux-dev) |

Each checkpoint has a different intended usage, which we detail below.
### Timestep-distilled

* `max_sequence_length` cannot be more than 256.
* `guidance_scale` needs to be 0.
* As this is a timestep-distilled model, it benefits from fewer sampling steps.

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()

prompt = "A cat holding a sign that says hello world"
out = pipe(
    prompt=prompt,
    guidance_scale=0.,
    height=768,
    width=1360,
    num_inference_steps=4,
    max_sequence_length=256,
).images[0]
out.save("image.png")
```

### Guidance-distilled

* The guidance-distilled variant takes about 50 sampling steps for good-quality generation.
* It doesn't have any limitations around the `max_sequence_length`.

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()

prompt = "a tiny astronaut hatching from an egg on the moon"
out = pipe(
    prompt=prompt,
    guidance_scale=3.5,
    height=768,
    width=1360,
    num_inference_steps=50,
).images[0]
out.save("image.png")
```
### Fill Inpainting/Outpainting

* The Flux Fill pipeline does not require `strength` as an input, unlike regular inpainting pipelines.
* It supports both inpainting and outpainting.

```python
import torch
from diffusers import FluxFillPipeline
from diffusers.utils import load_image

image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/cup.png")
mask = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/cup_mask.png")

repo_id = "black-forest-labs/FLUX.1-Fill-dev"
pipe = FluxFillPipeline.from_pretrained(repo_id, torch_dtype=torch.bfloat16).to("cuda")

image = pipe(
    prompt="a white paper cup",
    image=image,
    mask_image=mask,
    height=1632,
    width=1232,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]
image.save("output.png")
```
### Canny Control

**Note:** `black-forest-labs/FLUX.1-Canny-dev` is _not_ a [`ControlNetModel`] model. ControlNet models are a separate component from the UNet/Transformer whose residuals are added to the actual underlying model. Canny Control is an alternate architecture that achieves effectively the same results as a ControlNet model would, by channel-wise concatenation of the input control condition, which ensures the transformer learns structure control by following the condition as closely as possible.

```python
# !pip install -U controlnet-aux
import torch
from controlnet_aux import CannyDetector
from diffusers import FluxControlPipeline
from diffusers.utils import load_image

pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-Canny-dev", torch_dtype=torch.bfloat16).to("cuda")

prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

processor = CannyDetector()
control_image = processor(control_image, low_threshold=50, high_threshold=200, detect_resolution=1024, image_resolution=1024)

image = pipe(
    prompt=prompt,
    control_image=control_image,
    height=1024,
    width=1024,
    num_inference_steps=50,
    guidance_scale=30.0,
).images[0]
image.save("output.png")
```
### Depth Control

**Note:** `black-forest-labs/FLUX.1-Depth-dev` is _not_ a ControlNet model. [`ControlNetModel`] models are a separate component from the UNet/Transformer whose residuals are added to the actual underlying model. Depth Control is an alternate architecture that achieves effectively the same results as a ControlNet model would, by channel-wise concatenation of the input control condition, which ensures the transformer learns structure control by following the condition as closely as possible.

```python
# !pip install git+https://github.com/huggingface/image_gen_aux
import torch
from diffusers import FluxControlPipeline
from diffusers.utils import load_image
from image_gen_aux import DepthPreprocessor

pipe = FluxControlPipeline.from_pretrained("black-forest-labs/FLUX.1-Depth-dev", torch_dtype=torch.bfloat16).to("cuda")

prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

processor = DepthPreprocessor.from_pretrained("LiheYoung/depth-anything-large-hf")
control_image = processor(control_image)[0].convert("RGB")

image = pipe(
    prompt=prompt,
    control_image=control_image,
    height=1024,
    width=1024,
    num_inference_steps=30,
    guidance_scale=10.0,
    generator=torch.Generator().manual_seed(42),
).images[0]
image.save("output.png")
```
### Redux

* The Flux Redux pipeline is an adapter for FLUX.1 base models. It can be used with both flux-dev and flux-schnell, for image-to-image generation.
* You can first use the `FluxPriorReduxPipeline` to get the `prompt_embeds` and `pooled_prompt_embeds`, and then feed them into the `FluxPipeline` for image-to-image generation.
* When using `FluxPriorReduxPipeline` with a base pipeline, you can set `text_encoder=None` and `text_encoder_2=None` in the base pipeline to save VRAM.

```python
import torch
from diffusers import FluxPriorReduxPipeline, FluxPipeline
from diffusers.utils import load_image

device = "cuda"
dtype = torch.bfloat16

repo_redux = "black-forest-labs/FLUX.1-Redux-dev"
repo_base = "black-forest-labs/FLUX.1-dev"
pipe_prior_redux = FluxPriorReduxPipeline.from_pretrained(repo_redux, torch_dtype=dtype).to(device)
pipe = FluxPipeline.from_pretrained(
    repo_base,
    text_encoder=None,
    text_encoder_2=None,
    torch_dtype=torch.bfloat16
).to(device)

image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/style_ziggy/img5.png")
pipe_prior_output = pipe_prior_redux(image)
images = pipe(
    guidance_scale=2.5,
    num_inference_steps=50,
    generator=torch.Generator("cpu").manual_seed(0),
    **pipe_prior_output,
).images
images[0].save("flux-redux.png")
```
## Running FP16 inference

Flux can generate high-quality images with FP16 (i.e. to accelerate inference on Turing/Volta GPUs) but produces different outputs compared to FP32/BF16. The issue is that some activations in the text encoders have to be clipped when running in FP16, which affects the overall image. Forcing text encoders to run with FP32 inference thus removes this output difference. See [here](https://github.com/huggingface/diffusers/pull/9097#issuecomment-2272292516) for details.

FP16 inference code:
```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16) # can replace schnell with dev
# to run on low vram GPUs (i.e. between 4 and 32 GB VRAM)
pipe.enable_sequential_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

pipe.to(torch.float16) # casting here instead of in the pipeline constructor because doing so in the constructor loads all models into CPU memory at once

prompt = "A cat holding a sign that says hello world"
out = pipe(
    prompt=prompt,
    guidance_scale=0.,
    height=768,
    width=1360,
    num_inference_steps=4,
    max_sequence_length=256,
).images[0]
out.save("image.png")
```
## Single File Loading for the `FluxTransformer2DModel`

The `FluxTransformer2DModel` supports loading checkpoints in the original format shipped by Black Forest Labs. This is also useful when trying to load finetunes or quantized versions of the models that have been published by the community.

<Tip>
`FP8` inference can be brittle depending on the GPU type, CUDA version, and `torch` version that you are using. It is recommended that you use the `optimum-quanto` library in order to run FP8 inference on your machine.
</Tip>

The following example demonstrates how to run Flux with less than 16GB of VRAM.

First install `optimum-quanto`:

```shell
pip install optimum-quanto
```

Then run the following example:

```python
import torch
from diffusers import FluxTransformer2DModel, FluxPipeline
from transformers import T5EncoderModel
from optimum.quanto import freeze, qfloat8, quantize

bfl_repo = "black-forest-labs/FLUX.1-dev"
dtype = torch.bfloat16

transformer = FluxTransformer2DModel.from_single_file("https://huggingface.co/Kijai/flux-fp8/blob/main/flux1-dev-fp8.safetensors", torch_dtype=dtype)
quantize(transformer, weights=qfloat8)
freeze(transformer)

text_encoder_2 = T5EncoderModel.from_pretrained(bfl_repo, subfolder="text_encoder_2", torch_dtype=dtype)
quantize(text_encoder_2, weights=qfloat8)
freeze(text_encoder_2)

pipe = FluxPipeline.from_pretrained(bfl_repo, transformer=None, text_encoder_2=None, torch_dtype=dtype)
pipe.transformer = transformer
pipe.text_encoder_2 = text_encoder_2

pipe.enable_model_cpu_offload()

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
    guidance_scale=3.5,
    output_type="pil",
    num_inference_steps=20,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]

image.save("flux-fp8-dev.png")
```
## FluxPipeline

[[autodoc]] FluxPipeline
- all
- __call__

## FluxImg2ImgPipeline

[[autodoc]] FluxImg2ImgPipeline
- all
- __call__

## FluxInpaintPipeline

[[autodoc]] FluxInpaintPipeline
- all
- __call__

## FluxControlNetInpaintPipeline

[[autodoc]] FluxControlNetInpaintPipeline
- all
- __call__

## FluxControlNetImg2ImgPipeline

[[autodoc]] FluxControlNetImg2ImgPipeline
- all
- __call__

## FluxControlPipeline

[[autodoc]] FluxControlPipeline
- all
- __call__

## FluxControlImg2ImgPipeline

[[autodoc]] FluxControlImg2ImgPipeline
- all
- __call__

## FluxPriorReduxPipeline

[[autodoc]] FluxPriorReduxPipeline
- all
- __call__

## FluxFillPipeline

[[autodoc]] FluxFillPipeline
- all
- __call__
101  docs/source/en/api/pipelines/hunyuandit.md  Normal file
@@ -0,0 +1,101 @@
<!--Copyright 2024 The HuggingFace Team and Tencent Hunyuan Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Hunyuan-DiT

![chinese elements understanding](https://github.com/gnobitab/diffusers-hunyuan/assets/1157982/39b99036-c3cb-4f16-bb1a-40ec25eda573)

[Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding](https://arxiv.org/abs/2405.08748) from Tencent Hunyuan.

The abstract from the paper is:

*We present Hunyuan-DiT, a text-to-image diffusion transformer with fine-grained understanding of both English and Chinese. To construct Hunyuan-DiT, we carefully design the transformer structure, text encoder, and positional encoding. We also build from scratch a whole data pipeline to update and evaluate data for iterative model optimization. For fine-grained language understanding, we train a Multimodal Large Language Model to refine the captions of the images. Finally, Hunyuan-DiT can perform multi-turn multimodal dialogue with users, generating and refining images according to the context. Through our holistic human evaluation protocol with more than 50 professional human evaluators, Hunyuan-DiT sets a new state-of-the-art in Chinese-to-image generation compared with other open-source models.*

You can find the original codebase at [Tencent/HunyuanDiT](https://github.com/Tencent/HunyuanDiT) and all the available checkpoints at [Tencent-Hunyuan](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT).

**Highlights**: HunyuanDiT supports Chinese/English-to-image and multi-resolution generation.

HunyuanDiT has the following components:
* It uses a diffusion transformer as the backbone
* It combines two text encoders, a bilingual CLIP and a multilingual T5 encoder

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>

<Tip>

You can further improve generation quality by passing the generated image from [`HunyuanDiTPipeline`] to the [SDXL refiner](../../using-diffusers/sdxl#base-to-refiner-model) model.

</Tip>

## Optimization

You can optimize the pipeline's runtime and memory consumption with torch.compile and feed-forward chunking. To learn about other optimization methods, check out the [Speed up inference](../../optimization/fp16) and [Reduce memory usage](../../optimization/memory) guides.

### Inference

Use [`torch.compile`](https://huggingface.co/docs/diffusers/main/en/tutorials/fast_diffusion#torchcompile) to reduce the inference latency.

First, load the pipeline:

```python
from diffusers import HunyuanDiTPipeline
import torch

pipeline = HunyuanDiTPipeline.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-Diffusers", torch_dtype=torch.float16
).to("cuda")
```

Then change the memory layout of the pipeline's `transformer` and `vae` components to `torch.channels_last`:

```python
pipeline.transformer.to(memory_format=torch.channels_last)
pipeline.vae.to(memory_format=torch.channels_last)
```

Finally, compile the components and run inference:

```python
pipeline.transformer = torch.compile(pipeline.transformer, mode="max-autotune", fullgraph=True)
pipeline.vae.decode = torch.compile(pipeline.vae.decode, mode="max-autotune", fullgraph=True)

image = pipeline(prompt="一个宇航员在骑马").images[0]
```

The [benchmark](https://gist.github.com/sayakpaul/29d3a14905cfcbf611fe71ebd22e9b23) results on an 80GB A100 machine are:

```bash
With torch.compile(): Average inference time: 12.470 seconds.
Without torch.compile(): Average inference time: 20.570 seconds.
```

### Memory optimization

By loading the T5 text encoder in 8 bits, you can run the pipeline in just under 6 GB of GPU VRAM. Refer to [this script](https://gist.github.com/sayakpaul/3154605f6af05b98a41081aaba5ca43e) for details.
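As a rough sketch of that approach (not the exact script linked above), the T5 encoder can be loaded in 8-bit through `bitsandbytes` and passed into the pipeline. The checkpoint id and the offloading call are assumptions and may need adjusting to your diffusers/transformers versions.

```python
import torch
from transformers import T5EncoderModel, BitsAndBytesConfig
from diffusers import HunyuanDiTPipeline

# Assumes `bitsandbytes` is installed; load only the mT5 encoder in 8-bit.
text_encoder_2 = T5EncoderModel.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-Diffusers",
    subfolder="text_encoder_2",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)
pipeline = HunyuanDiTPipeline.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-Diffusers",
    text_encoder_2=text_encoder_2,
    torch_dtype=torch.float16,
)
pipeline.enable_model_cpu_offload()

image = pipeline(prompt="一个宇航员在骑马").images[0]
image.save("hunyuan_8bit_t5.png")
```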
Furthermore, you can use the [`~HunyuanDiT2DModel.enable_forward_chunking`] method to reduce memory usage. Feed-forward chunking runs the feed-forward layers in a transformer block in a loop instead of all at once. This gives you a trade-off between memory consumption and inference runtime.

```diff
+ pipeline.transformer.enable_forward_chunking(chunk_size=1, dim=1)
```

## HunyuanDiTPipeline

[[autodoc]] HunyuanDiTPipeline
- all
- __call__
@@ -47,6 +47,7 @@ Sample output with I2VGenXL:
 * Unlike SVD, it additionally accepts text prompts as inputs.
 * It can generate higher resolution videos.
 * When using the [`DDIMScheduler`] (which is default for this pipeline), less than 50 steps for inference leads to bad results.
+* This implementation is 1-stage variant of I2VGenXL. The main figure in the [I2VGen-XL](https://arxiv.org/abs/2311.04145) paper shows a 2-stage variant, however, 1-stage variant works well. See [this discussion](https://github.com/huggingface/diffusers/discussions/7952) for more details.
 
 ## I2VGenXLPipeline
 [[autodoc]] I2VGenXLPipeline
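To make the note about sampling steps concrete, here is a sketch of calling the pipeline with 50 DDIM steps. It is an illustrative addition, not part of the diff: the checkpoint id, image path, and prompt are assumptions.

```python
import torch
from diffusers import I2VGenXLPipeline
from diffusers.utils import export_to_gif, load_image

pipeline = I2VGenXLPipeline.from_pretrained("ali-vilab/i2vgen-xl", torch_dtype=torch.float16, variant="fp16")
pipeline.enable_model_cpu_offload()

# Hypothetical conditioning image; any RGB image of a suitable resolution works.
image = load_image("path/to/conditioning_image.png").convert("RGB")

frames = pipeline(
    prompt="Papers were floating in the air on a table in the library",
    image=image,
    num_inference_steps=50,  # fewer DDIM steps tend to degrade quality, per the note above
    guidance_scale=9.0,
    generator=torch.manual_seed(0),
).frames[0]
export_to_gif(frames, "i2v.gif")
```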
@@ -11,7 +11,7 @@ specific language governing permissions and limitations under the License.
 
 Kandinsky 3 is created by [Vladimir Arkhipkin](https://github.com/oriBetelgeuse),[Anastasia Maltseva](https://github.com/NastyaMittseva),[Igor Pavlov](https://github.com/boomb0om),[Andrei Filatov](https://github.com/anvilarth),[Arseniy Shakhmatov](https://github.com/cene555),[Andrey Kuznetsov](https://github.com/kuznetsoffandrey),[Denis Dimitrov](https://github.com/denndimitrov), [Zein Shaheen](https://github.com/zeinsh)
 
-The description from it's Github page:
+The description from it's GitHub page:
 
 *Kandinsky 3.0 is an open-source text-to-image diffusion model built upon the Kandinsky2-x model family. In comparison to its predecessors, enhancements have been made to the text understanding and visual quality of the model, achieved by increasing the size of the text encoder and Diffusion U-Net models, respectively.*
 
115  docs/source/en/api/pipelines/kolors.md  Normal file
@@ -0,0 +1,115 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis

![Kolors banner image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/kolors/kolors_header_collage.png)

Kolors is a large-scale text-to-image generation model based on latent diffusion, developed by [the Kuaishou Kolors team](https://github.com/Kwai-Kolors/Kolors). Trained on billions of text-image pairs, Kolors exhibits significant advantages over both open-source and closed-source models in visual quality, complex semantic accuracy, and text rendering for both Chinese and English characters. Furthermore, Kolors supports both Chinese and English inputs, demonstrating strong performance in understanding and generating Chinese-specific content. For more details, please refer to this [technical report](https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/Kolors_paper.pdf).

The abstract from the technical report is:

*We present Kolors, a latent diffusion model for text-to-image synthesis, characterized by its profound understanding of both English and Chinese, as well as an impressive degree of photorealism. There are three key insights contributing to the development of Kolors. Firstly, unlike large language model T5 used in Imagen and Stable Diffusion 3, Kolors is built upon the General Language Model (GLM), which enhances its comprehension capabilities in both English and Chinese. Moreover, we employ a multimodal large language model to recaption the extensive training dataset for fine-grained text understanding. These strategies significantly improve Kolors’ ability to comprehend intricate semantics, particularly those involving multiple entities, and enable its advanced text rendering capabilities. Secondly, we divide the training of Kolors into two phases: the concept learning phase with broad knowledge and the quality improvement phase with specifically curated high-aesthetic data. Furthermore, we investigate the critical role of the noise schedule and introduce a novel schedule to optimize high-resolution image generation. These strategies collectively enhance the visual appeal of the generated high-resolution images. Lastly, we propose a category-balanced benchmark KolorsPrompts, which serves as a guide for the training and evaluation of Kolors. Consequently, even when employing the commonly used U-Net backbone, Kolors has demonstrated remarkable performance in human evaluations, surpassing the existing open-source models and achieving Midjourney-v6 level performance, especially in terms of visual appeal. We will release the code and weights of Kolors at <https://github.com/Kwai-Kolors/Kolors>, and hope that it will benefit future research and applications in the visual generation community.*

## Usage Example

```python
import torch

from diffusers import DPMSolverMultistepScheduler, KolorsPipeline

pipe = KolorsPipeline.from_pretrained("Kwai-Kolors/Kolors-diffusers", torch_dtype=torch.float16, variant="fp16")
pipe.to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, use_karras_sigmas=True)

image = pipe(
    prompt='一张瓢虫的照片,微距,变焦,高质量,电影,拿着一个牌子,写着"可图"',
    negative_prompt="",
    guidance_scale=6.5,
    num_inference_steps=25,
).images[0]

image.save("kolors_sample.png")
```

### IP Adapter

Kolors needs a different IP Adapter to work, and it uses [Openai-CLIP-336](https://huggingface.co/openai/clip-vit-large-patch14-336) as an image encoder.

<Tip>

Using an IP Adapter with Kolors requires more than 24GB of VRAM. To use it, we recommend using [`~DiffusionPipeline.enable_model_cpu_offload`] on consumer GPUs.

</Tip>

<Tip>

While Kolors is integrated into Diffusers, you need to load the image encoder from a revision to use the safetensor files. You can still use the main branch of the original repository if you're comfortable loading pickle checkpoints.

</Tip>

```python
import torch
from transformers import CLIPVisionModelWithProjection

from diffusers import DPMSolverMultistepScheduler, KolorsPipeline
from diffusers.utils import load_image

image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "Kwai-Kolors/Kolors-IP-Adapter-Plus",
    subfolder="image_encoder",
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    revision="refs/pr/4",
)

pipe = KolorsPipeline.from_pretrained(
    "Kwai-Kolors/Kolors-diffusers", image_encoder=image_encoder, torch_dtype=torch.float16, variant="fp16"
)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, use_karras_sigmas=True)

pipe.load_ip_adapter(
    "Kwai-Kolors/Kolors-IP-Adapter-Plus",
    subfolder="",
    weight_name="ip_adapter_plus_general.safetensors",
    revision="refs/pr/4",
    image_encoder_folder=None,
)
pipe.enable_model_cpu_offload()

ipa_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/kolors/cat_square.png")

image = pipe(
    prompt="best quality, high quality",
    negative_prompt="",
    guidance_scale=6.5,
    num_inference_steps=25,
    ip_adapter_image=ipa_image,
).images[0]

image.save("kolors_ipa_sample.png")
```

## KolorsPipeline

[[autodoc]] KolorsPipeline

- all
- __call__

## KolorsImg2ImgPipeline

[[autodoc]] KolorsImg2ImgPipeline

- all
- __call__
77  docs/source/en/api/pipelines/latte.md  Normal file
@@ -0,0 +1,77 @@
<!-- # Copyright 2024 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License. -->

# Latte

![latte text-to-video](https://github.com/Vchitect/Latte/blob/52bc0029899babbd6e9250384c83d8ed2670ff7a/visuals/latte.gif?raw=true)

[Latte: Latent Diffusion Transformer for Video Generation](https://arxiv.org/abs/2401.03048) from Monash University, Shanghai AI Lab, Nanjing University, and Nanyang Technological University.

The abstract from the paper is:

*We propose a novel Latent Diffusion Transformer, namely Latte, for video generation. Latte first extracts spatio-temporal tokens from input videos and then adopts a series of Transformer blocks to model video distribution in the latent space. In order to model a substantial number of tokens extracted from videos, four efficient variants are introduced from the perspective of decomposing the spatial and temporal dimensions of input videos. To improve the quality of generated videos, we determine the best practices of Latte through rigorous experimental analysis, including video clip patch embedding, model variants, timestep-class information injection, temporal positional embedding, and learning strategies. Our comprehensive evaluation demonstrates that Latte achieves state-of-the-art performance across four standard video generation datasets, i.e., FaceForensics, SkyTimelapse, UCF101, and Taichi-HD. In addition, we extend Latte to text-to-video generation (T2V) task, where Latte achieves comparable results compared to recent T2V models. We strongly believe that Latte provides valuable insights for future research on incorporating Transformers into diffusion models for video generation.*

**Highlights**: Latte is a latent diffusion transformer proposed as a backbone for modeling different modalities (trained for text-to-video generation here). It achieves state-of-the-art performance across four standard video benchmarks - [FaceForensics](https://arxiv.org/abs/1803.09179), [SkyTimelapse](https://arxiv.org/abs/1709.07592), [UCF101](https://arxiv.org/abs/1212.0402) and [Taichi-HD](https://arxiv.org/abs/2003.00196). To prepare and download the datasets for evaluation, please refer to [this document](https://github.com/Vchitect/Latte/blob/main/docs/datasets_evaluation.md).

This pipeline was contributed by [maxin-cn](https://github.com/maxin-cn). The original codebase can be found [here](https://github.com/Vchitect/Latte). The original weights can be found under [hf.co/maxin-cn](https://huggingface.co/maxin-cn).

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>

### Inference

Use [`torch.compile`](https://huggingface.co/docs/diffusers/main/en/tutorials/fast_diffusion#torchcompile) to reduce the inference latency.

First, load the pipeline:

```python
import torch
from diffusers import LattePipeline

pipeline = LattePipeline.from_pretrained(
    "maxin-cn/Latte-1", torch_dtype=torch.float16
).to("cuda")
```

Then change the memory layout of the pipeline's `transformer` and `vae` components to `torch.channels_last`:

```python
pipeline.transformer.to(memory_format=torch.channels_last)
pipeline.vae.to(memory_format=torch.channels_last)
```

Finally, compile the components and run inference:

```python
pipeline.transformer = torch.compile(pipeline.transformer)
pipeline.vae.decode = torch.compile(pipeline.vae.decode)

video = pipeline(prompt="A dog wearing sunglasses floating in space, surreal, nebulae in background").frames[0]
```
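The call above returns a list of frames but does not save them. Assuming the default PIL output, they can be written to disk with the `export_to_gif` utility (a small illustrative addition, not part of the original snippet):

```python
from diffusers.utils import export_to_gif

# `video` is the list of frames produced by the pipeline call above.
export_to_gif(video, "latte.gif")
```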
The [benchmark](https://gist.github.com/a-r-r-o-w/4e1694ca46374793c0361d740a99ff19) results on an 80GB A100 machine are:

```
Without torch.compile(): Average inference time: 16.246 seconds.
With torch.compile(): Average inference time: 14.573 seconds.
```

## LattePipeline

[[autodoc]] LattePipeline
- all
- __call__
90  docs/source/en/api/pipelines/lumina.md  Normal file
@@ -0,0 +1,90 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Lumina-T2X

![concepts](https://github.com/Alpha-VLLM/Lumina-T2X/assets/54879512/9f52eabb-07dc-4881-8257-6d8a5f2a0a5a)

[Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT](https://github.com/Alpha-VLLM/Lumina-T2X/blob/main/assets/lumina-next.pdf) from Alpha-VLLM, OpenGVLab, Shanghai AI Laboratory.

The abstract from the paper is:

*Lumina-T2X is a nascent family of Flow-based Large Diffusion Transformers (Flag-DiT) that establishes a unified framework for transforming noise into various modalities, such as images and videos, conditioned on text instructions. Despite its promising capabilities, Lumina-T2X still encounters challenges including training instability, slow inference, and extrapolation artifacts. In this paper, we present Lumina-Next, an improved version of Lumina-T2X, showcasing stronger generation performance with increased training and inference efficiency. We begin with a comprehensive analysis of the Flag-DiT architecture and identify several suboptimal components, which we address by introducing the Next-DiT architecture with 3D RoPE and sandwich normalizations. To enable better resolution extrapolation, we thoroughly compare different context extrapolation methods applied to text-to-image generation with 3D RoPE, and propose Frequency- and Time-Aware Scaled RoPE tailored for diffusion transformers. Additionally, we introduce a sigmoid time discretization schedule to reduce sampling steps in solving the Flow ODE and the Context Drop method to merge redundant visual tokens for faster network evaluation, effectively boosting the overall sampling speed. Thanks to these improvements, Lumina-Next not only improves the quality and efficiency of basic text-to-image generation but also demonstrates superior resolution extrapolation capabilities and multilingual generation using decoder-based LLMs as the text encoder, all in a zero-shot manner. To further validate Lumina-Next as a versatile generative framework, we instantiate it on diverse tasks including visual recognition, multi-view, audio, music, and point cloud generation, showcasing strong performance across these domains. By releasing all codes and model weights at https://github.com/Alpha-VLLM/Lumina-T2X, we aim to advance the development of next-generation generative AI capable of universal modeling.*

**Highlights**: Lumina-Next is a next-generation Diffusion Transformer that significantly enhances text-to-image generation, multilingual generation, and multitask performance by introducing the Next-DiT architecture, 3D RoPE, and frequency- and time-aware RoPE, among other improvements.

Lumina-Next has the following components:
* It improves sampling efficiency with fewer and faster steps.
* It uses a Next-DiT as a transformer backbone with sandwich normalization, 3D RoPE, and Grouped-Query Attention.
* It uses a Frequency- and Time-Aware Scaled RoPE.

---

[Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers](https://arxiv.org/abs/2405.05945) from Alpha-VLLM, OpenGVLab, Shanghai AI Laboratory.

The abstract from the paper is:

*Sora unveils the potential of scaling Diffusion Transformer for generating photorealistic images and videos at arbitrary resolutions, aspect ratios, and durations, yet it still lacks sufficient implementation details. In this technical report, we introduce the Lumina-T2X family - a series of Flow-based Large Diffusion Transformers (Flag-DiT) equipped with zero-initialized attention, as a unified framework designed to transform noise into images, videos, multi-view 3D objects, and audio clips conditioned on text instructions. By tokenizing the latent spatial-temporal space and incorporating learnable placeholders such as [nextline] and [nextframe] tokens, Lumina-T2X seamlessly unifies the representations of different modalities across various spatial-temporal resolutions. This unified approach enables training within a single framework for different modalities and allows for flexible generation of multimodal data at any resolution, aspect ratio, and length during inference. Advanced techniques like RoPE, RMSNorm, and flow matching enhance the stability, flexibility, and scalability of Flag-DiT, enabling models of Lumina-T2X to scale up to 7 billion parameters and extend the context window to 128K tokens. This is particularly beneficial for creating ultra-high-definition images with our Lumina-T2I model and long 720p videos with our Lumina-T2V model. Remarkably, Lumina-T2I, powered by a 5-billion-parameter Flag-DiT, requires only 35% of the training computational costs of a 600-million-parameter naive DiT. Our further comprehensive analysis underscores Lumina-T2X's preliminary capability in resolution extrapolation, high-resolution editing, generating consistent 3D views, and synthesizing videos with seamless transitions. We expect that the open-sourcing of Lumina-T2X will further foster creativity, transparency, and diversity in the generative AI community.*

You can find the original codebase at [Alpha-VLLM](https://github.com/Alpha-VLLM/Lumina-T2X) and all the available checkpoints at [Alpha-VLLM Lumina Family](https://huggingface.co/collections/Alpha-VLLM/lumina-family-66423205bedb81171fd0644b).

**Highlights**: Lumina-T2X supports any modality, resolution, and duration.

Lumina-T2X has the following components:
* It uses a Flow-based Large Diffusion Transformer as the backbone
* It supports different modalities with one backbone and a corresponding encoder and decoder.

This pipeline was contributed by [PommesPeter](https://github.com/PommesPeter). The original codebase can be found [here](https://github.com/Alpha-VLLM/Lumina-T2X). The original weights can be found under [hf.co/Alpha-VLLM](https://huggingface.co/Alpha-VLLM).

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers.md) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading.md#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>

### Inference (Text-to-Image)

Use [`torch.compile`](https://huggingface.co/docs/diffusers/main/en/tutorials/fast_diffusion#torchcompile) to reduce the inference latency.

First, load the pipeline:

```python
from diffusers import LuminaText2ImgPipeline
import torch

pipeline = LuminaText2ImgPipeline.from_pretrained(
    "Alpha-VLLM/Lumina-Next-SFT-diffusers", torch_dtype=torch.bfloat16
).to("cuda")
```

Then change the memory layout of the pipeline's `transformer` and `vae` components to `torch.channels_last`:

```python
pipeline.transformer.to(memory_format=torch.channels_last)
pipeline.vae.to(memory_format=torch.channels_last)
```

Finally, compile the components and run inference:

```python
pipeline.transformer = torch.compile(pipeline.transformer, mode="max-autotune", fullgraph=True)
pipeline.vae.decode = torch.compile(pipeline.vae.decode, mode="max-autotune", fullgraph=True)

image = pipeline(prompt="Upper body of a young woman in a Victorian-era outfit with brass goggles and leather straps. Background shows an industrial revolution cityscape with smoky skies and tall, metal structures").images[0]
```

## LuminaText2ImgPipeline

[[autodoc]] LuminaText2ImgPipeline
- all
- __call__
76  docs/source/en/api/pipelines/marigold.md  Normal file
@@ -0,0 +1,76 @@
<!--Copyright 2024 Marigold authors and The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Marigold Pipelines for Computer Vision Tasks

![marigold](https://marigoldmonodepth.github.io/images/teaser_collage_compressed.jpg)

Marigold was proposed in [Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation](https://huggingface.co/papers/2312.02145), a CVPR 2024 Oral paper by [Bingxin Ke](http://www.kebingxin.com/), [Anton Obukhov](https://www.obukhov.ai/), [Shengyu Huang](https://shengyuh.github.io/), [Nando Metzger](https://nandometzger.github.io/), [Rodrigo Caye Daudt](https://rcdaudt.github.io/), and [Konrad Schindler](https://scholar.google.com/citations?user=FZuNgqIAAAAJ&hl=en).
The idea is to repurpose the rich generative prior of Text-to-Image Latent Diffusion Models (LDMs) for traditional computer vision tasks.
Initially, this idea was explored to fine-tune Stable Diffusion for Monocular Depth Estimation, as shown in the teaser above.
Later,
- [Tianfu Wang](https://tianfwang.github.io/) trained the first Latent Consistency Model (LCM) of Marigold, which unlocked fast single-step inference;
- [Kevin Qu](https://www.linkedin.com/in/kevin-qu-b3417621b/?locale=en_US) extended the approach to Surface Normals Estimation;
- [Anton Obukhov](https://www.obukhov.ai/) contributed the pipelines and documentation into diffusers (enabled and supported by [YiYi Xu](https://yiyixuxu.github.io/) and [Sayak Paul](https://sayak.dev/)).

The abstract from the paper is:

*Monocular depth estimation is a fundamental computer vision task. Recovering 3D depth from a single image is geometrically ill-posed and requires scene understanding, so it is not surprising that the rise of deep learning has led to a breakthrough. The impressive progress of monocular depth estimators has mirrored the growth in model capacity, from relatively modest CNNs to large Transformer architectures. Still, monocular depth estimators tend to struggle when presented with images with unfamiliar content and layout, since their knowledge of the visual world is restricted by the data seen during training, and challenged by zero-shot generalization to new domains. This motivates us to explore whether the extensive priors captured in recent generative diffusion models can enable better, more generalizable depth estimation. We introduce Marigold, a method for affine-invariant monocular depth estimation that is derived from Stable Diffusion and retains its rich prior knowledge. The estimator can be fine-tuned in a couple of days on a single GPU using only synthetic training data. It delivers state-of-the-art performance across a wide range of datasets, including over 20% performance gains in specific cases. Project page: https://marigoldmonodepth.github.io.*

## Available Pipelines

Each pipeline supports one computer vision task, which takes an RGB image as input and produces a *prediction* of the modality of interest, such as a depth map of the input image.
Currently, the following tasks are implemented:

| Pipeline | Predicted Modalities | Demos |
|---|---|:---:|
| [MarigoldDepthPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/marigold/pipeline_marigold_depth.py) | [Depth](https://en.wikipedia.org/wiki/Depth_map), [Disparity](https://en.wikipedia.org/wiki/Binocular_disparity) | [Fast Demo (LCM)](https://huggingface.co/spaces/prs-eth/marigold-lcm), [Slow Original Demo (DDIM)](https://huggingface.co/spaces/prs-eth/marigold) |
| [MarigoldNormalsPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/marigold/pipeline_marigold_normals.py) | [Surface normals](https://en.wikipedia.org/wiki/Normal_mapping) | [Fast Demo (LCM)](https://huggingface.co/spaces/prs-eth/marigold-normals-lcm) |

## Available Checkpoints

The original checkpoints can be found under the [PRS-ETH](https://huggingface.co/prs-eth/) Hugging Face organization.

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines. Also, to learn more about reducing the memory usage of this pipeline, refer to the "Reduce memory usage" section [here](../../using-diffusers/svd#reduce-memory-usage).

</Tip>

<Tip warning={true}>

Marigold pipelines were designed and tested only with `DDIMScheduler` and `LCMScheduler`.
Depending on the scheduler, the number of inference steps required to get reliable predictions varies, and there is no universal value that works best across schedulers.
Because of that, the default value of `num_inference_steps` in the `__call__` method of the pipeline is set to `None` (see the API reference).
Unless set explicitly, its value will be taken from the checkpoint configuration `model_index.json`.
This is done to ensure high-quality predictions when calling the pipeline with just the `image` argument.

</Tip>

See also Marigold [usage examples](marigold_usage).
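For a first impression of the depth pipeline described above, here is a minimal sketch. The LCM checkpoint id and input image path are assumptions based on the PRS-ETH organization mentioned earlier, and `num_inference_steps` is deliberately left unset so it is read from `model_index.json`.

```python
import torch
import diffusers

# Assumed LCM depth checkpoint from the PRS-ETH organization.
pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
    "prs-eth/marigold-depth-lcm-v1-0", variant="fp16", torch_dtype=torch.float16
).to("cuda")

# Any RGB photograph works as input; this path is a placeholder.
image = diffusers.utils.load_image("path/to/photo.jpg")

depth = pipe(image)  # num_inference_steps defaults to the value stored in the checkpoint config

# Visualize the affine-invariant prediction and save it.
vis = pipe.image_processor.visualize_depth(depth.prediction)
vis[0].save("photo_depth.png")
```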
## MarigoldDepthPipeline
[[autodoc]] MarigoldDepthPipeline
- all
- __call__

## MarigoldNormalsPipeline
[[autodoc]] MarigoldNormalsPipeline
- all
- __call__

## MarigoldDepthOutput
[[autodoc]] pipelines.marigold.pipeline_marigold_depth.MarigoldDepthOutput

## MarigoldNormalsOutput
[[autodoc]] pipelines.marigold.pipeline_marigold_normals.MarigoldNormalsOutput
Some files were not shown because too many files have changed in this diff.