Does Stable Diffusion run on NVIDIA Jetson AGX Xavier Developer Kit with CUDA?
A Jetson AGX Xavier developer kit had been lying around in the lab for a while without anyone using it. I picked it up as my Sunday project to test Stable Diffusion on its CUDA cores and study the state-of-the-art “AI” image generation technique – the original paper from the CompVis group is a great read.
The board I tested today is the very first model of the AGX Xavier (P2822-0000) with 16 GB of memory.
Disclaimer: I’m not at all an expert in deep learning or text-to-image.
TL;DR
See the Diffusers on Docker section to run the txt2img container on Xavier with JetPack 5.0.2. If your Xavier doesn’t have that version of JetPack yet, keep reading.
Prepare Jetson AGX Xavier
The NVIDIA Jetson series require NVIDIA JetPack, which provides “a full development environment for hardware-accelerated AI-at-the-edge development” including the Jetson Linux image (Linux4Tegra, L4T), CUDA setup, etc.
To start with, you need an NVIDIA developer account. Sign up at https://developer.nvidia.com/.
You can just follow the well-documented official instructions here, but FYI I’ve noted the precise steps I took below.
Install NVIDIA JetPack
I first installed the latest JetPack (5.0.2 at the time of writing), which comes with L4T 35.1.0 and CUDA 11.4.239. To flash the L4T image to Xavier, I used NVIDIA SDK Manager from my Ubuntu laptop and a USB Type C cable.
Install SDK Manager
Download the .deb file for the SDK Manager from https://developer.nvidia.com/drive/sdk-manager and install it on the host Ubuntu machine (not on Xavier)
sudo apt install ./sdkmanager_1.9.0-10816_amd64.deb
Start SDK Manager
Run the SDK Manager and log in with your NVIDIA developer account (on the Ubuntu machine)
sdkmanager
Power on Xavier in recovery mode
Press the force recovery button and the power button on the AGX Xavier to enter recovery mode. Connect the USB Type C cable to the Type C port of the Xavier. The SDK Manager should then detect the Xavier.
Flash the L4T image
Follow the wizard to flash the image
Finish “System configuration wizard”
Once the Linux image is flashed, this window pops up. Continue setting up the Xavier as instructed – you need a keyboard (plus a mouse) and a display attached to the Xavier to do so.
Reboot
At the end of this “System configuration wizard”, Xavier tries to reboot, but it seems to hang at
A start job is running for End-user configuration after initial OEM installation
Just tap the power/reset button to force-reboot.
Install the SDK
By default, Xavier has the bridge interface l4tbr0 with the IPv4 address 192.168.55.1 set up for the SDK Manager to connect to and install the SDK components, i.e., you don’t need to change the address. Just enter your user credentials in the wizard as above and hit the Install button.
Once it finishes, Xavier is all set with JetPack 😁
% /usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_May__4_00:02:26_PDT_2022
Cuda compilation tools, release 11.4, V11.4.239
Build cuda_11.4.r11.4/compiler.31294910_0
jetsonUtilities
I also found jetsonhacks/jetsonUtilities useful for checking the JetPack installation.
% python jetsonInfo.py
NVIDIA Jetson-AGX
L4T 35.1.0 [ JetPack 5.0.2 ]
Ubuntu 20.04.5 LTS
Kernel Version: 5.10.104-tegra
CUDA 11.4.239
CUDA Architecture: NONE
OpenCV version: 4.5.4
OpenCV Cuda: NO
CUDNN: 8.4.1.50
TensorRT: 8.4.1.5
Vision Works: NOT_INSTALLED
VPI: 2.1.6
Vulcan: 1.3.203
Jetson stats
Also, Jetson stats is a useful tool with a beautiful interface.
sudo -H pip install -U jetson-stats
jtop
Tune Xavier
Here are some tips for getting maximum performance out of the Jetson device.
Update the fan profile in /etc/nvfancontrol.conf
FAN_DEFAULT_PROFILE cool
Restart the service
sudo systemctl restart nvfancontrol
Use the NVIDIA power model tool to set the power mode to ID=0, “MAXN” (the unconstrained power model that releases all limits)
sudo nvpmodel -m 0
Set the CPU, GPU, and EMC clocks to their static maximum frequencies
sudo jetson_clocks
Mount a disk
The disk space on the eMMC is so limited that there’s only 4GB available (!) after the installation.
% sudo df -h /dev/mmcblk0p1
Filesystem Size Used Avail Use% Mounted on
/dev/mmcblk0p1 28G 23G 4.0G 85% /
That is too little space to even pull the NVIDIA L4T PyTorch container image locally.
I thought of using an NFS share for the Docker data root, but since OverlayFS is not supported on NFS (“The upper filesystem will normally be writable and if it is it must support the creation of trusted.* extended attributes, and must provide valid d_type in readdir responses, so NFS is not suitable.”), I decided to mount an SD card (/dev/mmcblk1) as an ext4 filesystem.
sudo dd if=/dev/zero of=/dev/mmcblk1 bs=4096 status=progress
sudo parted /dev/mmcblk1 --script -- mklabel gpt
sudo parted /dev/mmcblk1 --script -- mkpart primary ext4 0% 100%
sudo mkfs.ext4 /dev/mmcblk1p1 -L XAVIER_SD
This formats the SD card:
% sudo parted /dev/mmcblk1 --script print
Model: SD SD32G (sd/mmc)
Disk /dev/mmcblk1: 31.9GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:
Number Start End Size File system Name Flags
1 1049kB 31.9GB 31.9GB ext4 primary
Mount the SD card to /mnt/sd
sudo mkdir -p /mnt/sd
echo "LABEL=XAVIER_SD /mnt/sd ext4 defaults 0 0" | sudo tee -a /etc/fstab
sudo mount -a
Copy /home to /mnt/sd/home and link it (do this with caution)
sudo rsync -avxP /home /mnt/sd/
sudo rm -fr /home && sudo ln -sfnv /mnt/sd/home /home
Stop the docker daemon
sudo systemctl stop docker
Migrate /var/lib/docker to /mnt/sd/docker
sudo rsync -avxP /var/lib/docker /mnt/sd/
Edit /etc/docker/daemon.json
{
  "data-root": "/mnt/sd/docker",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
Restart the docker daemon
sudo systemctl start docker
Run Diffusers on Docker
The easiest way to try the Stable Diffusion models would be using Diffusers from Hugging Face.
After much trial and error (see the later section of this post), I created a Docker image with which I can run a text-to-image app based on the Stable Diffusion models with Diffusers: iomz/diffusers-jetson. Also see https://github.com/iomz/docker-diffusers-jetson – I didn’t automate the Docker image build, as it is specific to Xavier with a particular JetPack version.
The image is built from nvcr.io/nvidia/l4t-pytorch:r35.1.0-pth1.11-py3, since this build supports torch.distributed, which Diffusers requires.
Here’s all you need to do.
Clone the repo
git clone https://github.com/iomz/docker-diffusers-jetson
Clone a model from Hugging Face into models/ with git lfs
First, install git lfs
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt install git-lfs
Then clone the model, e.g., runwayml/stable-diffusion-v1-5 (it should take a while)
mkdir -p docker-diffusers-jetson/models && cd $_
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
Run txt2img via Docker Compose
With the default settings, it takes about 90 seconds to generate an image.
% docker-compose run --rm txt2img -h
Creating docker-diffusers-jetson_txt2img_run ... done
usage: txt2img.py [-h] [--model MODEL] [--device DEVICE] [--nis NIS] [--out OUT] prompt
A simple text-to-image app with diffusers.StableDiffusionPipeline
positional arguments:
prompt a text string to be passed as the prompt
optional arguments:
-h, --help show this help message and exit
--model MODEL path to the model (default: ./models/stable-diffusion-v1-5)
--device DEVICE device (default: cuda)
--nis NIS value for num_inference_steps (default: 51)
--out OUT result image file (default: out.png)
docker-compose run --rm txt2img "abandoned building in forest with beautiful glass windows"
You can use a different model (e.g., prompthero/midjourney-v4-diffusion) with the --model option (default: models/stable-diffusion-v1-5)
docker-compose run --rm txt2img --model models/midjourney-v4-diffusion "abandoned building in forest with beautiful glass windows"
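For reference, the core of such a txt2img script is only a few lines with Diffusers. Below is a minimal sketch – not the exact txt2img.py from the repo – assuming a model cloned under models/ and a working CUDA device:

# Minimal txt2img sketch with diffusers.StableDiffusionPipeline.
# Not the actual txt2img.py from the repo; paths and defaults are assumptions.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "./models/stable-diffusion-v1-5",
    torch_dtype=torch.float16,  # halves the memory footprint; drop for fp32
)
pipe = pipe.to("cuda")

image = pipe(
    "abandoned building in forest with beautiful glass windows",
    num_inference_steps=51,  # matches the --nis default above
).images[0]
image.save("out.png")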
EDIT: Compare Jetson Xavier vs. GeForce RTX 2060 Mobile
The next day, I thought of trying the same script directly on my laptop’s GPU. It took only 18.19 seconds to generate an image 😂
% sudo lspci -v | grep -i vga
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 630 (Mobile) (prog-if 00 [VGA controller])
01:00.0 VGA compatible controller: NVIDIA Corporation TU106M [GeForce RTX 2060 Mobile] (rev a1) (prog-if 00 [VGA controller])
% nvidia-smi
Tue Nov 15 12:39:57 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.07 Driver Version: 515.65.07 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:01:00.0 Off | N/A |
| N/A 36C P0 26W / N/A | 5MiB / 6144MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1506 G /usr/lib/xorg/Xorg 4MiB |
+-----------------------------------------------------------------------------+
% python txt2img.py --model models/midjourney-v4-diffusion "stable diffusion is running on my laptop"
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 52/52 [00:13<00:00, 3.74it/s]
python txt2img.py --model models/midjourney-v4-diffusion 18.19s user 6.40s system 114% cpu 21.412 total
and the image generated:
Things I tried but didn’t work
This section contains my notes on failed attempts, but they may still be useful for other purposes.
Docker image for Stable Diffusion
@chitoku has posted a great note in the NVIDIA developer forum. However, it seems that my Xavier has too little memory for this approach (not sure, but the process gets killed even before the first iteration of the inference).
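If you want to squeeze a pipeline into limited memory anyway, Diffusers has a couple of memory-saving knobs that might be worth trying – an untested sketch on my part, assuming the fp16 weights load fine on the Xavier:

# Untested sketch: fp16 weights plus attention slicing to lower peak memory.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "./models/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe.enable_attention_slicing()  # compute attention in chunks, trading speed for memory
pipe = pipe.to("cuda")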
PyTorch on Xavier
Building PyTorch on AArch64 machines is a bit of a hassle – I tried a few times on the Xavier with an older version of JetPack and failed. Thankfully, NVIDIA saves us the effort by providing a pre-built wheel in the Jetson Download Center. Look for “PyTorch for JetPack (JP 5.0.2)”. There’s also a nice doc on Installing PyTorch for Jetson Platform.
- Install the prerequisites
sudo apt-get -y install autoconf bc build-essential g++-8 gcc-8 clang-8 lld-8 gettext-base gfortran-8 iputils-ping libbz2-dev libc++-dev libcgal-dev libffi-dev libfreetype6-dev libhdf5-dev libjpeg-dev liblzma-dev libncurses5-dev libncursesw5-dev libpng-dev libreadline-dev libssl-dev libsqlite3-dev libxml2-dev libxslt-dev locales moreutils openssl python-openssl rsync scons python3-pip libopenblas-dev
- Download the wheel
Find torch-1.13.0a0+d0d6b1f2.nv22.10-cp38-cp38-linux_aarch64.whl.
- Ensure Python3
Because of the limited disk space on the eMMC, I used pyenv to install anaconda3-2022.05 for the PyTorch environment.
% which python3
/home/iomz/.pyenv/shims/python3
% python3 --version
Python 3.8.13
- Install PyTorch
export TORCH_INSTALL=~/torch-1.13.0a0+d0d6b1f2.nv22.10-cp38-cp38-linux_aarch64.whl
python3 -m pip install --upgrade pip; python3 -m pip install aiohttp numpy=='1.19.4' scipy=='1.5.3'; export "LD_LIBRARY_PATH=/usr/lib/llvm-8/lib:$LD_LIBRARY_PATH"; python3 -m pip install --upgrade protobuf; python3 -m pip install --no-cache $TORCH_INSTALL
- Verify the installation
% python
Python 3.8.13 (default, Mar 28 2022, 10:59:05)
[GCC 10.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.__version__)
1.13.0a0+d0d6b1f2.nv22.10
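It doesn’t hurt to also verify that this build can actually see the GPU – a quick sanity check (the expected results are my assumption for a correctly flashed JetPack):

import torch
print(torch.cuda.is_available())      # expect True with the CUDA-enabled wheel
print(torch.cuda.get_device_name(0))  # prints the name of the integrated GPU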
So far, so good…
Diffusers on Xavier
I just followed the doc.
Install the packages
pip install --upgrade diffusers[torch] transformers scipy
But I encountered an error when running from diffusers import StableDiffusionPipeline
>>> from diffusers import StableDiffusionPipeline
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/iomz/.pyenv/versions/anaconda3-2022.05/lib/python3.8/site-packages/diffusers/__init__.py", line 20, in <module>
from .modeling_utils import ModelMixin
File "/home/iomz/.pyenv/versions/anaconda3-2022.05/lib/python3.8/site-packages/diffusers/modeling_utils.py", line 50, in <module>
import accelerate
File "/home/iomz/.pyenv/versions/anaconda3-2022.05/lib/python3.8/site-packages/accelerate/__init__.py", line 7, in <module>
from .accelerator import Accelerator
File "/home/iomz/.pyenv/versions/anaconda3-2022.05/lib/python3.8/site-packages/accelerate/accelerator.py", line 27, in <module>
from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
File "/home/iomz/.pyenv/versions/anaconda3-2022.05/lib/python3.8/site-packages/accelerate/checkpointing.py", line 24, in <module>
from .utils import (
File "/home/iomz/.pyenv/versions/anaconda3-2022.05/lib/python3.8/site-packages/accelerate/utils/__init__.py", line 66, in <module>
from .operations import (
File "/home/iomz/.pyenv/versions/anaconda3-2022.05/lib/python3.8/site-packages/accelerate/utils/operations.py", line 24, in <module>
from torch.distributed import ReduceOp
ImportError: cannot import name 'ReduceOp' from 'torch.distributed' (/home/iomz/.pyenv/versions/anaconda3-2022.05/lib/python3.8/site-packages/torch/distributed/__init__.py)
PyTorch from the NVIDIA wheel doesn’t have ReduceOp in torch.distributed? Let’s see
>>> print(torch.distributed.is_available())
False
Ha. I also didn’t know what this torch.distributed is for.
Apparently, PyTorch has a lot of cool stuff in there: https://pytorch.org/docs/stable/distributed.html.
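To get a taste of it, here is a toy example of mine (not from the docs) that sums a tensor across two processes with all_reduce – note the very ReduceOp that the Diffusers import chain stumbled over. It requires a PyTorch build where torch.distributed.is_available() returns True, which is exactly what the NVIDIA wheel lacks:

# Toy torch.distributed example: two processes sum a tensor with all_reduce.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank, world_size):
    # Rendezvous over localhost; the port choice is arbitrary.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    t = torch.tensor([float(rank + 1)])
    dist.all_reduce(t, op=dist.ReduceOp.SUM)  # t becomes 3.0 on both ranks
    print(f"rank {rank}: {t.item()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)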
Finally, I found in this post that nvcr.io/nvidia/l4t-pytorch:r34.1.1-pth1.11-py3 supports torch.distributed.
Then I came up with the Diffusers on Docker approach above.