Running Llama 405B on vLLM with Slurm and Multiple Nodes

Llama 405B is a large model: just holding its weights takes hundreds of gigabytes of GPU memory, with the exact amount depending on the quantization used:

| Quantization Method | Weight Memory | Min # of 80GB A100s |
|---------------------|---------------|---------------------|
| FP16                | 810 GB        | 11                  |
| INT8/FP8            | 405 GB        | 6                   |
| INT4                | 202 GB        | 3                   |
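These figures fall straight out of the parameter count: weight memory ≈ number of parameters × bytes per parameter, and the GPU count is that divided by 80 GB, rounded up. A quick sanity check (weights only; the KV cache and activations need extra headroom on top of this):

```python
import math

# Weight memory for a 405B-parameter model at different precisions.
# Weights only -- KV cache and activations need extra headroom.
PARAMS = 405e9
A100_GB = 80

for name, bytes_per_param in [("FP16", 2.0), ("INT8/FP8", 1.0), ("INT4", 0.5)]:
    weight_gb = PARAMS * bytes_per_param / 1e9
    min_gpus = math.ceil(weight_gb / A100_GB)
    print(f"{name:9s} {weight_gb:4.0f} GB -> at least {min_gpus} x A100 80GB")
```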

I have access to a 4-node Slurm cluster with 4 A100 80GB GPUs per node.
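That is 16 GPUs and 1,280 GB of GPU memory in total, which is enough for the FP16 weights, but no single node (320 GB) comes close to holding them on its own:

```python
NODES, GPUS_PER_NODE, GB_PER_GPU = 4, 4, 80

node_gb = GPUS_PER_NODE * GB_PER_GPU   # 320 GB per node
total_gb = NODES * node_gb             # 1280 GB across the cluster

# FP16 weights (810 GB) exceed any single node but fit in the cluster.
print(f"per node: {node_gb} GB, cluster: {total_gb} GB")
print(f"FP16 fits on one node: {node_gb >= 810}")   # False
print(f"FP16 fits on cluster:  {total_gb >= 810}")  # True
```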

So how do we get these nodes to work together?
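As a preview of the answer: vLLM can split the model with tensor parallelism across the GPUs inside a node and pipeline parallelism across the nodes, coordinating the multi-node workers through a Ray cluster. Below is a minimal sketch of that 4×4 layout using vLLM's offline Python API; the parameter names match recent vLLM releases, the checkpoint name is an assumption, and the Ray commands in the comments are the usual `ray start` incantations you would drive from a Slurm batch script.

```python
from vllm import LLM, SamplingParams

# Sketch only. Assumes a Ray cluster already spans all 4 nodes, e.g.
#   head node:    ray start --head --port=6379
#   other nodes:  ray start --address=<head-ip>:6379
# (both typically launched from within the same Slurm job).
llm = LLM(
    model="meta-llama/Llama-3.1-405B-Instruct",  # assumed checkpoint name
    tensor_parallel_size=4,     # shard each layer across the 4 GPUs in a node
    pipeline_parallel_size=4,   # split the layer stack across the 4 nodes
    distributed_executor_backend="ray",  # multi-node execution goes via Ray
    dtype="float16",            # 810 GB of weights spread over 16 GPUs
)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```

The split is deliberate: tensor parallelism stays inside a node because it needs NVLink-class bandwidth, while pipeline parallelism tolerates the slower inter-node network.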