run KISS: Adding GPU node to AWS EKS

Sunday, June 22, 2025

Adding GPU node to AWS EKS

This post lists the steps to add a GPU node to an existing Kubernetes cluster on AWS EKS.

We start by creating a new managed nodegroup that uses a GPU instance type.

echo "create GPU nodegroup"

eksctl create nodegroup --cluster ${EKS_CLUSTER_NAME} \
 --name gpu-nodes \
 --node-type g4dn.xlarge \
 --nodes 1 \
 --managed

Next, we taint the node so that no pod is scheduled on it unless the pod explicitly declares a matching toleration.

echo "taint the GPU node"

NODE=$(kubectl get nodes -l alpha.eksctl.io/nodegroup-name=gpu-nodes -ojsonpath='{.items[*].metadata.name}')
kubectl taint node ${NODE} gpu-workload=true:NoSchedule
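A taint applied with kubectl belongs to the node object, so a node that is later replaced by the nodegroup would come up without it. As a possible alternative, the nodegroup and its taint can be declared together in an eksctl config file. The following is a minimal sketch, not the method used in this post: the file name gpu-nodegroup.yaml, the AWS_REGION variable, and the fallback values my-cluster and us-east-1 are placeholders that should be adjusted.

```shell
# Sketch: write an eksctl config that declares the GPU nodegroup together with its taint.
# Assumptions: AWS_REGION and the ":-" fallback values are placeholders, not taken from the post.
cat > gpu-nodegroup.yaml <<EOF
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ${EKS_CLUSTER_NAME:-my-cluster}
  region: ${AWS_REGION:-us-east-1}
managedNodeGroups:
- name: gpu-nodes
  instanceType: g4dn.xlarge
  desiredCapacity: 1
  taints:
  - key: gpu-workload
    value: "true"
    effect: NoSchedule
EOF

# To create the nodegroup from this file:
#   eksctl create nodegroup --config-file gpu-nodegroup.yaml
```

With this approach the separate kubectl taint step should not be needed, since the taint is part of the nodegroup definition.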

To use the GPU, the NVIDIA drivers must be installed on the host. The NVIDIA GPU operator handles this.

echo "install NVIDIA GPU operator"

cat > operator-values.yaml <<EOF
tolerations:
- key: "gpu-workload"
  value: "true"
  effect: "NoSchedule"
EOF

helm repo add nvidia https://nvidia.github.io/gpu-operator
helm repo update
helm install gpu-operator nvidia/gpu-operator -f operator-values.yaml
rm operator-values.yaml

Finally, we can check that the node advertises GPU resources as allocatable.

echo "print nodes with GPU"

kubectl get nodes -o json | jq '.items[].status.allocatable' | grep nvidia
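To go one step further than listing allocatable resources, a small test pod can confirm that a workload actually lands on the tainted node and sees the GPU. The sketch below only writes the manifest. The pod name gpu-smoke-test, the file name, and the image tag nvidia/cuda:12.4.1-base-ubuntu22.04 are assumptions and may need to be replaced with an image available to your cluster. The toleration matches the gpu-workload=true:NoSchedule taint applied earlier.

```shell
# Sketch: manifest for a one-shot pod that tolerates the GPU taint and requests one GPU.
# Assumptions: the pod name, file name, and image tag are illustrative, not from the post.
cat > gpu-test-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  tolerations:
  - key: "gpu-workload"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
  containers:
  - name: cuda
    image: nvidia/cuda:12.4.1-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF

# To run the test and read the nvidia-smi output:
#   kubectl apply -f gpu-test-pod.yaml
#   kubectl logs gpu-smoke-test
#   kubectl delete -f gpu-test-pod.yaml
```

If the pod stays in Pending, the usual suspects are a missing toleration or a node that does not yet report nvidia.com/gpu as allocatable.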
