Sunday, June 22, 2025

Adding GPU node to AWS EKS


 


This post lists the steps to add a GPU node to an existing Kubernetes cluster on AWS EKS.


We start by creating a new managed nodegroup that uses a GPU instance type.


echo "create GPU nodegroup"

eksctl create nodegroup --cluster ${EKS_CLUSTER_NAME} \
--name gpu-nodes \
--node-type g4dn.xlarge \
--nodes 1 \
--managed


Next, we taint the node so that no pod is scheduled on it unless the pod explicitly declares a matching toleration.


echo "taint the GPU node"

NODE=$(kubectl get nodes -l alpha.eksctl.io/nodegroup-name=gpu-nodes -ojsonpath='{.items[*].metadata.name}')
kubectl taint node ${NODE} gpu-workload=true:NoSchedule
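
Note that kubectl taint only marks the node that exists right now; a node that the nodegroup launches later (for example after scaling or a replacement) would come up without the taint. One possible alternative, sketched here, is to declare the taint on the nodegroup itself in an eksctl config file. The file name gpu-nodegroup.yaml, the AWS_REGION variable, and the fallback values my-cluster and us-east-1 are assumptions for illustration, not part of the original steps.

```shell
# Sketch: declare the GPU nodegroup together with its taint in an eksctl config file.
# Assumptions: EKS_CLUSTER_NAME and AWS_REGION are normally set by the caller;
# the fallbacks (my-cluster, us-east-1) are placeholders only.
cat > gpu-nodegroup.yaml <<EOF
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ${EKS_CLUSTER_NAME:-my-cluster}
  region: ${AWS_REGION:-us-east-1}
managedNodeGroups:
  - name: gpu-nodes
    instanceType: g4dn.xlarge
    desiredCapacity: 1
    taints:
      - key: gpu-workload
        value: "true"
        effect: NoSchedule
EOF

# Create the nodegroup from the file instead of passing CLI flags:
# eksctl create nodegroup --config-file=gpu-nodegroup.yaml
```

With this approach the taint is part of the nodegroup definition, so the separate kubectl taint step is not needed.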


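To use the GPU, the NVIDIA drivers must be installed on the host. This is handled by the NVIDIA GPU operator, which we install with a values file that carries a toleration for the taint added above.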


echo "install NVIDIA GPU operator"

cat > operator-values.yaml <<EOF
tolerations:
- key: "gpu-workload"
  value: "true"
  effect: "NoSchedule"
EOF

helm repo add nvidia https://nvidia.github.io/gpu-operator
helm repo update
helm install gpu-operator nvidia/gpu-operator -f operator-values.yaml
rm operator-values.yaml


Finally, we can verify that the node advertises its GPU as an allocatable resource.

echo "print nodes with GPU"

kubectl get nodes -o json | jq '.items[].status.allocatable' | grep nvidia
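
As an end-to-end check, a pod can tolerate the gpu-workload taint and request one GPU. The following is a minimal sketch under stated assumptions: the pod name gpu-smoke-test, the manifest file name, and the CUDA image tag are chosen for illustration and are not part of the original steps.

```shell
# Sketch: write a pod manifest that opts in to the tainted GPU node.
cat > gpu-smoke-test.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test        # hypothetical name
spec:
  restartPolicy: Never
  tolerations:
    # Matches the taint gpu-workload=true:NoSchedule set on the GPU node
    - key: "gpu-workload"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # assumed image tag
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1   # the resource name the grep above looks for
EOF

# Apply the manifest and read the output once the pod has completed:
# kubectl apply -f gpu-smoke-test.yaml
# kubectl logs gpu-smoke-test
```

If the pod stays in Pending, kubectl describe pod gpu-smoke-test usually shows whether the taint or a missing nvidia.com/gpu resource is the cause.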




