In this post we review a method to reduce the Docker image size of an LLM inference image.
Packaging an LLM in a Docker container tends to produce images whose size can exceed 20 GB.
Pulling such images from ECR or Nexus can take a very long time, and storing them in those registries can be a challenge in itself.
To avoid this, we do the following:
1. We extract the folders that consume the most disk space from the Docker image and save them to S3. Then we create a new slim image without these folders.
2. At deployment time, we run an init container that downloads the archive from S3 and extracts it into an emptyDir volume.
Step 1: Extract folders and create slim image
The following script first builds the original, huge image.
It then exports the container filesystem and compresses the large folders ExportFolder1 and ExportFolder2 into OutputFile; this archive should be uploaded to AWS S3 (see the sketch after the script).
Finally, the script creates a new slim image without ExportFolder1 and ExportFolder2 and pushes it to the registry.
#!/usr/bin/env bash
set -e
cd "$(dirname "$0")"
DockerRegistry=my-registry.my-company
ProjectVersion=latest
BuildNumber=1234
OutputFile="${OutputFile:-llm-image-${BuildNumber}.tar.gz}"
SlimImageName="${DockerRegistry}/llm-image-slim:${ProjectVersion}"
ExportFolder1="root/.cache"
ExportFolder2="usr/local/lib/python3.12"
TempFolder="$(mktemp -d)"
ImageFileSystem="${TempFolder}/rootfs"
TempContainer=llm-image-tmp
date
echo "build full image"
./build.sh
date
echo "extracting container to ${ImageFileSystem}"
mkdir -p "${ImageFileSystem}"
docker rm -f "${TempContainer}" 2>/dev/null || true   # remove any leftover container from a previous run
docker create --name "${TempContainer}" llm-image/dev:latest
docker export "${TempContainer}" | tar -x -C "${ImageFileSystem}"
docker rm -f "${TempContainer}"
date
echo "compressing folders to ${OutputFile}"
tar -czf "${OutputFile}" -C "${ImageFileSystem}" "${ExportFolder1}" "${ExportFolder2}"
date
echo "creating slim image ${SlimImageName}"
rm -rf "${ImageFileSystem}/${ExportFolder1}" "${ImageFileSystem}/${ExportFolder2}"
tar -C "${ImageFileSystem}" -c . | docker import - "${SlimImageName}"
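# note: docker import flattens the filesystem into a single layer and drops the original
# image metadata (ENTRYPOINT, CMD, ENV), so re-declare anything you need at deployment time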
docker push "${SlimImageName}"
date
echo "cleaning up"
rm -rf "${TempFolder}"
date
echo "Done"
Step 2: Deployment
To handle the S3 download, we create an extractor image.
Dockerfile:
FROM amazon/aws-cli
COPY files /
ENTRYPOINT ["/entrypoint.sh"]
Downloader script: entrypoint.sh
#!/usr/bin/env bash
set -e
localFileName="/models-local.tar.gz"
# /local-storage is the emptyDir volume mounted into this init container
cd /local-storage
echo "Downloading ${MODELS_S3_PATH}"
aws s3 cp "${MODELS_S3_PATH}" "${localFileName}"
echo "Extracting ${localFileName}"
tar -xzf "${localFileName}"
echo "Cleaning up..."
rm "${localFileName}"
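Building and pushing the extractor image is not shown above; a minimal sketch, assuming the Dockerfile and a files/entrypoint.sh sit in the current directory and that the image is published to the same registry under the name modelsextract (both are assumptions):
chmod +x files/entrypoint.sh
docker build -t my-registry.my-company/modelsextract:latest .
docker push my-registry.my-company/modelsextract:latest
Adjust the image reference in the deployment below accordingly.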
In the Kubernetes deployment manifest, we add an init container that runs the extractor image:
initContainers:
  - name: extract
    image: modelsextract
    env:
      - name: MODELS_S3_PATH
        value: {{ .Values.modelsS3Path | quote }}
    volumeMounts:
      - mountPath: /local-storage
        name: local-storage
An emptyDir volume is shared between the init container and the main container; the main container mounts the exported folders back to their original paths using subPath:
volumes:
  - name: local-storage
    emptyDir:
      sizeLimit: 100Gi

volumeMounts:
  - mountPath: /root/.cache
    subPath: root/.cache
    name: local-storage
  - mountPath: /usr/local/lib/python3.12
    subPath: usr/local/lib/python3.12
    name: local-storage
Also, make sure the deployment's main container uses the slim image instead of the original huge one; a sketch follows below.
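For completeness, a minimal sketch of the main container spec, assuming the slim image tag from Step 1; the container name and command are hypothetical placeholders (docker import drops the original ENTRYPOINT/CMD, so the command must be set explicitly here or baked back into the image):
containers:
  - name: llm-inference                                    # hypothetical name
    image: my-registry.my-company/llm-image-slim:latest    # the slim image built in Step 1
    command: ["python3", "/app/serve.py"]                  # hypothetical; docker import drops ENTRYPOINT/CMD
    volumeMounts:                                          # the mounts listed above
      - mountPath: /root/.cache
        subPath: root/.cache
        name: local-storage
      - mountPath: /usr/local/lib/python3.12
        subPath: usr/local/lib/python3.12
        name: local-storage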