Wednesday, September 16, 2020

List and Read Files from AWS S3 using GoLang



 

In this post we will review how to list and read files from AWS S3 using Go.

We will be using the AWS SDK for Go: aws-sdk-go.

To access AWS S3, you must use valid credentials. The default chain of credential providers includes the following (quoted from the AWS SDK):

  1. Environment Credentials - Set of environment variables that are useful when sub processes are created for specific roles.

  2. Shared Credentials file (~/.aws/credentials) - This file stores your credentials based on a profile name and is useful for local development.

  3. EC2 Instance Role Credentials - Use EC2 Instance Role to assign credentials to application running on an EC2 instance. This removes the need to manage credential files in production.


The AWS documentation recommends the third method, since it is the most secure alternative and the credentials are managed automatically. Note that this method can only be used when your code runs on an AWS EC2 instance. Trying to rely on an instance role on a machine that is not an EC2 instance causes the following error:

panic: NoCredentialProviders: no valid providers in chain. Deprecated.
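
If you prefer to be explicit about which provider is used, for example to pick a named profile from the shared credentials file during local development, you can pass the credentials to the session yourself. The following is a minimal sketch, assuming a profile named my-profile exists in ~/.aws/credentials; newSessionWithProfile is a helper name introduced here for illustration, and it requires the github.com/aws/aws-sdk-go/aws/credentials import:

func newSessionWithProfile(region string, profile string) (*session.Session, error) {
	// An empty file name means the default ~/.aws/credentials location.
	creds := credentials.NewSharedCredentials("", profile)

	// Attach the explicit credentials to the session configuration.
	return session.NewSession(&aws.Config{
		Region:      aws.String(region),
		Credentials: creds,
	})
}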


Let's look at the main logic: we will connect to AWS S3, list the files in a specific folder, read the first file from the list, and print it to STDOUT.

The main code is:



package main

import (
	"fmt"
	"io/ioutil"
	"sort"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	region := "us-east-1"

	// Create an AWS session. The credentials are resolved using the
	// default providers chain described above.
	config := aws.Config{
		Region: aws.String(region),
	}
	awsSession, err := session.NewSession(&config)
	if err != nil {
		panic(err)
	}

	s3Client := s3.New(awsSession)

	folder := "my-folder"
	bucket := "my-bucket"

	// List the files under the folder, then read the first one and
	// print its content.
	files := list(s3Client, bucket, folder)
	bytes := read(s3Client, bucket, files[0])
	fmt.Printf("file data is:\n%v\n", string(bytes))
}



The list function receives a bucket name and a folder, and lists all the files within that folder. Note that the listing is recursive: the folder is used as a key prefix, so files in sub folders are returned as well (a non-recursive variant is sketched after the list function below). The returned strings are the S3 keys of the files. A key is the full path of the file starting from the bucket root, regardless of the folder used in the list API call; for example, a file in a sub folder is returned as my-folder/sub/file.txt rather than just file.txt.

The list function is:



func list(s3Client *s3.S3, bucket string, folder string) []string {
	// The folder is passed as a key prefix, so all keys under it
	// (including those in sub folders) are returned.
	params := &s3.ListObjectsInput{
		Bucket: aws.String(bucket),
		Prefix: aws.String(folder),
	}

	resp, err := s3Client.ListObjects(params)
	if err != nil {
		panic(err)
	}

	items := make([]string, 0)
	for _, object := range resp.Contents {
		items = append(items, *object.Key)
	}

	sort.Strings(items)
	return items
}
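
As noted above, listing with a prefix is recursive. If you only need the files directly under the folder, the ListObjects API also accepts a Delimiter. The snippet below is a minimal sketch of such a request, assuming folder does not already end with a slash; with a delimiter of "/", sub folders show up in the response's CommonPrefixes field instead of having their contents expanded:

params := &s3.ListObjectsInput{
	Bucket:    aws.String(bucket),
	Prefix:    aws.String(folder + "/"),
	Delimiter: aws.String("/"),
}

resp, err := s3Client.ListObjects(params)
if err != nil {
	panic(err)
}

// resp.Contents holds the files directly under the folder;
// resp.CommonPrefixes holds the immediate sub folders.
for _, prefix := range resp.CommonPrefixes {
	fmt.Println("sub folder:", *prefix.Prefix)
}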


Finally, let's review the read function. It reads a file from AWS S3 and returns the file content as a byte array. For large files this can be a problem, since the entire file is kept in the process memory; in that case avoid this method and consider the AWS S3 download API instead (a sketch using the download manager appears after the read function below).



func read(s3Client *s3.S3, bucket string, file string) []byte {
	getObject := &s3.GetObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(file),
	}

	result, err := s3Client.GetObject(getObject)
	if err != nil {
		panic(err)
	}
	defer result.Body.Close()

	data, err := ioutil.ReadAll(result.Body)
	if err != nil {
		panic(err)
	}

	return data
}
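
As mentioned above, for large objects it is better to stream the data to disk instead of holding it all in memory. The following is a minimal sketch using the SDK's s3manager download manager; downloadToFile and the local path /tmp/output.dat are names introduced here for illustration, and the function requires the os and github.com/aws/aws-sdk-go/service/s3/s3manager imports:

func downloadToFile(awsSession *session.Session, bucket string, key string) error {
	// Create the destination file. The downloader needs an io.WriterAt,
	// which *os.File satisfies.
	file, err := os.Create("/tmp/output.dat")
	if err != nil {
		return err
	}
	defer file.Close()

	// The download manager fetches the object in chunks and writes them
	// to the file, so the whole body is never kept in memory.
	downloader := s3manager.NewDownloader(awsSession)
	_, err = downloader.Download(file, &s3.GetObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
	})
	return err
}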



Final Notes

In this post we've used some basic AWS S3 APIs to list the files in a bucket and to read a file's content.

Note that a single ListObjects call returns at most 1,000 keys. For buckets (or prefixes) with more than 1K files the results are paginated, so the list API should be called repeatedly, passing ListObjectsInput.Marker to continue from the last returned key.
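
The SDK can also handle the marker bookkeeping for you via the ListObjectsPages helper. The function below is a minimal sketch of a paginated replacement for the list function above; listAll is a name introduced here for illustration:

func listAll(s3Client *s3.S3, bucket string, folder string) []string {
	params := &s3.ListObjectsInput{
		Bucket: aws.String(bucket),
		Prefix: aws.String(folder),
	}

	items := make([]string, 0)
	// ListObjectsPages calls ListObjects repeatedly, advancing the marker
	// between calls, and invokes the callback once per page of results.
	err := s3Client.ListObjectsPages(params, func(page *s3.ListObjectsOutput, lastPage bool) bool {
		for _, object := range page.Contents {
			items = append(items, *object.Key)
		}
		return true // continue to the next page
	})
	if err != nil {
		panic(err)
	}

	sort.Strings(items)
	return items
}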

