Wednesday, September 29, 2021

Evaluating AWS Macie for Sensitive Data Protection

 



In this blog post I will review the Personally Identifiable Information (PII) discovery capabilities of AWS Macie.

Reading the AWS documentation, I found some encouraging statements about Macie:

  • Macie discovers and protects sensitive data in AWS
  • Macie is fully managed, so it is serverless


So, Macie can scan S3 buckets, look for PII, and dump a report with detailed information about the findings to another, dedicated S3 bucket.
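To get a feel for the API, here is a minimal sketch of how such a scan could be started from the CLI, assuming Macie is already enabled for the account; the job name and bucket names below are placeholders:


# Start a one-time Macie classification job on an existing data bucket (placeholder names)
aws macie2 create-classification-job \
  --job-type ONE_TIME \
  --name my-pii-scan \
  --s3-job-definition '{"bucketDefinitions":[{"accountId":"MY_ACCOUNT_NUMBER","buckets":["my-data-bucket"]}]}'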

I have created the following JSON file, and uploaded it to an existing S3 bucket:


{
  "items": [
    {
      "id": 634287634,
      "lastAccessTime": 2154376372,
      "isAdmin": true,
      "name": "John Doe",
      "streetAddress": "Hasharon 1",
      "city": "Ramat Gan",
      "country": "ISRAEL",
      "hasCreditCard": false,
      "phone": "08-6466123",
      "anotherSecret": "5555555555554444"
    },
    {
      "id": 54354554,
      "lastAccessTime": 5435,
      "isAdmin": false,
      "name": "Alice Bob",
      "streetAddress": "Herzel 5",
      "city": "Or Yehuda",
      "country": "ISRAEL",
      "hasCreditCard": true,
      "phone": "08-1234456",
      "anotherSecret": "5105105105105100"
    },
    {
      "id": 543543,
      "lastAccessTime": 54354357654,
      "isAdmin": false,
      "name": "Bob Mcgee",
      "streetAddress": "Hashalom 54",
      "city": "Tel Aviv",
      "country": "ISRAEL",
      "hasCreditCard": true,
      "phone": "03-6478222",
      "anotherSecret": "4111111111111111"
    }
  ]
}
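Uploading the test file can be done from the console or from the CLI; for example (the local file name and bucket name are placeholders):


aws s3 cp pii-sample.json s3://my-data-bucket/pii-sample.json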


The data in the file includes several PII fields:

  • name
  • streetAddress
  • city
  • country
  • phone
  • anotherSecret (which actually contains credit card numbers)


I started configuring my first Macie job to test it. But:

❌ I had to manually create a target S3 bucket for the Macie reports

❌ I had to manually update the target S3 bucket policy to allow Macie to write its report to it


This is disappointing. I had expected a "Create an S3 bucket for me" button, like some other AWS services provide. So I created an S3 bucket, and set the following policy on it:


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "macie.amazonaws.com",
        "AWS": "arn:aws:iam::MY_ACCOUNT_NUMBER:user/MY_USER_NAME"
      },
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::macie-output-reports-bucket",
        "arn:aws:s3:::macie-output-reports-bucket/*"
      ]
    }
  ]
}


Notice that MY_ACCOUNT_NUMBER and MY_USER_NAME should be replaced to match your account and user.
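Applying the policy can also be done from the CLI, assuming the policy above is saved to a local file (the file name is a placeholder):


aws s3api put-bucket-policy \
  --bucket macie-output-reports-bucket \
  --policy file://macie-bucket-policy.json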


Next, Macie wanted me to do some more work before it could dump the detailed report:

❌ I had to manually create a key in KMS

❌ I had to provide permissions for Macie to use the key


Again, no "do it for me" button. And why am I forced to have the report encrypted?

Somewhat annoyed, I created a key in KMS, and supplied the following policy:


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Enable IAM User Permissions",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::MY_ACCOUNT_NUMBER:root"
      },
      "Action": "kms:*",
      "Resource": "*"
    },
    {
      "Sid": "Allow use of the key",
      "Effect": "Allow",
      "Principal": {
        "Service": "macie.amazonaws.com"
      },
      "Action": [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*",
        "kms:DescribeKey"
      ],
      "Resource": "*"
    }
  ]
}
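For reference, creating the key with this policy and pointing Macie at the results bucket can also be scripted; the following is a sketch with placeholder file name, region, and key id, assuming the policy above is saved locally:


# Create the KMS key with the policy above (placeholder file name)
aws kms create-key --policy file://macie-kms-policy.json

# Point Macie at the results bucket, encrypting the findings with the new key (placeholder key ARN)
aws macie2 put-classification-export-configuration \
  --configuration '{"s3Destination":{"bucketName":"macie-output-reports-bucket","kmsKeyArn":"arn:aws:kms:us-east-1:MY_ACCOUNT_NUMBER:key/MY_KEY_ID"}}'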


Finally, I was able to run the job. The bucket contained only the single small file listed above, but still:

❌ I had to wait for several minutes until the job was complete


When the job completed, I got its results, created in the target S3 bucket as a ".gz" file. I expected a detailed report from Macie. According to the documentation, it reports up to the first 15 locations of the PII it finds, but:

❌ The results were very partial


"sensitiveData": [
{
"category": "PERSONAL_INFORMATION",
"totalCount": "2",
"detections": [
{
"type": "NAME",
"count": "2",
"occurrences": {
"lineRanges": [],
"pages": [],
"records": [
{
"recordIndex": "0",
"jsonPath": "$.items[0].name"
},
{
"recordIndex": "0",
"jsonPath": "$.items[2].name"
}
],
"cells": []
}
}
]
}
]


The results included only the names of the first and the third items. For some unclear reason, Macie skipped the second item. In addition, all the other fields were ignored.


Final Note


Well, in case you have not noticed by now, I am very disappointed with Macie. I guess it is still in a pre-production phase, or else I did something really wrong.

In addition, the cost of using this service is $1/GB, which is extremely high.

For now, I would recommend not using this service.


Wednesday, September 22, 2021

How to Create a Customized AWS AMI

 

In this post we will review the steps to create a customized AWS AMI.

AWS AMI stands for Amazon Machine Image. It is a snapshot of an EC2 instance's volumes, which is used to launch new EC2 instances.


We will start by launching a standard Linux x86 machine. Open the EC2 service in the AWS console, and click on the Launch Instances button.





Select the Amazon Linux 2 AMI, which is a basic empty Linux image.




Next we select the instance type. We will use the t2.micro, which has 1 vCPU and 1 GB of RAM. Then click on the Configure Instance Details button.



Now we select the VPC where the instance will be deployed, click on the Review and Launch button, and then click on the Launch button.




Before the instance is launched we need to select a key pair that allows us to connect to the instance. In this case we will create a new key pair, and download it to be used later. Now we can click on the Launch Instance button.




Now that the instance is launched, we can check its status in the EC2 instance table. We need to wait until the status of the instance is Running.
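The same wait can be scripted with the CLI; for example, using a placeholder instance id:


# Block until the instance reaches the running state (placeholder instance id)
aws ec2 wait instance-running --instance-ids i-0123456789abcdef0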




Let's find the instance public IP by clicking on the EC2 instance.

Notice that if you do not get a public IP, it means you have selected a private subnet for the instance, and hence it cannot be accessed from the internet. To make the subnet public, make sure to do the following (a CLI sketch of these steps appears after the list):

  • Add an Internet Gateway
  • Attach it to the VPC
  • Edit the default route table and add a second rule that routes all traffic (0.0.0.0/0) to the internet gateway
  • Edit the subnet and use the Actions button to enable auto-assign public IPv4 addresses
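These steps roughly map to the following CLI commands; the VPC, route table, and subnet ids are placeholders:


# Create an internet gateway and attach it to the VPC (placeholder ids)
IGW_ID=$(aws ec2 create-internet-gateway --query 'InternetGateway.InternetGatewayId' --output text)
aws ec2 attach-internet-gateway --internet-gateway-id ${IGW_ID} --vpc-id vpc-0123456789abcdef0

# Route all outbound traffic through the internet gateway
aws ec2 create-route --route-table-id rtb-0123456789abcdef0 --destination-cidr-block 0.0.0.0/0 --gateway-id ${IGW_ID}

# Auto-assign public IPv4 addresses to instances launched in the subnet
aws ec2 modify-subnet-attribute --subnet-id subnet-0123456789abcdef0 --map-public-ip-on-launch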




To connect to the EC2 instance run the following commands:


chmod 400 my-key.pem
ssh -i my-key.pem ec2-user@44.195.128.228


Now we have a connection to the instance and we can configure it. For example, we can install packages, set up Git access, update the OS configuration, and more.

In this example we will simply create a text file.


echo "Hello World" > /home/ec2-user/indication


Next we stop the EC2 instance, and create an image from it.
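This can be done from the console, or from the CLI as in the sketch below; the instance id and image name are placeholders:


# Stop the instance so the snapshot is consistent (placeholder instance id)
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 wait instance-stopped --instance-ids i-0123456789abcdef0

# Create the AMI from the stopped instance
aws ec2 create-image --instance-id i-0123456789abcdef0 --name my-custom-ami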





We can use the Images menu to track the image creation process.





Once the image is ready we can use it to launch a new EC2 instance.
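For example, launching from the new AMI with the CLI could look like this; the AMI id, key name, and subnet id are placeholders:


aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type t2.micro \
  --key-name my-key \
  --subnet-id subnet-0123456789abcdef0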




That's all for this time!



Monday, September 13, 2021

Managing Dependencies in Cloud Build




In this post we will review a method of managing dependencies in Google Cloud Build (GCB).
For the basics of GCB, check the post Using Google Cloud Build.


While GCB provides a great way to build artifacts and Docker images, it does not handle dependencies between triggers. This might be added at a later stage, as GCB is still in its beta phase. For now, we need to handle this ourselves.

Let's review an example of how dependency management can be achieved. Let's assume that we have two triggers, trigger-1 and trigger-2, that we want to run in parallel. Then we want trigger-3 to start only after the first two triggers have completed successfully.






To make this work, we will create a control plane trigger: trigger-control.
This trigger will launch trigger-1 and trigger-2, wait for their completion, and then start trigger-3.
Notice that the control trigger is started by a push to the source repository, while all of the other triggers are started manually by the control trigger.

The control trigger uses the predefined gcloud builder image.

The control trigger configuration is:


name: trigger-control

triggerTemplate:
  branchName: .*
  projectId: my-project
  repoName: my-source-repo

build:
  timeout: 3600s
  steps:
  - id: main
    name: gcr.io/cloud-builders/gcloud
    entrypoint: bash
    timeout: 3600s
    args:
    - /workspace/trigger_control.sh
    - ${BRANCH_NAME}


And the trigger shell code is:


trigger_control.sh
#!/usr/bin/env bash

branchName=$1
runIds=""

# Start a trigger and record the id of the build it created
function run_trigger() {
  name=$1
  runId=$(gcloud beta builds triggers run ${name} --branch=${branchName} | grep "name:" | grep builds | cut -d: -f2)
  runId=$(echo ${runId} | xargs)

  if [[ "${runIds}" == "" ]]; then
    runIds="${runId}"
  else
    runIds="${runIds},${runId}"
  fi
}

# Poll a single build until it leaves the WORKING state, and fail if it did not succeed
function wait_for_build() {
  buildId=$1
  image=$(gcloud beta builds list --format yaml --filter name=${buildId} | grep "TRIGGER_NAME" | cut -d: -f2)
  image=$(echo ${image} | xargs)
  status="WORKING"
  while [ "${status}" == "WORKING" ]
  do
    status=$(gcloud beta builds list --format yaml --filter name=${buildId} | grep ^status | cut -d: -f2)
    status=$(echo ${status} | xargs)
    if [[ "${status}" == "WORKING" ]]; then
      sleep 5
    fi
  done

  if [[ "${status}" != "SUCCESS" ]]; then
    echo "build for ${image} failed, status is ${status}"
    exit 1
  fi
}

# Wait for all of the builds recorded by run_trigger
function wait_for_builds(){
  IFS=',' read -r -a array <<< "${runIds}"
  for element in "${array[@]}"
  do
    wait_for_build ${element}
  done
}

run_trigger trigger-1
run_trigger trigger-2
wait_for_builds

gcloud beta builds triggers run trigger-3 --branch=${branchName}




In this example, trigger-3 is launched in the background, and the control trigger does not wait for it. We can change this to also wait for trigger-3, so that the control trigger succeeds only if all three triggers succeed.
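Reusing the helper functions from the script above, a possible sketch of that change is to replace the script's last line with:


runIds=""              # reset the list of build ids to wait for
run_trigger trigger-3  # start trigger-3 and record its build id
wait_for_builds        # exit with an error if trigger-3 did not succeed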



Final Note


We have reviewed a method to manage dependencies and parallel runs of triggers using Cloud Build. This base can easily be extended to more complex dependency management in GCB.

Notice that you can also use the build's "waitFor" step configuration to manage parallel steps, but then the parallel steps run on the same build machine, so you would need a much more powerful and expensive machine to run your builds in parallel. The control trigger method is much faster and much simpler.





Thursday, September 9, 2021

Using Google Cloud Build

 

This post presents basic usage of Google Cloud Build (GCB).

I've previously described using Jenkins on GCP, but I found the GCP integration with Jenkins to be very shallow, and hence decided to move the builds to GCB.

GCB manages triggers, which are similar to Jenkins jobs. A trigger can be started by a push to a branch in Git, and it runs a Docker image that builds the application. Unlike Jenkins jobs, GCB currently does not manage dependencies between triggers.

Let's start by reviewing a simple build.






First let's create a trigger configuration.


trigger.yaml

name: build-app

triggerTemplate:
  branchName: .*
  projectId: my-gcp-project
  repoName: my-repo

build:
  steps:
  - id: main
    name: my-builder-image
    entrypoint: bash
    timeout: 600s
    args:
    - /workspace/build.sh
    - ${BRANCH_NAME}
    - ${COMMIT_SHA}
    - ${SHORT_SHA}
  timeout: 600s



and import it using:



gcloud beta builds triggers import --source=trigger.yaml
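After importing, the trigger can also be started manually, which is handy for testing (assuming a main branch exists):


gcloud beta builds triggers run build-app --branch=main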



GCB clones the Git source under the /workspace folder in the Docker container, so we can run our build script from this folder. GCB supplies some predefined builders for standard builds, for example building an image from a Dockerfile, or building a VM image. Still, for complex builds, we will probably need to create our own builder image.
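For reference, a minimal /workspace/build.sh matching the trigger above could look like the sketch below; the image name is a placeholder, and the actual build commands depend on the application:


#!/usr/bin/env bash
set -e

branchName=$1
commitSha=$2
shortSha=$3

# Build the application image, tagging it with the short commit SHA (placeholder image name)
docker build -t gcr.io/my-gcp-project/my-app:${shortSha} .

# Push the image to the container registry
docker push gcr.io/my-gcp-project/my-app:${shortSha}

echo "built branch ${branchName}, commit ${commitSha}"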


To create our own builder image, we create a Dockerfile, and then build it using GCB:



gcloud builds submit --tag gcr.io/my-gcp-project/my-builder-image



Once we push a change to Git, it triggers a build that uses the builder image to build our code and, in most cases, push a new image to the repository.






Final Note


We have reviewed a simple application build using Cloud Build. More complex builds can be done using gcloud to automatically manage dependencies. We will review this in future posts.


  • For dependencies management and parallel running of triggers see this post.
  • For printing Git changes history, see this post.


Thursday, September 2, 2021

Deploy Distributed Jenkins on GCP Cloud

 

In this post we will review the steps to deploy Jenkins on GCP, and to automatically allocate GCP compute instances for distributed builds.

This post is mostly based on the document Using Jenkins for distributed builds on Compute Engine.


Create Image For Jenkins Distributed Agent


Add your SSH key for packer.


gcloud compute project-info describe --format=json | jq -r '.commonInstanceMetadata.items[] | select(.key == "ssh-keys") | .value' > sshKeys.pub
echo "$USER:$(cat ~/.ssh/id_rsa.pub)" >> sshKeys.pub
gcloud compute project-info add-metadata --metadata-from-file ssh-keys=sshKeys.pub
rm -f sshKeys.pub


Install packer.



wget https://releases.hashicorp.com/packer/1.6.6/packer_1.6.6_linux_amd64.zip
unzip packer_1.6.6_linux_amd64.zip
rm -f packer_1.6.6_linux_amd64.zip
sudo mv ./packer /usr/local/bin/


Create a service account for Packer.



export PROJECT="my-project"
export NAME="packer"
export ACCOUNT="${NAME}@${PROJECT}.iam.gserviceaccount.com"
gcloud iam service-accounts delete ${ACCOUNT} --quiet
gcloud iam service-accounts create ${NAME} --display-name ${NAME}

gcloud projects add-iam-policy-binding $PROJECT --member serviceAccount:$ACCOUNT --role roles/compute.admin
gcloud projects add-iam-policy-binding $PROJECT --member serviceAccount:$ACCOUNT --role roles/iam.serviceAccountUser

gcloud iam service-accounts keys create service-account.json --iam-account $ACCOUNT


Build the image using Packer.


export GOOGLE_APPLICATION_CREDENTIALS=./service-account.json
packer build template.json


Here, template.json is the following:


{
  "builders": [
    {
      "type": "googlecompute",
      "project_id": "radware-cto",
      "source_image_family": "ubuntu-2004-lts",
      "source_image_project_id": "ubuntu-os-cloud",
      "zone": "us-central1-c",
      "disk_size": "10",
      "image_name": "jenkins-agent",
      "image_family": "jenkins-agent",
      "ssh_username": "ubuntu"
    }
  ],
  "provisioners": [
    {
      "type": "file",
      "source": "ssh_config.txt",
      "destination": "/tmp/ssh_config.txt"
    },
    {
      "type": "shell",
      "inline": ["sudo cp /tmp/ssh_config.txt /etc/ssh/ssh_config"]
    },
    {
      "type": "shell",
      "inline": ["sudo apt-get update && sudo apt-get install -y default-jdk"]
    }
  ]
}



Create Service Account for Jenkins


To create a service account for Jenkins, update the project name, and run the following script.


export PROJECT="my-project"
export ACCOUNT="jenkins@${PROJECT}.iam.gserviceaccount.com"
gcloud iam service-accounts delete ${ACCOUNT} --quiet
gcloud iam service-accounts create jenkins --display-name jenkins

gcloud projects add-iam-policy-binding $PROJECT --member serviceAccount:$ACCOUNT --role roles/storage.admin
gcloud projects add-iam-policy-binding $PROJECT --member serviceAccount:$ACCOUNT --role roles/compute.instanceAdmin.v1
gcloud projects add-iam-policy-binding $PROJECT --member serviceAccount:$ACCOUNT --role roles/compute.networkAdmin
gcloud projects add-iam-policy-binding $PROJECT --member serviceAccount:$ACCOUNT --role roles/compute.securityAdmin
gcloud projects add-iam-policy-binding $PROJECT --member serviceAccount:$ACCOUNT --role roles/iam.serviceAccountActor

gcloud iam service-accounts keys create jenkins-service-account.json --iam-account $ACCOUNT


Install Jenkins from the Marketplace


Once installed, log in to Jenkins, and install the following plugins:

  • Google Compute Engine
  • Cloud Storage


Next, configure the Jenkins plugin, as mentioned in the document Using Jenkins for distributed builds on Compute Engine.

 

Assign Static IP For Jenkins


Update the Jenkins VM name and the related region, and run the following script.


VM=jenkins-1-vm
REGION=us-central1

# Reserve a static external IP address
gcloud compute addresses create jenkins-static-ip --region=${REGION}
ADDRESS=$(gcloud compute addresses describe jenkins-static-ip --region=${REGION} | grep "address:" | cut -d' ' -f2)

# Replace the VM's ephemeral access config with one that uses the static address
INTERFACE_NAME=$(gcloud compute instances describe ${VM} | grep -A5 accessConfigs | grep name | cut -d: -f2 | cut -c2-100)
gcloud compute instances delete-access-config ${VM} --access-config-name=${INTERFACE_NAME}
gcloud compute instances add-access-config ${VM} --access-config-name=${INTERFACE_NAME} --address=${ADDRESS}