In this post I will review Personally Identifiable Information (PII) discovery by AWS Macie.
Reading the AWS documentation, I found some encouraging information about Macie:
✓ Macie discovers and protects sensitive data in AWS
✓ Macie is fully managed, so it is serverless
So Macie can scan S3 buckets, look for PII, and dump a report with detailed information about the findings to another, designated S3 bucket.
I created the following JSON file and uploaded it to an existing S3 bucket:
{
"items": [
{
"id": 634287634,
"lastAccessTime": 2154376372,
"isAdmin": true,
"name": "John Doe",
"streetAddress": "Hasharon 1",
"city": "Ramat Gan",
"country": "ISRAEL",
"hasCreditCard": false,
"phone": "08-6466123",
"anotherSecret": "5555555555554444"
},
{
"id": 54354554,
"lastAccessTime": 5435,
"isAdmin": false,
"name": "Alice Bob",
"streetAddress": "Herzel 5",
"city": "Or Yehuda",
"country": "ISRAEL",
"hasCreditCard": true,
"phone": "08-1234456",
"anotherSecret": "5105105105105100"
},
{
"id": 543543,
"lastAccessTime": 54354357654,
"isAdmin": false,
"name": "Bob Mcgee",
"streetAddress": "Hashalom 54",
"city": "Tel Aviv",
"country": "ISRAEL",
"hasCreditCard": true,
"phone": "03-6478222",
"anotherSecret": "4111111111111111"
}
]
}
The data in the file includes several PII fields:
- name
- streetAddress
- city
- country
- phone
- anotherSecret (which actually contains credit card numbers)
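To show that the anotherSecret values really look like card numbers, here is a minimal sketch that runs the standard Luhn checksum over them. The function name luhn_valid is my own, and I am not claiming this is how Macie detects card numbers; it only confirms that the three values in the file are well-formed card numbers.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


# The three anotherSecret values from the uploaded JSON file.
secrets = ["5555555555554444", "5105105105105100", "4111111111111111"]
for value in secrets:
    print(value, luhn_valid(value))
```

All three values pass the checksum, so a scanner that looks at the values rather than the field name has enough to go on.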
I started configuring my first Macie job to test it. But:
❌ I had to manually create a target S3 bucket for the Macie reports
❌ I had to manually update the target S3 bucket policy to allow Macie to write its report to it
This is disappointing. I had expected a "Create an S3 bucket for me" button, like the one other AWS services that write to S3 provide. So I created an S3 bucket and set the following policy on it:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "macie.amazonaws.com",
"AWS": "arn:aws:iam::MY_ACCOUNT_NUMBER:user/MY_USER_NAME"
},
"Action": "s3:*",
"Resource": [
"arn:aws:s3:::macie-output-reports-bucket",
"arn:aws:s3:::macie-output-reports-bucket/*"
]
}
]
}
Note that MY_ACCOUNT_NUMBER and MY_USER_NAME should be replaced to match your account and user.
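To avoid editing the placeholders by hand, the policy can be generated. This is only a sketch: build_report_bucket_policy is a hypothetical helper of mine, and the account number and user name in the example call are made-up values. It reproduces the policy above as-is, including the broad "s3:*" action.

```python
import json


def build_report_bucket_policy(account_id: str, user_name: str, bucket: str) -> dict:
    """Build the bucket policy shown above with the placeholders filled in."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": "macie.amazonaws.com",
                    "AWS": f"arn:aws:iam::{account_id}:user/{user_name}",
                },
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",      # the bucket itself
                    f"arn:aws:s3:::{bucket}/*",    # every object in the bucket
                ],
            }
        ],
    }


# Example values only: replace with your own account number and user name.
policy = build_report_bucket_policy(
    "123456789012", "example-user", "macie-output-reports-bucket"
)
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the bucket policy editor in the console.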
Next, Macie required more work from me before it could dump the report details:
❌ I had to manually create a key in KMS
❌ I had to provide permissions for Macie to use the key
Again, no "do it for me" button. And why am I forced to have the report encrypted?
Somewhat annoyed, I created a key in KMS and supplied the following policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Enable IAM User Permissions",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::MY_ACCOUNT_NUMBER:root"
},
"Action": "kms:*",
"Resource": "*"
},
{
"Sid": "Allow use of the key",
"Effect": "Allow",
"Principal": {
"Service": "macie.amazonaws.com"
},
"Action": [
"kms:Encrypt",
"kms:Decrypt",
"kms:ReEncrypt*",
"kms:GenerateDataKey*",
"kms:DescribeKey"
],
"Resource": "*"
}
]
}
Finally, I was able to run the job. The bucket contained only the single small file listed above, but still:
❌ I had to wait several minutes for the job to complete
Eventually I got the results from the job, created in the target S3 bucket as a ".gz" file. I expected a detailed report from Macie. According to the documentation, the report lists up to the first 15 locations of PII, but:
❌ The results were very partial
"sensitiveData": [
{
"category": "PERSONAL_INFORMATION",
"totalCount": "2",
"detections": [
{
"type": "NAME",
"count": "2",
"occurrences": {
"lineRanges": [],
"pages": [],
"records": [
{
"recordIndex": "0",
"jsonPath": "$.items[0].name"
},
{
"recordIndex": "0",
"jsonPath": "$.items[2].name"
}
],
"cells": []
}
}
]
}
]
The results included only the names of the first and third items. For some unclear reason, Macie skipped the second item. In addition, all the other fields were ignored.
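To quantify how partial the results are, here is a rough sketch that reads a ".gz" report and lists which PII locations were not reported. It assumes the report is gzip-compressed JSON Lines (one JSON object per line) and, rather than relying on the full report schema, simply collects every "jsonPath" value it finds. The helper names and the in-memory sample (rebuilt from the fragment above) are my own.

```python
import gzip
import io
import json

# The PII fields listed earlier in this post.
PII_FIELDS = ["name", "streetAddress", "city", "country", "phone", "anotherSecret"]


def collect_json_paths(node, paths: set) -> None:
    """Recursively gather every string stored under a "jsonPath" key."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "jsonPath" and isinstance(value, str):
                paths.add(value)
            else:
                collect_json_paths(value, paths)
    elif isinstance(node, list):
        for item in node:
            collect_json_paths(item, paths)


def detected_json_paths(gz_bytes: bytes) -> set:
    """Return all jsonPath values in a gzip-compressed JSON Lines report."""
    paths = set()
    with gzip.open(io.BytesIO(gz_bytes), "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                collect_json_paths(json.loads(line), paths)
    return paths


def missed_paths(detected: set, item_count: int, fields: list) -> list:
    """Return the expected PII locations that are absent from the report."""
    expected = {
        f"$.items[{index}].{field}"
        for index in range(item_count)
        for field in fields
    }
    return sorted(expected - detected)


# In-memory stand-in for the downloaded report, based on the fragment above.
sample = {"sensitiveData": [{"category": "PERSONAL_INFORMATION", "detections": [
    {"type": "NAME", "occurrences": {"records": [
        {"recordIndex": "0", "jsonPath": "$.items[0].name"},
        {"recordIndex": "0", "jsonPath": "$.items[2].name"},
    ]}}
]}]}
report = gzip.compress((json.dumps(sample) + "\n").encode("utf-8"))

found = detected_json_paths(report)
missing = missed_paths(found, item_count=3, fields=PII_FIELDS)
print(f"detected {len(found)} of {len(found) + len(missing)} PII locations")
```

With three items and six PII fields each there are 18 locations to find, and the report covers only 2 of them.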
Final Note
Well, in case you have not noticed by now, I am very disappointed with Macie. I guess it is still in a pre-production phase, or else I did something really wrong.
In addition, the cost of using this service is $1/GB, which is extremely high.
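To put that rate in perspective, here is a back-of-the-envelope sketch. It assumes a flat $1 per GB as quoted above, treats 1 GB as 2^30 bytes, and ignores any free tier or volume discounts; estimate_scan_cost is a hypothetical helper, not an AWS API.

```python
PRICE_PER_GB_USD = 1.0  # the rate quoted in this post; check current pricing


def estimate_scan_cost(size_bytes: int, price_per_gb: float = PRICE_PER_GB_USD) -> float:
    """Estimate the scan cost in USD for `size_bytes` of S3 data."""
    gigabytes = size_bytes / (1024 ** 3)
    return gigabytes * price_per_gb


ten_tib = 10 * 1024 ** 4
print(f"Scanning 10 TiB once: ${estimate_scan_cost(ten_tib):,.2f}")
```

Under these assumptions, a single scan of a 10 TiB bucket costs a little over ten thousand dollars.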
For now, I would recommend not using this service.