Lately I have been reading the book Building Secure and Reliable Systems.
This book is very relevant to my current project, which started about a year ago and is now starting to acquire customers, so we are looking for principles of stability and security. These terms are not new to me, but the book takes an interesting point of view, combining security and stability in a single methodology.
Our project runs on a cloud-based Kubernetes platform, and based on the methods presented in the book, I've made the following changes to the project.
Kibana View Only User
We use Kibana and Elasticsearch to view the application status. Previously we ran Elasticsearch in non-secured mode, counting on our authentication service to block unauthorized users and on the Kubernetes Ingress to encrypt the traffic. But we found that some of our users should only view the Kibana dashboards, and we do not want them to be able to update them. Hence we started by configuring TLS for Elasticsearch, which later allowed us to add a view-only user in Kibana. This can be automated using the following script:
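#!/usr/bin/env bash
# Admin credentials used to call the Kibana and Elasticsearch APIs.
AUTH_ARG="-u elastic:mypassword"

# Create a Kibana role that can only read indices and view the default space.
function createRole(){
  cat << EOF > ./input.json
{
  "elasticsearch":{
    "cluster":[],
    "indices":[
      {
        "names":["*"],
        "privileges":["read"],
        "allow_restricted_indices":false
      }
    ]
  },
  "kibana":[
    {
      "base":["read"],
      "spaces":["default"]
    }
  ]
}
EOF
  # AUTH_ARG is left unquoted on purpose so it splits into "-u" and the credentials.
  curl ${AUTH_ARG} -s -X PUT -H 'kbn-xsrf: true' -H 'Content-Type: application/json' 'http://localhost:5601/api/security/role/my_viewer_role' --data-binary "@input.json"
}

# Create the view-only user in Elasticsearch and assign it the role above.
# ELASTICSEARCH_HOSTS is not defined in this script, so it must be set in the environment.
function createUser(){
  cat << EOF > ./input.json
{
  "password": "myviewpassword",
  "roles": ["my_viewer_role"]
}
EOF
  # -k skips TLS certificate verification.
  curl ${AUTH_ARG} -s -k -X POST -H 'Content-Type: application/json' "${ELASTICSEARCH_HOSTS}/_security/user/myview" --data-binary "@input.json"
}

createRole
createUser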
Pods Anti Affinity
We want to ensure that if a Kubernetes node crashes, none of our microservices goes down. This is done by running at least 2 replicas of each critical microservice and using an anti-affinity rule to ask Kubernetes not to schedule 2 pods of the same microservice on the same node. Note that since our cluster has only a small number of nodes, we set the anti-affinity as a preference, not as a hard requirement.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      configid: my-container
  template:
    metadata:
      labels:
        configid: my-container
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchExpressions:
                    - key: configid
                      operator: In
                      values:
                        - my-container
      containers:
        - name: ...
Jenkins Merge Job
For some time, we updated the production environment by manually merging the Git dev branch into the master branch. To reduce manual steps and the risk of mistakes, we added a Jenkins job that automates the merge.
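#!/usr/bin/env bash
# Stop at the first failing command so a failed fetch or merge is never pushed.
set -e
# Assumption: the remote is named "origin"; adjust if yours differs.
git fetch origin
# -B creates or resets the local branch. The explicit start point aligns it
# with the remote branch instead of whatever commit Jenkins checked out.
git checkout -B dev origin/dev
git checkout -B master origin/master
git merge dev -m "merge by jenkins"
git push --set-upstream ....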
Auditing
Auditing enables us to find problems and malicious actions. We have added the following audit records:
- Audit log of successful and failed logins to the authentication service
- Audit log of the user name and the change made, for every change in the management service
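As an illustration of what such a record could look like, here is a minimal sketch that appends one JSON line per audited action. It is not how our services are implemented: the names audit_log, json_escape and AUDIT_FILE, the field names, and the sample users are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# Sketch only: audit_log, json_escape and AUDIT_FILE are hypothetical names.

AUDIT_FILE="${AUDIT_FILE:-./audit.log}"

# Drop control characters, then escape backslashes and double quotes,
# so that a value can be embedded safely in a JSON string.
json_escape() {
  printf '%s' "$1" | tr -d '[:cntrl:]' | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# audit_log EVENT USER DETAIL: append one JSON record per line to AUDIT_FILE.
audit_log() {
  local event user detail
  event="$(json_escape "$1")"
  user="$(json_escape "$2")"
  detail="$(json_escape "$3")"
  printf '{"time":"%s","event":"%s","user":"%s","detail":"%s"}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$event" "$user" "$detail" >> "$AUDIT_FILE"
}

# Example records matching the two audit types listed above (sample data).
audit_log "login" "alice" "failed"
audit_log "config_change" "bob" "updated rate limit"
```

One record per line keeps the log easy to ship to Elasticsearch and to search by user or event later.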
Fail-Safe
We have had some cases where our Redis DB was down. While we strive to prevent such cases, we cannot always cover every corner case. We have decided that when the DB is down, the services switch to a "pass-through" mode, meaning that they allow any operation that is possible without accessing the DB. This leaves the system only partly functional, but that is better than a fully non-operational system.
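The pass-through idea can be sketched in a few lines of bash. This is only an illustration under assumptions, not our actual service code: db_get and check_request are hypothetical names, the "deny" value is invented for the example, and the sketch assumes that the redis-cli call exits with a non-zero status when Redis cannot be reached.

```shell
#!/usr/bin/env bash
# Sketch only: db_get and check_request are hypothetical names.

# Read one key from Redis.
# Assumption: the command exits non-zero when redis-cli is missing
# or the server cannot be reached.
db_get() {
  redis-cli GET "$1" 2>/dev/null
}

# check_request KEY prints "deny" only when the DB is reachable and the
# stored value is "deny". When the DB is down, it switches to pass-through
# and prints "allow", so the service keeps working without the DB.
check_request() {
  local key="$1" value
  if value="$(db_get "$key")"; then
    if [ "$value" = "deny" ]; then
      echo "deny"
    else
      echo "allow"
    fi
  else
    echo "pass-through: DB unavailable, allowing ${key}" >&2
    echo "allow"
  fi
}
```

Calling check_request with any key while Redis is down prints "allow" and writes a warning to stderr. The sketch fails open, which matches the choice described above of preferring a partly functioning system.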
Final Note
These are only the first steps toward a more secure and stable product. I will keep posting updates as I progress through the book and apply more changes.