In this post we review a combination of two development methods: Data Driven Development and Visualized Data. Together, these methods are key to converting an idea into a real product.
A. The Fairy Tale
Once upon a time, a great developer had an idea: "I will add my software component in the middle of the network traffic and do something really good with it!" And so, the great developer implemented his idea, placed his software component in just the right spot along the network traffic, and everything worked!
Ahh.. No...
Stories like this are fairy tales; they do not happen in real life. When we have an idea, we are not aware of the full implications of implementing it, and a plan designed around theoretical data runs into unexpected behavior on real data. A software component implemented and deployed this way tends to fail quickly and disgracefully.
B. Data Driven Development
B.1. Get The Data
To prepare for real data, we need to develop our software side by side with real data, starting from day one. This means that we need to get hold of real data. This is possible if we are part of a software organization that already has several products running in the cloud. We need a tap (mirror) of the data, saved for our application. For example, we can enable saving of the real data to AWS S3 for a small subset of the organization's customers.
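A minimal sketch of capturing a tap for a small subset of customers, assuming each mirrored record is a dict with a `customer_id` field. To stay self-contained it writes JSON-lines files to a local directory that stands in for the S3 bucket; `should_capture`, `CaptureWriter` and the 1% sample rate are hypothetical names and values, not part of any existing system. Hashing the customer id, rather than sampling individual records at random, keeps the selection stable, so we get complete traffic for a few customers instead of scattered records from everyone.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical sample rate: capture roughly 1% of customers.
SAMPLE_PERCENT = 1


def should_capture(customer_id: str, sample_percent: int = SAMPLE_PERCENT) -> bool:
    """Return True if this customer belongs to the captured subset.

    The decision depends only on a hash of the id, so the same customer is
    always in (or always out of) the sample.
    """
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < sample_percent


class CaptureWriter:
    """Appends mirrored records as JSON lines, one file per sampled customer.

    A local directory stands in for the S3 bucket of a real deployment.
    """

    def __init__(self, root: Path, sample_percent: int = SAMPLE_PERCENT) -> None:
        self.root = root
        self.sample_percent = sample_percent
        self.root.mkdir(parents=True, exist_ok=True)

    def write(self, record: dict) -> bool:
        """Save the record if its customer is sampled; return whether it was saved."""
        customer_id = str(record["customer_id"])
        if not should_capture(customer_id, self.sample_percent):
            return False
        # Name the file by a hash so raw customer ids never appear in file names.
        name = hashlib.sha256(customer_id.encode("utf-8")).hexdigest()[:16]
        with (self.root / f"{name}.jsonl").open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
        return True
```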
B.2. Secure The Data
Access to real data has huge benefits, but also huge risks. Think about real network traffic that contains credit card details or medical information. We must use ALL available means to secure access to the data.
B.3. Anonymization of the Data
Notice that this requires us to handle PII (personally identifiable information) and to comply with the relevant laws and regulations, such as the EU's GDPR.
One way to handle this is to anonymize the data before saving it. We can also keep the data for only a short period of time and then delete it. This must be handled carefully, as a leak of customers' real data can have devastating implications for the software organization.
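A minimal sketch of both ideas, assuming records are flat dicts with a `captured_at` epoch timestamp. The field lists (`DROP_FIELDS`, `PSEUDONYMIZE_FIELDS`) are hypothetical and depend entirely on the real traffic format. Sensitive fields are dropped, and identifying fields are replaced by a keyed hash (HMAC-SHA256) so records of the same user can still be correlated. Note that keyed hashing is pseudonymization rather than full anonymization; under the GDPR, pseudonymized data is generally still personal data, so the retention limit and access controls still apply.

```python
import hashlib
import hmac
import time
from typing import List, Optional

# Hypothetical field lists; the real ones depend on the traffic format.
DROP_FIELDS = {"credit_card", "medical_notes"}
PSEUDONYMIZE_FIELDS = {"customer_id", "email", "ip"}


def anonymize(record: dict, secret_key: bytes) -> dict:
    """Return a copy of the record that is safer to store.

    Fields in DROP_FIELDS are removed; fields in PSEUDONYMIZE_FIELDS are
    replaced by a truncated HMAC-SHA256 of their value; others are kept.
    """
    clean = {}
    for key, value in record.items():
        if key in DROP_FIELDS:
            continue
        if key in PSEUDONYMIZE_FIELDS:
            mac = hmac.new(secret_key, str(value).encode("utf-8"), hashlib.sha256)
            clean[key] = mac.hexdigest()[:16]
        else:
            clean[key] = value
    return clean


def purge_expired(records: List[dict], max_age_seconds: float,
                  now: Optional[float] = None) -> List[dict]:
    """Keep only records captured within the retention window."""
    now = time.time() if now is None else now
    return [r for r in records if now - r["captured_at"] <= max_age_seconds]
```

The secret key should live outside the stored data (for example in a secrets manager); whoever holds it can re-link pseudonyms to known identifiers.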
B.4. Simulation of Data Flow
Now that we have the data, and before starting to implement our software component, we should create a simulation wrapper. The simulation wrapper reads the data from the saved location and runs our software component as if it were actually running in production in the cloud. This means that the simulation should stream the data into our software component.
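A minimal sketch of such a wrapper, assuming the saved data is a set of JSON-lines files, each already sorted by a `captured_at` field, and assuming the component exposes a `process(record)` method (the `Component` protocol here is hypothetical). `heapq.merge` interleaves the files lazily in timestamp order, so records are streamed one by one, roughly as live traffic from many customers would arrive, without loading everything into memory.

```python
import heapq
import json
from pathlib import Path
from typing import Iterator, List, Protocol


class Component(Protocol):
    """Interface we assume the real component exposes (hypothetical)."""

    def process(self, record: dict) -> None: ...


def stream_records(path: Path) -> Iterator[dict]:
    """Yield saved records one at a time, skipping blank lines."""
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def simulate(component: Component, paths: List[Path]) -> int:
    """Stream every saved record into the component in capture-time order.

    Returns the number of records fed to the component.
    """
    streams = [stream_records(p) for p in paths]
    count = 0
    for record in heapq.merge(*streams, key=lambda r: r["captured_at"]):
        component.process(record)
        count += 1
    return count
```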
B.5. Use the Same Source
An important point: our simulation is a wrapper around the actual component source code, the same code that runs in production. Do not make the mistake of keeping two sets of code, one for simulation and one for production.
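One way to enforce this is a single entry point in which only the data source changes. A minimal sketch, assuming a hypothetical `TrafficCounter` class standing in for the real component, and assuming that in production the mirrored traffic arrives as JSON lines on stdin. The `--simulate` flag swaps the source for a saved capture file; the component object and the `run` loop are the same in both modes.

```python
import argparse
import json
import sys
from typing import Dict, Iterable, Iterator, List, Optional, TextIO


class TrafficCounter:
    """Hypothetical stand-in for the real component: counts records per protocol."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}

    def process(self, record: dict) -> None:
        protocol = record.get("protocol", "unknown")
        self.counts[protocol] = self.counts.get(protocol, 0) + 1


def json_lines(stream: TextIO) -> Iterator[dict]:
    """Parse one JSON record per non-empty line."""
    for line in stream:
        if line.strip():
            yield json.loads(line)


def run(component: TrafficCounter, source: Iterable[dict]) -> TrafficCounter:
    """Shared by simulation and production: only the data source differs."""
    for record in source:
        component.process(record)
    return component


def main(argv: Optional[List[str]] = None) -> TrafficCounter:
    parser = argparse.ArgumentParser()
    parser.add_argument("--simulate", metavar="PATH",
                        help="replay a saved capture instead of reading live traffic")
    args = parser.parse_args(argv)
    component = TrafficCounter()  # the very same class in both modes
    if args.simulate:
        with open(args.simulate, encoding="utf-8") as f:
            return run(component, json_lines(f))
    # Assumption: in production the mirrored traffic arrives as JSON lines on stdin.
    return run(component, json_lines(sys.stdin))
```

For example, `main(["--simulate", "capture.jsonl"])` replays a saved file, while `main([])` reads the live stream from stdin.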
C. Visualized Data
Our software component does something (otherwise, why does it exist?). For example, it can periodically report an analysis of the data, or it can alter the data. Whatever it does, we need visibility into it both in the simulation and in the production run. How should we check that it is doing its job?
C.1. Logs - The Wrong Method
While logs may be fine for deep inspection of a specific problem, they are not suitable for checking whether the software component fulfills its purpose. There are many problems with logs:
- Do we need to scan through thousands of log lines to find the few that represent the status?
- Do we plan to keep verbose logs in production, and pay the price of storing and searching them?
- Can we show the logs to a non-software-engineer and explain the result?
C.2. GUI - The Right Method
C.3. Save the Status
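A minimal sketch of exporting a status, assuming the component counts a few events of interest and periodically writes a small JSON snapshot named `status-<epoch>.json`. `StatusExporter`, the counter names, and the 60-second interval are hypothetical; a local directory stands in for wherever the organization actually stores such data (for example S3 or a database). Each snapshot summarizes one time window in a few numbers rather than thousands of log lines.

```python
import json
import time
from pathlib import Path
from typing import Dict, Optional


class StatusExporter:
    """Hypothetical helper that periodically saves a small JSON status snapshot."""

    def __init__(self, directory: Path, interval_seconds: float = 60.0,
                 start: Optional[float] = None) -> None:
        self.directory = directory
        self.directory.mkdir(parents=True, exist_ok=True)
        self.interval_seconds = interval_seconds
        self.window_start = time.time() if start is None else start
        self.counters: Dict[str, int] = {}

    def increment(self, name: str, amount: int = 1) -> None:
        """Count an event, e.g. 'processed' or 'altered'."""
        self.counters[name] = self.counters.get(name, 0) + amount

    def maybe_flush(self, now: Optional[float] = None) -> Optional[Path]:
        """Write a snapshot if the interval has elapsed; return its path if written."""
        now = time.time() if now is None else now
        if now - self.window_start < self.interval_seconds:
            return None
        snapshot = {"start": self.window_start, "end": now, "counters": self.counters}
        path = self.directory / f"status-{int(now)}.json"
        path.write_text(json.dumps(snapshot), encoding="utf-8")
        # Start a fresh window so each snapshot describes only its own interval.
        self.counters = {}
        self.window_start = now
        return path
```

The component would call `increment` while processing records and `maybe_flush` after each record (or on a timer), in simulation and production alike.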
C.4. Load the Status
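A minimal sketch of a status loader, assuming snapshots are stored as `status-<epoch>.json` files that each contain an `end` timestamp, in a hypothetical `STATUS_DIR`. It uses only the standard library `http.server` module and answers `GET /status?since=<epoch>&until=<epoch>` with a JSON array, which is the kind of HTTP request a browser-based visualization can send. The query parameters and path are illustrative choices, not an existing API.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path
from typing import List
from urllib.parse import parse_qs, urlparse

STATUS_DIR = Path("status")  # hypothetical location written by the component


def load_statuses(directory: Path, since: float, until: float) -> List[dict]:
    """Return saved snapshots whose window ends within [since, until], oldest first."""
    snapshots = []
    for path in directory.glob("status-*.json"):
        snapshot = json.loads(path.read_text(encoding="utf-8"))
        if since <= snapshot["end"] <= until:
            snapshots.append(snapshot)
    return sorted(snapshots, key=lambda s: s["end"])


class StatusHandler(BaseHTTPRequestHandler):
    """Answers GET /status?since=<epoch>&until=<epoch> with a JSON array."""

    def do_GET(self) -> None:
        url = urlparse(self.path)
        if url.path != "/status":
            self.send_error(404)
            return
        query = parse_qs(url.query)
        try:
            since = float(query.get("since", ["0"])[0])
            until = float(query.get("until", ["inf"])[0])
        except ValueError:
            self.send_error(400, "since/until must be numbers")
            return
        body = json.dumps(load_statuses(STATUS_DIR, since, until)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the loader until interrupted."""
    ThreadingHTTPServer(("", port), StatusHandler).serve_forever()
```

Because the loader only reads aggregated snapshots, it never needs access to the raw (and sensitive) captured traffic.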
C.5. Visualize the Status
And who sends HTTP requests to the status loader? The status visualization component. This is a JavaScript-based application that presents the status loader's responses in an interactive, user-friendly manner. We can implement such a component fairly easily, for example using React and Redux. To display graphs and histograms, and to add interactive controls such as date pickers and dropdowns, we can use existing free libraries such as react-vis, google geomap, react-date-range, react-datepicker, react-dropdown, react-select, and many more.
D. Summary
- Get real data
- Implement a simulation wrapper that runs the real software component code
- Export the status from our software component
- Implement a status loader
- Implement a status visualization
- Test our software component using the simulation and real data
- Check results using the status loader and status visualization
- Fix issues, and rerun until we have good enough results
- Move to production with a minimal deployment
- Check results using the status loader and status visualization (yes, the same tools should be used for production!)
- Fix issues, and rerun until we have good enough results