Monday, December 26, 2022

Using kind for Local Development

 



Overview

In this post we will review a method for using kind for local development, for example on a personal laptop. Kind stands for Kubernetes IN Docker, and uses docker containers to simulate kubernetes nodes. It is a great replacement for bare-metal kubernetes, which until recently was relatively easy to install, but became much more complicated with the CRI changes in kubernetes.

The steps below explain how to install a kind cluster with an NGINX-based ingress. Then we show how to simplify communication with the services deployed on the kubernetes cluster.


Install The Kind Binary

Kind is a standalone binary, so we just download it:


curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.17.0/kind-linux-amd64
chmod +x ./kind
sudo mv ./kind /usr/local/bin/kind


Create Kind Based Kubernetes Cluster

To install a kubernetes cluster, we use the following:


cat <<EOF | kind create cluster --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "ingress-ready=true"
  extraPortMappings:
  - containerPort: 80
    hostPort: 80
    protocol: TCP
  - containerPort: 443
    hostPort: 443
    protocol: TCP
EOF


This enables binding of ports 80 and 443 on the local machine, which will later be used for communication from the local development machine to the services running on the kubernetes cluster.


Deploy NGINX Ingress

Now we deploy the NGINX-based ingress controller, which is the most popular kubernetes ingress.


kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/deploy.yaml

sleep 5

kubectl wait --namespace ingress-nginx \
--for=condition=ready pod \
--selector=app.kubernetes.io/component=controller \
--timeout=90s


Load Docker Images

In most cases we build the application docker images locally on the development machine, so to make them available to the kind-based kubernetes cluster we need to load them, for example:

kind load docker-image ${DockerTag}

This command should be integrated into the docker image build scripts on the local machine.
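For example, the image build script might end with loading the image into kind (a sketch; the image name and tag are placeholders):

docker build -t my-service:dev .
kind load docker-image my-service:dev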


List Loaded Docker Images

While this is not part of the regular build and test process, I've found the following command very useful for viewing the images that are already loaded into the kind cluster:


docker exec -it $(kind get clusters | head -1)-control-plane crictl images


Communication With The Services

Great! We've deployed our services on the kubernetes cluster, and everything works. But wait... how do we access these services? The best method is a combination of a development-only ingress on the kind cluster with a hosts file update.

Let's assume we have kubernetes services named my-service-1 and my-service-2. First we create an ingress to route the incoming requests to the services. Notice that this ingress should be deployed only as part of the development environment, and not in production. This can easily be accomplished using helm flags, as shown in the sketch after the ingress below. The ingress would be:


apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-kind
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
spec:
  rules:
  - host: my-service-1
    http:
      paths:
      - pathType: ImplementationSpecific
        backend:
          service:
            name: my-service-1
            port:
              number: 80
  - host: my-service-2
    http:
      paths:
      - pathType: ImplementationSpecific
        backend:
          service:
            name: my-service-2
            port:
              number: 80
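As a sketch of the helm flags approach mentioned above, the ingress template can be wrapped in a condition on a dedicated value (the devIngress value name is an assumption, not part of any existing chart):

{{- if .Values.devIngress }}
# ... the ingress manifest shown above ...
{{- end }}

The development environment then installs the chart with --set devIngress=true (or sets it in a development values file), while production leaves it disabled.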


Notice that we use the host name to select the target service to access. But how do we get requests to my-service-1 and my-service-2 to reach the kind-based kubernetes ingress? Simple - we update the hosts file:

127.0.0.1 my-service-1
127.0.0.1 my-service-2


By pointing these host names to localhost, requests reach the port bindings that we configured earlier as part of the cluster creation.

Final Note

Kind is a great and easy tool for local development on a kubernetes cluster, but there are some downsides. First, we need to wait for the docker images to be loaded into the kind cluster. This takes about 20 seconds, and while it is a relatively short time, it is a bit annoying. Second, using the ingress we can route only to HTTP/S-based services. TCP-based services (such as Redis) cannot be addressed using this method.

Monday, December 19, 2022

React-Redux Application Coding Guidelines


 


In this post we will list some coding best practices for react/redux application development. These rules of thumb prevent a great deal of pain during the initial application development, and even greater pain during the later stages of application maintenance.


create-react-app

Use create-react-app with the redux template to initialize your new application. See the example in the post Create Redux App with Async Ajax Calls, which also includes a requests wrapper and error handling.

When using create-react-app, the application build is greatly simplified. This reduces the pain involved in a JavaScript application build, and it is a real pain, trust me. In some cases, an issue that you encounter might require you to eject. Try avoiding this for as long as you can, even at the price of an ugly workaround, as the alternative is surely uglier.
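For example, a new application with the redux template can be initialized as follows (a sketch; the application name is a placeholder):

npx create-react-app my-app --template redux
cd my-app
npm start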

Dockerize

Use a two-stage docker build to shorten the application build time. An example Dockerfile is:

FROM node:16.3 as builder

WORKDIR /app
COPY src/package.json ./
COPY src/package-lock.json ./
RUN npm install

COPY src/ ./
RUN npm run build


FROM nginx:1.21.5
COPY --from=builder /app/build /gui
CMD ["nginx", "-g", "daemon off;"]

Flat Components

Keep all components as a flat list of folders under the src folder.



This might seem counterintuitive at first sight, but in later stages of application maintenance, when you often refactor components due to requirements changes, you are saved from trying to figure out where to import your component from.

Should I use:
import '../componentA/component.js'

Or:
import '../componentA/componentB/component.js'

Or:
import '../../componentA/componentB/component.js'

In addition, finding your component is much simpler when the entire list is flat.


Component Files

Each component folder should contain at most 3 files:
  • component.js
  • component.module.css
  • slice.js

For example, a component named super-component will have the following directories and files:
~/src/super-component/component.js
~/src/super-component/component.module.css
~/src/super-component/slice.js

This is a good standard, since all components look the same, and it is also a great time saver during refactoring.

component.js Visualization

Use only visualization-related code in the component. Any logic-related code should be moved to slice.js.

For example, consider displaying a list of items that includes a search-by-item-name text box (assuming the search is client side). The component.js should never perform the filter action. The slice.js should handle the filtering and produce a new filtered-results list that will be used by component.js, as sketched below.
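A minimal sketch of such a slice.js, assuming items with a name field; the slice and action names are illustrative only:

import { createSlice } from '@reduxjs/toolkit'

const itemsSlice = createSlice({
    name: 'items',
    initialState: {
        items: [],
        filteredItems: [],
    },
    reducers: {
        searchTextChanged: (state, action) => {
            const searchText = action.payload
            // the filtering logic lives here, not in component.js;
            // the component only renders filteredItems
            state.filteredItems = state.items.filter(
                item => item.name.includes(searchText)
            )
        },
    },
})

export const { searchTextChanged } = itemsSlice.actions
export default itemsSlice.reducer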

component.js Indentation

Handling GUI and CSS can be a real pain sometimes, so we should strive to keep component.js simple. Do not overload it with multiple levels of nested elements; instead keep only a flat list of items in the component's returned markup.

For example, this simple table component:


return (
  <div className={styles.root}>
    <div>My Table</div>

    <div>
      <div className={styles.tableHead}>
        <div className={styles.tableHeadColumn}>
          Header1
        </div>
        <div className={styles.tableHeadColumn}>
          Header2
        </div>
      </div>
      <div className={styles.tableRow}>
        <div className={styles.tableValue}>
          Value 1
        </div>
        <div className={styles.tableValue}>
          Value 2
        </div>
      </div>
    </div>
  </div>
)

is terrible!

It should be a flat list of items:


return (
  <div className={styles.root}>
    <div>My Table</div>
    <TableHeader/>
    <TableRows/>
  </div>
)

So we should break the complex table component into multiple small components.

Flex

CSS flex is a great method to position and stretch components, but we should keep it neat and clean. When using flex, avoid cases where a parent flex CSS directive handled in one component affects or combines with a flex CSS directive in another component.

The following guideline usually applies to flex usage:


return (
  <div className={styles.parent}>
    <div className={styles.nonStretchedChild}>
      <ChildComponent1/>
    </div>
    <div className={styles.stretchedChild}>
      <ChildComponent2/>
    </div>
  </div>
)


and the styles are:

.parent {
  width: 100%;
  display: flex;
  flex-direction: row;
}

.nonStretchedChild {
  width: 100px;
}

.stretchedChild {
  flex: 1;
}


slice.js Scope


Keep slice.js as small as possible, handling only the related component's logic. A very large slice is usually a sign that you need to break the component into multiple child components.

Thunk

When we split our logic into multiple slices, we will probably have some (but not many) actions that require access to state that is distributed among multiple slices. To access multiple slices, we can use the thunk API. We can also use the thunk API for async request/response handling, for example:


export const graphClick = createAsyncThunk(
    'graph/clickGraph',
    async (input, thunkApi) => {
        const {graphId} = input
        const state = thunkApi.getState()
        const lastXLocation = state.graph.lastXLocation[graphId]

        const body = {
            x: lastXLocation,
        }

        thunkApi.dispatch(setLoading(true))
        const {ok, response} = await sendRequest('/get-my-data', body)
        thunkApi.dispatch(setLoading(false))

        if (!ok) {
            thunkApi.dispatch(addNotification(true, response))
            return
        }

        thunkApi.dispatch(setBuckets(response))
    },
)


Final Note

In this post we have listed some coding best practices. Looking at this list, a first impression might be that it is too much overhead for an application, but trust me, it is not. Once you start using these guidelines, coding, maintenance, refactoring, and bug fixes all become much simpler, and coding without them looks like a huge mess.









Monday, December 12, 2022

Create Excel File in Go


 

In this post we will wrap Excel file creation in Go.

The wrapper uses the Excelize library, and simplifies its usage for applications that need a simple Excel file. It is not suitable for complex Excel sheets.


Upon creation of the wrapper, we create a new Excel file with some default styles.


package excel2

import (
    "fmt"
    "github.com/xuri/excelize/v2"
)

const sheetName = "Sheet1"

type Excel struct {
    excelFile     *excelize.File
    currentRow    int
    currentColumn int
    styleBold     int
    styleRed      int
    styleOrange   int
    styleNormal   int
}

func ProduceExcel() *Excel {
    excelFile := excelize.NewFile()

    const fontSize = 10
    styleBold, err := excelFile.NewStyle(&excelize.Style{
        Font: &excelize.Font{
            Bold:   true,
            Family: "Times New Roman",
            Size:   fontSize,
            Color:  "#000000",
        },
    })
    if err != nil {
        panic(err)
    }

    styleRed, err := excelFile.NewStyle(&excelize.Style{
        Font: &excelize.Font{
            Bold:   true,
            Family: "Times New Roman",
            Size:   fontSize,
            Color:  "#FF0000",
        },
    })
    if err != nil {
        panic(err)
    }

    styleOrange, err := excelFile.NewStyle(&excelize.Style{
        Font: &excelize.Font{
            Bold:   false,
            Family: "Times New Roman",
            Size:   fontSize,
            Color:  "#FFA500",
        },
    })
    if err != nil {
        panic(err)
    }

    styleNormal, err := excelFile.NewStyle(&excelize.Style{
        Font: &excelize.Font{
            Bold:   false,
            Family: "Times New Roman",
            Size:   fontSize,
            Color:  "#000000",
        },
    })
    if err != nil {
        panic(err)
    }

    err = excelFile.SetColWidth(sheetName, "A", "Z", 30)
    if err != nil {
        panic(err)
    }

    return &Excel{
        excelFile:     excelFile,
        currentRow:    1,
        currentColumn: -1,
        styleBold:     styleBold,
        styleRed:      styleRed,
        styleOrange:   styleOrange,
        styleNormal:   styleNormal,
    }
}



Next we handle adding data to the Excel file:



func (e *Excel) MoveToNextRow() {
    e.currentRow++
    e.currentColumn = -1
}

func (e *Excel) AddNextCell(text string) {
    e.currentColumn++

    axis := e.getCurrentCellAxis()
    err := e.excelFile.SetCellValue(sheetName, axis, text)
    if err != nil {
        panic(err)
    }
}

func (e *Excel) getCurrentCellAxis() string {
    columnChar := string(rune(int('A') + e.currentColumn))
    axis := fmt.Sprintf("%v%v", columnChar, e.currentRow)
    return axis
}



And provide a simple style wrapper:


func (e *Excel) SetCellStyle(bold bool, red bool, orange bool) {
    axis := e.getCurrentCellAxis()
    if bold {
        err := e.excelFile.SetCellStyle(sheetName, axis, axis, e.styleBold)
        if err != nil {
            panic(err)
        }
    } else if red {
        err := e.excelFile.SetCellStyle(sheetName, axis, axis, e.styleRed)
        if err != nil {
            panic(err)
        }
    } else if orange {
        err := e.excelFile.SetCellStyle(sheetName, axis, axis, e.styleOrange)
        if err != nil {
            panic(err)
        }
    } else {
        err := e.excelFile.SetCellStyle(sheetName, axis, axis, e.styleNormal)
        if err != nil {
            panic(err)
        }
    }
}


Last, we can save the file:



func (e *Excel) SaveAs(filePath string) {
    err := e.excelFile.SaveAs(filePath)
    if err != nil {
        panic(err)
    }
}
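A minimal usage sketch of the wrapper (assuming it is called from within the same package; the cell values and output path are placeholders):

func main() {
    excel := ProduceExcel()

    // header row in bold
    excel.AddNextCell("Name")
    excel.SetCellStyle(true, false, false)
    excel.AddNextCell("Amount")
    excel.SetCellStyle(true, false, false)

    // data row in the normal style
    excel.MoveToNextRow()
    excel.AddNextCell("my-item")
    excel.SetCellStyle(false, false, false)
    excel.AddNextCell("42")
    excel.SetCellStyle(false, false, false)

    excel.SaveAs("/tmp/report.xlsx")
}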





Create and Download Excel File in JavaScript


 


In this post we'll show a simple method to create an Excel file from an HTML table and download it.

First we create an HTML table based on our data:


function getHtmlTable() {
    const data = [
        ['Name', 'Age', 'Height'],
        ['Alice', 56, 180],
        ['Bob', 47, 176]
    ]
    const rows = data.map(row => {
        const columns = row.map(column => {
            return `<td>${column}</td>`
        })
        return `<tr>${columns.join('')}</tr>`
    })
    return `<table>${rows.join('')}</table>`
}


Next we download it as an Excel file:


function exportToExcel() {
    const htmlTable = getHtmlTable()
    window.open('data:application/vnd.ms-excel,' + encodeURIComponent(htmlTable))
}
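For example, the export can be triggered from a button (a sketch; the button markup and label are assumptions):

<button onclick="exportToExcel()">Export to Excel</button>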


Final Note


Notice that while this method is extremely simple, we have no control over the Excel file styling or the downloaded file name.
 



Monday, December 5, 2022

Getting AWS Batch Logs in Go


 


In the previous post, AWS Batch in Go, we started AWS batch jobs and tracked their execution while waiting for completion. Once an AWS batch job has failed, why not simplify the problem investigation and fetch its log?

To get a job log, we first use the AWS Batch API to get the job's log stream name:


input := batch.DescribeJobsInput{Jobs: []*string{aws.String(jobId)}}
output, err := awsBatch.DescribeJobs(&input)
if err != nil {
    panic(err)
}

streamName := output.Jobs[0].Container.LogStreamName



Then we use the AWS CloudWatch Logs API to get the events, limited to the last 100 events (the tail of the log).


events, err := awsCloudWatchLog.GetLogEvents(&cloudwatchlogs.GetLogEventsInput{
    Limit:         aws.Int64(100),
    LogGroupName:  aws.String("/aws/batch/job"),
    LogStreamName: streamName,
})


The full function is:


func GetJobLogs(jobId string) string {
    awsSession, err := session.NewSession()
    if err != nil {
        panic(err)
    }

    awsBatch := batch.New(awsSession)

    input := batch.DescribeJobsInput{Jobs: []*string{aws.String(jobId)}}
    output, err := awsBatch.DescribeJobs(&input)
    if err != nil {
        panic(err)
    }

    streamName := output.Jobs[0].Container.LogStreamName

    awsCloudWatchLog := cloudwatchlogs.New(awsSession)
    events, err := awsCloudWatchLog.GetLogEvents(&cloudwatchlogs.GetLogEventsInput{
        Limit:         aws.Int64(100),
        LogGroupName:  aws.String("/aws/batch/job"),
        LogStreamName: streamName,
    })

    var lines []string
    for _, event := range events.Events {
        eventTime := time.UnixMilli(*event.Timestamp)
        line := fmt.Sprintf("%v %v", times.NiceTime(&eventTime), *event.Message)
        lines = append(lines, line)
    }

    return strings.Join(lines, "\n")
}
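A minimal usage sketch, for example after a batch job fails (the failedJobId variable is a placeholder):

// print the tail of the failed job's CloudWatch log
jobLog := GetJobLogs(failedJobId)
fmt.Println(jobLog)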



Monday, November 28, 2022

Intercept Network Requests in a React Native App



React Native is a development framework enabling development of applications for both Android and iOS using the same react-like code base. It provides fast development for JavaScript engineers, and reduces the need to learn both the iOS and Android proprietary development methods and tools.

Most apps do not stand on their own, and need to send network requests to a server. In many cases, intercepting these requests from within the app and adding to or changing some of them has real value, for example for handling authentication, adding tracking info, and more.

In this post we will present how to intercept network requests in a React Native app. We will start by creating a new React Native app, then add a sample network request from the app, and finally we will intercept the network requests.


Create a New React Native App

To create a new React Native app, run the following:

npx react-native init MyTestApp
cd MyTestApp


Then run these commands.
Each command should run in a different terminal (in the MyTestApp folder).

Terminal 1:
npx react-native start

Terminal 2:
npx react-native run-android

Send a Network Request

First install axios:

npm install -S axios

Add a call to axios upon onPress of any element (a button sketch follows the function below):

async function apiCall() {
    const axiosInstance = axios.create({
        baseURL: 'http://www.my-server.com/',
        headers: {
            'Cache-Control': 'max-age=640000',
            'User-Agent': 'MyApp',
        },
    });

    axiosInstance.get('index.html').then(response => {
        console.log(response.data);
    });
}
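For example, the call can be wired to a button's onPress (a sketch; the Button element and its title are assumptions):

import {Button} from 'react-native';

// somewhere inside the component's rendered output:
<Button title="Call API" onPress={apiCall} />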

Intercept Network Requests

In android/app/build.gradle under dependencies section, add:

implementation 'androidx.appcompat:appcompat:1.4.0'
implementation 'com.google.android.material:material:1.4.0'
implementation 'androidx.constraintlayout:constraintlayout:2.1.4'
implementation 'com.squareup.retrofit2:retrofit:2.9.0'
implementation 'com.squareup.retrofit2:converter-gson:2.9.0'


In MainApplication.java, add the import:

import com.facebook.react.modules.network.OkHttpClientProvider;

And in the onCreate method:

OkHttpClientProvider.setOkHttpClientFactory(new InterceptorClient());

Lastly, implement the interceptor itself:


import com.facebook.react.modules.network.OkHttpClientFactory;

import okhttp3.OkHttpClient;
import okhttp3.Request;

public class InterceptorClient implements OkHttpClientFactory {

    @Override
    public OkHttpClient createNewNetworkModuleClient() {
        return new OkHttpClient.Builder()
                .addInterceptor(chain -> {
                    Request original = chain.request();
                    Request.Builder builder = original.newBuilder()
                            .addHeader("My-authenticatoin", "my-token");
                    Request request = builder.build();

                    return chain.proceed(request);
                })
                .build();
    }
}




Monday, November 21, 2022

AWS Batch in Go


 


In a previous post we've used AWS Batch with boto3.

In this post we will wrap the usage of AWS Batch using golang.

First we'll create the batch wrapper struct.

package awsbatch

import (
    "fmt"
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/batch"
    "time"
)

type BatchWrapper struct {
    batch *batch.Batch
    jobs  []*string
}

func ProduceBatchWrapper() *BatchWrapper {
    awsSession, err := session.NewSession()
    if err != nil {
        panic(err)
    }

    awsBatch := batch.New(awsSession)
    return &BatchWrapper{
        batch: awsBatch,
        jobs:  []*string{},
    }
}


Next, we submit batches, and let them run in the background.


func (b *BatchWrapper) SubmitBatch(
    jobName string,
    environment map[string]string,
) {

    overrides := batch.ContainerOverrides{
        Command:     []*string{aws.String("/simulatorbackend")},
        Environment: []*batch.KeyValuePair{},
    }

    for key, value := range environment {
        overrides.Environment = append(
            overrides.Environment,
            &batch.KeyValuePair{
                Name:  aws.String(key),
                Value: aws.String(value),
            },
        )
    }

    input := batch.SubmitJobInput{
        JobName:            aws.String(jobName),
        JobQueue:           aws.String("my-batch-queue"),
        JobDefinition:      aws.String("my-batch-jobdef"),
        ContainerOverrides: &overrides,
    }

    output, err := b.batch.SubmitJob(&input)
    if err != nil {
        panic(err)
    }

    b.jobs = append(b.jobs, output.JobId)
}


And finally, we wait for all the batches to complete.



func (b *BatchWrapper) WaitForBatches() {
    input := batch.DescribeJobsInput{
        Jobs: b.jobs,
    }

    b.jobs = []*string{}

    for {
        output, err := b.batch.DescribeJobs(&input)
        if err != nil {
            panic(err)
        }

        allDone := true
        for _, job := range output.Jobs {
            if *job.Status == "FAILED" {
                panic("job failed")
            } else if *job.Status == "RUNNING" || *job.Status == "STARTING" || *job.Status == "SUBMITTED" {
                fmt.Printf("job %v id %v status %v\n",
                    *job.JobName,
                    *job.JobId,
                    *job.Status,
                )

                allDone = false
            }
        }
        if allDone {
            return
        }
        time.Sleep(time.Second * 10)
    }
}
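A minimal usage sketch of the wrapper (the job names and environment values are placeholders):

wrapper := ProduceBatchWrapper()

// submit two jobs with different environment variables
wrapper.SubmitBatch("my-job-1", map[string]string{"SCENARIO": "1"})
wrapper.SubmitBatch("my-job-2", map[string]string{"SCENARIO": "2"})

// block until all submitted jobs complete (panics on a failed job)
wrapper.WaitForBatches()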



Monday, November 14, 2022

Classical Information


 

In this post we will present a method for describing classical information. This basic method will be used in later posts to discuss quantum information. It is based on the Qiskit course.

Classical State and Probability Vectors

If X represents a bit whose state can be 0 with probability 1/3 and 1 with probability 2/3, then we denote it as follows.


X bit

Σ = {0,1}

Pr (X=0) = 1/3   and    Pr (X=1) = 2/3


This state can also be represented as a probabilistic column vector:

(1/3)
(2/3)

Notice that:

  1. All numbers in the vector are non-negative real numbers
  2. The sum of the vector's entries is 1


Bra and Ket

Bra is a row vector with 1 set in a single position, and all others are zeros.

(1,0) is bra zero, and the shorthand is marked as <0|

(0,1) is bra one, and the shorthand is marked as <1|


A ket is a column vector representing the bit X in a single definite state.

(1,0) as a column is ket zero, marked as |0>, representing X = 0

(0,1) as a column is ket one, marked as |1>, representing X = 1


We can use bra and ket to create vectors and matrices. For example, the probability vector above is:

1/3 |0> + 2/3 |1>


Deterministic Operations


We can define a function to make a change to the bit X.

For example: 

f1(X) = 1, will always convert the value of the bit to 1

f2(X) = !X, will always change the value of the bit to the opposite value.


The functions can be represented as matrices, so that M |X> = |f(X)>

For example, the corresponding matrices for the functions above are:

Each column of the matrix is the function's output for the corresponding input ket, so:

M1 =
(0  0)
(1  1)

M2 =
(0  1)
(1  0)

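As a quick check of the matrix form reconstructed above, applying M2 to ket zero flips the bit, in agreement with f2(X) = !X:

M_2 |0\rangle = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix} = |1\rangle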

Probabilistic Operations

We can also configure probabilistic functions that have a probability of changing a bit.
Notice that the sum of each column in the matrix must be one.
For example:

M =
(1  1/2)
(0  1/2)

and then

M |0> = always |0>

M |1> = 50% |0> and 50% |1>
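Writing the second case explicitly as a matrix-vector multiplication, using the M reconstructed above:

M |1\rangle = \begin{pmatrix} 1 & \tfrac{1}{2} \\ 0 & \tfrac{1}{2} \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} \tfrac{1}{2} \\ \tfrac{1}{2} \end{pmatrix} = \tfrac{1}{2} |0\rangle + \tfrac{1}{2} |1\rangle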



Monday, November 7, 2022

NPM and Dependencies


 


Npm is a software registry which holds hundreds of thousands of libraries. It is used in JavaScript-based projects to install dependencies.

The dependencies are added using npm, which installs them in a transitive manner. This means that if we install library A, which requires library B, and library B requires library C, then A, B, and C are all installed.

Not only that, but npm also manages the version requirements, so if A requires a specific version of B, that version is installed. Unlike other tools (like maven), npm can install different versions of the same library. See a nice example in the post: Understanding npm dependency resolution.


Still, there are some key points of npm usage to keep in mind.


First, always install dependencies using the install command, e.g.:

npm install my-dependency-library

This does the following:

  1. Adds the latest version of the library to the package.json file.
  2. Adds all the transitive dependencies of the library to the package-lock.json file.
  3. Installs (downloads) all the transitive dependencies into the node_modules folder.

Second, npm does not start from scratch on every run. It inspects the current content of package.json, package-lock.json, and the node_modules folder, and prefers using the dependencies from there instead of downloading new ones. This means that if something went wrong and we want to start a fresh dependencies installation, we need to delete both package-lock.json and the node_modules folder before running npm install.

Third, a very common error is "npm unable to resolve dependency tree". This is due to a dependency resolution algorithm change in recent npm versions, as explained here. To solve this, start a fresh dependencies installation (as specified above), and run npm with the --legacy-peer-deps flag.
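For example, a sketch of the fresh installation combined with the flag, run from the project root:

rm -rf node_modules package-lock.json
npm install --legacy-peer-deps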



Sunday, October 30, 2022

Argo PostSync Hook vs Helm PostInstall Hook


 

In this post we will review a compatibility issue between Argo and Helm.


TL;DR

Using helm's post-install hook under argo might result in the hook never running, hence it should be avoided in some cases.


Helm vs. Argo

Helm is a package manager for kubernetes.

Argo is an open source tool used to manage CI/CD workflows in a kubernetes environment.

Argo actually wraps helm chart deployment as part of an argo workflow. However, in practice, argo does not run helm. It uses its own implementation to deploy the helm charts. This means that we might have compatibility issues.


The Problem Scenario

In my case, I've had a helm chart using the helm post-install hook. Deploying the chart using helm on a kubernetes cluster works fine.


apiVersion: batch/v1
kind: Job
metadata:
  name: my-job
  annotations:
    "helm.sh/hook": "post-install"
    "helm.sh/hook-delete-policy": "hook-succeeded"
    "helm.sh/hook-weight": "5"


However, deploying the chart using argo does not complete. Argo does not run the post-install hook.


The Problem Cause

The reason for the symptom is argo translating helm's post-install hook to argo's PostSync hook, which is documented as:


"Executes after all Sync hooks completed and were successful, a successful application, and all resources in a Healthy state."


That documentation is not precise.

Argo does not only wait for all pods to be alive, that is, to answer the kubernetes liveness probe.

Argo also waits for the pods to be ready, that is, to answer the kubernetes readiness probe.

This became an issue in my case, as the post-install hook creates entities that are required before the pods can become ready for service.


The Bypass

I've changed the job not to use helm hooks at all. This means that the job needs to explicitly wait for the deployment, and then create the related entities that allow the pods to reach a ready state.

Notice that once the helm hooks are removed, the job runs only once, upon the initial deployment. In my case I wanted the job to run also on post-upgrade, so I used the trick described in this post to cause a rerun of the job by including the revision in the job name:


apiVersion: batch/v1
kind: Job
metadata:
  name: my-job-{{ .Release.Revision }}


Final Note

In this post we've demonstrated a compatibility issue between helm and argo, and explained how it can be bypassed. For most helm charts, which do not use post-install hooks or do not depend on the hook results to make the pods ready, this will not be an issue. For the charts that do fall into this category, a bypass should be implemented.


Yet Another Bug

A month has passed, and another argo hook compatibility bug was found...
Argo runs the pre-install hooks before any upgrade. This will probably cause many issues for a deployment that uses a pre-install hook. Let's hope that argo will be fixed sometime in the near future.







Monday, October 24, 2022

Sending Message From and To The Service Worker

 


In this post we will review how to send messages from a service worker to the page javascript, and how to send messages from the page javascript back to the service worker.



Send a message from the page to the service worker

First we send a message from the page:


const event = {
    type: "hello-from-page",
    data: "this is my data",
}
navigator.serviceWorker.controller.postMessage(event)

Notice that the data is sent to the page's own service worker. It is not possible to send the message to another site/page's service worker, only to the service worker registered for the page's location.

The service worker should accept the message:


self.addEventListener('message', (event) => handleMessageFromPage(event))

function handleMessageFromPage(event) {
    if (event.data.type === 'hello-from-page') {
        console.log(event.data.data)
    }
}

As the service worker is single threaded, make sure to handle the message in a timely fashion.


Send a message from the service worker to the page


The service worker can send a response directly to the page which sent the message, or alternatively send a notification to all of the connected pages.


function notifyReadyForClients() {
    self.clients.matchAll().then((clients) => {
        for (const client of clients) {
            console.log(`notifying client ${client.id} - ${client.url}`)
            client.postMessage('this is the message from the service worker')
        }
    })
}


The page receives the message back using the following:


navigator.serviceWorker.addEventListener('message', receiveMessageFromServiceWorker)

function receiveMessageFromServiceWorker(event) {
    if (event.data === 'this is the message from the service worker') {
        console.log(`got ready notification from service worker`)
    }
}



Final note

Working with service workers adds many abilities to a web application; however, it might complicate the development and testing cycles. It is important to use a well-designed architecture for messaging between the service worker and the page, and to avoid a spaghetti messaging design.


Saturday, October 15, 2022

Detecting Obfuscated JavaScripts



In this post we will review a python machine learning implementation based on the article Detecting Obfuscated JavaScripts from Known and Unknown Obfuscators using Machine Learning.


Data Collection

The first step is to collect javascripts from some of the most popular sites. We download the list of the top 1000 popular sites from https://dataforseo.com, using the following curl command:


curl 'https://dataforseo.com/wp-admin/admin-ajax.php' \
-H 'authority: dataforseo.com' \
-H 'accept: application/json, text/javascript, */*; q=0.01' \
-H 'accept-language: en-US,en;q=0.9,he;q=0.8,fr;q=0.7' \
-H 'content-type: application/x-www-form-urlencoded; charset=UTF-8' \
-H 'cookie: PHPSESSID=hqg1mr3lrcodbrujnddpfv0acv; _gcl_au=1.1.932766159.1664772134; referrer=https://www.google.com/; _gid=GA1.2.350097184.1664772135; _lfa=LF1.1.9259cece6f47bcdb.1664772134834; cae45c4ea51njjp04o0dacqap3-agile-crm-guid=86bf2470-40ff-6e95-0f29-905636c53559; cae45c4ea51njjp04o0dacqap3-agile-original-referrer=https%3A//www.google.com/; cae45c4ea51njjp04o0dacqap3-agile-crm-session_id=48d757a8-f09c-bb2b-4168-7272ecbbd6f7; cae45c4ea51njjp04o0dacqap3-agile-crm-session_start_time=14; _aimtellSubscriberID=b81e9d16-592b-ff27-9a09-1934dadd04c6; cae45c4ea51njjp04o0dacqap3-agile-session-webrules_v2=%7B%26%2334%3Brule_id%26%2334%3B%3A5120774913982464%2C%26%2334%3Bcount%26%2334%3B%3A1%2C%26%2334%3Btime%26%2334%3B%3A1664772136776%7D; intercom-id-yhwl2kwv=cd0629b2-2766-4925-814e-36baf817ef57; intercom-session-yhwl2kwv=; _gat=1; _ga_T5NKP5Y695=GS1.1.1664772134.1.1.1664772624.59.0.0; _ga=GA1.1.1433352343.1664772135; _uetsid=c0cc940042d511ed9b67d1852d41bc8d; _uetvid=c0cc95d042d511eda56a27dc9895ce0f' \
-H 'origin: https://dataforseo.com' \
-H 'referer: https://dataforseo.com/top-1000-websites' \
-H 'sec-ch-ua: "Chromium";v="106", "Google Chrome";v="106", "Not;A=Brand";v="99"' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'sec-ch-ua-platform: "Linux"' \
-H 'sec-fetch-dest: empty' \
-H 'sec-fetch-mode: cors' \
-H 'sec-fetch-site: same-origin' \
-H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36' \
-H 'x-requested-with: XMLHttpRequest' \
--data-raw 'action=dfs_ranked_domains&location=0' \
--compressed > sites.json


Next, from each site we download the javascripts referenced from the site landing page. This is done using the beautiful soup library.


import json
import os.path
import pathlib
import shutil
from multiprocessing import Pool

import bs4
import requests

from src.common import ROOT_FOLDER


def send_request(url):
    agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36'
    headers = {
        'User-Agent': agent,
    }
    page = requests.get(url, headers=headers)
    if page.status_code != 200:
        error_page = page.content
        error_page = error_page.decode('utf-8')
        raise Exception('{} failed code is {}: {}'.format(url, page.status_code, error_page))

    data = page.content
    data = data.decode('utf-8')
    return data


def get_domain_folder(domain):
    return ROOT_FOLDER + '/sites/' + domain


def process_script(domain, script, script_index):
    if script.has_attr('src'):
        src = script['src']
        if not src.startswith('http'):
            src = 'https://{}{}'.format(domain, src)
        print('download script {}'.format(src))
        data = send_request(src)
    else:
        data = script.getText()

    output_path = '{}/{}.js'.format(get_domain_folder(domain), script_index)
    with open(output_path, 'w') as file:
        file.write(data)


def process_site(domain):
    domain_folder = get_domain_folder(domain)
    site_complete_indication = domain_folder + '/complete.txt'
    if os.path.exists(site_complete_indication):
        print('site {} already done'.format(domain))
        return

    if os.path.exists(domain_folder):
        shutil.rmtree(domain_folder)
    os.mkdir(domain_folder)

    try:
        data = send_request('https://' + domain)
    except Exception as e:
        print('domain {} access failed: {}'.format(domain, e))
        return

    site = bs4.BeautifulSoup(data, 'html.parser')

    success = 0
    failed = 0
    for i, script in enumerate(site.findAll('script')):
        try:
            process_script(domain, script, i)
            success += 1
        except Exception as e:
            print(e)
            failed += 1

    with open(site_complete_indication, 'w') as file:
        file.write('success {}\nfailed {}'.format(success, failed))


def process_site_thread(site_tuple):
    site_index, site = site_tuple
    domain = site['domain']
    print('process site {}: {}'.format(site_index, domain))
    process_site(domain)


def main():
    print('loading sites')

    pathlib.Path(ROOT_FOLDER + '/sites').mkdir(parents=True, exist_ok=True)

    with open(ROOT_FOLDER + '/sites.json', 'r') as file:
        sites_json = json.load(file)

    sites_tuples = list(enumerate(sites_json))
    with Pool(20) as pool:
        pool.map(process_site_thread, sites_tuples)


main()


Data Preparation 


Having a list of scripts for each site, we merge all the scripts into one folder and remove duplicates.



import hashlib
import os
import pathlib

from src.common import ROOT_FOLDER


def main():
    pathlib.Path(ROOT_FOLDER + '/scripts').mkdir(parents=True, exist_ok=True)
    hashes = {}
    output_counter = 0
    scripts_counter = 0
    duplicates_counter = 0
    for site in os.walk(ROOT_FOLDER + '/sites'):
        site_path = site[0]
        files = site[2]
        for site_file in files:
            script_path = '{}/{}'.format(site_path, site_file)
            if not script_path.endswith('.js'):
                continue

            scripts_counter += 1
            print('{}: {}'.format(scripts_counter, script_path))
            with open(script_path, 'r') as file:
                data = file.read()

            data = data.strip()
            if len(data) < 1000 or data.startswith('{') or data.startswith('<'):
                continue

            script_hash = hashlib.sha256(data.encode('utf-8')).hexdigest()
            if script_hash in hashes:
                duplicates_counter += 1
            else:
                hashes[script_hash] = True
                output_counter += 1
                output_path = ROOT_FOLDER + '/scripts/{}.js'.format(output_counter)
                with open(output_path, 'w') as file:
                    file.write(data)

    print('scripts {} duplicates {}'.format(scripts_counter, duplicates_counter))


main()


Once we have one folder with all the scripts, we can obfuscate them using different obfuscators. In the previous post, Using Online Obfuscation for Multiple Files, we used an online obfuscator. In addition, we use the webpack obfuscator (via the javascript-obfuscator CLI):


import os
import pathlib
import subprocess
from multiprocessing import Pool

from src.common import ROOT_FOLDER


def obfuscate(entry):
    input_path, output_path = entry
    stdout = subprocess.check_output([
        'javascript-obfuscator',
        input_path,
        '--output',
        output_path,
    ])
    if len(stdout) > 0:
        print(stdout)


def main():
    os.environ["PATH"] += os.pathsep + '~/.nvm/versions/node/v18.3.0/bin'
    output_folder = ROOT_FOLDER + '/obfuscated_webpack'
    scripts_folder = ROOT_FOLDER + '/scripts'
    pathlib.Path(output_folder).mkdir(parents=True, exist_ok=True)
    jobs = []
    for _, _, files_names in os.walk(scripts_folder):
        for i, file_name in enumerate(sorted(files_names)):
            file_path = scripts_folder + '/' + file_name
            output_path = output_folder + '/' + file_name
            entry = file_path, output_path
            jobs.append(entry)

    with Pool(6) as pool:
        pool.map(obfuscate, jobs)


main()


Features Extraction


Now that we have the original javascripts folder, in addition to 3 obfuscated folders, we can extract features for each javascript file, and save the features into a csv file.



import csv
import os
import re
from collections import Counter
from math import log
from multiprocessing import Pool

import tqdm as tqdm

from src.common import ROOT_FOLDER


class Extractor:
    def __init__(self):
        self.csv_lines = []

    def extract_folder(self, folder_path):
        print('extracting folder {}'.format(folder_path))
        files_paths = []
        for _, _, files_names in os.walk(folder_path):
            for file in files_names:
                files_paths.append(folder_path + '/' + file)

        with Pool(7) as pool:
            for result in tqdm.tqdm(pool.imap_unordered(extract_file, files_paths), total=len(files_paths)):
                if result is not None:
                    self.csv_lines.append(result)

    def save_csv(self, file_path):
        header = get_header()

        self.csv_lines.insert(0, header)

        with open(file_path, 'w') as file:
            writer = csv.writer(file)
            writer.writerows(self.csv_lines)

        print('csv ready')


def extract_file(file_path):
    with open(file_path, 'r') as file:
        data = file.read()

    data = data.strip()
    if len(data) < 1000:
        return

    data = data.lower()

    if 'looks like a html code, please use gui' in data:
        return

    words = re.split('[^a-z]', data)
    words = list(filter(None, words))
    if len(words) == 0:
        return

    backslash_ratio = data.count('/n') / len(data)
    space_ratio = data.count(' ') / len(data)
    bracket_ratio = data.count('[') / len(data)
    hex_count = max(
        len(re.findall('x[0-9a-f]{4}', data)),
        data.count('\\x')
    )
    hex_ratio = hex_count / len(words)
    unicode_ratio = data.count('\\u') / len(words)

    chars_in_comment = 0
    long_lines = 0
    lines = data.split('\n')
    not_empty_lines_counter = 0
    for line in lines:
        line = line.strip()
        if line.startswith('//'):
            chars_in_comment += len(line)
        if len(line) > 1000:
            long_lines += 1
        if len(line) > 0:
            not_empty_lines_counter += 1
    chars_in_comment_share = chars_in_comment / not_empty_lines_counter
    chars_per_line = len(data) / not_empty_lines_counter

    if_share = words.count('if') / len(words)
    false_share = words.count('false') / len(words)
    true_share = words.count('true') / len(words)
    return_share = words.count('return') / len(words)
    var_share = words.count('var') / len(words)
    tostring_share = words.count('tostring') / len(words)
    this_share = words.count('this') / len(words)
    else_share = words.count('else') / len(words)
    null_share = words.count('null') / len(words)
    special_words = [
        'eval',
        'unescape',
        'fromcharcode',
        'charcodeat',
        'window',
        'document',
        'string',
        'array',
        'object',
    ]

    special_count = 0
    for special_word in special_words:
        special_count += words.count(special_word)
    special_share = special_count / len(words)

    return [
        file_path,
        backslash_ratio,
        chars_in_comment_share,
        if_share,
        special_share,
        long_lines,
        false_share,
        hex_ratio,
        unicode_ratio,
        space_ratio,
        true_share,
        bracket_ratio,
        return_share,
        var_share,
        tostring_share,
        this_share,
        else_share,
        null_share,
        chars_per_line,
        shannon(data),
    ]


def shannon(string):
    counts = Counter(string)
    frequencies = ((i / len(string)) for i in counts.values())
    return - sum(f * log(f, 2) for f in frequencies)


def get_header():
    return [
        'file_path',
        'backslash_ratio',
        'chars_in_comment_share',
        'if_share',
        'special_share',
        'long_lines',
        'false_share',
        'hex_ratio',
        'unicode_ratio',
        'space_ratio',
        'true_share',
        'bracket_ratio',
        'return_share',
        'var_share',
        'tostring_share',
        'this_share',
        'else_share',
        'null_share',
        'chars_per_line',
        'shannon',
    ]


def main():
    extractor = Extractor()
    extractor.extract_folder(ROOT_FOLDER + '/obfuscated_webpack')
    extractor.extract_folder(ROOT_FOLDER + '/scripts')
    extractor.extract_folder(ROOT_FOLDER + '/obfuscated_draftlogic')
    extractor.extract_folder(ROOT_FOLDER + '/obfuscated_javascriptobfuscator')

    extractor.save_csv(ROOT_FOLDER + '/features.csv')


if __name__ == '__main__':
    main()


Machine Learning


The last step is to run a random forest for the features.csv, and create a model that will be used to identify whether scripts are obfuscated.


import joblib
import numpy
import numpy as np
import pandas
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

from src.common import ROOT_FOLDER
from src.features_extract import extract_file, get_header

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)


def load_csv(csv_path):
    print('load CSV')

    df = pd.read_csv(csv_path)
    print(df.head(5))
    return df


def build_forest():
    df = load_csv(ROOT_FOLDER + '/features.csv')
    print('split to training and test')
    df['file_path'] = df['file_path'].apply(lambda x: 1 if 'obfuscated' in x else 0)
    labels = np.array(df['file_path'])

    features = df.drop('file_path', axis=1)
    feature_list = list(features.columns)
    features = np.array(features)
    train_features, test_features, train_labels, test_labels = \
        train_test_split(features,
                         labels,
                         test_size=0.25,
                         random_state=42,
                         )
    print('training features shape {} labels shape {}'.format(
        train_features.shape, train_labels.shape))
    print('test features shape {} labels shape {}'.format(
        test_features.shape, test_labels.shape))

    print('random forest classifier training')

    forest = RandomForestRegressor(n_estimators=100, random_state=42, verbose=2, n_jobs=-2)
    forest.fit(train_features, train_labels)

    print('random forest predictions')
    predictions = forest.predict(test_features)

    prediction_threshold = 0.5
    predictions[predictions < prediction_threshold] = 0
    predictions[predictions >= prediction_threshold] = 1

    prediction_errors = predictions - test_labels
    print('error for test {}'.format(
        round(np.mean(abs(prediction_errors)), 3)))

    print('importance of each feature')

    importances = list(forest.feature_importances_)
    feature_importances = [(feature, round(importance, 2)) for feature, importance in
                           zip(feature_list, importances)]
    feature_importances = sorted(feature_importances, key=lambda x: x[1], reverse=True)
    for pair in feature_importances:
        print('variable: {} Importance: {}'.format(*pair))

    print('confusion matrix')

    joined = np.stack((predictions, test_labels), axis=1)
    tp = joined[np.where(
        (joined[:, 0] == 1) *
        (joined[:, 1] == 1)
    )]
    tn = joined[np.where(
        (joined[:, 0] == 0) *
        (joined[:, 1] == 0)
    )]
    fp = joined[np.where(
        (joined[:, 0] == 1) *
        (joined[:, 1] == 0)
    )]
    fn = joined[np.where(
        (joined[:, 0] == 0) *
        (joined[:, 1] == 1)
    )]
    print('true positive {}'.format(np.shape(tp)[0]))
    print('true negative {}'.format(np.shape(tn)[0]))
    print('false positive {}'.format(np.shape(fp)[0]))
    print('false negative {}'.format(np.shape(fn)[0]))

    joblib.dump(forest, ROOT_FOLDER + '/random_forest.joblib')


def load_forest():
    forest = joblib.load(ROOT_FOLDER + '/random_forest.joblib')

    df = load_csv(ROOT_FOLDER + '/features.csv')
    print('split to training and test')
    keep_name = df['file_path']
    df['file_path'] = df['file_path'].apply(lambda x: 1 if 'obfuscated' in x else 0)
    labels = np.array(df['file_path'])

    features = df.drop('file_path', axis=1)

    predictions = forest.predict(features)
    prediction_threshold = 0.5
    predictions[predictions < prediction_threshold] = 0
    predictions[predictions >= prediction_threshold] = 1
    errors = 0
    for ndarray_index, y in numpy.ndenumerate(predictions):
        label = labels[ndarray_index]
        prediction = predictions[ndarray_index]
        if label != prediction:
            errors += 1
            row = ndarray_index[0]
            print('file {} row {}'.format(keep_name[row], row))
    print('errors', errors)


def analyze_new_script(file_path):
    forest = joblib.load(ROOT_FOLDER + '/random_forest.joblib')
    forest.verbose = 0

    # extract_file expects a single file path (see the features extraction above)
    rows = [extract_file(file_path)]
    df = pandas.DataFrame(rows, columns=get_header())
    features = df.drop('file_path', axis=1)

    print(features)
    predictions = forest.predict(features.values)
    prediction = predictions[0]
    print(prediction)
    if prediction > 0.5:
        print('this is obfuscated')
    else:
        print('not obfuscated')


build_forest()
load_forest()
analyze_new_script(ROOT_FOLDER + '/scripts/1.js')
analyze_new_script(ROOT_FOLDER + '/obfuscated_javascriptobfuscator/1.js')
analyze_new_script(ROOT_FOLDER + '/obfuscated_draftlogic/1.js')


Final Note

The performance of the random forest model is about 1% false negatives and false positives, hence we can feel pretty good about using it for our needs.