Sunday, October 30, 2022

Argo PostSync Hook vs Helm PostInstall Hook


 

In this post we will review a compatibility issue between Argo and Helm.


TL;DR

Helm's post-install hook might never run when the chart is deployed by Argo, hence it should be avoided in some cases.


Helm vs. Argo

Helm is a package manager for kubernetes.

Argo is an open source tool used to manage CI/CD workflows in a kubernetes environment.

Argo actually wraps helm chart deployment as part of an argo workflow. However, in practice, argo does not run helm; it uses its own implementation to deploy the helm charts. This is the source of several compatibility issues.


The Problem Scenario

In my case, I had a helm chart using the helm post-install hook. Deploying the chart with helm on a kubernetes cluster works fine.


apiVersion: batch/v1
kind: Job
metadata:
  name: my-job
  annotations:
    "helm.sh/hook": "post-install"
    "helm.sh/hook-delete-policy": "hook-succeeded"
    "helm.sh/hook-weight": "5"


However, deploying the chart using argo does not complete; argo does not run the post-install hook.


The Problem Cause

The reason for the symptom is that argo translates helm's post-install hook to argo's PostSync hook, which is documented as:


"Executes after all Sync hooks completed and were successful, a successful application, and all resources in a Healthy state."


This documentation is not precise.

Argo does not only wait for all pods to be alive, that is, to answer the kubernetes liveness probe.

Argo also waits for the pods to be ready, that is, to answer the kubernetes readiness probe.

This became an issue in my case, as the post-install hook creates entities that the pods require before they can become ready for service.
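For reference, a pod is reported ready only once its readiness probe succeeds. A minimal sketch of such a probe; the endpoint, port, and timings here are hypothetical:


readinessProbe:
  httpGet:
    path: /ready      # hypothetical endpoint that succeeds only after the entities exist
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10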


The Bypass

I've changed the job not to use helm hooks at all. This means that the job needs to explicitly wait for the deployment, and only then create the related entities that enable the pods to reach a ready state.
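A sketch of what the job's entrypoint can do, assuming the job has RBAC permissions to run kubectl; the deployment name and the entities-creation script are placeholders:


#!/usr/bin/env bash
# Wait until the target deployment exists. We cannot wait for readiness,
# since readiness depends on the entities created below.
until kubectl get deployment my-deployment > /dev/null 2>&1; do
    echo "waiting for deployment my-deployment"
    sleep 5
done

# Create the entities that allow the pods to pass their readiness probe.
/scripts/create-entities.sh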

Notice that once the helm hooks are removed, the job runs only once, upon the initial deployment. In my case I wanted the job to run on post-upgrade as well, so I used the trick described in this post to force a rerun of the job by using the revision as part of the job name:


apiVersion: batch/v1
kind: Job
metadata:
  name: my-job-{{ .Release.Revision }}


Final Note

In this post we've demonstrated a compatibility issue between helm and argo, and explained how it can be bypassed. Most helm charts do not use post-install hooks, or do not depend on the hook results for the pods to become ready, so this will not be an issue for them. For the charts that do fall into this category, a bypass should be implemented.


Yet Another Bug

A month had passed, and another argo hook compatibility bug was found...
Argo runs the pre-install hooks before any upgrade, not only before the initial installation. This will probably cause many issues for a deployment that uses a pre-install hook. Let's hope that argo will be fixed sometime in the near future.







Monday, October 24, 2022

Sending Message From and To The Service Worker

 


In this post we will review how to send messages from a service worker to the page javascript, and how to send messages from the page javascript back to the service worker.



Send a message from the page to the service worker

First we send a message from the page:


const event = {
  type: "hello-from-page",
  data: "this is my data",
}
navigator.serviceWorker.controller.postMessage(event)

Notice that the message is sent to the service worker controlling the current page. It is not possible to send the message to another site's or page's service worker, only to the service worker registered for the page's location.
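Note that navigator.serviceWorker.controller may be null when the page is not yet controlled by a service worker, for example right after a hard refresh, so a small guard is worth adding (a minimal sketch):


async function sendToServiceWorker(event) {
  // wait until a service worker registration is active for this scope
  await navigator.serviceWorker.ready
  const controller = navigator.serviceWorker.controller
  if (!controller) {
    // the page is not controlled yet, e.g. after a hard refresh
    console.log('no controlling service worker, skipping message')
    return
  }
  controller.postMessage(event)
}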

The service worker accepts the message:


self.addEventListener('message', (event) => handleMessageFromPage(event))

function handleMessageFromPage(event) {
  if (event.data.type === 'hello-from-page') {
    console.log(event.data.data)
  }
}

As the service worker is single threaded, make sure to handle the message in a timely fashion.


Send a message from the service worker to the page


The service worker can send a response directly to the page which sent the message, or alternatively send a notification to all of the connected pages.
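For the direct reply, the message event inside the service worker exposes the sending page as event.source, so the handler shown earlier could answer only that page (a minimal sketch):


function handleMessageFromPage(event) {
  if (event.data.type === 'hello-from-page') {
    // reply only to the page that sent this message
    event.source.postMessage('this is the message from the service worker')
  }
}


The broadcast alternative iterates over all of the connected clients: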


function notifyReadyForClients() {
  self.clients.matchAll().then((clients) => {
    for (const client of clients) {
      console.log(`notifying client ${client.id} - ${client.url}`)
      client.postMessage('this is the message from the service worker')
    }
  })
}


The page receives the message back using the following:


navigator.serviceWorker.addEventListener('message', receiveMessageFromServiceWorker)

function receiveMessageFromServiceWorker(event) {
  if (event.data === 'this is the message from the service worker') {
    console.log(`got ready notification from service worker`)
  }
}



Final note

Working with service workers adds many abilities to a web application; however, it might complicate the development and testing cycles. It is important to use a well-designed architecture for messaging between the service worker and the page, and to avoid a spaghetti messaging design.
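One simple way to keep the messaging tidy is to route all incoming messages through a single dispatch table keyed by the message type, instead of scattering string comparisons across the code. A sketch for the service worker side; the message types and handlers here are hypothetical:


const messageHandlers = {
  'hello-from-page': (data) => console.log('page says', data),
  'sync-request': (data) => console.log('sync requested', data),
}

self.addEventListener('message', (event) => {
  const handler = messageHandlers[event.data.type]
  if (handler) {
    handler(event.data.data)
  } else {
    console.warn('unhandled message type', event.data.type)
  }
})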


Saturday, October 15, 2022

Detecting Obfuscated JavaScripts



In this post we will review a python machine learning implementation based on the article Detecting Obfuscated JavaScripts from Known and Unknown Obfuscators using Machine Learning.


Data Collection

The first step is to collect the javascripts from some of the most popular sites. We download the list of the top 1000 popular sites from https://dataforseo.com, using the following curl command:


curl 'https://dataforseo.com/wp-admin/admin-ajax.php' \
-H 'authority: dataforseo.com' \
-H 'accept: application/json, text/javascript, */*; q=0.01' \
-H 'accept-language: en-US,en;q=0.9,he;q=0.8,fr;q=0.7' \
-H 'content-type: application/x-www-form-urlencoded; charset=UTF-8' \
-H 'cookie: PHPSESSID=hqg1mr3lrcodbrujnddpfv0acv; _gcl_au=1.1.932766159.1664772134; referrer=https://www.google.com/; _gid=GA1.2.350097184.1664772135; _lfa=LF1.1.9259cece6f47bcdb.1664772134834; cae45c4ea51njjp04o0dacqap3-agile-crm-guid=86bf2470-40ff-6e95-0f29-905636c53559; cae45c4ea51njjp04o0dacqap3-agile-original-referrer=https%3A//www.google.com/; cae45c4ea51njjp04o0dacqap3-agile-crm-session_id=48d757a8-f09c-bb2b-4168-7272ecbbd6f7; cae45c4ea51njjp04o0dacqap3-agile-crm-session_start_time=14; _aimtellSubscriberID=b81e9d16-592b-ff27-9a09-1934dadd04c6; cae45c4ea51njjp04o0dacqap3-agile-session-webrules_v2=%7B%26%2334%3Brule_id%26%2334%3B%3A5120774913982464%2C%26%2334%3Bcount%26%2334%3B%3A1%2C%26%2334%3Btime%26%2334%3B%3A1664772136776%7D; intercom-id-yhwl2kwv=cd0629b2-2766-4925-814e-36baf817ef57; intercom-session-yhwl2kwv=; _gat=1; _ga_T5NKP5Y695=GS1.1.1664772134.1.1.1664772624.59.0.0; _ga=GA1.1.1433352343.1664772135; _uetsid=c0cc940042d511ed9b67d1852d41bc8d; _uetvid=c0cc95d042d511eda56a27dc9895ce0f' \
-H 'origin: https://dataforseo.com' \
-H 'referer: https://dataforseo.com/top-1000-websites' \
-H 'sec-ch-ua: "Chromium";v="106", "Google Chrome";v="106", "Not;A=Brand";v="99"' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'sec-ch-ua-platform: "Linux"' \
-H 'sec-fetch-dest: empty' \
-H 'sec-fetch-mode: cors' \
-H 'sec-fetch-site: same-origin' \
-H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36' \
-H 'x-requested-with: XMLHttpRequest' \
--data-raw 'action=dfs_ranked_domains&location=0' \
--compressed > sites.json


Next, from each site we download the javascripts referenced from the site's landing page. This is done using the Beautiful Soup library.


import json
import os.path
import pathlib
import shutil
from multiprocessing import Pool

import bs4
import requests

from src.common import ROOT_FOLDER


def send_request(url):
    agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36'
    headers = {
        'User-Agent': agent,
    }
    page = requests.get(url, headers=headers)
    if page.status_code != 200:
        error_page = page.content
        error_page = error_page.decode('utf-8')
        raise Exception('{} failed code is {}: {}'.format(url, page.status_code, error_page))

    data = page.content
    data = data.decode('utf-8')
    return data


def get_domain_folder(domain):
    return ROOT_FOLDER + '/sites/' + domain


def process_script(domain, script, script_index):
    if script.has_attr('src'):
        src = script['src']
        if not src.startswith('http'):
            src = 'https://{}{}'.format(domain, src)
        print('download script {}'.format(src))
        data = send_request(src)
    else:
        data = script.getText()

    output_path = '{}/{}.js'.format(get_domain_folder(domain), script_index)
    with open(output_path, 'w') as file:
        file.write(data)


def process_site(domain):
    domain_folder = get_domain_folder(domain)
    site_complete_indication = domain_folder + '/complete.txt'
    if os.path.exists(site_complete_indication):
        print('site {} already done'.format(domain))
        return

    if os.path.exists(domain_folder):
        shutil.rmtree(domain_folder)
    os.mkdir(domain_folder)

    try:
        data = send_request('https://' + domain)
    except Exception as e:
        print('domain {} access failed: {}'.format(domain, e))
        return

    site = bs4.BeautifulSoup(data, 'html.parser')

    success = 0
    failed = 0
    for i, script in enumerate(site.findAll('script')):
        try:
            process_script(domain, script, i)
            success += 1
        except Exception as e:
            print(e)
            failed += 1

    with open(site_complete_indication, 'w') as file:
        file.write('success {}\nfailed {}'.format(success, failed))


def process_site_thread(site_tuple):
    site_index, site = site_tuple
    domain = site['domain']
    print('process site {}: {}'.format(site_index, domain))
    process_site(domain)


def main():
    print('loading sites')

    pathlib.Path(ROOT_FOLDER + '/sites').mkdir(parents=True, exist_ok=True)

    with open(ROOT_FOLDER + '/sites.json', 'r') as file:
        sites_json = json.load(file)

    sites_tuples = list(enumerate(sites_json))
    with Pool(20) as pool:
        pool.map(process_site_thread, sites_tuples)


main()


Data Preparation 


Having a list of scripts for each site, we merge all the scripts into one folder and remove duplicates.



import hashlib
import os
import pathlib

from src.common import ROOT_FOLDER


def main():
    pathlib.Path(ROOT_FOLDER + '/scripts').mkdir(parents=True, exist_ok=True)
    hashes = {}
    output_counter = 0
    scripts_counter = 0
    duplicates_counter = 0
    for site in os.walk(ROOT_FOLDER + '/sites'):
        site_path = site[0]
        files = site[2]
        for site_file in files:
            script_path = '{}/{}'.format(site_path, site_file)
            if not script_path.endswith('.js'):
                continue

            scripts_counter += 1
            print('{}: {}'.format(scripts_counter, script_path))
            with open(script_path, 'r') as file:
                data = file.read()

            data = data.strip()
            if len(data) < 1000 or data.startswith('{') or data.startswith('<'):
                continue

            script_hash = hashlib.sha256(data.encode('utf-8')).hexdigest()
            if script_hash in hashes:
                duplicates_counter += 1
            else:
                hashes[script_hash] = True
                output_counter += 1
                output_path = ROOT_FOLDER + '/scripts/{}.js'.format(output_counter)
                with open(output_path, 'w') as file:
                    file.write(data)

    print('scripts {} duplicates {}'.format(scripts_counter, duplicates_counter))


main()


Once we have one folder with all the scripts, we can obfuscate them using different obfuscators. In the previous post we covered Using Online Obfuscation for Multiple Files. In addition, we use the webpack obfuscator, running the javascript-obfuscator CLI directly:


import os
import pathlib
import subprocess
from multiprocessing import Pool

from src.common import ROOT_FOLDER


def obfuscate(entry):
    input_path, output_path = entry
    stdout = subprocess.check_output([
        'javascript-obfuscator',
        input_path,
        '--output',
        output_path,
    ])
    if len(stdout) > 0:
        print(stdout)


def main():
    os.environ["PATH"] += os.pathsep + '~/.nvm/versions/node/v18.3.0/bin'
    output_folder = ROOT_FOLDER + '/obfuscated_webpack'
    scripts_folder = ROOT_FOLDER + '/scripts'
    pathlib.Path(output_folder).mkdir(parents=True, exist_ok=True)
    jobs = []
    for _, _, files_names in os.walk(scripts_folder):
        for i, file_name in enumerate(sorted(files_names)):
            file_path = scripts_folder + '/' + file_name
            output_path = output_folder + '/' + file_name
            entry = file_path, output_path
            jobs.append(entry)

    with Pool(6) as pool:
        pool.map(obfuscate, jobs)


main()


Features Extraction


Now that we have the original javascripts folder, in addition to 3 obfuscated folders, we can extract features for each javascript file, and save the features into a csv file.



import csv
import os
import re
from collections import Counter
from math import log
from multiprocessing import Pool

import tqdm as tqdm

from src.common import ROOT_FOLDER


class Extractor:
    def __init__(self):
        self.csv_lines = []

    def extract_folder(self, folder_path):
        print('extracting folder {}'.format(folder_path))
        files_paths = []
        for _, _, files_names in os.walk(folder_path):
            for file in files_names:
                files_paths.append(folder_path + '/' + file)

        with Pool(7) as pool:
            for result in tqdm.tqdm(pool.imap_unordered(extract_file, files_paths), total=len(files_paths)):
                if result is not None:
                    self.csv_lines.append(result)

    def save_csv(self, file_path):
        header = get_header()

        self.csv_lines.insert(0, header)

        with open(file_path, 'w') as file:
            writer = csv.writer(file)
            writer.writerows(self.csv_lines)

        print('csv ready')


def extract_file(file_path):
    with open(file_path, 'r') as file:
        data = file.read()

    data = data.strip()
    if len(data) < 1000:
        return

    data = data.lower()

    if 'looks like a html code, please use gui' in data:
        return

    words = re.split('[^a-z]', data)
    words = list(filter(None, words))
    if len(words) == 0:
        return

    backslash_ratio = data.count('/n') / len(data)
    space_ratio = data.count(' ') / len(data)
    bracket_ratio = data.count('[') / len(data)
    hex_count = max(
        len(re.findall('x[0-9a-f]{4}', data)),
        data.count('\\x')
    )
    hex_ratio = hex_count / len(words)
    unicode_ratio = data.count('\\u') / len(words)

    chars_in_comment = 0
    long_lines = 0
    lines = data.split('\n')
    not_empty_lines_counter = 0
    for line in lines:
        line = line.strip()
        if line.startswith('//'):
            chars_in_comment += len(line)
        if len(line) > 1000:
            long_lines += 1
        if len(line) > 0:
            not_empty_lines_counter += 1
    chars_in_comment_share = chars_in_comment / not_empty_lines_counter
    chars_per_line = len(data) / not_empty_lines_counter

    if_share = words.count('if') / len(words)
    false_share = words.count('false') / len(words)
    true_share = words.count('true') / len(words)
    return_share = words.count('return') / len(words)
    var_share = words.count('var') / len(words)
    tostring_share = words.count('tostring') / len(words)
    this_share = words.count('this') / len(words)
    else_share = words.count('else') / len(words)
    null_share = words.count('null') / len(words)
    special_words = [
        'eval',
        'unescape',
        'fromcharcode',
        'charcodeat',
        'window',
        'document',
        'string',
        'array',
        'object',
    ]

    special_count = 0
    for special_word in special_words:
        special_count += words.count(special_word)
    special_share = special_count / len(words)

    return [
        file_path,
        backslash_ratio,
        chars_in_comment_share,
        if_share,
        special_share,
        long_lines,
        false_share,
        hex_ratio,
        unicode_ratio,
        space_ratio,
        true_share,
        bracket_ratio,
        return_share,
        var_share,
        tostring_share,
        this_share,
        else_share,
        null_share,
        chars_per_line,
        shannon(data),
    ]


def shannon(string):
    counts = Counter(string)
    frequencies = ((i / len(string)) for i in counts.values())
    return - sum(f * log(f, 2) for f in frequencies)


def get_header():
    return [
        'file_path',
        'backslash_ratio',
        'chars_in_comment_share',
        'if_share',
        'special_share',
        'long_lines',
        'false_share',
        'hex_ratio',
        'unicode_ratio',
        'space_ratio',
        'true_share',
        'bracket_ratio',
        'return_share',
        'var_share',
        'tostring_share',
        'this_share',
        'else_share',
        'null_share',
        'chars_per_line',
        'shannon',
    ]


def main():
    extractor = Extractor()
    extractor.extract_folder(ROOT_FOLDER + '/obfuscated_webpack')
    extractor.extract_folder(ROOT_FOLDER + '/scripts')
    extractor.extract_folder(ROOT_FOLDER + '/obfuscated_draftlogic')
    extractor.extract_folder(ROOT_FOLDER + '/obfuscated_javascriptobfuscator')

    extractor.save_csv(ROOT_FOLDER + '/features.csv')


if __name__ == '__main__':
    main()


Machine Learning


The last step is to train a random forest on the features.csv file, and create a model that will be used to identify whether a script is obfuscated.


import joblib
import numpy
import numpy as np
import pandas
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

from src.common import ROOT_FOLDER
from src.features_extract import extract_file, get_header

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)


def load_csv(csv_path):
    print('load CSV')

    df = pd.read_csv(csv_path)
    print(df.head(5))
    return df


def build_forest():
    df = load_csv(ROOT_FOLDER + '/features.csv')
    print('split to training and test')
    df['file_path'] = df['file_path'].apply(lambda x: 1 if 'obfuscated' in x else 0)
    labels = np.array(df['file_path'])

    features = df.drop('file_path', axis=1)
    feature_list = list(features.columns)
    features = np.array(features)
    train_features, test_features, train_labels, test_labels = \
        train_test_split(features,
                         labels,
                         test_size=0.25,
                         random_state=42,
                         )
    print('training features shape {} labels shape {}'.format(
        train_features.shape, train_labels.shape))
    print('test features shape {} labels shape {}'.format(
        test_features.shape, test_labels.shape))

    print('random forest classifier training')

    forest = RandomForestRegressor(n_estimators=100, random_state=42, verbose=2, n_jobs=-2)
    forest.fit(train_features, train_labels)

    print('random forest predictions')
    predictions = forest.predict(test_features)

    prediction_threshold = 0.5
    predictions[predictions < prediction_threshold] = 0
    predictions[predictions >= prediction_threshold] = 1

    prediction_errors = predictions - test_labels
    print('error for test {}'.format(
        round(np.mean(abs(prediction_errors)), 3)))

    print('importance of each feature')

    importances = list(forest.feature_importances_)
    feature_importances = [(feature, round(importance, 2)) for feature, importance in
                           zip(feature_list, importances)]
    feature_importances = sorted(feature_importances, key=lambda x: x[1], reverse=True)
    for pair in feature_importances:
        print('variable: {} Importance: {}'.format(*pair))

    print('confusion matrix')

    joined = np.stack((predictions, test_labels), axis=1)
    tp = joined[np.where(
        (joined[:, 0] == 1) *
        (joined[:, 1] == 1)
    )]
    tn = joined[np.where(
        (joined[:, 0] == 0) *
        (joined[:, 1] == 0)
    )]
    fp = joined[np.where(
        (joined[:, 0] == 1) *
        (joined[:, 1] == 0)
    )]
    fn = joined[np.where(
        (joined[:, 0] == 0) *
        (joined[:, 1] == 1)
    )]
    print('true positive {}'.format(np.shape(tp)[0]))
    print('true negative {}'.format(np.shape(tn)[0]))
    print('false positive {}'.format(np.shape(fp)[0]))
    print('false negative {}'.format(np.shape(fn)[0]))

    joblib.dump(forest, ROOT_FOLDER + '/random_forest.joblib')


def load_forest():
    forest = joblib.load(ROOT_FOLDER + '/random_forest.joblib')

    df = load_csv(ROOT_FOLDER + '/features.csv')
    print('split to training and test')
    keep_name = df['file_path']
    df['file_path'] = df['file_path'].apply(lambda x: 1 if 'obfuscated' in x else 0)
    labels = np.array(df['file_path'])

    features = df.drop('file_path', axis=1)

    predictions = forest.predict(features)
    prediction_threshold = 0.5
    predictions[predictions < prediction_threshold] = 0
    predictions[predictions >= prediction_threshold] = 1
    errors = 0
    for ndarray_index, y in numpy.ndenumerate(predictions):
        label = labels[ndarray_index]
        prediction = predictions[ndarray_index]
        if label != prediction:
            errors += 1
            row = ndarray_index[0]
            print('file {} row {}'.format(keep_name[row], row))
    print('errors', errors)


def analyze_new_script(file_path):
    forest = joblib.load(ROOT_FOLDER + '/random_forest.joblib')
    forest.verbose = 0

    rows = [extract_file(file_path)]
    df = pandas.DataFrame(rows, columns=get_header())
    features = df.drop('file_path', axis=1)

    print(features)
    predictions = forest.predict(features.values)
    prediction = predictions[0]
    print(prediction)
    if prediction > 0.5:
        print('this is obfuscated')
    else:
        print('not obfuscated')


build_forest()
load_forest()
analyze_new_script(ROOT_FOLDER + '/scripts/1.js')
analyze_new_script(ROOT_FOLDER + '/obfuscated_javascriptobfuscator/1.js')
analyze_new_script(ROOT_FOLDER + '/obfuscated_draftlogic/1.js')


Final Note

The random forest model achieves roughly 1% false negatives and false positives, hence we can feel pretty good about using it for our needs.



Saturday, October 8, 2022

Using Online Obfuscation for Multiple Files

 



In this post we will use online obfuscators to scramble javascript code. The goal is to automate the obfuscation of many source files, rather than focus on obfuscating a single project.


To do this we wrap the call to the online obfuscator in a loop over all the javascripts that we have, and keep the results in a dedicated output folder.


import os
import pathlib
import urllib.parse
from multiprocessing import Pool

import requests

from src.common import ROOT_FOLDER


def obfuscate(entry):
    input_path, output_path = entry
    with open(input_path, 'r') as file:
        data = file.read()
    headers = {
        'authority': 'javascriptobfuscator.com',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'accept-language': 'en-US,en;q=0.9,he;q=0.8,fr;q=0.7',
        'cache-control': 'max-age=0',
        'content-type': 'application/x-www-form-urlencoded',
        'origin': 'https://javascriptobfuscator.com',
        'referer': 'https://javascriptobfuscator.com/Javascript-Obfuscator.aspx',
        'sec-ch-ua': '"Chromium";v="106", "Google Chrome";v="106", "Not;A=Brand";v="99"',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': 'Linux',
        'sec-fetch-dest': 'document',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-site': 'same-origin',
        'sec-fetch-user': '?1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36',
    }
    data = urllib.parse.quote_plus(data)
    data = 'UploadLib_Uploader_js=1&__EVENTTARGET=ctl00%24MainContent%24Button1&__EVENTARGUMENT=&__VIEWSTATE=%2FwEPDwUKMTM4MjU3NDgxNw9kFgJmD2QWAgIDD2QWAgIBDxYCHgRUZXh0BdkBPGxpIGNsYXNzPSdsaXN0LWlubGluZS1pdGVtIG1yLTAnPjxhIGNsYXNzPSd1LWhlYWRlcl9fbmF2YmFyLWxpbmsnIGhyZWY9Jy9zaWduaW4uYXNweCc%2BQWNjb3VudCBMb2dpbjwvYT48L2xpPgo8bGkgY2xhc3M9J2xpc3QtaW5saW5lLWl0ZW0gbXItMCc%2BPGEgY2xhc3M9J3UtaGVhZGVyX19uYXZiYXItbGluaycgaHJlZj0nL3JlZ2lzdGVyLmFzcHgnPlJlZ2lzdGVyPC9hPjwvbGk%2BIGQYAQUeX19Db250cm9sc1JlcXVpcmVQb3N0QmFja0tleV9fFgUFGmN0bDAwJE1haW5Db250ZW50JGNiTGluZUJSBRpjdGwwMCRNYWluQ29udGVudCRjYkluZGVudAUdY3RsMDAkTWFpbkNvbnRlbnQkY2JFbmNvZGVTdHIFG2N0bDAwJE1haW5Db250ZW50JGNiTW92ZVN0cgUgY3RsMDAkTWFpbkNvbnRlbnQkY2JSZXBsYWNlTmFtZXNJfhOUrd%2FjYMwya4KqO76nY28hwfkIpQAmM%2Bhk51YiJA%3D%3D&__VIEWSTATEGENERATOR=6D198BE1&__EVENTVALIDATION=%2FwEdAAzyRDYiu41ivvipFNnKHrClCJ8xELtYGHfHJig8BNR1A%2Fnd3wctyww89JbDbeLvgrjW%2FQY5cz%2Bpu3qUjqM%2B4n5jIWlyEKFxLO5ck%2BF6M0ODiJ1itZp%2B2hATYVWj%2Fb%2B%2BnyR8f2dPhQQre4aI0Iea4dKYmjI5SSrP8%2Fdi9FPKAsCRiSDSoNvpe2qp90wnP2HAWzNs9mdJae9TApAJFRRb54f73WbA4XcESfoeI8EInEzA%2BdxRJK%2FkVxlULg0AsW337%2FI8ZVc1MOVK9zP9AcHGfTxHt98XiGpmCkjM8SbZaQl4aw%3D%3D&ctl00%24MainContent%24uploader1=&ctl00%24MainContent%24TextBox1=' + data + '&ctl00%24MainContent%24TextBox2=&ctl00%24MainContent%24cbEncodeStr=on&ctl00%24MainContent%24cbMoveStr=on&ctl00%24MainContent%24cbReplaceNames=on&ctl00%24MainContent%24TextBox3=%5E_get_%0D%0A%5E_set_%0D%0A%5E_mtd_'
    response = requests.post('https://javascriptobfuscator.com/Javascript-Obfuscator.aspx', headers=headers, data=data)
    if response.status_code != 200:
        error_page = response.content
        error_page = error_page.decode('utf-8')
        raise Exception('failed code is {}: {}'.format(response.status_code, error_page))

    response_data = response.content.decode('utf-8')
    obfuscated = response_data.split('"Obfuscated result">', 1)[1]
    obfuscated = obfuscated.split('</textarea>', 1)[0]
    if 'CodeParseException' in obfuscated:
        print('error in file {}'.format(input_path))
        return
    with open(output_path, 'w') as file:
        file.write(obfuscated)


def main():
    output_folder = ROOT_FOLDER + '/obfuscated_javascriptobfuscator'
    scripts_folder = ROOT_FOLDER + '/scripts'
    pathlib.Path(output_folder).mkdir(parents=True, exist_ok=True)
    jobs = []
    for _, _, files_names in os.walk(scripts_folder):
        for i, file_name in enumerate(sorted(files_names)):
            file_path = scripts_folder + '/' + file_name
            output_path = output_folder + '/' + file_name
            entry = file_path, output_path
            jobs.append(entry)

    with Pool(20) as pool:
        pool.map(obfuscate, jobs)


main()


This code uses the javascriptobfuscator.com site for the actual obfuscation. Using a python process pool, it runs 20 workers that send requests to the online obfuscator and extract the obfuscated result from the response.


We can do the same for another online obfuscator:


import os
import pathlib
import urllib.parse
from multiprocessing import Pool

import requests

from src.common import ROOT_FOLDER


def obfuscate(entry):
    input_path, output_path = entry
    with open(input_path, 'r') as file:
        data = file.read()
    headers = {
        'authority': 'www.daftlogic.com',
        'accept': '*/*',
        'accept-language': 'en-US,en;q=0.9,he;q=0.8,fr;q=0.7',
        'content-type': 'application/x-www-form-urlencoded',
        'cookie': 'PHPSESSID=29dafa2eb763cbcb11186cdbbcfc3314; _ga_6ZVKNC886B=GS1.1.1665286398.1.0.1665286398.0.0.0; _ga=GA1.1.175147361.1665286398; __cf_bm=gB4IMuUgZ8Az74pmgv2AwFVTJUOMaplVnQrpncSNpps-1665286399-0-AYS8PCyBXpfp49hS5bTxKyZ+kMpFi0N2Qlvjt3ONdyDMNG2rJgSHAyqZ0AiqN6GqXtUwYp7EDJsuVvooDQ6tjjg2jKzohM3l6v7W8/iOAqsO1xJARWlh1+6GMBqTGwvu7Q==; FCNEC=%5B%5B%22AKsRol-i1Gi2rmvKQSvp1TrIdzP4VD6g0NFyVC0zPhesxjtxxxo_bFi-jGqN6Xq967IDZMx0q2UyyMIivy7jozOXkF8Du1seYhGQ-A4VD2FSt5RzNPjqqDhvrqazTrVNNelEch0-nnOJyzYsg4hvwy0qAkCw0oC6zg%3D%3D%22%5D%2Cnull%2C%5B%5D%5D',
        'origin': 'https://www.daftlogic.com',
        'referer': 'https://www.daftlogic.com/projects-online-javascript-obfuscator.htm',
        'sec-ch-ua': '"Chromium";v="106", "Google Chrome";v="106", "Not;A=Brand";v="99"',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': 'Linux',
        'sec-fetch-dest': 'document',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-site': 'same-origin',
        'sec-fetch-user': '?1',
        'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36',
    }
    data = 'input=' + urllib.parse.quote_plus(data)
    response = requests.post('https://www.daftlogic.com/includes/ajax/jsobs.php', headers=headers, data=data)
    if response.status_code != 200:
        error_page = response.content
        error_page = error_page.decode('utf-8')
        raise Exception('failed code is {}: {}'.format(response.status_code, error_page))

    response_data = response.content.decode('utf-8')
    with open(output_path, 'w') as file:
        file.write(response_data)


def main():
    output_folder = ROOT_FOLDER + '/obfuscated_draftlogic'
    scripts_folder = ROOT_FOLDER + '/scripts'
    pathlib.Path(output_folder).mkdir(parents=True, exist_ok=True)
    jobs = []
    for _, _, files_names in os.walk(scripts_folder):
        for i, file_name in enumerate(sorted(files_names)):
            file_path = scripts_folder + '/' + file_name
            output_path = output_folder + '/' + file_name
            entry = file_path, output_path
            jobs.append(entry)

    with Pool(20) as pool:
        pool.map(obfuscate, jobs)


main()


This time we've used www.daftlogic.com, which offers a different kind of obfuscation.

We can then analyze the obfuscation results of these sites across multiple javascripts, and get some insights about the obfuscation methods applied to various javascripts.








Sunday, October 2, 2022

Parallel Images Build Using Bash Job Pool

 



In this post we will review how to use parallel bash processes to speed up the images build process.


TL;DR

Use shellutils to run parallel tasks as part of the main build script.


The Local Build Process

In many projects we have the ability to build all of the project's docker images on the local development machine. This can be used both to validate that we did not break the build when changing major common functionality, and to run a local kubernetes cluster with the fresh docker images.

The build usually starts with a central build script which then sequentially launches each image's build script. Hence if we have 20 images, each taking one minute to build, we have to wait 20 minutes. One way to handle this is to run the central build and then go for a coffee break :)

An alternative is to run the image builds in parallel, using a process pool. We can use the shellutils helper functions to manage the parallel image build scripts.


Parallel Central Build

An example of such a central build is the following:


#!/usr/bin/env bash

rm -rf ./build_tmp
mkdir ./build_tmp

. ./job_pool.sh

echo "Building docker images in parallel"
job_pool_init 10 1
while read name ; do
    job_pool_run ./images/${name}/build.sh ./build_tmp/${name}.out
done < ./images_to_build
job_pool_shutdown

if [[ "${job_pool_nerrors}" != "0" ]]; then
    echo ""
    echo "****************************************"
    echo " Failure summary "
    echo "****************************************"

    cat ./build_tmp/*

    echo "****************************************"
    echo " Total num of errors is: ${job_pool_nerrors} "
    echo "****************************************"
    exit 1
fi
rm -rf ./build_tmp



To use this, first download job_pool.sh from shellutils, and save it in the same folder as the central build script. The central build script loads job_pool.sh and creates a pool of 10 workers.

Then, for each image listed in the images_to_build file, it runs the image's build script at images/IMAGE_NAME/build.sh. Finally the script prints a failure summary in case of a problem.
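For completeness, here is a sketch of the expected inputs; the image names and registry below are placeholders. The images_to_build file simply lists one image name per line:


api
worker
frontend


and each per-image build script receives the output file path as its first argument, so it can keep the parallel run quiet:


#!/usr/bin/env bash
# images/api/build.sh - hypothetical per-image build script
set -e
docker build -t my-registry/api:latest ./images/api > "${1:-/dev/null}" 2>&1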


Final Note

We have demonstrated the usage of parallel shell script runs for a faster local build. This is a real improvement: instead of waiting 20 minutes for the build, we ended up waiting only 2-3 minutes.

This bash job pool can be used for any other task that we manage with bash scripts, such as pushing docker images to an image repository, generating load, and more.
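For example, pushing all the images in parallel can reuse the same pool and the same images_to_build list (a sketch; the registry name is a placeholder):


#!/usr/bin/env bash
. ./job_pool.sh

# push the freshly built images with 10 parallel workers
job_pool_init 10 1
while read name ; do
    job_pool_run docker push my-registry/${name}:latest
done < ./images_to_build
job_pool_shutdown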