Monday, March 27, 2023

PDF split and merge in Python



 

In this post we will review how to split and merge PDF files.

Why is this required? Sometimes, to fill in a form, you edit a PDF file on an online PDF editor site, but you don't want to enter your credit card number or bank account number on an anonymous online site.

How to do it anyway?

Fill in the entire form, except for the details you want to keep to yourself.

Split the PDF file into single pages using the following Python code.


import os
from PyPDF2 import PdfReader, PdfWriter

def split_pdfs(input_file_path):
    inputpdf = PdfReader(open(input_file_path, "rb"))

    out_paths = []
    if not os.path.exists("outputs"):
        os.makedirs("outputs")

    for i, page in enumerate(inputpdf.pages):
        output = PdfWriter()
        output.add_page(page)

        out_file_path = f"outputs/{i}.pdf"
        with open(out_file_path, "wb") as output_stream:
            output.write(output_stream)

        out_paths.append(out_file_path)
    return out_paths

split_pdfs("document.pdf")


Next, open the PDF file with the page you want to fill offline, and update it using an image editor: take a screenshot, paste it into any local image editor application, and add the text you need. Then paste the image into a new Google Docs document in your Google Drive, and download it as a PDF.

Replace the page file that you've updated, and use the following Python code to merge the pages back.

from PyPDF2 import PdfMerger

merger = PdfMerger()

for i in range(3):
    merger.append('outputs/{}.pdf'.format(i))

merger.write("result.pdf")
merger.close()
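The loop above hard-codes three pages. When merging a variable number of page files, note that lexicographic sorting puts "10.pdf" before "2.pdf"; a small stdlib-only helper (a hypothetical addition, not part of the original script) can order the page files numerically before merging:

```python
import os
import re

def sort_pages_numerically(paths):
    """Sort page file paths like 'outputs/10.pdf' by their numeric page index."""
    def page_index(path):
        # extract the integer from the file name, e.g. 10 from '10.pdf'
        match = re.search(r"(\d+)", os.path.basename(path))
        return int(match.group(1))
    return sorted(paths, key=page_index)

pages = ["outputs/10.pdf", "outputs/2.pdf", "outputs/0.pdf"]
print(sort_pages_numerically(pages))
# -> ['outputs/0.pdf', 'outputs/2.pdf', 'outputs/10.pdf']
```

The sorted list can then be fed to merger.append in a loop instead of the hard-coded range.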


Done! Your secrets are safe...





Monday, March 20, 2023

Deploy MongoDB Community on Kubernetes

 

In this post we will review the steps to install the MongoDB Community version on a Kubernetes cluster.

First, we install the MongoDB operator:


helm repo add mongodb https://mongodb.github.io/helm-charts
helm install community-operator mongodb/community-operator


Next, we create a replica.yaml file for the MongoDB custom resources:


---
apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: example-mongodb
spec:
  members: 1
  type: ReplicaSet
  version: "5.0.16"
  security:
    authentication:
      modes: ["SCRAM"]
  users:
    - name: my-user
      db: admin
      passwordSecretRef: # a reference to the secret that will be used to generate the user's password
        name: my-user-password
      roles:
        - name: clusterAdmin
          db: admin
        - name: userAdminAnyDatabase
          db: admin
      scramCredentialsSecretName: my-scram
  additionalMongodConfig:
    storage.wiredTiger.engineConfig.journalCompressor: zlib

# the user credentials will be generated from this secret
# once the credentials are generated, this secret is no longer required
---
apiVersion: v1
kind: Secret
metadata:
  name: my-user-password
type: Opaque
stringData:
  password: my-pass


And apply the resources:


kubectl apply -f replica.yaml


Now we can connect to the MongoDB using kubectl:


kubectl exec -it example-mongodb-0 -- mongosh "mongodb+srv://my-user:my-pass@example-mongodb-svc.default.svc.cluster.local/admin?ssl=false"



Monday, March 13, 2023

Callback in Java

 



In this post we will review an example of a callback in Java. Callbacks enable us to handle tasks asynchronously. For example, we can run a query with a long execution time and, instead of waiting for the query result to return, do something else in the meantime. Asynchronous calls and callbacks are used extensively in GUI applications, where in most cases a single thread updates the GUI, and we don't want any long-running task to make the GUI freeze. Hence any long background processing is done using asynchronous callbacks.


Let's examine an example: we have a query API that runs a query for 1 second, and accepts a callback.


import java.util.Random;

class QueryApi implements Runnable {
    private final String queryText;
    private final ApiCallback callback;

    QueryApi(String queryText, ApiCallback callback) {
        this.queryText = queryText;
        this.callback = callback;
    }

    public void runQuery() {
        new Thread(this).start();
    }

    public void run() {
        System.out.println("processing the query");
        try {
            Thread.sleep(1000);
        } catch (InterruptedException ignored) {
        }

        if (new Random().nextBoolean()) {
            this.callback.onSuccess("query " + this.queryText + " result: very good");
        } else {
            this.callback.onFailure("bad luck");
        }
    }
}



The callback is an interface with success and failure handling:


interface ApiCallback {
    void onSuccess(String queryResult);

    void onFailure(String errorMessage);
}


To use the API, we implement the callbacks, and call the query API:


public class Main implements ApiCallback {
    public static void main(String[] args) throws Exception {
        new Main().doWork();
    }

    private void doWork() {
        QueryApi api = new QueryApi("get all data", this);
        api.runQuery();
        System.out.println("i am not waiting for the query, so I can do other tasks");
    }

    @Override
    public void onSuccess(String queryResult) {
        System.out.println("YES!\n" + queryResult);
    }

    @Override
    public void onFailure(String errorMessage) {
        System.out.println("AHH!\n" + errorMessage);
    }
}


The output from this code is the following:


i am not waiting for the query, so I can do other tasks
processing the query
YES!
query get all data result: very good
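Since Java 8, the standard library's CompletableFuture offers the same decoupling without a hand-rolled callback interface. A minimal sketch (not part of the original example; the class and method names here are illustrative):

```java
import java.util.concurrent.CompletableFuture;

public class FutureDemo {
    // runs the "query" on a background thread and returns a future result
    static CompletableFuture<String> runQuery(String queryText) {
        return CompletableFuture.supplyAsync(() -> "query " + queryText + " result: very good");
    }

    public static void main(String[] args) {
        CompletableFuture<String> future = runQuery("get all data")
                .thenApply(result -> "YES!\n" + result)                 // success path
                .exceptionally(error -> "AHH!\n" + error.getMessage()); // failure path

        System.out.println("i am not waiting for the query, so I can do other tasks");
        System.out.println(future.join()); // block only when we finally need the result
    }
}
```

The success and failure branches map naturally to thenApply and exceptionally, replacing the onSuccess/onFailure pair.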



Monday, March 6, 2023

Go Logger and Logging Methodology


 

In this post we will review a simple logging wrapper in Go, and discuss a logging methodology for a production-grade product.


A Simple Log Wrapper


The following is a simple logging wrapper.


package log

import (
    "fmt"
    "os"
    "time"

    "go.uber.org/zap"
)

const levelFatal = "FATAL"
const levelError = "ERROR"
const levelWarn = "WARN"
const levelInfo = "INFO"
const levelVerb = "VERB"

var zapLogger *zap.Logger

func initLogger() {
    configuration := zap.NewProductionConfig()
    configuration.Level = zap.NewAtomicLevelAt(zap.DebugLevel)
    configuration.EncoderConfig.CallerKey = ""
    var err error
    zapLogger, err = configuration.Build()
    if err != nil {
        panic(err)
    }
}

func writeRecord(level string, format string, v ...interface{}) {
    if Config.ZapLogger {
        if zapLogger == nil {
            initLogger()
        }
        message := fmt.Sprintf(format, v...)
        switch level {
        case levelVerb:
            zapLogger.Debug(message)
        case levelInfo:
            zapLogger.Info(message)
        case levelWarn:
            zapLogger.Warn(message)
        case levelError:
            zapLogger.Error(message)
        case levelFatal:
            zapLogger.Fatal(message)
        default:
            zapLogger.Info(message)
        }
    } else {
        formattedTimestamp := time.Now().UTC().Format("2006-01-02 15:04:05.000")
        updatedFormat := fmt.Sprintf("%v %v: %v\n", formattedTimestamp, level, format)
        fmt.Printf(updatedFormat, v...)
    }
}

func Error(format string, v ...interface{}) {
    writeRecord(levelError, format, v...)
}

func Fatal(format string, v ...interface{}) {
    writeRecord(levelFatal, format, v...)
    os.Exit(1)
}

func Warn(format string, v ...interface{}) {
    writeRecord(levelWarn, format, v...)
}

func Info(format string, v ...interface{}) {
    writeRecord(levelInfo, format, v...)
}

func V1(format string, v ...interface{}) {
    if Config.Verbose < 1 {
        return
    }
    writeRecord(levelVerb, format, v...)
}

func V2(format string, v ...interface{}) {
    if Config.Verbose < 2 {
        return
    }
    writeRecord(levelVerb, format, v...)
}

func V5(format string, v ...interface{}) {
    if Config.Verbose < 5 {
        return
    }
    writeRecord(levelVerb, format, v...)
}

Loading the configuration from environment variables, which includes the verbosity level and an indication of whether to use the zap logger, is out of the scope of this post.

The gain here is that this package encapsulates the actual log mechanism, so users of the log do not need to import the underlying log library.

We can also easily switch between the zap logger, a one-line JSON printer suitable for a production environment where logs are collected by an automated system such as Graylog, and a simple STDOUT printer, suitable for a development environment where a developer examines the logs manually.


Logging Methodology

So we use this log wrapper, and everything works fine, but then, unexpectedly (or not...), we have an issue in production. Our solution runs on Kubernetes, which supplies the verbosity in an environment variable, so we can edit the deployment and change the verbosity to further investigate the problem.

Wait.. 
Can we? 
Will it assist the problem investigation?

The answer is "probably not" to both questions.

In a production system, another team (DevOps), not the developers, handles the deployment on Kubernetes. The DevOps team is hard to reach, and will probably not like the idea of changing production configuration, so the developer will need to convince them that this is the only way to investigate the issue. Not a fun task.

Suppose we do change the environment variable and start getting verbose logs. Can you manually analyze millions of log records spit out from all over your code? Will the production log collection mechanism be able to handle this huge stress?

It seems we need to rethink our methodology.

A good solution is to be able to update loggers on the fly, using a central configuration GUI, while the processes/pods are running. We also need the ability to activate logs for specific code sections, and for specific types of handled data.


To do this, we first add a PrefixLogger.


package log

type PrefixLogger struct {
    prefix       string
    forceVerbose *int
}

func ProducePrefixLogger(
    prefix string,
) *PrefixLogger {
    return &PrefixLogger{
        prefix: prefix,
    }
}

func (p *PrefixLogger) ForceVerbose(verbose int) {
    p.forceVerbose = &verbose
}

func (p *PrefixLogger) GetVerbose() int {
    if p.forceVerbose == nil {
        return Config.Verbose
    }

    return *p.forceVerbose
}

func (p *PrefixLogger) V1(format string, v ...interface{}) {
    if p.GetVerbose() < 1 {
        return
    }
    writeRecord(levelVerb, p.prefix+format, v...)
}

func (p *PrefixLogger) V2(format string, v ...interface{}) {
    if p.GetVerbose() < 2 {
        return
    }
    writeRecord(levelVerb, p.prefix+format, v...)
}

func (p *PrefixLogger) V5(format string, v ...interface{}) {
    if p.GetVerbose() < 5 {
        return
    }
    writeRecord(levelVerb, p.prefix+format, v...)
}

func (p *PrefixLogger) Info(format string, v ...interface{}) {
    writeRecord(levelInfo, p.prefix+format, v...)
}

func (p *PrefixLogger) Warn(format string, v ...interface{}) {
    writeRecord(levelWarn, p.prefix+format, v...)
}

func (p *PrefixLogger) Fatal(format string, v ...interface{}) {
    writeRecord(levelFatal, p.prefix+format, v...)
}


This uses the underlying logger, but adds a constant prefix to each log message, allowing us to later filter messages by this text.

Next we add a logger that handles a specific type of data. In our case, each piece of data is related to a customer, and the code name for a customer is PO.



type PoLogger struct {
    log.PrefixLogger
    specificEnabled bool
    component       string
}

func ProducePoLogger(
    po *types.PoWithId,
    component string,
    prefix string,
) *PoLogger {
    loggerPrefix := fmt.Sprintf("PO %v %v ", po.PoId, component)
    if len(prefix) > 0 {
        loggerPrefix += " " + prefix
    }

    logger := log.ProducePrefixLogger(loggerPrefix)

    specificEnabled := isComponentEnabled(po.Log.Components, component)

    poLogger := PoLogger{
        PrefixLogger:    *logger,
        specificEnabled: specificEnabled,
        component:       component,
    }

    poLogger.setVerboseByPo(po)

    return &poLogger
}

func (l *PoLogger) IsSpecificEnabled() bool {
    return l.specificEnabled
}

func (l *PoLogger) setVerboseByPo(po *types.PoWithId) {
    verbose := log.Config.Verbose
    if po.Log.Verbose > verbose {
        specificEnabled := isComponentEnabled(po.Log.Components, l.component)
        if len(po.Log.Components) == 0 || specificEnabled {
            verbose = po.Log.Verbose
        }
    }

    l.PrefixLogger.ForceVerbose(verbose)
}

func (l *PoLogger) RefreshPo(po *types.PoWithId) {
    l.setVerboseByPo(po)
}

func isComponentEnabled(components []string, myComponent string) bool {
    if len(components) == 0 {
        return false
    }

    for _, component := range components {
        if component == myComponent {
            return true
        }
    }

    return false
}

The customer (PO) configuration is saved in the database, and we can update it using a dedicated GUI. This configuration includes the names of the code sections that we want to activate. So we can select both which code and which customer to get logs for, which reduces the amount of log records to a reasonable level.

Notice that the RefreshPo method needs to be invoked periodically (by a scheduler or a callback event) to ensure we use the updated logger configuration.
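Such a periodic refresh can be driven by a simple ticker. The following is a self-contained sketch; the refreshMessage function is a hypothetical stand-in for reloading the PO configuration and calling RefreshPo:

```go
package main

import (
    "fmt"
    "time"
)

// refreshMessage stands in for reloading the PO log configuration
// from the database and calling PoLogger.RefreshPo with it.
func refreshMessage(tick int) string {
    return fmt.Sprintf("refreshed logger config (tick %d)", tick)
}

func main() {
    // in a real service the interval would be seconds or minutes;
    // a short interval keeps this demo fast
    ticker := time.NewTicker(10 * time.Millisecond)
    defer ticker.Stop()

    for tick := 1; tick <= 3; tick++ {
        <-ticker.C
        fmt.Println(refreshMessage(tick))
    }
}
```

In a long-running pod this loop would live in its own goroutine, refreshing every logger that registered for updates.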


Final Note

Creating a maintainable product is a process that affects a product both bottom-up and top-down; it should be in the back of our minds throughout the design and implementation steps. An important piece of this is a good logging capability. We have reviewed the pieces required to make this happen, and as always, a code review that covers not just the code but also the logging behavior is key to keeping it working.