run KISS: June 2024

Monday, June 24, 2024

Camel Case Words Count

I recently had to analyze URL path elements and check each word in them. However, I ran into an issue: I needed to check each camel case word separately. For this, I've created a function that splits a string into camel case words.

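package main

import (
	"fmt"
	"unicode"
)

// SplitCamelWords splits a string into its camel case words.
// A non-letter ends the current word and is prepended to the next one;
// trailing non-letters are dropped.
func SplitCamelWords(segment string) []string {
	var words []string
	var word []rune

	var prevCharLetter bool
	var prevCharUpper bool
	inUpperWord := false // inside a run of two or more upper case letters
	for _, currentChar := range segment {
		currCharUpper := unicode.IsUpper(currentChar)
		currCharLetter := unicode.IsLetter(currentChar)

		if currCharLetter {
			if prevCharLetter {
				if prevCharUpper {
					if currCharUpper {
						// Still inside an upper case run, e.g. "OF".
						inUpperWord = true
					} else if inUpperWord {
						// An upper case run followed by a lower case letter:
						// the last upper case letter starts the next word,
						// e.g. "OFLove" -> "OF", "Love".
						last := len(word) - 1
						words = append(words, string(word[:last]))
						word = []rune{word[last]}
						inUpperWord = false
					}
				} else if currCharUpper {
					// Lower to upper case starts a new word, e.g. "houseOF".
					words = append(words, string(word))
					word = nil
				} else {
					inUpperWord = false
				}
			}
		} else {
			// A non-letter ends the current word.
			inUpperWord = false
			if prevCharLetter {
				words = append(words, string(word))
				word = nil
			}
		}

		prevCharUpper = currCharUpper
		prevCharLetter = currCharLetter
		word = append(word, currentChar)
	}

	if prevCharLetter {
		words = append(words, string(word))
	}

	return words
}

func main() {
	for _, s := range []string{"HouseOfLove", "houseOFLove", "AaaaBBBB2", "A1B2C3"} {
		fmt.Printf("%s -> %v\n", s, SplitCamelWords(s))
	}
}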

Here is a sample test output:

A -> [A]
a -> [a]
Aaaaa -> [Aaaaa]
AAAAA -> [AAAAA]
aaaaa -> [aaaaa]
A1 -> [A]
a1 -> [a]
Aaaaa1 -> [Aaaaa]
AAAAA1 -> [AAAAA]
aaaaa1 -> [aaaaa]
aB -> [a B]
aaaaB -> [aaaa B]
AaaaB -> [Aaaa B]
AaaaBBBB -> [Aaaa BBBB]
AaaaBbbbb -> [Aaaa Bbbbb]
aB2 -> [a B]
aaaaB2 -> [aaaa B]
AaaaB2 -> [Aaaa B]
AaaaBBBB2 -> [Aaaa BBBB]
AaaaBbbbb2 -> [Aaaa Bbbbb]
Aaaa1b -> [Aaaa 1b]
Aaaa1B -> [Aaaa 1B]
Aaaa1bbb -> [Aaaa 1bbb]
Aaaa1Bbb -> [Aaaa 1Bbb]
AaBbCc -> [Aa Bb Cc]
A1B2C3 -> [A 1B 2C]
Aa1Bb2Cc3 -> [Aa 1Bb 2Cc]
AAA BBB Ccc -> [AAA  BBB  Ccc]
AAA1BBB Ccc -> [AAA 1BBB  Ccc]

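Notice that the task is not as obvious as it might appear at first glance. We cannot simply split at every upper case character; instead, we need to consider the sequence of characters. Also note that a non-letter character, such as a digit or a space, ends the current word and is attached to the beginning of the next one (hence the double spaces in [AAA  BBB  Ccc]), while trailing non-letters are dropped.

For example, HouseOfLove counts as 3 words: House, Of, Love.

However, houseOFLove also counts as 3 words, since a sequence of upper case characters forms a single word: house, OF, Love.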

Monday, June 10, 2024

Frequent Pattern Growth Algorithm

In this post we will review the frequent pattern growth algorithm (aka FP Growth).

The goal of the algorithm is to find frequent item sets in a large database. This goal is similar to that of the Apriori algorithm, but instead of scanning the database once per candidate item set size, FP Growth scans it only twice: once to count the items and once to build an FP tree. The frequent item sets are then mined from the compact FP tree rather than from the database.

Let's assume that we have a database of transactions, each containing a list of items.

First, we count the occurrences of each item and order the items by count, from the most frequent to the least frequent.

Each transaction is then treated as if its items were sorted by their occurrence counts. For simplicity, let's rewrite the transactions in that order:

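Now we construct the FP tree, keeping a counter on each node. Notice that the items of each transaction are inserted in their sorted order, so transactions that start with the same items share the same path from the root, and only the counters along that path are incremented.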

After the 1st transaction:

After the 2nd transaction:

After the 3rd transaction:

After the 4th transaction:

After the 5th transaction:

After the 6th transaction:

After the 7th transaction:

After the 8th transaction:

After the 9th transaction:

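Next, for each item, we collect the paths from the root that lead to the nodes of that item (the item's conditional pattern base), and keep the count of the item node at the end of each path.

For example, item A can be reached through the following paths: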

C, where the A node has count of 1
D,E, where the A node has count of 1
D,C,E, where the A node has count of 1

Now let's use a minimum support of 2 occurrences, which sets how often an item set must appear to count as a common pattern.

For each item, we find the sets of items that are common to its conditional paths and whose total count is at least the minimum support.

Finally, we create the frequent patterns by combining each item with the item sets found in its conditional FP tree:

Monday, June 3, 2024

Requirements for a Production Grade Kubernetes Based Solution

In this post we will review a list of requirements for a production grade Kubernetes solution. These requirements are standard for any deployment running on a shared Kubernetes cluster, and aim to provide security, reliability, and maintainability.

Helm Chart

A deployment should provide a Helm chart to install it. The Helm chart should be customizable, allowing users to add and change:

Labels
Annotations
Image repo
Image version
Node selector
Affinity
CPU and memory resources per container
Log verbosity
Service definitions: types, ports
Additional volumes and volume mounts

In terms of security:

RBAC settings should follow the principle of least privilege
Use a read-only file system wherever possible

In addition, running helm upgrade should cause minimal downtime.

Communication

All communication should support both clear text and TLS. When TLS is used, it should be possible to specify the location of the PKI files (certificates and keys).
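A minimal sketch of a service that supports both modes, assuming the PKI file locations are passed as command line flags. The flag names -tls-cert and -tls-key and the default address are hypothetical; the server runs in clear text when neither flag is set and uses http.ListenAndServeTLS when both are. A CA bundle for verifying clients (mutual TLS) would be another configurable path and is omitted here.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"log"
	"net/http"
)

// tlsEnabled decides between clear text and TLS: both PKI file paths must be
// given to enable TLS, and neither to run in clear text.
func tlsEnabled(certFile, keyFile string) (bool, error) {
	switch {
	case certFile == "" && keyFile == "":
		return false, nil
	case certFile != "" && keyFile != "":
		return true, nil
	default:
		return false, errors.New("both -tls-cert and -tls-key must be set to enable TLS")
	}
}

func main() {
	// Hypothetical flags; in a Helm chart these paths would typically point
	// at files mounted from a Kubernetes Secret.
	addr := flag.String("addr", ":8080", "listen address")
	certFile := flag.String("tls-cert", "", "path to the PEM certificate file")
	keyFile := flag.String("tls-key", "", "path to the PEM private key file")
	flag.Parse()

	useTLS, err := tlsEnabled(*certFile, *keyFile)
	if err != nil {
		log.Fatal(err)
	}

	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})

	if useTLS {
		log.Fatal(http.ListenAndServeTLS(*addr, *certFile, *keyFile, mux))
	}
	log.Fatal(http.ListenAndServe(*addr, mux))
}
```

Rejecting a half-configured setup (only a certificate or only a key) makes a misconfigured chart fail at startup instead of silently falling back to clear text.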

Containers

All containers should follow these guidelines:

Run as non-root user
Log to STDOUT
Support liveness and readiness probes
Accept SIGTERM and exit gracefully, and log termination upon exit
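Most of these container guidelines can be sketched in a few lines of Go for an HTTP service. The port 8080, the probe paths /healthz and /readyz, and the 20 second shutdown timeout are hypothetical choices that must match the chart's probe and termination settings. Running as a non-root user is configured outside the code, for example with runAsNonRoot in the pod's securityContext.

```go
package main

import (
	"context"
	"errors"
	"log"
	"net/http"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

// ready reports whether the process should receive traffic.
var ready atomic.Bool

// readinessStatus is the HTTP status returned by the readiness probe.
func readinessStatus() int {
	if ready.Load() {
		return http.StatusOK
	}
	return http.StatusServiceUnavailable
}

func main() {
	log.SetOutput(os.Stdout) // log to STDOUT instead of the default STDERR

	mux := http.NewServeMux()
	// Hypothetical probe paths.
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK) // liveness: the process is responsive
	})
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(readinessStatus())
	})
	srv := &http.Server{Addr: ":8080", Handler: mux}

	// ctx is cancelled when SIGTERM (sent by Kubernetes on pod termination)
	// or SIGINT arrives.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, os.Interrupt)
	defer stop()

	go func() {
		if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
			log.Fatalf("server error: %v", err)
		}
	}()
	ready.Store(true)
	log.Println("started")

	<-ctx.Done()
	ready.Store(false) // fail readiness so no new traffic is routed here
	log.Println("termination signal received, shutting down")

	// Kept below Kubernetes' default 30 second termination grace period.
	shutdownCtx, cancel := context.WithTimeout(context.Background(), 20*time.Second)
	defer cancel()
	if err := srv.Shutdown(shutdownCtx); err != nil {
		log.Printf("shutdown error: %v", err)
	}
	log.Println("exited")
}
```

Shutdown stops accepting new connections and waits for in-flight requests, so the process exits cleanly and logs its termination, as the last guideline requires.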

Benchmarking

The deployment should be benchmarked in detail, specifying the expected resources of each container for a range of loads.
When needed, scaling should be handled automatically (auto scaling).
There should be no single point of failure. All services should be highly available.

Tests

The development stage should include both unit tests and end-to-end tests.

Full code coverage should be achieved as part of the tests.
