Wednesday, May 27, 2020

Create a Java gRPC client




This post presents the steps required to create a java gRPC client.

The gRPC is a great library/protocol providing inter-microservices communication, with high performance, and multiple programming language support. The definition in the official gRPC site is:


"
RPC is a modern open source high performance RPC framework that can run in any environment. It can efficiently connect services in and across data centers with pluggable support for load balancing, tracing, health checking and authentication. 
"


Let's jump directly into the files required to implement the java gRPC client.

The .proto File

The proto file describe the structures used for the communication, as well as the services.


syntax = "proto3";

package api;
option java_package = "org.alon.grpc.generated";


service MyServer {
rpc UpdateStore(Request) returns (Response) {}
}

enum Action {
ADD = 0;
REDUCE = 1;
}

message Request {
string name = 1;
uint64 update = 2;
Action action = 3;
}

message Response{
uint64 price = 1;
}

We have configured the structure of the Request and the structure of the Response.
In addition, we have configured the service API: UpdateStore.


The Maven pom file

The pom.xml is based on a standard java application pom file, and includes the following:
  • protoc and gRPC related dependencies
  • a protobuf-maven-plugin to generate the protoc and gRPC APIs. The following files are generated, and automatically added to the project sources:
    • org.alon.grpc.generated.Api - the request builder class
    • org.alon.grpc.generated.MyServerGrpc - the server communication API


<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<groupId>org.alon</groupId>
<artifactId>grpc</artifactId>
<version>1.0-SNAPSHOT</version>

<properties>
<maven.compiler.source>1.9</maven.compiler.source>
<maven.compiler.target>1.9</maven.compiler.target>
</properties>

<dependencies>
<dependency>
<groupId>io.grpc</groupId>
<artifactId>grpc-netty-shaded</artifactId>
<version>1.22.1</version>
</dependency>
<dependency>
<groupId>io.grpc</groupId>
<artifactId>grpc-protobuf</artifactId>
<version>1.22.1</version>
</dependency>
<dependency>
<groupId>io.grpc</groupId>
<artifactId>grpc-stub</artifactId>
<version>1.22.1</version>
</dependency>
<dependency>
<groupId>javax.annotation</groupId>
<artifactId>javax.annotation-api</artifactId>
<version>1.3.2</version>
</dependency>
</dependencies>

<build>
<extensions>
<!--
generates various useful platform-dependent project properties normalized from ${os.name} and ${os.arch}
This is required for running the protobuf plugin
-->
<extension>
<groupId>kr.motd.maven</groupId>
<artifactId>os-maven-plugin</artifactId>
<version>1.5.0.Final</version>
</extension>
</extensions>
<plugins>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<archive>
<manifest>
<mainClass>org.alon.grpc.GrpcClient</mainClass>
</manifest>
</archive>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
<executions>
<execution>
<id>make-assembly</id> <!-- this is used for inheritance merges -->
<phase>package</phase> <!-- bind to the packaging phase -->
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
<!--
copy the proto files to the local folder.
For actual user: You can manually copy the proto files to the folder: target/proto
and the comment this plugin.
-->
<plugin>
<artifactId>maven-antrun-plugin</artifactId>
<executions>
<execution>
<id>copy-protoc-files</id>
<phase>generate-sources</phase>
<configuration>
<tasks>
<copy file="src/main/proto/api.proto"
tofile="target/proto/api.proto"
overwrite="true"
/>
</tasks>
</configuration>
<goals>
<goal>run</goal>
</goals>
</execution>
</executions>
</plugin>
<!--
run the protoc to generate Java source from the proto files
-->
<plugin>
<groupId>org.xolstice.maven.plugins</groupId>
<artifactId>protobuf-maven-plugin</artifactId>
<version>0.6.1</version>
<configuration>
<pluginId>grpc-java</pluginId>
<pluginArtifact>io.grpc:protoc-gen-grpc-java:1.22.1:exe:${os.detected.classifier}</pluginArtifact>
<protoSourceRoot>target/proto</protoSourceRoot>
</configuration>
<executions>
<execution>
<phase>generate-sources</phase>
<id>run-protoc</id>
<goals>
<goal>compile</goal>
<goal>compile-custom</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>


</project>


The Java Source


The last piece is the java source that we create to use the generated gRPC source.


package orig.alon.grpc;

import io.grpc.ManagedChannel;
import io.grpc.netty.shaded.io.grpc.netty.GrpcSslContexts;
import io.grpc.netty.shaded.io.grpc.netty.NettyChannelBuilder;
import io.grpc.netty.shaded.io.netty.handler.ssl.SslContext;
import org.alon.grpc.generated.Api;
import org.alon.grpc.generated.MyServerGrpc;

import java.io.File;
import java.util.concurrent.TimeUnit;

public class Client {

private final ManagedChannel channel;
private final MyServerGrpc.MyServerBlockingStub serverApi;

public Client(String host, int port, String certificate) throws Exception {
SslContext sslContext = GrpcSslContexts.forClient()
.trustManager(new File(certificate)).build();

channel = NettyChannelBuilder.forAddress(host, port).sslContext(sslContext).build();
serverApi = MyServerGrpc.newBlockingStub(channel);
}

public void shutdown() throws Exception {
channel.shutdown().awaitTermination(5, TimeUnit.SECONDS);
}

private long sendRequest(Api.Action action, String name, long price) {
Api.Request request = Api.Request.newBuilder()
.setAction(action)
.setName(name)
.setUpdate(price)
.build();

Api.Response response = serverApi.updateStore(request);
return response.getPrice();
}

public long add(String name, long price) {
return sendRequest(Api.Action.ADD, name, price);
}

public long reduce(String name, long price) {
return sendRequest(Api.Action.REDUCE, name, price);
}

public static void main(String[] args) throws Exception {
System.out.println("Starting");
if (args.length != 3) {
System.err.println("Wrong amount of arguments");
System.err.println("Usage:");
System.err.println("HOST PORT SERVER_CERTIFICATE_FILE_PATH");
return;
}
String host = args[0];
int port = Integer.parseInt(args[1]);
String certificate = args[2];

Client client = new Client(host, port, certificate);
try {
System.out.println(client.add("p1", 1));
System.out.println(client.reduce("p2", 5));
} finally {
client.shutdown();
}
System.out.println("Done");

}
}

Our code receives the host, port, and the certificate file that are used to create connection to the gRPC server. We nicely wrap the gRPC APIs with out own methods, and activate the gRPC server API.


Monday, May 25, 2020

GO Access Control Best Practice




I've been using GO language for some time now, and I've collected some guidelines and code standards to enable code maintainability and flexibility.

As always, guidelines are only guidelines, and can have exceptions on various usages, but still these guidelines usually do save time on later changes to the code, and reduce coupling.


The Guidelines


1. Create new package for each functional struct

What is a "functional struct"?
A functional struct is a structure that has methods attached to it.

For example, in this code section:


package book

import "fmt"

type author struct {
firstName string
lastName string
}

type Book struct {
name string
author author
}

func Create(name string, authorFirstName string, authorLastName string) *Book {
b := Book{
name: name,
author: author{
firstName: authorFirstName,
lastName: authorLastName,
},
}
return &b
}

func (b *Book) Print() {
fmt.Printf("The book name is: %v\n", b.name)
fmt.Printf("Written by: %v %v\n", b.author.lastName, b.author.firstName)
}

  • The Book struct is a functional structure, as it includes a Print method.
  • The author struct is a data storage structure, and it does not include any related methods.

As seen in this example, the Book structure resides in its own package: the book package.
This means that we only expose public elements (functions and structs starting with upper case) to other packages. 

The important idea to understand is that the book package should include ONLY the Book structure related methods and possibly other data storage structures, but nothing else.

This means that we will have many packages in our code, but it significantly reduces the coupling.

The usage of this structure would be as follow:


func main() {
b := book.Create("Alice in Wonderland", "Lewis", "Carroll")
b.Print()
}


2. Use a Create function to construct the structure


As seen in the previous example, the Book structure is created using the Create function.
While this somehow complicates the new structure creation, it allows great flexibility.

For example, let's assume that now we want to create a map of authors by first name. If we would have directly created the structure, we would have to change all of the struct usages to initialize the map, while in our case, we would change it only in a single location:

func Create(name string, authorFirstName string, authorLastName string) *Book {
b := Book{
name: name,
authors: make(map[string]author),
}
b.authors[authorFirstName] = author{
firstName: authorFirstName,
lastName: authorLastName,
}
return &b
}




3. Expose only public entities


This is obvious but should be mentioned.

You package is your fortress. You should open access to the package only where needed.

Hence we will use GO upper case methods, functions, and variables, only where we need.



4. Wire dependencies instead of creating them


Whenever using one function struct on another functional struct, create the structures from out of the function structures scope. 

For example, let's assume we want a printer class for to print the output:


package printer

import (
"fmt"
"io/ioutil"
)

type Printer struct {
outputFile string
printToStdout bool
}

func Create(outputFile string, printToStdout bool) *Printer {
p := Printer{
outputFile: outputFile,
printToStdout: printToStdout,
}
return &p
}

func (p *Printer) Print(format string, a ...interface{}) {
if p.printToStdout {
fmt.Printf(format, a...)
} else {
data := fmt.Sprintf(format, a...)
ioutil.WriteFile(p.outputFile, []byte(data), 0x555)
}
}


So, we create the Printer structure on its own package, as mentioned in guideline #1.
But we do not construct the Printer structure within the Book structure, but only from the outside.
Hence an example of an update usage is:


func main() {
p := printer.Create("", true)
b := book.Create(p, "Alice in Wonderland", "Lewis", "Carroll")
b.Print()
}

and so the updated Print method is using the Printer structure:

func (b *Book) Print() {
b.printer.Print("The book name is: %v\n", b.name)
for _, a := range b.authors {
b.printer.Print("Written by: %v %v\n", a.lastName, a.firstName)
}
}


This allows use to send additional parameters to the Printer structure without modifications of the Book structure.


Final Notes


In this post we have reviewed some coding guidelines for the GO Access Control.
These guidelines might appear some cumbersome at first sight, but in the long distance save a lot of time in bug fixes, and maintainability. 

If you like the ideas presented here, leave a comment!









Thursday, May 21, 2020

Local Outlier Factor - A simple GO implementation



In this post we will review the LOF: Local Outlier Factor algorithm.
For full source of the example, see my GO sources at https://github.com/alonana/lof
An example of using the LOF is available in this post.

The LOF is based on several terms:
  • reachability distance
  • LRD
  • LOF
To implement LOF, we'll first implement K-Nearest-Neighbors detection.
Once we have the list of k nearest neighbors for each point, we can run the LOF algorithm.

The LOF provides score for each point.
A LOF score for an outlier would be higher 1.
How high?
It depends on you data, but the more clear outlier, the higher the score would be.


The LOF calculation is based on the following pseudo code:

func LOF(p) {
  sum=0
  foreach neighbor in neighbors(point) {
     sum += LRD(neighbor)
  }
  return sum / (k * LRD(p))
}

The LRD calculation is based on the following pseudo code:

func LRD(p) {
  sum=0
  foreach neighbor in neighbors(point) {
     sum += reachabilityDistance(p,neighbor)
  }
  return k / sum
}


The reachability distance calculation is based on the following pseudo code:

func reachabilityDistance(a, b) {
  return max(Distance(a,b), kDistance(b))
}

The k-distance is based on the following pseudo code:

func kDistance(p) {
  highest=0
  foreach neighbor in neighbors(point) {
     highest = max(highest, Distance(p,neighbor))
  }
  return highest
}

Again, notice that the "neighbors" used in the pseudo code, returns only the k-nearest points, and not all the neighbors.



Monday, May 18, 2020

How to cleanup Google Cloud Container Registry Images





Using Google Cloud Container Registry is pretty easy. 

But as time go by, you get more and more images accumulating in the container registry.
To delete the images, you can manually delete each image, which is OK for a single image removal, but frustrating if you want to remove multiple images.

I have created a short script to remove images recursively from a specific container registry folder.


#!/usr/bin/env bash
set -e

REPO=gcr.io/MY_PROJECT/MY_GCR_FOLDER


deleteTag() {
NAME=$1
HASH=$2
echo "delete hash ${NAME} ${HASH}"
gcloud container images delete -q --force-delete-tags ${NAME}@${HASH}
}

deleteTags() {
NAME=$1
echo "scan tags for ${NAME}"
gcloud container images list-tags ${NAME} --format='get(digest)' | while read line; do deleteTag $NAME $line; done
}

deleteImages() {
NAME=$1
echo "scan images for ${NAME}"
gcloud container images list --repository=${NAME} | grep -v ^NAME | while read line; do deleteImages $line; done
deleteTags ${NAME}
}

deleteImages ${REPO}



To run the script, supply the following as an argument: gcr.io/MY_PROJECT/MY_GCR_FOLDER.
The script will recursively delete the folder and files under this folder.





Update: keep the latest version


To keep the latest version for each image, change the following functions:


deleteTag() {
LATEST=$1
NAME=$2
HASH=$3
TAG=$4

if [[ "${LATEST}" == "${TAG}" || "latest" == "${TAG}" ]]; then
echo "skip delete of ${NAME} ${TAG}"
return 0
fi

echo "delete hash ${NAME} ${TAG} ${HASH}"
gcloud container images delete -q --force-delete-tags ${NAME}@${HASH}
}

deleteTags() {
NAME=$1
echo "scan tags for ${NAME}"
LATEST=$(gcloud container images list-tags ${NAME} --format='get(tags)' | sort -n | tail -1)
echo "keeping latest: ${LATEST}"
gcloud container images list-tags ${NAME} --format='get(digest,tags)' | while read line; do deleteTag $LATEST $NAME $line; done
}






Thursday, May 14, 2020

Access Google Cloud BigQuery from GO



In this post we will review how to access Google Cloud BigQuery from a GOlang Application.

"Serverless, highly scalable, and cost-effective cloud data warehouse designed for business agility."

In simple words, BigQuery enables use to save huge amount of data in a relational DBMS, and access it using plain SQL language, enriched with some of BigQuery proprietary functions.


Access Key


Now that we have data in BigQuery, we'll probably want to process it.
To access the BigQuery, we first need to create an access key.

To create an access key, login to Google Cloud Platform console, and select IAM and Admin, Service Accounts. Then select the account, and using the menu, select Create key, and export to a JSON format.







Let's save the file in path key.json.

Notice:
The selected service account should be granted with permissions to access the BigQuery.


The GO Application


Now, we can use the key.json file to access BigQuery.


package main

import (
"cloud.google.com/go/bigquery"
"context"
"fmt"
"google.golang.org/api/iterator"
"os"
)

func main() {
projectId := "YOUR_GCP_PROJECT_NAME"

_ = os.Setenv("GOOGLE_APPLICATION_CREDENTIALS", "key.json")
bigQueryClient, err := bigquery.NewClient(context.Background(), projectId)
if err != nil {
panic(err)
}

sql := "select col1,col2 from YOUR_DATASET_NAME.YOUR_TABLE_NAME limit 100"
query := bigQueryClient.Query(sql)
result, err := query.Read(context.Background())
if err != nil {
panic(err)
}

for {
var row []bigquery.Value
err := result.Next(&row)
if err == iterator.Done {
return
}
if err != nil {
panic(err)
}

stringColumn := row[2].(string)
intColumn := row[3].(int64)

fmt.Println(stringColumn, intColumn)
}
}


Notice that the path to the key.json is supplied as an environment variable.
In addition, we need to specify our project ID (even though it is already specified in the key.json), and the SQL text that we want to run.


And that's it, very simple.




Tuesday, May 5, 2020

Cloud Migration: Move Your Application to Google Kubernetes Engine



Lately I had to move an application running on a local kubernetes cluster to run on Google Kubernetes Engine(GKE), which is a part of the Google Cloud Platform (GCP). 


In this post I will review the steps done to get things working:

  1. Install Google Cloud SDK
  2. Create a new kubernetes cluster on GKE
  3. Enable a local kubectl to access the kubernetes cluster on GKE
  4. Upload the images to Google Cloud container registry
  5. Adjust the kubernetes templates to use GKE's persistence disks


1. Install Google Cloud SDK


The first step is to install the gcloud CLI, which is the Google cloud SDK.
Google cloud SDK is required to create, and update the various GCP entities, for example: login to GCP, create a kubernetes cluster, configure docker to connect to the GCP registry, and much more.

Specifically, for Ubuntu, follow the instructions of: Install Google Cloud SDK using apt-get.
The summary of these instructions is below.

echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
sudo apt-get install apt-transport-https ca-certificates gnupg
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -
sudo apt-get update && sudo apt-get install google-cloud-sdk
gcloud init


2. Create a new kubernetes cluster on GKE


To create a new kubernetes cluster, I've simply used the GCP web console.
  • Login to the GCP web console using your personal user or your work related user.
  • Click the Menu on the top left, and select: Kubernetes Engine, Clusters.
  • Click on Create Cluster, and change any settings you need (I've used all of the defaults)
Wait several minutes, and your kubernetes cluster is ready.

Later I've found that the cluster was configured to use 3 machines, that each has a single CPU.
For most applications, this is not enough, so I've updated the cluster to use 3 machines of 8 CPUs, using the gcloud SDK:

gcloud container node-pools create MY_NEW_POOL --cluster=MY_K8S_CLUSTER --num-nodes=3 --machine-type=n1-standard-8


3. Enable a local kubectl to access the kubernetes cluster on GKE


Now, our kubernetes cluster is ready, but how can we use the kubectl CLI to access it?

The first method which is simple, but less convenient in my option, is to use the kubectl from the GCP web console.
  • Login to the GCP web console using your personal user or your work related user.
  • Click the Menu on the top left, and select: Kubernetes Engine, Clusters.
  • On the clusters table, click on the connect button on the right side of your cluster.
  • That's it, you have a SSH session and a configured kubectl ready for your use

The second method requires some more steps, but in the long run, is easier to use. It is based on the Configuring cluster access for kubectl guide.
First, enable kubernetes API:
  • Login to the GCP web console using your personal user or your work related user.
  • Click the Menu on the top left, and select: API & services -> Enable API & services -> kubernetes API 

Next, update the local kubectl configuration (at ~/.kube/config) using the gcloud CLI:


gcloud container clusters get-credentials MY_K8S_CLUSTER


As a side note, when working with multiple kubernetes cluster, you should be aware of the kubectl contexts.

kubectl context   =  kubernetes Cluster   +   kubernetes Namespace   +   kubernetes User

Use the following commands to list, view, and update the current kubectl context:

kubectl config get-contexts                          # display list of contexts 
kubectl config current-context                       # display the current-context
kubectl config use-context my-cluster-name           # set the default context to my-cluster-name


4. Upload the images to Google Cloud container registry


OK, your cluster is up and running, and you can access it. But how can you access your images?

If you already have a public accessible container registry, great! You can skip this step.

Otherwise, you can use GCP container registry.

First, enable the container registry API:
  • Login to the GCP web console using your personal user or your work related user.
  • Click the Menu on the top left, and select: API & services -> Enable API & services -> container registry API 
Next, login to the machine where the docker images resides, and run the following:

gcloud auth login
gcloud auth configure-docker

This enables your local docker to access the GCP container registry.

Finally, to upload a docker image, tag it using the GCP prefix, and push it:

IMAGE_FULL_ID=MY_IMAGES_FOLDER/MY_IMAGE_NAME
GCR_TAG=gcr.io/MY_GCP_PROJECT_NAME/${IMAGE_FULL_ID}
docker tag MY_LOCAL_REGISTRY_SERVER:MY_LOCAL_REGISTRY_PORT/${IMAGE_FULL_ID} ${GCR_TAG}
docker push ${GCR_TAG}


5. Adjust the kubernetes templates to use GKE's persistence disks


In case the application has persistence volume claims, you should update it to use GCP's persistence instead,
This is done by dropping the storage class name from the persistence volume claims.

For example, remove the red line here:

volumeClaimTemplates:
  - metadata:
      name: persist-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "hostPath"
      resources:
       
requests:
         
storage: "1g"

Summary


GCP is a great platform, allowing quick implementation and deployment of applications.
In this post we have reviewed move of a single application to GKE.
Once the application is located in GKE, it can also easily use additional GCP services, such as BigQuery, Pub/Sub, and AI.