Wednesday, November 27, 2024

Redpanda Connect Introduction


Redpanda Connect, previously known as Benthos, is a stream processing framework that reads messages from, and writes messages to, many connectors. In between, it can transform the messages using built-in processors. The goal of the framework is to let us connect and convert data streams without writing proprietary code, using only a configuration file with minimal dedicated processing logic.

Redpanda Connect runs through the rpk CLI. To install rpk on Linux (amd64), use the following:

curl -LO https://github.com/redpanda-data/redpanda/releases/latest/download/rpk-linux-amd64.zip
unzip rpk-linux-amd64.zip
sudo mv rpk /usr/local/bin/
rm rpk-linux-amd64.zip

Create a file named connect.yaml:

input:
  stdin: {}

pipeline:
  processors:
    - mapping: root = content().uppercase()

output:
  stdout: {}

And run it:

rpk connect run connect.yaml

Now every line entered on STDIN is written to STDOUT in uppercase.
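
As a rough mental model only (this is not how Redpanda Connect is implemented), the config above behaves like the small Python loop below: each line read from stdin is one message, the mapping uppercases its content, and the result is written to stdout. The function name run_pipeline is made up for this sketch.

```python
import sys


def run_pipeline(lines):
    """Yield each input line uppercased, treating one line as one message."""
    for line in lines:
        message = line.rstrip("\n")   # input: one message per line of text
        yield message.upper()         # processor: like content().uppercase()


if __name__ == "__main__":
    # output: print each transformed message on its own line
    for result in run_pipeline(sys.stdin):
        print(result)
```

The point of the comparison is that the YAML file replaces this kind of glue code: the input, the processing step, and the output are each declared instead of programmed.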

While Redpanda Connect can handle plain string messages, most of its capabilities are geared toward handling JSON messages. For example, consider the following connect.yaml file:

input:
  generate:
    interval: 1s
    count: 0
    mapping: |
      let first_name = fake("first_name")
      let last_name = fake("last_name")

      root.id = counter()
      root.name = ($first_name + " " + $last_name)
      root.timestamp = now()

pipeline:
  processors:
    - sleep:
        duration: 100ms
    - group_by:
        - check: this.id % 2 == 0
          processors:
            - mapping: |
                root.original_doc = this
                root.encoded = this.name.hash("sha256").encode("base64")

output:
  stdout: {}

It will generate the following output:

{"id":1,"name":"Ivah Mohr","timestamp":"2024-11-27T16:36:37.542410543+02:00"}
{"encoded":"oaaBKF/7oz0N6j6VlSZs14u8FD2dAwPSBAoJvIIMpWI=","original_doc":{"id":2,"name":"Darrion Miller","timestamp":"2024-11-27T16:36:38.541429372+02:00"}}
{"id":3,"name":"Maddison Paucek","timestamp":"2024-11-27T16:36:39.542047832+02:00"}
{"encoded":"+e6EyBEILJYStdV+DH6cohEVvfW04VTo1q2YLXp4ft8=","original_doc":{"id":4,"name":"Eleazar Sporer","timestamp":"2024-11-27T16:36:40.542410537+02:00"}}
{"id":5,"name":"Junius Renner","timestamp":"2024-11-27T16:36:41.542298415+02:00"}
{"encoded":"A4o97ySPf9yWFMXcutPBiI6a6Fd19ofqBmK/7s84ZZ4=","original_doc":{"id":6,"name":"Carlie Osinski","timestamp":"2024-11-27T16:36:42.542277315+02:00"}}
{"id":7,"name":"Raphaelle Reichel","timestamp":"2024-11-27T16:36:43.542382662+02:00"}
{"encoded":"U+pLsYjRaWH87nUFdZmRLJwvGrIfUOCYeUrazKGoEHA=","original_doc":{"id":8,"name":"Elmira Douglas","timestamp":"2024-11-27T16:36:44.542245761+02:00"}}
{"id":9,"name":"Ambrose Hudson","timestamp":"2024-11-27T16:36:45.542307678+02:00"}
{"encoded":"J2NxrIaC1cvHtqIduSOx85TlyFLQrT48QaYw8iR9To0=","original_doc":{"id":10,"name":"Jedidiah Veum","timestamp":"2024-11-27T16:36:46.542276779+02:00"}}

Final Words

Yeah, that's cool: you save time on coding and fixing bugs. This is nice if you need a simple, fast, and reliable pipeline.
However, in real life you will usually need more than the built-in abilities, and then you have to extend Redpanda Connect with your own Go code. Is that better than just writing everything in Go? I think it depends on how many bugs you expect from the programmer.
