
Wednesday, February 4, 2026

Using DuckDB in Go




In this post we show a simple example of using DuckDB in Go.

DuckDB features:

  • Can run in-memory 
  • Supports multiple file types, such as: JSON, NDJSON, CSV, Excel (see the short sketch after this list)
  • Supports multiple storage types, such as: AWS S3, GCP Storage, PostgreSQL
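
As a quick illustration of the first two points (a sketch added here, not code from the original example), the snippet below opens an in-memory DuckDB database from Go and counts the rows of a local CSV file using DuckDB's read_csv_auto; the file name events.csv is hypothetical.


package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/marcboeker/go-duckdb"
)

func main() {
	// An empty DSN opens an in-memory DuckDB database.
	db, err := sql.Open("duckdb", "")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// read_csv_auto infers the schema from the file itself;
	// events.csv is a hypothetical local file used only for illustration.
	var count int64
	err = db.QueryRow(`SELECT count(*) FROM read_csv_auto('events.csv')`).Scan(&count)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("rows:", count)
}
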

In the following example we store our data in Parquet files in AWS S3.
We organize the files using a folder structure that will later enable us to fetch only the files we need.


package duckdb

import (
	"database/sql"
	"fmt"
	"testing"
	"time"

	_ "github.com/marcboeker/go-duckdb"
	// kiterr and kittime are the author's helper packages (error handling and
	// time parsing/formatting utilities); their import paths are omitted here.
)

func TestValidation(_ *testing.T) {
	// An empty DSN opens an in-memory DuckDB database.
	db, err := sql.Open("duckdb", "")
	kiterr.RaiseIfError(err)
	defer db.Close()

	// Install and load the httpfs extension, which provides S3 support.
	_, err = db.Exec(`
		INSTALL httpfs;
		LOAD httpfs;
	`)
	kiterr.RaiseIfError(err)

	// Configure the S3 region and credentials.
	_, err = db.Exec(`
		SET s3_region='us-east-1';
		SET s3_access_key_id='XXX';
		SET s3_secret_access_key='XXX';
	`)
	kiterr.RaiseIfError(err)

	createTable(db)
	createData(db)
	exportData(db)
}

func createTable(
	db *sql.DB,
) {
	_, err := db.Exec(`
		CREATE TABLE IF NOT EXISTS events (
			my_text TEXT,
			my_value INTEGER,
			event_time TIMESTAMP
		);
	`)
	kiterr.RaiseIfError(err)
}

func createData(
	db *sql.DB,
) {
	statement, err := db.Prepare(`
		INSERT INTO events (my_text, my_value, event_time)
		VALUES (?, ?, ?)
	`)
	kiterr.RaiseIfError(err)

	// Insert two rows per hour over two days, starting at 2000-01-01.
	startTime := kittime.ParseDay("2000-01-01")
	for day := range 2 {
		for hour := range 24 {
			for dataIndex := range 2 {
				dayTime := startTime.Add(time.Duration(day) * 24 * time.Hour)
				hourTime := dayTime.Add(time.Duration(hour) * time.Hour)
				fmt.Printf("saving data %v for %v\n", dataIndex, kittime.NiceTime(&hourTime))
				_, err = statement.Exec("aaa", 11, &hourTime)
				kiterr.RaiseIfError(err)
			}
		}
	}
}

func exportData(
	db *sql.DB,
) {
	// Export the table to S3 as GZIP-compressed Parquet files,
	// partitioned by date and hour (Hive-style folder structure).
	_, err := db.Exec(`
		COPY (
			SELECT
				my_text,
				my_value,
				event_time,
				CAST(event_time AS DATE) AS event_date,
				strftime(event_time, '%H') AS event_hour
			FROM events
		)
		TO 's3://my-bucket/duck/'
		(
			FORMAT PARQUET,
			COMPRESSION GZIP,
			PARTITION_BY (event_date, event_hour)
		);
	`)
	kiterr.RaiseIfError(err)
}


The result in AWS S3 is:


$ aws s3 ls --recursive s3://my-bucket
2026-02-04 11:27:40 542 duck/event_date=2000-01-02/event_hour=10/data_0.parquet
2026-02-04 11:27:45 541 duck/event_date=2000-01-02/event_hour=13/data_0.parquet
2026-02-04 11:27:47 541 duck/event_date=2000-01-02/event_hour=14/data_0.parquet
2026-02-04 11:27:49 542 duck/event_date=2000-01-02/event_hour=15/data_0.parquet
2026-02-04 11:27:50 541 duck/event_date=2000-01-02/event_hour=16/data_0.parquet
2026-02-04 11:27:52 542 duck/event_date=2000-01-02/event_hour=17/data_0.parquet
2026-02-04 11:27:54 541 duck/event_date=2000-01-02/event_hour=18/data_0.parquet
2026-02-04 11:27:58 541 duck/event_date=2000-01-02/event_hour=20/data_0.parquet
2026-02-04 11:28:00 542 duck/event_date=2000-01-02/event_hour=21/data_0.parquet
2026-02-04 11:28:03 542 duck/event_date=2000-01-02/event_hour=23/data_0.parquet
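
This folder structure is what later lets us fetch only the files we need. As a sketch in the same style as the functions above (not part of the original example, and reusing the same connection and the author's kiterr helper), a read-back could look like the following: read_parquet with hive_partitioning=true turns the event_date and event_hour folder names into columns, so the WHERE clause only touches the files of a single date and hour.


func readData(
	db *sql.DB,
) {
	// With hive_partitioning=true the event_date and event_hour folder
	// names become columns, so the filter below prunes all other files.
	rows, err := db.Query(`
		SELECT my_text, my_value, event_time
		FROM read_parquet('s3://my-bucket/duck/*/*/*.parquet', hive_partitioning=true)
		WHERE event_date = '2000-01-02' AND event_hour = '10'
	`)
	kiterr.RaiseIfError(err)
	defer rows.Close()

	for rows.Next() {
		var myText string
		var myValue int
		var eventTime time.Time
		kiterr.RaiseIfError(rows.Scan(&myText, &myValue, &eventTime))
		fmt.Println(myText, myValue, eventTime)
	}
	kiterr.RaiseIfError(rows.Err())
}
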


So for a quick-start project that needs to store data over time and read it back when needed, this is a nice solution. For a heavier project that requires more data and high control over the storage type, parallelism, and timing, writing proprietary code might be required.


