Posts

Dive Deeper into Data Engineering on Databricks

Latency goes subsecond in Apache Spark Structured Streaming: Improving Offset Management in Project Lightspeed

Spark Structured Streaming with Kafka: Understanding the startingOffset=“earliest” Issue

What does "free" or "open source" mean for a large language model? (Que veut dire libre ou open source pour un grand modèle de langage)

What are LLMs, and how are they used in generative AI?

How we saved 90% of costs by moving from AWS Lambda to AWS Fargate

Six point checklist for Spark job optimization

Apache Iceberg reduced Amazon S3 cost by 90%

Simple method for choosing the number of partitions in Spark

Scala is a maintenance nightmare?

Apache Spark on Kubernetes

The new generation Data Lake architecture

How To Break DAG Lineage in Apache Spark — 3 Methods