Flink Streaming AI Platform

This repository contains a streaming-first RAG pipeline built from three applications:

  • apps/cdc – Flink 1.19 session cluster that uses Flink CDC to capture data from MySQL and publish it to Kafka.
  • apps/knowledge – Flink 2.0 DataStream job that reads CDC events from Kafka and asynchronously enriches Qdrant with fresh embeddings.
  • apps/ask – Flink 2.0 DataStream job that consumes questions from Kafka, retrieves context from Qdrant, calls OpenAI, and emits answers back to Kafka.

The overall flow is:

MySQL -> (Flink CDC) -> Kafka topic `knowledge_topic`
       -> (Knowledge Job) -> Qdrant with metadata = context ID
Ask topic `ask` -> (Ask Job) -> Answer topic `answer`
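
As a rough sketch of how the knowledge job's asynchronous enrichment can be wired, the snippet below uses Flink's AsyncDataStream. KnowledgeEvent, QdrantUpsertFunction, and embedAndUpsert are illustrative names, not the actual classes in apps/knowledge or libs/core:

    import java.util.Collections;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.TimeUnit;

    import org.apache.flink.streaming.api.datastream.AsyncDataStream;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.functions.async.ResultFuture;
    import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

    public class KnowledgeEnrichment {

        // Shape of a CDC event on knowledge_topic: {id, context, knowledge}.
        public record KnowledgeEvent(String id, String context, String knowledge) {}

        // Embeds the knowledge text and upserts it into Qdrant without
        // blocking the Flink task thread.
        public static class QdrantUpsertFunction extends RichAsyncFunction<KnowledgeEvent, String> {
            @Override
            public void asyncInvoke(KnowledgeEvent event, ResultFuture<String> out) {
                CompletableFuture
                    .supplyAsync(() -> embedAndUpsert(event)) // OpenAI embedding + Qdrant upsert
                    .thenAccept(id -> out.complete(Collections.singleton(id)));
            }

            private String embedAndUpsert(KnowledgeEvent event) {
                // Stand-in for the shared LangChain/Qdrant services in libs/core.
                return event.id();
            }
        }

        // Bound the number of in-flight requests and time out slow calls.
        public static DataStream<String> enrich(DataStream<KnowledgeEvent> events) {
            return AsyncDataStream.unorderedWait(
                events, new QdrantUpsertFunction(), 30, TimeUnit.SECONDS, 100);
        }
    }

unorderedWait trades output ordering for throughput, which is a reasonable fit here because each upsert is independent.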

Repository layout

apps/
  ask/             # Flink 2.0 ask processor
  knowledge/       # Flink 2.0 knowledge ingestor
  cdc/             # Session-cluster CDC stack (Flink 1.19 + MySQL + Kafka)
libs/
  core/            # Shared LangChain/Qdrant services
Dockerfile         # Multi-target builder for ask & knowledge images
docker-compose.yml # Full stack (CDC + jobs + infra)
up.sh              # One command bootstrap

Prerequisites

  • Docker 24+ with Compose v2
  • Java tooling is handled inside the Docker builds (Gradle 8, JDK 21)
  • An OpenAI API key with access to gpt-4o-mini and text-embedding-3-small

Quick start

  1. Copy the environment template and add your secrets:

    cp .env.example .env
    # edit .env and set OPENAI_API_KEY at minimum
  2. Launch the entire stack:

    ./up.sh

    This builds the Flink fat jars, creates the Docker images, and starts MySQL, Kafka, Qdrant and all Flink jobs.

  3. Once everything is healthy you can inspect the web UIs listed in docker-compose.yml (the Flink dashboards, AKHQ for Kafka, and the Qdrant dashboard).

Interacting with the pipeline

Add knowledge

Use the helper script to write knowledge rows into MySQL (the CDC job captures them into Kafka automatically):

./run.sh --context-id main --context "Paris is the capital of France."

The CDC job publishes {id, context, knowledge} JSON events to knowledge_topic. The knowledge job embeds the text and stores it in Qdrant under the metadata key defined by CONTEXT_METADATA_KEY (default: context).
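
For illustration, a single event on knowledge_topic could look like this (the field values, and the type of id, are made up):

    {"id": "1", "context": "main", "knowledge": "Paris is the capital of France."}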

Ask questions

Send questions through the same script. The job accepts either a context key or the legacy sessionId key:

./run.sh --context-id main --ask "What is the capital of France?"
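
Under the hood this produces a small JSON message on the ask topic; the exact key for the question text isn't documented here, so treat this shape as an assumption:

    {"context": "main", "question": "What is the capital of France?"}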

Stream answers from the answer topic (Ctrl+C to stop):

./run.sh --list

Messages include context, question, answer, and a timestamp.
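
Putting that together, an answer message might look like the following (the timestamp format is illustrative):

    {"context": "main", "question": "What is the capital of France?", "answer": "Paris is the capital of France.", "timestamp": "2024-01-01T12:00:00Z"}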

Environment variables

Most values have sensible defaults but can be overridden via .env or the host environment.

Variable               Default                   Description
OPENAI_API_KEY         (required)                API key used by both Flink 2.0 jobs
QDRANT_COLLECTION      flink_streaming_ai_docs   Collection that stores embeddings
CONTEXT_METADATA_KEY   context                   Metadata key used for Qdrant filtering
ASK_TOPIC              ask                       Kafka topic carrying incoming questions
ANSWER_TOPIC           answer                    Kafka topic receiving answers
ASK_GROUP_ID           ask-processor             Consumer group for the ask job
KNOWLEDGE_TOPIC        knowledge_topic           Kafka topic with CDC updates
KNOWLEDGE_GROUP_ID     knowledge-to-qdrant       Consumer group for the knowledge job

Internal services (MySQL, Kafka, AKHQ, Qdrant) can also be modified directly in docker-compose.yml.
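
For instance, a minimal .env only needs the API key; the commented lines show the documented defaults you may override:

    # required – the only variable without a default
    OPENAI_API_KEY=sk-...
    # optional overrides (defaults shown)
    # QDRANT_COLLECTION=flink_streaming_ai_docs
    # CONTEXT_METADATA_KEY=context
    # ASK_TOPIC=ask
    # ANSWER_TOPIC=answer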

Development notes

  • Both Flink 2.0 applications rely on shaded JARs created with the Shadow plugin. The multi-target Dockerfile builds them automatically.
  • Shared LangChain services live under libs/core and are reused by both jobs.
  • The CDC stack still runs on Flink 1.19 because Flink CDC 3.5 depends on the classic SourceFunction API.
  • If you need to reset state, use docker compose down -v to remove Docker volumes (Kafka/MySQL/Qdrant data).
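
For example, a full reset and restart:

    docker compose down -v   # drop Kafka/MySQL/Qdrant volumes
    ./up.sh                  # rebuild images and start from a clean slate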

Troubleshooting

  • Ensure OPENAI_API_KEY is set before running ./up.sh; otherwise the Flink jobs will continuously restart.
  • Use docker compose logs -f knowledge-jobmanager or ask-jobmanager to inspect failures in the jobs.
  • If you change Java code, rerun docker compose build to rebuild the shaded JARs.
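
A typical edit-rebuild cycle therefore looks like:

    docker compose build     # recompile and re-shade the JARs inside the images
    docker compose up -d     # restart the containers on the new images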
