Kubernetes operator for streaming data between different sources (Kafka, PostgreSQL, Trino) with support for message transformations.
- Multiple Data Sources: Kafka, PostgreSQL, Trino
- Message Transformations:
  - Timestamp - add a timestamp to messages
  - Flatten - expand arrays into separate messages
  - Filter - filter messages by conditions
  - Mask - mask sensitive data
  - Router - route messages to different sinks
  - Select - select specific fields
  - Remove - remove fields
  - SnakeCase - convert field names to snake_case
  - CamelCase - convert field names to camelCase
- Kubernetes Secrets Support: Configure connectors using `SecretRef` for secure credential management
- Per-Resource Pod Deployment: Each DataFlow resource creates a separate pod (Deployment) for processing
- Resource Management: Configure CPU and memory resources for processor pods
- Pod Placement Control: Configure nodeSelector, affinity, and tolerations for fine-grained pod placement
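As a rough illustration of how these features come together, a DataFlow resource might combine a source, a transformation chain, a sink, and pod resources. This is a hypothetical sketch only - the `apiVersion` and `kind` come from the CRD referenced below, but the `spec` field names here are assumptions; see `config/samples/` for the authoritative schema.

```yaml
# Hypothetical sketch - consult config/samples/ for the actual schema.
apiVersion: dataflow.dataflow.io/v1
kind: DataFlow
metadata:
  name: kafka-to-postgres
spec:
  source:              # assumed field name
    kafka:
      brokers: ["kafka:9092"]
      topic: events
  transformations:     # assumed field name
    - type: Timestamp
    - type: SnakeCase
  sink:                # assumed field name
    postgres:
      table: events
  resources:           # CPU/memory for the processor pod
    limits:
      cpu: 500m
      memory: 256Mi
```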
- Kubernetes 1.24+
- Helm 3.0+
- kubectl
- Go 1.21+ (for local development)
- Docker and docker-compose (for local development)
Before installing the operator, you need to install the Custom Resource Definition (CRD):
```bash
kubectl apply -f https://raw.githubusercontent.com/dataflow-operator/dataflow/refs/heads/main/config/crd/bases/dataflow.dataflow.io_dataflows.yaml
```
Or use a local file:
```bash
kubectl apply -f config/crd/bases/dataflow.dataflow.io_dataflows.yaml
```
- Install the operator from OCI registry:
```bash
helm install dataflow-operator oci://ghcr.io/dataflow-operator/helm-charts/dataflow-operator
```
- For installation with custom settings:
```bash
helm install dataflow-operator oci://ghcr.io/dataflow-operator/helm-charts/dataflow-operator \
  --set image.repository=your-registry/controller \
  --set image.tag=v1.0.0 \
  --set replicaCount=2
```
- For installation in a specific namespace:
```bash
helm install dataflow-operator oci://ghcr.io/dataflow-operator/helm-charts/dataflow-operator \
  --namespace dataflow-system \
  --create-namespace
```
- Check installation status:
```bash
kubectl get pods -l app.kubernetes.io/name=dataflow-operator
```
Note: For local development, you can also use the local chart:
```bash
helm install dataflow-operator ./helm/dataflow-operator
```
To upgrade the operator:
```bash
helm upgrade dataflow-operator oci://ghcr.io/dataflow-operator/helm-charts/dataflow-operator
```
To uninstall:
```bash
helm uninstall dataflow-operator
```
For local development, you can run the operator locally:
```bash
make run
```
Or use the script:
```bash
./scripts/run-local.sh
```
- Start dependencies with UI interfaces:
```bash
docker-compose up -d
```
Available UIs:
- Kafka UI: http://localhost:8080
- pgAdmin: http://localhost:5050 (admin@admin.com / admin)
- Run the operator:
```bash
make run
```
See config/samples/ for CRD manifest examples.
For secure credential storage, use Kubernetes Secrets:
```bash
kubectl apply -f config/samples/kafka-to-postgres-secrets.yaml
```
This example demonstrates using `SecretRef` for connector configuration. All connectors support configuration from Kubernetes Secrets.
```
dataflow/
├── api/v1/                  # CRD definitions
├── internal/
│   ├── connectors/          # Connectors for sources/sinks
│   ├── transformers/        # Message transformations
│   ├── processor/           # Message processor
│   └── controller/          # Kubernetes controller
├── helm/dataflow-operator/  # Helm Chart for installation
├── config/samples/          # CRD examples
├── docs/                    # MkDocs documentation
├── test/                    # Tests and utilities
└── scripts/                 # Helper scripts
```
Full documentation is available in docs/. To view:
```bash
mkdocs serve
```
Or from the project root:
```bash
cd docs && mkdocs serve
```
Documentation is available in two languages:
- English: Default language
- Russian: Available in the navigation menu
If you encounter issues with make generate, try:
```bash
# Update controller-gen
go install sigs.k8s.io/controller-tools/cmd/controller-gen@latest
# Then
make generate
```

```bash
# Unit tests
make test

# Integration tests (requires kind)
./scripts/setup-kind.sh
make test-integration
```

DataFlow includes a web interface for managing manifests, viewing logs, and monitoring metrics.
```bash
go run cmd/gui-server/main.go --bind-address :8080
```
Or with parameters:
```bash
go run cmd/gui-server/main.go \
  --bind-address :8080 \
  --kubeconfig ~/.kube/config \
  --log-level info
```
- Open a browser and navigate to http://localhost:8080
- Use the tabs to:
  - Manifests: Manage DataFlow resources (create, view, update, delete)
  - Logs: View processing logs in real time
  - Metrics: Monitor processing status, processed message counts, and errors
See cmd/gui-server/README.md for details.
DataFlow Operator supports configuring connectors from Kubernetes Secrets through SecretRef fields. This allows:
- Secure storage of sensitive data (passwords, tokens, connection strings)
- Centralized credential management
- Secret rotation without changing DataFlow resources
- Access control through Kubernetes RBAC
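For illustration, wiring a connector credential through a Secret might look like the sketch below. The Secret object is standard Kubernetes, but the DataFlow fragment is a hypothetical shape - the exact `SecretRef` field names are assumptions; consult the Connectors documentation and `config/samples/` for the real schema.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: postgres-credentials
stringData:
  password: changeme
---
# Hypothetical DataFlow fragment - field names are assumptions.
spec:
  sink:
    postgres:
      passwordSecretRef:     # assumed field name
        name: postgres-credentials
        key: password
```

Rotating the password then only requires updating the Secret, not the DataFlow resource itself.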
See the Connectors documentation for details.
Apache License 2.0