This project is a sub-project of AgCloud

Sound

Mentored by: Vast Data

Sound - Cloud-based platform for agricultural data management and analytics

React

Node.js

PostgreSQL

AWS

Docker

Kubernetes

Microservices

Description

Sound is a sub-project of AgCloud, a comprehensive cloud platform for managing agricultural operations, data, and analytics. The platform provides centralized storage, processing, and visualization of farm data, including crop monitoring, weather integration, equipment management, and predictive analytics. Features include a multi-tenant architecture, real-time dashboards, and API integrations.

Team Members

Cohort: Data Science Bootcamp 2025 (Data)

Malka H.

Responsibilities:

Implemented Kafka startup logic inside Docker, including broker initialization, readiness checks, automatic topic creation with custom retention settings, and smoke-test validation to ensure message delivery functionality.
Implemented a fully automated audio-compression and tiering system that identifies old audio files based on the timestamp embedded in the filename, downloads them from MinIO, compresses them using FFmpeg (Opus/FLAC), replaces the original files with the compressed versions, removes outdated compressed files, and executes continuously through a Docker-based Cron scheduler.
Implemented data visualization inside the PyQt6 application by generating analytics graphs with Matplotlib and embedding them directly into the GUI views, while also integrating live Grafana dashboard panels through QWebEngineView to display real-time system metrics as part of the application's interface.
Developed a full notification-scheduling service including a Flask REST API with PostgreSQL persistence, supporting creation, editing, and deletion of scheduled tasks. Integrated the API with both a JavaScript web interface and a PyQt6 desktop view to provide a unified UI for managing schedules across the system.
Built a pipeline that ingests edge-device metadata via MQTT into Kafka, Flink, and PostgreSQL, while raw data files flow through Kafka, MinIO, bucket notifications, Flink, and PostgreSQL. Added a matching script that links data files with their metadata in a connection table, so that downstream steps use only entries for which both the data file and its metadata are present.
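
The matching step in the last item can be pictured as an inner join between stored file keys and metadata rows. The sketch below is only an illustration under assumptions: the `Link` record, the `match_files_to_metadata` function, and the idea that each metadata row carries the object key of its file are hypothetical and are not taken from the project's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One row of a hypothetical connection table."""
    file_key: str
    metadata_id: int


def match_files_to_metadata(file_keys, metadata_rows):
    """Return links only for files that also have metadata.

    file_keys: iterable of object keys (assumed unique).
    metadata_rows: iterable of (metadata_id, file_key) tuples (assumed layout).
    """
    # Index the metadata by file key for constant-time lookups.
    by_key = {key: meta_id for meta_id, key in metadata_rows}
    links = []
    for key in file_keys:
        meta_id = by_key.get(key)
        if meta_id is not None:
            # Both sides exist, so the pair is safe for downstream use.
            links.append(Link(key, meta_id))
    return links
```

Files without metadata, and metadata without a file, simply produce no link, which is one way to keep incomplete pairs out of later processing.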

...and more contributions not listed here

Ruth H.

Responsibilities:

Message Load Testing Between MQTT and Kafka Inside GitHub Actions: I built a full end-to-end soak-testing environment that simulates continuous, high-volume message flow between MQTT and Kafka entirely inside GitHub Actions, without Kubernetes. I created a custom Docker image for Mosquitto with a Kafka bridge, designed a CI workflow that publishes and consumes large message streams, and used simulator.py and junitify.py to analyze message loss, latency, and reliability. The result was a fully automated CI pipeline that validates system stability on every change.
Audio Metrics Microservice for MinIO + Prometheus: I developed a microservice called sound_metrics that periodically scans ultrasonic audio files stored in MinIO and computes key metrics such as average RMS, amplitude standard deviation, and microphone uptime. The service uses Python, Librosa, and FFmpeg for audio processing, exposes a Prometheus metrics endpoint, and runs as part of the system’s Docker Compose environment.
ML Research and Model for Plant Stress Detection: I researched and built a machine-learning model that detects plant stress from ultrasonic recordings. The work covered collecting and organizing the dataset, extracting features with Librosa, generating spectrograms for CNN models, training models in TensorFlow/PyTorch, and evaluating and saving the results. The model was packaged into a CronJob-style service that runs periodically, performs predictions, and logs them to PostgreSQL. When the model detects a condition that requires attention, it sends an alert message to Kafka, where it is consumed and displayed to the user.
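
To make the audio metrics mentioned above concrete, here is a minimal sketch of RMS and amplitude standard deviation computed over raw samples. It uses only the Python standard library rather than Librosa, and the function names and the representation of a recording as lists of float samples are assumptions for illustration, not the sound_metrics implementation.

```python
import math
import statistics


def rms(samples):
    """Root-mean-square level of one frame of float samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def amplitude_std(samples):
    """Population standard deviation of the sample amplitudes."""
    return statistics.pstdev(samples)


def average_rms(frames):
    """Mean of the per-frame RMS values (frames: list of sample lists)."""
    return statistics.fmean([rms(frame) for frame in frames])
```

In a real service these values would typically be computed from decoded audio arrays and published as gauges on the metrics endpoint; that wiring is left out here.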

Tehila H.

Responsibilities:

Developed a Python-based simulator that used Kafka and MQTT to publish synthetic sensor events and verified end-to-end message reliability, ensuring that the messaging infrastructure delivered all events without loss.
Built an MQTT-to-Kafka bridge service that subscribed to specific MQTT topics and forwarded incoming messages to matching Kafka topics, allowing the platform to unify data ingestion and maintain consistent routing across microservices.
Created a real-time sound-classification microservice in FastAPI that processed audio segments with ML models and published alerts to a dedicated Kafka topic whenever suspicious environmental noise was detected.
Implemented an Apache Flink streaming service that consumed events from Kafka in real time and automatically triggered API calls to the sound-classification service for every incoming audio message, enabling low-latency, event-driven processing.
Orchestrated the system data flow end-to-end by adapting an external team’s simulator to upload sound files into MinIO together with aligned metadata, so that the data propagated through Kafka, Flink, and the ML inference services. This work ensured that the full pipeline, from simulated data to classification output, functioned reliably as an integrated system.
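
The topic routing performed by an MQTT-to-Kafka bridge, as described in the list above, could look roughly like the following sketch. It covers only the routing decision, not the client connections. The `mqtt` prefix, the slash-to-dot mapping, and all function names are assumptions for illustration; the wildcard matching is simplified and ignores the special handling of topics that start with `$`.

```python
import re

# Characters outside this set are not allowed in Kafka topic names.
_ILLEGAL = re.compile(r"[^A-Za-z0-9._-]")


def mqtt_topic_matches(topic_filter, topic):
    """Check a topic against an MQTT filter with '+' and '#' wildcards."""
    f_parts = topic_filter.split("/")
    t_parts = topic.split("/")
    for i, part in enumerate(f_parts):
        if part == "#":
            return True  # multi-level wildcard matches everything that remains
        if i >= len(t_parts):
            return False
        if part != "+" and part != t_parts[i]:
            return False
    return len(f_parts) == len(t_parts)


def to_kafka_topic(mqtt_topic, prefix="mqtt"):
    """Map an MQTT topic to a Kafka-safe name (hypothetical convention)."""
    name = mqtt_topic.strip("/").replace("/", ".")
    return f"{prefix}.{_ILLEGAL.sub('_', name)}"


def route(filters, mqtt_topic):
    """Return the target Kafka topic, or None if no filter matches."""
    if any(mqtt_topic_matches(f, mqtt_topic) for f in filters):
        return to_kafka_topic(mqtt_topic)
    return None
```

A bridge built this way would call `route` for each incoming message and forward the payload only when a Kafka topic name is returned.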

...and more contributions not listed here

Tehila D.

Responsibilities:

Developed MQTT integration, including Mosquitto configuration, reliable connectivity, topic filtering, and message ingestion into Kafka.
Implemented MinIO bucket-notification workflows enabling event-driven routing of uploaded imagery and audio to ingestion services.
Built a Python-based GUI environment inside Docker using noVNC, including system views, operational tools, and data-visualization panels.
Researched audio playback integration: investigated options for implementing audio streaming and playback inside the GUI, including current platform limitations (e.g., noVNC audio constraints) and alternative solutions.
