
Mentored by: Mobileye
High-performance real-time imaging system for Raspberry Pi

End-to-end camera pipeline for the OV5647 sensor, including a configuration layer, kernel-driver integration, frame acquisition, real-time image processing (debayering, white balance, gamma correction), OpenMP and NEON acceleration, a logging/monitoring layer, and a CI pipeline with ARM cross-compilation and QEMU testing. Runs significantly faster than OpenCV in multiple benchmarks.
Cohort: Embedded Systems Bootcamp 2025 (Embedded)
Responsibilities:
Designed modular system architecture separating camera configuration, buffer management, and core logic, ensuring maintainability and scalability.
Designed and implemented a custom Linux kernel camera module supporting MIPI-CSI and DMA for high-performance frame acquisition on Raspberry Pi.
Developed IOCTL interfaces for user-kernel communication.
Developed shared-memory buffers for efficient data transfer between Kernel Space and User Space, reducing context-switch overhead.
Integrated the module with the V4L2 (Video for Linux 2) framework to provide seamless access for user-space applications.
Developed and optimized image-processing algorithms within the SpeedyCam pipeline, using ARM NEON SIMD instructions and OpenMP for parallel acceleration.
Achieved ~30% lower latency than a baseline OpenCV solution.
Built integration and unit tests using mocks to ensure driver stability and regression prevention.
...and more contributions not listed here
Responsibilities:
Developed a kernel-level camera module for Linux with MIPI-CSI and DMA support for real-time frame acquisition on Raspberry Pi.
Designed an efficient shared-memory mechanism for transferring image data between Kernel and User Space, minimizing memory copies and improving throughput.
Implemented a Color Conversion algorithm supporting YUV, LAB, LUV, and GRAY color spaces, including NEON-SIMD optimizations for vectorized real-time processing.
Accelerated the image-processing pipeline using OpenMP to distribute workloads across multiple CPU cores for continuous, high-throughput processing.
Built a comprehensive suite of unit tests and integration tests using mocks.
Researched IOCTL interfaces and alternative communication mechanisms between User Space and Kernel Space for driver control and frame management.
Designed and developed a thread-safe system library implementing a parallel pipeline: a dedicated high-priority thread for high-rate frame acquisition from the video stream, and a second, asynchronous thread for image processing. The architecture provides load separation, non-blocking operation, and improved real-time system stability.
...and more contributions not listed here
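As an illustration of the kind of NEON-SIMD vectorization described above, here is a minimal sketch of an RGB-to-GRAY conversion. It is not the project's actual code: the function name `rgb_to_gray`, the interleaved 8-bit RGB layout, and the fixed-point weights (77, 150, 29)/256 are assumptions chosen for the example. The NEON path is compiled only when `__ARM_NEON` is defined; elsewhere the scalar loop handles every pixel and produces the same result.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper: converts interleaved 8-bit RGB to grayscale using
// fixed-point weights (77, 150, 29) / 256, which approximate the usual
// 0.299 / 0.587 / 0.114 luma coefficients.
void rgb_to_gray(const std::uint8_t* rgb, std::uint8_t* gray, std::size_t pixels) {
    std::size_t i = 0;
#if defined(__ARM_NEON)
    const uint8x8_t wr = vdup_n_u8(77);
    const uint8x8_t wg = vdup_n_u8(150);
    const uint8x8_t wb = vdup_n_u8(29);
    // Process 8 pixels per iteration.
    for (; i + 8 <= pixels; i += 8) {
        const uint8x8x3_t px = vld3_u8(rgb + 3 * i);  // de-interleave R, G, B
        uint16x8_t acc = vmull_u8(px.val[0], wr);     // widen and multiply
        acc = vmlal_u8(acc, px.val[1], wg);           // multiply-accumulate
        acc = vmlal_u8(acc, px.val[2], wb);
        vst1_u8(gray + i, vshrn_n_u16(acc, 8));       // divide by 256, narrow
    }
#endif
    // Scalar tail (and full fallback on non-NEON targets).
    for (; i < pixels; ++i) {
        const unsigned r = rgb[3 * i];
        const unsigned g = rgb[3 * i + 1];
        const unsigned b = rgb[3 * i + 2];
        gray[i] = static_cast<std::uint8_t>((77 * r + 150 * g + 29 * b) >> 8);
    }
}
```

The weights sum to 256, so the 16-bit accumulator cannot overflow (255 × 256 = 65280), which is what lets the whole computation stay in integer SIMD lanes.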
Responsibilities:
Developed a custom Linux camera module supporting MIPI-CSI and DMA for real-time frame acquisition.
Designed shared-memory buffers for efficient data transfer between Kernel and User Space.
Implemented OpenCV algorithms including White Balance, Color Conversion, Single-Channel Extraction, and RGB Debayering; optimized performance using NEON-SIMD and OpenMP for real-time image processing.
Optimized frame-pipeline performance, achieving ~30% lower latency compared to OpenCV.
Built unit and integration tests using mocks, fully integrated into a CI workflow.
Used CMake and Make to build a clean and modular system, working in a Linux environment.
...and more contributions not listed here
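To make the White Balance plus OpenMP combination above concrete, the following is a minimal sketch of a gray-world white balance parallelized with OpenMP. The gray-world method is an assumption picked for illustration; the source does not say which white-balance algorithm was used. The name `gray_world_wb` and the interleaved 8-bit RGB layout are likewise hypothetical. Without `-fopenmp` the pragmas are ignored and the code runs serially with the same result.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: gray-world white balance, in place, on an interleaved
// 8-bit RGB buffer. Each channel is scaled so its mean matches the mean of
// all three channels.
void gray_world_wb(std::vector<std::uint8_t>& rgb) {
    const long long pixels = static_cast<long long>(rgb.size() / 3);
    if (pixels == 0) return;

    // Pass 1: per-channel sums, combined across threads with a reduction.
    unsigned long long sum_r = 0, sum_g = 0, sum_b = 0;
    #pragma omp parallel for reduction(+ : sum_r, sum_g, sum_b)
    for (long long i = 0; i < pixels; ++i) {
        sum_r += rgb[3 * i];
        sum_g += rgb[3 * i + 1];
        sum_b += rgb[3 * i + 2];
    }

    // Gain per channel; an all-zero channel is left unchanged.
    const double target = static_cast<double>(sum_r + sum_g + sum_b) / 3.0;
    const double gain[3] = {
        sum_r ? target / static_cast<double>(sum_r) : 1.0,
        sum_g ? target / static_cast<double>(sum_g) : 1.0,
        sum_b ? target / static_cast<double>(sum_b) : 1.0,
    };

    // Pass 2: apply gains; pixels are independent, so the loop parallelizes.
    #pragma omp parallel for
    for (long long i = 0; i < pixels; ++i) {
        for (int c = 0; c < 3; ++c) {
            const double v = rgb[3 * i + c] * gain[c] + 0.5;  // round to nearest
            rgb[3 * i + c] = static_cast<std::uint8_t>(std::min(v, 255.0));
        }
    }
}
```

Splitting the work into a reduction pass and an independent per-pixel pass keeps both loops free of shared writes, so no locking is needed.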
Responsibilities:
Developed an Initial Configuration Layer (POC) for the OV5647 sensor using the I2C API, implementing generic and specific functions for key camera parameters (e.g., Frame Rate, Resolution, Test Pattern).
Pivoted the architecture to V4L2 subsystem integration, refactoring the solution to use the platform's native configuration mechanism and delivering a clean abstraction API.
Extracted a White Balance (WB) algorithm from the OpenCV library codebase for standalone implementation and performance benchmarking.
Low-Level Performance Optimization: Optimized the runtime of the isolated White Balance algorithm, achieving significant improvements through parallelization techniques including SIMD vectorization and OpenMP multithreading.
Concurrent Pipeline Design: Implemented a thread-safe system library with a Concurrent Pipeline Architecture utilizing two dedicated threads: one for high-priority frame acquisition and one for parallel image processing.
...and more contributions not listed here
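The two-thread pipeline described above could look roughly like the sketch below: one thread produces frames, a second consumes and processes them, and a bounded queue sits between them. This is an illustration under stated assumptions, not the project's library. `FrameQueue`, `run_pipeline`, the drop-oldest policy, and the synthetic frames are all hypothetical, and raising the acquisition thread's scheduling priority is omitted.

```cpp
#include <algorithm>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

using Frame = std::vector<std::uint8_t>;

// Hypothetical bounded, thread-safe frame queue. When full, the oldest frame
// is dropped so the acquisition thread never blocks on a slow consumer.
class FrameQueue {
public:
    explicit FrameQueue(std::size_t capacity)
        : capacity_(std::max<std::size_t>(capacity, 1)) {}

    void push(Frame frame) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (queue_.size() == capacity_) queue_.pop_front();  // drop oldest
            queue_.push_back(std::move(frame));
        }
        ready_.notify_one();
    }

    // Signals that no more frames will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until a frame is available; returns false once closed and drained.
    bool pop(Frame& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty() || closed_; });
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

private:
    std::size_t capacity_;
    std::deque<Frame> queue_;
    std::mutex mutex_;
    std::condition_variable ready_;
    bool closed_ = false;
};

// Runs an acquisition thread and a processing thread; returns the number of
// frames the processing thread handled.
std::size_t run_pipeline(std::size_t frames, std::size_t capacity) {
    FrameQueue queue(capacity);
    std::size_t processed = 0;

    std::thread acquire([&] {
        for (std::size_t i = 0; i < frames; ++i) {
            queue.push(Frame(16, static_cast<std::uint8_t>(i)));  // synthetic frame
        }
        queue.close();
    });

    std::thread process([&] {
        Frame frame;
        while (queue.pop(frame)) {
            for (auto& px : frame) px = static_cast<std::uint8_t>(255 - px);  // stand-in for real processing
            ++processed;
        }
    });

    acquire.join();
    process.join();
    return processed;
}
```

For example, `run_pipeline(8, 8)` processes all 8 frames because the queue can hold every frame, while a smaller capacity may drop frames when processing falls behind. Dropping the oldest frame is one possible design choice for keeping acquisition non-blocking; a real system might instead recycle buffers or apply back-pressure.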