Software and embedded developer specializing in real-time systems and programming languages close to the hardware. Brings strong analytical thinking and a creative approach to complex challenges, with hands-on experience in low-level development. Thorough, with a deep capacity for understanding systems end to end; emphasizes precise planning and clean, high-quality, efficient code, and learns new domains quickly and independently.
AI-powered optimization engine for next-generation computing
Mentored by: Next Silicon
Embedded Systems Bootcamp 2025 (Embedded)
Responsibilities:
Download an LLM from Hugging Face, convert it into a llama.cpp-compatible format (GGUF), and run it locally using llama.cpp.
Study foundational Machine Learning and Deep Learning concepts, with an in-depth focus on Large Language Models (LLMs).
Learn SYCL, emphasizing best practices for writing highly parallel and efficient code that fully exploits GPU capabilities.
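A minimal sketch of the nd_range pattern this practice centers on (illustrative, not bootcamp code; the work-group size of 256 is an assumption):

```cpp
#include <sycl/sycl.hpp>
#include <vector>

int main() {
    constexpr size_t N  = 1 << 20;   // problem size (illustrative)
    constexpr size_t WG = 256;       // work-group size (assumption)

    sycl::queue q{sycl::default_selector_v};
    std::vector<float> a(N, 1.0f), b(N, 2.0f), c(N, 0.0f);

    {
        sycl::buffer<float> ba{a}, bb{b}, bc{c};
        q.submit([&](sycl::handler &h) {
            sycl::accessor A{ba, h, sycl::read_only};
            sycl::accessor B{bb, h, sycl::read_only};
            sycl::accessor C{bc, h, sycl::write_only, sycl::no_init};
            // Round the global range up to a multiple of the work-group size.
            const size_t global = (N + WG - 1) / WG * WG;
            h.parallel_for(sycl::nd_range<1>{global, WG},
                           [=](sycl::nd_item<1> it) {
                const size_t i = it.get_global_id(0);
                if (i < N) C[i] = A[i] + B[i];   // guard the tail
            });
        });
    }   // buffer destructors copy results back to the host vectors
    return c[0] == 3.0f ? 0 : 1;
}
```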
Implement the CONCAT operator for the SYCL backend in llama.cpp, including effective workload distribution across GPU hardware. PR-LINK: Implement CONCAT for SYCL backend
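A simplified sketch of the idea behind such a concat kernel, joining two contiguous row-major 2-D float tensors along the inner dimension (the actual PR handles 4-D ggml tensors and arbitrary concat dimensions; all names here are illustrative):

```cpp
#include <sycl/sycl.hpp>

// a, b, dst are USM device pointers. Writes dst[r][c] from a when
// c < cols_a, otherwise from b. One work-item per destination element
// gives a simple, even distribution across the GPU.
void concat_rows(sycl::queue &q,
                 const float *a, const float *b, float *dst,
                 size_t rows, size_t cols_a, size_t cols_b) {
    const size_t cols = cols_a + cols_b;
    q.parallel_for(sycl::range<2>{rows, cols}, [=](sycl::id<2> id) {
        const size_t r = id[0], c = id[1];
        dst[r * cols + c] = (c < cols_a)
            ? a[r * cols_a + c]
            : b[r * cols_b + (c - cols_a)];
    }).wait();
}
```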
Implement the PAD_REFLECT_1D operator for the SYCL backend in llama.cpp, including effective workload distribution across GPU hardware. PR-LINK: Implement PAD_REFLECT_1D for SYCL backend
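A simplified sketch of 1-D reflect padding, mirroring without repeating the edge sample (the actual operator applies this per row of a ggml tensor; this standalone helper is illustrative):

```cpp
#include <sycl/sycl.hpp>

// src, dst are USM device pointers. Requires pad_left, pad_right <= n - 1.
void pad_reflect_1d(sycl::queue &q, const float *src, float *dst,
                    long n, long pad_left, long pad_right) {
    const long out_n = n + pad_left + pad_right;
    q.parallel_for(sycl::range<1>{(size_t)out_n}, [=](sycl::id<1> id) {
        long j = (long)id[0] - pad_left;           // position in source coords
        if (j < 0)       j = -j;                   // reflect off the left edge
        else if (j >= n) j = 2 * (n - 1) - j;      // reflect off the right edge
        dst[id[0]] = src[j];
    }).wait();
}
```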
Study the Attention mechanism as described in “Attention Is All You Need” (2017).
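For reference, the scaled dot-product attention defined in that paper, where d_k is the key dimension:

\[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
\]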
Study FlashAttention from the paper “FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness” (2022).
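The core idea is the online softmax: a running maximum m, normalizer l, and unnormalized output o are updated one key/value tile at a time, so the full score matrix never materializes. Per tile t with scaled scores s_i and values v_i:

\[
m^{(t)} = \max\!\bigl(m^{(t-1)},\, \max_i s_i\bigr), \qquad
l^{(t)} = e^{\,m^{(t-1)} - m^{(t)}}\, l^{(t-1)} + \sum_i e^{\,s_i - m^{(t)}},
\]
\[
o^{(t)} = e^{\,m^{(t-1)} - m^{(t)}}\, o^{(t-1)} + \sum_i e^{\,s_i - m^{(t)}}\, v_i,
\qquad O = o^{(T)} / l^{(T)}.
\]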
Implement Flash Attention in llama.cpp for the SYCL backend, addressing development challenges and managing trade-offs intelligently to achieve maximum performance. PR-LINK: Implement FLASH_ATTN for SYCL backend
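A scalar reference sketch of that tiled loop for a single query (illustrative C++, not the SYCL kernel from the PR; K and V are row-major [n_kv][d]):

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

std::vector<float> flash_attn_1q(const std::vector<float> &q_vec,
                                 const std::vector<float> &K,
                                 const std::vector<float> &V,
                                 size_t n_kv, size_t d, size_t tile = 64) {
    const float scale = 1.0f / std::sqrt((float)d);
    float m = -std::numeric_limits<float>::infinity();  // running max
    float l = 0.0f;                                     // running normalizer
    std::vector<float> o(d, 0.0f);                      // unnormalized output

    for (size_t t0 = 0; t0 < n_kv; t0 += tile) {
        const size_t t1 = std::min(t0 + tile, n_kv);
        // Scaled scores for this tile and the new running max.
        float m_new = m;
        std::vector<float> s(t1 - t0);
        for (size_t j = t0; j < t1; ++j) {
            float dot = 0.0f;
            for (size_t k = 0; k < d; ++k) dot += q_vec[k] * K[j * d + k];
            s[j - t0] = dot * scale;
            m_new = std::max(m_new, s[j - t0]);
        }
        // Rescale previous partial results, then fold in this tile.
        const float corr = std::exp(m - m_new);
        l *= corr;
        for (size_t k = 0; k < d; ++k) o[k] *= corr;
        for (size_t j = t0; j < t1; ++j) {
            const float p = std::exp(s[j - t0] - m_new);
            l += p;
            for (size_t k = 0; k < d; ++k) o[k] += p * V[j * d + k];
        }
        m = m_new;
    }
    for (size_t k = 0; k < d; ++k) o[k] /= l;           // final normalization
    return o;
}
```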
Measure, benchmark, and optimize the Flash Attention implementation, iterating toward improved throughput and efficiency.
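A sketch of the measurement approach, assuming SYCL event profiling (timestamps are device-side, so host launch overhead is excluded; the kernel body is a placeholder):

```cpp
#include <sycl/sycl.hpp>
#include <algorithm>
#include <cstdio>

int main() {
    sycl::queue q{sycl::default_selector_v,
                  sycl::property::queue::enable_profiling{}};
    float *data = sycl::malloc_device<float>(1 << 20, q);

    double best_ms = 1e30;
    for (int iter = 0; iter < 10; ++iter) {             // warm-up + repeats
        sycl::event e = q.parallel_for(sycl::range<1>{1 << 20},
                                       [=](sycl::id<1> i) {
            data[i] = (float)i[0] * 0.5f;               // placeholder work
        });
        e.wait();
        const auto t0 = e.get_profiling_info<
            sycl::info::event_profiling::command_start>();
        const auto t1 = e.get_profiling_info<
            sycl::info::event_profiling::command_end>();
        best_ms = std::min(best_ms, (t1 - t0) * 1e-6);  // ns -> ms
    }
    std::printf("best kernel time: %.3f ms\n", best_ms);
    sycl::free(data, q);
    return 0;
}
```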

Java project: an inquiry-management system, with hands-on experience designing and managing multi-threaded workflows.