
Bat sheva S.

GitHub

Bio

Software developer with strong analytical thinking, problem-solving skills, and a high sense of ownership. Experienced in performance optimization, GPU computing with SYCL, and C/C++. Known for quickly mastering new technologies and for effective teamwork and communication in collaborative environments.

Skills

C++
C
CI/CD
CUDA
GPU
SYCL
DPC++
HPC
parallel programming
LLMs
oneAPI
Linux
WSL
Git

Bootcamp Project

NextOptAI

AI-powered optimization engine for next-generation computing

Next Silicon

Mentored by: Next Silicon

Embedded Systems Bootcamp 2025 (Embedded)

Responsibilities:

  • Explored core Machine Learning and Deep Learning principles, with a strong emphasis on Large Language Models (LLMs).

  • Gained proficiency in SYCL, focusing on writing scalable, efficient parallel code that maximizes GPU utilization.

  • Developed full CPU and SYCL GPU support for new numerical operators (FLOOR, CEIL, ROUND, TRUNC): implemented kernels, wired them into the GGML operator system, and integrated them into the llama.cpp graph builder, with validation on both CPU and GPU hardware (a minimal kernel sketch follows this list). PR-LINK: Implement operators in CPU backend; PR-LINK: Implement operators in SYCL backend

  • Optimized these operators at the kernel level and ran controlled before/after benchmarks, achieving significant speedups.

  • Opened and maintained pull requests to upstream ggml/llama.cpp, addressing reviewer feedback, refining code design, updating documentation and tests, ensuring CI stability, and successfully merging the new operators into the main codebase.

  • Studied the Attention mechanism as described in “Attention Is All You Need” (2017).

  • Studied FlashAttention from the paper “FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness” (2022).

  • Implemented a SYCL-based FlashAttention backend in llama.cpp, handling integration challenges and design trade-offs to deliver high-performance attention on GPU. The implementation includes efficient matrix multiplication with matrix transposition for better memory access patterns, explicit staging of data in on-chip SRAM, and custom implementations of Softmax and value normalization within the kernel (a simplified sketch of this tiling and online-softmax structure follows this list). Profiled, benchmarked, and refined the FlashAttention path, iteratively tuning it for higher throughput, lower latency, and improved overall efficiency. PR-LINK: Implement FLASH_ATTN for SYCL backend
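
The sketch below illustrates the kind of element-wise SYCL kernel described in the FLOOR/CEIL/ROUND/TRUNC bullet, using FLOOR as the example. It is a minimal sketch, not the actual ggml/llama.cpp code: the function name floor_sycl, the contiguous float layout, and the direct launch from main are assumptions made for clarity; in the real operator system such a kernel is wired into the backend's dispatch path rather than called directly.

// Minimal SYCL sketch of an element-wise FLOOR operator (illustrative only).
#include <sycl/sycl.hpp>
#include <iostream>

// Launch one work-item per element; each applies sycl::floor to its element.
void floor_sycl(sycl::queue &q, const float *src, float *dst, size_t n) {
    q.submit([&](sycl::handler &h) {
        h.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
            dst[i] = sycl::floor(src[i]);
        });
    }).wait();
}

int main() {
    sycl::queue q;
    const size_t n = 8;
    // Shared USM so both host and device can touch the buffers.
    float *src = sycl::malloc_shared<float>(n, q);
    float *dst = sycl::malloc_shared<float>(n, q);
    for (size_t i = 0; i < n; ++i) src[i] = 0.5f * float(i) - 1.75f;
    floor_sycl(q, src, dst, n);
    for (size_t i = 0; i < n; ++i)
        std::cout << src[i] << " -> " << dst[i] << "\n";
    sycl::free(src, q);
    sycl::free(dst, q);
    return 0;
}

CEIL, ROUND, and TRUNC follow the same pattern with sycl::ceil, sycl::round, and sycl::trunc; as described above, the bulk of the upstreaming work was wiring such kernels into the GGML operator system and validating them on both CPU and GPU backends.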

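The following is a simplified, illustrative SYCL sketch of the FlashAttention-style structure described in the last bullet: key/value tiles are staged in work-group local memory (the on-chip SRAM staging mentioned above) and the softmax is computed online inside the kernel, so the full N x N score matrix is never materialized. This is not the llama.cpp SYCL backend implementation; the single-head layout, the tile sizes, and all names are assumptions made for illustration.

// Simplified FlashAttention-style attention in SYCL (illustrative only).
// Computes O = softmax(Q K^T / sqrt(D)) V for a single head, row-major [N x D] tensors.
#include <sycl/sycl.hpp>
#include <cmath>

constexpr int D  = 64;  // head dimension (assumed)
constexpr int BR = 32;  // query rows per work-group
constexpr int BC = 32;  // key/value rows per tile

// Q, K, V, O are USM pointers; one work-item owns one query row.
void flash_attention(sycl::queue &q, const float *Q, const float *K,
                     const float *V, float *O, int N) {
    const float scale = 1.0f / std::sqrt(float(D));
    q.submit([&](sycl::handler &h) {
        sycl::local_accessor<float, 2> Kt(sycl::range<2>(BC, D), h);  // staged key tile
        sycl::local_accessor<float, 2> Vt(sycl::range<2>(BC, D), h);  // staged value tile
        h.parallel_for(
            sycl::nd_range<1>(sycl::range<1>((N + BR - 1) / BR * BR),
                              sycl::range<1>(BR)),
            [=](sycl::nd_item<1> it) {
                const int row = int(it.get_global_id(0));  // query row owned by this item
                const int lid = int(it.get_local_id(0));
                float acc[D] = {0.0f};                     // output accumulator
                float m = -INFINITY, l = 0.0f;             // running max and running sum

                for (int k0 = 0; k0 < N; k0 += BC) {
                    const int tile = (N - k0 < BC) ? (N - k0) : BC;
                    // Cooperatively stage this K/V tile into local memory ("SRAM").
                    for (int idx = lid; idx < tile * D; idx += BR) {
                        Kt[idx / D][idx % D] = K[(k0 + idx / D) * D + idx % D];
                        Vt[idx / D][idx % D] = V[(k0 + idx / D) * D + idx % D];
                    }
                    sycl::group_barrier(it.get_group());

                    if (row < N) {
                        for (int j = 0; j < tile; ++j) {
                            // Scaled dot-product score for this query/key pair.
                            float s = 0.0f;
                            for (int d = 0; d < D; ++d)
                                s += Q[row * D + d] * Kt[j][d];
                            s *= scale;
                            // Online softmax: rescale previous state to the new max.
                            const float m_new = sycl::fmax(m, s);
                            const float corr  = sycl::exp(m - m_new);
                            const float p     = sycl::exp(s - m_new);
                            l = l * corr + p;
                            for (int d = 0; d < D; ++d)
                                acc[d] = acc[d] * corr + p * Vt[j][d];
                            m = m_new;
                        }
                    }
                    sycl::group_barrier(it.get_group());
                }
                if (row < N)
                    for (int d = 0; d < D; ++d)
                        O[row * D + d] = acc[d] / l;  // final softmax normalization
            });
    }).wait();
}

Q, K, V, and O are assumed to be USM allocations (for example from sycl::malloc_shared). Because the running max m and running sum l are updated incrementally per tile, the kernel only ever holds one BC x D key tile and one BC x D value tile on chip, which is what makes the approach memory-efficient.
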
Bat sheva S. - Task Preview

Additional Projects

Built a subscription-management website for a fuel company using a three-tier architecture: a SQL Server database, a C# (.NET) backend, and a React frontend, with robust data-management practices throughout.

English Level

Working Proficiency