Dvora G.

Bio

Software Engineering Associate specializing in AI, with strong logical reasoning, advanced analytical skills, and fast, creative thinking. Handles complex tasks proactively and responsibly, and works well in a team. Self-taught and quick to adapt to new technologies, consistently meeting deadlines while striving for excellence.

Skills

Python

C++

Docker

Redis

FastAPI

CUDA

vLLM

KVrocks

RocksDB

GPU Computing

HBM

KV-Cache Offloading

REST APIs

Linux

Git

Multi-Threading

Profiling

Benchmarking

DRAM/SSD Performance Analysis

Bootcamp Project

Hyperconverged KV-Cache Offloading for Cost-Efficient LLM Inference

Datacenter-scale LLM inference framework that offloads KV-cache to a hyperconverged KV-store, increasing capacity and robustness while keeping high hit-rates and good user experience.

Mentored by: Pliops

Data Science Bootcamp 2025 (Data)

Responsibilities:

Research and Familiarization with vLLM: Conducted an in-depth study of the vLLM architecture, execution flow, and inference pipeline. Executed multiple LLM inference runs using vLLM to gain hands-on operational understanding.
Research and Evaluation of KVRocks: Investigated the KVRocks storage engine, its design goals, architecture, and usage scenarios. Evaluated its suitability as an external KV-store for inference KV-cache offloading.
Baseline vLLM Deployment on NVIDIA L4 GPUs: Deployed the open-source vLLM framework on NVIDIA L4 GPUs to establish a baseline environment. Executed Llama-3-8B inference workloads to measure initial performance characteristics.
Concurrency Limit Analysis Based on GPU HBM Saturation: Calculated precise concurrency limits by analyzing HBM saturation points relative to token size and KV-cache footprint. Derived the maximum number of concurrent clients supported per server.
Integration Between vLLM and the Pliops Gateway: Integrated vLLM with the Pliops Gateway (connector) for accelerated KV access. Validated the full end-to-end inference flow through the gateway.
Implementation of a New Storage Backend for KVRocks in the Gateway: Developed and integrated a new KVRocks storage backend within the Pliops Gateway. Implemented full CRUD support for direct interaction with the KV-store.
Single-Threaded Multi-Get / Multi-Set Implementation: Implemented a single-threaded Multi-Get / Multi-Set access pattern to KVRocks. Evaluated correctness, stability, and latency impact of batched operations.
Multi-Threaded Get/Set Integration via the Gateway: Implemented a multi-threaded Get/Set access pattern with multiple concurrent clients. Managed connection handling, synchronization, and system stability under load.
Performance Benchmarking and Comparative Storage Analysis: Executed performance benchmarks comparing vLLM + Pliops Gateway + KVRocks against vLLM with DRAM-based LM-Cache. Synthesized benchmark data into comparative DRAM vs. SSD graphs, plotting RPS/GPU vs. TPS/USER to illustrate cost–performance trade-offs.
Bottleneck Analysis and Performance Optimization Insights: Analyzed logs and performance metrics to identify I/O, networking, and memory bottlenecks. Derived initial optimization insights and directions for future improvement.
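
The concurrency limit analysis listed above can be illustrated with a back-of-the-envelope calculation. The sketch below is not the project's actual method or its measured numbers. It assumes the published Llama-3-8B shape (32 layers, 8 KV heads, head dimension 128) with a 16-bit KV-cache, and uses hypothetical values for GPU memory (24 GiB), weight footprint (about 16 GB), usable memory fraction (0.9), and context length per client (4096 tokens). The function names are illustrative.

```python
def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes=2):
    """KV-cache bytes needed to hold one token across all layers."""
    # Factor of 2: each layer stores one key and one value vector per KV head.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def max_concurrent_clients(gpu_mem_bytes, weight_bytes, tokens_per_client,
                           bytes_per_token, usable_fraction=0.9):
    """Upper bound on clients whose full context fits in GPU memory at once."""
    # Memory left for the KV-cache after reserving headroom and model weights.
    budget = int(gpu_mem_bytes * usable_fraction) - weight_bytes
    if budget <= 0:
        return 0
    return budget // (tokens_per_client * bytes_per_token)


# Assumed model shape: Llama-3-8B with a 16-bit KV-cache.
per_token = kv_bytes_per_token(num_layers=32, num_kv_heads=8, head_dim=128)

# Hypothetical deployment figures, chosen only for illustration.
clients = max_concurrent_clients(
    gpu_mem_bytes=24 * 1024**3,   # assumed 24 GiB of GPU memory
    weight_bytes=16 * 10**9,      # assumed ~16 GB of 16-bit weights
    tokens_per_client=4096,       # assumed context length per client
    bytes_per_token=per_token,
)

print(per_token)  # 131072 bytes, i.e. 128 KiB per token
print(clients)    # 13
```

Under these assumptions only about a dozen full-length contexts fit in GPU memory at once, which is the kind of ceiling that motivates offloading KV-cache to an external store.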


Additional Projects

Online Survey Platform | Node.js & React (Deployed on AWS) | https://github.com/d7080120/reactnodeproject
Full-stack development of a survey platform featuring an intuitive admin panel, wide functionality, and a user-friendly interface. Technologies: React (client-side), Node.js (server-side), MongoDB (database).

E-Commerce Website | C# .NET & JavaScript | https://github.com/d7080120/WebApi_BabyProductShop
Developed an e-commerce website for managing online transactions. Backend in .NET with a three-layer architecture, Swagger integration, DTOs, and SQL database connection. Frontend implemented in JavaScript for customer interaction.

English Level

Fluent