Sara T.

Bio

Proactive software engineer and knowledge management specialist, with hands-on experience in AI systems. Combines analytical precision, independent thinking, collaborative skills, and a strong drive for excellence.

Skills

Python

CUDA

PyTorch

vLLM

Mooncake

Bootcamp Project

Hyperconverged KV-Cache Offloading for Cost-Efficient LLM Inference

Datacenter-scale LLM inference framework that offloads KV-cache to a hyperconverged KV-store, increasing capacity and robustness while keeping high hit-rates and good user experience.

Mentored by: Pliops

Data Science Bootcamp 2025 (Data)

Responsibilities:

Hyper-Converged KV-Cache Offloading: As part of a research-driven team project, each member was assigned a specific topic to investigate, build, and benchmark. My focus was on running Mooncake and analyzing how it manages memory retention from vLLM within a hyper-converged design. This included hands-on work on KV-cache optimization with key-value storage and evaluating system behavior under different inference loads.
vLLM Research & System Architecture Analysis: Conducted in-depth research into vLLM and its execution flow, as well as a detailed study of Mooncake's architecture, to fully understand system behavior, request flow, and pipeline dynamics.
Set up and ran the open-source Mooncake project on a Pliops server, including build, dependency installation, and runtime validation. Handled various deployment challenges, including crashes, server-specific build adjustments, and a CUDA version incompatible with Mooncake's requirements. Requested installation of the appropriate CUDA version, and later resolved a Python version mismatch by switching to a compatible interpreter.
Assisted a Pliops employee in analyzing high DRAM usage (~1 GB for 2 keys) when running models on Mooncake. Replicated the test on a different server using a different model and script, because the benchmark he used exceeded the GPU resources available on my server. My test consumed 380 MB for 2 keys.
Checked how many tokens are stored per request and the memory cost per token. After adding debug instrumentation, found that my run did not cause rapid DRAM growth: memory usage matched the token count multiplied by the per-token cost.

Additional Projects

Backend Development with Node.js and MongoDB

Built a backend server using Node.js
Used Axios in the backend to fetch data from external APIs
Collaborated in a team using Git for version control
Used MongoDB for database operations

Subscription Management Platform for Print House

Built a full-stack subscription management system with C# (.NET) backend (three-tier architecture) and React/Redux frontend
Developed order processing, package management, and purchase flow logic
Designed a relational SQL database with multi-table structure and processing logic
Ensured solid UX, accurate workflows, and modular, maintainable code

English Level

Working Proficiency