© 2025 ExtraTech Bootcamps. All rights reserved.

Sara T.

Bio

Proactive software engineer and knowledge management specialist, with hands-on experience in AI systems. Combines analytical precision, independent thinking, collaborative skills, and a strong drive for excellence.

Skills

Python
CUDA
PyTorch
vLLM
Mooncake

Bootcamp Project

Hyperconverged KV-Cache Offloading for Cost-Efficient LLM Inference

Datacenter-scale LLM inference framework that offloads KV-cache to a hyperconverged KV-store, increasing capacity and robustness while keeping high hit-rates and good user experience.

Mentored by: Pliops

Data Science Bootcamp 2025 (Data)

Responsibilities:

  • Hyper-Converged KV-Cache Offloading: As part of a research-driven team project, each member was assigned a specific topic to investigate, build, and benchmark. My focus was on running Mooncake and analyzing how it manages memory retention for vLLM within a hyper-converged design. This included hands-on work with KV-cache optimization for key-value storage and evaluating system behavior under different inference loads.

  • vLLM Research & System Architecture Analysis: Conducted in-depth research into vLLM and its execution flow, as well as a detailed study of Mooncake's architecture, to understand system behavior, request flow, and pipeline dynamics.

  • Set up and ran the open-source Mooncake project end to end on a Pliops server, including build, dependency installation, and runtime validation. Handled deployment challenges including crashes, server-specific build adjustments, and a CUDA version incompatible with Mooncake's requirements. Requested installation of the appropriate CUDA version, and later resolved a Python version mismatch by switching to a compatible interpreter.

  • Assisted a Pliops employee in analyzing high DRAM usage (~1 GB for 2 keys) when running models on Mooncake. Because the benchmark he used exceeded the GPU resources available on my server, replicated the test on a different server with a different model and script. My test consumed 380 MB for 2 keys.

  • Checked how many tokens are stored per request and the memory cost per token. After adding debugging output, found that my run did not cause rapid DRAM growth: memory usage matched the token count multiplied by the per-token cost.
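
The check described in the last bullet (measured memory compared against token count times per-token cost) can be sketched as follows. This is a minimal illustration, not the code used in the project. The per-token formula is the common KV-cache size estimate for a transformer (one key and one value vector per layer). The class name, function names, and tolerance are hypothetical, and any model dimensions passed in are placeholders rather than values from the actual runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KVCacheShape:
    """Hypothetical description of a model's KV-cache layout."""
    num_layers: int
    num_kv_heads: int
    head_dim: int
    dtype_bytes: int = 2  # assumption: fp16/bf16 cache entries

    def bytes_per_token(self) -> int:
        # Factor 2: each layer stores one key vector and one value vector per token.
        return 2 * self.num_layers * self.num_kv_heads * self.head_dim * self.dtype_bytes


def expected_kv_bytes(shape: KVCacheShape, num_tokens: int) -> int:
    """Expected KV-cache footprint: token count times per-token cost."""
    return shape.bytes_per_token() * num_tokens


def within_tolerance(measured_bytes: int, expected_bytes: int, rel_tol: float = 0.10) -> bool:
    """True if the measured usage is within rel_tol of the expected footprint."""
    if expected_bytes == 0:
        return measured_bytes == 0
    return abs(measured_bytes - expected_bytes) / expected_bytes <= rel_tol
```

A measurement that falls far outside the tolerance would suggest memory being held beyond the cached tokens themselves (for example allocator overhead or duplicated entries), which is the kind of discrepancy the investigation was looking for.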

Additional Projects

Backend Development with Node.js and MongoDB

  • Built a backend server using Node.js
  • Used Axios in the backend to fetch data from external APIs
  • Collaborated in a team using Git for version control
  • Used MongoDB for database operations

Subscription Management Platform for Print House

  • Built a full-stack subscription management system with a C# (.NET) backend (three-tier architecture) and a React/Redux frontend

  • Developed order processing, package management, and purchase flow logic
  • Designed a relational SQL database with multi-table structure and processing logic
  • Ensured solid UX, accurate workflows, and modular, maintainable code

English Level

Working Proficiency