vLLM is an inference and serving engine for large language models (LLMs). In versions prior to 0.9.0, when a new prompt is processed and the PagedAttention mechanism finds a matching prefix chunk in its cache, the prefill step completes faster, which shows up as a shorter TTFT (Time to First Token). These timing differences are large enough to be detected and exploited. The issue has been patched in version 0.9.0.
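To show why a matching prefix shortens prefill, here is a toy model of block-level prefix caching. It is a simplified sketch, not vLLM's code: the class name `ToyPrefixCache`, the block size of 4, and the use of Python's `hash` for chained block keys are all illustrative assumptions. Only full blocks are cached, and each block's key depends on every token before it, so reuse stops at the first mismatching block. It counts how much prefill work is needed; it does not measure timing.

```python
class ToyPrefixCache:
    """Hypothetical, simplified model of block-level prefix caching (not vLLM's implementation)."""

    def __init__(self, block_size: int = 4) -> None:
        self.block_size = block_size
        self.cached: set[int] = set()

    def _block_keys(self, tokens: list[int]) -> list[int]:
        """Chain-hash each full block so a key covers the whole prefix up to that block."""
        keys = []
        parent = None
        full_len = (len(tokens) // self.block_size) * self.block_size
        for start in range(0, full_len, self.block_size):
            block = tuple(tokens[start:start + self.block_size])
            parent = hash((parent, block))
            keys.append(parent)
        return keys

    def prefill(self, tokens: list[int]) -> int:
        """Return how many tokens still need prefill computation, then cache this prompt's blocks."""
        keys = self._block_keys(tokens)
        hits = 0
        for key in keys:
            if key not in self.cached:
                break  # reuse stops at the first block that is not cached
            hits += 1
        self.cached.update(keys)
        return len(tokens) - hits * self.block_size


cache = ToyPrefixCache(block_size=4)
shared = [1, 2, 3, 4, 5, 6, 7, 8]
print(cache.prefill(shared + [9, 10]))   # 10: cold cache, every token is computed
print(cache.prefill(shared + [11, 12]))  # 2: the two shared blocks are reused
```

Fewer tokens to compute means a shorter prefill and therefore a lower TTFT, which is the observable difference the description above refers to.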
This vulnerability carries a LOW severity rating with a CVSS v3.1 base score of 2.6. It can be exploited remotely over the network, but only under specific conditions: the attacker needs low-level privileges, and user interaction is required. The impact is limited to data confidentiality on affected systems. The issue affects one vllm product; organizations running it should prioritize assessment and patching.
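As a first assessment step, the installed vLLM version can be compared against the fixed release 0.9.0. The sketch below assumes the package is installed under the distribution name `vllm`; the helper names `parse_release` and `is_vulnerable` are hypothetical. It only compares the numeric release segment, so a pre-release such as `0.9.0rc1` is treated as 0.9.0; for full PEP 440 semantics, `packaging.version.Version` is the more robust choice.

```python
import re
from importlib.metadata import PackageNotFoundError, version

PATCHED = (0, 9, 0)  # first release containing the fix, per this record


def parse_release(v: str) -> tuple[int, ...]:
    """Extract the leading numeric release segment, e.g. '0.8.5.post1' -> (0, 8, 5)."""
    m = re.match(r"\d+(?:\.\d+)*", v)
    if not m:
        raise ValueError(f"unrecognized version: {v!r}")
    return tuple(int(p) for p in m.group(0).split("."))


def is_vulnerable(v: str) -> bool:
    """True if the release segment is older than 0.9.0 (pads short versions like '0.9')."""
    release = (parse_release(v) + (0, 0, 0))[:3]
    return release < PATCHED


if __name__ == "__main__":
    try:
        installed = version("vllm")
    except PackageNotFoundError:
        print("vllm is not installed")
    else:
        status = "VULNERABLE (upgrade to >= 0.9.0)" if is_vulnerable(installed) else "patched"
        print(f"vllm {installed}: {status}")
```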
Reported in 2025, this vulnerability emerged during an era marked by increased sophistication in supply chain attacks, cloud infrastructure vulnerabilities, and software-as-a-service (SaaS) security challenges. Security practices during this period emphasized zero-trust architectures, container security, and API protection.
Published: 2025-05-29T17:15:21.327
Last modified: 2025-06-24T18:25:31.883
NVD status: Analyzed
CVSSv3.1: 2.6 (LOW)
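The 2.6 score can be reproduced with the CVSS v3.1 base-score formula. This record does not list the vector string; the vector used below, `AV:N/AC:H/PR:L/UI:R/S:U/C:L/I:N/A:N`, is an assumption reconstructed from the prose above (network access, specific conditions as high attack complexity, low privileges, user interaction required, limited confidentiality impact, no integrity or availability impact mentioned). The sketch handles Scope:Unchanged vectors only.

```python
import math

# CVSS v3.1 metric weights (Scope:Unchanged subset only)
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR = {"N": 0.85, "L": 0.62, "H": 0.27}  # Privileges Required weights when Scope is Unchanged
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(x: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal value >= x, tolerant of float error."""
    i = round(x * 100000)
    if i % 10000 == 0:
        return i / 100000.0
    return (math.floor(i / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score for a 'CVSS:3.1/...' vector with Scope:Unchanged."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise NotImplementedError("this sketch handles Scope:Unchanged only")
    iss = 1 - (1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[metrics["AV"]] * AC[metrics["AC"]] * PR[metrics["PR"]] * UI[metrics["UI"]]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:L/I:N/A:N"))  # 2.6
```

Here the impact subscore is 6.42 × 0.22 ≈ 1.41 and exploitability is 8.22 × 0.85 × 0.44 × 0.62 × 0.62 ≈ 1.18; their sum, about 2.59, rounds up to 2.6.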
SecUtils normalizes and enriches National Vulnerability Database (NVD) records by standardizing vendor and product identifiers, aggregating vulnerability metadata from both NVD and MITRE sources, and providing structured context for security teams. For vllm's affected products, we extract Common Platform Enumeration (CPE) data, Common Weakness Enumeration (CWE) classifications, CVSS severity metrics, and reference data to enable rapid vulnerability prioritization and asset correlation. This record contains no exploit code, proof-of-concept instructions, or attack methodologies—only defensive intelligence necessary for patch management, risk assessment, and security operations.