### Summary Xgrammar includes a cache for compiled grammars to increase performance with repeated use of the same grammar. This cache is held in memory. Since the cache is unbounded, a system making use of xgrammar can be abused to fill up a host's memory and case a denial of service. For example, sending many small requests to an LLM inference server with unique JSON schemas would eventually cause this denial of service to occur. ### Details The fix is to add a limit to the cache size. This was done in https://github.com/mlc-ai/xgrammar/pull/243 An example of making use of the new cache size limit can be found in vLLM here: https://github.com/vllm-project/vllm/pull/16283 ### Impact Any system making use of Xgrammar and taking requests as input from potentially untrusted parties would be vulnerable to this denial of service issue.