Scenario Description
The gateway optimizes performance through:
Precision Caching: Reduces token consumption and latency by caching responses to repeated or highly similar queries (see the sketch after this list).
Semantic Context Caching: Stores LLM response contexts in memory and automatically injects a user's historical dialogue into subsequent prompts to strengthen contextual understanding.
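The precision-caching flow can be sketched roughly as follows. This is a minimal, illustrative example rather than the gateway plugin's actual API: the in-process dictionary stands in for the gateway's in-memory database, and handle_request and call_llm are hypothetical names.

```python
import hashlib

# Hypothetical in-process store standing in for the gateway's in-memory
# database; maps a normalized-query hash to the cached LLM response.
_response_cache: dict[str, str] = {}

def _cache_key(prompt: str) -> str:
    """Normalize the prompt and hash it so repeated queries map to one key."""
    normalized = " ".join(prompt.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def handle_request(prompt: str, call_llm) -> str:
    """Serve repeated queries from the cache; otherwise call the LLM once
    and store the answer, saving both latency and token spend."""
    key = _cache_key(prompt)
    if key in _response_cache:      # cache hit: no upstream LLM call
        return _response_cache[key]
    response = call_llm(prompt)     # cache miss: pay the token cost once
    _response_cache[key] = response
    return response
```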
Practice Description
Through gateway plugins, the AI gateway reduces inference latency and cost by caching LLM responses in an in-memory database. Each user's dialogue history is stored automatically at the gateway layer and injected into the context of subsequent conversations, improving the model's grasp of the conversational semantics.
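A minimal sketch of this context-caching behavior, assuming an in-process store in place of the gateway's in-memory database; handle_chat, call_llm, and the message format are hypothetical names used only to illustrate injecting stored history into subsequent prompts.

```python
from collections import defaultdict, deque

# Hypothetical per-user history store standing in for the gateway's
# in-memory database; keeps the most recent messages for each user_id.
MAX_HISTORY = 20  # user + assistant messages retained per user
_dialogue_history: dict[str, deque] = defaultdict(lambda: deque(maxlen=MAX_HISTORY))

def handle_chat(user_id: str, user_message: str, call_llm) -> str:
    """Prepend the user's cached dialogue history to the new message,
    call the model, then store the new turn for the next request."""
    history = list(_dialogue_history[user_id])
    messages = history + [{"role": "user", "content": user_message}]
    reply = call_llm(messages)  # upstream LLM call with enriched context
    _dialogue_history[user_id].append({"role": "user", "content": user_message})
    _dialogue_history[user_id].append({"role": "assistant", "content": reply})
    return reply
```

Because the history lives at the gateway layer, clients can send single-turn requests while the model still receives multi-turn context.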