Higress AI Gateway Development Challenge Participation Guide
Cheng Tan | Oct 14, 2025
1. Core Interpretation of the Competition Tech Stack
1.1. Extending Higress: An Introduction to Go and Wasm Plugins
Plugins are the core mechanism for injecting intelligence into the Higress data plane. Essentially all of this event's competition topics are implemented by writing or using plugins.
WebAssembly (Wasm): A Secure, Cross-Language Sandbox Technology
WebAssembly (Wasm) is a portable binary instruction format running in a secure sandbox environment. It allows code written in various languages such as Go, Rust, C++, etc., to run securely in host applications like Envoy/Higress.
Proxy-Wasm Specification: This is a standard Application Binary Interface (ABI) developed for proxy environments, defining how Wasm modules interact with the underlying network capabilities of the proxy (such as reading and writing Headers, and Body).
Language SDK: To simplify development, the community provides SDKs for specific languages, such as https://github.com/higress-group/wasm-go. The SDK encapsulates the underlying ABI details, allowing developers to write plugin logic in a manner consistent with their language's idioms. Usage of this SDK is documented at https://higress.cn/docs/latest/user/wasm-go
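For orientation, here is a minimal request-header plugin skeleton following the pattern shown in the SDK documentation linked above. The plugin name, config field, and injected header are placeholders, the wrapper import path follows the older documented examples, and exact function signatures can differ between SDK versions, so treat this as an illustrative sketch rather than a drop-in implementation:

```go
package main

import (
	"github.com/alibaba/higress/plugins/wasm-go/pkg/wrapper"
	"github.com/higress-group/proxy-wasm-go-sdk/proxywasm"
	"github.com/higress-group/proxy-wasm-go-sdk/proxywasm/types"
	"github.com/tidwall/gjson"
)

// HelloConfig holds the plugin configuration parsed from the route/global config.
type HelloConfig struct {
	greeting string
}

func main() {
	wrapper.SetCtx(
		"hello-plugin", // plugin name (placeholder)
		wrapper.ParseConfigBy(parseConfig),
		wrapper.ProcessRequestHeadersBy(onHttpRequestHeaders),
	)
}

// parseConfig reads the plugin's JSON/YAML configuration.
func parseConfig(json gjson.Result, config *HelloConfig, log wrapper.Log) error {
	config.greeting = json.Get("greeting").String()
	return nil
}

// onHttpRequestHeaders runs in the request-header phase and injects a
// custom header before the request is forwarded upstream.
func onHttpRequestHeaders(ctx wrapper.HttpContext, config HelloConfig, log wrapper.Log) types.Action {
	proxywasm.AddHttpRequestHeader("x-greeting", config.greeting)
	return types.ActionContinue
}
```

The resulting module is compiled to a .wasm file (typically with TinyGo, per the SDK docs) and mounted on a Higress route via the plugin configuration.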
Native Go Plugins: High Performance and Development Simplicity
In addition to Wasm, Higress also supports developing plugins in native Go. Compared with the security sandbox and multi-language support provided by Wasm, native Go plugins offer a simpler development process and direct access to host capabilities in certain scenarios. In particular, for Topic Three (Intelligent Routing), participants may freely choose any plugin extension method, including Wasm or native Go plugins, to implement their solution.
1.2. Language of the Agent: MCP
Beyond Traditional APIs
With the rise of AI Agents, traditional REST APIs struggle to handle dynamic, stateful, tool-driven interaction patterns. The Model Context Protocol (MCP) was created to address this challenge.
What is MCP?
MCP is an open standard introduced and open-sourced by Anthropic, aimed at connecting AI applications with external tools, data sources, and workflows. MCP can be compared to the USB-C interface of the “AI world,” providing a unified, standardized specification for interconnecting different systems.
Client-Server Architecture
MCP follows the client-server architecture pattern:
MCP Server: A program exposing a series of capabilities (tools) to the AI Agent. For example, an MCP server could provide tools for querying a database or sending emails. In Topic Two, participants will enhance the functionality of an existing RAG MCP server.
MCP Client: Typically embedded in AI applications, responsible for communicating with the server to discover and invoke the tools it provides.
The Role of Higress: Higress can host the MCP server through its plugin mechanism. This brings a key architectural advantage: all calls to the Agent's tools can go through this unified entry, benefiting from enterprise governance capabilities provided by the gateway such as authentication, throttling, observability, etc.
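To make the interaction concrete: MCP messages are carried over JSON-RPC 2.0, and a client invokes a server-side tool with the tools/call method. The sketch below builds such a request in Go; the tool name query_orders and its arguments are hypothetical examples, not tools provided by any particular server:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ToolCallRequest models a JSON-RPC 2.0 request for MCP's "tools/call" method.
type ToolCallRequest struct {
	JSONRPC string         `json:"jsonrpc"`
	ID      int            `json:"id"`
	Method  string         `json:"method"`
	Params  ToolCallParams `json:"params"`
}

// ToolCallParams names the tool to invoke and carries its arguments.
type ToolCallParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

func main() {
	// Hypothetical tool exposed by an MCP server hosted behind Higress.
	req := ToolCallRequest{
		JSONRPC: "2.0",
		ID:      1,
		Method:  "tools/call",
		Params: ToolCallParams{
			Name:      "query_orders",
			Arguments: map[string]any{"customer_id": "C-1024"},
		},
	}
	body, _ := json.MarshalIndent(req, "", "  ")
	fmt.Println(string(body))
}
```

Because such calls pass through the gateway, Higress can apply authentication, throttling, and observability policies to every tool invocation uniformly.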
1.3. HiMarket AI Marketplace
From Project to Product
HiMarket is an “enterprise-level AI capability marketplace and developer ecosystem center.” Its main goal is to package, publish, manage, and operate core AI assets like model APIs, MCP servers, Agent APIs, etc., in a standardized manner, transforming them from an internal project into a discoverable, subscribable “AI product.”
The core components of HiMarket include:
AI Open Platform Management Console: Administrators and operators will package the underlying AI capabilities here into “AI products.”
AI Open Platform Portal: A “storefront” for developers, where they can discover, subscribe to, and test AI products.
AI Gateway (Higress): The underlying engine that carries all AI calls and is responsible for executing policies such as security and flow control.
Publishing Workflow
For teams selecting Topic One, publishing the final results to HiMarket is a critical step. An example of the process is as follows:
Create an “API Product” in the management console.
Associate the product with the Agent routing configured in Higress.
Add complete documentation and usage guidelines for the product.
Publish the configured product to the portal with one click.
After that, other developers can register, create consumers (obtain credentials), and subscribe to use the Agent API on the portal.
2. In-Depth Introduction and Guidelines for the Competition Topics
This competition has three main topic directions, and participants can choose any one to compete. Below are the participation guidelines for each topic direction:
2.1. Topic One: Accelerating AI Agent Development
2.1.1. Vision: Empowering the Agent Development Process
Building high-quality AI Agents is a complex systems-engineering task: it requires orchestrating LLMs, managing external tools through protocols such as MCP, and handling intricate business logic. The ultimate goal of this topic is to create a tool that significantly reduces this complexity by abstracting and simplifying the Agent construction process. As emphasized in the topic guidelines, the tool must play a “critical role” in “lowering development costs” and “enhancing the quality of Agents”; this is the core criterion for measuring the value of a submission.
2.1.2. Task: Create a “Productivity Amplifier”
This topic requires delivering a command-line tool or a web tool with a graphical user interface. It must leverage Higress plugins, LLMs, and MCP capabilities to empower and simplify the Agent creation process.
Some innovative ideas for reference:
MCP Automation Generation Tool: For example, given an HTTP interface or its code repository, the tool reads these materials and automatically generates MCP Server plugin configurations that run on Higress. The existing community project openapi-to-mcpserver (https://github.com/higress-group/openapi-to-mcpserver) can read standard OpenAPI (Swagger) specification documents and generate MCP Server plugin configurations for Higress, and can serve as a working foundation.
Agent-Oriented Intelligent Assembly for MCP: Extend Higress's MCP Server plugin capabilities so that Higress can act as an MCP Hub, intelligently selecting and assembling tools according to the Agent's needs in different scenarios, improving the accuracy of tool calls and reducing costs.
Visual Agent Orchestrator: Create a Web UI that allows developers to drag and drop different MCP tools and prompts into a complete workflow, automatically generating the core logic code for the Agent, implemented through Higress plugins.
2.1.3. Key Point: Publish to HiMarket
Publishing the constructed Agent API to HiMarket is a critical step for this topic.
The specific publishing process is as follows:
Containerize the Agent Service: Package the core logic of the Agent and its MCP server into a Docker image.
Configure Higress: Set up routing rules in Higress to correctly forward external traffic to the Agent MCP server.
Create HiMarket API Product: Log in to the HiMarket management console, define product information, associate it with the Higress routing, and set different usage plans (such as free or paid version).
Draft Usage Guidelines: Write a clear and detailed usage document for the API product in HiMarket. This is key to ensuring that the product can be used smoothly by other developers.
Publish and Test: Publish the product to the portal. Then, you can simulate the end user, complete registration and subscription on the developer portal, and conduct end-to-end testing of the entire process.
2.2. Topic Two: Enhancing RAG Capability
2.2.1. Challenge: From “Simple RAG” to “Advanced RAG”
The basic RAG system has obvious bottlenecks: it may retrieve irrelevant or noisy documents, which can directly lead to factual errors or “hallucinations” in LLM outputs. The mission of this topic is to significantly enhance the performance and robustness of the built-in RAG MCP server in Higress by implementing one or more advanced RAG techniques.
2.2.2. Starting Point: Baseline RAG MCP Server
Code Repository Address:
https://github.com/alibaba/higress/tree/main/plugins/golang-filter/mcp-server/servers/rag
Operation Guide: First, fork this repository as the starting point for the project. Before starting coding, it is recommended to analyze the structure of the existing code in depth, understand the data processing flow, and clarify key processes such as query encoding, retrieval, and generation, to find the best points to inject enhancement logic.
2.2.3. Enhancement Path: In-depth Exploration of Advanced RAG Technologies
The work involves optimizing various stages of the RAG pipeline. Below are three main technical directions; any one or more of them can be chosen for in-depth implementation.
A. Pre-Retrieval Optimization
The core idea of this stage is to optimize the user query before it is sent to the vector database in order to improve the retrieval “hit rate.”
Query Expansion: Using algorithms or LLMs, expand the original query into a richer version that includes synonyms, near-synonyms, or related terms, broadening the retrieval scope.
Query Transformation: Use the LLM's comprehension ability to rewrite vague, colloquial user questions into more structured queries better suited to vector retrieval. Complex questions can also be decomposed into several simpler sub-questions, retrieved separately, and then summarized.
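A minimal sketch of query transformation via an LLM call, assuming an OpenAI-compatible chat-completions endpoint is reachable through the gateway; the endpoint URL, model name, and prompt are illustrative assumptions, not part of the baseline RAG server:

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
)

// The chat-completions endpoint and model name are assumptions; point them
// at whatever LLM service your Higress route exposes.
const (
	llmEndpoint = "http://127.0.0.1:8080/v1/chat/completions"
	llmModel    = "qwen-turbo"
)

type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

type chatResponse struct {
	Choices []struct {
		Message chatMessage `json:"message"`
	} `json:"choices"`
}

// rewriteQuery asks the LLM to turn a vague, colloquial question into a
// concise, keyword-rich query better suited to vector retrieval.
func rewriteQuery(ctx context.Context, userQuery string) (string, error) {
	body, err := json.Marshal(chatRequest{
		Model: llmModel,
		Messages: []chatMessage{
			{Role: "system", Content: "Rewrite the user's question as a concise, keyword-rich search query. Return only the rewritten query."},
			{Role: "user", Content: userQuery},
		},
	})
	if err != nil {
		return "", err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, llmEndpoint, bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	var out chatResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	if len(out.Choices) == 0 {
		return "", fmt.Errorf("empty completion")
	}
	return out.Choices[0].Message.Content, nil
}

func main() {
	q, err := rewriteQuery(context.Background(), "my laptop battery dies super fast, what's wrong?")
	if err != nil {
		panic(err)
	}
	fmt.Println(q)
}
```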
B. Post-Retrieval Optimization
This stage performs “cleaning” and “refining” of the retrieval results after documents have been retrieved from the database and before they are sent into the LLM.
Re-ranking: Initial retrieval (usually using efficient vector similarity computation) aims to ensure “recall rate,” identifying as many relevant documents as possible. However, this may introduce noise. Re-ranking employs a more powerful but computationally costly model (such as a Cross-Encoder) to perform a secondary ranking on the initially retrieved document list, placing the most relevant documents accurately at the top. This step is crucial because research shows that LLMs are very sensitive to the position of information in context (i.e., the “lost in the middle” problem), and putting the most critical information at the beginning can significantly improve generation quality.
Context Compression: To reduce noise and shorten the context length given to the LLM (thus lowering API costs and latency), compress the retrieved documents. This includes filtering out irrelevant sentences and even discarding entire irrelevant documents.
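As an illustration of re-ranking combined with a simple form of context compression, the sketch below re-scores initially retrieved passages with a stronger relevance function, puts the most relevant ones first, and truncates to the top k. The scoreRelevance function is a toy stand-in for a real cross-encoder or LLM-based scorer:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Document is a retrieved chunk together with its current relevance score.
type Document struct {
	Text  string
	Score float64
}

// scoreRelevance is a hypothetical stand-in for a cross-encoder (or LLM)
// relevance model. The toy heuristic counts query terms in the document.
func scoreRelevance(query, doc string) float64 {
	score := 0.0
	for _, term := range strings.Fields(strings.ToLower(query)) {
		if strings.Contains(strings.ToLower(doc), term) {
			score++
		}
	}
	return score
}

// rerank re-scores candidates with the stronger model, sorts them so the most
// relevant passages come first (mitigating "lost in the middle"), and keeps
// only topK passages to shrink the context handed to the LLM.
func rerank(query string, candidates []Document, topK int) []Document {
	for i := range candidates {
		candidates[i].Score = scoreRelevance(query, candidates[i].Text)
	}
	sort.Slice(candidates, func(i, j int) bool {
		return candidates[i].Score > candidates[j].Score
	})
	if len(candidates) > topK {
		candidates = candidates[:topK]
	}
	return candidates
}

func main() {
	docs := []Document{
		{Text: "Higress is a cloud-native API gateway built on Envoy."},
		{Text: "Bananas are rich in potassium."},
		{Text: "Wasm plugins extend the Higress data plane safely."},
	}
	for _, d := range rerank("How do Wasm plugins extend Higress?", docs, 2) {
		fmt.Printf("%.0f  %s\n", d.Score, d.Text)
	}
}
```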
C. Corrective Retrieval
This is a more cutting-edge paradigm that introduces an active verification and self-correction mechanism for the RAG system.
Introduction to CRAG (Corrective Retrieval Augmented Generation): The core of the CRAG method is to introduce a lightweight “retrieval evaluator,” responsible for scoring the relevance of the set of retrieved documents to the original query.
Three corrective actions:
Correct: If the evaluator gives a high confidence score, the system will refine the retrieved documents (e.g., decompose and filter out irrelevant parts) for generation.
Incorrect: If the confidence is low, the system will decisively discard these internal retrieval results and actively trigger web searches to find more accurate information from broader external sources.
Ambiguous: If the confidence is intermediate, the system will adopt a mixed strategy, combining internal document refinement with results from external web searches.
Implementing CRAG will be a major highlight of this project. It involves not only simple optimization but also builds a smarter and more robust RAG architecture, reflecting a profound understanding of RAG system failure modes.
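To make the branching concrete, here is a minimal sketch of the three corrective actions. The evaluator, the confidence thresholds, and the refine/webSearch helpers are hypothetical placeholders; only the control flow follows the CRAG idea described above:

```go
package main

// The thresholds and helper functions below are illustrative assumptions,
// not part of the baseline RAG MCP server.
const (
	highConfidence = 0.7
	lowConfidence  = 0.3
)

// evaluateRelevance stands in for CRAG's lightweight retrieval evaluator,
// which scores how relevant the retrieved documents are to the query.
func evaluateRelevance(query string, docs []string) float64 {
	return 0.5 // replace with a real evaluator; a constant keeps the sketch runnable
}

// refine filters and decomposes documents, keeping only relevant strips.
func refine(query string, docs []string) []string { return docs }

// webSearch fetches external evidence when internal retrieval is judged unreliable.
func webSearch(query string) []string { return nil }

// correctiveRetrieve implements the Correct / Incorrect / Ambiguous branches.
func correctiveRetrieve(query string, retrieved []string) []string {
	score := evaluateRelevance(query, retrieved)
	switch {
	case score >= highConfidence:
		// Correct: trust internal retrieval, but refine it before generation.
		return refine(query, retrieved)
	case score <= lowConfidence:
		// Incorrect: discard internal results and fall back to web search.
		return webSearch(query)
	default:
		// Ambiguous: combine refined internal documents with external evidence.
		return append(refine(query, retrieved), webSearch(query)...)
	}
}

func main() {
	_ = correctiveRetrieve("example question", []string{"doc one", "doc two"})
}
```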
2.2.4. Proof of Improvement: Rigorous Quantitative Evaluation Guidelines
This topic requires providing quantitative evaluation results. Contestants need to choose publicly available datasets (such as HotpotQA, Natural Questions, etc.) and provide quantitative metrics (such as the Precision/Recall of retrieval, the Factuality Score of generation, etc.) to demonstrate that their solution significantly improves the RAG capability compared to existing code.
Choosing Datasets
It is recommended to use publicly available, standardized question-and-answer datasets for evaluation.
HotpotQA: This dataset contains about 113,000 question-and-answer pairs based on Wikipedia. Its defining feature is that questions require “multi-hop reasoning”: information from multiple documents must be combined to answer them. It is well suited for evaluating RAG enhancement strategies that must integrate and reason over multiple information sources.
Natural Questions (NQ): This dataset contains over 300,000 authentic Google search queries, with answers labeled by humans from Wikipedia pages. It provides a diverse question distribution that is closer to real-world application scenarios.
Defining Evaluation Metrics
It is advisable to assess system performance from both “retrieval” and “generation” dimensions.
Retrieval Quality Metrics:
Precision: The proportion of truly relevant documents among all retrieved documents.
Recall: The proportion of relevant documents that were successfully retrieved by the system among all documents that should have been retrieved.
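For reference, both retrieval metrics can be computed per query from the retrieved document IDs and the annotated relevant IDs; a minimal sketch (the ID scheme and example values are placeholders):

```go
package main

import "fmt"

// precisionRecall computes retrieval precision and recall for one query,
// given the IDs of retrieved documents and the IDs of truly relevant ones.
func precisionRecall(retrieved, relevant []string) (precision, recall float64) {
	relevantSet := make(map[string]bool, len(relevant))
	for _, id := range relevant {
		relevantSet[id] = true
	}
	hits := 0
	for _, id := range retrieved {
		if relevantSet[id] {
			hits++
		}
	}
	if len(retrieved) > 0 {
		precision = float64(hits) / float64(len(retrieved))
	}
	if len(relevant) > 0 {
		recall = float64(hits) / float64(len(relevant))
	}
	return precision, recall
}

func main() {
	p, r := precisionRecall([]string{"d1", "d2", "d3"}, []string{"d2", "d4"})
	fmt.Printf("precision=%.2f recall=%.2f\n", p, r)
}
```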
Generation Quality Metrics:
Factuality Score: This is the ultimate standard for measuring the end-to-end quality of the RAG system: it assesses the factual consistency of the final answer generated by the LLM against the ground truth. You can refer to the classification scheme used in common evaluation frameworks (such as promptfoo), which categorizes answer consistency into levels A (subset and consistent), B (superset and consistent), C (completely consistent), D (conflicting), and E (differently worded but factually consistent), and scores accordingly.
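If you adopt a letter-grade rubric like the one above, the judge's grade still has to be converted into a number before it can be averaged across a dataset. The mapping below is only one possible assignment (an assumption; such mappings are typically configurable in evaluation frameworks):

```go
package main

import "fmt"

// factualityScore maps a judge-assigned consistency grade (A-E) to a numeric
// score. The exact values are an assumption and should be tuned or configured.
func factualityScore(grade string) float64 {
	switch grade {
	case "A", "B", "C", "E": // consistent with the ground truth
		return 1.0
	case "D": // conflicts with the ground truth
		return 0.0
	default:
		return 0.0
	}
}

func main() {
	grades := []string{"A", "E", "D", "C"}
	total := 0.0
	for _, g := range grades {
		total += factualityScore(g)
	}
	fmt.Printf("average factuality: %.2f\n", total/float64(len(grades)))
}
```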
2.3. Topic Three: Intelligent Routing
2.3.1. Objective: Intelligent, Efficient, Data-Driven LLM Routing
In real-world LLM applications, not all requests have the same level of complexity. Using a large, expensive model to answer simple questions is a waste of resources; on the other hand, using a small model for complex tasks does not guarantee effectiveness. The mission of this topic is to build an intelligent routing system that can accurately dispatch requests to the most suitable model based on the semantic content of the requests, thereby finding the best balance among cost, latency, and accuracy.
2.3.2. Architectural Inspiration: vLLM Semantic Router
The conceptual model for this topic is derived from the community's excellent open-source project vllm-project/semantic-router. Its core operating mechanism is:
Semantic Classification: Utilize a lightweight classification model (e.g., BERT) to analyze the semantics of incoming requests and classify their intents (e.g., “This is a coding issue,” “This is a math problem,” etc.).
Envoy Integration: The project is implemented through Envoy's ext_proc extension mechanism, which is a direct parallel to building plugins for Higress (which is also based on Envoy).
2.3.3. Core Innovation: Self-Optimizing Data Pipeline
This is the most crucial and innovative requirement of this topic. What needs to be realized is not just a static router but a system capable of continuous self-evolution. The specific requirement is that the gateway, while processing requests, must not only route the request to the predicted best model (e.g., Model A) but also send that request to a baseline model (e.g., Model B).
Data Collection: The plugin must capture a complete, structured data record for each request, formatted as {request, model_A_output, model_B_output, latency, cost}. This high-quality data, containing parallel results, becomes a valuable asset for future iterative training.
Flywheel Effect: This design creates a powerful “flywheel effect.” The initial routing model makes predictions; the system collects more accurate training data through parallel calls; with this new data, a better routing model can be trained; the better model makes more accurate predictions, which in turn yields higher-quality data. This cycle puts the system into a self-optimizing positive loop. It reflects core MLOps and system-design concepts and serves as a key measure of a solution's sophistication.
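A sketch of what such a per-request record and the parallel-call pattern could look like; RouteRecord mirrors the {request, model_A_output, model_B_output, latency, cost} format above, while callModel and the model names are placeholders:

```go
package main

import (
	"sync"
	"time"
)

// RouteRecord is the structured training record captured for each request.
type RouteRecord struct {
	Request      string        `json:"request"`
	ModelAOutput string        `json:"model_a_output"`
	ModelBOutput string        `json:"model_b_output"`
	Latency      time.Duration `json:"latency"`
	Cost         float64       `json:"cost"`
}

// callModel is a placeholder for invoking an upstream LLM through the gateway.
func callModel(model, prompt string) (output string, cost float64) {
	return "stub answer from " + model, 0.0
}

// collect sends the request to the routed model and the baseline in parallel
// and assembles the record that feeds later training rounds.
func collect(prompt, routedModel, baselineModel string) RouteRecord {
	start := time.Now()
	var wg sync.WaitGroup
	var outA, outB string
	var costA, costB float64
	wg.Add(2)
	go func() { defer wg.Done(); outA, costA = callModel(routedModel, prompt) }()
	go func() { defer wg.Done(); outB, costB = callModel(baselineModel, prompt) }()
	wg.Wait()
	return RouteRecord{
		Request:      prompt,
		ModelAOutput: outA,
		ModelBOutput: outB,
		Latency:      time.Since(start),
		Cost:         costA + costB,
	}
}

func main() {
	_ = collect("What is 2+2?", "small-math-model", "general-large-model")
}
```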
2.3.4. Evaluation and Testing: Using the Official Mock Service
This topic provides a complete Mock service and evaluation API that all development and testing must be based on. It is essential to strictly follow the specifications defined here: https://github.com/alibaba/higress/issues/2946
Development and Verification Process:
Data Collection Stage:
Use the training dataset service http://sem-router-train.higress.io/questions to obtain a list of questions. For each question, the plugin should send it in parallel to multiple models (for instance, a generic large model and one or more domain-specific smaller models).
Obtain Ground Truth Scores:
For each answer returned by a model, the plugin must call the /v1/evaluate interface to obtain an objective correctness score. This score serves as the label for the training data (a sketch of this collection-and-evaluation loop appears after this process list).
Train Classifier:
Use the collected data (including question text, models used, latency, cost, and evaluation scores) to train a classification model. Given a question as input, this model outputs the predicted best model.
Deployment and Validation Stage:
Use the validation dataset service http://sem-router-verify.higress.io/questions to obtain validation questions. At this stage, the plugin should use the classifier trained in the previous step to select a single model to call for each question.
Objective Assessment:
The objective score of the work will be based on the average accuracy across the entire validation set (from the /v1/evaluate interface) and the average response latency.
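Here is a minimal sketch of the data-collection loop in the first stage. It assumes the question-list endpoint returns a JSON array of question strings and that /v1/evaluate accepts a question/answer pair; the exact request and response schemas are defined in the issue linked above and should be followed instead of the field names guessed here:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

const (
	trainQuestionsURL = "http://sem-router-train.higress.io/questions"
	evaluateURL       = "http://sem-router-train.higress.io/v1/evaluate" // assumed host; see the issue for the exact address
)

// The field names below are guesses for illustration; use the schemas from
// the official Mock service specification.
type evaluateRequest struct {
	Question string `json:"question"`
	Answer   string `json:"answer"`
}

type evaluateResponse struct {
	Score float64 `json:"score"`
}

// fetchQuestions downloads the training question list.
func fetchQuestions() ([]string, error) {
	resp, err := http.Get(trainQuestionsURL)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var questions []string
	if err := json.NewDecoder(resp.Body).Decode(&questions); err != nil {
		return nil, err
	}
	return questions, nil
}

// evaluate asks the mock service to score one answer against the ground truth.
func evaluate(question, answer string) (float64, error) {
	body, _ := json.Marshal(evaluateRequest{Question: question, Answer: answer})
	resp, err := http.Post(evaluateURL, "application/json", bytes.NewReader(body))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	var out evaluateResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return 0, err
	}
	return out.Score, nil
}

func main() {
	questions, err := fetchQuestions()
	if err != nil {
		panic(err)
	}
	for _, q := range questions {
		answer := "answer produced by one of the candidate models" // placeholder
		score, err := evaluate(q, answer)
		if err != nil {
			continue
		}
		fmt.Printf("q=%q score=%.2f\n", q, score)
	}
}
```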
3. Competition Process and Participation Guidelines
3.1. Key Milestones and Deadlines
Be sure to remember the following key dates and plan your project progress accordingly.
Submission Deadline: December 10, 2025
Final Defense Date: December 29, 2025
The competition is mainly divided into two phases: before the submission deadline, focus on project development, coding, and documentation writing; after submission, the finalist teams need to prepare presentation materials and on-site demos to face questions and reviews from judges.
3.2. Submission Guidelines and Best Practices
Complete project proposal documents, source code, and demo presentation videos must be submitted. The submission address is:
https://survey.taobao.com/apps/zhiliao/6lgXsPIYX
Code Submission
Code Project Files: Ensure that all code is kept in a well-organized project repository, making it convenient to open-source later on the AtomGit platform.
README is the “face” of the project: Invest time in writing a high-quality README.md file. It should clearly articulate the core ideas, architectural design, and innovations, and provide detailed installation, configuration, and operation instructions.
Code Quality: Clean, standardized, and well-commented code will leave a very positive impression on the judges.
Project Documentation
Design Document: It is recommended to include a concise design document explaining the trade-offs and decisions made in technical selections and architectural design.
Results and Evaluation (especially applicable to Topics Two and Three): There should be a specific chapter detailing the evaluation methods, presenting quantitative results (using tables and charts), and conducting in-depth analysis and interpretation of the results.
Project Presentation
Demo Video: It is highly recommended to record a 3-5 minute project demo video. This is the most intuitive and effective way to showcase the project’s actual running effects and user experiences.
Final Defense Preparation
Prepare a logically clear and focused presentation material (PPT).
Ensure that the project can perform smooth on-site demonstrations.
Be prepared to answer in-depth technical questions that judges may raise, fully showcasing your profound understanding of the project and related technologies.