SOFA AI Gateway Implementation

SOFA

Oct 16, 2025

1. Background

The gateway, as a key piece of middleware, handles traffic governance, routing and forwarding, protocol conversion, and security protection in traditional business scenarios. Different business positioning has produced different gateway types, such as traffic gateways, ESBs (Enterprise Service Bus), API gateways, and cloud-native gateways. Across these types, the gateway's essential responsibilities have changed little; what changes is the adaptation to specific business scenarios. For example, the API gateway was designed for microservices, refining the management granularity from coarse-grained traffic or services down to the REST or interface level to enable finer-grained governance. This was the core driving force behind the evolution from traffic gateway to API gateway.

In the AI scenario, the business model has fundamentally changed, and the challenges facing the gateway have shifted from 'services' to 'models' and 'intelligent agents.' This is not a simple technological iteration; it comprehensively reshapes business logic, interaction patterns, resource consumption, and risk models.

To effectively support increasingly diverse and complex AI business scenarios (model serving, intelligent agents, AI applications, MCP, and so on), the API gateway urgently needs to be upgraded from a general-purpose gateway to a specialized AI gateway: the core capabilities of a general-purpose gateway can no longer meet the specific demands of these scenarios. The AI gateway therefore expands and strengthens the capability set, adding core features such as intelligent routing, unified model access, semantic caching, content security, MCP proxying, and model rate limiting.

The SOFA commercialization team has launched the SOFA AI gateway, also known as SOFA Higress, to meet customer needs for AI business development.

2. Positioning of SOFA AI Gateway

SOFA AI Gateway (also known as SOFA Higress) is built on the open-source Higress kernel, specifically optimized and enhanced for SOFA scenarios. It is an intelligent gateway solution aimed at AI needs.

From the outset, the positioning of SOFA AI Gateway has been made clear: to provide specialized services for three core AI business scenarios:

  • Agent Proxy: Serves as the unified entry and exit gateway for agent traffic, providing security protection and traffic control. It also acts as a Tools Hub for intelligent agents, centrally managing the tool list and brokering connections between agents and external systems, so that existing business APIs can be quickly turned into tools that agents can recognize and call. Through its REST-to-MCP conversion capability, it accelerates the MCP transformation of existing businesses and significantly simplifies agent integration and invocation.

  • Model Proxy: Providing model inference gateway capabilities, integrating core functionalities such as semantic caching, content security, and unified access, significantly reducing the complexity and cost of model access. At the same time, based on refined business attributes and characteristics, it offers precise model rate limiting guarantees.

  • MCP Market Service: Building a dedicated MCP market for the financial sector, providing specialized financial data and a rich array of financial business services, empowering financial scenarios and effectively improving the efficiency and quality of intelligent agent development.

The following sections will elaborate on the above three aspects in detail.

3. Practical Implementation

3.1 Technology Selection

SOFA AI Gateway utilizes Higress as its kernel, mainly considering its strong open-source community and rich extension mechanism, while also aligning with the future goal of multi-gateway integration. Therefore, we built upon the Higress gateway and migrated existing capabilities from API gateways, data gateways, intercommunication gateways, etc.

3.2 Entry and Exit Gateway for Intelligent Agents

Currently, intelligent agents are undoubtedly the hottest topic, with many enterprises beginning to build their own vertical business intelligent agents. To help enterprises build their agents better and faster, we have clearly positioned the gateway as a unified entry and exit gateway for agent traffic.

SOFA AI Gateway provides key capabilities for intelligent agents:

  • Ensuring Entry Security and Stability: Applies security protection and business-level rate limiting to downstream traffic entering the intelligent agent, ensuring stable and secure operation of agent applications.

  • Empowering Core Agent Capabilities: Intelligent agents depend on models, tools, knowledge bases, and similar resources for reasoning, planning, and mitigating hallucinations, and thereby for steadily improving answer quality on the way to becoming specialized agents. To support this, the gateway converges all outgoing traffic on the agent's exit side, reducing the cost of integrating agents with external systems.
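The entry-side business rate limiting mentioned above can be sketched with a per-business token bucket. This is an illustrative sketch only: the class, the business IDs, and the idea of keying on a header such as `X-Biz-Id` are assumptions for demonstration, not SOFA AI Gateway's actual implementation.

```python
import time

class TokenBucket:
    """Illustrative token-bucket rate limiter (not SOFA's actual code)."""

    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec      # tokens refilled per second
        self.capacity = burst         # maximum burst size
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill tokens proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# One bucket per business unit, keyed e.g. by a request header (hypothetical).
buckets = {"biz-a": TokenBucket(rate_per_sec=10, burst=5)}

def admit(biz_id: str) -> bool:
    """Admit a request only if its business unit has remaining quota."""
    bucket = buckets.get(biz_id)
    return bucket.allow() if bucket else False
```

In a real gateway this logic typically runs in a plugin on the request path; unknown business IDs are rejected here, though a production policy might instead fall back to a default quota.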

SOFA AI Gateway provides the following key functions mainly on the exit side of the intelligent agent traffic:

  • Model Proxy: Provides a unified model access and management layer that supports easy model replacement (for example, A/B testing), traffic control, and unified token management, greatly simplifying model iteration during agent development. Because model resources are expensive, the gateway also enforces fine-grained, per-business model call rate limiting, preventing any single business from over-consuming resources and protecting the performance and stability of the overall model service.

  • Tool and MCP Management: SOFA AI Gateway acts as a bridge between intelligent agents and existing enterprise systems, standardizing existing REST APIs into function calls that agents can recognize, which makes integration and unified service management straightforward. With the emergence of the MCP (Model Context Protocol), the gateway further converts existing APIs into MCP form for agents to consume, greatly simplifying backend service integration. For externally procured AI services (which typically have independent authentication systems), the gateway acts as a unified egress proxy, handling the interconnection protocols and authentication so that agents can seamlessly call external capabilities and focus on core business logic.

  • Data Services and Rapid Data Retrieval: SOFA AI Gateway has built-in data open APIs that dynamically generate REST APIs via SQL queries over results produced by big data platforms, packaging them as tools that agents can use directly. With the emergence of NL2SQL (Natural Language to SQL) and NL2Data (natural language data retrieval) technologies, the gateway plans to integrate such capabilities so that users and agents can retrieve the data they need through natural language instructions.
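The REST-to-tool standardization described above can be sketched as a mapping from a minimal REST operation description to an OpenAI-style function-calling schema. The field names, the `getFundNav` endpoint, and the mapping shape are illustrative assumptions; the actual SOFA/Higress REST-to-MCP conversion is configuration-driven and richer than this.

```python
# Hedged sketch: turn a minimal REST operation description into a tool schema
# an agent can call (OpenAI function-calling style). All names are invented.

def rest_to_tool(operation: dict) -> dict:
    """Map a REST operation description to a function-call schema."""
    return {
        "type": "function",
        "function": {
            "name": operation["operationId"],
            "description": operation.get("summary", ""),
            "parameters": {
                "type": "object",
                "properties": {
                    p["name"]: {"type": p.get("type", "string"),
                                "description": p.get("description", "")}
                    for p in operation.get("parameters", [])
                },
                "required": [p["name"] for p in operation.get("parameters", [])
                             if p.get("required")],
            },
        },
    }

op = {
    "operationId": "getFundNav",   # hypothetical endpoint
    "summary": "Query a fund's latest net asset value",
    "parameters": [
        {"name": "fund_code", "type": "string", "required": True,
         "description": "Six-digit fund code"},
    ],
}
tool = rest_to_tool(op)
```

The same operation description could equally be emitted as an MCP tool definition; the point is that one declarative API description drives both the function-call and MCP views.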

3.3 Inference Gateway - Intelligent Routing Proxy for Models

There are significant differences between the gateway's proxy model services and traditional service proxies. The root of this difference lies in the unique traffic characteristics of model services, mainly including:

  • High Latency and Queuing Effects: Model inference involves heavy computation, so a single request takes far longer than a traditional service call (seconds to minutes). When a new request arrives while an instance is busy, it queues, significantly delaying the time to first token and degrading the user experience. This contrasts sharply with the fast request handling of traditional services.

  • High Resource Consumption and Continuous Occupation: Model inference relies on dedicated hardware such as GPUs and is compute-intensive. GPU resources (memory and compute) are the key bottleneck; a single inference request occupies resources for its entire duration and cannot release them quickly the way traditional stateless services do.

  • Substantial Variation in Processing Times: The time taken by model requests varies greatly, influenced by input/output length, model complexity, and task type (from seconds to minutes). This variability makes traditional load-balancing strategies based on fixed time windows or connection counts hard to apply.

Given these core characteristics of model traffic, the load-balancing strategies commonly used by gateways (simple round-robin, least connections, random) often perform poorly in model proxy scenarios, or even backfire. Round-robin, for example, may send new requests to instances that are already overloaded and queueing, further worsening latency. Gateway solutions for model services therefore need smarter routing strategies that decide dynamically based on real-time signals such as per-instance load, KV cache status, and queue depth.
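The load-aware selection described above can be sketched as scoring each model instance by its real-time signals and picking the least loaded one. The metric names (`queue_len`, `kv_cache_util`) and weights below are invented for illustration and are not SOFA AI Gateway's actual scoring policy.

```python
# Hedged sketch of load-aware instance selection: score each model instance
# by queue depth and KV-cache pressure, then pick the lowest score.

def pick_instance(instances: list[dict]) -> dict:
    def score(inst: dict) -> float:
        # Lower is better: queue depth dominates, cache pressure breaks ties.
        return inst["queue_len"] * 1.0 + inst["kv_cache_util"] * 0.5
    return min(instances, key=score)

instances = [
    {"name": "gpu-0", "queue_len": 4, "kv_cache_util": 0.9},
    {"name": "gpu-1", "queue_len": 1, "kv_cache_util": 0.6},
    {"name": "gpu-2", "queue_len": 1, "kv_cache_util": 0.2},
]
```

Note how this differs from round-robin: `gpu-1` and `gpu-2` have equal queue depth, but the cache signal steers new requests to `gpu-2`, the instance most likely to start decoding soonest.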

SOFA AI Gateway, as a unified entrance for models, is responsible for realizing multi-cluster routing and proxying for models, providing lifecycle management for model registration and decommissioning as well as intelligent routing capabilities.

SOFA AI Gateway's intelligent routing differs from both open-source Higress and the industry's inference gateways, while integrating the advantages of each. Higress implements intelligent routing entirely in plugins: all routing logic, including metrics-based routing, is developed and integrated as plugins, a design that typically performs better. Industry inference gateways, by contrast, generally implement routing selection through an independently deployed EPP (endpoint picker) service, following the Gateway API Inference Extension specification.

To improve delivery efficiency, SOFA AI Gateway chose neither to modify the Higress data plane source code to integrate Gateway API Inference Extension capabilities directly, nor to have business teams write routing logic inside gateway plugins themselves. Instead, we developed Higress plugins that speak the ext-proc protocol to connect with business-side EPP services, or use plain HTTP to interface with conventional services, enabling custom routing extensions.
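The contract between the gateway and a business-side routing service can be sketched as a simple request/response exchange: the gateway sends the candidate endpoints plus request metadata, and the service replies with the endpoint to use. The payload shape and field names below are assumptions for illustration, not the actual SOFA/Higress or ext-proc wire format.

```python
import json

# Hedged sketch of a business-side routing service's decision handler.
# In practice this would sit behind an HTTP or ext-proc server; here we
# show only the JSON-in / JSON-out decision logic.

def route(request_body: str) -> str:
    payload = json.loads(request_body)
    candidates = payload["endpoints"]
    model = payload["model"]
    # Example policy: keep endpoints serving the requested model,
    # then take the one with the fewest in-flight requests.
    eligible = [e for e in candidates if model in e["models"]] or candidates
    chosen = min(eligible, key=lambda e: e["inflight"])
    return json.dumps({"endpoint": chosen["address"]})

body = json.dumps({
    "model": "qwen-7b",
    "endpoints": [
        {"address": "10.0.0.1:8000", "models": ["qwen-7b"], "inflight": 7},
        {"address": "10.0.0.2:8000", "models": ["qwen-7b"], "inflight": 2},
    ],
})
```

Keeping the decision logic behind a network contract like this is what lets business teams evolve routing policy independently of the gateway's release cycle.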

Of course, in the future, to better align with industry standards, we also plan to make modifications to the data plane to integrate native Gateway API Inference Extension capabilities.

3.4 MCP Market

In the practice of intelligent agent projects, we realized that high-quality tools (especially specialized MCPs) and authoritative data are key to the capabilities of intelligent agents. General-purpose large models face significant limitations in professional fields like finance: their knowledge may be outdated, they lack deep industry understanding, and the accuracy and compliance of their responses are hard to guarantee.

The role of specialized tools (MCP) lies in:

  • Providing Precise, Real-Time Professional Capabilities: Encapsulating complex tasks such as financial analysis, diagnosis, and interpretation into callable services, ensuring the professionalism and reliability of output results.

  • Accessing Authoritative, Dynamic Data Sources: Directly connecting with processed professional data and core financial data from partners, addressing the issues of outdated data and single-source data in general models.

  • Enhancing Efficiency and Scalability: Modularizing specific capabilities as services that agents can call on demand, while enabling continuous iteration and reuse of those capabilities.

Building on Ant Financial's expertise, as well as intelligent agent development experience from projects in Ningbo, we have packaged high-quality financial data and services as MCPs and built an MCP market that provides SaaS services for proprietary-cloud intelligent agents. Our aim is to standardize and service-package financial expertise (knowledge, data, processes, risk control, and so on) into a 'Lego' market of financial capabilities. The SOFA AI platform has launched, and continues to expand, a series of MCPs for financial scenarios, giving agents a powerful 'toolbox.' Several financial-domain MCPs are already live, including product diagnosis, configuration selection, market interpretation, and event interpretation.

MCP Market address: https://mcp.sofa.antdigital.com/mcp/home

4. Future Prospects

In the construction process, we have also encountered some new challenges, primarily including insufficient accuracy in entity recognition and MCP context overflow.

Unclear Entity Extraction: When users query or operate MCP services through natural language, key inputs (such as fund or stock names and codes) depend heavily on precise entity recognition. When users use aliases, non-standard industry slang, or incomplete names, the entities the model extracts may not map to real financial entities (such as fund names or stock codes), which directly hurts downstream accuracy and user experience. We therefore urgently need to introduce a 'slot extraction' engineering capability that validates and maps recognition results, improving both the interaction experience and information recall.

MCP Context Explosion: Currently, the platform has launched 15 specialized MCPs, and this number will continue to increase in the future. Accessing too many MCPs significantly inflates the processing context of each request, placing pressure on the model's performance and resource consumption. In response to this issue, constructing a set of intelligent MCP routing mechanisms is essential to accurately filter the required service modules based on user requests, avoiding unnecessary context loading.
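The MCP routing idea above can be sketched as scoring each registered MCP's relevance to the user request and loading only the top matches, instead of injecting every MCP's tool list into the context. The registry and keyword-overlap scoring below are invented for illustration; a real router would more likely use embedding similarity.

```python
# Hypothetical MCP registry: service name -> descriptive keywords.
MCP_REGISTRY = {
    "fund-diagnosis": {"fund", "diagnosis", "holdings", "risk"},
    "market-interpretation": {"market", "index", "trend", "interpretation"},
    "event-interpretation": {"event", "news", "announcement"},
}

def select_mcps(query: str, top_k: int = 1) -> list[str]:
    """Return up to top_k MCPs whose keywords overlap the query."""
    words = set(query.lower().split())
    scored = sorted(MCP_REGISTRY.items(),
                    key=lambda kv: len(kv[1] & words), reverse=True)
    # Drop MCPs with zero overlap so irrelevant context is never loaded.
    return [name for name, kws in scored[:top_k] if kws & words]
```

With 15 or more MCPs registered, pre-filtering like this keeps the per-request context bounded regardless of how large the catalogue grows.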

Building slot extraction and intelligent MCP routing capabilities will be a key focus for SOFA AI Gateway in the second half of the year.

5. Acknowledgements

Thanks to the Higress open-source team; without such a great product, the rapid incubation of SOFA AI Gateway would not have been possible. Special thanks to @Chengtan for providing professional answers during the construction of SOFA AI Gateway.

Contact

Follow and engage with us through the following channels to stay updated on the latest developments from higress.ai.
