What recent releases in the AI field reveal
Wang Chen | Oct 10, 2025
Since the beginning of 2024, amid the rapid build-out of AI infrastructure, the AI gateway at the PaaS layer has been one of the most visible changes. Starting from the static rules and simple routing of traditional gateways, the gateway's role has been continuously stretched: users are turning to gateways for multi-model traffic scheduling, intelligent routing, Agent and MCP service management, AI governance, and more, to make their systems more flexible, controllable, and usable.
During the National Day holiday, the AI community released or upgraded several products. Here is a brief review of what they suggest about the next direction of AI gateway evolution.
1. Low-barrier post-training tools with more freedom
Thinking Machines Lab, founded by former OpenAI CTO Mira Murati, has released its first product, Tinker, which aims to simplify fine-tuning of large language models. Users write training loops in Python while Tinker manages the infrastructure for distributed training. It currently supports Alibaba Cloud's Qwen series and Meta's Llama series, and primarily targets researchers, developers, and AI enthusiasts, especially those who need customized models.
Compared with the low-barrier, GUI-based model fine-tuning tools and platforms that cloud vendors already offer, Tinker's main advantage is greater freedom. It provides a foundational, clean API for writing experiments and training pipelines, allowing fine-grained control over loss functions, training loops, data workflows, and more in standard Python code.
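To make "write the loop yourself, let the service run it" concrete, here is a minimal sketch of what such a loop can look like. The primitive names (`forward_backward`, `optim_step`, `save_state`) echo the primitives Tinker has described publicly, but every import path and signature below is an assumption for illustration, not the documented API.

```python
# Hypothetical sketch of a user-owned fine-tuning loop on a managed backend.
# The primitives echo what Tinker has described publicly; all import paths
# and signatures here are assumptions for illustration.
import json
import tinker  # assumed client package name

def load_batches(path: str, batch_size: int = 8):
    """A user-defined data workflow: stream JSONL rows as fixed-size batches."""
    with open(path) as f:
        rows = [json.loads(line) for line in f]
    for i in range(0, len(rows), batch_size):
        yield rows[i : i + batch_size]

service = tinker.ServiceClient()
client = service.create_lora_training_client(base_model="Qwen/Qwen3-8B")

for epoch in range(3):
    for batch in load_batches("private_corpus.jsonl"):
        # The user owns the loss and the loop; the service owns the GPUs
        # and the distributed training plumbing.
        client.forward_backward(batch, loss_fn="cross_entropy")
        client.optim_step(learning_rate=1e-5)

client.save_state(name="qwen3-8b-domain-ft")
```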
Even as foundation models grow more capable, Tinker aims to let more people build specialized, domain-specific models on top of private data and algorithms in a more open way.
This trend will further strengthen demand on AI gateways for multi-model traffic scheduling and intelligent routing. Today, most AI gateways use content-agnostic routing strategies (for example, based on URL paths) and cannot understand the semantics of an AI request. As a result, every request, simple or complex, lands on the same default model, causing model mismatch and wasted resources. Intelligent routing instead directs semantically matching requests to post-trained models and everything else to foundation models. Higress is developing these capabilities; if you have the relevant technical background, you are welcome to join our challenge.
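As a minimal illustration of what content-aware routing could look like inside a gateway, the sketch below scores each request against per-model route descriptions and only sends it to a post-trained model when the semantics match. The model names, route descriptions, and threshold are assumptions; the toy bag-of-words `embed` stands in for a real embedding model.

```python
# Minimal sketch of semantic (content-aware) routing at the gateway layer.
# Model names, route descriptions, and the threshold are illustrative; in
# production, embed() would call a real embedding model, not bag-of-words.
import math
import re
from collections import Counter

ROUTES = {
    "qwen3-8b-domain-ft": "contract review, legal clauses, compliance questions",
}
DEFAULT_MODEL = "qwen-max"

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb + 1e-9)

def pick_model(prompt: str, threshold: float = 0.3) -> str:
    """Route to a post-trained model only when the request semantically
    matches its route description; otherwise fall back to the default."""
    q = embed(prompt)
    best = max(ROUTES, key=lambda m: cosine(q, embed(ROUTES[m])))
    return best if cosine(q, embed(ROUTES[best])) >= threshold else DEFAULT_MODEL

print(pick_model("Please review these contract clauses for compliance risk"))
print(pick_model("What's a good pasta recipe?"))  # falls back to the default
```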
2. A reactivated tool ecosystem
In the "AI Native Application Architecture White Paper," we shared that the significance of tools lies in extending large models into the physical world, exposing the read and write rights of the physical world to models in a controlled manner, allowing them to operate effectively within safe boundaries. Although there are thousands of MCP Servers worldwide, the number of Agents calling them is limited, and users employing Agents to call MCP Servers are even scarcer. The reasons for this are complex, involving both integration and output experience issues of Agents, the varying quality of MCP Server entry, and the psychological costs and benefits of user adoption.
Apps in ChatGPT, launched by OpenAI, is an attempt to solve these thorny problems, and the author sees it as an excellent demonstration: the world's most actively used Agent bringing better AI experiences through a tool ecosystem, which can greatly energize the developer ecosystem.
For other Agents, the greatest benefit is the emergence of a standard path. In the past, each team solved tool integration on its own, writing assorted adapters and defining private interfaces. If ChatGPT's tool ecosystem stabilizes, it provides a template for the entire industry: how tools should expose capabilities, how parameters should be described, how tools should be retrieved by models, and how they can be referenced naturally in conversation. Agent developers can adopt these paradigms directly and focus on their own business scenarios.
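For a sense of what "exposing capabilities in a standard way" already looks like on the MCP side, here is a minimal MCP Server sketch using the FastMCP helper from the official Python `mcp` SDK. The tool name, parameters, and canned data are hypothetical; the point is the pattern, where type hints and a docstring become the machine-readable capability description that any MCP-speaking Agent can retrieve.

```python
# Minimal MCP Server: the type hints and docstring become the tool's
# machine-readable schema and description, i.e., how the capability is
# exposed to Agents. Tool name and returned data are hypothetical.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("listings")

@mcp.tool()
def search_listings(city: str, max_price: int) -> list[dict]:
    """Search homes for sale in a city under a given price cap."""
    # Canned result; a real server would query a backing service.
    return [{"address": "12 Elm St", "city": city, "price": max_price - 1}]

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```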
For MCP Servers, as more users grow used to calling apps like Figma, Canva, and Zillow directly inside ChatGPT, this shift in user perception will push more developers to polish their MCP tools so they respond faster, take more intuitive parameters, and return more interpretable results, rather than merely conforming to the spec and shipping quickly.
Although Apps in ChatGPT adopts the MCP standard, its developer guidelines differ in essential ways, such as support for two-way real-time communication. It also ships dedicated design guidelines emphasizing that apps live inside ChatGPT: they extend user capabilities without interrupting the conversational flow, blending into the ChatGPT interface through lightweight cards, carousels, full-screen views, and other display modes while maintaining clarity, credibility, and voice support.
These differences aim to address the three problems noted above. For the detailed development and design guidelines, see:
https://developers.openai.com/apps-sdk
Next, a direct comparison of the user interaction experience of the two.
| Interaction Experience | App called inside ChatGPT (Apps SDK) | App called in plain MCP mode |
|---|---|---|
| Integration / coherence | Users can trigger an app simply by mentioning it in conversation, and may not even notice the call happening. | The model displays prompts such as "calling / external tool processing / waiting for response," so users perceive the tool as part of the model's thinking chain. |
| Built-in tool experience | The app operates inside ChatGPT. | Users must jump out of ChatGPT and operate inside the external app. |
| Real-time performance | Supports two-way, real-time communication. | Mostly one-way communication. |
| Visibility / tool discovery | A directory with recommendations, icons, and descriptions lets users browse, enable, and disable tools; tools can be installed into ChatGPT like plugins. | Tools are usually internal to the model; users may have no clear tool directory, and the model decides whether to call them. |
| Visual consistency | Unified UI, conversation interface, and interaction style for tools, giving a more consistent experience. | Tools may behave differently under different models or Agents, making consistency harder to guarantee. |
| Error / fallback | Failed tool calls can degrade gracefully (capture, retry, fallback) and the conversation continues; the experience stays relatively stable. | When a call fails, the model itself must decide how to fall back, which may interrupt the conversation, break coherence, or surface raw errors. |
| Permissions / authorization | Authorization, data access, and privacy boundaries are managed uniformly; users control app permissions in the ChatGPT platform UI. | Authorization and data-access controls may fall to model/tool designers at the call level; users may need to authorize repeatedly or intervene manually. |
Overall, the calling mechanism of the ChatGPT Apps SDK tends to hide the calling boundary and mute the explicit "tool feel," aiming to make tool capabilities feel like native capabilities of ChatGPT. MCP, by contrast, is neutral by design: it cannot prescribe a design guideline for any particular Agent and derive detailed development guidelines from it, so it preserves a greater sense of visible collaboration between models/Agents and tools, making tool calls more apparent.
This trend will activate the capabilities AI gateways have accumulated in MCP scenarios, such as proxying MCP Servers, curating tools, security authentication, and governance capabilities like unified observability and rate limiting.
3. Agent Developer Suite
Remember the LangChain CEO's rebuttal this April to OpenAI's Agent-building guide, arguing that it lacked technical detail and offered developers little help in deciding what an Agent should do and how to implement it? He held that a reliable orchestration layer is crucial for developers: it significantly improves the accuracy of context retrieval and directly helps with data persistence and fault tolerance.
Six months later, OpenAI released its own Agent developer suite, AgentKit, which includes Agent Builder, the Connector Registry, and ChatKit.
Agent Builder:
In the "AI Native Application Architecture White Paper," we described two mainstream Agent-building paradigms: low-code and high-code. Agent Builder is a low-code tool: a visual canvas for building multi-agent workflows with version control, letting engineers and domain experts collaborate in the same interface through a friendly visual presentation. Like other frameworks, it offers models, tools, guardrails, knowledge bases, memory, evaluation, and so on. In addition, OpenAI provides a high-code orchestration framework as an SDK, which is a better application-building model than low-code.
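On the high-code side, the sketch below uses OpenAI's `openai-agents` Python SDK to express a two-agent handoff in plain code. The agent names and instructions are invented for illustration, and exact parameter names may differ across SDK versions.

```python
# High-code orchestration with the openai-agents SDK: agents, instructions,
# and handoffs are plain Python objects. Requires OPENAI_API_KEY to be set;
# agent names and instructions here are invented for illustration.
from agents import Agent, Runner

support = Agent(
    name="Support",
    instructions="Answer product questions concisely.",
)

triage = Agent(
    name="Triage",
    instructions="Escalate billing issues; hand everything else off to Support.",
    handoffs=[support],
)

result = Runner.run_sync(triage, "How do I rotate my API key?")
print(result.final_output)
```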

The Connector Registry centrally manages how data and tools connect across OpenAI products, somewhat like Himarket's management of APIs, MCP Servers, and Agents, except that the Connector Registry is a centralized backend for managing connectors. ChatKit brings agent workflows into a product's UI, connecting an embeddable chat to an agent backend.
In summary, whether you use Agent Builder to build a native Agent on the OpenAI API or bring an Agent into existing applications through ChatKit, these tools further lower development costs and shorten delivery cycles. As more enterprises and developers deploy AI applications on cloud vendors' infrastructure, they will need enterprise-grade security, auditing, permission management, traffic control, cost control, and more, which will in turn raise demand for enterprise-grade compute, storage, and gateways.
4. The significance of video generation tools
Sora may look like a video-generation tool unrelated to AGI, but it plays a special role in the co-evolution of society and technology. Most practitioners likely share the intuition that AI's technical progress far outpaces the public's understanding of it; the public needs time to understand, adapt, and digest, leaving a gap in which AI's true capabilities go largely unrecognized and underused.
Compared with text, Sora carries stronger emotional resonance and influence and more easily ignites public creativity. Through Sora, natural language is no longer just a trigger for logical operations; it generates a perceivable world, accelerating the public's understanding of AI and its ability to apply it.
On the other hand, if LLMs let machines understand language, Sora lets machines grasp the dynamic structure of the world; only the combination of the two may allow AI to evolve in environments that are both physically and socially perceivable.
This trend will accelerate demand on AI gateways in multimodal scenarios, including communication-protocol compatibility (such as WebSocket support), content security auditing, and persisting request and response content for observability.
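As one concrete example of these demands, the sketch below is a toy gateway-side WebSocket passthrough that records every frame for observability and auditing, built on the `websockets` package. The upstream URL and the print-based audit sink are placeholders, not any particular gateway's implementation.

```python
# Toy gateway-side WebSocket passthrough with request/response auditing.
# The upstream URL and print-based audit sink are placeholders; a real
# gateway would stream records into a log pipeline and run content checks.
import asyncio
import json
import time

import websockets

UPSTREAM = "ws://upstream-model:8080/v1/realtime"  # assumed backend address

async def audit(direction: str, frame) -> None:
    text = frame if isinstance(frame, str) else "<binary frame>"
    print(json.dumps({"ts": time.time(), "dir": direction, "frame": text[:2048]}))

async def proxy(client):
    async with websockets.connect(UPSTREAM) as upstream:
        async def pump(src, dst, direction):
            async for frame in src:
                await audit(direction, frame)  # persist for observability
                await dst.send(frame)
        await asyncio.gather(
            pump(client, upstream, "request"),
            pump(upstream, client, "response"),
        )

async def main():
    async with websockets.serve(proxy, "0.0.0.0", 9000):
        await asyncio.Future()  # serve forever

if __name__ == "__main__":
    asyncio.run(main())
```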