Analysis of the functions and application scenarios of mainstream API protocols
Wang Chen
|
Sep 8, 2025
|
The author recently participated in the development of an open-source project: HiMarket[1]. This project aims to help developers, and especially enterprises, quickly build a private AI open platform focused on managing the API and MCP services they offer externally. I have taken this opportunity to summarize the functions and application scenarios of mainstream API protocols and to clarify some easily confused concepts.
An API (Application Programming Interface) is, as the name suggests, used to connect different software systems to achieve data exchange and service sharing. In essence, it is a set of specifications or protocols that define how different applications or components interact with each other. The core capabilities of an API can be summarized with three keywords: defining rules, decoupling systems, and enhancing reusability.
Broadly speaking, we can also view APIs as a kind of middleware that allows developers to access and use certain features or data without needing to understand the detailed implementation behind it, thereby reducing the complexity of software systems. For example, the OpenAPI provided by Alibaba Cloud offers a series of application programming interfaces that help developers manage resources, data, services, etc. on the cloud via APIs.
As software forms and application scenarios have grown richer, so have the types of APIs. From the early heavyweight SOAP, to the RESTful APIs that dominate the web, and on to the streaming APIs of the large-model era, each new type corresponds to a different engineering solution for a new software form. This article summarizes the positioning, functions, and application scenarios of mainstream APIs to help developers gain a comprehensive understanding of API protocols.
Different perspectives lead to different classification methods. This article categorizes APIs into six types based on application scenarios.
1. Widely Used RESTful APIs
REST (Representational State Transfer) is the most widely used architectural style today. It is based on the HTTP protocol and defines a set of design constraints and principles. Its core idea is that everything is a resource, which can be manipulated through a unified interface. Resources are represented by URLs, and operations are defined by HTTP methods (GET/POST/PUT/DELETE). APIs built on REST are referred to as RESTful APIs.
In the web world, resources usually correspond to a URL. For example:
https://api.example.com/users/123 → represents a user resource
https://api.example.com/orders/456/items → represents a product resource within an order
Just as every house in the physical world has a unique address, every resource has a unique URL. Common open platforms such as WeChat, Alipay, and Gaode provide API services that are RESTful.
Advantages
Intuitive and Understandable: The URL is the resource, and the HTTP methods are the operations, with clear semantics for GET/POST/PUT/DELETE.
Statelessness: Each request carries context independently, so the server does not need to remember the client's state, which improves scalability.
Standardization: Based on the HTTP protocol, it fully reuses existing infrastructures (caching, proxies, load balancing, etc.).
Cross-Language: Text formats such as JSON/XML allow easy parsing across different languages.
How It Works
A typical REST call is divided into two parts: client request and server response:
What the Client Request Contains
URL: Identifies the resource, e.g., /users/123
HTTP Method: Defines the operation, e.g., GET for querying, PUT for updating
Headers: Additional information, such as the content format (Content-Type: application/json) or an authentication token (Authorization: Bearer ...)
Request Body: For POST/PUT requests, typically contains JSON data
Request example:
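A minimal sketch of such a request as raw HTTP, assuming a hypothetical PUT that updates user 123 (the host, fields, and token are illustrative, not a real API):

```python
import json

# Build the JSON body, then assemble the request line, headers, and body
# exactly as they would appear on the wire.
body = json.dumps({"nickname": "wang", "email": "wang@example.com"})
request = (
    "PUT /users/123 HTTP/1.1\r\n"
    "Host: api.example.com\r\n"
    "Content-Type: application/json\r\n"
    "Authorization: Bearer <token>\r\n"
    f"Content-Length: {len(body)}\r\n"
    "\r\n"
    f"{body}"
)
print(request)
```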
What the Server Response Contains
Status Code: Indicates the result, such as 200 OK, 201 Created, 404 Not Found
Headers: Metadata of the response, such as cache policy, data format
Response Body: Typically JSON, carrying the resource content
Response example:
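A matching sketch of a typical server response: a status line, response headers, and a JSON body carrying the resource (all values are illustrative):

```python
import json

# Status line and headers describe the result; the body carries the resource.
status_line = "HTTP/1.1 200 OK"
headers = {
    "Content-Type": "application/json",
    "Cache-Control": "max-age=60",  # example cache policy
}
raw_body = json.dumps({"id": 123, "nickname": "wang", "email": "wang@example.com"})

# The client parses the JSON body back into a structure it can use.
user = json.loads(raw_body)
print(status_line, "->", user["nickname"])
```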
REST's design philosophy meets the explosive demand of internet applications: lightweight, intuitive, cross-language, and easily extensible. Coupled with the specifications of Swagger (now known as OpenAPI Specification, OAS), it has become the most widely used API form in the internet world, serving as the de facto standard for the Web API protocol.
However, as application scenarios diversify, it has gradually exposed some limitations, thus leading to the emergence of other API forms.
2. Frontend Query-Friendly GraphQL
In the context of mobile internet and frontend complexity, there may be mismatches between the data structures required by the client and the RESTful responses returned by the backend.
Excessive data retrieval: For instance, the client only needs the user's avatar and nickname, but the RESTful API /users/123 returns the entire user object (address, orders, permissions, and much more). This not only wastes bandwidth but also increases parsing overhead.
Insufficient data retrieval: For example, the mobile client needs to display user information plus the last three orders, but with RESTful it must first call /users/123, then /users/123/orders, and finally filter the results. One page may require three or four requests, hurting performance and latency.
This inadequacy is especially serious in mobile, complex single-page applications, and cross-platform applications, as the client side and network environment have limited resources. GraphQL was developed precisely to solve this issue. It allows clients to “fetch data on demand,” returning only the fields needed in one request.
GraphQL, developed by Facebook and open-sourced in 2015, is a query language and execution engine for APIs. Its core idea is that the client decides what data it wants, and the server only returns the required fields.
In contrast to RESTful's resource-based routing, GraphQL has a single unified entry point (usually /graphql), and clients use query statements (Queries) to specify the data structure precisely. To illustrate the differences between GraphQL and RESTful:
RESTful is a combo meal, where ordering "user information" results in a table full of dishes, including things you don't need;
GraphQL is a self-service section, where users grab a plate and choose an avatar, nickname, and the three recent orders — they take what they need.
Advantages
On-demand data retrieval: Avoids the excessive and insufficient retrieval issues of RESTful.
Single entry point: All requests go through /graphql, simplifying routing and maintenance.
Strongly typed Schema: Clients and servers share the same type system, reducing issues of inconsistent data formats.
Self-documenting: Schema definitions serve as documentation, enabling developers to automatically fetch interface descriptions through introspection.
Frontend friendly: Frontend teams can independently define the data required, reducing communication costs with backend teams.
Operational Mechanism
GraphQL operates in three steps:
The client constructs a query: Describing the required data fields using a syntax similar to JSON;
The server parses the query: Validating the syntax and field legality according to the Schema;
The server executes the resolver: Fetching data from the database or service and assembling it into a response.
Example of Client Request
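A sketch of a GraphQL request: the client POSTs a single JSON document to the /graphql endpoint, naming exactly the fields it wants. The field names (user, nickname, avatar, orders) are illustrative, not a real schema:

```python
import json

# The query asks only for the avatar, nickname, and last three orders.
query = """
query {
  user(id: "123") {
    nickname
    avatar
    orders(last: 3) {
      id
      total
    }
  }
}
"""

# GraphQL requests are typically wrapped in a JSON envelope and POSTed.
payload = json.dumps({"query": query})
print(payload)
```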
Example of Server Response
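A sketch of the matching response: the server returns only the requested fields under a top-level "data" key. All values here are made up for illustration:

```python
import json

# The response mirrors the shape of the query: nothing more, nothing less.
raw = """
{
  "data": {
    "user": {
      "nickname": "wang",
      "avatar": "https://cdn.example.com/avatar.png",
      "orders": [
        {"id": "1001", "total": 42},
        {"id": "1002", "total": 18},
        {"id": "1003", "total": 7}
      ]
    }
  }
}
"""
user = json.loads(raw)["data"]["user"]
print(user["nickname"], len(user["orders"]))
```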
As can be seen: The client only asks for "avatar, nickname, and three orders"; the server will not give an extra byte.
3. API for Microservices
Where performance requirements are modest, RESTful is good enough. However, in the world of microservices, the problems become more complex:
A single frontend request may need to link 5 to 10 microservices, and RESTful's JSON encoding + HTTP/1.1 protocol is not efficient for serialization and transmission.
Services often involve high-frequency calls, where bandwidth and CPU serialization overhead become major issues, creating a demand for high-performance calling frameworks.
Here, we introduce three familiar high-performance remote calling frameworks or paradigms for microservice architecture.
Apache Dubbo
Apache Dubbo, open-sourced by Alibaba, is an RPC (Remote Procedure Call) framework. Core features include:
Multiple protocol support: By default, it uses the Dubbo protocol (based on TCP/long connections) and later supports gRPC, REST, etc.
Registration center: Typically paired with Zookeeper or Nacos for service discovery.
Load balancing & fault tolerance: Built-in multiple load strategies, call retries, rate limiting, and degradation.
Java ecosystem oriented: Tightly integrates with Spring/Spring Boot.
gRPC
As a variant of the RPC architecture, gRPC was created by Google in 2015 and offers higher performance and lower latency characteristics compared to RESTful APIs.
Protocol: gRPC relies on HTTP/2, which provides better performance and lower latency, while REST uses HTTP/1.1.
Data format: gRPC uses Protocol Buffers (a binary serialization format), resulting in smaller payloads and faster communication; REST typically utilizes JSON or XML (text-based formats).
API design: gRPC follows the RPC paradigm, making it feel like calling a local function; REST adheres to the architectural constraints of the Representational State Transfer model, focusing on resources and state transitions.
Streaming: gRPC supports bidirectional streaming, enabling continuous data exchange between client and server; REST is limited to the request-response communication pattern.
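The "feels like calling a local function" point can be sketched as follows. A real gRPC stub is generated from a .proto file and speaks Protobuf over HTTP/2; this toy version fakes the transport with JSON purely to show the stub pattern (all names are illustrative):

```python
import json

class UserServiceStub:
    """Client-side stub: callers see an ordinary method call."""
    def __init__(self, transport):
        self.transport = transport  # anything with send(bytes) -> bytes

    def get_user(self, user_id: int) -> dict:
        # Serialize the call, send it over the wire, deserialize the reply.
        request = json.dumps({"method": "GetUser", "id": user_id}).encode()
        return json.loads(self.transport.send(request))

class FakeTransport:
    """Stands in for the HTTP/2 + Protobuf wire layer in this sketch."""
    def send(self, data: bytes) -> bytes:
        req = json.loads(data)
        return json.dumps({"id": req["id"], "nickname": "wang"}).encode()

stub = UserServiceStub(FakeTransport())
user = stub.get_user(123)  # looks local, actually remote
print(user)
```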
Spring Cloud
In the Spring Cloud system, remote calls initially relied on REST (over HTTP), using RestTemplate or, later, the recommended WebClient. Subsequently, Feign (a declarative HTTP client) emerged, allowing developers to call remote services using interfaces plus annotations, bringing the experience closer to RPC development.
To illustrate, RESTful is highway transportation (HTTP + JSON), flexible but prone to traffic jams as volume increases; RPC is well-built high-speed rail tracks, where the trains operate efficiently and predictably, suitable for large-scale point-to-point communication.
4. Real-time Communication APIs: WebSocket
Real-time performance is a must-have in many internet application scenarios. For instance:
Message synchronization in instant messaging applications
Document editing in collaborative office software
State updates in gaming scenarios
Market updates on trading platforms
If we rely only on the request-response model of RESTful APIs, the client either has to constantly poll the server or passively wait. The former wastes resources, while the latter fails to meet real-time requirements. Protocols such as WebSocket and SSE were therefore born, both giving the server the ability to actively push messages to the client, though their mechanisms and applicable scenarios differ.
WebSocket is a full-duplex communication protocol defined in the HTML5 standard, allowing clients and servers to exchange data bidirectionally and in real-time over a single TCP connection. Unlike REST, WebSocket is not a one-time request-and-response short communication; once a connection is established, messages can be exchanged at any time.
Advantages
Bidirectional communication: Both clients and servers can actively send messages, breaking the unidirectional mode of HTTP.
Low latency: Reusing the connection, avoiding the overhead of frequent HTTP handshakes.
High real-time performance: Suitable for scenarios with frequent message exchanges.
Operational Mechanism
The client initiates a handshake via an HTTP request (Upgrade: websocket);
The server agrees to upgrade and establishes a WebSocket connection;
Subsequent communication is transmitted as frames over the long-lived TCP connection, supporting both binary and text.
Example
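A sketch of the opening handshake described above. The path and host are illustrative; the key/accept exchange follows RFC 6455:

```python
import base64
import hashlib
import os

# The client sends an ordinary HTTP request asking to upgrade the connection.
key = base64.b64encode(os.urandom(16)).decode()
handshake = (
    "GET /chat HTTP/1.1\r\n"
    "Host: ws.example.com\r\n"
    "Upgrade: websocket\r\n"
    "Connection: Upgrade\r\n"
    f"Sec-WebSocket-Key: {key}\r\n"
    "Sec-WebSocket-Version: 13\r\n"
    "\r\n"
)

# The server proves it understood by echoing back SHA-1(key + fixed GUID),
# base64-encoded, in the Sec-WebSocket-Accept header (RFC 6455).
GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"
accept = base64.b64encode(hashlib.sha1((key + GUID).encode()).digest()).decode()
print(handshake)
print("Sec-WebSocket-Accept:", accept)
```

After this exchange the HTTP connection becomes a full-duplex message channel; either side can send frames at any time.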
5. Streaming APIs for Large Model Scenarios: SSE
In large model scenarios, traffic exhibits the following characteristics:
Generation results are incremental: Large models do not produce a complete answer all at once during inference but output token by token.
Response delays are longer: Inference can take seconds or even minutes.
The data volume is large and unpredictable: The outputs of large models are often difficult to estimate in advance, and one-time transmission can cause memory pressure, sudden bandwidth spikes, and even connection timeouts.
The interaction mode is mostly unidirectional: Most large model application scenarios involve the user asking a question and the model responding, rarely requiring real-time bidirectional message exchange.
A large number of connections with high operational requirements: A large model application may simultaneously serve millions of users, requiring lighter, more proxy-friendly, and load-balancing solutions.
Therefore, RESTful is unsuitable, as it requires the client to issue a request, wait for the server to finish computing, and then return results all at once. WebSocket requires bidirectional communication, involving independent protocol upgrades, long connection management, and heartbeat checks; under complex networks (firewalls, proxies, load balancing), WebSocket is more prone to interruption, and its bidirectional capabilities appear redundant, making it not the optimal choice.
SSE (Server-Sent Events) is a one-way streaming transmission mechanism based on HTTP, allowing the server to continuously push event streams to the client through the same connection, naturally fitting scenarios involving dialogue agents.
Advantages
Natural streaming: Supports the server to generate and push data simultaneously, allowing users to see part of the output immediately.
Based on HTTP: Reuses existing HTTP infrastructure, ensuring good compatibility; proxies/load balancers/firewalls are friendly.
Lightweight: The server only needs to keep writing data: blocks, and the client receives updates in real time.
Unidirectional by design: Perfectly matches the large-model scenario of "only needing to output", with no resources wasted on maintaining a bidirectional channel.
Reconnect on disconnection: With Last-Event-ID, the stream can resume from the point of interruption.
Operational Mechanism
The client initiates an HTTP GET request to the server via EventSource;
The server returns Content-Type: text/event-stream and keeps the connection open;
The server continuously pushes events in a streaming format (data: blocks separated by blank lines);
The client processes them one by one, creating a real-time effect.
Example
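A sketch of an SSE stream as the server writes it, plus minimal client-side parsing: each event is a block of "field: value" lines ended by a blank line. The event contents stand in for token-by-token model output:

```python
# The raw bytes the server writes over the open connection.
stream = (
    "id: 1\n"
    "data: Hello\n"
    "\n"
    "id: 2\n"
    "data: world\n"
    "\n"
)

# Client side: split the stream into events, then each event into fields.
events = []
for block in stream.strip().split("\n\n"):
    fields = dict(line.split(": ", 1) for line in block.split("\n"))
    events.append(fields)

print(events)
```

In a browser, the EventSource API performs this parsing automatically and remembers the last id field for reconnection.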
To illustrate the differences between the three: WebSocket is like a phone call, where both parties can interrupt each other at any time. SSE is like a radio broadcast, with the server continuously playing programs and the client tuning in. In contrast, RESTful is like writing a letter, requiring a back-and-forth exchange. Compared to the phone mode of WebSocket, SSE is more lightweight and suitable for unidirectional pushing. For details on the choice between WebSocket and SSE, please read[2].
6. API for MCP Scenarios
Although both interact with large models, the traffic characteristics of MCP scenarios differ somewhat from those of large model clients. Initially, MCP officially used the SSE protocol, but later changed to the Streamable HTTP protocol.
Issues with HTTP + SSE
The transport process of HTTP + SSE involves communication between the client and server through two main channels.
HTTP Request/Response: The client sends messages to the server via standard HTTP requests.
Server-Sent Events (SSE): The server pushes messages to the client through a dedicated /sse endpoint.
This leads to the following three issues:
The server must maintain long connections, resulting in significant resource consumption under high concurrency.
Server messages can only be transmitted via SSE, causing unnecessary complexity and overhead.
Infrastructure compatibility: Many existing network infrastructures may not properly handle long-term SSE connections. Corporate firewalls may forcibly terminate timed-out connections, leading to unreliable services.
Improvements of Streamable HTTP
Streamable HTTP is a significant upgrade of the MCP protocol, addressing multiple critical issues with the original HTTP + SSE transport method through the following improvements:
Unified endpoint: Removed the dedicated `/sse` endpoint for establishing connections, integrating all communication into a unified endpoint.
On-demand streaming: The server can flexibly choose to return standard HTTP responses or stream via SSE.
State management: Introduced a session mechanism to support state management and recovery.
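The "on-demand streaming" idea can be sketched as below: a single endpoint receives a JSON-RPC-style message, and the server chooses, per request, between a plain JSON response and an SSE stream. The handler and message shape are illustrative of the design, not the exact MCP wire format:

```python
import json

def handle(request_body: str, wants_stream: bool):
    """One unified endpoint; the server picks the response style per request."""
    msg = json.loads(request_body)
    if wants_stream:
        # Stream incremental results as SSE events over the same connection.
        return ("text/event-stream",
                f"data: {json.dumps({'id': msg['id'], 'partial': True})}\n\n")
    # Or answer immediately with a standard HTTP JSON response.
    return ("application/json", json.dumps({"id": msg["id"], "result": "ok"}))

ctype, body = handle('{"id": 1, "method": "tools/list"}', wants_stream=False)
print(ctype, body)
```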
Thus, compared to SSE, Streamable HTTP shows significant improvements in stability, performance, and client complexity. For data comparison, please read this article[3].
7. Conclusion and Outlook
The evolution of APIs is a process by which software engineering continuously seeks to address new issues. From the complexities of SOAP, to the simplicity of RESTful, to the flexibility of GraphQL, and from unidirectional HTTP requests to real-time bidirectional communication with WebSocket, and finally to streaming APIs in the context of large models, each form represents a new balance found between performance, flexibility, and real-time capability.
In the future, as AI-native applications continue to diversify, APIs will keep evolving and will give rise to more API-management demands. Recently, Higress open-sourced HiMarket, an out-of-the-box AI open platform aimed at efficiently and uniformly managing RESTful APIs, MCP services, and other interfaces that expose services, data, and applications. You are welcome to try it; for features and a demo, please read this article[3].