Monday, December 17, 2012

Speeding up HTTP with minimal protocol changes

As SPDY works its way through IETF ratification I began wondering whether it was really necessary to add a complex, binary protocol to the HTTP suite to improve HTTP performance. One of the main things that SPDY sets out to fix is defined in the opening paragraph of the SPDY proposal:
"One of the bottlenecks of HTTP implementations is that HTTP relies on multiple connections for concurrency.  This causes several problems, including additional round trips for connection setup, slow-start delays, and connection rationing by the client, where it tries to avoid opening too many connections to any single server.  HTTP pipelining helps some, but only achieves partial multiplexing.  In addition, pipelining has proven non-deployable in existing browsers due to intermediary interference."
The solution to this problem (as currently proposed) is SPDY. But I couldn't help thinking that solving the multiplexing problem could be done in a simpler manner within HTTP itself. And so here is a partial proposal that involves adding two new headers to existing HTTP and nothing more.

1. Overview

   HMURR (pronounced 'hammer') introduces a new pipelining mechanism
   with explicit identifiers used to match requests and responses sent
   on the same TCP connection so that out-of-order responses are
   possible. The current HTTP 1.1 pipelining mechanism requires that
   responses be returned in the same order as requests are made (FIFO)
   which itself introduces a head-of-line blocking problem.

   In addition, HTTP 1.1 pipelining does not allow responses to be
   interleaved. When a response is transmitted the entire response
   must be sent before a later response can be transmitted. HMURR
   introduces a chunking mechanism that allows partial responses to be
   sent. This enables multiple responses to be interleaved on a single
   connection preventing a long response from starving out shorter
   ones.

   HMURR attempts to preserve the existing semantics of HTTP.  All
   features such as cookies, ETags, Vary headers, Content-Encoding
   negotiations, etc. work as they do with HTTP; HMURR simply
   introduces an explicit multiplexing mechanism.

   HMURR introduces two new HTTP headers: one header that is used for
   requests and responses and one that is only present in
   responses. No changes are made to other HTTP headers or HTTP
   responses.

2. HTTP Version

   HMURR is intended as a modification to the existing HTTP standard,
   RFC 2616, and requires a higher HTTP version number. Either
   HTTP 1.2 or HTTP 2.0 would be suitable.

3. HMURR Operation

3.1. Pipelining

   A client that supports persistent connections MAY "pipeline" its
   requests (i.e., send multiple requests without waiting for each
   response). Each request must contain a Request-ID header specifying a
   unique identifier used by the client to identify the request. When
   responding to a request the server will echo the Request-ID header
   with the same value so that the client can match requests and
   responses. This mechanism allows HTTP responses to be returned in any
   order.
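
   As a rough illustration of the matching step, a client could keep a
   table of in-flight requests keyed by Request-ID. This is only a
   sketch: the PendingRequests class, its method names and the
   counter-based ID scheme are assumptions made for the example and
   are not part of the proposal.

```python
import itertools


class PendingRequests:
    """Tracks requests in flight on one connection, keyed by Request-ID."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._pending = {}  # Request-ID value -> request path

    def register(self, path):
        # The draft only requires uniqueness; a per-connection counter is
        # an assumption of this sketch.
        request_id = f"r{next(self._counter)}"
        self._pending[request_id] = path
        return request_id

    def resolve(self, response_headers):
        # The server echoes Request-ID, so arrival order is irrelevant.
        return self._pending[response_headers["Request-ID"]]


pending = PendingRequests()
header_id = pending.register("/header.jpg")
favicon_id = pending.register("/favicon.ico")

# The second request is answered first; the lookup still succeeds.
print(pending.resolve({"Request-ID": favicon_id}))
print(pending.resolve({"Request-ID": header_id}))
```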

   Clients which assume persistent connections and pipeline immediately
   after connection establishment SHOULD be prepared to retry their
   connection if the first pipelined attempt fails. If a client does
   such a retry, it MUST NOT pipeline before it knows the connection is
   persistent. Clients MUST also be prepared to resend their requests if
   the server closes the connection before sending all of the
   corresponding responses.

   Clients SHOULD NOT pipeline requests using non-idempotent methods or
   non-idempotent sequences of methods (see section 9.1.2 of
   RFC2616). Otherwise, a premature termination of the transport
   connection could lead to indeterminate results. A client wishing to
   send a non-idempotent request SHOULD wait to send that request until
   it has received the response status for all previous outstanding
   requests made in the pipeline.
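
   The rule for non-idempotent requests can be sketched as a simple
   gate in the client. The function name and the `outstanding` counter
   are hypothetical, and the sketch checks single methods only; it
   does not try to detect non-idempotent sequences of methods.

```python
# Methods that RFC 2616 section 9.1.2 describes as idempotent.
IDEMPOTENT_METHODS = frozenset(
    {"GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"}
)


def may_send_now(method, outstanding):
    """Return True if a request may join the pipeline immediately.

    `outstanding` is the number of earlier requests whose response
    status has not yet been received.
    """
    if method in IDEMPOTENT_METHODS:
        return True
    # Non-idempotent requests wait until the pipeline has drained.
    return outstanding == 0


print(may_send_now("GET", 3))
print(may_send_now("POST", 3))
print(may_send_now("POST", 0))
```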

3.2. Multiplexed responses

   A server MAY choose to break a response into parts so that a large
   response does not consume the entire TCP connection. This allows
   multiple responses to be returned without any one waiting for another.

   When a response is broken into parts each part will consist of a
   normal HTTP header and body. These parts are called slices. The first
   slice sent in response to an HTTP request MUST either contain a
   Content-Length header or specify Transfer-Encoding: chunked.

   Each slice MUST start with a valid Status-Line (RFC 2616 section 6.1)
   followed by response headers. The first slice MUST have the HTTP
   headers that would be present were the response transmitted
   unsliced. Subsequent slices MUST have only a Slice-Length (but see
   next paragraph) and Request-ID header. The minimal slice will consist
   of a Status-Line and a single Request-ID header.

   In satisfying an HTTP request the server MAY send multiple slices. All
   slices except the last one MUST contain a Slice-Length header
   specifying the number of bytes of content being transmitted in that
   slice. The final slice MUST NOT contain a Slice-Length header; the
   client MUST either use the Content-Length header sent in the first
   slice (if present) or the chunked transfer encoding to determine how
   much data is to be read.
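
   One way a client might apply these rules when deciding how many
   body bytes follow a slice's headers is sketched below. The function
   and its parameters are assumptions for illustration: headers are
   taken to be already parsed into a dict, and the caller is assumed
   to remember the Content-Length from the first slice and the number
   of body bytes received so far.

```python
def slice_body_length(slice_headers, content_length, received_so_far):
    """Return the number of body bytes that follow this slice's headers.

    Returns None when the length has to come from the chunked transfer
    encoding (final slice of a chunked response).
    """
    if "Slice-Length" in slice_headers:
        # Not the final slice: the header gives the length directly.
        return int(slice_headers["Slice-Length"])
    if content_length is not None:
        # Final slice: whatever remains of the announced Content-Length.
        return content_length - received_so_far
    # Final slice of a chunked response: read up to the zero-length chunk.
    return None


first = {"Content-Length": "65343", "Request-ID": "a1", "Slice-Length": "1024"}
last = {"Request-ID": "a1"}
print(slice_body_length(first, 65343, 0))
print(slice_body_length(last, 65343, 1024))
```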

   The HTTP response code MAY change from slice to slice if server
   conditions change. For example, if a server becomes unavailable while
   sending slices in response to a request the Status-Line on the initial
   slice could have indicated 200 OK but a subsequent slice may indicate
   500 Internal Server Error. If the HTTP response code changes the
   server MUST send a complete set of HTTP headers as if it were the
   first slice.

   Since there is no negotiation between client and server about sliced
   responses, a client sending a Request-ID header MUST be prepared to
   handle a sliced response.

3.3. Long responses

   A server MAY choose to use the slice mechanism in section 3.2 to
   implement a long response to a request. For example, a chat client
   could make a single HTTP request for lines of chat and the server
   could use the slice mechanism with chunked transfer encoding to send
   messages when they arrive.

   The client would simply wait for slices to arrive and decode the
   chunks within them. One simple mechanism would be to send a slice
   containing the same number of bytes as the chunk (the chunked encoding
   header would indicate X bytes and the Slice-Length would be X bytes
   plus the chunk header size). The client would then be able to read a
   complete slice containing a complete chunk and use it for rendering.
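
   A server-side sketch of the one-chunk-per-slice idea follows. The
   helper names are hypothetical. It builds a follow-up slice (the
   first slice would also carry Transfer-Encoding: chunked and the
   other response headers), and it assumes Slice-Length counts every
   byte after the headers, so the CRLF that ends the chunk data is
   included along with the chunk header.

```python
def encode_chunk(payload: bytes) -> bytes:
    """Encode one chunk: hex size, CRLF, data, CRLF (RFC 2616 section 3.6.1)."""
    return f"{len(payload):x}\r\n".encode("ascii") + payload + b"\r\n"


def chat_slice(request_id: str, message: bytes) -> bytes:
    """Build a non-final slice carrying exactly one chunk."""
    chunk = encode_chunk(message)
    head = (
        "HTTP/1.2 200 OK\r\n"
        f"Request-ID: {request_id}\r\n"
        f"Slice-Length: {len(chunk)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + chunk


print(chat_slice("c3", b"hello"))
```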

3.4. Example session

   In this example the HTTP version for HMURR is specified as 1.2. It
   shows a client making an initial request for a page without a
   Request-ID, receiving the complete response and then reusing the
   connection to send multiple requests and receive sliced replies in a
   different order on a single TCP connection.

     client                             server

     GET / HTTP/1.2
     Host: example.com
     Connection: keep-alive


                                        HTTP/1.2 200 OK
                                        Content-Length: 1234
                                        Content-Type: text/html
                                        Connection: keep-alive

                                        (1234 bytes of data)


     GET /header.jpg HTTP/1.2
     Host: example.com
     Request-ID: a1

     GET /favicon.ico HTTP/1.2
     Host: example.com
     Request-ID: b2

     GET /hero.jpg HTTP/1.2
     Host: example.com                  HTTP/1.2 200 OK
     Request-ID: c3                     Content-Length: 632
                                        Content-Type: image/jpeg
     GET /iframe.html HTTP/1.2          Request-ID: b2
     Host: example.com
     Request-ID: d4                     (632 bytes of data)


                                        HTTP/1.2 200 OK
                                        Content-Length: 65343
                                        Request-ID: a1
                                        Slice-Length: 1024

                                        (1024 bytes of data)

                                        HTTP/1.2 200 OK
                                        Transfer-Encoding: chunked
                                        Request-ID: c3
                                        Slice-Length: 4957

                                        (4957 bytes of chunked data)

                                        HTTP/1.2 200 OK
                                        Content-Length: 128
                                        Request-ID: d4

                                        (128 bytes of HTML)

                                        HTTP/1.2 200 OK
                                        Request-ID: a1

                                        (64319 bytes of data)

                                        HTTP/1.2 200 OK
                                        Request-ID: c3
                                        Slice-Length: 2354

                                        (2354 bytes of chunked data)

                                        HTTP/1.2 200 OK
                                        Request-ID: c3

                                        (remaining chunked data, ending
                                        with the zero-length chunk)

   In this example, the request for / is satisfied in full without using
   pipelining or slicing. The client then makes requests for four
   resources /header.jpg, /favicon.ico, /hero.jpg and /iframe.html and
   assigns them IDs a1, b2, c3 and d4 respectively.

   Since /favicon.ico (ID b2) is small it is sent while the client is
   generating requests and in full (the Request-ID header is present, but
   Slice-Length is not).

   /header.jpg is sent in two slices. The first has a Slice-Length of
   1024 bytes and specifies the complete Content-Length of the
   resource. The second slice has no Slice-Length header indicating that
   it is the final slice satisfying the request with ID a1.

   /hero.jpg is sent using chunked encoding and in three slices. The
   first two slices each indicate a Slice-Length (of chunked data); the
   third has no Slice-Length, so the client reads the rest of the
   chunked data (which must include the zero-length final chunk).

   /iframe.html is small and is satisfied with a non-sliced response.
   Responses are delivered in the order that is convenient for the server
   and using slicing to prevent starvation. Since the client needs the /
   resource in its entirety before continuing it does not send a
   Request-ID header and receives the complete response.
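
   The pipelined part of the session can be replayed as a small
   bookkeeping exercise to check which request finishes when. The
   replay function and the slice list are an illustration only. Byte
   counts are taken from the example, except for the last c3 slice,
   whose size the example does not give; the 5 used here is an assumed
   figure standing for a bare terminating chunk.

```python
# (headers, body bytes) for each slice, in the order the server sent them.
EXAMPLE_SLICES = [
    ({"Request-ID": "b2", "Content-Length": "632"}, 632),
    ({"Request-ID": "a1", "Content-Length": "65343", "Slice-Length": "1024"}, 1024),
    ({"Request-ID": "c3", "Transfer-Encoding": "chunked", "Slice-Length": "4957"}, 4957),
    ({"Request-ID": "d4", "Content-Length": "128"}, 128),
    ({"Request-ID": "a1"}, 64319),
    ({"Request-ID": "c3", "Slice-Length": "2354"}, 2354),
    ({"Request-ID": "c3"}, 5),  # assumed size, not given in the example
]


def replay(slices):
    """Return (completion order, bytes received per Request-ID)."""
    received = {}
    finished = []
    for headers, body_len in slices:
        request_id = headers["Request-ID"]
        received[request_id] = received.get(request_id, 0) + body_len
        # A slice without Slice-Length is the final one for its request.
        if "Slice-Length" not in headers:
            finished.append(request_id)
    return finished, received


finished, received = replay(EXAMPLE_SLICES)
print(finished)
print(received["a1"])
```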

4. Header Definitions

   This section defines the syntax and semantics of additional HTTP
   headers added with HMURR to the standard HTTP/1.1 header fields.

4.1. Request-ID

   The Request-ID is added to the HTTP request headers generated by a
   client to indicate that it intends to use HMURR and to uniquely
   identify the request.

      Request-ID = "Request-ID" ":" unique-request-tag

   When responding to the request the origin-server MUST insert a
   Request-ID header with the corresponding unique-request-tag so that
   the client can match requests and responses.

4.2. Slice-Length

   The Slice-Length response-header is added to a response by the
   origin-server to indicate the length of content that follows the HTTP
   response headers.

      Slice-Length = "Slice-Length" ":" 1*DIGIT

   If this header is missing it indicates that the entire (or remaining
   unsent) response-body is being transmitted with this set of HTTP
   headers. If present it indicates the number of bytes of response that
   are being transmitted. The client MUST use the Content-Length to
   determine the total length expected, or if chunked transfer encoding
   is used the client MUST use the chunked encoding header to determine
   the end of the content.
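
   A strict reading of the 1*DIGIT grammar could be checked as in the
   sketch below. The function name is hypothetical, and trimming
   spaces and tabs around the value is an assumption based on how
   HTTP/1.1 treats whitespace around field values.

```python
import re

# 1*DIGIT: one or more ASCII digits (not \d, which also matches
# non-ASCII digits in Python).
_DIGITS = re.compile(r"[0-9]+")


def parse_slice_length(value: str) -> int:
    """Parse a Slice-Length field value, rejecting anything but digits."""
    value = value.strip(" \t")
    if not _DIGITS.fullmatch(value):
        raise ValueError(f"invalid Slice-Length: {value!r}")
    return int(value)


print(parse_slice_length(" 1024"))
```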

Obviously, this proposal does not provide all the functionality of SPDY (such as a forced TLS connection, header compression or built-in server push), but it does deal with connection multiplexing in a simple, textual manner.

There are probably reasons (that I've overlooked) why my proposal is a bad idea; what are they?
