Monday, December 17, 2012

Speeding up HTTP with minimal protocol changes

As SPDY works its way through IETF ratification I began wondering whether it was really necessary to add a complex, binary protocol to the HTTP suite to improve HTTP performance. One of the main things that SPDY sets out to fix is defined in the opening paragraph of the SPDY proposal:
One of the bottlenecks of HTTP implementations is that HTTP relies on multiple connections for concurrency.  This causes several problems, including additional round trips for connection setup, slow-start delays, and connection rationing by the client, where it tries to avoid opening too many connections to any single server.  HTTP pipelining helps some, but only achieves partial multiplexing.  In addition, pipelining has proven non-deployable in existing browsers due to intermediary interference.
The solution to this problem (as currently proposed) is SPDY. But I couldn't help thinking that solving the multiplexing problem could be done in a simpler manner within HTTP itself. And so here is a partial proposal that involves adding two new headers to existing HTTP and nothing more.

1.  Overview

   HMURR (pronounced 'hammer') introduces a new pipelining mechanism
   with explicit identifiers used to match requests and responses sent
   on the same TCP connection so that out-of-order responses are
   possible. The current HTTP 1.1 pipelining mechanism requires that
   responses be returned in the same order as requests are made (FIFO)
   which itself introduces a head-of-line blocking problem.

   In addition, HTTP 1.1 pipelining does not allow responses to be
   interleaved. When a response is transmitted the entire response
   must be sent before a later response can be transmitted. HMURR
   introduces a chunking mechanism that allows partial responses to be
   sent. This enables multiple responses to be interleaved on a single
   connection, preventing a long response from starving out shorter
   ones.

   HMURR attempts to preserve the existing semantics of HTTP.  All
   features such as cookies, ETags, Vary headers, Content-Encoding
   negotiations, etc. work as they do with HTTP; HMURR simply
   introduces an explicit multiplexing mechanism.

   HMURR introduces two new HTTP headers: one header that is used for
   requests and responses and one that is only present in
   responses. No changes are made to other HTTP headers or HTTP

2. HTTP Version

   It is intended that HMURR be a modification to the existing HTTP
   standard (RFC 2616) and that it require a higher HTTP version number.
   Either HTTP 1.2 or HTTP 2.0 would be suitable.

3. HMURR Operation

3.1. Pipelining

   A client that supports persistent connections MAY "pipeline" its
   requests (i.e., send multiple requests without waiting for each
   response). Each request must contain a Request-ID header specifying a
   unique identifier used by the client to identify the request. When
   responding to a request the server will echo the Request-ID header
   with the same value so that the client can match requests and
   responses. This mechanism allows HTTP responses to be returned in any
   order.

   Clients which assume persistent connections and pipeline immediately
   after connection establishment SHOULD be prepared to retry their
   connection if the first pipelined attempt fails. If a client does
   such a retry, it MUST NOT pipeline before it knows the connection is
   persistent. Clients MUST also be prepared to resend their requests if
   the server closes the connection before sending all of the
   corresponding responses.

   Clients SHOULD NOT pipeline requests using non-idempotent methods or
   non-idempotent sequences of methods (see section 9.1.2 of
   RFC2616). Otherwise, a premature termination of the transport
   connection could lead to indeterminate results. A client wishing to
   send a non-idempotent request SHOULD wait to send that request until
   it has received the response status for all previous outstanding
   requests made in the pipeline.
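
   As a minimal sketch of the client side of this section, the code
   below tags each pipelined request with a Request-ID and keeps a table
   of outstanding requests so that responses can be matched in any order
   and unanswered requests can be resent if the connection closes. The
   Pipeline class, its method names and the "r1", "r2" ID scheme are
   assumptions made for illustration; the proposal only requires that
   each ID be unique.

```python
import itertools


class Pipeline:
    """Hypothetical helper that tracks pipelined requests by Request-ID."""

    def __init__(self):
        self._counter = itertools.count(1)
        self.pending = {}  # Request-ID -> path of the outstanding request

    def build_request(self, path, host):
        """Return the bytes of a GET request carrying a fresh Request-ID."""
        request_id = f"r{next(self._counter)}"
        self.pending[request_id] = path
        return (
            f"GET {path} HTTP/1.2\r\n"
            f"Host: {host}\r\n"
            f"Request-ID: {request_id}\r\n"
            "\r\n"
        ).encode("ascii")

    def match(self, response_headers):
        """Look up which request a response (or slice) belongs to."""
        return self.pending[response_headers["Request-ID"]]

    def complete(self, request_id):
        """Forget a request once its final slice has been read."""
        del self.pending[request_id]

    def unanswered(self):
        """Paths to resend if the server closes the connection early."""
        return list(self.pending.values())
```

   A client would call build_request for each idempotent request, call
   match as response headers arrive, and fall back to unanswered() after
   a premature close.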

3.2. Multiplexed responses

   A server may choose to break a response into parts so that a large
   response does not consume the entire TCP connection. This allows
   multiple responses to be returned without any one waiting for another.

   When a response is broken into parts each part will consist of a
   normal HTTP header and body. These parts are called slices. The first
   slice sent in response to an HTTP request MUST either contain a
   Content-Length header or specify Transfer-Encoding: chunked.

   Each slice MUST start with a valid Status-Line (RFC 2616 section 6.1)
   followed by response headers. The first slice MUST have the HTTP
   headers that would be present were the response transmitted
   unsliced. Subsequent slices MUST have only a Slice-Length (but see
   next paragraph) and Request-ID header. The minimal slice will consist
   of a Status-Line and a single Request-ID header.

   In satisfying an HTTP request the server MAY send multiple slices. All
   slices except the last one MUST contain a Slice-Length header
   specifying the number of bytes of content being transmitted in that
   slice. The final slice MUST NOT contain a Slice-Length header; the
   client MUST either use the Content-Length header sent in the first
   slice (if present) or the chunked transfer encoding to determine how
   much data is to be read.

   The HTTP response code MAY change from slice to slice if server
   conditions change. For example, if a server becomes unavailable while
   sending slices in response to a request the Status-Line on the initial
   slice could have indicated 200 OK but a subsequent slice may indicate
   500 Internal Server Error. If the HTTP response code changes the
   server MUST send a complete set of HTTP headers as if it were the
   first slice.

   Since there is no negotiation between client and server about sliced
   responses, a client sending a Request-ID header MUST be prepared to
   handle a sliced response.
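
   The slicing rules above can be sketched as a small reassembly
   routine. This is only an illustration under simplifying assumptions:
   the whole connection's bytes are already in memory, every response
   uses Content-Length (chunked responses are rejected), and the status
   code does not change between slices. The function name read_slices is
   hypothetical.

```python
def read_slices(stream):
    """Reassemble Content-Length responses from a byte string of slices.

    Returns a dict mapping each Request-ID to its complete body.
    """
    bodies = {}    # Request-ID -> bytearray of content received so far
    expected = {}  # Request-ID -> Content-Length taken from the first slice
    pos = 0
    while pos < len(stream):
        # Headers end at the first blank line after the Status-Line.
        end = stream.index(b"\r\n\r\n", pos)
        lines = stream[pos:end].decode("ascii").split("\r\n")
        headers = {}
        for line in lines[1:]:  # lines[0] is the Status-Line
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        pos = end + 4

        request_id = headers["request-id"]
        if "transfer-encoding" in headers:
            raise NotImplementedError("chunked slices are outside this sketch")
        if request_id not in bodies:
            # First slice for this request: it carries the full headers.
            bodies[request_id] = bytearray()
            expected[request_id] = int(headers["content-length"])

        if "slice-length" in headers:
            # Non-final slice: read exactly Slice-Length bytes.
            length = int(headers["slice-length"])
        else:
            # Final slice: read whatever Content-Length says is left.
            length = expected[request_id] - len(bodies[request_id])
        bodies[request_id] += stream[pos:pos + length]
        pos += length
    return {rid: bytes(body) for rid, body in bodies.items()}
```

   Because each slice's body is skipped by length rather than scanned,
   content that happens to contain a blank line does not confuse the
   header search for the next slice.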

3.3. Long responses

   A server MAY choose to use the slice mechanism in section 3.2 to
   implement a long response to a request. For example, a chat client
   could make a single HTTP request for lines of chat and the server
   could use the slice mechanism with chunked transfer encoding to send
   messages when they arrive.

   The client would simply wait for slices to arrive and decode the
   chunks within them. One simple mechanism would be to send a slice
   containing the same number of bytes as the chunk (the chunked encoding
   header would indicate X bytes and the Slice-Length would be X bytes
   plus the chunk header size). The client would then be able to read a
   complete slice containing a complete chunk and use it for rendering.
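
   A minimal sketch of the one-chunk-per-slice idea follows. It assumes
   the first slice (with Transfer-Encoding: chunked) has already been
   sent, and it counts the chunk's trailing CRLF as part of the framing
   that Slice-Length must cover. The function name chunk_slice is
   hypothetical.

```python
def chunk_slice(request_id, message):
    """Wrap one chat message as a single chunk inside a single slice."""
    # A chunk is: size in hex, CRLF, the data, CRLF (RFC 2616 section 3.6.1).
    chunk = f"{len(message):x}\r\n".encode("ascii") + message + b"\r\n"
    # Slice-Length covers the whole encoded chunk, framing included.
    head = (
        "HTTP/1.2 200 OK\r\n"
        f"Request-ID: {request_id}\r\n"
        f"Slice-Length: {len(chunk)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + chunk
```

   For a 5-byte message the chunk is "5", CRLF, the 5 bytes and a final
   CRLF, so the Slice-Length is 10.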

3.4. Example session

   In this example the HTTP version for HMURR is specified as 1.2. It
   shows a client making an initial request for a page without a
   Request-ID, receiving the complete response and then reusing the
   connection to send multiple requests and receive sliced replies in a
   different order on a single TCP connection.

     client                             server

     GET / HTTP/1.2
     Connection: keep-alive

                                        HTTP/1.2 200 OK
                                        Content-Length: 1234
                                        Content-Type: text/html
                                        Connection: keep-alive

                                        (1234 bytes of data)

     GET /header.jpg HTTP/1.2
     Request-ID: a1

     GET /favicon.ico HTTP/1.2
     Request-ID: b2

     GET /hero.jpg HTTP/1.2
     Host:                              HTTP/1.2 200 OK
     Request-ID: c3                     Content-Length: 632
                                        Content-Type: image/jpeg
     GET /iframe.html HTTP/1.2          Request-ID: b2
     Request-ID: d4                     (632 bytes of data)

                                        HTTP/1.2 200 OK
                                        Content-Length: 65343
                                        Request-ID: a1
                                        Slice-Length: 1024

                                        (1024 bytes of data)

                                        HTTP/1.2 200 OK
                                        Transfer-Encoding: chunked
                                        Request-ID: c3
                                        Slice-Length: 4957

                                        (4957 bytes of chunked data)

                                        HTTP/1.2 200 OK
                                        Content-Length: 128
                                        Request-ID: d4

                                        (128 bytes of HTML)

                                        HTTP/1.2 200 OK
                                        Request-ID: a1

                                        (64319 bytes of data)

                                        HTTP/1.2 200 OK
                                        Request-ID: c3
                                        Slice-Length: 2354

                                        (2354 bytes of chunked data)

                                        HTTP/1.2 200 OK
                                        Request-ID: c3

                                        (chunked data that includes the
                                        0-length block indicating end)

   In this example, the request for / is satisfied in full without using
   pipelining or slicing. The client then makes requests for four
   resources /header.jpg, /favicon.ico, /hero.jpg and /iframe.html and
   assigns them IDs a1, b2, c3 and d4 respectively.

   Since /favicon.ico (ID b2) is small it is sent while the client is
   generating requests and in full (the Request-ID header is present, but
   Slice-Length is not).

   /header.jpg is sent in two slices. The first has a Slice-Length of
   1024 bytes and specifies the complete Content-Length of the
   resource. The second slice has no Slice-Length header indicating that
   it is the final slice satisfying the request with ID a1.

   /hero.jpg is sent using chunked encoding and in two slices. The first
   slice indicates a Slice-Length (of chunked data) and the second slice
   has no Slice-Length and the client reads the rest of the chunked data
   (which must include the 0 length final chunked block).

   /iframe.html is small and is satisfied with a non-sliced response.
   Responses are delivered in the order that is convenient for the server
   and using slicing to prevent starvation. Since the client needs the /
   resource in its entirety before continuing it does not send a
   Request-ID header and receives the complete response.
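
   The byte counts for /header.jpg in the example can be checked with a
   line of arithmetic: the final slice carries whatever the
   Content-Length leaves over after the earlier Slice-Length values. The
   helper name final_slice_size is hypothetical.

```python
def final_slice_size(content_length, slice_lengths):
    """Bytes the client must read from the final (Slice-Length-free) slice."""
    return content_length - sum(slice_lengths)


# /header.jpg: Content-Length 65343, one earlier slice of 1024 bytes.
remaining = final_slice_size(65343, [1024])
```

   Here remaining is 64319, which matches the size of the second a1
   slice in the session above.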

4. Header Definitions

   This section defines the syntax and semantics of additional HTTP
   headers added with HMURR to the standard HTTP/1.1 header fields.

4.1. Request-ID

   The Request-ID is added to the HTTP request headers generated by a
   client to indicate that it intends to use HMURR and to uniquely
   identify the request.

      Request-ID = "Request-ID" ":" unique-request-tag

   When responding to the request the origin-server MUST insert a
   Request-ID header with the corresponding unique-request-tag so that
   the client can match requests and responses.

4.2. Slice-Length

   The Slice-Length response-header is added to a response by the
   origin-server to indicate the length of content that follows the HTTP
   response headers.

      Slice-Length = "Slice-Length" ":" 1*DIGIT

   If this header is missing it indicates that the entire (or remaining
   unsent) response-body is being transmitted with this set of HTTP
   headers. If present it indicates the number of bytes of response that
   are being transmitted. The client MUST use the Content-Length to
   determine the total length expected, or if chunked transfer encoding
   is used the client MUST use the chunked encoding header to determine
   the end of the content.
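
   A strict reading of the 1*DIGIT grammar can be sketched as below. The
   function name parse_slice_length is hypothetical, and trimming the
   surrounding whitespace is an assumption borrowed from ordinary HTTP
   header handling. A bare int() call is avoided because it also accepts
   forms such as "+5" and "1_000" that the grammar does not allow.

```python
import re

# 1*DIGIT: one or more ASCII digits and nothing else.
_DIGITS = re.compile(r"[0-9]+")


def parse_slice_length(value):
    """Return the Slice-Length as an int, or raise ValueError if malformed."""
    value = value.strip()
    if not _DIGITS.fullmatch(value):
        raise ValueError(f"malformed Slice-Length: {value!r}")
    return int(value)
```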

Obviously, this proposal does not provide all the functionality of SPDY (such as a forced TLS connection, header compression or built-in server push), but it does deal with connection multiplexing in a simple, textual manner.

There are probably reasons (that I've overlooked) why my proposal is a bad idea; what are they?


Dmitrii Dimandt said...

If anything, your post shows how awesome HTTP is.

skyborne said...

In no particular order:

It's my understanding that HTTP/2.0 would be inappropriate for this, as the rest of HTTP/1.1 (including pipelining) is unaffected by slicing.

HMURR is capitalized like an acronym, but isn't expanded. Geeks will want to know. (I sure do.) Http Multiplexing ... something.

Section 3.1, "When responding to a request the server will each the Request-ID header" ... should the final "each" be "echo"?

But other than that: I like it. I think HTTPS Everywhere, should its site list be turned into an API like phishing/malware protection is, would make forced TLS a non-issue. You would achieve higher security even on sites that don't want to update their server software to support http/1.2 or SPDY.

Brandon Siegel said...

The only issue I see with this is how does the client handle the situation where a slice is delivered out-of-order without attaching explicit sequence numbers to the slices? The client actually has no way of knowing that a slice may have been delivered out-of-order, and could end up stitching together a corrupted response. If you are going to permit sliced responses, I don't see a way of avoiding assigning explicit sequence numbers to the response slices.

Cliff Spradlin said...

I totally agree with you, and I proposed the same thing the day that SPDY was announced.!topic/chromium-discuss/PtszJy6q9b4

They replied saying that basically the only thing they cared about is speed, regardless of complexity. I didn't have data to show which solution is faster.

The lack of a request-id header seems like a simple oversight in the spec that could be fixed.

Patrick McManus said...

Your premise seems to be that SPDY is a complex binary protocol and we don't need that to achieve mux. I don't see much evidence for your complexity assertion - especially when compared with HTTP/1.

Consider the difference in finding the message boundaries of a SPDY message with an HTTP/1.1 one.

To process the SPDY message you
* buffer 8 bytes of data
* determine that it is a data packet by confirming ((buf[0] & 0x80) == 0)
* determine the length of the data by ntohl(((uint32_t *)buf)[1]) & 0x00ffffff
* find out if there are more chunks to come by looking at (buf[4] & 1).

All done. It is very straightforward, efficient, well bounded, and testable.

Contrast that with parsing a HTTP/1 message.

* read "enough" data from the network. You don't know how much that is and if you read too much you have to implement some facility for "putting it back" so the next message can use it. This inevitably leads to streaming parsers and their inherent complexity and lack of efficiency.

* parse the headers so you can determine which message delimiter is being used. To parse the header you have to implement line folding, implement a tokenizer aware of various rules around quotation marks and colons, and adapt to a number of real-world variations in the use of line endings other than just CRLF. To do this you have to run a state machine against every byte of input rather than directly address fixed offsets.

* Implement strategies to deal with conflicts such as "Content-Length: 42, 17" and "Content-Length: 95\r\nTransfer-Encoding: chunked\r\n"

* you need to implement an ascii to integer conversion routine to determine the status code because some status codes implicitly impact message delimiters (e.g. 304).

* now you have to implement no less than 5 message delimiting schemes - chunked encodings, EOF, content-length, implicit (304), and everyone's favorite multipart/byteranges.

* If you have content-length or a chunk you'll need to convert a text string to an integer again.. and http/1 doesn't bound the size of the text string so you'll either have a common bug with an overflow or you'll implement an undiscoverable constraint in your implementation leading to cases of failing interop.

* you'll still have exposure to a whole class of CRLF injection attacks inherent in the text format.

The binary framing is so much less complex and a significant improvement. Sure, to the naked eye in a log it may not look that way, but that is optimizing for all the wrong things and can be quite misleading (do you really interpret an HTTP header with line folding or \n\r instead of \r\n sequences correctly when eyeballing it?).

Now there certainly is some complexity in SPDY but it doesn't come, in my opinion, from the binary framing that you're talking about here. That is a significant simplification over HTTP/1.

Bill said...

Missing a word in your proposed spec:

When responding to a request the server will [verb-missing-here] each the Request-ID header with the same value so that the client can match requests and responses.

Pelican said...

Cool Gripe :-) seems to have a much simpler way of muxing (MUX) and session control than the existing way of HTTP/2.0

Tom Dooner said...

Cool proposal, I'm confident that it could work.

But one thing -- SPDY yields a *massive* speed boost from HTTP header compression. Especially with the size of cookies for some sites. It seems like your proposal would worsen the overhead by requiring lots of headers to be sent with every chunk of data (whereas SPDY streams assume the headers for subsequent bits of data).

I'm not sure the performance implications of this, but it seems like you might be sacrificing speed for conceptual clarity. Fine with me, but some performance measurements are definitely in order.

jcdickinson said...

Slight change to decrease the amount of work old clients/servers need to do:

GET / HTTP/1.1
Connection: keep-alive, upgrade
Upgrade: Upgrade

Being presented with this the (compliant) server would:

HTTP/1.1 101 Switching Protocols
Upgrade: HTTP/1.1, HMURR/1.0
Connection: Upgrade, keep-alive

HTTP/1.1 200 OK
Content-Length: 1234
Content-Type: text/html

Allowing you to do the handshake in exactly one request/response pair (while, in theory, maintaining compatibility with the current TLS upgrade system).

Good work though! I am a massive fan of binary protocols, but this doesn't seem like such a bad compromise.

jcdickinson said...

Uh, make that:

Upgrade: HMURR/1.0

In the first sample.

DZONEMVB said...

Hi John,

I'm a Content Curator for and I think this post would be an excellent candidate for re-syndication on our Performance Zone. If you're interested, drop me a line at allenc[at]dzone[dot]com!