## Monday, December 17, 2012

### Speeding up HTTP with minimal protocol changes

As SPDY works its way through IETF ratification, I began wondering whether it is really necessary to add a complex, binary protocol to the HTTP suite to improve HTTP performance. One of the main things that SPDY sets out to fix is described in the opening paragraph of the SPDY proposal:
> One of the bottlenecks of HTTP implementations is that HTTP relies on multiple connections for concurrency. This causes several problems, including additional round trips for connection setup, slow-start delays, and connection rationing by the client, where it tries to avoid opening too many connections to any single server. HTTP pipelining helps some, but only achieves partial multiplexing. In addition, pipelining has proven non-deployable in existing browsers due to intermediary interference.
The solution to this problem (as currently proposed) is SPDY. But I couldn't help thinking that solving the multiplexing problem could be done in a simpler manner within HTTP itself. And so here is a partial proposal that involves adding two new headers to existing HTTP and nothing more.

1.  Overview

HMURR (pronounced 'hammer') introduces a new pipelining mechanism
with explicit identifiers used to match requests and responses sent
on the same TCP connection so that out-of-order responses are
possible. The current HTTP 1.1 pipelining mechanism requires that
responses be returned in the same order as requests are made (FIFO)
which itself introduces a head-of-line blocking problem.

In addition, HTTP 1.1 pipelining does not allow responses to be
interleaved. When a response is transmitted the entire response
must be sent before a later response can be transmitted. HMURR
introduces a chunking mechanism that allows partial responses to be
sent. This enables multiple responses to be interleaved on a single
connection preventing a long response from starving out shorter
ones.

HMURR attempts to preserve the existing semantics of HTTP.  All
negotiations, etc. work as they do with HTTP; HMURR simply
introduces an explicit multiplexing mechanism.

HMURR introduces two new HTTP headers: one header that is used for
requests and responses and one that is only present in
responses.

2. HTTP Version

It is intended that HMURR be a modification to the existing HTTP
standard RFC 2616 and requires a higher HTTP version number. Either
HTTP 1.2 or HTTP 2.0 would be suitable.

3. HMURR Operation

3.1. Pipelining

A client that supports persistent connections MAY "pipeline" its
requests (i.e., send multiple requests without waiting for each
response). Each request MUST contain a Request-ID header specifying a
unique identifier used by the client to identify the request. When
responding to a request the server will echo the Request-ID header
with the same value so that the client can match requests and
responses. This mechanism allows HTTP responses to be returned in any
order.
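
The bookkeeping this enables on the client side can be sketched as
follows (a non-normative illustration; the class name, tags and URLs
are invented for the example):

```python
# Minimal sketch of client-side request/response matching under HMURR:
# outstanding requests are tracked by their unique Request-ID tag, so
# responses can be matched back regardless of arrival order.

class PipelineTracker:
    def __init__(self):
        self.pending = {}  # Request-ID -> URL still awaiting a response

    def send(self, request_id, url):
        # In a real client this would write the request (carrying its
        # Request-ID header) to the shared TCP connection.
        self.pending[request_id] = url

    def receive(self, request_id, body):
        # The server echoes Request-ID, so out-of-order delivery is fine.
        url = self.pending.pop(request_id)
        return (url, body)

tracker = PipelineTracker()
tracker.send("a1", "/header.jpg")
tracker.send("b2", "/favicon.ico")
# Responses may arrive in any order:
print(tracker.receive("b2", b"icon-bytes"))   # ('/favicon.ico', b'icon-bytes')
print(tracker.receive("a1", b"jpeg-bytes"))   # ('/header.jpg', b'jpeg-bytes')
```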

Clients which assume persistent connections and pipeline immediately
after connection establishment SHOULD be prepared to retry their
connection if the first pipelined attempt fails. If a client does
such a retry, it MUST NOT pipeline before it knows the connection is
persistent. Clients MUST also be prepared to resend their requests if
the server closes the connection before sending all of the
corresponding responses.

Clients SHOULD NOT pipeline requests using non-idempotent methods or
non-idempotent sequences of methods (see section 9.1.2 of
RFC2616). Otherwise, a premature termination of the transport
connection could lead to indeterminate results. A client wishing to
send a non-idempotent request SHOULD wait to send that request until
it has received the response status for all previous outstanding
requests.

3.2. Multiplexed responses

A server MAY choose to break a response into parts so that a large
response does not consume the entire TCP connection. This allows
multiple responses to be returned without any one waiting for another.

When a response is broken into parts each part will consist of a
normal HTTP header and body. These parts are called slices. The first
slice sent in response to an HTTP request MUST contain either a
Content-Length or specify Transfer-Encoding: chunked.

Each slice MUST start with a valid Status-Line (RFC 2616 section 6.1)
followed by response headers. The first slice MUST have the HTTP
headers that would be present were the response transmitted
unsliced. Subsequent slices MUST have only a Slice-Length (but see
next paragraph) and Request-ID header. The minimal slice will consist
of a Status-Line and a single Request-ID header.

In satisfying an HTTP request the server MAY send multiple slices. All
slices except the last one MUST contain a Slice-Length header
specifying the number of bytes of content being transmitted in that
slice. The final slice MUST NOT contain a Slice-Length header; the
client MUST either use the Content-Length header sent in the first
slice (if present) or the chunked transfer encoding to determine how
much data is to be read.
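
As a non-normative sketch, a client could reassemble the slices
described above like this (slices are simplified here to pairs of a
parsed header dictionary and payload bytes):

```python
# Reassembling a sliced HMURR response on the client. The first slice
# carries Content-Length; every slice except the last carries
# Slice-Length; the final slice's size is whatever remains of the
# declared content length.

def reassemble(slices):
    first_headers, _ = slices[0]
    total = int(first_headers["Content-Length"])
    body = b""
    for headers, payload in slices:
        if "Slice-Length" in headers:
            # Non-final slice: payload length is declared explicitly.
            assert len(payload) == int(headers["Slice-Length"])
        body += payload
    # Final slice has no Slice-Length: Content-Length bounds the total.
    assert len(body) == total, "reassembled body must match Content-Length"
    return body

slices = [
    ({"Content-Length": "10", "Request-ID": "a1", "Slice-Length": "4"}, b"abcd"),
    ({"Request-ID": "a1"}, b"efghij"),  # final slice: no Slice-Length
]
print(reassemble(slices))  # b'abcdefghij'
```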

The HTTP response code MAY change from slice to slice if server
conditions change. For example, if a server becomes unavailable while
sending slices in response to a request the Status-Line on the initial
slice could have indicated 200 OK but a subsequent slice may indicate
500 Internal Server Error. If the HTTP response code changes the
server MUST send a complete set of HTTP headers as if it were the
first slice.

Since there is no negotiation between client and server about sliced
responses, a client sending a Request-ID header MUST be prepared to
handle a sliced response.

3.3. Long responses

A server MAY choose to use the slice mechanism in section 3.2 to
implement a long response to a request. For example, a chat client
could make a single HTTP request for lines of chat and the server
could use the slice mechanism with chunked transfer encoding to send
messages when they arrive.

The client would simply wait for slices to arrive and decode the
chunks within them. One simple mechanism would be to send a slice
containing the same number of bytes as the chunk (the chunked encoding
header would indicate X bytes and the Slice-Length would be X bytes
plus the chunk header size). The client would then be able to read a
complete slice containing a complete chunk and use it for rendering.
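
The arithmetic for such chunk-aligned slices can be illustrated as
follows (a sketch; the helper name is invented):

```python
# One chunk of X payload bytes is framed as "<hex size>\r\n<data>\r\n",
# so a slice carrying exactly one chunk has a Slice-Length of X plus
# the chunk header and trailer overhead.

def chunk_frame(data):
    return b"%x\r\n" % len(data) + data + b"\r\n"

message = b"hello, chat"        # X = 11 payload bytes
frame = chunk_frame(message)    # b'b\r\nhello, chat\r\n'
slice_length = len(frame)       # X plus framing: 11 + 3 + 2
print(slice_length)             # 16
```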

3.4. Example session

In this example the HTTP version for HMURR is specified as 1.2. It
shows a client making an initial request for a page without a
Request-ID, receiving the complete response and then reusing the
connection to send multiple requests and receive sliced replies in a
different order on a single TCP connection.

client                             server

GET / HTTP/1.2
Host: example.com
Connection: keep-alive

                                   HTTP/1.2 200 OK
                                   Content-Length: 1234
                                   Content-Type: text/html
                                   Connection: keep-alive

                                   (1234 bytes of data)

GET /header.jpg HTTP/1.2
Host: example.com
Request-ID: a1

GET /favicon.ico HTTP/1.2
Host: example.com
Request-ID: b2

GET /hero.jpg HTTP/1.2
Host: example.com                  HTTP/1.2 200 OK
Request-ID: c3                     Content-Length: 632
                                   Content-Type: image/jpeg
GET /iframe.html HTTP/1.2          Request-ID: b2
Host: example.com
Request-ID: d4                     (632 bytes of data)

                                   HTTP/1.2 200 OK
                                   Content-Length: 65343
                                   Request-ID: a1
                                   Slice-Length: 1024

                                   (1024 bytes of data)

                                   HTTP/1.2 200 OK
                                   Transfer-Encoding: chunked
                                   Request-ID: c3
                                   Slice-Length: 4957

                                   (4957 bytes of chunked data)

                                   HTTP/1.2 200 OK
                                   Content-Length: 128
                                   Request-ID: d4

                                   (128 bytes of HTML)

                                   HTTP/1.2 200 OK
                                   Request-ID: a1

                                   (64319 bytes of data)

                                   HTTP/1.2 200 OK
                                   Request-ID: c3
                                   Slice-Length: 2354

                                   (2354 bytes of chunked data)

                                   HTTP/1.2 200 OK
                                   Request-ID: c3

                                   (chunked data that includes the
                                   0-length block indicating the end)

In this example, the request for / is satisfied in full without using
pipelining or slicing. The client then makes requests for four
resources /header.jpg, /favicon.ico, /hero.jpg and /iframe.html and
assigns them IDs a1, b2, c3 and d4 respectively.

Since /favicon.ico (ID b2) is small it is sent in full while the
client is still generating requests (the Request-ID header is present,
but Slice-Length is not).

/header.jpg is sent in two slices. The first has a Slice-Length of
1024 bytes and specifies the complete Content-Length of the
resource. The second slice has no Slice-Length header indicating that
it is the final slice satisfying the request with ID a1.

/hero.jpg is sent using chunked encoding and in two slices. The first
slice indicates a Slice-Length (of chunked data) and the second slice
has no Slice-Length, so the client reads the rest of the chunked data
(which must include the 0-length final chunked block).

/iframe.html is small and is satisfied with a non-sliced response.
Responses are delivered in the order that is convenient for the
server, using slicing to prevent starvation. Since the client needs
the / resource in its entirety before continuing, it does not send a
Request-ID with that initial request.
4. Header Definitions

This section defines the syntax and semantics of additional HTTP
headers introduced by HMURR.

4.1. Request-ID

The Request-ID is added to the HTTP request headers generated by a
client to indicate that it intends to use HMURR and to uniquely
identify the request.

Request-ID = "Request-ID" ":" unique-request-tag

When responding to the request the origin-server MUST insert a
Request-ID header with the corresponding unique-request-tag so that
the client can match requests and responses.

4.2. Slice-Length

The Slice-Length header is added to HTTP response headers by the
origin-server to indicate the length of content that follows the HTTP
headers.

Slice-Length = "Slice-Length" ":" 1*DIGIT

If this header is missing it indicates that the entire (or remaining
unsent) response-body is being transmitted with this set of HTTP
headers. If present it indicates the number of bytes of response that
are being transmitted. The client MUST use the Content-Length to
determine the total length expected, or if chunked transfer encoding
is used the client MUST use the chunked encoding header to determine
the end of the content.
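
To make the header's use concrete, here is a non-normative sketch of
a server slicing a response body according to the rules in section 3.2
(sizes, tags and the helper name are illustrative):

```python
# Cut a response body into slices. Only the first slice carries the
# full headers (including Content-Length); intermediate slices carry
# Slice-Length; the final slice omits Slice-Length so the client falls
# back on Content-Length to know when the response is complete.

def emit_slices(request_id, body, slice_size):
    out = []
    offset = 0
    first = True
    while offset < len(body):
        part = body[offset:offset + slice_size]
        offset += slice_size
        last = offset >= len(body)
        headers = [b"HTTP/1.2 200 OK"]
        if first:
            headers.append(b"Content-Length: %d" % len(body))
            first = False
        headers.append(b"Request-ID: " + request_id)
        if not last:
            headers.append(b"Slice-Length: %d" % len(part))
        out.append(b"\r\n".join(headers) + b"\r\n\r\n" + part)
    return out

for s in emit_slices(b"a1", b"x" * 2500, 1024):
    head, payload = s.split(b"\r\n\r\n", 1)
    print(head.decode(), "--", len(payload), "bytes")
```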


Obviously, this proposal does not provide all the functionality of SPDY (such as a forced TLS connection, header compression or built-in server push), but it does deal with connection multiplexing in a simple, textual manner.

There are probably reasons (that I've overlooked) why my proposal is a bad idea; what are they?

If you enjoyed this blog post, you might enjoy my travel book for people interested in science and technology: The Geek Atlas. Signed copies of The Geek Atlas are available.

Dmitrii Dimandt said...

If anything, your post shows how awesome HTTP is.

2:53 PM
skyborne said...

In no particular order:

It's my understanding that HTTP/2.0 would be inappropriate for this, as the rest of HTTP/1.1 (including pipelining) is unaffected by slicing.

HMURR is capitalized like an acronym, but isn't expanded. Geeks will want to know. (I sure do.) Http Multiplexing ... something.

Section 3.1, "When responding to a request the server will each the Request-ID header" ... should the final "each" be "echo"?

But other than that: I like it. I think HTTPS Everywhere, should its site list be turned into an API like phishing/malware protection is, would make forced TLS a non-issue. You would achieve higher security even on sites that don't want to update their server software to support http/1.2 or SPDY.

3:39 PM
Brandon Siegel said...

The only issue I see with this is how does the client handle the situation where a slice is delivered out-of-order without attaching explicit sequence numbers to the slices? The client actually has no way of knowing that a slice may have been delivered out-of-order, and could end up stitching together a corrupted response. If you are going to permit sliced responses, I don't see a way of avoiding assigning explicit sequence numbers to the response slices.

4:04 PM

I totally agree with you, and I proposed the same thing the day that SPDY was announced.

They replied saying that basically the only thing they cared about is speed, regardless of complexity. I didn't have data to show which solution is faster.

The lack of a request-id header seems like a simple oversight in the spec that could be fixed.

4:12 PM
Patrick McManus said...

Your premise seems to be that SPDY is a complex binary protocol and we don't need that to achieve mux. I don't see much evidence for your complexity assertion - especially when compared with HTTP/1.

Consider the difference in finding the message boundaries of a SPDY message with an HTTP/1.1 one.

To process the SPDY message you
* buffer 8 bytes of data
* determine that it is a data packet by confirming ((buf[0] & 0x80) == 0)
* determine the length of the data by ntohl(((uint32_t *)buf)[1]) & 0x00ffffff
* find out if there are more chunks to come by looking at (buf[4] & 1).

All done. It is very straightforward, efficient, well bounded, and testable.
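
(Those four steps, sketched in Python for concreteness; the layout
assumed is the SPDY draft's 8-byte data-frame header - control bit
plus 31-bit stream ID, then a flags byte and a 24-bit length.)

```python
import struct

def parse_spdy_data_frame_header(buf):
    # Buffer exactly 8 bytes, read as two network-order 32-bit words.
    word1, word2 = struct.unpack("!II", buf[:8])
    # Data frame: top bit of the first word (buf[0] & 0x80) is clear.
    assert word1 & 0x80000000 == 0, "control bit set: not a data frame"
    stream_id = word1 & 0x7FFFFFFF
    flags = buf[4]                   # flags byte; FLAG_FIN is bit 0
    length = word2 & 0x00FFFFFF      # 24-bit payload length
    return stream_id, flags & 1 == 1, length

# Stream 5, FLAG_FIN set, 42 bytes of payload to follow:
header = struct.pack("!II", 5, (0x01 << 24) | 42)
print(parse_spdy_data_frame_header(header))  # (5, True, 42)
```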

Contrast that with parsing a HTTP/1 message.

* read "enough" data from the network. You don't know how much that is and if you read too much you have to implement some facility for "putting it back" so the next message can use it. This inevitably leads to streaming parsers and their inherent complexity and lack of efficiency.

* parse the headers so you can determine which message delimiter is being used. To parse the header you have to implement line folding, implement a tokenizer aware of various rules around quotation marks and colons, and adapt to a number of real-world variations in the use of line endings other than just CRLF. To do this you have to run a state machine against every byte of input rather than directly address fixed offsets.

* Implement strategies to deal with conflicts such as "Content-Length: 42, 17" and "Content-Length: 95\r\nTransfer-Encoding: chunked\r\n"

* you need to implement an ascii to integer conversion routine to determine the status code because some status codes implicitly impact message delimiters (e.g. 304).

* now you have to implement no less than 5 message delimiting schemes - chunked encodings, EOF, content-length, implicit (304), and everyone's favorite multipart/byteranges.

* If you have content-length or a chunk you'll need to convert a text string to an integer again, and http/1 doesn't bound the size of the text string so you'll either have a common bug with an overflow or you'll implement an undiscoverable constraint in your implementation leading to cases of failing interop.

* you'll still have exposure to a whole class of CRLF injection attacks inherent in the text format.

The binary framing is so much less complex and a significant improvement. Sure, to the naked eye in a log it may not look that way, but that is optimizing for all the wrong things and can be quite misleading (do you really interpret a HTTP header with line folding or \n\r instead of \r\n sequences correctly when eyeballing it?)

Now there certainly is some complexity in SPDY but it doesn't come, in my opinion, from the binary framing that you're talking about here. That is a significant simplification over HTTP/1.

5:08 PM
Bill said...

Missing a word in your proposed spec:

When responding to a request the server will [verb-missing-here] each the Request-ID header with the same value so that the client can match requests and responses.

7:08 PM
Pelican said...

Cool Gripe :-)
http://www.w3.org/Protocols/HTTP-NG/http-ng-status.html seems to have a much simpler way of muxing (MUX) and session control (http://www.w3.org/Protocols/HTTP-NG/http-ng-scp.html) than the existing way of HTTP/2.0

7:25 PM
Tom Dooner said...

Cool proposal, I'm confident that it could work.

But one thing -- SPDY yields a *massive* speed boost from HTTP header compression. Especially with the size of cookies for some sites. It seems like your proposal would worsen the overhead by requiring lots of headers to be sent with every chunk of data (whereas SPDY streams assume the headers for subsequent bits of data).

I'm not sure the performance implications of this, but it seems like you might be sacrificing speed for conceptual clarity. Fine with me, but some performance measurements are definitely in order.

7:53 PM
Tom Dooner said...

This comment has been removed by the author.

7:54 PM
jcdickinson said...

Slight change to decrease the amount of work old clients/servers need to do:

GET / HTTP/1.1
Host: example.com

Being presented with this the (compliant) server would:

HTTP/1.1 101 Switching Protocols

HTTP/1.1 200 OK
Content-Length: 1234
Content-Type: text/html

Allowing you to do the handshake in exactly one request/response pair (while, in theory, maintaining compatibility with the current TLS upgrade system).

Good work though! I am a massive fan of binary protocols, but this doesn't seem like such a bad compromise.

6:54 PM
jcdickinson said...

Uh, make that:

In the first sample.

6:56 PM
DZONEMVB said...

Hi John,

I'm a Content Curator for DZone.com and I think this post would be an excellent candidate for re-syndication on our Performance Zone. If you're interested, drop me a line at allenc[at]dzone[dot]com!

10:18 PM