A while ago I was troubleshooting an issue with a certain piece of client software (whose name I don’t even know) on an internal network that was uploading images via HTTP to a remote service on the internet. The simple HTTP POST upload requests failed with an HTTP/1.0 501 Not Implemented error from our Squid proxy server, prompting me to analyze further.
Fortunately I had a network trace from the local admins, so I could easily reproduce and test the requests with curl from my Linux client.
The issue proved to be caused by the chunked Transfer-Encoding header of the POSTs and the fact that our Squid 3.1 proxy server is unable to process requests with this Transfer-Encoding in certain cases.
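For illustration, a request along these lines reproduces the problem; the proxy address and upload URL here are placeholders, not the actual hosts from the trace. Setting the Transfer-Encoding header explicitly makes curl send the request body chunked:

# Hypothetical proxy and upload endpoint; curl switches to a
# chunked request body when this header is set on an upload.
curl -v -x http://squidproxy.example:3128 \
     -H "Transfer-Encoding: chunked" \
     -H "Content-Type: image/jpeg" \
     --data-binary @image.jpg \
     http://upload.example/images

With our Squid 3.1 in the path, this came back with the same HTTP/1.0 501 Not Implemented response instead of being forwarded.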
I’ve seen a few mentions of issues with chunked Transfer-Encoding and Squid on the internets, but many were older, contradictory, or referred to server-side responses rather than client-side requests like in this case.
In a nutshell, Squid 3.1 is just not fully HTTP/1.1 compliant yet and only partially supports some features like chunked encoding, as stated on the project’s website:
Squid-3.2 claims HTTP/1.1 support. Squid v3.1 claims HTTP/1.1 support but only in sent requests (from Squid to servers). Earlier Squid versions do not claim HTTP/1.1 support by default because they cannot fully handle Expect:100-continue, 1xx responses, and/or chunked messages.
Both Squid-3 and Squid-2 contain at least response chunked decoding. The chunked encoding portion is available from Squid-3.2 on all traffic except CONNECT requests.
As I mentioned on my other post about tuning Squid, this article is about the ignore_expect_100 setting and why it’s set on our Squid proxies.
We had an issue where requests for a certain site failed with an HTTP 417 error generated by Squid. You would find entries like this in the Squid access.log for the site that wasn’t accessible:
1329478677.344 0 192.168.1.22 NONE/417 4480 POST http://somewebsite.com/webservice/data.asmx - NONE/- text/html
(Note: It’s actually an application communicating via HTTP and not manual browser-based access by human beings.)
Turned out the problem was that this application sends requests with an “Expect: 100-continue” HTTP header and Squid doesn’t have a proper implementation of the HTTP/1.1 Expect mechanism.
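This is easy to reproduce with curl by forcing the header on a POST through the proxy (again, the proxy address is a placeholder; the URL matches the access.log entry above):

# Squid 3.1 answers the Expect: 100-continue itself with a 417,
# without ever contacting the origin server (hence NONE/- in the log).
curl -v -x http://squidproxy.example:3128 \
     -H "Expect: 100-continue" \
     -d "some=payload" \
     http://somewebsite.com/webservice/data.asmx

For reference, RFC 2616 describes the purpose of the 100 (Continue) status like this: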
The purpose of the 100 (Continue) status (see section 10.1.1) is to allow a client that is sending a request message with a request body to determine if the origin server is willing to accept the request (based on the request headers) before the client sends the request body. In some cases, it might either be inappropriate or highly inefficient for the client to send the body if the server will reject the message without looking at the body.
In layman’s terms, this means that if a client wants to send a request with a body to a webserver, e.g. upload a file with an HTTP POST, it can first ask: “Hey webserver, I’m about to POST you a file, here are the HTTP headers of my request. If that’s all right with you, send me a 100 Continue response and I’ll transmit the file.” (Kind of similar to an HTTP HEAD, which is just an exchange of headers too.) Now why is this useful? For example, it allows a webserver to check the Content-Type, Content-Length or any other header of the request before the client attempts to send any actual data. If the webserver doesn’t accept this MIME type, decides the file is too large, or objects for any other reason, it can simply respond with a 417 Expectation Failed, and no resources are wasted on sending a file the web application wouldn’t process anyway.
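On the wire, the happy path of that handshake looks roughly like this (host, sizes and final status code invented for illustration; only the relevant headers shown):

POST /upload HTTP/1.1
Host: upload.example
Content-Type: image/jpeg
Content-Length: 4194304
Expect: 100-continue

HTTP/1.1 100 Continue

...4 MB of request body...

HTTP/1.1 201 Created

If the server dislikes the headers, the 100 Continue is replaced by a 417 Expectation Failed and the body is never sent.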
The tweaks described here are based on a Squid 3.1.x installation, but should be valid on newer versions (3.2 is currently in beta) as well as Squid 2.7 and 2.6. Just check the respective documentation on the Squid website.
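The tweak itself is a one-liner in squid.conf: the ignore_expect_100 directive makes Squid ignore the Expect: 100-continue header and process the request anyway, instead of failing it with a 417:

# squid.conf
# Technically an HTTP RFC violation, but stops Squid from
# answering Expect: 100-continue requests with 417 Expectation Failed.
ignore_expect_100 on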
To give you a little background on the environment involved: the Squid proxies I am referring to here are used in a simple proxy-sandwich configuration. Downstream of the Squids are our “main proxies”, which provide the load balancing, high availability and caching logic. Those direct all traffic to our friends the Squids, responsible for content filtering. Behind the Squids sits another set of upstream proxies which provides AV scanning of the web traffic. (Now don’t you dare ask why this is 3-layered like this.)
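Just as a sketch of what that chaining looks like on the Squid layer (host names and ports invented for illustration), the upstream hop boils down to a couple of cache_peer lines in squid.conf:

# Forward all traffic to the upstream AV-scanning proxies,
# never directly to the origin servers.
cache_peer avproxy1.example parent 8080 0 no-query round-robin
cache_peer avproxy2.example parent 8080 0 no-query round-robin
never_direct allow all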
On an average day (90% of the traffic is generated between roughly 7am and 3pm), we’re pushing around 200–250GB of “ordinary” HTTP(S) and FTP traffic, consisting of 7–8 million requests, through this configuration.