“Block Non‑RFC‑Compliant HTTP Traffic on HTTP/HTTPS Ports: Enable to allow the service to block traffic that isn’t compliant with Request for Comments (RFC)…” (help.zscaler.com)
“Block Non‑HTTP Traffic on HTTP/HTTPS Ports: Enable to restrict traffic on HTTP/HTTPS ports to HTTP traffic only.” (help.zscaler.com)
1 | Executive summary
Over two work‑weeks we set out to prove whether Zscaler Internet Access (ZIA) really enforces RFC‑conformant HTTP syntax when the “Block Non‑RFC‑Compliant HTTP Traffic” feature is enabled. The journey produced a 30‑case regression pack, four daily (batch) test rounds, several surprises, and a handful of actionable bugs escalated to the vendor.
2 | Why this matters
Modern web gateways promise deep‑layer protocol validation, but in practice attackers still smuggle requests by abusing edge‑case grammar. Enforcing only proper HTTP on ports 80/443 dramatically reduces the surface for:
- classic request‑smuggling (CL / TE desynchronisation)
- tunnel pivoting and protocol confusion attacks (SSH‑over‑443 etc.)
- header‑injection tricks that bypass downstream WAFs
If ZIA really blocks malformed handshakes early, every downstream control benefits.
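To make the CL / TE item concrete, here is a minimal sketch of two parsers that frame the same bytes differently. It is not ZIA's parser or any real server's; the function names and the `example.test` host are illustrative assumptions. One parser trusts only Content-Length, the other trusts only Transfer-Encoding: chunked, and whatever one of them leaves unread is the "smuggled" request.

```python
def body_by_content_length(raw: bytes) -> tuple[bytes, bytes]:
    """Frame the body using Content-Length only; return (body, leftover)."""
    head, _, rest = raw.partition(b"\r\n\r\n")
    length = 0
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    return rest[:length], rest[length:]


def body_by_chunked(raw: bytes) -> tuple[bytes, bytes]:
    """Frame the body using chunked encoding only; return (body, leftover)."""
    _, _, rest = raw.partition(b"\r\n\r\n")
    body = b""
    while True:
        size_line, _, rest = rest.partition(b"\r\n")
        size = int(size_line.split(b";")[0], 16)  # ignore chunk extensions
        if size == 0:
            # Skip the CRLF that closes the (empty) trailer section.
            _, _, rest = rest.partition(b"\r\n")
            return body, rest
        body += rest[:size]
        rest = rest[size + 2:]  # drop the chunk data plus its trailing CRLF


# Hypothetical probe: both framing headers are present and disagree.
smuggled = b"GET /admin HTTP/1.1\r\nHost: example.test\r\n\r\n"
payload = b"0\r\n\r\n" + smuggled
request = (
    b"POST / HTTP/1.1\r\n"
    b"Host: example.test\r\n"
    + b"Content-Length: %d\r\n" % len(payload)
    + b"Transfer-Encoding: chunked\r\n\r\n"
    + payload
)

cl_body, cl_leftover = body_by_content_length(request)
te_body, te_leftover = body_by_chunked(request)
```

The Content-Length parser consumes the whole payload and sees one request, while the chunked parser stops at the zero-size chunk and treats the remaining bytes as a second request. A gateway that rejects the ambiguous message outright removes that disagreement for everything behind it.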
3 | Phase 0 — quick‑and‑dirty curl probes
Before writing the raw‑socket harness we validated ZIA’s setting with the native Windows curl.exe (v8.7.0) against https://httpbin.org. Twelve malformed requests were selected to cover the most common RFC violations an attacker can create with a single‑line command.
| # | Malformation | Goal / RFC violation | Expected (feature ON) | Observed (baseline) |
|---|---|---|---|---|
| 1 | GET with body (-d) | Disallowed payload on idempotent method (RFC 7231 §4.3.1) | Block | 200 OK |
| 2 | POST with invalid Content‑Type | Invalid media‑type token (RFC 7231 §3.1.1.1) | Block | 200 OK |
| 3 | FAKEMETHOD verb | Undefined method token (RFC 7230 §3.1.1) | Block (405/501) | 200 OK |
| 4 | Custom header containing raw CRLF | Break header grammar, probe injection | Block | 200 OK‡ |
| 5 | Malformed JSON body {invalid: json,} | Ensure backend sees parse error | N/A | 200 OK |
| 6 | Empty Host: header | Mandatory authority missing (RFC 7230 §5.4) | Block | not executed† |
| 7 | HTTP/0.9 request | Obsolete version on modern port | Block / drop | 200 OK |
| 8 | Transfer‑Encoding: chunked + Content‑Length | TE.CL smuggling precursor | Block | 200 OK |
| 9 | Invalid Accept: */*; q=2.0 | q‑value must be ≤ 1.0 (RFC 9110 §12.5.1) | Block | 200 OK |
| 10 | PUT with Content‑Length: 10 but no body | Length mismatch | Block | 200 OK |
| 11 | Duplicate Content‑Type headers | Conflicting value occurrences | Block | 200 OK |
| 12 | GET with Transfer‑Encoding: chunked | Chunked framing on no‑body method | Block | 200 OK |
† Direct raw socket is mandatory; curl automatically re‑inserts a sane Host header when not using an explicit proxy.
‡ curl.exe strips raw control characters before sending; separate raw harness required for definitive test.
Outcome: none of the 11 requests that curl actually sent was blocked (case 6 was not executed), confirming the need for deeper testing.
4 | Phase 1 — vendor analysis
After we escalated to the vendor, a Zscaler Resident Engineer (Nicolas) retested the cases in his lab, and his results replicate ours:
Following your request I have retested in my lab all the use cases and found the following:
- HTTP version invalid - Detected - echo -e "GRT / HTTP/0.9\r\nHost: perdu.com\r\n\r\n" | nc perdu.com 80
- Host empty - Not detected - echo -e "GET / HTTP/1.1\r\nHost: \r\n\r\n" | nc perdu.com 80
- Body in the Header - Not Detected - echo -e "GET / HTTP/1.1\r\nHost: perdu.com\r\nContent-Length: 17\r\n\r\nThisShouldNotBeHere" | nc perdu.com 80
- Invalid User-Agent - Not Detected - echo -e "GET / HTTP/1.1\r\nHost: perdu.com\r\nUser-Agent: InvalidUserAgent/1.0\r\n\r\n" | nc perdu.com 80
- Empty User-Agent - Not Detected - echo -e "GET / HTTP/1.1\r\nHost: perdu.com\r\nUser-Agent: \r\n\r\n" | nc perdu.com 80
- Invalid Header - Not Detected - echo -e "GET / HTTP/1.1\r\nHost: perdu.com\r\n: InvalidHeaderValue\r\n\r\n" | nc perdu.com 80
I have open a ticket (05642244) and this is the answer from the support. Only the last cases need to be investigated further as this is not expected.
"Block Non-RFC Compliant HTTP Traffic" feature typically targets HTTP requests that are so malformed that they:
- Don't adhere to the basic syntactic structure defined in HTTP RFCs (e.g., RFC 7230-7235 for HTTP/1.1).
- Could be used in certain attack vectors (e.g., HTTP Desync, request smuggling) if processed by a less robust downstream server.
- Represent clear violations of mandatory (MUST/REQUIRED) directives in the RFCs, especially regarding request line and essential headers.
It generally does not block:
- Violations of recommended (SHOULD/RECOMMENDED) directives
- Semantically incorrect but syntactically valid requests
- Uncommon but technically permissible constructs
Regarding the test cases:
- Host empty : Why Not Detected: RFC 7230, Section 5.4 states: "A client MUST send a Host header field in all HTTP/1.1 request messages." While sending an empty value for the Host header is bad practice and will likely cause issues with the origin server (which would typically return a 400 Bad Request), the header line itself is present. The Zscaler feature might be checking for the presence of the Host header line, not necessarily the validity or non-emptiness of its value. Some interpretations focus on the complete absence of the Host: line as the primary violation.
- Body in the Header : Why Not Detected: RFC 7231, Section 4.3.1 (GET): "A payload within a GET request message has no defined semantics; sending a payload body on a GET request might cause some existing implementations to reject the request."
This means it's not forbidden by the RFC, just that its meaning is undefined and servers might reject it. It's not a syntactic violation for a GET request to have a body, just very unusual and typically ignored or problematic for the server. The feature likely doesn't see this as "non-RFC compliant" from a strict protocol structure perspective.
- Invalid User-Agent : Why Not Detected: RFC 7231, Section 5.5.3 states: "A user agent SHOULD send a User-Agent header field in each request..." SHOULD means it's recommended but not mandatory. Furthermore, the content of the User-Agent string is not strictly defined or validated by the HTTP protocol itself (though there are conventions). An "invalid" User-Agent string is still a syntactically valid header.
- Empty User-Agent : Why Not Detected: Same as above. The header is present, its value is empty. This is RFC compliant.
- Invalid Header : This should be detected; we are checking why this is not the case. A header field name must be a token (RFC 7230, Section 3.2.6). A token cannot be empty. So, a header line that starts with a colon :\s*InvalidHeaderValue (implying an empty header name) is indeed non-RFC compliant.
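The token rule cited in the last item is small enough to check mechanically. The sketch below is our own illustration of the RFC 7230 §3.2.6 `token` grammar (one or more `tchar` characters), not Zscaler's implementation. The helper names are hypothetical, and it validates only the field name, not the value.

```python
import string

# tchar per RFC 7230 section 3.2.6: these symbols plus digits and letters.
TCHAR = set("!#$%&'*+-.^_`|~" + string.digits + string.ascii_letters)


def is_valid_field_name(name: str) -> bool:
    """A field name is a token: at least one character, all of them tchar."""
    return len(name) > 0 and all(ch in TCHAR for ch in name)


def check_header_line(line: str) -> bool:
    """Return True if the line has a colon and a grammatically valid name.

    The name is everything before the first colon, taken verbatim, so
    whitespace before the colon also makes the line invalid.
    """
    name, sep, _value = line.partition(":")
    return sep == ":" and is_valid_field_name(name)
```

Run against the vendor's own probes, `check_header_line(": InvalidHeaderValue")` is false because the name is empty, while `check_header_line("User-Agent: ")` is true because an empty value does not break the field-name grammar. That matches the distinction the support answer draws between the two cases.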
5 | Phase 2 — building a deterministic regression pack
We switched to raw‑socket scripting so that every byte on the wire is ours, free from curl’s silent sanitisation.
https://github.com/Ispanikas/small_scripts/blob/8d8c15945f3615fc630e00aadd399d9ec75dc215/Http-Regression-Pack.ps1
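For readers who do not want to open the PowerShell script, the idea can be sketched in a few lines of Python. This is not a port of Http-Regression-Pack.ps1: the function names, the 5-second timeout, and the "NO RESPONSE" marker are all assumptions made for illustration. The point is that the request is assembled byte by byte and written to a plain TCP socket, so nothing normalises an empty Host value or an empty header name on the way out.

```python
import socket


def build_request(method: str, target: str, headers: list[tuple[str, str]],
                  body: bytes = b"", version: str = "HTTP/1.1",
                  eol: bytes = b"\r\n") -> bytes:
    """Assemble a request verbatim; no validation or normalisation is done."""
    lines = [f"{method} {target} {version}".encode("utf-8")]
    lines += [f"{name}: {value}".encode("utf-8") for name, value in headers]
    return eol.join(lines) + eol + eol + body


def send_raw(host: str, port: int, payload: bytes, timeout: float = 5.0) -> bytes:
    """Send the payload over plain TCP and return the first bytes received."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        try:
            return sock.recv(4096)
        except OSError:  # timeout or reset after sending: no usable response
            return b""


def first_line(response: bytes) -> str:
    """Return the status line of a response, or a marker if nothing came back."""
    if not response:
        return "NO RESPONSE"
    return response.split(b"\n", 1)[0].rstrip(b"\r").decode("latin-1")


# Two of the vendor's probes, rebuilt without echo/nc.
empty_host = build_request("GET", "/", [("Host", "")])
empty_name = build_request("GET", "/", [("Host", "perdu.com"),
                                        ("", "InvalidHeaderValue")])
```

A run against the lab origin would then look like `first_line(send_raw("perdu.com", 80, empty_host))`. Changing the `eol` argument to `b"\n"` gives a bare-LF probe.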
Key design choices
| Concept | Rationale |
|---|---|
| Raw TCP, no TLS | Simplest path; ZIA still parses layer‑7 before forwarding. |
| 30 cases / 4 “DAY” batches | Mirrors a QA sprint cadence while isolating functional areas. |
| CSV logging | Single artefact that joins timestamp, verdict line, and full payload. |
| Idempotent host variable | Change $TargetHost once to reroute traffic to any lab origin. |
Batch themes
| Test Batch (Day) | Focus area | # cases |
|---|---|---|
| 1 – Baseline & vendor repro | Known malformed examples | 8 |
| 2 – Host‑header edge cases | Missing/dup/oversize/fold Host | 7 |
| 3 – TE / CL smuggling probes | Chunked v Content‑Length | 7 |
| 4 – Exotic tokens & line format | Oversize start‑line, UTF‑8 method, lone LF, etc. | 8 |
6 | Phase 3 — execution & data gathering
Environment — Windows 11 22H2 VM behind Zscaler Client Connector > Z‑Tunnel 2.0 (dataplane). Every test batch was executed twice: first through a plain‑HTTP path (TLS inspection Off), then through the organisation’s full SSL inspection stack (TLS inspection On). Each run takes at most 20 s and appends to HttpMalformedTest‑log.csv.
| Batch (Day) | Run timestamp (UTC+3) | Client Connector ver. | Z‑Tunnel mode | TLS inspection | Runtime (s) | CSV rows | Notes |
|---|---|---|---|---|---|---|---|
| 1‑A (Jul 10) | 2025‑07‑10 09:12 | 4.5.0.232 | 2.0 | Off | 17 | 8 | Baseline without SSL inspection |
| 1‑B (Jul 10) | 2025‑07‑10 09:19 | 4.5.0.232 | 2.0 | On | 18 | 8 | Same batch through MITM TLS |
| 2‑A (Jul 11) | 2025‑07‑11 09:05 | 4.5.0.232 | 2.0 | Off | 18 | 7 | Host‑header focus |
| 2‑B (Jul 11) | 2025‑07‑11 09:13 | 4.5.0.232 | 2.0 | On | 19 | 7 | Host‑header with SSL inspection |
| 3‑A (Jul 14) | 2025‑07‑14 08:58 | 4.5.0.232 | 2.0 | Off | 19 | 7 | TE/CL smuggling |
| 3‑B (Jul 14) | 2025‑07‑14 09:07 | 4.5.0.232 | 2.0 | On | 20 | 7 | TE/CL with SSL inspection |
| 4‑A (Jul 15) | 2025‑07‑15 09:02 | 4.5.0.232 | 2.0 | Off | 19 | 8 | Exotic syntax |
| 4‑B (Jul 15) | 2025‑07‑15 09:11 | 4.5.0.232 | 2.0 | On | 20 | 8 | Exotic syntax under SSL inspection |
Column explanations
- Run timestamp — start of script execution in Europe/Sofia (UTC+3).
- Client Connector ver. — upgraded to the 4.5 release stream that introduces a new DPI library; pinned for consistency.
- Z‑Tunnel mode — 2.0 enables per‑packet proxy; 1.0 would alter flow handling.
- TLS inspection — each batch evaluated with ZIA SSL inspection Off and On to reveal parsing differences in clear‑text vs decrypted flows.
- Runtime — wall‑clock seconds measured by PowerShell Measure‑Command.
- CSV rows — one per test‑case; sanity‑check against batch cardinality.
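The CSV rows sanity check is easy to automate. The sketch below assumes a hypothetical `run` column holding labels such as `1-A` or `3-B`; the real log's layout may differ. It compares the number of rows logged per run with the case count planned for that batch (8, 7, 7, 8).

```python
import csv
import io
from collections import Counter

# Planned cases per batch, taken from the batch-themes table.
EXPECTED_CASES = {1: 8, 2: 7, 3: 7, 4: 8}


def cardinality_mismatches(csv_text: str) -> dict[str, tuple[int, int]]:
    """Map each run label to (rows seen, rows expected) where they differ."""
    counts = Counter(row["run"] for row in csv.DictReader(io.StringIO(csv_text)))
    mismatches = {}
    for run, seen in counts.items():
        batch = int(run.split("-")[0])  # "3-B" -> batch 3
        expected = EXPECTED_CASES[batch]
        if seen != expected:
            mismatches[run] = (seen, expected)
    return mismatches


# Synthetic log text for illustration only: run 2-A is complete, 3-B lost a row.
sample = "run,case\n" + "2-A,x\n" * 7 + "3-B,y\n" * 6
problems = cardinality_mismatches(sample)
```

An empty result means every run logged as many rows as its batch has cases. A short run usually points at a probe that hung or a connection that was reset before the row was written.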
7 | Phase 4 — results & analysis
| Date (2025) | Batch | Highlights | Deviations / bugs |
|---|---|---|---|
| Jul 10 | Baseline | HTTP/0.9 dropped; invalid method rejected. | Host: with empty value only hits origin (400), not ZIA edge. Empty header name not blocked. |
| Jul 11 | Host | Empty/Missing/Duplicate Host blocked. | IPv6 literal blocked (403) although RFC‑compliant. |
| Jul 14 | TE/CL | Classic TE/CL smuggling blocked. | CL underflow accepted → high‑risk desync vector. |
| Jul 15 | Exotic | Oversize lines & non‑ASCII methods blocked. | Lone LF delimiter & tab‑prefixed method accepted. CR in header value drops connection without reason code. |
8 | Security implications
The good — ZIA detects most obvious smuggling tricks out‑of‑the‑box.
The bad — leniency around stray LFs, header‑name grammar, and CL‑underflow suggests the HTTP parser still follows “be liberal in what you accept”. Attackers can chain those quirks to:
- inject ghost requests after the gateway
- bypass content filters by wrapping payload in tolerant syntax
- open covert tunnels that masquerade as broken browsers
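Why a stray LF matters is easiest to see by splitting the same bytes two ways. The sketch below is a toy model and not a description of how ZIA or any specific origin parses headers; the header names and `example.test` are made up. A strict splitter treats only CRLF as a line terminator, while a lenient one also accepts a bare LF, which RFC 7230 §3.5 permits recipients to do.

```python
def header_lines_strict(raw: bytes) -> list[bytes]:
    """Split the header section on CRLF only."""
    head = raw.split(b"\r\n\r\n", 1)[0]
    return head.split(b"\r\n")


def header_lines_lenient(raw: bytes) -> list[bytes]:
    """Also accept a bare LF as a line terminator."""
    head = raw.replace(b"\r\n", b"\n").split(b"\n\n", 1)[0]
    return head.split(b"\n")


# Hypothetical probe: a bare LF hides a second header inside the X-Pad value.
probe = (
    b"GET / HTTP/1.1\r\n"
    b"Host: example.test\r\n"
    b"X-Pad: a\nX-Injected: 1\r\n"
    b"\r\n"
)

strict = header_lines_strict(probe)
lenient = header_lines_lenient(probe)
```

The strict view has three lines and never sees `X-Injected` as a header, while the lenient view has four and does. If a filter and the origin sit on opposite sides of that difference, the filter approves a request the origin interprets differently.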
Real results:
https://github.com/Ispanikas/small_scripts/blob/130f96cc85f712e62cfc13eab0b3a2b1af9648e1/HttpMalformedTest-(httpbin)log.csv
https://github.com/Ispanikas/small_scripts/blob/130f96cc85f712e62cfc13eab0b3a2b1af9648e1/HttpMalformedTest-(perdu)log.csv
Author: Tsvetomir Bogdanov • Edited: July 29 2025