Change caching rules #3
Pull request overview
This PR updates the html2md proxy-wasm filter’s caching and conversion signaling so HTML responses can be safely cached in both original and converted (HTML→Markdown) variants.
Changes:
- Normalize conversion signaling via a FLAG_MARKDOWN constant and set the Convert request header based on Accept.
- Apply Vary: Convert only for HTML responses (and merge with any existing Vary value).
- Change conversion error handling to fall back to the origin response instead of sending a 500.
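For illustration, the Vary handling described in the list above could look roughly like the sketch below. It is not the PR's code: `is_html` and `merge_vary` are hypothetical helper names, and the sketch assumes the filter has already read the response's Content-Type and any existing Vary value as plain strings.

```rust
/// Hypothetical helper: true when the Content-Type media type is text/html,
/// ignoring parameters such as `charset` and ignoring ASCII case.
fn is_html(content_type: Option<&str>) -> bool {
    content_type
        .and_then(|value| value.split(';').next())
        .map(|media| media.trim().eq_ignore_ascii_case("text/html"))
        .unwrap_or(false)
}

/// Hypothetical helper: merge `token` into an existing Vary value without
/// duplicating it. Header field names are compared case-insensitively.
fn merge_vary(existing: Option<&str>, token: &str) -> String {
    // No Vary header yet (or an empty one): the token becomes the whole value.
    let Some(current) = existing.map(str::trim).filter(|v| !v.is_empty()) else {
        return token.to_string();
    };
    // `Vary: *` already covers every request header, so leave it alone.
    let already_listed = current
        .split(',')
        .map(str::trim)
        .any(|name| name == "*" || name.eq_ignore_ascii_case(token));
    if already_listed {
        current.to_string()
    } else {
        format!("{current}, {token}")
    }
}
```

In the filter, the merged value would only be written back when `is_html` returns true, so non-HTML responses keep their original Vary header.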
| } else { | ||
| self.add_http_request_header(CONVERT_FLAG, ""); |
on_http_request_headers now adds a Convert request header with an empty value for non-Markdown requests. Since the response already emits Vary: Convert for HTML, caches will still distinguish the Markdown variant even if the header is absent, so forwarding an always-present custom header to the origin may be unnecessary. Consider omitting the header entirely when not converting (keep it removed) unless an upstream component explicitly depends on Convert: "".
| } else { | |
| self.add_http_request_header(CONVERT_FLAG, ""); |
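A minimal sketch of the suggestion above, under stated assumptions: the value of `FLAG_MARKDOWN` is a guess (the diff names the constant but does not show it), `convert_header_for` is a hypothetical helper, and the Accept parsing is deliberately simple (it ignores `q=0` and wildcards).

```rust
// Assumed value; the PR defines FLAG_MARKDOWN but its contents are not shown here.
const FLAG_MARKDOWN: &str = "markdown";

/// Hypothetical helper: returns the value to send in the Convert request
/// header, or None when the header should stay absent.
fn convert_header_for(accept: Option<&str>) -> Option<&'static str> {
    let wants_markdown = accept
        .unwrap_or("")
        .split(',')
        // Drop parameters such as `;q=0.9` and keep only the media type.
        .filter_map(|item| item.split(';').next())
        .any(|media| media.trim().eq_ignore_ascii_case("text/markdown"));
    wants_markdown.then_some(FLAG_MARKDOWN)
}
```

With the proxy-wasm Rust SDK, the result could be passed straight to `set_http_request_header(CONVERT_FLAG, value)`, which takes an `Option<&str>` and removes the header on `None`. That also strips any Convert header supplied by the client, which appears to be what "keep it removed" refers to.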
| Err(e) => { | ||
| println!( | ||
| "cannot convert body to string {} for {}", | ||
| e, | ||
| self.get_path() | ||
| ); | ||
| self.send_http_response( | ||
| SERVER_ERROR, | ||
| vec![], | ||
| Some(b"Origin response is not valid UTF-8"), | ||
| ); | ||
| return Action::Pause; | ||
| println!("cannot convert body to string {} for {}", e, self.get_path()); | ||
| return Action::Continue; // if body is not valid UTF-8, skip conversion and return original body | ||
| } | ||
| }; |
When UTF-8 decoding fails you return Action::Continue, but at this point (when response.md is set) the response headers were already rewritten in on_http_response_headers (e.g., Content-Type: text/markdown, removed Content-Length, set Transfer-Encoding: chunked). This will forward the original (non-Markdown) body with Markdown headers, which is incorrect. Consider deferring header mutations until after a successful conversion, or explicitly restoring the original Content-Type/encoding-related headers and clearing response.md before continuing.
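One way to picture the "defer header mutations" option is to have the body handler decide the whole outcome first and only then touch headers. The sketch below is illustrative only: `Outcome`, `plan_response`, and `toy_convert` are hypothetical names, `toy_convert` is a stand-in for the real HTML-to-Markdown converter, and whether headers can still be changed at that point depends on the host and on what the header callback returned.

```rust
/// What the filter should emit once the full origin body is available.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// Conversion succeeded: replace the body and only now rewrite headers.
    Markdown { body: String, content_type: &'static str },
    /// Decoding or conversion failed: forward the origin response untouched.
    PassThrough,
}

/// Decide the outcome without mutating anything. `convert` is any
/// HTML-to-Markdown function; its signature here is an assumption.
fn plan_response<F>(origin_body: &[u8], convert: F) -> Outcome
where
    F: Fn(&str) -> Result<String, String>,
{
    // Invalid UTF-8 means no conversion, so the original headers must stay.
    let Ok(html) = std::str::from_utf8(origin_body) else {
        return Outcome::PassThrough;
    };
    match convert(html) {
        Ok(body) => Outcome::Markdown { body, content_type: "text/markdown" },
        Err(_) => Outcome::PassThrough,
    }
}

/// Toy stand-in for the real converter, used only to exercise the sketch.
fn toy_convert(html: &str) -> Result<String, String> {
    if html.contains("<script") {
        Err("unsupported markup".to_string())
    } else {
        Ok(html.replace("<p>", "").replace("</p>", "\n"))
    }
}
```

Only the `Markdown` arm would set `Content-Type: text/markdown` and adjust the length and encoding headers. The `PassThrough` arm changes nothing, so the body and headers cannot disagree.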
| Ok(md) => md, | ||
| Err(e) => { | ||
| println!( | ||
| "cannot convert HTML to Markdown: {} for {}", | ||
| e, | ||
| self.get_path() | ||
| ); | ||
| self.send_http_response( | ||
| SERVER_ERROR, | ||
| vec![], | ||
| Some(b"Failed to convert HTML to Markdown"), | ||
| ); | ||
| return Action::Pause; | ||
| println!("cannot convert HTML to Markdown: {} for {}", e, self.get_path()); | ||
| return Action::Continue; // if conversion fails, skip and return original body | ||
| } |
On conversion failure you now return Action::Continue, but the response headers may have already been rewritten to Markdown in on_http_response_headers. This can result in serving the original HTML body with Content-Type: text/markdown and chunked transfer encoding, and also caching the wrong variant. If the intent is to fall back to the origin body, the filter should keep the original headers (or revert them) when conversion fails.
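If the header rewrite has to stay in the header callback, the "revert them" option could be modelled by recording each change so it can be undone on failure. This is a sketch over a plain map rather than the proxy-wasm header API. `Headers` and `HeaderRewrite` are hypothetical names, header names are assumed to be lowercase, and it assumes the host still allows header changes when the failure is detected.

```rust
use std::collections::BTreeMap;

/// Simplified header map for the sketch (lowercase names, single values).
type Headers = BTreeMap<String, String>;

/// Hypothetical record of what a rewrite changed, so it can be undone.
struct HeaderRewrite {
    saved: Vec<(String, Option<String>)>,
}

impl HeaderRewrite {
    /// Apply `changes` (Some = set, None = remove) and remember old values.
    fn apply(headers: &mut Headers, changes: &[(&str, Option<&str>)]) -> Self {
        let mut saved = Vec::new();
        for &(name, value) in changes {
            let previous = match value {
                Some(v) => headers.insert(name.to_string(), v.to_string()),
                None => headers.remove(name),
            };
            saved.push((name.to_string(), previous));
        }
        HeaderRewrite { saved }
    }

    /// Restore every touched header to its value before `apply`.
    fn revert(self, headers: &mut Headers) {
        // Undo in reverse order so repeated names end at their first value.
        for (name, previous) in self.saved.into_iter().rev() {
            match previous {
                Some(v) => {
                    headers.insert(name, v);
                }
                None => {
                    headers.remove(&name);
                }
            }
        }
    }
}
```

On a conversion error the filter would call `revert` before returning `Action::Continue`, so the HTML body goes out with its original Content-Type and length headers and is not cached under the Markdown variant.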
Fix caching for HTML->Markdown template