Description
When calling URL.to_uri() on a URL that uses a plain IPv6 literal host (e.g. http://[::1]:PORT/...), hyperlink attempts to IDNA-encode the host. This is incorrect and results in an exception from idna, since IPv6 literals are not DNS labels and must not undergo IDNA processing.
This is the same root cause as #68, which appears to have gone stale.
Reproduction
from hyperlink import EncodedURL
url = EncodedURL.from_text("http://[::1]:60993/api/v1/token")
url.to_uri().to_text().encode("ascii")
Actual result
idna.core.InvalidCodepoint:
Codepoint U+003A ':' not allowed
This occurs because idna.encode("::1") is invoked.
Expected result
The URL should round-trip successfully:
"http://[::1]:60993/api/v1/token"
Why this is a bug
- IDNA applies only to DNS labels, not IP literals
- RFC 3986 explicitly allows IPv6 literal hosts
hyperlink already has parse_host() which distinguishes:
- DNS names
- IPv4 literals
- IPv6 literals
- Applying IDNA to IP literals is both unnecessary and invalid
Suggested fix
In URL.to_uri(), detect IPv4/IPv6 literal hosts using parse_host() and skip IDNA encoding for those cases. IDNA should only be applied to non-IP (DNS) hosts.
Pseudo-logic:
family, host_text = parse_host(self.host)
if family is not None:
# IPv4 / IPv6 literal
host = host_text
else:
host = idna_encode(host_text, uts46=True).decode("ascii")
Regression test
def test_ipv6_literal_not_idna_encoded():
from hyperlink import EncodedURL
url = EncodedURL.from_text("http://[::1]:8080/path")
assert url.to_uri().to_text() == "http://[::1]:8080/path"
Environment
- hyperlink: current release / main
- Python: 3.10+
- idna: current
Description
When calling
URL.to_uri()on a URL that uses a plain IPv6 literal host (e.g.http://[::1]:PORT/...),hyperlinkattempts to IDNA-encode the host. This is incorrect and results in an exception fromidna, since IPv6 literals are not DNS labels and must not undergo IDNA processing.This is the same root cause as #68, which appears to have gone stale.
Reproduction
Actual result
This occurs because
idna.encode("::1")is invoked.Expected result
The URL should round-trip successfully:
"http://[::1]:60993/api/v1/token"Why this is a bug
hyperlinkalready hasparse_host()which distinguishes:Suggested fix
In
URL.to_uri(), detect IPv4/IPv6 literal hosts usingparse_host()and skip IDNA encoding for those cases. IDNA should only be applied to non-IP (DNS) hosts.Pseudo-logic:
Regression test
Environment