net::base_transport should try multiple resolved addresses / multiple AF_INET families #18422

Open
nvartolomei opened this issue May 13, 2024 · 1 comment
Labels
kind/bug Something isn't working

Comments

@nvartolomei
Contributor

nvartolomei commented May 13, 2024

In #18412 I tried to disable tight-loop reconnections in the HTTP client. Some failing tests uncovered a problem with net::base_transport: it does a DNS lookup that yields a random IPv4 or IPv6 address and uses only that address. That will often fail on the first try if, for example, the local network doesn't support IPv6 (as is the case in the failed tests in the linked PRs).

The test passes on the dev branch because, after a few attempts, we get lucky and finally get a working IP address. If we get the same address several times in a row, or a different address of the same unsupported family (e.g. IPv6 in the test above), we waste time retrying the connection.

I believe it would be better to have Redpanda's net::base_transport cover that case. The DNS request already returns all addresses, so we should try them one by one until a connection succeeds. We will probably need 2 DNS requests to get both IPv6 and IPv4 addresses. A potential optimization is to race connection attempts to 2 addresses in parallel (as Go's net package does; its dial.go can be used as inspiration).
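A rough sketch of the sequential fallback, not tied to Seastar or to the actual net::base_transport code. All names here (`resolved_address`, `interleave_families`, `connect_first_reachable`, the `connect_fn` callback) are hypothetical and exist only for illustration. It assumes the IPv6 and IPv4 results arrive as two separate lists and interleaves them, so that one unsupported family cannot take up all of the first attempts.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical types for illustration; not Redpanda's or Seastar's API.
enum class family { ipv4, ipv6 };

struct resolved_address {
    family fam;
    std::string ip;
};

// Alternate between the two families (v6[0], v4[0], v6[1], v4[1], ...)
// so that a single unreachable family does not fill the head of the list.
std::vector<resolved_address> interleave_families(
  const std::vector<resolved_address>& v6,
  const std::vector<resolved_address>& v4) {
    std::vector<resolved_address> out;
    out.reserve(v6.size() + v4.size());
    for (std::size_t i = 0; i < v6.size() || i < v4.size(); ++i) {
        if (i < v6.size()) {
            out.push_back(v6[i]);
        }
        if (i < v4.size()) {
            out.push_back(v4[i]);
        }
    }
    return out;
}

// Stand-in for a real connection attempt: returns true on success.
using connect_fn = std::function<bool(const resolved_address&)>;

// Try every candidate in order and return the first one that connects,
// or std::nullopt if none of them does.
std::optional<resolved_address> connect_first_reachable(
  const std::vector<resolved_address>& candidates,
  const connect_fn& try_connect) {
    assert(try_connect && "connect callback must be set");
    for (const auto& addr : candidates) {
        if (try_connect(addr)) {
            return addr;
        }
    }
    return std::nullopt;
}
```

With a callback that only succeeds for IPv4, a candidate list built from two IPv6 addresses and one IPv4 address connects on the second attempt instead of depending on which address a retry happens to pick.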

Note: extra care will need to be taken with timeouts. We will need to support both a whole-operation timeout/deadline and a shorter per-connection-attempt timeout.
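One possible way to combine the two timeouts, as a sketch only: each attempt gets its own deadline, capped by the overall deadline, and no further attempts start once the overall deadline has passed. The function name `attempt_deadline` and its parameters are hypothetical, and the choice of `std::chrono::steady_clock` is an assumption rather than what net::base_transport uses.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <optional>

using clock_type = std::chrono::steady_clock;

// Compute the deadline for a single connection attempt.
// Returns std::nullopt when the overall deadline has already passed,
// which signals the caller to stop trying further addresses.
std::optional<clock_type::time_point> attempt_deadline(
  clock_type::time_point now,
  clock_type::time_point overall_deadline,
  clock_type::duration per_attempt_timeout) {
    assert(per_attempt_timeout > clock_type::duration::zero());
    if (now >= overall_deadline) {
        return std::nullopt;
    }
    // The attempt may not outlive the whole operation.
    return std::min(now + per_attempt_timeout, overall_deadline);
}
```

A caller would recompute this before each attempt in the address loop. A fixed per-attempt timeout is the simplest option; splitting the remaining time across the remaining addresses is another policy worth considering.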

JIRA Link: CORE-2929

@nvartolomei nvartolomei added the kind/bug Something isn't working label May 13, 2024
@emaxerrno
Contributor

@nvartolomei - whoa! What a great find! Haha, love the idea of 2 fibers like in Go's dialParallel(). What a smart idea.
