Description
Currently, the socket runtime will call LibC.getaddrinfo, which will block the event loop until it completes.
For latency-sensitive applications, this can cause slowdowns that stack up, or other issues. It is easy to write a Python script that makes a bunch of HTTP requests to some hosts, and it will be much faster than Crystal, because it doesn't get stuck on DNS.
Since all other discussions I could find are quite old & closed, opening this new one for visibility & broader view of the problem's history.
Related material
- Fix/Implement own DNS resolver #2660
- In a past era, Crystal used libevent's getaddrinfo, but it was discarded for plain LibC getaddrinfo due to other issues. I don't know if anything has changed on libevent's end to consider trying it again.
- dns.cr shard
- Currently we use this for non-blocking DNS, and it works "ok". I do not recommend it though - unless you really need it - as it is far too complicated.
- Threaded DNS resolver (from Configurable DNS resolvers #4236)
- To me, it seems like this could be a next step: Keep the robustness of libc getaddrinfo, but do it in another thread so the event loop is not blocked. This implementation could be brought up to date, or a new one made.
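The "getaddrinfo in another thread" idea from the last bullet can be sketched roughly as follows. This is a Python illustration of the technique only, not Crystal's API: `DNS_POOL`, `resolve` and the `lookup` parameter are hypothetical names, and asyncio's event loop stands in for Crystal's.

```python
import asyncio
import socket
from concurrent.futures import ThreadPoolExecutor

# Hypothetical dedicated pool, so DNS lookups do not compete with other work.
DNS_POOL = ThreadPoolExecutor(max_workers=4, thread_name_prefix="dns")


async def resolve(host, port, lookup=socket.getaddrinfo):
    """Run the blocking lookup on a worker thread and await its result."""
    loop = asyncio.get_running_loop()
    infos = await loop.run_in_executor(
        DNS_POOL, lambda: lookup(host, port, type=socket.SOCK_STREAM)
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual address.
    return [info[4][0] for info in infos]


if __name__ == "__main__":
    # A numeric host needs no network access, which keeps the demo offline.
    print(asyncio.run(resolve("127.0.0.1", 80)))
```

While the worker thread sits in `getaddrinfo`, other tasks on the loop keep running. The cost is one OS thread per in-flight lookup, bounded here by the pool size.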
straight-shoota commented on Apr 24, 2024
A great benefit of `getaddrinfo` is that it's battle-tested and covers a lot of niches. It's used almost ubiquitously and its behaviour is considered the system default (which it actually is on many systems).
Its main issue is that it blocks the current thread. We can alleviate that a bit by running it in a dedicated thread. This could happen implicitly and would be a nice feature in the context of the ongoing multi-threading refactor (ref crystal-lang/rfcs#2).
This is a well-known technique and should work relatively fine. However, it's quite inefficient.
So I think we'll eventually need a native implementation for DNS resolution that uses Crystal's concurrency.
https://github.com/636f7374/dns.cr could be a good source for inspiration, but it's far too complex for this. We only need a relatively simple implementation, covering a fraction of its features.
We might take further inspiration from Go: https://pkg.go.dev/net#hdr-Name_Resolution
It actually still uses `getaddrinfo` because the native implementation is incomplete. There's a rather complex set of rules to decide when to use the native implementation and when `getaddrinfo`.
Considering that even Go hasn't managed to reach feature parity with `getaddrinfo` (I'm not even sure if that's a goal for them), maybe we should really keep our focus on making `getaddrinfo` usable as easily and efficiently as possible, and worry about a native implementation later.
That said, looking at the Go implementation it really shouldn't be that difficult to implement our own algorithm.
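To give a feel for what such a decision rule could look like, here is a deliberately simplified Python sketch. It is not Go's actual logic: the rule (use a native resolver only when the `hosts:` line of `nsswitch.conf` lists nothing but `files` and `dns`) and the names `SIMPLE_SOURCES`, `hosts_sources` and `choose_resolver` are assumptions made for illustration.

```python
# Sources a simple native resolver could plausibly handle itself (assumption).
SIMPLE_SOURCES = {"files", "dns"}


def hosts_sources(nsswitch_text):
    """Return the tokens of the 'hosts:' line of an nsswitch.conf text."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if line.startswith("hosts:"):
            return line[len("hosts:"):].split()
    return []


def choose_resolver(nsswitch_text):
    """Pick 'native' only for trivial configurations, else 'getaddrinfo'."""
    sources = hosts_sources(nsswitch_text)
    # Any unknown source or action item such as [NOTFOUND=return] means the
    # system is customized, so defer to libc.
    if sources and all(src in SIMPLE_SOURCES for src in sources):
        return "native"
    return "getaddrinfo"


if __name__ == "__main__":
    print(choose_resolver("hosts: files dns"))
    print(choose_resolver("hosts: files mdns4_minimal [NOTFOUND=return] dns"))
```

Erring on the side of `getaddrinfo` whenever the configuration is not fully understood keeps the native path an optimization rather than a behaviour change.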
ysbaddaden commented on Apr 25, 2024
It's impossible to reach feature parity with `getaddrinfo` (or equivalent). For example, glibc/nsswitch allows extensible resolvers, so anybody can write, add and configure their own special resolver to handle special cases (e.g. custom domain, mDNS, whatever you can think of).
Let us pause to contemplate the idea of using plugins at the libc level, instead of running a local resolver that allows such plugins 😮💨
Using a custom DNS resolver, be it libevent DNS or a pure Crystal one, means bypassing all that customization. Even going through a custom DNS resolver and then falling back to `getaddrinfo` is prone to errors, because the former may resolve while the latter is customized to return something else 😭
Let's not even talk about security extensions (DNSSEC, DoT, DoH, DNSCrypt).
The dedicated thread is still likely a good idea. Even with an io/event aware DNS resolver, we'd still need the ability to call `getaddrinfo`. The advantage of Go is that it doesn't need a dedicated thread (it 'merely' detaches the scheduler from the thread) and there can be multiple concurrent calls to `getaddrinfo`.
straight-shoota commented on Jun 10, 2024
On Linux there's also `getaddrinfo_a` for asynchronous queries. It's a GNU extension and doesn't seem to be supported outside of glibc.
I haven't looked too deeply into how it's implemented, but it could be a bit of a challenge to integrate it with the event loop.
Also, since it only works with glibc, we'll need a generic solution for other targets anyway. So I don't think it's worth pursuing.
HertzDevil commented on Sep 1, 2024
- `getaddrinfo_a` notifies the caller via a POSIX signal or a callback in a new thread.
- `getaddrinfo_async_start`; it is used by Bun, other than that I couldn't find any documentation about it. Another system API is `DNSServiceGetAddrInfo` for macOS 10.12+.
- `DnsQueryEx` or `GetAddrInfoExW`. Both support asynchronous callbacks, the latter overlapped I/O also.
I don't think we really have to stick to using `getaddrinfo` exactly, as long as other system APIs also provide feature parity with respect to those customizations.
`Crystal::System::Addrinfo` #14957
stakach commented on Sep 3, 2024
Building a Crystal-native implementation sounds like the easiest way to provide cross-platform non-blocking support.
I like the idea of being able to switch out the implementation too - making it easier to add support for mDNS or similar to 3rd party libraries that one might be using in an application (or maybe a way to switch implementation on matching regex `.local` for instance)
ChatGPT pumped out a robust (albeit basic) Crystal implementation that supports IPv4, IPv6 and ALPN, defaulting to the `/etc/resolv.conf` servers, with pipelined queries and multiple DNS servers with timeouts
Add `Crystal::System::Addrinfo` (#14957)
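As a rough idea of the two smallest building blocks a native resolver of that kind needs, the following Python sketch reads the `nameserver` entries from `resolv.conf`-style text and encodes a single-question DNS query in the RFC 1035 wire format. It is not taken from any implementation mentioned in this thread; `parse_nameservers`, `encode_name` and `build_query` are hypothetical helper names, and sending the packet, retries, timeouts and response parsing are all left out.

```python
import struct

QTYPE_A = 1      # IPv4 address record
QTYPE_AAAA = 28  # IPv6 address record
QCLASS_IN = 1    # Internet class


def parse_nameservers(resolv_conf_text):
    """Collect the addresses of all 'nameserver' lines."""
    servers = []
    for raw in resolv_conf_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":  # blank line or comment
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers


def encode_name(name):
    """Encode 'example.com' as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError("invalid DNS label: %r" % label)
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name, qtype=QTYPE_A, query_id=0):
    """Build a query with one question and the recursion-desired flag set."""
    # Header: id, flags (RD = 0x0100), QDCOUNT=1, ANCOUNT, NSCOUNT, ARCOUNT.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    return header + encode_name(name) + struct.pack("!HH", qtype, QCLASS_IN)
```

The resulting bytes would be sent over UDP to port 53 of one of the parsed servers. Everything that `getaddrinfo` does beyond this (hosts file, search domains, nsswitch plugins) is exactly the part that is hard to replicate.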
straight-shoota commented on Oct 16, 2024
I found this great resource explaining details of the different implementations for DNS resolvers on Linux:
https://biriukov.dev/docs/resolver-dual-stack-application/0-sre-should-know-about-gnu-linux-resolvers-and-dual-stack-applications/
Haven't read all of it yet. But I think an easy takeaway is that `getaddrinfo_a` is just running `getaddrinfo` within a thread pool. I don't think it's worth using this.
In order to support async `getaddrinfo` on Unix systems that don't have the GNU extension `getaddrinfo_a` (which includes musl), we'll need our own threadpool-based implementation anyway. Then we can just use that on all Unix platforms, including glibc.
straight-shoota commented on Oct 16, 2024
Another observation is that the implementation of `getaddrinfo` in musl is very basic compared to glibc (ref #13619 (comment)). I figure it should be relatively easy to implement the equivalent in Crystal natively. The great benefit would be using Crystal's concurrent IO and not occupying an OS thread.
I presume the implementations of `getaddrinfo` on other Unix systems will be more on the simple side and not as bloated as glibc's.
straight-shoota commented on Oct 16, 2024
There's also an overview of which resolvers are used in popular programming languages' standard libraries:
- `getaddrinfo` + optional async version running in a thread pool
- `getaddrinfo` via cgo (threaded via scheduler semantics). Go impl is usually preferred, but under some conditions it prefers `getaddrinfo` (e.g. on OS X, or with certain configurations that the native implementation cannot handle). So Go impl looks like a plug-in optimization for simple standard configurations.
- `getaddrinfo` (not sure if threaded, but seems like no)
- `getaddrinfo` (not sure if threaded)
- `getaddrinfo` via `libuv` (in a thread pool)
crysbot commented on Oct 18, 2024
This issue has been mentioned on Crystal Forum. There might be relevant details there:
https://forum.crystal-lang.org/t/charting-the-route-to-multi-threading-support/7320/1
yxhuvud commented on Oct 19, 2024
One way to get started fast could be to start with one of the implementations for Ruby, translate that and then refactor until happiness ensues. Available options there are https://github.com/ruby/resolv and perhaps https://github.com/socketry/rubydns, even if that one is mainly a server and therefore has more functionality than we want. If nothing else they could be good places to look for inspiration.
straight-shoota commented on Oct 19, 2024
The implementations in Go and musl libc should also be good inspiration.
ysbaddaden commented on Oct 20, 2024
I'll state it again: languages don't implement custom DNS because it's much harder and more complex than it first looks.
We already removed support for libevent DNS because it failed in practice in simple setups, because of glibc, nsswitch, Bonjour, Docker DNS, etc. See #2660, #2426, #2745 and the issues they link to.
Stdlib should merely provide a configurable backend and either keep the current blocking behavior and/or delegate to a pool of threads. Then shards can implement custom DNS resolvers and applications replace the default adapter.
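A configurable backend of this kind could look roughly like the following Python sketch. It is an illustration under assumptions, not a proposed Crystal API: `ResolverRegistry`, `register_suffix` and `system_resolver` are hypothetical names, and the suffix routing mirrors the `.local` idea mentioned earlier in the thread.

```python
import socket


def system_resolver(host):
    """Default adapter: the blocking system resolver."""
    infos = socket.getaddrinfo(host, None, type=socket.SOCK_STREAM)
    return [info[4][0] for info in infos]


class ResolverRegistry:
    """Hypothetical hook letting applications replace or extend resolution."""

    def __init__(self, default=system_resolver):
        self.default = default  # applications may swap this out entirely
        self._by_suffix = {}

    def register_suffix(self, suffix, resolver):
        """Route names ending in `suffix` (e.g. '.local') to `resolver`."""
        self._by_suffix[suffix.lower()] = resolver

    def resolve(self, host):
        name = host.rstrip(".").lower()
        # Longest suffix first, so the most specific rule wins.
        for suffix in sorted(self._by_suffix, key=len, reverse=True):
            if name.endswith(suffix):
                return self._by_suffix[suffix](host)
        return self.default(host)
```

A shard providing mDNS would then call something like `registry.register_suffix(".local", mdns_lookup)` (where `mdns_lookup` is whatever the shard provides), while everything else keeps going through the system resolver.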
straight-shoota commented on Oct 20, 2024
Not sure I agree without reservations. I think it depends a lot on use case.
The complications only apply if you expect feature parity with `getaddrinfo`, particularly the GNU variant. But that complexity may not always be necessary or desirable. For example, it makes it hard to reason about what's going on.
System-specific resolvers can cause an application to behave differently on different platforms, including statically (musl) vs. dynamically (glibc) linked executables on Linux.
That being said, this would be very low priority. I don't think we will see a resolver implementation in stdlib anytime soon. If ever.
An immediate action should be to ensure `getaddrinfo` can run in a thread without blocking other fibers. This shouldn't be hard with the upcoming execution contexts.
And we should see what's necessary to integrate other resolver implementations from user code.
stakach commented on Oct 22, 2024
I created a shard with a possible implementation: https://github.com/spider-gazelle/dns
Currently features:
`.local` domains
Happy to work on getting this into the Crystal std lib with community collaboration if there is interest / people think it's going in the right direction - or at least hooks in the std lib so an alternative resolver can be used without monkey patching
stakach commented on Oct 31, 2024
I have https://github.com/spider-gazelle/dns probably resolving most of the issues seen in previous getaddrinfo replacements
I would like to progress this into something that might be considered for inclusion in the std lib
AND/OR
implementing an official way to replace the default resolver as @ysbaddaden mentioned.
My opinion is:
- `include "dns"` automatically with `socket` for platforms where an async implementation isn't available
- `Socket::Addrinfo` that doesn't seem ideal)
I imagine something like this in the standard lib (of course no one would be forced to use this implementation either, assuming point 2 is implemented)
Does the above sound like a way forward?
atlantis commented on Nov 2, 2024
This sounds amazing to me! Even if `getaddrinfo` remains the default implementation, I would argue that it's a kindness to future Crystal developers to call it in its own thread, if at all possible - it was a bit of a shock to see my entire near-realtime embedded application hang for 20s the first time it lost internet, in spite of all my careful `spawn` planning.
Granted my use-case is non-standard, but I believe that even a vanilla Crystal use-case like a high-performance Kemal app could easily be affected by this design decision (e.g. an app that routinely calls customer-specified webhook URLs might periodically hang for a few seconds until `getaddrinfo` returns)?
So if it's feasible to have the default `getaddrinfo` call in its own thread, seems like that's clearing out a potential trap that would otherwise be lurking once you start using Crystal at scale. 🤷♂️
stakach commented on Nov 5, 2024
@atlantis we could make the DNS lib native resolver perform the resolutions in a thread pool on platforms that don't support async - which would solve the issue
Using native async system calls would be lighter weight so I think doing that by preference would work best.
On platforms where async isn't supported we would `require "dns"` anytime `socket` is required to provide async DNS transparently
That way we don't complicate things beyond the 3 points I outlined above (I'll edit the message with these changes)
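For the multi-second hang described above, a thread-pool fallback can also bound how long a caller waits. The Python sketch below shows one way to do that under stated assumptions: `resolve_with_timeout` and its `lookup` parameter are hypothetical, and asyncio's default executor stands in for whatever pool the runtime would provide.

```python
import asyncio
import functools
import socket


async def resolve_with_timeout(host, port, timeout, lookup=socket.getaddrinfo):
    """Resolve on a worker thread, giving up after `timeout` seconds."""
    loop = asyncio.get_running_loop()
    call = functools.partial(lookup, host, port, type=socket.SOCK_STREAM)
    future = loop.run_in_executor(None, call)
    try:
        infos = await asyncio.wait_for(future, timeout)
    except asyncio.TimeoutError:
        # Only the caller is released. The worker thread stays busy until the
        # blocking call returns, because it cannot be interrupted safely.
        return None
    return [info[4][0] for info in infos]
```

The caller gets control back on time, but a stuck lookup still occupies a pool thread until libc gives up, so the pool needs enough headroom (or its own limit) to ride out an outage.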
albertz commented on Sep 17, 2025
I found this issue to be an interesting discussion about async alternatives to `getaddrinfo`.
There was a recent article RIP pthread_cancel (HN discussion) on `pthread_cancel` use for `getaddrinfo` (which does not work well).
One option which was often mentioned is the standalone library c-ares. Maybe that could be an option for Crystal as well? C-ares also provides a list of other alternatives here, e.g. djbdns.
yxhuvud commented on Sep 18, 2025
Generally I'm skeptical of using alternatives to `getaddrinfo` for compatibility reasons, but c-ares claims to be used by, among others, libcurl and Node.js, which seems like a very reasonable set of users. Looking at their examples, it seems like they have support for both threaded and async mode activated by socket. Either would be very implementable, though the devil is in the details in deciding what would be the best solution for Crystal in an execution context world.
straight-shoota commented on Oct 16, 2025
The Zig devblog writes about their homegrown async DNS and its quirks compared to `getaddrinfo`: https://ziglang.org/devlog/2025/#2025-10-15