I've got a notebook and start an OpenVPN daemon at boot time, but lacking a DNS server it cannot connect. When I get some kind of network connection by plugging in an ethernet cable or coming near some WLAN access point, my network is reconfigured and all newly started processes work well enough. But processes started before resolv.conf was changed have to be restarted by root, which is really annoying, especially if they maintain some kind of state while running. This was just one example; I believe there are many more programs around with this kind of problem, e.g. nscd. So I make this a feature request. I'm surprised I could not find this around already.

Possible solutions:

* reread resolv.conf every time some name is resolved
* check the modification time of resolv.conf every time a name is resolved
* reread resolv.conf if a nameserver does not respond
* reread resolv.conf if the cached DNS server IP is more than t seconds old
* include some explicit "reread now" command for such daemons
* ... to be combined and continued
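As a rough illustration of the second idea (checking the modification time), an application can already approximate this on its own today: stat /etc/resolv.conf before each lookup and call res_init() when the mtime changes. This is only a sketch, not an existing or proposed libc interface; the wrapper name is made up and locking is omitted.

/* Sketch only: application-side mtime check of /etc/resolv.conf.
   Not part of glibc; the wrapper name is illustrative. */
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>

static time_t last_mtime;

static void maybe_reload_resolv_conf(void)
{
    struct stat st;

    if (stat("/etc/resolv.conf", &st) == 0 && st.st_mtime != last_mtime) {
        last_mtime = st.st_mtime;
        res_init();          /* re-read the name server list */
    }
}

/* Hypothetical wrapper an application could use instead of calling
   getaddrinfo() directly. */
int resolving_getaddrinfo(const char *node, const char *service,
                          const struct addrinfo *hints,
                          struct addrinfo **res)
{
    maybe_reload_resolv_conf();
    return getaddrinfo(node, service, hints, res);
}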
There is a solution, already implemented: use nscd, and run "nscd -i hosts" in the script that rewrites your resolv.conf (or nsswitch.conf, etc.).
I have run into the same problem. Is there any way to make gethostbyname() reread /etc/resolv.conf?
Use nscd. There will be no support for this in the libc routines themselves.
*** Bug 3675 has been marked as a duplicate of this bug. ***
nscd does not resolve the resolver issue.

I have verified with sendmail. Sendmail caches the resolver information and will not accept updates, even with nscd running. I have verified it will still attempt to communicate with the old name servers after updating the resolv.conf.

Another issue with the nscd solution is that it interferes with FreeIPA/IdM, as they require sssd and all referenced documentation states not to have nscd installed/running with sssd.
(In reply to Karl from comment #5)

> nscd does not resolve the resolver issue.
>
> I have verified with sendmail. Sendmail caches the resolver information and
> will not accept updates, even with nscd running. I have verified it will
> still attempt to communicate with the old name servers after updating the
> resolv.conf.

Indeed, this is a very good point. Thanks.
(In reply to Karl from comment #5)

> nscd does not resolve the resolver issue.
>
> I have verified with sendmail. Sendmail caches the resolver information and
> will not accept updates, even with nscd running. I have verified it will
> still attempt to communicate with the old name servers after updating the
> resolv.conf.

How is that a problem in glibc if sendmail caches the resolver? Or are you saying something else? That libc.so.6 caches the resolvers and fails to call out to nscd? That would be a real bug, and we'd like to see some kind of reproducer for that if possible, so we can fix the issue.

> Another issue with the nscd solution is that it interferes with FreeIPA/IdM
> as they require sssd and all referenced documentation states not to have
> nscd installed/running with sssd.

This documentation is wrong. I have worked closely with the SSSD team at Red Hat, and I can get more concrete evidence to prove this if we need it. You can run SSSD with NSCD without any problem. Can you provide a reference to the documentation so I can talk to the SSSD team about this?
(In reply to Carlos O'Donell from comment #7) > How is that a problem in glibc if sendmail caches the resolver? Or are you > saying something else? That libc.so.6 caches the resolvers and fails to call > out to nscd? That would be a real bug, and we'd like to see some kind of > reproducer for that if possible, so we can fix the issue. sendmail needs to do MX lookups, so it uses res_query (and res_search, depending on the context) and not one of the getaddrinfo/get*by* functions. It's also a forking daemon. It initializes the glibc resolver before forking, and I assume all the child processes inherit the cached list of name servers.
(In reply to Florian Weimer from comment #8)
> (In reply to Carlos O'Donell from comment #7)
>
> > How is that a problem in glibc if sendmail caches the resolver? Or are you
> > saying something else? That libc.so.6 caches the resolvers and fails to call
> > out to nscd? That would be a real bug, and we'd like to see some kind of
> > reproducer for that if possible, so we can fix the issue.
>
> sendmail needs to do MX lookups, so it uses res_query (and res_search,
> depending on the context) and not one of the getaddrinfo/get*by* functions.
> It's also a forking daemon. It initializes the glibc resolver before
> forking, and I assume all the child processes inherit the cached list of
> name servers.

Correct, the res_* functions are designed specifically to talk directly to Internet domain name servers, and as such bypass nscd and sssd.

You are also correct that once you initialize the resolver state, the state is static; this is all well known. All children and threads will have the same resolver state if created after initialization. It is also well known that calling res_init() again will cause any underlying configuration files to be reloaded (the atomic increment of __res_initstamp does this).

Therefore the bug is entirely in sendmail. If you use this API you must have a side channel to notify the application that it should call res_init() again. This is a push process. For example, it might be via systemd integration that you discover the network has changed and call res_init() again.

There have been patches floated that add stat() calls to *all* of the res_* functions, but the performance implications of that change have never been analyzed, and that's why the patch keeps getting rejected. Is it within the noise to do stat() on /etc/resolv.conf to reload the resolvers if they change? It seems a heavy-handed approach for systems which are less dynamic and have more stable configurations. At first blush the stat() has to be less costly than the upcoming network traffic, but it's still a non-zero cost paid in the hot path of all these functions.
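To make the "push" model concrete, here is a rough sketch of what an application using the res_* API could do: let some network-management hook notify it, and call res_init() before the next query. Using SIGHUP as the side channel is purely an assumption for illustration; the function names are made up.

/* Sketch: reload the resolver on an out-of-band notification.
   SIGHUP is an arbitrary choice; D-Bus, netlink or systemd
   integration would work the same way. */
#include <signal.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>

static volatile sig_atomic_t resolv_conf_changed;

static void on_sighup(int sig)
{
    (void) sig;
    resolv_conf_changed = 1;
}

void install_reload_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sighup;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGHUP, &sa, NULL);
}

/* Call this before each res_query()/res_search(). */
void reload_resolver_if_notified(void)
{
    if (resolv_conf_changed) {
        resolv_conf_changed = 0;
        res_init();   /* re-reads /etc/resolv.conf */
    }
}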
*** Bug 18279 has been marked as a duplicate of this bug. ***
Please consider for inclusion in glibc the stat() patch used by Debian.

I recently spent a long time tracking down a problem with hostname resolution in a long-running process, which ultimately turned out to be caused by glibc caching an empty /etc/resolv.conf. This can occur if the network configuration is dynamic, e.g. managed by DHCP and NetworkManager. From googling, it's apparent that many different programs, such as web browsers (Firefox, Chromium, etc.), have also run into this problem and have had to add hacks to work around it.

Interestingly, in contrast with /etc/resolv.conf, glibc's resolver immediately recognizes changes in /etc/hosts. Furthermore, /etc/hosts is always read in full. It seems that if there is concern about the performance of stat()ing /etc/resolv.conf, there could be an optimization to skip reading /etc/hosts if it hasn't changed, thereby replacing about 5 system calls with 1. That would likely save more time per getaddrinfo() than is spent by stat()ing /etc/resolv.conf an extra time.

Of course, the "correct" solution would be to use inotify to push changes only when they actually happen. Unfortunately, glibc doesn't have an opportunity to do that.
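For reference, an application-side version of the inotify idea (glibc itself has no natural place to own such a watch, as noted above) might look roughly like this sketch; the function names are made up, and the watch descriptor would be driven from the application's own poll()/epoll loop.

/* Sketch: watch /etc/resolv.conf with inotify and re-read it only when
   it actually changes.  Watching the directory also catches the common
   write-temp-file-then-rename update pattern. */
#include <sys/inotify.h>
#include <unistd.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>

int watch_resolv_conf(void)
{
    int fd = inotify_init1(IN_NONBLOCK);
    if (fd >= 0)
        inotify_add_watch(fd, "/etc", IN_CREATE | IN_MODIFY | IN_MOVED_TO);
    return fd;   /* add to the application's poll()/epoll set */
}

/* Call when the watch descriptor becomes readable. */
void handle_resolv_conf_events(int fd)
{
    char buf[4096] __attribute__ ((aligned(__alignof__(struct inotify_event))));
    ssize_t len = read(fd, buf, sizeof(buf));

    for (char *p = buf; len > 0 && p < buf + len; ) {
        const struct inotify_event *ev = (const struct inotify_event *) p;
        if (ev->len > 0 && strcmp(ev->name, "resolv.conf") == 0)
            res_init();                  /* reload the name server list */
        p += sizeof(struct inotify_event) + ev->len;
    }
}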
Created attachment 9078 [details] attachment-37701-0.html

This sounds like a great solution!!!
(In reply to Eric Biggers from comment #11)
> Please consider for inclusion in glibc the stat() patch used by Debian.

The Debian patch is incorrect: it breaks applications which override name servers by direct access to _res. I plan to add some /etc/resolv.conf auto-update functionality, but it will need a different implementation.

This work is blocked by our inability to properly test libresolv and /etc/resolv.conf processing right now. A first step along the path is this patch, which is still awaiting review: <https://sourceware.org/ml/libc-alpha/2016-02/msg00376.html>
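To illustrate the class of application the Debian patch breaks: some programs override the name server list by writing into the global _res state after res_init(), roughly like the sketch below (the address is just an example from the documentation range). An unconditional automatic re-read of /etc/resolv.conf would silently discard such overrides.

/* Sketch: overriding the resolver's name servers via direct access to
   _res, a pattern an automatic resolv.conf re-read would clobber. */
#include <netinet/in.h>
#include <arpa/inet.h>
#include <arpa/nameser.h>
#include <resolv.h>

void use_private_name_server(void)
{
    res_init();

    _res.nsaddr_list[0].sin_family = AF_INET;
    _res.nsaddr_list[0].sin_port = htons(53);
    inet_pton(AF_INET, "192.0.2.53", &_res.nsaddr_list[0].sin_addr);
    _res.nscount = 1;
}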
I am having the same issue with CrashPlan. The backup engine service fails because there is no connection on boot. Here is the text from the support representative. Any thoughts would be appreciated.

==========================================================================

Thanks for your patience on this. Our inquiry wound its way to the Tier 3 Engineers and the Engineering department. What you are experiencing is a bug in Linux... depending on who you ask. RedHat thinks it is working as expected, everyone else thinks it's a bug. Go figure!

Here's what is going on (just a warning, things get pretty jargon-y): When a process (like CrashPlan) makes its first DNS request, glibc reads the list of DNS servers from /etc/resolv.conf. If networks are chosen after boot, or dynamically with DHCP, /etc/resolv.conf may be empty. This means that applications that don't re-initialize their name servers will be stuck with nothing, and will not be able to resolve addresses.

RedHat's stance on this is that it is up to the application to handle this re-initialization logic. For short-lived programs (ping, for example), this isn't a big deal because once they are run again, there is usually a name server for resolution. For long-running daemons, such as the CrashPlan Engine, they never re-initialize the name servers, and cannot connect. Restarting the service with a connection, and therefore a nameserver, gets things rolling again - which is why you notice that a restart resolves the issue. Notably, Debian-based distros use a patched version of glibc that takes care of this problem.

Unfortunately, as only a very small subset of our users experience this problem, we have no intention of changing the logic of our product to account for how RedHat distributions handle initial name resolution. That leaves you with 3 options, all of which are beyond CrashPlan's scope of support:

* Rebuild glibc with Debian's patch.
* Configure NetworkManager to use a local dnsmasq instance.
* Switch distros to a Debian-based solution (Debian, Ubuntu, Mint, etc.)

Likewise, CrashPlan may not be the product that fits your use case - and that is what the trial period is for! We want you to have a backup solution that works for you. You may find the glibc bug page interesting, though to be honest I can't make heads or tails out of it!: https://sourceware.org/bugzilla/show_bug.cgi?id=984

Though not satisfying, within the context of CrashPlan support I must consider this ticket resolved, and will mark it as solved. If you have any additional questions, please let me know!
Any update?

This bug is now 11 years old and injects false notions into POSIX-compliant code.

Caching the resolver should be avoided at all costs. There are methods to cache the name lookups which should be used, but caching the resolver results in bad results with NetworkManager (installed by default by Red Hat) and any modifications to the resolv.conf name servers.

The only way to address this currently is to reboot the server any time the resolver is modified. This is not practical and, again, NetworkManager will modify it after boot. I've already proven that nscd and sssd do not address this break.

There's also a very real exploit here. A hacker could gain the ability to modify resolv.conf, restart apache, sendmail, or another app which is caching the resolver information, place back the original resolv.conf, and now use their name servers to route web or SMTP traffic to their sites.
(In reply to Karl from comment #15)

> Any update?
>
> This bug is now 11 years old and injects false notions into POSIX-compliant
> code.
>
> Caching the resolver should be avoided at all costs. There are methods to
> cache the name lookups which should be used, but caching the resolver
> results in bad results with NetworkManager (installed by default by Red
> Hat) and any modifications to the resolv.conf name servers.
>
> The only way to address this currently is to reboot the server any time the
> resolver is modified. This is not practical and, again, NetworkManager will
> modify it after boot. I've already proven that nscd and sssd do not address
> this break.
>
> There's also a very real exploit here. A hacker could gain the ability to
> modify resolv.conf, restart apache, sendmail, or another app which is
> caching the resolver information, place back the original resolv.conf, and
> now use their name servers to route web or SMTP traffic to their sites.

There is some consensus that glibc should be changed to match the debian-glibc behaviour, which checks for changes in /etc/resolv.conf. The problem, as noted in comment 13 by Florian, is that we need better test infrastructure in glibc to test resolver changes. With that in mind I reviewed Florian's chroot-based test for resolver changes here:

https://sourceware.org/ml/libc-alpha/2016-06/msg00376.html
https://sourceware.org/ml/libc-alpha/2016-06/msg00366.html

Thus I think we're making some progress here.
*** Bug 20900 has been marked as a duplicate of this bug. ***
(In reply to Karl from comment #15) Sorry, but your security claim makes no sense. If a hacker has compromised your system enough to modify resolv.conf, then you've already lost. Claiming that reloading it on the fly fixes things is a bit ridiculous.