Bug 682833
| Field | Value |
|---|---|
| Summary | Autofs doesn't resolve correctly with dnsnames after a connection. |
| Product | Red Hat Enterprise Linux 6 |
| Component | autofs |
| Version | 6.0 |
| Status | CLOSED INSUFFICIENT_DATA |
| Severity | unspecified |
| Priority | low |
| Reporter | Patrik Martinsson <martinsson.patrik> |
| Assignee | Ian Kent <ikent> |
| QA Contact | yanfu,wang <yanwang> |
| CC | ikent, rwheeler |
| Target Milestone | rc |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | Bug Fix |
| Last Closed | 2011-08-15 12:34:38 UTC |

Description
Patrik Martinsson
2011-03-07 17:57:59 UTC
Yes, that's strange behaviour. I'll check it out.

Ian

(In reply to comment #0)
> Description of problem:
> Autofs doesn't seem to be able to get host info through getaddrinfo() when using
> dns-names. The feature works well and as expected when having a network, but if
> the connection goes down, and then brought up again it doesn't seem to work
> properly.

Initial impression is I can't duplicate this but I haven't yet set up proper DNS.

> Version-Release number of selected component (if applicable):
> autofs-5.0.5-23.el6.x86_64
>
> How reproducible:
> Always.

There are a couple of things wrong with this procedure as well. Perhaps you can redo your test taking account of the comments below.

> Steps to Reproduce:
> 1. Create a map,
> test -auto,intr,vers=3 fileserver:/vol/test

OK, good start, use an invalid mount option, auto, in the map entry. Probably not an actual problem though.

> 2. /usr/sbin/automount -d -f -n 1
>
> 3. cd /autofs/test
>
> 4. Disconnect network.
>
> 5. Connect network.
>
> 6. Run simple getaddrinfo-program to test just to see that we actually can
> reach filer.

snip ...

Right, so run an independent lookup .... mmm, which doesn't check if glibc is caching results and is somehow affected by the network disconnect ..... again probably not the case.

> 6. connection verified, try to cd /autofs/test

But your working directory is already /autofs/test so the previous mount should still be mounted and this should do nothing but set the directory again without contacting the autofs daemon.

> Actual results:
> autofs complains about,
> add_host_addrs: hostname lookup failed: Temporary failure in name resolution

You have run the daemon with debug but haven't included the output so we have no idea what actually happened. Like whether the mount was expired (but you would have had to change working directory away, which isn't included in your description) and a new mount was attempted.

How about redoing the test with a little more detailed feedback please.

Ian

Created attachment 482834 [details]
Name lookup alternate program
Running this before, during and after as part of the test
should provide a check for some sort of glibc caching
causing a problem.
Thanks for the quick response, sorry for being somewhat sloppy in the bug report.

In your getaddr-program I added, ni = NULL; after the declaration, otherwise freeaddrinfo segfaults if a host is not reachable. I also changed it so we never exit the program. I've attached it together with a debug log and a recording of how I made the test.

I've tested it with the approach you suggest,

1. map is now, temp -defaults,intr,vers=3 fs5:/vol/vol3/no_backup/temp
2. /usr/sbin/automount -d -f -n 1, the log is attached.
3. Having your getaddr app running.
4. cd /autofs/test, mount works.
5. Pressing enter on getaddr program shows successful getaddrinfo on host.
6. cd /
7. Disconnect network.
8. Pressing enter on getaddr program shows unsuccessful getaddrinfo on host.
9. cd /autofs/test, mount does not work, as expected with no network.
10. Connect network again.
11. Pressing enter on getaddr program shows successful getaddrinfo on host.
12. cd /autofs/test, mount does not work, debug output says temporary failure in name resolution, retrying this a couple of times with same result.
13. Pressing enter on getaddr program shows successful getaddrinfo on host.
14. cd /autofs/test, mount does not work, debug output says temporary failure in name resolution.
15. Closing test.

I've recorded the test so you can see exactly how it's made.

I cannot reproduce this when using ip-addr instead of hostname. This is really not a big issue since we never change IP addresses on the filers, however I thought I should mention the problem in case anybody else has the same issue.

And as I mentioned earlier, sometimes it works after a couple of tries, sometimes I never seem to be able to mount without restarting the daemon.

Created attachment 482867 [details]
Logs, recording and getaddr-sample.
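
The `ni = NULL;` fix mentioned in this comment can be illustrated with a minimal sketch. This is not the attached test program, which is not reproduced in this report; the helper name `check_host` and its `flags` parameter are hypothetical. It assumes only the standard `getaddrinfo()`/`freeaddrinfo()` interface and shows why the result pointer should be initialised before cleanup runs on a failed lookup.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Hypothetical helper (not from the attachment): look up `host` once.
 * Returns 0 on success, otherwise the getaddrinfo() error code. */
int check_host(const char *host, int flags)
{
    struct addrinfo hints;
    struct addrinfo *ni = NULL;  /* initialised, so cleanup is safe on failure */
    int ret;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;     /* accept IPv4 or IPv6 results */
    hints.ai_socktype = SOCK_DGRAM;
    hints.ai_flags = flags;          /* 0 for a normal name lookup */

    ret = getaddrinfo(host, NULL, &hints, &ni);
    if (ret != 0)
        fprintf(stderr, "lookup of %s failed: %s\n", host, gai_strerror(ret));

    /* Without the initialisation above, ni would hold garbage after a
     * failed lookup and passing it to freeaddrinfo() could crash. */
    if (ni != NULL)
        freeaddrinfo(ni);

    return ret;
}
```

Calling something like `check_host("fs5", 0)` before, during and after the network disconnect would give the same kind of independent check as steps 5, 8, 11 and 13 above.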
(In reply to comment #5)
> Thanks for the quick response, sorry for being somewhat sloppy in the
> bugreport.

No problem.

> In your getaddr-program I added, ni = NULL; after the declaration, otherwise
> freeaddrinfo segfaults if a host is not reachable. Also changed so we never
> exit the program, I've attached it together with a debuglog and a recording of
> how I made the test.

Got that.

At first I was wondering why the directory change after the interface comes back up caused a new mount but I see that the existing mount gets umounted when the interface goes down, probably by NetworkManager (in my case); it's definitely not autofs doing it. I wasn't aware of that behaviour.

At this point I still don't see the name lookup problem you are seeing but that's on Fedora 14. I don't have an up-to-date RHEL-6 install, so that's the next thing I need to do.

I will also update the dns lookup program to do the same thing as the daemon does. It does an address lookup on the passed-in string and then progresses to a name lookup if that fails, in case the name is actually an address string.

Ian

Created attachment 484147 [details]
Another name lookup test program
This is basically what the daemon does when looking up a name.
Does using this for the name lookup exhibit the same problem
you are seeing?
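
The two-stage lookup described above (address lookup first, then a name lookup if that fails) might look roughly like the following. This is a sketch under stated assumptions, not the attached program or the autofs source: the function name `lookup_host` is hypothetical, and the exact hint flags the daemon uses are not given in this report, so only `AI_NUMERICHOST` is used for the first stage and no flags for the second.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Hypothetical sketch: resolve `host`, trying it as an address string first.
 * On success returns 0 and stores a list in *result that the caller must
 * release with freeaddrinfo(). On failure returns the getaddrinfo() error
 * code and leaves *result set to NULL. */
int lookup_host(const char *host, struct addrinfo **result)
{
    struct addrinfo hints;
    int ret;

    *result = NULL;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_DGRAM;

    /* Stage 1: address lookup. AI_NUMERICHOST parses the string as a
     * literal IPv4/IPv6 address and never queries a name server. */
    hints.ai_flags = AI_NUMERICHOST;
    ret = getaddrinfo(host, NULL, &hints, result);
    if (ret == 0)
        return 0;

    /* Stage 2: the string was not an address, so do a name lookup.
     * The flags used here are an assumption for illustration. */
    *result = NULL;
    hints.ai_flags = 0;
    ret = getaddrinfo(host, NULL, &hints, result);
    if (ret != 0) {
        *result = NULL;
        fprintf(stderr, "hostname lookup failed: %s\n", gai_strerror(ret));
    }
    return ret;
}
```

With this shape, an IP address in the map entry succeeds in stage 1 without touching DNS, which is consistent with the report that mounts by ip-addr keep working while mounts by hostname fail with a temporary name resolution failure.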
Hey Ian,

I've redone the tests with the new name lookup program, and I'm seeing the same behaviour as earlier; you can watch the video.

The test is not conclusive since it tends to work sometimes and sometimes not. As you can see in the video, the first mount after the disconnect works as expected, but then when I disconnect/connect again I can't get it to mount.

Offhand, I think this sounds like some caching issue as you've mentioned earlier, but I don't know how much time and effort we should spend on this matter, since it appears that I'm the only one with this problem and it's very random.

As I said earlier, we use the IP addresses instead and it works fine. Although it's always a bit annoying to not find the real issue :)

Best regards,
Patrik Martinsson

Created attachment 485771 [details]
Video of autofs tests.
(In reply to comment #9)
> Hey Ian,
>
> I've redone the tests with the new name lookup program, and I'm seeing the same
> behaviour as earlier, you can watch the video.

I was hoping the name lookup program would fail in the same way as automount. I thought maybe it was an initialization problem specific to your machine because you had to add initialization to my original program and I didn't see that problem when I used it. It may still be that, since the environment within automount isn't the same as the test program.

> The test is not conclusive since it tends to work sometimes and sometimes not,
> as you can see in the video the first mount after the disconnect works as
> expected, but then when i disconnect/connect again I cant get it to mount.
>
> Spontaneously I think this sounds like a some caching issue as you've mentioned
> earlier, but i don't know how much time and effort we should spend on this
> matter - since it appears that I'm the only one with this problem and it's very
> random.

Yes, it's quite odd.

Adding specific initialization to the name lookup parts of autofs is straightforward; it's done in only two places. I can make a test package on the chance it would fix the problem.

Another thing from the video is that it looks like the "-n 1" option isn't being honoured. The video clearly shows the mount request immediately returning a fail long after the one second negative timeout. That deserves some investigation. I'll have a look around and see if I can see what might cause that.

I'm setting devel_ack+ on this bug so I can investigate (and fix) the observed problem with setting the negative timeout.

Since I can't reproduce the DNS problem at all, I can't fix it until we get a report that has a scenario that gives a different view of the problem and a lead to follow as to what the problem is.

I have been unable to reproduce this problem despite a fair amount of effort. Consequently I don't have a resolution ready for RHEL-6.2. Deferring to 6.3.
Hi again Ian,

I haven't looked into this issue since we changed to RHEL 6.1, and since we also changed to ip-addr instead of hostnames. I know that our dns setup is somewhat "not the way it should be", so maybe that has something to do with it (although you would figure that the test-program would get the same result then).

I think you can close this as invalid and I can look into it in the future and reopen the bug if I hit this issue again.

Thanks for the great work anyway!

(In reply to comment #17)
> Hi again Ian,
>
> I haven't looked into this issue since we changed to rhel 6.1, and since we
> also changed to ip-addr instead of hostnames. I know that our dns setup is
> somewhat "not the way it should be", so maybe that has something to do with it
> (although you would figure that the test-program would get the same result
> then).

Yes, puzzling.

> I think you can close this as invalid and I can look into it in the future and
> reopen the bug if I hit this issue again.
>
> Thanks for the great work anyway!

OK, thanks for getting back to us. I'll close this INSUFFICIENT_DATA for want of a better status.

Ian