A user is experiencing automount segfaults on all of their systems on autofs-5.0.1-0.rc2.102 (and previous versions).

Feb 4 06:00:44 rlph047 automount[3491]: create_tcp_client:299: hostname lookup failed: No such file or directory
Feb 4 06:00:44 rlph047 kernel: automount[28201]: segfault at 00002aab00b9401c rip 00002aaaab63ea00 rsp 0000000040820078 error 6

The segfault is always preceded by the hostname lookup failure, and it seems to happen at random. We have had them check their DNS setup and they know of no problems that would cause lookup failures. They are using 8-way redundant NFS servers in all of their maps.

We have a core file along with the following partial backtrace.

#0 0x00002aaaab63ea02 in ?? ()
#1 0x00002aaaac4a0e91 in rpc_destroy_tcp_client (info=0x40a20fc0) at rpc_subs.c:384
#2 0x00002aaaac49ff6e in get_nfs_info (logopt=0, host=0x555561636e00, pm_info=0x40a20f70, rpc_info=0x40a20fc0, proto=0x40a20ea0 "h\001", version=16, options=0x40a210c0 "ro,hard,intr,vers=3,noquota,tcp,timeo=600,retrans=2", random_selection=0) at replicated.c:582
#3 0x00002aaaac4a0699 in prune_host_list (logopt=0, list=0x40a21168, vers=51, options=0x40a210c0 "ro,hard,intr,vers=3,noquota,tcp,timeo=600,retrans=2", random_selection=0) at replicated.c:640
...

Which afaict appears to be segfaulting in the macro:

#define clnt_control(cl,rq,in) ((*(cl)->cl_ops->cl_control)(cl,rq,in))

Since the problem seems to occur in the replicated code path, I had them see if they could reproduce it without the redundant servers and so far they have not been able to.

I attempted to reproduce the issue by setting up a redundant NFS share on my cluster and specifying one invalid hostname in the map:

images -rw,hard,intr,bg,vers=3noquota,nosuid,tcp,timeo=600,retrans=2 jrummy5-1-clust.ruemker.pvt,jrummy5-2-clust.ruemker.pvt,test1.ruemker.pvt:/mnt/lv1

where the first 2 are valid hostnames and the 3rd is not.
However, it looks like I hit a different lookup failure than the customer did:

/var/log/messages.3:Feb 11 12:43:34 jrummy5-64 automount[15245]: host test1.ruemker.pvt: lookup failure 1

So it appears the customer's setup succeeds at the first lookup in modules/replicated.c:add_host_addrs but fails in lib/rpc_subs.c:create_{udp,tcp}_client.

On the assumption that the lookup failure was a direct cause of the segfault, I had them try autofs-5.0.1-0.rc2.102 built with this upstream patch:

http://www.kernel.org/pub/linux/daemons/autofs/v5/autofs-5.0.3-remove-redundant-dns-name-lookups.patch

because it removes redundant lookups when the addr is already known. This also appears to fix or work around the problem, as they haven't seen any failures on the system since installing it.

The customer has currently worked around the problem by disabling replicated servers.

STEPS TO REPRODUCE:
- Configure an autofs map with replicated NFS servers
- Cause a lookup failure that triggers only the create_tcp_client lookup failure and not the failure in add_host_addrs

EXPECTED RESULTS:
Autofs recovers from the failed lookup and resumes operation
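To make the suspected crash site easier to reason about, here is a minimal sketch of the dispatch pattern used by the clnt_control macro quoted in the report. All names (mini_client, mini_ops, mini_control, safe_control, demo_dispatch) are hypothetical stand-ins, not the real RPC library types or autofs code. The point it illustrates is that the macro dereferences the handle and then its operations table before any function runs, so a handle that has already been destroyed faults inside the macro itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of an RPC client handle: the handle points at an
 * operations table and calls are dispatched through that table. */
struct mini_client;

struct mini_ops {
    int (*control)(struct mini_client *cl, int request, void *arg);
};

struct mini_client {
    const struct mini_ops *ops;
    int timeout;
};

/* Same shape as the macro in the report: cl and cl->ops are both
 * dereferenced before the target function is entered. */
#define mini_control(cl, rq, in) ((*(cl)->ops->control)(cl, rq, in))

/* Example operation: store an integer timeout on the handle. */
static int set_timeout(struct mini_client *cl, int request, void *arg)
{
    (void)request;
    cl->timeout = *(int *)arg;
    return 1;
}

static const struct mini_ops default_ops = { set_timeout };

/* Guarded wrapper. A NULL handle can be detected and rejected here; a
 * dangling (already freed) handle cannot, which is why the owner of the
 * handle has to reset it to NULL when destroying it. */
static int safe_control(struct mini_client *cl, int request, void *arg)
{
    if (cl == NULL || cl->ops == NULL || cl->ops->control == NULL)
        return 0;
    return mini_control(cl, request, arg);
}

/* Small usage example: returns the timeout stored through the dispatch. */
static int demo_dispatch(void)
{
    struct mini_client c = { &default_ops, 0 };
    int t = 600;
    int rc = safe_control(&c, 1, &t);

    assert(rc == 1);
    return c.timeout;
}
```

Calling demo_dispatch() from a main() exercises the normal path, while safe_control(NULL, ...) returns 0 instead of faulting. No such check can rescue a stale non-NULL pointer.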
Created attachment 336067
Patch to clear rpc client on lookup fail

I suspect this patch will help.
I have added this patch to rev 102 and made a scratch build. It can be found at /mnt/redhat/brewroot/scratch/ikent/task_1733533 and is revision 0.rc2.102.bz491351.1.

Can we have this tested to see if I'm correct, please?
This issue has been fixed in the latest autofs package, autofs-5.0.1-0.rc2.125. I was not able to reproduce this issue; however, the issue has been verified by the customer.
~~ Attention - RHEL 5.4 Beta Released! ~~ RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner! If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity. Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value. Questions can be posted to this bug or your customer or partner representative.
Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Previously, if a name lookup failed while creating a TCP or UDP client, automount would destroy the client, but would not set the rpc client to NULL. Therefore, subsequent lookup attempts would attempt to use the invalid rpc client, which would lead to a segmentation fault. Now, when a name lookup fails, autofs sets the rpc client to NULL, and therefore avoids the segmentation fault on subsequent lookup attempts.

This leads to a subsequent SEGV when attempting to
use the invalid client.
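The behaviour described in the release note can be sketched as follows. This is an illustration only, not the attached patch: every name here (fake_client, fake_conn_info, fake_lookup, fake_create_client, fake_destroy_client, demo_failed_lookup) is hypothetical, and the rule that hostnames starting with "bad" fail to resolve is an assumption made for the demo. It shows a cached client being released and reset to NULL when a lookup fails, so that a later destroy call sees NULL rather than a stale pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; not the real autofs or RPC library types. */
struct fake_client {
    int fd;
};

struct fake_conn_info {
    struct fake_client *client;   /* cached client, reused between hosts */
};

/* Pretend resolver: any name starting with "bad" fails to resolve. */
static int fake_lookup(const char *host)
{
    return strncmp(host, "bad", 3) == 0 ? -1 : 0;
}

/* Create (or reuse) a client for host. On lookup failure the cached client
 * is released AND the pointer is cleared; clearing it is the step the
 * release note says was missing. */
static int fake_create_client(struct fake_conn_info *info, const char *host)
{
    if (fake_lookup(host) < 0) {
        if (info->client) {
            free(info->client);
            info->client = NULL;
        }
        return -1;
    }
    if (!info->client) {
        info->client = malloc(sizeof(*info->client));
        if (!info->client)
            return -1;
        info->client->fd = 3;     /* placeholder descriptor value */
    }
    return 0;
}

/* Later cleanup: safe to call after a failed lookup because the pointer
 * is NULL rather than dangling. */
static void fake_destroy_client(struct fake_conn_info *info)
{
    if (!info->client)
        return;
    free(info->client);
    info->client = NULL;
}

/* Walk the sequence from the report: one good host, one failed lookup,
 * then cleanup. Returns 1 if no client is left cached at the end. */
static int demo_failed_lookup(void)
{
    struct fake_conn_info info = { NULL };
    int ok = fake_create_client(&info, "good-host");
    int failed = fake_create_client(&info, "bad-host");

    assert(ok == 0 && failed == -1);
    fake_destroy_client(&info);
    return info.client == NULL;
}
```

Without the `info->client = NULL` assignment in the failure branch, fake_destroy_client() would free the same memory a second time, which is the same class of error as the reported segfault in rpc_destroy_tcp_client.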
Release note updated. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,4 +1 @@
-Previously, if a name lookup failed while creating a TCP or UDP client, automount would destroy the client, but would not set the rpc client to NULL. Therefore, subsequent lookup attempts would attempt to use the invalid rpc client, which would lead to a segmentation fault. Now, when a name lookup fails, autofs sets the rpc client to NULL, and therefore avoids the segmentation fault on subsequent lookup attempts.
+Previously, if a name lookup failed while creating a TCP or UDP client, automount would destroy the client, but would not set the rpc client to NULL. Therefore, subsequent lookup attempts would attempt to use the invalid rpc client, which would lead to a segmentation fault. Now, when a name lookup fails, autofs sets the rpc client to NULL, and therefore avoids the segmentation fault on subsequent lookup attempts.
-
-This leads to a subsequent SEGV when attempting to
-use the invalid client.
An advisory has been issued which should help with the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1397.html