Bug 491351 - automount segfault after lookup failure
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: autofs
Version: 5.3
Hardware: All Linux
Priority: low Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Ian Kent
QA Contact: BaseOS QE
Keywords: OtherQA
Depends On:
Blocks:
Reported: 2009-03-20 11:13 EDT by Sachin Prabhu
Modified: 2010-10-23 04:26 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, if a name lookup failed while creating a TCP or UDP client, automount would destroy the client, but would not set the rpc client to NULL. Therefore, subsequent lookup attempts would attempt to use the invalid rpc client, which would lead to a segmentation fault. Now, when a name lookup fails, autofs sets the rpc client to NULL, and therefore avoids the segmentation fault on subsequent lookup attempts.
Last Closed: 2009-09-02 07:59:53 EDT


Attachments
Patch to clear rpc client on lookup fail (1.09 KB, patch)
2009-03-20 12:14 EDT, Ian Kent

Description Sachin Prabhu 2009-03-20 11:13:09 EDT
A user is experiencing automount segfaults on all of their systems on autofs-5.0.1-0.rc2.102 (and previous versions).  

  Feb  4 06:00:44 rlph047 automount[3491]: create_tcp_client:299: hostname lookup failed: No such file or directory
  Feb  4 06:00:44 rlph047 kernel: automount[28201]: segfault at 00002aab00b9401c rip 00002aaaab63ea00 rsp 0000000040820078 error 6

The segfault is always preceded by the hostname lookup failure, and it seems to happen at random.  We have had them check their DNS setup and they know of no problems that would cause lookup failures. They are using 8-way redundant NFS servers in all of their maps.

We have a core file along with the following partial backtrace.

  #0  0x00002aaaab63ea02 in ?? ()
  #1  0x00002aaaac4a0e91 in rpc_destroy_tcp_client (info=0x40a20fc0) at rpc_subs.c:384
  #2  0x00002aaaac49ff6e in get_nfs_info (logopt=0, host=0x555561636e00, pm_info=0x40a20f70, rpc_info=0x40a20fc0, proto=0x40a20ea0 "h\001", version=16,
   options=0x40a210c0 "ro,hard,intr,vers=3,noquota,tcp,timeo=600,retrans=2", random_selection=0) at replicated.c:582
  #3  0x00002aaaac4a0699 in prune_host_list (logopt=0, list=0x40a21168, vers=51, options=0x40a210c0 "ro,hard,intr,vers=3,noquota,tcp,timeo=600,retrans=2",
   random_selection=0) at replicated.c:640
...


As far as I can tell, the segfault occurs in this macro:

  #define clnt_control(cl,rq,in) ((*(cl)->cl_ops->cl_control)(cl,rq,in))
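To make the failure mode concrete, here is a minimal sketch of what that macro does. It uses stand-in types (`mock_client`, `mock_ops`) rather than the real `CLIENT` from the RPC headers, and `safe_control` is a hypothetical wrapper, not something in autofs. The macro dereferences `cl` and then `cl->cl_ops` with no checks, so a pointer to an already destroyed client is followed straight into freed memory.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types for illustration only; NOT the real CLIENT from <rpc/clnt.h>. */
struct mock_client;

struct mock_ops {
    int (*cl_control)(struct mock_client *cl, int request, void *arg);
};

struct mock_client {
    const struct mock_ops *cl_ops;
    int last_request;
};

/* Same shape as the clnt_control macro quoted above: two unchecked dereferences. */
#define mock_clnt_control(cl, rq, in) ((*(cl)->cl_ops->cl_control)(cl, rq, in))

/* Trivial control handler that just records the request it was given. */
int record_request(struct mock_client *cl, int request, void *arg)
{
    (void)arg;
    assert(cl != NULL);
    cl->last_request = request;
    return 1;
}

const struct mock_ops default_ops = { record_request };

/*
 * Hypothetical guard. It only catches NULL pointers; it cannot detect a
 * dangling, non-NULL pointer to a client that has already been destroyed.
 */
int safe_control(struct mock_client *cl, int request, void *arg)
{
    if (cl == NULL || cl->cl_ops == NULL || cl->cl_ops->cl_control == NULL)
        return 0;
    return mock_clnt_control(cl, request, arg);
}
```

The guard is only useful if whoever destroys the client also clears the pointer; a stale pointer passes every NULL check and still crashes.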

Since the problem seems to occur in the replicated code path, I had them see if they could reproduce it without the redundant servers and so far they have not been able to.

I attempted to reproduce the issue by setting up a redundant NFS share on my cluster and specifying one invalid hostname in the map:

  images -rw,hard,intr,bg,vers=3noquota,nosuid,tcp,timeo=600,retrans=2 jrummy5-1-clust.ruemker.pvt,jrummy5-2-clust.ruemker.pvt,test1.ruemker.pvt:/mnt/lv1

where the first two are valid hostnames and the third is not.  However, it looks like I hit a different lookup failure than the customer did:

  /var/log/messages.3:Feb 11 12:43:34 jrummy5-64 automount[15245]: host test1.ruemker.pvt: lookup failure 1

So it appears the customer's setup succeeds at the first lookup in modules/replicated.c:add_host_addrs but is failing in lib/rpc_subs.c:create_{udp,tcp}_client.  On the assumption that the lookup failure was a direct cause of the segfault, I had them try autofs-5.0.1-0.rc2.102 built with this upstream patch:

  http://www.kernel.org/pub/linux/daemons/autofs/v5/autofs-5.0.3-remove-redundant-dns-name-lookups.patch

because it removes redundant lookups when the address is already known.  This also appears to fix or work around the problem: they have not seen any failures on the system since installing it.
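As a rough illustration of the idea behind that patch (not the patch itself), the sketch below skips the name lookup whenever an address is already stored for the host. All names here (`mock_host`, `resolve_fn`, `get_host_addr`, `failing_resolver`) are hypothetical, and the address is reduced to a plain 32-bit value to keep the example small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host record; the real autofs structures differ. */
struct mock_host {
    const char *name;
    uint32_t addr;      /* IPv4 address, valid only if addr_known is set */
    int addr_known;
};

/* Stand-in for a name lookup: returns 0 on success, -1 on failure. */
typedef int (*resolve_fn)(const char *name, uint32_t *out);

/* Counts how often the resolver is actually consulted. */
int resolve_calls = 0;

/* Resolver that always fails, to mimic the intermittent lookup failure. */
int failing_resolver(const char *name, uint32_t *out)
{
    (void)name;
    (void)out;
    resolve_calls++;
    return -1;
}

/* Use the stored address when there is one; only fall back to a lookup otherwise. */
int get_host_addr(const struct mock_host *host, resolve_fn resolve, uint32_t *out)
{
    assert(host != NULL && out != NULL);
    if (host->addr_known) {
        *out = host->addr;
        return 0;
    }
    return resolve(host->name, out);
}
```

With the address stored, a failing resolver is never called, which is consistent with the segfault no longer being triggered on the patched build even though the underlying dangling-pointer bug is still there.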

The customer is currently working around the problem by disabling replicated servers.

STEPS TO REPRODUCE: 
-Configure autofs map with replicated NFS servers
-Cause a lookup failure that only triggers the create_tcp_client lookup failure and not the failure in add_host_addrs

EXPECTED RESULTS:  Autofs recovers from failed lookup and resumes operation
Comment 4 Ian Kent 2009-03-20 12:14:15 EDT
Created attachment 336067
Patch to clear rpc client on lookup fail

I suspect this patch will help.
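The attachment itself is not reproduced in this report, so the following is only a sketch of the general pattern its title describes: when the lookup fails, destroy the client and clear the pointer so that later code cannot reuse it. The types and functions (`mock_conn_info`, `create_client`, `destroy_client`, the two lookup stubs) are hypothetical stand-ins, not the autofs code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real code works with the RPC CLIENT type. */
struct mock_client {
    int fd;
};

struct mock_conn_info {
    struct mock_client *client;   /* cached client, or NULL if none */
};

/* Stand-in for a name lookup: returns 0 on success, -1 on failure. */
typedef int (*lookup_fn)(const char *host);

int lookup_always_fails(const char *host) { (void)host; return -1; }
int lookup_always_ok(const char *host) { (void)host; return 0; }

/* Release the client and clear the pointer; safe to call more than once. */
void destroy_client(struct mock_conn_info *info)
{
    if (info->client == NULL)
        return;
    free(info->client);
    info->client = NULL;
}

/*
 * Returns 0 on success, -1 on failure. If the lookup fails, any cached
 * client is destroyed AND the pointer is cleared, so no stale pointer
 * is left behind for a later destroy or control call.
 */
int create_client(struct mock_conn_info *info, const char *host, lookup_fn lookup)
{
    assert(info != NULL);
    if (lookup(host) != 0) {
        destroy_client(info);
        return -1;
    }
    if (info->client == NULL) {
        info->client = calloc(1, sizeof(*info->client));
        if (info->client == NULL)
            return -1;
    }
    return 0;
}
```

Putting the NULL assignment inside `destroy_client` keeps the destroy and the clear together, so a second destroy after a failed lookup becomes a no-op instead of a use of freed memory.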
Comment 5 Ian Kent 2009-03-20 12:17:56 EDT
I have added this patch to rev 102 and made a scratch build.
It can be found at
/mnt/redhat/brewroot/scratch/ikent/task_1733533
and is revision 0.rc2.102.bz491351.1.

Can we have this tested to see if I'm correct, please?
Comment 9 Ian Kent 2009-05-21 00:15:48 EDT
This issue has been fixed in the latest autofs package
autofs-5.0.1-0.rc2.125.

I was not able to reproduce this issue; however, the fix has
been verified by the customer.
Comment 11 Chris Ward 2009-07-03 14:27:39 EDT
~~ Attention - RHEL 5.4 Beta Released! ~~

RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner!

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value.

Questions can be posted to this bug or directed to your customer or partner representative.
Comment 13 Ruediger Landmann 2009-08-31 07:39:15 EDT
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
Previously, if a name lookup failed while creating a TCP or UDP client, automount would destroy the client, but would not set the rpc client to NULL. Therefore, subsequent lookup attempts would attempt to use the invalid rpc client, which would lead to a segmentation fault. Now, when a name lookup fails, autofs sets the rpc client to NULL, and therefore avoids the segmentation fault on subsequent lookup attempts.

This leads to a subsequent SEGV when attempting to
use the invalid client.
Comment 14 Ruediger Landmann 2009-08-31 20:07:02 EDT
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,4 +1 @@
-Previously, if a name lookup failed while creating a TCP or UDP client, automount would destroy the client, but would not set the rpc client to NULL. Therefore, subsequent lookup attempts would attempt to use the invalid rpc client, which would lead to a segmentation fault. Now, when a name lookup fails, autofs sets the rpc client to NULL, and therefore avoids the segmentation fault on subsequent lookup attempts.
+Previously, if a name lookup failed while creating a TCP or UDP client, automount would destroy the client, but would not set the rpc client to NULL. Therefore, subsequent lookup attempts would attempt to use the invalid rpc client, which would lead to a segmentation fault. Now, when a name lookup fails, autofs sets the rpc client to NULL, and therefore avoids the segmentation fault on subsequent lookup attempts.-
-This leads to a subsequent SEGV when attempting to
-use the invalid client.
Comment 15 errata-xmlrpc 2009-09-02 07:59:53 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1397.html
