Bug 1089576

Summary: segfault in automount
Product: Red Hat Enterprise Linux 6
Reporter: Greg Earle <bugzilla.redhat.com>
Component: autofs
Assignee: Ian Kent <ikent>
Status: CLOSED ERRATA
QA Contact: JianHong Yin <jiyin>
Severity: low
Docs Contact:
Priority: unspecified
Version: 6.3
CC: eguan, ikent, rcritten, rmainz
Target Milestone: rc
Target Release: ---
Hardware: i686
OS: Linux
Whiteboard:
Fixed In Version: autofs-5.0.5-94.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 783303
Environment:
Last Closed: 2014-10-14 08:18:21 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Patch - fix reset flex scan buffer on init (flags: none)

Description Greg Earle 2014-04-21 02:32:38 UTC
I think I'm hitting something very similar to now-closed bug #783303 on RHEL 6.3.

I first saw this happen with autofs-5.0.5-54.el6.i686 as shipped with RHEL 6.3.

I just updated autofs to autofs-5.0.5-88.el6.i686 and the bug is still there.

On a RHEL 6.3 system that uses LDAP for authentication, I can reliably crash "automount" with a SIGSEGV just by SIGHUP'ing it.  And here I thought that was just going to make it re-read its maps  ;-)

I started up the automounter and it spits out this error to the "messages" file:

automount[5622]: syntax error in map near [ master \
ldap:ldap.fltops.jpl.nasa.gov:nisMapName=auto.home,ou=Services,ou=MER,ou=Projects,dc=dir,dc=jpl,dc=nasa,dc=gov ]

but it comes up.  Then I send it a HUP and it results in

automount[5913]: segfault at 0 ip 0023cd2f sp b5dfafbc error 4 in
libc-2.12.so[110000+190000]

Core was generated by `automount --pid-file /var/run/autofs.pid'.
Program terminated with signal 11, Segmentation fault.
#0  0x0023cd2f in __strcmp_sse4_2 () from /lib/libc.so.6
Missing separate debuginfos, use: debuginfo-install autofs-5.0.5-88.el6.i686
(gdb) where
#0  0x0023cd2f in __strcmp_sse4_2 () from /lib/libc.so.6
#1  0x00cf796c in master_find_mapent ()
#2  0x00cf38df in master_parse_entry ()
#3  0x002abf6f in lookup_read_master () from /usr/lib/autofs/lookup_file.so
#4  0x00ce6af1 in ?? ()
#5  0x00ce6c1b in ?? ()
#6  0x00ce84a7 in lookup_nss_read_master ()
#7  0x00cf9102 in master_read_master ()
#8  0x00cd8cf6 in _start ()

I consider this low priority since ordinarily we don't SIGHUP the automounter;
but I am trying to solve some (apparently) negative-caching issues we're having that result in keys not being found despite them having been removed from maps.

Apr 20 14:10:49 machine3 automount[2194]: rpc_get_exports_proto
Apr 20 14:10:49 machine3 automount[2194]: lookup_mount: exports lookup failed for machine2
Apr 20 14:10:49 machine3 automount[2194]: key "machine2" not found in map source(s).

The strange thing is that the maps are obtained from LDAP - "machine2" is a local RAID server and it should not appear in any LDAP map!

Please let me know if there's any other debuginfo I can generate to isolate this.

+++ This bug was initially created as a clone of Bug #783303 +++

Description of problem:

While doing some automount in LDAP testing I apparently caused automount to core at least once. I found the cores after I had finished testing so I'm not sure what the contents of the maps were. I'm providing the stack in case it is useful.

I realize this is a rather unclear description. All I can say is I was trying to configure submounts at the time.

Core was generated by `automount -fdv'.
Program terminated with signal 11, Segmentation fault.
#0  __strcmp_sse2 () at ../sysdeps/x86_64/strcmp.S:214
214             movlpd  (%rsi), %xmm2
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.2-7.fc15.x86_64 krb5-libs-1.9.2-3.fc15.2.x86_64 libdb-5.1.25-3.fc15.x86_64 libgcc-4.6.0-10.fc15.x86_64 libgssglue-0.3-0.fc15.x86_64 libselinux-2.0.99-4.fc15.x86_64 nss-softokn-freebl-3.12.10-2.fc15.x86_64 openssl-1.0.0e-1.fc15.x86_64
(gdb) where
#0  __strcmp_sse2 () at ../sysdeps/x86_64/strcmp.S:214
#1  0x00007f0af20a6196 in master_find_mapent (master=0x7f0af2d0c040, path=0x0)
    at master.c:622
#2  0x00007f0af20a2ae2 in master_parse_entry (
    buffer=0x7fffd5e14b10 "/share auto.share", default_timeout=300, 
    logging=<optimized out>, age=1326926467) at master_parse.y:768
#3  0x00007f0af04391ee in lookup_read_master (master=<optimized out>, 
    age=1326926467, context=0x7f0af2d19a70) at lookup_ldap.c:1676
#4  0x00007f0af2097d32 in do_read_master (master=0x7f0af2d0c040, 
    type=<optimized out>, age=1326926467) at lookup.c:96
#5  0x00007f0af2098484 in lookup_nss_read_master (master=0x7f0af2d0c040, 
    age=1326926467) at lookup.c:229
#6  0x00007f0af20a7397 in master_read_master (master=0x7f0af2d0c040, 
    age=1326926467, readall=0) at master.c:832
#7  0x00007f0af208d5db in main (argc=0, argv=<optimized out>)
    at automount.c:2146


Version-Release number of selected component (if applicable):

autofs-5.0.5-38.fc15.x86_64

--- Additional comment from Ian Kent on 2012-01-22 19:37:17 EST ---

(In reply to comment #0)
> Description of problem:
> 
> While doing some automount in LDAP testing I apparently caused automount to
> core at least once. I found the cores after I had finished testing so I'm not
> sure what the contents of the maps were. I'm providing the stack in case it is
> useful.
> 
> I realize this is a rather unclear description. All I can say is I was trying
> to configure submounts at the time.

Yep, it looks a bit hard to work out what happened.

> (gdb) where
> #0  __strcmp_sse2 () at ../sysdeps/x86_64/strcmp.S:214
> #1  0x00007f0af20a6196 in master_find_mapent (master=0x7f0af2d0c040, path=0x0)
>     at master.c:622

path=0x00 but should be "/share" by this time ... but ...

> #2  0x00007f0af20a2ae2 in master_parse_entry (
>     buffer=0x7fffd5e14b10 "/share auto.share", default_timeout=300, 
>     logging=<optimized out>, age=1326926467) at master_parse.y:768

the buffer looks like it contains a valid entry and similar
things have been parsed many, many times by the parser ????

Perhaps you could just keep an eye out and get back with 
similar details when it happens again but with a little more
on what was being done at the time.

Ian

--- Additional comment from Fedora End Of Life on 2012-08-06 16:06:45 EDT ---

This message is a notice that Fedora 15 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 15. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained.  At this time, all open bugs with a Fedora 'version'
of '15' have been closed as WONTFIX.

(Please note: Our normal process is to give advanced warning of this 
occurring, but we forgot to do that. A thousand apologies.)

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, feel free to reopen 
this bug and simply change the 'version' to a later Fedora version.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we were unable to fix it before Fedora 15 reached end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" (top right of this page) and open it against that 
version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 2 Ian Kent 2014-04-22 02:07:10 UTC
(In reply to Greg Earle from comment #0)
> I think I'm hitting something very similar to now-closed bug #783303 on RHEL
> 6.3.
> 
> I first saw this happen with autofs-5.0.5-54.el6.i686 as shipped with RHEL
> 6.3.
> 
> I just updated autofs to autofs-5.0.5-88.el6.i686 and the bug is still there.
> 
> On a RHEL 6.3 system that uses LDAP for authentication, I can reliably crash
> "automount" with a SIGSEGV just by SIGHUP'ing it.  And here I thought that
> was just going to make it re-read its maps  ;-)
> 
> I started up the automounter and it spits out this error to the "messages"
> file:
> 
> automount[5622]: syntax error in map near [ master \
> ldap:ldap.fltops.jpl.nasa.gov:nisMapName=auto.home,ou=Services,ou=MER,
> ou=Projects,dc=dir,dc=jpl,dc=nasa,dc=gov ]

Right, so you have a syntax error in the master map, looks like
a missing leading "/" on "master".

> 
> but it comes up.  Then I send it a HUP and it results in
> 
> automount[5913]: segfault at 0 ip 0023cd2f sp b5dfafbc error 4 in
> libc-2.12.so[110000+190000]
> 
> Core was generated by `automount --pid-file /var/run/autofs.pid'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x0023cd2f in __strcmp_sse4_2 () from /lib/libc.so.6
> Missing separate debuginfos, use: debuginfo-install autofs-5.0.5-88.el6.i686
> (gdb) where
> #0  0x0023cd2f in __strcmp_sse4_2 () from /lib/libc.so.6
> #1  0x00cf796c in master_find_mapent ()
> #2  0x00cf38df in master_parse_entry ()
> #3  0x002abf6f in lookup_read_master () from /usr/lib/autofs/lookup_file.so
> #4  0x00ce6af1 in ?? ()
> #5  0x00ce6c1b in ?? ()
> #6  0x00ce84a7 in lookup_nss_read_master ()
> #7  0x00cf9102 in master_read_master ()
> #8  0x00cd8cf6 in _start ()

Leaving out the "/" I can duplicate this, including with the
current upstream.

I'll see what I can find.
Ian

Comment 3 Ian Kent 2014-04-22 04:13:45 UTC
Created attachment 888341 [details]
Patch - fix reset flex scan buffer on init

Comment 5 Greg Earle 2014-04-22 16:48:48 UTC
Cheers Ian - I will notify the people that control the LDAP maps and let them know that map has a problem.

Ultimately I am trying to solve this problem:

I have 3 machines exhibiting bizarre behavior.  At 8:45 PM PDT every night (despite my inability to find any 'cron' job referencing this particular time), each of them tries to automount *every single host* in our NIS "hosts" map:

automount[7066]: lookup_mount: exports lookup failed for [NIS hostname1]
automount[7066]: lookup_mount: exports lookup failed for [NIS hostname2]
automount[7066]: lookup_mount: exports lookup failed for [NIS hostname3]
etc.

It's driving me batty and no amount of SIGHUPs has solved it.  When I saw a similar message on these LDAP-using hosts, I thought "Aha, I can try to debug it over there, since I'm only getting one of those 'exports lookup failed' messages on those machines", as I mentioned previously.

Apr 20 14:10:49 machine3 automount[2194]: rpc_get_exports_proto
Apr 20 14:10:49 machine3 automount[2194]: lookup_mount: exports lookup failed for machine2
Apr 20 14:10:49 machine3 automount[2194]: key "machine2" not found in map source(s).

Anyway, I'll get back to trying to figure out this head-scratcher and will wait in hopes that your 5.0.5-94 patch will float downstream to us RHEL 6.x users where x < 5  :-)

That's what led me to discover this current bug I'm reporting.

Comment 6 Greg Earle 2014-04-22 16:57:07 UTC
Oops, sorry - that last comment should've more accurately said

automount[7066]: lookup_mount: exports lookup failed for [NIS hostname1]
automount[7066]: update_negative_cache: key "NIS hostname1" not found in map.
automount[7066]: lookup_mount: exports lookup failed for [NIS hostname2]
automount[7066]: update_negative_cache: key "NIS hostname2" not found in map.
automount[7066]: lookup_mount: exports lookup failed for [NIS hostname3]
automount[7066]: update_negative_cache: key "NIS hostname3" not found in map.

[... etc. to the end of the NIS "hosts" map entries ...]
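
To confirm when these bursts happen, the failures can be tallied per minute from the "messages" file. The following is only a rough sketch: it assumes every line has the timestamped syslog shape quoted in comment 5 ("Apr 20 14:10:49 machine3 automount[2194]: ..."), and the function name failures_per_minute is made up for illustration.

```python
import re
from collections import Counter

# Matches lines shaped like the syslog excerpts quoted in this report, e.g.
# "Apr 20 14:10:49 machine3 automount[2194]: lookup_mount: exports lookup failed for machine2"
FAILED = re.compile(
    r"^(?P<minute>\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s+\S+\s+"
    r"automount\[(?P<pid>\d+)\]:\s+"
    r"lookup_mount: exports lookup failed for (?P<host>.+)$"
)


def failures_per_minute(lines):
    """Count 'exports lookup failed' messages, keyed by 'Mon DD HH:MM'."""
    counts = Counter()
    for line in lines:
        match = FAILED.match(line.strip())
        if match:
            counts[match.group("minute")] += 1
    return counts


# Hypothetical usage against a log file:
#   with open("/var/log/messages") as fh:
#       for minute, n in failures_per_minute(fh).most_common(5):
#           print(minute, n)
```

A minute with a count close to the size of the NIS "hosts" map would mark the burst; isolated counts of one or two are more likely ordinary lookups.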

Comment 7 Greg Earle 2014-04-22 20:24:09 UTC
I think I may have found the problem with the update_negative_cache issue.  (Typos in our "netgroup" file/map.)  So I guess it's good that it existed to set up the circumstances under which I found this bug, but mentioning it here is just noise now.  Unfortunately I can't edit out Comments 5 & 6.

Anyway Ian thanks for the patch and I'll look forward to the forthcoming -94 version RPMs.

Comment 8 Ian Kent 2014-04-23 02:28:13 UTC
(In reply to Greg Earle from comment #5)
> Cheers Ian - I will notify the people that control the LDAP maps and let
> them know that map has a problem.

There's often a bit of confusion with this so it might be worth
pointing out there are three types of autofs format map they can
encounter (an amd map format parser is being added to autofs but
I won't talk about that).

First is the master map whose first field is the mount point for
the autofs managed file system and so will always be a full path.

Then each entry in the master map will refer to an autofs mount
map which can be either an indirect or direct map.

The first field of indirect maps is a single path component so
has no path separators but the first field of direct maps is
always the full path to the automount point so will always
contain path separators.
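
As a rough illustration of those first-field rules (not the autofs parser itself), a check along these lines would flag an entry such as "master" in a master map. The function name and the map-kind labels are invented for this sketch, and it ignores any special entries the real parser accepts.

```python
def check_first_field(map_kind: str, key: str) -> bool:
    """Return True if `key` looks valid as the first field of a map entry.

    map_kind is "master", "direct" or "indirect" (labels chosen for this
    sketch only; they are not autofs identifiers).
    """
    if map_kind in ("master", "direct"):
        # Master map mount points and direct map keys are full paths.
        return key.startswith("/")
    if map_kind == "indirect":
        # Indirect map keys are a single path component: no separators.
        return key != "" and "/" not in key
    raise ValueError(f"unknown map kind: {map_kind}")


# Hypothetical usage:
#   check_first_field("master", "/share")   -> accepted
#   check_first_field("master", "master")   -> rejected (missing leading "/")
```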

I expect you know this already but had to say it for completeness.

Ian

Comment 9 Ian Kent 2014-04-23 02:37:43 UTC
(In reply to Greg Earle from comment #5)
> 
> Anyway, I'll get back to trying to figure out this head-scratcher and will
> wait in hopes that your 5.0.5-94 patch will float downstream to us RHEL 6.x
> users where x < 5  :-)

That's usually not the way things work, bugs get fixed in the
current release only.

If there's a business case to back port a change then a back
port needs to be requested through support and then it needs
to be approved by project management. If approved then it is
usually only for the release previous to the current release.
The further back you need to go the harder it is to get
approval.

fyi, bugs logged directly in public bugzilla are lowest priority
for engineering and are essentially best effort if time permits
since GSS is the way customers are supposed to report problems.

Ian

Comment 10 Greg Earle 2014-04-25 04:11:28 UTC
Thanks for the clarification Ian.

I'm a little puzzled in that sometimes I will see updated RPMs that are tagged with a name showing the current OS build (e.g. ".el6_5.${arch}.rpm" in the name) but then a newer RPM will come along that's apparently been back-ported and it goes back to having a ".el6.${arch}.rpm" name - which I assume means it's been compiled on a previous version.

I realize that might not happen to automount but just in case it does I'll be ready ;-)

(I also realize there's no business case for anything other than "bugs get fixed in the current release only" but some of us work in places *cough*GovernmentLabs*cough* that don't let us run the latest & greatest - which in the case of the current OS release 6.5 means that I avoided Heartbleed.  I didn't file this via GSS because it didn't seem important enough to do so.)

Anyway you can close this bug out.  I found out that fixing the typo problem didn't seem to fix my issue (the automounters on 3 systems deciding simultaneously every day at 8:45 PM to try and automount every host in the NIS "hosts" map, for no obvious reason), so I have to get back to focussing on that oddity.

Comment 11 Ian Kent 2014-04-28 01:25:47 UTC
(In reply to Greg Earle from comment #10)
> Thanks for the clarification Ian.
> 
> I'm a little puzzled in that sometimes I will see updated RPMs that are
> tagged with a name showing the current OS build (e.g. ".el6_5.${arch}.rpm"
> in the name) but then a newer RPM will come along that's apparently been
> back-ported and it goes back to having a ".el6.${arch}.rpm" name - which I
> assume means it's been compiled on a previous version.

Upon each release, 6.3, 6.4, 6.5 etc., a branch is created in our
code repository at the point of the release and work continues on
the main branch toward the next release.

These releases always have the ".el6.${arch}". Now, if for some
reason a bug is approved for back porting to the previous release
(usually not further) then the changes get applied to the specific
release branch and get a revision number like ".el6_x.${arch}[.x]".

The scheme is meant to ensure that revisions on the current release
branch always have a higher revision number (from rpms' POV) than
those on previous release branches.

There's a fair bit more to this relating to how the revision is
specified, what is appropriate for back port etc. but I won't
go into that.
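
To make the ordering concrete, here is a simplified sketch of how rpm-style comparison treats release strings. It is an approximation written for illustration, not rpm's own code: it compares release strings only, ignores epochs and the special "~" and "^" characters, and the release "88.el6_5.1" used in the usage note is a hypothetical example rather than a real package.

```python
import re

# A label is split into runs of digits or runs of letters; everything
# else (".", "_", "-") only acts as a separator.
_SEGMENT = re.compile(r"[0-9]+|[A-Za-z]+")


def compare_labels(a: str, b: str) -> int:
    """Return 1 if a is newer than b, -1 if older, 0 if equal."""
    segs_a = _SEGMENT.findall(a)
    segs_b = _SEGMENT.findall(b)
    for x, y in zip(segs_a, segs_b):
        x_num, y_num = x.isdigit(), y.isdigit()
        if x_num and y_num:
            # Numeric segments compare as integers.
            if int(x) != int(y):
                return 1 if int(x) > int(y) else -1
        elif x_num != y_num:
            # A numeric segment is considered newer than an alphabetic one.
            return 1 if x_num else -1
        elif x != y:
            # Alphabetic segments compare as plain strings.
            return 1 if x > y else -1
    # All shared segments are equal: the label with segments left over wins.
    if len(segs_a) == len(segs_b):
        return 0
    return 1 if len(segs_a) > len(segs_b) else -1
```

Under these assumptions compare_labels("94.el6", "88.el6_5.1") reports the main-branch build as newer because 94 beats 88 in the first segment, while a back-port built from the same base revision, such as "88.el6_5.1" against "88.el6", sorts as newer only because of its extra trailing segments.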

> 
> I realize that might not happen to automount but just in case it does I'll
> be ready ;-)

Looking at the changelog "rpm -q[p] --changelog <package>" of each
should tell the story of what has changed if you need to know.

> 
> (I also realize there's no business case for anything other than "bugs get
> fixed in the current release only" but some of us work in places
> *cough*GovernmentLabs*cough* that don't let us run the latest & greatest -
> which in the case of the current OS release 6.5 means that I avoided
> Heartbleed.  I didn't file this via GSS because it didn't seem important
> enough to do so.)

Sometimes there is a business case for back porting a change but
it's a fair amount of effort for the reporter so the change has
to be important enough for them to consider requesting it be
back ported. Then there's the question of the magnitude of the
change etc.....

> 
> Anyway you can close this bug out.  I found out that fixing the typo problem
> didn't seem to fix my issue (the automounters on 3 systems deciding
> simultaneously every day at 8:45 PM to try and automount every host in the
> NIS "hosts" map, for no obvious reason), so I have to get back to focussing
> on that oddity.

I think the resolution of this bug is fairly clear and we have
a bank of regression tests that attempt to ensure changes don't
introduce regressions.

We'll leave this open while we work through the QA process
for the release it's included in and close it when that's
complete.

Ian

Comment 12 Ian Kent 2014-04-28 01:37:20 UTC
(In reply to Greg Earle from comment #10)
> 
> Anyway you can close this bug out.  I found out that fixing the typo problem
> didn't seem to fix my issue (the automounters on 3 systems deciding
> simultaneously every day at 8:45 PM to try and automount every host in the
> NIS "hosts" map, for no obvious reason), so I have to get back to focussing
> on that oddity.

That has to be something doing a scan of the directory tree.

There's a couple of things that can affect this.

If the autofs mount has the browse option then each host directory
will be created within the autofs mount point and a directory scan
will cause everything to be mounted. But if the browse option isn't
present then there won't necessarily be a bunch of directories for
such a scan (see BROWSE_MODE in the configuration for the default
setting).

If you enable debug logging, the pid of the process that is
requesting the mount will be logged when the daemon receives the
request. Perhaps you could use that to identify the errant process,
together with a frequent ps list capture around the time it occurs.

Ian

Comment 18 errata-xmlrpc 2014-10-14 08:18:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1587.html