Bug 485717 - 5.0.4-9 segfaults at startup
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: autofs
Version: rawhide
Hardware: All, OS: Linux
Priority: low, Severity: medium
Assigned To: Ian Kent
QA Contact: Fedora Extras Quality Assurance
Depends On:
Blocks:
Reported: 2009-02-16 10:21 EST by Jason Tibbitts
Modified: 2009-02-19 10:21 EST
CC List: 3 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-02-19 10:21:37 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Backtrace from core left by 5.0.4-9. (2.74 KB, text/plain)
2009-02-16 13:31 EST, Jason Tibbitts
Backtrace from custom 5.0.4-9 build (3.12 KB, text/plain)
2009-02-17 10:54 EST, Jason Tibbitts
Patch to fix array out of bounds accesses and cleanup couple of other alloca() calls (6.26 KB, patch)
2009-02-19 01:35 EST, Ian Kent

Description Jason Tibbitts 2009-02-16 10:21:36 EST
I just updated a rawhide guest; previously it had 5.0.4-8, which worked, but 5.0.4-9 segfaults at startup.  Of course, plenty of other packages were updated at the same time, including the kernel, but booting back into the older one doesn't help.

The only thing logged is:

automount[1663] general protection ip:f8a038 sp:7fff13e706b0 error:0 in libc-2.9.90.so[f10000+169000]

However, when I set LOGGING="debug" in /etc/sysconfig/autofs, I get the following with no segfault (and still no working autofs):

09:09:04 v02 automount[1661]: Starting automounter version 5.0.4-9, master map auto.master
using kernel protocol version 5.01
lookup_nss_read_master: reading master files auto.master
parse_init: parse(sun): init gathered global options: (null)
lookup_read_master: lookup(file): read entry /misc
lookup_read_master: lookup(file): read entry /net
lookup_read_master: lookup(file): read entry +auto.master
lookup_nss_read_master: reading master files auto.master
parse_init: parse(sun): init gathered global options: (null)
lookup_nss_read_master: reading master ldap auto.master
parse_server_string: lookup(ldap): Attempting to parse LDAP information from string "auto.master".
parse_server_string: lookup(ldap): mapname auto.master
parse_ldap_config: lookup(ldap): ldap authentication configured with the following options:
parse_ldap_config: lookup(ldap): use_tls: 0, tls_required: 0, auth_required: 1, sasl_mech: (null)
parse_ldap_config: lookup(ldap): user: (null), secret: unspecified, client principal: (null) credential cache: (null)
do_bind: lookup(ldap): auth_required: 1, sasl_mech (null)
do_bind: lookup(ldap): ldap anonymous bind returned 0
get_query_dn: lookup(ldap): found query dn nisMapName=auto.master,dc=math,dc=uh,dc=edu
parse_init: parse(sun): init gathered global options: (null)
do_bind: lookup(ldap): auth_required: 1, sasl_mech (null)
do_bind: lookup(ldap): ldap anonymous bind returned 0
lookup_read_master: lookup(ldap): searching for "(objectclass=nisObject)" under "nisMapName=auto.master,dc=math,dc=uh,dc=edu"
lookup_read_master: lookup(ldap): examining entries


Note that I stripped the time, host and process info from all but the first line for better formatting; everything happens in the same second.

The result of
ldapsearch -b nisMapName=auto.master,dc=math,dc=uh,dc=edu objectclass=nisObject 
is:

# /home, auto.master, math.uh.edu
dn: cn=/home,nisMapName=auto.master,dc=math,dc=uh,dc=edu
objectClass: nisObject
cn: /home
nisMapName: auto.master
nisMapEntry: ldap:nisMapName=auto.home,dc=math,dc=uh,dc=edu

# /nas, auto.master, math.uh.edu
dn: cn=/nas,nisMapName=auto.master,dc=math,dc=uh,dc=edu
objectClass: nisObject
cn: /nas
nisMapName: auto.master
nisMapEntry: ldap:nisMapName=auto.nas,dc=math,dc=uh,dc=edu

Please let me know if I can supply any additional information.
Comment 1 Ian Kent 2009-02-16 11:29:38 EST
How about a backtrace of a core?
Use "thr a a bt" (gdb shorthand for "thread apply all bt").

Ian
Comment 2 Jason Tibbitts 2009-02-16 12:53:43 EST
I didn't even realize it was leaving core files, but then I noticed them in /.  Unfortunately the backtrace is useless without debug symbols and debuginfo installation seems to be broken in rawhide.  (Yum always bails with a message about metalink.xml not existing.)  I'll keep trying.
Comment 3 Ian Kent 2009-02-16 13:01:29 EST
(In reply to comment #2)
> I didn't even realize it was leaving core files, but then I noticed them in /. 
> Unfortunately the backtrace is useless without debug symbols and debuginfo
> installation seems to be broken in rawhide.  (Yum always bails with a message
> about metalink.xml not existing.)  I'll keep trying.

Right, you could just grab the srpm and "rpmbuild --rebuild" to
get the debuginfo package.
Comment 4 Jason Tibbitts 2009-02-16 13:30:05 EST
True, but I also need a pile of other debuginfo packages.  In any case I think I've managed to pull everything together; at least, gdb doesn't tell me I need to install something else.  I'll attach what I have.
Comment 5 Jason Tibbitts 2009-02-16 13:31:26 EST
Created attachment 332082 [details]
Backtrace from core left by 5.0.4-9.
Comment 6 Ian Kent 2009-02-16 23:00:37 EST
(In reply to comment #5)
> Created an attachment (id=332082) [details]
> Backtrace from core left by 5.0.4-9.

Looking at the backtrace, it's extremely hard to see how
this alloc or free could be wrong, and installing Rawhide
is proving to be a bit of a nightmare.

I'll keep at it.
Comment 7 Jason Tibbitts 2009-02-16 23:41:54 EST
If there's any debugging you'd like me to do, I can tweak the source or apply patches and rebuild.  I'm not terribly well versed in gdb, though, so if you want me to set break or watch points then you probably need to be explicit.

For getting to rawhide, I found it far easier to install a minimal F10 system and let yum upgrade it.
Comment 8 Ian Kent 2009-02-17 00:14:59 EST
(In reply to comment #7)
> If there's any debugging you'd like me to do, I can tweak the source or apply
> patches and rebuild.  I'm not terribly well versed in gdb, though, so if you
> want me to set break or watch points then you probably need to be explicit.

Thanks for the offer, but the trace we have shows a problem with
code that is easily verified as correct by inspection, so this must
actually be something else in disguise; I just don't know what.

I haven't been able to reproduce a problem on F-9 or F-10 with
this package using a similar LDAP map.

> 
> For getting to rawhide, I found it far easier to install a minimal F10 system
> and let yum upgrade it.

The 2.6.29 series just doesn't seem to get anywhere at boot.
I've tried several revs.
Comment 9 Ian Kent 2009-02-17 00:20:16 EST
(In reply to comment #7)
> If there's any debugging you'd like me to do, I can tweak the source or apply
> patches and rebuild.  I'm not terribly well versed in gdb, though, so if you
> want me to set break or watch points then you probably need to be explicit.

But I'd be interested to see a debug log with authrequired="no"
in /etc/autofs_ldap_auth.conf.

It shouldn't make a difference though.

Ian
Comment 10 Ian Kent 2009-02-17 05:02:42 EST
I've managed to install a Rawhide instance.
I've tried a couple of LDAP configurations and I'm still not
seeing any segv issues.

Can you post the output of "ldd /usr/sbin/automount", please?
Ian
Comment 11 Ian Kent 2009-02-17 05:18:36 EST
(In reply to comment #4)
> True, but I also need a pile of other debuginfo packages.  In any case I think
> I've managed to pull everything together; at least, gdb doesn't tell me I need
> to install something else.  I'll attach what I have.

I'm also curious what the result of building the autofs
package locally on your machine and using that would be. That
should rule out any library version mismatches.

Ian
Comment 12 Jason Tibbitts 2009-02-17 10:53:30 EST
Several replies in one here:

I went ahead and updated to the latest rawhide, since it has a new kernel and a new gcc.  It makes no difference with the rawhide autofs.  Removing and reinstalling autofs also makes no difference.

authrequired is already set to "no" in /etc/autofs_ldap_auth.conf; that file is unchanged in my installation.

# ldd /usr/sbin/automount
        linux-vdso.so.1 =>  (0x00007fff751ff000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00000000006df000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f5b6cef6000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f5b6ccdb000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f5b6c969000)
        /lib64/ld-linux-x86-64.so.2 (0x0000000000a2f000)

I went ahead and rebuilt what's in current autofs with the currently installed rawhide system (including the new gcc); it still crashes, but the backtrace looks different.  I'll attach it.
Comment 13 Jason Tibbitts 2009-02-17 10:54:59 EST
Created attachment 332234 [details]
Backtrace from custom 5.0.4-9 build
Comment 14 Ian Kent 2009-02-17 11:39:17 EST
(In reply to comment #12)
> Several replies in one here:
> 
> I went ahead and updated to the latest rawhide, since it has a new kernel and a
> new gcc.  It makes no difference with the rawhide autofs.  Removing and
> reinstalling autofs also makes no difference.

OK, thanks for doing that.

> 
> authrequired is already set to "no" in /etc/autofs_ldap_auth.conf; that file is
> unchanged in my installation.

That's odd, the debug log entries above indicated it was set
to yes.

> 
> # ldd /usr/sbin/automount
>         linux-vdso.so.1 =>  (0x00007fff751ff000)
>         libpthread.so.0 => /lib64/libpthread.so.0 (0x00000000006df000)
>         libdl.so.2 => /lib64/libdl.so.2 (0x00007f5b6cef6000)
>         libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f5b6ccdb000)
>         libc.so.6 => /lib64/libc.so.6 (0x00007f5b6c969000)
>         /lib64/ld-linux-x86-64.so.2 (0x0000000000a2f000)
> 
> I went ahead and rebuilt what's in current autofs with the currently installed
> rawhide system (including the new gcc); it still crashes, but the backtrace
> looks different.  I'll attach it.

The backtrace looks almost identical to the other report of
this (yes, you're not alone). At least that's something, I guess.

In the backtrace it looks like the server field in the LDAP
info struct has rubbish in it but I can't see how that is
happening since it is always initialized to zeros.

I've managed to install an instance of Rawhide and I've tried
to duplicate this without success so far. Maybe we should also
compare package versions: libxml2 (do you have libxml2-devel
installed?), the LDAP libraries, Kerberos and SASL.

I'll keep trying to duplicate it, there must be something I'm
missing.

Ian
Comment 15 Jason Tibbitts 2009-02-18 11:16:07 EST
It looks like I somehow entered a comment with the information you asked for but didn't click submit.  Crap.

The logs do indicate that somehow authorization is required, yet I'm not setting it.  However, I went back to -8, which works, and it also indicates that authorization is required.

Here's the version info you requested; I'm current as of last night's rawhide push.

# rpm -qa|egrep '(ldap|libxml|krb|sasl)'|grep -v debug|sort
cyrus-sasl-2.1.22-21.fc11.x86_64
cyrus-sasl-devel-2.1.22-21.fc11.x86_64
cyrus-sasl-lib-2.1.22-21.fc11.x86_64
krb5-devel-1.6.3-17.fc11.x86_64
krb5-libs-1.6.3-17.fc11.x86_64
krb5-workstation-1.6.3-17.fc11.x86_64
libxml2-2.7.3-1.fc11.x86_64
libxml2-devel-2.7.3-1.fc11.x86_64
nss_ldap-264-1.fc11.x86_64
openldap-2.4.14-1.fc11.x86_64
openldap-devel-2.4.14-1.fc11.x86_64
pam_krb5-2.3.3-1.fc11.x86_64
python-krbV-1.0.13-8.fc11.x86_64

Anyway, since -8 works and -9 doesn't, I checked CVS and fortunately the only difference is that -9 has a bunch of patches applied.  I bisected the list and found that if everything up to autofs-5.0.4-easy-alloca-replacements.patch is applied, there's no crash, but if I then apply that patch as well, automount dies.  The patches beyond that point don't seem to be independent so I can't just leave that one patch out, but hopefully that's a sufficient clue to go on.
Comment 16 Ian Kent 2009-02-18 11:52:13 EST
(In reply to comment #15)
> It looks like I somehow entered a comment with the information you asked for
> but didn't click submit.  Crap.
> 
> The logs do indicate that somehow authorization is required, yet I'm not
> setting it.  However, I went back to -8, which works, and it also indicates
> that authorization is required.
> 
> Here's the version info you requested; I'm current as of last night's rawhide
> push.
> 
> # rpm -qa|egrep '(ldap|libxml|krb|sasl)'|grep -v debug|sort
> cyrus-sasl-2.1.22-21.fc11.x86_64
> cyrus-sasl-devel-2.1.22-21.fc11.x86_64
> cyrus-sasl-lib-2.1.22-21.fc11.x86_64
> krb5-devel-1.6.3-17.fc11.x86_64
> krb5-libs-1.6.3-17.fc11.x86_64
> krb5-workstation-1.6.3-17.fc11.x86_64
> libxml2-2.7.3-1.fc11.x86_64
> libxml2-devel-2.7.3-1.fc11.x86_64
> nss_ldap-264-1.fc11.x86_64
> openldap-2.4.14-1.fc11.x86_64
> openldap-devel-2.4.14-1.fc11.x86_64
> pam_krb5-2.3.3-1.fc11.x86_64
> python-krbV-1.0.13-8.fc11.x86_64

Thanks, I'll check these.

> 
> Anyway, since -8 works and -9 doesn't, I checked CVS and fortunately the only
> difference is that -9 has a bunch of patches applied.  I bisected the list and
> found that if everything up to autofs-5.0.4-easy-alloca-replacements.patch is
> applied, there's no crash but if I then apply that patch in addition then
> automount dies.  The patches beyond that point don't seem to be independent so
> I can't just leave that one patch out, but hopefully that's a sufficient clue
> to go on.

That's also quite odd; that patch largely changes alloca(3) calls
to use arrays or malloc(3). I went over that submission several
times. But the fact remains, autofs breaks for you when it's
added, so something isn't quite right.

It would probably be a good idea for me to do a scratch build
with all the patches except the one above, as long as you have time
to test it out of course. Would that be OK with you?

And I'll keep trying to get a 64-bit Rawhide installed and duplicate
the problem (KVM does not play well with x86_64 at the moment).

Ian
Comment 17 Jason Tibbitts 2009-02-18 12:17:56 EST
I'm willing to test anything you'd like.  It's just a KVM guest that's not doing anything else, so I can trash it at will.  If it gets bad enough, I can put it outside the firewall and get a public key from you so that you can log in.
Comment 18 Ian Kent 2009-02-18 20:36:50 EST
In an effort to duplicate this, I've updated my system from
F-9 i386 to F-10 x86_64 (since KVM is having problems creating
x86_64 VMs at the moment), and sure enough I can reproduce the
issue using the Rawhide package.

The work you did to identify the patch where the issue first
starts will save me a heap of time, thanks very much for the
effort.
Comment 19 Ian Kent 2009-02-19 01:35:19 EST
Created attachment 332500 [details]
Patch to fix array out of bounds accesses and cleanup couple of other alloca() calls

I think this fixes the SEGV we've been seeing.
I've built autofs-5.0.4-11 into Rawhide with this patch,
please test the package and let me know how it goes.

This access violation has been present in the code for
a long time, and this was the sort of thing the alloca()
replacement should have identified. But, alas, it didn't
show up on 32-bit architectures and so got through to cause pain.

Sorry!
Ian
Comment 20 Jason Tibbitts 2009-02-19 10:21:37 EST
I can verify that the -11 package pulled from koji works fine.  Thanks for looking into this.

I'll go ahead and close this out.
