Bug 662204

Summary: tcsh seg faults during tilde expansion
Product: [Fedora] Fedora
Component: glibc
Version: 14
Hardware: x86_64
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: urgent
Priority: low
Reporter: Elliott Forney <elliott.forney>
Assignee: Andreas Schwab <schwab>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: fweimer, jakub, schwab, vvitek
Doc Type: Bug Fix
Last Closed: 2011-08-29 21:02:39 UTC
Attachments:
attachment 469857: backtrace showing segmentation violation after tilde expansion
attachment 469858: backtrace showing segmentation violation after tilde completion
attachment 477144: backtrace showing seg fault after tilde expansion
attachment 477145: backtrace showing seg fault after tilde completion
attachment 477511: backtrace showing seg fault after tilde completion
attachment 477512: backtrace showing seg fault after tilde expansion
attachment 477513: valgrind during tilde expansion
attachment 477514: valgrind during tilde completion
attachment 477516: backtrace showing seg fault after tilde expansion
attachment 478338: gdb output showing malloc_usable_size returns zero before ksh crash

Description Elliott Forney 2010-12-10 21:03:46 UTC
Description of problem:

tcsh segfaults during tilde expansion.  This appears to happen only when NIS is used with passwd set to compat in /etc/nsswitch.conf and a +@netgroupname entry is present in /etc/passwd.

Version-Release number of selected component (if applicable):

tcsh-6.17-8.fc14

How reproducible:

Always.

Steps to Reproduce:

1. Configure NIS with at least one user and one netgroup.

2. Set passwd to compat in /etc/nsswitch.conf to allow restricting logins with +@netgroupname.

3. Add +@netgroupname to the end of /etc/passwd.

4. Start tcsh and try tilde expansion.  For instance:

apples> echo ~idfah
Segmentation fault (core dumped)

apples> cd ~idfah
Segmentation fault (core dumped)
  
Actual results:

Segmentation fault.

Expected results:

Successful expansion of ~ to home directory path.

Additional info:

This does not happen in other shells, such as bash or zsh.

This problem was not present in Fedora 12.  We tried installing the Fedora 12 tcsh package on Fedora 14 as well as building from source and the problem still happens.  We haven't tried Fedora 13 yet.

Comment 1 Elliott Forney 2010-12-20 21:29:29 UTC
Created attachment 469857 [details]
backtrace showing segmentation violation after tilde expansion

Comment 2 Elliott Forney 2010-12-20 21:31:04 UTC
Created attachment 469858 [details]
backtrace showing segmentation violation after tilde completion

Comment 3 Elliott Forney 2010-12-20 21:39:27 UTC
Looks like tcsh crashes after calling getpwnam when performing tilde expansion, e.g. echo ~username

It also crashes after calling getpwent when performing tilde completion, e.g. typing cd ~us and then pressing Tab.

The string passed to getpwnam looks correct.  I wrote a simple test program that calls getpwnam the same way tcsh does, and the test program does not crash.  Other programs that call getpwnam and getpwent are working correctly.
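
For readers who want to try the same kind of check, here is a minimal sketch of the two lookups described above. It is not the reporter's test program and not tcsh code: expand_tilde_user and count_tilde_completions are hypothetical helper names, and only the libc calls (getpwnam, setpwent, getpwent, endpwent) are real.

```c
#define _DEFAULT_SOURCE /* glibc feature-test macro: exposes getpwent() and friends */
#include <assert.h>
#include <pwd.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper (not taken from tcsh): resolve "~user" through
 * getpwnam(), the lookup that tilde expansion depends on.
 * Returns 0 and copies the home directory into out on success,
 * -1 if the user is unknown or out is too small. */
int expand_tilde_user(const char *user, char *out, size_t outlen)
{
    assert(user != NULL && out != NULL);

    struct passwd *pw = getpwnam(user);
    if (pw == NULL || pw->pw_dir == NULL)
        return -1;
    if (strlen(pw->pw_dir) >= outlen)
        return -1;
    strcpy(out, pw->pw_dir);
    return 0;
}

/* Hypothetical helper: walk the passwd database with getpwent(), as tilde
 * completion has to, and count the login names that start with prefix. */
int count_tilde_completions(const char *prefix)
{
    assert(prefix != NULL);

    size_t plen = strlen(prefix);
    int matches = 0;
    struct passwd *pw;

    setpwent();                      /* rewind to the first entry */
    while ((pw = getpwent()) != NULL) {
        if (strncmp(pw->pw_name, prefix, plen) == 0)
            matches++;
    }
    endpwent();                      /* release the enumeration state */
    return matches;
}
```

Calling expand_tilde_user("idfah", buf, sizeof buf) from a small main() goes through the same NSS path as echo ~idfah. As noted above, a standalone program of this kind did not crash, so it is only a starting point for narrowing the problem down.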

Comment 4 Elliott Forney 2010-12-29 03:45:22 UTC
Possibly related to the following?

https://bugzilla.redhat.com/show_bug.cgi?id=105886
http://sources.redhat.com/bugzilla/show_bug.cgi?id=962

Enabling nscd seems to avoid the problem, as suggested in the second link above.

Comment 5 Elliott Forney 2011-01-20 04:06:49 UTC
Looks like automount is also regularly crashing when +@netgroup entries are in passwd.  The following error is left in /var/log/messages:

automount[1103]: set_tsd_user_vars: failed to get passwd info from getpwuid_r

This sounds suspiciously similar to where tcsh is crashing during tilde expansion/completion.  I have spent considerable time digging through the tcsh code and can't seem to find anything.

I am beginning to think the problem is in the glibc NIS code.  I suppose I should open another bug report there?
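
The automount message above refers to getpwuid_r, the reentrant lookup by uid. A rough sketch of how a caller can tell "no such user" apart from a lookup error with that call is shown below. home_for_uid is a hypothetical helper, not autofs code; only getpwuid_r and sysconf are real library calls.

```c
#define _DEFAULT_SOURCE /* glibc feature-test macro: exposes getpwuid_r() */
#include <assert.h>
#include <errno.h>
#include <pwd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: look up the home directory for uid with getpwuid_r().
 * Returns 0 on success (home copied into out), 1 if there is no matching
 * entry, and -1 on a lookup error or if out is too small. */
int home_for_uid(uid_t uid, char *out, size_t outlen)
{
    assert(out != NULL);

    /* sysconf() only gives a size hint and may return -1. */
    long hint = sysconf(_SC_GETPW_R_SIZE_MAX);
    size_t buflen = hint > 0 ? (size_t)hint : 1024;
    char *buf = NULL;
    struct passwd pw;
    struct passwd *result = NULL;
    int rc;

    for (;;) {
        char *tmp = realloc(buf, buflen);
        if (tmp == NULL) {
            free(buf);
            return -1;
        }
        buf = tmp;
        rc = getpwuid_r(uid, &pw, buf, buflen, &result);
        if (rc != ERANGE)
            break;               /* success, not found, or a hard error */
        buflen *= 2;             /* scratch buffer too small: grow and retry */
    }

    int status;
    if (rc != 0)
        status = -1;             /* lookup itself failed */
    else if (result == NULL)
        status = 1;              /* lookup worked, but no such uid */
    else if (strlen(pw.pw_dir) >= outlen)
        status = -1;
    else {
        strcpy(out, pw.pw_dir);
        status = 0;
    }
    free(buf);
    return status;
}
```

A -1 from a helper like this on a uid that is known to exist would point at the name service rather than at the calling program.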

Comment 6 Vojtech Vitek 2011-02-01 13:01:46 UTC
Thank you for the bug report and further investigation, Elliott!

No, there is no need to open a new bug. We can simply change the component and the title once we can prove this is not related to tcsh.

Can you still reproduce the bug? If so, can you please install the following debuginfo packages and provide us with a new backtrace?
$ debuginfo-install glibc ncurses-libs nss-softokn-freebl

Comment 7 Elliott Forney 2011-02-04 23:28:45 UTC
Created attachment 477144 [details]
backtrace showing seg fault after tilde expansion

Comment 8 Elliott Forney 2011-02-04 23:30:05 UTC
Created attachment 477145 [details]
backtrace showing seg fault after tilde completion

Comment 9 Elliott Forney 2011-02-04 23:35:41 UTC
Thanks!  Yes, we are still able to reproduce the problem with the latest updates.  I have updated the backtraces with the requested debuginfo packages installed.

Comment 10 Vojtech Vitek 2011-02-07 13:37:42 UTC
Changing component to glibc NIS.

I believe this one is related:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/23077

Comment 11 Andreas Schwab 2011-02-07 13:51:46 UTC
valgrind?

Comment 12 Elliott Forney 2011-02-07 22:34:14 UTC
Created attachment 477511 [details]
backtrace showing seg fault after tilde completion

Comment 13 Elliott Forney 2011-02-07 22:34:49 UTC
Created attachment 477512 [details]
backtrace showing seg fault after tilde expansion

Comment 14 Elliott Forney 2011-02-07 22:35:36 UTC
Created attachment 477513 [details]
valgrind during tilde expansion

Comment 15 Elliott Forney 2011-02-07 22:36:05 UTC
Created attachment 477514 [details]
valgrind during tilde completion

Comment 16 Elliott Forney 2011-02-07 22:38:39 UTC
Created attachment 477516 [details]
backtrace showing seg fault after tilde expansion

Comment 17 Elliott Forney 2011-02-07 22:44:49 UTC
Added valgrind output and fixed backtraces to be from the current Fedora 14 tcsh version.

Comment 18 Andreas Schwab 2011-02-11 13:51:19 UTC
What is malloc_usable_size returning?

Comment 19 Elliott Forney 2011-02-12 01:30:31 UTC
Interestingly, when run under gdb it actually crashes inside malloc_usable_size, so the call never returns.

When run under valgrind, it reports:

-csh: nss_nis/nis-netgrp.c:75: _nss_nis_setnetgrent: Assertion `malloc_usable_size (netgrp->data) >= len + 1' failed.

but I'm not sure what value it is returning.

Comment 20 Elliott Forney 2011-02-12 01:40:26 UTC
It looks like ksh also crashes during tilde expansion:

$ echo ~idfah
ksh: nss_nis/nis-netgrp.c:75: _nss_nis_setnetgrent: Assertion `malloc_usable_size (netgrp->data) >= len + 1' failed.
Aborted (core dumped)

So it definitely isn't a tcsh problem.  The system also seems very unstable unless compat is turned off: the automounter crashes and won't start working again without a reboot.

Running gdb on ksh with a breakpoint at line 74 of nss_nis/nis-netgrp.c in glibc-2.12.90-21.i686 reveals that malloc_usable_size returns zero, even though strlen(netgrp->data) and len are both 863 and netgrp->data prints just fine.

I assume netgrp->data is pointing to somewhere it shouldn't?

Will attach gdb output.
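
To make the failing condition concrete, the sketch below evaluates the same expression as the assertion quoted above, malloc_usable_size(data) >= len + 1, on a buffer that really was obtained from malloc. It is not glibc's nis-netgrp.c code; heap_block_fits and heap_copy are hypothetical helpers, and the 863-byte length is simply the value reported above.

```c
#include <assert.h>
#include <malloc.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: the same condition as the failing assertion.
 * data must be NULL or a pointer returned by malloc/calloc/realloc;
 * for any other pointer the result of malloc_usable_size() is not
 * meaningful. */
int heap_block_fits(void *data, size_t len)
{
    return malloc_usable_size(data) >= len + 1;
}

/* Hypothetical helper: make a heap copy of s, requesting exactly
 * strlen(s) + 1 bytes, and report the string length through len_out. */
char *heap_copy(const char *s, size_t *len_out)
{
    assert(s != NULL && len_out != NULL);

    size_t len = strlen(s);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, len + 1);    /* includes the terminating NUL */
    *len_out = len;
    return copy;
}
```

For a block returned by heap_copy the check holds, because malloc never hands out fewer usable bytes than were requested. malloc_usable_size(NULL) is documented to return 0, so a zero result for a non-NULL pointer that still prints as a valid string suggests the pointer is not the start of a live malloc block, though the thread does not confirm the cause.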

Comment 21 Elliott Forney 2011-02-12 01:43:01 UTC
Created attachment 478338 [details]
gdb output showing malloc_usable_size returns zero before ksh crash

Also shows backtraces and that len and strlen(netgrp->data) are correct.

Comment 22 Elliott Forney 2011-02-14 21:09:11 UTC
I should mention that our NIS server runs Red Hat Enterprise Linux 5.6.

I'm not sure if it would be worth trying to serve NIS from Fedora 14?

Comment 23 Elliott Forney 2011-02-15 22:02:05 UTC
Looks like glibc-2.13-1 was recently released as a Fedora update.  We installed it last night and the problem seems to have stopped!

Comment 24 Elliott Forney 2011-02-22 23:40:30 UTC
tcsh and ksh no longer seg fault with glibc-2.13-1.  I don't know what you guys did but thanks!

We are still having problems with autofs hanging but I think it may be a different problem.  In fact, I can cause NIS to temporarily hang altogether with a command like the following:

while true; do ssh localhost uptime; done

or by calling getpwnam() repeatedly.  NIS and autofs start working again if you wait a few minutes after this.  It is almost as if NIS is trying to prevent a DoS attack.  This doesn't happen with Fedora 12, so it must be a change on the client end.  Of course, starting nscd stops the problem, but I would argue that one shouldn't have to use nscd to run NIS+compat.

Any thoughts on this would be much appreciated.  This is likely a separate problem; should I file a separate bug report?
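
The repeated getpwnam() calls mentioned above could be driven by something like the following sketch. repeat_getpwnam is a hypothetical helper and the iteration count is up to the caller; the thread does not say how many calls were needed to trigger the hang.

```c
#define _DEFAULT_SOURCE /* glibc feature-test macro: exposes clock_gettime() */
#include <assert.h>
#include <pwd.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: call getpwnam() count times in a row.
 * Returns the number of lookups that failed and stores the elapsed
 * wall-clock time in seconds in *elapsed. */
int repeat_getpwnam(const char *user, int count, double *elapsed)
{
    assert(user != NULL && elapsed != NULL);

    struct timespec start, end;
    int failures = 0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < count; i++) {
        if (getpwnam(user) == NULL)
            failures++;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    *elapsed = (double)(end.tv_sec - start.tv_sec)
             + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    return failures;
}
```

Comparing the failure count and elapsed time with nscd stopped and with nscd running would give a rough measure of the hang described above.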

Comment 25 Elliott Forney 2011-08-29 21:02:39 UTC
These problems all appear to be fixed in Fedora 15.