Description of problem:
tcsh segfaults during tilde expansion. This appears to happen only when NIS is used with compat set in nsswitch.conf and a +@netgroupname entry in passwd.

Version-Release number of selected component (if applicable):
tcsh-6.17-8.fc14

How reproducible:
Always.

Steps to Reproduce:
1. Configure NIS with at least one user and one netgroup.
2. Set compat for passwd in /etc/nsswitch.conf to allow restricting logins with +@netgroupname.
3. Add +@netgroupname to the end of /etc/passwd.
4. Start tcsh and try tilde expansion. For instance:
   apples> echo ~idfah
   Segmentation fault (core dumped)
   apples> cd ~idfah
   Segmentation fault (core dumped)

Actual results:
Segmentation fault.

Expected results:
Successful expansion of ~ to the home directory path.

Additional info:
This does not happen in other shells, such as bash or zsh. The problem was not present in Fedora 12. We tried installing the Fedora 12 tcsh package on Fedora 14, as well as building from source, and the problem still happens. We haven't tried Fedora 13 yet.
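To check whether a host matches the configuration described in the steps to reproduce, something like the following sketch may help. It is not part of the original report: the function names and the sample strings are invented for illustration, and it only inspects text in the usual nsswitch.conf and passwd layouts rather than querying NIS.

```python
def passwd_uses_compat(nsswitch_text):
    """Return True if the passwd line of an nsswitch.conf lists 'compat'."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if line.startswith("passwd:"):
            return "compat" in line.split(":", 1)[1].split()
    return False


def netgroup_entries(passwd_text):
    """Return the netgroup names from +@netgroup lines in a passwd file."""
    names = []
    for line in passwd_text.splitlines():
        if line.startswith("+@"):
            # The first field is '+@name'; strip the two-character prefix.
            names.append(line.split(":", 1)[0][2:])
    return names


# Hypothetical sample contents mirroring steps 2 and 3 of the report.
SAMPLE_NSSWITCH = "passwd: compat\nshadow: files\n"
SAMPLE_PASSWD = "root:x:0:0:root:/root:/bin/bash\n+@netgroupname\n"

if __name__ == "__main__":
    print(passwd_uses_compat(SAMPLE_NSSWITCH))
    print(netgroup_entries(SAMPLE_PASSWD))
```

On a real system, the contents of /etc/nsswitch.conf and /etc/passwd would be passed in place of the sample strings.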
Created attachment 469857: backtrace showing segmentation violation after tilde expansion
Created attachment 469858: backtrace showing segmentation violation after tilde completion
It looks like tcsh crashes after calling getpwnam when performing tilde expansion (e.g. "echo ~username"). It also crashes after calling getpwent when performing tilde completion (e.g. typing "cd ~us" and then pressing Tab). The string passed to getpwnam appears to be correct. I wrote a simple test program that calls getpwnam the same way tcsh does, and the test program does not crash. Other programs that call getpwnam and getpwent are working correctly.
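The reporter's test program is not included in the thread. As a rough stand-in, the sketch below exercises the same two lookups from Python, whose pwd module wraps the C library's passwd calls and so goes through the same NSS configuration. The function names are hypothetical, and this is not how tcsh itself implements expansion or completion.

```python
import pwd


def expand_tilde(word):
    """Expand a leading ~user to that user's home directory via getpwnam."""
    # A bare ~ or ~/path normally comes from $HOME, not a passwd lookup,
    # so it is returned unchanged here.
    if not word.startswith("~") or word == "~" or word.startswith("~/"):
        return word
    name, sep, rest = word[1:].partition("/")
    try:
        home = pwd.getpwnam(name).pw_dir  # name lookup, as in "echo ~user"
    except KeyError:
        return word  # unknown user: leave the word as typed
    return home + sep + rest


def complete_tilde(prefix):
    """List ~names whose login starts with prefix by enumerating passwd."""
    # pwd.getpwall() enumerates every passwd entry, as completion must.
    return sorted({"~" + e.pw_name for e in pwd.getpwall()
                   if e.pw_name.startswith(prefix)})


if __name__ == "__main__":
    print(expand_tilde("~root"))
    print(complete_tilde("ro"))
```

If lookups like these succeed on an affected host while tcsh crashes, that matches the observation above that other callers of getpwnam and getpwent work correctly.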
Possibly related to the following?
    https://bugzilla.redhat.com/show_bug.cgi?id=105886
    http://sources.redhat.com/bugzilla/show_bug.cgi?id=962
Enabling nscd seems to avoid the problem, as suggested in the second link above.
It looks like automount is also regularly crashing when +@netgroup entries are in passwd. The following error is left in /var/log/messages:
    automount[1103]: set_tsd_user_vars: failed to get passwd info from getpwuid_r
This sounds suspiciously similar to where tcsh is crashing during tilde expansion/completion. I have spent considerable time digging through the tcsh code and can't find anything wrong there. I am beginning to think the problem is in the glibc NIS code. Should I open another bug report against glibc?
Thank you for the bug report and further investigation, Eliot! No, there is no need to open a new bug. We can simply change the component and the title once we can prove this is not related to tcsh. Can you still reproduce the bug? If so, can you please install the following debuginfo packages and provide us with a new backtrace?
    $ debuginfo-install glibc ncurses-libs nss-softokn-freebl
Created attachment 477144: backtrace showing seg fault after tilde expansion
Created attachment 477145: backtrace showing seg fault after tilde completion
Thanks! Yes, we are still able to reproduce the problem with the latest updates. I have updated the backtraces with the requested debuginfo packages installed.
Changing component to glibc NIS. I believe this one is related: https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/23077
valgrind?
Created attachment 477511: backtrace showing seg fault after tilde completion
Created attachment 477512: backtrace showing seg fault after tilde expansion
Created attachment 477513: valgrind during tilde expansion
Created attachment 477514: valgrind during tilde completion
Created attachment 477516: backtrace showing seg fault after tilde expansion
Added valgrind output and fixed backtraces to be from the current Fedora 14 tcsh version.
What is malloc_usable_size returning?
Interestingly, when run under gdb it actually crashes inside malloc_usable_size, so the call never returns. When run under valgrind, it reports:
    -csh: nss_nis/nis-netgrp.c:75: _nss_nis_setnetgrent: Assertion `malloc_usable_size (netgrp->data) >= len + 1' failed.
but I'm not sure what value it is returning.
It looks like ksh also crashes during tilde expansion:
    $ echo ~idfah
    ksh: nss_nis/nis-netgrp.c:75: _nss_nis_setnetgrent: Assertion `malloc_usable_size (netgrp->data) >= len + 1' failed.
    Aborted (core dumped)
So it definitely isn't a tcsh problem. The system also seems very unstable unless compat is turned off: the automounter crashes and won't start working again without a reboot. Running gdb on ksh with a breakpoint at line 74 of nss_nis/nis-netgrp.c in glibc-2.12.90-21.i686 reveals that malloc_usable_size returns zero, even though strlen(netgrp->data) is 863, len is 863, and netgrp->data prints just fine. I assume netgrp->data is pointing somewhere it shouldn't? I will attach the gdb output.
Created attachment 478338: gdb output showing that malloc_usable_size returns zero before the ksh crash. It also shows backtraces and that len and strlen(netgrp->data) are correct.
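For comparison, here is a small sketch of what the failing assertion expects from a pointer that really did come from malloc. It assumes Linux with a C library that exports malloc_usable_size (glibc does), calls it through ctypes, and uses a made-up helper name. It does not reproduce the bug and does not touch the NIS code.

```python
import ctypes

# Assumption: Linux with glibc, so malloc_usable_size is available.
libc = ctypes.CDLL(None)
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.restype = None
libc.free.argtypes = [ctypes.c_void_p]
libc.malloc_usable_size.restype = ctypes.c_size_t
libc.malloc_usable_size.argtypes = [ctypes.c_void_p]


def usable_size_for(length):
    """malloc(length + 1) and return the usable size the allocator reports."""
    ptr = libc.malloc(length + 1)
    if not ptr:
        raise MemoryError("malloc failed")
    try:
        return libc.malloc_usable_size(ptr)
    finally:
        libc.free(ptr)


if __name__ == "__main__":
    length = 863  # the len value seen in the gdb session
    usable = usable_size_for(length)
    # The same condition as the assertion in nis-netgrp.c.
    print(usable, usable >= length + 1)
    # A NULL pointer is the documented case that yields 0.
    print(libc.malloc_usable_size(None))
```

For a live malloc block the condition holds, and the man page documents 0 only for a NULL pointer. A zero result for a non-NULL netgrp->data would be consistent with the guess above that the pointer is not the start of a live malloc block, though nothing in this thread confirms the root cause.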
I should mention that our NIS server runs Red Hat Enterprise Linux 5.6. I'm not sure whether it would be worth trying to serve NIS from Fedora 14?
Looks like glibc-2.13-1 was recently released as a Fedora update. We installed it last night and the problem seems to have stopped!
tcsh and ksh no longer segfault with glibc-2.13-1. I don't know what you guys did, but thanks! We are still having problems with autofs hanging, but I think that may be a different problem. In fact, I can cause NIS to hang temporarily with a command like the following:
    while true; do ssh localhost uptime; done
or by calling getpwnam() repeatedly. NIS and autofs start working again after waiting a few minutes. It is almost as if NIS were trying to prevent a DoS attack. This doesn't happen with Fedora 12, so it must be a change on the client end. Of course, starting nscd stops the problem, but I would argue that one shouldn't have to use nscd to run NIS with compat. Any thoughts on this would be much appreciated. This is likely a separate problem; should I file a separate bug report?
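One way to put numbers on the "calling getpwnam() repeatedly" case is to time each lookup and watch for the point where latency jumps. The sketch below is only an illustration: the function name and the user name and count in the usage example are arbitrary choices, not values from this report.

```python
import pwd
import time


def time_lookups(name, count):
    """Call getpwnam(name) count times; return each call's duration in seconds."""
    durations = []
    for _ in range(count):
        start = time.monotonic()
        try:
            pwd.getpwnam(name)  # goes through NSS, including compat/NIS if set
        except KeyError:
            pass  # an unknown user still exercises the lookup path
        durations.append(time.monotonic() - start)
    return durations


if __name__ == "__main__":
    samples = time_lookups("root", 1000)
    print("slowest lookup: %.6f s" % max(samples))
    print("total time:     %.6f s" % sum(samples))
```

Running it once with nscd stopped and once with nscd started should show whether the stall is tied to uncached lookups, as the comment above suggests.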
These problems all appear to be fixed in Fedora 15.