Description of problem:
tcsh fails when run by a user authenticated via LDAP.
This has been seen to be an issue with
Downgrading the packages works around the issue:
OK: tcsh.x86_64 0:6.17-14.el6
Version-Release number of selected component (if applicable):
According to the customer, the error can be reproduced on a machine by taking an LDAP-authenticated account that uses either csh or tcsh and then logging in through ssh, logging on through the console, or running su - to the user account.
At that point the connection/console would hang until I ran kill -9 on the csh or tcsh process from another connection. I verified that this problem occurred across many different accounts. I also tested with accounts that had no .cshrc file to process on login. Downgrading tcsh resolves the issue.
User cannot log in
User can log in
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
Created attachment 705663 [details]
strace -24 version of tcsh
Created attachment 705664 [details]
ltrace -24 version of tcsh
Issue is still apparent with latest version of tcsh.
Thank you for the bug report. This problem is probably caused by the history file
locking patch. I have set up LDAP and tried to reproduce
the issue (tcsh-6.17-19/tcsh-6.17.24, RHEL-6.3 x86_64). Logging on through ssh
or using su with an LDAP-authenticated account works for me. Please provide
concrete steps to reproduce this issue.
Can you also add the output of lsof on tcsh's history file (`lsof ~/.history`)
when tcsh hangs? Is your LDAP-authenticated user logged on multiple times, or
does this problem occur even on the first login?
ping: can you please provide needed info?
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unable to address this
request at this time.
Red Hat invites you to ask your support representative to
propose this request, if appropriate, in the next release of
Red Hat Enterprise Linux.
According to the provided info, the issue lies in history locking. Unfortunately, I'm still unable to reproduce this issue without additional info, e.g. a list of the processes which lock the ~/.history file.
The only thing that comes to mind right now is NFS and its nfslock service, which has to be started in order to use locks (if NFS is in use).
Does this issue occur only when trying to log in with already logged in user?
Which process locks ~/.history file (lslocks)?
Is NFS in use?
What about trying to run tcsh from a different shell, or using local/remote session?
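For anyone who wants to answer the "which process holds the lock" question without lslocks, here is a minimal sketch in C. It assumes the lock on the history file is a POSIX fcntl() record lock, which this thread does not confirm, and the function name history_lock_holder is made up for illustration. It asks the kernel whether a whole-file write lock would conflict with an existing lock and, if so, reports the PID the kernel returns.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of tcsh.
 * Returns 1 if another process holds a conflicting fcntl() lock on path
 * (its PID is stored in *holder), 0 if no conflicting lock is reported,
 * and -1 if the file could not be opened or queried.
 */
int history_lock_holder(const char *path, pid_t *holder)
{
    struct flock fl = {0};
    int fd, rc;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    fl.l_type = F_WRLCK;    /* would a write lock conflict with anything? */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;           /* 0 means "through end of file" */

    rc = fcntl(fd, F_GETLK, &fl);
    close(fd);
    if (rc < 0)
        return -1;

    if (fl.l_type == F_UNLCK)
        return 0;           /* no conflicting lock reported */

    if (holder != NULL)
        *holder = fl.l_pid;
    return 1;
}
```

Note that F_GETLK never reports locks held by the calling process itself, and on NFS the returned PID may not identify a process on the local host if the lock is held from another client.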
OK, so we know that this was caused by BZ 791232, right? Have we considered backing that patch out? Perhaps we should consider fixing that problem in a different way. Instead of using locks to protect the history file, maybe we could just not write to the history file from non-interactive shells (i.e. scripts).
We've got people hanging either trying to log out of their machines, or even worse, when trying to log in. We've got people's production jobs hanging in the middle of the night because of this. Certainly these things are worse than a messed up history file, aren't they?
This thing has been around since 2012, and has now accumulated about a dozen customer cases. Isn't it time to give up on trying to figure out how to fix the file locking and try something else?
We need development to ACK this, please. Just back out the history file locking and put "no history file updates from non-interactive shells" in its place.
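To make the proposal concrete, here is a minimal sketch in C of the decision it describes. It is not taken from tcsh: the function name should_write_history is hypothetical, and it assumes that "interactive" can be approximated by the shell's input descriptor being a terminal. The real shell also takes its command-line options into account.

```c
#include <unistd.h>

/*
 * Hypothetical helper, not part of tcsh.
 * Returns 1 if the history file should be written, 0 otherwise.
 * Assumption: a shell is treated as interactive when its input
 * descriptor refers to a terminal.
 */
int should_write_history(int input_fd)
{
    /* isatty() returns 1 for a terminal and 0 otherwise */
    return isatty(input_fd) ? 1 : 0;
}
```

With this rule a script reading from a file or a pipe would never touch ~/.history, so it would never need to take a lock on it.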
> In case some TAM is in contact with our customers who face the issues -
> could you please ask them, if they would be willing to update their
> scripts in case we would provide them with parallel (conflicting) package?
Our partner thinks that having to update all scripts would not be acceptable for customers, so the two tcsh versions would have to conflict with each other if we aim at parallel releases.
Still waiting for the NFS details of the affected environment.
*** Bug 1293411 has been marked as a duplicate of this bug. ***
Created attachment 1212851 [details]
The patch is not complete, I will be adding a note about this new behaviour into the tcsh man page later.
Created attachment 1213496 [details]
Updated version of the previous patch. Upstream testsuite passes. My manual testing suggests this patch is working for a .history file located in /home mounted via NFS.
Comment on attachment 1213496 [details]
The patch looks good. I believe it will address the issue with tcsh hanging during login in case the history file is stored on a remote file system. It also introduces an option to restore the current (unpatched) behavior.
As already mentioned by Pavel Raiskup, the return value of statfs() should be checked. Otherwise we would read uninitialized stack contents.
Also, the list of remote file systems could be made more complete.
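The two review points can be illustrated with a short sketch in C. This is not the attached patch: the function name history_on_remote_fs and the SKETCH_* macros are hypothetical, the code is Linux-specific, and the list of file system types is deliberately short. The magic numbers are the ones documented in statfs(2).

```c
#include <stddef.h>
#include <sys/vfs.h>    /* statfs(), struct statfs (Linux-specific) */

/* Values as documented in statfs(2); defined locally for this sketch. */
#define SKETCH_NFS_MAGIC  0x6969UL
#define SKETCH_SMB_MAGIC  0x517BUL
#define SKETCH_CIFS_MAGIC 0xFF534D42UL

/*
 * Hypothetical helper, not part of tcsh.
 * Returns 1 if path lives on one of the listed remote file systems,
 * 0 if it does not, and -1 if statfs() failed.
 */
int history_on_remote_fs(const char *path)
{
    static const unsigned long remote_magics[] = {
        SKETCH_NFS_MAGIC, SKETCH_SMB_MAGIC, SKETCH_CIFS_MAGIC
    };
    struct statfs buf;
    unsigned long type;
    size_t i;

    /* Check the return value: on failure buf is left uninitialized. */
    if (statfs(path, &buf) != 0)
        return -1;

    /* Mask to 32 bits so the comparison also works where f_type is a 32-bit long. */
    type = (unsigned long)buf.f_type & 0xFFFFFFFFUL;

    for (i = 0; i < sizeof(remote_magics) / sizeof(remote_magics[0]); i++) {
        if (type == remote_magics[i])
            return 1;
    }
    return 0;
}
```

A caller can treat -1 as "unknown" and choose the conservative behavior for that case. Extending the list is a matter of adding further magic numbers to the array.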
Created attachment 1213920 [details]
All comments from review addressed, here is the latest version.
Created attachment 1213923 [details]
Previous version accidentally added sh.err.h file (generated automatically by ./configure) into the patch.
Comment on attachment 1213923 [details]
Looks good to me.
Fix committed into dist-git:
With the rollout of 7.3, all users of (t)csh have their shell hang upon login, as does any program that uses such a shell script during its operation (more than 5 CentOS 7 client machines are affected; the 8 CentOS 6 clients are not). Using a fully patched CentOS 6 NFS server doesn't exhibit the problem. The file server uses zfsONlinux but I also tried plain ext4 - no difference.
Killall for the affected user will still leave a process tied to the .history file. The .history file is not being updated and the user can edit and remove the file.
I've tried locking the history file via /etc/csh.cshrc - same behaviour.
The file server is CentOS 7.3. Any NFS client that is running 6.x is fine - only 7.3 clients hang (t)csh.
Using ethernet or infiniband doesn't matter. Whether the client is a VM or physical doesn't matter. I tried disabling history - but wasn't successful.
Rolling back the tcsh and nfs-utils packages and kernels didn't help.
The workaround is to move the user's .history file to /tmp/$user
Ha - I just noticed some of the early comments.
In our setup (CentOS 7.3 clients and servers) tcsh always hangs at login but if an error is introduced into /etc/csh.cshrc then the hang always happens at logout.
I've been using RedHat since 3.0, some Fedora and CentOS since RHEL and had never run into this issue before.
Drat - the problem is much wider than the tcsh .history file. Students running simulations, or anyone installing software onto the NFS share (user /home), run into hangs.
vsimk 27065 XXuserXX 3w REG 0,40 2078 33557884 /home/XXuserXXpathXX/lab-one/LOG/uw-sim.log (nfsSERV:/home/zpoolHOME)
java 28616 28629 XXuserXX 49w REG 0,40 0 33145715 /home/XXuserXX/mgc/install.aol/mip_history.txt (nfsSERV:/home/zpoolHOME)
In my case, I could neither su - to myself nor ssh myself@host until I downgraded the tcsh rpm from tcsh-6.17-25.el6_6.x86_64 to tcsh-6.17-14.el6.x86_64. It was as if I couldn't get to the shell. Breaking with Ctrl+C got me to the prompt, but without X forwarding. With the downgrade, I was able to log on at the console and reach the RHEL GNOME desktop. Since the downgrade, I have not been able to ssh to the other hosts with the higher version of the tcsh rpm (e.g. tcsh-6.17-25.el6_6.x86_64). My scenario is a bit confusing, though: other users can ssh among other hosts with the higher version of tcsh, e.g. tcsh-6.17-25.el6_6.x86_64. But I didn't have any problem ssh'ing to RHEL 7.2 with tcsh-6.18.01-8.el7.x86_64.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.