Bug 885901
Summary: tcsh will hang during login when .history is located on a network file system (NFS, Samba, ...)
Product: Red Hat Enterprise Linux 6
Reporter: Jesse Triplett <jtriplet>
Component: tcsh
Assignee: David Kaspar // Dee'Kej <deekej>
Status: CLOSED ERRATA
QA Contact: Iveta Wiedermann <isenfeld>
Severity: urgent
Docs Contact: Lenka Kimlickova <lkimlick>
Priority: urgent
Version: 6.3
CC: arfernan, btotty, ccheney, chorn, deekej, fjaspe, fkrska, fpokorny, fsorenso, isenfeld, jruemker, kdudka, mkolbas, mpoole, msvistun, ovasik, praetzel, praiskup, qguo, rpiddapa, srandhaw, ssekidde, thgardne, thozza, yhuang, yoguma, zpytela
Target Milestone: rc
Target Release: 6.5
Hardware: All
OS: Linux
Fixed In Version: tcsh-6.17-38.el6
Doc Type: Bug Fix
Doc Text:
`tcsh` no longer becomes unresponsive when the `.history` file is located on a network file system
Previously, if the `.history` file was located on a network file system, such as NFS or Samba, the `tcsh` command language interpreter sometimes became unresponsive during the login process. With this update, the `.history` file is not locked if located on a network file system. As a result, `tcsh` no longer becomes unresponsive in the described situation.
Note that having multiple instances of `tcsh` running can cause the `.history` file to become corrupted. You can prevent this by enabling the explicit file-locking mechanism. To do that, add the "lock" parameter to the "savehist" option in the `tcsh` configuration file. For example:

    $ cat /etc/csh.cshrc
    # csh configuration for all shell invocations.
    set savehist = (1024 merge lock)

To force `tcsh` to use file locking when `.history` is located on a network file system, the "lock" parameter must be the third parameter of the "savehist" option. Do this at your own risk: Red Hat does not guarantee that using the "lock" parameter prevents `tcsh` from becoming unresponsive during the login process.
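The positional rule in the note above can be expressed as a tiny C predicate. This is a hypothetical illustration of the documented behaviour, not tcsh's actual savehist parser; the helper name `savehist_forces_lock()` is invented here, and it assumes the savehist value has already been split into words, so `set savehist = (1024 merge lock)` becomes {"1024", "merge", "lock"}.

```c
#include <string.h>

/*
 * Hypothetical helper modelling the documented rule: explicit locking is
 * forced only when "lock" is the third word of the savehist value.
 * words[] holds the already-split value, e.g. {"1024", "merge", "lock"}.
 */
static int savehist_forces_lock(const char *const words[], int nwords)
{
    /* Position matters: "lock" in any other slot does not count here. */
    return nwords >= 3 && strcmp(words[2], "lock") == 0;
}
```

Under this sketch, `set savehist = (1024 lock)` would not force locking, because "lock" is only the second word.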
Clones: 1388425, 1388426
Last Closed: 2017-03-21 11:17:18 UTC
Type: Bug
Bug Depends On: 1293411, 1388426
Bug Blocks: 1075802, 1172231, 1269194, 1269889, 1316087, 1356047, 1359260, 1388425, 1400664, 1405173
Description
Jesse Triplett
2012-12-10 23:36:20 UTC
This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.
Created attachment 705663 [details]
strace -24 version of tcsh
Created attachment 705664 [details]
ltrace -24 version of tcsh
Roman, the issue is still apparent with the latest version of tcsh.

Thank you for the bug report. This problem is probably caused by the history file locking patch. I have set up LDAP and tried to reproduce the issue (tcsh-6.17-19/tcsh-6.17-24, RHEL-6.3 x86_64). Logging in through ssh or using su on an LDAP-authenticated account works for me. Please send me concrete steps to reproduce this issue. Can you also add the output of lsof on tcsh's history file (`lsof ~/.history`) while tcsh hangs? Is your LDAP-authenticated user logged in multiple times, or does this problem occur even on the first login? Thank you!

ping: can you please provide the needed info?

This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate, in the next release of Red Hat Enterprise Linux.

According to the provided info, the issue lies in history locking. Unfortunately, I'm still unable to reproduce this issue without additional info, e.g. the list of processes that lock the ~/.history file. The only thing that comes to my mind right now is NFS and its nfslock service, which has to be started in order to use locks (if NFS is in use). Does this issue occur only when logging in as a user who is already logged in? Which process locks the ~/.history file (lslocks)? Is NFS in use? What happens if you run tcsh from a different shell, or in a local versus a remote session?

OK, so we know that this was caused by BZ 791232, right? Have we considered backing that patch out? Perhaps we should consider fixing that problem in a different way. Maybe, instead of using locks to protect the history file, we could just not write to the history file from non-interactive shells (i.e. scripts).
We've got people hanging either when trying to log out of their machines or, even worse, when trying to log in. We've got people's production jobs hanging in the middle of the night because of this. Surely these things are worse than a messed-up history file, aren't they? This problem has been around since 2012 and has now accumulated about a dozen customer cases. Isn't it time to give up on fixing the file locking and try something else? We need development to ACK this, please. Just back out the history file locking and put "no history file updates from non-interactive shells" in its place.

> In case some TAM is in contact with our customers who face the issues -
> could you please ask them, if they would be willing to update their
> scripts in case we would provide them with parallel (conflicting) package?
Our partner thinks that having to update all scripts would not be acceptable for customers, so the two tcsh packages would have to conflict with each other if we aim for parallel releases.
Still waiting for the NFS details of the affected environment.
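The earlier question of which process holds a lock on ~/.history (answered above with lsof or lslocks) can also be checked from C. This is a minimal sketch that is not part of tcsh or of the patches in this bug: the helper name `history_lock_holder()` is hypothetical, and it assumes the lock in question is a POSIX `fcntl()` record lock. Locks taken with `flock()` or via dot-lock files are not visible to `F_GETLK`, and on NFS the query itself may go through the lock manager and block in the same way tcsh does.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask whether another process holds a POSIX record
 * lock on `path` that would block an exclusive lock on the whole file.
 * Returns 1 and stores the holder's PID in *holder if such a lock exists,
 * 0 if nothing would block us, and -1 on error (e.g. missing file).
 */
static int history_lock_holder(const char *path, pid_t *holder)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct flock fl = {0};
    fl.l_type = F_WRLCK;   /* "could we take a write lock?" */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;          /* 0 means "through end of file" */

    int rc = fcntl(fd, F_GETLK, &fl);
    close(fd);             /* we only queried; no lock was taken */
    if (rc < 0)
        return -1;

    if (fl.l_type == F_UNLCK)
        return 0;          /* no conflicting lock */

    *holder = fl.l_pid;    /* on NFS this may not be a local process */
    return 1;
}
```

`F_GETLK` never reports locks held by the calling process itself, so the check is only meaningful when run from a separate process while tcsh is hanging.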
*** Bug 1293411 has been marked as a duplicate of this bug. ***
Created attachment 1212851 [details]
do-not-lock-history-file-when-on-remote-filesystem.patch
The patch is not complete; I will add a note about this new behaviour to the tcsh man page later.
Created attachment 1213496 [details]
do-not-lock-history-file-when-on-remote-filesystem-v2.patch
Updated version of the previous patch. The upstream test suite passes. My manual testing suggests that this patch works for a .history file located in /home mounted via NFS.
Comment on attachment 1213496 [details]
do-not-lock-history-file-when-on-remote-filesystem-v2.patch
The patch looks good. I believe it will address the issue with tcsh hanging during login in case the history file is stored on a remote file system. It also introduces an option to restore the current (unpatched) behavior.
As already mentioned by Pavel Raiskup, the return value of statfs() should be checked; otherwise we would read the uninitialized contents of the stack.
Also, the list of remote file systems could be made more complete.
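The two review points (check the statfs() return value; cover more remote file systems) can be illustrated with a minimal C sketch. This is not the code from do-not-lock-history-file-when-on-remote-filesystem-v4.patch: the function names, the selection of magic numbers, and the policy in `should_lock_history()` are assumptions for illustration. The constants are `f_type` values documented in the Linux statfs(2) man page.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/vfs.h>   /* statfs() on Linux */

/* f_type magic numbers of network file systems (see statfs(2)). */
static const uint32_t remote_fs_magics[] = {
    0x6969,      /* NFS_SUPER_MAGIC */
    0x517B,      /* SMB_SUPER_MAGIC */
    0xFE534D42,  /* SMB2_MAGIC_NUMBER */
    0xFF534D42,  /* CIFS_MAGIC_NUMBER */
    0x73757245,  /* CODA_SUPER_MAGIC */
    0x5346414F,  /* AFS_SUPER_MAGIC */
    0x564C,      /* NCP_SUPER_MAGIC */
    0x00C36400,  /* CEPH_SUPER_MAGIC */
};

static int is_remote_magic(uint32_t magic)
{
    size_t n = sizeof remote_fs_magics / sizeof remote_fs_magics[0];
    for (size_t i = 0; i < n; i++)
        if (remote_fs_magics[i] == magic)
            return 1;
    return 0;
}

/* 1 = known network FS, 0 = not in the list, -1 = statfs() failed. */
static int is_remote_fs(const char *path)
{
    struct statfs st;
    if (statfs(path, &st) != 0)
        return -1;   /* st is not initialized on failure: never read it */
    return is_remote_magic((uint32_t)st.f_type);
}

/*
 * Hypothetical policy: lock only when the file is known to be on a local
 * file system, unless the user explicitly requested locking (modelling
 * the "lock" word of savehist). If the file system cannot be identified,
 * skip locking rather than risk a hang (an assumption of this sketch).
 */
static int should_lock_history(const char *path, int lock_requested)
{
    if (lock_requested)
        return 1;
    return is_remote_fs(path) == 0;
}
```

Comparing only the low 32 bits of `f_type` keeps the check independent of whether the platform declares `f_type` as a 32-bit or 64-bit type.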
Created attachment 1213920 [details]
do-not-lock-history-file-when-on-remote-filesystem-v3.patch
All review comments have been addressed; here is the latest version.
Created attachment 1213923 [details]
do-not-lock-history-file-when-on-remote-filesystem-v4.patch
The previous version accidentally added the sh.err.h file (generated automatically by ./configure) to the patch.
Comment on attachment 1213923 [details]
do-not-lock-history-file-when-on-remote-filesystem-v4.patch
Looks good to me.
Fix committed into dist-git: http://pkgs.devel.redhat.com/cgit/rpms/tcsh/commit/?id=d1185383df36948e87d82b93c

With the rollout of 7.3, all users of (t)csh have their shell hang upon login, as does any program that uses such a shell script during its operation (>5 client machines running CentOS 7; not on the 8 CentOS 6 clients). Using a fully patched CentOS 6 NFS server does not exhibit the problem. The file server uses ZFS on Linux, but I also tried plain ext4: no difference. Running killall for the affected user still leaves a process tied to the .history file. The .history file is not being updated, and the user can edit and remove the file. I've tried locking the history file via /etc/csh.cshrc: same behaviour. The file server is CentOS 7.3. Any NFS client running 6.x is fine; only 7.3 clients hang in (t)csh. Ethernet versus InfiniBand doesn't matter, and neither does whether the client is a VM or physical. I tried disabling history, but wasn't successful. Rolling back the tcsh and nfs-utils packages and kernels didn't help. The workaround is to move the user's .history file to /tmp/$user.

Ha, I just noticed some of the early comments. In our setup (CentOS 7.3 clients and servers), tcsh always hangs at login, but if an error is introduced into /etc/csh.cshrc, the hang always happens at logout instead. I've been using Red Hat since 3.0, and some Fedora and CentOS since RHEL, and had never run into this issue before.

Drat, the problem is much wider than the tcsh .history file. Students running simulations or anyone installing software onto the NFS share (user /home) run into hangs:

    vsimk 27065 XXuserXX 3w REG 0,40 2078 33557884 /home/XXuserXXpathXX/lab-one/LOG/uw-sim.log (nfsSERV:/home/zpoolHOME)
    java 28616 28629 XXuserXX 49w REG 0,40 0 33145715 /home/XXuserXX/mgc/install.aol/mip_history.txt (nfsSERV:/home/zpoolHOME)

In my case, I could neither `su - myself` nor `ssh myself@host` until I downgraded the tcsh RPM from tcsh-6.17-25.el6_6.x86_64 to tcsh-6.17-14.el6.x86_64.
It was almost as if I couldn't get to the shell. A Ctrl+C break did reach the prompt, but without X forwarding. After the downgrade, I was able to log in on the console and reach the RHEL GNOME desktop. Since the downgrade, however, I was not able to ssh to other hosts with a higher version of the tcsh RPM (e.g. tcsh-6.17-25.el6_6.x86_64). Confusingly, in my scenario other users can ssh among hosts with the higher version of tcsh, e.g. tcsh-6.17-25.el6_6.x86_64. I did not have any problem ssh'ing to RHEL 7.2 with tcsh-6.18.01-8.el7.x86_64.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHBA-2017-0731.html