885901 – tcsh will hang during login when .history is located on a network file system (NFS, Samba, ...)

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	tcsh
Sub Component:
Version:	6.3
Hardware:	All
OS:	Linux
Priority:	urgent
Severity:	urgent
Target Milestone:	rc
Target Release:	6.5
Assignee:	David Kaspar // Dee'Kej
QA Contact:	Iveta Wiedermann
Docs Contact:	Lenka Kimlickova
URL:
Whiteboard:
Duplicates (1):	1293411
Depends On:	1293411 1388426
Blocks:	1075802 1172231 1269194 1269889 1316087 1356047 1359260 1388425 1400664 1405173

Reported:	2012-12-10 23:36 UTC by Jesse Triplett
Modified:	2023-12-15 15:45 UTC
CC List:	27 users
Fixed In Version:	tcsh-6.17-38.el6
Doc Type:	Bug Fix
Doc Text:	`tcsh` no longer becomes unresponsive when the `.history` file is located on a network file system

Previously, if the `.history` file was located on a network file system, such as NFS or Samba, the `tcsh` command language interpreter sometimes became unresponsive during the login process. With this update, the `.history` file is not locked if located on a network file system. As a result, `tcsh` no longer becomes unresponsive in the described situation.

Note that having multiple instances of `tcsh` running can cause the `.history` file to become corrupted. You can resolve this problem by enabling the explicit file-locking mechanism. To do that, add the "lock" parameter to the "savehist" option in the `tcsh` configuration file. For example:

    $ cat /etc/csh.cshrc
    # csh configuration for all shell invocations.
    set savehist = (1024 merge lock)

To force `tcsh` to use file-locking when `.history` is located on a network file system, the "lock" parameter must be the third parameter of the "savehist" option. Do this at your own risk, because Red Hat does not guarantee that using the "lock" parameter prevents `tcsh` from becoming unresponsive during the login process.
Clone Of:
Clones:	1388425 1388426
Environment:
Last Closed:	2017-03-21 11:17:18 UTC
Target Upstream Version:
Embargoed:

Attachments
strace -24 version of tcsh (427.21 KB, text/plain) 2013-03-05 20:18 UTC, Simon Sekidde	no flags
ltrace -24 version of tcsh (10.14 MB, application/octet-stream) 2013-03-05 20:18 UTC, Simon Sekidde	no flags
do-not-lock-history-file-when-on-remote-filesystem-v2.patch (4.25 KB, patch) 2016-10-24 14:52 UTC, David Kaspar // Dee'Kej	kdudka: review-
do-not-lock-history-file-when-on-remote-filesystem-v3.patch (9.40 KB, patch) 2016-10-25 13:49 UTC, David Kaspar // Dee'Kej	no flags
do-not-lock-history-file-when-on-remote-filesystem-v4.patch (5.82 KB, patch) 2016-10-25 14:02 UTC, David Kaspar // Dee'Kej	kdudka: review+

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Knowledge Base (Solution)	405803	0	None	None	None	2016-09-09 06:21:57 UTC
Red Hat Product Errata	RHBA-2017:0731	0	normal	SHIPPED_LIVE	tcsh bug fix update	2017-03-21 12:43:34 UTC

Description Jesse Triplett 2012-12-10 23:36:20 UTC

Description of problem:
 tcsh fails when run by a user authenticated via LDAP.

 This has been seen to be an issue with 
  BAD: tcsh-6.17-19.el6_2.x86_64
 Downgrading the packages works around the issue:
  OK:  tcsh.x86_64 0:6.17-14.el6


Version-Release number of selected component (if applicable):
tcsh-6.17-19.el6_2.x86_64

How reproducible:
According to the customer, the error was reproducible on the affected machine by logging in with an LDAP-authenticated account that uses either csh or tcsh, whether through ssh, on the console, or by running su - to the user account.

At that point the connection/console would hang until I killed the csh or tcsh process with kill -9 from another connection. I verified that this problem occurred across many different accounts. I also tested with accounts that had no .cshrc file to process on login. Downgrading tcsh resolves the issue.

Actual results:
User cannot log in

Expected results:
User can log in

Comment 2 RHEL Program Management 2012-12-14 08:08:09 UTC

This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 8 RHEL Program Management 2013-01-26 06:47:25 UTC

This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 15 Simon Sekidde 2013-03-05 20:18:03 UTC

Created attachment 705663 [details]
strace -24 version of tcsh

Comment 16 Simon Sekidde 2013-03-05 20:18:41 UTC

Created attachment 705664 [details]
ltrace -24 version of tcsh

Comment 17 Simon Sekidde 2013-03-05 20:19:37 UTC

Roman, 

The issue is still present with the latest version of tcsh.

Comment 19 Fridolín Pokorný 2013-05-09 12:37:41 UTC

Thank you for the bug report. This problem is probably caused by the history
file locking patch. I have set up LDAP and I have tried to reproduce
the issue (tcsh-6.17-19/tcsh-6.17.24, RHEL-6.3 x86_64). Logging on through ssh
or using su on an LDAP-authenticated account works for me. Please provide
concrete steps to reproduce this issue.

Can you also add the output of lsof on tcsh's history file (`lsof ~/.history`)
when tcsh hangs? Is your LDAP-authenticated user logged on multiple times, or
does this problem occur even on the first login?

Thank you!

Comment 21 Jaromír Končický 2013-08-13 09:09:59 UTC

Ping: can you please provide the needed info?

Comment 22 RHEL Program Management 2013-10-14 00:10:50 UTC

This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unable to address this
request at this time.

Red Hat invites you to ask your support representative to
propose this request, if appropriate, in the next release of
Red Hat Enterprise Linux.

Comment 36 Fridolín Pokorný 2015-05-13 12:18:24 UTC

According to the provided info, the issue lies in history file locking. Unfortunately, I'm still unable to reproduce this issue without additional info, e.g. a list of the processes which lock the ~/.history file.

The only thing which comes to my mind right now is NFS and its nfslock service which has to be started in order to use locks (if NFS is in use).

Does this issue occur only when logging in as a user who is already logged in?
Which process locks ~/.history file (lslocks)?
Is NFS in use?
What about trying to run tcsh from a different shell, or using local/remote session?
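
The questions above (which file system holds ~/.history, and which processes hold it open or locked) could be answered with something like the following sketch. It assumes GNU coreutils `stat`, plus `lsof` and `lslocks` where they are installed. The function names `history_fstype` and `report_history_locks` are made up for illustration and are not part of tcsh or of any patch attached to this bug.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of tcsh.

# Print the type of the file system that holds the given path (e.g. "nfs").
# If the path does not exist yet, its parent directory is inspected instead.
history_fstype() {
    target=$1
    [ -e "$target" ] || target=$(dirname -- "$target")
    stat -f -c %T -- "$target"
}

# Print the file system type, then any processes that have the history
# file open, then the current lock table. Defaults to ~/.history.
report_history_locks() {
    histfile=${1:-$HOME/.history}
    echo "file system type: $(history_fstype "$histfile")"
    if command -v lsof >/dev/null 2>&1; then
        echo "processes with the file open:"
        lsof "$histfile" || echo "(none found)"
    fi
    if command -v lslocks >/dev/null 2>&1; then
        echo "current file locks (look for the history file or its mount point):"
        lslocks || echo "(lslocks failed)"
    fi
    return 0
}
```

Running `report_history_locks` from a second session while the tcsh login is hanging would collect the details requested in this comment.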

Comment 45 Thomas Gardner 2016-08-03 23:09:18 UTC

OK, so we know that this was caused by BZ 791232, right?  Have we considered backing that patch out?  Perhaps we should consider fixing that problem in a different way.  Instead of using locks on the history file, maybe we could just not write to the history file from non-interactive shells (i.e. scripts).

We've got people hanging either trying to log out of their machines, or even worse, when trying to log in.  We've got people's production jobs hanging in the middle of the night because of this.  Certainly these things are worse than a messed up history file, aren't they?

This thing has been around since 2012, and has now accumulated about a dozen customer cases.  Isn't it time to give up on trying to figure out how to fix the file locking and try something else?

Comment 46 Thomas Gardner 2016-08-03 23:11:42 UTC

We need development to ACK this, please.  Just back out the history file locking and put "no history file updates from non-interactive shells" in its place.

Comment 60 Christian Horn 2016-09-26 06:42:29 UTC

> In case some TAM is in contact with our customers who face the issues -
> could you please ask them, if they would be willing to update their
> scripts in case we would provide them with parallel (conflicting) package?
Our partner thinks that having to update all scripts would not be acceptable for customers, so both tcsh versions would have to be conflicting if we aim at parallel releases.

Still waiting for the NFS details of the affected environment.

Comment 65 David Kaspar // Dee'Kej 2016-10-21 09:05:35 UTC

*** Bug 1293411 has been marked as a duplicate of this bug. ***

Comment 68 David Kaspar // Dee'Kej 2016-10-21 13:49:47 UTC

Created attachment 1212851 [details]
do-not-lock-history-file-when-on-remote-filesystem.patch

The patch is not complete; I will add a note about this new behaviour to the tcsh man page later.

Comment 71 David Kaspar // Dee'Kej 2016-10-24 14:52:29 UTC

Created attachment 1213496 [details]
do-not-lock-history-file-when-on-remote-filesystem-v2.patch

Updated version of the previous patch. The upstream testsuite passes. My manual testing suggests this patch works for a .history file located in /home mounted via NFS.

Comment 73 Kamil Dudka 2016-10-25 13:20:41 UTC

Comment on attachment 1213496 [details]
do-not-lock-history-file-when-on-remote-filesystem-v2.patch

The patch looks good.  I believe it will address the issue with tcsh hanging during login in case the history file is stored on a remote file system.  It also introduces an option to restore the current (unpatched) behavior.

As already mentioned by Pavel Raiskup, the return value of statfs() should be checked.  Otherwise we would read uninitialized contents of the stack.

Also, the list of remote file systems could be made more complete.
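
The review point about checking the result of the file system query can be illustrated at shell level. This is not the reviewed patch, which is C code calling statfs(). It is a minimal sketch that assumes GNU `stat -f -c %T`, uses hypothetical function names, and carries a deliberately short, incomplete list of network file system names.

```shell
#!/bin/sh
# Sketch only: not the tcsh patch. Names and the type list are illustrative.

# Succeed if the given name (as printed by `stat -f -c %T`) is one of a
# few well-known network file systems. The list is incomplete on purpose.
is_remote_fstype() {
    case $1 in
        nfs|cifs|smb|smb2|afs|coda|ceph|lustre) return 0 ;;
        *) return 1 ;;
    esac
}

# Decide whether the history file at the given path should be locked.
# Returns 0: local file system, lock it.
#         1: network file system, skip locking.
#         2: the query failed, so no decision is based on its output.
should_lock_history() {
    fstype=$(stat -f -c %T -- "$1") || return 2
    if is_remote_fstype "$fstype"; then
        return 1
    fi
    return 0
}
```

The separate return code for a failed query mirrors the concern raised here: a caller that ignores the failure would act on data that was never filled in.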

Comment 74 David Kaspar // Dee'Kej 2016-10-25 13:49:27 UTC

Created attachment 1213920 [details]
do-not-lock-history-file-when-on-remote-filesystem-v3.patch

All comments from review addressed, here is the latest version.

Comment 75 David Kaspar // Dee'Kej 2016-10-25 14:02:13 UTC

Created attachment 1213923 [details]
do-not-lock-history-file-when-on-remote-filesystem-v4.patch

The previous version accidentally included the sh.err.h file (generated automatically by ./configure) in the patch.

Comment 76 Kamil Dudka 2016-10-25 15:54:13 UTC

Comment on attachment 1213923 [details]
do-not-lock-history-file-when-on-remote-filesystem-v4.patch

Looks good to me.

Comment 77 David Kaspar // Dee'Kej 2016-10-26 08:32:26 UTC

Fix committed into dist-git:
http://pkgs.devel.redhat.com/cgit/rpms/tcsh/commit/?id=d1185383df36948e87d82b93c

Comment 86 Eric 2017-01-16 23:27:13 UTC

With the rollout of 7.3, all users of (t)csh have their shell hang upon login, as does any program using such a shell script during its operation (>5 client machines running CentOS 7, not the 8 CentOS 6 clients). Using a fully patched CentOS 6 NFS server doesn't exhibit the problem.  The file server uses zfsONlinux but I also tried plain ext4 - no difference.

Killall for the affected user will still leave a process tied to the .history file.  The .history file is not being updated and the user can edit and remove the file.

I've tried locking the history file via /etc/csh.cshrc - same behaviour.

The file server is CentOS 7.3.  Any NFS client that is running 6.x is fine - only 7.3 clients hang (t)csh.
Using Ethernet or InfiniBand doesn't matter.  Whether the client is a VM or physical doesn't matter.  I tried disabling history - but wasn't successful.

Rolling back the tcsh and nfs-utils packages and kernels didn't help.

The workaround is to move the user's .history file to /tmp/$user

Comment 87 Eric 2017-01-17 00:43:11 UTC

Ha - I just noticed some of the early comments.
In our setup (CentOS 7.3 clients and servers) tcsh always hangs at login but if an error is introduced into /etc/csh.cshrc then the hang always happens at logout.
I've been using Red Hat since 3.0, some Fedora and CentOS since RHEL, and had never run into this issue before.

Comment 88 Eric 2017-01-17 15:21:39 UTC

Drat - the problem is much wider than the tcsh .history file.  Students running simulations, or anyone installing software onto the NFS share (user /home), also get hangs.

vsimk     27065             XXuserXX    3w      REG               0,40      2078   33557884 /home/XXuserXXpathXX/lab-one/LOG/uw-sim.log (nfsSERV:/home/zpoolHOME)
java      28616 28629       XXuserXX   49w      REG               0,40         0   33145715 /home/XXuserXX/mgc/install.aol/mip_history.txt (nfsSERV:/home/zpoolHOME)

Comment 90 linferna 2017-02-21 06:45:39 UTC

In my case, I could neither su - to myself nor ssh myself@host until I downgraded the tcsh RPM from tcsh-6.17-25.el6_6.x86_64 to tcsh-6.17-14.el6.x86_64. It was almost as if I couldn't get to the shell. Pressing Ctrl+C got me to the prompt, but without X forwarding. With the downgrade, I was able to log on at the console and reach the RHEL GNOME desktop. Since the downgrade, I have not been able to ssh to the other hosts with a higher version of the tcsh RPM (e.g. tcsh-6.17-25.el6_6.x86_64). What is a bit confusing in my scenario is that other users can ssh among other hosts with the higher version of tcsh, e.g. tcsh-6.17-25.el6_6.x86_64. But I didn't have any problem SSH'ing to RHEL 7.2 with tcsh-6.18.01-8.el7.x86_64.

Comment 92 errata-xmlrpc 2017-03-21 11:17:18 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0731.html
