Bug 206123

Summary: cifsd is not userspace signal-safe
Product: Red Hat Enterprise Linux 4 Reporter: Jose Plans <jplans>
Component: kernelAssignee: Jeff Layton <jlayton>
Status: CLOSED DUPLICATE QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.4CC: jbaron, staubach, steved, tao
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-07-10 11:11:28 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 234547, 245674    
Attachments:
Description Flags
patch -- handle ERESTARTSYS error in cifs_reconnect
none
patch -- patch for cifs 1.48aRH to handle signals in cifs_reconnect
none
patch -- make cifsd be more signal-safe
none
patch -- make tcp_recvmsg look for kthread_stop events
none
patch -- use force_sig instead of send_sig none

Description Jose Plans 2006-09-12 09:10:58 UTC
Description of problem:

Samba server was configured on an x360 and x335 was used as the Samba client.
Fsstress & tar.sh tests were running on cifs type client mount points.
When the I/O was in progress, I killed the samba server daemons (both smbd &
nmbd) which blocked I/O on cifs client mount points.
The moment I kill the samba server daemons, process state of cifsd on the client
changes from Interruptible sleep (S) to Uninterruptible sleep (D) and hence it
is not letting me to kill cifsd.
(pls note:all these operations are done with root privilege).

Here are the sequence of commands I tried out on client:

--------- before killing samba server daemons ---------------

[root@x335b fsstress]# ps aux | grep cifsd
root     26318  3.6  0.0     0    0 ?        S    Feb13  35:32 [cifsd]
root     13088  0.0  0.0  5560  644 pts/3    S+   10:19   0:00 grep cifsd


--------- after killing samba server daemons ---------------

[root@x335b fsstress]# ps aux | grep cifsd
root     26318  3.6  0.0     0    0 ?        D    Feb13  35:39 [cifsd]
root     13125  0.0  0.0  4504  644 pts/3    S+   10:24   0:00 grep cifsd

[root@x335b fsstress]# kill -9 26318
[root@x335b fsstress]# ps aux | grep cifsd
root     26318  3.6  0.0     0    0 ?        D    Feb13  35:39 [cifsd]
root     13130  0.0  0.0  5192  644 pts/3    S+   10:25   0:00 grep cifsd

Then, I tried restarting samba server daemons and then unmounting and mounting
the cifs client mount points.
This gives me following cifs mount error:

[root@x335b /]# mount -t cifs -o username=root //9.124.111.56/SAMBA1 /SAMBA1
Password:
mount error 20 = Not a directory
Refer to the mount.cifs(8) manual page (e.g.man mount.cifs)
[root@x335b /]#

As old cifsd thread is not clearing out even after unmounting cifs mount point &
it is conflicting with the new cifsd thread when I try mounting again and hence
above mount error.

Considering a typical customer scenario where the samba server goes down in the
middle of some file I/O & user's cifs mount point gets disconnected from samba
server,in that case user normally restart his samba server daemons & tries to
unmount/mount cifs mount points. But in this case, user is not able to remount
cifs ending up with mount error. Here, user has no option other than restarting
his client machine.


Version-Release number of selected component (if applicable):
cifs 1.34 - 2.6.9-30.ELsmp

How reproducible:
Always.

Steps to Reproduce:
1. kill cifsd
2. mount a CIFS share
  
Actual results:
mount error 20 = Not a directory

Expected results:
Succesful mount.

Additional info:
[FIXED in cifs 1.44]

-- from the IT --
cifs 1.44 at http://us1. samba.org/samba/ftp/cifs-cvs/cifs-1.44.tar.gz includes
all current fixes and builds on RHEL4 prior to about the 30 version of the rhel4
2.6.9 kernel (when RedHat added kzalloc to its 2.6.9 it caused a build warning
for this cifs on later updates).

If we have to work backwards to a smaller fix (smaller than the current stable
cifs 1.44 fix rollup) - it would be helpful to make sure 1.44 fixes it
-- from the IT --

Comment 3 Jason Baron 2006-11-03 19:55:33 UTC
Can you please verify this is fixed in the u5 beta kernel...kernels can be found
at: http://people.redhat.com/~jbaron/rhel4/ thanks.

Comment 10 RHEL Program Management 2006-11-08 15:40:29 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 19 Jeff Layton 2007-04-23 19:06:26 UTC
I've just started taking a look at this. My understanding of the facts here are:

1) when the server is killed, the processes and cifsd hang in D state until the
server is restarted

2) if, while the server is down, you then "kill -9" cifsd and do a lazy umount
on the filesystem then you can't remount the filesystem and the mount does not
recover when the server comes back

#2 sounds almost like a case of "don't do that". Perhaps the correct answer
there is to make sure that cifsd ignores all signals so that when the server
comes back up, things end up finishing up as they should. I'll have a look and
see if I can reproduce this.


Comment 21 Jeff Layton 2007-04-24 13:39:08 UTC
I don't seem to be able to reproduce this on recent RHEL4 kernels (I'm using the
jtltest.6 kernels from my people page):

http://people.redhat.com/jlayton

Here's what I'm doing:

1) mount up a cifs share:

    //dantu/testuser /mnt/cifs    cifs
user=testuser,dom=EXAMPLE,uid=5000,gid=5000,file_mod=0777,password=testuser,noauto
0 0

2) run fsstress:

    #!/bin/sh
    FSSTRESS=/usr/lib/ltp/testcases/kernel/fs/fsstress/fsstress
    TESTDIR=/mnt/cifs

    nohup $FSSTRESS -d $TESTDIR/CIFS1 -l 0 -n 1000 -p 4 -r & 
    nohup $FSSTRESS -d $TESTDIR/CIFS2 -l 0 -n 1000 -p 4 -r & 

3) Do a "service smb stop" on the samba server
4) kill off the fsstress processes:

# pkill fsstress

5) kill off cifsd:

# kill -9 <pid of cifsd>

6) umount:

# umount -l /mnt/cifs

(it returns after a few seconds)

7) start up smb on the server:

# service smb start

8) mount again:

# mount /mnt/cifs

...it worked fine. Am I not doing the reproducer correctly?


Comment 27 Jeff Layton 2007-05-03 17:14:15 UTC
Again, I'm having a difficult time reproducing this. Testing with kernels from
my people page:

http://people.redhat.com/jlayton/

My steps are:

1) mount up a CIFS share from a samba server on /mnt/cifs
2) start up fsstress (8 different processes of it)
3) stop samba on the server
4) pkill fsstress
5) kill -9 pid_of_cifsd
5) umount -l /mnt/cifs
6) start samba on server
7) mount /mnt/cifs
8) I/O to /mnt/cifs (ls, cd, catting files) works as expected

I need detailed, step-by-step directions on how to reproduce this. Also let me
know what kernel they are reproducing this on.


Comment 45 Jeff Layton 2007-05-17 15:17:33 UTC
Joe, what kernel were you testing with when you got the panic?

Comment 47 Jeff Layton 2007-05-18 13:58:44 UTC
The CIFS code in the -42 kernels is *really* old. Can you retest with -55 kernel?


Comment 49 Jeff Layton 2007-05-18 18:06:51 UTC
First issue...

I get this when trying to mount with their setup, with "printable" set to "yes":

mount error 20 = Not a directory
Refer to the mount.cifs(8) manual page (e.g.man mount.cifs)

Commenting that out and restarting samba allows it to me mounted. After that,
I'm sort of able to reproduce this. When trying to remount /SAMBA1, I don't get
"-ENOTDIR", I get "-EAGAIN".

I think the part I was missing before was that they are actually killing *all*
of the cifs daemons, not just cifsd as they stated before.

I suppose we can try and consider this a bug, but it seems like a rather
nonsensical set of steps. What on earth would possess them to want to kill the
cifs daemons? More than likely what's happening is that something these daemons
are calling are not "signal safe" and that's passing an unexpected error back up
the stack.

The easiest fix might be to just to have the cifs daemons ignore signals, but
I'm not sure if that's the best way to fix this. I'll look at it when I get some
time, and will also test with the jtltest kernels since they contain cifs 1.48aRH.


Comment 50 Jeff Layton 2007-05-18 18:23:24 UTC
jtltest kernels seem to show a more consistent ENOTDIR error, so something may
have changed there. I get a lot of these sorts of printk's in the ring buffer:

 CIFS VFS: Error 0xffffff90 on cifs_get_inode_info in lookup of
/p0/d3XXXXXXXXXXXXXXX/d5XXX/d15XXXX/f24XXXXXXX

so I'm thinking that we're not getting back valid inode info in this situation
and that's causing the ENOTDIR errors.


Comment 51 Jeff Layton 2007-05-18 18:27:30 UTC
Testing with jtltest.8 (current kernel on my people page). If I don't signal the
daemons then everything seems to recover gracefully. So it seems like making
sure we either ignore or properly handle signals with the cifs daemons is going
to be crucial for fixing this.



Comment 52 Jeff Layton 2007-05-30 14:34:34 UTC
Spent some time working to simplify this reproducer and have discovered some things:

1) only one mount is needed, not 4
2) it doesn't matter whether you use tar.sh or fsstress. The "problem" is the
same. The key is to leave the process running when you do the umount -l so there
is active I/O.
3) if, after getting this error, you kill off the processes still doing I/O (in
their reproducer, that would be the tar.sh processes that they just suspended
and didn't kill), then I think you can then remount and carry on normally.

CIFS is already somewhat suprisingly robust in this regard :-). I want to do a
bit more work and see if I can narrow down the kill -9 to just a single daemon
so we can see what needs to be looked over.

Given that this only occurs when sending a SIGKILL to the daemons, I'm somewhat
tempted to just close this as a WONTFIX sort of problem and recommend that they
don't do that.


Comment 53 Jeff Layton 2007-05-30 14:58:17 UTC
It does actually seem to be SIGKILL'ing cifsd that causes the problem, though it
seems like the process that's running on the mount needs to be in just the right
state for this to occur.

So minimal set of steps:

1) mount a single samba share, make sure printable is off
2) on that share, run tar.sh script (it's in one of the above comments)
3) on server shut down smbd/nmbd
4) suspend tar.sh (with ^z)
5) kill -9 cifsd
6) umount -l samba share
7) start up nmbd/smbd on server
8) try to mount samba share again

you should either get a -EAGAIN or -ENOTDIR. Given how nonsensical these steps
are, I'm going to recommend closing this as WONTFIX with a note to the customer
not to kill cifsd. Ignoring signals in cifsd looks problematic and there's no
reason that the customer should expect that killing cifsd shouldn't have
consequences.

I'll leave this open for a bit in case Steve wants to chime in though. Perhaps
he has a different take on the matter...


Comment 54 Jeff Layton 2007-05-30 15:01:30 UTC
Actually, tar.sh might not be in this BZ. In case it's not:

#!/bin/bash
while :
	do
		tar -cvf root.tar /root 
		rm -rf root.tar
	done

Comment 55 Jeff Layton 2007-05-30 19:42:48 UTC
Created attachment 155742 [details]
patch -- handle ERESTARTSYS error in cifs_reconnect

This patch might actually be appropriate here, though I'd like to propose it
upstream before we roll with it.

The issue seems to be that when the server goes down, that cifsd calls
cifs_reconnect, which loops and attempts to reconnect to the server every 30
seconds. If cifsd catches a signal during this, however, all the connect
attempts return -ERESTARTSYS (-512), and never complete. Since the signal is
never flushed, it seems to loop indefinitely.

This patch makes it so that if the connect returns -ERESTARTSYS that we treat
it like the thread caught a signal. We return with the error and start shutting
down. A preliminary test with a RHEL5 version of this seems to show that it
fixes this problem, though again I'd like some upstream review.

This patch is against upstream code, so the RHEL4/5 equivalent will look a
little different.

Comment 56 Jeff Layton 2007-05-30 20:34:12 UTC
Created attachment 155749 [details]
patch -- patch for cifs 1.48aRH to handle signals in cifs_reconnect

I've proposed the other patch on the upstream mailing list and am awaiting
response. This patch is the one I tested.

This patch is against the code in the kernels on my people page:

http://people.redhat.com/jlayton

Once I have a better idea of the upstream response, I'll see about adding this
to them for testing.

Comment 57 Jeff Layton 2007-06-02 12:46:23 UTC
Created attachment 155981 [details]
patch -- make cifsd be more signal-safe

The crux of this matter is that the cifsd kernel thread is not signal-safe. So
if it catches a spurious signal, it can get into a state where things are not
working correctly. It needs to allow signals through though, or mounts and
unmounts can take a very long time if the thread is blocked in kernel_recvmsg.

This patch seems to fix at least the most common cases, and is backported from
one that I've proposed to the upstream maintainer.

I'll probably look to get this into 5.2 and 4.7. The bottom line here is that
sending a "kill -9" to cifsd is a really bad idea and this "bug" is really a
self-inflicted wound. If they don't do that, this problem goes away.

Comment 58 Jeff Layton 2007-06-08 14:00:32 UTC
I sent a patch upstream to make cifsd handle signals better, but Christoph
Hellwig commented that the fact that cifs code needs to send signals is really a
bug itself. I'm going to take a crack at seeing if I can clean up the
kernel_recvmsg code to not need a signal to return when run from a kthread that
has been stopped. That should obviate the need to deal with signals altogether
and we can then just ignore them all.

Working on this in F7 for now since this will clearly need to go upstream first.
Unfortunately, it looks like this will need to be fixed in all the different
transport .recvmsg functions (i.e. tcp, udp, etc...) since things seem to block
there...


Comment 59 Jeff Layton 2007-06-08 16:55:56 UTC
Created attachment 156590 [details]
patch -- make tcp_recvmsg look for kthread_stop events

I've proposed this patch upstream as an RFC. It makes tcp_recvmsg treat a
kthread_stop event as if it had been signalled. I'm not certain whether this
will have bad side effects, however. If it is ok, then we should be able to
remove the need for cifsd to allow signals through.

Comment 60 Jeff Layton 2007-06-20 20:07:14 UTC
Herbert Xu stated that this idea feels wrong, since the tcp code shouldn't need
to know anything about kthreads. I tend to agree -- this seems sort of like a
layering violation.

It seems like we need a kernel-internal way to break out of kernel_recvmsg
without using signals, but that's neutral enough not to need any sort of
layering violation...


Comment 61 Jeff Layton 2007-06-21 14:37:59 UTC
Created attachment 157543 [details]
patch -- use force_sig instead of send_sig

Just sent this patch upstream. This changes cifsd to ignore all signals and
uses force_sig instead of send_sig for sending the signal. This should allow us
to continue to break out of kernel_recvmsg on mount/umount while insulating
cifsd from signals sent from userspace.

Comment 62 Steve French 2007-06-25 22:04:21 UTC
I agree that the force_sig approach appears better.  I don't see anyone
commenting on the patch though - any feedback?  I was planning on merging your
force_sig patch very soon

Comment 63 Steve French 2007-06-25 22:31:15 UTC
ok - merged into cifs-2.6.git (let me know if any objections to this approach
are mentioned)

Comment 64 Jeff Layton 2007-06-26 00:32:29 UTC
Thanks, Steve.

No feedback so far other than from Shaggy, who stated basically what you did,
that he liked it better. I'll let you know if I hear any objections.

There is some upstream discussion also about maybe making kthread_stop send a
signal automatically. If that occurs we might can just remove the force_sig()
calls as well. I'll keep you posted...


Comment 65 Jeff Layton 2007-06-26 12:08:53 UTC
Cloned this for RHEL5 in bug #245674

...As a followup, Christoph Hellwig commented that he still objected to this.
I'm gathering though that his objection is to the use of signals at all, and not
in the particular change to using force_sig here. Once he explains better, we
may want to modify this approach again.


Comment 66 Jeff Layton 2007-07-10 11:11:28 UTC

*** This bug has been marked as a duplicate of 239339 ***