Bug 206123
Summary: | cifsd is not userspace signal-safe | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Jose Plans <jplans> |
Component: | kernel | Assignee: | Jeff Layton <jlayton> |
Status: | CLOSED DUPLICATE | QA Contact: | Brian Brock <bbrock> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 4.4 | CC: | jbaron, staubach, steved, tao |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2007-07-10 11:11:28 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 234547, 245674 | ||
Attachments: |
Description
Jose Plans
2006-09-12 09:10:58 UTC
Can you please verify this is fixed in the u5 beta kernel...kernels can be found at: http://people.redhat.com/~jbaron/rhel4/ thanks. This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Major release. This request is not yet committed for inclusion. I've just started taking a look at this. My understanding of the facts here are: 1) when the server is killed, the processes and cifsd hang in D state until the server is restarted 2) if, while the server is down, you then "kill -9" cifsd and do a lazy umount on the filesystem then you can't remount the filesystem and the mount does not recover when the server comes back #2 sounds almost like a case of "don't do that". Perhaps the correct answer there is to make sure that cifsd ignores all signals so that when the server comes back up, things end up finishing up as they should. I'll have a look and see if I can reproduce this. I don't seem to be able to reproduce this on recent RHEL4 kernels (I'm using the jtltest.6 kernels from my people page): http://people.redhat.com/jlayton Here's what I'm doing: 1) mount up a cifs share: //dantu/testuser /mnt/cifs cifs user=testuser,dom=EXAMPLE,uid=5000,gid=5000,file_mod=0777,password=testuser,noauto 0 0 2) run fsstress: #!/bin/sh FSSTRESS=/usr/lib/ltp/testcases/kernel/fs/fsstress/fsstress TESTDIR=/mnt/cifs nohup $FSSTRESS -d $TESTDIR/CIFS1 -l 0 -n 1000 -p 4 -r & nohup $FSSTRESS -d $TESTDIR/CIFS2 -l 0 -n 1000 -p 4 -r & 3) Do a "service smb stop" on the samba server 4) kill off the fsstress processes: # pkill fsstress 5) kill off cifsd: # kill -9 <pid of cifsd> 6) umount: # umount -l /mnt/cifs (it returns after a few seconds) 7) start up smb on the server: # service smb start 8) mount again: # mount /mnt/cifs ...it worked fine. Am I not doing the reproducer correctly? Again, I'm having a difficult time reproducing this. Testing with kernels from my people page: http://people.redhat.com/jlayton/ My steps are: 1) mount up a CIFS share from a samba server on /mnt/cifs 2) start up fsstress (8 different processes of it) 3) stop samba on the server 4) pkill fsstress 5) kill -9 pid_of_cifsd 5) umount -l /mnt/cifs 6) start samba on server 7) mount /mnt/cifs 8) I/O to /mnt/cifs (ls, cd, catting files) works as expected I need detailed, step-by-step directions on how to reproduce this. Also let me know what kernel they are reproducing this on. Joe, what kernel were you testing with when you got the panic? The CIFS code in the -42 kernels is *really* old. Can you retest with -55 kernel? First issue... I get this when trying to mount with their setup, with "printable" set to "yes": mount error 20 = Not a directory Refer to the mount.cifs(8) manual page (e.g.man mount.cifs) Commenting that out and restarting samba allows it to me mounted. After that, I'm sort of able to reproduce this. When trying to remount /SAMBA1, I don't get "-ENOTDIR", I get "-EAGAIN". I think the part I was missing before was that they are actually killing *all* of the cifs daemons, not just cifsd as they stated before. I suppose we can try and consider this a bug, but it seems like a rather nonsensical set of steps. What on earth would possess them to want to kill the cifs daemons? More than likely what's happening is that something these daemons are calling are not "signal safe" and that's passing an unexpected error back up the stack. The easiest fix might be to just to have the cifs daemons ignore signals, but I'm not sure if that's the best way to fix this. I'll look at it when I get some time, and will also test with the jtltest kernels since they contain cifs 1.48aRH. jtltest kernels seem to show a more consistent ENOTDIR error, so something may have changed there. I get a lot of these sorts of printk's in the ring buffer: CIFS VFS: Error 0xffffff90 on cifs_get_inode_info in lookup of /p0/d3XXXXXXXXXXXXXXX/d5XXX/d15XXXX/f24XXXXXXX so I'm thinking that we're not getting back valid inode info in this situation and that's causing the ENOTDIR errors. Testing with jtltest.8 (current kernel on my people page). If I don't signal the daemons then everything seems to recover gracefully. So it seems like making sure we either ignore or properly handle signals with the cifs daemons is going to be crucial for fixing this. Spent some time working to simplify this reproducer and have discovered some things: 1) only one mount is needed, not 4 2) it doesn't matter whether you use tar.sh or fsstress. The "problem" is the same. The key is to leave the process running when you do the umount -l so there is active I/O. 3) if, after getting this error, you kill off the processes still doing I/O (in their reproducer, that would be the tar.sh processes that they just suspended and didn't kill), then I think you can then remount and carry on normally. CIFS is already somewhat suprisingly robust in this regard :-). I want to do a bit more work and see if I can narrow down the kill -9 to just a single daemon so we can see what needs to be looked over. Given that this only occurs when sending a SIGKILL to the daemons, I'm somewhat tempted to just close this as a WONTFIX sort of problem and recommend that they don't do that. It does actually seem to be SIGKILL'ing cifsd that causes the problem, though it seems like the process that's running on the mount needs to be in just the right state for this to occur. So minimal set of steps: 1) mount a single samba share, make sure printable is off 2) on that share, run tar.sh script (it's in one of the above comments) 3) on server shut down smbd/nmbd 4) suspend tar.sh (with ^z) 5) kill -9 cifsd 6) umount -l samba share 7) start up nmbd/smbd on server 8) try to mount samba share again you should either get a -EAGAIN or -ENOTDIR. Given how nonsensical these steps are, I'm going to recommend closing this as WONTFIX with a note to the customer not to kill cifsd. Ignoring signals in cifsd looks problematic and there's no reason that the customer should expect that killing cifsd shouldn't have consequences. I'll leave this open for a bit in case Steve wants to chime in though. Perhaps he has a different take on the matter... Actually, tar.sh might not be in this BZ. In case it's not: #!/bin/bash while : do tar -cvf root.tar /root rm -rf root.tar done Created attachment 155742 [details]
patch -- handle ERESTARTSYS error in cifs_reconnect
This patch might actually be appropriate here, though I'd like to propose it
upstream before we roll with it.
The issue seems to be that when the server goes down, that cifsd calls
cifs_reconnect, which loops and attempts to reconnect to the server every 30
seconds. If cifsd catches a signal during this, however, all the connect
attempts return -ERESTARTSYS (-512), and never complete. Since the signal is
never flushed, it seems to loop indefinitely.
This patch makes it so that if the connect returns -ERESTARTSYS that we treat
it like the thread caught a signal. We return with the error and start shutting
down. A preliminary test with a RHEL5 version of this seems to show that it
fixes this problem, though again I'd like some upstream review.
This patch is against upstream code, so the RHEL4/5 equivalent will look a
little different.
Created attachment 155749 [details] patch -- patch for cifs 1.48aRH to handle signals in cifs_reconnect I've proposed the other patch on the upstream mailing list and am awaiting response. This patch is the one I tested. This patch is against the code in the kernels on my people page: http://people.redhat.com/jlayton Once I have a better idea of the upstream response, I'll see about adding this to them for testing. Created attachment 155981 [details]
patch -- make cifsd be more signal-safe
The crux of this matter is that the cifsd kernel thread is not signal-safe. So
if it catches a spurious signal, it can get into a state where things are not
working correctly. It needs to allow signals through though, or mounts and
unmounts can take a very long time if the thread is blocked in kernel_recvmsg.
This patch seems to fix at least the most common cases, and is backported from
one that I've proposed to the upstream maintainer.
I'll probably look to get this into 5.2 and 4.7. The bottom line here is that
sending a "kill -9" to cifsd is a really bad idea and this "bug" is really a
self-inflicted wound. If they don't do that, this problem goes away.
I sent a patch upstream to make cifsd handle signals better, but Christoph Hellwig commented that the fact that cifs code needs to send signals is really a bug itself. I'm going to take a crack at seeing if I can clean up the kernel_recvmsg code to not need a signal to return when run from a kthread that has been stopped. That should obviate the need to deal with signals altogether and we can then just ignore them all. Working on this in F7 for now since this will clearly need to go upstream first. Unfortunately, it looks like this will need to be fixed in all the different transport .recvmsg functions (i.e. tcp, udp, etc...) since things seem to block there... Created attachment 156590 [details]
patch -- make tcp_recvmsg look for kthread_stop events
I've proposed this patch upstream as an RFC. It makes tcp_recvmsg treat a
kthread_stop event as if it had been signalled. I'm not certain whether this
will have bad side effects, however. If it is ok, then we should be able to
remove the need for cifsd to allow signals through.
Herbert Xu stated that this idea feels wrong, since the tcp code shouldn't need to know anything about kthreads. I tend to agree -- this seems sort of like a layering violation. It seems like we need a kernel-internal way to break out of kernel_recvmsg without using signals, but that's neutral enough not to need any sort of layering violation... Created attachment 157543 [details]
patch -- use force_sig instead of send_sig
Just sent this patch upstream. This changes cifsd to ignore all signals and
uses force_sig instead of send_sig for sending the signal. This should allow us
to continue to break out of kernel_recvmsg on mount/umount while insulating
cifsd from signals sent from userspace.
I agree that the force_sig approach appears better. I don't see anyone commenting on the patch though - any feedback? I was planning on merging your force_sig patch very soon ok - merged into cifs-2.6.git (let me know if any objections to this approach are mentioned) Thanks, Steve. No feedback so far other than from Shaggy, who stated basically what you did, that he liked it better. I'll let you know if I hear any objections. There is some upstream discussion also about maybe making kthread_stop send a signal automatically. If that occurs we might can just remove the force_sig() calls as well. I'll keep you posted... Cloned this for RHEL5 in bug #245674 ...As a followup, Christoph Hellwig commented that he still objected to this. I'm gathering though that his objection is to the use of signals at all, and not in the particular change to using force_sig here. Once he explains better, we may want to modify this approach again. *** This bug has been marked as a duplicate of 239339 *** |