Description of problem: Samba server was configured on an x360 and x335 was used as the Samba client. Fsstress & tar.sh tests were running on cifs type client mount points. When the I/O was in progress, I killed the samba server daemons (both smbd & nmbd) which blocked I/O on cifs client mount points. The moment I kill the samba server daemons, process state of cifsd on the client changes from Interruptible sleep (S) to Uninterruptible sleep (D) and hence it is not letting me to kill cifsd. (pls note:all these operations are done with root privilege). Here are the sequence of commands I tried out on client: --------- before killing samba server daemons --------------- [root@x335b fsstress]# ps aux | grep cifsd root 26318 3.6 0.0 0 0 ? S Feb13 35:32 [cifsd] root 13088 0.0 0.0 5560 644 pts/3 S+ 10:19 0:00 grep cifsd --------- after killing samba server daemons --------------- [root@x335b fsstress]# ps aux | grep cifsd root 26318 3.6 0.0 0 0 ? D Feb13 35:39 [cifsd] root 13125 0.0 0.0 4504 644 pts/3 S+ 10:24 0:00 grep cifsd [root@x335b fsstress]# kill -9 26318 [root@x335b fsstress]# ps aux | grep cifsd root 26318 3.6 0.0 0 0 ? D Feb13 35:39 [cifsd] root 13130 0.0 0.0 5192 644 pts/3 S+ 10:25 0:00 grep cifsd Then, I tried restarting samba server daemons and then unmounting and mounting the cifs client mount points. This gives me following cifs mount error: [root@x335b /]# mount -t cifs -o username=root //9.124.111.56/SAMBA1 /SAMBA1 Password: mount error 20 = Not a directory Refer to the mount.cifs(8) manual page (e.g.man mount.cifs) [root@x335b /]# As old cifsd thread is not clearing out even after unmounting cifs mount point & it is conflicting with the new cifsd thread when I try mounting again and hence above mount error. Considering a typical customer scenario where the samba server goes down in the middle of some file I/O & user's cifs mount point gets disconnected from samba server,in that case user normally restart his samba server daemons & tries to unmount/mount cifs mount points. But in this case, user is not able to remount cifs ending up with mount error. Here, user has no option other than restarting his client machine. Version-Release number of selected component (if applicable): cifs 1.34 - 2.6.9-30.ELsmp How reproducible: Always. Steps to Reproduce: 1. kill cifsd 2. mount a CIFS share Actual results: mount error 20 = Not a directory Expected results: Succesful mount. Additional info: [FIXED in cifs 1.44] -- from the IT -- cifs 1.44 at http://us1. samba.org/samba/ftp/cifs-cvs/cifs-1.44.tar.gz includes all current fixes and builds on RHEL4 prior to about the 30 version of the rhel4 2.6.9 kernel (when RedHat added kzalloc to its 2.6.9 it caused a build warning for this cifs on later updates). If we have to work backwards to a smaller fix (smaller than the current stable cifs 1.44 fix rollup) - it would be helpful to make sure 1.44 fixes it -- from the IT --
Can you please verify this is fixed in the u5 beta kernel...kernels can be found at: http://people.redhat.com/~jbaron/rhel4/ thanks.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Major release. This request is not yet committed for inclusion.
I've just started taking a look at this. My understanding of the facts here are: 1) when the server is killed, the processes and cifsd hang in D state until the server is restarted 2) if, while the server is down, you then "kill -9" cifsd and do a lazy umount on the filesystem then you can't remount the filesystem and the mount does not recover when the server comes back #2 sounds almost like a case of "don't do that". Perhaps the correct answer there is to make sure that cifsd ignores all signals so that when the server comes back up, things end up finishing up as they should. I'll have a look and see if I can reproduce this.
I don't seem to be able to reproduce this on recent RHEL4 kernels (I'm using the jtltest.6 kernels from my people page): http://people.redhat.com/jlayton Here's what I'm doing: 1) mount up a cifs share: //dantu/testuser /mnt/cifs cifs user=testuser,dom=EXAMPLE,uid=5000,gid=5000,file_mod=0777,password=testuser,noauto 0 0 2) run fsstress: #!/bin/sh FSSTRESS=/usr/lib/ltp/testcases/kernel/fs/fsstress/fsstress TESTDIR=/mnt/cifs nohup $FSSTRESS -d $TESTDIR/CIFS1 -l 0 -n 1000 -p 4 -r & nohup $FSSTRESS -d $TESTDIR/CIFS2 -l 0 -n 1000 -p 4 -r & 3) Do a "service smb stop" on the samba server 4) kill off the fsstress processes: # pkill fsstress 5) kill off cifsd: # kill -9 <pid of cifsd> 6) umount: # umount -l /mnt/cifs (it returns after a few seconds) 7) start up smb on the server: # service smb start 8) mount again: # mount /mnt/cifs ...it worked fine. Am I not doing the reproducer correctly?
Again, I'm having a difficult time reproducing this. Testing with kernels from my people page: http://people.redhat.com/jlayton/ My steps are: 1) mount up a CIFS share from a samba server on /mnt/cifs 2) start up fsstress (8 different processes of it) 3) stop samba on the server 4) pkill fsstress 5) kill -9 pid_of_cifsd 5) umount -l /mnt/cifs 6) start samba on server 7) mount /mnt/cifs 8) I/O to /mnt/cifs (ls, cd, catting files) works as expected I need detailed, step-by-step directions on how to reproduce this. Also let me know what kernel they are reproducing this on.
Joe, what kernel were you testing with when you got the panic?
The CIFS code in the -42 kernels is *really* old. Can you retest with -55 kernel?
First issue... I get this when trying to mount with their setup, with "printable" set to "yes": mount error 20 = Not a directory Refer to the mount.cifs(8) manual page (e.g.man mount.cifs) Commenting that out and restarting samba allows it to me mounted. After that, I'm sort of able to reproduce this. When trying to remount /SAMBA1, I don't get "-ENOTDIR", I get "-EAGAIN". I think the part I was missing before was that they are actually killing *all* of the cifs daemons, not just cifsd as they stated before. I suppose we can try and consider this a bug, but it seems like a rather nonsensical set of steps. What on earth would possess them to want to kill the cifs daemons? More than likely what's happening is that something these daemons are calling are not "signal safe" and that's passing an unexpected error back up the stack. The easiest fix might be to just to have the cifs daemons ignore signals, but I'm not sure if that's the best way to fix this. I'll look at it when I get some time, and will also test with the jtltest kernels since they contain cifs 1.48aRH.
jtltest kernels seem to show a more consistent ENOTDIR error, so something may have changed there. I get a lot of these sorts of printk's in the ring buffer: CIFS VFS: Error 0xffffff90 on cifs_get_inode_info in lookup of /p0/d3XXXXXXXXXXXXXXX/d5XXX/d15XXXX/f24XXXXXXX so I'm thinking that we're not getting back valid inode info in this situation and that's causing the ENOTDIR errors.
Testing with jtltest.8 (current kernel on my people page). If I don't signal the daemons then everything seems to recover gracefully. So it seems like making sure we either ignore or properly handle signals with the cifs daemons is going to be crucial for fixing this.
Spent some time working to simplify this reproducer and have discovered some things: 1) only one mount is needed, not 4 2) it doesn't matter whether you use tar.sh or fsstress. The "problem" is the same. The key is to leave the process running when you do the umount -l so there is active I/O. 3) if, after getting this error, you kill off the processes still doing I/O (in their reproducer, that would be the tar.sh processes that they just suspended and didn't kill), then I think you can then remount and carry on normally. CIFS is already somewhat suprisingly robust in this regard :-). I want to do a bit more work and see if I can narrow down the kill -9 to just a single daemon so we can see what needs to be looked over. Given that this only occurs when sending a SIGKILL to the daemons, I'm somewhat tempted to just close this as a WONTFIX sort of problem and recommend that they don't do that.
It does actually seem to be SIGKILL'ing cifsd that causes the problem, though it seems like the process that's running on the mount needs to be in just the right state for this to occur. So minimal set of steps: 1) mount a single samba share, make sure printable is off 2) on that share, run tar.sh script (it's in one of the above comments) 3) on server shut down smbd/nmbd 4) suspend tar.sh (with ^z) 5) kill -9 cifsd 6) umount -l samba share 7) start up nmbd/smbd on server 8) try to mount samba share again you should either get a -EAGAIN or -ENOTDIR. Given how nonsensical these steps are, I'm going to recommend closing this as WONTFIX with a note to the customer not to kill cifsd. Ignoring signals in cifsd looks problematic and there's no reason that the customer should expect that killing cifsd shouldn't have consequences. I'll leave this open for a bit in case Steve wants to chime in though. Perhaps he has a different take on the matter...
Actually, tar.sh might not be in this BZ. In case it's not: #!/bin/bash while : do tar -cvf root.tar /root rm -rf root.tar done
Created attachment 155742 [details] patch -- handle ERESTARTSYS error in cifs_reconnect This patch might actually be appropriate here, though I'd like to propose it upstream before we roll with it. The issue seems to be that when the server goes down, that cifsd calls cifs_reconnect, which loops and attempts to reconnect to the server every 30 seconds. If cifsd catches a signal during this, however, all the connect attempts return -ERESTARTSYS (-512), and never complete. Since the signal is never flushed, it seems to loop indefinitely. This patch makes it so that if the connect returns -ERESTARTSYS that we treat it like the thread caught a signal. We return with the error and start shutting down. A preliminary test with a RHEL5 version of this seems to show that it fixes this problem, though again I'd like some upstream review. This patch is against upstream code, so the RHEL4/5 equivalent will look a little different.
Created attachment 155749 [details] patch -- patch for cifs 1.48aRH to handle signals in cifs_reconnect I've proposed the other patch on the upstream mailing list and am awaiting response. This patch is the one I tested. This patch is against the code in the kernels on my people page: http://people.redhat.com/jlayton Once I have a better idea of the upstream response, I'll see about adding this to them for testing.
Created attachment 155981 [details] patch -- make cifsd be more signal-safe The crux of this matter is that the cifsd kernel thread is not signal-safe. So if it catches a spurious signal, it can get into a state where things are not working correctly. It needs to allow signals through though, or mounts and unmounts can take a very long time if the thread is blocked in kernel_recvmsg. This patch seems to fix at least the most common cases, and is backported from one that I've proposed to the upstream maintainer. I'll probably look to get this into 5.2 and 4.7. The bottom line here is that sending a "kill -9" to cifsd is a really bad idea and this "bug" is really a self-inflicted wound. If they don't do that, this problem goes away.
I sent a patch upstream to make cifsd handle signals better, but Christoph Hellwig commented that the fact that cifs code needs to send signals is really a bug itself. I'm going to take a crack at seeing if I can clean up the kernel_recvmsg code to not need a signal to return when run from a kthread that has been stopped. That should obviate the need to deal with signals altogether and we can then just ignore them all. Working on this in F7 for now since this will clearly need to go upstream first. Unfortunately, it looks like this will need to be fixed in all the different transport .recvmsg functions (i.e. tcp, udp, etc...) since things seem to block there...
Created attachment 156590 [details] patch -- make tcp_recvmsg look for kthread_stop events I've proposed this patch upstream as an RFC. It makes tcp_recvmsg treat a kthread_stop event as if it had been signalled. I'm not certain whether this will have bad side effects, however. If it is ok, then we should be able to remove the need for cifsd to allow signals through.
Herbert Xu stated that this idea feels wrong, since the tcp code shouldn't need to know anything about kthreads. I tend to agree -- this seems sort of like a layering violation. It seems like we need a kernel-internal way to break out of kernel_recvmsg without using signals, but that's neutral enough not to need any sort of layering violation...
Created attachment 157543 [details] patch -- use force_sig instead of send_sig Just sent this patch upstream. This changes cifsd to ignore all signals and uses force_sig instead of send_sig for sending the signal. This should allow us to continue to break out of kernel_recvmsg on mount/umount while insulating cifsd from signals sent from userspace.
I agree that the force_sig approach appears better. I don't see anyone commenting on the patch though - any feedback? I was planning on merging your force_sig patch very soon
ok - merged into cifs-2.6.git (let me know if any objections to this approach are mentioned)
Thanks, Steve. No feedback so far other than from Shaggy, who stated basically what you did, that he liked it better. I'll let you know if I hear any objections. There is some upstream discussion also about maybe making kthread_stop send a signal automatically. If that occurs we might can just remove the force_sig() calls as well. I'll keep you posted...
Cloned this for RHEL5 in bug #245674 ...As a followup, Christoph Hellwig commented that he still objected to this. I'm gathering though that his objection is to the use of signals at all, and not in the particular change to using force_sig here. Once he explains better, we may want to modify this approach again.
*** This bug has been marked as a duplicate of 239339 ***