Created attachment 836274 [details]
dmesg

Description of problem:

$ ps -Lww -C mount.cifs -o pid,stat,nwchan,wchan,comm,args
  PID STAT WCHAN  WCHAN  COMMAND    COMMAND
19236 D    5c9470 cifs_g mount.cifs /sbin/mount.cifs //pdbisdocs.pdb.fsc.net/Projects /cifs/pdbisdocs.pdb.fsc.net/Projects -n -o rw,multiuser,cruid=1004,sec=krb5i

$ cat /proc/19236/stack
[<ffffffffa05c9470>] cifs_get_smb_ses+0x200/0x4c0 [cifs]
[<ffffffffa05c9cdd>] cifs_mount+0x5ad/0xad0 [cifs]
[<ffffffffa05b8f50>] cifs_do_mount+0xa0/0x4d0 [cifs]
[<ffffffff811ac239>] mount_fs+0x39/0x1b0
[<ffffffff811c51df>] vfs_kern_mount+0x5f/0xf0
[<ffffffff811c75ae>] do_mount+0x23e/0xa20
[<ffffffff811c7e13>] SyS_mount+0x83/0xc0
[<ffffffff81656859>] system_call_fastpath+0x16/0x1b

The bash session which still had its CWD on CIFS hangs with a similar stack:

[<ffffffffa05dac23>] wait_for_response.isra.6+0xc3/0x120 [cifs]
[<ffffffffa05dbb9f>] SendReceive+0x12f/0x300 [cifs]
[<ffffffffa05ba3b6>] CIFSSMBNegotiate+0x116/0x6a0 [cifs]
[<ffffffffa05e2dd7>] cifs_negotiate+0x17/0x60 [cifs]
[<ffffffffa05c901f>] cifs_negotiate_protocol+0x7f/0xe0 [cifs]
[<ffffffffa05c947b>] cifs_get_smb_ses+0x20b/0x4c0 [cifs]
[<ffffffffa05ca6c9>] cifs_sb_tlink+0x4a9/0x850 [cifs]
[<ffffffffa05d4f91>] cifs_get_inode_info+0x51/0x7b0 [cifs]
[<ffffffffa05d6f3f>] cifs_revalidate_dentry_attr+0xef/0x1b0 [cifs]
[<ffffffffa05d70d7>] cifs_getattr+0x57/0x130 [cifs]
[<ffffffff811acf36>] vfs_getattr+0x46/0x80
[<ffffffff811ad032>] vfs_fstatat+0x62/0xa0
[<ffffffff811ad21f>] SYSC_newstat+0x1f/0x40
[<ffffffff811ad66e>] SyS_newstat+0xe/0x10
[<ffffffff81656859>] system_call_fastpath+0x16/0x1b

Version-Release number of selected component (if applicable):
kernel-3.11.9-200.fc19.x86_64

How reproducible:
Happens sometimes when CIFS file systems have been idle for some time. I haven't done exhaustive reproducibility testing. In the case at hand I had a bash session open on one CIFS FS, so autofs didn't time out.

Steps to Reproduce:
1. configure CIFS filesystems using autofs
2. cd to a CIFS file system in a terminal session
3. wait a while

Actual results:
Accesses to

Expected results:

Additional info:
While I was typing in this bug, the CIFS accesses obviously timed out after an estimated 10-15 minutes. Now everything is fine again.

However, I verified that while the CIFS session was stalled, the server was alive and well. I could easily access it using smbclient from the same host.

During the error situation, I had set cifsFYI to 15 and saw the following output:

[39990.240053] fs/cifs/connect.c: Received no data or error: expecting 4 got -110
[39990.240057] fs/cifs/connect.c: Reconnecting tcp session
[39990.240060] fs/cifs/connect.c: cifs_reconnect: marking sessions and tcons for reconnect
[39990.240063] fs/cifs/connect.c: cifs_reconnect: tearing down socket
[39990.240065] fs/cifs/connect.c: State: 0x3 Flags: 0x0
[39990.240068] fs/cifs/connect.c: Post shutdown state: 0x3 Flags: 0x0
[39990.240083] fs/cifs/connect.c: cifs_reconnect: moving mids to private list
[39990.240086] fs/cifs/connect.c: cifs_reconnect: issuing mid callbacks

Full dmesg is attached.
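As a general note, tasks stuck like this can be located without knowing the PID in advance. A sketch using standard procps and sysrq interfaces (nothing bug-specific assumed; the sysrq part requires root):

```shell
# List tasks in uninterruptible sleep (STAT contains 'D') plus the header line.
ps -eo pid,stat,wchan:32,comm | awk 'NR == 1 || $2 ~ /D/'

# With root, dump kernel stacks of all blocked tasks into dmesg (sysrq 'w');
# this yields the same kind of trace as /proc/<pid>/stack, for every stuck task at once.
# echo w > /proc/sysrq-trigger
```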
(In reply to Martin Wilck from comment #0)
> [39990.240053] fs/cifs/connect.c: Received no data or error: expecting 4
> got -110

It certainly seems like the server wasn't responding:

[jlayton@tlielax ~]$ grep 110 /usr/include/asm-generic/errno*
/usr/include/asm-generic/errno.h:#define ETIMEDOUT 110 /* Connection timed out */

> [39990.240057] fs/cifs/connect.c: Reconnecting tcp session
> [39990.240060] fs/cifs/connect.c: cifs_reconnect: marking sessions and tcons
> for reconnect
> [39990.240063] fs/cifs/connect.c: cifs_reconnect: tearing down socket
> [39990.240065] fs/cifs/connect.c: State: 0x3 Flags: 0x0
> [39990.240068] fs/cifs/connect.c: Post shutdown state: 0x3 Flags: 0x0
> [39990.240083] fs/cifs/connect.c: cifs_reconnect: moving mids to private list
> [39990.240086] fs/cifs/connect.c: cifs_reconnect: issuing mid callbacks

A network capture might be interesting...
> It certainly seems like the server wasn't responding:

True. The question is why it didn't. As I noted, a new connection could be set up easily with that server. My conjecture is that the kernel side was trying to refresh (?) an existing connection that had been dropped on the server long before, and therefore got no response.

The next question, if I'm right, is why the server dropped the connection while the client did not. Would it be possible to use a shorter timeout value at this point?

Both the Windows and Netapp CIFS servers were stalled in the problem situation, but maybe that one stalled connection was just blocking everything else in the kernel's CIFS layer (you can judge better than me whether that's possible; the traces above suggest that the bash process was stuck, and mount.cifs was just waiting for it to finish. The bash process was on a share on the Netapp server).

> A network capture might be interesting...

Sure. Unfortunately I have no deterministic procedure to reproduce this. It just seems to happen once in a while.
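One way to probe the "half-dead connection" conjecture from the client side is to look at the TCP sockets the CIFS layer is holding. A sketch using iproute2's ss (the usual SMB ports 445/139 are assumed; adjust to your setup):

```shell
# Show TCP sockets to the usual SMB ports, with timer and state details.
# '-o' prints keepalive/retransmission timers; a connection the server has
# silently dropped typically still shows ESTAB here, with retransmits pending.
ss -tno '( dport = :445 or dport = :139 )'
```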
(In reply to Martin Wilck from comment #2)
> Sure. Unfortunately I have no deterministic procedure to reproduce this. It
> just seems to happen once in a while.

One could run a variant of

$ tcpdump -C 1 -i em1 -nv -w tcpdump.bin -W 3

which saves only the most recent traffic and deletes older dumps. Together with a port or IP filter, it should not slow the system down too much.
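Combining the rotating capture with a filter as suggested might look like this (a sketch; the interface name em1 is taken from the command above, and <server> is a placeholder for the SMB server address; requires root, so no expected output is shown):

```shell
# Ring capture: 3 files of 1 MB each, the oldest overwritten first, so disk
# usage stays bounded while waiting for the sporadic hang to occur.
# <server> is a placeholder; port 445 assumes the usual SMB-over-TCP port.
tcpdump -C 1 -W 3 -w tcpdump.bin -i em1 -nv host <server> and port 445
```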
Having the same issue. I can reproduce it 100%. Not only shares mounted with mount.cifs are affected, but also KDE KIO. Strangely, I simultaneously lost the ability to send emails to the SMTP server hosted on that same Windows box.

Dec 14 16:13:55 localhost.localdomain kernel: CIFS VFS: Server 10.10.10.2 has not responded in 120 seconds. Reconnecting...
Dec 14 16:16:01 localhost.localdomain kernel: CIFS VFS: Server 10.10.10.2 has not responded in 120 seconds. Reconnecting...
Dec 14 16:18:07 localhost.localdomain kernel: CIFS VFS: Server 10.10.10.2 has not responded in 120 seconds. Reconnecting...

I suspect it was caused by one of the updates listed here: http://paste.fedoraproject.org/61856/87054993/
Interestingly, I can copy some files over to CIFS, but not others. I can also send some emails, but not all: very short plain-text emails send fine, but as soon as I try to reply, or send HTML emails with links and different fonts, they time out.

From the current 3.11.10 I booted into the 3.11.9 and 3.8.5 kernels, and the issue is still there.
I was wrong about being able to copy some files - only zero-size files copied. Essentially they were created on the destination, but no actual data transfer occurred. Any non-empty file freezes the copying program, no matter what it is.

There seems to be a difference in the behaviour of the stalled program depending on the CIFS mount options. When only a username and password are provided, I am not observing high CPU usage in iowait. If a full set of options is supplied, one CPU core sits at 100% in iowait.
Using a Windows machine, I can pull a file from an F19 Samba share. Copying files in a Windows VM running on the F19 machine via shared folders works too.
Still waiting for any other info?
*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.12.6-200.fc19. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 20, and are still experiencing this issue, please change the version to Fedora 20. If you experience different issues, please open a new bug report for those.
This message is a notice that Fedora 19 is now at end of life. Fedora has stopped maintaining and issuing updates for Fedora 19. It is Fedora's policy to close all bug reports from releases that are no longer maintained. Approximately 4 (four) weeks from now this bug will be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue, and we are sorry that we were not able to fix it before Fedora 19 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
I haven't seen this any more since upgrading to F20.
Fedora 19 changed to end-of-life (EOL) status on 2015-01-06. Fedora 19 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.