Bug 712088

Summary: cifs causing suspend to fail
Product: [Fedora] Fedora Reporter: Adam Williamson <awilliam>
Component: kernelAssignee: Jeff Layton <jlayton>
Status: CLOSED DUPLICATE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: unspecified    
Version: 15CC: aurorauddeborg, claus, dcbw, gansalmon, itamar, jiali, jonathan, kernel-maint, lpoetter, madhu.chinakonda, notting, redhat-bugzilla, sprabhu, steved, subscript
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-11-05 16:30:58 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
patch: add wait_event_freezekillable and have cifs use it
none
ring buffer log with cifsFYI turned up and extra waitqueue debugging
none
patch -- add wait_event_freezekillable and have cifs use it none

Description Adam Williamson 2011-06-09 13:34:18 UTC
I've talked to Jeff Layton about this and he thinks it's not the bug fixed by https://patchwork.kernel.org/patch/476581/ , but it could be related.

On my desktop, I can rarely successfully suspend; usually suspend fails with a trace blaming cifs. Here's all the traces currently in my log. I installed 2.6.38.7-30.1.fc15.x86_64 - a 2.6.38.7-30.fc15 build with the patch above added to it - at Jun  6 20:43:12, so the trace from before that may be the one fixed by that patch, but the ones after can't be.

Setting high severity because in this shiny new GNOME 3 world, we're all supposed to be able to suspend ;)

Jun  6 10:13:39 adam kernel: [151701.996159] Freezing user space processes ... 
Jun  6 10:13:39 adam kernel: [151721.991662] Freezing of tasks failed after 20.00 seconds (1 tasks refusing to freeze, wq_busy=0):
Jun  6 10:13:39 adam kernel: [151721.991689] umount          D 0000000000000000     0 18407  18390 0x00800084
Jun  6 10:13:39 adam kernel: [151721.991695]  ffff8803f6d87b58 0000000000000082 ffff880300000000 ffff8803ba28dc80
Jun  6 10:13:39 adam kernel: [151721.991701]  ffff8803f6d87fd8 ffff8803f6d87fd8 0000000000013840 0000000000013840
Jun  6 10:13:39 adam kernel: [151721.991706]  ffff88041f271720 ffff8803ba28dc80 ffff8803f6d87b58 000000018106f187
Jun  6 10:13:39 adam kernel: [151721.991711] Call Trace:
Jun  6 10:13:39 adam kernel: [151721.991733]  [<ffffffffa03b9cb0>] wait_for_response+0xbc/0xbe [cifs]
Jun  6 10:13:39 adam kernel: [151721.991737]  [<ffffffff8106f212>] ? autoremove_wake_function+0x0/0x3d
Jun  6 10:13:39 adam kernel: [151721.991741]  [<ffffffffa03baa95>] SendReceive+0x183/0x321 [cifs]
Jun  6 10:13:39 adam kernel: [151721.991744]  [<ffffffffa03a4d61>] CIFSSMBUnixQPathInfo+0x164/0x200 [cifs]
Jun  6 10:13:39 adam kernel: [151721.991749]  [<ffffffffa03b561f>] cifs_get_inode_info_unix+0x8e/0x165 [cifs]
Jun  6 10:13:39 adam kernel: [151721.991753]  [<ffffffffa03ae80b>] ? build_path_from_dentry+0xe0/0x1e9 [cifs]
Jun  6 10:13:39 adam kernel: [151721.991755]  [<ffffffff81111495>] ? __kmalloc+0xfb/0x10d
Jun  6 10:13:39 adam kernel: [151721.991758]  [<ffffffffa03ae80b>] ? build_path_from_dentry+0xe0/0x1e9 [cifs]
Jun  6 10:13:39 adam kernel: [151721.991761]  [<ffffffffa03ae902>] ? build_path_from_dentry+0x1d7/0x1e9 [cifs]
Jun  6 10:13:39 adam kernel: [151721.991765]  [<ffffffffa03b695d>] cifs_revalidate_dentry+0x10c/0x189 [cifs]
Jun  6 10:13:39 adam kernel: [151721.991768]  [<ffffffffa03b6a12>] cifs_getattr+0x38/0xc0 [cifs]
Jun  6 10:13:39 adam kernel: [151721.991771]  [<ffffffff81125113>] vfs_getattr+0x45/0x63
Jun  6 10:13:39 adam kernel: [151721.991772]  [<ffffffff8112517e>] vfs_fstatat+0x4d/0x63
Jun  6 10:13:39 adam kernel: [151721.991774]  [<ffffffff811251cf>] vfs_stat+0x1b/0x1d
Jun  6 10:13:39 adam kernel: [151721.991775]  [<ffffffff811252ce>] sys_newstat+0x1a/0x33
Jun  6 10:13:39 adam kernel: [151721.991777]  [<ffffffff81129f89>] ? path_put+0x1f/0x23
Jun  6 10:13:39 adam kernel: [151721.991779]  [<ffffffff8109fa08>] ? audit_syscall_entry+0x145/0x171
Jun  6 10:13:39 adam kernel: [151721.991781]  [<ffffffff81009bc2>] system_call_fastpath+0x16/0x1b
Jun  6 10:13:39 adam kernel: [151721.991783] 
Jun  6 10:13:39 adam kernel: [151721.991784] Restarting tasks ... done.

-------------------

Jun  6 23:31:08 adam kernel: [10050.432057] Freezing user space processes ... 
Jun  6 23:31:08 adam kernel: [10070.416787] Freezing of tasks failed after 20.00 seconds (1 tasks refusing to freeze, wq_busy=0):
Jun  6 23:31:08 adam kernel: [10070.416815] umount          D ffff88041b0d6950     0  5495   5477 0x00800084
Jun  6 23:31:08 adam kernel: [10070.416820]  ffff8803ce4afb58 0000000000000086 ffff8803ce4afaf8 ffff880404530000
Jun  6 23:31:08 adam kernel: [10070.416826]  ffff8803ce4affd8 ffff8803ce4affd8 0000000000013840 0000000000013840
Jun  6 23:31:08 adam kernel: [10070.416831]  ffff8803f4390000 ffff880404530000 ffff8803ce4afb58 ffffffff8106f1e3
Jun  6 23:31:08 adam kernel: [10070.416836] Call Trace:
Jun  6 23:31:08 adam kernel: [10070.416844]  [<ffffffff8106f1e3>] ? prepare_to_wait+0x6c/0x78
Jun  6 23:31:08 adam kernel: [10070.416863]  [<ffffffffa03d9d14>] wait_for_response+0xbc/0xbe [cifs]
Jun  6 23:31:08 adam kernel: [10070.416865]  [<ffffffff8106f26e>] ? autoremove_wake_function+0x0/0x3d
Jun  6 23:31:08 adam kernel: [10070.416868]  [<ffffffffa03daaf9>] SendReceive+0x183/0x321 [cifs]
Jun  6 23:31:08 adam kernel: [10070.416871]  [<ffffffffa03c4d61>] CIFSSMBUnixQPathInfo+0x164/0x200 [cifs]
Jun  6 23:31:08 adam kernel: [10070.416875]  [<ffffffffa03d5683>] cifs_get_inode_info_unix+0x8e/0x165 [cifs]
Jun  6 23:31:08 adam kernel: [10070.416878]  [<ffffffffa03ce86f>] ? build_path_from_dentry+0xe0/0x1e9 [cifs]
Jun  6 23:31:08 adam kernel: [10070.416880]  [<ffffffff811114a9>] ? __kmalloc+0xfb/0x10d
Jun  6 23:31:08 adam kernel: [10070.416883]  [<ffffffffa03ce86f>] ? build_path_from_dentry+0xe0/0x1e9 [cifs]
Jun  6 23:31:08 adam kernel: [10070.416885]  [<ffffffffa03ce966>] ? build_path_from_dentry+0x1d7/0x1e9 [cifs]
Jun  6 23:31:08 adam kernel: [10070.416888]  [<ffffffffa03d69c1>] cifs_revalidate_dentry+0x10c/0x189 [cifs]
Jun  6 23:31:08 adam kernel: [10070.416891]  [<ffffffffa03d6a76>] cifs_getattr+0x38/0xc0 [cifs]
Jun  6 23:31:08 adam kernel: [10070.416893]  [<ffffffff81125127>] vfs_getattr+0x45/0x63
Jun  6 23:31:08 adam kernel: [10070.416895]  [<ffffffff81125192>] vfs_fstatat+0x4d/0x63
Jun  6 23:31:08 adam kernel: [10070.416896]  [<ffffffff811251e3>] vfs_stat+0x1b/0x1d
Jun  6 23:31:08 adam kernel: [10070.416897]  [<ffffffff811252e2>] sys_newstat+0x1a/0x33
Jun  6 23:31:08 adam kernel: [10070.416899]  [<ffffffff81129f9d>] ? path_put+0x1f/0x23
Jun  6 23:31:08 adam kernel: [10070.416901]  [<ffffffff8109fa88>] ? audit_syscall_entry+0x145/0x171
Jun  6 23:31:08 adam kernel: [10070.416903]  [<ffffffff81009bc2>] system_call_fastpath+0x16/0x1b
Jun  6 23:31:08 adam kernel: [10070.416904] 
Jun  6 23:31:08 adam kernel: [10070.416904] Restarting tasks ... done.

--------------------------

Jun  8 01:35:32 adam kernel: [32920.913531] Freezing user space processes ... 
Jun  8 01:35:32 adam kernel: [32940.898389] Freezing of tasks failed after 20.00 seconds (1 tasks refusing to freeze, wq_busy=0):
Jun  8 01:35:32 adam kernel: [32940.898417] umount          D 0000000000000000     0  8529   8511 0x00800084
Jun  8 01:35:32 adam kernel: [32940.898423]  ffff8802ebc9bb58 0000000000000082 ffff880200000000 ffff8803dba60000
Jun  8 01:35:32 adam kernel: [32940.898428]  ffff8802ebc9bfd8 ffff8802ebc9bfd8 0000000000013840 0000000000013840
Jun  8 01:35:32 adam kernel: [32940.898433]  ffff88043f2a1730 ffff8803dba60000 ffff8802ebc9bb58 000000018106f1e3
Jun  8 01:35:32 adam kernel: [32940.898438] Call Trace:
Jun  8 01:35:32 adam kernel: [32940.898455]  [<ffffffffa03b0d14>] wait_for_response+0xbc/0xbe [cifs]
Jun  8 01:35:32 adam kernel: [32940.898466]  [<ffffffff8106f26e>] ? autoremove_wake_function+0x0/0x3d
Jun  8 01:35:32 adam kernel: [32940.898470]  [<ffffffffa03b1af9>] SendReceive+0x183/0x321 [cifs]
Jun  8 01:35:32 adam kernel: [32940.898473]  [<ffffffffa039bd61>] CIFSSMBUnixQPathInfo+0x164/0x200 [cifs]
Jun  8 01:35:32 adam kernel: [32940.898477]  [<ffffffffa03ac683>] cifs_get_inode_info_unix+0x8e/0x165 [cifs]
Jun  8 01:35:32 adam kernel: [32940.898480]  [<ffffffffa03a586f>] ? build_path_from_dentry+0xe0/0x1e9 [cifs]
Jun  8 01:35:32 adam kernel: [32940.898482]  [<ffffffff811114a9>] ? __kmalloc+0xfb/0x10d
Jun  8 01:35:32 adam kernel: [32940.898484]  [<ffffffffa03a586f>] ? build_path_from_dentry+0xe0/0x1e9 [cifs]
Jun  8 01:35:32 adam kernel: [32940.898487]  [<ffffffffa03a5966>] ? build_path_from_dentry+0x1d7/0x1e9 [cifs]
Jun  8 01:35:32 adam kernel: [32940.898490]  [<ffffffffa03ad9c1>] cifs_revalidate_dentry+0x10c/0x189 [cifs]
Jun  8 01:35:32 adam kernel: [32940.898493]  [<ffffffffa03ada76>] cifs_getattr+0x38/0xc0 [cifs]
Jun  8 01:35:32 adam kernel: [32940.898494]  [<ffffffff81125127>] vfs_getattr+0x45/0x63
Jun  8 01:35:32 adam kernel: [32940.898496]  [<ffffffff81125192>] vfs_fstatat+0x4d/0x63
Jun  8 01:35:32 adam kernel: [32940.898497]  [<ffffffff811251e3>] vfs_stat+0x1b/0x1d
Jun  8 01:35:32 adam kernel: [32940.898498]  [<ffffffff811252e2>] sys_newstat+0x1a/0x33
Jun  8 01:35:32 adam kernel: [32940.898500]  [<ffffffff81129f9d>] ? path_put+0x1f/0x23
Jun  8 01:35:32 adam kernel: [32940.898502]  [<ffffffff8109fa88>] ? audit_syscall_entry+0x145/0x171
Jun  8 01:35:32 adam kernel: [32940.898504]  [<ffffffff81009bc2>] system_call_fastpath+0x16/0x1b
Jun  8 01:35:32 adam kernel: [32940.898505] 
Jun  8 01:35:32 adam kernel: [32940.898505] Restarting tasks ... done.

--------------------

Jun  8 01:35:59 adam kernel: [32948.281192] Freezing user space processes ... 
Jun  8 01:35:59 adam kernel: [32968.266173] Freezing of tasks failed after 20.00 seconds (1 tasks refusing to freeze, wq_busy=0):
Jun  8 01:35:59 adam kernel: [32968.266200] umount          D 0000000000000000     0  8936   8919 0x00800084
Jun  8 01:35:59 adam kernel: [32968.266206]  ffff88040e435b58 0000000000000082 ffff880400000000 ffff880438d94590
Jun  8 01:35:59 adam kernel: [32968.266212]  ffff88040e435fd8 ffff88040e435fd8 0000000000013840 0000000000013840
Jun  8 01:35:59 adam kernel: [32968.266217]  ffff88043f271730 ffff880438d94590 ffff88040e435b58 000000018106f1e3
Jun  8 01:35:59 adam kernel: [32968.266222] Call Trace:
Jun  8 01:35:59 adam kernel: [32968.266239]  [<ffffffffa03b0d14>] wait_for_response+0xbc/0xbe [cifs]
Jun  8 01:35:59 adam kernel: [32968.266251]  [<ffffffff8106f26e>] ? autoremove_wake_function+0x0/0x3d
Jun  8 01:35:59 adam kernel: [32968.266255]  [<ffffffffa03b1af9>] SendReceive+0x183/0x321 [cifs]
Jun  8 01:35:59 adam kernel: [32968.266258]  [<ffffffffa039bd61>] CIFSSMBUnixQPathInfo+0x164/0x200 [cifs]
Jun  8 01:35:59 adam kernel: [32968.266261]  [<ffffffffa03ac683>] cifs_get_inode_info_unix+0x8e/0x165 [cifs]
Jun  8 01:35:59 adam kernel: [32968.266265]  [<ffffffffa03a586f>] ? build_path_from_dentry+0xe0/0x1e9 [cifs]
Jun  8 01:35:59 adam kernel: [32968.266266]  [<ffffffff811114a9>] ? __kmalloc+0xfb/0x10d
Jun  8 01:35:59 adam kernel: [32968.266269]  [<ffffffffa03a586f>] ? build_path_from_dentry+0xe0/0x1e9 [cifs]
Jun  8 01:35:59 adam kernel: [32968.266272]  [<ffffffffa03a5966>] ? build_path_from_dentry+0x1d7/0x1e9 [cifs]
Jun  8 01:35:59 adam kernel: [32968.266275]  [<ffffffffa03ad9c1>] cifs_revalidate_dentry+0x10c/0x189 [cifs]
Jun  8 01:35:59 adam kernel: [32968.266278]  [<ffffffffa03ada76>] cifs_getattr+0x38/0xc0 [cifs]
Jun  8 01:35:59 adam kernel: [32968.266279]  [<ffffffff81125127>] vfs_getattr+0x45/0x63
Jun  8 01:35:59 adam kernel: [32968.266281]  [<ffffffff81125192>] vfs_fstatat+0x4d/0x63
Jun  8 01:35:59 adam kernel: [32968.266282]  [<ffffffff811251e3>] vfs_stat+0x1b/0x1d
Jun  8 01:35:59 adam kernel: [32968.266283]  [<ffffffff811252e2>] sys_newstat+0x1a/0x33
Jun  8 01:35:59 adam kernel: [32968.266285]  [<ffffffff81129f9d>] ? path_put+0x1f/0x23
Jun  8 01:35:59 adam kernel: [32968.266286]  [<ffffffff8109fa88>] ? audit_syscall_entry+0x145/0x171
Jun  8 01:35:59 adam kernel: [32968.266288]  [<ffffffff81009bc2>] system_call_fastpath+0x16/0x1b
Jun  8 01:35:59 adam kernel: [32968.266289] 
Jun  8 01:35:59 adam kernel: [32968.266290] Restarting tasks ... done.

-------------------------------------

Jun  8 12:20:36 adam kernel: [12770.446927] Freezing of tasks failed after 20.00 seconds (1 tasks refusing to freeze, wq_busy=0):
Jun  8 12:20:36 adam kernel: [12770.446952] umount          D 0000000000000000     0 32480  32463 0x00800084
Jun  8 12:20:36 adam kernel: [12770.446957]  ffff8804192a5b58 0000000000000082 ffff880400000000 ffff880438c90000
Jun  8 12:20:36 adam kernel: [12770.446963]  ffff8804192a5fd8 ffff8804192a5fd8 0000000000013840 0000000000013840
Jun  8 12:20:36 adam kernel: [12770.446968]  ffff88043f271730 ffff880438c90000 ffff8804192a5b58 000000018106f1e3
Jun  8 12:20:36 adam kernel: [12770.446973] Call Trace:
Jun  8 12:20:36 adam kernel: [12770.446990]  [<ffffffffa03c3d14>] wait_for_response+0xbc/0xbe [cifs]
Jun  8 12:20:36 adam kernel: [12770.447002]  [<ffffffff8106f26e>] ? autoremove_wake_function+0x0/0x3d
Jun  8 12:20:36 adam kernel: [12770.447006]  [<ffffffffa03c4af9>] SendReceive+0x183/0x321 [cifs]
Jun  8 12:20:36 adam kernel: [12770.447009]  [<ffffffffa03aed61>] CIFSSMBUnixQPathInfo+0x164/0x200 [cifs]
Jun  8 12:20:36 adam kernel: [12770.447012]  [<ffffffffa03bf683>] cifs_get_inode_info_unix+0x8e/0x165 [cifs]
Jun  8 12:20:36 adam kernel: [12770.447016]  [<ffffffffa03b886f>] ? build_path_from_dentry+0xe0/0x1e9 [cifs]
Jun  8 12:20:36 adam kernel: [12770.447017]  [<ffffffff811114a9>] ? __kmalloc+0xfb/0x10d
Jun  8 12:20:36 adam kernel: [12770.447020]  [<ffffffffa03b886f>] ? build_path_from_dentry+0xe0/0x1e9 [cifs]
Jun  8 12:20:36 adam kernel: [12770.447022]  [<ffffffffa03b8966>] ? build_path_from_dentry+0x1d7/0x1e9 [cifs]
Jun  8 12:20:36 adam kernel: [12770.447026]  [<ffffffffa03c09c1>] cifs_revalidate_dentry+0x10c/0x189 [cifs]
Jun  8 12:20:36 adam kernel: [12770.447028]  [<ffffffffa03c0a76>] cifs_getattr+0x38/0xc0 [cifs]
Jun  8 12:20:36 adam kernel: [12770.447030]  [<ffffffff81125127>] vfs_getattr+0x45/0x63
Jun  8 12:20:36 adam kernel: [12770.447032]  [<ffffffff81125192>] vfs_fstatat+0x4d/0x63
Jun  8 12:20:36 adam kernel: [12770.447033]  [<ffffffff811251e3>] vfs_stat+0x1b/0x1d
Jun  8 12:20:36 adam kernel: [12770.447034]  [<ffffffff811252e2>] sys_newstat+0x1a/0x33
Jun  8 12:20:36 adam kernel: [12770.447036]  [<ffffffff81129f9d>] ? path_put+0x1f/0x23
Jun  8 12:20:36 adam kernel: [12770.447037]  [<ffffffff8109fa88>] ? audit_syscall_entry+0x145/0x171
Jun  8 12:20:36 adam kernel: [12770.447039]  [<ffffffff81009bc2>] system_call_fastpath+0x16/0x1b
Jun  8 12:20:36 adam kernel: [12770.447040] 
Jun  8 19:20:36 adam rtkit-daemon[1334]: The canary thread is apparently starving. Taking action.
Jun  8 19:20:36 adam rtkit-daemon[1334]: Demoting known real-time threads.
Jun  8 12:20:36 adam kernel: [12770.447041] Restarting tasks ... done.

-------------------------------------

Comment 1 Adam Williamson 2011-06-09 13:35:03 UTC
Forgot to mention - the offending CIFS mount should be:

//192.168.1.13/Volume_1 /share/data             cifs    rsize=8192,wsize=8192,nosuid,soft,user=guest,noauto,comment=systemd.automount   0 0

note the use of systemd automounting.

Comment 2 Jeff Layton 2011-06-10 17:17:28 UTC
Patch sent upstream and should make its way into stable kernels soon. I think we also need to fix the "send" codepath as well, and make the sleep we do in wait_for_response freezable, but it's a little tricky as there is not yet a wait_event_killable variant that is also freezable.

Comment 3 Adam Williamson 2011-06-29 17:52:38 UTC
Just to note, I've noticed a very similar problem with NFS in current Rawhide:

https://bugzilla.redhat.com/show_bug.cgi?id=717735

Comment 4 Adam Williamson 2011-07-06 19:24:01 UTC
Still happening with 3.0-0.rc6.git0.1.1.fc16.x86_64 (that's just 3.0-0.rc6.git0.1.fc16.x86_64 with the patch to fix https://bugzilla.redhat.com/show_bug.cgi?id=717060 applied). Traceback is now:

Jul  6 12:21:27 adam kernel: [ 6352.278471] PM: Syncing filesystems ... done.
Jul  6 12:21:47 adam kernel: [ 6352.379145] Freezing user space processes ... 
Jul  6 12:21:47 adam kernel: [ 6372.363733] Freezing of tasks failed after 20.00 seconds (1 tasks refusing to freeze, wq_busy=0):
Jul  6 12:21:47 adam kernel: [ 6372.363839] umount          D ffff88044623a7a0  4224  3728   3711 0x00800084
Jul  6 12:21:47 adam kernel: [ 6372.363843]  ffff8803bec7db38 0000000000000046 0000000000000292 ffff880400000000
Jul  6 12:21:47 adam kernel: [ 6372.363848]  ffff88044623a3d0 ffff8803bec7dfd8 ffff8803bec7dfd8 00000000001d2d00
Jul  6 12:21:47 adam kernel: [ 6372.363852]  ffff88044631c7a0 ffff88044623a3d0 ffff8803bec7db38 0000000181074c76
Jul  6 12:21:47 adam kernel: [ 6372.363856] Call Trace:
Jul  6 12:21:47 adam kernel: [ 6372.363864]  [<ffffffffa04eb8e0>] wait_for_response+0xbc/0xbe [cifs]
Jul  6 12:21:47 adam kernel: [ 6372.363869]  [<ffffffff81074d4c>] ? __init_waitqueue_head+0x4b/0x4b
Jul  6 12:21:47 adam kernel: [ 6372.363875]  [<ffffffffa04ec6fe>] SendReceive+0x19b/0x2b5 [cifs]
Jul  6 12:21:47 adam kernel: [ 6372.363879]  [<ffffffffa04d67e7>] CIFSSMBUnixQPathInfo+0x167/0x212 [cifs]
Jul  6 12:21:47 adam kernel: [ 6372.363883]  [<ffffffff814f36e5>] ? _raw_spin_unlock+0x28/0x2c
Jul  6 12:21:47 adam kernel: [ 6372.363888]  [<ffffffffa04e76ab>] cifs_get_inode_info_unix+0x91/0x16b [cifs]
Jul  6 12:21:47 adam kernel: [ 6372.363893]  [<ffffffffa04e082b>] ? build_path_from_dentry+0xda/0x1c2 [cifs]
Jul  6 12:21:47 adam kernel: [ 6372.363897]  [<ffffffffa04e082b>] ? build_path_from_dentry+0xda/0x1c2 [cifs]
Jul  6 12:21:47 adam kernel: [ 6372.363901]  [<ffffffff8112623e>] ? __kmalloc+0xfa/0x10c
Jul  6 12:21:47 adam kernel: [ 6372.363905]  [<ffffffffa04e082b>] ? build_path_from_dentry+0xda/0x1c2 [cifs]
Jul  6 12:21:47 adam kernel: [ 6372.363909]  [<ffffffffa04e08de>] ? build_path_from_dentry+0x18d/0x1c2 [cifs]
Jul  6 12:21:47 adam kernel: [ 6372.363914]  [<ffffffffa04e8972>] cifs_revalidate_dentry_attr+0x111/0x178 [cifs]
Jul  6 12:21:47 adam kernel: [ 6372.363919]  [<ffffffffa04e8ab7>] cifs_getattr+0x83/0x10b [cifs]
Jul  6 12:21:47 adam kernel: [ 6372.363922]  [<ffffffff8113b7d9>] vfs_getattr+0x45/0x63
Jul  6 12:21:47 adam kernel: [ 6372.363925]  [<ffffffff8113b84f>] vfs_fstatat+0x58/0x6e
Jul  6 12:21:47 adam kernel: [ 6372.363928]  [<ffffffff8113b8a0>] vfs_stat+0x1b/0x1d
Jul  6 12:21:47 adam kernel: [ 6372.363930]  [<ffffffff8113b99f>] sys_newstat+0x1a/0x33
Jul  6 12:21:47 adam kernel: [ 6372.363933]  [<ffffffff81140b21>] ? path_put+0x1f/0x23
Jul  6 12:21:47 adam kernel: [ 6372.363937]  [<ffffffff810ac3c9>] ? audit_syscall_entry+0x11c/0x148
Jul  6 12:21:47 adam kernel: [ 6372.363940]  [<ffffffff8125cbbe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
Jul  6 12:21:47 adam kernel: [ 6372.363944]  [<ffffffff814fa242>] system_call_fastpath+0x16/0x1b
Jul  6 12:21:47 adam kernel: [ 6372.363947] 
Jul  6 12:21:47 adam kernel: [ 6372.363948] Restarting tasks ... done.

Comment 5 Jeff Layton 2011-07-20 11:20:01 UTC
Yes. The problem is that wait_event_killable doesn't handle the freezer. I think we also need to add a way to handle task freezing when in TASK_KILLABLE sleep. This is doable, but means adding a new wait_event_killable_freezable() sleep (or something along those lines).

Comment 6 Jeff Layton 2011-07-20 14:31:45 UTC
Created attachment 514019 [details]
patch: add wait_event_freezekillable and have cifs use it

Ok, here's a potential patch. I'm not 100% convinced that it doesn't have a race if a freeze event comes in, but it's probably good enough for an initial test. Adam, can you test this out and let me know if it helps?

The potential race I'm thinking about is here in __wait_event_killable:

                if (condition)                                          \
                        break;                                          \
                if (!fatal_signal_pending(current)) {                   \
                        schedule();                                     \
                        continue;                                       \
                }                                                       \

...I'm not sure if we'd continue to hang if we got a freeze event after checking the condition but before the schedule(). I'll probably need to run this by Rafal or someone else upstream to be sure.

Still, it's an unlikely race, so testing this patch would be helpful.

Comment 7 Adam Williamson 2011-07-21 15:47:20 UTC
roger, will test - thanks jeff.

Comment 8 Adam Williamson 2011-08-03 20:14:08 UTC
Still fails with the patch. Running 3.0.0-3.1.fc16.x86_64 - that's 3.0.0-3.fc16.x86_64 with the patch applied - I get this trace:

Aug  3 03:54:16 adam kernel: [151854.705496] PM: Syncing filesystems ... done.
Aug  3 03:54:36 adam kernel: [151854.932420] Freezing user space processes ... 
Aug  3 03:54:36 adam kernel: [151874.917385] Freezing of tasks failed after 20.00 seconds (1 tasks refusing to freeze, wq_busy=0):
Aug  3 03:54:36 adam kernel: [151874.917488] umount          D ffff88040b5c4b90  4944 30573  30556 0x00800084
Aug  3 03:54:36 adam kernel: [151874.917493]  ffff8803c57e7b38 0000000000000046 0000000000000292 ffff880400000000
Aug  3 03:54:36 adam kernel: [151874.917498]  ffff88040b5c47c0 ffff8803c57e7fd8 ffff8803c57e7fd8 00000000001d3080
Aug  3 03:54:36 adam kernel: [151874.917502]  ffff880446308000 ffff88040b5c47c0 ffff8803c57e7b38 00000001810764ae
Aug  3 03:54:36 adam kernel: [151874.917506] Call Trace:
Aug  3 03:54:36 adam kernel: [151874.917515]  [<ffffffffa04d77c0>] wait_for_response+0xbc/0xbe [cifs]
Aug  3 03:54:36 adam kernel: [151874.917520]  [<ffffffff81076584>] ? __init_waitqueue_head+0x4b/0x4b
Aug  3 03:54:36 adam kernel: [151874.917525]  [<ffffffffa04d85de>] SendReceive+0x19b/0x2b5 [cifs]
Aug  3 03:54:36 adam kernel: [151874.917530]  [<ffffffffa04c25e7>] CIFSSMBUnixQPathInfo+0x167/0x212 [cifs]
Aug  3 03:54:36 adam kernel: [151874.917535]  [<ffffffffa04d358b>] cifs_get_inode_info_unix+0x91/0x16b [cifs]
Aug  3 03:54:36 adam kernel: [151874.917539]  [<ffffffff8108abfe>] ? lock_acquire+0xdb/0x103
Aug  3 03:54:36 adam kernel: [151874.917543]  [<ffffffffa04cc4ec>] ? cifs_ci_compare+0x70/0x70 [cifs]
Aug  3 03:54:36 adam kernel: [151874.917546]  [<ffffffff8108aafa>] ? lock_release+0x173/0x19c
Aug  3 03:54:36 adam kernel: [151874.917550]  [<ffffffffa04cc542>] ? rcu_read_unlock+0x21/0x23 [cifs]
Aug  3 03:54:36 adam kernel: [151874.917555]  [<ffffffffa04cc78f>] ? build_path_from_dentry+0x1e6/0x21b [cifs]
Aug  3 03:54:36 adam kernel: [151874.917560]  [<ffffffffa04d4852>] cifs_revalidate_dentry_attr+0x111/0x178 [cifs]
Aug  3 03:54:36 adam kernel: [151874.917565]  [<ffffffffa04d4997>] cifs_getattr+0x83/0x10b [cifs]
Aug  3 03:54:36 adam kernel: [151874.917569]  [<ffffffff8113eb65>] vfs_getattr+0x45/0x63
Aug  3 03:54:36 adam kernel: [151874.917572]  [<ffffffff8113ebdb>] vfs_fstatat+0x58/0x6e
Aug  3 03:54:36 adam kernel: [151874.917575]  [<ffffffff8113ec2c>] vfs_stat+0x1b/0x1d
Aug  3 03:54:36 adam kernel: [151874.917577]  [<ffffffff8113ed2b>] sys_newstat+0x1a/0x33
Aug  3 03:54:36 adam kernel: [151874.917580]  [<ffffffff81143ecd>] ? path_put+0x1f/0x23
Aug  3 03:54:36 adam kernel: [151874.917584]  [<ffffffff810aff85>] ? audit_syscall_entry+0x11c/0x148
Aug  3 03:54:36 adam kernel: [151874.917587]  [<ffffffff812452be>] ? trace_hardirqs_on_thunk+0x3a/0x3f
Aug  3 03:54:36 adam kernel: [151874.917591]  [<ffffffff814e4fc2>] system_call_fastpath+0x16/0x1b
Aug  3 03:54:36 adam kernel: [151874.917594] 
Aug  3 03:54:36 adam kernel: [151874.917595] Restarting tasks ... done.

Comment 9 Adam Williamson 2011-09-13 16:08:30 UTC
jeff: any news here? this has been making it impossible for me to suspend for weeks :/

Comment 10 Jeff Layton 2011-09-14 16:54:11 UTC
Sorry, I got sidetracked on some other stuff, but I took another look at this today.
I added some debug code in the __wait_event_killable macro, but I think there's
a crucial hint here in the stack trace you posted as well. Here's the one from my
machine:

[18662.635383] sky2 0000:09:00.0: eth0: disabling interface
[18662.790079] 9718: __wait_event_killable wakeup, freezing=0
[18664.024830] PM: Syncing filesystems ... done.
[18664.041172] PM: Preparing system for mem sleep
[18664.192465] Freezing user space processes ... 
[18684.202257] Freezing of tasks failed after 20.00 seconds (1 tasks refusing to freeze, wq_busy=0):
[18684.202372] umount          D ffff88010fe94700     0  9718   9701 0x00800084
[18684.202376]  ffff8801169c3b08 0000000000000082 ffff8801169c3b40 ffff88003796e140
[18684.202380]  ffff880117dd8000 ffff8801169c3fd8 ffff8801169c3fd8 0000000000012540
[18684.202383]  ffff88010af92e60 ffff880117dd8000 ffff8801169c3b18 ffff8801169c3ad8
[18684.202387] Call Trace:

...so what happened here is that process 9718 got woken up just after eth0 went
down, but not by the freezer. I suspect it got woken up by cifsd or something
(though I'd need to do some more debugging to discern that).

The crucial thing though here is the actual process name -- "umount". It looks
like what's happening here is that the suspend code is trying to unmount the cifs
mount *after* taking down the network interfaces.

I think that's a big part of the problem here. If the userspace suspend code
wants to unmount the filesystem, it ought to wait to take down the interfaces
until after that. If not, then it probably ought not try to unmount the fs at
all.

I suspect that if that's fixed, then this problem will go away. I'm just a little
out of the loop on how suspending in userspace works. Why does it try to unmount
the filesystems here in the first place?

Comment 11 Jeff Layton 2011-09-14 16:59:31 UTC
I guess, to summarize my point -- why attempt to unmount the fs at all if you're going to rip out the network interfaces before that can complete? At that point you might as well just leave them mounted.

In the meantime I'll keep looking at what's happening with the hung process, but something seems wrong in this design...

Comment 12 Jeff Layton 2011-09-14 17:16:42 UTC
Created attachment 523204 [details]
ring buffer log with cifsFYI turned up and extra waitqueue debugging

Here we can see what's happening (at least on my laptop). Something downs the network interface, and nearly simultaneously attempts to unmount the cifs fs. That unmount can't complete though (since the interface is down).

Eventually after waking back up, it does complete, but by then the suspend has given up.

Note that about half the time, suspending in this situation works on this host. When that occurs though, it looks like the unmount finishes before the network interface goes down.

The thing I'm trying to determine now is -- why is the umount process not being woken up after this printk pops?

[18664.192465] Freezing user space processes ...

Comment 13 Adam Williamson 2011-09-14 17:28:47 UTC
so, the user-space side of things is handled in pm-suspend, which is part of pm-utils. pm-suspend itself is a script, and it appears to source scripts in /usr/lib64/pm-utils/sleep.d/ . One of those is /usr/lib64/pm-utils/sleep.d/55NetworkManager , which does the 'take down the network interface' bit. I am having trouble finding out what's doing the 'umount filesystems' bit, though. still looking.

Comment 14 Adam Williamson 2011-09-14 17:32:41 UTC
it looks like setting PM_DEBUG = true might get more debugging output from pm_utils, in case that might help.

Comment 15 Adam Williamson 2011-09-14 17:35:53 UTC
could it be systemd's automount stuff which is trying to take down the mount?

Comment 16 Adam Williamson 2011-09-14 17:43:12 UTC
let's throw some CCs at the problem!

lennart, bill - any idea what in userspace would be trying to umount a CIFS share (configured as systemd.automount) at the same time as the network is being taken down, on suspend?

Comment 17 Bill Nottingham 2011-09-14 17:48:03 UTC
netfs init script is called from a NetworkManager dispatcher script.

See /etc/NetworkManager/dispatcher.d/05-netfs.

Comment 18 Adam Williamson 2011-09-14 18:05:26 UTC
okay. I notice that /etc/rc.d/init.d/netfs excepts mount points specified as 'noauto' from its 'start' action but for 'stop' it uses any point listed in /etc/mtab :

CIFSMTAB=$(LC_ALL=C awk '$3 == "cifs" { print $2 }' /proc/mounts | tac)

and that awk command does indeed return my /share/data partition. So it looks likely that indeed is what's making it get umounted.

so, presumably it should be up to NM to make sure it doesn't take the interface down until the umount is actually complete? let's add dcbw to the CC party.

Comment 19 Jeff Layton 2011-09-14 18:15:08 UTC
The question I have is -- why unmount these in the first place? Sure, the netif might go away but you might have the fs pinned in place. For instance, a user process could be sitting with its cwd inside a cifs or nfs mount. A umount will then fail...

It seems to me that we'd be better off not trying to unmount those on suspend at all. It's fine when the mount is idle, but when it isn't it's not likely to work.

That does mean that we'll need to ensure that processes that are hung waiting on I/O can still be frozen though. I think that's something we can achieve, but I'm still working through the details.

Comment 20 Jeff Layton 2011-09-14 20:08:42 UTC
Created attachment 523251 [details]
patch -- add wait_event_freezekillable and have cifs use it

A more highly experimental patch than the last one. This one also modifies fake_signal_wake_up to wake up TASK_KILLABLE tasks. I don't *think* this would break anything but I'm far from certain.

This seems to fix the problem for me, but it's quite possible that it'll get shot down upstream. Still, results testing with this would be helpful...

Comment 21 Jeff Layton 2011-09-15 14:39:14 UTC
To clarify...even if you fix the suspend userspace code not to try and
unmount we still will likely need to fix this, regardless. It's possible
that when the network interfaces are taken down for the suspend that
there's another process on the box that's doing something on the cifs
mount. If there is, then we need to allow it to freeze while it's waiting
on the reply. When it wakes back up, the cifs code should deal with the
reconnect...

Comment 22 Adam Williamson 2011-09-15 17:40:29 UTC
Note that someone mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=717735 - the highly similar bug for NFS - that it was working for them in rc6, and indeed I just tried a suspend last night with rc6, and it worked. Has there been a change recently that might affect this?

I've only had success once, so it'd be premature to close the bug, but it's interesting. Probably worth noting I had a process actively using the mount when I suspended, which is not usually the case.

Comment 23 Adam Williamson 2011-10-25 00:46:32 UTC
Been running kernels with this patch on my laptop and desktop for a week or so, noticed no problems and have been able to suspend both systems first time, every time. Looks good to me!

Comment 24 Adam Williamson 2011-11-05 16:30:58 UTC

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

*** This bug has been marked as a duplicate of bug 717735 ***