See also https://bugzilla.redhat.com/show_bug.cgi?id=712088, which is a very similar bug with CIFS. I currently have a share on my NAS mounted via NFS:

nas:/mnt/HD_a2/ on /share/data type nfs (rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.1.13,mountvers=3,mountport=2049,mountproto=udp,local_lock=none,addr=192.168.1.13)

Trying to suspend this system last night failed, twice, with this trace:

Jun 29 00:31:36 adam kernel: [14989.522021] Freezing user space processes ...
Jun 29 00:31:36 adam kernel: [15009.506600] Freezing of tasks failed after 20.00 seconds (1 tasks refusing to freeze, wq_busy=0):
Jun 29 00:31:36 adam kernel: [15009.506695] umount D ffff88041a0a03d0 4992 24634 24613 0x00800084
Jun 29 00:31:36 adam kernel: [15009.506700] ffff8803830e5b78 0000000000000046 ffff8803830e5c28 0000000000000296
Jun 29 00:31:36 adam kernel: [15009.506704] ffff88041a0a0000 ffff8803830e5fd8 ffff8803830e5fd8 00000000001d2d00
Jun 29 00:31:36 adam kernel: [15009.506708] ffff880440788000 ffff88041a0a0000 ffff88045f6e3ea8 0000000000000082
Jun 29 00:31:36 adam kernel: [15009.506712] Call Trace:
Jun 29 00:31:36 adam kernel: [15009.506724] [<ffffffffa03e5d46>] ? rpc_queue_empty+0x31/0x31 [sunrpc]
Jun 29 00:31:36 adam kernel: [15009.506730] [<ffffffffa03e5d7a>] rpc_wait_bit_killable+0x34/0x38 [sunrpc]
Jun 29 00:31:36 adam kernel: [15009.506734] [<ffffffff814f1873>] __wait_on_bit+0x48/0x7b
Jun 29 00:31:36 adam kernel: [15009.506737] [<ffffffff814f1918>] out_of_line_wait_on_bit+0x72/0x7d
Jun 29 00:31:36 adam kernel: [15009.506743] [<ffffffffa03e6a30>] ? __rpc_execute+0xb2/0x257 [sunrpc]
Jun 29 00:31:36 adam kernel: [15009.506748] [<ffffffffa03e5d46>] ? rpc_queue_empty+0x31/0x31 [sunrpc]
Jun 29 00:31:36 adam kernel: [15009.506752] [<ffffffff81074d89>] ? autoremove_wake_function+0x3d/0x3d
Jun 29 00:31:36 adam kernel: [15009.506758] [<ffffffffa03e6a70>] __rpc_execute+0xf2/0x257 [sunrpc]
Jun 29 00:31:36 adam kernel: [15009.506761] [<ffffffff81074ab4>] ? wake_up_bit+0x25/0x2a
Jun 29 00:31:36 adam kernel: [15009.506766] [<ffffffffa03e6c42>] rpc_execute+0x3f/0x43 [sunrpc]
Jun 29 00:31:36 adam kernel: [15009.506770] [<ffffffffa03dfef7>] rpc_run_task+0x86/0x8e [sunrpc]
Jun 29 00:31:36 adam kernel: [15009.506775] [<ffffffffa03dffec>] rpc_call_sync+0x45/0x66 [sunrpc]
Jun 29 00:31:36 adam kernel: [15009.506785] [<ffffffffa045c982>] nfs3_rpc_wrapper.constprop.7+0x2c/0x64 [nfs]
Jun 29 00:31:36 adam kernel: [15009.506793] [<ffffffffa045da54>] nfs3_proc_getattr+0x5d/0x83 [nfs]
Jun 29 00:31:36 adam kernel: [15009.506799] [<ffffffffa044f24b>] __nfs_revalidate_inode+0xb4/0x1a2 [nfs]
Jun 29 00:31:36 adam kernel: [15009.506805] [<ffffffffa044f492>] nfs_revalidate_inode+0x4a/0x51 [nfs]
Jun 29 00:31:36 adam kernel: [15009.506811] [<ffffffffa044f571>] nfs_getattr+0x92/0xc3 [nfs]
Jun 29 00:31:36 adam kernel: [15009.506815] [<ffffffff8113b7d9>] vfs_getattr+0x45/0x63
Jun 29 00:31:36 adam kernel: [15009.506817] [<ffffffff8113b84f>] vfs_fstatat+0x58/0x6e
Jun 29 00:31:36 adam kernel: [15009.506820] [<ffffffff8113b8a0>] vfs_stat+0x1b/0x1d
Jun 29 00:31:36 adam kernel: [15009.506823] [<ffffffff8113b99f>] sys_newstat+0x1a/0x33
Jun 29 00:31:36 adam kernel: [15009.506825] [<ffffffff81140b21>] ? path_put+0x1f/0x23
Jun 29 00:31:36 adam kernel: [15009.506829] [<ffffffff810ac3c9>] ? audit_syscall_entry+0x11c/0x148
Jun 29 00:31:36 adam kernel: [15009.506833] [<ffffffff8125cbae>] ? trace_hardirqs_on_thunk+0x3a/0x3f
Jun 29 00:31:36 adam kernel: [15009.506836] [<ffffffff814fa182>] system_call_fastpath+0x16/0x1b
Jun 29 00:31:36 adam kernel: [15009.506839]
Jun 29 00:31:36 adam kernel: [15009.506840] Restarting tasks ... done.

Suspend bugs are a bit more important now we're in the Glorious Era Of GNOME 3...
Same problem here. I also tried automounting the NAS; suspend still failed, but the failure caused the mount to be unmounted, which allowed a second suspend attempt to succeed.
OOPS! Just noticed this was for rawhide - I'm on F15.
Probably still the same bug.
Same here on F15:

Aug 1 08:58:55 jan kernel: [ 1525.755153] Freezing of tasks failed after 20.00 seconds (1 tasks refusing to freeze, wq_busy=0):
Aug 1 08:58:55 jan kernel: [ 1525.755275] umount D ffff8800bf683b50 0 2176 2156 0x00800084
Aug 1 08:58:55 jan kernel: [ 1525.755283] ffff880131035b68 0000000000000082 0000000000000246 ffff88010f454590
Aug 1 08:58:55 jan kernel: [ 1525.755290] ffff880131035fd8 ffff880131035fd8 0000000000013840 0000000000013840
Aug 1 08:58:55 jan kernel: [ 1525.755297] ffff88012fbcc590 ffff88010f454590 ffff880131035b38 ffffffff81478454
Aug 1 08:58:55 jan kernel: [ 1525.755304] Call Trace:
Aug 1 08:58:55 jan kernel: [ 1525.755317] [<ffffffff81478454>] ? _raw_spin_unlock_irqrestore+0x17/0x19
Aug 1 08:58:55 jan kernel: [ 1525.755339] [<ffffffffa031107b>] ? rpc_wait_bit_killable+0x0/0x38 [sunrpc]
Aug 1 08:58:55 jan kernel: [ 1525.755355] [<ffffffffa03110af>] rpc_wait_bit_killable+0x34/0x38 [sunrpc]
Aug 1 08:58:55 jan kernel: [ 1525.755361] [<ffffffff814773cd>] __wait_on_bit+0x48/0x7b
Aug 1 08:58:55 jan kernel: [ 1525.755367] [<ffffffff8106acc7>] ? queue_work_on+0x37/0x45
Aug 1 08:58:55 jan kernel: [ 1525.755373] [<ffffffff81477472>] out_of_line_wait_on_bit+0x72/0x7d
Aug 1 08:58:55 jan kernel: [ 1525.755388] [<ffffffffa031107b>] ? rpc_wait_bit_killable+0x0/0x38 [sunrpc]
Aug 1 08:58:55 jan kernel: [ 1525.755393] [<ffffffff8106f2ab>] ? wake_bit_function+0x0/0x31
Aug 1 08:58:55 jan kernel: [ 1525.755409] [<ffffffffa0311c80>] __rpc_execute+0xf2/0x295 [sunrpc]
Aug 1 08:58:55 jan kernel: [ 1525.755414] [<ffffffff8106f021>] ? wake_up_bit+0x25/0x2a
Aug 1 08:58:55 jan kernel: [ 1525.755429] [<ffffffffa0311e90>] rpc_execute+0x3f/0x43 [sunrpc]
Aug 1 08:58:55 jan kernel: [ 1525.755442] [<ffffffffa030bdde>] rpc_run_task+0xeb/0xf7 [sunrpc]
Aug 1 08:58:55 jan kernel: [ 1525.755454] [<ffffffffa030bed7>] rpc_call_sync+0x45/0x66 [sunrpc]
Aug 1 08:58:55 jan kernel: [ 1525.755479] [<ffffffffa0483b0a>] nfs3_rpc_wrapper.constprop.7+0x2c/0x64 [nfs]
Aug 1 08:58:55 jan kernel: [ 1525.755501] [<ffffffffa0484bdf>] nfs3_proc_getattr+0x5d/0x83 [nfs]
Aug 1 08:58:55 jan kernel: [ 1525.755518] [<ffffffffa0476c14>] __nfs_revalidate_inode+0xb4/0x1a2 [nfs]
Aug 1 08:58:55 jan kernel: [ 1525.755534] [<ffffffffa0476e52>] nfs_revalidate_inode+0x4a/0x51 [nfs]
Aug 1 08:58:55 jan kernel: [ 1525.755550] [<ffffffffa0476f31>] nfs_getattr+0x92/0xc4 [nfs]
Aug 1 08:58:55 jan kernel: [ 1525.755557] [<ffffffff81124fb7>] vfs_getattr+0x45/0x63
Aug 1 08:58:55 jan kernel: [ 1525.755562] [<ffffffff8104708d>] ? pick_next_task_fair+0xae/0xc1
Aug 1 08:58:55 jan kernel: [ 1525.755567] [<ffffffff81125022>] vfs_fstatat+0x4d/0x63
Aug 1 08:58:55 jan kernel: [ 1525.755572] [<ffffffff81047bda>] ? pick_next_task+0x2a/0x4e
Aug 1 08:58:55 jan kernel: [ 1525.755576] [<ffffffff81125073>] vfs_stat+0x1b/0x1d
Aug 1 08:58:55 jan kernel: [ 1525.755581] [<ffffffff81125172>] sys_newstat+0x1a/0x33
Aug 1 08:58:55 jan kernel: [ 1525.755586] [<ffffffff81129e2d>] ? path_put+0x1f/0x23
Aug 1 08:58:55 jan kernel: [ 1525.755592] [<ffffffff8109fa68>] ? audit_syscall_entry+0x145/0x171
Aug 1 08:58:55 jan kernel: [ 1525.755598] [<ffffffff81009bc2>] system_call_fastpath+0x16/0x1b
Aug 1 08:58:55 jan kernel: [ 1525.755602]
Aug 1 08:58:55 jan kernel: [ 1525.755604] Restarting tasks ... done.
Aug 1 08:58:55 jan kernel: [ 1525.859058] video LNXVIDEO:00: Restoring backlight state
Also confirmed for F16.
Seems to be fixed in 3.1.0-0.rc6.git0.0.fc16.x86_64.
Changing this to F15 so we can get it fixed there.
This problem is almost certainly the NFS equivalent of bug 712088. What I've found with the CIFS equivalent of this bug is that there is some raciness involved: if the fs ends up getting cleanly unmounted before the network interfaces go down, then suspending will work; if it does not, then it'll generally fail.

My suspicion is that the timing may have changed here and the umount is now finishing before the network interfaces go down. I see nothing right offhand that would fix this in any recent kernel, so how sure are you that it's fixed? What might be interesting is to run some network-heavy activity on an NFS mount and try repeatedly testing suspends.

Assuming that it's not really fixed, I suspect that we might be able to fix this with an approach similar to the patch in bug 717735. If so, this will require some changes similar to those for that problem, so I'll go ahead and grab this one too.
Created attachment 523426 [details]
patch -- allow cifs and nfs TASK_KILLABLE sleeps to freeze

Here's a mostly untested set of patches that I think will probably fix this. The fix is twofold:

First, we have to allow the freezer to wake processes that are in TASK_KILLABLE sleep. This is probably the most controversial part of the patchset, but I think it'll probably be harmless. We'll see what the linux-pm folks think, though.

Next, we have to teach the wait_bit_killable variants in the NFS and RPC code to try_to_freeze when they are woken up without a fatal signal. I also threw in the cifs patch for this problem for good measure.

If you can test this set and let me know whether it really fixes the issue, then I'll see about getting these in for 3.2...
Created attachment 524636 [details]
patch -- allow cifs and nfs TASK_KILLABLE sleeps to freeze

Revised patch. This also fixes some cases where the NFS layer will sleep during locking and NFSERR_JUKEBOX sorts of events. It also properly includes freezer.h.
Sorry I haven't tested this yet, Jeff; Beta has just been too busy for me to spend time on anything not-Beta, really :( I'll get to it ASAP. Suspend is still working for me most of the time with non-debug kernels, so those definitely change the timing a bit. I'll test the patch with both a debug-enabled and a debug-disabled kernel.
I have a similar problem, but the hung task is fuser, not umount:

Oct 16 23:51:30 farina kernel: [21003.895160] PM: Syncing filesystems ... done.
Oct 16 23:51:30 farina kernel: [21004.042552] Freezing user space processes ...
Oct 16 23:51:30 farina kernel: [21024.051046] Freezing of tasks failed after 20.00 seconds (1 tasks refusing to freeze, wq_busy=0):
Oct 16 23:51:30 farina kernel: [21024.051174] fuser D 0000000000000000 0 7082 6840 0x00800084
Oct 16 23:51:30 farina kernel: [21024.051178] ffff880098b25ae8 0000000000000082 ffffffff8148858a ffff880100000000
Oct 16 23:51:30 farina kernel: [21024.051181] ffff8800b1ae4590 ffff880098b25fd8 ffff880098b25fd8 0000000000012540
Oct 16 23:51:30 farina kernel: [21024.051184] ffffffff81a0b020 ffff8800b1ae4590 ffff88012ffb3b80 0000000100000246
Oct 16 23:51:30 farina kernel: [21024.051187] Call Trace:
Oct 16 23:51:30 farina kernel: [21024.051194] [<ffffffff8148858a>] ? _raw_spin_lock_irqsave+0x12/0x2f
Oct 16 23:51:30 farina kernel: [21024.051208] [<ffffffffa0db7847>] ? rpc_queue_empty+0x2e/0x2e [sunrpc]
Oct 16 23:51:30 farina kernel: [21024.051212] [<ffffffff8104fbd6>] schedule+0x5a/0x5c
Oct 16 23:51:30 farina kernel: [21024.051220] [<ffffffffa0db787b>] rpc_wait_bit_killable+0x34/0x38 [sunrpc]
Oct 16 23:51:30 farina kernel: [21024.051223] [<ffffffff814875a4>] __wait_on_bit+0x48/0x7b
Oct 16 23:51:30 farina kernel: [21024.051226] [<ffffffff8105a76b>] ? _local_bh_enable_ip+0x25/0x8e
Oct 16 23:51:30 farina kernel: [21024.051229] [<ffffffff81487649>] out_of_line_wait_on_bit+0x72/0x7d
Oct 16 23:51:30 farina kernel: [21024.051237] [<ffffffffa0db7847>] ? rpc_queue_empty+0x2e/0x2e [sunrpc]
Oct 16 23:51:30 farina kernel: [21024.051239] [<ffffffff81070687>] ? autoremove_wake_function+0x3d/0x3d
Oct 16 23:51:30 farina kernel: [21024.051247] [<ffffffffa0db841c>] __rpc_execute+0xf0/0x293 [sunrpc]
Oct 16 23:51:30 farina kernel: [21024.051255] [<ffffffffa0db862c>] rpc_execute+0x3f/0x43 [sunrpc]
Oct 16 23:51:30 farina kernel: [21024.051261] [<ffffffffa0db1e0f>] rpc_run_task+0x86/0x8e [sunrpc]
Oct 16 23:51:30 farina kernel: [21024.051267] [<ffffffffa0db1f04>] rpc_call_sync+0x45/0x66 [sunrpc]
Oct 16 23:51:30 farina kernel: [21024.051284] [<ffffffffa0e104ba>] _nfs4_call_sync+0x21/0x23 [nfs]
Oct 16 23:51:30 farina kernel: [21024.051296] [<ffffffffa0e0dd75>] nfs4_call_sync+0x16/0x18 [nfs]
Oct 16 23:51:30 farina kernel: [21024.051308] [<ffffffffa0e0ebe3>] _nfs4_proc_getattr+0x97/0xa5 [nfs]
Oct 16 23:51:30 farina kernel: [21024.051321] [<ffffffffa0e11ddb>] nfs4_proc_getattr+0x36/0x55 [nfs]
Oct 16 23:51:30 farina kernel: [21024.051330] [<ffffffffa0dfcffd>] __nfs_revalidate_inode+0xb4/0x1a2 [nfs]
Oct 16 23:51:30 farina kernel: [21024.051338] [<ffffffffa0dfd309>] nfs_getattr+0x81/0xc4 [nfs]
Oct 16 23:51:30 farina kernel: [21024.051342] [<ffffffff8112abb3>] vfs_getattr+0x45/0x63
Oct 16 23:51:30 farina kernel: [21024.051344] [<ffffffff8112ac29>] vfs_fstatat+0x58/0x6e
Oct 16 23:51:30 farina kernel: [21024.051346] [<ffffffff8112ac7a>] vfs_stat+0x1b/0x1d
Oct 16 23:51:30 farina kernel: [21024.051349] [<ffffffff8112ad79>] sys_newstat+0x1a/0x33
Oct 16 23:51:30 farina kernel: [21024.051351] [<ffffffff8112fba4>] ? path_put+0x20/0x24
Oct 16 23:51:30 farina kernel: [21024.051354] [<ffffffff810a0f88>] ? audit_syscall_entry+0x145/0x171
Oct 16 23:51:30 farina kernel: [21024.051358] [<ffffffff8148ed02>] system_call_fastpath+0x16/0x1b
Oct 16 23:51:30 farina kernel: [21024.051360]
Oct 16 23:51:30 farina kernel: [21024.051361] Restarting tasks ... done.

fuser was monitoring an NFS mount point I have:

192.168.2.17:/efserv/ /efserv nfs4 rw,relatime,vers=4,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.2.11,minorversion=0,local_lock=none,addr=192.168.2.17 0 0

[root@farina ~]# uname -a
Linux farina.dj.edm 2.6.40.6-0.fc15.x86_64 #1 SMP Tue Oct 4 00:39:50 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

Though here I tried to reproduce and it's umount now:

Oct 17 00:07:21 farina kernel: [21944.859957] PM: Syncing filesystems ... done.
Oct 17 00:07:21 farina kernel: [21944.866209] Freezing user space processes ...
Oct 17 00:07:21 farina kernel: [21964.875048] Freezing of tasks failed after 20.00 seconds (1 tasks refusing to freeze, wq_busy=0):
Oct 17 00:07:21 farina kernel: [21964.875182] umount D 0000000000000000 0 8070 8050 0x00800084
Oct 17 00:07:21 farina kernel: [21964.875186] ffff880125ea1ae8 0000000000000086 ffffffff8148858a ffff880100000000
Oct 17 00:07:21 farina kernel: [21964.875190] ffff88010615c590 ffff880125ea1fd8 ffff880125ea1fd8 0000000000012540
Oct 17 00:07:21 farina kernel: [21964.875193] ffffffff81a0b020 ffff88010615c590 ffff88012ffb3b80 0000000100000246
Oct 17 00:07:21 farina kernel: [21964.875195] Call Trace:
Oct 17 00:07:21 farina kernel: [21964.875202] [<ffffffff8148858a>] ? _raw_spin_lock_irqsave+0x12/0x2f
Oct 17 00:07:21 farina kernel: [21964.875217] [<ffffffffa0db7847>] ? rpc_queue_empty+0x2e/0x2e [sunrpc]
Oct 17 00:07:21 farina kernel: [21964.875221] [<ffffffff8104fbd6>] schedule+0x5a/0x5c
Oct 17 00:07:21 farina kernel: [21964.875229] [<ffffffffa0db787b>] rpc_wait_bit_killable+0x34/0x38 [sunrpc]
Oct 17 00:07:21 farina kernel: [21964.875232] [<ffffffff814875a4>] __wait_on_bit+0x48/0x7b
Oct 17 00:07:21 farina kernel: [21964.875235] [<ffffffff8105a76b>] ? _local_bh_enable_ip+0x25/0x8e
Oct 17 00:07:21 farina kernel: [21964.875238] [<ffffffff81487649>] out_of_line_wait_on_bit+0x72/0x7d
Oct 17 00:07:21 farina kernel: [21964.875245] [<ffffffffa0db7847>] ? rpc_queue_empty+0x2e/0x2e [sunrpc]
Oct 17 00:07:21 farina kernel: [21964.875248] [<ffffffff81070687>] ? autoremove_wake_function+0x3d/0x3d
Oct 17 00:07:21 farina kernel: [21964.875256] [<ffffffffa0db841c>] __rpc_execute+0xf0/0x293 [sunrpc]
Oct 17 00:07:21 farina kernel: [21964.875264] [<ffffffffa0db862c>] rpc_execute+0x3f/0x43 [sunrpc]
Oct 17 00:07:21 farina kernel: [21964.875270] [<ffffffffa0db1e0f>] rpc_run_task+0x86/0x8e [sunrpc]
Oct 17 00:07:21 farina kernel: [21964.875276] [<ffffffffa0db1f04>] rpc_call_sync+0x45/0x66 [sunrpc]
Oct 17 00:07:21 farina kernel: [21964.875293] [<ffffffffa0e104ba>] _nfs4_call_sync+0x21/0x23 [nfs]
Oct 17 00:07:21 farina kernel: [21964.875305] [<ffffffffa0e0dd75>] nfs4_call_sync+0x16/0x18 [nfs]
Oct 17 00:07:21 farina kernel: [21964.875317] [<ffffffffa0e0ebe3>] _nfs4_proc_getattr+0x97/0xa5 [nfs]
Oct 17 00:07:21 farina kernel: [21964.875330] [<ffffffffa0e11ddb>] nfs4_proc_getattr+0x36/0x55 [nfs]
Oct 17 00:07:21 farina kernel: [21964.875338] [<ffffffffa0dfcffd>] __nfs_revalidate_inode+0xb4/0x1a2 [nfs]
Oct 17 00:07:21 farina kernel: [21964.875347] [<ffffffffa0dfd309>] nfs_getattr+0x81/0xc4 [nfs]
Oct 17 00:07:21 farina kernel: [21964.875350] [<ffffffff8112abb3>] vfs_getattr+0x45/0x63
Oct 17 00:07:21 farina kernel: [21964.875353] [<ffffffff8112ac29>] vfs_fstatat+0x58/0x6e
Oct 17 00:07:21 farina kernel: [21964.875355] [<ffffffff8112ac7a>] vfs_stat+0x1b/0x1d
Oct 17 00:07:21 farina kernel: [21964.875357] [<ffffffff8112ad79>] sys_newstat+0x1a/0x33
Oct 17 00:07:21 farina kernel: [21964.875360] [<ffffffff8112fba4>] ? path_put+0x20/0x24
Oct 17 00:07:21 farina kernel: [21964.875363] [<ffffffff810a0f88>] ? audit_syscall_entry+0x145/0x171
Oct 17 00:07:21 farina kernel: [21964.875365] [<ffffffff8113048d>] ? putname+0x34/0x36
Oct 17 00:07:21 farina kernel: [21964.875368] [<ffffffff8148ed02>] system_call_fastpath+0x16/0x1b
Oct 17 00:07:21 farina kernel: [21964.875371]
Oct 17 00:07:21 farina kernel: [21964.875372] Restarting tasks ... done.
Looks like the same problem. As I mentioned before, there is some raciness involved here depending on when the network interfaces come down. At this point it looks like the patchset should make 3.2 upstream...
Been running kernels with this patch on my laptop and desktop for a week or so, noticed no problems and have been able to suspend both systems first time, every time. Looks good to me!
*** Bug 710539 has been marked as a duplicate of this bug. ***
Tejun Heo has proposed that a key piece of this patchset be reverted. Since he knows a lot more about task state handling, I'm inclined to believe him that it'll be problematic. At this point we're still discussing what the correct fix is, but for now it's looking like this will not make 3.2. Hopefully we can get it fixed for 3.3.
Created attachment 531763 [details]
patch -- don't have freezer count processes sleeping in NFS/RPC code

I've moved my laptop to F16 and have not been able to reproduce this consistently since. I suspect the problem is actually still there, however. If anyone here is still able to reproduce this, could you test the attached patch? This is apparently the scheme that the upstream scheduler gurus would prefer we use for now. It should work on F15-ish kernels too, but it may need some wiggling to apply there.
The problem's definitely still there: I've been rebuilding kernels with your older patch, and both my systems always suspend first time with the rebuilt kernels, but only about 50% of the time without the patch. I'll try the new patch soon, thanks!

--
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers
*** Bug 712088 has been marked as a duplicate of this bug. ***
The patch fails to apply against the current F16 kernel; all hunks fail:

Patch21080: cifs_nfs_suspend.patch
+ case "$patch" in
+ patch -p1 -F1 -s
1 out of 1 hunk FAILED -- saving rejects to file include/linux/freezer.h.rej
2 out of 2 hunks FAILED -- saving rejects to file include/linux/freezer.h.rej
Created attachment 531916 [details]
patch -- nfs/sunrpc: make TASK_KILLABLE sleeps attempt to freeze

Can you try this patch? It's against 3.1.0-7.fc16.x86_64.
With the new patch I just had one failure to suspend, due to CIFS, on my laptop; with the old patch I don't think I ever had a failure.
Did it happen to spew anything to the ring buffer when it failed? Also, have you had any problems suspending when NFS is mounted? FWIW, I'm not that thrilled with the new scheme for doing this as it doesn't seem as robust. So, I'm interested in any problems you may be seeing with it...
Jeff: here's the failure from my laptop, running kernel 3.1.1-1 with the new patch applied:

Nov 15 10:36:17 vaioz kernel: [ 517.877192] Freezing user space processes ...
Nov 15 10:36:17 vaioz kernel: [ 537.850179] Freezing of tasks failed after 20.00 seconds (1 tasks refusing to freeze, wq_busy=0):
Nov 15 10:36:17 vaioz kernel: [ 537.850227] umount D 0000000000000000 0 2421 2397 0x00800084
Nov 15 10:36:17 vaioz kernel: [ 537.850235] ffff88013a3f5cb8 0000000000000086 ffff88013801d180 ffff880000000000
Nov 15 10:36:17 vaioz kernel: [ 537.850242] ffff880036ba0000 ffff88013a3f5fd8 ffff88013a3f5fd8 0000000000012f80
Nov 15 10:36:17 vaioz kernel: [ 537.850248] ffffffff81a0d020 ffff880036ba0000 ffff88013a3f5c88 00000001814b7194
Nov 15 10:36:17 vaioz kernel: [ 537.850254] Call Trace:
Nov 15 10:36:17 vaioz kernel: [ 537.850266] [<ffffffff814b5cc7>] schedule+0x5a/0x5c
Nov 15 10:36:17 vaioz kernel: [ 537.850287] [<ffffffffa0411d80>] wait_for_response+0xbc/0xbe [cifs]
Nov 15 10:36:17 vaioz kernel: [ 537.850292] [<ffffffff810733ce>] ? remove_wait_queue+0x3a/0x3a
Nov 15 10:36:17 vaioz kernel: [ 537.850299] [<ffffffffa0412850>] SendReceive2+0x162/0x29d [cifs]
Nov 15 10:36:17 vaioz kernel: [ 537.850306] [<ffffffffa04129c0>] SendReceiveNoRsp+0x35/0x37 [cifs]
Nov 15 10:36:17 vaioz kernel: [ 537.850311] [<ffffffffa03f8e8f>] CIFSSMBTDis+0x8d/0xcf [cifs]
Nov 15 10:36:17 vaioz kernel: [ 537.850316] [<ffffffffa04017b6>] cifs_put_tcon+0xc7/0xf0 [cifs]
Nov 15 10:36:17 vaioz kernel: [ 537.850321] [<ffffffffa0404c7c>] cifs_put_tlink+0x4f/0x5c [cifs]
Nov 15 10:36:17 vaioz kernel: [ 537.850326] [<ffffffffa0405fd3>] cifs_umount+0x4b/0x92 [cifs]
Nov 15 10:36:17 vaioz kernel: [ 537.850330] [<ffffffffa03f71d0>] cifs_kill_sb+0x1f/0x23 [cifs]
Nov 15 10:36:17 vaioz kernel: [ 537.850336] [<ffffffff8112adfd>] deactivate_locked_super+0x37/0x68
Nov 15 10:36:17 vaioz kernel: [ 537.850339] [<ffffffff8112b66b>] deactivate_super+0x37/0x3b
Nov 15 10:36:17 vaioz kernel: [ 537.850342] [<ffffffff811403ec>] mntput_no_expire+0xcc/0xd1
Nov 15 10:36:17 vaioz kernel: [ 537.850344] [<ffffffff81140fa9>] sys_umount+0x2ac/0x2da
Nov 15 10:36:17 vaioz kernel: [ 537.850349] [<ffffffff814bd8c2>] system_call_fastpath+0x16/0x1b
Nov 15 10:36:17 vaioz kernel: [ 537.850351]
Sorry for the confusion. This bug was for the NFS problem, so I didn't bother to roll in the patches for CIFS. They have been committed upstream so that might have made it difficult to merge. I'll respin this patchset against 3.1.1 and add in the cifs piece. Stay tuned...
Going ahead and moving this to an F16 bug...
Created attachment 533853 [details]
patch -- cifs/nfs/sunrpc client freezer patch against v3.1.1

OK, this patch is against 3.1.1 and seems to do the right thing. I tested several suspend/resume cycles with NFS and it seemed to work correctly. Can you test this one when you get time and let me know how it goes? The CIFS part of this patch is already upstream. If the NFS parts work for you too, I'll plan to post the NFS/sunrpc parts upstream within the next few weeks so they'll be ready for 3.3.
Been running the 'big patch' against 3.1 on my laptop for a while and it seems to work fine; I've had no failed suspends. Thanks!
Jeff: can you provide an up-to-date patch against 3.2 for use on my desktop (which is on F17)? I've tried using https://bugzilla.redhat.com/attachment.cgi?id=531763, which as I understood things ought to contain the NFS stuff, while 3.2 itself already contains the CIFS stuff. It applies (at least last I tried), but doesn't seem to resolve the issue entirely; I do get suspend failures. So I figure there must be something missing from the combination of that patch plus whatever's in 3.2, compared to 3.1 plus the 'big patch'.
Created attachment 537948 [details]
patch -- nfs/sunrpc client freezer patch against v3.2-rc3 or so

Yes, this patch applies with a little offset to 3.2-pre kernels. Note that the cifs piece of this is already merged into 3.2. The plan is to merge the nfs/sunrpc parts for 3.3.
Jeff: are you sure the CIFS bit is in 3.2.0-rc3.git0? I'm running 3.2.0-0.rc3.git0.1.2.fc17.x86_64, which is a personal build: it's the Fedora 3.2.0-0.rc3.git0.1 kernel with the patch from comment #30 applied (and also some debug config changes, but that doesn't matter to this bug). I tried to suspend and got a failure from CIFS:

Nov 30 12:06:45 adam kernel: [ 4459.733121] Freezing of tasks failed after 20.00 seconds (1 tasks refusing to freeze, wq_busy=0):
Nov 30 12:06:45 adam kernel: [ 4459.733167] cifsd W ffff880417683540 4528 1690 2 0x00800000
Nov 30 12:06:45 adam kernel: [ 4459.733175] ffff88042e95f9a0 0000000000000046 ffff880400000000 ffffffff812ec9e2
Nov 30 12:06:45 adam kernel: [ 4459.733183] ffff880417683160 ffff88042e95ffd8 ffff88042e95ffd8 ffff88042e95ffd8
Nov 30 12:06:45 adam kernel: [ 4459.733189] ffff880444fae2c0 ffff880417683160 000000000000128d 0000000100000082
Nov 30 12:06:45 adam kernel: [ 4459.733196] Call Trace:
Nov 30 12:06:45 adam kernel: [ 4459.733206] [<ffffffff812ec9e2>] ? __debug_object_init+0x202/0x410
Nov 30 12:06:45 adam kernel: [ 4459.733214] [<ffffffff8162519f>] schedule+0x3f/0x60
Nov 30 12:06:45 adam kernel: [ 4459.733218] [<ffffffff816256ea>] schedule_timeout+0x19a/0x380
Nov 30 12:06:45 adam kernel: [ 4459.733228] [<ffffffff81083f60>] ? lock_timer_base+0x70/0x70
Nov 30 12:06:45 adam kernel: [ 4459.733231] [<ffffffff814fac41>] sk_wait_data+0xd1/0xe0
Nov 30 12:06:45 adam kernel: [ 4459.733233] [<ffffffff81099740>] ? remove_wait_queue+0x50/0x50
Nov 30 12:06:45 adam kernel: [ 4459.733235] [<ffffffff81555b75>] tcp_recvmsg+0x545/0xca0
Nov 30 12:06:45 adam kernel: [ 4459.733238] [<ffffffff8106a525>] ? load_balance+0x105/0x890
Nov 30 12:06:45 adam kernel: [ 4459.733240] [<ffffffff8157a1bb>] inet_recvmsg+0x8b/0xa0
Nov 30 12:06:45 adam kernel: [ 4459.733242] [<ffffffff814f52cd>] sock_recvmsg+0x11d/0x140
Nov 30 12:06:45 adam kernel: [ 4459.733243] [<ffffffff812ec7ae>] ? free_object+0x8e/0xc0
Nov 30 12:06:45 adam kernel: [ 4459.733245] [<ffffffff812ed0f8>] ? debug_object_free+0xe8/0x140
Nov 30 12:06:45 adam kernel: [ 4459.733247] [<ffffffff8109d885>] ? destroy_hrtimer_on_stack+0x15/0x20
Nov 30 12:06:45 adam kernel: [ 4459.733249] [<ffffffff81626aba>] ? schedule_hrtimeout_range_clock+0xca/0x160
Nov 30 12:06:45 adam kernel: [ 4459.733250] [<ffffffff8109dde4>] ? hrtimer_start_range_ns+0x14/0x20
Nov 30 12:06:45 adam kernel: [ 4459.733253] [<ffffffff8112c685>] ? mempool_alloc_slab+0x15/0x20
Nov 30 12:06:45 adam kernel: [ 4459.733255] [<ffffffff814f5336>] kernel_recvmsg+0x46/0x60
Nov 30 12:06:45 adam kernel: [ 4459.733260] [<ffffffffa04a8e7e>] cifs_readv_from_socket+0x1ae/0x280 [cifs]
Nov 30 12:06:45 adam kernel: [ 4459.733262] [<ffffffff8112c9a9>] ? mempool_alloc+0x59/0x150
Nov 30 12:06:45 adam kernel: [ 4459.733264] [<ffffffff8105a433>] ? __wake_up+0x53/0x70
Nov 30 12:06:45 adam kernel: [ 4459.733267] [<ffffffffa04a8f77>] cifs_read_from_socket+0x27/0x30 [cifs]
Nov 30 12:06:45 adam kernel: [ 4459.733269] [<ffffffffa04a914d>] cifs_demultiplex_thread+0x15d/0xdc0 [cifs]
Nov 30 12:06:45 adam kernel: [ 4459.733271] [<ffffffff81624904>] ? __schedule+0x3e4/0x940
Nov 30 12:06:45 adam kernel: [ 4459.733274] [<ffffffffa04a8ff0>] ? dequeue_mid+0x70/0x70 [cifs]
Nov 30 12:06:45 adam kernel: [ 4459.733276] [<ffffffff81098e1c>] kthread+0x8c/0xa0
Nov 30 12:06:45 adam kernel: [ 4459.733279] [<ffffffff81631eb4>] kernel_thread_helper+0x4/0x10
Nov 30 12:06:45 adam kernel: [ 4459.733280] [<ffffffff81098d90>] ? kthread_worker_fn+0x1c0/0x1c0
Nov 30 12:06:45 adam kernel: [ 4459.733282] [<ffffffff81631eb0>] ? gs_change+0x13/0x13
Nov 30 12:06:45 adam kernel: [ 4459.733287]
Nov 30 12:06:45 adam kernel: [ 4459.733288] Restarting tasks ... done.
Created attachment 539374 [details]
patch -- attempt to freeze while looping on a receive attempt

It is, but this is a *different* bug that we introduced in 3.2. We've been discussing it upstream since yesterday. My thinking was that this patch would fix it, but the one person who tested it said it didn't work for them. Can you test it and let me know whether it does for you?
Created attachment 542512 [details]
patch -- cifs/nfs/sunrpc client freezer patch against v3.1.4

This set of patches should resolve the issue and is against v3.1.4. Adam, could you see about putting the above patchset into F16?
Created attachment 542513 [details]
patch -- cifs/nfs/sunrpc client freezer patch against v3.2-rc4 (or so)

This patch is against the tip of Linus' tree. It should be suitable for F17 kernels. The relevant changes are slated for v3.3, so once F17 moves to a 3.3-based kernel we shouldn't need any patches.
Adam can you see about getting the above two patches into Fedora's kernels?
(In reply to comment #35)
> Adam can you see about getting the above two patches into Fedora's kernels?

We'll bring in the rawhide version for now and let it settle there for a few days. If it looks good, we'll grab the F16 version after we get the 3.1.5 stable update pushed out. Adam, in the meantime, if you want to respin your F16 kernel and test it locally, that would also be helpful.
Josh: I've been using the slightly older version on F16 for weeks without issue; I'll confirm with the new one too.
kernel-3.1.5-2.fc16 has been submitted as an update for Fedora 16. https://admin.fedoraproject.org/updates/kernel-3.1.5-2.fc16
*** Bug 759703 has been marked as a duplicate of this bug. ***
Package kernel-3.1.5-2.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.

Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.1.5-2.fc16'
as soon as you are able to, then reboot.

Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2011-17052/kernel-3.1.5-2.fc16
then log in and leave karma (feedback).
kernel-3.1.5-2.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.
kernel-2.6.41.5-4.fc15 has been submitted as an update for Fedora 15. https://admin.fedoraproject.org/updates/kernel-2.6.41.5-4.fc15
kernel-2.6.41.6-1.fc15 has been submitted as an update for Fedora 15. https://admin.fedoraproject.org/updates/kernel-2.6.41.6-1.fc15
This issue exists in Red Hat Enterprise Linux 6.5, kernel version 2.6.32-431.23.3.el6.x86_64 (latest stable release).
The issue still exists in RHEL 6.7 kernel 2.6.32-573.26.1.el6.x86_64.