Description of problem:

Hit when running the test in bug 299061 comment 52.

Oct 25 10:54:09 marathon-01 kernel: umount D 0000000000611c00 0 24580 24579 (NOTLB)
Oct 25 10:54:09 marathon-01 kernel: 000001007acf3d48 0000000000000006 000001007acf3cc8 ffffffff801342f5
Oct 25 10:54:09 marathon-01 kernel: 0000000000000246 00000003a0205b50 0000010121ade7f0 0000000300004554
Oct 25 10:54:09 marathon-01 kernel: 0000010121ade7f0 00000000000009ec
Oct 25 10:54:09 marathon-01 kernel: Call Trace:<ffffffff801342f5>{__wake_up_common+67} <ffffffff8030bee1>{wait_for_completion+167}
Oct 25 10:54:09 marathon-01 kernel: <ffffffff801342a4>{default_wake_function+0} <ffffffff801342a4>{default_wake_function+0}
Oct 25 10:54:09 marathon-01 kernel: <ffffffff8014bd38>{kthread_stop+147} <ffffffffa0232b87>{:lock_dlm:lm_dlm_unmount+46}
Oct 25 10:54:09 marathon-01 kernel: <ffffffffa02093ae>{:lock_harness:lm_unmount+62} <ffffffffa02a6fb7>{:gfs:gfs_lm_unmount+33}
Oct 25 10:54:09 marathon-01 kernel: <ffffffffa02b636c>{:gfs:gfs_put_super+806} <ffffffff8017f37d>{generic_shutdown_super+198}
Oct 25 10:54:09 marathon-01 kernel: <ffffffffa02b3aba>{:gfs:gfs_kill_sb+41} <ffffffff8017f29e>{deactivate_super+95}
Oct 25 10:54:09 marathon-01 kernel: <ffffffff80194f92>{sys_umount+925} <ffffffff80110d91>{error_exit+0}
Oct 25 10:54:09 marathon-01 kernel: <ffffffff8011026a>{system_call+126}

This appears to be the thread that's not exiting:

Oct 25 10:54:09 marathon-01 kernel: lock_dlm2 S 0000010121a5e680 0 24566 11 24555 (L-TLB)
Oct 25 10:54:09 marathon-01 kernel: 0000010121a27e58 0000000000000046 000001012b8247f0 0000010000000069
Oct 25 10:54:09 marathon-01 kernel: 000001007b4978f4 00000000002f2000 000001000101ee80 0000000000000000
Oct 25 10:54:09 marathon-01 kernel: 000001007ad6a030 0000000000001661
Oct 25 10:54:09 marathon-01 kernel: Call Trace:<ffffffffa023541a>{:lock_dlm:dlm_async+218} <ffffffff801342f5>{__wake_up_common+67}
Oct 25 10:54:09 marathon-01 kernel: <ffffffff801342a4>{default_wake_function+0} <ffffffff8014ba68>{keventd_create_kthread+0}
Oct 25 10:54:09 marathon-01 kernel: <ffffffffa0235340>{:lock_dlm:dlm_async+0} <ffffffff8014ba68>{keventd_create_kthread+0}
Oct 25 10:54:09 marathon-01 kernel: <ffffffff8014ba3f>{kthread+200} <ffffffff80110f47>{child_rip+8}
Oct 25 10:54:09 marathon-01 kernel: <ffffffff8014ba68>{keventd_create_kthread+0} <ffffffff8014b977>{kthread+0}
Oct 25 10:54:09 marathon-01 kernel: <ffffffff80110f3f>{child_rip+0}

I've looked at other users of kthread_stop()/kthread_should_stop(), and most seem to incorporate a call to kthread_should_stop() within the test that adds the thread to a wait_queue. I'm guessing that that's what we should be doing also (and at the same time using wait_event_interruptible() instead of an open-coded equivalent).

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
possible fix to test

RCS file: /cvs/cluster/cluster/gfs-kernel/src/dlm/Attic/thread.c,v
retrieving revision 1.16.2.6
diff -u -r1.16.2.6 thread.c
--- thread.c	31 Aug 2007 15:23:32 -0000	1.16.2.6
+++ thread.c	25 Oct 2007 19:19:41 -0000
@@ -330,15 +330,10 @@
 	dlm_lock_t *lp = NULL;
 	dlm_start_t *ds = NULL;
 	uint8_t complete, blocking, submit, start, finish, drop, shrink;
-	DECLARE_WAITQUEUE(wait, current);
 
 	while (!kthread_should_stop()) {
-		set_current_state(TASK_INTERRUPTIBLE);
-		add_wait_queue(&dlm->wait, &wait);
-		if (no_work(dlm))
-			schedule();
-		remove_wait_queue(&dlm->wait, &wait);
-		set_current_state(TASK_RUNNING);
+		wait_event_interruptible(dlm->wait,
+				!no_work(dlm) || kthread_should_stop());
 
 		complete = blocking = submit = start = finish = 0;
 		drop = shrink = 0;
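For anyone following along, here is my reading of why the patch above helps, plus a small userspace analogue. In the old loop, kthread_should_stop() is only tested at the top of the while, before the task state is set to TASK_INTERRUPTIBLE. If kthread_stop() sets the flag and issues its wakeup in that gap, the wakeup hits a task that is still running, the thread then finds no_work() true and calls schedule(), and nothing wakes it again, which would match the two traces in the description. Putting the stop check inside the wait condition means it is re-tested after the thread has prepared to sleep. The sketch below is NOT the lock_dlm code: it swaps in a pthread mutex and condition variable for the kernel wait queue, and every name in it (struct worker, worker_main, worker_queue, worker_stop, run_demo) is hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the lock_dlm thread state. */
struct worker {
    pthread_mutex_t lock;
    pthread_cond_t  wait;        /* plays the role of dlm->wait */
    int             pending;     /* queued items; "no work" means pending == 0 */
    int             processed;   /* items handled so far */
    bool            should_stop; /* plays the role of kthread_should_stop() */
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;

    pthread_mutex_lock(&w->lock);
    for (;;) {
        /*
         * Sleep until there is work OR a stop was requested. Both parts of
         * the predicate are tested under the lock, so a stop request cannot
         * slip in between the check and the sleep.
         */
        while (w->pending == 0 && !w->should_stop)
            pthread_cond_wait(&w->wait, &w->lock);

        if (w->pending > 0) {
            w->pending--;
            w->processed++;
            continue;
        }
        break; /* nothing left to do and a stop was requested */
    }
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

/* Queue n items and wake the worker. */
static void worker_queue(struct worker *w, int n)
{
    pthread_mutex_lock(&w->lock);
    w->pending += n;
    pthread_cond_signal(&w->wait);
    pthread_mutex_unlock(&w->lock);
}

/* Analogue of kthread_stop(): set the flag, wake the worker, wait for exit. */
static void worker_stop(struct worker *w, pthread_t tid)
{
    pthread_mutex_lock(&w->lock);
    w->should_stop = true;
    pthread_cond_broadcast(&w->wait);
    pthread_mutex_unlock(&w->lock);
    pthread_join(tid, NULL);
}

/* Start a worker, queue some items, stop it; returns items processed or -1. */
int run_demo(int items)
{
    struct worker w = { .pending = 0, .processed = 0, .should_stop = false };
    pthread_t tid;

    pthread_mutex_init(&w.lock, NULL);
    pthread_cond_init(&w.wait, NULL);
    if (pthread_create(&tid, NULL, worker_main, &w) != 0)
        return -1;

    worker_queue(&w, items);
    worker_stop(&w, tid);

    pthread_cond_destroy(&w.wait);
    pthread_mutex_destroy(&w.lock);
    return w.processed;
}
```

Build with -pthread. run_demo(0) is the interesting case: the worker has nothing to do when the stop arrives and still exits, because the stop flag is part of what it waits on. One difference from the patched kernel loop is that this sketch drains any queued items before exiting, which I did only to make the result predictable.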
Created attachment 243721
patch to try

same patch as comment above
fix checked into RHEL4 branch

Checking in thread.c;
/cvs/cluster/cluster/gfs-kernel/src/dlm/Attic/thread.c,v  <--  thread.c
new revision: 1.16.2.7; previous revision: 1.16.2.6
Adding missing flags
This is already fixed in 4.7.