Bug 352731 - lock_dlm thread doesn't exit after kthread_stop()
Status: CLOSED CURRENTRELEASE
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: GFS-kernel
Version: 4
Hardware: All
OS: Linux
Priority: low
Severity: low
Assigned To: David Teigland
QA Contact: Cluster QE
Reported: 2007-10-25 12:50 EDT by David Teigland
Modified: 2010-01-11 22:19 EST
CC List: 6 users

Doc Type: Bug Fix
Last Closed: 2009-03-12 15:55:30 EDT


Attachments
patch to try (867 bytes, text/plain)
2007-10-30 14:08 EDT, David Teigland

Description David Teigland 2007-10-25 12:50:58 EDT
Description of problem:

Hit when running the test in bug 299061 comment 52.

Oct 25 10:54:09 marathon-01 kernel: umount        D 0000000000611c00     0 24580  24579                     (NOTLB)
Oct 25 10:54:09 marathon-01 kernel: 000001007acf3d48 0000000000000006 000001007acf3cc8 ffffffff801342f5
Oct 25 10:54:09 marathon-01 kernel:        0000000000000246 00000003a0205b50 0000010121ade7f0 0000000300004554
Oct 25 10:54:09 marathon-01 kernel:        0000010121ade7f0 00000000000009ec
Oct 25 10:54:09 marathon-01 kernel: Call Trace:<ffffffff801342f5>{__wake_up_common+67} <ffffffff8030bee1>{wait_for_completion+167}
Oct 25 10:54:09 marathon-01 kernel:        <ffffffff801342a4>{default_wake_function+0} <ffffffff801342a4>{default_wake_function+0}
Oct 25 10:54:09 marathon-01 kernel:        <ffffffff8014bd38>{kthread_stop+147} <ffffffffa0232b87>{:lock_dlm:lm_dlm_unmount+46}
Oct 25 10:54:09 marathon-01 kernel:        <ffffffffa02093ae>{:lock_harness:lm_unmount+62} <ffffffffa02a6fb7>{:gfs:gfs_lm_unmount+33}
Oct 25 10:54:09 marathon-01 kernel:        <ffffffffa02b636c>{:gfs:gfs_put_super+806} <ffffffff8017f37d>{generic_shutdown_super+198}
Oct 25 10:54:09 marathon-01 kernel:        <ffffffffa02b3aba>{:gfs:gfs_kill_sb+41} <ffffffff8017f29e>{deactivate_super+95}
Oct 25 10:54:09 marathon-01 kernel:        <ffffffff80194f92>{sys_umount+925} <ffffffff80110d91>{error_exit+0}
Oct 25 10:54:09 marathon-01 kernel:        <ffffffff8011026a>{system_call+126}

This appears to be the thread that's not exiting:

Oct 25 10:54:09 marathon-01 kernel: lock_dlm2     S 0000010121a5e680     0 24566     11               24555 (L-TLB)
Oct 25 10:54:09 marathon-01 kernel: 0000010121a27e58 0000000000000046 000001012b8247f0 0000010000000069
Oct 25 10:54:09 marathon-01 kernel:        000001007b4978f4 00000000002f2000 000001000101ee80 0000000000000000
Oct 25 10:54:09 marathon-01 kernel:        000001007ad6a030 0000000000001661
Oct 25 10:54:09 marathon-01 kernel: Call Trace:<ffffffffa023541a>{:lock_dlm:dlm_async+218} <ffffffff801342f5>{__wake_up_common+67}
Oct 25 10:54:09 marathon-01 kernel:        <ffffffff801342a4>{default_wake_function+0} <ffffffff8014ba68>{keventd_create_kthread+0}
Oct 25 10:54:09 marathon-01 kernel:        <ffffffffa0235340>{:lock_dlm:dlm_async+0} <ffffffff8014ba68>{keventd_create_kthread+0}
Oct 25 10:54:09 marathon-01 kernel:        <ffffffff8014ba3f>{kthread+200} <ffffffff80110f47>{child_rip+8}
Oct 25 10:54:09 marathon-01 kernel:        <ffffffff8014ba68>{keventd_create_kthread+0} <ffffffff8014b977>{kthread+0}
Oct 25 10:54:09 marathon-01 kernel:        <ffffffff80110f3f>{child_rip+0}


I've looked at other users of kthread_stop()/kthread_should_stop(),
and most of them include a call to kthread_should_stop() in the
condition that is tested when the thread adds itself to a wait queue.
I'm guessing that's what we should be doing too (and, at the same
time, using wait_event_interruptible() instead of an open-coded
equivalent).
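A plausible reading of why the old loop hangs, consistent with the two traces: kthread_stop() sets the flag that kthread_should_stop() reads and then wakes the thread, but the open-coded loop tests that flag only at the top, before set_current_state(TASK_INTERRUPTIBLE). A wakeup that lands in between finds the thread still running and is lost; the thread then marks itself interruptible, sees no_work(dlm), and sleeps in schedule() for good while umount waits in kthread_stop(). The sketch below is a deterministic single-threaded model of that reasoning, not kernel code. struct task, stop_and_wake(), old_wait_hangs() and new_wait_hangs() are hypothetical names, and the model assumes no lock work is queued.

```c
/*
 * Deterministic model (hypothetical names, not kernel code) of one pass
 * through the dlm_async() wait loop. The kthread_stop() request is
 * injected just before step `stop_at`:
 *   0 = before the while (!kthread_should_stop()) test
 *   1 = between that test and set_current_state(TASK_INTERRUPTIBLE)
 *   2 = after the task state is set, before schedule() would be called
 * Each function returns true if the thread would end up asleep in
 * schedule() with no wakeup left to deliver (the hang in this bug).
 * Assumption: no lock work is queued, so no_work(dlm) is always true.
 */
#include <assert.h>
#include <stdbool.h>

enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE };

struct task {
    enum task_state state;
    bool should_stop;
};

/* Models kthread_stop(): set the stop flag, then wake the task.
 * Waking a task that is already running simply leaves it running. */
static void stop_and_wake(struct task *t)
{
    t->should_stop = true;
    t->state = TASK_RUNNING;
}

/* Models the open-coded wait in thread.c revision 1.16.2.6. */
static bool old_wait_hangs(int stop_at)
{
    struct task t = { TASK_RUNNING, false };

    assert(stop_at >= 0 && stop_at <= 2);
    if (stop_at == 0)
        stop_and_wake(&t);
    if (t.should_stop)                 /* while (!kthread_should_stop()) */
        return false;
    if (stop_at == 1)
        stop_and_wake(&t);             /* wakeup hits a running task: lost */
    t.state = TASK_INTERRUPTIBLE;      /* set_current_state() */
    if (stop_at == 2)
        stop_and_wake(&t);
    /* if (no_work(dlm)) schedule();  the stop flag is not re-tested */
    return t.state == TASK_INTERRUPTIBLE;
}

/* Models the patched wait: the stop flag is part of the condition that
 * wait_event_interruptible() re-tests after setting the task state. */
static bool new_wait_hangs(int stop_at)
{
    struct task t = { TASK_RUNNING, false };

    assert(stop_at >= 0 && stop_at <= 2);
    if (stop_at == 0)
        stop_and_wake(&t);
    if (t.should_stop)                 /* while (!kthread_should_stop()) */
        return false;
    if (stop_at == 1)
        stop_and_wake(&t);
    t.state = TASK_INTERRUPTIBLE;      /* state set inside the macro */
    if (stop_at == 2)
        stop_and_wake(&t);
    if (t.should_stop)                 /* !no_work(dlm) || kthread_should_stop() */
        return false;                  /* condition true: never sleeps */
    return t.state == TASK_INTERRUPTIBLE;
}
```

In this model only a stop request arriving between the kthread_should_stop() test and set_current_state() strands the old loop (old_wait_hangs(1) is true), while new_wait_hangs() is false at every injection point because the stop flag is re-tested after the task state is set.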

Comment 1 David Teigland 2007-10-25 15:20:23 EDT
possible fix to test


RCS file: /cvs/cluster/cluster/gfs-kernel/src/dlm/Attic/thread.c,v
retrieving revision 1.16.2.6
diff -u -r1.16.2.6 thread.c
--- thread.c    31 Aug 2007 15:23:32 -0000      1.16.2.6
+++ thread.c    25 Oct 2007 19:19:41 -0000
@@ -330,15 +330,10 @@
        dlm_lock_t *lp = NULL;
        dlm_start_t *ds = NULL;
        uint8_t complete, blocking, submit, start, finish, drop, shrink;
-       DECLARE_WAITQUEUE(wait, current);
 
        while (!kthread_should_stop()) {
-               set_current_state(TASK_INTERRUPTIBLE);
-               add_wait_queue(&dlm->wait, &wait);
-               if (no_work(dlm))
-                       schedule();
-               remove_wait_queue(&dlm->wait, &wait);
-               set_current_state(TASK_RUNNING);
+               wait_event_interruptible(dlm->wait,
+                               !no_work(dlm) || kthread_should_stop());
 
                complete = blocking = submit = start = finish = 0;
                drop = shrink = 0;
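For comparison, the patched loop has the same shape as a userspace condition-variable wait whose predicate includes the stop request. The sketch below is a hypothetical pthreads analog, not lock_dlm code: struct worker, worker_start(), worker_queue() and worker_stop() are invented names, pthread_cond_wait() stands in for wait_event_interruptible(), and the stop field stands in for kthread_should_stop(). Because the predicate is tested under the mutex that also protects the stop flag, a stop request cannot slip in between the test and the sleep.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical worker; fields mirror the roles they play in dlm_async(). */
struct worker {
    pthread_mutex_t lock;   /* protects every field below */
    pthread_cond_t wait;    /* plays the role of dlm->wait */
    int pending;            /* queued items; 0 corresponds to no_work() */
    bool stop;              /* plays the role of kthread_should_stop() */
    int processed;          /* items handled by the thread */
    pthread_t thread;
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;

    pthread_mutex_lock(&w->lock);
    for (;;) {
        /* Like wait_event_interruptible(dlm->wait,
         *     !no_work(dlm) || kthread_should_stop()):
         * sleep only while there is no work AND no stop request. */
        while (w->pending == 0 && !w->stop)
            pthread_cond_wait(&w->wait, &w->lock);

        /* Handle everything that is queued. */
        while (w->pending > 0) {
            w->pending--;
            w->processed++;
        }

        if (w->stop)
            break;
    }
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

static void worker_start(struct worker *w)
{
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->wait, NULL);
    w->pending = 0;
    w->stop = false;
    w->processed = 0;
    int rc = pthread_create(&w->thread, NULL, worker_main, w);
    assert(rc == 0);
    (void)rc; /* avoid an unused-variable warning when NDEBUG is set */
}

/* Queue n items and wake the thread. */
static void worker_queue(struct worker *w, int n)
{
    pthread_mutex_lock(&w->lock);
    w->pending += n;
    pthread_cond_signal(&w->wait);
    pthread_mutex_unlock(&w->lock);
}

/* Like kthread_stop(): request a stop, wake the thread, and wait for it
 * to exit. Returns the number of items the thread processed. */
static int worker_stop(struct worker *w)
{
    pthread_mutex_lock(&w->lock);
    w->stop = true;
    pthread_cond_signal(&w->wait);
    pthread_mutex_unlock(&w->lock);

    pthread_join(w->thread, NULL);
    pthread_cond_destroy(&w->wait);
    pthread_mutex_destroy(&w->lock);
    return w->processed;
}
```

Stopping an idle worker returns promptly instead of hanging, because worker_main() re-tests the stop flag under the mutex each time it is about to sleep. Unlike the kernel loop in the patch, this sketch drains any queued work once more before exiting, which keeps the processed count deterministic.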
Comment 2 David Teigland 2007-10-30 14:08:02 EDT
Created attachment 243721
patch to try

Same patch as in comment 1.
Comment 3 David Teigland 2008-01-14 11:00:41 EST
fix checked into RHEL4 branch

Checking in thread.c;
/cvs/cluster/cluster/gfs-kernel/src/dlm/Attic/thread.c,v  <--  thread.c
new revision: 1.16.2.7; previous revision: 1.16.2.6
Comment 4 Steve Whitehouse 2009-01-20 10:22:16 EST
Adding missing flags
Comment 5 Chris Feist 2009-03-12 15:55:30 EDT
This is already fixed in 4.7.
