Red Hat Bugzilla – Bug 206470
gfs recovery mixed with unmounting can hang
Last modified: 2010-01-11 22:13:07 EST
Description of problem:
gfs/lock_dlm gets a callback to do recovery at the same time that
a local gfs unmount happens.
lock_dlm prints "pr_start 31060 skip for umount/wd"
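and tries to do a kcl_service_leave(), which won't work
because the service (in SM) is still in recovery state 2
and needs a start_done() ack from lock_dlm. Since lock_dlm skipped
the start, that ack is never sent and the leave (and so the
unmount) hangs. In this case,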
the node that got the unmount and recovery callback at
the same time was the only node with the fs mounted.
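
To make the sequence above concrete, here is a minimal user-space sketch of the hang. It is only a model under stated assumptions: none of the names below (svc, model_recovery_start, model_start_callback, and so on) are real SM or lock_dlm symbols, and the single "waiting for ack" state merely stands in for the recovery state described above.

```c
#include <stdbool.h>

/* Hypothetical model of the hang; these are NOT real SM or lock_dlm symbols. */
enum svc_state {
    SVC_RUN,            /* normal operation */
    SVC_RECOVERY_WAIT   /* recovery started, waiting for the start_done ack */
};

struct svc {
    enum svc_state state;
    bool umounting;     /* a local unmount is in progress */
};

/* SM side: begin recovery; the service now needs an ack to make progress. */
static void model_recovery_start(struct svc *s)
{
    s->state = SVC_RECOVERY_WAIT;
}

/* lock_dlm side: the start callback. Returns true if the ack was sent. */
static bool model_start_callback(struct svc *s)
{
    if (s->umounting)
        return false;   /* models "skip for umount/wd": no ack is sent */
    s->state = SVC_RUN; /* stands in for the start_done() ack */
    return true;
}

/* SM side: a leave can only complete once the service is out of recovery. */
static bool model_leave_can_complete(const struct svc *s)
{
    return s->state == SVC_RUN;
}

/* Runs recovery, then the start callback, then checks whether a leave
 * (and so the unmount) could finish. */
bool model_unmount_completes(bool umount_races_with_recovery)
{
    struct svc s = { SVC_RUN, umount_races_with_recovery };

    model_recovery_start(&s);
    model_start_callback(&s);
    return model_leave_can_complete(&s);
}
```

In this model, model_unmount_completes(false) returns true, while model_unmount_completes(true) returns false: the start is skipped, the service stays in the waiting state, and the leave can never complete.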
Version-Release number of selected component (if applicable):
Run a test with lots of mounting/unmounting, throw in some node
failures, and you'll run into this.
Steps to Reproduce:
Fixing this will require a lot of work; will put it off until it
becomes an issue for someone.
Moving out for consideration for 4.6
This issue has never actually been seen, so not planning on changing it
(which would be a high regression risk).