Description of problem:
gfs/lock_dlm gets a callback to do recovery at the same time that a local gfs unmount happens. lock_dlm prints "pr_start 31060 skip for umount/wd" and tries to do a kcl_service_leave(), which won't work because the service (in SM) is still in recovery state 2 and needs a start_done() ack from lock_dlm. In this case, the node that got the unmount and recovery callback at the same time was the only node with the fs mounted.

Version-Release number of selected component (if applicable):

How reproducible:
Do a test with lots of mounting/unmounting and throw in some node failures and you'll run into this.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
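To make the ordering problem in the description easier to follow, here is a rough toy model of it. This is only a sketch, not the actual SM or lock_dlm code: the class, state names, and helper functions (ToyService, SvcState, unmount_skipping_start, unmount_acking_start) are all hypothetical, and the only assumption taken from the report is that a leave cannot succeed while the service is still waiting for a start_done() ack.

```python
from enum import Enum, auto


class SvcState(Enum):
    RUNNING = auto()            # normal operation
    RECOVERY_WAIT_ACK = auto()  # start callback delivered, ack still outstanding
    LEFT = auto()               # service has left the group


class ToyService:
    """Hypothetical stand-in for the service tracked by SM (not the real code)."""

    def __init__(self):
        self.state = SvcState.RUNNING
        self.pending_event = None

    def start(self, event_id):
        # Recovery start callback: the service now waits for an ack.
        self.state = SvcState.RECOVERY_WAIT_ACK
        self.pending_event = event_id

    def start_done(self, event_id):
        # Ack from the lock module; only the pending event can be acked.
        if self.state is SvcState.RECOVERY_WAIT_ACK and event_id == self.pending_event:
            self.state = SvcState.RUNNING
            self.pending_event = None
            return True
        return False

    def leave(self):
        # Assumption from the report: leave is refused while an ack is outstanding.
        if self.state is not SvcState.RUNNING:
            return False
        self.state = SvcState.LEFT
        return True


def unmount_skipping_start(svc, event_id):
    """Models the reported sequence: the start is skipped, then leave is tried."""
    svc.start(event_id)
    return svc.leave()


def unmount_acking_start(svc, event_id):
    """In this toy model, acking the start first lets the leave go through."""
    svc.start(event_id)
    svc.start_done(event_id)
    return svc.leave()
```

In the model, unmount_skipping_start() returns False and leaves the service stuck in RECOVERY_WAIT_ACK, which mirrors the hang described above. Whether acking before leaving is a safe change in the real code is a separate question that this sketch does not answer.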
This will require a lot of work; will put it off until it becomes an issue for someone.
Moving out for consideration for 4.6.
This issue has never actually been seen, so not planning on changing it (which would be a high regression risk).