Created attachment 108570 [details] dumps from dlm Again, I'm not sure this is everything; I think I was limited by the scrollback buffer again.
Created attachment 108739 [details] Full dlm assert dump Got this one again (finally). I turned on screen logging, so I got the full output this time. Also included is the output from the other nodes. Clocks are synced across all nodes.
Created attachment 108836 [details] email describing how this bug was hit This is a copy of the email sent on the linux-cluster mailing list.
When the dlm reports -ENOBUFS (-105), it means that no kernel memory could be allocated to send a network message. The reccomms function asserts when it sees this error; the remote_stage function doesn't (but it probably should). It's not clear that there's anything wrong with the dlm here. Reducing the drop_count in lock_dlm might help, simply by causing gfs to cache fewer locks and so use less memory.
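For illustration only, here is a minimal userspace sketch of treating -ENOBUFS as a transient condition and handing it back to the caller instead of asserting. This is not dlm code: send_fn, send_with_retry and flaky_send are hypothetical names made up for the example, and retrying at all is an assumption about what a caller might want, not something the dlm is known to do.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a low-level send routine (not a real dlm
 * function). Returns 0 on success or a negative errno value. */
typedef int (*send_fn)(const void *msg, size_t len, void *ctx);

/*
 * Call send() up to max_tries times, retrying only when it reports
 * -ENOBUFS (no buffer memory available). Any other result, success or
 * failure, is returned immediately. If every attempt hits -ENOBUFS,
 * that error is returned so the caller can decide what to do.
 */
static int send_with_retry(send_fn send, const void *msg, size_t len,
                           void *ctx, int max_tries)
{
    int rc = -EINVAL;

    /* Precondition on the arguments, not on the outcome of the send. */
    assert(send != NULL && max_tries > 0);

    for (int i = 0; i < max_tries; i++) {
        rc = send(msg, len, ctx);
        if (rc != -ENOBUFS)
            return rc;
        /* Real code would probably wait or back off here. */
    }
    return rc;
}

/* Fake sender for the example: ctx points to an int holding how many
 * more calls should fail with -ENOBUFS before a call succeeds. */
static int flaky_send(const void *msg, size_t len, void *ctx)
{
    int *failures_left = ctx;

    (void)msg;
    (void)len;
    if (*failures_left > 0) {
        (*failures_left)--;
        return -ENOBUFS;
    }
    return 0;
}
```

With flaky_send set to fail twice, send_with_retry(flaky_send, "x", 1, &n, 5) returns 0 on the third attempt. If the failure count exceeds max_tries, it returns -ENOBUFS and the caller has to cope with it.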
Comment #3 is related to bug 139738, not this one.
*** This bug has been marked as a duplicate of 142844 ***
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.