Description of problem:

I debated whether this is a clvmd or dlm issue. For some time, I've run into problems where clvmd would lock up hard enough that 'kill -9 `pidof clvmd`' would not stop it. This would force a reboot, or often a fence because the reboot would never complete, to restore the node. Further, I would often (always?) also then need to reboot the surviving node before I could restart clvmd on the first node. Herein lies the major problem.

I've finally found a reliable, if stupid, way to trigger this condition. If you set the MTU of the totem interface on one node to be higher than on the second (say, 2000 vs 1500), then try to start clvmd, it will start successfully on the higher-MTU node and hang while starting on the second, lower-MTU node. After this, you will not be able to stop or restart clvmd on the first node until it is rebooted.

Regardless of the cause, clvmd (or dlm, if it's the source) should eventually fail and exit rather than wait endlessly. This is a very big problem because of the impact on the second node.

Version-Release number of selected component (if applicable):
All are 64-bit packages:
cman-3.0.17-1
Corosync Cluster Engine, version '1.2.8' SVN revision '3035'
Cluster LVM daemon version: 2.02.73(2) (2010-09-18)
dlm_controld 3.0.17 (built Oct 7 2010 06:55:07)
Linux kernel 2.6.32, myoung's 170 Xen dom0

How reproducible:
100% with the intentional misconfiguration, random otherwise.

Steps to Reproduce:
1. Set the MTU of the totem interface of one node to be higher than the other's.
2. Create a DRBD device (or possibly use an iSCSI device) and use it to create a clustered PV -> VG -> LV.
3. Configure a simple 2-node cluster.
4. Set the LV to clustered.
5. Start clvmd.
(A rough command sketch is appended at the end of this report.)

Actual results:
clvmd will wait forever while starting. Trying to stop it with ctrl+c will fail. Killing the start PID will return the terminal, but 'clvmd -T30' will remain. This process is not killable. The second node may have started clvmd successfully, and it will not be locked up, but stopping clvmd there will cause the same infinite wait.

Expected results:
On error, clvmd (or dlm) should time out and exit, returning control of the node. Failing this, a timeout should trigger a fence call against the node that caused the timeout.

Additional info:
Below is a link to the log file starting when cman (corosync) first starts on the node with the lower MTU (which was known to fail). Line 61 is where clvmd was started. On line 111, after the second timeout message, there is a different message (dlm: writequeue empty for nodeid 1) that may be relevant. The messages continued until the node was fenced (reboot hung).

http://pastebin.com/qNQ0B7GN

Again, the MTU issue is artificial, but it reliably reproduces an issue I've seen happen seemingly at random for some time.
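As a reference for the steps above, here is a minimal command sketch of how I reproduce this. The interface name (eth0), the shared device (/dev/drbd0), and the VG/LV names are only illustrative assumptions; adjust for your setup.

  # With the cluster up and clvmd running, create the clustered
  # PV -> VG -> LV on the shared device (run on one node):
  pvcreate /dev/drbd0
  vgcreate -c y cluster_vg /dev/drbd0    # -c y marks the VG as clustered
  lvcreate -L 1G -n test_lv cluster_vg

  # Stop clvmd on both nodes, then raise the MTU of the totem
  # interface on ONE node only:
  service clvmd stop
  ip link set dev eth0 mtu 2000          # on node1 only

  # Restart clvmd on both nodes; it comes up on the higher-MTU node
  # and hangs while starting on the lower-MTU node:
  service clvmd start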
Created attachment 463503 [details]
Trace from pastebin location

Trace from pastebin to keep it local to the bz. (Removed common prefix from the trace.)
Looks like the system is stuck inside some internal dlm code. Could it possibly be some misconfiguration? I'm asking the DLM maintainer what he thinks about this problem.
If there are issues other than the MTU, we should try to identify them specifically instead of trying to work around them in other layers without ever identifying the root cause. For the MTU misconfiguration: if corosync cannot operate with that setting, it should validate or detect the misconfiguration and report an error at startup. All that is not to say that it wouldn't be a good idea to make the higher-level subsystems (dlm, clvm, others) interruptible and capable of cleanly backing out, but the problems identified so far do not seem to justify that amount of work.
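To illustrate the kind of detection meant here at the admin level (not corosync internals), a rough sketch of a pre-flight check is below. The node names and interface name are assumptions; a proper fix would live in corosync's own startup validation.

  # Hypothetical pre-flight check an admin (or an init script) could run
  # before starting cman/corosync: confirm all nodes report the same MTU
  # on the totem interface.
  for node in node1 node2; do
      printf '%s: ' "$node"
      ssh "$node" cat /sys/class/net/eth0/mtu
  done
  # If the values differ, fix the interface MTUs (or corosync's netmtu)
  # before starting the stack rather than letting clvmd/dlm hang later.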
There are other issues; unfortunately, I've not been able to identify them yet.
Reassigning to corosync - this is not a clvmd fault. Not sure if there is enough info, though. Could you please explain what the other issues mentioned in comment #4 are?
Bug #599327 addresses the problem where corosync cannot be exited when totem is unable to form a configuration (the MTU problem described in this bugzilla).

Regards,
-steve

*** This bug has been marked as a duplicate of bug 599327 ***