Description of problem:
After a 'service rgmanager start' I noted that several nodes did not start the service successfully. The logs on the failed nodes show:

Jul 27 12:56:34 morph-03 lock_gulmd_core[6337]: "Magma::9915" is logged out. fd:
Jul 27 12:56:20 morph-03 clurgmgrd[9657]: <err> #34: Cannot get status for service
Jul 27 12:56:20 morph-03 clurgmgrd[9657]: <err> #34: Cannot get status for service nfs_service
Jul 27 12:56:21 morph-03 last message repeated 9 times

Version-Release number of selected component (if applicable):
gulm-1.0.7-0
rgmanager-1.9.52-0

How reproducible:
Once in a while.

Steps to Reproduce:
1. Define an rgmanager service
2. Start the service on all nodes

Expected results:

Additional info:
This is a bug in the way rgmanager works, not in gulm itself. Basically, gulm has no service group support, so rgmanager assumes that rgmanager is running on all nodes at all times, which is incorrect. Under most circumstances this should not matter, but in this case there is apparently a race that causes rgmanager to break. Restarting rgmanager on the affected node should fix the problem.
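To illustrate the failure mode described above, here is a minimal sketch (hypothetical Python, not actual rgmanager or gulm code; the node names and function are invented for illustration). Without service group membership, the manager queries every cluster member for service status, including a node where rgmanager never came up, producing the "Cannot get status for service" case:

```python
def get_status(service, node, rgmanager_nodes):
    """Return the service status, or None if rgmanager is not
    actually running on the queried node."""
    if node not in rgmanager_nodes:
        # Corresponds to the logged "Cannot get status for service" error
        return None
    return "started"

# All cluster members, as the lock manager reports them
cluster_members = ["morph-01", "morph-02", "morph-03"]

# With service group support, the manager would only query nodes known
# to run rgmanager. Without it (the gulm case), it assumes this set
# equals cluster_members, even though rgmanager failed on morph-03.
rgmanager_nodes = {"morph-01", "morph-02"}

statuses = {n: get_status("nfs_service", n, rgmanager_nodes)
            for n in cluster_members}
```

The sketch shows why the workaround works: restarting rgmanager on the affected node puts it back into the set of nodes that can answer status queries.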
Devel ACK for 4.5.
This bug:
- is gulm-specific
- has a workaround (albeit a poor one)
- has had no customer impact or reports
- would require a large change to rgmanager

I'm closing this until we get reports from customers, at which point it will be re-evaluated. Normally I wouldn't do this, except for the fact that the fix would be a fairly large change. If I come up with a solution that has less impact, I will reopen the bug.