Description of problem:
I had a running dlm cluster, went to the tool and switched to GULM, and this caused the MGMT status daemon to start reporting errors:

"A problem was encountered when attempting to get information about the nodes in the cluster. The following error messages was received from cman_tool:
Failed to connect to localhost:core (::ffff:127.0.0.1 40040) Connection refused
In src/gulm_tool.c322 (1.0-0.pre27) death by:
Failed to connect to server"

Version-Release number of selected component (if applicable):
-38
Fixed in 0.9.48
running 0.9.48, I'm still seeing the exact same cman errors.
Had a minor typo causing problems - I believe this is solved now in 0.9.51-1.0
nope, same problem in -51.
This error now shows up anytime the GUI is started on a GULM cluster, even if it wasn't switched over from dlm. This was working in earlier versions but has regressed. Also, these messages appear in the syslog when the GUI starts:

May 19 10:47:52 morph-04 gconfd (root-7841): starting (version 2.8.1), pid 7841 user 'root'
May 19 10:47:52 morph-04 gconfd (root-7841): Resolved address "xml:readonly:/etc/gconf/gconf.xml.mandatory" to a read-only configuration source at position 0
May 19 10:47:52 morph-04 gconfd (root-7841): Resolved address "xml:readwrite:/root/.gconf" to a writable configuration source at position 1
May 19 10:47:52 morph-04 gconfd (root-7841): Resolved address "xml:readonly:/etc/gconf/gconf.xml.defaults" to a read-only configuration source at position 2
May 19 10:47:54 morph-04 lock_gulmd_core[7065]: "Magma::7842" is logged out. fd:11
This is happening because the call to 'gulm_tool nodelist localhost:core' is timing out intermittently. I make this call about every 20 seconds to refresh the node list, and now and then it begins returning a (-1) exit code and printing "Command timed out" to stderr. CC'ing tilstra on this for his insight.
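For anyone trying to reproduce this outside the GUI, here is a rough sketch of how a caller might tell a clean result from a failure or a hang. It is not the GUI's actual code: `run_with_timeout` and the 5-second limit are made up for illustration, and only the gulm_tool command line is taken from the comment above.

```python
import subprocess


def run_with_timeout(argv, timeout_s=5.0):
    """Run a command and classify the outcome.

    Returns (status, text) where status is "ok", "failed", or "timeout".
    Hypothetical helper, not part of any cluster tool.
    """
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # The child did not finish within our own limit; run() has killed it.
        return "timeout", ""
    except OSError as exc:
        # For example, the binary is not installed on this machine.
        return "failed", str(exc)
    if proc.returncode != 0:
        # Non-zero exit: hand back stderr so the caller can log or display it.
        return "failed", proc.stderr.strip()
    return "ok", proc.stdout


# Example call (command line taken from the comment above):
# status, text = run_with_timeout(["gulm_tool", "nodelist", "localhost:core"])
```

A non-zero exit with "Command timed out" on stderr would come back as "failed" here, while "timeout" only means the caller's own limit expired first.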
Add Network2 to the verbosity, then read /var/log/messages from the gulm server node. It should have messages about the connection attempts. Might have some clues there.
Try running 'gulm_tool getstats 127.0.0.1'. DNS is being too slow, so gulm_tool is timing out before it can resolve the name. Why your machines are doing DNS queries for 'localhost', I don't know.
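One way to check the slow-lookup theory on the affected node is to time name resolution for 'localhost' against the literal address. This is only a sketch: `time_resolution` is a made-up helper, and port 40040 is simply the port from the error message in the original report.

```python
import socket
import time


def time_resolution(host, port=40040):
    """Return (elapsed_seconds, addresses) for resolving host.

    Hypothetical diagnostic helper; the port only matters for building
    the socket address, no connection is made.
    """
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        infos = []  # resolution failed outright
    elapsed = time.monotonic() - start
    # info[4] is the sockaddr tuple; its first element is the address string.
    addresses = sorted({info[4][0] for info in infos})
    return elapsed, addresses


if __name__ == "__main__":
    for name in ("localhost", "127.0.0.1"):
        elapsed, addrs = time_resolution(name)
        print(f"{name}: {elapsed * 1000:.1f} ms -> {addrs}")
```

If 'localhost' takes noticeably longer than '127.0.0.1', the resolver configuration on that machine is the place to look.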
Well... I cannot rectify the tendency for this to happen, but I *can* have the UI deal with it when it does, and it does now; kinda elegantly, I think! :-) Check out 0.9.54-1.0
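The comment above does not say how the UI copes with the timeout. One common approach, sketched here purely as an assumption, is to keep showing the last node list that was fetched successfully and mark it as stale when a refresh fails. `NodeListCache` and its `fetch` callable are hypothetical names, not taken from the tool.

```python
class NodeListCache:
    """Keep the last good node list when a refresh fails (illustrative only)."""

    def __init__(self, fetch):
        # fetch: callable returning a list of node names, raising on failure.
        self._fetch = fetch
        self.nodes = []
        self.stale = False

    def refresh(self):
        """Try to update the list; on failure keep the old one and flag it."""
        try:
            self.nodes = self._fetch()
            self.stale = False
        except Exception:
            # Leave self.nodes untouched so the display does not go blank.
            self.stale = True
        return self.nodes
```

A periodic timer would call `refresh()` and could show a small "status may be out of date" hint while `stale` is set, instead of an error dialog.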
fix verified in -60.