Red Hat Bugzilla – Bug 111931
Network power switch tests not working as expected
Last modified: 2007-11-30 17:06:53 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1)
Description of problem:
The power switch test for the network power switch entails unplugging
the power switch's ethernet cable while the cluster is running, as
well as starting a cluster up with the cable not plugged in.
In the first case, even though there are messages logged about not
hearing from the power switch, clustat shows everything as perfectly
fine, for at least 5 minutes after the cable has been unplugged.
In the second case, cluster startup never gets beyond starting
cluquorumd, even after about 5 minutes of waiting. I did not see any
messages in the log explaining why, but I had also left the log level
at 4.
The only case in this section of testing which behaved as expected was
when a machine needed to try to power cycle the other one, due to a
loss of quorum. In this case, the machine which needed to be power
cycled got into a status = Down and Switch = timeout state after less
than a minute. Not long afterward, the machine trying to do the power
cycle complained that it could not contact the power switch, and
showed both its status and switch states as unknown. When the
cable was reconnected, the machine which needed to be cycled got
cycled, and things were fine.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
A1. With a healthy cluster running, unplug the power switch's
    ethernet cable.
A2. Keep an eye on clustat and the log.
B1. With the cable unplugged, start the cluster.
B2. Keep an eye on service cluster status and the log.
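For anyone repeating the steps above, a small helper can make the
"keep an eye on" part less manual. This is only a sketch, not part of
the cluster software: poll_status and first_change are hypothetical
names, and it assumes that running clustat with no arguments prints
the status once and exits. If the output embeds a clock or date, that
line would need to be filtered out before comparing samples.

```python
import subprocess
import time
from datetime import datetime


def poll_status(command, samples, interval_s=0.0):
    """Run `command` `samples` times; return a list of (timestamp, output)."""
    results = []
    for i in range(samples):
        # Capture stdout and stderr so complaints on either stream are kept.
        completed = subprocess.run(command, capture_output=True, text=True)
        stamp = datetime.now().strftime("%H:%M:%S")
        results.append((stamp, completed.stdout + completed.stderr))
        if i < samples - 1:
            time.sleep(interval_s)
    return results


def first_change(results):
    """Index of the first sample whose output differs from sample 0, else None."""
    for index, (_, output) in enumerate(results):
        if output != results[0][1]:
            return index
    return None


if __name__ == "__main__":
    # Assumption: `clustat` is on PATH and prints one status report per run.
    # 30 samples, 10 seconds apart, covers the roughly 5 minutes described.
    history = poll_status(["clustat"], samples=30, interval_s=10)
    changed_at = first_change(history)
    if changed_at is None:
        print("status output never changed")
    else:
        print("status output first changed at", history[changed_at][0])
```

Start it just before unplugging the cable in case A. A result of
"never changed" over the whole window would match the behavior
reported here.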
Actual Results: a) The cluster appeared to not know that the power
switch was unavailable, even though the log was complaining.
b) The cluster was unable to start anything beyond cluquorumd.
Expected Results: a) The cluster ought to have indicated in clustat
that the power switch was not available, and possibly also sent
messages to the screen about it, as happened in the attempted power
cycling example I gave.
b) The cluster ought to have warned about the lack of a power switch,
but started. The power switch ought to have been shown as down in
clustat.
This appears to not be a huge problem, as when the machines actually
try to *use* the power switch, the appropriate states are changed.
Thus, this is set to low severity.