Created attachment 1003865 [details]
engine.log

Description of problem:

following event looks quite odd...

2015-Mar-19, 15:53 Power Management test failed for Host $host.com.Done
                                                                   ^^^^ eh?

Version-Release number of selected component (if applicable):
rhevm-backend-3.5.1-0.1.el6ev.noarch

How reproducible:
??

Steps to Reproduce:
1. occurred while testing pm/fencing[1]
2.
3.

Actual results:
strange 'Done' at the end of the event msg

Expected results:
event msg should make sense

Additional info:
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1192596#c1
Hm.... Looking at the code, the "Done" you see is part of the result of running the fencing script, so I wonder whether it is part of our code or part of the underlying fencing script on the host side. Martin - thoughts? Jiri - does it happen all the time? When does it happen? If it rarely happens then I'd close it as WONTFIX.
Yes it is always:

2015-Apr-20, 16:06 Host dell-r210ii-13.rhev.lab.eng.brq.redhat.com from cluster Default was chosen as a proxy to execute Status command on Host dell-r210ii-04.rhev.lab.eng.brq.redhat.com.
2015-Apr-20, 16:06 Host dell-r210ii-13.rhev.lab.eng.brq.redhat.com from cluster Default was chosen as a proxy to execute Restart command on Host dell-r210ii-04.rhev.lab.eng.brq.redhat.com.
2015-Apr-20, 16:06 Host slot-4.rhev.lab.eng.brq.redhat.com from data center Default was chosen as a proxy to execute Status command on Host dell-r210ii-04.rhev.lab.eng.brq.redhat.com.

> Reproducing steps:
>
> 1) Create cluster1 with 2 hosts (host1 and host2)
> 2) Create cluster2 with 1 host (host3) in the same DC as cluster1
> 3) Block connection from host2 to PM interface of host1
> 4) Turn off host1 using its PM interface
> 5) host1 becomes NonResponding, PM stop operation of host1 using host2 as proxy fails due to blocked connection
> 6) PM stop operation using host3 as proxy will be skipped because host1 is already down
> 7) Engine will badly interpret result of PM stop operation: instead of "skipped, because host is already turned off", it will determine result as "skipped due to fencing policy" -> host1 will not be restarted -> HA VMs running on host1 will not be restarted on different host
(In reply to Jiri Belka from comment #2)
> Yes it is always:
>
> 2015-Apr-20, 16:06
> Host dell-r210ii-13.rhev.lab.eng.brq.redhat.com from cluster Default was
> chosen as a proxy to execute Status command on Host
> dell-r210ii-04.rhev.lab.eng.brq.redhat.com.
>
> 2015-Apr-20, 16:06
> Host dell-r210ii-13.rhev.lab.eng.brq.redhat.com from cluster Default was
> chosen as a proxy to execute Restart command on Host
> dell-r210ii-04.rhev.lab.eng.brq.redhat.com.
>
> 2015-Apr-20, 16:06
> Host slot-4.rhev.lab.eng.brq.redhat.com from data center Default was chosen
> as a proxy to execute Status command on Host
> dell-r210ii-04.rhev.lab.eng.brq.redhat.com.

Jiri, looking at your comment, there is no "Done" here. Am I missing something?
I was almost wondering whether I had been trolling. But, yeah, I got it on 3.5.4-1.2:

2015-Aug-12, 17:11 Power Management test failed for Host dell-r210ii-04.rhev.lab.eng.brq.redhat.com.Done 9001 dell-r210ii-04.rhev.lab.eng.brq.redhat.com Default

Steps to reproduce:
- 2 hosts
- on host 2: iptables -I OUTPUT -d $host1_ipmi -j REJECT
- check PM settings of host1, click Test and wait
Fixed - if message is 'Done', don't show it.
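For readers who want to see the idea of the fix spelled out, here is a minimal sketch in Python. It is not the actual engine patch: the function name, the constant and the space separator between the sentence and the detail are all assumptions made for illustration. It only shows the rule described above, that a bare 'Done' returned by the fence agent is dropped from the event text instead of being appended to it.

```python
from typing import Optional

# Hypothetical names throughout; nothing here is taken from the engine source.
# Agent results that carry no useful information for the user.
UNINFORMATIVE_DETAILS = {"Done"}


def pm_test_failed_message(host: str, detail: Optional[str]) -> str:
    """Build the 'Power Management test failed' event text for a host.

    The agent detail is appended only when it says something meaningful.
    """
    base = f"Power Management test failed for Host {host}."
    detail = (detail or "").strip()
    if not detail or detail in UNINFORMATIVE_DETAILS:
        # Nothing worth showing: return the sentence on its own.
        return base
    # Assumption: separate the detail from the sentence with a space.
    return f"{base} {detail}"


if __name__ == "__main__":
    print(pm_test_failed_message("host1.example.com", "Done"))
    print(pm_test_failed_message("host1.example.com", "Connection timed out"))
```

Under this sketch any other agent string is still shown as-is, so only the literal 'Done' is suppressed.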
Moving back to post. Ori, please backport it to 3.6.
'Not Specified' is not much better:

Power Management test failed for Host dell-r210ii-13.rhev.lab.eng.brq.redhat.com.Not Specified

rhevm-webadmin-portal-3.6.0-0.18.el6.noarch
I was not able to reproduce with the steps provided in comment 4
#8 meant that 'Done' issue is solved but as #8 states clearly, there's new 'Not Specified' issue :)
(In reply to Jiri Belka from comment #10)
> #8 meant that 'Done' issue is solved but as #8 states clearly, there's new
> 'Not Specified' issue :)

What are the steps to get this?
Steps to reproduce:
- 2 hosts
(- on host 2: ipmitool -e @ -I lanplus -H $host1_ipmi power status)
- on host 2: iptables -I OUTPUT -d $host1_ipmi -j REJECT
(- on host 2: ipmitool -e @ -I lanplus -H $host1_ipmi power status # failure)
- check PM settings of host1, click Test and wait

In Edit host you will see:

Test failed:

Search for events...

Events: message = "Power Management test failed*"
Still requires backport.
ok, rhevm-webadmin-portal-3.6.1.1-0.1.el6.noarch

No 'Done' in the event message.