Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1151553

Summary: [PM] PM-Restart, host still down - ERROR [org.ovirt.engine.core.vdsbroker.ResourceManager] ... CreateCommand failed: java.lang.NullPointerException
Product: Red Hat Enterprise Virtualization Manager
Reporter: Jiri Belka <jbelka>
Component: ovirt-engine
Assignee: Eli Mesika <emesika>
Status: CLOSED CURRENTRELEASE
QA Contact: Jiri Belka <jbelka>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 3.4.0
CC: bazulay, ecohen, emesika, gklein, iheim, jbelka, lpeer, lsurette, mgrac, michal.skrivanek, oourfali, pstehlik, rbalakri, Rhev-m-bugs, sherold, yeylon
Target Milestone: ---
Keywords: ZStream
Target Release: 3.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: infra
Fixed In Version: org.ovirt.engine-root-3.5.0-19
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1159761
Environment:
Last Closed: 2015-02-17 17:14:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1122979, 1159761
Attachments:
Description                     Flags
engine.log                      none
logs from engine and pserver    none
vdsm.log                        none

Description Jiri Belka 2014-10-10 16:33:44 UTC
Created attachment 945737 [details]
engine.log

Description of problem:

I tried to restart a host via PM (Power Management) while it was in maintenance. The host was stopped, but the subsequent start failed for some reason...

...
2014-10-10 17:57:22,729 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (org.ovirt.thread.pool-4-thread-35) [179ac2f6] START, FenceVdsVDSCommand(HostName = ibm-p8-rhevm-hv-01.lab.bos.redhat.com, HostId = f48e6a95-78fe-47ac-956b-34d1cfd8425f, targetVdsId = 18befb72-2bdc-499c-bc1f-d5cc789c578c, action = Stop, ip = bandelier-fsp.lab.bos.redhat.com, port = , type = ipmilan, user = root, password = ******, options = 'lanplus=1,cipher=1'), log id: 43f17c18
...
2014-10-10 17:57:40,553 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (org.ovirt.thread.pool-4-thread-35) START, FenceVdsVDSCommand(HostName = ibm-p8-rhevm-hv-01.lab.bos.redhat.com, HostId = f48e6a95-78fe-47ac-956b-34d1cfd8425f, targetVdsId = 18befb72-2bdc-499c-bc1f-d5cc789c578c, action = Start, ip = bandelier-fsp.lab.bos.redhat.com, port = , type = ipmilan, user = root, password = ******, options = 'lanplus=1,cipher=1'), log id: 5b0af10f

2014-10-10 18:00:50,815 ERROR [org.ovirt.engine.core.bll.StartVdsCommand] (org.ovirt.thread.pool-4-thread-35) Failed to verify host bandelier.lab.bos.redhat.com start status. Have retried 18 times with delay of 10 seconds between each retry.
2014-10-10 18:00:50,881 INFO  [org.ovirt.engine.core.bll.FenceExecutor] (org.ovirt.thread.pool-4-thread-35) Executing <Start> Power Management command, Proxy Host:null, Agent:ipmilan, Target Host:bandelier.lab.bos.redhat.com, Management IP:bandelier-fsp.lab.bos.redhat.com, User:root, Options:lanplus=1,cipher=1
2014-10-10 18:00:50,889 ERROR [org.ovirt.engine.core.vdsbroker.ResourceManager] (org.ovirt.thread.pool-4-thread-35) CreateCommand failed: java.lang.NullPointerException
        at java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333) [rt.jar:1.7.0_65]
        at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988) [rt.jar:1.7.0_65]
...

OK, the engine tried... It could try to start the host again via IPMI rather than give up...

'power on' via ipmitool works ok.

Version-Release number of selected component (if applicable):
rhevm-backend-3.4.3-1.1.el6ev.noarch

How reproducible:
??

Steps to Reproduce:
1. Put a ppc64 host into maintenance, then restart it via PM.

Actual results:
It looks like the engine tries to power up the host only once.

Expected results:
The engine should try again to power up the host.
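
The expected behaviour could be sketched roughly as below. This is only an illustration, not the engine's implementation: `send_start` and `get_status` are hypothetical callables standing in for the fence agent, and `start_attempts=3` is an arbitrary assumption. The 18 status checks and the 10 second delay are taken from the StartVdsCommand error message above.

```python
import time


def start_with_retries(send_start, get_status, start_attempts=3,
                       status_checks=18, delay_seconds=10, sleep=time.sleep):
    """Re-send 'Start' when the host does not come up after polling.

    send_start and get_status are hypothetical stand-ins for the fence
    agent; get_status() is assumed to return "on" or "off".
    Returns the 1-based number of the attempt that succeeded, or 0 if
    the host never reported "on".
    """
    for attempt in range(1, start_attempts + 1):
        send_start()
        # Poll the power status, as the engine log shows it doing today.
        for _ in range(status_checks):
            if get_status() == "on":
                return attempt
            sleep(delay_seconds)
        # Still off after all status checks: loop around and send Start again.
    return 0
```

The `sleep` parameter exists only so the loop can be exercised without waiting; the current behaviour described in this report corresponds to `start_attempts=1`.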

Additional info:

Comment 1 Jiri Belka 2014-10-10 16:35:40 UTC
trying only once...

# grep 'targetVdsId = 18befb72-2bdc-499c-bc1f-d5cc789c578c.*action = ' /var/log/ovirt-engine/engine.log | grep '17:5' | sed 's/.*\(action = [^,]*\).*/\1/' 
action = Status
action = Stop
action = Status
action = Status
action = Start
action = Status
action = Status
action = Status
action = Status
action = Status
action = Status
action = Status
action = Status
action = Status
action = Status
action = Status
action = Status
action = Status
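
For anyone who wants to repeat this check without the grep/sed pipeline, here is a rough Python equivalent. It is a sketch under assumptions: the function names are made up for illustration, it expects log lines shaped like the FenceVdsVDSCommand lines quoted in the description, and it does not apply the '17:5' time filter used above.

```python
import re
from collections import Counter

# Picks the target host id and the action out of a FenceVdsVDSCommand log line.
FENCE_RE = re.compile(
    r"targetVdsId = (?P<target>[0-9a-f-]+).*?action = (?P<action>[^,]+)"
)


def fence_actions(lines, target_id):
    """Return the fence actions logged for target_id, in log order."""
    actions = []
    for line in lines:
        match = FENCE_RE.search(line)
        if match and match.group("target") == target_id:
            actions.append(match.group("action").strip())
    return actions


def count_actions(lines, target_id):
    """Count how often each fence action was logged for target_id."""
    return Counter(fence_actions(lines, target_id))
```

With the engine log opened as `log_file`, `count_actions(log_file, "18befb72-2bdc-499c-bc1f-d5cc789c578c")` gives the number of Start, Stop and Status actions, which makes a single Start easy to spot.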

Comment 2 Jiri Belka 2014-10-10 16:45:47 UTC
correction:

IIRC the restart was not done from maintenance; the host was up at the time.

Comment 3 Jiri Belka 2014-10-10 16:59:56 UTC
I tried with another pserver that was in maintenance, and it is also down after the reboot.

Comment 5 Jiri Belka 2014-10-10 19:49:50 UTC
I updated the pserver firmware to SV810_087 and it didn't help, so it must be an engine issue.

Comment 6 Eli Mesika 2014-10-20 11:56:44 UTC
Please provide the VDSM log of the host that serves as the proxy for the fencing action.

Comment 7 Jiri Belka 2014-10-20 14:47:44 UTC
Meanwhile I tested the scenario on x86 hosts, and everything went fine.

I'm waiting to get my hands on free Power 8 servers...

Comment 8 Jiri Belka 2014-10-21 09:23:04 UTC
Created attachment 948866 [details]
logs from engine and pserver

Comment 11 Jiri Belka 2014-10-21 12:41:53 UTC
3.4.3 and vdsm-4.14.17-1.pkvm2_1.ppc64

Comment 15 Jiri Belka 2014-10-23 13:19:56 UTC
Stopping from maintenance and then starting from the 'off' state works OK.

Comment 16 Marek Grac 2014-10-23 13:22:42 UTC
@Eli:

Running with the additional option 'verbose=1' will also show the underlying ipmitool commands. The output 'Chassis power is off' is probably all that we get from ipmitool.

Comment 17 Eli Mesika 2014-10-23 13:25:55 UTC
Jiri

Please repeat scenario after adding also verbose=1 to your parameters in order to see why the 'on' operation fails

Comment 18 Eli Mesika 2014-10-23 13:28:16 UTC
(In reply to Eli Mesika from comment #17)
> Jiri
> 
> Please repeat scenario after adding also verbose=1 to your parameters in
> order to see why the 'on' operation fails

Also, please try adding power_wait=4 and retry the restart operation.
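
As a small illustration of what the requested parameters look like once combined with the existing ones, here is a hedged sketch. `merge_fence_options` is a hypothetical helper, not part of the engine or of fence-agents; it only assumes the comma-separated key=value format seen in the log (options = 'lanplus=1,cipher=1').

```python
def merge_fence_options(options, **extra):
    """Merge extra options into a comma-separated fence options string.

    Hypothetical helper: existing keys keep their position, new keys are
    appended, and a key given twice takes the newer value.
    """
    merged = {}
    for pair in options.split(","):
        pair = pair.strip()
        if not pair:
            continue
        key, sep, value = pair.partition("=")
        # Options without '=' are kept as bare flags.
        merged[key] = value if sep else None
    for key, value in extra.items():
        merged[key] = str(value)
    return ",".join(k if v is None else f"{k}={v}" for k, v in merged.items())


# The options from the log plus the two requested in comments 17 and 18.
print(merge_fence_options("lanplus=1,cipher=1", verbose=1, power_wait=4))
# prints: lanplus=1,cipher=1,verbose=1,power_wait=4
```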

Comment 19 Jiri Belka 2014-10-23 13:49:46 UTC
Created attachment 949890 [details]
vdsm.log

Log from the proxy with the requested options.

Comment 20 Marek Grac 2014-10-23 13:59:17 UTC
I don't know which hardware you have, but for some devices (e.g. HP iLO3, which is also IPMI) we use the option 'retry_on', which defines how many times we should try.

A full list of possible arguments, with their defaults, is now available at:

https://fedorahosted.org/cluster/wiki/FenceArguments

Comment 27 Jiri Belka 2014-11-07 14:54:09 UTC
OK, works the same as in BZ1159761.

Comment 28 Eyal Edri 2015-02-17 17:14:34 UTC
RHEV 3.5.0 was released. Closing.