Bug 1151553
| Summary: | [PM] PM-Restart, host still down - ERROR [org.ovirt.engine.core.vdsbroker.ResourceManager] ... CreateCommand failed: java.lang.NullPointerException | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Jiri Belka <jbelka> |
| Component: | ovirt-engine | Assignee: | Eli Mesika <emesika> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Jiri Belka <jbelka> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.4.0 | CC: | bazulay, ecohen, emesika, gklein, iheim, jbelka, lpeer, lsurette, mgrac, michal.skrivanek, oourfali, pstehlik, rbalakri, Rhev-m-bugs, sherold, yeylon |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | 3.5.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | infra | | |
| Fixed In Version: | org.ovirt.engine-root-3.5.0-19 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1159761 | Environment: | |
| Last Closed: | 2015-02-17 17:14:34 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1122979, 1159761 | | |
| Attachments: | | | |

trying only once...

# grep 'targetVdsId = 18befb72-2bdc-499c-bc1f-d5cc789c578c.*action = ' /var/log/ovirt-engine/engine.log | grep '17:5' | sed 's/.*\(action = [^,]*\).*/\1/'
action = Status
action = Stop
action = Status
action = Status
action = Start
action = Status
action = Status
action = Status
action = Status
action = Status
action = Status
action = Status
action = Status
action = Status
action = Status
action = Status
action = Status
action = Status

correction: iirc it was not from maintenance but when host was up. i tried with another pserver which was in maintenance and it is also down after reboot.

I updated pserver firmware to SV810_087 and it didn't help, must be engine issue then.

Please provide VDSM log of the host that serves as a proxy for the fencing action

meanwhile i tested the scenario on x86 hosts, all went good. i'm waiting to get my hands on free Power 8 servers...

Created attachment 948866 [details]
logs from engine and pserver
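
The grep/sed pipeline in the comment above can also be sketched in Python, which may be handier when the attached engine.log is inspected away from the engine host. This is only an illustration: the function name `fence_actions` and its parameters are hypothetical, and it assumes each FenceVdsVDSCommand entry sits on a single line with `action = ...` following `targetVdsId = ...`, as in the excerpts quoted in this bug.

```python
import re

# Captures the value of the "action = <value>" field up to the next comma,
# in the spirit of the sed expression 's/.*\(action = [^,]*\).*/\1/'.
ACTION_RE = re.compile(r"action = ([^,]*)")


def fence_actions(lines, target_vds_id, time_prefix=None):
    """Return the fence actions logged for target_vds_id, in log order.

    lines: iterable of engine.log lines.
    time_prefix: optional substring filter, like the second grep ('17:5').
    """
    target = "targetVdsId = " + target_vds_id
    actions = []
    for line in lines:
        if target not in line:
            continue
        if time_prefix is not None and time_prefix not in line:
            continue
        # Only look at the part of the line after the target id,
        # as the first grep pattern requires.
        match = ACTION_RE.search(line[line.index(target):])
        if match:
            actions.append(match.group(1).strip())
    return actions
```

Called as `fence_actions(open("engine.log"), "18befb72-2bdc-499c-bc1f-d5cc789c578c", "17:5")`, it should yield the same Status/Stop/Start sequence as the shell pipeline, provided the assumptions above hold.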
3.4.3 and vdsm-4.14.17-1.pkvm2_1.ppc64

stopping from maintenance and then starting from 'off' mode works ok.

@Eli: running with additional 'verbose=1' will show also underlying ipmitool commands. The output 'Chassis power is off' is probably all what we have from ipmitool.

Jiri

Please repeat scenario after adding also verbose=1 to your parameters in order to see why the 'on' operation fails

(In reply to Eli Mesika from comment #17)
> Jiri
>
> Please repeat scenario after adding also verbose=1 to your parameters in
> order to see why the 'on' operation fails

also, please try to add power_wait=4 and retry the restart operation....

Created attachment 949890 [details]
vdsm.log
log from proxy with requested options.
I don't know which hardware you have, but for some devices (e.g. HP iLO3, which is also IPMI) we use the option 'retry_on', which defines how many times we should try. The full list of possible arguments, with their defaults, is now available at: https://fedorahosted.org/cluster/wiki/FenceArguments

ok, works same as in BZ1159761

rhev 3.5.0 was released. closing.
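
The comments above ask for extra fence agent options (verbose=1, power_wait=4, possibly retry_on) on top of the 'lanplus=1,cipher=1' string that appears in the engine log. A minimal sketch of merging such options follows, assuming the comma-separated key=value format seen in that log. The helper names are hypothetical and not part of ovirt-engine or the fence agents.

```python
def parse_fence_options(options):
    """Parse 'k1=v1,k2=v2' into a dict, keeping the original order.

    A bare flag without '=' is stored with an empty string as its value.
    """
    parsed = {}
    for item in options.split(","):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        parsed[key.strip()] = value.strip()
    return parsed


def with_fence_options(options, **extra):
    """Return the options string with the extra key=value pairs applied.

    Existing keys are overridden in place; new keys are appended.
    """
    merged = parse_fence_options(options)
    for key, value in extra.items():
        merged[key] = str(value)
    return ",".join(k + "=" + v if v else k for k, v in merged.items())


# The options requested in the comments, applied to the string from the log.
debug_options = with_fence_options("lanplus=1,cipher=1", verbose=1, power_wait=4)
```

A 'retry_on' value could be added the same way (for example `retry_on=3`, where the value 3 is only a placeholder); whether the agent in use honours it should be checked against the FenceArguments page linked above.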
Created attachment 945737 [details]
engine.log

Description of problem:
I tried to restart host via PM when it was in maintenance, it was stopped but then start failed for some reason...

...
2014-10-10 17:57:22,729 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (org.ovirt.thread.pool-4-thread-35) [179ac2f6] START, FenceVdsVDSCommand(HostName = ibm-p8-rhevm-hv-01.lab.bos.redhat.com, HostId = f48e6a95-78fe-47ac-956b-34d1cfd8425f, targetVdsId = 18befb72-2bdc-499c-bc1f-d5cc789c578c, action = Stop, ip = bandelier-fsp.lab.bos.redhat.com, port = , type = ipmilan, user = root, password = ******, options = 'lanplus=1,cipher=1'), log id: 43f17c18
...
2014-10-10 17:57:40,553 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (org.ovirt.thread.pool-4-thread-35) START, FenceVdsVDSCommand(HostName = ibm-p8-rhevm-hv-01.lab.bos.redhat.com, HostId = f48e6a95-78fe-47ac-956b-34d1cfd8425f, targetVdsId = 18befb72-2bdc-499c-bc1f-d5cc789c578c, action = Start, ip = bandelier-fsp.lab.bos.redhat.com, port = , type = ipmilan, user = root, password = ******, options = 'lanplus=1,cipher=1'), log id: 5b0af10f
2014-10-10 18:00:50,815 ERROR [org.ovirt.engine.core.bll.StartVdsCommand] (org.ovirt.thread.pool-4-thread-35) Failed to verify host bandelier.lab.bos.redhat.com start status. Have retried 18 times with delay of 10 seconds between each retry.
2014-10-10 18:00:50,881 INFO [org.ovirt.engine.core.bll.FenceExecutor] (org.ovirt.thread.pool-4-thread-35) Executing <Start> Power Management command, Proxy Host:null, Agent:ipmilan, Target Host:bandelier.lab.bos.redhat.com, Management IP:bandelier-fsp.lab.bos.redhat.com, User:root, Options:lanplus=1,cipher=1
2014-10-10 18:00:50,889 ERROR [org.ovirt.engine.core.vdsbroker.ResourceManager] (org.ovirt.thread.pool-4-thread-35) CreateCommand failed: java.lang.NullPointerException
	at java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333) [rt.jar:1.7.0_65]
	at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988) [rt.jar:1.7.0_65]
...

OK, engine tried... It could try to start again via ipmi rather than give up...

'power on' via ipmitool works ok.

Version-Release number of selected component (if applicable):
rhevm-backend-3.4.3-1.1.el6ev.noarch

How reproducible:
??

Steps to Reproduce:
1. maintenance -> restart for ppc64
2.
3.

Actual results:
looks like engine tries to power up a host just once

Expected results:
should try again to power up the host

Additional info:
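
The expected behaviour described above (re-issue the power-on command when the host never reports 'on') could look roughly like the sketch below. This is not the engine's actual code or the fix that shipped. `send_start` and `get_status` are hypothetical callables standing in for the fence agent's start and status actions, and the defaults of 18 status checks with a 10 second delay are taken from the StartVdsCommand error message in the log.

```python
import time


def start_with_retry(send_start, get_status, start_attempts=2,
                     status_retries=18, delay_seconds=10, sleep=time.sleep):
    """Power a host on, re-issuing 'start' if it never reports 'on'.

    send_start: callable that sends the power-on command (hypothetical).
    get_status: callable returning 'on' or 'off' (hypothetical).
    Returns the 1-based number of the start attempt that succeeded,
    or None if the host stayed off after all attempts.
    """
    for attempt in range(1, start_attempts + 1):
        send_start()
        # Poll the power status before deciding that this attempt failed.
        for _ in range(status_retries):
            if get_status() == "on":
                return attempt
            sleep(delay_seconds)
    return None
```

The `sleep` parameter is injectable only so the loop can be exercised without waiting; in real use the default `time.sleep` applies.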