Bug 858350
| Summary: | RHEV 3.1 Beta1 - DC not coming back online after orderly shutdown | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Steve Reichard <sreichar> |
| Component: | ovirt-engine | Assignee: | Ayal Baron <abaron> |
| Status: | CLOSED WONTFIX | QA Contact: | Haim <hateya> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.1.0 | CC: | abaron, amureini, bthurber, dyasny, fsimonce, iheim, lpeer, Rhev-m-bugs, sgrinber, yeylon, ykaul |
| Target Milestone: | --- | | |
| Target Release: | 3.2.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | storage | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2012-12-26 09:52:36 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Created attachment 614098 [details]
hyper1 logs
Created attachment 614100 [details]
hyper2 logs
The issue is that storage_pool_format_type became (or already was) NULL:

```
engine=> select id,name,storage_pool_format_type from storage_pool;
                  id                  |   name    | storage_pool_format_type
--------------------------------------+-----------+--------------------------
 f7728d06-6d04-43d1-807b-e838b6ee3eb8 | Secondary | 3
 989ca2aa-e875-11e1-95f7-00215a5c2e44 | Default   | NULL
```
Logs:

```
2012-09-21 09:31:27,210 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (QuartzScheduler_Worker-71) [7f614b79] START, SpmStartVDSCommand(vdsId = 5f4fbe68-e89f-11e1-8b0a-00215a5c2e44, storagePoolId = 989ca2aa-e875-11e1-95f7-00215a5c2e44, prevId=-1, prevLVER=0, storagePoolFormatType=null, recoveryMode=Manual, SCSIFencing=false), log id: 268f760b
2012-09-21 09:31:27,210 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-71) [7f614b79] Failed in SpmStartVDS method, for vds: br-rhelhyp1; host: 10.16.136.100
2012-09-21 09:31:27,211 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (QuartzScheduler_Worker-71) [7f614b79] Command SpmStartVDS execution failed. Exception: NullPointerException
```
Relevant code, SpmStartVDSCommand.java:91:

```java
_result = getBroker().spmStart(getParameters().getStoragePoolId().toString(),
        getParameters().getPrevId(), getParameters().getPrevLVER(),
        getParameters().getRecoveryMode().getValue(),
        (new Boolean(getParameters().getSCSIFencing())).toString().toLowerCase(),
        Config.<Integer> GetValue(ConfigValues.MaxNumberOfHostsInStoragePool),
        getParameters().getStoragePoolFormatType().getValue());
```
Where getParameters().getStoragePoolFormatType() would have been null.
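The failure mode can be reproduced in isolation. Below is a minimal sketch with hypothetical stand-in types (not the engine's actual classes): a NULL database column mapped straight to a null enum reference only blows up when the value is dereferenced at call time, which is exactly the shape of line 91 above.

```java
public class NpeSketch {
    // Hypothetical stand-in for the engine's storage format enum.
    enum StorageFormatType {
        V1("1"), V2("2"), V3("3");
        private final String value;
        StorageFormatType(String v) { this.value = v; }
        String getValue() { return value; }
    }

    // Mirrors the dereference at SpmStartVDSCommand.java:91: if the DB row
    // had a NULL storage_pool_format_type, 'type' is null and this throws.
    static String formatValue(StorageFormatType type) {
        return type.getValue();
    }

    public static void main(String[] args) {
        try {
            formatValue(null); // what the engine effectively did
        } catch (NullPointerException e) {
            System.out.println("SpmStartVDS fails before the request ever reaches VDSM");
        }
    }
}
```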
Changing the storage_pool_format_type to 3 solved the problem:

```
engine=> update storage_pool set storage_pool_format_type=3 where id='989ca2aa-e875-11e1-95f7-00215a5c2e44';
engine=> select id,name,storage_pool_format_type from storage_pool;
                  id                  |   name    | storage_pool_format_type
--------------------------------------+-----------+--------------------------
 f7728d06-6d04-43d1-807b-e838b6ee3eb8 | Secondary | 3
 989ca2aa-e875-11e1-95f7-00215a5c2e44 | Default   | 3
```
The most important thing to investigate (besides the NULL value in the DB) is why the host was reported as SPM after that SpmStartVDSCommand *failed* with a NullPointerException.
P.S. VDSM was reporting the correct SPM status: {'spm_st': {'spmId': -1, 'spmStatus': 'Free', 'spmLver': 0}}
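For contrast, the consistency check the engine should pass can be expressed as a hedged sketch (hypothetical helper names, not engine code): given the fields VDSM actually reported, no reading of that status makes the host SPM.

```java
import java.util.Map;

public class SpmStatusSketch {
    // Hypothetical helper: interpret the 'spm_st' fields VDSM returns.
    // A host is SPM only when VDSM says spmStatus == 'SPM'.
    static boolean isSpm(Map<String, ?> spmSt) {
        return "SPM".equals(spmSt.get("spmStatus"));
    }

    public static void main(String[] args) {
        // The exact values VDSM reported in this bug.
        Map<String, Object> reported =
                Map.of("spmId", -1, "spmStatus", "Free", "spmLver", 0);
        System.out.println(isSpm(reported)); // prints false
    }
}
```

By this reading the engine's UI claim that a host was serving as SPM directly contradicts VDSM's own report.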
(In reply to comment #3)

The NULL value is another bug which is already being worked on: bug 851003 - StoragePool storage_pool_format_type is null after upgrade (lost in update). With regard to the other issue - it needs to be investigated.
Created attachment 614090 [details]
rhevm logs

Description of problem:

Summary: Because of a planned site outage, my RHEV 3.1 Beta1 environment was brought down. Upon power up, one of the redundant FC controllers failed. This should have caused path failures, but other paths would have been operational. The DC does not come back online. Attempts to activate the storage get the following error:

Failed to activate Storage Domain brFC (Data Center Default) by admin@internal

Details:

Shutdown -
+ hypervisors were put into maintenance
+ hypervisors were shut down
+ rhevm was shut down
+ SC were shut down
+ equipment was powered off

Boot up -
+ switches were powered
+ storage was powered
+ systems were powered
+ rhevm was booted
+ hypervisors were booted

RHEV-M showed that the hosts were up and one was serving as SPM. Logging into this host, the LVs could be read. Access to the storage was lost when this host was put into maintenance.

As further debug/characterization:
+ hypervisors were brought in and out of maintenance
+ hypervisors were rebooted while in maintenance
+ RHEV-M was rebooted
+ storage was shut down and repowered (still fails)
+ the failed controller was pulled
+ RHEV was rebooted

The DC still did not come online.

One host was put into maintenance. A new DC was created. A new cluster was created. The host was edited to be in the new DC. A new LUN was presented from the same storage array. A new Data Domain was created. This DC came online.

It appears to me that the problem is not something with the storage array, since the new DC could come online; however, the failed paths are an issue for the old DC?

I will be attaching logs from both the RHEV-M and the hypervisors.

{Is log collector supposed to work? Did I do something wrong?

```
[root@br-rhevm ovirt-engine]# rhevm-log-collector
INFO: /tmp/logcollector does not exist. It will be created.
Please provide the REST API password for the admin@internal RHEV-M user (CTRL+D to skip):
Host list (datacenter=None, cluster=None, host=None):
Data Center | Cluster         | Hostname/IP Address
Default     | Default         | 10.16.136.100
Secondary   | Secondary-clus1 | 10.16.136.101
[root@br-rhevm ovirt-engine]# ls /tmp/logcollector/
[root@br-rhevm ovirt-engine]#
```
}

Version-Release number of selected component (if applicable):

```
Red Hat Enterprise Virtualization Manager Version: '3.1.0-11.el6ev'
(c) 2012 Red Hat Inc. All rights reserved.

[root@br-rhevm ovirt-engine]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.3 (Santiago)
[root@br-rhevm ovirt-engine]# uname -a
Linux br-rhevm.cloud.lab.eng.bos.redhat.com 2.6.32-279.2.1.el6.x86_64 #1 SMP Thu Jul 5 21:08:58 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
[root@br-rhevm ovirt-engine]# yum list installed | grep -e rhev -e virt -e vdsm
apache-commons-jxpath.noarch       1.3-5.1_redhat_1.ep6.el6     @rhel-x86_64-server-6-rhevm-3-beta
libvirt.x86_64                     0.9.10-21.el6_3.3            @rhel-x86_64-server-6
libvirt-client.x86_64              0.9.10-21.el6_3.3            @rhel-x86_64-server-6
libvirt-python.x86_64              0.9.10-21.el6_3.3            @rhel-x86_64-server-6
rhev-guest-tools-iso.noarch        3.1-4                        @rhel-x86_64-server-6-rhevm-3-beta
rhev-hypervisor6.noarch            6.3-20120813.0.rhev31.el6_3  @rhel-x86_64-server-6-rhevh-beta
rhevm.noarch                       3.1.0-11.el6ev               @rhel-x86_64-server-6-rhevm-3-beta
rhevm-backend.noarch               3.1.0-11.el6ev               @rhel-x86_64-server-6-rhevm-3-beta
rhevm-config.noarch                3.1.0-11.el6ev               @rhel-x86_64-server-6-rhevm-3-beta
rhevm-dbscripts.noarch             3.1.0-11.el6ev               @rhel-x86_64-server-6-rhevm-3-beta
rhevm-doc.noarch                   3.1.0-7.el6eng               @rhel-x86_64-server-6-rhevm-3-beta
rhevm-genericapi.noarch            3.1.0-11.el6ev               @rhel-x86_64-server-6-rhevm-3-beta
rhevm-image-uploader.noarch        3.1.0-1.el6ev                @rhel-x86_64-server-6-rhevm-3-beta
rhevm-iso-uploader.noarch          3.1.0-2.el6ev                @rhel-x86_64-server-6-rhevm-3-beta
rhevm-log-collector.noarch         3.1.0-2.el6ev                @rhel-x86_64-server-6-rhevm-3-beta
rhevm-notification-service.noarch  3.1.0-11.el6ev               @rhel-x86_64-server-6-rhevm-3-beta
rhevm-restapi.noarch               3.1.0-11.el6ev               @rhel-x86_64-server-6-rhevm-3-beta
rhevm-sdk.noarch                   3.1.0.5-1.el6ev              @rhel-x86_64-server-6-rhevm-3-beta
rhevm-setup.noarch                 3.1.0-11.el6ev               @rhel-x86_64-server-6-rhevm-3-beta
rhevm-tools-common.noarch          3.1.0-11.el6ev               @rhel-x86_64-server-6-rhevm-3-beta
rhevm-userportal.noarch            3.1.0-11.el6ev               @rhel-x86_64-server-6-rhevm-3-beta
rhevm-webadmin-portal.noarch       3.1.0-11.el6ev               @rhel-x86_64-server-6-rhevm-3-beta
rhn-virtualization-common.noarch   5.4.14-9.el6sat              @rhn-tools-rhel-x86_64-server-6
rhn-virtualization-host.noarch     5.4.14-9.el6sat              @rhn-tools-rhel-x86_64-server-6
vdsm-bootstrap.noarch              4.9.6-28.0.el6_3             @rhel-x86_64-server-6-rhevm-3-beta
virt-viewer.x86_64                 0.5.2-9.el6                  @rhel-x86_64-server-6
virt-what.x86_64                   1.11-1.1.el6                 @anaconda-RedHatEnterpriseLinux-201206132210.x86_64/6.3
[root@br-rhevm ovirt-engine]#

[root@br-rhelhyp1 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.3 (Santiago)
[root@br-rhelhyp1 ~]# uname -a
Linux br-rhelhyp1.cloud.lab.eng.bos.redhat.com 2.6.32-279.2.1.el6.x86_64 #1 SMP Thu Jul 5 21:08:58 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
[root@br-rhelhyp1 ~]# yum list installed | grep -e vdsm -e rhev -e virt
libvirt.x86_64                     0.9.10-21.el6_3.3            @rhel-x86_64-server-6
libvirt-client.x86_64              0.9.10-21.el6_3.3            @rhel-x86_64-server-6
libvirt-python.x86_64              0.9.10-21.el6_3.3            @rhel-x86_64-server-6
qemu-img-rhev.x86_64               2:0.12.1.2-2.298.el6_3       @rhel-x86_64-rhev-mgmt-agent-6-beta
qemu-kvm-rhev.x86_64               2:0.12.1.2-2.298.el6_3       @rhel-x86_64-rhev-mgmt-agent-6-beta
qemu-kvm-rhev-tools.x86_64         2:0.12.1.2-2.298.el6_3       @rhel-x86_64-rhev-mgmt-agent-6-beta
rhn-virtualization-common.noarch   5.4.14-9.el6sat              @rhn-tools-rhel-x86_64-server-6
rhn-virtualization-host.noarch     5.4.14-9.el6sat              @rhn-tools-rhel-x86_64-server-6
sanlock.x86_64                     2.3-3.el6_3                  @rhel-x86_64-rhev-mgmt-agent-6-beta
sanlock-lib.x86_64                 2.3-3.el6_3                  @rhel-x86_64-rhev-mgmt-agent-6-beta
sanlock-python.x86_64              2.3-3.el6_3                  @rhel-x86_64-rhev-mgmt-agent-6-beta
vdsm.x86_64                        4.9.6-27.0.el6_3             @rhel-x86_64-rhev-mgmt-agent-6-beta
vdsm-cli.noarch                    4.9.6-27.0.el6_3             @rhel-x86_64-rhev-mgmt-agent-6-beta
vdsm-python.x86_64                 4.9.6-27.0.el6_3             @rhel-x86_64-rhev-mgmt-agent-6-beta
virt-viewer.x86_64                 0.5.2-9.el6                  @rhel-x86_64-server-6
virt-what.x86_64                   1.11-1.1.el6                 @anaconda-RedHatEnterpriseLinux-201206132210.x86_64/6.3
[root@br-rhelhyp1 ~]#
```

How reproducible:
Unknown

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info: