Created attachment 1092711 sosreport

Description of problem:
The HE VM was installed and configured successfully on RHEV-H 7.1-20151015. The host was then upgraded to RHEV-H 7.2-20151104 via the boot command line. After the upgrade succeeded, the HE VM cannot start automatically.

Version-Release number of selected component (if applicable):
rhev-hypervisor7-7.1-20151015.0.iso
ovirt-node-3.2.3-23.el7.noarch
rhev-hypervisor7-7.2-20151104.0.iso
ovirt-node-3.6.0-0.20.20151103git3d3779a.el7ev.noarch
rhevm-appliance-20151014.1-1.x86_64.rhevm.ova

How reproducible:
100%

Steps to Reproduce:
1. PXE install rhev-hypervisor7-7.1-20151015.0.iso.
2. Install and configure the HE VM, and confirm it is running.
3. Upgrade to RHEV-H 7.2-20151104 by passing the "upgrade" parameter on the boot command line:
initrd=/images/rhevh-vdsm7-7.2-20151104.0_36/initrd0.img ksdevice=bootif rootflags=loop rootflags=ro rd.dm=0 rd_NO_MULTIPATH rd.md=0 crashkernel=256M rootfstype=auto lang= max_loop=256 rhgb quiet elevator=deadline rd.live.check rd.luks=0 install ro root=live:/rhev-hypervisor7-7.2-20151104.0.iso rd.live.image BOOTIF=01-5c-f3-fc-e9-c0-c8 upgrade BOOT_IMAGE=/images/rhevh-vdsm7-7.2-20151104.0_36/vmlinuz0

Actual results:
After step 3, the HE VM fails to start.

Expected results:
After step 3, the HE VM should start and run successfully.

Additional info:
The following errors appear in /var/log/ovirt-hosted-engine-ha/agent.log:
[root@ibm-x3650m3-02 ovirt-hosted-engine-ha]# cat agent.log
...
...
MainThread::INFO::2015-11-11 10:06:24,862::upgrade::165::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(_is_conf_volume_there) Found conf volume: imgUUID:2e528651-6262-49bc-abd5-06d036e109c4, volUUID:f0be421a-9466-4cd9-b877-35387f33af70
MainThread::INFO::2015-11-11 10:06:24,992::upgrade::670::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(_startMonitoringDomain) Start monitoring domain
MainThread::INFO::2015-11-11 10:06:55,568::upgrade::288::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(_get_conffile_content) Reading conf file: hosted-engine.conf
MainThread::ERROR::2015-11-11 10:06:55,639::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'Error executing: 1 - stdout: - stderr:mv: cannot backup ‘/etc/ovirt-hosted-engine/hosted-engine.conf’: Device or resource busy ' - trying to restart agent
MainThread::WARNING::2015-11-11 10:07:00,645::agent::208::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Restarting agent, attempt '9'
MainThread::ERROR::2015-11-11 10:07:00,645::agent::210::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Too many errors occurred, giving up. Please review the log and consider filing a bug.

[root@ibm-x3650m3-02 mnt]# hosted-engine --vm-status

--== Host 1 status ==--

Status up-to-date                  : False
Hostname                           : ibm-x3650m3-02.qe.lab.eng.nay.redhat.com
Host ID                            : 1
Engine status                      : unknown stale-data
Score                              : 2400
Local maintenance                  : False
Host timestamp                     : 564
Extra metadata (valid at timestamp):
	metadata_parse_version=1
	metadata_feature_version=1
	timestamp=564 (Wed Nov 11 09:34:41 2015)
	host-id=1
	score=2400
	maintenance=False
	state=EngineStarting
Error: 'Error executing: 1 - stdout: - stderr:mv: cannot backup ‘/etc/ovirt-hosted-engine/hosted-engine.conf’: Device or resource busy

This seems to be related to persistence, but it may be only one of the issues preventing the start.
To me this one is another symptom of bz#1280268. Could you please also share the ovirt-node.log? Thanks!
Created attachment 1094846 /var/log and sosreport
Douglas, this bug needs a release note for RHEV-H 7.2 in 3.6 Beta 1.
The key issue here is:

10:06:55,639::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'Error executing: 1 - stdout: - stderr:mv: cannot backup ‘/etc/ovirt-hosted-engine/hosted-engine.conf’: Device or resource busy

The upgrade process from 3.5 to 3.6 upgrades the configuration and moves some parts of it to the shared storage. So far this has been tested on RHEL, and we need to ensure it is properly supported on RHEV-H as well. This is not the same as bug 1280268, since this flow is specific to the 3.5 to 3.6 upgrade procedure.
Moved this bug to ovirt-hosted-engine-ha, because the persistence-related code fix is needed there.
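The agent log shows mv failing with "Device or resource busy" on /etc/ovirt-hosted-engine/hosted-engine.conf. One plausible explanation, assumed here and not confirmed in this bug, is that RHEV-H persists that file by bind-mounting it from the /config partition, and rename(2) onto a mount point fails with EBUSY. The Python sketch below shows one way to write such a file: try an atomic rename first, and fall back to overwriting the existing file in place when the rename fails with EBUSY. write_config is a hypothetical helper for illustration, not the actual ovirt-hosted-engine-ha fix.

```python
import errno
import os
import shutil
import tempfile


def write_config(path, content, rename=os.replace):
    """Hypothetical helper: write `content` to `path`.

    Tries an atomic rename first. If the target cannot be renamed over
    (EBUSY, e.g. when `path` is assumed to be a bind-mount point), it
    copies the new contents into the existing file instead.
    Returns "renamed" or "overwritten".
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Stage the new contents next to the target so rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(content)
        try:
            rename(tmp_path, path)
            return "renamed"
        except OSError as exc:
            if exc.errno != errno.EBUSY:
                raise
            # The mount point cannot be replaced; truncate and rewrite the
            # existing file so the bind mount keeps pointing at it.
            shutil.copyfile(tmp_path, path)
            os.unlink(tmp_path)
            return "overwritten"
    except BaseException:
        # Clean up the staging file on any failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The in-place fallback keeps the bind mount intact but is not atomic, so a concurrent reader could see a partially written file. The rename path also leaves the new file with mkstemp's 0600 mode; a real implementation would need to preserve the original mode and ownership.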
Please fill in the Fixed In Version field.
Verified.

Upgrade from 3.5 - Red Hat Enterprise Virtualization Hypervisor release 7.1 (20151015.0.el7ev)
==============================================================================================
ovirt-hosted-engine-ha-1.2.7.2-1.el7ev.noarch
ovirt-hosted-engine-setup-1.2.6.1-1.el7ev.noarch
vdsm-4.16.27-1.el7ev.x86_64

to 3.6 - Red Hat Enterprise Virtualization Hypervisor (Beta) release 7.2 (20151221.1.el7ev)
==============================================================================================
ovirt-hosted-engine-setup-1.3.1.3-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.3.6-1.el7ev.noarch
vdsm-4.17.13-1.el7ev.noarch

1) Install RHEV-H 3.5.
2) Deploy hosted-engine on one host with NFS storage.
3) Enable global maintenance (on the host: hosted-engine --set-maintenance --mode=global).
4) Upgrade the engine to 3.6.
5) Power off the engine VM (hosted-engine --vm-poweroff).
6) Upgrade the host to RHEV-H 3.6 via USB key.

The upgrade succeeded, and after the host upgrade both ovirt-ha-agent and ovirt-ha-broker are up.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2016-0422.html