Bug 1280310 - Failed to startup HE-VM after upgrade RHEV-H7.1-20151015 to RHEV-H7.2-20151104 (Device or resource busy)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-hosted-engine-ha
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ovirt-3.6.1
Target Release: 3.6.1
Assignee: Martin Sivák
QA Contact: Artyom
URL:
Whiteboard:
Depends On: 1291731
Blocks: 1264065 RHEV3.6Upgrade
Reported: 2015-11-11 12:55 UTC by Chaofeng Wu
Modified: 2016-03-09 19:50 UTC

Fixed In Version: ovirt-hosted-engine-ha-1.3.3.6-1.el7ev
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-03-09 19:50:17 UTC
oVirt Team: SLA
Target Upstream Version:
Embargoed:


Attachments
sosreport (5.89 MB, application/x-xz)
2015-11-11 12:55 UTC, Chaofeng Wu
/var/log and sosreport (7.15 MB, application/x-gzip)
2015-11-16 10:42 UTC, Chaofeng Wu


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1280268 0 urgent CLOSED HE-VM cannot startup automatically after successful configure HE 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHEA-2016:0422 0 normal SHIPPED_LIVE ovirt-hosted-engine-ha bug fix and enhancement update 2016-03-09 23:58:25 UTC
oVirt gerrit 48700 0 master MERGED upgrade: Use unpersist/persist cmds for ovirt node Never
oVirt gerrit 50166 0 ovirt-hosted-engine-ha-1.3 MERGED upgrade: Use unpersist/persist cmds for ovirt node Never
oVirt gerrit 50782 0 master MERGED Revert "persistence: Use pythonic persistence" 2015-12-21 11:44:34 UTC
oVirt gerrit 50783 0 master MERGED upgrade: Add (un)persist verbs to sudo 2015-12-21 11:44:38 UTC
oVirt gerrit 50797 0 ovirt-hosted-engine-ha-1.3 MERGED Revert "persistence: Use pythonic persistence" 2015-12-21 11:45:20 UTC
oVirt gerrit 50798 0 ovirt-hosted-engine-ha-1.3 MERGED upgrade: Add (un)persist verbs to sudo 2015-12-21 11:45:45 UTC

Internal Links: 1280268

Description Chaofeng Wu 2015-11-11 12:55:20 UTC
Created attachment 1092711
sosreport

Description of problem:
Installed and configured the HE-VM successfully on RHEV-H7.1-20151015, then upgraded to RHEV-H7.2-20151104 via the command line. After the upgrade completes successfully, the HE-VM does not start up automatically.

Version-Release number of selected component (if applicable):
rhev-hypervisor7-7.1-20151015.0.iso
ovirt-node-3.2.3-23.el7.noarch
rhev-hypervisor7-7.2-20151104.0.iso
ovirt-node-3.6.0-0.20.20151103git3d3779a.el7ev.noarch
rhevm-appliance-20151014.1-1.x86_64.rhevm.ova

How reproducible:
100%

Steps to Reproduce:
1. PXE install rhev-hypervisor7-7.1-20151015.0.iso
2. Install and configure the HE-VM, and confirm that it is running.
3. Upgrade to RHEV-H7.2-20151104 by passing the "upgrade" parameter on the boot command line:
initrd=/images/rhevh-vdsm7-7.2-20151104.0_36/initrd0.img ksdevice=bootif rootflags=loop rootflags=ro rd.dm=0 rd_NO_MULTIPATH rd.md=0 crashkernel=256M rootfstype=auto lang=  max_loop=256 rhgb quiet elevator=deadline rd.live.check rd.luks=0 install ro root=live:/rhev-hypervisor7-7.2-20151104.0.iso rd.live.image  BOOTIF=01-5c-f3-fc-e9-c0-c8 upgrade BOOT_IMAGE=/images/rhevh-vdsm7-7.2-20151104.0_36/vmlinuz0

Actual results:
After step 3, the HE-VM fails to start up.

Expected results:
After step 3, the HE-VM should start up and run successfully.

Additional info:
You can find some errors in /var/log/ovirt-hosted-engine-ha/agent.log:
[root@ibm-x3650m3-02 ovirt-hosted-engine-ha]# cat agent.log 
... ...
MainThread::INFO::2015-11-11 10:06:24,862::upgrade::165::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(_is_conf_volume_there) Found conf volume: imgUUID:2e528651-6262-49bc-abd5-06d036e109c4, volUUID:f0be421a-9466-4cd9-b877-35387f33af70
MainThread::INFO::2015-11-11 10:06:24,992::upgrade::670::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(_startMonitoringDomain) Start monitoring domain
MainThread::INFO::2015-11-11 10:06:55,568::upgrade::288::ovirt_hosted_engine_ha.lib.upgrade.StorageServer::(_get_conffile_content) Reading conf file: hosted-engine.conf
MainThread::ERROR::2015-11-11 10:06:55,639::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'Error executing: 1 - stdout: - stderr:mv: cannot backup ‘/etc/ovirt-hosted-engine/hosted-engine.conf’: Device or resource busy
' - trying to restart agent
MainThread::WARNING::2015-11-11 10:07:00,645::agent::208::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Restarting agent, attempt '9'
MainThread::ERROR::2015-11-11 10:07:00,645::agent::210::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Too many errors occurred, giving up. Please review the log and consider filing a bug.
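
For anyone triaging a similar agent.log, here is a minimal Python sketch that pulls out the ERROR records. The field layout (thread::LEVEL::timestamp::module::line::logger::(function) message) is only inferred from the excerpt above and is an assumption, as are the names `parse_agent_log` and `errors`. Lines that do not match the layout are treated as continuations of the previous record, which is how the multi-line stderr above appears to be written.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple

# Assumed layout, inferred only from the log excerpt in this report:
# thread::LEVEL::timestamp::module::lineno::logger::(function) message
LINE_RE = re.compile(
    r"^(?P<thread>[^:]+)::(?P<level>[A-Z]+)::"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r"(?P<module>[^:]+)::(?P<lineno>\d+)::"
    r"(?P<logger>[^:]+)::\((?P<function>[^)]*)\) ?(?P<message>.*)$"
)


class Record(NamedTuple):
    level: str
    timestamp: str
    logger: str
    function: str
    message: str


def parse_agent_log(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one Record per log entry, folding continuation lines into it."""
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        match = LINE_RE.match(line)
        if match:
            if current is not None:
                yield current
            current = Record(
                match["level"], match["timestamp"], match["logger"],
                match["function"], match["message"],
            )
        elif current is not None:
            # Not a new entry: append to the previous message (multi-line stderr).
            current = current._replace(message=current.message + "\n" + line)
    if current is not None:
        yield current


def errors(lines: Iterable[str]) -> List[Record]:
    """Return only the ERROR-level records."""
    return [rec for rec in parse_agent_log(lines) if rec.level == "ERROR"]
```

Passing an open file object for agent.log to `errors()` should return the "Device or resource busy" entry and the final "Too many errors occurred" entry from the excerpt.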

[root@ibm-x3650m3-02 mnt]# hosted-engine --vm-status


--== Host 1 status ==--

Status up-to-date                  : False
Hostname                           : ibm-x3650m3-02.qe.lab.eng.nay.redhat.com
Host ID                            : 1
Engine status                      : unknown stale-data
Score                              : 2400
Local maintenance                  : False
Host timestamp                     : 564
Extra metadata (valid at timestamp):
	metadata_parse_version=1
	metadata_feature_version=1
	timestamp=564 (Wed Nov 11 09:34:41 2015)
	host-id=1
	score=2400
	maintenance=False
	state=EngineStarting
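
The status block above can be checked mechanically. This is a rough Python sketch that assumes the "key : value" layout shown in this report and a single host block; the helper names `parse_vm_status` and `looks_stale` are made up for illustration and are not part of hosted-engine.

```python
from typing import Dict


def parse_vm_status(text: str) -> Dict[str, str]:
    """Parse the 'key : value' lines of one host block into a dict.

    Assumption: only one host block is present; with several hosts,
    later values would overwrite earlier ones.
    """
    fields = {}
    for line in text.splitlines():
        # Skip indented extra-metadata lines, banners, and blank lines.
        if line.startswith(("\t", " ")) or " : " not in line:
            continue
        key, _, value = line.partition(" : ")
        fields[key.strip()] = value.strip()
    return fields


def looks_stale(fields: Dict[str, str]) -> bool:
    """True if the host reports outdated status or a stale engine state."""
    return (
        fields.get("Status up-to-date") == "False"
        or "stale-data" in fields.get("Engine status", "")
    )
```

For the output in this report, `looks_stale()` would report the host as stale on both counts: "Status up-to-date" is False and the engine status contains "stale-data".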

Comment 1 Fabian Deutsch 2015-11-11 13:45:15 UTC
Error: 'Error executing: 1 - stdout: - stderr:mv: cannot backup ‘/etc/ovirt-hosted-engine/hosted-engine.conf’: Device or resource busy

Seems to be around persistence, but maybe that's just one issue preventing the start.

Comment 2 Douglas Schilling Landgraf 2015-11-12 16:11:08 UTC
To me this one is another symptom of bz#1280268. Could you please also share the ovirt-node.log?

Thanks!

Comment 3 Chaofeng Wu 2015-11-16 10:42:23 UTC
Created attachment 1094846
/var/log and sosreport

Comment 6 Ying Cui 2015-11-16 13:58:48 UTC
Douglas, for this bug we need a release note for RHEV-H 7.2 for 3.6 beta 1.

Comment 7 Doron Fediuck 2015-11-17 11:52:23 UTC
The key issue here is:
10:06:55,639::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Error: 'Error executing: 1 - stdout: - stderr:mv: cannot backup ‘/etc/ovirt-hosted-engine/hosted-engine.conf’: Device or resource busy

The upgrade process from 3.5 to 3.6 upgrades the configuration and moves
some parts to the shared storage. So far it has been tested on RHEL, and we need to ensure this is properly supported in RHEV-H as well.

This is not the same as bug 1280268 since this flow is related to the upgrade
procedure from 3.5 to 3.6.
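
The gerrit changes linked to this bug are titled "upgrade: Use unpersist/persist cmds for ovirt node". A rough Python sketch of that idea follows; it is not the actual patch. It assumes that a persisted file on the node is held in place by a mount (which would explain why mv reports "Device or resource busy"), that `unpersist` and `persist` take the file path as their only argument, and that they are invoked through sudo as the later "(un)persist verbs to sudo" change suggests. The function names are hypothetical.

```python
import subprocess
from typing import Callable, List

CONF = "/etc/ovirt-hosted-engine/hosted-engine.conf"


def build_replace_commands(new_file: str, target: str = CONF,
                           sudo: bool = True) -> List[List[str]]:
    """Return the command sequence for swapping in a new persisted file.

    Assumption: unpersist/persist accept a single path argument.
    """
    wrap = ["sudo", "-n"] if sudo else []
    return [
        wrap + ["unpersist", target],  # assumed to release the busy target
        ["mv", new_file, target],      # the rename that failed in this bug
        wrap + ["persist", target],    # persist the new content again
    ]


def run_all(commands: List[List[str]],
            runner: Callable[[List[str]], object] = subprocess.check_call) -> None:
    """Run each command in order; check_call raises on the first failure."""
    for cmd in commands:
        runner(cmd)
```

Building the command list separately from running it keeps the sequence easy to inspect on a host where the node tools are not installed.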

Comment 8 Fabian Deutsch 2015-11-23 09:59:53 UTC
Moved it over because the code fix around persistence is needed in ovirt-hosted-engine-ha.

Comment 11 Fabian Deutsch 2015-12-14 14:39:46 UTC
Please fill in the "Fixed In Version" field.

Comment 12 Artyom 2015-12-22 15:53:30 UTC
Verified
Upgrade from 3.5 - Red Hat Enterprise Virtualization Hypervisor release 7.1 (20151015.0.el7ev)
==============================================================================================
ovirt-hosted-engine-ha-1.2.7.2-1.el7ev.noarch
ovirt-hosted-engine-setup-1.2.6.1-1.el7ev.noarch
vdsm-4.16.27-1.el7ev.x86_64

to 3.6 - Red Hat Enterprise Virtualization Hypervisor (Beta) release 7.2 (20151221.1.el7ev)
==============================================================================================
ovirt-hosted-engine-setup-1.3.1.3-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.3.6-1.el7ev.noarch
vdsm-4.17.13-1.el7ev.noarch

1) Install RHEV-H 3.5
2) Deploy hosted-engine on one host with NFS storage
3) Enable global maintenance (on the host: hosted-engine --set-maintenance --mode=global)
4) Upgrade the engine to 3.6
5) Power off the engine VM (hosted-engine --vm-poweroff)
6) Upgrade the host to RHEV-H 3.6 via USB key

The upgrade succeeded, and after the host upgrade ovirt-ha-agent and ovirt-ha-broker are up.
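
The final check in this verification (both HA services up after the host upgrade) could be scripted roughly as below. This is a sketch, assuming the services run as systemd units named ovirt-ha-agent and ovirt-ha-broker; `inactive_services` is a hypothetical helper, and the probe is injectable so the logic can be exercised without systemd.

```python
import subprocess
from typing import Callable, Iterable, List

# Assumed unit names for the two HA services.
SERVICES = ("ovirt-ha-agent", "ovirt-ha-broker")


def systemctl_is_active(unit: str) -> str:
    """Return the state printed by `systemctl is-active <unit>`."""
    result = subprocess.run(
        ["systemctl", "is-active", unit],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout.strip()


def inactive_services(units: Iterable[str] = SERVICES,
                      probe: Callable[[str], str] = systemctl_is_active) -> List[str]:
    """Return the units whose reported state is anything other than 'active'."""
    return [unit for unit in units if probe(unit) != "active"]
```

An empty list from `inactive_services()` corresponds to the "both services up" result reported here.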

Comment 14 errata-xmlrpc 2016-03-09 19:50:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0422.html

