Bug 1343593 - after hosted-engine --upgrade-appliance action is done, HE VM still runs with old disk
Summary: after hosted-engine --upgrade-appliance action is done, HE VM still runs with old disk
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: ovirt-hosted-engine-setup
Classification: oVirt
Component: General
Version: 2.0.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Simone Tiraboschi
QA Contact: Jiri Belka
URL:
Whiteboard:
Duplicates: 1382543
Depends On: 1347731
Blocks: 1319457
 
Reported: 2016-06-07 14:08 UTC by Jiri Belka
Modified: 2022-02-25 11:11 UTC
CC: 10 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2016-06-23 14:39:19 UTC
oVirt Team: Integration
Embargoed:
sbonazzo: blocker-


Attachments:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1326810 0 urgent CLOSED Cannot edit HE VM via REST API 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 1328921 0 high CLOSED Importing HE with VNC device attached creates corrupted VM in the database 2021-02-22 00:41:40 UTC
Red Hat Issue Tracker RHV-44934 0 None None None 2022-02-25 11:11:24 UTC

Internal Links: 1326810 1328921

Description Jiri Belka 2016-06-07 14:08:13 UTC
Description of problem:

hosted-engine --upgrade-appliance starts the HE VM upgrade and, in its last steps, asks the user to verify that the HE VM runs correctly after the upgrade - at that point the setup is still in progress - and at that time the HE VM is running the 4.0 engine, i.e. the new image is attached correctly.

But later on the setup shuts down the HE VM and requires the user to start it again. This new start brings the HE VM up with the old disk, so we are back on the 3.6 engine :/

~~~
# egrep -i 'add_vm_disk.*create_disk:' /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20160607124454-uswy6r.log
2016-06-07 12:48:02 DEBUG otopi.plugins.gr_he_upgradeappliance.engine.add_vm_disk add_vm_disk._create_disk:221 vol: 2fdacca1-4695-48b0-ba41-4530e34055f5
2016-06-07 12:48:02 DEBUG otopi.plugins.gr_he_upgradeappliance.engine.add_vm_disk add_vm_disk._create_disk:222 img: 116ab426-4a9e-4332-9cd2-38d479af6565
~~~

After the post-upgrade start, libvirt reports:

~~~
# virsh domblklist HostedEngine
Target     Source
------------------------------------------------
vda        /var/run/vdsm/storage/f310489b-a6fe-4f8e-b685-c10d6be57abe/97c39ec3-0318-4dca-9ca5-0af112c80a63/0bc5708f-e522-4c5f-b972-12250178b4bd
hdc        -
~~~

So it seems it is running with the old image again:

~~~
# ( cd /rhev/data-center/mnt/10.34.63.199\:_jbelka_jb-vhe1 ; find . -type f -size +100M | xargs ls -lth )
-rw-rw----. 1 vdsm kvm  50G Jun  7  2016 ./f310489b-a6fe-4f8e-b685-c10d6be57abe/images/97c39ec3-0318-4dca-9ca5-0af112c80a63/0bc5708f-e522-4c5f-b972-12250178b4bd
-rw-rw----. 1 vdsm kvm  50G Jun  7 13:27 ./f310489b-a6fe-4f8e-b685-c10d6be57abe/images/116ab426-4a9e-4332-9cd2-38d479af6565/2fdacca1-4695-48b0-ba41-4530e34055f5
-rw-rw----. 1 vdsm kvm 1.0G Jun  6 15:56 ./f310489b-a6fe-4f8e-b685-c10d6be57abe/images/70ece018-4e4f-4d0e-ae27-e0a38f38c195/1e0b19ec-6bed-4580-8ddc-82cd073c0ec7
~~~


Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-2.0.0-1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. hosted-engine --upgrade-appliance
2. make the action finish successfully
3. when asked to 'restart' vm, start it via hosted-engine --vm-start

Actual results:
the post-upgrade HE VM start brings the VM up with the old image

Expected results:
the HE VM should run with the new (upgraded) image after the upgrade finished successfully

Additional info:
discovered while checking qemu args which had '-machine rhel6.5.0'

# ps auxww | grep -o '[q]emu.*rhel[[:digit:]\.]*'
qemu     23846 18.5 43.9 3940204 1707172 ?     Sl   14:43  15:29 /usr/libexec/qemu-kvm -name HostedEngine -S -machine rhel6.5.0

Comment 2 Jiri Belka 2016-06-07 14:19:46 UTC
Observing the libvirt log for the HE VM, one can see three starts - the 1st is the original HE VM, the 2nd is the HE VM started during the upgrade process, and the 3rd / last one is the HE VM started after the upgrade action finished:

# egrep -o 'rtc base.*file[^,]*' /var/log/libvirt/qemu/HostedEngine.log  | tail -n 3 | sed 's/\-global.*\ \-//'
rtc base=2016-06-07T09:03:22,driftfix=slew drive file=/var/run/vdsm/storage/f310489b-a6fe-4f8e-b685-c10d6be57abe/97c39ec3-0318-4dca-9ca5-0af112c80a63/0bc5708f-e522-4c5f-b972-12250178b4bd
rtc base=2016-06-07T11:08:19,driftfix=slew drive file=/var/run/vdsm/storage/f310489b-a6fe-4f8e-b685-c10d6be57abe/116ab426-4a9e-4332-9cd2-38d479af6565/2fdacca1-4695-48b0-ba41-4530e34055f5
rtc base=2016-06-07T12:43:20,driftfix=slew drive file=/var/run/vdsm/storage/f310489b-a6fe-4f8e-b685-c10d6be57abe/97c39ec3-0318-4dca-9ca5-0af112c80a63/0bc5708f-e522-4c5f-b972-12250178b4b

Comment 3 Simone Tiraboschi 2016-06-08 08:03:00 UTC
Jiri, was your VM configured to use VNC?
AFAIK this is a side effect of:
https://bugzilla.redhat.com/show_bug.cgi?id=1326810
https://bugzilla.redhat.com/show_bug.cgi?id=1328921

So we probably have to retest with a recent version of the engine-appliance which correctly addresses these.

Comment 4 Jiri Belka 2016-06-08 08:10:57 UTC
My HE VM used the default value for display, so it was SPICE. This should for sure be visible in vdsm.log.

Comment 5 Simone Tiraboschi 2016-06-08 08:19:13 UTC
From the log I see that hosted-engine-setup created the VM with vnc:

2016-06-07 13:08:18 DEBUG otopi.plugins.gr_he_upgradeappliance.vm.runvm runvm._createvm:79 {'status': {'message': 'Done', 'code': 0}, 'items': [{u'displayInfo': [{u'tlsPort': u'-1', u'ipAddress': u'0', u'port': u'-1', u'type': u'vnc'}], u'memUsage': u'0', u'acpiEnable': u'true', u'guestFQDN': u'', u'pid': u'0', u'session': u'Unknown', u'displaySecurePort': u'-1', u'timeOffset': u'0', u'displayType': u'vnc', u'cpuUser': u'0.00', u'elapsedTime': u'0', u'vmType': u'kvm', u'cpuSys': u'0.00', u'appsList': [], u'vmName': u'HostedEngine', u'status': u'WaitForLaunch', u'hash': u'-4942054084956770103', u'vmId': u'3a9ad9b6-56fd-49ba-bfb7-bd281c5a6b98', u'displayIp': u'0', u'displayPort': u'-1', u'guestIPs': u'', u'kvmEnable': u'true', u'monitorResponse': u'0', u'username': u'Unknown', u'guestCPUCount': -1, u'clientIp': u'', u'statusTime': u'4307025300'}]}


and indeed in the answerfile from the shared storage there was:
2016-06-07 12:45:03 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile remote_answerfile._fetch_answer_file:82 Answer file form the shared storage: [environment:default]
...
OVEHOSTED_VDSM/consoleType=str:vnc

Comment 6 Jiri Belka 2016-06-08 08:30:01 UTC
(In reply to Simone Tiraboschi from comment #5)
> From the log I see that hosted-engine-setup created the VM with vnc:
> 
> 2016-06-07 13:08:18 DEBUG otopi.plugins.gr_he_upgradeappliance.vm.runvm
> runvm._createvm:79 {'status': {'message': 'Done', 'code': 0}, 'items':
> [{u'displayInfo': [{u'tlsPort': u'-1', u'ipAddress': u'0', u'port': u'-1',
> u'type': u'vnc'}], u'memUsage': u'0', u'acpiEnable': u'true', u'guestFQDN':
> u'', u'pid': u'0', u'session': u'Unknown', u'displaySecurePort': u'-1',
> u'timeOffset': u'0', u'displayType': u'vnc', u'cpuUser': u'0.00',
> u'elapsedTime': u'0', u'vmType': u'kvm', u'cpuSys': u'0.00', u'appsList':
> [], u'vmName': u'HostedEngine', u'status': u'WaitForLaunch', u'hash':
> u'-4942054084956770103', u'vmId': u'3a9ad9b6-56fd-49ba-bfb7-bd281c5a6b98',
> u'displayIp': u'0', u'displayPort': u'-1', u'guestIPs': u'', u'kvmEnable':
> u'true', u'monitorResponse': u'0', u'username': u'Unknown',
> u'guestCPUCount': -1, u'clientIp': u'', u'statusTime': u'4307025300'}]}
> 
> 
> and indeed in the answerfile from the shared storage there was:
> 2016-06-07 12:45:03 DEBUG otopi.plugins.gr_he_common.core.remote_answerfile
> remote_answerfile._fetch_answer_file:82 Answer file form the shared storage:
> [environment:default]
> ...
> OVEHOSTED_VDSM/consoleType=str:vnc

hm, that's strange, I am 100% sure that after the upgrade it was SPICE, so it was switched during the upgrade process:

# egrep -o 'rtc base.*\-(spice|vnc) ' /var/log/libvirt/qemu/HostedEngine.log  | tail -n 3 | sed 's/\-global.*0 //'
rtc base=2016-06-07T09:03:22,driftfix=slew -spice 
rtc base=2016-06-07T11:08:19,driftfix=slew -vnc 
rtc base=2016-06-07T12:43:20,driftfix=slew -spice

Comment 7 Simone Tiraboschi 2016-06-08 12:24:58 UTC
The spice->vnc->spice issue was a side effect of rhbz#1328921

Jiri, can you please detail which versions of ovirt-hosted-engine-setup, ovirt-hosted-engine-ha and ovirt-engine-appliance you used, since we need to understand whether all the relevant fixes were in?

Comment 8 Jiri Belka 2016-06-09 10:56:27 UTC
I suppose it was from 3.6.7 and 4.0.0-12

Jun 09 07:22:45 Installed: ovirt-hosted-engine-ha-1.3.5.7-1.el7ev.noarch
Jun 09 07:22:46 Installed: ovirt-hosted-engine-setup-1.3.7.1-1.el7ev.noarch
Jun 09 10:57:56 Updated: ovirt-hosted-engine-ha-2.0.0-1.el7ev.noarch
Jun 09 10:57:57 Updated: ovirt-hosted-engine-setup-2.0.0-1.el7ev.noarch
Jun 09 12:05:45 Installed: rhevm-appliance-20160526.0-1.el7ev.noarch

Comment 9 Simone Tiraboschi 2016-06-09 11:14:16 UTC
(In reply to Jiri Belka from comment #8)
> I suppose it was from 3.6.7 and 4.0.0-12

Yes, then we have bugs...

> Jun 09 07:22:45 Installed: ovirt-hosted-engine-ha-1.3.5.7-1.el7ev.noarch
> Jun 09 07:22:46 Installed: ovirt-hosted-engine-setup-1.3.7.1-1.el7ev.noarch
> Jun 09 10:57:56 Updated: ovirt-hosted-engine-ha-2.0.0-1.el7ev.noarch
> Jun 09 10:57:57 Updated: ovirt-hosted-engine-setup-2.0.0-1.el7ev.noarch
> Jun 09 12:05:45 Installed: rhevm-appliance-20160526.0-1.el7ev.noarch

The issue was probably here: rhevm-appliance-20160526.0-1.el7ev.noarch

Can you please try with a fresher appliance that includes the fixes for
https://bugzilla.redhat.com/show_bug.cgi?id=1328921
and
https://bugzilla.redhat.com/show_bug.cgi?id=1326810
?

So basically something built this week; you probably have to use an upstream one.

Comment 10 Jiri Belka 2016-06-09 14:51:37 UTC
> The issue was probably here: rhevm-appliance-20160526.0-1.el7ev.noarch
> 
> Can you please try with a fresher appliance that includes the fixes for
> https://bugzilla.redhat.com/show_bug.cgi?id=1328921
> and
> https://bugzilla.redhat.com/show_bug.cgi?id=1326810
> ?
> 
> So basically something built this week, probably you have to use an upstream
> one.

So my next try was a little bit more successful:

ovirt-engine-appliance-3.6-20160608.1.el7.centos.noarch
ovirt-engine-appliance-4.0-20160606.1.el7.centos.noarch
ovirt-hosted-engine-ha-1.3.5.8-0.0.master.20160531154033.git186400d.el7.noarch
ovirt-hosted-engine-ha-2.0.0-1.20160601054504.git3b8b82b.el7.noarch
ovirt-hosted-engine-setup-1.3.7.3-0.0.master.20160607094202.git6c7a783.el7.centos.noarch
ovirt-hosted-engine-setup-2.0.0-1.el7.centos.noarch

And it finished with the following output; the VM now runs RHEL 7.2 inside:

~~~
[ INFO  ] Trying to get a fresher copy of vm configuration from the OVF_STORE
[ ERROR ] Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf
[ INFO  ] Running engine-setup on the appliance
          |- Preparing to restore:
          |- - Unpacking file '/root/engine_backup.tar.gz'
          |- FATAL: Backup was created by version '3.6' and can not be restored using the installed version 4.1
          |- HE_APPLIANCE_ENGINE_RESTORE_FAIL
[ ERROR ] Engine backup restore failed on the appliance
[ ERROR ] Failed to execute stage 'Closing up': engine-backup failed restoring the engine backup on the appliance Please check its log on the appliance. 
[ INFO  ] Stage: Clean up
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine upgrade failed: this system is not reliable, please check the issue, fix and try again
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20160609162517-wtysuf.log
~~~

I'm confused about how this works with the disk images. Do you attach a temporary disk and then later "merge" the changes into the original disk? See below:

~~~
# egrep -o 'rtc base.*file[^,]*' /var/log/libvirt/qemu/HostedEngine.log  | tail -n 3 | sed 's/\-global.*\ \-//'
rtc base=2016-06-09T13:04:56,driftfix=slew drive file=/var/run/vdsm/storage/691e2028-25c2-45da-8feb-6ac02ee45d12/4bf2fbaf-8c14-47fa-97a8-739af9a5e7c8/00717a9f-5658-4196-89e5-a7c36a541fa4
rtc base=2016-06-09T14:35:22,driftfix=slew drive file=/var/run/vdsm/storage/691e2028-25c2-45da-8feb-6ac02ee45d12/d99b08ad-83e5-416e-8c0b-9dbf463343da/f40ac89c-c3c3-44d2-babb-ffe72b286b3e
rtc base=2016-06-09T14:41:17,driftfix=slew drive file=/var/run/vdsm/storage/691e2028-25c2-45da-8feb-6ac02ee45d12/4bf2fbaf-8c14-47fa-97a8-739af9a5e7c8/00717a9f-5658-4196-89e5-a7c36a541fa4

# virsh domblklist 6
Target     Source
------------------------------------------------
hdc        -
vda        /var/run/vdsm/storage/691e2028-25c2-45da-8feb-6ac02ee45d12/4bf2fbaf-8c14-47fa-97a8-739af9a5e7c8/00717a9f-5658-4196-89e5-a7c36a541fa4
~~~

So it starts with the image '00717...', then it gets started with the image 'f40ac...', and later on it is started again with the image starting with '00717...'. It looks like the changes are "merged" into the original disk. Could you clarify?

Comment 11 Jiri Belka 2016-06-09 14:53:38 UTC
Ah, ignore the conclusion from comment #10 - in fact it is running with the old disk again after the HE VM start. I just checked the rpm packages inside the HE VM guest OS.

Comment 12 Simone Tiraboschi 2016-06-09 15:38:34 UTC
(In reply to Jiri Belka from comment #10)

> ovirt-engine-appliance-3.6-20160608.1.el7.centos.noarch
> ovirt-engine-appliance-4.0-20160606.1.el7.centos.noarch

>           |- FATAL: Backup was created by version '3.6' and can not be
> restored using the installed version 4.1

Not sure why ovirt-engine-appliance-4.0 comes with engine-backup from 4.1: this is a different issue and it's worth tracking separately.

 
> I'm confused how it works with disk images. Do you attach temporary disk and
> then later you "merge" change to original disk? See below:

- We are not (or at least should not be) touching the original 3.6 disk.
- the upgrade flow uses the 3.6 engine to create a new floating disk,
- then we transfer the new (4.0) appliance image onto the new disk,
- we inject the backup into the new disk,
- we shut down the engine VM (still on the 3.6 disk),
- we restart the engine VM, temporarily attaching the new floating disk by patching vm.conf on the fly,
- we attach a cloud-init image to configure the new appliance disk (root password, hostname...),
- we restore the backup and execute engine-setup,
- we check that the datacenter comes up,
- we ask the user to review,
- only at the end, if everything is fine, do we use the 4.0 engine to edit the hosted-engine VM so it uses the new disk; until then, if we hit an error in any stage or the user aborts, on the next boot the VM will still be started with the old (3.6) disk, since the OVF_STORE still points there (a quick way to check this is sketched below).
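
For reference, a quick way to compare the disk image the HE VM is actually running with against the one recorded in the vm.conf cached by the HA services - a minimal sketch; the vm.conf path is assumed to be the default one used by ovirt-hosted-engine-ha in this release, adjust it if your host differs:

~~~
# disk image actually attached to the running HE VM
virsh domblklist HostedEngine

# disk image recorded in the cached vm.conf
# (assumed default path for ovirt-hosted-engine-ha 1.3/2.0)
grep -i imageID /var/run/ovirt-hosted-engine-ha/vm.conf
~~~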

Comment 13 Simone Tiraboschi 2016-06-09 15:39:31 UTC
(In reply to Jiri Belka from comment #11)
> Ah, ignore conclution from #10 - in fact it is running again with old disk
> after HE VM start. I just checked rpm packages inside the HE VM guest OS.

This is what we expect on errors: it's a kind of instant rollback!

Comment 14 Simone Tiraboschi 2016-06-10 14:50:46 UTC
(In reply to Simone Tiraboschi from comment #9)
> Can you please try with a fresher appliance that includes the fixes for
> https://bugzilla.redhat.com/show_bug.cgi?id=1328921
> and
> https://bugzilla.redhat.com/show_bug.cgi?id=1326810
> ?
> 
> So basically something built this week, probably you have to use an upstream
> one.

Still not working with:
ovirt-engine-appliance.noarch       4.0-20160603.1.el7.centos

Comment 15 Jiri Belka 2016-06-16 10:42:10 UTC
Still not working correctly with:

ovirt-hosted-engine-setup-2.0.0.1-1.el7ev.noarch
rhevm-appliance-20160615.0-1.el7ev.noarch

Migration tested like this:

Jun 16 08:22:49 Installed: ovirt-hosted-engine-setup-1.3.7.2-1.el7ev.noarch
Jun 16 08:23:34 Installed: rhevm-appliance-20160602.0-1.el7ev.noarch
Jun 16 08:58:11 Updated: ovirt-hosted-engine-setup-2.0.0.1-1.el7ev.noarch
Jun 16 09:28:43 Updated: rhevm-appliance-20160615.0-1.el7ev.noarch


# egrep -o 'rtc base.*file[^,]*' /var/log/libvirt/qemu/HostedEngine.log  | tail -n 3 | sed 's/\-global.*\ \-//'
rtc base=2016-06-16T06:44:14,driftfix=slew drive file=/var/run/vdsm/storage/ad888318-9a70-48f2-9ce3-952cfabaaebb/fb22581e-6875-4e75-b018-331fd2e2d2cb/48e34983-a47a-4f95-8c91-99ff0794915f
rtc base=2016-06-16T08:13:55,driftfix=slew drive file=/tmp/tmph5LD1j/seed.iso
rtc base=2016-06-16T08:34:40,driftfix=slew drive file=/var/run/vdsm/storage/ad888318-9a70-48f2-9ce3-952cfabaaebb/fb22581e-6875-4e75-b018-331fd2e2d2cb/48e34983-a47a-4f95-8c91-99ff0794915f

Comment 16 Red Hat Bugzilla Rules Engine 2016-06-16 10:42:15 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Comment 17 Simone Tiraboschi 2016-06-16 11:25:39 UTC
Jiri,
did you also reduce the timeout on OvfUpdateIntervalInMinutes to one minute as per
https://bugzilla.redhat.com/show_bug.cgi?id=1343455 ?

We still have to manually do that till https://gerrit.ovirt.org/#/c/51842/ gets in

Comment 18 Jiri Belka 2016-06-16 11:46:01 UTC
Yes, I ran engine-config to change it and then waited for the setup to finish. Then I ended global maintenance mode and the VM was started by the HA agents.

Comment 19 Roman Mohr 2016-06-16 12:11:52 UTC
(In reply to Jiri Belka from comment #18)
Hi Jiri,

> Yes I ran engine-config to chagne it and then I waited the setup to finish.
> Then I ended global maintenance mode and the VM was started by HA agents.

Could you also share the he-agent log and the vm.conf which was finally used by the engine?

Comment 20 Roman Mohr 2016-06-16 12:54:08 UTC
(In reply to Simone Tiraboschi from comment #17)
> Jiri,
> did you also reduced the timeout on OvfUpdateIntervalInMinutes to one minute
> as for 
> https://bugzilla.redhat.com/show_bug.cgi?id=1343455 ?
> 
> We still have to manually do that till https://gerrit.ovirt.org/#/c/51842/
> gets in

It is also worth noting that you have to restart the engine. Otherwise the config change is not picked up by the engine.
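
For reference, a minimal sketch of the two steps above, run on the engine VM (the key name comes from this report; the restart is assumed to go through the standard ovirt-engine systemd unit):

~~~
# lower the OVF update interval to one minute...
engine-config -s OvfUpdateIntervalInMinutes=1
# ...and restart the engine so that the new value is picked up
systemctl restart ovirt-engine
~~~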

Comment 21 Jiri Belka 2016-06-17 11:12:55 UTC
So I retried, this time modifying OvfUpdateIntervalInMinutes already in the 3.6 engine (and restarting it), and it failed with:

~~~
...
[ INFO  ] Connecting to the Engine
[ 5875.057635] device-mapper: table: 253:4: multipath: error getting device
[ 5875.064342] device-mapper: ioctl: error adding target to table
          The engine VM is currently running with the new disk but the hosted-engine configuration is still point to the old one.
          Please make sure that everything is fine on the engine VM side before definitively switching the disks.
          Are you sure you want to continue? (Yes, No)[Yes]: 
[ INFO  ] Connecting to the Engine
[ INFO  ] Registering the new hosted-engine disk in the DB
[ INFO  ] Waiting for the engine to complete disk registration. This may take several minutes...
[ INFO  ] The new engine VM disk is now ready
 detail: There was an attempt to change Hosted Engine VM values that are locked.
[ INFO  ] Stage: Clean up
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine upgrade failed: this system is not reliable, please check the issue, fix and try again
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20160617102816-gx7xz5.log

# ps auxww | grep Hosted
qemu     27906 27.1 32.2 5021468 2577160 ?     Sl   10:51   2:25 /usr/libexec/qemu-kvm -name HostedEngine -S -machine rhel6.5.0,accel=kvm,usb=off -cpu qemu64,-svm -m 4096 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 65064c09-fdd3-4f21-9cf5-33a965f0e122 -smbios type=1,manufacturer=Red Hat,product=RHEV Hypervisor,version=7.2-13.0.el7ev,serial=4C4C4544-0058-3410-8058-C2C04F38354A,uuid=65064c09-fdd3-4f21-9cf5-33a965f0e122 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-HostedEngine/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2016-06-17T08:51:08,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-reboot -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x5 -drive file=/var/run/vdsm/storage/709f2402-6b69-4021-b365-cb24383353c3/d00251e9-0023-4908-8f72-62bfe2a8b7eb/f5c691ff-f491-4c12-af0b-c4e8a07efd34,if=none,id=drive-virtio-disk0,format=raw,serial=d00251e9-0023-4908-8f72-62bfe2a8b7eb,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/tmp/tmpoDIfQT/seed.iso,if=none,id=drive-ide0-1-0,readonly=on,format=raw -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:d0:40:16,bus=pci.0,addr=0x3 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/65064c09-fdd3-4f21-9cf5-33a965f0e122.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/65064c09-fdd3-4f21-9cf5-33a965f0e122.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev socket,id=charchannel2,path=/var/lib/libvirt/qemu/channels/65064c09-fdd3-4f21-9cf5-33a965f0e122.org.ovirt.hosted-engine-setup.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=org.ovirt.hosted-engine-setup.0 -vnc 0:0,password -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -msg timestamp=on
~~~

Right now the HE VM is still running (with the 4.0 appliance):

~~~
rtc base=2016-06-17T07:43:47,driftfix=slew drive file=/var/run/vdsm/storage/709f2402-6b69-4021-b365-cb24383353c3/6171a38c-73f4-4670-ab75-be4bb3d0dd4a/359d6ed8-7535-48d6-8737-3b1362f894e2
rtc base=2016-06-17T07:50:00,driftfix=slew drive file=/var/run/vdsm/storage/709f2402-6b69-4021-b365-cb24383353c3/6171a38c-73f4-4670-ab75-be4bb3d0dd4a/359d6ed8-7535-48d6-8737-3b1362f894e2
rtc base=2016-06-17T08:51:08,driftfix=slew drive file=/tmp/tmpoDIfQT/seed.iso
~~~

After ending global maintenance and shutting down the HE VM, once the HE VM is started again it runs with the old disk.

Comment 23 Simone Tiraboschi 2016-06-17 11:28:22 UTC
It seems that now we have a new issue when we try to switch the disk on the new engine VM:

2016-06-17 10:58:04 INFO otopi.plugins.gr_he_upgradeappliance.engine.add_vm_disk add_vm_disk._wait_disk_ready:113 The new engine VM disk is now ready
2016-06-17 10:58:07 DEBUG otopi.context context._executeMethod:142 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-upgradeappliance/engine/add_vm_disk.py", line 374, in _closeup
    e_vm_b.update()
  File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/brokers.py", line 30639, in update
    headers={"Correlation-Id":correlation_id, "Expect":expect}
  File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/proxy.py", line 68, in update
    return self.request('PUT', url, body, headers, cls=cls)
  File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/proxy.py", line 122, in request
    persistent_auth=self.__persistent_auth
  File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/connectionspool.py", line 79, in do_request
    persistent_auth)
  File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/connectionspool.py", line 156, in __do_request
    raise errors.RequestError(response_code, response_reason, response_body)
RequestError: 
status: 400
reason: Bad Request
detail: There was an attempt to change Hosted Engine VM values that are locked.
2016-06-17 10:58:07 ERROR otopi.context context._executeMethod:151 Failed to execute stage 'Closing up': 
status: 400
reason: Bad Request
detail: There was an attempt to change Hosted Engine VM values that are locked.
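
For context, e_vm_b.update() in the SDK translates to a PUT on the VM resource of the REST API; a hypothetical reproduction of such a call looks like the sketch below (URL, credentials and payload are placeholders, only the VM id is taken from this report). On the affected build, the engine rejects updates that it considers to touch locked Hosted Engine fields with the HTTP 400 shown above.

~~~
# hypothetical example: PUT against the HE VM resource, as ovirtsdk does under the hood
curl -k -u 'admin@internal:PASSWORD' \
     -X PUT -H 'Content-Type: application/xml' \
     -d '<vm><description>disk switch test</description></vm>' \
     'https://engine.example.com/ovirt-engine/api/vms/65064c09-fdd3-4f21-9cf5-33a965f0e122'
~~~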

Comment 24 Simone Tiraboschi 2016-06-17 11:56:02 UTC
The behaviour in comment 23 has been seen for the first time with rhevm-appliance-20160615.0-1.el7ev.ova, so it could be a regression in the engine code.

Comment 26 Simone Tiraboschi 2016-06-17 13:28:30 UTC
The 4.0 engine refuses to edit the engine VM to change its disk:

2016-06-17 06:58:15,075 INFO  [org.ovirt.engine.core.bll.UpdateVmCommand] (default task-16) [35a97f3d] Lock Acquired to object 'EngineLock:{exclusiveLocks='[HostedEngine=<VM_NAME, ACTION_TYPE_FAILED_VM_IS_BEING_UPDATED>]', sharedLocks='[65064c09-fdd3-4f21-9cf5-33a965f0e122=<VM, ACTION_TYPE_FAILED_VM_IS_BEING_UPDATED>]'}'
2016-06-17 06:58:15,079 WARN  [org.ovirt.engine.core.bll.UpdateVmCommand] (default task-16) [35a97f3d] Validation of action 'UpdateVm' failed for user admin@internal-authz. Reasons: VAR__ACTION__UPDATE,VAR__TYPE__VM,VM_CANNOT_UPDATE_HOSTED_ENGINE_FIELD
2016-06-17 06:58:15,080 INFO  [org.ovirt.engine.core.bll.UpdateVmCommand] (default task-16) [35a97f3d] Lock freed to object 'EngineLock:{exclusiveLocks='[HostedEngine=<VM_NAME, ACTION_TYPE_FAILED_VM_IS_BEING_UPDATED>]', sharedLocks='[65064c09-fdd3-4f21-9cf5-33a965f0e122=<VM, ACTION_TYPE_FAILED_VM_IS_BEING_UPDATED>]'}'
2016-06-17 06:58:15,094 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-16) [] Operation Failed: [There was an attempt to change Hosted Engine VM values that are locked.]

Opening a new bug since it's a regression on the engine side.

Comment 28 Simone Tiraboschi 2016-06-17 13:43:26 UTC
Moving to ON_QA (test-only) since it's now blocked by 1343593, but the issue is not here.

Comment 29 Jiri Belka 2016-06-23 13:40:57 UTC
Closing as https://bugzilla.redhat.com/show_bug.cgi?id=1347731 - there will be a redesign of the things behind the scenes.

Comment 30 Simone Tiraboschi 2016-10-10 09:54:18 UTC
*** Bug 1382543 has been marked as a duplicate of this bug. ***

