Bug 1260177 - Restoring a RAM snapshots in RHEL7.2 shows error stating the vm (even though it starts correctly) and fails to connect via spice(SetVmTicket: Unexpected exception)
Restoring a RAM snapshots in RHEL7.2 shows error stating the vm (even though ...
Status: CLOSED CURRENTRELEASE
Product: vdsm
Classification: oVirt
Component: Core (Show other bugs)
---
Unspecified Unspecified
medium Severity medium (vote)
: ovirt-3.6.0-rc3
: ---
Assigned To: Francesco Romani
sefi litmanovich
virt
: Regression, Reopened
Depends On:
Blocks: 1273551 1293299
  Show dependency treegraph
 
Reported: 2015-09-04 12:17 EDT by Carlos Mestre González
Modified: 2016-02-08 09:16 EST (History)
14 users (show)

See Also:
Fixed In Version: v4.17.9
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1273551 1305530 (view as bug list)
Environment:
Last Closed: 2016-02-08 08:12:32 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt‑3.6.0+
rule-engine: blocker+
ylavi: planning_ack+
rule-engine: devel_ack+
rule-engine: testing_ack+


Attachments (Terms of Use)
engine.log (181.80 KB, text/plain)
2015-09-04 12:18 EDT, Carlos Mestre González
no flags Details
vdsm.log (667.73 KB, text/plain)
2015-09-04 12:20 EDT, Carlos Mestre González
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 44842 master MERGED vm: graphics: fix settings after restore Never
oVirt gerrit 45979 master ABANDONED vm: graphics: save all settings when snapshotting Never
oVirt gerrit 46524 ovirt-3.6 MERGED vm: graphics: fix settings after restore Never
oVirt gerrit 47487 ovirt-3.5 MERGED vm: graphics: fix settings after restore Never

  None (edit)
Description Carlos Mestre González 2015-09-04 12:17:08 EDT
Description of problem:
Restoring a snapshot works but I cannot access via spice fails with "Error while executing action SetVmTicket: Unexpected exception". Also the engine shows an error starting the vm even though it starts successfully.

Version-Release number of selected component (if applicable):
rhevm-3.5.4.2-1.3.el6ev.noarch
libvirt-1.2.17-5.el7.x86_64
vdsm-4.16.26-1.el7ev.x86_64
libvirt-1.2.8-16.el7_1.3.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with an OS, start it, make a snapshot and stop it.
2. start the vm and start a process (in my example: cat /dev/zero)
2. Create a RAM snapshot
3. Shutdown the vm
5. Preview the RAM snapshot
6. Start the vm with the preview snapshot

Actual results:
- Cannot connect to the vm via spice, fails with error:
Operation cancelled with "Error while executing action SetVmTicket: Unexpected exception"

- Engine shows error in VdsUpdateRunTimeInfo and shows the vm was not able to "Wake Up from Hiberanation" even though the vm started.
 
Expected results:
Can access the vm via spice and the engine doesn't show error

Additional info:
2015-09-04 18:40:41,312 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-78) Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM vm_ram_snapshot_iscsi is down with error. Exit message: Wake up from hibernation failed.
2015-09-04 18:40:41,312 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-78) Running on vds during rerun failed vm: null
2015-09-04 18:40:41,313 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-78) VM vm_ram_snapshot_iscsi (12440bc2-5cb5-4ae3-967c-66ad875d48ff) is running in db and not running in VDS host_mixed_3
2015-09-04 18:40:41,313 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FullListVdsCommand] (DefaultQuartzScheduler_Worker-78) START, FullListVdsCommand(HostName = host_mixed_3, HostId = 98d617f0-1640-40d4-bf7e-1a26df2c8c6f, vds=Host[host_mixed_3,98d617f0-1640-40d4-bf7e-1a26df2c8c6f], vmIds=[12440bc2-5cb5-4ae3-967c-66ad875d48ff]), log id: 29ad4e76
2015-09-04 18:40:41,317 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FullListVdsCommand] (DefaultQuartzScheduler_Worker-78) FINISH, FullListVdsCommand, return: [], log id: 29ad4e76
2015-09-04 18:40:41,327 ERROR [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-78) Rerun vm 12440bc2-5cb5-4ae3-967c-66ad875d48ff. Called from vds host_mixed_3
2015-09-04 18:40:41,336 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-7-thread-27) Correlation ID: vms_syncAction_4235fc92-073c-402c, Job ID: 18369157-944f-4fd8-aa0c-9789498228d3, Call Stack: null, Custom Event ID: -1, Message: Failed to run VM vm_ram_snapshot_iscsi on Host host_mixed_3.

We haven't encounter this issue with the nodes running on RHEL6.
Comment 1 Carlos Mestre González 2015-09-04 12:18:02 EDT
Created attachment 1070360 [details]
engine.log
Comment 2 Carlos Mestre González 2015-09-04 12:20:40 EDT
Created attachment 1070361 [details]
vdsm.log
Comment 3 Arik 2015-09-06 07:35:38 EDT
Some initial feedback:
The restoration of the saved memory state failed. RunVmCommand that was requested on 18:40:30 (engine time) failed on host_mixed_3 with the following error in VDSM (15:40:30 VDSM time):
libvirtError: XML error: graphics listen attribute 10.35.160.55 must match address attribute of first listen element (found none)

There was rerun attempt that managed to run it eventually (without the stored memory) on host_mixed_1.
I'm not familiar with this error - need to investigate it.

As for the failure to connect to the VM, I can't see any SetVmTicket operation in the engine log. Carlos, where did you see the error "Error while executing action SetVmTicket: Unexpected exception"? can you retrieve the time it was requested in?
Comment 4 Arik 2015-09-06 08:16:32 EDT
Francesco, I see this thread where you discussed this issue with libvirt guys:
http://www.redhat.com/archives/libvir-list/2015-February/msg01179.html

Can you please advice here?
Comment 5 Carlos Mestre González 2015-09-07 04:07:03 EDT
Hi Ari,

Error was in the web UI after clicking in the console icon. Was just a normal popup with that message.
Comment 6 Francesco Romani 2015-09-07 07:11:34 EDT
(In reply to Arik from comment #4)
> Francesco, I see this thread where you discussed this issue with libvirt
> guys:
> http://www.redhat.com/archives/libvir-list/2015-February/msg01179.html
> 
> Can you please advice here?

Hi Arik, back in time the patch mentioned here: http://www.redhat.com/archives/libvir-list/2015-February/msg01202.html

fixed the issue for me. Now checking if anything changed meantime.
Comment 7 Francesco Romani 2015-09-07 07:50:02 EDT
(In reply to Francesco Romani from comment #6)
> (In reply to Arik from comment #4)
> > Francesco, I see this thread where you discussed this issue with libvirt
> > guys:
> > http://www.redhat.com/archives/libvir-list/2015-February/msg01179.html
> > 
> > Can you please advice here?
> 
> Hi Arik, back in time the patch mentioned here:
> http://www.redhat.com/archives/libvir-list/2015-February/msg01202.html
> 
> fixed the issue for me. Now checking if anything changed meantime.

BTW, it seems this and https://bugzilla.redhat.com/show_bug.cgi?id=1260409 share the same root cause.
Comment 9 Francesco Romani 2015-09-09 02:12:59 EDT
The evidence linking to 1260409 wasn't too strong, removing dependency for now.

I've an old half-baked patch that should fix the original issue, taking the bug and resuming that patch.
Comment 10 Red Hat Bugzilla Rules Engine 2015-09-22 03:43:59 EDT
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
Comment 11 Francesco Romani 2015-09-24 07:37:52 EDT
http://gerrit.ovirt.org/46524 merged in 3.6 branch -> MODIFIED
Comment 12 Red Hat Bugzilla Rules Engine 2015-10-18 04:34:26 EDT
Bug tickets that are moved to testing must have target release set to make sure tester knows what to test. Please set the correct target release before moving to ON_QA.
Comment 13 sefi litmanovich 2015-10-19 07:05:25 EDT
Verified with rhevm-3.6.0.1-0.1.el6.noarch.
host: RHEL_7.2

vdsm-4.17.9-1.el7ev.noarch
libvirt-daemon-driver-nwfilter-1.2.17-12.el7.x86_64
libvirt-daemon-driver-storage-1.2.17-12.el7.x86_64
libvirt-daemon-config-nwfilter-1.2.17-12.el7.x86_64
libvirt-client-1.2.17-12.el7.x86_64
libvirt-daemon-driver-secret-1.2.17-12.el7.x86_64
libvirt-lock-sanlock-1.2.17-12.el7.x86_64
libvirt-daemon-1.2.17-12.el7.x86_64
libvirt-daemon-driver-nodedev-1.2.17-12.el7.x86_64
libvirt-daemon-kvm-1.2.17-12.el7.x86_64
libvirt-daemon-driver-interface-1.2.17-12.el7.x86_64
libvirt-daemon-driver-network-1.2.17-12.el7.x86_64
libvirt-python-1.2.17-2.el7.x86_64
libvirt-daemon-driver-qemu-1.2.17-12.el7.x86_64

Verified according to steps in description.
after starting the previewd snapshot started vm, open console successfully (no error with setVmTicket) and vm is on the same state (processing cat /etc/zero)
Comment 14 Carlos Mestre González 2015-10-19 09:38:11 EDT
Hi Francesco,

In my initial report I commented this was in 3.5, I see by the comment and version this was released for 3.6, but what about 3.5.5?

Regards,
Carlos
Comment 15 Francesco Romani 2015-10-19 17:38:06 EDT
(In reply to Carlos Mestre González from comment #14)
> Hi Francesco,
> 
> In my initial report I commented this was in 3.5, I see by the comment and
> version this was released for 3.6, but what about 3.5.5?
> 
> Regards,
> Carlos

Hi Carlos,

The backport slipped a bit, but here is the backport: https://gerrit.ovirt.org/47487
I'm still aiming to have it merged ASAP, to be in the next build.
Comment 16 Raz Tamir 2015-11-03 16:04:26 EST
Happens also in 3.5 - rhevm-3.5.6.1-0.1.el6ev.noarch.
Should I reopen this bug or open a new one?
Comment 17 Francesco Romani 2015-11-04 08:16:12 EST
(In reply to ratamir from comment #16)
> Happens also in 3.5 - rhevm-3.5.6.1-0.1.el6ev.noarch.
> Should I reopen this bug or open a new one?

make sure you are running vdsm >= 4.16.27 and libvirt 1.2.8-16_el7_1.4

if not, please file a new bug.
Comment 18 Francesco Romani 2015-11-04 08:17:01 EST
(In reply to Francesco Romani from comment #17)
> (In reply to ratamir from comment #16)
> > Happens also in 3.5 - rhevm-3.5.6.1-0.1.el6ev.noarch.
> > Should I reopen this bug or open a new one?
> 
> make sure you are running vdsm >= 4.16.27 and libvirt 1.2.8-16_el7_1.4
> 
> if not, please file a new bug.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you are running the versions listed above and if still fails, please file a new bug.
Comment 19 Sandro Bonazzola 2015-11-04 08:37:43 EST
oVirt 3.6.0 has been released on November 4th, 2015 and should fix this issue.
If problems still persist, please open a new BZ and reference this one.
Comment 20 Francesco Romani 2015-11-09 09:05:44 EST
(In reply to Francesco Romani from comment #17)
> (In reply to ratamir from comment #16)
> > Happens also in 3.5 - rhevm-3.5.6.1-0.1.el6ev.noarch.
> > Should I reopen this bug or open a new one?
> 
> make sure you are running vdsm >= 4.16.27 and libvirt 1.2.8-16_el7_1.4
> 
> if not, please file a new bug.

Unfortunately the fix didn't make it into 4.16.27 (my bad - twice! once for the fix slip, once for misreporting :( - really sorry about that), so the above is true for 4.16.28.
Comment 21 Red Hat Bugzilla Rules Engine 2016-02-08 07:48:48 EST
Target release should be placed once a package build is known to fix a issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for a oVirt release.
Comment 22 Francesco Romani 2016-02-08 08:12:32 EST
Was reopened to investigate further what was claimed in https://bugzilla.redhat.com/show_bug.cgi?id=1260177#c17
but better follow my own suggestion and file a new bug if this is the case :)
Comment 23 Francesco Romani 2016-02-08 09:16:27 EST
(In reply to Francesco Romani from comment #22)
> Was reopened to investigate further what was claimed in
> https://bugzilla.redhat.com/show_bug.cgi?id=1260177#c17
> but better follow my own suggestion and file a new bug if this is the case :)

Done:
https://bugzilla.redhat.com/show_bug.cgi?id=1305530

Note You need to log in before you can comment on or make changes to this bug.