Bug 976240

Summary: openstack-nova: vnc console dies when taking snapshot over instance (Server disconnected (code: 1006))
Product: Red Hat OpenStack Reporter: Haim <hateya>
Component: openstack-nova Assignee: Xavier Queralt <xqueralt>
Status: CLOSED ERRATA QA Contact: Martin Pavlásek <mpavlase>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.0 CC: dallan, dron, jkt, mlopes, ndipanov, xqueralt, yeylon
Target Milestone: beta   
Target Release: 4.0   
Hardware: x86_64   
OS: Linux   
Whiteboard: OtherQA
Fixed In Version: openstack-nova-2013.2-4.el6ost Doc Type: Bug Fix
Doc Text:
Nova previously required instances to be stopped before taking snapshots. As a result, taking a snapshot of an instance dropped any VNC connections to it. With the new release, live snapshots are used by default. Instances remain powered on during snapshots, which requires no downtime and keeps VNC connections open.
Last Closed: 2013-12-20 00:07:01 UTC Type: Bug
Attachments:
Description Flags
nova logs. none

Description Haim 2013-06-20 07:56:08 UTC
Description of problem:

When taking a (live) snapshot of an instance while a VNC console is open to it, the VM freezes and the VNC console connection is lost.
I get the following error:

Server disconnected (code: 1006)



Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. run instance
2. open vnc console
3. take live snapshot 
4. check vnc console output

Actual results:


Expected results:


Additional info:

Comment 2 Haim 2013-06-20 08:26:24 UTC
Created attachment 763315 [details]
nova logs.

Comment 3 Xavier Queralt 2013-06-24 14:59:54 UTC
The shipped versions of libvirt and qemu don't yet support live snapshots. When a snapshot is created, the instance has to be paused for an instant, which causes the VNC connection to be lost.

Live snapshots in openstack require libvirt >= 1.0.0 and qemu >= 1.3.0.
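
The version requirement above can be sketched as a simple comparison. This is an illustrative check only, not Nova's actual code: the function names and constants are hypothetical, and a plain version comparison does not account for distribution backports of the required functionality.

```python
# Hypothetical helper names; only the two minimum versions come from this report.
MIN_LIBVIRT = (1, 0, 0)
MIN_QEMU = (1, 3, 0)


def parse_version(text):
    """Turn a dotted version such as '1.0.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def supports_live_snapshot(libvirt_version, qemu_version):
    """Return True when both versions meet the stated minimums."""
    return (parse_version(libvirt_version) >= MIN_LIBVIRT
            and parse_version(qemu_version) >= MIN_QEMU)
```

Tuples compare element by element, so '1.10.0' correctly sorts after '1.3.0', which a plain string comparison would get wrong.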

Until those two packages are updated, a possible workaround could be to:

a) try to reopen the websocket connection from the web side if it is lost.
b) make novncproxy re-establish the connection with the instance and resume the VNC session with the client.

In any case, the VNC console can be recovered just by refreshing the page.
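
The reconnect idea in (a) and (b) can be sketched generically as a retry loop with exponential backoff. This is a minimal sketch and not how noVNC or novncproxy are implemented: `reconnect_with_backoff` and the `connect` callable are hypothetical, and `connect` is assumed to raise `ConnectionError` when the connection cannot be opened.

```python
import time


def reconnect_with_backoff(connect, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call connect() until it succeeds or the attempts run out.

    connect is a hypothetical callable that returns a connection object
    or raises ConnectionError. The delay doubles after every failure.
    """
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # give up and surface the last error to the caller
            sleep(base_delay * (2 ** attempt))
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.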

Comment 4 Xavier Queralt 2013-10-29 08:17:28 UTC
This got fixed in RHOS 4.0 with the inclusion of live snapshots in bug 1020958.

Even though the versions of libvirt and qemu in RHEL 6.5 are not the ones mentioned above, the functionality required for live snapshots has been backported to RHEL 6.5, and we can now take a snapshot without having to stop the instance (or losing the VNC connection).

Comment 5 Xavier Queralt 2013-10-29 13:05:23 UTC
*** Bug 1022596 has been marked as a duplicate of this bug. ***

Comment 8 Martin Pavlásek 2013-12-03 14:35:26 UTC
1] Version of related packages
####################################
Live snapshots require qemu-kvm-rhev, which provides the needed functionality. Upstream, this is handled by a version dependency on newer libvirt and qemu.

[root@mpavlase-rhos-4 ~(keystone_admin)]# repoquery qemu-kvm-* openstack-nova-*
openstack-nova-0:2013.2-5.el6ost.noarch
openstack-nova-api-0:2013.2-5.el6ost.noarch
openstack-nova-cells-0:2013.2-5.el6ost.noarch
openstack-nova-cert-0:2013.2-5.el6ost.noarch
openstack-nova-common-0:2013.2-5.el6ost.noarch
openstack-nova-compute-0:2013.2-5.el6ost.noarch
openstack-nova-conductor-0:2013.2-5.el6ost.noarch
openstack-nova-console-0:2013.2-5.el6ost.noarch
openstack-nova-doc-0:2013.2-5.el6ost.noarch
openstack-nova-network-0:2013.2-5.el6ost.noarch
openstack-nova-novncproxy-0:2013.2-5.el6ost.noarch
openstack-nova-objectstore-0:2013.2-5.el6ost.noarch
openstack-nova-scheduler-0:2013.2-5.el6ost.noarch
openstack-nova-volume-0:2012.2.4-1.el6.noarch
qemu-kvm-2:0.12.1.2-2.410.el6.x86_64
qemu-kvm-rhev-2:0.12.1.2-2.415.el6_5.3.x86_64
qemu-kvm-rhev-tools-2:0.12.1.2-2.415.el6_5.3.x86_64
qemu-kvm-tools-2:0.12.1.2-2.410.el6.x86_64

2] Verify behaviour - preparation
####################################
- upload cirros into glance as 'cirros-0.3.1-x86_64-uec'
- boot machine with that image
[root@mpavlase-rhos-4 ~(keystone_admin)]# nova boot --image cirros-0.3.1-x86_64-uec --flavor m1.tiny testmachine
+--------------------------------------+--------------------------------------+
| Property                             | Value                                |
+--------------------------------------+--------------------------------------+
| OS-EXT-STS:task_state                | scheduling                           |
| image                                | cirros-0.3.1-x86_64-uec              |
| OS-EXT-STS:vm_state                  | building                             |
| OS-EXT-SRV-ATTR:instance_name        | instance-00000003                    |
| OS-SRV-USG:launched_at               | None                                 |
| flavor                               | m1.tiny                              |
| id                                   | ff3fc48a-e4e9-44e5-832a-c9166509b832 |
| security_groups                      | [{u'name': u'default'}]              |
| user_id                              | fb0d867fa1b84a0f8f759d063b3ec0e6     |
| OS-DCF:diskConfig                    | MANUAL                               |
| accessIPv4                           |                                      |
| accessIPv6                           |                                      |
| progress                             | 0                                    |
| OS-EXT-STS:power_state               | 0                                    |
| OS-EXT-AZ:availability_zone          | nova                                 |
| config_drive                         |                                      |
| status                               | BUILD                                |
| updated                              | 2013-12-03T12:08:09Z                 |
| hostId                               |                                      |
| OS-EXT-SRV-ATTR:host                 | None                                 |
| OS-SRV-USG:terminated_at             | None                                 |
| key_name                             | None                                 |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | None                                 |
| name                                 | testmachine                          |
| adminPass                            | PrUSZpEy6X5Z                         |
| tenant_id                            | e004be4e70ff4a9384a7de9fb222e764     |
| created                              | 2013-12-03T12:08:09Z                 |
| os-extended-volumes:volumes_attached | []                                   |
| metadata                             | {}                                   |
+--------------------------------------+--------------------------------------+
Note: IP of this VM: 192.168.32.2

3] Verify behaviour
####################################
Live snapshots don't require stopping the VM while the snapshot is created, so I wrote a simple script that regularly writes to a file and to stdout, so we can watch for continuous output.

3.1 - create disk changes
mpavlase@localhost $ ssh cirros.32.2
$ while true; do date|tee -a date-file; sleep 1; done
Tue Dec  3 05:30:17 MST 2013
Tue Dec  3 05:30:18 MST 2013
Tue Dec  3 05:30:20 MST 2013
Tue Dec  3 05:30:21 MST 2013
Tue Dec  3 05:30:22 MST 2013
Tue Dec  3 05:30:24 MST 2013
Tue Dec  3 05:30:25 MST 2013
Tue Dec  3 05:30:26 MST 2013
Tue Dec  3 05:30:27 MST 2013
Tue Dec  3 05:30:28 MST 2013
Tue Dec  3 05:30:29 MST 2013
Tue Dec  3 05:30:31 MST 2013
Tue Dec  3 05:30:32 MST 2013
Tue Dec  3 05:30:33 MST 2013
Tue Dec  3 05:30:34 MST 2013
Tue Dec  3 05:30:35 MST 2013
Tue Dec  3 05:30:36 MST 2013
Tue Dec  3 05:30:37 MST 2013
Tue Dec  3 05:30:38 MST 2013
Tue Dec  3 05:30:40 MST 2013
Tue Dec  3 05:30:41 MST 2013
Tue Dec  3 05:30:42 MST 2013
Tue Dec  3 05:30:43 MST 2013
Tue Dec  3 05:30:44 MST 2013
Tue Dec  3 05:30:45 MST 2013
...

3.2 - log in to Horizon, open 'Instances', click 'Create snapshot', and name the snapshot 'snap1'
3.3 - boot a new VM from 'snap1'; it was assigned IP 192.168.32.3
3.4 - log in to the new VM and verify the contents of the watched file
mpavlase@localhost $ ssh cirros.32.3
$ tail date-file 
Tue Dec  3 05:30:22 MST 2013
Tue Dec  3 05:30:24 MST 2013
Tue Dec  3 05:30:25 MST 2013
Tue Dec  3 05:30:26 MST 2013
Tue Dec  3 05:30:27 MST 2013
Tue Dec  3 05:30:28 MST 2013
Tue Dec  3 05:30:29 MST 2013
Tue Dec  3 05:30:31 MST 2013
Tue Dec  3 05:30:32 MST 2013
Tue Dec  3 05:30:33 MST 2013

4] Summary
####################################
The initial VM evidently didn't stop while the snapshot was being created. Some lines are missing (for example 05:30:19), but that is not relevant: no continuous outage occurred. VERIFIED
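
The check for missing lines can also be automated. The following is a minimal sketch, not part of the original verification: `parse_date_line` and `find_gaps` are hypothetical helpers, the time zone token is ignored (all lines are assumed to share one zone), and month names are assumed to be in English.

```python
from datetime import datetime


def parse_date_line(line):
    """Parse `date` output such as 'Tue Dec  3 05:30:17 MST 2013'.

    The weekday and time zone tokens are skipped; %b assumes an English locale.
    """
    _weekday, month, day, clock, _zone, year = line.split()
    return datetime.strptime(f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y")


def find_gaps(lines, max_gap_seconds=1):
    """Return (previous_line, line, seconds) for every gap above the limit."""
    stamps = [(line, parse_date_line(line)) for line in lines if line.strip()]
    gaps = []
    for (prev_line, prev), (line, current) in zip(stamps, stamps[1:]):
        seconds = (current - prev).total_seconds()
        if seconds > max_gap_seconds:
            gaps.append((prev_line, line, seconds))
    return gaps


# First lines of the output recorded in step 3.1.
sample = [
    "Tue Dec  3 05:30:17 MST 2013",
    "Tue Dec  3 05:30:18 MST 2013",
    "Tue Dec  3 05:30:20 MST 2013",
    "Tue Dec  3 05:30:21 MST 2013",
    "Tue Dec  3 05:30:22 MST 2013",
]
gaps = find_gaps(sample)
```

With the sample above, `gaps` holds a single entry for the 2-second jump from 05:30:18 to 05:30:20. Raising `max_gap_seconds` separates such one-line skips from a longer outage.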

Comment 9 Martin Pavlásek 2013-12-03 16:03:40 UTC
Hmm... I made quite a big mistake, so on to the next round.

Everything is the same, but this time I did step 3.1 via noVNC (the 'Console' tab in Horizon). The behaviour was exactly the same, so the result is the same as last time: VERIFIED.

Comment 12 errata-xmlrpc 2013-12-20 00:07:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2013-1859.html