Bug 1598781
Summary: | Upgrading RHV-H is bringing back libvirt network file which causes issues in starting of VM | ||
---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | nijin ashok <nashok> |
Component: | imgbased | Assignee: | Ryan Barry <rbarry> |
Status: | CLOSED ERRATA | QA Contact: | Yaning Wang <yaniwang> |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | ||
Version: | 4.2.4 | CC: | cshao, dfediuck, gveitmic, huzhao, peyu, qiyuan, rbarry, sbonazzo, yaniwang, ycui, ylavi, yturgema, yzhao |
Target Milestone: | ovirt-4.2.6 | Keywords: | ZStream |
Target Release: | --- | Flags: | peyu: testing_plan_complete+ |
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2018-09-04 13:43:29 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | Node | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1622025 |
Description
nijin ashok
2018-07-06 12:45:49 UTC
Thanks, Nijin! Not sure how this wasn't caught in testing, but this may be the best bug report I've ever seen.

I can't actually reproduce this.

In both cases (updating from 4.1->4.2 prior to this patch, updating from 4.1->4.2 after this patch), both symlinks survive.

The logging is corrected with this patch, and behavior should be correct based on the report in comment#1, but since I can't reproduce the issue, I can't be sure.

Can you please provide exact steps to reproduce?

(In reply to Ryan Barry from comment #3)
> I can't actually reproduce this.
>
> In both cases (updating from 4.1->4.2 prior to this patch, updating from
> 4.1->4.2 after this patch), both symlinks survive.
>
> The logging is corrected with this patch, and behavior should be correct
> based on the report in comment#1, but since I can't reproduce the issue, I
> can't be sure.
>
> Can you please provide exact steps to reproduce?

Thank you Ryan for checking this. I am able to reproduce the issue. It does indeed need a 3-layer copy.

[1] Installed rhvh-4.1-0.20180425.0 and added it to a 4.1 manager.

imgbase w
[INFO] You are on rhvh-4.1-0.20170417.0+1

Files are present:

ls /etc/libvirt/qemu/networks/
autostart vdsm-ovirtmgmt.xml

ls /etc/libvirt/qemu/networks/autostart/
vdsm-ovirtmgmt.xml

[2] Upgraded manager to 4.2.

[3] Upgraded hypervisor to rhvh-4.2.3.0-0.20180518.0+1. Here the issue was not visible because, in the end, both files are copied from the old layer. Even though an "often" link was created in between, it automatically gets fixed in the end.
===
2018-07-09 11:37:18,457 [DEBUG] (remediate_etc) Calling: (['mount', u'/dev/rhvh/rhvh-4.2.3.0-0.20180518.0+1', u'/tmp/mnt.CJFuB'],) {'close_fds': True, 'stderr': -2}
2018-07-09 11:37:27,156 [DEBUG] (remediate_etc) os.unlink(/tmp/mnt.CJFuB////etc/libvirt/qemu/networks/vdsm-ovirtmgmt.xml)
2018-07-09 11:37:27,156 [DEBUG] (remediate_etc) os.unlink(/tmp/mnt.CJFuB////etc/libvirt/qemu/networks/autostart/vdsm-ovirtmgmt.xml)
2018-07-09 11:37:48,797 [DEBUG] (migrate_etc) Calling: (['cp', '-a', '-r', u'/tmp/mnt.rfCeG///etc/libvirt/qemu/networks/vdsm-ovirtmgmt.xml', u'/tmp/mnt.X5uYa///etc/libvirt/qemu/networks/vdsm-ovirtmgmt.xml'],) {'close_fds': True, 'stderr': -2}
2018-07-09 11:37:48,801 [DEBUG] (migrate_etc) Calling: (['cp', '-a', '-r', u'/tmp/mnt.rfCeG///etc/libvirt/qemu/networks/autostart/vdsm-ovirtmgmt.xml', u'/tmp/mnt.X5uYa///etc/libvirt/qemu/networks/autostart/vdsm-ovirtmgmt.xml'],) {'close_fds': True, 'stderr': -2}
===

[4] Updated the cluster to 4.2, then started and stopped the VM on this host. This clears vdsm-ovirtmgmt.xml from both directories.

[5] Put the host into maintenance mode and upgraded the host to rhvh-4.2.4.3-0.20180627.0.

imgbase w
You are on rhvh-4.2.4.3-0.20180627.0+1

The vdsm-ovirtmgmt.xml in the autostart directory is back from the dead.

ls /etc/libvirt/qemu/networks/autostart/
vdsm-ovirtmgmt.xml

ls /etc/libvirt/qemu/networks/
autostart default.xml

[6] Started the VM on this host, and it failed with the error below.
2018-07-09 12:25:53,067+0530 ERROR (vm/3c2a0c41) [virt.vm] (vmId='3c2a0c41-682c-413b-8c1c-75a8c726f4ff') The vm start process failed (vm:943)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2789, in _run
    self._setup_devices()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2613, in _setup_devices
    dev_object.setup()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vmdevices/graphics.py", line 91, in setup
    displaynetwork.create_network(display_network, self.vmid)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/displaynetwork.py", line 27, in create_network
    libvirtnetwork.create_network(netname, display_device, user_reference)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/libvirtnetwork.py", line 89, in create_network
    _createNetwork(createNetworkDef(netname, bridged, iface))
  File "/usr/lib/python2.7/site-packages/vdsm/virt/libvirtnetwork.py", line 108, in _createNetwork
    net.setAutostart(1)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2981, in setAutostart
    if ret == -1: raise libvirtError ('virNetworkSetAutostart() failed', net=self)
libvirtError: Failed to create symlink '/etc/libvirt/qemu/networks/autostart/vdsm-ovirtmgmt.xml' to '/etc/libvirt/qemu/networks/vdsm-ovirtmgmt.xml': File exists

This autostart file gets cleared automatically after this failure as part of the VM teardown process, so the next VM start will work. However, for the customer the impact was larger, as he migrated the VM and the VM went into Down status in the portal although the qemu-kvm process exists on the host. I was not able to reproduce that though.

Thanks - this gave me a reproducer, and I confirmed that the patch resolves

Another hit with 2 VMs. Raising this to urgent because on migration VDSM stops reporting the VM to RHV-M even though the qemu-kvm process exists and is running (the VM is fine).
But RHV-M sees it as down. This opens space for ugly split brains and corruptions.

Just to show what I meant above:

On incoming migration:

2018-08-16 10:15:39,271+1000 ERROR (vm/2e6bb483) [virt.vm] (vmId='2e6bb483-ff14-4743-8fc1-dcd41d644f15') The vm start process failed (vm:942)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 871, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2791, in _run
    self._setup_devices()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2615, in _setup_devices
    dev_object.setup()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vmdevices/graphics.py", line 91, in setup
    displaynetwork.create_network(display_network, self.vmid)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/displaynetwork.py", line 27, in create_network
    libvirtnetwork.create_network(netname, display_device, user_reference)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/libvirtnetwork.py", line 89, in create_network
    _createNetwork(createNetworkDef(netname, bridged, iface))
  File "/usr/lib/python2.7/site-packages/vdsm/virt/libvirtnetwork.py", line 108, in _createNetwork
    net.setAutostart(1)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2981, in setAutostart
    if ret == -1: raise libvirtError ('virNetworkSetAutostart() failed', net=self)
libvirtError: Failed to create symlink '/etc/libvirt/qemu/networks/autostart/vdsm-production.xml' to '/etc/libvirt/qemu/networks/vdsm-production.xml': File exists
2018-08-16 10:15:39,271+1000 INFO (vm/2e6bb483) [virt.vm] (vmId='2e6bb483-ff14-4743-8fc1-dcd41d644f15') Changed state to Down: Failed to create symlink '/etc/libvirt/qemu/networks/autostart/vdsm-production.xml' to '/etc/libvirt/qemu/networks/vdsm-production.xml': File exists (code=1) (vm:1682)
2018-08-16 10:15:39,277+1000 INFO (vm/2e6bb483) [virt.vm] (vmId='2e6bb483-ff14-4743-8fc1-dcd41d644f15') Stopping connection (guestagent:438)

Then VDSM does not report the VM,
but the VM is running happily on the host.

Verified on build 4.2-20180827.0.el7_5

Steps:
1. From 4.1-20180426.0, upgrade to 4.2-20180531.0
2. Upgrade to 4.2-20180827.0.el7_5

Actual results: libvirt network symlinks are removed correctly after multi-upgrade.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2626
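
For anyone checking a host after a multi-step upgrade, the state from step [5] (an entry under autostart/ with no matching definition one directory up) can be spotted before a VM start or incoming migration trips over it. The following is only a minimal Python 3 sketch under that assumption. The function names find_stale_autostart and try_autostart_symlink are hypothetical and are not part of vdsm, imgbased, or libvirt. The second function does not call libvirt at all; it only mimics the symlink creation named in the traceback to show where "File exists" comes from.

```python
import os
from pathlib import Path


def find_stale_autostart(networks_dir):
    """Return autostart entries that have no matching network definition.

    An entry is treated as stale when autostart/<name>.xml is present
    (regular file or symlink, even a dangling one) but
    <networks_dir>/<name>.xml is missing. This mirrors the listing in
    step [5]; it is an assumption about what "stale" means, not a rule
    taken from libvirt or vdsm.
    """
    networks = Path(networks_dir)
    autostart = networks / "autostart"
    if not autostart.is_dir():
        return []
    stale = []
    for entry in sorted(autostart.iterdir()):
        if entry.suffix != ".xml":
            continue
        if not (networks / entry.name).exists():
            stale.append(entry.name)
    return stale


def try_autostart_symlink(networks_dir, name):
    """Mimic the symlink step from the traceback, without libvirt.

    Returns None when the link could be created, or the OS error text
    when something already occupies the autostart path.
    """
    networks = Path(networks_dir)
    target = networks / name
    link = networks / "autostart" / name
    try:
        os.symlink(target, link)
    except FileExistsError as exc:
        return exc.strerror
    return None
```

On a host, the check would be called as find_stale_autostart("/etc/libvirt/qemu/networks"). It only reads the directory and removes nothing, so what to do with a reported entry is left to the operator.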