Bug 1318902 - Cannot start guest on RHEL 7.3 host via RHEVM
Summary: Cannot start guest on RHEL 7.3 host via RHEVM
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.6.3
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.0.0-alpha
Target Release: 4.0.0
Assignee: Francesco Romani
QA Contact: Shira Maximov
URL:
Whiteboard:
Depends On: 1290357 1325503 1326839
Blocks: 1322796
 
Reported: 2016-03-18 05:42 UTC by Han Han
Modified: 2017-02-27 12:14 UTC (History)
17 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1322796 (view as bug list)
Environment:
Last Closed: 2016-08-23 20:19:06 UTC
oVirt Team: Virt


Attachments
vdsm log (218.27 KB, text/plain)
2016-03-18 09:29 UTC, yangyang
no flags Details
libvirtd.log (13.12 MB, text/plain)
2016-03-18 09:31 UTC, yangyang
no flags Details
The log of ovirt-engine (838.00 KB, application/x-xz)
2016-03-21 04:33 UTC, Han Han
no flags Details
The log of vdsm (64.97 KB, application/x-xz)
2016-03-21 04:33 UTC, Han Han
no flags Details
The log of libvirt (1.61 KB, application/x-xz)
2016-03-21 04:34 UTC, Han Han
no flags Details
vdsm log of verification try that failed (661.03 KB, application/x-xz)
2016-04-17 16:17 UTC, Ilanit Stein
no flags Details
qemu log of verification try that failed (2.79 KB, text/plain)
2016-04-17 16:18 UTC, Ilanit Stein
no flags Details
engine log of verification try that failed (587.90 KB, application/x-gzip)
2016-04-17 16:20 UTC, Ilanit Stein
no flags Details


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2016:1671 normal SHIPPED_LIVE VDSM 4.0 GA bug fix and enhancement update 2016-09-02 21:32:03 UTC
oVirt gerrit 55189 None None None 2016-03-24 10:25:05 UTC
oVirt gerrit 55225 ovirt-3.6 MERGED configurator: libvirt: do not jump on virtlogd 2016-03-25 13:57:00 UTC

Description Han Han 2016-03-18 05:42:09 UTC
Description of problem:
Cannot start a guest on a RHEL 7.3 host via RHEVM (see summary).

Version-Release number of selected component (if applicable):
On host:
libvirt-daemon-1.3.2-1.el7.x86_64
qemu-kvm-rhev-2.5.0-2.el7.x86_64
vdsm-4.17.23.1-0.el7ev.noarch
kernel-3.10.0-363.el7.x86_64
On RHEVM:
rhevm-3.6.3.4-0.1.el6.noarch
kernel-2.6.32-604.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Add a data center, a cluster
For the data center, set Compatibility Version to 3.6; leave the rest at defaults.
For the cluster, set CPU Architecture to x86_64 and CPU Type to Intel Conroe Family.
2. Add a host and an NFS domain
Add an Intel host to the data center and cluster above, and add an NFS storage domain for it.
3. Create a guest whose OS disk is on the NFS domain, and use Run Once to boot it from PXE. The guest and the host both go down.

Alerts then appear on RHEVM:
Mar 15, 2016 12:37:33 PM
VDSM A command failed: Broken pipe
Mar 15, 2016 12:04:20 PM
VDSM A command failed: Broken pipe
Mar 15, 2016 11:52:21 AM
Host A is non responsive.
Mar 15, 2016 11:52:21 AM
VM QQ was set to the Unknown status.
Mar 15, 2016 11:51:11 AM
Invalid status on Data Center hhan. Setting Data Center status to Non Responsive (On host A, Error: Network error during communication with the Host.).
Mar 15, 2016 11:51:01 AM
User admin@internal failed to initiate a console session for VM QQ
Mar 15, 2016 11:51:01 AM
Host A is not responding. Host cannot be fenced automatically because power management for the host is disabled.
Mar 15, 2016 11:51:01 AM
VDSM command failed: Broken pipe

Actual results:
As in step 3: the guest and the host both go down.

Expected results:
The guest should start normally.

Additional info:

Comment 1 yangyang 2016-03-18 09:29:48 UTC
Created attachment 1137728 [details]
vdsm log

Comment 2 yangyang 2016-03-18 09:31:03 UTC
Created attachment 1137729 [details]
libvirtd.log

Comment 3 Yaniv Kaul 2016-03-20 07:07:22 UTC
1. Please attach compressed logs to the bugs.
2. Are your hosts AMD or Intel? Libvirt hints at AMD, your description above says Intel:
2016-03-18 09:26:10.169+0000: 19577: debug : cpuDecode:194 : cpu=0x7f8fec008d10, data=0x7f8fec0107c0, nmodels=29, preferred=Opteron_G1
2016-03-18 09:26:10.169+0000: 19577: debug : cpuDecode:198 : models[0]=Opteron_G5
2016-03-18 09:26:10.169+0000: 19577: debug : cpuDecode:198 : models[1]=Opteron_G4
2016-03-18 09:26:10.169+0000: 19577: debug : cpuDecode:198 : models[2]=Opteron_G3
2016-03-18 09:26:10.169+0000: 19577: debug : cpuDecode:198 : models[3]=Opteron_G2
2016-03-18 09:26:10.169+0000: 19577: debug : cpuDecode:198 : models[4]=Opteron_G1

Comment 4 Dan Kenigsberg 2016-03-20 10:01:29 UTC
I don't see any "broken pipe" in your Vdsm log, or any other error for that matter. Please attach the logs of the failed attempt to run the QQ VM.

Comment 5 Han Han 2016-03-21 04:32:00 UTC
(In reply to Yaniv Kaul from comment #3)
> 1. Please attach compressed logs to the bugs.
> 2. Are your hosts AMD or Intel? Libvirt hints at AMD, your description above
> says Intel:
> [...]

I used an Intel CPU host, but that host was returned after I reported the bug, so I reproduced the bug again (comment 1) on another Intel CPU host.

After step 3 the VM and the host both went down; the errors on the RHEVM web page:
Mar 21, 2016 11:27:13 AM
Host QQ is non responsive.
Mar 21, 2016 11:27:13 AM
VM Q was set to the Unknown status.
Mar 21, 2016 11:26:14 AM
VDSM command failed: Heartbeat exeeded
Mar 21, 2016 11:25:55 AM
Invalid status on Data Center hhan. Setting Data Center status to Non Responsive (On host QQ, Error: Network error during communication with the Host.).
Mar 21, 2016 11:25:55 AM
Host QQ is not responding. Host cannot be fenced automatically because power management for the host is disabled.
Mar 21, 2016 11:25:55 AM
VDSM QQ command failed: Broken pipe
Mar 21, 2016 11:25:55 AM
VDSM QQ command failed: Broken pipe

The logs of ovirt-engine, vdsm and libvirt are in the attachments.

Comment 6 Han Han 2016-03-21 04:33:01 UTC
Created attachment 1138406 [details]
The log of ovirt-engine

Comment 7 Han Han 2016-03-21 04:33:53 UTC
Created attachment 1138407 [details]
The log of vdsm

Comment 8 Han Han 2016-03-21 04:34:31 UTC
Created attachment 1138408 [details]
The log of libvirt

Comment 9 Dan Kenigsberg 2016-03-21 07:52:45 UTC
No broken pipe here either, but there is

libvirtError: Failed to connect socket to '/var/run/libvirt/virtlogd-sock': No such file or directory

{u'vmParams': {u'acpiEnable': u'true', u'emulatedMachine': u'pc-i440fx-rhel7.2.0', u'vmId': u'c3c08421-6da3-4a28-9adb-ab23e7eac76a', u'memGuaranteedSize': 1024, u'transparentHugePages': u'true', u'timeOffset': u'0', u'cpuType': u'Conroe', u'smp': u'1', u'guestNumaNodes': [{u'nodeIndex': 0, u'cpus': u'0', u'memory': u'1024'}], u'custom': {}, u'vmType': u'kvm', u'spiceSslCipherSuite': u'DEFAULT', u'memSize': 1024, u'smpCoresPerSocket': u'1', u'vmName': u'Q', u'nice': u'0', u'maxMemSize': 4194304, u'bootMenuEnable': u'false', u'copyPasteEnable': u'true', u'smpThreadsPerCore': u'1', u'smartcardEnable': u'false', u'maxMemSlots': 16, u'fileTransferEnable': u'true', u'keyboardLayout': u'en-us', u'kvmEnable': u'true', u'displayNetwork': u'ovirtmgmt', u'devices': [{u'device': u'qxl', u'specParams': {u'vram': u'8192', u'vgamem': u'16384', u'heads': u'1', u'ram': u'65536'}, u'type': u'video', u'deviceId': u'eb27e56b-792a-4ca5-91a8-d61abe50cbae'}, {u'device': u'spice', u'specParams': {u'fileTransferEnable': u'true', u'copyPasteEnable': u'true'}, u'type': u'graphics', u'deviceId': u'd54a012c-36f2-4b52-8e98-b35f5751d639'}, {u'index': u'2', u'iface': u'ide', u'specParams': {u'path': u''}, u'readonly': u'true', u'deviceId': u'9f387e35-f890-4b1c-a41b-cc5cf668444c', u'path': u'', u'device': u'cdrom', u'shared': u'false', u'type': u'disk'}, {u'index': 0, u'domainID': u'77a1b246-5a05-4c81-9291-6134df4c6848', u'format': u'raw', u'optional': u'false', u'poolID': u'71cb9874-d667-42c2-b9b8-cf826d285e22', u'volumeID': u'b7524b5c-4f6d-4a11-a319-e116ccc56b3f', u'imageID': u'8e055c35-a4c0-4e72-a44a-9f15387ead0c', u'specParams': {}, u'readonly': u'false', u'iface': u'virtio', u'deviceId': u'8e055c35-a4c0-4e72-a44a-9f15387ead0c', u'bootOrder': u'2', u'device': u'disk', u'shared': u'false', u'propagateErrors': u'off', u'type': u'disk'}, {u'nicModel': u'pv', u'macAddr': u'00:1a:4a:16:01:51', u'linkActive': u'true', u'network': u'ovirtmgmt', u'bootOrder': u'1', u'filter': u'vdsm-no-mac-spoofing', 
u'specParams': {u'inbound': {}, u'outbound': {}}, u'deviceId': u'15545951-3106-4662-8d49-a631b8cc07c0', u'device': u'bridge', u'type': u'interface'}, {u'device': u'memballoon', u'specParams': {u'model': u'virtio'}, u'type': u'balloon', u'deviceId': u'f5447bf6-d36a-47f9-a679-b1a1c3806f85'}, {u'index': u'0', u'specParams': {}, u'deviceId': u'84ae75f0-404f-491a-a35e-85481c317fa0', u'device': u'scsi', u'model': u'virtio-scsi', u'type': u'controller'}, {u'device': u'virtio-serial', u'specParams': {}, u'type': u'controller', u'deviceId': u'53117546-60fc-41d3-a7b6-996aa5258a26'}], u'maxVCpus': u'16', u'spiceSecureChannels': u'smain,sinputs,scursor,splayback,srecord,sdisplay,susbredir,ssmartcard', u'display': u'qxl'}, u'vmID': u'c3c08421-6da3-4a28-9adb-ab23e7eac76a'}
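The missing-socket error above can be checked directly on the host. A minimal diagnostic sketch (the socket path is taken from the traceback above; the unit names assume libvirt's stock systemd packaging and are not quoted from this bug):

```shell
# Check whether virtlogd's socket exists; QEMU startup fails without it.
SOCK=/var/run/libvirt/virtlogd-sock
if [ -S "$SOCK" ]; then
    echo "virtlogd socket present: $SOCK"
else
    echo "virtlogd socket missing: $SOCK"
fi
# On a systemd host one would also check the units (not run here):
#   systemctl status virtlogd.service virtlogd.socket
```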

Comment 10 Francesco Romani 2016-03-21 08:01:13 UTC
(In reply to Dan Kenigsberg from comment #9)
> no broken pipe here as well, but there's
> 
> libvirtError: Failed to connect socket to '/var/run/libvirt/virtlogd-sock':
> No such file or directory
> 
> [...]

"virtlogd" sounds new to me, looks like I need to catch up with libvirt 1.3.x

Han, please check https://bugzilla.redhat.com/show_bug.cgi?id=1290357 and see if this applies to your case.

Comment 11 Han Han 2016-03-21 11:39:02 UTC
(In reply to Dan Kenigsberg from comment #9)
> no broken pipe here as well, but there's
> 
> libvirtError: Failed to connect socket to '/var/run/libvirt/virtlogd-sock':
> No such file or directory
> 
> [...]

virtlogd is not automatically started after installing the host via RHEVM. The first time I started the VM, the virtlogd-sock error appeared, but it only took the VM down; the host was still up.
After I started virtlogd manually, the VM status became '?', and the host and the storage domain both went down.
So I think the virtlogd socket error is from the first time I started the VM, which did not cause the host to go down.
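The manual start described here maps onto systemd units. A hedged sketch (the unit names assume the stock libvirt 1.3.x packaging; the commands are echoed rather than executed, so the sketch is safe to read and paste selectively):

```shell
# Enabling socket activation makes virtlogd start on demand, so
# /var/run/libvirt/virtlogd-sock exists before the first VM start.
for cmd in \
    "systemctl enable --now virtlogd.socket" \
    "systemctl status virtlogd"; do
    echo "would run: $cmd"
done
```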

Comment 12 Francesco Romani 2016-03-21 16:57:47 UTC
Thanks Han.

Reading https://bugzilla.redhat.com/show_bug.cgi?id=1290357, I'm still not sure whether this is a RHEV-M issue or a libvirt issue; perhaps it is just a packaging issue.

According to https://bugzilla.redhat.com/show_bug.cgi?id=1318902#c11, the packaging seems not 100% correct, hence the dependency on 1290357.

We could perhaps add extra safety in ovirt-host-deploy, to ensure the required service is up.

Comment 13 Michal Skrivanek 2016-03-22 11:52:14 UTC
(In reply to Francesco Romani from comment #12)

How did those versions end up on your host anyway? libvirt-1.3.2 is not part of RHEL 7.2, and neither is qemu 2.5.

Comment 14 Han Han 2016-03-24 01:50:44 UTC
(In reply to Michal Skrivanek from comment #13)
> (In reply to Francesco Romani from comment #12)
> 
> how did those versions end up on your host anyway? libvirt-1.3.2 is not a
> RHEL 7.2, neither is qemu 2.5

Well, I used RHEL 7.3 Beta for the host. The libvirt/qemu versions are as comment 1 says.

Comment 15 Francesco Romani 2016-03-24 10:25:06 UTC
The current resolution is:
- depend on rhbz#1290357 to make sure the libvirt dependency is well behaving
  -- no change needed in ovirt-host-deploy
- disable support for virtlogd until our integration is well tested
  -- https://gerrit.ovirt.org/55189
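For context (the gerrit change itself is not quoted here): on the libvirt side, 1.3.x-era releases expose the choice between virtlogd and plain log files as a qemu.conf setting, which is the kind of knob such a configurator would drive. An illustrative fragment, not the vdsm patch itself:

```ini
# /etc/libvirt/qemu.conf -- illustrative only; exact libvirt version support
# for this option is not pinned here.
# "logd" routes QEMU stdout/stderr through virtlogd; "file" writes the
# per-domain log files directly and avoids the virtlogd-sock dependency.
stdio_handler = "file"
```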

Comment 16 Michal Skrivanek 2016-03-24 15:54:08 UTC
Actually, we need it in 3.6 in time, before 7.3 arrives all of a sudden. Please check that the change is OK on the old (current) libvirt in 7.2.

Comment 17 Francesco Romani 2016-03-24 16:49:06 UTC
(In reply to Michal Skrivanek from comment #16)
> Actually, we need it in 3.6 in time before we get 7.3 out of a sudden.
> Please check the change is ok on old(current) libvirt in 7.2

It should be OK already; I verified on CentOS 7.2, which is equivalent for this purpose, with libvirt 1.2.17. I will double-check on real RHEL 7.2 just to be sure.

Comment 22 Ilanit Stein 2016-04-13 14:08:15 UTC
This bug can't be verified until 1326839 is resolved.

Comment 23 Ilanit Stein 2016-04-17 16:10:22 UTC
On a RHEL 7.3 Beta host (upgraded from RHEL 7.2), I tried to verify this bug:

Run Once, booting a guest from PXE, turned the host non-responsive, and then the VM, which had started to come up, went to Down state, with the same host events as described in this bug's description.

service vdsmd restart didn't help; the vdsm service was still down (code=killed, signal=SEGV).

Host reinstall via the RHEVM webadmin didn't help either; the install failed with

host versions:
=============
libvirt-1.3.3-2.el7.x86_64
vdsm-4.17.26-0.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.12.x86_64
3.10.0-379.el7.x86_64

engine version:
==============
rhevm-3.6.5-0.1.el6

vdsm service status:
===================
[root@blond-vdsf yum.repos.d]# service vdsmd status
Redirecting to /bin/systemctl status  vdsmd.service
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: activating (auto-restart) (Result: signal) since Sun 2016-04-17 18:27:33 IDT; 8s ago
  Process: 19283 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
  Process: 19203 ExecStart=/usr/share/vdsm/daemonAdapter -0 /dev/null -1 /dev/null -2 /dev/null /usr/share/vdsm/vdsm (code=killed, signal=SEGV)
  Process: 19102 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 19203 (code=killed, signal=SEGV)

Apr 17 18:27:33 blond-vdsf.qa.lab.tlv.redhat.com systemd[1]: vdsmd.service failed.
Apr 17 18:27:33 blond-vdsf.qa.lab.tlv.redhat.com systemd[1]: vdsmd.service holdoff time over, scheduling restart.

Attaching vdsm, qemu logs.

Michal,

Is there a way to overcome this bug without reinstalling the host from scratch?

Comment 24 Ilanit Stein 2016-04-17 16:17:54 UTC
Created attachment 1148073 [details]
vdsm log of verification try that failed

For the relevant VM, please see VM.create at 18:22.

Comment 25 Ilanit Stein 2016-04-17 16:18:30 UTC
Created attachment 1148074 [details]
qemu log of verification try that failed

Comment 26 Ilanit Stein 2016-04-17 16:20:51 UTC
Created attachment 1148075 [details]
engine log of verification try that failed

The time on the engine is 2 hours behind the RHEL 7.3 host running the VM.
(VM creation is at 18:22 on the host; on the engine, look at 16:22.)

Comment 27 Francesco Romani 2016-04-18 09:21:35 UTC
(In reply to Ilanit Stein from comment #23)
> [...]
> Michal,
> 
> Is there a way to overcome this bug, without re installing the host from
> scratch?

Seems like libvirt is just crashing, as we learn from 

https://bugzilla.redhat.com/show_bug.cgi?id=1326839#c4

Please note:

Apr 13 22:03:04 amd-6172-512-2 kernel: periodic/0[49069]: segfault at deadbef7 ip 00007f6717c1bf78 sp 00007f66a4ff6ca8 error 4 in libvirt.so.0.1003.3[7f6717b6c000+33d000]

I can't tell if it is a libvirt bug or if it is host-specific.
The root cause is different from the one fixed here, even though I concur the final effect is the same.
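For anyone triaging that libvirt crash, the kernel's segfault line already contains enough to locate the faulting code: subtracting the library load address from the instruction pointer gives the offset inside libvirt.so. A sketch using the numbers quoted above (the addr2line step needs the exact installed build and matching debuginfo, so it is only indicated, not run):

```shell
# ip and load address from: "segfault at deadbef7 ip 00007f6717c1bf78 ...
# in libvirt.so.0.1003.3[7f6717b6c000+33d000]"
ip=0x7f6717c1bf78
base=0x7f6717b6c000
printf 'offset into libvirt.so: 0x%x\n' $(( ip - base ))
# Then resolve the offset to a symbol, e.g.:
#   addr2line -f -e /usr/lib64/libvirt.so.0.1003.3 0xaff78
```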

Comment 29 Shira Maximov 2016-07-24 10:27:45 UTC
Verification steps:
1. Create a RHEL 7.3 host using the following repos: http://download.eng.tlv.redhat.com/pub/rhel/nightly/RHEL-7.3-20160720.n.0/compose/Server/x86_64/os/
2. Add the host to an existing RHEVM 4.0.2-0.2.rc1.el7ev setup.
3. Create a VM from a template on the RHEL 7.3 host - worked.
4. Create a VM and Run Once it on the RHEL 7.3 host - worked.

Comment 31 errata-xmlrpc 2016-08-23 20:19:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-1671.html

