Bug 1326839

Summary: libvirt-python crashes in getAllDomainStats
Product: Red Hat Enterprise Linux 7 Reporter: Ilanit Stein <istein>
Component: libvirt-python    Assignee: Pavel Hrdina <phrdina>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: high Docs Contact:
Priority: unspecified    
Version: 7.3    CC: bugs, dyuan, hhan, jdenemar, lcheng, mavital, michal.skrivanek, mzhan, pzhang, sherold, xuzhang, yanyang
Target Milestone: pre-dev-freeze    Keywords: Regression
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: libvirt-python-1.3.4-1.el7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-11-04 00:12:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Virt    RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1318902, 1322796    
Attachments:
Description          Flags
vdsm log             none
engine log           none
/var/log/messages    none

Description Ilanit Stein 2016-04-13 13:52:39 UTC
Description of problem:
Installation of a RHEL 7.3 host in RHEV-M fails with the host becoming non-responsive.
The vdsmd service does not come up; it fails with code=killed, signal=SEGV.


Version-Release number of selected component (if applicable):
vdsm-4.17.25-0.el7ev.noarch & vdsm-4.17.26-0.el7ev.noarch

libvirt-1.3.3-1.el7.x86_64
qemu-img-rhev-2.5.0-4.el7.x86_64
qemu-kvm-rhev-2.5.0-4.el7.x86_64

Additional info:

systemctl status -l vdsmd.service
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: activating (auto-restart) (Result: signal) since Wed 2016-04-13 17:52:25 CST; 4s ago
  Process: 26156 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
  Process: 26091 ExecStart=/usr/share/vdsm/daemonAdapter -0 /dev/null -1 /dev/null -2 /dev/null /usr/share/vdsm/vdsm (code=killed, signal=SEGV)
  Process: 26019 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 26091 (code=killed, signal=SEGV)

Apr 13 17:52:25 amd-6172-512-2.lab.eng.pek2.redhat.com systemd[1]: vdsmd.service failed.
Apr 13 17:52:26 amd-6172-512-2.lab.eng.pek2.redhat.com systemd[1]: vdsmd.service holdoff time over, scheduling restart.


dmesg contains repeated messages like:
[21367.505170] periodic/0[27661]: segfault at deadbef7 ip 00007f1c37f11f78 sp 00007f1bc17f7ca8 error 4 in libvirt.so.0.1003.3[7f1c37e62000+33d000]

Comment 1 Ilanit Stein 2016-04-13 13:53:20 UTC
Created attachment 1146857 [details]
vdsm log

Comment 2 Ilanit Stein 2016-04-13 13:54:01 UTC
Created attachment 1146858 [details]
engine log

Comment 3 Ilanit Stein 2016-04-13 13:56:40 UTC
Created attachment 1146859 [details]
/var/log/messages

Comment 4 Ilanit Stein 2016-04-13 14:05:12 UTC
Additional Info:

* hhan: the service is killed by SIGSEGV, but we cannot get a coredump.

Some messages seen in /var/log/messages (attached):

Apr 13 22:03:01 amd-6172-512-2 systemd: mom-vdsm.service stop-sigterm timed out. Killing.
Apr 13 22:03:01 amd-6172-512-2 systemd: mom-vdsm.service: main process exited, code=killed, status=9/KILL
Apr 13 22:03:01 amd-6172-512-2 systemd: Unit mom-vdsm.service entered failed state.
Apr 13 22:03:01 amd-6172-512-2 systemd: mom-vdsm.service failed.
Apr 13 22:03:01 amd-6172-512-2 systemd: Starting Virtual Desktop Server Manager...
Apr 13 22:03:01 amd-6172-512-2 vdsmd_init_common.sh: vdsm: Running mkdirs
Apr 13 22:03:01 amd-6172-512-2 vdsmd_init_common.sh: vdsm: Running configure_coredump
Apr 13 22:03:01 amd-6172-512-2 vdsmd_init_common.sh: vdsm: Running configure_vdsm_logs
Apr 13 22:03:01 amd-6172-512-2 vdsmd_init_common.sh: vdsm: Running wait_for_network
Apr 13 22:03:01 amd-6172-512-2 vdsmd_init_common.sh: vdsm: Running run_init_hooks
Apr 13 22:03:01 amd-6172-512-2 vdsmd_init_common.sh: vdsm: Running upgraded_version_check
Apr 13 22:03:01 amd-6172-512-2 vdsmd_init_common.sh: vdsm: Running check_is_configured
Apr 13 22:03:01 amd-6172-512-2 vdsmd_init_common.sh: Current revision of multipath.conf detected, preserving
Apr 13 22:03:01 amd-6172-512-2 vdsmd_init_common.sh: libvirt is already configured for vdsm
Apr 13 22:03:01 amd-6172-512-2 vdsmd_init_common.sh: vdsm: Running validate_configuration
Apr 13 22:03:02 amd-6172-512-2 vdsmd_init_common.sh: SUCCESS: ssl configured to true. No conflicts
Apr 13 22:03:02 amd-6172-512-2 vdsmd_init_common.sh: vdsm: Running prepare_transient_repository
Apr 13 22:03:02 amd-6172-512-2 vdsmd_init_common.sh: vdsm: Running syslog_available
Apr 13 22:03:02 amd-6172-512-2 vdsmd_init_common.sh: vdsm: Running nwfilter
Apr 13 22:03:03 amd-6172-512-2 vdsmd_init_common.sh: vdsm: Running dummybr
Apr 13 22:03:03 amd-6172-512-2 vdsmd_init_common.sh: vdsm: Running tune_system
Apr 13 22:03:03 amd-6172-512-2 vdsmd_init_common.sh: vdsm: Running test_space
Apr 13 22:03:03 amd-6172-512-2 vdsmd_init_common.sh: vdsm: Running test_lo
Apr 13 22:03:03 amd-6172-512-2 systemd: Started Virtual Desktop Server Manager.
Apr 13 22:03:03 amd-6172-512-2 systemd: Started MOM instance configured for VDSM purposes.
Apr 13 22:03:03 amd-6172-512-2 systemd: Starting MOM instance configured for VDSM purposes...
Apr 13 22:03:04 amd-6172-512-2 kernel: periodic/0[49069]: segfault at deadbef7 ip 00007f6717c1bf78 sp 00007f66a4ff6ca8 error 4 in libvirt.so.0.1003.3[7f6717b6c000+33d000]
Apr 13 22:03:04 amd-6172-512-2 journal: End of file while reading data: Input/output error
Apr 13 22:03:04 amd-6172-512-2 journal: End of file while reading data: Input/output error
Apr 13 22:03:04 amd-6172-512-2 systemd: vdsmd.service: main process exited, code=killed, status=11/SEGV
Apr 13 22:03:04 amd-6172-512-2 vdsmd_init_common.sh: vdsm: Running run_final_hooks
Apr 13 22:03:04 amd-6172-512-2 systemd: Unit vdsmd.service entered failed state.
Apr 13 22:03:04 amd-6172-512-2 systemd: vdsmd.service failed.
Apr 13 22:03:04 amd-6172-512-2 systemd: vdsmd.service holdoff time over, scheduling restart.
Apr 13 22:03:04 amd-6172-512-2 systemd: Stopping MOM instance configured for VDSM purposes...
Apr 13 22:03:15 amd-6172-512-2 systemd: mom-vdsm.service stop-sigterm timed out. Killing.
Apr 13 22:03:15 amd-6172-512-2 systemd: mom-vdsm.service: main process exited, code=killed, status=9/KILL
Apr 13 22:03:15 amd-6172-512-2 systemd: Unit mom-vdsm.service entered failed state.
Apr 13 22:03:15 amd-6172-512-2 systemd: mom-vdsm.service failed.
Apr 13 22:03:15 amd-6172-512-2 systemd: Starting Virtual Desktop Server Manager...
Apr 13 22:03:15 amd-6172-512-2 vdsmd_init_common.sh: vdsm: Running mkdirs
Apr 13 22:03:15 amd-6172-512-2 vdsmd_init_common.sh: vdsm: Running configure_coredump
Apr 13 22:03:15 amd-6172-512-2 vdsmd_init_common.sh: vdsm: Running configure_vdsm_logs
Apr 13 22:03:15 amd-6172-512-2 vdsmd_init_common.sh: vdsm: Running wait_for_network
Apr 13 22:03:15 amd-6172-512-2 vdsmd_init_common.sh: vdsm: Running run_init_hooks
Apr 13 22:03:15 amd-6172-512-2 vdsmd_init_common.sh: vdsm: Running upgraded_version_check
Apr 13 22:03:15 amd-6172-512-2 vdsmd_init_common.sh: vdsm: Running check_is_configured
Apr 13 22:03:15 amd-6172-512-2 vdsmd_init_common.sh: Current revision of multipath.conf detected, preserving
Apr 13 22:03:15 amd-6172-512-2 vdsmd_init_common.sh: libvirt is already configured for vdsm
Apr 13 22:03:15 amd-6172-512-2 vdsmd_init_common.sh: vdsm: Running validate_configuration
Apr 13 22:03:16 amd-6172-512-2 vdsmd_init_common.sh: SUCCESS: ssl configured to true. No conflicts
Apr 13 22:03:16 amd-6172-512-2 vdsmd_init_common.sh: vdsm: Running prepare_transient_repository
Apr 13 22:03:16 amd-6172-512-2 vdsmd_init_common.sh: vdsm: Running syslog_available
Apr 13 22:03:16 amd-6172-512-2 vdsmd_init_common.sh: vdsm: Running nwfilter
Apr 13 22:03:17 amd-6172-512-2 vdsmd_init_common.sh: vdsm: Running dummybr
Apr 13 22:03:17 amd-6172-512-2 vdsmd_init_common.sh: vdsm: Running tune_system
Apr 13 22:03:17 amd-6172-512-2 vdsmd_init_common.sh: vdsm: Running test_space
Apr 13 22:03:17 amd-6172-512-2 vdsmd_init_common.sh: vdsm: Running test_lo
Apr 13 22:03:17 amd-6172-512-2 systemd: Started Virtual Desktop Server Manager.
Apr 13 22:03:17 amd-6172-512-2 systemd: Started MOM instance configured for VDSM purposes.
Apr 13 22:03:17 amd-6172-512-2 systemd: Starting MOM instance configured for VDSM purposes...


Apr 13 22:03:18 amd-6172-512-2 kernel: periodic/0[49214]: segfault at deadbef7 ip 00007f71c4519f78 sp 00007f714d7f7ca8 error 4 in libvirt.so.0.1003.3[7f71c446a000+33d000]


Apr 13 22:03:18 amd-6172-512-2 journal: End of file while reading data: Input/output error
Apr 13 22:03:18 amd-6172-512-2 journal: End of file while reading data: Input/output error
Apr 13 22:03:18 amd-6172-512-2 systemd: vdsmd.service: main process exited, code=killed, status=11/SEGV
Apr 13 22:03:18 amd-6172-512-2 vdsmd_init_common.sh: vdsm: Running run_final_hooks
Apr 13 22:03:18 amd-6172-512-2 systemd: Unit vdsmd.service entered failed state.
Apr 13 22:03:18 amd-6172-512-2 systemd: vdsmd.service failed.
Apr 13 22:03:18 amd-6172-512-2 systemd: vdsmd.service holdoff time over, scheduling restart.
Apr 13 22:03:18 amd-6172-512-2 systemd: Stopping MOM instance configured for VDSM purposes...

Comment 5 Francesco Romani 2016-04-18 13:21:08 UTC
I did some investigation on the host.
It seems to be either a libvirt-python bug or a serious host issue:

Python 2.7.5 (default, Apr  4 2016, 11:14:06) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import libvirt
>>> conn = libvirt.openReadOnly('qemu:///system')
>>> doms = conn.listAllDomains(0)
>>> doms
[<libvirt.virDomain object at 0x7f1861b99850>]
>>> print(doms[0].UUIDString())
0324055f-f0f2-4fe9-a236-20e95ff3350a
>>> print(conn.getAllDomainStats())
Segmentation fault

libvirt-daemon-driver-nodedev-1.3.3-2.el7.x86_64
libvirt-devel-1.3.3-2.el7.x86_64
libvirt-daemon-driver-network-1.3.3-2.el7.x86_64
libvirt-daemon-config-nwfilter-1.3.3-2.el7.x86_64
libvirt-lock-sanlock-1.3.3-2.el7.x86_64
libvirt-daemon-driver-nwfilter-1.3.3-2.el7.x86_64
libvirt-daemon-driver-secret-1.3.3-2.el7.x86_64
libvirt-daemon-config-network-1.3.3-2.el7.x86_64
libvirt-daemon-kvm-1.3.3-2.el7.x86_64
libvirt-login-shell-1.3.3-2.el7.x86_64
libvirt-client-1.3.3-2.el7.x86_64
libvirt-daemon-driver-interface-1.3.3-2.el7.x86_64
libvirt-daemon-driver-lxc-1.3.3-2.el7.x86_64
libvirt-docs-1.3.3-2.el7.x86_64
libvirt-daemon-lxc-1.3.3-2.el7.x86_64
libvirt-debuginfo-1.3.3-2.el7.x86_64
libvirt-daemon-1.3.3-2.el7.x86_64
libvirt-daemon-driver-qemu-1.3.3-2.el7.x86_64
libvirt-nss-1.3.3-2.el7.x86_64
libvirt-python-1.3.3-1.el7.x86_64
libvirt-daemon-driver-storage-1.3.3-2.el7.x86_64
libvirt-1.3.3-2.el7.x86_64

Comment 6 Francesco Romani 2016-04-18 13:22:38 UTC
Possible workarounds (neither actually tried; see the probe sketch below):
- try another RHEL 7.3 host to check whether this is really a host-specific or a package issue
- downgrade libvirt-python (if possible) to the latest 1.2.x version
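
A segfault cannot be caught as a Python exception, so to tell a host problem from a packaging problem without taking down the caller, the crashing call can be isolated in a child interpreter. A minimal, untried sketch; the qemu:///system read-only connection matches the reproducer in comment 5, everything else is illustrative:

# Hypothetical probe: run the crashing call in a child process so a SIGSEGV
# kills only the child. On POSIX, subprocess.call() returns the negative
# signal number when the child dies from a signal, so -11 means SIGSEGV.
import subprocess
import sys

PROBE = (
    "import libvirt\n"
    "conn = libvirt.openReadOnly('qemu:///system')\n"
    "conn.getAllDomainStats()\n"
)

rc = subprocess.call([sys.executable, "-c", PROBE])
print("probe exit status:", rc)  # 0 = OK, -11 = segfault (SIGSEGV)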

Comment 7 Francesco Romani 2016-04-18 13:31:35 UTC
(In reply to Francesco Romani from comment #5)
> I did some investigation on the host.
> It seems to be either a libvirt-python bug or a serious host issue:
> [...]
> >>> print(conn.getAllDomainStats())
> Segmentation fault
> [...]

Works like a charm with libvirt 1.3.1 (upstream) recompiled from vanilla sources:
libvirt-daemon-driver-interface-1.3.1-1.el7.centos.x86_64
libvirt-docs-1.3.1-1.el7.centos.x86_64
libvirt-daemon-1.3.1-1.el7.centos.x86_64
libvirt-daemon-driver-qemu-1.3.1-1.el7.centos.x86_64
libvirt-lock-sanlock-1.3.1-1.el7.centos.x86_64
libvirt-python-1.2.17-2.el7.x86_64
libvirt-daemon-driver-secret-1.3.1-1.el7.centos.x86_64
libvirt-1.3.1-1.el7.centos.x86_64
libvirt-daemon-driver-nwfilter-1.3.1-1.el7.centos.x86_64
libvirt-daemon-driver-storage-1.3.1-1.el7.centos.x86_64
libvirt-daemon-config-network-1.3.1-1.el7.centos.x86_64
libvirt-daemon-kvm-1.3.1-1.el7.centos.x86_64
libvirt-debuginfo-1.3.1-1.el7.centos.x86_64
libvirt-python-debuginfo-1.2.13-1.el7.centos.x86_64
libvirt-client-1.3.1-1.el7.centos.x86_64
libvirt-daemon-driver-lxc-1.3.1-1.el7.centos.x86_64
libvirt-daemon-lxc-1.3.1-1.el7.centos.x86_64
libvirt-daemon-driver-nodedev-1.3.1-1.el7.centos.x86_64
libvirt-devel-1.3.1-1.el7.centos.x86_64
libvirt-daemon-driver-network-1.3.1-1.el7.centos.x86_64
libvirt-daemon-config-nwfilter-1.3.1-1.el7.centos.x86_64
libvirt-login-shell-1.3.1-1.el7.centos.x86_64

Jiri, could you please have a look? Is this a known issue?

Comment 8 Jiri Denemark 2016-04-18 13:47:09 UTC
It's a new thing and it is reproducible even with current upstream of both libvirt and libvirt-python.

Comment 9 Jiri Denemark 2016-04-18 14:32:07 UTC
Broken by v1.2.20-20-g827ed9b (libvirt-python):

commit 1d39dbaf637db03f6e597ed56b96aa065710b4a1
Author:     Pavel Hrdina <phrdina>
AuthorDate: Mon Oct 5 09:42:23 2015 +0200
Commit:     Pavel Hrdina <phrdina>
CommitDate: Mon Oct 5 09:42:23 2015 +0200

    use VIR_PY_LIST_SET_GOTO and VIR_PY_LIST_APPEND_GOTO
    
    Signed-off-by: Pavel Hrdina <phrdina>

Comment 11 Pavel Hrdina 2016-04-19 07:53:58 UTC
Upstream commit:

commit e9c4e2abffef007a28112ebb40a9586b0128f10b
Author: Pavel Hrdina <phrdina>
Date:   Mon Apr 18 16:53:50 2016 +0200

    fix crash in getAllDomainStats

Comment 13 lcheng 2016-06-14 10:52:26 UTC
Verified on libvirt-python-1.3.5-1.el7.x86_64; the result is as expected. Moving the status to VERIFIED.


Python 2.7.5 (default, Oct 11 2015, 17:47:16) 
[GCC 4.8.3 20140911 (Red Hat 4.8.3-9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import libvirt
>>> conn = libvirt.openReadOnly('qemu:///system')
>>> doms = conn.listAllDomains(0)
>>> doms
[<libvirt.virDomain object at 0x7f17a5e0c710>]
>>> print(doms[0].UUIDString())
05867c1a-afeb-300e-e55e-2673391ae080
>>> print(conn.getAllDomainStats())
[(<libvirt.virDomain object at 0x7f17a5e38cd0>, {'block.0.rd.reqs': 0L, 'block.0.name': 'vda', 'cpu.system': 736820000000L, 'block.0.wr.reqs': 0L, 'cpu.time': 1421518322510L, 'net.0.tx.bytes': 0L, 'state.reason': 1, 'vcpu.0.time': 1421330000000L, 'net.count': 1, 'net.0.name': 'vnet0', 'state.state': 1, 'block.count': 1, 'net.0.tx.pkts': 0L, 'block.0.rd.times': 0L, 'vcpu.0.wait': 0L, 'balloon.maximum': 1048576L, 'block.0.path': '/var/lib/libvirt/images/libvirt-test-api', 'block.0.physical': 1110769664L, 'net.0.rx.pkts': 1L, 'net.0.rx.drop': 7538L, 'block.0.wr.times': 0L, 'block.0.allocation': 0L, 'vcpu.maximum': 1, 'block.0.wr.bytes': 0L, 'net.0.rx.errs': 0L, 'net.0.tx.drop': 0L, 'net.0.tx.errs': 0L, 'block.0.rd.bytes': 0L, 'vcpu.0.state': 1, 'block.0.fl.reqs': 0L, 'cpu.user': 70000000L, 'net.0.rx.bytes': 90L, 'vcpu.current': 1, 'balloon.current': 1048576L, 'block.0.capacity': 10737418240L, 'block.0.fl.times': 0L})]
>>>
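
For reference, getAllDomainStats() returns a list of (virDomain, stats-dict) tuples, as the output above shows. A minimal sketch of consuming that structure; the printed keys are just examples taken from the output above:

# Iterate the (virDomain, stats-dict) pairs returned by getAllDomainStats()
# and print a few of the keys visible in the verification output above.
import libvirt

conn = libvirt.openReadOnly('qemu:///system')
for dom, stats in conn.getAllDomainStats():
    print(dom.name(), stats.get('cpu.time'), stats.get('balloon.current'))
conn.close()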

Comment 15 errata-xmlrpc 2016-11-04 00:12:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2186.html