Bug 1420718 - libvirtd crashes when calling virConnectGetAllDomainStats on a VM which has empty cdrom drive
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 7.4
Assignee: Peter Krempa
QA Contact: yanqzhan@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1426477 CVE-2017-2635
 
Reported: 2017-02-09 11:35 UTC by yanqzhan@redhat.com
Modified: 2017-08-02 00:01 UTC (History)
12 users

Fixed In Version: libvirt-3.1.0-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1426477 (view as bug list)
Environment:
Last Closed: 2017-08-01 17:21:45 UTC
Target Upstream Version:


Attachments (Terms of Use)
engine.log (8.04 KB, text/plain)
2017-02-09 11:35 UTC, yanqzhan@redhat.com
engineLog_errorPicture (224.56 KB, application/x-gzip)
2017-02-10 03:29 UTC, yanqzhan@redhat.com
gdb.txt for libvirtd (40.61 KB, text/plain)
2017-02-22 11:17 UTC, yanqzhan@redhat.com
Use virsh to reproduce (791 bytes, application/x-shellscript)
2017-02-24 02:07 UTC, Han Han


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2017:1846 normal SHIPPED_LIVE libvirt bug fix and enhancement update 2017-08-01 18:02:50 UTC

Description yanqzhan@redhat.com 2017-02-09 11:35:12 UTC
Created attachment 1248789 [details]
engine.log

Description of problem:
Failed to initiate a console session for a VM started via RunOnce with network (PXE) boot

Version-Release number of selected component (if applicable):
Rhevm-4.1.0.4-0.1.el7.noarch
Hosts:
libvirt-daemon-3.0.0-1.el7.x86_64
vdsm-4.19.5-1.el7ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1.Create a data center: edit name, set others as default.

2.Create a cluster: edit name, CPU Architecture: x86_64, CPU Type: AMD, set others as default.

3.Add 2 hosts with AMD CPUs.

4.Create an NFS storage domain.

5.Create a VM: edit name, nic1: ovritmgmt/ovritmgmt,
            create image: 5G, Interface: VirtIO, storage: nfs,
set others as default.

6.RunOnce the VM: in boot options, move “network (PXE)” to the top, then click ‘OK’.

7.While the VM is PoweringUp, click the ‘console’ icon to open a console for the VM.


Actual results:
The console for the VM opens and begins booting from PXE, but is closed after a few seconds.
Clicking the ‘console’ icon again fails to reopen the console, with the error: “Setting VM ticket failed”.
After a few minutes, the VM goes down.


Expected results:
The console should stay open during boot, the “Setting VM ticket failed” error should not appear, and the VM should not go down.


Additional info:
1.Please refer to the attachments for logs.
2.Not reproduced on Rhevm-4.1.0-0.3.beta2.el7.noarch, vdsm-4.19.1-1.el7ev.x86_64.
3.“Setting VM ticket failed” also occurs after a 'RunOnce' installation with boot.iso, followed by 'Run' VM and an attempt to open the console. The VM also goes down.

Comment 2 Arik 2017-02-09 12:01:03 UTC
I don't see anything useful in the attached engine.log - are you sure it covers the right time?
Can you provide more information - what was the name of the VM, when did you notice that it went Down or failed to initiate a console?

Comment 3 yanqzhan@redhat.com 2017-02-10 03:29:18 UTC
Created attachment 1248951 [details]
engineLog_errorPicture

The VM name is 'AA'. On rhevm webpage, the events are as follows:
  VM AA was started by admin@internal-authz (Host: A).
  User admin@internal-authz initiated console session for VM AA
  User admin@internal-authz is connected to VM AA.
  VDSM A command GetStatsVDS failed: Heartbeat exceeded
  VDSM A command SpmStatusVDS failed: Heartbeat exceeded
  User admin@internal-authz failed to initiate a console session for VM AA
  Invalid status on Data Center Default. Setting Data Center status to Non Responsive (On host A, Error: Network error during communication with the Host.).
  User admin@internal-authz failed to initiate a console session for VM AA
  User admin@internal-authz failed to initiate a console session for VM AA
  VDSM command GetStoragePoolInfoVDS failed: Heartbeat exceeded
  Status of host A was set to Up.
  VDSM A command HSMGetAllTasksStatusesVDS failed: Not SPM
  VM AA is down with error. Exit message: Failed to find the libvirt domain.
  Failed to run VM AA on Host A.
  Failed to run VM AA (User: admin@internal-authz).
  Storage Pool Manager runs on Host A (Address: amd-8750-4-2.englab.nay.redhat.com).

Please refer to the new attachment, engineLog_errorPicture, for more details.

Comment 4 Michal Skrivanek 2017-02-10 07:33:35 UTC
there are libvirt connectivity issues too - please get libvirt, qemu, and system logs and specify the exact package versions

Comment 8 yanqzhan@redhat.com 2017-02-22 11:17:02 UTC
Created attachment 1256417 [details]
gdb.txt for libvirtd

It's a libvirt issue: reproduced on both libvirt-3.0.0-2.el7.x86_64 and libvirt-3.0.0-1.el7.x86_64, not reproduced on libvirt-2.5.0-1.el7.x86_64.

Please refer to the attachment: gdb.txt for libvirtd

Comment 11 yanqzhan@redhat.com 2017-02-23 06:06:15 UTC
Hi, I can reproduce this on the following env:
rhevm-4.0.7-0.1.el7ev.noarch
vdsm-4.18.23-1.el7ev.x86_64
libvirt-3.0.0-2.el7.x86_64

Steps:
1.Attach gdb to the libvirtd process:
#gdb -p `pidof libvirtd` 
(gdb)c
Continuing

2.Try to start the vm by RunOnce with network(pxe) on rhevm webpage

3.A coredump occurred:
(gdb)
Continuing
Detaching after fork from child process 12902.
...
Detaching after fork from child process 13012.
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ff34ab84700 (LWP 12530)]
__strrchr_sse42 () at ../sysdeps/x86_64/multiarch/strrchr.S:138
138        pcmpistri    $0x4a, (%r8), %xmm1


So I guess it's a libvirtd bug. For more details please refer to the attachment: "gdb.txt for libvirtd"



backtrace of libvirtd:
(gdb) bt
#0  __strrchr_sse42 () at ../sysdeps/x86_64/multiarch/strrchr.S:138
#1  0x00007ff35b4f1ec3 in virFileIsSharedFSType (path=path@entry=0x0,
    fstypes=fstypes@entry=63) at util/virfile.c:3363
#2  0x00007ff35b4f283a in virFileIsSharedFS (path=path@entry=0x0)
    at util/virfile.c:3569
#3  0x00007ff32fe08864 in qemuOpenFileAs (bypassSecurityDriver=0x0,
    needUnlink=0x0, oflags=0, path=0x0, dynamicOwnership=false,
    fallback_gid=107, fallback_uid=107) at qemu/qemu_driver.c:2927
#4  qemuOpenFile (driver=driver@entry=0x7ff3241167d0,
    vm=vm@entry=0x7ff320018b10, path=0x0, oflags=oflags@entry=0,
    needUnlink=needUnlink@entry=0x0,
    bypassSecurityDriver=bypassSecurityDriver@entry=0x0)
    at qemu/qemu_driver.c:2908
#5  0x00007ff32fe26c73 in qemuDomainStorageOpenStat (
    driver=driver@entry=0x7ff3241167d0, vm=vm@entry=0x7ff320018b10,
    src=src@entry=0x7ff320017140, ret_fd=ret_fd@entry=0x7ff34ab836ac,
    ret_sb=ret_sb@entry=0x7ff34ab836b0, cfg=0x7ff32410bca0, cfg=0x7ff32410bca0)
    at qemu/qemu_driver.c:11602
#6  0x00007ff32fe26ff0 in qemuDomainStorageUpdatePhysical (
    driver=driver@entry=0x7ff3241167d0, cfg=cfg@entry=0x7ff32410bca0,
    vm=vm@entry=0x7ff320018b10, src=src@entry=0x7ff320017140)
    at qemu/qemu_driver.c:11655
#7  0x00007ff32fe27a5d in qemuDomainGetStatsOneBlock (
---Type <return> to continue, or q <return> to quit---
    driver=driver@entry=0x7ff3241167d0, cfg=cfg@entry=0x7ff32410bca0,
    dom=dom@entry=0x7ff320018b10, record=record@entry=0x7ff318007de0,
    maxparams=maxparams@entry=0x7ff34ab839e4, src=src@entry=0x7ff320017140,
    block_idx=block_idx@entry=0, backing_idx=backing_idx@entry=0,
    stats=0x7ff318008060, disk=0x7ff320016f90, disk=0x7ff320016f90)
    at qemu/qemu_driver.c:19541
#8  0x00007ff32fe27ecc in qemuDomainGetStatsBlock (driver=0x7ff3241167d0,
    dom=0x7ff320018b10, record=0x7ff318007de0, maxparams=0x7ff34ab839e4,
    privflags=<optimized out>) at qemu/qemu_driver.c:19600
#9  0x00007ff32fe0cb21 in qemuDomainGetStats (flags=1,
    record=<synthetic pointer>, stats=127, dom=0x7ff320018b10,
    conn=0x7ff3241efcb0) at qemu/qemu_driver.c:19762
#10 qemuConnectGetAllDomainStats (conn=0x7ff3241efcb0, doms=<optimized out>,
    ndoms=<optimized out>, stats=127, retStats=0x7ff34ab83b10,
    flags=<optimized out>) at qemu/qemu_driver.c:19852
#11 0x00007ff35b5f5dc6 in virConnectGetAllDomainStats (conn=0x7ff3241efcb0,
    stats=0, retStats=retStats@entry=0x7ff34ab83b10, flags=0)
    at libvirt-domain.c:11311
#12 0x00007ff35c252d30 in remoteDispatchConnectGetAllDomainStats (
    server=0x7ff35e131020, msg=0x7ff35e164560, ret=0x7ff318007d80,
    args=0x7ff318006150, rerr=0x7ff34ab83c50, client=0x7ff35e1611e0)
    at remote.c:6543
#13 remoteDispatchConnectGetAllDomainStatsHelper (server=0x7ff35e131020,
---Type <return> to continue, or q <return> to quit---
    client=0x7ff35e1611e0, msg=0x7ff35e164560, rerr=0x7ff34ab83c50,
    args=0x7ff318006150, ret=0x7ff318007d80) at remote_dispatch.h:615
#14 0x00007ff35b656072 in virNetServerProgramDispatchCall (msg=0x7ff35e164560,
    client=0x7ff35e1611e0, server=0x7ff35e131020, prog=0x7ff35e1510f0)
    at rpc/virnetserverprogram.c:437
#15 virNetServerProgramDispatch (prog=0x7ff35e1510f0,
    server=server@entry=0x7ff35e131020, client=0x7ff35e1611e0,
    msg=0x7ff35e164560) at rpc/virnetserverprogram.c:307
#16 0x00007ff35c285ccd in virNetServerProcessMsg (msg=<optimized out>,
    prog=<optimized out>, client=<optimized out>, srv=0x7ff35e131020)
    at rpc/virnetserver.c:148
#17 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x7ff35e131020)
    at rpc/virnetserver.c:169
#18 0x00007ff35b53d401 in virThreadPoolWorker (
    opaque=opaque@entry=0x7ff35e125bb0) at util/virthreadpool.c:167
#19 0x00007ff35b53c788 in virThreadHelper (data=<optimized out>)
    at util/virthread.c:206
#20 0x00007ff3588ffdc5 in start_thread (arg=0x7ff34ab84700)
    at pthread_create.c:308
#21 0x00007ff3586277ad in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Comment 12 Peter Krempa 2017-02-23 09:23:49 UTC
Way simpler libvirt-only reproducer is to create a VM with an empty cdrom drive and call "virsh domstats VM" while it's running.

Comment 13 Han Han 2017-02-24 02:07:06 UTC
Created attachment 1257112 [details]
Use virsh to reproduce

Reproduced via virsh:
➜  ~ virsh attach-disk V  /tmp/boot.iso sda --config --type cdrom
Disk attached successfully

➜  ~ virsh change-media V sda --eject --config
Successfully ejected media.
➜  ~ virsh start V
Domain V started

➜  ~ virsh domstats V
error: Disconnected from qemu:///system due to I/O error
error: End of file while reading data: Input/output error

Comment 17 Peter Krempa 2017-02-24 08:47:11 UTC
commit c3de387380f6057ee0e46cd9f2f0a092e8070875
Author: Peter Krempa <pkrempa@redhat.com>
Date:   Thu Feb 23 10:07:30 2017 +0100

    qemu: Don't update physical storage size of empty drives
    
    Previously the code called virStorageSourceUpdateBlockPhysicalSize which
    did not do anything on empty drives since it worked only on block
    devices. After the refactor in c5f6151390 it's called for all devices
    and thus attempts to deref the NULL path of empty drives.
    
    Add a check that skips the update of the physical size if the storage
    source is empty.

Comment 19 yanqzhan@redhat.com 2017-03-24 08:13:13 UTC
Verify this bug with libvirt-3.1.0-2.el7.x86_64, qemu-kvm-rhev-2.8.0-6.el7.x86_64

Steps to verify:
1.Check libvirtd status
# systemctl status libvirtd|grep PID
 Main PID: 9115 (libvirtd)

2.Prepare a domain with empty cdrom
# cd /tmp
# wget http://.../boot.iso 

# virsh attach-disk V /tmp/boot.iso sda --config --type cdrom
Disk attached successfully

# virsh change-media V sda --eject --config
Successfully ejected media.

3.Start the domain and check domstats
# virsh start V
Domain V started

# virsh domstats V
Domain: 'V'
  state.state=1
  state.reason=1
  cpu.time=25500459451
  cpu.user=800000000
  cpu.system=9940000000
...

# virsh -r domstats V
Domain: 'V'
  state.state=1
  state.reason=1
  cpu.time=25994233410
  cpu.user=800000000
  cpu.system=9960000000
…

4.Check libvirtd again:
# systemctl status libvirtd |grep PID
 Main PID: 9115 (libvirtd)

libvirtd did not crash.

Since the result is as expected, marking this bug as verified.

Comment 20 errata-xmlrpc 2017-08-01 17:21:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1846


