Bug 1420718
| Field | Value |
|---|---|
| Summary | libvirtd crashes when calling virConnectGetAllDomainStats on a VM which has an empty cdrom drive |
| Product | Red Hat Enterprise Linux 7 |
| Reporter | Yanqiu Zhang <yanqzhan> |
| Component | libvirt |
| Assignee | Peter Krempa <pkrempa> |
| Status | CLOSED ERRATA |
| QA Contact | Yanqiu Zhang <yanqzhan> |
| Severity | high |
| Priority | unspecified |
| Version | 7.4 |
| CC | ahadas, bugs, chhu, dyuan, hhan, michal.skrivanek, mzhan, pkrempa, pzhang, rbalakri, xuzhang, yanqzhan |
| Target Milestone | rc |
| Keywords | Regression, TestBlocker |
| Target Release | 7.4 |
| Hardware | x86_64 |
| OS | Linux |
| Fixed In Version | libvirt-3.1.0-1.el7 |
| Doc Type | If docs needed, set a value |
| Type | Bug |
| oVirt Team | Virt |
| Bug Blocks | 1426477, 1427090 |
| Last Closed | 2017-08-01 17:21:45 UTC |
I don't see anything useful in the attached engine.log - are you sure it covers the right time? Can you provide more information: what was the name of the VM, and when did you notice that it went down or failed to initiate a console?

Created attachment 1248951 [details]
engineLog_errorPicture
The VM name is 'AA'. On the RHEVM web page, the events are as follows:
VM AA was started by admin@internal-authz (Host: A).
User admin@internal-authz initiated console session for VM AA
User admin@internal-authz is connected to VM AA.
VDSM A command GetStatsVDS failed: Heartbeat exceeded
VDSM A command SpmStatusVDS failed: Heartbeat exceeded
User admin@internal-authz failed to initiate a console session for VM AA
Invalid status on Data Center Default. Setting Data Center status to Non Responsive (On host A, Error: Network error during communication with the Host.).
User admin@internal-authz failed to initiate a console session for VM AA
User admin@internal-authz failed to initiate a console session for VM AA
VDSM command GetStoragePoolInfoVDS failed: Heartbeat exceeded
Status of host A was set to Up.
VDSM A command HSMGetAllTasksStatusesVDS failed: Not SPM
VM AA is down with error. Exit message: Failed to find the libvirt domain.
Failed to run VM AA on Host A.
Failed to run VM AA (User: admin@internal-authz).
Storage Pool Manager runs on Host A (Address: amd-8750-4-2.englab.nay.redhat.com).
Please refer to the new attachment engineLog_errorPicture for more details.

There are libvirt connectivity issues too - please get the libvirt, qemu, and system logs, and specify the exact package versions.

Created attachment 1256417 [details]
gdb.txt for libvirtd
It's a libvirt issue: it reproduces on both libvirt-3.0.0-2.el7.x86_64 and libvirt-3.0.0-1.el7.x86_64, but not on libvirt-2.5.0-1.el7.x86_64.
Please refer to the attachment gdb.txt for libvirtd.
Hi, I can reproduce this in the following environment:

rhevm-4.0.7-0.1.el7ev.noarch
vdsm-4.18.23-1.el7ev.x86_64
libvirt-3.0.0-2.el7.x86_64

Steps:

1. Attach gdb to the libvirtd process:

# gdb -p `pidof libvirtd`
(gdb) c
Continuing.

2. Try to start the VM via RunOnce with network (PXE) on the RHEVM web page.

3. A coredump occurs:

(gdb) c
Continuing.
Detaching after fork from child process 12902.
...
Detaching after fork from child process 13012.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ff34ab84700 (LWP 12530)]
__strrchr_sse42 () at ../sysdeps/x86_64/multiarch/strrchr.S:138
138     pcmpistri $0x4a, (%r8), %xmm1

So I guess that it's a libvirtd bug. For more details please refer to the attachment "gdb.txt for libvirtd".

Backtrace of libvirtd:

(gdb) bt
#0  __strrchr_sse42 () at ../sysdeps/x86_64/multiarch/strrchr.S:138
#1  0x00007ff35b4f1ec3 in virFileIsSharedFSType (path=path@entry=0x0, fstypes=fstypes@entry=63) at util/virfile.c:3363
#2  0x00007ff35b4f283a in virFileIsSharedFS (path=path@entry=0x0) at util/virfile.c:3569
#3  0x00007ff32fe08864 in qemuOpenFileAs (bypassSecurityDriver=0x0, needUnlink=0x0, oflags=0, path=0x0, dynamicOwnership=false, fallback_gid=107, fallback_uid=107) at qemu/qemu_driver.c:2927
#4  qemuOpenFile (driver=driver@entry=0x7ff3241167d0, vm=vm@entry=0x7ff320018b10, path=0x0, oflags=oflags@entry=0, needUnlink=needUnlink@entry=0x0, bypassSecurityDriver=bypassSecurityDriver@entry=0x0) at qemu/qemu_driver.c:2908
#5  0x00007ff32fe26c73 in qemuDomainStorageOpenStat (driver=driver@entry=0x7ff3241167d0, vm=vm@entry=0x7ff320018b10, src=src@entry=0x7ff320017140, ret_fd=ret_fd@entry=0x7ff34ab836ac, ret_sb=ret_sb@entry=0x7ff34ab836b0, cfg=0x7ff32410bca0) at qemu/qemu_driver.c:11602
#6  0x00007ff32fe26ff0 in qemuDomainStorageUpdatePhysical (driver=driver@entry=0x7ff3241167d0, cfg=cfg@entry=0x7ff32410bca0, vm=vm@entry=0x7ff320018b10, src=src@entry=0x7ff320017140) at qemu/qemu_driver.c:11655
#7  0x00007ff32fe27a5d in qemuDomainGetStatsOneBlock (driver=driver@entry=0x7ff3241167d0, cfg=cfg@entry=0x7ff32410bca0, dom=dom@entry=0x7ff320018b10, record=record@entry=0x7ff318007de0, maxparams=maxparams@entry=0x7ff34ab839e4, src=src@entry=0x7ff320017140, block_idx=block_idx@entry=0, backing_idx=backing_idx@entry=0, stats=0x7ff318008060, disk=0x7ff320016f90) at qemu/qemu_driver.c:19541
#8  0x00007ff32fe27ecc in qemuDomainGetStatsBlock (driver=0x7ff3241167d0, dom=0x7ff320018b10, record=0x7ff318007de0, maxparams=0x7ff34ab839e4, privflags=<optimized out>) at qemu/qemu_driver.c:19600
#9  0x00007ff32fe0cb21 in qemuDomainGetStats (flags=1, record=<synthetic pointer>, stats=127, dom=0x7ff320018b10, conn=0x7ff3241efcb0) at qemu/qemu_driver.c:19762
#10 qemuConnectGetAllDomainStats (conn=0x7ff3241efcb0, doms=<optimized out>, ndoms=<optimized out>, stats=127, retStats=0x7ff34ab83b10, flags=<optimized out>) at qemu/qemu_driver.c:19852
#11 0x00007ff35b5f5dc6 in virConnectGetAllDomainStats (conn=0x7ff3241efcb0, stats=0, retStats=retStats@entry=0x7ff34ab83b10, flags=0) at libvirt-domain.c:11311
#12 0x00007ff35c252d30 in remoteDispatchConnectGetAllDomainStats (server=0x7ff35e131020, msg=0x7ff35e164560, ret=0x7ff318007d80, args=0x7ff318006150, rerr=0x7ff34ab83c50, client=0x7ff35e1611e0) at remote.c:6543
#13 remoteDispatchConnectGetAllDomainStatsHelper (server=0x7ff35e131020, client=0x7ff35e1611e0, msg=0x7ff35e164560, rerr=0x7ff34ab83c50, args=0x7ff318006150, ret=0x7ff318007d80) at remote_dispatch.h:615
#14 0x00007ff35b656072 in virNetServerProgramDispatchCall (msg=0x7ff35e164560, client=0x7ff35e1611e0, server=0x7ff35e131020, prog=0x7ff35e1510f0) at rpc/virnetserverprogram.c:437
#15 virNetServerProgramDispatch (prog=0x7ff35e1510f0, server=server@entry=0x7ff35e131020, client=0x7ff35e1611e0, msg=0x7ff35e164560) at rpc/virnetserverprogram.c:307
#16 0x00007ff35c285ccd in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x7ff35e131020) at rpc/virnetserver.c:148
#17 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x7ff35e131020) at rpc/virnetserver.c:169
#18 0x00007ff35b53d401 in virThreadPoolWorker (opaque=opaque@entry=0x7ff35e125bb0) at util/virthreadpool.c:167
#19 0x00007ff35b53c788 in virThreadHelper (data=<optimized out>) at util/virthread.c:206
#20 0x00007ff3588ffdc5 in start_thread (arg=0x7ff34ab84700) at pthread_create.c:308
#21 0x00007ff3586277ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

A way simpler libvirt-only reproducer is to create a VM with an empty cdrom drive and call "virsh domstats VM" while it's running.

Created attachment 1257112 [details]
Use virsh to reproduce
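[Editorial note, not part of the original report.] Before the virsh transcript below, a minimal standalone C illustration of the failure mode the backtrace shows: an ejected cdrom has no image path, so NULL is passed all the way down to strrchr(). The helper name here is a hypothetical stand-in, not libvirt source.

/* Sketch of the crash mechanism (illustration only, not libvirt code). */
#include <string.h>
#include <stdio.h>

/* Hypothetical stand-in for the path handling in virFileIsSharedFSType()
 * (frame #1 of the backtrace above). */
static int looks_shared(const char *path)
{
    /* strrchr() dereferences its argument unconditionally; with
     * path == 0x0 this is the SIGSEGV in __strrchr_sse42 (frame #0). */
    const char *slash = strrchr(path, '/');
    return slash != NULL;
}

int main(void)
{
    const char *cdrom_path = NULL;   /* ejected media: no source path */
    return looks_shared(cdrom_path); /* segfaults, as libvirtd did */
}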
Reproduced via virsh:
➜ ~ virsh attach-disk V /tmp/boot.iso sda --config --type cdrom
Disk attached successfully
➜ ~ virsh change-media V sda --eject --config
Successfully ejected media.
➜ ~ virsh start V
Domain V started
➜ ~ virsh domstats V
error: Disconnected from qemu:///system due to I/O error
error: End of file while reading data: Input/output error
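[Editorial note, not part of the original report.] The same crash path can be driven through the libvirt C API instead of virsh. A minimal sketch, assuming a running affected domain such as 'V' from the transcript above; build with: gcc repro.c -lvirt

#include <stdio.h>
#include <libvirt/libvirt.h>

int main(void)
{
    virConnectPtr conn = virConnectOpenReadOnly("qemu:///system");
    if (!conn)
        return 1;

    virDomainStatsRecordPtr *stats = NULL;
    /* stats=0 requests the default stat groups; on an affected libvirtd
     * the daemon segfaults and this call fails with an I/O error, just
     * like the virsh transcript above. */
    int n = virConnectGetAllDomainStats(conn, 0, &stats, 0);
    if (n < 0)
        fprintf(stderr, "GetAllDomainStats failed - daemon crashed?\n");
    else
        virDomainStatsRecordListFree(stats);

    virConnectClose(conn);
    return n < 0;
}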
commit c3de387380f6057ee0e46cd9f2f0a092e8070875
Author: Peter Krempa <pkrempa>
Date:   Thu Feb 23 10:07:30 2017 +0100

    qemu: Don't update physical storage size of empty drives

    Previously the code called virStorageSourceUpdateBlockPhysicalSize
    which did not do anything on empty drives since it worked only on
    block devices. After the refactor in c5f6151390 it's called for all
    devices and thus attempts to deref the NULL path of empty drives.

    Add a check that skips the update of the physical size if the
    storage source is empty.

Verified this bug with libvirt-3.1.0-2.el7.x86_64 and qemu-kvm-rhev-2.8.0-6.el7.x86_64.

Steps to verify:

1. Check the libvirtd status:

# systemctl status libvirtd | grep PID
 Main PID: 9115 (libvirtd)

2. Prepare a domain with an empty cdrom:

# cd /tmp
# wget http://.../boot.iso
# virsh attach-disk V /tmp/boot.iso sda --config --type cdrom
Disk attached successfully

# virsh change-media V sda --eject --config
Successfully ejected media.

3. Start the domain and check domstats:

# virsh start V
Domain V started

# virsh domstats V
Domain: 'V'
  state.state=1
  state.reason=1
  cpu.time=25500459451
  cpu.user=800000000
  cpu.system=9940000000
  ...

# virsh -r domstats V
Domain: 'V'
  state.state=1
  state.reason=1
  cpu.time=25994233410
  cpu.user=800000000
  cpu.system=9960000000
  ...

4. Check libvirtd again:

# systemctl status libvirtd | grep PID
 Main PID: 9115 (libvirtd)

libvirtd has not crashed. Since the result is as expected, this bug is marked as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1846
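[Editorial note, not part of the original report.] The guard described in the commit message above amounts to an early return when the storage source is empty (the upstream patch uses libvirt's virStorageSourceIsEmpty() helper for this). A simplified, self-contained model with hypothetical types, not the actual patch:

#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for virStorageSource: an ejected cdrom has no path. */
typedef struct {
    const char *path;
} StorageSource;

/* Toy stand-in for virStorageSourceIsEmpty(). */
static bool source_is_empty(const StorageSource *src)
{
    return src->path == NULL;
}

/* Toy stand-in for qemuDomainStorageUpdatePhysical(): with the guard,
 * an empty drive is skipped instead of having its NULL path passed
 * down to the open/stat code that crashed in the backtrace. */
static int update_physical_size(const StorageSource *src)
{
    if (source_is_empty(src))
        return 0;                        /* nothing to measure; not an error */
    printf("stat %s\n", src->path);      /* stand-in for the real stat/open */
    return 0;
}

int main(void)
{
    StorageSource empty_cdrom = { NULL };
    StorageSource disk = { "/var/lib/libvirt/images/V.qcow2" };
    update_physical_size(&empty_cdrom);  /* no longer dereferences NULL */
    update_physical_size(&disk);
    return 0;
}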
Created attachment 1248789 [details]
engine.log

Description of problem:
Failure to initiate a console session for a VM started via RunOnce with network (PXE) boot.

Version-Release number of selected component (if applicable):
rhevm-4.1.0.4-0.1.el7.noarch
Hosts:
libvirt-daemon-3.0.0-1.el7.x86_64
vdsm-4.19.5-1.el7ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a data center: edit the name, leave everything else at defaults.
2. Create a cluster: edit the name, CPU Arch: x86_64, CPU Type: AMD, leave everything else at defaults.
3. Add 2 hosts with AMD CPUs.
4. Create an NFS storage domain.
5. Create a new VM: edit the name, nic1: ovirtmgmt/ovirtmgmt, create an image: 5G, Interface: VirtIO, storage: NFS, leave everything else at defaults.
6. RunOnce the VM: in boot options, move "network (PXE)" up to the top, click 'OK'.
7. While the VM is PoweringUp, click the 'console' icon to open a console for the VM.

Actual results:
The console for the VM opens and starts booting from PXE for a few seconds, but is closed soon after. Clicking the 'console' icon again to reopen the console fails with the error "Setting VM ticket failed". After a few minutes, the VM is down.

Expected results:
The console should stay open during boot, there should be no "Setting VM ticket failed" error, and the VM should not go down.

Additional info:
1. Please refer to the attachments for logs.
2. Not reproduced on rhevm-4.1.0-0.3.beta2.el7.noarch with vdsm-4.19.1-1.el7ev.x86_64.
3. "Setting VM ticket failed" also occurred after a 'RunOnce' installation with boot.iso, then a 'Run' of the VM and an attempt to open the console. The VM also goes down in that case.