Bug 1356883
| Field | Value |
|---|---|
| Summary | libvirtd crash seen while attempting creation of VM snapshots |
| Product | Red Hat Enterprise Linux 7 |
| Component | libvirt |
| Version | 7.2 |
| Status | CLOSED INSUFFICIENT_DATA |
| Severity | high |
| Priority | unspecified |
| Reporter | SATHEESARAN <sasundar> |
| Assignee | Martin Kletzander <mkletzan> |
| QA Contact | Han Han <hhan> |
| CC | dyuan, hhan, jdenemar, jsuchane, knarra, pcuzner, pzhang, rbalakri, rs, sabose, sasundar, tjelinek, xuzhang, ylavi |
| Target Milestone | pre-dev-freeze |
| Target Release | --- |
| Hardware | x86_64 |
| OS | Linux |
| Environment | RHEV-RHGS HC |
| Last Closed | 2017-01-25 09:50:13 UTC |
| Type | Bug |
| Regression | --- |
| oVirt Team | Virt |
| Bug Blocks | 1277939 |
Description
SATHEESARAN
2016-07-15 08:19:54 UTC
Backtrace from the libvirtd coredump
-------------------------------------

```
Reading symbols from /usr/sbin/libvirtd...Reading symbols from /usr/lib/debug/usr/sbin/libvirtd.debug...done.
done.

warning: .dynamic section for "/usr/lib64/libsystemd.so.0.6.0" is not at the expected address (wrong library or version mismatch?)

warning: Can't read pathname for load map: Input/output error.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/libvirtd --listen'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f91affedafa in virNumaGetMaxCPUs () at util/virnuma.c:378
378         return NUMA_MAX_N_CPUS;
Missing separate debuginfos, use: debuginfo-install sanlock-lib-3.2.4-2.el7_2.x86_64
(gdb) bt
#0  0x00007f91affedafa in virNumaGetMaxCPUs () at util/virnuma.c:378
#1  0x00007f91affedb41 in virNumaGetNodeCPUs (node=node@entry=0, cpus=cpus@entry=0x7f9193de2610) at util/virnuma.c:259
#2  0x00007f91b0093baf in nodeCapsInitNUMA (sysfs_prefix=sysfs_prefix@entry=0x0, caps=caps@entry=0x7f918c10c550) at nodeinfo.c:2122
#3  0x00007f919741a5d6 in virQEMUCapsInit (cache=0x7f918c146df0) at qemu/qemu_capabilities.c:1058
#4  0x00007f9197454040 in virQEMUDriverCreateCapabilities (driver=driver@entry=0x7f918c20a770) at qemu/qemu_conf.c:903
#5  0x00007f9197496121 in qemuStateInitialize (privileged=true, callback=<optimized out>, opaque=<optimized out>) at qemu/qemu_driver.c:862
#6  0x00007f91b0095ddf in virStateInitialize (privileged=true, callback=callback@entry=0x7f91b0cd3ec0 <daemonInhibitCallback>, opaque=opaque@entry=0x7f91b1291910) at libvirt.c:777
#7  0x00007f91b0cd3f1b in daemonRunStateInit (opaque=0x7f91b1291910) at libvirtd.c:947
#8  0x00007f91b0008182 in virThreadHelper (data=<optimized out>) at util/virthread.c:206
#9  0x00007f91ad671dc5 in start_thread (arg=0x7f9193de3700) at pthread_create.c:308
#10 0x00007f91ad39eced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
```

Copied the contents
from Global Events Tab
------------------------------------------

```
Jul 15, 2016 8:16:36 AM VM appvm29 was set to the Unknown status.
Jul 15, 2016 8:16:36 AM VM appvm21 was set to the Unknown status.
Jul 15, 2016 8:16:36 AM VM appvm21 was set to the Unknown status.
Jul 15, 2016 8:16:36 AM VM appvm20 was set to the Unknown status.
Jul 15, 2016 8:16:36 AM VM appvm20 was set to the Unknown status.
Jul 15, 2016 8:16:36 AM VM appvm20 was set to the Unknown status.
Jul 15, 2016 8:16:36 AM VM appvm18 was set to the Unknown status.
Jul 15, 2016 8:16:36 AM VM appvm18 was set to the Unknown status.
Jul 15, 2016 8:16:36 AM VM appvm18 was set to the Unknown status.
Jul 15, 2016 8:16:36 AM VM appvm15 was set to the Unknown status.
Jul 15, 2016 8:16:36 AM VM appvm15 was set to the Unknown status.
Jul 15, 2016 8:16:36 AM VM appvm15 was set to the Unknown status.
Jul 15, 2016 8:16:36 AM VM appvm13 was set to the Unknown status.
Jul 15, 2016 8:16:36 AM VM appvm13 was set to the Unknown status.
Jul 15, 2016 8:16:36 AM VM appvm13 was set to the Unknown status.
Jul 15, 2016 8:16:35 AM VM appvm11 was set to the Unknown status.
Jul 15, 2016 8:16:35 AM VM appvm11 was set to the Unknown status.
Jul 15, 2016 8:16:35 AM VM appvm11 was set to the Unknown status.
Jul 15, 2016 8:16:35 AM VM appvm09 was set to the Unknown status.
Jul 15, 2016 8:16:35 AM VM appvm09 was set to the Unknown status.
Jul 15, 2016 8:16:35 AM VM appvm04 was set to the Unknown status.
Jul 15, 2016 8:16:35 AM VM appvm09 was set to the Unknown status.
Jul 15, 2016 8:16:35 AM VM appvm04 was set to the Unknown status.
Jul 15, 2016 8:16:35 AM VM appvm04 was set to the Unknown status.
Jul 15, 2016 8:16:35 AM Failed to create live snapshot 'GLUSTER-Geo-rep-snapshot' for VM 'appvm29'. VM restart is recommended. Note that using the created snapshot might cause data inconsistency.
Jul 15, 2016 8:16:35 AM VDSM host2 command failed: Vds timeout occured
Jul 15, 2016 8:16:35 AM VDSM host2 command failed: Message timeout which can be caused by communication issues
Jul 15, 2016 8:14:14 AM Failed to complete snapshot 'GLUSTER-Geo-rep-snapshot' creation for VM 'appvm02'.
Jul 15, 2016 8:14:01 AM Failed to complete snapshot 'GLUSTER-Geo-rep-snapshot' creation for VM 'appvm03'.
Jul 15, 2016 8:14:00 AM Failed to complete snapshot 'GLUSTER-Geo-rep-snapshot' creation for VM 'appvm15'.
Jul 15, 2016 8:13:58 AM Failed to complete snapshot 'GLUSTER-Geo-rep-snapshot' creation for VM 'appvm18'.
Jul 15, 2016 8:13:57 AM Failed to complete snapshot 'GLUSTER-Geo-rep-snapshot' creation for VM 'appvm21'.
Jul 15, 2016 8:13:55 AM Failed to complete snapshot 'GLUSTER-Geo-rep-snapshot' creation for VM 'appvm16'.
Jul 15, 2016 8:13:51 AM Failed to complete snapshot 'GLUSTER-Geo-rep-snapshot' creation for VM 'appvm04'.
Jul 15, 2016 8:13:48 AM Failed to complete snapshot 'GLUSTER-Geo-rep-snapshot' creation for VM 'appvm11'.
Jul 15, 2016 8:13:47 AM Failed to complete snapshot 'GLUSTER-Geo-rep-snapshot' creation for VM 'appvm06'.
Jul 15, 2016 8:13:45 AM Failed to complete snapshot 'GLUSTER-Geo-rep-snapshot' creation for VM 'appvm22'.
Jul 15, 2016 8:13:44 AM Failed to complete snapshot 'GLUSTER-Geo-rep-snapshot' creation for VM 'appvm13'.
Jul 15, 2016 8:13:43 AM Failed to complete snapshot 'GLUSTER-Geo-rep-snapshot' creation for VM 'appvm12'.
Jul 15, 2016 8:13:43 AM Status of host host1 was set to Up.
Jul 15, 2016 8:13:42 AM Failed to complete snapshot 'GLUSTER-Geo-rep-snapshot' creation for VM 'appvm09'.
Jul 15, 2016 8:13:42 AM Manually synced the storage devices from host host1
Jul 15, 2016 8:13:41 AM Failed to complete snapshot 'GLUSTER-Geo-rep-snapshot' creation for VM 'appvm10'.
Jul 15, 2016 8:13:40 AM Failed to complete snapshot 'GLUSTER-Geo-rep-snapshot' creation for VM 'appvm20'.
Jul 15, 2016 8:13:31 AM Failed to create live snapshot 'GLUSTER-Geo-rep-snapshot' for VM 'appvm15'. VM restart is recommended. Note that using the created snapshot might cause data inconsistency.
Jul 15, 2016 8:13:31 AM Failed to create live snapshot 'GLUSTER-Geo-rep-snapshot' for VM 'appvm18'. VM restart is recommended. Note that using the created snapshot might cause data inconsistency.
Jul 15, 2016 8:13:31 AM Failed to create live snapshot 'GLUSTER-Geo-rep-snapshot' for VM 'appvm21'. VM restart is recommended. Note that using the created snapshot might cause data inconsistency.
Jul 15, 2016 8:13:31 AM Failed to create live snapshot 'GLUSTER-Geo-rep-snapshot' for VM 'appvm16'. VM restart is recommended. Note that using the created snapshot might cause data inconsistency.
Jul 15, 2016 8:13:31 AM Host host2 is not responding. Host cannot be fenced automatically because power management for the host is disabled.
Jul 15, 2016 8:13:31 AM Failed to create live snapshot 'GLUSTER-Geo-rep-snapshot' for VM 'appvm13'. VM restart is recommended. Note that using the created snapshot might cause data inconsistency.
Jul 15, 2016 8:13:31 AM Failed to create live snapshot 'GLUSTER-Geo-rep-snapshot' for VM 'appvm12'. VM restart is recommended. Note that using the created snapshot might cause data inconsistency.
Jul 15, 2016 8:13:30 AM VDSM host2 command failed: Vds timeout occured
Jul 15, 2016 8:13:30 AM VDSM host2 command failed: Vds timeout occured
Jul 15, 2016 8:13:30 AM VDSM host2 command failed: Vds timeout occured
Jul 15, 2016 8:13:30 AM VDSM host2 command failed: Vds timeout occured
Jul 15, 2016 8:13:31 AM Failed to create live snapshot 'GLUSTER-Geo-rep-snapshot' for VM 'appvm09'. VM restart is recommended. Note that using the created snapshot might cause data inconsistency.
Jul 15, 2016 8:13:31 AM Failed to create live snapshot 'GLUSTER-Geo-rep-snapshot' for VM 'appvm11'. VM restart is recommended. Note that using the created snapshot might cause data inconsistency.
Jul 15, 2016 8:13:31 AM Failed to create live snapshot 'GLUSTER-Geo-rep-snapshot' for VM 'appvm20'. VM restart is recommended. Note that using the created snapshot might cause data inconsistency.
Jul 15, 2016 8:13:31 AM Host host1 is not responding. Host cannot be fenced automatically because power management for the host is disabled.
Jul 15, 2016 8:13:31 AM Failed to create live snapshot 'GLUSTER-Geo-rep-snapshot' for VM 'appvm02'. VM restart is recommended. Note that using the created snapshot might cause data inconsistency.
Jul 15, 2016 8:13:31 AM Failed to create live snapshot 'GLUSTER-Geo-rep-snapshot' for VM 'appvm10'. VM restart is recommended. Note that using the created snapshot might cause data inconsistency.
Jul 15, 2016 8:13:31 AM Failed to create live snapshot 'GLUSTER-Geo-rep-snapshot' for VM 'appvm22'. VM restart is recommended. Note that using the created snapshot might cause data inconsistency.
Jul 15, 2016 8:13:31 AM Failed to create live snapshot 'GLUSTER-Geo-rep-snapshot' for VM 'appvm06'. VM restart is recommended. Note that using the created snapshot might cause data inconsistency.
Jul 15, 2016 8:13:30 AM VDSM host1 command failed: Vds timeout occured
Jul 15, 2016 8:13:30 AM VDSM host2 command failed: Vds timeout occured
Jul 15, 2016 8:13:30 AM VDSM host2 command failed: Vds timeout occured
Jul 15, 2016 8:13:30 AM VDSM host1 command failed: Vds timeout occured
Jul 15, 2016 8:13:30 AM VDSM host2 command failed: Vds timeout occured
Jul 15, 2016 8:13:30 AM VDSM host1 command failed: Vds timeout occured
Jul 15, 2016 8:13:30 AM Failed to create live snapshot 'GLUSTER-Geo-rep-snapshot' for VM 'appvm03'. VM restart is recommended. Note that using the created snapshot might cause data inconsistency.
Jul 15, 2016 8:13:30 AM VDSM host2 command failed: Vds timeout occured
Jul 15, 2016 8:13:30 AM Failed to create live snapshot 'GLUSTER-Geo-rep-snapshot' for VM 'appvm04'. VM restart is recommended. Note that using the created snapshot might cause data inconsistency.
Jul 15, 2016 8:13:30 AM VDSM host1 command failed: Vds timeout occured
Jul 15, 2016 8:13:30 AM VDSM host2 command failed: Vds timeout occured
Jul 15, 2016 8:13:30 AM VDSM host1 command failed: Vds timeout occured
Jul 15, 2016 8:13:30 AM VDSM host1 command failed: Vds timeout occured
Jul 15, 2016 8:13:30 AM VDSM host1 command failed: Message timeout which can be caused by communication issues
Jul 15, 2016 8:13:30 AM VDSM host1 command failed: Message timeout which can be caused by communication issues
Jul 15, 2016 8:13:30 AM VDSM host2 command failed: Message timeout which can be caused by communication issues
Jul 15, 2016 8:12:13 AM Snapshot 'GLUSTER-Geo-rep-snapshot' creation for VM 'appvm27' has been completed.
Jul 15, 2016 8:12:05 AM Snapshot 'GLUSTER-Geo-rep-snapshot' creation for VM 'appvm26' has been completed.
Jul 15, 2016 8:11:52 AM Snapshot 'GLUSTER-Geo-rep-snapshot' creation for VM 'appvm25' has been completed.
Jul 15, 2016 8:11:42 AM VM appvm25 is not responding.
Jul 15, 2016 8:11:42 AM VM appvm03 is not responding.
Jul 15, 2016 8:11:42 AM VM appvm26 is not responding.
Jul 15, 2016 8:11:42 AM VM appvm16 is not responding.
Jul 15, 2016 8:11:42 AM VM appvm02 is not responding.
Jul 15, 2016 8:11:42 AM VM appvm10 is not responding.
Jul 15, 2016 8:11:42 AM VM appvm27 is not responding.
Jul 15, 2016 8:11:42 AM VM appvm22 is not responding.
Jul 15, 2016 8:11:41 AM VM appvm12 is not responding.
Jul 15, 2016 8:11:41 AM VM appvm06 is not responding.
Jul 15, 2016 8:11:21 AM Snapshot 'GLUSTER-Geo-rep-snapshot' creation for VM 'appvm29' was initiated by admin@internal.
Jul 15, 2016 8:11:18 AM Snapshot 'GLUSTER-Geo-rep-snapshot' creation for VM 'appvm28' was initiated by admin@internal.
Jul 15, 2016 8:11:15 AM Snapshot 'GLUSTER-Geo-rep-snapshot' creation for VM 'appvm27' was initiated by admin@internal.
Jul 15, 2016 8:11:11 AM Snapshot 'GLUSTER-Geo-rep-snapshot' creation for VM 'appvm26' was initiated by admin@internal.
Jul 15, 2016 8:11:08 AM Snapshot 'GLUSTER-Geo-rep-snapshot' creation for VM 'appvm25' was initiated by admin@internal.
Jul 15, 2016 8:11:05 AM Snapshot 'GLUSTER-Geo-rep-snapshot' creation for VM 'appvm24' was initiated by admin@internal.
Jul 15, 2016 8:11:03 AM Snapshot 'GLUSTER-Geo-rep-snapshot' creation for VM 'appvm23' was initiated by admin@internal.
Jul 15, 2016 8:10:58 AM Snapshot 'GLUSTER-Geo-rep-snapshot' creation for VM 'appvm22' was initiated by admin@internal.
```

The actual issue started after July 15, 08.10 AM. Please refer to the logs after this timestamp.

Created attachment 1180075 [details]
libvirtd coredump
Created attachment 1180076 [details]
vdsm.log from host2
Created attachment 1180077 [details]
vdsm.log from host3
Created attachment 1180079 [details]
engine.log from hosted engine
(In reply to SATHEESARAN from comment #6)
> The actual issue started after July 15, 08.10 AM
> Please refer to the logs after this timestamp

July 15, 08.10 AM IST (approx.) and July 14, 24.01 PM EDT (approx.)

(In reply to SATHEESARAN from comment #1)
> Backtrace from the libvirtd coredump
> -------------------------------------
>
> Core was generated by `/usr/sbin/libvirtd --listen'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x00007f91affedafa in virNumaGetMaxCPUs () at util/virnuma.c:378
> 378         return NUMA_MAX_N_CPUS;
> (gdb) bt
> #0  0x00007f91affedafa in virNumaGetMaxCPUs () at util/virnuma.c:378
> #1  0x00007f91affedb41 in virNumaGetNodeCPUs (node=node@entry=0, cpus=cpus@entry=0x7f9193de2610) at util/virnuma.c:259
> #2  0x00007f91b0093baf in nodeCapsInitNUMA (sysfs_prefix=sysfs_prefix@entry=0x0, caps=caps@entry=0x7f918c10c550) at nodeinfo.c:2122
> #3  0x00007f919741a5d6 in virQEMUCapsInit (cache=0x7f918c146df0) at qemu/qemu_capabilities.c:1058
> #4  0x00007f9197454040 in virQEMUDriverCreateCapabilities (driver=driver@entry=0x7f918c20a770) at qemu/qemu_conf.c:903
> #5  0x00007f9197496121 in qemuStateInitialize (privileged=true, callback=<optimized out>, opaque=<optimized out>) at qemu/qemu_driver.c:862
> #6  0x00007f91b0095ddf in virStateInitialize (privileged=true, callback=callback@entry=0x7f91b0cd3ec0 <daemonInhibitCallback>, opaque=opaque@entry=0x7f91b1291910) at libvirt.c:777
> #7  0x00007f91b0cd3f1b in daemonRunStateInit (opaque=0x7f91b1291910) at libvirtd.c:947
> #8  0x00007f91b0008182 in virThreadHelper (data=<optimized out>) at util/virthread.c:206
> #9  0x00007f91ad671dc5 in start_thread (arg=0x7f9193de3700) at pthread_create.c:308
> #10 0x00007f91ad39eced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

@Jiri: this seems a lot like a libvirt bug - is it a known issue?

No, I'm not aware of an existing bug in this area; you likely hit something new...

OK, moved to libvirt.

Either the backtrace from the coredump is incorrect, or it is a bug in the libnuma (numactl) code. NUMA_MAX_N_CPUS is defined as (numa_all_cpus_ptr->size), so crashing on that line would mean the pointer is NULL, and that cannot be true because libnuma would exit() if it failed the allocation. Would you mind reproducing and then capturing a full backtrace right away (that is, the "thread apply all bt full" command in gdb), as well as printing the value of numa_all_cpus_ptr, just in case that backtrace looks similar? Thanks a lot in advance.

Closing due to not enough information. If the bug persists, please create a new BZ with all the requested information already attached.

We are no longer seeing this problem with the latest RHV 4.3 and RHGS 3.4.4.