Bug 1600599

Summary: [RHHI] QEMU-KVM crash seen while removing a storage domain which had VMs on it
Product: Red Hat Enterprise Linux 7
Reporter: bipin <bshetty>
Component: glusterfs
Assignee: sankarshan <sankarshan>
Status: CLOSED NOTABUG
QA Contact: Sweta Anandpara <sanandpa>
Severity: high
Docs Contact:
Priority: unspecified
Version: 7.5
CC: bshetty, chayang, coli, juzhang, michen, ngu, pagranat, rhs-bugs, sabose, sankarshan, sasundar, virt-maint
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1600598
Environment:
Last Closed: 2019-05-08 09:59:50 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1481022, 1600598

Description bipin 2018-07-12 14:56:54 UTC
+++ This bug was initially created as a clone of Bug #1600598 +++

Description of problem:
-----------------------
The qemu-kvm process crashed while removing the storage domain via RHV-M. The storage domain had a few VMs running on it. This was seen on a storage domain backed by volumes with deduplication and compression (VDO) enabled.


Version-Release number of selected component (if applicable):
-----------------------------------------------------------
qemu-guest-agent-2.8.0-2.el7.x86_64
qemu-kvm-rhev-2.10.0-21.el7_5.4.x86_64
qemu-kvm-common-rhev-2.10.0-21.el7_5.4.x86_64
libvirt-daemon-driver-qemu-3.9.0-14.el7_5.6.x86_64
qemu-img-rhev-2.10.0-21.el7_5.4.x86_64
glusterfs-3.8.4-54.13.el7rhgs.x86_64

How reproducible:
----------------

Steps to Reproduce:
-------------------
1. Have the HE (Hosted Engine) deployed on a hyperconverged infrastructure (RHHI)
2. Create a storage domain with VDO-enabled volumes
3. Create multiple VMs and pump data using FIO
4. Stop the gluster volumes and delete them
5. Remove the storage domain via RHV-M (a shell sketch of these steps follows below)
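
For reference, a minimal shell sketch of steps 2-4 as run on the gluster nodes; the host names, device path, and volume/brick names (rhgs1-3, /dev/sdb, vdo1, vdostore) are hypothetical placeholders, and the storage domain itself is created and removed through the RHV-M UI:

# On each node: create a VDO device (deduplication + compression) to back the brick
vdo create --name=vdo1 --device=/dev/sdb
# (format and mount /dev/mapper/vdo1 as the brick filesystem, omitted here)

# Create and start a replica 3 (1 x 3) gluster volume on the VDO-backed bricks
gluster volume create vdostore replica 3 rhgs1:/rhgs/vdo1/brick rhgs2:/rhgs/vdo1/brick rhgs3:/rhgs/vdo1/brick
gluster volume set vdostore group virt
gluster volume start vdostore

# After creating the storage domain and VMs in RHV-M: pump data from inside a VM
fio --name=fill --filename=/var/tmp/fio.dat --rw=randwrite --bs=4k --size=4g --numjobs=4 --direct=1

# Stop and delete the volume while the VMs are still running
gluster --mode=script volume stop vdostore force
gluster --mode=script volume delete vdostore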

Actual results:
--------------
qemu-kvm crashed

Expected results:
----------------
qemu-kvm shouldn't crash

Additional info:
----------------
1. This was seen on a gluster replica 3 (1 x 3) volume (the layout can be confirmed as sketched below)
2. Had VDO-enabled bricks
3. 3-node cluster
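
A quick sketch for confirming that layout from one of the nodes; the names (vdostore, vdo1) are the same hypothetical placeholders used in the repro sketch above:

# Expect Type: Replicate and Number of Bricks: 1 x 3 = 3
gluster volume info vdostore

# Expect the device to show deduplication and compression enabled
vdo status --name vdo1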

--- Additional comment from bipin on 2018-07-12 10:55:04 EDT ---

id 7545f33c3237624d89ff870354c7d8fa238bcedb
reason:         qemu-kvm killed by SIGABRT
time:           Tue 10 Jul 2018 11:58:56 AM IST
cmdline:        /usr/libexec/qemu-kvm -name guest=vdo_vm1,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-38-vdo_vm1/master-key.aes -machine pc-i440fx-rhel7.5.0,accel=kvm,usb=off,dump-guest-core=off -cpu Haswell-noTSX,spec-ctrl=on,ssbd=on -m size=2097152k,slots=16,maxmem=8388608k -realtime mlock=off -smp 2,maxcpus=16,sockets=16,cores=1,threads=1 -numa node,nodeid=0,cpus=0-1,mem=2048 -uuid f44c3d16-d521-4b20-a02e-cca52070bfda -smbios 'type=1,manufacturer=oVirt,product=RHEV Hypervisor,version=7.5-5.0.el7,serial=00000000-0000-0000-0000-AC1F6B400622,uuid=f44c3d16-d521-4b20-a02e-cca52070bfda' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-38-vdo_vm1/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=2018-07-10T04:02:26,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=ua-0f32076f-ed6c-4cdf-b1f5-6bbdfce63727,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=ua-1928334c-76c9-4947-868b-187e6f287ff5,max_ports=16,bus=pci.0,addr=0x5 -drive if=none,id=drive-ua-a35d33bd-5ad6-4a61-a70d-53ea611fd546,readonly=on,werror=report,rerror=report -device ide-cd,bus=ide.1,unit=0,drive=drive-ua-a35d33bd-5ad6-4a61-a70d-53ea611fd546,id=ua-a35d33bd-5ad6-4a61-a70d-53ea611fd546 -drive file=/rhev/data-center/mnt/glusterSD/rhsqa-grafton7-nic2.lab.eng.blr.redhat.com:vdo/7c5d831d-c723-4003-b0df-b5e16f5f2320/images/a25daf4d-df1d-4ab2-95a0-fc49b1d47527/f628b286-281f-4166-9824-f095483697ba,format=raw,if=none,id=drive-ua-a25daf4d-df1d-4ab2-95a0-fc49b1d47527,serial=a25daf4d-df1d-4ab2-95a0-fc49b1d47527,cache=none,werror=stop,rerror=stop,aio=threads -device scsi-hd,bus=ua-0f32076f-ed6c-4cdf-b1f5-6bbdfce63727.0,channel=0,scsi-id=0,lun=0,drive=drive-ua-a25daf4d-df1d-4ab2-95a0-fc49b1d47527,id=ua-a25daf4d-df1d-4ab2-95a0-fc49b1d47527,bootindex=2 -drive file=/rhev/data-center/mnt/glusterSD/rhsqa-grafton7-nic2.lab.eng.blr.redhat.com:vdo/7c5d831d-c723-4003-b0df-b5e16f5f2320/images/21c04a77-1a84-4743-93da-2a5878369f20/7ca00a4d-11f9-49aa-a1bc-508ea7098f47,format=raw,if=none,id=drive-ua-21c04a77-1a84-4743-93da-2a5878369f20,serial=21c04a77-1a84-4743-93da-2a5878369f20,cache=none,werror=stop,rerror=stop,aio=threads -device scsi-hd,bus=ua-0f32076f-ed6c-4cdf-b1f5-6bbdfce63727.0,channel=0,scsi-id=0,lun=2,drive=drive-ua-21c04a77-1a84-4743-93da-2a5878369f20,id=ua-21c04a77-1a84-4743-93da-2a5878369f20 -netdev tap,fd=50,id=hostua-94cdc487-7d03-428b-a328-5d9aa400d30b,vhost=on,vhostfd=52 -device virtio-net-pci,netdev=hostua-94cdc487-7d03-428b-a328-5d9aa400d30b,id=ua-94cdc487-7d03-428b-a328-5d9aa400d30b,mac=00:1a:4a:16:01:15,bus=pci.0,addr=0x3,bootindex=1 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/f44c3d16-d521-4b20-a02e-cca52070bfda.ovirt-guest-agent.0,server,nowait -device virtserialport,bus=ua-1928334c-76c9-4947-868b-187e6f287ff5.0,nr=1,chardev=charchannel0,id=channel0,name=ovirt-guest-agent.0 -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/f44c3d16-d521-4b20-a02e-cca52070bfda.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=ua-1928334c-76c9-4947-868b-187e6f287ff5.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel2,name=vdagent -device 
virtserialport,bus=ua-1928334c-76c9-4947-868b-187e6f287ff5.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 -spice port=5921,tls-port=5922,addr=10.70.36.241,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on -device qxl-vga,id=ua-fff2892c-61a6-4837-afa0-916c1e6e0f21,ram_size=67108864,vram_size=8388608,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=ua-fd858673f66-ea73-49e2-9985-ac4590adbfee -object rng-random,id=objua-05164f66-ea73-49e2-9985-ac4590adbfee,filename=/dev/urandom -device virtio-rng-pci,rng=objua-05164f66-ea73-49e2-9985-ac4590adbfee,id=ua-05164f66-ea73-49e2-9985-ac4590adbfee,bus=pci.0,addr=0x7 -device vmcoreinfo -msg timestamp=on
package:        qemu-kvm-rhev-2.10.0-21.el7_5.4
uid:            107 (qemu)
count:          1
Directory:      /var/tmp/abrt/ccpp-2018-07-10-11:58:56-52702
Run 'abrt-cli report /var/tmp/abrt/ccpp-2018-07-10-11:58:56-52702' for creating a case in Red Hat Customer Portal
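
For anyone triaging a similar crash, a short sketch for pulling the backtrace out of that abrt dump directory (paths as reported above):

abrt-cli info -d /var/tmp/abrt/ccpp-2018-07-10-11:58:56-52702

# Or open the core directly, if the matching debuginfo packages are installed
gdb /usr/libexec/qemu-kvm /var/tmp/abrt/ccpp-2018-07-10-11:58:56-52702/coredump -batch -ex 'thread apply all bt'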

Comment 5 bipin 2018-07-12 15:10:09 UTC
Also had raised similar qemu-kvm crash bugs previously: 1575872 and 1561324.

Comment 7 Jeff Cody 2018-08-15 17:34:26 UTC
Just a note: from the description, this is occurring on a gluster fuse mount, and not with the QEMU native libgfapi driver.  I thought it worth mentioning since the guest name was "libgfapi", and I wanted to avoid confusion.

From the trace, it appears the fcntl operation F_UNLCK is failing on the image file, which is located on the glusterfs fuse mount.
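
One hedged way to watch those lock operations on a live process is to trace the fcntl syscall on the qemu host; the pgrep pattern below is illustrative:

# F_SETLK with l_type=F_UNLCK is the unlock that appears to fail here
strace -f -e trace=fcntl -p $(pgrep -of 'qemu-kvm.*guest=vdo_vm1')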

I wonder if this is related to BZ #1598025.  Like that bug, I am suspecting the bug may be in the glusterfs library used for fuse.

If you use a later glusterfs version (such as 4.0.2-1) on the qemu host machine (i.e. the machine mounting the fuse mount, not the gluster server) does this problem go away?
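
A minimal sketch of that client-side check and upgrade, assuming the newer glusterfs packages are available in a configured repo:

# On the qemu host (the fuse client), check what is installed
rpm -qa 'glusterfs*'

# Upgrade only the client-side bits
yum update glusterfs glusterfs-libs glusterfs-fuse glusterfs-client-xlators

# Then remount the storage domain (normally done by reactivating it in RHV-M) so the
# fuse mount at /rhev/data-center/mnt/glusterSD/rhsqa-grafton7-nic2.lab.eng.blr.redhat.com:vdo
# picks up the new client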

Comment 9 Jeff Cody 2018-08-15 17:43:08 UTC
*** Bug 1609561 has been marked as a duplicate of this bug. ***

Comment 10 bipin 2018-11-08 07:42:56 UTC
(In reply to Jeff Cody from comment #7)
> Just a note: from the description, this is occurring on a gluster fuse
> mount, and not with the QEMU native libgfapi driver.  I thought it worth
> mentioning since the guest name was "libgfapi", and I wanted to avoid
> confusion.
> 
> From the trace, it appears the fcntl operation F_UNLCK is failing on the
> image file, which is located on the glusterfs fuse mount.
> 
> I wonder if this is related to BZ #1598025.  Like that bug, I am suspecting
> the bug may be in the glusterfs library used for fuse.
> 
> If you use a later glusterfs version (such as 4.0.2-1) on the qemu host
> machine (i.e. the machine mounting the fuse mount, not the gluster server)
> does this problem go away?

Apologies for the delayed reply. With the later versions of gluster (RHGS 3.4.0 & 3.4.1), I could not reproduce the above issue.

Comment 11 Sahina Bose 2018-11-19 05:25:56 UTC
Can we close this bug, as it's not reproducible with the latest version?

Comment 12 bipin 2019-05-08 09:59:50 UTC
Apologies for the delay. Since the bug is no longer reproducible, closing it.