Bug 1281520

Summary:	[ppc64le] Qemu crashes after writing to a resized disk from the vm.
Product:	Red Hat Enterprise Linux 7	Reporter:	Carlos Mestre González <cmestreg>
Component:	qemu-kvm-rhev	Assignee:	Thomas Huth <thuth>
Status:	CLOSED DUPLICATE	QA Contact:	Virtualization Bugs <virt-bugs>
Severity:	urgent	Docs Contact:
Priority:	unspecified
Version:	7.2	CC:	amureini, cmestreg, dgibson, famz, gklein, hannsj_uhl, knoel, lvivier, mazhang, michen, qzhang, shuyu, virt-maint, xuhan, xuma, zhengtli
Target Milestone:	rc
Target Release:	---
Hardware:	ppc64le
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2015-11-23 04:59:06 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1277922
Bug Blocks:	1279052, 1308609, 1359843

Description Carlos Mestre González 2015-11-12 16:26:15 UTC

Description of problem:
As the summary says, I'm opening a new BZ to investigate the QEMU crash seen in https://bugzilla.redhat.com/show_bug.cgi?id=1279052. That bug also involves a separate issue with vdsm, so this one tracks only the QEMU side.

Version-Release number of selected component (if applicable):
qemu-kvm-common-rhev-2.3.0-31.el7_2.1.ppc64le
qemu-img-rhev-2.3.0-31.el7_2.1.ppc64le
libvirt-daemon-driver-qemu-1.2.17-13.el7.ppc64le
ipxe-roms-qemu-20130517-7.gitc4bce43.el7.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.1.ppc64le
qemu-kvm-tools-rhev-2.3.0-31.el7_2.1.ppc64le


How reproducible:
100%

Steps to Reproduce:
Please see https://bugzilla.redhat.com/show_bug.cgi?id=1279052#c17 (and rest of the bug)

Actual results:
ERROR:qom/object.c:716:object_unref: assertion failed: (obj->ref > 0)

Thus the QEMU process exited because it hit an assert() statement - something called object_unref() with an object that was not referenced anymore.
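
As a rough illustration of the kind of mistake this assertion catches, here is a toy reference-counting model in Python. It is not QEMU's QOM code: the class and function names are hypothetical, and it only shows why an unbalanced unref trips a check like (obj->ref > 0).

```python
class Obj:
    """Toy stand-in for a reference-counted object (hypothetical, not QOM)."""

    def __init__(self):
        self.ref = 1            # the creator holds the first reference
        self.finalized = False


def obj_ref(obj):
    """Take an additional reference."""
    obj.ref += 1


def obj_unref(obj):
    """Drop one reference; the caller must actually hold one."""
    # Same shape as the failed check in the report: ref must still be > 0.
    assert obj.ref > 0, "assertion failed: (obj.ref > 0)"
    obj.ref -= 1
    if obj.ref == 0:
        obj.finalized = True    # last reference gone, object is torn down


def demo_double_unref():
    """Return True if a second, unbalanced unref trips the assertion."""
    obj = Obj()
    obj_unref(obj)              # balanced: drops the creator's reference
    try:
        obj_unref(obj)          # unbalanced: nobody holds a reference now
    except AssertionError:
        return True
    return False
```

In this model the crash corresponds to the second obj_unref() call: some code path releases a reference it never took, or releases the same one twice.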

QEMU crashes; see https://bugzilla.redhat.com/show_bug.cgi?id=1279052#c19 and #c22 for details and the core dump.

Additional info:

Comment 2 Qunfang Zhang 2015-11-13 10:54:26 UTC

Hi, David

Is it possible that this bug is the same issue as bug 1277922? Bug 1279052 comment 22 and bug 1277922 comment 18 look similar.

Comment 3 Laurent Vivier 2015-11-13 13:22:08 UTC

(In reply to Qunfang Zhang from comment #2)
> Hi, David
> 
> Is it possible that this bug is the same issue as bug 1277922? Bug 1279052
> comment 22 and bug 1277922 comment 18 look similar.

It looks like it. The crash happens when the VM is stopped on an I/O error (not enough space) and is resumed after the problem has been fixed.

Comment 4 Shuang Yu 2015-11-13 16:17:38 UTC

I tried to reproduce this issue with "qemu-kvm-rhev-2.3.0-31.el7_2.1.ppc64le", but following the steps below I only hit the
"(qemu) info status
VM status: paused (io-error)" problem.

Host version:
qemu-kvm-rhev-2.3.0-31.el7_2.1.ppc64le
kernel-3.10.0-330.el7.ppc64le
SLOF-20150313-5.gitc89b0df.el7.noarch

Steps:

1. On the iSCSI server, create a LUN for the iSCSI client to use.

# qemu-img create -f qcow2 /home/test 1G

# targetcli
..
/backstores/fileio> create file0 /home/test
Created fileio file0 with size 1073741824
/> /iscsi/iqn.2015-10.com.test:server1/tpg1/luns/ create /backstores/fileio/file0


2. On the iSCSI client:

# iscsiadm --mode node --targetname iqn.2015-10.com.test:server1 --portal 10.16.67.19:3260 --login

# fdisk -l
Disk /dev/sdg: 1073 MB, 1073741824 bytes, 2097152 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 8388608 bytes
Disk label type: dos
Disk identifier: 0xc5672dcd

# fdisk /dev/sdg
# mkfs.ext4 /dev/sdg1
# mount /dev/sdg1 /mnt/tmp

3. Create an LVM volume on /dev/sdg1

# losetup /dev/loop0 /dev/sdg1

# pvcreate /dev/loop0
  Physical volume "/dev/loop0" successfully created

# vgcreate test /dev/loop0
  Volume group "test" successfully created

# vgchange -ay test
  0 logical volume(s) in volume group "test" now active

# lvcreate -L 500M -n mylvm test
  Logical volume "mylvm" created.

# lvdisplay 
   
  --- Logical volume ---
  LV Path                /dev/test/mylvm
  LV Name                mylvm
  VG Name                test
  LV UUID                eN75cd-80I1-Zrdy-a59F-2ewE-iEEV-xeVENo
  LV Write Access        read/write
  LV Creation host, time ibm-p8-rhevm-16.lab4.eng.bos.redhat.com, 2015-11-13 09:29:37 -0500
  LV Status              available
  # open                 0
  LV Size                500.00 MiB
  Current LE             125
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           253:3

# qemu-img info /dev/test/mylvm
image: /dev/test/mylvm
file format: raw
virtual size: 500M (524288000 bytes)
disk size: 0

4. Create a qcow2 image on /dev/test/mylvm

# qemu-img create -f qcow2 /dev/test/mylvm 2G
Formatting '/dev/test/mylvm', fmt=qcow2 size=2147483648 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16

# qemu-img info /dev/test/mylvm 
image: /dev/test/mylvm
file format: qcow2
virtual size: 2.0G (2147483648 bytes)
disk size: 0
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

5. Boot the guest with /dev/test/mylvm as a data disk

#  /usr/libexec/qemu-kvm -name Bug-reverify -machine pseries,accel=kvm,usb=off -m 4G -smp 8,sockets=2,cores=1,threads=4 -uuid 8aeab7e2-f341-4f8c-80e8-59e2968d85c2 -realtime mlock=off -nodefaults -monitor stdio -rtc base=utc -msg timestamp=on -usb -device usb-tablet,id=tablet1  -vga std -qmp tcp:0:4666,server,nowait -netdev tap,id=hostnet1,script=/etc/qemu-ifup,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:54:5a:52:5f:5c -vnc :10 -device virtio-scsi-pci,id=scsi0,addr=0x6 -drive file=RHEL-7.2-20151030.0-Server-ppc64le.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0-0,bootindex=1,id=scsi0-0-0-0 -drive file=/dev/test/mylvm,format=qcow2,if=none,id=drive-scsi1,cache=none -device scsi-hd,bus=scsi0.0,drive=drive-scsi1,id=scsi1 

6. Extend the LV size

# lvextend -L +500M /dev/test/mylvm
  Size of logical volume test/mylvm changed from 500.00 MiB (125 extents) to 1000.00 MiB (250 extents).
  Logical volume mylvm successfully resized.

7. In the guest:

# fdisk /dev/sdb
# mkfs.ext4 /dev/sdb1
# mount /dev/sdb1 /mnt/sdb1

8. In the guest:
# dd if=/dev/urandom of=/mnt/sdb1/file bs=1M count=2048

Actual result:
(qemu) info status
VM status: paused (io-error)
(qemu) info status
VM status: paused (io-error)
(qemu) cont
(qemu) info status
VM status: paused (io-error)
(qemu)
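
The same status check could also be scripted over the QMP socket that the command line in step 5 opens (-qmp tcp:0:4666,server,nowait). The following is only a sketch and was not part of the test run above: qmp_capabilities and query-status are standard QMP commands, but the helper names, the default host, and the timeout are assumptions made for illustration.

```python
import json
import socket


def qmp_command(name, arguments=None):
    """Encode one QMP command as a line of JSON bytes."""
    msg = {"execute": name}
    if arguments:
        msg["arguments"] = arguments
    return (json.dumps(msg) + "\r\n").encode()


def is_paused_on_io_error(reply):
    """True if a query-status reply reports a guest paused on an I/O error."""
    ret = reply.get("return", {})
    return ret.get("running") is False and ret.get("status") == "io-error"


def read_reply(reader):
    """Read the next command reply, skipping asynchronous event messages."""
    while True:
        line = reader.readline()
        if not line:
            raise ConnectionError("QMP socket closed")
        msg = json.loads(line)
        if "event" not in msg:
            return msg


def query_status(host="127.0.0.1", port=4666):
    """Connect to the QMP socket and return the query-status reply."""
    with socket.create_connection((host, port), timeout=5) as sock:
        reader = sock.makefile("r", encoding="utf-8")
        json.loads(reader.readline())             # server greeting
        sock.sendall(qmp_command("qmp_capabilities"))
        read_reply(reader)                        # leaves negotiation mode
        sock.sendall(qmp_command("query-status"))
        return read_reply(reader)
```

Calling is_paused_on_io_error(query_status()) on the host should then report the same state as the "info status" output above, assuming the guest was started with the QMP option shown in step 5.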

Comment 5 Thomas Huth 2015-11-13 16:50:24 UTC

(In reply to Qunfang Zhang from comment #2)
> Is it possible that this bug is the same issue as bug 1277922? Bug 1279052
> comment 22 and bug 1277922 comment 18 look similar.

I agree with Laurent, it looks similar! I'll do a build with the fix that has been suggested in that bug, then we can (hopefully) check whether it fixes this issue, too.

Comment 9 Carlos Mestre González 2015-11-20 15:19:09 UTC

Hi,

I tested the scenario with the new build and it seems to work, in the sense that the VM *doesn't crash*. On the other hand, in the same scenario the VM pauses with a storage space error, which is a bug too (it could be related to other components, so I'm updating bug 1279052 to see whether there's anything else to do in qemu-kvm-rhev).

Thanks for your build.

Comment 10 David Gibson 2015-11-23 01:41:15 UTC

The whole reproducer is deliberately set up to trigger a pause due to a storage space error, so seeing that initially is not a bug.  If expanding the LV then resuming qemu isn't enough to fix the storage space error and let the guest continue, then there is a problem.

Note that depending on exactly how much disk space you allocate at each stage, it is possible that you could get one storage space pause, lvextend, resume the guest and then it will run for a while before hitting another storage space pause which would require another lvextend.
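
To make the arithmetic concrete, here is a small Python sketch of that pause/extend/resume cycle. It is a deliberately simplified model: it ignores qcow2 metadata and guest filesystem overhead, the function name is hypothetical, and the example numbers are borrowed from the steps in comment 4 (500 MiB LV, 500 MiB per lvextend, 2048 MiB written by dd) purely for illustration.

```python
def count_pauses(initial_lv_mib, extend_step_mib, total_write_mib):
    """Count storage-space pauses before all guest writes fit on the LV.

    Simplified model: every MiB the guest writes consumes one MiB of the LV,
    and each pause is answered by exactly one lvextend of extend_step_mib.
    """
    if extend_step_mib <= 0:
        raise ValueError("extend_step_mib must be positive")
    lv_size = initial_lv_mib
    written = 0
    pauses = 0
    while written < total_write_mib:
        room = lv_size - written
        if room <= 0:
            # LV is full: the guest pauses, the admin extends and resumes.
            pauses += 1
            lv_size += extend_step_mib
            continue
        written += min(room, total_write_mib - written)
    return pauses


print(count_pauses(500, 500, 2048))  # prints 4
```

Under these assumptions a single lvextend is not enough: the guest would pause again at about 1000, 1500 and 2000 MiB, each time needing another lvextend before it can continue.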

Comment 11 David Gibson 2015-11-23 01:44:50 UTC

Sorry, comment 10 above was written with regard to the reproducer for bug 1277922.

I don't know vdsm well enough to be certain, but I'd expect a storage space error along the way for this bug as well. However, I'd expect vdsm to resolve that error (by expanding the space and resuming the guest) without manual intervention.

It does sound to me like this was a duplicate of bug 1277922 as expected.

Comment 12 Thomas Huth 2015-11-23 04:59:06 UTC

I agree with David, this bug here (the QEMU crash) was a duplicate of 1277922, so I'm closing this ticket accordingly. The remaining issue with the VM pause can be tracked in BZ 1279052 instead.

*** This bug has been marked as a duplicate of bug 1277922 ***