Bug 1345830

Summary: [RHEL.7.3] [thin-provisioning] guest got io-error when I dd a file in it
Product: Red Hat Enterprise Linux 7
Reporter: Yang Meng <meyang>
Component: qemu-kvm-rhev
Assignee: Fam Zheng <famz>
Status: CLOSED NOTABUG
QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.3
CC: chayang, huding, juzhang, knoel, meyang, ngu, pbonzini, shuang, virt-maint, xutian
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-06-22 11:36:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Yang Meng 2016-06-13 09:32:44 UTC
Description of problem:

The guest pauses with an io-error when dd writes a file to the data disk.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.6.0-5.el7.x86_64
kernel-3.10.0-422.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. On the host:

modprobe -r scsi_debug
modprobe scsi_debug lbpu=1 lbpws=1

[root@hp-dl385g7-01 home]# lsscsi 
[2:0:0:0] disk HP LOGICAL VOLUME 5.70 /dev/sda 
[2:3:0:0] storage HP P410i 5.70 - 
[12:0:0:0] disk Linux scsi_debug 0004 /dev/sdb

[root@hp-dl385g7-01 home]# sg_vpd -p0xb2 /dev/sdb
Logical block provisioning VPD page (SBC):
Unmap command supported (LBPU): 1
Write same (16) with unmap bit supported (LBWS): 1
Write same (10) with unmap bit supported (LBWS10): 0
Logical block provisioning read zeros (LBPRZ): 1
Anchored LBAs supported (ANC_SUP): 0
Threshold exponent: 0
Descriptor present (DP): 0
Provisioning type: 0

[root@hp-dl385g7-01 ~]# cat /sys/block/sdb/device/scsi_disk/12\:0\:0\:0/provisioning_mode 
writesame_16



2. Boot the guest:

MALLOC_PERTURB_=1  /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -machine pc  \
    -nodefaults  \
    -vga cirrus  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/omonitor-catch_monitor-20160613-034403-23Mgrt5K,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=id7sDMld  \
    -device ich9-usb-ehci1,id=usb1,addr=1d.7,multifunction=on,bus=pci.0 \
    -device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=1d.0,firstport=0,bus=pci.0 \
    -device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=1d.2,firstport=2,bus=pci.0 \
    -device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=1d.4,firstport=4,bus=pci.0 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=03,disable-legacy=off,disable-modern=on \
    -drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,format=qcow2,file=/home/RHEL-Server-7.3-64-virtio-scsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -device virtio-net-pci,mac=9a:e8:e9:ea:eb:ec,id=idPUX3Se,vectors=4,netdev=id70bfyc,bus=pci.0,addr=04,disable-legacy=off,disable-modern=on  \
    -netdev tap,id=id70bfyc,vhost=on \
    -m 8192  \
    -smp 8,maxcpus=8,cores=4,threads=1,sockets=2  \
    -cpu 'Opteron_G3',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -monitor stdio \
    -drive file=/home/100M.qcow2,if=none,id=drive-data-disk,format=raw,cache=none,aio=native,werror=stop,rerror=stop,discard=on \
    -device virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x8 \
    -device scsi-hd,drive=drive-data-disk,bus=scsi1.0,id=data-disk \
    -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 \
    -device virtio-serial \
    -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0

3. In the guest:
mkfs.ext4 /dev/sdb

4. On the host:
[root@hp-dl385g7-01 ~]# cat /sys/bus/pseudo/drivers/scsi_debug/map
0-199,704-2727,16256-16383

5. In the guest:
mkdir /home/test
mount /dev/sdb /home/test
dd if=/dev/zero of=/home/test/file

6. The guest hangs.
7. In the HMP monitor, issue "info status":

(qemu) info status 
VM status: paused (io-error)
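
The same state can probably also be read from one of the QMP sockets defined in the command line. A minimal sketch, assuming QMP's query-status command and a reply of the shape shown below; qmp_status is a hypothetical helper name, and the sample reply is illustrative, not captured from this host:

```shell
# qmp_status is a hypothetical helper: it reads QMP replies on stdin and
# prints the value of the "status" field of a query-status reply.
qmp_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Illustrative reply (not captured output), piped through the helper.
printf '%s\n' '{"return": {"status": "io-error", "singlestep": false, "running": false}}' | qmp_status
```

Against a live guest, the input would come from the monitor socket instead, for example by sending {"execute":"qmp_capabilities"} followed by {"execute":"query-status"} through "nc -U /var/tmp/monitor-qmpmonitor1". Whether nc stays connected long enough to print the reply depends on the nc variant, so treat that part as an assumption.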


Actual results:

The guest hangs with an io-error.

Expected results:

The guest should keep running without error.

Additional info:

cpuinfo:

processor	: 15
vendor_id	: AuthenticAMD
cpu family	: 16
model		: 9
model name	: AMD Opteron(tm) Processor 6128
stepping	: 1
microcode	: 0x10000d9
cpu MHz		: 2000.046
cache size	: 512 KB
physical id	: 1
siblings	: 8
core id		: 3
cpu cores	: 8
apicid		: 39
initial apicid	: 23
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc art rep_good nopl nonstop_tsc extd_apicid amd_dcm pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr hw_pstate npt lbrv svm_lock nrip_save pausefilter
bogomips	: 4000.39
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

Comment 2 Yang Meng 2016-06-14 01:35:29 UTC
Polarion case steps:

Steps:

1. Boot the guest with /dev/sdc (generated as in setup) with virtio serial and start the guest agent inside the guest.
   Refer to case: <https://tcms.engineering.redhat.com/case/135941/>.
   e.g.:
   ... -drive file=/dev/sdc,if=none,id=drive-data-disk,format=raw,cache=none,aio=native,werror=stop,rerror=stop,discard=on \
   -device virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x8 \
   -device scsi-block,drive=drive-data-disk,bus=scsi1.0,id=data-disk \
   -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 \
   -device virtio-serial \
   -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0
2. Make a file system on the disk in the guest.
   # mkfs.ext4 /dev/sdb
3. On the host:
   # cat /sys/bus/pseudo/drivers/scsi_debug/map
4. In the guest:
   # mount /dev/sdb /home/test
   # dd if=/dev/zero of=/home/test/file
5. Check the map on the host.
   # cat /sys/bus/pseudo/drivers/scsi_debug/map
6. In the guest:
   # rm /home/test/file
7. Connect to the chardev socket on the host and send the "guest-fstrim" command:
   # nc -U /tmp/qga.sock
   {"execute":"guest-fstrim"}
8. Check the map on the host.
   # cat /sys/bus/pseudo/drivers/scsi_debug/map

Expected results:

1. After step 1, the guest boots successfully.
2. After step 3:
   # cat /sys/bus/pseudo/drivers/scsi_debug/map
   1-616,16257-16383
3. After step 5:
   # cat /sys/bus/pseudo/drivers/scsi_debug/map
   1-616,645-1588,1599-4026,4029-16383
4. After step 7:
   {"execute":"guest-fstrim"}
   {"return": {}}
5. After step 8:
   # cat /sys/bus/pseudo/drivers/scsi_debug/map
   1-612

I followed the case steps and hit this problem. If anything is wrong, please correct me. Thanks.
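
To compare the map before and after guest-fstrim without reading the ranges by eye, the range list can be reduced to a single count. A minimal sketch, assuming the map is a comma-separated list of inclusive ranges (or single indexes) as in the outputs above; count_mapped is a hypothetical helper name, not part of scsi_debug:

```shell
# count_mapped is a hypothetical helper: it sums the sizes of the inclusive
# ranges in a scsi_debug "map" string such as "1-616,16257-16383".
count_mapped() {
    total=0
    old_ifs=$IFS
    IFS=,
    for range in $1; do
        start=${range%-*}   # text before the last "-", or the whole entry
        end=${range#*-}     # text after the first "-", or the whole entry
        total=$((total + end - start + 1))
    done
    IFS=$old_ifs
    echo "$total"
}

# Map reported in step 4 of the description; prints 2352.
count_mapped "0-199,704-2727,16256-16383"
```

On the host, the argument would be "$(cat /sys/bus/pseudo/drivers/scsi_debug/map)". A count that drops after guest-fstrim (for example to 612 for the map "1-612") suggests the discard reached the scsi_debug device.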

Comment 3 Fam Zheng 2016-06-22 03:27:50 UTC
Your command line has two "-drive":

    -drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,format=qcow2,file=/home/RHEL-Server-7.3-64-virtio-scsi.qcow2 \
    ...
    -drive file=/home/100M.qcow2,if=none,id=drive-data-disk,format=raw,cache=none,aio=native,werror=stop,rerror=stop,discard=on \

Both are qcow2 images under /home. How is this related to scsi_debug and thin-provisioning?

Comment 4 Fam Zheng 2016-06-22 03:29:43 UTC
Also, please check that your host filesystem is not 100% full, because if it is, the stop is expected (werror=stop).
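
One way to run this check from the host shell is to read the usage percentage from df. A minimal sketch, assuming a POSIX df and awk; fs_use_percent is a hypothetical helper name:

```shell
# fs_use_percent is a hypothetical helper: it prints the usage percentage
# (without the % sign) of the filesystem that holds the given path.
# It relies on the fixed column layout of "df -P" (capacity is column 5).
fs_use_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
```

For the image files in this report, "fs_use_percent /home" would be the call to try. A value of 100 means host writes can fail with ENOSPC, which werror=stop turns into a paused guest.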

Comment 5 Yang Meng 2016-06-22 09:12:10 UTC
(In reply to Fam Zheng from comment #3)
> Your command line has two "-drive":
> 
>     -drive
> id=drive_image1,if=none,cache=none,snapshot=off,aio=native,format=qcow2,
> file=/home/RHEL-Server-7.3-64-virtio-scsi.qcow2 \
>     ...
>     -drive
> file=/home/100M.qcow2,if=none,id=drive-data-disk,format=raw,cache=none,
> aio=native,werror=stop,rerror=stop,discard=on \
> 
> both of which are qcow2 images under /home, how is it related to scsi_debug
> and thin-provisioning?

Sorry, I pasted the wrong command line. The corrected QEMU command line is:

MALLOC_PERTURB_=1  /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -machine pc  \
    -nodefaults  \
    -vga cirrus  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/omonitor-catch_monitor-20160613-034403-23Mgrt5K,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=id7sDMld  \
    -device ich9-usb-ehci1,id=usb1,addr=1d.7,multifunction=on,bus=pci.0 \
    -device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=1d.0,firstport=0,bus=pci.0 \
    -device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=1d.2,firstport=2,bus=pci.0 \
    -device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=1d.4,firstport=4,bus=pci.0 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=03,disable-legacy=off,disable-modern=on \
    -drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,format=qcow2,file=/home/RHEL-Server-7.3-64-virtio-scsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -device virtio-net-pci,mac=9a:e8:e9:ea:eb:ec,id=idPUX3Se,vectors=4,netdev=id70bfyc,bus=pci.0,addr=04,disable-legacy=off,disable-modern=on  \
    -netdev tap,id=id70bfyc,vhost=on \
    -m 8192  \
    -smp 8,maxcpus=8,cores=4,threads=1,sockets=2  \
    -cpu 'Opteron_G3',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -monitor stdio \
    -drive file=/dev/sdb,if=none,id=drive-data-disk,format=raw,cache=none,aio=native,werror=enospc,rerror=report,discard=on \
    -device virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x8 \
    -device scsi-block,drive=drive-data-disk,bus=scsi1.0,id=data-disk \
    -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 \
    -device virtio-serial \
    -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0

Comment 6 Yang Meng 2016-06-22 09:23:22 UTC
(In reply to Fam Zheng from comment #4)
> Also please make check that your host filesystem is not 100% full, because
> if so, the stop is expected (werror=stop).

When I issue the dd command, the guest hangs. Can you suggest a way to check the filesystem? Thanks.

And do you mean that if I use werror=stop,rerror=stop, the guest will hang and the HMP monitor should show io-error when I issue "info status"?

I think it should behave like the following, and the guest should not hang:

[root@bootp-73-199-1 home]# mount /dev/sdb /home/test
[root@bootp-73-199-1 home]# dd if=/dev/zero of=/home/test/file
dd: writing to ‘/home/test/file’: No space left on device
976777+0 records in
976776+0 records out
500109312 bytes (500 MB) copied, 6.79864 s, 73.6 MB/s



BTW: with werror=enospc,rerror=report, the guest does not hang and only gives the warning "No space left on device".

If I am wrong, please correct me. Thanks.

Comment 7 Fam Zheng 2016-06-22 11:36:34 UTC
(In reply to Yang Meng from comment #6)
> and do you mean if i use werror=stop,rerror=stop ,it will hang and the hmp
> monitor should show io-error when i issue "info status"

Yes, "stop" is expected: your host filesystem is full, which is an io-error that is handled by werror=, and your werror= is 'stop'.

Comment 8 Yang Meng 2016-06-23 01:56:19 UTC
(In reply to Fam Zheng from comment #7)
> (In reply to Yang Meng from comment #6)
> > and do you mean if i use werror=stop,rerror=stop ,it will hang and the hmp
> > monitor should show io-error when i issue "info status"
> 
> Yes, "stop" is expected, because your host filesystem is full, and it is an
> io-error which will be captured by werror=. And you werror= is 'stop'.

Thanks for your explanation.

BTW, I tried with an image file like this:
    -drive file=/home/1G.raw,if=none,id=drive-data-disk,format=raw,cache=none,aio=native,werror=stop,rerror=stop \
    -device virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x8 \
    -device scsi-hd,drive=drive-data-disk,bus=scsi1.0,id=data-disk \

Why didn't it cause an io-error in the HMP monitor? dd just reported "dd: writing to ‘/home/test/file’: No space left on device". Could you explain the difference? Thanks.

Comment 9 Fam Zheng 2016-06-23 02:21:28 UTC
This is because in this case, stop only happens when the _host_ disk is full.
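
The werror=/rerror= actions discussed in this thread can be summarized in a small decision sketch. It follows the action names documented for QEMU's -drive option (ignore, stop, report, enospc); werror_action and the errno strings passed to it are hypothetical and only model the decision, they are not a QEMU interface:

```shell
# werror_action is a hypothetical model of the -drive werror=/rerror= policy.
# $1: policy (ignore|stop|report|enospc)
# $2: errno name of the I/O request that failed on the host (e.g. ENOSPC, EIO)
# Prints the action taken: ignore, stop (pause the VM) or report (to the guest).
werror_action() {
    case "$1" in
        ignore) echo "ignore" ;;
        stop)   echo "stop" ;;
        report) echo "report" ;;
        enospc)
            # Pause only when the host is out of space; otherwise report.
            if [ "$2" = "ENOSPC" ]; then
                echo "stop"
            else
                echo "report"
            fi
            ;;
        *)
            echo "unknown policy: $1" >&2
            return 1
            ;;
    esac
}
```

The policy is only consulted when a request fails on the host side. When the guest's own filesystem fills up first, as with the 1G raw image above, dd gets "No space left on device" from the guest kernel and no host I/O error occurs, so the VM keeps running even with werror=stop.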