Bug 1276985

Summary:	Guest does not recognize the new size after a spapr-vscsi block resize in QMP until the guest is rebooted, and QMP reports an I/O error for the spapr-vscsi block resize
Product:	Red Hat Enterprise Linux 7	Reporter:	Shuang Yu <shuyu>
Component:	libstoragemgmt	Assignee:	Ewan D. Milne <emilne>
Status:	CLOSED WONTFIX	QA Contact:	Storage QE <storage-qe>
Severity:	medium	Docs Contact:
Priority:	low
Version:	7.2	CC:	bugproxy, dgibson, emilne, fnovak, gduarte, hannsj_uhl, knoel, michen, ngu, qzhang, thuth, virt-maint
Target Milestone:	rc
Target Release:	7.5
Hardware:	ppc64le
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2017-09-19 12:48:39 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1399177, 1444027

Description Shuang Yu 2015-11-02 01:47:57 UTC

Description of problem:
The guest does not recognize the new size after a spapr-vscsi block resize in QMP until the guest is rebooted, and QMP reports "BLOCK_IO_ERROR" events for the spapr-vscsi block resize.

Version-Release number of selected component (if applicable):
kernel-3.10.0-326.el7.ppc64le
qemu-kvm-rhev-2.3.0-31.el7.ppc64le
SLOF-20150313-5.gitc89b0df.el7.noarch

How reproducible:
2/2

Steps to Reproduce:

1. Boot the guest with a spapr-vscsi data disk on PowerPC:

# /usr/libexec/qemu-kvm -name Bug-reverify -machine pseries,accel=kvm,usb=off \
    -m 4G -smp 8,sockets=2,cores=1,threads=4 \
    -uuid 8aeab7e2-f341-4f8c-80e8-59e2968d85c2 -realtime mlock=off -nodefaults \
    -monitor stdio -rtc base=utc -msg timestamp=on \
    -usb -device usb-tablet,id=tablet1 -vga std \
    -qmp tcp:0:4666,server,nowait \
    -netdev tap,id=hostnet1,script=/etc/qemu-ifup,vhost=on \
    -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:54:5a:52:5f:5c \
    -vnc :10 \
    -device spapr-vscsi,id=scsi0,reg=0x6000 \
    -drive file=RHEL-7.2-20151015.0-Server-ppc64.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none \
    -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0-0,bootindex=1,id=scsi0-0-0-0 \
    -drive file=RHEL-7.2-20151015.0-Server-ppc64-dvd1.iso,format=raw,if=none,id=drive-scsi1,cache=none \
    -device scsi-cd,bus=scsi0.0,drive=drive-scsi1,bootindex=2,id=scsi1 \
    -drive file=data.raw,format=raw,if=none,id=drive-scsi2,cache=none \
    -device scsi-hd,bus=scsi0.0,drive=drive-scsi2,id=scsi2

2. In the guest, check the data disk size:

# fdisk -l

3. In QMP, check the data disk size:

{"execute":"qmp_capabilities"}
{"return": {}}

{"execute":"query-block"}

4. In QMP, resize the data disk and check its size again:

{"execute":"block_resize","arguments":{"device":"drive-scsi2","size":10737418240}}

{"execute":"query-block"}

5. In the guest, check the data disk size again:

# fdisk -l

6. In QMP, restart the guest:
{"execute":"system_reset"}
{"return": {}}

7. In the guest, check the data disk size again:
# fdisk -l
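The QMP part of steps 3, 4 and 6 can also be driven from a program instead of an interactive session. The following is a minimal C sketch, assuming QEMU was started with `-qmp tcp:0:4666,server,nowait` as in step 1 and is reachable from the host; the helper names `qmp_format_block_resize`, `qmp_connect` and `qmp_send` are hypothetical, and reading and parsing the JSON replies is left out:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* for getaddrinfo() under strict C modes */
#endif
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Build the block_resize command from step 4. Returns the snprintf()
 * result, i.e. the length the full string needs, so callers can
 * detect truncation. */
int qmp_format_block_resize(char *buf, size_t len, const char *device,
                            unsigned long long size_bytes)
{
    return snprintf(buf, len,
                    "{\"execute\":\"block_resize\",\"arguments\":"
                    "{\"device\":\"%s\",\"size\":%llu}}",
                    device, size_bytes);
}

/* Open a TCP connection to the QMP server; returns a socket fd or -1. */
int qmp_connect(const char *host, const char *port)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}

/* Write one JSON command followed by a newline; 0 on success, -1 on error. */
int qmp_send(int fd, const char *json)
{
    size_t len = strlen(json);
    size_t off = 0;

    while (off < len) {
        ssize_t n = write(fd, json + off, len - off);
        if (n < 0)
            return -1;
        off += (size_t)n;
    }
    return write(fd, "\n", 1) == 1 ? 0 : -1;
}
```

A caller would connect with `qmp_connect("127.0.0.1", "4666")`, send `{"execute":"qmp_capabilities"}` first (QMP only accepts other commands after capabilities negotiation), and then send the string built by `qmp_format_block_resize(buf, sizeof buf, "drive-scsi2", 10737418240ULL)`. Replies and asynchronous events such as `BLOCK_IO_ERROR` arrive on the same socket as newline-delimited JSON.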



Actual results:

After step 2:
# fdisk -l
Disk /dev/sdb: 21.5 GB, 21474836480 bytes, 41943040 sectors   

After step 3:
{"execute":"query-block"}
... {"virtual-size": 21474836480, "filename": "data.raw", "format": "raw", "actual-size": 0, "dirty-flag": false}, ...

After step 4:

{"execute":"block_resize","arguments":{"device":"drive-scsi2","size":10737418240}}
{"return": {}}

{"execute":"query-block"}
...{"virtual-size": 10737418240, "filename": "data.raw", "format": "raw", "actual-size": 0, "dirty-flag": false}...    

{"timestamp": {"seconds": 1446170431, "microseconds": 570579}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-scsi2", "nospace": false, "__com.redhat_reason": "eio", "reason": "Input/output error", "operation": "read", "action": "report"}}
{"timestamp": {"seconds": 1446170431, "microseconds": 585497}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-scsi2", "nospace": false, "__com.redhat_reason": "eio", "reason": "Input/output error", "operation": "read", "action": "report"}}

After step 5:
#fdisk -l
Disk /dev/sdb: 21.5 GB, 21474836480 bytes, 41943040 sectors

After step 7:
#fdisk -l
Disk /dev/sdb: 10.7 GB, 10737418240 bytes, 20971520 sectors

Expected results:
If spapr-vscsi supports block_resize, there should be no "BLOCK_IO_ERROR" events in QMP.
With virtio-scsi/virtio-blk, the guest does not need a reboot to recognize the new disk size, so it would be better if spapr-vscsi behaved consistently with them.

Additional info:
I tested a virtio-scsi/virtio-blk data disk resize with the same steps: after changing the disk size in QMP and then checking it in the guest with "fdisk -l",
the guest recognizes the new size immediately and QMP reports no I/O error.

Likewise on x86, with a virtio-scsi/virtio-blk data disk, the guest recognizes the new size immediately after the resize in QMP.

Comment 3 Gu Nini 2015-11-04 10:21:20 UTC

(In reply to Shuang Yu from comment #0)
> Description of problem:
> Guest cannot recognize the new size after spapr-vscsi block resize in qmp
> until the guest reboot,and the qmp will report "BLOCK_IO_ERROR" for
> spapr-scsi block resize.
> 
> Version-Release number of selected component (if applicable):
> kernel-3.10.0-326.el7.ppc64le
> qemu-kvm-rhev-2.3.0-31.el7.ppc64le
> SLOF-20150313-5.gitc89b0df.el7.noarch
> 


On the same software versions I could reproduce the bug. The problem does not occur when block_resize grows the disk beyond its original size; it only occurs when block_resize shrinks the disk and an 'fdisk -l' command follows.

Comment 7 Thomas Huth 2016-01-29 15:46:40 UTC

FWIW, the problem can also be reproduced via the HMP monitor (which is a little bit easier to use than the QMP monitor):

[root@localhost ~]# fdisk -l /dev/sdb

Disk /dev/sdb: 12.9 GB, 12884901888 bytes, 25165824 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

[root@localhost ~]#
QEMU 2.5.50 monitor - type 'help' for more information
(qemu) block_resize drive-scsi2 1G
(qemu) 

[root@localhost ~]# fdisk -l /dev/sdb
[   70.053495] sd 0:0:1:0: Capacity data has changed

Disk /dev/sdb: 12.9 GB, 12884901888 bytes, 25165824 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

[root@localhost ~]# [   70.166005] sd 0:0:1:0: [sdb] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[   70.166072] sd 0:0:1:0: [sdb] Sense Key : Aborted Command [current] 
[   70.166110] sd 0:0:1:0: [sdb] Add. Sense: I/O process terminated
[   70.166150] sd 0:0:1:0: [sdb] CDB: Read(10) 28 00 01 7f ff 80 00 00 80 00
[   70.166189] blk_update_request: I/O error, dev sdb, sector 25165696
[   70.269402] sd 0:0:1:0: [sdb] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[   70.269479] sd 0:0:1:0: [sdb] Sense Key : Aborted Command [current] 
[   70.269517] sd 0:0:1:0: [sdb] Add. Sense: I/O process terminated
[   70.269555] sd 0:0:1:0: [sdb] CDB: Read(10) 28 00 01 7f ff 80 00 00 80 00
[   70.269594] blk_update_request: I/O error, dev sdb, sector 25165696
[   70.269634] Buffer I/O error on device sdb, logical block 196607
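The failing command in this log can be decoded by hand. READ(10) (opcode 0x28) carries a big-endian 32-bit LBA in bytes 2-5 and a big-endian 16-bit transfer length in bytes 7-8, so `28 00 01 7f ff 80 00 00 80 00` reads 128 sectors starting at LBA 0x017fff80 = 25165696. That is exactly the last 128 sectors of the original 25165824-sector (12.9 GB) disk, far beyond the 2097152 sectors (1073741824 / 512) left after `block_resize drive-scsi2 1G`. A small C sketch of that decoding and range check; the struct and function names are illustrative only:

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of a SCSI READ(10) CDB (opcode 0x28). */
struct read10 {
    uint32_t lba;    /* bytes 2-5, big-endian */
    uint16_t blocks; /* bytes 7-8, big-endian transfer length */
};

/* Decode a 10-byte READ(10) CDB; returns false for any other opcode. */
bool decode_read10(const uint8_t cdb[10], struct read10 *out)
{
    if (cdb[0] != 0x28)
        return false;
    out->lba = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
               ((uint32_t)cdb[4] << 8) | (uint32_t)cdb[5];
    out->blocks = (uint16_t)((cdb[7] << 8) | cdb[8]);
    return true;
}

/* True if the request touches any block at or beyond `capacity`
 * (capacity in logical blocks, here 512-byte sectors). */
bool beyond_capacity(const struct read10 *r, uint64_t capacity)
{
    return (uint64_t)r->lba + r->blocks > capacity;
}
```

Against the capacity the guest still believes in (25165824 sectors) the read is in range, while against the new 1G capacity it is not, which is consistent with Comment 3's observation that only shrinking the disk triggers the error.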

Comment 8 Thomas Huth 2016-02-04 13:08:18 UTC

FWIW, the virtio-scsi driver in the guest has some explicit code to handle CAPACITY_CHANGE events, in the function virtscsi_handle_param_change():

	/* Handle "Parameters changed", "Mode parameters changed", and
	   "Capacity data has changed".  */
	if (asc == 0x2a && (ascq == 0x00 || ascq == 0x01 || ascq == 0x09))
		scsi_rescan_device(&sdev->sdev_gendev);

I cannot see anything similar in ibmvscsi.c (the guest driver for the spapr-vscsi device), so we might need to add something similar there.
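For illustration, the same ASC/ASCQ test can be applied to raw sense data, with an added check that the sense key is UNIT ATTENTION. This is a userspace C sketch, not the actual virtio_scsi or ibmvscsi code; it assumes the SPC sense layouts (fixed format 0x70/0x71: sense key in the low nibble of byte 2, ASC and ASCQ in bytes 12 and 13; descriptor format 0x72/0x73: sense key in byte 1, ASC and ASCQ in bytes 2 and 3), and `sense_wants_rescan` is a hypothetical name. Inside the kernel, `scsi_normalize_sense()` already performs this kind of parsing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SENSE_KEY_UNIT_ATTENTION 0x6

/* Extract sense key, ASC and ASCQ from fixed (0x70/0x71) or descriptor
 * (0x72/0x73) format sense data. Returns false if the buffer is too
 * short or the response code is not recognized. */
static bool parse_sense(const uint8_t *sense, size_t len,
                        uint8_t *key, uint8_t *asc, uint8_t *ascq)
{
    if (len < 1)
        return false;
    switch (sense[0] & 0x7f) { /* bit 7 is the VALID bit */
    case 0x70:
    case 0x71:
        if (len < 14)
            return false;
        *key = sense[2] & 0x0f;
        *asc = sense[12];
        *ascq = sense[13];
        return true;
    case 0x72:
    case 0x73:
        if (len < 4)
            return false;
        *key = sense[1] & 0x0f;
        *asc = sense[2];
        *ascq = sense[3];
        return true;
    default:
        return false;
    }
}

/* Mirrors the virtio-scsi condition: ASC 0x2A with ASCQ 0x00 (parameters
 * changed), 0x01 (mode parameters changed) or 0x09 (capacity data has
 * changed), here additionally required to arrive as a unit attention. */
bool sense_wants_rescan(const uint8_t *sense, size_t len)
{
    uint8_t key = 0, asc = 0, ascq = 0;

    if (!parse_sense(sense, len, &key, &asc, &ascq))
        return false;
    return key == SENSE_KEY_UNIT_ATTENTION && asc == 0x2a &&
           (ascq == 0x00 || ascq == 0x01 || ascq == 0x09);
}
```

Note that the "Aborted Command / I/O process terminated" sense seen in Comment 7 (sense key 0xB, ASC/ASCQ 00h/06h) would not match; only the preceding "Capacity data has changed" unit attention would.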

Comment 9 Thomas Huth 2016-04-04 15:53:39 UTC

OK, after reading a bit more of the kernel sources, I think this should normally be handled via the "CAPACITY_DATA_HAS_CHANGED" udev event, which requires the "libstoragemgmt-udev" RPM to be installed. Unfortunately, the rule that rescans the device on "CAPACITY_DATA_HAS_CHANGED" is commented out by default, so after installing the "libstoragemgmt-udev" package you have to re-enable it manually in the /lib/udev/rules.d/90-scsi-ua.rules file. Then I get:

[root@localhost ~]# fdisk -l /dev/sdb

Disk /dev/sdb: 10.7 GB, 10737418240 bytes, 20971520 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

[root@localhost ~]# 
<<pressed CTRL-a+c to enter the QEMU monitor>>
QEMU 2.5.90 monitor - type 'help' for more information
(qemu) block_resize drive-scsi2 1G
(qemu) 
<<pressed CTRL-a+c to leave the QEMU monitor>>

[root@localhost ~]# fdisk -l /dev/sdb
[   83.308274] sd 0:0:1:0: Capacity data has changed
[   83.310172] sd 0:0:1:0: [sdb] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
[   83.310483] sdb: detected capacity change from 10737418240 to 1073741824

Disk /dev/sdb: 10.7 GB, 10737418240 bytes, 20971520 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

[root@localhost ~]# fdisk -l /dev/sdb

Disk /dev/sdb: 1073 MB, 1073741824 bytes, 2097152 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

... that means the first "fdisk -l" still showed the old size, but the second one, run immediately afterwards, showed the new size, and the scary I/O error messages in the kernel output are gone, too.
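What the re-enabled udev rule triggers is a rescan of the SCSI device, which can also be requested by hand by writing to the device's `rescan` attribute in sysfs (for this disk, `/sys/block/sdb/device/rescan`); the resulting capacity is then visible in `/sys/block/sdb/size`, which is always counted in 512-byte sectors. A minimal C sketch of that manual workaround; the function names are hypothetical, the paths are passed in so the logic can be exercised against ordinary files, and writing `rescan` on a real system requires root:

```c
#include <stdio.h>

/* Ask the SCSI mid-layer to re-read the device parameters (including
 * capacity) by writing "1" to its rescan attribute, for example
 * /sys/block/sdb/device/rescan. Returns 0 on success, -1 on error. */
int scsi_rescan(const char *rescan_path)
{
    FILE *f = fopen(rescan_path, "w");
    if (f == NULL)
        return -1;
    int ok = fputs("1\n", f) >= 0;
    /* fclose() flushes the buffer, so a failed sysfs write may only
     * be reported here. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Read a block device size from sysfs, for example /sys/block/sdb/size,
 * which is expressed in 512-byte sectors. Returns 0 on success. */
int read_sectors(const char *size_path, unsigned long long *sectors)
{
    FILE *f = fopen(size_path, "r");
    if (f == NULL)
        return -1;
    int n = fscanf(f, "%llu", sectors);
    fclose(f);
    return n == 1 ? 0 : -1;
}
```

On the guest above, `scsi_rescan("/sys/block/sdb/device/rescan")` followed by `read_sectors("/sys/block/sdb/size", &n)` would be expected to report 2097152 sectors (1 GiB) after the resize to 1G. As Comment 13 notes, this only updates the SCSI disk; LVM volumes and file systems on top of it are not resized.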

I think that line in /lib/udev/rules.d/90-scsi-ua.rules really should not be disabled by default - so I'm re-assigning this ticket to libstoragemgmt for further discussion.

Comment 10 Thomas Huth 2016-04-04 15:56:44 UTC

BTW, if I understand correctly, the line in 90-scsi-ua.rules was disabled because of BZ 1071382. However, that problem only affected multipath devices, so I think this feature should still be enabled for non-multipath devices somehow.
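The policy suggested here (rescan automatically only when the disk is not a path of a multipath map) could be sketched in C as follows. This is a hypothetical illustration, not how libstoragemgmt or udev actually decide; it assumes the usual sysfs layout, where `/sys/block/sdb/holders/` lists the device-mapper devices stacked on the disk and `/sys/block/dm-N/dm/uuid` of a map created by multipath starts with `mpath-`. The function names are made up for this example.

```c
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Device-mapper maps created by multipath carry a uuid of the form
 * "mpath-<wwid>". */
bool is_mpath_uuid(const char *uuid)
{
    return strncmp(uuid, "mpath-", 6) == 0;
}

/* Return true if any holder of `disk` (e.g. "sdb") is a multipath map.
 * `sysfs_block` is normally "/sys/block". A missing or unreadable
 * holders directory is treated as "not a multipath path". */
bool is_multipath_member(const char *sysfs_block, const char *disk)
{
    char path[512];
    bool found = false;
    struct dirent *de;

    snprintf(path, sizeof(path), "%s/%s/holders", sysfs_block, disk);
    DIR *dir = opendir(path);
    if (dir == NULL)
        return false;
    while (!found && (de = readdir(dir)) != NULL) {
        char uuid[256] = "";
        if (de->d_name[0] == '.') /* skip "." and ".." */
            continue;
        /* Each holder (e.g. dm-0) exposes its dm uuid under /sys/block. */
        snprintf(path, sizeof(path), "%s/%s/dm/uuid", sysfs_block, de->d_name);
        FILE *f = fopen(path, "r");
        if (f == NULL)
            continue;
        if (fgets(uuid, sizeof(uuid), f) != NULL && is_mpath_uuid(uuid))
            found = true;
        fclose(f);
    }
    closedir(dir);
    return found;
}
```

A rescan hook following this suggestion would only rescan when `is_multipath_member("/sys/block", "sdb")` returns false, leaving multipath paths alone until multipath tolerates capacity mismatches between paths.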

Comment 11 Ewan D. Milne 2016-04-04 18:18:52 UTC

We had to remove the automatic rescanning of the device properties upon receipt
of CAPACITY DATA HAS CHANGED unit attention, because multipath will discontinue
using a path if the capacity does not match the other paths (even if it is the
only good path to the device).  So until multipath is changed we cannot enable
this by default.

Comment 12 IBM Bug Proxy 2016-06-10 23:20:31 UTC

------- Comment From tyreld.com 2016-06-10 19:10 EDT-------
What is the expectation going forward? Is this a multipath bug, or are we treating this as an ibmvscsi bug? It wasn't clear to me whether multipath needs to be fixed so the udev rule can be used, or whether the vscsi driver is expected to be fixed to perform a rescan on capacity change.

Comment 13 Ewan D. Milne 2016-06-13 18:01:15 UTC

We cannot enable the udev rule to automatically rescan scsi devices upon receipt
of ASC/ASCQ 2A 09 CAPACITY DATA HAS CHANGED until multipath stops treating a
capacity mismatch as a reason to fail the path.  The support to automatically
resize a disk is also not completely extended through the I/O stack -- if the
SCSI disk were rescanned, the scsi_disk capacity would be updated, but an LVM
volume would not be resized, the file system would not be resized, etc.

I think what you want to do here is file an RFE with what you are looking to
have supported.

It appears as if the I/O error that is mentioned in the problem description
was caused by the resized LUN being smaller than originally probed, and the
device being accessed outside the new (smaller) size.  Under what circumstances
would this be desired?  Even if we probed the new LUN size, it would still be
smaller.  (Most requests we get for the ability to resize LUNs are for people
looking to grow a LUN.)

Comment 14 David Gibson 2016-06-14 03:12:20 UTC

*** Bug 1345720 has been marked as a duplicate of this bug. ***

Comment 16 IBM Bug Proxy 2017-09-14 11:50:29 UTC

------- Comment From lagarcia.com 2017-09-14 07:45 EDT-------
IBM suggests closing this bug as WONTFIX. From Tyrel:

----------------------------

I recommend closing this because of the issues raised by Red Hat as to why this doesn't currently work out of the box (comment 11 in the LTC bugzilla and comment 13 in the Red Hat bugzilla). Furthermore, shrinking a live disk is generally ill-advised anyway.

----------------------------

Comment 17 Ewan D. Milne 2017-09-19 12:48:39 UTC

Closing per IBM comments.