1415637 – blk_update_request: I/O error

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1415637 - blk_update_request: I/O error

Summary: blk_update_request: I/O error

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	qemu-kvm-rhev
Sub Component:
Version:	7.4
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	Stefan Hajnoczi
QA Contact:	Xueqiang Wei
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2017-01-23 10:19 UTC by Yongxue Hong
Modified:	2020-01-10 10:03 UTC (History)
CC List:	10 users (show)
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-01-24 13:48:02 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Description Yongxue Hong 2017-01-23 10:19:10 UTC

Description of problem:
When run fio on x86 guest, the result seems like unreasonable, and even appears I/O error.

Version-Release number of selected component (if applicable):
host kernel:3.10.0-543.el7.x86_64
qemu :qemu-kvm-rhev-2.8.0-2.el7
guest kernel:3.10.0-543.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1.boot a guest
eg:
/usr/libexec/qemu-kvm \
-name RHEL7-8378 \
-m 2G \
-smp 4 \
-boot menu=on,order=c,splash-time=5000 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=06 \
-drive file=/home/hyx/os/RHEL-7.3-20161019.0-Server-x86_64-dvd1.iso,if=none,media=cdrom,id=image0 \
-device scsi-cd,id=scsi-cd0,drive=image0,channel=0,scsi-id=0,lun=2,bootindex=1 \
-drive file=/home/hyx/image/RHEL7-8378-2-15G.raw,if=none,media=disk,format=raw,cache=none,bps=102400,iops=100,id=image2 \
-device scsi-hd,id=scsi-hd2,drive=image2,channel=0,scsi-id=0,lun=1,bootindex=0 \
-drive file=/home/hyx/image/RHEL7-8378-20G.raw,if=none,media=disk,format=raw,id=image1 \
-device scsi-hd,id=scsi-hd1,drive=image1,channel=0,scsi-id=0,lun=0,bootindex=2 \
-netdev tap,id=hostnet0,script=/etc/qemu-ifup \
-device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=70:e2:84:14:0e:12 \
-serial unix:./sock1,server,nowait \
-monitor stdio \
-qmp tcp:0:6666,server,nowait \
-vnc :0 

2.login guest and run fio
eg:
1th:
fio -filename=/dev/sdb -direct=1 -rw=write -bs=1M -ioengine=libaio -size=10M -name=test -iodepth=16 -runtime=300

2th:
fio -filename=/dev/sdb -direct=1 -rw=write -bs=64k -ioengine=libaio -size=100M -name=test -iodepth=16 -runtime=600

3th:
fio -filename=/dev/sdb -direct=1 -rw=write -bs=1M -ioengine=libaio -size=1000M -name=test -iodepth=16 -runtime=300 



Actual results:
1th:
[root@dhcp-10-168 home]# fio -filename=/dev/sdb -direct=1 -rw=write -bs=1M -ioengine=libaio -size=10M -name=test -iodepth=16 -runtime=300
test: (g=0): rw=write, bs=1M-1M/1M-1M/1M-1M, ioengine=libaio, iodepth=16
fio-2.2.8
Starting 1 process
Jobs: 1 (f=1): [W(1)] [90.3% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 00m:29s]
test: (groupid=0, jobs=1): err= 0: pid=2271: Sun Jan 22 04:46:37 2017
  write: io=10240KB, bw=38830B/s, iops=0, runt=270036msec
    slat (usec): min=91, max=118, avg=109.00, stdev= 7.97
    clat (usec): min=270035K, max=270036K, avg=270035016.10, stdev=336.43
     lat (usec): min=270035K, max=270036K, avg=270035125.60, stdev=331.41
    clat percentiles (msec):
     |  1.00th=[16712],  5.00th=[16712], 10.00th=[16712], 20.00th=[16712],
     | 30.00th=[16712], 40.00th=[16712], 50.00th=[16712], 60.00th=[16712],
     | 70.00th=[16712], 80.00th=[16712], 90.00th=[16712], 95.00th=[16712],
     | 99.00th=[16712], 99.50th=[16712], 99.90th=[16712], 99.95th=[16712],
     | 99.99th=[16712]
    bw (KB  /s): min=    3, max=    3, per=8.11%, avg= 3.00, stdev= 0.00
    lat (msec) : >=2000=100.00%
  cpu          : usr=0.00%, sys=0.00%, ctx=13, majf=0, minf=30
  IO depths    : 1=10.0%, 2=20.0%, 4=40.0%, 8=30.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=0.0%, 8=0.0%, 16=100.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=10/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  WRITE: io=10240KB, aggrb=37KB/s, minb=37KB/s, maxb=37KB/s, mint=270036msec, maxt=270036msec

Disk stats (read/write):
  sdb: ios=33/29, merge=0/0, ticks=229134/4409225, in_queue=4908382, util=100.00%

2th:
[root@dhcp-10-168 home]# fio -filename=/dev/sdb -direct=1 -rw=write -bs=64k -ioengine=libaio -size=100M -name=test -iodepth=16 -runtime=600
test: (g=0): rw=write, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=16
fio-2.2.8
Starting 1 process
Jobs: 1 (f=1): [W(1)] [58.7% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 07m:09s]   
test: (groupid=0, jobs=1): err= 0: pid=2300: Sun Jan 22 04:58:03 2017
  write: io=61056KB, bw=102522B/s, iops=1, runt=609831msec
    slat (usec): min=5, max=39, avg=17.34, stdev= 3.02
    clat (msec): min=1, max=19199, avg=10217.69, stdev=1046.21
     lat (msec): min=1, max=19199, avg=10217.70, stdev=1046.21
    clat percentiles (msec):
     |  1.00th=[ 5669],  5.00th=[10290], 10.00th=[10290], 20.00th=[10290],
     | 30.00th=[10290], 40.00th=[10290], 50.00th=[10290], 60.00th=[10290],
     | 70.00th=[10290], 80.00th=[10290], 90.00th=[10290], 95.00th=[10290],
     | 99.00th=[13435], 99.50th=[16581], 99.90th=[16712], 99.95th=[16712],
     | 99.99th=[16712]
    bw (KB  /s): min=    6, max=  236, per=100.00%, avg=100.04, stdev= 5.40
    lat (msec) : 2=0.10%, 750=0.10%, 2000=0.21%, >=2000=99.58%
  cpu          : usr=0.00%, sys=0.01%, ctx=954, majf=0, minf=31
  IO depths    : 1=0.1%, 2=0.2%, 4=0.4%, 8=0.8%, 16=98.4%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=99.9%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=954/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  WRITE: io=61056KB, aggrb=100KB/s, minb=100KB/s, maxb=100KB/s, mint=609831msec, maxt=609831msec

Disk stats (read/write):
  sdb: ios=0/953, merge=0/0, ticks=0/9670101, in_queue=9680210, util=100.00%

3th:
[root@dhcp-10-168 ~]# fio -filename=/dev/sdb -direct=1 -rw=write -bs=1M -ioengine=libaio -size=1000M -name=test -iodepth=16 -runtime=300 
test: (g=0): rw=write, bs=1M-1M/1M-1M/1M-1M, ioengine=libaio, iodepth=16
fio-2.2.8
Starting 1 process
fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.

test: (groupid=0, jobs=1): err= 0: pid=2294: Sun Jan 22 06:23:07 2017
    lat (msec) : >=2000=25.00%
  cpu          : usr=0.00%, sys=0.00%, ctx=0, majf=0, minf=0
  IO depths    : 1=5.0%, 2=10.0%, 4=20.0%, 8=40.0%, 16=25.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=20/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):

Disk stats (read/write):
  sdb: ios=25/25, merge=0/0, ticks=215895/4143126, in_queue=20201945, util=100.00%
fio: file hash not empty on exit
[root@dhcp-10-168 ~]# fio: io_u error on file /dev/sdb: Input/output error: write offset=11534336, buflen=1048576
fio: io_u error on file /dev/sdb: Input/output error: write offset=10485760, buflen=1048576
fio: io_u error on file /dev/sdb: Input/output error: write offset=9437184, buflen=1048576

[root@dhcp-10-168 ~]# [ 1177.095868] blk_update_request: I/O error, dev sdb, sector 24544
[ 1177.096516] blk_update_request: I/O error, dev sdb, sector 23536
[ 1177.097286] blk_update_request: I/O error, dev sdb, sector 22528
[ 1177.097820] blk_update_request: I/O error, dev sdb, sector 22496
[ 1177.098269] blk_update_request: I/O error, dev sdb, sector 21488
[ 1177.098717] blk_update_request: I/O error, dev sdb, sector 20480
[ 1177.099164] blk_update_request: I/O error, dev sdb, sector 20448

Result Summary:
The 1th and 2th fio result seems like unreasonable since the value of iops is too lower.
The 3th fio result show blk_update_request: I/O error.

Expected results:
no error when run fio.

Additional info:
This problem doesn't be reappeared on ppc64le host.

Comment 2 Stefan Hajnoczi 2017-01-24 13:48:02 UTC

This is a recurring NOTABUG issue with bugs QE reports against I/O throttling.

100 KB/s is not a realistic block I/O limit since a single request may be larger than that.  Requests are not split and the bps limit must be honored, so the request will simply never complete.  The guest driver times out after a while and reports an I/O error.  This is expected behavior in the case where a tiny bps limit is used.

A realistic limit is 2 MB/s or larger.  Even then, remember that I/O is in requests so the actual throughput achieved may be 2 MB/s - request_size.

I guess this is /dev/sdb inside your guest:

-drive file=/home/hyx/image/RHEL7-8378-2-15G.raw,if=none,media=disk,format=raw,cache=none,bps=102400,iops=100,id=image2 \

Here are the cases:

1. fio -bs=1M, write: iops=0

The request size is larger than the bps limit so QEMU will never submit the request.  Therefore iops is 0.

2. fio -bs=64k, write: iops=1

100 KB/s / 64 KB/request = 1.56 request/s

Since requests are not split, the result is rounded down to 1 request/s, which is exactly the result we got.

This result is expected.

3. fio -bs=1M, job is stuck

Same as case #1.  No requests will be completed since they are larger than the bps limit.

Comment 3 Yongxue Hong 2017-02-04 10:07:25 UTC

(In reply to Stefan Hajnoczi from comment #2)
> This is a recurring NOTABUG issue with bugs QE reports against I/O
> throttling.
> 
> 100 KB/s is not a realistic block I/O limit since a single request may be
> larger than that.  Requests are not split and the bps limit must be honored,
> so the request will simply never complete.  The guest driver times out after
> a while and reports an I/O error.  This is expected behavior in the case
> where a tiny bps limit is used.
> 
> A realistic limit is 2 MB/s or larger.  Even then, remember that I/O is in
> requests so the actual throughput achieved may be 2 MB/s - request_size.
> 
> I guess this is /dev/sdb inside your guest:
> 
> -drive
> file=/home/hyx/image/RHEL7-8378-2-15G.raw,if=none,media=disk,format=raw,
> cache=none,bps=102400,iops=100,id=image2 \
> 
> Here are the cases:
> 
> 1. fio -bs=1M, write: iops=0
> 
> The request size is larger than the bps limit so QEMU will never submit the
> request.  Therefore iops is 0.
> 
> 2. fio -bs=64k, write: iops=1
> 
> 100 KB/s / 64 KB/request = 1.56 request/s
> 
> Since requests are not split, the result is rounded down to 1 request/s,
> which is exactly the result we got.
> 
> This result is expected.
> 
> 3. fio -bs=1M, job is stuck
> 
> Same as case #1.  No requests will be completed since they are larger than
> the bps limit.
Hi Stefan Hajnoczi
For 1th and 3th test, I tested it again with dd tool,but not prompt "blk_update_request: I/O error".

Following is my run command:
1th:
[root@dhcp-10-168 home]# dd if=/dev/zero of=/dev/sdb bs=1M count=10 oflag=direct
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 102.141 s, 103 kB/s

2th:
[root@dhcp-10-168 home]# dd if=/dev/zero of=/dev/sdb bs=1M count=100 oflag=direct
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 1023.74 s, 102 kB/s

3th:
[root@dhcp-10-168 home]# dd if=/dev/zero of=/dev/sdb bs=1M count=1000 oflag=direct
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 10239.7 s, 102 kB/s


The value of test is limited 100kB ,and the bs is larger than the bps limit.

Comment 4 Yongxue Hong 2017-02-06 09:16:41 UTC

(In reply to Stefan Hajnoczi from comment #2)
> This is a recurring NOTABUG issue with bugs QE reports against I/O
> throttling.
> 
> 100 KB/s is not a realistic block I/O limit since a single request may be
> larger than that.  Requests are not split and the bps limit must be honored,
> so the request will simply never complete.  The guest driver times out after
> a while and reports an I/O error.  This is expected behavior in the case
> where a tiny bps limit is used.
> 
> A realistic limit is 2 MB/s or larger.  Even then, remember that I/O is in
> requests so the actual throughput achieved may be 2 MB/s - request_size.
> 
> I guess this is /dev/sdb inside your guest:
> 
> -drive
> file=/home/hyx/image/RHEL7-8378-2-15G.raw,if=none,media=disk,format=raw,
> cache=none,bps=102400,iops=100,id=image2 \
> 
> Here are the cases:
> 
> 1. fio -bs=1M, write: iops=0
> 
> The request size is larger than the bps limit so QEMU will never submit the
> request.  Therefore iops is 0.
> 
> 2. fio -bs=64k, write: iops=1
> 
> 100 KB/s / 64 KB/request = 1.56 request/s
> 
> Since requests are not split, the result is rounded down to 1 request/s,
> which is exactly the result we got.
> 
> This result is expected.
> 
> 3. fio -bs=1M, job is stuck
> 
> Same as case #1.  No requests will be completed since they are larger than
> the bps limit.

Hi Stefan Hajnoczi
For 1th and 3th test, I tested it again with dd tool,but not prompt "blk_update_request: I/O error".

Following is my run command:
1th:
[root@dhcp-10-168 home]# dd if=/dev/zero of=/dev/sdb bs=1M count=10 oflag=direct
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 102.141 s, 103 kB/s

2th:
[root@dhcp-10-168 home]# dd if=/dev/zero of=/dev/sdb bs=1M count=100 oflag=direct
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 1023.74 s, 102 kB/s

3th:
[root@dhcp-10-168 home]# dd if=/dev/zero of=/dev/sdb bs=1M count=1000 oflag=direct
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 10239.7 s, 102 kB/s


The value of test is limited 100kB ,and the bs is larger than the bps limit,
and no blk_update_request error.

Comment 5 Yongxue Hong 2017-02-15 08:19:17 UTC

Hi Stefan Hajnoczi:
I have ever run the same fio in ppc64, then no any error.
Please refer to the comment 17 of bug(https://bugzilla.redhat.com/show_bug.cgi?id=1413921).
thank you.

Comment 6 Stefan Hajnoczi 2017-02-16 18:51:47 UTC

(In reply to Yongxue Hong from comment #3)
> (In reply to Stefan Hajnoczi from comment #2)
> > This is a recurring NOTABUG issue with bugs QE reports against I/O
> > throttling.
> > 
> > 100 KB/s is not a realistic block I/O limit since a single request may be
> > larger than that.  Requests are not split and the bps limit must be honored,
> > so the request will simply never complete.  The guest driver times out after
> > a while and reports an I/O error.  This is expected behavior in the case
> > where a tiny bps limit is used.
> > 
> > A realistic limit is 2 MB/s or larger.  Even then, remember that I/O is in
> > requests so the actual throughput achieved may be 2 MB/s - request_size.
> > 
> > I guess this is /dev/sdb inside your guest:
> > 
> > -drive
> > file=/home/hyx/image/RHEL7-8378-2-15G.raw,if=none,media=disk,format=raw,
> > cache=none,bps=102400,iops=100,id=image2 \
> > 
> > Here are the cases:
> > 
> > 1. fio -bs=1M, write: iops=0
> > 
> > The request size is larger than the bps limit so QEMU will never submit the
> > request.  Therefore iops is 0.
> > 
> > 2. fio -bs=64k, write: iops=1
> > 
> > 100 KB/s / 64 KB/request = 1.56 request/s
> > 
> > Since requests are not split, the result is rounded down to 1 request/s,
> > which is exactly the result we got.
> > 
> > This result is expected.
> > 
> > 3. fio -bs=1M, job is stuck
> > 
> > Same as case #1.  No requests will be completed since they are larger than
> > the bps limit.
> Hi Stefan Hajnoczi
> For 1th and 3th test, I tested it again with dd tool,but not prompt
> "blk_update_request: I/O error".
> 
> Following is my run command:
> 1th:
> [root@dhcp-10-168 home]# dd if=/dev/zero of=/dev/sdb bs=1M count=10
> oflag=direct
> 10+0 records in
> 10+0 records out
> 10485760 bytes (10 MB) copied, 102.141 s, 103 kB/s
> 
> 2th:
> [root@dhcp-10-168 home]# dd if=/dev/zero of=/dev/sdb bs=1M count=100
> oflag=direct
> 100+0 records in
> 100+0 records out
> 104857600 bytes (105 MB) copied, 1023.74 s, 102 kB/s
> 
> 3th:
> [root@dhcp-10-168 home]# dd if=/dev/zero of=/dev/sdb bs=1M count=1000
> oflag=direct
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes (1.0 GB) copied, 10239.7 s, 102 kB/s
> 
> 
> The value of test is limited 100kB ,and the bs is larger than the bps limit.

The guest kernel splits the 1 MB write requests because they exceed the block queue limits.  The guest is not submitting 1 MB write requests.

You can check /sys/block/sdb/queue/{max_hw_sectors_kb,max_sectors_kb} for the queue limits.

Note You need to log in before you can comment on or make changes to this bug.