Bug 1415637 - blk_update_request: I/O error
Summary: blk_update_request: I/O error
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.4
Hardware: x86_64
OS: Linux
unspecified
unspecified
Target Milestone: rc
: ---
Assignee: Stefan Hajnoczi
QA Contact: Xueqiang Wei
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-01-23 10:19 UTC by Yongxue Hong
Modified: 2020-01-10 10:03 UTC (History)
10 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-24 13:48:02 UTC
Target Upstream Version:


Attachments (Terms of Use)

Description Yongxue Hong 2017-01-23 10:19:10 UTC
Description of problem:
When run fio on x86 guest, the result seems like unreasonable, and even appears I/O error.

Version-Release number of selected component (if applicable):
host kernel:3.10.0-543.el7.x86_64
qemu :qemu-kvm-rhev-2.8.0-2.el7
guest kernel:3.10.0-543.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1.boot a guest
eg:
/usr/libexec/qemu-kvm \
-name RHEL7-8378 \
-m 2G \
-smp 4 \
-boot menu=on,order=c,splash-time=5000 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=06 \
-drive file=/home/hyx/os/RHEL-7.3-20161019.0-Server-x86_64-dvd1.iso,if=none,media=cdrom,id=image0 \
-device scsi-cd,id=scsi-cd0,drive=image0,channel=0,scsi-id=0,lun=2,bootindex=1 \
-drive file=/home/hyx/image/RHEL7-8378-2-15G.raw,if=none,media=disk,format=raw,cache=none,bps=102400,iops=100,id=image2 \
-device scsi-hd,id=scsi-hd2,drive=image2,channel=0,scsi-id=0,lun=1,bootindex=0 \
-drive file=/home/hyx/image/RHEL7-8378-20G.raw,if=none,media=disk,format=raw,id=image1 \
-device scsi-hd,id=scsi-hd1,drive=image1,channel=0,scsi-id=0,lun=0,bootindex=2 \
-netdev tap,id=hostnet0,script=/etc/qemu-ifup \
-device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=70:e2:84:14:0e:12 \
-serial unix:./sock1,server,nowait \
-monitor stdio \
-qmp tcp:0:6666,server,nowait \
-vnc :0 

2.login guest and run fio
eg:
1th:
fio -filename=/dev/sdb -direct=1 -rw=write -bs=1M -ioengine=libaio -size=10M -name=test -iodepth=16 -runtime=300

2th:
fio -filename=/dev/sdb -direct=1 -rw=write -bs=64k -ioengine=libaio -size=100M -name=test -iodepth=16 -runtime=600

3th:
fio -filename=/dev/sdb -direct=1 -rw=write -bs=1M -ioengine=libaio -size=1000M -name=test -iodepth=16 -runtime=300 



Actual results:
1th:
[root@dhcp-10-168 home]# fio -filename=/dev/sdb -direct=1 -rw=write -bs=1M -ioengine=libaio -size=10M -name=test -iodepth=16 -runtime=300
test: (g=0): rw=write, bs=1M-1M/1M-1M/1M-1M, ioengine=libaio, iodepth=16
fio-2.2.8
Starting 1 process
Jobs: 1 (f=1): [W(1)] [90.3% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 00m:29s]
test: (groupid=0, jobs=1): err= 0: pid=2271: Sun Jan 22 04:46:37 2017
  write: io=10240KB, bw=38830B/s, iops=0, runt=270036msec
    slat (usec): min=91, max=118, avg=109.00, stdev= 7.97
    clat (usec): min=270035K, max=270036K, avg=270035016.10, stdev=336.43
     lat (usec): min=270035K, max=270036K, avg=270035125.60, stdev=331.41
    clat percentiles (msec):
     |  1.00th=[16712],  5.00th=[16712], 10.00th=[16712], 20.00th=[16712],
     | 30.00th=[16712], 40.00th=[16712], 50.00th=[16712], 60.00th=[16712],
     | 70.00th=[16712], 80.00th=[16712], 90.00th=[16712], 95.00th=[16712],
     | 99.00th=[16712], 99.50th=[16712], 99.90th=[16712], 99.95th=[16712],
     | 99.99th=[16712]
    bw (KB  /s): min=    3, max=    3, per=8.11%, avg= 3.00, stdev= 0.00
    lat (msec) : >=2000=100.00%
  cpu          : usr=0.00%, sys=0.00%, ctx=13, majf=0, minf=30
  IO depths    : 1=10.0%, 2=20.0%, 4=40.0%, 8=30.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=0.0%, 8=0.0%, 16=100.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=10/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  WRITE: io=10240KB, aggrb=37KB/s, minb=37KB/s, maxb=37KB/s, mint=270036msec, maxt=270036msec

Disk stats (read/write):
  sdb: ios=33/29, merge=0/0, ticks=229134/4409225, in_queue=4908382, util=100.00%

2th:
[root@dhcp-10-168 home]# fio -filename=/dev/sdb -direct=1 -rw=write -bs=64k -ioengine=libaio -size=100M -name=test -iodepth=16 -runtime=600
test: (g=0): rw=write, bs=64K-64K/64K-64K/64K-64K, ioengine=libaio, iodepth=16
fio-2.2.8
Starting 1 process
Jobs: 1 (f=1): [W(1)] [58.7% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 07m:09s]   
test: (groupid=0, jobs=1): err= 0: pid=2300: Sun Jan 22 04:58:03 2017
  write: io=61056KB, bw=102522B/s, iops=1, runt=609831msec
    slat (usec): min=5, max=39, avg=17.34, stdev= 3.02
    clat (msec): min=1, max=19199, avg=10217.69, stdev=1046.21
     lat (msec): min=1, max=19199, avg=10217.70, stdev=1046.21
    clat percentiles (msec):
     |  1.00th=[ 5669],  5.00th=[10290], 10.00th=[10290], 20.00th=[10290],
     | 30.00th=[10290], 40.00th=[10290], 50.00th=[10290], 60.00th=[10290],
     | 70.00th=[10290], 80.00th=[10290], 90.00th=[10290], 95.00th=[10290],
     | 99.00th=[13435], 99.50th=[16581], 99.90th=[16712], 99.95th=[16712],
     | 99.99th=[16712]
    bw (KB  /s): min=    6, max=  236, per=100.00%, avg=100.04, stdev= 5.40
    lat (msec) : 2=0.10%, 750=0.10%, 2000=0.21%, >=2000=99.58%
  cpu          : usr=0.00%, sys=0.01%, ctx=954, majf=0, minf=31
  IO depths    : 1=0.1%, 2=0.2%, 4=0.4%, 8=0.8%, 16=98.4%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=99.9%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=954/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  WRITE: io=61056KB, aggrb=100KB/s, minb=100KB/s, maxb=100KB/s, mint=609831msec, maxt=609831msec

Disk stats (read/write):
  sdb: ios=0/953, merge=0/0, ticks=0/9670101, in_queue=9680210, util=100.00%

3th:
[root@dhcp-10-168 ~]# fio -filename=/dev/sdb -direct=1 -rw=write -bs=1M -ioengine=libaio -size=1000M -name=test -iodepth=16 -runtime=300 
test: (g=0): rw=write, bs=1M-1M/1M-1M/1M-1M, ioengine=libaio, iodepth=16
fio-2.2.8
Starting 1 process
fio: job 'test' hasn't exited in 60 seconds, it appears to be stuck. Doing forceful exit of this job.

test: (groupid=0, jobs=1): err= 0: pid=2294: Sun Jan 22 06:23:07 2017
    lat (msec) : >=2000=25.00%
  cpu          : usr=0.00%, sys=0.00%, ctx=0, majf=0, minf=0
  IO depths    : 1=5.0%, 2=10.0%, 4=20.0%, 8=40.0%, 16=25.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=20/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):

Disk stats (read/write):
  sdb: ios=25/25, merge=0/0, ticks=215895/4143126, in_queue=20201945, util=100.00%
fio: file hash not empty on exit
[root@dhcp-10-168 ~]# fio: io_u error on file /dev/sdb: Input/output error: write offset=11534336, buflen=1048576
fio: io_u error on file /dev/sdb: Input/output error: write offset=10485760, buflen=1048576
fio: io_u error on file /dev/sdb: Input/output error: write offset=9437184, buflen=1048576

[root@dhcp-10-168 ~]# [ 1177.095868] blk_update_request: I/O error, dev sdb, sector 24544
[ 1177.096516] blk_update_request: I/O error, dev sdb, sector 23536
[ 1177.097286] blk_update_request: I/O error, dev sdb, sector 22528
[ 1177.097820] blk_update_request: I/O error, dev sdb, sector 22496
[ 1177.098269] blk_update_request: I/O error, dev sdb, sector 21488
[ 1177.098717] blk_update_request: I/O error, dev sdb, sector 20480
[ 1177.099164] blk_update_request: I/O error, dev sdb, sector 20448

Result Summary:
The 1th and 2th fio result seems like unreasonable since the value of iops is too lower.
The 3th fio result show blk_update_request: I/O error.

Expected results:
no error when run fio.

Additional info:
This problem doesn't be reappeared on ppc64le host.

Comment 2 Stefan Hajnoczi 2017-01-24 13:48:02 UTC
This is a recurring NOTABUG issue with bugs QE reports against I/O throttling.

100 KB/s is not a realistic block I/O limit since a single request may be larger than that.  Requests are not split and the bps limit must be honored, so the request will simply never complete.  The guest driver times out after a while and reports an I/O error.  This is expected behavior in the case where a tiny bps limit is used.

A realistic limit is 2 MB/s or larger.  Even then, remember that I/O is in requests so the actual throughput achieved may be 2 MB/s - request_size.

I guess this is /dev/sdb inside your guest:

-drive file=/home/hyx/image/RHEL7-8378-2-15G.raw,if=none,media=disk,format=raw,cache=none,bps=102400,iops=100,id=image2 \

Here are the cases:

1. fio -bs=1M, write: iops=0

The request size is larger than the bps limit so QEMU will never submit the request.  Therefore iops is 0.

2. fio -bs=64k, write: iops=1

100 KB/s / 64 KB/request = 1.56 request/s

Since requests are not split, the result is rounded down to 1 request/s, which is exactly the result we got.

This result is expected.

3. fio -bs=1M, job is stuck

Same as case #1.  No requests will be completed since they are larger than the bps limit.

Comment 3 Yongxue Hong 2017-02-04 10:07:25 UTC
(In reply to Stefan Hajnoczi from comment #2)
> This is a recurring NOTABUG issue with bugs QE reports against I/O
> throttling.
> 
> 100 KB/s is not a realistic block I/O limit since a single request may be
> larger than that.  Requests are not split and the bps limit must be honored,
> so the request will simply never complete.  The guest driver times out after
> a while and reports an I/O error.  This is expected behavior in the case
> where a tiny bps limit is used.
> 
> A realistic limit is 2 MB/s or larger.  Even then, remember that I/O is in
> requests so the actual throughput achieved may be 2 MB/s - request_size.
> 
> I guess this is /dev/sdb inside your guest:
> 
> -drive
> file=/home/hyx/image/RHEL7-8378-2-15G.raw,if=none,media=disk,format=raw,
> cache=none,bps=102400,iops=100,id=image2 \
> 
> Here are the cases:
> 
> 1. fio -bs=1M, write: iops=0
> 
> The request size is larger than the bps limit so QEMU will never submit the
> request.  Therefore iops is 0.
> 
> 2. fio -bs=64k, write: iops=1
> 
> 100 KB/s / 64 KB/request = 1.56 request/s
> 
> Since requests are not split, the result is rounded down to 1 request/s,
> which is exactly the result we got.
> 
> This result is expected.
> 
> 3. fio -bs=1M, job is stuck
> 
> Same as case #1.  No requests will be completed since they are larger than
> the bps limit.
Hi Stefan Hajnoczi
For 1th and 3th test, I tested it again with dd tool,but not prompt "blk_update_request: I/O error".

Following is my run command:
1th:
[root@dhcp-10-168 home]# dd if=/dev/zero of=/dev/sdb bs=1M count=10 oflag=direct
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 102.141 s, 103 kB/s

2th:
[root@dhcp-10-168 home]# dd if=/dev/zero of=/dev/sdb bs=1M count=100 oflag=direct
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 1023.74 s, 102 kB/s

3th:
[root@dhcp-10-168 home]# dd if=/dev/zero of=/dev/sdb bs=1M count=1000 oflag=direct
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 10239.7 s, 102 kB/s


The value of test is limited 100kB ,and the bs is larger than the bps limit.

Comment 4 Yongxue Hong 2017-02-06 09:16:41 UTC
(In reply to Stefan Hajnoczi from comment #2)
> This is a recurring NOTABUG issue with bugs QE reports against I/O
> throttling.
> 
> 100 KB/s is not a realistic block I/O limit since a single request may be
> larger than that.  Requests are not split and the bps limit must be honored,
> so the request will simply never complete.  The guest driver times out after
> a while and reports an I/O error.  This is expected behavior in the case
> where a tiny bps limit is used.
> 
> A realistic limit is 2 MB/s or larger.  Even then, remember that I/O is in
> requests so the actual throughput achieved may be 2 MB/s - request_size.
> 
> I guess this is /dev/sdb inside your guest:
> 
> -drive
> file=/home/hyx/image/RHEL7-8378-2-15G.raw,if=none,media=disk,format=raw,
> cache=none,bps=102400,iops=100,id=image2 \
> 
> Here are the cases:
> 
> 1. fio -bs=1M, write: iops=0
> 
> The request size is larger than the bps limit so QEMU will never submit the
> request.  Therefore iops is 0.
> 
> 2. fio -bs=64k, write: iops=1
> 
> 100 KB/s / 64 KB/request = 1.56 request/s
> 
> Since requests are not split, the result is rounded down to 1 request/s,
> which is exactly the result we got.
> 
> This result is expected.
> 
> 3. fio -bs=1M, job is stuck
> 
> Same as case #1.  No requests will be completed since they are larger than
> the bps limit.

Hi Stefan Hajnoczi
For 1th and 3th test, I tested it again with dd tool,but not prompt "blk_update_request: I/O error".

Following is my run command:
1th:
[root@dhcp-10-168 home]# dd if=/dev/zero of=/dev/sdb bs=1M count=10 oflag=direct
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 102.141 s, 103 kB/s

2th:
[root@dhcp-10-168 home]# dd if=/dev/zero of=/dev/sdb bs=1M count=100 oflag=direct
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 1023.74 s, 102 kB/s

3th:
[root@dhcp-10-168 home]# dd if=/dev/zero of=/dev/sdb bs=1M count=1000 oflag=direct
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 10239.7 s, 102 kB/s


The value of test is limited 100kB ,and the bs is larger than the bps limit,
and no blk_update_request error.

Comment 5 Yongxue Hong 2017-02-15 08:19:17 UTC
Hi Stefan Hajnoczi:
I have ever run the same fio in ppc64, then no any error.
Please refer to the comment 17 of bug(https://bugzilla.redhat.com/show_bug.cgi?id=1413921).
thank you.

Comment 6 Stefan Hajnoczi 2017-02-16 18:51:47 UTC
(In reply to Yongxue Hong from comment #3)
> (In reply to Stefan Hajnoczi from comment #2)
> > This is a recurring NOTABUG issue with bugs QE reports against I/O
> > throttling.
> > 
> > 100 KB/s is not a realistic block I/O limit since a single request may be
> > larger than that.  Requests are not split and the bps limit must be honored,
> > so the request will simply never complete.  The guest driver times out after
> > a while and reports an I/O error.  This is expected behavior in the case
> > where a tiny bps limit is used.
> > 
> > A realistic limit is 2 MB/s or larger.  Even then, remember that I/O is in
> > requests so the actual throughput achieved may be 2 MB/s - request_size.
> > 
> > I guess this is /dev/sdb inside your guest:
> > 
> > -drive
> > file=/home/hyx/image/RHEL7-8378-2-15G.raw,if=none,media=disk,format=raw,
> > cache=none,bps=102400,iops=100,id=image2 \
> > 
> > Here are the cases:
> > 
> > 1. fio -bs=1M, write: iops=0
> > 
> > The request size is larger than the bps limit so QEMU will never submit the
> > request.  Therefore iops is 0.
> > 
> > 2. fio -bs=64k, write: iops=1
> > 
> > 100 KB/s / 64 KB/request = 1.56 request/s
> > 
> > Since requests are not split, the result is rounded down to 1 request/s,
> > which is exactly the result we got.
> > 
> > This result is expected.
> > 
> > 3. fio -bs=1M, job is stuck
> > 
> > Same as case #1.  No requests will be completed since they are larger than
> > the bps limit.
> Hi Stefan Hajnoczi
> For 1th and 3th test, I tested it again with dd tool,but not prompt
> "blk_update_request: I/O error".
> 
> Following is my run command:
> 1th:
> [root@dhcp-10-168 home]# dd if=/dev/zero of=/dev/sdb bs=1M count=10
> oflag=direct
> 10+0 records in
> 10+0 records out
> 10485760 bytes (10 MB) copied, 102.141 s, 103 kB/s
> 
> 2th:
> [root@dhcp-10-168 home]# dd if=/dev/zero of=/dev/sdb bs=1M count=100
> oflag=direct
> 100+0 records in
> 100+0 records out
> 104857600 bytes (105 MB) copied, 1023.74 s, 102 kB/s
> 
> 3th:
> [root@dhcp-10-168 home]# dd if=/dev/zero of=/dev/sdb bs=1M count=1000
> oflag=direct
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes (1.0 GB) copied, 10239.7 s, 102 kB/s
> 
> 
> The value of test is limited 100kB ,and the bs is larger than the bps limit.

The guest kernel splits the 1 MB write requests because they exceed the block queue limits.  The guest is not submitting 1 MB write requests.

You can check /sys/block/sdb/queue/{max_hw_sectors_kb,max_sectors_kb} for the queue limits.


Note You need to log in before you can comment on or make changes to this bug.