Description of problem:
fio test on gluster-block device results in I/O error messages in /var/log/messages

Version-Release number of selected component (if applicable):

On server:
glusterfs-3.8.4-41.el7rhgs.x86_64
glusterfs-api-3.8.4-41.el7rhgs.x86_64
glusterfs-cli-3.8.4-41.el7rhgs.x86_64
glusterfs-server-3.8.4-41.el7rhgs.x86_64
glusterfs-libs-3.8.4-41.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-41.el7rhgs.x86_64
glusterfs-fuse-3.8.4-41.el7rhgs.x86_64
gluster-block-0.2.1-6.el7rhgs.x86_64
libtcmu-1.2.0-11.el7rhgs.x86_64
tcmu-runner-1.2.0-11.el7rhgs.x86_64

On client:
iscsi-initiator-utils-6.2.0.874-4.el7.x86_64

Steps to Reproduce:
1. Created a gluster-block device /dev/sdb with an xfs FS:
<quote>
# lsscsi
[1:0:0:0]   disk    LIO-ORG  TCMU device      0002  /dev/sdb

# lsblk
NAME                   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                      8:0    0 465.8G  0 disk
├─sda1                   8:1    0     1G  0 part /boot
└─sda2                   8:2    0 464.8G  0 part
  ├─rhel_gprfc013-root 253:0    0    50G  0 lvm  /
  ├─rhel_gprfc013-swap 253:1    0  23.6G  0 lvm  [SWAP]
  └─rhel_gprfc013-home 253:2    0 391.1G  0 lvm  /home
sdb                      8:16   0   300G  0 disk /mnt/glustervol
</quote>

2. Ran fio sequential write test:
<quote>
# cat job.fio.write
[global]
rw=write
create_on_open=1
fsync_on_close=1
size=10g
bs=1024k
openfiles=1
startdelay=0
ioengine=sync
nrfiles=1

[lgf-write]
directory=/mnt/glustervol/${HOSTNAME}
filename_format=f.$jobnum.$filenum
numjobs=8
</quote>

# ./fio --output=out.fio.write job.fio.write

Actual results:
Errors in /var/log/messages:
<quote>
Aug 17 01:50:13 localhost kernel: sd 1:0:0:0: [sdb] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Aug 17 01:50:13 localhost kernel: sd 1:0:0:0: [sdb] Sense Key : Not Ready [current]
Aug 17 01:50:13 localhost kernel: sd 1:0:0:0: [sdb] Add. Sense: Logical unit communication failure
Aug 17 01:50:13 localhost kernel: sd 1:0:0:0: [sdb] CDB: Write(10) 2a 00 00 66 03 00 00 00 80 00
Aug 17 01:50:13 localhost kernel: blk_update_request: I/O error, dev sdb, sector 6685440
Aug 17 01:50:13 localhost kernel: Buffer I/O error on dev sdb, logical block 834528, lost async page write
Aug 17 01:50:13 localhost kernel: Buffer I/O error on dev sdb, logical block 834529, lost async page write
Aug 17 01:50:13 localhost kernel: Buffer I/O error on dev sdb, logical block 834530, lost async page write
Aug 17 01:50:13 localhost kernel: Buffer I/O error on dev sdb, logical block 834531, lost async page write
Aug 17 01:50:13 localhost kernel: Buffer I/O error on dev sdb, logical block 834532, lost async page write
Aug 17 01:50:13 localhost kernel: Buffer I/O error on dev sdb, logical block 834533, lost async page write
Aug 17 01:50:13 localhost kernel: Buffer I/O error on dev sdb, logical block 834534, lost async page write
Aug 17 01:50:13 localhost kernel: Buffer I/O error on dev sdb, logical block 834535, lost async page write
Aug 17 01:50:13 localhost kernel: Buffer I/O error on dev sdb, logical block 834536, lost async page write
Aug 17 01:50:13 localhost kernel: Buffer I/O error on dev sdb, logical block 834537, lost async page write
Aug 17 01:50:13 localhost kernel: sd 1:0:0:0: [sdb] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Aug 17 01:50:13 localhost kernel: sd 1:0:0:0: [sdb] Sense Key : Not Ready [current]
Aug 17 01:50:13 localhost kernel: sd 1:0:0:0: [sdb] Add. Sense: Logical unit communication failure
Aug 17 01:50:13 localhost kernel: sd 1:0:0:0: [sdb] CDB: Write(10) 2a 00 01 0a 04 80 00 00 80 00
Aug 17 01:50:13 localhost kernel: blk_update_request: I/O error, dev sdb, sector 17433728
Aug 17 01:50:14 localhost kernel: sd 1:0:0:0: [sdb] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Aug 17 01:50:14 localhost kernel: sd 1:0:0:0: [sdb] Sense Key : Not Ready [current]
Aug 17 01:50:14 localhost kernel: sd 1:0:0:0: [sdb] Add. Sense: Logical unit communication failure
Aug 17 01:50:14 localhost kernel: sd 1:0:0:0: [sdb] CDB: Write(10) 2a 00 00 4a 03 80 00 00 80 00
Aug 17 01:50:14 localhost kernel: blk_update_request: I/O error, dev sdb, sector 4850560
Aug 17 01:50:15 localhost kernel: sd 1:0:0:0: [sdb] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Aug 17 01:50:15 localhost kernel: sd 1:0:0:0: [sdb] Sense Key : Not Ready [current]
Aug 17 01:50:15 localhost kernel: sd 1:0:0:0: [sdb] Add. Sense: Logical unit communication failure
Aug 17 01:50:15 localhost kernel: sd 1:0:0:0: [sdb] CDB: Write(10) 2a 00 01 0c 02 80 00 00 80 00
Aug 17 01:50:15 localhost kernel: blk_update_request: I/O error, dev sdb, sector 17564288
...
</quote>

Expected results:
No I/O errors

Additional info:

gluster v info perfvol:

Volume Name: perfvol
Type: Distribute
Volume ID: 0a1381e8-3cae-4f6a-8151-4171a28edd56
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: smerf04-10ge:/mnt/rhs_brick1
Options Reconfigured:
server.allow-insecure: on
user.cifs: off
features.shard-block-size: 64MB
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.quorum-type: auto
cluster.eager-lock: disable
network.remote-dio: disable
performance.readdir-ahead: off
performance.open-behind: off
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
nfs.disable: on
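For context, the gluster-block device in step 1 can be set up roughly as follows. This is a sketch only: the block name (block0), server IP (192.0.2.10), and ha count are illustrative assumptions, not taken from this report, and the commands require a running gluster cluster.

```shell
# On a gluster server: export a 300 GiB block device backed by the volume
# (block name and host IP below are hypothetical placeholders)
gluster-block create perfvol/block0 ha 1 192.0.2.10 300GiB

# On the client: discover the exported iSCSI target and log in
iscsiadm -m discovery -t sendtargets -p 192.0.2.10
iscsiadm -m node -l

# The new LUN appears as /dev/sdb (verify with lsscsi); format and mount it
mkfs.xfs /dev/sdb
mount /dev/sdb /mnt/glustervol
```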
Forgot to put in the OS version. On both client and server:

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.4 (Maipo)

kernel rpm: kernel-3.10.0-693.el7.x86_64
Manoj and I looked at the setup today and found this issue to be the same race fixed by https://review.gluster.org/17821. The issue is seen only with a plain distribute volume, not with replication. Manoj is doing some more tests and will update again with more information. I am clearing the needinfo on us for now.
Switched to a 1x3 (replica 3) volume configuration. No I/O errors were seen after that.
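For reference, a 1x3 replicated volume of the kind mentioned above can be created roughly like this. The server names and brick paths are illustrative assumptions, and the `gluster-block` options group is applied only if that group file ships with the installed packages.

```shell
# Create a 1x3 (replica 3) volume across three servers
# (server names and brick paths below are hypothetical)
gluster volume create perfvol replica 3 \
    server1:/bricks/brick1 server2:/bricks/brick1 server3:/bricks/brick1
gluster volume start perfvol

# Apply the recommended gluster-block option group (shard, eager-lock, etc.),
# if available on this build
gluster volume set perfvol group gluster-block
```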
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2607