Description of problem:

When using direct I/O, reading from a file returns more data than the file contains, padding the file data with zeroes. Here is an example.

## On a host mounting gluster using fuse

$ pwd
/rhev/data-center/mnt/glusterSD/voodoo4.tlv.redhat.com:_gv0/de566475-5b67-4987-abf3-3dc98083b44c/dom_md

$ mount | grep glusterfs
voodoo4.tlv.redhat.com:/gv0 on /rhev/data-center/mnt/glusterSD/voodoo4.tlv.redhat.com:_gv0 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

$ stat metadata
  File: metadata
  Size: 501         Blocks: 1          IO Block: 131072 regular file
Device: 31h/49d     Inode: 13313776956941938127  Links: 1
Access: (0644/-rw-r--r--)  Uid: (   36/    vdsm)   Gid: (   36/     kvm)
Context: system_u:object_r:fusefs_t:s0
Access: 2019-08-01 22:21:49.186381528 +0300
Modify: 2019-08-01 22:21:49.427404135 +0300
Change: 2019-08-01 22:21:49.969739575 +0300
 Birth: -

$ cat metadata
ALIGNMENT=1048576
BLOCK_SIZE=4096
CLASS=Data
DESCRIPTION=gv0
IOOPTIMEOUTSEC=10
LEASERETRIES=3
LEASETIMESEC=60
LOCKPOLICY=
LOCKRENEWALINTERVALSEC=5
MASTER_VERSION=1
POOL_DESCRIPTION=4k-gluster
POOL_DOMAINS=de566475-5b67-4987-abf3-3dc98083b44c:Active
POOL_SPM_ID=-1
POOL_SPM_LVER=-1
POOL_UUID=44cfb532-3144-48bd-a08c-83065a5a1032
REMOTE_PATH=voodoo4.tlv.redhat.com:/gv0
ROLE=Master
SDUUID=de566475-5b67-4987-abf3-3dc98083b44c
TYPE=GLUSTERFS
VERSION=5
_SHA_CKSUM=3d1cb836f4c93679fc5a4e7218425afe473e3cfa

$ dd if=metadata bs=4096 count=1 of=/dev/null
0+1 records in
0+1 records out
501 bytes copied, 0.000340298 s, 1.5 MB/s

$ dd if=metadata bs=4096 count=1 of=/dev/null iflag=direct
1+0 records in
1+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 0.00398529 s, 1.0 MB/s

Checking the copied data, the actual content of the file is padded with zeroes to 4096 bytes.

## On one of the gluster nodes

$ pwd
/export/vdo0/brick/de566475-5b67-4987-abf3-3dc98083b44c/dom_md

$ stat metadata
  File: metadata
  Size: 501         Blocks: 16         IO Block: 4096   regular file
Device: fd02h/64770d  Inode: 149       Links: 2
Access: (0644/-rw-r--r--)  Uid: (   36/ UNKNOWN)   Gid: (   36/     kvm)
Context: system_u:object_r:usr_t:s0
Access: 2019-08-01 22:21:50.380425478 +0300
Modify: 2019-08-01 22:21:49.427397589 +0300
Change: 2019-08-01 22:21:50.374425302 +0300
 Birth: -

$ dd if=metadata bs=4096 count=1 of=/dev/null
0+1 records in
0+1 records out
501 bytes copied, 0.000991636 s, 505 kB/s

$ dd if=metadata bs=4096 count=1 of=/dev/null iflag=direct
0+1 records in
0+1 records out
501 bytes copied, 0.0011381 s, 440 kB/s

This proves that the issue is in gluster: the same direct I/O read returns the real 501 bytes on the brick, but a full zero-padded 4096-byte block through the fuse mount.
# gluster volume info gv0

Volume Name: gv0
Type: Replicate
Volume ID: cbc5a2ad-7246-42fc-a78f-70175fb7bf22
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: voodoo4.tlv.redhat.com:/export/vdo0/brick
Brick2: voodoo5.tlv.redhat.com:/export/vdo0/brick
Brick3: voodoo8.tlv.redhat.com:/export/vdo0/brick (arbiter)
Options Reconfigured:
storage.owner-gid: 36
storage.owner-uid: 36
server.event-threads: 4
client.event-threads: 4
cluster.choose-local: off
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: disable
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: on

$ xfs_info /export/vdo0
meta-data=/dev/mapper/vdo0       isize=512    agcount=4, agsize=6553600 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=26214400, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=12800, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

Version-Release number of selected component (if applicable):

Server:
$ rpm -qa | grep glusterfs
glusterfs-libs-6.4-1.fc29.x86_64
glusterfs-api-6.4-1.fc29.x86_64
glusterfs-client-xlators-6.4-1.fc29.x86_64
glusterfs-fuse-6.4-1.fc29.x86_64
glusterfs-6.4-1.fc29.x86_64
glusterfs-cli-6.4-1.fc29.x86_64
glusterfs-server-6.4-1.fc29.x86_64

Client:
$ rpm -qa | grep glusterfs
glusterfs-client-xlators-6.4-1.fc29.x86_64
glusterfs-6.4-1.fc29.x86_64
glusterfs-rdma-6.4-1.fc29.x86_64
glusterfs-cli-6.4-1.fc29.x86_64
glusterfs-libs-6.4-1.fc29.x86_64
glusterfs-fuse-6.4-1.fc29.x86_64
glusterfs-api-6.4-1.fc29.x86_64

How reproducible:
Always.

Steps to Reproduce:
1. Provision a gluster volume over VDO (did not check without VDO)
2. Create a file of 501 bytes
3. Read the file using direct I/O

Actual results:
read() returns 4096 bytes, padding the file data with zeroes

Expected results:
read() returns the actual file data (501 bytes)
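For reference, here is a minimal C reproducer along the lines of the dd test above. It is not part of the original report; the file name argument and the 4096-byte alignment are assumptions based on the storage domain's reported BLOCK_SIZE.

/* direct_read.c -- open a file with O_DIRECT, read one 4096-byte block,
 * and compare the number of bytes returned with the file size. */
#define _GNU_SOURCE             /* for O_DIRECT on glibc */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY | O_DIRECT);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct stat st;
    if (fstat(fd, &st) < 0) {
        perror("fstat");
        return 1;
    }

    /* O_DIRECT requires an aligned buffer; 4096 matches the reported
     * block size of the storage domain. */
    void *buf;
    if (posix_memalign(&buf, 4096, 4096) != 0) {
        perror("posix_memalign");
        return 1;
    }

    ssize_t n = read(fd, buf, 4096);
    if (n < 0) {
        perror("read");
        return 1;
    }

    printf("file size: %lld, bytes returned: %zd\n",
           (long long)st.st_size, n);
    /* Expected: bytes returned == 501 (the file size).
     * Observed through the fuse mount: 4096, zero-padded. */

    free(buf);
    close(fd);
    return 0;
}

Built with gcc and run against the metadata file, it should print 501 bytes returned on the brick and 4096 through the fuse mount, matching the dd output above.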
David, do you think this can affect sanlock?
Kevin, do you think this can affect qemu/qemu-img?
@Nir, thanks for the report. We will look into this.
(In reply to Nir Soffer from comment #2)
> Kevin, do you think this can affect qemu/qemu-img?

This is not a problem for QEMU as long as the file size is correct. If gluster didn't do the zero padding, QEMU would do it internally.

In fact, fixing this in gluster may break the case of unaligned image sizes with QEMU, because the image size is rounded up to sector (512 byte) granularity and the gluster driver turns short reads into errors. This would actually affect non-O_DIRECT too, which already seems to behave this way (non-direct reads return the actual 501 bytes), so can you just give this a quick test?
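For illustration only, the generic pattern described here (the consumer zero-fills the tail of a short read itself) looks roughly like the sketch below. This is a hedged example, not QEMU's actual code path; the function name read_padded is invented for the illustration.

#include <string.h>
#include <unistd.h>

/* Read len bytes at offset; if the filesystem returns a short read at
 * EOF, zero-fill the rest of the buffer so the caller always sees a
 * full block. Errors are propagated unchanged. */
static ssize_t read_padded(int fd, void *buf, size_t len, off_t offset)
{
    ssize_t n = pread(fd, buf, len, offset);
    if (n < 0)
        return n;                                 /* propagate the error */
    if ((size_t)n < len)
        memset((char *)buf + n, 0, len - (size_t)n); /* pad with zeroes */
    return (ssize_t)len;
}

With this pattern in the consumer, a filesystem that returns the real (short) size at EOF and one that pads to the block size behave the same from the application's point of view.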
(In reply to Nir Soffer from comment #1)
> David, do you think this can affect sanlock?

I don't think so. sanlock doesn't use any space that it didn't first write to initialize.
REVIEW: https://review.gluster.org/23212 (features/shard: Send correct size when reads are sent beyond file size) posted (#1) for review on release-6 by Krutika Dhananjay
REVIEW: https://review.gluster.org/23212 (features/shard: Send correct size when reads are sent beyond file size) merged (#3) on release-6 by hari gowtham
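Conceptually, the merged change is about reporting the correct size for reads that extend past end-of-file. The following is only an illustrative sketch of that clamping idea, not the actual shard translator code from the patch above.

#include <stddef.h>
#include <sys/types.h>

/* Given a read request and the real file size, return how many bytes
 * should actually be reported back to the reader. */
static size_t clamp_read_size(off_t offset, size_t requested, off_t file_size)
{
    if (offset >= file_size)
        return 0;                                 /* read starts at or past EOF */

    size_t avail = (size_t)(file_size - offset);  /* bytes actually present */
    return requested < avail ? requested : avail; /* never report padding */
}

For the reported case (offset 0, requested 4096, file size 501), this yields 501, which is what read() is expected to return.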
Krutika, can we backport this fix to current RHGS? This issue affects RHV customers, see bug 1800803.
(In reply to Nir Soffer from comment #8)
> Krutika, can we backport this fix to current RHGS?
>
> This issue affects RHV customers, see bug 1800803.

Yes, I will do the needful. Thanks.

-Krutika