Bug 1473887

Summary: read call on shard should not go beyond the shard-size
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: SATHEESARAN <sasundar>
Component: sharding
Assignee: Krutika Dhananjay <kdhananj>
Status: CLOSED NOTABUG
QA Contact: SATHEESARAN <sasundar>
Severity: urgent
Priority: unspecified
Version: rhgs-3.2
CC: rhs-bugs, storage-qa-internal
Hardware: x86_64
OS: Linux
Type: Bug
Last Closed: 2017-07-23 03:40:33 UTC

Description SATHEESARAN 2017-07-22 03:59:04 UTC
Description of problem:
-----------------------
During a particular round of testing with RHHI, it was observed that when VMs are created from a template one after the other, before the previous VM creation completes, the VMs get stuck in the 'Image Locked' state.

When looking at the gluster logs, it was found that the reads are going beyond the shard-size and XFS is not returning a proper errno.

So clearly we found out 2 issues:
1. XFS issue ( BZ 1473549 )
2. gluster shard problem
This bug is to address the second one.

Version-Release number of selected component (if applicable):
--------------------------------------------------------------
RHGS 3.2.0 async
glusterfs-3.8.4-18.6.el7rhgs

How reproducible:
-----------------
Hit it for the first time; have not tried to reproduce it.

Steps to Reproduce:
-------------------
This particular issue was seen in the RHHI case.
There could be a much simpler reproducer, but these are the steps as observed:
1. Create an RHV setup backed by a gluster replica 3 volume ( XFS bricks )
2. Create a RHEL 7.4 template
3. Create multiple VMs from the RHEL 7.4 template ( one after the other, before the previous VM creation completes )

Actual results:
---------------
Read on a shard has gone beyond the shard size:
[2017-07-20 12:50:59.982435] E [MSGID: 113040] [posix.c:3119:posix_readv] 0-vmstore-posix: read failed on gfid=0aac1b8b-0ccc-4936-922c-efb524d226e3, fd=0x7fd7c401989c, offset=3584 size=2048, buf=0x7fd7d8af7000 [Unknown error 3072]

Expected results:
-----------------
The read syscall on a shard should not have gone beyond the shard-size.
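
For illustration, a minimal sketch of the bound check being asked for here, assuming the 4 MB shard block size that the arithmetic in comment 3 below works with; read_within_shard() is a hypothetical helper written for this bug report, not code from the shard translator:

#include <stdio.h>
#include <stdint.h>

/* Assumption: 4 MB shard block size, as used in comment 3 (4194304 bytes). */
#define SHARD_BLOCK_SIZE (4ULL * 1024 * 1024)

/* Hypothetical helper: a read of `size` bytes at `offset` (relative to the
 * start of the shard) should end at or before the shard boundary. */
static int
read_within_shard(uint64_t offset, uint64_t size)
{
        return (offset + size) <= SHARD_BLOCK_SIZE;
}

int
main(void)
{
        /* Values taken from the posix_readv log line above. */
        printf("offset=3584 size=2048 within shard: %d\n",
               read_within_shard(3584, 2048));
        return 0;
}

The expectation in this bug is simply that the shard translator never issues a read for which this check would fail.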

Comment 3 Krutika Dhananjay 2017-07-23 03:40:33 UTC
With respect to the specific logs highlighted in comment #2, the offset is 4193792, which is 512 bytes short of 4194304 (= 4 * 1024 * 1024 = 4 MB).
So it is a legitimate read. Also, in light of the clarification provided at https://bugzilla.redhat.com/show_bug.cgi?id=1473549#c31, I'm closing this bug.

-Krutika
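
For clarity, the arithmetic behind the closure: the offset 4193792 is 512 bytes short of the 4 MB shard boundary, so a read of up to 512 bytes at that offset still ends within the shard.

4194304 - 4193792 = 512
4193792 + 512     = 4194304 = 4 * 1024 * 1024 bytes (4 MB)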

Comment 4 SATHEESARAN 2017-07-24 06:12:14 UTC
On Sat, Jul 22, 2017 at 11:09 PM, Pranith Kumar Karampuri <pkarampu> wrote:
I am sorry, I calculated the 4 MB boundary wrong. The read came inside the 4 MB size, but I was under the impression it was beyond 4 MB. There is no bug in shard. I am going to update the BZ as well.