Bug 1379976

Summary: [NFS Ganesha + tiering]: IO hangs for a while during attach tier
Product: Red Hat Gluster Storage [Red Hat Storage]
Component: nfs-ganesha
Version: rhgs-3.2
Status: CLOSED NOTABUG
Severity: high
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Reporter: Shashank Raj <sraj>
Assignee: Kaleb KEITHLEY <kkeithle>
QA Contact: storage-qa-internal <storage-qa-internal>
CC: bugs, jthottan, kkeithle, kramdoss, ndevos, rhs-bugs, skoduri, storage-qa-internal
Keywords: Triaged, ZStream
Target Milestone: ---
Target Release: ---
Type: Bug
Clone Of: 1374229
Bug Depends On: 1374229
Last Closed: 2018-11-19 06:34:08 UTC

Description Shashank Raj 2016-09-28 09:55:33 UTC
+++ This bug was initially created as a clone of Bug #1374229 +++

Description of problem:
IO hangs when a hot tier is attached to a disperse volume. Although IO resumes after a while, the duration of the hang increases in proportion to the hot tier configuration (2x2 vs 4x2) and the IO load.

With a 2x2 hot tier and a kernel untar IO load, IO hangs for about 2 minutes on average. With a 4x2 hot tier, IO hangs for more than 5 minutes.

Volume Name: krk-vol
Type: Tier
Volume ID: 1bf70c56-56e1-4b9e-aeeb-e94a1fc42a28
Status: Started
Number of Bricks: 16
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: 10.70.37.142:/bricks/brick4/ht1
Brick2: 10.70.37.153:/bricks/brick4/ht1
Brick3: 10.70.37.194:/bricks/brick4/ht1
Brick4: 10.70.37.182:/bricks/brick4/ht1
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick5: 10.70.37.182:/bricks/brick0/v1
Brick6: 10.70.37.194:/bricks/brick0/v1
Brick7: 10.70.37.153:/bricks/brick0/v1
Brick8: 10.70.37.142:/bricks/brick0/v1
Brick9: 10.70.37.114:/bricks/brick0/v1
Brick10: 10.70.37.86:/bricks/brick0/v1
Brick11: 10.70.37.182:/bricks/brick1/v1
Brick12: 10.70.37.194:/bricks/brick1/v1
Brick13: 10.70.37.153:/bricks/brick1/v1
Brick14: 10.70.37.142:/bricks/brick1/v1
Brick15: 10.70.37.114:/bricks/brick1/v1
Brick16: 10.70.37.86:/bricks/brick1/v1
Options Reconfigured:
cluster.tier-mode: cache
features.ctr-enabled: on
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
ganesha.enable: on
features.cache-invalidation: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable


Version-Release number of selected component (if applicable):
[root@dhcp37-114 ~]# rpm -qa | grep 'gluster'
glusterfs-cli-3.8.3-0.1.git2ea32d9.el7.centos.x86_64
glusterfs-server-3.8.3-0.1.git2ea32d9.el7.centos.x86_64
python-gluster-3.8.3-0.1.git2ea32d9.el7.centos.noarch
glusterfs-client-xlators-3.8.3-0.1.git2ea32d9.el7.centos.x86_64
glusterfs-3.8.3-0.1.git2ea32d9.el7.centos.x86_64
glusterfs-fuse-3.8.3-0.1.git2ea32d9.el7.centos.x86_64
nfs-ganesha-gluster-next.20160813.2f47e8a-1.el7.centos.x86_64
glusterfs-libs-3.8.3-0.1.git2ea32d9.el7.centos.x86_64
glusterfs-api-3.8.3-0.1.git2ea32d9.el7.centos.x86_64
glusterfs-ganesha-3.8.3-0.1.git2ea32d9.el7.centos.x86_64


How reproducible:
Always

Steps to Reproduce:
1. Create a distributed-disperse volume and export it via NFS-Ganesha.
2. Enable quota and set limits on the volume.
3. Start IO on the mount and attach a hot tier (sample commands below).
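
For illustration, a minimal CLI sketch of the above sequence; the hostnames, brick paths, quota limit and virtual IP are placeholders, not taken from the original report, and the NFS-Ganesha cluster is assumed to be configured already:

# create and start a 2 x (4 + 2) distributed-disperse volume (bricks are placeholders)
gluster volume create krk-vol disperse 6 redundancy 2 \
    server{1..6}:/bricks/brick0/v1 server{1..6}:/bricks/brick1/v1 force
gluster volume start krk-vol

# enable quota and set a limit on the volume root (limit value is an example)
gluster volume quota krk-vol enable
gluster volume quota krk-vol limit-usage / 100GB

# export the volume through NFS-Ganesha and mount it on a client
gluster volume set krk-vol ganesha.enable on
mount -t nfs -o vers=4 <ganesha-vip>:/krk-vol /mnt/krk-vol

# while IO (e.g. a kernel untar) runs on the mount, attach a 2x2 hot tier
gluster volume tier krk-vol attach replica 2 \
    server{1..4}:/bricks/brick4/ht1 force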

Actual results:
Attach tier succeeds, but IO hangs for a while.

Expected results:
No IO hang should be seen during the attach tier operation.

Additional info:

--- Additional comment from Niels de Vos on 2016-09-12 01:36:54 EDT ---

All 3.8.x bugs are now reported against version 3.8 (without .x). For more information, see http://www.gluster.org/pipermail/gluster-devel/2016-September/050859.html

--- Additional comment from Soumya Koduri on 2016-09-13 08:14:45 EDT ---

Could you please collect a packet trace and logs when the IO hang is seen?
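
For reference, one way to capture that data on the Ganesha node; the capture file name and log paths below are assumptions (log locations vary between nfs-ganesha builds), not part of the original request:

# capture NFS traffic (default port 2049) while reproducing the hang
tcpdump -i any -s 0 -w /tmp/attach-tier-hang.pcap port 2049 &

# ... reproduce the attach tier + IO hang, then stop the capture ...
kill %1

# bundle the Ganesha and Gluster logs (paths assume RHGS defaults)
tar czf /tmp/attach-tier-logs.tar.gz \
    /var/log/ganesha/ganesha.log \
    /var/log/ganesha/ganesha-gfapi.log \
    /var/log/glusterfs/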