Bug 1374229

Summary: [NFS Ganesha + tiering]: IO hangs for a while during attach tier
Product: [Community] GlusterFS
Component: ganesha-nfs
Version: 3.8
Hardware: Unspecified
OS: Unspecified
Status: CLOSED EOL
Severity: high
Priority: unspecified
Reporter: krishnaram Karthick <kramdoss>
Assignee: bugs <bugs>
QA Contact:
Docs Contact:
CC: bugs, jthottan, kkeithle, kramdoss, mzywusko, ndevos, skoduri
Keywords: Triaged
Target Milestone: ---
Target Release: ---
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Cloned To: 1379976 (view as bug list)
Environment:
Last Closed: 2017-11-07 10:35:48 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1379976

Description krishnaram Karthick 2016-09-08 09:53:34 UTC
Description of problem:
IO hangs when a hot tier is attached to a disperse volume. Although IO resumes after a while, the duration of the hang grows with the size of the hot tier configuration (2x2 vs 4x2) and with the IO load.

With a 2x2 hot tier and a kernel-untar IO load, IO hangs for about 2 minutes on average; with a 4x2 hot tier, IO hangs for more than 5 minutes.
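
For reference, the kernel-untar load above can be driven over the NFS-Ganesha mount roughly as follows (a sketch only; the mount point and tarball name are placeholders, and the server address is just one of the cluster nodes from the volume info below, where a ganesha VIP would normally be used):

# mount -t nfs -o vers=4.0 10.70.37.142:/krk-vol /mnt/krk-vol
# cd /mnt/krk-vol
# time tar xf /root/linux-4.7.tar.xz

The untar is what stalls for the duration of the hang while the hot tier is being attached.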

Volume Name: krk-vol
Type: Tier
Volume ID: 1bf70c56-56e1-4b9e-aeeb-e94a1fc42a28
Status: Started
Number of Bricks: 16
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: 10.70.37.142:/bricks/brick4/ht1
Brick2: 10.70.37.153:/bricks/brick4/ht1
Brick3: 10.70.37.194:/bricks/brick4/ht1
Brick4: 10.70.37.182:/bricks/brick4/ht1
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick5: 10.70.37.182:/bricks/brick0/v1
Brick6: 10.70.37.194:/bricks/brick0/v1
Brick7: 10.70.37.153:/bricks/brick0/v1
Brick8: 10.70.37.142:/bricks/brick0/v1
Brick9: 10.70.37.114:/bricks/brick0/v1
Brick10: 10.70.37.86:/bricks/brick0/v1
Brick11: 10.70.37.182:/bricks/brick1/v1
Brick12: 10.70.37.194:/bricks/brick1/v1
Brick13: 10.70.37.153:/bricks/brick1/v1
Brick14: 10.70.37.142:/bricks/brick1/v1
Brick15: 10.70.37.114:/bricks/brick1/v1
Brick16: 10.70.37.86:/bricks/brick1/v1
Options Reconfigured:
cluster.tier-mode: cache
features.ctr-enabled: on
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
ganesha.enable: on
features.cache-invalidation: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
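
For context, the ganesha- and cache-invalidation-related options shown above are normally enabled along these lines, assuming the ganesha HA cluster is already configured (a sketch, not the exact sequence used in this setup):

# gluster volume set all cluster.enable-shared-storage enable
# gluster nfs-ganesha enable
# gluster volume set krk-vol ganesha.enable on
# gluster volume set krk-vol features.cache-invalidation on

cluster.tier-mode and features.ctr-enabled are normally set as part of the attach-tier operation rather than by hand.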


Version-Release number of selected component (if applicable):
[root@dhcp37-114 ~]# rpm -qa | grep 'gluster'
glusterfs-cli-3.8.3-0.1.git2ea32d9.el7.centos.x86_64
glusterfs-server-3.8.3-0.1.git2ea32d9.el7.centos.x86_64
python-gluster-3.8.3-0.1.git2ea32d9.el7.centos.noarch
glusterfs-client-xlators-3.8.3-0.1.git2ea32d9.el7.centos.x86_64
glusterfs-3.8.3-0.1.git2ea32d9.el7.centos.x86_64
glusterfs-fuse-3.8.3-0.1.git2ea32d9.el7.centos.x86_64
nfs-ganesha-gluster-next.20160813.2f47e8a-1.el7.centos.x86_64
glusterfs-libs-3.8.3-0.1.git2ea32d9.el7.centos.x86_64
glusterfs-api-3.8.3-0.1.git2ea32d9.el7.centos.x86_64
glusterfs-ganesha-3.8.3-0.1.git2ea32d9.el7.centos.x86_64


How reproducible:
Always

Steps to Reproduce:
1. create a distributed-disperse volume
2. enable quota and set limits
3. start IO and attach a hot tier (see the CLI sketch below)
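
A minimal CLI sketch of these steps (host names, brick paths, and the quota limit are placeholders; the actual brick layout is in the volume info above):

# gluster volume create krk-vol disperse 6 redundancy 2 server{1..6}:/bricks/brick0/v1 server{1..6}:/bricks/brick1/v1
# gluster volume start krk-vol
# gluster volume quota krk-vol enable
# gluster volume quota krk-vol limit-usage / 100GB
(start the kernel-untar IO on the NFS-Ganesha mount, then attach the hot tier:)
# gluster volume tier krk-vol attach replica 2 server{1..4}:/bricks/brick4/ht1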

Actual results:
Attach tier succeeds, but IO hangs for a while.

Expected results:
No IO hang should be seen during the attach tier operation.

Additional info:

Comment 1 Niels de Vos 2016-09-12 05:36:54 UTC
All 3.8.x bugs are now reported against version 3.8 (without .x). For more information, see http://www.gluster.org/pipermail/gluster-devel/2016-September/050859.html

Comment 2 Soumya Koduri 2016-09-13 12:14:45 UTC
Could you please collect a packet trace and logs when the IO hang is seen?
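
A sketch of how the requested data could be gathered during the hang (capture file names are illustrative, and the ganesha log path varies by packaging, e.g. /var/log/ganesha.log or /var/log/ganesha/ganesha.log):

# tcpdump -i any -s 0 -w /tmp/attach-tier-hang.pcap port 2049
# gluster volume statedump krk-vol
# tar czf /tmp/attach-tier-hang-logs.tar.gz /var/log/glusterfs /var/log/ganesha*

Start tcpdump on the ganesha node serving the mount before the attach-tier, take the statedump while IO is still hung (the dump files land under /var/run/gluster by default), and collect the logs after IO resumes.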

Comment 3 Niels de Vos 2017-11-07 10:35:48 UTC
This bug is getting closed because the 3.8 version is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.

Comment 4 krishnaram Karthick 2020-09-28 02:59:20 UTC
clearing stale needinfos.