Bug 1272408

Summary: Data Tiering:[2015-10-15 02:54:52.259879] E [MSGID: 109039] [dht-common.c:2833:dht_vgetxattr_cbk] 0-tiervolume-cold-dht: vgetxattr: Subvolume tiervolume-disperse-1 returned -1 [No such file or directory]
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: tier
Version: rhgs-3.1
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Reporter: Nag Pavan Chilakam <nchilaka>
Assignee: Ashish Pandey <aspandey>
QA Contact: Rahul Hinduja <rhinduja>
CC: asrivast, dlambrig, rhinduja, rhs-bugs, sankarshan, storage-qa-internal
Keywords: ZStream
Target Release: RHGS 3.1.2
Fixed In Version: glusterfs-3.7.5-11
Doc Type: Bug Fix
Type: Bug
Clone Of: 1272401
Last Closed: 2016-03-01 05:41:50 UTC
Bug Depends On: 1272401
Bug Blocks: 1260783, 1260923

Description Nag Pavan Chilakam 2015-10-16 10:50:33 UTC
+++ This bug was initially created as a clone of Bug #1272401 +++

Description of problem:
=========================
On the longevity/stress setup we are getting the error message 
[2015-10-15 02:54:52.259879] E [MSGID: 109039] [dht-common.c:2833:dht_vgetxattr_cbk] 0-tiervolume-cold-dht: vgetxattr: Subvolume tiervolume-disperse-1 returned -1 [No such file or directory]


Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.7.5-0.19.git0f5c3e8.el7.centos.x86_64



Steps Carried Out:
==================

1. Create a 12-node cluster.
2. Create a tiered volume with the hot tier as (6 x 2) and the cold tier as (2 x (6 + 2) = 16).
3. FUSE-mount the volume on 3 clients: RHEL7.2, RHEL7.1 and RHEL6.7.
4. Start creating data from each client:

Client 1:
=========
[root@dj ~]# crefi --multi -n 10 -b 10 -d 10 --max=1024k --min=5k --random -T 5 -t text -I 5 --fop=create /mnt/fuse/

Client 2:
=========
[root@mia ~]# cd /mnt/fuse/
[root@mia fuse]# for i in {1..10}; do cp -rf /etc etc.$i ; sleep 100 ; done

Client 3:
=========
[root@wingo fuse]# for i in {1..999}; do dd if=/dev/zero of=dd.$i bs=1M count=1 ; sleep 10 ; done

5. After a while, data creation from client 1 and client 2 should be complete, while data creation from client 3 is still in progress.

6. At this point, the only ongoing data creation is client 3 writing 1 file every 10 seconds.

7. Monitor CPU usage using top.
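
Step 7 can be made repeatable by running top in batch mode and totalling the CPU share of the Gluster processes. The following is a minimal sketch, not part of the original test setup: the function name gluster_cpu_from_top is hypothetical, and it assumes the default procps top column layout, where %CPU is the 9th field and COMMAND is the 12th.

```shell
# Sum the %CPU column for every process whose command name starts with
# "gluster" (glusterd, glusterfs, glusterfsd). Reads `top -b` output on stdin.
# Assumption: default procps layout, with %CPU in field 9 and COMMAND in field 12.
gluster_cpu_from_top() {
    awk '$12 ~ /^gluster/ { sum += $9 } END { printf "%.1f\n", sum }'
}

# Usage: take one sample every 10 seconds, matching the client 3 write interval.
#   while sleep 10; do top -b -n 1 | gluster_cpu_from_top; done
```

If top has been customised on the node (extra or reordered columns), the field numbers in the awk filter would need adjusting.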

Comment 5 Rahul Hinduja 2015-11-02 13:17:36 UTC
For the record: able to hit this again while changing permissions of files on the system.


[root@dhcp37-160 glusterfs]# grep "dht_vgetxattr_cbk" tiervolume-tier.log
[2015-11-02 09:04:32.847824] E [MSGID: 109039] [dht-common.c:2833:dht_vgetxattr_cbk] 0-tiervolume-cold-dht: vgetxattr: Subvolume tiervolume-disperse-0 returned -1 [Input/output error]
[2015-11-02 09:04:32.847852] E [MSGID: 109039] [dht-common.c:2833:dht_vgetxattr_cbk] 0-tiervolume-tier-dht: vgetxattr: Subvolume tiervolume-cold-dht returned -1 [Input/output error]
[2015-11-02 10:30:05.823787] E [MSGID: 109039] [dht-common.c:2833:dht_vgetxattr_cbk] 0-tiervolume-cold-dht: vgetxattr: Subvolume tiervolume-disperse-0 returned -1 [No such file or directory]
[2015-11-02 10:30:05.823812] E [MSGID: 109039] [dht-common.c:2833:dht_vgetxattr_cbk] 0-tiervolume-tier-dht: vgetxattr: Subvolume tiervolume-cold-dht returned -1 [No such file or directory]
[root@dhcp37-160 glusterfs]#
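
To see which subvolumes and which errno strings dominate, the grep output above can be grouped. This is a rough sketch that assumes the message layout shown in the log lines quoted in this report; the function name summarize_vgetxattr_errors is hypothetical.

```shell
# Group dht_vgetxattr_cbk errors by subvolume and error string.
# Reads log lines on stdin; prints "subvolume<TAB>error<TAB>count" per group.
# Assumption: lines follow the format quoted in this bug report.
summarize_vgetxattr_errors() {
    sed -n 's/.*dht_vgetxattr_cbk\] [^ ]*: vgetxattr: Subvolume \([^ ]*\) returned -1 \[\(.*\)\] *$/\1|\2/p' |
        LC_ALL=C sort | uniq -c |
        awk '{ count = $1; sub(/^ *[0-9]+ /, ""); split($0, f, "|");
               printf "%s\t%s\t%d\n", f[1], f[2], count }'
}

# Usage:
#   summarize_vgetxattr_errors < tiervolume-tier.log
```

For the four lines above, this yields four groups with a count of 1 each: two error strings for tiervolume-cold-dht and two for tiervolume-disperse-0.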

Comment 7 Rahul Hinduja 2015-11-30 06:58:13 UTC
This bug was reproducible while performing metadata changes. With the latest build, glusterfs-3.7.5-7.el7rhgs.x86_64, metadata changes no longer cause a migration. Performed fops such as create, chmod, chown, chgrp, symlink, truncate and rename with the latest build and did not observe these errors.

Since we do not know the actual RCA, this bug should be kept open until the regression run is done, in which fops are stress-tested in test and cache modes.

Comment 9 Ashish Pandey 2015-12-14 04:40:19 UTC
If a getxattr call is made on a file that has been migrated from cold (EC) to hot, the call might fail with an error saying that the file does not exist.
This patch handles the issue by making the cold volume the hashed volume and making sure that T files are created on the hashed volume (EC).
This also ensures that a getxattr on such a file will see the file and its xattr.
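
The effect described above can be illustrated with a toy model. This is only a sketch and not GlusterFS code: two plain directories stand in for the cold (hashed) tier and the hot tier, an empty placeholder stands in for a T file, and the function names and the mode strings are hypothetical.

```shell
# Toy model only: two plain directories stand in for the cold (hashed) tier
# and the hot tier. Nothing here is GlusterFS code; all names are hypothetical.

# Report whether a name is visible on the cold side, which is what a
# getxattr sent to the cold tier would need.
lookup_on_cold() {            # $1 = model root, $2 = file name
    if [ -e "$1/cold/$2" ]; then echo "found"; else echo "ENOENT"; fi
}

# Move a file from cold to hot. With mode "with-linkto", leave an empty
# placeholder behind on cold, standing in for the T file the patch creates.
migrate_to_hot() {            # $1 = model root, $2 = file name, $3 = mode
    mv "$1/cold/$2" "$1/hot/$2"
    if [ "$3" = "with-linkto" ]; then
        : > "$1/cold/$2"
    fi
}

root=$(mktemp -d)
mkdir "$root/cold" "$root/hot"
echo data > "$root/cold/a"
echo data > "$root/cold/b"

migrate_to_hot "$root" a without-linkto
migrate_to_hot "$root" b with-linkto
lookup_on_cold "$root" a      # prints ENOENT
lookup_on_cold "$root" b      # prints found
rm -rf "$root"
```

In the model, the file migrated without a placeholder disappears from the cold side, which mirrors the "No such file or directory" errors in the log; the file migrated with a placeholder stays visible there.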

Comment 10 Rahul Hinduja 2015-12-17 13:30:24 UTC
Verified with build: 

Ran the regression suite on a tiered volume (Cold Tier: 2x(4+2) and Hot Tier: 2x2), covering fops like create, chmod, chown, chgrp, symlink, rename and truncate.

No errors reported. Moving the bug to verified state. 

[root@dhcp37-165 glusterfs]# grep -i "vgetxattr" tiervolume-*
[root@dhcp37-165 glusterfs]# 
[root@dhcp37-165 glusterfs]# grep -i " E " tiervolume-*
[root@dhcp37-165 glusterfs]#
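
The two greps above can be folded into a single check that is easy to rerun after each regression pass. This is a sketch only: the function name count_error_entries is hypothetical, and the pattern assumes the log layout quoted in this report (a bracketed timestamp followed by the level letter E), so it is stricter than grep -i " E ".

```shell
# Print the number of error-level entries across the given log files.
# Assumption: each entry starts with "[timestamp] E [", as in the lines
# quoted in this bug report.
count_error_entries() {
    # grep -c exits non-zero when the count is 0, so "|| true" keeps the
    # function from failing on a clean log while still printing "0".
    cat -- "$@" | grep -c '^\[[^]]*] E \[' || true
}

# Usage:
#   count_error_entries tiervolume-*
```

A result of 0 corresponds to the empty grep output shown above.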

Comment 13 errata-xmlrpc 2016-03-01 05:41:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0193.html