Bug 1275242 - file demotion failed due to different gfids
Status: CLOSED WORKSFORME
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: tier
Version: 3.1
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Assigned To: Nithya Balachandran
QA Contact: Bhaskarakiran
Keywords: ZStream
Depends On:
Blocks: 1260923
 
Reported: 2015-10-26 06:59 EDT by Bhaskarakiran
Modified: 2016-11-23 18:11 EST (History)
7 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-11-26 04:15:33 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments

  None
Description Bhaskarakiran 2015-10-26 06:59:43 EDT
Description of problem:
=======================

Created an EC volume (8+4) and attached a replica-2 hot tier. Created 1000 files and waited until all of them were demoted, then repeated the step up to 5000 files. During file creation, brought down 2 of the cold-tier bricks; all the files were healed. However, some of the files did not get demoted, and `file` reports them as "DBase 3 data file" when they should be plain data files. Checked the gfids of the file on the cold and hot tiers and they differ; the file sizes differ as well.

[root@transformers ~]# file /rhs/brick1/b1/files/testfile.5241
/rhs/brick1/b1/files/testfile.5241: sticky data
[root@transformers ~]# file /rhs/brick12/vol1-tier1/files/testfile.5241
/rhs/brick12/vol1-tier1/files/testfile.5241: DBase 3 data file
[root@transformers ~]# 

[root@transformers ~]# ls -lh /rhs/brick1/b1/files/testfile.5241
---------T. 2 root root 2.6M Oct 23 12:12 /rhs/brick1/b1/files/testfile.5241
[root@transformers ~]# ls -lh /rhs/brick12/vol1-tier1/files/testfile.5241
-rw-r--r--. 2 root root 1.0M Oct 26 14:40 /rhs/brick12/vol1-tier1/files/testfile.5241
[root@transformers ~]# 
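
The `---------T` mode and "sticky data" type above are the DHT/tier link-file convention: the sticky bit set with every permission bit cleared marks a file whose data is supposed to live on another subvolume. A minimal sketch of that check (illustrative only, not GlusterFS source):

```python
import os
import stat

def looks_like_dht_linkfile(path):
    """Return True if `path` has the DHT/tier link-file mode ---------T:
    the sticky bit set and all permission bits cleared."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == stat.S_ISVTX  # 0o1000
```

A real link file additionally carries a `trusted.*.linkto` xattr naming the subvolume that holds the data, as the getfattr output further down shows.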


[root@transformers ~]# getfattr -d -e hex -m. /rhs/brick1/b1/files/testfile.5241
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/b1/files/testfile.5241
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.bit-rot.version=0x0200000000000000562728e200093316
trusted.ec.config=0x0000080c04000200
trusted.ec.size=0x0000000001479000
trusted.ec.version=0x00000000000005870000000000000b0e
trusted.gfid=0x91acb23edeb04945a8fc73023d897e41
trusted.pgfid.3eec8567-4b3b-4890-8858-55142006e8e7=0x00000001
trusted.tier-gfid.linkto=0x766f6c312d686f742d64687400

[root@transformers ~]# getfattr -d -e hex -m. /rhs/brick12/vol1-tier1/files/testfile.5241
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick12/vol1-tier1/files/testfile.5241
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x020000000000000056273ba7000c2168
trusted.gfid=0xa5caddeda04441faae826fa3edaff790
trusted.glusterfs.quota.3eec8567-4b3b-4890-8858-55142006e8e7.contri=0x00000000001000000000000000000001
trusted.pgfid.3eec8567-4b3b-4890-8858-55142006e8e7=0x00000001

[root@transformers ~]# 
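The hex xattr values above can be decoded directly: `trusted.gfid` is a raw 16-byte UUID and `trusted.tier-gfid.linkto` is a NUL-terminated subvolume name. A quick sketch using the values from this transcript:

```python
import uuid

# Values copied from the getfattr output above.
cold_gfid = "91acb23edeb04945a8fc73023d897e41"  # cold-tier brick
hot_gfid = "a5caddeda04441faae826fa3edaff790"   # hot-tier brick
linkto = "766f6c312d686f742d64687400"           # trusted.tier-gfid.linkto

# trusted.gfid is a raw 16-byte UUID.
print(uuid.UUID(cold_gfid))  # 91acb23e-deb0-4945-a8fc-73023d897e41
print(uuid.UUID(hot_gfid))   # a5cadded-a044-41fa-ae82-6fa3edaff790
print(uuid.UUID(cold_gfid) == uuid.UUID(hot_gfid))  # False: the tiers disagree

# The linkto value is a NUL-terminated subvolume name.
print(bytes.fromhex(linkto).rstrip(b"\x00").decode())  # vol1-hot-dht
```

The mismatch confirms the report: the hot-tier copy is a different object (different gfid, different size) rather than the same file mid-migration.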

Version-Release number of selected component (if applicable):
=============================================================
3.7.5.0-3

How reproducible:
=================
Tried once

Steps to Reproduce:
===================
As in description

Actual results:
===============
Data corruption


Expected results:
=================
No data corruption

Additional info:
================
sosreports will be copied to rhsqe-repo/sosreports/<bugid>
Comment 3 Dan Lambright 2015-11-24 15:08:10 EST
I tried to reproduce this on a cold/hot configuration of 1 x (4 + 2) / 3 x 2:

1. Created 5000 50K files.
2. In parallel, killed two cold EC bricks.
3. Waited for all files to demote.
4. Observed that they all demoted successfully and that their size/type was correct.
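
Step 1 above can be sketched as a small helper (the mount path and file naming are assumptions mirroring the transcript in the description, not the exact tool QE used):

```python
import os

def create_files(mount, count=5000, size=50 * 1024):
    """Create `count` files of `size` bytes under <mount>/files,
    mirroring the testfile.N naming from the description."""
    target = os.path.join(mount, "files")
    os.makedirs(target, exist_ok=True)
    for i in range(count):
        with open(os.path.join(target, "testfile.%d" % i), "wb") as f:
            f.write(os.urandom(size))
```

Steps 2-4 (killing the cold EC brick processes and watching demotion) need live cluster access, so they are not sketched here.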

There have been many EC-related fixes for tiering since this bug was opened. Can QE also try to reproduce it again? If we are both unable to, we should close it; otherwise, we should exchange information on how to reproduce the problem.
Comment 4 Bhaskarakiran 2015-11-26 04:15:33 EST
I have tried a couple of times to reproduce this but was not successful. I am closing this bug for now and will reopen it if it is seen again.
Comment 5 nchilaka 2015-12-21 07:42:04 EST
Note: this can be a "potential risk" for the feature.
