Bug 1303291 - [Tiering]: Detach tier status shows as failed on one of the nodes
Status: MODIFIED
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: tier
Version: 3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Assigned To: hari gowtham
QA Contact: nchilaka
Whiteboard: tier-attach-detach
Keywords: Reopened, ZStream
Reported: 2016-01-30 04:49 EST by krishnaram Karthick
Modified: 2017-06-28 05:06 EDT
CC: 5 users

Doc Type: Bug Fix
Last Closed: 2016-06-06 04:16:35 EDT
Type: Bug


Attachments: None
Description krishnaram Karthick 2016-01-30 04:49:15 EST
Description of problem:
On a 16-node setup with a tiered volume configured (as in the vol info below), detach tier status on the volume shows the status on one node as 'failed'. However, the log messages contain no information about any failure, and files appear to have migrated from the hot tier to the cold tier. There is no disruption to I/O from the fuse mount. This appears to be a cosmetic issue, but we will need to analyze and confirm that.

[root@dhcp37-101 glusterfs]# gluster v tier krk-vol detach status
Node           Rebalanced-files          size       scanned      failures       skipped               status   
---------      -----------   -----------   -----------   -----------   -----------         ------------     
localhost             4353        24.3GB         17437         12999             0            completed            
10.70.37.202            10006        21.1GB         19550          9477             0            completed            
10.70.37.195                0        0Bytes             0             0             0            completed             
10.70.35.44              721         8.1MB         20984          9768             0            completed            
10.70.35.231                0        0Bytes             0             0             0            completed             
10.70.35.176                0        0Bytes        173332           555             0               failed            
10.70.35.232                0        0Bytes             0             0             0            completed             
10.70.35.173              256         4.2MB         19009          8819             0            completed            
10.70.35.163                0        0Bytes             0             0             0            completed             
10.70.37.69                0        0Bytes             0             0             0            completed             
10.70.37.60             6205        21.7GB         18453         12126             0            completed            
10.70.37.120                0        0Bytes             0             0             0            completed             



<<snippet of 'gluster v status krk-vol'>>

Task Status of Volume krk-vol
------------------------------------------------------------------------------
Task                 : Detach tier         
ID                   : e2857877-fc13-45ad-85cf-5e721b400212
Status               : failed              

<<gluster v info>>

Volume Name: krk-vol
Type: Tier
Volume ID: 192655ce-4ef6-4ada-8e0c-6f137e2721e1
Status: Started
Number of Bricks: 36
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 6 x 2 = 12
Brick1: 10.70.37.101:/rhs/brick6/krkvol
Brick2: 10.70.37.69:/rhs/brick6/krkvol
Brick3: 10.70.37.60:/rhs/brick6/krkvol
Brick4: 10.70.37.120:/rhs/brick6/krkvol
Brick5: 10.70.37.202:/rhs/brick6/krkvol
Brick6: 10.70.37.195:/rhs/brick6/krkvol
Brick7: 10.70.35.44:/rhs/brick6/krkvol
Brick8: 10.70.35.231:/rhs/brick6/krkvol
Brick9: 10.70.35.176:/rhs/brick6/krkvol
Brick10: 10.70.35.232:/rhs/brick6/krkvol
Brick11: 10.70.35.173:/rhs/brick6/krkvol
Brick12: 10.70.35.163:/rhs/brick6/krkvol
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (8 + 4) = 24
Brick13: 10.70.35.176:/rhs/brick5/krkvol
Brick14: 10.70.35.232:/rhs/brick5/krkvol
Brick15: 10.70.35.173:/rhs/brick5/krkvol
Brick16: 10.70.35.163:/rhs/brick5/krkvol
Brick17: 10.70.37.101:/rhs/brick5/krkvol
Brick18: 10.70.37.69:/rhs/brick5/krkvol
Brick19: 10.70.37.60:/rhs/brick5/krkvol
Brick20: 10.70.37.120:/rhs/brick5/krkvol
Brick21: 10.70.37.202:/rhs/brick4/krkvol
Brick22: 10.70.37.195:/rhs/brick4/krkvol
Brick23: 10.70.35.155:/rhs/brick4/krkvol
Brick24: 10.70.35.222:/rhs/brick4/krkvol
Brick25: 10.70.35.108:/rhs/brick4/krkvol
Brick26: 10.70.35.44:/rhs/brick4/krkvol
Brick27: 10.70.35.89:/rhs/brick4/krkvol
Brick28: 10.70.35.231:/rhs/brick4/krkvol
Brick29: 10.70.35.176:/rhs/brick4/krkvol
Brick30: 10.70.35.232:/rhs/brick4/krkvol
Brick31: 10.70.35.173:/rhs/brick4/krkvol
Brick32: 10.70.35.163:/rhs/brick4/krkvol
Brick33: 10.70.37.101:/rhs/brick4/krkvol
Brick34: 10.70.37.69:/rhs/brick4/krkvol
Brick35: 10.70.37.60:/rhs/brick4/krkvol
Brick36: 10.70.37.120:/rhs/brick4/krkvol
Options Reconfigured:
diagnostics.client-log-level: INFO
cluster.tier-max-files: 10000
cluster.read-freq-threshold: 1
cluster.write-freq-threshold: 1
features.record-counters: on
cluster.tier-demote-frequency: 300
cluster.watermark-low: 15
cluster.watermark-hi: 70
cluster.min-free-disk: 20
performance.write-behind: off
performance.open-behind: off
performance.read-ahead: off
performance.io-cache: off
cluster.tier-mode: cache
features.ctr-enabled: on
features.quota-deem-statfs: off
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on


Version-Release number of selected component (if applicable):
glusterfs-server-3.7.5-17.el7rhgs.x86_64

How reproducible:
Yet to be determined

Steps to Reproduce:
While various I/Os are in progress and promotion/demotion is happening on a tiered volume, detach the tier and wait for it to complete (see the sketch below).
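A minimal reproduction sketch, using the detach syntax shown in the status output above (the mount point and the background load are illustrative; any I/O mix that keeps promotion/demotion active should do):

# Generate background I/O on the fuse mount while tiering is active
dd if=/dev/urandom of=/mnt/krk-vol/testfile bs=1M count=1024 &

# Start the detach while promotion/demotion cycles are running
gluster v tier krk-vol detach start

# Poll until all nodes finish; the bug is one node reporting 'failed'
watch -n 30 'gluster v tier krk-vol detach status'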

Actual results:
Detach tier status shows as 'failed' on one node

Expected results:
Detach tier should succeed on all nodes

Additional info:
sosreports will be attached shortly.
Comment 1 krishnaram Karthick 2016-01-30 04:56:05 EST
sosreports are available here --> http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1303291/
Comment 3 hari gowtham 2016-06-03 04:16:00 EDT
Found the root cause: a stale file handle error in the tier log of the node on which the detach failed.

[2016-01-28 08:59:37.329788] E [MSGID: 109037] [tier.c:403:tier_migrate_using_query_file] 0-krk-vol-tier-dht: Error in parent lookup for 24be5704-b4c2-404e-8b79-d31720ca0ec1 [Stale file handle]                                       
[2016-01-28 09:00:30.353680] I [MSGID: 109038] [tier.c:585:tier_migrate_using_query_file] 0-krk-vol-tier-dht: Reached cycle migration limit.migrated bytes 4194307600 files 40
[2016-01-28 09:00:30.354004] E [MSGID: 109037] [tier.c:1531:tier_start] 0-krk-vol-tier-dht: Demotion failed

This stale file handle error caused the detach to fail.
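To confirm the failure on the affected node, the tier log can be searched for these messages (a sketch; the log path assumes the default tier daemon log location and may vary by install):

# On 10.70.35.176, the node whose detach status shows 'failed'
grep -E 'Stale file handle|Demotion failed' /var/log/glusterfs/krk-vol-tier.log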

Patch on master : http://review.gluster.org/#/c/13501/

Patch on 3.7 : http://review.gluster.org/#/c/13692/
Comment 4 hari gowtham 2016-06-06 04:16:35 EDT

*** This bug has been marked as a duplicate of bug 1296908 ***
Comment 5 hari gowtham 2016-06-06 06:11:52 EDT
This was confused with another bug; the RCA will be updated soon.
Comment 7 Milind Changire 2017-01-18 05:19:31 EST
Moving to MODIFIED.
Patch available downstream as commit d9b5fef.
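One quick way to check whether an installed build carries the fix (a sketch; it assumes the package changelog references this bug ID or the commit):

rpm -q glusterfs-server
rpm -q --changelog glusterfs-server | grep -iE '1303291|d9b5fef'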
