Description of problem:
On a 16-node setup with a tiered volume configured (as in the vol info below), detach tier status on the volume shows the status on one node as 'failed'. There is no information about any failure in the log messages, though, and files seem to have migrated from the hot tier to the cold tier. There is no disruption to IOs from the fuse mount. This appears to be a cosmetic issue; however, we'll need to analyze and confirm that.

[root@dhcp37-101 glusterfs]# gluster v tier krk-vol detach status
Node             Rebalanced-files     size     scanned   failures   skipped       status
---------        ----------------   ------    --------   --------   -------   ----------
localhost                    4353   24.3GB       17437      12999         0    completed
10.70.37.202                10006   21.1GB       19550       9477         0    completed
10.70.37.195                    0   0Bytes           0          0         0    completed
10.70.35.44                   721    8.1MB       20984       9768         0    completed
10.70.35.231                    0   0Bytes           0          0         0    completed
10.70.35.176                    0   0Bytes      173332        555         0       failed
10.70.35.232                    0   0Bytes           0          0         0    completed
10.70.35.173                  256    4.2MB       19009       8819         0    completed
10.70.35.163                    0   0Bytes           0          0         0    completed
10.70.37.69                     0   0Bytes           0          0         0    completed
10.70.37.60                  6205   21.7GB       18453      12126         0    completed
10.70.37.120                    0   0Bytes           0          0         0    completed

<<snippet of 'gluster v status krk-vol'>>
Task Status of Volume krk-vol
------------------------------------------------------------------------------
Task                 : Detach tier
ID                   : e2857877-fc13-45ad-85cf-5e721b400212
Status               : failed

<<gluster v info>>
Volume Name: krk-vol
Type: Tier
Volume ID: 192655ce-4ef6-4ada-8e0c-6f137e2721e1
Status: Started
Number of Bricks: 36
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 6 x 2 = 12
Brick1: 10.70.37.101:/rhs/brick6/krkvol
Brick2: 10.70.37.69:/rhs/brick6/krkvol
Brick3: 10.70.37.60:/rhs/brick6/krkvol
Brick4: 10.70.37.120:/rhs/brick6/krkvol
Brick5: 10.70.37.202:/rhs/brick6/krkvol
Brick6: 10.70.37.195:/rhs/brick6/krkvol
Brick7: 10.70.35.44:/rhs/brick6/krkvol
Brick8: 10.70.35.231:/rhs/brick6/krkvol
Brick9: 10.70.35.176:/rhs/brick6/krkvol
Brick10: 10.70.35.232:/rhs/brick6/krkvol
Brick11: 10.70.35.173:/rhs/brick6/krkvol
Brick12: 10.70.35.163:/rhs/brick6/krkvol
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (8 + 4) = 24
Brick13: 10.70.35.176:/rhs/brick5/krkvol
Brick14: 10.70.35.232:/rhs/brick5/krkvol
Brick15: 10.70.35.173:/rhs/brick5/krkvol
Brick16: 10.70.35.163:/rhs/brick5/krkvol
Brick17: 10.70.37.101:/rhs/brick5/krkvol
Brick18: 10.70.37.69:/rhs/brick5/krkvol
Brick19: 10.70.37.60:/rhs/brick5/krkvol
Brick20: 10.70.37.120:/rhs/brick5/krkvol
Brick21: 10.70.37.202:/rhs/brick4/krkvol
Brick22: 10.70.37.195:/rhs/brick4/krkvol
Brick23: 10.70.35.155:/rhs/brick4/krkvol
Brick24: 10.70.35.222:/rhs/brick4/krkvol
Brick25: 10.70.35.108:/rhs/brick4/krkvol
Brick26: 10.70.35.44:/rhs/brick4/krkvol
Brick27: 10.70.35.89:/rhs/brick4/krkvol
Brick28: 10.70.35.231:/rhs/brick4/krkvol
Brick29: 10.70.35.176:/rhs/brick4/krkvol
Brick30: 10.70.35.232:/rhs/brick4/krkvol
Brick31: 10.70.35.173:/rhs/brick4/krkvol
Brick32: 10.70.35.163:/rhs/brick4/krkvol
Brick33: 10.70.37.101:/rhs/brick4/krkvol
Brick34: 10.70.37.69:/rhs/brick4/krkvol
Brick35: 10.70.37.60:/rhs/brick4/krkvol
Brick36: 10.70.37.120:/rhs/brick4/krkvol
Options Reconfigured:
diagnostics.client-log-level: INFO
cluster.tier-max-files: 10000
cluster.read-freq-threshold: 1
cluster.write-freq-threshold: 1
features.record-counters: on
cluster.tier-demote-frequency: 300
cluster.watermark-low: 15
cluster.watermark-hi: 70
cluster.min-free-disk: 20
performance.write-behind: off
performance.open-behind: off
performance.read-ahead: off
performance.io-cache: off
cluster.tier-mode: cache
features.ctr-enabled: on
features.quota-deem-statfs: off
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on

Version-Release number of selected component (if applicable):
glusterfs-server-3.7.5-17.el7rhgs.x86_64

How reproducible:
Yet to be determined

Steps to Reproduce:
While various IOs are in progress along with promotion/demotion happening on a tiered volume, detach the tier and wait for it to complete (a command-level sketch follows below).

Actual results:
Detach tier failed on one node.

Expected results:
Detach tier should succeed on all nodes.

Additional info:
sosreports shall be attached shortly.
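For reference, a minimal command-level sketch of the flow described above, using placeholder hostnames (server1..server12), brick paths, and volume name (testvol) rather than the actual layout from this report:

# create the cold tier (distributed-disperse), then attach a replicated hot tier
gluster volume create testvol disperse 12 redundancy 4 server{1..12}:/bricks/cold/testvol
gluster volume start testvol
gluster volume tier testvol attach replica 2 server{1..12}:/bricks/hot/testvol

# run mixed IO from a fuse mount so promotion/demotion cycles are active
mount -t glusterfs server1:/testvol /mnt/testvol

# detach the hot tier while IO is in progress, then poll until done
gluster volume tier testvol detach start
gluster volume tier testvol detach status    # repeat until all nodes show 'completed'
gluster volume tier testvol detach commit

Note that detach is a two-step operation: 'detach start' migrates data off the hot tier, and only 'detach commit' removes the hot-tier bricks, so the failed status is visible in the window between the two.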
sosreports are available here --> http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1303291/
Found the root cause: a stale file handle error in the tier log of the node on which the detach failed.

[2016-01-28 08:59:37.329788] E [MSGID: 109037] [tier.c:403:tier_migrate_using_query_file] 0-krk-vol-tier-dht: Error in parent lookup for 24be5704-b4c2-404e-8b79-d31720ca0ec1 [Stale file handle]
[2016-01-28 09:00:30.353680] I [MSGID: 109038] [tier.c:585:tier_migrate_using_query_file] 0-krk-vol-tier-dht: Reached cycle migration limit.migrated bytes 4194307600 files 40
[2016-01-28 09:00:30.354004] E [MSGID: 109037] [tier.c:1531:tier_start] 0-krk-vol-tier-dht: Demotion failed

The stale file handle error caused the detach to fail.

Patch on master: http://review.gluster.org/#/c/13501/
Patch on 3.7: http://review.gluster.org/#/c/13692/
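For anyone triaging a similar report, the same signature can be checked on the affected node with a grep such as the one below. The log path is an assumption (default /var/log/glusterfs location with <volname>-tier.log naming); on some builds the tier process logs to the rebalance log instead:

# look for the parent-lookup / stale file handle errors on the node that reported 'failed'
grep -E 'MSGID: 109037|Stale file handle' /var/log/glusterfs/krk-vol-tier.log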
*** This bug has been marked as a duplicate of bug 1296908 ***
Confused with another bug; will update the RCA soon.
Moving to MODIFIED. Patch available downstream as commit d9b5fef.
As tier is not being actively developed, I'm closing this bug. Feel free to reopen it if necessary.