Bug 1275602 - [Tier]: Ambiguous Error log for a file which is demoted from one of the replica child
Summary: [Tier]: Ambiguous Error log for a file which is demoted from one of the replica child
Keywords:
Status: CLOSED DUPLICATE of bug 1275158
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: tier
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Bug Updates Notification Mailing List
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-10-27 10:12 UTC by Rahul Hinduja
Modified: 2016-09-17 15:34 UTC
CC: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-10-31 14:24:27 UTC
Embargoed:



Description Rahul Hinduja 2015-10-27 10:12:45 UTC
Description of problem:
=======================

In a replica setup, a file gets demoted from any one of the replica children, but the other replica child then logs a "demotion failure" error. For example:

One replica child continuously logs the error:

[root@dhcp37-165 ~]# tailf /var/log/glusterfs/tiervolume-tier.log | grep " E " 
[2015-10-26 19:00:39.107354] E [MSGID: 109037] [tier.c:1446:tier_start] 0-tiervolume-tier-dht: Demotion failed
[2015-10-26 19:02:16.668893] E [MSGID: 109037] [tier.c:1446:tier_start] 0-tiervolume-tier-dht: Demotion failed
[2015-10-26 19:04:44.606759] E [MSGID: 109037] [tier.c:1446:tier_start] 0-tiervolume-tier-dht: Demotion failed
[2015-10-26 19:06:24.076481] E [MSGID: 109037] [tier.c:1446:tier_start] 0-tiervolume-tier-dht: Demotion failed
[2015-10-26 19:08:41.747167] E [MSGID: 109037] [tier.c:1446:tier_start] 0-tiervolume-tier-dht: Demotion failed


Whereas its replica partner does not log any error:

[root@dhcp37-133 glusterfs]# tailf /var/log/glusterfs/tiervolume-tier.log | grep " E " 



This shows that the files are demoted by the other replica child; on this node, tier_check_same_node reports that the files do not belong to it, and the demotion run still ends with the "Demotion failed" error. Log snippet:

[2015-10-26 21:58:17.760582] I [MSGID: 109038] [tier.c:476:tier_migrate_using_query_file] 0-tiervolume-tier-dht: Tier 0 src_subvol tiervolume-hot-dht file 562dceb6%%YU616R45A1
[2015-10-26 21:58:17.761275] I [MSGID: 109038] [tier.c:109:tier_check_same_node] 0-tiervolume-tier-dht: /thread1/level029/level129/level229/level329/level429/level529/level629/level729/level829/562dceb6%%YU616R45A1 does not belong to this node
[2015-10-26 21:58:17.766512] I [MSGID: 109038] [tier.c:476:tier_migrate_using_query_file] 0-tiervolume-tier-dht: Tier 0 src_subvol tiervolume-hot-dht file 562dcf01%%RGSBRPPT8J
[2015-10-26 21:58:17.766978] I [MSGID: 109038] [tier.c:109:tier_check_same_node] 0-tiervolume-tier-dht: /thread0/level029/level129/level229/level329/level429/level529/level629/level729/562dcf01%%RGSBRPPT8J does not belong to this node
[2015-10-26 21:58:17.771192] I [MSGID: 109038] [tier.c:476:tier_migrate_using_query_file] 0-tiervolume-tier-dht: Tier 0 src_subvol tiervolume-hot-dht file 562dcf2a%%YSJC8E9V56
[2015-10-26 21:58:17.771790] I [MSGID: 109038] [tier.c:109:tier_check_same_node] 0-tiervolume-tier-dht: /thread0/level029/level129/level229/level329/level429/level529/level629/level729/level829/562dcf2a%%YSJC8E9V56 does not belong to this node
[2015-10-26 21:58:17.775665] I [MSGID: 109038] [tier.c:476:tier_migrate_using_query_file] 0-tiervolume-tier-dht: Tier 0 src_subvol tiervolume-hot-dht file 562dcf3d%%A6SVP21L38
[2015-10-26 21:58:17.776166] I [MSGID: 109038] [tier.c:109:tier_check_same_node] 0-tiervolume-tier-dht: /thread0/level029/level129/level229/level329/level429/level529/level629/level729/level829/level929/562dcf3d%%A6SVP21L38 does not belong to this node
[2015-10-26 21:58:17.780003] I [MSGID: 109038] [tier.c:476:tier_migrate_using_query_file] 0-tiervolume-tier-dht: Tier 0 src_subvol tiervolume-hot-dht file 562dcf47%%0GK5K1ZFUE
[2015-10-26 21:58:17.780789] I [MSGID: 109038] [tier.c:109:tier_check_same_node] 0-tiervolume-tier-dht: /thread0/level029/level129/level229/level329/level429/level529/level629/level729/level829/level929/562dcf47%%0GK5K1ZFUE does not belong to this node
[2015-10-26 21:58:17.785054] I [MSGID: 109038] [tier.c:476:tier_migrate_using_query_file] 0-tiervolume-tier-dht: Tier 0 src_subvol tiervolume-hot-dht file 562dcf4b%%YFGOB9WO1X
[2015-10-26 21:58:17.785732] I [MSGID: 109038] [tier.c:109:tier_check_same_node] 0-tiervolume-tier-dht: /thread0/level029/level129/level229/level329/level429/level529/level629/level729/level829/level929/562dcf4b%%YFGOB9WO1X does not belong to this node
[2015-10-26 21:58:17.789311] I [MSGID: 109038] [tier.c:476:tier_migrate_using_query_file] 0-tiervolume-tier-dht: Tier 0 src_subvol tiervolume-hot-dht file rsh.81
[2015-10-26 21:58:17.789904] I [MSGID: 109038] [tier.c:109:tier_check_same_node] 0-tiervolume-tier-dht: /rsh.81 does not belong to this node
[2015-10-26 21:58:17.795228] I [MSGID: 109038] [tier.c:476:tier_migrate_using_query_file] 0-tiervolume-tier-dht: Tier 0 src_subvol tiervolume-hot-dht file rsh.112
[2015-10-26 21:58:17.796431] I [MSGID: 109038] [tier.c:109:tier_check_same_node] 0-tiervolume-tier-dht: /rsh.112 does not belong to this node
[2015-10-26 21:58:17.800886] I [MSGID: 109038] [tier.c:476:tier_migrate_using_query_file] 0-tiervolume-tier-dht: Tier 0 src_subvol tiervolume-hot-dht file cert8.db
[2015-10-26 21:58:17.801465] I [MSGID: 109038] [tier.c:109:tier_check_same_node] 0-tiervolume-tier-dht: /abc.12/openldap/certs/cert8.db does not belong to this node
[2015-10-26 21:58:17.806664] I [MSGID: 109038] [tier.c:476:tier_migrate_using_query_file] 0-tiervolume-tier-dht: Tier 0 src_subvol tiervolume-hot-dht file secmod.db
[2015-10-26 21:58:17.807103] I [MSGID: 109038] [tier.c:109:tier_check_same_node] 0-tiervolume-tier-dht: /etc.8/openldap/certs/secmod.db does not belong to this node
[2015-10-26 21:58:17.807637] E [MSGID: 109037] [tier.c:1446:tier_start] 0-tiervolume-tier-dht: Demotion failed



Rebalance status reports a large number of failures:
=====================================================

[root@dhcp37-165 ~]# gluster volume rebalance tiervolume status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost              737        0Bytes      64224770        746596             0          in progress          531207.00
                            10.70.37.133             7511        0Bytes      58431687          2637             0          in progress          531207.00
                            10.70.37.160                2        0Bytes      10559684          5057             0          in progress          182423.00
                            10.70.37.158             7679        0Bytes      57794593          2494             0          in progress          531207.00
                            10.70.37.110              194        0Bytes      64354818        791001             0          in progress          531206.00
                            10.70.37.155             8432        0Bytes      58207442          2501             0          in progress          531207.00
                             10.70.37.99              738        0Bytes      64024039        773551             0          in progress          531207.00
                             10.70.37.88             9869        0Bytes      57896871          2577             0          in progress          531203.00
                            10.70.37.112              217        0Bytes      64384696        740388             0          in progress          531204.00
                            10.70.37.199             8545        0Bytes      58615684          2564             0          in progress          531203.00
                            10.70.37.162              216        0Bytes      64242075        743772             0          in progress          531203.00
                             10.70.37.87            11048        0Bytes      58441456          2523             0          in progress          531204.00
volume rebalance: tiervolume: success: 
[root@dhcp37-165 ~]# 


[root@dhcp37-165 ~]# gluster volume tier tiervolume status
Node                 Promoted files       Demoted files        Status              
---------            ---------            ---------            ---------           
localhost            547                  0                    in progress         
10.70.37.133         0                    7507                 in progress         
10.70.37.160         0                    0                    in progress         
10.70.37.158         0                    7671                 in progress         
10.70.37.110         0                    0                    in progress         
10.70.37.155         0                    8438                 in progress         
10.70.37.99          548                  0                    in progress         
10.70.37.88          0                    9874                 in progress         
10.70.37.112         0                    0                    in progress         
10.70.37.199         0                    8552                 in progress         
10.70.37.162         0                    0                    in progress         
10.70.37.87          0                    11065                in progress         
volume rebalance: tiervolume: success: 
[root@dhcp37-165 ~]# 

For a user this error is confusing: it suggests that the demotion failed, whereas in reality the demotion succeeded from one of the replica children and only the attempt from the other child failed.
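
The mismatch can be confirmed per file. A minimal check, assuming passwordless ssh between the nodes (the hostnames and the file name below are taken from the logs above; adjust to the actual setup):

# Pick one of the files named in the tier log above.
f='562dceb6%%YU616R45A1'

# Compare what each hot-tier replica child logged for this file. The child
# that reports "does not belong to this node" is the one whose run ends with
# "Demotion failed"; its replica partner is the one that actually migrates
# the file to the cold tier.
for h in dhcp37-165 dhcp37-133; do
    echo "== $h =="
    ssh "$h" "grep -F '$f' /var/log/glusterfs/tiervolume-tier.log"
done

The per-node "Demoted files" counters in the tier status output above tell the same story: the errors come from nodes whose replica partner did the demotion.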


Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.7.5-0.3.el7rhgs.x86_64

How reproducible:
=================

Always


Steps to Reproduce:
===================
1. Cold tier: EC (dispersed), Hot tier: distributed-replicate
2. Continuous file creation and lookups from the mount
3. Monitor the tier logs for demotion errors (an illustrative setup is sketched below)
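
For reference, a minimal sketch of such a setup. Hostnames (server1..server6), brick paths, and the tunable value are illustrative placeholders, not the exact configuration used above:

# 2 x (4+2) dispersed (EC) cold tier across six nodes
gluster volume create tiervolume disperse 6 redundancy 2 \
    server{1..6}:/bricks/cold/b1 server{1..6}:/bricks/cold/b2
gluster volume start tiervolume

# 2 x 2 distributed-replicate hot tier
gluster volume attach-tier tiervolume replica 2 \
    server1:/bricks/hot/b1 server2:/bricks/hot/b1 \
    server3:/bricks/hot/b1 server4:/bricks/hot/b1

# Shorten the demote interval so demotions start sooner (value is arbitrary)
gluster volume set tiervolume cluster.tier-demote-frequency 120

# Continuous creates and lookups from a client mount
mount -t glusterfs server1:/tiervolume /mnt/tiervolume
while true; do
    f=/mnt/tiervolume/file.$RANDOM
    dd if=/dev/urandom of="$f" bs=1M count=1 2>/dev/null
    stat "$f" > /dev/null
done &

# Watch the tier log on each hot-tier replica child for demotion errors
tailf /var/log/glusterfs/tiervolume-tier.log | grep " E "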

Comment 3 Dan Lambright 2015-10-31 14:24:27 UTC
These log messages were not accurate and were removed in the fix associated with bug 1275158.

*** This bug has been marked as a duplicate of bug 1275158 ***

