Bug 1812402 - [RHEL-8.1]-Remove-brick fails to start on local host on a SSL enabled setup
Summary: [RHEL-8.1]-Remove-brick fails to start on local host on a SSL enabled setup
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: distribute
Version: rhgs-3.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Susant Kumar Palai
QA Contact: Prasad Desala
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-03-11 09:23 UTC by Upasana
Modified: 2020-05-08 16:37 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-18 05:46:55 UTC
Target Upstream Version:



Description Upasana 2020-03-11 09:23:21 UTC
Description of problem:
=======================

I had a 6x3 distributed-replicate volume on an SSL-enabled setup on RHEL 8 and started remove-brick on it. The remove-brick operation failed to start on the local host.


Version-Release number of selected component (if applicable):
=============================================================
glusterfs-6.0-30.el8rhgs.x86_64


How reproducible:
===================
1/1


Steps to Reproduce:
===================
1. Had a 5x3 replica volume, added bricks, and waited for rebalance to complete.
2. On the resulting 6x3 replica volume, started remove-brick; it fails to start on the local host.
3. Remove-brick should start successfully.

Actual results:
===============
Remove brick fails to start on local host


Expected results:
=================
Remove brick should start on local host


Additional info:
===============
[root@rhsqa3 glusterfs]# rpm -qa|grep gluster
glusterfs-6.0-30.el8rhgs.x86_64
glusterfs-api-6.0-30.el8rhgs.x86_64
glusterfs-server-6.0-30.el8rhgs.x86_64
glusterfs-debuginfo-6.0-30.el8rhgs.x86_64
glusterfs-libs-6.0-30.el8rhgs.x86_64
glusterfs-client-xlators-6.0-30.el8rhgs.x86_64
glusterfs-cli-6.0-30.el8rhgs.x86_64
python3-gluster-6.0-30.el8rhgs.x86_64
glusterfs-fuse-6.0-30.el8rhgs.x86_64
glusterfs-devel-6.0-30.el8rhgs.x86_64
glusterfs-rdma-6.0-30.el8rhgs.x86_64
glusterfs-events-6.0-30.el8rhgs.x86_64



[root@rhsqa3 glusterfs]# gluster v info
 
Volume Name: replica-vol
Type: Distributed-Replicate
Volume ID: 410abc32-e37d-4e47-81ad-789f508e2c25
Status: Started
Snapshot Count: 0
Number of Bricks: 6 x 3 = 18
Transport-type: tcp
Bricks:
Brick1: rhsqa1.lab.eng.blr.redhat.com:/bricks/brick0/rep1
Brick2: rhsqa2.lab.eng.blr.redhat.com:/bricks/brick0/rep1
Brick3: rhsqa3.lab.eng.blr.redhat.com:/bricks/brick0/rep1
Brick4: rhsqa1.lab.eng.blr.redhat.com:/bricks/brick1/rep2
Brick5: rhsqa2.lab.eng.blr.redhat.com:/bricks/brick1/rep2
Brick6: rhsqa4.lab.eng.blr.redhat.com:/bricks/brick1/rep2
Brick7: rhsqa4.lab.eng.blr.redhat.com:/bricks/brick2/rep3
Brick8: rhsqa2.lab.eng.blr.redhat.com:/bricks/brick2/rep3
Brick9: rhsqa3.lab.eng.blr.redhat.com:/bricks/brick2/rep3
Brick10: rhsqa1.lab.eng.blr.redhat.com:/bricks/brick3/rep4
Brick11: rhsqa4.lab.eng.blr.redhat.com:/bricks/brick3/rep4
Brick12: rhsqa3.lab.eng.blr.redhat.com:/bricks/brick3/rep4
Brick13: rhsqa1.lab.eng.blr.redhat.com:/bricks/brick4/rep5
Brick14: rhsqa2.lab.eng.blr.redhat.com:/bricks/brick4/rep5
Brick15: rhsqa3.lab.eng.blr.redhat.com:/bricks/brick4/rep5
Brick16: rhsqa1.lab.eng.blr.redhat.com:/bricks/brick5/rep6
Brick17: rhsqa3.lab.eng.blr.redhat.com:/bricks/brick5/rep6
Brick18: rhsqa4.lab.eng.blr.redhat.com:/bricks/brick5/rep6
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
auth.ssl-allow: rhsqa1.lab.eng.blr.redhat.com,rhsqa2.lab.eng.blr.redhat.com,rhsqa3.lab.eng.blr.redhat.com,rhsqa4.lab.eng.blr.redhat.com,rhsqa5.lab.eng.blr.redhat.com,rhsqa8.lab.eng.blr.redhat.com,rhs-client21.lab.eng.blr.redhat.com
client.ssl: on
server.ssl: on



[root@rhsqa3 glusterfs]# gluster v status
Status of volume: replica-vol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick rhsqa1.lab.eng.blr.redhat.com:/bricks
/brick0/rep1                                49152     0          Y       4925 
Brick rhsqa2.lab.eng.blr.redhat.com:/bricks
/brick0/rep1                                49152     0          Y       1310 
Brick rhsqa3.lab.eng.blr.redhat.com:/bricks
/brick0/rep1                                49152     0          Y       2608 
Brick rhsqa1.lab.eng.blr.redhat.com:/bricks
/brick1/rep2                                49153     0          Y       4945 
Brick rhsqa2.lab.eng.blr.redhat.com:/bricks
/brick1/rep2                                49153     0          Y       1330 
Brick rhsqa4.lab.eng.blr.redhat.com:/bricks
/brick1/rep2                                49152     0          Y       6220 
Brick rhsqa4.lab.eng.blr.redhat.com:/bricks
/brick2/rep3                                49153     0          Y       6240 
Brick rhsqa2.lab.eng.blr.redhat.com:/bricks
/brick2/rep3                                49154     0          Y       1350 
Brick rhsqa3.lab.eng.blr.redhat.com:/bricks
/brick2/rep3                                49153     0          Y       2614 
Brick rhsqa1.lab.eng.blr.redhat.com:/bricks
/brick3/rep4                                49154     0          Y       4965 
Brick rhsqa4.lab.eng.blr.redhat.com:/bricks
/brick3/rep4                                49154     0          Y       6260 
Brick rhsqa3.lab.eng.blr.redhat.com:/bricks
/brick3/rep4                                49154     0          Y       2625 
Brick rhsqa1.lab.eng.blr.redhat.com:/bricks
/brick4/rep5                                49155     0          Y       4985 
Brick rhsqa2.lab.eng.blr.redhat.com:/bricks
/brick4/rep5                                49155     0          Y       1371 
Brick rhsqa3.lab.eng.blr.redhat.com:/bricks
/brick4/rep5                                49155     0          Y       2626 
Brick rhsqa1.lab.eng.blr.redhat.com:/bricks
/brick5/rep6                                49156     0          Y       24539
Brick rhsqa3.lab.eng.blr.redhat.com:/bricks
/brick5/rep6                                49156     0          Y       30262
Brick rhsqa4.lab.eng.blr.redhat.com:/bricks
/brick5/rep6                                49155     0          Y       14955
Self-heal Daemon on localhost               N/A       N/A        Y       30289
Self-heal Daemon on rhsqa2.lab.eng.blr.redh
at.com                                      N/A       N/A        Y       20172
Self-heal Daemon on rhsqa1.lab.eng.blr.redh
at.com                                      N/A       N/A        Y       24561
Self-heal Daemon on rhsqa4.lab.eng.blr.redh
at.com                                      N/A       N/A        Y       14976
 
Task Status of Volume replica-vol
------------------------------------------------------------------------------
Task                 : Remove brick        
ID                   : c72a9d5e-0bd4-4c4f-87e5-7c3356e7879b
Removed bricks:     
rhsqa4.lab.eng.blr.redhat.com:/bricks/brick2/rep3
rhsqa2.lab.eng.blr.redhat.com:/bricks/brick2/rep3
rhsqa3.lab.eng.blr.redhat.com:/bricks/brick2/rep3
rhsqa1.lab.eng.blr.redhat.com:/bricks/brick3/rep4
rhsqa4.lab.eng.blr.redhat.com:/bricks/brick3/rep4
rhsqa3.lab.eng.blr.redhat.com:/bricks/brick3/rep4
Status               : in progress         
 
[root@rhsqa3 glusterfs]# 



Remove-brick status
===================

[root@rhsqa3 glusterfs]# gluster v remove-brick replica-vol rhsqa4.lab.eng.blr.redhat.com:/bricks/brick2/rep3 rhsqa2.lab.eng.blr.redhat.com:/bricks/brick2/rep3 rhsqa3.lab.eng.blr.redhat.com:/bricks/brick2/rep3 rhsqa1.lab.eng.blr.redhat.com:/bricks/brick3/rep4 rhsqa4.lab.eng.blr.redhat.com:/bricks/brick3/rep4 rhsqa3.lab.eng.blr.redhat.com:/bricks/brick3/rep4 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
           rhsqa1.lab.eng.blr.redhat.com             2002       159.7MB         15786             0             0          in progress        0:22:11
           rhsqa2.lab.eng.blr.redhat.com             1449        23.4MB          9881             0             0          in progress        0:22:11
           rhsqa4.lab.eng.blr.redhat.com              783         6.4MB          6401             0             0          in progress        0:22:11
                               localhost                0        0Bytes             0             1             0               failed        0:00:00
Estimated time left for rebalance to complete :      488:07:19
[root@rhsqa3 glusterfs]# 
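
To pick out the node where remove-brick failed without reading the table by eye, the status output above can be parsed with a small script. This is only a sketch: the column layout is assumed from the output pasted in this report, and the names `ROW` and `nodes_by_status` are made up for illustration, not part of any gluster tooling.

```python
import re

# One data row of the status table as it appears in this report:
# node, rebalanced files, size, scanned, failures, skipped,
# status (may contain a space, e.g. "in progress"), run time.
# Assumption: the layout matches the output pasted above.
ROW = re.compile(
    r"^\s*(?P<node>\S+)\s+(?P<files>\d+)\s+(?P<size>\S+)\s+(?P<scanned>\d+)"
    r"\s+(?P<failures>\d+)\s+(?P<skipped>\d+)\s+(?P<status>.+?)"
    r"\s+(?P<runtime>\d+:\d+:\d+)\s*$"
)


def nodes_by_status(status_output):
    """Map each status string to the list of nodes reporting it."""
    result = {}
    for line in status_output.splitlines():
        m = ROW.match(line)
        if m:  # header, separator and "Estimated time" lines do not match
            result.setdefault(m.group("status"), []).append(m.group("node"))
    return result


# Rows copied from the status output in this report.
SAMPLE = """\
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
           rhsqa1.lab.eng.blr.redhat.com             2002       159.7MB         15786             0             0          in progress        0:22:11
           rhsqa2.lab.eng.blr.redhat.com             1449        23.4MB          9881             0             0          in progress        0:22:11
           rhsqa4.lab.eng.blr.redhat.com              783         6.4MB          6401             0             0          in progress        0:22:11
                               localhost                0        0Bytes             0             1             0               failed        0:00:00
Estimated time left for rebalance to complete :      488:07:19
"""

print(nodes_by_status(SAMPLE).get("failed", []))
```

For the output in this report, the only node in the "failed" group is localhost (rhsqa3); the other three nodes are "in progress".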


Server rebalance logs on local host
====================================
[2020-03-11 08:59:31.772117] I [MSGID: 109081] [dht-common.c:5872:dht_setxattr] 0-replica-vol-dht: fixing the layout of /
[2020-03-11 08:59:31.772150] W [MSGID: 109016] [dht-selfheal.c:1784:dht_fix_layout_of_directory] 0-replica-vol-dht: Layout fix failed: 1 subvolume(s) are down. Skipping fix layout. path:/ gfid:00000000-0000-0000-0000-000000000001
[2020-03-11 08:59:31.772176] E [MSGID: 109026] [dht-rebalance.c:4680:gf_defrag_start_crawl] 0-replica-vol-dht: fix layout on / failed [Transport endpoint is not connected]
[2020-03-11 08:59:31.772504] I [MSGID: 109028] [dht-rebalance.c:5059:gf_defrag_status_get] 0-replica-vol-dht: Rebalance is failed. Time taken is 0.00 secs
[2020-03-11 08:59:31.772521] I [MSGID: 109028] [dht-rebalance.c:5065:gf_defrag_status_get] 0-replica-vol-dht: Files migrated: 0, size: 0, lookups: 0, failures: 1, skipped: 0
[2020-03-11 08:59:31.773135] W [glusterfsd.c:1581:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x82de) [0x7f2250a4d2de] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xfd) [0x55a18049a86d] -->/usr/sbin/glusterfs(cleanup_and_exit+0x58) [0x55a18049a6b8] ) 0-: received signum (15), shutting down
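
The relevant entries in the rebalance log above are the W and E lines (layout fix skipped because one subvolume is down, then the crawl failing with "Transport endpoint is not connected"). A minimal sketch for filtering them out of a longer log follows. The line format is inferred from the entries in this report rather than from a GlusterFS specification, and `LOG_LINE` and `warnings_and_errors` are hypothetical names.

```python
import re

# Log line shape as seen in this report:
#   [timestamp] LEVEL [MSGID: n] [file:line:function] rest-of-message
# The MSGID field is optional (the cleanup_and_exit line has none).
# Assumption: other gluster log lines follow the same shape.
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"(?:\[MSGID:\s*(?P<msgid>\d+)\]\s+)?"
    r"\[(?P<source>[^\]]+)\]\s+(?P<message>.*)$"
)


def warnings_and_errors(log_text):
    """Return (level, msgid, source, message) for every W or E line."""
    hits = []
    for line in log_text.splitlines():
        m = LOG_LINE.match(line.strip())
        if m and m.group("level") in ("W", "E"):
            hits.append(
                (m.group("level"), m.group("msgid"),
                 m.group("source"), m.group("message"))
            )
    return hits


# Three lines copied from the rebalance log in this report.
SAMPLE_LOG = """\
[2020-03-11 08:59:31.772117] I [MSGID: 109081] [dht-common.c:5872:dht_setxattr] 0-replica-vol-dht: fixing the layout of /
[2020-03-11 08:59:31.772150] W [MSGID: 109016] [dht-selfheal.c:1784:dht_fix_layout_of_directory] 0-replica-vol-dht: Layout fix failed: 1 subvolume(s) are down. Skipping fix layout. path:/ gfid:00000000-0000-0000-0000-000000000001
[2020-03-11 08:59:31.772176] E [MSGID: 109026] [dht-rebalance.c:4680:gf_defrag_start_crawl] 0-replica-vol-dht: fix layout on / failed [Transport endpoint is not connected]
"""

for level, msgid, source, message in warnings_and_errors(SAMPLE_LOG):
    print(level, msgid, source, message)
```

Run against the three sample lines, this keeps the MSGID 109016 warning and the MSGID 109026 error and drops the informational line.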

