Description of problem:
=======================
I had a 6x3 distributed-replicate volume on an SSL-enabled RHEL 8 setup and started a remove-brick operation on it. The remove-brick failed to start on the local host.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-6.0-30.el8rhgs.x86_64

How reproducible:
=================
1/1

Steps to Reproduce:
===================
1. Start with a 5x3 distributed-replicate volume, add bricks, and wait for the rebalance to complete.
2. On the resulting 6x3 volume, start a remove-brick operation.
3. The remove-brick fails to start on the local host.

Actual results:
===============
Remove-brick fails to start on the local host.

Expected results:
=================
Remove-brick should start on the local host.

Additional info:
================
[root@rhsqa3 glusterfs]# rpm -qa | grep gluster
glusterfs-6.0-30.el8rhgs.x86_64
glusterfs-api-6.0-30.el8rhgs.x86_64
glusterfs-server-6.0-30.el8rhgs.x86_64
glusterfs-debuginfo-6.0-30.el8rhgs.x86_64
glusterfs-libs-6.0-30.el8rhgs.x86_64
glusterfs-client-xlators-6.0-30.el8rhgs.x86_64
glusterfs-cli-6.0-30.el8rhgs.x86_64
python3-gluster-6.0-30.el8rhgs.x86_64
glusterfs-fuse-6.0-30.el8rhgs.x86_64
glusterfs-devel-6.0-30.el8rhgs.x86_64
glusterfs-rdma-6.0-30.el8rhgs.x86_64
glusterfs-events-6.0-30.el8rhgs.x86_64

[root@rhsqa3 glusterfs]# gluster v info
Volume Name: replica-vol
Type: Distributed-Replicate
Volume ID: 410abc32-e37d-4e47-81ad-789f508e2c25
Status: Started
Snapshot Count: 0
Number of Bricks: 6 x 3 = 18
Transport-type: tcp
Bricks:
Brick1: rhsqa1.lab.eng.blr.redhat.com:/bricks/brick0/rep1
Brick2: rhsqa2.lab.eng.blr.redhat.com:/bricks/brick0/rep1
Brick3: rhsqa3.lab.eng.blr.redhat.com:/bricks/brick0/rep1
Brick4: rhsqa1.lab.eng.blr.redhat.com:/bricks/brick1/rep2
Brick5: rhsqa2.lab.eng.blr.redhat.com:/bricks/brick1/rep2
Brick6: rhsqa4.lab.eng.blr.redhat.com:/bricks/brick1/rep2
Brick7: rhsqa4.lab.eng.blr.redhat.com:/bricks/brick2/rep3
Brick8: rhsqa2.lab.eng.blr.redhat.com:/bricks/brick2/rep3
Brick9: rhsqa3.lab.eng.blr.redhat.com:/bricks/brick2/rep3
Brick10: rhsqa1.lab.eng.blr.redhat.com:/bricks/brick3/rep4
Brick11: rhsqa4.lab.eng.blr.redhat.com:/bricks/brick3/rep4
Brick12: rhsqa3.lab.eng.blr.redhat.com:/bricks/brick3/rep4
Brick13: rhsqa1.lab.eng.blr.redhat.com:/bricks/brick4/rep5
Brick14: rhsqa2.lab.eng.blr.redhat.com:/bricks/brick4/rep5
Brick15: rhsqa3.lab.eng.blr.redhat.com:/bricks/brick4/rep5
Brick16: rhsqa1.lab.eng.blr.redhat.com:/bricks/brick5/rep6
Brick17: rhsqa3.lab.eng.blr.redhat.com:/bricks/brick5/rep6
Brick18: rhsqa4.lab.eng.blr.redhat.com:/bricks/brick5/rep6
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
auth.ssl-allow: rhsqa1.lab.eng.blr.redhat.com,rhsqa2.lab.eng.blr.redhat.com,rhsqa3.lab.eng.blr.redhat.com,rhsqa4.lab.eng.blr.redhat.com,rhsqa5.lab.eng.blr.redhat.com,rhsqa8.lab.eng.blr.redhat.com,rhs-client21.lab.eng.blr.redhat.com
client.ssl: on
server.ssl: on

[root@rhsqa3 glusterfs]# gluster v status
Status of volume: replica-vol
Gluster process                                            TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick rhsqa1.lab.eng.blr.redhat.com:/bricks/brick0/rep1    49152     0          Y       4925
Brick rhsqa2.lab.eng.blr.redhat.com:/bricks/brick0/rep1    49152     0          Y       1310
Brick rhsqa3.lab.eng.blr.redhat.com:/bricks/brick0/rep1    49152     0          Y       2608
Brick rhsqa1.lab.eng.blr.redhat.com:/bricks/brick1/rep2    49153     0          Y       4945
Brick rhsqa2.lab.eng.blr.redhat.com:/bricks/brick1/rep2    49153     0          Y       1330
Brick rhsqa4.lab.eng.blr.redhat.com:/bricks/brick1/rep2    49152     0          Y       6220
Brick rhsqa4.lab.eng.blr.redhat.com:/bricks/brick2/rep3    49153     0          Y       6240
Brick rhsqa2.lab.eng.blr.redhat.com:/bricks/brick2/rep3    49154     0          Y       1350
Brick rhsqa3.lab.eng.blr.redhat.com:/bricks/brick2/rep3    49153     0          Y       2614
Brick rhsqa1.lab.eng.blr.redhat.com:/bricks/brick3/rep4    49154     0          Y       4965
Brick rhsqa4.lab.eng.blr.redhat.com:/bricks/brick3/rep4    49154     0          Y       6260
Brick rhsqa3.lab.eng.blr.redhat.com:/bricks/brick3/rep4    49154     0          Y       2625
Brick rhsqa1.lab.eng.blr.redhat.com:/bricks/brick4/rep5    49155     0          Y       4985
Brick rhsqa2.lab.eng.blr.redhat.com:/bricks/brick4/rep5    49155     0          Y       1371
Brick rhsqa3.lab.eng.blr.redhat.com:/bricks/brick4/rep5    49155     0          Y       2626
Brick rhsqa1.lab.eng.blr.redhat.com:/bricks/brick5/rep6    49156     0          Y       24539
Brick rhsqa3.lab.eng.blr.redhat.com:/bricks/brick5/rep6    49156     0          Y       30262
Brick rhsqa4.lab.eng.blr.redhat.com:/bricks/brick5/rep6    49155     0          Y       14955
Self-heal Daemon on localhost                              N/A       N/A        Y       30289
Self-heal Daemon on rhsqa2.lab.eng.blr.redhat.com          N/A       N/A        Y       20172
Self-heal Daemon on rhsqa1.lab.eng.blr.redhat.com          N/A       N/A        Y       24561
Self-heal Daemon on rhsqa4.lab.eng.blr.redhat.com          N/A       N/A        Y       14976

Task Status of Volume replica-vol
------------------------------------------------------------------------------
Task           : Remove brick
ID             : c72a9d5e-0bd4-4c4f-87e5-7c3356e7879b
Removed bricks:
rhsqa4.lab.eng.blr.redhat.com:/bricks/brick2/rep3
rhsqa2.lab.eng.blr.redhat.com:/bricks/brick2/rep3
rhsqa3.lab.eng.blr.redhat.com:/bricks/brick2/rep3
rhsqa1.lab.eng.blr.redhat.com:/bricks/brick3/rep4
rhsqa4.lab.eng.blr.redhat.com:/bricks/brick3/rep4
rhsqa3.lab.eng.blr.redhat.com:/bricks/brick3/rep4
Status         : in progress
[root@rhsqa3 glusterfs]#

Remove-brick (rebalance) status
===============================
[root@rhsqa3 glusterfs]# gluster v remove-brick replica-vol rhsqa4.lab.eng.blr.redhat.com:/bricks/brick2/rep3 rhsqa2.lab.eng.blr.redhat.com:/bricks/brick2/rep3 rhsqa3.lab.eng.blr.redhat.com:/bricks/brick2/rep3 rhsqa1.lab.eng.blr.redhat.com:/bricks/brick3/rep4 rhsqa4.lab.eng.blr.redhat.com:/bricks/brick3/rep4 rhsqa3.lab.eng.blr.redhat.com:/bricks/brick3/rep4 status
Node                           Rebalanced-files  size     scanned  failures  skipped  status       run time in h:m:s
---------                      ----------------  -------  -------  --------  -------  -----------  -----------------
rhsqa1.lab.eng.blr.redhat.com  2002              159.7MB  15786    0         0        in progress  0:22:11
rhsqa2.lab.eng.blr.redhat.com  1449              23.4MB   9881     0         0        in progress  0:22:11
rhsqa4.lab.eng.blr.redhat.com  783               6.4MB    6401     0         0        in progress  0:22:11
localhost                      0                 0Bytes   0        1         0        failed       0:00:00
Estimated time left for rebalance to complete : 488:07:19
[root@rhsqa3 glusterfs]#

Server rebalance logs on local host
===================================
[2020-03-11 08:59:31.772117] I [MSGID: 109081] [dht-common.c:5872:dht_setxattr] 0-replica-vol-dht: fixing the layout of /
[2020-03-11 08:59:31.772150] W [MSGID: 109016] [dht-selfheal.c:1784:dht_fix_layout_of_directory] 0-replica-vol-dht: Layout fix failed: 1 subvolume(s) are down. Skipping fix layout. path:/ gfid:00000000-0000-0000-0000-000000000001
[2020-03-11 08:59:31.772176] E [MSGID: 109026] [dht-rebalance.c:4680:gf_defrag_start_crawl] 0-replica-vol-dht: fix layout on / failed [Transport endpoint is not connected]
[2020-03-11 08:59:31.772504] I [MSGID: 109028] [dht-rebalance.c:5059:gf_defrag_status_get] 0-replica-vol-dht: Rebalance is failed. Time taken is 0.00 secs
[2020-03-11 08:59:31.772521] I [MSGID: 109028] [dht-rebalance.c:5065:gf_defrag_status_get] 0-replica-vol-dht: Files migrated: 0, size: 0, lookups: 0, failures: 1, skipped: 0
[2020-03-11 08:59:31.773135] W [glusterfsd.c:1581:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x82de) [0x7f2250a4d2de] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xfd) [0x55a18049a86d] -->/usr/sbin/glusterfs(cleanup_and_exit+0x58) [0x55a18049a6b8] ) 0-: received signum (15), shutting down
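Reproduction command sketch
===========================
For reference, the reproduction steps can be sketched as the gluster CLI sequence below. The removed brick paths come from the task status above; the assumption that /bricks/brick5/rep6 was the replica set added to grow the volume from 5x3 to 6x3 is mine (it is the last set in the brick list), and this sequence was not re-run against the affected setup.

```shell
# Assumption: /bricks/brick5/rep6 is the replica set that was added
# to grow the 5x3 volume to 6x3.
gluster volume add-brick replica-vol \
    rhsqa1.lab.eng.blr.redhat.com:/bricks/brick5/rep6 \
    rhsqa3.lab.eng.blr.redhat.com:/bricks/brick5/rep6 \
    rhsqa4.lab.eng.blr.redhat.com:/bricks/brick5/rep6

# Rebalance after the add-brick and wait for completion.
gluster volume rebalance replica-vol start
gluster volume rebalance replica-vol status

# Remove two replica sets; this is the step that failed to start on
# the local host. The status subcommand takes the same brick list.
gluster volume remove-brick replica-vol \
    rhsqa4.lab.eng.blr.redhat.com:/bricks/brick2/rep3 \
    rhsqa2.lab.eng.blr.redhat.com:/bricks/brick2/rep3 \
    rhsqa3.lab.eng.blr.redhat.com:/bricks/brick2/rep3 \
    rhsqa1.lab.eng.blr.redhat.com:/bricks/brick3/rep4 \
    rhsqa4.lab.eng.blr.redhat.com:/bricks/brick3/rep4 \
    rhsqa3.lab.eng.blr.redhat.com:/bricks/brick3/rep4 \
    start
```

This is an operational sketch against a live trusted storage pool, so it is not runnable standalone.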