Bug 1443843

Summary: Brick Multiplexing: resetting a brick brings down other bricks with the same PID
Product: Red Hat Gluster Storage [Red Hat Storage]
Reporter: Karan Sandha <ksandha>
Component: core
Assignee: Samikshan Bairagya <sbairagy>
Status: CLOSED ERRATA
QA Contact: Nag Pavan Chilakam <nchilaka>
Severity: high
Priority: unspecified
Version: rhgs-3.3
CC: amukherj, rhs-bugs, storage-qa-internal
Target Milestone: ---
Target Release: RHGS 3.3.0
Hardware: All
OS: Linux
Whiteboard: brick-multiplexing
Fixed In Version: glusterfs-3.8.4-26
Clones: 1446172 (view as bug list)
Last Closed: 2017-09-21 04:37:54 UTC
Type: Bug
Bug Depends On: 1446172, 1449933, 1449934
Bug Blocks: 1417151

Description Karan Sandha 2017-04-20 06:42:20 UTC
Description of problem:
Resetting a single brick brings down other bricks that share the same PID.

Version-Release number of selected component (if applicable):
3.8.4-22

How reproducible:
100% 

Steps to Reproduce:
1. Start a reset-brick operation on one brick of the volume:

[root@K1 ~]# gluster v reset-brick testvol 10.70.47.60:/bricks/brick0/b3 start
volume reset-brick: success: reset-brick start operation successful

2. Check the volume status; another brick served by the same process also shows offline:

[root@K1 b3]# gluster v status testvol
Status of volume: testvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.47.60:/bricks/brick0/b3         N/A       N/A        N       N/A  
Brick 10.70.46.218:/bricks/brick0/b2        49152     0          Y       374  
Brick 10.70.47.61:/bricks/brick0/b3         49152     0          Y       24892
Brick 10.70.47.60:/bricks/brick1/b3         N/A       N/A        N       N/A  
Brick 10.70.46.218:/bricks/brick1/b2        49152     0          Y       374  
Brick 10.70.47.61:/bricks/brick1/b3         49152     0          Y       24892
Brick 10.70.46.218:/bricks/brick2/b2        49152     0          Y       374  
Brick 10.70.47.61:/bricks/brick2/b3         49152     0          Y       24892
Brick 10.70.47.60:/bricks/brick2/b3         49153     0          Y       1629 
NFS Server on localhost                     2049      0          Y       1653 
Self-heal Daemon on localhost               N/A       N/A        Y       1662 
NFS Server on 10.70.46.218                  2049      0          Y       698  
Self-heal Daemon on 10.70.46.218            N/A       N/A        Y       707  
NFS Server on 10.70.47.61                   2049      0          Y       25123
Self-heal Daemon on 10.70.47.61             N/A       N/A        Y       25132
 
Task Status of Volume testvol
------------------------------------------------------------------------------
Task                 : Rebalance           
ID                   : e686e9ea-ad3d-4135-933d-2836075c16d7
Status               : completed           
 


Actual results:
Resetting one brick also brings down another brick hosted in the same process: in the status above, both /bricks/brick0/b3 and /bricks/brick1/b3 on 10.70.47.60 go offline, since they shared a PID.

Expected results:
The other bricks in the same process should be unaffected.

Additional info:
logs placed at : rhsqe-repo.lab.eng.blr.redhat.com:/var/www/html/sosreports/<bug>
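With brick multiplexing, several bricks on a node are served by a single glusterfsd process and therefore report the same PID in `gluster v status`. The shared-PID groups in the status output from step 2 can be made explicit by grouping the Brick rows on the PID column. A minimal sketch (the `bricks_by_pid` helper and the inlined status text are illustrative, not part of Gluster):

```python
from collections import defaultdict

# Brick rows copied from the `gluster v status testvol` output in step 2.
STATUS = """\
Brick 10.70.47.60:/bricks/brick0/b3         N/A       N/A        N       N/A
Brick 10.70.46.218:/bricks/brick0/b2        49152     0          Y       374
Brick 10.70.47.61:/bricks/brick0/b3         49152     0          Y       24892
Brick 10.70.47.60:/bricks/brick1/b3         N/A       N/A        N       N/A
Brick 10.70.46.218:/bricks/brick1/b2        49152     0          Y       374
Brick 10.70.47.61:/bricks/brick1/b3         49152     0          Y       24892
Brick 10.70.46.218:/bricks/brick2/b2        49152     0          Y       374
Brick 10.70.47.61:/bricks/brick2/b3         49152     0          Y       24892
Brick 10.70.47.60:/bricks/brick2/b3         49153     0          Y       1629
"""

def bricks_by_pid(status_text):
    """Group brick paths by the PID column of `gluster v status` output."""
    groups = defaultdict(list)
    for line in status_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "Brick":
            brick, pid = fields[1], fields[-1]
            groups[pid].append(brick)
    return dict(groups)

for pid, bricks in bricks_by_pid(STATUS).items():
    # e.g. PIDs 374 and 24892 each host three multiplexed bricks
    print(pid, "->", len(bricks), "brick(s)")
```

The two `N/A` rows are the brick being reset and the second brick that had been multiplexed into the same process on 10.70.47.60, which is the collateral failure this bug reports.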

Comment 4 Atin Mukherjee 2017-04-27 11:56:40 UTC
upstream patch : https://review.gluster.org/#/c/17128/

Comment 5 Atin Mukherjee 2017-05-15 08:30:05 UTC
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/106048/

Comment 7 Nag Pavan Chilakam 2017-06-10 08:20:53 UTC
OnQA validation:
Reset-brick no longer brings down any other brick using the same PID, so moving to Verified. Tested on 3.8.4-27 with the steps mentioned in the description.

Comment 9 errata-xmlrpc 2017-09-21 04:37:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774