Bug 1459603 - Brick multiplexing: Ending up with multiple brick processes and hence losing brick mux property
Status: CLOSED WONTFIX
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: core
Version: 3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Assigned To: Atin Mukherjee
QA Contact: Rahul Hinduja
brick-multiplexing
Depends On:
Blocks:
 
Reported: 2017-06-07 10:42 EDT by nchilaka
Modified: 2017-08-31 09:31 EDT
CC List: 4 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-06-08 01:18:26 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description nchilaka 2017-06-07 10:42:37 EDT
Description of problem:
===============
After killing a brick process, force-starting a non-base volume brings its brick up with a new PID.
If the base volume is then started, either via a glusterd restart or a force start, its brick comes up with yet another PID instead of attaching to the already running brick process,
hence losing the brick multiplexing property.

Version-Release number of selected component (if applicable):
======
3.8.4-27

How reproducible:
always

Steps to Reproduce:
1. Enable brick multiplexing and create three volumes v1, v2, v3, with v1 as the base volume.
2. Kill brick b1 of v1; since the bricks are multiplexed, b1 of all three volumes goes down.
3. Bring b1 back up on v3 using "gluster volume start v3 force".
4. Restart glusterd, or force-start v1, to bring b1 of v1 back up (a command sketch follows below).
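A rough command sketch of these steps, assuming one brick per node on three nodes n1, n2, n3 (host names and brick paths are placeholders; multiplexing is assumed to be enabled via the cluster.brick-multiplex option):

# 1. create v1, v2, v3 (v1 first, so it acts as the base volume), enable multiplexing, start them
for v in v1 v2 v3; do
    gluster volume create $v n1:/bricks/$v/b1 n2:/bricks/$v/b2 n3:/bricks/$v/b3 force
done
gluster volume set all cluster.brick-multiplex on
for v in v1 v2 v3; do gluster volume start $v; done
# 2. on n1: kill the multiplexed brick process, taking b1 of all three volumes down
kill -9 $(gluster volume status v1 | awk '/\/bricks\/v1\/b1/ {print $NF}')
# 3. bring b1 back up via v3
gluster volume start v3 force
# 4. restart glusterd (or force-start v1) to bring b1 of v1 back up
service glusterd restart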


Actual results:
=============
The brick process (PID) serving b1 of v1 and v2 is the same, but it is different from the one serving b1 of v3.

Expected results:
========
b1 of v1 should attach to the already running brick process, i.e. the one serving b1 of v3.

Additional info:
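A quick way to compare the brick PIDs (same placeholder names as in the sketch above):

# with multiplexing intact, the PID column for b1 should be identical across v1, v2 and v3;
# in the buggy case v1 and v2 share one PID while v3 shows another
gluster volume status | grep '/b1'
# equivalently, on n1 there should be a single glusterfsd brick process, not two
pgrep -x glusterfsd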
Comment 2 Atin Mukherjee 2017-06-08 01:18:26 EDT
This is only reproducible when glusterd is restarted; I couldn't reproduce it when v1 is force-started (tried multiple times).

Now coming to the RCA. Before glusterd was restarted, the brick of v3 was up whereas the bricks of v1 and v2 were down. When glusterd restarts, v3's brick status is updated to "started" only once glusterd receives the RPC_CLNT_CONNECT event for it. But since, on a glusterd restart, bricks are started in no-wait mode, i.e. multiple bricks are started in parallel, find_compat_brick_in_vol () runs before v3's brick status has been updated to the started state and therefore does not find any brick that is compatible to attach to, because glusterd does not yet know whether that brick has been started successfully. This is why v1 and v2 end up with the same brick PID whereas v3 has a different one.

This behaviour is as per the design and I think this is something we can live with for sure.
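To see the distinction described above, repeat steps 1-3 of the reproducer and then take one of the two branches below (a sketch; setup as in the description, not meant to be run in sequence):

# branch A: force-start v1 instead of restarting glusterd --
# b1 of v1 attaches to the brick process already running for v3
gluster volume start v1 force
gluster volume status | grep '/b1'

# branch B: restart glusterd --
# v1 and v2 come up in a second brick process because v3's brick is not yet
# marked started when find_compat_brick_in_vol () runs for them
service glusterd restart
gluster volume status | grep '/b1'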
Comment 3 nchilaka 2017-07-07 03:38:21 EDT
The same problem is seen in the case below (a command sketch follows at the end of this comment):
1) Create v1 with bricks b1, b2, b3.
2) Enable brick multiplexing.
3) Kill b1.
4) Create a new volume v2: its b2 and b3 get the same PIDs as v1's bricks, but its b1 gets a PID of its own (as no other brick process exists on that node).
5) Now restart glusterd (service glusterd restart) to bring v1 up.
6) b1 of v1 does not end up with the same PID as b1 of v2, hence losing the brick multiplexing facility.

Is that still fine? Can we live with this?
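Roughly, assuming the same one-brick-per-node layout (n1, n2, n3) as in the earlier sketch:

gluster volume create v1 n1:/bricks/v1/b1 n2:/bricks/v1/b2 n3:/bricks/v1/b3 force
gluster volume set all cluster.brick-multiplex on
gluster volume start v1
# on n1: kill the brick process hosting v1's b1
kill -9 $(gluster volume status v1 | awk '/\/bricks\/v1\/b1/ {print $NF}')
# v2's b2 and b3 multiplex into v1's running processes on n2/n3,
# but its b1 starts a fresh process on n1
gluster volume create v2 n1:/bricks/v2/b1 n2:/bricks/v2/b2 n3:/bricks/v2/b3 force
gluster volume start v2
# bring v1's b1 back up
service glusterd restart
# b1 of v1 now shows a different PID from b1 of v2
gluster volume status | grep '/b1'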
Comment 5 nchilaka 2017-07-07 09:50:44 EDT
Atin, Seeing this on 3.8.4-32 itself
Comment 6 Atin Mukherjee 2017-07-09 11:23:41 EDT
(In reply to nchilaka from comment #3)
> the same problem is seen in below case:
> 1)create v1 with b1,b2,b3
> 2)enable brick mux
> 3)kill b1
> 4) create a new vol v2 where b2 and b3 will have same pids as v1 but b2 will
> a pid which is for itself(as no other bricks exist on that node0
> 5) now service glusterd restart to bring v1 up
> 6)now b1 of v1 is not the same pid of b1 of v2, hence losing brick mux
> facility
> 
> Is that still fine , can we live with this?

The steps are not clear to me; you'd need to elaborate here. How did you kill the brick? Did you kill the entire process, or detach a particular brick with the gf_attach utility?

"4) create a new vol v2 where b2 and b3 will have same pids as v1 but b2 will  a pid which is for itself(as no other bricks exist on that node0" - I didn't understand this at all.
Comment 7 Atin Mukherjee 2017-07-09 11:33:46 EDT
OK, so I am able to reproduce one such scenario: a volume was created and started with its 3 bricks multiplexed, the brick process was brought down, and then a new volume was created and started. Now, on restarting glusterd, I see two brick processes, one for vol1 and the other for vol2. I'm looking into it, but this doesn't look like a blocker as there is no functionality loss as such. You'd need to file a different bug for this observation.
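For the record, a rough sketch of that reproducer on a single node (host name and brick paths are placeholders):

gluster volume create vol1 n1:/bricks/vol1/b1 n1:/bricks/vol1/b2 n1:/bricks/vol1/b3 force
gluster volume set all cluster.brick-multiplex on
gluster volume start vol1
pkill -x glusterfsd            # bring the single multiplexed brick process down
gluster volume create vol2 n1:/bricks/vol2/b1 n1:/bricks/vol2/b2 n1:/bricks/vol2/b3 force
gluster volume start vol2      # vol2 comes up in a fresh brick process
service glusterd restart       # vol1's bricks start in yet another process
pgrep -x glusterfsd            # two brick processes instead of one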
Comment 8 Atin Mukherjee 2017-07-09 11:37:07 EDT
(In reply to Atin Mukherjee from comment #7)
> OK, so I am able to reproduce one such scenario where a volume was created
> and started with 3 bricks being multiplexed and the process was brought down
> and a new volume was created and started. Now on restarting glusterd I see
> two brick processes, one for vol1 and the other for vol2. I'm looking into
> it, but this doesn't look like a blocker as there is no functionality loss
> as such. You'd need to file a different bug for this observation.

Alright, so the RCA is the same as in comment 2. No need for a different bug.
