Bug 2015092

Summary: ./tests/bugs/core/bug-1432542-mpx-restart-crash.t is getting crashed
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Mohit Agrawal <moagrawa>
Component: core Assignee: Mohit Agrawal <moagrawa>
Status: CLOSED ERRATA QA Contact: Sayalee <saraut>
Severity: urgent Docs Contact:
Priority: urgent    
Version: rhgs-3.5 CC: aramteke, nyancey, rhs-bugs, sajmoham, sasundar, sheggodu, tshacked
Target Milestone: --- Keywords: CodeChange, ZStream
Target Release: RHGS 3.5.z Batch Update 7   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-6.0-62 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-05-31 12:37:31 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Mohit Agrawal 2021-10-18 11:39:34 UTC
The test case ./tests/bugs/core/bug-1432542-mpx-restart-crash.t crashes when it runs as part of the periodic regression test suite.
For details, see the console log (https://gluster-downstream-jenkins-csb-storage.apps.ocp4.prod.psi.redhat.com/job/periodic-regression-RHGS-3.5.6-on-RHEL7/20/consoleFull)

Reproducible: Run the test case in a loop to reproduce the issue

RCA: The test case ./tests/bugs/core/bug-1432542-mpx-restart-crash.t crashes while a brick is being detached. The brick process crashes because of a race between sending a disconnect on the rpc associated with the victim brick and handling GF_EVENT_CLEANUP for that same brick.

Solution: Save victim_name in a local variable to avoid the crash.
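
The idea behind this solution can be illustrated with a minimal, single-threaded sketch. It is not the glusterfs code: struct brick, brick_new, brick_cleanup, send_disconnect and detach_brick are hypothetical names invented for illustration, and the cleanup is called inline to stand in for the worst-case ordering of the race (cleanup finishing before the detach path reads the name). Under those assumptions, copying the name into a local buffer first means the detach path never touches the freed object.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a brick object; NOT the glusterfs xlator_t. */
struct brick {
    char *name;
};

/* Hypothetical constructor used only by this sketch. */
static struct brick *brick_new(const char *name)
{
    struct brick *b = malloc(sizeof(*b));
    assert(b != NULL);
    b->name = malloc(strlen(name) + 1);
    assert(b->name != NULL);
    strcpy(b->name, name);
    return b;
}

/* Stand-in for the cleanup path: releases the brick and its name. */
static void brick_cleanup(struct brick **b)
{
    free((*b)->name);
    free(*b);
    *b = NULL;
}

/*
 * Stand-in for the disconnect. In the real race the cleanup may run on
 * another thread at any point after the disconnect is sent; here it runs
 * inline so the worst-case ordering is deterministic.
 */
static void send_disconnect(struct brick **b)
{
    brick_cleanup(b);
}

/* Detach path: copy the name to a local buffer BEFORE the disconnect. */
static int detach_brick(struct brick **victim, char *msg, size_t msglen)
{
    char victim_name[256];

    assert(victim != NULL && *victim != NULL);
    snprintf(victim_name, sizeof(victim_name), "%s", (*victim)->name);

    send_disconnect(victim);

    /* From here on only the local copy is used; *victim is gone. */
    snprintf(msg, msglen, "detached brick %s", victim_name);
    return 0;
}
```

A caller would create a brick with brick_new, pass its address to detach_brick together with a message buffer, and then find the brick pointer set to NULL while the message still carries the brick name. Reading (*victim)->name after send_disconnect instead would be a use-after-free in this sketch, which is the kind of access the local copy avoids.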

The issue is already fixed upstream (https://github.com/gluster/glusterfs/pull/1979)

Comment 1 SATHEESARAN 2021-10-20 12:31:54 UTC
Hi Mohit, 

I took a look at the test .t - https://github.com/gluster/glusterfs/blob/devel/tests/bugs/core/bug-1432542-mpx-restart-crash.t

The test steps look like this:

1. Turn on the brick-multiplex feature
2. Create a volume (2x3)
3. Mount the volume and write a file with dd using a 4K block size
4. Check the memory used so far:
# pmap -x $(pgrep glusterfsd) | grep total
5. Check the thread count after creating the volume:
# ps -T -p $(pgrep glusterfsd) | wc -l
6. Repeat steps 2-5 for 14 more volumes
7. Stop glusterd
8. Kill all brick processes
9. Restart glusterd
10. Clean up the setup by unmounting all the volumes, stopping and deleting the bricks

This does not look like a widely used scenario. The test is written to make sure that restarting glusterd
does not produce a core dump. Since customers rarely hit this scenario, should we address this fix downstream?

On the other hand, you mention that the crash happens at the time of detaching a brick. Which step in this test involves detaching a brick?

Comment 2 Mohit Agrawal 2021-10-20 12:39:06 UTC
(In reply to SATHEESARAN from comment #1)
> Hi Mohit, 
> 
> I took a look at the test .t -
> https://github.com/gluster/glusterfs/blob/devel/tests/bugs/core/bug-1432542-mpx-restart-crash.t
> 
> So the test steps looks as:
> 
> 1. brick-multiplex feature is turned on
> 2. Create a volumes ( 2x3 )
> 3. Mount the volume and write a 4K block sized file using dd
> 4. Check the memory used so far:
> # pmap -x $(pgrep glusterfsd) | grep total
> 5.Check the threads count after creating the volume:
> # ps -T -p $(pgrep glusterfsd) | wc -l
> 6. Repeat 2,3,4,5 for 14 more volumes
> 7. Stop glusterd
> 8. Kill all brick process
> 8. Restart glusterd
> 9. Clean the setup by umounting all the volumes, stopping and deleting the
> bricks
> 
> I see that this use case is not a widely use case. This test is written to
> make sure that the restart of glusterd,
> doesn't core dumps. This use case is not widely used by the customers,
> should we address this fix downstream ?
> 
> On the other hand, you mention that getting crash at the time of detaching a
> brick, which step in this test involves detaching the brick ?

The test case represents a normal scenario in a brick_mux environment. A volume stop
in a brick_mux environment can lead to a similar situation in which a brick process
crashes. Also, the fix is very small, similar in size to a Coverity fix, so we should
consider it.

Thanks,
Mohit Agrawal

Comment 13 errata-xmlrpc 2022-05-31 12:37:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (glusterfs bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:4840