Bug 2015092 - ./tests/bugs/core/bug-1432542-mpx-restart-crash.t crashes
Summary: ./tests/bugs/core/bug-1432542-mpx-restart-crash.t crashes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: core
Version: rhgs-3.5
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.5.z Batch Update 7
Assignee: Mohit Agrawal
QA Contact: Sayalee
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-10-18 11:39 UTC by Mohit Agrawal
Modified: 2022-05-31 12:37 UTC
CC: 7 users

Fixed In Version: glusterfs-6.0-62
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-31 12:37:31 UTC
Embargoed:




Links:
Red Hat Product Errata RHBA-2022:4840 - 2022-05-31 12:37:46 UTC

Description Mohit Agrawal 2021-10-18 11:39:34 UTC
The test case ./tests/bugs/core/bug-1432542-mpx-restart-crash.t crashes while running in the periodic regression test suite.
For more details, see https://gluster-downstream-jenkins-csb-storage.apps.ocp4.prod.psi.redhat.com/job/periodic-regression-RHGS-3.5.6-on-RHEL7/20/consoleFull

Reproducible: Run the test case in a loop to reproduce the issue, as in the sketch below.
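
One way to do that from a glusterfs source tree (the runner and iteration count here are assumptions; adjust for your setup):

  # Drive the single .t through prove, the TAP runner the glusterfs
  # test suite's run-tests.sh is built around, and stop at the first
  # failing iteration so the core file is preserved.
  cd /path/to/glusterfs
  for i in $(seq 1 50); do
      prove -v ./tests/bugs/core/bug-1432542-mpx-restart-crash.t || break
  done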

RCA: The test case ./tests/bugs/core/bug-1432542-mpx-restart-crash.t crashes at the time of detaching a brick. The brick process crashes because of a race between sending a disconnect on the RPC associated with the victim brick and handling GF_EVENT_CLEANUP for the same victim brick.

Solution: Save victim_name in a local variable to avoid the crash (otherwise the name is read from the victim brick's structure, which the cleanup path may already have freed).

The issue is already fixed upstream (https://github.com/gluster/glusterfs/pull/1979).

Comment 1 SATHEESARAN 2021-10-20 12:31:54 UTC
Hi Mohit, 

I took a look at the test: https://github.com/gluster/glusterfs/blob/devel/tests/bugs/core/bug-1432542-mpx-restart-crash.t

The test steps look like this (a rough shell sketch of steps 2-5 follows the list):

1. The brick-multiplex feature is turned on
2. Create a volume (2x3)
3. Mount the volume and write a file with dd, using a 4K block size
4. Check the memory used so far:
# pmap -x $(pgrep glusterfsd) | grep total
5. Check the thread count after creating the volume:
# ps -T -p $(pgrep glusterfsd) | wc -l
6. Repeat steps 2-5 for 14 more volumes
7. Stop glusterd
8. Kill all brick processes
9. Restart glusterd
10. Clean up the setup by unmounting all the volumes, then stopping and deleting them
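
In shell terms, one iteration of steps 2-5 boils down to roughly the following (the volume name, host, and brick paths are illustrative, not the literal contents of the .t):

  # Create and start a 2x3 (distributed-replicate) volume; 'force' is
  # only needed if the bricks live on the root filesystem.
  gluster volume create patchy1 replica 3 ${HOST}:/bricks/patchy1/brick{0..5} force
  gluster volume start patchy1

  # Mount it and write a single 4K block.
  mkdir -p /mnt/patchy1
  mount -t glusterfs ${HOST}:/patchy1 /mnt/patchy1
  dd if=/dev/zero of=/mnt/patchy1/file bs=4k count=1

  # Memory and thread count of the (multiplexed) brick process.
  pmap -x $(pgrep glusterfsd) | grep total
  ps -T -p $(pgrep glusterfsd) | wc -l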

This does not look like a widely exercised use case. The test is written to make sure that a restart of glusterd does not core dump. Since this path is not widely used by customers, should we address this fix downstream?

On the other hand, you mention that the crash happens at the time of detaching a brick; which step in this test involves detaching a brick?

Comment 2 Mohit Agrawal 2021-10-20 12:39:06 UTC
(In reply to SATHEESARAN from comment #1)
> [...]
> On the other hand, you mention that the crash happens at the time of
> detaching a brick; which step in this test involves detaching a brick?

The test case represents a normal scenario in a brick-mux environment. A volume stop in
a brick-mux environment can hit a similar situation in which the brick process crashes:
stopping a volume detaches its bricks from the shared brick process, which is exactly the
detach path described in the RCA (a rough sketch follows). The other point is that the fix
is very small, comparable to a Coverity fix, so we should consider taking it.
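
For illustration, the trigger looks roughly like this (the volume name is illustrative; brick-multiplex is a cluster-wide option set on "all"):

  # Enable brick multiplexing cluster-wide.
  gluster volume set all cluster.brick-multiplex on

  # With multiplexing on, stopping one volume detaches its bricks from
  # the shared glusterfsd process; that detach is where the race can hit.
  gluster volume stop patchy1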

Thanks,
Mohit Agrawal

Comment 13 errata-xmlrpc 2022-05-31 12:37:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (glusterfs bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:4840

