Bug 1580361

Summary: rpc: ABRT, SEGV in rpcsvc_handle_disconnect->glusterd_rpcsvc_notify
Product: [Community] GlusterFS
Reporter: Kaleb KEITHLEY <kkeithle>
Component: glusterd
Assignee: bugs <bugs>
Status: CLOSED INSUFFICIENT_DATA
Severity: unspecified
Priority: unspecified
Version: mainline
CC: amukherj, bugs, jeff, kkeithle
Hardware: x86_64
OS: Linux
Last Closed: 2018-12-05 06:55:05 UTC
Type: Bug

Description Kaleb KEITHLEY 2018-05-21 11:07:19 UTC

Comment 1 Kaleb KEITHLEY 2018-05-21 11:09:46 UTC
See https://retrace.fedoraproject.org/faf/reports/2170288/

It includes a rudimentary backtrace.

Comment 2 Jeff Darcy 2018-05-21 12:11:20 UTC
I see there's some RDMA code in the stack trace. Any chance this is RDMA-specific, e.g. some kind of ordering/serialization assumption fulfilled by the socket code but not by the RDMA code? That wouldn't necessarily mean it's a flaw in the RDMA code or should be fixed there, but might provide a useful hint. Or maybe it's pure coincidence.
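To make Comment 2's hypothesis concrete: below is a minimal, hypothetical sketch (this is not actual GlusterFS code; every name in it is invented) of the kind of ordering assumption a disconnect-notify path can trip over when one transport delivers events in an order another transport never produces.

/* Hypothetical sketch only -- not GlusterFS code; all names are invented. */
#include <stdio.h>
#include <stdlib.h>

typedef enum { EVENT_ACCEPT, EVENT_DISCONNECT } event_t;

struct conn_state { int ref; };            /* stand-in for per-connection state */
struct transport  { struct conn_state *priv; };

/* Handler that assumes ACCEPT always precedes DISCONNECT (true for a
 * socket-style transport; assumed here to be violated by another one). */
static void notify_assuming_order(struct transport *t, event_t ev)
{
    switch (ev) {
    case EVENT_ACCEPT:
        t->priv = calloc(1, sizeof(*t->priv));
        if (t->priv)
            t->priv->ref = 1;
        break;
    case EVENT_DISCONNECT:
        t->priv->ref--;                    /* SEGVs if no ACCEPT ever ran */
        free(t->priv);
        t->priv = NULL;
        break;
    }
}

/* Defensive variant: tolerates a DISCONNECT with no prior ACCEPT, and a
 * duplicate DISCONNECT, by checking the state it is about to touch. */
static void notify_defensive(struct transport *t, event_t ev)
{
    switch (ev) {
    case EVENT_ACCEPT:
        if (!t->priv) {
            t->priv = calloc(1, sizeof(*t->priv));
            if (t->priv)
                t->priv->ref = 1;
        }
        break;
    case EVENT_DISCONNECT:
        if (t->priv) {
            free(t->priv);
            t->priv = NULL;
        }
        break;
    }
}

int main(void)
{
    struct transport t = { NULL };

    /* Expected ordering: both handlers survive this. */
    notify_assuming_order(&t, EVENT_ACCEPT);
    notify_assuming_order(&t, EVENT_DISCONNECT);

    /* Unexpected ordering: DISCONNECT with no prior ACCEPT. The defensive
     * handler is a no-op; notify_assuming_order() would dereference NULL
     * here, which is the shape of the reported crash. */
    notify_defensive(&t, EVENT_DISCONNECT);

    printf("ok\n");
    return 0;
}

If something along these lines is what's happening, the usual fix is a guard in the notify handler (as in notify_defensive above) rather than trying to force every transport to serialize events identically.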

Comment 3 Atin Mukherjee 2018-06-18 03:40:29 UTC
Do we have a reproducer for this?

Comment 4 Kaleb KEITHLEY 2018-06-18 11:34:07 UTC
No, _I_ don't have a reproducer.

This is an automated ABRT report coming from boxes running community gluster installed from the CentOS Storage SIG repos.

I'm simply relaying the report that gets forwarded to me as the Gluster packager in Fedora. (CentOS ABRTs get sent to Fedora.)

As you can see at the link posted in Comment 1, though, it has occurred 67 times, so it seems like it should be easy to reproduce.

Comment 5 Atin Mukherjee 2018-10-05 02:38:48 UTC
Kaleb - do you still see this happening? I'm looking for a core file (specifically the complete backtrace) and the test that was run to trigger the crash, so that we can begin the investigation.
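For reference, if a core from one of the affected boxes is still around, a complete backtrace can usually be captured with gdb along these lines (the binary path and core path below are assumptions; adjust to the actual locations):

gdb /usr/sbin/glusterd /path/to/core
(gdb) set pagination off
(gdb) thread apply all bt full

Attaching that output here would give us something to start from.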

Comment 6 Shyamsundar 2018-10-23 14:54:00 UTC
Release 3.12 has been EOL'd and this bug was still found to be in the NEW state; moving the version to mainline so it can be triaged and appropriate action taken.

Comment 7 Atin Mukherjee 2018-12-05 06:55:05 UTC
There is not sufficient data to debug this. Closing it.