Bug 1597768

Summary: br-state-check.t crashed while brick multiplex is enabled
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Mohit Agrawal <moagrawa>
Component: glusterfs
Assignee: Mohit Agrawal <moagrawa>
Status: CLOSED ERRATA
QA Contact: Bala Konda Reddy M <bmekala>
Severity: unspecified
Priority: unspecified
Docs Contact:
Version: rhgs-3.4
CC: amukherj, nchilaka, rhs-bugs, sankarshan, sheggodu, vbellur
Target Milestone: ---
Target Release: RHGS 3.4.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.12.2-14
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1597776 (view as bug list)
Environment:
Last Closed: 2018-09-04 06:50:20 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1503137, 1597776

Description Mohit Agrawal 2018-07-03 15:08:01 UTC
Description of problem:

Test case ./tests/bitrot/br-state-check.t crashes the brick process while
brick multiplex is enabled.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:

Test case crashed.

Expected results:

Test case should not crash.

Additional info:

Comment 2 Mohit Agrawal 2018-07-03 15:17:37 UTC
Hi,

Below is the backtrace (bt) for the crashed brick process:
(gdb) bt
#0  0x00007fe461b5e34d in memset (__len=2792, __ch=0, __dest=0x0) at /usr/include/bits/string3.h:84
#1  rpcsvc_request_create (svc=svc@entry=0x7fe420058be0, trans=trans@entry=0x7fe4501e68b0, 
    msg=msg@entry=0x7fe4501e9650) at rpcsvc.c:459
#2  0x00007fe461b5e7c5 in rpcsvc_handle_rpc_call (svc=0x7fe420058be0, trans=trans@entry=0x7fe4501e68b0, 
    msg=0x7fe4501e9650) at rpcsvc.c:615
#3  0x00007fe461b5ebeb in rpcsvc_notify (trans=0x7fe4501e68b0, mydata=<optimized out>, 
    event=<optimized out>, data=<optimized out>) at rpcsvc.c:789
#4  0x00007fe461b60b23 in rpc_transport_notify (this=this@entry=0x7fe4501e68b0, 
    event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7fe4501e9650) at rpc-transport.c:538
#5  0x00007fe45698f5d6 in socket_event_poll_in (this=this@entry=0x7fe4501e68b0, 
    notify_handled=<optimized out>) at socket.c:2315
#6  0x00007fe456991b7c in socket_event_handler (fd=23, idx=10, gen=4, data=0x7fe4501e68b0, poll_in=1, 
    poll_out=0, poll_err=0) at socket.c:2467
#7  0x00007fe461dfa524 in event_dispatch_epoll_handler (event=0x7fe454edae80, event_pool=0x55e74e601200)
    at event-epoll.c:583
#8  event_dispatch_epoll_worker (data=0x55e74e64a9e0) at event-epoll.c:659
#9  0x00007fe460bfbe25 in start_thread () from /usr/lib64/libpthread.so.0
#10 0x00007fe4604c834d in clone () from /usr/lib64/libc.so.6

$3 = (xlator_t *) 0x7fe4200062a0
(gdb) p *(xlator_t*)this->xl
$4 = {name = 0x7fe420006e60 "patchy-changelog", type = 0x7fe420006fe0 "features/changelog", 
  instance_name = 0x0, next = 0x7fe420003960, prev = 0x7fe420007720, parents = 0x7fe420008460, 
  children = 0x7fe4200076c0, options = 0x0, dlhandle = 0x7fe450011250, fops = 0x7fe44f4f8780 <fops>, 
  cbks = 0x7fe44f4f8720 <cbks>, dumpops = 0x0, volume_options = {next = 0x7fe420006300, 
    prev = 0x7fe420006300}, fini = 0x7fe44f2e9560 <fini>, init = 0x7fe44f2e8a60 <init>, 
  reconfigure = 0x7fe44f2e8370 <reconfigure>, mem_acct_init = 0x7fe44f2e82f0 <mem_acct_init>, 
  notify = 0x7fe44f2e7990 <notify>, loglevel = GF_LOG_NONE, client_latency = 0, latencies = {{min = 0, 
      max = 0, total = 0, std = 0, mean = 0, count = 0} <repeats 55 times>}, history = 0x0, 
  ctx = 0x55e74e5ca010, graph = 0x7fe420000990, itable = 0x0, init_succeeded = 1 '\001', private = 0x0, 
  mem_acct = 0x7fe420053720, winds = 0, switched = 0 '\000', local_pool = 0x0, 
  is_autoloaded = _gf_false, volfile_id = 0x0, xl_id = 4, cleanup_starting = 1, call_cleanup = 1}


(gdb) p *svc->xl
$7 = {name = 0x7fe420006e60 "patchy-changelog", type = 0x7fe420006fe0 "features/changelog", 
  instance_name = 0x0, next = 0x7fe420003960, prev = 0x7fe420007720, parents = 0x7fe420008460, 
  children = 0x7fe4200076c0, options = 0x0, dlhandle = 0x7fe450011250, fops = 0x7fe44f4f8780 <fops>, 
  cbks = 0x7fe44f4f8720 <cbks>, dumpops = 0x0, volume_options = {next = 0x7fe420006300, 
    prev = 0x7fe420006300}, fini = 0x7fe44f2e9560 <fini>, init = 0x7fe44f2e8a60 <init>, 
  reconfigure = 0x7fe44f2e8370 <reconfigure>, mem_acct_init = 0x7fe44f2e82f0 <mem_acct_init>, 
  notify = 0x7fe44f2e7990 <notify>, loglevel = GF_LOG_NONE, client_latency = 0, latencies = {{min = 0, 
      max = 0, total = 0, std = 0, mean = 0, count = 0} <repeats 55 times>}, history = 0x0, 
  ctx = 0x55e74e5ca010, graph = 0x7fe420000990, itable = 0x0, init_succeeded = 1 '\001', private = 0x0, 
  mem_acct = 0x7fe420053720, winds = 0, switched = 0 '\000', local_pool = 0x0, 
  is_autoloaded = _gf_false, volfile_id = 0x0, xl_id = 4, cleanup_starting = 1, call_cleanup = 1}
(gdb) p svc
$8 = (rpcsvc_t *) 0x7fe420058be0
(gdb) p *svc
$9 = {rpclock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, 
      __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, 
    __align = 0}, memfactor = 8, authschemes = {next = 0x7fe420058ea0, prev = 0x7fe420059080}, 
  options = 0x7fe420058470, allow_insecure = _gf_true, register_portmap = _gf_true, 
  root_squash = _gf_false, anonuid = 65534, anongid = 65534, ctx = 0x55e74e5ca010, listeners = {
    next = 0x7fe420058c48, prev = 0x7fe420058c48}, programs = {next = 0x7fe420059638, 
    prev = 0x7fe420059638}, notify = {next = 0x7fe420058c68, prev = 0x7fe420058c68}, notify_count = 1, 
  xl = 0x7fe4200062a0, mydata = 0x7fe4200062a0, notifyfn = 0x0, rxpool = 0x0, drc = 0x0, 
  outstanding_rpc_limit = 0, addr_namelookup = _gf_false, throttle = _gf_false}


This clearly shows that the transport (xprt) is receiving a request for the changelog RPC service whose rxpool has already been destroyed (note rxpool = 0x0 and cleanup_starting = 1 above), so the brick process crashes while allocating memory for the RPC request. To resolve this, the RPC cleanup code in the changelog xlator needs to be updated.

Regards
Mohit Agrawal
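
A minimal, self-contained C sketch of the failure mode described in the comment above (not the glusterfs source and not the actual patch): the toy_* names are hypothetical stand-ins for rpcsvc_t, rpcsvc_request_t and the mem_pool behind svc->rxpool. It shows how handing out a request from a pool that cleanup has already torn down yields NULL, so the 2792-byte memset() in frame #0 faults, and how a guard tied to the cleanup state would reject the request instead. The real resolution, as noted above, lies in the changelog xlator's RPC cleanup code.

/*
 * Standalone illustration only -- hypothetical toy_* types, not glusterfs code.
 * Build with: cc -o toy toy.c
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct toy_pool {               /* stand-in for the mem_pool behind svc->rxpool */
    size_t obj_size;
};

struct toy_req {                /* stand-in for rpcsvc_request_t                */
    char payload[2792];         /* same size as __len=2792 in frame #0          */
};

struct toy_svc {                /* stand-in for rpcsvc_t                        */
    struct toy_pool *rxpool;
    int cleanup_starting;       /* mirrors cleanup_starting = 1 in the gdb dump */
};

/* Stand-in for mem_get(): returns NULL when the pool is already gone. */
static void *toy_mem_get(struct toy_pool *pool)
{
    if (!pool)
        return NULL;
    return malloc(pool->obj_size);
}

/* Guarded sketch of the allocation done in rpcsvc_request_create(). */
static struct toy_req *toy_request_create(struct toy_svc *svc)
{
    struct toy_req *req;

    /* Refuse new requests once cleanup of the owning xlator has begun. */
    if (!svc || svc->cleanup_starting || !svc->rxpool)
        return NULL;

    req = toy_mem_get(svc->rxpool);
    if (!req)                   /* without this check: memset(NULL, 0, 2792) */
        return NULL;

    memset(req, 0, sizeof(*req));   /* safe: req is known to be non-NULL */
    return req;
}

int main(void)
{
    /* Model the crashed brick: rxpool already destroyed, cleanup under way. */
    struct toy_svc svc = { .rxpool = NULL, .cleanup_starting = 1 };

    struct toy_req *req = toy_request_create(&svc);
    printf("request %s\n", req ? "created" : "rejected (cleanup in progress)");

    free(req);                  /* free(NULL) is a no-op */
    return 0;
}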

Comment 11 Bala Konda Reddy M 2018-07-25 11:18:41 UTC
Build: 3.12.2-14
On a brick-mux setup of 3 nodes, followed the steps from br-state-check.t and no glusterd crashes were seen.

Ran the prove test from a source install on RHEL 7.5: "prove -vf tests/bitrot/br-state-check.t"
[root@dhcp37-188 rhs-glusterfs]# prove -vf tests/bitrot/br-state-check.t 
tests/bitrot/br-state-check.t .. 
1..35
ok
All tests successful.
Files=1, Tests=35, 41 wallclock secs ( 0.05 usr  0.01 sys +  2.23 cusr  2.52 csys =  4.81 CPU)
Result: PASS

Hence, marking it as verified.

Comment 12 errata-xmlrpc 2018-09-04 06:50:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607