Bug 1597768 - br-state-check.t crashed while brick multiplex is enabled
Summary: br-state-check.t crashed while brick multiplex is enabled
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterfs
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.4.0
Assignee: Mohit Agrawal
QA Contact: Bala Konda Reddy M
URL:
Whiteboard:
Depends On:
Blocks: 1503137 1597776
 
Reported: 2018-07-03 15:08 UTC by Mohit Agrawal
Modified: 2018-09-04 06:51 UTC (History)
CC List: 6 users

Fixed In Version: glusterfs-3.12.2-14
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1597776 (view as bug list)
Environment:
Last Closed: 2018-09-04 06:50:20 UTC




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:2607 None None None 2018-09-04 06:51:44 UTC

Description Mohit Agrawal 2018-07-03 15:08:01 UTC
Description of problem:

Test case ./tests/bitrot/br-state-check.t crashed while brick multiplex
is enabled.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:

Test case crashed
Expected results:
Test case should not crash.

Additional info:

Comment 2 Mohit Agrawal 2018-07-03 15:17:37 UTC
Hi,

Below is the backtrace pattern for the brick process:
(gdb) bt
#0  0x00007fe461b5e34d in memset (__len=2792, __ch=0, __dest=0x0) at /usr/include/bits/string3.h:84
#1  rpcsvc_request_create (svc=svc@entry=0x7fe420058be0, trans=trans@entry=0x7fe4501e68b0, 
    msg=msg@entry=0x7fe4501e9650) at rpcsvc.c:459
#2  0x00007fe461b5e7c5 in rpcsvc_handle_rpc_call (svc=0x7fe420058be0, trans=trans@entry=0x7fe4501e68b0, 
    msg=0x7fe4501e9650) at rpcsvc.c:615
#3  0x00007fe461b5ebeb in rpcsvc_notify (trans=0x7fe4501e68b0, mydata=<optimized out>, 
    event=<optimized out>, data=<optimized out>) at rpcsvc.c:789
#4  0x00007fe461b60b23 in rpc_transport_notify (this=this@entry=0x7fe4501e68b0, 
    event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7fe4501e9650) at rpc-transport.c:538
#5  0x00007fe45698f5d6 in socket_event_poll_in (this=this@entry=0x7fe4501e68b0, 
    notify_handled=<optimized out>) at socket.c:2315
#6  0x00007fe456991b7c in socket_event_handler (fd=23, idx=10, gen=4, data=0x7fe4501e68b0, poll_in=1, 
    poll_out=0, poll_err=0) at socket.c:2467
#7  0x00007fe461dfa524 in event_dispatch_epoll_handler (event=0x7fe454edae80, event_pool=0x55e74e601200)
    at event-epoll.c:583
#8  event_dispatch_epoll_worker (data=0x55e74e64a9e0) at event-epoll.c:659
#9  0x00007fe460bfbe25 in start_thread () from /usr/lib64/libpthread.so.0
#10 0x00007fe4604c834d in clone () from /usr/lib64/libc.so.6

$3 = (xlator_t *) 0x7fe4200062a0
(gdb) p *(xlator_t*)this->xl
$4 = {name = 0x7fe420006e60 "patchy-changelog", type = 0x7fe420006fe0 "features/changelog", 
  instance_name = 0x0, next = 0x7fe420003960, prev = 0x7fe420007720, parents = 0x7fe420008460, 
  children = 0x7fe4200076c0, options = 0x0, dlhandle = 0x7fe450011250, fops = 0x7fe44f4f8780 <fops>, 
  cbks = 0x7fe44f4f8720 <cbks>, dumpops = 0x0, volume_options = {next = 0x7fe420006300, 
    prev = 0x7fe420006300}, fini = 0x7fe44f2e9560 <fini>, init = 0x7fe44f2e8a60 <init>, 
  reconfigure = 0x7fe44f2e8370 <reconfigure>, mem_acct_init = 0x7fe44f2e82f0 <mem_acct_init>, 
  notify = 0x7fe44f2e7990 <notify>, loglevel = GF_LOG_NONE, client_latency = 0, latencies = {{min = 0, 
      max = 0, total = 0, std = 0, mean = 0, count = 0} <repeats 55 times>}, history = 0x0, 
  ctx = 0x55e74e5ca010, graph = 0x7fe420000990, itable = 0x0, init_succeeded = 1 '\001', private = 0x0, 
  mem_acct = 0x7fe420053720, winds = 0, switched = 0 '\000', local_pool = 0x0, 
  is_autoloaded = _gf_false, volfile_id = 0x0, xl_id = 4, cleanup_starting = 1, call_cleanup = 1}


(gdb) p *svc->xl
$7 = {name = 0x7fe420006e60 "patchy-changelog", type = 0x7fe420006fe0 "features/changelog", 
  instance_name = 0x0, next = 0x7fe420003960, prev = 0x7fe420007720, parents = 0x7fe420008460, 
  children = 0x7fe4200076c0, options = 0x0, dlhandle = 0x7fe450011250, fops = 0x7fe44f4f8780 <fops>, 
  cbks = 0x7fe44f4f8720 <cbks>, dumpops = 0x0, volume_options = {next = 0x7fe420006300, 
    prev = 0x7fe420006300}, fini = 0x7fe44f2e9560 <fini>, init = 0x7fe44f2e8a60 <init>, 
  reconfigure = 0x7fe44f2e8370 <reconfigure>, mem_acct_init = 0x7fe44f2e82f0 <mem_acct_init>, 
  notify = 0x7fe44f2e7990 <notify>, loglevel = GF_LOG_NONE, client_latency = 0, latencies = {{min = 0, 
      max = 0, total = 0, std = 0, mean = 0, count = 0} <repeats 55 times>}, history = 0x0, 
  ctx = 0x55e74e5ca010, graph = 0x7fe420000990, itable = 0x0, init_succeeded = 1 '\001', private = 0x0, 
  mem_acct = 0x7fe420053720, winds = 0, switched = 0 '\000', local_pool = 0x0, 
  is_autoloaded = _gf_false, volfile_id = 0x0, xl_id = 4, cleanup_starting = 1, call_cleanup = 1}
(gdb) p svc
$8 = (rpcsvc_t *) 0x7fe420058be0
(gdb) p *svc
$9 = {rpclock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, 
      __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, 
    __align = 0}, memfactor = 8, authschemes = {next = 0x7fe420058ea0, prev = 0x7fe420059080}, 
  options = 0x7fe420058470, allow_insecure = _gf_true, register_portmap = _gf_true, 
  root_squash = _gf_false, anonuid = 65534, anongid = 65534, ctx = 0x55e74e5ca010, listeners = {
    next = 0x7fe420058c48, prev = 0x7fe420058c48}, programs = {next = 0x7fe420059638, 
    prev = 0x7fe420059638}, notify = {next = 0x7fe420058c68, prev = 0x7fe420058c68}, notify_count = 1, 
  xl = 0x7fe4200062a0, mydata = 0x7fe4200062a0, notifyfn = 0x0, rxpool = 0x0, drc = 0x0, 
  outstanding_rpc_limit = 0, addr_namelookup = _gf_false, throttle = _gf_false}


The backtrace clearly shows that the transport (xprt) received a request for the changelog RPC after its rxpool had already been destroyed, so the brick process crashed while allocating memory for the RPC request. To resolve this, the RPC cleanup code in the changelog xlator needs to be updated.

Regards
Mohit Agrawal

Comment 11 Bala Konda Reddy M 2018-07-25 11:18:41 UTC
Build: 3.12.2-14
On a 3-node brick-multiplex setup, followed the steps from br-state-check.t; no glusterd crashes were seen.

Ran the prove test from a source install on RHEL 7.5: "prove -vf tests/bitrot/br-state-check.t"
[root@dhcp37-188 rhs-glusterfs]# prove -vf tests/bitrot/br-state-check.t 
tests/bitrot/br-state-check.t .. 
1..35
ok
All tests successful.
Files=1, Tests=35, 41 wallclock secs ( 0.05 usr  0.01 sys +  2.23 cusr  2.52 csys =  4.81 CPU)
Result: PASS

Hence marking it as verified

Comment 12 errata-xmlrpc 2018-09-04 06:50:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607

