Description of problem:
Test case ./tests/bitrot/br-state-check.t crashed while brick multiplex is enabled.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
Test case crashed.

Expected results:
Test case should not crash.

Additional info:
Hi,

Below is the backtrace pattern for the brick process:

(gdb) bt
#0  0x00007fe461b5e34d in memset (__len=2792, __ch=0, __dest=0x0) at /usr/include/bits/string3.h:84
#1  rpcsvc_request_create (svc=svc@entry=0x7fe420058be0, trans=trans@entry=0x7fe4501e68b0, msg=msg@entry=0x7fe4501e9650) at rpcsvc.c:459
#2  0x00007fe461b5e7c5 in rpcsvc_handle_rpc_call (svc=0x7fe420058be0, trans=trans@entry=0x7fe4501e68b0, msg=0x7fe4501e9650) at rpcsvc.c:615
#3  0x00007fe461b5ebeb in rpcsvc_notify (trans=0x7fe4501e68b0, mydata=<optimized out>, event=<optimized out>, data=<optimized out>) at rpcsvc.c:789
#4  0x00007fe461b60b23 in rpc_transport_notify (this=this@entry=0x7fe4501e68b0, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7fe4501e9650) at rpc-transport.c:538
#5  0x00007fe45698f5d6 in socket_event_poll_in (this=this@entry=0x7fe4501e68b0, notify_handled=<optimized out>) at socket.c:2315
#6  0x00007fe456991b7c in socket_event_handler (fd=23, idx=10, gen=4, data=0x7fe4501e68b0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2467
#7  0x00007fe461dfa524 in event_dispatch_epoll_handler (event=0x7fe454edae80, event_pool=0x55e74e601200) at event-epoll.c:583
#8  event_dispatch_epoll_worker (data=0x55e74e64a9e0) at event-epoll.c:659
#9  0x00007fe460bfbe25 in start_thread () from /usr/lib64/libpthread.so.0
#10 0x00007fe4604c834d in clone () from /usr/lib64/libc.so.6

$3 = (xlator_t *) 0x7fe4200062a0

(gdb) p *(xlator_t*)this->xl
$4 = {name = 0x7fe420006e60 "patchy-changelog", type = 0x7fe420006fe0 "features/changelog", instance_name = 0x0, next = 0x7fe420003960, prev = 0x7fe420007720,
  parents = 0x7fe420008460, children = 0x7fe4200076c0, options = 0x0, dlhandle = 0x7fe450011250, fops = 0x7fe44f4f8780 <fops>, cbks = 0x7fe44f4f8720 <cbks>, dumpops = 0x0,
  volume_options = {next = 0x7fe420006300, prev = 0x7fe420006300}, fini = 0x7fe44f2e9560 <fini>, init = 0x7fe44f2e8a60 <init>, reconfigure = 0x7fe44f2e8370 <reconfigure>,
  mem_acct_init = 0x7fe44f2e82f0 <mem_acct_init>, notify = 0x7fe44f2e7990 <notify>, loglevel = GF_LOG_NONE, client_latency = 0,
  latencies = {{min = 0, max = 0, total = 0, std = 0, mean = 0, count = 0} <repeats 55 times>}, history = 0x0, ctx = 0x55e74e5ca010, graph = 0x7fe420000990, itable = 0x0,
  init_succeeded = 1 '\001', private = 0x0, mem_acct = 0x7fe420053720, winds = 0, switched = 0 '\000', local_pool = 0x0, is_autoloaded = _gf_false, volfile_id = 0x0,
  xl_id = 4, cleanup_starting = 1, call_cleanup = 1}

(gdb) p *svc->xl
$7 = {name = 0x7fe420006e60 "patchy-changelog", type = 0x7fe420006fe0 "features/changelog", instance_name = 0x0, next = 0x7fe420003960, prev = 0x7fe420007720,
  parents = 0x7fe420008460, children = 0x7fe4200076c0, options = 0x0, dlhandle = 0x7fe450011250, fops = 0x7fe44f4f8780 <fops>, cbks = 0x7fe44f4f8720 <cbks>, dumpops = 0x0,
  volume_options = {next = 0x7fe420006300, prev = 0x7fe420006300}, fini = 0x7fe44f2e9560 <fini>, init = 0x7fe44f2e8a60 <init>, reconfigure = 0x7fe44f2e8370 <reconfigure>,
  mem_acct_init = 0x7fe44f2e82f0 <mem_acct_init>, notify = 0x7fe44f2e7990 <notify>, loglevel = GF_LOG_NONE, client_latency = 0,
  latencies = {{min = 0, max = 0, total = 0, std = 0, mean = 0, count = 0} <repeats 55 times>}, history = 0x0, ctx = 0x55e74e5ca010, graph = 0x7fe420000990, itable = 0x0,
  init_succeeded = 1 '\001', private = 0x0, mem_acct = 0x7fe420053720, winds = 0, switched = 0 '\000', local_pool = 0x0, is_autoloaded = _gf_false, volfile_id = 0x0,
  xl_id = 4, cleanup_starting = 1, call_cleanup = 1}

(gdb) p svc
$8 = (rpcsvc_t *) 0x7fe420058be0

(gdb) p *svc
$9 = {rpclock = {__data = {__lock = 0,
  __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0},
  memfactor = 8, authschemes = {next = 0x7fe420058ea0, prev = 0x7fe420059080}, options = 0x7fe420058470, allow_insecure = _gf_true, register_portmap = _gf_true,
  root_squash = _gf_false, anonuid = 65534, anongid = 65534, ctx = 0x55e74e5ca010, listeners = {next = 0x7fe420058c48, prev = 0x7fe420058c48},
  programs = {next = 0x7fe420059638, prev = 0x7fe420059638}, notify = {next = 0x7fe420058c68, prev = 0x7fe420058c68}, notify_count = 1, xl = 0x7fe4200062a0,
  mydata = 0x7fe4200062a0, notifyfn = 0x0, rxpool = 0x0, drc = 0x0, outstanding_rpc_limit = 0, addr_namelookup = _gf_false, throttle = _gf_false}

This clearly shows that the transport (xprt) is receiving a request for the changelog RPC service whose rxpool has already been destroyed (rxpool = 0x0), so the brick process crashes while allocating memory for the RPC request. To resolve this, the RPC cleanup code in the changelog xlator needs to be updated.

Regards,
Mohit Agrawal
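
For illustration, here is a minimal, self-contained C sketch of the failure pattern described above; it is not GlusterFS source. It mimics a request being allocated from a memory pool that has already been torn down (svc->rxpool = 0x0 in the dump above), followed by an unconditional memset of the returned pointer, which matches frames #0/#1 of the backtrace. All names in the sketch (struct service, fake_mem_get, request_create) are hypothetical stand-ins, not the real rpcsvc/mem-pool API.

/* Hypothetical stand-ins only -- not GlusterFS code. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct request {
    char payload[2792];        /* matches the memset __len in frame #0 */
};

struct service {
    void *rxpool;              /* becomes NULL once the owning xlator cleans up */
};

/* Stand-in for a pool allocator: returns NULL when the pool is gone. */
static void *fake_mem_get(void *pool)
{
    return pool ? malloc(sizeof(struct request)) : NULL;
}

static struct request *request_create(struct service *svc)
{
    struct request *req = fake_mem_get(svc->rxpool);

    /* An unconditional memset of req here would reproduce the crash
     * pattern from the backtrace: memset(__dest=0x0, __ch=0, __len=2792).
     * The NULL check below avoids the SIGSEGV in this sketch. */
    if (req == NULL)
        return NULL;
    memset(req, 0, sizeof(*req));
    return req;
}

int main(void)
{
    struct service svc = { .rxpool = NULL };   /* pool already destroyed */

    if (request_create(&svc) == NULL)
        printf("request dropped: pool already destroyed\n");
    return 0;
}

A NULL check like the one above would only paper over the symptom; as noted in the analysis, the actual fix is expected in the changelog xlator's RPC cleanup ordering, so that the listener/transport stops delivering requests before the request pool is destroyed.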
Build: 3.12.2-14

On a brick-mux setup of 3 nodes, followed the steps from br-state-check.t and saw no glusterd crashes. Ran the prove test after installing from source on RHEL 7.5: "prove -vf tests/bitrot/br-state-check.t"

[root@dhcp37-188 rhs-glusterfs]# prove -vf tests/bitrot/br-state-check.t
tests/bitrot/br-state-check.t ..
1..35
ok
All tests successful.
Files=1, Tests=35, 41 wallclock secs ( 0.05 usr 0.01 sys + 2.23 cusr 2.52 csys = 4.81 CPU)
Result: PASS

Hence marking it as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607