+++ This bug was initially created as a clone of Bug #1763036 +++
+++ This bug was initially created as a clone of Bug #1741193 +++

Description of problem:
glusterfsd crashed on all the gluster nodes and created a huge corefile, causing the mounts to go down.

Primary analysis of the corefile (/cases/02441282/_additional_files/x-coredump/inf03-core.6572.) [file seems incomplete]:

Core was generated by `/usr/sbin/glusterfsd -s 10.7.199.48 --volfile-id vol_0005ed9e52ceb05f16b1bb48cd'.
Program terminated with signal 6, Aborted.
#0  0x00007f35de36e207 in ?? ()
(gdb) bt
Python Exception <class 'gdb.MemoryError'> Cannot access memory at address 0x7f2bdef8da58:
(gdb) info proc
glusterfsd
exe = '/usr/sbin/glusterfsd -s 10.7.199.48 --volfile-id vol_0005ed9e52ceb05f16b1bb48cd'
(gdb) backtrace
#0  0x00007f35de36e207 in ?? ()
Cannot access memory at address 0x7f2bdef8da58
(gdb) down
0x7f2bdef8da58 Cannot access memory at address 0x7f2bdef8da58
(gdb) info frame
Stack level 0, frame at 0x7f2bdef8da60:
 rip = 0x7f35de36e207; saved rip Cannot access memory at address 0x7f2bdef8da58
(gdb) up
0x7f2bdef8da58 #0  0x00007f35de36e207 in ?? ()
(gdb) i r
rax            0x0      0
rbx            0x7f35e01ff000   139869370183680
rcx            0xffffffffffffffff       -1
rdx            0x6      6
rsi            0x2da6   11686
rdi            0x19ac   6572
rbp            0x7f35de4c2520   0x7f35de4c2520
rsp            0x7f2bdef8da58   0x7f2bdef8da58
r8             0x0      0
r9             0x7f17b4f04770   139739796621168
r10            0x8      8
r11            0x206    518
r12            0x7f35dfdb875a   139869365700442
r13            0x7f35dfdb8b00   139869365701376
r14            0x0      0
r15            0x7f255fc70ac0   139798497397440
rip            0x7f35de36e207   0x7f35de36e207
eflags         0x206    [ PF IF ]
cs             0x33     51
ss             0x2b     43
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0
(gdb)

Please verify the corefile further to see if the root cause can be isolated.

10.7.199.48:-var-lib-heketi-mounts-vg_77cff90104e3566f7856e523efd82152-brick_3ac56467ac42041325057fdfd61bf7cc-brick
uuid=e0b70592-6ab6-4d66-b1b6-d54bd4797d70
hostname=10.7.199.48
path=/var/lib/heketi/mounts/vg_77cff90104e3566f7856e523efd82152/brick_3ac56467ac42041325057fdfd61bf7cc/brick
real_path=/var/lib/heketi/mounts/vg_77cff90104e3566f7856e523efd82152/brick_3ac56467ac42041325057fdfd61bf7cc/brick
listen-port=49153
rdma.listen-port=0
decommissioned=0
brick-id=vol_0005ed9e52ceb05f16b1bb48cd526ea9-client-2
mount_dir=/brick
snap-status=0
brick-fsid=16842170

[2019-08-10 06:34:18.342656] I [MSGID: 106005] [glusterd-handler.c:6179:__glusterd_brick_rpc_notify] 0-management: Brick 10.7.199.48:/var/lib/heketi/mounts/vg_77cff90104e3566f7856e523efd82152/brick_3ac56467ac42041325057fdfd61bf7cc/brick has disconnected from glusterd.

(Please feel free to change the "Component" if found inappropriate.)

Many thanks,
Jay

Version-Release number of selected component (if applicable):
RHGS 3.4

How reproducible:
The issue occurred twice, around the same time of day (morning; the latest occurrence was on Aug 10).

Steps to Reproduce:
1.
2.
3.
Actual results:

Expected results:

Additional info:

--- Additional comment from Jay Samson on 2019-09-23 10:19:15 UTC ---

Here is the analysis of the issue so far (will post more when I get time):

[New LWP 21371]
[New LWP 23168]
[New LWP 117329]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfsd -s 10.7.199.47 --volfile-id vol_b763bb6afb974e6f80384673ba'.
Program terminated with signal 6, Aborted.
#0  0x00007f9efaed5207 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
55        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00007f9efaed5207 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007f9efaed68f8 in __GI_abort () at abort.c:90
#2  0x00007f9efaece026 in __assert_fail_base (fmt=0x7f9efb028ea0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7f9efc94507a "slot->fd == fd", file=file@entry=0x7f9efc945054 "event-epoll.c", line=line@entry=417, function=function@entry=0x7f9efc945420 <__PRETTY_FUNCTION__.11118> "event_register_epoll") at assert.c:92
#3  0x00007f9efaece0d2 in __GI___assert_fail (assertion=assertion@entry=0x7f9efc94507a "slot->fd == fd", file=file@entry=0x7f9efc945054 "event-epoll.c", line=line@entry=417, function=function@entry=0x7f9efc945420 <__PRETTY_FUNCTION__.11118> "event_register_epoll") at assert.c:101
#4  0x00007f9efc8f7d04 in event_register_epoll (event_pool=0x563fac588150, fd=<optimized out>, handler=<optimized out>, data=<optimized out>, poll_in=<optimized out>, poll_out=<optimized out>, notify_poller_death=0 '\000') at event-epoll.c:417
#5  0x00007f9ef798ceb2 in socket_server_event_handler (fd=<optimized out>, idx=<optimized out>, gen=<optimized out>, data=0x7f9ee80403f0, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000') at socket.c:2950
#6  0x00007f9efc8f8870 in event_dispatch_epoll_handler
(event=0x7f98fffc9e70, event_pool=0x563fac588150) at event-epoll.c:643 #7 event_dispatch_epoll_worker (data=0x7f992c8ea110) at event-epoll.c:759 #8 0x00007f9efb6d5dd5 in start_thread (arg=0x7f98fffca700) at pthread_create.c:307 #9 0x00007f9efaf9cead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 (gdb) f 7 #7 event_dispatch_epoll_worker (data=0x7f992c8ea110) at event-epoll.c:759 759 ret = event_dispatch_epoll_handler (event_pool, &event); (gdb) p *event_pool $1 = {ops = 0x7f9efcb6b4e0 <event_ops_epoll>, fd = 3, breaker = {0, 0}, count = 16384, reg = 0x0, ereg = {0x563fac5b6e50, 0x7f9e247f81b0, 0x0 <repeats 1022 times>}, slots_used = {1021, 10, 0 <repeats 1022 times>}, poller_death = {next = 0x563fac5b6fe0, prev = 0x563fac5b6fe0}, poller_death_sliced = 0, poller_gen = 54, used = 0, changed = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}, evcache = 0x0, evcache_size = 0, eventthreadcount = 197, pollers = {140320709637888, 140320701245184, 140319441008384, 140318519322368, 140318091491072, 140317311362816, 140317135181568, 140316095010560, 140315306489600, 140314870265600, 140314023028480, 140313586804480, 140312873785088, 140312370452224, 140311590323968, 140311028274944, 140310508189440, 140310340400896, 140310247036672, 140310153672448, 140310060308224, 140309966944000, 140309803562752, 140309710198528, 140309616834304, 140309451233024, 140309357868800, 140309130221312, 140309028464384, 140308935100160, 140308841735936, 140308748371712, 140308655007488, 140308561643264, 140308468279040, 140308374914816, 140308281550592, 140308188186368, 140308086429440, 140307993065216, 140307899700992, 140307806336768, 
140307704579840, 140307611215616, 140307509458688, 140307416094464, 140307322730240, 140307229366016, 140307136001792, 140307042637568, 140306949273344, 140306855909120, 140306762544896, 140306660787968, 140306559031040, 140306457274112, 140306363909888, 140306270545664, 140306177181440, 140306083817216, 140305990452992, 140305818961664, 140305725597440, 140305632233216, 140305538868992, 140305445504768, 140305352140544, 140305258776320, 140305022048000, 140304928683776, 140304762005248, 140304668641024, 140304566884096, 140304465127168, 140304363370240, 140304261613312, 140304168249088, 140304074884864, 140303981520640, 140303879763712, 140303786399488, 140303693035264, 140303599671040, 140303506306816, 140303412942592, 140303317473024, 140303222003456, 140303126533888, 140303031064320, 140302935594752, 140302840125184, 140302744655616, 140302640527104, 140302536398592, 140302432270080, 140302336534272, 140302240798464, 140302145062656, 140302049326848, 140301953591040, 140301857855232, 140301711763200, 140301607634688, 140301503506176, 140301407770368, 140301230380800, 140301128623872, 140301026866944, 140300924057344, 140300819928832, 140300715800320, 140300550899456, 140300383143680, 140300281386752, 140300106315520, 140300004558592, 140299902801664, 140299801044736, 140299697182464, 140299601712896, 140299506243328, 140299410507520, 140299306379008, 140299202250496, 140299098121984, 140298993993472, 140298898257664, 140298802521856, 140298698393344, 140298602657536, 140298498529024, 140298402793216, 140298307057408, 140298211321600, 140298107193088, 140298003064576, 140297907328768, 140297811592960, 140297715857152, 140297603335936, 140297499207424, 140297403471616, 140297307735808, 140297212000000, 140297116264192, 140297020528384, 140296924792576, 140296820664064, 140296650192640, 140296556828416, 140296463464192, 140296370099968, 140296274630400, 140296178894592, 140296012674816, 140295836526336, 140295743162112, 140295576483584, 140295483119360, 
140295389755136, 140295296390912, 140295201974016, 140295106504448, 140295002642176, 140294907172608, 140294811436800, 140294715700992, 140294619965184, 140294524229376, 140294428493568, 140294332757760, 140294237021952, 140294141286144, 140294045550336, 140293949814528, 140293854078720, 140293758342912, 140293662607104, 140293558478592, 140293454350080, 140293350221568, 140293254485760, 140293158749952, 140293063014144, 140292967278336, 140292863149824, 140292767414016, 140292598515456, 140292496758528, 140292403394304, 140292308711168, 140292212975360, 140292117239552, 140292021503744, 140291925767936, 140291751245568, 140291717674752, 0 <repeats 827 times>}, destroy = 0, activethreadcount = 198, auto_thread_count = 195} (gdb) f 6 #6 0x00007f9efc8f8870 in event_dispatch_epoll_handler (event=0x7f98fffc9e70, event_pool=0x563fac588150) at event-epoll.c:643 643 ret = handler (fd, idx, gen, data, (gdb) p fd $2 = 12 (gdb) p idx $3 = 3 (gdb) l 638 639 if (!handler) 640 goto out; 641 642 if (!handled_error_previously) { 643 ret = handler (fd, idx, gen, data, 644 (event->events & (EPOLLIN|EPOLLPRI)), 645 (event->events & (EPOLLOUT)), 646 (event->events & (EPOLLERR|EPOLLHUP)), 0); 647 } (gdb) p data $4 = (void *) 0x7f9ee80403f0 (gdb) f 5socket_server_event_handler Invalid number "5socket". (gdb) f 5 #5 0x00007f9ef798ceb2 in socket_server_event_handler (fd=<optimized out>, idx=<optimized out>, gen=<optimized out>, data=0x7f9ee80403f0, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000') at socket.c:2950 2950 event_register (ctx->event_pool, (gdb) l 2945 */ 2946 ret = rpc_transport_notify (this, RPC_TRANSPORT_ACCEPT, new_trans); 2947 2948 if (ret != -1) { 2949 new_priv->idx = 2950 event_register (ctx->event_pool, 2951 new_sock, 2952 socket_event_handler, 2953 new_trans, 2954 1, 0, (gdb) p nds_sock No symbol "nds_sock" in current context. 
(gdb) p new_sock $5 = 1340 (gdb) f 4 #4 0x00007f9efc8f7d04 in event_register_epoll (event_pool=0x563fac588150, fd=<optimized out>, handler=<optimized out>, data=<optimized out>, poll_in=<optimized out>, poll_out=<optimized out>, notify_poller_death=0 '\000') at event-epoll.c:417 417 assert (slot->fd == fd); (gdb) l 412 return -1; 413 } 414 415 slot = event_slot_get (event_pool, idx); 416 417 assert (slot->fd == fd); 418 419 LOCK (&slot->lock); 420 { 421 /* make epoll 'singleshot', which (gdb) p *event_pool $6 = {ops = 0x7f9efcb6b4e0 <event_ops_epoll>, fd = 3, breaker = {0, 0}, count = 16384, reg = 0x0, ereg = {0x563fac5b6e50, 0x7f9e247f81b0, 0x0 <repeats 1022 times>}, slots_used = {1021, 10, 0 <repeats 1022 times>}, poller_death = {next = 0x563fac5b6fe0, prev = 0x563fac5b6fe0}, poller_death_sliced = 0, poller_gen = 54, used = 0, changed = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}, evcache = 0x0, evcache_size = 0, eventthreadcount = 197, pollers = {140320709637888, 140320701245184, 140319441008384, 140318519322368, 140318091491072, 140317311362816, 140317135181568, 140316095010560, 140315306489600, 140314870265600, 140314023028480, 140313586804480, 140312873785088, 140312370452224, 140311590323968, 140311028274944, 140310508189440, 140310340400896, 140310247036672, 140310153672448, 140310060308224, 140309966944000, 140309803562752, 140309710198528, 140309616834304, 140309451233024, 140309357868800, 140309130221312, 140309028464384, 140308935100160, 140308841735936, 140308748371712, 140308655007488, 140308561643264, 140308468279040, 140308374914816, 140308281550592, 140308188186368, 140308086429440, 
140307993065216, 140307899700992, 140307806336768, 140307704579840, 140307611215616, 140307509458688, 140307416094464, 140307322730240, 140307229366016, 140307136001792, 140307042637568, 140306949273344, 140306855909120, 140306762544896, 140306660787968, 140306559031040, 140306457274112, 140306363909888, 140306270545664, 140306177181440, 140306083817216, 140305990452992, 140305818961664, 140305725597440, 140305632233216, 140305538868992, 140305445504768, 140305352140544, 140305258776320, 140305022048000, 140304928683776, 140304762005248, 140304668641024, 140304566884096, 140304465127168, 140304363370240, 140304261613312, 140304168249088, 140304074884864, 140303981520640, 140303879763712, 140303786399488, 140303693035264, 140303599671040, 140303506306816, 140303412942592, 140303317473024, 140303222003456, 140303126533888, 140303031064320, 140302935594752, 140302840125184, 140302744655616, 140302640527104, 140302536398592, 140302432270080, 140302336534272, 140302240798464, 140302145062656, 140302049326848, 140301953591040, 140301857855232, 140301711763200, 140301607634688, 140301503506176, 140301407770368, 140301230380800, 140301128623872, 140301026866944, 140300924057344, 140300819928832, 140300715800320, 140300550899456, 140300383143680, 140300281386752, 140300106315520, 140300004558592, 140299902801664, 140299801044736, 140299697182464, 140299601712896, 140299506243328, 140299410507520, 140299306379008, 140299202250496, 140299098121984, 140298993993472, 140298898257664, 140298802521856, 140298698393344, 140298602657536, 140298498529024, 140298402793216, 140298307057408, 140298211321600, 140298107193088, 140298003064576, 140297907328768, 140297811592960, 140297715857152, 140297603335936, 140297499207424, 140297403471616, 140297307735808, 140297212000000, 140297116264192, 140297020528384, 140296924792576, 140296820664064, 140296650192640, 140296556828416, 140296463464192, 140296370099968, 140296274630400, 140296178894592, 140296012674816, 140295836526336, 
140295743162112, 140295576483584, 140295483119360, 140295389755136, 140295296390912, 140295201974016, 140295106504448, 140295002642176, 140294907172608, 140294811436800, 140294715700992, 140294619965184, 140294524229376, 140294428493568, 140294332757760, 140294237021952, 140294141286144, 140294045550336, 140293949814528, 140293854078720, 140293758342912, 140293662607104, 140293558478592, 140293454350080, 140293350221568, 140293254485760, 140293158749952, 140293063014144, 140292967278336, 140292863149824, 140292767414016, 140292598515456, 140292496758528, 140292403394304, 140292308711168, 140292212975360, 140292117239552, 140292021503744, 140291925767936, 140291751245568, 140291717674752, 0 <repeats 827 times>}, destroy = 0, activethreadcount = 198, auto_thread_count = 195} (gdb) p idx $7 = <optimized out> (gdb) ptype event_pool type = struct event_pool { struct event_ops *ops; int fd; int breaker[2]; int count; struct event_slot_poll *reg; struct event_slot_epoll *ereg[1024]; int slots_used[1024]; struct list_head poller_death; int poller_death_sliced; int poller_gen; int used; int changed; pthread_mutex_t mutex; pthread_cond_t cond; void *evcache; int evcache_size; int eventthreadcount; pthread_t pollers[1024]; int destroy; int activethreadcount; int auto_thread_count; } * (gdb) p *event_pool->event_slot_epoll There is no member named event_slot_epoll. 
(gdb) p *event_pool->ereg $8 = (struct event_slot_epoll *) 0x563fac5b6e50 (gdb) p (struct event_slot_epoll *)event_pool->ereg $9 = (struct event_slot_epoll *) 0x563fac588170 (gdb) p *(struct event_slot_epoll *)event_pool->ereg $10 = {fd = -1403294128, events = 22079, gen = 612336048, idx = 32670, ref = 0, do_close = 0, in_handler = 0, handled_error = 0, data = 0x0, handler = 0x0, lock = {spinlock = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}, poller_death = {next = 0x0, prev = 0x0}} (gdb) p *event_pool->slots_used $11 = 1021 (gdb) p *event_pool->slots_used[1] Cannot access memory at address 0xa (gdb) p *event_pool->slots_used[0] Cannot access memory at address 0x3fd (gdb) p event_pool->slots_used[0] $12 = 1021 (gdb) p event_pool->slots_used[1] $13 = 10 (gdb) p event_pool->slots_used[2] $14 = 0 (gdb) p event_pool->slots_used[3] $15 = 0 (gdb) p event_pool->slots_used[1021] $16 = 0 (gdb) p event_pool->slots_used[1024] $17 = -1403293728 (gdb) p event_pool->slots_used[1023] $18 = 0 (gdb) p event_pool->slots_used[10] $19 = 0 (gdb) p *event_pool->ereg[0] $20 = {fd = 7, events = 1073741851, gen = 1, idx = 0, ref = 1, do_close = 0, in_handler = 0, handled_error = 0, data = 0x563fac595a20, handler = 0x7f9ef798c780 <socket_server_event_handler>, lock = {spinlock = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}, poller_death = {next = 0x563fac5b6ea8, prev = 0x563fac5b6ea8}} (gdb) p *event_pool->ereg[1] $21 = {fd = -1, events = 1073741851, gen = 216, idx = 0, ref = 1, do_close = 1, in_handler = 0, handled_error = 0, data = 0x7f9e3c050be0, handler = 0x7f9ef7989980 <socket_event_handler>, lock = {spinlock = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, 
__nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}, poller_death = {next = 0x7f9e247f8208, prev = 0x7f9e247f8208}} (gdb) bt #0 0x00007f9efaed5207 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55 #1 0x00007f9efaed68f8 in __GI_abort () at abort.c:90 #2 0x00007f9efaece026 in __assert_fail_base (fmt=0x7f9efb028ea0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7f9efc94507a "slot->fd == fd", file=file@entry=0x7f9efc945054 "event-epoll.c", line=line@entry=417, function=function@entry=0x7f9efc945420 <__PRETTY_FUNCTION__.11118> "event_register_epoll") at assert.c:92 #3 0x00007f9efaece0d2 in __GI___assert_fail (assertion=assertion@entry=0x7f9efc94507a "slot->fd == fd", file=file@entry=0x7f9efc945054 "event-epoll.c", line=line@entry=417, function=function@entry=0x7f9efc945420 <__PRETTY_FUNCTION__.11118> "event_register_epoll") at assert.c:101 #4 0x00007f9efc8f7d04 in event_register_epoll (event_pool=0x563fac588150, fd=<optimized out>, handler=<optimized out>, data=<optimized out>, poll_in=<optimized out>, poll_out=<optimized out>, notify_poller_death=0 '\000') at event-epoll.c:417 #5 0x00007f9ef798ceb2 in socket_server_event_handler (fd=<optimized out>, idx=<optimized out>, gen=<optimized out>, data=0x7f9ee80403f0, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000') at socket.c:2950 #6 0x00007f9efc8f8870 in event_dispatch_epoll_handler (event=0x7f98fffc9e70, event_pool=0x563fac588150) at event-epoll.c:643 #7 event_dispatch_epoll_worker (data=0x7f992c8ea110) at event-epoll.c:759 #8 0x00007f9efb6d5dd5 in start_thread (arg=0x7f98fffca700) at pthread_create.c:307 #9 0x00007f9efaf9cead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 (gdb) l 422 means we need to re-add the fd with 423 epoll_ctl(EPOLL_CTL_MOD) after delivery of every 424 single event. 
This assures us that while a poller 425 thread has picked up and is processing an event, 426 another poller will not try to pick this at the same 427 time as well. 428 */ 429 430 slot->events = EPOLLPRI | EPOLLHUP | EPOLLERR | EPOLLONESHOT; 431 slot->handler = handler; (gdb) p *slot $22 = {fd = -1, events = 1073741851, gen = 216, idx = 0, ref = 1, do_close = 1, in_handler = 0, handled_error = 0, data = 0x7f9e3c050be0, handler = 0x7f9ef7989980 <socket_event_handler>, lock = {spinlock = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}, poller_death = {next = 0x7f9e247f8208, prev = 0x7f9e247f8208}} (gdb) p event_pool->slots_used[11] $23 = 0 (gdb) p event_pool->slots_used[12] $24 = 0 (gdb) p *slot $25 = {fd = -1, events = 1073741851, gen = 216, idx = 0, ref = 1, do_close = 1, in_handler = 0, handled_error = 0, data = 0x7f9e3c050be0, handler = 0x7f9ef7989980 <socket_event_handler>, lock = {spinlock = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}, poller_death = {next = 0x7f9e247f8208, prev = 0x7f9e247f8208}} (gdb) p slot->fd $26 = -1 (gdb) p fd $27 = <optimized out> (gdb) f 4 #4 0x00007f9efc8f7d04 in event_register_epoll (event_pool=0x563fac588150, fd=<optimized out>, handler=<optimized out>, data=<optimized out>, poll_in=<optimized out>, poll_out=<optimized out>, notify_poller_death=0 '\000') at event-epoll.c:417 417 assert (slot->fd == fd); (gdb) p slot->fd $28 = -1 --- Additional comment from Prashant Dhange on 2019-09-25 03:43:12 UTC --- (In reply to Prashant Dhange from comment #33) > (In reply to Prashant Dhange from comment #32) > > (In reply to nravinas from comment #30) > > > Hi, Nithya. > > > > > > * I've been requested to try to help here. 
Unfortunately, setting a digicore > > > environment to analyze this core is not working as expected: > > > > > > IP: 10.72.93.42 > > > Username / password: root / redhat > > > > > > cd /root/digicore > > > > > > sh digicore.sh attach 02441282 > > > > > > Once attached to the container, I have downloaded a core file at /core > > > > > > gdb /usr/sbin/glusterfsd core > > > > > > . > > > . > > > . > > > > > > Core was generated by `/usr/sbin/glusterfsd -s 10.7.199.47 --volfile-id > > > vol_b763bb6afb974e6f80384673ba'. > > > Program terminated with signal 6, Aborted. > > > #0 0x00007f9efaed5207 in sigorset (dest=0x4193, left=0x5377, right=0x6) at > > > sigorset.c:27 > > > 27 if (dest == NULL || left == NULL || right == NULL) > > > > > The above digicore environment is having different gluster/glibc package > > version and are not matching with customer's coredump environment that is > > why we are getting incomplete backtrace. > > > > We had passed wrong installed-rpms (rpm -qa) to digicore while creating > > digicore containter, whereas coredump was generated on environment having > > glusterfs 3.12.2-47.el7rhgs.x86_64 and glibc 2.17-260.el7_6.4.x86_64 > > > > The above digicore container is having glusterfs 3.12.2-47.4.el7rhgs.x86_64 > > and glibc 2.17-292.el7.x86_64 : > > [root@7c3000b2e002 case]# rpm -qa |egrep 'gluster|glibc' > > glibc-2.17-292.el7.x86_64 Tue Sep 3 > > 09:12:15 2019 > > glibc-common-2.17-292.el7.x86_64 Tue Sep 3 > > 09:12:12 2019 > > glusterfs-3.12.2-47.4.el7rhgs.x86_64 Sun Sep 15 > > 10:20:34 2019 > > glusterfs-api-3.12.2-47.4.el7rhgs.x86_64 Sun Sep 15 > > 10:20:35 2019 > > glusterfs-client-xlators-3.12.2-47.4.el7rhgs.x86_64 Sun Sep 15 > > 10:20:34 2019 > > glusterfs-cli-3.12.2-47.4.el7rhgs.x86_64 Sun Sep 15 > > 10:20:34 2019 > > glusterfs-events-3.12.2-47.4.el7rhgs.x86_64 Sun Sep 15 > > 10:20:36 2019 > > glusterfs-fuse-3.12.2-47.4.el7rhgs.x86_64 Sun Sep 15 > > 10:20:35 2019 > > glusterfs-geo-replication-3.12.2-47.4.el7rhgs.x86_64 Sun Sep 15 > > 
10:20:36 2019 > > glusterfs-libs-3.12.2-47.4.el7rhgs.x86_64 Sun Sep 15 > > 10:20:34 2019 > > glusterfs-rdma-3.12.2-47.4.el7rhgs.x86_64 Sun Sep 15 > > 10:20:38 2019 > > glusterfs-server-3.12.2-47.4.el7rhgs.x86_64 Sun Sep 15 > > 10:20:35 2019 > > gluster-block-0.2.1-32.el7rhgs.x86_64 Sun Sep 15 > > 10:25:48 2019 > > gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64 Wed Sep 11 > > 13:24:39 2019 > > gluster-nagios-common-0.2.4-1.el7rhgs.noarch Wed Sep 11 > > 13:24:12 2019 > > libvirt-daemon-driver-storage-gluster-4.5.0-23.el7_7.1.x86_64 Sun Sep 15 > > 10:20:36 2019 > > python2-gluster-3.12.2-47.4.el7rhgs.x86_64 Sun Sep 15 > > 10:20:36 2019 > > vdsm-gluster-4.30.18-1.0.el7rhgs.x86_64 Sun Sep 15 > > 10:20:38 2019 > > > > Correcting glusterfs/glibc version to > > 3.12.2-47.4.el7rhgs.x86_64/2.17-292.el7.x86_64 respectively on above > > container should fix the issue and we should get complete backtrace here. > Correction: it should be "Correcting glusterfs/glibc version to > 3.12.2-47.el7rhgs.x86_64/2.17-260.el7_6.4.x86_64 respectively on above > container should fix the issue and we should get complete backtrace here." I have fixed the package mismatch on above container and we are able to get proper backtrace. 
[root@7c3000b2e002 case]# rpm -qa|egrep 'gluster|glibc'
glusterfs-libs-3.12.2-47.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-47.el7rhgs.x86_64
glusterfs-cli-3.12.2-47.el7rhgs.x86_64
glusterfs-server-3.12.2-47.el7rhgs.x86_64
gluster-block-0.2.1-31.el7rhgs.x86_64
glibc-2.17-260.el7_6.4.x86_64
glibc-debuginfo-2.17-260.el7_6.4.x86_64
glusterfs-3.12.2-47.el7rhgs.x86_64
glusterfs-api-3.12.2-47.el7rhgs.x86_64
glusterfs-fuse-3.12.2-47.el7rhgs.x86_64
python2-gluster-3.12.2-47.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-47.el7rhgs.x86_64
glibc-common-2.17-260.el7_6.4.x86_64
glibc-debuginfo-common-2.17-260.el7_6.4.x86_64
glusterfs-debuginfo-3.12.2-47.el7rhgs.x86_64

[root@7c3000b2e002 case]# gdb /usr/sbin/glusterfsd core
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.
....
....
[New LWP 21301]
[New LWP 21371]
[New LWP 23168]
[New LWP 117329]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfsd -s 10.7.199.47 --volfile-id vol_b763bb6afb974e6f80384673ba'.
Program terminated with signal 6, Aborted.
#0 0x00007f9efaed5207 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55 55 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig); (gdb) bt #0 0x00007f9efaed5207 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55 #1 0x00007f9efaed68f8 in __GI_abort () at abort.c:90 #2 0x00007f9efaece026 in __assert_fail_base (fmt=0x7f9efb028ea0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7f9efc94507a "slot->fd == fd", file=file@entry=0x7f9efc945054 "event-epoll.c", line=line@entry=417, function=function@entry=0x7f9efc945420 <__PRETTY_FUNCTION__.11118> "event_register_epoll") at assert.c:92 #3 0x00007f9efaece0d2 in __GI___assert_fail (assertion=assertion@entry=0x7f9efc94507a "slot->fd == fd", file=file@entry=0x7f9efc945054 "event-epoll.c", line=line@entry=417, function=function@entry=0x7f9efc945420 <__PRETTY_FUNCTION__.11118> "event_register_epoll") at assert.c:101 #4 0x00007f9efc8f7d04 in event_register_epoll (event_pool=0x563fac588150, fd=<optimized out>, handler=<optimized out>, data=<optimized out>, poll_in=<optimized out>, poll_out=<optimized out>, notify_poller_death=0 '\000') at event-epoll.c:417 #5 0x00007f9ef798ceb2 in socket_server_event_handler (fd=<optimized out>, idx=<optimized out>, gen=<optimized out>, data=0x7f9ee80403f0, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000') at socket.c:2950 #6 0x00007f9efc8f8870 in event_dispatch_epoll_handler (event=0x7f98fffc9e70, event_pool=0x563fac588150) at event-epoll.c:643 #7 event_dispatch_epoll_worker (data=0x7f992c8ea110) at event-epoll.c:759 #8 0x00007f9efb6d5dd5 in start_thread (arg=0x7f98fffca700) at pthread_create.c:307 #9 0x00007f9efaf9cead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 (gdb) So we have 2 coredump debug environments: 1. Natalia's ocs environment ocs master: ip-address : 10.74.251.54 username : root password : RedHat1! 
[root@vm251-54 ~]# oc get pods -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE glusterblock-storage-provisioner-dc-1-49bmc 1/1 Running 0 11h 10.131.0.119 vm252-19.gsslab.pnq2.redhat.com <none> glusterfs-storage-mqkck 1/1 Running 0 11h 10.74.252.19 vm252-19.gsslab.pnq2.redhat.com <none> glusterfs-storage-w9xfz 1/1 Running 0 11h 10.74.254.205 vm254-205.gsslab.pnq2.redhat.com <none> glusterfs-storage-zzwd9 1/1 Running 0 11h 10.74.252.69 vm252-69.gsslab.pnq2.redhat.com <none> heketi-storage-1-8vwsc 1/1 Running 0 11h 10.131.0.117 vm252-19.gsslab.pnq2.redhat.com <none> [root@vm251-54 ~]# oc rsh glusterfs-storage-mqkck sh-4.2# cd /cores/ sh-4.2# gdb /usr/sbin/glusterfsd inf02-core.16787 2. digicore container mentioned in comment#30 --- Additional comment from Prashant Dhange on 2019-09-25 05:07:11 UTC --- Echoing what Mohit has found earlier here in comment#8 (In reply to Mohit Agrawal from comment #8) > Hi, > > I have checked gluster processes logs specific to brick crash, I have not > found any reason of brick crash in logs. > There are no error messages available in logs just before a brick crash. 
>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>
> [2019-08-10 06:32:32.415690] I [addr.c:55:compare_addr_and_update] 0-/var/lib/heketi/mounts/vg_dac7467f7481ef3dc72981494e55c93c/brick_705d09c3f50d87b740ab828b8c819fee/brick: allowed = "*", received addr = "10.7.199.96"
> [2019-08-10 06:32:32.415729] I [MSGID: 115029] [server-handshake.c:817:server_setvolume] 0-vol_0005ed9e52ceb05f16b1bb48cd526ea9-server: accepted client from vlmazocpapp32.fisdev.local-57970-2019/08/10-06:32:32:332389-vol_ec6393163cc25b9fed3c1e5e2ff8375c-client-2-0-0 (version: 3.12.2)
> /lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xa0)[0x7f8e5b3a6cc0]

(There was not enough memory available to hold the backtrace message.)

> /lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f8e5b3b0c04]
> /lib64/libc.so.6(+0x36280)[0x7f8e59a06280]
> /lib64/libc.so.6(gsignal+0x37)[0x7f8e59a06207]
> /lib64/libc.so.6(abort+0x148)[0x7f8e59a078f8]
> /lib64/libc.so.6(+0x2f026)[0x7f8e599ff026]
> /lib64/libc.so.6(+0x2f0d2)[0x7f8e599ff0d2]
> /lib64/libglusterfs.so.0(+0x88c4b)[0x7f8e5b403c4b]
> /usr/lib64/glusterfs/3.12.2/rpc-transport/socket.so(+0xcdaf)[0x7f8e564bedaf]
> /lib64/libglusterfs.so.0(+0x894c4)[0x7f8e5b4044c4]
> /lib64/libpthread.so.0(+0x7dd5)[0x7f8e5a205dd5]
> /lib64/libc.so.6(clone+0x6d)[0x7f8e59aceb3d]
>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>
> On the system side, I have found that some processes (like heapster) got
> OOM-killed multiple times due to the cgroup memory limit configured, but
> on the basis of the current logs I cannot say the same is true for gluster
> as well. We don't have a complete backtrace, and the backtrace pattern
> available in the logs is not sufficient to find out the reason for the crash.
>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
>
> [11799234.175888] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
> [11799235.185554] XFS (dm-1222): Unmounting Filesystem
> [11799532.330862] heapster invoked oom-killer: gfp_mask=0x50, order=0, oom_score_adj=985
> [11799532.341972] heapster cpuset=docker-6e4a91e46c97a87310aab482bdd8b5350d58b0dc2c537f56769f3deddd73246e.scope mems_allowed=0
> [11799532.349607] CPU: 5 PID: 95365 Comm: heapster Not tainted 3.10.0-957.10.1.el7.x86_64 #1
> [11799532.355169] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 06/02/2017
> [11799532.361995] Call Trace:
> [11799532.364181] [<ffffffffae162e41>] dump_stack+0x19/0x1b
> [11799532.367727] [<ffffffffae15d86a>] dump_header+0x90/0x229
> [11799532.371755] [<ffffffffadbba156>] ? find_lock_task_mm+0x56/0xc0
> [11799532.375694] [<ffffffffadc31628>] ? try_get_mem_cgroup_from_mm+0x28/0x60
> [11799532.381063] [<ffffffffadbba604>] oom_kill_process+0x254/0x3d0
> [11799532.385394] [<ffffffffadc35406>] mem_cgroup_oom_synchronize+0x546/0x570
> [11799532.389867] [<ffffffffadc34880>] ? mem_cgroup_charge_common+0xc0/0xc0
> [11799532.393860] [<ffffffffadbbae94>] pagefault_out_of_memory+0x14/0x90
> [11799532.397908] [<ffffffffae15bd72>] mm_fault_error+0x6a/0x157
> [11799532.401558] [<ffffffffae1707a8>] __do_page_fault+0x3c8/0x500
> [11799532.406047] [<ffffffffae170915>] do_page_fault+0x35/0x90
> [11799532.410130] [<ffffffffae16c758>] page_fault+0x28/0x30
> [11799532.419650] Task in /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod76432b3c_a713_11e9_93ef_000d3a046b95.slice/docker-6e4a91e46c97a87310aab482bdd8b5350d58b0dc2c537f56769f3deddd73246e.scope killed as a result of limit of /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod76432b3c_a713_11e9_93ef_000d3a046b95.slice
> [11799532.446763] memory: usage 3662108kB, limit 3662108kB, failcnt 5883951
> [11799532.450874] memory+swap: usage 3662108kB, limit 9007199254740988kB, failcnt 0
> [11799532.455828] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
> [11799532.459804] Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod76432b3c_a713_11e9_93ef_000d3a046b95.slice: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
> [11799532.476295] Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod76432b3c_a713_11e9_93ef_000d3a046b95.slice/docker-56f2a10610e0295e6716b9de5e32334b96be7201b37f8160f728af4c3e366b08.scope: cache:0KB rss:428KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:428KB inactive_file:0KB active_file:0KB unevictable:0KB
> [11799532.500710] Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod76432b3c_a713_11e9_93ef_000d3a046b95.slice/docker-6e4a91e46c97a87310aab482bdd8b5350d58b0dc2c537f56769f3deddd73246e.scope: cache:452KB rss:3661228KB rss_huge:0KB mapped_file:452KB swap:0KB inactive_anon:0KB active_anon:3661224KB inactive_file:0KB active_file:368KB unevictable:0KB
> [11799532.521834] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

The cgroup memory limitation is not allowing glusterfsd to allocate the additional pages it requested here.

> I have not found any logs specific to high memory consumption by a gluster
> process.
>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
>
> messages:Aug 5 04:48:38 vlmazocpinf01 kernel: [11799532.330862] heapster invoked oom-killer: gfp_mask=0x50, order=0, oom_score_adj=985
> messages:Aug 5 04:48:38 vlmazocpinf01 kernel: [11799532.381063] [<ffffffffadbba604>] oom_kill_process+0x254/0x3d0
> messages:Aug 5 04:48:38 vlmazocpinf01 kernel: [11799532.419650] Task in /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod76432b3c_a713_11e9_93ef_000d3a046b95.slice/docker-6e4a91e46c97a87310aab482bdd8b5350d58b0dc2c537f56769f3deddd73246e.scope killed as a result of limit of /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod76432b3c_a713_11e9_93ef_000d3a046b95.slice
> messages:Aug 5 04:48:38 vlmazocpinf01 kernel: [11799532.545283] Memory cgroup out of memory: Kill process 107416 (heapster) score 1985 or sacrifice child
> messages:Aug 5 04:48:38 vlmazocpinf01 kernel: [11799532.551985] Killed process 101585 (heapster) total-vm:5095012kB, anon-rss:3659164kB, file-rss:0kB, shmem-rss:0kB
> grep: messages-20190728: No such file or directory
> messages-20190804:Jul 31 03:06:10 vlmazocpinf01 kernel: [11361383.951387] heapster invoked oom-killer: gfp_mask=0x50, order=0, oom_score_adj=985
> messages-20190804:Jul 31 03:06:10 vlmazocpinf01 kernel: [11361384.041571] [<ffffffffadbba604>] oom_kill_process+0x254/0x3d0
> messages-20190804:Jul 31 03:06:10 vlmazocpinf01 kernel: [11361384.112290] Task in /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod76432b3c_a713_11e9_93ef_000d3a046b95.slice/docker-
3d1520b587fa1705dd72e9aacea58b31ae8b3d1b90592e1d00453f2292af1d20.scope
> killed as a result of limit of
> /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod76432b3c_a713_11e9_93ef_000d3a046b95.slice
> messages-20190804:Jul 31 03:06:10 vlmazocpinf01 kernel: [11361384.353431] Memory cgroup out of memory: Kill process 40432 (heapster) score 1985 or sacrifice child
> messages-20190804:Jul 31 03:06:10 vlmazocpinf01 kernel: [11361384.368790] Killed process 9732 (heapster) total-vm:4980396kB, anon-rss:3659704kB, file-rss:0kB, shmem-rss:0kB
> 
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> 
> Can we ask whether the customer observed any odd behavior at the time of the brick crash?
> 
> Thanks,
> Mohit Agrawal

The crash appears to happen because event_slot_alloc could not find a slot for the new socket fd=1340; it looks like __event_newtable is failing to allocate memory for a new table.

--- Additional comment from Mohit Agrawal on 2019-09-25 05:18:40 UTC ---

Hi,

It seems the brick process is crashing because the function event_slot_alloc is not able to return a valid slot.
bt
#0  0x00007f9efaed5207 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007f9efaed68f8 in __GI_abort () at abort.c:90
#2  0x00007f9efaece026 in __assert_fail_base (fmt=0x7f9efb028ea0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7f9efc94507a "slot->fd == fd", file=file@entry=0x7f9efc945054 "event-epoll.c", line=line@entry=417, function=function@entry=0x7f9efc945420 <__PRETTY_FUNCTION__.11118> "event_register_epoll") at assert.c:92
#3  0x00007f9efaece0d2 in __GI___assert_fail (assertion=assertion@entry=0x7f9efc94507a "slot->fd == fd", file=file@entry=0x7f9efc945054 "event-epoll.c", line=line@entry=417, function=function@entry=0x7f9efc945420 <__PRETTY_FUNCTION__.11118> "event_register_epoll") at assert.c:101
#4  0x00007f9efc8f7d04 in event_register_epoll (event_pool=0x563fac588150, fd=<optimized out>, handler=<optimized out>, data=<optimized out>, poll_in=<optimized out>, poll_out=<optimized out>, notify_poller_death=0 '\000') at event-epoll.c:417
#5  0x00007f9ef798ceb2 in socket_server_event_handler (fd=<optimized out>, idx=<optimized out>, gen=<optimized out>, data=0x7f9ee80403f0, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000') at socket.c:2950
#6  0x00007f9efc8f8870 in event_dispatch_epoll_handler (event=0x7f98fffc9e70, event_pool=0x563fac588150) at event-epoll.c:643
#7  event_dispatch_epoll_worker (data=0x7f992c8ea110) at event-epoll.c:759
#8  0x00007f9efb6d5dd5 in start_thread (arg=0x7f98fffca700) at pthread_create.c:307
#9  0x00007f9efaf9cead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

(gdb) f 4
#4  0x00007f9efc8f7d04 in event_register_epoll (event_pool=0x563fac588150, fd=<optimized out>, handler=<optimized out>, data=<optimized out>, poll_in=<optimized out>, poll_out=<optimized out>, notify_poller_death=0 '\000') at event-epoll.c:417
417         assert (slot->fd == fd);
(gdb) p slot
$3184 = (struct event_slot_epoll *) 0x7f9e247f81b0
(gdb) p *slot
$3185 =
{fd = -1, events = 1073741851, gen = 216, idx = 0, ref = 1, do_close = 1, in_handler = 0, handled_error = 0, data = 0x7f9e3c050be0, handler = 0x7f9ef7989980 <socket_event_handler>, lock = {spinlock = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}, poller_death = {next = 0x7f9e247f8208, prev = 0x7f9e247f8208}}

(gdb) p event_pool->slots_used
$3188 = {1021, 10, 0 <repeats 1022 times>}

(gdb) f 4

After printing the complete fd table, I am not able to find any entry for socket fd 1340:

(gdb) set $tmp = 1
(gdb) while ($tmp != 1023)
 >print ((struct event_slot_epoll *)event_pool->ereg[0])[$tmp].fd
 >set $tmp = $tmp + 1
 >end

(gdb) f 5
(gdb) p new_sock
1340

As per the current code of event_slot_alloc, it first checks the value of slots_used to find a table with a free entry. The current backtrace shows the value of slots_used for table 0 is 1021 (less than 1024), which means the table still appears to have free slots, so the allocator selects this registry index (0):

>>>>>>>>>>>>>>>>>>>>>
for (i = 0; i < EVENT_EPOLL_TABLES; i++) {
    switch (event_pool->slots_used[i]) {
        case EVENT_EPOLL_SLOTS:
            continue;
        case 0:
            if (!event_pool->ereg[i]) {
                table = __event_newtable(event_pool, i);
                if (!table)
                    return -1;
            } else {
                table = event_pool->ereg[i];
            }
            break;
        default:
            table = event_pool->ereg[i];
            break;
    }

    if (table)
        /* break out of the loop */
        break;
}
if (!table)
    return -1;

table_idx = i;
>>>>>>>>>>>>>>>>>>>>>>>>>

In the code below it then scans that table for a free entry. As per the current slots_used value, ideally 3 entries should be free in the table, but somehow no entry is actually free here. The code does not validate the fd assignment in the table; it just returns the idx.
>>>>>>>>>>>>>>>>>>>>>>>>>>
for (i = 0; i < EVENT_EPOLL_SLOTS; i++) {
    if (table[i].fd == -1) {
        /* wipe everything except bump the generation */
        gen = table[i].gen;
        memset(&table[i], 0, sizeof(table[i]));
        table[i].gen = gen + 1;

        LOCK_INIT(&table[i].lock);
        INIT_LIST_HEAD(&table[i].poller_death);

        table[i].fd = fd;
        if (notify_poller_death) {
            table[i].idx = table_idx * EVENT_EPOLL_SLOTS + i;
            list_add_tail(&table[i].poller_death, &event_pool->poller_death);
        }

        event_pool->slots_used[table_idx]++;
        break;
    }
}

return table_idx * EVENT_EPOLL_SLOTS + i;
>>>>>>>>>>>>>>>>>

I think we need to update the code. I have checked the event code but cannot figure out why slots_used is not showing the correct value. Ideally slots_used should be 1024, because no index in the registry table is free, and before returning the index the code should validate whether the fd was successfully assigned. I am still checking the code and need some more time to understand the issue.

--- Additional comment from Jay Samson on 2019-09-27 13:26:36 UTC ---

Hello,

May I know if we have an update that can be passed to the customer?

Regards,
Jay

--- Additional comment from Mohit Agrawal on 2019-09-30 10:23:38 UTC ---

Hi Jay,

We are still working on this; I don't have any update yet. We suspect a race condition in the socket code path. Do you have any idea how frequently this crash occurs? Is the issue easily reproducible, and can we ask the customer anything about a reproducer?

Thanks,
Mohit Agrawal

--- Additional comment from Mohit Agrawal on 2019-10-01 11:34:26 UTC ---

Hi Jay,

Can you confirm with the customer whether it is possible to test a patch? On the basis of the core alone we are not able to find the root cause of the race. Does the customer have a test environment where the issue is easily reproducible?
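[Editorial note] The allocation path quoted above can be modeled in a few lines of Python. This is a sketch of the logic in libglusterfs's event-epoll.c, not the actual C implementation: the table here is just a flat list of fds, and the function name mirrors the C one for readability. It shows that when slots_used under-reports the real occupancy, the scan finds no entry with fd == -1, the loop index runs off the end of the table, and the returned index addresses a slot that was never assigned the new fd, which is exactly the `assert (slot->fd == fd)` that fires in event_register_epoll.

```python
EVENT_EPOLL_SLOTS = 1024

def event_slot_alloc(table, slots_used, fd):
    """Model of __event_slot_alloc's slot scan (sketch, not the real C code).

    Bug modeled: if no entry has fd == -1, the loop falls through with
    i == EVENT_EPOLL_SLOTS, yet an index is still returned.
    """
    i = 0
    while i < EVENT_EPOLL_SLOTS:
        if table[i] == -1:
            table[i] = fd           # claim the free entry
            slots_used[0] += 1      # accounting happens only here
            break
        i += 1
    # As in the C code: no check whether the scan actually found a free entry.
    return i                        # real code: table_idx * EVENT_EPOLL_SLOTS + i

# slots_used says 1021 (< 1024), so the allocator picks this table, but the
# ref/unref race discussed later means every entry is actually occupied.
table = [100 + n for n in range(EVENT_EPOLL_SLOTS)]   # no fd == -1 anywhere
slots_used = [1021]

idx = event_slot_alloc(table, slots_used, fd=1340)
print(idx)                          # 1024: one past the end of the table
assert idx == EVENT_EPOLL_SLOTS
# The slot this index resolves to never received fd 1340, so the
# assert (slot->fd == fd) in event_register_epoll aborts the process.
```

In the real code the out-of-range index lands in a neighboring slot of the registry, which is why gdb shows a slot with fd = -1 while new_sock is 1340.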
Thanks,
Mohit Agrawal

--- Additional comment from Jay Samson on 2019-10-01 11:48:10 UTC ---

Hello Mohit,

The customer has updated this today:

"Actually, in taking a look, it does look like one of the gluster pods core dumped Saturday morning. I'm grabbing the core file now to upload. The problem is, even though we do tend to see a similar trend here with WHEN the files are generated (usually Saturday mornings, 3am-7am or so), I have a really hard time trying to figure out just what is going on in the cluster at that time that may be causing this, or trying to map load back to specific PVCs, etc. Last week we updated these nodes to RHEL 7.7 and they were rebooted as well, just FYI."

He has also attached the core file. I have copied it to supportshell. (Please let me know if you have a system IP or something where I can copy this file.)

I will verify the feasibility of installing a test patch with them and let you know their inputs. (I'll keep the needinfo on me until this is available.)

Thanks,
Jay

--- Additional comment from Mohit Agrawal on 2019-10-01 13:33:33 UTC ---

Hi,

I don't have any extra system; kindly upload the new corefile to the same system.

Regards,
Mohit Agrawal

--- Additional comment from Jay Samson on 2019-10-01 14:26:16 UTC ---

Hello,

The latest corefile is copied to 10.72.93.42 under /, filename: core.9838.gz

Regards,
Jay

--- Additional comment from Mohit Agrawal on 2019-10-01 15:04:05 UTC ---

Hi Jay,

It seems there is sufficient space available on the system; can you please untar the core and set up a gdb environment to debug it? Kindly ask for the logs as well if they are not already available.

Regards,
Mohit Agrawal

--- Additional comment from Raghavendra G on 2019-10-01 17:21:16 UTC ---

(In reply to Mohit Agrawal from comment #38)
> 
> I think we need to update the code.
> I have checked the event code, I am not able to figure out why slots_used
> is not showing correct value.
> Ideally slots_used value should be 1024 because in registry table no index
> is free and before returning
> the index it should validate fd is successfully assigned or not.
> 
> I am still checking the code, need some more time to understand the issue.

I found the reason why slots_used for table 0 was wrong. From one of the logs of a crashed brick process:

var-lib-heketi-mounts-vg_862823908fb7cc063683414098b0ab22-brick_99bcb9083a46f31076bfc008c88791ef-brick.log-20190817:[2019-08-17 04:14:03.801645] E [MSGID: 101063] [event-epoll.c:611:event_dispatch_epoll_handler] 0-epoll: stale fd found on idx=500, gen=130, events=1, slot->gen=132

The message "stale fd found ..." indicates that the slot is already deallocated, and hence slot->ref would be 0. However:

1. event_dispatch_epoll_handler calls slot_ref on this slot (ref would be 0+1 = 1 at this point).
2. event_dispatch_epoll_handler identifies it as a stale fd, hence the slot is not valid, and exits.
3. event_dispatch_epoll_handler calls slot_unref on this slot (ref would be 1-1 = 0 again, resulting in a dealloc again without any slot_alloc).

Note that only event_slot_alloc increments event_pool->slots_used[table_idx] (event_slot_get doesn't). This means that for this slot there was one call to event_slot_alloc but two calls to event_slot_dealloc, and hence the value of slots_used[0] would actually be 1 less than the actual number of slots allocated in table 0.

Coming back to the log message printed above, idx=500 means this slot belongs to table 0, and hence there is wrong accounting (1 less than the actual value) in table 0. So if event_pool->slots_used[0] is 1023, the table is actually full. But event_slot_alloc would wrongly believe there is one slot available and will loop through all slots of table[0] without any success.
But at the end of __event_slot_alloc we don't actually test whether we got an empty slot; instead we assume that the iterator 'i' is pointing to an empty slot in table 0 and construct an index out of it. As Mohit has already pointed out above, since that index is not pointing to a valid slot, the slot does not contain the expected fd, resulting in the assertion failure.

The root cause here is that slot->ref is not incremented atomically when the slot is allocated. Instead it is done later as part of event_slot_get. If in this window we happen to run into ref/unref cycles (as explained above), it results in more calls to event_slot_dealloc than needed, and hence wrong accounting of slots_used in the slot table.

The fix would be:
1. Increment slot->ref atomically in __event_slot_alloc.
2. Add checks to __event_slot_alloc to verify that it actually returns a valid slot instead of assuming it does.

> 
> Thanks,
> Mohit Agrawal

--- Additional comment from Raghavendra G on 2019-10-02 02:46:39 UTC ---

(In reply to Raghavendra G from comment #46)
> 
> The root cause here is that the slot->ref is not incremented atomically when
> slot is allocated. Instead it is done later as part of event_slot_get. If in
> this window we happen to run into ref/unref cycles (as explained above)
> it would result in more calls to event_slot_dealloc than
> actually needed resulting in wrong accounting of slots_used in slot table.
> 
> The fix would be:
> 1. increment slot->ref atomically in __event_slot_alloc
> 2. Add checks to __event_slot_alloc whether it actually returns a valid slot
> instead of assuming it does.

While the problem I mentioned above related to the non-atomic increment of slot->ref while allocating a slot is a valid one, the other problem is the ref/unref cycle on a deallocated slot, which results in deallocation of an already-deallocated slot. This problem won't be fixed by atomically incrementing slot->ref in __event_slot_alloc.
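[Editorial note] The accounting race described above can be made concrete with a small standalone Python model. This is a sketch only, not the real C code: the real event-epoll.c tracks many slots with per-slot locks, while this model keeps one slot and the slots_used counter. The function names (slot_alloc, slot_ref, slot_unref, slot_dealloc) mirror the C functions. It shows how one allocation paired with two deallocations, caused by a ref/unref cycle on an already-freed slot, leaves slots_used one below the true count.

```python
class Slot:
    def __init__(self):
        self.fd = -1
        self.gen = 0
        self.ref = 0

class Pool:
    def __init__(self):
        self.slot = Slot()
        self.slots_used = 0     # accounting for this table

def slot_alloc(pool, fd):
    # __event_slot_alloc: marks the slot used and bumps the counter, but
    # (pre-fix) does NOT take the initial ref atomically here.
    pool.slot.fd = fd
    pool.slot.gen += 1
    pool.slots_used += 1

def slot_ref(pool):
    pool.slot.ref += 1

def slot_unref(pool):
    pool.slot.ref -= 1
    if pool.slot.ref == 0:
        slot_dealloc(pool)      # ref hitting zero frees the slot

def slot_dealloc(pool):
    pool.slot.fd = -1
    pool.slots_used -= 1        # only alloc/dealloc touch the counter

pool = Pool()
slot_alloc(pool, fd=42)         # one alloc: slots_used == 1
slot_ref(pool)                  # owner takes its ref later (the race window)
slot_unref(pool)                # normal teardown: dealloc #1, slots_used == 0

# An epoll event for the old generation now arrives ("stale fd found ..."):
slot_ref(pool)                  # handler refs the already-freed slot (0 -> 1)
slot_unref(pool)                # ...sees it is stale and unrefs: dealloc #2

print(pool.slots_used)          # -1: one less than the slots really allocated
assert pool.slots_used == -1
```

In a brick process with many live slots the counter does not go negative; it simply ends up one below the true occupancy, which is why a full table can report 1021 or 1023 used slots.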
The fix for this problem would require preventing ref/unref cycles and/or deallocation of an unused socket. One trivial fix that comes to mind is to not return a slot at all (instead return NULL) if the index is not pointing to a valid slot (a valid slot has a non-negative fd and a generation number that matches the value passed as an argument to event_slot_get). We can continue discussions on this over the patch on Gerrit.

--- Additional comment from Mohit Agrawal on 2019-10-03 03:29:35 UTC ---

Hi Jay,

It seems there is not sufficient space available on the system (10.72.93.42); can you please unzip the core and set up a gdb environment to debug it? Kindly ask for the logs as well if they are not available.

Regards,
Mohit Agrawal

--- Additional comment from Jay Samson on 2019-10-03 13:28:34 UTC ---

Hi Mohit,

The core debug env is set up (access instructions same as Comment #30):

# gdb /usr/sbin/glusterfsd core.9838
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.
Core was generated by `/usr/sbin/glusterfsd -s 10.7.199.47 --volfile-id vol_ad9ae40b319276e6e36ff24f81'.
Program terminated with signal 6, Aborted.
#0 0x00007fb134b52207 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55 55 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig); (gdb) t a a bt Thread 3594 (Thread 0x7fae1f7fe700 (LWP 12072)): #0 0x00007fb134c10f73 in select () at ../sysdeps/unix/syscall-template.S:81 #1 0x00007fb12d8d4aca in changelog_ev_dispatch (data=0x7fae9beba838) at changelog-ev-handle.c:350 #2 0x00007fb135352dd5 in start_thread (arg=0x7fae1f7fe700) at pthread_create.c:307 #3 0x00007fb134c19ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 Thread 3593 (Thread 0x7fa90143a700 (LWP 46542)): #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 #1 0x00007fb12c5f4f50 in iot_worker (data=0x7fac1ec29890) at io-threads.c:194 #2 0x00007fb135352dd5 in start_thread (arg=0x7fa90143a700) at pthread_create.c:307 #3 0x00007fb134c19ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 Thread 3592 (Thread 0x7faa7f7fe700 (LWP 14802)): #0 0x00007fb134be0e2d in nanosleep () at ../sysdeps/unix/syscall-template.S:81 #1 0x00007fb134be0cc4 in __sleep (seconds=0, seconds@entry=5) at ../sysdeps/unix/sysv/linux/sleep.c:137 #2 0x00007fb12ea3f5c1 in posix_disk_space_check_thread_proc (data=0x7fab07c5d7a0) at posix-helpers.c:1920 #3 0x00007fb135352dd5 in start_thread (arg=0x7faa7f7fe700) at pthread_create.c:307 #4 0x00007fb134c19ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 Thread 3591 (Thread 0x7fabb6158700 (LWP 14062)): #0 0x00007fb134c10f73 in select () at ../sysdeps/unix/syscall-template.S:81 #1 0x00007fb12d8d4aca in changelog_ev_dispatch (data=0x7fac0685e0b8) at changelog-ev-handle.c:350 #2 0x00007fb135352dd5 in start_thread (arg=0x7fabb6158700) at pthread_create.c:307 #3 0x00007fb134c19ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 Thread 3590 (Thread 0x7fad5ad98700 (LWP 12601)): #0 0x00007fb134c10f73 in select () at ../sysdeps/unix/syscall-template.S:81 #1 
0x00007fb12d8d4aca in changelog_ev_dispatch (data=0x7fad95d8c0a8) at changelog-ev-handle.c:350 #2 0x00007fb135352dd5 in start_thread (arg=0x7fad5ad98700) at pthread_create.c:307 #3 0x00007fb134c19ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 <snipped> Regards, Jay --- Additional comment from Mohit Agrawal on 2019-10-14 04:57:03 UTC --- Hi, Thanks for setup the environment to debug the core. The new core(./case/core.9838) is also similar to the older one. >>>>>>>>>>>>>>>>>>> #0 0x00007fb134b52207 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55 #1 0x00007fb134b538f8 in __GI_abort () at abort.c:90 #2 0x00007fb134b4b026 in __assert_fail_base (fmt=0x7fb134ca5ea0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7fb1365c207a "slot->fd == fd", file=file@entry=0x7fb1365c2054 "event-epoll.c", line=line@entry=417, function=function@entry=0x7fb1365c2420 <__PRETTY_FUNCTION__.11118> "event_register_epoll") at assert.c:92 #3 0x00007fb134b4b0d2 in __GI___assert_fail (assertion=assertion@entry=0x7fb1365c207a "slot->fd == fd", file=file@entry=0x7fb1365c2054 "event-epoll.c", line=line@entry=417, function=function@entry=0x7fb1365c2420 <__PRETTY_FUNCTION__.11118> "event_register_epoll") at assert.c:101 #4 0x00007fb136574d04 in event_register_epoll (event_pool=0x561792cb1150, fd=<optimized out>, handler=<optimized out>, data=<optimized out>, poll_in=<optimized out>, poll_out=<optimized out>, notify_poller_death=0 '\000') at event-epoll.c:417 #5 0x00007fb131509eb2 in socket_server_event_handler (fd=<optimized out>, idx=<optimized out>, gen=<optimized out>, data=0x7fb12003eee0, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000') at socket.c:2950 #6 0x00007fb136575870 in event_dispatch_epoll_handler (event=0x7faceb1ece30, event_pool=0x561792cb1150) at event-epoll.c:643 #7 event_dispatch_epoll_worker (data=0x7fad1d7569a0) at event-epoll.c:759 #8 0x00007fb135352dd5 in start_thread 
(arg=0x7faceb1ed700) at pthread_create.c:307
#9  0x00007fb134c19ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>>>>>>>>>>>>>>>>>>>

Thanks,
Mohit Agrawal

--- Additional comment from Mohit Agrawal on 2019-10-18 06:01:22 UTC ---

Hi,

It seems the brick process is crashing because the function event_slot_alloc is not able to return a valid slot.

bt
#0  0x00007f9efaed5207 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007f9efaed68f8 in __GI_abort () at abort.c:90
#2  0x00007f9efaece026 in __assert_fail_base (fmt=0x7f9efb028ea0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7f9efc94507a "slot->fd == fd", file=file@entry=0x7f9efc945054 "event-epoll.c", line=line@entry=417, function=function@entry=0x7f9efc945420 <__PRETTY_FUNCTION__.11118> "event_register_epoll") at assert.c:92
#3  0x00007f9efaece0d2 in __GI___assert_fail (assertion=assertion@entry=0x7f9efc94507a "slot->fd == fd", file=file@entry=0x7f9efc945054 "event-epoll.c", line=line@entry=417, function=function@entry=0x7f9efc945420 <__PRETTY_FUNCTION__.11118> "event_register_epoll") at assert.c:101
#4  0x00007f9efc8f7d04 in event_register_epoll (event_pool=0x563fac588150, fd=<optimized out>, handler=<optimized out>, data=<optimized out>, poll_in=<optimized out>, poll_out=<optimized out>, notify_poller_death=0 '\000') at event-epoll.c:417
#5  0x00007f9ef798ceb2 in socket_server_event_handler (fd=<optimized out>, idx=<optimized out>, gen=<optimized out>, data=0x7f9ee80403f0, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000') at socket.c:2950
#6  0x00007f9efc8f8870 in event_dispatch_epoll_handler (event=0x7f98fffc9e70, event_pool=0x563fac588150) at event-epoll.c:643
#7  event_dispatch_epoll_worker (data=0x7f992c8ea110) at event-epoll.c:759
#8  0x00007f9efb6d5dd5 in start_thread (arg=0x7f98fffca700) at pthread_create.c:307
#9  0x00007f9efaf9cead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) f 4
#4  0x00007f9efc8f7d04 in event_register_epoll (event_pool=0x563fac588150, fd=<optimized out>, handler=<optimized out>, data=<optimized out>, poll_in=<optimized out>, poll_out=<optimized out>, notify_poller_death=0 '\000') at event-epoll.c:417
417         assert (slot->fd == fd);
(gdb) p slot
$3184 = (struct event_slot_epoll *) 0x7f9e247f81b0
(gdb) p *slot
$3185 = {fd = -1, events = 1073741851, gen = 216, idx = 0, ref = 1, do_close = 1, in_handler = 0, handled_error = 0, data = 0x7f9e3c050be0, handler = 0x7f9ef7989980 <socket_event_handler>, lock = {spinlock = 0, mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}, poller_death = {next = 0x7f9e247f8208, prev = 0x7f9e247f8208}}

(gdb) p event_pool->slots_used
$3188 = {1021, 10, 0 <repeats 1022 times>}

(gdb) f 4

After printing the complete fd table, I am not able to find any entry for socket fd 1340:

(gdb) set $tmp = 1
(gdb) while ($tmp != 1023)
 >print ((struct event_slot_epoll *)event_pool->ereg[0])[$tmp].fd
 >set $tmp = $tmp + 1
 >end

(gdb) f 5
(gdb) p new_sock
1340

As per the current code of event_slot_alloc, it first checks the value of slots_used to find a table with a free entry.
The current backtrace shows the value of slots_used for table 0 is 1021 (less than 1024), which means the table still appears to have free slots, so the allocator selects this registry index (0):

>>>>>>>>>>>>>>>>>>>>>
for (i = 0; i < EVENT_EPOLL_TABLES; i++) {
    switch (event_pool->slots_used[i]) {
        case EVENT_EPOLL_SLOTS:
            continue;
        case 0:
            if (!event_pool->ereg[i]) {
                table = __event_newtable(event_pool, i);
                if (!table)
                    return -1;
            } else {
                table = event_pool->ereg[i];
            }
            break;
        default:
            table = event_pool->ereg[i];
            break;
    }

    if (table)
        /* break out of the loop */
        break;
}
if (!table)
    return -1;

table_idx = i;
>>>>>>>>>>>>>>>>>>>>>>>>>

In the code below it then scans that table for a free entry. As per the current slots_used value, ideally 3 entries should be free in the table, but somehow no entry is actually free here. The code does not validate the fd assignment in the table; it just returns the idx.

>>>>>>>>>>>>>>>>>>>>>>>>>>
for (i = 0; i < EVENT_EPOLL_SLOTS; i++) {
    if (table[i].fd == -1) {
        /* wipe everything except bump the generation */
        gen = table[i].gen;
        memset(&table[i], 0, sizeof(table[i]));
        table[i].gen = gen + 1;

        LOCK_INIT(&table[i].lock);
        INIT_LIST_HEAD(&table[i].poller_death);

        table[i].fd = fd;
        if (notify_poller_death) {
            table[i].idx = table_idx * EVENT_EPOLL_SLOTS + i;
            list_add_tail(&table[i].poller_death, &event_pool->poller_death);
        }

        event_pool->slots_used[table_idx]++;
        break;
    }
}

return table_idx * EVENT_EPOLL_SLOTS + i;
>>>>>>>>>>>>>>>>>

I think we need to update the code. I have checked the event code but cannot figure out why slots_used is not showing the correct value. Ideally slots_used should be 1024, because no index in the registry table is free, and before returning the index the code should validate whether the fd was successfully assigned.

RCA: slot->ref is not incremented atomically when the slot is allocated. Instead it is done later as part of event_slot_get.
If in this window we happen to run into ref/unref cycles (as explained above), it results in more calls to event_slot_dealloc than needed, resulting in wrong accounting of slots_used in the slot table.

The fix would be:
1. Increment slot->ref atomically in __event_slot_alloc.
2. Add checks to __event_slot_alloc to verify that it actually returns a valid slot instead of assuming it does.

Thanks,
Mohit Agrawal

--- Additional comment from Mohit Agrawal on 2019-10-18 06:05:29 UTC ---

A patch has been posted to resolve this: https://review.gluster.org/#/c/glusterfs/+/23508/

--- Additional comment from Worker Ant on 2019-10-22 13:47:20 UTC ---

REVIEW: https://review.gluster.org/23508 (rpc: Synchronize slot allocation code) merged (#7) on master by Raghavendra G
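[Editorial note] The details of the merged change live in the Gerrit review linked above. As a rough, hypothetical illustration only — not the actual patch — the following Python sketch combines the two remedies proposed in this bug: take the initial ref under the same lock that allocates the slot, report allocation failure instead of fabricating an index, and refuse to hand out a slot whose fd or generation does not match. The names (SyncPool, slot_alloc_sync, slot_get_checked) are invented for this sketch.

```python
import threading

EVENT_EPOLL_SLOTS = 8       # small table for the sketch

class Slot:
    def __init__(self):
        self.fd = -1
        self.gen = 0
        self.ref = 0

class SyncPool:
    def __init__(self):
        self.lock = threading.Lock()
        self.table = [Slot() for _ in range(EVENT_EPOLL_SLOTS)]
        self.slots_used = 0

def slot_alloc_sync(pool, fd):
    """Allocate a slot, taking the initial ref under the allocation lock."""
    with pool.lock:
        for i, s in enumerate(pool.table):
            if s.fd == -1:
                s.fd = fd
                s.gen += 1
                s.ref = 1              # remedy 1: ref taken atomically with alloc
                pool.slots_used += 1
                return i
        return -1                      # remedy 2: signal failure, no fake index

def slot_get_checked(pool, idx, gen):
    """Return the slot only if it is still the one the caller expects."""
    with pool.lock:
        s = pool.table[idx]
        if s.fd < 0 or s.gen != gen:
            return None                # stale: never ref a deallocated slot
        s.ref += 1
        return s

pool = SyncPool()
idx = slot_alloc_sync(pool, fd=1340)
assert idx == 0 and pool.table[idx].ref == 1
assert slot_get_checked(pool, idx, gen=999) is None   # stale generation rejected
```

With the stale-generation check in place, the ref/unref cycle on a freed slot never starts, so the double-dealloc accounting error modeled earlier cannot occur.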
REVIEW: https://review.gluster.org/24166 (rpc: Synchronize slot allocation code) posted (#2) for review on release-5 by MOHIT AGRAWAL
This bug has been moved to https://github.com/gluster/glusterfs/issues/1057 and will be tracked there from now on. Visit the GitHub issue URL for further details.