Bug 1420966

Summary: c2s killed by SIGABRT due to assertion failure with sx_nad_write_elem()
Product: Red Hat Satellite 5
Reporter: Neal Kim <nkim>
Component: Server
Assignee: Tomáš Kašpárek <tkasparek>
Status: CLOSED DEFERRED
QA Contact: Red Hat Satellite QA List <satqe-list>
Severity: high
Priority: high
Version: 570
CC: byodlows, dsafford, dyordano, jgiordan, jhutar, nkim, pgervase, tkasparek, tlestach
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: x86_64
OS: Linux
Last Closed: 2018-04-10 08:26:52 UTC
Type: Bug

Description Neal Kim 2017-02-10 02:28:00 UTC
Description of problem:

Process /usr/bin/c2s was killed by signal 6 (SIGABRT)

#0  0x0000003c490325e5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x0000003c49033dc5 in abort () at abort.c:92
#2  0x0000003c4902b70e in __assert_fail_base (fmt=<value optimized out>, assertion=0x422512 "(int) (nad != ((void *)0))", file=0x4224c1 "io.c", line=<value optimized out>, 
    function=<value optimized out>) at assert.c:96
#3  0x0000003c4902b7d0 in __assert_fail (assertion=0x422512 "(int) (nad != ((void *)0))", file=0x4224c1 "io.c", line=422, function=0x4227b0 "sx_nad_write_elem") at assert.c:105
#4  0x000000000040ea39 in sx_nad_write_elem (s=0x277c3c0, nad=<value optimized out>, elem=<value optimized out>) at io.c:422
#5  0x0000000000408fe4 in c2s_router_sx_callback (s=<value optimized out>, e=<value optimized out>, data=0x26af540, arg=0x1448030) at c2s.c:1135
#6  0x0000000000410489 in __sx_event (file=0x4224c1 "io.c", line=156, s=0x148e650, e=event_PACKET, data=0x26af540) at sx.c:338
#7  0x000000000040f009 in _sx_process_read (s=0x148e650, buf=<value optimized out>) at io.c:156
#8  0x000000000040f568 in sx_can_read (s=0x148e650) at io.c:243
#9  0x00000000004081e1 in c2s_router_mio_callback (m=<value optimized out>, a=<value optimized out>, fd=0x149b020, data=<value optimized out>, arg=0x1448030) at c2s.c:1309
#10 0x00000000004169dc in _mio_run (m=0x1463140, timeout=<value optimized out>) at mio_impl.h:257
#11 0x000000000040cadd in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:732
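
The top frames show the mechanism: c2s_router_sx_callback() (c2s.c:1135) passed a NULL nad to sx_nad_write_elem(), the assertion at io.c:422 failed, and assert() called abort(), which raised SIGABRT. Below is a minimal, hypothetical C sketch of that failure mode and of a caller-side check that drops the packet instead of aborting. The names demo_nad_t, demo_write_elem and demo_forward_packet are illustrative stand-ins, not jabberd2's API, and this is not a reproduction of the upstream commit linked below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-in for jabberd2's nad_t. */
typedef struct {
    int ecur; /* number of elements parsed into this document */
} demo_nad_t;

/* Counts packets that actually reached the writer (illustration only). */
int demo_writes = 0;

/* Mirrors the precondition in the backtrace (io.c:422): a NULL nad trips
 * assert(), which calls abort() and raises SIGABRT (frames #0-#3). */
void demo_write_elem(demo_nad_t *nad, int elem) {
    assert(nad != NULL);
    assert(elem >= 0 && elem < nad->ecur);
    demo_writes++;
}

/* Hypothetical caller-side guard: log and drop a packet whose nad is
 * missing or empty instead of handing it to the writer.
 * Returns 0 if the packet was written, -1 if it was dropped. */
int demo_forward_packet(demo_nad_t *nad) {
    if (nad == NULL || nad->ecur <= 0) {
        fprintf(stderr, "dropping packet with missing or empty nad\n");
        return -1;
    }
    demo_write_elem(nad, 0);
    return 0;
}
```

Calling demo_write_elem(NULL, 0) directly reproduces the abort seen above, while demo_forward_packet(NULL) returns -1 and the process keeps running. Whether dropping the packet is the right recovery for c2s, as opposed to fixing whatever produced the NULL nad, is not settled by this sketch.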


Version-Release number of selected component (if applicable):

jabberd-2.2.8-23.el6sat


How reproducible:

Consistently, but not on demand.


Steps to Reproduce:

1. Start jabber services
2. Wait 1 or 2 days
3. Observe that c2s has been killed


Actual results:

Process /usr/bin/c2s was killed by signal 6 (SIGABRT)


Expected results:

c2s keeps running and is not killed.


Additional info:

This customer has a relatively large number of OSAD clients (5000+) and is experiencing load-related issues, among other problems. c2s should not die unexpectedly under this load.

I found a similar backtrace reported here:

http://jabberd2.xiaoka.narkive.com/Zjrhn2jY/c2s-is-crashing-under-high-load-of-register-requests

That thread mentions this commit:

https://github.com/Jabberd2/jabberd2/commit/5b7acbeab8ad60dfd

Comment 15 Dana Safford 2017-09-08 22:09:09 UTC
Due to the customer escalation, I set the 'customer escalation' flag.

Comment 19 Tomas Lestach 2018-04-10 08:26:52 UTC
We have re-reviewed this bug as part of an ongoing effort to improve the review and backlog handling of Satellite/Proxy feature and bug updates.

This bug currently has no open customer cases. While this bug may still be valid, we do not expect it to be implemented before the EOL of the Satellite 5.x product. As such, it is being CLOSED DEFERRED.