RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1811632 - pl_DefWrite() crashes in callback *PRWriteFN
Summary: pl_DefWrite() crashes in callback *PRWriteFN
Keywords:
Status: CLOSED DUPLICATE of bug 1801327
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: 389-ds-base
Version: 7.6
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: rc
: ---
Assignee: thierry bordaz
QA Contact: RHDS QE
URL:
Whiteboard:
: 1843676 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-03-09 11:42 UTC by amitkuma
Modified: 2023-12-15 17:28 UTC (History)
12 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-23 00:13:05 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Comment 5 thierry bordaz 2020-03-12 08:46:07 UTC
Thanks for the debug environment

    - Crashing thread is sasl bind sending back data
[29/Feb/2020:00:11:37.005277765 -0500] conn=1759899 fd=169 slot=169 connection from 10.8.4.59 to 10.241.127.197
[29/Feb/2020:00:11:37.005922272 -0500] conn=1759899 op=0 SRCH base="" scope=0 filter="(objectClass=*)" attrs="* altServer namingContexts supportedControl supportedExtension supportedFeatures supportedLDAPVersion supportedSASLMechanisms domaincontrollerfunctionality defaultnamingcontext lastusn highestcommittedusn aci"
[29/Feb/2020:00:11:37.007175023 -0500] conn=1759899 op=0 RESULT err=0 tag=101 nentries=1 etime=0.0001492091
[29/Feb/2020:00:11:37.112671578 -0500] conn=1759899 op=1 BIND dn="" method=sasl version=3 mech=GSSAPI
[29/Feb/2020:00:11:37.170833924 -0500] conn=1759899 op=1 RESULT err=14 tag=97 nentries=0 etime=0.1835246772, SASL bind in progress
[29/Feb/2020:00:11:37.172148132 -0500] conn=1759899 op=2 BIND dn="" method=sasl version=3 mech=GSSAPI
[29/Feb/2020:00:11:37.172995298 -0500] conn=1759899 op=2 RESULT err=14 tag=97 nentries=0 etime=0.0000872759, SASL bind in progress
[29/Feb/2020:00:11:37.174364349 -0500] conn=1759899 op=3 BIND dn="" method=sasl version=3 mech=GSSAPI

(gdb) where
#0  0x00007f11a4d411e4 in pl_DefWrite (fd=<optimized out>, buf=0x56395a100000, amount=14) at ../../../nspr/pr/src/io/prlayer.c:87
#1  0x00005638dbd9c431 in write_function (ignore=0, handle=0x563952c0f4a0, count=14, buffer=0x56395a100000) at ldap/servers/slapd/daemon.c:2002
#2  0x00005638dbd9c431 in openldap_write_function (sbiod=<optimized out>, buf=0x56395a100000, len=14) at ldap/servers/slapd/daemon.c:2076
#3  0x00007f11a590292e in ber_int_sb_write (sb=sb@entry=0x5638dfd75770, buf=0x56395a100000, len=len@entry=14) at sockbuf.c:445
#4  0x00007f11a58fec0b in ber_flush2 (sb=0x5638dfd75770, ber=ber@entry=0x5639032e1220, freeit=freeit@entry=1) at io.c:246
#5  0x00007f11a58fed1d in ber_flush (sb=<optimized out>, ber=ber@entry=0x5639032e1220, freeit=freeit@entry=1) at io.c:211
#6  0x00007f11a6f982ec in flush_ber (pb=pb@entry=0x5638dfd691a0, conn=0x5638dfd85d80, op=0x563903430000, ber=ber@entry=0x5639032e1220, type=type@entry=0)
    at ldap/servers/slapd/result.c:1761
#7  0x00007f11a6f9a4f0 in send_ldap_result_ext (pb=pb@entry=0x5638dfd691a0, err=err@entry=0, matched=matched@entry=0x0, text=<optimized out>, 
    text@entry=0x0, nentries=nentries@entry=0, urls=urls@entry=0x0, ber=0x5639032e1220, ber@entry=0x0) at ldap/servers/slapd/result.c:600
#8  0x00007f11a6f9a6b1 in send_ldap_result (pb=pb@entry=0x5638dfd691a0, err=err@entry=0, matched=matched@entry=0x0, text=text@entry=0x0, nentries=nentries@entry=0, urls=urls@entry=0x0) at ldap/servers/slapd/result.c:199
#9  0x00005638dbdabbfd in ids_sasl_check_bind (pb=pb@entry=0x5638dfd691a0) at ldap/servers/slapd/saslbind.c:1125
#10 0x00005638dbd93871 in do_bind (pb=pb@entry=0x5638dfd691a0) at ldap/servers/slapd/bind.c:363
#11 0x00005638dbd9a4b8 in connection_dispatch_operation (pb=0x5638dfd691a0, op=0x563903430000, conn=0x5638dfd85d80) at ldap/servers/slapd/connection.c:600
#12 0x00005638dbd9a4b8 in connection_threadmain () at ldap/servers/slapd/connection.c:1785
#13 0x00007f11a4d5abab in _pt_root (arg=0x5638dfc446c0) at ../../../nspr/pr/src/pthreads/ptthread.c:201
#14 0x00007f11a46fadd5 in start_thread (arg=0x7f11781b9700) at pthread_create.c:307
#15 0x00007f11a3da6ead in __libc_ifunc_impl_list (name=<optimized out>, array=0x7f11781b9700, max=<optimized out>) at ../sysdeps/x86_64/multiarch/ifunc-impl-list.c:338
#16 0x0000000000000000 in None ()

    - It crashed because the lower layer (containing callbacks) was NULL
(gdb) frame 0
(gdb) x/i pl_DefWrite
   0x7f11a4d411e0 <pl_DefWrite>:	mov    0x10(%rdi),%rdi
(gdb) x/10i pl_DefWrite
   0x7f11a4d411e0 <pl_DefWrite>:	mov    0x10(%rdi),%rdi   <--- 'rdi' was the fd handle. 0x10(%rdi) was fd->lower
=> 0x7f11a4d411e4 <pl_DefWrite+4>:	mov    (%rdi),%rax
(gdb) print $rdi
$8 = 0
(gdb) ptype fd
type = struct PRFileDesc {
    const PRIOMethods *methods;
    PRFilePrivate *secret;
    PRFileDesc *lower;      <--- 'lower' layer was NULL
    PRFileDesc *higher;
    void (*dtor)(PRFileDesc *);
    PRDescIdentity identity;
} *

    - But in the dumped core we can see that it is *not* NULL
(gdb) thread 1
(gdb) frame 1
(gdb) print *((PRFileDesc *)handle)->lower
$6 = {methods = 0x7f11a4f6d580 <ipv6_to_v4_tcpMethods>, secret = 0x0, lower = 0x563903e198c0, higher = 0x563952c0f4a0, dtor = 0x7f11a4d41390 <pl_FDDestructor>, 
  identity = 1}
(gdb) print *((PRFileDesc *)handle)->lower->methods
$7 = {file_type = PR_DESC_LAYERED, close = 0x7f11a4d41bc0 <pl_TopClose>, read = 0x7f11a4d411d0 <pl_DefRead>, write = 0x7f11a4d411e0 <pl_DefWrite>, 
  available = 0x7f11a4d411f0 <pl_DefAvailable>, available64 = 0x7f11a4d41200 <pl_DefAvailable64>, fsync = 0x7f11a4d41210 <pl_DefFsync>, 
  seek = 0x7f11a4d41220 <pl_DefSeek>, seek64 = 0x7f11a4d41230 <pl_DefSeek64>, fileInfo = 0x7f11a4d41240 <pl_DefFileInfo>, 
  fileInfo64 = 0x7f11a4d41250 <pl_DefFileInfo64>, writev = 0x7f11a4d41260 <pl_DefWritev>, connect = 0x7f11a4d40f20 <Ipv6ToIpv4SocketConnect>, 
  accept = 0x7f11a4d40c80 <Ipv6ToIpv4SocketAccept>, bind = 0x7f11a4d40e70 <Ipv6ToIpv4SocketBind>, listen = 0x7f11a4d412b0 <pl_DefListen>, 
  shutdown = 0x7f11a4d412c0 <pl_DefShutdown>, recv = 0x7f11a4d412d0 <pl_DefRecv>, send = 0x7f11a4d412e0 <pl_DefSend>, recvfrom = 0x7f11a4d412f0 <pl_DefRecvfrom>, 
  sendto = 0x7f11a4d41300 <pl_DefSendto>, poll = 0x7f11a4d41310 <pl_DefPoll>, acceptread = 0x7f11a4d40ae0 <Ipv6ToIpv4SocketAcceptRead>, 
  transmitfile = 0x7f11a4d41330 <pl_DefTransmitfile>, getsockname = 0x7f11a4d40a70 <Ipv6ToIpv4SocketGetName>, 
  getpeername = 0x7f11a4d40a00 <Ipv6ToIpv4SocketGetPeerName>, reserved_fn_6 = 0x7f11a4d40230 <_PR_InvalidInt>, reserved_fn_5 = 0x7f11a4d40230 <_PR_InvalidInt>, 
  getsocketoption = 0x7f11a4d41360 <pl_DefGetsocketoption>, setsocketoption = 0x7f11a4d41370 <pl_DefSetsocketoption>, sendfile = 0x7f11a4d41380 <pl_DefSendfile>, 
  connectcontinue = 0x7f11a4d41280 <pl_DefConnectcontinue>, reserved_fn_3 = 0x7f11a4d40230 <_PR_InvalidInt>, reserved_fn_2 = 0x7f11a4d40230 <_PR_InvalidInt>, 
  reserved_fn_1 = 0x7f11a4d40230 <_PR_InvalidInt>, reserved_fn_0 = 0x7f11a4d40230 <_PR_InvalidInt>}

    In conclusion:
        I suspect two threads (thread 1 and another one not identified in the core) competed to use the connect.
        Thread 1 (sasl bind) registered callbacks and at the same time the other thread pushed
        them as layer callback.
        The push reset fd->lower for a transient period, that triggered the crash
        It matches BZ https://bugzilla.redhat.com/show_bug.cgi?id=1756182

Next step:
    - How frequently does it crash ?
    - Did you identify a reproducer ?
    - Please update 389-ds version to 389-ds-base-1.3.10.1-3.el7 (bz 1756182) and check if crash happens again

Comment 6 amitkuma 2020-03-13 11:47:58 UTC
Response from Customer:

- How frequently does it crash ?
ns-slapd is crashing every few days, this may be more frequent under heavy load, but this is anecdotal at the moment.
    - Did you identify a reproducer ?
Unfortunately not
    - Please update 389-ds version to 389-ds-base-1.3.10.1-3.el7 (bz 1756182) and check if crash happens again "
I will discuss this internal.

Will revert once customer discusses internally comes back

Comment 20 Marc Sauton 2020-06-04 00:49:23 UTC
*** Bug 1843676 has been marked as a duplicate of this bug. ***

Comment 24 joel 2020-06-04 19:15:55 UTC
customer has verified their package:

 "The version is indeed 1.3.9.1-12.el7_7. 

So what's the best course of action for now?
"


Note You need to log in before you can comment on or make changes to this bug.