Bug 1341864 - [nunc-stans] Server crashes under load with SEGV in thread #1, when connection id=0, fd=0
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: 389-ds-base
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Noriko Hosoi
QA Contact: Viktor Ashirov
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-06-01 23:41 UTC by Noriko Hosoi
Modified: 2020-09-13 21:44 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-09 22:41:04 UTC
Target Upstream Version:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | 389ds 389-ds-base issues 1903 | 0 | None | None | None | 2020-09-13 21:44:19 UTC

Description Noriko Hosoi 2016-06-01 23:41:07 UTC
This bug is created as a clone of upstream ticket:
https://fedorahosted.org/389/ticket/48843

Background:

We had been running a Red Hat-supplied 389-ds-base implementation (1.3.4.0). Under load (up to 100 searches/s, 10-30 concurrent connections), this implementation caused TCP latency spikes of 1000 ms and multiples thereof, leading to TCP retransmissions, aborted connections, and generally poor performance.

A later version (1.3.4.5) from the 389 Directory Server project did not show any improvement.

An even later version, which we are currently using in production (1.3.4.9), has fixed the latency issue. We suspect that a patch in 1.3.4.7 brought the improvement (Ticket #48341: 0001-Ticket-48341-deadlock-on-connection-mutex.patch). With version 1.3.4.9, however, we are experiencing frequent crashes of the server under load (every couple of hours). We have not been able to reproduce the issue with artificial load patterns in our development environment.

The issue:

Currently we are trying to upgrade to 1.3.5.1 (or by now: 1.3.5.3) but are waiting for a fix for shadow attributes.
As there have been no significant changes to the connection handling since 1.3.4.9, I would like to flag the issue even before upgrading to 1.3.5.x (with debuginfo enabled for core dumps).

What we are seeing so far is:

* crashes are segfaults
* thread #1 is crashing
* the crashing function is connection_table_move_connection_out_of_active_list (in ldap/servers/slapd/conntable.c)
* connection id is 0, fd is 0
* an error log line "connection - conn=0 fd=0 Attempt to release connection that is not acquired" is always the last logged line before the crash.

This pattern is *always* the same.
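
To check other error logs for the same signature, a small scan along the following lines may help. This is only a sketch: the message text is the one quoted above, but the rest of the log line layout is an assumption, and the function names are hypothetical helpers, not part of 389-ds-base.

```python
import re

# Message text as quoted in this report. Everything before "conn=" on the
# log line (timestamp, subsystem prefix) is assumed and therefore ignored.
RELEASE_MSG = re.compile(
    r"conn=(?P<conn>\d+) fd=(?P<fd>\d+) "
    r"Attempt to release connection that is not acquired"
)


def find_release_errors(lines):
    """Return (line_number, conn, fd) for every line carrying the message."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = RELEASE_MSG.search(line)
        if match:
            hits.append((number, int(match.group("conn")), int(match.group("fd"))))
    return hits


def ends_with_release_error(lines):
    """True if the last non-blank line carries the message.

    This mirrors the observation that the message is always the last
    line logged before the crash.
    """
    non_blank = [line for line in lines if line.strip()]
    return bool(non_blank) and RELEASE_MSG.search(non_blank[-1]) is not None
```

Both functions take any iterable of strings, for example the result of readlines() on an errors log saved from just before a crash. A hit with conn=0 and fd=0 on the final line would match the pattern reported here; it does not by itself establish the cause.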

I will try to supply an up-to-date stack trace or a core dump once we have migrated to the current 1.3.5.x version.

Comment 3 Noriko Hosoi 2016-11-09 22:41:04 UTC
Triage meeting:
  Cannot reproduce the problem in-house.
  Most likely fixed by NS 0.2.0. I think we can close this with RHEL 7.4 and 1.3.6.

