Bug 1255042 - Slapd crashes reported from replication tests
Summary: Slapd crashes reported from replication tests
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: 389-ds-base
Version: 7.2
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Noriko Hosoi
QA Contact: Viktor Ashirov
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-08-19 13:32 UTC by Sankar Ramalingam
Modified: 2020-09-13 21:37 UTC (History)
3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-06-14 12:02:56 UTC
Target Upstream Version:
Embargoed:


Attachments
Backtraces for slapd crashes (258.43 KB, text/plain)
2015-08-19 13:32 UTC, Sankar Ramalingam
no flags Details
Stacktrace for slapd crash (258.43 KB, text/plain)
2015-08-19 13:33 UTC, Sankar Ramalingam
no flags Details
Stacktrace for slapd crash (35.95 KB, text/plain)
2015-08-19 13:34 UTC, Sankar Ramalingam
no flags Details
Stack trace from accPolicy tests (188.13 KB, text/plain)
2015-08-26 09:21 UTC, Sankar Ramalingam
no flags Details
Stacktrace from mmraccept tests (189.29 KB, text/plain)
2015-08-26 10:42 UTC, Sankar Ramalingam
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Github 389ds 389-ds-base issues 1734 0 None closed Slapd crashes reported from replication tests 2020-10-16 23:42:17 UTC

Description Sankar Ramalingam 2015-08-19 13:32:23 UTC
Created attachment 1064852 [details]
Backtraces for slapd crashes

Description of problem: Slapd crashes coming from multi-operations stress tests.


Version-Release number of selected component (if applicable): 389-ds-base-1.3.4.0-12


How reproducible: Consistently with the latest build of 389-ds-base-1.3.4.0-12


Steps to Reproduce:
1. Run Multi-operations stress tests on a beaker machine.
2. Job - https://beaker.engineering.redhat.com/jobs/1053896
3. Observe the crashes reported during execution.

Actual results: Slapd crashes


Expected results: No slapd crash


Additional info:

Comment 1 Sankar Ramalingam 2015-08-19 13:33:33 UTC
Created attachment 1064860 [details]
Stacktrace for slapd crash

Crash report from beaker

Comment 2 Sankar Ramalingam 2015-08-19 13:34:28 UTC
Created attachment 1064867 [details]
Stacktrace for slapd crash

Comment 4 Noriko Hosoi 2015-08-19 16:41:09 UTC
(In reply to Sankar Ramalingam from comment #1)
> Created attachment 1064860 [details]
> Stacktrace for slapd crash
> 
> Crash report from beaker

This is a stacktrace of ns-slapd.  It crashed while retrieving an entryrdn element from the entryrdn index during a delete operation.


> Sankar Ramalingam 2015-08-19 09:34:28 EDT
> Created attachment 1064867 [details]
> Stacktrace for slapd crash

This is a stacktrace of ldclt (not ns-slapd).


What do "Multi-operations stress tests" do?  Where can I see the test program?  Is it a test against a standalone server or MMR?

Are the cores (especially of ns-slapd) left on the beaker?  Can I see the error and access logs?  Also, ldclt logs?

Could it be possible to run the stress test with valgrind?
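For reference, a common way to run ns-slapd under valgrind is to stop the instance and start the daemon in the foreground; the instance name and log path below are illustrative placeholders, not taken from this bug:

```shell
# Illustrative only: stop the instance, then run ns-slapd in the foreground
# under valgrind's memcheck tool. "INSTANCE" is a placeholder name.
systemctl stop dirsrv@INSTANCE
valgrind --tool=memcheck --leak-check=full --track-origins=yes \
    --num-callers=40 --log-file=/tmp/slapd-valgrind.%p.log \
    /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-INSTANCE -d 0
```

`-d 0` keeps the server in the foreground with minimal debug logging.  Note that valgrind slows the server down considerably, which can itself mask timing-dependent crashes.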

Comment 5 Sankar Ramalingam 2015-08-24 18:31:39 UTC
Cloned a beaker job to reproduce the crash. I will re-run the crash tests with Valgrind.
https://beaker.engineering.redhat.com/jobs/1060253

Comment 9 Sankar Ramalingam 2015-08-25 17:38:13 UTC
1. I could reproduce the crash on one of the beaker machines. However, this is from ldclt, not from ns-slapd.

Host - vm-idm-002.lab.eng.pnq.redhat.com
Root pw: Redhat123
Coredump - /var/spool/abrt/ccpp-2015-08-25-11:21:28-30568/coredump

2. I ran the stress tests with valgrind and the report is already out. As expected, there was no crash.
Host - vm-idm-017.lab.eng.pnq.redhat.com

Variables set by tests:
        MUOP01_MAX_RANGE=1000000
        MUOP01_NB_LOOPS=17280

I ran it with :
        MUOP01_MAX_RANGE=100000
        MUOP01_NB_LOOPS=1728

Feel free to ask more questions or request further test runs.

Comment 10 Noriko Hosoi 2015-08-25 17:49:15 UTC
Thank you, Sankar.

Could you repeat the case 1 (no valgrind) on several beaker machines in parallel?  If the crash is not captured, I will give up...

Thanks...

Comment 11 Sankar Ramalingam 2015-08-25 18:28:32 UTC
(In reply to Noriko Hosoi from comment #10)
> Thank you, Sankar.
> 
> Could you repeat the case 1 (no valgrind) on several beaker machines in
> parallel?  
https://beaker.engineering.redhat.com/jobs/1061677
https://beaker.engineering.redhat.com/jobs/1061678
https://beaker.engineering.redhat.com/jobs/1061679
https://beaker.engineering.redhat.com/jobs/1061680

> If the crash is not captured, I will give up...
> 
> Thanks...

Comment 13 Sankar Ramalingam 2015-08-26 09:21:22 UTC
Created attachment 1067208 [details]
Stack trace from accPolicy tests

Managed to reproduce the crash with the accPolicy acceptance tests. The crash is not specific to stress tests. I have cloned another beaker job with accPolicy tests to reproduce the crash and provide access to the core files.
Attaching the stack trace.

Comment 14 Sankar Ramalingam 2015-08-26 10:42:31 UTC
Created attachment 1067219 [details]
Stacktrace from mmraccept tests

The mmraccept tests are also crashing. Attaching the stack trace.

Comment 16 Sankar Ramalingam 2015-08-27 07:00:26 UTC
The crash is not reproducible for me when I clone beaker jobs or manually trigger jobs from Jenkins. However, it keeps coming up in the automated Jenkins runs, though not consistently. I am trying a few more runs today to reproduce the crash and to reserve the same machine for troubleshooting.

Comment 17 Sankar Ramalingam 2015-08-27 10:55:30 UTC
Managed to reproduce the crash with the mmraccept tests by manually triggering jobs from Jenkins. The machine is reserved and available for further troubleshooting.

Hostname - apollo.idmqe.lab.eng.bos.redhat.com
Root pw: Redhat123

[root@apollo ~]# find /var -name core*
/var/lib/systemd/coredump
/var/spool/abrt/ccpp-2015-08-27-06:18:17-344/coredump
/var/spool/abrt/ccpp-2015-08-27-06:18:17-344/core_backtrace

I guess the tests around accPolicy would also crash the server. The run will reach the accPolicy tests in about 5 hours. Feel free to access this machine for further investigation.
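As a sketch, a backtrace can be pulled from a core like the one listed above with gdb in batch mode (paths are the ones from this comment; the matching 389-ds-base debuginfo packages must be installed for readable symbols):

```shell
# Extract a full backtrace for all threads from the abrt core dump.
# Requires gdb plus the 389-ds-base debuginfo packages for symbols;
# ns-slapd is heavily multi-threaded, so dump every thread.
gdb --batch \
    -ex "set pagination off" \
    -ex "thread apply all bt full" \
    /usr/sbin/ns-slapd \
    /var/spool/abrt/ccpp-2015-08-27-06:18:17-344/coredump \
    > slapd-backtrace.txt
```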

Comment 18 Noriko Hosoi 2015-08-27 18:24:29 UTC
Sankar,

I see lots of file-system-full errors in /var/log/messages.  Could this be related to the test failure?  If so, could you rerun the test with more disk space?

....
Aug 27 13:33:56 apollo ns-slapd: Failed to write log, Netscape Portable Runtime error -5956 (The device for storing the file is full.):  - slapd shutting down - closing down internal subsystems and plugins
Aug 27 13:33:56 apollo ns-slapd: Writing to the errors log failed.  Exiting...
Aug 27 13:33:56 apollo ns-slapd: Failed to write log, Netscape Portable Runtime error -5956 (The device for storing the file is full.):  - Waiting for 4 database threads to stop
Aug 27 13:33:56 apollo ns-slapd: Writing to the errors log failed.  Exiting...
Aug 27 13:33:57 apollo ns-slapd: Failed to write log, Netscape Portable Runtime error -5956 (The device for storing the file is full.):  - All database threads now stopped
Aug 27 13:33:57 apollo ns-slapd: Writing to the errors log failed.  Exiting...
Aug 27 13:33:57 apollo ns-slapd: Failed to write log, Netscape Portable Runtime error -5956 (The device for storing the file is full.):  - slapd shutting down - freed 1 work q stack objects - freed 1 op stack objects
Aug 27 13:33:57 apollo ns-slapd: Writing to the errors log failed.  Exiting...
Aug 27 13:33:58 apollo ns-slapd: Failed to write log, Netscape Portable Runtime error -5956 (The device for storing the file is full.):  - slapd stopped.
....
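The NSPR error -5956 above means the device holding the logs filled up.  Before rerunning, free space can be checked with df; /var is the usual default location for 389-ds instance logs and databases:

```shell
# Confirm there is headroom on the filesystem that holds the directory
# server logs and databases (default under /var on RHEL).
df -h /var
```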

If you still see the crash on a host with a larger file system: we learned that openldap was rebased for 7.2, so the issue may be related to the openldap upgrade.  Could you please run the test after downgrading openldap to the 7.1 version?

Thanks.
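A downgrade for such A/B testing could look roughly like the following; the exact RHEL 7.1 openldap version is not given in this bug, so this is only a sketch:

```shell
# Sketch only: roll openldap back to the previous (7.1) build and restart
# the directory server instances. Look up the exact target NVR first;
# "yum downgrade" steps back one version by default.
yum downgrade openldap
systemctl restart dirsrv.target
```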

Comment 19 Sankar Ramalingam 2015-08-28 14:54:07 UTC
The machine has already been returned to the beaker pool, and I doubt I could rerun the tests with a downgraded openldap. Moreover, it's difficult to reproduce the crash consistently by running the same set of tests. So I feel this should be pushed to the next release unless we figure out a way to reproduce it very consistently.

Comment 20 Noriko Hosoi 2015-08-28 15:32:46 UTC
(In reply to Sankar Ramalingam from comment #19)
> The machine is already returned back to beaker pool and I am doubtful that I
> could run the tests by downgrading openldap. Moreover, its difficult to
> consistently reproduce the crash by running the same set of tests. So, I
> feel this should be pushed to next release unless we figure out a way to
> reproduce it very consistently.

Do you mean you want to stop investigating this crash for now?

Please note that this bug is already targeted as 7.3.0.

Comment 21 Sankar Ramalingam 2015-08-29 12:04:23 UTC
(In reply to Noriko Hosoi from comment #20)
> (In reply to Sankar Ramalingam from comment #19)
> > The machine is already returned back to beaker pool and I am doubtful that I
> > could run the tests by downgrading openldap. Moreover, its difficult to
> > consistently reproduce the crash by running the same set of tests. So, I
> > feel this should be pushed to next release unless we figure out a way to
> > reproduce it very consistently.
> 
> Do you mean you want to stop investigating this crash for now?
Yes. I felt I was spending more time on this with no outcome.
> 
> Please note that this bug is already targeted as 7.3.0.
Okay, I thought I had found a reliable reproducer, but now it doesn't look like one. So I would like to give up for now and continue with the RHEL 7.2 work.

Comment 45 Noriko Hosoi 2016-01-07 19:21:26 UTC
Upstream ticket:
https://fedorahosted.org/389/ticket/48403

Comment 46 Noriko Hosoi 2016-05-28 00:46:44 UTC
We did not have a chance to look into this issue recently.

Now we have a rhel-7.3 candidate (of course, we are fixing more bugs, though)
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=495659
389-ds-base-1.3.5.4-1.el7

Sankar, could you please retry the test with this latest build on rhel-7.3?
> Description of problem: Slapd crashes coming from multi-operations stress tests.

And if the server still crashes, could you retain the test environment with a core file?

Thanks!

Comment 47 Sankar Ramalingam 2016-06-02 08:31:07 UTC
So far, no crashes have been observed with the TET acceptance tests on the RHEL 7.3 389-ds-base builds. We have yet to start the Long Duration (Tier 2) and Stress/Reliability (Tier 3) tests for RHEL 7.3. I will update the bug with more details if I encounter any crashes during the Tier 2 and Tier 3 execution.

Comment 49 Sankar Ramalingam 2016-06-14 11:17:39 UTC
I cloned a beaker job - https://beaker.engineering.redhat.com/jobs/1369329. I will wait for this job to complete and then update the bug accordingly.

Comment 50 Sankar Ramalingam 2016-06-14 12:02:56 UTC
(In reply to Sankar Ramalingam from comment #49)
> I cloned a beaker job - https://beaker.engineering.redhat.com/jobs/1369329.
> I will wait for this job to complete and then update the bug accordingly.

I didn't observe any crash for the above beaker job. Hence, I am closing this bug as not reproducible.

Packages tested: 389-ds-base-1.3.5.4-1.el7.x86_64

