1942060 – Directory Server is unavailable after a restart with nsslapd-readonly=on and consumes 100% CPU

This bug has been migrated to another issue tracking site. It has been closed here and may no longer be monitored.

If you would like to get updates for this issue, or to participate in it, you may do so at Red Hat Issue Tracker.

Keywords:
Status:	CLOSED MIGRATED
Alias:	None
Product:	Red Hat Directory Server
Classification:	Red Hat
Component:	389-ds-base
Sub Component:
Version:	11.2
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	unspecified
Target Milestone:	DS12.5
Target Release:	dirsrv-12.5
Assignee:	LDAP Maintainers
QA Contact:	LDAP QA Team
Docs Contact:	Evgenia Martynyuk
URL:
Whiteboard:	sync-to-jira
Depends On:
Blocks:

Reported:	2021-03-23 14:51 UTC by Marc Muehlfeld
Modified:	2024-06-26 13:46 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2024-06-26 13:46:36 UTC
Target Upstream Version:
Embargoed:

Attachments
error log (14.63 KB, text/plain) 2021-03-23 14:51 UTC, Marc Muehlfeld	no flags	Details
ASAN file showing a double free (6.19 KB, text/plain) 2021-03-23 17:39 UTC, Pierre Rogier	no flags	Details

Links
System	ID	Priority	Status	Summary	Last Updated
Github	389ds 389-ds-base issues 4772	None	open	Improve the server start failure detection	2021-05-20 09:41:58 UTC
Red Hat Issue Tracker	DIRSRV-34	None	None	Red Hat Issue Tracker	2024-06-26 13:46:35 UTC
Red Hat Issue Tracker	IDMDS-3960	None	None	None	2023-12-06 16:25:55 UTC

Description Marc Muehlfeld 2021-03-23 14:51:43 UTC

Created attachment 1765573 [details]
error log

Description of problem:
If you set nsslapd-readonly=on and restart the instance, ns-slapd consumes 100% CPU, no ports are opened, and the service is unavailable.


Version-Release number of selected component (if applicable):
389-ds-base-1.4.3.13-1.module+el8dsrv+8334+69a46a2e.x86_64


How reproducible:
Always.


Steps to Reproduce:
1. dsconf -D "cn=Directory Manager" ldap://server.example.com config replace nsslapd-readonly=on
2. systemctl restart dirsrv@instance_name


Actual results:
The systemctl command never returns, ns-slapd consumes 100% CPU, and no ports are opened. The service is not available, and no meaningful error is logged.
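
A minimal sketch of one way to confirm the "no ports are opened" symptom from another shell while the restart is hanging. The helper name `port_is_open` is hypothetical, and port 389 (the standard LDAP port) is an assumption, since the report does not state which port the instance uses.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out,
        # or name resolution failure) when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call (host from the reproduction steps, port assumed):
#   port_is_open("server.example.com", 389)
```

A `False` result only shows that nothing is accepting connections on that port; it does not say why the server failed to open it.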


Expected results:
If it's expected that the instance can't be (re-)started in read-only mode, then starting the service should fail and a meaningful message should be logged.

If it's expected that the instance starts, then ns-slapd should not hang and should start successfully in read-only mode.

Comment 1 Pierre Rogier 2021-03-23 17:39:16 UTC

Created attachment 1765639 [details]
ASAN file showing a double free

FYI: I reproduced the problem on the master branch with an ASAN build, and it shows a data access on a freed buffer.

Comment 2 Zuzana Lena Ansorgova 2021-05-17 14:40:28 UTC

@progier Is this going to be improved in RHDS v12?

Comment 3 Pierre Rogier 2021-05-20 09:15:31 UTC

@zansorgo I do not think so, as I do not see this bug in the JIRA backlog.

BTW, although the double free issue I noticed is very simple to fix (an error case is mishandled), the original issue is much more complex and is likely due to the fact that the replication plugin fails to start.
The more I think about it, the more I suspect that the global read-only feature is incompatible with the replication feature. There is no way to receive updates and no way to send updates, because the replication state (such as the RUV and the CSN generator) cannot be updated. Furthermore, the replication bootstrap fails because the schema cannot be updated.

We could disable the replication plugin in that case, but IMHO it would be unsafe: the read-only flag can be changed dynamically, but changing it does not restart the replication plugin, and if we run the server in read-write mode with the replication plugin disabled, there is a risk of corrupting the data.

Note: The error log does record that there was trouble starting replication and that the server stopped.
When the dsctl command is used to restart the server, the command does finish, but only after a long timeout (1 minute or so). That may be why systemctl appears to hang, especially if there are some retries.

The reason is that lib389 waits until the pid file is created to determine that the server has started correctly.
The issue is that the server writes the pid file only after it starts listening for connections (i.e. when the server is fully up), so if the process dies before reaching that point, lib389 waits until the timeout expires.
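
The waiting behavior described above can be sketched roughly as follows. This is an illustrative approximation, not lib389's actual code: the function name `wait_for_pid_file`, its parameters, and the 60-second default (taken from the "1 minute or so" observation) are all assumptions.

```python
import os
import time


def wait_for_pid_file(pid_path, timeout=60.0, poll_interval=0.5):
    """Return True once pid_path exists, False if the timeout expires first.

    Hypothetical sketch: the only success signal is the pid file, so a
    server process that dies early is indistinguishable from a slow start.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(pid_path):
            return True
        time.sleep(poll_interval)
    # One last check in case the file appeared during the final sleep.
    return os.path.exists(pid_path)
```

Because nothing in this loop looks at the server process itself, a crash during startup costs the caller the full timeout before a failure is reported.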

That said, there may be a way to improve the start failure detection (to avoid having to wait for the timeout).
The idea is to have 2 pid files: the current one, plus another one written by the parent process once the child pid is known (i.e. when forking in the detach function).
So in lib389 the detach.pid file should exist when the parent dies, and we could check that the child process is still alive while waiting for the "server started" pid file. If the child is dead, we should report that the server failed to start and advise looking in the 389 error log.
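
A minimal sketch of how the two-pid-file check might look on the lib389 side, assuming the parent has already written the child pid to detach.pid. The function names, return values, and parameters are hypothetical and do not reflect an existing lib389 API; the liveness check relies on POSIX signal 0.

```python
import os
import time


def pid_is_alive(pid):
    """Return True if a process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without signalling
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def read_pid(path):
    """Return the pid stored in path, or None if it is missing or invalid."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def wait_for_start(started_pid_path, detach_pid_path,
                   timeout=60.0, poll_interval=0.5):
    """Return "started", "failed", or "timeout" (hypothetical sketch)."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(started_pid_path):
            return "started"
        child = read_pid(detach_pid_path)
        if child is not None and not pid_is_alive(child):
            # The child died before writing the "server started" pid file.
            return "failed"
        if time.monotonic() >= deadline:
            return "timeout"
        time.sleep(poll_interval)
```

On a "failed" result the caller could stop waiting immediately and point the user at the error log. Note that a dead child that has not yet been reaped (a zombie) still counts as alive for `os.kill(pid, 0)`, so this sketch assumes the detached child is reaped promptly after its parent exits.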


So in conclusion: we cannot change the fact that the server fails to start when set to read-only mode, but we could report that the server failed to start before the user gets impatient!

I will open a ticket upstream to propose that enhancement.

Comment 8 Viktor Ashirov 2024-06-26 13:46:36 UTC

This BZ has been automatically migrated to Red Hat Issue Tracker https://issues.redhat.com/browse/DIRSRV-34. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated. Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.
