Bug 1942060
| Field | Value |
|---|---|
| Summary | Directory Server is unavailable after a restart with nsslapd-readonly=on and consumes 100% CPU |
| Product | Red Hat Directory Server |
| Component | 389-ds-base |
| Version | 11.2 |
| Status | CLOSED MIGRATED |
| Severity | unspecified |
| Priority | medium |
| Type | Bug |
| Reporter | Marc Muehlfeld <mmuehlfe> |
| Assignee | LDAP Maintainers <idm-ds-dev-bugs> |
| QA Contact | LDAP QA Team <idm-ds-qe-bugs> |
| Docs Contact | Evgenia Martynyuk <emartyny> |
| CC | idm-ds-dev-bugs, mreynolds, pasik, progier, tbordaz |
| Target Milestone | DS12.5 |
| Target Release | dirsrv-12.5 |
| Keywords | Triaged, UserExperience |
| Hardware | Unspecified |
| OS | Unspecified |
| Whiteboard | sync-to-jira |
| Doc Type | If docs needed, set a value |
| Last Closed | 2024-06-26 13:46:36 UTC |
Created attachment 1765639
ASAN file showing a double free

FYI: I reproduced the problem on the master branch with an ASAN build, and it shows a data access on a freed buffer.
@progier Is this going to be improved in RHDS v12?

@zansorgo I do not think so (I do not see this bug in the JIRA backlog).

BTW, although the double free issue I noticed is very simple to fix (an error case is mishandled), the original issue is much more complex and is likely due to the fact that the replication plugin fails to start.

The more I think about it, the more I suspect that the global read-only feature is incompatible with the replication feature. There is no way to receive updates and no way to send updates, because the replication state (such as the RUV and the CSN generator) cannot be updated. Furthermore, the replication bootstrap fails because the schema cannot be updated.

We could disable the replication plugin in that case, but IMHO it would be unsafe: the read-only flag can be changed dynamically, but that will not restart the replication plugin, and if we run the server in read-write mode with the replication plugin disabled, there is a risk of corrupting the data.

Note: The error log does record that there was trouble starting replication and that the server stopped.

When using the dsctl command to restart the server, the command finishes, but only after a long timeout (1 minute or so). That may be why systemctl appears to hang, especially if there are some retries. The reason is that lib389 waits until the pid file is created to determine that the server has started correctly. The server generates the pid file only after it starts listening for connections (i.e. when the server is fully up), so if the process dies before reaching that point, lib389 waits until the timeout expires.
That said, there may be a way to improve the detection of start failures (to avoid having to wait for the timeout). The idea is to have 2 pid files: the current one plus another one written by the parent process once the child pid is known (i.e. when forking in the detach function). In lib389, the detach.pid file should then exist when the parent dies, and we could check that the child process is still alive while waiting for the "server started" pid file. If the child is dead, we should report that the server failed to start and advise looking in the 389 error log.

So in conclusion: we cannot change the fact that the server fails to start when set to read-only mode, but we could report that the server failed to start before the user gets impatient! I will open a ticket upstream to propose that enhancement.

This BZ has been automatically migrated to Red Hat Issue Tracker https://issues.redhat.com/browse/DIRSRV-34. All future work related to this report will be managed there. Due to differences in account names between systems, some fields were not replicated. Be sure to add yourself to the Jira issue's "Watchers" field to continue receiving updates, and add others to the "Need Info From" field to continue requesting information. In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.
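
For illustration, the two-pid-file idea proposed in the comments above could be sketched roughly as follows. This is not lib389 code: the function names, the return values, and the pid file paths passed in are all hypothetical, and the sketch assumes a POSIX system where the parent has already exited, so that a dead child is reaped by init and no longer appears as a process.

```python
import os
import time


def pid_is_alive(pid):
    """Return True if a process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def read_pid(path):
    """Return the pid stored in path, or None if the file is missing or invalid."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return None


def wait_for_start(started_pidfile, detach_pidfile, timeout=60.0, poll=0.5):
    """Wait for the "server started" pid file, giving up early if the child died.

    Hypothetical helper: returns "started", "died", or "timeout".
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # The server writes this file only once it is fully up.
        if read_pid(started_pidfile) is not None:
            return "started"
        # The parent writes this file at fork time, so it appears early.
        child = read_pid(detach_pidfile)
        if child is not None and not pid_is_alive(child):
            return "died"  # caller should point the user at the error log
        time.sleep(poll)
    return "timeout"
```

With a check like this, a caller could report the start failure as soon as the child exits instead of waiting for the full timeout, which is the delay described in the thread.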
Created attachment 1765573
error log

Description of problem:
If you set nsslapd-readonly=on and restart the instance, ns-slapd consumes 100% CPU, no ports are opened, and the service is unavailable.

Version-Release number of selected component (if applicable):
389-ds-base-1.4.3.13-1.module+el8dsrv+8334+69a46a2e.x86_64

How reproducible:
Always.

Steps to Reproduce:
1. dsconf -D "cn=Directory Manager" ldap://server.example.com config replace nsslapd-readonly=on
2. systemctl restart dirsrv@instance_name

Actual results:
The systemctl command never ends, ns-slapd consumes 100% CPU, and no ports are opened. The service is not available, and no meaningful error is logged.

Expected results:
If the instance is not expected to (re-)start in read-only mode, then starting the service should fail and a meaningful message should be logged. If the instance is expected to start, then ns-slapd should not hang and should start successfully in read-only mode.