Bug 1837105
Summary: | Check for clock errors and time skew | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | mreynolds | |
Component: | 389-ds-base | Assignee: | mreynolds | |
Status: | CLOSED ERRATA | QA Contact: | RHDS QE <ds-qe-bugs> | |
Severity: | unspecified | Docs Contact: | Marc Muehlfeld <mmuehlfe> | |
Priority: | unspecified | |||
Version: | 7.9 | CC: | kbanerje, mrhodes, msauton, pasik, rcain, spichugi, tbordaz, tmihinto, vashirov | |
Target Milestone: | rc | Keywords: | TestCannotAutomate | |
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | 389-ds-base-1.3.10.2-4.el7 | Doc Type: | Bug Fix | |
Doc Text: |
.Directory Server rejects update operations if retrieving the system time fails or the time difference is too large
Previously, when calling the system time() function failed or the function returned an unexpected value, Change Sequence Numbers (CSN) in Directory Server could become corrupted. As a consequence, the administrator had to re-initialize all replicas in the environment. With this update, Directory Server rejects the update operation if the time() function failed, and Directory Server no longer generates corrupt CSNs in the mentioned scenario.
Note that, if the time difference is greater than one day, the server logs a `INFO - csngen_new_csn - Detected large jump in CSN time` message in the `/var/log/dirsrv/slapd-<instance_name>/error` file. However, Directory Server still creates the CSN and does not reject the update operation.
|
Story Points: | --- | |
Clone Of: | ||||
: | 1843157 (view as bug list) | Environment: | ||
Last Closed: | 2020-09-29 19:46:56 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1843157 |
Description
mreynolds
2020-05-18 20:23:44 UTC
Upstream ticket: https://pagure.io/389-ds-base/issue/51095 Build tested: 389-ds-base-1.3.10.2-6.el7.x86_64 Testing was done using libfaketime: systemctl edit dirsrv@server [Service] Environment= Environment=LD_PRELOAD=/usr/lib64/faketime/libfaketimeMT.so.1 Environment=FAKETIME_NO_CACHE=1 Environment=FAKETIME_TIMESTAMP_FILE=/run/dirsrv/faketime.rc Environment=FAKETIME_DONT_RESET=1 Changing time for ns-slapd in a replicated topology causes it to emit the following error messages: echo "-1d x1" > /run/dirsrv/faketime.rc => /var/log/dirsrv/slapd-master1/errors <== [01/Jul/2020:14:07:24.721287371 +0000] - INFO - csngen_new_csn - Detected large jump in CSN time. Delta: -172681 (current time: 1593612444 vs previous time: 1593785125) ==> /var/log/dirsrv/slapd-master2/errors <== [01/Jul/2020:14:07:25.264048182 +0000] - INFO - csngen_new_csn - Detected large jump in CSN time. Delta: -172680 (current time: 1593612445 vs previous time: 1593785125) To reproduce clock_gettime() errors I LD_PRELOADed the following library: // gcc faulty_clock.c -ldl -shared -fPIC -o faulty_clock.so #define _GNU_SOURCE #include "constant.h" #include <dlfcn.h> #include <errno.h> #include <stdio.h> #include <stdlib.h> #include <sys/time.h> #include <time.h> typedef int (*original_clock_gettime_f_type)(clockid_t clk_id, struct timespec *tp); int clock_gettime(clockid_t clk_id, struct timespec *tp) { int result = err; original_clock_gettime_f_type orig_clock; orig_clock = (original_clock_gettime_f_type)dlsym(RTLD_NEXT, "clock_gettime"); int fault = rand() % 100; if (fault == 5) { printf("fair dice roll says %d\n", fault); errno = EFAULT; result = err; } else result = orig_clock(clk_id, tp); return result; } If the failure happens too often, it crashes pretty fast: Jul 03 15:11:25 server.example.com ns-slapd[9497]: clock_gettime failed: 14 Jul 03 15:11:25 server.example.com ns-slapd[9497]: fair dice roll says 5 ... Jul 03 15:11:25 server.example.com ns-slapd[9497]: fair dice roll says 5 Jul 03 15:11:25 server.example.com systemd[1]: dirsrv: main process exited, code=killed, status=6/ABRT Jul 03 15:11:25 server.example.com systemd[1]: Unit dirsrv entered failed state. Jul 03 15:11:25 server.example.com systemd[1]: dirsrv failed. If I reduce the number of occurrences, I see the newly added debug messages: [03/Jul/2020:15:32:58.393416521 +0000] - ERR - csngen_new_csn - Failed to get system time (Bad address) [03/Jul/2020:15:32:58.396762553 +0000] - ERR - ldbm_back_modify - failed to generate modify CSN for entry (cn=user00001,ou=People,dc=example,dc=com), aborting operation [03/Jul/2020:15:32:58.615110395 +0000] - EMERG - vslapd_log_access, Unable to determine system time for message :: conn=4 op=8 MOD dn="cn=user00003,ou=people,dc=example,dc=com" Under normal conditions server should not crash, as the situation where clock_gettime() returns too many errors is very rare. Marking as VERIFIED. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (389-ds-base bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3894 |