Bug 1364033
Summary: sssd exits if clock is adjusted backwards after boot
Product: Red Hat Enterprise Linux 7
Component: sssd
Version: 7.3
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: Patrik Kis <pkis>
Assignee: Pavel Březina <pbrezina>
QA Contact: Steeve Goveas <sgoveas>
CC: grajaiya, jbastian, jhrozek, lslebodn, mkosek, mzidek, pbrezina, pkis, sgadekar
Target Milestone: rc
Hardware: All
OS: Unspecified
Fixed In Version: sssd-1.14.0-30.el7
Type: Bug
Last Closed: 2016-11-04 07:20:13 UTC
Description
Patrik Kis
2016-08-04 10:39:13 UTC
Looking at the logs, the IPA server is unreachable when sssd_be starts. Does the bug happen after you start SSSD when the IPA server is up? (I'm thinking this might be upstream #3111)

The IPA server is up all the time; it is completely independent of the client. I have already tested this a few times, also with AD, and it was always the same. So I don't think it's #3111, unless for some reason SSSD cannot communicate with IPA while the machine is booting and ends up behaving as in #3111. Note that it happens only on aarch64, which is a quite new platform.

I still suspect #3111 because the aarch machine is presumably quite a bit slower than the other architectures. Because we know how to fix that ticket, I would prefer to try with test packages for #3111 when they are available (hopefully really soon now).

OK, I'll be happy to test it with a test package.

Pavel, you wrote a patch. Could you prepare a scratch build with your patch?

This is different from #3111. The backend starts normally, but the network is unreachable during boot so the backend goes offline (probably a wrong ordering in systemd?). However, both sssd and sssd_be are killed by the watchdog, which is interesting since, according to the logs, at least sssd_be is working. There is a long delay in sssd.log, but that may just be because we didn't write anything there.

OK, then I guess we need to clone and investigate this as a new bug. Can you take a look at Patrik's machine?

That may be problematic, since this occurs only on boot when the network is unreachable. I can't debug it during boot time and I can't disable the network since I would lose access to the machine. I will provide a test package to see when the watchdog ticks are happening.

The watchdog tevent timer is not fired for some reason. It is created but never triggered in any of the sssd processes. However, there is no reason for us to hang in the main loop.
(In reply to Jakub Hrozek from comment #5)
> I still suspect #3111 because the aarch machine is presumably quite a bit
> slower than the other architectures. Because we know how to fix that ticket,
> I would prefer to try with test packages for #3111 when they are available
> (hopefully really soon now).

Could it be because this architecture is "presumably slower" that we can't rely on those timers? But a range of 30+ seconds is quite a long time to suspect this... Do we have any known tevent bug?

(In reply to Pavel Březina from comment #18)
> Watchdog tevent timer is not fired for some reason. It is created but never
> triggered in all sssd processes. However there is no reason for us to hang
> in the main loop.
>
> (In reply to Jakub Hrozek from comment #5)
> > I still suspect #3111 because the aarch machine is presumably quite a bit
> > slower than the other architectures. Because we know how to fix that ticket,
> > I would prefer to try with test packages for #3111 when they are available
> > (hopefully really soon now).
>
> Could it be because this architecture is "presumably slower" that we can't
> rely on those timers? But a range of 30+ seconds is quite a long time to
> suspect this...
>
> Do we have any known tevent bug?

No, we don't. Does this bug only happen the first time sssd starts during boot and never again?

(In reply to Jakub Hrozek from comment #19)
> No, we don't. Does this bug only happen the first time sssd starts during
> boot and never again?

Correct, the bug happens only during the first start at boot; any subsequent manual start with systemctl succeeds.

I can confirm that.

I can't reproduce it locally. How are we going to debug it during boot time? I checked the tevent timer code and it's completely independent of any system call. The only idea I have is to use command=strace ... or similar to see what exactly is happening there.
Alternatively, we can wait until Fabiano finishes https://fedorahosted.org/sssd/ticket/3051 and then use diag_cmd to see why sssd is stuck.

I tried starting sssd through strace like this:

[root@bumblebee ~]# cat /etc/systemd/system/sssd.service.d/strace.conf
[Service]
ExecStart=
ExecStart=/usr/bin/strace -f -o /var/tmp/sssd-strace /usr/sbin/sssd -D -f

But it was too slow running under strace and it failed with:

[root@bumblebee ~]# systemctl restart sssd.service
Job for sssd.service failed because a timeout was exceeded.
See "systemctl status sssd.service" and "journalctl -xe" for details.
[root@bumblebee ~]# systemctl status sssd.service -l
● sssd.service - System Security Services Daemon
   Loaded: loaded (/usr/lib/systemd/system/sssd.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/sssd.service.d
           └─journal.conf, strace.conf
   Active: failed (Result: timeout) since Wed 2016-08-17 16:45:31 CDT; 3min 43s ago
  Process: 9766 ExecStart=/usr/bin/strace -f -o /var/tmp/sssd-strace /usr/sbin/sssd -D -f (code=exited, status=0/SUCCESS)
 Main PID: 9690 (code=exited, status=0/SUCCESS)

Aug 17 16:44:01 bumblebee.localdomain sssd[nss][9778]: Starting up
Aug 17 16:45:31 bumblebee.localdomain systemd[1]: sssd.service start operation timed out. Terminating.
Aug 17 16:45:31 bumblebee.localdomain sssd[nss][9778]: Shutting down
Aug 17 16:45:31 bumblebee.localdomain sssd[autofs][9780]: Shutting down
Aug 17 16:45:31 bumblebee.localdomain sssd[pam][9779]: Shutting down
Aug 17 16:45:31 bumblebee.localdomain sssd[be[default]][9776]: Shutting down
Aug 17 16:45:31 bumblebee.localdomain sssd[be[REDHAT]][9777]: Shutting down
Aug 17 16:45:31 bumblebee.localdomain systemd[1]: Failed to start System Security Services Daemon.
Aug 17 16:45:31 bumblebee.localdomain systemd[1]: Unit sssd.service entered failed state.
Aug 17 16:45:31 bumblebee.localdomain systemd[1]: sssd.service failed.

If you can reproduce the bug as well, can you please do the following instead?
1) Start sssd manually and find out exactly how sssd_be is running:

   ps -ef | grep sssd_be

   It will be something like this, except the domain will be different:

   /usr/libexec/sssd/sssd_be --domain redhat.com --uid 0 --gid 0 --debug-to-files

2) Open sssd.conf and add the following to the [domain] section:

   command = strace -o /tmp/sssd_be_strace -ff /usr/libexec/sssd/sssd_be --domain redhat.com --uid 0 --gid 0 --debug-to-files

3) Set SELinux to permissive (otherwise sssd won't be permitted to execute strace) and try restarting the sssd service to make sure the config is correct.

4) Reboot, then attach /tmp/sssd_be_strace_* here.

Thank you!

Lukas just suggested an interesting idea: is there a possibility that the system time gets changed during boot?

I'll try the steps in comment 25 in a few minutes. First, I noticed that sssd does actually run for a few seconds on boot, and if you log in very quickly after reboot, you can catch it before it dies.

$ ssh bumblebee.example.com
Last login: Thu Aug 18 08:52:54 2016 from foo.example.com
[jbastian@bumblebee ~]$ ps ax | grep sss
 1173 ?        Ss     0:01 /usr/sbin/sssd -D -f
 1174 ?        S      0:00 /usr/libexec/sssd/sssd_be --domain default --uid 0 --gid 0 --debug-to-files
 1175 ?        S      0:00 /usr/libexec/sssd/sssd_be --domain FOO --uid 0 --gid 0 --debug-to-files
 1176 ?        S      0:00 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files
 1177 ?        S      0:00 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --debug-to-files
 1178 ?        S      0:00 /usr/libexec/sssd/sssd_autofs --uid 0 --gid 0 --debug-to-files
 3073 pts/0    S+     0:00 grep --color=auto sss
[jbastian@bumblebee ~]$ uptime
 09:19:45 up 1 min,  1 user,  load average: 1.61, 0.55, 0.20
[jbastian@bumblebee ~]$ ps ax | grep sss
 3098 pts/0    S+     0:00 grep --color=auto sss

(In reply to Pavel Březina from comment #26)
> Lukas just suggested an interesting idea: is there a possibility that system
> time gets changed during the boot?

It should not be changing (much) during boot.
The systems all have an RTC so the clock is reasonably close during boot, and chronyd is enabled so any adjustments should be very minor, maybe a few minutes if the RTC has drifted.

From the reboot in comment 27:

[jbastian@bumblebee ~]$ journalctl -b | egrep -i 'rtc|chrony'
Aug 18 09:19:40 bumblebee.localdomain kernel: rtc-efi rtc-efi: rtc core: registered rtc-efi as rtc0
Aug 18 09:19:40 bumblebee.localdomain kernel: rtc-efi rtc-efi: setting system clock to 2016-08-18 14:19:40 UTC (1471529980)
Aug 18 09:20:15 bumblebee.localdomain chronyd[1116]: chronyd version 2.1.1 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +DEBUG +ASYNCDNS +IPV6 +SECHASH)
Aug 18 09:20:15 bumblebee.localdomain chronyd[1116]: Frequency 26.946 +/- 0.075 ppm read from /var/lib/chrony/drift
Aug 18 09:20:29 bumblebee.localdomain chronyd[1116]: Selected source 10.5.26.10
Aug 18 09:20:29 bumblebee.localdomain chronyd[1116]: System clock wrong by -78.185003 seconds, adjustment started
Aug 18 09:19:11 bumblebee.localdomain chronyd[1116]: System clock was stepped by -78.185003 seconds

Is a difference of 1min 18sec enough to cause this problem?

(In reply to Jeff Bastian from comment #28)
> (In reply to Pavel Březina from comment #26)
> > Lukas just suggested an interesting idea: is there a possibility that system
> > time gets changed during the boot?
>
> It should not be changing (much) during boot. The systems all have an RTC
> so the clock is reasonably close during boot, and chronyd is enabled so any
> adjustments should be very minor, maybe a few minutes if the RTC has drifted.
> From the reboot in comment 27:
>
> [jbastian@bumblebee ~]$ journalctl -b | egrep -i 'rtc|chrony'
> Aug 18 09:19:40 bumblebee.localdomain kernel: rtc-efi rtc-efi: rtc core: registered rtc-efi as rtc0
> Aug 18 09:19:40 bumblebee.localdomain kernel: rtc-efi rtc-efi: setting system clock to 2016-08-18 14:19:40 UTC (1471529980)
> Aug 18 09:20:15 bumblebee.localdomain chronyd[1116]: chronyd version 2.1.1 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +DEBUG +ASYNCDNS +IPV6 +SECHASH)
> Aug 18 09:20:15 bumblebee.localdomain chronyd[1116]: Frequency 26.946 +/- 0.075 ppm read from /var/lib/chrony/drift
> Aug 18 09:20:29 bumblebee.localdomain chronyd[1116]: Selected source 10.5.26.10
> Aug 18 09:20:29 bumblebee.localdomain chronyd[1116]: System clock wrong by -78.185003 seconds, adjustment started
> Aug 18 09:19:11 bumblebee.localdomain chronyd[1116]: System clock was stepped by -78.185003 seconds
>
> Is a difference of 1min 18sec enough to cause this problem?

IMHO, the best would be to disable chronyd and try a reboot. If the issue is gone, then the bug is caused by the time shift.

That fixed it!
sssd is still running 11 minutes after boot when I disabled chronyd:

[jbastian@bumblebee ~]$ uptime
 11:10:40 up 11 min,  1 user,  load average: 0.00, 0.17, 0.20
[jbastian@bumblebee ~]$ systemctl status chronyd.service sssd.service
● chronyd.service - NTP client/server
   Loaded: loaded (/usr/lib/systemd/system/chronyd.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
● sssd.service - System Security Services Daemon
   Loaded: loaded (/usr/lib/systemd/system/sssd.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/sssd.service.d
           └─journal.conf, strace.conf
   Active: active (running) since Thu 2016-08-18 11:00:49 CDT; 9min ago
  Process: 2020 ExecStart=/usr/sbin/sssd -D -f (code=exited, status=0/SUCCESS)
 Main PID: 2057 (sssd)
   CGroup: /system.slice/sssd.service
           ├─2057 /usr/sbin/sssd -D -f
           ├─2061 /usr/libexec/sssd/sssd_be --domain default --uid 0 --gid 0 ...
           ├─2062 /usr/libexec/sssd/sssd_be --domain REDHAT --uid 0 --gid 0 -...
           ├─2063 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files...
           ├─2064 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --debug-to-files...
           └─2065 /usr/libexec/sssd/sssd_autofs --uid 0 --gid 0 --debug-to-fi...

Aug 18 11:00:46 bumblebee.localdomain systemd[1]: Starting System Security Se...
Aug 18 11:00:48 bumblebee.localdomain sssd[2057]: Starting up
Aug 18 11:00:49 bumblebee.localdomain sssd[be[default]][2061]: Starting up
Aug 18 11:00:49 bumblebee.localdomain sssd[be[REDHAT]][2062]: Starting up
Aug 18 11:00:49 bumblebee.localdomain sssd[be[REDHAT]][2062]: Your configurat...
Aug 18 11:00:49 bumblebee.localdomain sssd[be[default]][2061]: Your configura...
Aug 18 11:00:49 bumblebee.localdomain sssd[pam][2064]: Starting up
Aug 18 11:00:49 bumblebee.localdomain sssd[nss][2063]: Starting up
Aug 18 11:00:49 bumblebee.localdomain sssd[autofs][2065]: Starting up
Aug 18 11:00:49 bumblebee.localdomain systemd[1]: Started System Security Ser...
Hint: Some lines were ellipsized, use -l to show in full.
Next, I synchronized the RTC to the wall time to minimize the amount of chronyd adjustment, and sssd stayed alive.

[root@bumblebee ~]# systemctl enable chronyd.service
[root@bumblebee ~]# systemctl start chronyd.service
[root@bumblebee ~]# hwclock --systohc
[root@bumblebee ~]# reboot
...
[root@bumblebee ~]# uptime
 11:37:59 up 5 min,  1 user,  load average: 0.04, 0.30, 0.19
[root@bumblebee ~]# journalctl -b0 | egrep -i 'rtc|chrony'
Aug 18 11:32:21 bumblebee.localdomain kernel: rtc-efi rtc-efi: rtc core: registered rtc-efi as rtc0
Aug 18 11:32:21 bumblebee.localdomain kernel: rtc-efi rtc-efi: setting system clock to 2016-08-18 16:32:21 UTC (1471537941)
Aug 18 11:32:51 bumblebee.localdomain chronyd[1087]: chronyd version 2.1.1 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +DEBUG +ASYNCDNS +IPV6 +SECHASH)
Aug 18 11:32:51 bumblebee.localdomain chronyd[1087]: Frequency 26.963 +/- 1.452 ppm read from /var/lib/chrony/drift
Aug 18 11:33:06 bumblebee.localdomain chronyd[1087]: Selected source 10.5.27.10
Aug 18 11:36:23 bumblebee.localdomain chronyd[1087]: Selected source 50.205.244.27
[root@bumblebee ~]# pgrep -laf sss
1148 /usr/sbin/sssd -D -f
1149 /usr/libexec/sssd/sssd_be --domain default --uid 0 --gid 0 --debug-to-files
1150 /usr/libexec/sssd/sssd_be --domain FOO --uid 0 --gid 0 --debug-to-files
1152 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files
1153 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --debug-to-files
1154 /usr/libexec/sssd/sssd_autofs --uid 0 --gid 0 --debug-to-files

Since the deciding factor seems to be an RTC that's out of sync, and not the aarch64 architecture, I tested on an x86_64 system with RHEL 7.3 Beta by changing the clock and rebooting, and I was able to reproduce the problem:

[root@localhost ~]# uname -r
3.10.0-493.el7.x86_64
[root@localhost ~]# systemctl enable sssd.service
[root@localhost ~]# systemctl stop chronyd.service
[root@localhost ~]# date
Thu Aug 18 13:22:46 EDT 2016
[root@localhost ~]# date 08181400
Thu Aug 18 14:00:00 EDT 2016
[root@localhost ~]# hwclock --systohc
[root@localhost ~]# hwclock
Thu 18 Aug 2016 02:00:10 PM EDT  -0.141171 seconds
[root@localhost ~]# reboot
...
[root@localhost ~]# journalctl -b0 | egrep -i 'rtc|chrony'
...
Aug 18 10:01:51 localhost.localdomain kernel: rtc_cmos 00:02: setting system clock to 2016-08-18 14:01:51 UTC (1471528911)
...
Aug 18 14:02:04 localhost.example.com chronyd[670]: System clock wrong by -2228.878225 seconds, adjustment started
Aug 18 13:24:56 localhost.example.com chronyd[670]: System clock was stepped by -2228.878225 seconds
...
[root@localhost ~]# systemctl status sssd.service
● sssd.service - System Security Services Daemon
   Loaded: loaded (/usr/lib/systemd/system/sssd.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/sssd.service.d
           └─journal.conf
   Active: failed (Result: exit-code) since Thu 2016-08-18 13:25:28 EDT; 3min 34s ago
  Process: 666 ExecStart=/usr/sbin/sssd -D -f (code=exited, status=0/SUCCESS)
 Main PID: 693 (code=exited, status=1/FAILURE)

It seems sssd only dies if the clock is adjusted backwards. In the test above, I set the clock to the future, so chronyd had to step the clock in the negative direction. I tested again, this time setting the clock in the past, and sssd stayed running.

[root@localhost ~]# systemctl stop chronyd
[root@localhost ~]# date
Thu Aug 18 13:33:20 EDT 2016
[root@localhost ~]# date 08181200
Thu Aug 18 12:00:00 EDT 2016
[root@localhost ~]# hwclock --systohc
[root@localhost ~]# hwclock
Thu 18 Aug 2016 12:00:08 PM EDT  -0.844269 seconds
[root@localhost ~]# reboot
...
[root@localhost ~]# journalctl -b0 | egrep -i 'rtc|chrony'
...
Aug 18 08:03:20 localhost.localdomain kernel: rtc_cmos 00:02: setting system clock to 2016-08-18 12:03:20 UTC (1471521800)
...
Aug 18 12:03:34 localhost.example.com chronyd[666]: System clock wrong by 5603.860154 seconds, adjustment started
Aug 18 13:36:58 localhost.example.com chronyd[666]: System clock was stepped by 5603.860154 seconds
...
[root@localhost ~]# uptime
 13:49:37 up 12 min,  1 user,  load average: 0.00, 0.01, 0.03
[root@localhost ~]# systemctl status sssd.service
● sssd.service - System Security Services Daemon
   Loaded: loaded (/usr/lib/systemd/system/sssd.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/sssd.service.d
           └─journal.conf
   Active: active (running) since Thu 2016-08-18 12:03:25 EDT; 1h 46min ago
  Process: 659 ExecStart=/usr/sbin/sssd -D -f (code=exited, status=0/SUCCESS)
 Main PID: 684 (sssd)
   CGroup: /system.slice/sssd.service
           ├─684 /usr/sbin/sssd -D -f
           ├─686 /usr/libexec/sssd/sssd_be --domain default --uid 0 --gid 0 -...
           ├─715 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files
           ├─716 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --debug-to-files
           └─717 /usr/libexec/sssd/sssd_autofs --uid 0 --gid 0 --debug-to-fil...

Jeff, thank you very much for testing and confirming our suspicion.

Yes, thanks for testing. Now I wonder how to fix this bug. Would it make sense to start sssd after chrony? I would say it might be a good idea in general to make sure that the time is in sync for providers that rely on Kerberos.

It's not just about chronyd; there are also ntpd.service and systemd-timesyncd.service. Moreover, it will cause issues for local users (but that's slightly unrelated to this ticket).

What issues for local users? About chrony vs. ntpd, could we just use both? If one of them is not installed, I think the After= would be ignored. I'm not sure how else to fix this; the issue seems to be that tevent timers use struct timeval with absolute time.

(In reply to Jakub Hrozek from comment #38)
> What issues for local users?

That's a pure upstream discussion, so let's avoid it here.

> About chrony vs ntpd, could we just use both? If one of them is not
> installed, I think the After would be ignored..
>
> I'm not sure how else to fix this, the issue seems to be that tevent timers
> use struct timeval with absolute time..
Changing the order of starting services might help at boot time. However, the time can also be changed later: a manually changed date, or a time synchronization daemon that failed to synchronize the date for various reasons (it was disabled at boot time, bugs, ...). How do we want to handle such use cases?

OK, do we already know why the time jumps backwards on the test systems? Is it just a property of the test systems? Unless sssd can get a notification about the system time changing, I don't think we can do anything for the general case.

This happens on all systems. It's very common for the battery-backed RTC to drift; it may run slightly too fast or too slow. If it runs fast, the RTC will drift into the future. When the system boots, the kernel reads the RTC to initialize the system time, then chrony or ntp starts and makes any necessary adjustments.

Sometimes the RTC is just wildly wrong -- off by years or even decades -- because it was never set correctly in the first place. And sometimes the battery dies, so when the system is powered off, the RTC resets to its internal start date (often some day in 1970, 1980 or 2000, I don't know why).

There was a proposal to add a notification mechanism to the kernel for system time changes, but it does not appear to have been accepted: https://lkml.org/lkml/2010/11/11/353

The normal method is to call gettimeofday() and make a note of the result, then set an alarm or sleep for a set amount of time and call gettimeofday() again; if the time changed by more than the expected sleep time, then the clock must have been adjusted.

This wasn't a problem for sssd-1.13.0. What changed in 1.14.0 to make it sensitive to time changes?

(In reply to Jeff Bastian from comment #41)
> This happens on all systems. It's very common for the battery-backed RTC to
> drift, it may run slightly too fast or too slow. If it runs fast, the RTC
> will drift into the future.
> When the system boots, the kernel reads the RTC
> to initialize the system time, then chrony or ntp starts and makes any
> necessary adjustments.
>
> Sometimes the RTC is just wildly wrong -- off by years or even decades --
> because it was never set correctly in the first place. And sometimes the
> battery dies, so when the system is powered off, the RTC resets to its
> internal start date (often some day in 1970, 1980 or 2000, I don't know why).
>
> There was a proposal to add a notification mechanism to the kernel for
> system time changes, but it does not appear to have been accepted.
> https://lkml.org/lkml/2010/11/11/353
>
> The normal method is to call gettimeofday() and make a note, then set an
> alarm or sleep for a set amount of time, and call gettimeofday() again, and
> if it changed by more than the expected sleep time, then the clock must have
> been adjusted.

Thank you for the analysis. We might want to do something like this for our watchdog.

> This wasn't a problem for sssd-1.13.0. What changed in 1.14.0 to make it
> sensitive to time changes?

We changed the way the "worker" processes like sssd_nss or sssd_be are kept alive and auto-restarted. Previously, the main sssd process sent "pings" to the worker processes and they would reply with a "pong"; when a worker failed to answer three pings, the sssd process killed it and spawned a new one. But that was really complex and brittle, so in 1.14 we changed it so that the worker processes themselves set up a timer using timer_create() that sends SIGRT to self. When the SIGRT signal is received, a counter is increased, and when the counter goes over a threshold (3 by default), the process quits. There is also a timer event using sssd's event loop that periodically clears the counter -- this is done through sssd's event loop to make sure events from the loop are actually being processed.

We can detect the time shift in watchdog_handler().
We can store the current time in setup_watchdog() and check it on every tick:

setup_watchdog():
    watchdog_ctx.current_time <- time

watchdog_handler():
    if watchdog_ctx.current_time > time:
        time shift occurred => restart watchdog
        (and other tevent timers if possible)
    watchdog_ctx.current_time <- time

OK, can you prepare a patch?

Upstream ticket: https://fedorahosted.org/sssd/ticket/3154

Thank you. I submitted the patch for review.

The bug was found and verified by the test /CoreOS/realmd/Sanity/AD-join-and-reboot. It can be found here:

* git: http://pkgs.devel.redhat.com/cgit/tests/realmd/tree/Sanity/AD-join-and-reboot
* TCMS: https://tcms.engineering.redhat.com/case/269538/
* beaker: https://beaker.engineering.redhat.com/tasks/12975

Feel free to execute it. It basically joins the machine to our test AD via realmd and then performs a reboot to see if the changes made by realmd survive the reboot. I have no problem verifying the bug as well, if you want. But I think maybe a new test case should be created, because the test above found the issue only accidentally, due to the time change on the aarch64 machines. That can change in the future, and then this test case would not catch a regression. On the other hand, creating an automated test case where the time changes during boot is not trivial.

It shouldn't be too difficult to automate this; you just need to set the RTC clock a bit into the future and reboot, e.g.:

if [ "$REBOOTCOUNT" -eq 0 ] ; then
    systemctl stop chronyd.service
    date --set="now + 1 hour"
    hwclock --systohc
    rhts-reboot
else
    if systemctl --quiet is-active sssd.service ; then
        : # report success
    else
        : # report failure
    fi
fi

Oh, and maybe sleep a bit before checking if sssd.service is running:

...
else
    sleep 300
    if systemctl --quiet is-active sssd.service ; then
...
master:
* b8ceaeb80cffb00c26390913ea959b77f7e848b9

Verified with:

~]# rpm -qa | grep sssd
sssd-client-1.14.0-43.el7.aarch64
sssd-krb5-common-1.14.0-43.el7.aarch64
sssd-ad-1.14.0-43.el7.aarch64
sssd-ldap-1.14.0-43.el7.aarch64
sssd-proxy-1.14.0-43.el7.aarch64
python-sssdconfig-1.14.0-43.el7.noarch
sssd-common-1.14.0-43.el7.aarch64
sssd-common-pac-1.14.0-43.el7.aarch64
sssd-ipa-1.14.0-43.el7.aarch64
sssd-krb5-1.14.0-43.el7.aarch64
sssd-1.14.0-43.el7.aarch64
~]# uname -a
Linux apm-mustang-ev3-01.lab.eng.brq.redhat.com 4.5.0-11.el7.aarch64 #1 SMP Thu Sep 15 08:51:15 EDT 2016 aarch64 aarch64 aarch64 GNU/Linux
[root@apm-mustang-ev3-01 ~]# systemctl stop chronyd.service
[root@apm-mustang-ev3-01 ~]# date
Wed Sep 21 16:25:04 CEST 2016
[root@apm-mustang-ev3-01 ~]# hwclock
Thu 18 Aug 2016 01:14:43 PM CEST  -0.441143 seconds
[root@apm-mustang-ev3-01 ~]# reboot
[......]
[root@apm-mustang-ev3-01 ~]# journalctl -b0 | egrep -i 'rtc|chrony'
Aug 18 15:16:15 localhost.localdomain kernel: ACPI: SSDT 0x00000043FA818000 000032 (v02 REDHAT UARTCLKS 00000001 01000013)
Aug 18 15:16:16 localhost.localdomain kernel: rtc-efi rtc-efi: rtc core: registered rtc-efi as rtc0
Aug 18 15:16:16 localhost.localdomain kernel: rtc-efi rtc-efi: setting system clock to 2016-08-18 13:16:15 UTC (1471526175)
Aug 18 13:16:22 localhost.localdomain systemd[1]: RTC configured in localtime, applying delta of 120 minutes to system time.
Aug 18 13:16:29 localhost.localdomain chronyd[733]: chronyd version 2.1.1 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +DEBUG +ASYNCDNS +IPV6 +SECHASH)
Aug 18 13:16:29 localhost.localdomain chronyd[733]: Frequency 80.091 +/- 0.075 ppm read from /var/lib/chrony/drift
Aug 18 13:16:30 localhost.localdomain systemd[1]: Starting Wait for chrony to synchronize system clock...
Aug 18 13:16:49 apm-mustang-ev3-01.lab.eng.brq.redhat.com chronyd[733]: Selected source 10.34.255.7
Aug 18 13:16:49 apm-mustang-ev3-01.lab.eng.brq.redhat.com chronyd[733]: System clock wrong by 2949029.077709 seconds, adjustment started
Sep 21 16:27:18 apm-mustang-ev3-01.lab.eng.brq.redhat.com chronyd[733]: System clock was stepped by 2949029.077709 seconds
Sep 21 16:27:19 apm-mustang-ev3-01.lab.eng.brq.redhat.com systemd[1]: Started Wait for chrony to synchronize system clock.
[root@apm-mustang-ev3-01 ~]# uptime
 16:28:27 up 1 min,  1 user,  load average: 0.41, 0.23, 0.09
[root@apm-mustang-ev3-01 ~]# systemctl status sssd
● sssd.service - System Security Services Daemon
   Loaded: loaded (/usr/lib/systemd/system/sssd.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/sssd.service.d
           └─journal.conf
   Active: active (running) since Thu 2016-08-18 13:16:32 CEST; 1 months 3 days ago
  Process: 729 ExecStart=/usr/sbin/sssd -D -f (code=exited, status=0/SUCCESS)
 Main PID: 776 (sssd)
   CGroup: /system.slice/sssd.service
           ├─776 /usr/sbin/sssd -D -f
           ├─778 /usr/libexec/sssd/sssd_be --domain testad.qe --uid 0 --gid 0 --debug-to-files
           ├─808 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files
           └─809 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --debug-to-files

Aug 18 13:16:29 localhost.localdomain systemd[1]: Starting System Security Services Daemon...
Aug 18 13:16:30 localhost.localdomain sssd[776]: Starting up
Aug 18 13:16:30 localhost.localdomain sssd[be[testad.qe]][778]: Starting up
Aug 18 13:16:32 localhost.localdomain sssd[nss][808]: Starting up
Aug 18 13:16:32 localhost.localdomain sssd[pam][809]: Starting up
Aug 18 13:16:32 localhost.localdomain systemd[1]: Started System Security Services Daemon.
Warning: sssd.service changed on disk. Run 'systemctl daemon-reload' to reload units.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2476.html