Bug 1439457
| Summary: | SSSD does not start after upgrade from 7.3 to 7.4 | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Steeve Goveas <sgoveas> | ||||||||||
| Component: | sssd | Assignee: | SSSD Maintainers <sssd-maint> | ||||||||||
| Status: | CLOSED ERRATA | QA Contact: | Steeve Goveas <sgoveas> | ||||||||||
| Severity: | unspecified | Docs Contact: | |||||||||||
| Priority: | unspecified | ||||||||||||
| Version: | 7.4 | CC: | arusso, grajaiya, jhrozek, lslebodn, mkosek, mupadhye, mzidek, pbrezina, sbose, sgoveas, sssd-qe, tscherf | ||||||||||
| Target Milestone: | rc | ||||||||||||
| Target Release: | --- | ||||||||||||
| Hardware: | Unspecified | ||||||||||||
| OS: | Unspecified | ||||||||||||
| Whiteboard: | |||||||||||||
| Fixed In Version: | sssd-1.15.2-29.el7 | Doc Type: | If docs needed, set a value | ||||||||||
| Doc Text: | Story Points: | --- | |||||||||||
| Clone Of: | Environment: | ||||||||||||
| Last Closed: | 2017-08-01 09:04:18 UTC | Type: | Bug | ||||||||||
| Regression: | --- | Mount Type: | --- | ||||||||||
| Documentation: | --- | CRM: | |||||||||||
| Verified Versions: | Category: | --- | |||||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||||
| Embargoed: | |||||||||||||
| Attachments: |
|
||||||||||||
This is the error message: (Thu Apr 6 08:53:39 2017) [sssd[be[sssdad2012r2.com]]] [dp_module_open_lib] (0x0010): Unable to load module [ad] with path [/usr/lib64/sssd/libsss_ad.so]: libwbclient.so.0: cannot open shared object file: No such file or directory Indeed libsss_ad.so depends on libwbclient, but is it required? I cannot see a direct dependency, maybe it is pull in by a Samba library, but even here I think this should not be needed. Looks like libsmbclient pulls it in, but it does so since ever. Maybe there is a version mismatch in the alternatives configuration but it looks like the latest Samba and SSSD packages are ok in this respect. Which Samba version is installed in your test system? (In reply to Sumit Bose from comment #4) > Looks like libsmbclient pulls it in, but it does so since ever. Maybe there > is a version mismatch in the alternatives configuration but it looks like > the latest Samba and SSSD packages are ok in this respect. Which Samba > version is installed in your test system? Can be some issue with scriptlets for updating libwbclient alternatives and restarting sssd service. IMHO it depends on the order of actions in yum transaction. But it can be an explanation why it works after manual restart. yumlog might help. Created attachment 1269193 [details]
yum log
[root@vm-idm-004 ~]# rpm -qa | grep samba
samba-libs-4.6.2-1.el7.x86_64
samba-common-tools-4.6.2-1.el7.x86_64
samba-common-libs-4.6.2-1.el7.x86_64
samba-4.6.2-1.el7.x86_64
samba-client-libs-4.6.2-1.el7.x86_64
samba-common-4.6.2-1.el7.noarch
(In reply to Steeve Goveas from comment #6) > Created attachment 1269193 [details] > yum log > > [root@vm-idm-004 ~]# rpm -qa | grep samba > samba-libs-4.6.2-1.el7.x86_64 > samba-common-tools-4.6.2-1.el7.x86_64 > samba-common-libs-4.6.2-1.el7.x86_64 > samba-4.6.2-1.el7.x86_64 > samba-client-libs-4.6.2-1.el7.x86_64 > samba-common-4.6.2-1.el7.noarch It is not verbose enough and I assume that standard error is missing in attached file.. Try following command yum --assumeyes --obsoletes --debuglevel=6 --errorlevel=10 update &> yumlog Created attachment 1269283 [details]
yum log with debuglevel=6 errorlevel=10
May guess it that we should update # In order for libwbclient to be upgraded before sssd-ad and sets up the # alternatives symlink Requires: libwbclient >= 4.2.3-1 in sssd.spec to libwbclient >= 4.6.2-1. The the previous version is 4.4.4-9 both versions would match the Requires and hence the update order is not deterministic. btw does the issue happen during sssd restart?
I can see this hack in sssd.conf which was I think added for a similar reason:
1232 %posttrans common
1233 /bin/systemctl daemon-reload >/dev/null 2>&1 || :
1234 if [ -f /var/tmp/sssd.upgrade ]; then
1235 /bin/systemctl restart sssd.service >/dev/null 2>&1 || :
1236 else
1237 /bin/systemctl try-restart sssd.service >/dev/null 2>&1 || :
1238 fi
1239 /usr/bin/rm -f /var/tmp/sssd.upgrade || :
The /var/tmp/sssd.upgrade is created in %pre common
But we also have:
1152 %postun common
1153 %systemd_postun_with_restart sssd.service
Where the '%systemd_postun_with_restart' macro expands to:
if [ $1 -ge 1 ] ; then
# Package upgrade, not uninstall
systemctl try-restart >/dev/null 2>&1 || :
fi
So I wonder if just removing the '%systemd_postun_with_restart sssd.service' would help here?
Yes, it happened during restart but I think the posttrans one failed here because the libwbclient update wasn't finished because the Requires did not force libwbclient update to be finished first. (In reply to Sumit Bose from comment #11) > Yes, it happened during restart but I think the posttrans one failed here > because the libwbclient update wasn't finished because the Requires did not > force libwbclient update to be finished first. sssd update alternatives in post and not posttrans. But sssd-wbclient was not installed. I assume that libwbclient from samba install new alternative in "posttrans" and "posttrans" for sssd-common was executed before libwbclient. Maybe "Requires(post)" might help. But I cannot see output from rpm scriptlets in the last yum update log. We might need to debuglevel to 10 or even to change --rpmverbosity. Otherwise we can just guess when it failed. This build: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=13007985 implements Sumit's suggestion from comment #9. Steeve, would it be possible to test it out? Since 1.14.0-17 we tried to restart sssd in postun scriptlet in case of upgrade
using rpm macro "%systemd_postun_with_restart". It is a best practice with systemd.
It is not a problem if sssd-common is installed before libwbclient. Because postun scriptlets are executed in the same order as packages were in yum transaction. sssd-common postun will try restart sssd.service and libwbclient postun scritlet will remove an alternative.
But I am not aware of a way to guarantee such order. (sssd-common before libwbclient or before sssd-libwbclient). We can guarantee reverse order in sssd-common after these libraries.
In most cases (99%), postun scriptlet for sssd-common does not cause problems.
However we tried to restart sssd one more time in %posttrans for sssd-common due to some older bugs.
%posttrans common
/bin/systemctl daemon-reload >/dev/null 2>&1 || :
if [ -f /var/tmp/sssd.upgrade ]; then
/bin/systemctl restart sssd.service >/dev/null 2>&1 || :
else
/bin/systemctl try-restart sssd.service >/dev/null 2>&1 || :
fi
/usr/bin/rm -f /var/tmp/sssd.upgrade || :
In time of execution of this scriptlet, libwbclient alternative was already removed but it was not installed yet. Because sssd-common was installed before libwbclient. There was missing library libwbclient.so.
I was able to enforce upgrade/install order between sssd-common and libwbclient
* either sssd-common contained OrderWithRequires: *libwbclient
* or {sssd-,}libwbclient contianed OrderWithRequires: sssd-common.
However order of cleanup for these packages were wrong in both cases.
and therefore alternative was removed before sssd.service was restarted.
It looks like we need to use uglier version with posttrans.
Created attachment 1276186 [details]
Spec file changes
Verified with version
sssd-1.15.2-37.el7
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [ LOG ] :: Check sssd is running after upgrade
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [ BEGIN ] :: Running 'strict eval 'rpm -q sssd''
sssd-1.14.0-43.el7.x86_64
:: [ PASS ] :: Command 'strict eval 'rpm -q sssd'' (Expected 0, got 0)
:: [ BEGIN ] :: Running 'strict eval 'systemctl status sssd.service''
● sssd.service - System Security Services Daemon
Loaded: loaded (/usr/lib/systemd/system/sssd.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/sssd.service.d
└─journal.conf
Active: active (running) since Tue 2017-05-30 16:06:53 IST; 7s ago
Process: 14628 ExecStart=/usr/sbin/sssd -D -f (code=exited, status=0/SUCCESS)
Main PID: 14637 (sssd)
CGroup: /system.slice/sssd.service
├─14637 /usr/sbin/sssd -D -f
├─14654 /usr/libexec/sssd/sssd_be --domain sssdad2012r2.com --uid 0 --gid 0 --debug-to-files
├─14655 /usr/libexec/sssd/sssd_be --domain rfc2307 --uid 0 --gid 0 --debug-to-files
├─14656 /usr/libexec/sssd/sssd_be --domain rfc2307bis --uid 0 --gid 0 --debug-to-files
├─14657 /usr/libexec/sssd/sssd_be --domain rfc2307bis_broken --uid 0 --gid 0 --debug-to-files
├─14658 /usr/libexec/sssd/sssd_be --domain proxy --uid 0 --gid 0 --debug-to-files
├─14659 /usr/libexec/sssd/sssd_be --domain LDAP --uid 0 --gid 0 --debug-to-files
├─14663 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files
└─14664 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --debug-to-files
May 30 16:06:53 ratchet.lab.eng.pnq.redhat.com sssd_be[14654]: GSSAPI client step 1
May 30 16:06:53 ratchet.lab.eng.pnq.redhat.com sssd_be[14654]: GSSAPI client step 2
May 30 16:07:00 ratchet.lab.eng.pnq.redhat.com proxy_child[14745]: Trying to become user [0][0].
May 30 16:07:00 ratchet.lab.eng.pnq.redhat.com proxy_child[14745]: Already user [0].
May 30 16:07:00 ratchet.lab.eng.pnq.redhat.com proxy_child[14745]: server_sort:Unable to register control with rootdse!
May 30 16:07:00 ratchet.lab.eng.pnq.redhat.com sssd[proxy_child[proxy]][14745]: Starting up
May 30 16:07:00 ratchet.lab.eng.pnq.redhat.com sssd_be[14654]: GSSAPI client step 1
May 30 16:07:00 ratchet.lab.eng.pnq.redhat.com sssd_be[14654]: GSSAPI client step 1
May 30 16:07:00 ratchet.lab.eng.pnq.redhat.com sssd_be[14654]: GSSAPI client step 1
May 30 16:07:00 ratchet.lab.eng.pnq.redhat.com sssd_be[14654]: GSSAPI client step 2
:: [ PASS ] :: Command 'strict eval 'systemctl status sssd.service'' (Expected 0, got 0)
:: [ BEGIN ] :: Running 'strict eval 'yum --assumeyes --obsoletes update > yumlog''
MARK-LWD-LOOP -- 2017-05-30 16:10:48 --
MARK-LWD-LOOP -- 2017-05-30 16:15:48 --
MARK-LWD-LOOP -- 2017-05-30 16:20:49 --
MARK-LWD-LOOP -- 2017-05-30 16:25:49 --
:: [ PASS ] :: Command 'strict eval 'yum --assumeyes --obsoletes update > yumlog'' (Expected 0, got 0)
:: [ BEGIN ] :: Running 'strict eval 'rpm -q sssd''
sssd-1.15.2-37.el7.x86_64
:: [ PASS ] :: Command 'strict eval 'rpm -q sssd'' (Expected 0, got 0)
:: [ BEGIN ] :: Running 'strict eval 'systemctl status sssd.service | grep "Active: active (running).*"''
Active: active (running) since Tue 2017-05-30 16:25:24 IST; 2min 14s ago
:: [ PASS ] :: Command 'strict eval 'systemctl status sssd.service | grep "Active: active (running).*"'' (Expected 0, got 0)
'770f6a8a-9b48-4097-ad13-6b6a03a1cef5'
Check-sssd-is-running-after-upgrade result: PASS
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:2294 |
Created attachment 1269165 [details] All sssd logs and messages file Description of problem: SSSD does not start after upgrade from 7.3 to 7.4. Version-Release number of selected component (if applicable): sssd-1.15.2-8.el7.x86_64 How reproducible: always Steps to Reproduce: [root@vm-idm-004 ~]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.3 (Maipo) [root@vm-idm-004 ~]# rpm -q sssd sssd-1.14.0-43.el7.x86_64 Check sssd service is running [root@vm-idm-004 ~]# yum --assumeyes --obsoletes update > yumlog [root@vm-idm-004 ~]# rpm -q sssd sssd-1.15.2-8.el7.x86_64 [root@vm-idm-004 ~]# systemctl status sssd Actual results: [root@vm-idm-004 ~]# systemctl status sssd ● sssd.service - System Security Services Daemon Loaded: loaded (/usr/lib/systemd/system/sssd.service; enabled; vendor preset: disabled) Drop-In: /etc/systemd/system/sssd.service.d └─journal.conf Active: failed (Result: exit-code) since Thu 2017-04-06 08:53:56 IST; 53min ago Process: 7408 ExecStart=/usr/sbin/sssd -i -f (code=exited, status=1/FAILURE) Main PID: 7408 (code=exited, status=1/FAILURE) Apr 06 08:53:56 vm-idm-004.lab.eng.pnq.redhat.com sssd[7408]: Exiting the SSSD. Could not restart critical service [sssdad2012r2.com]. Apr 06 08:53:56 vm-idm-004.lab.eng.pnq.redhat.com sssd[be[LDAP]][7414]: Shutting down Apr 06 08:53:56 vm-idm-004.lab.eng.pnq.redhat.com sssd[be[proxy]][7413]: Shutting down Apr 06 08:53:56 vm-idm-004.lab.eng.pnq.redhat.com sssd[be[rfc2307bis_broken]][7412]: Shutting down Apr 06 08:53:56 vm-idm-004.lab.eng.pnq.redhat.com sssd[be[rfc2307bis]][7411]: Shutting down Apr 06 08:53:56 vm-idm-004.lab.eng.pnq.redhat.com sssd[be[rfc2307]][7410]: Shutting down Apr 06 08:53:56 vm-idm-004.lab.eng.pnq.redhat.com systemd[1]: sssd.service: main process exited, code=exited, status=1/FAILURE Apr 06 08:53:56 vm-idm-004.lab.eng.pnq.redhat.com systemd[1]: Failed to start System Security Services Daemon. Apr 06 08:53:56 vm-idm-004.lab.eng.pnq.redhat.com systemd[1]: Unit sssd.service entered failed state. Apr 06 08:53:56 vm-idm-004.lab.eng.pnq.redhat.com systemd[1]: sssd.service failed. Expected results: SSSD should start Additional info: On starting manually it works [root@vm-idm-004 ~]# systemctl start sssd [root@vm-idm-004 ~]# systemctl status sssd ● sssd.service - System Security Services Daemon Loaded: loaded (/usr/lib/systemd/system/sssd.service; enabled; vendor preset: disabled) Drop-In: /etc/systemd/system/sssd.service.d └─journal.conf Active: active (running) since Thu 2017-04-06 10:08:21 IST; 16s ago Main PID: 30753 (sssd) CGroup: /system.slice/sssd.service ├─30753 /usr/sbin/sssd -i -f ├─30754 /usr/libexec/sssd/sssd_be --domain sssdad2012r2.com --uid 0 --gid 0 --debug-to-files ├─30755 /usr/libexec/sssd/sssd_be --domain rfc2307 --uid 0 --gid 0 --debug-to-files ├─30756 /usr/libexec/sssd/sssd_be --domain rfc2307bis --uid 0 --gid 0 --debug-to-files ├─30757 /usr/libexec/sssd/sssd_be --domain rfc2307bis_broken --uid 0 --gid 0 --debug-to-files ├─30758 /usr/libexec/sssd/sssd_be --domain proxy --uid 0 --gid 0 --debug-to-files ├─30759 /usr/libexec/sssd/sssd_be --domain LDAP --uid 0 --gid 0 --debug-to-files ├─30821 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files └─30822 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --debug-to-files messages Apr 6 10:08:15 vm-idm-004 systemd: Starting System Security Services Daemon... Apr 6 10:08:15 vm-idm-004 sssd: Starting up Apr 6 10:08:15 vm-idm-004 sssd[be[sssdad2012r2.com]]: Starting up Apr 6 10:08:16 vm-idm-004 sssd[be[rfc2307]]: Starting up Apr 6 10:08:16 vm-idm-004 sssd[be[rfc2307bis_broken]]: Starting up Apr 6 10:08:16 vm-idm-004 sssd[be[rfc2307bis]]: Starting up Apr 6 10:08:16 vm-idm-004 sssd[be[proxy]]: Starting up Apr 6 10:08:16 vm-idm-004 sssd[be[LDAP]]: Starting up Apr 6 10:08:16 vm-idm-004 sssd: tkey query failed: GSSAPI error: Major = Unspecified GSS failure. Minor code may provide more information, Minor = Server not found in Kerberos database. Apr 6 10:08:16 vm-idm-004 sssd: tkey query failed: GSSAPI error: Major = Unspecified GSS failure. Minor code may provide more information, Minor = Server not found in Kerberos database. Apr 6 10:08:16 vm-idm-004 sssd: ; TSIG error with server: tsig verify failure Apr 6 10:08:16 vm-idm-004 sssd: update failed: REFUSED Apr 6 10:08:16 vm-idm-004 sssd: ; TSIG error with server: tsig verify failure Apr 6 10:08:16 vm-idm-004 sssd: update failed: REFUSED Apr 6 10:08:16 vm-idm-004 sssd: ; TSIG error with server: tsig verify failure Apr 6 10:08:16 vm-idm-004 sssd: update failed: REFUSED Apr 6 10:08:20 vm-idm-004 sssd[nss]: Starting up Apr 6 10:08:20 vm-idm-004 sssd[pam]: Starting up Apr 6 10:08:21 vm-idm-004 systemd: Started System Security Services Daemon. Apr 6 10:08:23 vm-idm-004 logger: 2017-04-06 10:08:23 /usr/bin/rhts-test-runner.sh 86617 35460 hearbeat...