Red Hat Bugzilla – Bug 1401546
Please back-port fast failover from sssd 1.14 on RHEL 7 into sssd 1.13 on RHEL 6
Last modified: 2018-06-19 01:15:04 EDT
Description of problem:
Here is the scenario:
1) We have a working machine talking to LDAP for user authentication. We are leveraging SSSD for this communication, via the nss and pam services.
2) We use iptables and insert a rule to drop the LDAP traffic.
3) After an extended period of time (say 20 minutes) we remove the iptables rule and permit the LDAP traffic again.
4) We used tcpdump to observe LDAP traffic during the outage.

When performing the above, it takes over 5 minutes for the system to wake up and re-establish a connection to LDAP.

The system we tested this on is:

Version-Release number of selected component (if applicable):
RHEL 6.n, sssd <= 1.13

How reproducible:
Always

Steps to Reproduce:
1. See above.

Actual results:
It takes a long time (over 5 minutes) to reconnect.

Expected results:
It should reconnect to its LDAP server more quickly after the LDAP server becomes available again.

Additional info:
We did an evaluation of the latest 7.3 release and noticed that sssd recovered almost instantaneously after a prolonged outage. Looking at the version and changelog, it appears this issue was fixed in sssd 1.14.0; eagle-eyed Mike picked up on it:

[host.example.com ~]$ rpm --changelog -q sssd
# snippet
* Tue Jul 12 2016 Jakub Hrozek <jhrozek@redhat.com> - 1.14.0-3
- Sync a few minor patches from upstream
- Fix a failover issue
- Resolves: rhbz#1334749 - sssd fails to mark a connection as bad on searches that time out

So the question becomes: can sssd 1.14.0 be made available for RHEL 6? Or would it be possible to back-port this fix to sssd 1.13?
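The outage in steps 2-4 above can be sketched as follows. This is a hedged sketch, not taken from the bug report: it assumes plain LDAP on TCP port 389 (use 636 for LDAPS) and that the OUTPUT chain is appropriate for the client's firewall; it must be run as root on the SSSD client.

```shell
#!/bin/sh
# Hypothetical reproduction helper. The port and the chain (OUTPUT) are
# assumptions; adjust them to match the actual environment.
LDAP_PORT=389
OUTAGE_SECS=$((20 * 60))   # the "say 20min" outage from the description

block_ldap()   { iptables -I OUTPUT -p tcp --dport "$LDAP_PORT" -j DROP; }
unblock_ldap() { iptables -D OUTPUT -p tcp --dport "$LDAP_PORT" -j DROP; }

# Intended use (as root):
#   block_ldap
#   sleep "$OUTAGE_SECS"
#   unblock_ldap
#   tcpdump -ni any "tcp port $LDAP_PORT"   # watch for the reconnect
```

After `unblock_ldap`, the tcpdump output shows how long SSSD takes to re-establish the connection, which is the delay this bug is about.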
(In reply to Greg Scott from comment #0)
> So the question becomes, can sssd 1.14.0 be made available for RHEL 6?
> Or would it be possible to back-port this fix to sssd 1.13?

There isn't any technical problem with building sssd-1.14 on rhel6:
https://copr.fedorainfracloud.org/coprs/g/sssd/sssd-1-14/

However, rhel6 is in a production phase which does not allow rebases.

Regarding BZ1334749: the fix is trivial, https://fedorahosted.org/sssd/ticket/3009. The question is whether it will actually help the customer, or whether it is just a guess based on the rhel7.3 changelog.
(In reply to Lukas Slebodnik from comment #2)
> Regarding to BZ1334749. The fix is trivial
> https://fedorahosted.org/sssd/ticket/3009. The question is whether it will
> help for customer or it is just guessing based on rhel7.3 changelog.

I talked to Greg over e-mail and they were guessing. But I asked for a bugzilla to be opened nonetheless, because otherwise we would just forget about this problem. So I built the test package as requested:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=12198058

However, the problem as described here sounds a bit different from what I thought it was. The fix was about "failing fast": if a server was marked as bad, it took SSSD too long to go into offline mode. The customer's problem seems to be "recovering fast", which is quite different. Moreover, as I already said in the case earlier, sssd mostly reacts to queries from the outside; it rarely performs any lookups or reconnects on its own.

So I would like to ask for:
1) the test package to be tried out. It's the 6.9 candidate with the extra patch applied.
2) if this doesn't help (and I'm no longer convinced it would), then please attach sssd logs that capture the period between inserting and removing the iptables rules. The logs in the customer case do not capture any failover at all.

Also, I see errors like "Unexpected result from ldap: Server is unwilling to perform(53), Rejecting the requested operation because the connection has not been authenticated", which suggests they are using an anonymous bind against a server that requires authentication (like AD).
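To capture the failover logs requested above, debug logging can be raised in /etc/sssd/sssd.conf before reproducing the outage. This is a sketch, not from the thread: the domain name is a placeholder, and the exact level is a judgment call (higher levels are more verbose).

```ini
# Sketch of /etc/sssd/sssd.conf debug settings; "example.com" is a
# placeholder for the customer's actual domain section name.
[domain/example.com]
debug_level = 9     ; logs go to /var/log/sssd/sssd_example.com.log

[nss]
debug_level = 9

[pam]
debug_level = 9
```

SSSD must be restarted after the change (on RHEL 6: `service sssd restart`), and the logs covering the window between inserting and removing the iptables rule are the interesting part.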
Jakub, I pasted most of your update and your link into the support case yesterday and asked the customer to test the package you built.

This thought may be too creative but it's bound to come up. If the customer were to grab a 1.14 sssd from a RHEL 7 repo and try to install that RPM onto RHEL 6, would the install fail with a bazillion dependency problems?

- Greg
(In reply to Greg Scott from comment #6)
> Jakub, I pasted most of your update and your link into the support case
> yesterday and asked the customer to test the package you built.

Thank you.

> This thought may be too creative but it's bound to come up. If the customer
> were to grab a 1.14 sssd from a RHEL 7 repo and try to install that RPM
> onto RHEL 6, would the install fail with a bazillion dependency problems?

Yes, this wouldn't work and wouldn't be supported.
I have this comment from the customer:

> Unfortunately I'm unable to get to the URL
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=12198058 from
> within PNC and even my home system. Is there an alternative way to grab the
> package? Also we are running rhel6.6 so do I need to upgrade to rhel6.9 for
> testing or will this package work on 6.6?

My fault, it's probably private and I should have packaged this better for the customer. I navigated through the URL above and found the RPMs. It looks like there are a bunch of dependencies that also need to be installed.

This is what I think needs to happen. First, do:

  yum update sssd

to get to the stock 1.13 sssd and all its dependencies. Then do:

  rpm -i sssd-1.13.3-51.el6.i686.rpm

on top of the sssd 1.13 already in place. Am I on solid ground?

thanks

- Greg
(In reply to Greg Scott from comment #8)
> This is what I think needs to happen.
>
> First, do:
>
>   yum update sssd
>
> to get to the stock 1.13 sssd and all its dependencies. Then do:
>
>   rpm -i sssd-1.13.3-51.el6.i686.rpm
>
> on top of the sssd 1.13 already in place. Am I on solid ground?

IIRC the only new dependency which is not in rhel6.8 is the set of packages built from ding-libs-0.4.0-12.el6.
Yeah, I realized the dependency might be missing, so I'm building another test package with just that single patch cherry-picked atop 6.8 and without requiring a newer ding-libs:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=12215039

This one should install cleanly on a RHEL-6 system. You should use:

  rpm -Uvh *.rpm

to upgrade to these packages. None of the -devel packages are needed for a test, and in general only the packages the customer already has should be upgraded (even upgrading only sssd-common should work).
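The "upgrade only what is already installed" advice above can be sketched like this. This is a hypothetical helper, not from the thread; the version-release string and arch are illustrative assumptions.

```shell
#!/bin/sh
# Build an "rpm -Uvh" command covering only the sssd subpackages already
# on the system, so no new subpackages (e.g. -devel) get pulled in.
# Version-release and arch below are illustrative assumptions.
VR="1.13.3-22.el6_8.6.1"
ARCH="x86_64"

build_upgrade_cmd() {
    # $@ : names of installed sssd subpackages, e.g. obtained with
    #      rpm -qa --qf '%{NAME}\n' 'sssd*' 'libsss*'
    cmd="rpm -Uvh"
    for pkg in "$@"; do
        cmd="$cmd ${pkg}-${VR}.${ARCH}.rpm"
    done
    printf '%s\n' "$cmd"
}

# Example:
#   build_upgrade_cmd sssd sssd-common
#   -> rpm -Uvh sssd-1.13.3-22.el6_8.6.1.x86_64.rpm sssd-common-1.13.3-22.el6_8.6.1.x86_64.rpm
```

Running the single resulting `rpm -Uvh` line keeps the upgrade in one transaction, which is what lets rpm resolve the inter-subpackage dependencies among the test RPMs.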
I may have messed up here. I see references to src.rpm files in that link above, but I don't see any installable RPMs. I clicked on some buttons and ended up at:

https://brewweb.engineering.redhat.com/brew/packageinfo?packageID=17261

where I saw a package named sssd-1.13.3-51.el6.i686. That linked to:

https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=527920

where I saw installable RPMs, including sssd-1.13.3-51.el6.i686.rpm, built yesterday, which I attached to the support case. Did I get the wrong one?

- Greg
(In reply to Greg Scott from comment #11)
> I may have messed up here. I see references to src.rpm files in that link
> above, but I don't see any installable RPMs. [...]
> where I saw installable RPMs, including:
>
> sssd-1.13.3-51.el6.i686.rpm
>
> built yesterday, which I attached to the support case. Did I get the wrong
> one?

You'll want to use the build I provided yesterday. Sorry about the first one; I didn't realize there was a dependency on a newer libini package than the customer has through RHN, so those packages wouldn't install cleanly.

As for the arch-specific packages, here are the ones for x86_64:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=12215044

and here are the packages for i686:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=12215046
Thanks Jakub. I just now attached sssd-1.13.3-22.el6_8.6.1.x86_64.rpm and sssd-1.13.3-22.el6_8.6.1.i686.rpm to the case.

The customer wants to stay at RHEL 6.6, so I left instructions to do "yum update sssd" to get the stock sssd 1.13 and all its dependencies, and then install the RPM above by hand.

- Greg
After updating sssd to the stock 1.13 version, the customer had to install several of the patched RPMs by hand. Here is feedback from the support case after installing and testing:

**********
"So was able to update the packages. Helmuth ran a test and the issue still exists unfortunately.

I went ahead and attached a sosreport from our test system in case you would to take a look at our config. Please let me know if you need anything else or what the next steps are."
**********

If it's easy to do and low risk, would it be possible to just build 1.14 on RHEL 6?

- Greg
(In reply to Greg Scott from comment #14)
> After updating sssd to the stock 1.13 version, the customer had to install
> several of the patched RPMs by hand. [...]
> If it's easy to do and low risk, would it be possible to just build 1.14 on
> RHEL 6?

Possible yes, supported no. In general, I'm wary of asking customers to run the upstream repositories, because we've had a couple of cases in the past where the customer then came back and asked for support on those.

Could we see two sets of debug logs, one from the working and one from the non-working system, that capture the problem so that we can see what the issue might be?
Oh - sorry - I didn't mean for the customer to build 1.14. The customer is asking Red Hat to build and fully support sssd 1.14 on RHEL 6.

The customer attached a sosreport from the problem system to the support case. Where's a good place to get you a copy?

- Greg
(In reply to Greg Scott from comment #16)
> Oh - sorry - I didn't mean for the customer to build 1.14. The customer is
> asking Red Hat to build and fully support sssd 1.14 on RHEL 6.

That won't happen, sorry. RHEL-6 is already in production phase 2 and only urgent and high-priority fixes are cherry-picked individually.

> The customer attached a sosreport from the problem system to the support
> case. Where's a good place to get you a copy?

Attach the sssd debug logs to this bugzilla, please.
Upstream ticket: https://fedorahosted.org/sssd/ticket/3274
Hi Jakub - Since the customer tested the patch and it works for them, what should I tell them about plans, if any, to incorporate it into a RHEL 6 stream? thanks - Greg
(In reply to Greg Scott from comment #31)
> Hi Jakub - Since the customer tested the patch and it works for them, what
> should I tell them about plans, if any, to incorporate it into a RHEL 6
> stream?

Unfortunately, I think it's too late for 6.9 unless an exception is granted by a PM, since we are already in the snapshot phase. I think either 6.10 or 6.9.z is more realistic.
OK, thanks - here is what I said in the support case:

******************
Created By: Greg Scott (1/13/2017 10:53 AM)

It looks like the engineering team is planning to incorporate the patch you tested into a later RHEL 6 stream. It's too late for 6.9, so hopefully 6.10. It may make it into a later 6.9.z release stream. I'll keep an eye on it and update here when new information is available.

- Greg
******************

- Greg
Groovy - thanks to both Jakub and Fabiano. If you have to stand on your head to force-fit these fixes into SSSD 1.13, and if SSSD 1.14 saves your developer time, and if you guys think it's the best way to proceed, then I'll vote to bend the other rule and slip SSSD 1.14 into RHEL 6.10.z, or more likely 6.11. - Greg
(In reply to Greg Scott from comment #48)
> Groovy - thanks to both Jakub and Fabiano.
>
> If you have to stand on your head to force-fit these fixes into SSSD 1.13,
> and if SSSD 1.14 saves your developer time, and if you guys think it's the
> best way to proceed, then I'll vote to bend the other rule and slip SSSD
> 1.14 into RHEL 6.10.z, or more likely 6.11.

Putting a major update into a z-stream is generally forbidden, and there won't be (AFAIK) a 6.11, so I think 6.10 is the last chance we've got.

I've been testing sssd master's reconnection logic lately and I realized we broke something else in 1.15. I'm not sure it's related to the code in 1.13 at all, but I would like to make sure that we're not backporting a bug. If the customer has some time, it would be nice if they could test the 6.10 candidate to make sure nothing else changed between the test build and the 6.10 candidate.
Thanks Jakub. I just left a comment in the support case asking about it. - Greg
(In reply to Greg Scott from comment #50)
> Thanks Jakub. I just left a comment in the support case asking about it.

Did the customer agree to test? If yes, this is the link:
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=657662
Thanks Jakub. No answer yet from the customer. - Greg
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:1877