Bug 1620315
Summary: | FreeIPA webUI user authentication fails with 389-ds-base 1.3.8.6+ | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Adam Williamson <awilliam> |
Component: | 389-ds-base | Assignee: | mreynolds |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 27 | CC: | lkrispen, mhonek, mreynolds, rmeggins, tbordaz, vashirov |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2018-11-27 15:05:29 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Attachments: |
Description
Adam Williamson
2018-08-23 01:56:43 UTC
Created attachment 1478020 [details]
server logs (tarball of /var/log from the server)
Created attachment 1478021 [details]
client logs (tarball of /var/log from the client)
dirsrv logs seem to be missing from the tarball, without them it's hard to say why the operation failed. In 1.3.8.5 there was a filter optimization fix, 1.3.8.6 only backs out fix for https://pagure.io/389-ds-base/issue/49789, so I wonder if 1.3.8.5 failed too or it only started with 1.3.8.6. The log tarballs contain the whole of /var/log ; if there's nothing in /var/log/dirsrv in the tarball, there wasn't anything in it on the host. The logs in question may be in the journal; did you check the journal? (You can read it with journalctl --file , the files are under var/log/journal ). The deal with 1.3.8.5 and 1.3.8.6 is that the thing backed out in 1.3.8.6 actually broke FreeIPA server deployment. So I can't really *prove* that this bug existed in 1.3.8.5 because the tests can't get that far, as the server deployment fails. However, since all 1.3.8.6 did was revert that problematic 'fix', logically speaking it seems almost certain that one of the *other* things in 1.3.8.5 (the ones that weren't reverted in 1.3.8.6) is what causes this bug. Very odd there is no logging for DS. Do you know if the logging was disabled intentionally? In the journal I do see the server's error log content but not the access log, and it's the access log that we need. Is this a system I can look at right now? No, the systems are spun up on demand and destroyed right after the test. But I can tweak things and re-run them. It should also be possible to recreate the test manually, though I have a bit of trouble doing so here as https://bugzilla.redhat.com/show_bug.cgi?id=1484130 *still* does not appear to be fixed, so I cannot use disk images on a shared drive, and I don't really have enough space on my local system for disks for the two VMs I usually use to recreate this kind of issue :/ Where would ds logging be configured? If you tell me where to look, I can get the test to tell us what's there, and change it if necessary. I'm on #freeipa on Freenode if you want to discuss this there... Oh, and the tests don't do anything to reduce logging detail (they do a few things to try and get *more* logs). If there aren't enough tests coming out of the logs, that's the way things are set up by default, apparently. Created attachment 1478337 [details]
updated server logs (with dirsrv)
Turns out the server logs being captured after FreeIPA decommissioning meant dirsrv logs were removed; here's an updated tarball with the logs captured before decommissioning, so dirsrv logs are present. The failed login as test3 happens at 19:46:32.0522 UTC (15:46 in the dirsrv logs) and the failed console login as test1 at 19:48:44.
Created attachment 1478355 [details]
server logs from a passed run (same test against a different update)
Here's the same server log tarball from a case where the tests *passed* - this is the exact same test run against a different update, so 389-ds 1.3.8.4 is used. Comparing logs from passed and failed cases may help, I guess. This is with the tweak to ensure dirsrv logs are included. In these logs, the failed attempt to log in to the web UI occurs at 2018-08-23T20:37:44.0902 UTC .
Err...I obviously meant that the *successful* attempt to log in to the web UI occurred at 2018-08-23T20:37:44.0902 UTC :) A quick look through the logs is not too revealing. I see several successful binds as test3: [23/Aug/2018:16:38:19.685947465 -0400] conn=101 op=0 BIND dn="" method=sasl version=3 mech=GSS-SPNEGO [23/Aug/2018:16:38:19.690602686 -0400] conn=101 op=0 RESULT err=0 tag=97 nentries=0 etime=0.0004827704 dn="uid=test3,cn=users,cn=accounts,dc=domain,dc=local" I don't see any errors. In the successful log I see "more" successful binds than in the failing log. I also see a direct (non-sasl) bind of the user in the successful log which I do not see in the failure log: [23/Aug/2018:16:38:19.291538409 -0400] conn=96 fd=122 slot=122 connection from local to /var/run/slapd-DOMAIN-LOCAL.socket [23/Aug/2018:16:38:19.291973196 -0400] conn=96 op=0 BIND dn="uid=test3,cn=users,cn=accounts,dc=domain,dc=local" method=128 version=3 [23/Aug/2018:16:38:19.292841696 -0400] conn=96 op=0 RESULT err=0 tag=97 nentries=0 etime=0.0001134752 dn="uid=test3,cn=users,cn=accounts,dc=domain,dc=local" [23/Aug/2018:16:38:19.293035197 -0400] conn=96 op=1 UNBIND [23/Aug/2018:16:38:19.293068018 -0400] conn=96 op=1 fd=122 closed - U1 Perhaps there is an IPA log that might have more information about why its failing? I will also do a more fine grained log analysis tomorrow. Yeah, that's more or less what I saw. Clearly more 'stuff' happens in the successful login case, but there's no clear reason why the failed one just apparently gives up - the transactions right before it gives up look just the same as in the successful case. I'm currently fiddling with making the test hack up the relevant bits of FreeIPA to log harder; will report any findings from that. Welp, so far, I know for sure that the error is happening here: https://github.com/freeipa/freeipa/blob/release-4-6-3/ipapython/ipaldap.py#L1467 I hacked up the routines there with some debugging statements, you can see the adjusted copy of the file with the added logging lines at https://www.happyassassin.net/temp/ipaldap.py - search for 'XXX'. Here are logs from passed and failed attempts with those changes: https://openqa.stg.fedoraproject.org/tests/344011/file/role_deploy_domain_controller_check-var_log.tar.gz (failed) https://openqa.stg.fedoraproject.org/tests/344013/file/role_deploy_domain_controller_check-var_log.tar.gz (passed) The attempts to log in to the web UI happened at 05:00:02 (pass) and 04:59:10 (fail). The log messages are in var/log/httpd/error.log (yeah, all IPA web UI log messages appear to go to error.log, even 'info' and 'debug' messages). Welp, that's fun - the 'success' case logs 'XXX raising EmptyResult error' (which is the log message I added right before the exception that seems to produce the "no matching entry found" error we see in the web UI) 9 times, the 'failure' case logs it 3 times. I'm done trying to figure out what the hell's going on here for tonight... I'm starting to wonder if this exception is a red herring, and what's really going on is the login process just gets...aborted, or cancelled, somehow, partway through, and *because* that happened, the web UI shows us some exception that was raised...but actually gets raised on the 'success' path too, we just don't get a message about it because the login succeeds. But really not sure. It really seems like the login process just kinda...stops, partway through. The dirsrv logs, httpd logs and even the system logs (there are way more occurrences of 'GSSAPI client step 1' in the journal around the time of the web UI login in the 'success' logs than the 'failure' logs) all look like that. But I can't find any smoking gun explaining *why*. We know there is an issue with one of the fixes in 1.3.8.5 around complex filters and access control. So I backed out the filter fix and did a copr build: https://copr-be.cloud.fedoraproject.org/results/@389ds/389-ds-base-nightly/fedora-27-x86_64/00790627-389-ds-base/ If it still fails there is one more candidate fix that might be related that I can back out as well, but let's see how this build reacts. Looking at the logs again I see that IPA is using GER (getEffectRights), and that behavior is different in the failure logs. We did have a complex fix around GER that went into 1.3.8.5. So I did a separate copr build with just the GER fix removed. I suspect this might actually be the bug, and not the filter fix from the first copr build I did. So please try this copr build first (GER fix removed): https://copr-be.cloud.fedoraproject.org/results/@389ds/389-ds-base-nightly/fedora-27-x86_64/00790664-389-ds-base/ Then if there is still a problem try this one(from comment 17): https://copr-be.cloud.fedoraproject.org/results/@389ds/389-ds-base-nightly/fedora-27-x86_64/00790627-389-ds-base/ Thanks! Will do! Backing the 1.3.8.5 commits out one at a time was gonna be my next play, but thanks for getting to it ahead of me and knowing which ones to try first :P OK. I did horrible things to openQA to make it test those builds, and it looks like the test *fails* with the first one (790664): https://openqa.stg.fedoraproject.org/tests/overview?distri=fedora&version=27&build=RHBZ1620315-790664-NOREPORT&groupid=2 but *passes* with the second one (790627): https://openqa.stg.fedoraproject.org/tests/overview?distri=fedora&version=27&build=RHBZ1620315-790267-NOREPORT&groupid=2 note I mistyped the ID as '790267' there, but fortunately I typo'ed it consistently but used the correct URLs, so the test did in fact download and use the build, as you can see: https://openqa.stg.fedoraproject.org/tests/344487#step/_advisory_update/23 so, it looks like the change from #c17 fixes things. Here is an example of the failure/success taken from the attached logs. FAILURE (role_deploy_domain_controller_check-var_log.tar.gz) [23/Aug/2018:15:46:32.800472178 -0400] conn=88 op=0 BIND dn="" method=sasl version=3 mech=GSS-SPNEGO [23/Aug/2018:15:46:32.807363447 -0400] conn=88 op=0 RESULT err=0 tag=97 nentries=0 etime=0.0007212877 dn="uid=test3,cn=users,cn=accounts,dc=domain,dc=local" ... [23/Aug/2018:15:46:32.825391581 -0400] conn=88 op=4 SRCH base="cn=masters,cn=ipa,cn=etc,dc=domain,dc=local" scope=1 filter="(objectClass=ipaConfigObject)" attrs="cn" [23/Aug/2018:15:46:32.825653982 -0400] conn=88 op=4 RESULT err=0 tag=101 nentries=0 etime=0.0000346845 [23/Aug/2018:15:46:32.826549921 -0400] conn=88 op=5 EXT oid="1.3.6.1.4.1.4203.1.11.3" name="whoami-plugin" [23/Aug/2018:15:46:32.826663836 -0400] conn=88 op=5 RESULT err=0 tag=120 nentries=0 etime=0.0000292653 [Thu Aug 23 15:46:32.816961 2018] [wsgi:error] [pid 7726:tid 139681378449152] [remote 10.0.2.102:37654] ipa: DEBUG: config_show(rights=False, all=False, raw=False, version='2.229') [Thu Aug 23 15:46:32.825950 2018] [wsgi:error] [pid 7726:tid 139681378449152] [remote 10.0.2.102:37654] ipa: INFO: test3: batch: config_show(): EmptyResult [Thu Aug 23 15:46:32.826114 2018] [wsgi:error] [pid 7726:tid 139681378449152] [remote 10.0.2.102:37654] ipa: DEBUG: raw: whoami(version='2.229') [Thu Aug 23 15:46:32.826234 2018] [wsgi:error] [pid 7726:tid 139681378449152] [remote 10.0.2.102:37654] ipa: DEBUG: whoami(version='2.229') ----------------------------------------------------------------------------------------- SUCCESS (role_deploy_domain_controller_check-var_log.tar.gz) [23/Aug/2018:16:37:45.128176099 -0400] conn=88 op=0 BIND dn="" method=sasl version=3 mech=GSS-SPNEGO [23/Aug/2018:16:37:45.133772003 -0400] conn=88 op=0 RESULT err=0 tag=97 nentries=0 etime=0.0005853464 dn="uid=test3,cn=users,cn=accounts,dc=domain,dc=local" ... [23/Aug/2018:16:37:45.151352349 -0400] conn=88 op=4 SRCH base="cn=masters,cn=ipa,cn=etc,dc=domain,dc=local" scope=1 filter="(objectClass=ipaConfigObject)" attrs="cn" [23/Aug/2018:16:37:45.151769181 -0400] conn=88 op=4 RESULT err=0 tag=101 nentries=1 etime=0.0000526557 .. [23/Aug/2018:16:37:45.155848326 -0400] conn=88 op=7 SRCH base="cn=masters,cn=ipa,cn=etc,dc=domain,dc=local" scope=1 filter="(objectClass=ipaConfigObject)" attrs="cn" [23/Aug/2018:16:37:45.156191319 -0400] conn=88 op=7 RESULT err=0 tag=101 nentries=1 etime=0.0000444405 ... [23/Aug/2018:16:37:45.159116377 -0400] conn=88 op=10 SRCH base="cn=masters,cn=ipa,cn=etc,dc=domain,dc=local" scope=1 filter="(objectClass=ipaConfigObject)" attrs="cn" [23/Aug/2018:16:37:45.159579732 -0400] conn=88 op=10 RESULT err=0 tag=101 nentries=1 etime=0.0000615801 [23/Aug/2018:16:37:45.161628540 -0400] conn=88 op=11 EXT oid="1.3.6.1.4.1.4203.1.11.3" name="whoami-plugin" [23/Aug/2018:16:37:45.161700145 -0400] conn=88 op=11 RESULT err=0 tag=120 nentries=0 etime=0.0000154022 [Thu Aug 23 16:37:45.143757 2018] [wsgi:error] [pid 7728:tid 140691395168000] [remote 10.0.2.102:51334] ipa: DEBUG: config_show(rights=False, all=False, raw=False, version='2.229') [Thu Aug 23 16:37:45.161018 2018] [wsgi:error] [pid 7728:tid 140691395168000] [remote 10.0.2.102:51334] ipa: INFO: test3: batch: config_show(): SUCCESS [Thu Aug 23 16:37:45.161220 2018] [wsgi:error] [pid 7728:tid 140691395168000] [remote 10.0.2.102:51334] ipa: DEBUG: raw: whoami(version='2.229') [Thu Aug 23 16:37:45.161358 2018] [wsgi:error] [pid 7728:tid 140691395168000] [remote 10.0.2.102:51334] ipa: DEBUG: whoami(version='2.229') So the failure is that the following search returns no entry ldapsearch -D "uid=test3,cn=users,cn=accounts,dc=domain,dc=local" -W -b "cn=masters,cn=ipa,cn=etc,dc=domain,dc=local" -s one "(objectClass=ipaConfigObject)" cn https://bugzilla.redhat.com/show_bug.cgi?id=1620315#c21 seems to match https://pagure.io/389-ds-base/issue/49443 - https://bugzilla.redhat.com/show_bug.cgi?id=1514051 Seems similar, but there's also some discussion in https://pagure.io/389-ds-base/issue/49443 specifically about how things are different between 'master' (at that time) and the 1.3.7 branch... This message is a reminder that Fedora 27 is nearing its end of life. On 2018-Nov-30 Fedora will stop maintaining and issuing updates for Fedora 27. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '27'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 27 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged change the 'version' to a later Fedora version prior this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. This is fixed in the current release after backing out a "search filter enhancement", closing... |