Bug 1930038
| Summary: | 'ipa-server-install --uninstall --ignore-topology-disconnect --ignore-last-of-role' fails with org.freedesktop.DBus.Error.NoReply: Did not receive a reply | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Sudhir Menon <sumenon> |
| Component: | ipa | Assignee: | Thomas Woerner <twoerner> |
| Status: | CLOSED UPSTREAM | QA Contact: | ipa-qe <ipa-qe> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 8.4 | CC: | abokovoy, frenaud, ksiddiqu, nalin, pcech, rcritten, tscherf |
| Target Milestone: | rc | Keywords: | Triaged |
| Target Release: | --- | Flags: | pm-rhel: mirror+ |
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-05-24 11:24:49 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Sudhir Menon
2021-02-18 09:47:01 UTC
The log files generated during the failure are necessary.

Re-assigning to certmonger component.

I believe what is happening is the CA is in a bad way so certs are stuck in SUBMITTING. I don't know if this is because something else holds the lock file or not.

A change was made to certmonger in Aug 2020 (certmonger-0.79.12+) to not send a SIGKILL when certmonger wants to stop waiting on a child process. This was causing the IPA renewal lock file to be left in an unknown state. It was really just a race condition as it seemed like the processes were usually nearly, but not quite, finished.

The problem in this case is that we want to stop tracking a certificate so don't care whether it is issued or not. certmonger uses waitpid() to determine when the process is finished. In this case it won't happen until after the submission is complete (it's in SUBMITTING). So it exceeds the dbus timeout. Heck, I don't know for sure that IPA would ever finish the request.

I'm going to investigate sending a SIGTERM that can be caught by the helper so it can clean itself up. Ideally after a timeout, but the DBus request timeout is something extremely short like 25 seconds.

I'll consider adding retry code to the certmonger calls in IPA.

The reproducer output is in http://freeipa-org-pr-ci.s3-website.eu-central-1.amazonaws.com/jobs/3a555fa6-875f-11eb-92ec-fa163e05ce82/report.html from PR https://github.com/freeipa/freeipa/pull/5573

I reproduce this upstream by running test_integration/test_ipahealthcheck.py::TestIpaHealthCheck 5 times. It almost always fails in at least one of the invocations.

Moving back to IPA.

The DBus timeouts are seen when IPA is trying to uninstall itself. During this the certificates it issues are untracked by calling the certmonger DBus command remove_request. remove_request waits for the CA helper to complete. Since this time exceeds the DBus timeout the exception is raised.
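The retry idea mentioned for the certmonger calls could look roughly like the sketch below. This is not the IPA code: `NoReplyError` and `call_with_retry` are hypothetical names, and `NoReplyError` merely stands in for whatever exception the DBus binding raises for org.freedesktop.DBus.Error.NoReply. Retrying only helps if the CA helper eventually finishes, which is not guaranteed in the scenario described here.

```python
import time


class NoReplyError(Exception):
    """Hypothetical stand-in for a DBus NoReply timeout error."""


def call_with_retry(func, attempts=3, delay=1.0, sleep=time.sleep):
    """Call func() and retry on NoReplyError.

    `attempts` is the total number of calls allowed. The last
    NoReplyError is re-raised if every attempt times out.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except NoReplyError:
            if attempt == attempts:
                # Out of attempts: let the caller see the timeout.
                raise
            # Give the helper some time to finish before asking again.
            sleep(delay)
```

A caller would wrap the untracking call, for example `call_with_retry(lambda: remove(request_id))`, where `remove` and `request_id` are placeholders for the real DBus invocation and its argument.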
The certs really have no chance of being issued in this case because of the way the test works. It moves forward in time to test that ipa-healthcheck correctly reports that the certs are soon to expire. Then it moves back to current time and tries to uninstall. certmonger may wake up during the period that ipa-healthcheck is running and try to renew the certs, then the time changes back. The CA is basically hosed because if any certs are renewed in the future then nothing will work because they are not yet valid.

So modify the test to stop the CA prior to running ipa-healthcheck and uninstall in future time to prevent certificate issuance. In short: the test needs to be fixed.

Upstream ticket: https://pagure.io/freeipa/issue/8506

Fixed upstream master:
https://pagure.io/freeipa/c/fb58b76a801971748f6b10b732e81763df81c69a
https://pagure.io/freeipa/c/8c93e2fb0b151a0a459ad68880b59494f6aeb33c

Fixed upstream ipa-4-9:
https://pagure.io/freeipa/c/b70e30dbf011fd918c4f2955dda0fc2bc12a35ea
https://pagure.io/freeipa/c/d15e577bc1b6f9d98b1ac424d1c0df4ef9839c91

It is fixed upstream. So closing for now.

Address another DBus-related failure. Remove all the certmonger tracking prior to uninstall in the _expire_cert_critical() fixture to avoid contention between certmonger starting back up to stop the cert tracking and the IPA helper(s) needing to get a ticket from a half-uninstalled IPA server.

Fixed upstream master:
https://pagure.io/freeipa/c/46ccf006ffde6c341dbb443d5a22fa1e036402da

Fixed upstream ipa-4-9:
https://pagure.io/freeipa/c/cc2348aedbee3e59b31df75a23aa14d1c6bbe10c |
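To illustrate why a certificate renewed while the clock is set in the future is unusable once the clock moves back, here is a small sketch of a validity-window check. The function name `cert_status` and all dates and offsets are made up for illustration; they are not taken from the test.

```python
from datetime import datetime, timedelta, timezone


def cert_status(not_before, not_after, now):
    """Classify a certificate against its validity window."""
    if now < not_before:
        return "not yet valid"
    if now > not_after:
        return "expired"
    return "valid"


# Hypothetical timeline: the test jumps the clock forward, certmonger
# renews a cert at that future time, then the clock is restored.
real_now = datetime(2021, 2, 18, tzinfo=timezone.utc)
future_now = real_now + timedelta(days=700)

renewed_not_before = future_now
renewed_not_after = future_now + timedelta(days=730)

# Seen from the future clock the renewed cert is fine, but after the
# clock moves back its validity period has not started yet.
status_in_future = cert_status(renewed_not_before, renewed_not_after, future_now)
status_after_reset = cert_status(renewed_not_before, renewed_not_after, real_now)
```

Stopping the CA before the time jump, as the fix does, prevents such a renewal from being issued in the first place.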