Bug 1466093 - Decommissioning domain controller role fails if system is rebooted after deployment (due to firewalld bug)
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: firewalld
Version: 27
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Eric Garver
QA Contact: Fedora Extras Quality Assurance
Reported: 2017-06-29 02:15 UTC by Adam Williamson
Modified: 2018-04-30 22:02 UTC
CC List: 5 users

Fixed In Version: firewalld-0.5.2-2.fc28 firewalld-0.4.4.5-4.fc27
Last Closed: 2018-04-25 17:59:29 UTC
Type: Bug


Attachments
/var/log from an affected case (5.93 MB, application/x-gzip)
2017-06-29 02:16 UTC, Adam Williamson

Description Adam Williamson 2017-06-29 02:15:57 UTC
I recently implemented an openQA test which does the following:

* Starting from a clean Fedora 25 Server install, deploy the domain controller role
* On another system, starting from a clean Fedora 25 Server install, enrol as a client in the domain (using realmd)
* Once the client is enrolled, upgrade the Server system to Fedora 26, then upgrade the client system to Fedora 26
* Run through the usual server and client tests on Fedora 26

The server part of this test always fails right at the end, when the role is decommissioned with `rolectl decommission domaincontroller/domain.local`. In contrast, when a similar test is run entirely on Fedora 26 (and, indeed, entirely on Fedora 25), decommissioning succeeds. It's only when an upgrade is involved that the decommissioning fails.

The failure seems to be related to something done to the firewall configuration during decommissioning, as the system journal contains these lines at the relevant time:

Jun 25 11:49:32 ipa001.domain.local firewalld[654]: WARNING: '/usr/sbin/iptables-restore --wait=2 -n' failed:
Jun 25 11:49:32 ipa001.domain.local firewalld[654]: WARNING: '/usr/sbin/ip6tables-restore --wait=2 -n' failed:
Jun 25 11:49:32 ipa001.domain.local firewalld[654]: ERROR: COMMAND_FAILED

/var/log/rolekit doesn't provide anything useful, though - the last message in it is:

2017-06-25 14:49:31 ERROR: b'Client uninstall complete.'

which I believe is passed along from the FreeIPA client uninstallation process.

I will attach a tarball containing the complete contents of /var/log from the server to this report. You can use 'journalctl --file' to read the journal files under /var/log/journal.

Comment 1 Adam Williamson 2017-06-29 02:16:30 UTC
Created attachment 1292752
/var/log from an affected case

Comment 2 Adam Williamson 2018-03-15 17:30:25 UTC
This has been happening ever since. It'd be really nice if this test would actually run successfully. Did you ever get to look at it, sgallagh?

Comment 3 Adam Williamson 2018-03-21 18:10:01 UTC
So, this is indeed still happening, now that we've solved various other issues that got in the way:

https://openqa.stg.fedoraproject.org/tests/261752

shows the very same error:

https://openqa.stg.fedoraproject.org/tests/261752#step/role_deploy_domain_controller_check/23

Comment 4 Adam Williamson 2018-03-21 22:28:38 UTC
sgallagh has said on IRC that this can be reproduced simply across a reboot (i.e. deploy, reboot, attempt to decommission -> fail) and seems to be an issue in firewalld.

Comment 5 Adam Williamson 2018-03-21 22:35:05 UTC
Proposing as a Beta freeze exception issue. It appears to me the criteria do not in fact cover decommissioning roles, though this may be an oversight.

Comment 6 Stephen Gallagher 2018-03-22 00:05:25 UTC
OK, I've finally tracked down the failure. The issue is definitely occurring within firewalld. It can be reproduced with the following steps that do not require rolekit:

1) Install a system with firewalld enabled
2) `firewall-cmd --add-service freeipa-ldap --permanent`
3) `firewall-cmd --add-service freeipa-ldaps --permanent`
4) Reboot the system
5) Verify that both services are enabled with `firewall-cmd --list-all`
6) `firewall-cmd --remove-service freeipa-ldaps` (Succeeds)
7) `firewall-cmd --remove-service freeipa-ldap` (Returns "Error: COMMAND_FAILED")
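The steps above can be scripted. What follows is a minimal Python sketch, not part of rolekit or firewalld: the helper names (`setup_commands`, `check_commands`, `run`) are hypothetical, and it assumes it is run as root on a host where firewalld is active and the freeipa-ldap/freeipa-ldaps service definitions are installed. Because the failure only shows up across a reboot, it is split into a setup phase and a check phase.

```python
import subprocess

# Service names from the reproducer; per the analysis below they differ only
# in the LDAP TCP port (389 vs 636).
SERVICES = ("freeipa-ldap", "freeipa-ldaps")


def setup_commands(services=SERVICES):
    """Steps 2-3: add each service to the permanent configuration (run before rebooting)."""
    return [["firewall-cmd", f"--add-service={name}", "--permanent"] for name in services]


def check_commands(services=SERVICES):
    """Steps 5-7: list the zone, then remove the services from the runtime
    configuration in reverse order (freeipa-ldaps first, as in step 6)."""
    commands = [["firewall-cmd", "--list-all"]]
    commands += [["firewall-cmd", f"--remove-service={name}"] for name in reversed(services)]
    return commands


def run(commands):
    """Run each command in order and stop at the first non-zero exit status.

    Returns the argv list of the failing command, or None if all succeeded.
    """
    for argv in commands:
        result = subprocess.run(argv, capture_output=True, text=True)
        output = (result.stdout + result.stderr).strip()
        print(" ".join(argv), "->", result.returncode, output)
        if result.returncode != 0:
            return argv
    return None
```

Usage: call `run(setup_commands())`, reboot, then call `run(check_commands())`. On an affected build, the steps above predict that the last command (`--remove-service=freeipa-ldap`) is the one returning "Error: COMMAND_FAILED"; passing the services in the opposite order should make the other removal fail instead.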

It appears that firewalld doesn't properly handle removing the second of two permanent services whose entries overlap. The freeipa-ldap and freeipa-ldaps services are almost identical, each opening numerous ports; they differ only in the LDAP TCP port, which is 389 for freeipa-ldap and 636 for freeipa-ldaps.

So it appears that after removing one of the two services, firewalld cannot properly handle removing the other one. This is what causes the FreeIPA decommissioning to fail. You can reverse steps 6 and 7 above and the second one will always fail.

(I was clued in that it might be related to the freeipa-ldap/s interaction because the postgresql role did not exhibit the same behavior.)
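To make the overlap hypothesis concrete, here is a toy Python model. It is not firewalld's code and it does not capture the reboot dependence; it only shows one way overlapping service entries can make the second removal fail. If removing a service deletes every rule it lists, rules the other service still needs disappear, and the later removal finds them missing. The class names are hypothetical, and the shared port set is illustrative only (the real definitions are the service XML files under /usr/lib/firewalld/services/).

```python
from collections import Counter

# Illustrative port sets: only 389/tcp and 636/tcp come from this report;
# SHARED is a hypothetical stand-in for the ports the two services have in common.
SHARED = {"88/tcp", "88/udp", "464/tcp", "464/udp"}
FREEIPA_LDAP = SHARED | {"389/tcp"}
FREEIPA_LDAPS = SHARED | {"636/tcp"}


class NaiveZone:
    """Removing a service deletes every rule it lists, even rules another service still needs."""

    def __init__(self):
        self.rules = set()

    def add_service(self, ports):
        self.rules |= ports

    def remove_service(self, ports):
        missing = ports - self.rules
        if missing:
            raise RuntimeError(f"COMMAND_FAILED: rules already gone: {sorted(missing)}")
        self.rules -= ports


class RefCountedZone:
    """Each rule counts how many enabled services use it and is deleted only at zero."""

    def __init__(self):
        self.refs = Counter()

    def add_service(self, ports):
        self.refs.update(ports)

    def remove_service(self, ports):
        missing = [p for p in ports if self.refs[p] == 0]
        if missing:
            raise RuntimeError(f"COMMAND_FAILED: rules not present: {sorted(missing)}")
        self.refs.subtract(ports)
        self.refs = +self.refs  # unary + drops entries whose count fell to zero

    @property
    def rules(self):
        return set(self.refs)


def remove_both(zone):
    """Add both services, then remove freeipa-ldaps followed by freeipa-ldap (steps 6-7)."""
    zone.add_service(FREEIPA_LDAP)
    zone.add_service(FREEIPA_LDAPS)
    zone.remove_service(FREEIPA_LDAPS)
    zone.remove_service(FREEIPA_LDAP)
    return zone.rules
```

In the naive model the first removal succeeds and the second raises, mirroring steps 6 and 7; the reference-counted variant removes both cleanly. Whether the upstream fix works along these lines is not established here.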

Comment 7 Eric Garver 2018-03-22 12:41:48 UTC
What version of firewalld? I think this was fixed in version firewalld-0.5.2-1.

Comment 8 Stephen Gallagher 2018-03-22 12:53:31 UTC
I can reproduce the issue with firewalld-0.5.1-2.fc28.noarch

I can indeed confirm that firewalld-0.5.2-1.fc28.noarch resolves this issue. Please get it submitted in Bodhi ASAP.

Comment 9 Fedora Update System 2018-03-22 13:27:56 UTC
firewalld-0.5.2-1.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-fa59de3ded

Comment 10 Fedora Update System 2018-03-22 15:06:49 UTC
firewalld-0.5.2-1.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-fa59de3ded

Comment 11 Fedora Update System 2018-03-22 16:06:12 UTC
firewalld-0.5.2-2.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-980d3f6ad7

Comment 12 František Zatloukal 2018-03-22 18:46:35 UTC
Discussed during blocker review [1]:

AcceptedFreezeException (Beta) - decommissioning isn't actually part of the release criteria, but is a significant function of server roles, and pulling this in will improve openQA test coverage

[1] https://meetbot-raw.fedoraproject.org/fedora-meeting-1/2018-03-22/

Comment 13 Fedora Update System 2018-03-23 14:44:00 UTC
firewalld-0.5.2-2.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-980d3f6ad7

Comment 14 Adam Williamson 2018-03-23 15:48:16 UTC
openQA testing confirmed the fix for this:

https://openqa.stg.fedoraproject.org/tests/263230

Comment 15 Fedora Update System 2018-03-26 22:30:30 UTC
firewalld-0.5.2-2.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.

Comment 16 Adam Williamson 2018-04-21 01:51:50 UTC
Can the fix for this also be sent to F27? I just started running FreeIPA upgrade tests on stable release updates, and it told me that F27 is still affected by this.

Comment 17 Fedora Update System 2018-04-21 20:07:58 UTC
firewalld-0.4.4.5-4.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2018-0f5c19f004

Comment 18 Fedora Update System 2018-04-21 20:08:09 UTC
firewalld-0.4.4.5-4.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-c65cf564c3

Comment 19 Adam Williamson 2018-04-21 21:30:59 UTC
So I went ahead and backported the commit that looked most obviously like the fix for this - https://github.com/firewalld/firewalld/commit/54835164f610593eedd71f0a7ae62ac5258d2187 - for F26 and F27 and submitted an update. The F27 openQA test result - https://openqa.stg.fedoraproject.org/tests/289311 - seems to confirm the fix: that test is failing in all other F27 update tests right now, but passes (well, soft fails, which is more or less a pass) with this update.

sgallagh, could you confirm and upkarma? I'd like to push this out so we don't have this bug causing the test to fail on *every* F27 update.

Comment 20 Eric Garver 2018-04-23 13:11:14 UTC
(In reply to Adam Williamson from comment #19)
> So I went ahead and backported the commit that looked most obviously like
> the fix for this -
> https://github.com/firewalld/firewalld/commit/54835164f610593eedd71f0a7ae62ac5258d2187
> - for F26 and F27 and submitted an update. The F27 openQA test result -
> https://openqa.stg.fedoraproject.org/tests/289311 - seems to confirm the
> fix: that test is failing in all other F27 update tests right now, but
> passes (well, soft fails, which is more or less a pass) with this update.

Thanks! LGTM.

> sgallagh, could you confirm and upkarma? I'd like to push this out so we
> don't have this bug causing the test to fail on *every* F27 update.

Adding needinfo for sgallagh.

Comment 21 Stephen Gallagher 2018-04-23 13:12:13 UTC
Yes, I saw this and am looking into it right now, in fact. I'll get the karma out shortly.

Comment 22 Stephen Gallagher 2018-04-23 13:30:56 UTC
Confirmed, this update resolves the firewalld bug. I tested by doing the following:

1) Installed F27 with all stable updates
2) `firewall-cmd --add-service=freeipa-ldap --permanent`
3) `firewall-cmd --add-service=freeipa-ldaps --permanent`
4) Rebooted
5) Updated to the fixed package (I did this after the reboot to confirm that the initial state was recoverable)
6) `firewall-cmd --remove-service=freeipa-ldap`
7) `firewall-cmd --remove-service=freeipa-ldaps`

All went well.

Comment 23 Fedora Update System 2018-04-24 05:03:12 UTC
firewalld-0.4.4.5-4.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-0f5c19f004

Comment 24 Fedora Update System 2018-04-24 05:37:35 UTC
firewalld-0.4.4.5-4.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-c65cf564c3

Comment 25 Fedora Update System 2018-04-25 17:59:29 UTC
firewalld-0.4.4.5-4.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.

Comment 26 Fedora Update System 2018-04-30 22:02:33 UTC
firewalld-0.4.4.5-4.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.

