Bug 1380797 - Node upgrade doesn't keep service enable/disable configuration
Summary: Node upgrade doesn't keep service enable/disable configuration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-node-ng
Version: 4.0.3
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ovirt-4.1.0-beta
Target Release: ---
Assignee: Ryan Barry
QA Contact: jianwu
URL:
Whiteboard:
Depends On:
Blocks: 1388317 1388373 1417161
Reported: 2016-09-30 14:50 UTC by Michal Skrivanek
Modified: 2021-06-10 11:37 UTC (History)

Fixed In Version: imgbased-0.8.6-0.1.el7ev
Doc Type: Bug Fix
Doc Text:
Previously, a bug might have enabled services that were disabled when upgrading Red Hat Virtualization Host (RHVH). With this update, the bug was fixed. As a result, when upgrading RHVH, disabled services remain disabled.
Clone Of:
: 1388317 1388373 (view as bug list)
Environment:
Last Closed: 2017-04-20 18:59:29 UTC
oVirt Team: Node
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1383224 0 unspecified CLOSED RHVH-NG is automatically activated after upgrade. 2021-02-22 00:41:40 UTC
Red Hat Knowledge Base (Solution) 2683861 0 None None None 2016-10-06 06:09:13 UTC
Red Hat Product Errata RHEA-2017:1114 0 normal SHIPPED_LIVE redhat-virtualization-host bug fix and enhancement update 2017-04-20 22:57:46 UTC
oVirt gerrit 65406 0 'None' MERGED osupdater: sync up systemd on upgrade 2020-07-27 04:16:30 UTC

Internal Links: 1383224

Description Michal Skrivanek 2016-09-30 14:50:36 UTC
It seems the default install leaves the firewalld service enabled, and host-deploy doesn't disable it.
Since the firewalld rules are not up to date, this breaks several features (migration, image upload).
firewalld should probably be disabled to allow the iptables rules to kick in, until we support it properly (bug 1075687 and bug 995362).

Comment 1 Michal Skrivanek 2016-10-02 19:38:05 UTC
There is a workaround, but this is a basic flow that is broken in a not-so-obvious way, hence raising the priority.

Comment 2 Yedidyah Bar David 2016-10-05 08:04:19 UTC
Normally host-deploy does stop firewalld [1]. This happens only under certain conditions, though. Please attach the host-deploy log.

[1] https://gerrit.ovirt.org/gitweb?p=ovirt-host-deploy.git;a=blob;f=src/plugins/ovirt-host-deploy/vdsm/bridge.py;h=2b794ff19384cfacdd23393f174d56589678d511;hb=HEAD#l533

Comment 3 Michal Skrivanek 2016-10-05 08:56:43 UTC
(In reply to Yedidyah Bar David from comment #2)
> Normally host-deploy does stop firewalld [1]. This happens only on certain
> conditions though. Please attach host-deploy log.

- it would stop it but the code doesn't disable it so after reboot it doesn't matter
- after investigation that module is not used anymore anyway

Comment 5 Fabian Deutsch 2016-10-07 12:47:23 UTC
Node is using the stock RHEL firewall solution: firewalld.

By default firewalld is enabled on RHEL and thus also on RHVH.

When a host is added to Engine, and Engine is selected to manage the host's firewall, then it is expected that the host's firewall is iptables.
But this contradicts the default RHEL configuration.

Rather than have Node implement its own logic for handling firewalld, it would make more sense to move the handling of firewalld (i.e. suggesting to disable it if it is detected) to the host-deploy part, because there we already ask whether the firewall should be managed.

If the logic is there, then RHEL-H and RHVH hosts benefit alike.

Comment 6 Yedidyah Bar David 2016-10-09 06:54:05 UTC
Please attach relevant logs - host-deploy, engine, perhaps vdsm/system from host.

Comment 12 Sandro Bonazzola 2016-10-11 07:09:44 UTC
See also bug #1383224

Comment 14 Fabian Deutsch 2016-10-11 10:34:36 UTC
According to my understanding services should stay enabled or disabled after updates.

Redirecting the question in comment 10 to Ryan.

Comment 15 Ryan Barry 2016-10-11 15:41:51 UTC
(In reply to Fabian Deutsch from comment #14)
> According to my understanding services should stay enabled or disabled after
> updates.
> 
> Redirecting the question in comment 10 to Ryan.

Yes/no.

This appears to be a basic problem somewhere with the rsync in osupdater.

Services which are enabled "stick"

Services which are disabled do not (if there's a corresponding service on the new image):

[root@localhost ~]# ls -l /etc/systemd/system/basic.target.wants/
total 0
lrwxrwxrwx. 1 root root 41 Oct 10 09:51 firewalld.service -> /usr/lib/systemd/system/firewalld.service
[root@localhost ~]# systemctl enable iptables.service && systemctl disable firewalld.service
Created symlink from /etc/systemd/system/basic.target.wants/iptables.service to /usr/lib/systemd/system/iptables.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
Removed symlink /etc/systemd/system/basic.target.wants/firewalld.service.
[root@localhost ~]# !ls
ls -l /etc/systemd/system/basic.target.wants/
total 0
lrwxrwxrwx. 1 root root 40 Oct 10 09:53 iptables.service -> /usr/lib/systemd/system/iptables.service
[root@localhost ~]# rpm -Uvh redhat-virtualization-host-image-update-4.0-20161010.0.el7_3.noarch.rpm 
Preparing...                          ################################# [100%]
Updating / installing...
   1:redhat-virtualization-host-image-################################# [ 50%]
Cleaning up / removing...
   2:redhat-virtualization-host-image-################################# [100%]
[root@localhost ~]# reboot
Terminated
[root@localhost ~]# Connection to 192.168.122.65 closed by remote host.
Connection to 192.168.122.65 closed.
[rbarry@thinkpad ~]$ ssh root@192.168.122.65
root@192.168.122.65's password: 
Last login: Mon Oct 10 09:51:22 2016 from 192.168.122.1

  imgbase status: OK

[root@localhost ~]# ls -l /etc/systemd/system/basic.target.wants/
total 0
lrwxrwxrwx. 1 root root 41 Oct 10 06:25 firewalld.service -> /usr/lib/systemd/system/firewalld.service
lrwxrwxrwx. 1 root root 40 Oct 10 09:53 iptables.service -> /usr/lib/systemd/system/iptables.service

This is potentially tricky, because we want to keep some of this behavior (if crond is enabled on a new image, to use a recent example), but we also want to keep disabled services disabled.

I'll see if I can come up with a patch which isn't too invasive, and which doesn't involve whitelisting/blacklisting "special" services from the rsync.
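
The before/after comparison in the transcript above can be scripted instead of read off the ls output. The following is only a sketch: the function names snapshot_units and newly_enabled are hypothetical, and it assumes the two snapshot files are stored somewhere that survives the image upgrade.

```shell
#!/bin/bash
# Print every unit symlink found in *.wants directories under a systemd
# configuration tree, as sorted paths relative to that tree.
# The tree defaults to /etc/systemd/system.
snapshot_units() {
    local root=${1:-/etc/systemd/system}
    (cd "$root" && find . -type l -path '*.wants/*' | sed 's|^\./||' | sort)
}

# Print the entries that appear in the "after" snapshot file ($2)
# but not in the "before" snapshot file ($1).
newly_enabled() {
    comm -13 "$1" "$2"
}

# Intended use (paths are placeholders):
#   snapshot_units > BEFORE_FILE     # before rpm -Uvh
#   ... upgrade and reboot ...
#   snapshot_units > AFTER_FILE
#   newly_enabled BEFORE_FILE AFTER_FILE
```

Applied to the session above, the "before" snapshot would contain only basic.target.wants/iptables.service, and newly_enabled would report basic.target.wants/firewalld.service, which is the symlink that came back after the upgrade.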

Comment 16 Michal Skrivanek 2016-10-12 09:20:02 UTC
alternatively, I guess we can accelerate our support for firewalld. I wonder if there is any other issue than proper list of services/ports to open. There are not that many.

Didi, please also note that originally this problem is not about hosted engine deploy but "plain" host deploy, where AFAICT the service is _not_ disabled, ever.

Comment 17 Yedidyah Bar David 2016-10-13 07:40:11 UTC
(In reply to Ryan Barry from comment #15)
> (In reply to Fabian Deutsch from comment #14)
> > According to my understanding services should stay enabled or disabled after
> > updates.
> > 
> > Redirecting the question in comment 10 to Ryan.
> 
> Yes/no.
> 
> This appears to be a basic problem somewhere with the rsync in osupdater.
> 
> Services which are enabled "stick"
> 
> Services which are disabled do not (if there's a corresponding service on
> the new image):
> 
[snip]

Thanks for the analysis and clarification.

> This is potentially tricky, because we want to keep some of this behavior
> (if crond is enabled on a new image, to use a recent example), but we also
> want to keep disabled services disabled.

I'd say we probably want to consider, per service, the following:
1. Whether it was enabled by default or not prior to update
2. Whether it is enabled by default or not after the update
3. Whether we want it to be enabled or not - perhaps can also be different before and after update, and "we" can also be different in whether it's somehow user-controlled (from ui/api/etc) or not
4. Whether the user manually enabled or disabled it without us

Depending on the answers to each question above, we might decide, after upgrade, to:
1. enable
2. disable
3. somehow ask the user (not sure how)
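
One possible way to turn the questions above into a rule is a three-way comparison between the old default, the current state, and the new default. This is a sketch of a single hypothetical policy, not something agreed in this bug: the function name decide_state is made up, it never chooses "ask the user", and it ignores question 3 (what "we" want) entirely.

```shell
#!/bin/bash
# decide_state OLD_DEFAULT CURRENT NEW_DEFAULT
# Each argument is the string "enabled" or "disabled".
# Prints the state the service should have after the upgrade.
decide_state() {
    local old_default=$1 current=$2 new_default=$3
    if [ "$current" != "$old_default" ]; then
        # The state was changed locally (by the user or by host-deploy)
        # since the old image was installed: keep the local choice.
        printf '%s\n' "$current"
    else
        # Nobody touched it: follow whatever the new image ships.
        printf '%s\n' "$new_default"
    fi
}
```

Under this policy, `decide_state enabled disabled enabled` prints "disabled" (the firewalld case in this report), while `decide_state disabled disabled enabled` prints "enabled" (a service such as crond that becomes enabled by default in a newer image).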

> 
> I'll see if I can come up with a patch which isn't too invasive, and which
> doesn't involve whitelisting/blacklisting "special" services from the rsync.

OK. Good luck. If you decide to not do something very general (which might be too much for 4.0 or even 4.1), you should also consider the case that a user unchecked "configure firewall" in the web ui when adding the host, and manually enabled and configured firewalld.

(In reply to Michal Skrivanek from comment #16)
> alternatively, I guess we can accelerate our support for firewalld. I wonder
> if there is any other issue than proper list of services/ports to open.
> There are not that many.

This will solve (partially) the current case, but not the general bug. If a service was actively disabled (by us or the user), it probably needs to remain disabled after upgrade, or we should ask the user.

> 
> Didi, please also note that originally this problem is not about hosted
> engine deploy but "plain" host deploy, where AFAICT the service is _not_
> disabled, ever.

Well, firewalld actually is disabled also in plain host-deploy, if the user asked to configure the firewall (which is currently iptables only).

The code enabling iptables and disabling firewalld is actually inside otopi and is run whenever the iptables plugin there is requested to configure iptables. It was actually called twice for each of the 'hosted-engine --deploy' processes that the hosts in the current report went through: once by 'hosted-engine --deploy' itself, and again by ovirt-host-deploy, which was run by the engine when 'hosted-engine --deploy' asked the engine to add the host.

Comment 18 Ryan Barry 2016-10-13 14:35:49 UTC
(In reply to Yedidyah Bar David from comment #17)
> (In reply to Ryan Barry from comment #15)
> > (In reply to Fabian Deutsch from comment #14)
> > > According to my understanding services should stay enabled or disabled after
> > > updates.
> > > 
> > > Redirecting the question in comment 10 to Ryan.
> > 
> > Yes/no.
> > 
> > This appears to be a basic problem somewhere with the rsync in osupdater.
> > 
> > Services which are enabled "stick"
> > 
> > Services which are disabled do not (if there's a corresponding service on
> > the new image):
> > 
> 
> Thanks for the analysis and clarification.
> 
> > This is potentially tricky, because we want to keep some of this behavior
> > (if crond is enabled on a new image, to use a recent example), but we also
> > want to keep disabled services disabled.
> 
> I'd say we probably want to consider, per service, the following:
> 1. Whether it was enabled by default or not prior to update
> 2. Whether it is enabled by default or not after the update
> 3. Whether we want it to be enabled or not - perhaps can also be different
> before and after update, and "we" can also be different in whether it's
> somehow user-controlled (from ui/api/etc) or not
> 4. Whether the user manually enabled or disabled it without us
> 
> Depending on the answers to each question above, we might decide, after
> upgrade, to:
> 1. enable
> 2. disable
> 3. somehow ask the user (not sure how)

Actually, this was somewhat easier than I expected.

Services which are enabled in the new image (and the old image) already worked, since /etc/systemd/system is synced to the new image (along with the rest of /etc)

We also keep track of the "base" state of /etc at the time of image building in /usr/share/factory.

For services which are enabled in the old image, /etc/systemd/system/foo.target.wants/bar.service is already preserved.

For services which are enabled in the new image, we can simply keep /etc/systemd/system from the new image.

For services which were manually disabled, comparing the files in /etc/systemd/system (on the old image) against /usr/share/factory/etc/systemd/system (on the old image) tells us which services were disabled (manually or by host-deploy), since their symlinks exist in the factory copy but are missing from foo.target.wants in /etc.

There's a patch up on Gerrit now which is verified.

firewalld rules already go in /etc. As long as all services are managed through systemd (which should be the default on EL7), I'm confident in this patch.
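
The comparison described above can be sketched in shell as follows. This illustrates the idea only and is not the code from the Gerrit patch: the function names list_disabled and apply_disabled are hypothetical, and NEW_ROOT in the usage comment stands for wherever the new image's filesystem happens to be mounted.

```shell
#!/bin/bash
# list_disabled FACTORY_DIR LIVE_DIR
# Print (relative to FACTORY_DIR) every symlink that the factory copy
# of the systemd configuration has but the live copy lacks, i.e. the
# units that were disabled after installation.
list_disabled() {
    local factory=$1 live=$2 rel
    (cd "$factory" && find . -type l | sed 's|^\./||' | sort) |
    while IFS= read -r rel; do
        # -L is true for any symlink, including a dangling one.
        [ -L "$live/$rel" ] || printf '%s\n' "$rel"
    done
}

# apply_disabled NEW_DIR
# Read relative paths on stdin and remove the matching symlinks from
# NEW_DIR, so that units disabled on the old image stay disabled.
apply_disabled() {
    local new=$1 rel
    while IFS= read -r rel; do
        if [ -L "$new/$rel" ]; then
            rm -f -- "$new/$rel"
        fi
    done
}

# Intended use (NEW_ROOT is a placeholder):
#   list_disabled /usr/share/factory/etc/systemd/system /etc/systemd/system |
#       apply_disabled "$NEW_ROOT/etc/systemd/system"
```

Because every symlink is compared, not only those in *.wants directories, alias links such as dbus-org.fedoraproject.FirewallD1.service are handled the same way as the basic.target.wants entries.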

Comment 22 cshao 2016-10-25 10:47:21 UTC
RHVH QE can reproduce this issue.

Test version:
redhat-virtualization-host-4.0-20160817.0
redhat-virtualization-host-4.0-20160928.0

Test steps:
1. Install RHVH old version(redhat-virtualization-host-4.0-20160817.0).
2. Check firewalld.service status.
3. Upgrade to redhat-virtualization-host-4.0-20160928.0
4. Check firewalld.service status again.
5. Check port 16514.

Test result:
1. After step 2, firewalld.service is active at startup.
2. After step 4, firewalld.service is active at startup.
3. After step 5, the following command produces no output:
# iptables -L | grep 16514

Comment 24 jianwu 2017-01-17 08:46:31 UTC
Hi, all

I have tried to check this bug as follows:

Test version:
old build:redhat-virtualization-host-4.1-20170112.1
          imgbased-0.9.4-0.1.el7ev.noarch
New build:redhat-virtualization-host-4.1-20170116.0
          imgbased-0.9.4-0.1.el7ev.noarch

Test steps:
1. Install RHVH old version(redhat-virtualization-host-4.1-20170112.1)
2. Check # systemctl status firewalld.service
3. Run # iptables -I INPUT 5 -p tcp --dport 16514 -j ACCEPT
       # systemctl disable firewalld.service
       # iptables -L | grep 16514
3. Upgrade to redhat-virtualization-host-4.1-20170116.0
4. Check # systemctl status firewalld.service
5. Check # iptables -L | grep 16514

Actual results:
1. After step 2, firewalld.service is active at startup
2. After step 3, running # iptables -L | grep 16514 prints:
  - ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:16514
3. After step 4, firewalld.service is inactive at startup
4. After step 5, there is no output

Additional info:
If I do not run # systemctl disable firewalld.service before the upgrade, firewalld.service is active at startup after the upgrade.

So I think this bug is fixed in this scenario; changing status to Verified.

Jianwu
Thanks

Comment 25 errata-xmlrpc 2017-04-20 18:59:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1114

