Description of problem:

I wanted to upgrade my SHE 3.6 to 4.0 but it failed (although, surprisingly, the engine is 4.0 now). While reading the upgrade guide I became skeptical about the flow:

- you want 4.0, but you are adding the 'rhel-7-server-rhev-mgmt-agent-rpms' channel, which serves the latest packages, i.e. 4.2
- the appliance is OK: rhevm-appliance-4.0.20170307.0-1.el7ev.noarch

Besides the failure itself, my concern is:

- how do we treat our upgrade documentation for older versions when it mentions a channel (e.g. 'rhel-7-server-rhev-mgmt-agent-rpms') that no longer carries only the older version but also the latest one? Are we thus telling the customer to upgrade to the latest rpms from such a channel, or are we letting customers experiment here?

IMO the failure is related to the channel carrying 4.0, 4.1 and 4.2 rpms. Am I wrong?

FYI, upgrading just 'ovirt-hosted-engine-setup' to the latest version in this channel pulls in other requirements, namely 'ansible', which does not exist in 4.0... So the host system is now a mix of 3.6 and 4.2, because we recommend upgrading 'ovirt-hosted-engine-setup' and 'rhevm-appliance' first and _only_ _then_ the rest of the host.

The failure itself:

---%>---
|- [ ERROR ] Failed to execute stage 'Closing up': Command '/bin/firewall-cmd' failed to execute
|- [ INFO ] Stage: Clean up
|-          Log file is located at /var/log/ovirt-engine/setup/ovirt-engine-setup-20180823135310-ucjbzp.log
|- [ INFO ] Generating answer file '/var/lib/ovirt-engine/setup/answers/20180823135533-setup.conf'
|- [ INFO ] Stage: Pre-termination
|- [ INFO ] Stage: Termination
|- [ ERROR ] Execution of setup failed
|- HE_APPLIANCE_ENGINE_SETUP_FAIL
[ ERROR ] Engine setup failed on the appliance
[ ERROR ] Failed to execute stage 'Closing up': Engine setup failed on the appliance
          Please check its log on the appliance.
[ INFO ] Stage: Clean up
[ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20180823115533.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
[ ERROR ] Hosted Engine upgrade failed: you can use --rollback-upgrade option to recover the engine VM disk from a backup.
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20180823104447-4o09te.log
---%<---

And the log inside the engine:

---%>---
2018-08-23 13:53:12 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:921 execute-output: ('/bin/systemctl', 'status', 'firewalld.service') stdout:
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2018-08-23 11:51:42 CEST; 2h 1min ago
     Docs: man:firewalld(1)
 Main PID: 538 (firewalld)
   CGroup: /system.slice/firewalld.service
           └─538 /usr/bin/python -Es /usr/sbin/firewalld --nofork --nopid

Aug 23 11:51:38 localhost.localdomain systemd[1]: Starting firewalld - dynamic firewall daemon...
Aug 23 11:51:42 localhost.localdomain systemd[1]: Started firewalld - dynamic firewall daemon.
Aug 23 11:51:45 localhost.localdomain firewalld[538]: WARNING: '/usr/sbin/ip6tables-restore -n' failed:
Aug 23 11:51:45 localhost.localdomain firewalld[538]: WARNING: '/usr/sbin/iptables-restore -n' failed:
Aug 23 11:51:45 localhost.localdomain firewalld[538]: ERROR: '/usr/sbin/ebtables-restore --noflush' failed:
Aug 23 11:51:45 localhost.localdomain firewalld[538]: ERROR: COMMAND_FAILED
Aug 23 11:51:45 localhost.localdomain firewalld[538]: ERROR: INVALID_ZONE
Aug 23 11:51:46 localhost.localdomain firewalld[538]: ERROR: INVALID_ZONE
...
2018-08-23 13:55:33 DEBUG otopi.plugins.otopi.network.firewalld plugin.execute:926 execute-output: ('/bin/firewall-cmd', '--reload') stderr:
ESC[91mError: COMMAND_FAILEDESC[00m

2018-08-23 13:55:33 DEBUG otopi.context context._executeMethod:142 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/usr/share/otopi/plugins/otopi/network/firewalld.py", line 324, in _closeup
    '--reload'
  File "/usr/lib/python2.7/site-packages/otopi/plugin.py", line 931, in execute
    command=args[0],
RuntimeError: Command '/bin/firewall-cmd' failed to execute
2018-08-23 13:55:33 ERROR otopi.context context._executeMethod:151 Failed to execute stage 'Closing up':
---%<---

Huh, why did it fail to execute?

---%>---
# which firewall-cmd
/usr/bin/firewall-cmd
# ls -li /bin/firewall-cmd /usr/bin/firewall-cmd
1250549 -rwxr-xr-x. 1 root root 105358 Feb 10 2017 /bin/firewall-cmd
1250549 -rwxr-xr-x. 1 root root 105358 Feb 10 2017 /usr/bin/firewall-cmd

# journalctl | grep firewall
Aug 23 11:51:38 localhost.localdomain systemd[1]: Starting firewalld - dynamic firewall daemon...
Aug 23 11:51:42 localhost.localdomain systemd[1]: Started firewalld - dynamic firewall daemon.
Aug 23 11:51:45 localhost.localdomain firewalld[538]: WARNING: '/usr/sbin/ip6tables-restore -n' failed:
Aug 23 11:51:45 localhost.localdomain firewalld[538]: WARNING: '/usr/sbin/iptables-restore -n' failed:
Aug 23 11:51:45 localhost.localdomain firewalld[538]: ERROR: '/usr/sbin/ebtables-restore --noflush' failed:
Aug 23 11:51:45 localhost.localdomain firewalld[538]: ERROR: COMMAND_FAILED
Aug 23 11:51:45 localhost.localdomain firewalld[538]: ERROR: INVALID_ZONE
Aug 23 11:51:45 localhost.localdomain NetworkManager[557]: <warn>  [1535017905.6136] firewall: [0x7f5e40ca0b60,change:"eth0"]: complete: request failed (INVALID_ZONE)
Aug 23 11:51:46 localhost.localdomain firewalld[538]: ERROR: INVALID_ZONE
Aug 23 11:51:46 localhost.localdomain NetworkManager[557]: <warn>  [1535017906.6607] firewall: [0x7f5e40ca2b50,change:"eth0"]: complete: request failed (INVALID_ZONE)
Aug 23 13:55:33 she-test-01.rhev.lab.eng.brq.redhat.com firewalld[538]: WARNING: '/usr/sbin/ip6tables-restore -n' failed:
Aug 23 13:55:33 she-test-01.rhev.lab.eng.brq.redhat.com firewalld[538]: WARNING: '/usr/sbin/iptables-restore -n' failed:
Aug 23 13:55:33 she-test-01.rhev.lab.eng.brq.redhat.com firewalld[538]: ERROR: '/usr/sbin/ebtables-restore --noflush' failed:
Aug 23 13:55:33 she-test-01.rhev.lab.eng.brq.redhat.com firewalld[538]: ERROR: COMMAND_FAILED
---%<---

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-2.2.25-1.el7ev.noarch

How reproducible:
Tried only once so far.

Steps to Reproduce:
1. have a 3.6 SHE (I had EL 7.4 on the host)
2. upgrade ovirt-hosted-engine-setup and rhevm-appliance as written in the upgrade guide
3. hosted-engine --upgrade-appliance

Actual results:
The setup failed, but the engine is 4.0 now.

Expected results:
Either it should work fine, or it should be documented not to mix various rpm versions, or it should roll back.

Additional info:
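For reference, a minimal sketch of the commands behind the reproduce steps above (the repo IDs are the ones that show up later in this bug; the exact channels to enable come from the upgrade guide and may differ per subscription):

---%>---
# enable the channels the guide points at (assumed repo IDs)
subscription-manager repos --enable=rhel-7-server-rhv-4-mgmt-agent-rpms \
                           --enable=rhel-7-server-ansible-2-rpms
# upgrade only the setup tool and the appliance first, as the guide says
yum update ovirt-hosted-engine-setup rhevm-appliance
# then run the appliance upgrade flow
hosted-engine --upgrade-appliance
---%<---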
(In reply to Jiri Belka from comment #0)
> IMO the failure is related to the channel carrying 4.0, 4.1 and 4.2 rpms.
> Am I wrong?

No, it's not directly related to that. We discussed this more than once and unfortunately we can't do much about it due to the repository design. Upstream we have 4.0, 4.1 and 4.2 repositories, and each of them contains both the engine rpms and the host ones. Downstream we instead have a host channel and an engine channel; for the engine channel we have 4.0, 4.1, 4.2 and so on, while for the host/agent channel we have just a single 3.x and a single 4.x channel. Because of that, the latest (whatever it is...) ovirt-hosted-engine-setup is supposed to keep the 3.6/el6 -> 4.0/el7 upgrade capability.

> Expected results:
> Either it should work fine, or it should be documented not to mix various
> rpm versions, or it should roll back.

We have a manual rollback command:
  hosted-engine --rollback-upgrade
The user is supposed to run it manually if needed; see the sketch below.
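A minimal sketch of that manual recovery path, assuming the backup taken by --upgrade-appliance is still in place (nothing here beyond the two commands already mentioned in this bug):

---%>---
# check the current hosted-engine state first
hosted-engine --vm-status
# restore the engine VM disk from the backup created during the upgrade attempt
hosted-engine --rollback-upgrade
---%<---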
(In reply to Jiri Belka from comment #0)
> Huh, why did it fail to execute?
>
> ---%>---
> # which firewall-cmd
> /usr/bin/firewall-cmd
> # ls -li /bin/firewall-cmd /usr/bin/firewall-cmd
> 1250549 -rwxr-xr-x. 1 root root 105358 Feb 10 2017 /bin/firewall-cmd
> 1250549 -rwxr-xr-x. 1 root root 105358 Feb 10 2017 /usr/bin/firewall-cmd
>
> # journalctl | grep firewall
> Aug 23 11:51:38 localhost.localdomain systemd[1]: Starting firewalld -
> dynamic firewall daemon...
> Aug 23 11:51:42 localhost.localdomain systemd[1]: Started firewalld -
> dynamic firewall daemon.
> Aug 23 11:51:45 localhost.localdomain firewalld[538]: WARNING:
> '/usr/sbin/ip6tables-restore -n' failed:
> Aug 23 11:51:45 localhost.localdomain firewalld[538]: WARNING:
> '/usr/sbin/iptables-restore -n' failed:
> Aug 23 11:51:45 localhost.localdomain firewalld[538]: ERROR:
> '/usr/sbin/ebtables-restore --noflush' failed:
> Aug 23 11:51:45 localhost.localdomain firewalld[538]: ERROR: COMMAND_FAILED
> Aug 23 11:51:45 localhost.localdomain firewalld[538]: ERROR: INVALID_ZONE
> Aug 23 11:51:45 localhost.localdomain NetworkManager[557]: <warn>
> [1535017905.6136] firewall: [0x7f5e40ca0b60,change:"eth0"]: complete:
> request failed (INVALID_ZONE)
> Aug 23 11:51:46 localhost.localdomain firewalld[538]: ERROR: INVALID_ZONE
> Aug 23 11:51:46 localhost.localdomain NetworkManager[557]: <warn>
> [1535017906.6607] firewall: [0x7f5e40ca2b50,change:"eth0"]: complete:
> request failed (INVALID_ZONE)
> Aug 23 13:55:33 she-test-01.rhev.lab.eng.brq.redhat.com firewalld[538]:
> WARNING: '/usr/sbin/ip6tables-restore -n' failed:
> Aug 23 13:55:33 she-test-01.rhev.lab.eng.brq.redhat.com firewalld[538]:
> WARNING: '/usr/sbin/iptables-restore -n' failed:
> Aug 23 13:55:33 she-test-01.rhev.lab.eng.brq.redhat.com firewalld[538]:
> ERROR: '/usr/sbin/ebtables-restore --noflush' failed:
> Aug 23 13:55:33 she-test-01.rhev.lab.eng.brq.redhat.com firewalld[538]:
> ERROR: COMMAND_FAILED
> ---%<---

We had reports of this in the past too, here is one:
https://bugzilla.redhat.com/1494985
but we also had others. Unfortunately we never got a systematic reproducer.
I think it is/was something inside firewalld.

Jiri, could you please retry on the same env and report if and how it's reproducible?
# rpm -q redhat-release-server vdsm ovirt-hosted-engine-setup ovirt-hosted-engine-ha libvirt-daemon qemu-kvm-rhev
redhat-release-server-7.4-18.el7.x86_64
vdsm-4.17.45-1.el7ev.noarch
ovirt-hosted-engine-setup-1.3.7.4-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.10-2.el7ev.noarch
libvirt-daemon-3.2.0-14.el7_4.3.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.8.x86_64

# hosted-engine --vm-status

--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : 10-37-140-183.rhev.lab.eng.brq.redhat.com
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 103fa4f0
local_conf_timestamp               : 0
Host timestamp                     : 59706

Now proceeding with the upgrade...

# yum repolist -v | grep -i repo-id
Repo-id      : rhel-7-server-ansible-2-rpms
Repo-id      : rhel-7-server-rhv-4-mgmt-agent-rpms
Repo-id      : rhel-7-server-rpms

^^ ansible is needed because it is required by ovirt-hosted-engine-setup in 4.2 (rhel-7-server-rhv-4-mgmt-agent-rpms serves the latest version, thus the 4.2 one).

yum update ovirt-hosted-engine-setup rhevm-appliance
 ovirt-engine-sdk-python    noarch  3.6.9.1-1.el7ev           rhel-7-server-rhv-4-mgmt-agent-rpms  484 k
     replacing  rhevm-sdk-python.noarch 3.6.9.1-1.el7ev
...
Updating:
 ovirt-hosted-engine-setup  noarch  2.2.25-1.el7ev            rhel-7-server-rhv-4-mgmt-agent-rpms  401 k
 rhevm-appliance            noarch  1:4.0.20170307.0-1.el7ev  rhel-7-server-rhv-4-mgmt-agent-rpms  1.5 G
...
 ansible                    noarch  2.6.3-1.el7ae             rhel-7-server-ansible-2-rpms          10 M
...
 ovirt-host                 x86_64  4.2.3-1.el7ev             rhel-7-server-rhv-4-mgmt-agent-rpms  8.7 k
 ovirt-host-dependencies    x86_64  4.2.3-1.el7ev             rhel-7-server-rhv-4-mgmt-agent-rpms  8.6 k
...
 otopi                      noarch  1.7.8-1.el7ev             rhel-7-server-rhv-4-mgmt-agent-rpms  166 k
 ovirt-host-deploy          noarch  1.7.4-1.el7ev             rhel-7-server-rhv-4-mgmt-agent-rpms   96 k
 ovirt-hosted-engine-ha     noarch  2.2.16-1.el7ev            rhel-7-server-rhv-4-mgmt-agent-rpms  316 k
 ovirt-setup-lib            noarch  1.1.4-1.el7ev             rhel-7-server-rhv-4-mgmt-agent-rpms   19 k
...

# hosted-engine --upgrade-appliance
...
|- [ INFO ] Creating/refreshing Engine 'internal' domain database schema
|- [ INFO ] Generating post install configuration file '/etc/ovirt-engine-setup.conf.d/20-setup-ovirt-post.conf'
|- [ INFO ] Stage: Transaction commit
|- [ INFO ] Stage: Closing up
|- [ ERROR ] Failed to execute stage 'Closing up': Command '/bin/firewall-cmd' failed to execute
|- [ INFO ] Stage: Clean up
|-          Log file is located at /var/log/ovirt-engine/setup/ovirt-engine-setup-20180824105359-xeuw5a.log
|- [ INFO ] Generating answer file '/var/lib/ovirt-engine/setup/answers/20180824105613-setup.conf'
|- [ INFO ] Stage: Pre-termination
|- [ INFO ] Stage: Termination
|- [ ERROR ] Execution of setup failed
|- HE_APPLIANCE_ENGINE_SETUP_FAIL
[ ERROR ] Engine setup failed on the appliance
[ ERROR ] Failed to execute stage 'Closing up': Engine setup failed on the appliance
          Please check its log on the appliance.
[ INFO ] Stage: Clean up
[ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20180824085612.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
[ ERROR ] Hosted Engine upgrade failed: you can use --rollback-upgrade option to recover the engine VM disk from a backup.
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20180824074344-zzvphn.log
This is definitely due to:

2018-08-23 13:55:33 DEBUG otopi.plugins.otopi.network.firewalld plugin.execute:926 execute-output: ('/bin/firewall-cmd', '--reload') stderr:
ESC[91mError: COMMAND_FAILEDESC[00m

Attaching the firewalld logs at debug level.

Eric, could you please take a look?
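For the record, a sketch of how such debug-level firewalld logs can be collected on RHEL 7 (the sysconfig path and debug level here are assumptions, not necessarily what was used for the attachment):

---%>---
# /etc/sysconfig/firewalld normally carries a FIREWALLD_ARGS= line;
# bumping it to full debug and restarting makes firewalld write details
# to /var/log/firewalld
sed -i "s|^#\?FIREWALLD_ARGS=.*|FIREWALLD_ARGS='--debug=10'|" /etc/sysconfig/firewalld
systemctl restart firewalld
tail -f /var/log/firewalld
---%<---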
Created attachment 1478440 [details] firewalld logs
Created attachment 1478441 [details] firewalld.conf
After some struggling with the OVA file, I got an engine VM based on this OVA version up, and here are the data:

- rpms

rhevm-4.0.7.4-0.1.el7ev.noarch
firewalld-0.4.3.2-8.1.el7_3.2.noarch
redhat-release-server-7.3-7.el7.x86_64
iptables-1.4.21-17.el7.x86_64

- firewall-cmd check

# firewall-cmd --list-all
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: eth0
  sources:
  services: dhcpv6-client ssh
  ports:
  protocols:
  masquerade: no
  forward-ports:
  sourceports:
  icmp-blocks:
  rich rules:

- trying engine-setup manually (engine-setup --offline)

...
          Firewall manager                        : firewalld
...
2018-08-24 07:44:55 INFO otopi.plugins.ovirt_engine_common.base.core.misc misc._terminate:156 Execution of setup completed successfully

^^ finished without any issue

- firewalld recheck

# firewall-cmd --list-all
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: eth0
  sources:
  services: dhcpv6-client ovirt-fence-kdump-listener ovirt-http ovirt-https ovirt-imageio-proxy ovirt-postgres ovirt-vmconsole-proxy ovirt-websocket-proxy ssh
  ports:
  protocols:
  masquerade: no
  forward-ports:
  sourceports:
  icmp-blocks:
  rich rules:
Thus rhevm-appliance-4.0.20170307.0-1.el7ev.ova itself does work fine via `engine-setup --offline'. The problem must be related to how the engine VM based on this OVA file is deployed from the HE host.

# grep -iE 'firewall|iptables' /var/lib/ovirt-engine/setup/answers/20180824074455-setup.conf
OVESETUP_CONFIG/firewallManager=str:firewalld
OVESETUP_CONFIG/firewallChangesReview=none:None
OVESETUP_CONFIG/updateFirewall=bool:True
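As a side note, if one wanted to rule the firewalld close-up step out entirely, engine-setup can be told not to touch the firewall at all; a hypothetical sketch reusing the same key that appears in the answer file above (the file name is illustrative):

---%>---
cat > /root/no-firewall.conf <<'EOF'
[environment:default]
OVESETUP_CONFIG/updateFirewall=bool:False
EOF
engine-setup --offline --config-append=/root/no-firewall.conf
---%<---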
I tried to run engine-setup on this OVA file (image) with a cloud-init ISO with the following content, and it worked fine:
https://paste.fedoraproject.org/paste/NVmZZeDNAZ6IVQOum7G~Gw/raw
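The paste itself is not reproduced here; as a rough sketch only, a NoCloud seed ISO for such a test is typically built like this (the user-data/meta-data contents below are illustrative assumptions, not the actual paste):

---%>---
cat > user-data <<'EOF'
#cloud-config
# illustrative only -- the real user-data lives in the paste linked above
chpasswd: { expire: False }
EOF
cat > meta-data <<'EOF'
instance-id: she-test
local-hostname: she-test-01
EOF
genisoimage -output seed.iso -volid cidata -joliet -rock user-data meta-data
---%<---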
(In reply to Simone Tiraboschi from comment #6)
> This is definitely due to:
>
> 2018-08-23 13:55:33 DEBUG otopi.plugins.otopi.network.firewalld
> plugin.execute:926 execute-output: ('/bin/firewall-cmd', '--reload') stderr:
> ESC[91mError: COMMAND_FAILEDESC[00m
>
> Attaching the firewalld logs at debug level.
>
> Eric, could you please take a look?

From the errors I can't tell much. One thing I notice is that this is a pretty old version of firewalld - so old that the "--wait" option for iptables-restore is not being used (see bug 1446162). If anything else on the system happens to be holding the iptables lock, then iptables-restore will fail.

If you can try to reproduce with this setting in /etc/firewalld/firewalld.conf, it would really help:

  IndividualCalls=yes

With IndividualCalls=yes, iptables will be called directly instead of using iptables-restore. The individual calls _will_ use the "-w" option.
(In reply to Eric Garver from comment #12)
> With IndividualCalls=yes, iptables will be called directly instead of using
> iptables-restore. The individual calls _will_ use the "-w" option.

Thanks, we are going to try that.
I took the diff and patched it onto the 3.6 SHE host; the upgrade then proceeded successfully.
(In reply to Jiri Belka from comment #14)
> I took the diff and patched it onto the 3.6 SHE host; the upgrade then
> proceeded successfully.

Please realize that using IndividualCalls=yes has a performance impact when applying rules. That's why iptables-restore is used in later firewalld versions.
(In reply to Eric Garver from comment #15)
> Please realize that using IndividualCalls=yes has a performance impact when
> applying rules. That's why iptables-restore is used in later firewalld
> versions.

Thanks.
That upgrade code was developed to let the user perform an upgrade of their RHV 3.6 environment, where the RHV Manager was running on a RHEL 6 based VM, as easily and as smoothly as possible.

The user cannot run a direct RHV Manager upgrade from 3.6 to the current release (4.2); they have to pass through 4.0 and 4.1. Upgrades from 4.0 to 4.1 and from 4.1 to 4.2 can be performed basically in place, while 3.6/el6 -> 4.0/el7 is much more complex due to the OS change.

The RHV Manager 4.0 appliance is still shipped on a RHEL 7.2 base, and we are not planning to rebuild and retest it on RHEL 7.5 or 7.6.

So, Eric, basically you are suggesting to use IndividualCalls=yes only during the setup process to bypass this issue, and then restore the initial configuration once the user reaches RHEL 7.5 on the target setup - correct?
(In reply to Simone Tiraboschi from comment #16)
> (In reply to Eric Garver from comment #15)
> > Please realize that using IndividualCalls=yes has a performance impact
> > when applying rules. That's why iptables-restore is used in later
> > firewalld versions.
>
> Thanks.
> That upgrade code was developed to let the user perform an upgrade of their
> RHV 3.6 environment, where the RHV Manager was running on a RHEL 6 based
> VM, as easily and as smoothly as possible.
>
> The user cannot run a direct RHV Manager upgrade from 3.6 to the current
> release (4.2); they have to pass through 4.0 and 4.1. Upgrades from 4.0 to
> 4.1 and from 4.1 to 4.2 can be performed basically in place, while 3.6/el6
> -> 4.0/el7 is much more complex due to the OS change.
>
> The RHV Manager 4.0 appliance is still shipped on a RHEL 7.2 base, and we
> are not planning to rebuild and retest it on RHEL 7.5 or 7.6.
>
> So, Eric, basically you are suggesting to use IndividualCalls=yes only
> during the setup process to bypass this issue, and then restore the initial
> configuration once the user reaches RHEL 7.5 on the target setup - correct?

Yes. It's a good idea to restore the original value of IndividualCalls=no once you've upgraded to a RHEL base that supports the iptables-restore "-w" option.
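For completeness, a sketch of reverting the workaround afterwards (assuming the key was set as in the earlier sketch):

---%>---
sed -i 's/^IndividualCalls=.*/IndividualCalls=no/' /etc/firewalld/firewalld.conf
systemctl restart firewalld
firewall-cmd --state
---%<---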
OK, with ovirt-hosted-engine-setup-2.2.27-1.el7ev.noarch:

...
[ INFO ] Engine is still not reachable, waiting...
[ INFO ] Engine replied: DB Up!Welcome to Health Status!
[ INFO ] Connecting to Engine
[ INFO ] Connecting to Engine
[ INFO ] Connecting to Engine
[ INFO ] Connecting to Engine
[ INFO ] Connecting to Engine
[ ERROR ] Failed to execute stage 'Closing up': Cannot connect to Engine API on she-test-01.rhev.lab.eng.brq.redhat.com
[ INFO ] Stage: Clean up
[ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20180927183139.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
[ ERROR ] Hosted Engine upgrade failed: you can use --rollback-upgrade option to recover the engine VM disk from a backup.
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20180927172108-ck35jq.log

# hosted-engine --vm-status

!! Cluster is in GLOBAL MAINTENANCE mode !!

--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : 10-37-140-183.example.com
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "Up"}
Score                              : 3000
stopped                            : False
Local maintenance                  : False
crc32                              : b5a6a960
local_conf_timestamp               : 278243
Host timestamp                     : 278243
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=278243 (Sun Sep 30 22:09:54 2018)
        host-id=1
        score=3000
        vm_conf_refresh_time=278243 (Sun Sep 30 22:09:54 2018)
        conf_on_shared_storage=True
        maintenance=False
        state=GlobalMaintenance
        stopped=False


!! Cluster is in GLOBAL MAINTENANCE mode !!
You have new mail in /var/spool/mail/root

If I am able to reproduce the above ERROR about the connection to the Engine API, I'll open a separate BZ. But the upgrade as a whole worked.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:3482