Bug 1841195 - Hosted Engine deployment fails with restored backup from 4.3.9 when CA renewal is selected
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: General
Version: 4.4.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ovirt-4.4.2
Target Release: 4.4.2.1
Assignee: Yedidyah Bar David
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On: 1868571
Blocks:
Reported: 2020-05-28 15:46 UTC by Oliver Leinfelder
Modified: 2020-09-18 07:13 UTC

Fixed In Version: ovirt-engine-4.4.2.1
Clone Of:
Environment:
Last Closed: 2020-09-18 07:13:01 UTC
oVirt Team: Integration
Embargoed:
sbonazzo: ovirt-4.4?
sbonazzo: planning_ack?
sbonazzo: devel_ack+
mavital: testing_ack+


Links
System | ID | Private | Priority | Status | Summary | Last Updated
oVirt gerrit | 110441 | 0 | master | MERGED | packaging: pki: renew: Check if ca file exists | 2020-08-24 10:18:41 UTC

Description Oliver Leinfelder 2020-05-28 15:46:18 UTC
Description of problem:

Deployment of the hosted engine fails when restoring a backup from hosted engine 4.3.9 and CA renewal is selected.

Version-Release number of selected component (if applicable):
4.4.0

How reproducible:
Always

Steps to Reproduce:
1. Create a backup of the production hosted engine: engine-backup --scope=all --mode=backup --file=/backups/migration-4.4/backup.bck --log=/backups/migration-4.4/backuplog.log
2. Do a clean install of the host from the oVirt Node 4.4.0 release ISO.
3. Run hosted-engine --deploy --restore-from-file=/root/backup.bck and select yes when asked for CA renewal.

Actual results:

Deployment fails:

A log file in /var/log/ovirt-engine/setup on the running but unfinished VM shows:

2020-05-27 00:17:09,660+0200 DEBUG otopi.context context._executeMethod:145 method exception
Traceback (most recent call last):
  File "/usr/lib64/python3.6/site-packages/M2Crypto/BIO.py", line 279, in openfile
    f = open(filename, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/etc/pki/ovirt-engine/qemu-ca.pem'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-engine/setup/bin/../plugins/ovirt-engine-setup/ovirt-engine/pki/ca.py", line 699, in _miscUpgrade
    if self._expired(self._x509_load_cert(ca_file)):
  File "/usr/share/ovirt-engine/setup/bin/../plugins/ovirt-engine-setup/ovirt-engine/pki/ca.py", line 94, in _x509_load_cert
    res = X509.load_cert(f)
  File "/usr/lib64/python3.6/site-packages/M2Crypto/X509.py", line 802, in load_cert
    with BIO.openfile(file) as bio:
  File "/usr/lib64/python3.6/site-packages/M2Crypto/BIO.py", line 281, in openfile
    raise BIOError(ex.args)
M2Crypto.BIO.BIOError: (2, 'No such file or directory')
2020-05-27 00:17:09,663+0200 ERROR otopi.context context._executeMethod:154 Failed to execute stage 'Misc configuration': (2, 'No such file or directory') 

Expected results:

Setup completes.

Additional info:

The CA in the production hosted engine is valid.

Comment 1 RHEL Program Management 2020-05-29 03:56:15 UTC
The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

Comment 2 Michal Skrivanek 2020-05-29 06:50:02 UTC
do we want to do the renewal during restore? we can also just leave that as a post-restore task if it's complicated...

Comment 3 Oliver Leinfelder 2020-05-29 09:55:49 UTC
Renewal was not strictly necessary in this case (this was more a case of "Why not while we're at it .."). I had this issue on the users mailing list and Didi asked me to raise this here.

Comment 4 Yedidyah Bar David 2020-06-01 06:17:45 UTC
(In reply to Michal Skrivanek from comment #2)
> do we want to do the renewal during restore? we can also just leave that as
> a post-restore task if it's complicated...

We can.

The main reason for combining them is that both upgrade and PKI renewal require significant downtime, and we assume that many people would prefer a single such downtime instead of two.

See also bug 1648190.

(In reply to Oliver Leinfelder from comment #3)
> Renewal was not strictly necessary in this case (this was more a case of
> "Why not while we're at it .."). I had this issue on the users mailing list
> and Didi asked me to raise this here.

Indeed. Thanks!

The problem seems to be in our new code for the QEMU CA, which affects only 4.4. I haven't tried to reproduce it myself yet.

Comment 5 Sandro Bonazzola 2020-06-23 16:04:46 UTC
Didi isn't this a duplicate of bug #1841203 ?

Comment 6 Yedidyah Bar David 2020-07-22 13:41:29 UTC
(In reply to Sandro Bonazzola from comment #5)
> Didi isn't this a duplicate of bug #1841203 ?

I don't think they are related.

It seems to have been introduced by a patch for bug 1739557 [1].

That patch adds code to handle the qemu-ca authority:

- A method _create_qemu_ca, with name=QEMU_CA_AVAILABLE, that creates the CA

- Some other changes, including generalizing the code that handles the engine's CA and, in particular, also handling renewal of qemu-ca in the existing method _miscUpgrade.

_miscUpgrade was changed to run before=QEMU_CA_AVAILABLE. This means that we try to upgrade before we create, so if we do need to upgrade, we fail.
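
The ordering problem can be illustrated with a small stand-alone sketch. It does not use otopi and does not claim to reproduce how otopi sorts its events; it only models a single "runs before" constraint between the two method names mentioned above, using the standard-library graphlib module (Python 3.9 or later). The helper order_methods is hypothetical.

```python
from graphlib import TopologicalSorter


def order_methods(misc_upgrade_before_create):
    """Return the run order of the two methods under one ordering constraint.

    Hypothetical model: each key maps a method name to the set of
    methods that must run before it.
    """
    graph = {"_miscUpgrade": set(), "_create_qemu_ca": set()}
    if misc_upgrade_before_create:
        # Models before=QEMU_CA_AVAILABLE: the renewal check runs first.
        graph["_create_qemu_ca"].add("_miscUpgrade")
    else:
        # Models after=QEMU_CA_AVAILABLE: the CA is created first.
        graph["_miscUpgrade"].add("_create_qemu_ca")
    return list(TopologicalSorter(graph).static_order())


current = order_methods(True)
proposed = order_methods(False)
print(current)
print(proposed)
```

With the "before" constraint the renewal check is scheduled ahead of the method that creates the file, which matches the missing qemu-ca.pem in the traceback.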

Milan - do you remember why you added this before=QEMU_CA_AVAILABLE? If it's needed, we should add a small test to not fail if it's missing, something like "os.path.exists(ca_file) and ...". Otherwise, perhaps we should run them in the opposite order, changing to "after=QEMU_CA_AVAILABLE".

[1] https://gerrit.ovirt.org/104240
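
A minimal sketch of the existence guard suggested in this comment, assuming nothing about the real ca.py beyond what the traceback shows. The merged gerrit change is titled "Check if ca file exists", but the code below is not that patch: needs_renewal and always_expired are hypothetical stand-ins, and no M2Crypto call is made.

```python
import os
import tempfile


def needs_renewal(ca_file, is_expired):
    """Return True only if ca_file exists and is_expired reports it expired.

    The os.path.exists test short-circuits, so is_expired never tries
    to open a CA file that has not been created yet.
    """
    return os.path.exists(ca_file) and is_expired(ca_file)


def always_expired(path):
    # Hypothetical stand-in for loading a certificate and checking its
    # validity: it opens the file, so it raises if the file is missing.
    with open(path) as cert:
        cert.read()
    return True


with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "qemu-ca.pem")  # never created
    missing_result = needs_renewal(missing, always_expired)

    present = os.path.join(tmp, "ca.pem")
    with open(present, "w") as cert:
        cert.write("placeholder")
    present_result = needs_renewal(present, always_expired)

print(missing_result, present_result)
```

Without the guard, calling always_expired on the missing path raises FileNotFoundError, which is the same class of failure as in the reported traceback.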

Comment 7 Milan Zamazal 2020-07-22 16:39:21 UTC
(In reply to Yedidyah Bar David from comment #6)

> Milan - do you remember why you added this before=QEMU_CA_AVAILABLE?

I think I just followed what happens to CA_AVAILABLE. I tried to handle the QEMU CA in the same way as the primary CA, which may be right or wrong in this case. IIRC, I didn't have any reason beyond copy&paste to put it at that particular place.

Comment 8 Nikolai Sednev 2020-08-25 15:19:03 UTC
Blocked by the currently outdated rhvm-appliance-4.4-20200722.0.el8ev.x86_64.rpm appliance. Will have to wait for the latest bits to arrive.

Comment 9 Nikolai Sednev 2020-08-25 17:00:34 UTC
Upgrade worked for me with CA renewal as expected on these components:
ovirt-engine-setup-4.4.2.3-0.6.el8ev.noarch
ovirt-ansible-hosted-engine-setup-1.1.8-1.el8ev.noarch
ovirt-hosted-engine-setup-2.4.6-1.el8ev.noarch
ovirt-hosted-engine-ha-2.4.4-1.el8ev.noarch
Red Hat Enterprise Linux release 8.2 (Ootpa)
Linux 4.18.0-193.14.3.el8_2.x86_64 #1 SMP Mon Jul 20 15:02:29 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Comment 10 Sandro Bonazzola 2020-09-18 07:13:01 UTC
This bugzilla is included in oVirt 4.4.2 release, published on September 17th 2020.

Since the problem described in this bug report should be resolved in oVirt 4.4.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

