Bug 1841203 - Hosted Engine deployment fails with restored backup from 4.3.9: PKIX path building failed
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: General
Version: 4.4.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-4.4.3
Target Release: ---
Assignee: Yedidyah Bar David
QA Contact: Wei Wang
URL:
Whiteboard:
Depends On: 1859505
Blocks:
Reported: 2020-05-28 15:53 UTC by Oliver Leinfelder
Modified: 2023-09-14 06:01 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-11 06:41:21 UTC
oVirt Team: Integration
Embargoed:
sbonazzo: ovirt-4.4?
sbonazzo: planning_ack?
sbonazzo: devel_ack+
mavital: testing_ack+


Attachments
he restore with self-ca logs (1.17 MB, application/gzip)
2020-06-11 06:31 UTC, Wei Wang
backup file (1.03 MB, application/gzip)
2020-07-20 01:50 UTC, Wei Wang

Description Oliver Leinfelder 2020-05-28 15:53:13 UTC
Description of problem:

Deployment of hosted engine fails with restored backup from hosted engine 4.3.9 when CA renewal is selected

Version-Release number of selected component (if applicable):
4.4.0

How reproducible:
Always

Steps to Reproduce:
1. Create a backup of the production hosted engine: engine-backup --scope=all --mode=backup --file=/backups/migration-4.4/backup.bck --log=/backups/migration-4.4/backuplog.log
2. Clean install of host with oVirt Node 4.4.0 release ISO
3. hosted-engine --deploy --restore-from-file=/root/backup.bck

Actual results:

Deploy fails with the following error:

[ ERROR ] ovirtsdk4.AuthError: Error during SSO authentication server_error : PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 50, "changed": false, "msg": "Error during SSO authentication server_error : PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target"}

/var/log/engine.log in the HE shows:

2020-05-27 16:10:43,695+02 ERROR [org.ovirt.engine.core.sso.utils.SsoUtils] (default task-8) [] OAuthException server_error: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
2020-05-27 16:10:53,962+02 INFO [org.ovirt.engine.extension.aaa.jdbc.core.Authentication] (default task-8) [] locking user: admin due to interval failures
2020-05-27 16:10:58,956+02 ERROR [org.ovirt.engine.core.sso.utils.SsoUtils] (default task-8) [] OAuthException server_error: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
2020-05-27 16:11:09,222+02 INFO [org.ovirt.engine.extension.aaa.jdbc.core.Authentication] (default task-8) [] locking user: admin due to interval failures
2020-05-27 16:11:14,217+02 ERROR [org.ovirt.engine.core.sso.utils.SsoUtils] (default task-8) [] OAuthException server_error: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
2020-05-27 16:11:24,484+02 INFO [org.ovirt.engine.extension.aaa.jdbc.core.Authentication] (default task-8) [] locking user: admin due to interval failures
2020-05-27 16:11:29,480+02 ERROR [org.ovirt.engine.core.sso.utils.SsoUtils] (default task-8) [] OAuthException server_error: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
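
A minimal sketch for quantifying the pattern in the excerpt above, where each certificate-path failure is followed about ten seconds later by an admin lockout. The function name is hypothetical; the two search strings are copied from the quoted log lines, and the log path is whatever the caller passes in.

```shell
#!/bin/sh
# Hypothetical helper: count SSO certificate-path failures and admin
# lockouts in an engine log. The message strings come verbatim from the
# excerpt above; the log file path is supplied by the caller.
summarize_sso_failures() {
    log_file=$1
    # grep -c prints the number of matching lines. It exits non-zero when
    # that number is 0, hence the "|| true".
    pkix=$(grep -c 'PKIX path building failed' "$log_file" || true)
    locks=$(grep -c 'locking user: admin' "$log_file" || true)
    echo "pkix_failures=$pkix admin_lockouts=$locks"
}
```

Run against a file holding only the seven engine.log lines quoted above, this would report 4 failures and 3 lockouts, which suggests the deploy script's retries are what lock the admin account.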


Expected results:

Setup completes.

Additional info:

Possibly relevant: Production hosted engine uses user supplied certificate for the web server (and only for the web service, other certs/CA were generated by oVirt).

Comment 1 RHEL Program Management 2020-05-29 03:55:34 UTC
The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

Comment 2 Michal Skrivanek 2020-05-29 06:54:59 UTC
Sounds like bug 1816648. Can you double-check that the restore script you used has that fix, and that you are running postgres 12?

Comment 3 Oliver Leinfelder 2020-05-29 09:53:14 UTC
Just to be sure there is no mixup: This is about a hosted engine deployment on a clean oVirt node installation (based on 4.4.0 release ISO from ovirt.org) so postgres was never installed directly by me.

Comment 4 Michal Skrivanek 2020-05-29 10:23:46 UTC
Understood. The module still needs to be enabled with "dnf module enable postgresql:12" before you start the installation.
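
A minimal sketch for checking whether the stream is already enabled, assuming the usual "dnf module list" table layout in which an enabled stream carries an [e] marker in the Stream column. The function name is hypothetical, and it only parses text given on standard input.

```shell
#!/bin/sh
# Hypothetical helper: read "dnf module list postgresql" output on stdin
# and succeed only if stream 12 carries the [e] (enabled) marker.
# Assumes the columns are: Name, Stream (with markers), Profiles, Summary.
postgresql12_enabled() {
    awk '$1 == "postgresql" && $2 == "12" && $3 ~ /\[e\]/ { found = 1 }
         END { exit !found }'
}
```

It could be used as "dnf module list postgresql | postgresql12_enabled && echo enabled".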

Comment 5 Oliver Leinfelder 2020-05-29 10:45:03 UTC
I'm still at a loss here, I'm sorry.

Wouldn't that be the job of some script (presumably Ansible) during deployment, since postgres runs inside the VM, which is built and installed with minimal interaction from me?

Comment 6 Wei Wang 2020-06-05 16:41:57 UTC
Test Version:
4.3.9 build: ovirt-node-ng-installer-4.3.9-2020031917.el7.iso   ovirt-engine-appliance-4.3-20200319.1.el7.x86_64.rpm
4.4.0 build: ovirt-node-ng-installer-4.4.0-2020052110.el8.iso   ovirt-engine-appliance-4.4-20200520111649.1.el8.x86_64.rpm

Test Steps:
According to comment 0

Test Result:
Deployment of hosted engine succeeds with a restored backup from hosted engine 4.3.9 if the reply is "No" to the question "Renew engine CA on restore if needed? Please notice that if you choose Yes, all hosts will have to be later manually reinstalled from the engine.".


[ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20200606002713.conf'
[ INFO  ] Generating answer file '/etc/ovirt-hosted-engine/answers.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ INFO  ] Hosted Engine successfully deployed
[ INFO  ] Other hosted-engine hosts have to be reinstalled in order to update their storage configuration. From the engine, host by host, please set maintenance mode and then click on reinstall button ensuring you choose DEPLOY in hosted engine tab.
[ INFO  ] Please note that the engine VM ssh keys have changed. Please remove the engine VM entry in ssh known_hosts on your clients.

QE cannot reproduce this issue.

oliver.leinfelder,
Did you perform any special operations? If so, please list the detailed steps.

Comment 7 Oliver Leinfelder 2020-06-08 10:20:46 UTC
My backup file comes from a hosted engine 4.3.9-1.el7.

I'm using "hosted-engine --deploy --restore-from-file=/root/backup.bck" on a clean install ovirt node ng 4.4.0-1.el8

I replied "No" to question "Renew engine CA on restore if needed? Please notice that if you choose Yes, all hosts will have to be later manually reinstalled from the engine."

The production hosted engine has its apache-ca.pem replaced by a user-provided CA according to this: https://www.ovirt.org/documentation/admin-guide/appe-oVirt_and_SSL.html (and only the Apache cert; all other certs remain untouched)

Comment 8 Yaning Wang 2020-06-10 02:35:10 UTC
> replaced by a user provided CA
Do you mean a valid CA signed by Let's Encrypt or another authority?

> The production hosted engine
Does it have a valid domain name that can be reached from the public internet?

Comment 9 Oliver Leinfelder 2020-06-10 07:56:48 UTC
No, the CA is user created but imported into the host trust store (as outlined in https://www.ovirt.org/documentation/admin-guide/appe-oVirt_and_SSL.html)

The hosted engine is not reachable globally but has a certificate from our (user created) CA to its FQDN + subject alternative name (both of which are resolvable via internal DNS)
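
A minimal sketch for checking whether a given certificate chains to a given CA file, which is the kind of path building the PKIX error says has failed. The function name is hypothetical and both paths are caller-supplied. Note that "openssl verify" may also consult the system default trust locations, so a success here does not prove that the supplied CA file alone is sufficient.

```shell
#!/bin/sh
# Hypothetical helper: succeed if cert_file verifies against ca_file.
# Returns 2 when either file is unreadable, otherwise the exit status of
# "openssl verify" (0 on success).
cert_chains_to() {
    ca_file=$1
    cert_file=$2
    if [ ! -r "$ca_file" ] || [ ! -r "$cert_file" ]; then
        return 2
    fi
    openssl verify -CAfile "$ca_file" "$cert_file" >/dev/null 2>&1
}
```

For example, "cert_chains_to /etc/pki/ovirt-engine/apache-ca.pem /path/to/apache-cert.pem", where the second path is a placeholder for wherever the web server certificate lives on the engine.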

Comment 10 Wei Wang 2020-06-11 06:26:17 UTC
Test Version:
4.3.9 build: ovirt-node-ng-installer-4.3.9-2020031917.el7.iso   ovirt-engine-appliance-4.3-20200319.1.el7.x86_64.rpm
4.4.0 build: ovirt-node-ng-installer-4.4.0-2020052110.el8.iso   ovirt-engine-appliance-4.4-20200520111649.1.el8.x86_64.rpm

Test according to comment 0 with a third-party CA certificate according to this: https://www.ovirt.org/documentation/admin-guide/appe-oVirt_and_SSL.html (and only Apache cert, all other certs remain untouched)

Result:
Hosted engine restore failed with a similar error.

~~~~
engine.log
~~~~
2020-06-11 14:08:09,738+08 ERROR [org.ovirt.engine.core.aaa.filters.SsoRestApiAuthFilter] (default task-1) [] Cannot authenticate using authentication Headers: server_error: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
2020-06-11 14:08:09,764+08 ERROR [org.ovirt.engine.core.sso.utils.SsoUtils] (default task-1) [] OAuthException access_denied: Cannot authenticate user 'None@N/A': No valid profile found in credentials..

Is this the same issue as in comment 0? If so, there is a similar downstream (d/s) bug, https://bugzilla.redhat.com/show_bug.cgi?id=1715767, which was fixed as a documentation issue in https://bugzilla.redhat.com/show_bug.cgi?id=1744522.

Comment 11 Wei Wang 2020-06-11 06:31:11 UTC
Created attachment 1696671 [details]
he restore with self-ca logs

Comment 13 Yedidyah Bar David 2020-07-16 14:12:40 UTC
(In reply to Sandro Bonazzola from comment #12)
> Please let us know if documentation at
> https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.3/
> html-single/administration_guide/
> index?lb_target=production#Replacing_the_Manager_CA_Certificate

Notably, step 14 there, which makes engine-backup include the custom cert.

> or equivalent:
> https://ovirt.org/documentation/admin-guide/appe-oVirt_and_SSL.html
> solves the issue.

That's an old, broken link. The correct one is:

https://ovirt.org/documentation/administration_guide/#Replacing_the_Manager_CA_Certificate

(In reply to Wei Wang from comment #10)
> Test Version:
> 4.3.9 build: ovirt-node-ng-installer-4.3.9-2020031917.el7.iso  
> ovirt-engine-appliance-4.3-20200319.1.el7.x86_64.rpm
> 4.4.0 build: ovirt-node-ng-installer-4.4.0-2020052110.el8.iso  
> ovirt-engine-appliance-4.4-20200520111649.1.el8.x86_64.rpm
> 
> Test according to comment 0 with a third-party CA certificate according to
> this:
> https://www.ovirt.org/documentation/admin-guide/appe-oVirt_and_SSL.html (and
> only Apache cert, all other certs remain untouched)
> 
> Result:
> hosted engine restore failed at a similar issue

Wei, can you please also attach the backup file generated by engine-backup? Thanks. If you still have the machines, I'd like to have a look at them.

> 
> ~~~~
> engine.log
> ~~~~
> 2020-06-11 14:08:09,738+08 ERROR
> [org.ovirt.engine.core.aaa.filters.SsoRestApiAuthFilter] (default task-1) []
> Cannot authenticate using authentication Headers: server_error: PKIX path
> building failed: sun.security.provider.certpath.SunCertPathBuilderException:
> unable to find valid certification path to requested target
> 2020-06-11 14:08:09,764+08 ERROR [org.ovirt.engine.core.sso.utils.SsoUtils]
> (default task-1) [] OAuthException access_denied: Cannot authenticate user
> 'None@N/A': No valid profile found in credentials..
> 
> Is this issue same with comment 0? If so, There is a d/s bug
> https://bugzilla.redhat.com/show_bug.cgi?id=1715767, similar to this one.
> And it was fixed as a document issue
> https://bugzilla.redhat.com/show_bug.cgi?id=1744522.

Indeed, the result of this was adding step 14 that I mentioned above.

Comment 14 Wei Wang 2020-07-20 01:46:50 UTC
(In reply to Yedidyah Bar David from comment #13)
> (In reply to Sandro Bonazzola from comment #12)
> > Please let us know if documentation at
> > https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.3/
> > html-single/administration_guide/
> > index?lb_target=production#Replacing_the_Manager_CA_Certificate
> 
> Notably, step 14 there, which makes engine-backup include the custom cert.
> 
> > or equivalent:
> > https://ovirt.org/documentation/admin-guide/appe-oVirt_and_SSL.html
> > solves the issue.
> 
> That's an old, broken link. The correct one is:
> 
> https://ovirt.org/documentation/administration_guide/
> #Replacing_the_Manager_CA_Certificate
> 
> (In reply to Wei Wang from comment #10)
> > Test Version:
> > 4.3.9 build: ovirt-node-ng-installer-4.3.9-2020031917.el7.iso  
> > ovirt-engine-appliance-4.3-20200319.1.el7.x86_64.rpm
> > 4.4.0 build: ovirt-node-ng-installer-4.4.0-2020052110.el8.iso  
> > ovirt-engine-appliance-4.4-20200520111649.1.el8.x86_64.rpm
> > 
> > Test according to comment 0 with a third-party CA certificate according to
> > this:
> > https://www.ovirt.org/documentation/admin-guide/appe-oVirt_and_SSL.html (and
> > only Apache cert, all other certs remain untouched)
> > 
> > Result:
> > hosted engine restore failed at a similar issue
> 
> Wei, can you please attach also the backup file generated by engine-backup?
> Thanks. If you still have the machines, I'd like to have a look at them.
> 
I have the backup file on my local computer, but the test environment is already gone. I have attached the backup file.
> > 
> > ~~~~
> > engine.log
> > ~~~~
> > 2020-06-11 14:08:09,738+08 ERROR
> > [org.ovirt.engine.core.aaa.filters.SsoRestApiAuthFilter] (default task-1) []
> > Cannot authenticate using authentication Headers: server_error: PKIX path
> > building failed: sun.security.provider.certpath.SunCertPathBuilderException:
> > unable to find valid certification path to requested target
> > 2020-06-11 14:08:09,764+08 ERROR [org.ovirt.engine.core.sso.utils.SsoUtils]
> > (default task-1) [] OAuthException access_denied: Cannot authenticate user
> > 'None@N/A': No valid profile found in credentials..
> > 
> > Is this issue same with comment 0? If so, There is a d/s bug
> > https://bugzilla.redhat.com/show_bug.cgi?id=1715767, similar to this one.
> > And it was fixed as a document issue
> > https://bugzilla.redhat.com/show_bug.cgi?id=1744522.
> 
> Indeed, the result of this was adding step 14 that I mentioned above.

Comment 15 Wei Wang 2020-07-20 01:50:15 UTC
Created attachment 1701696 [details]
backup file

Comment 16 Yedidyah Bar David 2020-07-22 09:50:14 UTC
(In reply to Wei Wang from comment #15)
> Created attachment 1701696 [details]
> backup file

The contents of this file do not seem to include what I'd have expected if you had followed step 14 (for engine-backup). Are you sure you did? Can you please retry?

I now noticed that this step 14 is broken - filed bug 1859505 for that.

Oliver - can you please also check this on your setup?

Thanks!

Comment 17 Wei Wang 2020-07-22 10:20:27 UTC
(In reply to Yedidyah Bar David from comment #16)
> (In reply to Wei Wang from comment #15)
> > Created attachment 1701696 [details]
> > backup file
> 
> The content of this file do not seem to include what I'd have expected if
> you followed step 14 (for engine-backup). Are you sure you did? Can you
> please retry?
I am not sure whether the problem in comment 10 is the same as the one in comment 0, so I haven't tried step 14 yet. I will try it after I finish 4.3.11 SVVP testing.
> 
> I now noticed that this step 14 is broken - filed bug 1859505 for that.
> 
> Oliver - can you please also check this on your setup?
> 
> Thanks!

Comment 18 Wei Wang 2020-08-07 08:09:59 UTC
(In reply to Wei Wang from comment #17)
> (In reply to Yedidyah Bar David from comment #16)
> > (In reply to Wei Wang from comment #15)
> > > Created attachment 1701696 [details]
> > > backup file
> > 
> > The content of this file do not seem to include what I'd have expected if
> > you followed step 14 (for engine-backup). Are you sure you did? Can you
> > please retry?
> I am not sure the problem in comment 10 is same with comment 0, so I haven't
> try the step 14. I will try it after I finish 4.3.11 svvp testing.

Tested with the script below instead of the one in step 14; hosted engine restore succeeded.
=========================================================
BACKUP_PATHS="${BACKUP_PATHS}
/etc/ovirt-engine-backup"
cp -f /etc/pki/ovirt-engine/apache-ca.pem \
    /etc/pki/ca-trust/source/anchors/3rd-party-ca-cert.pem
update-ca-trust
=========================================================

> > 
> > I now noticed that this step 14 is broken - filed bug 1859505 for that.
> > 
> > Oliver - can you please also check this on your setup?
> > 
> > Thanks!
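
A minimal sketch for confirming that the copy step of the script in this comment took effect, assuming the two default paths used there. The function name is hypothetical. It only compares the two files byte for byte; it does not show whether update-ca-trust has been run afterwards.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the trust anchor is a byte-identical
# copy of the Apache CA file. The defaults are the paths used in the
# script quoted in this comment; both can be overridden by the caller.
anchor_matches() {
    src=${1:-/etc/pki/ovirt-engine/apache-ca.pem}
    dst=${2:-/etc/pki/ca-trust/source/anchors/3rd-party-ca-cert.pem}
    # cmp -s is silent and exits 0 only when the files are identical.
    cmp -s "$src" "$dst"
}
```

Calling "anchor_matches || echo 'anchor missing or stale'" on the engine would flag a setup where the hook has not been applied.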

Comment 19 Yedidyah Bar David 2020-08-11 06:03:27 UTC
Thanks! Anything else missing for marking this bug verified, other than bug 1859505?

Comment 20 Wei Wang 2020-08-11 06:15:00 UTC
(In reply to Yedidyah Bar David from comment #19)
> Thanks! Anything else missing for marking this bug verified, other than bug
> 1859505?

You are welcome! Nothing. QE will verify this bug after bug 1859505 is fixed and the status is changed to ON_QA.

Comment 21 Yedidyah Bar David 2020-08-11 09:12:27 UTC
Very well. Moving to MODIFIED for now.

Comment 22 Wei Wang 2020-08-21 02:04:01 UTC
The status of https://bugzilla.redhat.com/show_bug.cgi?id=1859505 is NEW, so moving this bug to ASSIGNED.

Comment 23 Yedidyah Bar David 2020-08-25 06:11:22 UTC
Moving back to MODIFIED, not sure what the process should ideally be, exactly. Perhaps we should not automatically move bugs from MODIFIED to QE if they depend on bugs in state <= MODIFIED? Perhaps this is hard...

Comment 24 Yedidyah Bar David 2020-09-06 12:14:38 UTC
Dependent doc bug 1859505 closed/published, moving to QE.

Comment 25 Yedidyah Bar David 2020-09-06 12:17:26 UTC
Based on comment 18, the procedure after applying the fix in bug 1859505 was already tested, so it's safe to move to VERIFIED. Will leave it for QE to do.

Comment 26 Wei Wang 2020-09-07 01:27:23 UTC
According to comment 24 and comment 25, QE is moving this bug to VERIFIED.

Comment 27 Sandro Bonazzola 2020-11-11 06:41:21 UTC
This bugzilla is included in oVirt 4.4.3 release, published on November 10th 2020.

Since the problem described in this bug report should be resolved in oVirt 4.4.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

Comment 28 Red Hat Bugzilla 2023-09-14 06:01:17 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

