Description of problem:

The environment uses a custom Apache CA certificate, configured as per the doc https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.3/html-single/administration_guide/index#Replacing_the_Manager_CA_Certificate. When the engine backup was restored to a different server, users failed to log in with the error below.

===
2019-05-21 15:08:56,383+02 ERROR [org.ovirt.engine.core.aaa.filters.SsoRestApiAuthFilter] (default task-1) [] Cannot authenticate using authentication Headers: server_error: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
2019-05-21 15:08:56,423+02 ERROR [org.ovirt.engine.core.sso.utils.SsoUtils] (default task-1) [] OAuthException access_denied: Cannot authenticate user 'None@N/A': No valid profile found in credentials..
===

This happens because engine-backup does not update the host-wide trust store, which is required as per step 1 in the doc.

~~~
Add your CA certificate to the host-wide trust store:

# cp /tmp/3rd-party-ca-cert.pem /etc/pki/ca-trust/source/anchors
# update-ca-trust
~~~

Version-Release number of selected component (if applicable):
rhvm-4.3.3.7-0.1.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Restore the engine-backup to a different server from an environment that has a custom CA certificate.

Actual results:
"No valid profile found in credentials" error when trying to log in to an environment with a custom Apache CA after restoring with engine-backup.

Expected results:
engine-backup should update the host-wide trust store if the environment has a custom Apache CA.

Additional info:
For the customer, the issue was observed when restoring the environment with "hosted-engine --restore-from-file", which uses engine-backup to restore the environment; the deployment failed when it tried to access the API. I am not sure whether this should be an engine-backup bug or a hosted-engine setup bug. However, since it is reproducible in a normal environment, I am opening it as an engine-backup bug.
So you expect something like:

If /etc/pki/ovirt-engine/apache-ca.pem is a normal file (perhaps also, if it's a link but not to engine's ca.pem):
    Copy it to /etc/pki/ca-trust/source/anchors
    update-ca-trust

I can see why you ask that. But it's slightly delicate:

With the documented procedure, the user can give the file any name they wish under /etc/pki/ca-trust/source/anchors (unlike in /etc/pki/ovirt-engine, where it must be called apache-ca.pem, or you must do an (unsupported?) manual change to ssl.conf). I'd expect most users to actually do this - e.g. /etc/pki/ca-trust/source/anchors/IT-CA-$ORG.pem. But in engine-backup, we do not know what name was chosen. So we'll probably need to pick a random name, and also verify that it was not already copied there under some other name.

Ideally, as in bug 1693816, this should be handled by the documentation itself.

Need to think about this a bit.
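The check described above could be sketched roughly as follows. This is only an illustration, not what engine-backup does: the function names `needs_anchor` and `install_anchor` and their arguments are hypothetical, the duplicate check is a plain byte comparison (the same certificate stored in a different encoding would not be detected), and the "link but not to engine's ca.pem" case is simply skipped.

```shell
#!/bin/bash
# Sketch only: all function and argument names here are hypothetical.

# needs_anchor APACHE_CA ENGINE_CA ANCHORS_DIR
# Returns 0 if APACHE_CA is a regular file (not a symlink), differs from the
# engine's own CA, and no byte-identical copy already exists in ANCHORS_DIR.
needs_anchor() {
    local apache_ca=$1 engine_ca=$2 anchors_dir=$3 f

    # A symlink or a missing file: nothing to do in this sketch.
    [ -f "$apache_ca" ] && [ ! -L "$apache_ca" ] || return 1

    # Same content as the engine's internal CA: not a 3rd-party CA.
    if [ -f "$engine_ca" ] && cmp -s "$apache_ca" "$engine_ca"; then
        return 1
    fi

    # Already present under some other name? (byte comparison only)
    for f in "$anchors_dir"/*; do
        [ -f "$f" ] || continue
        if cmp -s "$apache_ca" "$f"; then
            return 1
        fi
    done
    return 0
}

# install_anchor APACHE_CA ENGINE_CA ANCHORS_DIR NAME
# Copies the CA to ANCHORS_DIR/NAME when needs_anchor says so, and prints
# "copied" or "skipped" so the caller can decide what to do next.
install_anchor() {
    local apache_ca=$1 engine_ca=$2 anchors_dir=$3 name=$4
    if needs_anchor "$apache_ca" "$engine_ca" "$anchors_dir"; then
        cp -f "$apache_ca" "$anchors_dir/$name" && echo "copied"
    else
        echo "skipped"
    fi
}
```

On a real engine host the call would presumably look like `install_anchor /etc/pki/ovirt-engine/apache-ca.pem /etc/pki/ovirt-engine/ca.pem /etc/pki/ca-trust/source/anchors custom-apache-ca.pem`, followed by `update-ca-trust` when the result is "copied". The target name still has to be chosen by someone, which is exactly the open question in this comment.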
100746 is not needed, abandoned. engine-backup already restores everything that was backed up.

So I suggest to turn this to a doc bug. Under "Replacing the Red Hat Virtualization Manager Apache CA Certificate", we should add a step:

================================================================
13. To make engine-backup update the system on restore:

# mkdir -p /etc/ovirt-engine-backup/engine-backup-config.d
cat << __EOF__ >> /etc/ovirt-engine-backup/engine-backup-config.d/update-system-wide-pki.sh
BACKUP_PATHS="\${BACKUP_PATHS} /etc/ovirt-engine-backup"
cp -f /etc/pki/ovirt-engine/apache-ca.pem /etc/pki/ca-trust/source/anchors/custom-apache-ca.pem
update-ca-trust
__EOF__

You can replace "custom-apache-ca" with whatever applicable name - e.g. your 3rd-party ssl ca vendor name, your organization's IT department, etc.
================================================================

Moving to QE for testing. If it works, can be moved to doc team.

Testing flow:
1. Install and setup an engine
2. Configure it to use a 3rd-party CA according to the procedure linked in comment 0. Make sure you can login successfully.
3. engine-backup, keep the backup file
4. Reinstall the machine (or revert to a snapshot from before setup)
5. Restore the backup
6. engine-setup
7. Try to login

With the existing procedure, this should fail as in comment 0. With the procedure when it includes above step 13, it should work.
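One detail of step 13 that is easy to get wrong when retyping it is the escaping inside the here-document. The sketch below wraps the same snippet in a function so the effect can be inspected in a scratch directory; `write_hook` and its target argument are hypothetical names, and the sketch overwrites the file (`>`) rather than appending (`>>`) so that running it twice does not duplicate the lines. It assumes, as the proposed step implies, that engine-backup reads the files under engine-backup-config.d.

```shell
#!/bin/bash
# Sketch only: write_hook and its TARGET argument are hypothetical names.

# write_hook TARGET
# Writes the engine-backup config snippet to TARGET, replacing any old copy.
write_hook() {
    local target=$1
    mkdir -p "$(dirname "$target")" || return 1
    # The delimiter is unquoted, so the shell expands variables while writing
    # the file. \${BACKUP_PATHS} is escaped so that it lands in the file as a
    # literal ${BACKUP_PATHS} and is expanded later, when the snippet is read.
    cat << __EOF__ > "$target"
BACKUP_PATHS="\${BACKUP_PATHS} /etc/ovirt-engine-backup"
cp -f /etc/pki/ovirt-engine/apache-ca.pem /etc/pki/ca-trust/source/anchors/custom-apache-ca.pem
update-ca-trust
__EOF__
}
```

Running `write_hook /tmp/demo/update-system-wide-pki.sh` and viewing the result should show the first line with an unescaped `${BACKUP_PATHS}`. Without the backslash, the variable would be expanded in the admin's interactive shell, where it is most likely empty, and the existing backup paths would be lost from the assignment.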
Works for me on these components:
Engine Software Version:4.3.5.4-0.1.el7
ovirt-hosted-engine-ha-2.3.3-1.el7ev.noarch
ovirt-hosted-engine-setup-2.3.11-1.el7ev.noarch
Linux 3.10.0-1061.el7.x86_64 #1 SMP Thu Jul 11 21:02:44 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.7 (Maipo)

Please reopen if you still have any issues with this bug.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:2431
Hi,

is there any reason why we did not include the file

/etc/ovirt-engine-backup/engine-backup-config.d/update-system-wide-pki.sh

into the system by default? I believe that even if there is no custom certificate, there would be no issue and we would not add an additional manual step to the process.

Roman
(In reply to Roman Hodain from comment #11)
> Hi,
>
> is there any reason why we did not include the file
>
> /etc/ovirt-engine-backup/engine-backup-config.d/update-system-wide-pki.sh
>
> into the system by default? I believe that even if there is no custom
> certificate, there would be no issue and we would not add an additional
> manual step to the process.

Roman, there are several different issues here.

1. For current bug, it was not supposed to be closed, but to be moved to doc team, see comment 3. I'll now open a doc bug for that. I guess I had to open it right away, sorry for that.

2. Also for current bug, the name "custom-apache-ca.pem" should probably be customized by the user, not hard-coded, as I wrote there and also comment 2.

3. For a more general discussion, see bug 1422980. Feel free to clarify your own opinion re what should or should not be included in engine-backup, and especially how to decide this.

Thanks!
(In reply to Yedidyah Bar David from comment #12)
>
> 1. For current bug, it was not supposed to be closed, but to be moved to doc
> team, see comment 3. I'll now open a doc bug for that. I guess I had to open
> it right away, sorry for that.

Filed bug 1744522 for this.
I'm reopening this bug because it's happening to a customer with these versions: ovirt-ansible-hosted-engine-setup-1.0.26-1.el7ev.noarch ovirt-hosted-engine-ha-2.3.3-1.el7ev.noarch ovirt-hosted-engine-setup-2.3.11-1.el7ev.noarch We're trying to deploy HE from a backup with a 3rd party Apache certificate, but is failing with this error: ~~~ 2019-09-30 11:38:18,269+0200 DEBUG var changed: host "localhost" var "ansible_failed_result" type "<type 'dict'>" value: "{ "_ansible_no_log": false, "_ansible_parsed": true, "changed": false, "exception": "Traceback (most recent call last):\n File \"/tmp/ansible_ovirt_datacenter_payload__kcHe3/__main__.py\", line 221, in main\n ret = data_centers_module.create()\n File \"/tmp/ansible_ovirt_datacenter_payload__kcHe3/ansible_ovirt_datacenter_payload.zip/ansible/module_utils/ovirt.py\", line 573, in create\n entity = self.search_entity(search_params)\n File \"/tmp/ansible_ovirt_datacenter_payload__kcHe3/ansible_ovirt_datacenter_payload.zip/ansible/module_utils/ovirt.py\", line 810, in search_entity\n entity = search_by_attributes(self._service, list_params=list_params, name=self._module.params['name'])\n File \"/tmp/ansible_ovirt_datacenter_payload__kcHe3/ansible_ovirt_datacenter_payload.zip/ansible/module_utils/ovirt.py\", line 258, in search_by_attributes\n **list_params\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py\", line 6131, in list\n return self._internal_get(headers, query, wait)\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 211, in _internal_get\n return future.wait() if wait else future\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 54, in wait\n response = self._connection.wait(self._context)\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/__init__.py\", line 497, in wait\n return self.__wait(context, failed_auth)\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/__init__.py\", line 527, in __wait\n self._sso_token = 
self._get_access_token()\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/__init__.py\", line 628, in _get_access_token\n sso_error[1]\nAuthError: Error during SSO authentication access_denied : Cannot authenticate user 'None@N/A': No valid profile found in credentials..\n", "failed": true, "invocation": { "module_args": { "comment": null, "compatibility_version": null, "description": null, "fetch_nested": false, "force": null, "id": null, "local": false, "mac_pool": null, "name": "Default", "nested_attributes": [], "poll_interval": 3, "quota_mode": null, "state": "present", "timeout": 180, "wait": true } }, "msg": "Error during SSO authentication access_denied : Cannot authenticate user 'None@N/A': No valid profile found in credentials.." }" 2019-09-30 11:38:18,269+0200 DEBUG var changed: host "localhost" var "dc_result_presence" type "<type 'dict'>" value: "{ "changed": false, "exception": "Traceback (most recent call last):\n File \"/tmp/ansible_ovirt_datacenter_payload__kcHe3/__main__.py\", line 221, in main\n ret = data_centers_module.create()\n File \"/tmp/ansible_ovirt_datacenter_payload__kcHe3/ansible_ovirt_datacenter_payload.zip/ansible/module_utils/ovirt.py\", line 573, in create\n entity = self.search_entity(search_params)\n File \"/tmp/ansible_ovirt_datacenter_payload__kcHe3/ansible_ovirt_datacenter_payload.zip/ansible/module_utils/ovirt.py\", line 810, in search_entity\n entity = search_by_attributes(self._service, list_params=list_params, name=self._module.params['name'])\n File \"/tmp/ansible_ovirt_datacenter_payload__kcHe3/ansible_ovirt_datacenter_payload.zip/ansible/module_utils/ovirt.py\", line 258, in search_by_attributes\n **list_params\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py\", line 6131, in list\n return self._internal_get(headers, query, wait)\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 211, in _internal_get\n return future.wait() if wait else future\n File 
\"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 54, in wait\n response = self._connection.wait(self._context)\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/__init__.py\", line 497, in wait\n return self.__wait(context, failed_auth)\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/__init__.py\", line 527, in __wait\n self._sso_token = self._get_access_token()\n File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/__init__.py\", line 628, in _get_access_token\n sso_error[1]\nAuthError: Error during SSO authentication access_denied : Cannot authenticate user 'None@N/A': No valid profile found in credentials..\n", "failed": true, "msg": "Error during SSO authentication access_denied : Cannot authenticate user 'None@N/A': No valid profile found in credentials.." }" 2019-09-30 11:38:18,270+0200 ERROR ansible failed {'status': 'FAILED', 'ansible_type': 'task', 'ansible_task': u'Ensure that the target datacenter is present', 'ansible_result': u'type: <type \'dict\'>\nstr: {u\'exception\': u\'Traceback (most recent call last):\\n File "/tmp/ansible_ovirt_datacenter_payload__kcHe3/__main__.py", line 221, in main\\n ret = data_centers_module.create()\\n File "/tmp/ansible_ovirt_datacenter_payload__kcHe3/ansible_ovirt_datacenter_payload.zip/ansible/module_utils/ovirt.py",', 'task_duration': 4, 'ansible_host': u'localhost', 'ansible_playbook': u'/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml'} ~~~ We've tried to add the hook described here: https://access.redhat.com/solutions/4198801 But for some reason, it's not executed: ~~~ 2019-09-30 11:26:20,917+0200 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Include after engine-setup custom tasks files for the engine VM] 2019-09-30 11:26:24,522+0200 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 TASK [ovirt.hosted_engine_setup : debug] 2019-09-30 11:26:26,325+0200 DEBUG 
otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:103 include_after_engine_setup_results: {'skipped_reason': u'No items in the list', 'skipped': True, 'results': [], 'changed': False} <-------- 2019-09-30 11:26:28,328+0200 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Wait for the engine to reach a stable condition] ~~~
I think we don't want to reopen this bug since a new doc bug 1744522 was created where an extra step will be added in our doc for "customer SSL certificate" as per comment #3.
(In reply to Juan Orti Alcaine from comment #16)
> We've tried to add the hook described here:
> https://access.redhat.com/solutions/4198801
>
> But for some reason, it's not executed:
>

It looks like the hook path has changed in 4.3. I have modified the solution.
Ok, with the new hook path it worked. Treating this as a documentation bug won't fix the problem for all the people with custom certificates who already followed the documentation and have their backups. All of them will fail to restore. Everybody has the CA in the file /etc/pki/ovirt-engine/apache-ca.pem as it's explicitly indicated in the documentation to use that name. It makes sense to add that certificate to the trust store if it isn't a symlink.
(In reply to Juan Orti Alcaine from comment #21)
> Ok, with the new hook path it worked.
>
> Treating this as a documentation bug won't fix the problem for all the
> people with custom certificates who already followed the documentation and
> have their backups. All of them will fail to restore.

I agree, and this is indeed a real problem. But I do not have a good solution.

>
> Everybody has the CA in the file /etc/pki/ovirt-engine/apache-ca.pem as it's
> explicitly indicated in the documentation to use that name. It makes sense
> to add that certificate to the trust store if it isn't a symlink.

Under what name?

The machine's truststore is not owned by engine-setup - the user is free to maintain it as needed. The documentation is simply helping the user do that.

Suppose, for example, that an admin uses some 3rd-party CA for the engine, sets apache-ca.pem to this CA's cert, and adds it under the global truststore as /etc/pki/ca-trust/source/anchors/my-favourite-ca.pem . Then there are some business issues with this favourite CA, and the user does not want to have them in the global truststore (because perhaps they are not trustworthy anymore), but keeps them in apache-ca.pem (because it's not considered important by the user, or simply forgotten, or whatever other reason). This is a situation that is definitely up to the user to handle, even, sadly, during tough times such as an urgent restore. No tooling can know for sure what to do.

Admittedly, I probably have less real-world experience than most of the people commenting on this bug, and don't have good means to estimate the trade-off between just copying the file always during restore, adding more code to do this more carefully (perhaps prompt and ask the user?), or do nothing (and rely only on doc).

Do you think this is so important that it now requires clear and careful design? asking the user? what to check? Go over all the truststore and try to guess if it's already included, and then do not copy? etc.
Again, see the discussion on bug 1422980. Where do you stop? Include all of /etc/pki/ca-trust/source/anchors ? /etc/pki/ca-trust? /etc ? Where do you say "This is up to the user to backup/restore, it's not part of the engine"?
As said in comment #10, please open a new bug instead of re-opening this one.
The Apache configuration of RHV uses /etc/pki/ovirt-engine/apache-ca.pem, and that's the file name the documentation instructs users to use. IMHO I'd add that CA file to the global trust store when restoring; I simply don't see why not. I'll open a new bug if I find this problem again in the future.