Bug 2011044

Summary: 16.2 ReaR restore during the stage authenticate with Ceph with controller is failing
Product: Red Hat OpenStack
Reporter: myadla
Component: tripleo-ansible
Assignee: Juan Larriba <jlarriba>
Status: CLOSED ERRATA
QA Contact: myadla
Severity: high
Docs Contact:
Priority: medium
Version: 16.2 (Train)
CC: aschultz, elicohen, jbadiapa, jlarriba, lbezdick, myadla, spower
Target Milestone: z1
Keywords: Triaged
Target Release: 16.2 (Train on RHEL 8.4)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: tripleo-ansible-0.7.1-2.20210603175844.el8ost.9
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 2015480
Environment:
Last Closed: 2021-12-09 20:41:22 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2015480

Description myadla 2021-10-05 20:06:41 UTC
Description of problem:
The 16.2 ReaR restore fails at the stage that authenticates with Ceph on the controller (task "Try to authenticate with Ceph with controller-0"):

[heat-admin@controller-0 ~]$ sudo podman exec ceph-mon-controller-0 ceph -n client.admin -k /var/lib/ceph/ceph-authentication.bak -s
[errno 13] error connecting to the cluster
[heat-admin@controller-0 ~]$ 

Error:
======
TASK [Try to authenticate with Ceph with controller-0] *************************
task path: /home/rhos-ci/jenkins/workspace/DFG-enterprise-backup_restore-16.2_director-rhel-virthost-3cont_2comp_3ceph-ipv4-geneve-ir_osp_client/infrared/plugins/backup-restore/playbooks/restore_controllers.yaml:416
Tuesday 05 October 2021  19:50:13 +0000 (0:00:02.511)       2:13:26.469 ******* 
fatal: [undercloud-0]: FAILED! => {
    "changed": true,
    "cmd": "ssh heat-admin@192.168.24.10 \"sudo podman exec ceph-mon-controller-0 ceph -n client.admin -k /var/lib/ceph/ceph-authentication.bak -s\"",
    "delta": "0:00:01.061161",
    "end": "2021-10-05 19:50:14.771952",
    "failed_when_result": true,
    "rc": 13,
    "start": "2021-10-05 19:50:13.710791"
}

STDERR:

Warning: Permanently added '192.168.24.10' (ECDSA) to the list of known hosts.

[errno 13] error connecting to the cluster


MSG:

non-zero return code

Console log:
https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/enterprise/view/backup_restore/job/DFG-enterprise-backup_restore-16.2_director-rhel-virthost-3cont_2comp_3ceph-ipv4-geneve-ir_osp_client/20/consoleFull

Restore log: 
https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/job/DFG-enterprise-backup_restore-16.2_director-rhel-virthost-3cont_2comp_3ceph-ipv4-geneve-ir_osp_client/20/artifact/.sh/ReaR_Restore.log

Version-Release number of selected component (if applicable):
16.2

How reproducible:
100%


Steps to Reproduce:
1. Run 16.2 B&R job in rhos-ci-jenkins
https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/DFG/view/enterprise/view/backup_restore/job/DFG-enterprise-backup_restore-16.2_director-rhel-virthost-3cont_2comp_3ceph-ipv4-geneve-ir_osp_client/
2. The job fails during the ReaR Restore stage.

Actual results:
[heat-admin@controller-0 ~]$ sudo podman exec ceph-mon-controller-0 ceph -n client.admin -k /var/lib/ceph/ceph-authentication.bak -s
[errno 13] error connecting to the cluster
[heat-admin@controller-0 ~]$ 

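Expected results: The command authenticates with the Ceph cluster using the backed-up client.admin keyring and prints the cluster status.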


Additional info:

Comment 2 Juan Badia Payno 2021-10-11 18:08:42 UTC
The problem is that the ceph-authentication file is not being generated.
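Because the missing file only surfaces as an opaque "[errno 13] error connecting to the cluster", a pre-check can make the failure explicit. The following is a minimal, hypothetical sketch, not part of tripleo-ansible or the infrared restore playbook: the function name `check_ceph_auth_backup` is made up for illustration, and it assumes the check runs where `/var/lib/ceph/ceph-authentication.bak` is readable (for example on the controller host, assuming the same path is what the ceph-mon container sees). It does not fix the missing file; it only reports it clearly before `ceph -n client.admin -k ... -s` is attempted.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check (illustration only): verify that the keyring backup
# used by `ceph -n client.admin -k <file> -s` exists and contains an admin entry.
check_ceph_auth_backup() {
    # Default path taken from the failing command in this report.
    local keyring="${1:-/var/lib/ceph/ceph-authentication.bak}"

    # Root cause reported in this bug: the file was never generated.
    if [ ! -f "$keyring" ]; then
        echo "missing keyring backup: $keyring" >&2
        return 1
    fi

    # A zero-length file cannot authenticate anyone.
    if [ ! -s "$keyring" ]; then
        echo "keyring backup is empty: $keyring" >&2
        return 2
    fi

    # Ceph keyrings are INI-style; the admin key sits under a [client.admin] section.
    if ! grep -q '^\[client\.admin]' "$keyring"; then
        echo "no [client.admin] section in: $keyring" >&2
        return 3
    fi

    return 0
}
```

Usage, assuming the function has been sourced into a shell with sufficient privileges to read the file: `check_ceph_auth_backup && podman exec ceph-mon-controller-0 ceph -n client.admin -k /var/lib/ceph/ceph-authentication.bak -s`. Distinct return codes keep "file missing" (the case in this bug) separate from "file present but unusable".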

Comment 14 errata-xmlrpc 2021-12-09 20:41:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.1 (Train)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:5067