
Bug 1792320

Summary: "ceph-handler : unset noup flag attempts to use container not on host
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: John Fulton <johfulto>
Component: Ceph-Ansible
Assignee: Guillaume Abrioux <gabrioux>
Status: CLOSED ERRATA
QA Contact: Vasishta <vashastr>
Severity: urgent
Docs Contact: Bara Ancincova <bancinco>
Priority: medium
Version: 4.1
CC: agunn, amsyedha, aschoen, bancinco, ceph-eng-bugs, edonnell, flucifre, fpantano, gabrioux, gcharot, gfidente, gmeno, jbrier, jvisser, kdreyer, knortema, nthomas, nweinber, pasik, pgrist, sunnagar, tchandra, tserlin, vashastr, ykaul, yobshans, yrabl
Target Milestone: rc
Keywords: Regression
Target Release: 4.1
Flags: knortema: needinfo+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ceph-ansible-4.0.13-1.el8cp, ceph-ansible-4.0.13-1.el7cp
Doc Type: Bug Fix
Doc Text:
.{storage-product} installation on Red Hat OpenStack Platform no longer fails

Previously, the `ceph-ansible` utility became unresponsive when attempting to install {storage-product} with Red Hat OpenStack Platform 16, and it returned an error similar to the following:

----
Error: unable to exec into ceph-mon-dcn1-computehci1-2: no container with name or ID ceph-mon-dcn1-computehci1-2 found: no such container
----

This occurred because `ceph-ansible` read the value of the fact `container_exec_cmd` from the wrong node in `handler_osds.yml`. With this update, `ceph-ansible` reads the value of `container_exec_cmd` from the correct node, and the installation proceeds successfully.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-05-19 17:32:06 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1730176, 1760354, 1816167    
Attachments:
Description: ceph-ansible playbook log, inventory and vars - from failed QA run
Flags: none

Description John Fulton 2020-01-17 14:15:04 UTC
ceph-ansible 4.0.10 encounters the following error [1]

'Error: unable to exec into ceph-mon-dcn1-computehci1-2: no container with name or ID ceph-mon-dcn1-computehci1-2 found: no such container'

on this task:

https://github.com/ceph/ceph-ansible/blob/v4.0.10/roles/ceph-handler/tasks/handler_osds.yml#L8
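For context, this handler toggles the cluster-wide noup flag around OSD restarts; the flag keeps restarted OSDs from being marked up before the restart sequence completes. Outside any container, the equivalent standard ceph CLI calls are:

  ceph --cluster ceph osd set noup     # before restarting OSDs
  ceph --cluster ceph osd unset noup   # after all OSDs have been restarted

In a containerized deployment the same command is prefixed with the "podman exec <mon-container>" string carried in the container_exec_cmd fact, which is where this bug lives.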

This can be avoided by updating the task from OLD to NEW as follows.

OLD:

- name: unset noup flag
  command: "{{ container_exec_cmd | default('') }} ceph --cluster {{ cluster }} osd unset noup"
  delegate_to: "{{ groups[mon_group_name][0] }}"
  changed_when: False

NEW:

- name: unset noup flag
  command: "{{ hostvars[groups[mon_group_name][0]]['container_exec_cmd'] | default('') }} ceph --cluster {{ cluster }} osd unset noup"
  delegate_to: "{{ groups[mon_group_name][0] }}"
  changed_when: False
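The underlying issue is an Ansible templating rule: delegate_to changes where the command executes, but bare variable references are still resolved against the current inventory host's facts. On an OSD host, container_exec_cmd therefore expands to a mon container name that exists (if anywhere) on that host, while the exec runs on the delegated first mon, which has a differently named container. A minimal illustration of the difference (hypothetical debug task, not part of ceph-ansible):

- name: show which host's fact is templated under delegation
  debug:
    msg:
      - "current inventory host's fact: {{ container_exec_cmd | default('unset') }}"
      - "first mon's fact: {{ hostvars[groups[mon_group_name][0]]['container_exec_cmd'] | default('unset') }}"
  delegate_to: "{{ groups[mon_group_name][0] }}"

Both messages are rendered for a task that runs on the first mon, but only the hostvars lookup is guaranteed to name a container that exists there, which is why NEW pins the fact to the delegated node.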

[1]
2020-01-16 20:07:41,091 p=566779 u=root |  RUNNING HANDLER [ceph-handler : unset noup flag] *******************************
2020-01-16 20:07:41,092 p=566779 u=root |  task path: /usr/share/ceph-ansible/roles/ceph-handler/tasks/handler_osds.yml:6
2020-01-16 20:07:41,092 p=566779 u=root |  Thursday 16 January 2020  20:07:41 +0000 (0:00:00.204)       0:04:04.329 ****** 
2020-01-16 20:07:42,114 p=566779 u=root |  fatal: [dcn1-computehci1-2 -> 192.168.34.79]: FAILED! => changed=false 
  cmd:
  - podman
  - exec
  - ceph-mon-dcn1-computehci1-2
  - ceph
  - --cluster
  - ceph
  - osd
  - unset
  - noup
  delta: '0:00:00.076633'
  end: '2020-01-16 20:07:42.036222'
  msg: non-zero return code
  rc: 125
  start: '2020-01-16 20:07:41.959589'
  stderr: 'Error: unable to exec into ceph-mon-dcn1-computehci1-2: no container with name or ID ceph-mon-dcn1-computehci1-2 found: no such container'
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
2020-01-16 20:07:42,351 p=566779 u=root |  fatal: [dcn1-computehci1-1 -> 192.168.34.79]: FAILED! => changed=false 
  cmd:
  - podman
  - exec
  - ceph-mon-dcn1-computehci1-1
  - ceph
  - --cluster
  - ceph
  - osd
  - unset
  - noup
  delta: '0:00:00.072984'
  end: '2020-01-16 20:07:42.274307'
  msg: non-zero return code
  rc: 125
  start: '2020-01-16 20:07:42.201323'
  stderr: 'Error: unable to exec into ceph-mon-dcn1-computehci1-1: no container with name or ID ceph-mon-dcn1-computehci1-1 found: no such container'
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
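Note that rc 125 is podman's own error code (the error is with podman itself, here a missing exec target); the ceph command inside the container never ran. A manual check on the delegated host would show which mon container actually exists there (hypothetical session; presumably the mon container is named after that host rather than after the OSD node):

  [root@mon-host ~]# podman ps --format "{{.Names}}" | grep ceph-mon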

Comment 1 RHEL Program Management 2020-01-17 14:15:12 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 3 Guillaume Abrioux 2020-01-22 16:59:56 UTC
*** Bug 1794111 has been marked as a duplicate of this bug. ***

Comment 15 Yuri Obshansky 2020-01-24 23:59:40 UTC
Still failed with error:
        "fatal: [dcn1-computehci1-2 -> 192.168.34.63]: FAILED! => changed=false ",
        "  delta: '0:00:00.103441'",
        "  end: '2020-01-24 23:36:41.139930'",
        "      _raw_params: podman exec ceph-mon-dcn1-computehci1-2 ceph --cluster ceph osd unset noup",
        "  start: '2020-01-24 23:36:41.036489'",
        "  stderr: 'Error: no container with name or ID ceph-mon-dcn1-computehci1-2 found: no such container'",

while
[root@dcn1-computehci1-2 ~]# podman ps |grep ceph-mon
da807e827ac6  site-undercloud-0.ctlplane.localdomain:8787/ceph/rhceph-4.0-rhel8:latest           13 minutes ago  Up 13 minutes ago         ceph-mon-dcn1-computehci1-2

Local ceph-ansible contains the fix:
(undercloud) [stack@site-undercloud-0 ~]$ sudo less /usr/share/ceph-ansible/roles/ceph-handler/tasks/handler_osds.yml

- name: unset noup flag
  command: "{{ hostvars[groups[mon_group_name][0]]['container_exec_cmd'] | default('') }} ceph --cluster {{ cluster }} osd unset noup"
  delegate_to: "{{ groups[mon_group_name][0] }}"
  run_once: true
  changed_when: False

(undercloud) [stack@site-undercloud-0 ~]$ rpm -qa | grep ceph
puppet-ceph-3.0.1-0.20191002213425.55a0f94.el8ost.noarch
ceph-ansible-4.0.12-1.el8cp.noarch
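When a failure like the one above persists even with the hostvars lookup in place, a useful cross-check is to list the mon containers on every mon host and compare their names against the name embedded in the failing "podman exec" line (hypothetical ad-hoc command; assumes the inventory group is named mons):

  ansible mons -i <inventory> -b -m shell -a "podman ps | grep ceph-mon"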

Comment 18 Giulio Fidente 2020-01-27 13:28:42 UTC
Created attachment 1655676 [details]
ceph-ansible playbook log, inventory and vars - from failed QA run

Comment 20 Federico Lucifredi 2020-01-27 20:17:14 UTC
Paul or John: is this a blocker for OSP integration?

Comment 43 errata-xmlrpc 2020-05-19 17:32:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:2231