Bug 1792320 - ceph-handler : unset noup flag attempts to use container not on host
Summary: ceph-handler : unset noup flag attempts to use container not on host
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Ceph-Ansible
Version: 4.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: urgent
Target Milestone: rc
Target Release: 4.1
Assignee: Guillaume Abrioux
QA Contact: Vasishta
Docs Contact: Bara Ancincova
URL:
Whiteboard:
Duplicates: 1794111 (view as bug list)
Depends On:
Blocks: 1730176 1760354 1816167
 
Reported: 2020-01-17 14:15 UTC by John Fulton
Modified: 2020-07-09 09:58 UTC (History)
27 users (show)

Fixed In Version: ceph-ansible-4.0.13-1.el8cp, ceph-ansible-4.0.13-1.el7cp
Doc Type: Bug Fix
Doc Text:
.{storage-product} installation on Red Hat OpenStack Platform no longer fails

Previously, the `ceph-ansible` utility became unresponsive when attempting to install {storage-product} with Red Hat OpenStack Platform 16, and it returned an error similar to the following:

----
'Error: unable to exec into ceph-mon-dcn1-computehci1-2: no container with name or ID ceph-mon-dcn1-computehci1-2 found: no such container'
----

This occurred because `ceph-ansible` read the value of the fact `container_exec_cmd` from the wrong node in `handler_osds.yml`. With this update, `ceph-ansible` reads the value of `container_exec_cmd` from the correct node, and the installation proceeds successfully.
Clone Of:
Environment:
Last Closed: 2020-05-19 17:32:06 UTC
Target Upstream Version:
knortema: needinfo+


Attachments (Terms of Use)
ceph-ansible playbook log, inventory and vars - from failed QA run (4.14 MB, application/x-tar)
2020-01-27 13:28 UTC, Giulio Fidente
no flags Details


Links
System ID Priority Status Summary Last Updated
Github ceph ceph-ansible pull 4980 None closed handler: read container_exec_cmd value from first mon 2020-09-30 12:41:40 UTC
Github ceph ceph-ansible pull 4988/commits/c3c3b6d3e2f5c934ed93ac14d15394bb4a7c37c6 None None None 2020-09-30 12:41:40 UTC
Red Hat Product Errata RHSA-2020:2231 None None None 2020-05-19 17:32:51 UTC

Description John Fulton 2020-01-17 14:15:04 UTC
ceph-ansible 4.0.10 encounters the following error [1]:

'Error: unable to exec into ceph-mon-dcn1-computehci1-2: no container with name or ID ceph-mon-dcn1-computehci1-2 found: no such container'

on this task:

https://github.com/ceph/ceph-ansible/blob/v4.0.10/roles/ceph-handler/tasks/handler_osds.yml#L8

This can be avoided by updating the task from OLD to NEW as follows.

OLD:

- name: unset noup flag
  command: "{{ container_exec_cmd | default('') }} ceph --cluster {{ cluster }} osd unset noup"
  delegate_to: "{{ groups[mon_group_name][0] }}"
  changed_when: False

NEW:

- name: unset noup flag
  command: "{{ hostvars[groups[mon_group_name][0]]['container_exec_cmd'] | default('') }} ceph --cluster {{ cluster }} osd unset noup"
  delegate_to: "{{ groups[mon_group_name][0] }}"
  changed_when: False
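
The underlying pitfall is that `delegate_to` only changes where the command *runs*; the Jinja2 template is still rendered with the current (OSD) host's variables, so `container_exec_cmd` expands to a mon container name that only exists on the OSD host's idea of the world. The explicit `hostvars` lookup forces resolution against the delegated mon node. A minimal, hypothetical play illustrating the difference (group name `mons` and the command are illustrative, not the actual ceph-ansible code):

- hosts: osds
  tasks:
    # BROKEN: runs on the first mon, but {{ container_exec_cmd }} is
    # templated with the OSD host's value, e.g.
    # "podman exec ceph-mon-<osd-hostname>", which names a container
    # that does not exist on the mon.
    - name: unset noup flag (fact resolved on the wrong host)
      command: "{{ container_exec_cmd | default('') }} ceph osd unset noup"
      delegate_to: "{{ groups['mons'][0] }}"

    # FIXED: look the fact up explicitly in the delegated host's
    # hostvars, so the container name matches a container that
    # actually runs on that mon.
    - name: unset noup flag (fact resolved on the delegated host)
      command: "{{ hostvars[groups['mons'][0]]['container_exec_cmd'] | default('') }} ceph osd unset noup"
      delegate_to: "{{ groups['mons'][0] }}"
      run_once: true

Note that `delegate_facts` is unrelated here: it controls where *gathered* facts are stored, not which host's variables are used to template a delegated task.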

[1]
2020-01-16 20:07:41,091 p=566779 u=root |  RUNNING HANDLER [ceph-handler : unset noup flag] *******************************
2020-01-16 20:07:41,092 p=566779 u=root |  task path: /usr/share/ceph-ansible/roles/ceph-handler/tasks/handler_osds.yml:6
2020-01-16 20:07:41,092 p=566779 u=root |  Thursday 16 January 2020  20:07:41 +0000 (0:00:00.204)       0:04:04.329 ****** 
2020-01-16 20:07:42,114 p=566779 u=root |  fatal: [dcn1-computehci1-2 -> 192.168.34.79]: FAILED! => changed=false 
  cmd:
  - podman
  - exec
  - ceph-mon-dcn1-computehci1-2
  - ceph
  - --cluster
  - ceph
  - osd
  - unset
  - noup
  delta: '0:00:00.076633'
  end: '2020-01-16 20:07:42.036222'
  msg: non-zero return code
  rc: 125
  start: '2020-01-16 20:07:41.959589'
  stderr: 'Error: unable to exec into ceph-mon-dcn1-computehci1-2: no container with name or ID ceph-mon-dcn1-computehci1-2 found: no such container'
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
2020-01-16 20:07:42,351 p=566779 u=root |  fatal: [dcn1-computehci1-1 -> 192.168.34.79]: FAILED! => changed=false 
  cmd:
  - podman
  - exec
  - ceph-mon-dcn1-computehci1-1
  - ceph
  - --cluster
  - ceph
  - osd
  - unset
  - noup
  delta: '0:00:00.072984'
  end: '2020-01-16 20:07:42.274307'
  msg: non-zero return code
  rc: 125
  start: '2020-01-16 20:07:42.201323'
  stderr: 'Error: unable to exec into ceph-mon-dcn1-computehci1-1: no container with name or ID ceph-mon-dcn1-computehci1-1 found: no such container'
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>

Comment 1 RHEL Program Management 2020-01-17 14:15:12 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 3 Guillaume Abrioux 2020-01-22 16:59:56 UTC
*** Bug 1794111 has been marked as a duplicate of this bug. ***

Comment 15 Yuri Obshansky 2020-01-24 23:59:40 UTC
Still failed with error:
        "fatal: [dcn1-computehci1-2 -> 192.168.34.63]: FAILED! => changed=false ",
        "  delta: '0:00:00.103441'",
        "  end: '2020-01-24 23:36:41.139930'",
        "      _raw_params: podman exec ceph-mon-dcn1-computehci1-2 ceph --cluster ceph osd unset noup",
        "  start: '2020-01-24 23:36:41.036489'",
        "  stderr: 'Error: no container with name or ID ceph-mon-dcn1-computehci1-2 found: no such container'",

while
[root@dcn1-computehci1-2 ~]# podman ps |grep ceph-mon
da807e827ac6  site-undercloud-0.ctlplane.localdomain:8787/ceph/rhceph-4.0-rhel8:latest           13 minutes ago  Up 13 minutes ago         ceph-mon-dcn1-computehci1-2

The local ceph-ansible contains the fix:
(undercloud) [stack@site-undercloud-0 ~]$ sudo less /usr/share/ceph-ansible/roles/ceph-handler/tasks/handler_osds.yml

- name: unset noup flag
  command: "{{ hostvars[groups[mon_group_name][0]]['container_exec_cmd'] | default('') }} ceph --cluster {{ cluster }} osd unset noup"
  delegate_to: "{{ groups[mon_group_name][0] }}"
  run_once: true
  changed_when: False

(undercloud) [stack@site-undercloud-0 ~]$ rpm -qa | grep ceph
puppet-ceph-3.0.1-0.20191002213425.55a0f94.el8ost.noarch
ceph-ansible-4.0.12-1.el8cp.noarch

Comment 18 Giulio Fidente 2020-01-27 13:28:42 UTC
Created attachment 1655676 [details]
ceph-ansible playbook log, inventory and vars - from failed QA run

Comment 20 Federico Lucifredi 2020-01-27 20:17:14 UTC
Paul or John: is this a blocker for OSP integration?

Comment 43 errata-xmlrpc 2020-05-19 17:32:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:2231

