Bug 1791282

Summary: container_exec_cmd always pointing to first node instance in crush_rules tasks
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Giulio Fidente <gfidente>
Component: Ceph-Ansible Assignee: Guillaume Abrioux <gabrioux>
Status: CLOSED ERRATA QA Contact: Vasishta <vashastr>
Severity: high Docs Contact:
Priority: high    
Version: 4.0 CC: aschoen, ceph-eng-bugs, dsavinea, fpantano, gmeno, nthomas, tserlin, ykaul, yrabl
Target Milestone: rc   
Target Release: 4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ceph-ansible-4.0.10-1.el8cp, ceph-ansible-4.0.10-1.el7cp Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-01-31 12:48:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1642481, 1791283    
Attachments:
Description Flags
ceph-ansible run logs, inventory and vars none

Description Giulio Fidente 2020-01-15 12:37:07 UTC
Created attachment 1652429 [details]
ceph-ansible run logs, inventory and vars

The container exec command in the crush_rules tasks always refers to the first member of mon_group_name, so when the task is delegated to the other monitors, podman tries to exec into a container that only exists on the first one:

2020-01-15 11:14:29,543 p=523151 u=root |  TASK [ceph-osd : insert new default crush rule into daemon to prevent restart] ***
2020-01-15 11:14:29,543 p=523151 u=root |  task path: /usr/share/ceph-ansible/roles/ceph-osd/tasks/crush_rules.yml:39
2020-01-15 11:14:29,544 p=523151 u=root |  Wednesday 15 January 2020  11:14:29 +0000 (0:00:00.316)       0:08:29.409 ***** 
2020-01-15 11:14:30,451 p=523151 u=root |  ok: [ceph-0 -> 192.168.24.8] => (item=controller-0) => changed=false 
  ansible_loop_var: item
  cmd:
  - podman
  - exec
  - ceph-mon-controller-0
  - ceph
  - --admin-daemon
  - /var/run/ceph/ceph-mon.controller-0.asok
  - config
  - set
  - osd_pool_default_crush_rule
  - '1'
  delta: '0:00:00.466851'
  end: '2020-01-15 11:14:30.414346'
  item: controller-0
  rc: 0
  start: '2020-01-15 11:14:29.947495'
  stderr: ''
  stderr_lines: <omitted>
  stdout: |-
    {
        "success": ""
    }
  stdout_lines: <omitted>
2020-01-15 11:14:30,940 p=523151 u=root |  failed: [ceph-0 -> 192.168.24.10] (item=controller-1) => changed=false 
  ansible_loop_var: item
  cmd:
  - podman
  - exec
  - ceph-mon-controller-0
  - ceph
  - --admin-daemon
  - /var/run/ceph/ceph-mon.controller-1.asok
  - config
  - set
  - osd_pool_default_crush_rule
  - '1'
  delta: '0:00:00.081902'
  end: '2020-01-15 11:14:30.911267'
  item: controller-1
  msg: non-zero return code
  rc: 125
  start: '2020-01-15 11:14:30.829365'
  stderr: 'Error: unable to exec into ceph-mon-controller-0: no container with name or ID ceph-mon-controller-0 found: no such container'
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
2020-01-15 11:14:31,423 p=523151 u=root |  failed: [ceph-0 -> 192.168.24.30] (item=controller-2) => changed=false 
  ansible_loop_var: item
  cmd:
  - podman
  - exec
  - ceph-mon-controller-0
  - ceph
  - --admin-daemon
  - /var/run/ceph/ceph-mon.controller-2.asok
  - config
  - set
  - osd_pool_default_crush_rule
  - '1'
  delta: '0:00:00.075020'
  end: '2020-01-15 11:14:31.390868'
  item: controller-2
  msg: non-zero return code
  rc: 125
  start: '2020-01-15 11:14:31.315848'
  stderr: 'Error: unable to exec into ceph-mon-controller-0: no container with name or ID ceph-mon-controller-0 found: no such container'
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>

Attached are the logs, inventory, and vars from the failing run.
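
For illustration only (this is a minimal sketch, not the actual role code), the following shows the kind of task pattern involved; variable names such as container_binary, cluster and crush_rule_id are assumptions. The point is that the podman exec target has to be rendered from the delegated loop item rather than from the first monitor in mon_group_name:

# Failing pattern (sketch): container_exec_cmd is rendered once from the
# first monitor, so every delegated run execs into ceph-mon-controller-0.
- name: insert new default crush rule into daemon to prevent restart
  command: >
    {{ hostvars[groups[mon_group_name][0]]['container_exec_cmd'] }}
    ceph --admin-daemon /var/run/ceph/{{ cluster }}-mon.{{ hostvars[item]['ansible_hostname'] }}.asok
    config set osd_pool_default_crush_rule {{ crush_rule_id }}
  delegate_to: "{{ item }}"
  loop: "{{ groups[mon_group_name] }}"

# Sketch of the fix idea: derive the container name from the loop item so
# each delegated exec targets the ceph-mon container that actually runs on
# that monitor (e.g. ceph-mon-controller-1 on controller-1).
- name: insert new default crush rule into daemon to prevent restart
  command: >
    {{ container_binary }} exec ceph-mon-{{ hostvars[item]['ansible_hostname'] }}
    ceph --admin-daemon /var/run/ceph/{{ cluster }}-mon.{{ hostvars[item]['ansible_hostname'] }}.asok
    config set osd_pool_default_crush_rule {{ crush_rule_id }}
  delegate_to: "{{ item }}"
  loop: "{{ groups[mon_group_name] }}"

With the per-item container name, the delegated runs against controller-1 and controller-2 should no longer fail with "no such container" as in the log above.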

Comment 1 RHEL Program Management 2020-01-15 12:37:13 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 6 Yogev Rabl 2020-01-17 15:50:18 UTC
Verified with ceph-ansible-4.0.10-1.el8cp.noarch

Comment 9 errata-xmlrpc 2020-01-31 12:48:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0312