Bug 1644713

Summary: Ceph-ansible fails in handler with missing data for ceph_osd_container_stat
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Tim Rozet <trozet>
Component: Ceph-Ansible
Assignee: Sébastien Han <shan>
Status: CLOSED CURRENTRELEASE
QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Severity: high
Priority: unspecified
Version: 3.1
CC: aschoen, ceph-eng-bugs, gfidente, gmeno, jluhrsen, nthomas, sankarshan
Target Milestone: rc   
Target Release: 3.*   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2018-11-22 11:08:08 UTC Type: Bug
Bug Blocks: 1578730    
Attachments:
Description      Flags
ceph log         none
ceph hieradata   none

Description Tim Rozet 2018-10-31 13:08:29 UTC
Description of problem:
Deployment fails during the ceph-ansible run. Only the Controller and Compute roles are used, deployed to 5 nodes (3 controllers, 2 computes). All nodes run the ceph OSD docker service. The ceph-ansible run fails with:


The error was: error while evaluating conditional (hostvars[item]['ceph_osd_container_stat'].get('rc') == 0): 'dict object' has no attribute 'ceph_osd_container_stat'
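
The message indicates that the handler indexed ceph_osd_container_stat in a host's hostvars and the variable was not set for that host. A minimal Python sketch of that lookup follows, modelling hostvars as plain dicts. The host names and values are hypothetical, and this is not ceph-ansible code. In Python the strict lookup raises KeyError, where Ansible reports "'dict object' has no attribute". Treating a missing fact as "container not running" is an assumption made here for illustration only.

```python
# Simplified model: hostvars is a plain dict of per-host variable dicts.
# Host names and values below are hypothetical.
hostvars = {
    "controller-0": {"ceph_osd_container_stat": {"rc": 0}},
    "compute-0": {},  # the variable was never registered on this host
}


def osd_container_running_strict(host):
    # Mirrors the shape of the failing conditional: indexing a missing
    # key raises instead of evaluating to False.
    return hostvars[host]["ceph_osd_container_stat"].get("rc") == 0


def osd_container_running_tolerant(host):
    # Assumption: a host without the variable counts as "not running".
    return hostvars[host].get("ceph_osd_container_stat", {}).get("rc") == 0
```

With the tolerant form, a host that never registered the variable evaluates to False rather than aborting the whole handler.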

Version-Release number of selected component (if applicable):
ceph-ansible-3.1.9-1.el7.noarch

How reproducible:
Seems to happen about 50% of the time.

Additional info:
The corresponding code that is failing is here:
https://github.com/ceph/ceph-ansible/blob/60bc1e38db0e797ad6553584927f86486ae09c19/roles/ceph-handler/handlers/main.yml#L109

Comment 1 Tim Rozet 2018-10-31 13:10:24 UTC
Created attachment 1499379: ceph log

Comment 2 Tim Rozet 2018-10-31 13:18:14 UTC
Created attachment 1499383: ceph hieradata

Comment 4 Tim Rozet 2018-10-31 15:46:26 UTC
Preliminary testing shows that the error does not happen with ceph-ansible-3.1.6. Running more tests to confirm it.

Comment 5 Tim Rozet 2018-11-09 16:37:37 UTC
Confirmed: the problem does not happen with ceph-ansible-3.1.6.