Bug 1546233 - ceph-ansible containerized deployment fails if the ceph user doesn't already exist on ceph servers
Summary: ceph-ansible containerized deployment fails if the ceph user doesn't already ...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 3.2
Assignee: Sébastien Han
QA Contact: Yogev Rabl
URL:
Whiteboard:
Depends On:
Blocks: 1553640
 
Reported: 2018-02-16 17:07 UTC by John Fulton
Modified: 2018-11-09 20:06 UTC (History)
13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-11-09 20:06:01 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github ceph ceph-ansible pull 2487 0 None closed create keys and pools for client nodes only on first node 2020-07-01 15:50:19 UTC

Description John Fulton 2018-02-16 17:07:51 UTC
While using ceph-ansible-3.0.14-1.el7cp.noarch with site-docker.yaml, my deployment failed with the error below because the ceph user didn't exist on the overcloud. 

Could ansible, when running in a containerized scenario, simply ensure that the ceph user exists on all of the nodes? 

2018-02-16 11:45:38,565 p=27957 u=mistral |  TASK [ceph-config : ensure /etc/ceph exists] ***********************************
2018-02-16 11:45:38,816 p=27957 u=mistral |  fatal: [192.168.1.251]: FAILED! => {"changed": false, "gid": 64045, "group": "64045", "mode": "0755", "msg": "chown failed: failed to look up user ceph", "owner": "64045", "path": "/etc/ceph", "secontext": "system_u:object_r:etc_t:s0", "size": 6, "state": "directory", "uid": 64045}
2018-02-16 11:45:38,850 p=27957 u=mistral |  fatal: [192.168.1.252]: FAILED! => {"changed": false, "gid": 64045, "group": "64045", "mode": "0755", "msg": "chown failed: failed to look up user ceph", "owner": "64045", "path": "/etc/ceph", "secontext": "system_u:object_r:etc_t:s0", "size": 6, "state": "directory", "uid": 64045}
2018-02-16 11:45:38,850 p=27957 u=mistral |  PLAY RECAP *********************************************************************
2018-02-16 11:45:38,851 p=27957 u=mistral |  192.168.1.251              : ok=23   changed=1    unreachable=0    failed=1   
2018-02-16 11:45:38,851 p=27957 u=mistral |  192.168.1.252              : ok=21   changed=1    unreachable=0    failed=1   
[root@hci-director mistral]#
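For reference, `chown` accepts raw numeric IDs even when no matching passwd entry exists, which is the approach the eventual fix takes (comment 8). A minimal sketch using 64045, the UID/GID the failing task above tried to apply (giving files away requires root):

```shell
# chown accepts numeric IDs directly, so ownership can be set without any
# "ceph" passwd entry existing. 64045 is the UID/GID from the failed task.
tmpfile=$(mktemp)
if [ "$(id -u)" -eq 0 ]; then
    chown 64045:64045 "$tmpfile"
    stat -c '%u:%g' "$tmpfile"    # 64045:64045 -- no name lookup involved
fi
rm -f "$tmpfile"
```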

Comment 3 John Fulton 2018-02-16 21:50:55 UTC
FYI: We're using a documentation workaround in OpenStack until this is resolved: 

 https://bugzilla.redhat.com/show_bug.cgi?id=1546371

Comment 4 Ken Dreyer (Red Hat) 2018-04-25 17:34:38 UTC
"chown failed: failed to look up user ceph"

This UID and GID should be statically defined on all RHEL 7 systems. The "setup" RPM includes this; see bug 1221043.

What is the output from "rpm -qv setup" on a system where this fails?
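A quick way to gather this on an affected node (a sketch assuming a RHEL 7-style host; the `rpm` check is skipped where rpm is unavailable):

```shell
# Confirm the installed setup package version, if rpm is present.
command -v rpm >/dev/null && rpm -qv setup

# NSS lookup for the "ceph" user; on the failing overcloud nodes this is
# expected to find nothing, which is what makes chown-by-name fail.
if getent passwd ceph >/dev/null 2>&1; then
    echo "ceph user defined"
else
    echo "ceph user missing"
fi
```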

Comment 8 John Fulton 2018-05-09 21:10:08 UTC
This issue came up because the username "ceph" was hard coded. As per the following commit: 

https://github.com/ceph/ceph-ansible/commit/18c0c7a508efc47382d42fafb9ce9cd01885c78f

It was changed to variables containing a numeric ID. These variables have reasonable defaults:

https://github.com/ceph/ceph-ansible/blob/65ba85aff66b434600e9dfec738d48c85d21c932/roles/ceph-defaults/tasks/facts.yml#L183

and you can set a file to be owned by a numeric ID regardless of whether that user is defined on the system [1]. Thus this issue should be solved even if the user doesn't exist.

As a patch that fixes this has merged, I'm setting the bug to POST.


[1] 
[fultonj@skagra ~]$ grep 1234 /etc/passwd
[fultonj@skagra ~]$ touch x
[fultonj@skagra ~]$ chown 1234:1234 x
chown: changing ownership of 'x': Operation not permitted
[fultonj@skagra ~]$ sudo chown 1234:1234 x
[fultonj@skagra ~]$ ls -l x
-rw-rw-r--. 1 1234 1234 0 May  9 14:00 x
[fultonj@skagra ~]$

