Description of problem:
Adding an MDS server to an existing cluster fails with the following error:

TASK [ceph-mon : create filesystem pools] **************************************
fatal: [magna096]: FAILED! => {"failed": true, "msg": "the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'osd_pool_default_pg_num' is undefined

The error appears to have been in '/usr/share/ceph-ansible/roles/ceph-mon/tasks/create_mds_filesystems.yml': line 6, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

# since those check are performed by the ceph-common role
- name: create filesystem pools
  ^ here
"}

Version-Release number of selected component (if applicable):
ceph-ansible-2.2.11-1.el7scon.noarch

How reproducible:
Always

Steps to Reproduce:
1. Create a cluster with ceph-ansible.
2. Once the cluster is up, run Ansible again to add an MDS server.

Actual results:
MDS pool creation fails with:
"The error was: 'osd_pool_default_pg_num' is undefined ... The error appears to have been in '/usr/share/ceph-ansible/roles/ceph-mon/tasks/create_mds_filesystems.yml'"

Additional info:
As a workaround, I added 'osd_pool_default_pg_num' to group_vars/all.yml in the "CONFIG OVERRIDE" section:

    ceph_conf_overrides:
      global:
        osd_pool_default_pg_num: 64

After this, Ansible successfully installed the MDS server.
Sebastien, what specific change upstream fixed this BZ?
Ken, this is fixed in https://github.com/ceph/ceph-ansible/commit/ea68fbaaaee38b1a39b1f093e0faf5f897a466b0
That commit is already in the version Shilpa was running above (ceph-ansible-2.2.11). See "git tag --contains ea68fbaaaee38b1a39b1f093e0faf5f897a466b0" What else do we need to fix this?
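For anyone unfamiliar with the check above: `git tag --contains <sha>` lists every tag whose history includes that commit, which is how you verify a fix landed in a given release. A self-contained illustration (the repo, commit message, and tag name below are made up for the demo; run the real check inside a ceph-ansible clone):

```shell
# Build a throwaway repo with one commit and one tag, then ask git
# which tags contain that commit.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "mon: fix undefined osd_pool_default_pg_num"
fix_sha=$(git rev-parse HEAD)
git tag v2.2.11
# Lists every tag reachable from a commit that contains $fix_sha:
git tag --contains "$fix_sha"
```

If the fix commit really is in ceph-ansible-2.2.11, the corresponding tag shows up in this listing, which is why the question "what else do we need?" is the right one to ask.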
@Ken, to be honest, I don't know; we don't see this error in the CI.

@Shilpa, could you please try again and let me know if you still see this issue? I tried to reproduce, without success. I first deployed an initial cluster with 3 mons and 3 osds, then added an MDS node and re-ran Ansible. Success.

As you can see, I successfully passed this task:

TASK [ceph-mon : create filesystem pools] **********************************************************************************************************************************************************************
task path: /home/jenkins-build/build/workspace/ceph-ansible/roles/ceph-mon/tasks/create_mds_filesystems.yml:6
ok: [mon2] => (item=cephfs_data) => {"changed": false, "cmd": ["ceph", "--cluster", "test", "osd", "pool", "create", "cephfs_data", "8"], "delta": "0:00:01.564608", "end": "2017-08-31 08:27:25.223125", "item": "cephfs_data", "rc": 0, "start": "2017-08-31 08:27:23.658517", "stderr": "pool 'cephfs_data' created", "stderr_lines": ["pool 'cephfs_data' created"], "stdout": "", "stdout_lines": []}
ok: [mon2] => (item=cephfs_metadata) => {"changed": false, "cmd": ["ceph", "--cluster", "test", "osd", "pool", "create", "cephfs_metadata", "8"], "delta": "0:00:01.035994", "end": "2017-08-31 08:27:29.731975", "item": "cephfs_metadata", "rc": 0, "start": "2017-08-31 08:27:28.695981", "stderr": "pool 'cephfs_metadata' created", "stderr_lines": ["pool 'cephfs_metadata' created"], "stdout": "", "stdout_lines": []}

See the final results:

jenkins-build@ceph-builders:~/build/workspace/ceph-ansible/tests/functional/centos/7/bluestore$ vagrant ssh mon0 -c "sudo ceph --cluster test -s"
  cluster:
    id:     5a51b9d9-b110-4a5f-b73c-b5dcf63552a1
    health: HEALTH_WARN
            no active mgr
  services:
    mon: 3 daemons, quorum ceph-mon0,ceph-mon1,ceph-mon2
    mgr: no daemons active
    mds: cephfs-1/1/1 up {0=ceph-mds0=up:active}
    osd: 1 osds: 1 up, 1 in
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs:
Connection to 192.168.121.133 closed.

jenkins-build@ceph-builders:~/build/workspace/ceph-ansible/tests/functional/centos/7/bluestore$ vagrant ssh mon0 -c "sudo ceph --cluster test fs ls"
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]
Connection to 192.168.121.133 closed.

I'm tempted to close this bug, but I'll wait for you to report back first. Thanks in advance.
Given that I haven't received any response and that I cannot reproduce this, I'm closing the bug. Feel free to re-open.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days