Bug 1461367 - Addition of mds node to an existing cluster fails [NEEDINFO]
Status: CLOSED WORKSFORME
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Ceph-Ansible
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: medium
Target Milestone: rc
Target Release: 3.0
Assigned To: leseb
QA Contact: ceph-qe-bugs
Docs Contact: Erin Donnelly
Depends On:
Blocks: 1437916
Reported: 2017-06-14 06:05 EDT by shilpa
Modified: 2017-09-15 09:10 EDT
CC List: 14 users

See Also:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
.Adding an MDS to an existing cluster fails

Adding a Ceph Metadata Server (MDS) to an existing cluster fails with the error:

----
osd_pool_default_pg_num is undefined

The error appears to have been in '/usr/share/ceph-ansible/roles/ceph-mon/tasks/create_mds_filesystems.yml
----

As a consequence, an attempt to create an MDS pool fails. To work around this issue, add the `osd_pool_default_pg_num` parameter to `ceph_conf_overrides` in the `/usr/share/ceph-ansible/group_vars/all.yml` file, for example:

----
ceph_conf_overrides:
  global:
    osd_pool_default_pg_num: 64
----
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-09-15 09:10:54 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
seb: needinfo? (smanjara)


Attachments: None
Description shilpa 2017-06-14 06:05:58 EDT
Description of problem:
Addition of an MDS server to an existing cluster fails with the following error:

TASK [ceph-mon : create filesystem pools] **************************************
fatal: [magna096]: FAILED! => {"failed": true, "msg": "the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'osd_pool_default_pg_num' is undefined\n\nThe error appears to have been in '/usr/share/ceph-ansible/roles/ceph-mon/tasks/create_mds_filesystems.yml': line 6, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n# since those check are performed by the ceph-common role\n- name: create filesystem pools\n  ^ here\n"}
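For context, the task named in the traceback creates the CephFS data and metadata pools and interpolates osd_pool_default_pg_num directly, so the play fails as soon as that variable cannot be resolved on the monitor. A paraphrased sketch of the task's shape (not the exact upstream code, which may differ between ceph-ansible versions):

# sketch of roles/ceph-mon/tasks/create_mds_filesystems.yml, paraphrased
- name: create filesystem pools
  command: "ceph --cluster {{ cluster }} osd pool create {{ item }} {{ osd_pool_default_pg_num }}"
  changed_when: false
  with_items:
    - "{{ cephfs_data }}"
    - "{{ cephfs_metadata }}"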


Version-Release number of selected component (if applicable):
ceph-ansible-2.2.11-1.el7scon.noarch

How reproducible:
Always

Steps to Reproduce:
1. Create a cluster first with ceph-ansible.
2. Once the cluster is up, run Ansible again to add an MDS server (a minimal sketch of this flow follows below).
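A minimal sketch of that flow, assuming a stock /usr/share/ceph-ansible install, a site.yml copied from the shipped sample, an inventory at /etc/ansible/hosts, and the standard mdss inventory group (the host name is a placeholder):

# 1. Deploy the initial cluster (inventory contains only [mons] and [osds])
cd /usr/share/ceph-ansible
ansible-playbook -i /etc/ansible/hosts site.yml

# 2. Add the new MDS host to the inventory, e.g.
#      [mdss]
#      new-mds-host
#    then re-run the same playbook against the updated inventory
ansible-playbook -i /etc/ansible/hosts site.yml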


Actual results:
MDS pool creation fails.

"The error was: 'osd_pool_default_pg_num' is undefined\n\nThe error appears to have been in '/usr/share/ceph-ansible/roles/ceph-mon/tasks/create_mds_filesystems.yml"


Additional info:

As a workaround, I added 'osd_pool_default_pg_num' to group_vars/all.yml in the "CONFIG OVERRIDE" section:

ceph_conf_overrides:
  global:
    osd_pool_default_pg_num: 64

After this, Ansible successfully installed the MDS server.
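For anyone hitting the same thing, one way to confirm the workaround took effect after the re-run (pool names assume the ceph-ansible defaults):

# the filesystem and its pools should now exist
ceph fs ls
# the pools should report the overridden PG count (64 in this case)
ceph osd pool get cephfs_data pg_num
ceph osd pool get cephfs_metadata pg_num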
Comment 10 Ken Dreyer (Red Hat) 2017-07-06 07:15:18 EDT
Sebastien, what specific change upstream fixed this BZ?
Comment 12 Ken Dreyer (Red Hat) 2017-07-31 12:45:27 EDT
That commit is already in the version Shilpa was running above (ceph-ansible-2.2.11). See "git tag --contains ea68fbaaaee38b1a39b1f093e0faf5f897a466b0"
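For anyone repeating that check: git tag --contains lists every tag whose history includes the given commit, so the fix is present in any release tag it prints. Roughly, from an upstream ceph-ansible clone:

git clone https://github.com/ceph/ceph-ansible.git
cd ceph-ansible
git tag --contains ea68fbaaaee38b1a39b1f093e0faf5f897a466b0   # lists the tags (releases) containing the commit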

What else do we need to fix this?
Comment 13 seb 2017-08-31 04:35:51 EDT
@Ken, to be honest, I don't know; we don't see this error in the CI.
@Shilpa, could you please try again and let me know if you still see this issue?

I tried to reproduce this, without success. I first deployed an initial cluster with 3 mons and 3 osds, then added an MDS node and re-ran Ansible. Success.

As you can see, I successfully passed this task:

TASK [ceph-mon : create filesystem pools] **********************************************************************************************************************************************************************
task path: /home/jenkins-build/build/workspace/ceph-ansible/roles/ceph-mon/tasks/create_mds_filesystems.yml:6
ok: [mon2] => (item=cephfs_data) => {"changed": false, "cmd": ["ceph", "--cluster", "test", "osd", "pool", "create", "cephfs_data", "8"], "delta": "0:00:01.564608", "end": "2017-08-31 08:27:25.223125", "item": "cephfs_data", "rc": 0, "start": "2017-08-31 08:27:23.658517", "stderr": "pool 'cephfs_data' created", "stderr_lines": ["pool 'cephfs_data' created"], "stdout": "", "stdout_lines": []}
ok: [mon2] => (item=cephfs_metadata) => {"changed": false, "cmd": ["ceph", "--cluster", "test", "osd", "pool", "create", "cephfs_metadata", "8"], "delta": "0:00:01.035994", "end": "2017-08-31 08:27:29.731975", "item": "cephfs_metadata", "rc": 0, "start": "2017-08-31 08:27:28.695981", "stderr": "pool 'cephfs_metadata' created", "stderr_lines": ["pool 'cephfs_metadata' created"], "stdout": "", "stdout_lines": []}


See the final results:

jenkins-build@ceph-builders:~/build/workspace/ceph-ansible/tests/functional/centos/7/bluestore$ vagrant ssh mon0 -c "sudo ceph --cluster test -s"
  cluster:
    id:     5a51b9d9-b110-4a5f-b73c-b5dcf63552a1
    health: HEALTH_WARN
            no active mgr

  services:
    mon: 3 daemons, quorum ceph-mon0,ceph-mon1,ceph-mon2
    mgr: no daemons active
    mds: cephfs-1/1/1 up  {0=ceph-mds0=up:active}
    osd: 1 osds: 1 up, 1 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs:

Connection to 192.168.121.133 closed.
jenkins-build@ceph-builders:~/build/workspace/ceph-ansible/tests/functional/centos/7/bluestore$ vagrant ssh mon0 -c "sudo ceph --cluster test fs ls"
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]
Connection to 192.168.121.133 closed.


I'm tempted to close this bug, but I'll wait for you to report back first.
Thanks in advance.
Comment 14 seb 2017-09-15 09:10:54 EDT
Given that I haven't received any response and cannot reproduce this, I'm closing it. Feel free to re-open.
