Description of problem:
Adding an MDS server to an existing cluster fails with the following error:

TASK [ceph-mon : create filesystem pools] **************************************
fatal: [magna096]: FAILED! => {"failed": true, "msg": "the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'osd_pool_default_pg_num' is undefined

The error appears to have been in '/usr/share/ceph-ansible/roles/ceph-mon/tasks/create_mds_filesystems.yml': line 6, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

# since those check are performed by the ceph-common role
- name: create filesystem pools
  ^ here
"}

Version-Release number of selected component (if applicable):
ceph-ansible-2.2.11-1.el7scon.noarch

How reproducible:
Always

Steps to Reproduce:
1. Create a cluster with ceph-ansible.
2. Once the cluster is up, run Ansible again to add an MDS server.

Actual results:
MDS pool creation fails with:
"The error was: 'osd_pool_default_pg_num' is undefined ... The error appears to have been in '/usr/share/ceph-ansible/roles/ceph-mon/tasks/create_mds_filesystems.yml'"

Additional info:
As a workaround, I added 'osd_pool_default_pg_num' to group_vars/all.yml in the "CONFIG OVERRIDE" section:

    ceph_conf_overrides:
      global:
        osd_pool_default_pg_num: 64

After this, Ansible successfully installed the MDS server.
Sebastien, what specific change upstream fixed this BZ?
Ken, this is fixed in https://github.com/ceph/ceph-ansible/commit/ea68fbaaaee38b1a39b1f093e0faf5f897a466b0
That commit is already in the version Shilpa was running above (ceph-ansible-2.2.11). See "git tag --contains ea68fbaaaee38b1a39b1f093e0faf5f897a466b0" What else do we need to fix this?
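For anyone unfamiliar with the check above: `git tag --contains <sha>` lists every tag whose history includes that commit, which is how you verify a fix landed in a given release. A self-contained illustration (the repo, commit message, and tag name below are made up for the demo; run the real check inside a ceph-ansible clone):

```shell
# Build a throwaway repo with one commit and one tag, then ask git
# which tags contain that commit.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "mon: fix undefined osd_pool_default_pg_num"
fix_sha=$(git rev-parse HEAD)
git tag v2.2.11
# Lists every tag reachable from a commit that contains $fix_sha:
git tag --contains "$fix_sha"
```

If the fix commit really is in ceph-ansible-2.2.11, the corresponding tag shows up in this listing, which is why the question "what else do we need?" is the right one to ask.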
@Ken, to be honest, I don't know; we don't see this error in the CI.

@Shilpa, could you please try again and let me know if you still see this issue? I tried to reproduce, without success. I first deployed an initial cluster with 3 mons and 3 osds, then added an MDS node and re-ran Ansible. Success.

As you can see, I successfully passed this task:

TASK [ceph-mon : create filesystem pools] **********************************************************************************************************************************************************************
task path: /home/jenkins-build/build/workspace/ceph-ansible/roles/ceph-mon/tasks/create_mds_filesystems.yml:6
ok: [mon2] => (item=cephfs_data) => {"changed": false, "cmd": ["ceph", "--cluster", "test", "osd", "pool", "create", "cephfs_data", "8"], "delta": "0:00:01.564608", "end": "2017-08-31 08:27:25.223125", "item": "cephfs_data", "rc": 0, "start": "2017-08-31 08:27:23.658517", "stderr": "pool 'cephfs_data' created", "stderr_lines": ["pool 'cephfs_data' created"], "stdout": "", "stdout_lines": []}
ok: [mon2] => (item=cephfs_metadata) => {"changed": false, "cmd": ["ceph", "--cluster", "test", "osd", "pool", "create", "cephfs_metadata", "8"], "delta": "0:00:01.035994", "end": "2017-08-31 08:27:29.731975", "item": "cephfs_metadata", "rc": 0, "start": "2017-08-31 08:27:28.695981", "stderr": "pool 'cephfs_metadata' created", "stderr_lines": ["pool 'cephfs_metadata' created"], "stdout": "", "stdout_lines": []}

See the final results:

jenkins-build@ceph-builders:~/build/workspace/ceph-ansible/tests/functional/centos/7/bluestore$ vagrant ssh mon0 -c "sudo ceph --cluster test -s"
  cluster:
    id:     5a51b9d9-b110-4a5f-b73c-b5dcf63552a1
    health: HEALTH_WARN
            no active mgr
  services:
    mon: 3 daemons, quorum ceph-mon0,ceph-mon1,ceph-mon2
    mgr: no daemons active
    mds: cephfs-1/1/1 up {0=ceph-mds0=up:active}
    osd: 1 osds: 1 up, 1 in
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs:
Connection to 192.168.121.133 closed.

jenkins-build@ceph-builders:~/build/workspace/ceph-ansible/tests/functional/centos/7/bluestore$ vagrant ssh mon0 -c "sudo ceph --cluster test fs ls"
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]
Connection to 192.168.121.133 closed.

I'm tempted to close this bug, but I'll wait for you to report back first. Thanks in advance.
Given that I haven't received any response and that I cannot reproduce this, I'm closing the bug. Feel free to re-open.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days