Issuing a command to compact the monitor data store during a rolling upgrade renders the Ceph monitors unresponsive. To avoid this behaviour, the data store compaction command is skipped during a rolling upgrade. As a result, the Ceph monitors remain responsive.
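(The compaction referred to here is the Ceph monitor store compaction, normally triggered with ceph tell mon.<id> compact; the playbook task involved is quoted in the report below.)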
Description of problem:
=======================
When the cluster has more than one monitor, the rolling update hangs in the task 'compress the store as much as possible' on the second monitor node.
Version-Release number of selected component (if applicable):
==============================================================
update from 10.2.2-38.el7cp.x86_64 to 10.2.2-39.el7cp.x86_64
How reproducible:
=================
always
Steps to Reproduce:
===================
1. Create a cluster via ceph-ansible with 3 MON, 3 OSD, and 1 RGW nodes (10.2.2-38.el7cp.x86_64):
[root@magna044 ceph-ansible]# cat /etc/ansible/hosts
[mons]
magna078
magna084
magna085
[osds]
magna090
magna091
magna085
[rgws]
magna094
2. Create a repo file on all nodes that points to the 10.2.2-39.el7cp.x86_64 bits.
3. Change the value of 'serial:' to adjust the number of servers to be updated at a time (see the sketch after these steps).
4. Use rolling_update.yml to update all nodes.
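For reference, the relevant part of rolling_update.yml looks roughly like the sketch below. The task name is taken from the playbook output in this report; the play options and the exact compaction command are assumptions, not a copy of the shipped playbook.

- hosts: mons
  serial: 1   # step 3: number of monitor nodes updated at a time
  tasks:
    # assumed form of the step that hangs; it asks the local monitor to
    # compact its store before the packages are upgraded
    - name: compress the store as much as possible
      command: ceph tell mon.{{ ansible_hostname }} compact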
Actual results:
===============
[root@magna044 ceph-ansible]# ansible-playbook rolling_update.yml
Are you sure you want to upgrade the cluster? [no]: yes
PLAY [confirm whether user really meant to upgrade the cluster] ***************
GATHERING FACTS ***************************************************************
ok: [localhost]
TASK: [exit playbook, if user did not mean to upgrade cluster] ****************
skipping: [localhost]
PLAY [mons;osds;mdss;rgws] ****************************************************
GATHERING FACTS ***************************************************************
ok: [magna084]
ok: [magna078]
ok: [magna085]
ok: [magna091]
ok: [magna090]
ok: [magna094]
TASK: [debug msg="gather facts on all Ceph hosts for following reference"] ****
ok: [magna078] => {
"msg": "gather facts on all Ceph hosts for following reference"
}
ok: [magna084] => {
"msg": "gather facts on all Ceph hosts for following reference"
}
ok: [magna085] => {
"msg": "gather facts on all Ceph hosts for following reference"
}
ok: [magna094] => {
"msg": "gather facts on all Ceph hosts for following reference"
}
ok: [magna090] => {
"msg": "gather facts on all Ceph hosts for following reference"
}
ok: [magna091] => {
"msg": "gather facts on all Ceph hosts for following reference"
}
TASK: [check if sysvinit] *****************************************************
ok: [magna084]
ok: [magna090]
ok: [magna091]
ok: [magna078]
ok: [magna085]
ok: [magna094]
TASK: [check if upstart] ******************************************************
ok: [magna084]
ok: [magna078]
ok: [magna090]
ok: [magna085]
ok: [magna091]
ok: [magna094]
TASK: [check if systemd] ******************************************************
changed: [magna090]
changed: [magna084]
changed: [magna085]
changed: [magna078]
changed: [magna094]
changed: [magna091]
PLAY [mons] *******************************************************************
GATHERING FACTS ***************************************************************
ok: [magna084]
ok: [magna078]
ok: [magna085]
TASK: [compress the store as much as possible] ********************************
changed: [magna078]
Expected results:
=================
It should update all nodes.
Additional info:
Can you give me the state of the cluster prior to running this?
Are all the monitors started?
Can you try to run the compress command manually on the monitor nodes?
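For reference, a manual compaction on one of the monitors from the inventory above would look something like ceph tell mon.magna078 compact (assuming the default cluster name and an admin keyring on the node); if that also hangs on the second monitor, that would point at the monitor itself rather than at the playbook.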
I do have one question here.
AFAIK, all MONs share the same DB, so why do we need to compress it on all MONs?
We can compress on one MON node and it should work fine, right?
Please correct me if I am wrong.
It's weird that we don't know the root cause of this. Even if the compaction is not needed by the upgrade, I think it's a nice-to-have.
I ran the playbook several times, and the only case where the compact command hung was when the monitor was stopped...
I can remove the compact command from the playbook anyway.
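One possible shape for that change, keeping the step available for anyone who still wants it, is to guard the task behind a variable. The mon_compact_store flag below is a hypothetical name used for illustration, not an existing ceph-ansible option.

- name: compress the store as much as possible
  command: ceph tell mon.{{ ansible_hostname }} compact
  # hypothetical opt-in flag: the compaction only runs when explicitly
  # requested, so a rolling upgrade skips it by default
  when: mon_compact_store | default(false) | bool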
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2016:2082