Bug 1372481 - [ceph-ansible] : rolling_update got hung in task 'compress the store as much as possible'
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Storage Console
Classification: Red Hat
Component: ceph-ansible
Version: 2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 2
Assignee: seb
QA Contact: Rachana Patel
URL:
Whiteboard:
Depends On:
Blocks: Console-2-Async
Reported: 2016-09-01 21:28 UTC by Rachana Patel
Modified: 2016-10-19 15:22 UTC

Fixed In Version: ceph-ansible-1.0.5-33.el7scon
Doc Type: Bug Fix
Doc Text:
Issuing a command to compact the monitor data store during a rolling upgrade renders the Ceph monitors unresponsive. To avoid this behaviour, the command to compact the data store is skipped during a rolling upgrade. As a result, the Ceph monitors remain responsive.
Clone Of:
Environment:
Last Closed: 2016-10-19 15:22:11 UTC
Target Upstream Version:


Links
System: Red Hat Product Errata
ID: RHSA-2016:2082
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Storage Console 2 security and bug fix update
Last Updated: 2017-04-18 19:29:02 UTC

Description Rachana Patel 2016-09-01 21:28:12 UTC
Description of problem:
=======================
When the cluster has more than one monitor, the rolling update hangs in the task 'compress the store as much as possible' on the second monitor node.


Version-Release number of selected component (if applicable):
==============================================================
update from 10.2.2-38.el7cp.x86_64 to 10.2.2-39.el7cp.x86_64


How reproducible:
=================
always


Steps to Reproduce:
===================
1. Create a cluster via ceph-ansible having 3 MON, 3 OSD and 1 RGW node (10.2.2-38.el7cp.x86_64)
[root@magna044 ceph-ansible]# cat /etc/ansible/hosts
[mons]
magna078
magna084
magna085

[osds]
magna090
magna091
magna085

[rgws]
magna094


2. Create a repo file on all nodes that points to the 10.2.2-39.el7cp.x86_64 bits
3. Change the value of 'serial:' to adjust the number of servers to be updated at a time.
4. Use rolling_update.yml to update all nodes
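
As a rough illustration of what the 'serial:' setting in step 3 controls, the sketch below models how a play with a positive integer 'serial' value walks through a host group in batches. It is a simplified model, not Ansible's own code: the function name serial_batches is hypothetical, and only the monitor host names are taken from the [mons] group in step 1.

```python
from typing import List


def serial_batches(hosts: List[str], serial: int) -> List[List[str]]:
    """Split hosts into consecutive batches of at most `serial` hosts.

    Simplified model of a play-level 'serial' value given as a positive
    integer; each batch finishes the play before the next batch starts.
    """
    if serial < 1:
        raise ValueError("serial must be a positive integer")
    return [hosts[i:i + serial] for i in range(0, len(hosts), serial)]


# Monitor hosts from the [mons] group in step 1.
mons = ["magna078", "magna084", "magna085"]

for batch in serial_batches(mons, serial=1):
    print("update batch:", batch)
```

With serial=1 each monitor is handled on its own, which matches the log below, where the compaction task reports magna078 before any other monitor.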

Actual results:
===============
[root@magna044 ceph-ansible]# ansible-playbook rolling_update.yml 
Are you sure you want to upgrade the cluster? [no]: yes

PLAY [confirm whether user really meant to upgrade the cluster] *************** 

GATHERING FACTS *************************************************************** 
ok: [localhost]

TASK: [exit playbook, if user did not mean to upgrade cluster] **************** 
skipping: [localhost]

PLAY [mons;osds;mdss;rgws] **************************************************** 

GATHERING FACTS *************************************************************** 
ok: [magna084]
ok: [magna078]
ok: [magna085]
ok: [magna091]
ok: [magna090]
ok: [magna094]

TASK: [debug msg="gather facts on all Ceph hosts for following reference"] **** 
ok: [magna078] => {
    "msg": "gather facts on all Ceph hosts for following reference"
}
ok: [magna084] => {
    "msg": "gather facts on all Ceph hosts for following reference"
}
ok: [magna085] => {
    "msg": "gather facts on all Ceph hosts for following reference"
}
ok: [magna094] => {
    "msg": "gather facts on all Ceph hosts for following reference"
}
ok: [magna090] => {
    "msg": "gather facts on all Ceph hosts for following reference"
}
ok: [magna091] => {
    "msg": "gather facts on all Ceph hosts for following reference"
}

TASK: [check if sysvinit] ***************************************************** 
ok: [magna084]
ok: [magna090]
ok: [magna091]
ok: [magna078]
ok: [magna085]
ok: [magna094]

TASK: [check if upstart] ****************************************************** 
ok: [magna084]
ok: [magna078]
ok: [magna090]
ok: [magna085]
ok: [magna091]
ok: [magna094]

TASK: [check if systemd] ****************************************************** 
changed: [magna090]
changed: [magna084]
changed: [magna085]
changed: [magna078]
changed: [magna094]
changed: [magna091]

PLAY [mons] ******************************************************************* 

GATHERING FACTS *************************************************************** 
ok: [magna084]
ok: [magna078]
ok: [magna085]

TASK: [compress the store as much as possible] ******************************** 
changed: [magna078]



Expected results:
=================
It should update all nodes.


Additional info:

Comment 3 seb 2016-09-02 15:21:45 UTC
Can you give me the state of the cluster prior to running this?
Are all the monitors started?
Can you try to run the compress command manually on the monitor nodes?
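
One way to run the compress command manually without risking another indefinite hang is to wrap it in a timeout. The sketch below is only an illustration under stated assumptions: this report does not show the exact command behind the task, so `ceph tell mon.<id> compact` and the use of the short host name as the monitor ID are assumptions, and the helper names run_with_timeout and compact_command are hypothetical.

```python
import subprocess
import sys
from typing import List, Optional


def run_with_timeout(cmd: List[str], timeout: float) -> Optional[int]:
    """Run cmd; return its exit code, or None if it exceeds `timeout` seconds."""
    try:
        completed = subprocess.run(
            cmd,
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left hanging.
        return None
    return completed.returncode


def compact_command(mon_id: str) -> List[str]:
    # Assumption: compaction is triggered with `ceph tell mon.<id> compact`
    # and the monitor ID is the short host name.
    return ["ceph", "tell", f"mon.{mon_id}", "compact"]


# Harmless stand-in for a hung command, so the sketch runs without a cluster.
result = run_with_timeout(
    [sys.executable, "-c", "import time; time.sleep(5)"], timeout=0.5
)
print("hung" if result is None else f"exit code {result}")
```

On a monitor node, something like run_with_timeout(compact_command("magna084"), timeout=60) would then report None if the command hangs the way the playbook task did.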

Comment 5 Rachana Patel 2016-09-06 15:08:07 UTC
I do have one question here.
AFAIK, all MONs share the same DB, so why do we need to compress on all MONs?

We can compress on one MON node and it should work fine, right?

Please correct me if I am wrong

Comment 11 seb 2016-09-13 08:07:18 UTC
It's weird that we don't know the root cause of that. Even if the compaction is not needed by the upgrade, I think it's nice to have.
I ran the playbook several times, and the only case where the compact command hung was when the monitor was stopped...

I can remove the compact command from the playbook anyway.

Comment 24 errata-xmlrpc 2016-10-19 15:22:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:2082

