Bug 1816482 - Ceph cluster degraded when updating Storage nodes
Summary: Ceph cluster degraded when updating Storage nodes
Status: NEW
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: James Slagle
QA Contact: Arik Chernetsky
Depends On:
Reported: 2020-03-24 05:20 UTC by Chris Smart
Modified: 2020-03-28 04:32 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed:
Target Upstream Version:


Description Chris Smart 2020-03-24 05:20:06 UTC
Description of problem:

When performing an overcloud update of the Ceph storage nodes, the containers are stopped and the OSDs go offline, which puts the cluster into a degraded state. The cluster then has to rebalance.

Even though updates are done serially, there is a risk that the cluster is still in a degraded state by the time the second and third storage nodes are updated. This might cause data loss, or cause Ceph to stop serving I/O until it meets min_size again.

Perhaps the update process should first ensure the Ceph cluster is in a healthy state before proceeding with each node update. If it is not, wait for some time and check again. This way we can mitigate the risk of data loss.
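
The wait-and-check idea above could be sketched with an upper bound on the wait, so that an update does not block forever on a cluster that never recovers. This is only an illustration: wait_for_healthy and ceph_is_healthy are hypothetical names, and it assumes the ceph CLI is reachable from wherever the script runs (for example on a controller).

```shell
#!/usr/bin/env bash
# Sketch: poll a health check until it passes or a deadline is reached.
# wait_for_healthy and ceph_is_healthy are hypothetical names for illustration.
wait_for_healthy() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$(( SECONDS + timeout ))
  # "$@" is the check command; the loop ends as soon as it succeeds.
  until "$@"; do
    if (( SECONDS >= deadline )); then
      echo "cluster still not healthy after ${timeout}s" >&2
      return 1
    fi
    sleep "${interval}"
  done
}

# Example check: succeeds only when `ceph health` reports HEALTH_OK.
# Assumes the ceph CLI and an admin keyring are available locally.
ceph_is_healthy() {
  [[ "$(ceph health 2>/dev/null)" == HEALTH_OK* ]]
}
```

For example, `wait_for_healthy 1800 10 ceph_is_healthy || exit 1` would wait up to 30 minutes before giving up; both numbers are arbitrary placeholders.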

Version-Release number of selected component (if applicable):

RHOSP 13.11

How reproducible:

Steps to Reproduce:
1. Prepare update 'openstack overcloud update prepare'
2. Update first ceph storage node 'openstack overcloud update run --nodes ceph-storage-0'
3. Watch cluster with 'ceph -s'

Actual results:
Node instantly proceeds with update and cluster goes into degraded state.

Expected results:
Update should check that the cluster is healthy before proceeding.

Additional info:

Comment 1 Chris Smart 2020-03-24 05:29:20 UTC
FYI, did something like this to make sure it was healthy before moving on.

source ~/stackrc
# Update the Ceph storage nodes one at a time, waiting for HEALTH_OK before each.
for node in $(openstack server list -f value -c Name | grep ceph-storage | sort -V); do
  while [[ ! "$(ssh -q controller-0 'sudo ceph -s | grep health:')" =~ "HEALTH_OK" ]]; do
    echo "cluster not healthy, sleeping before updating ${node}"
    sleep 5
  done
  echo "cluster healthy, updating ${node}"
  openstack overcloud update run --nodes "${node}" || { echo "failed to update ${node}, exiting"; exit 1; }
  echo "updated ${node} successfully"
done

Comment 2 Chris Smart 2020-03-26 11:46:32 UTC
Even when doing a redeploy of RHOSP over the top (no update), it's restarting all OSD containers and taking each OSD out, which is causing backfilling and recovering.

With a container restart for every single OSD in the cluster, it has to shuffle data around until all PGs are active+clean again, which makes a simple redeploy take several hours longer than it should.

I might try with noout, norecover, norebalance and nobackfill set to stop this from happening while the deploy is being run. As containers are restarted quickly I'm hoping this won't be a problem, but I'm not sure what ceph-ansible will be looking for (hopefully just active+clean pgs, not HEALTH_OK as setting those flags will put cluster in HEALTH_WARN).
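
Since HEALTH_OK is not reachable while those flags are set, a check based on PG states may be more useful. The following is a rough sketch that parses one line of `ceph pg stat` text. The input shape shown in the comment is an assumption and differs between Ceph releases, and all_pgs_active_clean is a hypothetical helper, not something ceph-ansible provides.

```shell
#!/usr/bin/env bash
# Sketch: decide whether every PG is active+clean from `ceph pg stat` text.
# Assumed input shape (varies by release):
#   "320 pgs: 320 active+clean; 1.2 GiB data, ..."
# Returns 0 if all PGs are exactly active+clean, 1 if not, 2 if unparseable.
all_pgs_active_clean() {
  local stat="$1"
  local total_re='([0-9]+) pgs:'
  local clean_re='([0-9]+) active\+clean([,;]|$)'
  local total clean
  [[ "${stat}" =~ ${total_re} ]] || return 2   # unrecognised format
  total="${BASH_REMATCH[1]}"
  # Strict match: states such as active+clean+scrubbing are not counted.
  [[ "${stat}" =~ ${clean_re} ]] || return 1
  clean="${BASH_REMATCH[1]}"
  [[ "${total}" -gt 0 && "${clean}" -eq "${total}" ]]
}
```

It could be called as `all_pgs_active_clean "$(ssh -q controller-0 'sudo ceph pg stat')"`, reusing the controller-0 host from Comment 1.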

Comment 3 Chris Smart 2020-03-26 23:48:36 UTC
Setting noout, norecover, norebalance and nobackfill flags before a deploy resulted in expected behaviour.
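
For reference, the workaround might be wrapped along these lines. It is a minimal sketch, not the exact commands that were run: the function names are made up for illustration, and it assumes the ceph CLI with an admin keyring is available where the script runs.

```shell
#!/usr/bin/env bash
# Sketch: set the OSD flags named above before a deploy, clear them afterwards.
# FLAGS and the function names are illustrative, not part of any TripleO tooling.
FLAGS=(noout norecover norebalance nobackfill)

set_flags() {
  local flag
  for flag in "${FLAGS[@]}"; do
    ceph osd set "${flag}" || return 1
  done
}

unset_flags() {
  local flag
  for flag in "${FLAGS[@]}"; do
    ceph osd unset "${flag}" || return 1
  done
}

# Run any command with the flags set, clearing them even if the command fails.
with_flags() {
  set_flags || { unset_flags; return 1; }
  local rc=0
  "$@" || rc=$?
  unset_flags
  return "${rc}"
}
```

Usage would be something like `with_flags ./deploy.sh`, where deploy.sh is a placeholder for whatever script wraps the overcloud deploy command. Leaving the flags set after the deploy would keep the cluster in HEALTH_WARN and block recovery, which is why they are cleared on both success and failure.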

I'm not quite sure why a redeploy with no Ceph config changes results in taking down each OSD, but it doesn't seem right...
