>> Description of problem (please be as detailed as possible and provide log snippets):
----------------------------------------------------------------------
While trying to figure out why some upgrades wait for 10 mins between each OSD pod re-spin and others do not, we confirmed the following behavior:

1. During OCS upgrade, the ceph pods like MONs, MGR, MDS and RGW are respun only once, in case there is a change in CEPH_IMAGE.
2. The OSD pods comprise containers with different images - CEPH_IMAGE and ROOK_CEPH_IMAGE. Hence, if there is a change in either (or both) of these 2 images, the OSD pods are respun during upgrade.

The OSD pod respin wait time of 10 min is enforced by the rook operator only if there is a change in the ceph image. If the OSD pods are respun due to a ROOK_CEPH_IMAGE change, they are respun one after the other without any wait time at all, and no ceph health check is performed by the rook operator.

No matter which image is getting upgraded, is it OK to respin all OSDs immediately after one another and not wait for PG status? The PGs are already in an unclean state when the next OSD respins. Tested this on a 6 OSD setup, and the total time taken for the 6 OSD upgrade was ~5m.

After a detailed discussion in the chatroom, it was confirmed that only a change in CEPH_IMAGE is treated as an upgrade, and in that case Rook will maintain the wait between OSDs. But I wanted to raise this issue to confirm whether we are ever planning to introduce this same wait when the pods respin because of a ROOK_CEPH_IMAGE version change.

BZ which confirms the wait time of 10 min when ROOK_CEPH_IMAGE is changed: Bug 1840729

As seen below, all 6 OSD pods were respun within max 5m34s:

rook-ceph-osd-0-c8cdcc6fd-4wz4h    1/1   Running   0   5m34s   10.129.2.75   compute-1   <none>   <none>
rook-ceph-osd-1-84958c4774-5bg5t   1/1   Running   0   2m23s   10.128.2.52   compute-0   <none>   <none>
rook-ceph-osd-2-5b8c8f6bf8-hwxsr   1/1   Running   0   3m50s   10.131.0.90   compute-2   <none>   <none>
rook-ceph-osd-3-5cfb6b4dd8-jrp6m   1/1   Running   0   4m14s   10.129.2.76   compute-1   <none>   <none>
rook-ceph-osd-4-685f7c6f89-25clx   1/1   Running   0   113s    10.128.2.53   compute-0   <none>   <none>
rook-ceph-osd-5-79cc7745ff-whpqg   1/1   Running   0   3m10s   10.131.0.92   compute-2   <none>   <none>

>> Version of all relevant components (if applicable):
----------------------------------------------------------------------
Pre-upgrade OCS = 4.3 GA
Post-upgrade OCS = 4.4 GA
OCP version = 4.4.6 (GA)

In 4.3
  - name: ROOK_CEPH_IMAGE
    value: quay.io/rhceph-dev/rook-ceph@sha256:8dee92b1f069fe7d5a00d4427a56b15f55034d58013e0f30bb68859bbc608914
  - name: CEPH_IMAGE
    value: registry.redhat.io/rhceph/rhceph-4-rhel8@sha256:9e521d33c1b3c7f5899a8a5f36eee423b8003827b7d12d780a58a701d0a64f0d

In 4.4
  - name: ROOK_CEPH_IMAGE
    value: registry.redhat.io/ocs4/rook-ceph-rhel8-operator@sha256:8dee92b1f069fe7d5a00d4427a56b15f55034d58013e0f30bb68859bbc60891   <<<--- change in this image caused a respin of OSD pods
  - name: CEPH_IMAGE
    value: registry.redhat.io/rhceph/rhceph-4-rhel8@sha256:9e521d33c1b3c7f5899a8a5f36eee423b8003827b7d12d780a58a701d0a64f0d   <<<--- same as OCS 4.3

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
----------------------------------------------------------------------
No. I do not know the real impact; for whatever IO was running on my cluster, I didn't see any IO fail. However, I am not sure whether respinning all the OSD pods in such quick succession could cause any user-data-related error.

Is there any workaround available to the best of your knowledge?
----------------------------------------------------------------------
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
----------------------------------------------------------------------
3

Can this issue be reproduced?
----------------------------------------------------------------------
Yes

Can this issue be reproduced from the UI?
----------------------------------------------------------------------
No, but the upgrade was initiated from the UI

If this is a regression, please provide more details to justify this:
----------------------------------------------------------------------
No. This has been the behavior in previous builds as well.

Steps to Reproduce:
----------------------------------------------------------------------
1. Create an OCS 4.3 cluster and add capacity to have 6 OSDs on the cluster
2. Initiate some IO via fedora pods (FIO), pgsql, etc. to use up some space in the ceph cluster
3. From the UI, change the channel to stable-4.4
4. With Approval Strategy: Automatic, the upgrade from 4.3 to 4.4 will be triggered automatically
5. It was observed that CEPH_IMAGE was the same between the 4.3 and 4.4 builds, but there was a change in ROOK_CEPH_IMAGE, and this resulted in the 6 OSD pods being respun with the newer image within 5-6 mins. No check for PG status was performed by the rook operator.

Actual results:
----------------------------------------------------------------------
During OCS upgrade, currently only a change in CEPH_IMAGE is treated as an upgrade, and only then does Rook wait between each OSD. It does not wait for a change in ROOK_CEPH_IMAGE.

Expected results:
----------------------------------------------------------------------
During upgrade, either the Rook or the OCS operator should enforce the wait time for each OSD pod respin, be it due to a change in CEPH_IMAGE or ROOK_CEPH_IMAGE.

Additional info:
----------------------------------------------------------------------
There was no mention of the following message in the rook logs:
"util: retrying after 1m0s, last error: cluster is not fully clean"

  mon: 3 daemons, quorum a,b,c (age 7h)
  mgr: a(active, since 7h)
  mds: ocs-storagecluster-cephfilesystem:1 {0=ocs-storagecluster-cephfilesystem-a=up:active} 1 up:standby-replay
  osd: 6 osds: 6 up (since 2m), 6 in (since 6h); 85 remapped pgs
  rgw: 1 daemon active (ocs.storagecluster.cephobjectstore.a)

  data:
    pools:   10 pools, 192 pgs
    objects: 144.64k objects, 186 GiB
    usage:   558 GiB used, 2.4 TiB / 3.0 TiB avail
    pgs:     16941/433911 objects degraded (3.904%)
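For reference, the images each OSD deployment is actually running, and the PG state while the OSDs are respinning, can be checked with something like the following. This is only a rough sketch; it assumes the default openshift-storage namespace and that the rook-ceph-tools toolbox deployment is present on the cluster.

# List the images used by each OSD deployment (init containers and regular
# containers, which is where ROOK_CEPH_IMAGE and CEPH_IMAGE show up)
oc -n openshift-storage get deployments -l app=rook-ceph-osd \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.spec.initContainers[*].image}{" "}{.spec.template.spec.containers[*].image}{"\n"}{end}'

# Check PG state from the toolbox pod while the OSDs are respinning
TOOLS_POD=$(oc -n openshift-storage get pod -l app=rook-ceph-tools -o name)
oc -n openshift-storage exec "$TOOLS_POD" -- ceph status
oc -n openshift-storage exec "$TOOLS_POD" -- ceph pg stat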
Since Rook v1.3 upstream (and OCS 4.5), the OSD upgrade behavior has already changed. Previously, there was only a wait when upgrading the OSDs if the ceph image was updated, but not if the Rook image was upgraded. This is what you are seeing in the 4.3 and 4.4 releases. Now the OSD upgrade behavior is to check whether there is a difference in the pod spec to determine if we should wait during the upgrade. I wouldn't expect to hit this issue anymore. @leseb, please correct me if needed.

@Neha, in that case, please confirm whether it is already fixed in the 4.5 builds.
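One quick way to see whether that wait is actually happening during an OSD update is to follow the operator log for the retry message quoted in the report. A rough sketch, assuming the operator deployment is named rook-ceph-operator in the openshift-storage namespace:

# Follow the Rook operator log during the upgrade and watch for the
# "cluster is not fully clean" retry message between OSD updates
oc -n openshift-storage logs deploy/rook-ceph-operator -f | grep -i "not fully clean"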
That's correct, Travis.
Acking as fixed in 4.5 and moving to ON_QA to validate.
Hi Travis,

With the current upgrade builds - OCS 4.4.2 and OCS 4.5 - even if we select 2 builds whose Ceph versions are the same internally, e.g. OCS 4.5 (v4.5.0-43.ci) and OCS 4.4.2 GA, these are the differences, so replicating the exact same behavior to verify is tough:

1. OCS 4.4 had both rhceph and rook-ceph-rhel8 versions in the pod containers.
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-vu1cs33-t1/jnk-vu1cs33-t1_20200805T161916/logs/failed_testcase_ocs_logs_1596649015/test_add_capacity_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-183cc9be0eaec7e3ecf74cce99cfe511f296f1e023798bb5296953d3c3ffb14f/ceph/namespaces/openshift-storage/pods/rook-ceph-osd-0-777ff99fcd-dxjv4/rook-ceph-osd-0-777ff99fcd-dxjv4.yaml

2. OCS 4.5 only has rhceph-dev.
http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bug-1860418/must-gather.local.6020862434318975465/ceph/namespaces/openshift-storage/pods/rook-ceph-osd-0-b7859999b-xr5q6/rook-ceph-osd-0-b7859999b-xr5q6.yaml

So, could you let us know what we need to verify, or whether there is any other upgrade path by which we can test this?
@Neha You could simulate the upgrade scenarios with the following (a rough command sketch follows after the steps):

1) Simulate only a change in the ceph image
- Install OCS 4.5
- Change the ceph image tag so that it appears to be a different image and set it in the storage cluster CR
- Watch that the ceph pods are all updated, and that pod restarts wait for clean PGs

2) Simulate that the rook deployment has changed
- Install OCS 4.5
- Change something in the deployment/pod spec for an OSD, such as adding a new label
- Restart the rook operator
- Watch that the OSD is restarted because its pod spec changed
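A possible set of commands for the two simulations above. This is only a sketch: the namespace (openshift-storage), the CephCluster name (ocs-storagecluster-cephcluster), the placeholder image tag, the test label, and the OSD deployment name (rook-ceph-osd-0) are assumptions for illustration, and a direct edit of the CephCluster CR may be reconciled back by the ocs-operator.

# 1) Make the ceph image look different so Rook treats it as a Ceph upgrade
oc -n openshift-storage patch cephcluster ocs-storagecluster-cephcluster --type merge \
  -p '{"spec":{"cephVersion":{"image":"registry.redhat.io/rhceph/rhceph-4-rhel8:<different-tag>"}}}'
# Watch that the OSDs restart one at a time, waiting for clean PGs in between
oc -n openshift-storage get pods -l app=rook-ceph-osd -w

# 2) Change the OSD pod spec (add a label to the pod template), then restart the operator
oc -n openshift-storage patch deployment rook-ceph-osd-0 --type merge \
  -p '{"spec":{"template":{"metadata":{"labels":{"upgrade-test":"true"}}}}}'
oc -n openshift-storage delete pod -l app=rook-ceph-operator
# The operator should detect that the OSD pod spec differs from its desired spec and restart the OSD
oc -n openshift-storage get pods -l app=rook-ceph-osd -w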
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Container Storage 4.5.0 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3754