Bug 2218013

Summary: CPU and latency spikes during periodic snaptrim operations
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Neha Ojha <nojha>
Component: RADOS    Assignee: Adam Kupczyk <akupczyk>
Status: CLOSED ERRATA QA Contact: skanta
Severity: urgent Docs Contact: Rivka Pollack <rpollack>
Priority: unspecified    
Version: 6.0CC: adking, akupczyk, amagrawa, amathuri, bhubbard, ceph-eng-bugs, cephqe-warriors, choffman, ekuric, idryomov, jdurgin, kdreyer, kseeger, ksirivad, lflores, ngangadh, nojha, pdhange, pdhiran, rfriedma, rpollack, rzarzyns, skanta, srangana, sseshasa, tnielsen, torkil, tpetr, tserlin, vdas, vereddy, vumrao
Target Milestone: ---    Keywords: TestBlocker
Target Release: 7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ceph-18.2.0-1 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 2119217 Environment:
Last Closed: 2023-12-13 15:20:35 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2119217    
Bug Blocks: 2077047, 2120239, 2150996, 2154351, 2169499, 2190382, 2237662    

Description Neha Ojha 2023-06-27 20:24:03 UTC
+++ This bug was initially created as a clone of Bug #2119217 +++

While testing snapshot-based rbd-mirror with a random read/write workload, Paul Cuzner noticed that at the start of every replication interval the ceph-osd CPU consumption spikes dramatically, and the spikes continue to grow over time. For example, at the beginning of a run the CPU spike was 60% of a core, but after 24 hours with the same randrw workload running it grows to 1.5-2 cores.

The CPU overhead appears worse for 4KB block sizes than for I/O sizes of 16KB or more.

The change rate within the snapshot is only 250MB every 5 minutes. The workload is just 20 RBD images, using rate-limited fio which caps each image at 50 IOPS (40 read + 10 write).
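
For reference, a minimal fio job sketch approximating that workload (the exact job file is not part of this report; pool name, image names, and runtime are assumptions):

; sketch only -- pool, image names and runtime are assumptions
[global]
ioengine=rbd
clientname=admin
pool=rbd
rw=randrw
; 80% reads at a 50 IOPS cap ~= 40 read + 10 write per image
rwmixread=80
rate_iops=40,10
bs=4k
time_based=1
runtime=24h

[image01]
rbdname=image01
; ...one such section per rbd image, 20 in total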

If the host is not capping the OSD, this issue will likely go unnoticed, but in environments like Kubernetes where the OSD is capped it is more of a problem.

This translates to high latency for clients during these spikes, and with the growth over time it means continual performance degradation.

--- Additional comment from Josh Durgin on 2022-08-17 22:44:10 UTC ---

Paul Cuzner's experiments and analysis leading to this are described here: https://docs.google.com/document/d/13ms1bptpnra7Inyk70ZoeqtIJtx0sUPXaXtkHuxVZk4/edit?usp=sharing

--- Additional comment from Neha Ojha on 2022-11-02 16:23:16 UTC ---

Thanks to Thomas, we have a branch based on the current state of 6.0:   https://gitlab.cee.redhat.com/ceph/ceph/-/commits/private-tserlin-ceph-6.0-rhel-9-test-bz2119217-patches for early performance testing.

Current working doc: https://docs.google.com/document/d/16QPNZDumOOONL20E3YtKYJf-n_NXBYqRU22pLZAgeSE/edit#heading=h.mi6k5ka1jby2

--- Additional comment from Elvir Kuric on 2022-11-02 17:05:19 UTC ---

(In reply to Neha Ojha from comment #2)
> Thanks to Thomas, we have a branch based on the current state of 6.0:  
> https://gitlab.cee.redhat.com/ceph/ceph/-/commits/private-tserlin-ceph-6.0-
> rhel-9-test-bz2119217-patches for early performance testing.
> 
> Current working doc:
> https://docs.google.com/document/d/16QPNZDumOOONL20E3YtKYJf-
> n_NXBYqRU22pLZAgeSE/edit#heading=h.mi6k5ka1jby2

Will the new image be at https://quay.ceph.io/ceph-ci/ceph (once it is back online), and what image tag will be used for it?
Thank you,
Elvir

--- Additional comment from Neha Ojha on 2022-11-02 18:29:04 UTC ---

(In reply to Elvir Kuric from comment #3)
> (In reply to Neha Ojha from comment #2)
> > Thanks to Thomas, we have a branch based on the current state of 6.0:  
> > https://gitlab.cee.redhat.com/ceph/ceph/-/commits/private-tserlin-ceph-6.0-
> > rhel-9-test-bz2119217-patches for early performance testing.
> > 
> > Current working doc:
> > https://docs.google.com/document/d/16QPNZDumOOONL20E3YtKYJf-
> > n_NXBYqRU22pLZAgeSE/edit#heading=h.mi6k5ka1jby2
> 
> will new image be at https://quay.ceph.io/ceph-ci/ceph ( once it is back
> online ) and what image tag will be used for it.
> Thank you 
> Elvir

Thomas, I believe Adam has pushed his changes to private-tserlin-ceph-6.0-rhel-9-test-bz2119217-patches, can you please let us know when the image is ready.

Elvir, you will receive a downstream container image because upstream https://quay.ceph.io/ceph-ci/ceph is still down.

--- Additional comment from  on 2022-11-03 02:36:00 UTC ---

The testfix container is ready:

* rhceph-container-6-55.0.TEST.bz2119217
* Pull from: registry-proxy.engineering.redhat.com/rh-osbs/rhceph:6-55.0.TEST.bz2119217
* Brew link: https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=2243733
* ceph testfix build in container: ceph-17.2.5-8.0.TEST.bz2119217.el9cp

Based on this -patches branch (5c31df0e91285684fcb133a48ac24948aa3a9785):
https://gitlab.cee.redhat.com/ceph/ceph/-/commits/private-tserlin-ceph-6.0-rhel-9-test-bz2119217-patches

Thomas

--- Additional comment from Neha Ojha on 2022-11-04 19:02:03 UTC ---

(In reply to tserlin from comment #5)
> The testfix container is ready:
> 
> * rhceph-container-6-55.0.TEST.bz2119217
> * Pull from:
> registry-proxy.engineering.redhat.com/rh-osbs/rhceph:6-55.0.TEST.bz2119217
> * Brew link:
> https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=2243733
> * ceph testfix build in container: ceph-17.2.5-8.0.TEST.bz2119217.el9cp
> 
> Based on this -patches branch (5c31df0e91285684fcb133a48ac24948aa3a9785):
> https://gitlab.cee.redhat.com/ceph/ceph/-/commits/private-tserlin-ceph-6.0-
> rhel-9-test-bz2119217-patches
> 
> Thomas

Thanks Thomas. 

Hi Elvir, the container images are ready, please let us know when you'll have a chance to run another round of tests.

--- Additional comment from Elvir Kuric on 2022-11-07 12:03:19 UTC ---

(In reply to Neha Ojha from comment #6)
> (In reply to tserlin from comment #5)
> > The testfix container is ready:
> > 
> > * rhceph-container-6-55.0.TEST.bz2119217
> > * Pull from:
> > registry-proxy.engineering.redhat.com/rh-osbs/rhceph:6-55.0.TEST.bz2119217
> > * Brew link:
> > https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=2243733
> > * ceph testfix build in container: ceph-17.2.5-8.0.TEST.bz2119217.el9cp
> > 
> > Based on this -patches branch (5c31df0e91285684fcb133a48ac24948aa3a9785):
> > https://gitlab.cee.redhat.com/ceph/ceph/-/commits/private-tserlin-ceph-6.0-
> > rhel-9-test-bz2119217-patches
> > 
> > Thomas
> 
> Thanks Thomas. 
> 
> Hi Elvir, the container images are ready, please let us know when you'll
> have a chance to run another round of tests.

I am going to use the image below (based on https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=2243733) to upgrade the clusters:

registry-proxy.engineering.redhat.com/rh-osbs/rhceph:6-55.0.TEST.bz2119217

I can reach it with "podman pull registry-proxy.engineering.redhat.com/rh-osbs/rhceph:6-55.0.TEST.bz2119217", and to upgrade the clusters I am going to use:

# ceph orch upgrade start --image registry-proxy.engineering.redhat.com/rh-osbs/rhceph:6-55.0.TEST.bz2119217

based on https://docs.ceph.com/en/quincy/cephadm/upgrade/#starting-the-upgrade

--- Additional comment from Elvir Kuric on 2022-11-08 11:00:55 UTC ---

On a test cluster which is in a healthy state I wanted to upgrade to the test image registry-proxy.engineering.redhat.com/rh-osbs/rhceph:6-55.0.TEST.bz2119217.

Can someone advise me what I am doing wrong and why the upgrade process does not start, even though it should per https://docs.ceph.com/en/quincy/cephadm/upgrade/#starting-the-upgrade?
Comments are welcome, thank you in advance,
Elvir 

--- logs --- 

# cluster1  orch upgrade start --image registry-proxy.engineering.redhat.com/rh-osbs/rhceph:6-55.0.TEST.bz2119217
Initiating upgrade to registry-proxy.engineering.redhat.com/rh-osbs/rhceph:6-55.0.TEST.bz2119217


in second console 

# ceph -W cephadm

2022-11-08T10:37:42.727226+0000 mgr.f09-h01-000-1029u.rdu2.scalelab.redhat.com.nnirqz [INF] Upgrade: Started with target registry-proxy.engineering.redhat.com/rh-osbs/rhceph:6-55.0.TEST.bz2119217
2022-11-08T10:37:42.876399+0000 mgr.f09-h01-000-1029u.rdu2.scalelab.redhat.com.nnirqz [INF] Upgrade: First pull of registry-proxy.engineering.redhat.com/rh-osbs/rhceph:6-55.0.TEST.bz2119217
2022-11-08T10:37:44.862471+0000 mgr.f09-h01-000-1029u.rdu2.scalelab.redhat.com.nnirqz [INF] Upgrade: Target is version 17.2.5-8.0.TEST.bz2119217.el9cp (quincy)
2022-11-08T10:37:44.862544+0000 mgr.f09-h01-000-1029u.rdu2.scalelab.redhat.com.nnirqz [INF] Upgrade: Target container is registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:431d7d287041b25e3fae2c920b9f040b3ee20af61ebf4d10f9d6131d767914dc, digests ['registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:431d7d287041b25e3fae2c920b9f040b3ee20af61ebf4d10f9d6131d767914dc', 'registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:899805e337013cce4143c815b7b6fb1f3d615505850a7ee85a5b9b542ea44a59']
2022-11-08T10:37:44.862631+0000 mgr.f09-h01-000-1029u.rdu2.scalelab.redhat.com.nnirqz [ERR] Upgrade: Paused due to UPGRADE_BAD_TARGET_VERSION: Upgrade: cannot upgrade/downgrade to 17.2.5-8.0.TEST.bz2119217.el9cp



# cluster1  orch upgrade status
{
	"target_image": "registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:431d7d287041b25e3fae2c920b9f040b3ee20af61ebf4d10f9d6131d767914dc",
	"in_progress": true,
	"which": "Upgrading all daemon types on all hosts",
	"services_complete": [],
	"progress": "0/22 daemons upgraded",
	"message": "Error: UPGRADE_BAD_TARGET_VERSION: Upgrade: cannot upgrade/downgrade to 17.2.5-8.0.TEST.bz2119217.el9cp",
	"is_paused": true
}

--------------------

# cluster1 -s
  cluster:
	id:     6a296ec8-483a-11ed-9fd7-ac1f6b7abb24
	health: HEALTH_ERR
			Upgrade: cannot upgrade/downgrade to 17.2.5-8.0.TEST.bz2119217.el9cp
 
  services:
	mon:        3 daemons, quorum f09-h01-000-1029u.rdu2.scalelab.redhat.com,f09-h02-000-1029u,f09-h03-000-1029u (age 4w)
	mgr:        f09-h01-000-1029u.rdu2.scalelab.redhat.com.nnirqz(active, since 4w), standbys: f09-h02-000-1029u.mzsguz
	osd:        12 osds: 12 up (since 4w), 12 in (since 4w)
	rbd-mirror: 1 daemon active (1 hosts)
	rgw:        1 daemon active (1 hosts, 1 zones)
 
  data:
	pools:   14 pools, 417 pgs
	objects: 42.43k objects, 111 GiB
	usage:   353 GiB used, 4.9 TiB / 5.2 TiB avail
	pgs:     417 active+clean
 
  io:
	client:   148 KiB/s rd, 16 KiB/s wr, 172 op/s rd, 22 op/s wr
 
  progress:
	Upgrade to registry-proxy.engineering.redhat.com/rh-osbs/rhceph:6-55.0.TEST.bz2119217 (0s)
	  [............................] 



I can stop upgrade process 

#  cluster1 orch upgrade stop
Stopped upgrade to registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:431d7d287041b25e3fae2c920b9f040b3ee20af61ebf4d10f9d6131d767914dc

# cluster1 -s
  cluster:
    id:     6a296ec8-483a-11ed-9fd7-ac1f6b7abb24
    health: HEALTH_OK
 
  services:
    mon:        3 daemons, quorum f09-h01-000-1029u.rdu2.scalelab.redhat.com,f09-h02-000-1029u,f09-h03-000-1029u (age 4w)
    mgr:        f09-h01-000-1029u.rdu2.scalelab.redhat.com.nnirqz(active, since 4w), standbys: f09-h02-000-1029u.mzsguz
    osd:        12 osds: 12 up (since 4w), 12 in (since 4w)
    rbd-mirror: 1 daemon active (1 hosts)
    rgw:        1 daemon active (1 hosts, 1 zones)
 
  data:
    pools:   14 pools, 417 pgs
    objects: 42.43k objects, 111 GiB
    usage:   353 GiB used, 4.9 TiB / 5.2 TiB avail
    pgs:     417 active+clean
 
  io:
    client:   18 KiB/s rd, 1.7 KiB/s wr, 18 op/s rd, 1 op/s wr
 


and it is in HEALTH_OK state 

--- 
# for d in mgr mon crash osd mds rgw rbd-mirror cephfs-mirror iscsi nfs ; do echo "daemon ---- ---- $d -----------";  cluster1 orch ps --daemon-type $d; done 
daemon ---- ---- mgr -----------
NAME                                                   HOST                                        PORTS        STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION               IMAGE ID      CONTAINER ID  
mgr.f09-h01-000-1029u.rdu2.scalelab.redhat.com.nnirqz  f09-h01-000-1029u.rdu2.scalelab.redhat.com  *:9283,8765  running (4w)     3m ago   4w     668M        -  18.0.0-333-g29fc1bfd  ddb01ae703b8  1e5d8a03bcb9  
mgr.f09-h02-000-1029u.mzsguz                           f09-h02-000-1029u.rdu2.scalelab.redhat.com               running (4w)     2m ago   4w     434M        -  18.0.0-333-g29fc1bfd  ddb01ae703b8  e6d5d5df14b3  
daemon ---- ---- mon -----------
NAME                                            HOST                                        PORTS  STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION               IMAGE ID      CONTAINER ID  
mon.f09-h01-000-1029u.rdu2.scalelab.redhat.com  f09-h01-000-1029u.rdu2.scalelab.redhat.com         running (4w)     3m ago   4w    1389M    2048M  18.0.0-333-g29fc1bfd  ddb01ae703b8  74b61f772cbb  
mon.f09-h02-000-1029u                           f09-h02-000-1029u.rdu2.scalelab.redhat.com         running (4w)     2m ago   4w    1502M    2048M  18.0.0-333-g29fc1bfd  ddb01ae703b8  8375126e287e  
mon.f09-h03-000-1029u                           f09-h03-000-1029u.rdu2.scalelab.redhat.com         running (4w)     4m ago   4w    1714M    2048M  18.0.0-333-g29fc1bfd  ddb01ae703b8  1020d028eb26  
daemon ---- ---- crash -----------
NAME                     HOST                                        PORTS  STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION               IMAGE ID      CONTAINER ID  
crash.f09-h01-000-1029u  f09-h01-000-1029u.rdu2.scalelab.redhat.com         running (4w)     3m ago   4w    7222k        -  18.0.0-333-g29fc1bfd  ddb01ae703b8  f4b4e36f15ae  
crash.f09-h02-000-1029u  f09-h02-000-1029u.rdu2.scalelab.redhat.com         running (4w)     2m ago   4w    7239k        -  18.0.0-333-g29fc1bfd  ddb01ae703b8  6a5a2c266cce  
crash.f09-h03-000-1029u  f09-h03-000-1029u.rdu2.scalelab.redhat.com         running (4w)     4m ago   4w    7788k        -  18.0.0-333-g29fc1bfd  ddb01ae703b8  f4c73b911425  
daemon ---- ---- osd -----------
NAME    HOST                                        PORTS  STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION               IMAGE ID      CONTAINER ID  
osd.0   f09-h03-000-1029u.rdu2.scalelab.redhat.com         running (4w)     4m ago   4w    88.2G    65.1G  18.0.0-333-g29fc1bfd  ddb01ae703b8  c5ea64f32588  
osd.1   f09-h02-000-1029u.rdu2.scalelab.redhat.com         running (4w)     2m ago   4w    96.2G    64.4G  18.0.0-333-g29fc1bfd  ddb01ae703b8  52328d182657  
osd.2   f09-h01-000-1029u.rdu2.scalelab.redhat.com         running (4w)     3m ago   4w    72.3G    64.1G  18.0.0-333-g29fc1bfd  ddb01ae703b8  d504ff6d326d  
osd.3   f09-h03-000-1029u.rdu2.scalelab.redhat.com         running (4w)     4m ago   4w     103G    65.1G  18.0.0-333-g29fc1bfd  ddb01ae703b8  03edf538e4d2  
osd.4   f09-h02-000-1029u.rdu2.scalelab.redhat.com         running (4w)     2m ago   4w    97.3G    64.4G  18.0.0-333-g29fc1bfd  ddb01ae703b8  b153acf3a82e  
osd.5   f09-h01-000-1029u.rdu2.scalelab.redhat.com         running (4w)     3m ago   4w    26.5G    64.1G  18.0.0-333-g29fc1bfd  ddb01ae703b8  f15133c10080  
osd.6   f09-h03-000-1029u.rdu2.scalelab.redhat.com         running (4w)     4m ago   4w    71.1G    65.1G  18.0.0-333-g29fc1bfd  ddb01ae703b8  fd96f3cc8245  
osd.7   f09-h02-000-1029u.rdu2.scalelab.redhat.com         running (4w)     2m ago   4w     122G    64.4G  18.0.0-333-g29fc1bfd  ddb01ae703b8  814c4a6146d0  
osd.8   f09-h01-000-1029u.rdu2.scalelab.redhat.com         running (4w)     3m ago   4w     158G    64.1G  18.0.0-333-g29fc1bfd  ddb01ae703b8  afb45e38e685  
osd.9   f09-h03-000-1029u.rdu2.scalelab.redhat.com         running (4w)     4m ago   4w     120G    65.1G  18.0.0-333-g29fc1bfd  ddb01ae703b8  012d80aa4566  
osd.10  f09-h02-000-1029u.rdu2.scalelab.redhat.com         running (4w)     2m ago   4w    64.9G    64.4G  18.0.0-333-g29fc1bfd  ddb01ae703b8  c74ad9d68d87  
osd.11  f09-h01-000-1029u.rdu2.scalelab.redhat.com         running (4w)     3m ago   4w     117G    64.1G  18.0.0-333-g29fc1bfd  ddb01ae703b8  dac65c2599d1  
daemon ---- ---- mds -----------
No daemons reported
daemon ---- ---- rgw -----------
NAME                                      HOST                                        PORTS  STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION               IMAGE ID      CONTAINER ID  
rgw.objectstore.f09-h01-000-1029u.nwqpgg  f09-h01-000-1029u.rdu2.scalelab.redhat.com  *:80   running (3w)     3m ago   3w    1210M        -  18.0.0-333-g29fc1bfd  ddb01ae703b8  6d362cc0350e  
daemon ---- ---- rbd-mirror -----------
NAME                                 HOST                                        PORTS  STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION               IMAGE ID      CONTAINER ID  
rbd-mirror.f09-h03-000-1029u.yjnwsk  f09-h03-000-1029u.rdu2.scalelab.redhat.com         running (4w)     4m ago   4w    56.3M        -  18.0.0-333-g29fc1bfd  ddb01ae703b8  7e0cf1e3b361  
daemon ---- ---- cephfs-mirror -----------
No daemons reported
daemon ---- ---- iscsi -----------
No daemons reported
daemon ---- ---- nfs -----------
No daemons reported

--- Additional comment from Josh Durgin on 2022-11-08 15:43:11 UTC ---

(In reply to Elvir Kuric from comment #8)
> On test clusters which is in healthy state I wanted to upgrade to test image
> registry-proxy.engineering.redhat.com/rh-osbs/rhceph:6-55.0.TEST.bz2119217
> 
> Can someone advise me here what I am doing wrong, why upgrade process does
> not start - but it should
> https://docs.ceph.com/en/quincy/cephadm/upgrade/#starting-the-upgrade 
> Comments are welcome , thank you in advance, 
> Elvir 

> # cluster1 -s
>   cluster:
> 	id:     6a296ec8-483a-11ed-9fd7-ac1f6b7abb24
> 	health: HEALTH_ERR
> 			Upgrade: cannot upgrade/downgrade to 17.2.5-8.0.TEST.bz2119217.el9cp

"ceph health detail" should show why cephadm thinks it can't upgrade to this version.

@adking may be able to help further with how to get past that

--- Additional comment from Elvir Kuric on 2022-11-08 15:46:20 UTC ---

# cluster1  health detail

HEALTH_ERR Upgrade: cannot upgrade/downgrade to 17.2.5-8.0.TEST.bz2119217.el9cp
[ERR] UPGRADE_BAD_TARGET_VERSION: Upgrade: cannot upgrade/downgrade to 17.2.5-8.0.TEST.bz2119217.el9cp
    ceph cannot downgrade major versions (from 18.0.0-333-g29fc1bfd (29fc1bfd4c90dd618eb9e0d4ae6474d8cfa5dfdf) reef (dev) to 17.2.5-8.0.TEST.bz2119217.el9cp)

--- Additional comment from Adam King on 2022-11-08 17:41:13 UTC ---

(In reply to Elvir Kuric from comment #10)
> # cluster1  health detail
> 
> HEALTH_ERR Upgrade: cannot upgrade/downgrade to
> 17.2.5-8.0.TEST.bz2119217.el9cp
> [ERR] UPGRADE_BAD_TARGET_VERSION: Upgrade: cannot upgrade/downgrade to
> 17.2.5-8.0.TEST.bz2119217.el9cp
>     ceph cannot downgrade major versions (from 18.0.0-333-g29fc1bfd
> (29fc1bfd4c90dd618eb9e0d4ae6474d8cfa5dfdf) reef (dev) to
> 17.2.5-8.0.TEST.bz2119217.el9cp)

It looks like the version you're currently on is considered v18 (some main branch image? v18 is for Reef) and you're trying to "upgrade" to a v17 image (which would be quincy or perhaps an older main image before v18 was set up). Gathering here that you're testing a patch built on top of 6.0. Is there a reason for starting from a main branch image here and trying to upgrade to a quincy (RHCS 6) image from there? Cephadm is blocking it because as far as it's concerned you're trying to downgrade across major versions which isn't supported.

If this is necessary, I think you can technically get around it by first manually upgrading the mgr daemons by redeploying them with a specified image. Note that this doesn't work properly if passed the active mgr (there's a fix open but it wouldn't be in any build you're testing). So you'd have to redeploy the standby mgr(s), do a failover, then upgrade the previously active one. E.g. with a mgr on vm-00 and vm-01, with vm-00 being the active one, I did this with:


ceph orch daemon redeploy mgr.vm-01.xnmtnp --image quay.io/ceph/ceph:v17.2.5
Wait until "ceph versions" reports a mgr on v17
ceph mgr fail
ceph orch daemon redeploy mgr.vm-00.motjgc --image quay.io/ceph/ceph:v17.2.5


Then once both the mgr daemons were on v17 I was able to "upgrade" to v17.2.5 from a cluster previously using a v18 image.

I'd keep in mind that this sort of jump from main back to quincy isn't tested or supported, so I'm not sure what results you'd get with these particular images, but that's the workaround if starting from whatever main branch image is being used is necessary.
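
For readability, the same workaround as a consolidated sketch (daemon names and the image tag are taken from the example above; substitute the image actually being tested):

# 1) redeploy the standby mgr with the target image
ceph orch daemon redeploy mgr.vm-01.xnmtnp --image quay.io/ceph/ceph:v17.2.5
# 2) wait until "ceph versions" reports a mgr on the target version, then fail over
ceph versions
ceph mgr fail
# 3) redeploy the previously active mgr
ceph orch daemon redeploy mgr.vm-00.motjgc --image quay.io/ceph/ceph:v17.2.5
# 4) once both mgrs are on the target version, start the regular upgrade
ceph orch upgrade start --image quay.io/ceph/ceph:v17.2.5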

--- Additional comment from Vikhyat Umrao on 2022-11-08 17:47:49 UTC ---

+1 I think we should not downgrade this cluster, we should redeploy! Coming from v18 to v17 is not good; we do not know what kind of things it will break or what effects it will have on operations!

Josh - thoughts?

--- Additional comment from Josh Durgin on 2022-11-08 19:12:19 UTC ---

(In reply to Vikhyat Umrao from comment #12)
> +1 I think we should not downgrade this cluster, we should redeploy! Coming
> from v18 to v17 is not good we do not know what kind of things it will brake
> or will have effects in operations!
> 
> Josh - thoughts?

Agreed

--- Additional comment from RHEL Program Management on 2023-03-29 17:40:55 UTC ---

This bug report has Keywords: Regression or TestBlocker.

Since no regressions or test blockers are allowed between releases, it is being proposed as a blocker for this release.

Please resolve/triage ASAP.

--- Additional comment from Adam Kupczyk on 2023-03-31 19:13:34 UTC ---

The solution for the issue will include:
a) A modification to BlueStore that significantly reduces CPU usage when processing snaps.
This is https://github.com/ceph/ceph/pull/49837; it has passed all tests for Quincy and Reef and is awaiting review.

b) A PR (https://github.com/ceph/ceph/pull/50812) that introduces feature control logic.

A Ceph option `bluestore_reuse_shared_blob` is introduced.
This option will exist in Quincy and Reef, and will be removed in S(quid?).
It is an OSD deploy-time option that determines whether the new, improved blob processing logic is used.

In Quincy (6.1) it is OFF by default.
In Reef (7.x) it is ON by default.

Both releases will include a new admin socket command for the OSD: "bluestore enable_shared_blob_reuse".
It immediately makes the transition OFF -> ON, enabling reuse of shared blobs.
From that moment on, shared blobs can be reused, which in the long term reduces the number of shared blobs and improves performance.

It will never be possible to transition ON -> OFF.
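
For illustration, a minimal sketch of how the toggle could be driven on a single OSD (the osd id and the use of "ceph config set" here are assumptions; only the option name and the admin socket command come from the PRs above):

# deploy-time option; must be in place before the OSD is created (the config-set path is an assumption)
ceph config set osd bluestore_reuse_shared_blob true
# runtime transition OFF -> ON for an already-deployed OSD, via its admin socket
ceph daemon osd.0 bluestore enable_shared_blob_reuse
# note: there is no ON -> OFF transition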

--- Additional comment from Neha Ojha on 2023-03-31 20:27:42 UTC ---

(In reply to Adam Kupczyk from comment #41)
> The solution for the issue will include
> a) A modification to BlueStore that significantly reduces cpu usage when
> processing snaps.
> It is https://github.com/ceph/ceph/pull/49837, passed all tests for Quincy
> and Reef, waiting review.
> 
> b) A PR (https://github.com/ceph/ceph/pull/50812) that introduces feature
> control logic.
> 
> A ceph option `bluestore_reuse_shared_blob` is introduced.
> This option will exist in Quincy and Reef, and will be removed in S(quid?).
> It is an OSD deploy mode option that determines usage of new improved blob
> processing logic.
> 
> In Quincy(6.1) It is OFF by default.

@tnielsen: We'd like to set bluestore_reuse_shared_blob to true for ODF 4.13 fresh installs, perhaps by means of Rook. As far as brownfield clusters are concerned, we need to figure out a couple of things:

1. Will we support RDR in clusters upgraded to ODF 4.13?
2. If the answer to (1) is no, we don't need to handle or test upgrades for the scope of this BZ. If the answer is yes, after an upgrade to ODF 4.13, bluestore_reuse_shared_blob will be set to false. We need to use the admin socket command mentioned below to enable this config in clusters using RDR.


> In Reef(7.x)   It is ON  by default.
> 
> Both releases will include new admin socket command for OSD: "bluestore
> enable_shared_blob_reuse". 
> It will immediately make transition OFF -> ON, enabling reuse of shared
> blobs.
> Starting from this moment shared blobs could be reused; in long term
> reducing number of shared blobs and improving performance.
> 
> It will never be possible to transition ON -> OFF.

--- Additional comment from Travis Nielsen on 2023-04-04 19:50:43 UTC ---

(In reply to Neha Ojha from comment #42)
> (In reply to Adam Kupczyk from comment #41)
> > The solution for the issue will include
> > a) A modification to BlueStore that significantly reduces cpu usage when
> > processing snaps.
> > It is https://github.com/ceph/ceph/pull/49837, passed all tests for Quincy
> > and Reef, waiting review.
> > 
> > b) A PR (https://github.com/ceph/ceph/pull/50812) that introduces feature
> > control logic.
> > 
> > A ceph option `bluestore_reuse_shared_blob` is introduced.
> > This option will exist in Quincy and Reef, and will be removed in S(quid?).
> > It is an OSD deploy mode option that determines usage of new improved blob
> > processing logic.
> > 
> > In Quincy(6.1) It is OFF by default.
> 
> @tnielsen: We'd like to set bluestore_reuse_shared_blob to true
> for ODF 4.13 fresh installs, perhaps by means of Rook. 

Rook (via the OCS operator for downstream) allows setting Ceph values that will be loaded into the ceph.conf for each daemon.
As long as the bluestore_reuse_shared_blob value is picked up from ceph.conf,
for new clusters the value could be set in the configmap here [1] by the OCS operator.


> As far as brown field
> clusters are concerned, we need to figure out a couple of things
> 
> 1. Will we support RDR in clusters upgraded to ODF 4.13?
> 2. If the answer to (1) is no, we don't need to handle or test upgrades for
> the scope of this BZ. If the answer is yes, after an upgrade to ODF 4.13,
> bluestore_reuse_shared_blob will be set to false. We need to use the admin
> socket command mentioned below to enable this config in clusters using RDR.

In the upgraded clusters, the daemons are restarted and would pick up the new ceph.conf that is updated in the configmap.
Malay, during an upgrade, the OCS operator will update that configmap with any new settings, correct? 
I am not clear about the reconcile strategy of the OCS operator for the configmap "rook-config-override".

Neha, but for the brownfield case are you saying the admin socket on each OSD would need to have that setting enabled?
That would take a separate approach from how Rook sets any settings in Ceph.
Or is the ceph.conf or a "ceph config set" command sufficient?

[1] https://github.com/red-hat-storage/ocs-operator/blob/main/controllers/storagecluster/cephconfig.go#L31-L40
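
For illustration, a rough sketch of what that override could look like (this follows the usual rook-config-override convention; the namespace and the assumption that the option is honored from ceph.conf are not confirmed here):

apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-config-override
  namespace: openshift-storage   # assumption: ODF namespace; upstream Rook uses rook-ceph
data:
  config: |
    [osd]
    bluestore_reuse_shared_blob = true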

--- Additional comment from Malay Kumar parida on 2023-04-05 05:54:37 UTC ---

> Malay, during an upgrade, the OCS operator will update that configmap with any new settings, correct? 

Yes Travis, the changes will be picked up after the upgrade. So if some particular config is decided here,
we can put that in that configmap. But keep in mind that ocs-operator will always reconcile the configmap and will always apply those
configs, so if someone wants to remove those configs for any reason that will be impossible. So we have to be a little careful.

--- Additional comment from Neha Ojha on 2023-04-05 17:17:41 UTC ---

(In reply to Travis Nielsen from comment #43)
> (In reply to Neha Ojha from comment #42)
> > (In reply to Adam Kupczyk from comment #41)
> > > The solution for the issue will include
> > > a) A modification to BlueStore that significantly reduces cpu usage when
> > > processing snaps.
> > > It is https://github.com/ceph/ceph/pull/49837, passed all tests for Quincy
> > > and Reef, waiting review.
> > > 
> > > b) A PR (https://github.com/ceph/ceph/pull/50812) that introduces feature
> > > control logic.
> > > 
> > > A ceph option `bluestore_reuse_shared_blob` is introduced.
> > > This option will exist in Quincy and Reef, and will be removed in S(quid?).
> > > It is an OSD deploy mode option that determines usage of new improved blob
> > > processing logic.
> > > 
> > > In Quincy(6.1) It is OFF by default.
> > 
> > @tnielsen: We'd like to set bluestore_reuse_shared_blob to true
> > for ODF 4.13 fresh installs, perhaps by means of Rook. 
> 
> Rook (via OCS operator for downstream) allows setting Ceph values that will
> be loaded into the ceph.conf for each daemon.
> As long as the bluestore_reuse_shared_blob value would be picked up from
> ceph.conf, 
> for new clusters the value could be set in the configmap here [1] by the OCS
> operator.
> 
> 
> > As far as brown field
> > clusters are concerned, we need to figure out a couple of things
> > 
> > 1. Will we support RDR in clusters upgraded to ODF 4.13?
> > 2. If the answer to (1) is no, we don't need to handle or test upgrades for
> > the scope of this BZ. If the answer is yes, after an upgrade to ODF 4.13,
> > bluestore_reuse_shared_blob will be set to false. We need to use the admin
> > socket command mentioned below to enable this config in clusters using RDR.
> 
> In the upgraded clusters, the daemons are restarted and would pick up the
> new ceph.conf that is updated in the configmap.
> Malay, during an upgrade, the OCS operator will update that configmap with
> any new settings, correct? 
> I am not clear about the reconcile strategy of the OCS operator for the
> configmap "rook-config-override".
> 
> Neha, but for the brown field case are you saying the admin socket on each
> OSD would need to have that setting enabled?
> That would take a separate approach from how Rook sets any settings in Ceph.
> Or is the ceph.conf or a "ceph config set" command sufficient?

Adam, how do you envision this to work with your current PR?

> 
> [1]
> https://github.com/red-hat-storage/ocs-operator/blob/main/controllers/
> storagecluster/cephconfig.go#L31-L40

--- Additional comment from Vivek Das on 2023-04-27 07:32:29 UTC ---

Hello Adam,

Any update on this bug ?

This is marked as a test blocker and QE is waiting for the fix.

Regards,
Vivek Das

--- Additional comment from Neha Ojha on 2023-05-03 14:27:38 UTC ---

(In reply to Vivek Das from comment #46)
> Hello Adam,
> 
> Any update on this bug ?
> 
> This is marked as a test blocker and QE is waiting for the fix.
> 
> Regards,
> Vivek Das

The main PR https://github.com/ceph/ceph/pull/49837 is going through upstream reviews and teuthology testing. After it gets merged, we'll merge the quincy backport https://github.com/ceph/ceph/pull/50549 and cherry-pick it to downstream for 6.1.

--- Additional comment from Adam Kupczyk on 2023-05-16 16:18:46 UTC ---

The issue is solved by pulling contents of:
https://github.com/ceph/ceph/pull/51451
into
https://gitlab.cee.redhat.com/ceph/ceph/-/commits/ceph-6.1-rhel-patches

AND

adding a single commit that enables the feature:
"#define WITH_ESB"

--- Additional comment from Ken Dreyer (Red Hat) on 2023-05-16 22:01:43 UTC ---

This means that it's enabled in all our downstream builds now?

--- Additional comment from Ken Dreyer (Red Hat) on 2023-05-16 22:04:07 UTC ---

For the record: this went into dist-git as https://pkgs.devel.redhat.com/cgit/rpms/ceph/commit/?h=ceph-6.1-rhel-9&id=ba5fccbc3b5dbc885e7fb629f3fa815d127d113c (unfortunately lacking "Resolves: rhbz#2119217" lines)

--- Additional comment from Neha Ojha on 2023-05-16 23:26:56 UTC ---

(In reply to Ken Dreyer (Red Hat) from comment #49)
> This means that it's enabled in all our downstream builds now?

Hi Ken,


The commits have been pushed with the ESB feature turned on by default for the time being in order to
1. unblock ODF 4.13 testing
2. get additional testing on it from RHCS/IBM Ceph QE.
Moving the BZ to POST to reflect this.

As discussed yesterday, given the risk involved with this feature, we don't want to ship RHCS/IBM Ceph with this feature on for 6.1. We still need a way to turn the compile-time flag (WITH_ESB) off for these builds and just keep it on for ODF where this feature is a must.

--- Additional comment from errata-xmlrpc on 2023-05-18 05:56:22 UTC ---

This bug has been added to advisory RHBA-2023:112314 by Thomas Serlin (tserlin)

--- Additional comment from errata-xmlrpc on 2023-05-18 05:56:23 UTC ---

Bug report changed to ON_QA status by Errata System.
A QE request has been submitted for advisory RHBA-2023:112314-01
https://errata.devel.redhat.com/advisory/112314

--- Additional comment from Neha Ojha on 2023-05-22 17:58:38 UTC ---

Based on the discussion in the RHCS release and RDR dependencies meeting this morning, the commits that address this BZ have been reverted from ceph-6.1-rhel-patches.

Thomas has created https://gitlab.cee.redhat.com/ceph/ceph/-/commits/private-tserlin-ceph-6.1-rhel-9-test-bz2119217-patches to continue testing of these patches beyond the scope of 6.1. We'll cherry-pick the commits to the new branch.

--- Additional comment from Adam Kupczyk on 2023-05-22 18:16:32 UTC ---

Just pushed 38 commits, 14f4997518c234ad30f9442e2f14b961e349def4 ... 7da3e6ae59de2dacd4d7dc88c7421d9016259fea, to https://gitlab.cee.redhat.com/ceph/ceph/-/commits/private-tserlin-ceph-6.1-rhel-9-test-bz2119217-patches. These commits form the full Elastic Shared Blob feature.

These are the commits that were recently reverted to remove the Elastic Shared Blob feature from ceph-6.1-rhel-patches.

--- Additional comment from Vikhyat Umrao on 2023-05-22 18:30:48 UTC ---

As discussed in today's meeting and also in comment#54 and comment#55, moving this one out of 6.1!

--- Additional comment from  on 2023-05-23 04:55:03 UTC ---

(In reply to Vikhyat Umrao from comment #56)
> As discussed in today's mtg and also in comment#54 and comment#55, moving
> this one out of 6.1!

Will drop this from the 6.1 errata advisory as well.

Thomas

--- Additional comment from errata-xmlrpc on 2023-05-23 04:55:51 UTC ---

This bug has been dropped from advisory RHBA-2023:112314 by Thomas Serlin (tserlin)

--- Additional comment from  on 2023-05-23 05:08:56 UTC ---

QE can use the following testfix build for testing this BZ:

* rhceph-container-6-164.0.TEST.bz2119217
* Pull from: registry-proxy.engineering.redhat.com/rh-osbs/rhceph:6-164.0.TEST.bz2119217
* Brew link for container: https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=2515859
* Ceph build in container: ceph-17.2.6-58.0.TEST.bz2119217.el9cp

Based on this -patches branch (7da3e6ae59de2dacd4d7dc88c7421d9016259fea):
https://gitlab.cee.redhat.com/ceph/ceph/-/commits/private-tserlin-ceph-6.1-rhel-9-test-bz2119217-patches

Thomas

--- Additional comment from Vikhyat Umrao on 2023-05-24 15:22:30 UTC ---

(In reply to tserlin from comment #57)
> (In reply to Vikhyat Umrao from comment #56)
> > As discussed in today's mtg and also in comment#54 and comment#55, moving
> > this one out of 6.1!
> 
> Will drop this from the 6.1 errata advisory as well.
> 
> Thomas

Thank you, Thomas.

--- Additional comment from Neha Ojha on 2023-06-01 20:22:28 UTC ---

Plan of action as per the discussion with all custodians:

We'll add a patch on top of https://bugzilla.redhat.com/show_bug.cgi?id=2119217#c55 to add bluestore-rdr as an osd_objectstore runtime option, which will invoke all the BlueStore changes needed for RDR. Since osd_objectstore is set at the time of mkfs, this implies

1. greenfield clusters: new clusters enabling RDR will need to set osd_objectstore=bluestore-rdr at install time (see the sketch at the end of this comment)

We need a separate BZ to track the work needed in OCS operator to take user input about RDR at install time, which can then be passed to rook to set osd_objectstore appropriately. 

2. brownfield clusters: upgrades will involve migrating one OSD at a time to osd_objectstore=bluestore-rdr.

We need to implement something very similar to the Filestore-to-BlueStore migration playbook in Rook, which would do the migration one by one with all the necessary flags set. Ideally the migration should be performed during a maintenance window.

We possibly need another BZ to track this work.
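
For the greenfield case, a minimal sketch of the setting itself (the exact plumbing through Rook/the OCS operator is what the proposed BZs would define; the "ceph config set" form below is an assumption):

# must be in effect before any OSDs are created, since osd_objectstore is read at mkfs time
ceph config set osd osd_objectstore bluestore-rdr
# verify
ceph config get osd osd_objectstore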

--- Additional comment from  on 2023-06-10 04:48:56 UTC ---

The testfix in comment #59 was x86_64 only, and Boris Ranto asked for a multi-arch (x86_64, ppc64le, s390x) testfix for ODF.

I rebuilt the testfix based on the RHCS 6.1 Release Candidate version (and likely GA), ceph-17.2.6-70. The previous testfix -patches branch rebased cleanly on top of the current ceph-6.1-rhel-patches. Details:

* rhceph-container-6-177.0.TEST.bz2119217
* Pull from: registry-proxy.engineering.redhat.com/rh-osbs/rhceph:6-177.0.TEST.bz2119217
* Brew link for container: https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=2543192
* Ceph build in container: ceph-17.2.6-70.0.TEST.bz2119217.el9cp

Based on this -patches branch (6d74fefa15d1216867d1d112b47bb83c4913d28f):
https://gitlab.cee.redhat.com/ceph/ceph/-/commits/private-tserlin-ceph-6.1-rhel-9-test-bz2119217-70-patches

Thomas

Comment 13 errata-xmlrpc 2023-12-13 15:20:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 7.0 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:7780

Comment 14 Red Hat Bugzilla 2024-04-12 04:25:19 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days