Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
This project is now read-only. Starting Monday, February 2, please use https://ibm-ceph.atlassian.net/ for all bug tracking.

Bug 2047429

Summary: [Workload-DFG] [RHCS 5.1] - release criteria testing - small objects aged measure(hybrid) workload drops ~20% as compared to upgrade measure workload
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Rachana Patel <racpatel>
Component: RADOS
Assignee: Kamoltat (Junior) Sirivadhna <ksirivad>
Status: CLOSED NOTABUG
QA Contact: Pawan <pdhiran>
Severity: high
Docs Contact:
Priority: unspecified
Version: 5.1
CC: akupczyk, amathuri, bhubbard, ceph-eng-bugs, choffman, ksirivad, lflores, mbenjamin, nojha, pdhange, rfriedma, rzarzyns, skanta, sseshasa, twilkins, vumrao
Target Milestone: ---
Keywords: Performance
Target Release: 6.1
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-03-23 17:53:13 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Comment 3 Kamoltat (Junior) Sirivadhna 2022-01-28 21:18:52 UTC
As discussed with Vikhyat,

we want to enable debug logging for the autoscaler in the manager module; this can be done with the command:

 ``ceph config set mgr mgr/pg_autoscaler/log_level debug``

Now, what we want to investigate is the number of PGs reduced by the autoscaler as well as the number of PGs that are in backfill/recovery.
To do this we can use the command:

 ``ceph osd pool autoscale-status``

This will give us a table of all the pools in the cluster, including each pool's current PG_NUM and NEW PG_NUM.
From this, we can determine how many PGs each pool needs to gain or lose, which makes it easy to compare the total PG increase/decrease between 5.0 and 5.1.
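As a minimal sketch of that comparison, the snippet below summarizes per-pool PG deltas from the JSON form of the command (``ceph osd pool autoscale-status -f json``). The field names (``pool_name``, ``pg_num_target``, ``pg_num_final``) are assumptions based on typical autoscaler output and may differ by release:

```python
# Hypothetical sketch: summarize PG changes from `ceph osd pool autoscale-status -f json`.
# Field names ("pool_name", "pg_num_target", "pg_num_final") are assumed and may vary.
import json

def pg_deltas(status_json: str) -> dict:
    """Return {pool: new_pg_num - current_pg_num} for pools the autoscaler wants to change."""
    deltas = {}
    for pool in json.loads(status_json):
        current = pool["pg_num_target"]
        new = pool["pg_num_final"]
        if new != current:
            deltas[pool["pool_name"]] = new - current
    return deltas

# Sample data standing in for real autoscale-status output:
sample = json.dumps([
    {"pool_name": "rgw.buckets.data", "pg_num_target": 256, "pg_num_final": 128},
    {"pool_name": "rgw.meta", "pg_num_target": 8, "pg_num_final": 8},
])
print(pg_deltas(sample))  # {'rgw.buckets.data': -128}
```

Summing the values of the returned dict for a 5.0 run and a 5.1 run would give the total increase/decrease being compared.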

So I suggest we run ``ceph osd pool autoscale-status`` once immediately at the start (after deployment, once all pools are created), 2 more times during the fill workload, 3-4 more times during the hybrid workload, and once more at the end.
However, we need to keep the times at which we run ``ceph osd pool autoscale-status`` the same for both 5.0 and 5.1 so the results are as comparable as possible. Maybe a timer script?
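A timer script along these lines could capture the command output at fixed offsets, so the 5.0 and 5.1 runs sample at the same points in the workload. This is a hypothetical sketch, not the poller actually used for these tests; the command string is a parameter so it runs without a live cluster:

```python
# Hypothetical timer script: run a command at a fixed interval and append its
# timestamped output to a log file. For the tests this would be invoked with
# cmd="ceph osd pool autoscale-status" and interval_s=300 (5 minutes).
import datetime
import subprocess
import time

def capture(cmd: str, logfile: str, interval_s: int = 300, count: int = 4) -> None:
    for _ in range(count):
        stamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
        out = subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout
        with open(logfile, "a") as f:
            f.write(f"--- {stamp} ---\n{out}\n")
        time.sleep(interval_s)
```

Fixing the interval and the start point (e.g. workload start) keeps the snapshots aligned between the two versions.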


cluster.log will give us the number of PGs in backfill/recovery at each timestamp, which we can then compare between 5.0 and 5.1.
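Tallying those counts from cluster.log could look like the sketch below. The pgmap summary line format is an assumption based on typical Ceph cluster logs and may vary by release:

```python
import re

# Sketch for tallying backfill/recovery PGs from cluster.log. Assumes pgmap
# summary lines of the (typical) form:
#   "<timestamp> ... pgmap v100: 512 pgs: 480 active+clean, 20 active+remapped+backfilling, ..."
PGMAP = re.compile(r"^(\S+) .*?pgs: (.+)")

def degraded_counts(log_lines):
    """Yield (timestamp, PGs in any backfill/recovery state) per pgmap summary line."""
    for line in log_lines:
        m = PGMAP.match(line)
        if not m:
            continue
        stamp, states = m.group(1), m.group(2).split(";")[0]
        total = 0
        for part in states.split(","):
            fields = part.strip().split(" ", 1)
            if len(fields) == 2 and ("backfill" in fields[1] or "recover" in fields[1]):
                total += int(fields[0])
        yield stamp, total

# Sample line standing in for a real cluster.log entry:
sample = [
    "2022-01-28T21:00:00+0000 mon.a [INF] pgmap v100: 512 pgs: "
    "480 active+clean, 20 active+remapped+backfilling, 12 active+recovering; 1.0 TiB data",
]
print(list(degraded_counts(sample)))  # [('2022-01-28T21:00:00+0000', 32)]
```

Running this over the 5.0 and 5.1 cluster.log files would give the two time series to compare.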

Comment 4 Vikhyat Umrao 2022-01-28 21:33:31 UTC
(In reply to ksirivad from comment #3)
> As discussed with Vikhyat,
> 
Thanks, Junior.

> we want the manager module to enable debug log for the autoscaler, this can
> be done by using the command:
> 
>  ``ceph config set mgr mgr/pg_autoscaler/log_level debug``
> 
> Now, what we want to investigate is the number of PGs reduced by the
> autoscaler as well as the number of PGs that are in backfill/recovery.
> To do this we can use the command:
> 
>  ``ceph osd pool autoscale-status``
> 
> This will give us a table of all the pools in the cluster, containing the
> current PG_NUM and NEW PG_NUM.
> From this, we can determine the number of PGs that each pool needs to
> increase or decrease. Here we can easily compare the total number of pgs
> that increase/decrease between 5.0 and 5.1.
> 
> So I guess we can do a ``ceph osd pool autoscale-status`` immediately at the
> start (after deployment and all pools are created), 2 more during a fill
> workload, 3-4 more during the hybrid workload, and 1 more at the end.
> However, we need to keep the time at which we do ``ceph osd pool
> autoscale-status`` the same for each 5.0 and 5.1 so we can get them as
> accurate results as possible. Maybe a timer script?

You got it. We already have a poller script that runs this and many other commands every 5 minutes, so we should be all set!

> 
> 
> cluster.log will provide us with the number of backfill/recovery pgs at a
> certain timestamp and we can compare between 5.0 and 5.1



Tim - the one extra thing you need to do is run the following command [1] before starting the tests:

[1] ceph config set mgr mgr/pg_autoscaler/log_level debug

Comment 5 Vikhyat Umrao 2022-01-28 21:36:15 UTC
(In reply to Vikhyat Umrao from comment #4)
> (In reply to ksirivad from comment #3)
> > As discussed with Vikhyat,
> > 
> Thanks, Junior.
> 
> > we want the manager module to enable debug log for the autoscaler, this can
> > be done by using the command:
> > 
> >  ``ceph config set mgr mgr/pg_autoscaler/log_level debug``
> > 
> > Now, what we want to investigate is the number of PGs reduced by the
> > autoscaler as well as the number of PGs that are in backfill/recovery.
> > To do this we can use the command:
> > 
> >  ``ceph osd pool autoscale-status``
> > 
> > This will give us a table of all the pools in the cluster, containing the
> > current PG_NUM and NEW PG_NUM.
> > From this, we can determine the number of PGs that each pool needs to
> > increase or decrease. Here we can easily compare the total number of pgs
> > that increase/decrease between 5.0 and 5.1.
> > 
> > So I guess we can do a ``ceph osd pool autoscale-status`` immediately at the
> > start (after deployment and all pools are created), 2 more during a fill
> > workload, 3-4 more during the hybrid workload, and 1 more at the end.
> > However, we need to keep the time at which we do ``ceph osd pool
> > autoscale-status`` the same for each 5.0 and 5.1 so we can get them as
> > accurate results as possible. Maybe a timer script?
> 
> You got it. We already have a poller script! to do run and many more
> commands every 5 minutes so we should be all set!

Tim - to avoid any confusion: if the `ceph osd pool autoscale-status` command is not already being logged every 5 minutes, please make sure it is logged every 5 minutes.

Comment 42 Neha Ojha 2023-03-23 17:53:13 UTC
No update since https://bugzilla.redhat.com/show_bug.cgi?id=2047429#c19, closing for now. Please re-open if the issue reproduces.