Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1782494

Summary: ceph osds restart is taking too much time in the converge step of a minor update
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: David Hill <dhill>
Component: Ceph-Ansible Assignee: Guillaume Abrioux <gabrioux>
Status: CLOSED DUPLICATE QA Contact: Vasishta <vashastr>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 3.2 CC: aschoen, ceph-eng-bugs, dsavinea, gmeno, nthomas, ykaul
Target Milestone: rc   
Target Release: 5.*   
Hardware: x86_64   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-12-16 16:31:25 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description David Hill 2019-12-11 17:42:51 UTC
Description of problem:
The ceph OSD restart is taking too much time, and the converge step of a minor update will most likely time out. In this cluster, we have 16 OSDs per node and 10 nodes, which means 160 OSDs to update.

In the output below, you can see that the first node took about 1 hour 6 minutes (09:54:32 to 11:00:15) to restart all its OSDs and the second about 27 minutes (11:00:15 to 11:27:27).


2019-12-11 09:54:32,393 p=12692 u=mistral |  RUNNING HANDLER [ceph-handler : restart ceph osds daemon(s) - container] *******
2019-12-11 09:54:32,394 p=12692 u=mistral |  Wednesday 11 December 2019  09:54:32 -0600 (0:00:01.308)       0:17:22.095 **** 
2019-12-11 11:00:15,743 p=12692 u=mistral |  changed: [10.10.10.1 -> 10.10.10.2] => (item=10.10.10.2)
2019-12-11 11:27:27,920 p=12692 u=mistral |  changed: [10.10.10.1 -> 10.10.10.3] => (item=10.10.10.3)
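
The per-node restart times quoted above can be checked directly from the log timestamps. The following is a minimal sketch, not part of the original report: the helper names (`parse_ts`, `step_durations`) are hypothetical, and the 10-node figure is only a rough extrapolation that assumes the remaining nodes restart at the average pace of the first two.

```python
from datetime import datetime, timedelta

# The three timestamped lines from the ansible log above.
LOG_LINES = [
    "2019-12-11 09:54:32,393 p=12692 u=mistral |  RUNNING HANDLER [ceph-handler : restart ceph osds daemon(s) - container] *******",
    "2019-12-11 11:00:15,743 p=12692 u=mistral |  changed: [10.10.10.1 -> 10.10.10.2] => (item=10.10.10.2)",
    "2019-12-11 11:27:27,920 p=12692 u=mistral |  changed: [10.10.10.1 -> 10.10.10.3] => (item=10.10.10.3)",
]


def parse_ts(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS,mmm' timestamp of a log line."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")


def step_durations(lines):
    """Return the elapsed time between each pair of consecutive log lines."""
    stamps = [parse_ts(line) for line in lines]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


durations = step_durations(LOG_LINES)
for node, elapsed in zip(("10.10.10.2", "10.10.10.3"), durations):
    print(f"{node}: {elapsed}")

# Assumption: the other nodes behave like the average of the first two.
NODE_COUNT = 10  # from the description: 10 nodes with 16 OSDs each
estimate = sum(durations, timedelta()) / len(durations) * NODE_COUNT
print(f"rough estimate for {NODE_COUNT} nodes: {estimate}")
```

Under that assumption the handler alone would run for several hours across the whole cluster, which is consistent with the converge step timing out.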



Version-Release number of selected component (if applicable):
Latest ceph-ansible 3.2.30.1-1

How reproducible:
Converge

Steps to Reproduce:
1. Do a minor update with lots of OSDs

Actual results:
converge step breaks

Expected results:
converge step completes

Additional info:

Comment 11 Dimitri Savineau 2019-12-16 16:31:25 UTC

*** This bug has been marked as a duplicate of bug 1784047 ***