Bug 1833153
Summary: | add a variable for sleep time of rook operator between checks of downed OSD+Node. | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat OpenShift Container Storage | Reporter: | svolkov |
Component: | rook | Assignee: | Sébastien Han <shan> |
Status: | CLOSED ERRATA | QA Contact: | Shrivaibavi Raghaventhiran <sraghave> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 4.4 | CC: | hnallurv, madam, muagarwa, nberry, ocs-bugs, ratamir, shan, tnielsen |
Target Milestone: | --- | ||
Target Release: | OCS 4.6.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | No Doc Update | |
Doc Text: | |
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2020-12-17 06:22:30 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1830015 | ||
Bug Blocks: |
Description
svolkov
2020-05-07 22:34:14 UTC
Moving to 4.5. The timeout of 5 minutes will do for 4.4.

With https://github.com/rook/rook/pull/5556 I'm proposing setting the time check interval to 60s. This will follow the pattern of checking mon health in about the same interval (45s). @Sagy, do you still see a need to make this configurable?

The change to query OSD status every 60s was merged downstream for 4.5 with https://github.com/openshift/rook/pull/65.

(In reply to Travis Nielsen from comment #3)
> With https://github.com/rook/rook/pull/5556 I'm proposing setting the time
> check interval to 60s. This will follow the pattern of checking mon health
> in about the same interval (45s).
>
> @Sagy, do you still see a need to make this configurable?

I would make this configurable. It will also help in QE testing and in POCs, not to mention that it gives the customer the ability to control failure handling.

Hi Travis, which variable will be used for this purpose?

Moving back to ASSIGNED to add the variable instead of simply leaving it at the constant of 60s.

Moving to 4.6 since it's not blocking.

Done in https://github.com/rook/rook/pull/5789 and resynced with https://github.com/openshift/rook/pull/85.

Hi Sagy, since this was also a special ask for a POC, would you like to confirm in the latest 4.6 build that the fix is what you asked for?

Neha, this was not a request for a POC; it's something I'm sure many customers will use. But I will test this and reply.

@svolkov Any updates on the BZ? Did you get a chance to test this?

Tested versions:
---------------
OCS - 4.6.0-rc5
OCP - 4.6

I did not get proper steps to verify this BZ, so I followed the steps to reproduce from BZ https://bugzilla.redhat.com/show_bug.cgi?id=1830015. We also did not hit any issue during the tier4 automation runs. Based on the above, moving this BZ to "SANITY VERIFIED".
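The periodic check discussed in this thread (query OSD status on a fixed interval, later made configurable with a 60s default) can be sketched as a small loop. This is a minimal Python sketch under stated assumptions, not Rook's implementation (the Rook operator is written in Go): the helpers `parse_interval` and `run_health_checks` are hypothetical, only a subset of Go-style duration strings ("60s", "1m30s") is parsed, and the `disabled` flag merely mirrors the `disabled: false` setting quoted in this bug.

```python
import re
import time
from typing import Callable, Optional

# Hypothetical default, matching the 60s OSD check interval discussed in this bug.
DEFAULT_INTERVAL_S = 60.0

# Illustrative subset of Go-style duration units (not a full time.ParseDuration port).
_UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|s|m|h)")


def parse_interval(value: Optional[str]) -> float:
    """Return the interval in seconds for strings like "60s" or "1m30s".

    An empty or missing value falls back to DEFAULT_INTERVAL_S.
    """
    if not value:
        return DEFAULT_INTERVAL_S
    parts = _PART.findall(value)
    # Reject leftovers such as "60" or "60sec" that the regex cannot fully consume.
    if not parts or "".join(num + unit for num, unit in parts) != value:
        raise ValueError(f"unsupported duration: {value!r}")
    seconds = sum(float(num) * _UNITS[unit] for num, unit in parts)
    if seconds <= 0:
        raise ValueError("interval must be positive")
    return seconds


def run_health_checks(
    check: Callable[[], None],
    interval: Optional[str] = None,
    iterations: int = 1,
    disabled: bool = False,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call check() `iterations` times, sleeping the parsed interval in between.

    Returns how many checks ran. A real operator would loop until shutdown;
    the bounded count and injectable sleep keep this sketch testable.
    """
    if disabled:
        return 0
    period = parse_interval(interval)
    for i in range(iterations):
        check()
        # No sleep after the final check, so the caller is not kept waiting.
        if i < iterations - 1:
            sleep(period)
    return iterations
```

Passing `sleep` in as a parameter is a design choice for testability: a test can record the requested waits (for example `[45.0, 45.0]` for three checks at "45s") instead of actually blocking.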
In PR [1] the interval for checking OSD health was made configurable in the CephCluster CR, with this default:

    healthCheck:
      daemonHealth:
        osd:
          disabled: false
          interval: 60s

See the documentation [2]. However, this setting is not exposed for OCS yet. I'd suggest a new BZ for that.

[1] https://github.com/rook/rook/pull/5789
[2] https://rook.github.io/docs/rook/v1.5/ceph-cluster-crd.html#health-settings

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.6.0 security, bug fix, enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5605