Back to bug 1814057

Who When What Removed Added
Mark Jones 2020-03-16 22:21:51 UTC CC marjones
Gaurav Sitlani 2020-03-17 12:20:14 UTC CC gsitlani
Vikhyat Umrao 2020-03-18 04:13:41 UTC CC vumrao
Matthias Muench 2020-03-19 15:00:44 UTC CC mmuench
Neha Ojha 2020-03-19 20:11:06 UTC Assignee nojha sseshasa
Sebastian Gutierrez 2020-05-08 22:35:21 UTC CC segutier
Mark Jones 2020-05-19 16:59:16 UTC Severity medium high
Mark Jones 2020-06-03 16:40:59 UTC CC sseshasa
Flags needinfo?(sseshasa)
Sridhar Seshasayee 2020-06-05 09:51:29 UTC Flags needinfo?(sseshasa)
Neha Ojha 2020-07-01 23:45:24 UTC Status NEW ASSIGNED
Target Release 5.* 4.1
Target Milestone rc z2
Neha Ojha 2020-07-01 23:47:01 UTC Link ID Github ceph/ceph/pull/35798
Neha Ojha 2020-07-09 23:51:03 UTC Blocks 1855472
Josh Durgin 2020-07-15 15:45:40 UTC Priority unspecified medium
CC jdurgin
Neha Ojha 2020-08-25 01:14:55 UTC Status ASSIGNED POST
errata-xmlrpc 2020-08-25 07:32:07 UTC Status POST MODIFIED
CC tserlin
Fixed In Version ceph-14.2.8-100.el8cp, ceph-14.2.8-100.el7cp
Status MODIFIED ON_QA
Pawan 2020-09-04 06:14:44 UTC CC pdhiran
QA Contact mmurthy pdhiran
Pawan 2020-09-07 13:51:00 UTC Flags needinfo?(sseshasa)
Pawan 2020-09-07 16:45:40 UTC Flags needinfo?(sseshasa)
Sridhar Seshasayee 2020-09-07 18:04:51 UTC Flags needinfo?(sseshasa) needinfo?(sseshasa)
Pawan 2020-09-08 05:10:06 UTC Status ON_QA VERIFIED
Aron Gunn 2020-09-08 22:53:58 UTC CC agunn
Docs Contact agunn
Flags needinfo?(sseshasa)
Sridhar Seshasayee 2020-09-09 08:50:17 UTC Doc Text Feature:
The heartbeat grace timer is reset to the default value if
there have been no failures on an osd for an interval
exceeding a threshold value (48 hrs).

Reason:
The grace time is the interval beyond which a Ceph cluster
considers an osd as down in the absence of a heartbeat. The
grace time is scaled based on laggy estimations or in other
words adjusted based on how frequently an osd is
experiencing failures. However, in the absence of failures
on an osd for a prolonged period of time (for example, more than
48 hrs), there was no mechanism to reset the timer back to
the default value.

Result:
For an osd, if the failure interval between the last failure
and the latest failure exceeds the set threshold value (48
hrs), the grace timer is reset to the default value (20
secs).
Doc Type If docs needed, set a value Enhancement
Flags needinfo?(sseshasa)
Aron Gunn 2020-09-09 21:43:50 UTC Doc Text Feature:
The heartbeat grace timer is reset to the default value if
there have been no failures on an osd for an interval
exceeding a threshold value (48 hrs).

Reason:
The grace time is the interval beyond which a Ceph cluster
considers an osd as down in the absence of a heartbeat. The
grace time is scaled based on laggy estimations or in other
words adjusted based on how frequently an osd is
experiencing failures. However, in the absence of failures
on an osd for a prolonged period of time (for example, more than
48 hrs), there was no mechanism to reset the timer back to
the default value.

Result:
For an osd, if the failure interval between the last failure
and the latest failure exceeds the set threshold value (48
hrs), the grace timer is reset to the default value (20
secs).
.Update to the heartbeat grace period

Previously, when there were no Ceph OSD failures for more than 48 hours, there was no mechanism to reset the timer back to the default value. With this release, the heartbeat grace timer is reset to the default value of 20 seconds, if there have been no failures on a Ceph OSD for 48 hours. When the failure interval between the last failure and the latest failure exceeds 48 hours, the grace timer is reset to the default value of 20 seconds.

The grace time is the interval after which a Ceph storage cluster considers a Ceph OSD down in the absence of a heartbeat. The grace time is scaled based on lag estimations or on how frequently a Ceph OSD is experiencing failures.
Aron Gunn 2020-09-09 21:44:12 UTC Blocks 1816167
Aron Gunn 2020-09-09 21:46:28 UTC Doc Text .Update to the heartbeat grace period

Previously, when there were no Ceph OSD failures for more than 48 hours, there was no mechanism to reset the timer back to the default value. With this release, the heartbeat grace timer is reset to the default value of 20 seconds, if there have been no failures on a Ceph OSD for 48 hours. When the failure interval between the last failure and the latest failure exceeds 48 hours, the grace timer is reset to the default value of 20 seconds.

The grace time is the interval after which a Ceph storage cluster considers a Ceph OSD down in the absence of a heartbeat. The grace time is scaled based on lag estimations or on how frequently a Ceph OSD is experiencing failures.
.Update to the heartbeat grace period

Previously, when there were no Ceph OSD failures for more than 48 hours, there was no mechanism to reset the grace timer back to the default value. With this release, when the interval between the last failure and the latest failure on a Ceph OSD exceeds 48 hours, the heartbeat grace timer is reset to the default value of 20 seconds.

The grace time is the interval after which a Ceph storage cluster considers a Ceph OSD down in the absence of a heartbeat. The grace time is scaled based on lag estimations or on how frequently a Ceph OSD is experiencing failures.
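
The reset rule described in the doc text can be illustrated with a minimal Python sketch. This is not the Ceph implementation. The function name `effective_grace` and its parameters are hypothetical, and the scaled grace value is taken as an input because the scaling itself is not specified here. Only the 20-second default and the 48-hour threshold come from the text above.

```python
from datetime import datetime, timedelta

DEFAULT_GRACE = timedelta(seconds=20)   # default grace value from the doc text
RESET_THRESHOLD = timedelta(hours=48)   # reset threshold from the doc text


def effective_grace(scaled_grace, last_failure, latest_failure):
    """Return the grace period to apply when a new failure is reported.

    scaled_grace   -- grace period after lag-based scaling (hypothetical input)
    last_failure   -- timestamp of the previous failure on this OSD
    latest_failure -- timestamp of the newest failure on this OSD
    """
    if latest_failure - last_failure > RESET_THRESHOLD:
        # The OSD was failure-free for longer than the threshold,
        # so the accumulated scaling is discarded.
        return DEFAULT_GRACE
    # Failures are still close together: keep the scaled value.
    return scaled_grace


if __name__ == "__main__":
    t0 = datetime(2020, 9, 1)
    scaled = timedelta(seconds=35)  # illustrative value only
    print(effective_grace(scaled, t0, t0 + timedelta(hours=49)))
    print(effective_grace(scaled, t0, t0 + timedelta(hours=2)))
```

In this sketch an interval of exactly 48 hours does not trigger a reset, because the text says the interval must exceed the threshold.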
errata-xmlrpc 2020-09-30 15:57:34 UTC Status VERIFIED RELEASE_PENDING
errata-xmlrpc 2020-09-30 17:24:49 UTC Status RELEASE_PENDING CLOSED
Resolution --- ERRATA
Last Closed 2020-09-30 17:24:49 UTC
errata-xmlrpc 2020-09-30 17:25:28 UTC Link ID Red Hat Product Errata RHBA-2020:4144
