Bug 2209879

Summary: [RDR] clock skew detected on mon leads to ceph warning after deploying workloads
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Aman Agrawal <amagrawa>
Component: ceph
Assignee: Radoslaw Zarzynski <rzarzyns>
ceph sub component: RADOS
QA Contact: Elad <ebenahar>
Status: CLOSED NOTABUG
Severity: high
Priority: unspecified
CC: bniver, ekuric, muagarwa, nojha, ocs-bugs, odf-bz-bot, pbalogh, pcuzner, sagrawal, sostapov
Version: 4.13
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2023-05-26 08:04:28 UTC
Type: Bug

Comment 2 Paul Cuzner 2023-05-25 07:49:30 UTC
Ceph is normally the victim of time sync errors, not the cause :)

Can you please use a debug pod to check the state of chrony on the node where this monitor is running? If the time skew is real, then Ceph is doing the right thing. However, if you find that the hosts are all syncing correctly and Ceph is still reporting a skew... that's a different problem!
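The triage described in the paragraph above can be sketched as a small decision function. This is a minimal illustration, not Ceph's own timecheck code. It assumes you have already collected each monitor host's offset from NTP time (the "System time" line of `chronyc tracking`), treats one host as the reference monitor (Ceph's leader monitor compares the other monitors against itself), and uses 0.05 s as the allowed drift, which is the documented default of Ceph's `mon_clock_drift_allowed` option. The function name `classify_skew` and the host names and offsets in the usage example are hypothetical.

```python
from typing import Dict

# Assumption: mirrors the default of Ceph's mon_clock_drift_allowed (seconds).
MON_CLOCK_DRIFT_ALLOWED = 0.05


def classify_skew(offsets: Dict[str, float], reference: str,
                  ceph_reports_skew: bool,
                  allowed: float = MON_CLOCK_DRIFT_ALLOWED) -> str:
    """Hypothetical triage helper for a Ceph clock-skew warning.

    offsets: host -> offset from NTP time in seconds (positive = fast),
             as read from `chronyc tracking` on each monitor host.
    reference: host treated as the reference (leader) monitor.
    """
    ref = offsets[reference]
    # Hosts whose clock differs from the reference by more than the allowance.
    skewed = sorted(host for host, off in offsets.items()
                    if abs(off - ref) > allowed)
    if skewed:
        # The clocks really disagree: Ceph is the messenger, fix time sync.
        return "real skew on: " + ", ".join(skewed)
    if ceph_reports_skew:
        # Hosts agree with each other, yet Ceph still warns: a different problem.
        return "hosts in sync but Ceph reports skew: investigate further"
    return "no skew"


if __name__ == "__main__":
    # Hypothetical offsets; only mon-a's value comes from the output below.
    offsets = {"mon-a": 0.000335479, "mon-b": -0.002, "mon-c": 0.120}
    print(classify_skew(offsets, "mon-a", ceph_reports_skew=True))
    # prints "real skew on: mon-c"
```

Comparing each host against a single reference, rather than every pair, keeps the sketch close to how the leader monitor evaluates its peers.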

Either way, it would be helpful to see the chrony status.

For example:
[paul@redhat-jumpbox ~]$ oc debug node/sno1
Temporary namespace openshift-debug-tkbt4 is created for debugging node...
Starting pod/sno1-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.70.56.16
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# chronyc sources
MS Name/IP address         Stratum Poll Reach LastRx Last sample               
===============================================================================
^+ dns-e.ns4v.icu                2  10   377  1022    -15ms[  -15ms] +/-  175ms
^+ shaka.ruselabs.com            2  10   377   391    +13ms[  +13ms] +/-  117ms
^* time.cloudflare.com           3  10   377   396   -227us[ -183us] +/-   65ms
^+ 129.146.193.200               2  10   377   488  -6558us[-6514us] +/-  165ms
^? 10.5.27.10                    0  10     0     -     +0ns[   +0ns] +/-    0ns
^+ clock2.bos.redhat.com         2  10   377    94  +3431us[+3431us] +/-  177ms

sh-4.4# chronyc tracking
Reference ID    : A29FC801 (time.cloudflare.com)
Stratum         : 4
Ref time (UTC)  : Thu May 25 07:41:14 2023
System time     : 0.000335479 seconds fast of NTP time
Last offset     : +0.000044595 seconds
RMS offset      : 0.000084538 seconds
Frequency       : 19.144 ppm slow
Residual freq   : +0.000 ppm
Skew            : 0.010 ppm
Root delay      : 0.125449866 seconds
Root dispersion : 0.001973303 seconds
Update interval : 1027.5 seconds
Leap status     : Normal
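When several nodes need to be compared, the `chronyc tracking` output above can be reduced to a signed offset per host. The following is a minimal Python sketch, assuming the text has already been captured per node (for example with `oc debug node/<node> -- chroot /host chronyc tracking`) and that the "System time" line uses the "N seconds fast/slow of NTP time" wording shown above; other chrony versions may format it differently. The helper names `parse_tracking`, `system_offset_seconds` and `is_synchronised` are hypothetical. The sample text is copied from the output in this comment.

```python
import re
from typing import Dict

# Copied from the `chronyc tracking` output shown above.
SAMPLE_TRACKING = """\
Reference ID    : A29FC801 (time.cloudflare.com)
Stratum         : 4
Ref time (UTC)  : Thu May 25 07:41:14 2023
System time     : 0.000335479 seconds fast of NTP time
Last offset     : +0.000044595 seconds
RMS offset      : 0.000084538 seconds
Frequency       : 19.144 ppm slow
Residual freq   : +0.000 ppm
Skew            : 0.010 ppm
Root delay      : 0.125449866 seconds
Root dispersion : 0.001973303 seconds
Update interval : 1027.5 seconds
Leap status     : Normal
"""


def parse_tracking(text: str) -> Dict[str, str]:
    """Split each 'Key : value' line on its first colon into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def system_offset_seconds(fields: Dict[str, str]) -> float:
    """Return the offset from NTP time in seconds, positive when fast."""
    match = re.fullmatch(r"([0-9.]+) seconds (fast|slow) of NTP time",
                         fields["System time"])
    if match is None:
        raise ValueError("unexpected format: " + fields["System time"])
    value = float(match.group(1))
    return value if match.group(2) == "fast" else -value


def is_synchronised(fields: Dict[str, str]) -> bool:
    """chrony reports 'Not synchronised' as the leap status when unsynced."""
    return fields.get("Leap status") != "Not synchronised"


if __name__ == "__main__":
    fields = parse_tracking(SAMPLE_TRACKING)
    print(fields["Reference ID"])          # A29FC801 (time.cloudflare.com)
    print(system_offset_seconds(fields))   # 0.000335479
    print(is_synchronised(fields))         # True
```

The per-host offsets produced this way are the kind of input a skew comparison between monitor hosts would need.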