Bug 1848353
| Summary: | HostedEngine caches time from Host | ||
|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | Strahil Nikolov <hunter86_bg> |
| Component: | BLL.HostedEngine | Assignee: | Eli Mesika <emesika> |
| Status: | CLOSED WORKSFORME | QA Contact: | meital avital <mavital> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.3.10.4 | CC: | bugs, emesika, michal.skrivanek, mperina |
| Target Milestone: | ovirt-4.4.1 | Flags: | pm-rhel: ovirt-4.4+ |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-07-12 10:18:17 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Strahil Nikolov
2020-06-18 08:31:45 UTC
Can you add more thoughts? The mail thread gives some hints, but it looks like you know a bit more about the problem.

The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

(In reply to Michal Skrivanek from comment #1)
> can you add more thoughts? the mail thread gives some hints but it looks
> like you do know a bit more about the problem

The source of the problem is VdsBrokerObjectsBuilder::checkTimeDrift. It calls assignDatetimeValue to store the host date value, but assignDatetimeValue first tries to get the date from the cache instead of using the fresh date passed to it in the "input" struct. This was probably done for performance reasons, but IMO we can always override this value in the cache in order to follow changes made to the host time.

After trying to reproduce on 4.4, I saw that setting a date with a time-drift > 300 seconds on the host (using timedatectl to turn NTP off and set the date-time) was recognized by the engine, which started to log those messages. Turning NTP on again was also recognized by the engine, which stopped logging the messages for that host without an engine restart.

So, after a closer look at the code and testing the scenario, the current host date-time value is used. However, it took some time (< 1 min) until the engine got the correct date.

Do you still have the engine log and the vdsm log for the relevant host, so that we can explore the scenario further in your environment?
If not, can you please retest the following:

1) On the HE: timedatectl set-ntp no
2) timedatectl set-time <any time with time-drift > 300 from current time>
3) Follow engine.log; you should see those time-drift messages.
4) On the HE: timedatectl set-ntp yes
5) After a while (< 1 min), engine.log will stop writing those messages.

(In reply to Eli Mesika from comment #4)
Sorry, please test as follows:

> 1) on the HE

On the host with the time-drift, not the HE.

> timedatectl set-ntp no
> 2) timedatectl set-time <any time with time-drift > 300 from current time>
> 3) Follow engine.log , you should see those time-drift messages
> 4) on the HE

On the host with the time-drift, not the HE.

> timedatectl set-ntp yes
> 5) engine.log will not write any of that messages after a while ( < 1 min)

I do have the logs, but due to a logrotate issue one of the nodes has a 20G compressed log, so I need to extract that one.

It seems that the engine logs are missing.

Test:
1. On ovirt2, ran timedatectl set-ntp no.
2. On ovirt2, set the time to -40 min.
3. 5-6 min later, synced the node back.

Results:
All 3 VMs on ovirt2 had a "not responding" mark (question mark for status) during the time-drift situation, but the systems were responding and accessible over ssh. 1 event for the drift was reported.

```
2020-07-02 23:41:45,325+03 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-42) [] EVENT_ID: VDS_TIME_DRIFT_ALERT(604), Host ovirt2.localdomain has time-drift of 2494 seconds while maximum configured value is 300 seconds.
```

So, it seems the engine got the time sync and stopped reporting the drift.

I will try to reproduce, but this time with live migration to the other host.

I also tried to:
1. Set 'timedatectl set-ntp no' on ovirt1
2. Set global maintenance
3. Power off the engine (simulate patch + reboot)
4. Power on the HE on ovirt1

Results:
No time-drift reported.
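When repeating this kind of test, the drift alert can be pulled out of engine.log mechanically rather than by eye. The following is a minimal Python sketch, assuming the message wording matches the single VDS_TIME_DRIFT_ALERT line quoted in this report; `parse_drift_alert` and `DriftAlert` are hypothetical helper names, not part of oVirt.

```python
import re
from typing import NamedTuple, Optional

# Pattern modelled on the one alert line quoted in this report.
# Assumption: other engine versions may word the message differently.
DRIFT_RE = re.compile(
    r"EVENT_ID: VDS_TIME_DRIFT_ALERT\((?P<event_id>\d+)\), "
    r"Host (?P<host>\S+) has time-drift of (?P<drift>\d+) seconds "
    r"while maximum configured value is (?P<limit>\d+) seconds"
)


class DriftAlert(NamedTuple):
    host: str
    drift_seconds: int
    limit_seconds: int


def parse_drift_alert(line: str) -> Optional[DriftAlert]:
    """Return the host, drift and limit from an alert line, or None if absent."""
    match = DRIFT_RE.search(line)
    if match is None:
        return None
    return DriftAlert(match["host"], int(match["drift"]), int(match["limit"]))


# The log line from this report, used as sample input.
SAMPLE = (
    "2020-07-02 23:41:45,325+03 WARN "
    "[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] "
    "(EE-ManagedThreadFactory-engineScheduled-Thread-42) [] "
    "EVENT_ID: VDS_TIME_DRIFT_ALERT(604), Host ovirt2.localdomain has "
    "time-drift of 2494 seconds while maximum configured value is 300 seconds."
)

alert = parse_drift_alert(SAMPLE)
```

Running this over each line of engine.log during a test would show when alerts for a given host start and stop, which is the timing the retest steps are meant to observe.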
I guess this issue is very hard to reproduce. You can close it, as fixing something that cannot be reproduced will be extremely difficult.

Closing as WORKSFORME as suggested in comment 10. Please feel free to reopen when you have a clear reproduction scenario. |
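For reference, the caching concern raised earlier in this thread (a cached host date being preferred over the fresh one) can be illustrated with a small Python sketch. This shows only the suspected pattern and the proposed "always override" alternative; it is not the ovirt-engine code, and later testing in this thread indicated that the engine does use the current host time. `HostTimeCache`, `drift_exceeded` and `MAX_DRIFT_SECONDS` are hypothetical names; the 300-second limit is taken from the alert message quoted in the report.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict

MAX_DRIFT_SECONDS = 300  # limit quoted in the alert message in this report


class HostTimeCache:
    """Hypothetical per-host cache of the last reported host time."""

    def __init__(self) -> None:
        self._values: Dict[str, datetime] = {}

    def cache_first(self, host: str, fresh: datetime) -> datetime:
        # Suspected behaviour: once a value is cached, the fresh one is ignored.
        return self._values.setdefault(host, fresh)

    def always_override(self, host: str, fresh: datetime) -> datetime:
        # Proposed behaviour: the latest host time always replaces the cached one.
        self._values[host] = fresh
        return fresh


def drift_exceeded(engine_now: datetime, host_time: datetime,
                   limit: int = MAX_DRIFT_SECONDS) -> bool:
    """True if the absolute difference between the two clocks exceeds the limit."""
    return abs((engine_now - host_time).total_seconds()) > limit


# A host reports a clock 40 minutes behind, then a corrected clock.
now = datetime(2020, 7, 2, 20, 41, 45, tzinfo=timezone.utc)
skewed = now - timedelta(minutes=40)

cache = HostTimeCache()
cache.cache_first("ovirt2", skewed)
stale = cache.cache_first("ovirt2", now)      # still returns the skewed value
fresh = cache.always_override("ovirt2", now)  # returns the corrected value
```

With the cache-first variant the drift check keeps failing after the host clock is corrected, which would match the originally reported symptom; with the override variant the alert clears as soon as a fresh value arrives.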