Bug 2184859
| Summary: | NodeSelector for tsc frequency does not tolerate small TSC variations | | |
|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | Germano Veit Michel <gveitmic> |
| Component: | Virtualization | Assignee: | sgott |
| Status: | CLOSED DUPLICATE | QA Contact: | Kedar Bidarkar <kbidarka> |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 4.12.2 | CC: | fdeutsch |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-04-12 12:03:59 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
I'm closing this as it appears to be a complete duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=2184860

Please feel free to re-open if I've misunderstood or if you intentionally opened two (and please clarify if so).

*** This bug has been marked as a duplicate of bug 2184860 ***

(In reply to sgott from comment #2)
> I'm closing this as it appears to be a complete duplicate of
> https://bugzilla.redhat.com/show_bug.cgi?id=2184860
>
> Please feel free to re-open if I've misunderstood or if you intentionally
> opened two. (and please clarify if so).
>
> *** This bug has been marked as a duplicate of bug 2184860 ***

No, you are right, it's a duplicate. Not sure why this happens sometimes: roughly once in 100 times, clicking submit creates 2 bugs with sequential numbers. I have had it before.
Description of problem:

The node labeller marks the nodes with:
* their exact TSC frequency
* the lowest TSC frequency in the cluster, IF they support tsc-scalable

Hypothetical example, on a cluster where the lowest frequency is X:

NAME    TSC-FREQUENCY   TSC-SCALABLE   TSC-FREQUENCY-X   TSC-FREQUENCY-Y
node1   X               true           true
node2   Y               true           true              true
node3   Y               false                            true

However, TSC calibration does not produce an exact number on many CPU models, leaving some possible variation. For example, with the same CPU on 2 different systems (or even on the same system across 2 reboots), the refined calibration can differ by 0.001 MHz (1 kHz):

[ 0.000000] tsc: Fast TSC calibration using PIT
[ 0.000000] tsc: Detected 2297.345 MHz processor
[ 4.127014] tsc: Refined TSC clocksource calibration: 2297.339 MHz

[ 0.000000] tsc: Fast TSC calibration using PIT
[ 0.000000] tsc: Detected 2297.449 MHz processor
[ 4.063010] tsc: Refined TSC clocksource calibration: 2297.338 MHz

If we take X = 2297.338 and Y = 2297.339, then a Windows VM with re-enlightenment will never run on node3, because node3 lacks the TSC-FREQUENCY-X label: its frequency is off by 0.001 MHz. The logic treats this as a heterogeneous cluster, but it is not. The system should be able to schedule VMs on any of those 3 nodes, regardless of TSC-SCALABLE, because these are essentially the same frequency. Lower layers accept this variance, see BZ1839095.

Version-Release number of selected component (if applicable):
4.12.10

How reproducible:
Always

Steps to Reproduce:
1. Use systems with TSC-SCALABLE = false and the same CPUs; reboot until they report different frequencies.

Actual results:
* VMs fail to schedule on nodes with the same CPU

Expected results:
* VMs scheduled
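
Additional info:

To illustrate the difference between the exact matching described above and the tolerant matching being requested, here is a minimal sketch using the X and Y values from the description. It is not KubeVirt's implementation: the function names are hypothetical, and the 250 ppm tolerance is an assumed value chosen only for illustration.

```python
# Illustrative sketch only. Function names and the tolerance value are
# assumptions made for this example, not KubeVirt's actual code.

TOLERANCE_PPM = 250  # assumed tolerance, in parts per million


def exact_match(node_hz: int, required_hz: int) -> bool:
    """Behaviour described in this report: frequencies must be identical."""
    return node_hz == required_hz


def tolerant_match(node_hz: int, required_hz: int,
                   tolerance_ppm: int = TOLERANCE_PPM) -> bool:
    """Treat two frequencies as equal if they differ by at most tolerance_ppm."""
    allowed_hz = required_hz * tolerance_ppm // 1_000_000
    return abs(node_hz - required_hz) <= allowed_hz


x_hz = 2_297_338_000  # X = 2297.338 MHz
y_hz = 2_297_339_000  # Y = 2297.339 MHz

print(exact_match(y_hz, x_hz))     # False: node3 is rejected today
print(tolerant_match(y_hz, x_hz))  # True: node3 would be accepted
```

With a check like `tolerant_match`, a node whose frequency really is different (for example 2400 MHz) would still be rejected, so genuinely heterogeneous clusters would keep being treated as such.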