This can break common anti-affinity patterns, as described in bug 1817769. We should alert on this condition so that the cluster admin can easily discover and fix the problem (and so that we hear about it in Telemetry/Insights) without having to do a bunch of debugging and wondering about scheduler bugs.
I'm not a huge fan of creating alarms for bugs. Typically these metrics are wasteful.
Alerting on this costs CPU. Having devs/admins hunt for this costs salary. People are more expensive than computers. I'm open to alternatives to alerts for raising the visibility of this troubling condition, but lots of smart people looked at the must-gather for this cluster before Miciah noticed the localhost issue. I'd like the machines to chip in so that admins on the next cluster that hits this spend less time tracking it down. Do you have alternative ideas?
Created a JIRA to track the feature request: https://issues.redhat.com/browse/OCPNODE-344