Bug 1881660
| Summary: | READYMACHINECOUNT 0 for worker nodes on most of the recent OCP installations | | |
| --- | --- | --- | --- |
| Product: | OpenShift Container Platform | Reporter: | Anurag saxena <anusaxen> |
| Component: | Networking | Assignee: | Ben Bennett <bbennett> |
| Networking sub component: | ovn-kubernetes | QA Contact: | Anurag saxena <anusaxen> |
| Status: | CLOSED DUPLICATE | Docs Contact: | |
| Severity: | medium | | |
| Priority: | medium | CC: | aos-bugs, jokerman, kgarriso, zzhao |
| Version: | 4.6 | Flags: | anusaxen: needinfo- |
| Target Milestone: | --- | | |
| Target Release: | 4.7.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-09-24 21:42:07 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Anurag saxena
2020-09-22 20:17:11 UTC
Anurag, which network plugin is your env using? If it's OVN, it should be the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1880974; the root cause is that the machine-config-daemon pod cannot access 172.30.0.1:443.

Noticed on both sdn and ovn, so it seems to be irrespective of the network plugin and rather machine-config related.

Yep, per multiple runs today it's not happening on OpenShiftSDN but on OVNKubernetes, so you're right, Zhanqi; I suspect https://bugzilla.redhat.com/show_bug.cgi?id=1880974 might be the cause. Will let dev evaluate the statement. Thanks.

Please provide a must-gather from this cluster.

Also, the privatebin link above requires a password...

(In reply to Kirsten Garrison from comment #5)
> Also the privatebin link above requires a password...

Ahh, that's because the trailing "=" got excluded from the hyperlink somehow: https://privatebin-it-iso.int.open.paas.redhat.com/?82565729db2b9703#FdTB8xKK64KIz8NiO5SzHhR+vx9NKL48qqLgsMCNI2E=

Also, can you please include a full must-gather from this cluster, as requested above?

(In reply to Kirsten Garrison from comment #7)
> Also can you please include a full must gather from this cluster as requested above?

Yes, Kirsten. That's on my action item list, but my last cluster got pruned, so I'm reproducing this on a new cluster and will provide the must-gather details.

Thanks, Anurag! Let's leave the needinfo on the BZ until the must-gather is shared, for tracking purposes.

Thanks for the kubeconfig! The MCO looks like it's operating as expected, but the daemons on the worker nodes can't reach the API service. I see the same unexpected error in all of the daemon logs on the workers:

    E0923 21:05:21.644735 2124 reflector.go:127] github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to watch *v1.MachineConfig: failed to list *v1.MachineConfig: Get "https://172.30.0.1:443/apis/machineconfiguration.openshift.io/v1/machineconfigs?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: i/o timeout
    I0923 21:06:10.673012 2124 trace.go:205] Trace[16495265]: "Reflector ListAndWatch" name:k8s.io/client-go/informers/factory.go:134 (23-Sep-2020 21:05:40.672) (total time: 30000ms):
    Trace[16495265]: [30.000763329s] [30.000763329s] END
    E0923 21:06:10.673089 2124 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://172.30.0.1:443/api/v1/nodes?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: i/o timeout

This is preventing the MCO from finishing. Looking at the logs, this cluster is OVN as well, so I'd confirm your idea from https://bugzilla.redhat.com/show_bug.cgi?id=1881660#c3. I'm going to pass it to that team and let them verify/dupe this BZ to https://bugzilla.redhat.com/show_bug.cgi?id=1880974. (See the diagnostic sketch after the thread below.)

Thanks, Kirsten, for the initial investigation. This looks good on today's build, 4.6.0-0.nightly-2020-09-24-111253, after the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1880974 merged.

Good to hear, @Anurag - thanks for the update! Feel free to close this as a dupe.

*** This bug has been marked as a duplicate of bug 1880974 ***
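The thread above turns on whether the machine-config-daemon pods on the workers can reach the default kubernetes Service IP (172.30.0.1:443). Below is a minimal diagnostic sketch, not part of the original report, assuming the standard openshift-machine-config-operator namespace and the default 172.30.0.0/16 service network; the pod name placeholder and the availability of curl inside the daemon image are assumptions.

```shell
# Symptom: the worker pool reports READYMACHINECOUNT 0.
oc get machineconfigpool

# Confirm which network plugin the cluster runs (OpenShiftSDN vs OVNKubernetes).
oc get network.config/cluster -o jsonpath='{.status.networkType}{"\n"}'

# Look for the "dial tcp 172.30.0.1:443: i/o timeout" errors in a worker's
# machine-config-daemon logs (replace <mcd-pod> with a pod from the listing).
oc -n openshift-machine-config-operator get pods -o wide | grep machine-config-daemon
oc -n openshift-machine-config-operator logs <mcd-pod> -c machine-config-daemon | grep 172.30.0.1

# Try to reach the default kubernetes Service IP from inside that pod
# (assumes curl is present in the daemon image).
oc -n openshift-machine-config-operator rsh <mcd-pod> \
  curl -k --connect-timeout 10 https://172.30.0.1:443/version

# Collect the must-gather requested in the thread.
oc adm must-gather
```

On an affected OVN-Kubernetes cluster the curl would be expected to time out, matching the i/o timeout lines in the daemon logs, while the same check on an OpenShiftSDN cluster should return the API server's version payload.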