Description of problem:
After following the procedure "Configuring OperatorHub for restricted networks" [1] in an IPv6 bare metal deployment, nodes end up in NotReady,SchedulingDisabled state:

[kni@provisionhost-0 ~]$ oc get nodes
NAME                                          STATUS                        ROLES           AGE   VERSION
master-0.ocp-edge-cluster.qe.lab.redhat.com   NotReady,SchedulingDisabled   master,worker   12h   v1.16.2
master-1.ocp-edge-cluster.qe.lab.redhat.com   Ready                         master,worker   12h   v1.16.2
master-2.ocp-edge-cluster.qe.lab.redhat.com   Ready                         master,worker   12h   v1.16.2

[1] https://access.redhat.com/documentation/en-us/openshift_container_platform/4.3/html-single/operators/index#olm-restricted-networks-operatorhub_olm-restricted-networks

Version-Release number of selected component (if applicable):
4.3.0-0.nightly-2020-02-21-091838-ipv6.3

How reproducible:
100%

Steps to Reproduce:
1. Deploy an IPv6 bare metal environment.
2. Run the procedure described at https://access.redhat.com/documentation/en-us/openshift_container_platform/4.3/html-single/operators/index#olm-restricted-networks-operatorhub_olm-restricted-networks

Actual results:
After running the procedure, some of the nodes end up in NotReady,SchedulingDisabled state.

Expected results:
Nodes are in Ready state after running the "Configuring OperatorHub for restricted networks" procedure.

Additional info:
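For reference, the core of the linked procedure [1] is to disable the default OperatorHub sources and point the cluster at a mirrored catalog. A minimal sketch of those steps, assuming the local mirror registry used in this environment (registry.ocp-edge-cluster.qe.lab.redhat.com:5000) and the commands as documented for 4.3:

# Disable the default, internet-backed OperatorHub sources
oc patch OperatorHub cluster --type json \
    -p '[{"op": "add", "path": "/spec/disableAllDefaultSources", "value": true}]'

# Build a catalog image from the redhat-operators app registry and push it
# to the local mirror registry
oc adm catalog build --appregistry-org redhat-operators \
    --to=registry.ocp-edge-cluster.qe.lab.redhat.com:5000/olm/redhat-operators:v1

# Mirror the catalog's operator content and generate the manifests
# (CatalogSource and ImageContentSourcePolicy) to apply to the cluster
oc adm catalog mirror registry.ocp-edge-cluster.qe.lab.redhat.com:5000/olm/redhat-operators:v1 \
    registry.ocp-edge-cluster.qe.lab.redhat.com:5000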
[kni@provisionhost-0 ~]$ oc get nodes master-0.ocp-edge-cluster.qe.lab.redhat.com -o yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    k8s.ovn.org/l3-gateway-config: '{"default":{"interface-id":"br-local_master-0.ocp-edge-cluster.qe.lab.redhat.com","ip-address":"fd99::2/64","mac-address":"42:24:c5:4e:6c:45","mode":"local","next-hop":"fd99::1","node-port-enable":"true","vlan-id":"0"}}'
    k8s.ovn.org/node-chassis-id: 3aeaac98-fa13-4026-a5ab-840bcd0239c5
    k8s.ovn.org/node-join-subnets: '{"default":"fd98::10/125"}'
    k8s.ovn.org/node-mgmt-port-mac-address: e2:dd:a4:85:85:11
    k8s.ovn.org/node-subnets: '{"default":"fd01:0:0:3::/64"}'
    machineconfiguration.openshift.io/currentConfig: rendered-master-bc96a37b957a32a299d54e386514c8e0
    machineconfiguration.openshift.io/desiredConfig: rendered-master-d69cc937b725ac36d82250e6b0c1096b
    machineconfiguration.openshift.io/reason: ""
    machineconfiguration.openshift.io/state: Working
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2020-02-25T02:37:19Z"
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: master-0.ocp-edge-cluster.qe.lab.redhat.com
    kubernetes.io/os: linux
    node-role.kubernetes.io/master: ""
    node-role.kubernetes.io/worker: ""
    node.openshift.io/os_id: rhcos
  name: master-0.ocp-edge-cluster.qe.lab.redhat.com
  resourceVersion: "218891"
  selfLink: /api/v1/nodes/master-0.ocp-edge-cluster.qe.lab.redhat.com
  uid: 8b250330-ee8d-4f69-93fc-64d25b7a4ca2
spec:
  taints:
  - effect: NoSchedule
    key: node.kubernetes.io/unschedulable
    timeAdded: "2020-02-25T14:56:02Z"
  - effect: NoSchedule
    key: node.kubernetes.io/unreachable
    timeAdded: "2020-02-25T14:59:36Z"
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    timeAdded: "2020-02-25T14:59:41Z"
  unschedulable: true
status:
  addresses:
  - address: fd2e:6f44:5dd8:c956::148
    type: InternalIP
  - address: master-0.ocp-edge-cluster.qe.lab.redhat.com
    type: Hostname
  allocatable:
    cpu: 15500m
    ephemeral-storage: "49681111368"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 32319712Ki
    pods: "250"
  capacity:
    cpu: "16"
    ephemeral-storage: 52644Mi
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 32934112Ki
    pods: "250"
  conditions:
  - lastHeartbeatTime: "2020-02-25T14:56:58Z"
    lastTransitionTime: "2020-02-25T14:59:36Z"
    message: Kubelet stopped posting node status.
    reason: NodeStatusUnknown
    status: Unknown
    type: MemoryPressure
  - lastHeartbeatTime: "2020-02-25T14:56:58Z"
    lastTransitionTime: "2020-02-25T14:59:36Z"
    message: Kubelet stopped posting node status.
    reason: NodeStatusUnknown
    status: Unknown
    type: DiskPressure
  - lastHeartbeatTime: "2020-02-25T14:56:58Z"
    lastTransitionTime: "2020-02-25T14:59:36Z"
    message: Kubelet stopped posting node status.
    reason: NodeStatusUnknown
    status: Unknown
    type: PIDPressure
  - lastHeartbeatTime: "2020-02-25T14:56:58Z"
    lastTransitionTime: "2020-02-25T14:59:36Z"
    message: Kubelet stopped posting node status.
    reason: NodeStatusUnknown
    status: Unknown
    type: Ready
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
  images:
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4ac9818465cbe07f63efbd9d38647aaf6f2f8759ffec13903a5a97bcdfab3be4
    - <none>:<none>
    sizeBytes: 830103640
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6f2788587579df093bf990268390f7f469cce47e45a7f36b04b74aa00d1dd9e0
    - <none>:<none>
    sizeBytes: 727725367
  - names:
    - registry.svc.ci.openshift.org/ipv6/ovn-kubernetes@sha256:9bb0217b2dd42d2a963b97d2247832a87f289bd537ad3a154ddd5342edc4da6a
    - <none>:<none>
    sizeBytes: 648407045
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7732dea30c4b20a3df39fb18edfc3f1e78bb2addcabe49834491f19aa1d6c4a1
    - <none>:<none>
    sizeBytes: 474643035
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:41a180d934a95487f00aa2cc41f2e78b6d504e2fe39c18c66123c9e62c776953
    - <none>:<none>
    sizeBytes: 467784344
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:099032ec5bd9219642474a5859998c4f338221f843ff19823b1eebd58bb9ab5a
    - <none>:<none>
    sizeBytes: 423152423
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5b57acafd25ff81623864415bf17741fe1147fcb54b71535f6608c3e8a1aaedc
    - <none>:<none>
    sizeBytes: 409924228
  - names:
    - registry.svc.ci.openshift.org/ipv6/machine-config-operator@sha256:01e1fb5bd114ec241f848467004fdfcea47b286717a09cfe434a50e71d675c02
    - <none>:<none>
    sizeBytes: 407437801
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:669493ce03a4c5d94cbae7ac5c2caeb79b8c1d4b4fefc4767c3e49641b3a8a6f
    - <none>:<none>
    sizeBytes: 372668432
  - names:
    - registry.svc.ci.openshift.org/ipv6/cluster-network-operator@sha256:ffd32018be544ebb681a9329487c107c0196aa8cbbce3965a10afb5a63e27b19
    - <none>:<none>
    sizeBytes: 350026677
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ca521acc8a1411f8e4781806acfcd2040405045714dced5975e143569106fc88
    - <none>:<none>
    sizeBytes: 341241550
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:afdb21a7d5f1977518de93d30cb526e9d82c26f18f6d3b0e411ec94ae7c98d93
    - <none>:<none>
    sizeBytes: 332965897
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ecacd961eff8cb8fbed1bda112dd14539c4b0c1b48387e329fc3c9e74bf30239
    - <none>:<none>
    sizeBytes: 332439248
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1e2d4e5ee1a0ebb9cf599c7f45d4ade4e5ee6b7afc9f2874987edd90f71df32a
    - <none>:<none>
    sizeBytes: 332304625
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a4375971d238f18ada1a996cd3f7709853383c2476f19b962b0c544e9d4e324b
    - <none>:<none>
    sizeBytes: 331306685
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3b661a253769515ebf095c7e3177d2221afd640190d992a49cdb04a1fc9fce12
    - <none>:<none>
    sizeBytes: 329723524
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a98ef6605fc2b7b20edbaac7d890045f374edb8a770d4c562eeebb874ccb9bb6
    - <none>:<none>
    sizeBytes: 317623077
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7014dfe5aab13a081a171299b2a412e2ac3205abf90db0380664bf6d3ff4e812
    - <none>:<none>
    sizeBytes: 315303502
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a7b4725c6cd9a5acc63bdbc3d16f5cc195a2a34b456ad8d3b3b9fcd7346a9864
    - <none>:<none>
    sizeBytes: 315105804
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b0ae86a71b4fff62d36c86c105be65f740342165a6d560dde7f0e546e9edf4af
    - <none>:<none>
    sizeBytes: 311932447
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b0a7f2970c2f3d8f06ce0a743c72474f60581fbf3b4917821926418808ea5928
    - <none>:<none>
    sizeBytes: 311593655
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b0a7014c7e86c42f78239d2381d7ee32bd63b3a83a3958b42d09909c325effe4
    - <none>:<none>
    sizeBytes: 310761197
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bda83a90a05ed97033c6065ac35d14e0f0be24d0bcd33fd15221b7b5ba7966f5
    - <none>:<none>
    sizeBytes: 309229038
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:06625988a38a75d7abeba3f5d31f013c8189626444af4492a890f592eab76c10
    - <none>:<none>
    sizeBytes: 308272311
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:101df6bc3b6e171f80d305bdfef699ae34388d2997f993415aee144f08d15654
    - <none>:<none>
    sizeBytes: 305264173
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e32deb224aeed8135d9fc42b516cf07a2391a933e6438e0bac12bf4dfe76f8f8
    - <none>:<none>
    sizeBytes: 304852163
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:388e17432b72c0cb9586b387176ce1500517f2548fd5bba34588de0ee96b6c6d
    - <none>:<none>
    sizeBytes: 301807729
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:359f28880b4110cbd03cc8397540f83d4e7605a51c771117003f488bac569ab1
    - <none>:<none>
    sizeBytes: 300165346
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4b908f3195060f494590ced95bce23ff027af730a93e5334d57ccfb28fe0f8f1
    - <none>:<none>
    sizeBytes: 299459222
  - names:
    - registry.svc.ci.openshift.org/ipv6/cluster-kube-apiserver-operator@sha256:c744f7b8c0a2086bdb45bb94ee491a2623d6e956831da99b710a67c26106a844
    - <none>:<none>
    sizeBytes: 298765295
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6fd131fcda17fc956dce1965204c259146151da8276817861adb5baee23d7577
    - <none>:<none>
    sizeBytes: 297132052
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:445b67bd0f586a1bff691cd019c461a8623bd70c8c4d00a5d723a417cf5f038e
    - <none>:<none>
    sizeBytes: 292584523
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c90cafc16f4050d736ed4c6c86f2469b429036a50e27ea3c3639409a246849a4
    - <none>:<none>
    sizeBytes: 292581591
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f61ed683406ed6cd2a3b3117e82c83211c7551d1e4280ff5ce7462f8d650e79a
    - <none>:<none>
    sizeBytes: 285919312
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5759e2c0ae8ce3193cdb408c4a2485866397d8623dfa05a4d1c20adcd48cf073
    - <none>:<none>
    sizeBytes: 279623253
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8109601903d3fb1072e5ee02a3b6c2dc6628c0fff26d69c924c9f7ce4c17e22d
    - <none>:<none>
    sizeBytes: 277849752
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ca8762a0c0f0177629d8f6fbebbddcb3438ca8990d8ed993d8dc04574e4a974c
    - <none>:<none>
    sizeBytes: 271271819
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e559edba536c2247aa9286cfb2fa6520516df39f023546acc0b58dfd2d6ef627
    - <none>:<none>
    sizeBytes: 264215014
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0b2f4d3140f87845fab5a88c57b6cb0900774699ea30ba26270f25532abaa2fb
    - <none>:<none>
    sizeBytes: 258011246
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5827985410ea82e3f4f0808cc601e2ed834b7a800304e12079bced988b49dccd
    - <none>:<none>
    sizeBytes: 256980613
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:475f5ece24e33bfa38ac9b2678db33eae039ab10509e7544096b4c96aba0db86
    - <none>:<none>
    sizeBytes: 255944919
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ff152532ba80cce2351febf6925762017095c9bf26840dff208ff8e3e2ccafbe
    - <none>:<none>
    sizeBytes: 250795582
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:83c963d56fe4738ec023ac4b740fa1417e30379e138f12ccff82bc193aa610d8
    - <none>:<none>
    sizeBytes: 238844227
  nodeInfo:
    architecture: amd64
    bootID: b7c77884-89ae-41a2-868d-38e2a01ef63e
    containerRuntimeVersion: cri-o://1.16.3-22.dev.rhaos4.3.git11c04e3.el8
    kernelVersion: 4.18.0-147.5.1.el8_1.x86_64
    kubeProxyVersion: v1.16.2
    kubeletVersion: v1.16.2
    machineID: cd6bf73f0e98411e84d875c27eccba68
    operatingSystem: linux
    osImage: Red Hat Enterprise Linux CoreOS 43.81.202002170853.0 (Ootpa)
    systemUUID: cd6bf73f-0e98-411e-84d8-75c27eccba68
[root@master-0 core]# systemctl status kubelet
Warning: The unit file, source configuration file or drop-ins of kubelet.service changed on disk. Run 'systemctl daemon-reload' to reload units.
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-default-env.conf, 20-nodenet.conf
   Active: active (running) since Tue 2020-02-25 14:59:20 UTC; 38min ago
  Process: 3370 ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state (code=exited, status=0/SUCCESS)
  Process: 3368 ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests (code=exited, status=0/SUCCESS)
 Main PID: 3372 (hyperkube)
    Tasks: 40 (limit: 26213)
   Memory: 210.9M
      CPU: 1min 19.154s
   CGroup: /system.slice/kubelet.service
           └─3372 /usr/bin/hyperkube kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/ru>

Feb 25 15:38:07 master-0.ocp-edge-cluster.qe.lab.redhat.com hyperkube[3372]: I0225 15:38:07.454790    3372 prober.go:129] Liveness probe for "openshift-kube-scheduler-localhost.localdomain_openshift-kube-scheduler(37a2869826ee72d5a8ee916>
Feb 25 15:38:07 master-0.ocp-edge-cluster.qe.lab.redhat.com hyperkube[3372]: E0225 15:38:07.530876    3372 kubelet.go:2275] node "localhost.localdomain" not found
Feb 25 15:38:07 master-0.ocp-edge-cluster.qe.lab.redhat.com hyperkube[3372]: E0225 15:38:07.631139    3372 kubelet.go:2275] node "localhost.localdomain" not found
Feb 25 15:38:07 master-0.ocp-edge-cluster.qe.lab.redhat.com hyperkube[3372]: E0225 15:38:07.731375    3372 kubelet.go:2275] node "localhost.localdomain" not found
Feb 25 15:38:07 master-0.ocp-edge-cluster.qe.lab.redhat.com hyperkube[3372]: E0225 15:38:07.831633    3372 kubelet.go:2275] node "localhost.localdomain" not found
Feb 25 15:38:07 master-0.ocp-edge-cluster.qe.lab.redhat.com hyperkube[3372]: E0225 15:38:07.931866    3372 kubelet.go:2275] node "localhost.localdomain" not found
Feb 25 15:38:08 master-0.ocp-edge-cluster.qe.lab.redhat.com hyperkube[3372]: E0225 15:38:08.032129    3372 kubelet.go:2275] node "localhost.localdomain" not found
Feb 25 15:38:08 master-0.ocp-edge-cluster.qe.lab.redhat.com hyperkube[3372]: I0225 15:38:08.088550    3372 httplog.go:90] GET /metrics/cadvisor: (42.690275ms) 200 [Prometheus/2.14.0 [fd2e:6f44:5dd8:c956::135]:49376]
Feb 25 15:38:08 master-0.ocp-edge-cluster.qe.lab.redhat.com hyperkube[3372]: I0225 15:38:08.107950    3372 prober.go:129] Readiness probe for "coredns-localhost.localdomain_openshift-kni-infra(50d1e0101b9a4c5cc8d8bc70799a0083):coredns" s>
Feb 25 15:38:08 master-0.ocp-edge-cluster.qe.lab.redhat.com hyperkube[3372]: E0225 15:38:08.132400    3372 kubelet.go:2275] node "localhost.localdomain" not found
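The repeated node "localhost.localdomain" not found errors suggest the kubelet registered under the wrong hostname. A quick way to confirm that on the affected master (standard commands, not part of the original report):

# Hostname the OS currently reports
hostnamectl status
# Node names as the cluster sees them, for comparison
oc get nodes -o wide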
Trying to restart kubelet I can see:

[root@master-0 core]# systemctl restart kubelet
Warning: The unit file, source configuration file or drop-ins of kubelet.service changed on disk. Run 'systemctl daemon-reload' to reload units.

After the restart, kubelet still shows the same errors. Checking the CSRs:

[kni@provisionhost-0 ~]$ export KUBECONFIG=clusterconfigs/auth/kubeconfig
[kni@provisionhost-0 ~]$ oc get csr
NAME        AGE     REQUESTOR                                                 CONDITION
csr-5dcqm   36m     system:node:master-0.ocp-edge-cluster.qe.lab.redhat.com   Pending
csr-66wz2   20m     system:node:master-0.ocp-edge-cluster.qe.lab.redhat.com   Pending
csr-ntqr8   51m     system:node:master-0.ocp-edge-cluster.qe.lab.redhat.com   Pending
csr-r6rq8   5m49s   system:node:master-0.ocp-edge-cluster.qe.lab.redhat.com   Pending
csr-zx69t   108s    system:node:master-0.ocp-edge-cluster.qe.lab.redhat.com   Pending
[kni@provisionhost-0 ~]$ oc adm certificate approve csr-5dcqm
certificatesigningrequest.certificates.k8s.io/csr-5dcqm approved
[kni@provisionhost-0 ~]$ oc adm certificate approve csr-66wz2
certificatesigningrequest.certificates.k8s.io/csr-66wz2 approved
[kni@provisionhost-0 ~]$ oc adm certificate approve csr-ntqr8
certificatesigningrequest.certificates.k8s.io/csr-ntqr8 approved
[kni@provisionhost-0 ~]$ oc adm certificate approve csr-r6rq8
certificatesigningrequest.certificates.k8s.io/csr-r6rq8 approved
[kni@provisionhost-0 ~]$ oc adm certificate approve csr-zx69t
certificatesigningrequest.certificates.k8s.io/csr-zx69t approved
[kni@provisionhost-0 ~]$ oc get csr
NAME        AGE     REQUESTOR                                                 CONDITION
csr-5dcqm   36m     system:node:master-0.ocp-edge-cluster.qe.lab.redhat.com   Approved,Issued
csr-66wz2   21m     system:node:master-0.ocp-edge-cluster.qe.lab.redhat.com   Approved,Issued
csr-ntqr8   51m     system:node:master-0.ocp-edge-cluster.qe.lab.redhat.com   Approved,Issued
csr-r6rq8   6m40s   system:node:master-0.ocp-edge-cluster.qe.lab.redhat.com   Approved,Issued
csr-zx69t   2m39s   system:node:master-0.ocp-edge-cluster.qe.lab.redhat.com   Approved,Issued

Checking the nodes we can see:

[kni@provisionhost-0 ~]$ oc get nodes
NAME                                          STATUS                     ROLES           AGE    VERSION
localhost.localdomain                         Ready                      master,worker   103s   v1.16.2
master-0.ocp-edge-cluster.qe.lab.redhat.com   NotReady                   master,worker   13h    v1.16.2
master-1.ocp-edge-cluster.qe.lab.redhat.com   Ready,SchedulingDisabled   master,worker   13h    v1.16.2
master-2.ocp-edge-cluster.qe.lab.redhat.com   Ready                      master,worker   13h    v1.16.2

Two issues here:
1. master-0 changed its hostname to localhost
2. master-1 went into SchedulingDisabled state

It looks like the master mcp is updating:

[kni@provisionhost-0 ~]$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT
master   rendered-master-bc96a37b957a32a299d54e386514c8e0   False     True       False      4              0                   1                     0
worker   rendered-worker-868410f8330337e8136d160eaa007384   True      False      False      0              0                   0                     0
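Approving each CSR by hand gets tedious while the kubelet keeps re-requesting certificates. The standard oc one-liner to approve all currently pending CSRs in one pass (shown here as a convenience, not part of the original report):

oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' \
    | xargs --no-run-if-empty oc adm certificate approve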
It looks like it got stuck trying to update the node:

machineconfiguration.openshift.io/currentConfig: rendered-master-bc96a37b957a32a299d54e386514c8e0
machineconfiguration.openshift.io/desiredConfig: rendered-master-d69cc937b725ac36d82250e6b0c1096b

NotReady,SchedulingDisabled generally indicates it died on reboot. Looking at the instructions in https://access.redhat.com/documentation/en-us/openshift_container_platform/4.3/html-single/operators/index#olm-restricted-networks-operatorhub_olm-restricted-networks I'm not really sure why it tried to apply an updated machineconfig, though. Could you provide a must-gather? This might also be more related to the node component than the MCO.
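One way to spot which nodes are stuck mid-update is to compare the MCO annotations across all nodes; a sketch using standard oc custom-columns (the escaped dots are needed because the annotation keys themselves contain dots):

oc get nodes -o custom-columns='NAME:.metadata.name,CURRENT:.metadata.annotations.machineconfiguration\.openshift\.io/currentConfig,DESIRED:.metadata.annotations.machineconfiguration\.openshift\.io/desiredConfig,STATE:.metadata.annotations.machineconfiguration\.openshift\.io/state'

A node whose currentConfig and desiredConfig differ is still being worked on by the machine-config-daemon.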
I reproduced the issue: the master-0 node gets rebooted, and after the reboot it comes up with a bad hostname (localhost). must-gather doesn't seem to work, as the cluster doesn't have access to quay.io. Is there any specific log that I can provide from the master nodes?

[kni@provisionhost-0 ~]$ oc adm must-gather
[must-gather ] OUT unable to resolve the imagestream tag openshift/must-gather:latest
[must-gather ] OUT
[must-gather ] OUT Using must-gather plugin-in image: quay.io/openshift/origin-must-gather:latest
[must-gather ] OUT namespace/openshift-must-gather-rfcbz created
[must-gather ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-msgvj created
[must-gather ] OUT pod for plug-in image quay.io/openshift/origin-must-gather:latest created

[kni@provisionhost-0 ~]$ oc get is -n openshift must-gather -o yaml
apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  creationTimestamp: "2020-02-26T03:19:59Z"
  generation: 2
  name: must-gather
  namespace: openshift
  resourceVersion: "12739"
  selfLink: /apis/image.openshift.io/v1/namespaces/openshift/imagestreams/must-gather
  uid: c9fe41b8-5ae1-48af-b2e2-da1ee3112c30
spec:
  lookupPolicy:
    local: false
  tags:
  - annotations: null
    from:
      kind: DockerImage
      name: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b8bc76d838a68c0ac2bd8f8c65cc21852085d44f1f1d07f57a39ecd496ce5706
    generation: 2
    importPolicy:
      scheduled: true
    name: latest
    referencePolicy:
      type: Source
status:
  dockerImageRepository: ""
  tags:
  - conditions:
    - generation: 2
      lastTransitionTime: "2020-02-26T03:19:59Z"
      message: 'Internal error occurred: [registry.ocp-edge-cluster.qe.lab.redhat.com:5000/localimages/local-release-image@sha256:b8bc76d838a68c0ac2bd8f8c65cc21852085d44f1f1d07f57a39ecd496ce5706: Get https://registry.ocp-edge-cluster.qe.lab.redhat.com:5000/v2/: x509: certificate signed by unknown authority, quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b8bc76d838a68c0ac2bd8f8c65cc21852085d44f1f1d07f57a39ecd496ce5706: Get https://quay.io/v2/: dial tcp 52.45.33.205:443: connect: network is unreachable]'
      reason: InternalError
      status: "False"
      type: ImportSuccess
    items: null
    tag: latest
When disconnected, must-gather can be run using the --image flag:

oc adm must-gather --image myregistry.example.com/must-gather

You might have to separately mirror the image used for must-gather if it has not already been picked up by the release mirror command.
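A sketch of that separate mirroring step, assuming the local mirror registry from this environment as the destination (standard oc image mirror usage):

oc image mirror quay.io/openshift/origin-must-gather:latest \
    registry.ocp-edge-cluster.qe.lab.redhat.com:5000/openshift/must-gather:latest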
The OLM disconnected workflow creates an ImageContentSourcePolicy; go ahead and attach the ICSP that was created.
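For context: an ImageContentSourcePolicy maps source repositories to mirror repositories and is rendered by the MCO into each node's CRI-O registry configuration, which is why creating one triggers a machine config rollout (and, on 4.3, node reboots). A minimal sketch of the shape of such an object; the name and the repositories below are hypothetical placeholders, not the attached ICSP:

cat <<'EOF' | oc apply -f -
apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: example-mirror                    # hypothetical name
spec:
  repositoryDigestMirrors:
  - mirrors:
    # hypothetical mirror repository in the local registry
    - registry.ocp-edge-cluster.qe.lab.redhat.com:5000/openshift4/ose-example-operator
    # hypothetical source repository being mirrored
    source: registry.redhat.io/openshift4/ose-example-operator
EOF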
Created attachment 1667327 [details]
ImageContentSourcePolicy

I managed to reproduce the issue on 4.3.0-0.nightly-2020-03-01-194304. The issue occurs after creating the ImageContentSourcePolicy; attaching it. At this point `oc adm must-gather` doesn't return anything, as the pods are stuck in Pending state:

[kni@provisionhost-0 ~]$ oc adm must-gather --image registry.ocp-edge-cluster.qe.lab.redhat.com:5000/openshift/must-gather
[must-gather ] OUT Using must-gather plugin-in image: registry.ocp-edge-cluster.qe.lab.redhat.com:5000/openshift/must-gather
[must-gather ] OUT namespace/openshift-must-gather-j6n72 created
[must-gather ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-x9mkn created
[must-gather ] OUT pod for plug-in image registry.ocp-edge-cluster.qe.lab.redhat.com:5000/openshift/must-gather created

[kni@provisionhost-0 ~]$ oc get pods -A | grep must-gather
openshift-must-gather-2882s   must-gather-dr8hx   0/1   Pending   0   15m
openshift-must-gather-j6n72   must-gather-hjmz7   0/1   Pending   0   5m5s
openshift-must-gather-wsfrj   must-gather-9vbs4   0/1   Pending   0   11m

[kni@provisionhost-0 ~]$ oc get nodes
NAME                                          STATUS                        ROLES    AGE     VERSION
master-0.ocp-edge-cluster.qe.lab.redhat.com   Ready                         master   4h1m    v1.16.2
master-1.ocp-edge-cluster.qe.lab.redhat.com   Ready                         master   4h1m    v1.16.2
master-2.ocp-edge-cluster.qe.lab.redhat.com   NotReady,SchedulingDisabled   master   4h1m    v1.16.2
worker-0.ocp-edge-cluster.qe.lab.redhat.com   NotReady,SchedulingDisabled   worker   3h45m   v1.16.2
worker-1.ocp-edge-cluster.qe.lab.redhat.com   Ready                         worker   3h44m   v1.16.2
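To see why the pods never schedule, the events of one of the Pending pods can be inspected (pod and namespace names taken from the output above):

oc describe pod must-gather-hjmz7 -n openshift-must-gather-j6n72
# The Events section at the end typically shows the scheduling failure,
# e.g. nodes being reported NotReady or unschedulable.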
This one is a blocker for the management pillar; we'll need to escalate.
The issue blocks installing operators from a disconnected registry, which is needed for the ACM disconnected deployment (ACM should be installed as an extra operator on the IPv6 OCP environment).
After digging a bit I found the following:

Mar 03 22:05:08 localhost NetworkManager[1947]: <warn>  [1583273108.4627] dispatcher: (1) /etc/NetworkManager/dispatcher.d/30-resolv-prepender failed (exec failed): Failed to execute child process “/etc/NetworkManager/dispatcher.d/30-resolv-prepender” (Permission denied)
Mar 03 22:05:08 localhost NetworkManager[1947]: <warn>  [1583273108.4628] dispatcher: (1) /etc/NetworkManager/dispatcher.d/40-mdns-hostname failed (exec failed): Failed to execute child process “/etc/NetworkManager/dispatcher.d/40-mdns-hostname” (Permission denied)
Mar 03 22:05:08 localhost NetworkManager[1947]: <warn>  [1583273108.4728] dispatcher: (2) /etc/NetworkManager/dispatcher.d/30-resolv-prepender failed (exec failed): Failed to execute child process “/etc/NetworkManager/dispatcher.d/30-resolv-prepender” (Permission denied)
Mar 03 22:05:08 localhost NetworkManager[1947]: <warn>  [1583273108.4728] dispatcher: (2) /etc/NetworkManager/dispatcher.d/40-mdns-hostname failed (exec failed): Failed to execute child process “/etc/NetworkManager/dispatcher.d/40-mdns-hostname” (Permission denied)

Given that /etc/mdns/hostname existed, and that it is only created by /etc/NetworkManager/dispatcher.d/40-mdns-hostname, some MCO operation after the initial boot must have broken the permissions. Looking at SELinux reveals that:

-rwxr-xr-x. 1 root root system_u:object_r:NetworkManager_initrc_exec_t:s0  100 Mar  3 18:43 04-iscsi
-rwxr-xr-x. 1 root root system_u:object_r:NetworkManager_initrc_exec_t:s0 1062 Mar  3 18:43 11-dhclient
-rwxr-xr-x. 1 root root system_u:object_r:NetworkManager_initrc_exec_t:s0  428 Mar  3 18:43 20-chrony
-rwxr-xr-x. 1 root root system_u:object_r:tmp_t:s0                        1158 Mar  3 22:04 30-resolv-prepender
-rwxr-xr-x. 1 root root system_u:object_r:tmp_t:s0                         392 Mar  3 22:04 40-mdns-hostname

This points to the files being rendered to a location carrying the tmp_t label (probably /tmp) before being moved into place. After fixing the labels and restarting, the masters and workers went back to Ready. The worker kept the NoSchedule taint, as the machineconfigpool reports that it is still updating and degraded:

[kni@provisionhost-0 ~]$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT
master   rendered-master-b3566b5e3bda8866335cc9d5e34b723d   False     True       False      3              2                   2                     0
worker   rendered-worker-805a6087a31096e4ea468a8f81cc882d   False     True       True       2              0                   0                     1

Finally, the troubleshooting process highlighted that the mdns MCO template's verify-hostname functionality should also handle the case where the hostname is reported as localhost.localdomain, not just localhost.
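For completeness, one way the labels could have been fixed here (restorecon relabels files to what the loaded policy expects, which for this directory is NetworkManager_initrc_exec_t, matching the correctly labeled scripts above):

# Run on each affected node
restorecon -v /etc/NetworkManager/dispatcher.d/30-resolv-prepender \
             /etc/NetworkManager/dispatcher.d/40-mdns-hostname
# Restart NetworkManager so the dispatcher scripts run and set the hostname,
# then reboot the node to let it rejoin cleanly
systemctl restart NetworkManager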
*** Bug 1810632 has been marked as a duplicate of this bug. ***
Verified on 4.3.0-0.nightly-2020-03-09-172027
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0858