Bug 1807104
| Summary: | Running the OLM and OperatorHub configuration for restricted networks in an IPv6 bare metal deployment leaves nodes in NotReady,SchedulingDisabled state |
|---|---|
| Product: | OpenShift Container Platform |
| Component: | Machine Config Operator |
| Status: | CLOSED ERRATA |
| Severity: | urgent |
| Priority: | urgent |
| Version: | 4.3.z |
| Target Release: | 4.3.z |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Reporter: | Marius Cornea <mcornea> |
| Assignee: | Antoni Segura Puimedon <asegurap> |
| QA Contact: | Marius Cornea <mcornea> |
| CC: | agurenko, amurdaca, anli, asegurap, cdoan, farandac, jerzhang, jforrest, jparrill, kboumedh, ohochman, sasha, sberens, wsun, yprokule |
| Keywords: | TestBlocker |
| Type: | Bug |
| Cloned to: | 1810331 (view as bug list) |
| Last Closed: | 2020-03-24 14:33:46 UTC |
| Bug Depends On: | 1810331 |
| Bug Blocks: | 1771572 |
| Attachments: | ImageContentSourcePolicy (attachment 1667327) |
**Description** (Marius Cornea, 2020-02-25 15:34:35 UTC)
```
[kni@provisionhost-0 ~]$ oc get nodes master-0.ocp-edge-cluster.qe.lab.redhat.com -o yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    k8s.ovn.org/l3-gateway-config: '{"default":{"interface-id":"br-local_master-0.ocp-edge-cluster.qe.lab.redhat.com","ip-address":"fd99::2/64","mac-address":"42:24:c5:4e:6c:45","mode":"local","next-hop":"fd99::1","node-port-enable":"true","vlan-id":"0"}}'
    k8s.ovn.org/node-chassis-id: 3aeaac98-fa13-4026-a5ab-840bcd0239c5
    k8s.ovn.org/node-join-subnets: '{"default":"fd98::10/125"}'
    k8s.ovn.org/node-mgmt-port-mac-address: e2:dd:a4:85:85:11
    k8s.ovn.org/node-subnets: '{"default":"fd01:0:0:3::/64"}'
    machineconfiguration.openshift.io/currentConfig: rendered-master-bc96a37b957a32a299d54e386514c8e0
    machineconfiguration.openshift.io/desiredConfig: rendered-master-d69cc937b725ac36d82250e6b0c1096b
    machineconfiguration.openshift.io/reason: ""
    machineconfiguration.openshift.io/state: Working
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2020-02-25T02:37:19Z"
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: master-0.ocp-edge-cluster.qe.lab.redhat.com
    kubernetes.io/os: linux
    node-role.kubernetes.io/master: ""
    node-role.kubernetes.io/worker: ""
    node.openshift.io/os_id: rhcos
  name: master-0.ocp-edge-cluster.qe.lab.redhat.com
  resourceVersion: "218891"
  selfLink: /api/v1/nodes/master-0.ocp-edge-cluster.qe.lab.redhat.com
  uid: 8b250330-ee8d-4f69-93fc-64d25b7a4ca2
spec:
  taints:
  - effect: NoSchedule
    key: node.kubernetes.io/unschedulable
    timeAdded: "2020-02-25T14:56:02Z"
  - effect: NoSchedule
    key: node.kubernetes.io/unreachable
    timeAdded: "2020-02-25T14:59:36Z"
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    timeAdded: "2020-02-25T14:59:41Z"
  unschedulable: true
status:
  addresses:
  - address: fd2e:6f44:5dd8:c956::148
    type: InternalIP
  - address: master-0.ocp-edge-cluster.qe.lab.redhat.com
    type: Hostname
  allocatable:
    cpu: 15500m
    ephemeral-storage: "49681111368"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 32319712Ki
    pods: "250"
  capacity:
    cpu: "16"
    ephemeral-storage: 52644Mi
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 32934112Ki
    pods: "250"
  conditions:
  - lastHeartbeatTime: "2020-02-25T14:56:58Z"
    lastTransitionTime: "2020-02-25T14:59:36Z"
    message: Kubelet stopped posting node status.
    reason: NodeStatusUnknown
    status: Unknown
    type: MemoryPressure
  - lastHeartbeatTime: "2020-02-25T14:56:58Z"
    lastTransitionTime: "2020-02-25T14:59:36Z"
    message: Kubelet stopped posting node status.
    reason: NodeStatusUnknown
    status: Unknown
    type: DiskPressure
  - lastHeartbeatTime: "2020-02-25T14:56:58Z"
    lastTransitionTime: "2020-02-25T14:59:36Z"
    message: Kubelet stopped posting node status.
    reason: NodeStatusUnknown
    status: Unknown
    type: PIDPressure
  - lastHeartbeatTime: "2020-02-25T14:56:58Z"
    lastTransitionTime: "2020-02-25T14:59:36Z"
    message: Kubelet stopped posting node status.
    reason: NodeStatusUnknown
    status: Unknown
    type: Ready
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
  images:
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4ac9818465cbe07f63efbd9d38647aaf6f2f8759ffec13903a5a97bcdfab3be4
    - <none>:<none>
    sizeBytes: 830103640
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6f2788587579df093bf990268390f7f469cce47e45a7f36b04b74aa00d1dd9e0
    - <none>:<none>
    sizeBytes: 727725367
  - names:
    - registry.svc.ci.openshift.org/ipv6/ovn-kubernetes@sha256:9bb0217b2dd42d2a963b97d2247832a87f289bd537ad3a154ddd5342edc4da6a
    - <none>:<none>
    sizeBytes: 648407045
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7732dea30c4b20a3df39fb18edfc3f1e78bb2addcabe49834491f19aa1d6c4a1
    - <none>:<none>
    sizeBytes: 474643035
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:41a180d934a95487f00aa2cc41f2e78b6d504e2fe39c18c66123c9e62c776953
    - <none>:<none>
    sizeBytes: 467784344
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:099032ec5bd9219642474a5859998c4f338221f843ff19823b1eebd58bb9ab5a
    - <none>:<none>
    sizeBytes: 423152423
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5b57acafd25ff81623864415bf17741fe1147fcb54b71535f6608c3e8a1aaedc
    - <none>:<none>
    sizeBytes: 409924228
  - names:
    - registry.svc.ci.openshift.org/ipv6/machine-config-operator@sha256:01e1fb5bd114ec241f848467004fdfcea47b286717a09cfe434a50e71d675c02
    - <none>:<none>
    sizeBytes: 407437801
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:669493ce03a4c5d94cbae7ac5c2caeb79b8c1d4b4fefc4767c3e49641b3a8a6f
    - <none>:<none>
    sizeBytes: 372668432
  - names:
    - registry.svc.ci.openshift.org/ipv6/cluster-network-operator@sha256:ffd32018be544ebb681a9329487c107c0196aa8cbbce3965a10afb5a63e27b19
    - <none>:<none>
    sizeBytes: 350026677
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ca521acc8a1411f8e4781806acfcd2040405045714dced5975e143569106fc88
    - <none>:<none>
    sizeBytes: 341241550
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:afdb21a7d5f1977518de93d30cb526e9d82c26f18f6d3b0e411ec94ae7c98d93
    - <none>:<none>
    sizeBytes: 332965897
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ecacd961eff8cb8fbed1bda112dd14539c4b0c1b48387e329fc3c9e74bf30239
    - <none>:<none>
    sizeBytes: 332439248
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1e2d4e5ee1a0ebb9cf599c7f45d4ade4e5ee6b7afc9f2874987edd90f71df32a
    - <none>:<none>
    sizeBytes: 332304625
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a4375971d238f18ada1a996cd3f7709853383c2476f19b962b0c544e9d4e324b
    - <none>:<none>
    sizeBytes: 331306685
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3b661a253769515ebf095c7e3177d2221afd640190d992a49cdb04a1fc9fce12
    - <none>:<none>
    sizeBytes: 329723524
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a98ef6605fc2b7b20edbaac7d890045f374edb8a770d4c562eeebb874ccb9bb6
    - <none>:<none>
    sizeBytes: 317623077
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7014dfe5aab13a081a171299b2a412e2ac3205abf90db0380664bf6d3ff4e812
    - <none>:<none>
    sizeBytes: 315303502
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a7b4725c6cd9a5acc63bdbc3d16f5cc195a2a34b456ad8d3b3b9fcd7346a9864
    - <none>:<none>
    sizeBytes: 315105804
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b0ae86a71b4fff62d36c86c105be65f740342165a6d560dde7f0e546e9edf4af
    - <none>:<none>
    sizeBytes: 311932447
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b0a7f2970c2f3d8f06ce0a743c72474f60581fbf3b4917821926418808ea5928
    - <none>:<none>
    sizeBytes: 311593655
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b0a7014c7e86c42f78239d2381d7ee32bd63b3a83a3958b42d09909c325effe4
    - <none>:<none>
    sizeBytes: 310761197
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bda83a90a05ed97033c6065ac35d14e0f0be24d0bcd33fd15221b7b5ba7966f5
    - <none>:<none>
    sizeBytes: 309229038
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:06625988a38a75d7abeba3f5d31f013c8189626444af4492a890f592eab76c10
    - <none>:<none>
    sizeBytes: 308272311
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:101df6bc3b6e171f80d305bdfef699ae34388d2997f993415aee144f08d15654
    - <none>:<none>
    sizeBytes: 305264173
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e32deb224aeed8135d9fc42b516cf07a2391a933e6438e0bac12bf4dfe76f8f8
    - <none>:<none>
    sizeBytes: 304852163
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:388e17432b72c0cb9586b387176ce1500517f2548fd5bba34588de0ee96b6c6d
    - <none>:<none>
    sizeBytes: 301807729
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:359f28880b4110cbd03cc8397540f83d4e7605a51c771117003f488bac569ab1
    - <none>:<none>
    sizeBytes: 300165346
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4b908f3195060f494590ced95bce23ff027af730a93e5334d57ccfb28fe0f8f1
    - <none>:<none>
    sizeBytes: 299459222
  - names:
    - registry.svc.ci.openshift.org/ipv6/cluster-kube-apiserver-operator@sha256:c744f7b8c0a2086bdb45bb94ee491a2623d6e956831da99b710a67c26106a844
    - <none>:<none>
    sizeBytes: 298765295
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6fd131fcda17fc956dce1965204c259146151da8276817861adb5baee23d7577
    - <none>:<none>
    sizeBytes: 297132052
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:445b67bd0f586a1bff691cd019c461a8623bd70c8c4d00a5d723a417cf5f038e
    - <none>:<none>
    sizeBytes: 292584523
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c90cafc16f4050d736ed4c6c86f2469b429036a50e27ea3c3639409a246849a4
    - <none>:<none>
    sizeBytes: 292581591
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f61ed683406ed6cd2a3b3117e82c83211c7551d1e4280ff5ce7462f8d650e79a
    - <none>:<none>
    sizeBytes: 285919312
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5759e2c0ae8ce3193cdb408c4a2485866397d8623dfa05a4d1c20adcd48cf073
    - <none>:<none>
    sizeBytes: 279623253
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8109601903d3fb1072e5ee02a3b6c2dc6628c0fff26d69c924c9f7ce4c17e22d
    - <none>:<none>
    sizeBytes: 277849752
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ca8762a0c0f0177629d8f6fbebbddcb3438ca8990d8ed993d8dc04574e4a974c
    - <none>:<none>
    sizeBytes: 271271819
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e559edba536c2247aa9286cfb2fa6520516df39f023546acc0b58dfd2d6ef627
    - <none>:<none>
    sizeBytes: 264215014
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0b2f4d3140f87845fab5a88c57b6cb0900774699ea30ba26270f25532abaa2fb
    - <none>:<none>
    sizeBytes: 258011246
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5827985410ea82e3f4f0808cc601e2ed834b7a800304e12079bced988b49dccd
    - <none>:<none>
    sizeBytes: 256980613
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:475f5ece24e33bfa38ac9b2678db33eae039ab10509e7544096b4c96aba0db86
    - <none>:<none>
    sizeBytes: 255944919
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ff152532ba80cce2351febf6925762017095c9bf26840dff208ff8e3e2ccafbe
    - <none>:<none>
    sizeBytes: 250795582
  - names:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:83c963d56fe4738ec023ac4b740fa1417e30379e138f12ccff82bc193aa610d8
    - <none>:<none>
    sizeBytes: 238844227
  nodeInfo:
    architecture: amd64
    bootID: b7c77884-89ae-41a2-868d-38e2a01ef63e
    containerRuntimeVersion: cri-o://1.16.3-22.dev.rhaos4.3.git11c04e3.el8
    kernelVersion: 4.18.0-147.5.1.el8_1.x86_64
    kubeProxyVersion: v1.16.2
    kubeletVersion: v1.16.2
    machineID: cd6bf73f0e98411e84d875c27eccba68
    operatingSystem: linux
    osImage: Red Hat Enterprise Linux CoreOS 43.81.202002170853.0 (Ootpa)
    systemUUID: cd6bf73f-0e98-411e-84d8-75c27eccba68
```
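All four conditions above report status `Unknown` with reason `NodeStatusUnknown`, and the taints mark the node unreachable and unschedulable. As a triage aid, a one-pass listing of every node's Ready status and taint keys can be produced with something like the following (illustrative sketch only; it assumes `jq` is available on the provisioning host):

```bash
# Print each node's name, Ready condition status, and taint keys as TSV.
# Nodes in the state shown above appear with Ready=Unknown plus
# node.kubernetes.io/unschedulable and node.kubernetes.io/unreachable taints.
oc get nodes -o json | jq -r '
  .items[]
  | [ .metadata.name,
      (.status.conditions[] | select(.type == "Ready") | .status),
      ([ .spec.taints[]?.key ] | join(",")) ]
  | @tsv'
```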
```
[root@master-0 core]# systemctl status kubelet
Warning: The unit file, source configuration file or drop-ins of kubelet.service changed on disk. Run 'systemctl daemon-reload' to reload units.
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-default-env.conf, 20-nodenet.conf
   Active: active (running) since Tue 2020-02-25 14:59:20 UTC; 38min ago
  Process: 3370 ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state (code=exited, status=0/SUCCESS)
  Process: 3368 ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests (code=exited, status=0/SUCCESS)
 Main PID: 3372 (hyperkube)
    Tasks: 40 (limit: 26213)
   Memory: 210.9M
      CPU: 1min 19.154s
   CGroup: /system.slice/kubelet.service
           └─3372 /usr/bin/hyperkube kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/ru>

Feb 25 15:38:07 master-0.ocp-edge-cluster.qe.lab.redhat.com hyperkube[3372]: I0225 15:38:07.454790    3372 prober.go:129] Liveness probe for "openshift-kube-scheduler-localhost.localdomain_openshift-kube-scheduler(37a2869826ee72d5a8ee916>
Feb 25 15:38:07 master-0.ocp-edge-cluster.qe.lab.redhat.com hyperkube[3372]: E0225 15:38:07.530876    3372 kubelet.go:2275] node "localhost.localdomain" not found
Feb 25 15:38:07 master-0.ocp-edge-cluster.qe.lab.redhat.com hyperkube[3372]: E0225 15:38:07.631139    3372 kubelet.go:2275] node "localhost.localdomain" not found
Feb 25 15:38:07 master-0.ocp-edge-cluster.qe.lab.redhat.com hyperkube[3372]: E0225 15:38:07.731375    3372 kubelet.go:2275] node "localhost.localdomain" not found
Feb 25 15:38:07 master-0.ocp-edge-cluster.qe.lab.redhat.com hyperkube[3372]: E0225 15:38:07.831633    3372 kubelet.go:2275] node "localhost.localdomain" not found
Feb 25 15:38:07 master-0.ocp-edge-cluster.qe.lab.redhat.com hyperkube[3372]: E0225 15:38:07.931866    3372 kubelet.go:2275] node "localhost.localdomain" not found
Feb 25 15:38:08 master-0.ocp-edge-cluster.qe.lab.redhat.com hyperkube[3372]: E0225 15:38:08.032129    3372 kubelet.go:2275] node "localhost.localdomain" not found
Feb 25 15:38:08 master-0.ocp-edge-cluster.qe.lab.redhat.com hyperkube[3372]: I0225 15:38:08.088550    3372 httplog.go:90] GET /metrics/cadvisor: (42.690275ms) 200 [Prometheus/2.14.0 [fd2e:6f44:5dd8:c956::135]:49376]
Feb 25 15:38:08 master-0.ocp-edge-cluster.qe.lab.redhat.com hyperkube[3372]: I0225 15:38:08.107950    3372 prober.go:129] Readiness probe for "coredns-localhost.localdomain_openshift-kni-infra(50d1e0101b9a4c5cc8d8bc70799a0083):coredns" s>
Feb 25 15:38:08 master-0.ocp-edge-cluster.qe.lab.redhat.com hyperkube[3372]: E0225 15:38:08.132400    3372 kubelet.go:2275] node "localhost.localdomain" not found
```
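The repeated `node "localhost.localdomain" not found` errors show the kubelet trying to register under the wrong hostname. Given the root cause identified later in this bug (the NetworkManager dispatcher scripts losing their SELinux labels), a few checks one might run on the affected node are sketched below; the paths come from the findings further down in this bug, not from the original comment:

```bash
# Inspect what the node believes its hostname is, and whether the dispatcher
# scripts that set it are still executable under SELinux (they should carry
# NetworkManager_initrc_exec_t, not tmp_t).
hostnamectl status
cat /etc/mdns/hostname
ls -Z /etc/NetworkManager/dispatcher.d/
```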
Trying to restart kubelet I can see:

```
[root@master-0 core]# systemctl restart kubelet
Warning: The unit file, source configuration file or drop-ins of kubelet.service changed on disk. Run 'systemctl daemon-reload' to reload units.
```

After restarting, kubelet still shows pending CSRs:

```
[kni@provisionhost-0 ~]$ export KUBECONFIG=clusterconfigs/auth/kubeconfig
[kni@provisionhost-0 ~]$ oc get csr
NAME        AGE     REQUESTOR                                                 CONDITION
csr-5dcqm   36m     system:node:master-0.ocp-edge-cluster.qe.lab.redhat.com   Pending
csr-66wz2   20m     system:node:master-0.ocp-edge-cluster.qe.lab.redhat.com   Pending
csr-ntqr8   51m     system:node:master-0.ocp-edge-cluster.qe.lab.redhat.com   Pending
csr-r6rq8   5m49s   system:node:master-0.ocp-edge-cluster.qe.lab.redhat.com   Pending
csr-zx69t   108s    system:node:master-0.ocp-edge-cluster.qe.lab.redhat.com   Pending
[kni@provisionhost-0 ~]$ oc adm certificate approve csr-5dcqm
certificatesigningrequest.certificates.k8s.io/csr-5dcqm approved
[kni@provisionhost-0 ~]$ oc adm certificate approve csr-66wz2
certificatesigningrequest.certificates.k8s.io/csr-66wz2 approved
[kni@provisionhost-0 ~]$ oc adm certificate approve csr-ntqr8
certificatesigningrequest.certificates.k8s.io/csr-ntqr8 approved
[kni@provisionhost-0 ~]$ oc adm certificate approve csr-r6rq8
certificatesigningrequest.certificates.k8s.io/csr-r6rq8 approved
[kni@provisionhost-0 ~]$ oc adm certificate approve csr-zx69t
certificatesigningrequest.certificates.k8s.io/csr-zx69t approved
[kni@provisionhost-0 ~]$ oc get csr
NAME        AGE     REQUESTOR                                                 CONDITION
csr-5dcqm   36m     system:node:master-0.ocp-edge-cluster.qe.lab.redhat.com   Approved,Issued
csr-66wz2   21m     system:node:master-0.ocp-edge-cluster.qe.lab.redhat.com   Approved,Issued
csr-ntqr8   51m     system:node:master-0.ocp-edge-cluster.qe.lab.redhat.com   Approved,Issued
csr-r6rq8   6m40s   system:node:master-0.ocp-edge-cluster.qe.lab.redhat.com   Approved,Issued
csr-zx69t   2m39s   system:node:master-0.ocp-edge-cluster.qe.lab.redhat.com   Approved,Issued
```

Checking the nodes we can see:

```
[kni@provisionhost-0 ~]$ oc get nodes
NAME                                          STATUS                     ROLES           AGE    VERSION
localhost.localdomain                         Ready                      master,worker   103s   v1.16.2
master-0.ocp-edge-cluster.qe.lab.redhat.com   NotReady                   master,worker   13h    v1.16.2
master-1.ocp-edge-cluster.qe.lab.redhat.com   Ready,SchedulingDisabled   master,worker   13h    v1.16.2
master-2.ocp-edge-cluster.qe.lab.redhat.com   Ready                      master,worker   13h    v1.16.2
```

There are two issues here:

1. master-0 changed its hostname to localhost.
2. master-1 went into SchedulingDisabled state.

It looks like the master mcp is updating:

```
[kni@provisionhost-0 ~]$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT
master   rendered-master-bc96a37b957a32a299d54e386514c8e0   False     True       False      4              0                   1                     0
worker   rendered-worker-868410f8330337e8136d160eaa007384   True      False      False      0              0                   0                     0
```

It looks like it got stuck trying to update the node:
```
machineconfiguration.openshift.io/currentConfig: rendered-master-bc96a37b957a32a299d54e386514c8e0
machineconfiguration.openshift.io/desiredConfig: rendered-master-d69cc937b725ac36d82250e6b0c1096b
```
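A node whose currentConfig and desiredConfig annotations stay different, as above, is one the MCO never finished updating. A small sketch to print the pair for every node (go-template syntax only; nothing here beyond the annotations already shown):

```bash
# Show current vs. desired rendered config per node; a persistent mismatch
# means the MachineConfig rollout is stuck on that node.
oc get nodes -o go-template='{{range .items}}{{.metadata.name}}
  current: {{index .metadata.annotations "machineconfiguration.openshift.io/currentConfig"}}
  desired: {{index .metadata.annotations "machineconfiguration.openshift.io/desiredConfig"}}
{{end}}'
```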
NotReady,SchedulingDisabled generally indicates that the node died on reboot.

Looking at the instructions in https://access.redhat.com/documentation/en-us/openshift_container_platform/4.3/html-single/operators/index#olm-restricted-networks-operatorhub_olm-restricted-networks, I'm not really sure why it tried to apply an updated machineconfig, though.

Could you provide a must-gather? This might also be more related to the node than to the MCO.
I reproduced the issue: the master-0 node gets rebooted, and after the reboot it comes up with a bad hostname (localhost).

must-gather doesn't seem to work, as the cluster doesn't have access to quay.io. Is there any specific log that I can provide from the master nodes?
```
[kni@provisionhost-0 ~]$ oc adm must-gather
[must-gather ] OUT unable to resolve the imagestream tag openshift/must-gather:latest
[must-gather ] OUT
[must-gather ] OUT Using must-gather plugin-in image: quay.io/openshift/origin-must-gather:latest
[must-gather ] OUT namespace/openshift-must-gather-rfcbz created
[must-gather ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-msgvj created
[must-gather ] OUT pod for plug-in image quay.io/openshift/origin-must-gather:latest created
```
```
[kni@provisionhost-0 ~]$ oc get is -n openshift must-gather -o yaml
apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  creationTimestamp: "2020-02-26T03:19:59Z"
  generation: 2
  name: must-gather
  namespace: openshift
  resourceVersion: "12739"
  selfLink: /apis/image.openshift.io/v1/namespaces/openshift/imagestreams/must-gather
  uid: c9fe41b8-5ae1-48af-b2e2-da1ee3112c30
spec:
  lookupPolicy:
    local: false
  tags:
  - annotations: null
    from:
      kind: DockerImage
      name: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b8bc76d838a68c0ac2bd8f8c65cc21852085d44f1f1d07f57a39ecd496ce5706
    generation: 2
    importPolicy:
      scheduled: true
    name: latest
    referencePolicy:
      type: Source
status:
  dockerImageRepository: ""
  tags:
  - conditions:
    - generation: 2
      lastTransitionTime: "2020-02-26T03:19:59Z"
      message: 'Internal error occurred: [registry.ocp-edge-cluster.qe.lab.redhat.com:5000/localimages/local-release-image@sha256:b8bc76d838a68c0ac2bd8f8c65cc21852085d44f1f1d07f57a39ecd496ce5706:
        Get https://registry.ocp-edge-cluster.qe.lab.redhat.com:5000/v2/: x509: certificate
        signed by unknown authority, quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b8bc76d838a68c0ac2bd8f8c65cc21852085d44f1f1d07f57a39ecd496ce5706:
        Get https://quay.io/v2/: dial tcp 52.45.33.205:443: connect: network is unreachable]'
      reason: InternalError
      status: "False"
      type: ImportSuccess
    items: null
    tag: latest
```
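The first failure in the import message (`x509: certificate signed by unknown authority`) indicates the mirror registry's CA is not trusted for image imports. A hedged sketch of the usual OCP 4 fix follows; the ConfigMap name and certificate path are placeholders, and note the `hostname..port` key syntax used for registries that include a port:

```bash
# Trust the mirror registry's CA for image imports and pulls.
# "mirror-registry-ca" and /path/to/registry-ca.crt are hypothetical names.
oc create configmap mirror-registry-ca \
  --from-file=registry.ocp-edge-cluster.qe.lab.redhat.com..5000=/path/to/registry-ca.crt \
  -n openshift-config
oc patch image.config.openshift.io/cluster --type merge \
  -p '{"spec":{"additionalTrustedCA":{"name":"mirror-registry-ca"}}}'
```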
When disconnected, must-gather can be run using the --image flag:

```
oc adm must-gather --image myregistry.example.com/must-gather
```

You might have to separately mirror the image used for must-gather if it has not already been picked up by the release mirror command.

The OLM disconnected workflow creates an ImageContentSourcePolicy; go ahead and attach the ICSP that was created.

Created attachment 1667327 [details]: ImageContentSourcePolicy
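The attachment itself is not reproduced in this report. For orientation only, an ImageContentSourcePolicy produced by the OLM disconnected workflow typically looks like the following; the source/mirror pair is a hypothetical stand-in using the mirror registry named elsewhere in this bug, and the real attachment 1667327 defines the actual mappings:

```yaml
# Hypothetical example; not the contents of attachment 1667327.
apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: olm-mirror
spec:
  repositoryDigestMirrors:
  - mirrors:
    # Digest pulls for the source repo are redirected to the local mirror.
    - registry.ocp-edge-cluster.qe.lab.redhat.com:5000/olm-mirror/some-operator
    source: registry.redhat.io/some-operator
```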
I managed to reproduce the issue on 4.3.0-0.nightly-2020-03-01-194304.

The issue occurs after creating the ImageContentSourcePolicy (attached to this bug).

At this point `oc adm must-gather` doesn't return anything, as the pods are stuck in the Pending state:
```
[kni@provisionhost-0 ~]$ oc adm must-gather --image registry.ocp-edge-cluster.qe.lab.redhat.com:5000/openshift/must-gather
[must-gather ] OUT Using must-gather plugin-in image: registry.ocp-edge-cluster.qe.lab.redhat.com:5000/openshift/must-gather
[must-gather ] OUT namespace/openshift-must-gather-j6n72 created
[must-gather ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-x9mkn created
[must-gather ] OUT pod for plug-in image registry.ocp-edge-cluster.qe.lab.redhat.com:5000/openshift/must-gather created
[kni@provisionhost-0 ~]$ oc get pods -A | grep must-gather
openshift-must-gather-2882s   must-gather-dr8hx   0/1   Pending   0   15m
openshift-must-gather-j6n72   must-gather-hjmz7   0/1   Pending   0   5m5s
openshift-must-gather-wsfrj   must-gather-9vbs4   0/1   Pending   0   11m
[kni@provisionhost-0 ~]$ oc get nodes
NAME                                          STATUS                        ROLES    AGE     VERSION
master-0.ocp-edge-cluster.qe.lab.redhat.com   Ready                         master   4h1m    v1.16.2
master-1.ocp-edge-cluster.qe.lab.redhat.com   Ready                         master   4h1m    v1.16.2
master-2.ocp-edge-cluster.qe.lab.redhat.com   NotReady,SchedulingDisabled   master   4h1m    v1.16.2
worker-0.ocp-edge-cluster.qe.lab.redhat.com   NotReady,SchedulingDisabled   worker   3h45m   v1.16.2
worker-1.ocp-edge-cluster.qe.lab.redhat.com   Ready                         worker   3h44m   v1.16.2
```
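For context (an inference consistent with the reproduction above): the MCO translates an ImageContentSourcePolicy into an updated /etc/containers/registries.conf delivered through a new rendered MachineConfig, and each node is then cordoned, drained, and rebooted in turn; it is the post-reboot hostname regression that leaves nodes NotReady. A hedged way to watch that rollout and confirm the new rendered config carries the registries change:

```bash
# Watch the pools converge on the rendered config triggered by the ICSP,
# then check that the pool's target config actually touches registries.conf.
oc get mcp -w
oc get machineconfig \
  "$(oc get mcp master -o jsonpath='{.spec.configuration.name}')" -o yaml \
  | grep registries.conf
```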
This one is a blocker for the management pillar; we'll need to escalate. The issue is blocking installing operators from a disconnected registry, which is needed for the ACM disconnected deployment (ACM should be installed by an extra operator on the IPv6 OCP environment).

After digging a bit I found the following:

```
Mar 03 22:05:08 localhost NetworkManager[1947]: <warn>  [1583273108.4627] dispatcher: (1) /etc/NetworkManager/dispatcher.d/30-resolv-prepender failed (exec failed): Failed to execute child process “/etc/NetworkManager/dispatcher.d/30-resolv-prepender” (Permission denied)
Mar 03 22:05:08 localhost NetworkManager[1947]: <warn>  [1583273108.4628] dispatcher: (1) /etc/NetworkManager/dispatcher.d/40-mdns-hostname failed (exec failed): Failed to execute child process “/etc/NetworkManager/dispatcher.d/40-mdns-hostname” (Permission denied)
Mar 03 22:05:08 localhost NetworkManager[1947]: <warn>  [1583273108.4728] dispatcher: (2) /etc/NetworkManager/dispatcher.d/30-resolv-prepender failed (exec failed): Failed to execute child process “/etc/NetworkManager/dispatcher.d/30-resolv-prepender” (Permission denied)
Mar 03 22:05:08 localhost NetworkManager[1947]: <warn>  [1583273108.4728] dispatcher: (2) /etc/NetworkManager/dispatcher.d/40-mdns-hostname failed (exec failed): Failed to execute child process “/etc/NetworkManager/dispatcher.d/40-mdns-hostname” (Permission denied)
```

Given that /etc/mdns/hostname existed, and that it is only created by /etc/NetworkManager/dispatcher.d/40-mdns-hostname, some MCO operation after the initial boot must have broken the permissions. Looking at SELinux reveals:

```
-rwxr-xr-x. 1 root root system_u:object_r:NetworkManager_initrc_exec_t:s0  100 Mar  3 18:43 04-iscsi
-rwxr-xr-x. 1 root root system_u:object_r:NetworkManager_initrc_exec_t:s0 1062 Mar  3 18:43 11-dhclient
-rwxr-xr-x. 1 root root system_u:object_r:NetworkManager_initrc_exec_t:s0  428 Mar  3 18:43 20-chrony
-rwxr-xr-x. 1 root root system_u:object_r:tmp_t:s0                        1158 Mar  3 22:04 30-resolv-prepender
-rwxr-xr-x. 1 root root system_u:object_r:tmp_t:s0                         392 Mar  3 22:04 40-mdns-hostname
```

This points to the files being rendered to a location carrying the tmp_t label (probably /tmp). After fixing the labels and restarting, the masters and workers went back to Ready. The worker kept the NoSchedule taint, as the machineconfigpool reports that it is still updating and degraded:

```
[kni@provisionhost-0 ~]$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT
master   rendered-master-b3566b5e3bda8866335cc9d5e34b723d   False     True       False      3              2                   2                     0
worker   rendered-worker-805a6087a31096e4ea468a8f81cc882d   False     True       True       2              0                   0                     1
```

Finally, the troubleshooting process highlighted that the mdns MCO template's verify-hostname functionality should also handle the case where the hostname is reported as localhost.localdomain, not just localhost.

*** Bug 1810632 has been marked as a duplicate of this bug. ***

Verified on 4.3.0-0.nightly-2020-03-09-172027.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0858
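For reference, a minimal sketch of the relabeling workaround described in the comments above, using standard SELinux tooling (the exact commands used during the debugging session are not recorded in this bug):

```bash
# Restore the default SELinux contexts on the mislabeled dispatcher scripts
# (tmp_t back to NetworkManager_initrc_exec_t per the file-context policy),
# then restart NetworkManager so the hostname dispatcher runs again.
restorecon -Rv /etc/NetworkManager/dispatcher.d/
systemctl restart NetworkManager
```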