Bug 1823765
| Summary: | nfd-workers crash under an ipv6 environment | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Yolanda Robla <yroblamo> |
| Component: | Node Feature Discovery Operator | Assignee: | Carlos Eduardo Arango Gutierrez <carangog> |
| Status: | CLOSED ERRATA | QA Contact: | Walid A. <wabouham> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 4.5 | CC: | akamra, aojeagar, bfournie, carangog, fpan, mpatel, scuppett, sejug, zshi |
| Target Milestone: | --- | ||
| Target Release: | 4.7.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-02-24 15:01:20 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Yolanda Robla
2020-04-14 12:56:15 UTC
Setting target release to 4.6.0 (active development branch). Where any fixes are required or requested to be backported, clones targeting those z-stream releases will be created.
With Eduardo's image, node-feature-discovery now works with IPv6 (see the IPv6 addresses in use below):
[stack@openshift-master-0 ~]$ oc describe pods/nfd-worker-jvppc -n openshift-nfd
Name: nfd-worker-jvppc
Namespace: openshift-nfd
Priority: 0
Node: worker-1.ostest.test.metalkube.org/fd2e:6f44:5dd8:c956::18
Start Time: Mon, 18 Jan 2021 15:53:09 -0500
Labels: app=nfd-worker
controller-revision-hash=6b9f8bfd77
pod-template-generation=1
Annotations: openshift.io/scc: nfd-worker
Status: Running
IP: fd2e:6f44:5dd8:c956::18
IPs:
IP: fd2e:6f44:5dd8:c956::18
Controlled By: DaemonSet/nfd-worker
Containers:
nfd-worker:
Container ID: cri-o://69557baa8f927c78884c6dd797c0adb9bbafdd86832e8d38178643f5a8732eb8
Image: virthost.ostest.test.metalkube.org:5000/localimages/origin-node-feature-discovery:4.7
Image ID: virthost.ostest.test.metalkube.org:5000/localimages/origin-node-feature-discovery@sha256:75929c498301af285a8dcca4b17a45d5b53062c28b3a672a07b791be371757a1
Port: <none>
Host Port: <none>
Command:
nfd-worker
Args:
--sleep-interval=60s
--server=nfd-master:12000
State: Running
Started: Mon, 18 Jan 2021 15:53:15 -0500
Ready: True
Restart Count: 0
Environment:
NODE_NAME: (v1:spec.nodeName)
Mounts:
/etc/kubernetes/node-feature-discovery from config (rw)
/etc/kubernetes/node-feature-discovery/features.d from nfd-features (rw)
/etc/kubernetes/node-feature-discovery/source.d from nfd-hooks (rw)
/host-boot from host-boot (ro)
/host-etc/os-release from host-os-release (ro)
/host-sys from host-sys (rw)
/var/run/secrets/kubernetes.io/serviceaccount from nfd-worker-token-l6j5r (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
host-boot:
Type: HostPath (bare host directory volume)
Path: /boot
HostPathType:
host-os-release:
Type: HostPath (bare host directory volume)
Path: /etc/os-release
HostPathType:
host-sys:
Type: HostPath (bare host directory volume)
Path: /sys
HostPathType:
nfd-hooks:
Type: HostPath (bare host directory volume)
Path: /etc/kubernetes/node-feature-discovery/source.d
HostPathType:
nfd-features:
Type: HostPath (bare host directory volume)
Path: /etc/kubernetes/node-feature-discovery/features.d
HostPathType:
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: nfd-worker
Optional: false
nfd-worker-token-l6j5r:
Type: Secret (a volume populated by a Secret)
SecretName: nfd-worker-token-l6j5r
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: :NoSchedule op=Exists
node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/network-unavailable:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events: <none>
[stack@openshift-master-0 ~]$
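This report does not state what caused the original worker crash. As general background only: the worker reaches the master through `--server=nfd-master:12000`; if that endpoint were written with a literal IPv6 address instead of a service name, the address would need square brackets, because its colons otherwise run into the port separator. A minimal Python sketch of that formatting rule, using the pod IP shown above; `join_host_port` and `split_host_port` are hypothetical helpers modeled on Go's `net.JoinHostPort`/`net.SplitHostPort`, not part of NFD.

```python
import ipaddress


def join_host_port(host: str, port: int) -> str:
    """Build a "host:port" string, bracketing IPv6 literals (hypothetical helper)."""
    try:
        if ipaddress.ip_address(host).version == 6:
            return f"[{host}]:{port}"
    except ValueError:
        # Not an IP literal (e.g. a service name such as "nfd-master"): no brackets needed.
        pass
    return f"{host}:{port}"


def split_host_port(endpoint: str) -> tuple[str, int]:
    """Inverse of join_host_port: split on the last colon and strip any brackets."""
    host, _, port = endpoint.rpartition(":")
    return host.strip("[]"), int(port)


if __name__ == "__main__":
    pod_ip = "fd2e:6f44:5dd8:c956::18"  # pod IP reported for nfd-worker-jvppc
    print(join_host_port("nfd-master", 12000))  # nfd-master:12000
    print(join_host_port(pod_ip, 12000))        # [fd2e:6f44:5dd8:c956::18]:12000
    # Naive "%s:%d" formatting leaves the port indistinguishable from the address:
    print(f"{pod_ip}:12000")
```

Service names pass through unchanged, so an endpoint like `nfd-master:12000` is unaffected; only literal IPv6 addresses change form.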
*** Bug 1913878 has been marked as a duplicate of this bug. ***
Verified that the NFD operator and an instance of the nfd-master-server operand deploy successfully on an OCP 4.7.0-fc.3 disconnected IPv6 bare-metal cluster.
The NFD operator and operand images were mirrored to the cluster's local registry and deployed from the NFD master GitHub repo.
$ oc version
Client Version: 4.7.0-fc.3
Server Version: 4.7.0-fc.3
Kubernetes Version: v1.20.0+d9c52cc
$ oc get pods -n openshift-nfd
NAME READY STATUS RESTARTS AGE
nfd-master-dmps2 1/1 Running 0 2d17h
nfd-master-gz6q2 1/1 Running 0 2d17h
nfd-master-zkt8h 1/1 Running 0 2d17h
nfd-operator-59bf958694-58dzn 1/1 Running 0 2d17h
nfd-worker-4r8cc 1/1 Running 0 2d17h
nfd-worker-w9s8v 1/1 Running 0 2d17h
$ oc get pods -n openshift-nfd -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nfd-master-dmps2 1/1 Running 0 2d17h fd01:0:0:3::b1 master-0-1.ocp-edge-cluster-jacot2-0.qe.lab.redhat.com <none> <none>
nfd-master-gz6q2 1/1 Running 0 2d17h fd01:0:0:1::84 master-0-0.ocp-edge-cluster-jacot2-0.qe.lab.redhat.com <none> <none>
nfd-master-zkt8h 1/1 Running 0 2d17h fd01:0:0:2::a2 master-0-2.ocp-edge-cluster-jacot2-0.qe.lab.redhat.com <none> <none>
nfd-operator-59bf958694-58dzn 1/1 Running 0 2d17h fd01:0:0:2::a0 master-0-2.ocp-edge-cluster-jacot2-0.qe.lab.redhat.com <none> <none>
nfd-worker-4r8cc 1/1 Running 0 2d17h fd2e:6f44:5dd8::13f worker-0-1.ocp-edge-cluster-jacot2-0.qe.lab.redhat.com <none> <none>
nfd-worker-w9s8v 1/1 Running 0 2d17h fd2e:6f44:5dd8::123 worker-0-0.ocp-edge-cluster-jacot2-0.qe.lab.redhat.com <none> <none>
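A listing like the one above can also be checked mechanically: parse each row, confirm the pod is Running and fully ready, and confirm its IP is an IPv6 address. A minimal Python sketch, assuming the `oc get pods -o wide` text has been captured into a string (the sample rows are copied from the output above); `check_pods` is a hypothetical helper, and grouping pods by the first two dash-separated name parts is an assumption that fits the names in this namespace.

```python
import ipaddress

# Rows copied from the `oc get pods -n openshift-nfd -o wide` output in this report.
SAMPLE = """\
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nfd-master-dmps2 1/1 Running 0 2d17h fd01:0:0:3::b1 master-0-1.ocp-edge-cluster-jacot2-0.qe.lab.redhat.com <none> <none>
nfd-master-gz6q2 1/1 Running 0 2d17h fd01:0:0:1::84 master-0-0.ocp-edge-cluster-jacot2-0.qe.lab.redhat.com <none> <none>
nfd-master-zkt8h 1/1 Running 0 2d17h fd01:0:0:2::a2 master-0-2.ocp-edge-cluster-jacot2-0.qe.lab.redhat.com <none> <none>
nfd-operator-59bf958694-58dzn 1/1 Running 0 2d17h fd01:0:0:2::a0 master-0-2.ocp-edge-cluster-jacot2-0.qe.lab.redhat.com <none> <none>
nfd-worker-4r8cc 1/1 Running 0 2d17h fd2e:6f44:5dd8::13f worker-0-1.ocp-edge-cluster-jacot2-0.qe.lab.redhat.com <none> <none>
nfd-worker-w9s8v 1/1 Running 0 2d17h fd2e:6f44:5dd8::123 worker-0-0.ocp-edge-cluster-jacot2-0.qe.lab.redhat.com <none> <none>
"""


def check_pods(text: str) -> tuple[dict[str, int], list[str]]:
    """Count pods per role and collect pods that are not Running/ready or not on IPv6."""
    counts: dict[str, int] = {}
    problems: list[str] = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        name, ready, status, ip = fields[0], fields[1], fields[2], fields[5]
        role = "-".join(name.split("-")[:2])  # e.g. "nfd-worker"
        counts[role] = counts.get(role, 0) + 1
        ready_now, ready_total = ready.split("/")
        if status != "Running" or ready_now != ready_total:
            problems.append(f"{name}: {status} {ready}")
        if ipaddress.ip_address(ip).version != 6:
            problems.append(f"{name}: non-IPv6 address {ip}")
    return counts, problems


if __name__ == "__main__":
    counts, problems = check_pods(SAMPLE)
    print(counts)    # {'nfd-master': 3, 'nfd-operator': 1, 'nfd-worker': 2}
    print(problems)  # []
```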
$ oc describe node | grep feature
feature.node.kubernetes.io/cpu-cpuid.ADX=true
feature.node.kubernetes.io/cpu-cpuid.AESNI=true
feature.node.kubernetes.io/cpu-cpuid.AVX=true
feature.node.kubernetes.io/cpu-cpuid.AVX2=true
feature.node.kubernetes.io/cpu-cpuid.FMA3=true
feature.node.kubernetes.io/cpu-cpuid.HLE=true
feature.node.kubernetes.io/cpu-cpuid.HYPERVISOR=true
feature.node.kubernetes.io/cpu-cpuid.IBPB=true
feature.node.kubernetes.io/cpu-cpuid.RTM=true
feature.node.kubernetes.io/cpu-cpuid.STIBP=true
feature.node.kubernetes.io/cpu-cpuid.VMX=true
feature.node.kubernetes.io/custom-rdma.available=true
feature.node.kubernetes.io/kernel-selinux.enabled=true
feature.node.kubernetes.io/kernel-version.full=4.18.0-240.10.1.el8_3.x86_64
feature.node.kubernetes.io/kernel-version.major=4
feature.node.kubernetes.io/kernel-version.minor=18
feature.node.kubernetes.io/kernel-version.revision=0
feature.node.kubernetes.io/pci-1013.present=true
feature.node.kubernetes.io/pci-1af4.present=true
feature.node.kubernetes.io/system-os_release.ID=rhcos
feature.node.kubernetes.io/system-os_release.RHEL_VERSION=8.3
feature.node.kubernetes.io/system-os_release.VERSION_ID=4.7
feature.node.kubernetes.io/system-os_release.VERSION_ID.major=4
feature.node.kubernetes.io/system-os_release.VERSION_ID.minor=7
nfd.node.kubernetes.io/feature-labels:
feature.node.kubernetes.io/cpu-cpuid.ADX=true
feature.node.kubernetes.io/cpu-cpuid.AESNI=true
feature.node.kubernetes.io/cpu-cpuid.AVX=true
feature.node.kubernetes.io/cpu-cpuid.AVX2=true
feature.node.kubernetes.io/cpu-cpuid.FMA3=true
feature.node.kubernetes.io/cpu-cpuid.HLE=true
feature.node.kubernetes.io/cpu-cpuid.HYPERVISOR=true
feature.node.kubernetes.io/cpu-cpuid.IBPB=true
feature.node.kubernetes.io/cpu-cpuid.RTM=true
feature.node.kubernetes.io/cpu-cpuid.STIBP=true
feature.node.kubernetes.io/cpu-cpuid.VMX=true
feature.node.kubernetes.io/custom-rdma.available=true
feature.node.kubernetes.io/kernel-selinux.enabled=true
feature.node.kubernetes.io/kernel-version.full=4.18.0-240.10.1.el8_3.x86_64
feature.node.kubernetes.io/kernel-version.major=4
feature.node.kubernetes.io/kernel-version.minor=18
feature.node.kubernetes.io/kernel-version.revision=0
feature.node.kubernetes.io/pci-1013.present=true
feature.node.kubernetes.io/pci-1af4.present=true
feature.node.kubernetes.io/system-os_release.ID=rhcos
feature.node.kubernetes.io/system-os_release.RHEL_VERSION=8.3
feature.node.kubernetes.io/system-os_release.VERSION_ID=4.7
feature.node.kubernetes.io/system-os_release.VERSION_ID.major=4
feature.node.kubernetes.io/system-os_release.VERSION_ID.minor=7
nfd.node.kubernetes.io/feature-labels:
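The `kernel-version.*` labels above are consistent with `kernel-version.full`: `4.18.0-240.10.1.el8_3.x86_64` gives major 4, minor 18 and revision 0. The Python sketch below reproduces that split from the leading `X.Y.Z` of the full version string; it illustrates the relationship visible in the labels and is not NFD's own kernel-source code. It also expands the short names stored in the node's `nfd.node.kubernetes.io/feature-labels` annotation (for example `cpu-cpuid.ADX`) into full label keys, assuming the `feature.node.kubernetes.io/` prefix seen in the labels. Both function names are hypothetical.

```python
import re

PREFIX = "feature.node.kubernetes.io/"  # label prefix observed in the output above


def kernel_version_labels(full: str) -> dict[str, str]:
    """Derive kernel-version.* labels from the full version string (illustrative only)."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", full)
    if m is None:
        raise ValueError(f"unrecognised kernel version: {full!r}")
    major, minor, revision = m.groups()
    return {
        PREFIX + "kernel-version.full": full,
        PREFIX + "kernel-version.major": major,
        PREFIX + "kernel-version.minor": minor,
        PREFIX + "kernel-version.revision": revision,
    }


def expand_feature_annotation(value: str) -> list[str]:
    """Turn the annotation's comma-separated short names into full label keys."""
    return [PREFIX + name for name in value.split(",") if name]


if __name__ == "__main__":
    for key, val in kernel_version_labels("4.18.0-240.10.1.el8_3.x86_64").items():
        print(f"{key}={val}")
    print(expand_feature_annotation("cpu-cpuid.ADX,cpu-cpuid.AESNI"))
```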
$ oc describe node worker-0-0.ocp-edge-cluster-jacot2-0.qe.lab.redhat.com
Name: worker-0-0.ocp-edge-cluster-jacot2-0.qe.lab.redhat.com
Roles: worker
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
feature.node.kubernetes.io/cpu-cpuid.ADX=true
feature.node.kubernetes.io/cpu-cpuid.AESNI=true
feature.node.kubernetes.io/cpu-cpuid.AVX=true
feature.node.kubernetes.io/cpu-cpuid.AVX2=true
feature.node.kubernetes.io/cpu-cpuid.FMA3=true
feature.node.kubernetes.io/cpu-cpuid.HLE=true
feature.node.kubernetes.io/cpu-cpuid.HYPERVISOR=true
feature.node.kubernetes.io/cpu-cpuid.IBPB=true
feature.node.kubernetes.io/cpu-cpuid.RTM=true
feature.node.kubernetes.io/cpu-cpuid.STIBP=true
feature.node.kubernetes.io/cpu-cpuid.VMX=true
feature.node.kubernetes.io/custom-rdma.available=true
feature.node.kubernetes.io/kernel-selinux.enabled=true
feature.node.kubernetes.io/kernel-version.full=4.18.0-240.10.1.el8_3.x86_64
feature.node.kubernetes.io/kernel-version.major=4
feature.node.kubernetes.io/kernel-version.minor=18
feature.node.kubernetes.io/kernel-version.revision=0
feature.node.kubernetes.io/pci-1013.present=true
feature.node.kubernetes.io/pci-1af4.present=true
feature.node.kubernetes.io/system-os_release.ID=rhcos
feature.node.kubernetes.io/system-os_release.RHEL_VERSION=8.3
feature.node.kubernetes.io/system-os_release.VERSION_ID=4.7
feature.node.kubernetes.io/system-os_release.VERSION_ID.major=4
feature.node.kubernetes.io/system-os_release.VERSION_ID.minor=7
kubernetes.io/arch=amd64
kubernetes.io/hostname=worker-0-0.ocp-edge-cluster-jacot2-0.qe.lab.redhat.com
kubernetes.io/os=linux
node-role.kubernetes.io/worker=
node.openshift.io/os_id=rhcos
Annotations: k8s.ovn.org/l3-gateway-config:
{"default":{"mode":"shared","interface-id":"br-ex_worker-0-0.ocp-edge-cluster-jacot2-0.qe.lab.redhat.com","mac-address":"52:54:00:3d:3a:d6...
k8s.ovn.org/node-chassis-id: 3ecd516e-b673-4247-b8c9-e00e644e3b22
k8s.ovn.org/node-local-nat-ip: {"default":["fd99::821b"]}
k8s.ovn.org/node-mgmt-port-mac-address: 0e:55:5a:82:7b:1e
k8s.ovn.org/node-primary-ifaddr: {"ipv6":"fd2e:6f44:5dd8::123/128"}
k8s.ovn.org/node-subnets: {"default":"fd01:0:0:5::/64"}
machine.openshift.io/machine: openshift-machine-api/ocp-edge-cluster-jaco-k2gjl-worker-0-24p6z
machineconfiguration.openshift.io/currentConfig: rendered-worker-88487793ccf13d4f83d751a51ad678bb
machineconfiguration.openshift.io/desiredConfig: rendered-worker-88487793ccf13d4f83d751a51ad678bb
machineconfiguration.openshift.io/reason:
machineconfiguration.openshift.io/state: Done
nfd.node.kubernetes.io/extended-resources:
nfd.node.kubernetes.io/feature-labels:
cpu-cpuid.ADX,cpu-cpuid.AESNI,cpu-cpuid.AVX,cpu-cpuid.AVX2,cpu-cpuid.FMA3,cpu-cpuid.HLE,cpu-cpuid.HYPERVISOR,cpu-cpuid.IBPB,cpu-cpuid.RTM,...
nfd.node.kubernetes.io/worker.version: 1.15
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Mon, 25 Jan 2021 22:28:37 +0000
Taints: <none>
Unschedulable: false
Lease:
HolderIdentity: worker-0-0.ocp-edge-cluster-jacot2-0.qe.lab.redhat.com
AcquireTime: <unset>
RenewTime: Mon, 01 Feb 2021 16:19:02 +0000
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Mon, 01 Feb 2021 16:14:11 +0000 Mon, 25 Jan 2021 22:28:35 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Mon, 01 Feb 2021 16:14:11 +0000 Mon, 25 Jan 2021 22:28:35 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Mon, 01 Feb 2021 16:14:11 +0000 Mon, 25 Jan 2021 22:28:35 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Mon, 01 Feb 2021 16:14:11 +0000 Mon, 25 Jan 2021 22:29:16 +0000 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: fd2e:6f44:5dd8::123
Hostname: worker-0-0.ocp-edge-cluster-jacot2-0.qe.lab.redhat.com
Capacity:
cpu: 8
ephemeral-storage: 52660Mi
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 16390284Ki
pods: 250
Allocatable:
cpu: 7500m
ephemeral-storage: 48622469038
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 15239308Ki
pods: 250
System Info:
Machine ID: e9733bffaa8c4bb39a954ef27df6bba0
System UUID: e9733bff-aa8c-4bb3-9a95-4ef27df6bba0
Boot ID: b35aaec2-8054-4056-84ba-9ec112c63f83
Kernel Version: 4.18.0-240.10.1.el8_3.x86_64
OS Image: Red Hat Enterprise Linux CoreOS 47.83.202101171239-0 (Ootpa)
Operating System: linux
Architecture: amd64
Container Runtime Version: cri-o://1.20.0-0.rhaos4.7.gitd9f17c8.el8.42
Kubelet Version: v1.20.0+d9c52cc
Kube-Proxy Version: v1.20.0+d9c52cc
ProviderID: baremetalhost:///openshift-machine-api/openshift-worker-0-0/293c9b55-57cb-4c6d-bf4a-89d37614a43e
Non-terminated Pods: (30 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
openshift-cluster-node-tuning-operator tuned-59q4x 10m (0%) 0 (0%) 50Mi (0%) 0 (0%) 6d17h
openshift-dns dns-default-f2lrb 65m (0%) 0 (0%) 131Mi (0%) 0 (0%) 6d17h
openshift-image-registry image-registry-5f57fbb64f-xlj9c 100m (1%) 0 (0%) 256Mi (1%) 0 (0%) 2d18h
openshift-image-registry node-ca-zcx52 10m (0%) 0 (0%) 10Mi (0%) 0 (0%) 6d17h
openshift-ingress-canary ingress-canary-5lccs 10m (0%) 0 (0%) 20Mi (0%) 0 (0%) 6d17h
openshift-ingress router-default-5b47bd97f6-5bjf7 100m (1%) 0 (0%) 256Mi (1%) 0 (0%) 2d18h
openshift-kni-infra coredns-worker-0-0.ocp-edge-cluster-jacot2-0.qe.lab.redhat.com 200m (2%) 0 (0%) 400Mi (2%) 0 (0%) 6d17h
openshift-kni-infra keepalived-worker-0-0.ocp-edge-cluster-jacot2-0.qe.lab.redhat.com 200m (2%) 0 (0%) 400Mi (2%) 0 (0%) 6d17h
openshift-kni-infra mdns-publisher-worker-0-0.ocp-edge-cluster-jacot2-0.qe.lab.redhat.com 100m (1%) 0 (0%) 200Mi (1%) 0 (0%) 6d17h
openshift-kube-storage-version-migrator migrator-56998ccbc5-9m27v 100m (1%) 0 (0%) 200Mi (1%) 0 (0%) 2d18h
openshift-machine-config-operator machine-config-daemon-cc2dk 40m (0%) 0 (0%) 100Mi (0%) 0 (0%) 6d17h
openshift-marketplace redhat-operator-index-2nwqj 10m (0%) 0 (0%) 50Mi (0%) 0 (0%) 2d18h
openshift-monitoring alertmanager-main-0 8m (0%) 0 (0%) 270Mi (1%) 0 (0%) 2d18h
openshift-monitoring alertmanager-main-1 8m (0%) 0 (0%) 270Mi (1%) 0 (0%) 2d18h
openshift-monitoring alertmanager-main-2 8m (0%) 0 (0%) 270Mi (1%) 0 (0%) 2d18h
openshift-monitoring grafana-76ccdf9487-rgjqv 5m (0%) 0 (0%) 120Mi (0%) 0 (0%) 2d18h
openshift-monitoring kube-state-metrics-56b4768c7-hpbxv 4m (0%) 0 (0%) 120Mi (0%) 0 (0%) 2d18h
openshift-monitoring node-exporter-t7nlc 9m (0%) 0 (0%) 210Mi (1%) 0 (0%) 6d17h
openshift-monitoring openshift-state-metrics-68f5786bbb-bzsjk 3m (0%) 0 (0%) 190Mi (1%) 0 (0%) 2d18h
openshift-monitoring prometheus-k8s-0 76m (1%) 0 (0%) 1204Mi (8%) 0 (0%) 2d18h
openshift-monitoring prometheus-k8s-1 76m (1%) 0 (0%) 1204Mi (8%) 0 (0%) 2d18h
openshift-monitoring thanos-querier-64c9b86458-4r4mj 9m (0%) 0 (0%) 92Mi (0%) 0 (0%) 2d18h
openshift-monitoring thanos-querier-64c9b86458-wl4qp 9m (0%) 0 (0%) 92Mi (0%) 0 (0%) 2d18h
openshift-multus multus-6tw5h 10m (0%) 0 (0%) 150Mi (1%) 0 (0%) 6d17h
openshift-multus network-metrics-daemon-hrt6c 20m (0%) 0 (0%) 120Mi (0%) 0 (0%) 6d17h
openshift-network-diagnostics network-check-source-8b577f64-nlbgd 10m (0%) 0 (0%) 50Mi (0%) 0 (0%) 2d18h
openshift-network-diagnostics network-check-target-b6lhj 10m (0%) 0 (0%) 150Mi (1%) 0 (0%) 6d17h
openshift-nfd nfd-worker-w9s8v 0 (0%) 0 (0%) 0 (0%) 0 (0%) 2d17h
openshift-ovn-kubernetes ovnkube-node-zc8mc 30m (0%) 0 (0%) 620Mi (4%) 0 (0%) 6d17h
openshift-ovn-kubernetes ovs-node-wzx9z 100m (1%) 0 (0%) 300Mi (2%) 0 (0%) 6d17h
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 1340m (17%) 0 (0%)
memory 7505Mi (50%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events:
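The resource figures in the node description above can be cross-checked by hand. Allocatable is capacity minus what the kubelet holds back (system and kube reservations plus eviction thresholds): here 8 - 7500m = 500m of CPU and 16390284Ki - 15239308Ki = 1150976Ki (1124Mi) of memory. The "Allocated resources" percentages are the summed pod requests divided by allocatable: 1340m / 7500m is about 17.9% and 7505Mi / 15239308Ki is about 50.4%, consistent with the truncated 17% and 50% that `oc describe` prints. A minimal Python sketch of that arithmetic; the quantity parser is a simplified, hypothetical helper that only understands the forms appearing in this output (bare numbers, `m`, `Ki`, `Mi`, `Gi`), and the request lists are copied from the 30 pods listed above.

```python
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}

CAPACITY = {"cpu": "8", "memory": "16390284Ki"}
ALLOCATABLE = {"cpu": "7500m", "memory": "15239308Ki"}

# Per-pod requests from the "Non-terminated Pods" table, in the order listed.
CPU_REQUESTS_M = [10, 65, 100, 10, 10, 100, 200, 200, 100, 100, 40, 10, 8, 8, 8,
                  5, 4, 9, 3, 76, 76, 9, 9, 10, 20, 10, 10, 0, 30, 100]
MEMORY_REQUESTS_MI = [50, 131, 256, 10, 20, 256, 400, 400, 200, 200, 100, 50, 270, 270, 270,
                      120, 120, 210, 190, 1204, 1204, 92, 92, 150, 120, 50, 150, 0, 620, 300]


def parse_quantity(q: str) -> float:
    """Simplified quantity parser: cores or bytes; handles bare numbers, 'm' and binary suffixes."""
    if q.endswith("m"):
        return int(q[:-1]) / 1000  # millicores -> cores
    for suffix, factor in BINARY_SUFFIXES.items():
        if q.endswith(suffix):
            return int(q[: -len(suffix)]) * factor  # -> bytes
    return int(q)


def percent_truncated(requested: float, allocatable: float) -> int:
    """Requested share of allocatable, truncated to a whole percent."""
    return int(requested / allocatable * 100)


if __name__ == "__main__":
    reserved_cpu_m = round((parse_quantity(CAPACITY["cpu"]) - parse_quantity(ALLOCATABLE["cpu"])) * 1000)
    reserved_mem_mi = (parse_quantity(CAPACITY["memory"]) - parse_quantity(ALLOCATABLE["memory"])) // 1024**2
    print(f"held back: cpu {reserved_cpu_m}m, memory {reserved_mem_mi}Mi")

    cpu_pct = percent_truncated(sum(CPU_REQUESTS_M) / 1000, parse_quantity(ALLOCATABLE["cpu"]))
    mem_pct = percent_truncated(sum(MEMORY_REQUESTS_MI) * 1024**2, parse_quantity(ALLOCATABLE["memory"]))
    print(f"cpu {sum(CPU_REQUESTS_M)}m ({cpu_pct}%), memory {sum(MEMORY_REQUESTS_MI)}Mi ({mem_pct}%)")
```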
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 extras and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5635