Bug 1960263
| Summary: | SR-IOV obliviously reboot the node | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Peng Liu <pliu> |
| Component: | Networking | Assignee: | Peng Liu <pliu> |
| Networking sub component: | SR-IOV | QA Contact: | zhaozhanqi <zzhao> |
| Status: | CLOSED CURRENTRELEASE | Docs Contact: | |
| Severity: | high | ||
| Priority: | high | CC: | dosmith, keyoung, vlaad, zzhao |
| Version: | 4.6 | Keywords: | Reopened |
| Target Milestone: | --- | ||
| Target Release: | 4.6.z | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | 1960103 | Environment: | |
| Last Closed: | 2022-08-26 14:18:24 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1960103 | ||
| Bug Blocks: | |||
Comment 4
zhaozhanqi
2021-06-07 10:52:12 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6.34 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2021:2267
Need to backport one more patch, which fixes the scenario where a custom MCP is created.
Verified this bug on 4.6.0-202106232234
Create the following YAML files at the same time:
# cat 1g-mc.yaml intel-dpdk.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 50-kargs-1g-hugepages
spec:
  kernelArguments:
  - default_hugepagesz=1G
  - hugepagesz=1G
  - hugepages=4
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: intel-dpdk
  namespace: openshift-sriov-network-operator
spec:
  deviceType: vfio-pci
  mtu: 1700
  nicSelector:
    deviceID: "158b"
    pfNames:
    - ens1f1
    rootDevices:
    - '0000:3b:00.1'
    vendor: '8086'
  nodeSelector:
    feature.node.kubernetes.io/sriov-capable: 'true'
  numVfs: 2
  priority: 99
  resourceName: inteldpdk
# oc logs sriov-network-config-daemon-klpbk | grep MCP
I0628 02:03:05.797224 265835 daemon.go:754] drainNode():MCP worker is not ready: [{RenderDegraded False 2021-06-25 09:39:42 +0000 UTC } {NodeDegraded False 2021-06-25 09:39:47 +0000 UTC } {Degraded False 2021-06-25 09:39:47 +0000 UTC } {Updated False 2021-06-28 02:02:24 +0000 UTC } {Updating True 2021-06-28 02:02:24 +0000 UTC All nodes are updating to rendered-worker-d2dd550696cbfafc253011805efcfe77}], wait...
I0628 02:03:35.801191 265835 daemon.go:754] drainNode():MCP worker is not ready: [{RenderDegraded False 2021-06-25 09:39:42 +0000 UTC } {NodeDegraded False 2021-06-25 09:39:47 +0000 UTC } {Degraded False 2021-06-25 09:39:47 +0000 UTC } {Updated False 2021-06-28 02:02:24 +0000 UTC } {Updating True 2021-06-28 02:02:24 +0000 UTC All nodes are updating to rendered-worker-d2dd550696cbfafc253011805efcfe77}], wait...
I0628 02:04:05.802850 265835 daemon.go:754] drainNode():MCP worker is not ready: [{RenderDegraded False 2021-06-25 09:39:42 +0000 UTC } {NodeDegraded False 2021-06-25 09:39:47 +0000 UTC } {Degraded False 2021-06-25 09:39:47 +0000 UTC } {Updated False 2021-06-28 02:02:24 +0000 UTC } {Updating True 2021-06-28 02:02:24 +0000 UTC All nodes are updating to rendered-worker-d2dd550696cbfafc253011805efcfe77}], wait...
I0628 02:09:41.253031 3422 daemon.go:486] completeDrain(): resume MCP worker
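The log lines above show the config daemon waiting while the worker MCP reports `Updating True` and `Updated False`, then resuming the pool once the drain completes. A minimal Python sketch of that kind of readiness gate follows. The function name `mcp_is_ready` and the exact criterion (Updated is True, Updating and Degraded are False) are assumptions made for illustration; they are not taken from the operator's source.

```python
def mcp_is_ready(conditions):
    """Return True when the pool looks settled.

    Assumed criterion (not confirmed by the bug report):
    Updated == True, Updating == False, Degraded == False.
    """
    status = {c["type"]: c["status"] for c in conditions}
    return (
        status.get("Updated") == "True"
        and status.get("Updating") == "False"
        and status.get("Degraded") == "False"
    )


# Condition types and statuses as printed in the 02:03:05 log line.
updating_pool = [
    {"type": "RenderDegraded", "status": "False"},
    {"type": "NodeDegraded", "status": "False"},
    {"type": "Degraded", "status": "False"},
    {"type": "Updated", "status": "False"},
    {"type": "Updating", "status": "True"},
]

# Hypothetical settled pool, for contrast.
settled_pool = [
    {"type": "Degraded", "status": "False"},
    {"type": "Updated", "status": "True"},
    {"type": "Updating", "status": "False"},
]

print(mcp_is_ready(updating_pool))
print(mcp_is_ready(settled_pool))
```

With the conditions from the log, the gate stays closed, which matches the repeated "is not ready ... wait..." messages.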
# oc describe node dell-per740-14.rhts.eng.pek2.redhat.com
Name:               dell-per740-14.rhts.eng.pek2.redhat.com
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    feature.node.kubernetes.io/sriov-capable=true
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=dell-per740-14.rhts.eng.pek2.redhat.com
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=
                    node.openshift.io/os_id=rhcos
Annotations:        k8s.ovn.org/l3-gateway-config:
                      {"default":{"mode":"shared","interface-id":"br-ex_dell-per740-14.rhts.eng.pek2.redhat.com","mac-address":"e4:43:4b:5b:6c:28","ip-addresses...
                    k8s.ovn.org/node-chassis-id: 44557c31-ea74-49f4-abae-78316e0dffa3
                    k8s.ovn.org/node-join-subnets: {"default":"100.64.3.0/29"}
                    k8s.ovn.org/node-local-nat-ip: {"default":["169.254.12.7"]}
                    k8s.ovn.org/node-mgmt-port-mac-address: 36:26:ff:f3:a3:8b
                    k8s.ovn.org/node-primary-ifaddr: {"ipv4":"10.73.116.62/23","ipv6":"2620:52:0:4974:928e:6695:d41e:b1a4/64"}
                    k8s.ovn.org/node-subnets: {"default":"10.128.2.0/23"}
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-d2dd550696cbfafc253011805efcfe77
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-d2dd550696cbfafc253011805efcfe77
                    machineconfiguration.openshift.io/reason:
                    machineconfiguration.openshift.io/state: Done
                    sriovnetwork.openshift.io/state: Idle
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Fri, 25 Jun 2021 06:04:40 -0400
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  dell-per740-14.rhts.eng.pek2.redhat.com
  AcquireTime:     <unset>
  RenewTime:       Sun, 27 Jun 2021 22:28:43 -0400
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Sun, 27 Jun 2021 22:24:52 -0400   Sun, 27 Jun 2021 22:08:10 -0400   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Sun, 27 Jun 2021 22:24:52 -0400   Sun, 27 Jun 2021 22:08:10 -0400   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Sun, 27 Jun 2021 22:24:52 -0400   Sun, 27 Jun 2021 22:08:10 -0400   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Sun, 27 Jun 2021 22:24:52 -0400   Sun, 27 Jun 2021 22:08:20 -0400   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  10.73.116.62
  Hostname:    dell-per740-14.rhts.eng.pek2.redhat.com
Capacity:
  cpu:                     32
  ephemeral-storage:       584963052Ki
  hugepages-1Gi:           4Gi
  memory:                  32479680Ki
  openshift.io/inteldpdk:  2
  pods:                    250
Allocatable:
  cpu:                     31500m
  ephemeral-storage:       538028206007
  hugepages-1Gi:           4Gi
  memory:                  27134400Ki
  openshift.io/inteldpdk:  2
  pods:                    250
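The node output above is what the verification relies on: the MachineConfig current and desired configs match, the MachineConfig state is Done, the SR-IOV state is Idle, and both the hugepages and the `openshift.io/inteldpdk` resource are present. A small Python sketch of that check, assuming the annotations and capacity have already been read into plain dictionaries; the helper name `node_settled` is hypothetical and the pass criteria are an interpretation of this comment, not an official test.

```python
MC_PREFIX = "machineconfiguration.openshift.io/"


def node_settled(annotations, capacity):
    """Return True when both changes appear to have been applied (assumed criteria)."""
    current = annotations.get(MC_PREFIX + "currentConfig")
    desired = annotations.get(MC_PREFIX + "desiredConfig")
    return (
        current is not None
        and current == desired
        and annotations.get(MC_PREFIX + "state") == "Done"
        and annotations.get("sriovnetwork.openshift.io/state") == "Idle"
        and capacity.get("hugepages-1Gi") == "4Gi"
        and capacity.get("openshift.io/inteldpdk") == "2"
    )


# Values copied from the `oc describe node` output in this comment.
annotations = {
    MC_PREFIX + "currentConfig": "rendered-worker-d2dd550696cbfafc253011805efcfe77",
    MC_PREFIX + "desiredConfig": "rendered-worker-d2dd550696cbfafc253011805efcfe77",
    MC_PREFIX + "state": "Done",
    "sriovnetwork.openshift.io/state": "Idle",
}
capacity = {"hugepages-1Gi": "4Gi", "openshift.io/inteldpdk": "2"}

print(node_settled(annotations, capacity))
```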