Bug 2016988

Summary: NTO does not set io_timeout and max_retries for AWS Nitro instances
Product: OpenShift Container Platform
Component: Node Tuning Operator
Reporter: Jiří Mencák <jmencak>
Assignee: Jiří Mencák <jmencak>
QA Contact: Simon <skordas>
Status: CLOSED ERRATA
Severity: high
Priority: high
Version: 4.10
CC: aos-bugs, dagray
Target Release: 4.10.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2022-03-10 16:21:40 UTC
Type: Bug
Bug Blocks: 2017066    

Description Jiří Mencák 2021-10-25 11:18:40 UTC
Description of problem:
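AWS Nitro instances expose EBS volumes as NVMe devices, which need special tuning of the nvme_core module parameters io_timeout and max_retries. NTO does not apply this tuning, see: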
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html#timeout-nvme-ebs-volumes

Version-Release number of selected component (if applicable):
4.9 and 4.10

How reproducible:
Always.

Steps to Reproduce:
1. echo "cat /sys/module/nvme_core/parameters/io_timeout" | oc debug node/<node_name>

Actual results:
The OS-provided default value, which is not 4294967295.

Expected results:
4294967295

Additional info:
https://github.com/openshift/cluster-node-tuning-operator/pull/283
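
The reproduction step above can be wrapped into a small check, sketched below. This is only an illustration: the function names check_param and read_node_param and the output format are hypothetical and not part of NTO, and read_node_param assumes the `oc debug node/<node_name> -- chroot /host ...` form prints only the command output on stdout.

```shell
#!/bin/sh
# Sketch only: helper names and output format are hypothetical, not part of NTO.

# check_param NAME ACTUAL EXPECTED
# Prints "OK ..." and returns 0 when ACTUAL equals EXPECTED,
# otherwise prints "MISMATCH ..." and returns 1.
check_param() {
    name=$1
    actual=$2
    expected=$3
    if [ "$actual" = "$expected" ]; then
        echo "OK $name=$actual"
    else
        echo "MISMATCH $name=$actual (expected $expected)"
        return 1
    fi
}

# read_node_param NODE NAME
# Reads /sys/module/nvme_core/parameters/NAME on the given node through a
# debug pod. Assumption: informational messages from `oc debug` go to
# stderr, so discarding stderr leaves only the parameter value.
read_node_param() {
    oc debug "node/$1" -- chroot /host cat "/sys/module/nvme_core/parameters/$2" 2>/dev/null
}
```

For example, `check_param io_timeout "$(read_node_param <node_name> io_timeout)" 4294967295` reports whether a node matches the expected value given above. This report states no expected value for max_retries, so that parameter can be read with read_node_param but is not compared here.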

Comment 4 Simon 2021-10-26 17:48:50 UTC
$ for node in $(oc get nodes --no-headers | cut -f 1 -d ' ' ); do echo $node; echo ""; echo "cat /sys/module/nvme_core/parameters/io_timeout" | oc debug node/$node; done
ip-10-0-152-104.us-east-2.compute.internal

W1026 13:47:33.683325  129785 warnings.go:70] would violate "latest" version of "baseline" PodSecurity profile: host namespaces (hostNetwork=true, hostPID=true), hostPath volumes (volume "host"), privileged (container "container-00" must not set securityContext.privileged=true)
Starting pod/ip-10-0-152-104us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.152.104
If you don't see a command prompt, try pressing enter.
4294967295

Removing debug pod ...
ip-10-0-159-95.us-east-2.compute.internal

W1026 13:47:35.662635  129807 warnings.go:70] would violate "latest" version of "baseline" PodSecurity profile: host namespaces (hostNetwork=true, hostPID=true), hostPath volumes (volume "host"), privileged (container "container-00" must not set securityContext.privileged=true)
Starting pod/ip-10-0-159-95us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.159.95
If you don't see a command prompt, try pressing enter.
4294967295

Removing debug pod ...
ip-10-0-188-186.us-east-2.compute.internal

W1026 13:47:45.288153  129848 warnings.go:70] would violate "latest" version of "baseline" PodSecurity profile: host namespaces (hostNetwork=true, hostPID=true), hostPath volumes (volume "host"), privileged (container "container-00" must not set securityContext.privileged=true)
Starting pod/ip-10-0-188-186us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.188.186
If you don't see a command prompt, try pressing enter.
4294967295

Removing debug pod ...
ip-10-0-191-199.us-east-2.compute.internal

W1026 13:47:55.608726  129872 warnings.go:70] would violate "latest" version of "baseline" PodSecurity profile: host namespaces (hostNetwork=true, hostPID=true), hostPath volumes (volume "host"), privileged (container "container-00" must not set securityContext.privileged=true)
Starting pod/ip-10-0-191-199us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.191.199
If you don't see a command prompt, try pressing enter.
4294967295

Removing debug pod ...
ip-10-0-206-123.us-east-2.compute.internal

W1026 13:48:04.133077  129897 warnings.go:70] would violate "latest" version of "baseline" PodSecurity profile: host namespaces (hostNetwork=true, hostPID=true), hostPath volumes (volume "host"), privileged (container "container-00" must not set securityContext.privileged=true)
Starting pod/ip-10-0-206-123us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.206.123
If you don't see a command prompt, try pressing enter.
4294967295

Removing debug pod ...
ip-10-0-222-13.us-east-2.compute.internal

W1026 13:48:12.475668  129927 warnings.go:70] would violate "latest" version of "baseline" PodSecurity profile: host namespaces (hostNetwork=true, hostPID=true), hostPath volumes (volume "host"), privileged (container "container-00" must not set securityContext.privileged=true)
Starting pod/ip-10-0-222-13us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.222.13
If you don't see a command prompt, try pressing enter.
4294967295

Removing debug pod ...

$ oc get clusterversion
NAME      VERSION                         AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.ci-2021-10-26-082859   True        False         4h6m    Cluster version is 4.10.0-0.ci-2021-10-26-082859

Comment 7 errata-xmlrpc 2022-03-10 16:21:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056