Description of problem:
When /usr/bin/openshift-node-config transforms 'experimental-allowed-unsafe-sysctls' into a kubelet argument, it adds extra quotation marks around the whole flag, which causes experimental-allowed-unsafe-sysctls to have no effect.

Version-Release number of selected component (if applicable):
openshift v3.10.0-0.60.0
kubernetes v1.10.0+b81c8f8

How reproducible:
Always

Steps to Reproduce:
1. Add 'experimental-allowed-unsafe-sysctls' to /etc/origin/node/node-config.yaml:

kubeletArguments:
  experimental-allowed-unsafe-sysctls:
  - 'kernel.shm*,kernel.msg*,kernel.sem,fs.mqueue.*,net.*'

2. Check the kubelet arguments:

root 18671 6.2 0.9 676232 77644 ? Ssl 02:59 0:01 /usr/bin/hyperkube kubelet --v=5 --address=0.0.0.0 --allow-privileged=true --anonymous-auth=true --authentication-token-webhook=true --authentication-token-webhook-cache-ttl=5m --authorization-mode=Webhook --authorization-webhook-cache-authorized-ttl=5m --authorization-webhook-cache-unauthorized-ttl=5m --bootstrap-kubeconfig=/etc/origin/node/bootstrap.kubeconfig --cadvisor-port=0 --cert-dir=/etc/origin/node/certificates --cgroup-driver=systemd --client-ca-file=/etc/origin/node/client-ca.crt --cloud-config=/etc/origin/cloudprovider/aws.conf --cloud-provider=aws --cluster-dns=172.18.8.45 --cluster-domain=cluster.local --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --containerized=false --enable-controller-attach-detach=true "--experimental-allowed-unsafe-sysctls=kernel.shm*,kernel.msg*,kernel.sem,fs.mqueue.*,net.*" --experimental-dockershim-root-directory=/var/lib/dockershim --fail-swap-on=false --feature-gates=RotateKubeletClientCertificate=true,RotateKubeletServerCertificate=true --file-check-frequency=0s --healthz-bind-address= --healthz-port=0 --host-ipc-sources=api --host-ipc-sources=file --host-network-sources=api --host-network-sources=file --host-pid-sources=api --host-pid-sources=file --hostname-override= --http-check-frequency=0s --image-service-endpoint=/var/run/crio/crio.sock --iptables-masquerade-bit=0 --kubeconfig=/etc/origin/node/node.kubeconfig --max-pods=250 --network-plugin=cni --node-ip= --node-labels=node-role.kubernetes.io/compute=true --pod-infra-container-image=registry.reg-aws.openshift.com:443/openshift3/ose-pod:v3.10.0-0.60.0 --pod-manifest-path=/etc/origin/node/pods --port=10250 --read-only-port=0 --register-node=true --root-dir=/var/lib/origin/openshift.local.volumes --rotate-certificates=true --runtime-request-timeout=10m --tls-cert-file= --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305 --tls-cipher-suites=TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305 --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256 --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256 --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA --tls-cipher-suites=TLS_RSA_WITH_AES_128_GCM_SHA256 --tls-cipher-suites=TLS_RSA_WITH_AES_256_GCM_SHA384 --tls-cipher-suites=TLS_RSA_WITH_AES_128_CBC_SHA --tls-cipher-suites=TLS_RSA_WITH_AES_256_CBC_SHA --tls-min-version=VersionTLS12 --tls-private-key-file=

3. Create a pod with "security.alpha.kubernetes.io/unsafe-sysctls: net.ipv4.ip_forward=0", then check the pod status:

[root@ip-172-18-10-19 ~]# oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/pods/sysctls/pod-sysctl-unsafe.yaml -n dma
pod "hello-pod" created
[root@ip-172-18-10-19 ~]# oc get po -n dma -o wide
NAME        READY     STATUS            RESTARTS   AGE       IP        NODE
hello-pod   0/1       SysctlForbidden   0         8s        <none>    ip-172-18-8-45.ec2.internal

Actual results:
2. The argument is "--experimental-allowed-unsafe-sysctls=kernel.shm*,kernel.msg*,kernel.sem,fs.mqueue.*,net.*", with the quotes enclosing the flag name as well as the value.
3. The pod status is SysctlForbidden.

Expected results:
2. The argument should be --experimental-allowed-unsafe-sysctls=kernel.shm*,kernel.msg*,kernel.sem,fs.mqueue.*,net.*
3. The pod should be Running.

Additional info:
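For context on why the pod is rejected: the kubelet checks each unsafe sysctl a pod requests against the whitelist from --experimental-allowed-unsafe-sysctls, where entries ending in "*" match by prefix (so "net.*" covers net.ipv4.ip_forward). When the flag is dropped, the whitelist is empty and every request fails with SysctlForbidden. A minimal Go sketch of that matching (a simplified stand-in, not the kubelet's actual implementation):

package main

import (
	"fmt"
	"strings"
)

// allowed reports whether a requested sysctl matches a whitelist entry.
// Entries ending in "*" match any sysctl with that prefix; all other
// entries must match exactly.
func allowed(whitelist []string, sysctl string) bool {
	for _, w := range whitelist {
		if strings.HasSuffix(w, "*") {
			if strings.HasPrefix(sysctl, strings.TrimSuffix(w, "*")) {
				return true
			}
		} else if w == sysctl {
			return true
		}
	}
	return false
}

func main() {
	whitelist := strings.Split("kernel.shm*,kernel.msg*,kernel.sem,fs.mqueue.*,net.*", ",")
	fmt.Println(allowed(whitelist, "net.ipv4.ip_forward")) // true
	fmt.Println(allowed(nil, "net.ipv4.ip_forward"))       // false -> SysctlForbidden
}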
This is due to the argument not matching the regex in shellEscapeArg(), called by WriteKubeletFlags():
https://github.com/openshift/origin/blob/master/pkg/cmd/server/origin/node/node.go#L23
Unfortunately, it quotes the flag and argument together rather than just the argument.
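To illustrate the difference, here is a simplified Go sketch of the two behaviors (not the actual origin source; the needsQuoting regex below is a hypothetical stand-in for the safe-character check in shellEscapeArg):

package main

import (
	"fmt"
	"regexp"
	"strings"
)

// needsQuoting is a hypothetical stand-in for the regex used by
// shellEscapeArg; any character outside this safe set triggers quoting.
var needsQuoting = regexp.MustCompile(`[^A-Za-z0-9@%+=:,./_-]`)

// buggyEscape mirrors the reported behavior: the already-assembled
// "--flag=value" token is wrapped in quotes as a whole (the '*' in the
// sysctl list trips the regex), so the quote lands in front of the flag
// name itself.
func buggyEscape(flag, value string) string {
	token := fmt.Sprintf("--%s=%s", flag, value)
	if needsQuoting.MatchString(token) {
		return `"` + token + `"`
	}
	return token
}

// fixedEscape quotes only the value, leaving the flag name outside the
// quotes so it remains recognizable as a flag.
func fixedEscape(flag, value string) string {
	if needsQuoting.MatchString(value) {
		value = `"` + strings.ReplaceAll(value, `"`, `\"`) + `"`
	}
	return fmt.Sprintf("--%s=%s", flag, value)
}

func main() {
	sysctls := "kernel.shm*,kernel.msg*,kernel.sem,fs.mqueue.*,net.*"
	fmt.Println(buggyEscape("experimental-allowed-unsafe-sysctls", sysctls))
	// "--experimental-allowed-unsafe-sysctls=kernel.shm*,..."
	fmt.Println(fixedEscape("experimental-allowed-unsafe-sysctls", sysctls))
	// --experimental-allowed-unsafe-sysctls="kernel.shm*,..."
}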
Clayton pointed out that having the flag and argument inside the quotes is valid for some command parsers. We need to verify that the argument is not being ignored by the kubelet parser. Avesh, can you take a look?
(In reply to Seth Jennings from comment #2)
> Clayton pointed out that having the flag and argument inside the quotes is
> valid for some command parsers. We need to verify that the argument is not
> being ignored by the kubelet parser.
>
> Avesh, can you take a look?

Sure, will take a look soon.
'--eviction-hard' and '--eviction-soft' have the same issue.

// kubelet process
root 4064 14.7 0.4 715616 78304 ? Ssl 05:50 0:01 /usr/bin/hyperkube kubelet --v=5 --address=0.0.0.0 --allow-privileged=true --anonymous-auth=true --authentication-token-webhook=true --authentication-token-webhook-cache-ttl=5m --authorization-mode=Webhook --authorization-webhook-cache-authorized-ttl=5m --authorization-webhook-cache-unauthorized-ttl=5m --bootstrap-kubeconfig=/etc/origin/node/bootstrap.kubeconfig --cadvisor-port=0 --cert-dir=/etc/origin/node/certificates --cgroup-driver=systemd --client-ca-file=/etc/origin/node/client-ca.crt --cloud-config=/etc/origin/cloudprovider/aws.conf --cloud-provider=aws --cluster-dns=172.18.5.248 --cluster-domain=cluster.local --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --containerized=false --enable-controller-attach-detach=true "--eviction-hard=imagefs.available<15%,memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<10%" --eviction-max-pod-grace-period=10 "--eviction-soft=memory.available<15Gi" --eviction-soft-grace-period=memory.available=1m0s --experimental-dockershim-root-directory=/var/lib/dockershim --fail-swap-on=false --feature-gates=RotateKubeletClientCertificate=true,RotateKubeletServerCertificate=true --file-check-frequency=0s --healthz-bind-address= --healthz-port=0 --host-ipc-sources=api --host-ipc-sources=file --host-network-sources=api --host-network-sources=file --host-pid-sources=api --host-pid-sources=file --hostname-override= --http-check-frequency=0s --image-service-endpoint=/var/run/crio/crio.sock --iptables-masquerade-bit=0 --kubeconfig=/etc/origin/node/node.kubeconfig --max-pods=250 --network-plugin=cni --node-ip= --node-labels=node-role.kubernetes.io/master=true --pod-infra-container-image=registry.reg-aws.openshift.com:443/openshift3/ose-pod:v3.10.0-0.66.0 --pod-manifest-path=/etc/origin/node/pods --port=10250 --read-only-port=0 --register-node=true --root-dir=/var/lib/origin/openshift.local.volumes --rotate-certificates=true --runtime-request-timeout=10m --tls-cert-file= --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305 --tls-cipher-suites=TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305 --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256 --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256 --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA --tls-cipher-suites=TLS_RSA_WITH_AES_128_GCM_SHA256 --tls-cipher-suites=TLS_RSA_WITH_AES_256_GCM_SHA384 --tls-cipher-suites=TLS_RSA_WITH_AES_128_CBC_SHA --tls-cipher-suites=TLS_RSA_WITH_AES_256_CBC_SHA --tls-min-version=VersionTLS12 --tls-private-key-file=

// The values in the node log
Jun 12 23:42:27 ip-172-18-5-248.ec2.internal atomic-openshift-node[13437]: I0612 23:42:27.269941 13437 flags.go:27] FLAG: --eviction-hard="imagefs.available<15%,memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%"
Jun 12 23:42:27 ip-172-18-5-248.ec2.internal atomic-openshift-node[13437]: I0612 23:42:27.269958 13437 flags.go:27] FLAG: --eviction-max-pod-grace-period="0"
Jun 12 23:42:27 ip-172-18-5-248.ec2.internal atomic-openshift-node[13437]: I0612 23:42:27.269963 13437 flags.go:27] FLAG: --eviction-minimum-reclaim=""
Jun 12 23:42:27 ip-172-18-5-248.ec2.internal atomic-openshift-node[13437]: I0612 23:42:27.269971 13437 flags.go:27] FLAG: --eviction-pressure-transition-period="5m0s"
Jun 12 23:42:27 ip-172-18-5-248.ec2.internal atomic-openshift-node[13437]: I0612 23:42:27.269976 13437 flags.go:27] FLAG: --eviction-soft=""
Jun 12 23:42:27 ip-172-18-5-248.ec2.internal atomic-openshift-node[13437]: I0612 23:42:27.269982 13437 flags.go:27] FLAG: --eviction-soft-grace-period=""

Note that the logged values do not match the configured ones: --eviction-hard shows the default nodefs.inodesFree<5% rather than the configured 10%, and --eviction-soft is empty rather than memory.available<15Gi. The quoted flags are being ignored by the kubelet parser.
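The ps output shows that the quote characters written by shellEscapeArg survive literally in the process argv (quote removal only happens when a shell parses the string), so the flag parser sees a token beginning with '"' rather than '--' and treats it as a positional argument. A minimal sketch of that behavior using the standard library flag package (the kubelet actually uses pflag, but the effect is analogous):

package main

import (
	"flag"
	"fmt"
)

func parse(args []string) {
	fs := flag.NewFlagSet("kubelet-sketch", flag.ContinueOnError)
	evictionSoft := fs.String("eviction-soft", "", "soft eviction thresholds")
	_ = fs.Parse(args)
	fmt.Printf("eviction-soft=%q leftover=%v\n", *evictionSoft, fs.Args())
}

func main() {
	// With the literal quotes produced by the buggy escaping, the token
	// does not start with '-', so the parser treats it as a positional
	// argument and the flag keeps its default (empty) value:
	parse([]string{`"--eviction-soft=memory.available<15Gi"`})

	// Without the stray quotes, the flag parses as expected:
	parse([]string{`--eviction-soft=memory.available<15Gi`})
}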
I noticed you moved the bug to 3.10.z. Can we fix this in 3.10.0? This issue means some parameters cannot be overridden.
Related PRs:
https://github.com/openshift/origin/pull/19951
https://github.com/openshift/openshift-ansible/pull/8772
Both PRs are merged. Moving it to Modified.
Checked on v3.10.1 with:
openshift-ansible-3.10.1-1.git.157.2bb6250.el7.noarch.rpm
openshift-ansible-docs-3.10.1-1.git.157.2bb6250.el7.noarch.rpm
openshift-ansible-playbooks-3.10.1-1.git.157.2bb6250.el7.noarch.rpm
openshift-ansible-roles-3.10.1-1.git.157.2bb6250.el7.noarch.rpm

The issue has been fixed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816