Bug 1587824

Summary: When some node parameters are transformed into kubelet arguments, they contain additional quotation marks
Product: OpenShift Container Platform
Reporter: DeShuai Ma <dma>
Component: Node
Assignee: Avesh Agarwal <avagarwa>
Status: CLOSED ERRATA
QA Contact: DeShuai Ma <dma>
Severity: high
Priority: high
Version: 3.10.0
CC: aos-bugs, ccoleman, jokerman, mmccomas, sjenning, wjiang
Target Milestone: ---
Target Release: 3.10.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-07-30 19:17:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description DeShuai Ma 2018-06-06 07:38:14 UTC
Description of problem:
When /usr/bin/openshift-node-config transforms 'experimental-allowed-unsafe-sysctls' into a kubelet argument, the argument contains additional quotation marks, which prevents experimental-allowed-unsafe-sysctls from working.

Version-Release number of selected component (if applicable):
openshift v3.10.0-0.60.0
kubernetes v1.10.0+b81c8f8

How reproducible:
Always

Steps to Reproduce:
1. Add 'experimental-allowed-unsafe-sysctls' to /etc/origin/node/node-config.yaml
  
kubeletArguments:
  experimental-allowed-unsafe-sysctls:
  - 'kernel.shm*,kernel.msg*,kernel.sem,fs.mqueue.*,net.*'

2. Check kubelet argument
root     18671  6.2  0.9 676232 77644 ?        Ssl  02:59   0:01 /usr/bin/hyperkube kubelet --v=5 --address=0.0.0.0 --allow-privileged=true --anonymous-auth=true --authentication-token-webhook=true --authentication-token-webhook-cache-ttl=5m --authorization-mode=Webhook --authorization-webhook-cache-authorized-ttl=5m --authorization-webhook-cache-unauthorized-ttl=5m --bootstrap-kubeconfig=/etc/origin/node/bootstrap.kubeconfig --cadvisor-port=0 --cert-dir=/etc/origin/node/certificates --cgroup-driver=systemd --client-ca-file=/etc/origin/node/client-ca.crt --cloud-config=/etc/origin/cloudprovider/aws.conf --cloud-provider=aws --cluster-dns=172.18.8.45 --cluster-domain=cluster.local --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --containerized=false --enable-controller-attach-detach=true "--experimental-allowed-unsafe-sysctls=kernel.shm*,kernel.msg*,kernel.sem,fs.mqueue.*,net.*" --experimental-dockershim-root-directory=/var/lib/dockershim --fail-swap-on=false --feature-gates=RotateKubeletClientCertificate=true,RotateKubeletServerCertificate=true --file-check-frequency=0s --healthz-bind-address= --healthz-port=0 --host-ipc-sources=api --host-ipc-sources=file --host-network-sources=api --host-network-sources=file --host-pid-sources=api --host-pid-sources=file --hostname-override= --http-check-frequency=0s --image-service-endpoint=/var/run/crio/crio.sock --iptables-masquerade-bit=0 --kubeconfig=/etc/origin/node/node.kubeconfig --max-pods=250 --network-plugin=cni --node-ip= --node-labels=node-role.kubernetes.io/compute=true --pod-infra-container-image=registry.reg-aws.openshift.com:443/openshift3/ose-pod:v3.10.0-0.60.0 --pod-manifest-path=/etc/origin/node/pods --port=10250 --read-only-port=0 --register-node=true --root-dir=/var/lib/origin/openshift.local.volumes --rotate-certificates=true --runtime-request-timeout=10m --tls-cert-file= --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305 
--tls-cipher-suites=TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305 --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256 --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256 --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA --tls-cipher-suites=TLS_RSA_WITH_AES_128_GCM_SHA256 --tls-cipher-suites=TLS_RSA_WITH_AES_256_GCM_SHA384 --tls-cipher-suites=TLS_RSA_WITH_AES_128_CBC_SHA --tls-cipher-suites=TLS_RSA_WITH_AES_256_CBC_SHA --tls-min-version=VersionTLS12 --tls-private-key-file=

3. Create a pod with "security.alpha.kubernetes.io/unsafe-sysctls: net.ipv4.ip_forward=0", then check the pod status
[root@ip-172-18-10-19 ~]# oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/pods/sysctls/pod-sysctl-unsafe.yaml -n dma
pod "hello-pod" created
[root@ip-172-18-10-19 ~]# oc get po -n dma -o wide
NAME        READY     STATUS            RESTARTS   AGE       IP        NODE
hello-pod   0/1       SysctlForbidden   0          8s        <none>    ip-172-18-8-45.ec2.internal

Actual results:
2. The argument is "--experimental-allowed-unsafe-sysctls=kernel.shm*,kernel.msg*,kernel.sem,fs.mqueue.*,net.*"
3. The pod status is SysctlForbidden

Expected results:
2. The argument should be --experimental-allowed-unsafe-sysctls=kernel.shm*,kernel.msg*,kernel.sem,fs.mqueue.*,net.*
3. The pod should be running

Additional info:

Comment 1 Seth Jennings 2018-06-06 16:07:51 UTC
This is due to the argument not matching the regex in shellEscapeArg() called by WriteKubeletFlags():
https://github.com/openshift/origin/blob/master/pkg/cmd/server/origin/node/node.go#L23

Unfortunately, it quotes both the flag and argument together rather than just the argument.
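The failure mode can be sketched with a minimal stand-in for that helper (the regex and the exact quoting call here are hypothetical, for illustration only; the real shellEscapeArg lives in the linked node.go):

```go
package main

import (
	"fmt"
	"regexp"
)

// safeArg matches tokens that need no shell quoting. Anything containing
// characters such as '*', '<', ',' or '%' fails the match and gets quoted.
// (Hypothetical pattern, not the actual one from node.go.)
var safeArg = regexp.MustCompile(`^[A-Za-z0-9\-=./_]+$`)

// shellEscapeArg receives the combined "--flag=value" token, so when the
// value contains special characters the quotes end up around flag and
// value together -- this is the bug comment 1 describes.
func shellEscapeArg(arg string) string {
	if safeArg.MatchString(arg) {
		return arg
	}
	return fmt.Sprintf("%q", arg)
}

func main() {
	// A plain token passes through untouched.
	fmt.Println(shellEscapeArg("--max-pods=250"))
	// A token with '*' and ',' gets the whole thing quoted, flag name included.
	fmt.Println(shellEscapeArg("--experimental-allowed-unsafe-sysctls=kernel.shm*,kernel.msg*"))
}
```

The second call produces "--experimental-allowed-unsafe-sysctls=kernel.shm*,kernel.msg*" with the quotes wrapping the flag name as well as the value, which matches the token visible in the ps output above.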

Comment 2 Seth Jennings 2018-06-06 20:19:17 UTC
Clayton pointed out that having the flag and argument inside the quotes is valid for some command parsers.  We need to verify that the argument is not being ignored by the kubelet parser.

Avesh, can you take a look?

Comment 3 Avesh Agarwal 2018-06-06 20:32:15 UTC
(In reply to Seth Jennings from comment #2)
> Clayton pointed out that having the flag and argument inside the quotes is
> valid for some command parsers.  We need to verify that the argument is not
> being ignored by the kubelet parser.
> 
> Avesh, can you take a look?

Sure, will take a look soon.

Comment 4 DeShuai Ma 2018-06-13 10:09:58 UTC
'--eviction-hard' and '--eviction-soft' have the same issue.

//kubelet process
root      4064 14.7  0.4 715616 78304 ?        Ssl  05:50   0:01 /usr/bin/hyperkube kubelet --v=5 --address=0.0.0.0 --allow-privileged=true --anonymous-auth=true --authentication-token-webhook=true --authentication-token-webhook-cache-ttl=5m --authorization-mode=Webhook --authorization-webhook-cache-authorized-ttl=5m --authorization-webhook-cache-unauthorized-ttl=5m --bootstrap-kubeconfig=/etc/origin/node/bootstrap.kubeconfig --cadvisor-port=0 --cert-dir=/etc/origin/node/certificates --cgroup-driver=systemd --client-ca-file=/etc/origin/node/client-ca.crt --cloud-config=/etc/origin/cloudprovider/aws.conf --cloud-provider=aws --cluster-dns=172.18.5.248 --cluster-domain=cluster.local --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --containerized=false --enable-controller-attach-detach=true "--eviction-hard=imagefs.available<15%,memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<10%" --eviction-max-pod-grace-period=10 "--eviction-soft=memory.available<15Gi" --eviction-soft-grace-period=memory.available=1m0s --experimental-dockershim-root-directory=/var/lib/dockershim --fail-swap-on=false --feature-gates=RotateKubeletClientCertificate=true,RotateKubeletServerCertificate=true --file-check-frequency=0s --healthz-bind-address= --healthz-port=0 --host-ipc-sources=api --host-ipc-sources=file --host-network-sources=api --host-network-sources=file --host-pid-sources=api --host-pid-sources=file --hostname-override= --http-check-frequency=0s --image-service-endpoint=/var/run/crio/crio.sock --iptables-masquerade-bit=0 --kubeconfig=/etc/origin/node/node.kubeconfig --max-pods=250 --network-plugin=cni --node-ip= --node-labels=node-role.kubernetes.io/master=true --pod-infra-container-image=registry.reg-aws.openshift.com:443/openshift3/ose-pod:v3.10.0-0.66.0 --pod-manifest-path=/etc/origin/node/pods --port=10250 --read-only-port=0 --register-node=true --root-dir=/var/lib/origin/openshift.local.volumes --rotate-certificates=true 
--runtime-request-timeout=10m --tls-cert-file= --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305 --tls-cipher-suites=TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305 --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256 --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256 --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA --tls-cipher-suites=TLS_RSA_WITH_AES_128_GCM_SHA256 --tls-cipher-suites=TLS_RSA_WITH_AES_256_GCM_SHA384 --tls-cipher-suites=TLS_RSA_WITH_AES_128_CBC_SHA --tls-cipher-suites=TLS_RSA_WITH_AES_256_CBC_SHA --tls-min-version=VersionTLS12 --tls-private-key-file= 


//The value in node log
Jun 12 23:42:27 ip-172-18-5-248.ec2.internal atomic-openshift-node[13437]: I0612 23:42:27.269941   13437 flags.go:27] FLAG: --eviction-hard="imagefs.available<15%,memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%"
Jun 12 23:42:27 ip-172-18-5-248.ec2.internal atomic-openshift-node[13437]: I0612 23:42:27.269958   13437 flags.go:27] FLAG: --eviction-max-pod-grace-period="0"
Jun 12 23:42:27 ip-172-18-5-248.ec2.internal atomic-openshift-node[13437]: I0612 23:42:27.269963   13437 flags.go:27] FLAG: --eviction-minimum-reclaim=""
Jun 12 23:42:27 ip-172-18-5-248.ec2.internal atomic-openshift-node[13437]: I0612 23:42:27.269971   13437 flags.go:27] FLAG: --eviction-pressure-transition-period="5m0s"
Jun 12 23:42:27 ip-172-18-5-248.ec2.internal atomic-openshift-node[13437]: I0612 23:42:27.269976   13437 flags.go:27] FLAG: --eviction-soft=""
Jun 12 23:42:27 ip-172-18-5-248.ec2.internal atomic-openshift-node[13437]: I0612 23:42:27.269982   13437 flags.go:27] FLAG: --eviction-soft-grace-period=""

Comment 5 DeShuai Ma 2018-06-13 10:11:49 UTC
I noticed you moved the bug to 3.10.z. Can we fix it in 3.10.0? This causes some parameters to be impossible to override.

Comment 8 Avesh Agarwal 2018-06-15 15:03:01 UTC
Both PRs are merged. Moving it to Modified.
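For illustration, one way to avoid the problem is to split the token at the first '=' and quote only the value, so the flag name stays bare (a sketch under assumed names, not the actual merged patch):

```go
package main

import (
	"fmt"
	"strings"
)

// needsQuoting reports whether a value contains characters a shell would
// treat specially (a simplified check, for illustration only).
func needsQuoting(s string) bool {
	return strings.ContainsAny(s, `*?<>%$&|;'" `)
}

// escapeValueOnly quotes just the value part of "--flag=value", producing
// --flag="value" rather than "--flag=value".
func escapeValueOnly(arg string) string {
	name, value, ok := strings.Cut(arg, "=")
	if !ok || !needsQuoting(value) {
		return arg
	}
	return name + "=" + fmt.Sprintf("%q", value)
}

func main() {
	fmt.Println(escapeValueOnly("--experimental-allowed-unsafe-sysctls=kernel.shm*,kernel.msg*,kernel.sem,fs.mqueue.*,net.*"))
	// prints --experimental-allowed-unsafe-sysctls="kernel.shm*,kernel.msg*,kernel.sem,fs.mqueue.*,net.*"
}
```

With the quotes confined to the value, a launcher that word-splits the line still hands the parser a token beginning with "--", so the flag is recognized.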

Comment 10 weiwei jiang 2018-06-20 06:58:54 UTC
Checked on v3.10.1 with:
openshift-ansible-3.10.1-1.git.157.2bb6250.el7.noarch.rpm
openshift-ansible-docs-3.10.1-1.git.157.2bb6250.el7.noarch.rpm
openshift-ansible-playbooks-3.10.1-1.git.157.2bb6250.el7.noarch.rpm
openshift-ansible-roles-3.10.1-1.git.157.2bb6250.el7.noarch.rpm

The issue has been fixed.

Comment 12 errata-xmlrpc 2018-07-30 19:17:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816