Bug 1373119 - [infrastructure_public_178]Pod with 'tcp_max_syn_backlog' and 'tcp_syncookies' sysctls always Failed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.3.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Stefan Schimanski
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-09-05 09:28 UTC by DeShuai Ma
Modified: 2017-03-08 18:43 UTC

Fixed In Version:
Doc Type: Release Note
Doc Text:
Part of https://github.com/openshift/openshift-docs/commit/5a532e7b9d2795e31c18c169078f037e68a0afdf
Clone Of:
: 1390706
Environment:
Last Closed: 2017-01-18 12:53:19 UTC
Target Upstream Version:
Embargoed:




Links
System ID: Red Hat Product Errata RHBA-2017:0066
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat OpenShift Container Platform 3.4 RPM Release Advisory
Last Updated: 2017-01-18 17:23:26 UTC

Description DeShuai Ma 2016-09-05 09:28:21 UTC
Description of problem:
Creating a pod with the safe sysctl 'net.ipv4.tcp_max_syn_backlog' or 'net.ipv4.tcp_syncookies' always fails. The infra container reports the error "open /proc/sys/net/ipv4/tcp_max_syn_backlog: no such file or directory" or "open /proc/sys/net/ipv4/tcp_syncookies: no such file or directory".

Version-Release number of selected component (if applicable):
fork_ami_openshift3_clusterinfra_public_178_299

How reproducible:
Always

Steps to Reproduce:
1. Create a pod with the safe sysctl 'net.ipv4.tcp_max_syn_backlog' or 'net.ipv4.tcp_syncookies'
oc create -f https://raw.githubusercontent.com/mdshuai/testfile-openshift/master/sysctls/pod-sysctl-safe.yaml

2. Check the pod status
[root@ip-172-18-0-79 dma]# oc logs hello-pod
Error from server: container "hello-pod" in pod "hello-pod" is waiting to start: ContainerCreating
[root@ip-172-18-0-79 dma]# oc get pod
NAME        READY     STATUS              RESTARTS   AGE
hello-pod   0/1       ContainerCreating   0          17s
[root@ip-172-18-0-79 dma]# 
[root@ip-172-18-0-79 dma]# oc describe pod hello-pod
Name:			hello-pod
Namespace:		dma
Security Policy:	anyuid
Node:			ip-172-18-0-79.ec2.internal/172.18.0.79
Start Time:		Mon, 05 Sep 2016 05:19:11 -0400
Labels:			name=hello-pod
Status:			Pending
IP:			
Controllers:		<none>
Containers:
  hello-pod:
    Container ID:	
    Image:		docker.io/deshuai/hello-pod:latest
    Image ID:		
    Port:		8080/TCP
    State:		Waiting
      Reason:		ContainerCreating
    Ready:		False
    Restart Count:	0
    Volume Mounts:
      /tmp from tmp (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-l29uk (ro)
    Environment Variables:	<none>
Conditions:
  Type		Status
  Initialized 	True 
  Ready 	False 
  PodScheduled 	True 
Volumes:
  tmp:
    Type:	EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:	
  default-token-l29uk:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-l29uk
QoS Tier:	BestEffort
Events:
  FirstSeen	LastSeen	Count	From					SubobjectPath	Type		Reason		Message
  ---------	--------	-----	----					-------------	--------	------		-------
  25s		25s		1	{default-scheduler }					Normal		Scheduled	Successfully assigned hello-pod to ip-172-18-0-79.ec2.internal
  24s		24s		1	{kubelet ip-172-18-0-79.ec2.internal}			Warning		FailedSync	Error syncing pod, skipping: failed to "StartContainer" for "POD" with RunContainerError: "runContainer: Error response from daemon: Cannot start container 7054eff9d1fc4d5d18cc5097263293421e19ba7c980f5b33f25df7481c30b718: [9] System error: could not synchronise with container process"

  12s	12s	1	{kubelet ip-172-18-0-79.ec2.internal}		Warning	FailedSync	Error syncing pod, skipping: failed to "StartContainer" for "POD" with RunContainerError: "runContainer: Error response from daemon: Cannot start container e4fd3f95a3a0e74425128482f895fe20669cb3467e90f8436b2469734fecd567: [9] System error: could not synchronise with container process"

3. Check the infra container
[root@ip-172-18-0-79 dma]# docker ps -a
CONTAINER ID        IMAGE                                             COMMAND                  CREATED              STATUS                  PORTS               NAMES
1898e7d3864a        openshift/origin-pod:v1.3.0-alpha.3               "/pod"                   9 seconds ago        Created                                     k8s_POD.f98c0130_hello-pod_dma_ce3d8845-7349-11e6-91fb-0e95fd557f75_236c2de8
[root@ip-172-18-0-79 dma]# docker logs 1898e7d3864a
open /proc/sys/net/ipv4/tcp_max_syn_backlog: no such file or directory

Actual results:
2. The pod never starts running; it stays in ContainerCreating.

Expected results:
2. The pod is running.

Additional info:

Comment 1 Stefan Schimanski 2016-09-05 09:45:09 UTC
net.ipv4.tcp_syncookies works fine here on the Fedora kernel. I will check on the RHEL kernel.

Comment 2 Stefan Schimanski 2016-09-05 10:35:18 UTC
Double-checked with a recent CentOS VM: net.ipv4.tcp_syncookies is not namespaced there.
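
One way to repeat this check is to look for the sysctl's file under /proc/sys from inside a container or a fresh network namespace. The following is a minimal sketch, assuming the usual mapping of dots in a sysctl name to path separators; the helper names are hypothetical, and the result only reflects what the calling process can see.

```python
import os

PROC_SYS = "/proc/sys"


def sysctl_path(name, root=PROC_SYS):
    """Map a sysctl name such as 'net.ipv4.tcp_syncookies' to its file path."""
    return os.path.join(root, *name.split("."))


def sysctl_visible(name, root=PROC_SYS):
    """Return True if the sysctl file exists in the caller's view of /proc/sys."""
    return os.path.isfile(sysctl_path(name, root))


if __name__ == "__main__":
    # The two sysctls discussed in this bug.
    for name in ("net.ipv4.tcp_syncookies", "net.ipv4.tcp_max_syn_backlog"):
        print(name, "visible" if sysctl_visible(name) else "missing")
```

Run on the host, both files are normally present. The check is only meaningful when the script runs inside the container or inside a new network namespace, where a sysctl that is not namespaced would be expected to show up as missing.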

Comment 3 Stefan Schimanski 2016-09-05 10:45:55 UTC
I will check what we can do about better error behavior in the Docker runtime. I guess the best we can do (without putting a lot of kernel knowledge into the kubelet code) is to fail hard with a good error message. Right now, the kubelet retries a number of times when container creation fails. That's not a good user experience.

Comment 4 Stefan Schimanski 2016-09-05 11:00:52 UTC
Here is the kernel patch for tcp_syncookies: https://github.com/torvalds/linux/commit/12ed8244ed8b31b023ea6d2851fd8b15f2999e9b

Summary: available in kernels >= 4.6.
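
A rough way to apply that rule to a node is to compare the running kernel release against 4.6. This is only a sketch: the function names are hypothetical, and the version number is a heuristic, since a distribution kernel can carry backports (or lack them) regardless of its version string.

```python
import platform
import re


def kernel_version(release):
    """Parse the leading 'major.minor' of a kernel release string into a tuple."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def upstream_has_namespaced_syncookies(release):
    """True if an upstream kernel of this version contains the tcp_syncookies patch."""
    return kernel_version(release) >= (4, 6)


if __name__ == "__main__":
    release = platform.release()
    print(release, upstream_has_namespaced_syncookies(release))
```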

Comment 5 Stefan Schimanski 2016-09-05 11:14:08 UTC
net.ipv4.tcp_max_syn_backlog does not belong on the whitelist because it is not namespaced in any of today's kernels. Having it whitelisted does no harm, though, because the sysctl is simply not available under /proc/sys in the container.

Here is the upstream fix: https://github.com/kubernetes/kubernetes/pull/32072

Comment 6 Stefan Schimanski 2016-09-05 12:55:53 UTC
About better error behavior: in Docker 1.10, the error message about the failed sysctl is part of the container's stdout/stderr, not of the actual container status. Compare:

$ docker -l=debug run -d --sysctl=kernel.shm_rmid_forced=hello ubuntu /bin/bash -c "sysctl kernel.shm_rmid_forced"
09813c9c38cf071e0581c38acccce149331d88729bdb0a47f586ca73a92c7221
docker: Error response from daemon: Cannot start container 09813c9c38cf071e0581c38acccce149331d88729bdb0a47f586ca73a92c7221: [9] System error: could not synchronise with container process.
$ docker logs 09813c9c38cf071e0581c38acccce149331d88729bdb0a47f586ca73a92c7221
write /proc/sys/kernel/shm_rmid_forced: invalid argument

With Docker 1.12 the situation is better:

$ docker -l=debug run -d --sysctl=kernel.shm_rmid_forced=hello busybox /bi>
9483cb34b2559e26a96db01c29727c11afd7d5461588c607992119355507f541
docker: Error response from daemon: oci runtime error: write /proc/sys/kernel/shm_rmid_forced: invalid argument.
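
Since Docker 1.12 puts the sysctl failure into the daemon error itself, a caller could pull the sysctl name and reason out of that message. Below is a minimal sketch, assuming the message keeps the shape shown above; the function name is hypothetical, and accepting "open" as well as "write" is an assumption based on the Docker 1.10 container logs in this bug.

```python
import re

# Matches e.g. "oci runtime error: write /proc/sys/kernel/shm_rmid_forced: invalid argument."
_SYSCTL_ERROR = re.compile(
    r"oci runtime error: (open|write) /proc/sys/(\S+): (.+?)\.?$"
)


def parse_sysctl_error(message):
    """Return (operation, sysctl name, reason), or None if no sysctl error is found."""
    match = _SYSCTL_ERROR.search(message.strip())
    if match is None:
        return None
    operation, rel_path, reason = match.groups()
    # Convert the path below /proc/sys back into dotted sysctl notation.
    return operation, rel_path.replace("/", "."), reason
```

The Docker 1.10 message ("could not synchronise with container process") carries no sysctl detail, so the function returns None for it.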

Comment 7 Stefan Schimanski 2016-09-05 13:20:26 UTC
The bad news (even with Docker 1.12) is that we currently have no mechanism to react to specific container creation errors and fail a pod.

Comment 8 DeShuai Ma 2016-09-05 14:12:48 UTC
net.ipv4.tcp_syncookies has the same issue; could you double-check?

Comment 9 Stefan Schimanski 2016-09-05 14:22:37 UTC
Summing up all of the above in one comment:

The following applies to all failing sysctls:

- If the kernel does not namespace a sysctl ("no such file or directory") or rejects a sysctl value ("invalid argument"), Docker 1.10 shows this in the container logs only. Kubernetes/OpenShift does not see this output. Moreover, it keeps retrying the container launch (with a backoff), creating multiple events.

- With Docker 1.12 the situation is a bit better, as Docker reports the actual sysctl error in the container launch error message. Then at least the user can see that it was a sysctl error.

The actual failure reason differs slightly for each of the sysctls mentioned:

- net.ipv4.tcp_max_syn_backlog is not namespaced on any kernel and should be removed from the whitelist (https://github.com/kubernetes/kubernetes/pull/32072)

- net.ipv4.tcp_syncookies is namespaced in kernels >= 4.6, but not in RHEL's enterprise kernel. If customers request this, here is the patch for the kernel team to backport: https://github.com/torvalds/linux/commit/12ed8244ed8b31b023ea6d2851fd8b15f2999e9b

- kernel.shm_rmid_forced=hello fails because the kernel validates the value "hello" and rejects it.
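
The two error strings above can be mapped to their likely causes mechanically. The following is a sketch only: the function name and the cause descriptions are hypothetical, and "no such file or directory" is interpreted as "not namespaced" on the assumption that the sysctl name itself is spelled correctly.

```python
# Error text as it appears in the infra container logs, mapped to its likely cause.
CAUSES = {
    "no such file or directory": "sysctl is not namespaced by this kernel",
    "invalid argument": "kernel rejected the sysctl value",
}


def explain_sysctl_failure(log_line):
    """Return the likely cause for a sysctl failure log line, or None if unrecognised."""
    text = log_line.strip().rstrip(".")
    if "/proc/sys/" not in text:
        return None
    for errno_text, cause in CAUSES.items():
        if text.endswith(errno_text):
            return cause
    return None
```

Neither cause goes away on retry, which is why failing the pod outright would be preferable to relaunching the container with a backoff.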

Comment 10 DeShuai Ma 2016-10-09 09:56:30 UTC
For net.ipv4.tcp_syncookies support, the requirement of kernels >= 4.6 needs to be documented in openshift-docs.

Comment 12 DeShuai Ma 2016-10-28 02:06:32 UTC
The issue still exists. I don't know why this was changed to ON_QA.

Comment 14 Stefan Schimanski 2016-10-31 15:05:42 UTC
Backport of the net.ipv4.tcp_max_syn_backlog removal to 3.3.x: https://github.com/openshift/ose/pull/441

Comment 15 Andy Goldstein 2016-11-01 16:02:35 UTC
https://github.com/openshift/openshift-docs/pull/3144 is the docs PR that will soon contain the release note that net.ipv4.tcp_syncookies is not namespaced in the RHEL kernel.

The ose PR will fix net.ipv4.tcp_max_syn_backlog in 3.3.x. It is already fixed in 3.4.

Comment 16 Andy Goldstein 2016-11-01 17:16:14 UTC
Moving to MODIFIED as the fix for net.ipv4.tcp_max_syn_backlog is in 3.4 already.

As described above, we will add a release note about net.ipv4.tcp_syncookies.

DeShuai Ma - when verifying this bz, please only verify that net.ipv4.tcp_max_syn_backlog is no longer in the whitelist.
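
For reference, the whitelist check being verified can be sketched as follows. This is not the kubelet code: the function name is hypothetical, and the whitelist contents are an assumption about the safe set after the fix (tcp_max_syn_backlog removed). The message text follows the kubelet event quoted in the verification comment on this bug.

```python
# ASSUMPTION: safe sysctls after the fix; net.ipv4.tcp_max_syn_backlog is gone.
SAFE_SYSCTLS = frozenset([
    "kernel.shm_rmid_forced",
    "net.ipv4.ip_local_port_range",
    "net.ipv4.tcp_syncookies",
])


def forbidden_sysctl_message(requested, whitelist=SAFE_SYSCTLS):
    """Return the rejection message for the first non-whitelisted sysctl, else None."""
    for name in requested:
        if name not in whitelist:
            return 'forbidden sysctl: "%s" not whitelisted' % name
    return None
```

With this check in place the pod is rejected up front with SysctlForbidden instead of looping in ContainerCreating.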

Comment 17 Troy Dawson 2016-11-02 17:53:47 UTC
This has been merged into ose and is in OSE v3.4.0.19 or newer.

Comment 18 Xiaoli Tian 2016-11-04 06:20:04 UTC
Moving to ON_QA according to comment 17.

Comment 19 DeShuai Ma 2016-11-04 07:17:20 UTC
Tested on openshift v3.4.0.21+ca4702d.
Verified the bug: "net.ipv4.tcp_max_syn_backlog" is no longer whitelisted.

[root@dhcp-128-7 dma]# oc get pod
NAME        READY     STATUS            RESTARTS   AGE
hello-pod   0/1       SysctlForbidden   0          <invalid>
[root@dhcp-128-7 dma]# oc describe pod hello-pod
Name:			hello-pod
Namespace:		dma
Security Policy:	restricted
Node:			weshi-3.centralus.cloudapp.azure.com/
Start Time:		Fri, 04 Nov 2016 15:15:37 +0800
Labels:			name=hello-pod
Status:			Failed
Reason:			SysctlForbidden
Message:		Pod forbidden sysctl: "net.ipv4.tcp_max_syn_backlog" not whitelisted
IP:			
Controllers:		<none>
Containers:
  hello-pod:
    Image:	docker.io/deshuai/hello-pod:latest
    Port:	8080/TCP
    Volume Mounts:
      /tmp from tmp (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-jakux (ro)
    Environment Variables:	<none>
Volumes:
  tmp:
    Type:	EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:	
  default-token-jakux:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-jakux
QoS Class:	BestEffort
Tolerations:	<none>
Events:
  FirstSeen	LastSeen	Count	From						SubobjectPath	Type		Reason		Message
  ---------	--------	-----	----						-------------	--------	------		-------
  <invalid>	<invalid>	1	{kubelet weshi-3.centralus.cloudapp.azure.com}			Warning		SysctlForbidden	forbidden sysctl:
"net.ipv4.tcp_max_syn_backlog" not whitelisted

Comment 21 errata-xmlrpc 2017-01-18 12:53:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0066

