Bug 1584386

Summary: Docker daemon fails to create kube-state-metrics container with mounted secrets
Product: OpenShift Container Platform Reporter: Dan Mace <dmace>
Component: Hawkular Assignee: Frederic Branczyk <fbranczy>
Status: CLOSED CURRENTRELEASE QA Contact: Junqi Zhao <juzhao>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.10.0 CC: aos-bugs, dma, jokerman, mmccomas
Target Milestone: ---   
Target Release: 3.10.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-06-27 08:42:05 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description                    Flags
kube-state-metrics pod YAML    none
prometheus-k8s pod             none
kube-state-metrics pod info    none

Description Dan Mace 2018-05-30 19:07:36 UTC
Created attachment 1445976 [details]
kube-state-metrics pod YAML

Description of problem:

In the free-stg cluster, the kube-state-metrics pod (part of the monitoring platform in the openshift-monitoring namespace) has a container which is failing to be created by the Docker daemon. The pod uses mounted secrets (and no other volumes). I've attached the pod's YAML. Here's the Docker daemon error:

May 30 17:32:51 ip-172-31-66-208.us-east-2.compute.internal dockerd-current[2313]: time="2018-05-30T17:32:51.280202577Z" level=error msg="Handler for POST /v1.26/containers/create?name=k8s_kube-state-metrics_kube-state-metrics-d6f855965-p4989_openshift-monitoring_825e7362-6425-11e8-bc67-02306c0cdc4b_0 returned error: create b9cc4e7d51a2d2454fd711ad9621214e71afecc8c038f42d290d8f258d85af52: error while creating volume path '/var/lib/docker/volumes/b9cc4e7d51a2d2454fd711ad9621214e71afecc8c038f42d290d8f258d85af52/_data': mkdir /var/lib/docker/volumes/b9cc4e7d51a2d2454fd711ad9621214e71afecc8c038f42d290d8f258d85af52: permission denied"
May 30 17:32:51 ip-172-31-66-208.us-east-2.compute.internal dockerd-current[2313]: time="2018-05-30T17:32:51.282540234Z" level=error msg="Handler for POST /v1.26/containers/create returned error: create b9cc4e7d51a2d2454fd711ad9621214e71afecc8c038f42d290d8f258d85af52: error while creating volume path '/var/lib/docker/volumes/b9cc4e7d51a2d2454fd711ad9621214e71afecc8c038f42d290d8f258d85af52/_data': mkdir /var/lib/docker/volumes/b9cc4e7d51a2d2454fd711ad9621214e71afecc8c038f42d290d8f258d85af52: permission denied"
May 30 17:32:51 ip-172-31-66-208.us-east-2.compute.internal atomic-openshift-node[8645]: E0530 17:32:51.282987    8645 remote_runtime.go:187] CreateContainer in sandbox "b70c2941d6eff0627ee7ca93a32c913fe8cf41b20189744f865355951b7d6916" from runtime service failed: rpc error: code = Unknown desc = Error response from daemon: create b9cc4e7d51a2d2454fd711ad9621214e71afecc8c038f42d290d8f258d85af52: error while creating volume path '/var/lib/docker/volumes/b9cc4e7d51a2d2454fd711ad9621214e71afecc8c038f42d290d8f258d85af52/_data': mkdir /var/lib/docker/volumes/b9cc4e7d51a2d2454fd711ad9621214e71afecc8c038f42d290d8f258d85af52: permission denied
May 30 17:32:51 ip-172-31-66-208.us-east-2.compute.internal atomic-openshift-node[8645]: E0530 17:32:51.283089    8645 kuberuntime_manager.go:734] container start failed: CreateContainerError: Error response from daemon: create b9cc4e7d51a2d2454fd711ad9621214e71afecc8c038f42d290d8f258d85af52: error while creating volume path '/var/lib/docker/volumes/b9cc4e7d51a2d2454fd711ad9621214e71afecc8c038f42d290d8f258d85af52/_data': mkdir /var/lib/docker/volumes/b9cc4e7d51a2d2454fd711ad9621214e71afecc8c038f42d290d8f258d85af52: permission denied
May 30 17:32:51 ip-172-31-66-208.us-east-2.compute.internal atomic-openshift-node[8645]: E0530 17:32:51.283249    8645 pod_workers.go:186] Error syncing pod 825e7362-6425-11e8-bc67-02306c0cdc4b ("kube-state-metrics-d6f855965-p4989_openshift-monitoring(825e7362-6425-11e8-bc67-02306c0cdc4b)"), skipping: failed to "StartContainer" for "kube-state-metrics" with CreateContainerError: "Error response from daemon: create b9cc4e7d51a2d2454fd711ad9621214e71afecc8c038f42d290d8f258d85af52: error while creating volume path '/var/lib/docker/volumes/b9cc4e7d51a2d2454fd711ad9621214e71afecc8c038f42d290d8f258d85af52/_data': mkdir /var/lib/docker/volumes/b9cc4e7d51a2d2454fd711ad9621214e71afecc8c038f42d290d8f258d85af52: permission denied"

The monitoring deployment is identical to free-int, where the pod works correctly. This broken pod prevents the monitoring stack from being fully rolled out by the cluster-monitoring-operator, degrading overall functionality.

Version-Release number of selected component (if applicable):

v3.10.0-0.54.0

Comment 1 Dan Mace 2018-05-30 19:15:16 UTC
Another problem which stands out: the openshift-monitoring/statefulsets/prometheus-k8s pods mount their data directories into emptydir volumes, and prometheus processes are unable to write to those mounted directories:

  WAL log samples: log series: write /prometheus/wal/000001: file already closed

It seems very coincidental that we're also seeing problems with emptydir mounts. I'm attaching the YAML for that pod too.

Comment 2 Dan Mace 2018-05-30 19:15:55 UTC
Created attachment 1445977 [details]
prometheus-k8s pod

Comment 3 Dan Mace 2018-05-30 20:26:25 UTC
Spoke with Seth; it looks like we found another image that declares a VOLUME we're not overriding in the pod spec:

https://github.com/kubernetes/kube-state-metrics/blob/master/Dockerfile#L4

Looks like free-int must be misconfigured to allow auto-mounting of VOLUMEs to succeed.

Comment 4 Dan Mace 2018-05-30 21:05:22 UTC
Created https://bugzilla.redhat.com/show_bug.cgi?id=1584415 to cover the separate issue[1].

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1584386#c1

Comment 5 Frederic Branczyk 2018-05-31 07:23:28 UTC
Opened the upstream fix: https://github.com/kubernetes/kube-state-metrics/pull/471

In the meantime we can just shadow the `/tmp` VOLUME directive with an emptyDir volume.
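
A minimal sketch of what such a shadow could look like in the pod spec, assuming the image declares `VOLUME /tmp` as described above. The volume name `volume-directive-shadow` is hypothetical, and the fragment shows only the relevant fields rather than the full pod from the attachment. Mounting an explicit emptyDir at the same path means the kubelet supplies the directory, so the Docker daemon should not need to create an anonymous volume under /var/lib/docker/volumes for it.

```yaml
# Pod spec fragment (illustrative only; not the actual operator manifest).
spec:
  containers:
  - name: kube-state-metrics
    volumeMounts:
    # Mount at the same path the image's VOLUME directive declares.
    - name: volume-directive-shadow   # hypothetical name
      mountPath: /tmp
  volumes:
  # An emptyDir lives for the lifetime of the pod and is managed by the kubelet.
  - name: volume-directive-shadow
    emptyDir: {}
```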

Comment 6 Frederic Branczyk 2018-05-31 08:58:05 UTC
Opened the intermediate fix for the cluster-monitoring-operator as well: https://github.com/openshift/cluster-monitoring-operator/pull/28

Comment 7 Dan Mace 2018-06-06 13:22:39 UTC
Fixed in https://github.com/openshift/openshift-ansible/pull/8591

Comment 9 Junqi Zhao 2018-06-07 02:46:13 UTC
Tested with openshift-ansible-3.10.0-0.63.0.git.0.961c60d.el7.noarch; the kube-state-metrics pod starts without error:

# oc get po | grep kube-state-metrics
kube-state-metrics-b44488686-fk4wd            3/3       Running   0          19m

Steps:
1. Set openshift_monitoring_deploy=true in the inventory file.
2. Run the playbooks/openshift-monitoring/config.yml playbook.


Attached the kube-state-metrics pod info.

Comment 10 Junqi Zhao 2018-06-07 02:46:54 UTC
Created attachment 1448565 [details]
kube-state-metrics pod info