Bug 1313210
| Summary: | Cinder volume could not be attached to disk before the '60s' timeout duration on containerized openshift | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jianwei Hou <jhou> |
| Component: | Storage | Assignee: | Jan Safranek <jsafrane> |
| Status: | CLOSED ERRATA | QA Contact: | Jianwei Hou <jhou> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.2.0 | CC: | agoldste, aos-bugs, jhou, jkrieger, jokerman, jsafrane, mmccomas, mwysocki, pmorie, sdodson, tdawson |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-05-12 16:30:51 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (Jianwei Hou, 2016-03-01 08:21:56 UTC)
I failed to set up OpenShift in OpenStack running as containers. The internal OS1 cloud is slow as hell and the Ansible script fails at various stages. Can you please give me access to a machine where it is reproducible so I can take a look? Or teach me how to provision one; I heard you have some scripting around it.

Finally, I am able to reproduce it. It is indeed caused by the containerized openshift-node. When it attaches a Cinder (or any other) volume to the host, it expects the appropriate device to be created in /dev. Since OpenShift runs in a container, it does not see the real /dev but only the container's own, and it times out waiting for the attached device. As a solution, I would propose running openshift-node with "docker run -v /dev:/dev". Alternatively, OpenShift/Kubernetes must be changed to look for devices in a configurable directory rather than a hardcoded /dev. The same should also happen on GCE, or with containerized OpenShift attaching iSCSI, Ceph RBD, or any other block device. AWS might be protected from this error because device names are assigned by the kubelet (and thus do not need to be read from /dev) - I did not check this, as running containerized OpenShift is quite painful. Jianwei, I am still very interested in some automated way to run containerized OpenShift on OpenStack and/or AWS, especially with nightly builds.

So, can anyone add "-v /dev/:/dev" to /etc/systemd/system/openshift-node.service when running the node as a container? Is it a good idea? (A rough sketch of such a change is shown below.) Reassigning to the Containers component.

Experimental patch: https://github.com/openshift/origin/pull/8119

Adding Scott to cc:. You're the last one who updated the .service files - can you please look at it?

The mountpoint check should be running in the host mount namespace; why do we need to mount in /dev?

Because volume plugins are not aware of running in a container. Only the mounter is.

Why is this assigned to me? Jon? This looks to be specific to the container.

Fixed by https://github.com/openshift/origin/pull/8182. Assigning to Scott, he did the fix. No development work should be needed now; we are just waiting for QE to confirm the bug is really fixed.

This should be in OSE v3.2.0.7, which was built and pushed to QE today.
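For illustration only, a minimal sketch of the "-v /dev:/dev" idea proposed above, assuming the containerized node is started by a docker run invocation in the unit's ExecStart (the container name and exact option placement are assumptions, not the fix that was actually merged):

```
# Sketch: add a bind mount of the host /dev to the docker run line in
# /etc/systemd/system/openshift-node.service, for example (flags abbreviated,
# container name assumed):
#
#   ExecStart=/usr/bin/docker run --name openshift-node ... -v /dev:/dev ...
#
# then reload systemd and restart the node service:
systemctl daemon-reload
systemctl restart openshift-node
```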
Tested on:
openshift v3.2.0.7
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5
Steps were the same as in the bug description. The pod status remained in 'ContainerCreating':
# oc get pods
NAME READY STATUS RESTARTS AGE
cinderpd 0/1 ContainerCreating 0 10m
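To see why the pod is stuck, its events can be checked first. A quick way to do that (pod name and project taken from this report; the exact output is not shown here) is:

```
# Describe the stuck pod; the Events section typically shows the same
# FailedMount/FailedSync warnings that appear in the node log below.
oc describe pod cinderpd -n jhou
```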
Running docker exec interactively in the node container and tailing /var/log/messages showed the following errors (a sketch of the commands follows the log excerpt):
```
Mar 24 06:56:06 openshift-111 atomic-openshift-node: I0324 06:56:06.904016    7383 server.go:606] Event(api.ObjectReference{Kind:"Pod", Namespace:"jhou", Name:"cinderpd", UID:"8d277d8b-f1ad-11e5-af25-fa163e4f5f19", APIVersion:"v1", ResourceVersion:"7283", FieldPath:""}): type: 'Warning' reason: 'FailedMount' Unable to mount volumes for pod "cinderpd_jhou(8d277d8b-f1ad-11e5-af25-fa163e4f5f19)": exit status 32
Mar 24 06:56:06 openshift-111 atomic-openshift-node: I0324 06:56:06.904103    7383 server.go:606] Event(api.ObjectReference{Kind:"Pod", Namespace:"jhou", Name:"cinderpd", UID:"8d277d8b-f1ad-11e5-af25-fa163e4f5f19", APIVersion:"v1", ResourceVersion:"7283", FieldPath:""}): type: 'Warning' reason: 'FailedSync' Error syncing pod, skipping: exit status 32
```
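For reference, the inspection described above might look roughly like this on a containerized node (the container name and filter are assumptions, not taken from this report):

```
# Find the node container, then follow the system log from inside it.
docker ps --filter name=node
docker exec -it atomic-openshift-node tail -f /var/log/messages
```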
In the OpenStack console -> Volumes, the UI showed that the volume was in-use and attached to my node 'openshift-111.lab.eng.nay.redhat.com', but OpenShift still considered the mount failed.
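The attachment state seen in the console can usually also be confirmed from the command line with the Cinder client (the volume ID below is a placeholder; no CLI output is part of this report):

```
# List volumes and inspect the one backing the PV; its status should be
# "in-use" and attached to the node. <volume-id> is hypothetical.
cinder list
cinder show <volume-id>
```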
Furthermore, the PV I created was dynamically provisioned. Since the pod was stuck in 'ContainerCreating', I deleted the pod and the PVC, but the provisioned PV and the Cinder volume were left behind; they were not deleted. (If this is later considered a separate issue, I will open another bug to track it.)

There is something wrong with the OpenShift nsenter mounter, I'll look at it. Fixed mounter: https://github.com/kubernetes/kubernetes/pull/23435. Waiting for review.
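To illustrate the general technique behind the nsenter mounter (a sketch only, not the code from the linked PR): a containerized kubelet can run mount(8) in the host's mount namespace via nsenter, assuming the host root filesystem is bind-mounted into the node container at /rootfs. The device and target paths below are made up for illustration:

```
# Run mount in the host mount namespace from inside the node container.
# /rootfs, /dev/vdb and /mnt/cinder are assumptions for this sketch.
nsenter --mount=/rootfs/proc/1/ns/mnt -- /bin/mount -t ext4 /dev/vdb /mnt/cinder
```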
Merged as origin PR: https://github.com/openshift/origin/pull/8501

Merge failed. Please try merging again.

8501 just merged.

Should be in atomic-openshift-3.2.0.18-1.git.0.c3ac515.el7. This has been built and staged for QE.

Verified on a containerized setup of openshift v3.2.0.18, kubernetes v1.2.0-36-g4a3f9c5, etcd 2.2.5. I ran the reproduction steps and the bug is no longer reproducible, so I am marking this bug as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2016:1064