Bug 1367937

Summary: Pods on Containerized OCP Nodes cannot mount remote ceph rbd
Product: OpenShift Container Platform
Reporter: Jon Cope <jcope>
Component: Installer
Assignee: Scott Dodson <sdodson>
Status: CLOSED ERRATA
QA Contact: Johnny Liu <jialiu>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.2.1
CC: aos-bugs, bchilds, eparis, jhou, jokerman, mmccomas, tdawson, xtian
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, the containerized node mounted /sys read-only, which prevented the node from mounting Ceph volumes. The containerized node now mounts /sys read-write, which allows it to mount Ceph volumes properly.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-09-27 09:44:47 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Jon Cope 2016-08-17 22:12:01 UTC
Description of problem:
When running OSE 3.2 containerized on a RHEL Atomic Host, pods created on the node cannot mount remote Ceph RBDs. The cause is that the docker run command in atomic-openshift-node.service bind mounts /sys:/sys read-only. It should be read-write instead.


Version-Release number of selected component (if applicable):
2 VM OSE 3.2 Cluster
Master: RHEL 7.2 Server
Node:   RHEL AH 7.2
Ceph:   AIO Containerized Demo hosted on Master


How reproducible:
Very. The problem is caused by the default configuration of atomic-openshift-node.service.


Steps to Reproduce:
1. Install OSE 3.2 using atomic-openshift-installer. Select rpm for the master and containerized for the node.

2. Deploy the Ceph demo container:
`docker run --net=host -v /etc/ceph:/etc/ceph -v /var/lib/ceph:/var/lib/ceph   -e MON_IP=192.168.124.240 -e CEPH_NETWORK=192.168.124.0/24 -e CEPH_PUBLIC_NETWORK=192.168.124.0/24  ceph/demo`

3. Create and deploy ceph secret, persistent volume, and claim specs per https://docs.openshift.org/latest/install_config/persistent_storage/persistent_storage_ceph_rbd.html.

Pod template used:
apiVersion: v1
kind: Pod
metadata:
  generateName: ceph-nginx-pvc-
  labels:
    vol: ceph 
spec:
  containers:
   - name: ceph-pvc
     image: fedora/nginx 
     volumeMounts:
       - name: cephvol
         mountPath: /mnt/import
  securityContext:
    privileged: true
  volumes:
    - name: cephvol 
      persistentVolumeClaim:
        claimName: ceph-claim

4. Run oc describe pod to view the relevant errors.


Actual results:

journalctl -eu atomic-openshift-node output:
Aug 17 10:12:51 node.ae atomic-openshift-node[3448]: E0817 10:12:51.001795    3499 pod_workers.go:138] Error syncing pod a22f1231-6488-11e6-9284-5254007c2254, skipping: rbd: map failed exit status 30 2016-08-17 10:12:50.982834 7fc7100987c
Aug 17 10:12:51 node.ae atomic-openshift-node[3448]: rbd: add failed: (30) Read-only file system
Aug 17 10:13:02 node.ae atomic-openshift-node[3448]: I0817 10:13:02.966649    3499 rbd.go:89] ceph secret info: key/AQBSLK5XeNKqBxAAdYFblXFkebX2KVVEo7ESjw==
Aug 17 10:13:03 node.ae atomic-openshift-node[3448]: I0817 10:13:03.970063    3499 rbd_util.go:229] rbd: map mon 192.168.124.240:6789
Aug 17 10:13:04 node.ae atomic-openshift-node[3448]: I0817 10:13:04.011021    3499 rbd_util.go:240] rbd: map error exit status 30 2016-08-17 10:13:03.978876 7f5ab10b67c0 -1 did not load config file, using default settings.
Aug 17 10:13:04 node.ae atomic-openshift-node[3448]: rbd: add failed: (30) Read-only file system
Aug 17 10:13:04 node.ae atomic-openshift-node[3448]: E0817 10:13:04.011093    3499 disk_manager.go:56] failed to attach disk
Aug 17 10:13:04 node.ae atomic-openshift-node[3448]: E0817 10:13:04.011103    3499 rbd.go:208] rbd: failed to setup
Aug 17 10:13:04 node.ae atomic-openshift-node[3448]: E0817 10:13:04.011170    3499 kubelet.go:1796] Unable to mount volumes for pod "ceph-nginx-pvc-0gr12_default(a22f1231-6488-11e6-9284-5254007c2254)": rbd: map failed exit status 30 2016-
Aug 17 10:13:04 node.ae atomic-openshift-node[3448]: rbd: add failed: (30) Read-only file system
Aug 17 10:13:04 node.ae atomic-openshift-node[3448]: ; skipping pod
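
To check whether a node is hitting the same failure, the journal can be searched for the error string shown above. The following is a minimal sketch, not part of the original report: the function name count_rbd_ro_errors is hypothetical, and the match string is copied from the log lines in this bug.

```shell
# Count journal lines carrying the read-only signature from this report.
# Reads log text on stdin and prints the number of matching lines.
count_rbd_ro_errors() {
    # -F: match the string literally; -c: print a count instead of the lines.
    # grep exits non-zero when nothing matches, so "|| true" keeps the
    # function usable under "set -e" while still printing 0.
    grep -cF 'rbd: add failed: (30) Read-only file system' || true
}

# Usage on the node (unit name taken from this report):
#   journalctl -u atomic-openshift-node --no-pager | count_rbd_ro_errors
```

A non-zero count only shows that the rbd map call hit a read-only file system; it does not by itself prove that the /sys bind mount is the cause.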



Expected results:
The RBD is successfully attached to the pod via the PVC.

Additional Info:

I was able to create the pod after modifying /etc/systemd/system/atomic-openshift-node.service. In ExecStart=/usr/bin/docker run --name atomic-openshift-node --rm --privileged ....... -v /sys:/sys:ro.... I changed -v /sys:/sys:ro to -v /sys:/sys:rw. After that, I ran systemctl daemon-reload && systemctl restart atomic-openshift-node.
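
The manual edit described above could be scripted roughly as follows. This is a sketch under assumptions, not the fix that shipped: the helper name set_sys_mount_rw is hypothetical, and it assumes the unit file contains the literal string /sys:/sys:ro as quoted in this report.

```shell
# Rewrite the /sys bind mount from read-only to read-write in a unit file.
# $1: path to the unit file. A copy is kept next to it with a .bak suffix.
set_sys_mount_rw() {
    unit_file=$1
    # Keep a backup, then rewrite from the backup into the original path
    # so the file's ownership and permissions are left as they were.
    cp -p "$unit_file" "$unit_file.bak" &&
    sed 's#/sys:/sys:ro#/sys:/sys:rw#g' "$unit_file.bak" > "$unit_file"
}

# Usage on the node, as root (path and unit name taken from this report):
#   set_sys_mount_rw /etc/systemd/system/atomic-openshift-node.service
#   systemctl daemon-reload && systemctl restart atomic-openshift-node
```

Only the /sys mount is touched; other read-only bind mounts on the same ExecStart line are left unchanged.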

Comment 1 Bradley Childs 2016-08-18 17:51:59 UTC
Fixed here:
https://github.com/openshift/origin/pull/10516

Comment 2 Scott Dodson 2016-08-18 20:16:51 UTC
The real fix has to go into the installer. Unfortunately, the systemd units in origin are just for reference.

https://github.com/openshift/openshift-ansible/pull/2324 - 3.2
https://github.com/openshift/openshift-ansible/pull/2323 - 3.3

Comment 9 Jianwei Hou 2016-08-25 02:17:14 UTC
With the latest Ansible installer, /sys:/sys is mounted read-write and a Ceph RBD server can be used as persistent storage for pods. Marking this bug as verified.
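
One possible way to confirm the mount mode on a running node is to read the mount table from inside the node container. This is a sketch only and not necessarily how the verification above was done: the function name sys_mount_mode is hypothetical, and it assumes the input is in /proc/mounts format (device, mount point, type, options, ...).

```shell
# Print "rw" or "ro" for the last /sys entry in /proc/mounts-format input.
# The last entry is used because a later mount on the same path hides
# earlier ones. Exits non-zero if no /sys entry with ro/rw is found.
sys_mount_mode() {
    awk '
        $2 == "/sys" {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro" || opts[i] == "rw") last = opts[i]
        }
        END { if (last != "") print last; else exit 1 }
    '
}

# Usage on the node (container name taken from this report):
#   docker exec atomic-openshift-node cat /proc/mounts | sys_mount_mode
```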

Comment 11 errata-xmlrpc 2016-09-27 09:44:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933