Bug 1466248

Summary: [rbd] Improve error messaging for failed volume mount
Product: OpenShift Container Platform Reporter: Jianwei Hou <jhou>
Component: Storage Assignee: hchen
Status: CLOSED UPSTREAM QA Contact: Jianwei Hou <jhou>
Severity: low Docs Contact:
Priority: low    
Version: 3.6.0 CC: aos-bugs, aos-storage-staff, rhs-bugs
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-01-11 16:19:34 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jianwei Hou 2017-06-29 10:52:43 UTC
Description of problem:
When an rbd PV is created with invalid monitors and the Pod then fails to mount it, no useful message is reported describing how or why the mount failed.

Version-Release number of selected component (if applicable):
openshift v3.6.126.1
kubernetes v1.6.1+5115d708d7

How reproducible:
Always

Steps to Reproduce:
1. Create rbd PV with invalid monitors
2. Create PVC and Pod
3. After the Pod fails to mount, run 'oc describe pod'
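
For reference, a minimal sketch of the PVC and Pod from step 2. The claim name (rbdc), namespace (b5wkc), Pod name (rbdpd), volume name (pvol), access mode and size are taken from the PV dump and the events in this report; the container name, image, command and mount path are assumptions for illustration and were not part of the original report.

```yaml
# PVC that binds to the rbd PV (names and size taken from the PV's claimRef and capacity)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbdc
  namespace: b5wkc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
# Pod that mounts the claim as volume "pvol"
apiVersion: v1
kind: Pod
metadata:
  name: rbdpd
  namespace: b5wkc
spec:
  containers:
  - name: app                     # hypothetical container name
    image: busybox                # assumption: any image that keeps running will do
    command: ["sleep", "3600"]    # assumption
    volumeMounts:
    - name: pvol
      mountPath: /mnt/rbd         # hypothetical mount path
  volumes:
  - name: pvol
    persistentVolumeClaim:
      claimName: rbdc
```

With the PV's monitors set to unreachable addresses, the Pod should stay in ContainerCreating and 'oc describe pod rbdpd -n b5wkc' should show the FailedMount events.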

Actual results:
The describe command only showed:

```
  8m            8m              1       default-scheduler                                       Normal          Scheduled               Successfully assigned rbdpd to ip-172-18-7-39.ec2.internal
  6m            1m              3       kubelet, ip-172-18-7-39.ec2.internal                    Warning         FailedMount             Unable to mount volumes for pod "rbdpd_b5wkc(ef35557e-5cb3-11e7-af12-0e8aebc0d9ea)": timeout expired waiting for volumes to attach/mount for pod "b5wkc"/"rbdpd". list of unattached/unmounted volumes=[pvol]
  6m            1m              3       kubelet, ip-172-18-7-39.ec2.internal                    Warning         FailedSync              Error syncing pod
```

Expected results:
In earlier versions, `oc describe` reported `rbd: map failed`, but in 3.6 there is almost no useful message.

Additional info:

Master Log:

Node Log (of failed PODs):

PV Dump:
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/bound-by-controller: "yes"
  creationTimestamp: 2017-06-29T10:15:43Z
  name: rbd-b5wkc
  resourceVersion: "13813"
  selfLink: /api/v1/persistentvolumes/rbd-b5wkc
  uid: e8b896ae-5cb3-11e7-af12-0e8aebc0d9ea
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 5Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: rbdc
    namespace: b5wkc
    resourceVersion: "13811"
    uid: ea7fb166-5cb3-11e7-af12-0e8aebc0d9ea
  persistentVolumeReclaimPolicy: Retain
  rbd:
    fsType: ext4
    image: disk01
    keyring: /etc/ceph/keyring
    monitors:
    - 000.000.0.000:0009
    - 000.000.0.000:0009
    - 000.000.0.000:0009
    - 000.000.0.000:0009
    pool: rbd
    secretRef:
      name: cephrbd-secret
    user: admin
status:
  phase: Bound

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Comment 6 hchen 2018-01-11 20:00:33 UTC
*** Bug 1505264 has been marked as a duplicate of this bug. ***