Bug 1466248

Summary:	[rbd] Improve error messaging for failed volume mount
Product:	OpenShift Container Platform	Reporter:	Jianwei Hou <jhou>
Component:	Storage	Assignee:	hchen
Status:	CLOSED UPSTREAM	QA Contact:	Jianwei Hou <jhou>
Severity:	low	Docs Contact:
Priority:	low
Version:	3.6.0	CC:	aos-bugs, aos-storage-staff, rhs-bugs
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-01-11 16:19:34 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Jianwei Hou 2017-06-29 10:52:43 UTC

Description of problem:
When an rbd PV is created with invalid monitors and the Pod then fails to mount it, there are no useful messages describing how or why the mount failed.

Version-Release number of selected component (if applicable):
openshift v3.6.126.1
kubernetes v1.6.1+5115d708d7

How reproducible:
Always

Steps to Reproduce:
1. Create rbd PV with invalid monitors
2. Create PVC and Pod
3. After the Pod fails to mount, run 'oc describe pod'

Actual results:
The describe command only showed:

```
  8m            8m              1       default-scheduler                                       Normal          Scheduled               Successfully assigned rbdpd to ip-172-18-7-39.ec2.internal
  6m            1m              3       kubelet, ip-172-18-7-39.ec2.internal                    Warning         FailedMount             Unable to mount volumes for pod "rbdpd_b5wkc(ef35557e-5cb3-11e7-af12-0e8aebc0d9ea)": timeout expired waiting for volumes to attach/mount for pod "b5wkc"/"rbdpd". list of unattached/unmounted volumes=[pvol]
  6m            1m              3       kubelet, ip-172-18-7-39.ec2.internal                    Warning         FailedSync              Error syncing pod
```

Expected results:
In earlier versions, we could get `rbd: map failed` from `oc describe`, but in 3.6 there is almost no useful message.

Additional info:

Master Log:

Node Log (of failed PODs):

PV Dump:
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/bound-by-controller: "yes"
  creationTimestamp: 2017-06-29T10:15:43Z
  name: rbd-b5wkc
  resourceVersion: "13813"
  selfLink: /api/v1/persistentvolumes/rbd-b5wkc
  uid: e8b896ae-5cb3-11e7-af12-0e8aebc0d9ea
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 5Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: rbdc
    namespace: b5wkc
    resourceVersion: "13811"
    uid: ea7fb166-5cb3-11e7-af12-0e8aebc0d9ea
  persistentVolumeReclaimPolicy: Retain
  rbd:
    fsType: ext4
    image: disk01
    keyring: /etc/ceph/keyring
    monitors:
    - 000.000.0.000:0009
    - 000.000.0.000:0009
    - 000.000.0.000:0009
    - 000.000.0.000:0009
    pool: rbd
    secretRef:
      name: cephrbd-secret
    user: admin
status:
  phase: Bound
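
Until the mount error itself is more descriptive, the monitor entries from a dump like the one above can be sanity-checked by hand before the PV is created. The following is only a minimal sketch: `check_monitor` is a hypothetical helper, not part of `oc` or the kubelet, and it assumes every monitor is written as a dotted-quad IPv4 address followed by `:port` (hostnames and IPv6 are not handled).

```python
def check_monitor(entry):
    """Return a list of problems found in one 'address:port' monitor entry.

    Hypothetical helper for illustration only; assumes IPv4 monitors.
    """
    host, sep, port_text = entry.rpartition(":")
    if not sep:
        return [f"{entry!r}: expected 'address:port'"]

    problems = []
    octets = host.split(".")
    if len(octets) != 4 or not all(o.isascii() and o.isdigit() for o in octets):
        problems.append(f"{entry!r}: address is not dotted-quad IPv4")
    else:
        values = [int(o) for o in octets]
        if any(v > 255 for v in values):
            problems.append(f"{entry!r}: address octet out of range")
        elif all(v == 0 for v in values):
            # An all-zero address cannot identify a reachable monitor.
            problems.append(f"{entry!r}: address is all zeros")

    if not (port_text.isascii() and port_text.isdigit()) or not 1 <= int(port_text) <= 65535:
        problems.append(f"{entry!r}: port out of range")
    return problems


# Monitor list as it appears in the PV dump of this report.
monitors = ["000.000.0.000:0009"] * 4
for monitor in monitors:
    for problem in check_monitor(monitor):
        print(problem)
```

A check like this only catches malformed or placeholder addresses. A well-formed address that points at no running monitor would still pass, so it does not replace a clearer error from the failed `rbd` map.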

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Comment 6 hchen 2018-01-11 20:00:33 UTC

*** Bug 1505264 has been marked as a duplicate of this bug. ***