Bug 1438474

Summary:	Cannot attach Azure Disk
Product:	OpenShift Container Platform	Reporter:	Vladislav Walek <vwalek>
Component:	Storage	Assignee:	hchen
Status:	CLOSED ERRATA	QA Contact:	Wenqi He <wehe>
Severity:	urgent	Docs Contact:
Priority:	high
Version:	3.4.0	CC:	aos-bugs, hchen, jokerman, mmccomas, tdawson
Target Milestone:	---
Target Release:	3.4.z
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:
Clones:	1440892		Environment:
Last Closed:	2017-04-19 19:43:17 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Vladislav Walek 2017-04-03 13:29:23 UTC

Description of problem:

The customer is trying to attach an Azure VHD disk to OpenShift for use by the docker registry. The OpenShift node logs the following errors:

Mar 30 14:12:47 <node> atomic-openshift-node[2994]: E0330 14:12:47.066793    2994 kubelet_node_status.go:69] Unable to construct api.Node object for kubelet: failed to get external ID from cloud provider: compute.VirtualMachinesClient#Get: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailed" Message="The client '<client_id>' with object id '<client_id>' does not have authorization to perform action 'Microsoft.Compute/virtualMachines/read' over scope '/subscriptions/<subscription_id>/resourceGroups/<LOCATION_ID>-SVT/providers/Microsoft.Compute/virtualMachines/<node>'."
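
For triage, the missing permission can be pulled out of a line like the one above. This is only a sketch: the helper name missing_permission and the regular expression are hypothetical and based solely on the message format quoted in this report, not on any documented Azure or OpenShift log format.

```python
import re

# Hypothetical triage helper. The pattern is derived only from the
# AuthorizationFailed message quoted in this report.
AUTH_FAILED = re.compile(
    r"Code=\"AuthorizationFailed\".*?"
    r"perform action '(?P<action>[^']+)' "
    r"over scope '(?P<scope>[^']+)'"
)


def missing_permission(log_line):
    """Return (action, scope) if the line reports AuthorizationFailed, else None."""
    match = AUTH_FAILED.search(log_line)
    if match is None:
        return None
    return match.group("action"), match.group("scope")


# Message portion of the log line from this report, placeholders kept as-is.
sample = (
    "Status=403 Code=\"AuthorizationFailed\" Message=\"The client '<client_id>' "
    "with object id '<client_id>' does not have authorization to perform action "
    "'Microsoft.Compute/virtualMachines/read' over scope "
    "'/subscriptions/<subscription_id>/resourceGroups/<LOCATION_ID>-SVT/providers/"
    "Microsoft.Compute/virtualMachines/<node>'.\""
)

if __name__ == "__main__":
    print(missing_permission(sample))
```

The returned action and scope show which read permission the configured client lacks, and on which resource.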

That error later disappears, and the following error is shown instead:

Mar 30 15:49:13 <node> atomic-openshift-node[15682]: E0330 15:49:13.665327   15682 nestedpendingoperations.go:253] Operation for "\"kubernetes.io/azure-disk/dockerregistry01\"" failed. No retries permitted until 2017-03-30 15:51:13.665304822 +0000 UTC (durationBeforeRetry 2m0s). Error: recovered from panic "runtime error: invalid memory address or nil pointer dereference". (err=<nil>) Call stack: 
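
To track how often the operation is retried, the volume name and the backoff can be extracted from a line like the one above. Again a sketch under assumptions: parse_retry and go_duration_seconds are hypothetical names, the pattern matches only the message format quoted here, and the duration parser handles only whole h/m/s components such as "2m0s".

```python
import re

# Hypothetical pattern, derived only from the nestedpendingoperations
# message quoted in this report.
RETRY = re.compile(
    r'Operation for "\\"(?P<volume>[^"\\]+)\\"" failed\. '
    r"No retries permitted until (?P<until>.+?) "
    r"\(durationBeforeRetry (?P<backoff>[^)]+)\)"
)


def go_duration_seconds(text):
    """Convert a simple Go-style duration such as '2m0s' to seconds.

    Only whole hour/minute/second components are handled; fractional
    values and sub-second units (ms, us, ns) are out of scope here.
    """
    units = {"h": 3600, "m": 60, "s": 1}
    return sum(int(n) * units[u] for n, u in re.findall(r"(\d+)([hms])", text))


def parse_retry(log_line):
    """Return (volume, retry_time_text, backoff_seconds), or None if no match."""
    match = RETRY.search(log_line)
    if match is None:
        return None
    return (
        match.group("volume"),
        match.group("until"),
        go_duration_seconds(match.group("backoff")),
    )


# Message portion of the log line from this report.
sample_retry = (
    r'Operation for "\"kubernetes.io/azure-disk/dockerregistry01\"" failed. '
    r"No retries permitted until 2017-03-30 15:51:13.665304822 +0000 UTC "
    r"(durationBeforeRetry 2m0s). Error: recovered from panic"
)

if __name__ == "__main__":
    print(parse_retry(sample_retry))
```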

This second error still occurs.

Version-Release number of selected component (if applicable):

OpenShift Container Platform 3.4.0

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 10 Wenqi He 2017-04-07 07:25:51 UTC

I am not sure about the panic, but I found that if a pod is being created with an invalid disk, creating another pod with a valid disk always fails.

$ oc version
openshift v3.4.1.15
kubernetes v1.4.0+776c994


$ oc get pods 
NAME      READY     STATUS              RESTARTS   AGE
azcaro    0/1       ContainerCreating   0          19m
azrarw    0/1       ContainerCreating   0          20m
azrwro    0/1       ContainerCreating   0          20m

Here azrwro and azrarw use invalid disks, while azcaro uses a valid one.

Comment 14 Wenqi He 2017-04-07 07:33:45 UTC

After deleting the pods with invalid disks and creating a new pod with a valid disk, the new pod runs:

$ oc get pods
NAME      READY     STATUS    RESTARTS   AGE
azcaro    1/1       Running   0          4m

$ oc exec -it azcaro sh
/ $ ls /mnt/azure/
20170309    20170310    20170313    lost+found
/ $ exit

Comment 20 Wenqi He 2017-04-11 01:39:23 UTC

Based on the change described in comment 15 and comment 14, this bug is fixed. Thanks.

Comment 22 errata-xmlrpc 2017-04-19 19:43:17 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0989