Bug 1438474

Summary: Cannot attach Azure Disk
Product: OpenShift Container Platform Reporter: Vladislav Walek <vwalek>
Component: Storage Assignee: hchen
Status: CLOSED ERRATA QA Contact: Wenqi He <wehe>
Severity: urgent Docs Contact:
Priority: high    
Version: 3.4.0 CC: aos-bugs, hchen, jokerman, mmccomas, tdawson
Target Milestone: ---   
Target Release: 3.4.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1440892 Environment:
Last Closed: 2017-04-19 19:43:17 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Vladislav Walek 2017-04-03 13:29:23 UTC
Description of problem:

The customer is trying to attach an Azure VHD disk to OpenShift for use by the docker registry. Unfortunately, OpenShift shows the following errors:

Mar 30 14:12:47 <node> atomic-openshift-node[2994]: E0330 14:12:47.066793    2994 kubelet_node_status.go:69] Unable to construct api.Node object for kubelet: failed to get external ID from cloud provider: compute.VirtualMachinesClient#Get: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailed" Message="The client '<client_id>' with object id '<client_id>' does not have authorization to perform action 'Microsoft.Compute/virtualMachines/read' over scope '/subscriptions/<subscription_id>/resourceGroups/<LOCATION_ID>-SVT/providers/Microsoft.Compute/virtualMachines/<node>'."

That error then disappears, and the following error is shown instead:

Mar 30 15:49:13 <node> atomic-openshift-node[15682]: E0330 15:49:13.665327   15682 nestedpendingoperations.go:253] Operation for "\"kubernetes.io/azure-disk/dockerregistry01\"" failed. No retries permitted until 2017-03-30 15:51:13.665304822 +0000 UTC (durationBeforeRetry 2m0s). Error: recovered from panic "runtime error: invalid memory address or nil pointer dereference". (err=<nil>) Call stack: 

The last error still occurs.
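
For triage, the first error above already names the missing permission and the resource it applies to. A minimal sketch, assuming the message format matches the log line quoted above, that pulls both values out of a node log line. The function name `parse_authorization_failure` is hypothetical and is not part of OpenShift or any Azure tooling.

```python
import re

# Matches the Azure "AuthorizationFailed" wording seen in the node log above.
# Assumption: the message keeps the "perform action '...' over scope '...'" shape.
AUTH_FAILED = re.compile(
    r"does not have authorization to perform action '(?P<action>[^']+)' "
    r"over scope '(?P<scope>[^']+)'"
)


def parse_authorization_failure(log_line):
    """Return (action, scope) from an AuthorizationFailed log line, or None."""
    match = AUTH_FAILED.search(log_line)
    if match is None:
        return None
    return match.group("action"), match.group("scope")


if __name__ == "__main__":
    # Shortened copy of the log line from this report; placeholders kept as-is.
    sample = (
        "Status=403 Code=\"AuthorizationFailed\" Message=\"The client '<client_id>' "
        "with object id '<client_id>' does not have authorization to perform action "
        "'Microsoft.Compute/virtualMachines/read' over scope "
        "'/subscriptions/<subscription_id>/resourceGroups/<LOCATION_ID>-SVT/"
        "providers/Microsoft.Compute/virtualMachines/<node>'.\""
    )
    print(parse_authorization_failure(sample))
```

The action and scope returned this way can be compared against the roles granted to the client configured for the cloud provider.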

Version-Release number of selected component (if applicable):

OpenShift Container Platform 3.4.0

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 10 Wenqi He 2017-04-07 07:25:51 UTC
I am not sure about the panic issue, but I found that if a pod is being created with an invalid disk, then creating another pod with a valid disk always fails.

$ oc version
openshift v3.4.1.15
kubernetes v1.4.0+776c994


$ oc get pods 
NAME      READY     STATUS              RESTARTS   AGE
azcaro    0/1       ContainerCreating   0          19m
azrarw    0/1       ContainerCreating   0          20m
azrwro    0/1       ContainerCreating   0          20m

Here azrwro and azrarw are using invalid disks, while azcaro is using a valid one.
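
When reproducing this, it can help to pick out the stuck pods from the listing automatically. A rough sketch, assuming the default column layout of `oc get pods` shown above and that no column value contains spaces. The helper name `pods_in_status` is hypothetical.

```python
def pods_in_status(oc_get_pods_output, status):
    """Return the names of pods whose STATUS column equals `status`."""
    lines = [line for line in oc_get_pods_output.splitlines() if line.strip()]
    if not lines:
        return []
    # Locate the columns from the header row instead of hard-coding positions.
    header = lines[0].split()
    name_idx = header.index("NAME")
    status_idx = header.index("STATUS")
    names = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) > max(name_idx, status_idx) and fields[status_idx] == status:
            names.append(fields[name_idx])
    return names


if __name__ == "__main__":
    # Listing copied from the comment above.
    listing = """\
NAME      READY     STATUS              RESTARTS   AGE
azcaro    0/1       ContainerCreating   0          19m
azrarw    0/1       ContainerCreating   0          20m
azrwro    0/1       ContainerCreating   0          20m
"""
    print(pods_in_status(listing, "ContainerCreating"))
```

The listing alone does not say which of these pods reference invalid disks; that still has to be checked against each pod's volume definition.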

Comment 14 Wenqi He 2017-04-07 07:33:45 UTC
After deleting the pods with invalid disks and creating a new pod with a valid disk, the new pod reaches Running:

$ oc get pods
NAME      READY     STATUS    RESTARTS   AGE
azcaro    1/1       Running   0          4m

$ oc exec -it azcaro sh
/ $ ls /mnt/azure/
20170309    20170310    20170313    lost+found
/ $ exit

Comment 20 Wenqi He 2017-04-11 01:39:23 UTC
Based on comment 15 and comment 14, this bug is fixed. Thanks.

Comment 22 errata-xmlrpc 2017-04-19 19:43:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0989