Bug 1723603

Summary: Azure storage e2e tests are consistently failing
Product: OpenShift Container Platform
Reporter: Abhinav Dahiya <adahiya>
Component: Storage
Assignee: Fabio Bertinatto <fbertina>
Storage sub component: Kubernetes
QA Contact: Wei Duan <wduan>
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
CC: agarcial, aos-bugs, aos-storage-staff, brad.ison, fbertina, gblomqui, jchaloup, jsafrane
Version: 3.10.0
Keywords: Reopened
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Last Closed: 2020-10-27 15:54:19 UTC
Type: Bug

Description Abhinav Dahiya 2019-06-24 23:36:05 UTC
Description of problem:

The Azure platform tests for storage are consistently failing due to:

a) invalid objects
b) invalid API calls.

How reproducible:

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/23245/pull-ci-openshift-origin-master-e2e-azure/1

```
[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Dynamic PV (default fs)] provisioning should access volume from different nodes [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Dynamic PV (default fs)] subPath should verify container cannot write to subpath readonly volumes [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Inline-volume (default fs)] subPath should be able to unmount after the subpath directory is deleted [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Inline-volume (default fs)] subPath should support existing directories when readOnly specified in the volumeSource [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Inline-volume (default fs)] subPath should support existing directory [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Inline-volume (default fs)] subPath should support existing single file [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Inline-volume (default fs)] subPath should support file as subpath [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Inline-volume (default fs)] subPath should support non-existent path [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Inline-volume (default fs)] subPath should support readOnly directory specified in the volumeMount [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Inline-volume (default fs)] subPath should support readOnly file specified in the volumeMount [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Inline-volume (default fs)] subPath should verify container cannot write to subpath readonly volumes [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Inline-volume (default fs)] volumes should allow exec of files on the volume [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Inline-volume (default fs)] volumes should be mountable [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Inline-volume (ext4)] volumes should allow exec of files on the volume [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Inline-volume (ext4)] volumes should be mountable [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Pre-provisioned PV (block volmode)] volumeMode should create sc, pod, pv, and pvc, read/write to the pv, and delete all created resources [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Pre-provisioned PV (default fs)] subPath should be able to unmount after the subpath directory is deleted [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Pre-provisioned PV (default fs)] subPath should support existing directories when readOnly specified in the volumeSource [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Pre-provisioned PV (default fs)] subPath should support existing directory [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Pre-provisioned PV (default fs)] subPath should support existing single file [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Pre-provisioned PV (default fs)] subPath should support file as subpath [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Pre-provisioned PV (default fs)] subPath should support non-existent path [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Pre-provisioned PV (default fs)] subPath should support readOnly directory specified in the volumeMount [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Pre-provisioned PV (default fs)] subPath should support readOnly file specified in the volumeMount [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Pre-provisioned PV (default fs)] subPath should verify container cannot write to subpath readonly volumes [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Pre-provisioned PV (default fs)] volumes should allow exec of files on the volume [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Pre-provisioned PV (default fs)] volumes should be mountable [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Pre-provisioned PV (ext4)] volumes should allow exec of files on the volume [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Pre-provisioned PV (ext4)] volumes should be mountable [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Pre-provisioned PV (filesystem volmode)] volumeMode should create sc, pod, pv, and pvc, read/write to the pv, and delete all created resources [Suite:openshift/conformance/parallel] [Suite:k8s]
```

Comment 1 Abhinav Dahiya 2019-06-24 23:36:52 UTC
Updating severity to high as this is blocking CI for Azure.

Comment 2 Jan Safranek 2019-06-25 09:11:48 UTC
openshift-tests itself looks misconfigured: because it can't pre-create a volume for the tests, all "Pre-provisioned PV" and "Inline-volume" tests fail. This is the code that's failing:
https://github.com/openshift/origin/blob/d7a4539442e59eb8ccd4bdc8aca5eec731dd219d/vendor/k8s.io/kubernetes/test/e2e/framework/providers/azure/azure.go#L65

The accountName, accountType and location parameters are empty; this is then resolved in EnsureStorageAccount():
https://github.com/openshift/origin/blob/d7a4539442e59eb8ccd4bdc8aca5eec731dd219d/vendor/k8s.io/kubernetes/pkg/cloudprovider/providers/azure/azure_storageaccount.go#L93

How do you run openshift-tests? Does it have all azure-specific options / config files / env. variables so the account name + type discovery above can work?


On the bright side, it looks like the cluster under test is configured correctly: most of the tests with dynamically provisioned volumes are working.

Comment 3 Jan Safranek 2019-06-25 11:43:33 UTC
From openshift-tests output:

Jun 24 19:30:37.629: INFO: Couldn't create a new PD, sleeping 5 seconds: could not get storage key for storage account : could not list storage accounts for account type : azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/d38f1e38-4bed-438e-b227-833f997adf6a/resourceGroups/ci-op-dp142r4t-5cef7-2f6dk-rg/providers/Microsoft.Storage/storageAccounts?api-version=2018-07-01: StatusCode=404 -- Original Error: adal: Refresh request failed. Status Code = '404'. Response body: <!DOCTYPE html>
<html lang=en>
  <meta charset=utf-8>
  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
  <title>Error 404 (Not Found)!!1</title>
  <style>
    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}
  </style>
  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>
  <p><b>404.</b> <ins>That’s an error.</ins>
  <p>The requested URL <code>/metadata/identity/oauth2/token?api-version=2018-02-01&amp;resource=https%3A%2F%2Fmanagement.core.windows.net%2F</code> was not found on this server.  <ins>That’s all we know.</ins>

Comment 4 Abhinav Dahiya 2019-06-25 17:57:44 UTC
The contents for cloud.conf for the Azure tests 

```
aadClientCertPassword: ""
aadClientCertPath: ""
aadClientId: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
aadClientSecret: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
cloud: AzurePublicCloud
cloudProviderBackoff: true
cloudProviderBackoffDuration: 6
cloudProviderBackoffExponent: 0
cloudProviderBackoffJitter: 0
cloudProviderBackoffMode: ""
cloudProviderBackoffRetries: 0
cloudProviderRateLimit: true
cloudProviderRateLimitBucket: 10
cloudProviderRateLimitBucketWrite: 10
cloudProviderRateLimitQPS: 6
cloudProviderRateLimitQPSWrite: 6
disableOutboundSNAT: null
excludeMasterFromStandardLB: null
loadBalancerSku: standard
location: centralus
maximumLoadBalancerRuleCount: 0
primaryAvailabilitySetName: ""
primaryScaleSetName: ""
resourceGroup: adahiya-1-zr9dr-rg
routeTableName: adahiya-1-zr9dr-node-routetable
securityGroupName: adahiya-1-zr9dr-node-nsg
subnetName: adahiya-1-zr9dr-node-subnet
subscriptionId: 433715e6-37fe-4328-af75-3661e13b15fc
tenantId: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
useInstanceMetadata: true
useManagedIdentityExtension: true
userAssignedIdentityID: ""
vmType: ""
vnetName: adahiya-1-zr9dr-vnet
vnetResourceGroup: adahiya-1-zr9dr-rg
```

What do you think is missing, jsafrane?

Comment 5 Abhinav Dahiya 2019-06-25 18:03:35 UTC
test case: `[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Dynamic PV (default fs)] provisioning should access volume from different nodes [Suite:openshift/conformance/parallel] [Suite:k8s]`

Seems to be failing because:
```
oc get nodes -ojson | jq '.items[].metadata.labels'
{
  "beta.kubernetes.io/arch": "amd64",
  "beta.kubernetes.io/instance-type": "Standard_DS4_v2",
  "beta.kubernetes.io/os": "linux",
  "failure-domain.beta.kubernetes.io/region": "centralus",
  "failure-domain.beta.kubernetes.io/zone": "0",
  "kubernetes.io/arch": "amd64",
  "kubernetes.io/hostname": "adahiya-1-zr9dr-master-0",
  "kubernetes.io/os": "linux",
  "node-role.kubernetes.io/master": "",
  "node.openshift.io/os_id": "rhcos"
}
{
  "beta.kubernetes.io/arch": "amd64",
  "beta.kubernetes.io/instance-type": "Standard_DS4_v2",
  "beta.kubernetes.io/os": "linux",
  "failure-domain.beta.kubernetes.io/region": "centralus",
  "failure-domain.beta.kubernetes.io/zone": "0",
  "kubernetes.io/arch": "amd64",
  "kubernetes.io/hostname": "adahiya-1-zr9dr-master-1",
  "kubernetes.io/os": "linux",
  "node-role.kubernetes.io/master": "",
  "node.openshift.io/os_id": "rhcos"
}
{
  "beta.kubernetes.io/arch": "amd64",
  "beta.kubernetes.io/instance-type": "Standard_DS4_v2",
  "beta.kubernetes.io/os": "linux",
  "failure-domain.beta.kubernetes.io/region": "centralus",
  "failure-domain.beta.kubernetes.io/zone": "0",
  "kubernetes.io/arch": "amd64",
  "kubernetes.io/hostname": "adahiya-1-zr9dr-master-2",
  "kubernetes.io/os": "linux",
  "node-role.kubernetes.io/master": "",
  "node.openshift.io/os_id": "rhcos"
}
{
  "beta.kubernetes.io/arch": "amd64",
  "beta.kubernetes.io/instance-type": "Standard_DS4_v2",
  "beta.kubernetes.io/os": "linux",
  "failure-domain.beta.kubernetes.io/region": "centralus",
  "failure-domain.beta.kubernetes.io/zone": "centralus-1",
  "kubernetes.io/arch": "amd64",
  "kubernetes.io/hostname": "adahiya-1-zr9dr-worker-ftcm4",
  "kubernetes.io/os": "linux",
  "node-role.kubernetes.io/worker": "",
  "node.openshift.io/os_id": "rhcos"
}
{
  "beta.kubernetes.io/arch": "amd64",
  "beta.kubernetes.io/instance-type": "Standard_DS4_v2",
  "beta.kubernetes.io/os": "linux",
  "failure-domain.beta.kubernetes.io/region": "centralus",
  "failure-domain.beta.kubernetes.io/zone": "centralus-2",
  "kubernetes.io/arch": "amd64",
  "kubernetes.io/hostname": "adahiya-1-zr9dr-worker-g98df",
  "kubernetes.io/os": "linux",
  "node-role.kubernetes.io/worker": "",
  "node.openshift.io/os_id": "rhcos"
}
{
  "beta.kubernetes.io/arch": "amd64",
  "beta.kubernetes.io/instance-type": "Standard_DS4_v2",
  "beta.kubernetes.io/os": "linux",
  "failure-domain.beta.kubernetes.io/region": "centralus",
  "failure-domain.beta.kubernetes.io/zone": "centralus-1",
  "kubernetes.io/arch": "amd64",
  "kubernetes.io/hostname": "adahiya-1-zr9dr-worker-zfl45",
  "kubernetes.io/os": "linux",
  "node-role.kubernetes.io/worker": "",
  "node.openshift.io/os_id": "rhcos"
}
```

```
oc get pv -oyaml
apiVersion: v1
items:
- apiVersion: v1
  kind: PersistentVolume
  metadata:
    annotations:
      pv.kubernetes.io/bound-by-controller: "yes"
      pv.kubernetes.io/provisioned-by: kubernetes.io/azure-disk
      volumehelper.VolumeDynamicallyCreatedByKey: azure-disk-dynamic-provisioner
    creationTimestamp: "2019-06-25T17:19:55Z"
    finalizers:
    - kubernetes.io/pv-protection
    labels:
      failure-domain.beta.kubernetes.io/region: centralus
      failure-domain.beta.kubernetes.io/zone: centralus-2
    name: pvc-6fdcb2a9-976d-11e9-9bd8-000d3a948bbe
    resourceVersion: "21282"
    selfLink: /api/v1/persistentvolumes/pvc-6fdcb2a9-976d-11e9-9bd8-000d3a948bbe
    uid: 7351eccf-976d-11e9-a460-000d3a3f59ef
  spec:
    accessModes:
    - ReadWriteOnce
    azureDisk:
      cachingMode: ReadOnly
      diskName: kubernetes-dynamic-pvc-6fdcb2a9-976d-11e9-9bd8-000d3a948bbe
      diskURI: /subscriptions/433715e6-37fe-4328-af75-3661e13b15fc/resourceGroups/adahiya-1-zr9dr-rg/providers/Microsoft.Compute/disks/kubernetes-dynamic-pvc-6fdcb2a9-976d-11e9-9bd8-000d3a948bbe
      fsType: ""
      kind: Managed
      readOnly: false
    capacity:
      storage: 5Gi
    claimRef:
      apiVersion: v1
      kind: PersistentVolumeClaim
      name: pvc-z74wx
      namespace: provisioning-8879
      resourceVersion: "21254"
      uid: 6fdcb2a9-976d-11e9-9bd8-000d3a948bbe
    nodeAffinity:
      required:
        nodeSelectorTerms:
        - matchExpressions:
          - key: failure-domain.beta.kubernetes.io/region
            operator: In
            values:
            - centralus
          - key: failure-domain.beta.kubernetes.io/zone
            operator: In
            values:
            - centralus-2
    persistentVolumeReclaimPolicy: Delete
    storageClassName: provisioning-8879-azure-sc
    volumeMode: Filesystem
  status:
    phase: Bound
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

```

```
oc describe pod pvc-reader-node2-554ch -n provisioning-8879
Name:               pvc-reader-node2-554ch
Namespace:          provisioning-8879
Priority:           0
PriorityClassName:  <none>
Node:               <none>
Labels:             app=pvc-reader-node2
Annotations:        openshift.io/scc: anyuid
Status:             Pending
IP:
Containers:
  volume-tester:
    Image:      docker.io/library/busybox:1.29
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/sh
      -c
      grep 'hello world' /mnt/test/data
    Environment:  <none>
    Mounts:
      /mnt/test from my-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-tfk8c (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  my-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pvc-z74wx
    ReadOnly:   false
  default-token-tfk8c:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-tfk8c
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  88s (x3 over 2m54s)  default-scheduler  0/6 nodes are available: 1 node(s) didn't match node selector, 2 node(s) had volume node affinity conflict, 3 node(s) had taints that the pod didn't tolerate.
```

The test pod is failing to schedule because of some zone-level mismatch. What is the expectation for the storage setup in terms of VMs in specific zones, regions, etc.?
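
The mismatch can be checked mechanically. Below is a minimal sketch, not part of the test suite: it re-implements only the `In` operator of the PV's required node affinity and applies it to the zone and region labels from the output above. The helper name `nodes_matching_pv` and the trimmed label dictionaries are illustrative assumptions.

```python
# Illustrative sketch: which nodes satisfy the PV's required node affinity?
# Only the "In" operator is handled; real scheduling also considers taints,
# node selectors, and the other affinity operators.

REGION = "failure-domain.beta.kubernetes.io/region"
ZONE = "failure-domain.beta.kubernetes.io/zone"

# Labels copied (trimmed to region and zone) from the `oc get nodes` output.
nodes = {
    "adahiya-1-zr9dr-master-0": {REGION: "centralus", ZONE: "0"},
    "adahiya-1-zr9dr-master-1": {REGION: "centralus", ZONE: "0"},
    "adahiya-1-zr9dr-master-2": {REGION: "centralus", ZONE: "0"},
    "adahiya-1-zr9dr-worker-ftcm4": {REGION: "centralus", ZONE: "centralus-1"},
    "adahiya-1-zr9dr-worker-g98df": {REGION: "centralus", ZONE: "centralus-2"},
    "adahiya-1-zr9dr-worker-zfl45": {REGION: "centralus", ZONE: "centralus-1"},
}

# matchExpressions copied from the PV's nodeAffinity.
pv_match_expressions = [
    {"key": REGION, "operator": "In", "values": ["centralus"]},
    {"key": ZONE, "operator": "In", "values": ["centralus-2"]},
]


def nodes_matching_pv(nodes, expressions):
    """Return the sorted names of nodes whose labels satisfy every expression."""
    matching = []
    for name, labels in nodes.items():
        if all(
            expr["operator"] == "In" and labels.get(expr["key"]) in expr["values"]
            for expr in expressions
        ):
            matching.append(name)
    return sorted(matching)


print(nodes_matching_pv(nodes, pv_match_expressions))
```

Under these assumptions only one worker (the one in zone centralus-2) satisfies the affinity, which is consistent with the scheduler reporting a volume node affinity conflict for the other two workers.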

Comment 6 Jan Safranek 2019-06-26 11:39:15 UTC
> `[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Dynamic PV (default fs)] provisioning should access volume from different nodes [Suite:openshift/conformance/parallel] [Suite:k8s]`

This is tracked in bug #1711688.

Let's focus on openshift-tests setup here, which will fix most of the failures. When that's working we can sort out individual flakes in other bugs.

> The contents for cloud.conf for the Azure tests 
[snip]
> What do you think is missing jsafrane ??

I don't know. Our knowledge of Azure is very limited; I hoped that you would be more familiar with its setup. In the end, the cloud provider in openshift-tests should be set up in the same way as the cloud provider in OpenShift itself. I don't know Azure well enough to judge what is wrong and where.

Comment 7 Jan Safranek 2019-06-26 15:20:58 UTC
I found out why the test gets a 404: it uses a link-local address to get something from the cloud: http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fmanagement.core.windows.net%2F

It expects to run on Azure, where such a URL might make sense, but it runs on GCE(?) and thus gets a 404. I don't know anything about how the Kubernetes cloud provider works or whether it can be configured to run on a non-Azure machine to provision a disk.

Comment 8 Jan Safranek 2019-06-28 12:29:24 UTC
[trying to re-assign to the Azure team]

This bug is not about storage but about the cloud provider that's used in the e2e tests. It must be able to run outside of Azure, use only the Azure API to create volumes, and not use link-local addresses.

In short, this file should work outside of Azure:

https://github.com/kubernetes/kubernetes/blob/d3a902ff5b5b8737f2d5ff649656669b8223068f/test/e2e/framework/providers/azure/azure.go

Comment 9 Alberto 2019-07-26 13:34:18 UTC
>It expects it runs on azure, were such URL might have sense, but it runs on GCE(?) and thus it gets 404.
Jan Safranek, I'm confused: the e2e tests for Azure are running on a cluster running on Azure.

Comment 10 Jan Safranek 2019-07-26 14:16:35 UTC
No, our e2e tests do not run on Azure. I believe our OpenShift CI cluster (api.ci.openshift.org) runs on GCE and it spawns a pod for each test. The pod installs a "cluster under test" into the real cloud (AWS/Azure/GCE/vSphere/...), but the test binary itself (openshift-tests) still runs in a pod on the OpenShift CI cluster, i.e. on GCE.

I asked upstream; it seems it should be enough to set "useInstanceMetadata": false in your Azure cloud config, see https://kubernetes.slack.com/archives/C5HJXTT9Q/p156156401228690.
openshift-tests already does that, see
https://github.com/openshift/origin/blame/5d555f05619ad069ca78670ecc06b6c0fb5f0047/test/extended/util/azure/config_file.go#L36, so maybe it's already fixed.
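
For illustration, a minimal sketch of a pre-flight check for this situation. It flags cloud config settings that make the client depend on the link-local metadata endpoint (169.254.169.254), which is only meaningful on an Azure VM. The function name `settings_requiring_azure_host` and the choice of the two keys are assumptions drawn from this thread (both keys appear in the cloud.conf in comment 4), not an upstream API.

```python
# Hypothetical pre-flight check, not part of openshift-tests.
# Assumption: when either of these cloud.conf keys is true, the cloud provider
# contacts the link-local metadata endpoint, which only works on an Azure VM.
METADATA_DEPENDENT_KEYS = ("useInstanceMetadata", "useManagedIdentityExtension")


def settings_requiring_azure_host(cloud_config):
    """Return the keys in cloud_config that are enabled and need the metadata endpoint."""
    return [key for key in METADATA_DEPENDENT_KEYS if cloud_config.get(key) is True]


# Subset of the cloud.conf posted in comment 4.
config = {
    "cloud": "AzurePublicCloud",
    "location": "centralus",
    "useInstanceMetadata": True,
    "useManagedIdentityExtension": True,
}

print(settings_requiring_azure_host(config))
```

An empty result would suggest the config can be used from a machine outside Azure, at least as far as these two settings are concerned.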

Comment 11 Alberto 2019-07-26 14:35:03 UTC
Thanks for the update, Jan, but I'm still confused. Regardless of where you run the binary from (e.g. your local machine), it deploys a cluster on your cloud of choice (e.g. AWS/Azure) and runs the e2e tests against that cloud environment to validate the expectations there.

Comment 12 Jan Safranek 2019-07-26 14:45:59 UTC
openshift-tests contains this code: https://github.com/openshift/origin/blob/d7a4539442e59eb8ccd4bdc8aca5eec731dd219d/vendor/k8s.io/kubernetes/test/e2e/framework/providers/azure/azure.go#L65

I.e. it wants to create Azure disks for the tests. It does not run on Azure && it wanted to use Azure metadata -> error. That was a long time ago; maybe it's better now.

Comment 13 Vikas Choudhary 2019-08-22 11:09:54 UTC
Small update:
These tests fail on AWS as well and have been skipped in upstream k/k too: https://github.com/kubernetes/kubernetes/blob/master/test/e2e/storage/drivers/in_tree.go#L1553

Comment 14 Jan Chaloupka 2019-08-22 15:33:57 UTC
Jan, do you think it's ok to skip the test completely? If so, we can just skip and close this issue.

Comment 15 Vikas Choudhary 2019-08-26 10:19:21 UTC
Already being skipped: https://github.com/openshift/origin/blob/master/test/extended/util/test.go#L442-L472

Initially, in k/k, these storage tests were only for GCE. A refactoring was done in this PR, https://github.com/kubernetes/kubernetes/pull/66577/files, and interfaces were introduced. To satisfy the interfaces, half-baked tests were introduced in that PR for AWS and Azure. The AWS storage tests have been commented out in upstream k/k since then.

This is more of an enhancement than an issue.

Comment 16 Vikas Choudhary 2019-08-26 10:31:02 UTC
I will work on getting the storage tests working for Azure and AWS. We are already skipping the AWS storage tests, so skipping the Azure and AWS tests should, I think, be sufficient for this BZ. Closing it!

Comment 17 Jan Safranek 2019-09-04 08:40:25 UTC
I think there are several issues mixed here.

1. The openshift-tests binary, running on GCE, needs to create / delete volumes on a remote Azure cluster. This is IMO not fixed; I can still see the corresponding tests disabled in our CI: https://github.com/openshift/origin/blob/cf923545a180bbe4bfd03db7d7fc01a2bf9ff23d/test/extended/util/test.go#L445

I think we're a step further: with all Azure tests enabled, I get this error:

Sep  3 12:52:13.178: INFO: At 2019-09-03 12:47:09 +0000 UTC - event for pod-subpath-test-azure-4f4h: {attachdetach-controller } FailedAttachVolume: AttachVolume.Attach failed for volume "test-volume" : Attach volume "e2e-effe3739-ce48-11e9-86bc-0a58ac100cb7.vhd" to instance "/subscriptions/d38f1e38-4bed-438e-b227-833f997adf6a/resourceGroups/ci-op-85vrtic6-5cef7-dxct6-rg/providers/Microsoft.Compute/virtualMachines/ci-op-85vrtic6-5cef7-dxct6-worker-centralus3-t4685" failed with compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status=<nil> Code="OperationNotAllowed" Message="Addition of a blob based disk to VM with managed disks is not supported." Target="dataDisk"

The reason is that the test creates a "blob" disk, while the virtual machine can work only with "managed" disks. See https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/23715/pull-ci-openshift-origin-master-e2e-azure/33

I filed https://github.com/kubernetes/kubernetes/issues/82272 to fix this. I.e. there is some work to be done, disabling the tests should be just a temporary measure. We do want to run the tests!


2. The volume limit test was disabled upstream: https://github.com/kubernetes/kubernetes/blob/master/test/e2e/storage/drivers/in_tree.go#L1553. It's OK to disable this particular test in our CI too. But it is completely orthogonal to 1! Storage has many more tests than volume limits.


In any case, this is not a blocker bug.
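
To make the blob-versus-managed distinction from item 1 concrete, here is a rough sketch that classifies a disk reference by its shape. It is a heuristic based only on the two examples in this bug (a ".vhd" name in the attach error, and a `/providers/Microsoft.Compute/disks/` URI in the PV from comment 5); the function name `disk_kind` is hypothetical and this is not how the cloud provider itself decides.

```python
def disk_kind(disk_ref):
    """Heuristically classify a disk reference as 'managed', 'blob', or 'unknown'.

    Assumption: managed disks are referenced by a Microsoft.Compute/disks
    resource ID, while blob-based disks are referenced by a .vhd name or URI.
    """
    if "/providers/Microsoft.Compute/disks/" in disk_ref:
        return "managed"
    if disk_ref.endswith(".vhd"):
        return "blob"
    return "unknown"


# Disk from the failed attach in the error message above.
print(disk_kind("e2e-effe3739-ce48-11e9-86bc-0a58ac100cb7.vhd"))

# diskURI of the dynamically provisioned PV shown in comment 5.
print(disk_kind(
    "/subscriptions/433715e6-37fe-4328-af75-3661e13b15fc/resourceGroups/"
    "adahiya-1-zr9dr-rg/providers/Microsoft.Compute/disks/"
    "kubernetes-dynamic-pvc-6fdcb2a9-976d-11e9-9bd8-000d3a948bbe"
))
```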

Comment 19 Alberto 2020-05-29 10:35:55 UTC
This has not been prioritised yet and it needs to be re-evaluated. Tagging UpcomingSprint.

Comment 20 Danil Grigorev 2020-07-09 08:51:24 UTC
I didn't encounter the described error at any moment in CI, and the fix, https://github.com/kubernetes/kubernetes/pull/82324, was merged a while ago. It now creates the right kind of Azure disk (https://github.com/kubernetes/kubernetes/pull/82324/files#diff-d1a5ece2215eb348ea751cd0ac48592fR1470), so I assume this is done now. Could you please verify this?

Moving to the storage team as well.

Comment 23 Wei Duan 2020-08-28 06:19:24 UTC
@Fabio, I have some queries about this PR, could you check if my understanding is right?

1. Looks like we removed the skip for [Driver: azure], but our test cases are now [Driver: azure-disk], so I understand there is no change in which cases are executed?
`\[sig-storage\] In-tree Volumes \[Driver: azure\] \[Testpattern: Inline-volume`,
`\[sig-storage\] In-tree Volumes \[Driver: azure\] \[Testpattern: Pre-provisioned PV`,

2. Actually, I see that some cases related to azure [Inline-volume]/[Pre-provisioned] [subpath] are skipped; could you help confirm that this is expected?
Like in https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-azure-4.6/1298942957960826880
[sig-storage] In-tree Volumes [Driver: azure-disk] [Testpattern: Pre-provisioned PV (default fs)] subPath ...
[sig-storage] In-tree Volumes [Driver: azure-disk] [Testpattern: Inline-volume (default fs)] subPath ...

Comment 24 Fabio Bertinatto 2020-08-28 09:39:29 UTC
(In reply to Wei Duan from comment #23)
> @Fabio, I have some queries about this PR, could you check if my
> understanding is right?
> 
> 1. Looks like we removed the skip for [Driver: azure], but now our test
> cases are [Driver: azure-disk], so I understand there is no change for case
> executing?
> `\[sig-storage\] In-tree Volumes \[Driver: azure\] \[Testpattern:
> Inline-volume`,
> `\[sig-storage\] In-tree Volumes \[Driver: azure\] \[Testpattern:
> Pre-provisioned PV`,

That's correct, the PR was just a clean-up. At some point the driver name changed from "azure" to "azure-disk", which invalidated our skip rule.

Since the tests are passing, I removed the skip rules instead of renaming them.
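
A small sketch of why the old rule stopped matching. The pattern is the one quoted in comment 23; the first test name is taken from the original failure list and the second is the same name with the new driver label, constructed here for illustration. The sketch uses Python's `re` module, on the assumption that it treats this particular pattern the same way as the regexp engine used by the real skip list; the helper `is_skipped` is hypothetical.

```python
import re

# Skip rule quoted in comment 23, written for the old driver name.
skip_rule = re.compile(
    r"\[sig-storage\] In-tree Volumes \[Driver: azure\] \[Testpattern: Inline-volume"
)

old_name = (
    "[sig-storage] In-tree Volumes [Driver: azure] "
    "[Testpattern: Inline-volume (default fs)] volumes should be mountable"
)
new_name = (
    "[sig-storage] In-tree Volumes [Driver: azure-disk] "
    "[Testpattern: Inline-volume (default fs)] volumes should be mountable"
)


def is_skipped(test_name):
    """Return True if the skip rule matches anywhere in the test name."""
    return skip_rule.search(test_name) is not None


print(is_skipped(old_name))  # True: the rule was written for "[Driver: azure]"
print(is_skipped(new_name))  # False: "azure-disk]" does not match "azure\]"
```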

> 
> 2. Actually I see some cases related to azure
> [Inline-volume]/[Pre-provisioned] [subpath] are skipped, could you help
> confirm it is expected? 
> Like in
> https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-
> ocp-installer-e2e-azure-4.6/1298942957960826880
> [sig-storage] In-tree Volumes [Driver: azure-disk] [Testpattern:
> Pre-provisioned PV (default fs)] subPath ...
> [sig-storage] In-tree Volumes [Driver: azure-disk] [Testpattern:
> Inline-volume (default fs)] subPath ...

Good catch, @Wei. This is expected, as these tests are considered to be redundant:

https://github.com/openshift/origin/blob/abc0e0c4013244b125b9f8bfcb32be8be355a3bc/vendor/k8s.io/kubernetes/test/e2e/storage/testsuites/subpath.go#L85-L89

Comment 25 Wei Duan 2020-08-28 14:29:05 UTC
@Fabio, thanks, it is clear now. 
I changed the status to "Verified".

Comment 27 errata-xmlrpc 2020-10-27 15:54:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196