Bug 1832144

Summary: [4.3] image registry operator keeps creating new storage accounts on Azure
Product: OpenShift Container Platform Reporter: Ricardo Maraschini <rmarasch>
Component: Image Registry Assignee: Ricardo Maraschini <rmarasch>
Status: CLOSED ERRATA QA Contact: Wenjing Zheng <wzheng>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.3.z CC: aos-bugs, jminter, obulatov, rmarasch, scuppett, wzheng
Target Milestone: ---   
Target Release: 4.3.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: The Azure infrastructure name contained uppercase letters. Consequence: Azure storage accounts and containers are named after the infrastructure name. In some circumstances the auto-generated name (which contains the infrastructure name) was valid for a storage account but not for a container, so the operator created the storage account, failed to create the container, and was forced to retry. Fix: The container name generation logic now discards invalid characters. Result: The image registry is deployed correctly on an infrastructure whose name contains invalid characters.
Story Points: ---
Clone Of: 1832140 Environment:
Last Closed: 2020-06-23 20:58:40 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1832140    
Bug Blocks:    

Comment 2 Ricardo Maraschini 2020-05-29 08:05:28 UTC
The fix for this has already been verified on version 4.5. We are waiting for the same fix to be VERIFIED on 4.4 before we can move on, so I am moving this to the next sprint.

Comment 5 Wenjing Zheng 2020-06-16 06:23:45 UTC
When I use a capital letter in spec.storage.azure.container, there is no warning like:
# configs.imageregistry.operator.openshift.io "cluster" was not valid:
# * spec.storage.azure.container: Invalid value: "": spec.storage.azure.container in body should match '^[0-9a-z]+(-[0-9a-z]+)*$'

And the error below appears in the operator log:

E0616 06:21:35.748785      13 controller.go:251] unable to sync: unable to sync storage configuration: -> github.com/openshift/cluster-image-registry-operator/vendor/github.com/Azure/azure-storage-blob-go/azblob.newStorageError, /go/src/github.com/openshift/cluster-image-registry-operator/vendor/github.com/Azure/azure-storage-blob-go/azblob/zc_storage_error.go:42
===== RESPONSE ERROR (ServiceCode=InvalidResourceName) =====
Description=The specifed resource name contains invalid characters.
RequestId:0e9a6f3e-901e-00a9-52a6-43b0af000000
Time:2020-06-16T06:21:36.5920978Z, Details: 
   Code: InvalidResourceName
   PUT https://wzheng4328nqgh4d6b.blob.core.windows.net/Wzheng-43-28nqg-image-registry-vdnqsevsehmqtyqldjakfjoatfktcqq?restype=container&timeout=61
   Authorization: REDACTED
   User-Agent: [openshift.io cluster-image-registry-operator/v4.3.26-202006150829-dirty Azure-Storage/0.7 (go1.12.12; linux)]
   X-Ms-Client-Request-Id: [1daf49ad-38e1-4484-7161-8d3342d67cc0]
   X-Ms-Date: [Tue, 16 Jun 2020 06:21:35 GMT]
   X-Ms-Version: [2018-11-09]
   --------------------------------------------------------------------------------
   RESPONSE Status: 400 The specifed resource name contains invalid characters.
   Content-Length: [243]
   Content-Type: [application/xml]
   Date: [Tue, 16 Jun 2020 06:21:35 GMT]
   Server: [Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0]
   X-Ms-Error-Code: [InvalidResourceName]
   X-Ms-Request-Id: [0e9a6f3e-901e-00a9-52a6-43b0af000000]
   X-Ms-Version: [2018-11-09]


, requeuing
I0616 06:21:35.771769      13 generator.go:59] object *v1.ClusterOperator, Name=image-registry updated: changed:metadata.resourceVersion={"52011" -> "52020"}, changed:metadata.selfLink={"/apis/config.openshift.io/v1/clusteroperators/image-registry" -> "/apis/config.openshift.io/v1/clusteroperators/image-registry/status"}, changed:status.conditions.1.message={"Unable to apply resources: unable to sync storage configuration: -> github.com/openshift/cluster-image-registry-operator/vendor/github.com/Azure/azure-storage-blob-go/azblob.newStorageError, /go/src/github.com/openshift/cluster-image-registry-operator/vendor/github.com/Azure/azure-storage-blob-go/azblob/zc_storage_error.go:42\n===== RESPONSE ERROR (ServiceCode=InvalidResourceName) =====\nDescription=The specifed resource name contains invalid characters.\nRequestId:0e9a6b3d-901e-00a9-19a6-43b0af000000\nTime:2020-06-16T06:21:35.7725407Z, Details: \n   Code: InvalidResourceName\n   PUT https://wzheng4328nqgh4d6b.blob.core.windows.net/Wzheng-43-28nqg-image-registry-vdnqsevsehmqtyqldjakfjoatfktcqq?restype=container&timeout=61\n   Authorization: REDACTED\n   User-Agent: [openshift.io cluster-image-registry-operator/v4.3.26-202006150829-dirty Azure-Storage/0.7 (go1.12.12; linux)]\n   X-Ms-Client-Request-Id: [77181d35-1592-45ba-57f2-8609064f2139]\n   X-Ms-Date: [Tue, 16 Jun 2020 06:21:34 GMT]\n   X-Ms-Version: [2018-11-09]\n   --------------------------------------------------------------------------------\n   RESPONSE Status: 400 The specifed resource name contains invalid characters.\n   Content-Length: [243]\n   Content-Type: [application/xml]\n   Date: [Tue, 16 Jun 2020 06:21:35 GMT]\n   Server: [Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0]\n   X-Ms-Error-Code: [InvalidResourceName]\n   X-Ms-Request-Id: [0e9a6b3d-901e-00a9-19a6-43b0af000000]\n   X-Ms-Version: [2018-11-09]\n\n\n" -> "Unable to apply resources: unable to sync storage configuration: -> 
github.com/openshift/cluster-image-registry-operator/vendor/github.com/Azure/azure-storage-blob-go/azblob.newStorageError, /go/src/github.com/openshift/cluster-image-registry-operator/vendor/github.com/Azure/azure-storage-blob-go/azblob/zc_storage_error.go:42\n===== RESPONSE ERROR (ServiceCode=InvalidResourceName) =====\nDescription=The specifed resource name contains invalid characters.\nRequestId:0e9a6f3e-901e-00a9-52a6-43b0af000000\nTime:2020-06-16T06:21:36.5920978Z, Details: \n   Code: InvalidResourceName\n   PUT https://wzheng4328nqgh4d6b.blob.core.windows.net/Wzheng-43-28nqg-image-registry-vdnqsevsehmqtyqldjakfjoatfktcqq?restype=container&timeout=61\n   Authorization: REDACTED\n   User-Agent: [openshift.io cluster-image-registry-operator/v4.3.26-202006150829-dirty Azure-Storage/0.7 (go1.12.12; linux)]\n   X-Ms-Client-Request-Id: [1daf49ad-38e1-4484-7161-8d3342d67cc0]\n   X-Ms-Date: [Tue, 16 Jun 2020 06:21:35 GMT]\n   X-Ms-Version: [2018-11-09]\n   --------------------------------------------------------------------------------\n   RESPONSE Status: 400 The specifed resource name contains invalid characters.\n   Content-Length: [243]\n   Content-Type: [application/xml]\n   Date: [Tue, 16 Jun 2020 06:21:35 GMT]\n   Server: [Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0]\n   X-Ms-Error-Code: [InvalidResourceName]\n   X-Ms-Request-Id: [0e9a6f3e-901e-00a9-52a6-43b0af000000]\n   X-Ms-Version: [2018-11-09]\n\n\n"}
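
For illustration only, the check that the expected warning describes can be sketched in Go. The pattern is the one quoted in the validation message above; the helper name validContainerName and the 3 to 63 character length check (Azure's documented limit for container names) are assumptions of this sketch, not the operator's actual code.

```go
package main

import (
	"fmt"
	"regexp"
)

// containerNamePattern is the pattern quoted in the expected validation
// message for spec.storage.azure.container.
var containerNamePattern = regexp.MustCompile(`^[0-9a-z]+(-[0-9a-z]+)*$`)

// validContainerName is a hypothetical helper, not taken from the operator.
// It applies the quoted pattern plus Azure's 3-63 character length limit.
func validContainerName(name string) bool {
	return len(name) >= 3 && len(name) <= 63 && containerNamePattern.MatchString(name)
}

func main() {
	names := []string{
		// Name from the failing PUT request in the operator log.
		"Wzheng-43-28nqg-image-registry-vdnqsevsehmqtyqldjakfjoatfktcqq",
		// The same name in lowercase.
		"wzheng-43-28nqg-image-registry-vdnqsevsehmqtyqldjakfjoatfktcqq",
	}
	for _, n := range names {
		fmt.Printf("%s valid=%t\n", n, validContainerName(n))
	}
}
```

Under this check the name from the log is rejected only because of its leading uppercase "W".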

Comment 6 Ricardo Maraschini 2020-06-17 07:00:08 UTC
@Wenjing Zheng The problem you hit while validating this ticket is not related to the original issue. We have no plans to backport the container name validation to the 4.3 release. If the user manually enters an invalid container name, the image registry won't come up, and the logs report the bad configuration.

The original problem (i.e. the one that needs to be replicated) is: the image registry was not coming up on a cluster deployed in an Azure environment with an "invalid" infrastructure name (one containing uppercase letters, for instance).
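
As a rough sketch of the kind of adjustment described in the Doc Text (not the operator's actual implementation), a generated container name could be derived from the infrastructure name as follows. The function names sanitizeInfraName and containerName are hypothetical, the random suffix is passed in as a parameter rather than generated, and lowercasing before discarding characters is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeInfraName is a hypothetical helper. It lowercases the input,
// discards every character outside [0-9a-z-], collapses runs of hyphens,
// and trims leading and trailing hyphens.
func sanitizeInfraName(infra string) string {
	var b strings.Builder
	prevHyphen := false
	for _, r := range strings.ToLower(infra) {
		switch {
		case (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9'):
			b.WriteRune(r)
			prevHyphen = false
		case r == '-' && !prevHyphen:
			b.WriteRune(r)
			prevHyphen = true
		}
	}
	return strings.Trim(b.String(), "-")
}

// containerName joins the sanitized infrastructure name, the fixed
// "image-registry" part, and a caller-supplied suffix (assumed here to
// already be lowercase alphanumeric).
func containerName(infra, suffix string) string {
	return sanitizeInfraName(infra) + "-image-registry-" + suffix
}

func main() {
	// Infrastructure name from the failing request; the suffix is made up.
	fmt.Println(containerName("Wzheng-43-28nqg", "examplesuffix"))
}
```

This sketch does not enforce Azure's 63 character limit on container names; a real implementation would also have to handle over-long infrastructure names.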

Comment 7 Wenjing Zheng 2020-06-18 11:41:51 UTC
Thanks, Ricardo!
1. I manually changed the infrastructure name with "oc edit infrastructure cluster":
  infrastructureName: SSSSSzheng-43-47kch
2. Then I set the image registry to Removed and back to Managed, and I can see it is using lowercase letters:
  storage:
    azure:
      accountName: ssssszheng4347kchz44ph
      container: ssssszheng-43-47kch-image-registry-xjtxccqqpftenhshbfmptjkelkq
3. Then I watched the image registry pods:
$ oc get pods
NAME                                               READY   STATUS    RESTARTS   AGE
cluster-image-registry-operator-798f76c669-tp65b   2/2     Running   0          56m
image-registry-6db44dc9ff-lrcjn                    0/1     Pending   0          9m9s
image-registry-75bb5d9549-5ntrc                    1/1     Running   0          9m10s
image-registry-75bb5d9549-n246c                    0/1     Pending   0          9m10s
node-ca-gpkb6                                      1/1     Running   0          55m
node-ca-h47ss                                      1/1     Running   0          55m
node-ca-ns9f9                                      1/1     Running   0          55m
node-ca-r67ms                                      1/1     Running   0          52m
node-ca-xv6wq                                      1/1     Running   0          54m

I found that two pods stay Pending with the error below:
Events:
  Type     Reason            Age        From               Message
  ----     ------            ----       ----               -------
  Warning  FailedScheduling  <unknown>  default-scheduler  0/5 nodes are available: 2 Insufficient cpu, 3 node(s) had taints that the pod didn't tolerate.
  Warning  FailedScheduling  <unknown>  default-scheduler  0/5 nodes are available: 2 Insufficient cpu, 3 node(s) had taints that the pod didn't tolerate.

Is this expected?

Comment 8 Wenjing Zheng 2020-06-19 05:57:15 UTC
1. Add some uppercase letters to infrastructureName:
$ oc get infrastructure cluster -o yaml | grep infrastructureName
  infrastructureName: WWWwzheng-43-msbct
2. Remove the image registry, then bring it back and check the info below:
  storage:
    azure:
      accountName: wwwwzheng43msbctpz427
      container: wwwwzheng-43-msbct-image-registry-jemunkbyfsclpubefygutmowvmks
3. Check image registry pod status - all running

Verified on 4.3.26.

Comment 10 errata-xmlrpc 2020-06-23 20:58:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2585