Bug 1670357

Summary: image-registry pod stuck in "CreateContainerConfigError"
Product: OpenShift Container Platform
Reporter: Rutvik <rkshirsa>
Component: Image Registry
Assignee: Oleg Bulatov <obulatov>
Status: CLOSED ERRATA
QA Contact: Wenjing Zheng <wzheng>
Severity: unspecified
Priority: unspecified
Version: 4.1.0
CC: aabhishe, aos-bugs, bparees, cdaley, gucore, rkshirsa, suchaudh
Target Milestone: ---
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Last Closed: 2019-06-04 10:42:19 UTC
Type: Bug
Bug Blocks: 1664187    

Description Rutvik 2019-01-29 11:44:09 UTC
Description of problem:

When I installed the cluster and checked the image-registry pods, I found them stuck in the "CreateContainerConfigError" state. However, the other pods are up and running.

# oc get pods
NAME                                               READY     STATUS                       RESTARTS   AGE
cluster-image-registry-operator-684499b66b-4x5nd   1/1       Running                      0          3h
image-registry-5d74797757-b2llp                    0/1       CreateContainerConfigError   0          33m
image-registry-897d8cbdd-6khfl                     0/1       CreateContainerConfigError   0  
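
For reference, the reason a pod is stuck in CreateContainerConfigError shows up in its events; a diagnostic sketch, using the pod name from the listing above and assuming the openshift-image-registry namespace:

# oc describe pod image-registry-5d74797757-b2llp -n openshift-image-registry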


----------------------------------------

# oc get configs.imageregistry.operator.openshift.io/instance -o yaml -n openshift-image-registry
apiVersion: imageregistry.operator.openshift.io/v1
kind: Config
metadata:
  creationTimestamp: 2019-01-29T07:52:51Z
  finalizers:
  - imageregistry.operator.openshift.io/finalizer
  generation: 1
  name: instance
  resourceVersion: "16328"
  selfLink: /apis/imageregistry.operator.openshift.io/v1/configs/instance
  uid: e107e541-239a-11e9-ab16-02f627312978
spec:
  httpSecret: 4edf243191160ae3563d2681b80654734c53210e99241e6e077b2ea8ac03c53a0859fbedbb0aec96461f87dbb4e6ddf942998a74c447dce3f446ca932
  logging: 2
  managementState: Managed
  proxy: {}
  replicas: 1
  requests:
    read: {}
    write: {}
  storage:
    s3: {}
status:
  conditions:
  - lastTransitionTime: 2019-01-29T07:52:52Z
    message: Deployment does not have available replicas
    status: "False"
    type: Available
  - lastTransitionTime: 2019-01-29T07:52:52Z
    message: 'Unable to apply resources: unable to sync secrets: timed out waiting for the condition'
    status: "True"
    type: Progressing
  - lastTransitionTime: 2019-01-29T07:52:52Z
    status: "False"
    type: Failing
  - lastTransitionTime: 2019-01-29T07:52:52Z
    status: "False"
    type: Removed
  generations: null
  internalRegistryHostname: ""
  observedGeneration: 1
  readyReplicas: 0
  storage: {}
  storageManaged: false
  version: ""

# oc get events
LAST SEEN   FIRST SEEN   COUNT     NAME                                               KIND      SUBOBJECT                   TYPE      REASON    SOURCE                                              MESSAGE
1h          1h           12        image-registry-5d74797757-b2llp.157e4b3329abc697   Pod       spec.containers{registry}   Warning   Failed    kubelet,       xyz.internal   Error: secrets "image-registry-private-configuration" not found
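
The missing secret named in the event can be checked for directly; a diagnostic sketch, assuming the openshift-image-registry namespace:

# oc get secret image-registry-private-configuration -n openshift-image-registry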

----------------------------------------


Version-Release number of selected component (if applicable):

oc v4.0.0-0.147.0
kubernetes v1.11.0+dde478551e

Installer:

$ openshift-install version
./openshift-install v0.11.0


Expected results:

image-registry pods should be up and running as soon as the cluster is installed.

Additional info:
https://github.com/openshift/installer/issues/1138

Comment 1 Ben Parees 2019-01-29 14:35:51 UTC
Oleg, I'm not sure if this would be resolved by your recent reordering of the secret syncing step...

That said, this should have resolved itself soon after anyway via a re-deploy once the secret was created.

Rutvik, what platform were you installing on?  AWS or Libvirt or something else?

Comment 2 Rutvik 2019-01-29 15:00:01 UTC
Hello Ben,

Thanks for the followup.

It's AWS and Linux-amd64

Comment 3 Oleg Bulatov 2019-01-29 15:19:22 UTC
The code that could produce 'unable to sync secrets' was removed from master today. But if it hadn't been, we would need the operator's logs to know what the operator is doing.

Comment 4 Ben Parees 2019-01-29 15:20:02 UTC
Rutvik, and it never proceeded?  We would have expected this to potentially be the initial state, but then the secret should have been created and the registry deployed.

Comment 5 Ben Parees 2019-01-29 15:24:39 UTC
Looking at the code before Oleg's changes, the only place where I can see that we'd get a timeout like that is s3's syncSecrets, which polls for the cloud credentials... so this would mean the credential minter never populated the credentials for us.

I don't know what the code looks like now, but we need to make sure that if this happens again we report a better error than just a timeout. Assigning to Corey to fix this so that we report a better error when we can't locate the credentials.
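
One way to confirm whether the credential minter ever produced anything would be to check the secret it is expected to create and its logs; a sketch only, assuming the cloud-credential-operator runs as a deployment of that name in the openshift-cloud-credential-operator namespace:

# oc get secret installer-cloud-credentials -n openshift-image-registry
# oc logs deployment/cloud-credential-operator -n openshift-cloud-credential-operator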

Comment 6 Abhishek 2019-01-29 15:49:14 UTC
Hi Ben Parees,

This is an issue when installing on AWS with the latest installer, 0.11.0; there is no error with the registry deployment in 0.10.1.

In 0.11.0 the secret below is missing (for reference, this is what it looks like on a working cluster):

# oc describe secret image-registry-private-configuration
Name:         image-registry-private-configuration
Namespace:    openshift-image-registry
Labels:       <none>
Annotations:  <none>

Type:  Opaque

Data
====
REGISTRY_STORAGE_S3_ACCESSKEY:  20 bytes
REGISTRY_STORAGE_S3_SECRETKEY:  40 bytes

Logs from the non-working environment:

# journalctl -u kubelet.service 

Jan 29 15:42:08 ip hyperkube[4426]: E0129 15:42:08.299166    4426 kuberuntime_manager.go:737] container start failed: CreateContainerConfigError: secrets "image-registry-private-configuration" not found
Jan 29 15:42:08 ip hyperkube[4426]: E0129 15:42:08.299213    4426 pod_workers.go:186] Error syncing pod 3f384ab3-23db-11e9-a944-0661877c94b8 ("image-registry-64c69c9b8d-7hjmf_openshift-image-registry(3f384ab3-23db-11e9-a944-0661877c94b8)"), skipping: failed to "StartContainer" for "registry" with CreateContainerConfigError: "secrets \"image-registry-private-configuration\" not found"

# oc logs -f cluster-image-registry-operator-684499b66b-wbm6b

I0129 15:33:47.277921       1 main.go:24] Cluster Image Registry Operator Version: v4.0.0-0.148.0.0-dirty
I0129 15:33:47.278018       1 main.go:25] Go Version: go1.10.3
I0129 15:33:47.278028       1 main.go:26] Go OS/Arch: linux/amd64
I0129 15:33:47.299125       1 controller.go:371] waiting for informer caches to sync
I0129 15:33:48.502130       1 controller.go:380] started events processor
E0129 15:38:48.534646       1 controller.go:208] unable to sync: unable to sync secrets: timed out waiting for the condition, requeuing
W0129 15:39:16.339626       1 reflector.go:272] k8s.io/client-go/informers/factory.go:130: watch of *v1.ConfigMap ended with: too old resource version: 90931 (94049)
W0129 15:43:29.321787       1 reflector.go:272] k8s.io/client-go/informers/factory.go:130: watch of *v1.ConfigMap ended with: too old resource version: 90931 (96485)
E0129 15:43:48.570537       1 controller.go:208] unable to sync: unable to sync secrets: timed out waiting for the condition, requeuing
W0129 15:45:10.396731       1 reflector.go:272] github.com/openshift/client-go/route/informers/externalversions/factory.go:101: watch of *v1.Route ended with: very short watch: github.com/openshift/client-go/route/informers/externalversions/factory.go:101: Unexpected watch close - watch lasted less than a second and no items received
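
For reference, the same operator logs can be fetched without looking up the pod name; a sketch, assuming the deployment name shown in the pod listing above:

# oc logs deployment/cluster-image-registry-operator -n openshift-image-registry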

Comment 7 Ben Parees 2019-01-29 16:17:24 UTC
I'm pretty sure this is indicative that the credential minter is not minting our creds. Can you look at all the secrets that exist in openshift-image-registry? I believe there should be one from the credential minter, named "installer-cloud-credentials".

If it is not there, then work with Corey Daley and Devan Goodwin.
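
A sketch of the checks being asked for here; the CredentialsRequest resource name is an assumption based on the cloud-credential-operator:

# oc get secrets -n openshift-image-registry
# oc get secret installer-cloud-credentials -n openshift-image-registry -o yaml
# oc get credentialsrequests --all-namespaces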

Comment 8 Ben Parees 2019-01-29 16:26:59 UTC
This cleans up the error reporting, but it doesn't address the underlying failure:
https://github.com/openshift/cluster-image-registry-operator/pull/176

Comment 9 Ben Parees 2019-01-30 21:26:56 UTC
Devan resolved some issues with the cred minter, so I'm going to optimistically put this back to QE.
https://github.com/openshift/cloud-credential-operator/pull/26
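
Once a build containing that fix is picked up, verification would look roughly like this (a sketch; secret and namespace names taken from the earlier comments):

# oc get secrets -n openshift-image-registry | grep -E 'installer-cloud-credentials|image-registry-private-configuration'
# oc get pods -n openshift-image-registry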

Comment 10 Rutvik 2019-01-31 03:28:02 UTC
Hello Ben,

The pods are still in the same state they were in after installation. Let me know if you require any more information from that environment.

Comment 11 Corey Daley 2019-01-31 03:30:29 UTC
The last release for origin was built 12 hours ago, so this fix probably hasn't made it into what you are running yet (if you are using the installer).

Comment 12 Ben Parees 2019-01-31 03:32:17 UTC
Rutvik, if you want to look at anything, you can look at the same thing I asked Abhishek to check in comment 7:

https://bugzilla.redhat.com/show_bug.cgi?id=1670357#c7


I expect you will either not see a credential at all, or, if you watch repeatedly, you'll see the credential being deleted and recreated over and over.
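
Watching the secret over time, as suggested above, could be done with a watch; a sketch, assuming the openshift-image-registry namespace:

# oc get secret installer-cloud-credentials -n openshift-image-registry -w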

Comment 13 Sudarshan Chaudhari 2019-01-31 03:54:42 UTC
Hi! 

I have just deployed the cluster to verify, and here is the list of all the secrets present on my cluster deployed with the latest installer:

~~~
$ oc get secrets
NAME                                              TYPE                                  DATA      AGE
builder-dockercfg-mkz67                           kubernetes.io/dockercfg               1         2h
builder-token-fbsxd                               kubernetes.io/service-account-token   3         2h
builder-token-r6xjd                               kubernetes.io/service-account-token   3         2h
cluster-image-registry-operator-dockercfg-rsqs7   kubernetes.io/dockercfg               1         2h
cluster-image-registry-operator-token-svhs9       kubernetes.io/service-account-token   3         2h
cluster-image-registry-operator-token-z9zd6       kubernetes.io/service-account-token   3         2h
default-dockercfg-zcmkh                           kubernetes.io/dockercfg               1         2h
default-token-v26hv                               kubernetes.io/service-account-token   3         2h
default-token-zsg4x                               kubernetes.io/service-account-token   3         2h
deployer-dockercfg-6mzcc                          kubernetes.io/dockercfg               1         2h
deployer-token-9vd49                              kubernetes.io/service-account-token   3         2h
deployer-token-qw87f                              kubernetes.io/service-account-token   3         2h
image-registry                                    Opaque                                0         2h
image-registry-tls                                kubernetes.io/tls                     2         2h
node-ca-dockercfg-dbh8r                           kubernetes.io/dockercfg               1         2h
node-ca-token-flfl2                               kubernetes.io/service-account-token   3         2h
node-ca-token-hv8mx                               kubernetes.io/service-account-token   3         2h
registry-dockercfg-gvwhr                          kubernetes.io/dockercfg               1         2h
registry-token-lq9vx                              kubernetes.io/service-account-token   3         2h
registry-token-xbjs4                              kubernetes.io/service-account-token   3         2h
~~~


The secret does not exist.

What information is expected in that secret? Can we create it manually?

Thanks.

Comment 14 Ben Parees 2019-01-31 04:00:17 UTC
Sorry but you're just going to have to wait for the credential minter fix to make it into a build for you to pick up.

I'm assigning this to Devan so it comes back to him if the issue is not resolved w/ his fix.  There's nothing that can be done from a registry operator perspective, short of configuring the operator to use a custom secret for the aws credentials, and I don't think that's a productive use of anyone's time right now.
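
For completeness, the manual override mentioned above would amount to supplying the S3 credentials in a user-provided secret; a rough sketch only, assuming the operator honors a secret named image-registry-private-configuration-user with the key names shown in comment 6 (not verified against this build):

# oc create secret generic image-registry-private-configuration-user \
    --from-literal=REGISTRY_STORAGE_S3_ACCESSKEY=<access-key> \
    --from-literal=REGISTRY_STORAGE_S3_SECRETKEY=<secret-key> \
    -n openshift-image-registry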

Comment 15 Rutvik 2019-01-31 04:14:42 UTC
Hi Ben,

I have been monitoring this for the last couple of days; the "installer-cloud-credentials" secret does not exist, and I can't even see it being recreated.

The list of secrets present in my cluster is the same as the one Sudarshan posted in comment 13. Nothing has changed since installation.

Comment 16 Ben Parees 2019-01-31 04:28:32 UTC
> I have been monitoring this for the last couple of days; the "installer-cloud-credentials" secret does not exist, and I can't even see it being recreated.


Yup, that's the bug, or a form of it. That secret should exist in the openshift-image-registry namespace. Hopefully Devan's fix resolves it. Thanks.

Comment 20 Wenjing Zheng 2019-02-19 08:24:50 UTC
I can see image-registry-private-configuration and installer-cloud-credentials as below:
[wzheng@openshift-qe 4.0]$ oc get secrets
NAME                                              TYPE                                  DATA      AGE
builder-dockercfg-swllb                           kubernetes.io/dockercfg               1         28h
builder-token-bh5k2                               kubernetes.io/service-account-token   3         28h
builder-token-pkmqg                               kubernetes.io/service-account-token   3         28h
cluster-image-registry-operator-dockercfg-cwd2z   kubernetes.io/dockercfg               1         28h
cluster-image-registry-operator-token-dlmxf       kubernetes.io/service-account-token   3         28h
cluster-image-registry-operator-token-sdb7w       kubernetes.io/service-account-token   3         28h
default-dockercfg-dnwmm                           kubernetes.io/dockercfg               1         28h
default-token-7q4nq                               kubernetes.io/service-account-token   3         28h
default-token-k4pkq                               kubernetes.io/service-account-token   3         28h
deployer-dockercfg-f55l7                          kubernetes.io/dockercfg               1         28h
deployer-token-dwszl                              kubernetes.io/service-account-token   3         28h
deployer-token-htns5                              kubernetes.io/service-account-token   3         28h
image-registry-private-configuration              Opaque                                2         61m
image-registry-tls                                kubernetes.io/tls                     2         28h
installer-cloud-credentials                       Opaque                                2         132m
node-ca-dockercfg-g9dcr                           kubernetes.io/dockercfg               1         28h
node-ca-token-2vfxb                               kubernetes.io/service-account-token   3         28h
node-ca-token-w74zf                               kubernetes.io/service-account-token   3         28h
registry-dockercfg-299jw                          kubernetes.io/dockercfg               1         28h
registry-token-drjkf                              kubernetes.io/service-account-token   3         28h
registry-token-llpw5                              kubernetes.io/service-account-token   3         28h
[wzheng@openshift-qe 4.0]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE     STATUS
version   4.0.0-0.nightly-2019-02-17-024922   True        False         28h       Error while reconciling 4.0.0-0.nightly-2019-02-17-024922: the update could not be applied

Comment 23 errata-xmlrpc 2019-06-04 10:42:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758