Bug 1527787 - [Free-INT] Deployment failed due to Error: ErrImagePull on CRI-O nodes
Summary: [Free-INT] Deployment failed due to Error: ErrImagePull on CRI-O nodes
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Image Registry
Version: 3.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Kenny Woodson
QA Contact: Dongbo Yan
URL:
Whiteboard:
Duplicates: 1542302
Depends On:
Blocks:
Reported: 2017-12-20 06:06 UTC by yufchang
Modified: 2018-03-27 18:08 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-03-27 18:08:04 UTC
Target Upstream Version:
Embargoed:


Attachments:
events (15.24 KB, text/plain), 2017-12-20 06:06 UTC, yufchang

Description yufchang 2017-12-20 06:06:56 UTC
Created attachment 1370292 (events)

Description of problem:
Error: error reading container (probably exited) json message: EOF
Failed to pull image "docker-registry.default.svc:5000/cindy/myruby@sha256:11dfb4954adca55efa6b209c230d798019ac31c0764f7f85730fd8f47d7c264e": rpc error: code = Unknown desc = pinging docker registry returned: Get https://docker-registry.default.svc:5000/v2/: x509: certificate is valid for docker-registry.default.svc.cluster.local, registry.free-int.openshift.com, 172.30.215.46, not docker-registry.default.svc

Version-Release number of selected component (if applicable):
Free-INT v3.8.18 (online version 3.6.0.83)

How reproducible:
Always

Steps to Reproduce:
1. Use the Ruby builder in the web console to create an application.
2. Check the deployment.


Actual results:
Deployment failed due to ErrImagePull
oc get events:
1:21:29 PM  myruby-2-df2db  Pod  Warning  Failed
    Error: ImagePullBackOff
    19 times in the last 25 minutes
1:18:21 PM  myruby-2-df2db  Pod  Normal  Back-off
    Back-off pulling image "docker-registry.default.svc:5000/cindy/myruby@sha256:11dfb4954adca55efa6b209c230d798019ac31c0764f7f85730fd8f47d7c264e"
    6 times in the last 25 minutes
1:17:52 PM  myruby-2-df2db  Pod  Warning  Failed
    Failed to pull image "docker-registry.default.svc:5000/cindy/myruby@sha256:11dfb4954adca55efa6b209c230d798019ac31c0764f7f85730fd8f47d7c264e": rpc error: code = Unknown desc = pinging docker registry returned: Get https://docker-registry.default.svc:5000/v2/: x509: certificate is valid for docker-registry.default.svc.cluster.local, registry.free-int.openshift.com, 172.30.215.46, not docker-registry.default.svc
    4 times in the last 25 minutes
1:17:52 PM  myruby-2-df2db  Pod  Warning  Failed
    Error: ErrImagePull
    4 times in the last 25 minutes
1:17:52 PM  myruby-2-df2db  Pod  Normal  Pulling
    pulling image "docker-registry.default.svc:5000/cindy/myruby@sha256:11dfb4954adca55efa6b209c230d798019ac31c0764f7f85730fd8f47d7c264e"
    4 times in the last 25 minutes
1:17:50 PM  myruby-1-deploy  Pod  Warning  Failed
    Error: error reading container (probably exited) json message: EOF

Expected results:
Deployment succeeds

Additional info:

Comment 3 Antonio Murdaca 2017-12-20 16:59:26 UTC
Looks like a bad setup from a CRI-O perspective.

Comment 4 weiwei jiang 2017-12-21 02:15:32 UTC
Checked with OCP 3.8.22: the docker-registry certificate's SAN contains docker-registry.default.svc, so it would be better to check whether the certificate signed for docker-registry is correct.

# openssl x509 -in registry.crt -noout -text
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 8 (0x8)
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: CN=openshift-signer@1513758622
        Validity
            Not Before: Dec 20 08:40:55 2017 GMT
            Not After : Dec 20 08:40:56 2019 GMT
        Subject: CN=172.30.9.91
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)
                Modulus:
                    00:ee:fa:86:ca:0d:d2:0f:70:e3:73:de:34:bb:e3:
                    f1:a2:30:14:9a:55:6e:0a:e4:8c:7a:9c:56:dd:38:
                    1e:c6:a7:9b:ce:55:26:72:cd:6d:0c:50:c4:11:84:
                    a2:70:dc:c2:62:75:da:b8:dd:bc:e3:8e:46:89:cc:
                    35:84:be:2b:38:9b:4d:42:af:6e:2d:a4:ca:be:20:
                    9e:bd:73:1f:a1:89:25:cb:71:24:4b:1d:6f:fa:76:
                    e3:07:6f:9c:53:65:88:8d:e6:81:24:a8:8b:f7:7d:
                    7b:52:fd:f5:fd:43:47:4d:a3:36:98:07:b8:36:b8:
                    2b:d2:69:20:16:c0:97:75:45:53:ad:cf:56:2a:a3:
                    70:20:13:01:73:04:0f:a0:47:c9:8d:a8:d4:fc:d8:
                    e2:a9:cb:9c:df:00:a1:28:05:6f:b3:a1:92:f0:d6:
                    c5:c2:80:39:a2:2e:3f:f8:ee:7e:48:86:74:f9:86:
                    da:ed:ca:0a:46:c1:85:84:98:28:6b:57:b4:27:ed:
                    21:17:a9:00:c1:03:57:05:5d:14:ec:bd:11:65:e7:
                    19:f2:b5:80:b6:30:0d:c1:27:ab:a9:6c:0c:1c:16:
                    e5:b5:a1:44:b9:b4:31:de:3e:42:44:83:86:64:45:
                    4a:58:86:4e:0f:23:03:c8:be:e1:d0:4b:41:aa:67:
                    f8:0d
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage: 
                TLS Web Server Authentication
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Subject Alternative Name: 
                DNS:__omit_place_holder__5674c34a7b283217f63ea892a8157eda74190198, DNS:docker-registry-default.apps.1220-3bg.qe.rhcloud.com, DNS:docker-registry.default.svc, DNS:docker-registry.default.svc.cluster.local, DNS:172.30.9.91, IP Address:172.30.9.91
    Signature Algorithm: sha256WithRSAEncryption
         0d:bd:31:78:86:37:03:dc:fc:26:4a:a1:de:ef:20:d1:11:6d:
         97:81:1f:01:35:4c:96:d5:62:6a:77:ea:aa:f1:a4:c5:46:37:
         27:fc:ec:b3:0d:76:04:df:84:2a:a9:4f:fc:47:b3:5a:9f:e0:
         1a:be:60:e5:84:18:6f:91:3a:19:2c:08:88:8e:0c:98:77:97:
         58:e6:0d:67:77:33:89:b3:77:ec:e8:df:8f:2a:50:2d:19:ff:
         30:5b:be:1e:24:41:92:68:24:25:f7:78:db:74:98:cc:5d:d1:
         63:59:bd:24:42:cd:41:20:9f:c8:38:ac:44:2c:29:bd:1b:b3:
         d6:55:49:47:ee:83:01:fe:c7:aa:71:73:f6:42:5c:f3:aa:95:
         3c:cf:05:8f:fa:4b:7f:d0:5d:18:40:ba:c3:2d:75:65:84:b2:
         7e:c3:46:b4:9a:15:b4:74:3c:74:ad:21:a0:78:17:08:2c:6b:
         47:0b:a4:d4:53:a2:6a:6b:ce:72:ee:c4:bd:c7:88:4c:96:82:
         01:6a:79:fa:b7:ab:3f:3b:8b:47:ad:0d:8a:6b:03:10:16:33:
         77:69:02:2e:d6:bd:cc:f6:3f:3b:58:7d:9d:52:03:a8:ed:70:
         cb:3b:c0:1d:6b:b2:ae:92:92:8d:3a:3a:a7:ff:31:27:df:21:
         d5:8e:b0:3c
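
The manual check above (reading the Subject Alternative Name line by eye) could be scripted roughly as follows. This is only a sketch: `san_has_dns` is a hypothetical helper name, and it assumes the text layout produced by `openssl x509 -noout -text` as shown in this comment, where the SAN entries sit on the single line after the "X509v3 Subject Alternative Name" header.

```shell
# Hypothetical helper: exit 0 if DNS name $1 is listed in the SAN line of
# `openssl x509 -noout -text` output read from stdin, otherwise exit 1.
# Assumes the SAN entries are on the one line following the header.
san_has_dns() {
    awk -v want="DNS:$1" '
        on_san_line {
            # Split the comma-separated SAN entries and compare exactly,
            # so "foo.svc" does not match "foo.svc.cluster.local".
            n = split($0, parts, ",")
            for (i = 1; i <= n; i++) {
                gsub(/^[ \t]+|[ \t]+$/, "", parts[i])
                if (parts[i] == want) ok = 1
            }
            on_san_line = 0
        }
        /X509v3 Subject Alternative Name/ { on_san_line = 1 }
        END { exit ok ? 0 : 1 }
    '
}
```

Possible usage, with the file name taken from the command above:

    openssl x509 -in registry.crt -noout -text | san_has_dns docker-registry.default.svc && echo "SAN present"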

Comment 6 Seth Jennings 2018-01-04 16:42:48 UTC
Sending to the containers/crio team. The environment no longer exists and this is not a test blocker. Lowering severity/priority.

Comment 9 Antonio Murdaca 2018-01-18 15:45:59 UTC
Can someone change the component?

Comment 10 Kenny Woodson 2018-01-25 15:43:46 UTC
Matt and I rerolled the certificates for the registry, modified the masters, and tested that the non-crio nodes are capable of pushing to the registry.

We have since added scale groups back to free-int and we are now able to pull from and push to the registry.

Comment 11 Dongbo Yan 2018-01-26 03:01:58 UTC
Verified: this bug can still be reproduced.

# oc get pod -w -o wide
NAME                               READY     STATUS             RESTARTS   AGE       IP            NODE
nodejs-mongo-persistent-1-deploy   1/1       Running            0          2m        10.131.7.38   ip-172-31-60-80.ec2.internal
nodejs-mongo-persistent-1-z9pjq    0/1       ImagePullBackOff   0          2m        10.129.7.29   ip-172-31-56-140.ec2.internal
nodejs-mongo-persistent-1-z9pjq   0/1       ErrImagePull   0         3m        10.129.7.29   ip-172-31-56-140.ec2.internal

# oc describe pod nodejs-mongo-persistent-1-z9pjq
Events:
  Type     Reason                 Age              From                                    Message
  ----     ------                 ----             ----                                    -------
  Normal   Scheduled              3m               default-scheduler                       Successfully assigned nodejs-mongo-persistent-1-z9pjq to ip-172-31-56-140.ec2.internal
  Normal   SuccessfulMountVolume  3m               kubelet, ip-172-31-56-140.ec2.internal  MountVolume.SetUp succeeded for volume "default-token-94wz9"
  Normal   Pulling                1m (x4 over 3m)  kubelet, ip-172-31-56-140.ec2.internal  pulling image "172.30.215.46:5000/dyan/nodejs-mongo-persistent@sha256:6cac2111b195247b6c930166a213171b4c1883e840cca0b0d7167fe5116f2e9a"
  Warning  Failed                 1m (x4 over 3m)  kubelet, ip-172-31-56-140.ec2.internal  Failed to pull image "172.30.215.46:5000/dyan/nodejs-mongo-persistent@sha256:6cac2111b195247b6c930166a213171b4c1883e840cca0b0d7167fe5116f2e9a": rpc error: code = Unknown desc = pinging docker registry returned: Get https://172.30.215.46:5000/v2/: x509: certificate signed by unknown authority
  Warning  Failed                 1m (x4 over 3m)  kubelet, ip-172-31-56-140.ec2.internal  Error: ErrImagePull
  Normal   BackOff                1m (x6 over 3m)  kubelet, ip-172-31-56-140.ec2.internal  Back-off pulling image "172.30.215.46:5000/dyan/nodejs-mongo-persistent@sha256:6cac2111b195247b6c930166a213171b4c1883e840cca0b0d7167fe5116f2e9a"
  Warning  Failed                 1m (x6 over 3m)  kubelet, ip-172-31-56-140.ec2.internal  Error: ImagePullBackOff

ip-172-31-56-104.ec2.internal   Ready                      <none>    2d        v1.9.1+a0ce1bc657   Red Hat Enterprise Linux Server 7.4 (Maipo)   3.10.0-693.11.6.el7.x86_64   cri-o://1.9.0
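
Note that the x509 error here ("certificate signed by unknown authority") differs from the one in the original description ("certificate is valid for ..., not ..."). A rough way to tell the two apart when scanning event messages is sketched below; `classify_x509_error` is a hypothetical helper, and the patterns are simply the message texts quoted in this bug, not an exhaustive list.

```shell
# Hypothetical helper: read a pull error message on stdin and print a label.
# The patterns are the two x509 messages quoted in this bug report.
classify_x509_error() {
    msg=$(cat)
    case $msg in
        *"x509: certificate is valid for "*", not "*)
            echo san-mismatch ;;   # name used for the pull is missing from the cert
        *"x509: certificate signed by unknown authority"*)
            echo unknown-ca ;;     # node does not trust the signing CA
        *"x509:"*)
            echo other-x509 ;;
        *)
            echo not-x509 ;;
    esac
}
```

For example, piping the "Failed to pull image" message from the events above into `classify_x509_error` would print `unknown-ca`, while the message from the original description would print `san-mismatch`.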

Comment 12 Wenjing Zheng 2018-01-31 09:01:37 UTC
Latest result: builds fail to push the resulting image to the registry when the builder pod runs on ip-172-31-58-103.ec2.internal or ip-172-31-49-231.ec2.internal:
$ oc get builds
NAME         TYPE      FROM          STATUS                               STARTED              DURATION
ruby-ex-10   Source    Git@bbb6701   Complete                             About a minute ago   20s
ruby-ex-2    Source    Git@bbb6701   Complete                             3 minutes ago        20s
ruby-ex-3    Source    Git@bbb6701   Complete                             3 minutes ago        18s
ruby-ex-4    Source    Git@bbb6701   Failed (PushImageToRegistryFailed)   2 minutes ago        12s
ruby-ex-5    Source    Git@bbb6701   Complete                             2 minutes ago        20s
ruby-ex-6    Source    Git@bbb6701   Complete                             2 minutes ago        19s
ruby-ex-7    Source    Git@bbb6701   Failed (PushImageToRegistryFailed)   About a minute ago   12s
ruby-ex-8    Source    Git@bbb6701   Failed (PushImageToRegistryFailed)   About a minute ago   13s
ruby-ex-9    Source    Git@bbb6701   Failed (PushImageToRegistryFailed)   About a minute ago   15s
$ oc get pods -o wide
NAME               READY     STATUS      RESTARTS   AGE       IP             NODE
ruby-ex-10-build   0/1       Completed   0          1m        10.128.2.204   ip-172-31-49-44.ec2.internal
ruby-ex-2-build    0/1       Completed   0          4m        10.130.2.158   ip-172-31-59-87.ec2.internal
ruby-ex-3-build    0/1       Completed   0          3m        10.130.2.159   ip-172-31-59-87.ec2.internal
ruby-ex-4-build    0/1       Error       0          3m        10.131.6.249   ip-172-31-58-103.ec2.internal
ruby-ex-5-build    0/1       Completed   0          3m        10.129.2.219   ip-172-31-62-45.ec2.internal
ruby-ex-6-build    0/1       Completed   0          2m        10.129.2.220   ip-172-31-62-45.ec2.internal
ruby-ex-6-pnj27    1/1       Running     0          1m        10.129.2.224   ip-172-31-62-45.ec2.internal
ruby-ex-7-build    0/1       Error       0          2m        10.131.6.250   ip-172-31-58-103.ec2.internal
ruby-ex-8-build    0/1       Error       0          2m        10.129.7.19    ip-172-31-49-231.ec2.internal
ruby-ex-9-build    0/1       Error       0          2m        10.131.6.251   ip-172-31-58-103.ec2.internal
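
To see at a glance which nodes the failing builder pods landed on, the pod listing above can be tallied per node. This is a sketch only: `failed_pod_nodes` is a hypothetical helper, and it assumes the column layout shown in this comment, where STATUS is the third column and NODE is the last one (other `oc` versions may print additional columns).

```shell
# Hypothetical helper: read `oc get pods -o wide` output on stdin and print
# "<count> <node>" for pods in Error status, highest count first.
# Assumes STATUS is column 3 and NODE is the last column, as in this comment.
failed_pod_nodes() {
    awk 'NR > 1 && $3 == "Error" { count[$NF]++ }
         END { for (node in count) print count[node], node }' | sort -rn
}
```

Possible usage:

    oc get pods -o wide | failed_pod_nodes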

Comment 13 Scott Dodson 2018-02-06 13:44:54 UTC
*** Bug 1542302 has been marked as a duplicate of this bug. ***

Comment 14 Justin Pierce 2018-02-06 13:59:57 UTC
All known fixes deployed to free-int. Moving to QA.

Comment 16 yufchang 2018-02-11 02:31:14 UTC
Verified on:
OpenShift Master: v3.9.0-0.36.0 (online version 3.6.0.83) 
Kubernetes Master: v1.9.1+a0ce1bc657 
OpenShift Web Console:  v3.9.0-0.36.0

