Bug 1459810 - heketi-storage-copy-job fails during cns-deploy because it tries to pull the wrong image "heketi/heketi:dev"
Summary: heketi-storage-copy-job fails during cns-deploy because it tries to pull the wrong ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: cns-deploy-tool
Version: cns-3.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: CNS 3.6
Assignee: Mohamed Ashiq
QA Contact: Prasanth
URL:
Whiteboard:
Depends On:
Blocks: 1445448
Reported: 2017-06-08 09:11 UTC by Prasanth
Modified: 2018-12-06 19:34 UTC (History)

Fixed In Version: cns-deploy-5.0.0-2
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-10-11 07:12:11 UTC
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2017:2881 0 normal SHIPPED_LIVE cns-deploy-tool bug fix and enhancement update 2017-10-11 11:11:43 UTC

Description Prasanth 2017-06-08 09:11:18 UTC
Description of problem:

heketi-storage-copy-job fails during cns-deploy because it tries to pull the wrong image, "heketi/heketi:dev".

##########
heketi topology loaded.
Saving heketi-storage.json
secret "heketi-storage-secret" created
endpoints "heketi-storage-endpoints" created
service "heketi-storage-endpoints" created
job "heketi-storage-copy-job" created

Checking status of pods matching 'job-name=heketi-storage-copy-job':
heketi-storage-copy-job-10twl   0/1       ImagePullBackOff   0         5m
Timed out waiting for pods matching 'job-name=heketi-storage-copy-job'.
Error waiting for job 'heketi-storage-copy-job' to complete.
##########
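
A status check like the one above can be scripted. The following is only a minimal sketch: it assumes the status rows keep the five whitespace-separated columns shown in this report (NAME, READY, STATUS, RESTARTS, AGE), and the function name is hypothetical, not part of cns-deploy.

```python
# Statuses that indicate the kubelet could not pull the container image.
IMAGE_PULL_FAILURES = {"ImagePullBackOff", "ErrImagePull"}


def pods_with_image_pull_failure(status_output):
    """Return names of pods whose STATUS column reports an image pull failure.

    Assumes rows of NAME READY STATUS RESTARTS AGE, as quoted in this
    report; any other line is skipped.
    """
    failing = []
    for line in status_output.splitlines():
        fields = line.split()
        # A pod row has exactly five columns and a READY value such as "0/1".
        if len(fields) == 5 and "/" in fields[1] and fields[2] in IMAGE_PULL_FAILURES:
            failing.append(fields[0])
    return failing


sample = """\
Checking status of pods matching 'job-name=heketi-storage-copy-job':
heketi-storage-copy-job-10twl   0/1       ImagePullBackOff   0         5m
Timed out waiting for pods matching 'job-name=heketi-storage-copy-job'.
"""
print(pods_with_image_pull_failure(sample))
```

Failing early on these statuses would avoid waiting for the full timeout when the image reference itself is wrong.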


#########
Events:
  FirstSeen     LastSeen        Count   From                                            SubObjectPath           Type            Reason          Message
  ---------     --------        -----   ----                                            -------------           --------        ------          -------
  26s           26s             1       default-scheduler                                                       Normal          Scheduled       Successfully assigned heketi-storage-copy-job-10twl to dhcp46-9.lab.eng.blr.redhat.com
  20s           20s             1       kubelet, dhcp46-9.lab.eng.blr.redhat.com        spec.containers{heketi} Normal          BackOff         Back-off pulling image "heketi/heketi:dev"
  20s           20s             1       kubelet, dhcp46-9.lab.eng.blr.redhat.com                                Warning         FailedSync      Error syncing pod, skipping: failed to "StartContainer" for "heketi" with ImagePullBackOff: "Back-off pulling image \"heketi/heketi:dev\""

  23s   8s      2       kubelet, dhcp46-9.lab.eng.blr.redhat.com        spec.containers{heketi} Normal  Pulling         pulling image "heketi/heketi:dev"
  21s   5s      2       kubelet, dhcp46-9.lab.eng.blr.redhat.com        spec.containers{heketi} Warning Failed          Failed to pull image "heketi/heketi:dev": rpc error: code = 2 desc = unknown: Not Found
  21s   5s      2       kubelet, dhcp46-9.lab.eng.blr.redhat.com                                Warning FailedSync      Error syncing pod, skipping: failed to "StartContainer" for "heketi" with ErrImagePull: "rpc error: code = 2 desc = unknown: Not Found"
#########
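
To see which image the kubelet failed to pull, the event messages can be scanned for the "Failed to pull image" text. This is a rough sketch that assumes the message format quoted in the events above; the helper name is made up for illustration.

```python
import re

# Matches the kubelet message format quoted in the events of this report.
FAILED_PULL = re.compile(r'Failed to pull image "([^"]+)": (.*)')


def failed_image_pulls(events_text):
    """Return (image, reason) pairs for every failed pull in the events text."""
    return [(m.group(1), m.group(2).strip()) for m in FAILED_PULL.finditer(events_text)]


event_line = (
    '  21s   5s      2       kubelet, dhcp46-9.lab.eng.blr.redhat.com        '
    'spec.containers{heketi} Warning Failed          '
    'Failed to pull image "heketi/heketi:dev": '
    'rpc error: code = 2 desc = unknown: Not Found'
)
print(failed_image_pulls(event_line))
```

The BackOff and FailedSync messages are deliberately not matched, so each failed pull is reported once.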

Version-Release number of selected component (if applicable):
heketi-client-5.0.0-1.el7rhgs.x86_64
cns-deploy-5.0.0-1.el7rhgs.x86_64

How reproducible: 2/2


Steps to Reproduce:
1. Execute # cns-deploy /opt/topology.json --deploy-gluster

Actual results: cns-deploy fails to complete because heketi-storage-copy-job cannot pull its image; the job references the wrong image, "heketi/heketi:dev".


Expected results: cns-deploy should complete successfully, with heketi-storage-copy-job running to completion.


Additional info:

--------------------
[root@dhcp47-18 ~]# oc get jobs
NAME                      DESIRED   SUCCESSFUL   AGE
heketi-storage-copy-job   1         0            3s

[root@dhcp47-18 ~]# oc describe job heketi-storage-copy-job
Name:           heketi-storage-copy-job
Namespace:      storage-project
Selector:       controller-uid=23a29c8e-4c29-11e7-9471-005056b3ded1
Labels:         deploy-heketi=support
Annotations:    <none>
Parallelism:    1
Completions:    1
Start Time:     Thu, 08 Jun 2017 14:32:03 +0530
Pods Statuses:  1 Running / 0 Succeeded / 0 Failed
Pod Template:
  Labels:       controller-uid=23a29c8e-4c29-11e7-9471-005056b3ded1
                job-name=heketi-storage-copy-job
  Containers:
   heketi:
    Image:      heketi/heketi:dev
    Port:
    Command:
      cp
      /db/heketi.db
      /heketi
    Environment:        <none>
    Mounts:
      /db from heketi-storage-secret (rw)
      /heketi from heketi-storage (rw)
  Volumes:
   heketi-storage:
    Type:               Glusterfs (a Glusterfs mount on the host that shares a pod's lifetime)
    EndpointsName:      heketi-storage-endpoints
    Path:               heketidbstorage
    ReadOnly:           false
   heketi-storage-secret:
    Type:       Secret (a volume populated by a Secret)
    SecretName: heketi-storage-secret
    Optional:   false
Events:
  FirstSeen     LastSeen        Count   From            SubObjectPath   Type            Reason                  Message
  ---------     --------        -----   ----            -------------   --------        ------                  -------
  13s           13s             1       job-controller                  Normal          SuccessfulCreate        Created pod: heketi-storage-copy-job-10twl
[root@dhcp47-18 ~]# oc get pods
NAME                             READY     STATUS             RESTARTS   AGE
deploy-heketi-1-f5kvs            1/1       Running            0          1m
glusterfs-rprj8                  1/1       Running            0          2m
glusterfs-sss5j                  1/1       Running            0          2m
glusterfs-vc452                  1/1       Running            0          2m
heketi-storage-copy-job-10twl    0/1       ImagePullBackOff   0          19s
storage-project-router-3-59wqf   1/1       Running            1          2h


[root@dhcp47-18 ~]# oc describe pod heketi-storage-copy-job-10twl
Name:                   heketi-storage-copy-job-10twl
Namespace:              storage-project
Security Policy:        privileged
Node:                   dhcp46-9.lab.eng.blr.redhat.com/10.70.46.9
Start Time:             Thu, 08 Jun 2017 14:32:03 +0530
Labels:                 controller-uid=23a29c8e-4c29-11e7-9471-005056b3ded1
                        job-name=heketi-storage-copy-job
Annotations:            kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"Job","namespace":"storage-project","name":"heketi-storage-copy-job","uid":"23a29c8e-4c29-11e7-9471-005056b...
                        openshift.io/scc=privileged
Status:                 Pending
IP:                     10.129.0.8
Controllers:            Job/heketi-storage-copy-job
Containers:
  heketi:
    Container ID:
    Image:              heketi/heketi:dev
    Image ID:
    Port:
    Command:
      cp
      /db/heketi.db
      /heketi
    State:              Waiting
      Reason:           ImagePullBackOff
    Ready:              False
    Restart Count:      0
    Environment:        <none>
    Mounts:
      /db from heketi-storage-secret (rw)
      /heketi from heketi-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-zh2vk (ro)
Conditions:
  Type          Status
  Initialized   True 
  Ready         False 
  PodScheduled  True 
Volumes:
  heketi-storage:
    Type:               Glusterfs (a Glusterfs mount on the host that shares a pod's lifetime)
    EndpointsName:      heketi-storage-endpoints
    Path:               heketidbstorage
    ReadOnly:           false
  heketi-storage-secret:
    Type:       Secret (a volume populated by a Secret)
    SecretName: heketi-storage-secret
    Optional:   false
  default-token-zh2vk:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-zh2vk
    Optional:   false
QoS Class:      BestEffort
Node-Selectors: <none>
Tolerations:    <none>
Events:
  FirstSeen     LastSeen        Count   From                                            SubObjectPath           Type            Reason          Message
  ---------     --------        -----   ----                                            -------------           --------        ------          -------
  26s           26s             1       default-scheduler                                                       Normal          Scheduled       Successfully assigned heketi-storage-copy-job-10twl to dhcp46-9.lab.eng.blr.redhat.com
  20s           20s             1       kubelet, dhcp46-9.lab.eng.blr.redhat.com        spec.containers{heketi} Normal          BackOff         Back-off pulling image "heketi/heketi:dev"
  20s           20s             1       kubelet, dhcp46-9.lab.eng.blr.redhat.com                                Warning         FailedSync      Error syncing pod, skipping: failed to "StartContainer" for "heketi" with ImagePullBackOff: "Back-off pulling image \"heketi/heketi:dev\""

  23s   8s      2       kubelet, dhcp46-9.lab.eng.blr.redhat.com        spec.containers{heketi} Normal  Pulling         pulling image "heketi/heketi:dev"
  21s   5s      2       kubelet, dhcp46-9.lab.eng.blr.redhat.com        spec.containers{heketi} Warning Failed          Failed to pull image "heketi/heketi:dev": rpc error: code = 2 desc = unknown: Not Found
  21s   5s      2       kubelet, dhcp46-9.lab.eng.blr.redhat.com                                Warning FailedSync      Error syncing pod, skipping: failed to "StartContainer" for "heketi" with ErrImagePull: "rpc error: code = 2 desc = unknown: Not Found"
--------------------

Comment 4 Prasanth 2017-06-09 09:08:09 UTC
Verified as fixed in cns-deploy-5.0.0-2

################
cns-deploy.log --verbose 
Using OpenShift CLI.
NAME              STATUS    AGE
storage-project   Active    1d
Using namespace "storage-project".
Checking that heketi pod is not running ... 
Checking status of pods matching 'glusterfs=heketi-pod':
No resources found.
Timed out waiting for pods matching 'glusterfs=heketi-pod'.
OK
template "deploy-heketi" created
serviceaccount "heketi-service-account" created
template "heketi" created
template "glusterfs" created
role "edit" added: "system:serviceaccount:storage-project:heketi-service-account"
Marking 'dhcp46-122.lab.eng.blr.redhat.com' as a GlusterFS node.
node "dhcp46-122.lab.eng.blr.redhat.com" labeled
Marking 'dhcp46-9.lab.eng.blr.redhat.com' as a GlusterFS node.
node "dhcp46-9.lab.eng.blr.redhat.com" labeled
Marking 'dhcp46-134.lab.eng.blr.redhat.com' as a GlusterFS node.
node "dhcp46-134.lab.eng.blr.redhat.com" labeled
Deploying GlusterFS pods.
daemonset "glusterfs" created
Waiting for GlusterFS pods to start ... 
Checking status of pods matching 'glusterfs-node=pod':
glusterfs-3m19h   1/1       Running   0         52s
glusterfs-nl9lf   1/1       Running   0         52s
glusterfs-p749w   1/1       Running   0         52s
OK
service "deploy-heketi" created
route "deploy-heketi" created
deploymentconfig "deploy-heketi" created
Waiting for deploy-heketi pod to start ... 
Checking status of pods matching 'glusterfs=heketi-pod':
deploy-heketi-1-fswnt   1/1       Running   0         1m
OK
Determining heketi service URL ... OK
Creating cluster ... ID: bd901d3bbabf347a5718dfd99b467d19
Creating node dhcp46-122.lab.eng.blr.redhat.com ... ID: 15fd7dde406c7d83781c07ce66ebd550
Adding device /dev/sdd ... OK
Adding device /dev/sde ... OK
Adding device /dev/sdf ... OK
Creating node dhcp46-9.lab.eng.blr.redhat.com ... ID: e96157370ed8695d016242d9671bfede
Adding device /dev/sdd ... OK
Adding device /dev/sde ... OK
Adding device /dev/sdf ... OK
Creating node dhcp46-134.lab.eng.blr.redhat.com ... ID: f20a07ceb0223c2a4873d0f036539483
Adding device /dev/sdd ... OK
Adding device /dev/sde ... OK
Adding device /dev/sdf ... OK
heketi topology loaded.
Saving heketi-storage.json
secret "heketi-storage-secret" created
endpoints "heketi-storage-endpoints" created
service "heketi-storage-endpoints" created
job "heketi-storage-copy-job" created

Checking status of pods matching 'job-name=heketi-storage-copy-job':
heketi-storage-copy-job-1g2sc   0/1       Completed   0         11s
deploymentconfig "deploy-heketi" deleted
route "deploy-heketi" deleted
service "deploy-heketi" deleted
job "heketi-storage-copy-job" deleted
pod "deploy-heketi-1-fswnt" deleted
secret "heketi-storage-secret" deleted
service "heketi" created
route "heketi" created
deploymentconfig "heketi" created
Waiting for heketi pod to start ... 
Checking status of pods matching 'glusterfs=heketi-pod':
deploy-heketi-1-fswnt   1/1       Terminating   0         1m
OK
Determining heketi service URL ... OK
heketi is now running and accessible via http://heketi-storage-project.cloudapps.mystorage.com/
Ready to create and provide GlusterFS volumes.
################
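
The same verification can be done by checking the captured log for a copy-job pod with status Completed. This is a sketch under the assumption that the log keeps the five-column pod rows shown above; the function name and default job name are illustrative, not taken from cns-deploy.

```python
def copy_job_completed(log_text, job_name="heketi-storage-copy-job"):
    """Return True if a pod of the given job is listed with STATUS Completed."""
    for line in log_text.splitlines():
        fields = line.split()
        # Pod rows look like: NAME READY STATUS RESTARTS AGE
        if (
            len(fields) == 5
            and fields[0].startswith(job_name + "-")
            and fields[2] == "Completed"
        ):
            return True
    return False


fixed_log = """\
job "heketi-storage-copy-job" created

Checking status of pods matching 'job-name=heketi-storage-copy-job':
heketi-storage-copy-job-1g2sc   0/1       Completed   0         11s
"""
print(copy_job_completed(fixed_log))
```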

Comment 6 errata-xmlrpc 2017-10-11 07:12:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:2881

Comment 7 vinutha 2018-12-06 19:34:16 UTC
Marking qe-test-coverage as "-", since the preferred mode of deployment is Ansible.

