Bug 1498595

Summary: Cannot deploy CNS on OCP 3.6 as the heketi-storage-copy-job is trying to pull a wrong image "heketi/heketi:dev" instead of "rhgs3/rhgs-volmanager-rhel7:latest"
Product: OpenShift Container Platform
Reporter: Prasanth <pprakash>
Component: Installer
Assignee: Jose A. Rivera <jarrpa>
Status: CLOSED ERRATA
QA Contact: Wenkai Shi <weshi>
Severity: high
Priority: medium
Version: 3.6.0
CC: aos-bugs, asrivast, bbilgin, dmesser, hchiramm, jarrpa, jokerman, madam, mmccomas, pprakash, rcyriac, rhs-bugs, rreddy, rtalur, sankarshan, suchaudh
Target Release: 3.6.z
Hardware: x86_64
OS: Linux
Doc Type: No Doc Update
Clone Of: 1494270
Last Closed: 2017-12-14 21:01:55 UTC
Type: Bug

Description Prasanth 2017-10-04 17:18:06 UTC
+++ This bug was initially created as a clone of Bug #1494270 +++

Description of problem:

Trying to use openshift-ansible (latest checkout from GitHub) to deploy CNS on OCP 3.6. Startup of RHGS containers later than image tag 3.3.0-15 fails with an error message from within the container about the TCMU_LOGDIR and GB_GLFS_LRU_COUNT environment variables not being set.

How reproducible:

always

Steps to Reproduce:
1. Download latest rhgs3/rhgs-(server|volmanager)-rhel7 images
2. Deploy using openshift-ansible and [glusterfs] inventory groups
3. The deployment playbook times out waiting for the GlusterFS pods to start

Actual results:

Deployment fails. The OCP deployment as a whole also fails if the required pods were designated to provide storage to the registry.

Expected results:

Deployment succeeds. Containers come up with reasonable default values for the above-mentioned environment variables when they are not set.


If the setup doesn't have access to the external public registry, it is very likely to hit this issue: https://github.com/openshift/openshift-ansible/pull/5562

For example, if BLOCK_REGISTRY='--block-registry docker.io' is configured in docker, the Ansible deployment fails because the heketi-storage-copy-job currently tries to pull the "heketi/heketi:dev" image. See below:


**************************************************************************
# oc get pods
NAME                            READY     STATUS              RESTARTS   AGE
deploy-heketi-storage-1-b92l9   1/1       Running             0          1m
glusterfs-storage-0t6c5         1/1       Running             0          5m
glusterfs-storage-sn930         1/1       Running             0          5m
glusterfs-storage-vq22s         1/1       Running             0          5m
heketi-storage-copy-job-sczx6   0/1       ContainerCreating   0          9s


# oc get pods
NAME                            READY     STATUS         RESTARTS   AGE
deploy-heketi-storage-1-b92l9   1/1       Running        0          3m
glusterfs-storage-0t6c5         1/1       Running        0          7m
glusterfs-storage-sn930         1/1       Running        0          7m
glusterfs-storage-vq22s         1/1       Running        0          7m
heketi-storage-copy-job-sczx6   0/1       ErrImagePull   0          1m


# oc describe pod heketi-storage-copy-job-sczx6
Name:                   heketi-storage-copy-job-sczx6
Namespace:              glusterfs
Security Policy:        privileged
Node:                   dhcp46-202.lab.eng.blr.redhat.com/10.70.46.202
Start Time:             Wed, 04 Oct 2017 19:15:39 +0530
Labels:                 controller-uid=4edcee40-a90a-11e7-bd89-005056a53cea
                        job-name=heketi-storage-copy-job
Annotations:            kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"Job","namespace":"glusterfs","name":"heketi-storage-copy-job","uid":"4edcee40-a90a-11e7-bd89-005056a53cea"...
                        openshift.io/scc=privileged
Status:                 Pending
IP:                     10.131.0.6
Controllers:            Job/heketi-storage-copy-job
Containers:
  heketi:
    Container ID:
    Image:              heketi/heketi:dev
    Image ID:
    Port:
    Command:
      cp
      /db/heketi.db
      /heketi
    State:              Waiting
      Reason:           ImagePullBackOff
    Ready:              False
    Restart Count:      0
    Environment:        <none>
    Mounts:
      /db from heketi-storage-secret (rw)
      /heketi from heketi-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-pxl40 (ro)
Conditions:
  Type          Status
  Initialized   True 
  Ready         False 
  PodScheduled  True 
Volumes:
  heketi-storage:
    Type:               Glusterfs (a Glusterfs mount on the host that shares a pod's lifetime)
    EndpointsName:      heketi-storage-endpoints
    Path:               heketidbstorage
    ReadOnly:           false
  heketi-storage-secret:
    Type:       Secret (a volume populated by a Secret)
    SecretName: heketi-storage-secret
    Optional:   false
  default-token-pxl40:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-pxl40
    Optional:   false
QoS Class:      BestEffort
Node-Selectors: <none>
Tolerations:    <none>
Events:
  FirstSeen     LastSeen        Count   From                                            SubObjectPath           Type            Reason          Message
  ---------     --------        -----   ----                                            -------------           --------        ------          -------
  46s           46s             1       default-scheduler                                                       Normal          Scheduled       Successfully assigned heketi-storage-copy-job-sczx6 to dhcp46-202.lab.eng.blr.redhat.com
  31s           31s             1       kubelet, dhcp46-202.lab.eng.blr.redhat.com      spec.containers{heketi} Normal          BackOff         Back-off pulling image "heketi/heketi:dev"
  41s           17s             2       kubelet, dhcp46-202.lab.eng.blr.redhat.com      spec.containers{heketi} Normal          Pulling         pulling image "heketi/heketi:dev"
  32s           8s              2       kubelet, dhcp46-202.lab.eng.blr.redhat.com      spec.containers{heketi} Warning         Failed          Failed to pull image "heketi/heketi:dev": rpc error: code = 2 desc = unknown: Not Found
  32s           8s              3       kubelet, dhcp46-202.lab.eng.blr.redhat.com                              Warning         FailedSync      Error syncing pod
**************************************************************************
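
The failure above follows from how an image name without a registry host is resolved: "heketi/heketi:dev" names no registry, so upstream Docker falls back to docker.io, which is the registry blocked in this setup. Below is a minimal Python sketch of that rule, for illustration only. The helper names registry_of and is_blocked are hypothetical, registry.example.com is a placeholder host, and the sketch does not model any additional registries configured on the host.

```python
def registry_of(image_ref, default="docker.io"):
    """Return the registry host an image reference would be pulled from.

    Simplified model of upstream Docker's rule: the first path component is
    a registry host only if it contains "." or ":" or is exactly "localhost".
    Otherwise the reference is resolved against the default registry.
    """
    first, sep, _rest = image_ref.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return default


def is_blocked(image_ref, blocked_registries):
    """True if the reference resolves to a registry on the block list."""
    return registry_of(image_ref) in set(blocked_registries)


if __name__ == "__main__":
    blocked = ["docker.io"]  # mirrors BLOCK_REGISTRY='--block-registry docker.io'
    refs = [
        "heketi/heketi:dev",  # image the copy job tried to pull
        # Placeholder host, not taken from the report:
        "registry.example.com/rhgs3/rhgs-volmanager-rhel7:latest",
    ]
    for ref in refs:
        state = "blocked" if is_blocked(ref, blocked) else "allowed"
        print(ref, "->", registry_of(ref), state)
```

A check like this could be run over the image names in a job or pod template before deployment to flag references that would resolve to a blocked registry.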

Jose, could you please confirm whether this is the current behaviour in the OCP 3.6 Ansible installer? Once you confirm, I'll go ahead and open a separate BZ against OCP 3.6 to track it.

--- Additional comment from Jose A. Rivera on 2017-10-04 11:11:42 EDT ---

Confirmed. Let's see if OCP will allow this fix into 3.6.z. :)

--- Additional comment from Prasanth on 2017-10-04 11:47:10 EDT ---

(In reply to Jose A. Rivera from comment #6)
> Confirmed. Let's see if OCP will allow this fix into 3.6.z. :)

Thanks for confirming, Jose. I'll file a BZ against OCP 3.6 soon, and let's try to get the fix into 3.6.z. :)

--- Additional comment from Prasanth on 2017-10-04 11:47:47 EDT ---

Based on Comment 5, moving this BZ to Verified.

Comment 1 Scott Dodson 2017-10-04 20:20:38 UTC
Proposed fix: https://github.com/openshift/openshift-ansible/pull/5663

Comment 2 Jose A. Rivera 2017-10-09 17:19:20 UTC
PR is merged.

Comment 3 Wenkai Shi 2017-10-18 11:24:04 UTC
Verified with openshift-ansible-3.6.173.0.56-1.git.0.eecaf3e.el7: installation succeeds with a specified heketi image.

# cat hosts
...
openshift_storage_glusterfs_heketi_image=heketi/heketi
openshift_storage_glusterfs_heketi_version=dev
...

# docker ps -a
... docker.io/heketi/heketi@...
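
The verification above sets the heketi image through two inventory variables. As an illustration of how such a pair maps to a single pull reference, here is a small Python sketch. It assumes the installer joins the two values as image:version, which this report does not state explicitly. The helpers parse_inventory_vars and heketi_image_ref are hypothetical, the "latest" fallback is an assumption, and the parser only handles bare key=value lines.

```python
def parse_inventory_vars(text):
    """Collect bare key=value lines from an inventory snippet.

    Simplification: section headers, comments, "..." placeholders and any
    line without "=" are skipped; host lines with inline vars are not handled.
    """
    result = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "[")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        result[key.strip()] = value.strip()
    return result


def heketi_image_ref(inv_vars):
    """Build the image reference from the two heketi inventory variables.

    Assumption: the reference is "<image>:<version>", with "latest" used
    when no version is given.
    """
    image = inv_vars["openshift_storage_glusterfs_heketi_image"]
    version = inv_vars.get("openshift_storage_glusterfs_heketi_version", "latest")
    return f"{image}:{version}"


if __name__ == "__main__":
    snippet = """
    ...
    openshift_storage_glusterfs_heketi_image=heketi/heketi
    openshift_storage_glusterfs_heketi_version=dev
    ...
    """
    print(heketi_image_ref(parse_inventory_vars(snippet)))
```

Under that assumption, the inventory shown in this comment yields the same heketi/heketi:dev reference that the copy job pulled in the original failure, here chosen explicitly rather than hard-coded.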

Comment 6 errata-xmlrpc 2017-12-14 21:01:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3438