Bug 1467318

Summary: Getting proxy error while deploying heketi pod (pod restarted)

Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Apeksha <akhakhar>
Component: heketi
Assignee: Mohamed Ashiq <mliyazud>
Status: CLOSED ERRATA
QA Contact: Anoop <annair>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: rhgs-3.0
CC: akhakhar, aos-bugs, decarr, hchiramm, jokerman, mliyazud, mmccomas, pronix.service, rcyriac, rhs-bugs, rreddy, rtalur, sjenning, storage-qa-internal
Target Milestone: ---
Target Release: CNS 3.6
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: heketi-5.0.0-3.el7rhgs
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-10-11 07:07:22 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1445448
Attachments: openshift log (flags: none)

Description Apeksha 2017-07-03 11:38:41 UTC
Created attachment 1293853 [details]
openshift log

Description of problem:
CNS deploy failed on the latest OCP build.
A proxy error appears while deploying the heketi pod, and we also see that the deploy-heketi pod has restarted.

Version-Release number of selected component (if applicable):
atomic-openshift-3.6.131-1.git.0.9a89da2.el7.x86_64
openshift-ansible-3.6.131-1.git.0.d87dfaa.el7.noarch
cns-deploy-5.0.0-4.el7rhgs.x86_64
heketi-client-5.0.0-2.el7rhgs.x86_64

How reproducible: Once

Steps to Reproduce:
1. Setup OCP
2. Setup CNS - the cns-deploy command failed at the deploy-heketi stage (a typical invocation is sketched below).
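
Sketch of a typical invocation for a setup like this (the exact flags and topology file used in this run are not recorded in this report):

    cns-deploy -n storage-project -g topology.json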


[root@dhcp47-54 ~]# oc get pods
NAME                             READY     STATUS    RESTARTS   AGE
deploy-heketi-1-n6x5j            1/1       Running   1          1h
glusterfs-2kbwj                  1/1       Running   0          1h
glusterfs-9kqbh                  1/1       Running   0          1h
glusterfs-qmxxb                  1/1       Running   0          1h
storage-project-router-1-ndqth   1/1       Running   4          2h

[root@dhcp47-54 ~]# oc logs deploy-heketi-1-n6x5j
Heketi 5.0.0
[heketi] INFO 2017/07/03 09:51:26 Loaded kubernetes executor
[heketi] INFO 2017/07/03 09:51:26 Loaded simple allocator
[heketi] INFO 2017/07/03 09:51:26 GlusterFS Application Loaded
Listening on port 8080
[negroni] Started GET /queue/ce50f655c7a4afcb8b81d45f91d6a019
[negroni] Completed 404 Not Found in 46.852µs

Proxy error found in the OCP logs:

15:21:09.365835   15822 rest.go:324] Starting watch for /apis/build.openshift.io/v1/buildconfigs, rv=3129 labels= fields= timeout=7m4s
 15:21:14.900710   15822 proxy.go:192] Error proxying data from client to backend: write tcp 10.70.46.24:53688->10.70.47.174:10250: write: broken pipe


Additional info:

other setup details - http://pastebin.test.redhat.com/499757

Also attached the openshift logs (journalctl -u atomic-openshift-master)

Comment 1 Apeksha 2017-07-03 11:57:09 UTC
Setup details as mentioned in the pastebin, just in case the pastebin content gets scrapped in the future:

    [root@dhcp47-54 ~]# heketi-cli topology info
     
    Cluster Id: 21528bc45bc6c0fb927999cf6b3d951d
     
        File:  true
        Block: true
     
        Volumes:
     
        Nodes:
     
            Node Id: 56f4ec771adddc8c32a1da9c4486dc6a
            State: online
            Cluster Id: 21528bc45bc6c0fb927999cf6b3d951d
            Zone: 1
            Management Hostname: dhcp47-183.lab.eng.blr.redhat.com
            Storage Hostname: 10.70.47.183
            Devices:
                    Id:41571878d9a35db3019c922e36f347ed   Name:/dev/sdf            State:online    Size (GiB):99      Used (GiB):2       Free (GiB):97      
                            Bricks:
                                    Id:3e38eab5c5e8766761614952ba513312   Size (GiB):2       Path:
                    Id:6fc48d160d31385754d358ce933db784   Name:/dev/sdd            State:online    Size (GiB):99      Used (GiB):0       Free (GiB):99      
                            Bricks:
                    Id:9e5014b09c12f8e9e1283b4e7ab41500   Name:/dev/sde            State:online    Size (GiB):99      Used (GiB):0       Free (GiB):99      
                            Bricks:
     
            Node Id: 887b962c9ac00e6641c11bb2be0b980d
            State: online
            Cluster Id: 21528bc45bc6c0fb927999cf6b3d951d
            Zone: 1
            Management Hostname: dhcp47-174.lab.eng.blr.redhat.com
            Storage Hostname: 10.70.47.174
            Devices:
                    Id:34c90ca30ba53f9c735a7329f406adab   Name:/dev/sdd            State:online    Size (GiB):99      Used (GiB):0       Free (GiB):99      
                            Bricks:
                    Id:41d738fd220afba241515ec47352dd3b   Name:/dev/sdf            State:online    Size (GiB):99      Used (GiB):2       Free (GiB):97      
                            Bricks:
                                    Id:4af6aff5284cf8c3545e20bd9cff3345   Size (GiB):2       Path:
                    Id:bd406bd0caba85a01525277ec5fc778f   Name:/dev/sde            State:online    Size (GiB):99      Used (GiB):0       Free (GiB):99      
                            Bricks:
     
            Node Id: 9f94b4dd8bf2619a36497917e2aac6b2
            State: online
            Cluster Id: 21528bc45bc6c0fb927999cf6b3d951d
            Zone: 1
            Management Hostname: dhcp46-133.lab.eng.blr.redhat.com
            Storage Hostname: 10.70.46.133
            Devices:
                    Id:5364c16fd178cfea64af1e023bbbe072   Name:/dev/sdf            State:online    Size (GiB):99      Used (GiB):2       Free (GiB):97      
                            Bricks:
                                    Id:27e5a2036089d3a3ecc472fdee41e1e2   Size (GiB):2       Path:
                    Id:9caf3731085e2225f49982456614dbb2   Name:/dev/sdd            State:online    Size (GiB):99      Used (GiB):0       Free (GiB):99      
                            Bricks:
                    Id:b2b5a36c2feb3055e703c45e3b60a1a1   Name:/dev/sde            State:online    Size (GiB):99      Used (GiB):0       Free (GiB):99      
                            Bricks:
     
     
    [root@dhcp47-54 ~]# oc logs deploy-heketi-1-n6x5j
    Heketi 5.0.0
    [heketi] INFO 2017/07/03 09:51:26 Loaded kubernetes executor
    [heketi] INFO 2017/07/03 09:51:26 Loaded simple allocator
    [heketi] INFO 2017/07/03 09:51:26 GlusterFS Application Loaded
    Listening on port 8080
    [negroni] Started GET /queue/ce50f655c7a4afcb8b81d45f91d6a019
    [negroni] Completed 404 Not Found in 46.852µs
     
    [root@dhcp47-54 ~]# oc get pods
    NAME                             READY     STATUS    RESTARTS   AGE
    deploy-heketi-1-n6x5j            1/1       Running   1          14m
    glusterfs-2kbwj                  1/1       Running   0          17m
    glusterfs-9kqbh                  1/1       Running   0          17m
    glusterfs-qmxxb                  1/1       Running   0          17m
    storage-project-router-1-ndqth   1/1       Running   4          42m
     
     
    [root@dhcp47-54 ~]# oc get nodes
    NAME                                STATUS                     AGE       VERSION
    dhcp46-133.lab.eng.blr.redhat.com   Ready                      2h        v1.6.1+5115d708d7
    dhcp46-24.lab.eng.blr.redhat.com    Ready,SchedulingDisabled   2h        v1.6.1+5115d708d7
    dhcp47-174.lab.eng.blr.redhat.com   Ready                      2h        v1.6.1+5115d708d7
    dhcp47-183.lab.eng.blr.redhat.com   Ready                      2h        v1.6.1+5115d708d7
     
    [root@dhcp47-54 ~]# oc get all
    NAME                        REVISION   DESIRED   CURRENT   TRIGGERED BY
    dc/deploy-heketi            1          1         1         config
    dc/storage-project-router   1          1         1         config
     
    NAME                          DESIRED   CURRENT   READY     AGE
    rc/deploy-heketi-1            1         1         1         1h
    rc/storage-project-router-1   1         1         1         2h
     
    NAME                   HOST/PORT                                                          PATH      SERVICES        PORT      TERMINATION   WILDCARD
    routes/deploy-heketi   deploy-heketi-storage-project.cloudapps.mystorage.com ... 1 more             deploy-heketi   <all>                   None
     
    NAME                         CLUSTER-IP       EXTERNAL-IP   PORT(S)                   AGE
    svc/deploy-heketi            172.30.230.70    <none>        8080/TCP                  1h
    svc/storage-project-router   172.30.208.233   <none>        80/TCP,443/TCP,1936/TCP   2h
     
    NAME                                READY     STATUS    RESTARTS   AGE
    po/deploy-heketi-1-n6x5j            1/1       Running   1          1h
    po/glusterfs-2kbwj                  1/1       Running   0          1h
    po/glusterfs-9kqbh                  1/1       Running   0          1h
    po/glusterfs-qmxxb                  1/1       Running   0          1h
    po/storage-project-router-1-ndqth   1/1       Running   4          2h
     
     
    [root@dhcp47-54 ~]# oc status
    In project storage-project on server https://10.70.46.24:8443
     
    http://deploy-heketi-storage-project.cloudapps.mystorage.com (svc/deploy-heketi)
      dc/deploy-heketi deploys docker.io/rhgs3/rhgs-volmanager-rhel7:3.3.0-2
        deployment #1 deployed 2 hours ago - 1 pod
     
    svc/storage-project-router - 172.30.208.233 ports 80, 443, 1936
      dc/storage-project-router deploys brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose-haproxy-router:v3.6.131
        deployment #1 deployed 2 hours ago - 1 pod
     
    pod/glusterfs-2kbwj runs rhgs3/rhgs-server-rhel7:3.3.0-5
     
    pod/glusterfs-9kqbh runs rhgs3/rhgs-server-rhel7:3.3.0-5
     
    pod/glusterfs-qmxxb runs rhgs3/rhgs-server-rhel7:3.3.0-5
     
    View details with 'oc describe <resource>/<name>' or list everything with 'oc get all'.
     
     
    Output of the cns-deploy command where it failed:
     
        Using OpenShift CLI.
        NAME              STATUS    AGE
        storage-project   Active    24m
        Using namespace "storage-project".
        Checking that heketi pod is not running ...
        Checking status of pods matching 'glusterfs=heketi-pod':
        No resources found.
        Timed out waiting for pods matching 'glusterfs=heketi-pod'.
        OK
        template "deploy-heketi" created
        serviceaccount "heketi-service-account" created
        template "heketi" created
        template "glusterfs" created
        role "edit" added: "system:serviceaccount:storage-project:heketi-service-account"
        Marking 'dhcp47-174.lab.eng.blr.redhat.com' as a GlusterFS node.
        node "dhcp47-174.lab.eng.blr.redhat.com" labeled
        Marking 'dhcp47-183.lab.eng.blr.redhat.com' as a GlusterFS node.
        node "dhcp47-183.lab.eng.blr.redhat.com" labeled
        Marking 'dhcp46-133.lab.eng.blr.redhat.com' as a GlusterFS node.
        node "dhcp46-133.lab.eng.blr.redhat.com" labeled
        Deploying GlusterFS pods.
        daemonset "glusterfs" created
        Waiting for GlusterFS pods to start ...
        Checking status of pods matching 'glusterfs-node=pod':
        glusterfs-2kbwj   1/1       Running   0         3m
        glusterfs-9kqbh   1/1       Running   0         3m
        glusterfs-qmxxb   1/1       Running   0         3m
        OK
        service "deploy-heketi" created
        route "deploy-heketi" created
        deploymentconfig "deploy-heketi" created
        Waiting for deploy-heketi pod to start ...
        Checking status of pods matching 'glusterfs=heketi-pod':
        deploy-heketi-1-n6x5j   1/1       Running   0         3m
        OK
        Determining heketi service URL ... OK
        Creating cluster ... ID: 21528bc45bc6c0fb927999cf6b3d951d
        Allowing file volumes on cluster.
        Allowing block volumes on cluster.
        Creating node dhcp47-174.lab.eng.blr.redhat.com ... ID: 887b962c9ac00e6641c11bb2be0b980d
        Adding device /dev/sdd ... OK
        Adding device /dev/sde ... OK
        Adding device /dev/sdf ... OK
        Creating node dhcp47-183.lab.eng.blr.redhat.com ... ID: 56f4ec771adddc8c32a1da9c4486dc6a
        Adding device /dev/sdd ... OK
        Adding device /dev/sde ... OK
        Adding device /dev/sdf ... OK
        Creating node dhcp46-133.lab.eng.blr.redhat.com ... ID: 9f94b4dd8bf2619a36497917e2aac6b2
        Adding device /dev/sdd ... OK
        Adding device /dev/sde ... OK
        Adding device /dev/sdf ... OK
        heketi topology loaded.
        Error: Id not found
        Failed on setup openshift heketi storage

Comment 2 Derek Carr 2017-07-03 14:46:47 UTC
The proxy.go errors are a red herring.

Comment 3 Seth Jennings 2017-07-03 15:00:35 UTC
So it is failing here:

https://github.com/gluster/gluster-kubernetes/blob/master/deploy/gk-deploy#L575-L579

Which means this command is failing:

heketi-cli -s http://${heketi_service} --user admin --secret \"${ADMIN_KEY}\" setup-openshift-heketi-storage

What is the output of that command (with ADMIN_KEY filled in)?

I would assume that it can't contact the heketi server.

What is the "storage-project-router-1-ndqth"? Is a LB in front of the heketi server? Because it has restarted 4 times and seems to be having issues.

Comment 5 Apeksha 2017-07-04 06:30:27 UTC
Tried the heketi-cli command manually, changed the log level to debug, and here is the output:

[root@dhcp47-54 ~]# heketi-cli -s http://deploy-heketi-storage-project.cloudapps.mystorage.com setup-openshift-heketi-storage
Error: 
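
(The debug log level noted above can be enabled in the heketi config; a sketch, where the /etc/heketi/heketi.json path inside this image is an assumption:)

    # assumed config path; heketi reads it at startup, so the
    # container must be restarted for the change to take effect
    oc exec deploy-heketi-1-n6x5j -- sed -i 's/"loglevel": "[a-z]*"/"loglevel": "debug"/' /etc/heketi/heketi.json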

[root@dhcp47-54 ~]# oc logs deploy-heketi-1-n6x5j 
Heketi 5.0.0
[heketi] INFO 2017/07/04 06:18:12 Loaded kubernetes executor
[heketi] INFO 2017/07/04 06:18:12 Loaded simple allocator
[heketi] INFO 2017/07/04 06:18:12 GlusterFS Application Loaded
Listening on port 8080
[negroni] Started GET /clusters
[negroni] Completed 200 OK in 272.191µs
[negroni] Started GET /clusters/21528bc45bc6c0fb927999cf6b3d951d
[negroni] Completed 200 OK in 414.421µs
[negroni] Started POST /volumes
[negroni] Completed 202 Accepted in 335.235µs
[asynchttp] INFO 2017/07/04 06:20:20 asynchttp.go:125: Started job c3ee227ae056c768cf393f85f37b1030
[heketi] INFO 2017/07/04 06:20:20 Creating volume ac32dc76f52889d4980ab51602648369
[heketi] INFO 2017/07/04 06:20:20 brick_num: 0
[negroni] Started GET /queue/c3ee227ae056c768cf393f85f37b1030
[negroni] Completed 200 OK in 36.683µs
[heketi] INFO 2017/07/04 06:20:20 Creating brick e2c4b3d010ad422dac773dc8b64449bb
[heketi] INFO 2017/07/04 06:20:20 Creating brick b09bd6d9a5e1044ecad199f75c9524dd
[heketi] INFO 2017/07/04 06:20:20 Creating brick 489cbd8be23cb89eb6b20678310c1386
[kubeexec] DEBUG 2017/07/04 06:20:20 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:250: Host: dhcp46-133.lab.eng.blr.redhat.com Pod: glusterfs-9kqbh Command: mkdir -p /var/lib/heketi/mounts/vg_b2b5a36c2feb3055e703c45e3b60a1a1/brick_489cbd8be23cb89eb6b20678310c1386
Result: 
[kubeexec] DEBUG 2017/07/04 06:20:20 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:250: Host: dhcp47-183.lab.eng.blr.redhat.com Pod: glusterfs-2kbwj Command: mkdir -p /var/lib/heketi/mounts/vg_9e5014b09c12f8e9e1283b4e7ab41500/brick_b09bd6d9a5e1044ecad199f75c9524dd
Result: 
[kubeexec] DEBUG 2017/07/04 06:20:20 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:250: Host: dhcp47-174.lab.eng.blr.redhat.com Pod: glusterfs-qmxxb Command: mkdir -p /var/lib/heketi/mounts/vg_34c90ca30ba53f9c735a7329f406adab/brick_e2c4b3d010ad422dac773dc8b64449bb
Result: 
[kubeexec] DEBUG 2017/07/04 06:20:21 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:250: Host: dhcp46-133.lab.eng.blr.redhat.com Pod: glusterfs-9kqbh Command: lvcreate --poolmetadatasize 12288K -c 256K -L 2097152K -T vg_b2b5a36c2feb3055e703c45e3b60a1a1/tp_489cbd8be23cb89eb6b20678310c1386 -V 2097152K -n brick_489cbd8be23cb89eb6b20678310c1386
Result:   Using default stripesize 64.00 KiB.
  Thin pool volume with chunk size 256.00 KiB can address at most 63.25 TiB of data.
  Logical volume "brick_489cbd8be23cb89eb6b20678310c1386" created.
[kubeexec] DEBUG 2017/07/04 06:20:21 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:250: Host: dhcp47-183.lab.eng.blr.redhat.com Pod: glusterfs-2kbwj Command: lvcreate --poolmetadatasize 12288K -c 256K -L 2097152K -T vg_9e5014b09c12f8e9e1283b4e7ab41500/tp_b09bd6d9a5e1044ecad199f75c9524dd -V 2097152K -n brick_b09bd6d9a5e1044ecad199f75c9524dd
Result:   Using default stripesize 64.00 KiB.
  Thin pool volume with chunk size 256.00 KiB can address at most 63.25 TiB of data.
  Logical volume "brick_b09bd6d9a5e1044ecad199f75c9524dd" created.
[kubeexec] DEBUG 2017/07/04 06:20:21 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:250: Host: dhcp47-174.lab.eng.blr.redhat.com Pod: glusterfs-qmxxb Command: lvcreate --poolmetadatasize 12288K -c 256K -L 2097152K -T vg_34c90ca30ba53f9c735a7329f406adab/tp_e2c4b3d010ad422dac773dc8b64449bb -V 2097152K -n brick_e2c4b3d010ad422dac773dc8b64449bb
Result:   Using default stripesize 64.00 KiB.
  Thin pool volume with chunk size 256.00 KiB can address at most 63.25 TiB of data.
  Logical volume "brick_e2c4b3d010ad422dac773dc8b64449bb" created.
[kubeexec] DEBUG 2017/07/04 06:20:21 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:250: Host: dhcp46-133.lab.eng.blr.redhat.com Pod: glusterfs-9kqbh Command: mkfs.xfs -i size=512 -n size=8192 /dev/mapper/vg_b2b5a36c2feb3055e703c45e3b60a1a1-brick_489cbd8be23cb89eb6b20678310c1386
Result: meta-data=/dev/mapper/vg_b2b5a36c2feb3055e703c45e3b60a1a1-brick_489cbd8be23cb89eb6b20678310c1386 isize=512    agcount=8, agsize=65472 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0, sparse=0
data     =                       bsize=4096   blocks=523776, imaxpct=25
         =                       sunit=64     swidth=64 blks
naming   =version 2              bsize=8192   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=64 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[kubeexec] DEBUG 2017/07/04 06:20:21 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:250: Host: dhcp47-183.lab.eng.blr.redhat.com Pod: glusterfs-2kbwj Command: mkfs.xfs -i size=512 -n size=8192 /dev/mapper/vg_9e5014b09c12f8e9e1283b4e7ab41500-brick_b09bd6d9a5e1044ecad199f75c9524dd
Result: meta-data=/dev/mapper/vg_9e5014b09c12f8e9e1283b4e7ab41500-brick_b09bd6d9a5e1044ecad199f75c9524dd isize=512    agcount=8, agsize=65472 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0, sparse=0
data     =                       bsize=4096   blocks=523776, imaxpct=25
         =                       sunit=64     swidth=64 blks
naming   =version 2              bsize=8192   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=64 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[kubeexec] DEBUG 2017/07/04 06:20:21 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:250: Host: dhcp47-174.lab.eng.blr.redhat.com Pod: glusterfs-qmxxb Command: mkfs.xfs -i size=512 -n size=8192 /dev/mapper/vg_34c90ca30ba53f9c735a7329f406adab-brick_e2c4b3d010ad422dac773dc8b64449bb
Result: meta-data=/dev/mapper/vg_34c90ca30ba53f9c735a7329f406adab-brick_e2c4b3d010ad422dac773dc8b64449bb isize=512    agcount=8, agsize=65472 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0, sparse=0
data     =                       bsize=4096   blocks=523776, imaxpct=25
         =                       sunit=64     swidth=64 blks
naming   =version 2              bsize=8192   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=64 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[kubeexec] DEBUG 2017/07/04 06:20:21 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:250: Host: dhcp46-133.lab.eng.blr.redhat.com Pod: glusterfs-9kqbh Command: echo "/dev/mapper/vg_b2b5a36c2feb3055e703c45e3b60a1a1-brick_489cbd8be23cb89eb6b20678310c1386 /var/lib/heketi/mounts/vg_b2b5a36c2feb3055e703c45e3b60a1a1/brick_489cbd8be23cb89eb6b20678310c1386 xfs rw,inode64,noatime,nouuid 1 2" | tee -a /var/lib/heketi/fstab > /dev/null
Result: 
[kubeexec] DEBUG 2017/07/04 06:20:21 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:250: Host: dhcp47-183.lab.eng.blr.redhat.com Pod: glusterfs-2kbwj Command: echo "/dev/mapper/vg_9e5014b09c12f8e9e1283b4e7ab41500-brick_b09bd6d9a5e1044ecad199f75c9524dd /var/lib/heketi/mounts/vg_9e5014b09c12f8e9e1283b4e7ab41500/brick_b09bd6d9a5e1044ecad199f75c9524dd xfs rw,inode64,noatime,nouuid 1 2" | tee -a /var/lib/heketi/fstab > /dev/null
Result: 
[kubeexec] DEBUG 2017/07/04 06:20:21 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:250: Host: dhcp47-174.lab.eng.blr.redhat.com Pod: glusterfs-qmxxb Command: echo "/dev/mapper/vg_34c90ca30ba53f9c735a7329f406adab-brick_e2c4b3d010ad422dac773dc8b64449bb /var/lib/heketi/mounts/vg_34c90ca30ba53f9c735a7329f406adab/brick_e2c4b3d010ad422dac773dc8b64449bb xfs rw,inode64,noatime,nouuid 1 2" | tee -a /var/lib/heketi/fstab > /dev/null
Result: 
[kubeexec] DEBUG 2017/07/04 06:20:21 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:250: Host: dhcp47-183.lab.eng.blr.redhat.com Pod: glusterfs-2kbwj Command: mount -o rw,inode64,noatime,nouuid /dev/mapper/vg_9e5014b09c12f8e9e1283b4e7ab41500-brick_b09bd6d9a5e1044ecad199f75c9524dd /var/lib/heketi/mounts/vg_9e5014b09c12f8e9e1283b4e7ab41500/brick_b09bd6d9a5e1044ecad199f75c9524dd
Result: 
[kubeexec] DEBUG 2017/07/04 06:20:21 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:250: Host: dhcp46-133.lab.eng.blr.redhat.com Pod: glusterfs-9kqbh Command: mount -o rw,inode64,noatime,nouuid /dev/mapper/vg_b2b5a36c2feb3055e703c45e3b60a1a1-brick_489cbd8be23cb89eb6b20678310c1386 /var/lib/heketi/mounts/vg_b2b5a36c2feb3055e703c45e3b60a1a1/brick_489cbd8be23cb89eb6b20678310c1386
Result: 
[negroni] Started GET /queue/c3ee227ae056c768cf393f85f37b1030
[negroni] Completed 200 OK in 44.729µs
[kubeexec] DEBUG 2017/07/04 06:20:21 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:250: Host: dhcp47-174.lab.eng.blr.redhat.com Pod: glusterfs-qmxxb Command: mount -o rw,inode64,noatime,nouuid /dev/mapper/vg_34c90ca30ba53f9c735a7329f406adab-brick_e2c4b3d010ad422dac773dc8b64449bb /var/lib/heketi/mounts/vg_34c90ca30ba53f9c735a7329f406adab/brick_e2c4b3d010ad422dac773dc8b64449bb
Result: 
[kubeexec] DEBUG 2017/07/04 06:20:21 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:250: Host: dhcp46-133.lab.eng.blr.redhat.com Pod: glusterfs-9kqbh Command: mkdir /var/lib/heketi/mounts/vg_b2b5a36c2feb3055e703c45e3b60a1a1/brick_489cbd8be23cb89eb6b20678310c1386/brick
Result: 
[kubeexec] DEBUG 2017/07/04 06:20:21 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:250: Host: dhcp47-183.lab.eng.blr.redhat.com Pod: glusterfs-2kbwj Command: mkdir /var/lib/heketi/mounts/vg_9e5014b09c12f8e9e1283b4e7ab41500/brick_b09bd6d9a5e1044ecad199f75c9524dd/brick
Result: 
[kubeexec] DEBUG 2017/07/04 06:20:21 /src/github.com/heketi/heketi/executors/kubeexec/kubeexec.go:250: Host: dhcp47-174.lab.eng.blr.redhat.com Pod: glusterfs-qmxxb Command: mkdir /var/lib/heketi/mounts/vg_34c90ca30ba53f9c735a7329f406adab/brick_e2c4b3d010ad422dac773dc8b64449bb/brick
Result: 
[sshexec] INFO 2017/07/04 06:20:21 Creating volume heketidbstorage replica 3
panic: REQUIRE:
	func (github.com/heketi/heketi/executors/sshexec.(*SshExecutor).createVolumeOptionsCommand) 0x75b4c4
	File /builddir/build/BUILD/heketi-5.0.0/src/github.com/heketi/heketi/executors/sshexec/volume.go:161
goroutine 86 [running]:
github.com/lpabon/godbc.dbc_panic(0x180e11e, 0x7, 0x0, 0x0, 0x0, 0x0)
	/builddir/build/BUILD/heketi-5.0.0/src/github.com/lpabon/godbc/godbc.go:85 +0x2f4
github.com/lpabon/godbc.Require(0x7f13cf62f000, 0x0, 0x0, 0x0)
	/builddir/build/BUILD/heketi-5.0.0/src/github.com/lpabon/godbc/godbc.go:100 +0x5d
github.com/heketi/heketi/executors/sshexec.(*SshExecutor).createVolumeOptionsCommand(0xc4200a0630, 0xc4205aaae0, 0x1, 0x1, 0xc4201fe7d0)
	/builddir/build/BUILD/heketi-5.0.0/src/github.com/heketi/heketi/executors/sshexec/volume.go:161 +0x64
github.com/heketi/heketi/executors/sshexec.(*SshExecutor).VolumeCreate(0xc4200a0630, 0xc420215500, 0x21, 0xc4205aaae0, 0x0, 0x0, 0x0)
	/builddir/build/BUILD/heketi-5.0.0/src/github.com/heketi/heketi/executors/sshexec/volume.go:65 +0x4f2
github.com/heketi/heketi/apps/glusterfs.(*VolumeEntry).createVolume(0xc4200d4140, 0xc4202b7c20, 0x22fefe0, 0xc4200a0630, 0xc42016b440, 0x3, 0x4, 0x3, 0x4)
	/builddir/build/BUILD/heketi-5.0.0/src/github.com/heketi/heketi/apps/glusterfs/volume_entry_create.go:37 +0x126
github.com/heketi/heketi/apps/glusterfs.(*VolumeEntry).Create(0xc4200d4140, 0xc4202b7c20, 0x22fefe0, 0xc4200a0630, 0x22f9560, 0xc420113060, 0x0, 0x0)
	/builddir/build/BUILD/heketi-5.0.0/src/github.com/heketi/heketi/apps/glusterfs/volume_entry.go:387 +0xa94
github.com/heketi/heketi/apps/glusterfs.(*App).VolumeCreate.func2(0xed0ed2aa4, 0x25b8622e, 0x2357ee0, 0xc4204ad790)
	/builddir/build/BUILD/heketi-5.0.0/src/github.com/heketi/heketi/apps/glusterfs/app_volume.go:147 +0x11a
github.com/heketi/rest.(*AsyncHttpManager).AsyncHttpRedirectFunc.func1(0xc4202e9bc0, 0xc42018c0a0)
	/builddir/build/BUILD/heketi-5.0.0/src/github.com/heketi/rest/asynchttp.go:128 +0xf4
created by github.com/heketi/rest.(*AsyncHttpManager).AsyncHttpRedirectFunc
	/builddir/build/BUILD/heketi-5.0.0/src/github.com/heketi/rest/asynchttp.go:138 +0x60
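
The panic is a godbc precondition (REQUIRE) failure in createVolumeOptionsCommand, hit while heketi was assembling the volume-create step. For context, that step corresponds roughly to the following gluster CLI call, assembled here from the brick paths and storage hostnames above (heketi builds the exact command internally):

    gluster volume create heketidbstorage replica 3 \
        10.70.47.174:/var/lib/heketi/mounts/vg_34c90ca30ba53f9c735a7329f406adab/brick_e2c4b3d010ad422dac773dc8b64449bb/brick \
        10.70.47.183:/var/lib/heketi/mounts/vg_9e5014b09c12f8e9e1283b4e7ab41500/brick_b09bd6d9a5e1044ecad199f75c9524dd/brick \
        10.70.46.133:/var/lib/heketi/mounts/vg_b2b5a36c2feb3055e703c45e3b60a1a1/brick_489cbd8be23cb89eb6b20678310c1386/brick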

Comment 6 Raghavendra Talur 2017-07-04 06:38:18 UTC
Seth and Derek,

The output of the command this time seems to be different from what we got in the first iteration.

1st time: Id not found
2nd time: panic in heketi


To answer other questions:
There is no load balancer; the heketi pod does not have replicas.
The storage router is set up so that the client node can send commands to heketi via heketi-cli and such.


I am interested to know how you figured out which line in cns-deploy was failing based on the attached logs; I could not find any reference to heketi-cli being used.


IMO, the original "Id not found" error would not have happened unless the heketi pod got an error for one of the kubeexec commands it runs. The heketi pod talks to the gluster pods using kubeexec. We were thinking the proxy errors are related to that; hence the question.
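
(As an illustration, kubeexec executes each logged command inside a gluster pod much as one would manually with oc exec; a sketch using one of the calls from comment 5:)

    oc exec -n storage-project glusterfs-9kqbh -- mkdir -p /var/lib/heketi/mounts/vg_b2b5a36c2feb3055e703c45e3b60a1a1/brick_489cbd8be23cb89eb6b20678310c1386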

Comment 8 Humble Chirammal 2017-07-04 07:40:21 UTC
(In reply to Seth Jennings from comment #3)
> So it is failing here:
> 
> https://github.com/gluster/gluster-kubernetes/blob/master/deploy/gk-deploy#L575-L579
> 
> Which means this command is failing:
> 
> heketi-cli -s http://${heketi_service} --user admin --secret \"${ADMIN_KEY}\" setup-openshift-heketi-storage
> 
> What is the output of that command (with ADMIN_KEY filled in)?


Seth and Derek, Thanks for your input and for confirming that 'proxy errors are red herring'.

This is failing on the above-mentioned heketi-cli command, but it's not specific to "ADMIN_KEY".

> What is "storage-project-router-1-ndqth"? Is it an LB in front of the heketi
> server? It has restarted 4 times and seems to be having issues.

More or less, yes; we could say we have an LB (route) in front of the heketi server in the OpenShift context. Looking at the current input, it seems that something is going wrong with the heketi service/pod.

One question I have here: will the router get restarted if there is anything wrong with the associated service/pod exposing the service?
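
One way to check why the router restarted would be along these lines (sketch):

    oc describe pod storage-project-router-1-ndqth
    oc logs --previous storage-project-router-1-ndqth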

Comment 13 Apeksha 2017-07-04 10:00:53 UTC
I have pasted the deploy-heketi pod logs in comment 5 - https://bugzilla.redhat.com/show_bug.cgi?id=1467318#c5

Comment 14 Apeksha 2017-07-05 06:20:22 UTC
Not hitting this issue in the new build. 


oadm version-
oadm v3.6.133
kubernetes v1.6.1+5115d708d7

Server https://dhcp46-24.lab.eng.blr.redhat.com:8443
openshift v3.6.133
kubernetes v1.6.1+5115d708d7

oc get nodes-
NAME                                STATUS                     AGE       VERSION
dhcp46-133.lab.eng.blr.redhat.com   Ready                      41m       v1.6.1+5115d708d7
dhcp46-24.lab.eng.blr.redhat.com    Ready,SchedulingDisabled   41m       v1.6.1+5115d708d7
dhcp47-174.lab.eng.blr.redhat.com   Ready                      41m       v1.6.1+5115d708d7
dhcp47-183.lab.eng.blr.redhat.com   Ready                      41m       v1.6.1+5115d708d7

oc get pods-
NAME                             READY     STATUS    RESTARTS   AGE
glusterfs-8x00s                  1/1       Running   0          19m
glusterfs-9px1b                  1/1       Running   0          19m
glusterfs-mhq14                  1/1       Running   0          19m
heketi-1-cj2t4                   1/1       Running   0          14m
storage-project-router-1-qxs8h   1/1       Running   0          24m

cns-deploy version- cns-deploy-5.0.0-5.el7rhgs.x86_64
heketi-client version- heketi-client-5.0.0-3.el7rhgs.x86_64
docker-client version- docker-client-1.12.6-32.git88a4867.el7.x86_64
rhgs-server-id- rhgs3/rhgs-server-rhel7:3.3.0-6
rhgs-volmanager-id- rhgs3/rhgs-volmanager-rhel7:3.3.0-3

Comment 15 errata-xmlrpc 2017-10-11 07:07:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:2879