Bug 1613260 - CNS: heketi pod not inready state - Liveness probe failed
Summary: CNS: heketi pod not inready state - Liveness probe failed
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: heketi
Version: cns-3.9
Hardware: x86_64
OS: Linux
Target Milestone: ---
: CNS 3.10
Assignee: John Mulligan
QA Contact: Apeksha
Depends On:
Blocks: 1568862
TreeView+ depends on / blocked
Reported: 2018-08-07 10:30 UTC by Apeksha
Modified: 2018-09-12 09:24 UTC (History)
11 users (show)

Fixed In Version: heketi-7.0.0-6.el7rhgs
Doc Type: Bug Fix
Doc Text:
Previously, the heketi startup script for the container image used a blocking file lock with a time out. This could cause issues when the heketi db is stored on gluster volumes. With this update, the startup script now polls for db availability with a non-blocking file lock.
Clone Of:
Last Closed: 2018-09-12 09:23:51 UTC
Target Upstream Version:

Attachments (Terms of Use)
State dumps (15.41 KB, application/x-xz)
2018-08-07 18:23 UTC, John Mulligan
no flags Details

System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1594623 0 high CLOSED RFE: implement fuse-interrupt support in glusterfs 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHEA-2018:2686 0 None None None 2018-09-12 09:24:40 UTC

Internal Links: 1594623

Description Apeksha 2018-08-07 10:30:02 UTC
Description of problem:
CNS: heketi pod not inready state -  Liveness probe failed

Version-Release number of selected component (if applicable):

How reproducible: Once

Steps to Reproduce:
1. Ran the CI -automated cases
 Setup state after running automated cases:
[root@ap-v310-live-master-0 ~]# oc get pods
NAME                                      READY     STATUS    RESTARTS   AGE
glusterblock-cns-provisioner-dc-1-rb8d2   1/1       Running   0          12h
glusterfs-cns-42pdv                       1/1       Running   0          12h
glusterfs-cns-bhwkl                       1/1       Running   1          10h
glusterfs-cns-x7wz2                       1/1       Running   0          12h
heketi-cns-1-fzf74                        1/1       Running   0          12h

2. added 1 new device:
root@ap-v310-live-master-0 ~]# heketi-cli -s device add  --name=/dev/sde --node=2e57d741b1f98ee7fff28c74ed5cc195 --user=admin --secret=admin 
Device added successfully
[root@ap-v310-live-master-0 ~]# heketi-cli -s node info 2e57d741b1f98ee7fff28c74ed5cc195 --user=admin --secret=admin 
Node Id: 2e57d741b1f98ee7fff28c74ed5cc195
State: online
Cluster Id: 81b4d3bb8f886a694314345153ebe135
Zone: 46153
Management Hostname: ap-v310-live-app-cns-1
Storage Hostname:
Id:43606302cc79f2a7abcd5830f9624d0f   Name:/dev/sde            State:online    Size (GiB):599     Used (GiB):0       Free (GiB):599     
Id:5139384d7d59ce5eb415120c3d409321   Name:/dev/sdd            State:online    Size (GiB):99      Used (GiB):26      Free (GiB):73      

[root@ap-v310-live-master-0 ~]# heketi-cli -s topology info --user=admin --secret=admin | grep Used
		Id:43606302cc79f2a7abcd5830f9624d0f   Name:/dev/sde            State:online    Size (GiB):599     Used (GiB):1       Free (GiB):598     
		Id:5139384d7d59ce5eb415120c3d409321   Name:/dev/sdd            State:online    Size (GiB):99      Used (GiB):27      Free (GiB):72      
		Id:665770e549fe00beb7cd6ae3aa1e210e   Name:/dev/sde            State:online    Size (GiB):599     Used (GiB):20      Free (GiB):579     
		Id:ce445f73a3a9451fdf8b97156fab1cc8   Name:/dev/sdd            State:online    Size (GiB):99      Used (GiB):6       Free (GiB):93      
		Id:a8644538ea56df20eeeef6c3e841bfd4   Name:/dev/sde            State:online    Size (GiB):599     Used (GiB):16      Free (GiB):583     
		Id:e9e5f6194454dac0973225d22a93ee59   Name:/dev/sdd            State:online    Size (GiB):99      Used (GiB):10      Free (GiB):89      

3. started deleting heketi volumes.

[root@ap-v310-live-master-0 ~]# heketi-cli -s volume list --user=admin --secret=admin 
Id:00a00d449b9c36eb8686c36d3cb2005c    Cluster:81b4d3bb8f886a694314345153ebe135    Name:vol_00a00d449b9c36eb8686c36d3cb2005c
Id:12ec7120fe44f629bc9ee0fb0d7e48de    Cluster:81b4d3bb8f886a694314345153ebe135    Name:vol_12ec7120fe44f629bc9ee0fb0d7e48de
Id:3f013706010308b612ba45ec367db640    Cluster:81b4d3bb8f886a694314345153ebe135    Name:vol_3f013706010308b612ba45ec367db640
Id:47039b0b2701a16b86a61374a2585aa8    Cluster:81b4d3bb8f886a694314345153ebe135    Name:heketidbstorage
Id:62104fbaf138a2dcf6e72b595fc36ea7    Cluster:81b4d3bb8f886a694314345153ebe135    Name:vol_62104fbaf138a2dcf6e72b595fc36ea7
Id:91c3852ecf63a96683afedabc8e1c24f    Cluster:81b4d3bb8f886a694314345153ebe135    Name:vol_91c3852ecf63a96683afedabc8e1c24f
Id:9555959d1b86b7638adf91a0d0160c51    Cluster:81b4d3bb8f886a694314345153ebe135    Name:vol_9555959d1b86b7638adf91a0d0160c51
Id:9d016cf374bf03899168ad40f7503991    Cluster:81b4d3bb8f886a694314345153ebe135    Name:vol_9d016cf374bf03899168ad40f7503991
Id:a7399e6ae979482549569999a27bc436    Cluster:81b4d3bb8f886a694314345153ebe135    Name:cns-vol_cns_mongodb5_c1e4fa4a-99bc-11e8-b9e8-005056a589a3
Id:b35201645ff2fe12488eb598d364c77c    Cluster:81b4d3bb8f886a694314345153ebe135    Name:vol_b35201645ff2fe12488eb598d364c77c
Id:c759c9744f45bbd07514f8fd7947a9d2    Cluster:81b4d3bb8f886a694314345153ebe135    Name:vol_c759c9744f45bbd07514f8fd7947a9d2
Id:e3f2b3d96cb277c441cff2a581895e47    Cluster:81b4d3bb8f886a694314345153ebe135    Name:rally-vo6i2ntq9hjb36
Id:eada2d1f7afbae686dcd84768b0be0f4    Cluster:81b4d3bb8f886a694314345153ebe135    Name:vol_eada2d1f7afbae686dcd84768b0be0f4
Id:f1ab2878afabe1fc5a6c20cd5a8f0bb1    Cluster:81b4d3bb8f886a694314345153ebe135    Name:vol_f1ab2878afabe1fc5a6c20cd5a8f0bb1

[root@ap-v310-live-master-0 ~]# oc rsh glusterfs-cns-42pdv gluster v list

[root@ap-v310-live-master-0 ~]# heketi-cli -s volume delete 3f013706010308b612ba45ec367db640 --user=admin --secret=admin 
Volume 3f013706010308b612ba45ec367db640 deleted
[root@ap-v310-live-master-0 ~]# heketi-cli -s volume delete 91c3852ecf63a96683afedabc8e1c24f --user=admin --secret=admin 
Volume 91c3852ecf63a96683afedabc8e1c24f deleted
[root@ap-v310-live-master-0 ~]# heketi-cli -s volume delete 9555959d1b86b7638adf91a0d0160c51 --user=admin --secret=admin 
^[[AVolume 9555959d1b86b7638adf91a0d0160c51 deleted
[root@ap-v310-live-master-0 ~]# heketi-cli -s volume delete 9d016cf374bf03899168ad40f7503991 --user=admin --secret=admin 
Volume 9d016cf374bf03899168ad40f7503991 deleted
[root@ap-v310-live-master-0 ~]# heketi-cli -s volume delete b35201645ff2fe12488eb598d364c77c --user=admin --secret=admin 
Volume b35201645ff2fe12488eb598d364c77c deleted
[root@ap-v310-live-master-0 ~]# heketi-cli -s volume delete c759c9744f45bbd07514f8fd7947a9d2 --user=admin --secret=admin 
Volume c759c9744f45bbd07514f8fd7947a9d2 deleted
[root@ap-v310-live-master-0 ~]# heketi-cli -s volume delete cb0d756bbc21175603495d33e76e581e --user=admin --secret=admin 
Volume cb0d756bbc21175603495d33e76e581e deleted
[root@ap-v310-live-master-0 ~]# heketi-cli -s volume delete eada2d1f7afbae686dcd84768b0be0f4 --user=admin --secret=admin 
Volume eada2d1f7afbae686dcd84768b0be0f4 deleted
root@ap-v310-live-master-0 ~]# heketi-cli -s volume delete f1ab2878afabe1fc5a6c20cd5a8f0bb1 --user=admin --secret=admin 
Error: Get dial tcp getsockopt: connection refused

[root@ap-v310-live-master-0 ~]# heketi-cli -s volume list --user=admin --secret=admin 
Error: Get dial tcp getsockopt: connection refused

[root@ap-v310-live-master-0 ~]# curl
curl: (7) Failed connect to; Connection refused

[root@ap-v310-live-master-0 ~]# oc get svc
NAME                                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
glusterfs-dynamic-mongodb5                          ClusterIP    <none>        1/TCP      12h
glusterfs-dynamic-volumes-concurrently-7pxfa400-1   ClusterIP   <none>        1/TCP      12h
heketi-cns                                          ClusterIP   8080/TCP   12h
heketi-db-cns-endpoints                             ClusterIP      <none>        1/TCP      12h
[root@ap-v310-live-master-0 ~]# curl
curl: (7) Failed connect to; Connection refused

[root@ap-v310-live-master-0 ~]# oc get pods
NAME                                      READY     STATUS    RESTARTS   AGE
glusterblock-cns-provisioner-dc-1-rb8d2   1/1       Running   0          12h
glusterfs-cns-42pdv                       1/1       Running   0          13h
glusterfs-cns-bhwkl                       1/1       Running   1          11h
glusterfs-cns-x7wz2                       1/1       Running   0          12h
heketi-cns-1-fzf74                        0/1       Running   1          12h
[root@ap-v310-live-master-0 ~]# oc describe pod heketi-cns-1-fzf74 
Name:           heketi-cns-1-fzf74
Namespace:      cns
Node:           ap-v310-live-app-cns-2/
Start Time:     Mon, 06 Aug 2018 21:06:23 +0000
Labels:         deployment=heketi-cns-1
Annotations:    openshift.io/deployment-config.latest-version=1
Status:         Running
Controlled By:  ReplicationController/heketi-cns-1
    Container ID:   docker://59735747aadff9384586c0c0479724289a3c12a435914053e943e2ac59c0666b
    Image:          rhgs3/rhgs-volmanager-rhel7:latest
    Image ID:       docker-pullable://registry.access.redhat.com/rhgs3/rhgs-volmanager-rhel7@sha256:a7b31e1515ef5f6eacaf10c5fc9cc0247695b6382e535753f21294d73f874db0
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Tue, 07 Aug 2018 09:46:08 +0000
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Mon, 06 Aug 2018 21:06:26 +0000
      Finished:     Tue, 07 Aug 2018 09:46:07 +0000
    Ready:          False
    Restart Count:  1
    Liveness:       http-get http://:8080/hello delay=30s timeout=3s period=10s #success=1 #failure=3
    Readiness:      http-get http://:8080/hello delay=3s timeout=3s period=10s #success=1 #failure=3
      HEKETI_USER_KEY:                 user
      HEKETI_ADMIN_KEY:                admin
      HEKETI_EXECUTOR:                 kubernetes
      HEKETI_FSTAB:                    /var/lib/heketi/fstab
      HEKETI_SNAPSHOT_LIMIT:           14
      /etc/heketi from config (rw)
      /var/lib/heketi from db (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from heketi-cns-service-account-token-x584g (ro)
  Type           Status
  Initialized    True 
  Ready          False 
  PodScheduled   True 
    Type:           Glusterfs (a Glusterfs mount on the host that shares a pod's lifetime)
    EndpointsName:  heketi-db-cns-endpoints
    Path:           heketidbstorage
    ReadOnly:       false
    Type:        Secret (a volume populated by a Secret)
    SecretName:  heketi-cns-config-secret
    Optional:    false
    Type:        Secret (a volume populated by a Secret)
    SecretName:  heketi-cns-service-account-token-x584g
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     <none>
  Type     Reason     Age               From                             Message
  ----     ------     ----              ----                             -------
  Warning  Unhealthy  2m (x2 over 2m)   kubelet, ap-v310-live-app-cns-2  Readiness probe failed: Get net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  2m (x3 over 2m)   kubelet, ap-v310-live-app-cns-2  Liveness probe failed: Get net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Normal   Pulled     1m (x2 over 12h)  kubelet, ap-v310-live-app-cns-2  Container image "rhgs3/rhgs-volmanager-rhel7:latest" already present on machine
  Normal   Created    1m (x2 over 12h)  kubelet, ap-v310-live-app-cns-2  Created container
  Normal   Started    1m (x2 over 12h)  kubelet, ap-v310-live-app-cns-2  Started container
  Normal   Killing    1m                kubelet, ap-v310-live-app-cns-2  Killing container with id docker://heketi:Container failed liveness probe.. Container will be killed and recreated.
  Warning  Unhealthy  56s (x2 over 1m)  kubelet, ap-v310-live-app-cns-2  Liveness probe failed: Get dial tcp getsockopt: connection refused
  Warning  Unhealthy  9s (x10 over 1m)  kubelet, ap-v310-live-app-cns-2  Readiness probe failed: Get dial tcp getsockopt: connection refused

Comment 2 Apeksha 2018-08-07 12:06:41 UTC
since the heketi pod is not in ready state, the heketi logs is empty.(note setup is in same state)
sosreports from all nodes available at - http://rhsqe-repo.lab.eng.blr.redhat.com/cns/bugs/BZ-1613260/

Comment 4 John Mulligan 2018-08-07 18:23:49 UTC
Created attachment 1474074 [details]
State dumps

We susp[ect this is an issue with the heketidbstorage gluster volume not properly handling the timout of a blocking flock.
See also:  https://github.com/gluster/glusterfs/issues/465

We're attaching statedump so that someone on the gluster dev team can confirm.

Comment 5 John Mulligan 2018-08-07 18:26:49 UTC
Talur asks me to ask: do we see an issue here with the lock translator?

Comment 6 John Mulligan 2018-08-07 18:28:04 UTC
In the short term we may need to add a workaround to the Heketi start up script to avoid using a blocking flock, however before we do that we'd like to confirm that this is really the problem.

Comment 7 John Mulligan 2018-08-07 18:30:51 UTC
Node packages: 
[root@ap-v310-live-app-cns-2 ~]# rpm -qa | grep gluster

Pod (server / brick server) packages:
sh-4.2# rpm -qa | grep gluster

Comment 9 Krutika Dhananjay 2018-08-09 05:38:06 UTC
Sorry, I didn't get a chance to take a look at this y'day. Will check and get back.


Comment 10 Krutika Dhananjay 2018-08-09 06:36:40 UTC
I'm not able to completely make sense out of the data yet.

This is what I see in the brick statedump:

posixlk.posixlk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 42678, owner=8124271b980976b3, client=0x7f2de402ead0, connection-id=(null), granted at 2018-08-06 21:06:27

There is one active (granted) flock on /heketi.db acquired by pid 42678 on this brick and this brick happens to be the third brick in this replica set (or heketidbstorage-client-2).

There are no other locks blocked on this file (at least not on this brick).

But the client statedump tells a slightly different story. There is an lk call triggered by fuse mount (pid 42547) that has completed on the first and the second brick but still waiting on the third brick (heketidbstorage-client-2).

1. Are we sure that the affected client to heketidbstorage volume was being accessed through the fuse mount whose pid is 42547?

2. Do we have the statedump from the other two bricks?

Better yet, could you share the setup details if you have it so that I can take a look?


Comment 15 Raghavendra Talur 2018-08-09 16:21:22 UTC
Location of gluster statedumps and how to find them in OCP.

Remember that gluster volumes are mounted on the host node where the application pod is running. So the gluster client process is on the OCP node where app pod is deployed. Also note that the mount process has different log paths. In general

1. for client dumps use find /run -name "*dump*" on the node where gluster client is running
2. for bricks, on the nodes where gluster pods are running use find /var/lib/origin/openshift.local.volumes/pods/ -name "*dump*"
3. remember that gluster brick statedumps won't have the same PID in the filename as what you see on the node. The pid  in filename is the PID of the brick process in the POD namespace

Comment 16 Krutika Dhananjay 2018-08-09 17:09:36 UTC
Thanks Talur for explaining the problem and helping with the statedump!

Here's what I see in the statedump of the first replica of the heketidbstorage volume:

posixlk.posixlk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 15803, owner=28ac2db21d8caa2e, client=0x7f9b5416a130, connection-id=(null), granted at 2018-08-07 09:46:08
posixlk.posixlk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 44289, owner=3254c93825f30313, client=0x7f9b5c6ff600, connection-id=(null), blocked at 2018-08-07 09:53:34
posixlk.posixlk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 22058, owner=0e37f83eae6d24b0, client=0x7f9b5c6ff600, connection-id=(null), blocked at 2018-08-07 15:56:23

There is one active lock and two blocked locks.
As per the statedump, pid 15803 as held on to this lock since August 07 09:46.

This corresponds to the process 15803 on cns2, confirmed by the following info from ps output:

root      15803  0.0  0.0   4356   348 ?        D    Aug07   0:00 flock -w 60 /var/lib/heketi/heketi.db true

The other two flocks are from pids 44589 and 22058 and have been blocked on this since Aug 07 09:53hrs and 15:56hrs respectively.

And ps output on cns0 gives me this:
root      22058  0.0  0.0 107944   640 ?        D    Aug07   0:00 flock -w 5 /var/lib/origin/openshift.local.volumes/pods/b74bdfd0-9a27-11e8-a4bc-005056a589a3/volumes/kubernetes.io~glusterfs/db/heketi.db true                             
root      44289  0.0  0.0   4356   348 ?        D    Aug07   0:00 flock -w 60 /var/lib/heketi/heketi.db true

It does seem like this is because of interrupt on the first application not leading to release of the locks, although I'm not able to tell what's the difference between "/var/lib/origin/openshift.local.volumes/pods/b74bdfd0-9a27-11e8-a4bc-005056a589a3/volumes/kubernetes.io~glusterfs/db/heketi.db" true and "/var/lib/heketi/heketi.db"

Talur, would you know?


Comment 17 Raghavendra Talur 2018-08-09 17:16:00 UTC
Flock command that has -w 5 is a debugging attempt, ignore it. That is probably why the path is different too.

Comment 18 John Mulligan 2018-08-09 17:18:43 UTC
(In reply to Raghavendra Talur from comment #17)
> Flock command that has -w 5 is a debugging attempt, ignore it. That is
> probably why the path is different too.

Indeed. I believe I may have run that command (outside the container but against the same mountpoint) when I first started looking at this issue.

Comment 21 Raghavendra Talur 2018-08-10 18:00:38 UTC
Comment 16 establishes the root cause to be same as described in 1594623. We have a patch in heketi-start.sh script that avoids the bug. Not setting this as a depend on for the same reason.

Comment 24 John Mulligan 2018-08-17 14:51:39 UTC
To verify this bug I believe that the easiest approach is to:
* Mount heketidbstorage somwhere you can manually run flock command
* Stop any running heketi pods
* On your mount, cd to path where the heketi db resides
* Run: 'flock -w 60 heketi.db cat'
* This command should block (waiting on EOF in cat)
* Background the flock command _or_ switch to a 2nd terminal
(At this point you are simulating another "stuck" heketi pod)
* Start the heketi pod and monitor the state of the heketi container

Prior to the fix the command's -w 60 flag indicates that the command should time out after 60 seconds. However, this will not happen and the pod will block starting up.

After the fix, the script will "poll" the lock and time out after 60 seconds with a log message indicating that the db is read only.

* After the test is over terminate the cat command and flock will exit, releasing the lock
* Unmount the heketidbstorage volume
* You may want to reboot any nodes that "stuck" heketi pods are running on before doing anything else with that cluster.

Comment 25 Apeksha 2018-08-21 08:28:09 UTC
In which build is this fixed?

Comment 26 Apeksha 2018-08-23 13:09:29 UTC
Ran the heketi automated cases on build heketi-client-7.0.0-6.el7rhgs.x86_64, dint hit this issue, hence marking it as verified.

Comment 27 Anjana KD 2018-08-30 06:51:11 UTC
Updated doc text in the Doc Text field. Please review for technical accuracy.

Comment 29 Anjana KD 2018-09-07 18:24:37 UTC
Thanks John, I have updated the doc text based on your feedback.

Comment 30 John Mulligan 2018-09-07 19:03:10 UTC
Doc Text looks OK

Comment 32 errata-xmlrpc 2018-09-12 09:23:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.


Note You need to log in before you can comment on or make changes to this bug.