Description of problem:
CNS: heketi pod not in ready state - Liveness probe failed

Version-Release number of selected component (if applicable):
heketi-client-6.0.0-7.4.el7rhgs.x86_64
atomic-openshift-node-3.10.14-1.git.0.ba8ae6d.el7.x86_64

How reproducible:
Once

Steps to Reproduce:

1. Ran the CI automated cases. Setup state after running the automated cases:

[root@ap-v310-live-master-0 ~]# oc get pods
NAME                                      READY     STATUS    RESTARTS   AGE
glusterblock-cns-provisioner-dc-1-rb8d2   1/1       Running   0          12h
glusterfs-cns-42pdv                       1/1       Running   0          12h
glusterfs-cns-bhwkl                       1/1       Running   1          10h
glusterfs-cns-x7wz2                       1/1       Running   0          12h
heketi-cns-1-fzf74                        1/1       Running   0          12h

2. Added one new device:

[root@ap-v310-live-master-0 ~]# heketi-cli -s http://10.70.47.73:8080 device add --name=/dev/sde --node=2e57d741b1f98ee7fff28c74ed5cc195 --user=admin --secret=admin
Device added successfully
[root@ap-v310-live-master-0 ~]# heketi-cli -s http://10.70.47.73:8080 node info 2e57d741b1f98ee7fff28c74ed5cc195 --user=admin --secret=admin
Node Id: 2e57d741b1f98ee7fff28c74ed5cc195
State: online
Cluster Id: 81b4d3bb8f886a694314345153ebe135
Zone: 46153
Management Hostname: ap-v310-live-app-cns-1
Storage Hostname: 10.70.46.153
Devices:
Id:43606302cc79f2a7abcd5830f9624d0f   Name:/dev/sde   State:online   Size (GiB):599   Used (GiB):0    Free (GiB):599
Id:5139384d7d59ce5eb415120c3d409321   Name:/dev/sdd   State:online   Size (GiB):99    Used (GiB):26   Free (GiB):73

[root@ap-v310-live-master-0 ~]# heketi-cli -s http://10.70.47.73:8080 topology info --user=admin --secret=admin | grep Used
Id:43606302cc79f2a7abcd5830f9624d0f   Name:/dev/sde   State:online   Size (GiB):599   Used (GiB):1    Free (GiB):598
Id:5139384d7d59ce5eb415120c3d409321   Name:/dev/sdd   State:online   Size (GiB):99    Used (GiB):27   Free (GiB):72
Id:665770e549fe00beb7cd6ae3aa1e210e   Name:/dev/sde   State:online   Size (GiB):599   Used (GiB):20   Free (GiB):579
Id:ce445f73a3a9451fdf8b97156fab1cc8   Name:/dev/sdd   State:online   Size (GiB):99    Used (GiB):6    Free (GiB):93
Id:a8644538ea56df20eeeef6c3e841bfd4   Name:/dev/sde   State:online   Size (GiB):599   Used (GiB):16   Free (GiB):583
Id:e9e5f6194454dac0973225d22a93ee59   Name:/dev/sdd   State:online   Size (GiB):99    Used (GiB):10   Free (GiB):89

3. Started deleting heketi volumes.
[root@ap-v310-live-master-0 ~]# heketi-cli -s http://10.70.47.73:8080 volume list --user=admin --secret=admin
Id:00a00d449b9c36eb8686c36d3cb2005c Cluster:81b4d3bb8f886a694314345153ebe135 Name:vol_00a00d449b9c36eb8686c36d3cb2005c
Id:12ec7120fe44f629bc9ee0fb0d7e48de Cluster:81b4d3bb8f886a694314345153ebe135 Name:vol_12ec7120fe44f629bc9ee0fb0d7e48de
Id:3f013706010308b612ba45ec367db640 Cluster:81b4d3bb8f886a694314345153ebe135 Name:vol_3f013706010308b612ba45ec367db640
Id:47039b0b2701a16b86a61374a2585aa8 Cluster:81b4d3bb8f886a694314345153ebe135 Name:heketidbstorage
Id:62104fbaf138a2dcf6e72b595fc36ea7 Cluster:81b4d3bb8f886a694314345153ebe135 Name:vol_62104fbaf138a2dcf6e72b595fc36ea7
Id:91c3852ecf63a96683afedabc8e1c24f Cluster:81b4d3bb8f886a694314345153ebe135 Name:vol_91c3852ecf63a96683afedabc8e1c24f
Id:9555959d1b86b7638adf91a0d0160c51 Cluster:81b4d3bb8f886a694314345153ebe135 Name:vol_9555959d1b86b7638adf91a0d0160c51
Id:9d016cf374bf03899168ad40f7503991 Cluster:81b4d3bb8f886a694314345153ebe135 Name:vol_9d016cf374bf03899168ad40f7503991
Id:a7399e6ae979482549569999a27bc436 Cluster:81b4d3bb8f886a694314345153ebe135 Name:cns-vol_cns_mongodb5_c1e4fa4a-99bc-11e8-b9e8-005056a589a3
Id:b35201645ff2fe12488eb598d364c77c Cluster:81b4d3bb8f886a694314345153ebe135 Name:vol_b35201645ff2fe12488eb598d364c77c
Id:c759c9744f45bbd07514f8fd7947a9d2 Cluster:81b4d3bb8f886a694314345153ebe135 Name:vol_c759c9744f45bbd07514f8fd7947a9d2
Id:e3f2b3d96cb277c441cff2a581895e47 Cluster:81b4d3bb8f886a694314345153ebe135 Name:rally-vo6i2ntq9hjb36
Id:eada2d1f7afbae686dcd84768b0be0f4 Cluster:81b4d3bb8f886a694314345153ebe135 Name:vol_eada2d1f7afbae686dcd84768b0be0f4
Id:f1ab2878afabe1fc5a6c20cd5a8f0bb1 Cluster:81b4d3bb8f886a694314345153ebe135 Name:vol_f1ab2878afabe1fc5a6c20cd5a8f0bb1

[root@ap-v310-live-master-0 ~]# oc rsh glusterfs-cns-42pdv gluster v list
cns-vol_cns_mongodb5_c1e4fa4a-99bc-11e8-b9e8-005056a589a3
heketidbstorage
rally-vo6i2ntq9hjb36
vol_00a00d449b9c36eb8686c36d3cb2005c
vol_12ec7120fe44f629bc9ee0fb0d7e48de
vol_3f013706010308b612ba45ec367db640
vol_62104fbaf138a2dcf6e72b595fc36ea7
vol_846af6cc3d2087d7099d9fc01b2cb7e7
vol_91c3852ecf63a96683afedabc8e1c24f
vol_9555959d1b86b7638adf91a0d0160c51
vol_9d016cf374bf03899168ad40f7503991
vol_b35201645ff2fe12488eb598d364c77c
vol_c759c9744f45bbd07514f8fd7947a9d2
vol_eada2d1f7afbae686dcd84768b0be0f4
vol_f1ab2878afabe1fc5a6c20cd5a8f0bb1

[root@ap-v310-live-master-0 ~]# heketi-cli -s http://10.70.47.73:8080 volume delete 3f013706010308b612ba45ec367db640 --user=admin --secret=admin
Volume 3f013706010308b612ba45ec367db640 deleted
[root@ap-v310-live-master-0 ~]# heketi-cli -s http://10.70.47.73:8080 volume delete 91c3852ecf63a96683afedabc8e1c24f --user=admin --secret=admin
Volume 91c3852ecf63a96683afedabc8e1c24f deleted
[root@ap-v310-live-master-0 ~]# heketi-cli -s http://10.70.47.73:8080 volume delete 9555959d1b86b7638adf91a0d0160c51 --user=admin --secret=admin
Volume 9555959d1b86b7638adf91a0d0160c51 deleted
[root@ap-v310-live-master-0 ~]# heketi-cli -s http://10.70.47.73:8080 volume delete 9d016cf374bf03899168ad40f7503991 --user=admin --secret=admin
Volume 9d016cf374bf03899168ad40f7503991 deleted
[root@ap-v310-live-master-0 ~]# heketi-cli -s http://10.70.47.73:8080 volume delete b35201645ff2fe12488eb598d364c77c --user=admin --secret=admin
Volume b35201645ff2fe12488eb598d364c77c deleted
[root@ap-v310-live-master-0 ~]# heketi-cli -s http://10.70.47.73:8080 volume delete c759c9744f45bbd07514f8fd7947a9d2 --user=admin --secret=admin
Volume c759c9744f45bbd07514f8fd7947a9d2 deleted
[root@ap-v310-live-master-0 ~]# heketi-cli -s http://10.70.47.73:8080 volume delete cb0d756bbc21175603495d33e76e581e --user=admin --secret=admin
Volume cb0d756bbc21175603495d33e76e581e deleted
[root@ap-v310-live-master-0 ~]# heketi-cli -s http://10.70.47.73:8080 volume delete eada2d1f7afbae686dcd84768b0be0f4 --user=admin --secret=admin
Volume eada2d1f7afbae686dcd84768b0be0f4 deleted
[root@ap-v310-live-master-0 ~]# heketi-cli -s http://10.70.47.73:8080 volume delete f1ab2878afabe1fc5a6c20cd5a8f0bb1 --user=admin --secret=admin
Error: Get http://10.70.47.73:8080/queue/7de85cc50a4abdfdebb4bf7850ae1c28: dial tcp 10.70.47.73:8080: getsockopt: connection refused
[root@ap-v310-live-master-0 ~]# heketi-cli -s http://10.70.47.73:8080 volume list --user=admin --secret=admin
Error: Get http://10.70.47.73:8080/volumes: dial tcp 10.70.47.73:8080: getsockopt: connection refused
[root@ap-v310-live-master-0 ~]# curl http://10.70.47.73:8080/hello
curl: (7) Failed connect to 10.70.47.73:8080; Connection refused

[root@ap-v310-live-master-0 ~]# oc get svc
NAME                                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
glusterfs-dynamic-mongodb5                          ClusterIP   172.30.44.184    <none>        1/TCP      12h
glusterfs-dynamic-volumes-concurrently-7pxfa400-1   ClusterIP   172.30.157.215   <none>        1/TCP      12h
heketi-cns                                          ClusterIP   172.30.164.147   10.70.47.73   8080/TCP   12h
heketi-db-cns-endpoints                             ClusterIP   172.30.1.71      <none>        1/TCP      12h

[root@ap-v310-live-master-0 ~]# curl http://172.30.164.147:8080/hello
curl: (7) Failed connect to 172.30.164.147:8080; Connection refused

[root@ap-v310-live-master-0 ~]# oc get pods
NAME                                      READY     STATUS    RESTARTS   AGE
glusterblock-cns-provisioner-dc-1-rb8d2   1/1       Running   0          12h
glusterfs-cns-42pdv                       1/1       Running   0          13h
glusterfs-cns-bhwkl                       1/1       Running   1          11h
glusterfs-cns-x7wz2                       1/1       Running   0          12h
heketi-cns-1-fzf74                        0/1       Running   1          12h

[root@ap-v310-live-master-0 ~]# oc describe pod heketi-cns-1-fzf74
Name:           heketi-cns-1-fzf74
Namespace:      cns
Node:           ap-v310-live-app-cns-2/10.70.46.206
Start Time:     Mon, 06 Aug 2018 21:06:23 +0000
Labels:         deployment=heketi-cns-1
                deploymentconfig=heketi-cns
                glusterfs=heketi-cns-pod
                heketi=cns-pod
Annotations:    openshift.io/deployment-config.latest-version=1
                openshift.io/deployment-config.name=heketi-cns
                openshift.io/deployment.name=heketi-cns-1
                openshift.io/scc=privileged
Status:         Running
IP:             172.16.8.7
Controlled By:  ReplicationController/heketi-cns-1
Containers:
  heketi:
    Container ID:   docker://59735747aadff9384586c0c0479724289a3c12a435914053e943e2ac59c0666b
    Image:          rhgs3/rhgs-volmanager-rhel7:latest
    Image ID:       docker-pullable://registry.access.redhat.com/rhgs3/rhgs-volmanager-rhel7@sha256:a7b31e1515ef5f6eacaf10c5fc9cc0247695b6382e535753f21294d73f874db0
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Tue, 07 Aug 2018 09:46:08 +0000
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Mon, 06 Aug 2018 21:06:26 +0000
      Finished:     Tue, 07 Aug 2018 09:46:07 +0000
    Ready:          False
    Restart Count:  1
    Liveness:       http-get http://:8080/hello delay=30s timeout=3s period=10s #success=1 #failure=3
    Readiness:      http-get http://:8080/hello delay=3s timeout=3s period=10s #success=1 #failure=3
    Environment:
      HEKETI_USER_KEY:                 user
      HEKETI_ADMIN_KEY:                admin
      HEKETI_EXECUTOR:                 kubernetes
      HEKETI_FSTAB:                    /var/lib/heketi/fstab
      HEKETI_SNAPSHOT_LIMIT:           14
      HEKETI_KUBE_GLUSTER_DAEMONSET:   1
      HEKETI_IGNORE_STALE_OPERATIONS:  true
    Mounts:
      /etc/heketi from config (rw)
      /var/lib/heketi from db (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from heketi-cns-service-account-token-x584g (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  db:
    Type:           Glusterfs (a Glusterfs mount on the host that shares a pod's lifetime)
    EndpointsName:  heketi-db-cns-endpoints
    Path:           heketidbstorage
    ReadOnly:       false
  config:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  heketi-cns-config-secret
    Optional:    false
  heketi-cns-service-account-token-x584g:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  heketi-cns-service-account-token-x584g
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     <none>
Events:
  Type     Reason     Age                From                             Message
  ----     ------     ----               ----                             -------
  Warning  Unhealthy  2m (x2 over 2m)    kubelet, ap-v310-live-app-cns-2  Readiness probe failed: Get http://172.16.8.7:8080/hello: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  2m (x3 over 2m)    kubelet, ap-v310-live-app-cns-2  Liveness probe failed: Get http://172.16.8.7:8080/hello: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Normal   Pulled     1m (x2 over 12h)   kubelet, ap-v310-live-app-cns-2  Container image "rhgs3/rhgs-volmanager-rhel7:latest" already present on machine
  Normal   Created    1m (x2 over 12h)   kubelet, ap-v310-live-app-cns-2  Created container
  Normal   Started    1m (x2 over 12h)   kubelet, ap-v310-live-app-cns-2  Started container
  Normal   Killing    1m                 kubelet, ap-v310-live-app-cns-2  Killing container with id docker://heketi:Container failed liveness probe.. Container will be killed and recreated.
  Warning  Unhealthy  56s (x2 over 1m)   kubelet, ap-v310-live-app-cns-2  Liveness probe failed: Get http://172.16.8.7:8080/hello: dial tcp 172.16.8.7:8080: getsockopt: connection refused
  Warning  Unhealthy  9s (x10 over 1m)   kubelet, ap-v310-live-app-cns-2  Readiness probe failed: Get http://172.16.8.7:8080/hello: dial tcp 172.16.8.7:8080: getsockopt: connection refused
Since the heketi pod is not in a ready state, the heketi log is empty. (Note: the setup is still in the same state.)

Sosreports from all nodes are available at http://rhsqe-repo.lab.eng.blr.redhat.com/cns/bugs/BZ-1613260/
Created attachment 1474074 [details]
State dumps

We suspect this is an issue with the heketidbstorage gluster volume not properly handling the timeout of a blocking flock. See also: https://github.com/gluster/glusterfs/issues/465

We're attaching statedumps so that someone on the gluster dev team can confirm.
Talur asks me to ask: do we see an issue here with the lock translator?
In the short term we may need to add a workaround to the Heketi startup script to avoid using a blocking flock; however, before we do that we'd like to confirm that this is really the problem.
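For illustration only, the kind of workaround we have in mind is replacing the single blocking flock in heketi-start.sh with a non-blocking flock retried in a loop, so that a lock leaked by a dead client cannot wedge startup forever. This is a minimal sketch, not the actual patch; the db path and the heketi invocation are placeholders:

    HEKETI_DB=/var/lib/heketi/heketi.db   # placeholder path
    for i in $(seq 1 60); do
        # flock -n fails immediately instead of sleeping in the kernel
        if flock -n "${HEKETI_DB}" true; then
            exec /usr/bin/heketi --config=/etc/heketi/heketi.json
        fi
        sleep 1
    done
    echo "heketi.db still locked after 60s; continuing with read-only db" >&2
    exec /usr/bin/heketi --config=/etc/heketi/heketi.json

The key difference from 'flock -w 60' is that each attempt returns to user space immediately, so the script never issues the blocking lock request that gluster appears to mishandle.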
Node packages:

[root@ap-v310-live-app-cns-2 ~]# rpm -qa | grep gluster
glusterfs-3.8.4-53.el7.x86_64
glusterfs-fuse-3.8.4-53.el7.x86_64
glusterfs-libs-3.8.4-53.el7.x86_64
glusterfs-client-xlators-3.8.4-53.el7.x86_64

Pod (server / brick server) packages:

sh-4.2# rpm -qa | grep gluster
glusterfs-libs-3.8.4-54.10.el7rhgs.x86_64
glusterfs-3.8.4-54.10.el7rhgs.x86_64
glusterfs-api-3.8.4-54.10.el7rhgs.x86_64
glusterfs-server-3.8.4-54.10.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-54.10.el7rhgs.x86_64
glusterfs-cli-3.8.4-54.10.el7rhgs.x86_64
glusterfs-fuse-3.8.4-54.10.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-54.10.el7rhgs.x86_64
gluster-block-0.2.1-14.1.el7rhgs.x86_64
Sorry, I didn't get a chance to take a look at this y'day. Will check and get back. -Krutika
I'm not able to completely make sense of the data yet. This is what I see in the brick statedump:

<snip>
...
...
[xlator.features.locks.heketidbstorage-locks.inode]
path=/heketi.db
mandatory=0
posixlk-count=1
posixlk.posixlk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 42678, owner=8124271b980976b3, client=0x7f2de402ead0, connection-id=(null), granted at 2018-08-06 21:06:27
...
...
</snip>

There is one active (granted) flock on /heketi.db, acquired by pid 42678 on this brick, and this brick happens to be the third brick in this replica set (heketidbstorage-client-2). There are no other locks blocked on this file (at least not on this brick).

But the client statedump tells a slightly different story. There is an lk call triggered by the fuse mount (pid 42547) that has completed on the first and second bricks but is still waiting on the third brick (heketidbstorage-client-2).

1. Are we sure the heketidbstorage volume on the affected client was being accessed through the fuse mount whose pid is 42547?
2. Do we have the statedumps from the other two bricks? Better yet, could you share the setup details if you still have it, so that I can take a look?

-Krutika
Location of gluster statedumps and how to find them in OCP:

Remember that gluster volumes are mounted on the host node where the application pod is running, so the gluster client process is on the OCP node where the app pod is deployed. Also note that the mount process has different log paths.

In general:

1. For client dumps, use find /run -name "*dump*" on the node where the gluster client is running.
2. For bricks, on the nodes where the gluster pods are running, use find /var/lib/origin/openshift.local.volumes/pods/ -name "*dump*".
3. Remember that gluster brick statedumps won't have the same PID in the filename as what you see on the node; the PID in the filename is the PID of the brick process in the pod's namespace.
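If fresh dumps are needed, one way to generate them is sketched below. This assumes the default dump directory (/var/run/gluster) and that the fuse client's command line contains the volume name; adjust to your setup:

    # brick side: run inside one of the gluster pods
    gluster volume statedump heketidbstorage

    # client side: on the OCP node hosting the app pod, signal the fuse client
    kill -USR1 "$(pgrep -f 'glusterfs.*heketidbstorage' | head -1)"

    # then locate the dumps as described in points 1 and 2 above
    find /run -name "*dump*"
    find /var/lib/origin/openshift.local.volumes/pods/ -name "*dump*"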
Thanks Talur for explaining the problem and helping with the statedump!

Here's what I see in the statedump of the first replica of the heketidbstorage volume:

[xlator.features.locks.heketidbstorage-locks.inode]
path=/heketi.db
mandatory=0
posixlk-count=3
posixlk.posixlk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 15803, owner=28ac2db21d8caa2e, client=0x7f9b5416a130, connection-id=(null), granted at 2018-08-07 09:46:08
posixlk.posixlk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 44289, owner=3254c93825f30313, client=0x7f9b5c6ff600, connection-id=(null), blocked at 2018-08-07 09:53:34
posixlk.posixlk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 22058, owner=0e37f83eae6d24b0, client=0x7f9b5c6ff600, connection-id=(null), blocked at 2018-08-07 15:56:23

There is one active lock and two blocked locks. As per the statedump, pid 15803 has held this lock since 09:46 on August 07. This corresponds to process 15803 on cns2, confirmed by the following from the ps output:

root     15803  0.0  0.0   4356   348 ?        D    Aug07   0:00 flock -w 60 /var/lib/heketi/heketi.db true

The other two flocks are from pids 44289 and 22058 and have been blocked on this lock since 09:53 and 15:56 on Aug 07, respectively. The ps output on cns0 gives me this:

root     22058  0.0  0.0 107944   640 ?        D    Aug07   0:00 flock -w 5 /var/lib/origin/openshift.local.volumes/pods/b74bdfd0-9a27-11e8-a4bc-005056a589a3/volumes/kubernetes.io~glusterfs/db/heketi.db true
root     44289  0.0  0.0   4356   348 ?        D    Aug07   0:00 flock -w 60 /var/lib/heketi/heketi.db true

It does seem like this is caused by an interrupt on the first application not leading to release of the locks, although I'm not able to tell what the difference is between the two paths, "/var/lib/origin/openshift.local.volumes/pods/b74bdfd0-9a27-11e8-a4bc-005056a589a3/volumes/kubernetes.io~glusterfs/db/heketi.db" and "/var/lib/heketi/heketi.db".

Talur, would you know?

-Krutika
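(For reference, the lock sections quoted above can be pulled out of the collected dumps with something like the following sketch; the dump path is an example and varies per setup:

    for d in /var/run/gluster/*.dump.*; do
        echo "== $d"
        grep -A 6 'xlator.features.locks.heketidbstorage-locks.inode' "$d"
    done
)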
The flock command that has -w 5 was a debugging attempt; ignore it. That is probably also why the path is different.
(In reply to Raghavendra Talur from comment #17)
> Flock command that has -w 5 is a debugging attempt, ignore it. That is
> probably why the path is different too.

Indeed. I believe I may have run that command (outside the container but against the same mountpoint) when I first started looking at this issue.
Comment 16 establishes the root cause to be the same as described in bug 1594623. We have a patch to the heketi-start.sh script that avoids the bug. For the same reason, we are not setting that bug in the "Depends On" field here.
To verify this bug, I believe the easiest approach is to:

* Mount heketidbstorage somewhere you can manually run the flock command
* Stop any running heketi pods
* On your mount, cd to the path where the heketi db resides
* Run: 'flock -w 60 heketi.db cat'
* This command should block (waiting on EOF in cat)
* Background the flock command _or_ switch to a 2nd terminal (at this point you are simulating another "stuck" heketi pod)
* Start the heketi pod and monitor the state of the heketi container

Prior to the fix, the -w 60 flag indicates that the command should time out after 60 seconds; however, this will not happen, and the pod will block during startup. After the fix, the script will "poll" the lock and time out after 60 seconds with a log message indicating that the db is read-only.

* After the test is over, terminate the cat command; flock will then exit, releasing the lock
* Unmount the heketidbstorage volume
* You may want to reboot any nodes that "stuck" heketi pods are running on before doing anything else with that cluster

(A consolidated command sketch of these steps follows below.)
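Roughly, as a consolidated transcript of the above (the server address, mount point, and dc name are taken from this setup and may differ; treat this as a sketch, not an exact procedure):

    mount -t glusterfs 10.70.46.153:/heketidbstorage /mnt/heketidb
    oc scale dc/heketi-cns --replicas=0      # stop running heketi pods
    cd /mnt/heketidb
    flock -w 60 heketi.db cat &              # hold the lock; cat blocks on EOF
    oc scale dc/heketi-cns --replicas=1      # start heketi again
    oc get pods -w                           # watch the heketi container's state
    oc logs dc/heketi-cns                    # after the fix: read-only-db message
    kill %1                                  # end cat; flock releases the lock
    cd / && umount /mnt/heketidb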
In which build is this fixed?
Ran the heketi automated cases on build heketi-client-7.0.0-6.el7rhgs.x86_64 and didn't hit this issue; hence, marking it as verified.
Updated doc text in the Doc Text field. Please review for technical accuracy.
Thanks John, I have updated the doc text based on your feedback.
Doc Text looks OK
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2686