Bug 1613260
| Summary: | CNS: heketi pod not in ready state - Liveness probe failed | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Apeksha <akhakhar> |
| Component: | heketi | Assignee: | John Mulligan <jmulligan> |
| Status: | CLOSED ERRATA | QA Contact: | Apeksha <akhakhar> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | cns-3.9 | CC: | akhakhar, akrishna, hchiramm, jmulligan, kdhananj, kramdoss, madam, rhs-bugs, rtalur, sankarshan, storage-qa-internal |
| Target Milestone: | --- | | |
| Target Release: | CNS 3.10 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | heketi-7.0.0-6.el7rhgs | Doc Type: | Bug Fix |
| Doc Text: | Previously, the heketi startup script for the container image used a blocking file lock with a timeout. This could cause issues when the heketi db is stored on gluster volumes. With this update, the startup script now polls for db availability with a non-blocking file lock. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-09-12 09:23:51 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1568862 | | |
| Attachments: | State dumps (attachment 1474074) | | |
Description (Apeksha, 2018-08-07 10:30:02 UTC)
Since the heketi pod is not in ready state, the heketi logs are empty. (Note: the setup has been left in the same state.) sosreports from all nodes are available at http://rhsqe-repo.lab.eng.blr.redhat.com/cns/bugs/BZ-1613260/

Created attachment 1474074 [details]
State dumps

We suspect this is an issue with the heketidbstorage gluster volume not properly handling the timeout of a blocking flock. See also: https://github.com/gluster/glusterfs/issues/465. We're attaching statedumps so that someone on the gluster dev team can confirm. Talur asks me to ask: do we see an issue here with the lock translator? In the short term we may need to add a workaround to the heketi startup script to avoid using a blocking flock; however, before we do that we'd like to confirm that this is really the problem.

Node packages:

    [root@ap-v310-live-app-cns-2 ~]# rpm -qa | grep gluster
    glusterfs-3.8.4-53.el7.x86_64
    glusterfs-fuse-3.8.4-53.el7.x86_64
    glusterfs-libs-3.8.4-53.el7.x86_64
    glusterfs-client-xlators-3.8.4-53.el7.x86_64

Pod (server / brick server) packages:

    sh-4.2# rpm -qa | grep gluster
    glusterfs-libs-3.8.4-54.10.el7rhgs.x86_64
    glusterfs-3.8.4-54.10.el7rhgs.x86_64
    glusterfs-api-3.8.4-54.10.el7rhgs.x86_64
    glusterfs-server-3.8.4-54.10.el7rhgs.x86_64
    glusterfs-client-xlators-3.8.4-54.10.el7rhgs.x86_64
    glusterfs-cli-3.8.4-54.10.el7rhgs.x86_64
    glusterfs-fuse-3.8.4-54.10.el7rhgs.x86_64
    glusterfs-geo-replication-3.8.4-54.10.el7rhgs.x86_64
    gluster-block-0.2.1-14.1.el7rhgs.x86_64
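The workaround mentioned above (avoiding a single blocking flock in the startup script) would, in rough outline, replace the blocking call with a non-blocking lock attempt in a retry loop. The sketch below only illustrates that idea and is not the actual heketi-start.sh change; the db path and 60-second timeout are taken from this report, and the heketi launch line is hypothetical.

```bash
#!/bin/bash
# Illustrative sketch: poll for the heketi db with a non-blocking flock
# instead of a single blocking "flock -w 60".
DB=/var/lib/heketi/heketi.db
TIMEOUT=60
locked=0

for ((i = 0; i < TIMEOUT; i++)); do
    # -n makes flock fail immediately instead of blocking when the lock is held
    if flock -n "$DB" true; then
        locked=1
        break
    fi
    sleep 1
done

if [ "$locked" -ne 1 ]; then
    echo "WARNING: could not lock ${DB} within ${TIMEOUT}s; the db may only be readable"
fi

exec /usr/bin/heketi --config=/etc/heketi/heketi.json   # hypothetical launch line
```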
Sorry, I didn't get a chance to take a look at this yesterday. Will check and get back.

-Krutika

I'm not able to completely make sense of the data yet. This is what I see in the brick statedump:

    <snip>
    ...
    [xlator.features.locks.heketidbstorage-locks.inode]
    path=/heketi.db
    mandatory=0
    posixlk-count=1
    posixlk.posixlk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 42678, owner=8124271b980976b3, client=0x7f2de402ead0, connection-id=(null), granted at 2018-08-06 21:06:27
    ...
    </snip>

There is one active (granted) flock on /heketi.db acquired by pid 42678 on this brick, and this brick happens to be the third brick in this replica set (heketidbstorage-client-2). There are no other locks blocked on this file (at least not on this brick). But the client statedump tells a slightly different story: there is an lk call triggered by the fuse mount (pid 42547) that has completed on the first and the second brick but is still waiting on the third brick (heketidbstorage-client-2).

1. Are we sure that the affected client of the heketidbstorage volume was being accessed through the fuse mount whose pid is 42547?
2. Do we have the statedumps from the other two bricks? Better yet, could you share the setup details if you have them so that I can take a look?

-Krutika

Location of gluster statedumps and how to find them in OCP: remember that gluster volumes are mounted on the host node where the application pod is running, so the gluster client process is on the OCP node where the app pod is deployed. Also note that the mount process has different log paths. In general:

1. For client dumps, use `find /run -name "*dump*"` on the node where the gluster client is running.
2. For bricks, on the nodes where gluster pods are running, use `find /var/lib/origin/openshift.local.volumes/pods/ -name "*dump*"`.
3. Remember that gluster brick statedumps won't have the same PID in the filename as what you see on the node. The PID in the filename is the PID of the brick process in the pod namespace.

Thanks Talur for explaining the problem and helping with the statedump!

Here's what I see in the statedump of the first replica of the heketidbstorage volume:

    [xlator.features.locks.heketidbstorage-locks.inode]
    path=/heketi.db
    mandatory=0
    posixlk-count=3
    posixlk.posixlk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 15803, owner=28ac2db21d8caa2e, client=0x7f9b5416a130, connection-id=(null), granted at 2018-08-07 09:46:08
    posixlk.posixlk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 44289, owner=3254c93825f30313, client=0x7f9b5c6ff600, connection-id=(null), blocked at 2018-08-07 09:53:34
    posixlk.posixlk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 22058, owner=0e37f83eae6d24b0, client=0x7f9b5c6ff600, connection-id=(null), blocked at 2018-08-07 15:56:23

There is one active lock and two blocked locks. As per the statedump, pid 15803 has held this lock since 2018-08-07 09:46. This corresponds to process 15803 on cns2, confirmed by the following info from ps output:

    root     15803  0.0  0.0   4356   348 ?   D   Aug07   0:00 flock -w 60 /var/lib/heketi/heketi.db true

The other two flocks are from pids 44289 and 22058 and have been blocked on this lock since 09:53 and 15:56 on Aug 07, respectively. ps output on cns0 gives me this:

    root     22058  0.0  0.0 107944   640 ?   D   Aug07   0:00 flock -w 5 /var/lib/origin/openshift.local.volumes/pods/b74bdfd0-9a27-11e8-a4bc-005056a589a3/volumes/kubernetes.io~glusterfs/db/heketi.db true
    root     44289  0.0  0.0   4356   348 ?   D   Aug07   0:00 flock -w 60 /var/lib/heketi/heketi.db true

It does seem like this is because an interrupt on the first application did not lead to release of the locks, although I'm not able to tell what the difference is between "/var/lib/origin/openshift.local.volumes/pods/b74bdfd0-9a27-11e8-a4bc-005056a589a3/volumes/kubernetes.io~glusterfs/db/heketi.db" and "/var/lib/heketi/heketi.db". Talur, would you know?

-Krutika

The flock command that has -w 5 is a debugging attempt; ignore it. That is probably why the path is different too.

(In reply to Raghavendra Talur from comment #17)
> Flock command that has -w 5 is a debugging attempt, ignore it. That is probably why the path is different too.

Indeed. I believe I may have run that command (outside the container but against the same mountpoint) when I first started looking at this issue.

Comment 16 establishes the root cause to be the same as described in bug 1594623. We have a patch to the heketi-start.sh script that avoids the bug. Not setting that bug as a "Depends On" for the same reason.

To verify this bug, I believe the easiest approach is the following (see the sketch after this list):

* Mount heketidbstorage somewhere you can manually run the flock command.
* Stop any running heketi pods.
* On your mount, cd to the path where the heketi db resides.
* Run: `flock -w 60 heketi.db cat`
* This command should block (waiting on EOF in cat).
* Background the flock command _or_ switch to a second terminal. (At this point you are simulating another "stuck" heketi pod.)
* Start the heketi pod and monitor the state of the heketi container. Prior to the fix, the -w 60 flag indicates that the flock command should time out after 60 seconds; however, this does not happen and the pod blocks during startup. After the fix, the startup script "polls" the lock and times out after 60 seconds with a log message indicating that the db is read only.
* After the test is over, terminate the cat command; flock will exit, releasing the lock.
* Unmount the heketidbstorage volume.
* You may want to reboot any nodes that "stuck" heketi pods are running on before doing anything else with that cluster.
In which build is this fixed?

Ran the heketi automated cases on build heketi-client-7.0.0-6.el7rhgs.x86_64 and didn't hit this issue, hence marking it as verified.

Updated doc text in the Doc Text field. Please review for technical accuracy.

Thanks John, I have updated the doc text based on your feedback.

Doc Text looks OK.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2686