Bug 1355689
Summary: | heketi service failed to start if two nodes are down | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Neha <nerawat>
Component: | heketi | Assignee: | Michael Adam <madam>
Status: | CLOSED NOTABUG | QA Contact: | Bala Konda Reddy M <bmekala>
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | rhgs-3.1 | CC: | abhishku, annair, bkunal, bmekala, hchiramm, jkaur, jmulligan, kramdoss, madam, nerawat, pprakash, rcyriac, rreddy, rtalur, sanandpa, sankarshan, ssaha, vinug
Target Milestone: | --- | Keywords: | Reopened, ZStream
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | |
: | 1359775 (view as bug list) | Environment: |
Last Closed: | 2018-09-19 17:18:59 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | 1432048 | |
Bug Blocks: | 1573420, 1622458 | |
Description
Neha
2016-07-12 09:12:19 UTC
Humble, this could be a dup of BZ 1355801.

(In reply to Luis Pabón from comment #2)
> Humble this could be a dup of BZ 1355801

Here the setup is replica 3 and 2 nodes are down, then the volume moves to READONLY. Expected, isn't it?

AFAIK this is expected behaviour; this bug is filed for the heketi service failure.

    [heketi] ERROR 2016/06/23 08:33:47 /src/github.com/heketi/heketi/apps/glusterfs/app.go:149: write /var/lib/heketi/heketi.db: read-only file system
    ERROR: Unable to start application

    heketi-1-kjif9   0/1   CrashLoopBackOff   1   3m   <node>

(In reply to Neha from comment #4)
> AFAIK this is expected behaviour, here bug is filed for heketi service
> failure.

Thanks for confirming!

> [heketi] ERROR 2016/06/23 08:33:47
> /src/github.com/heketi/heketi/apps/glusterfs/app.go:149: write
> /var/lib/heketi/heketi.db: read-only file system
> ERROR: Unable to start application
>
> heketi-1-kjif9   0/1   CrashLoopBackOff   1   3m   <node>

When the DB file is not writable, the service won't start. That too is expected, isn't it?

(In reply to Humble Chirammal from comment #5)
> When the DB file is not writable, the service wont start. That too expected,
> Isnt it ?

Yes, that is self-explanatory.

This bug report is filed for "BZ 1341943 - Database needs to be placed in a reliable persistent storage in case of failure".

The question here is: what is the expectation with respect to "database reliability"? For replica 3, my understanding is that it should sustain 2 node failures. Correct me if I am wrong here. In this case it can only sustain one node failure.

How do we restore the setup back in case of a two-node failure?

How long will it try to restart the pod (CrashLoopBackOff) in case the nodes are down for a longer period of time?

(In reply to Neha from comment #6)
> This bug report is filed for "BZ 1341943 - Database needs to be placed in a
> reliable persistent storage in case of failure"

That's a different discussion altogether. I don't think that is something we can fix with this bug report.

> The question here is what is the expectation here with respect to "database
> reliability"?

As mentioned above, that has to be answered in a different bug/thread.

> For replica 3 my understanding is it should sustain 2 node failures. Correct
> me if I am wrong here.

Please note that the volume is not 'replica 3'; it is distributed-replicate (2 x 3). As discussed, it is expected that the volume moves to READONLY mode when quorum is not met to serve the file.

> In this case it can only sustain one node failure.
>
> How to restore setup back in case of two node failure?
>
> How long it will try to restart pod (CrashLoopBackOff) in case if nodes are
> down for longer period of time ?

Neha, as mentioned above, please open a discussion or question bug for these. As per our discussion I am inclined to close this bug. Please let me know your thoughts; we will proceed accordingly.

(In reply to Humble Chirammal from comment #7)
> Please note that the volume is not 'replica 3', its distributed replica
> (2x3). As discussed this is expected to move a volume to READONLY mode when
> the quorum is not met to serve the file.

Yes, that is correct, it is [2 x 3]. I don't think the behaviour would change even if it were a "plain replica" volume.

> Neha as mentioned above please open a discussion or question bug for these.
> As per our discussion I am inclined to close this bug. Please let me know
> your thought. We will proceed accordingly.

I believe we can still track this here rather than opening a new bug. This is expected behaviour from gluster's point of view, but as a solution to make the heketi db reliable, it's a problem. There is already a parent bug for that: #1341943

>
> I believe we can still track this here rather than opening a new bug. This
> is expected behaviour from gluster point of view but as a solution to make
> heketi db reliable, its a problem. Already there is parent bug for that
> #1341943
@neha, iic, we are in agreement that the FS going READONLY is an expected result from GLUSTER. If you still have doubts, please feel free to discuss.
@Luis, it looks to me that Neha is trying to find answers to the below.
*) What is the expectation here with respect to "database reliability"?
*) How to restore the setup back in case of a two-node failure?
*) How long will it try to restart the pod (CrashLoopBackOff) in case the nodes are down for a longer period of time?
(In reply to Humble Chirammal from comment #9)
> @Luis, It looks to me that Neha is trying to find answers for below.
>
> *) The question here is what is the expectation here with respect to
> "database reliability"?

The expectation is that it is as reliable as any GlusterFS volume.

> *) How to restore setup back in case of two node failure?

Mount and copy the files out of the volume. Delete the volume and create a new one with the same name. Then mount and copy back.

> *) How long it will try to restart pod (CrashLoopBackOff) in case if nodes
> are down for longer period of time ?

That depends on the algorithm in Kubernetes.

Created a patch upstream to allow startup in read-only mode: https://github.com/heketi/heketi/issues/435
All read-only commands will work, like listings and backup (see the illustrative read-only BoltDB sketch further below).

Please retest.

The issue is still seen with heketi-client-3.1.0-10.el7rhgs.x86_64.

1) The heketi pod was configured on host 'dhcp47-110'.

    [root@dhcp46-2 ~]# oc get pods -o wide
    NAME                             READY     STATUS    RESTARTS   AGE   IP             NODE
    glusterfs-7buxc                  1/1       Running   0          1d    10.70.47.110   dhcp47-110.lab.eng.blr.redhat.com
    glusterfs-qt5fx                  1/1       Running   0          1d    10.70.47.112   dhcp47-112.lab.eng.blr.redhat.com
    glusterfs-x9b1n                  1/1       Running   0          1d    10.70.46.224   dhcp46-224.lab.eng.blr.redhat.com
    heketi-1-cljgb                   1/1       Running   0          1d    10.128.0.8     dhcp47-110.lab.eng.blr.redhat.com
    storage-project-router-1-hw98o   1/1       Running   0          1d    10.70.47.112   dhcp47-112.lab.eng.blr.redhat.com

2) Node 'dhcp47-110' was shut down; heketi spun up on 'dhcp46-224'.

    [root@dhcp46-2 ~]# oc get pods -o wide
    NAME                             READY     STATUS    RESTARTS   AGE   IP             NODE
    glusterfs-7buxc                  1/1       Running   0          1d    10.70.47.110   dhcp47-110.lab.eng.blr.redhat.com
    glusterfs-qt5fx                  1/1       Running   0          1d    10.70.47.112   dhcp47-112.lab.eng.blr.redhat.com
    glusterfs-x9b1n                  1/1       Running   0          1d    10.70.46.224   dhcp46-224.lab.eng.blr.redhat.com
    heketi-1-vdxdt                   1/1       Running   0          53s   10.131.0.10    dhcp46-224.lab.eng.blr.redhat.com
    storage-project-router-1-hw98o   1/1       Running   0          1d    10.70.47.112   dhcp47-112.lab.eng.blr.redhat.com

3) Shut down 'dhcp46-224', expecting the heketi service to come up on 'dhcp47-112'. The heketi pod failed to start on dhcp47-112.

    [root@dhcp46-2 ~]# oc get pods -o wide
    NAME                             READY     STATUS             RESTARTS   AGE   IP             NODE
    glusterfs-7buxc                  1/1       Running            0          1d    10.70.47.110   dhcp47-110.lab.eng.blr.redhat.com
    glusterfs-qt5fx                  1/1       Running            0          1d    10.70.47.112   dhcp47-112.lab.eng.blr.redhat.com
    glusterfs-x9b1n                  1/1       Running            0          1d    10.70.46.224   dhcp46-224.lab.eng.blr.redhat.com
    heketi-1-zu69v                   0/1       CrashLoopBackOff   5          6m    10.130.0.6     dhcp47-112.lab.eng.blr.redhat.com
    storage-project-router-1-hw98o   1/1       Running            0          1d    10.70.47.112   dhcp47-112.lab.eng.blr.redhat.com

Moving the bug to 'Assigned' based on the above test.

*) volume info of the heketidbstorage volume.
*) status (e.g. RO) of the volume.
*) describe output of the heketi pod.
*) Did kube try to start a new pod?
*) What if you 'delete' heketi-1-zu69v, does it start a new heketi pod?
*) What is recorded in the heketi logs?

Without this information it is very difficult to proceed.
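To illustrate what the upstream read-only-startup patch mentioned above is meant to provide, here is a minimal BoltDB sketch: a database opened with the ReadOnly option still serves View transactions (listings, backups), while any Update transaction is rejected. This is an editorial example, not heketi's code; it assumes the github.com/boltdb/bolt package, and the db path and the printing are purely illustrative.

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/boltdb/bolt"
)

func main() {
	// Open the database read-only. The path mirrors the heketi default
	// but is only an example here; the file must already exist.
	db, err := bolt.Open("/var/lib/heketi/heketi.db", 0666, &bolt.Options{
		ReadOnly: true,
		Timeout:  3 * time.Second,
	})
	if err != nil {
		log.Fatalf("unable to open database read-only: %v", err)
	}
	defer db.Close()

	// Read-only work, e.g. listing buckets for a backup, is allowed.
	err = db.View(func(tx *bolt.Tx) error {
		return tx.ForEach(func(name []byte, _ *bolt.Bucket) error {
			fmt.Printf("bucket: %s\n", name)
			return nil
		})
	})
	if err != nil {
		log.Fatalf("view failed: %v", err)
	}

	// Writes are refused while the database is in read-only mode.
	err = db.Update(func(tx *bolt.Tx) error { return nil })
	fmt.Println("update while read-only:", err)
}
```

Under this model a heketi pod whose db lands on a read-only heketidbstorage mount could still answer listing and backup requests, which is what the upstream issue describes.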
FYI: c#27 talks about 'heketi-1-zu69v' (heketi-1-zu69v 0/1 CrashLoopBackOff), however c#29 is from a new iteration and the pod name is 'heketi-1-4bx8o'.

It is better if you can include the 'ls -ld' output of /var/lib/heketi and the 'ls -l' output of the 'heketi.db' file.

At a glance, the reasons I am asking for this information are below. This is supposed to be fixed with https://github.com/heketi/heketi/pull/436/ and I would expect the warning message "https://github.com/heketi/heketi/pull/436/files#diff-f394c40886f16cc9392ab7f130752b8bR106" in the heketi logs when it tries to open the db in READONLY mode, and 'Unable to open database:' if that also failed. Neither of these is available in the logs.

    app.db, err = bolt.Open(dbfilename, 0600, &bolt.Options{Timeout: 3 * time.Second})
    if err != nil {
        logger.Warning("Unable to open database. Retrying using read only mode")

        // Try opening as read-only
        app.db, err = bolt.Open(dbfilename, 0666, &bolt.Options{
            ReadOnly: true,
        })
        if err != nil {
            logger.LogError("Unable to open database: %v", err)
            return nil
        }
        app.dbReadOnly = true
    }

Other necessary info:

    # gluster v list
    heketidbstorage
    vol_476e104dacc88c57855b958765e5e20d
    vol_5554335ecd62ede9d278b5b5c5fd133a
    vol_6407e1b9266794d33f302d572e0fe63c
    vol_d1c17bd998262085e2078893501045db
    vol_dd4d7b53f504019c07cfa31439513444
    vol_f8e1da8579b1d5bfcdbe1ca6fac1245e

    sh-4.2# gluster v info heketidbstorage
    Volume Name: heketidbstorage
    Type: Replicate
    Volume ID: 2fec6d2b-20a7-4b1c-9411-7a29c4e6bbce
    Status: Started
    Number of Bricks: 1 x 3 = 3
    Transport-type: tcp
    Bricks:
    Brick1: 10.70.46.224:/var/lib/heketi/mounts/vg_92a0ea834f0f64420e8b3fa3f638b075/brick_4ec8801b26aa4344cf3a40d13bce4d34/brick
    Brick2: 10.70.47.110:/var/lib/heketi/mounts/vg_9021e39a7bd981c1a42501b6e9da487f/brick_bc5901a1a862854d8f73ed24915828ad/brick
    Brick3: 10.70.47.112:/var/lib/heketi/mounts/vg_3587717368692d764d40936b0f5fd47f/brick_be67fe7670103762ae3e5b1545dfd55f/brick
    Options Reconfigured:
    performance.readdir-ahead: on

    sh-4.2# ls -ld /var/lib/heketi
    drwxr-xr-x. 3 root root 33 Dec 26 12:17 /var/lib/heketi

    [root@dhcp46-2 mnt_tmp]# ls -l
    total 132
    -rw-r--r--. 1 root root 131072 Dec 28 14:32 heketi.db
    drwxr-xr-x. 2 root root   4096 Dec 26 17:48 secret

Hi, I did a little RCA on this issue. The patch in heketi tries to talk to the db when the db has 666 permission, and it was not tested with 644 permission. gluster read-only mode makes the db 644.

    # stat /mnt/heketi.db
      File: ‘/mnt/heketi.db’
      Size: 131072    Blocks: 176    IO Block: 131072    regular file
    Device: 29h/41d   Inode: 12926231922372432401   Links: 1
    Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
    Context: system_u:object_r:fusefs_t:s0
    Access: 2016-12-27 08:50:41.711681035 -0500
    Modify: 2016-12-29 01:25:24.200451583 -0500
    Change: 2016-12-29 01:25:24.211451693 -0500
     Birth: -

PR: https://github.com/heketi/heketi/pull/436/files

(In reply to Mohamed Ashiq from comment #32)
> I did little RCA on this issue. The patch in heketi tries to talk to db when
> db has 666 permission and it is not tested with 644 permission. gluster
> readonly mode makes the db 644.

Thanks Ashiq!! Exactly, as in c#30 and c#31.
If a "666" heketi.db is able to start the heketi service in 'readonly' mode, and changing the mode in the code to "644" (https://github.com/heketi/heketi/pull/436/files#diff-f394c40886f16cc9392ab7f130752b8bR109) also works, we need a new PR.

Reference: https://github.com/boltdb/bolt/#read-only-mode
AFAIK this can be used to open a db in read-only mode, but it does not talk about the underlying filesystem.

[NOTE] It seems that we need to look at this in detail wrt the underlying FS state (RO) + file permission (666/644) + bolt.Open() actions: https://github.com/boltdb/bolt/blob/2e25e3bb4285d41d223bb80b12658a2c9b9bf3e3/db.go#L150

The release team has been notified about the change; I am removing Devel Ack and proposing this for the CNS 3.5 release.

Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.

This is due to the behavior of BoltDB: boltdb does not detect a read-only filesystem, hence heketi does not open the db in read-only mode in such cases. As a workaround we can detect the read-only filesystem in heketi and handle it (a sketch of this idea follows at the end of this report).

As we store the db in a secret since heketi v4, this should be safer now, or not a worry at all, in CNS deployments, but we may still need to consider heketi running outside kube/openshift setups.

Also look at https://github.com/heketi/heketi/issues/685#issuecomment-282934600

The following pull request addresses the issue: https://github.com/heketi/heketi/pull/701

(In reply to Humble Chirammal from comment #43)
> As we store the db in secret since heketi v4, this should be more safe now
> or not a worry at all in CNS deployments, but still may consider heketi
> running outside kube/openshift setups.

That is right. This should not be an issue when used in CNS as the product, i.e. in tandem with OpenShift. But the scenario in which there is a standalone deployment (heketi + rhgs) still needs to be addressed.

> Also look at#
> https://github.com/heketi/heketi/issues/685#issuecomment-282934600

This is an issue with the behavior of the underlying gluster filesystem; the bug lies in Gluster:

    open() with O_RDWR on a RO filesystem returns -1 with errno == EROFS
    open() with O_RDWR on a RO file returns -1 with errno == EACCES

As soon as the RHGS bug is opened, I will defer this from the CNS 3.5 release.

https://bugzilla.redhat.com/show_bug.cgi?id=1432048 is the RHGS bug.

We have a dependent bug on RHGS now and the fix should land in GlusterFS. I am deferring this bug from this release.

Resetting the needinfo from nerawat to current heketi QE.

@Abhishek: Will it be possible for you to reproduce this with the latest builds? We/Engineering think that this is already fixed. If you can reproduce it, we can reconsider; else we would like to close this.

(In reply to Bipin Kunal from comment #62)
> @Abhishek: Will it be possible for you to reproduce this with latest builds?
> We/Engineering thinks that this is already fixed. If You can reproduce, we
> can reconsider else we would like to close this.

I too would like to see this closed. It is correct that heketi / the heketi pod does not start if two of the nodes hosting the bricks of heketidbstorage are down. If the bricks are down for unknown reasons, we should debug why the glusterfs volume is in an unhealthy state rather than keep this ancient heketi bz alive.

Closing according to the discussion in the triage meeting: this works as designed.
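For reference, a minimal sketch of the "detect the read-only filesystem in heketi and handle it" workaround discussed above. This is not heketi's actual implementation: the helper name openHeketiDB is hypothetical, the path is illustrative, and the only assumption carried over from the thread is the github.com/boltdb/bolt package and the EROFS/EACCES behaviour described for a read-only gluster mount. The idea is to probe the db file with an O_RDWR open before handing it to BoltDB, and fall back to BoltDB's read-only mode when the probe fails because the filesystem or file is not writable.

```go
package main

import (
	"errors"
	"log"
	"os"
	"syscall"
	"time"

	"github.com/boltdb/bolt"
)

// openHeketiDB is a hypothetical helper: it probes whether the db file is
// writable and falls back to BoltDB's read-only mode if it is not.
func openHeketiDB(path string) (*bolt.DB, bool, error) {
	readOnly := false
	if f, err := os.OpenFile(path, os.O_RDWR, 0); err == nil {
		f.Close()
	} else if errors.Is(err, syscall.EROFS) || errors.Is(err, syscall.EACCES) {
		// Read-only filesystem (EROFS) or non-writable file (EACCES),
		// e.g. a 0644 heketi.db on a read-only gluster mount:
		// open the database read-only instead of failing to start.
		readOnly = true
	} else {
		return nil, false, err
	}

	db, err := bolt.Open(path, 0600, &bolt.Options{
		ReadOnly: readOnly,
		Timeout:  3 * time.Second,
	})
	return db, readOnly, err
}

func main() {
	db, ro, err := openHeketiDB("/var/lib/heketi/heketi.db") // illustrative path
	if err != nil {
		log.Fatalf("unable to open database: %v", err)
	}
	defer db.Close()
	log.Printf("database opened, read-only=%v", ro)
}
```

Because the decision is taken from the open(2) error rather than from BoltDB, this sidesteps the limitation noted above that BoltDB itself does not detect a read-only filesystem.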