Bug 1565940
| Summary: | [Tracker-RHGS-BZ#1618221] "Repeated failure/recovery of glusterd" + "volume create request on cns setup" can lead to a stale volume entry in gluster | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | krishnaram Karthick <kramdoss> |
| Component: | rhgs-server-container | Assignee: | Saravanakumar <sarumuga> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Prasanth <pprakash> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | cns-3.9 | CC: | amukherj, asriram, bkunal, dwojslaw, ekuric, hchiramm, jmulligan, kramdoss, madam, nberry, ndevos, pprakash, rcyriac, rhs-bugs, rtalur, sankarshan, storage-qa-internal, suprasad |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Known Issue |
Doc Text:

If heketi performs volume create and delete operations in a loop while one of the gluster pods is also being restarted in a loop in the background, a rare race condition can occur: while coming up, the glusterd instance in the restarted gluster pod syncs in a stale volume that was already deleted, leaving one additional volume entry in that pod. This eventually causes heketi to fail volume creation with the message "volume already exists".

Workaround:

The above can be confirmed by comparing the output of 'gluster v list | wc -l' across all the gluster pods and checking for a mismatch in the number of volume entries. On whichever node has an extra volume entry, removing /var/lib/glusterd/vols/<volname in question> from the backend and restarting the glusterd instance fixes the issue. (A shell sketch of this check follows the summary fields below.)
| Story Points: | --- | | |
|---|---|---|---|
| Clone Of: | | | |
| : | 1582402 (view as bug list) | Environment: | |
| Last Closed: | 2019-04-17 06:56:19 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1618221 | | |
| Bug Blocks: | 1573420, 1641685, 1641915 | | |
| Attachments: | | | |
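The workaround described in the Doc Text above is a manual procedure. The following is a minimal shell sketch of the same check, not a verified script: the namespace, pod label selector, and the way glusterd is restarted are assumptions about a typical CNS deployment and must be adapted to the actual environment.

```bash
# Sketch: compare volume counts across all gluster pods to find the one
# carrying a stale (extra) volume entry. Namespace and label selector are
# examples, not fixed names.
NS=glusterfs
PODS=$(oc -n "$NS" get pods -l glusterfs=pod -o jsonpath='{.items[*].metadata.name}')

for pod in $PODS; do
    count=$(oc -n "$NS" exec "$pod" -- gluster volume list | wc -l)
    echo "$pod: $count volumes"
done

# On the pod that reports one extra volume, remove the stale volume directory
# and restart glusterd so it resyncs its view from the healthy peers:
#   oc -n "$NS" exec <pod> -- rm -rf /var/lib/glusterd/vols/<volname in question>
#   oc -n "$NS" exec <pod> -- systemctl restart glusterd   # or restart the pod
```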
Description
krishnaram Karthick
2018-04-11 06:19:11 UTC
Created attachment 1420180 [details]
heketi_logs
Created attachment 1420181 [details]
gluster_volume_info
Ok, here is the root cause: if a volume is created but the start fails, the volume create operation is considered a failure. This triggers a cleanup process which is not strict enough (we don't check whether the volume was really destroyed). Because the retry mechanism uses the same ID as before, we end up getting "volume already exists" errors from gluster.

Possible solutions:

a. Split the volume create executor into two or more parts. After each executor call, determine the best course of action and proceed.

b. In the case of a cleanup where we are not sure whether the cleanup completed, we cannot reuse the IDs and will have to retry at a higher level. Note that we should not delete the PendingOperations entry of the first volume create; it might have more cleanups to perform.

Here's my interpretation of the problem: based on my reading of the code, I doubt this exact condition is new to this build; it has probably been lurking around for a while, but I suspect the retry loop makes it easier to hit. It's important that in this test glusterd is being stopped and started in a loop. According to the logs, volume create requests keep coming in and most fail because gluster is down on the node heketi is trying to use. Typically this failure occurs on the 'gluster volume create' command, but due to timing it occasionally hits a case where 'gluster volume create' succeeds and 'gluster volume start' fails (see the grep sketch further below).

The volume create code performs an "immediate cleanup" of the volume when an error is hit; because of the state of glusterd, this cleanup also fails. At this point the volume create operation's rollback function is called. Rollback attempts to clean up only the bricks (historically the code assumed the self-cleanup of volume create would succeed) and then the heketi db entries. Since those steps work, rollback succeeds and a new retry is attempted. On this attempt the create fails because the volume is already in use by gluster. Again, rollback fires with a false success, removing any trace of the volume from the db.

Dealing with the cleanup when the volume can't reliably be deleted on the gluster side will be challenging and will need more design and changes on the heketi side (at the very least). In the short term, I'm going to look into making the rollback fail reliably when the volume cleanup hasn't happened. This will leave the volume in the db (in a pending state), and the heketi db tools can then be used to view it and help clean it up manually on the gluster side. Unfortunately, we can't avoid the problem just by cycling IDs, because we support user-specified names; that would also not avoid the volume being removed from the db on rollback failures.

Created attachment 1422571 [details]
heketi_logs_apr_16_failedQA
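To confirm the create-succeeds-but-start-fails sequence described in the analysis above, glusterd's command history on the affected pod can be checked. A minimal sketch, assuming the standard log location and a hypothetical volume name; the exact line format of cmd_history.log varies between glusterfs versions:

```bash
# Sketch: look for a volume whose "volume create" succeeded but whose
# "volume start" failed in glusterd's command history on the affected pod.
VOL=vol_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx   # hypothetical volume name
LOG=/var/log/glusterfs/cmd_history.log

grep -E "volume (create|start) ${VOL}" "${LOG}"
# The failure signature for this bug looks roughly like:
#   ... : volume create <vol> ... : SUCCESS
#   ... : volume start <vol>      : FAILED : ...
# while heketi's log for the same window shows retries ending in
# "volume already exists".
```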
Created attachment 1422573 [details]
db_dump_apr16
Doc text looks good to me.

Created attachment 1422936 [details]
glusterd_logs_collated
Created attachment 1422937 [details]
cmd_history_collated.txt
Created attachment 1422941 [details]
all cmd_history separate
Created attachment 1422942 [details]
all glusterd logs separate
I don't see any additional work to do here.