Bug 1098045

Summary: [SNAPSHOT]: recreation of already deleted/restored snaps occurs due to optimization
Product: [Community] GlusterFS Reporter: Avra Sengupta <asengupt>
Component: snapshot    Assignee: Avra Sengupta <asengupt>
Status: CLOSED EOL QA Contact:
Severity: urgent Docs Contact:
Priority: urgent    
Version: pre-release    CC: amukherj, bugs, gluster-bugs
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: SNAPSHOT
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1098103    Environment:
Last Closed: 2015-10-22 15:40:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1098103    

Description Avra Sengupta 2014-05-15 07:21:43 UTC
Description of problem:
During snapshot create, if a brick is down and the quorum is met, the snapshot create is successful. In the missed_snaps_list, an entry is maintained for the missed create. The same happens for missed deletes and restores.

There was an optimization: if a create for a brick is pending, and a delete/restore is also missed for the same brick, we don't add a new delete/restore entry to the list; we just mark the create entry as done. The rationale behind this optimization was that if nothing has been created yet, there is nothing to delete or restore from.
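For illustration only, here is a minimal, self-contained C sketch of that optimization. This is not the actual glusterd code; the names (missed_entry, record_missed_op) and the brick path are invented, and the real missed_snaps_list is persisted on disk rather than kept in a toy array.

/*
 * Illustrative sketch only -- not the actual glusterd code. It models the
 * optimization described above: when a delete/restore is missed for a brick
 * that still has a pending missed-create entry, the create entry is marked
 * done instead of a new delete/restore entry being appended.
 */
#include <stdio.h>
#include <string.h>

enum op     { OP_CREATE = 1, OP_DELETE, OP_RESTORE };
enum status { STATUS_PENDING = 1, STATUS_DONE };

struct missed_entry {
    char brick_path[256];
    int  op;
    int  status;
};

/* A toy per-node missed-snaps list. */
static struct missed_entry list[16];
static int nentries;

static void record_missed_op(const char *brick, int op)
{
    /* The optimization: a pending create for the same brick means the
     * snapshot brick was never taken, so "nothing to delete/restore". */
    if (op == OP_DELETE || op == OP_RESTORE) {
        for (int i = 0; i < nentries; i++) {
            if (strcmp(list[i].brick_path, brick) == 0 &&
                list[i].op == OP_CREATE &&
                list[i].status == STATUS_PENDING) {
                list[i].status = STATUS_DONE;  /* no new entry added */
                return;
            }
        }
    }
    /* Otherwise append a fresh pending entry. */
    snprintf(list[nentries].brick_path, sizeof(list[nentries].brick_path),
             "%s", brick);
    list[nentries].op = op;
    list[nentries].status = STATUS_PENDING;
    nentries++;
}

int main(void)
{
    /* A peer's view: the create still looks pending because the node that
     * hosts the brick (and already replayed the create) is down again. */
    record_missed_op("/bricks/b1", OP_CREATE);
    record_missed_op("/bricks/b1", OP_DELETE);

    for (int i = 0; i < nentries; i++)
        printf("%s op=%d status=%d\n",
               list[i].brick_path, list[i].op, list[i].status);
    /* Output shows a single create entry marked done and no delete entry,
     * so the delete is never replayed when the brick's node returns. */
    return 0;
}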

The issue with the optimization is that, when the create actually happens after the brick comes up, the entry is updated from pending to done only on the node hosting the brick (as that node alone is aware of the change). This information is propagated to other nodes during subsequent handshakes, but not before then.

So if, after the create succeeds, the node hosting the brick (and holding the updated entry) goes down again, and a delete/restore command is issued during that window, the other nodes apply the optimization: instead of adding a delete entry, they mark the seemingly pending create entry as done. This leads to inconsistency: when the node comes back up, it not only still contains the stale snap, but also makes the other nodes believe it is a new snap.
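The following sketch models that divergence. It is a hypothetical illustration, not glusterd's handshake code; the struct snap_view and the simplified handshake() are invented to show why, with no delete entry recorded anywhere, the stale snap on the returning node looks like a brand-new snap to its peers.

/* Hypothetical model (not glusterd code) of the state divergence above. */
#include <stdbool.h>
#include <stdio.h>

struct snap_view {
    const char *name;
    bool create_done;   /* status of the missed-create entry on this node */
    bool delete_seen;   /* whether a missed-delete entry exists */
    bool snap_exists;   /* whether the snap still exists on this node */
};

/* Peer handshake, simplified: a peer that sees a snap it does not have,
 * with no pending delete recorded, imports it as a new snap. */
static void handshake(struct snap_view *peer, const struct snap_view *host)
{
    if (host->snap_exists && !peer->snap_exists && !peer->delete_seen) {
        peer->snap_exists = true;   /* stale snap gets recreated */
        printf("peer imported '%s' as a new snap\n", host->name);
    }
}

int main(void)
{
    /* Node A hosted the brick: it replayed the missed create, went down
     * again before peers learned of it, and then missed the delete. */
    struct snap_view node_a = { "snap1", true, false, true };

    /* Node B processed the delete, but the optimization only flipped the
     * seemingly pending create entry to done, so no delete entry exists
     * and the snap was removed locally. */
    struct snap_view node_b = { "snap1", true, false, false };

    /* Node A comes back: with no delete entry on either side, A's stale
     * snap looks like a brand-new snap to B. */
    handshake(&node_b, &node_a);
    return 0;
}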


Version-Release number of selected component (if applicable):


How reproducible:
Every time


Steps to Reproduce:
1."pkill gluster" on one node.
2.Issue a create command from another node, and check /var/lib/glusterd/snaps/missed_snaps_list for the missed create entry.
3. Bring back glusterd on the first node. 
4. The missed_snaps_list gets synced and the missing snapshot is taken. In the first node, check the /var/lib/glusterd/snaps/missed_snaps_list to see that the entry is now updated from pending to done.
5. Again issue "pkill gluster" on the first node.
6. Now issue the delete command for the same snap from another node. 
7. Check /var/lib/glusterd/snaps/missed_snaps_list in this node to see that a new delete entry is not added, but the create entry is marked as done.
8. Now bring back glusterd on the first node.

Actual results:
When the first node comes back up after the delete command was issued, the synced missed_snaps_list contains no delete entry, so the stale snap is not deleted.
The other nodes also have no delete entry at this point, so they treat the stale snap's info as that of a new snap and recreate it, leaving the cluster in an inconsistent state.


Expected results:
The delete information should be propagated correctly, and the stale snap should be deleted. The other nodes should therefore not recreate the already deleted/restored snap.


Additional info:

Comment 1 Anand Avati 2014-05-20 14:40:28 UTC
REVIEW: http://review.gluster.org/7811 (glusterd/snapshot: Removing missed snap optimization) posted (#1) for review on master by Avra Sengupta (asengupt)

Comment 2 Anand Avati 2014-05-28 12:21:47 UTC
REVIEW: http://review.gluster.org/7811 (glusterd/snapshot: Removing missed snap optimization) posted (#2) for review on master by Avra Sengupta (asengupt)

Comment 3 Anand Avati 2014-06-16 11:44:03 UTC
REVIEW: http://review.gluster.org/8077 (DUMMY PATCH FOR LOGS) posted (#1) for review on master by Avra Sengupta (asengupt)

Comment 4 Kaleb KEITHLEY 2015-10-22 15:40:20 UTC
pre-release version is ambiguous and about to be removed as a choice.

If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.