Bug 1596626

Summary: Heketi stops all further operations on finding "Id Not Found" errors due to db inconsistent state
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Saravanakumar <sarumuga>
Component: heketi Assignee: Michael Adam <madam>
Status: CLOSED ERRATA QA Contact: Rachael <rgeorge>
Severity: high Docs Contact:
Priority: unspecified    
Version: cns-3.10 CC: akrishna, hchiramm, jmulligan, madam, pprakash, rhs-bugs, rtalur, sankarshan, sarumuga, sselvan, storage-qa-internal, vinug
Target Milestone: ---   
Target Release: CNS 3.10   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: heketi-7.0.0-2.el7rhgs Doc Type: Bug Fix
Doc Text:
Previously, when the heketi database contained entries with broken references, various operations failed with the error "Id not found". With this fix, heketi ignores broken references when deleting a block hosting volume, cleaning up bricks with empty paths, and starting the heketi service, provided that removing the reference would not lead to any additional broken references.
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-09-12 09:23:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1568862    

Description Saravanakumar 2018-06-29 10:12:29 UTC
Description of problem:

Rebase heketi for CNS-3.10

Comment 7 Raghavendra Talur 2018-06-29 15:55:18 UTC
Prasanth,

We don't need a rebase; I will change the description and title.

However, upstream has identified three PRs that are critical for stability and for recovering from a bad db state:

https://github.com/heketi/heketi/pull/1216
https://github.com/heketi/heketi/pull/1213/
https://github.com/heketi/heketi/pull/1206

I would like to use this bug to pull in these three PRs.

Comment 8 Raghavendra Talur 2018-06-29 15:59:49 UTC
To complement the update in the title:

Earlier, whenever heketi encountered "Id not found" errors from the database, it aborted the operation. With the patches mentioned in comment 7, heketi can skip such errors when they are not critical. This allows users to continue using heketi.
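
To illustrate the behaviour change described above, here is a minimal Python sketch. It is not heketi's actual code (heketi is written in Go), and every name in it (`IdNotFound`, `load_entry`, `collect_bricks`, the `tolerate_missing` flag, the dictionary layout) is hypothetical. It only models the idea: a lookup that fails with "Id not found" either aborts the operation or is skipped, depending on whether the caller considers the reference critical.

```python
class IdNotFound(KeyError):
    """Hypothetical stand-in for the "Id not found" database error."""


def load_entry(table, entry_id):
    """Look up an entry by id, raising IdNotFound if the id is absent."""
    try:
        return table[entry_id]
    except KeyError:
        raise IdNotFound(entry_id) from None


def collect_bricks(volume, bricks, tolerate_missing):
    """Resolve the brick ids referenced by a volume.

    With tolerate_missing=False the first broken reference aborts the
    operation (the old behaviour). With tolerate_missing=True broken
    references are skipped and reported back to the caller.
    """
    found, skipped = [], []
    for brick_id in volume["bricks"]:
        try:
            found.append(load_entry(bricks, brick_id))
        except IdNotFound:
            if not tolerate_missing:
                raise
            skipped.append(brick_id)
    return found, skipped


if __name__ == "__main__":
    # Toy data: the volume references brick "b2", which no longer exists.
    bricks = {"b1": {"path": "/bricks/b1"}}
    volume = {"bricks": ["b1", "b2"]}
    print(collect_bricks(volume, bricks, tolerate_missing=True))
```

A caller such as a cleanup path might pass `tolerate_missing=True`, while an operation that must see every brick would keep the strict behaviour.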

Comment 10 Raghavendra Talur 2018-07-06 20:46:01 UTC
1. Kill the heketi pod while creating a bunch of PVCs in a loop.
2. When the pod is restarted, ensure that there are some pending operations. You can see pending operations either by using heketi-cli dump db op or by using the heketi db export command and inspecting the JSON.
3. Once the existence of pending operations is confirmed, use the db tool to delete the pending operations.
4. Perform node replace or device replace operations and ensure they all pass.
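
Step 2 above can be sketched as a small Python helper that reads the exported JSON and lists the ids of any pending operations. This is only a sketch under assumptions: the top-level key name `pendingoperations` and the file name `heketi-db.json` are not confirmed by this bug report, so check them against your own export and adjust as needed.

```python
import json


def load_export(path):
    """Load a JSON file produced by exporting the heketi db."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def pending_operation_ids(db, key="pendingoperations"):
    """Return the sorted ids of pending operations in an exported db dict.

    The key name is an assumption about the export layout; a missing or
    null value is treated as "no pending operations".
    """
    return sorted((db.get(key) or {}).keys())


if __name__ == "__main__":
    import sys

    # Hypothetical default file name; pass the real path as an argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "heketi-db.json"
    ids = pending_operation_ids(load_export(path))
    print(f"{len(ids)} pending operation(s)")
    for op_id in ids:
        print(op_id)
```

An empty list after step 3 would suggest the pending operations were removed, at which point step 4 can be attempted.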

Comment 12 Anjana KD 2018-08-31 00:08:35 UTC
Updated doc text in the Doc Text field. Please review for technical accuracy.

Comment 15 John Mulligan 2018-09-07 19:06:04 UTC
Doc Text looks OK

Comment 17 errata-xmlrpc 2018-09-12 09:23:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2686