Bug 1596626 - Heketi stops all further operations on finding "Id Not Found" errors due to db inconsistent state
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: heketi
Version: cns-3.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: CNS 3.10
Assignee: Michael Adam
QA Contact: Rachael
URL:
Whiteboard:
Depends On:
Blocks: 1568862
Reported: 2018-06-29 10:12 UTC by Saravanakumar
Modified: 2018-12-19 08:49 UTC (History)

Fixed In Version: heketi-7.0.0-2.el7rhgs
Doc Type: Bug Fix
Doc Text:
Previously, when the heketi database contained entries with broken references, various operations failed with the error "Id not found". With this fix, heketi ignores broken references when deleting a block-hosting volume, when cleaning up bricks with empty paths, and when starting the heketi service, provided that removing the reference would not lead to any additional broken references.
Clone Of:
Environment:
Last Closed: 2018-09-12 09:23:45 UTC
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2018:2686 0 None None None 2018-09-12 09:24:56 UTC

Description Saravanakumar 2018-06-29 10:12:29 UTC
Description of problem:

Rebase heketi for CNS-3.10

Comment 7 Raghavendra Talur 2018-06-29 15:55:18 UTC
Prasanth,

We don't need a rebase; I will change the description and title.

However, upstream has identified three PRs 

https://github.com/heketi/heketi/pull/1216
https://github.com/heketi/heketi/pull/1213/
https://github.com/heketi/heketi/pull/1206


which are critical for stability and recovering from bad db state. I would like to use this bug for pulling in the three PRs mentioned.

Comment 8 Raghavendra Talur 2018-06-29 15:59:49 UTC
To complement the update in the title:

Earlier, whenever heketi encountered "Id not found" errors from the database, it aborted the operation. With the patches mentioned in comment 7, heketi can skip such errors when they are not critical, which allows users to continue using heketi.
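
To illustrate the change in behaviour described above, here is a minimal sketch in Python. It is not heketi's actual code (heketi is written in Go), and every name in it (`IdNotFound`, `lookup`, `delete_volume`, the `strict` flag, and the dictionary layout) is hypothetical. It only shows the general idea of tolerating a dangling reference when skipping it cannot do further harm, while still failing on a missing top-level entry.

```python
class IdNotFound(KeyError):
    """Raised when a referenced id is missing from the (toy) database."""


def lookup(table, entry_id):
    # Strict lookup: return the entry or raise IdNotFound.
    try:
        return table[entry_id]
    except KeyError:
        raise IdNotFound(entry_id) from None


def delete_volume(db, volume_id, strict=False):
    """Delete a volume and its bricks; skip dangling brick ids unless strict."""
    # A missing volume is always fatal: there is nothing to delete.
    volume = lookup(db["volumes"], volume_id)
    skipped = []
    for brick_id in volume["bricks"]:
        try:
            lookup(db["bricks"], brick_id)
        except IdNotFound:
            if strict:
                raise  # earlier behaviour: abort the whole operation
            skipped.append(brick_id)  # tolerant behaviour: note it and continue
            continue
        del db["bricks"][brick_id]
    del db["volumes"][volume_id]
    return skipped


# Hypothetical database in which the volume refers to a brick that no longer exists.
db = {
    "volumes": {"vol1": {"bricks": ["b1", "b2-missing"]}},
    "bricks": {"b1": {"path": "/bricks/b1"}},
}
print(delete_volume(db, "vol1"))  # prints ['b2-missing']
```

With `strict=True` the same call raises `IdNotFound` at the first dangling brick id, which corresponds to the behaviour before the patches.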

Comment 10 Raghavendra Talur 2018-07-06 20:46:01 UTC
1. Kill the heketi pod while creating a batch of PVCs in a loop.
2. When the pod has restarted, confirm that there are some pending operations. You can see them either with heketi-cli db dump or by running the heketi db export command and inspecting the resulting JSON.
3. Once the existence of pending operations is confirmed, use the db tool to delete them.
4. Perform node replace or device replace operations and ensure that they all pass.
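
The JSON check in step 2 can be scripted. The sketch below assumes that the exported database is a single JSON object with a top-level key named `pendingoperations` that maps operation ids to entries. Both the key name and the shape of the entries are assumptions, so confirm them against your own export first. The function name `pending_operations` and the sample data are hypothetical.

```python
import json


def pending_operations(export_text, key="pendingoperations"):
    """Return the pending-operation entries of an exported db, keyed by id.

    The default key name is an assumption about the export layout.
    """
    data = json.loads(export_text)
    entries = data.get(key) or {}  # a missing or null section counts as empty
    return dict(entries)


# Hypothetical export with two pending operations; the entry bodies are left empty.
sample = json.dumps({
    "volumeentries": {},
    "pendingoperations": {"op1": {}, "op2": {}},
})
ops = pending_operations(sample)
print(len(ops), sorted(ops))  # prints: 2 ['op1', 'op2']
```

A non-empty result before step 3 and an empty one afterwards would confirm that the pending operations existed and were then deleted.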

Comment 12 Anjana KD 2018-08-31 00:08:35 UTC
Updated doc text in the Doc Text field. Please review for technical accuracy.

Comment 15 John Mulligan 2018-09-07 19:06:04 UTC
Doc Text looks OK

Comment 17 errata-xmlrpc 2018-09-12 09:23:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2686

