Description of problem: Rebase heketi for CNS-3.10
Prasanth, we don't need a rebase; I will change the description and title. However, upstream has identified three PRs (https://github.com/heketi/heketi/pull/1216, https://github.com/heketi/heketi/pull/1213/, https://github.com/heketi/heketi/pull/1206) that are critical for stability and for recovering from a bad db state. I would like to use this bug to pull in these three PRs.
To complement the updated title: previously, whenever heketi encountered "Id not found" errors from the database, it aborted the operation. With the patches mentioned in comment 7, heketi can skip such errors when they are not critical, so users can continue using heketi.
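The behavior change can be illustrated with a minimal, generic sketch. This is NOT heketi's code (heketi is written in Go); NotFoundError, lookup and load_entries are hypothetical names, and the "critical" flag stands in for whatever logic decides that a missing entry must still abort the operation.

```python
class NotFoundError(Exception):
    """Hypothetical stand-in for a database "Id not found" error."""


def lookup(db, entry_id):
    """Fetch one entry by id, raising NotFoundError if it is missing."""
    try:
        return db[entry_id]
    except KeyError:
        raise NotFoundError(f"Id not found: {entry_id}") from None


def load_entries(db, ids, critical=False):
    """Look up each id; skip missing ids unless the lookup is critical.

    Returns a tuple (found_entries, skipped_ids).
    """
    found, skipped = [], []
    for entry_id in ids:
        try:
            found.append(lookup(db, entry_id))
        except NotFoundError:
            if critical:
                # The operation depends on this entry: abort, as before.
                raise
            # Non-critical: remember the id and keep going instead of aborting.
            skipped.append(entry_id)
    return found, skipped


if __name__ == "__main__":
    db = {"brick-1": {"size": 10}, "brick-3": {"size": 20}}
    found, skipped = load_entries(db, ["brick-1", "brick-2", "brick-3"])
    print(found)
    print(skipped)
```

Before the patches, the equivalent of critical=True applied to every lookup, so a single stale id ended the whole operation.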
1. Kill the heketi pod while creating a bunch of PVCs in a loop.
2. When the pod is restarted, ensure that there are some pending operations. You can see pending operations either with heketi-cli dump db op or by running the heketi db export command and looking into the resulting JSON.
3. Once the existence of pending operations is confirmed, use the db tool to delete the pending operations.
4. Perform node replace or device replace operations and ensure they all pass.
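For step 2, a small helper can list pending operations from the JSON file written by heketi db export. This is a sketch under an assumption: it expects pending operations under a top-level "pendingoperations" key mapping operation id to entry, which should be checked against the export format of the heketi version under test. The script name count_pending_ops.py is hypothetical.

```python
import json
import sys


def pending_operations(export):
    """Return pending-operation entries from a parsed heketi db export.

    ASSUMPTION: pending operations are stored under a top-level
    "pendingoperations" key (operation id -> entry). Verify this against
    the export produced by your heketi version.
    """
    return export.get("pendingoperations") or {}


def main(path):
    """Print the number and ids of pending operations in an export file."""
    with open(path) as f:
        export = json.load(f)
    ops = pending_operations(export)
    print(f"{len(ops)} pending operation(s)")
    for op_id in sorted(ops):
        print(op_id)
    return len(ops)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: count_pending_ops.py EXPORT.json", file=sys.stderr)
    else:
        main(sys.argv[1])
```

Running it on an export taken after step 1 should report a non-zero count; running it again on a fresh export after step 3 should report zero before moving on to step 4.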
Updated doc text in the Doc Text field. Please review for technical accuracy.
Doc Text looks OK
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2686