Bug 1562581

Summary:

Node removal fails when run concurrently with volume deletion

Product:

[Red Hat Storage] Red Hat Gluster Storage

Reporter:

Rachael <rgeorge>

Component:

heketi

Assignee:

Michael Adam <madam>

Status:

CLOSED WONTFIX

QA Contact:

Rachael <rgeorge>

Severity:

high

Docs Contact:

Priority:

unspecified

Version:

cns-3.9

CC:

hchiramm, jmulligan, kramdoss, rgeorge, rhs-bugs, rtalur, storage-qa-internal

Target Milestone:

---

Target Release:

---

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2019-03-12 20:04:32 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

1641915

Attachments:

Description	Flags
heketi_logs	none
topology info	none

Comment 3 krishnaram Karthick 2018-04-01 06:30:33 UTC

Created attachment 1415730
heketi_logs

Comment 4 krishnaram Karthick 2018-04-01 06:31:12 UTC

Created attachment 1415731
topology info

Comment 5 Michael Adam 2018-05-08 06:23:41 UTC

Let me say, this is actually good behavior.
We could have a nicer error message.
But it is correct to not proceed with node removal while the volume delete is operating on the node.

Also thanks for confirming that the db stayed consistent.

Not sure what to make out of this BZ:
Is it a request for a better error message?
Or to block the CLI until the volume delete is done and only afterwards remove the node?
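
The second option (block until the volume delete is done, then remove the node) could be sketched roughly as below. This is only an illustration and not heketi code: `remove_node_when_idle`, `has_pending_delete` and `remove_node` are hypothetical names, and the polling interval and timeout are assumed values.

```python
import time


def remove_node_when_idle(node_id, has_pending_delete, remove_node,
                          poll_interval=1.0, timeout=60.0,
                          sleep=time.sleep, clock=time.monotonic):
    """Wait until no volume delete is operating on node_id, then remove it.

    has_pending_delete and remove_node are hypothetical callables supplied
    by the caller; they stand in for whatever the real service would use.
    """
    deadline = clock() + timeout
    while has_pending_delete(node_id):
        if clock() >= deadline:
            # Give a clear error instead of failing somewhere mid-operation.
            raise TimeoutError(
                f"volume delete still running on node {node_id}")
        sleep(poll_interval)
    return remove_node(node_id)
```

With this shape the caller issues the removal once and either gets the node removed or a single explicit timeout error.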

Comment 6 krishnaram Karthick 2018-05-15 05:55:15 UTC

(In reply to Michael Adam from comment #5)
> Let me say, this is actually good behavior.
> We could have a nicer error message.
> But it is correct to not proceed with node removal while the volume delete
> is operating on the node.

The expectation from this bug is a seamless node removal operation. I believe the error occurs because the brick replace operation fails: the existing brick has already been deleted as part of the volume delete. It would be great if the node removal process handled this gracefully.

> 
> Also thanks for confirming that the db stayed consistent.
> 
> Not sure what to make out of this BZ:
> Is it a request for a better error message?
> Or to block the CLI until the volume delete is done and only afterwards
> remove the node?

As mentioned above, the expectation from this bug is that the node removal process handles this case gracefully. We cannot fail node removal every time a volume delete that involves the same node is running. At scale, the admin would have to run the node removal command several times, which kills the user experience.
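
To make the request concrete, here is a minimal sketch of what "graceful" could mean, assuming the failure really does come from a brick that the volume delete has already removed. It is not heketi's implementation: `evacuate_node`, `replace_brick` and `BrickNotFound` are hypothetical names used only for illustration.

```python
class BrickNotFound(Exception):
    """Raised by the hypothetical replace_brick when the brick is already gone."""


def evacuate_node(bricks, replace_brick):
    """Try to move every brick off a node, tolerating bricks deleted meanwhile.

    Returns a (replaced, skipped) pair of lists of brick ids.
    """
    replaced, skipped = [], []
    for brick in bricks:
        try:
            replace_brick(brick)
        except BrickNotFound:
            # The owning volume was deleted concurrently, so there is
            # nothing left to move; skip it rather than fail the removal.
            skipped.append(brick)
        else:
            replaced.append(brick)
    return replaced, skipped
```

Any other error from the replace step would still propagate, so only the "brick already deleted" case is treated as harmless.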