Bug 1663367 - [RHV-RHGS] Deleting 1TB image file, leads to errors in RHV
Summary: [RHV-RHGS] Deleting 1TB image file, leads to errors in RHV
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: sharding
Version: rhgs-3.4
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
: ---
Assignee: Krutika Dhananjay
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On:
Blocks: 1663368
 
Reported: 2019-01-04 05:16 UTC by SATHEESARAN
Modified: 2019-01-21 11:02 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1663368 (view as bug list)
Environment:
Last Closed: 2019-01-21 10:59:09 UTC
Target Upstream Version:


Description SATHEESARAN 2019-01-04 05:16:46 UTC
Description of problem:
-----------------------
When deleting a VM image file of size 1TB, a sequence of issues/errors is seen in RHV Manager. The SPM goes non-operational and reboots, and sanlock errors are seen. One guess is that latency in the gluster storage domain is causing the problem.


Version-Release number of selected component (if applicable):
--------------------------------------------------------------
RHV 4.0
RHGS 3.4.2

How reproducible:
-----------------
Always

Steps to Reproduce:
--------------------
1. Create a gluster storage domain
2. Create a disk of size 1TB (either preallocate the disk, or thin-allocate it and write some data into the disk)
3. Delete the VM disk from RHV Manager UI

Actual results:
---------------
On the Hosts tab, the host with the SPM role goes inactive. The Events tab shows that a sanlock error has occurred and that the vdsm heartbeat was exceeded on that host, and the SPM host then reboots. VMs running on the SPM host go to an unknown state.

Expected results:
-----------------
No errors and healthy VMs

Comment 1 Krutika Dhananjay 2019-01-04 05:32:12 UTC
Requesting volume-profile for the run where the host went unresponsive.
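
For reference, a volume profile is normally collected with the gluster CLI's `volume profile` subcommands (`start`, `info`, `stop`). The following is only a minimal sketch of wrapping one profiling run around the reproduction. The volume name `vmstore` and the `reproduce` callback are hypothetical placeholders, not names taken from this bug, and the script is assumed to run as root on a host that has the gluster CLI.

```python
import subprocess
from typing import Callable, List


def profile_commands(volume: str) -> List[List[str]]:
    """Return the gluster CLI invocations for one profiling run, in order."""
    base = ["gluster", "volume", "profile", volume]
    return [base + ["start"], base + ["info"], base + ["stop"]]


def collect_profile(volume: str, reproduce: Callable[[], None]) -> str:
    """Start profiling, run the reproduction step, and return the profile text.

    `reproduce` is a hypothetical callback that triggers (or waits for) the
    disk deletion; profiling is stopped even if it raises.
    """
    start, info, stop = profile_commands(volume)
    subprocess.run(start, check=True)
    try:
        reproduce()
        result = subprocess.run(info, check=True, capture_output=True, text=True)
        return result.stdout
    finally:
        subprocess.run(stop, check=True)


if __name__ == "__main__":
    # "vmstore" is an assumed volume name; substitute the real one.
    for cmd in profile_commands("vmstore"):
        print(" ".join(cmd))
```

Run as a script, it only prints the three commands, so they can be reviewed before anything is executed against a live volume.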

Comment 2 SATHEESARAN 2019-01-09 07:02:38 UTC
(In reply to Krutika Dhananjay from comment #1)
> Requesting volume-profile for the run where the host went unresponsive.

Hi Krutika,

We ran a series of tests, and here are the observations.

RHV      RHGS                           Result
4.0.7    3.0 ( 3.8.4-18.6.el7rhgs )     Deleting the image causes problems
4.0.7    3.4.3 ( 3.12.2-35.el7rhgs )    Deleting the image causes problems
4.2.8    3.4.3 ( 3.12.2-35.el7rhgs )    No issues seen while deleting the images

We also tried to test RHGS 3.0 with RHV 4.2.8, but due to dependency problems we were unable to add RHV 4.0 nodes to RHV Manager 4.2.8.
The tests were carried out with preallocated disk images and also with a 1TB thin-provisioned disk filled with
data up to 990GB.

So the results suggest that the RHV 4.0 with RHGS combinations have a problem.


@Sahina, what do you think?

Comment 3 Sahina Bose 2019-01-09 08:54:12 UTC
(In reply to SATHEESARAN from comment #2)

Is the issue with images created using an older version of RHV + RHGS?
Can these images be deleted successfully after updating to the latest versions of RHV & RHGS?

Comment 4 SATHEESARAN 2019-01-09 11:16:32 UTC
(In reply to Sahina Bose from comment #3)
> Is the issue with the images created using older version of RHV + RHGS?
> Can these images be deleted successfully on update to latest versions of RHV
> & RHGS?

The issue was seen with RHV 4.0, including after upgrading to the new gluster version.
However, the RHV version has not yet been upgraded to RHV 4.2.8. I will complete that part of the testing
and let you know the results.

Comment 5 SATHEESARAN 2019-01-21 10:59:09 UTC
This issue is seen with RHGS 3.0 & RHV 4.0.7.

After updating to the latest RHGS 3.4.2 ( glusterfs-3.12.2-32.el7rhgs ) and RHV 4.2.7,
this issue is no longer seen.

I have discussed this with Sahina, and I'm closing this bug as the issue is not seen with the latest gluster builds.

