Description of problem:
-----------------------
When deleting a VM image file of size 1TB, a sequence of issues/errors is seen in RHV Manager. The SPM host goes non-operational and reboots, and sanlock errors are seen. A possible cause is that latency on the gluster storage domain is triggering the problem.

Version-Release number of selected component (if applicable):
--------------------------------------------------------------
RHV 4.0
RHGS 3.4.2

How reproducible:
-----------------
Always

Steps to Reproduce:
--------------------
1. Create a gluster storage domain
2. Create a disk of size 1TB (either preallocate the disk, or thin-provision it and write some data into it)
3. Delete the VM disk from the RHV Manager UI (a minimal reproduction sketch follows after this description)

Actual results:
---------------
On the Hosts tab, the host with the SPM role goes inactive, the Events tab shows that a sanlock error occurred and that the VDSM heartbeat was exceeded on that host, and the SPM host reboots. VMs running on the SPM host go to the Unknown state.

Expected results:
-----------------
No errors and healthy VMs
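To help reproduce the underlying storage behaviour outside the UI, here is a minimal sketch that emulates what the image delete does at the filesystem level: preallocate a large file on the gluster storage domain mount and time how long the unlink takes. The mount point, file name and size below are assumptions for illustration only; the actual image path used by vdsm will differ.

    #!/usr/bin/env python3
    # Minimal sketch: preallocate a large file on a gluster mount and time its
    # removal. The mount point is an assumption; the real storage-domain path
    # used by vdsm will differ.
    import os
    import time

    # Hypothetical gluster storage-domain mount point (assumption).
    MOUNT = "/rhev/data-center/mnt/glusterSD/server:_data"
    PATH = os.path.join(MOUNT, "delete-latency-test.img")
    SIZE = 1 << 40  # 1 TiB; reduce if the volume is smaller.

    # Preallocate the file (roughly what a preallocated RHV disk looks like on disk).
    fd = os.open(PATH, os.O_CREAT | os.O_WRONLY, 0o660)
    try:
        os.posix_fallocate(fd, 0, SIZE)
    finally:
        os.close(fd)

    # Time the unlink; a long delay here points at storage-side latency.
    start = time.monotonic()
    os.unlink(PATH)
    print("unlink took %.2f seconds" % (time.monotonic() - start))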
Requesting volume-profile for the run where the host went unresponsive.
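For anyone collecting the requested data, here is a minimal sketch of capturing a gluster volume profile around the image delete, using the standard `gluster volume profile <VOLNAME> start/info/stop` CLI. The volume name is an assumption; replace it with the affected volume.

    #!/usr/bin/env python3
    # Minimal sketch: capture a gluster volume profile around the image delete.
    import subprocess

    VOLUME = "data"  # hypothetical volume name (assumption)

    def gluster(*args):
        # Run a gluster volume profile subcommand and return its output.
        return subprocess.run(["gluster", "volume", "profile", VOLUME, *args],
                              check=True, capture_output=True, text=True).stdout

    gluster("start")                 # begin collecting per-brick latency stats
    input("Delete the image from the RHV Manager UI, then press Enter...")
    print(gluster("info"))           # FOP latency counters seen during the delete
    gluster("stop")                  # stop profiling once the data is captured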
(In reply to Krutika Dhananjay from comment #1)
> Requesting volume-profile for the run where the host went unresponsive.

Hi Krutika,

We ran a series of tests and here are the observations:

RHV     RHGS                            Result
4.0.7   3.0   (3.8.4-18.6.el7rhgs)      Deleting the image causes problems
4.0.7   3.4.3 (3.12.2-35.el7rhgs)       Deleting the image causes problems
4.2.8   3.4.3 (3.12.2-35.el7rhgs)       No issues seen while deleting the images

We also tried RHGS 3.0 with RHV 4.2.8, but due to dependency problems we were unable to add RHV 4.0 nodes to RHV Manager 4.2.8.
The tests were carried out with preallocated disk images, and also with a 1TB thin-provisioned disk filled with data up to 990GB.

So the results indicate that the RHV 4.0 + RHGS combination has a problem.

@Sahina, what do you think?
(In reply to SATHEESARAN from comment #2)
> (In reply to Krutika Dhananjay from comment #1)
> > Requesting volume-profile for the run where the host went unresponsive.
>
> Hi Krutika,
>
> We ran a series of tests and here are the observations:
>
> RHV     RHGS                            Result
> 4.0.7   3.0   (3.8.4-18.6.el7rhgs)      Deleting the image causes problems
> 4.0.7   3.4.3 (3.12.2-35.el7rhgs)       Deleting the image causes problems
> 4.2.8   3.4.3 (3.12.2-35.el7rhgs)       No issues seen while deleting the images
>
> We also tried RHGS 3.0 with RHV 4.2.8, but due to dependency problems we
> were unable to add RHV 4.0 nodes to RHV Manager 4.2.8.
> The tests were carried out with preallocated disk images, and also with a
> 1TB thin-provisioned disk filled with data up to 990GB.
>
> So the results indicate that the RHV 4.0 + RHGS combination has a problem.
>
> @Sahina, what do you think?

Is the issue with images created using an older version of RHV + RHGS? Can these images be deleted successfully after updating to the latest versions of RHV & RHGS?
(In reply to Sahina Bose from comment #3)
> (In reply to SATHEESARAN from comment #2)
> > (In reply to Krutika Dhananjay from comment #1)
> > > Requesting volume-profile for the run where the host went unresponsive.
> >
> > Hi Krutika,
> >
> > We ran a series of tests and here are the observations:
> >
> > RHV     RHGS                            Result
> > 4.0.7   3.0   (3.8.4-18.6.el7rhgs)      Deleting the image causes problems
> > 4.0.7   3.4.3 (3.12.2-35.el7rhgs)       Deleting the image causes problems
> > 4.2.8   3.4.3 (3.12.2-35.el7rhgs)       No issues seen while deleting the images
> >
> > We also tried RHGS 3.0 with RHV 4.2.8, but due to dependency problems we
> > were unable to add RHV 4.0 nodes to RHV Manager 4.2.8.
> > The tests were carried out with preallocated disk images, and also with a
> > 1TB thin-provisioned disk filled with data up to 990GB.
> >
> > So the results indicate that the RHV 4.0 + RHGS combination has a problem.
> >
> > @Sahina, what do you think?
>
> Is the issue with images created using an older version of RHV + RHGS? Can
> these images be deleted successfully after updating to the latest versions
> of RHV & RHGS?

The issue was seen with RHV 4.0 after upgrading to the newer gluster version, but the RHV side wasn't upgraded to RHV 4.2.8. I will complete that part of the testing and let you know the results.
This issue is seen with RHGS 3.0 & RHV 4.0.7. After updating to the latest RHGS 3.4.2 (glusterfs-3.12.2-32.el7rhgs) and RHV 4.2.7, the issue is no longer seen. I have discussed this with Sahina, and I'm closing this bug as the issue is not seen with the latest gluster builds.