Description of problem:
========================
The storage nodes still contain VM images even after the VMs are deleted from RHEVM.

Version-Release number of selected component (if applicable):
===========================================================
[10/17/12 - 11:27:30 root@rhs-client6 ~]# rpm -qa | grep gluster
glusterfs-geo-replication-3.3.0rhsvirt1-7.el6rhs.x86_64
vdsm-gluster-4.9.6-14.el6rhs.noarch
gluster-swift-plugin-1.0-5.noarch
gluster-swift-container-1.4.8-4.el6.noarch
org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch
glusterfs-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-server-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-rdma-3.3.0rhsvirt1-7.el6rhs.x86_64
gluster-swift-proxy-1.4.8-4.el6.noarch
gluster-swift-account-1.4.8-4.el6.noarch
gluster-swift-doc-1.4.8-4.el6.noarch
glusterfs-fuse-3.3.0rhsvirt1-7.el6rhs.x86_64
glusterfs-debuginfo-3.3.0rhsvirt1-7.el6rhs.x86_64
gluster-swift-1.4.8-4.el6.noarch
gluster-swift-object-1.4.8-4.el6.noarch

[10/17/12 - 11:34:01 root@rhs-client6 ~]# gluster --version
glusterfs 3.3.0rhsvirt1 built on Oct 8 2012 15:23:00

Steps to Reproduce:
1. Create a pure replicate volume (1x2) with 2 servers and 1 brick on each server. This is the storage for the VMs. Start the volume.
2. Set up the cluster, storage domain, and hosts from RHEVM to use this volume as the VM store.
3. Create 5 VMs on the replicate storage domain (from RHEVM): select PXE boot, configure the network, and configure each virtual disk with 30 GB of pre-allocated space.
4. Power off both servers (storage nodes).
   Note: 4 out of the 5 virtual disks were in the Locked state before the servers (storage nodes) were powered off.
5. Power on server2 (storage node 2) immediately.
6. The virtual disk configuration information on the 4 VMs which were in the Locked state is completely lost.
7. The setup was left for around 18 hours. After 18 hours, server2 was up, server1 was down, and host2 (hypervisor 2) was down.
8. Bring back server1.
9. Delete all 5 VMs from RHEVM.

Actual results:
================
The deletion of the 5 VMs is reflected on the hypervisors but not in the back-end storage. Both servers (storage nodes) still held the data of all 5 VMs; the delete operation never took effect on the servers (storage nodes).

Expected results:
==================
The VM images should have been deleted from the storage nodes when the delete operation was performed from RHEVM.

Additional info:
==================
[10/17/12 - 11:43:17 root@rhs-client6 ~]# gluster volume info replicate

Volume Name: replicate
Type: Replicate
Volume ID: b2f1fd96-fcec-4110-81e6-963dba306d00
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: rhs-client6.lab.eng.blr.redhat.com:/disk0
Brick2: rhs-client7.lab.eng.blr.redhat.com:/disk0
Options Reconfigured:
performance.quick-read: disable
performance.io-cache: disable
performance.stat-prefetch: disable
performance.read-ahead: disable
cluster.eager-lock: enable
storage.linux-aio: enable

All of the volume options above were set when the volume was created.
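For reference, the volume was most likely created and tuned with commands along these lines (a reconstruction from the volume info output above, not the captured shell history):

    gluster volume create replicate replica 2 \
        rhs-client6.lab.eng.blr.redhat.com:/disk0 \
        rhs-client7.lab.eng.blr.redhat.com:/disk0
    gluster volume set replicate performance.quick-read disable
    gluster volume set replicate performance.io-cache disable
    gluster volume set replicate performance.stat-prefetch disable
    gluster volume set replicate performance.read-ahead disable
    gluster volume set replicate cluster.eager-lock enable
    gluster volume set replicate storage.linux-aio enable
    gluster volume start replicate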
Not sure which component has the bug here... did the 'unlink()' reach the mountpoint at all? If it never reached the mount, the bug is not in glusterfs; if it did, then it is a glusterfs bug. Need to confirm that.
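One way to confirm this on a reproduced setup (a sketch, assuming the default glusterfs log locations; <mount-log> is a placeholder for the client log named after the mount path):

    # Raise log verbosity on the volume before repeating the delete from RHEVM
    gluster volume set replicate diagnostics.client-log-level TRACE
    gluster volume set replicate diagnostics.brick-log-level TRACE

    # On the hypervisor, check whether the FUSE client ever saw the unlink
    grep -i unlink /var/log/glusterfs/<mount-log>.log

    # On the storage nodes, check whether the bricks received it
    grep -i unlink /var/log/glusterfs/bricks/disk0.log

If the unlink shows up in neither log, the call never reached the mountpoint.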
Shwetha, once you get the details on forcefully removing the files, can you update the bug on whether that is able to remove them?
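For what it's worth, the forceful removal would look roughly like this (a sketch; the storage domain and image UUIDs below are placeholders, not values captured from this setup):

    # Mount the volume on any client and remove the stale image directories
    mount -t glusterfs rhs-client6.lab.eng.blr.redhat.com:/replicate /mnt/vmstore
    rm -rf /mnt/vmstore/<storage-domain-uuid>/images/<image-uuid>
    umount /mnt/vmstore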
Hi Amar, I have asked Haim on the rhev-gluster mailing list for pointers on how to delete VMs directly from the database. Still waiting for his reply; I will ping him on IRC on Monday. Also, we no longer have the setup to try removing the files: since we had to continue our testing, we had to tear down the setup used for this case. Once we have the command, we will try to re-create the problem.
Hi Amar, got a reply from Haim. Here is what he has asked us to do: please refer to the following tables:

[root@test]# psql -U postgres engine -c "\d" | grep vm | grep -v view
 public | tags_vm_map             | table | engine
 public | tags_vm_pool_map        | table | engine
 public | vm_device               | table | engine
 public | vm_dynamic              | table | engine
 public | vm_interface            | table | engine
 public | vm_interface_statistics | table | engine
 public | vm_pool_map             | table | engine
 public | vm_pools                | table | engine
 public | vm_static               | table | engine
 public | vm_statistics           | table | engine

Specifically, delete the VMs from both vm_static and vm_dynamic with a DELETE FROM statement.
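In concrete terms, that would be something like the following (a sketch; the vm_guid value is a placeholder, and since vm_dynamic references vm_static, the dynamic row presumably has to go first):

    # Placeholder GUID -- substitute the vm_guid of the VM to remove
    psql -U postgres engine <<'SQL'
    DELETE FROM vm_dynamic WHERE vm_guid = '00000000-0000-0000-0000-000000000000';
    DELETE FROM vm_static  WHERE vm_guid = '00000000-0000-0000-0000-000000000000';
    SQL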
OK... after doing this, will the VMs actually be removed? If this is able to remove the VM images from storage, I would close the bug, as storage had little role to play here.
Planning to propose this as a known issue with RHEV image hosting, as storage can't do much if it never receives the 'unlink()' call itself. (Updated the doc text.)
Marked for inclusion in Known Issues. Nothing much can be done in the storage layer if the unlinks are never received.
The documentation for the Beta release includes this in the known issues section; in the meantime, there are no tasks for the storage layer to fix this bug.