Created attachment 867495 [details]
sos report from engine and vdsm logs

Description of problem:
Similar to https://bugzilla.redhat.com/show_bug.cgi?id=1026697, only with a gluster storage domain.

The link of the gluster storage domain has disappeared from /rhev/data-center/SPUUID/, but vdsm reports that the domain status is 'active':

[root@green-vdsa images]# vdsClient -s 0 getStoragePoolInfo 235c8719-6c66-4535-b13c-95591998d068
	name = shared1
	isoprefix =
	pool_status = connected
	lver = 0
	spm_id = 1
	master_uuid = eb3bf350-d77f-4213-a664-cf0d40a4a173
	version = 3
	domains = d20d8a88-61cf-484b-9ecd-4ebefbc92d7f:Active,7288f26b-36f9-4352-8309-c507adf59f4f:Active,03356609-057f-4d3b-9afb-57a28517b9f4:Active,eb3bf350-d77f-4213-a664-cf0d40a4a173:Active,e7081a6c-1c86-4f6f-85c1-e39dc6c5c198:Active,1d11607f-6e7e-48df-908f-4b28913aad9d:Active,c87164ff-588b-4b71-808f-4a2386e1e8b3:Active
	type = ISCSI
	master_ver = 2
	d20d8a88-61cf-484b-9ecd-4ebefbc92d7f = {'status': 'Active', 'diskfree': '4017566253056', 'isoprefix': '', 'alerts': [], 'disktotal': '4395899027456', 'version': 3}

And by repoStats:

[root@green-vdsa images]# vdsClient -s 0 repoStats
Domain d20d8a88-61cf-484b-9ecd-4ebefbc92d7f {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.0012317', 'lastCheck': '7.0', 'valid': True}

The mount point exists:

[root@green-vdsa ~]# ll /rhev/data-center/mnt/glusterSD/10.35.102.17\:_elad-ovirt/
total 0
drwxr-xr-x. 4 vdsm kvm 64 Feb 25 10:05 d20d8a88-61cf-484b-9ecd-4ebefbc92d7f
-rwxr-xr-x. 1 vdsm kvm  0 Feb 25 10:01 __DIRECT_IO_TEST__

getVolumeSize for volumes under that domain succeeds without any issue:

Thread-268076::INFO::2014-02-25 17:47:48,652::logUtils::44::dispatcher::(wrapper) Run and protect: getVolumeSize(sdUUID='d20d8a88-61cf-484b-9ecd-4ebefbc92d7f', spUUID='235c8719-6c66-4535-b13c-95591998d068', imgUUID='7c02c1a6-ad75-41b2-9975-a2108b724b5e', volUUID='489f3200-3972-4a49-b33f-8b37d28014ef', options=None)
Thread-268076::INFO::2014-02-25 17:47:48,657::logUtils::47::dispatcher::(wrapper) Run and protect: getVolumeSize, Return response: {'truesize': '7516192768', 'apparentsize': '7516192768'}

But under /rhev/data-center/ the link does not exist:

[root@green-vdsa images]# ls -l /rhev/data-center/235c8719-6c66-4535-b13c-95591998d068/
total 28
lrwxrwxrwx. 1 vdsm kvm  66 Feb 25 10:32 03356609-057f-4d3b-9afb-57a28517b9f4 -> /rhev/data-center/mnt/blockSD/03356609-057f-4d3b-9afb-57a28517b9f4
lrwxrwxrwx. 1 vdsm kvm  66 Feb 25 15:30 1d11607f-6e7e-48df-908f-4b28913aad9d -> /rhev/data-center/mnt/blockSD/1d11607f-6e7e-48df-908f-4b28913aad9d
lrwxrwxrwx. 1 vdsm kvm  85 Feb 25 10:32 7288f26b-36f9-4352-8309-c507adf59f4f -> /rhev/data-center/mnt/10.35.64.81:_export_elad_1/7288f26b-36f9-4352-8309-c507adf59f4f
lrwxrwxrwx. 1 vdsm kvm 100 Feb 25 10:32 c87164ff-588b-4b71-808f-4a2386e1e8b3 -> /rhev/data-center/mnt/lion.qa.lab.tlv.redhat.com:_export_elad_2/c87164ff-588b-4b71-808f-4a2386e1e8b3
lrwxrwxrwx. 1 vdsm kvm  93 Feb 25 10:32 e7081a6c-1c86-4f6f-85c1-e39dc6c5c198 -> /rhev/data-center/mnt/lion.qa.lab.tlv.redhat.com:_test_1/e7081a6c-1c86-4f6f-85c1-e39dc6c5c198
lrwxrwxrwx. 1 vdsm kvm  66 Feb 25 10:32 eb3bf350-d77f-4213-a664-cf0d40a4a173 -> /rhev/data-center/mnt/blockSD/eb3bf350-d77f-4213-a664-cf0d40a4a173
lrwxrwxrwx. 1 vdsm kvm  66 Feb 25 10:32 mastersd -> /rhev/data-center/mnt/blockSD/eb3bf350-d77f-4213-a664-cf0d40a4a173

^^ The link to domain d20d8a88-61cf-484b-9ecd-4ebefbc92d7f does not exist ^^
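For reference, the discrepancy can be confirmed with a few lines of Python run on the host (Python 2, matching this environment). This is a hypothetical diagnostic snippet, not vdsm code; the paths and UUIDs are the ones from this report:

import os

SP_UUID = '235c8719-6c66-4535-b13c-95591998d068'
SD_UUID = 'd20d8a88-61cf-484b-9ecd-4ebefbc92d7f'
MOUNT = '/rhev/data-center/mnt/glusterSD/10.35.102.17:_elad-ovirt'

mount_dir = os.path.join(MOUNT, SD_UUID)
pool_link = os.path.join('/rhev/data-center', SP_UUID, SD_UUID)

# The domain directory under the gluster mount is still there ...
print 'domain dir on mount exists: %s' % os.path.isdir(mount_dir)
# ... but the link created at domain activation is gone.
# lexists() would also catch a dangling link.
print 'pool link exists:           %s' % os.path.lexists(pool_link)

Given the ls output above, the first check would print True and the second False.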
The domain is reported as 'active' by engine, but creating new images under it is not possible:

Thread-283341::ERROR::2014-02-25 16:35:16,751::dispatcher::67::Storage.Dispatcher.Protect::(run) {'status': {'message': "Image does not exist in domain: 'image=19a5e8e9-2811-4b94-8579-bc7ecd92258b, domain=d20d8a88-61cf-484b-9ecd-4ebefbc92d7f'", 'code': 268}}

Version-Release number of selected component (if applicable):
vdsm-4.14.3-0.el6.x86_64
ovirt-engine-3.4.0-0.11.beta3.el6.noarch
libvirt-0.10.2-29.el6_5.2.x86_64
qemu-kvm-rhev-0.12.1.2-2.415.el6_5.3.x86_64
glusterfs-server-3.4.2-1.el6.x86_64
glusterfs-vim-3.2.7-1.el6.x86_64
glusterfs-api-devel-3.4.2-1.el6.x86_64
glusterfs-debuginfo-3.4.2-1.el6.x86_64
glusterfs-libs-3.4.2-1.el6.x86_64
glusterfs-cli-3.4.2-1.el6.x86_64
glusterfs-rdma-3.4.2-1.el6.x86_64
glusterfs-3.4.2-1.el6.x86_64
glusterfs-resource-agents-3.4.2-1.el6.noarch
glusterfs-geo-replication-3.4.2-1.el6.x86_64
glusterfs-api-3.4.2-1.el6.x86_64
glusterfs-fuse-3.4.2-1.el6.x86_64
glusterfs-devel-3.4.2-1.el6.x86_64

How reproducible:
Requires a situation in which the gluster storage domain link disappears; happened to me once on a shared DC with several storage domains (mixed types).

Steps to Reproduce:
1. Create a new gluster storage domain based on a volume that contains 2 bricks.
2. Create images under the new domain (this works without any problem).
3. Wait several hours and try to create a new disk under the domain.

Actual results:
Storage domain activation:

Thread-266535::INFO::2014-02-25 10:03:57,068::logUtils::44::dispatcher::(wrapper) Run and protect: activateStorageDomain(sdUUID='d20d8a88-61cf-484b-9ecd-4ebefbc92d7f', spUUID='235c8719-6c66-4535-b13c-95591998d068', options=None)
Thread-266535::INFO::2014-02-25 10:03:57,990::sp::1104::Storage.StoragePool::(_linkStorageDomain) Linking /rhev/data-center/mnt/glusterSD/10.35.102.17:_elad-ovirt/d20d8a88-61cf-484b-9ecd-4ebefbc92d7f to /rhev/data-center/235c8719-6c66-4535-b13c-95591998d068/d20d8a88-61cf-484b-9ecd-4ebefbc92d7f

The storage domain performed well and I was able to create images under it. After several hours, when I tried to create a new disk under it, I got the "image does not exist in domain" error shown above.

Expected results:
Monitoring the domain should also cover its symbolic link, not only its mount point, so that if the symbolic link gets lost the domain becomes inactive.

Additional info:
sos report from engine and vdsm logs
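For context, the missing link is the one created at activation time by StoragePool._linkStorageDomain (the "Linking ... to ..." line above). Conceptually the operation boils down to a symlink from the domain's mount path into the pool directory; the following is a simplified sketch of that idea, not the actual vdsm implementation:

import errno
import os

def link_storage_domain(src, dst):
    # Expose the domain mounted at src inside the pool directory as dst.
    try:
        # Drop a stale or dangling link left over from a previous activation.
        os.unlink(dst)
    except OSError as e:
        if e.errno != errno.ENOENT:
            raise
    os.symlink(src, dst)

If such a link is removed behind vdsm's back (or never recreated), the domain monitor keeps reporting the domain as valid, since it checks the mount rather than the link, while flows that resolve images through /rhev/data-center/SPUUID/ fail -- which would match the behaviour described above.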
(In reply to Elad from comment #0)
> Created attachment 867495 [details]
>
> Expected results:
> Monitoring the domain should also cover its symbolic link, not only its
> mount point, so that if the symbolic link gets lost the domain becomes
> inactive.

No, the domain monitor should not watch links, and it should not change its state because the link was deleted. We should find out why the link was not created, or why it was removed, and fix that.

However, with the current state of logging in vdsm, that is not possible. We must have a log entry for each path that is created, modified or removed. Once we have that, we can fix this.

I recommend closing this as CANTFIX for now.
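To illustrate the kind of logging meant here, a minimal sketch with hypothetical helper names (this is not existing vdsm code):

import logging
import os

log = logging.getLogger('Storage.fileUtils')

def create_link(src, dst):
    # Leave a trace for every link created, so a missing link can be
    # traced back to its creation (or to the lack of it).
    log.info('Creating symlink %s -> %s', dst, src)
    os.symlink(src, dst)

def remove_path(path):
    # Same for removals: if something deletes the link, the log shows when.
    log.info('Removing path %s', path)
    os.unlink(path)

With such traces, a disappeared link narrows down to either a missing "Creating symlink" line or an unexpected "Removing path" line in the vdsm log.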
This is an automated message. Re-targeting all non-blocker bugs still open on 3.4.0 to 3.4.1.
Fede, IIUC, your fix for bug 1091030 will address this issue too, right?
Yes, this looks like a duplicate of bug 1091030.

It's worth trying to see if gerrit change 27466 fixed this as well.
(In reply to Federico Simoncelli from comment #4)
> Yes, this looks like a duplicate of bug 1091030.
>
> It's worth trying to see if gerrit change 27466 fixed this as well.

Moving to ON_QA based on that statement.
(In reply to Allon Mureinik from comment #5)
> (In reply to Federico Simoncelli from comment #4)
> > Yes, this looks like a duplicate of bug 1091030.
> >
> > It's worth trying to see if gerrit change 27466 fixed this as well.
>
> Moving to ON_QA based on that statement.

oVirt 3.4.1 has been released.