Bug 1069772 - [vdsm] gluster storage domain is reported as 'active' by host, even though its link under /rhev/data-center/SPUUID/ is missing
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: oVirt
Classification: Retired
Component: vdsm
Version: 3.4
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.4.1
Assignee: Federico Simoncelli
QA Contact: Gil Klein
URL:
Whiteboard: storage
Depends On:
Blocks:
 
Reported: 2014-02-25 15:57 UTC by Elad
Modified: 2016-02-10 20:42 UTC
10 users

Fixed In Version: v4.14.8.1
Clone Of:
Environment:
Last Closed: 2014-05-17 18:06:11 UTC
oVirt Team: Storage
Embargoed:


Attachments
sos report from engine and vdsm logs (12.08 MB, application/x-gzip)
2014-02-25 15:57 UTC, Elad


Links
oVirt gerrit 27466 (MERGED): sp: update domain links on state change (last updated 2020-09-10 12:51:05 UTC)

Description Elad 2014-02-25 15:57:16 UTC
Created attachment 867495 [details]
sos report from engine and vdsm logs

Description of problem:
Similar to https://bugzilla.redhat.com/show_bug.cgi?id=1026697, only with a gluster storage domain.
The symbolic link of the gluster storage domain has disappeared from /rhev/data-center/SPUUID/, but vdsm still reports the domain status as 'active', as shown by getStoragePoolInfo:


[root@green-vdsa images]# vdsClient -s 0 getStoragePoolInfo 235c8719-6c66-4535-b13c-95591998d068
        name = shared1
        isoprefix =
        pool_status = connected
        lver = 0
        spm_id = 1
        master_uuid = eb3bf350-d77f-4213-a664-cf0d40a4a173
        version = 3
        domains = d20d8a88-61cf-484b-9ecd-4ebefbc92d7f:Active,7288f26b-36f9-4352-8309-c507adf59f4f:Active,03356609-057f-4d3b-9afb-57a28517b9f4:Active,eb3bf350-d77f-4213-a664-cf0d40a4a173:Active,e7081a6c-1c86-4f6f-85c1-e39dc6c5c198:Active,1d11607f-6e7e-48df-908f-4b28913aad9d:Active,c87164ff-588b-4b71-808f-4a2386e1e8b3:Active
        type = ISCSI
        master_ver = 2
        d20d8a88-61cf-484b-9ecd-4ebefbc92d7f = {'status': 'Active', 'diskfree': '4017566253056', 'isoprefix': '', 'alerts': [], 'disktotal': '4395899027456', 'version': 3}
        
And by repoStats:

[root@green-vdsa images]# vdsClient -s 0 repoStats
Domain d20d8a88-61cf-484b-9ecd-4ebefbc92d7f {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.0012317', 'lastCheck': '7.0', 'valid': True}


Mount point exists:

[root@green-vdsa ~]# ll /rhev/data-center/mnt/glusterSD/10.35.102.17\:_elad-ovirt/
total 0
drwxr-xr-x. 4 vdsm kvm 64 Feb 25 10:05 d20d8a88-61cf-484b-9ecd-4ebefbc92d7f
-rwxr-xr-x. 1 vdsm kvm  0 Feb 25 10:01 __DIRECT_IO_TEST__

getVolumeSize for volumes under that domain succeeds without any issue:

Thread-268076::INFO::2014-02-25 17:47:48,652::logUtils::44::dispatcher::(wrapper) Run and protect: getVolumeSize(sdUUID='d20d8a88-61cf-484b-9ecd-4ebefbc92d7f', spUUID='235c8719-6c66-4535-b13c-95591998d068', imgUUID='7c02c1a6-ad75-41b2-9975-a2108b724b5e', volUUID='489f3200-3972-4a49-b33f-8b37d28014ef', options=None)
Thread-268076::INFO::2014-02-25 17:47:48,657::logUtils::47::dispatcher::(wrapper) Run and protect: getVolumeSize, Return response: {'truesize': '7516192768', 'apparentsize': '7516192768'}


But the domain's link under /rhev/data-center/235c8719-6c66-4535-b13c-95591998d068/ does not exist:

[root@green-vdsa images]# ls -l /rhev/data-center/235c8719-6c66-4535-b13c-95591998d068/
total 28
lrwxrwxrwx. 1 vdsm kvm  66 Feb 25 10:32 03356609-057f-4d3b-9afb-57a28517b9f4 -> /rhev/data-center/mnt/blockSD/03356609-057f-4d3b-9afb-57a28517b9f4
lrwxrwxrwx. 1 vdsm kvm  66 Feb 25 15:30 1d11607f-6e7e-48df-908f-4b28913aad9d -> /rhev/data-center/mnt/blockSD/1d11607f-6e7e-48df-908f-4b28913aad9d
lrwxrwxrwx. 1 vdsm kvm  85 Feb 25 10:32 7288f26b-36f9-4352-8309-c507adf59f4f -> /rhev/data-center/mnt/10.35.64.81:_export_elad_1/7288f26b-36f9-4352-8309-c507adf59f4f
lrwxrwxrwx. 1 vdsm kvm 100 Feb 25 10:32 c87164ff-588b-4b71-808f-4a2386e1e8b3 -> /rhev/data-center/mnt/lion.qa.lab.tlv.redhat.com:_export_elad_2/c87164ff-588b-4b71-808f-4a2386e1e8b3
lrwxrwxrwx. 1 vdsm kvm  93 Feb 25 10:32 e7081a6c-1c86-4f6f-85c1-e39dc6c5c198 -> /rhev/data-center/mnt/lion.qa.lab.tlv.redhat.com:_test_1/e7081a6c-1c86-4f6f-85c1-e39dc6c5c198
lrwxrwxrwx. 1 vdsm kvm  66 Feb 25 10:32 eb3bf350-d77f-4213-a664-cf0d40a4a173 -> /rhev/data-center/mnt/blockSD/eb3bf350-d77f-4213-a664-cf0d40a4a173
lrwxrwxrwx. 1 vdsm kvm  66 Feb 25 10:32 mastersd -> /rhev/data-center/mnt/blockSD/eb3bf350-d77f-4213-a664-cf0d40a4a173


^^Link to domain d20d8a88-61cf-484b-9ecd-4ebefbc92d7f does not exist^^

The domain is reported as 'active' by the engine, but creating new images under it is not possible:

Thread-283341::ERROR::2014-02-25 16:35:16,751::dispatcher::67::Storage.Dispatcher.Protect::(run) {'status': {'message': "Image does not exist in domain: 'image=19a5e8e9-2811-4b94-8579-bc7ecd92258b, domain=d20d8a88-61cf-484b-9ecd-4ebefbc92d7f'", 'code': 268}}
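
For illustration, a minimal helper sketch (not part of vdsm; it assumes the vdsClient output format shown above) that lists the domains reported as Active but missing a link under the pool directory:

#!/usr/bin/env python
# Hypothetical helper, not part of vdsm: list domains that getStoragePoolInfo
# reports as Active but that have no link under /rhev/data-center/<SPUUID>/.
import os
import subprocess

SPUUID = '235c8719-6c66-4535-b13c-95591998d068'
POOL_DIR = os.path.join('/rhev/data-center', SPUUID)

# Parse the 'domains = sdUUID:Status,...' line from the vdsClient output.
out = subprocess.check_output(
    ['vdsClient', '-s', '0', 'getStoragePoolInfo', SPUUID],
    universal_newlines=True)

active = []
for line in out.splitlines():
    line = line.strip()
    if line.startswith('domains ='):
        for entry in line.split('=', 1)[1].split(','):
            sduuid, status = entry.strip().split(':')
            if status == 'Active':
                active.append(sduuid)

linked = set(os.listdir(POOL_DIR))
for sduuid in active:
    if sduuid not in linked:
        print('Active domain %s has no link under %s' % (sduuid, POOL_DIR))

Run against the state above, this should report only d20d8a88-61cf-484b-9ecd-4ebefbc92d7f.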


Version-Release number of selected component (if applicable):
vdsm-4.14.3-0.el6.x86_64
ovirt-engine-3.4.0-0.11.beta3.el6.noarch
libvirt-0.10.2-29.el6_5.2.x86_64
qemu-kvm-rhev-0.12.1.2-2.415.el6_5.3.x86_64
glusterfs-server-3.4.2-1.el6.x86_64
glusterfs-vim-3.2.7-1.el6.x86_64
glusterfs-api-devel-3.4.2-1.el6.x86_64
glusterfs-debuginfo-3.4.2-1.el6.x86_64
glusterfs-libs-3.4.2-1.el6.x86_64
glusterfs-cli-3.4.2-1.el6.x86_64
glusterfs-rdma-3.4.2-1.el6.x86_64
glusterfs-3.4.2-1.el6.x86_64
glusterfs-resource-agents-3.4.2-1.el6.noarch
glusterfs-geo-replication-3.4.2-1.el6.x86_64
glusterfs-api-3.4.2-1.el6.x86_64
glusterfs-fuse-3.4.2-1.el6.x86_64
glusterfs-devel-3.4.2-1.el6.x86_64

How reproducible:
Requires a situation in which the gluster storage domain's link disappears.

Steps to Reproduce:
This happened to me on a shared DC with several storage domains (mixed types).
I created a new gluster storage domain based on a volume that contains 2 bricks.
After the domain was created, I was able to create images under it without any problem.

Actual results:

Storage domain activation:

Thread-266535::INFO::2014-02-25 10:03:57,068::logUtils::44::dispatcher::(wrapper) Run and protect: activateStorageDomain(sdUUID='d20d8a88-61cf-484b-9ecd-4ebefbc92d7f', spUUID='235c8719-6c66-4535-b13c-95591998d068', options=None)


Thread-266535::INFO::2014-02-25 10:03:57,990::sp::1104::Storage.StoragePool::(_linkStorageDomain) Linking /rhev/data-center/mnt/glusterSD/10.35.102.17:_elad-ovirt/d20d8a88-61cf-484b-9ecd-4ebefbc92d7f to /rhev/data-center/235c8719-6c66-4535-b13c-95591998d068/d20d8a88-61cf-484b-9ecd-4ebefbc92d7f

The storage domain performed well and I was able to create images under it. After several hours, when I tried to create a new disk under it, I got the "Image does not exist in domain" error.


Expected results:
Domain monitoring should also cover the domain's symbolic link, not only its mount point, so that if the symbolic link is lost the domain becomes inactive.
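
For illustration only, a minimal sketch of the kind of check described above; the function name and the idea of wiring it into domain monitoring are assumptions, not existing vdsm code:

import logging
import os

def pool_link_is_healthy(spuuid, sduuid):
    """Hypothetical check, not actual vdsm code: report whether the
    per-pool symlink for a domain exists and resolves to a directory."""
    link = os.path.join('/rhev/data-center', spuuid, sduuid)
    if not os.path.islink(link):
        logging.warning('domain link is missing: %s', link)
        return False
    if not os.path.isdir(os.path.realpath(link)):
        logging.warning('domain link %s points to a missing target', link)
        return False
    return True

Whether the monitor should react to a missing link at all is debated in comment 1 below.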

Additional info: sos report from engine and vdsm logs

Comment 1 Nir Soffer 2014-02-25 17:33:18 UTC
(In reply to Elad from comment #0)
> Created attachment 867495 [details]
> 
> Expected results:
> Monitoring the domain should be done also for its symbolic link and not only
> for its mount point, so if the symbolic link get lost, the domain should
> become inactive.

No, the domain monitor should not watch links and should not change its state because a link was deleted.

We should find out why the link was not created, or why it was removed, and fix that.

However, with the current state of logging in vdsm, that is not possible. We need a log entry for each path that is created, modified, or removed; once we have that, we can fix this.
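
For illustration, a minimal sketch of the kind of logging meant here; the wrapper functions and logger name are hypothetical, not existing vdsm helpers:

import logging
import os

log = logging.getLogger('Storage.fileUtils')

# Hypothetical wrappers, not existing vdsm code: log every symlink that is
# created or removed, so that a missing link can later be traced back to the
# operation that removed it.

def create_link(target, link):
    log.info('creating symlink %s -> %s', link, target)
    os.symlink(target, link)

def remove_link(link):
    log.info('removing symlink %s (target: %s)', link, os.path.realpath(link))
    os.unlink(link)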

I recommend closing this as CANTFIX for now.

Comment 2 Sandro Bonazzola 2014-03-04 09:20:55 UTC
This is an automated message.
Re-targeting all non-blocker bugs still open on 3.4.0 to 3.4.1.

Comment 3 Allon Mureinik 2014-05-04 08:05:23 UTC
Fede, IIUC, your fix for bug 1091030 will address this issue too, right?

Comment 4 Federico Simoncelli 2014-05-14 23:03:57 UTC
Yes this looks like a duplicate of bug 1091030.

It's worth trying to see if the gerrit change 27466 fixed this as well.

Comment 5 Allon Mureinik 2014-05-14 23:16:05 UTC
(In reply to Federico Simoncelli from comment #4)
> Yes this looks like a duplicate of bug 1091030.
> 
> It's worth trying to see if the gerrit change 27466 fixed this as well.
Moving to ON_QA based on that statement

Comment 6 Allon Mureinik 2014-05-17 18:06:11 UTC
(In reply to Allon Mureinik from comment #5)
> (In reply to Federico Simoncelli from comment #4)
> > Yes this looks like a duplicate of bug 1091030.
> > 
> > It's worth trying to see if the gerrit change 27466 fixed this as well.
> Moving to ON_QA based on that statement

oVirt 3.4.1 has been released.

