Bug 1443819 - Stale Active LVs in Hosted-Engine Storage Domain
Summary: Stale Active LVs in Hosted-Engine Storage Domain
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-hosted-engine-ha
Version: 4.0.7
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.2.2
: ---
Assignee: Simone Tiraboschi
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-04-20 05:43 UTC by Germano Veit Michel
Modified: 2021-06-10 12:16 UTC
CC: 11 users

Fixed In Version: ovirt-hosted-engine-ha-2.2.5
Doc Type: Bug Fix
Doc Text:
Previously on Self-hosted Engine storage domains, all images (including user created disks) were active only while the required images were open. This could result in stale logical volumes if the user created unused disks on the storage domain. Now ovirt-hosted-engine-ha activates images only when needed.
Clone Of:
Environment:
Last Closed: 2018-05-15 17:32:29 UTC
oVirt Team: Integration
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1275552 0 medium CLOSED [hosted-engine] Disk actions on Hosted-engine storage domain should warn the user or possible risks. 2022-06-30 07:51:40 UTC
Red Hat Bugzilla 1386497 0 low CLOSED [RFE] [Docs][Admin] Add a Warning Box to SD-extension in Admin Guide, warning that extending the HE-SD is not supported. 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 1444671 0 unspecified CLOSED [hosted-engine] Disk actions on Hosted-engine storage domain should warn the user or possible risks. 2021-06-10 12:21:37 UTC
Red Hat Bugzilla 1451653 0 high CLOSED [RFE][HE] - HE storage domain should be treated as a regular storage domain. 2022-06-22 13:40:29 UTC
Red Hat Knowledge Base (Solution) 2726671 0 None None None 2017-04-24 00:14:17 UTC
Red Hat Product Errata RHBA-2018:1472 0 None None None 2018-05-15 17:33:55 UTC
oVirt gerrit 87325 0 master MERGED storage: avoid preparing all the images in advance 2020-01-26 12:58:00 UTC
oVirt gerrit 87326 0 master MERGED vintage: storage: avoid relying on image.prepare_images 2020-01-26 12:58:00 UTC
oVirt gerrit 87546 0 ovirt-hosted-engine-setup-2.2 MERGED vintage: storage: avoid relying on image.prepare_images 2020-01-26 12:58:00 UTC
oVirt gerrit 87584 0 v2.2.z MERGED storage: avoid preparing all the images in advance 2020-01-26 12:58:00 UTC

Internal Links: 1275552 1386497 1444671 1451653

Description Germano Veit Michel 2017-04-20 05:43:12 UTC
Description of problem:

1. Host boots and connects to storage
2. RHEL activates all LVs
3. VDSM deactivates all LVs on bootstrap.

All this works well. However, for the Hosted-Engine Storage Domain, most if not all image LVs are active but not open right after vdsm initialization, because they are all activated again a few seconds after vdsm's initialization deactivates them.

These are stale LVs, and this is undesirable; it even caused corruption before Nir's --refresh patch.
We don't want to rely on --refresh all the time; these LVs should not be active.

It's ovirt-ha-agent that asks VDSM to prepare all images in the HE SD, and the unused LVs are never deactivated.

See:

# cat /etc/ovirt-hosted-engine/hosted-engine.conf  | grep sdUUID
sdUUID=b1806393-a63b-4c0e-a4ab-4fad369c1654

Now let's see how many active but not open image LVs (disk LVs carry tags; see the IU_ prefix) we have there.

# lvs -o +tags | grep b1806393-a63b-4c0e-a4ab-4fad369c1654 | grep IU_ | grep '\-wi\-a\-' | wc -l
13

13 is too many to be just OVF stores or Hosted-Engine configuration volumes.

Let's see how many are not active:

# lvs -o +tags | grep b1806393-a63b-4c0e-a4ab-4fad369c1654 | grep IU_ | grep '\-wi\-\-\-' | wc -l
0

One example, for a disk that I created 1 minute ago:

# lvs -o +tags | grep 85b71ffb-47e3-47bf-af7a-ce135655cc4f
  de5c96de-6c8d-4b37-a7ba-8d922d95a63c b1806393-a63b-4c0e-a4ab-4fad369c1654 -wi-a-----   1.00g                                                                      IU_85b71ffb-47e3-47bf-af7a-ce135655cc4f,MD_15,PU_00000000-0000-0000-0000-000000000000      
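For reference, the same active-but-not-open check can be scripted instead of chaining greps. This is a minimal sketch (mine, not from the bug), parsing output in the shape of `lvs --noheadings -o lv_name,vg_name,lv_attr,lv_tags`; in the lv_attr string, position 5 is the state bit ('a' = active) and position 6 is the device-open bit ('o' = open):

```python
def active_not_open(lvs_output, vg_uuid):
    """Return names of disk-image LVs in vg_uuid that are active but not open."""
    stale = []
    for line in lvs_output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # special LVs (ids, inbox, ...) have no tags column
        name, vg, attr, tags = fields[:4]
        if vg != vg_uuid or "IU_" not in tags:
            continue  # only disk-image LVs carry an IU_ tag
        if attr[4] == "a" and attr[5] != "o":
            stale.append(name)
    return stale

# Shortened sample data in the same shape as the lvs output above
sample = """\
lv1 b1806393-a63b-4c0e-a4ab-4fad369c1654 -wi-a----- IU_85b71ffb,MD_15,PU_0000
lv2 b1806393-a63b-4c0e-a4ab-4fad369c1654 -wi-ao---- IU_cc594e63,MD_4,PU_0000
ids b1806393-a63b-4c0e-a4ab-4fad369c1654 -wi-ao----
"""
print(active_not_open(sample, "b1806393-a63b-4c0e-a4ab-4fad369c1654"))  # ['lv1']
```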

From my investigation, this is what happens:

4. ovirt-ha-agent asks vdsm to prepare all images of the HE SD. So vdsm activates all of them right after boot.

    def _initialize_storage_images(self):
        [....]
        img.prepare_images()
        [....]

The prepare_images docstring describes this behavior well:

    def prepare_images(self):
        """
        It scans for all the available images and volumes on the hosted-engine
        storage domain and for each of them calls prepareImage on VDSM.
        prepareImage will create the needed symlinks and it will activate
        the LV if on block devices.
        """

5. So once we see this in the agent log, all the images are active.

MainThread::INFO::2017-04-19 15:43:16,141::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Connecting the storage
MainThread::INFO::2017-04-19 15:43:16,142::storage_server::219::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2017-04-19 15:43:18,353::storage_server::233::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::INFO::2017-04-19 15:43:18,669::hosted_engine::666::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Preparing images
MainThread::INFO::2017-04-19 15:43:18,669::image::126::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images) Preparing images

6. And nothing asks vdsm to tear the unused ones down, because ovirt-ha-agent only calls teardown_images() when this exception is raised:

    def _initialize_storage_images(self):
        [....]
        try:
            sserver.connect_storage_server()
        except ex.DuplicateStorageConnectionException:
            [....]
            img.teardown_images()
            [....]
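A complementary sketch of the cleanup that is missing here: tear down every image on the domain that the agent does not need, rather than only calling teardown on that one exception path. The names (`cli.teardownImage` and its argument order) are hypothetical:

```python
# Hypothetical sketch: deactivate every image on the domain that is not
# in the set the agent needs, instead of tearing down only when a
# DuplicateStorageConnectionException occurs.

def teardown_unused_images(cli, sp_uuid, sd_uuid, all_images, needed):
    """Tear down images not in `needed`; return those successfully torn down."""
    torn_down = []
    for img_uuid in all_images:
        if img_uuid in needed:
            continue
        try:
            cli.teardownImage(sp_uuid, sd_uuid, img_uuid)
            torn_down.append(img_uuid)
        except Exception:
            continue  # best effort: one failure should not block the rest
    return torn_down
```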

Version-Release number of selected component (if applicable):
It's reproducible on pretty much every RHV version, from 3.6 all the way to the latest.
Just tested it on:
ovirt-hosted-engine-ha-2.0.6-1.el7ev.noarch
vdsm-4.18.21-1.el7ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Add a disk to the hosted_storage domain
2. Wait one ovirt-ha-agent cycle
3. Check the host's LVs

Actual results:
Stale LVs are left active.

Expected results:
No stale LVs.

Comment 2 Yaniv Kaul 2017-04-20 05:46:50 UTC
Those are only disks on the HE SD, which should only be the HE VM disks, right?

Comment 3 Germano Veit Michel 2017-04-20 05:51:05 UTC
(In reply to Yaniv Kaul from comment #2)
> Those are only disks on the HE SD, which should only be the HE VM disks,
> right?

No, we allow using the HE SD as a normal SD.

So users can create as many disks as they want in the HE SD and attach them to VMs.

Comment 5 Martin Sivák 2017-04-20 08:12:20 UTC
The code allows that, but we have always said it is not supported. It is going to change, but we haven't gotten to it yet.

Comment 6 Germano Veit Michel 2017-04-23 23:49:10 UTC
(In reply to Martin Sivák from comment #5)
> The code allows that, but we have always said it is not supported. It is
> going to change, but we haven't gotten to it yet.

Hi Martin,

Thanks for linking 1275552. I think it needs to be escalated ASAP; I will do it now.

So if it is not supported, this must be made VERY clear, including a warning or blocking the action in the Portal.

There is only a small note buried here: https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.0/html-single/self-hosted_engine_guide/

Saying "The self-hosted engine requires a shared storage domain dedicated to the Manager virtual machine." is not enough. We must add a warning/experimental label as we have for OVS.

Depending on the Storage Domain names, `hosted_storage` may even be the default target when creating a new disk (as it is in our labs), due to alphabetical ordering.

I understand the previous bugs related to 1275552 were mostly about performance or compromised HA. But this BZ, stale LVs, can lead to data corruption, as we have already seen in the past with stale LVs (BZ1358348). We cannot rely on the lvm refresh patch to always save VMs from corruption in the HE SD; that is a safety-net mechanism. This is very serious.

Comment 7 Yaniv Lavi 2017-05-14 14:20:18 UTC
We plan to make the HE SD a normal SD in the system, I'll be closing the other bugs.

Comment 8 Allon Mureinik 2017-05-14 14:26:24 UTC
(In reply to Yaniv Dary from comment #7)
> We plan to make the HE SD a normal SD in the system, I'll be closing the
> other bugs.

So please do.
There's no actionable item here for storage.
Setting devel cond-nack until a clear requirement arises.

Comment 9 Yaniv Lavi 2018-02-05 09:51:40 UTC
Any improvement on this in 4.2?

Comment 10 Simone Tiraboschi 2018-02-08 10:15:52 UTC
(In reply to Yaniv Lavi from comment #9)
> Any improvement on this in 4.2?

It will probably be worse on this side.
In the node-zero flow, just after setup, the hosted-engine storage domain is active in the engine and it is the master storage domain.
On the technical side, no other storage domain is required to start using the system.

Although we don't recommend it in our documentation, the user can create other VMs on the hosted-engine storage domain and the engine doesn't complain at all,
so in the end the user can have more disks on the hosted-engine storage domain and therefore more LVs.

On the ovirt-ha-agent side it's still almost the same: prepare_images prepares all the images found on the storage domain, which results in LVs that are active but not open.

Comment 12 Nikolai Sednev 2018-02-20 18:12:34 UTC
I've deployed a clean environment over iSCSI using ansible and created 10 disks of 2 GB each on the hosted storage.

Here is what I saw just after creating the disks, from the host:
[root@alma03 ~]# lvs -o +tags
  LV                                   VG                                   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert LV Tags                                                                              
  0196b277-7292-4512-aeb5-71795dd58ce9 0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-------   2.00g                                                     IU_0740975b-3d77-4e63-b045-b6de00580139,MD_9,PU_00000000-0000-0000-0000-000000000000 
  02c9ade8-3d76-45f3-85b4-104b997b54af 0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-------   2.00g                                                     IU_9bd6d63e-3915-446e-8040-910100fb48d8,MD_8,PU_00000000-0000-0000-0000-000000000000 
  17ed51d4-df2e-40c1-a4eb-ced43546ed16 0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-ao----   1.00g                                                     IU_cc594e63-b22b-40ca-9c12-6c9576c10372,MD_4,PU_00000000-0000-0000-0000-000000000000 
  1a06eeb9-4230-4a44-ba7d-291616fed6ac 0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-------   2.00g                                                     IU_59eb6bf7-4c4a-41b2-a6e5-42744bcb0b93,MD_11,PU_00000000-0000-0000-0000-000000000000
  29be3876-d794-422c-97bc-a1101d83530b 0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-a-----   1.00g                                                     IU_e8ba1447-e2c8-4d0d-bbda-016e35e3483d,MD_6,PU_00000000-0000-0000-0000-000000000000 
  47c56bed-8332-4b99-9083-31ba817bed3c 0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-------   2.00g                                                     IU_fe0ab22c-8d92-4ca9-9445-05eb142b3f59,MD_13,PU_00000000-0000-0000-0000-000000000000
  663f0706-b715-41b0-86e6-b42d00af9447 0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-------   2.00g                                                     IU_34bb3d89-5a8c-414d-8a2d-e42677e49cc6,MD_14,PU_00000000-0000-0000-0000-000000000000
  77f70737-2c51-41f4-9181-8c1be655027a 0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-a-----   1.00g                                                     IU_42c15741-4d08-4360-b968-0f43e6abd284,MD_5,PU_00000000-0000-0000-0000-000000000000 
  816db21b-b746-4566-bd95-932acc5a6814 0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-------   2.00g                                                     IU_8868b703-d716-4c2c-8fe0-deab42f298bd,MD_10,PU_00000000-0000-0000-0000-000000000000
  a20c86c4-ef74-4d72-9c7a-fe1ed0ed7739 0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-------   2.00g                                                     IU_bd35c468-4501-4189-99c9-bbe24d6fbf87,MD_12,PU_00000000-0000-0000-0000-000000000000
  a52b729b-836f-46f7-8046-1c345e0143d8 0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-ao----  50.00g                                                     IU_1f0754c4-2066-44bf-a044-94c8ea279b41,MD_7,PU_00000000-0000-0000-0000-000000000000 
  b1648a60-a21c-4ae2-a3f4-dd21eacce714 0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-------   2.00g                                                     IU_9f22be1b-2277-48d6-be3d-4cf2ac57f651,MD_16,PU_00000000-0000-0000-0000-000000000000
  c843ead8-3f03-4a97-8bde-3655308e466d 0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-------   2.00g                                                     IU_beaf4558-cd4a-4ce7-bff1-2e9a8adabef1,MD_17,PU_00000000-0000-0000-0000-000000000000
  ce68569a-5537-4cdc-ac4e-2de72ce25259 0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-------   2.00g                                                     IU_9eea0a1e-3139-42c8-8700-c08c44812518,MD_15,PU_00000000-0000-0000-0000-000000000000
  ids                                  0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-ao---- 128.00m                                                                                                                                          
  inbox                                0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-a----- 128.00m                                                                                                                                          
  leases                               0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-a-----   2.00g                                                                                                                                          
  master                               0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-ao----   1.00g                                                                                                                                          
  metadata                             0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-a----- 512.00m                                                                                                                                          
  outbox                               0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-a----- 128.00m                                                                                                                                          
  xleases                              0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-a-----   1.00g 

Then I restarted ha-agent and broker and checked again:
                                                                                                                                         
[root@alma03 ~]# systemctl restart ovirt-ha-broker &&  systemctl restart ovirt-ha-agent
[root@alma03 ~]# lvs -o +tags
  LV                                   VG                                   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert LV Tags                                                                              
  0196b277-7292-4512-aeb5-71795dd58ce9 0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-------   2.00g                                                     IU_0740975b-3d77-4e63-b045-b6de00580139,MD_9,PU_00000000-0000-0000-0000-000000000000 
  02c9ade8-3d76-45f3-85b4-104b997b54af 0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-------   2.00g                                                     IU_9bd6d63e-3915-446e-8040-910100fb48d8,MD_8,PU_00000000-0000-0000-0000-000000000000 
  17ed51d4-df2e-40c1-a4eb-ced43546ed16 0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-ao----   1.00g                                                     IU_cc594e63-b22b-40ca-9c12-6c9576c10372,MD_4,PU_00000000-0000-0000-0000-000000000000 
  1a06eeb9-4230-4a44-ba7d-291616fed6ac 0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-------   2.00g                                                     IU_59eb6bf7-4c4a-41b2-a6e5-42744bcb0b93,MD_11,PU_00000000-0000-0000-0000-000000000000
  29be3876-d794-422c-97bc-a1101d83530b 0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-a-----   1.00g                                                     IU_e8ba1447-e2c8-4d0d-bbda-016e35e3483d,MD_6,PU_00000000-0000-0000-0000-000000000000 
  47c56bed-8332-4b99-9083-31ba817bed3c 0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-------   2.00g                                                     IU_fe0ab22c-8d92-4ca9-9445-05eb142b3f59,MD_13,PU_00000000-0000-0000-0000-000000000000
  663f0706-b715-41b0-86e6-b42d00af9447 0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-------   2.00g                                                     IU_34bb3d89-5a8c-414d-8a2d-e42677e49cc6,MD_14,PU_00000000-0000-0000-0000-000000000000
  77f70737-2c51-41f4-9181-8c1be655027a 0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-a-----   1.00g                                                     IU_42c15741-4d08-4360-b968-0f43e6abd284,MD_5,PU_00000000-0000-0000-0000-000000000000 
  816db21b-b746-4566-bd95-932acc5a6814 0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-------   2.00g                                                     IU_8868b703-d716-4c2c-8fe0-deab42f298bd,MD_10,PU_00000000-0000-0000-0000-000000000000
  a20c86c4-ef74-4d72-9c7a-fe1ed0ed7739 0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-------   2.00g                                                     IU_bd35c468-4501-4189-99c9-bbe24d6fbf87,MD_12,PU_00000000-0000-0000-0000-000000000000
  a52b729b-836f-46f7-8046-1c345e0143d8 0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-ao----  50.00g                                                     IU_1f0754c4-2066-44bf-a044-94c8ea279b41,MD_7,PU_00000000-0000-0000-0000-000000000000 
  b1648a60-a21c-4ae2-a3f4-dd21eacce714 0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-------   2.00g                                                     IU_9f22be1b-2277-48d6-be3d-4cf2ac57f651,MD_16,PU_00000000-0000-0000-0000-000000000000
  c843ead8-3f03-4a97-8bde-3655308e466d 0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-------   2.00g                                                     IU_beaf4558-cd4a-4ce7-bff1-2e9a8adabef1,MD_17,PU_00000000-0000-0000-0000-000000000000
  ce68569a-5537-4cdc-ac4e-2de72ce25259 0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-------   2.00g                                                     IU_9eea0a1e-3139-42c8-8700-c08c44812518,MD_15,PU_00000000-0000-0000-0000-000000000000
  ids                                  0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-ao---- 128.00m                                                                                                                                          
  inbox                                0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-a----- 128.00m                                                                                                                                          
  leases                               0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-a-----   2.00g                                                                                                                                          
  master                               0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-ao----   1.00g                                                                                                                                          
  metadata                             0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-a----- 512.00m                                                                                                                                          
  outbox                               0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-a----- 128.00m                                                                                                                                          
  xleases                              0d528e5a-43f8-4b73-b53c-61a909def9e7 -wi-a-----   1.00g 


All disks showed -wi------- as expected; none of them were active, whether open or not.

Moving to verified.

Worked for me on these components on host:
rhvm-appliance-4.2-20180202.0.el7.noarch
ovirt-hosted-engine-ha-2.2.5-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.10-1.el7ev.noarch
Red Hat Enterprise Linux Server release 7.4 (Maipo)
Linux 3.10.0-693.19.1.el7.x86_64 #1 SMP Thu Feb 1 12:34:44 EST 2018 x86_64 x86_64 x86_64 GNU/Linux

Comment 15 errata-xmlrpc 2018-05-15 17:32:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1472

Comment 16 Franta Kust 2019-05-16 13:03:20 UTC
BZ<2>Jira Resync

