Bug 1883157 - Upgrade to 4.4.2 will fail due to dangling symlinks
Summary: Upgrade to 4.4.2 will fail due to dangling symlinks
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-node
Classification: oVirt
Component: Installation & Update
Version: 4.4.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.4.3-1
Target Release: 4.4.3
Assignee: Nir Levy
QA Contact: peyu
URL:
Whiteboard:
Depends On:
Blocks: 1886647 1895356
 
Reported: 2020-09-28 08:35 UTC by Jean-Louis Dupond
Modified: 2020-11-27 15:50 UTC
CC: 11 users

Fixed In Version: ovirt-node-ng-image-4.4.3
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-27 15:50:38 UTC
oVirt Team: Node
Embargoed:
pm-rhel: ovirt-4.4+
peyu: testing_plan_complete+
pm-rhel: planning_ack+
sbonazzo: devel_ack+
peyu: testing_ack+


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 111752 0 master MERGED spec, pre scriptlet, ignore non local storages 2021-01-15 03:19:21 UTC
oVirt gerrit 112102 0 ovirt-4.4 MERGED spec, pre scriptlet, ignore non local storages 2021-01-15 03:18:44 UTC

Description Jean-Louis Dupond 2020-09-28 08:35:16 UTC
Description of problem:
When trying to upgrade a 4.4.1 oVirt Node to oVirt 4.4.2, the upgrade fails due to dangling symlinks from an iSCSI Storage Domain.


How reproducible:
Upgrade an oVirt Node 4.4.1 host with iSCSI Storage Domains to 4.4.2


Steps to Reproduce:
1. Run yum upgrade on the node

Actual results:
  Running scriptlet: ovirt-node-ng-image-update-4.4.2-1.el8.noarch    1/3
Local storage domains were found on the same filesystem as / ! Please migrate the data to a new LV before upgrading, or you will lose the VMs
See: https://bugzilla.redhat.com/show_bug.cgi?id=1550205#c3
Storage domains were found in:
	/rhev/data-center/mnt/blockSD/6e99da85-8414-4ec5-92c3-b6cf741fc125/dom_md
	/rhev/data-center/mnt/blockSD/37a74cff-19be-44a2-98f9-0720745fa4b5/dom_md
	/rhev/data-center/mnt/blockSD/0040a08b-36ea-4bdb-ba93-f4d55321bb97/dom_md
	/rhev/data-center/mnt/blockSD/2c5eef3e-b40e-4ea5-8c97-07e5114381ac/dom_md
error: %prein(ovirt-node-ng-image-update-4.4.2-1.el8.noarch) scriptlet failed, exit status 1


Expected results:
The symlinks should have been cleaned/ignored, so the upgrade can complete.


Additional info:
This was introduced in https://bugzilla.redhat.com/show_bug.cgi?id=1850378
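For context, the fix for that bug added a %pre scriptlet check that aborts the upgrade when storage-domain metadata is found on the root filesystem. A rough sketch of that kind of check (hypothetical; the actual scriptlet may differ in detail):

    # Abort the upgrade if any dom_md directory is found on the
    # root filesystem (-xdev keeps find on the / filesystem only).
    found=$(find / -xdev -name dom_md -type d 2>/dev/null)
    if [ -n "$found" ]; then
        echo "Local storage domains were found on the same filesystem as / !"
        echo "Storage domains were found in:"
        echo "$found"
        exit 1
    fi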

Comment 1 Sandro Bonazzola 2020-09-28 08:44:13 UTC
Nir Soffer, any chance this is a vdsm bug not removing symlinks on block storage domain deactivation?

Comment 2 Jean-Louis Dupond 2020-09-28 09:25:07 UTC
When checking the dangling symlinks, it seems they all belong to already-removed snapshots of VMs.
The logs don't say anything about removing the LV after merging the snapshot, either.

There are also symlinks still pointing to an 'active' /dev/xxx/xxx, but that LV doesn't exist anymore:
# fdisk -l /dev/6e99da85-8414-4ec5-92c3-b6cf741fc125/06a5da70-f29c-41f2-a063-aa22677e7bdc
fdisk: cannot open /dev/6e99da85-8414-4ec5-92c3-b6cf741fc125/06a5da70-f29c-41f2-a063-aa22677e7bdc: Input/output error
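
For reference, the remaining dangling links can be listed with GNU find; a small sketch (paths in angle brackets are placeholders):

    # -xtype l matches symlinks whose target does not resolve.
    find /rhev/data-center/mnt/blockSD -xtype l

    # For an individual link, readlink -e prints nothing and fails
    # when the target chain is broken:
    readlink -e /rhev/data-center/mnt/blockSD/<sd_uuid>/images/<img_uuid>/<vol_uuid> || echo dangling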

Comment 3 peyu 2020-09-29 02:42:54 UTC
The behavior described in this bug seems to be the intended response to the upgrade:
after Bug 1850378 was fixed, the host upgrade is blocked if local storage is defined (or found) on the / (root) filesystem.

Comment 4 Nir Levy 2020-09-29 08:12:59 UTC
Can you please specify how you migrated the snapshot,
and what the status is of the folders under
/rhev/data-center/mnt/blockSD/ ?


The root cause of what you are encountering is that RHVH will not upgrade on a disk that holds
any relevant data from the previous installation, to avoid data loss such as snapshots and VMs that have not been migrated.

If you did this manually, you should handle all content of those folders (the folders themselves can be left as empty folders).
If you used the admin pages, we need to make sure we are cleaning up all links after migration,

but for us to know that, we need to know how you got into this situation.
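
Concretely, a possible stopgap sketch for handling that content (hypothetical, not a vetted procedure):

    # CAUTION: run only with the host in maintenance mode and all storage
    # domains deactivated on this host, and only after confirming the
    # content is leftover. Clears everything under each blockSD directory
    # but keeps the (then empty) directories themselves.
    for d in /rhev/data-center/mnt/blockSD/*/; do
        find "$d" -mindepth 1 -delete
    done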

Comment 5 Jean-Louis Dupond 2020-09-29 08:43:08 UTC
Hi Nir,

Well, the snapshots were not migrated. They are old symlinks from already removed/merged snapshots.

It's quite easy to reproduce:

1. Create a VM on iSCSI Storage Domain
2. Create snapshot on it
3. Delete snapshot
4. You'll see dangling symlinks

It seems RemoveSnapshotSingleDiskLive does not properly remove the symlinks.

Comment 6 Nir Soffer 2020-09-29 09:45:58 UTC
(In reply to Jean-Louis Dupond from comment #0)
> Description of problem:
> When trying to upgrade a 4.4.1 oVirt Node to oVirt 4.4.2 the upgrade fails
> due to dangling symlinks from iSCSI Storage Domain.
> 
> 
> How reproducible:
> Upgrade a oVirt Node 4.4.1 with iSCSI Storage domains to 4.4.2
> 
> 
> Steps to Reproduce:
> 1. Run yum upgrade on the node
> 
> Actual results:
>   Running scriptlet: ovirt-node-ng-image-update-4.4.2-1.el8.noarch          
> 1/3 
> Local storage domains were found on the same filesystem as / ! Please
> migrate the data to a new LV before upgrading, or you will lose the VMs
> See: https://bugzilla.redhat.com/show_bug.cgi?id=1550205#c3
> Storage domains were found in:
> 	/rhev/data-center/mnt/blockSD/6e99da85-8414-4ec5-92c3-b6cf741fc125/dom_md
> 	/rhev/data-center/mnt/blockSD/37a74cff-19be-44a2-98f9-0720745fa4b5/dom_md
> 	/rhev/data-center/mnt/blockSD/0040a08b-36ea-4bdb-ba93-f4d55321bb97/dom_md
> 	/rhev/data-center/mnt/blockSD/2c5eef3e-b40e-4ea5-8c97-07e5114381ac/dom_md

This check in the scriptlet is wrong. These are not local storage domains but 
block storage domains.

The check for local storage domains should exclude /rhev/. Local storage
domains are never created in this location.

Local storage domains can be created anywhere outside /rhev. For every
local fs storage domain we will have a symlink:

    /rhev/data-center/mnt/_path_to_local_dir -> /path/to/local/dir

> error: %prein(ovirt-node-ng-image-update-4.4.2-1.el8.noarch) scriptlet
> failed, exit status 1
> 
> 
> Expected results:
> The symlinks should have been cleaned/ignored, so the upgrade can complete.

Removing the symlinks would be nice but it is not required for normal
operation of the system.

> Additional info:
> This was introduced in https://bugzilla.redhat.com/show_bug.cgi?id=1850378

Correct, the fix for that bug is incorrect.

Comment 7 Nir Soffer 2020-09-29 09:50:36 UTC
(In reply to Sandro Bonazzola from comment #1)
> Nir Soffer, any chance this is a vdsm bug not removing symlinks on block
> storage domain deactivation?

Vdsm never removes symlinks in /rhev/data-center/mnt/blockSD/*/. It would
be nice to remove them but we can never guarantee that the links are removed,
for example if vdsm is killed.

The issue in this bug is an incorrect search for local fs storage domains.
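
A minimal sketch of what pruning /rhev from that search could look like (the merged fix on gerrit 111752 may differ in detail):

    # Skip /rhev entirely, since local storage domains are never created
    # there; only then look for dom_md directories on the root filesystem.
    found=$(find / -xdev -path /rhev -prune -o -name dom_md -type d -print 2>/dev/null)
    if [ -n "$found" ]; then
        echo "Local storage domains were found on the same filesystem as / !"
        exit 1
    fi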

Comment 8 RHEL Program Management 2020-10-20 07:39:17 UTC
The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

Comment 10 Sandro Bonazzola 2020-11-13 16:10:00 UTC
$ git tag --contains a6ed080d886db7db73d912a68858e2e52558fc04
ovirt-node-ng-image-4.4.3

Comment 11 peyu 2020-11-16 06:27:29 UTC
The bug has been resolved in "ovirt-node-ng-image-update-4.4.3-1.el8".

Test Version:
host: ovirt-node-ng-image-update-4.4.3-1.el8
oVirt: 4.4.1.4-1.el8


Test Steps:
1. Install ovirt-node-ng-installer-4.4.1-2020072310.el8.iso on an iSCSI machine
2. Set up local repos and point them to "ovirt-node-ng-image-update-4.4.3-1.el8.noarch.rpm"
3. Add the host to oVirt
4. Add an iSCSI storage domain and wait for its status to become "Active"
5. Create a VM on the iSCSI Storage Domain
6. Create a snapshot on it
7. Delete the snapshot
8. Move the host to maintenance mode
9. Upgrade the host
   # yum update
   # reboot
10. Activate the host via oVirt
11. Start the VM

Actual results:
Upgrade is successful. The status of the iSCSI storage domain is "Active" and the VM can start up successfully after the upgrade.

Moving the bug status to "VERIFIED".

