Created attachment 1769242 [details]
engine logs

Description of problem:
Fail to add iSCSI storage

Version-Release number of selected component (if applicable):
4.4.6.1-0.11.el8ev

How reproducible:
100%

Steps to Reproduce:
1. Provision host with RHEL 8.4
2. Deploy oVirt standalone
3. Add storage using the 'ovirt-ansible-collection' 'storages' role

Actual results:
Fails to add storage

Expected results:
Should succeed

Additional info:

2021-04-05 12:39:51,174+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStorageDomainVDSCommand] (default task-13) [26cd6445] START, CreateStorageDomainVDSCommand(HostName = host_mixed_1, CreateStorageDomainVDSCommandParameters:{hostId='e9bac068-3c18-4598-b770-006668898a3c', storageDomain='StorageDomainStatic:{name='iscsi_0', id='9ea2871d-844b-4d9e-8ab5-2657681dc7ca'}', args='grXCdu-qaca-5KQS-PIGu-06w8-czv0-jqWfc3'}), log id: 5a475ff8

2021-04-05 12:39:52,994+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStorageDomainVDSCommand] (default task-13) [26cd6445] Failed in 'CreateStorageDomainVDS' method

2021-04-05 12:39:53,000+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-13) [26cd6445] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_1 command CreateStorageDomainVDS failed: Cannot create Logical Volume: "vgname=9ea2871d-844b-4d9e-8ab5-2657681dc7ca lvname=master err=['WARNING: ext3 signature detected on /dev/9ea2871d-844b-4d9e-8ab5-2657681dc7ca/master at offset 1080. Wipe it? [y/n]: [n]', ' Aborted wiping of ext3.', ' 1 existing signature left on the device.', ' Failed to wipe signatures on logical volume 9ea2871d-844b-4d9e-8ab5-2657681dc7ca/master.', ' Aborting. Failed to wipe start of new LV.']"

2021-04-05 12:39:53,001+03 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStorageDomainVDSCommand] (default task-13) [26cd6445] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStorageDomainVDSCommand' return value 'StatusOnlyReturn [status=Status [code=550, message=Cannot create Logical Volume: "vgname=9ea2871d-844b-4d9e-8ab5-2657681dc7ca lvname=master err=['WARNING: ext3 signature detected on /dev/9ea2871d-844b-4d9e-8ab5-2657681dc7ca/master at offset 1080. Wipe it? [y/n]: [n]', ' Aborted wiping of ext3.', ' 1 existing signature left on the device.', ' Failed to wipe signatures on logical volume 9ea2871d-844b-4d9e-8ab5-2657681dc7ca/master.', ' Aborting. Failed to wipe start of new LV.']"]]'
https://bugzilla.redhat.com/show_bug.cgi?id=1894692
could be related, what is the version of lvm?
(In reply to Benny Zlotnik from comment #1)
> https://bugzilla.redhat.com/show_bug.cgi?id=1894692

This seems to be the root cause for this.

Since the master lv is owned by vdsm, I think we can safely override lvm to always wipe the signature, but I'm not sure we have a good way to do this. We can add "wipefs -a /dev/vg-name/master" before using lvm to avoid future regressions in lvm.

The master storage domain contains only tasks. We already handle the case when the master storage domain is broken by reconstructing the master storage domain, so wiping the master lv is safe.
According to https://bugzilla.redhat.com/1894692#c15 we can add --yes to the lvm command to make it succeed in this case.
This was the solution in libguestfs:
https://github.com/libguestfs/libguestfs/commit/21cd97732c4973db835b8b6540c8ad582ebd2bda

ISTR there's no equivalent lvm.conf option to force wiping of signatures. The only current option disables wiping signatures, which is different.

As I said in the other bug, the ideal situation is that LVM wouldn't let the old content leak into new LVs at all, by background zeroing / zero-on-first-read.
Since we cannot guarantee the contents of a logical volume in all cases (only when wipe-after-delete is enabled), we need to add --yes to lvcreate so leftovers from previous usage of the logical volume do not fail the operation. See comment 4 for the libguestfs patch fixing the same issue.
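For illustration only, a minimal sketch of the difference at the lvcreate level (the VG and LV names are taken from the log in the description; the size is arbitrary):

  # Without --yes, lvcreate stops on the leftover signature; run non-interactively
  # (as vdsm runs it) the wipe prompt is answered 'n' and the command aborts:
  lvcreate --size 1g --name master 9ea2871d-844b-4d9e-8ab5-2657681dc7ca

  # With --yes, the prompt is answered automatically and the leftover
  # ext3/xfs signature is wiped before the new LV is returned:
  lvcreate --yes --size 1g --name master 9ea2871d-844b-4d9e-8ab5-2657681dc7ca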
(In reply to Benny Zlotnik from comment #1)
> https://bugzilla.redhat.com/show_bug.cgi?id=1894692
> could be related, what is the version of lvm?

[root@lynx09 ~]# lvm version
  LVM version:     2.03.11(2)-RHEL8 (2021-01-28)
  Library version: 1.02.175-RHEL8 (2021-01-28)
  Driver version:  4.43.0
Thanks Nir and Benny for the analysis. Roman, let's submit a fix as soon as we can.
Roni, can you please provide all the detailed steps performed in "Add storage using 'ovirt-ansible-collection' 'storages' role"?
Roni, can you test if this vdsm build fixes the issue? https://jenkins.ovirt.org/job/vdsm_standard-check-patch/27281/artifact/build-artifacts.build-py3.el8.x86_64/ Hopefully you can use this vdsm build instead of the release build in your deployment.
(In reply to Eyal Shenitzky from comment #8)
> Roni, can you please provide all the detailed steps performed in
> "Add storage using 'ovirt-ansible-collection' 'storages' role"?

Sure, please see the attachment "ovirt-ansible-collection storages roles console output" above.
(In reply to Nir Soffer from comment #10)
> Roni, can you test if this vdsm build fixes the issue?
> https://jenkins.ovirt.org/job/vdsm_standard-check-patch/27281/artifact/build-artifacts.build-py3.el8.x86_64/
>
> Hopefully you can use this vdsm build instead of the release build
> in your deployment.

Nir, not fixed! Please see the attached "engine logs (vdsm upgrade to v4.40.60.2-10)".

2021-04-06 02:07:31,919+03 ERROR [org.ovirt.engine.core.bll.storage.connection.ISCSIStorageHelper] (default task-11) [0d44f274-d216-43a5-b825-f082c0605c45] The connection with details '00000000-0000-0000-0000-000000000000' failed because of error code '465' and error message is: failed to setup iscsi subsystem
(In reply to Roni from comment #14)
> (In reply to Nir Soffer from comment #10)
> > Roni, can you test if this vdsm build fixes the issue?
> > https://jenkins.ovirt.org/job/vdsm_standard-check-patch/27281/artifact/build-artifacts.build-py3.el8.x86_64/
> >
> > Hopefully you can use this vdsm build instead of the release build
> > in your deployment.
>
> Nir, not fixed! Please see the attached "engine logs (vdsm upgrade to
> v4.40.60.2-10)".
>
> 2021-04-06 02:07:31,919+03 ERROR
> [org.ovirt.engine.core.bll.storage.connection.ISCSIStorageHelper] (default
> task-11) [0d44f274-d216-43a5-b825-f082c0605c45] The connection with details
> '00000000-0000-0000-0000-000000000000' failed because of error code '465'
> and error message is: failed to setup iscsi subsystem

This looks like BZ #1788631. Could you please confirm you are not hitting this issue (i.e. mixing IP address and DNS name)?
The error:

vdsm.storage.iscsiadm.IscsiNodeError: (15, Logging in to [iface: default, target: iqn.1992-08.com.netapp:vserver-rhv-qe, portal: 10.46.16.10,3260]
Logging in to [iface: default, target: iqn.1992-08.com.netapp:vserver-rhv-qe, portal: 10.46.16.10,3260]
Login to [iface: default, target: iqn.1992-08.com.netapp:vserver-rhv-qe, portal: 10.46.16.10,3260] successful.
iscsiadm: Could not login to [iface: default, target: iqn.1992-08.com.netapp:vserver-rhv-qe, portal: 10.46.16.10,3260].
iscsiadm: initiator reported error (15 - session exists)
iscsiadm: Could not log into all portals

means we try to log in twice to the same portal. This happens when mixing the server IP address and DNS name, and in general when not using the addresses returned by oVirt when discovering the targets.

Correct use of the oVirt API is:
- discover targets
- extract the targets from the response
- use the extracted targets when connecting to the targets

Using hard-coded addresses as described in comment 17 is unlikely to work, unless the underlying ansible code uses the API according to these rules.
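To illustrate the same rule at the iscsiadm level (a sketch only - vdsm drives this through its own API; the target and portal below are copied from the error above):

  # Discover the targets through the portal:
  iscsiadm -m discovery -t sendtargets -p 10.46.16.10:3260

  # Log in using exactly the portal address returned by discovery:
  iscsiadm -m node -T iqn.1992-08.com.netapp:vserver-rhv-qe -p 10.46.16.10:3260 --login

  # Logging in again to the same portal under a different spelling (for example
  # its DNS name) fails with "initiator reported error (15 - session exists)",
  # which is exactly the error seen here.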
Created attachment 1769813 [details]
succeeded_to_add_iscsi_sd
I agree with Vojta about opening a new bug for the iSCSI login issue.

Regarding the iSCSI login issue - the real-world normal flow is that the host has no iSCSI nodes and RHV handles iSCSI discovery and login. Handling a host with arbitrary iSCSI nodes using automatic login can be tested separately as a negative test, but it should not be the normal flow.

Do not keep iSCSI nodes from the cleanup step on the hosts!
Patch was merged and should be available in the next compose.
I reproduced this on RHEL 8.4.

1. Manage the storage domain and uncheck the "Discard after delete" advanced option.

   The bug does not affect users using discard after delete, since in this case the previous contents of the disk are zeroed before a disk is deleted.

2. Create a new preallocated disk.

3. Create a file system on the disk.

   This can be done by attaching the disk to a VM, or by activating the LV on the host and running mkfs.xfs.

4. Delete the disk.

5. Create a new preallocated disk with the same size.

When creating the new disk, the operation fails with:

2021-04-07 22:36:38,416+0300 ERROR (tasks/0) [storage.TaskManager.Task] (Task='bfaf87f4-e34e-447e-a512-fc92622bebfb') Unexpected error (task:880)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 887, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 350, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sp.py", line 1921, in createVolume
    initial_size=initialSize, add_bitmaps=addBitmaps)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sd.py", line 1081, in createVolume
    initial_size=initial_size, add_bitmaps=add_bitmaps)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/volume.py", line 1253, in create
    add_bitmaps=add_bitmaps)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/blockVolume.py", line 508, in _create
    initialTags=(sc.TAG_VOL_UNINIT,))
  File "/usr/lib/python3.6/site-packages/vdsm/storage/lvm.py", line 1582, in createLV
    raise se.CannotCreateLogicalVolume(vgName, lvName, err)
vdsm.storage.exception.CannotCreateLogicalVolume: Cannot create Logical Volume: "vgname=31e3675e-302a-4a22-ad9a-402c0c1e9765 lvname=d94aec7f-0f95-4741-8c11-7434bab09889 err=['WARNING: xfs signature detected on /dev/31e3675e-302a-4a22-ad9a-402c0c1e9765/d94aec7f-0f95-4741-8c11-7434bab09889 at offset 0. Wipe it? [y/n]: [n]', ' Aborted wiping of xfs.', ' 1 existing signature left on the device.', ' Failed to wipe signatures on logical volume 31e3675e-302a-4a22-ad9a-402c0c1e9765/d94aec7f-0f95-4741-8c11-7434bab09889.', ' Aborting. Failed to wipe start of new LV.']"

Why it failed:

The deleted disk had an XFS file system. The new disk was created using the same area on storage. After LVM zeroes the start of the disk, it looks for file system signatures and tries to wipe the XFS signature.

This issue affects any lvcreate operation, and wiping LUNs before using them will not help. Each time a raw disk is deleted, the next disk created may fail.
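As a side note, the underlying LVM behaviour can be provoked directly on a scratch VG (a sketch only, with a hypothetical VG name "vg0"; this is not the RHV flow, just the LVM part of it):

  lvcreate -y -L 1g -n test vg0    # create an LV
  mkfs.xfs /dev/vg0/test           # leave an xfs signature behind
  lvremove -y vg0/test             # remove the LV without zeroing its data
  lvcreate -L 1g -n test vg0       # the new LV typically reuses the same extents;
                                   # run non-interactively, lvcreate refuses to wipe
                                   # the old xfs signature and aborts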
This is the failing lvm command from the vdsm log:

2021-04-07 22:36:37,960+0300 WARN (tasks/0) [storage.LVM] Command with specific filter failed or returned no data, retrying with a wider filter, cmd=['/sbin/lvm', 'lvcreate', '--config', 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/36001405e78e5d0feeb043ffa0aea5010$|", "r|.*|"] hints="none" obtain_device_list_from_udev=0 } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 }', '--autobackup', 'n', '--contiguous', 'n', '--size', '2048m', '--addtag', 'OVIRT_VOL_INITIALIZING', '--name', 'd94aec7f-0f95-4741-8c11-7434bab09889', '31e3675e-302a-4a22-ad9a-402c0c1e9765'] rc=5 out=[] err=['WARNING: xfs signature detected on /dev/31e3675e-302a-4a22-ad9a-402c0c1e9765/d94aec7f-0f95-4741-8c11-7434bab09889 at offset 0. Wipe it? [y/n]: [n]', ' Aborted wiping of xfs.', ' 1 existing signature left on the device.', ' Failed to wipe signatures on logical volume 31e3675e-302a-4a22-ad9a-402c0c1e9765/d94aec7f-0f95-4741-8c11-7434bab09889.', ' Aborting. Failed to wipe start of new LV.'] (lvm:511)

Zdenek, how did LVM find a signature at offset 0, if it zeroes the LV before wiping?
Created attachment 1770018 [details]
LVM configuration on host reproducing the issue
(In reply to Nir Soffer from comment #41)

Correction to the LVM flow (from lvmguy): when using wipe_signatures_when_zeroing_new_lvs=1, lvm wipes signatures before zeroing, so detecting the signature at offset 0 is expected.
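For reference, the knob referred to here lives in the allocation section of lvm.conf (a sketch; 1 is the upstream default), and the effective value can be checked on the host with lvmconfig:

  # allocation {
  #     wipe_signatures_when_zeroing_new_lvs = 1
  # }
  lvmconfig allocation/wipe_signatures_when_zeroing_new_lvs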
This should be fixed in vdsm 4.40.60.3. However, we want to fix this in a safer way: disabling signature wiping instead of using a global --yes, which may have unwanted results in future lvm versions.
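A sketch of what "disabling wiping signatures" could look like on the lvcreate command line (an assumption about the direction of the safer fix, not the merged patch; see Zdenek's note below on -W implying -Z):

  # Skip the signature scan but keep zeroing the start of the new LV explicitly,
  # since -W n alone implies -Z n:
  lvcreate --wipesignatures n --zero y --size 2048m --name <lv-name> <vg-name>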
Easier way to reproduce and verify:

1. Start a vm

2. Add a new raw preallocated disk on a block storage domain that does not use discard-after-delete

# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0    6G  0 disk
├─sda1   8:1    0    1M  0 part
├─sda2   8:2    0    1G  0 part /boot
├─sda3   8:3    0  615M  0 part [SWAP]
└─sda4   8:4    0  4.4G  0 part /
sr0     11:0    1 1024M  0 rom
vda    252:0    0    2G  0 disk

3. In the guest, create a file system on the new disk

# mkfs.xfs /dev/vda
meta-data=/dev/vda               isize=512    agcount=4, agsize=131072 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=524288, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Discarding blocks...Done.

4. Deactivate the disk and remove it (Remove permanently)

5. Immediately add a new disk with the same format/size to the vm

On a vdsm version without the fix the operation will fail when lvm fails to wipe the signatures. With a fixed version, the operation will succeed.

6. In the guest, verify that the first 4k of the disk are zeroed:

# dd if=/dev/vda bs=4K count=1 status=none | hexdump
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0001000
(In reply to Nir Soffer from comment #41)
> This is the failing lvm command:
> 'd94aec7f-0f95-4741-8c11-7434bab09889',
> '31e3675e-302a-4a22-ad9a-402c0c1e9765'] rc=5 out=[] err=['WARNING: xfs signature detected on
> /dev/31e3675e-302a-4a22-ad9a-402c0c1e9765/d94aec7f-0f95-4741-8c11-7434bab09889
> at offset 0. Wipe it? [y/n]: [n]', ' Aborted wiping of xfs.', ' 1 existing
> signature left on the device.', ' Failed to wipe signatures on logical volume
> 31e3675e-302a-4a22-ad9a-402c0c1e9765/d94aec7f-0f95-4741-8c11-7434bab09889.',
> ' Aborting. Failed to wipe start of new LV.'] (lvm:511)
>
> Zdenek, how did LVM find a signature at offset 0, if it zeroes the LV before
> wiping?

Wiping is an extension of the original plain zeroing. So the order of operations is:

When wiping is 'y' -> lvcreate first checks for signatures. If a signature is found -> prompt for wiping.
  If answered 'n' -> the command is aborted, as the user likely does not want to damage the content of the disk.
  (The user can use -Wn to avoid wiping, which comes with an implicit -Zn.)
  If answered 'y' -> lvm2 wipes all signatures + also zeroes the first 8k.

When wiping is 'n' -> no signature checking, and according to -Z y|n the first 8k is cleared (default is y).
All referenced patches are merged, can you please update on this bug status?
Fixed in:

commit e8b7f23a82aa5cbc6e5921c31d0c64674d1c2c6b

    lvm: Disable wiping signatures

$ git describe e8b7f23a82aa5cbc6e5921c31d0c64674d1c2c6b
v4.40.60.3-3-ge8b7f23a8
Verified on: 4.4.6.5-0.17.el8ev
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHV RHEL Host (ovirt-host) [ovirt-4.4.6]), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:2178