Description of problem:
If I extend the Hosted-Engine Storage Domain (iSCSI) by adding a second target (with a LUN), instead of adding a second LUN to the already existing target, hosted-engine.conf does not get updated, and the Hosted-Engine VG comes up partially activated when the host is rebooted.

If all the required LVs are present in the initial LUN, all is good: once the Engine comes up, the host is activated fine and connects to the additional LUN. But if the Hosted-Engine disk was extended and some extents fall into the additional PV, the user might get into trouble. During the 3.6 to 4.0 upgrade a backup LV is created as well; if that falls into the new PV and the host is rebooted, it can also cause trouble.

# cat /etc/ovirt-hosted-engine/hosted-engine.conf | egrep 'iqn|gateway'
gateway=192.168.0.254
iqn=iqn.2003-01.org.linux-iscsi.storage.x8664:hostedengine

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-ha-2.0.2-1.el7ev.noarch
ovirt-engine-4.0.4

How reproducible:
100%

Steps to Reproduce:
1. Deploy Hosted-Engine
2. Extend the HE SD by adding a second LUN which sits in a DIFFERENT target from the initial LUN.
3. Reboot host

Actual results:
StorageDomainAccessError: Domain is either partially accessible or entirely inaccessible: ('2ef8489e-bfb7-4e1b-b70c-8788fe66836d',)

HE SD VG comes up partially activated because hosted-engine.conf is missing the second target.

Expected results:
HE SD VG fully active on host reboot.

Additional Information:
# iscsiadm -m session
tcp: [1] 192.168.100.1:3260,1 iqn.2003-01.org.linux-iscsi.storage.x8664:hostedengine (non-flash)
tcp: [2] 192.168.100.1:3260,1 iqn.2003-01.org.linux-iscsi.storage.x8664:hostedengineext (non-flash)

# cat /etc/ovirt-hosted-engine/hosted-engine.conf | egrep 'gateway|iqn'
gateway=192.168.0.254
iqn=iqn.2003-01.org.linux-iscsi.storage.x8664:hostedengine
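The mismatch reported above (two iSCSI sessions, one iqn in hosted-engine.conf) can be checked mechanically. The following is only a diagnostic sketch, not part of ovirt-hosted-engine-ha: the function names are hypothetical, and it assumes the text of hosted-engine.conf and the output of `iscsiadm -m session` have already been captured as strings in the format shown in this report. Note that sessions belonging to other storage domains would also be listed as "missing", so the result needs to be read with that in mind.

```python
import re


def parse_conf_iqns(conf_text):
    """Return the set of IQNs found on `iqn=` lines of hosted-engine.conf text."""
    iqns = set()
    for line in conf_text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key == "iqn" and value:
            iqns.add(value)
    return iqns


# Matches session lines in the format shown in this report, e.g.:
# tcp: [1] 192.168.100.1:3260,1 iqn.2003-01...:hostedengine (non-flash)
SESSION_RE = re.compile(r"^\S+:\s+\[\d+\]\s+\S+\s+(\S+)")


def parse_session_iqns(session_text):
    """Return the set of target IQNs listed in `iscsiadm -m session` output."""
    iqns = set()
    for line in session_text.splitlines():
        match = SESSION_RE.match(line.strip())
        if match:
            iqns.add(match.group(1))
    return iqns


def targets_missing_from_conf(conf_text, session_text):
    """List logged-in targets that hosted-engine.conf does not mention."""
    return sorted(parse_session_iqns(session_text) - parse_conf_iqns(conf_text))


# Sample data copied from this report.
CONF = """gateway=192.168.0.254
iqn=iqn.2003-01.org.linux-iscsi.storage.x8664:hostedengine
"""
SESSIONS = """tcp: [1] 192.168.100.1:3260,1 iqn.2003-01.org.linux-iscsi.storage.x8664:hostedengine (non-flash)
tcp: [2] 192.168.100.1:3260,1 iqn.2003-01.org.linux-iscsi.storage.x8664:hostedengineext (non-flash)
"""

if __name__ == "__main__":
    for iqn in targets_missing_from_conf(CONF, SESSIONS):
        print("not in hosted-engine.conf:", iqn)
```

With the data from this report, the only target flagged is the `hostedengineext` one, which is the target added when the domain was extended.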
In fact, this is preventing the agent from starting the VM. VDSM doesn't report the HE SD as up, so the agent keeps waiting?

ha-agent:
MainThread::INFO::2016-10-20 16:19:13,334::hosted_engine::860::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_domain_monitor_status) VDSM domain monitor status: PENDING

vdsm:
Thread-98::ERROR::2016-10-20 16:18:33,196::lvm::404::Storage.LVM::(_reloadvgs) vg 2ef8489e-bfb7-4e1b-b70c-8788fe66836d has pv_count 2 but pv_names ('/dev/mapper/360014052d03c99fec334a71a88308fb6',)
Thread-98::ERROR::2016-10-20 16:18:33,289::monitor::425::Storage.Monitor::(_checkDomainStatus) Error checking domain 2ef8489e-bfb7-4e1b-b70c-8788fe66836d
Thread-98::ERROR::2016-10-20 16:23:53,915::monitor::425::Storage.Monitor::(_checkDomainStatus) Error checking domain 2ef8489e-bfb7-4e1b-b70c-8788fe66836d
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/monitor.py", line 406, in _checkDomainStatus
    self.domain.selftest()
  File "/usr/share/vdsm/storage/blockSD.py", line 1011, in selftest
    lvm.chkVG(self.sdUUID)
  File "/usr/share/vdsm/storage/lvm.py", line 1012, in chkVG
    raise se.StorageDomainAccessError("%s: %s" % (vgName, err))
StorageDomainAccessError: Domain is either partially accessible or entirely inaccessible: ('2ef8489e-bfb7-4e1b-b70c-8788fe66836d: [\' WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\', " Couldn\'t find device with uuid kddnm3-LiT8-OpnM-focS-cVkV-7p2O-r7QxDz.", \' The volume group is missing 1 physical volumes.\']',)
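The `_reloadvgs` error above is the clearest sign of the partially activated VG: the VG metadata expects 2 PVs but only one device is visible. As a rough triage aid, a log scan along these lines could pick such entries out of vdsm.log. This is a sketch only: the function name is hypothetical, and the regular expression assumes the exact message wording shown in this comment, which may differ in other VDSM versions.

```python
import ast
import re

# Assumes the message format seen in this report:
#   vg <uuid> has pv_count <n> but pv_names ('<dev>', ...)
PV_MISMATCH_RE = re.compile(r"vg (\S+) has pv_count (\d+) but pv_names (\(.*\))")


def find_partial_vgs(log_text):
    """Return (vg_uuid, expected_pv_count, visible_pv_count) for each mismatch line."""
    results = []
    for line in log_text.splitlines():
        match = PV_MISMATCH_RE.search(line)
        if not match:
            continue
        try:
            # pv_names is logged as a Python tuple literal of device paths.
            pv_names = ast.literal_eval(match.group(3))
        except (ValueError, SyntaxError):
            continue  # skip lines whose tuple cannot be parsed
        results.append((match.group(1), int(match.group(2)), len(pv_names)))
    return results


# Sample line copied from the vdsm log in this comment.
SAMPLE_LOG = (
    "Thread-98::ERROR::2016-10-20 16:18:33,196::lvm::404::Storage.LVM::"
    "(_reloadvgs) vg 2ef8489e-bfb7-4e1b-b70c-8788fe66836d has pv_count 2 "
    "but pv_names ('/dev/mapper/360014052d03c99fec334a71a88308fb6',)\n"
)

if __name__ == "__main__":
    for vg, expected, visible in find_partial_vgs(SAMPLE_LOG):
        print("VG %s: %d of %d PVs visible" % (vg, visible, expected))
```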
To be discussed next HE meeting.
expected to be a non-issue with the new flow, no?
The new flow only affects deployment. The behaviour post-deployment stays the same. So this is still an issue.
We should have much more information now when Node 0 deployment is the default and with the libvirt xml being present in the OVF we use to start the VM.
(In reply to Martin Sivák from comment #11)
> We should have much more information now when Node 0 deployment is the
> default and with the libvirt xml being present in the OVF we use to start
> the VM.

But will that be reflected (updated) after extending the domain?
The engine is in charge of preparing the OVF, so we can refresh it immediately (like we do for other changes). The agent only needs access to a single storage domain to be able to read the OVF, because copies of it are stored on all storage domains the VM disks come from. The only missing piece might be the amount of detail we get from the XML with regard to the mount paths and arguments; that needs to be investigated.
re-targeting to 4.3.1 since this BZ has not been proposed as blocker for 4.3.0. If you think this bug should block 4.3.0 please re-target and set blocker flag.
Moving to 4.3.2, as this was not identified as a blocker for 4.3.1.
Considering this for 4.4; possibly document that we support only a single LUN.
Let's add to the hosted engine install guide: we support only a single LUN for hosted engine deployment.
I would recommend the following documentation changes:

1. Add to the HE deployment guide that only a single target is supported.
2. Add a Warning Box to SD-extension in the Admin Guide, warning that extending the HE-SD is not supported.
(In reply to Martin Tessun from comment #20)
> I would recommend doing the following documentation:
>
> 1. Add it to the HE deployment guide that only single target it supported

I'll add this to the Installation Guide as part of bug 1774495.

> 2. Add a Warning Box to SD-extension in Admin Guide, warning that extending
> the HE-SD is not supported.

Need to do this. Targeting 4.4 GA.
(In reply to Martin Tessun from comment #20)
> > I would recommend doing the following documentation:
> > 2. Add a Warning Box to SD-extension in Admin Guide, warning that extending
> > the HE-SD is not supported.

Martin, can you give me a better idea of what needs to be said, and where it should be said in the Admin Guide?
(In reply to Steve Goodman from comment #22)
> (In reply to Martin Tessun from comment #20)
> > > I would recommend doing the following documentation:
> > >
> > > 2. Add a Warning Box to SD-extension in Admin Guide, warning that extending
> > > the HE-SD is not supported.
>
> Martin, can you give me a better idea of what needs to be said, and where it
> should be said in the Admin Guide?

I believe we need to add a warning to the Storage Domain Management section (extending a Storage Domain):

Warning: Changing the Hosted Engine Storage Domain is not supported. It could happen that the HE is no longer bootable after that.
Published.