Bug 1316075

Summary:          hosted-engine --deploy additional host installation fails when using fc for engine
Product:          [oVirt] ovirt-hosted-engine-setup
Component:        Plugins.Block
Status:           CLOSED NOTABUG
Severity:         high
Priority:         unspecified
Version:          1.3.3.4
Target Milestone: ovirt-3.6.6
Target Release:   ---
Keywords:         Reopened
Hardware:         x86_64
OS:               Linux
Reporter:         Marco Buettner <buettner>
Assignee:         Simone Tiraboschi <stirabos>
QA Contact:       Elad <ebenahar>
CC:               acanan, amureini, buettner, bugs, mehrtens, nsoffer, stirabos, ylavi
Flags:            sbonazzo: ovirt-3.6.z?, sbonazzo: exception?, rule-engine: planning_ack?, rule-engine: devel_ack?, rule-engine: testing_ack?
Doc Type:         Bug Fix
Type:             Bug
oVirt Team:       Storage
Last Closed:      2016-04-11 12:07:33 UTC

Created attachment 1134461 [details]
Logfile of the engine-setup
Created attachment 1134462 [details]
Logfile of the OK host1
Both nodes are on the same versions:

[root@kvm02 /]# rpm -qa | grep -i engine
ovirt-engine-sdk-python-3.6.3.0-1.el7.centos.noarch
ovirt-hosted-engine-ha-1.3.4.3-1.el7.centos.noarch
ovirt-hosted-engine-setup-1.3.3.4-1.el7.centos.noarch

[root@kvm02 /]# rpm -qa | grep -i ovirt
ovirt-host-deploy-1.4.1-1.el7.centos.noarch
ovirt-release36-003-1.noarch
libgovirt-0.3.3-1.el7_2.1.x86_64
ovirt-engine-sdk-python-3.6.3.0-1.el7.centos.noarch
ovirt-setup-lib-1.0.1-1.el7.centos.noarch
ovirt-vmconsole-host-1.0.0-1.el7.centos.noarch
ovirt-hosted-engine-ha-1.3.4.3-1.el7.centos.noarch
ovirt-vmconsole-1.0.0-1.el7.centos.noarch
ovirt-hosted-engine-setup-1.3.3.4-1.el7.centos.noarch

[root@kvm02 /]# rpm -qa | grep -i vdsm
vdsm-4.17.23-1.el7.noarch
vdsm-xmlrpc-4.17.23-1.el7.noarch
vdsm-python-4.17.23-1.el7.noarch
vdsm-cli-4.17.23-1.el7.noarch
vdsm-jsonrpc-4.17.23-1.el7.noarch
vdsm-hook-vmfex-dev-4.17.23-1.el7.noarch
vdsm-infra-4.17.23-1.el7.noarch
vdsm-yajsonrpc-4.17.23-1.el7.noarch

The issue is that VDSM doesn't see any VG ('vgUUID': '') on that LUN:

{
  'status': 'used',
  'vendorID': 'IBM',
  'capacity': '53687091200',
  'fwrev': '0000',
  'vgUUID': '',
  'pvsize': '',
  'pathlist': [],
  'logicalblocksize': '512',
  'pathstatus': [
    {'capacity': '53687091200', 'physdev': 'sdc', 'type': 'FCP', 'state': 'active', 'lun': '2'},
    {'capacity': '53687091200', 'physdev': 'sdf', 'type': 'FCP', 'state': 'active', 'lun': '2'},
    {'capacity': '53687091200', 'physdev': 'sdr', 'type': 'FCP', 'state': 'active', 'lun': '2'},
    {'capacity': '53687091200', 'physdev': 'sdu', 'type': 'FCP', 'state': 'active', 'lun': '2'}
  ],
  'devtype': 'FCP',
  'physicalblocksize': '512',
  'pvUUID': '',
  'serial': 'SIBM_2145_00c020422648XX00',
  'GUID': '360050763008108992000000000000005',
  'productID': '2145'
},

While on the first host we had:

2016-03-09 00:51:45 INFO otopi.plugins.ovirt_hosted_engine_setup.storage.blockd blockd._misc:634 Creating Volume Group
2016-03-09 00:51:47 DEBUG otopi.plugins.ovirt_hosted_engine_setup.storage.blockd blockd._misc:636 {'status': {'message': 'OK', 'code': 0}, 'uuid': 'CFNiqE-oGTG-8rC3-EBvD-uqh1-bZ6W-hebsod'}
2016-03-09 00:51:47 DEBUG otopi.plugins.ovirt_hosted_engine_setup.storage.blockd blockd._misc:674 {'status': {'message': 'OK', 'code': 0}, 'info': {'vgsize': '53284438016', 'name': '0c892016-8f8d-430c-aa2a-16f1d8d944fa', 'vgUUID': 'CFNiqE-oGTG-8rC3-EBvD-uqh1-bZ6W-hebsod', 'pvlist': [{'vendorID': 'IBM', 'capacity': '53284438016', 'fwrev': '0000', 'vgUUID': 'CFNiqE-oGTG-8rC3-EBvD-uqh1-bZ6W-hebsod', 'pathlist': [], 'pathstatus': [{'capacity': '53687091200', 'physdev': 'sdc', 'type': 'FCP', 'state': 'active', 'lun': '2'}, {'capacity': '53687091200', 'physdev': 'sdf', 'type': 'FCP', 'state': 'active', 'lun': '2'}, {'capacity': '53687091200', 'physdev': 'sdn', 'type': 'FCP', 'state': 'active', 'lun': '2'}, {'capacity': '53687091200', 'physdev': 'sdq', 'type': 'FCP', 'state': 'active', 'lun': '2'}], 'devtype': 'FCP', 'pvUUID': 'd0Wt6e-Spni-C7dF-zIXG-Q4g3-9fWP-WPxR6t', 'serial': 'SIBM_2145_00c020422648XX00', 'GUID': '360050763008108992000000000000005', 'devcapacity': '53687091200', 'productID': '2145'}], 'state': 'OK', 'vgfree': '53284438016', 'type': 2, 'attr': {'partial': '-', 'permission': 'w', 'allocation': 'n', 'exported': '-', 'clustered': '-', 'resizeable': 'z'}}}

Can you please attach VDSM logs from your additional host?
Can you please run pvscan --cache && vgs on both hosts and paste the output here?

Created attachment 1134533 [details]
VDSM Log from Host 2
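For anyone cross-checking the empty 'vgUUID' in the getDeviceList output above, the same information can be queried directly on the additional host with standard LVM and VDSM client commands. This is only a diagnostic sketch: the host name is a placeholder and the device GUID is the hosted-engine LUN from this report, so substitute your own values.

[root@<additional-host> ~]# vdsClient -s 0 getDeviceList
[root@<additional-host> ~]# pvs -o pv_name,pv_uuid,vg_name,vg_uuid /dev/mapper/360050763008108992000000000000005
[root@<additional-host> ~]# pvscan --cache && vgs

If the LUN really carries the hosted-engine VG, pvs should print a PV and VG UUID for it; an empty result matches what VDSM reported above.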
On the first host the command runs without result. The VDSM log there also shows some exceptions; I will paste it here, too.

[root@kvm02 vdsm]# pvscan --cache && vgs
  /dev/mapper/360050763008108992000000000000003: lseek 392855289856 failed: Das Argument ist ungültig
  /dev/mapper/360050763008108992000000000000003: lseek 392855289856 failed: Das Argument ist ungültig
  /dev/mapper/360050763008108992000000000000003: lseek 392855289856 failed: Das Argument ist ungültig
  VG                                   #PV #LV #SN Attr   VSize   VFree
  0c892016-8f8d-430c-aa2a-16f1d8d944fa   1  12   0 wz--n-  49,62g  34,25g
  8ebfbb92-7fc5-4157-b2e8-8be536a8b1be   1   8   0 wz--n- 365,62g 361,50g
  a4ae7659-5a50-4fa9-9e82-24c6e290db34   1 152   0 wz--n-  15,01t   3,51t
  ab8bb5ff-9096-46d0-8519-28ef297fafca   1  11   0 wz--n-  16,37t   7,39t
  centos_kvm02                           1   2   0 wz--n- 557,88g       0
  ff697023-ccab-450a-9bf5-fe943ae72d0b   1  12   0 wz--n-  10,63t  10,24t

("Das Argument ist ungültig" is the German locale text for "Invalid argument".)

Created attachment 1134550 [details]
VDSM logfiles from first Host in relevant time
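If you need to pull the relevant entries out of a large VDSM log yourself, a plain grep over the default log location is usually enough, similar to the excerpts quoted in the next comment. This is only a sketch: /var/log/vdsm/vdsm.log is the standard VDSM log path and the patterns simply match the operations discussed in this bug.

[root@<host> ~]# grep -E "getDeviceList|testPVCreate|lvm reload operation" /var/log/vdsm/vdsm.log | tail -n 40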
On the first host you deployed on VG CFNiqE-oGTG-8rC3-EBvD-uqh1-bZ6W-hebsod, but it seems that the 'lvm reload operation' on the second host is failing due to an error on a different VG:

Thread-54::DEBUG::2016-03-09 02:01:03,639::lvm::347::Storage.OperationMutex::(_reloadpvs) Operation 'lvm reload operation' released the operation mutex
Thread-54::DEBUG::2016-03-09 02:01:03,720::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-31 /usr/bin/sudo -n /usr/sbin/lvm pvcreate --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/360050763008108992000000000000003|/dev/mapper/360050763008108992000000000000004|/dev/mapper/360050763008108992000000000000005|/dev/mapper/3600c0ff000262944734e295601000000|/dev/mapper/3600c0ff000262cc8b2fd025601000000|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --test --metadatasize 128m --metadatacopies 2 --metadataignore y /dev/mapper/360050763008108992000000000000003 /dev/mapper/360050763008108992000000000000004 /dev/mapper/360050763008108992000000000000005 /dev/mapper/3600c0ff000262cc8b2fd025601000000 /dev/mapper/3600c0ff000262944734e295601000000 (cwd None)
Thread-54::DEBUG::2016-03-09 02:01:04,103::lvm::290::Storage.Misc.excCmd::(cmd) FAILED: <err> = ' WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n TEST MODE: Metadata will NOT be updated and volumes will not be (de)activated.\n /dev/mapper/360050763008108992000000000000003: lseek 392855289856 failed: Invalid argument\n /dev/mapper/360050763008108992000000000000003: lseek 392855289856 failed: Invalid argument\n WARNING: Volume Group 8ebfbb92-7fc5-4157-b2e8-8be536a8b1be is not consistent.\n Can\'t initialize physical volume "/dev/mapper/360050763008108992000000000000003" of volume group "8ebfbb92-7fc5-4157-b2e8-8be536a8b1be" without -ff\n /dev/mapper/360050763008108992000000000000003: lseek 392855289856 failed: Invalid argument\n WARNING: Volume Group 8ebfbb92-7fc5-4157-b2e8-8be536a8b1be is not consistent.\n Can\'t initialize physical volume "/dev/mapper/360050763008108992000000000000004" of volume group "ff697023-ccab-450a-9bf5-fe943ae72d0b" without -ff\n /dev/mapper/360050763008108992000000000000003: lseek 392855289856 failed: Invalid argument\n WARNING: Volume Group 8ebfbb92-7fc5-4157-b2e8-8be536a8b1be is not consistent.\n Can\'t initialize physical volume "/dev/mapper/360050763008108992000000000000005" of volume group "0c892016-8f8d-430c-aa2a-16f1d8d944fa" without -ff\n /dev/mapper/360050763008108992000000000000003: lseek 392855289856 failed: Invalid argument\n WARNING: Volume Group 8ebfbb92-7fc5-4157-b2e8-8be536a8b1be is not consistent.\n Can\'t initialize physical volume "/dev/mapper/3600c0ff000262cc8b2fd025601000000" of volume group "a4ae7659-5a50-4fa9-9e82-24c6e290db34" without -ff\n /dev/mapper/360050763008108992000000000000003: lseek 392855289856 failed: Invalid argument\n WARNING: Volume Group 8ebfbb92-7fc5-4157-b2e8-8be536a8b1be is not consistent.\n Can\'t initialize physical volume "/dev/mapper/3600c0ff000262944734e295601000000" of volume group "ab8bb5ff-9096-46d0-8519-28ef297fafca" without -ff\n'; <rc> = 5
Thread-54::DEBUG::2016-03-09 02:01:04,108::lvm::864::Storage.LVM::(testPVCreate) rc: 5, out: [], err: [' WARNING: lvmetad is running but disabled.
Restart lvmetad before enabling it!', ' TEST MODE: Metadata will NOT be updated and volumes will not be (de)activated.', ' /dev/mapper/360050763008108992000000000000003: lseek 392855289856 failed: Invalid argument', ' /dev/mapper/360050763008108992000000000000003: lseek 392855289856 failed: Invalid argument', ' WARNING: Volume Group 8ebfbb92-7fc5-4157-b2e8-8be536a8b1be is not consistent.', ' Can\'t initialize physical volume "/dev/mapper/360050763008108992000000000000003" of volume group "8ebfbb92-7fc5-4157-b2e8-8be536a8b1be" without -ff', ' /dev/mapper/360050763008108992000000000000003: lseek 392855289856 failed: Invalid argument', ' WARNING: Volume Group 8ebfbb92-7fc5-4157-b2e8-8be536a8b1be is not consistent.', ' Can\'t initialize physical volume "/dev/mapper/360050763008108992000000000000004" of volume group "ff697023-ccab-450a-9bf5-fe943ae72d0b" without -ff', ' /dev/mapper/360050763008108992000000000000003: lseek 392855289856 failed: Invalid argument', ' WARNING: Volume Group 8ebfbb92-7fc5-4157-b2e8-8be536a8b1be is not consistent.', ' Can\'t initialize physical volume "/dev/mapper/360050763008108992000000000000005" of volume group "0c892016-8f8d-430c-aa2a-16f1d8d944fa" without -ff', ' /dev/mapper/360050763008108992000000000000003: lseek 392855289856 failed: Invalid argument', ' WARNING: Volume Group 8ebfbb92-7fc5-4157-b2e8-8be536a8b1be is not consistent.', ' Can\'t initialize physical volume "/dev/mapper/3600c0ff000262cc8b2fd025601000000" of volume group "a4ae7659-5a50-4fa9-9e82-24c6e290db34" without -ff', ' /dev/mapper/360050763008108992000000000000003: lseek 392855289856 failed: Invalid argument', ' WARNING: Volume Group 8ebfbb92-7fc5-4157-b2e8-8be536a8b1be is not consistent.', ' Can\'t initialize physical volume "/dev/mapper/3600c0ff000262944734e295601000000" of volume group "ab8bb5ff-9096-46d0-8519-28ef297fafca" without -ff'], unusedDevs: set([]), usedDevs: set(['/dev/mapper/360050763008108992000000000000005', '/dev/mapper/360050763008108992000000000000004', '/dev/mapper/360050763008108992000000000000003', '/dev/mapper/3600c0ff000262944734e295601000000', '/dev/mapper/3600c0ff000262cc8b2fd025601000000']) Thread-54::INFO::2016-03-09 02:01:04,109::logUtils::51::dispatcher::(wrapper) Run and protect: getDeviceList, Return response: {'devList': [{'status': 'used', 'fwrev': '0000', 'vgUUID': '', 'pathlist': [], 'logicalblocksize': '512', 'devtype': 'FCP', 'physicalblocksize': '512', 'serial': 'SIBM_2145_00c020422648XX00', 'GUID': '360050763008108992000000000000003', 'productID': '2145', 'vendorID': 'IBM', 'capacity': '339302416384', 'pvsize': '', 'pathstatus': [{'type': 'FCP', 'physdev': 'sda', 'capacity': '339302416384', 'state': 'active', 'lun': '0'}, {'type': 'FCP', 'physdev': 'sdd', 'capacity': '339302416384', 'state': 'active', 'lun': '0'}, {'type': 'FCP', 'physdev': 'sdl', 'capacity': '339302416384', 'state': 'active', 'lun': '0'}, {'type': 'FCP', 'physdev': 'sdo', 'capacity': '339302416384', 'state': 'active', 'lun': '0'}], 'pvUUID': ''}, {'status': 'used', 'fwrev': '0000', 'vgUUID': '', 'pathlist': [], 'logicalblocksize': '512', 'devtype': 'FCP', 'physicalblocksize': '512', 'serial': 'SIBM_2145_00c020422648XX00', 'GUID': '360050763008108992000000000000004', 'productID': '2145', 'vendorID': 'IBM', 'capacity': '11689827237888', 'pvsize': '', 'pathstatus': [{'type': 'FCP', 'physdev': 'sdb', 'capacity': '11689827237888', 'state': 'active', 'lun': '1'}, {'type': 'FCP', 'physdev': 'sde', 'capacity': '11689827237888', 'state': 'active', 'lun': '1'}, {'type': 'FCP', 'physdev': 'sdm', 
'capacity': '11689827237888', 'state': 'active', 'lun': '1'}, {'type': 'FCP', 'physdev': 'sdp', 'capacity': '11689827237888', 'state': 'active', 'lun': '1'}], 'pvUUID': ''}, {'status': 'used', 'fwrev': '0000', 'vgUUID': '', 'pathlist': [], 'logicalblocksize': '512', 'devtype': 'FCP', 'physicalblocksize': '512', 'serial': 'SIBM_2145_00c020422648XX00', 'GUID': '360050763008108992000000000000005', 'productID': '2145', 'vendorID': 'IBM', 'capacity': '53687091200', 'pvsize': '', 'pathstatus': [{'type': 'FCP', 'physdev': 'sdc', 'capacity': '53687091200', 'state': 'active', 'lun': '2'}, {'type': 'FCP', 'physdev': 'sdf', 'capacity': '53687091200', 'state': 'active', 'lun': '2'}, {'type': 'FCP', 'physdev': 'sdn', 'capacity': '53687091200', 'state': 'active', 'lun': '2'}, {'type': 'FCP', 'physdev': 'sdq', 'capacity': '53687091200', 'state': 'active', 'lun': '2'}], 'pvUUID': ''}, {'status': 'used', 'fwrev': 'G200', 'vgUUID': '', 'pathlist': [], 'logicalblocksize': '512', 'devtype': 'FCP', 'physicalblocksize': '512', 'serial': 'SHP_MSA_2040_SAN_00c0ff262cc80000b2fd025601000000', 'GUID': '3600c0ff000262cc8b2fd025601000000', 'productID': 'MSA 2040 SAN', 'vendorID': 'HP', 'capacity': '16499997671424', 'pvsize': '', 'pathstatus': [{'type': 'FCP', 'physdev': 'sdg', 'capacity': '16499997671424', 'state': 'active', 'lun': '1'}, {'type': 'FCP', 'physdev': 'sdj', 'capacity': '16499997671424', 'state': 'active', 'lun': '1'}, {'type': 'FCP', 'physdev': 'sdr', 'capacity': '16499997671424', 'state': 'active', 'lun': '1'}, {'type': 'FCP', 'physdev': 'sdt', 'capacity': '16499997671424', 'state': 'active', 'lun': '1'}], 'pvUUID': ''}, {'status': 'used', 'fwrev': 'G200', 'vgUUID': '', 'pathlist': [], 'logicalblocksize': '512', 'devtype': 'FCP', 'physicalblocksize': '512', 'serial': 'SHP_MSA_2040_SAN_00c0ff2629440000734e295601000000', 'GUID': '3600c0ff000262944734e295601000000', 'productID': 'MSA 2040 SAN', 'vendorID': 'HP', 'capacity': '17999998222336', 'pvsize': '', 'pathstatus': [{'type': 'FCP', 'physdev': 'sdh', 'capacity': '17999998222336', 'state': 'active', 'lun': '2'}, {'type': 'FCP', 'physdev': 'sdk', 'capacity': '17999998222336', 'state': 'active', 'lun': '2'}, {'type': 'FCP', 'physdev': 'sds', 'capacity': '17999998222336', 'state': 'active', 'lun': '2'}, {'type': 'FCP', 'physdev': 'sdu', 'capacity': '17999998222336', 'state': 'active', 'lun': '2'}], 'pvUUID': ''}]}

That's right, in the first output it seems that the LUN info was corrupt.

8ebfbb92-7fc5-4157-b2e8-8be536a8b1be   1   8   0 wz--n- 365,62g 361,50g

is an old data VG. I had detached it from oVirt and split it into two LUNs before the new installation. Now I have attached the resized LUN to the data center again.

The result is now clean output on KVM02:

[root@kvm02 vdsm]# pvscan --cache && vgs
  VG                                   #PV #LV #SN Attr   VSize   VFree
  0c892016-8f8d-430c-aa2a-16f1d8d944fa   1  12   0 wz--n-  49,62g  34,25g
  2674dd07-7162-4279-8c8b-ce22e12398b8   1   6   0 wz--n- 315,62g 311,75g
  a4ae7659-5a50-4fa9-9e82-24c6e290db34   1 152   0 wz--n-  15,01t   3,51t
  ab8bb5ff-9096-46d0-8519-28ef297fafca   1  11   0 wz--n-  16,37t   7,39t
  centos_kvm02                           1   2   0 wz--n- 557,88g       0
  ff697023-ccab-450a-9bf5-fe943ae72d0b   1  12   0 wz--n-  10,63t  10,24t

Can you please try redeploying now?

At the moment that isn't possible, because the host was installed manually and is running in a production environment. I'll do this ASAP and give you feedback.

Why is it on 3.6.4? I'm not sure it's something we really want to fix.
Here the issue was that another VG was failing for other reasons, so VDSM failed to scan and report the VG used by the hosted-engine storage domain. It should work on clean environments. Nir, any idea how we can improve this?

After the re-attachment of the split LUN, LVM looks good and hosted-engine --deploy also works as expected.

Using the latest code, hosted-engine deployment of the second host over FC finished successfully.
ovirt-hosted-engine-setup-1.3.3.4-1.el7ev.noarch
vdsm-4.17.23-0.el7ev.noarch
rhevm-appliance-20160229.0-1.el7ev.noarch

According to comment 16 and comment 17, the issue seems resolved. Are there any other action items here?

(In reply to Allon Mureinik from comment #18)
> According to comment 16 and comment 17, the issue seems resolved. Are there
> any other action items here?

On the straight path it was working in the past as well. The issue is when a VG on another LUN is corrupted for any reason, so that the scan of available storage domains by VDSM takes too long or fails.

(In reply to Simone Tiraboschi from comment #19)
> (In reply to Allon Mureinik from comment #18)
> > According to comment 16 and comment 17, the issue seems resolved. Are there
> > any other action items here?
>
> On the straight path it was working in the past as well. The issue is when a
> VG on another LUN is corrupted for any reason, so that the scan of available
> storage domains by VDSM takes too long or fails.

There's nothing intelligent VDSM can do with a corrupted VG. If there's a workaround we should help the user with some manual procedure, but nothing more.
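As a starting point for such a manual procedure, a generic LVM health check can confirm whether any VG is still flagged as inconsistent before re-running hosted-engine --deploy on the additional host. This is only a sketch based on standard LVM tools, not a procedure taken from this bug; <VG_NAME> stands for the VG that pvcreate --test reported as "not consistent" (8ebfbb92-7fc5-4157-b2e8-8be536a8b1be in the logs above).

[root@kvm02 ~]# pvscan --cache
[root@kvm02 ~]# vgs -o vg_name,vg_attr,vg_free
[root@kvm02 ~]# vgck <VG_NAME>

Once vgs no longer warns about inconsistent metadata (as in the clean output above), the additional-host deployment can be retried.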
Created attachment 1134460 [details]
Logfile from failed setup

Description of problem:
After the deployment of the first host with the hosted engine on top, the setup of an additional host fails, right after choosing the storage to use, with "The selected device is already used".

Version-Release number of selected component (if applicable):
OS: CentOS 7.2 latest stable
oVirt 3.6 latest stable

How reproducible:

Steps to Reproduce:
1. Install a CentOS 7 host as the first host with the oVirt 3.6 stable repo
2. yum install ovirt-hosted-engine-setup ovirt-engine-appliance.noarch
3. Complete the setup with FC as engine storage
4. Set up an additional host with CentOS 7, too. Install the oVirt 3.6 stable repo and ovirt-hosted-engine-setup
5. Run the setup with hosted-engine --deploy and choose FC and the same LUN as chosen in step 3

Actual results:
Setup of an additional host in a hosted-engine FC environment fails.

Expected results:
Setup of an additional host completes.

Additional info:
See attached logs
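For reference, a condensed command sequence for steps 4-5 on the additional host, assuming the oVirt 3.6 stable repository is already configured as in step 4; this is a sketch of the steps described above, not an exact transcript of the failing run:

[root@<additional-host> ~]# yum install -y ovirt-hosted-engine-setup
[root@<additional-host> ~]# hosted-engine --deploy

When prompted for the storage type, choose fc and select the same LUN that was used for the first host; this is the point where the failing setup reported "The selected device is already used".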