Bug 1192642
| Summary: | [Bug 3.3] Can't add new FC storage domain | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Robert McSwain <rmcswain> |
| Component: | ovirt-engine | Assignee: | Daniel Erez <derez> |
| Status: | CLOSED NOTABUG | QA Contact: | Aharon Canan <acanan> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.3.0 | CC: | adahms, amureini, derez, ecohen, iheim, lpeer, lsurette, rbalakri, Rhev-m-bugs, rmcswain, yeylon |
| Target Milestone: | ovirt-3.6.3 | | |
| Target Release: | 3.6.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | storage | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-02-17 15:16:25 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
(In reply to Robert McSwain from comment #0)
> Q1: Is this expected behavior when the new lun for the storage domain is
> only exposed to one host in the cluster?

Yes. Moreover, IIRC, in 3.4 RHEVM even checks this before attempting to create the domain. Daniel, can you confirm/refute?

> Q2: Where is the documentation that states that all nodes must have
> visibility to the lun if this is the case?

I could not find such a note, but there definitely should be one. Andrew, can you weigh in here please?

Hi Allon,

Thank you for the needinfo request. We have a range of information about storage domains in the Administration Guide and the Technical Guide, but I don't think there is a section where we call this out directly. I would be happy to raise a documentation bug and work on including this information if you like.

Kind regards,

Andrew

(In reply to Allon Mureinik from comment #2)
> (In reply to Robert McSwain from comment #0)
> > Q1: Is this expected behavior when the new lun for the storage domain is
> > only exposed to one host in the cluster?
> Yes. Moreover, IIRC, in 3.4 RHEVM even checks this before attempting to
> create the domain.
> Daniel, can you confirm/refute?

Connectivity is verified only upon attaching the domain to the pool. I'm not sure about the documentation, but the limitation is explained in [1] (by the way, this limitation should probably be relaxed to the cluster level after the pool is deprecated in 3.6). As a quick workaround, you can try putting the problematic host into maintenance - if the issue reoccurs, please attach the relevant vdsm and engine logs (the tarball in the ticket seems to be broken).

[1] https://access.redhat.com/solutions/61912

> > Q2: Where is the documentation that states that all nodes must have
> > visibility to the lun if this is the case?
> I could not find such a note, but there definitely should be one.
> Andrew, can you weigh in here please?

A storage device used for a domain should be accessible from all hosts in the DC. This is behavior by design. I'll follow up on the documentation for 3.5.z to make this issue clearer.
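For anyone hitting this: the "accessible from all hosts in the DC" requirement above is easy to sanity-check before creating or attaching a block domain. Below is a minimal sketch, not a supported tool, that looks for the LUN's WWID in `multipath -ll` output on each hypervisor; it assumes passwordless root SSH to the hosts, and the host names and WWID in it are placeholders.

```python
#!/usr/bin/env python
"""Pre-flight check: is a given FC LUN visible from every hypervisor?

A minimal sketch, not an official tool. It assumes passwordless root SSH
to each host and that the LUN shows up by WWID in `multipath -ll` output.
"""
import subprocess

# Placeholders - list every host in the data center and the LUN's real WWID.
HOSTS = ["encl1-usrhv201", "encl1-usrhv202"]
WWID = "3600601604f123456789abcdef0123456"  # hypothetical WWID

def lun_visible(host, wwid):
    """Return True if the multipath map for `wwid` is present on `host`."""
    out = subprocess.run(
        ["ssh", "root@" + host, "multipath", "-ll"],
        capture_output=True, text=True, timeout=60,
    )
    return wwid in out.stdout

missing = [h for h in HOSTS if not lun_visible(h, WWID)]
if missing:
    print("LUN %s is NOT visible on: %s" % (WWID, ", ".join(missing)))
    print("Fix zoning/LUN masking for these hosts (or move them to "
          "maintenance, per the comment above) before attaching the domain.")
else:
    print("LUN %s is visible on all hosts." % WWID)
```

If any host turns up in the "missing" list, attaching the domain to the data center is expected to fail, which is exactly the situation described in this report.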
Release: Red Hat Enterprise Virtualization Hypervisor release 6.5 (20140407.0.el6ev)
Kernel: Linux encl1-usrhv201 2.6.32-431.11.2.el6.x86_64 #1 SMP Mon Mar 3 13:32:45 EST 2014 x86_64 x86_64 x86_64 GNU/Linux
Uptime: 08:46:48 up 4 days, 18:28, 0 users, load average: 0.83, 0.74, 0.77

CPUINFO
Processor count: 24
vendor_id : GenuineIntel
model name : Intel(R) Xeon(R) CPU E5-2667 0 @ 2.90GHz
cpu MHz : 2900.118
siblings : 12 (per socket)
cpu cores : 6 (per socket)
Socket: 1 [logical cores: 12]
Socket: 0 [logical cores: 12]

BIOS Information
Vendor: HP
Version: I31
Release Date: 02/10/2014
iLO Firmware Revision: 1.51

System Information
Manufacturer: HP
Product Name: ProLiant BL460c Gen8

MEMINFO
MemTotal: 132128300 kB (132,128,300 kB)
SwapTotal: 82841592 kB (82,841,592 kB)
SwapFree: 82841592 kB
HugePages_Total: 0

==== Problem description ====

Customer with a RHEV 3.3 environment consisting of 10 BL460c Gen8 hypervisors and a DL360p Gen8 as the RHEVM. They have a single FC storage domain that is on an EMC VNX5500 storage array. They recently ran a test in which they wanted to add an additional FC storage array, an EMC VNXe3200, which has been configured/zoned on the same FC switches. For the test they created a LUN. The customer can see the new LUN from the hypervisor, and he was able to create a new storage domain called "eng_lun_usdfs". However, he was not able to attach the new LUN to the data center; the error was "failed to attach data to the datacenter".

The LUN was presented to the hypervisor "encl1-usrhv201" only as part of this test. I wonder if this is the problem. In earlier releases of RHEV, there was a clear statement that storage domains needed to be accessible to all hosts. It makes sense to me that ALL nodes need to be able to access the storage domain. I could only find one sentence in the 3.3 Admin Guide that suggests that this is mandatory:

<snip>
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.3/html-single/Administration_Guide/index.html

A local storage domain can be set up on a host. When you set up host to use local storage, the host automatically gets added to a new data center and cluster that no other hosts can be added to. Multiple host clusters require that all hosts have access to all storage domains, which is not possible with local storage. Virtual machines created in a single host cluster cannot be migrated, fenced or scheduled.
</snip>

I also found some comments that suggest that this is not the case:

<snip>
All communication to the storage domain is via the selected host and not directly from the Red Hat Enterprise Virtualization Manager. At least one active host must exist in the system, and be attached to the chosen data center, before the storage is configured.
</snip>

On the RHEVM, engine.log shows the domain being created, and the subsequent attempts to attach it failing:

./var/log/ovirt-engine/engine.log:2015-02-09 09:08:05,933 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp-/127.0.0.1:8702-7) [92383fb] Correlation ID: 92383fb, Job ID: c3fcfc65-043f-4978-9eaf-baa2233ecc2b, Call Stack: null, Custom Event ID: -1, Message: Storage Domain eng_lun_usdfs was added by admin@internal

./var/log/ovirt-engine/engine.log:2015-02-09 09:09:15,359 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp-/127.0.0.1:8702-5) [604172a8] Correlation ID: 6cadeb4a, Job ID: b6cf8f3a-8c45-4916-b477-0caac69539f3, Call Stack: null, Custom Event ID: -1, Message: Failed to attach Storage Domain eng_lun_usdfs to Data Center Bothell. (User: admin@internal)
./var/log/ovirt-engine/engine.log:2015-02-09 09:11:04,705 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-4-thread-31) [39724b62] Correlation ID: 4772dd74, Job ID: d9ebfabe-88e4-4ed1-b60d-282c073fff55, Call Stack: null, Custom Event ID: -1, Message: Failed to attach Storage Domain eng_lun_usdfs to Data Center Bothell. (User: admin@internal)

./var/log/ovirt-engine/engine.log:2015-02-09 10:10:13,553 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-4-thread-38) [232ff5a8] Correlation ID: 723e9458, Job ID: 2ac90545-750f-47d3-b6d3-ad514b70b431, Call Stack: null, Custom Event ID: -1, Message: Failed to attach Storage Domain eng_lun_usdfs to Data Center Bothell. (User: admin@internal)

When I look at the logs, I can see that all nodes get the GetVGInfoVDSCommand:

2015-02-09 09:08:06,464 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand] (pool-4-thread-39) [72f2ad3d] START, GetVGInfoVDSCommand(HostName = encl1-usrhv206, HostId = dadbd25f-4e1a-4c80-a215-59cee9fee3f9, VGID=z1jxsy-dP5X-xfXy-0FF4-fTPh-Gzw4-2clq9k), log id: 1fef84c1
2015-02-09 09:08:06,464 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand] (pool-4-thread-45) [16efcfd3] START, GetVGInfoVDSCommand(HostName = encl2-usrhv207, HostId = 89c931b6-64fe-4002-91a3-7301576f374c, VGID=z1jxsy-dP5X-xfXy-0FF4-fTPh-Gzw4-2clq9k), log id: 2b371828
2015-02-09 09:08:06,464 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand] (pool-4-thread-37) [49ca1813] START, GetVGInfoVDSCommand(HostName = encl1-usrhv203, HostId = be2a7d7a-7359-49c6-a838-df57f7d25f51, VGID=z1jxsy-dP5X-xfXy-0FF4-fTPh-Gzw4-2clq9k), log id: 6219176f
2015-02-09 09:08:06,465 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand] (pool-4-thread-36) [58e67dea] START, GetVGInfoVDSCommand(HostName = encl1-usrhv205, HostId = 295eba80-1289-44df-adc2-d3ebee14f2b8, VGID=z1jxsy-dP5X-xfXy-0FF4-fTPh-Gzw4-2clq9k), log id: 617604af
2015-02-09 09:08:06,472 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand] (pool-4-thread-49) [5a353168] START, GetVGInfoVDSCommand(HostName = encl1-usrhv202, HostId = d0de758e-0e85-41d9-b5ea-d503a74162d2, VGID=z1jxsy-dP5X-xfXy-0FF4-fTPh-Gzw4-2clq9k), log id: 7d993780
2015-02-09 09:08:06,473 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand] (pool-4-thread-50) [5a3ed11c] START, GetVGInfoVDSCommand(HostName = encl1-usrhv204, HostId = ecdfc282-56ff-4e30-8e0d-994fa5e2c41b, VGID=z1jxsy-dP5X-xfXy-0FF4-fTPh-Gzw4-2clq9k), log id: 69559c6e
2015-02-09 09:08:06,474 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand] (pool-4-thread-40) [56d12751] START, GetVGInfoVDSCommand(HostName = encl2-usrhv209, HostId = 67ccf3f9-afb2-484d-bb57-6fe8650de8c3, VGID=z1jxsy-dP5X-xfXy-0FF4-fTPh-Gzw4-2clq9k), log id: 1e8af27c
2015-02-09 09:08:06,474 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand] (pool-4-thread-26) [454bb3f] START, GetVGInfoVDSCommand(HostName = encl2-usrhv210, HostId = b9cea4b5-d2c8-4b75-8257-f3c20bf7199c, VGID=z1jxsy-dP5X-xfXy-0FF4-fTPh-Gzw4-2clq9k), log id: 574ec52a
2015-02-09 09:08:06,475 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand] (pool-4-thread-43) [5212b38d] START, GetVGInfoVDSCommand(HostName = encl1-usrhv201, HostId = 593d7488-cfa6-43e5-bdd2-9291861926ef, VGID=z1jxsy-dP5X-xfXy-0FF4-fTPh-Gzw4-2clq9k), log id: 399b8b6d
2015-02-09 09:08:06,478 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand] (pool-4-thread-24) [3d6c635e] START, GetVGInfoVDSCommand(HostName = encl2-usrhv208, HostId = f86b698d-d97e-4133-9bc7-c8b86304c4d7, VGID=z1jxsy-dP5X-xfXy-0FF4-fTPh-Gzw4-2clq9k), log id: 73912fc8
But since only ONE host can see it, the others fail. If I look at one node, I can see this pattern in the failure:

2015-02-09 09:08:06,412 INFO [org.ovirt.engine.core.bll.storage.SyncLunsInfoForBlockStorageDomainCommand] (pool-4-thread-39) [72f2ad3d] Running command: SyncLunsInfoForBlockStorageDomainCommand internal: true. Entities affected : ID: 52645ff5-ee38-4574-9787-5cf72d893f39 Type: Storage
2015-02-09 09:08:06,464 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand] (pool-4-thread-39) [72f2ad3d] START, GetVGInfoVDSCommand(HostName = encl1-usrhv206, HostId = dadbd25f-4e1a-4c80-a215-59cee9fee3f9, VGID=z1jxsy-dP5X-xfXy-0FF4-fTPh-Gzw4-2clq9k), log id: 1fef84c1
2015-02-09 09:08:06,671 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand] (pool-4-thread-39) [72f2ad3d] Failed in GetVGInfoVDS method
2015-02-09 09:08:06,672 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand] (pool-4-thread-39) [72f2ad3d] Error code VolumeGroupDoesNotExist and error message VDSGenericException: VDSErrorException: Failed to GetVGInfoVDS, error = Volume Group does not exist: ('vg_uuid: z1jxsy-dP5X-xfXy-0FF4-fTPh-Gzw4-2clq9k',)
2015-02-09 09:08:06,672 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand] (pool-4-thread-39) [72f2ad3d] Command org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand return value
2015-02-09 09:08:06,672 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand] (pool-4-thread-39) [72f2ad3d] HostName = encl1-usrhv206
2015-02-09 09:08:06,672 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand] (pool-4-thread-39) [72f2ad3d] Command GetVGInfoVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to GetVGInfoVDS, error = Volume Group does not exist: ('vg_uuid: z1jxsy-dP5X-xfXy-0FF4-fTPh-Gzw4-2clq9k',)
2015-02-09 09:08:06,672 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand] (pool-4-thread-39) [72f2ad3d] FINISH, GetVGInfoVDSCommand, log id: 1fef84c1
2015-02-09 09:08:06,672 ERROR [org.ovirt.engine.core.bll.storage.SyncLunsInfoForBlockStorageDomainCommand] (pool-4-thread-39) [72f2ad3d] Command org.ovirt.engine.core.bll.storage.SyncLunsInfoForBlockStorageDomainCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to GetVGInfoVDS, error = Volume Group does not exist: ('vg_uuid: z1jxsy-dP5X-xfXy-0FF4-fTPh-Gzw4-2clq9k',) (Failed with error VolumeGroupDoesNotExist and code 506)

Q1: Is this expected behavior when the new lun for the storage domain is only exposed to one host in the cluster?

Q2: Where is the documentation that states that all nodes must have visibility to the lun, if this is the case?

Q3: If I am off base, what is causing the failure?

How reproducible: XXXX

Steps to reproduce: XXXX

Actual results: XXXX

Expected results: XXXX

Summary of actions taken to resolve/troubleshoot issue: XXXX
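Since the attach operation fans a GetVGInfoVDS call out to every host in the data center, engine.log by itself is enough to answer Q3, i.e. to see exactly which hosts cannot see the new volume group. The following is a rough triage sketch only; the regexes and the thread-id correlation are assumptions based on the 3.3-era log lines quoted above, not an official parser.

```python
#!/usr/bin/env python
"""Rough engine.log triage: which hosts failed GetVGInfoVDS for a VG?

A sketch only - the regexes and thread-id correlation are assumptions
based on the engine.log excerpts in this report.
"""
import re
import sys

# e.g. "(pool-4-thread-39) [72f2ad3d] START, GetVGInfoVDSCommand(HostName = encl1-usrhv206, ..."
START_RE = re.compile(r"\((?P<thread>[^)]+)\) \[\S+\] START, "
                      r"GetVGInfoVDSCommand\(HostName = (?P<host>[^,]+),")
# e.g. "(pool-4-thread-39) [72f2ad3d] Command GetVGInfoVDS execution failed. ..."
FAIL_RE = re.compile(r"\((?P<thread>[^)]+)\) \[\S+\] "
                     r"Command GetVGInfoVDS execution failed")

def triage(path):
    in_flight = {}               # thread id -> host of the GetVGInfoVDS started there
    queried, failed = set(), set()
    with open(path) as log:
        for line in log:
            m = START_RE.search(line)
            if m:
                in_flight[m.group("thread")] = m.group("host")
                queried.add(m.group("host"))
                continue
            m = FAIL_RE.search(line)
            if m and m.group("thread") in in_flight:
                failed.add(in_flight[m.group("thread")])
    return queried, failed

if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/ovirt-engine/engine.log"
    queried, failed = triage(path)
    print("Hosts queried with GetVGInfoVDS: %s" % ", ".join(sorted(queried)))
    print("Hosts where it failed (cannot see the VG): %s"
          % (", ".join(sorted(failed)) or "none"))
```

Run against the full engine.log, every host except encl1-usrhv201 should end up in the failed list if the analysis above is right, which would confirm that the attach fails because only one host can see the LUN.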