Bug 988299

| Summary: | Impossible to start VM from Gluster Storage Domain | | |
|---|---|---|---|
| Product: | [Retired] oVirt | Reporter: | Maciej Malesa <mmalesa> |
| Component: | ovirt-engine-core | Assignee: | Deepak C Shetty <deepakcs> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 3.3 | CC: | acathrow, amureini, andrew, ar1666, barumuga, bharata.rao, cdhouch, chris.sullivan, deepakcs, hchiramm, iheim, mmalesa, sabose, samppah, sankarshan, yeylon |
| Target Milestone: | --- | | |
| Target Release: | 3.3 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| URL: | http://gerrit.ovirt.org/#/c/17994/ | | |
| Whiteboard: | gluster | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-09-23 07:36:18 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 918494 | | |
| Attachments: | | | |
Description (Maciej Malesa, 2013-07-25 09:24:13 UTC)
Bala, could you check this? please add 1. engine logs 2. vdsm logs 3. supervdsm logs Hi Maciej, 1) Please look at the below links for the pre-req. needed to use Gluster volume (aka Gluster cluster) as part of oVirt Storage domain ... http://www.ovirt.org/Features/GlusterFS_Storage_Domain#Testing http://www.ovirt.org/Features/GlusterFS_Storage_Domain#Important_Pre-requisites So apart from 36:36, you need to set other glusterd and gluster volume options (manually for now) to consume gluster cluster as a storage domain. 2) Per the pre-req. mentioned in the above page, we need a minm. of libvirt version 1.0.1 and qemu version 1.3 , since these are the respective versions that have gluster support, which is needed by oVirt Gluster storage domain Can you provide the libvirt and qemu version on your test system ? 3) engine, vdsm and supervdsm logs will help as Bala already noted. For now it looks un-related to the libvirt/qemu version issue.. but can't be sure, unless I see the VDSM logs to understand the stack from where the above error is coming. Maciej, Between step 3 and 4, did you create a new VM disk on the storage domain or not ? (In reply to Deepak C Shetty from comment #3) > Hi Maciej, > > 1) Please look at the below links for the pre-req. needed to use Gluster > volume (aka Gluster cluster) as part of oVirt Storage domain ... > > http://www.ovirt.org/Features/GlusterFS_Storage_Domain#Testing > http://www.ovirt.org/Features/GlusterFS_Storage_Domain#Important_Pre- > requisites > > So apart from 36:36, you need to set other glusterd and gluster volume > options > (manually for now) to consume gluster cluster as a storage domain. is this needed even when creating the volume via ovirt-engine? > > 2) Per the pre-req. mentioned in the above page, we need a minm. of libvirt > version 1.0.1 and qemu version 1.3 , since these are the respective versions > that have gluster support, which is needed by oVirt Gluster storage domain doesn't vdsm require this minimal versions via its spec to avoid unneeded issues here? (In reply to Itamar Heim from comment #5) > (In reply to Deepak C Shetty from comment #3) > > Hi Maciej, > > > > 1) Please look at the below links for the pre-req. needed to use Gluster > > volume (aka Gluster cluster) as part of oVirt Storage domain ... > > > > http://www.ovirt.org/Features/GlusterFS_Storage_Domain#Testing > > http://www.ovirt.org/Features/GlusterFS_Storage_Domain#Important_Pre- > > requisites > > > > So apart from 36:36, you need to set other glusterd and gluster volume > > options > > (manually for now) to consume gluster cluster as a storage domain. > > is this needed even when creating the volume via ovirt-engine? Yes it is, since there is no way to set these options (as of today) via the 'optimize for virt. store' settings. I am working with Gluster India team to see how best this can be done from UI, so that in future, we don't have a need to do this manual steps.. but reaching a conclusion on that topic has taken some time! > > > > > 2) Per the pre-req. mentioned in the above page, we need a minm. of libvirt > > version 1.0.1 and qemu version 1.3 , since these are the respective versions > > that have gluster support, which is needed by oVirt Gluster storage domain > > doesn't vdsm require this minimal versions via its spec to avoid unneeded > issues here? Current vdsm.spec has qemu-kvm >=0.15 ... If i change that to qemu-kvm >= 1.3 i believe VDSM won't get installed for many of the distro's that doesn't have qemu 1.3.. 
so I believe changing vdsm.spec to force 1.3 is not the right thing. It also wouldn't work for the case where user doesn't want to use gluster storage domain but VDSM will force qemu 1.3 which isn't good too. I am not sure whether we need to handle this as part of oVirt Cluster version 3.3 dep ? is there a way we can force VDSM dep pkg version based on Cluster version being used ? It still may not be ideal, since User can use oVirt Cluster ver 3.3 but use NFS/iSCSI domain and not Gluster... so I am open to suggestions on how to handle this issue in the right way ? thanx, deepak P.S. Having said that I am still not sure if the qemu/libvirt version is the issue here.. since we don't have VDSM logs, we can't know the real reason for the reported issue yet! But that is separate from the issue you raised on how to handle the pkg version dep. (In reply to Deepak C Shetty from comment #4) > Maciej, > Between step 3 and 4, did you create a new VM disk on the storage domain > or not ? Hi. I created new VM and after that I created disk from the "wizard". It did not start. Next approach was to create disk first and attach it to the new VM. Then I issued VM start. Effect was the same. I'm seeing the same issue even after setting the pre-req options on the latest 3.3 beta. Caerie, Can you exactly provide the steps you did on the Gluster volume side ? Also provide details on where did you install the oVirt engine and VDSM from, so that I can try to re-create. Lastly, can u provide the VDSM side logs pls ? Without logs its too difficult to make out. I did a new CentOS 6.4 install. I used the following repos for GlusterFS and Ovirt: [ovirt-beta] name=Beta builds of the oVirt project baseurl=http://ovirt.org/releases/beta/rpm/EL/$releasever/ enabled=1 skip_if_unavailable=1 gpgcheck=0 [glusterfs-epel] name=GlusterFS is a clustered file-system capable of scaling to several petabytes. baseurl=http://download.gluster.org/pub/gluster/glusterfs/3.4/3.4.0/EPEL.repo/epel-$releasever/$basearch/ enabled=1 skip_if_unavailable=1 gpgcheck=0 I added the gluster nodes as Hosts in Ovirt as their own Cluster and attempted to create the Volume but could not through the GUI (I didn't know about the pre-reqs until I found this bug report today). So I created the Distributed Replcated volume with: gluster volume create vmstorage replica 2 192.168.12.108:/bricks/brick1 192.168.12.109:/bricks/brick1 The Ovirt GUI immediately saw the volume and added it. I then used the GUI to add 2 more bricks. I then made a second Cluster of VM servers and added those to Ovirt. I then created the Storage Domain using the gluster mount created above, 192.168.12.108:vmstorage Then I created a blank VM using the wizard and the disk 16G disk for it. When I tried to start it up I received the Bad volume specification error. I found this bug report, and followed the pre-req steps for setting the gluster options and the change to the glusterd.vol file. I restarted the glusterd the gluster volume. I was then able to click the Optimize for VirtStore option and it seemed to work. I deleted the original VM, created a new one, and receive the same error when trying to start it. From the ovirt-engine.log: INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQ uartzScheduler_Worker-31) Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM test3 is do wn. 
Exit message: Bad volume specification {'index': 0, 'iface': 'virtio', 'reqsize': '0', 'format': 'raw', 'type': 'disk', 'volumeID': '6511e8b7-64b1-4928-8e6c-52041e5a1733', 'apparentsize': '17179869184', 'imageID': '5b65e144-4c0a-4d22-b372-7bc55d8e2785', 'specParams': {}, 'readonly': 'false', 'domainID': '27fadc4e-a194-4bad-aef0-b85d73d1e51b', 'deviceId': '5b65e144-4c0a-4d22-b372-7bc55d8e2785', 'truesize': '0', 'poolID': 'a8276303-9512-46e1-a2ad-621123ab45dd', 'device': 'disk', 'shared': 'false', 'propagateErrors': 'off', 'optional': 'false'}.

Attaching the VDSM logs from the first gluster node and the ovirt-engine log.

Created attachment 781451 [details]
ovirt-engine.log AND all vdsm logs for gluster node
I looked at the engine.log and vdsm.log from your zip. It seems like the Error "Bad volume specification" is coming on the Engine side only. I don't see any error on the VDSM side. I looked around the 2013-08-01 00:12 to 2013-08-01 00:14 time window on engine and vdsm logs I see that the reported error is only on the Engine side. There is no ERROR or WARN related events in the vdsm.log. So someone from Engine side needs to debug this to figure why Engine thinks its a bad volume spec. Having said that.. I have few interesting observations... 1) On the VDSM side i see connectStorageServer but i don't see createStorageDomain and createVolume calls. which means.. the Engine never passed createStorageDomain and createVolume (when user created a 16GB disk) to VDSM... thats really strange! and i have never seen something like this before. 2) So what i see in the above time window on the VDSM side is.. connectStorageServer (which mounts the gluster volume on VDSM host) createStorageDomain - missing createVolume - missing connectStoragePool createVm - missing... which is expected, since Engine fails with the 'bad volume spec' error I am not much of an expert to debug the Engine side logs, so it would be good if someone can help debug from engine side. We also need to understnad what 'bad volume spec' error of engine actually means ? I have no clue looking at the engine.log thanx, deepak Just to try to narrow this down, I reinstalled everything and added back in only a two node glusterfs setup. Added the hosts to Ovirt but still couldn't create the Volume through the GUI (after making sure all pre-reqs were done) I kept getting a 500 GUI error. Created the glusterfs volume via the CLI and it showed up in the GUI. Created a virtual machine host to be the storage master. Created a new storage domain using the glusterf fs volume. Created a test disk in the domain. This creates all the files it should on the glusterfs volume. Attempted to remove the disk and it fails, setting the disk to illegal status. It looks like it's trying to find a volume named 0a414f79-0b29-4ca7-a2d7-733158969dfc and failing for some reason, even though that is what is located at the root of my glusterfs volume : total 4 drwxr-xr-x. 4 vdsm kvm 91 Aug 1 13:52 . drwxr-xr-x. 3 root root 4096 Jul 31 10:29 .. drwxr-xr-x. 5 vdsm kvm 45 Aug 1 13:53 0a414f79-0b29-4ca7-a2d7-733158969dfc -rwxr-xr-x. 1 vdsm kvm 0 Aug 1 13:52 __DIRECT_IO_TEST__ From the ovirt-engine.log 2013-08-01 16:06:03,684 INFO [org.ovirt.engine.core.bll.RemoveDiskCommand] (pool-6-thread-43) [4b1dfc61] Running command: RemoveDiskCommand internal: false. Entities affected : ID: 618ef3cb-7d43-4519-a9a4-bb718fe9ec68 Type: Disk 2013-08-01 16:06:03,687 INFO [org.ovirt.engine.core.bll.RemoveImageCommand] (pool-6-thread-43) Running command: RemoveImageCommand internal: true. 
Entities affected : ID: 0a414f79-0b29-4ca7-a2d7-733158969dfc Type: Storage 2013-08-01 16:06:03,697 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (pool-6-thread-43) START, DeleteImageGroupVDSCommand( storagePool Id = dcffbd47-8adb-49da-9e3c-787338c58b58, ignoreFailoverLimit = false, storageDomainId = 0a414f79-0b29-4ca7-a2d7-733158969dfc, imageGroupId = 618ef3cb-7d43-4519-a9a4 -bb718fe9ec68, postZeros = false, forceDelete = false), log id: 2b13e97 2013-08-01 16:06:03,765 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (pool-6-thread-43) Failed in DeleteImageGroupVDS method 2013-08-01 16:06:03,766 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (pool-6-thread-43) Error code StorageDomainDoesNotExist and error message IRSGenericException: IRSErrorException: Failed to DeleteImageGroupVDS, error = Storage domain does not exist: ('0a414f79-0b29-4ca7-a2d7-733158969dfc',) 2013-08-01 16:06:03,767 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-6-thread-43) IrsBroker::Failed::DeleteImageGroupVDS due to: IRSErrorException: IRSGenericException: IRSErrorException: Failed to DeleteImageGroupVDS, error = Storage domain does not exist: ('0a414f79-0b29-4ca7-a2d7-733158969dfc',) 2013-08-01 16:06:03,777 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (pool-6-thread-43) FINISH, DeleteImageGroupVDSCommand, log id: 2b13e97 2013-08-01 16:06:03,777 ERROR [org.ovirt.engine.core.bll.RemoveImageCommand] (pool-6-thread-43) Command org.ovirt.engine.core.bll.RemoveImageCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.irsbroker.IRSErrorException: IRSGenericException: IRSErrorException: Failed to DeleteImageGroupVDS, error = Storage domain does not exist: ('0a414f79-0b29-4ca7-a2d7-733158969dfc',) (Failed with VDSM error StorageDomainDoesNotExist and code 358) 2013-08-01 16:06:03,783 INFO [org.ovirt.engine.core.bll.RemoveImageCommand] (pool-6-thread-43) Command [id=7cc38dc4-fd4f-498f-8581-bb471a75b30b]: Compensating CHANGED_STATUS_ONLY of org.ovirt.engine.core.common.businessentities.Image; snapshot: EntityStatusSnapshot [id=a5129d1b-7b5d-4210-a52e-1ff905175e4f, status=ILLEGAL]. 2013-08-01 16:06:03,794 INFO [org.ovirt.engine.core.bll.AsyncTaskManager] (pool-6-thread-43) Removed task 7014e15e-e1fa-440e-8e5b-f0f10762e56f from DataBase 2013-08-01 16:06:03,798 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-6-thread-43) Correlation ID: 4b1dfc61, Job ID: 6c595a2f-46c0-4d12-bcf7-3bff04b6e421, Call Stack: null, Custom Event ID: -1, Message: User admin@internal finished removing disk mydisk_disk1 with storage failure in domain vmstoragedomain.
> 2013-08-01 16:06:03,765 ERROR
> [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand]
> (pool-6-thread-43) Failed in DeleteImageGroupVDS method
> 2013-08-01 16:06:03,766 ERROR
> [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand]
> (pool-6-thread-43) Error code StorageDomainDoesNotExist and error message
> IRSGenericException: IRSErrorException: Failed to DeleteImageGroupVDS, error
> = Storage domain does not exist: ('0a414f79-0b29-4ca7-a2d7-733158969dfc',)
> 2013-08-01 16:06:03,767 ERROR
> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand]
> (pool-6-thread-43) IrsBroker::Failed::DeleteImageGroupVDS due to:
> IRSErrorException: IRSGenericException: IRSErrorException: Failed to
> DeleteImageGroupVDS, error = Storage domain does not exist:
> ('0a414f79-0b29-4ca7-a2d7-733158969dfc',)
This matches with what I saw and reported as observation on VDSM side, in my prev. comment. Since createStorageDomain is not being called on the VDSM side, there is really no storage domain created on the host.. which is what Engine log is saying here. But the root cause seems to be the fact that createStorageDomain is not passed by Engine to the VDSM... ideally it should have failed at the "Created a new storage domain using the glusterf fs volume." step itself... but strangely that goes thru.
Need more debug from the engine side to decipher why create storage domain is not passed by the Engine to VDSM.
Caerie,
Can you dump the output of 'df -h' on the host side after
"Created a new storage domain using the glusterf fs volume." is shown successful on the GUI ?
thanx,
deepak
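As an illustration of what the requested 'df -h' check is after, here is a minimal sketch; the mount root, server:volume spec and storage domain UUID are taken from the logs in this report and are assumptions for any other setup. If the gluster volume is mounted but the domain UUID directory is missing, createStorageDomain never ran on that host.

    import os

    # Values taken from the logs above; substitute your own server:volume and SD UUID.
    MOUNT_ROOT = "/rhev/data-center/mnt/glusterSD"
    VOLUME_SPEC = "192.168.12.108:vmstorage"
    SD_UUID = "27fadc4e-a194-4bad-aef0-b85d73d1e51b"

    mount_point = os.path.join(MOUNT_ROOT, VOLUME_SPEC)

    # Equivalent of eyeballing 'df -h': is the gluster volume actually mounted?
    with open("/proc/mounts") as mounts:
        mounted = any(mount_point in line for line in mounts)
    print("gluster volume mounted:", mounted)

    # If createStorageDomain ran, the domain UUID directory exists under the mount.
    print("storage domain directory present:", os.path.isdir(os.path.join(mount_point, SD_UUID)))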
(In reply to Deepak C Shetty from comment #14) > > 2013-08-01 16:06:03,765 ERROR > > [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] > > (pool-6-thread-43) Failed in DeleteImageGroupVDS method > > 2013-08-01 16:06:03,766 ERROR > > [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] > > (pool-6-thread-43) Error code StorageDomainDoesNotExist and error message > > IRSGenericException: IRSErrorException: Failed to DeleteImageGroupVDS, error > > = Storage domain does not exist: ('0a414f79-0b29-4ca7-a2d7-733158969dfc',) > > 2013-08-01 16:06:03,767 ERROR > > [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] > > (pool-6-thread-43) IrsBroker::Failed::DeleteImageGroupVDS due to: > > IRSErrorException: IRSGenericException: IRSErrorException: Failed to > > DeleteImageGroupVDS, error = Storage domain does not exist: > > ('0a414f79-0b29-4ca7-a2d7-733158969dfc',) > > This matches with what I saw and reported as observation on VDSM side, in my > prev. comment. Since createStorageDomain is not being called on the VDSM > side, there is really no storage domain created on the host.. which is what > Engine log is saying here. But the root cause seems to be the fact that > createStorageDomain is not passed by Engine to the VDSM... ideally it should > have failed at the "Created a new storage domain using the glusterf fs > volume." step itself... but strangely that goes thru. > > Need more debug from the engine side to decipher why create storage domain > is not passed by the Engine to VDSM. Deepak, 2013-07-31 15:29:59,996 INFO [org.ovirt.engine.core.vdsbroker.CreateVmVDSCommand] (pool-6-thread-39) START, CreateVmVDSCommand(HostName = vmhp61m2, HostId = ca4f4013-e3f2-4c8f-89fc-c6507f771eb3, vmId=70b48ccd-6479-40c8-a14f-8351dee1d144, vm=VM [test3]), log id: 4f3f3a43 Is it possible that the VDSM log attached if from the storage node and not hypervisore node vmhp61m2? Caerie - can you confirm. > > Caerie, > Can you dump the output of 'df -h' on the host side after > "Created a new storage domain using the glusterf fs volume." is shown > successful on the GUI ? > Also, in the engine logs (from comment 10)- 2013-07-31 15:26:51,803 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-6-thread-39) vds nfsp14m2 reported domain 27fadc4e-a194-4bad-aef0-b85d73d1e51b:glusterstoredomain as in problem, moving the vds to status NonOperational Deepak, looks like there is some issue with storage domain ? 
But in the vdsm log Thread-68109::DEBUG::2013-07-31 15:26:56,438::task::579::TaskManager.Task::(_updateState) Task=`0b717e05-e630-4087-91de-8d6073ec259f`::moving from state init -> state preparing Thread-68109::INFO::2013-07-31 15:26:56,439::logUtils::44::dispatcher::(wrapper) Run and protect: repoStats(options=None) Thread-68109::INFO::2013-07-31 15:26:56,439::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response: {'a806930b-177c-45c6-a8f8-3650892f9639': {'delay': '0.000398594', 'lastCheck': '8.5', 'code': 0, 'valid': True, 'version': 0}, '27fadc4e-a194-4bad-aef0-b85d73d1e51b': {'delay': '0.00172392', 'lastCheck': '8.6', 'code': 0, 'valid': True, 'version': 3}} > > thanx, > deepak (In reply to Sahina Bose from comment #15) > (In reply to Deepak C Shetty from comment #14) > > > 2013-08-01 16:06:03,765 ERROR > > > [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] > > > (pool-6-thread-43) Failed in DeleteImageGroupVDS method > > > 2013-08-01 16:06:03,766 ERROR > > > [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] > > > (pool-6-thread-43) Error code StorageDomainDoesNotExist and error message > > > IRSGenericException: IRSErrorException: Failed to DeleteImageGroupVDS, error > > > = Storage domain does not exist: ('0a414f79-0b29-4ca7-a2d7-733158969dfc',) > > > 2013-08-01 16:06:03,767 ERROR > > > [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] > > > (pool-6-thread-43) IrsBroker::Failed::DeleteImageGroupVDS due to: > > > IRSErrorException: IRSGenericException: IRSErrorException: Failed to > > > DeleteImageGroupVDS, error = Storage domain does not exist: > > > ('0a414f79-0b29-4ca7-a2d7-733158969dfc',) > > > > This matches with what I saw and reported as observation on VDSM side, in my > > prev. comment. Since createStorageDomain is not being called on the VDSM > > side, there is really no storage domain created on the host.. which is what > > Engine log is saying here. But the root cause seems to be the fact that > > createStorageDomain is not passed by Engine to the VDSM... ideally it should > > have failed at the "Created a new storage domain using the glusterf fs > > volume." step itself... but strangely that goes thru. > > > > Need more debug from the engine side to decipher why create storage domain > > is not passed by the Engine to VDSM. > > Deepak, > 2013-07-31 15:29:59,996 INFO > [org.ovirt.engine.core.vdsbroker.CreateVmVDSCommand] (pool-6-thread-39) > START, CreateVmVDSCommand(HostName = vmhp61m2, HostId = > ca4f4013-e3f2-4c8f-89fc-c6507f771eb3, > vmId=70b48ccd-6479-40c8-a14f-8351dee1d144, vm=VM [test3]), log id: 4f3f3a43 > > Is it possible that the VDSM log attached if from the storage node and not > hypervisore node vmhp61m2? > Caerie - can you confirm. Sahina, Agree.. I think Caerie is attaching the VDSM log from glsuter node, we need the log from VDSM / hypevisor host node. In fact that was my first doubt too.. but i overlooked it, since i saw connectStoragePool being called in the vdsm.log. But thinking abt this again, I guess the storagepool and storageserver concepts gets used on the storage node as well.. so your guess is right. Caerie, pls provide the vdsm.log from hypervisor host (vmhp61m2) > > > > Caerie, > > Can you dump the output of 'df -h' on the host side after > > "Created a new storage domain using the glusterf fs volume." is shown > > successful on the GUI ? 
> > > > Also, in the engine logs (from comment 10)- > 2013-07-31 15:26:51,803 WARN > [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] > (pool-6-thread-39) vds nfsp14m2 reported domain > 27fadc4e-a194-4bad-aef0-b85d73d1e51b:glusterstoredomain as in problem, > moving the vds to status NonOperational > > > Deepak, looks like there is some issue with storage domain ? > But in the vdsm log > > Thread-68109::DEBUG::2013-07-31 > 15:26:56,438::task::579::TaskManager.Task::(_updateState) > Task=`0b717e05-e630-4087-91de-8d6073ec259f`::moving from state init -> state > preparing > Thread-68109::INFO::2013-07-31 > 15:26:56,439::logUtils::44::dispatcher::(wrapper) Run and protect: > repoStats(options=None) > Thread-68109::INFO::2013-07-31 > 15:26:56,439::logUtils::47::dispatcher::(wrapper) Run and protect: > repoStats, Return response: {'a806930b-177c-45c6-a8f8-3650892f9639': > {'delay': '0.000398594', 'lastCheck': '8.5', 'code': 0, 'valid': True, > 'version': 0}, '27fadc4e-a194-4bad-aef0-b85d73d1e51b': {'delay': > '0.00172392', 'lastCheck': '8.6', 'code': 0, 'valid': True, 'version': 3}} Sahina, if your guess is right, the vdsm.log attached itself is wrong.. so no point in seeing repoStats in vdsm.log. Bala/Sahina, One another reason why i thought the vdsm.log provided was from the hyp. host is because i saw this... Thread-89425::INFO::2013-08-01 00:12:08,716::logUtils::44::dispatcher::(wrapper) Run and protect: connectStorageServer(domType=7, spUUID='00000000-0000-0000-0000-000000000000', conList=[{'port': '', 'connection': '192.168.12.108:vmstorage', 'iqn': '', 'portal': '', 'user': '', 'vfs_type': 'glusterfs', 'password': '******', 'id': 'c79ae912-4adb-4407-97e7-f57ebebe6b3b'}], options=None) Thread-89425::DEBUG::2013-08-01 00:12:08,738::mount::226::Storage.Misc.excCmd::(_runcmd) '/usr/bin/sudo -n /bin/mount -t glusterfs 192.168.12.108:vmstorage /rhev/data-center/mnt/glusterSD/192.168.12.108:vmstorage' (cwd None) Thread-89425::DEBUG::2013-08-01 00:12:09,101::hsm::2333::Storage.HSM::(__prefetchDomains) glusterDomPath: glusterSD/* Thread-89425::DEBUG::2013-08-01 00:12:09,123::hsm::2345::Storage.HSM::(__prefetchDomains) Found SD uuids: ('27fadc4e-a194-4bad-aef0-b85d73d1e51b',) Thread-89425::DEBUG::2013-08-01 00:12:09,123::hsm::2396::Storage.HSM::(connectStorageServer) knownSDs: {a806930b-177c-45c6-a8f8-3650892f9639: storage.nfsSD.findDomain, 27fadc4e-a194-4bad-aef0-b85d73d1e51b: storage.glusterSD.findDomain} Thread-89425::INFO::2013-08-01 00:12:09,124::logUtils::47::dispatcher::(wrapper) Run and protect: connectStorageServer, Return response: {'statuslist': [{'status': 0, 'id': 'c79ae912-4adb-4407-97e7-f57ebebe6b3b'}]} domType=7 is GLUSTERFS_DOMAIN Does connectStorageServer makes sense if this vdsm.log is from gluster only node ? Deepak, Looks like virt and storage are running on the same node. And this log is from that, but not from the node on which the VM creation was done? Yes, the vdsm logs were to the gluster node as stated in the file comments. I've since redone the entire setup from scratch and made it simpler. These logs go to: 1 - Virtual Machine host 2 - GlusterFS Host nodes 1 gluster volume with two bricks in distribute replicate 2 Including logs for: ovirt-engine virtual machine host vdsm logs gluster node 1 - glusterfs and vdsm logs gluster node 2 - glusterfs and vdsm logs Created attachment 782002 [details]
all logs for ovirt, virtual host, two gluster nodes
Thanks Caerie, I looked into the vmhost_vdsm.log and engine.log and found some details. From engine.log... 2013-08-02 09:48:44,527 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-54) Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM testvm1 is down. Exit message: Bad volume specification {'index': 0, 'iface': 'virtio', 'reqsize': '0', 'format': 'raw', 'type': 'disk', 'volumeID': '88ead0ec-9cb1-40a6-9755-d69806ac9146', 'apparentsize': '17179869184', 'imageID': '5242493d-2420-4079-9dcc-81dc032fb1d4', 'specParams': {}, 'readonly': 'false', 'domainID': '0a414f79-0b29-4ca7-a2d7-733158969dfc', 'deviceId': '5242493d-2420-4079-9dcc-81dc032fb1d4', 'truesize': '17179869184', 'poolID': 'dcffbd47-8adb-49da-9e3c-787338c58b58', 'device': 'disk', 'shared': 'false', 'propagateErrors': 'off', 'optional': 'false'}. So the problem is reported at 2013-08-02 09:48:44 time During the same time from vmhost_vdsm.log ... Thread-43199::ERROR::2013-08-02 09:48:48,414::task::850::TaskManager.Task::(_setError) Task=`e58043ac-226d-45c8-8fed-d85ccd148ed5`::Unexpected error Traceback (most recent call last): File "/usr/share/vdsm/storage/task.py", line 857, in _run return fn(*args, **kargs) File "/usr/share/vdsm/logUtils.py", line 45, in wrapper res = f(*args, **kwargs) File "/usr/share/vdsm/storage/hsm.py", line 3251, in prepareImage 'vmVolInfo': vol.getVmVolumeInfo()} File "/usr/share/vdsm/storage/glusterVolume.py", line 30, in getVmVolumeInfo volInfo = svdsmProxy.glusterVolumeInfo(volname, volfileServer) File "/usr/share/vdsm/supervdsm.py", line 50, in __call__ return callMethod() File "/usr/share/vdsm/supervdsm.py", line 48, in <lambda> **kwargs) File "<string>", line 2, in glusterVolumeInfo File "/usr/lib64/python2.6/multiprocessing/managers.py", line 740, in _callmethod raise convert_to_error(kind, result) OSError: [Errno 2] No such file or directory: gluster Thread-43199::DEBUG::2013-08-02 09:48:48,416::task::869::TaskManager.Task::(_run) Task=`e58043ac-226d-45c8-8fed-d85ccd148ed5`::Task._run: e58043ac-226d-45c8-8fed-d85ccd148ed5 ('0a414f79-0b29-4ca7-a2d7-733158969dfc', 'dcffbd47-8adb-49da-9e3c-787338c58b58', '5242493d-2420-4079-9dcc-81dc032fb1d4', '88ead0ec-9cb1-40a6-9755-d69806ac9146') {} failed - stopping task Thread-43199::DEBUG::2013-08-02 09:48:48,416::task::1194::TaskManager.Task::(stop) Task=`e58043ac-226d-45c8-8fed-d85ccd148ed5`::stopping in state preparing (force False) Thread-43199::DEBUG::2013-08-02 09:48:48,416::task::974::TaskManager.Task::(_decref) Task=`e58043ac-226d-45c8-8fed-d85ccd148ed5`::ref 1 aborting True Thread-43199::ERROR::2013-08-02 09:48:48,421::dispatcher::70::Storage.Dispatcher.Protect::(run) [Errno 2] No such file or directory: gluster Traceback (most recent call last): File "/usr/share/vdsm/storage/dispatcher.py", line 62, in run result = ctask.prepare(self.func, *args, **kwargs) File "/usr/share/vdsm/storage/task.py", line 1159, in prepare raise self.error OSError: [Errno 2] No such file or directory: gluster Thread-43199::DEBUG::2013-08-02 09:48:48,422::vm::2034::vm.Vm::(_startUnderlyingVm) vmId=`42b8b60f-7d89-4780-a27e-c0627cd49f8a`::_ongoingCreations released Thread-43199::ERROR::2013-08-02 09:48:48,422::vm::2060::vm.Vm::(_startUnderlyingVm) vmId=`42b8b60f-7d89-4780-a27e-c0627cd49f8a`::The vm start process failed Traceback (most recent call last): File "/usr/share/vdsm/vm.py", line 2020, in _startUnderlyingVm self._run() File "/usr/share/vdsm/vm.py", line 2815, in _run 
self.preparePaths(devices[DISK_DEVICES]) File "/usr/share/vdsm/vm.py", line 2081, in preparePaths drive['path'] = self.cif.prepareVolumePath(drive, self.id) File "/usr/share/vdsm/clientIF.py", line 256, in prepareVolumePath raise vm.VolumeError(drive) VolumeError: Bad volume specification {'index': 0, 'iface': 'virtio', 'reqsize': '0', 'format': 'raw', 'type': 'disk', 'volumeID': '88ead0ec-9cb1-40a6-9755-d69806ac9146', 'apparentsize': '17179869184', 'imageID': '5242493d-2420-4079-9dcc-81dc032fb1d4', 'specParams': {}, 'readonly': 'false', 'domainID': '0a414f79-0b29-4ca7-a2d7-733158969dfc', 'deviceId': '5242493d-2420-4079-9dcc-81dc032fb1d4', 'truesize': '17179869184', 'poolID': 'dcffbd47-8adb-49da-9e3c-787338c58b58', 'device': 'disk', 'shared': 'false', 'propagateErrors': 'off', 'optional': 'false'} Thread-43199::DEBUG::2013-08-02 09:48:48,429::vm::2444::vm.Vm::(setDownStatus) vmId=`42b8b60f-7d89-4780-a27e-c0627cd49f8a`::Changed state to Down: Bad volume specification {'index': 0, 'iface': 'virtio', 'reqsize': '0', 'format': 'raw', 'type': 'disk', 'volumeID': '88ead0ec-9cb1-40a6-9755-d69806ac9146', 'apparentsize': '17179869184', 'imageID': '5242493d-2420-4079-9dcc-81dc032fb1d4', 'specParams': {}, 'readonly': 'false', 'domainID': '0a414f79-0b29-4ca7-a2d7-733158969dfc', 'deviceId': '5242493d-2420-4079-9dcc-81dc032fb1d4', 'truesize': '17179869184', 'poolID': 'dcffbd47-8adb-49da-9e3c-787338c58b58', 'device': 'disk', 'shared': 'false', 'propagateErrors': 'off', 'optional': 'false'} The root cause being... OSError: [Errno 2] No such file or directory: gluster So it looks like on the VDSM host you don't have /usr/sbin/gluster or u have the gluster binary but not in the above path ? Caerie, can you confirm the below... 1) /usr/sbin/gluster present on vdsm host ? 2) Is the glusterd service running on the VDSM host ? 3) 'gluster volume info' on VDSM host 4) 'gluster volume info --remote-host=192.168.12.108' from VDSM host ? (assuming 192.168.12.108 is one of your gluster nodes) I still didn't see things like createStorageDomain etc in the vmhost_vdsm.log but thats probably bcos its rolled over.. nevertheless, given we see the VDSM error at the same time as engine.log, I think we don't need to see other vdsm host logs. Bala, So glusterVolume.py tries to do ... volInfo = svdsmProxy.glusterVolumeInfo(volname, volfileServer) which goes to the vdsm-gluster plugin/cli code in VDSM which is failing with OSError: [Errno 2] No such file or directory: gluster So any idea what are the cases when gluster vdsm plugin can fail with the above error ? One obvious is gluster cli not installed or present on VDSM host, but can there be other issues ? thanx, deepak Caerie, In addition to the above, pls provide the output of below commands executed on the VDSM host ( not the gluster node) 1) rpm -qa| grep gluster (In reply to Deepak C Shetty from comment #21) > Thanks Caerie, > I looked into the vmhost_vdsm.log and engine.log and found some details. > > From engine.log... > > 2013-08-02 09:48:44,527 INFO > [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] > (DefaultQuartzScheduler_Worker-54) Correlation ID: null, Call Stack: null, > Custom Event ID: -1, Message: VM testvm1 is down. 
Exit message: Bad volume > specification {'index': 0, 'iface': 'virtio', 'reqsize': '0', 'format': > 'raw', 'type': 'disk', 'volumeID': '88ead0ec-9cb1-40a6-9755-d69806ac9146', > 'apparentsize': '17179869184', 'imageID': > '5242493d-2420-4079-9dcc-81dc032fb1d4', 'specParams': {}, 'readonly': > 'false', 'domainID': '0a414f79-0b29-4ca7-a2d7-733158969dfc', 'deviceId': > '5242493d-2420-4079-9dcc-81dc032fb1d4', 'truesize': '17179869184', 'poolID': > 'dcffbd47-8adb-49da-9e3c-787338c58b58', 'device': 'disk', 'shared': 'false', > 'propagateErrors': 'off', 'optional': 'false'}. > > So the problem is reported at 2013-08-02 09:48:44 time > > During the same time from vmhost_vdsm.log ... > > Thread-43199::ERROR::2013-08-02 > 09:48:48,414::task::850::TaskManager.Task::(_setError) > Task=`e58043ac-226d-45c8-8fed-d85ccd148ed5`::Unexpected error > Traceback (most recent call last): > File "/usr/share/vdsm/storage/task.py", line 857, in _run > return fn(*args, **kargs) > File "/usr/share/vdsm/logUtils.py", line 45, in wrapper > res = f(*args, **kwargs) > File "/usr/share/vdsm/storage/hsm.py", line 3251, in prepareImage > 'vmVolInfo': vol.getVmVolumeInfo()} > File "/usr/share/vdsm/storage/glusterVolume.py", line 30, in > getVmVolumeInfo > volInfo = svdsmProxy.glusterVolumeInfo(volname, volfileServer) > File "/usr/share/vdsm/supervdsm.py", line 50, in __call__ > return callMethod() > File "/usr/share/vdsm/supervdsm.py", line 48, in <lambda> > **kwargs) > File "<string>", line 2, in glusterVolumeInfo > File "/usr/lib64/python2.6/multiprocessing/managers.py", line 740, in > _callmethod > raise convert_to_error(kind, result) > OSError: [Errno 2] No such file or directory: gluster > Thread-43199::DEBUG::2013-08-02 > 09:48:48,416::task::869::TaskManager.Task::(_run) > Task=`e58043ac-226d-45c8-8fed-d85ccd148ed5`::Task._run: > e58043ac-226d-45c8-8fed-d85ccd148ed5 > ('0a414f79-0b29-4ca7-a2d7-733158969dfc', > 'dcffbd47-8adb-49da-9e3c-787338c58b58', > '5242493d-2420-4079-9dcc-81dc032fb1d4', > '88ead0ec-9cb1-40a6-9755-d69806ac9146') {} failed - stopping task > Thread-43199::DEBUG::2013-08-02 > 09:48:48,416::task::1194::TaskManager.Task::(stop) > Task=`e58043ac-226d-45c8-8fed-d85ccd148ed5`::stopping in state preparing > (force False) > Thread-43199::DEBUG::2013-08-02 > 09:48:48,416::task::974::TaskManager.Task::(_decref) > Task=`e58043ac-226d-45c8-8fed-d85ccd148ed5`::ref 1 aborting True > > > Thread-43199::ERROR::2013-08-02 > 09:48:48,421::dispatcher::70::Storage.Dispatcher.Protect::(run) [Errno 2] No > such file or directory: gluster > Traceback (most recent call last): > File "/usr/share/vdsm/storage/dispatcher.py", line 62, in run > result = ctask.prepare(self.func, *args, **kwargs) > File "/usr/share/vdsm/storage/task.py", line 1159, in prepare > raise self.error > OSError: [Errno 2] No such file or directory: gluster > Thread-43199::DEBUG::2013-08-02 > 09:48:48,422::vm::2034::vm.Vm::(_startUnderlyingVm) > vmId=`42b8b60f-7d89-4780-a27e-c0627cd49f8a`::_ongoingCreations released > Thread-43199::ERROR::2013-08-02 > 09:48:48,422::vm::2060::vm.Vm::(_startUnderlyingVm) > vmId=`42b8b60f-7d89-4780-a27e-c0627cd49f8a`::The vm start process failed > Traceback (most recent call last): > File "/usr/share/vdsm/vm.py", line 2020, in _startUnderlyingVm > self._run() > File "/usr/share/vdsm/vm.py", line 2815, in _run > self.preparePaths(devices[DISK_DEVICES]) > File "/usr/share/vdsm/vm.py", line 2081, in preparePaths > drive['path'] = self.cif.prepareVolumePath(drive, self.id) > File "/usr/share/vdsm/clientIF.py", line 256, in 
prepareVolumePath > raise vm.VolumeError(drive) > VolumeError: Bad volume specification {'index': 0, 'iface': 'virtio', > 'reqsize': '0', 'format': 'raw', 'type': 'disk', 'volumeID': > '88ead0ec-9cb1-40a6-9755-d69806ac9146', 'apparentsize': '17179869184', > 'imageID': '5242493d-2420-4079-9dcc-81dc032fb1d4', 'specParams': {}, > 'readonly': 'false', 'domainID': '0a414f79-0b29-4ca7-a2d7-733158969dfc', > 'deviceId': '5242493d-2420-4079-9dcc-81dc032fb1d4', 'truesize': > '17179869184', 'poolID': 'dcffbd47-8adb-49da-9e3c-787338c58b58', 'device': > 'disk', 'shared': 'false', 'propagateErrors': 'off', 'optional': 'false'} > Thread-43199::DEBUG::2013-08-02 > 09:48:48,429::vm::2444::vm.Vm::(setDownStatus) > vmId=`42b8b60f-7d89-4780-a27e-c0627cd49f8a`::Changed state to Down: Bad > volume specification {'index': 0, 'iface': 'virtio', 'reqsize': '0', > 'format': 'raw', 'type': 'disk', 'volumeID': > '88ead0ec-9cb1-40a6-9755-d69806ac9146', 'apparentsize': '17179869184', > 'imageID': '5242493d-2420-4079-9dcc-81dc032fb1d4', 'specParams': {}, > 'readonly': 'false', 'domainID': '0a414f79-0b29-4ca7-a2d7-733158969dfc', > 'deviceId': '5242493d-2420-4079-9dcc-81dc032fb1d4', 'truesize': > '17179869184', 'poolID': 'dcffbd47-8adb-49da-9e3c-787338c58b58', 'device': > 'disk', 'shared': 'false', 'propagateErrors': 'off', 'optional': 'false'} > > > The root cause being... > OSError: [Errno 2] No such file or directory: gluster > > So it looks like on the VDSM host you don't have /usr/sbin/gluster > or u have the gluster binary but not in the above path ? > > Caerie, can you confirm the below... > > 1) /usr/sbin/gluster present on vdsm host ? > 2) Is the glusterd service running on the VDSM host ? > 3) 'gluster volume info' on VDSM host > 4) 'gluster volume info --remote-host=192.168.12.108' from VDSM host ? > (assuming 192.168.12.108 is one of your gluster nodes) > > I still didn't see things like createStorageDomain etc in the > vmhost_vdsm.log but thats probably bcos its rolled over.. nevertheless, > given we see the VDSM error at the same time as engine.log, I think we don't > need to see other vdsm host logs. > > Bala, > So glusterVolume.py tries to do ... > > volInfo = svdsmProxy.glusterVolumeInfo(volname, volfileServer) > which goes to the vdsm-gluster plugin/cli code in VDSM > gluster.cli is part of vdsm itself. I remember your storage domain patch moved from vdsm-gluster. > which is failing with > > OSError: [Errno 2] No such file or directory: gluster > > So any idea what are the cases when gluster vdsm plugin can fail with the > above error ? One obvious is gluster cli not installed or present on VDSM > host, but can there be other issues ? > > thanx, > deepak (In reply to Bala.FA from comment #23) > > Bala, > > So glusterVolume.py tries to do ... > > > > volInfo = svdsmProxy.glusterVolumeInfo(volname, volfileServer) > > which goes to the vdsm-gluster plugin/cli code in VDSM > > > > gluster.cli is part of vdsm itself. I remember your storage domain patch > moved from vdsm-gluster. Bala, Regret if my question was not clear. Yes gluster.cli is part of VDSM itself, but it depends on /usr/sbin/gluster, which probably is not present on the VDSM host, hence the failure. Caerie, Pls provide the requested info from VDSM host to help root cause further. Thanks. 
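For reference, the OSError in the traceback above is what surfaces when supervdsm tries to exec a gluster CLI that is not installed on the hypervisor host. A minimal sketch of the equivalent manual check follows; the CLI path (/usr/sbin/gluster) and the remote host (one of the gluster nodes in this setup) are assumptions for other environments.

    import os
    import subprocess

    GLUSTER_CLI = "/usr/sbin/gluster"    # assumed path; not present on a client-only install
    GLUSTER_NODE = "192.168.12.108"      # one of the gluster nodes in this setup

    if not os.path.exists(GLUSTER_CLI):
        # Exec of a missing binary fails with ENOENT, which shows up in vdsm.log as
        # "OSError: [Errno 2] No such file or directory: gluster".
        print("gluster CLI not installed on this host")
    else:
        # Roughly the query glusterVolumeInfo wraps, run against a remote gluster node.
        rc = subprocess.call([GLUSTER_CLI, "volume", "info",
                              "--remote-host=%s" % GLUSTER_NODE])
        print("gluster volume info exit status:", rc)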
For the Ovirt Engine box:

[root@ovirtp1m2 ~]# rpm -qa |grep gluster
glusterfs-3.4.0-3.el6.x86_64
glusterfs-fuse-3.4.0-3.el6.x86_64
glusterfs-rdma-3.4.0-3.el6.x86_64

For the Virtual Machine Host:

[root@vmhp60m2 ~]# rpm -qa |grep gluster
glusterfs-rdma-3.4.0-3.el6.x86_64
glusterfs-3.4.0-3.el6.x86_64
glusterfs-fuse-3.4.0-3.el6.x86_64

For the Gluster Nodes:

[root@nfsp12m2 ~]# rpm -qa |grep gluster
glusterfs-server-3.4.0-3.el6.x86_64
glusterfs-3.4.0-3.el6.x86_64
glusterfs-fuse-3.4.0-3.el6.x86_64
glusterfs-rdma-3.4.0-3.el6.x86_64
vdsm-gluster-4.12.0-0.1.rc3.el6.noarch
glusterfs-geo-replication-3.4.0-3.el6.x86_64

[root@nfsp13m2 ~]# rpm -qa |grep gluster
glusterfs-server-3.4.0-3.el6.x86_64
glusterfs-3.4.0-3.el6.x86_64
glusterfs-fuse-3.4.0-3.el6.x86_64
glusterfs-rdma-3.4.0-3.el6.x86_64
vdsm-gluster-4.12.0-0.1.rc3.el6.noarch
glusterfs-geo-replication-3.4.0-3.el6.x86_64

I have two Ovirt Clusters here: one has Gluster enabled and has the Gluster Hosts on it, the other has Virtualization enabled and has the Virtual Host server on it. That would explain the difference in what is installed on each. Now that we have an idea here I'm going to make the packages on the Virtual Machine Host and the Ovirt Engine match those on the Gluster Hosts. This will be great if it's just a package dependency issue.

After installing vdsm-gluster on the Virtual Machine host, the virtual machines are failing to start with an "unknown protocol gluster" message in Ovirt. I then made sure glusterd was running and there didn't seem to be a difference. At least this fixes the original problem of this bug report, "Bad Volume specification". I tried installing vdsm-gluster on the ovirt-engine just to see what would happen, and obviously that didn't change anything. Since this is failing with unknown protocol gluster, I'm going to try removing the virtual machine cluster and re-adding it with both virtualization and gluster service enabled in ovirt. Attaching ovirt-engine log and vdsm log from virtual machine host.

Snippet from engine.log:

2013-08-05 08:51:19,269 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-6-thread-47) [1b05fd77] Correlation ID: 1b05fd77, Job ID: b77a5d81-8b8e-41ca-b4ef-46ac232065cf, Call Stack: null, Custom Event ID: -1, Message: VM test5 was started by admin@internal (Host: vmhp60m2).
2013-08-05 08:51:21,375 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (DefaultQuartzScheduler_Worker-62) START, DestroyVDSCommand(HostName = vmhp60m2, HostId = 9f7eb405-5207-4ac5-b80e-e40ac40354af, vmId=6f505f64-2422-4949-be52-131ae5dfda78, force=false, secondsToWait=0, gracefully=false), log id: 19971bf9
2013-08-05 08:51:21,484 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (DefaultQuartzScheduler_Worker-62) FINISH, DestroyVDSCommand, log id: 19971bf9
2013-08-05 08:51:21,495 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-62) Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM test5 is down. Exit message: internal error unknown protocol type 'gluster'.
[root@vmhp60m2 ~]# rpm -qa |grep gluster glusterfs-rdma-3.4.0-3.el6.x86_64 glusterfs-server-3.4.0-3.el6.x86_64 vdsm-gluster-4.12.0-0.1.rc3.el6.noarch glusterfs-3.4.0-3.el6.x86_64 glusterfs-fuse-3.4.0-3.el6.x86_64 [root@ovirtp1m2 ovirt-engine]# rpm -qa |grep gluster glusterfs-3.4.0-3.el6.x86_64 glusterfs-fuse-3.4.0-3.el6.x86_64 glusterfs-rdma-3.4.0-3.el6.x86_64 vdsm-gluster-4.12.0-18.git78903dd.el6.noarch glusterfs-server-3.4.0-3.el6.x86_64 Created attachment 782850 [details]
engine.log and vdsm.log after adding vdsm-gluster
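The "unknown protocol type 'gluster'" failure reported above is what libvirt returns when the underlying qemu has no gluster (libgfapi) block driver, as the next comment confirms. A minimal sketch of the same check that is later requested as ldd `which qemu-kvm` | grep libgf; the qemu binary path is an assumption (on EL6 it is typically /usr/libexec/qemu-kvm).

    import subprocess

    QEMU_BINARY = "/usr/libexec/qemu-kvm"   # assumed path; adjust for your distro

    # A qemu built without libgfapi cannot open gluster:// disks, and libvirt then
    # reports "internal error unknown protocol type 'gluster'".
    proc = subprocess.Popen(["ldd", QEMU_BINARY], stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    print("libgfapi linked:", b"libgfapi" in out)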
Caerie, Thanks for confirming that after installing vdsm-gluster (which in turn installs glusterfs-server) the original reported problem in this BZ is gone. Ideally just install glsuterfs-server should have worked too. BTW, FYI there is a big mail thread going on in the gluster-devel mailing list on how to package gluster such that these kind of issues (and some other types) are not seen by the end User. I have ensured that the issue reported by this BZ has been reported as a feedback to which I have recd. favourable response... so in future, when the next set of Gluster RPM packaging changes are introduced, it should ensure that for hypervisor only host, all that is needed is glusterfs, glusterfs-libs glusterfs-fuse and glusterfs-cli (proposed) packages. I will be modifying the vdsm.spec (aka vdsm dep for gluster) once that is changed on the gluster project side. Having said that.. reg. the next problem you are seeing.. Here is my debug From vmhost_vdsm.log <disk device="disk" snapshot="no" type="network"> <source name="vmstorage/0a414f79-0b29-4ca7-a2d7-733158969dfc/images/9f88f7d6-1822-47a4-8d55-3747126b2a99/2c62aafc-6f04-4e5a-af48-9284316d35fe" protocol="gluster"> <host name="192.168.12.108" port="0" transport="tcp"/> </source> <target bus="virtio" dev="vda"/> <serial>9f88f7d6-1822-47a4-8d55-3747126b2a99</serial> <driver cache="none" error_policy="stop" io="threads" name="qemu" type="raw"/> </disk> Thread-196808::DEBUG::2013-08-05 08:51:29,754::vm::2034::vm.Vm::(_startUnderlyingVm) vmId=`6f505f64-2422-4949-be52-131ae5dfda78`::_ongoingCreations released Thread-196808::ERROR::2013-08-05 08:51:29,755::vm::2060::vm.Vm::(_startUnderlyingVm) vmId=`6f505f64-2422-4949-be52-131ae5dfda78`::The vm start process failed Traceback (most recent call last): File "/usr/share/vdsm/vm.py", line 2020, in _startUnderlyingVm self._run() File "/usr/share/vdsm/vm.py", line 2901, in _run self._connection.createXML(domxml, flags), File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 76, in wrapper ret = f(*args, **kwargs) File "/usr/lib64/python2.6/site-packages/libvirt.py", line 2645, in createXML if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self) libvirtError: internal error unknown protocol type 'gluster' Thread-196808::DEBUG::2013-08-05 08:51:29,759::vm::2444::vm.Vm::(setDownStatus) vmId=`6f505f64-2422-4949-be52-131ae5dfda78`::Changed state to Down: internal error unknown protocol type 'gluster' This means that VDSM generated the right libvirt XML that is used to create the VM. But looks like your qemu (either qemu-kvm or just qemu) on the VDSM hypervisor host is old and doesn't support 'gluster' as a supported protocol. The minimum qemu version needed in qemu 1.3 Can you pls give the o/p of the following commands from your hyp. host 1) rpm -qa| grep qemu 2) ldd `which qemu-kvm | grep libgf`. Replace qemu-kvm with the actual qemu binary if that is not applicable in your case. [root@vmhp60m2 ~]# rpm -qa |grep qemu gpxe-roms-qemu-0.9.7-6.9.el6.noarch qemu-kvm-tools-0.12.1.2-2.355.0.1.el6.centos.6.x86_64 qemu-kvm-0.12.1.2-2.355.0.1.el6.centos.6.x86_64 qemu-img-0.12.1.2-2.355.0.1.el6.centos.6.x86_64 This appears to be the latest qemu-kvm for Centos 6.4 which is what all of our servers run. I'll try to find something newer that's been backported. I foresee a LOT of people with this same problem that are running 6.4 across their Enterprise. FYI even the ovirt-node for EL6 in beta ISOs runs qemu-kvm-0.12 so that won't work either. Caerie, I agree... 
and I have taken up the same with vdsm-devel list @ https://lists.fedorahosted.org/pipermail/vdsm-devel/2013-August/002497.html Having said that here is the summary of this BZ... 1) Installing vdsm-gluster on the host resolved the "bad volume spec" issue. This will further be fixed on gluster is available post RPM re-structuring and vdsm.spec.in is changed to reflect the dep on the new gluster packages 2) Qemu 1.3 (once installed) should allow you to start a VM gracefully using Gluster storage domain. ( Googling: I don't see a repo for centos which has qemu 1.3) Can you try compiling it if possible to validate this ? thanx, deepak (In reply to Deepak C Shetty from comment #31) > Caerie, > I agree... and I have taken up the same with vdsm-devel list @ > https://lists.fedorahosted.org/pipermail/vdsm-devel/2013-August/002497.html > > Having said that here is the summary of this BZ... > > 1) Installing vdsm-gluster on the host resolved the "bad volume spec" issue. > This will further be fixed on gluster is available post RPM re-structuring > and vdsm.spec.in is changed to reflect the dep on the new gluster packages > > 2) Qemu 1.3 (once installed) should allow you to start a VM gracefully using > Gluster storage domain. ( Googling: I don't see a repo for centos which has > qemu 1.3) Can you try compiling it if possible to validate this ? RHEL 6.5 (and so centos 6.5) will get native gluster support Tracked in 848070 > > thanx, > deepak Update: 1) GlusterFS RPM packaging went thru few changes to accomodate the client & server side usecases. More details abt that discussion is @ http://lists.gnu.org/archive/html/gluster-devel/2013-07/msg00098.html http://lists.gnu.org/archive/html/gluster-devel/2013-08/msg00008.html So the new GlusterFS client side RPMs which are needed by VDSM when its acting as the hyp. host (not storage/gluster node) are available for all the distros in Q. Patch for updating vdsm.spec for this is @ http://gerrit.ovirt.org/#/c/17994/ 2) Per the discussion @ https://lists.fedorahosted.org/pipermail/vdsm-devel/2013-August/002497.html qemu w/ libgfapi support (typically qemu 1.3 but could be qemu <some other version> when libgfapi is backported) is available in F18/F19 repos... either in the distro repo or in the virt-preview repo @ fedorapeople.org/groups/virt/virt-preview/ Per the above thread, qemu w/ libgfapi support is being backported to rhel6 (hence should be available in centos too) and this is being tracked @ https://bugzilla.redhat.com/show_bug.cgi?id=848070 So VDSM as part of installing qemu or User installing it manually will be able to get the latest qemu which has libgfapi (aka gluster backend) support for F18/F19/RHEL/Centos from their respective repos. 3) Note: F17 dep is being removed as it has reached End of Life. See ... http://fedoraproject.org/wiki/End_of_life thanx, deepak Hi. I was wondering if it is possible to include correct version of qemu into ovirt 3.3 repository as it is required for one of ovirt's main new features - Gluster Storage Domain. Is there any release plan for RHEL 6.5? It seems that it will be a long time for having this feature in CentOS (supported by ovirt as I understand). 
Maciek Hi, I've experienced problems very similar to Caerie's using the following config: - Fedora 19 kernel 3.10.10-200 (engine and hosts) - ovirt-engine 3.3.0-0.7.rc2.fc19 - vdsm 4.12.1 - libvirt 1.0.5 - qemu 1.4 - gluster 3.4 - all hosts both Gluster and virt hosts Issues: - Could not start VM from Gluster domain - Could not delete VM disks from Gluster domain (deletion reports successful however disk remains in list with status Illegal) Notes: - All virt hosts mounted Gluster volume successfully - I was able to start a VM created through Ovirt, but I had to start it directly on the virt host via qemu-system-x86_64 -file gluster://... ovirt-engine.log and vdsm.log from virt host will be attached. I'm in the process of upgrading to ovirt-engine 3.3.0-1 and will report back if the problem persists or is resolved. Cheers, Chris Created attachment 794636 [details]
ovirt-engine.log and vdsm.log excerpts at time of VM start
From the logs: File "/usr/share/vdsm/storage/remoteFileHandler.py", line 199, in callCrabRPCFunction raise err OSError: [Errno 2] No such file or directory: '/rhev/data-center/mnt/glusterSD/192.168.12.108:vmstorage/27fadc4e-a194-4bad-aef0-b85d73d1e51b' Thread-4905::DEBUG::2013-07-31 12:39:22,319::domainMonitor::233::Storage.DomainMonitorThread::(_monitorDomain) Domain 27fadc4e-a194-4bad-aef0-b85d73d1e51b changed its status to Invalid Could you make sure you have set the required permissions on bricks and volume options as per http://www.ovirt.org/Features/GlusterFS_Storage_Domain#Important_Pre-requisites Created attachment 795388 [details]
VDSM log at point of VM start failure - ovirt 3.3.0-1 vdsm 4.12.1
(In reply to Sahina Bose from comment #37) > From the logs: > > File "/usr/share/vdsm/storage/remoteFileHandler.py", line 199, in > callCrabRPCFunction > raise err > OSError: [Errno 2] No such file or directory: > '/rhev/data-center/mnt/glusterSD/192.168.12.108:vmstorage/27fadc4e-a194-4bad- > aef0-b85d73d1e51b' > Thread-4905::DEBUG::2013-07-31 > 12:39:22,319::domainMonitor::233::Storage.DomainMonitorThread:: > (_monitorDomain) Domain 27fadc4e-a194-4bad-aef0-b85d73d1e51b changed its > status to Invalid > > Could you make sure you have set the required permissions on bricks and > volume options as per > http://www.ovirt.org/Features/GlusterFS_Storage_Domain#Important_Pre- > requisites Hi Sahina, I'm not sure that the above log excerpt is from the log extracts I provided - I can confirm that the Gluster volume is Active in the oVirt web interface and mounted on the virt host. I've uploaded a more complete VDSM log file as I realized that the previous excerpt did not include the Python tracebacks - my apologies! The VM start error occurs at line 110930 and appears to be a KeyError on the name of the Gluster volume. The pre-reqs you linked to have already been incorporated prior to the error occurring - Gluster volume information, /etc/glusterd.vol file and relevant RPM listing have been uploaded for confirmation. I've completed upgrading to ovirt-engine 3.3.0-1 and the problem persists. Thanks for your help, Chris Created attachment 795389 [details]
Gluster volume info
Created attachment 795390 [details]
glusterd.vol file
Created attachment 795391 [details]
GlusterFS/libvirt/qemu RPM list from virt host
(In reply to Chris Sullivan from comment #39) > Hi Sahina, > > I'm not sure that the above log excerpt is from the log extracts I provided > - I can confirm that the Gluster volume is Active in the oVirt web interface > and mounted on the virt host. > > I've uploaded a more complete VDSM log file as I realized that the previous > excerpt did not include the Python tracebacks - my apologies! The VM start > error occurs at line 110930 and appears to be a KeyError on the name of the > Gluster volume. > > The pre-reqs you linked to have already been incorporated prior to the error > occurring - Gluster volume information, /etc/glusterd.vol file and relevant > RPM listing have been uploaded for confirmation. > > I've completed upgrading to ovirt-engine 3.3.0-1 and the problem persists. > > Thanks for your help, > > Chris After investigating /usr/share/vdsm/storage/glusterVolume.py I identified the issue as being caused by the GlusterFS volume having an underscore in the name (i.e. r410-01:hades_storage). This is converted to be the mount point 'r410-01:hades__storage' underneath the /rhev/data-center/mnt/glusterSD folder - I'm assuming that the first underscore escapes the second. When glusterVolume.py splits the mount point it does not convert the double underscore back to a single one as per the actual volume name, and so it fails when looking up the volume info for the volume as the volume 'hades__storage' does not exist. I removed all the bricks from the hades_storage volume, cleared the fattrs and .glusterFS folders, then recreated the volume under the name hades. VMs now start successfully from the volume. Hope this helps resolve the issue :) I think that the best way to solve the issue would be to modify the VDSM Gluster code to handle volume names with underscores in a manner consistent with how ovirt-engine defines the mountpoints. Cheers, Chris Hi, Also having / (slash) in front of volume name causes "Bad volume specification" -error message when starting VM stored on Gluster Storage Domain. Ie. server.tld:/volname is bad. Mounting of volume is fine and creating storage domain shows no errors or warnings. As far as I know it's perfectly fine to use / with GlusterFS when mounting volumes. -samuli (In reply to Samuli Heinonen from comment #44) > Hi, > > Also having / (slash) in front of volume name causes "Bad volume > specification" -error message when starting VM stored on Gluster Storage > Domain. Ie. server.tld:/volname is bad. Mounting of volume is fine and > creating storage domain shows no errors or warnings. > > As far as I know it's perfectly fine to use / with GlusterFS when mounting > volumes. > > -samuli Hi Samuli, I found the same issue - it's because the / gets converted to an underscore which causes the same problem as an underscore in the actual volume name i.e. it is not properly stripped out when the volume information is being looked up. Cheers, Chris Chris, Samuli, Thanks for reporting these issues and special thanks to Chris for helping root cause it. I have fixed both the '_' and leading '/' issues in http://gerrit.ovirt.org/#/c/19219/ I tested with server:vol_name and server:/vol_name and it worked fine. The fix is very minimal, so pls see if you can validate in your env. by taking the patch and/or applying the patch changes directly in /usr/share/vdsm/storage/glusterVolume.py and restarting vdsmd and re-testing. Sahina, Thanks for the follow-up. thanx, deepak Thanks Deepak, I'm happy to confirm that it fixes the issue when using /. 
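For reference, a minimal sketch of the name handling that the '_' and leading '/' fix has to get right, assuming the escaping Chris observed ('_' doubled, '/' turned into '_') when the mount directory under /rhev/data-center/mnt/glusterSD is built from the server:volume spec. The exact transformation in vdsm may differ; this only illustrates why a naive split on ':' asks gluster about a volume name it does not know.

    def mount_dir_name(spec):
        # Illustrative escaping of a 'server:vol' or 'server:/vol' connection spec
        # into the mount directory name: '_' doubles, '/' becomes '_'.
        return spec.replace("_", "__").replace("/", "_")

    def volume_name_from_mount(mount_dir):
        # Recover the real gluster volume name by undoing the escaping; without this
        # step the lookup asks gluster about 'hades__storage' or '_volname' and fails.
        server, escaped = mount_dir.split(":", 1)
        vol = escaped.replace("__", "\0").replace("_", "/").replace("\0", "_")
        return server, vol.lstrip("/")

    # The two problem cases reported in this bug:
    print(volume_name_from_mount(mount_dir_name("r410-01:hades_storage")))  # ('r410-01', 'hades_storage')
    print(volume_name_from_mount(mount_dir_name("server.tld:/volname")))    # ('server.tld', 'volname')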
-samuli closing as this should be in 3.3 (doing so in bulk, so may be incorrect) |