Description of problem:
VMs failed to power on with the error "Bad Volume Specification". The VDSM logs show "Timeout: Connection timed out".

Version-Release number of selected component (if applicable):
- oVirt: 3.5.1
- GlusterFS: 3.6.1
- Hosts: 4 hosts (compute + storage); each server has 24 bricks
- Guest VMs: more than 100

VM Id: d877313c18d9783ca09b62acf5588048
VDSM logs: http://ur1.ca/jxabi
Engine logs: http://ur1.ca/jxabv

How reproducible:
Create an oVirt host cluster with 4 hosts and create the Gluster volume on them, using the same host nodes for Gluster storage as well as oVirt compute.

Steps to Reproduce:
1. Create the cluster.
2. Reboot one of the host/storage nodes.
3. Try to power on the failed VMs.

Actual results:
All VMs fail to power on with the error "Bad Volume Specification".

Expected results:
VMs should power on without any error.

Additional info:
When I first deployed this cluster, everything worked well (all guest VMs were created and ran successfully). Then one of the host nodes suddenly rebooted, and now none of the VMs can boot; they all fail with "Bad Volume Specification".

VM Id: d877313c18d9783ca09b62acf5588048
VDSM logs: http://ur1.ca/jxabi
Engine logs: http://ur1.ca/jxabv

------------------------
[root@cpu01 ~]# vdsClient -s 0 getVolumeInfo e732a82f-bae9-4368-8b98-dedc1c3814de 00000002-0002-0002-0002-000000000145 6d123509-6867-45cf-83a2-6d679b77d3c5 9030bb43-6bc9-462f-a1b9-f6d5a02fb180
        status = OK
        domain = e732a82f-bae9-4368-8b98-dedc1c3814de
        capacity = 21474836480
        voltype = LEAF
        description =
        parent = 00000000-0000-0000-0000-000000000000
        format = RAW
        image = 6d123509-6867-45cf-83a2-6d679b77d3c5
        uuid = 9030bb43-6bc9-462f-a1b9-f6d5a02fb180
        disktype = 2
        legality = LEGAL
        mtime = 0
        apparentsize = 21474836480
        truesize = 4562972672
        type = SPARSE
        children = []
        pool =
        ctime = 1422676305
---------------------
I have found some disconnection errors in the brick logs.
Hi All,

With the help of the Gluster community and the ovirt-china community, my issue got resolved. The root cause was twofold:

1. The glob operation takes quite a long time, longer than the ioprocess default timeout of 60s.
2. python-ioprocess was updated so that changing the configuration file alone no longer takes effect; because of this, the code has to be patched manually.

Solution (needs to be done on all the hosts):

1. Add the ioprocess timeout value to /etc/vdsm/vdsm.conf:
------------
[irs]
process_pool_timeout = 180
-------------
2. Check /usr/share/vdsm/storage/outOfProcess.py, line 71, and see whether it still contains "IOProcess(DEFAULT_TIMEOUT)". If so, changing the configuration file has no effect, because timeout is now the third parameter of IOProcess.__init__(), not the second.
3. Change IOProcess(DEFAULT_TIMEOUT) to IOProcess(timeout=DEFAULT_TIMEOUT), remove the stale /usr/share/vdsm/storage/outOfProcess.pyc file, and restart the vdsm and supervdsm services on all hosts.

Thanks,
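The positional-argument pitfall in step 2 can be shown with a minimal Python sketch. The classes below are hypothetical stand-ins that only mimic the described signature change; the parameter names other than timeout are illustrative, not the real python-ioprocess API:

```python
# Hypothetical sketch of the described signature change (not real ioprocess code).

class IOProcessOld:
    # Old signature: timeout was an early positional parameter,
    # so IOProcessOld(DEFAULT_TIMEOUT) worked as intended.
    def __init__(self, timeout=60):
        self.timeout = timeout


class IOProcessNew:
    # New signature: timeout moved to the third position
    # (first two parameter names here are made up for illustration).
    def __init__(self, max_threads=0, max_queued_requests=-1, timeout=60):
        self.max_threads = max_threads
        self.max_queued_requests = max_queued_requests
        self.timeout = timeout


DEFAULT_TIMEOUT = 180

# Positional call: the configured 180s silently lands in the wrong parameter,
# leaving the 60s default timeout in place -> the slow glob still times out.
broken = IOProcessNew(DEFAULT_TIMEOUT)
assert broken.timeout == 60
assert broken.max_threads == 180

# Keyword call (the fix in step 3): the value reaches the intended parameter.
fixed = IOProcessNew(timeout=DEFAULT_TIMEOUT)
assert fixed.timeout == 180
```

This is why the positional call kept working against the old signature but silently misconfigured the new one, and why a keyword argument is the robust fix.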
Thanks, Punit, for updating the bug. Can you please close it?
Closing this bug since, per the reporter in #c2, the problem has been solved. Feel free to re-open this bug if the issue still exists.