Bug 1203506 - Bad Volume Specification | Connection Time Out
Summary: Bad Volume Specification | Connection Time Out
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 3.6.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: punit
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-03-19 02:42 UTC by punit
Modified: 2016-02-23 12:33 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-02-23 12:33:07 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description punit 2015-03-19 02:42:03 UTC
Description of problem: VMs fail to power on with the error "Bad Volume Specification"; the VDSM logs show "Timeout: Connection timed out".

Version-Release number of selected component (if applicable):-

oVirt :- 3.5.1
GlusterFS :- 3.6.1
Hosts :- 4 (compute + storage); each server has 24 bricks
Guest VMs :- more than 100

VMId :- d877313c18d9783ca09b62acf5588048

VDSM Logs :- http://ur1.ca/jxabi
Engine Logs :- http://ur1.ca/jxabv


How reproducible: Create an oVirt cluster with 4 hosts and create the Gluster volume on them, using the same nodes for both Gluster storage and oVirt compute.


Steps to Reproduce:
1. Create the cluster.
2. Reboot one of the host/storage nodes.
3. Try to power on the affected VMs.

Actual results: All VMs fail to power on with the error "Bad Volume Specification".


Expected results: The VMs should power on without any error.


Additional info: When I first deployed this cluster, everything worked well (all guest VMs were created and running successfully). Then one of the host nodes rebooted unexpectedly, and now none of the VMs can boot; they all fail with the error "Bad Volume Specification".

VMId :- d877313c18d9783ca09b62acf5588048

VDSM Logs :- http://ur1.ca/jxabi
Engine Logs :- http://ur1.ca/jxabv

------------------------
[root@cpu01 ~]# vdsClient -s 0 getVolumeInfo e732a82f-bae9-4368-8b98-dedc1c3814de 00000002-0002-0002-0002-000000000145 6d123509-6867-45cf-83a2-6d679b77d3c5 9030bb43-6bc9-462f-a1b9-f6d5a02fb180
        status = OK
        domain = e732a82f-bae9-4368-8b98-dedc1c3814de
        capacity = 21474836480
        voltype = LEAF
        description =
        parent = 00000000-0000-0000-0000-000000000000
        format = RAW
        image = 6d123509-6867-45cf-83a2-6d679b77d3c5
        uuid = 9030bb43-6bc9-462f-a1b9-f6d5a02fb180
        disktype = 2
        legality = LEGAL
        mtime = 0
        apparentsize = 21474836480
        truesize = 4562972672
        type = SPARSE
        children = []
        pool =
        ctime = 1422676305
---------------------

Comment 1 punit 2015-03-23 01:19:29 UTC
I have found some disconnection errors in the brick logs.

Comment 2 punit 2015-03-25 09:41:45 UTC
Hi All,

With the help of the Gluster community and the ovirt-china community, my issue has been resolved.

The root cause was the following:

1. The glob operation takes quite a long time, longer than the ioprocess default timeout of 60 seconds.
2. python-ioprocess was updated in a way that makes a configuration-file change alone ineffective, so the code has to be patched manually (see the sketch below).
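
To make point 2 concrete, here is a minimal, self-contained sketch of the failure mode. It is not taken from python-ioprocess; the class and parameter names are hypothetical and only illustrate how inserting a parameter before timeout makes the old positional call silently stop honouring the configured value:

------------
# Hypothetical "old" constructor: timeout is the second parameter
# (counting self), so IOProcess(DEFAULT_TIMEOUT) binds it as intended.
class OldIOProcess(object):
    def __init__(self, timeout=60):
        self.timeout = timeout

# Hypothetical "new" constructor: a parameter now sits before timeout,
# so the same positional call feeds the value into the wrong argument
# and timeout silently stays at its 60-second default.
class NewIOProcess(object):
    def __init__(self, max_threads=0, timeout=60):
        self.timeout = timeout

DEFAULT_TIMEOUT = 180  # the value step 1 below configures via process_pool_timeout

print(OldIOProcess(DEFAULT_TIMEOUT).timeout)          # 180 - as configured
print(NewIOProcess(DEFAULT_TIMEOUT).timeout)          # 60  - configuration ignored
print(NewIOProcess(timeout=DEFAULT_TIMEOUT).timeout)  # 180 - keyword form works
------------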

Solution (needs to be done on all hosts):

1. Add the ioprocess timeout value to the /etc/vdsm/vdsm.conf file:

------------
[irs]
process_pool_timeout = 180
-------------

2. Check /usr/share/vdsm/storage/outOfProcess.py, line 71, and see whether it still contains "IOProcess(DEFAULT_TIMEOUT)". If so, changing the configuration file has no effect, because timeout is now the third parameter of IOProcess.__init__(), not the second.

3. Change IOProcess(DEFAULT_TIMEOUT) to IOProcess(timeout=DEFAULT_TIMEOUT) (see the sketch below), remove the /usr/share/vdsm/storage/outOfProcess.pyc file, and restart the vdsm and supervdsm services on all hosts.
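
For reference, a sketch of what the patched call looks like. Only the IOProcess(...) call itself is quoted from this report; the import, the variable name, and the DEFAULT_TIMEOUT value are assumed scaffolding, since the surrounding code of outOfProcess.py is not shown here:

------------
# Hypothetical excerpt modelled on /usr/share/vdsm/storage/outOfProcess.py, line 71.
from ioprocess import IOProcess   # provided by the python-ioprocess package

# Per this report, the configured [irs]/process_pool_timeout value
# (180 after step 1) is what should reach this call.
DEFAULT_TIMEOUT = 180

# Old, positional call -- the value no longer lands on the timeout parameter:
#     _ioproc = IOProcess(DEFAULT_TIMEOUT)
# Fixed call -- bind the timeout explicitly by keyword:
_ioproc = IOProcess(timeout=DEFAULT_TIMEOUT)
------------

After the edit, removing the stale outOfProcess.pyc and restarting the vdsm and supervdsm services (step 3) ensures the patched module is actually the one loaded.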

Thanks,

Comment 3 Mohammed Rafi KC 2016-02-23 12:24:16 UTC
Thanks, punit, for updating the bug.

Can you please close the bug?

Comment 4 Gaurav Kumar Garg 2016-02-23 12:33:07 UTC
Closing this bug; as per the reporter's comment #c2, the problem has been solved. Feel free to reopen this bug if the issue still exists.

