Bug 764963 (GLUSTER-3231) - glusterd won't start on SSA
Summary: glusterd won't start on SSA
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-3231
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterfs
Version: 1.0
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: medium
Target Milestone: ---
Assignee: Csaba Henk
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-07-22 21:14 UTC by Vikas Gorur
Modified: 2012-07-20 06:51 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-07-20 06:51:31 UTC
Target Upstream Version:


Attachments
TRACE logs for glusterd (3.15 KB, application/x-gzip)
2011-07-22 18:14 UTC, Vikas Gorur

Description Vikas Gorur 2011-07-22 18:14:56 UTC
Created attachment 583

Comment 1 Vikas Gorur 2011-07-22 21:14:20 UTC
Customer (AT&T) is on SSA. GlusterFS upgraded to 3.2.2.

glusterd does not start. TRACE logs are attached. glusterd did start once or twice, but only rarely; about 90% of the time it fails to start.

/etc/hosts is fine. Both servers have only a single eth0 interface. RDMA is not being used; I installed glusterfs-rdma anyway and it still didn't work.

Debugging rpcsvc_transport_create with gdb didn't yield anything: the listen() call succeeded, but glusterd still didn't start.

Comment 2 tcp 2011-07-23 01:08:20 UTC
Looks like the problem could be in geo-replication config. Maybe some tools required by geo-replication are missing on the system?

glusterd's init() calls configure_syncdaemon(), which makes a number of system() calls without logging anything when they fail. As the attached logs show, initialization proceeded up to glusterd_uuid_init(). glusterd_restore() has a debug message (in the code) that reports its return value, but that message does not appear in the logs, so there is a good likelihood it was never called. The only function called before that point is configure_syncdaemon(), and any of the system() commands it invokes could have failed.

Other system logs might throw light on what command could have failed.
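
Here is a minimal sketch (not the actual glusterd code) of how such a system() invocation could be wrapped so a failure gets logged instead of disappearing silently. The helper run_cmd_logged() and the example gsyncd command are illustrative assumptions; glusterd itself would log through gf_log() rather than stderr:

/* Hypothetical sketch: wrap system() so a failing geo-replication
 * setup command is logged instead of failing silently.
 * run_cmd_logged() is an illustrative helper, not a glusterd function. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

static int
run_cmd_logged (const char *cmd)
{
        int ret = system (cmd);

        if (ret == -1) {
                /* system() itself failed (e.g. fork failed) */
                perror (cmd);
                return -1;
        }
        if (WIFEXITED (ret) && WEXITSTATUS (ret) != 0) {
                fprintf (stderr, "command failed (exit %d): %s\n",
                         WEXITSTATUS (ret), cmd);
                return -1;
        }
        if (WIFSIGNALED (ret)) {
                fprintf (stderr, "command killed by signal %d: %s\n",
                         WTERMSIG (ret), cmd);
                return -1;
        }
        return 0;
}

int
main (void)
{
        /* Example: a geo-replication setup step that may fail if the
         * required tool is missing from the system. */
        if (run_cmd_logged ("gsyncd --version > /dev/null") != 0)
                fprintf (stderr, "geo-replication setup step failed\n");
        return 0;
}

With a wrapper like this, the failing command and its exit status would show up in the logs even at the default log level.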

Pavan

Comment 3 Vijay Bellur 2011-07-25 04:18:12 UTC
Csaba,

Can you please look into this? If glusterd's init() fails because of gsync's init failure, the failure reason should be captured in the logs. Otherwise, we should mask the error and let glusterd continue.
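
A minimal sketch of the "mask the error" option, assuming init() simply checks configure_syncdaemon()'s return value; the function bodies below are placeholders, not the real glusterd implementation:

/* Hypothetical sketch of letting glusterd continue when the
 * geo-replication setup step fails: log a warning instead of
 * treating the failure as fatal. */
#include <stdio.h>

static int
configure_syncdaemon (void)
{
        /* placeholder: pretend the geo-rep tooling is missing */
        return -1;
}

static int
glusterd_restore (void)
{
        return 0;
}

int
init (void)
{
        int ret = configure_syncdaemon ();

        if (ret != 0)
                /* mask the error: geo-replication is treated as optional */
                fprintf (stderr,
                         "WARNING: geo-replication setup failed (%d), "
                         "continuing without it\n", ret);

        /* glusterd startup proceeds regardless of the geo-rep result */
        return glusterd_restore ();
}

int
main (void)
{
        return init () == 0 ? 0 : 1;
}

The trade-off is that glusterd always comes up, at the cost of geo-replication possibly being left unconfigured until the warning is noticed.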

Vijay

Comment 4 Csaba Henk 2011-07-25 04:25:48 UTC
(In reply to comment #3)
Yeah, Pavan told me about that already.

Comment 5 Csaba Henk 2011-07-25 05:32:45 UTC
For which branch(es) do we need a fix? The patch made for master won't necessarily apply to the version used by customers who'd need it.

Comment 6 Vijay Bellur 2011-07-25 05:45:37 UTC
(In reply to comment #5)
> For which branch(es) do we need a fix? The patch made for master won't
> necessarily apply to the version used by customers who'd need it.

We need a fix for release-3.2.

Comment 7 Renee 2011-08-01 20:19:37 UTC
raising to P1 - customer impact

Comment 8 Anand Avati 2011-08-24 08:33:32 UTC
CHANGE: http://review.gluster.com/96 (Change-Id: I28de4cce140faf1b35ecdc5cbd408f21c9926341) merged in master by Vijay Bellur (vijay)

Comment 9 Anand Avati 2011-09-22 09:10:33 UTC
CHANGE: http://review.gluster.com/167 (Change-Id: I28de4cce140faf1b35ecdc5cbd408f21c9926341) merged in release-3.2 by Vijay Bellur (vijay)

Comment 10 Csaba Henk 2011-09-27 14:02:56 UTC
(In reply to comment #9)
> CHANGE: http://review.gluster.com/167 (Change-Id:
> I28de4cce140faf1b35ecdc5cbd408f21c9926341) merged in release-3.2 by Vijay
> Bellur (vijay)

Any update on this, with the log enhancement patch included?

Comment 11 Amar Tumballi 2012-06-08 10:04:45 UTC
These are all 'GlusterFS-Commercial' bugs, mostly relating to customers from about a year back. It would be good to have a resolution on these issues. Moving the component, considering the visibility in the RHS component :-)

Comment 13 Vidya Sakar 2012-07-20 06:51:31 UTC
Closing the bug as it is for SSA. Please reopen if this behaviour is observed with RHS too.

