Bug 1476828
Summary: | Self-heal daemon getting connection refused because bricks are listening on different ports | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Nag Pavan Chilakam <nchilaka> |
Component: | glusterd | Assignee: | Atin Mukherjee <amukherj> |
Status: | CLOSED WONTFIX | QA Contact: | Bala Konda Reddy M <bmekala> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | rhgs-3.3 | CC: | mchangir, nchilaka, pkarampu, rgowdapp, rhs-bugs, storage-qa-internal, vbellur |
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Last Closed: | 2017-08-02 04:43:06 UTC | Type: | Bug |
Description
Nag Pavan Chilakam
2017-07-31 14:42:03 UTC
From Pranith's initial debugging:

Even the FUSE client sees the bricks as down, as shown below:

[root@dhcp35-126 arbo]# grep connected /mnt/arbo/.meta/graphs/active/arbo-client-*/private
/mnt/arbo/.meta/graphs/active/arbo-client-0/private:connected = 0
/mnt/arbo/.meta/graphs/active/arbo-client-1/private:connected = 1
/mnt/arbo/.meta/graphs/active/arbo-client-2/private:connected = 1
/mnt/arbo/.meta/graphs/active/arbo-client-3/private:connected = 0
/mnt/arbo/.meta/graphs/active/arbo-client-4/private:connected = 1
/mnt/arbo/.meta/graphs/active/arbo-client-5/private:connected = 1
[root@dhcp35-126 arbo]#

Raghu/Milind, I asked Nag to keep the setup in that state so that you can take a look and find out why the connection is not successful.

Pranith

> [2017-07-31 14:39:51.525435] E [socket.c:2360:socket_connect_finish] 0-arbo-client-0: connection to 10.70.35.192:49160 failed (Connection refused); disconnecting socket
> [2017-07-31 14:39:51.527889] E [socket.c:2360:socket_connect_finish] 0-arbo-client-3: connection to 10.70.35.192:49157 failed (Connection refused); disconnecting socket
> [2017-07-31 14:39:55.528636] I [rpc-clnt.c:2001:rpc_clnt_reconfig] 0-arbo-client-0: changing port to 49160 (from 0)
> [2017-07-31 14:39:55.531813] I [rpc-clnt.c:2001:rpc_clnt_reconfig] 0-arbo-client-3: changing port to 49157 (from 0)
> [2017-07-31 14:39:55.533989] E [socket.c:2360:socket_connect_finish] 0-arbo-client-0: connection to 10.70.35.192:49160 failed (Connection refused); disconnecting socket
> [2017-07-31 14:39:55.536651] E [socket.c:2360:socket_connect_finish] 0-arbo-client-3: connection to 10.70.35.192:49157 failed (Connection refused); disconnecting socket
Note that the error here is "Connection refused". It looks like the brick is not listening on the port mentioned here. Can we check whether the brick is actually listening on the port that shd is trying to connect to (use netstat on the brick node, or check the command line of the brick process)? In the past we have seen portmapper issues where glusterd returns stale ports to clients, resulting in these kinds of errors.
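The refused endpoints can be pulled out of the client log mechanically before checking the brick side. The following is a minimal Python sketch, assuming log lines in the format quoted above; the function name `refused_endpoints` and the regular expression are illustrative and are not part of GlusterFS.

```python
import re

# Matches the socket_connect_finish error lines quoted in this report.
# The pattern is an assumption based on those samples only.
REFUSED_RE = re.compile(
    r"\d+-(?P<client>[\w.-]+-client-\d+): connection to "
    r"(?P<host>[\d.]+):(?P<port>\d+) failed \(Connection refused\)"
)


def refused_endpoints(log_lines):
    """Return {client_name: (host, port)} for every refused connection."""
    refused = {}
    for line in log_lines:
        match = REFUSED_RE.search(line)
        if match:
            refused[match["client"]] = (match["host"], int(match["port"]))
    return refused


# Sample lines copied from the log excerpt in this bug.
sample = [
    "[2017-07-31 14:39:51.525435] E [socket.c:2360:socket_connect_finish] "
    "0-arbo-client-0: connection to 10.70.35.192:49160 failed "
    "(Connection refused); disconnecting socket",
    "[2017-07-31 14:39:55.531813] I [rpc-clnt.c:2001:rpc_clnt_reconfig] "
    "0-arbo-client-3: changing port to 49157 (from 0)",
    "[2017-07-31 14:39:55.536651] E [socket.c:2360:socket_connect_finish] "
    "0-arbo-client-3: connection to 10.70.35.192:49157 failed "
    "(Connection refused); disconnecting socket",
]

print(refused_endpoints(sample))
```

The resulting host and port pairs are the ones to look for in the netstat output on the brick node.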
(In reply to Raghavendra G from comment #4)
> > [2017-07-31 14:39:51.525435] E [socket.c:2360:socket_connect_finish] 0-arbo-client-0: connection to 10.70.35.192:49160 failed (Connection refused); disconnecting socket
> > [2017-07-31 14:39:51.527889] E [socket.c:2360:socket_connect_finish] 0-arbo-client-3: connection to 10.70.35.192:49157 failed (Connection refused); disconnecting socket
> > [2017-07-31 14:39:55.528636] I [rpc-clnt.c:2001:rpc_clnt_reconfig] 0-arbo-client-0: changing port to 49160 (from 0)
> > [2017-07-31 14:39:55.531813] I [rpc-clnt.c:2001:rpc_clnt_reconfig] 0-arbo-client-3: changing port to 49157 (from 0)
> > [2017-07-31 14:39:55.533989] E [socket.c:2360:socket_connect_finish] 0-arbo-client-0: connection to 10.70.35.192:49160 failed (Connection refused); disconnecting socket
> > [2017-07-31 14:39:55.536651] E [socket.c:2360:socket_connect_finish] 0-arbo-client-3: connection to 10.70.35.192:49157 failed (Connection refused); disconnecting socket
>
> Note that the error here is "Connection refused". Looks like the brick is
> not listening on the port mentioned here. Can we check whether the brick is
> actually listening on the port shd is trying to connect (use netstat on
> brick or cmdline of brick process)? In the past we've seen portmapper issue
> where glusterd returns stale ports to clients resulting in these kind of
> errors.

Yes, you are right, they are not listening on these ports:

[root@dhcp35-192 ~]# ps -ef|grep glusterfsd|grep arbo
root 12260 1 0 Jul31 ? 00:00:08 /usr/sbin/glusterfsd -s 10.70.35.192 --volfile-id arbo.10.70.35.192.rhs-brick2-arbo -p /var/lib/glusterd/vols/arbo/run/10.70.35.192-rhs-brick2-arbo.pid -S /var/run/gluster/b3da6309c22e2f4797517b5244c7979f.socket --brick-name /rhs/brick2/arbo -l /var/log/glusterfs/bricks/rhs-brick2-arbo.log --xlator-option *-posix.glusterd-uuid=f641078d-dd40-4316-8136-1e6dc58210a2 --brick-port 49155 --xlator-option arbo-server.listen-port=49155
root 12521 1 0 Jul31 ? 00:00:08 /usr/sbin/glusterfsd -s 10.70.35.192 --volfile-id arbo.10.70.35.192.rhs-brick1-arbo -p /var/lib/glusterd/vols/arbo/run/10.70.35.192-rhs-brick1-arbo.pid -S /var/run/gluster/e37f5747fa0bff95d27ef5693d32a873.socket --brick-name /rhs/brick1/arbo -l /var/log/glusterfs/bricks/rhs-brick1-arbo.log --xlator-option *-posix.glusterd-uuid=f641078d-dd40-4316-8136-1e6dc58210a2 --brick-port 49158 --xlator-option arbo-server.listen-port=49158
[root@dhcp35-192 ~]# netstat -ntap|egrep "12260|12521|Addr"
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:49155 0.0.0.0:* LISTEN 12260/glusterfsd
tcp 0 0 0.0.0.0:49158 0.0.0.0:* LISTEN 12521/glusterfsd
tcp 0 0 10.70.35.192:1023 10.70.35.192:24007 ESTABLISHED 12260/glusterfsd
tcp 0 0 10.70.35.192:963 10.70.35.192:24007 ESTABLISHED 12521/glusterfsd

Moving to glusterd component as it is a port mapper issue.

Yes, I used kill -9
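
The mismatch between the ports the clients were given (49160 and 49157) and the ports the bricks actually listen on (49155 and 49158) can be checked with a short script. This is a rough Python sketch, assuming `netstat -ntap` output with the column layout shown in this bug; the helper names `listening_ports` and `stale_ports` are hypothetical and not part of any GlusterFS tooling.

```python
def listening_ports(netstat_lines, program="glusterfsd"):
    """Return {pid: port} for LISTEN sockets owned by `program`."""
    ports = {}
    for line in netstat_lines:
        fields = line.split()
        # Assumed columns: proto recv-q send-q local foreign state pid/program
        if len(fields) >= 7 and fields[5] == "LISTEN":
            pid, _, name = fields[6].partition("/")
            if name == program:
                ports[int(pid)] = int(fields[3].rsplit(":", 1)[1])
    return ports


def stale_ports(attempted, listening):
    """Ports the clients tried that no brick process is listening on."""
    return sorted(set(attempted) - set(listening))


# netstat output copied from this bug report.
netstat_output = """\
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:49155 0.0.0.0:* LISTEN 12260/glusterfsd
tcp 0 0 0.0.0.0:49158 0.0.0.0:* LISTEN 12521/glusterfsd
tcp 0 0 10.70.35.192:1023 10.70.35.192:24007 ESTABLISHED 12260/glusterfsd
tcp 0 0 10.70.35.192:963 10.70.35.192:24007 ESTABLISHED 12521/glusterfsd
"""

# Ports taken from the "Connection refused" lines in the client log.
attempted = [49160, 49157]

bricks = listening_ports(netstat_output.splitlines())
print("listening:", bricks)
print("stale:", stale_ports(attempted, bricks.values()))
```

Any port reported as stale here is one that glusterd handed out but that no brick process has bound, which is consistent with the portmapper explanation given in the thread.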