Bug 1356058
Summary: glusterd doesn't scan for free ports from base range (49152) if last allocated port is greater than base port

| Field | Value | Field | Value |
|---|---|---|---|
| Product | [Red Hat Storage] Red Hat Gluster Storage | Reporter | Anoop <annair> |
| Component | glusterd | Assignee | Atin Mukherjee <amukherj> |
| Status | CLOSED ERRATA | QA Contact | Byreddy <bsrirama> |
| Severity | urgent | Docs Contact | |
| Priority | unspecified | | |
| Version | rhgs-3.1 | CC | amainkar, amukherj, annair, bsrirama, hchiramm, pkarampu, pprakash, rcyriac, rgowdapp, rhinduja, rhs-bugs, vbellur |
| Target Milestone | --- | Keywords | Reopened |
| Target Release | RHGS 3.2.0 | | |
| Hardware | Unspecified | | |
| OS | Unspecified | | |
| Whiteboard | aplo | | |
| Fixed In Version | glusterfs-3.8.4-1 | Doc Type | If docs needed, set a value |
| Doc Text | | Story Points | --- |
| Clone Of | | | |
| | 1358194 | Environment | |
| Last Closed | 2017-03-23 05:39:30 UTC | Type | Bug |
| Regression | --- | Mount Type | --- |
| Documentation | --- | CRM | |
| Verified Versions | | Category | --- |
| oVirt Team | --- | RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- | Target Upstream Version | |
| Embargoed | | | |
| Bug Depends On | | | |
| Bug Blocks | 1351522 | | |
Description (Anoop, 2016-07-13 10:45:03 UTC)
Anoop, I took a look at the logs:

```
[2016-07-13 15:38:25.372642] I [MSGID: 114018] [client.c:2037:client_rpc_notify] 0-vol_245115537574a798a1db158b9c130b00-client-5: disconnected from vol_245115537574a798a1db158b9c130b00-client-5. Client process will keep trying to connect to glusterd until brick's port is available
[2016-07-13 15:38:36.347314] I [MSGID: 108006] [afr-common.c:4273:afr_local_init] 0-vol_245115537574a798a1db158b9c130b00-replicate-0: no subvolumes up
[2016-07-13 15:38:36.347387] I [MSGID: 108006] [afr-common.c:4273:afr_local_init] 0-vol_245115537574a798a1db158b9c130b00-replicate-1: no subvolumes up
```

Based on this we can see that connecting to the bricks is taking more than 10 seconds (it waited from [2016-07-13 15:38:25.372642] until [2016-07-13 15:38:36.347314]). If it takes more than 10 seconds, it is treated as a disconnect. So gfapi is not able to bring the volume up within 10 seconds, which is the reason for this problem. I think the reason given in the CLI output is not correct and we need to change it, but what we really need to find out is why connecting to the bricks takes more than 10 seconds. Also added Raghavendra G, who maintains RPC.

Given that the RHGS bugzilla components have been re-aligned with the upstream components, replicate is the right component to use.

```
sh-4.2# gluster vol heal vol_ff9a5c030680f66b5bd09d0ab01aae1c info
Brick 10.70.42.21:/var/lib/heketi/mounts/vg_181bbba243c96ee49f6b5cb6a6e5021e/brick_6ed0731c04013252df0b3664f298de31/brick
Status: Transport endpoint is not connected
Number of entries: -

Brick 10.70.43.69:/var/lib/heketi/mounts/vg_2f2a03809f3f15d1ba6b6303e4f23689/brick_04811366b19a90a07b047d48a7f3c2e6/brick
Status: Transport endpoint is not connected
Number of entries: -

Brick 10.70.42.79:/var/lib/heketi/mounts/vg_5695c40a0db83ab24420867e5c8ea353/brick_3c00a9d999e5237c339720a08228e1c1/brick
Status: Connected
Number of entries: 0

Brick 10.70.42.21:/var/lib/heketi/mounts/vg_181bbba243c96ee49f6b5cb6a6e5021e/brick_12e749136fd867e9f673631d45a4acac/brick
Status: Transport endpoint is not connected
Number of entries: -

Brick 10.70.43.69:/var/lib/heketi/mounts/vg_2f2a03809f3f15d1ba6b6303e4f23689/brick_7e57f03448767cd3c502db6c99d35698/brick
Status: Transport endpoint is not connected
Number of entries: -

Brick 10.70.42.79:/var/lib/heketi/mounts/vg_5695c40a0db83ab24420867e5c8ea353/brick_43a3cba69d834ef59b09656500e35ce8/brick
Status: Connected
Number of entries: 0
```

I could see the following errors during socket_connect:

```
[2016-07-14 17:27:40.301966] E [socket.c:2395:socket_connect_finish] 0-vol_ff9a5c030680f66b5bd09d0ab01aae1c-client-1: connection to 10.70.43.69:49347 failed (No route to host)
[2016-07-14 17:27:41.306344] E [socket.c:2395:socket_connect_finish] 0-vol_ff9a5c030680f66b5bd09d0ab01aae1c-client-4: connection to 10.70.43.69:49348 failed (No route to host)
```

This is also corroborated by the following commands:

```
sh-4.2# cat < /dev/tcp/10.70.42.21/49347
sh: connect: No route to host
sh: /dev/tcp/10.70.42.21/49347: No route to host
sh-4.2# cat < /dev/tcp/10.70.43.69/49348
sh: connect: Connection timed out
sh: /dev/tcp/10.70.43.69/49348: Connection timed out
```

Similar "no route to host" errors were seen while trying to connect to 10.70.42.21:
```
[root@dhcp42-21 ~]# ps ax | grep -i vol_ff9a5c030680f66b5bd09d0ab01aae1c
 6984 ?  Ssl  0:00 /usr/sbin/glusterfsd -s 10.70.42.21 --volfile-id vol_ff9a5c030680f66b5bd09d0ab01aae1c.10.70.42.21.var-lib-heketi-mounts-vg_181bbba243c96ee49f6b5cb6a6e5021e-brick_6ed0731c04013252df0b3664f298de31-brick -p /var/lib/glusterd/vols/vol_ff9a5c030680f66b5bd09d0ab01aae1c/run/10.70.42.21-var-lib-heketi-mounts-vg_181bbba243c96ee49f6b5cb6a6e5021e-brick_6ed0731c04013252df0b3664f298de31-brick.pid -S /var/run/gluster/3e5c1af08a37d9679ffdb17237f6a935.socket --brick-name /var/lib/heketi/mounts/vg_181bbba243c96ee49f6b5cb6a6e5021e/brick_6ed0731c04013252df0b3664f298de31/brick -l /var/log/glusterfs/bricks/var-lib-heketi-mounts-vg_181bbba243c96ee49f6b5cb6a6e5021e-brick_6ed0731c04013252df0b3664f298de31-brick.log --xlator-option *-posix.glusterd-uuid=0db0968e-b44b-44d9-9402-532cd8cca734 --brick-port 49346 --xlator-option vol_ff9a5c030680f66b5bd09d0ab01aae1c-server.listen-port=49346
 6993 ?  Ssl  0:00 /usr/sbin/glusterfsd -s 10.70.42.21 --volfile-id vol_ff9a5c030680f66b5bd09d0ab01aae1c.10.70.42.21.var-lib-heketi-mounts-vg_181bbba243c96ee49f6b5cb6a6e5021e-brick_12e749136fd867e9f673631d45a4acac-brick -p /var/lib/glusterd/vols/vol_ff9a5c030680f66b5bd09d0ab01aae1c/run/10.70.42.21-var-lib-heketi-mounts-vg_181bbba243c96ee49f6b5cb6a6e5021e-brick_12e749136fd867e9f673631d45a4acac-brick.pid -S /var/run/gluster/9ea3bab99c6aa29e9da656449076c85c.socket --brick-name /var/lib/heketi/mounts/vg_181bbba243c96ee49f6b5cb6a6e5021e/brick_12e749136fd867e9f673631d45a4acac/brick -l /var/log/glusterfs/bricks/var-lib-heketi-mounts-vg_181bbba243c96ee49f6b5cb6a6e5021e-brick_12e749136fd867e9f673631d45a4acac-brick.log --xlator-option *-posix.glusterd-uuid=0db0968e-b44b-44d9-9402-532cd8cca734 --brick-port 49347 --xlator-option vol_ff9a5c030680f66b5bd09d0ab01aae1c-server.listen-port=49347
24377 pts/0  S+  0:00 grep --color=auto -i vol_ff9a5c030680f66b5bd09d0ab01aae1c
[root@dhcp42-21 ~]# logout
Connection to 10.70.42.21 closed.
```

Note that on 10.70.42.21 the brick processes are listening on two ports, 49346 and 49347:

```
sh-4.2# cat < /dev/tcp/10.70.42.21/49346
sh: connect: No route to host
sh: /dev/tcp/10.70.42.21/49346: No route to host
sh-4.2# cat < /dev/tcp/10.70.42.21/49347
sh: connect: No route to host
sh: /dev/tcp/10.70.42.21/49347: No route to host
```

For the bricks on 10.70.42.79 there is no problem connecting to the ports on which glusterfsd is listening (49346 and 49347).

To summarize:

* There is a problem connecting to the ports on which the glusterfs bricks are listening on 10.70.42.21 and 10.70.43.69. These two are the machines for which the self-heal daemon reported ENOTCONN.
* On the same machines, 10.70.42.21 and 10.70.43.69, there is no problem connecting to port 24007 (glusterd's listening port). The self-heal daemon was also able to connect to glusterd's portmapper program (evidence in the logs).
* There was no problem connecting to the glusterfs bricks' listening ports on 10.70.42.79.

This looks like a network problem. I assume that once the network problem is resolved, the self-heal daemon should work fine.

I tried this on a non-container setup and everything seems to be working fine. However, I'm hitting this issue consistently on the container setup. Should we also be looking at container networking to find out why the routes are not being established?

Yes, Anoop. Who would be able to help us in this area? Do you know?

IIUC, the ports are in question here, not the host, as other ports on the same host can be contacted. How are we doing with the firewall in this space?

It's a problem with the firewall. I flushed all the iptables rules and heal info is working fine now.
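The per-port checks used above can also be scripted. The following is a hypothetical helper (not from the original report), assuming bash and the same /dev/tcp trick shown earlier; the host:port pairs are the brick endpoints from this bug and would need to be adjusted for another setup:

```bash
# Hypothetical helper: re-check brick-port reachability after changing firewall rules.
for ep in 10.70.42.21:49346 10.70.42.21:49347 10.70.43.69:49347 10.70.43.69:49348; do
    host=${ep%:*} port=${ep#*:}
    # Open (and implicitly close) a TCP connection; give up after 3 seconds.
    if timeout 3 bash -c "exec 3<> /dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "${ep}: reachable"
    else
        echo "${ep}: NOT reachable"
    fi
done
```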
We can make the firewall rules permanent using the --permanent option so that we won't face this issue (see the sketch further below). The ports mentioned in the developer guide are as follows, so if the setup is built according to the developer guide, the ports in question (49346 and 49347) are *NOT* opened, which is what caused the issue.

https://docs.google.com/document/d/1n_O--v9d3yj4Jbxa0bglLInKyCmFX8aH6Pkqh3-RY8s/edit#

```
-A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 24007 -j ACCEPT
-A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 24008 -j ACCEPT
-A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 2222 -j ACCEPT
-A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m multiport --dports 49152:49251 -j ACCEPT
```

One question remains: why did the port allocation reach 49367 (49152 + 200+) and beyond? Considering that brick port allocation starts from 49152 and the number of bricks is quite small, I couldn't find a reason for allocating a port >= 49367. This has to be answered by the glusterd folks, because we cannot have a rule that opens lots of ports without a proper reason.

Anoop, can you attach the netstat output from this host?

glusterd tries to allocate a port for brick processes starting with 49152, and if the bind fails it iterates until a free port is found. But that is for a fresh setup. On a restart it starts allocating ports from last_alloc + 1. If you end up deleting all the volumes and creating new ones, it may look odd that the port is not 49152, but that is expected behaviour. Anoop, could you confirm whether this setup has been running for days?

The setup has been running for almost 5 days now.

And have you deleted/created many volumes?

(In reply to Atin Mukherjee from comment #12)
> On a restart it will start allocating ports from the last_alloc + 1.

Does that mean that, if there are 50 bricks and 5 restarts, glusterd will be attempting to bind to at least 250 ports? If yes, it could be an issue. @Anoop, what do you think?

(In reply to Humble Chirammal from comment #15)
> (In reply to Atin Mukherjee from comment #12)
> > On a restart it will start allocating ports from the last_alloc + 1.
>
> Does that mean that, if there are 50 bricks and 5 restarts, glusterd will be
> attempting to bind to at least 250 ports? If yes, it could be an issue.
> @Anoop, what do you think?

No, that's not the case. The point I am trying to make is that there are two possibilities:

1. base_port and last_alloc are both at the lower cap, i.e. 49152, but glusterd allocated a port around 49152 + ~200. This is only possible if the ports starting at 49152 were already in use. netstat may not prove it, as those ports might have been freed by now.
2. Multiple volumes were created, so last_alloc moved forward. Note that the ports of bricks belonging to deleted volumes are freed but not reused, because the scan is uni-directional; once ports are allocated for bricks, glusterd tries to bind to the same ports on a restart. This is why I asked Anoop whether this setup has been running for a while and whether any testing was done where volumes were created and deleted.

Please share the setup credentials so that I can take a look at it.

Anoop, can I get the gluster volume status output please?

So, if the logs for this setup are the ones available on root.42.41:/root/sosreport-dhcp42-21.lab.eng.blr.redhat.com-20160712154443.tar.xz (as per Anoop), then my guess was correct (point 2 in comment 16).
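Following up on the firewall point above, here is a minimal sketch of opening and persisting the relevant ports. It assumes the nodes use firewalld; on setups that manage raw iptables (as the OS_FIREWALL_ALLOW rules above suggest), the equivalent -A rules would instead need to be saved with the distribution's iptables persistence mechanism. The brick-port range below is only an assumption and has to be wide enough to cover the ports glusterd will actually hand out:

```bash
# Minimal sketch, assuming firewalld is managing the rules on each gluster node.
firewall-cmd --permanent --add-port=24007-24008/tcp    # glusterd management ports
firewall-cmd --permanent --add-port=49152-49664/tcp    # brick ports (range is an assumption)
firewall-cmd --reload                                   # activate the permanent rules
```

The width of the brick-port range matters precisely because of the allocation behaviour discussed in the rest of this bug: deleted volumes leave gaps that are not reused, so the highest port in use keeps growing over time.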
```
[root@dhcp42-41 bricks]# pwd
/root/sosreport-dhcp42-21.lab.eng.blr.redhat.com-20160713114623/var/log/glusterfs/bricks
[root@dhcp42-41 bricks]# grep -irns "49152" *
var-lib-heketi-mounts-vg_2af243604dd82a9918105ef492224163-brick_09e042f6f80d3ee2291952b4b1b5f197-brick.log-20160713:1:[2016-07-12 18:24:20.393518] I [MSGID: 100030] [glusterfsd.c:2338:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.7.9 (args: /usr/sbin/glusterfsd -s 10.70.42.21 --volfile-id heketidbstorage.10.70.42.21.var-lib-heketi-mounts-vg_2af243604dd82a9918105ef492224163-brick_09e042f6f80d3ee2291952b4b1b5f197-brick -p /var/lib/glusterd/vols/heketidbstorage/run/10.70.42.21-var-lib-heketi-mounts-vg_2af243604dd82a9918105ef492224163-brick_09e042f6f80d3ee2291952b4b1b5f197-brick.pid -S /var/run/gluster/46e92e1968001faa8ed92294489ca01d.socket --brick-name /var/lib/heketi/mounts/vg_2af243604dd82a9918105ef492224163/brick_09e042f6f80d3ee2291952b4b1b5f197/brick -l /var/log/glusterfs/bricks/var-lib-heketi-mounts-vg_2af243604dd82a9918105ef492224163-brick_09e042f6f80d3ee2291952b4b1b5f197-brick.log --xlator-option *-posix.glusterd-uuid=0db0968e-b44b-44d9-9402-532cd8cca734 --brick-port 49152 --xlator-option heketidbstorage-server.listen-port=49152)
[root@dhcp42-41 bricks]# grep -irns "49153" *
var-lib-heketi-mounts-vg_6b6ef4b6fe84ee5014a324da05a164ad-brick_b72901778a18a8db4d632fef20da675b-brick.log:6856:[2016-07-12 19:19:13.312349] I [MSGID: 100030] [glusterfsd.c:2338:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.7.9 (args: /usr/sbin/glusterfsd -s 10.70.42.21 --volfile-id heketidbstorage.10.70.42.21.var-lib-heketi-mounts-vg_6b6ef4b6fe84ee5014a324da05a164ad-brick_b72901778a18a8db4d632fef20da675b-brick -p /var/lib/glusterd/vols/heketidbstorage/run/10.70.42.21-var-lib-heketi-mounts-vg_6b6ef4b6fe84ee5014a324da05a164ad-brick_b72901778a18a8db4d632fef20da675b-brick.pid -S /var/run/gluster/6861843ce09d26e8a7ea66136a742f98.socket --brick-name /var/lib/heketi/mounts/vg_6b6ef4b6fe84ee5014a324da05a164ad/brick_b72901778a18a8db4d632fef20da675b/brick -l /var/log/glusterfs/bricks/var-lib-heketi-mounts-vg_6b6ef4b6fe84ee5014a324da05a164ad-brick_b72901778a18a8db4d632fef20da675b-brick.log --xlator-option *-posix.glusterd-uuid=0db0968e-b44b-44d9-9402-532cd8cca734 --brick-port 49153 --xlator-option heketidbstorage-server.listen-port=49153)
```

The above shows that glusterd allocated 49152 and 49153 to brick processes of heketidbstorage. If I grep for ports 49154 onwards, I can see subsequent brick logs indicating that brick processes were started with those ports. However, when I extract the volume name from such a brick log, I don't find a corresponding entry in /var/lib/glusterd/vols/, which means that volume has been deleted.
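To make the allocation behaviour described above concrete, the following is an illustrative bash sketch only; glusterd's real portmap logic is C code inside glusterd, and the last_alloc value below is a made-up example. It models the uni-directional scan that resumes from last_alloc + 1 instead of from the base port:

```bash
# Illustrative model of the uni-directional port scan, not glusterd source.
base_port=49152
last_alloc=49366                 # example value; moves forward over time, never backwards
candidate=$((last_alloc + 1))    # the scan resumes above last_alloc ...
# ... so freed ports between base_port and last_alloc are never revisited.
while ss -ltn | awk '{print $4}' | grep -q ":${candidate}$"; do
    candidate=$((candidate + 1))
done
echo "next brick port: ${candidate} (gap ${base_port}..${last_alloc} is skipped)"
```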
For example, for port 49200 (picked at random):

```
[root@dhcp42-41 bricks]# grep -irns "49200" *
var-lib-heketi-mounts-vg_181bbba243c96ee49f6b5cb6a6e5021e-brick_cb9b517043523cd0f510b6df07067f70-brick.log:272:[2016-07-12 15:50:51.349200] I [MSGID: 115029] [server-handshake.c:690:server_setvolume] 0-vol_ddf6856ca7bd654155de1bc0967eb98a-server: accepted client from dhcp42-21.lab.eng.blr.redhat.com-20594-2016/07/12-15:50:49:913931-vol_ddf6856ca7bd654155de1bc0967eb98a-client-2-0-0 (version: 3.7.9)
```

And the volume no longer exists under /var/lib/glusterd/vols/:

```
[root@dhcp42-41 vols]# pwd
/root/sosreport-dhcp42-21.lab.eng.blr.redhat.com-20160713114623/var/lib/glusterd/vols
[root@dhcp42-41 vols]# ls -lrt vol_ddf6856ca7bd654155de1bc0967eb98a
ls: cannot access vol_ddf6856ca7bd654155de1bc0967eb98a: No such file or directory
```

So I can see that this setup has been running since 1st July, and because of the many iterations of volume creation and deletion the last_alloc port number has moved forward. This is expected behaviour, and hence I am closing this bug. Feel free to reopen it if you have any other points to debate.

Atin and I have been having this conversation over email between the two of us; putting it on the bug for a wider audience.

[Anoop] We are going to deal with hundreds of volumes in Aplo, and we may have to delete and recreate volumes based on app container requests. So, if this is expected behaviour in Gluster, we will invariably run into this issue.

[Atin] Gluster as of now does not reuse ports (already used for gluster bricks whose volumes have since been deleted) for freshly created volumes, so we would need to open up ports even though the earlier ports have been freed. In a non-container setup, deleting a volume is not a frequent demand from users, so we are safe in that regard. But from a container perspective, if deleting volumes is a frequent case, we cannot maintain the ratio of active volumes to open ports mentioned in the workflow doc. We are considering improving the port allocation logic in gluster.next, but that is not close by any means.

[Anoop] I agree with you, Atin. But this is a limitation (w.r.t. Aplo), and I feel it should be a known issue rather than not being a bug.

[Atin] From the Aplo perspective, yes, it is a limitation and we need to highlight it. I will see if we can work on something to get the portmap logic changed in rhgs-3.2.

-----------------------------------------------------------------------

I'm of the opinion that we should document this as a known issue.

http://review.gluster.org/#/c/14670/ could potentially solve this problem.

So I have another solution within the current design; a patch, http://review.gluster.org/14939, was posted for review and has now been merged in upstream master.

Downstream patches are merged now.

Moving this bug to the verified state based on the verification details of bug https://bugzilla.redhat.com/show_bug.cgi?id=1263090, where I am seeing glusterd rescan for free ports from the base port (49152) instead of from the last allocated port.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html