Bug 858732
| Field | Value | Field | Value |
|---|---|---|---|
| Summary: | glusterd does not start anymore on one node | | |
| Product: | [Community] GlusterFS | Reporter: | daniel de baerdemaeker <debaerd> |
| Component: | glusterd | Assignee: | bugs <bugs> |
| Status: | CLOSED EOL | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | mainline | CC: | awbelikov, bugs, cww, djuran, gareth.glaccum, gianluca.cecchi, gluster-bugs, hakan, hamiller, kparthas, mbukatov, ndevos, rwheeler, shyu, smitra |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| | 1190099 1269929 (view as bug list) | Environment: | |
| Last Closed: | 2015-10-22 15:46:38 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1190099, 1269929 | | |
Description (daniel de baerdemaeker, 2012-09-19 14:15:40 UTC)
```
-bash-4.1# cat /etc/glusterfs/glusterd.vol
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off
end-volume
-bash-4.1#
```

---

Can you please see if it's fixed with the 3.3.1 version or a 3.4.0qa* (qa6 as of now) version? We are not able to reproduce this in-house.

---

I have managed to reproduce this on 3.3.1. Two bricks in replication. An external program filled up / (including /var). Upon reboot that system could no longer start glusterd, even after space was made. To rectify, and I cannot be sure why this worked, I modified /var/lib/glusterd/vols/mythfe3brick/bricks/192.168.1.31\:-glusterfs-brick1 on 192.168.1.31, which was the system that filled the disk. I changed the line from listen-port=0 to listen-port=24009.

---

I had a similar problem. In my case /var/lib/glusterd/peers/xxx had become empty after the reboot. I resolved it with "rm /var/lib/glusterd/peers/xxx". Then I could start glusterfsd and re-add the peer using "gluster peer probe".

---

Hello, I have the same problem of / becoming full on one node, a Fedora 19 system with gluster 3.4.1. Almost all paths are under / (apart from /tmp). I have removed the /var/lib/glusterd/peers/xxx file and rebooted the server. Now I no longer get the error

    0-management: Initialization of volume 'management' failed, review your volfile again

but glusterd doesn't start and I get:

```
[2013-11-22 09:35:23.843703] W [rpc-transport.c:175:rpc_transport_load] 0-rpc-transport: missing 'option transport-type'. defaulting to "socket"
[2013-11-22 09:35:23.847153] I [socket.c:3480:socket_init] 0-glusterfs: SSL support is NOT enabled
[2013-11-22 09:35:23.847213] I [socket.c:3495:socket_init] 0-glusterfs: using system polling thread
[2013-11-22 09:35:23.860648] I [cli-cmd-volume.c:1275:cli_check_gsync_present] 0-: geo-replication not installed
[2013-11-22 09:35:23.861496] E [socket.c:2157:socket_connect_finish] 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused)
```

```
# systemctl status glusterd
glusterd.service - GlusterFS an clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled)
   Active: failed (Result: exit-code) since Fri 2013-11-22 10:40:23 CET; 51min ago
  Process: 1042 ExecStart=/usr/sbin/glusterd -p /run/glusterd.pid (code=exited, status=1/FAILURE)

Nov 22 10:40:22 f18ovn01 systemd[1]: Starting GlusterFS an clustered file-system server...
Nov 22 10:40:23 f18ovn01 systemd[1]: glusterd.service: control process exited, code=exited status=1
Nov 22 10:40:23 f18ovn01 systemd[1]: Failed to start GlusterFS an clustered file-system server.
Nov 22 10:40:23 f18ovn01 systemd[1]: Unit glusterd.service entered failed state.
```

Any other files to check to solve this further error? The firewall seems OK:

```
# iptables -L -n | grep 24007
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:24007
```

and the port is not in use:

```
[root@f18ovn01 glusterfs]# netstat -an | grep 24007
[root@f18ovn01 glusterfs]#
```

Comparing with the other host, it should be glusterd itself listening on the port on which it receives the "connection refused" error???

```
[root@f18ovn03 glusterfs]# ps -ef | grep glusterd.pid
root      1043     1  0 Nov21 ?        00:04:03 /usr/sbin/glusterd -p /run/glusterd.pid
[root@f18ovn03 glusterfs]# lsof -Pp 1043 | grep 24007
glusterd 1043 root  7u IPv4 21822 0t0 TCP f18ovn03:24007->f18ovn03:1022 (ESTABLISHED)
glusterd 1043 root  9u IPv4 13859 0t0 TCP *:24007 (LISTEN)
glusterd 1043 root 10u IPv4 13888 0t0 TCP f18ovn03:24007->f18ovn03:1021 (ESTABLISHED)
glusterd 1043 root 12u IPv4 13891 0t0 TCP localhost:24007->localhost:1020 (ESTABLISHED)
glusterd 1043 root 13u IPv4 13893 0t0 TCP localhost:24007->localhost:1019 (ESTABLISHED)
```

---

Any update on this bug? We have run into the same problem:

```
[2015-05-18 10:38:43.843495] W [rdma.c:4197:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed (No such device)
[2015-05-18 10:38:43.843513] E [rdma.c:4485:init] 0-rdma.management: Failed to initialize IB Device
[2015-05-18 10:38:43.843522] E [rpc-transport.c:320:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2015-05-18 10:38:43.843562] W [rpcsvc.c:1389:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
[2015-05-18 10:38:45.142777] E [run.c:190:runner_log] 0-glusterd: command failed: /usr/lib/x86_64-linux-gnu/glusterfs/gsyncd -c /etc/glusterd/geo-replication/gsyncd.conf --config-set-rx gluster-params xlator-option=*-dht.assert-no-child-down=true .
[2015-05-18 10:38:45.142845] E [xlator.c:390:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2015-05-18 10:38:45.142857] E [graph.c:292:glusterfs_graph_init] 0-management: initializing translator failed
[2015-05-18 10:38:45.142866] E [graph.c:479:glusterfs_graph_activate] 0-graph: init failed
```

---

(In reply to Anatoly Belikov from comment #8)
> Any update on this bug? We have run into the same problem:
> ...
> [2015-05-18 10:38:45.142777] E [run.c:190:runner_log] 0-glusterd: command
> failed: /usr/lib/x86_64-linux-gnu/glusterfs/gsyncd -c
> /etc/glusterd/geo-replication/gsyncd.conf --config-set-rx gluster-params
> xlator-option=*-dht.assert-no-child-down=true .
> ...

This is not the same problem, but it has the same effect. Please file a new bug report against the geo-replication component. This could be a packaging issue, or something else that caused the "gsync" command to fail.

The original problem reported in this bug is caused by glusterd being unable to read some of its configuration files. This can (or could?) happen when /var/lib is full or out of inodes. Cleanup and manually restoring the configuration under /var/lib/glusterd is needed in that case. KP or some of the other GlusterD developers can chime in with more details, and maybe a link to the documentation or email that describes how to restore the configuration.

---

Because of the large number of bugs filed against mainline, "version" is ambiguous and is about to be removed as a choice. If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.

---

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.
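The full-disk root cause described in the last developer comment can be checked before restarting glusterd. A minimal sketch, not part of the bug report (the `fs_has_room` helper name is made up; the stock `/var/lib` path is assumed): a filesystem can be "full" on inodes while block usage looks fine, so both are tested.

```shell
#!/bin/sh
# Illustrative pre-restart check, assuming glusterd's state lives under
# /var/lib (the stock location). glusterd's state files get truncated
# when the filesystem runs out of blocks OR inodes, so test both.

fs_has_room() {
    # Print "ok" when the filesystem holding $1 reports at least one
    # free 1K block and at least one free inode (POSIX df -P output:
    # the "Available"/"IFree" value is the 4th column of line 2).
    avail_blocks=$(df -P "$1" 2>/dev/null | awk 'NR==2 {print $4}')
    avail_inodes=$(df -P -i "$1" 2>/dev/null | awk 'NR==2 {print $4}')
    [ "${avail_blocks:-0}" -gt 0 ] && [ "${avail_inodes:-0}" -gt 0 ] && echo ok
}

fs_has_room /var/lib || echo "WARNING: /var/lib filesystem has no free blocks or inodes"
```

If the check fails, free space (or inodes) first; starting glusterd on a full filesystem risks truncating more of its state files.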
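Two of the recovery reports above describe the same kind of file-level damage: zero-byte files under /var/lib/glusterd (the empty peers/xxx case) and brick files whose saved port was reset to listen-port=0. A hedged sketch that only locates such files, assuming the stock glusterd layout (the `scan_glusterd_state` function name is made up; it deliberately modifies nothing):

```shell
#!/bin/sh
# Illustrative read-only scan, assuming the stock glusterd layout:
#   <state>/peers/<uuid>            one file per peer
#   <state>/vols/<vol>/bricks/...   one file per brick, with listen-port=N
# It reports suspicious files but does not change anything.

scan_glusterd_state() {
    dir="$1"
    echo "== zero-byte state files (glusterd cannot parse these) =="
    find "$dir" -type f -size 0 2>/dev/null
    echo "== brick files with listen-port=0 =="
    grep -rl '^listen-port=0' "$dir/vols" 2>/dev/null || true
}

scan_glusterd_state "${GLUSTERD_STATE:-/var/lib/glusterd}"
```

What to do with the findings is the judgment call made in the comments above: the empty peer file was removed and the peer re-probed, and the listen-port line was restored to the brick's real port by hand.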