Bug 1782200
Summary: | glusterd restart failing to start. | ||
---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Kshithij Iyer <kiyer> |
Component: | glusterd | Assignee: | bugs <bugs> |
Status: | CLOSED NEXTRELEASE | QA Contact: | |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | mainline | CC: | bugs, pasik, srakonde |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-01-08 15:04:40 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Kshithij Iyer
2019-12-11 12:17:13 UTC
From when this test case started failing? anyway, it doesn't look like a gluster issue to me. The testcase is failing from Dec 11th onwards. I feel that something is wrong with gluster itself. Please share the glusterd log from the node where it failed to start. I was able to reproduce this issue by just restarting glusterd thrice on a single node. [root@dhcp43-96 ~]# systemctl status glusterd ● glusterd.service - GlusterFS, a clustered file-system server Loaded: loaded (/usr/lib/systemd/system/glusterd.service; disabled; vendor preset: disabled) Active: active (running) since Mon 2019-12-16 19:46:13 IST; 1s ago Docs: man:glusterd(8) Process: 4287 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS) Main PID: 4288 (glusterd) CGroup: /system.slice/glusterd.service └─4288 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO Dec 16 19:46:13 dhcp43-96.lab.eng.blr.redhat.com systemd[1]: Starting GlusterFS, a clustered file-system server... Dec 16 19:46:13 dhcp43-96.lab.eng.blr.redhat.com systemd[1]: Started GlusterFS, a clustered file-system server. [root@dhcp43-96 ~]# systemctl restart glusterd Job for glusterd.service failed because start of the service was attempted too often. See "systemctl status glusterd.service" and "journalctl -xe" for details. To force a start use "systemctl reset-failed glusterd.service" followed by "systemctl start glusterd.service" again. [root@dhcp43-96 ~]# systemctl status glusterd ● glusterd.service - GlusterFS, a clustered file-system server Loaded: loaded (/usr/lib/systemd/system/glusterd.service; disabled; vendor preset: disabled) Active: failed (Result: start-limit) since Mon 2019-12-16 19:46:18 IST; 2s ago Docs: man:glusterd(8) Process: 4287 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS) Main PID: 4288 (code=exited, status=15) Dec 16 19:46:13 dhcp43-96.lab.eng.blr.redhat.com systemd[1]: Starting GlusterFS, a clustered file-system server... Dec 16 19:46:13 dhcp43-96.lab.eng.blr.redhat.com systemd[1]: Started GlusterFS, a clustered file-system server. Dec 16 19:46:18 dhcp43-96.lab.eng.blr.redhat.com systemd[1]: Stopping GlusterFS, a clustered file-system server... Dec 16 19:46:18 dhcp43-96.lab.eng.blr.redhat.com systemd[1]: Stopped GlusterFS, a clustered file-system server. Dec 16 19:46:18 dhcp43-96.lab.eng.blr.redhat.com systemd[1]: start request repeated too quickly for glusterd.service Dec 16 19:46:18 dhcp43-96.lab.eng.blr.redhat.com systemd[1]: Failed to start GlusterFS, a clustered file-system server. Dec 16 19:46:18 dhcp43-96.lab.eng.blr.redhat.com systemd[1]: Unit glusterd.service entered failed state. Dec 16 19:46:18 dhcp43-96.lab.eng.blr.redhat.com systemd[1]: glusterd.service failed. [root@dhcp43-96 ~]# cat /var/log/glusterfs/glusterd.log [2019-12-16 14:14:53.877984] I [MSGID: 100030] [glusterfsd.c:2902:main] 0-glusterd: Started running glusterd version 20191213.5ab8350 (args: glusterd --xlator-option *.upgrade=on -N) [2019-12-16 14:14:53.878293] I [glusterfsd.c:2629:daemonize] 0-glusterfs: Pid of current running process is 13851 [2019-12-16 14:14:53.883281] I [MSGID: 106478] [glusterd.c:1431:init] 0-management: Maximum allowed open file descriptors set to 65536 [2019-12-16 14:14:53.883391] I [MSGID: 106479] [glusterd.c:1507:init] 0-management: Using /var/lib/glusterd as working directory [2019-12-16 14:14:53.883418] I [MSGID: 106479] [glusterd.c:1513:init] 0-management: Using /var/run/gluster as pid file working directory [2019-12-16 14:14:53.898243] I [socket.c:1015:__socket_server_bind] 0-socket.management: process started listening on port (24007) [2019-12-16 14:14:53.902709] I [socket.c:958:__socket_server_bind] 0-socket.management: closing (AF_UNIX) reuse check socket 10 [2019-12-16 14:14:53.903971] I [MSGID: 106059] [glusterd.c:1891:init] 0-management: max-port override: 60999 [2019-12-16 14:14:53.905714] E [MSGID: 101032] [store.c:463:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory] [2019-12-16 14:14:53.905768] E [MSGID: 101032] [store.c:463:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory] [2019-12-16 14:14:53.905778] I [MSGID: 106514] [glusterd-store.c:2153:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 80000 [2019-12-16 14:14:53.905848] E [MSGID: 101032] [store.c:463:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/options. [No such file or directory] [2019-12-16 14:14:53.915932] I [MSGID: 106194] [glusterd-store.c:3977:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list. [2019-12-16 14:14:53.916055] E [MSGID: 101032] [store.c:463:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.upgrade. [No such file or directory] [2019-12-16 14:14:53.916103] I [glusterd.c:2024:init] 0-management: Regenerating volfiles due to a max op-version mismatch or glusterd.upgrade file not being present, op_version retrieved:0, max op_version: 80000 [2019-12-16 14:14:53.916873] W [glusterfsd.c:1631:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fc5d0f36dd5] -->glusterd(glusterfs_sigwaiter+0xe5) [0x56117bd235e5] -->glusterd(cleanup_and_exit+0x6b) [0x56117bd2344b] ) 0-: received signum (15), shutting down [2019-12-16 14:14:53.917004] W [mgmt-pmap.c:132:rpc_clnt_mgmt_pmap_signout] 0-glusterfs: failed to create XDR payload [2019-12-16 14:16:02.641373] I [MSGID: 100030] [glusterfsd.c:2902:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 20191213.5ab8350 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO) [2019-12-16 14:16:02.645083] I [glusterfsd.c:2629:daemonize] 0-glusterfs: Pid of current running process is 4250 [2019-12-16 14:16:02.659057] I [MSGID: 106478] [glusterd.c:1431:init] 0-management: Maximum allowed open file descriptors set to 65536 [2019-12-16 14:16:02.659188] I [MSGID: 106479] [glusterd.c:1507:init] 0-management: Using /var/lib/glusterd as working directory [2019-12-16 14:16:02.659286] I [MSGID: 106479] [glusterd.c:1513:init] 0-management: Using /var/run/gluster as pid file working directory [2019-12-16 14:16:02.674453] I [socket.c:1015:__socket_server_bind] 0-socket.management: process started listening on port (24007) [2019-12-16 14:16:02.678673] I [socket.c:958:__socket_server_bind] 0-socket.management: closing (AF_UNIX) reuse check socket 12 [2019-12-16 14:16:02.679951] I [MSGID: 106059] [glusterd.c:1891:init] 0-management: max-port override: 60999 [2019-12-16 14:16:02.683970] I [MSGID: 106228] [glusterd.c:484:glusterd_check_gsync_present] 0-glusterd: geo-replication module not installed in the system [No such file or directory] [2019-12-16 14:16:02.684414] E [MSGID: 101032] [store.c:463:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory] [2019-12-16 14:16:02.684488] E [MSGID: 101032] [store.c:463:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory] [2019-12-16 14:16:02.684498] I [MSGID: 106514] [glusterd-store.c:2153:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 80000 [2019-12-16 14:16:02.685808] I [MSGID: 106194] [glusterd-store.c:3977:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list. [2019-12-16 14:16:02.686008] I [glusterd.c:2024:init] 0-management: Regenerating volfiles due to a max op-version mismatch or glusterd.upgrade file not being present, op_version retrieved:0, max op_version: 80000 Final graph: +------------------------------------------------------------------------------+ 1: volume management 2: type mgmt/glusterd 3: option rpc-auth.auth-glusterfs on 4: option rpc-auth.auth-unix on 5: option rpc-auth.auth-null on 6: option rpc-auth-allow-insecure on 7: option transport.listen-backlog 1024 8: option max-port 60999 9: option event-threads 1 10: option ping-timeout 0 11: option transport.socket.listen-port 24007 12: option transport.socket.read-fail-log off 13: option transport.socket.keepalive-interval 2 14: option transport.socket.keepalive-time 10 15: option transport-type socket 16: option working-directory /var/lib/glusterd 17: end-volume 18: +------------------------------------------------------------------------------+ [2019-12-16 14:16:02.693808] I [MSGID: 101190] [event-epoll.c:682:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0 [2019-12-16 14:16:09.121556] W [glusterfsd.c:1631:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f5d14e00dd5] -->/usr/sbin/glusterd(glusterfs_sigwaiter+0xe5) [0x55e68e3a35e5] -->/usr/sbin/glusterd(cleanup_and_exit+0x6b) [0x55e68e3a344b] ) 0-: received signum (15), shutting down [2019-12-16 14:16:09.136682] I [MSGID: 100030] [glusterfsd.c:2902:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 20191213.5ab8350 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO) [2019-12-16 14:16:09.137597] I [glusterfsd.c:2629:daemonize] 0-glusterfs: Pid of current running process is 4269 [2019-12-16 14:16:09.142805] I [MSGID: 106478] [glusterd.c:1431:init] 0-management: Maximum allowed open file descriptors set to 65536 [2019-12-16 14:16:09.142924] I [MSGID: 106479] [glusterd.c:1507:init] 0-management: Using /var/lib/glusterd as working directory [2019-12-16 14:16:09.142951] I [MSGID: 106479] [glusterd.c:1513:init] 0-management: Using /var/run/gluster as pid file working directory [2019-12-16 14:16:09.153574] I [socket.c:1015:__socket_server_bind] 0-socket.management: process started listening on port (24007) [2019-12-16 14:16:09.157273] I [socket.c:958:__socket_server_bind] 0-socket.management: closing (AF_UNIX) reuse check socket 12 [2019-12-16 14:16:09.158414] I [MSGID: 106059] [glusterd.c:1891:init] 0-management: max-port override: 60999 [2019-12-16 14:16:09.160684] I [MSGID: 106228] [glusterd.c:484:glusterd_check_gsync_present] 0-glusterd: geo-replication module not installed in the system [No such file or directory] [2019-12-16 14:16:09.161188] E [MSGID: 101032] [store.c:463:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory] [2019-12-16 14:16:09.161277] E [MSGID: 101032] [store.c:463:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory] [2019-12-16 14:16:09.161287] I [MSGID: 106514] [glusterd-store.c:2153:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 80000 [2019-12-16 14:16:09.161775] I [MSGID: 106194] [glusterd-store.c:3977:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list. Final graph: +------------------------------------------------------------------------------+ 1: volume management 2: type mgmt/glusterd 3: option rpc-auth.auth-glusterfs on 4: option rpc-auth.auth-unix on 5: option rpc-auth.auth-null on 6: option rpc-auth-allow-insecure on 7: option transport.listen-backlog 1024 8: option max-port 60999 9: option event-threads 1 10: option ping-timeout 0 11: option transport.socket.listen-port 24007 12: option transport.socket.read-fail-log off 13: option transport.socket.keepalive-interval 2 14: option transport.socket.keepalive-time 10 15: option transport-type socket 16: option working-directory /var/lib/glusterd 17: end-volume 18: +------------------------------------------------------------------------------+ [2019-12-16 14:16:09.163492] I [MSGID: 101190] [event-epoll.c:682:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0 [2019-12-16 14:16:13.947003] W [glusterfsd.c:1631:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f6121672dd5] -->/usr/sbin/glusterd(glusterfs_sigwaiter+0xe5) [0x55be235f15e5] -->/usr/sbin/glusterd(cleanup_and_exit+0x6b) [0x55be235f144b] ) 0-: received signum (15), shutting down [2019-12-16 14:16:13.960474] I [MSGID: 100030] [glusterfsd.c:2902:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 20191213.5ab8350 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO) [2019-12-16 14:16:13.961417] I [glusterfsd.c:2629:daemonize] 0-glusterfs: Pid of current running process is 4288 [2019-12-16 14:16:13.966378] I [MSGID: 106478] [glusterd.c:1431:init] 0-management: Maximum allowed open file descriptors set to 65536 [2019-12-16 14:16:13.966498] I [MSGID: 106479] [glusterd.c:1507:init] 0-management: Using /var/lib/glusterd as working directory [2019-12-16 14:16:13.966525] I [MSGID: 106479] [glusterd.c:1513:init] 0-management: Using /var/run/gluster as pid file working directory [2019-12-16 14:16:13.976827] I [socket.c:1015:__socket_server_bind] 0-socket.management: process started listening on port (24007) [2019-12-16 14:16:13.980250] I [socket.c:958:__socket_server_bind] 0-socket.management: closing (AF_UNIX) reuse check socket 12 [2019-12-16 14:16:13.981305] I [MSGID: 106059] [glusterd.c:1891:init] 0-management: max-port override: 60999 [2019-12-16 14:16:13.983367] I [MSGID: 106228] [glusterd.c:484:glusterd_check_gsync_present] 0-glusterd: geo-replication module not installed in the system [No such file or directory] [2019-12-16 14:16:13.983826] E [MSGID: 101032] [store.c:463:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory] [2019-12-16 14:16:13.983874] E [MSGID: 101032] [store.c:463:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory] [2019-12-16 14:16:13.983882] I [MSGID: 106514] [glusterd-store.c:2153:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 80000 [2019-12-16 14:16:13.984298] I [MSGID: 106194] [glusterd-store.c:3977:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list. Final graph: +------------------------------------------------------------------------------+ 1: volume management 2: type mgmt/glusterd 3: option rpc-auth.auth-glusterfs on 4: option rpc-auth.auth-unix on 5: option rpc-auth.auth-null on 6: option rpc-auth-allow-insecure on 7: option transport.listen-backlog 1024 8: option max-port 60999 9: option event-threads 1 10: option ping-timeout 0 11: option transport.socket.listen-port 24007 12: option transport.socket.read-fail-log off 13: option transport.socket.keepalive-interval 2 14: option transport.socket.keepalive-time 10 15: option transport-type socket 16: option working-directory /var/lib/glusterd 17: end-volume 18: +------------------------------------------------------------------------------+ [2019-12-16 14:16:13.985611] I [MSGID: 101190] [event-epoll.c:682:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0 [2019-12-16 14:16:18.432907] W [glusterfsd.c:1631:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f2dd9903dd5] -->/usr/sbin/glusterd(glusterfs_sigwaiter+0xe5) [0x559a918875e5] -->/usr/sbin/glusterd(cleanup_and_exit+0x6b) [0x559a9188744b] ) 0-: received signum (15), shutting down [root@dhcp43-96 ~]# This is happening because of https://review.gluster.org/#/c/glusterfs/+/23751/. This patch ensures that glusterd will restarted by systemctl for maximum of 3 times in one hour. We can use systemctl reset-failed glusterd.service to reset the counter. what does this particular test do? In this testcase we are creating a volume of any type, after which we fuse mount it. And create 2 snapshots for the volumes with description and check snapshot status information with snapshot name, volume name and without snapshot name/volume name. Post which we restart glusterd and once glusterd is up and running we check the information again. We run this testcase for replicated, distributed-replicated, dispersed, distributed and distributed-dispersed. I would suggest to increase the StartLimitBurst to 6 as If I am not wrong there are a couple of testcases which restart glusterd more than three times or more in one hour. REVIEW: https://review.gluster.org/23970 (glusterd: increase the StartLimitBurst) posted (#1) for review on master by Sanju Rakonde REVIEW: https://review.gluster.org/23970 (glusterd: increase the StartLimitBurst) merged (#2) on master by Sanju Rakonde |