Bug 1782200
| Summary: | glusterd restart failing to start. | ||
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Kshithij Iyer <kiyer> |
| Component: | glusterd | Assignee: | bugs <bugs> |
| Status: | CLOSED NEXTRELEASE | QA Contact: | |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | mainline | CC: | bugs, pasik, srakonde |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-01-08 15:04:40 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Kshithij Iyer
2019-12-11 12:17:13 UTC
From when this test case started failing? anyway, it doesn't look like a gluster issue to me. The testcase is failing from Dec 11th onwards. I feel that something is wrong with gluster itself. Please share the glusterd log from the node where it failed to start. I was able to reproduce this issue by just restarting glusterd thrice on a single node.
[root@dhcp43-96 ~]# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
Loaded: loaded (/usr/lib/systemd/system/glusterd.service; disabled; vendor preset: disabled)
Active: active (running) since Mon 2019-12-16 19:46:13 IST; 1s ago
Docs: man:glusterd(8)
Process: 4287 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 4288 (glusterd)
CGroup: /system.slice/glusterd.service
└─4288 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
Dec 16 19:46:13 dhcp43-96.lab.eng.blr.redhat.com systemd[1]: Starting GlusterFS, a clustered file-system server...
Dec 16 19:46:13 dhcp43-96.lab.eng.blr.redhat.com systemd[1]: Started GlusterFS, a clustered file-system server.
[root@dhcp43-96 ~]# systemctl restart glusterd
Job for glusterd.service failed because start of the service was attempted too often. See "systemctl status glusterd.service" and "journalctl -xe" for details.
To force a start use "systemctl reset-failed glusterd.service" followed by "systemctl start glusterd.service" again.
[root@dhcp43-96 ~]# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
Loaded: loaded (/usr/lib/systemd/system/glusterd.service; disabled; vendor preset: disabled)
Active: failed (Result: start-limit) since Mon 2019-12-16 19:46:18 IST; 2s ago
Docs: man:glusterd(8)
Process: 4287 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 4288 (code=exited, status=15)
Dec 16 19:46:13 dhcp43-96.lab.eng.blr.redhat.com systemd[1]: Starting GlusterFS, a clustered file-system server...
Dec 16 19:46:13 dhcp43-96.lab.eng.blr.redhat.com systemd[1]: Started GlusterFS, a clustered file-system server.
Dec 16 19:46:18 dhcp43-96.lab.eng.blr.redhat.com systemd[1]: Stopping GlusterFS, a clustered file-system server...
Dec 16 19:46:18 dhcp43-96.lab.eng.blr.redhat.com systemd[1]: Stopped GlusterFS, a clustered file-system server.
Dec 16 19:46:18 dhcp43-96.lab.eng.blr.redhat.com systemd[1]: start request repeated too quickly for glusterd.service
Dec 16 19:46:18 dhcp43-96.lab.eng.blr.redhat.com systemd[1]: Failed to start GlusterFS, a clustered file-system server.
Dec 16 19:46:18 dhcp43-96.lab.eng.blr.redhat.com systemd[1]: Unit glusterd.service entered failed state.
Dec 16 19:46:18 dhcp43-96.lab.eng.blr.redhat.com systemd[1]: glusterd.service failed.
[root@dhcp43-96 ~]# cat /var/log/glusterfs/glusterd.log
[2019-12-16 14:14:53.877984] I [MSGID: 100030] [glusterfsd.c:2902:main] 0-glusterd: Started running glusterd version 20191213.5ab8350 (args: glusterd --xlator-option *.upgrade=on -N)
[2019-12-16 14:14:53.878293] I [glusterfsd.c:2629:daemonize] 0-glusterfs: Pid of current running process is 13851
[2019-12-16 14:14:53.883281] I [MSGID: 106478] [glusterd.c:1431:init] 0-management: Maximum allowed open file descriptors set to 65536
[2019-12-16 14:14:53.883391] I [MSGID: 106479] [glusterd.c:1507:init] 0-management: Using /var/lib/glusterd as working directory
[2019-12-16 14:14:53.883418] I [MSGID: 106479] [glusterd.c:1513:init] 0-management: Using /var/run/gluster as pid file working directory
[2019-12-16 14:14:53.898243] I [socket.c:1015:__socket_server_bind] 0-socket.management: process started listening on port (24007)
[2019-12-16 14:14:53.902709] I [socket.c:958:__socket_server_bind] 0-socket.management: closing (AF_UNIX) reuse check socket 10
[2019-12-16 14:14:53.903971] I [MSGID: 106059] [glusterd.c:1891:init] 0-management: max-port override: 60999
[2019-12-16 14:14:53.905714] E [MSGID: 101032] [store.c:463:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
[2019-12-16 14:14:53.905768] E [MSGID: 101032] [store.c:463:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
[2019-12-16 14:14:53.905778] I [MSGID: 106514] [glusterd-store.c:2153:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 80000
[2019-12-16 14:14:53.905848] E [MSGID: 101032] [store.c:463:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/options. [No such file or directory]
[2019-12-16 14:14:53.915932] I [MSGID: 106194] [glusterd-store.c:3977:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
[2019-12-16 14:14:53.916055] E [MSGID: 101032] [store.c:463:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.upgrade. [No such file or directory]
[2019-12-16 14:14:53.916103] I [glusterd.c:2024:init] 0-management: Regenerating volfiles due to a max op-version mismatch or glusterd.upgrade file not being present, op_version retrieved:0, max op_version: 80000
[2019-12-16 14:14:53.916873] W [glusterfsd.c:1631:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fc5d0f36dd5] -->glusterd(glusterfs_sigwaiter+0xe5) [0x56117bd235e5] -->glusterd(cleanup_and_exit+0x6b) [0x56117bd2344b] ) 0-: received signum (15), shutting down
[2019-12-16 14:14:53.917004] W [mgmt-pmap.c:132:rpc_clnt_mgmt_pmap_signout] 0-glusterfs: failed to create XDR payload
[2019-12-16 14:16:02.641373] I [MSGID: 100030] [glusterfsd.c:2902:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 20191213.5ab8350 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
[2019-12-16 14:16:02.645083] I [glusterfsd.c:2629:daemonize] 0-glusterfs: Pid of current running process is 4250
[2019-12-16 14:16:02.659057] I [MSGID: 106478] [glusterd.c:1431:init] 0-management: Maximum allowed open file descriptors set to 65536
[2019-12-16 14:16:02.659188] I [MSGID: 106479] [glusterd.c:1507:init] 0-management: Using /var/lib/glusterd as working directory
[2019-12-16 14:16:02.659286] I [MSGID: 106479] [glusterd.c:1513:init] 0-management: Using /var/run/gluster as pid file working directory
[2019-12-16 14:16:02.674453] I [socket.c:1015:__socket_server_bind] 0-socket.management: process started listening on port (24007)
[2019-12-16 14:16:02.678673] I [socket.c:958:__socket_server_bind] 0-socket.management: closing (AF_UNIX) reuse check socket 12
[2019-12-16 14:16:02.679951] I [MSGID: 106059] [glusterd.c:1891:init] 0-management: max-port override: 60999
[2019-12-16 14:16:02.683970] I [MSGID: 106228] [glusterd.c:484:glusterd_check_gsync_present] 0-glusterd: geo-replication module not installed in the system [No such file or directory]
[2019-12-16 14:16:02.684414] E [MSGID: 101032] [store.c:463:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
[2019-12-16 14:16:02.684488] E [MSGID: 101032] [store.c:463:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
[2019-12-16 14:16:02.684498] I [MSGID: 106514] [glusterd-store.c:2153:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 80000
[2019-12-16 14:16:02.685808] I [MSGID: 106194] [glusterd-store.c:3977:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
[2019-12-16 14:16:02.686008] I [glusterd.c:2024:init] 0-management: Regenerating volfiles due to a max op-version mismatch or glusterd.upgrade file not being present, op_version retrieved:0, max op_version: 80000
Final graph:
+------------------------------------------------------------------------------+
1: volume management
2: type mgmt/glusterd
3: option rpc-auth.auth-glusterfs on
4: option rpc-auth.auth-unix on
5: option rpc-auth.auth-null on
6: option rpc-auth-allow-insecure on
7: option transport.listen-backlog 1024
8: option max-port 60999
9: option event-threads 1
10: option ping-timeout 0
11: option transport.socket.listen-port 24007
12: option transport.socket.read-fail-log off
13: option transport.socket.keepalive-interval 2
14: option transport.socket.keepalive-time 10
15: option transport-type socket
16: option working-directory /var/lib/glusterd
17: end-volume
18:
+------------------------------------------------------------------------------+
[2019-12-16 14:16:02.693808] I [MSGID: 101190] [event-epoll.c:682:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0
[2019-12-16 14:16:09.121556] W [glusterfsd.c:1631:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f5d14e00dd5] -->/usr/sbin/glusterd(glusterfs_sigwaiter+0xe5) [0x55e68e3a35e5] -->/usr/sbin/glusterd(cleanup_and_exit+0x6b) [0x55e68e3a344b] ) 0-: received signum (15), shutting down
[2019-12-16 14:16:09.136682] I [MSGID: 100030] [glusterfsd.c:2902:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 20191213.5ab8350 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
[2019-12-16 14:16:09.137597] I [glusterfsd.c:2629:daemonize] 0-glusterfs: Pid of current running process is 4269
[2019-12-16 14:16:09.142805] I [MSGID: 106478] [glusterd.c:1431:init] 0-management: Maximum allowed open file descriptors set to 65536
[2019-12-16 14:16:09.142924] I [MSGID: 106479] [glusterd.c:1507:init] 0-management: Using /var/lib/glusterd as working directory
[2019-12-16 14:16:09.142951] I [MSGID: 106479] [glusterd.c:1513:init] 0-management: Using /var/run/gluster as pid file working directory
[2019-12-16 14:16:09.153574] I [socket.c:1015:__socket_server_bind] 0-socket.management: process started listening on port (24007)
[2019-12-16 14:16:09.157273] I [socket.c:958:__socket_server_bind] 0-socket.management: closing (AF_UNIX) reuse check socket 12
[2019-12-16 14:16:09.158414] I [MSGID: 106059] [glusterd.c:1891:init] 0-management: max-port override: 60999
[2019-12-16 14:16:09.160684] I [MSGID: 106228] [glusterd.c:484:glusterd_check_gsync_present] 0-glusterd: geo-replication module not installed in the system [No such file or directory]
[2019-12-16 14:16:09.161188] E [MSGID: 101032] [store.c:463:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
[2019-12-16 14:16:09.161277] E [MSGID: 101032] [store.c:463:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
[2019-12-16 14:16:09.161287] I [MSGID: 106514] [glusterd-store.c:2153:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 80000
[2019-12-16 14:16:09.161775] I [MSGID: 106194] [glusterd-store.c:3977:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
Final graph:
+------------------------------------------------------------------------------+
1: volume management
2: type mgmt/glusterd
3: option rpc-auth.auth-glusterfs on
4: option rpc-auth.auth-unix on
5: option rpc-auth.auth-null on
6: option rpc-auth-allow-insecure on
7: option transport.listen-backlog 1024
8: option max-port 60999
9: option event-threads 1
10: option ping-timeout 0
11: option transport.socket.listen-port 24007
12: option transport.socket.read-fail-log off
13: option transport.socket.keepalive-interval 2
14: option transport.socket.keepalive-time 10
15: option transport-type socket
16: option working-directory /var/lib/glusterd
17: end-volume
18:
+------------------------------------------------------------------------------+
[2019-12-16 14:16:09.163492] I [MSGID: 101190] [event-epoll.c:682:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0
[2019-12-16 14:16:13.947003] W [glusterfsd.c:1631:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f6121672dd5] -->/usr/sbin/glusterd(glusterfs_sigwaiter+0xe5) [0x55be235f15e5] -->/usr/sbin/glusterd(cleanup_and_exit+0x6b) [0x55be235f144b] ) 0-: received signum (15), shutting down
[2019-12-16 14:16:13.960474] I [MSGID: 100030] [glusterfsd.c:2902:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 20191213.5ab8350 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
[2019-12-16 14:16:13.961417] I [glusterfsd.c:2629:daemonize] 0-glusterfs: Pid of current running process is 4288
[2019-12-16 14:16:13.966378] I [MSGID: 106478] [glusterd.c:1431:init] 0-management: Maximum allowed open file descriptors set to 65536
[2019-12-16 14:16:13.966498] I [MSGID: 106479] [glusterd.c:1507:init] 0-management: Using /var/lib/glusterd as working directory
[2019-12-16 14:16:13.966525] I [MSGID: 106479] [glusterd.c:1513:init] 0-management: Using /var/run/gluster as pid file working directory
[2019-12-16 14:16:13.976827] I [socket.c:1015:__socket_server_bind] 0-socket.management: process started listening on port (24007)
[2019-12-16 14:16:13.980250] I [socket.c:958:__socket_server_bind] 0-socket.management: closing (AF_UNIX) reuse check socket 12
[2019-12-16 14:16:13.981305] I [MSGID: 106059] [glusterd.c:1891:init] 0-management: max-port override: 60999
[2019-12-16 14:16:13.983367] I [MSGID: 106228] [glusterd.c:484:glusterd_check_gsync_present] 0-glusterd: geo-replication module not installed in the system [No such file or directory]
[2019-12-16 14:16:13.983826] E [MSGID: 101032] [store.c:463:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
[2019-12-16 14:16:13.983874] E [MSGID: 101032] [store.c:463:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
[2019-12-16 14:16:13.983882] I [MSGID: 106514] [glusterd-store.c:2153:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 80000
[2019-12-16 14:16:13.984298] I [MSGID: 106194] [glusterd-store.c:3977:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
Final graph:
+------------------------------------------------------------------------------+
1: volume management
2: type mgmt/glusterd
3: option rpc-auth.auth-glusterfs on
4: option rpc-auth.auth-unix on
5: option rpc-auth.auth-null on
6: option rpc-auth-allow-insecure on
7: option transport.listen-backlog 1024
8: option max-port 60999
9: option event-threads 1
10: option ping-timeout 0
11: option transport.socket.listen-port 24007
12: option transport.socket.read-fail-log off
13: option transport.socket.keepalive-interval 2
14: option transport.socket.keepalive-time 10
15: option transport-type socket
16: option working-directory /var/lib/glusterd
17: end-volume
18:
+------------------------------------------------------------------------------+
[2019-12-16 14:16:13.985611] I [MSGID: 101190] [event-epoll.c:682:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0
[2019-12-16 14:16:18.432907] W [glusterfsd.c:1631:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f2dd9903dd5] -->/usr/sbin/glusterd(glusterfs_sigwaiter+0xe5) [0x559a918875e5] -->/usr/sbin/glusterd(cleanup_and_exit+0x6b) [0x559a9188744b] ) 0-: received signum (15), shutting down
[root@dhcp43-96 ~]#
This is happening because of https://review.gluster.org/#/c/glusterfs/+/23751/. This patch ensures that glusterd will restarted by systemctl for maximum of 3 times in one hour. We can use systemctl reset-failed glusterd.service to reset the counter. what does this particular test do? In this testcase we are creating a volume of any type, after which we fuse mount it. And create 2 snapshots for the volumes with description and check snapshot status information with snapshot name, volume name and without snapshot name/volume name. Post which we restart glusterd and once glusterd is up and running we check the information again. We run this testcase for replicated, distributed-replicated, dispersed, distributed and distributed-dispersed. I would suggest to increase the StartLimitBurst to 6 as If I am not wrong there are a couple of testcases which restart glusterd more than three times or more in one hour. REVIEW: https://review.gluster.org/23970 (glusterd: increase the StartLimitBurst) posted (#1) for review on master by Sanju Rakonde REVIEW: https://review.gluster.org/23970 (glusterd: increase the StartLimitBurst) merged (#2) on master by Sanju Rakonde |