Bug 1782200 - glusterd restart failing to start.
Summary: glusterd restart failing to start.
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-12-11 12:17 UTC by Kshithij Iyer
Modified: 2020-01-08 15:04 UTC
3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-08 15:04:40 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Gluster.org Gerrit 23970 0 None Merged glusterd: increase the StartLimitBurst 2020-01-08 15:04:38 UTC

Description Kshithij Iyer 2019-12-11 12:17:13 UTC
Description of problem:
glusterd restart fails when I try to run the testcase test_snap_status_glusterd_restart on the latest builds, with the error: "Job for glusterd.service failed because start of the service was attempted too often. See "systemctl status glusterd.service" and "journalctl -xe" for details.
To force a start use "systemctl reset-failed glusterd.service" followed by "systemctl start glusterd.service" again."

################################################################################
Test run output:
################################################################################
17:26:09 functional/snapshot/test_snap_status_glusterd_restart.py::SnapshotGlusterdrestart_cplex_dispersed_glusterfs::test_snap_status_glusterd_restart PASSED [ 20%]
17:26:49 functional/snapshot/test_snap_status_glusterd_restart.py::SnapshotGlusterdrestart_cplex_distributed-replicated_glusterfs::test_snap_status_glusterd_restart PASSED [ 40%]
17:27:30 functional/snapshot/test_snap_status_glusterd_restart.py::SnapshotGlusterdrestart_cplex_replicated_glusterfs::test_snap_status_glusterd_restart FAILED [ 60%]
17:27:45 functional/snapshot/test_snap_status_glusterd_restart.py::SnapshotGlusterdrestart_cplex_replicated_glusterfs::test_snap_status_glusterd_restart ERROR [ 60%]
################################################################################


What does the testcase do?

1. Create a volume of any type and start it.
2. Create two snapshots with descriptions.
3. Check snapshot status information with the snap name, with the volume name, and
   without any snap name/volume name.
4. Restart glusterd on all nodes.
5. Repeat step 3 and validate the snapshots.
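
The steps above can be sketched with the gluster CLI roughly as follows (a minimal sketch, not the actual glusto-tests code; the volume, brick, and snapshot names are made up, and a working gluster install across the nodes is assumed):

```shell
# 1. Create a volume of any type and start it (replica 3 chosen for illustration)
gluster volume create testvol replica 3 server1:/bricks/b1 server2:/bricks/b2 server3:/bricks/b3
gluster volume start testvol

# 2. Create two snapshots with descriptions
gluster snapshot create snap1 testvol description "first snapshot"
gluster snapshot create snap2 testvol description "second snapshot"

# 3. Check snapshot status with the snap name, the volume name, and without either
gluster snapshot status snap1
gluster snapshot status volume testvol
gluster snapshot status

# 4. Restart glusterd on all nodes (run on each node)
systemctl restart glusterd

# 5. Repeat step 3 and validate the snapshots
gluster snapshot status
```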



################################################################################
glustomain.log
################################################################################
2019-12-11 11:57:42,525 INFO (run_async) root.2.188 (cp): service glusterd restart
2019-12-11 11:57:42,525 DEBUG (_get_ssh_connection) Retrieved connection from cache: root.2.188
2019-12-11 11:57:42,530 INFO (run_async) root.2.80 (cp): service glusterd restart
2019-12-11 11:57:42,531 DEBUG (_get_ssh_connection) Retrieved connection from cache: root.2.80
2019-12-11 11:57:42,536 INFO (run_async) root.2.75 (cp): service glusterd restart
2019-12-11 11:57:42,537 DEBUG (_get_ssh_connection) Retrieved connection from cache: root.2.75
2019-12-11 11:57:42,542 INFO (run_async) root.2.79 (cp): service glusterd restart
2019-12-11 11:57:42,542 DEBUG (_get_ssh_connection) Retrieved connection from cache: root.2.79
2019-12-11 11:57:42,547 INFO (run_async) root.2.81 (cp): service glusterd restart
2019-12-11 11:57:42,548 DEBUG (_get_ssh_connection) Retrieved connection from cache: root.2.81
2019-12-11 11:57:42,553 INFO (run_async) root.2.74 (cp): service glusterd restart
2019-12-11 11:57:42,553 DEBUG (_get_ssh_connection) Retrieved connection from cache: root.2.74
2019-12-11 11:57:42,631 INFO (_log_results) RETCODE (root.2.79): 1
2019-12-11 11:57:42,631 INFO (_log_results) STDERR (root.2.79)...
Redirecting to /bin/systemctl restart glusterd.service
Job for glusterd.service failed because start of the service was attempted too often. See "systemctl status glusterd.service" and "journalctl -xe" for details.
To force a start use "systemctl reset-failed glusterd.service" followed by "systemctl start glusterd.service" again.

2019-12-11 11:57:42,641 INFO (_log_results) RETCODE (root.2.81): 1
2019-12-11 11:57:42,641 INFO (_log_results) STDERR (root.2.81)...
Redirecting to /bin/systemctl restart glusterd.service
Job for glusterd.service failed because start of the service was attempted too often. See "systemctl status glusterd.service" and "journalctl -xe" for details.
To force a start use "systemctl reset-failed glusterd.service" followed by "systemctl start glusterd.service" again.

2019-12-11 11:57:42,641 INFO (_log_results) RETCODE (root.2.80): 1
2019-12-11 11:57:42,641 INFO (_log_results) STDERR (root.2.80)...
Redirecting to /bin/systemctl restart glusterd.service
Job for glusterd.service failed because start of the service was attempted too often. See "systemctl status glusterd.service" and "journalctl -xe" for details.
To force a start use "systemctl reset-failed glusterd.service" followed by "systemctl start glusterd.service" again.

2019-12-11 11:57:42,641 INFO (_log_results) RETCODE (root.2.74): 1
2019-12-11 11:57:42,642 INFO (_log_results) STDERR (root.2.74)...
Redirecting to /bin/systemctl restart glusterd.service
Job for glusterd.service failed because start of the service was attempted too often. See "systemctl status glusterd.service" and "journalctl -xe" for details.
To force a start use "systemctl reset-failed glusterd.service" followed by "systemctl start glusterd.service" again.

2019-12-11 11:57:42,642 INFO (_log_results) RETCODE (root.2.75): 1
2019-12-11 11:57:42,642 INFO (_log_results) STDERR (root.2.75)...
Redirecting to /bin/systemctl restart glusterd.service
Job for glusterd.service failed because start of the service was attempted too often. See "systemctl status glusterd.service" and "journalctl -xe" for details.
To force a start use "systemctl reset-failed glusterd.service" followed by "systemctl start glusterd.service" again.

2019-12-11 11:57:43,017 INFO (_log_results) RETCODE (root.2.188): 0
2019-12-11 11:57:43,017 INFO (_log_results) STDERR (root.2.188)...
Redirecting to /bin/systemctl restart glusterd.service
################################################################################


Version-Release number of selected component (if applicable):
Whatever is the latest upstream version.

How reproducible:
Always.

Steps to Reproduce:
https://review.gluster.org/#/c/glusto-tests/+/20150/16/tests/functional/snapshot/test_snap_status_glusterd_restart.py

Actual results:
glusterd restart fails.

Expected results:
glusterd restart shouldn't fail.

Additional info:

Comment 1 Sanju 2019-12-16 05:33:04 UTC
Since when has this test case been failing?

Anyway, it doesn't look like a gluster issue to me.

Comment 2 Kshithij Iyer 2019-12-16 06:24:22 UTC
The testcase has been failing since Dec 11th. I feel that something is wrong with gluster itself.

Comment 3 Sanju 2019-12-16 06:55:02 UTC
Please share the glusterd log from the node where it failed to start.

Comment 4 Kshithij Iyer 2019-12-16 14:19:31 UTC
I was able to reproduce this issue by just restarting glusterd thrice on a single node.

[root@dhcp43-96 ~]# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; disabled; vendor preset: disabled)
   Active: active (running) since Mon 2019-12-16 19:46:13 IST; 1s ago
     Docs: man:glusterd(8)
  Process: 4287 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 4288 (glusterd)
   CGroup: /system.slice/glusterd.service
           └─4288 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO

Dec 16 19:46:13 dhcp43-96.lab.eng.blr.redhat.com systemd[1]: Starting GlusterFS, a clustered file-system server...
Dec 16 19:46:13 dhcp43-96.lab.eng.blr.redhat.com systemd[1]: Started GlusterFS, a clustered file-system server.
[root@dhcp43-96 ~]# systemctl restart glusterd
Job for glusterd.service failed because start of the service was attempted too often. See "systemctl status glusterd.service" and "journalctl -xe" for details.
To force a start use "systemctl reset-failed glusterd.service" followed by "systemctl start glusterd.service" again.
[root@dhcp43-96 ~]# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; disabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Mon 2019-12-16 19:46:18 IST; 2s ago
     Docs: man:glusterd(8)
  Process: 4287 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 4288 (code=exited, status=15)

Dec 16 19:46:13 dhcp43-96.lab.eng.blr.redhat.com systemd[1]: Starting GlusterFS, a clustered file-system server...
Dec 16 19:46:13 dhcp43-96.lab.eng.blr.redhat.com systemd[1]: Started GlusterFS, a clustered file-system server.
Dec 16 19:46:18 dhcp43-96.lab.eng.blr.redhat.com systemd[1]: Stopping GlusterFS, a clustered file-system server...
Dec 16 19:46:18 dhcp43-96.lab.eng.blr.redhat.com systemd[1]: Stopped GlusterFS, a clustered file-system server.
Dec 16 19:46:18 dhcp43-96.lab.eng.blr.redhat.com systemd[1]: start request repeated too quickly for glusterd.service
Dec 16 19:46:18 dhcp43-96.lab.eng.blr.redhat.com systemd[1]: Failed to start GlusterFS, a clustered file-system server.
Dec 16 19:46:18 dhcp43-96.lab.eng.blr.redhat.com systemd[1]: Unit glusterd.service entered failed state.
Dec 16 19:46:18 dhcp43-96.lab.eng.blr.redhat.com systemd[1]: glusterd.service failed.
[root@dhcp43-96 ~]# cat /var/log/glusterfs/glusterd.log 
[2019-12-16 14:14:53.877984] I [MSGID: 100030] [glusterfsd.c:2902:main] 0-glusterd: Started running glusterd version 20191213.5ab8350 (args: glusterd --xlator-option *.upgrade=on -N) 
[2019-12-16 14:14:53.878293] I [glusterfsd.c:2629:daemonize] 0-glusterfs: Pid of current running process is 13851
[2019-12-16 14:14:53.883281] I [MSGID: 106478] [glusterd.c:1431:init] 0-management: Maximum allowed open file descriptors set to 65536 
[2019-12-16 14:14:53.883391] I [MSGID: 106479] [glusterd.c:1507:init] 0-management: Using /var/lib/glusterd as working directory 
[2019-12-16 14:14:53.883418] I [MSGID: 106479] [glusterd.c:1513:init] 0-management: Using /var/run/gluster as pid file working directory 
[2019-12-16 14:14:53.898243] I [socket.c:1015:__socket_server_bind] 0-socket.management: process started listening on port (24007)
[2019-12-16 14:14:53.902709] I [socket.c:958:__socket_server_bind] 0-socket.management: closing (AF_UNIX) reuse check socket 10
[2019-12-16 14:14:53.903971] I [MSGID: 106059] [glusterd.c:1891:init] 0-management: max-port override: 60999 
[2019-12-16 14:14:53.905714] E [MSGID: 101032] [store.c:463:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
[2019-12-16 14:14:53.905768] E [MSGID: 101032] [store.c:463:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
[2019-12-16 14:14:53.905778] I [MSGID: 106514] [glusterd-store.c:2153:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 80000 
[2019-12-16 14:14:53.905848] E [MSGID: 101032] [store.c:463:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/options. [No such file or directory]
[2019-12-16 14:14:53.915932] I [MSGID: 106194] [glusterd-store.c:3977:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list. 
[2019-12-16 14:14:53.916055] E [MSGID: 101032] [store.c:463:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.upgrade. [No such file or directory]
[2019-12-16 14:14:53.916103] I [glusterd.c:2024:init] 0-management: Regenerating volfiles due to a max op-version mismatch or glusterd.upgrade file not being present, op_version retrieved:0, max op_version: 80000
[2019-12-16 14:14:53.916873] W [glusterfsd.c:1631:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fc5d0f36dd5] -->glusterd(glusterfs_sigwaiter+0xe5) [0x56117bd235e5] -->glusterd(cleanup_and_exit+0x6b) [0x56117bd2344b] ) 0-: received signum (15), shutting down 
[2019-12-16 14:14:53.917004] W [mgmt-pmap.c:132:rpc_clnt_mgmt_pmap_signout] 0-glusterfs: failed to create XDR payload
[2019-12-16 14:16:02.641373] I [MSGID: 100030] [glusterfsd.c:2902:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 20191213.5ab8350 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO) 
[2019-12-16 14:16:02.645083] I [glusterfsd.c:2629:daemonize] 0-glusterfs: Pid of current running process is 4250
[2019-12-16 14:16:02.659057] I [MSGID: 106478] [glusterd.c:1431:init] 0-management: Maximum allowed open file descriptors set to 65536 
[2019-12-16 14:16:02.659188] I [MSGID: 106479] [glusterd.c:1507:init] 0-management: Using /var/lib/glusterd as working directory 
[2019-12-16 14:16:02.659286] I [MSGID: 106479] [glusterd.c:1513:init] 0-management: Using /var/run/gluster as pid file working directory 
[2019-12-16 14:16:02.674453] I [socket.c:1015:__socket_server_bind] 0-socket.management: process started listening on port (24007)
[2019-12-16 14:16:02.678673] I [socket.c:958:__socket_server_bind] 0-socket.management: closing (AF_UNIX) reuse check socket 12
[2019-12-16 14:16:02.679951] I [MSGID: 106059] [glusterd.c:1891:init] 0-management: max-port override: 60999 
[2019-12-16 14:16:02.683970] I [MSGID: 106228] [glusterd.c:484:glusterd_check_gsync_present] 0-glusterd: geo-replication module not installed in the system [No such file or directory]
[2019-12-16 14:16:02.684414] E [MSGID: 101032] [store.c:463:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
[2019-12-16 14:16:02.684488] E [MSGID: 101032] [store.c:463:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
[2019-12-16 14:16:02.684498] I [MSGID: 106514] [glusterd-store.c:2153:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 80000 
[2019-12-16 14:16:02.685808] I [MSGID: 106194] [glusterd-store.c:3977:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list. 
[2019-12-16 14:16:02.686008] I [glusterd.c:2024:init] 0-management: Regenerating volfiles due to a max op-version mismatch or glusterd.upgrade file not being present, op_version retrieved:0, max op_version: 80000
Final graph:
+------------------------------------------------------------------------------+
  1: volume management
  2:     type mgmt/glusterd
  3:     option rpc-auth.auth-glusterfs on
  4:     option rpc-auth.auth-unix on
  5:     option rpc-auth.auth-null on
  6:     option rpc-auth-allow-insecure on
  7:     option transport.listen-backlog 1024
  8:     option max-port 60999
  9:     option event-threads 1
 10:     option ping-timeout 0
 11:     option transport.socket.listen-port 24007
 12:     option transport.socket.read-fail-log off
 13:     option transport.socket.keepalive-interval 2
 14:     option transport.socket.keepalive-time 10
 15:     option transport-type socket
 16:     option working-directory /var/lib/glusterd
 17: end-volume
 18:  
+------------------------------------------------------------------------------+
[2019-12-16 14:16:02.693808] I [MSGID: 101190] [event-epoll.c:682:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0 
[2019-12-16 14:16:09.121556] W [glusterfsd.c:1631:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f5d14e00dd5] -->/usr/sbin/glusterd(glusterfs_sigwaiter+0xe5) [0x55e68e3a35e5] -->/usr/sbin/glusterd(cleanup_and_exit+0x6b) [0x55e68e3a344b] ) 0-: received signum (15), shutting down 
[2019-12-16 14:16:09.136682] I [MSGID: 100030] [glusterfsd.c:2902:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 20191213.5ab8350 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO) 
[2019-12-16 14:16:09.137597] I [glusterfsd.c:2629:daemonize] 0-glusterfs: Pid of current running process is 4269
[2019-12-16 14:16:09.142805] I [MSGID: 106478] [glusterd.c:1431:init] 0-management: Maximum allowed open file descriptors set to 65536 
[2019-12-16 14:16:09.142924] I [MSGID: 106479] [glusterd.c:1507:init] 0-management: Using /var/lib/glusterd as working directory 
[2019-12-16 14:16:09.142951] I [MSGID: 106479] [glusterd.c:1513:init] 0-management: Using /var/run/gluster as pid file working directory 
[2019-12-16 14:16:09.153574] I [socket.c:1015:__socket_server_bind] 0-socket.management: process started listening on port (24007)
[2019-12-16 14:16:09.157273] I [socket.c:958:__socket_server_bind] 0-socket.management: closing (AF_UNIX) reuse check socket 12
[2019-12-16 14:16:09.158414] I [MSGID: 106059] [glusterd.c:1891:init] 0-management: max-port override: 60999 
[2019-12-16 14:16:09.160684] I [MSGID: 106228] [glusterd.c:484:glusterd_check_gsync_present] 0-glusterd: geo-replication module not installed in the system [No such file or directory]
[2019-12-16 14:16:09.161188] E [MSGID: 101032] [store.c:463:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
[2019-12-16 14:16:09.161277] E [MSGID: 101032] [store.c:463:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
[2019-12-16 14:16:09.161287] I [MSGID: 106514] [glusterd-store.c:2153:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 80000 
[2019-12-16 14:16:09.161775] I [MSGID: 106194] [glusterd-store.c:3977:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list. 
Final graph:
+------------------------------------------------------------------------------+
  1: volume management
  2:     type mgmt/glusterd
  3:     option rpc-auth.auth-glusterfs on
  4:     option rpc-auth.auth-unix on
  5:     option rpc-auth.auth-null on
  6:     option rpc-auth-allow-insecure on
  7:     option transport.listen-backlog 1024
  8:     option max-port 60999
  9:     option event-threads 1
 10:     option ping-timeout 0
 11:     option transport.socket.listen-port 24007
 12:     option transport.socket.read-fail-log off
 13:     option transport.socket.keepalive-interval 2
 14:     option transport.socket.keepalive-time 10
 15:     option transport-type socket
 16:     option working-directory /var/lib/glusterd
 17: end-volume
 18:  
+------------------------------------------------------------------------------+
[2019-12-16 14:16:09.163492] I [MSGID: 101190] [event-epoll.c:682:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0 
[2019-12-16 14:16:13.947003] W [glusterfsd.c:1631:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f6121672dd5] -->/usr/sbin/glusterd(glusterfs_sigwaiter+0xe5) [0x55be235f15e5] -->/usr/sbin/glusterd(cleanup_and_exit+0x6b) [0x55be235f144b] ) 0-: received signum (15), shutting down 
[2019-12-16 14:16:13.960474] I [MSGID: 100030] [glusterfsd.c:2902:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 20191213.5ab8350 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO) 
[2019-12-16 14:16:13.961417] I [glusterfsd.c:2629:daemonize] 0-glusterfs: Pid of current running process is 4288
[2019-12-16 14:16:13.966378] I [MSGID: 106478] [glusterd.c:1431:init] 0-management: Maximum allowed open file descriptors set to 65536 
[2019-12-16 14:16:13.966498] I [MSGID: 106479] [glusterd.c:1507:init] 0-management: Using /var/lib/glusterd as working directory 
[2019-12-16 14:16:13.966525] I [MSGID: 106479] [glusterd.c:1513:init] 0-management: Using /var/run/gluster as pid file working directory 
[2019-12-16 14:16:13.976827] I [socket.c:1015:__socket_server_bind] 0-socket.management: process started listening on port (24007)
[2019-12-16 14:16:13.980250] I [socket.c:958:__socket_server_bind] 0-socket.management: closing (AF_UNIX) reuse check socket 12
[2019-12-16 14:16:13.981305] I [MSGID: 106059] [glusterd.c:1891:init] 0-management: max-port override: 60999 
[2019-12-16 14:16:13.983367] I [MSGID: 106228] [glusterd.c:484:glusterd_check_gsync_present] 0-glusterd: geo-replication module not installed in the system [No such file or directory]
[2019-12-16 14:16:13.983826] E [MSGID: 101032] [store.c:463:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
[2019-12-16 14:16:13.983874] E [MSGID: 101032] [store.c:463:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
[2019-12-16 14:16:13.983882] I [MSGID: 106514] [glusterd-store.c:2153:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 80000 
[2019-12-16 14:16:13.984298] I [MSGID: 106194] [glusterd-store.c:3977:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list. 
Final graph:
+------------------------------------------------------------------------------+
  1: volume management
  2:     type mgmt/glusterd
  3:     option rpc-auth.auth-glusterfs on
  4:     option rpc-auth.auth-unix on
  5:     option rpc-auth.auth-null on
  6:     option rpc-auth-allow-insecure on
  7:     option transport.listen-backlog 1024
  8:     option max-port 60999
  9:     option event-threads 1
 10:     option ping-timeout 0
 11:     option transport.socket.listen-port 24007
 12:     option transport.socket.read-fail-log off
 13:     option transport.socket.keepalive-interval 2
 14:     option transport.socket.keepalive-time 10
 15:     option transport-type socket
 16:     option working-directory /var/lib/glusterd
 17: end-volume
 18:  
+------------------------------------------------------------------------------+
[2019-12-16 14:16:13.985611] I [MSGID: 101190] [event-epoll.c:682:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0 
[2019-12-16 14:16:18.432907] W [glusterfsd.c:1631:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f2dd9903dd5] -->/usr/sbin/glusterd(glusterfs_sigwaiter+0xe5) [0x559a918875e5] -->/usr/sbin/glusterd(cleanup_and_exit+0x6b) [0x559a9188744b] ) 0-: received signum (15), shutting down 
[root@dhcp43-96 ~]#

Comment 5 Sanju 2019-12-16 16:31:37 UTC
This is happening because of https://review.gluster.org/#/c/glusterfs/+/23751/. This patch ensures that glusterd will be restarted by systemd a maximum of 3 times in one hour. We can use "systemctl reset-failed glusterd.service" to reset the counter.

What does this particular test do?

Comment 6 Kshithij Iyer 2019-12-17 06:09:46 UTC
In this testcase we create a volume of any type and fuse mount it. We then create 2 snapshots of the volume with descriptions and check the snapshot status information with the snapshot name, with the volume name, and without any snapshot name/volume name. After that we restart glusterd and, once glusterd is up and running, we check the information again. We run this testcase for replicated, distributed-replicated, dispersed, distributed and distributed-dispersed volumes.

I would suggest increasing StartLimitBurst to 6, as, if I am not wrong, there are a couple of testcases which restart glusterd three or more times in one hour.
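
For reference, a change of this kind could be tried locally as a systemd drop-in before the unit file itself is patched (a sketch only, not the merged patch; the drop-in path is the standard override location, and the 3600-second interval is an assumed value):

```ini
# /etc/systemd/system/glusterd.service.d/limits.conf
# On newer systemd these directives belong in [Unit]; older releases
# (such as the RHEL 7-era systemd in the logs above) accept
# StartLimitBurst/StartLimitInterval under [Service] instead.
[Unit]
StartLimitBurst=6
StartLimitIntervalSec=3600
```

After adding the drop-in, `systemctl daemon-reload` picks up the change, and `systemctl reset-failed glusterd.service` clears a counter that has already tripped.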

Comment 7 Worker Ant 2020-01-07 10:06:20 UTC
REVIEW: https://review.gluster.org/23970 (glusterd: increase the StartLimitBurst) posted (#1) for review on master by Sanju Rakonde

Comment 8 Worker Ant 2020-01-08 15:04:40 UTC
REVIEW: https://review.gluster.org/23970 (glusterd: increase the StartLimitBurst) merged (#2) on master by Sanju Rakonde

