Created attachment 1633224 [details]
glusterfsd process log

Description of problem:
During my recent testing on GlusterFS 7, I still found that when the storage nodes are rebooted, the volume status is often wrong after glusterd and glusterfsd come back up: both the glusterd and glusterfsd processes are alive, yet the "gluster v status" command shows the glusterfsd process as N/A.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Reboot all storage nodes at the same time.
2. Wait for all nodes to come back up.
3. Execute "gluster v status all".

Actual results:
Some volumes' glusterfsd processes fail to come online.

Expected results:
All glusterfsd processes come online.

Additional info:

# gluster v status ccs
Status of volume: ccs
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick mn-0.local:/mnt/bricks/ccs/brick      N/A       N/A        N       N/A
Brick mn-1.local:/mnt/bricks/ccs/brick      53952     0          Y       2065
Brick dbm-0.local:/mnt/bricks/ccs/brick     N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        Y       4940
Self-heal Daemon on dbm-0.local             N/A       N/A        N       N/A
Self-heal Daemon on mn-1.local              N/A       N/A        Y       2537

Task Status of Volume ccs
------------------------------------------------------------------------------
There are no active volume tasks

# ps -ef | grep glusterfsd | grep ccs
root      1764     1  0 09:10 ?        00:00:07 /usr/sbin/glusterfsd -s mn-0.local --volfile-id ccs.mn-0.local.mnt-bricks-ccs-brick -p /var/run/gluster/vols/ccs/mn-0.local-mnt-bricks-ccs-brick.pid -S /var/run/gluster/7ea87ceb0a781684.socket --brick-name /mnt/bricks/ccs/brick -l /var/log/glusterfs/bricks/mnt-bricks-ccs-brick.log --log-level TRACE --xlator-option *-posix.glusterd-uuid=ebaded6d-91d5-4873-a60a-59bbcc813714 --process-name brick --brick-port 53952 --xlator-option ccs-server.listen-port=53952 --xlator-option transport.socket.bind-address=mn-0.local

# netstat -anlp | grep 1764
tcp        0      0 192.168.1.6:53952    0.0.0.0:*            LISTEN       1764/glusterfsd
tcp        0      0 192.168.1.6:53952    192.168.1.11:49058   ESTABLISHED  1764/glusterfsd
tcp        0      0 192.168.1.6:53952    192.168.1.6:49069    ESTABLISHED  1764/glusterfsd
tcp        0      0 192.168.1.6:53952    192.168.1.33:49139   ESTABLISHED  1764/glusterfsd
tcp        0      0 192.168.1.6:53952    192.168.1.12:49136   ESTABLISHED  1764/glusterfsd
tcp        0      0 192.168.1.6:53952    192.168.1.16:49139   ESTABLISHED  1764/glusterfsd
tcp        0      0 192.168.1.6:53952    192.168.1.23:49145   ESTABLISHED  1764/glusterfsd
tcp        0      0 192.168.1.6:53952    192.168.1.5:49052    ESTABLISHED  1764/glusterfsd
tcp        0      0 192.168.1.6:53952    192.168.1.8:49113    ESTABLISHED  1764/glusterfsd
tcp        0      0 192.168.1.6:53952    192.168.1.7:49104    ESTABLISHED  1764/glusterfsd
tcp        0      0 192.168.1.6:53952    192.168.1.6:49056    ESTABLISHED  1764/glusterfsd
tcp        0      0 192.168.1.6:53952    192.168.1.6:49082    ESTABLISHED  1764/glusterfsd
tcp        0      0 192.168.1.6:53952    192.168.1.29:49144   ESTABLISHED  1764/glusterfsd
tcp        0      0 192.168.1.6:53952    192.168.1.5:49045    ESTABLISHED  1764/glusterfsd
tcp        0      0 192.168.1.6:53952    192.168.1.11:49100   ESTABLISHED  1764/glusterfsd
tcp        0      0 192.168.1.6:49149    192.168.1.6:24007    ESTABLISHED  1764/glusterfsd
unix  2      [ ACC ]     STREAM     LISTENING     25405    1764/glusterfsd     /var/run/gluster/7ea87ceb0a781684.socket
unix  2      [ ACC ]     STREAM     LISTENING     40159    1764/glusterfsd     /var/run/gluster/changelog-25ddbf533d927939.sock
unix  3      [ ]         STREAM     CONNECTED     41282    1764/glusterfsd     /var/run/gluster/7ea87ceb0a781684.socket
unix  2      [ ]         DGRAM                    26910    1764/glusterfsd
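As the ps and netstat output shows, the brick process is alive and listening while "gluster v status" reports it as N/A. A rough per-node cross-check like the one below (just a sketch; the use of "gluster volume list" and the pgrep pattern are assumptions about this environment) makes that mismatch easy to spot:

for vol in $(gluster volume list); do
    echo "== ${vol} =="
    # bricks that "gluster v status" reports as offline (Online column = N)
    gluster volume status "${vol}" | awk '/^Brick/ && $(NF-1) == "N"'
    # brick processes for this volume that are actually running on this node
    pgrep -af "glusterfsd .*volfile-id ${vol}\."
done

Any volume that prints both an offline Brick line and a matching glusterfsd process is exhibiting this symptom.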
# gluster v info ccs
Volume Name: ccs
Type: Replicate
Volume ID: 521261bc-2cba-4e7b-a21a-8486712d7a31
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: mn-0.local:/mnt/bricks/ccs/brick
Brick2: mn-1.local:/mnt/bricks/ccs/brick
Brick3: dbm-0.local:/mnt/bricks/ccs/brick
Options Reconfigured:
diagnostics.brick-log-level: TRACE
cluster.self-heal-daemon: on
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
cluster.server-quorum-type: none
cluster.quorum-type: auto
cluster.quorum-reads: true
cluster.consistent-metadata: on
server.allow-insecure: on
network.ping-timeout: 42
cluster.favorite-child-policy: mtime
cluster.heal-timeout: 60
performance.client-io-threads: off
cluster.metadata-self-heal: on
cluster.data-self-heal: on
cluster.entry-self-heal: on
cluster.server-quorum-ratio: 51%

[Some analysis based on the enclosed logs]

From glusterd.log:

[2019-11-06 07:10:42.708849] D [MSGID: 0] [glusterd-utils.c:6625:glusterd_restart_bricks] 0-management: starting the volume ccs
--------- glusterd starts the glusterfsd process here
…
[2019-11-06 07:10:43.710937] T [socket.c:226:socket_dump_info] 0-management: $$$ client: connecting to (af:1,sock:12) /var/run/gluster/7ea87ceb0a781684.socket non-SSL (errno:0:Success)
-- does this mean the connection to glusterfsd succeeded?

From glusterfsd.log:

[2019-11-06 07:10:42.779208] T [socket.c:226:socket_dump_info] 0-socket.glusterfsd: $$$ client: listening on (af:1,sock:7) /var/run/gluster/7ea87ceb0a781684.socket non-SSL (errno:0:Success)
------ I think this means the glusterfsd unix domain socket is ready to receive connections
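Independently of the TRACE logs, whether that unix socket actually accepts connections can be checked with a direct probe (a sketch; socat is assumed to be installed, and the socket path is the one from the ps/netstat output above):

# connect to the brick socket; /dev/null as the first address makes
# socat connect, hit EOF immediately, and exit
socat /dev/null UNIX-CONNECT:/var/run/gluster/7ea87ceb0a781684.socket \
    && echo "socket accepts connections"

socat exits 0 only if the connect() itself succeeded, so this distinguishes "nobody is listening" from "listening, but glusterd never completed its exchange with the brick".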
Created attachment 1633225 [details]
glusterd process log
It finally seems to be a configuration issue: in my environment the ping-timeout value in glusterd.conf is set to 0, which seems to have something to do with this issue. After I changed the ping-timeout value to 30, the problem disappeared!
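For reference, in a stock install this setting lives in glusterd's own volfile, typically /etc/glusterfs/glusterd.vol (the exact path and the surrounding options vary by distribution). A minimal sketch of the change, assuming the stock layout:

volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    # 0 disables glusterd's ping timer on its brick connections;
    # a non-zero value (30 here) made the N/A brick status go away
    option ping-timeout 30
end-volume

glusterd has to be restarted (e.g. "systemctl restart glusterd") for the new value to take effect.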