Bug 1592258 - Gluster brick will not come back up online
Summary: Gluster brick will not come back up online
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 3.12
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-06-18 09:28 UTC by stefan.luteijn
Modified: 2018-10-05 02:23 UTC
CC: 6 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2018-10-05 02:23:45 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
Dumps of the bricks of the volume that has the brick refusing to get back online (520.00 KB, application/x-tar)
2018-06-18 09:28 UTC, stefan.luteijn
Brick and glusterd logs from 18-06 to 20-06 when the issue occurred (11.35 MB, application/x-tar)
2018-07-03 12:13 UTC, stefan.luteijn

Description stefan.luteijn 2018-06-18 09:28:19 UTC
Created attachment 1452579 [details]
Dumps of the bricks of the volume that has the brick refusing to get back online.

Description of problem:
Every now and then, gluster bricks go offline on our cluster. Most of the time we can restart them by running gluster volume start <gluster_vol> force. Occasionally, however, a brick refuses to start again even when we force-start it. Stopping and then starting the volume does bring all bricks back online in all cases.

Version-Release number of selected component (if applicable):
glusterfs 3.12.9

How reproducible:
Not reliably. Run gluster with replication enabled (in our setup ~50 volumes) and every one or two days a few bricks will go offline. Roughly one in five of those bricks will not come back online when we force-start it, and appears to have been assigned a port number already in use by another brick on that node.

Steps to Reproduce:
1. Have an offline brick of a live gluster volume
2. Run gluster volume start <gluster_vol> force
3. Confirm the brick is still offline (see the sketch below)
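A rough sketch of this cycle, with <gluster_vol> as a placeholder for the affected volume:

# Check which bricks of the volume are offline (offline bricks show N/A for port and pid)
gluster volume status <gluster_vol>

# Try to force-start the offline brick
gluster volume start <gluster_vol> force

# On an affected volume the brick still shows N/A afterwards
gluster volume status <gluster_vol>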

Actual results:
Gluster brick stays offline after running gluster volume start <gluster_vol> force

Expected results:
Gluster brick will get back online after running gluster volume start <gluster_vol> force

Additional info:
We run a setup where we create replicated volumes using three gluster nodes. We also run a DR site with another three gluster nodes, for which we use the data replication option of gluster to keep them in sync with the main site.
We have noticed that in our environment this happens exclusively when replication is on. We also noticed that, so far, every brick that does not want to come back online appears to have been assigned a port number that is already assigned to a different brick. However, I don't know at which point in time a port number gets assigned to a brick, so this might be a false flag. In any case, below are the configs of the two bricks that share the same port number:

172.16.0.4:-var-lib-heketi-mounts-vg_ff6390856febea2c9ec2a7fb7d0c1ff9-brick_08f4fd811dc92bbbfbf1872b7a49c67d-brick:listen-port=49187
uuid=19a9c113-41ba-411c-ad76-c7b8fdbe14f2
hostname=172.16.0.4
path=/var/lib/heketi/mounts/vg_ff6390856febea2c9ec2a7fb7d0c1ff9/brick_08f4fd811dc92bbbfbf1872b7a49c67d/brick
real_path=/var/lib/heketi/mounts/vg_ff6390856febea2c9ec2a7fb7d0c1ff9/brick_08f4fd811dc92bbbfbf1872b7a49c67d/brick
listen-port=49187
rdma.listen-port=0
decommissioned=0
brick-id=vol_ade97766557f27313661681852eebdf0-client-2
mount_dir=/brick
snap-status=0
brick-fsid=65204

172.16.0.4:-var-lib-heketi-mounts-vg_ff6390856febea2c9ec2a7fb7d0c1ff9-brick_847d3127b26fbea7aa55fd24f46042e4-brick:listen-port=49187
uuid=19a9c113-41ba-411c-ad76-c7b8fdbe14f2
hostname=172.16.0.4
path=/var/lib/heketi/mounts/vg_ff6390856febea2c9ec2a7fb7d0c1ff9/brick_847d3127b26fbea7aa55fd24f46042e4/brick
real_path=/var/lib/heketi/mounts/vg_ff6390856febea2c9ec2a7fb7d0c1ff9/brick_847d3127b26fbea7aa55fd24f46042e4/brick
listen-port=49187
rdma.listen-port=0
decommissioned=0
brick-id=vol_c0d4d479b43ff3d00601b04d25eff60e-client-2
mount_dir=/brick
snap-status=0
brick-fsid=65104
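
For reference, a rough sketch of how duplicate port assignments can be spotted on a node, assuming the brickinfo files live under the default glusterd working directory /var/lib/glusterd:

# List the listen-port assigned to each brick known on this node
grep 'listen-port=' /var/lib/glusterd/vols/*/bricks/* | grep -v 'rdma.listen-port'

# Show only port values that occur more than once (0 means no port assigned yet)
grep -h 'listen-port=' /var/lib/glusterd/vols/*/bricks/* \
  | grep -v 'rdma.listen-port' | sort | uniq -d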

Used gluster packages:
glusterfs.x86_64                         3.12.9-1.el7         @centos-gluster312
glusterfs-api.x86_64                     3.12.9-1.el7         @centos-gluster312
glusterfs-cli.x86_64                     3.12.9-1.el7         @centos-gluster312
glusterfs-client-xlators.x86_64          3.12.9-1.el7         @centos-gluster312
glusterfs-fuse.x86_64                    3.12.9-1.el7         @centos-gluster312
glusterfs-geo-replication.x86_64         3.12.9-1.el7         @centos-gluster312
glusterfs-libs.x86_64                    3.12.9-1.el7         @centos-gluster312
glusterfs-rdma.x86_64                    3.12.9-1.el7         @centos-gluster312
glusterfs-server.x86_64                  3.12.9-1.el7         @centos-gluster312
Available Packages
glusterfs-api-devel.x86_64               3.12.9-1.el7         centos-gluster312 
glusterfs-coreutils.x86_64               0.2.0-1.el7          centos-gluster312 
glusterfs-devel.x86_64                   3.12.9-1.el7         centos-gluster312 
glusterfs-events.x86_64                  3.12.9-1.el7         centos-gluster312 
glusterfs-extra-xlators.x86_64           3.12.9-1.el7         centos-gluster312 
glusterfs-gnfs.x86_64                    3.12.9-1.el7         centos-gluster312 
glusterfs-resource-agents.noarch         3.12.9-1.el7         centos-gluster312 

xfs options:
xfs_info /dev/mapper/vg_ff6390856febea2c9ec2a7fb7d0c1ff9-brick_08f4fd811dc92bbbfbf1872b7a49c67d
meta-data=/dev/mapper/vg_ff6390856febea2c9ec2a7fb7d0c1ff9-brick_08f4fd811dc92bbbfbf1872b7a49c67d isize=512    agcount=8, agsize=65472 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=523776, imaxpct=25
         =                       sunit=64     swidth=64 blks
naming   =version 2              bsize=8192   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=64 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

uname -r
4.11.9-coreos
cat /etc/issue
\S
Kernel \r on an \m

10:46$ df -Th
Filesystem              Type      Size  Used Avail Use% Mounted on
udev                    devtmpfs  7,7G     0  7,7G   0% /dev
tmpfs                   tmpfs     1,6G  9,7M  1,6G   1% /run
/dev/sda1               ext4      220G  149G   60G  72% /
tmpfs                   tmpfs     7,7G  369M  7,3G   5% /dev/shm
tmpfs                   tmpfs     5,0M  4,0K  5,0M   1% /run/lock
tmpfs                   tmpfs     7,7G     0  7,7G   0% /sys/fs/cgroup
/dev/loop1              squashfs  158M  158M     0 100% /snap/mailspring/209
/dev/loop0              squashfs   94M   94M     0 100% /snap/slack/6
/dev/loop4              squashfs  157M  157M     0 100% /snap/mailspring/216
/dev/loop3              squashfs   87M   87M     0 100% /snap/core/4486
/dev/loop6              squashfs   94M   94M     0 100% /snap/slack/5
/dev/loop2              squashfs  144M  144M     0 100% /snap/slack/7
/dev/loop5              squashfs   87M   87M     0 100% /snap/core/4571
/dev/loop7              squashfs  158M  158M     0 100% /snap/mailspring/202
/dev/loop8              squashfs   87M   87M     0 100% /snap/core/4650
tmpfs                   tmpfs     1,6G   64K  1,6G   1% /run/user/1000

Attached are the brick dump files of the volume

Comment 1 Atin Mukherjee 2018-07-02 02:46:33 UTC
We'd need the following information captured to start debugging this problem:

1. Output of gluster v get all all - we want to understand whether brick multiplexing is turned on or not.
2. The glusterd and brick log files (for the brick which is shown as N/A) from the node where the brick is hosted; a sketch of the commands to collect these follows below.
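
A minimal sketch of how this could be collected on the node hosting the offline brick, assuming the default /var/log/glusterfs log location:

# 1. Check the brick-multiplexing related settings
gluster volume get all all

# 2. Collect the glusterd log and the logs of the affected bricks
#    (brick logs are named after the brick path)
tar czf gluster-logs.tar.gz /var/log/glusterfs/glusterd.log /var/log/glusterfs/bricks/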

Comment 2 stefan.luteijn 2018-07-03 12:13:29 UTC
Created attachment 1456225 [details]
Brick and glusterd logs from 18-06 to 20-06 when the issue occurred

Attached are the glusterd and brick logs as requested.

Comment 3 stefan.luteijn 2018-07-03 12:14:36 UTC
gluster v get all all
Option                                  Value                                   
------                                  -----                                   
cluster.server-quorum-ratio             51                                      
cluster.enable-shared-storage           disable                                 
cluster.op-version                      31202                                   
cluster.max-op-version                  31202                                   
cluster.brick-multiplex                 disable                                 
cluster.max-bricks-per-process          0                                       
cluster.localtime-logging               disable

Comment 4 stefan.luteijn 2018-07-06 11:49:57 UTC
Some extra information: the issue occurred after the gluster pod crashed due to running out of memory. We found that after removing the brick pid file under /var/run/gluster/vols/<vol_name> on the machine hosting the offline brick, we could successfully start the brick again with gluster volume start <vol_name> force.
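
A rough sketch of this workaround; the pidfile name is illustrative and has to match the offline brick:

# On the node hosting the offline brick: remove the stale pidfile left behind
# after the OOM crash, then force-start the volume
rm /var/run/gluster/vols/<vol_name>/<offline_brick>.pid
gluster volume start <vol_name> force

# The brick should now show a port and pid again
gluster volume status <vol_name>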

We relate this to the following entries from the brick and glusterd logs (also provided in the attachments):

brick log:
[2018-06-18 06:55:19.152561] E [socket.c:2369:socket_connect_finish] 0-glusterfs: connection to 172.16.0.4:24007 failed (Connection refused); disconnecting socket

glusterd log
[2018-06-18 06:57:00.326425] I [glusterd-utils.c:5953:glusterd_brick_start] 0-management: discovered already-running brick /var/lib/heketi/mounts/vg_ff6390856febea2c9ec2a7fb7d0c1ff9/brick_08f4fd811dc92bbbfbf1872b7a49c67d/brick

Comment 5 Atin Mukherjee 2018-07-08 13:37:02 UTC
This suggests that the pidfile contained a pid pointing to some other running process (it may be another gluster process or something completely different). Ideally pid allocation only moves forward, and the possibility of a pid clash in a running environment is very rare unless the system has exhausted the upper bound of the pid range and starts scanning from scratch. Do you see this happening frequently? If so, I'd definitely request you to capture the exact pid number from the brick pidfile and then run ps aux | grep <pid> to see what that process is (of course I'd also need the brick name). Also, what is the output of gluster volume status at the moment, and if we create a new dummy volume, what pid does its brick pick up?
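
A minimal sketch of the requested diagnostics; the pidfile path follows the layout mentioned in comment 4, and the dummy volume name and brick path are placeholders:

# What process does the pid in the stale brick pidfile point to?
cat /var/run/gluster/vols/<vol_name>/<offline_brick>.pid
ps aux | grep <pid_from_file>

# Current state of all bricks and their ports
gluster volume status

# Create and start a dummy volume, then check which pid/port its brick gets
gluster volume create dummy-vol <node>:/bricks/dummy force
gluster volume start dummy-vol
gluster volume status dummy-vol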

If you can help me with this information I guess we should be able to narrow down this issue.

Comment 6 stefan.luteijn 2018-07-13 09:32:07 UTC
At the moment gluster volume status shows all bricks available and online. I have created a new volume as requested; its brick process came up under pid 28819. It is a relatively small gluster environment with about 60 volumes running, all replicated 3 ways across 3 gluster nodes.

We are wondering, though, whether the issue might simply be that the brick pid file is not properly cleaned up after an OOM crash.

'Unfortunately' the issue has not recurred this weekend, so at the moment I cannot check the process id of a stale brick, nor the age of its pid file.

Comment 7 Atin Mukherjee 2018-10-05 02:23:45 UTC
Given that you couldn't reproduce the issue and there isn't enough information available in the bug, I'm closing it. In case you happen to hit this again, please follow comment 5 and reopen this bug.

