Bug 1475340

Summary:	Glusterfs mount inside POD gets terminated when scaled with brick multiplexing enabled
Product:	[Red Hat Storage] Red Hat Gluster Storage	Reporter:	Shekhar Berry <shberry>
Component:	kubernetes	Assignee:	Humble Chirammal <hchiramm>
Status:	CLOSED CURRENTRELEASE	QA Contact:	krishnaram Karthick <kramdoss>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	rhgs-3.0	CC:	amukherj, aos-bugs, aos-storage-staff, csaba, ekuric, hchiramm, jeder, jsafrane, madam, mpillai, nchilaka, pprakash, psuriset, rhs-bugs, rsussman, rtalur, shberry, storage-qa-internal
Target Milestone:	---
Target Release:	CNS 3.6
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:	aos-scalability-36
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-01-03 10:22:19 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1477024
Bug Blocks:	1445448

Description Shekhar Berry 2017-07-26 13:13:11 UTC

Description of problem:

All pods with an active PVC mounted lose their glusterfs mount; the pods must then be deleted to get the mount back. A total of 1000 pods were created and all of them lost the mount. Note that all mounts were present at creation time. Brick multiplexing was enabled on the gluster volumes.

When the df command was run inside a pod, it gave the following error message for the glusterfs mount:

 '/mnt/glusterfs': Transport endpoint is not connected

The atomic-openshift-node.service was not restarted, either manually or via ansible-playbook, so this is not related to Bug https://bugzilla.redhat.com/show_bug.cgi?id=1423640

Version-Release number of selected component (if applicable):

heketi-client-5.0.0-2.el7rhgs.x86_64
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhgs3/rhgs-server-rhel7:3.3.0-7
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhgs3/rhgs-volmanager-rhel7:3.3.0-7
oc version
oc v3.6.135
kubernetes v1.6.1+5115d708d7
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://gprfs013.sbu.lab.eng.bos.redhat.com:8443
openshift v3.6.135
kubernetes v1.6.1+5115d708d7

docker version
Client:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-31.1.git97ba2c0.el7.x86_64
 Go version:      go1.8
 Git commit:      97ba2c0/1.12.6
 Built:           Fri May 26 16:26:51 2017
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-31.1.git97ba2c0.el7.x86_64
 Go version:      go1.8
 Git commit:      97ba2c0/1.12.6
 Built:           Fri May 26 16:26:51 2017
 OS/Arch:         linux/amd64




How reproducible:

Happened twice in my setup.

Steps to Reproduce:
1. Create pods that mount gluster PVs backed by volumes with brick multiplexing enabled
2. Log in to a pod after some time and run df
3. df reports: '/mnt/glusterfs': Transport endpoint is not connected
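With 1000 pods, step 2 is tedious to do by hand. The check could be sketched roughly as below. This is only an illustration: the function name is hypothetical, it assumes every pod mounts the volume at /mnt/glusterfs as in this report, and it assumes the current project contains only the test pods.

```shell
# Print the names of pods whose glusterfs mount no longer answers df.
# ASSUMPTION: all pods use the same mount path (default taken from this report).
find_pods_with_dead_mount() {
    local mount_path=${1:-/mnt/glusterfs} pod
    for pod in $(oc get pods -o jsonpath='{.items[*].metadata.name}'); do
        # df prints the error on stderr, so merge it into the pipe before matching.
        if oc exec "$pod" -- df "$mount_path" 2>&1 |
            grep -q 'Transport endpoint is not connected'; then
            echo "$pod"
        fi
    done
}
```

Calling find_pods_with_dead_mount with no argument lists the affected pods, one per line; an empty result means no pod reported the error.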

Actual results:
'/mnt/glusterfs': Transport endpoint is not connected

Expected results:

Gluster mount should not have terminated

Master Log: http://perf1.perf.lab.eng.bos.redhat.com/pub/shberry/tranport_end_point/ocp_master_var_log/

Node Log (of failed PODs): http://perf1.perf.lab.eng.bos.redhat.com/pub/shberry/tranport_end_point/worker_var_log/

CNS Node Log: http://perf1.perf.lab.eng.bos.redhat.com/pub/shberry/tranport_end_point/cns_var_log/

Comment 2 Humble Chirammal 2017-07-26 13:56:28 UTC

Shekhar, Is this issue only visible when you enable 'brick multiplex on' in a cluster?

Comment 3 Jan Safranek 2017-07-26 14:26:36 UTC

OpenShift puts glusterfs mount logs to /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/glusterfs, inspecting these on the faulty node (and publishing them) might be useful too.

Comment 4 Shekhar Berry 2017-07-27 08:59:02 UTC

(In reply to Humble Chirammal from comment #2)
> Shekhar, Is this issue only visible when you enable 'brick multiplex on' in
> a cluster?

I have not seen this with brick multiplexing disabled as of now.

Comment 6 Humble Chirammal 2017-07-27 10:44:30 UTC

(In reply to Shekhar Berry from comment #4)
> (In reply to Humble Chirammal from comment #2)
> > Shekhar, Is this issue only visible when you enable 'brick multiplex on' in
> > a cluster?
> 
> I have not seen this with brick multiplexing disabled as of now.

Thanks. Just to isolate: you are not able to mount and use this share manually on any of the nodes, are you? Also, can you please capture the iptables rules that are active on the nodes?

Comment 7 Shekhar Berry 2017-07-27 11:38:43 UTC

(In reply to Jan Safranek from comment #3)
> OpenShift puts glusterfs mount logs to
> /var/lib/origin/openshift.local.volumes/plugins/kubernetes.io/glusterfs,
> inspecting these on the faulty node (and publishing them) might be useful
> too.

Here's the link to the mount logs of all PVCs from one of the faulty nodes:

http://perf1.perf.lab.eng.bos.redhat.com/pub/shberry/tranport_end_point/worker_mount_logs/

Comment 8 Shekhar Berry 2017-07-27 11:44:25 UTC

(In reply to Humble Chirammal from comment #6)
> (In reply to Shekhar Berry from comment #4)
> > (In reply to Humble Chirammal from comment #2)
> > > Shekhar, Is this issue only visible when you enable 'brick multiplex on' in
> > > a cluster?
> > 
> > I have not seen this with brick multiplexing disabled as of now.
> 
> Thanks. Just to isolate, you are not able to mount and use this share
> manually in any of the nodes. Isnt it ? Also, can you please capture
> iptables rules which are active from the nodes ?

Yes, it's true that I am unable to mount the share locally on the glusterfs pod itself.

Here's the link to the iptables listing from one of the nodes:
http://perf1.perf.lab.eng.bos.redhat.com/pub/shberry/tranport_end_point/iptables_L

Comment 14 Shekhar Berry 2017-07-28 04:55:42 UTC

Here is the link to the brick log file mentioned in comment 13:

http://perf1.perf.lab.eng.bos.redhat.com/pub/shberry/tranport_end_point/cns_var_log/glusterfs/bricks/

Comment 22 Humble Chirammal 2017-07-31 11:25:34 UTC

I took a quick look at this setup and, if I am correct, there is no "mount process" running on the node. The fuse mount processes are gone somehow. Shekhar, was any node service restart or similar executed in this setup?
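One way to spot this state (mount entry present, fuse process gone) on a node is sketched below. It is a rough illustration, not the procedure used in this bug: the function names are hypothetical, and it assumes that gluster FUSE mounts appear in /proc/mounts with type fuse.glusterfs and that the glusterfs client command line ends with the mount point. Mount points containing regex metacharacters or escaped spaces are not handled.

```shell
# Print the mount points of all glusterfs FUSE mounts listed in a mounts file.
# The file argument only exists to make the parser testable; default is /proc/mounts.
list_glusterfs_mounts() {
    awk '$3 == "fuse.glusterfs" { print $2 }' "${1:-/proc/mounts}"
}

# Report mount points that have a mount entry but no matching client process.
# ASSUMPTION: the glusterfs client command line ends with the mount point.
find_stale_mounts() {
    list_glusterfs_mounts "${1:-/proc/mounts}" | while IFS= read -r mnt; do
        if ! pgrep -f "glusterfs.*[[:space:]]${mnt}\$" >/dev/null; then
            echo "stale: $mnt"
        fi
    done
}
```

Running find_stale_mounts on an affected node should print one "stale:" line per orphaned mount point if the assumptions above hold.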

Comment 23 Humble Chirammal 2017-08-02 15:50:24 UTC

On further checking, I noticed rpc errors in this setup, which are already tracked as a known bug in RHGS:
 
var-lib-heketi-mounts-vg_84a07855b88ead2326fb1f557beac8fd-brick_c6b4373a2612f5edd579dacf926e4343-brick.log-20170723:[2017-07-20 12:18:43.335874] E [rpcsvc.c:557:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully


var-lib-heketi-mounts-vg_3b8548f910b60688eef1582f69a3fee6-brick_d79122071ae433b2b2ab336f4a287bf5-brick.log-20170723:[2017-07-20 12:18:40.233833] E [rpcsvc.c:557:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully


As these errors were noticed when the issue happened, I am attaching this bug to that known bug.

We are also recreating the setup once again.
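To see how widespread the rpc error quoted in this comment is across brick logs, a small counting helper along these lines may be useful. It is a sketch only: the function name is hypothetical, and the directory to scan is whatever location holds the brick logs on the CNS node.

```shell
# For every regular file in a log directory, print "<count> <file name>", where
# count is the number of lines containing the rpc actor failure message.
count_rpc_actor_failures() {
    local log_dir=$1 f
    for f in "$log_dir"/*; do
        [ -f "$f" ] || continue
        # grep -c prints 0 (and exits non-zero) when nothing matches; only the count is used.
        printf '%s %s\n' \
            "$(grep -c -F 'rpc actor failed to complete successfully' -- "$f")" \
            "$(basename -- "$f")"
    done
}
```

Piping the output through sort -rn puts the brick logs with the most failures first.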

Comment 24 Humble Chirammal 2017-08-02 16:20:32 UTC

(In reply to Humble Chirammal from comment #23)
> On further check, I have noticed rpc errors in this setup, which is already
> a known bug in RHGS:
>  
> var-lib-heketi-mounts-vg_84a07855b88ead2326fb1f557beac8fd-
> brick_c6b4373a2612f5edd579dacf926e4343-brick.log-20170723:[2017-07-20
> 12:18:43.335874] E [rpcsvc.c:557:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc
> actor failed to complete successfully
> 
> 
> var-lib-heketi-mounts-vg_3b8548f910b60688eef1582f69a3fee6-
> brick_d79122071ae433b2b2ab336f4a287bf5-brick.log-20170723:[2017-07-20
> 12:18:40.233833] E [rpcsvc.c:557:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc
> actor failed to complete successfully
> 
> 
> As these errors are noticed when the issue happend I am attaching this to
> the same. 
> 

The bug mentioned is BZ#1477024.

> We are also recreating the setup once again.

Comment 39 Humble Chirammal 2017-08-08 09:25:13 UTC

Regardless of the root cause, there is one thing I would like to clarify here.

The 'auto_unmount' option has been set on the volume mounts, and for some reason the fuse processes vanished in this setup; however, the mounts (as listed by the mount command) were still intact on these servers. I was under the impression that 'auto_unmount' would take down the affected mounts if the original mount process fails. It could also be that the monitoring process was taken down or failed when the issue happened. However, I would like to confirm that the unmount really happens when the original mount process fails.

csaba, can you please take a look at this scenario and share your thoughts? The logs are available up to comment#21.

Comment 40 Atin Mukherjee 2017-08-08 09:28:10 UTC

(In reply to Humble Chirammal from comment #39)
> Regardless of the root cause, one thing which I would like to clarify here.
> 
> The 'auto_unmount' option has been set on the volume mounts and for some
> reason the fuse process were vanished in this setup, however the mounts (
> mount command ) were intact in these servers. I was in an impression that,
> 'auto_unmount' would take down the subjected mounts if there is a failure in
> original mount process. It could also be that, the monitoring process were
> also taken down/failed when the issue happened. However I would like to
> clarify/make sure the "unmount" happens when there is a failure in the
> original mount process. 
> 
> csaba, can you please take a look at this scenario and share your thought?
> The logs are available till comment#21.

AFAIK, Shekhar was running 3.2 client bits, where the auto_unmount feature is not in place. How is this relevant then?

Comment 41 Humble Chirammal 2017-08-08 09:58:57 UTC

(In reply to Atin Mukherjee from comment #40)
> (In reply to Humble Chirammal from comment #39)
> > Regardless of the root cause, one thing which I would like to clarify here.
> > 
> > The 'auto_unmount' option has been set on the volume mounts and for some
> > reason the fuse process were vanished in this setup, however the mounts (
> > mount command ) were intact in these servers. I was in an impression that,
> > 'auto_unmount' would take down the subjected mounts if there is a failure in
> > original mount process. It could also be that, the monitoring process were
> > also taken down/failed when the issue happened. However I would like to
> > clarify/make sure the "unmount" happens when there is a failure in the
> > original mount process. 
> > 
> > csaba, can you please take a look at this scenario and share your thought?
> > The logs are available till comment#21.
> 
> AFAIK, Shekhar was running 3.2 client bits where auto_unmount feature is not
> in place. How is this relevant then?

How did you verify that auto_unmount is not in place? I can clearly see that auto_unmount is in place, and the glusterfs version in this setup is a package that has the support.

Comment 43 Atin Mukherjee 2017-08-08 10:15:06 UTC

(In reply to Humble Chirammal from comment #41)
> (In reply to Atin Mukherjee from comment #40)
> > (In reply to Humble Chirammal from comment #39)
> > > Regardless of the root cause, one thing which I would like to clarify here.
> > > 
> > > The 'auto_unmount' option has been set on the volume mounts and for some
> > > reason the fuse process were vanished in this setup, however the mounts (
> > > mount command ) were intact in these servers. I was in an impression that,
> > > 'auto_unmount' would take down the subjected mounts if there is a failure in
> > > original mount process. It could also be that, the monitoring process were
> > > also taken down/failed when the issue happened. However I would like to
> > > clarify/make sure the "unmount" happens when there is a failure in the
> > > original mount process. 
> > > 
> > > csaba, can you please take a look at this scenario and share your thought?
> > > The logs are available till comment#21.
> > 
> > AFAIK, Shekhar was running 3.2 client bits where auto_unmount feature is not
> > in place. How is this relevant then?
> 
> How did you verify the auto_unmount is not in place ? I can clearly see that
> auto_unmount is in place and also the glusterfs version in this setup is the
> package which has the support.

Shekhar mentioned the version details to me earlier. And I'm not sure that what you are currently looking at is the same setup. Shekhar might have upgraded it?

Comment 44 Humble Chirammal 2017-08-08 10:31:40 UTC

(In reply to Atin Mukherjee from comment #43)
> (In reply to Humble Chirammal from comment #41)
> > (In reply to Atin Mukherjee from comment #40)
> > > (In reply to Humble Chirammal from comment #39)
> > > > Regardless of the root cause, one thing which I would like to clarify here.
> > > > 
> > > > The 'auto_unmount' option has been set on the volume mounts and for some
> > > > reason the fuse process were vanished in this setup, however the mounts (
> > > > mount command ) were intact in these servers. I was in an impression that,
> > > > 'auto_unmount' would take down the subjected mounts if there is a failure in
> > > > original mount process. It could also be that, the monitoring process were
> > > > also taken down/failed when the issue happened. However I would like to
> > > > clarify/make sure the "unmount" happens when there is a failure in the
> > > > original mount process. 
> > > > 
> > > > csaba, can you please take a look at this scenario and share your thought?
> > > > The logs are available till comment#21.
> > > 
> > > AFAIK, Shekhar was running 3.2 client bits where auto_unmount feature is not
> > > in place. How is this relevant then?
> > 
> > How did you verify the auto_unmount is not in place ? I can clearly see that
> > auto_unmount is in place and also the glusterfs version in this setup is the
> > package which has the support.
> 
> Shekhar mentioned the version details to me earlier. And I'm not sure
> currently what you are looking at is the same setup. Shekhar might have
> upgraded it?

AFAICT, there has been no upgrade yet, and it has been running the same version from the start. I was looking at the logs he mentioned earlier. Shekhar can confirm, though.

Comment 45 Shekhar Berry 2017-08-08 10:34:53 UTC

(In reply to Humble Chirammal from comment #44)
> (In reply to Atin Mukherjee from comment #43)
> > (In reply to Humble Chirammal from comment #41)
> > > (In reply to Atin Mukherjee from comment #40)
> > > > (In reply to Humble Chirammal from comment #39)
> > > > > Regardless of the root cause, one thing which I would like to clarify here.
> > > > > 
> > > > > The 'auto_unmount' option has been set on the volume mounts and for some
> > > > > reason the fuse process were vanished in this setup, however the mounts (
> > > > > mount command ) were intact in these servers. I was in an impression that,
> > > > > 'auto_unmount' would take down the subjected mounts if there is a failure in
> > > > > original mount process. It could also be that, the monitoring process were
> > > > > also taken down/failed when the issue happened. However I would like to
> > > > > clarify/make sure the "unmount" happens when there is a failure in the
> > > > > original mount process. 
> > > > > 
> > > > > csaba, can you please take a look at this scenario and share your thought?
> > > > > The logs are available till comment#21.
> > > > 
> > > > AFAIK, Shekhar was running 3.2 client bits where auto_unmount feature is not
> > > > in place. How is this relevant then?
> > > 
> > > How did you verify the auto_unmount is not in place ? I can clearly see that
> > > auto_unmount is in place and also the glusterfs version in this setup is the
> > > package which has the support.
> > 
> > Shekhar mentioned the version details to me earlier. And I'm not sure
> > currently what you are looking at is the same setup. Shekhar might have
> > upgraded it?
> 
> Afaict, no upgrade yet and from start its running with same version. I was
> looking at the logs which he mentioned earlier. Shekhar can confirm though.

No upgrade has been done yet. The environment is exactly the same as it was when issue occurred.

Comment 46 Shekhar Berry 2017-08-08 11:06:32 UTC

Just checked my setup again with Atin: my client version is RHGS 3.2 Async, which has the auto_unmount patch.

Comment 47 Atin Mukherjee 2017-08-08 12:23:44 UTC

Setting needinfo back on Csaba to check.

(In reply to Humble Chirammal from comment #39)
> Regardless of the root cause, one thing which I would like to clarify here.
> 
> The 'auto_unmount' option has been set on the volume mounts and for some
> reason the fuse process were vanished in this setup, however the mounts (
> mount command ) were intact in these servers. I was in an impression that,
> 'auto_unmount' would take down the subjected mounts if there is a failure in
> original mount process. It could also be that, the monitoring process were
> also taken down/failed when the issue happened. However I would like to
> clarify/make sure the "unmount" happens when there is a failure in the
> original mount process. 
> 
> csaba, can you please take a look at this scenario and share your thought?
> The logs are available till comment#21.

In my local testing (as shared in comment 28, point 4), I have seen that if we SIGKILL a mount process, the mount point entries remain but the process is gone.

Setting needinfo on Csaba to further comment here.

Comment 49 Shekhar Berry 2017-08-16 09:05:33 UTC

I hit the issue again, even with SELinux disabled, while trying to scale with brick multiplexing enabled.

As a next step, I upgraded OCP to the latest bits and also upgraded the RHGS client from 3.2 to 3.3.

oc version
oc v3.6.173.0.5
kubernetes v1.6.1+5115d708d7
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://gprfs013.sbu.lab.eng.bos.redhat.com:8443
openshift v3.6.173.0.5
kubernetes v1.6.1+5115d708d7

rpm -qa | grep gluster
glusterfs-client-xlators-3.8.4-34.el7rhgs.x86_64
glusterfs-3.8.4-34.el7rhgs.x86_64
glusterfs-fuse-3.8.4-34.el7rhgs.x86_64
glusterfs-libs-3.8.4-34.el7rhgs.x86_64

After upgrading, I have not hit the issue again.

I scaled up to 1000 volumes with brick multiplexing enabled and ran concurrent I/O on all 1000 volumes, and no gluster mount failure was seen.

The latest setup has been running for 72 hours without any gluster mount failing.
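For anyone comparing their own nodes against this setup, the client version check could be scripted roughly as follows. This is a sketch under assumptions: both function names are hypothetical, version_ge relies on GNU sort -V, and 3.8.4-34 is used as the baseline only because it is the build reported in this comment, not because it is a confirmed fix version.

```shell
# Extract "version-release-number" (for example 3.8.4-34) from rpm -qa output
# lines such as glusterfs-fuse-3.8.4-34.el7rhgs.x86_64; other packages are ignored.
gluster_fuse_version() {
    sed -n -E 's/^glusterfs-fuse-([0-9.]+-[0-9]+)\..*$/\1/p'
}

# Succeed when version $1 is greater than or equal to version $2,
# using the version ordering of GNU sort -V.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

A possible use on a node: v=$(rpm -qa | gluster_fuse_version); version_ge "$v" 3.8.4-34 && echo "client at or above the build tested here".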

Comment 50 Humble Chirammal 2017-08-16 09:16:39 UTC

Thanks for the update, Shekhar! comment#47 still needs to be addressed, so we have to open another bug on "auto_unmount". However, as the issue reported here is a different one and has not been seen with the latest build, I am moving this bug to ON_QA for now.

Comment 51 krishnaram Karthick 2017-09-15 16:15:00 UTC

Verified this bug in cns-deploy-5.0.0-38.el7rhgs.x86_64. The issue reported in this bug is not seen. Moving the bug to VERIFIED.

[root@dhcp46-207 ~]# oc rsh mongodb-92-1-vrz6l

sh-4.2# 
sh-4.2# 
sh-4.2# 
sh-4.2# df -h
Filesystem                                                                                    Size  Used Avail Use% Mounted on
/dev/mapper/docker-8:17-341-1d5a28c57b66145fe303ab86e321673136b7bc0e6d062f3109c6284258f6db98   10G  598M  9.4G   6% /
tmpfs                                                                                          24G     0   24G   0% /dev
tmpfs                                                                                          24G     0   24G   0% /sys/fs/cgroup
/dev/sdb1                                                                                      40G  1.7G   39G   5% /etc/hosts
shm                                                                                            64M     0   64M   0% /dev/shm
10.70.46.193:vol_69eaf705b69b91d1aa5ba816e14b2c14                                            1016M  236M  780M  24% /var/lib/mongodb/data
tmpfs                                                                                          24G   16K   24G   1% /run/secrets/kubernetes.io/serviceaccount
sh-4.2# uptime
 16:14:01 up 1 day,  7:46,  0 users,  load average: 36.13, 16.40, 9.06