1403587 – [Perf] : pcs cluster resources went into stopped state during Multithreaded perf tests on RHGS layered over RHEL 6

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	common-ha
Sub Component:
Version:	rhgs-3.2
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Target Release:	RHGS 3.2.0
Assignee:	Kaleb KEITHLEY
QA Contact:	Ambarish
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1351528 1404410 1405002 1405004

Reported:	2016-12-11 17:10 UTC by Ambarish
Modified:	2017-03-28 06:54 UTC
CC List:	13 users
Fixed In Version:	glusterfs-3.8.4-9
Doc Text:
Clone Of:
Clones:	1404410
Environment:
Last Closed:	2017-03-23 05:10:43 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1403919	0	unspecified	CLOSED	[Ganesha] : pcs status is not the same across the ganesha cluster in RHEL 6 environment	2021-02-22 00:41:40 UTC
Red Hat Product Errata	RHSA-2017:0484	0	normal	SHIPPED_LIVE	Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update	2017-03-23 09:06:37 UTC

Internal Links: 1403919

Description Ambarish 2016-12-11 17:10:56 UTC

Description of problem:
-----------------------

4-node Ganesha cluster.
4 clients, 1:1 mount.
Mount vers=3.

Ran iozone sequential writes on a fresh setup.

The test hung and never completed.

iostat showed no I/O at all on the servers.


Version-Release number of selected component (if applicable):
--------------------------------------------------------------

glusterfs-ganesha-3.8.4-7.el6rhs.x86_64
nfs-ganesha-2.4.1-2.el6rhs.x86_64


How reproducible:
----------------

2/2 on freshly installed setups.

Steps to Reproduce:
-------------------

1. Create a 4-node Ganesha cluster (RHEL 6.8). Mount a 2x2 volume on 4 RHEL 7.3 clients via NFSv3.

2. Run iozone sequential writes.


Actual results:
---------------

iozone threads hang after a few minutes.

Expected results:
-----------------

No hangs.

Additional info:
----------------

Server OS : RHEL 6.8
Client OS : RHEL 7.3

*Vol Config* :

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: c43082bb-e807-46b8-8e07-c8eae54eec21
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas011.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas005.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas006.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: on
nfs.disable: on
performance.readdir-ahead: on
performance.stat-prefetch: off
server.allow-insecure: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
[root@gqas005 /]#

Comment 3 Ambarish 2016-12-11 17:17:48 UTC

A slight correction to the description.

How reproducible ---> 2/3 (not 2/2 as originally stated)

Comment 4 Ambarish 2016-12-11 17:21:05 UTC

The Ganesha process was alive and running at all times.

pcs status output taken during the hang:

[root@gqas005 /]# pcs status
Cluster name: G1474623742.03
Last updated: Sun Dec 11 05:15:36 2016		Last change: Sun Dec 11 04:22:38 2016 by root via crm_attribute on gqas011.sbu.lab.eng.bos.redhat.com
Stack: cman
Current DC: gqas005.sbu.lab.eng.bos.redhat.com (version 1.1.14-8.el6-70404b0) - partition WITHOUT quorum
4 nodes and 24 resources configured

Online: [ gqas005.sbu.lab.eng.bos.redhat.com ]
OFFLINE: [ gqas006.sbu.lab.eng.bos.redhat.com gqas011.sbu.lab.eng.bos.redhat.com gqas013.sbu.lab.eng.bos.redhat.com ]

Full list of resources:

 Clone Set: nfs_setup-clone [nfs_setup]
     Stopped: [ gqas005.sbu.lab.eng.bos.redhat.com gqas006.sbu.lab.eng.bos.redhat.com gqas011.sbu.lab.eng.bos.redhat.com gqas013.sbu.lab.eng.bos.redhat.com ]
 Clone Set: nfs-mon-clone [nfs-mon]
     Stopped: [ gqas005.sbu.lab.eng.bos.redhat.com gqas006.sbu.lab.eng.bos.redhat.com gqas011.sbu.lab.eng.bos.redhat.com gqas013.sbu.lab.eng.bos.redhat.com ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Stopped: [ gqas005.sbu.lab.eng.bos.redhat.com gqas006.sbu.lab.eng.bos.redhat.com gqas011.sbu.lab.eng.bos.redhat.com gqas013.sbu.lab.eng.bos.redhat.com ]
 Resource Group: gqas013.sbu.lab.eng.bos.redhat.com-group
     gqas013.sbu.lab.eng.bos.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Stopped
     gqas013.sbu.lab.eng.bos.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Stopped
     gqas013.sbu.lab.eng.bos.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Stopped
 Resource Group: gqas005.sbu.lab.eng.bos.redhat.com-group
     gqas005.sbu.lab.eng.bos.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Stopped
     gqas005.sbu.lab.eng.bos.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Stopped
     gqas005.sbu.lab.eng.bos.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Stopped
 Resource Group: gqas006.sbu.lab.eng.bos.redhat.com-group
     gqas006.sbu.lab.eng.bos.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Stopped
     gqas006.sbu.lab.eng.bos.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Stopped
     gqas006.sbu.lab.eng.bos.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Stopped
 Resource Group: gqas011.sbu.lab.eng.bos.redhat.com-group
     gqas011.sbu.lab.eng.bos.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Stopped
     gqas011.sbu.lab.eng.bos.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Stopped
     gqas011.sbu.lab.eng.bos.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Stopped

Failed Actions:
* nfs-mon_monitor_10000 on gqas005.sbu.lab.eng.bos.redhat.com 'unknown error' (1): call=12, status=Timed Out, exitreason='none',
    last-rc-change='Sun Dec 11 04:24:18 2016', queued=0ms, exec=0ms


PCSD Status:
  gqas013.sbu.lab.eng.bos.redhat.com: Online
  gqas005.sbu.lab.eng.bos.redhat.com: Online
  gqas006.sbu.lab.eng.bos.redhat.com: Online
  gqas011.sbu.lab.eng.bos.redhat.com: Online

[root@gqas005 /]#
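
The key symptoms in the output above (lost quorum, three nodes offline, every resource stopped, a timed-out nfs-mon monitor) can be pulled out of a saved `pcs status` dump with a short script. The following is only a sketch for triage, not part of the fix: the function name `summarize_pcs_status`, the shape of the returned dictionary, and the shortened host names in `SAMPLE` are assumptions made for illustration, and the regular expressions are written against the output format shown in this comment only. The "partition with quorum" string for the healthy case is likewise assumed from the usual Pacemaker wording.

```python
import re

# Abbreviated excerpt of the `pcs status` output quoted in this comment.
# Host names are shortened here purely for readability (illustrative only).
SAMPLE = """\
Current DC: gqas005 (version 1.1.14-8.el6-70404b0) - partition WITHOUT quorum
Online: [ gqas005 ]
OFFLINE: [ gqas006 gqas011 gqas013 ]
 Clone Set: nfs-mon-clone [nfs-mon]
     Stopped: [ gqas005 gqas006 gqas011 gqas013 ]
 Resource Group: gqas005-group
     gqas005-nfs_block (ocf::heartbeat:portblock): Stopped
     gqas005-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
Failed Actions:
* nfs-mon_monitor_10000 on gqas005 'unknown error' (1): call=12, status=Timed Out, exitreason='none',
"""


def summarize_pcs_status(text):
    """Hypothetical helper: extract quorum, membership, stopped resources
    and failed actions from `pcs status` text in the format shown above."""
    summary = {
        "quorum": None,            # True / False, or None if not reported
        "online": [],
        "offline": [],
        "stopped_clones": {},      # clone set name -> nodes where it is stopped
        "stopped_primitives": [],  # individual resources reported as Stopped
        "failed_actions": [],      # (action, node, status) tuples
    }
    current_clone = None
    for raw in text.splitlines():
        line = raw.strip()
        if "partition WITHOUT quorum" in line:
            summary["quorum"] = False
        elif "partition with quorum" in line:
            summary["quorum"] = True

        m = re.match(r"(Online|OFFLINE): \[ (.*) \]$", line)
        if m:
            summary[m.group(1).lower()] = m.group(2).split()
            continue
        m = re.match(r"Clone Set: (\S+) \[", line)
        if m:
            current_clone = m.group(1)
            continue
        if line.startswith("Resource Group:"):
            current_clone = None
            continue
        m = re.match(r"Stopped: \[ (.*) \]$", line)
        if m and current_clone:
            summary["stopped_clones"][current_clone] = m.group(1).split()
            continue
        m = re.match(r"(\S+)\s+\(\S+\):\s+Stopped$", line)
        if m:
            summary["stopped_primitives"].append(m.group(1))
            continue
        m = re.match(r"\* (\S+) on (\S+) .*status=([^,]+),", line)
        if m:
            summary["failed_actions"].append(m.groups())
    return summary


if __name__ == "__main__":
    for key, value in summarize_pcs_status(SAMPLE).items():
        print(key, "=", value)
```

Run against the full output above, a summary like this makes it easy to see that the cluster lost quorum shortly after the nfs-mon monitor timed out, while PCSD itself still reported all four nodes online.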

Comment 18 Atin Mukherjee 2016-12-14 03:42:45 UTC

Upstream mainline patch http://review.gluster.org/16122 has been posted for review.

Comment 19 Atin Mukherjee 2016-12-15 12:32:11 UTC

upstream mainline : http://review.gluster.org/16122
release-3.9 : http://review.gluster.org/16139
release-3.8 : http://review.gluster.org/16140

downstream patch : https://code.engineering.redhat.com/gerrit/#/c/93080

Comment 21 Ambarish 2017-01-16 09:18:35 UTC

Verified on glusterfs-3.8.4-11 and Ganesha 2.4.1-4.

Could not reproduce the reported issue.

Comment 23 errata-xmlrpc 2017-03-23 05:10:43 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0484.html
