Bug 1403587 - [Perf] : pcs cluster resources went into stopped state during Multithreaded perf tests on RHGS layered over RHEL 6
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: common-ha
Version: rhgs-3.2
Hardware: x86_64
OS: Linux
Target Milestone: ---
Target Release: RHGS 3.2.0
Assignee: Kaleb KEITHLEY
QA Contact: Ambarish
Depends On:
Blocks: 1351528 1404410 1405002 1405004
Reported: 2016-12-11 17:10 UTC by Ambarish
Modified: 2017-03-28 06:54 UTC (History)

Fixed In Version: glusterfs-3.8.4-9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1404410
Last Closed: 2017-03-23 05:10:43 UTC
Target Upstream Version:

System | ID | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1403919 | None | None | None | Never
Red Hat Product Errata | RHSA-2017:0484 | normal | SHIPPED_LIVE | Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update | 2017-03-23 09:06:37 UTC

Internal Links: 1403919

Description Ambarish 2016-12-11 17:10:56 UTC
Description of problem:

4-node Ganesha cluster.
4 clients, 1:1 mount.
Mount vers=3.

Ran iozone seq writes on a fresh setup.

The test hung indefinitely.

iostat showed absolutely no signs of running I/O on the servers.
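
The same check can be made without iostat by comparing two snapshots of /proc/diskstats taken a few seconds apart on each server. The following is a minimal Python sketch, not the procedure used in this report. The function names parse_diskstats and idle_devices are illustrative, and the sketch assumes the standard Linux layout in which the sixth and tenth whitespace-separated fields are sectors read and sectors written.

```python
def parse_diskstats(text):
    """Map device name -> (sectors_read, sectors_written) from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            # Skip blank or truncated lines.
            continue
        # fields[2] is the device name, fields[5] sectors read,
        # fields[9] sectors written.
        stats[fields[2]] = (int(fields[5]), int(fields[9]))
    return stats


def idle_devices(before, after):
    """Return sorted names of devices whose sector counters did not change."""
    first = parse_diskstats(before)
    second = parse_diskstats(after)
    return sorted(
        name for name in first
        if name in second and first[name] == second[name]
    )
```

Reading /proc/diskstats twice, a few seconds apart, and passing both strings to idle_devices lists the devices that saw no reads or writes in between. During a hang like this one, the brick devices would be expected to appear in that list.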

Version-Release number of selected component (if applicable):


How reproducible:

2/2 on freshly installed setups.

Steps to Reproduce:

1. Create a 4-node Ganesha cluster (RHEL 6.8). Mount a 2x2 volume on 4 RHEL 7.3 clients via NFSv3.

2. Run iozone sequential writes.

Actual results:

iozone threads hang after a few minutes.

Expected results:

No hangs.

Additional info:

Server OS : RHEL 6.8
Client OS : RHEL 7.3

*Vol Config* :

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: c43082bb-e807-46b8-8e07-c8eae54eec21
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas011.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas005.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas006.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: on
nfs.disable: on
performance.readdir-ahead: on
performance.stat-prefetch: off
server.allow-insecure: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
[root@gqas005 /]#

Comment 3 Ambarish 2016-12-11 17:17:48 UTC
A slight correction to the description:

How reproducible: 2/3

Comment 4 Ambarish 2016-12-11 17:21:05 UTC
Ganesha was alive and running at all times.

pcs status output captured during the hang:

[root@gqas005 /]# pcs status
Cluster name: G1474623742.03
Last updated: Sun Dec 11 05:15:36 2016		Last change: Sun Dec 11 04:22:38 2016 by root via crm_attribute on gqas011.sbu.lab.eng.bos.redhat.com
Stack: cman
Current DC: gqas005.sbu.lab.eng.bos.redhat.com (version 1.1.14-8.el6-70404b0) - partition WITHOUT quorum
4 nodes and 24 resources configured

Online: [ gqas005.sbu.lab.eng.bos.redhat.com ]
OFFLINE: [ gqas006.sbu.lab.eng.bos.redhat.com gqas011.sbu.lab.eng.bos.redhat.com gqas013.sbu.lab.eng.bos.redhat.com ]

Full list of resources:

 Clone Set: nfs_setup-clone [nfs_setup]
     Stopped: [ gqas005.sbu.lab.eng.bos.redhat.com gqas006.sbu.lab.eng.bos.redhat.com gqas011.sbu.lab.eng.bos.redhat.com gqas013.sbu.lab.eng.bos.redhat.com ]
 Clone Set: nfs-mon-clone [nfs-mon]
     Stopped: [ gqas005.sbu.lab.eng.bos.redhat.com gqas006.sbu.lab.eng.bos.redhat.com gqas011.sbu.lab.eng.bos.redhat.com gqas013.sbu.lab.eng.bos.redhat.com ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Stopped: [ gqas005.sbu.lab.eng.bos.redhat.com gqas006.sbu.lab.eng.bos.redhat.com gqas011.sbu.lab.eng.bos.redhat.com gqas013.sbu.lab.eng.bos.redhat.com ]
 Resource Group: gqas013.sbu.lab.eng.bos.redhat.com-group
     gqas013.sbu.lab.eng.bos.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Stopped
     gqas013.sbu.lab.eng.bos.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Stopped
     gqas013.sbu.lab.eng.bos.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Stopped
 Resource Group: gqas005.sbu.lab.eng.bos.redhat.com-group
     gqas005.sbu.lab.eng.bos.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Stopped
     gqas005.sbu.lab.eng.bos.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Stopped
     gqas005.sbu.lab.eng.bos.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Stopped
 Resource Group: gqas006.sbu.lab.eng.bos.redhat.com-group
     gqas006.sbu.lab.eng.bos.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Stopped
     gqas006.sbu.lab.eng.bos.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Stopped
     gqas006.sbu.lab.eng.bos.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Stopped
 Resource Group: gqas011.sbu.lab.eng.bos.redhat.com-group
     gqas011.sbu.lab.eng.bos.redhat.com-nfs_block	(ocf::heartbeat:portblock):	Stopped
     gqas011.sbu.lab.eng.bos.redhat.com-cluster_ip-1	(ocf::heartbeat:IPaddr):	Stopped
     gqas011.sbu.lab.eng.bos.redhat.com-nfs_unblock	(ocf::heartbeat:portblock):	Stopped

Failed Actions:
* nfs-mon_monitor_10000 on gqas005.sbu.lab.eng.bos.redhat.com 'unknown error' (1): call=12, status=Timed Out, exitreason='none',
    last-rc-change='Sun Dec 11 04:24:18 2016', queued=0ms, exec=0ms

PCSD Status:
  gqas013.sbu.lab.eng.bos.redhat.com: Online
  gqas005.sbu.lab.eng.bos.redhat.com: Online
  gqas006.sbu.lab.eng.bos.redhat.com: Online
  gqas011.sbu.lab.eng.bos.redhat.com: Online

[root@gqas005 /]#
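
For triage, the pcs status output above reduces to three facts: the partition has lost quorum, three of the four nodes are offline, and the nfs-mon monitor operation timed out. The following is a minimal Python sketch of extracting those facts from the text. It assumes the Pacemaker 1.1 text layout shown in this comment. The function name summarize_pcs_status and the dictionary keys are illustrative and are not part of pcs.

```python
import re

# "Online: [ node ... ]" and "OFFLINE: [ node ... ]" lines.
NODE_LIST = re.compile(r"^(Online|OFFLINE): \[ (.*?) \]$")
# "* <resource>_<op>_<interval> on <node> ... status=<status>," lines.
FAILED = re.compile(r"^\* (\S+) on (\S+) .*status=([^,]+),")


def summarize_pcs_status(text):
    """Extract quorum state, node lists and failed actions from pcs status text."""
    summary = {"quorum": None, "online": [], "offline": [], "failed": []}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Current DC:"):
            # The line ends in "partition with quorum" or
            # "partition WITHOUT quorum"; the comparison is case-sensitive.
            summary["quorum"] = line.endswith("partition with quorum")
            continue
        match = NODE_LIST.match(line)
        if match:
            key = "online" if match.group(1) == "Online" else "offline"
            summary[key] = match.group(2).split()
            continue
        match = FAILED.match(line)
        if match:
            # Tuple of (operation, node, status).
            summary["failed"].append(match.groups())
    return summary
```

Applied to the output in this comment, the summary would report no quorum, one online node, three offline nodes, and a single failed action with status "Timed Out".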

Comment 18 Atin Mukherjee 2016-12-14 03:42:45 UTC
Upstream mainline patch http://review.gluster.org/16122 has been posted for review.

Comment 19 Atin Mukherjee 2016-12-15 12:32:11 UTC
upstream mainline : http://review.gluster.org/16122
release-3.9 : http://review.gluster.org/16139
release-3.8 : http://review.gluster.org/16140

downstream patch : https://code.engineering.redhat.com/gerrit/#/c/93080

Comment 21 Ambarish 2017-01-16 09:18:35 UTC
Verified on glusterfs-3.8.4-11 and Ganesha 2.4.1-4.

Could not reproduce the reported issue.

Comment 23 errata-xmlrpc 2017-03-23 05:10:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

