Bug 1329421 - [GSS] - Failover did not occur when a brick went unresponsive.
Summary: [GSS] - Failover did not occur when a brick went unresponsive.
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: core
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Atin Mukherjee
QA Contact: Anoop
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-04-21 22:03 UTC by Oonkwee Lim
Modified: 2019-10-10 11:56 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-05-10 05:15:29 UTC
Embargoed:



Description Oonkwee Lim 2016-04-21 22:03:10 UTC
Description of problem:
The customer stated that when one of his nodes became unresponsive, failover to the good node did not occur, causing the FUSE client to hang.

This specific machine was under high CPU load. When I tried to ssh to gluster server node 1, it took up to 2 minutes to prompt for the password. When I opened the VMware console, it did not show a 'frozen' server; the server was alive but appeared to be 'too busy'. When I pressed Enter, the Red Hat screen scrolled by a line each time but never showed the prompt.

The client's log shows saved frames being unwound, possibly due to timeouts.
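
For triage, the unwinding of saved frames noted in the client log could be tallied per minute to see when the timeouts clustered. The following is only a rough sketch: the marker string, the bracketed timestamp layout, and the function name are assumptions for illustration and are not taken from the logs attached to this case.

```python
from collections import Counter

# Assumption: lines that record frame unwinding contain this marker.
# Adjust it to whatever string the actual client log uses.
UNWIND_MARKER = "saved_frames_unwind"


def count_unwinds_per_minute(log_lines, marker=UNWIND_MARKER):
    """Count lines containing `marker`, grouped by minute.

    Assumes each log line starts with "[YYYY-MM-DD HH:MM:SS...".
    Lines containing the marker that do not match that layout are
    counted under the key "unknown".
    """
    counts = Counter()
    for line in log_lines:
        if marker not in line:
            continue
        if line.startswith("[") and len(line) >= 17:
            key = line[1:17]  # "YYYY-MM-DD HH:MM"
        else:
            key = "unknown"
        counts[key] += 1
    return counts
```

Calling count_unwinds_per_minute(open(path)) on the client log (path is whatever file the client mount writes to) returns a Counter keyed by minute, which can be compared against the time the server became unresponsive.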

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux Server release 7.1 (Maipo)
Red Hat Gluster Storage Server 3.1 Update 1

How reproducible:
Occurred only once

Steps to Reproduce:
None

Actual results:
The FUSE client kept hanging.

Expected results:
The FUSE client should not hang; it should fail over to the other node.

Additional info:

