Bug 971774

Summary: [RFE] Improve handling of failure of a disk, raid array or raid controller
Product: [Community] GlusterFS
Reporter: Niels de Vos <ndevos>
Component: posix
Assignee: Niels de Vos <ndevos>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Version: mainline
CC: gluster-bugs, primeroznl
Keywords: FutureFeature, Patch, Triaged
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Fixed In Version: glusterfs-3.4.0
Doc Type: Enhancement
Type: Bug
Story Points: ---
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Last Closed: 2013-07-24 17:24:05 UTC
Bug Blocks: 970960
Attachments: Test script (no flags)

Description Niels de Vos 2013-06-07 09:06:04 UTC
1. Proposed title of this feature request
   Improve handling of failure of a disk, raid array or raid controller 

3. What is the nature and description of the request?
   Currently, failure of a disk, raid array or raid controller leads to
   writes to the XFS filesystem failing.  When this happens:
   - I/O errors are seen on the client
   - the brick with the failed disks stays online in 'gluster volume status'
     while in fact it is no longer available
   - the node does NOT fence itself or do anything else to recover, as the
     gluster layer is unaware of the failed XFS filesystem

   This RFE requests that this behavior be improved: the brick with the
   failed disks should drop out of the gluster infrastructure.

4. Why does the customer need this? (List the business requirements here)
   This will improve the reliability of the gluster setup; gluster should
   be notified when I/O errors occur on the XFS filesystems of the bricks.

5. How would the customer like to achieve this? (List the functional requirements here)
   To be discussed, but it seems sane to have the affected brick marked
   as failed.

6. For each functional requirement listed, specify how Red Hat and the customer can test to confirm the requirement is successfully implemented.
   I/O errors on the XFS brick (e.g. when a hard disk fails) should be
   handled better; in particular, 'gluster volume status' should reflect
   the failure, as in the example below.
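   For illustration, a failed brick would be expected to show up as
   offline in the status output, along these lines (volume and host
   names are placeholders):

       # gluster volume status
       Status of volume: myvol
       Gluster process                            Port    Online  Pid
       ------------------------------------------------------------------
       Brick server1:/bricks/myvol/data           N/A     N       N/A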

7. Is there already an existing RFE upstream or in Red Hat Bugzilla?
   no

8. Does the customer have any specific timeline dependencies and which release would they like to target (i.e. RHEL5, RHEL6)?
   no

9. Is the sales team involved in this request and do they have any additional input?
   no

10. List any affected packages or components.
    gluster*

11. Would the customer be able to assist in testing this functionality if implemented?
    yes

--- Additional comment from Niels de Vos on 2013-06-05 18:26:26 CEST ---

A basic health check of the underlying filesystem should be sufficient.

After some tests, it seems that a stat() returns -EIO in case of common disk
failures.

I have a simple implementation based on timers for a health-check and will do
some tests and share the results.
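
As a rough sketch of the idea (not the actual implementation, and with an
example brick path), a timer-based check boils down to something like:

    # minimal sketch: stat the brick directory every 30 seconds and bail
    # out as soon as the filesystem starts returning errors (e.g. EIO)
    while sleep 30; do
        stat /bricks/failing_xfs/data > /dev/null 2>&1 || {
            echo "health-check on /bricks/failing_xfs/data failed" >&2
            break
        }
    done

The real check would run inside the brick process and take the brick
offline instead of just breaking out of a loop.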

Comment 1 Anand Avati 2013-06-07 09:48:48 UTC
REVIEW: http://review.gluster.org/5176 (posix: add a simple health-checker based on timers) posted (#1) for review on master by Niels de Vos (ndevos)

Comment 2 Anand Avati 2013-06-09 10:42:33 UTC
REVIEW: http://review.gluster.org/5176 (posix: add a simple health-checker based on timers) posted (#2) for review on master by Niels de Vos (ndevos)

Comment 3 Anand Avati 2013-06-24 12:14:09 UTC
REVIEW: http://review.gluster.org/5176 (posix: add a simple health-checker) posted (#3) for review on master by Niels de Vos (ndevos)

Comment 4 Anand Avati 2013-07-01 14:47:22 UTC
REVIEW: http://review.gluster.org/5176 (posix: add a simple health-checker) posted (#4) for review on master by Niels de Vos (ndevos)

Comment 5 Niels de Vos 2013-07-01 15:07:41 UTC
Created attachment 767432 [details]
Test script

Comment 6 Anand Avati 2013-07-02 08:52:33 UTC
REVIEW: http://review.gluster.org/5176 (posix: add a simple health-checker) posted (#5) for review on master by Niels de Vos (ndevos)

Comment 7 Anand Avati 2013-07-03 19:25:33 UTC
REVIEW: http://review.gluster.org/5176 (posix: add a simple health-checker) posted (#6) for review on master by Niels de Vos (ndevos)

Comment 8 Anand Avati 2013-07-04 05:35:30 UTC
COMMIT: http://review.gluster.org/5176 committed in master by Vijay Bellur (vbellur) 
------
commit 98f62a731ca13296b937bfff14d0a2f8dfc49a54
Author: Niels de Vos <ndevos>
Date:   Mon Jun 24 14:05:58 2013 +0200

    posix: add a simple health-checker
    
    The goal of this health-checker is to detect fatal issues with the
    underlying storage that is used for exporting a brick. The current
    implementation requires the filesystem to detect the storage error,
    after which the health-checker notifies the parent xlators and exits
    the glusterfsd (brick) process to prevent further problems.
    
    The interval at which the health-check runs can be configured per
    volume with the storage.health-check-interval option. The default
    interval is 30 seconds.
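    
    For example, the interval can be tuned through the normal volume set
    interface (the value below is only an example):
    
        # gluster volume set <VOLNAME> storage.health-check-interval 10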
    
    It is not trivial to write an automated test-case with the current
    prove-framework. These are the manual steps that can be done to verify
    the functionality:
    
    - set up a Logical Volume (/dev/bz970960/xfs) and format it as XFS for
      brick usage
    
    - create a volume with that one brick
    
        # gluster volume create failing_xfs glufs1:/bricks/failing_xfs/data
        # gluster volume start failing_xfs
    
    - mount the volume and verify the functionality
    
    - make the storage fail (use device-mapper, or pull disks)
    
        # dmsetup table
        ..
        bz970960-xfs: 0 196608 linear 7:0 2048
    
        # echo 0  196608 error > dmsetup-error-target
        # dmsetup load bz970960-xfs dmsetup-error-target
        # dmsetup resume bz970960-xfs
    
        # dmsetup table
        ...
        bz970960-xfs: 0 196608 error
    
    - notice the errors caught by syslog:
    
        Jun 24 11:31:49 vm130-32 kernel: XFS (dm-2): metadata I/O error: block 0x0 ("xfs_buf_iodone_callbacks") error 5 buf count 512
        Jun 24 11:31:49 vm130-32 kernel: XFS (dm-2): I/O Error Detected. Shutting down filesystem
        Jun 24 11:31:49 vm130-32 kernel: XFS (dm-2): Please umount the filesystem and rectify the problem(s)
        Jun 24 11:31:49 vm130-32 kernel: VFS:Filesystem freeze failed
        Jun 24 11:31:50 vm130-32 GlusterFS[1969]: [2013-06-24 10:31:50.500674] M [posix-helpers.c:1114:posix_health_check_thread_proc] 0-failing_xfs-posix: health-check failed, going down
        Jun 24 11:32:09 vm130-32 kernel: XFS (dm-2): xfs_log_force: error 5 returned.
        Jun 24 11:32:20 vm130-32 GlusterFS[1969]: [2013-06-24 10:32:20.508690] M [posix-helpers.c:1119:posix_health_check_thread_proc] 0-failing_xfs-posix: still alive! -> SIGTERM
    
    - these errors are in the log of the brick as well:
    
        [2013-06-24 10:31:50.500607] W [posix-helpers.c:1102:posix_health_check_thread_proc] 0-failing_xfs-posix: stat() on /bricks/failing_xfs/data returned: Input/output error
        [2013-06-24 10:31:50.500674] M [posix-helpers.c:1114:posix_health_check_thread_proc] 0-failing_xfs-posix: health-check failed, going down
        [2013-06-24 10:32:20.508690] M [posix-helpers.c:1119:posix_health_check_thread_proc] 0-failing_xfs-posix: still alive! -> SIGTERM
    
    - the glusterfsd process has exited correctly:
    
        # gluster volume status
        Status of volume: failing_xfs
        Gluster process						Port	Online	Pid
        ------------------------------------------------------------------------------
        Brick glufs1:/bricks/failing_xfs/data			N/A	N	N/A
        NFS Server on localhost					2049	Y	1897
    
    Change-Id: Ic247fbefb97f7e861307a5998a9a7a3ecc80aa07
    BUG: 971774
    Signed-off-by: Niels de Vos <ndevos>
    Reviewed-on: http://review.gluster.org/5176
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>