Bug 970960 - [RFE] Improve handling of failure of a disk, raid array or raid controller
Status: CLOSED DUPLICATE of bug 852578
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterfs
Version: 2.1
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assigned To: Niels de Vos
QA Contact: Sachidananda Urs
Keywords: FutureFeature, Patch, Triaged
Depends On: 971774
Blocks:
 
Reported: 2013-06-05 06:24 EDT by Christian Horn
Modified: 2013-07-11 05:54 EDT (History)
CC List: 4 users

See Also:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-07-11 05:54:48 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:




External Trackers
Tracker: Red Hat Knowledge Base (Solution) 391573
Priority: None, Status: None, Summary: None, Last Updated: Never

Description Christian Horn 2013-06-05 06:24:55 EDT
1. Proposed title of this feature request
   Improve handling of failure of a disk, raid array or raid controller 

3. What is the nature and description of the request?
   Currently, failure of a disk, RAID array or RAID controller leads to
   writes to the XFS filesystem failing.  When this happens:
   - I/O errors appear on the client
   - the brick with the failed disks stays online in 'gluster volume status'
     while in fact it is no longer available
   - the node does NOT fence itself or do anything else to recover, as the
     gluster layer is unaware of the failed XFS filesystem

   This RFE requests that this behavior be improved:
   the brick with failed disks should drop out of the gluster infrastructure.

4. Why does the customer need this? (List the business requirements here)
   This will improve the reliability of the gluster setup:
   gluster should be notified when I/O errors occur on the XFS
   filesystems of the bricks.

5. How would the customer like to achieve this? (List the functional requirements here)
   To be discussed, but it seems sane to have the affected brick marked
   as failed.

6. For each functional requirement listed, specify how Red Hat and the customer can test to confirm the requirement is successfully implemented.
   I/O errors on the XFS brick (e.g. when a hard disk fails) should be
   handled better; in particular, 'gluster volume status' should reflect
   them.  A rough verification sketch follows below.
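
   For illustration only, a small check along the following lines could
   confirm during testing that the brick filesystem is returning I/O
   errors while the brick still shows as online.  This is a sketch, not
   part of this bug; the brick path is an assumed placeholder.

/* Hypothetical test helper: create and sync a file on the brick and
 * report the errno.  On a failed disk this is expected to fail with
 * EIO.  The brick path is a placeholder assumption. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main (void)
{
        const char *path = "/bricks/brick1/.health-test"; /* assumed path */
        int fd = open (path, O_CREAT | O_WRONLY, 0600);

        /* The short-circuit keeps errno from the first failing call. */
        if (fd == -1 || write (fd, "x", 1) == -1 || fsync (fd) == -1) {
                int err = errno; /* save before close() can clobber it */
                printf ("brick I/O failing: %s\n", strerror (err));
                if (fd != -1)
                        close (fd);
                return (err == EIO) ? 0 : 1;
        }

        close (fd);
        unlink (path);
        printf ("brick I/O healthy\n");
        return 0;
}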

7. Is there already an existing RFE upstream or in Red Hat Bugzilla?
   no

8. Does the customer have any specific timeline dependencies and which release would they like to target (i.e. RHEL5, RHEL6)?
   no

9. Is the sales team involved in this request and do they have any additional input?
   no

10. List any affected packages or components.
    gluster*

11. Would the customer be able to assist in testing this functionality if implemented?
    yes
Comment 2 Niels de Vos 2013-06-05 12:26:26 EDT
A basic health check of the underlying filesystem should be sufficient.

After some tests, it seems that stat() returns -EIO in the case of common
disk failures.

I have a simple implementation based on timers for a health-check and will do
some tests and share the results.
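
For illustration, a minimal sketch of what such a timer-based health check
could look like.  The path, the interval and the action taken on failure
are assumptions for the sketch, not the actual glusterfs implementation.

/* Hypothetical sketch of a timer-based brick health check: stat() a
 * path on the brick at a fixed interval and treat EIO as a sign that
 * the underlying filesystem has failed.  Path and interval are
 * illustrative assumptions. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define HEALTH_CHECK_INTERVAL 30        /* seconds between checks (assumed) */
#define BRICK_PATH "/bricks/brick1"     /* brick directory (placeholder) */

int
main (void)
{
        struct stat st;

        for (;;) {
                /* Common disk failures surface here as stat() failing
                 * with EIO, per the observation above. */
                if (stat (BRICK_PATH, &st) == -1 && errno == EIO) {
                        /* A real implementation would mark the brick as
                         * failed here, so that 'gluster volume status'
                         * reflects the failure. */
                        fprintf (stderr, "health-check: %s: %s\n",
                                 BRICK_PATH, strerror (errno));
                        return 1;
                }
                sleep (HEALTH_CHECK_INTERVAL);
        }
}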
Comment 5 Niels de Vos 2013-07-01 11:10:24 EDT
Bug 971774 has a test script attached as attachment 767432.
