Bug 761937 (GLUSTER-205)
| Summary: | [ glusterfs 2.0.6rc4 ] - Hard disk failure not handled correctly | | |
| --- | --- | --- | --- |
| Product: | [Community] GlusterFS | Reporter: | Gururaj K <guru> |
| Component: | replicate | Assignee: | Vikas Gorur <vikas> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | low | Docs Contact: | |
| Priority: | low | | |
| Version: | 2.0.5 | CC: | amarts, gluster-bugs, vijay |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 762118 | | |
Description
Amar Tumballi 2009-08-11 19:21:06 UTC
* 4-server distribute-replicate setup
* Write-behind on the client side, io-threads on the server side
* Error observed when one of the servers - brick8 (second subvolume of replicate) - encountered a hardware error and /jbod (on which the backend resided) got remounted read-only

    [root@brick8 ~]# dmesg | tail
    ..
    sd 1:2:0:0: SCSI error: return code = 0x00040000
    end_request: I/O error, dev sdb, sector 4630780962
    EXT3-fs error (device sdb1): ext3_get_inode_loc: unable to read inode block - inode=289424022, block=578847616

    [root@brick8 ~]# touch /jbod/abcd
    touch: cannot touch `/jbod/abcd': Read-only file system

* Test tools failed with various messages:

-------------------------------------------
iozone:
..
Error reading block at 4346937344
read: File descriptor in bad state
-------------------------------------------
exnihilate.sh:
..
split: _91.10042: Input/output error
-------------------------------------------
rsync:
..
rsync: mkstemp "/mnt/pavan/rc4/client03/rsync/usr/include/python2.4/.sysmodule.h.RO0w3Q" failed: Input/output error (5)
rsync: mkstemp "/mnt/pavan/rc4/client03/rsync/usr/include/sys/.sysmacros.h.gH5HMG" failed: Input/output error (5)
-------------------------------------------
dbench: (not the exact output): "No such file or directory"

Something else I observed in this situation:

A file was present only on the subvolume that was read-only. When we did a 'stat' on that file, it didn't get created on the other subvolume by self-heal.

Stepping through afr_lookup_cbk in gdb, it was found that the open_fd_count returned was 1, even though I'm 99% sure that the file hadn't been opened by any other process. Checking the backend glusterfsd's fds in /proc also did not show the file as open. Due to the non-zero open_fd_count, self-heal wasn't happening.

(In reply to comment #2)
> Something else I observed in this situation:
>
> A file was present only on the subvolume that was read-only. When we did a
> 'stat' on that file, it didn't get created on the other subvolume by self-heal.
>
> Stepping through afr_lookup_cbk in gdb, it was found that the open_fd_count
> returned was 1, even though I'm 99% sure that the file hadn't been opened by
> any other process. Checking the backend glusterfsd's fds in /proc also did not
> show the file as open. Due to the non-zero open_fd_count, self-heal wasn't
> happening.

I have reported a separate bug to track the above issue (and another one that is related):
http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=214

PATCH: http://patches.gluster.com/patch/2426 in master (cluster/afr: Refactored lookup_cbk and introduce precedence of errors.)
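For readers unfamiliar with the open_fd_count behaviour discussed above, the following is a minimal, hypothetical C sketch of the gating logic as described in the comments: during lookup, replicate (AFR) aggregates an open-fd count from the replies of its subvolumes and declines to launch self-heal on a missing file while that count is non-zero. This is not the actual GlusterFS source; the struct and function names are illustrative only.

    /*
     * Simplified, illustrative sketch (not the actual cluster/afr code) of the
     * self-heal gate described in this bug: a non-zero aggregate open-fd count
     * seen during lookup suppresses the heal of a file that is missing on one
     * replica.
     */
    #include <stdbool.h>

    struct lookup_reply {
        int      op_ret;         /* 0 on success, -1 on failure               */
        int      op_errno;       /* errno when op_ret == -1                   */
        unsigned open_fd_count;  /* fds this subvolume believes are open      */
    };

    /* Decide whether a missing-entry self-heal should be launched. */
    static bool
    should_trigger_entry_self_heal (const struct lookup_reply *replies, int count)
    {
        bool     missing_somewhere = false;
        unsigned open_fds          = 0;
        int      i;

        for (i = 0; i < count; i++) {
            if (replies[i].op_ret == -1)
                missing_somewhere = true;      /* file absent on this replica */
            else
                open_fds += replies[i].open_fd_count;
        }

        /* A non-zero aggregate fd count suppresses self-heal. This is the
         * behaviour observed in the bug, where the count was spuriously 1
         * even though no process actually held the file open. */
        return missing_somewhere && (open_fds == 0);
    }

Under this kind of gate, a spuriously non-zero count keeps self-heal suppressed even for a plain 'stat', which matches the observation above; the referenced patch (2426) reworks afr_lookup_cbk and introduces a precedence among the errors returned by the subvolumes.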