| Summary: | Auto-heal fails on files that are open()-ed/mmap()-ed | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Gordan Bobic <gordan> |
| Component: | replicate | Assignee: | Vikas Gorur <vikas> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | low | Docs Contact: | |
| Priority: | low | | |
| Version: | 2.0.4 | CC: | aavati, corentin.chary, gluster-bugs, gordan, pavan, vijay |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Anand Avati
2009-07-27 21:10:11 UTC
The system configuration is the same setup as Bug 126 (see there for volume spec files). Currently 2 nodes are online, and the 3rd is a clean, empty new node joining the cluster. This is probably also reproducible with just 1 node online and the 2nd node being the clean, empty node joining the cluster.

When a new, empty node comes online, it cannot auto-heal files on the gluster file system that are currently open and/or mmap-ed on the other nodes. File entries get created, but `ls -la` shows that on the new node they are 0 bytes. The complete file system resync was initiated using:

```
# ls -laR /
```

but this only seems to download the files that are not open or mmap-ed. Looking at the files that suffer from this, it is striking that they are all listed as open by lsof on the other two nodes.

The bootstrap onto the gluster root works as follows: the initial root mounts the gluster root, and then fires up a modified init chrooted into the directory where it mounted the gluster root. Once the gluster root directory is mounted, it tries to fire up /usr/comoonics/sbin/init. Here are the files init depends on:

```
init 1 root cwd  DIR  0,19    4096         1 /
init 1 root rtd  DIR  0,19    4096         1 /
init 1 root txt  REG  0,19   47057 228797259 /usr/comoonics/sbin/init
init 1 root mem  REG  0,19  139416 227928927 /lib64/ld-2.5.so
init 1 root mem  REG  0,19 1713160 227928957 /lib64/libc-2.5.so
init 1 root mem  REG  0,19   23360 227928999 /lib64/libdl-2.5.so
init 1 root mem  REG  0,19  247528 227929101 /lib64/libsepol.so.1
init 1 root mem  REG  0,19   95464 227929095 /lib64/libselinux.so.1
init 1 root 10u  FIFO 0,17            14455 /dev/initctl
```

And it is confirmed these are what init is linked against:

```
# ldd /usr/comoonics/sbin/init
        libsepol.so.1 => /lib64/libsepol.so.1 (0x0000003b79a00000)
        libselinux.so.1 => /lib64/libselinux.so.1 (0x0000003b79e00000)
        libc.so.6 => /lib64/libc.so.6 (0x000000334ac00000)
        libdl.so.2 => /lib64/libdl.so.2 (0x000000334b000000)
        /lib64/ld-linux-x86-64.so.2 (0x000000334a800000)
```

All of these happen to be among the files that have 0 size on the new node (init, all of those libraries under /lib64, plus about 30 other libraries, all of which similarly turn up in lsof on the nodes that are already running). Other files appear to have synced OK.

The problem with this is that the files didn't self-heal properly. Thus, init fails (as it is not a valid executable). There is a further chain of dysfunction thereafter in the process of trying to manually get the files onto the new node (touching a file on the existing nodes gives an error like "text file is busy"), but if the 0-byte replication problem is fixed, I suspect the rest will fall into place.

In case you are wondering how the 2nd node came to be online, I rsync-ed the underlying file system across to the new node, so this issue didn't arise as the files required to boot were already in place.

Since this clearly affects the replication of files that are open and/or mmap-ed (such as shared libraries), there is a possibility that this may be related to Bug 126 (shared library corruption). And I just checked: /usr/lib64/libglusterfs.so.0.0.0 got corrupted again on this cluster in the past 24 hours. No other libraries got corrupted, which seems interesting.

Yes, now that you mention it, it looks like this is the limitation that I'm bumping into. I guess that means that for now I have to come up with a workaround, such as dumping and restoring all open files on all the running nodes when adding a new node, between when the new node mounts the gluster root and tries to chroot into it.
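As an editorial aside: `ls -laR /` only walks directory entries; a commonly used, more thorough trigger in this era was to stat() every file so AFR gets a lookup (and a self-heal attempt) on each one. Below is a minimal sketch of that approach, assuming the gluster root is mounted at /mnt/newroot (a hypothetical path, not from the report). Note that, per this bug, files held open or mmap-ed on the running nodes would still fail to heal even with this walk.

```bash
#!/bin/sh
# Hedged sketch: force a lookup on every file under the mount so that AFR can
# attempt to self-heal each one. The mount point is illustrative.
MNT=/mnt/newroot

# stat() every entry; the output is discarded, only the lookups matter.
find "$MNT" -noleaf -print0 | xargs -0 stat >/dev/null 2>&1

# Files that still show as 0 bytes on the new node are the candidates that
# self-heal could not repair (e.g. mmap-ed shared libraries, init).
find "$MNT" -type f -size 0 -print
```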
Changing target-milestone to 2.1 as 2.1-mustfix is deprecated.

Something has been bothering me about this, and I just figured out what it is. The read shouldn't fail even if self-heal does, provided there is at least one copy available in the cluster. It would seem that with read-subvolume set, the local copy is used even when it cannot be healed, and thus the read fails. read-subvolume should specify a preference, not the exclusion of other nodes. Now, granted, if self-heal worked on open files this wouldn't be a problem, but as things are, I would argue this is still a bug on the 2.0.x branch, since the read shouldn't fail if there is at least one copy available in the cluster, regardless of whether the read-subvolume option is set.

PATCH: http://patches.gluster.com/patch/2218 in master (protocol/client: whitespace cleanup)
PATCH: http://patches.gluster.com/patch/2219 in master (protoocl/client: file directory reopen support)
PATCH: http://patches.gluster.com/patch/2221 in master (protocol/client: preserve open/create flags in fdctx for reopening)
PATCH: http://patches.gluster.com/patch/2271 in master (Check for other return values as well from call to inode_path.)
PATCH: http://patches.gluster.com/patch/2346 in master (cluster/afr: Set read-child = source regardless of foreground/background self-heal)
PATCH: http://patches.gluster.com/patch/2347 in master (cluster/afr: Hold blocking locks for data self-heal.)
PATCH: http://patches.gluster.com/patch/2349 in master (cluster/afr: Refactored the data self-heal algorithm.)
PATCH: http://patches.gluster.com/patch/2348 in master (cluster/afr: Provide a post-post_op hook in the transaction.)
PATCH: http://patches.gluster.com/patch/2350 in master (cluster/afr: Do self-heal on reopened fds.)
PATCH: http://patches.gluster.com/patch/2351 in master (cluster/afr: Refactored the self-heal interface.)
PATCH: http://patches.gluster.com/patch/2362 in master (cluster/afr: Do self-heal on unopened fds.)
PATCH: http://patches.gluster.com/patch/2413 in master (afr: handle fdctx->pre_op_done handling)
PATCH: http://patches.gluster.com/patch/2416 in master (afr: fix crash in afr_sh_data_close)
PATCH: http://patches.gluster.com/patch/2473 in master (afr: fix fd reference leak)
PATCH: http://patches.gluster.com/patch/2475 in master (afr: remove memcpy of @local contents in afr_local_copy)
PATCH: http://patches.gluster.com/patch/2494 in master (cluster/afr: Fix conditional typo.)
PATCH: http://patches.gluster.com/patch/2551 in master (afr: fix memory leaks)
PATCH: http://patches.gluster.com/patch/2580 in master (afr: fix fd ref leak in self-heal)

*** Bug 167 has been marked as a duplicate of this bug. ***
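For context on the read-subvolume point above, here is a minimal sketch of a cluster/replicate (AFR) volume spec fragment of the kind referenced from Bug 126. The volume and subvolume names (afr0, client-local, client-remote1, client-remote2) are illustrative assumptions, not the reporter's actual configuration; the argument above is that pinning read-subvolume like this should only express a read preference, so an unhealable local copy should not cause the read to fail while the other replicas are still available.

```bash
# Hedged sketch, not the reporter's volfile: print a replicate fragment with
# read-subvolume pointing at the local client subvolume. Names are illustrative.
cat <<'EOF'
volume afr0
  type cluster/replicate
  # Prefer reading from the local copy...
  option read-subvolume client-local
  # ...but three replicas exist; a failed or unhealed local copy should not
  # prevent reads from being served by the other two.
  subvolumes client-local client-remote1 client-remote2
end-volume
EOF
```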