| Summary: | Corruption after node down/up | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Jeff Evans <jeffe> |
| Component: | replicate | Assignee: | Vikas Gorur <vikas> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 2.0.3 | CC: | aavati, gluster-bugs, vijay |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
Low Down:
Two machines, RHEL 5.3, fuse 2.7.4, each running a single brick server.
Clients run on the same machines, using AFR with write-behind and a local
read-subvolume.
clients run with --disable-direct-io-mode.
Activity:
Each client has several open fds, including a Xen image.
On one machine, kill glusterfsd while the open fds are still being written to.
umount and remount the underlying FS.
Restart glusterfsd.
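The steps above can be written down as a dry-run plan. This is only a sketch: the report does not give the exact commands, so the use of `pkill`, the mount point `/export/u0` (taken from the server volfile's export directory), the volfile path, and the helper name `repro_plan` are all assumptions. Nothing is executed; the commands are only printed.

```python
import shlex

# Everything below is an assumption for illustration; the report does not
# state the exact commands, the mount point, or the server volfile path.
BRICK_MOUNT = "/export/u0"                         # assumed mount point of the underlying FS
SERVER_VOLFILE = "/etc/glusterfs/glusterfsd.vol"   # hypothetical volfile location


def repro_plan(brick_mount=BRICK_MOUNT, volfile=SERVER_VOLFILE):
    """Return the reproduction steps as argv lists, in order, without running them."""
    return [
        ["pkill", "glusterfsd"],         # kill the brick server while clients are still writing
        ["umount", brick_mount],         # unmount the underlying FS
        ["mount", brick_mount],          # remount it (assumes an /etc/fstab entry)
        ["glusterfsd", "-f", volfile],   # restart the brick server
    ]


if __name__ == "__main__":
    for argv in repro_plan():
        print(shlex.join(argv))
```

The client-side writers (for example the Xen image) must already be active before the first step, otherwise there are no open fds to go stale.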
Weirdness:
On the same machine, client log entries:
... forced unwinding frame type(1) ...
... disconnected ... connected.
Server log entries:
...
[server-protocol.c:3903:server_readv] invalid argument: state->fd
...
[fd.c:326:gf_fd_fdptr_get] fd: invalid argument
[server-protocol.c:4108:server_flush] invalid argument: state->fd
[server-protocol.c:3903:server_readv] invalid argument: state->fd
...
[posix.c:1712:posix_writev] export: writev failed on fd=0x2aaaac0040c0: Bad file descriptor
...
[server-protocol.c:3956:server_writev] invalid argument: state->fd
[server-protocol.c:4062:server_fsync] invalid argument: state->fd
...
[fd.c:282:gf_fd_put] fd: invalid argument
...
***REPEATS AD NAUSEAM***
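To see which server operations are hitting the invalid fd, the repeating entries can be tallied per reporting function. A minimal sketch, assuming each entry contains a `[file.c:line:function] message` part as in the excerpts above (timestamps and log levels are elided in this report, so the pattern does not depend on them). The name `tally_invalid_fd` is made up for this example.

```python
import re
from collections import Counter

# Matches the "[file.c:line:function] message" part shown in the excerpts.
LOG_RE = re.compile(
    r"\[(?P<file>[\w.-]+):(?P<line>\d+):(?P<func>\w+)\]\s*(?P<msg>.*)"
)

# Entries copied from the server log excerpt in this report.
SAMPLE = """\
[server-protocol.c:3903:server_readv] invalid argument: state->fd
[fd.c:326:gf_fd_fdptr_get] fd: invalid argument
[server-protocol.c:4108:server_flush] invalid argument: state->fd
[server-protocol.c:3903:server_readv] invalid argument: state->fd
[server-protocol.c:3956:server_writev] invalid argument: state->fd
[server-protocol.c:4062:server_fsync] invalid argument: state->fd
[fd.c:282:gf_fd_put] fd: invalid argument
"""


def tally_invalid_fd(lines):
    """Count 'invalid argument' messages per reporting function."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.search(line)
        if match and "invalid argument" in match.group("msg"):
            counts[match.group("func")] += 1
    return counts


if __name__ == "__main__":
    for func, count in tally_invalid_fd(SAMPLE.splitlines()).most_common():
        print(f"{count:4d}  {func}")
```

Run against the full server log, this shows whether the errors are confined to read/write/flush/fsync on already-open fds, which is what the excerpt suggests.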
Really, Really Weird:
The fds seem to have been confused somehow, as data from the Xen
images began to appear in other open files.
This occurred only on the underlying FS of the broken server.
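One way data could end up in the wrong file is a stale fd number colliding with a freshly allocated one after the server restart. The toy model below only illustrates that idea. It is not GlusterFS code and not a confirmed diagnosis: the `ToyServer` class, its lowest-free-number allocation, and the file names are all hypothetical.

```python
class ToyServer:
    """Toy brick server that hands out integer fd numbers, lowest free first."""

    def __init__(self):
        self.fd_table = {}   # fd number -> file name (in memory only)
        self.files = {}      # file name -> list of written chunks (survives restart)

    def open(self, name):
        fd = 0
        while fd in self.fd_table:
            fd += 1
        self.fd_table[fd] = name
        self.files.setdefault(name, [])
        return fd

    def writev(self, fd, data):
        name = self.fd_table.get(fd)
        if name is None:
            return "invalid argument: fd"
        self.files[name].append(data)
        return "ok"

    def restart(self):
        # The in-memory fd table is lost; file contents stay on disk.
        self.fd_table = {}


def demo():
    server = ToyServer()
    xen_fd = server.open("xen.img")    # hypothetical file names
    server.open("app.log")

    server.restart()
    # A write with the old number is rejected while the table is empty.
    first = server.writev(xen_fd, b"xen-block")

    # Assume only app.log is re-opened after reconnect; it is given the
    # number that the client still associates with xen.img.
    server.open("app.log")
    second = server.writev(xen_fd, b"xen-block")
    return first, second, server.files


if __name__ == "__main__":
    first, second, files = demo()
    print(first)
    print(second)
    print(files)
```

In this model the first stale write fails with an invalid-fd error and the second one succeeds but lands in `app.log`, which resembles the mix of errors and cross-file data described here.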
-SERVER-
volume export
  type storage/posix
  option directory /export/u0
end-volume

volume posix-locks
  type features/posix-locks
  subvolumes export
end-volume

volume u0
  type performance/io-threads
  subvolumes posix-locks
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  option auth.addr.u0.allow 192.*
  subvolumes u0
end-volume

-CLIENT-
volume u0-2
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.200.2
  option remote-subvolume u0
end-volume

volume u0-1
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.200.1
  option remote-subvolume u0
end-volume

volume afr
  type cluster/afr
  option read-subvolume u0-2 # (set to u0-1 on machine1)
  subvolumes u0-1 u0-2
end-volume

volume writebehind
  type performance/write-behind
  option cache-size 4MB
  subvolumes afr
end-volume
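
To make the translator stacking in the client volfile easier to see, the `volume ... end-volume` blocks can be parsed into a small graph. This is a simplified sketch written for this report, not GlusterFS's own parser: it handles only the `volume`, `type`, `option`, `subvolumes`, and `end-volume` keywords used above, and treats everything after `#` as a comment. The function names are made up for the example.

```python
CLIENT_VOLFILE = """\
volume u0-2
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.200.2
  option remote-subvolume u0
end-volume
volume u0-1
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.200.1
  option remote-subvolume u0
end-volume
volume afr
  type cluster/afr
  option read-subvolume u0-2 # (set to u0-1 on machine1)
  subvolumes u0-1 u0-2
end-volume
volume writebehind
  type performance/write-behind
  option cache-size 4MB
  subvolumes afr
end-volume
"""


def parse_volfile(text):
    """Parse volume blocks into {name: {"type", "options", "subvolumes"}}."""
    volumes, current = {}, None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop trailing comments
        if not line:
            continue
        words = line.split()
        key = words[0]
        if key == "volume":
            current = {"type": None, "options": {}, "subvolumes": []}
            volumes[words[1]] = current
        elif key == "end-volume":
            current = None
        elif current is None:
            continue                           # ignore text outside a block
        elif key == "type":
            current["type"] = words[1]
        elif key == "option":
            current["options"][words[1]] = " ".join(words[2:])
        elif key == "subvolumes":
            current["subvolumes"] = words[1:]
    return volumes


def describe(volumes, name, depth=0):
    """Return indented 'name (type)' lines for a volume and its subvolumes."""
    lines = ["  " * depth + f"{name} ({volumes[name]['type']})"]
    for child in volumes[name]["subvolumes"]:
        lines.extend(describe(volumes, child, depth + 1))
    return lines


if __name__ == "__main__":
    graph = parse_volfile(CLIENT_VOLFILE)
    print("\n".join(describe(graph, "writebehind")))
```

Printed from the top volume, the stack is write-behind over AFR over the two protocol/client volumes, so every client write passes through write-behind before AFR replicates it to both bricks.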
PATCH: http://patches.gluster.com/patch/943 in master (protocol/client: fixed registration of saved_fds)
PATCH: http://patches.gluster.com/patch/943 in release-2.0 (protocol/client: fixed registration of saved_fds)
Created attachment 54: captured from screen