Bug 761927 (GLUSTER-195) - Corruption after node down/up
Summary: Corruption after node down/up
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-195
Product: GlusterFS
Classification: Community
Component: replicate
Version: 2.0.3
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: high
Target Milestone: ---
Assignee: Vikas Gorur
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-08-07 05:32 UTC by Jeff Evans
Modified: 2015-12-01 16:45 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments
server log (193.56 KB, application/x-gunzip)
2009-08-07 02:32 UTC, Jeff Evans
Client log (1.87 KB, text/plain)
2009-08-07 02:33 UTC, Jeff Evans

Description Jeff Evans 2009-08-07 02:33:48 UTC
Created attachment 54
captured from screen

Comment 1 Jeff Evans 2009-08-07 05:32:11 UTC
Low Down:

Two machines, RHEL 5.3, fuse 2.7.4, each running a single brick server.

Clients run on the same machines, using AFR with write-behind and a local
read-subvolume enabled.

Clients run with --disable-direct-io-mode.

Activity:

Each client has several open fds, including a Xen image.

On one machine, kill glusterfsd while the open fds are still being written to.

Unmount and remount the underlying FS.

Restart glusterfsd.

Weirdness:

On the same machine, client log entries:
... forced unwinding frame type(1) ...
... disconnected ... connected.

Server log entries:

...
[server-protocol.c:3903:server_readv] invalid argument: state->fd
...
[fd.c:326:gf_fd_fdptr_get] fd: invalid argument
[server-protocol.c:4108:server_flush] invalid argument: state->fd
[server-protocol.c:3903:server_readv] invalid argument: state->fd
...
[posix.c:1712:posix_writev] export: writev failed on fd=0x2aaaac0040c0: Bad file descriptor
...
[server-protocol.c:3956:server_writev] invalid argument: state->fd
[server-protocol.c:4062:server_fsync] invalid argument: state->fd
...
[fd.c:282:gf_fd_put] fd: invalid argument
...

***REPEATS AD NAUSEUM***

Really, Really Weird:

The fds seem to have been confused somehow, as data from the Xen
images began to appear in other open files.

This occurred only on the underlying FS of the server that was killed and restarted.
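
To illustrate how this kind of symptom can arise in general, here is a minimal
toy model. It is not GlusterFS code, and it is not confirmed to be the exact
mechanism behind this bug. The ToyServer class, the file names and the fd
numbering scheme are all hypothetical. It only shows that if a client keeps
using fd numbers issued before a server restart, a stale number can either be
rejected as invalid or silently resolve to a different file.

```python
class ToyServer:
    """Hypothetical server: fd numbers are only valid for one process lifetime."""

    def __init__(self, disk):
        self.disk = disk      # path -> bytearray; survives restarts
        self.fd_table = {}    # fd number -> path; lost on restart
        self.next_fd = 0

    def open(self, path):
        fd = self.next_fd
        self.next_fd += 1
        self.fd_table[fd] = path
        self.disk.setdefault(path, bytearray())
        return fd

    def write(self, fd, data):
        path = self.fd_table.get(fd)
        if path is None:
            raise OSError("invalid argument: fd")
        self.disk[path] += data


disk = {}

# Before the restart: the client opens two files.
server = ToyServer(disk)
xen_fd = server.open("xen.img")   # fd 0
log_fd = server.open("app.log")   # fd 1

# The server process is killed and restarted: the fd table starts empty.
server = ToyServer(disk)

# Only app.log is reopened, and it now receives fd 0.
server.open("app.log")

# A write through the stale xen.img fd lands in app.log.
server.write(xen_fd, b"XEN-DATA")

# A write through the stale app.log fd is rejected outright.
try:
    server.write(log_fd, b"log line")
    stale_write_rejected = False
except OSError:
    stale_write_rejected = True
```

In this toy model, avoiding the mix-up means the client must discard or
re-register every fd it holds after a reconnect, rather than reuse numbers
issued by the previous server process.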

-SERVER-

volume export
  type storage/posix
  option directory /export/u0
end-volume

volume posix-locks
  type features/posix-locks
  subvolumes export
end-volume

volume u0
  type performance/io-threads
  subvolumes posix-locks
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  option auth.addr.u0.allow 192.*
  subvolumes u0
end-volume

-CLIENT-

volume u0-2
    type protocol/client   
    option transport-type tcp/client
    option remote-host 192.168.200.2
    option remote-subvolume u0
end-volume

volume u0-1
    type protocol/client
    option transport-type tcp/client
    option remote-host 192.168.200.1
    option remote-subvolume u0
end-volume

volume afr
   type cluster/afr
   option read-subvolume u0-2   # (set to u0-1 on machine1)
   subvolumes u0-1 u0-2
end-volume

volume writebehind
  type performance/write-behind
  option cache-size 4MB
  subvolumes afr
end-volume

Comment 2 Anand Avati 2009-08-07 05:42:12 UTC
PATCH: http://patches.gluster.com/patch/943 in master (protocol/client: fixed
registration of saved_fds)

PATCH: http://patches.gluster.com/patch/943 in release-2.0 (protocol/client:
fixed registration of saved_fds)

