Bug 786087 - [c3aa99d907591f72b6302287b9b8899514fb52f1]: source brick crashed when second operation was started when first operation was running
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: unclassified
Version: pre-release
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: urgent
Target Milestone: ---
Assignee: krishnan parthasarathi
QA Contact:
URL:
Whiteboard:
Duplicates: 788476
Depends On:
Blocks: 817967
Reported: 2012-01-31 11:59 UTC by Rahul C S
Modified: 2015-11-03 23:04 UTC (History)

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 18:04:18 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: 1f3a0dd4742a2fcd3215aee4a5e22125d7ea4f4d
Embargoed:


Attachments:
Replace brick logs (4.18 MB, application/x-bzip), 2012-01-31 11:59 UTC, Rahul C S

Description Rahul C S 2012-01-31 11:59:14 UTC
Created attachment 558611
Replace brick logs

Description of problem:

Striped replicate volume info:
Volume Name: vol
Type: Striped-Replicate
Volume ID: ba41b542-bdbd-4691-989b-6103301135fa
Status: Started
Number of Bricks: 1 x 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: dagobah:/data/export1
Brick2: dagobah:/data/export2
Brick3: dagobah:/data/export3
Brick4: dagobah:/data/export4
Options Reconfigured:
performance.stat-prefetch: off
cluster.stripe-block-size: 32KB
performance.write-behind: off

One client was creating a multi-level directory tree while another client was running rdd (./rdd -f 2GB -i rdd.in -o /data/mounts/fuse/rddfile -r 100 -t 4).

Issued a replace-brick operation, then tried to start a second one while the first was still running:
root@Dagobah:/data# gluster volume replace-brick vol dagobah:/data/export1/ dagobah:/data/export5 start
replace-brick started successfully
root@Dagobah:/data# gluster volume replace-brick vol dagobah:/data/export1/ dagobah:/data/export5 status
Number of files migrated = 106       Current file= /playground/crawl/dir.1/dir.1/dir.2 
root@Dagobah:/data# gluster volume replace-brick vol dagobah:/data/export3/ dagobah:/data/export7 start
replace-brick failed to start
root@Dagobah:/data# gluster volume replace-brick vol dagobah:/data/export3/ dagobah:/data/export7 status
incorrect source or destination brick
root@Dagobah:/data# gluster volume replace-brick vol dagobah:/data/export1/ dagobah:/data/export5 status
Source brick dagobah:/data/export1 is not online.

The source brick had crashed with the following backtrace:
Core was generated by `/usr/local/sbin/glusterfsd -s localhost --volfile-id vol.dagobah.data-export1 -'.
Program terminated with signal 11, Segmentation fault.
#0  __opendir (name=0x0) at ../sysdeps/unix/opendir.c:86
86	../sysdeps/unix/opendir.c: No such file or directory.
	in ../sysdeps/unix/opendir.c
(gdb) bt
#0  __opendir (name=0x0) at ../sysdeps/unix/opendir.c:86
#1  0x00007fa1fbd1a8a7 in posix_opendir (frame=0x7fa1ff990ba0, this=0x23d8040, loc=0x7fa1fe7e9904, fd=0x7fa1f87045b0)
    at ../../../../../xlators/storage/posix/src/posix.c:568
#2  0x00007fa1fbb07112 in posix_acl_opendir (frame=0x7fa1ff9b0638, this=0x23d95a0, loc=0x7fa1fe7e9904, fd=0x7fa1f87045b0)
    at ../../../../../xlators/system/posix-acl/src/posix-acl.c:1067
#3  0x00007fa1fb8eddd0 in pl_opendir (frame=0x7fa1ff992930, this=0x23da7d0, loc=0x7fa1fe7e9904, fd=0x7fa1f87045b0)
    at ../../../../../xlators/features/locks/src/posix.c:388
#4  0x00007fa1fb6d78d8 in iot_opendir_wrapper (frame=0x7fa1ff996904, this=0x23db9e0, loc=0x7fa1fe7e9904, fd=0x7fa1f87045b0)
    at ../../../../../xlators/performance/io-threads/src/io-threads.c:1469
#5  0x00007fa201340066 in call_resume_wind (stub=0x7fa1fe7e98cc) at ../../../libglusterfs/src/call-stub.c:2359
#6  0x00007fa201347409 in call_resume (stub=0x7fa1fe7e98cc) at ../../../libglusterfs/src/call-stub.c:3932
#7  0x00007fa1fb6ce8ef in iot_worker (data=0x23ed940) at ../../../../../xlators/performance/io-threads/src/io-threads.c:138
#8  0x00007fa200cb6efc in start_thread (arg=0x7fa1f9d37700) at pthread_create.c:304
#9  0x00007fa2009f189d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#10 0x0000000000000000 in ?? ()
(gdb) f 1
#1  0x00007fa1fbd1a8a7 in posix_opendir (frame=0x7fa1ff990ba0, this=0x23d8040, loc=0x7fa1fe7e9904, fd=0x7fa1f87045b0)
    at ../../../../../xlators/storage/posix/src/posix.c:568
568	        dir = opendir (real_path);
(gdb) p real_path
$1 = 0x0
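
The gdb output above shows opendir() being called with a NULL real_path, which is what triggers the segfault inside libc. As an illustration only, here is a minimal sketch of a defensive wrapper that rejects a NULL path before it reaches opendir(). The helper name safe_opendir and the choice of EINVAL are assumptions made for this example; this is not the GlusterFS code and not the fix that was eventually merged.

```c
#include <dirent.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper (not part of GlusterFS): refuse a NULL path instead
 * of handing it to opendir(), which crashed on it in the backtrace above.
 * Returning EINVAL is an assumption made for this sketch. */
static DIR *
safe_opendir (const char *real_path)
{
        if (real_path == NULL) {
                errno = EINVAL;
                return NULL;
        }

        /* Non-NULL path: defer to the normal libc call. */
        return opendir (real_path);
}
```

With a guard of this kind the caller gets an error it can propagate back to the client rather than taking down the whole brick process. It only masks the symptom, though; why real_path came out NULL in the first place still has to be answered.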


I have attached the logs for further investigation along with the replace-brick logs.

Comment 1 Rahul C S 2012-02-08 09:51:42 UTC
*** Bug 788476 has been marked as a duplicate of this bug. ***

Comment 2 Rahul C S 2012-02-08 09:53:16 UTC
This also happens with the latest git head.

Replace-brick does not work and crashes the source brick.

Comment 3 Amar Tumballi 2012-02-27 12:27:49 UTC
Please test with 3.3.0qa24.

Comment 4 Anand Avati 2012-03-18 07:02:24 UTC
CHANGE: http://review.gluster.com/2950 (afr: Copy loc->gfid independent of lookup being fresh or otherwise) merged in master by Anand Avati (avati)

Comment 5 Rahul C S 2012-04-05 08:03:17 UTC
Works, does not crash source brick. Just prints "replace-brick failed to start" to the console.

