Red Hat Bugzilla – Bug 852307
abort after an interrupted replace-brick operation causes glusterd to hang
Last modified: 2015-11-03 18:04:37 EST
+++ This bug was initially created as a clone of Bug #816915 +++
Description of problem:
If the source brick is killed while a replace-brick operation is in progress, a subsequent replace-brick abort causes glusterd to hang. Although glusterd appears to be in _interruptible sleep_ (the 'S' state in ps output), neither gdb nor strace can be attached to the glusterd process, and other gluster CLI commands also fail. However, attaching strace to the glusterd process before the abort was attempted showed glusterd hung in the lsetxattr syscall. A statedump of the client (a maintenance mount) and of the source brick revealed that the setxattr call was stuck in the pump translator.
Code analysis with KP pointed to the cause: the crawl operation is not started after the brick is restarted.
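To double-check the process state that ps reports, the state letter can also be read directly from /proc on Linux. The following is only a minimal sketch and not part of the original report: the helper names (parse_proc_state, describe_process) are hypothetical, and /proc/<pid>/wchan may be empty or unreadable depending on the kernel and on privileges.

```python
import os

# Single-letter process states as shown by ps(1) and /proc/<pid>/stat.
STATE_NAMES = {
    "R": "running",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep",
    "T": "stopped",
    "Z": "zombie",
}


def parse_proc_state(stat_line):
    """Return the state letter from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so take the first field after the last ')'.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    return rest[0]


def describe_process(pid):
    """Return (state letter, state name, wait channel) for a PID (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        state = parse_proc_state(f.read())
    try:
        with open(f"/proc/{pid}/wchan") as f:
            wchan = f.read().strip() or "-"
    except OSError:
        wchan = "unavailable"
    return state, STATE_NAMES.get(state, "other"), wchan


if __name__ == "__main__":
    # Inspect the current process; pass the glusterd PID instead in practice.
    print(describe_process(os.getpid()))
```

A process blocked inside a syscall such as lsetxattr would typically show a non-empty wait channel, which can help narrow down where it is stuck when a debugger cannot be attached.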
Version-Release number of selected component (if applicable):
How reproducible: Consistently
Steps to Reproduce:
--- Additional comment from email@example.com on 2012-05-03 03:17:49 EDT ---
*** Bug 787123 has been marked as a duplicate of this bug. ***
--- Additional comment from firstname.lastname@example.org on 2012-07-11 02:23:08 EDT ---
patch sent @ http://review.gluster.com/3264
--- Additional comment from email@example.com on 2012-07-11 03:11:18 EDT ---
*** Bug 818519 has been marked as a duplicate of this bug. ***
--- Additional comment from firstname.lastname@example.org on 2012-07-11 03:19:27 EDT ---
*** Bug 797729 has been marked as a duplicate of this bug. ***
The replace-brick functionality can be achieved today with 'add-brick' followed by 'remove-brick', so there are no plans to work on this.
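As an illustration of the 'add-brick + remove-brick' route, here is a minimal sketch that only builds the gluster CLI invocations as argument lists and does not run them. The volume name and brick paths are hypothetical, the helper name replace_via_add_remove is made up for this example, and the sequence assumes a plain distribute volume; replicated or striped volumes need bricks added and removed in multiples of the replica or stripe count.

```python
def replace_via_add_remove(volume, old_brick, new_brick):
    """Build gluster CLI calls that swap old_brick for new_brick.

    Returns argv lists only; nothing is executed here.
    """
    base = ["gluster", "volume"]
    return [
        # Add the new brick to the volume first.
        base + ["add-brick", volume, new_brick],
        # Start migrating data off the old brick.
        base + ["remove-brick", volume, old_brick, "start"],
        # Poll until the migration is reported as completed.
        base + ["remove-brick", volume, old_brick, "status"],
        # Commit only after status shows the migration has finished.
        base + ["remove-brick", volume, old_brick, "commit"],
    ]


if __name__ == "__main__":
    # Hypothetical volume and brick names, for illustration only.
    for argv in replace_via_add_remove(
        "testvol", "server1:/bricks/old", "server2:/bricks/new"
    ):
        print(" ".join(argv))
```

Keeping the commands as data makes it easy to review the sequence before running anything against a live cluster.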