Bug 1802449 - spurious self-heald.t failure
Summary: spurious self-heald.t failure
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: 7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On: 1801623
Blocks:
 
Reported: 2020-02-13 07:12 UTC by Pranith Kumar K
Modified: 2020-02-25 17:58 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1801623
Environment:
Last Closed: 2020-02-17 09:41:23 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:




Links
System: Gluster.org Gerrit | ID: 24117 | Private: 0 | Priority: None | Status: Merged | Summary: tests: Fix spurious self-heald.t failure | Last Updated: 2020-02-17 09:41:22 UTC

Description Pranith Kumar K 2020-02-13 07:12:17 UTC
+++ This bug was initially created as a clone of Bug #1801623 +++

20:10:58 ./tests/basic/afr/self-heald.t .. 
20:10:58 1..84
20:10:58 ok   1 [    160/   1061] <  46> 'glusterd'
20:10:58 ok   2 [      9/     17] <  47> 'pidof glusterd'
20:10:58 ok   3 [      9/     97] <  48> 'gluster --mode=script --wignore volume create patchy replica 2 builder203.int.aws.gluster.org:/d/backends/patchy0 builder203.int.aws.gluster.org:/d/backends/patchy1 builder203.int.aws.gluster.org:/d/backends/patchy2 builder203.int.aws.gluster.org:/d/backends/patchy3 builder203.int.aws.gluster.org:/d/backends/patchy4 builder203.int.aws.gluster.org:/d/backends/patchy5'
20:10:58 ok   4 [     12/    147] <  49> 'gluster --mode=script --wignore volume set patchy cluster.background-self-heal-count 0'
20:10:58 ok   5 [     12/    154] <  50> 'gluster --mode=script --wignore volume set patchy cluster.eager-lock off'
20:10:58 ok   6 [     15/    159] <  51> 'gluster --mode=script --wignore volume set patchy performance.flush-behind off'
20:10:58 ok   7 [     14/   1407] <  52> 'gluster --mode=script --wignore volume start patchy'
20:10:58 ok   8 [     10/    217] <  53> '_GFS --attribute-timeout=0 --entry-timeout=0 --volfile-id=/patchy --volfile-server=builder203.int.aws.gluster.org /mnt/glusterfs/0'
20:10:58 [2020-02-10 14:40:07.086270] E [rpc-clnt.c:346:saved_frames_unwind] (--> /build/install/lib/libglusterfs.so.0(_gf_log_callingfn+0x1bc)[0x7fb36875b6ee] (--> /build/install/lib/libgfrpc.so.0(+0x1176f)[0x7fb3680c076f] (--> /build/install/lib/libgfrpc.so.0(+0x11856)[0x7fb3680c0856] (--> /build/install/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x16c)[0x7fb3680c0de4] (--> /build/install/lib/libgfrpc.so.0(+0x12731)[0x7fb3680c1731] ))))) 0-gf-attach-rpc: forced unwinding frame type(brick operations) op(--(1)) called at 2020-02-10 14:40:07.077738 (xid=0x2)
20:10:58 got error -1 on RPC
20:10:58 ok   9 [     14/   2096] <  21> 'kill_brick patchy builder203.int.aws.gluster.org /d/backends/patchy1'
20:10:58 ok  10 [     15/   2101] <  21> 'kill_brick patchy builder203.int.aws.gluster.org /d/backends/patchy3'
20:10:58 [2020-02-10 14:40:11.315789] E [rpc-clnt.c:346:saved_frames_unwind] (--> /build/install/lib/libglusterfs.so.0(_gf_log_callingfn+0x1bc)[0x7f75e7b8a6ee] (--> /build/install/lib/libgfrpc.so.0(+0x1176f)[0x7f75e74ef76f] (--> /build/install/lib/libgfrpc.so.0(+0x11856)[0x7f75e74ef856] (--> /build/install/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x16c)[0x7f75e74efde4] (--> /build/install/lib/libgfrpc.so.0(+0x12731)[0x7f75e74f0731] ))))) 0-gf-attach-rpc: forced unwinding frame type(brick operations) op(--(1)) called at 2020-02-10 14:40:11.302544 (xid=0x2)
20:10:58 got error -1 on RPC
20:10:58 ok  11 [     11/   2103] <  21> 'kill_brick patchy builder203.int.aws.gluster.org /d/backends/patchy5'
20:10:58 ok  12 [   2386/    194] <  72> '43 get_pending_heal_count patchy'
20:10:58 ok  13 [     16/    183] <  75> '3 disconnected_brick_count patchy'
20:10:58 ok  14 [    112/    175] <  82> '53 get_pending_heal_count patchy'
20:10:58 ok  15 [     81/    208] <  87> '43 get_pending_heal_count patchy'
20:10:58 ok  16 [     16/     77] <  90> '! gluster --mode=script --wignore volume heal patchy'
20:10:58 ok  17 [     11/    155] <  91> 'gluster --mode=script --wignore volume set patchy cluster.self-heal-daemon off'
20:10:58 ok  18 [     11/     71] <  92> '! gluster --mode=script --wignore volume heal patchy'
20:10:58 ok  19 [     24/     72] <  93> '! gluster --mode=script --wignore volume heal patchy full'
20:10:58 ok  20 [     11/    163] <  94> 'gluster --mode=script --wignore volume start patchy force'
20:10:58 ok  21 [     14/    158] <  95> 'gluster --mode=script --wignore volume set patchy cluster.self-heal-daemon on'
20:10:58 ok  22 [     10/     79] <  96> 'Y glustershd_up_status'
20:10:58 ok  23 [     10/   3342] <  34> '1 afr_child_up_status_in_shd patchy 1'
20:10:58 ok  24 [     11/    329] <  34> '1 afr_child_up_status_in_shd patchy 3'
20:10:58 ok  25 [     10/     93] <  34> '1 afr_child_up_status_in_shd patchy 5'
20:10:58 ok  26 [     10/    268] < 100> 'gluster --mode=script --wignore volume heal patchy'
20:10:58 ok  27 [   5164/      1] < 103> '[ 43 -gt 0 ]'
20:10:58 ok  28 [      9/     57] < 105> 'gluster --mode=script --wignore volume heal patchy full'
20:10:58 ok  29 [     68/    154] < 106> '^0$ get_pending_heal_count patchy'
20:10:58 not ok  30 [     25/    511] < 119> '0 get_pending_heal_count patchy' -> 'Got "1" instead of "0"'
20:10:58 ok  31 [     20/      5] < 129> 'touch /mnt/glusterfs/0/f'
20:10:58 ok  32 [      8/     14] < 130> 'mkdir /mnt/glusterfs/0/d'
20:10:58 ok  33 [      9/    113] < 132> 'gluster --mode=script --wignore volume set patchy cluster.data-self-heal off'
20:10:58 ok  34 [     15/     54] < 133> 'off volume_option patchy cluster.data-self-heal'
20:10:58 ok  35 [      9/   1069] <  21> 'kill_brick patchy builder203.int.aws.gluster.org /d/backends/patchy1'
20:10:58 ok  36 [      9/   1072] <  21> 'kill_brick patchy builder203.int.aws.gluster.org /d/backends/patchy3'
20:10:58 ok  37 [      9/   1083] <  21> 'kill_brick patchy builder203.int.aws.gluster.org /d/backends/patchy5'
20:10:58 ok  38 [     18/    116] < 136> '1 get_pending_heal_count patchy'
20:10:58 ok  39 [      9/    146] < 137> 'gluster --mode=script --wignore volume start patchy force'
20:10:58 ok  40 [     16/     62] < 138> 'Y glustershd_up_status'
20:10:58 ok  41 [      9/   2572] <  34> '1 afr_child_up_status_in_shd patchy 1'
20:10:58 ok  42 [      9/     62] <  34> '1 afr_child_up_status_in_shd patchy 3'
20:10:58 ok  43 [      8/     62] <  34> '1 afr_child_up_status_in_shd patchy 5'
20:10:58 ok  44 [      9/     63] < 141> 'gluster --mode=script --wignore volume heal patchy'
20:10:58 ok  45 [      9/    119] < 142> '^0$ get_pending_heal_count patchy'
20:10:58 ok  46 [      9/    109] < 143> 'gluster --mode=script --wignore volume set patchy cluster.data-self-heal on'
20:10:58 ok  47 [     18/    114] < 146> 'gluster --mode=script --wignore volume set patchy cluster.metadata-self-heal off'
20:10:58 ok  48 [     10/     53] < 147> 'off volume_option patchy cluster.metadata-self-heal'
20:10:58 ok  49 [      9/   1068] <  21> 'kill_brick patchy builder203.int.aws.gluster.org /d/backends/patchy1'
20:10:58 ok  50 [      9/   1072] <  21> 'kill_brick patchy builder203.int.aws.gluster.org /d/backends/patchy3'
20:10:58 ok  51 [      9/   1076] <  21> 'kill_brick patchy builder203.int.aws.gluster.org /d/backends/patchy5'
20:10:58 ok  52 [     10/      8] < 150> 'chmod 777 /mnt/glusterfs/0/f'
20:10:58 ok  53 [      9/    108] < 151> '1 get_pending_heal_count patchy'
20:10:58 ok  54 [      9/    153] < 152> 'gluster --mode=script --wignore volume start patchy force'
20:10:58 ok  55 [     13/     63] < 153> 'Y glustershd_up_status'
20:10:58 ok  56 [      9/   2571] <  34> '1 afr_child_up_status_in_shd patchy 1'
20:10:58 ok  57 [      9/     64] <  34> '1 afr_child_up_status_in_shd patchy 3'
20:10:58 ok  58 [      9/     62] <  34> '1 afr_child_up_status_in_shd patchy 5'
20:10:58 ok  59 [      8/     62] < 156> 'gluster --mode=script --wignore volume heal patchy'
20:10:58 ok  60 [     10/    121] < 157> '^0$ get_pending_heal_count patchy'
20:10:58 ok  61 [      9/    116] < 158> 'gluster --mode=script --wignore volume set patchy cluster.metadata-self-heal on'
20:10:58 ok  62 [     12/    111] < 161> 'gluster --mode=script --wignore volume set patchy cluster.entry-self-heal off'
20:10:58 ok  63 [     11/     55] < 162> 'off volume_option patchy cluster.entry-self-heal'
20:10:58 ok  64 [      9/   1067] <  21> 'kill_brick patchy builder203.int.aws.gluster.org /d/backends/patchy1'
20:10:58 ok  65 [      9/   1072] <  21> 'kill_brick patchy builder203.int.aws.gluster.org /d/backends/patchy3'
20:10:58 ok  66 [      9/   1073] <  21> 'kill_brick patchy builder203.int.aws.gluster.org /d/backends/patchy5'
20:10:58 ok  67 [     10/     10] < 164> 'touch /mnt/glusterfs/0/d/a'
20:10:58 ok  68 [    129/      1] < 168> 'test 2 -eq 2 -o 2 -eq 4'
20:10:58 ok  69 [      9/    124] < 169> 'gluster --mode=script --wignore volume start patchy force'
20:10:58 ok  70 [     11/     64] < 170> 'Y glustershd_up_status'
20:10:58 ok  71 [      9/   2580] <  34> '1 afr_child_up_status_in_shd patchy 1'
20:10:58 ok  72 [      9/     64] <  34> '1 afr_child_up_status_in_shd patchy 3'
20:10:58 ok  73 [      9/     73] <  34> '1 afr_child_up_status_in_shd patchy 5'
20:10:58 ok  74 [      9/     60] < 172> 'gluster --mode=script --wignore volume heal patchy'
20:10:58 ok  75 [     11/    120] < 173> '^0$ get_pending_heal_count patchy'
20:10:58 ok  76 [      9/    116] < 174> 'gluster --mode=script --wignore volume set patchy cluster.entry-self-heal on'
20:10:58 ok  77 [     11/     57] < 178> '! gluster --mode=script --wignore volume heal fail info'
20:10:58 ok  78 [      9/   7088] < 181> 'gluster --mode=script --wignore volume stop patchy'
20:10:58 ok  79 [      9/     83] < 182> '! gluster --mode=script --wignore volume heal patchy info'
20:10:58 ok  80 [      9/   3089] < 185> 'gluster --mode=script --wignore volume delete patchy'
20:10:58 ok  81 [     12/    112] < 186> 'gluster --mode=script --wignore volume create patchy builder203.int.aws.gluster.org:/d/backends/patchy{6}'
20:10:58 ok  82 [     11/    110] < 187> 'gluster --mode=script --wignore volume start patchy'
20:10:58 ok  83 [     11/     95] < 188> '! gluster --mode=script --wignore volume heal patchy info'
20:10:58 ok  84 [     11/    202] < 191> '! log_newer 1581345602 offset reused from another DIR'
20:10:58 Failed 1/84 subtests 

Problem:
The heal-info code assumes that every entry in the xattrop index
directory definitely needs heal. There is one corner case: the very
first xattrop on a file adds its gfid to the 'xattrop' index in the
fop path, and the _cbk path removes it again because, in the success
case, the fop is a zero-xattr xattrop. If heal-info reads the index
inside that window, such gfids are reported as needing heal even
though they do not.

Fix:
Check the pending flag to decide whether the file definitely needs
heal, instead of relying on which index is being crawled at the
moment.
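
For illustration, a minimal self-contained C sketch of the before/after
counting logic follows. This is a model only: index_entry, pending,
heal_count_old, and heal_count_new are hypothetical names, not GlusterFS
internals; the real change lives in the heal-info/index code referenced
by the Gerrit reviews in this bug.

    #include <stdbool.h>
    #include <stdio.h>

    /* Illustrative model: presence of an entry in the xattrop index
     * directory does not by itself mean the file needs heal; the
     * pending flag (standing in for the AFR pending xattr) does. */
    struct index_entry {
        const char *gfid;
        bool pending; /* pending xattr set => heal genuinely required */
    };

    /* Old behaviour: every crawled index entry is counted as needing
     * heal, including the transient zero-xattr-xattrop case. */
    static int heal_count_old(int n)
    {
        return n;
    }

    /* Fixed behaviour: count only entries whose pending flag is set. */
    static int heal_count_new(const struct index_entry *e, int n)
    {
        int count = 0;
        for (int i = 0; i < n; i++)
            if (e[i].pending)
                count++;
        return count;
    }

    int main(void)
    {
        /* One genuine heal candidate, plus one transient entry added
         * by the very first zero-xattr xattrop and read before its
         * _cbk removed it. */
        struct index_entry idx[] = {
            { "gfid-needs-heal", true },
            { "gfid-transient", false },
        };
        int n = (int)(sizeof(idx) / sizeof(idx[0]));

        printf("old heal count: %d\n", heal_count_old(n));      /* 2 */
        printf("new heal count: %d\n", heal_count_new(idx, n)); /* 1 */
        return 0;
    }

Under this model, the old count includes the transient entry, which is
the kind of overcount behind the 'Got "1" instead of "0"' failure in
subtest 30 above.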


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

--- Additional comment from Worker Ant on 2020-02-12 14:21:44 UTC ---

REVIEW: https://review.gluster.org/24110 (tests: Fix spurious self-heald.t failure) merged (#2) on master by Pranith Kumar Karampuri

Comment 1 Worker Ant 2020-02-13 07:13:57 UTC
REVIEW: https://review.gluster.org/24117 (tests: Fix spurious self-heald.t failure) posted (#1) for review on release-7 by Pranith Kumar Karampuri

Comment 2 Worker Ant 2020-02-17 09:41:23 UTC
REVIEW: https://review.gluster.org/24117 (tests: Fix spurious self-heald.t failure) merged (#1) on release-7 by Pranith Kumar Karampuri

