Bug 1413005

Summary: [Remove-brick] Lookup failed errors are seen in rebalance logs during rm -rf
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Prasad Desala <tdesala>
Component: distribute
Assignee: Nithya Balachandran <nbalacha>
Status: CLOSED ERRATA
QA Contact: Prasad Desala <tdesala>
Severity: medium
Docs Contact:
Priority: unspecified
Version: rhgs-3.2
CC: amukherj, rhinduja, rhs-bugs, sheggodu, storage-qa-internal, tdesala
Target Milestone: ---
Keywords: ZStream
Target Release: RHGS 3.4.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: rebase
Fixed In Version: glusterfs-3.12.2-1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-04 06:29:55 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1503134

Description Prasad Desala 2017-01-13 11:49:22 UTC
Description of problem:
=======================
While a remove-brick operation was in progress, I started removing the entire dataset on the mount point using rm -rf from multiple terminals. The rebalance logs filled up with many "lookup failed" error messages.

When these errors are logged, only the file name and the "lookup failed" message are shown. Additional information should be logged along with the message to make it easier to find the cause of the lookup failure.

[2017-01-13 09:33:07.090970] E [MSGID: 109023] [dht-rebalance.c:2200:gf_defrag_migrate_single_file] 0-distrep-dht: Migrate file failed: vmlinux.lds.S lookup failed
[2017-01-13 09:33:08.568525] E [MSGID: 109023] [dht-rebalance.c:2200:gf_defrag_migrate_single_file] 0-distrep-dht: Migrate file failed: __ashrdi3.S lookup failed
[2017-01-13 09:33:08.571814] E [MSGID: 109023] [dht-rebalance.c:2200:gf_defrag_migrate_single_file] 0-distrep-dht: Migrate file failed: __ashldi3.S lookup failed
[2017-01-13 09:33:08.586351] E [MSGID: 109023] [dht-rebalance.c:2200:gf_defrag_migrate_single_file] 0-distrep-dht: Migrate file failed: ashrdi3.c lookup failed
[2017-01-13 09:33:08.590265] E [MSGID: 109023] [dht-rebalance.c:2200:gf_defrag_migrate_single_file] 0-distrep-dht: Migrate file failed: __lshrdi3.S lookup failed
[2017-01-13 09:33:08.599489] E [MSGID: 109023] [dht-rebalance.c:2200:gf_defrag_migrate_single_file] 0-distrep-dht: Migrate file failed: checksum.c lookup failed
[2017-01-13 09:33:08.601561] E [MSGID: 109023] [dht-rebalance.c:2200:gf_defrag_migrate_single_file] 0-distrep-dht: Migrate file failed: __ucmpdi2.S lookup failed
[2017-01-13 09:33:08.612718] E [MSGID: 109023] [dht-rebalance.c:2200:gf_defrag_migrate_single_file] 0-distrep-dht: Migrate file failed: delay.c lookup failed
[2017-01-13 09:33:08.614396] E [MSGID: 109023] [dht-rebalance.c:2200:gf_defrag_migrate_single_file] 0-distrep-dht: Migrate file failed: bitops.c lookup failed
[2017-01-13 09:33:08.618801] E [MSGID: 109023] [dht-rebalance.c:2200:gf_defrag_migrate_single_file] 0-distrep-dht: Migrate file failed: internal.h lookup failed
[2017-01-13 09:33:08.620468] E [MSGID: 109023] [dht-rebalance.c:2200:gf_defrag_migrate_single_file] 0-distrep-dht: Migrate file failed: do_csum.S lookup failed
[2017-01-13 09:33:08.624202] E [MSGID: 109023] [dht-rebalance.c:2200:gf_defrag_migrate_single_file] 0-distrep-dht: Migrate file failed: lshrdi3.c lookup failed
[2017-01-13 09:33:08.626305] E [MSGID: 109023] [dht-rebalance.c:2200:gf_defrag_migrate_single_file] 0-distrep-dht: Migrate file failed: memcpy.S lookup failed
[2017-01-13 09:33:08.631211] E [MSGID: 109023] [dht-rebalance.c:2200:gf_defrag_migrate_single_file] 0-distrep-dht: Migrate file failed: memset.S lookup failed
[2017-01-13 09:33:08.636532] E [MSGID: 109023] [dht-rebalance.c:2200:gf_defrag_migrate_single_file] 0-distrep-dht: Migrate file failed: usercopy.c lookup failed
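
For triage, entries like the ones above can be tallied with a short script. This is a minimal sketch and not part of GlusterFS: the regular expression is inferred only from the sample lines in this report, and `summarize_lookup_failures` is a hypothetical helper name.

```python
import re

# Pattern inferred from the sample lines in this report; other GlusterFS
# log messages may not follow this layout.
LOOKUP_FAILED = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[MSGID:\s*(?P<msgid>\d+)\]\s+"
    r"\[(?P<source>[^\]]+)\]\s+(?P<xlator>\S+):\s+"
    r"Migrate file failed:\s*(?P<name>.+?)\s+lookup failed\s*$"
)


def summarize_lookup_failures(lines):
    """Return (count, names) for 'lookup failed' migration errors."""
    names = []
    for line in lines:
        match = LOOKUP_FAILED.match(line.strip())
        if match:
            names.append(match.group("name"))
    return len(names), names


# Two of the lines quoted in this report, used as sample input.
sample = [
    "[2017-01-13 09:33:07.090970] E [MSGID: 109023] "
    "[dht-rebalance.c:2200:gf_defrag_migrate_single_file] 0-distrep-dht: "
    "Migrate file failed: vmlinux.lds.S lookup failed",
    "[2017-01-13 09:33:08.599489] E [MSGID: 109023] "
    "[dht-rebalance.c:2200:gf_defrag_migrate_single_file] 0-distrep-dht: "
    "Migrate file failed: checksum.c lookup failed",
]
count, names = summarize_lookup_failures(sample)
print(count, names)
```

In this log format the script can only report which names failed, not why, which is the gap this bug describes.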


Version-Release number of selected component (if applicable):
3.8.4-11.el7rhgs.x86_64

How reproducible:
always

Steps to Reproduce:
==================
1) Create a distributed-replicate volume and start it.
2) FUSE mount the volume.
3) Under the mount point, create two subdirectories, for example /mnt/terminal{1..2}.
4) Start a Linux kernel untar in both subdirectories, that is, /mnt/terminal1 and /mnt/terminal2.
5) Wait a few minutes and, while the untar is in progress, add a couple of bricks to the volume.
6) Immediately remove the bricks added in step 5 (this starts a rebalance).
7) Wait a few minutes and, while the untar is still in progress, issue rm -rf * in each terminal directory.

Check the rebalance logs.
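
The steps above can be sketched as a command sequence. This is a dry-run sketch only: it prints the commands rather than running them. The host names, brick paths, replica count, and tarball path are hypothetical placeholders that do not come from this report; the volume name `distrep` is inferred from the `0-distrep-dht` log prefix.

```python
import shlex

# Hypothetical placeholders (not from the report), except VOLUME, which is
# inferred from the "0-distrep-dht" prefix in the log lines.
VOLUME = "distrep"
MOUNT = "/mnt"
REPLICA = "2"  # assumption: the report only says "distributed-replicate"
INITIAL_BRICKS = ["server1:/bricks/b1", "server2:/bricks/b1",
                  "server1:/bricks/b2", "server2:/bricks/b2"]
ADDED_BRICKS = ["server1:/bricks/b3", "server2:/bricks/b3"]
KERNEL_TARBALL = "/tmp/linux.tar.xz"  # hypothetical path


def reproduction_commands():
    """Build the commands for steps 1-7 without executing anything.

    In the real reproduction, the tar and rm commands run concurrently from
    separate terminals, with a wait of a few minutes between phases.
    """
    gluster = ["gluster", "volume"]
    return [
        gluster + ["create", VOLUME, "replica", REPLICA] + INITIAL_BRICKS,
        gluster + ["start", VOLUME],
        ["mount", "-t", "glusterfs", f"server1:/{VOLUME}", MOUNT],
        ["mkdir", f"{MOUNT}/terminal1", f"{MOUNT}/terminal2"],
        ["tar", "-xf", KERNEL_TARBALL, "-C", f"{MOUNT}/terminal1"],
        ["tar", "-xf", KERNEL_TARBALL, "-C", f"{MOUNT}/terminal2"],
        gluster + ["add-brick", VOLUME] + ADDED_BRICKS,
        gluster + ["remove-brick", VOLUME] + ADDED_BRICKS + ["start"],
        ["sh", "-c", f"rm -rf {MOUNT}/terminal1/* {MOUNT}/terminal2/*"],
        gluster + ["remove-brick", VOLUME] + ADDED_BRICKS + ["status"],
    ]


commands = reproduction_commands()
for cmd in commands:
    print(shlex.join(cmd))
```

The final `remove-brick ... status` command is an addition for checking progress; the report itself only says to check the rebalance logs.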

Actual results:
===============
Lookup failed errors are seen in rebalance logs during rm -rf

Expected results:
=================
There should not be any lookup failed errors in rebalance logs. 

Additional info:
================
These lookup failures do not impact the remove-brick rebalance. On all the nodes, the remove-brick rebalance completed successfully.

Comment 3 Ambarish 2017-01-13 12:01:13 UTC
I hit this on add-brick + rm on the Scale setup as well.

Comment 11 Prasad Desala 2018-04-17 12:19:13 UTC
Verified this BZ on glusterfs version: 3.12.2-7.el7rhgs.x86_64. 

Lookup failed errors are now logged together with the reason for the failure.

[MSGID: 109023] [dht-rebalance.c:2618:gf_defrag_migrate_single_file] 0-distrepx3-dht: Migrate file failed: /linux-4.9.27/Documentation/devicetree/bindings/phy/keystone-usb-phy.txt lookup failed [No such file or directory]
[MSGID: 109023] [dht-rebalance.c:2618:gf_defrag_migrate_single_file] 0-distrepx3-dht: Migrate file failed: /a84-40 lookup failed [No such file or directory]
[MSGID: 109023] [dht-rebalance.c:2618:gf_defrag_migrate_single_file] 0-distrepx3-dht: Migrate file failed: /a72-40 lookup failed [No such file or directory]

Moving this BZ to Verified.
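
With the reason appended in square brackets, as in the verified log lines above, failures can be grouped by cause. This is a minimal sketch under the assumption that the reason always appears as a trailing bracketed string; the pattern is inferred from the lines in this report, and `count_reasons` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern inferred from the log lines in this report. The trailing
# "[reason]" group is optional so that older lines without it still match.
MIGRATE_FAILED = re.compile(
    r"Migrate file failed:\s*(?P<path>.+?)\s+lookup failed"
    r"(?:\s+\[(?P<reason>[^\]]+)\])?\s*$"
)


def count_reasons(lines):
    """Count lookup failures per reported reason ('unknown' if absent)."""
    reasons = Counter()
    for line in lines:
        match = MIGRATE_FAILED.search(line.strip())
        if match:
            reasons[match.group("reason") or "unknown"] += 1
    return reasons


# Two lines in the new format and one in the old format, all from this report.
sample = [
    "[MSGID: 109023] [dht-rebalance.c:2618:gf_defrag_migrate_single_file] "
    "0-distrepx3-dht: Migrate file failed: /a84-40 lookup failed "
    "[No such file or directory]",
    "[MSGID: 109023] [dht-rebalance.c:2618:gf_defrag_migrate_single_file] "
    "0-distrepx3-dht: Migrate file failed: /a72-40 lookup failed "
    "[No such file or directory]",
    "[2017-01-13 09:33:08.612718] E [MSGID: 109023] "
    "[dht-rebalance.c:2200:gf_defrag_migrate_single_file] 0-distrep-dht: "
    "Migrate file failed: delay.c lookup failed",
]
reasons = count_reasons(sample)
print(reasons)
```

A "No such file or directory" reason is consistent with the file having been removed by the concurrent rm -rf before the rebalance process looked it up.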

Comment 12 errata-xmlrpc 2018-09-04 06:29:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607