Bug 1546941

Summary: [Rebalance] ENOSPC errors on few files in rebalance logs
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Prasad Desala <tdesala>
Component: distribute Assignee: Nithya Balachandran <nbalacha>
Status: CLOSED ERRATA QA Contact: Prasad Desala <tdesala>
Severity: low Docs Contact:
Priority: low    
Version: rhgs-3.4 CC: rhinduja, rhs-bugs, storage-qa-internal
Target Milestone: --- Keywords: Regression
Target Release: RHGS 3.4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.12.2-8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1553598 (view as bug list) Environment:
Last Closed: 2018-09-04 06:42:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1553598, 1555161    
Bug Blocks: 1503137    

Description Prasad Desala 2018-02-20 05:57:45 UTC
Description of problem:
=======================
After rebalance is triggered on the volume, I am seeing ENOSPC errors for a few files in the rebalance logs, even though there is enough space left on the bricks.

[2018-02-20 05:31:01.051156] E [MSGID: 109023] [dht-rebalance.c:2749:gf_defrag_migrate_single_file] 0-distrepx3-dht: migrate-data on /linux-4.6.4/Documentation/ABI/testing/configfs-usb-gadget-serial failed: [No space left on device]
[2018-02-20 05:31:01.139027] E [MSGID: 109023] [dht-rebalance.c:2749:gf_defrag_migrate_single_file] 0-distrepx3-dht: migrate-data on /linux-4.6.4/Documentation/ABI/testing/debugfs-pktcdvd failed: [No space left on device]
[2018-02-20 05:31:01.769876] E [MSGID: 109023] [dht-rebalance.c:2749:gf_defrag_migrate_single_file] 0-distrepx3-dht: migrate-data on /linux-4.6.4/Documentation/ABI/testing/sysfs-bus-iio-meas-spec failed: [No space left on device]
[2018-02-20 05:31:03.973459] E [MSGID: 109023] [dht-rebalance.c:2749:gf_defrag_migrate_single_file] 0-distrepx3-dht: migrate-data on /linux-4.6.4/Documentation/ABI/testing/sysfs-ibft failed: [No space left on device]
[2018-02-20 05:31:09.066247] E [MSGID: 109023] [dht-rebalance.c:2749:gf_defrag_migrate_single_file] 0-distrepx3-dht: migrate-data on /linux-4.6.4/Documentation/DocBook/media/v4l/vidioc-g-fbuf.xml failed: [No space left on device]
[2018-02-20 05:31:13.597532] E [MSGID: 109023] [dht-rebalance.c:2749:gf_defrag_migrate_single_file] 0-distrepx3-dht: migrate-data on /linux-4.6.4/Documentation/RCU/trace.txt failed: [No space left on device]
[2018-02-20 05:31:24.303807] E [MSGID: 109023] [dht-rebalance.c:2749:gf_defrag_migrate_single_file] 0-distrepx3-dht: migrate-data on /linux-4.6.4/Documentation/cgroup-v1/cgroups.txt failed: [No space left on device]
[2018-02-20 05:31:24.992058] E [MSGID: 109023] [dht-rebalance.c:2749:gf_defrag_migrate_single_file] 0-distrepx3-dht: migrate-data on /linux-4.6.4/Documentation/connector/connector.txt failed: [No space left on device]
[2018-02-20 05:31:30.169573] E [MSGID: 109023] [dht-rebalance.c:2749:gf_defrag_migrate_single_file] 0-distrepx3-dht: migrate-data on /linux-4.6.4/Documentation/devicetree/bindings/arm/calxeda/l2ecc.txt failed: [No space left on device]
[2018-02-20 05:31:33.802256] E [MSGID: 109023] [dht-rebalance.c:2749:gf_defrag_migrate_single_file] 0-distrepx3-dht: migrate-data on /linux-4.6.4/Documentation/devicetree/bindings/arm/omap/crossbar.txt failed: [No space left on device]
[2018-02-20 05:31:39.996225] E [MSGID: 109023] [dht-rebalance.c:2749:gf_defrag_migrate_single_file] 0-distrepx3-dht: migrate-data on /linux-4.6.4/Documentation/devicetree/bindings/clock/ti/mux.txt failed: [No space left on device]
[2018-02-20 05:31:40.264191] E [MSGID: 109023] [dht-rebalance.c:2749:gf_defrag_migrate_single_file] 0-distrepx3-dht: migrate-data on /linux-4.6.4/Documentation/devicetree/bindings/clock/clk-palmas-clk32kg-clocks.txt failed: [No space left on device]
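
To gauge how many files are affected, the failures above can be pulled out of the rebalance log with a short script. This is a minimal sketch that assumes the log line layout shown in the excerpt; the function name `enospc_failures` and the sample list are illustrative and not part of GlusterFS.

```python
import re

# Matches the "migrate-data on <path> failed: [No space left on device]"
# error lines in the layout shown in the excerpt (assumed, not guaranteed).
ENOSPC_RE = re.compile(
    r"\] E \[MSGID: 109023\] .*migrate-data on (?P<path>.+) "
    r"failed: \[No space left on device\]"
)


def enospc_failures(lines):
    """Return the paths whose migration was logged as failing with ENOSPC."""
    paths = []
    for line in lines:
        match = ENOSPC_RE.search(line)
        if match:
            paths.append(match.group("path"))
    return paths


# Two lines copied from this report: one ENOSPC error, one skip notice.
sample = [
    "[2018-02-20 05:31:13.597532] E [MSGID: 109023] "
    "[dht-rebalance.c:2749:gf_defrag_migrate_single_file] 0-distrepx3-dht: "
    "migrate-data on /linux-4.6.4/Documentation/RCU/trace.txt failed: "
    "[No space left on device]",
    "[2018-02-20 05:33:05.127613] I [MSGID: 109126] "
    "[dht-rebalance.c:2714:gf_defrag_migrate_single_file] 0-distrepx3-dht: "
    "File migration skipped for /linux-4.6.4/Documentation/hsi.txt.",
]
print(enospc_failures(sample))
```

On a real system the same function could be fed the lines of /var/log/glusterfs/distrepx3-rebalance.log instead of the sample list.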

Grep output for a file in rebalance logs:

# grep -i /linux-4.6.4/Documentation/hsi.txt /var/log/glusterfs/distrepx3-rebalance.log 
[2018-02-20 05:33:04.980564] I [dht-rebalance.c:1513:dht_migrate_file] 0-distrepx3-dht: /linux-4.6.4/Documentation/hsi.txt: attempting to move from distrepx3-replicate-0 to distrepx3-replicate-2
[2018-02-20 05:33:05.096191] W [MSGID: 109023] [dht-rebalance.c:962:__dht_check_free_space] 0-distrepx3-dht: data movement of file {blocks:6 name:(/linux-4.6.4/Documentation/hsi.txt) } would result in dst node (distrepx3-replicate-2:41451088) having lower disk space then the source node (distrepx3-replicate-0:41453744).Skipping file.
[2018-02-20 05:33:05.127613] I [MSGID: 109126] [dht-rebalance.c:2714:gf_defrag_migrate_single_file] 0-distrepx3-dht: File migration skipped for /linux-4.6.4/Documentation/hsi.txt.
[2018-02-20 05:33:05.127739] E [MSGID: 109023] [dht-rebalance.c:2749:gf_defrag_migrate_single_file] 0-distrepx3-dht: migrate-data on /linux-4.6.4/Documentation/hsi.txt failed: [No space left on device]
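
The grep output suggests that the file was skipped by the free-space check and that the skip was then logged as an ENOSPC failure. The sketch below redoes the comparison with the numbers from the warning line. The comparison rule is an assumption inferred from the wording of the log message, not a copy of `__dht_check_free_space`, and `would_skip` is a hypothetical helper.

```python
import re

# Extracts the block count and the per-subvolume free-space figures from the
# warning text. The misspelling "then" is matched as it appears in the log.
WARN_RE = re.compile(
    r"\{blocks:(?P<blocks>\d+) name:\((?P<path>[^)]+)\) \} would result in "
    r"dst node \((?P<dst>[^:]+):(?P<dst_free>\d+)\) having lower disk space "
    r"then the source node \((?P<src>[^:]+):(?P<src_free>\d+)\)"
)


def would_skip(blocks, dst_free, src_free):
    """Assumed rule: skip the file if, after the move, the destination
    would have less free space than the source."""
    return (dst_free - blocks) < (src_free + blocks)


warning = (
    "data movement of file {blocks:6 name:(/linux-4.6.4/Documentation/hsi.txt) } "
    "would result in dst node (distrepx3-replicate-2:41451088) having lower "
    "disk space then the source node (distrepx3-replicate-0:41453744)."
    "Skipping file."
)

m = WARN_RE.search(warning)
blocks, dst_free, src_free = (
    int(m.group(key)) for key in ("blocks", "dst_free", "src_free")
)

print("free-space gap (src - dst):", src_free - dst_free)
print("skip under the assumed rule:", would_skip(blocks, dst_free, src_free))
```

With these figures the two subvolumes differ by 2656 units out of roughly 41 million, so the destination is far from full. Under the assumed rule the file is skipped to keep the subvolumes balanced, which fits the observation that the bricks had enough space when the ENOSPC error was logged.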


Version-Release number of selected component (if applicable):
3.12.2-4.el7rhgs.x86_64

How reproducible:
1/1

Steps to Reproduce:
===================
1) Create a x3 volume and start it.
2) FUSE mount it on multiple clients.
3) Run a Linux kernel untar from two clients.
4) While I/O is in progress, add bricks to the volume and start rebalance without force.
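
The steps above can be sketched as a sequence of gluster CLI calls. This is a dry-run illustration only: it builds and prints the commands rather than running them. The server names, brick paths, and mount point are hypothetical, and the brick count of the original setup is not stated in this report; only the volume name `distrepx3` comes from the logs.

```python
def repro_commands(volume, servers, old_brick, new_brick, mountpoint):
    """Build the CLI calls for steps 1, 2 and 4 as argv lists (not executed)."""
    initial = [f"{server}:{old_brick}" for server in servers]
    added = [f"{server}:{new_brick}" for server in servers]
    return [
        # Step 1: create and start a replica 3 volume.
        ["gluster", "volume", "create", volume, "replica", "3", *initial],
        ["gluster", "volume", "start", volume],
        # Step 2: FUSE mount (repeat on each client).
        ["mount", "-t", "glusterfs", f"{servers[0]}:/{volume}", mountpoint],
        # Step 3 (kernel untar on the clients) runs here, in parallel.
        # Step 4: add one more replica set, then rebalance without "force".
        ["gluster", "volume", "add-brick", volume, *added],
        ["gluster", "volume", "rebalance", volume, "start"],
    ]


# Hypothetical hosts and paths, for illustration only.
cmds = repro_commands(
    "distrepx3",
    ["server1", "server2", "server3"],
    "/bricks/b1",
    "/bricks/b2",
    "/mnt/distrepx3",
)
for argv in cmds:
    print(" ".join(argv))
```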

Actual results:
===============
ENOSPC errors are logged for a few files in the rebalance logs, even though there is enough space left on the bricks.

Expected results:
=================
No errors in rebalance logs.

Comment 11 Prasad Desala 2018-04-24 11:34:20 UTC
Verified this BZ on glusterfs version 3.12.2-8.el7rhgs.x86_64.

Followed the same steps as in the description; ENOSPC errors are no longer seen in the rebalance logs.

Moving this BZ to Verified.

Comment 13 errata-xmlrpc 2018-09-04 06:42:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607