Bug 814661

Summary: 'ls' on directory failed with 'Invalid argument' error message after add-brick, rebalance volume operations
Product: [Community] GlusterFS
Component: distribute
Version: mainline
Reporter: Shwetha Panduranga <shwetha.h.panduranga>
Assignee: shishir gowda <sgowda>
CC: gluster-bugs, nsathyan
Status: CLOSED DUPLICATE
Severity: high
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2012-05-14 04:00:32 UTC
Attachments:
fuse mount log file

Description Shwetha Panduranga 2012-04-20 11:02:48 UTC
Created attachment 578960 [details]
Attaching fuse mount log file

Description of problem:
------------------------
1) Removal of directories failed on a fuse mount while add-brick and rebalance operations were performed on a distribute-replicate volume.

2) Subsequent listing of files in the directory failed with an "Invalid argument" error message.

Version-Release number of selected component (if applicable):
3.3.0qa36

How reproducible:
often

create_dirs.sh:-
-------------------
#!/bin/bash

mountpoint=$(pwd)

mkdir "$mountpoint/deep_dirs"
cd "$mountpoint/deep_dirs"

for i in {1..100}; do
	level1_dir=$mountpoint/deep_dirs/level1.dir.$i
	mkdir "$level1_dir"
	cd "$level1_dir"
	for j in {1..50}; do
		mkdir "level2.dir.$j"
	done
	cd "$mountpoint/deep_dirs"
done

gfsc1.sh:-
----------
#!/bin/bash

mountpoint=$(pwd)

for i in {1..5}; do
	level1_dir=$mountpoint/fuse1.$i
	mkdir "$level1_dir"
	cd "$level1_dir"

	for j in {1..5}; do
		level2_dir=dir.$j
		mkdir "$level2_dir"
		cd "$level2_dir"

		for k in {1..5}; do
			echo "Creating File: $level1_dir/$level2_dir/file.$k"
			dd if=/dev/zero of=file.$k bs=${k}M count=1024
		done
		cd "$level1_dir"
	done
	cd "$mountpoint"
done

nfsc1.sh:-
----------
#!/bin/bash

mountpoint=$(pwd)

for i in {1..5}; do
	level1_dir=$mountpoint/nfs1.$i
	mkdir "$level1_dir"
	cd "$level1_dir"

	for j in {1..5}; do
		level2_dir=dir.$j
		mkdir "$level2_dir"
		cd "$level2_dir"

		for k in {1..5}; do
			echo "Creating File: $level1_dir/$level2_dir/file.$k"
			dd if=/dev/zero of=file.$k bs=${k}M count=1024
		done
		cd "$level1_dir"
	done
	cd "$mountpoint"
done

fs_perf_test.sh:-
----------------
#!/bin/bash

while true; do
	rm -rf ./sync_field/
	/usr/local/sbin/fs_perf 1024
done


Steps to Reproduce:
1. Create a distribute-replicate volume (2x2).
2. Create 2 fuse and 2 nfs mounts.
3. Run gfsc1.sh on fuse_mount1, create_dirs.sh on fuse_mount2, nfsc1.sh on nfs_mount1, and fs_perf_test.sh on nfs_mount2.
4. After create_dirs.sh has completed successfully, bring down 2 bricks, one from each replica pair.
5. rm -rf the deep_dirs directory (created by create_dirs.sh).
6. Add bricks to the volume.
7. Start rebalance.
8. Set self-heal-daemon off on the volume.
9. ls -l deep_dirs
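The steps above can be sketched with the gluster CLI. This is a minimal reproduction sketch, not the exact commands from this setup: the hostnames (server1/server2), brick paths, and mount points are placeholder assumptions, and it must be run against a live GlusterFS cluster.

```shell
# 1. Create a 2x2 distribute-replicate volume (placeholder hosts/paths)
gluster volume create dstore replica 2 \
    server1:/export1/dstore1 server2:/export1/dstore1 \
    server1:/export2/dstore1 server2:/export2/dstore1
gluster volume start dstore

# 2. One fuse mount and one NFS mount on a client (repeat for the second of each)
mount -t glusterfs server1:/dstore /mnt/fuse_mount1
mount -t nfs -o vers=3 server1:/dstore /mnt/nfs_mount1

# 4. Bring down one brick from each replica pair, e.g. by killing its
#    glusterfsd brick process on the server.

# 6-7. Expand the volume with a new replica pair and start rebalance
gluster volume add-brick dstore \
    server1:/export1/dstore2 server2:/export1/dstore2
gluster volume rebalance dstore start

# 8. Disable the self-heal daemon
gluster volume set dstore cluster.self-heal-daemon off
```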
  
Actual results:
[04/20/12 - 18:18:35 root@APP-CLIENT1 gfsc2]# rm -rf deep_dirs/
rm: cannot remove `deep_dirs': Directory not empty
[04/20/12 - 18:29:15 root@APP-CLIENT1 gfsc2]# ls -l deep_dirs/
ls: cannot access deep_dirs/level1.dir.73: Invalid argument
ls: cannot access deep_dirs/level1.dir.82: Invalid argument
ls: cannot access deep_dirs/level1.dir.83: Invalid argument
ls: cannot access deep_dirs/level1.dir.87: Invalid argument
ls: cannot access deep_dirs/level1.dir.89: Invalid argument
ls: cannot access deep_dirs/level1.dir.90: Invalid argument
ls: cannot access deep_dirs/level1.dir.92: Invalid argument
ls: cannot access deep_dirs/level1.dir.96: Invalid argument
total 300
drwxr-xr-x 52 root root 8192 Apr 20 18:28 level1.dir.100
??????????  ? ?    ?       ?            ? level1.dir.73
drwxr-xr-x 52 root root 8192 Apr 20 18:27 level1.dir.74
drwxr-xr-x 52 root root 8192 Apr 20 18:27 level1.dir.75
drwxr-xr-x 52 root root 8192 Apr 20 18:27 level1.dir.76
drwxr-xr-x 52 root root 8192 Apr 20 18:27 level1.dir.77
drwxr-xr-x 52 root root 8192 Apr 20 18:27 level1.dir.78
drwxr-xr-x 52 root root 8192 Apr 20 18:27 level1.dir.79
drwxr-xr-x 52 root root 8192 Apr 20 18:27 level1.dir.80
drwxr-xr-x 52 root root 8192 Apr 20 18:27 level1.dir.81
??????????  ? ?    ?       ?            ? level1.dir.82
??????????  ? ?    ?       ?            ? level1.dir.83
drwxr-xr-x 52 root root 8192 Apr 20 18:27 level1.dir.84
drwxr-xr-x 52 root root 8192 Apr 20 18:27 level1.dir.85
drwxr-xr-x 52 root root 8192 Apr 20 18:27 level1.dir.86
??????????  ? ?    ?       ?            ? level1.dir.87
drwxr-xr-x 52 root root 8192 Apr 20 18:27 level1.dir.88
??????????  ? ?    ?       ?            ? level1.dir.89
??????????  ? ?    ?       ?            ? level1.dir.90
drwxr-xr-x 52 root root 8192 Apr 20 18:28 level1.dir.91
??????????  ? ?    ?       ?            ? level1.dir.92
drwxr-xr-x 52 root root 8192 Apr 20 18:28 level1.dir.93
drwxr-xr-x 52 root root 8192 Apr 20 18:28 level1.dir.94
drwxr-xr-x 52 root root 8192 Apr 20 18:28 level1.dir.95
??????????  ? ?    ?       ?            ? level1.dir.96
drwxr-xr-x 52 root root 8192 Apr 20 18:28 level1.dir.97
drwxr-xr-x 52 root root 8192 Apr 20 18:28 level1.dir.98
drwxr-xr-x 52 root root 8192 Apr 20 18:28 level1.dir.99

Additional info:
--------------------
[04/20/12 - 21:40:49 root@APP-SERVER2 ~]# gluster volume info
 
Volume Name: dstore
Type: Distributed-Replicate
Volume ID: e8755038-e649-4525-96f9-b52357d00d99
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: 192.168.2.35:/export1/dstore1
Brick2: 192.168.2.36:/export1/dstore1
Brick3: 192.168.2.35:/export2/dstore1
Brick4: 192.168.2.36:/export2/dstore1
Brick5: 192.168.2.35:/export1/dstore2
Brick6: 192.168.2.36:/export1/dstore2
Options Reconfigured:
cluster.self-heal-daemon: off

Comment 1 shishir gowda 2012-04-24 06:10:25 UTC
This should be fixed as part of bug 802233.
Can you please verify?

Comment 2 shishir gowda 2012-05-07 06:14:18 UTC
Can you please check if the issue is fixed?

Comment 3 Shwetha Panduranga 2012-05-12 11:00:22 UTC
When rm -rf is in progress on the mount point and we perform add-brick and start rebalance, the following is the output of the "rm -rf" operation:


[root@AFR-Server1 gfsc2]# rm -rf deep_dirs/
rm: cannot remove `deep_dirs/level1.dir.25': Directory not empty
rm: cannot remove `deep_dirs/level1.dir.26': Directory not empty
rm: cannot remove `deep_dirs/level1.dir.27': Directory not empty
rm: cannot remove `deep_dirs/level1.dir.28': Directory not empty
rm: cannot remove `deep_dirs/level1.dir.29': Directory not empty
rm: cannot remove `deep_dirs/level1.dir.30': Directory not empty
rm: cannot remove `deep_dirs/level1.dir.31': Directory not empty
rm: cannot remove `deep_dirs/level1.dir.32': Directory not empty
rm: cannot remove `deep_dirs/level1.dir.33': Directory not empty

A subsequent 'ls -l deep_dirs' lists all the files which were not deleted by the "rm -rf" operation.

"ls -l" reporting "Invalid argument" is no longer seen on 3.3.0qa41.

Comment 4 shishir gowda 2012-05-14 04:00:32 UTC

*** This bug has been marked as a duplicate of bug 802233 ***