Bug 1297280 - tar on a glusterfs mount displays "file changed as we read it" even though the file was not changed
Status: CLOSED EOL
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.7.6
Hardware: x86_64 Linux
Priority: medium
Severity: medium
Assigned To: Krutika Dhananjay
Keywords: Triaged
Reported: 2016-01-11 00:34 EST by wuyl
Modified: 2017-03-08 05:57 EST (History)

Doc Type: Bug Fix
Clone Of: 1212842
Last Closed: 2017-03-08 05:57:24 EST
Type: Bug


Attachments
error report log of tar (11.18 KB, text/plain)
2016-01-11 00:34 EST, wuyl
Description wuyl 2016-01-11 00:34:13 EST
Created attachment 1113494 [details]
error report log of tar

We still see the "file changed as we read it" issue when running tar on a GlusterFS mount, version 3.7.6.


Description of problem:
-----------------------
same as Bug #1212842

The tar command on the GlusterFS mount point reports the following error on small files:

tar: ./linux-4.3.3/arch/cris/include/arch-v32/mach-a3/mach/hwregs/iop/asm: file changed as we read it

There were no changes made to the files while the tar command was in progress. The issue can be reproduced on a 3 x 2 volume.


volume information:

# gluster volume info new_volume
 
Volume Name: new_volume
Type: Distributed-Replicate
Volume ID: 1cb209b9-c174-4bac-8a31-245647d8bf46
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: DN-02:/native/brick02
Brick2: DN-03:/native/brick03
Brick3: DN-04:/native/brick04
Brick4: DN-05:/native/brick05
Brick5: DN-06:/native/brick06
Brick6: DN-07:/native/brick07
Options Reconfigured:
performance.io-thread-count: 2
cluster.consistent-metadata: on
cluster.quorum-type: auto
performance.cache-size: 3GB
performance.readdir-ahead: on



Version-Release number of selected component (if applicable):
--------------------------------------------------------------
# rpm -qa glusterfs
glusterfs-3.7.6-1.el6.x86_64


How reproducible:
------------------
Easily reproducible.


Steps to Reproduce:
--------------------
1. Create the volume with the options shown above and start it.
2. Mount the volume using FUSE.
3. On the mount point, download the Linux kernel, untar it, then tar it again:
# wget https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.3.3.tar.xz
# tar xf linux-4.3.3.tar.xz
# for i in {1..10}; do tar zcvf test$i.tar linux-4.3.3 2> tar$i.log; done

The errors can be seen every time.


Actual results:
----------------
While the tar command is continuously running, the above described messages are seen for many files.


Expected results:
------------------
tar is expected to run successfully without any warning.
Comment 1 wuyl 2016-01-11 00:49:51 EST
os info:

# lsb_release -a
LSB Version:	:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID:	CentOS
Description:	CentOS release 6.3 (Final)
Release:	6.3
Codename:	Final
Comment 2 Krutika Dhananjay 2016-01-13 02:00:22 EST
Hi,

Did you unmount and mount the volume again after enabling consistent-metadata and before running tar again on the volume? If not, could you just try that?
Here are the steps:

1) Enable consistent-metadata: # gluster volume set <VOL> cluster.consistent-metadata on

2) Unmount the volume

3) Remount the volume

4) Run tar.
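The steps above can be sketched as a shell sequence, using the volume and server names from the report; the mount point /mnt/glusterfs is a placeholder:

```shell
# 1) Enable consistent-metadata on the volume (run on a gluster server).
gluster volume set new_volume cluster.consistent-metadata on

# 2) Unmount and 3) remount the volume on the client so the option
#    takes effect in the client graph.
umount /mnt/glusterfs
mount -t glusterfs DN-02:/new_volume /mnt/glusterfs

# 4) Re-run tar on the mount point.
cd /mnt/glusterfs && tar zcvf test.tar linux-4.3.3 2> tar.log
```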

-Krutika
Comment 3 wuyl 2016-01-13 03:08:17 EST
(In reply to Krutika Dhananjay from comment #2)
> Hi,
> 
> Did you unmount and mount the volume again after enabling
> consistent-metadata and before running tar again on the volume? If not,
> could you just try that?
> Here are the steps:
> 
> 1) Enable consistent-metadata: # gluster volume set <VOL>
> cluster.consistent-metadata on
> 
> 2) Unmount the volume
> 
> 3) Remount the volume
> 
> 4) Run tar.
> 
> -Krutika

hello, Krutika:

yes, we unmount all clients, and then set the attributes. then remount again.

An interesting point is that after we remounted GlusterFS, the warnings were reduced significantly on the first run of the tar command. But from the 2nd through the 10th repeats of the tar, the number of warnings increased again.
Comment 4 Krutika Dhananjay 2016-01-14 06:09:06 EST
(In reply to wuyl from comment #3)
> (In reply to Krutika Dhananjay from comment #2)
> > Hi,
> > 
> > Did you unmount and mount the volume again after enabling
> > consistent-metadata and before running tar again on the volume? If not,
> > could you just try that?
> > Here are the steps:
> > 
> > 1) Enable consistent-metadata: # gluster volume set <VOL>
> > cluster.consistent-metadata on
> > 
> > 2) Unmount the volume
> > 
> > 3) Remount the volume
> > 
> > 4) Run tar.
> > 
> > -Krutika
> 
> hello, Krutika:
> 
> yes, we unmount all clients, and then set the attributes. then remount again.
> 
> an interesting point is that after we mount back glusterfs, for the first
> repeat of tar commmand, warnings are reduced significantly. But during 2nd
> to 10th repeats of the tar, the amounts of warning can be seen increased.

That's interesting.
I tried to recreate the bug on my setup, but without success.

Here are my steps:
1. Create a distributed replicated volume with bricks spread across multiple nodes and start and mount it using FUSE protocol.
2. Copy linux tar ball into the root of the volume and untar it.
3. Run tar on the linux source tree root.

====> Tar throws 'File changed as we read it' warnings for multiple files/directories. [EXPECTED]

4. Then I enabled consistent-metadata on the volume and ran tar again.

====> Tar threw 'File changed as we read it' for 3 directories. [EXPECTED]

5. Then I unmounted the volume and mounted it again and ran tar.

====> Tar ran to completion without any warnings.

6. I ran tar again just to see if during repeats the warnings are seen. But no, it ran just fine.

7. Created a few files under some of the directories in the volume from a different mount point and ran tar on the first mount point but it ran fine without any warnings.


Is there anything that you are doing differently here? If there are additional steps you are executing or have additional information about the test case (say if IO was in progress on the directory being archived while tar was running), could you please share that as well?

NOTE: If your intention is to merely make tar ignore these warnings, then you can use the following command line option while invoking tar: --warning=no-file-changed.

-Krutika
Comment 5 wuyl 2016-01-16 21:06:33 EST
Hi, Krutika:

We did unmount all clients before changing the settings. If we don't unmount the clients when modifying the volume config, GlusterFS reports the following error:

volume set: failed: One or more connected clients cannot support the feature being set. These clients need to be upgraded or disconnected before running this command again
Comment 6 wuyl 2016-01-19 06:23:06 EST
Hi, Krutika:

Thanks for the response. Another question: we are running jobs in a Grid Engine cluster, and we get the following errors from time to time.

error reason: 01/18/2016 13:50:36  [5014:7471]: can't stat() "/share/test/yileilih/pipeline_whole_genome_observation/work/da

We don't know whether the above error is related to GlusterFS; it looks as if the file name with path is too long.

The related part of the GlusterFS log is as follows:

[2016-01-18 05:50:19.138583] I [MSGID: 109036] [dht-common.c:7869:dht_log_new_layout_for_dir_selfheal] 0-new_volume-dht: Setting layout of /yileilih/pipeline_whole_genome_observation/work/da with [Subvol_name: new_volume-replicate-0, Err: -1 , Start: 0 , Stop: 1428805999 , Hash: 1 ], [Subvol_name: new_volume-replicate-1, Err: -1 , Start: 1428806000 , Stop: 2857611999 , Hash: 1 ], [Subvol_name: new_volume-replicate-2, Err: -1 , Start: 2857612000 , Stop: 4294967295 , Hash: 1 ], 
[2016-01-18 05:50:19.143423] I [MSGID: 109036] [dht-common.c:7869:dht_log_new_layout_for_dir_selfheal] 0-new_volume-dht: Setting layout of /yileilih/pipeline_whole_genome_observation/work/da/46c38e4e83fa2e10c3d33800042937 with [Subvol_name: new_volume-replicate-0, Err: -1 , Start: 2857612000 , Stop: 4294967295 , Hash: 1 ], [Subvol_name: new_volume-replicate-1, Err: -1 , Start: 0 , Stop: 1428805999 , Hash: 1 ], [Subvol_name: new_volume-replicate-2, Err: -1 , Start: 1428806000 , Stop: 2857611999 , Hash: 1 ], 

appreciate your help.
Comment 7 Krutika Dhananjay 2016-01-19 07:24:27 EST
(In reply to wuyl from comment #5)
> Hi, Krutika:
> 
> we did umount all clients before change settings. If we don't umount the
> clients, when modify volume config, glusterfs will report the following
> error:
> 
> volume set: failed: One or more connected clients cannot support the feature
> being set. These clients need to be upgraded or disconnected before running
> this command again

Hi,

I meant we MUST unmount the clients AFTER setting the option too, and then remount and run tar. Could you please try that as well?

Also, I'm curious, what version of GlusterFS are the clients running? The very fact that you saw the error - "One or more connected clients cannot support the feature being set. These clients need to be upgraded or disconnected before running this command again" - means that not all clients are at the same version as the servers.



With respect to the logs of the kind:

<logs>

[2016-01-18 05:50:19.138583] I [MSGID: 109036] [dht-common.c:7869:dht_log_new_layout_for_dir_selfheal] 0-new_volume-dht: Setting layout of /yileilih/pipeline_whole_genome_observation/work/da with [Subvol_name: new_volume-replicate-0, Err: -1 , Start: 0 , Stop: 1428805999 , Hash: 1 ], [Subvol_name: new_volume-replicate-1, Err: -1 , Start: 1428806000 , Stop: 2857611999 , Hash: 1 ], [Subvol_name: new_volume-replicate-2, Err: -1 , Start: 2857612000 , Stop: 4294967295 , Hash: 1 ], 
[2016-01-18 05:50:19.143423] I [MSGID: 109036] [dht-common.c:7869:dht_log_new_layout_for_dir_selfheal] 0-new_volume-dht: Setting layout of /yileilih/pipeline_whole_genome_observation/work/da/46c38e4e83fa2e10c3d33800042937 with [Subvol_name: new_volume-replicate-0, Err: -1 , Start: 2857612000 , Stop: 4294967295 , Hash: 1 ], [Subvol_name: new_volume-replicate-1, Err: -1 , Start: 0 , Stop: 1428805999 , Hash: 1 ], [Subvol_name: new_volume-replicate-2, Err: -1 , Start: 1428806000 , Stop: 2857611999 , Hash: 1 ], 

</logs>

... these messages are informational (indicated by the 'I' after the timestamp) and therefore are harmless. These messages can be ignored.

-Krutika
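(Editor's note, a sketch rather than part of the original thread: when scanning GlusterFS logs you can skip these informational entries by matching only the E and W severity letters that follow the timestamp. The log file path below is an assumption; substitute your mount's actual log.)

```shell
# Show only error (E) and warning (W) lines from a gluster client log,
# skipping informational (I) entries such as the DHT layout messages.
# /var/log/glusterfs/mnt-glusterfs.log is a placeholder path.
grep -E '^\[[^]]+\] [EW] ' /var/log/glusterfs/mnt-glusterfs.log
```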
Comment 8 wuyl 2016-01-21 07:34:20 EST
All of our clients and servers are running GlusterFS 3.7.6, installed via the yum repo.

As for the log, the following piece of info can be seen.

<logs>

[2016-01-18 21:41:37.736914] I [MSGID: 109036] [dht-common.c:7869:dht_log_new_layout_for_dir_selfheal] 0-new_volume-dht: Setting layout of /somebody/pipeline_whole_genome/work/44/fc2094365fdd1128e6a80453cea9dc with [Subvol_name: new_volume-replicate-0, Err: -1 , Start: 2145256000 , Stop: 4294967295 , Hash: 1 ], [Subvol_name: new_volume-replicate-1, Err: 28 , Start: 0 , Stop: 0 , Hash: 0 ], [Subvol_name: new_volume-replicate-2, Err: -1 , Start: 0 , Stop: 2145255999 , Hash: 1 ], 

</logs>

It is noted that there is an "Err: 28". What does it mean? We know that this log is just for informational purposes; we just want to get some more insight into our cluster. Thanks for your kind explanation.
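(Editor's note, not part of the original exchange: the Err field in these DHT layout logs carries a POSIX errno value, and errno 28 is ENOSPC, "No space left on device". A quick way to decode such values:)

```python
import errno
import os

# "Err: 28" in the DHT layout log is a POSIX errno value; decode it.
err = 28
print(errno.errorcode[err], "-", os.strerror(err))
# prints: ENOSPC - No space left on device
```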
Comment 9 wuyl 2016-01-25 22:34:23 EST
Hi, Krutika:

We used tar with "--warning=no-file-changed", and the warning is no longer displayed, but the system still seems to get an error.

For example, in our shell script:

<shell script>
tar zcf $1/work.tar.gz $1/work --warning=no-file-changed && rm -rf $1/work
</shell script>

The system does not execute the "rm -rf $1/work" part even though there is no warning from the first part, which I think is because the error is still there.

Any suggestion on how to make the script complete? Thanks.
Comment 10 Krutika Dhananjay 2016-01-25 23:42:37 EST
Hi,

So the good news is that we have recently identified a couple of bugs in DHT that can lead to the 'file changed as we read it' errors.

I want the following pieces of information from you:

1) The protocol that was used to mount the volume - FUSE/NFS/Samba?

2) When you recreate this issue next time, can you please find out whether the error was seen on files or directories or both and share that information?

-Krutika
Comment 11 wuyl 2016-01-27 00:38:21 EST
We tried both FUSE and NFS and get the warning all the time. We haven't tried Samba yet.

The warning is shown on both folders and files.
Comment 12 wuyl 2016-02-17 05:48:57 EST
Hi, Krutika:

By the way, what is your OS and version? Thanks.
Comment 13 Krutika Dhananjay 2016-02-17 10:13:05 EST
Hi,

I tried this on Fedora 22. What about you?
-Krutika
Comment 14 wuyl 2016-02-17 20:27:38 EST
CentOS 6.3

yilei
Comment 15 wuyl 2016-02-20 03:25:34 EST
Why, when I use the yum list command to show the GlusterFS packages, are they always shown in yellow or red?

<snapshot>
glusterfs.x86_64(in yellow color)                                         3.7.6-1.el6                             @/glusterfs-3.7.6-1.el6.x86_64               
glusterfs-api.x86_64(in yellow color)                                     3.7.6-1.el6                             @/glusterfs-api-3.7.6-1.el6.x86_64           
glusterfs-cli.x86_64(in yellow color)                                     3.7.6-1.el6                             @/glusterfs-cli-3.7.6-1.el6.x86_64           
glusterfs-client-xlators.x86_64(in red color)                          3.7.6-1.el6                             @/glusterfs-client-xlators-3.7.6-1.el6.x86_64
glusterfs-fuse.x86_64(in yellow color)                                    3.7.6-1.el6                             @/glusterfs-fuse-3.7.6-1.el6.x86_64          
glusterfs-libs.x86_64(in yellow color)                                    3.7.6-1.el6                             @/glusterfs-libs-3.7.6-1.el6.x86_64          
glusterfs-server.x86_64(in red color)                                  3.7.6-1.el6                             @/glusterfs-server-3.7.6-1.el6.x86_64
</snapshot>

thanks!
Comment 16 Kaushal 2017-03-08 05:57:24 EST
This bug is getting closed because GlusterFS-3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.
