Bug 1297280 - tar on a glusterfs mount displays "file changed as we read it" even though the file was not changed
Summary: tar on a glusterfs mount displays "file changed as we read it" even though the file was not changed
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.7.6
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Krutika Dhananjay
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-01-11 05:34 UTC by wuyl
Modified: 2018-08-14 12:52 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 1212842
Environment:
Last Closed: 2017-03-08 10:57:24 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
error report log of tar (11.18 KB, text/plain)
2016-01-11 05:34 UTC, wuyl

Description wuyl 2016-01-11 05:34:13 UTC
Created attachment 1113494 [details]
error report log of tar

Still seeing the "file changed as we read it" issue when running tar on glusterfs, version 3.7.6.


Description of problem:
-----------------------
same as Bug #1212842

The tar command on the glusterfs mount point is seen to report the following error on small files:

tar: ./linux-4.3.3/arch/cris/include/arch-v32/mach-a3/mach/hwregs/iop/asm: file changed as we read it

No changes were made to the files while the tar command was in progress. Can be reproduced on a 3 x 2 volume.


volume information:

# gluster volume info new_volume
 
Volume Name: new_volume
Type: Distributed-Replicate
Volume ID: 1cb209b9-c174-4bac-8a31-245647d8bf46
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: DN-02:/native/brick02
Brick2: DN-03:/native/brick03
Brick3: DN-04:/native/brick04
Brick4: DN-05:/native/brick05
Brick5: DN-06:/native/brick06
Brick6: DN-07:/native/brick07
Options Reconfigured:
performance.io-thread-count: 2
cluster.consistent-metadata: on
cluster.quorum-type: auto
performance.cache-size: 3GB
performance.readdir-ahead: on
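
For reference, a sketch of the CLI commands that would create and configure a volume with this layout (brick hosts and paths are taken from the output above; this is a reconstruction, not the reporter's exact history):

# gluster volume create new_volume replica 2 transport tcp \
      DN-02:/native/brick02 DN-03:/native/brick03 \
      DN-04:/native/brick04 DN-05:/native/brick05 \
      DN-06:/native/brick06 DN-07:/native/brick07
# gluster volume start new_volume
# gluster volume set new_volume performance.readdir-ahead on
# gluster volume set new_volume performance.cache-size 3GB
# gluster volume set new_volume cluster.quorum-type auto
# gluster volume set new_volume cluster.consistent-metadata on
# gluster volume set new_volume performance.io-thread-count 2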



Version-Release number of selected component (if applicable):
--------------------------------------------------------------
# rpm -qa glusterfs
glusterfs-3.7.6-1.el6.x86_64


How reproducible:
------------------
Easily reproducible.


Steps to Reproduce:
--------------------
1. Create the volume with the options shown above and start it.
2. Mount the volume using FUSE.
3. On the mount point, download the Linux kernel source, untar it, and then tar it again:
# wget https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.3.3.tar.xz; tar xf linux-4.3.3.tar.xz; for i in {1..10}; do tar zcvf test$i.tar linux-4.3.3 2> tar$i.log; done

The error can be seen every time.


Actual results:
----------------
While the tar command is continuously running, the above described messages are seen for many files.


Expected results:
------------------
tar is expected to run successfully without any warning.

Comment 1 wuyl 2016-01-11 05:49:51 UTC
os info:

# lsb_release -a
LSB Version:	:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID:	CentOS
Description:	CentOS release 6.3 (Final)
Release:	6.3
Codename:	Final

Comment 2 Krutika Dhananjay 2016-01-13 07:00:22 UTC
Hi,

Did you unmount and mount the volume again after enabling consistent-metadata and before running tar again on the volume? If not, could you just try that?
Here are the steps:

1) Enable consistent-metadata: # gluster volume set <VOL> cluster.consistent-metadata on

2) Unmount the volume

3) Remount the volume

4) Run tar.

-Krutika
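
The above steps as a shell sketch, assuming a FUSE mount (the mount point /mnt/new_volume is a placeholder; DN-02 is one of the brick hosts from the volume info):

# gluster volume set new_volume cluster.consistent-metadata on
# umount /mnt/new_volume
# mount -t glusterfs DN-02:/new_volume /mnt/new_volume
# cd /mnt/new_volume && tar zcvf test.tar linux-4.3.3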

Comment 3 wuyl 2016-01-13 08:08:17 UTC
(In reply to Krutika Dhananjay from comment #2)
> Hi,
> 
> Did you unmount and mount the volume again after enabling
> consistent-metadata and before running tar again on the volume? If not,
> could you just try that?
> Here are the steps:
> 
> 1) Enable consistent-metadata: # gluster volume set <VOL>
> cluster.consistent-metadata on
> 
> 2) Unmount the volume
> 
> 3) Remount the volume
> 
> 4) Run tar.
> 
> -Krutika

hello, Krutika:

Yes, we unmounted all clients, then set the attributes, and then remounted.

An interesting point is that after we remounted glusterfs, warnings were reduced significantly for the first run of the tar command. But during the 2nd to 10th repeats of tar, the number of warnings increased.

Comment 4 Krutika Dhananjay 2016-01-14 11:09:06 UTC
(In reply to wuyl from comment #3)
> (In reply to Krutika Dhananjay from comment #2)
> > Hi,
> > 
> > Did you unmount and mount the volume again after enabling
> > consistent-metadata and before running tar again on the volume? If not,
> > could you just try that?
> > Here are the steps:
> > 
> > 1) Enable consistent-metadata: # gluster volume set <VOL>
> > cluster.consistent-metadata on
> > 
> > 2) Unmount the volume
> > 
> > 3) Remount the volume
> > 
> > 4) Run tar.
> > 
> > -Krutika
> 
> hello, Krutika:
> 
> Yes, we unmounted all clients, then set the attributes, and then remounted.
> 
> An interesting point is that after we remounted glusterfs, warnings were
> reduced significantly for the first run of the tar command. But during the
> 2nd to 10th repeats of tar, the number of warnings increased.

That's interesting.
So I tried to recreate the bug on my setup, but in vain.

Here are my steps:
1. Create a distributed replicated volume with bricks spread across multiple nodes and start and mount it using FUSE protocol.
2. Copy linux tar ball into the root of the volume and untar it.
3. Run tar on the linux source tree root.

====> Tar throws 'File changed as we read it' warnings for multiple files/directories. [EXPECTED]

4. Then I enabled consistent-metadata on the volume and ran tar again.

====> Tar threw 'File changed as we read it' for 3 directories. [EXPECTED]

5. Then I unmounted the volume and mounted it again and ran tar.

====> Tar ran to completion without any warnings.

6. I ran tar again just to see if during repeats the warnings are seen. But no, it ran just fine.

7. Created a few files under some of the directories in the volume from a different mount point and ran tar on the first mount point but it ran fine without any warnings.


Is there anything that you are doing differently here? If there are additional steps you are executing or have additional information about the test case (say if IO was in progress on the directory being archived while tar was running), could you please share that as well?

NOTE: If your intention is to merely make tar ignore these warnings, then you can use the following command line option while invoking tar: --warning=no-file-changed.
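
For example, combined with the reproduction command from the description:

# tar zcvf test1.tar linux-4.3.3 --warning=no-file-changed 2> tar1.log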

-Krutika

Comment 5 wuyl 2016-01-17 02:06:33 UTC
Hi, Krutika:

We did unmount all clients before changing the settings. If we don't unmount the clients when modifying the volume config, glusterfs reports the following error:

volume set: failed: One or more connected clients cannot support the feature being set. These clients need to be upgraded or disconnected before running this command again

Comment 6 wuyl 2016-01-19 11:23:06 UTC
Hi, Krutika:

Thanks for the response. Another question: we are running jobs in a Grid Engine cluster, and we get the following errors from time to time.

error reason: 01/18/2016 13:50:36  [5014:7471]: can't stat() "/share/test/yileilih/pipeline_whole_genome_observation/work/da

We don't know whether the above error is related to glusterfs; it seems like the filename with its path is too long.

The related part of the glusterfs log is as follows:

[2016-01-18 05:50:19.138583] I [MSGID: 109036] [dht-common.c:7869:dht_log_new_layout_for_dir_selfheal] 0-new_volume-dht: Setting layout of /yileilih/pipeline_whole_genome_observation/work/da with [Subvol_name: new_volume-replicate-0, Err: -1 , Start: 0 , Stop: 1428805999 , Hash: 1 ], [Subvol_name: new_volume-replicate-1, Err: -1 , Start: 1428806000 , Stop: 2857611999 , Hash: 1 ], [Subvol_name: new_volume-replicate-2, Err: -1 , Start: 2857612000 , Stop: 4294967295 , Hash: 1 ], 
[2016-01-18 05:50:19.143423] I [MSGID: 109036] [dht-common.c:7869:dht_log_new_layout_for_dir_selfheal] 0-new_volume-dht: Setting layout of /yileilih/pipeline_whole_genome_observation/work/da/46c38e4e83fa2e10c3d33800042937 with [Subvol_name: new_volume-replicate-0, Err: -1 , Start: 2857612000 , Stop: 4294967295 , Hash: 1 ], [Subvol_name: new_volume-replicate-1, Err: -1 , Start: 0 , Stop: 1428805999 , Hash: 1 ], [Subvol_name: new_volume-replicate-2, Err: -1 , Start: 1428806000 , Stop: 2857611999 , Hash: 1 ], 

appreciate your help.

Comment 7 Krutika Dhananjay 2016-01-19 12:24:27 UTC
(In reply to wuyl from comment #5)
> Hi, Krutika:
> 
> we did umount all clients before change settings. If we don't umount the
> clients, when modify volume config, glusterfs will report the following
> error:
> 
> volume set: failed: One or more connected clients cannot support the feature
> being set. These clients need to be upgraded or disconnected before running
> this command again

Hi,

I meant we MUST unmount the clients AFTER setting the option too, and then remount and run tar. Could you please try that as well?

Also, I'm curious, what version of GlusterFS are the clients running? The very fact that you saw the error - "One or more connected clients cannot support the feature being set. These clients need to be upgraded or disconnected before running this command again" - means that not all clients are at the same version as the servers.



With respect to the logs of the kind:

<logs>

[2016-01-18 05:50:19.138583] I [MSGID: 109036] [dht-common.c:7869:dht_log_new_layout_for_dir_selfheal] 0-new_volume-dht: Setting layout of /yileilih/pipeline_whole_genome_observation/work/da with [Subvol_name: new_volume-replicate-0, Err: -1 , Start: 0 , Stop: 1428805999 , Hash: 1 ], [Subvol_name: new_volume-replicate-1, Err: -1 , Start: 1428806000 , Stop: 2857611999 , Hash: 1 ], [Subvol_name: new_volume-replicate-2, Err: -1 , Start: 2857612000 , Stop: 4294967295 , Hash: 1 ], 
[2016-01-18 05:50:19.143423] I [MSGID: 109036] [dht-common.c:7869:dht_log_new_layout_for_dir_selfheal] 0-new_volume-dht: Setting layout of /yileilih/pipeline_whole_genome_observation/work/da/46c38e4e83fa2e10c3d33800042937 with [Subvol_name: new_volume-replicate-0, Err: -1 , Start: 2857612000 , Stop: 4294967295 , Hash: 1 ], [Subvol_name: new_volume-replicate-1, Err: -1 , Start: 0 , Stop: 1428805999 , Hash: 1 ], [Subvol_name: new_volume-replicate-2, Err: -1 , Start: 1428806000 , Stop: 2857611999 , Hash: 1 ], 

</logs>

... these messages are informational (indicated by the 'I' after the timestamp) and therefore are harmless. These messages can be ignored.

-Krutika
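
As a rough sketch, warnings and errors (as opposed to informational messages like the above) can be picked out of the client log with grep; the log file name below is only an example and depends on the mount point:

# grep -E '\] [EW] \[' /var/log/glusterfs/mnt-new_volume.log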

Comment 8 wuyl 2016-01-21 12:34:20 UTC
All of our clients and servers are running GlusterFS 3.7.6, installed via the yum repo.

As for the log, the following piece of info can also be seen.

<logs>

[2016-01-18 21:41:37.736914] I [MSGID: 109036] [dht-common.c:7869:dht_log_new_layout_for_dir_selfheal] 0-new_volume-dht: Setting layout of /somebody/pipeline_whole_genome/work/44/fc2094365fdd1128e6a80453cea9dc with [Subvol_name: new_volume-replicate-0, Err: -1 , Start: 2145256000 , Stop: 4294967295 , Hash: 1 ], [Subvol_name: new_volume-replicate-1, Err: 28 , Start: 0 , Stop: 0 , Hash: 0 ], [Subvol_name: new_volume-replicate-2, Err: -1 , Start: 0 , Stop: 2145255999 , Hash: 1 ], 

</logs>

We noticed that there is an "Err: 28"; what does it mean? We know that this log is just for informational purposes, and we just want to get some more info about our cluster. Thanks for your kind explanation.

Comment 9 wuyl 2016-01-26 03:34:23 UTC
Hi, Krutika:

We used tar with "--warning=no-file-changed", and no warning is displayed anymore. But it seems the system still gets an error.

For example, in our shell script:

<shell script>
tar zcf $1/work.tar.gz $1/work --warning=no-file-changed && rm -rf $1/work
</shell script>

The system does not execute the "rm -rf $1/work" part even though there is no warning from the first part, which I think is because the error is still there.

Any suggestion on how to make the script finish? Thanks.
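
A minimal sketch of one way to let the cleanup step run anyway, relying on GNU tar's documented exit codes (0 = success, 1 = some files changed while being archived, 2 = fatal error); note that --warning=no-file-changed only suppresses the message, not the non-zero exit status:

<shell script>
# Archive the work directory; suppress the "file changed as we read it" message.
tar zcf "$1/work.tar.gz" "$1/work" --warning=no-file-changed
rc=$?
# Treat exit status 1 (a file changed while being read) as acceptable;
# only status 2 or higher indicates a fatal tar error.
if [ "$rc" -eq 0 ] || [ "$rc" -eq 1 ]; then
    rm -rf "$1/work"
fi
</shell script>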

Comment 10 Krutika Dhananjay 2016-01-26 04:42:37 UTC
Hi,

So the good news is that we have recently identified a couple of bugs in DHT that can lead to the 'file changed as we read it' errors.

I want the following pieces of information from you:

1) The protocol that was used to mount the volume - FUSE/NFS/Samba?

2) When you recreate this issue next time, can you please find out whether the error was seen on files or directories or both and share that information?

-Krutika

Comment 11 wuyl 2016-01-27 05:38:21 UTC
We tried both FUSE and NFS and get the warning all the time. We haven't tried Samba yet.

The warning is shown for both folders and files.

Comment 12 wuyl 2016-02-17 10:48:57 UTC
Hi, Krutika:

By the way, what is your OS and version? Thanks.

Comment 13 Krutika Dhananjay 2016-02-17 15:13:05 UTC
Hi,

I tried this on Fedora 22. What about you?
-Krutika

Comment 14 wuyl 2016-02-18 01:27:38 UTC
CentOS 6.3

yilei

Comment 15 wuyl 2016-02-20 08:25:34 UTC
Why, when I use the yum list command to show the glusterfs packages, are these glusterfs packages always shown in yellow or red?

<snapshot>
glusterfs.x86_64(in yellow color)                                         3.7.6-1.el6                             @/glusterfs-3.7.6-1.el6.x86_64               
glusterfs-api.x86_64(in yellow color)                                     3.7.6-1.el6                             @/glusterfs-api-3.7.6-1.el6.x86_64           
glusterfs-cli.x86_64(in yellow color)                                     3.7.6-1.el6                             @/glusterfs-cli-3.7.6-1.el6.x86_64           
glusterfs-client-xlators.x86_64(in red color)                          3.7.6-1.el6                             @/glusterfs-client-xlators-3.7.6-1.el6.x86_64
glusterfs-fuse.x86_64(in yellow color)                                    3.7.6-1.el6                             @/glusterfs-fuse-3.7.6-1.el6.x86_64          
glusterfs-libs.x86_64(in yellow color)                                    3.7.6-1.el6                             @/glusterfs-libs-3.7.6-1.el6.x86_64          
glusterfs-server.x86_64(in red color)                                  3.7.6-1.el6                             @/glusterfs-server-3.7.6-1.el6.x86_64
</snapshot>

thanks!

Comment 16 Kaushal 2017-03-08 10:57:24 UTC
This bug is being closed because GlusterFS-3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.

Comment 17 Arie Skliarouk 2018-08-13 12:49:08 UTC
The bug is still relevant on glusterfs 3.13 (as shipped in the Ubuntu PPA).

I traced it to different timestamps on the file replicas (it looks like glusterfs returns the timestamp of the real file on the local node):

stat on the first replica:

root@tframe0-atl:~# stat /mnt/opsfs/db/CMTS/CMTS_MnB-101824/CMTS_MnB-101824.tar.gz /mnt/opsfs-tf0/BRICK/db/CMTS/CMTS_MnB-101824/CMTS_MnB-101824.tar.gz 
  File: '/mnt/opsfs/db/CMTS/CMTS_MnB-101824/CMTS_MnB-101824.tar.gz'
  Size: 19090142        Blocks: 37286      IO Block: 131072 regular file
Device: 8ah/138d        Inode: 9964002487893152032  Links: 1
Access: (0664/-rw-rw-r--)  Uid: (  999/     ops)   Gid: (  999/     ops)
Access: 2018-08-13 00:00:03.059002452 +0000
Modify: 2018-07-06 13:28:10.768794322 +0000
Change: 2018-07-06 13:28:14.340859975 +0000
 Birth: -
  File: '/mnt/opsfs-tf0/BRICK/db/CMTS/CMTS_MnB-101824/CMTS_MnB-101824.tar.gz'
  Size: 19090142        Blocks: 37296      IO Block: 4096   regular file
Device: fc02h/64514d    Inode: 1187765     Links: 2
Access: (0664/-rw-rw-r--)  Uid: (  999/     ops)   Gid: (  999/     ops)
Access: 2018-08-13 00:00:03.059002452 +0000
Modify: 2018-07-06 13:28:10.768794322 +0000
Change: 2018-07-06 13:28:14.340859975 +0000
 Birth: -

stat on the second replica:

root@tframe1-atl:~# stat /mnt/opsfs/db/CMTS/CMTS_MnB-101824/CMTS_MnB-101824.tar.gz /mnt/opsfs-tf0/BRICK/db/CMTS/CMTS_MnB-101824/CMTS_MnB-101824.tar.gz 
  File: '/mnt/opsfs/db/CMTS/CMTS_MnB-101824/CMTS_MnB-101824.tar.gz'
  Size: 19090142        Blocks: 37286      IO Block: 131072 regular file
Device: b8h/184d        Inode: 9964002487893152032  Links: 1
Access: (0664/-rw-rw-r--)  Uid: (  999/     ops)   Gid: (  999/     ops)
Access: 2018-07-06 13:28:07.736870120 +0000
Modify: 2018-07-06 13:28:10.768794322 +0000
Change: 2018-07-06 13:28:14.340980075 +0000
 Birth: -
stat: cannot stat '/mnt/opsfs-tf0/BRICK/db/CMTS/CMTS_MnB-101824/CMTS_MnB-101824.tar.gz': No such file or directory
root@tframe1-atl:~# stat /mnt/opsfs/db/CMTS/CMTS_MnB-101824/CMTS_MnB-101824.tar.gz /mnt/opsfs-tf1/BRICK/db/CMTS/CMTS_MnB-101824/CMTS_MnB-101824.tar.gz 
  File: '/mnt/opsfs/db/CMTS/CMTS_MnB-101824/CMTS_MnB-101824.tar.gz'
  Size: 19090142        Blocks: 37286      IO Block: 131072 regular file
Device: b8h/184d        Inode: 9964002487893152032  Links: 1
Access: (0664/-rw-rw-r--)  Uid: (  999/     ops)   Gid: (  999/     ops)
Access: 2018-07-06 13:28:07.736870120 +0000
Modify: 2018-07-06 13:28:10.768794322 +0000
Change: 2018-07-06 13:28:14.340980075 +0000
 Birth: -

After I reset the timestamps of the real files to be the same, tar stopped complaining.

IMHO, glusterfs must self-heal the timestamp in such cases.

Comment 18 Arie Skliarouk 2018-08-14 12:52:37 UTC
My mistake, the problem did not go away after making sure the access times were set to the same values.

Secondly, there is a live bug report on this (a bug where lstat and fstat return different values):
https://bugzilla.redhat.com/show_bug.cgi?id=1058526

