Bug 1297280
Summary: tar on a glusterfs mount displays "file changed as we read it" even though the file was not changed
Product: [Community] GlusterFS
Component: replicate
Status: CLOSED EOL
Severity: medium
Priority: medium
Version: 3.7.6
Hardware: x86_64
OS: Linux
Reporter: wuyl <wuyl>
Assignee: Krutika Dhananjay <kdhananj>
CC: bugs, gkuri, kdhananj, skliarie+redhat-bugzilla, smohan, ssampat, wuyl
Keywords: Triaged
Doc Type: Bug Fix
Clone Of: 1212842
Type: Bug
Last Closed: 2017-03-08 10:57:24 UTC
Description
wuyl
2016-01-11 05:34:13 UTC
OS info:

# lsb_release -a
LSB Version:    :core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: CentOS
Description:    CentOS release 6.3 (Final)
Release:        6.3
Codename:       Final

Krutika Dhananjay:

Hi,

Did you unmount and mount the volume again after enabling consistent-metadata and before running tar again on the volume? If not, could you just try that? Here are the steps:

1) Enable consistent-metadata: # gluster volume set <VOL> cluster.consistent-metadata on
2) Unmount the volume
3) Remount the volume
4) Run tar.

-Krutika

wuyl:
(In reply to Krutika Dhananjay from comment #2)

Hello Krutika,

Yes, we unmounted all clients, then set the attribute, and then remounted.

An interesting point is that after we mount glusterfs back, warnings are reduced significantly on the first run of the tar command, but during the 2nd to 10th repeats of tar the number of warnings increases again.
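The behaviour wuyl describes (warning counts varying across repeated tar runs) can be measured rather than eyeballed. A minimal sketch, assuming GNU tar and a POSIX shell; `count_tar_warnings` is a hypothetical helper name, and the directory argument would be a path on the glusterfs mount:

```shell
# Hypothetical helper: run tar over a directory several times and report how
# many "file changed as we read it" warnings each pass emits. On a healthy
# local filesystem every pass should report 0.
count_tar_warnings() {
    dir=$1
    passes=$2
    i=1
    while [ "$i" -le "$passes" ]; do
        # Archive to /dev/null; only tar's diagnostics are of interest.
        n=$(tar cf /dev/null "$dir" 2>&1 \
            | grep -c 'file changed as we read it' || true)
        echo "pass $i: $n warnings"
        i=$((i + 1))
    done
}
```

For example, `count_tar_warnings /mnt/glustervol/some/dir 10` after remounting would show whether the count really climbs from the second pass onward.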
Krutika Dhananjay:
(In reply to wuyl from comment #3)

That's interesting. I tried to recreate the bug on my setup, in vain. Here are my steps:

1. Create a distributed replicated volume with bricks spread across multiple nodes, start it, and mount it using the FUSE protocol.
2. Copy a Linux tarball into the root of the volume and untar it.
3. Run tar on the root of the Linux source tree. ====> tar throws 'file changed as we read it' warnings for multiple files/directories. [EXPECTED]
4. Enable consistent-metadata on the volume and run tar again. ====> tar threw 'file changed as we read it' for 3 directories. [EXPECTED]
5. Unmount the volume, mount it again, and run tar. ====> tar ran to completion without any warnings.
6. Run tar again just to see whether the warnings appear during repeats. No, it ran just fine.
7. Create a few files under some of the directories in the volume from a different mount point and run tar on the first mount point. It ran fine, without any warnings.

Is there anything you are doing differently here? If you are executing additional steps, or have additional information about the test case (say, whether I/O was in progress on the directory being archived while tar was running), could you please share that as well?

NOTE: If your intention is merely to make tar ignore these warnings, you can pass the following command-line option when invoking tar: --warning=no-file-changed.

-Krutika

wuyl:

Hi Krutika,

We did unmount all clients before changing settings. If we don't unmount the clients, glusterfs reports the following error when we modify the volume config:

volume set: failed: One or more connected clients cannot support the feature being set. These clients need to be upgraded or disconnected before running this command again

wuyl:

Hi Krutika,

Thanks for the response. Another question: we run jobs on a grid engine cluster and get the following errors from time to time:

error reason: 01/18/2016 13:50:36 [5014:7471]: can't stat() "/share/test/yileilih/pipeline_whole_genome_observation/work/da

We don't know whether the above error is related to glusterfs; it seems the filename with its path is too long. The related part of the glusterfs log is as follows:

<logs>
[2016-01-18 05:50:19.138583] I [MSGID: 109036] [dht-common.c:7869:dht_log_new_layout_for_dir_selfheal] 0-new_volume-dht: Setting layout of /yileilih/pipeline_whole_genome_observation/work/da with [Subvol_name: new_volume-replicate-0, Err: -1 , Start: 0 , Stop: 1428805999 , Hash: 1 ], [Subvol_name: new_volume-replicate-1, Err: -1 , Start: 1428806000 , Stop: 2857611999 , Hash: 1 ], [Subvol_name: new_volume-replicate-2, Err: -1 , Start: 2857612000 , Stop: 4294967295 , Hash: 1 ],
[2016-01-18 05:50:19.143423] I [MSGID: 109036] [dht-common.c:7869:dht_log_new_layout_for_dir_selfheal] 0-new_volume-dht: Setting layout of /yileilih/pipeline_whole_genome_observation/work/da/46c38e4e83fa2e10c3d33800042937 with [Subvol_name: new_volume-replicate-0, Err: -1 , Start: 2857612000 , Stop: 4294967295 , Hash: 1 ], [Subvol_name: new_volume-replicate-1, Err: -1 , Start: 0 , Stop: 1428805999 , Hash: 1 ], [Subvol_name: new_volume-replicate-2, Err: -1 , Start: 1428806000 , Stop: 2857611999 , Hash: 1 ],
</logs>

Appreciate your help.

Krutika Dhananjay:
(In reply to wuyl from comment #5)

Hi,

I meant that we MUST unmount the clients AFTER setting the option too, and then remount and run tar. Could you please try that as well?

Also, I'm curious: what version of GlusterFS are the clients running? The very fact that you saw the error "One or more connected clients cannot support the feature being set. These clients need to be upgraded or disconnected before running this command again" means that not all clients are at the same version as the servers.

With respect to the dht_log_new_layout_for_dir_selfheal messages you quoted: they are informational (indicated by the 'I' after the timestamp) and therefore harmless. These messages can be ignored.

-Krutika

wuyl:

All of our clients and servers are running GlusterFS 3.7.6; we installed it via the yum repo.

As for the log, the following piece of info can also be seen:

<logs>
[2016-01-18 21:41:37.736914] I [MSGID: 109036] [dht-common.c:7869:dht_log_new_layout_for_dir_selfheal] 0-new_volume-dht: Setting layout of /somebody/pipeline_whole_genome/work/44/fc2094365fdd1128e6a80453cea9dc with [Subvol_name: new_volume-replicate-0, Err: -1 , Start: 2145256000 , Stop: 4294967295 , Hash: 1 ], [Subvol_name: new_volume-replicate-1, Err: 28 , Start: 0 , Stop: 0 , Hash: 0 ], [Subvol_name: new_volume-replicate-2, Err: -1 , Start: 0 , Stop: 2145255999 , Hash: 1 ],
</logs>

Note that there is an "Err: 28" there. What does it mean? We know this log is just for informational purposes; we would simply like some more information about our cluster. Thanks for your kind explanation.

wuyl:

Hi Krutika,

We used tar with "--warning=no-file-changed", and no warning is displayed anymore, but the system still seems to get an error. For example, in our shell script:

<shell script>
tar zcf $1/work.tar.gz $1/work --warning=no-file-changed && rm -rf $1/work
</shell script>

the system does not execute the "rm -rf $1/work" part even though there is no warning from the first part, which I think is because the error is still there. Any suggestion on how to make the script run to completion? Thanks.

Krutika Dhananjay:

Hi,

The good news is that we have recently identified a couple of bugs in DHT that can lead to the 'file changed as we read it' errors. I need the following pieces of information from you:

1) The protocol that was used to mount the volume: FUSE, NFS, or Samba?
2) When you recreate this issue next time, can you find out whether the error was seen on files, on directories, or on both, and share that information?

-Krutika

wuyl:

We tried both FUSE and NFS and get the warning all the time; we haven't tried Samba yet. The warning is shown on both folders and files.

wuyl:

Hi Krutika: by the way, what OS and version are you on? Thanks.

Krutika Dhananjay:

Hi,

I tried this on Fedora 22. What about you?

-Krutika

wuyl:

CentOS 6.3.

yilei

wuyl:

Why, when I use the yum list command to show the glusterfs packages, are they always shown in yellow or red?

<snapshot>
glusterfs.x86_64 (yellow)              3.7.6-1.el6  @/glusterfs-3.7.6-1.el6.x86_64
glusterfs-api.x86_64 (yellow)          3.7.6-1.el6  @/glusterfs-api-3.7.6-1.el6.x86_64
glusterfs-cli.x86_64 (yellow)          3.7.6-1.el6  @/glusterfs-cli-3.7.6-1.el6.x86_64
glusterfs-client-xlators.x86_64 (red)  3.7.6-1.el6  @/glusterfs-client-xlators-3.7.6-1.el6.x86_64
glusterfs-fuse.x86_64 (yellow)         3.7.6-1.el6  @/glusterfs-fuse-3.7.6-1.el6.x86_64
glusterfs-libs.x86_64 (yellow)         3.7.6-1.el6  @/glusterfs-libs-3.7.6-1.el6.x86_64
glusterfs-server.x86_64 (red)          3.7.6-1.el6  @/glusterfs-server-3.7.6-1.el6.x86_64
</snapshot>

Thanks!

Automated notice:

This bug is being closed because GlusterFS 3.7 has reached its end of life.

Note: this bug is being closed using a script. No verification has been performed to check whether it still exists in newer releases of GlusterFS. If this bug still exists in newer GlusterFS releases, please reopen it against the newer release.

The bug is still relevant on glusterfs 3.13 (as shipped in the Ubuntu PPA).
I traced it to differing timestamps on the file's replicas (it looks like glusterfs returns the timestamp of the real file from the local node).

stat on the first replica:

root@tframe0-atl:~# stat /mnt/opsfs/db/CMTS/CMTS_MnB-101824/CMTS_MnB-101824.tar.gz /mnt/opsfs-tf0/BRICK/db/CMTS/CMTS_MnB-101824/CMTS_MnB-101824.tar.gz
  File: '/mnt/opsfs/db/CMTS/CMTS_MnB-101824/CMTS_MnB-101824.tar.gz'
  Size: 19090142   Blocks: 37286   IO Block: 131072   regular file
Device: 8ah/138d   Inode: 9964002487893152032   Links: 1
Access: (0664/-rw-rw-r--)   Uid: ( 999/ ops)   Gid: ( 999/ ops)
Access: 2018-08-13 00:00:03.059002452 +0000
Modify: 2018-07-06 13:28:10.768794322 +0000
Change: 2018-07-06 13:28:14.340859975 +0000
 Birth: -
  File: '/mnt/opsfs-tf0/BRICK/db/CMTS/CMTS_MnB-101824/CMTS_MnB-101824.tar.gz'
  Size: 19090142   Blocks: 37296   IO Block: 4096   regular file
Device: fc02h/64514d   Inode: 1187765   Links: 2
Access: (0664/-rw-rw-r--)   Uid: ( 999/ ops)   Gid: ( 999/ ops)
Access: 2018-08-13 00:00:03.059002452 +0000
Modify: 2018-07-06 13:28:10.768794322 +0000
Change: 2018-07-06 13:28:14.340859975 +0000
 Birth: -

stat on the second replica:

root@tframe1-atl:~# stat /mnt/opsfs/db/CMTS/CMTS_MnB-101824/CMTS_MnB-101824.tar.gz /mnt/opsfs-tf0/BRICK/db/CMTS/CMTS_MnB-101824/CMTS_MnB-101824.tar.gz
  File: '/mnt/opsfs/db/CMTS/CMTS_MnB-101824/CMTS_MnB-101824.tar.gz'
  Size: 19090142   Blocks: 37286   IO Block: 131072   regular file
Device: b8h/184d   Inode: 9964002487893152032   Links: 1
Access: (0664/-rw-rw-r--)   Uid: ( 999/ ops)   Gid: ( 999/ ops)
Access: 2018-07-06 13:28:07.736870120 +0000
Modify: 2018-07-06 13:28:10.768794322 +0000
Change: 2018-07-06 13:28:14.340980075 +0000
 Birth: -
stat: cannot stat '/mnt/opsfs-tf0/BRICK/db/CMTS/CMTS_MnB-101824/CMTS_MnB-101824.tar.gz': No such file or directory

root@tframe1-atl:~# stat /mnt/opsfs/db/CMTS/CMTS_MnB-101824/CMTS_MnB-101824.tar.gz /mnt/opsfs-tf1/BRICK/db/CMTS/CMTS_MnB-101824/CMTS_MnB-101824.tar.gz
  File: '/mnt/opsfs/db/CMTS/CMTS_MnB-101824/CMTS_MnB-101824.tar.gz'
  Size: 19090142   Blocks: 37286   IO Block: 131072   regular file
Device: b8h/184d   Inode: 9964002487893152032   Links: 1
Access: (0664/-rw-rw-r--)   Uid: ( 999/ ops)   Gid: ( 999/ ops)
Access: 2018-07-06 13:28:07.736870120 +0000
Modify: 2018-07-06 13:28:10.768794322 +0000
Change: 2018-07-06 13:28:14.340980075 +0000
 Birth: -

After I reset the timestamps of the real files to be the same, tar stopped complaining. IMHO, glusterfs must self-heal the timestamps in such cases.

My mistake: the problem did not go away after making sure the access times are set. Secondly, there is a live bug report on this (a bug where lstat and fstat return different values): https://bugzilla.redhat.com/show_bug.cgi?id=1058526
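The stat output above shows identical Modify times but diverging Access and Change times between the two replicas. A small helper can flag that kind of skew without reading full stat dumps; this is a sketch assuming GNU coreutils stat, and `stamps_match` is a hypothetical name. It compares atime and mtime in epoch seconds (ctime also differs above, but it cannot be set directly, so it is left out here):

```shell
# Hypothetical helper: exit 0 only if every path given reports the same
# atime and mtime. Pointing it at the same file on each brick (or via
# mounts on different nodes) would make replica timestamp skew visible.
stamps_match() {
    ref=""
    for f in "$@"; do
        ts=$(stat -c '%X %Y' "$f")   # atime and mtime, seconds since epoch
        if [ -z "$ref" ]; then
            ref=$ts
        elif [ "$ts" != "$ref" ]; then
            echo "timestamp mismatch on $f: ($ts) vs ($ref)" >&2
            return 1
        fi
    done
    return 0
}
```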
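On the earlier unanswered question about "Err: 28" in the dht_log_new_layout_for_dir_selfheal line: the Err field carries a raw Linux errno value, and errno 28 is ENOSPC ("No space left on device"). In a layout message that would suggest DHT declined to place a hash range on that subvolume for lack of free space, though that reading of the log field is an inference, not something confirmed in this thread. The number itself can be decoded with a one-liner:

```shell
# Decode errno 28 using Python's errno tables (any Linux box with python3).
python3 -c 'import errno, os; print(errno.errorcode[28], "-", os.strerror(28))'
```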
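Finally, on the `tar ... && rm -rf` script from earlier in the thread: GNU tar exits with status 1 when a file changed while being read, and --warning=no-file-changed suppresses only the message, not the exit status, so the `&&` never fires. A sketch of a workaround under that assumption; `archive_and_remove` is a hypothetical helper name:

```shell
# Hypothetical wrapper around the script from the thread. GNU tar uses
# exit status 0 = success, 1 = some files changed/differ, 2 = fatal error.
# Treat 0 and 1 as "archived", and only then remove the source tree.
archive_and_remove() {
    base=$1
    tar zcf "$base/work.tar.gz" --warning=no-file-changed "$base/work"
    status=$?
    if [ "$status" -le 1 ]; then
        rm -rf "$base/work"
    else
        echo "tar failed with status $status; keeping $base/work" >&2
        return "$status"
    fi
}
```

That is, exit status 1 is treated as "archived with warnings" and the source is still removed, while a fatal error keeps it in place.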