| Summary: | file broken.. | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Yasuya Ichikawa <yasuya.ichikawa> |
| Component: | glusterd | Assignee: | shishir gowda <sgowda> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.1.3 | CC: | amarts, gluster-bugs, nsathyan, vijay |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Hi, please issue this command on the client machines:
'echo 3 > /proc/sys/vm/drop_caches'
and issue this command on the mount point:
'find . | xargs stat'
If the issue still persists, please try disabling and re-enabling the stat-prefetch xlator with the following commands:
'gluster volume set test stat-prefetch off' followed by 'gluster volume set rep stat-prefetch on'
Please let us know if this fixes the issue at hand. Could you also provide more information about the operations/steps that led to this issue?

Closing the bug as there has been no response to the last post.

I'm sorry that I couldn't reply; my glusterfs environment was not available. I tried your advice, but it did not resolve the problem.

This is the client log:
client:/a/brick01/xxxxxx/item# ls -al ./739/10498739/0624/
ls: cannot access ./739/10498739/0624/img936012067904.jpg: No such file or directory
ls: cannot access ./739/10498739/0624/img674019172034.jpg: No such file or directory
ls: cannot access ./739/10498739/0624/img936012067217.jpg: No such file or directory
ls: cannot access ./739/10498739/0624/img6740191723305.jpg: No such file or directory
ls: cannot access ./739/10498739/0624/img9360120674726.jpg: No such file or directory
ls: cannot access ./739/10498739/0624/img9360120675818.jpg: No such file or directory
ls: cannot access ./739/10498739/0624/img6740191728085.jpg: No such file or directory
total 204
drwxr-sr-x 2 20102 group 4096 Jun 24 21:39 .
drwxr-sr-x 4 20102 group 4096 Jun 24 19:41 ..
?????????? ? ? ? ? ? img674019172034.jpg
?????????? ? ? ? ? ? img6740191723305.jpg
-rw-r--r-- 1 20102 group 31233 Jun 24 21:39 img6740191728059.jpg
?????????? ? ? ? ? ? img6740191728085.jpg
-rw-r--r-- 1 20102 group 93827 Jun 24 21:39 img6740191728978.jpg
?????????? ? ? ? ? ? img936012067217.jpg
?????????? ? ? ? ? ? img9360120674726.jpg
?????????? ? ? ? ? ? img9360120675818.jpg
-rw-r--r-- 1 20102 group 48420 Jun 24 19:41 img9360120677749.jpg
?????????? ? ? ? ? ? img936012067904.jpg
These are the client settings:
client:/a/brick01/xxxxxx/item# cat /proc/sys/vm/drop_caches
3
client:/a/brick01/xxxxxx/item# cat /usr/local/glusterfs-3.2.0/etc/glusterfs/xxxxxx.vol
#------------------------------------------------------------------------
#
# brick settings
#
#------------------------------------------------------------------------
volume disk1
type protocol/client
option transport-type tcp/client
option remote-host gluster-server01
option ping-timeout 5
option remote-subvolume /brick01
end-volume
volume disk2
type protocol/client
option transport-type tcp/client
option remote-host gluster-server02
option ping-timeout 5
option remote-subvolume /brick01
end-volume
#------------------------------------------------------------------------
#
# replicate settings
#
#------------------------------------------------------------------------
volume replicate1
type cluster/replicate
subvolumes disk1 disk2
end-volume
#------------------------------------------------------------------------
#
# performance settings
#
#------------------------------------------------------------------------
volume cache
type performance/io-cache
option cache-size 256MB
subvolumes replicate1
end-volume
volume writeback
type performance/write-behind
option cache-size 128MB
subvolumes cache
end-volume
volume quickread
type performance/quick-read
option cache-timeout 1
option max-file-size 512KB
subvolumes writeback
end-volume
volume iothreads
type performance/io-threads
option thread-count 16
subvolumes quickread
end-volume
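For context, a hand-written client volfile like the one above is passed to the glusterfs FUSE client at mount time. A minimal sketch, reusing the volfile path and mount point that appear later in this report:

```
# Mount the volume on the client from the hand-written volfile
# (volfile path and mount point taken from this report; adjust as needed).
glusterfs -f /usr/local/glusterfs-3.2.0/etc/glusterfs/xxxxxx.vol /a/brick01/xxxxxx

# Confirm the FUSE mount is in place
mount | grep /a/brick01/xxxxxx
```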
This is the server-side output:
[root@gluster-server01 item]# getfattr -m . -d -e hex ./739/10498739/0624/img6740191728085.jpg
# file: 739/10498739/0624/img6740191728085.jpg
trusted.gfid=0xc7daf1f183d946dcaca210b30dc26d1e
[root@gluster-server02 item]# getfattr -m . -d -e hex ./739/10498739/0624/img6740191728085.jpg
getfattr: ./739/10498739/0624/img6740191728085.jpg: No such file or directory
These are the server settings:
[root@gluster-server01 item]# gluster volume info
Volume Name: xxxxxx
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: gluster-server01:/brick01
Brick2: gluster-server02:/brick01
Options Reconfigured:
performance.stat-prefetch: off
diagnostics.dump-fd-stats: off
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
performance.io-thread-count: 64
performance.cache-size: 6GB
network.ping-timeout: 5
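For reference, the values under 'Options Reconfigured' above are set from any server in the trusted pool with the gluster CLI. A minimal sketch, assuming the volume name xxxxxx from this report:

```
# Apply/confirm volume options on the glusterd-managed volume;
# these keys mirror the 'Options Reconfigured' list above.
gluster volume set xxxxxx performance.stat-prefetch off
gluster volume set xxxxxx network.ping-timeout 5
gluster volume set xxxxxx performance.cache-size 6GB

# Verify the options took effect
gluster volume info xxxxxx
```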
What should I do?
Please give me some advice.
My environment is as follows:
+-->[gluster-server01]<--+
[server]---+ +---[client]
+---[gluster-server02]<--+
* Write path
[server] -> [gluster-server01]
* Read path
[client] -> [gluster-server01]
[client] -> [gluster-server02]
I couldn't read files larger than 128KB from the client.

client:~# ls -lh /a/brick01/xxxxxx/nsys/
ls: cannot access /a/brick01/xxxxxx/data.129: Invalid argument
ls: cannot access /a/brick01/xxxxxx/data.128: Invalid argument
ls: cannot access /a/brick01/xxxxxx/data.130: Invalid argument
-rw-r--r-- 1 root root 121K Jun 30 12:50 data.121
-rw-r--r-- 1 root root 122K Jun 30 12:52 data.122
-rw-r--r-- 1 root root 123K Jun 30 12:50 data.123
-rw-r--r-- 1 root root 124K Jun 30 12:50 data.124
-rw-r--r-- 1 root root 125K Jun 30 12:50 data.125
-rw-r--r-- 1 root root 126K Jun 30 12:50 data.126
-rw-r--r-- 1 root root 127K Jun 30 12:52 data.127
?????????? ? ? ? ? ? data.128
?????????? ? ? ? ? ? data.129
?????????? ? ? ? ? ? data.130
> My environment is as following.
>
> +-->[gluster-server01]<--+
> [server]---+ +---[client]
> +---[gluster-server02]<--+
>
>
> * Write path
> [server] -> [gluster-server01]
>
> * Read path
> [client] -> [gluster-server01]
> [client] -> [gluster-server02]
What is the 'server' here, and is it accessing the storage via a glusterfs mount? GlusterFS doesn't support writing directly to the backend.
I understand what you mean, but those problems also happened when I wrote via FUSE.
This is a write log from client to server:
client:~# mount | grep brick
/usr/local/glusterfs-3.2.0/etc/glusterfs/xxxxxx.vol on /a/brick01/xxxxxx type fuse.glusterfs (rw,allow_other,default_permissions,max_read=131072)
client:~# cd /a/brick01/xxxxxx/yyyy;pwd
/a/brick01/xxxxxx/yyyy
client:/a/brick01/xxxxxx/yyyy# for x in 121 122 123 124 125 126 127 128 129 130; do dd if=/dev/zero of=./data.${x} bs=1024 count=${x}; done
121+0 records in
121+0 records out
123904 bytes (124 kB) copied, 0.00740512 s, 16.7 MB/s
122+0 records in
122+0 records out
124928 bytes (125 kB) copied, 0.00693275 s, 18.0 MB/s
123+0 records in
123+0 records out
125952 bytes (126 kB) copied, 0.00667576 s, 18.9 MB/s
124+0 records in
124+0 records out
126976 bytes (127 kB) copied, 0.00650061 s, 19.5 MB/s
125+0 records in
125+0 records out
128000 bytes (128 kB) copied, 0.00780486 s, 16.4 MB/s
126+0 records in
126+0 records out
129024 bytes (129 kB) copied, 0.00731015 s, 17.6 MB/s
127+0 records in
127+0 records out
130048 bytes (130 kB) copied, 0.00669811 s, 19.4 MB/s
128+0 records in
128+0 records out
131072 bytes (131 kB) copied, 0.00661765 s, 19.8 MB/s
129+0 records in
129+0 records out
132096 bytes (132 kB) copied, 0.00749507 s, 17.6 MB/s
130+0 records in
130+0 records out
133120 bytes (133 kB) copied, 0.00710567 s, 18.7 MB/s
client:/a/brick01/xxxxxx/yyyy# ls -lh data.1*
ls: cannot access data.128: Invalid argument <-- broken
ls: cannot access data.129: Invalid argument <-- broken
ls: cannot access data.130: Invalid argument <-- broken
-rw-r--r-- 1 root root 121K Jun 30 14:40 data.121
-rw-r--r-- 1 root root 122K Jun 30 14:40 data.122
-rw-r--r-- 1 root root 123K Jun 30 14:40 data.123
-rw-r--r-- 1 root root 124K Jun 30 14:40 data.124
-rw-r--r-- 1 root root 125K Jun 30 14:40 data.125
-rw-r--r-- 1 root root 126K Jun 30 14:40 data.126
-rw-r--r-- 1 root root 127K Jun 30 14:40 data.127
Hi Yasuya,

We do not support writing to the backend directly (bypassing the client); this creates inconsistencies. As your log shows, one backend has the data while the other does not:

[root@gluster-server01 item]# getfattr -m . -d -e hex ./739/10498739/0624/img6740191728085.jpg
# file: 739/10498739/0624/img6740191728085.jpg
trusted.gfid=0xc7daf1f183d946dcaca210b30dc26d1e <<-- present only here, because you wrote to the backend of server01 directly

[root@gluster-server02 item]# getfattr -m . -d -e hex ./739/10498739/0624/img6740191728085.jpg
getfattr: ./739/10498739/0624/img6740191728085.jpg: No such file or directory

The client protocol (replicate) is what actually takes care of mirroring the data onto both servers. Please restart all your tests and do not write to the backend directly. One way to recover the data is to write the data (copy the file) to the mount point and, after completion, delete the older copy on the backend to which you had written directly; a rough sketch of this sequence follows below. Please let me know if I can go ahead and close this bug.

Closing the bug as I have not heard back from the originator. The issue was writing to one of the replica backends directly, which is not supported.

I tried the recommended configuration, but the same situation continued.
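As a rough illustration of the recovery procedure described above (not an exact prescription): copy the file back in through the GlusterFS mount, then remove the stale copy from the brick that was written to directly. The source path below is a placeholder; the destination and brick paths reuse the ones shown in this report.

```
# 1) On the client: copy a known-good copy of the file in through the
#    GlusterFS mount so the replicate xlator writes it to both bricks.
cp /path/to/good/copy/img6740191728085.jpg \
   /a/brick01/xxxxxx/item/739/10498739/0624/img6740191728085.jpg

# 2) Only after the copy completes, on gluster-server01 (the backend that
#    was written to directly), remove the stale copy from the brick,
#    e.g. from the same directory used for getfattr above:
rm ./739/10498739/0624/img6740191728085.jpg
```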
Finally, I resolved this problem with the following settings.
#------------------------------------------------------------------------
#
# performance settings
#
#------------------------------------------------------------------------
volume cache
type performance/io-cache
option cache-size 256MB
subvolumes replicate1
end-volume
volume writeback
type performance/write-behind
option cache-size 128MB
subvolumes cache
end-volume
#volume quickread
# type performance/quick-read
# option cache-timeout 1
# option max-file-size 512KB
# subvolumes writeback
#end-volume
volume iothreads
type performance/io-threads
option thread-count 16
subvolumes writeback
end-volume
When quick-read was set, glusterfs could not write files over 128KB.
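Side note: on a glusterd-managed volume (rather than a hand-written volfile like the one above), the closest equivalent to commenting out the quick-read section would presumably be disabling the translator through a volume option; a minimal sketch, assuming the volume name xxxxxx:

```
# Disable the quick-read translator on the managed volume, mirroring the
# volfile change above that removed the quickread section.
gluster volume set xxxxxx performance.quick-read off
```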
Thank you for helping us.
Hello. I found an error.

o client side
E [client3_1-fops.c:1898:client3_1_lookup_cbk] 0-: error
E [client3_1-fops.c:1898:client3_1_lookup_cbk] 0-: error
W [fuse-bridge.c:184:fuse_entry_cbk] 0-glusterfs-fuse: 17: LOOKUP() /file => -1 (Invalid argument)

o server side
E [server.c:67:gfs_serialize_reply] 0-: Failed to encode message

This is the result of 'ls':

[root@client01 ~]# ls -la /test
total 92
drwxr-xr-x 6 root root 4096 Mar 30 13:46 .
drwxr-xr-x 4 root root 4096 Mar 28 10:56 ..
?--------- ? ? ? ? ? file <- this
drwx------ 2 root root 16384 Mar 30 09:51 lost+found
-rw-r--r-- 1 root root 10 Mar 30 13:46 test
-rw-r--r-- 1 root root 12502 Mar 29 20:01 log

This is the result of 'gluster volume info all':

[root@server01 01]# gluster volume info all
Volume Name: test
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: server01:/brick/01
Brick2: server02:/brick/01
Options Reconfigured:
network.ping-timeout: 5

This is the result of 'getfattr' on both bricks:

[root@server01 01]# getfattr -m . -d -e hex ./file
# file: file
trusted.afr.brick1=0x000000000000000000000000
trusted.afr.brick2=0x000000000000000000000000
trusted.gfid=0x7c6cfbbb220940b096b4cd1710d015ea

[root@server02 01]# getfattr -m . -d -e hex ./file
# file: file
trusted.afr.brick1=0x000000000000000000000000
trusted.afr.brick2=0x000000000000000000000000
trusted.gfid=0x7c6cfbbb220940b096b4cd1710d015ea

This is other information:

OS: CentOS 5.5 (x86_64)
FileSystem: ext3
Kernel Version: 2.6.18-194.3.1.el5xen
Glusterfs Version: 3.1.3

Please tell me how to repair this file.
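A first step, in line with the advice given earlier in this report, is to drop the client-side caches and force a fresh lookup of the affected path from the mount point so the replicate xlator can re-check both copies. A minimal sketch, assuming /test is the GlusterFS mount on client01:

```
# On the client: drop kernel caches so stale/negative lookups are discarded
echo 3 > /proc/sys/vm/drop_caches

# Force fresh lookups on the mount point; on replicate volumes this
# re-triggers the lookup (and self-heal checks) for the broken entry
find /test | xargs stat > /dev/null
stat /test/file
```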