Bug 764359 (GLUSTER-2627)

Summary: file broken..
Product: [Community] GlusterFS
Reporter: Yasuya Ichikawa <yasuya.ichikawa>
Component: glusterd
Assignee: shishir gowda <sgowda>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Version: 3.1.3
CC: amarts, gluster-bugs, nsathyan, vijay
Hardware: x86_64   
OS: Linux   

Description Yasuya Ichikawa 2011-03-30 01:57:18 EDT
Hello.

I found an error.

  o client side
    E [client3_1-fops.c:1898:client3_1_lookup_cbk] 0-: error
    E [client3_1-fops.c:1898:client3_1_lookup_cbk] 0-: error
    W [fuse-bridge.c:184:fuse_entry_cbk] 0-glusterfs-fuse: 17: LOOKUP() /file => -1 (Invalid argument)

  o server side
    E [server.c:67:gfs_serialize_reply] 0-: Failed to encode message


This is the output of 'ls':

  [root@client01 ~]# ls -la /test
  total 92
  drwxr-xr-x 6 root root  4096 Mar 30 13:46 .
  drwxr-xr-x 4 root root  4096 Mar 28 10:56 ..
  ?--------- ? ?    ?        ?            ? file        <- this
  drwx------ 2 root root 16384 Mar 30 09:51 lost+found
  -rw-r--r-- 1 root root    10 Mar 30 13:46 test
  -rw-r--r-- 1 root root 12502 Mar 29 20:01 log


This is the output of 'gluster volume info all':

  [root@server01 01]# gluster volume info all
  Volume Name: test
  Type: Replicate
  Status: Started
  Number of Bricks: 2
  Transport-type: tcp
  Bricks:
  Brick1: server01:/brick/01
  Brick2: server02:/brick/01
  Options Reconfigured:
  network.ping-timeout: 5


This is the output of 'getfattr' on each server:

  [root@server01 01]# getfattr -m . -d -e hex ./file
  # file: file
  trusted.afr.brick1=0x000000000000000000000000
  trusted.afr.brick2=0x000000000000000000000000
  trusted.gfid=0x7c6cfbbb220940b096b4cd1710d015ea

  [root@server02 01]# getfattr -m . -d -e hex ./file
  # file: file
  trusted.afr.brick1=0x000000000000000000000000
  trusted.afr.brick2=0x000000000000000000000000
  trusted.gfid=0x7c6cfbbb220940b096b4cd1710d015ea


This is other information:

  OS: CentOS 5.5 (x86_64)
  FileSystem: ext3
  Kernel Version: 2.6.18-194.3.1.el5xen
  Glusterfs Version: 3.1.3


Please tell me how to repair this file.
Comment 1 shishir gowda 2011-04-05 00:09:20 EDT
Hi,

Please issue this command on the client machines:
'echo 3 > /proc/sys/vm/drop_caches'

And issue this command on the mount point:
'find . | xargs stat'

If the issue still persists, please try to disable and re-enable the stat-prefetch xlator
using the following commands:

'gluster volume set test stat-prefetch off'

followed by

'gluster volume set test stat-prefetch on'

Please let us know if this fixes the issue at hand.
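
For reference, a minimal sketch that strings these suggestions together; it assumes the volume is named 'test' and is mounted at /test as in the report above:

  # on each client: drop kernel caches, then stat everything under the mount
  echo 3 > /proc/sys/vm/drop_caches
  (cd /test && find . | xargs stat > /dev/null)
  # if the entries are still broken, toggle stat-prefetch on the volume
  # (run on one of the servers):
  gluster volume set test stat-prefetch off
  gluster volume set test stat-prefetch on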

Could you please provide more information as to what operations/steps led to this issue?
Comment 2 shishir gowda 2011-04-10 23:53:00 EDT
Closing the bug as there has been no response to the last post.
Comment 3 Yasuya Ichikawa 2011-06-29 04:09:43 EDT
I'm sorry I couldn't reply earlier; my glusterfs environment was not available.

I tried your advice, but it did not resolve the problem.

This is the 'ls' output on the client:

  client:/a/brick01/xxxxxx/item# ls -al  ./739/10498739/0624/
  ls: cannot access ./739/10498739/0624/img936012067904.jpg: No such file or directory
  ls: cannot access ./739/10498739/0624/img674019172034.jpg: No such file or directory
  ls: cannot access ./739/10498739/0624/img936012067217.jpg: No such file or directory
  ls: cannot access ./739/10498739/0624/img6740191723305.jpg: No such file or directory
  ls: cannot access ./739/10498739/0624/img9360120674726.jpg: No such file or directory
  ls: cannot access ./739/10498739/0624/img9360120675818.jpg: No such file or directory
  ls: cannot access ./739/10498739/0624/img6740191728085.jpg: No such file or directory
  total 204
  drwxr-sr-x 2 20102 group  4096 Jun 24 21:39 .
  drwxr-sr-x 4 20102 group  4096 Jun 24 19:41 ..
  ?????????? ? ?     ?       ?            ? img674019172034.jpg
  ?????????? ? ?     ?       ?            ? img6740191723305.jpg
  -rw-r--r-- 1 20102 group 31233 Jun 24 21:39 img6740191728059.jpg
  ?????????? ? ?     ?       ?            ? img6740191728085.jpg
  -rw-r--r-- 1 20102 group 93827 Jun 24 21:39 img6740191728978.jpg
  ?????????? ? ?     ?       ?            ? img936012067217.jpg
  ?????????? ? ?     ?       ?            ? img9360120674726.jpg
  ?????????? ? ?     ?       ?            ? img9360120675818.jpg
  -rw-r--r-- 1 20102 group 48420 Jun 24 19:41 img9360120677749.jpg
  ?????????? ? ?     ?       ?            ? img936012067904.jpg


These are the client settings:

  client:/a/brick01/xxxxxx/item# cat /proc/sys/vm/drop_caches
  3
  client:/a/brick01/xxxxxx/item# cat /usr/local/glusterfs-3.2.0/etc/glusterfs/xxxxxx.vol
  #------------------------------------------------------------------------
  #
  # brick settings
  #
  #------------------------------------------------------------------------
  volume disk1
      type protocol/client
      option transport-type tcp/client
      option remote-host gluster-server01
      option ping-timeout 5
      option remote-subvolume /brick01
  end-volume
  
  volume disk2
      type protocol/client
      option transport-type tcp/client
      option remote-host gluster-server02
      option ping-timeout 5
      option remote-subvolume /brick01
  end-volume
  
  #------------------------------------------------------------------------
  #
  # replicate settings
  #
  #------------------------------------------------------------------------
  volume replicate1
     type cluster/replicate
     subvolumes disk1 disk2
  end-volume
  
  #------------------------------------------------------------------------
  #
  # performance settings
  #
  #------------------------------------------------------------------------
  volume cache
    type performance/io-cache
    option cache-size 256MB
    subvolumes replicate1
  end-volume
  
  volume writeback
    type performance/write-behind
    option cache-size 128MB
    subvolumes cache
  end-volume
  
  volume quickread
    type performance/quick-read
    option cache-timeout 1
    option max-file-size 512KB
    subvolumes writeback
  end-volume
  
  volume iothreads
    type performance/io-threads
    option thread-count 16
    subvolumes quickread
  end-volume


This is the getfattr output on each server:

  [root@gluster-server01 item]# getfattr -m . -d -e hex ./739/10498739/0624/img6740191728085.jpg
  # file: 739/10498739/0624/img6740191728085.jpg
  trusted.gfid=0xc7daf1f183d946dcaca210b30dc26d1e

  [root@gluster-server02 item]# getfattr -m . -d -e hex ./739/10498739/0624/img6740191728085.jpg
  getfattr: ./739/10498739/0624/img6740191728085.jpg: No such file or directory


These are the server settings:

  [root@gluster-server01 item]# gluster volume info
  Volume Name: xxxxxx
  Type: Replicate
  Status: Started
  Number of Bricks: 2
  Transport-type: tcp
  Bricks:
  Brick1: gluster-server01:/brick01
  Brick2: gluster-server02:/brick01
  Options Reconfigured:
  performance.stat-prefetch: off
  diagnostics.dump-fd-stats: off
  diagnostics.count-fop-hits: on
  diagnostics.latency-measurement: on
  performance.io-thread-count: 64
  performance.cache-size: 6GB
  network.ping-timeout: 5


What should I do?
Please give me some advice.

My environment is as follows:

             +-->[gluster-server01]<--+
  [server]---+                        +---[client]
             +---[gluster-server02]<--+


  * Write path
    [server] -> [gluster-server01]

  * Read path
    [client] -> [gluster-server01]
    [client] -> [gluster-server02]
Comment 4 Yasuya Ichikawa 2011-06-29 21:02:45 EDT
I couldn't read files of 128KB or larger from the client.

client:~# ls -lh /a/brick01/xxxxxx/nsys/
ls: cannot access /a/brick01/xxxxxx/data.129: Invalid argument
ls: cannot access /a/brick01/xxxxxx/data.128: Invalid argument
ls: cannot access /a/brick01/xxxxxx/data.130: Invalid argument
-rw-r--r--  1 root root 121K Jun 30 12:50 data.121
-rw-r--r--  1 root root 122K Jun 30 12:52 data.122
-rw-r--r--  1 root root 123K Jun 30 12:50 data.123
-rw-r--r--  1 root root 124K Jun 30 12:50 data.124
-rw-r--r--  1 root root 125K Jun 30 12:50 data.125
-rw-r--r--  1 root root 126K Jun 30 12:50 data.126
-rw-r--r--  1 root root 127K Jun 30 12:52 data.127
??????????  ? ?          ?          ?            ? data.128
??????????  ? ?          ?          ?            ? data.129
??????????  ? ?          ?          ?            ? data.130
Comment 5 Amar Tumballi 2011-06-29 21:36:09 EDT
> My environment is as following.
> 
>              +-->[gluster-server01]<--+
>   [server]---+                        +---[client]
>              +---[gluster-server02]<--+
> 
> 
>   * Write path
>     [server] -> [gluster-server01]
> 
>   * Read path
>     [client] -> [gluster-server01]
>     [client] -> [gluster-server02]

What is the 'server' here, and is it accessing the storage via a glusterfs mount? GlusterFS doesn't support writing directly to the backend.
Comment 6 Yasuya Ichikawa 2011-06-29 22:44:04 EDT
I understand what you mean, but those problems also happened when I wrote via fuse.

This is a write log from client to server:

client:~# mount | grep brick
/usr/local/glusterfs-3.2.0/etc/glusterfs/xxxxxx.vol on /a/brick01/xxxxxx type fuse.glusterfs (rw,allow_other,default_permissions,max_read=131072)

client:~# cd /a/brick01/xxxxxx/yyyy;pwd
/a/brick01/xxxxxx/yyyy

client:/a/brick01/xxxxxx/yyyy# for x in 121 122 123 124 125 126 127 128 129 130; do   dd if=/dev/zero of=./data.${x} bs=1024 count=${x}; done
121+0 records in
121+0 records out
123904 bytes (124 kB) copied, 0.00740512 s, 16.7 MB/s
122+0 records in
122+0 records out
124928 bytes (125 kB) copied, 0.00693275 s, 18.0 MB/s
123+0 records in
123+0 records out
125952 bytes (126 kB) copied, 0.00667576 s, 18.9 MB/s
124+0 records in
124+0 records out
126976 bytes (127 kB) copied, 0.00650061 s, 19.5 MB/s
125+0 records in
125+0 records out
128000 bytes (128 kB) copied, 0.00780486 s, 16.4 MB/s
126+0 records in
126+0 records out
129024 bytes (129 kB) copied, 0.00731015 s, 17.6 MB/s
127+0 records in
127+0 records out
130048 bytes (130 kB) copied, 0.00669811 s, 19.4 MB/s
128+0 records in
128+0 records out
131072 bytes (131 kB) copied, 0.00661765 s, 19.8 MB/s
129+0 records in
129+0 records out
132096 bytes (132 kB) copied, 0.00749507 s, 17.6 MB/s
130+0 records in
130+0 records out
133120 bytes (133 kB) copied, 0.00710567 s, 18.7 MB/s

client:/a/brick01/xxxxxx/yyyy# ls -lh data.1*
ls: cannot access data.128: Invalid argument <-- broken
ls: cannot access data.129: Invalid argument <-- broken
ls: cannot access data.130: Invalid argument <-- broken
-rw-r--r-- 1 root root 121K Jun 30 14:40 data.121
-rw-r--r-- 1 root root 122K Jun 30 14:40 data.122
-rw-r--r-- 1 root root 123K Jun 30 14:40 data.123
-rw-r--r-- 1 root root 124K Jun 30 14:40 data.124
-rw-r--r-- 1 root root 125K Jun 30 14:40 data.125
-rw-r--r-- 1 root root 126K Jun 30 14:40 data.126
-rw-r--r-- 1 root root 127K Jun 30 14:40 data.127
Comment 7 shishir gowda 2011-07-05 22:52:31 EDT
Hi Yasuya,

We do not support writing to the backend directly (bypassing the client). This creates inconsistencies. As your log shows, one of the backends has the data, while the other does not:

  [root@gluster-server01 item]# getfattr -m . -d -e hex ./739/10498739/0624/img6740191728085.jpg
  # file: 739/10498739/0624/img6740191728085.jpg
  trusted.gfid=0xc7daf1f183d946dcaca210b30dc26d1e
  <<---- this exists because you wrote to the backend server01 directly.

  [root@gluster-server02 item]# getfattr -m . -d -e hex ./739/10498739/0624/img6740191728085.jpg
  getfattr: ./739/10498739/0624/img6740191728085.jpg: No such file or directory

The replicate translator on the client side takes care of mirroring the data on both servers.

Please restart all your tests, and do not write to the backend directly.

One way to recover the data is to write the data (copy the file) to the mount point, and after completion, delete the older copy on the backend to which you had written directly.
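
For illustration only, a rough sketch of that recovery for the file shown above; the source path, the recovered file name and the exact brick path are assumptions, so adjust them to your setup:

  # on the client, through the FUSE mount (never the brick), copy the data
  # back in, here under a new name so it does not clash with the broken entry
  cp /some/source/img6740191728085.jpg \
     /a/brick01/xxxxxx/item/739/10498739/0624/img6740191728085.recovered.jpg
  # then, on gluster-server01, remove the stale copy that was written
  # directly to the brick
  rm /brick01/item/739/10498739/0624/img6740191728085.jpg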

Please let me know if I can go ahead and close this bug.
Comment 8 shishir gowda 2011-07-12 23:23:45 EDT
Closing the bug as I have not heard back from the originator of the bug.

The issue was writing to one of the replica backend directly, which is not supported.
Comment 9 Yasuya Ichikawa 2011-07-13 20:35:49 EDT
I tried the recommended configuration, but the same situation persisted.
Finally I resolved this problem with the following settings.

  #------------------------------------------------------------------------
  #
  # performance settings
  #
  #------------------------------------------------------------------------
  volume cache
    type performance/io-cache
    option cache-size 256MB
    subvolumes replicate1
  end-volume

  volume writeback
    type performance/write-behind
    option cache-size 128MB
    subvolumes cache
  end-volume

  #volume quickread
  #  type performance/quick-read
  #  option cache-timeout 1
  #  option max-file-size 512KB
  #  subvolumes writeback
  #end-volume

  volume iothreads
    type performance/io-threads
    option thread-count 16
    subvolumes writeback
  end-volume

When quick-read was enabled, files of 128KB or larger written through the mount could not be accessed afterwards.
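
For what it's worth, a quick way to confirm the fix is to rerun the dd loop from comment 6 and list the results (the check.* names are only illustrative):

  client:/a/brick01/xxxxxx/yyyy# for x in 127 128 129 130; do dd if=/dev/zero of=./check.${x} bs=1024 count=${x}; done
  client:/a/brick01/xxxxxx/yyyy# ls -lh check.1*
  # with quick-read removed, check.128, check.129 and check.130 should now
  # list normally instead of failing with "Invalid argument"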
Thank you for helping us.