Bug 1047449

Summary: io-cache: Warning message "vol-io-cache: page error for page = 0x7ff570559550 & waitq = 0x7ff570055130" in client logs filling up root partition
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: glusterfs
Version: 2.1
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: spandura
Assignee: Raghavendra G <rgowdapp>
QA Contact: spandura
CC: grajaiya, vagarwal, vbellur
Keywords: ZStream
Target Release: RHGS 2.1.2
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-3.4.0.54rhs-2.el6rhs
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-02-25 08:13:21 UTC
Bug Blocks: 1048084

Description spandura 2013-12-31 07:32:43 UTC
Description of problem:
========================
On a 2 x 3 distributed-replicate volume, we were running fs_sanity from 2 fuse mounts. After some time the Bonnie test (part of fs_sanity) failed on both mount points, and the client log filled with the following warning messages, growing to 12GB and in turn filling up the root file system:

[2013-12-30 10:11:50.538894] W [fuse-bridge.c:2618:fuse_readv_cbk] 0-glusterfs-fuse: 57274844: READ => -1 (Input/output error)
[2013-12-30 10:11:50.538937] W [page.c:991:__ioc_page_error] 0-vol-io-cache: page error for page = 0x7ff5705d8150 & waitq = 0x7ff571214620


Failure on Mount1:
-----------------
executing bonnie
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...Can't write block.: Transport endpoint is not connected
Can't sync file.
Can't write block.: Transport endpoint is not connected
Can't sync file.
Can't write block.: Transport endpoint is not connected
Can't sync file.
Can't write block.: Transport endpoint is not connected
Can't sync file.

Failure on Mount2:
------------------
executing bonnie
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...Bonnie: drastic I/O error (re-write read): Transport endpoint is not connected
Can't read a full block, only got 4096 bytes.
Can't read a full block, only got 4096 bytes.
Can't read a full block, only got 4096 bytes.
Can't read a full block, only got 4096 bytes.


Version-Release number of selected component (if applicable):
============================================================
glusterfs 3.4.0.52rhs built on Dec 19 2013 12:20:16

How reproducible:


Steps to Reproduce:
=====================
1. Create a 2 x 3 distributed-replicate volume, "enable" the volume option "linux-aio", and start the volume.

2. From the client node, create 2 fuse mounts.

3. Run "fs_sanity" on both mounts (a command sketch follows these steps):

      a. Create an NFS mount of /opt from rhsqe-repo.lab.eng.blr.redhat.com
( rhsqe-repo.lab.eng.blr.redhat.com:/opt on /opt type nfs (rw,vers=4,addr=10.70.34.52,clientaddr=10.70.34.93) )

      b. Execute the command to run fs_sanity: "cd /opt/qa/tools/system_light ; ./run.sh -w /mnt/vol/ -l /root/fs_sanity_log.log"
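
A rough, untested sketch of the above steps, using the server hostnames and brick paths from the volume info in "Additional info" below; the client mount points (/mnt/vol, /mnt/vol2) and the second log file name are assumptions for illustration only:

      # On one of the servers: create the 2 x 3 volume, enable linux-aio, start it
      gluster volume create vol replica 3 \
          dj:/rhs/brick1/b1 fan:/rhs/brick1/b1-rep1 mia:/rhs/brick1/b1-rep2 \
          dj:/rhs/brick1/b2 fan:/rhs/brick1/b2-rep1 mia:/rhs/brick1/b2-rep2
      gluster volume set vol storage.linux-aio enable
      gluster volume start vol

      # On the client: two fuse mounts of the same volume (mount points assumed)
      mkdir -p /mnt/vol /mnt/vol2
      mount -t glusterfs dj:/vol /mnt/vol
      mount -t glusterfs dj:/vol /mnt/vol2

      # NFS-mount the tools repo and run fs_sanity against each mount
      mount -t nfs -o vers=4 rhsqe-repo.lab.eng.blr.redhat.com:/opt /opt
      cd /opt/qa/tools/system_light
      ./run.sh -w /mnt/vol/  -l /root/fs_sanity_log.log
      ./run.sh -w /mnt/vol2/ -l /root/fs_sanity_log2.log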

Actual results:
===============
Warning messages from io-cache fill the client log, which in turn fills up the root file system.


Additional info:
===================

[root@fan ~]# gluster v info
 
Volume Name: vol
Type: Distributed-Replicate
Volume ID: 6136c5d9-9b40-4503-aa4a-11ff3da44e88
Status: Started
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: dj:/rhs/brick1/b1
Brick2: fan:/rhs/brick1/b1-rep1
Brick3: mia:/rhs/brick1/b1-rep2
Brick4: dj:/rhs/brick1/b2
Brick5: fan:/rhs/brick1/b2-rep1
Brick6: mia:/rhs/brick1/b2-rep2
Options Reconfigured:
storage.linux-aio: enable


[root@fan ~]# 
[root@fan ~]# gluster v status
Status of volume: vol
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick dj:/rhs/brick1/b1					49158	Y	16632
Brick fan:/rhs/brick1/b1-rep1				49156	Y	3313
Brick mia:/rhs/brick1/b1-rep2				49158	Y	10439
Brick dj:/rhs/brick1/b2					49159	Y	16644
Brick fan:/rhs/brick1/b2-rep1				49157	Y	3325
Brick mia:/rhs/brick1/b2-rep2				49159	Y	10451
NFS Server on localhost					2049	Y	3338
Self-heal Daemon on localhost				N/A	Y	3345
NFS Server on dj					2049	Y	16657
Self-heal Daemon on dj					N/A	Y	16663
NFS Server on mia					2049	Y	10466
Self-heal Daemon on mia					N/A	Y	10470
 
Task Status of Volume vol
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@fan ~]#

Comment 4 spandura 2014-01-07 06:07:29 UTC
Verified the fix on the build "glusterfs 3.4.0.54rhs built on Jan  5 2014 06:26:17". The bug is fixed. Moving the bug to the verified state.

Comment 6 errata-xmlrpc 2014-02-25 08:13:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0208.html