Bug 1219547 - I/O failure on attaching tier
Summary: I/O failure on attaching tier
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: tiering
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: Dan Lambright
QA Contact: bugs@gluster.org
URL:
Whiteboard:
Depends On: 1214289 1228643 1230692 1259081 1263549
Blocks: qe_tracker_everglades glusterfs-3.7.0 1260923
 
Reported: 2015-05-07 14:52 UTC by Dan Lambright
Modified: 2015-10-30 17:32 UTC
CC: 5 users

Fixed In Version: glusterfs-3.7.0
Doc Type: Bug Fix
Doc Text:
Clone Of: 1214289
Environment:
Last Closed: 2015-05-14 17:29:39 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Dan Lambright 2015-05-07 14:52:50 UTC
+++ This bug was initially created as a clone of Bug #1214289 +++

Description of problem:
I/O failure on attaching tier

Version-Release number of selected component (if applicable):
glusterfs-server-3.7dev-0.994.git0d36d4f.el6.x86_64

How reproducible:


Steps to Reproduce:
1. Create a replica volume
2. Start 100% write I/O on the volume
3. Attach a tier while the I/O is in progress
4. The attach succeeds, but I/O fails
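A minimal shell sketch of these steps, assuming a two-node setup (the host names and brick paths match the volume info below; the untar workload and mount point are illustrative, and the attach-tier syntax is the GlusterFS 3.7 CLI):

# On a server node: create and start a two-brick replica volume
gluster volume create vol1 replica 2 10.70.35.56:/rhs/brick1 10.70.35.67:/rhs/brick1
gluster volume start vol1

# On a client: mount the volume and start a write-heavy workload
mount -t glusterfs 10.70.35.56:/vol1 /mnt/vol1
cd /mnt/vol1 && tar xf /tmp/linux-2.6.31.1.tar &

# On a server node, while the untar is still running: attach a replicated hot tier
gluster volume attach-tier vol1 replica 2 10.70.35.67:/rhs/brick2 10.70.35.56:/rhs/brick2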

Actual results:
The I/Os fail. Here is the console output:

linux-2.6.31.1/arch/ia64/include/asm/sn/mspec.h
tar: linux-2.6.31.1/arch/ia64/include/asm/sn/mspec.h: Cannot open: Stale file handle
linux-2.6.31.1/arch/ia64/include/asm/sn/nodepda.h
tar: linux-2.6.31.1/arch/ia64/include/asm/sn/nodepda.h: Cannot open: Stale file handle
linux-2.6.31.1/arch/ia64/include/asm/sn/pcibr_provider.h
tar: linux-2.6.31.1/arch/ia64/include/asm/sn/pcibr_provider.h: Cannot open: Stale file handle
linux-2.6.31.1/arch/ia64/include/asm/sn/pcibus_provider_defs.h
tar: linux-2.6.31.1/arch/ia64/include/asm/sn/pcibus_provider_defs.h: Cannot open: Stale file handle
linux-2.6.31.1/arch/ia64/include/asm/sn/pcidev.h
tar: linux-2.6.31.1/arch/ia64/include/asm/sn/pcidev.h: Cannot open: Stale file handle
linux-2.6.31.1/arch/ia64/include/asm/sn/pda.h
tar: linux-2.6.31.1/arch/ia64/include/asm/sn/pda.h: Cannot open: Stale file handle
linux-2.6.31.1/arch/ia64/include/asm/sn/pic.h
tar: linux-2.6.31.1/arch/ia64/include/asm/sn/pic.h: Cannot open: Stale file handle
linux-2.6.31.1/arch/ia64/include/asm/sn/rw_mmr.h
tar: linux-2.6.31.1/arch/ia64/include/asm/sn/rw_mmr.h: Cannot open: Stale file handle
linux-2.6.31.1/arch/ia64/include/asm/sn/shub_mmr.h
tar: linux-2.6.31.1/arch/ia64/include/asm/sn/shub_mmr.h: Cannot open: Stale file handle
linux-2.6.31.1/arch/ia64/include/asm/sn/shubio.h
tar: linux-2.6.31.1/arch/ia64/include/asm/sn/shubio.h: Cannot open: Stale file handle
linux-2.6.31.1/arch/ia64/include/asm/sn/simulator.h
tar: linux-2.6.31.1/arch/ia64/include/asm/sn/simulator.h: Cannot open: Stale file handle
linux-2.6.31.1/arch/ia64/include/asm/sn/sn2/


Expected results:
I/O should continue normally while the tier is being attached. Additionally, all new writes after the tier is attached should go to the hot tier.
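A hedged way to check the second expectation (mount point and file name are illustrative): create a file after the attach and confirm it lands on a hot tier brick.

# On the client, after attach-tier has completed
echo test > /mnt/vol1/post-attach-file

# On the servers: the file should appear on a hot tier brick (/rhs/brick2 below),
# not be created fresh on the cold tier (/rhs/brick1)
ls -l /rhs/brick2/post-attach-file
ls -l /rhs/brick1/post-attach-file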

Additional info:

--- Additional comment from Anoop on 2015-04-22 07:05:58 EDT ---

Volume info before attach:

Volume Name: vol1
Type: Replicate
Volume ID: b77d4050-7fdc-45ff-a084-f85eec2470fc
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.70.35.56:/rhs/brick1
Brick2: 10.70.35.67:/rhs/brick1

Volume Info post attach
Volume Name: vol1
Type: Tier
Volume ID: b77d4050-7fdc-45ff-a084-f85eec2470fc
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.35.67:/rhs/brick2
Brick2: 10.70.35.56:/rhs/brick2
Brick3: 10.70.35.56:/rhs/brick1
Brick4: 10.70.35.67:/rhs/brick1

--- Additional comment from Dan Lambright on 2015-04-22 15:46:08 EDT ---

When we attach a tier, the newly added translator has no cached subvolume for I/Os in flight, so I/Os to open files fail. The solution, I believe, is to recompute the cached subvolume for all open FDs with a lookup in tier_init; working on a fix.
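A hedged shell sketch of the in-flight case described above (the fd number, paths, and volume name are illustrative): hold a file descriptor open across the attach, then write through it.

# On a client with the volume mounted at /mnt/vol1: open an fd before the attach
exec 3> /mnt/vol1/inflight-file

# On a server node: attach the hot tier while that fd is still open
gluster volume attach-tier vol1 replica 2 10.70.35.67:/rhs/brick2 10.70.35.56:/rhs/brick2

# Back on the client: writes through the pre-attach fd are the ones at risk
echo data >&3 || echo "write failed"
exec 3>&-   # close the fd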

--- Additional comment from Anand Avati on 2015-04-28 16:28:27 EDT ---

REVIEW: http://review.gluster.org/10435 (cluster/tier: don't use hot tier until subvolumes ready (WIP)) posted (#1) for review on master by Dan Lambright (dlambrig)

--- Additional comment from Anand Avati on 2015-04-29 16:22:55 EDT ---

REVIEW: http://review.gluster.org/10435 (cluster/tier: don't use hot tier until subvolumes ready (WIP)) posted (#2) for review on master by Dan Lambright (dlambrig)

--- Additional comment from Anand Avati on 2015-04-29 18:05:44 EDT ---

REVIEW: http://review.gluster.org/10435 (cluster/tier: don't use hot tier until subvolumes ready (WIP)) posted (#3) for review on master by Dan Lambright (dlambrig)

--- Additional comment from Anand Avati on 2015-05-04 14:55:52 EDT ---

REVIEW: http://review.gluster.org/10435 (cluster/tier: don't use hot tier until subvolumes ready) posted (#4) for review on master by Dan Lambright (dlambrig)

--- Additional comment from Dan Lambright on 2015-05-04 14:57:34 EDT ---

There may still be a window where an I/O error can happen, but this fix should close most of it. The window can be closed completely once BZ 1156637 is resolved.

--- Additional comment from Anand Avati on 2015-05-05 11:36:32 EDT ---

COMMIT: http://review.gluster.org/10435 committed in master by Kaleb KEITHLEY (kkeithle) 
------
commit 377505a101eede8943f5a345e11a6901c4f8f420
Author: Dan Lambright <dlambrig>
Date:   Tue Apr 28 16:26:33 2015 -0400

    cluster/tier: don't use hot tier until subvolumes ready
    
    When we attach a tier, the hot tier becomes the hashed
    subvolume. But directories may not yet have been replicated by
    the fix layout process. Hence lookups to those directories
    will fail on the hot subvolume. We should only go to the hashed
    subvolume once the layout has been fixed. This is known if the
    layout for the parent directory does not have an error. If
    there is an error, the cold tier is considered the hashed
    subvolume. The exception to this rule is ENOTCONN, in which
    case we do not know where the file is and must abort.
    
    Note we may revalidate a lookup for a directory even if the
    inode has not yet been populated by FUSE. This case can
    happen in tiering (where one tier has completed a lookup
    but the other has not, in which case we revalidate one tier
    when we call lookup the second time). Such inodes are
    still invalid and should not be consulted for validation.
    
    Change-Id: Ia2bc62e1d807bd70590bd2a8300496264d73c523
    BUG: 1214289
    Signed-off-by: Dan Lambright <dlambrig>
    Reviewed-on: http://review.gluster.org/10435
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Raghavendra G <rgowdapp>
    Reviewed-by: N Balachandran <nbalacha>
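
For reference, the layout state this fix consults can be inspected from the bricks. A hedged sketch (the directory path is illustrative) using the standard DHT layout xattr: a directory that the fix-layout process has not yet reached on the hot tier will not carry this xattr.

# On a hot tier brick: check whether fix-layout has assigned a layout to a directory
getfattr -n trusted.glusterfs.dht -e hex /rhs/brick2/linux-2.6.31.1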

Comment 1 Anand Avati 2015-05-08 12:17:05 UTC
COMMIT: http://review.gluster.org/10649 committed in release-3.7 by Vijay Bellur (vbellur) 
------
commit d4e9c501a2b949909c4eb0be4cdedb30648cc895
Author: Dan Lambright <dlambrig>
Date:   Thu May 7 12:27:49 2015 -0400

    cluster/tier: don't use hot tier until subvolumes ready
    
    This is a backport of fix 10435 to Gluster 3.7.
    
    When we attach a tier, the hot tier becomes the hashed
    subvolume. But directories may not yet have been replicated by
    the fix layout process. Hence lookups to those directories
    will fail on the hot subvolume. We should only go to the hashed
    subvolume once the layout has been fixed. This is known if the
    layout for the parent directory does not have an error. If
    there is an error, the cold tier is considered the hashed
    subvolume. The exception to this rule is ENOTCONN, in which
    case we do not know where the file is and must abort.
    
    Note we may revalidate a lookup for a directory even if the
    inode has not yet been populated by FUSE. This case can
    happen in tiering (where one tier has completed a lookup
    but the other has not, in which case we revalidate one tier
    when we call lookup the second time). Such inodes are
    still invalid and should not be consulted for validation.
    
    > http://review.gluster.org/#/c/10435/
    > Change-Id: Ia2bc62e1d807bd70590bd2a8300496264d73c523
    > BUG: 1214289
    > Signed-off-by: Dan Lambright <dlambrig>
    > Reviewed-on: http://review.gluster.org/10435
    > Tested-by: Gluster Build System <jenkins.com>
    > Reviewed-by: Raghavendra G <rgowdapp>
    > Reviewed-by: N Balachandran <nbalacha>
    > Signed-off-by: Dan Lambright <dlambrig>
    
    Change-Id: Ia2bc62e1d807bd70590bd2a8300496264d73c523
    BUG: 1219547
    Signed-off-by: Dan Lambright <dlambrig>
    Reviewed-on: http://review.gluster.org/10649
    Tested-by: NetBSD Build System
    Reviewed-by: Joseph Fernandes
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 2 Anoop 2015-05-13 12:46:26 UTC
Reproduced this on the BETA2 build too, hence moving it to ASSIGNED.

Comment 3 Dan Lambright 2015-05-13 15:00:19 UTC
I am unable to reproduce this. Can you help?

1. What tool do you use for 100% writes?

2. What errors do you see?

The way I tried to reproduce it was:

1. Create a replica volume
2. Start an SSL compile on the volume
3. Attach a tier while the I/O is in progress
4. The attach is successful

Comment 4 Dan Lambright 2015-05-13 16:24:19 UTC
I was able to recreate it by doing this:

for i in {1..10000}; do echo Build $i; dd if=/dev/urandom of=f$i bs=100M count=1; done

Comment 5 Niels de Vos 2015-05-14 17:29:39 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user


