Bug 1347251 - fix the issue of Rolling upgrade or non-disruptive upgrade of disperse or erasure code volume to work [NEEDINFO]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: disperse
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.2.0
Assignee: Ashish Pandey
QA Contact: nchilaka
URL:
Whiteboard:
Depends On:
Blocks: 1347686 1351522 1351530 1360152 1360174
 
Reported: 2016-06-16 11:44 UTC by nchilaka
Modified: 2019-04-03 09:28 UTC (History)
7 users

Fixed In Version: glusterfs-3.8.4-1
Doc Type: Bug Fix
Doc Text:
Online rolling upgrades were not possible from Red Hat Gluster Storage 3.1.x to 3.1.y (where y is more recent than x) because of client limitations. Red Hat Gluster Storage 3.2 enables online rolling upgrades from 3.2.x to 3.2.y (where y is more recent than x).
Clone Of:
: 1347686 1422539 (view as bug list)
Environment:
Last Closed: 2017-03-23 05:36:44 UTC
rhinduja: needinfo? (asrivast)
lbailey: needinfo? (aspandey)




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:0486 normal SHIPPED_LIVE Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update 2017-03-23 09:18:45 UTC

Description nchilaka 2016-06-16 11:44:31 UTC
Description of problem:
========================
When we do a rolling upgrade of an EC volume from 3.1.2 to 3.1.3, I hit an IO error after updating one node successfully and moving on to the second node.


Version-Release number of selected component (if applicable):
==================================
before upgrade:
[root@node1-dhcp35-213 rhn]# cat /etc/redhat-*
Red Hat Enterprise Linux Server release 6.7 (Santiago)
Red Hat Gluster Storage Server 3.1 Update 2
[root@node1-dhcp35-213 rhn]# rpm -qa|grep gluster
vdsm-gluster-4.16.30-1.3.el6rhs.noarch
glusterfs-server-3.7.5-19.el6rhs.x86_64
glusterfs-api-3.7.5-19.el6rhs.x86_64
gluster-nagios-common-0.2.3-1.el6rhs.noarch
glusterfs-3.7.5-19.el6rhs.x86_64
glusterfs-client-xlators-3.7.5-19.el6rhs.x86_64
glusterfs-geo-replication-3.7.5-19.el6rhs.x86_64
glusterfs-cli-3.7.5-19.el6rhs.x86_64
glusterfs-rdma-3.7.5-19.el6rhs.x86_64
python-gluster-3.7.5-19.el6rhs.noarch
gluster-nagios-addons-0.2.5-1.el6rhs.x86_64
glusterfs-libs-3.7.5-19.el6rhs.x86_64
glusterfs-fuse-3.7.5-19.el6rhs.x86_64


after upgrade:
[root@node3-dhcp35-62 rhn]# cat /etc/redhat-*
Red Hat Enterprise Linux Server release 6.8 (Santiago)
Red Hat Gluster Storage Server 3.1 Update 3
[root@node3-dhcp35-62 rhn]# rpm -qa|grep gluster
glusterfs-rdma-3.7.9-10.el6rhs.x86_64
glusterfs-libs-3.7.9-10.el6rhs.x86_64
python-gluster-3.7.9-10.el6rhs.noarch
glusterfs-api-3.7.9-10.el6rhs.x86_64
glusterfs-client-xlators-3.7.9-10.el6rhs.x86_64
gluster-nagios-common-0.2.4-1.el6rhs.noarch
vdsm-gluster-4.16.30-1.5.el6rhs.noarch
glusterfs-3.7.9-10.el6rhs.x86_64
glusterfs-fuse-3.7.9-10.el6rhs.x86_64
glusterfs-server-3.7.9-10.el6rhs.x86_64
gluster-nagios-addons-0.2.7-1.el6rhs.x86_64
glusterfs-geo-replication-3.7.9-10.el6rhs.x86_64
glusterfs-cli-3.7.9-10.el6rhs.x86_64
[root@node3-dhcp35-62 rhn]# 


How reproducible:
===================
always


Steps to Reproduce:
=================
1. Have a 3-node cluster with 4 bricks each, and make sure all configuration needed for an update is in place (channel registration, storage subscription, etc.).
2. Create a dist-ec volume such that each node hosts at most 2 bricks of one dht-subvol.
3. Start the volume.
4. Enable quotas.
5. Mount the volume on a client using FUSE.
6. Trigger IO by downloading a kernel tarball and start untarring it.
7. On one node, bring down all gluster processes, including bricks, daemons, and glusterd, using pkill glusterfs, pkill glusterfsd, and service glusterd stop.
8. Issue a yum update to update to the latest packages, including gluster.
===> IO should continue without any issue.
9. Once the update is successful, start glusterd ==> this too will work, and IO will still be happening.
10. Now that node3 is updated successfully, move on to the next node, say node2.
11. Kill glusterfs, glusterfsd, and glusterd on node2.
====> You will now hit an IO error, or IO will stop abruptly.
To recheck, just create a dir and try to copy the kernel tarball into it; it fails as below:
[root@nchilaka-rhel6-fuseclient1-43-172 kern]# cp linux-4.6.2.tar.xz dir.6
cp: reading `linux-4.6.2.tar.xz': Input/output error

Comment 2 nchilaka 2016-06-20 11:36:06 UTC
Note for QE and Dev: once this bug is fixed, make sure the documentation is updated accordingly for that release. Refer to bug 1347252 - [DOC]: Have a note saying non-disruptive or rolling upgrade is not supported for a disperse volume.

Comment 4 Atin Mukherjee 2016-09-17 13:28:20 UTC
Upstream mainline : http://review.gluster.org/14761
Upstream 3.8 : http://review.gluster.org/15013

And the fix is available in rhgs-3.2.0 as part of rebase to GlusterFS 3.8.4.

Comment 7 nchilaka 2016-11-09 10:45:23 UTC
Still seeing EIO:
Had a 4+2 volume on a 3-node setup, i.e., 2 EC bricks on each node.

Pumping IO (Linux kernel untar) from one client.
Upgraded node#1 and healing completed.
Then brought down node#2.
Seeing EIO as soon as node#2 is down:
tar: linux-4.8.6/drivers/nfc/nfcmrvl/spi.c: Cannot open: Input/output error
linux-4.8.6/drivers/nfc/nfcmrvl/uart.c
tar: linux-4.8.6/drivers/nfc/nfcmrvl/uart.c: Cannot open: Input/output error
linux-4.8.6/drivers/nfc/nfcmrvl/usb.c
tar: linux-4.8.6/drivers/nfc/nfcmrvl/usb.c: Cannot open: Input/output error
linux-4.8.6/drivers/nfc/nfcsim.c
tar: linux-4.8.6/drivers/nfc/nfcsim.c: Cannot open: Input/output error
linux-4.8.6/drivers/nfc/nfcwilink.c
tar: linux-4.8.6/drivers/nfc/nfcwilink.c: Cannot open: Input/output error
linux-4.8.6/drivers/nfc/nxp-nci/
tar: linux-4.8.6/drivers/nfc/nxp-nci: Cannot mkdir: Input/output error
linux-4.8.6/drivers/nfc/nxp-nci/Kconfig
tar: linux-4.8.6/drivers/nfc/nxp-nci/Kconfig: Cannot open: Input/output error
linux-4.8.6/drivers/nfc/nxp-nci/Makefile
tar: linux-4.8.6/drivers/nfc/nxp-nci/Makefile: Cannot open: Input/output error
linux-4.8.6/drivers/nfc/nxp-nci/core.c
tar: linux-4.8.6/drivers/nfc/nxp-nci/core.c: Cannot open: Input/output error
linux-4.8.6/drivers/nfc/nxp-nci/firmware.c
tar: linux-4.8.6/drivers/nfc/nxp-nci/firmware.c: Cannot open: Input/output error
linux-4.8.6/drivers/nfc/nxp-nci/i2c.c
tar: linux-4.8.6/drivers/nfc/nxp-nci/i2c.c: Cannot open: Input/output error
linux-4.8.6/drivers/nfc/nxp-nci/nxp-nci.h
tar: linux-4.8.6/drivers/nfc/nxp-nci/nxp-nci.h: Cannot open: Input/output error
linux-4.8.6/drivers/nfc/pn533/
tar: linux-4.8.6/drivers/nfc/pn533: Cannot mkdir: Input/output error
linux-4.8.6/drivers/nfc/pn533/Kconfig
tar: linux-4.8.6/drivers/nfc/pn533/Kconfig: Cannot open: Input/output error
linux-4.8.6/drivers/nfc/pn533/Makefile
tar: linux-4.8.6/drivers/nfc/pn533/Makefile: Cannot open: Input/output error
linux-4.8.6/drivers/nfc/pn533/i2c.c
tar: linux-4.8.6/drivers/nfc/pn533/i2c.c: Cannot open: Input/output error
linux-4.8.6/drivers/nfc/pn533/pn533.c
tar: linux-4.8.6/drivers/nfc/pn533/pn533.c: Cannot open: Input/output error
linux-4.8.6/drivers/nfc/pn533/pn533.h
tar: linux-4.8.6/drivers/nfc/pn533/pn533.h: Cannot open: Input/output error
linux-4.8.6/drivers/nfc/pn533/usb.c
tar: linux-4.8.6/drivers/nfc/pn533/usb.c: Cannot open: Input/output error
linux-4.8.6/drivers/nfc/pn544/
tar: linux-4.8.6/drivers/nfc/pn544: Cannot mkdir: Input/output error
linux-4.8.6/drivers/nfc/pn544/Kconfig
tar: linux-4.8.6/drivers/nfc/pn544/Kconfig: Cannot open: Input/output error
linux-4.8.6/drivers/nfc/pn544/Makefile
tar: linux-4.8.6/drivers/nfc/pn544/Makefile: Cannot open: Input/output error
linux-4.8.6/drivers/nfc/pn544/i2c.c
tar: linux-4.8.6/drivers/nfc/pn544/i2c.c: Cannot open: Input/output error
linux-4.8.6/drivers/nfc/pn544/mei.c




Another note:
On a 6-node setup, I am seeing spurious heal info entries (the only mistake or configuration change on my side was that the clients were already upgraded).

I tried this with a 4+2 volume on a 6-node setup.
I fuse-mounted the volume on two clients and triggered a Linux kernel untar plus a dd of 10000 files on each client.
I then brought down node#1 and node#2 and upgraded them.

The upgrade went smoothly.
But heal info never completes due to spurious entries; that is, files being written at that time are shown as heal pending.

This leaves the admin confused, as heal info never shows as complete until IO is stopped.

Hence the admin can never proceed with completing the upgrade.
Discussed with Pranith and hence moving to failed_qa.
(At best, this is still a blocker for verifying this bug.)


[root@dhcp35-239 ~]# rpm -qa|grep gluster
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-server-3.7.9-12.el7rhgs.x86_64
glusterfs-client-xlators-3.7.9-12.el7rhgs.x86_64
python-gluster-3.7.9-12.el7rhgs.noarch
glusterfs-libs-3.7.9-12.el7rhgs.x86_64
glusterfs-api-3.7.9-12.el7rhgs.x86_64
glusterfs-cli-3.7.9-12.el7rhgs.x86_64
glusterfs-geo-replication-3.7.9-12.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.el7rhgs.noarch
glusterfs-rdma-3.7.9-12.el7rhgs.x86_64
glusterfs-3.7.9-12.el7rhgs.x86_64
gluster-nagios-addons-0.2.7-1.el7rhgs.x86_64
glusterfs-fuse-3.7.9-12.el7rhgs.x86_64

Comment 8 Ashish Pandey 2016-11-14 11:26:06 UTC
Main issue of this BZ has been fixed on client side - 
------------------------------------
This issue arises when we do a rolling update from 3.7.5 to 3.7.9. For a 4+2 volume running 3.7.5, if we update 2 nodes and, after heal completes, kill 2 older nodes, this problem can be seen. After the update and the killing of bricks, 2 nodes will return the inodelk count key in the dict, while the other 2 nodes will not have it. The same is true for get-link-count. During the dictionary match (ec_dict_compare), this leads to a mismatch of answers, and the file operation on the mount point fails with an IO error.

To solve this, do not match the inode, entry, and link count keys while comparing two dictionaries. However, while combining the data in ec_dict_combine, go through all the dictionaries and select the maximum value received across the different dicts for each of these keys.

Because of this, the client has to be upgraded first, to make sure that we do not compare, on the client side, the inodelk counts received from the servers, which differ.

This is what was mentioned in the previous comment.

-----------------------------------
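The compare/combine behaviour described above can be sketched in Python. This is a simplified model of the idea, not the actual C implementation in ec_dict_compare/ec_dict_combine, and the key names are illustrative:

```python
# Model of the fix: when comparing answers from different bricks,
# ignore the per-brick count keys (which a mixed-version cluster may
# or may not return); when combining, keep the maximum value seen.

COUNT_KEYS = {"inodelk-count", "entrylk-count", "link-count"}  # illustrative names

def answers_match(d1, d2):
    """Compare two brick answers, ignoring the count keys."""
    strip = lambda d: {k: v for k, v in d.items() if k not in COUNT_KEYS}
    return strip(d1) == strip(d2)

def combine(dicts):
    """Merge brick answers, taking the maximum for each count key."""
    result = dict(dicts[0])
    for d in dicts[1:]:
        for k in COUNT_KEYS:
            if k in d:
                result[k] = max(result.get(k, 0), d[k])
    return result

# An old brick omits the count key, a new brick returns it; with the
# fix the answers still match, and the combined dict keeps the maximum.
old_brick = {"size": 4096}
new_brick = {"size": 4096, "inodelk-count": 2}
assert answers_match(old_brick, new_brick)
assert combine([old_brick, new_brick])["inodelk-count"] == 2
```

Without the fix, the whole-dict comparison fails as soon as one brick returns a count key that another omits, which is exactly the mixed-version situation during a rolling upgrade.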

However, this fix brings a new issue.

In patch http://review.gluster.org/#/c/13733/, for every update fop we set the dirty flag: we create index entries, then perform the fop, then unset the dirty flag.
So while a fop is in progress and we use the heal info command, it also shows the files on which IO is going on.
Setting and clearing the dirty flag is triggered from the client side.

To solve this issue, we sent patch http://review.gluster.org/#/c/15543/.
In this patch we take a LOCK on a file for which an index entry is created. We check the version and size, and only when these two differ do we list the file in heal info.
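The heal-info decision from that patch can be sketched as follows. This is a simplification (the helper name is made up; the real patch takes the lock and reads per-brick xattrs):

```python
def needs_heal(versions, sizes):
    """List a file in heal info only when bricks disagree on version
    or size (checked under lock in the real patch); an index entry
    alone (dirty flag set mid-fop) is not sufficient."""
    return len(set(versions)) > 1 or len(set(sizes)) > 1

# A file with an index entry but matching version/size across bricks
# (IO merely in flight) is no longer reported as pending heal.
assert not needs_heal([7, 7, 7], [4096, 4096, 4096])
assert needs_heal([7, 7, 8], [4096, 4096, 4096])
```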


The problem is -
- We have upgraded the client, which means it will set the dirty flag and
  create index entries on the servers.
- Now we are upgrading nodes to the new version one by one. However, in this
  process, some old nodes will NOT have the second patch, which takes LOCKS
  on files and checks whether they really need heal or not.

So the nodes which have the old version will list those indices as heal NEEDED.

Comment 12 Ashish Pandey 2016-11-16 08:02:11 UTC
I am setting require_doc_text to ? because I think customers should know about this rolling upgrade issue for 3.1.x, although it is not fixed for 3.1.x.

Comment 15 nchilaka 2017-02-06 13:59:08 UTC
Also make sure to retest (as part of sanity):
1347257 - spurious heal info as pending heal entries never end on an EC volume while IOs are going on

Comment 19 nchilaka 2017-02-15 13:59:55 UTC
Cloned this bug to 1422539 - IO error seen with rolling or non-disruptive upgrade of a distribute-disperse (EC) volume from 3.1.2 to 3.1.3.

I closed BZ#1422539 as CANTFIX, since it is not possible to change a release that has already shipped.

Hence I am changing the title of this BZ to reflect the actual problem, i.e., rolling upgrade of an EC volume.

Comment 20 nchilaka 2017-02-15 14:01:47 UTC
Based on the above comments,
I am moving this to VERIFIED.

Comment 21 nchilaka 2017-02-15 14:16:06 UTC
Points to note with this fix:
=============================
1) Rolling or non-disruptive upgrade will work from a base release of 3.2.0 to any later release, i.e., 3.2.0 to anything beyond it, say 3.3.
2) Rolling or non-disruptive upgrade will NOT work from a base release earlier than 3.2 to any supported release. E.g., 3.1.3 to 3.2, 3.1.2 to 3.1.3, or 3.1.2 to 3.2 will not work; do a disruptive upgrade instead.
3) To test the fix, we tested between dev builds of 3.2.0, i.e., between 3.8.4-3 and 3.8.4-12; the rolling upgrade worked in this case.
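The support matrix above boils down to a simple rule that can be sketched as a tiny check. The helper is hypothetical (there is no such Gluster tool); it just encodes "rolling upgrade is supported only when the source version is already 3.2.0 or later":

```python
def rolling_upgrade_supported(src, dst):
    """Return True if a rolling (non-disruptive) upgrade from src to
    dst is supported per the matrix above: source must be >= 3.2 and
    destination must not be older than the source."""
    parse = lambda v: tuple(int(x) for x in v.split("."))
    return parse(src) >= (3, 2) and parse(dst) >= parse(src)

assert rolling_upgrade_supported("3.2.0", "3.3")       # point 1
assert not rolling_upgrade_supported("3.1.3", "3.2")   # point 2
assert not rolling_upgrade_supported("3.1.2", "3.1.3") # point 2
```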

Comment 24 errata-xmlrpc 2017-03-23 05:36:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

