Bug 1532868 - gluster upgrade causes vm disk errors [NEEDINFO]
Summary: gluster upgrade causes vm disk errors
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: bitrot
Version: 3.12
Hardware: x86_64
OS: Linux
unspecified
medium
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
bugs@gluster.org
URL:
Whiteboard:
Depends On:
Blocks: 1598788
TreeView+ depends on / blocked
 
Reported: 2018-01-09 22:56 UTC by bugs
Modified: 2018-10-23 14:44 UTC (History)
10 users (show)

Fixed In Version: glusterfs-3.12.5
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1598788 (view as bug list)
Environment:
Last Closed: 2018-10-23 14:44:42 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
kdhananj: needinfo? (bugs)


Attachments (Terms of Use)

Description bugs 2018-01-09 22:56:11 UTC
Description of problem:
We have a gluster replica x3 that is used for vm storage. After upgrading from 3.10 to 3.12 ovirt started logging "VM <vm name> has been paused due to unknown storage error." and pausing the effected vms. In order to upgrade I used the gluster upgrade guide. 

After looking into it some more I found the following lines in our mount log. To me this seems that there is an issue with sharding. Note copying the vm image and using the new one seems to fix the problem

[2018-01-09 15:45:47.098380] E [shard.c:426:shard_modify_size_and_block_count] (-->/usr/lib64/glusterfs/3.12.3/xlator/cluster/distribute.so(+0x6abed) [0x7f7241b3dbed] -->/usr/lib64/glusterfs/3.12.3/xlator/features/shard.so(+0xbafe) [0x7f72418bbafe] -->/usr/lib64/glusterfs/3.12.3/xlator/features/shard.so(+0xb35b) [0x7f72418bb35b] ) 0-tortoise-shard: Failed to get trusted.glusterfs.shard.file-size for 6493ab88-f4a8-4696-a52e-6425a595fc80
[2018-01-09 15:45:47.098419] W [fuse-bridge.c:874:fuse_attr_cbk] 0-glusterfs-fuse: 4369015: STAT() /aee19709-5859-4e48-b761-c4f8a140ea61/images/855cfe92-22ad-4f40-94f8-2547fa5e0f8e/e29713bf-557a-4943-a7c5-c29edc141c01 => -1 (Invalid argument)

Version-Release number of selected component (if applicable):
centos-release-gluster310.noarch   1.0-1.el7.centos           @extras           
centos-release-gluster312.noarch   1.0-1.el7.centos           @extras           
glusterfs.x86_64                   3.12.3-1.el7               @centos-gluster312
glusterfs-api.x86_64               3.12.3-1.el7               @centos-gluster312
glusterfs-cli.x86_64               3.12.3-1.el7               @centos-gluster312
glusterfs-client-xlators.x86_64    3.12.3-1.el7               @centos-gluster312
glusterfs-fuse.x86_64              3.12.3-1.el7               @centos-gluster312
glusterfs-gnfs.x86_64              3.12.3-1.el7               @centos-gluster312
glusterfs-libs.x86_64              3.12.3-1.el7               @centos-gluster312
glusterfs-server.x86_64            3.12.3-1.el7               @centos-gluster312
libntirpc.x86_64                   1.5.3-1.el7                @centos-gluster312
nfs-ganesha.x86_64                 2.5.3-1.el6                @centos-gluster312
nfs-ganesha-gluster.x86_64         2.5.3-1.el6                @centos-gluster312
userspace-rcu.x86_64               0.10.0-3.el7               @centos-gluster312

previous version:
Dec 06 03:50:42 Updated: glusterfs-libs.x86_64 3.10.8-1.el7
Dec 06 03:50:42 Updated: glusterfs.x86_64 3.10.8-1.el7
Dec 06 03:50:42 Updated: glusterfs-client-xlators.x86_64 3.10.8-1.el7
Dec 06 03:50:42 Updated: glusterfs-api.x86_64 3.10.8-1.el7
Dec 06 03:50:42 Updated: glusterfs-fuse.x86_64 3.10.8-1.el7
Dec 06 03:50:42 Updated: glusterfs-cli.x86_64 3.10.8-1.el7
Dec 06 03:50:45 Updated: glusterfs-server.x86_64 3.10.8-1.el7
Dec 06 03:50:45 Updated: glusterfs-ganesha.x86_64 3.10.8-1.el

How reproducible:
Happened on multiple vm images. I do not have another cluster to try it on

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
Volume Name: tortoise
Type: Replicate
Volume ID: d4c00537-f1e8-4c43-b21d-90c9a6c5dee9
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: stor1.example.com:/tort/brick
Brick2: stor2.example.com:/tort/brick
Brick3: stor3.example.com:/tort/brick (arbiter)
Options Reconfigured:
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.eager-lock: enable
performance.low-prio-threads: 32
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
auth.allow: *
network.ping-timeout: 10
cluster.quorum-type: auto
server.allow-insecure: on
storage.owner-gid: 36
storage.owner-uid: 36
performance.strict-o-direct: enable
network.remote-dio: enable
transport.address-family: inet
nfs.disable: on
features.shard-block-size: 256MB
cluster.enable-shared-storage: disable

Comment 1 bugs 2018-01-18 16:27:11 UTC
We recently updated the cluster to 3.12.4. Images that had the problem still do. Just figured this is an interesting point since one of the major fixes in 3.12.4 was a shard issue.

Comment 2 Krutika Dhananjay 2018-01-19 03:41:13 UTC
(In reply to bugs from comment #0)
> Description of problem:
> We have a gluster replica x3 that is used for vm storage. After upgrading
> from 3.10 to 3.12 ovirt started logging "VM <vm name> has been paused due to
> unknown storage error." and pausing the effected vms. In order to upgrade I
> used the gluster upgrade guide. 
> 
> After looking into it some more I found the following lines in our mount
> log. To me this seems that there is an issue with sharding. Note copying the
> vm image and using the new one seems to fix the problem
> 
> [2018-01-09 15:45:47.098380] E
> [shard.c:426:shard_modify_size_and_block_count]
> (-->/usr/lib64/glusterfs/3.12.3/xlator/cluster/distribute.so(+0x6abed)
> [0x7f7241b3dbed]
> -->/usr/lib64/glusterfs/3.12.3/xlator/features/shard.so(+0xbafe)
> [0x7f72418bbafe]
> -->/usr/lib64/glusterfs/3.12.3/xlator/features/shard.so(+0xb35b)
> [0x7f72418bb35b] ) 0-tortoise-shard: Failed to get
> trusted.glusterfs.shard.file-size for 6493ab88-f4a8-4696-a52e-6425a595fc80
> [2018-01-09 15:45:47.098419] W [fuse-bridge.c:874:fuse_attr_cbk]
> 0-glusterfs-fuse: 4369015: STAT()
> /aee19709-5859-4e48-b761-c4f8a140ea61/images/855cfe92-22ad-4f40-94f8-
> 2547fa5e0f8e/e29713bf-557a-4943-a7c5-c29edc141c01 => -1 (Invalid argument)
> 
> Version-Release number of selected component (if applicable):
> centos-release-gluster310.noarch   1.0-1.el7.centos           @extras       
> 
> centos-release-gluster312.noarch   1.0-1.el7.centos           @extras       
> 
> glusterfs.x86_64                   3.12.3-1.el7              
> @centos-gluster312
> glusterfs-api.x86_64               3.12.3-1.el7              
> @centos-gluster312
> glusterfs-cli.x86_64               3.12.3-1.el7              
> @centos-gluster312
> glusterfs-client-xlators.x86_64    3.12.3-1.el7              
> @centos-gluster312
> glusterfs-fuse.x86_64              3.12.3-1.el7              
> @centos-gluster312
> glusterfs-gnfs.x86_64              3.12.3-1.el7              
> @centos-gluster312
> glusterfs-libs.x86_64              3.12.3-1.el7              
> @centos-gluster312
> glusterfs-server.x86_64            3.12.3-1.el7              
> @centos-gluster312
> libntirpc.x86_64                   1.5.3-1.el7               
> @centos-gluster312
> nfs-ganesha.x86_64                 2.5.3-1.el6               
> @centos-gluster312
> nfs-ganesha-gluster.x86_64         2.5.3-1.el6               
> @centos-gluster312
> userspace-rcu.x86_64               0.10.0-3.el7              
> @centos-gluster312
> 
> previous version:
> Dec 06 03:50:42 Updated: glusterfs-libs.x86_64 3.10.8-1.el7
> Dec 06 03:50:42 Updated: glusterfs.x86_64 3.10.8-1.el7
> Dec 06 03:50:42 Updated: glusterfs-client-xlators.x86_64 3.10.8-1.el7
> Dec 06 03:50:42 Updated: glusterfs-api.x86_64 3.10.8-1.el7
> Dec 06 03:50:42 Updated: glusterfs-fuse.x86_64 3.10.8-1.el7
> Dec 06 03:50:42 Updated: glusterfs-cli.x86_64 3.10.8-1.el7
> Dec 06 03:50:45 Updated: glusterfs-server.x86_64 3.10.8-1.el7
> Dec 06 03:50:45 Updated: glusterfs-ganesha.x86_64 3.10.8-1.el
> 
> How reproducible:
> Happened on multiple vm images. I do not have another cluster to try it on
> 
> Steps to Reproduce:
> 1.
> 2.
> 3.
> 
> Actual results:
> 
> 
> Expected results:
> 
> 
> Additional info:
> Volume Name: tortoise
> Type: Replicate
> Volume ID: d4c00537-f1e8-4c43-b21d-90c9a6c5dee9
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: stor1.example.com:/tort/brick
> Brick2: stor2.example.com:/tort/brick
> Brick3: stor3.example.com:/tort/brick (arbiter)
> Options Reconfigured:
> user.cifs: off
> features.shard: on
> cluster.shd-wait-qlength: 10000
> cluster.shd-max-threads: 8
> cluster.locking-scheme: granular
> cluster.data-self-heal-algorithm: full
> cluster.server-quorum-type: server
> cluster.eager-lock: enable
> performance.low-prio-threads: 32
> performance.stat-prefetch: off
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> auth.allow: *
> network.ping-timeout: 10
> cluster.quorum-type: auto
> server.allow-insecure: on
> storage.owner-gid: 36
> storage.owner-uid: 36
> performance.strict-o-direct: enable
> network.remote-dio: enable
> transport.address-family: inet
> nfs.disable: on
> features.shard-block-size: 256MB
> cluster.enable-shared-storage: disable


1. So this issue was seen *after* the upgrade and not *during* the same right?
2. Were you running rebalance/remove-brick when you saw this issue?
3. If this bug is consistently reproducible, could you please get the tcpdump output of the client machine and share the same along with the newly generated client and brick logs?
Here's what you need to do to capture tcpdump output:
tcpdump -i <network interface name> -w <output-filename>.pcap


-Krutika

Comment 3 bugs 2018-01-20 18:26:51 UTC
> 1. So this issue was seen *after* the upgrade and not *during* the same
> right?
Yes we started experiencing this issue after our cluster was upgraded to 3.12.3
> 2. Were you running rebalance/remove-brick when you saw this issue?
no the cluster was online and was not in a healing state
> 3. If this bug is consistently reproducible, could you please get the
> tcpdump output of the client machine and share the same along with the newly
> generated client and brick logs?
> Here's what you need to do to capture tcpdump output:
> tcpdump -i <network interface name> -w <output-filename>.pcap
This bug is reproducible in the fact that some of our vms will get paused at some point before copying the disks, and if we do not make a new image it will continue happening. I have no guarantee of how much data would be necessary, also the tcpdump would include vm image files.
> 
> 
> -Krutika

Comment 4 David Galloway 2018-02-22 20:44:10 UTC
I just hit this bug a few hours after upgrading from glusterfs-3.8.4-18.el7rhgs.x86_64 to 3.8.4-54.el7rhgs.x86_64.

The VM started back up fine after manually setting it to Up.

I have sosreports from the RHV Manager, the hypervisor, and one of the Gluster nodes if it'd help.

=== RHV Manager ===

2018-02-22 19:29:17,200Z INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-4) [] VM 'da38e6f8-9ccb-4ce0-95f0-afc98a171386'(teuthology) moved from 'Up' --> 'Paused'
2018-02-22 19:29:17,243Z INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-4) [] EVENT_ID: VM_PAUSED(1,025), Correlation ID: null, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: VM teuthology has been paused.
2018-02-22 19:29:17,290Z ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-4) [] EVENT_ID: VM_PAUSED_ERROR(139), Correlation ID: null, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: VM teuthology has been paused due to unknown storage error.

=== Hypervisor Gluster log ===

[2018-02-22 19:29:17.197805] E [shard.c:426:shard_modify_size_and_block_count] (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/distribute.so(+0x6a26d) [0x7f854855926d] -->/usr/lib64/glusterfs/3.8.4/xlator/features/shard.so(+0xb38e) [0x7f85482d838e] -->/usr/lib64/glusterfs/3.8.4/xlator/features/shard.so(+0xac3b) [0x7f85482d7c3b] ) 0-ssdstorage-shard: Failed to get trusted.glusterfs.shard.file-size for 4a9b76ea-b373-4129-adcb-3a9b04b14ea1
[2018-02-22 19:29:17.197840] W [fuse-bridge.c:767:fuse_attr_cbk] 0-glusterfs-fuse: 686161812: STAT() /ba9d818f-aa63-4a96-a9be-8d50d04fe44e/images/e4493c34-2805-41ed-8a1b-bfc1a6adad93/6f0ab3fe-e64f-4ad8-8a25-a0881e9713c3 => -1 (Invalid argument)


Nothing relevant in vdsm.log aside from this one line:

2018-02-22 20:29:17,108+0000 INFO  (periodic/314) [vdsm.api] START getVolumeSize(sdUUID=u'ba9d818f-aa63-4a96-a9be-8d50d04fe44e', spUUID=u'28fc87ad-2e28-44d2-8ce4-2e63b9bad4c6', imgUUID=u'e4493c34-2805-41ed-8a1b-bfc1a6adad93', volUUID=u'6f0ab3fe-e64f-4ad8-8a25-a0881e9713c3', options=None) from=internal, task_id=15aec0cf-8854-4959-9e14-3bb14a00dc63 (api:46)

Comment 5 Krutika Dhananjay 2018-02-23 07:00:10 UTC
(In reply to David Galloway from comment #4)
> I just hit this bug a few hours after upgrading from
> glusterfs-3.8.4-18.el7rhgs.x86_64 to 3.8.4-54.el7rhgs.x86_64.
> 
> The VM started back up fine after manually setting it to Up.
> 
> I have sosreports from the RHV Manager, the hypervisor, and one of the
> Gluster nodes if it'd help.
> 
> === RHV Manager ===
> 
> 2018-02-22 19:29:17,200Z INFO 
> [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer]
> (ForkJoinPool-1-worker-4) [] VM
> 'da38e6f8-9ccb-4ce0-95f0-afc98a171386'(teuthology) moved from 'Up' -->
> 'Paused'
> 2018-02-22 19:29:17,243Z INFO 
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (ForkJoinPool-1-worker-4) [] EVENT_ID: VM_PAUSED(1,025), Correlation ID:
> null, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: VM
> teuthology has been paused.
> 2018-02-22 19:29:17,290Z ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (ForkJoinPool-1-worker-4) [] EVENT_ID: VM_PAUSED_ERROR(139), Correlation ID:
> null, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: VM
> teuthology has been paused due to unknown storage error.
> 
> === Hypervisor Gluster log ===
> 
> [2018-02-22 19:29:17.197805] E
> [shard.c:426:shard_modify_size_and_block_count]
> (-->/usr/lib64/glusterfs/3.8.4/xlator/cluster/distribute.so(+0x6a26d)
> [0x7f854855926d]
> -->/usr/lib64/glusterfs/3.8.4/xlator/features/shard.so(+0xb38e)
> [0x7f85482d838e]
> -->/usr/lib64/glusterfs/3.8.4/xlator/features/shard.so(+0xac3b)
> [0x7f85482d7c3b] ) 0-ssdstorage-shard: Failed to get
> trusted.glusterfs.shard.file-size for 4a9b76ea-b373-4129-adcb-3a9b04b14ea1
> [2018-02-22 19:29:17.197840] W [fuse-bridge.c:767:fuse_attr_cbk]
> 0-glusterfs-fuse: 686161812: STAT()
> /ba9d818f-aa63-4a96-a9be-8d50d04fe44e/images/e4493c34-2805-41ed-8a1b-
> bfc1a6adad93/6f0ab3fe-e64f-4ad8-8a25-a0881e9713c3 => -1 (Invalid argument)
> 
> 
> Nothing relevant in vdsm.log aside from this one line:
> 
> 2018-02-22 20:29:17,108+0000 INFO  (periodic/314) [vdsm.api] START
> getVolumeSize(sdUUID=u'ba9d818f-aa63-4a96-a9be-8d50d04fe44e',
> spUUID=u'28fc87ad-2e28-44d2-8ce4-2e63b9bad4c6',
> imgUUID=u'e4493c34-2805-41ed-8a1b-bfc1a6adad93',
> volUUID=u'6f0ab3fe-e64f-4ad8-8a25-a0881e9713c3', options=None)
> from=internal, task_id=15aec0cf-8854-4959-9e14-3bb14a00dc63 (api:46)

If this bug is consistently reproducible, could you please get the tcpdump output of the gluster client machine and share the same along with the newly generated client and brick logs?
Here's what you need to do to capture tcpdump output:
tcpdump -i <network interface name> -w <output-filename>.pcap

Comment 6 David Galloway 2018-02-23 12:26:15 UTC
http://file.rdu.redhat.com/~dgallowa/bz1532868-2min.pcap

The pause happens at 16:25:17.  I unpaused it this morning and the VM paused again within minutes so I'm consistently able to reproduce at this time.  This unfortunately means all of Ceph's upstream testing has ground to a halt, however.

Comment 7 David Galloway 2018-02-23 13:13:56 UTC
[root@ssdstore02 9b]# getfattr -d -m . -e hex 4a9b76ea-b373-4129-adcb-3a9b04b14ea1
# file: 4a9b76ea-b373-4129-adcb-3a9b04b14ea1
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.ssdstorage-client-0=0x000000000000000000000000
trusted.afr.ssdstorage-client-1=0x000000000000000000000000
trusted.afr.ssdstorage-client-2=0x000000000000000000000000
trusted.bit-rot.version=0x020000000000000059122de60002a5dc
trusted.gfid=0x4a9b76eab3734129adcb3a9b04b14ea1
trusted.glusterfs.shard.block-size=0x0000000000400000
trusted.glusterfs.shard.file-size=0x000000fa0000000000000000000000000000000077dd75100000000000000000


More logs: file.rdu.redhat.com/~dgallowa/bz1532868/

Comment 8 Krutika Dhananjay 2018-02-23 18:11:54 UTC
Thanks David.

This issue is caused due to a bug in bit-rot. The patch at https://review.gluster.org/#/c/19021/ should fix this.

There is evidence in the logs to suggest everytime there was a log from shard about missing trusted.glusterfs.shard.file-size xattr, there was a metadata heal on the same file preceding it.

There is also evidence in tcpdump shared by David that suggests in one rare instance of STAT, the xattr trusted.glusterfs.shard.file-size xattr was not returned by the brick and right then the pause happened.

In upstream this fix is available since glusterfs-3.12.5.

You can work around the issue by disabling metadata-self-heal:

# gluster volume set <VOL> cluster.metadata-self-heal off

Could you please confirm if this fixed the issue for you?

-Krutika

Comment 14 Shyamsundar 2018-10-23 14:44:42 UTC
Bug stands fixed in 3.12.5 as per comment #8, closing the bug.


Note You need to log in before you can comment on or make changes to this bug.