Bug 1390521 - qemu gfapi in 3.8.5 is broken
Summary: qemu gfapi in 3.8.5 is broken
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: libgfapi
Version: 3.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: rjoseph
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On: 1392286
Blocks: 1388560
 
Reported: 2016-11-01 09:56 UTC by Dmitry Melekhov
Modified: 2023-09-14 03:33 UTC (History)
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-11-07 10:38:57 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
qemu log (150.15 KB, text/plain)
2016-11-03 04:45 UTC, Dmitry Melekhov


Links
Red Hat Bugzilla 1392286 (Priority: unspecified, Status: CLOSED, Private: no): gfapi clients crash while using async calls due to double fd_unref. Last updated: 2021-02-22 00:41:40 UTC

Internal Links: 1392286

Description Dmitry Melekhov 2016-11-01 09:56:55 UTC
Description of problem:

After upgrading from 3.8.4 to 3.8.5 on a CentOS 7 host we get:

block I/O error in device 'drive-ide0-0-0': Input/output error (5)

on all VMs; all of the VMs we run have virtual IDE drives.


FUSE mounts work fine, though.

I don't see any errors in the volume's brick log:

[2016-11-01 09:29:56.519571] I [MSGID: 115029] [server-handshake.c:692:server_setvolume] 0-pool-server: accepted client from son-27691-2016/11/01-09:29:51:431330-pool-client-6-0-0 (version: 3.8.5)
[2016-11-01 09:29:56.522235] I [login.c:76:gf_auth] 0-auth/login: allowed user names: a02c192c-679c-4929-97d3-b0ae0595df52
[2016-11-01 09:29:56.522305] I [MSGID: 115029] [server-handshake.c:692:server_setvolume] 0-pool-server: accepted client from spirit-5113-2016/11/01-09:29:51:784139-pool-client-6-0-0 (version: 3.8.5)
[2016-11-01 09:29:58.563046] I [login.c:76:gf_auth] 0-auth/login: allowed user names: a02c192c-679c-4929-97d3-b0ae0595df52

What is interesting here is that we have 3 nodes and the volume is 3-way replicated;
we got the errors right after we upgraded the last node from 3.8.4 to 3.8.5.

And we did not forget to upgrade 

[root@father qemu]# rpm -qf /lib64/libglusterfs.so.0
glusterfs-libs-3.8.5-1.el7.x86_64

and restart libvirtd.

Downgrading back to 3.8.4 solved the problem.

Thank you!
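
A minimal standalone gfapi check along these lines can help separate a libgfapi regression from a qemu one. This is only a sketch, not something from this report: the volume name "pool" and host "father" are taken from the logs above, while the image path and log-file location are placeholders.

/*
 * Sketch of a direct gfapi read test: open an image on the "pool" volume
 * and read from it, bypassing both qemu and FUSE.
 * Host "father" and the image path are placeholders; adjust as needed.
 * Build: gcc gfapi_read_test.c -o gfapi_read_test -lgfapi
 */
#include <stdio.h>
#include <fcntl.h>
#include <glusterfs/api/glfs.h>

int main(int argc, char **argv)
{
    const char *path = (argc > 1) ? argv[1] : "/images/test.img";
    char buf[4096];

    glfs_t *fs = glfs_new("pool");                  /* volume name as seen in the brick log */
    if (!fs)
        return 1;

    glfs_set_volfile_server(fs, "tcp", "father", 24007);
    glfs_set_logging(fs, "/tmp/gfapi-test.log", 7); /* verbose client-side log */

    if (glfs_init(fs) != 0) {
        perror("glfs_init");
        return 1;
    }

    glfs_fd_t *fd = glfs_open(fs, path, O_RDONLY);
    if (!fd) {
        perror("glfs_open");
        glfs_fini(fs);
        return 1;
    }

    ssize_t ret = glfs_read(fd, buf, sizeof(buf), 0);
    printf("glfs_read returned %zd\n", ret);        /* EIO here would match the VM errors */

    glfs_close(fd);
    glfs_fini(fs);
    return ret < 0 ? 1 : 0;
}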

Comment 1 Soumya Koduri 2016-11-03 04:35:10 UTC
I am not sure where the qemu gfapi driver's logging ends up, but do you see any warnings/errors there?

Comment 2 Dmitry Melekhov 2016-11-03 04:45:42 UTC
Created attachment 1216849 [details]
qemu log

Hello!

This is the log, which contains the IDE I/O error and the preceding messages.

It also contains the log after the downgrade to 3.8.4.
Thank you!

Comment 3 Soumya Koduri 2016-11-03 07:07:23 UTC
Okay. I do not see any obvious errors in the log file. CCing Prasanna.

Do you see any issues with newly created volumes after the upgrade to 3.8.5?

A patch addressing a few ref leaks in gfapi went into 3.8.5, I think. There are some issues reported on gluster-devel with respect to that patch:

http://www.gluster.org/pipermail/gluster-devel/2016-October/051234.html

Not sure if it's related to this issue, but patch http://review.gluster.org/#/c/15768/ has been submitted to address that.

Comment 4 Dmitry Melekhov 2016-11-03 07:20:10 UTC
No, I did not try to create new volumes with 3.8.5.
Unfortunately, we are more or less in production and have no test environment right now, so I can't try to reproduce this or run any tests...

Comment 5 Prasanna Kumar Kalever 2016-11-03 07:37:26 UTC
From the log I cannot see anything suspicious:

[2016-11-01 08:58:31.174381] I [MSGID: 104041] [glfs-resolve.c:885:__glfs_active_subvol] 0-pool: switched to graph 66617468-6572-2d32-3437-39342d323031 (0)
block I/O error in device 'drive-ide0-0-0': Input/output error (5)

http://review.gluster.org/#/c/15768/ would only be a likely fix if async calls were made beforehand; were there any?
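
For context, the double fd_unref in the linked bug 1392286 is reported on the async I/O path, which is the path qemu's gluster driver relies on. Below is a rough illustration of that call pattern over gfapi, not code taken from qemu or from this report: the callback signature matches gfapi 3.x, and the volume name "pool", host "father" and image path are placeholders.

/*
 * Illustration only: one async read over gfapi, the kind of call pattern
 * qemu's gluster driver issues. Per bug 1392286, async completions on the
 * affected builds could drop an extra fd reference (double fd_unref).
 * Build: gcc gfapi_async_read.c -o gfapi_async_read -lgfapi -lpthread
 */
#include <stdio.h>
#include <fcntl.h>
#include <pthread.h>
#include <sys/types.h>
#include <glusterfs/api/glfs.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int done;

/* gfapi 3.x async completion callback: ret < 0 means the I/O failed */
static void read_done(glfs_fd_t *fd, ssize_t ret, void *data)
{
    printf("async read completed: %zd\n", ret);
    pthread_mutex_lock(&lock);
    done = 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
}

int main(void)
{
    static char buf[4096];

    glfs_t *fs = glfs_new("pool");
    if (!fs)
        return 1;
    glfs_set_volfile_server(fs, "tcp", "father", 24007);
    if (glfs_init(fs) != 0)
        return 1;

    glfs_fd_t *fd = glfs_open(fs, "/images/test.img", O_RDONLY);
    if (!fd) {
        glfs_fini(fs);
        return 1;
    }

    /* queue one async read and wait for the callback to fire */
    glfs_pread_async(fd, buf, sizeof(buf), 0, 0, read_done, NULL);

    pthread_mutex_lock(&lock);
    while (!done)
        pthread_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);

    glfs_close(fd);
    glfs_fini(fs);
    return 0;
}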

Comment 6 Dmitry Melekhov 2016-12-13 10:31:01 UTC
Hello!

I just tried to reproduce the upgrade from 3.8.4 to 3.8.5 on CentOS 7.
The problem reproduced immediately.
Any progress on fixing this for 3.8.x?

Thank you!

Comment 7 SATHEESARAN 2016-12-16 03:29:23 UTC
(In reply to Need Real Name from comment #6)
> Hello!
> 
> I just tried to reproduce the upgrade from 3.8.4 to 3.8.5 on CentOS 7.
> The problem reproduced immediately.
> Any progress on fixing this for 3.8.x?
> 
> Thank you!

Hello there,

The issue is already fixed by the patch[1] in the gluster 3.8 branch, which I have tested.
Please try the latest glusterfs 3.8.z build[2] and let us know if you have any more problems.

[1] - http://review.gluster.org/#/c/15779/1
[2] - http://buildlogs.centos.org/centos/7/storage/x86_64/gluster-3.8/

Thanks again for tracking the issue and following up on the fix.

Comment 8 SATHEESARAN 2016-12-16 03:32:08 UTC
Rajesh,

Since this bug is already fixed, can we move it to the appropriate state, so that users will not be confused by the current bug state?

Comment 9 Dmitry Melekhov 2016-12-16 05:37:10 UTC
I installed 3.8.7 and everything is fine.
Thank you!

By the way, it is very strange that 3.8.5 is the latest version in the stable SIG repository :-(

Comment 10 SATHEESARAN 2016-12-16 13:06:48 UTC
(In reply to Need Real Name from comment #9)
> I installed 3.8.7 and everything is fine.
> Thank you!
> 
> By the way, it is very strange that 3.8.5 is the latest version in the stable SIG repository :-(

Happy to hear that it worked for you!!

@kshlm, by any chance, do you know of any reason why the stable SIG repository still has glusterfs-3.8.5? It should have the latest glusterfs, shouldn't it?

Comment 11 rjoseph 2016-12-20 07:28:27 UTC
The following patch fixes this issue, hence I am moving the bug state to MODIFIED:

http://review.gluster.org/15779

Comment 12 Niels de Vos 2017-11-07 10:38:57 UTC
This bug is getting closed because the 3.8 version is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.

Comment 13 Red Hat Bugzilla 2023-09-14 03:33:43 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.

