Bug 1390521 - qemu gfapi in 3.8.5 is broken
Summary: qemu gfapi in 3.8.5 is broken
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: libgfapi
Version: 3.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: rjoseph
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On: 1392286
Blocks: 1388560
 
Reported: 2016-11-01 09:56 UTC by Dmitry Melekhov
Modified: 2023-09-14 03:33 UTC (History)
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-11-07 10:38:57 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
qemu log (150.15 KB, text/plain)
2016-11-03 04:45 UTC, Dmitry Melekhov


Links
Red Hat Bugzilla 1392286 (Priority: unspecified, Status: CLOSED, Private: no): gfapi clients crash while using async calls due to double fd_unref. Last updated: 2021-02-22 00:41:40 UTC

Internal Links: 1392286

Description Dmitry Melekhov 2016-11-01 09:56:55 UTC
Description of problem:

After upgrading from 3.8.4 to 3.8.5 on a CentOS 7 host we get:

block I/O error in device 'drive-ide0-0-0': Input/output error (5)

on all VMs; all of the VMs we run have virtual IDE drives.


FUSE mounts work fine, though.

I don't see any errors in the volume's brick log:

[2016-11-01 09:29:56.519571] I [MSGID: 115029] [server-handshake.c:692:server_setvolume] 0-pool-server: accepted client from son-27691-2016/11/01-09:29:51:431330-pool-client-6-0-0 (version: 3.8.5)
[2016-11-01 09:29:56.522235] I [login.c:76:gf_auth] 0-auth/login: allowed user names: a02c192c-679c-4929-97d3-b0ae0595df52
[2016-11-01 09:29:56.522305] I [MSGID: 115029] [server-handshake.c:692:server_setvolume] 0-pool-server: accepted client from spirit-5113-2016/11/01-09:29:51:784139-pool-client-6-0-0 (version: 3.8.5)
[2016-11-01 09:29:58.563046] I [login.c:76:gf_auth] 0-auth/login: allowed user names: a02c192c-679c-4929-97d3-b0ae0595df52

What is interesting here is that we have 3 nodes and the volume is 3-way replicated;
we got the errors right after we upgraded the last node from 3.8.4 to 3.8.5.

And we did not forget to upgrade 

[root@father qemu]# rpm -qf /lib64/libglusterfs.so.0
glusterfs-libs-3.8.5-1.el7.x86_64

and restart libvirtd.

Downgrading back to 3.8.4 solved the problem.

Thank you!
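
A minimal standalone gfapi check along these lines can help separate a libgfapi regression from a qemu one. This is only a sketch, not something from this report: the volume name "pool" and host "father" are taken from the logs above, while the image path and log-file location are placeholders.

/*
 * Sketch of a direct gfapi read test: open an image on the "pool" volume
 * and read from it, bypassing both qemu and FUSE.
 * Host "father" and the image path are placeholders; adjust as needed.
 * Build: gcc gfapi_read_test.c -o gfapi_read_test -lgfapi
 */
#include <stdio.h>
#include <fcntl.h>
#include <glusterfs/api/glfs.h>

int main(int argc, char **argv)
{
    const char *path = (argc > 1) ? argv[1] : "/images/test.img";
    char buf[4096];

    glfs_t *fs = glfs_new("pool");                  /* volume name as seen in the brick log */
    if (!fs)
        return 1;

    glfs_set_volfile_server(fs, "tcp", "father", 24007);
    glfs_set_logging(fs, "/tmp/gfapi-test.log", 7); /* verbose client-side log */

    if (glfs_init(fs) != 0) {
        perror("glfs_init");
        return 1;
    }

    glfs_fd_t *fd = glfs_open(fs, path, O_RDONLY);
    if (!fd) {
        perror("glfs_open");
        glfs_fini(fs);
        return 1;
    }

    ssize_t ret = glfs_read(fd, buf, sizeof(buf), 0);
    printf("glfs_read returned %zd\n", ret);        /* EIO here would match the VM errors */

    glfs_close(fd);
    glfs_fini(fs);
    return ret < 0 ? 1 : 0;
}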

Comment 1 Soumya Koduri 2016-11-03 04:35:10 UTC
I am not sure where the qemu gfapi driver's logging ends up, but do you see any warnings/errors there?

Comment 2 Dmitry Melekhov 2016-11-03 04:45:42 UTC
Created attachment 1216849 [details]
qemu log

Hello!

This is the log, which contains the IDE I/O error and the preceding messages.

It also contains the log after the downgrade to 3.8.4.
Thank you!

Comment 3 Soumya Koduri 2016-11-03 07:07:23 UTC
Okay. I do not see any obvious errors in the log file. CCing Prasanna.

Do you see any issues with newly created volumes after the upgrade to 3.8.5?

A patch addressing a few ref leaks in gfapi went into 3.8.5, I think. There are some issues reported on gluster-devel with respect to that patch:

http://www.gluster.org/pipermail/gluster-devel/2016-October/051234.html

Not sure if it's related to this issue, but patch http://review.gluster.org/#/c/15768/ has been submitted to address that.

Comment 4 Dmitry Melekhov 2016-11-03 07:20:10 UTC
No, I did not try to create new volumes with 3.8.5.
Unfortunately, we are more or less in production and have no test environment right now, so I can't try to reproduce this or run any tests...

Comment 5 Prasanna Kumar Kalever 2016-11-03 07:37:26 UTC
From the log I cannot see anything suspicious:

[2016-11-01 08:58:31.174381] I [MSGID: 104041] [glfs-resolve.c:885:__glfs_active_subvol] 0-pool: switched to graph 66617468-6572-2d32-3437-39342d323031 (0)
block I/O error in device 'drive-ide0-0-0': Input/output error (5)

http://review.gluster.org/#/c/15768/ would only be a likely fix if async calls were made beforehand; were there any?
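
For context, the double fd_unref in the linked bug 1392286 is reported on the async I/O path, which is the path qemu's gluster driver relies on. Below is a rough illustration of that call pattern over gfapi, not code taken from qemu or from this report: the callback signature matches gfapi 3.x, and the volume name "pool", host "father" and image path are placeholders.

/*
 * Illustration only: one async read over gfapi, the kind of call pattern
 * qemu's gluster driver issues. Per bug 1392286, async completions on the
 * affected builds could drop an extra fd reference (double fd_unref).
 * Build: gcc gfapi_async_read.c -o gfapi_async_read -lgfapi -lpthread
 */
#include <stdio.h>
#include <fcntl.h>
#include <pthread.h>
#include <sys/types.h>
#include <glusterfs/api/glfs.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int done;

/* gfapi 3.x async completion callback: ret < 0 means the I/O failed */
static void read_done(glfs_fd_t *fd, ssize_t ret, void *data)
{
    printf("async read completed: %zd\n", ret);
    pthread_mutex_lock(&lock);
    done = 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
}

int main(void)
{
    static char buf[4096];

    glfs_t *fs = glfs_new("pool");
    if (!fs)
        return 1;
    glfs_set_volfile_server(fs, "tcp", "father", 24007);
    if (glfs_init(fs) != 0)
        return 1;

    glfs_fd_t *fd = glfs_open(fs, "/images/test.img", O_RDONLY);
    if (!fd) {
        glfs_fini(fs);
        return 1;
    }

    /* queue one async read and wait for the callback to fire */
    glfs_pread_async(fd, buf, sizeof(buf), 0, 0, read_done, NULL);

    pthread_mutex_lock(&lock);
    while (!done)
        pthread_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);

    glfs_close(fd);
    glfs_fini(fs);
    return 0;
}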

Comment 6 Dmitry Melekhov 2016-12-13 10:31:01 UTC
Hello!

I just tried to reproduce the upgrade from 3.8.4 to 3.8.5 on CentOS 7.
The problem reproduced immediately.
Any progress on fixing this for 3.8.x?

Thank you!

Comment 7 SATHEESARAN 2016-12-16 03:29:23 UTC
(In reply to Need Real Name from comment #6)
> Hello!
> 
> I just tried to reproduce the upgrade from 3.8.4 to 3.8.5 on CentOS 7.
> The problem reproduced immediately.
> Any progress on fixing this for 3.8.x?
> 
> Thank you!

Hello there,

The issue is already fixed by the patch[1] in the gluster 3.8 branch, which I have tested.
Please try the latest glusterfs 3.8.z build[2] and let us know if you have any more problems.

[1] - http://review.gluster.org/#/c/15779/1
[2] - http://buildlogs.centos.org/centos/7/storage/x86_64/gluster-3.8/

Thanks again for tracking the issue and following up on the fix.

Comment 8 SATHEESARAN 2016-12-16 03:32:08 UTC
Rajesh,

Since this bug is already fixed, can we move it to the appropriate state, so that users will not be confused by the current bug state?

Comment 9 Dmitry Melekhov 2016-12-16 05:37:10 UTC
I installed 3.8.7 and everything is fine.
Thank you!

By the way, it is very strange that 3.8.5 is the latest version in the stable SIG repository :-(

Comment 10 SATHEESARAN 2016-12-16 13:06:48 UTC
(In reply to Need Real Name from comment #9)
> I installed 3.8.7 and everything is fine.
> Thank you!
> 
> By the way, it is very strange that 3.8.5 is the latest version in the stable SIG repository :-(

Happy to hear that it worked for you!!

@kshlm, by any chance, do you know of any reason why the stable SIG repository still has glusterfs-3.8.5? It should have the latest glusterfs, shouldn't it?

Comment 11 rjoseph 2016-12-20 07:28:27 UTC
The following patch fixes this issue, hence I am moving the bug state to MODIFIED:

http://review.gluster.org/15779

Comment 12 Niels de Vos 2017-11-07 10:38:57 UTC
This bug is getting closed because the 3.8 version is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.

Comment 13 Red Hat Bugzilla 2023-09-14 03:33:43 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.

