Bug 1471690
Summary: | [Ganesha] : Ganesha crashed (pub_glfs_fstat) when IO resumed post failover/failback. | ||
---|---|---|---|
Product: | [Retired] nfs-ganesha | Reporter: | Ambarish <asoman> |
Component: | NFS | Assignee: | Frank Filz <ffilz> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 2.4 | CC: | bturner, dang, ffilz, jthottan, kkeithle, mbenjamin, pasik, rhinduja, skoduri |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Last Closed: | 2019-11-22 15:48:43 UTC | Type: | Bug |
Description
Ambarish
2017-07-17 09:19:33 UTC
@Frank/Dan

The core analysis for this issue is here: https://bugzilla.redhat.com/show_bug.cgi?id=1469474#c5

From code inspection, at least, I can't see an issue with passing my_fd directly as the glusterfs_open_my_fd input argument. Even in gdb, I observed that the my_fd values are intact after we return from this function. Do you see any race here, or am I missing something fundamental?

Passing my_fd is fine, but I think the issue was passing &my_fd, which likely causes memory corruption.

Sorry, do you mean the other way around? Here is the current code flow:

    glusterfs_open2()
    {
        ...
        struct glusterfs_fd *my_fd = NULL;
        ....
        if (state != NULL)
            my_fd = &container_of(state, struct glusterfs_state_fd,
                                  state)->glusterfs_fd;
        ....
        ....
        /* truncate is set in p_flags */
        status = glusterfs_open_my_fd(myself, openflags, p_flags, my_fd);
                 >>> we are passing my_fd itself here.
            |
            v
    fsal_status_t glusterfs_open_my_fd(struct glusterfs_handle *objhandle,
                                       fsal_openflags_t openflags,
                                       int posix_flags,
                                       struct glusterfs_fd *my_fd)
    {
        ...
        my_fd->glfd = glfd;
        my_fd->openflags = openflags;
        my_fd->creds.caller_uid = op_ctx->creds->caller_uid;
        my_fd->creds.caller_gid = op_ctx->creds->caller_gid;
        my_fd->creds.caller_glen = op_ctx->creds->caller_glen;
        garray_copy = &my_fd->creds.caller_garray;
        ...
        if ((*garray_copy) != NULL) {
            /* Replace old creds */
            gsh_free(*garray_copy);
        }

Or do you think the issue mentioned in https://review.gerrithub.io/#/c/371135/, using caller_garray after it has been freed, may have caused this memory corruption in my_fd as well? I am trying to reproduce this issue to check whether https://review.gerrithub.io/#/c/371135/ fixes it too. Thanks!

No, I think we're in agreement. The code above is correct; the code that was in place when https://bugzilla.redhat.com/show_bug.cgi?id=1469474 hit (before the revert) looked like this:

    /* truncate is set in p_flags */
    status = glusterfs_open_my_fd(myself, openflags, p_flags, &my_fd);

Note &my_fd there. That code is wrong. I think it's unrelated to the garray_copy bits.

Oh, Dan++, thanks. I didn't notice that change in the backport. So this bug just needs a re-test with the 2.5 sources and can then be marked CLOSED/VERIFIED.
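To make the my_fd versus &my_fd point concrete, here is a minimal standalone sketch. It is not the nfs-ganesha code: struct demo_fd, demo_open_my_fd(), union demo_slot and both demo_* helpers are hypothetical names invented for illustration, and the struct is a trimmed-down stand-in for struct glusterfs_fd. The union models the caller's local pointer variable so that the overwrite can be observed without leaving defined behaviour; in the real caller the same writes would presumably land on the pointer and on whatever sits next to it on the stack.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for struct glusterfs_fd. */
struct demo_fd {
    void *glfd;
    int openflags;
    unsigned int caller_uid;
    unsigned int caller_gid;
};

/* Stand-in for glusterfs_open_my_fd(): fills the descriptor it is handed. */
static void demo_open_my_fd(struct demo_fd *my_fd, int openflags)
{
    static int fake_glfd; /* placeholder for a real glfs handle */

    my_fd->glfd = &fake_glfd;
    my_fd->openflags = openflags;
    my_fd->caller_uid = 1000;
    my_fd->caller_gid = 1000;
}

/*
 * Models the caller's local variable `my_fd` (an assumption made only for
 * this sketch). The callee believes it received a struct demo_fd *, so it
 * treats the memory that holds the pointer as if it were the struct.
 */
union demo_slot {
    struct demo_fd *my_fd; /* what the caller actually stores */
    struct demo_fd as_fd;  /* how the callee interprets &my_fd */
};

/* Correct call: pass the pointer value, so the target struct is filled. */
static int demo_correct(void)
{
    struct demo_fd state_fd;
    struct demo_fd *my_fd = &state_fd;

    memset(&state_fd, 0, sizeof(state_fd));
    demo_open_my_fd(my_fd, 2);

    return my_fd == &state_fd && state_fd.openflags == 2;
}

/* Wrong call: hand the callee the address of the pointer variable itself. */
static int demo_wrong(void)
{
    struct demo_fd state_fd;
    union demo_slot slot;

    memset(&state_fd, 0, sizeof(state_fd));
    slot.my_fd = &state_fd;

    demo_open_my_fd(&slot.as_fd, 2); /* models passing &my_fd */

    /* The pointer was overwritten; the real descriptor was never filled. */
    return slot.my_fd != &state_fd && state_fd.openflags == 0;
}
```

Calling demo_correct() and demo_wrong() from a small main() should return 1 from each on common platforms: in the first case the descriptor is populated, in the second the caller's pointer no longer refers to its descriptor and the descriptor itself is left untouched, which is consistent with a later crash when the stale or clobbered fd is used.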