Bug 499918 - saCkptSectionIterationNext() error
Status: CLOSED UPSTREAM
Product: Fedora
Classification: Fedora
Component: openais
Version: rawhide
Hardware: All Linux
Priority: low
Severity: medium
Assigned To: Jan Friesse
QA Contact: Fedora Extras Quality Assurance
Reported: 2009-05-08 18:10 EDT by David Teigland
Modified: 2009-06-01 04:47 EDT (History)
Doc Type: Bug Fix
Last Closed: 2009-06-01 04:47:37 EDT
Attachments
Patch for Makefile.am, so ipc_hdb is not included multiple times (3.14 KB, patch)
2009-05-27 10:24 EDT, Jan Friesse

Description David Teigland 2009-05-08 18:10:48 EDT
Description of problem:

I think we may have lost something in transit between IRC, email, and svn:

Mar 26 16:10:20 <dct>   confchg, node1 create ckpt, node2 open ckpt, node2
                        read ckpt -> fail

Mar 26 16:10:46 <dct>   nodeid 1 creates the ckpt

Mar 26 16:13:42 <dct>   saCkptCheckpointOpen() works,
                        saCkptSectionIterationInitialize() works,
                        then saCkptSectionIterationNext() fails

Mar 26 16:30:34 <sdake> wow iteration fails straight up single node
Mar 26 16:30:39 <sdake> that was working like 1 week ago or less
Mar 26 16:52:30 <sdake> dct found problem
Mar 26 16:52:32 <sdake> patch coming to list now

This looks like the patch, but I don't see it in svn
https://lists.linux-foundation.org/pipermail/openais/2009-March/011048.html

And I'm still getting error 9 (BAD_HANDLE) from saCkptSectionIterationNext().   


Version-Release number of selected component (if applicable):


How reproducible:

node1: mount gfs
node1: take plock on gfs
node2: mount gfs

When node2 mounts, node1 creates a checkpoint containing the plock info; node2 then opens that checkpoint and tries to read it. node2 then logs this error in /var/log/messages:

bull-02 dlm_controld[8233]: retrieve_plocks: ckpt iternext error 9 x
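
The 9 in that log line is the numeric AIS error code (BAD_HANDLE, as noted above). A minimal sketch of a decoder for such log values follows. The helper name `ais_error_name` is hypothetical and not part of openais, and the table covers only a few codes as they are commonly listed in the SA Forum AIS headers, so check them against your own `saAis.h` before relying on them.

```c
#include <stddef.h>

/* Hypothetical helper: map a numeric AIS error code from a log line to
 * its symbolic name. Only a handful of codes are covered; the values are
 * assumptions to verify against saAis.h. */
const char *ais_error_name(int err)
{
    switch (err) {
    case 1:  return "SA_AIS_OK";
    case 6:  return "SA_AIS_ERR_TRY_AGAIN";
    case 9:  return "SA_AIS_ERR_BAD_HANDLE";  /* the code seen in this report */
    case 12: return "SA_AIS_ERR_NOT_EXIST";
    default: return "unknown";
    }
}
```

Calling `ais_error_name(9)` on the value from the dlm_controld message gives the symbolic name of the error reported here.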

Comment 1 Jan Friesse 2009-05-27 10:24:39 EDT
Created attachment 345615 [details]
Patch for Makefile.am, so ipc_hdb is not included multiple times

Included is a patch for corosync's Makefile.am so that coroipcc.o is no
longer linked into lib... directly; instead the *.so is a dependency. That
way ipc_hdb no longer ends up in multiple *.so files, and so multiple times
in the binary, which is what causes the problem.
Comment 2 Jan Friesse 2009-05-28 08:09:13 EDT
Better (I hope) description of problem:

Functions from the ckpt library (such as saCkptCheckpointOpen, saCkptSectionIterationInitialize, ...) internally use the corosync functions reply_receive, reply_receive_in_buf, ... These functions are defined in the coroipcc.c source file and use the global static variable ipc_hdb.

Without the patch, coroipcc is built as a shared library (libcoroipcc.so) AND also linked into every corosync library (such as cpg, ...), so the global variable ipc_hdb is included not only in libcoroipcc.so but also in libcpg.so, ...

dlm_controld has the function retrieve_plocks, and the whole binary is linked with libcoroipcc and libcpg. So ipc_hdb is included TWICE (and therefore has TWO addresses).

The main cause of the bug was that reply_receive uses the address from one library while reply_receive_in_buf uses the address from the other, which confuses the check in hdb_get. What I don't fully understand is why the linker allowed two copies of ipc_hdb to exist, or rather why it chose different addresses in different functions, even though they are defined in the same module and called from the same module. It looks as if the linker picks the address more or less at random.

After I stopped linking coroipcc.o into cpg and used the dynamic version instead (so that there is only one instance of ipc_hdb), the problem disappeared for me.
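
To illustrate the failure mode described above, here is a toy model in C. It is only a sketch under stated assumptions: `struct toy_hdb`, `toy_hdb_create`, `toy_hdb_get`, and the two `hdb_copy_*` variables are hypothetical stand-ins and are not the corosync handle database code. The real bug involved one variable name resolving to two addresses across shared libraries, which this single-file model imitates with two explicit objects.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_HDB_SLOTS  8
#define TOY_OK         1   /* stands in for a successful lookup */
#define TOY_BAD_HANDLE 9   /* mirrors the BAD_HANDLE (9) from the report */

/* Hypothetical handle database: a slot is valid only if marked in use. */
struct toy_hdb {
    int in_use[TOY_HDB_SLOTS];
};

/* Two independent copies, as if one ipc_hdb lived in libcoroipcc.so
 * and a second one had been linked into libcpg.so. */
struct toy_hdb hdb_copy_a;
struct toy_hdb hdb_copy_b;

/* Allocate the first free handle in db; returns -1 if the table is full. */
int toy_hdb_create(struct toy_hdb *db)
{
    assert(db != NULL);
    for (int i = 0; i < TOY_HDB_SLOTS; i++) {
        if (!db->in_use[i]) {
            db->in_use[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Validate a handle against db; a handle created in a different copy
 * is not marked in use here, so the check fails. */
int toy_hdb_get(const struct toy_hdb *db, int handle)
{
    assert(db != NULL);
    if (handle < 0 || handle >= TOY_HDB_SLOTS || !db->in_use[handle])
        return TOY_BAD_HANDLE;
    return TOY_OK;
}
```

A handle created with `toy_hdb_create(&hdb_copy_a)` passes `toy_hdb_get(&hdb_copy_a, h)` but fails `toy_hdb_get(&hdb_copy_b, h)` with the BAD_HANDLE value, which matches the symptom of one function registering the handle in one copy of ipc_hdb and another function looking it up in the other copy.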
Comment 3 Jan Friesse 2009-06-01 04:47:37 EDT
Committed to upstream, so I'm closing this bug.
