Bug 1328377 - osd crash with Segmentation fault in handle_osd_ping
Summary: osd crash with Segmentation fault in handle_osd_ping
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RADOS
Version: 1.3.2
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 1.3.4
Assignee: David Zafman
QA Contact: ceph-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-04-19 09:02 UTC by Vikhyat Umrao
Modified: 2019-10-10 11:54 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-05-25 18:09:57 UTC
Target Upstream Version:



Description Vikhyat Umrao 2016-04-19 09:02:17 UTC
Description of problem:
osd crash with Segmentation fault in handle_osd_ping


  -1> 2016-04-18 12:04:55.912562 7f9e871a2700  1 -- 192.168.0.46:6823/6476 <== osd.21 192.168.0.43:0/6251 1 ==== osd_ping(ping e3580 stamp 2016-04-18 12:04:48.875455) v2 ==== 47+0+0 (1708338583 0 0) 0x669d1e00 con 0x663d31e0

     0> 2016-04-18 12:04:55.913613 7f9e871a2700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f9e871a2700

 ceph version 0.94.5-9.el7cp (deef183a81111fa5e128ec88c90a32c9587c615d)
 1: /usr/bin/ceph-osd() [0xa05e32]
 2: (()+0xf100) [0x7f9ea295c100]
 3: (OSD::handle_osd_ping(MOSDPing*)+0x75a) [0x659e7a]
 4: (OSD::heartbeat_dispatch(Message*)+0x2fb) [0x65b0cb]
 5: (DispatchQueue::entry()+0x62a) [0xbc2aba]
 6: (DispatchQueue::DispatchThread::entry()+0xd) [0xae572d]
 7: (()+0x7dc5) [0x7f9ea2954dc5]
 8: (clone()+0x6d) [0x7f9ea143528d]


Version-Release number of selected component (if applicable):
Red Hat Ceph Storage 1.3.2 
ceph version 0.94.5-9.el7cp

Comment 6 Brad Hubbard 2016-04-21 01:01:54 UTC
3: (OSD::handle_osd_ping(MOSDPing*)+0x75a) [0x659e7a]

That's here:

3652          }
3653
3654          Message *r = new MOSDPing(monc->get_fsid(),
   0x0000000000659e73 <+1875>:  mov    0x160(%rbx),%r12

3655                                    curmap->get_epoch(),
   0x0000000000659e5b <+1851>:  mov    0x10(%rsp),%rax
   0x0000000000659e7a <+1882>:  mov    0x10(%rax),%eax  <----HERE
   0x0000000000659e85 <+1893>:  mov    %eax,0x8(%rsp)

3656                                    MOSDPing::PING_REPLY,
3657                                    m->stamp);

If the "epoch" field in an OSDMap class is stored at offset 0x10 (I don't have time to check this and will be out for most of the next week and a half) then it looks like it is curmap or its "epoch" field which is causing the crash.

HTH.

Comment 8 Vikhyat Umrao 2016-04-21 14:25:41 UTC
(In reply to Brad Hubbard from comment #6)
Thanks Brad. I tried with your KCS: https://access.redhat.com/articles/1614603

$ objdump -rdS ./ceph-osd > /root/dump.txt

$ grep -m1 -B5 handle_osd_ping dump.txt
  659715:	eb ac                	jmp    6596c3 <_ZN10OSDService24send_message_osd_clusterEiP7Messagej+0x2f3>
  659717:	90                   	nop
  659718:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  65971f:	00 

0000000000659720 <_ZN3OSD15handle_osd_pingEP8MOSDPing>:

$ c++filt _ZN3OSD15handle_osd_pingEP8MOSDPing
OSD::handle_osd_ping(MOSDPing*)


Adding the offset from the backtrace (+0x75a) to the start address of handle_osd_ping gives the address of the faulting instruction:

echo 'obase=16;ibase=16;659720+75A'|bc
659E7A

# grep 659e7a -A5 -B35 dump.txt 
  659de5:	00 
  659de6:	e8 45 68 fb ff       	callq  610630 <_ZNSt8ios_baseD2Ev@plt>
  659deb:	48 8b 44 24 10       	mov    0x10(%rsp),%rax
  659df0:	31 d2                	xor    %edx,%edx
  659df2:	48 89 df             	mov    %rbx,%rdi
  659df5:	8b 40 10             	mov    0x10(%rax),%eax
  659df8:	8d 70 01             	lea    0x1(%rax),%esi
  659dfb:	e8 c0 e8 fe ff       	callq  6486c0 <_ZN3OSD16osdmap_subscribeEmb>
  659e00:	e9 fb fb ff ff       	jmpq   659a00 <_ZN3OSD15handle_osd_pingEP8MOSDPing+0x2e0>
  659e05:	0f 1f 00             	nopl   (%rax)
  659e08:	48 39 ce             	cmp    %rcx,%rsi
  659e0b:	0f 84 37 03 00 00    	je     65a148 <_ZN3OSD15handle_osd_pingEP8MOSDPing+0xa28>
  659e11:	39 51 20             	cmp    %edx,0x20(%rcx)
  659e14:	0f 8f 2e 03 00 00    	jg     65a148 <_ZN3OSD15handle_osd_pingEP8MOSDPing+0xa28>
  659e1a:	48 8d 44 24 6c       	lea    0x6c(%rsp),%rax
  659e1f:	48 89 ef             	mov    %rbp,%rdi
  659e22:	48 89 c6             	mov    %rax,%rsi
  659e25:	48 89 44 24 20       	mov    %rax,0x20(%rsp)
  659e2a:	e8 61 0f 06 00       	callq  6bad90 <_ZNSt3mapIiiSt4lessIiESaISt4pairIKiiEEEixERS3_>
  659e2f:	8b 00                	mov    (%rax),%eax
  659e31:	85 c0                	test   %eax,%eax
  659e33:	0f 85 37 07 00 00    	jne    65a570 <_ZN3OSD15handle_osd_pingEP8MOSDPing+0xe50>
  659e39:	48 8b 74 24 20       	mov    0x20(%rsp),%rsi
  659e3e:	48 89 ef             	mov    %rbp,%rdi
  659e41:	e8 da 5e 04 00       	callq  69fd20 <_ZNSt8_Rb_treeIiSt4pairIKiiESt10_Select1stIS2_ESt4lessIiESaIS2_EE5eraseERS1_>
  659e46:	48 8b 43 08          	mov    0x8(%rbx),%rax
  659e4a:	48 8b 78 58          	mov    0x58(%rax),%rdi
  659e4e:	e8 3d 32 3e 00       	callq  a3d090 <_ZN4ceph12HeartbeatMap10is_healthyEv>
  659e53:	84 c0                	test   %al,%al
  659e55:	0f 84 05 02 00 00    	je     65a060 <_ZN3OSD15handle_osd_pingEP8MOSDPing+0x940>
  659e5b:	48 8b 44 24 10       	mov    0x10(%rsp),%rax
  659e60:	41 8b 8f dc 01 00 00 	mov    0x1dc(%r15),%ecx
  659e67:	bf e8 01 00 00       	mov    $0x1e8,%edi
  659e6c:	41 8b 97 e0 01 00 00 	mov    0x1e0(%r15),%edx
  659e73:	4c 8b a3 60 01 00 00 	mov    0x160(%rbx),%r12
  659e7a:	8b 40 10             	mov    0x10(%rax),%eax <==============
  659e7d:	89 4c 24 28          	mov    %ecx,0x28(%rsp)
  659e81:	89 54 24 20          	mov    %edx,0x20(%rsp)
  659e85:	89 44 24 08          	mov    %eax,0x8(%rsp)
  659e89:	e8 62 6a fb ff       	callq  6108f0 <_Znwm@plt>
  659e8e:	b9 01 00 00 00       	mov    $0x1,%ecx

> If the "epoch" field in an OSDMap class is stored at offset 0x10  then it looks like it is curmap or its "epoch" field which is causing the crash.

For this I looked into the code, as I was not able to check whether the "epoch" field of the OSDMap class is stored at offset 0x10.

Then I looked at the call in question:

curmap->get_epoch(),

As per my understanding, we won't crash inside "get_epoch()" itself:

epoch_t get_epoch() const { return epoch; }

But this could:

OSDMapRef curmap = service.get_osdmap(); <================== 

OSD.h                     391 OSDMapRef get_osdmap() {

  OSDMapRef get_osdmap() {
    Mutex::Locker l(publish_lock);
    return osdmap;
  }

PG.h                      227 OSDMapRef get_osdmap() const {

 OSDMapRef get_osdmap() const {
    assert(is_locked());
    assert(osdmap_ref);
    return osdmap_ref;
  }
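
To illustrate what I suspect, here is a stand-alone sketch (all names made up, with a bare shared_ptr standing in for OSDMapRef; this is not Ceph code): if get_osdmap() hands back an empty ref, the first dereference reads a few bytes past address zero, exactly like the faulting mov above.

#include <cstdio>
#include <memory>

// Made-up stand-in for OSDMap; "epoch" sits at a small nonzero offset.
struct MapLike {
  virtual ~MapLike() {}
  void *pad = nullptr;
  unsigned epoch = 0;
  unsigned get_epoch() const { return epoch; }
};

using MapRef = std::shared_ptr<MapLike>;    // stand-in for OSDMapRef

int main() {
  MapRef curmap;                            // empty: holds a null pointer
  // Equivalent to "mov 0x10(%rax),%eax" with %rax == 0:
  std::printf("%u\n", curmap->get_epoch()); // SIGSEGV at a tiny address
  return 0;
}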

Comment 9 Brad Hubbard 2016-04-22 00:30:06 UTC
(In reply to Vikhyat Umrao from comment #8)

I find myself with a small window of time today (I'm on PTO)...

>   659e7a:	8b 40 10             	mov    0x10(%rax),%eax <==============
>   659e7d:	89 4c 24 28          	mov    %ecx,0x28(%rsp)
>   659e81:	89 54 24 20          	mov    %edx,0x20(%rsp)
>   659e85:	89 44 24 08          	mov    %eax,0x8(%rsp)
>   659e89:	e8 62 6a fb ff       	callq  6108f0 <_Znwm@plt>
>   659e8e:	b9 01 00 00 00       	mov    $0x1,%ecx

Nice work. I just did a "disass /m 0x659e7a" after downloading all matching binaries to cwd, unpacking them, and then entering the following in gdb.

(gdb) set debug-file-directory usr/lib/debug
(gdb) set use-deprecated-index-sections on
(gdb) set solib-search-path usr/lib64
(gdb) set sysroot .
(gdb) file ./usr/bin/ceph-osd
Reading symbols from ./usr/bin/ceph-osd...Reading symbols from /home/brad/working/scratch/bz1328377/usr/lib/debug/usr/bin/ceph-osd.debug...done.
done.

> 
> > If the "epoch" field in an OSDMap class is stored at offset 0x10  then it looks like it is curmap or its "epoch" field which is causing the crash.
> 
> For this I looked into the code, as I was not able to check whether the
> "epoch" field of the OSDMap class is stored at offset 0x10.

The main reason I'm looking at this this morning is that it came to me last night that this is easy (once you know how) and I should have been able to do it yesterday.

(gdb) p ((OSDMap*)0x0)->epoch
Cannot access memory at address 0x10

So the offset of the "epoch" field in the OSDMap class *is* 0x10, which reinforces my theory.

> 
> Then I looked at the call in question:
> 
> curmap->get_epoch(),
> 
> As per my understanding, we won't crash inside "get_epoch()" itself:
> 
> epoch_t get_epoch() const { return epoch; }

Well, here's what I believe is happening. Because get_epoch is such a simple member function, the compiler can optimise it away and essentially replace it with a single instruction:

mov    0x10(%rax),%eax

So this *is* the implementation of get_epoch, and since this is a segfault, the instruction pointer should be on the exact instruction that caused the fault, not a return address.
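
A tiny sketch of that effect (hypothetical layout again; compile with g++ -O2 -S and look at the generated assembly):

// With optimisation on, the trivial getter is inlined to a single load at
// the member's offset; there is no call frame, so the faulting PC lands in
// the caller, which is why frame 3 points into handle_osd_ping itself.
struct M {
  void *a;
  void *b;               // hypothetical members pushing epoch to offset 0x10
  unsigned epoch;
  unsigned get_epoch() const { return epoch; }
};

unsigned read_epoch(const M *m) {
  return m->get_epoch(); // g++ -O2: roughly "mov 0x10(%rdi),%eax; ret"
}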

Since this is the first use of curmap after it is returned by get_osdmap it could be the entire OSDMap instance is corrupt but, since this appears to be protected by a lock, I'd suggest it is more likely that just "epoch" is corrupt. Either way I'd suggest a code review to see if there can be a race on curmap or "epoch" or if some other situation could lead to this. The logs should give hints to the code paths involved but it may be difficult to spot.

The only other suggestions I have are to capture a core file, which may pinpoint a pair of racing threads, or to run ceph-osd under valgrind, which may not be practical/possible.

I'm out of time again, I'm afraid.

HTH.

Comment 13 Brad Hubbard 2016-04-28 05:17:18 UTC
(In reply to Brad Hubbard from comment #9)

> Since this is the first use of curmap after it is returned by get_osdmap it
> could be the entire OSDMap instance is corrupt but, since this appears to be
> protected by a lock, I'd suggest it is more likely that just "epoch" is
> corrupt. 

I retract this statement :)

I'm now leaning towards "curmap" (an OSDMapRef) being NULL or invalid. Here's why...

I found this; it's ancient and fixed long ago, but it's amazingly similar.

http://www.spinics.net/lists/ceph-devel/msg07677.html

The fix in that case was the following.

$ git show 76efd9772c60b93bbf632e3ecc3b9117dc081427
commit 76efd9772c60b93bbf632e3ecc3b9117dc081427
Author: Samuel Just <sam.just>
Date:   Wed Jul 18 09:26:11 2012 -0700

    OSD: publish_map in init to initialize OSDService map
    
    Other areas rely on OSDService::get_map() to function, possibly before
    activate_map is first called.  In particular, with handle_osd_ping,
    not initializing the map member results in:
    
    ceph version 0.48argonaut-413-g90ddc5a (commit:90ddc5ae51627e7656459085d7e15105c8b8316d)
     1: /tmp/cephtest/binary/usr/local/bin/ceph-osd() [0x71ba9a]
     2: (()+0xfcb0) [0x7fcd8243dcb0]
     3: (OSD::handle_osd_ping(MOSDPing*)+0x74d) [0x5dbdfd]
     4: (OSD::heartbeat_dispatch(Message*)+0x22b) [0x5dc70b]
     5: (SimpleMessenger::DispatchQueue::entry()+0x92b) [0x7b5b3b]
     6: (SimpleMessenger::dispatch_entry()+0x24) [0x7b6914]
     7: (SimpleMessenger::DispatchThread::entry()+0xd) [0x7762fd]
     8: (()+0x7e9a) [0x7fcd82435e9a]
     9: (clone()+0x6d) [0x7fcd809ea4bd]
     NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
    
    Signed-off-by: Samuel Just <sam.just>

diff --git a/src/osd/OSD.cc b/src/osd/OSD.cc
index 4acd82c..17e14f4 100644
--- a/src/osd/OSD.cc
+++ b/src/osd/OSD.cc
@@ -819,6 +819,7 @@ int OSD::init()
     return -EINVAL;
   }
   osdmap = get_map(superblock.current_epoch);
+  service.publish_map(osdmap);
 
   bind_epoch = osdmap->get_epoch();


So I looked at the current incarnation of this function...

1789 int OSD::init()
1790 {
1791   CompatSet initial, diff;
1792   Mutex::Locker lock(osd_lock);
----8<----
1886   osdmap = get_map(superblock.current_epoch);  <----NOTE
1887   check_osdmap_features(store);
1888 
1889   create_recoverystate_perf();
1890 
1891   {
1892     epoch_t bind_epoch = osdmap->get_epoch();
1893     service.set_epochs(NULL, NULL, &bind_epoch);
1894   }
1895 
1896   // load up pgs (as they previously existed)
1897   load_pgs();
1898 
1899   dout(2) << "superblock: i am osd." << superblock.whoami << dendl;
1900 
1901   create_logger();
1902 
1903   // i'm ready!
1904   client_messenger->add_dispatcher_head(this);
1905   cluster_messenger->add_dispatcher_head(this);
1906 
1907   hbclient_messenger->add_dispatcher_head(&heartbeat_dispatcher);
1908   hb_front_server_messenger->add_dispatcher_head(&heartbeat_dispatcher);
1909   hb_back_server_messenger->add_dispatcher_head(&heartbeat_dispatcher);
1910 
1911   objecter_messenger->add_dispatcher_head(service.objecter);
1912 
1913   monc->set_want_keys(CEPH_ENTITY_TYPE_MON | CEPH_ENTITY_TYPE_OSD);
1914   r = monc->init();
1915   if (r < 0)
1916     goto out;
1917 
1918   // tell monc about log_client so it will know about mon session resets
1919   monc->set_log_client(&log_client);
1920   update_log_config();
1921 
1922   osd_tp.start();
1923   osd_op_tp.start();
1924   recovery_tp.start();
1925   disk_tp.start();
1926   command_tp.start();
1927 
1928   set_disk_tp_priority();
1929 
1930   // start the heartbeat
1931   heartbeat_thread.create();  <----NOTE
1932 
1933   // tick
1934   tick_timer.add_event_after(cct->_conf->osd_heartbeat_interval, new C_Tick(this));
1935 
1936   service.init();
1937   service.publish_map(osdmap);  <----NOTE
1938   service.publish_superblock(superblock);

Note that we see "superblock: i am osd.43" in the above logs, and we must have started the heartbeat_thread because we are in "OSD::heartbeat_dispatch", but have we published the map yet? There is only around one hundredth of a second between the "i am osd" output and the crash...

We can see that a lot of code has crept in between when we get the map and when we actually publish it. I wonder if this has opened up an opportunity for a race. The osd_lock is held during this whole period, but perhaps that is not enough? Perhaps if a ping comes in within that window we see this issue? One possible thing to try would be to move the block at lines 1936-1938 to somewhere before line 1931 (heartbeat_thread.create()), as sketched below.
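
Concretely, the reordered block would look something like this (sketch only, untested; the same code as above, just moved):

  set_disk_tp_priority();

  // moved up: publish before any heartbeat dispatch can possibly run
  service.init();
  service.publish_map(osdmap);
  service.publish_superblock(superblock);

  // start the heartbeat
  heartbeat_thread.create();

  // tick
  tick_timer.add_event_after(cct->_conf->osd_heartbeat_interval, new C_Tick(this));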

IOW, are we calling service.get_osdmap() before we have called service.publish_map(osdmap)?
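
As a stand-alone illustration of that window (all names made up; a bare shared_ptr standing in for OSDMapRef; build with g++ -pthread), the following sketch usually dies with exactly this signature when the publish loses the race:

#include <memory>
#include <thread>

// Stand-ins: Service::osdmap stays null until publish_map() is called.
struct Map {
  unsigned epoch = 0;
  unsigned get_epoch() const { return epoch; }
};

struct Service {
  std::shared_ptr<Map> osdmap;
  std::shared_ptr<Map> get_osdmap() { return osdmap; }
  void publish_map(std::shared_ptr<Map> m) { osdmap = std::move(m); }
};

int main() {
  Service service;

  // The "heartbeat thread" may start dispatching before init() publishes:
  std::thread heartbeat([&service] {
    auto curmap = service.get_osdmap();  // may still be empty...
    (void)curmap->get_epoch();           // ...in which case this segfaults
  });

  // ...imagine the rest of OSD::init() happening here, and only then:
  service.publish_map(std::make_shared<Map>());

  heartbeat.join();
  return 0;
}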

Vikhyat, it might be worth seeing if Sam would have time to look at my theory and see whether the suggested reordering might resolve the race condition. Unfortunately I'm not back until next Tuesday but, if it can wait until then, I'd be happy to create a test package with the modifications suggested above.

Comment 14 David Zafman 2016-04-28 06:07:00 UTC

(In reply to Brad Hubbard from comment #13)

I too was thinking along those lines. To verify, I added a sleep right before the first map is published on osd.0, then started it and a bunch more OSDs. It did not reproduce the problem.



Comment 15 Brad Hubbard 2016-04-28 06:23:17 UTC
(In reply to David Zafman from comment #14)
> 
> I too was thinking along those lines. To verify, I added a sleep right
> before the first map is published on osd.0, then started it and a bunch more
> OSDs. It did not reproduce the problem.

Hmmm, cool. Nice idea.

We must be missing something along those lines though, I think...

