Bug 673144 - [vdsm] [storage] mailbox checksum errors running lvextend - extend fails and vm pause
Summary: [vdsm] [storage] mailbox checksum errors running lvextend - extend fails and...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: vdsm
Version: 6.1
Hardware: x86_64
OS: All
unspecified
high
Target Milestone: rc
: ---
Assignee: Eduardo Warszawski
QA Contact: yeylon@redhat.com
URL:
Whiteboard: Storage
Depends On:
Blocks:
 
Reported: 2011-01-27 14:36 UTC by Haim
Modified: 2016-04-18 06:37 UTC
11 users

Fixed In Version: vdsm-4.9-47.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-08-19 15:24:02 UTC
Target Upstream Version:


Attachments
SPM log. (917.98 KB, application/x-gzip)
2011-01-27 14:36 UTC, Haim

Description Haim 2011-01-27 14:36:58 UTC
Created attachment 475612
SPM log.

Description of problem:

When running a VM on the SPM whose LV is extended rapidly (repeated requests to extend the logical volume, on FCP storage), I get checksum errors on the mailbox LV, and the VM pauses.
The theory is that there is a race in the way the SPM and HSM exchange messages:

- When the HSM wants to deliver a message to the SPM, it writes to a specific
  location in its outbox, which is the SPM's inbox.
- In this scenario, where the VM runs on the SPM, the vdsm host has two roles,
  SPM and HSM (this is how our code works). A race might occur: one thread
  writes a message (as HSM) to extend the LV, the 'dd' command is initiated,
  and the thread goes to sleep. Meanwhile, another thread reads its inbox (as
  SPM); because the 'dd' has not finished, there is a checksum error.
- Afterwards, another thread deletes the mailbox, which ruins the whole
  exchange.
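The race described above can be illustrated with a small, self-contained Python simulation (vdsm itself is written in Python). This is a sketch under stated assumptions, not vdsm code: the mailbox size, the checksum-in-the-last-4-bytes layout, the byte-sum checksum, and all function names are hypothetical. Two threading.Event objects force the interleaving the report describes: the HSM-side writer lands the first half of the mailbox and "sleeps", and the SPM-side reader validates the half-written mailbox before the writer finishes.

```python
import struct
import threading

MAILBOX_SIZE = 64    # hypothetical mailbox size in bytes
CHECKSUM_BYTES = 4   # assumption: checksum stored in the last 4 bytes


def checksum(body: bytes) -> bytes:
    """Hypothetical checksum: byte sum modulo 2**32, packed little-endian."""
    return struct.pack("<I", sum(body) % 2**32)


def build_mailbox(message: bytes) -> bytes:
    """Pad the message and append its checksum, giving one full mailbox image."""
    body = message.ljust(MAILBOX_SIZE - CHECKSUM_BYTES, b"\0")
    return body + checksum(body)


def is_valid(mailbox: bytes) -> bool:
    """The reader's (SPM's) check: recompute the checksum over the body."""
    split = MAILBOX_SIZE - CHECKSUM_BYTES
    return checksum(mailbox[:split]) == mailbox[split:]


def simulate_race():
    """Writer copies the image in two chunks, like a dd that is still running;
    the reader validates between the chunks.
    Returns (verdict during the write, verdict after the write)."""
    shared = bytearray(MAILBOX_SIZE)          # stands in for the SPM inbox
    image = build_mailbox(b"extend lv")
    half = MAILBOX_SIZE // 2
    first_chunk_done = threading.Event()
    reader_done = threading.Event()
    verdict = []

    def hsm_writer():
        shared[:half] = image[:half]          # first part of the write lands
        first_chunk_done.set()
        reader_done.wait()                    # writer "sleeps" mid-write
        shared[half:] = image[half:]          # rest of the write, checksum included

    def spm_reader():
        first_chunk_done.wait()
        verdict.append(is_valid(bytes(shared)))  # reads a half-written mailbox
        reader_done.set()

    threads = [threading.Thread(target=hsm_writer),
               threading.Thread(target=spm_reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Once the writer has finished, the same mailbox is consistent again.
    return verdict[0], is_valid(bytes(shared))
```

Calling simulate_race() returns (False, True): the read taken mid-write fails the checksum, while the same mailbox validates once the write completes. The failure is a timing artifact, not corrupt data, which is why acting on it (for example by deleting the mailbox) makes things worse.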

The above issue has reproduced twice so far and happens sporadically on my setup, using FCP storage.

Repro steps:

1) Create a VM with an OS installed (or boot a live CD).
2) Add a new thinly provisioned disk (100G).
3) Extend the LV by writing to the new disk with the 'dd' command.
4) The issue happens after approximately 30G (29 extends).

Comment 1 Haim 2011-02-08 09:44:13 UTC
Verified on vdsm-4.9-47.el6.x86_64.

Managed to reproduce and saw the error message below, but the VM continues to extend.


Dummy-122::ERROR::2011-02-08 11:38:16,385::storage_mailbox::538::Storage.MailBox.SpmMailMonitor::(_validateMailbox) SPM_MailMonitor: mailbox 1 checksum failed, not clearing mailbox, clearing newMail.
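The log line above suggests how the fixed version reacts to the race: it leaves the on-disk mailbox alone and discards only its in-memory copy of the new mail, so a later poll re-reads the mailbox once the write has finished. The Python sketch below is one plausible reading of that message, not the actual vdsm code; the class name InboxMonitorSketch, the per-host mailbox layout, and the checksum are all hypothetical.

```python
import logging
import struct

log = logging.getLogger("SpmMailMonitorSketch")

MAILBOX_SIZE = 64    # hypothetical per-host mailbox size in bytes
CHECKSUM_BYTES = 4   # assumption: checksum kept in the last 4 bytes
EMPTY_MAILBOX = bytes(MAILBOX_SIZE)


def checksum(body: bytes) -> bytes:
    # Hypothetical byte-sum checksum, not necessarily the one vdsm uses.
    return struct.pack("<I", sum(body) % 2**32)


def is_valid(mailbox: bytes) -> bool:
    split = MAILBOX_SIZE - CHECKSUM_BYTES
    return checksum(mailbox[:split]) == mailbox[split:]


def build_mailbox(message: bytes) -> bytes:
    body = message.ljust(MAILBOX_SIZE - CHECKSUM_BYTES, b"\0")
    return body + checksum(body)


class InboxMonitorSketch:
    """Hypothetical SPM-side monitor polling an inbox with one mailbox per host."""

    def __init__(self, num_hosts: int):
        self.num_hosts = num_hosts
        # Snapshot of the inbox as seen on the previous poll.
        self.last_seen = bytes(MAILBOX_SIZE * num_hosts)

    def poll(self, inbox: bytes) -> list:
        """Return (host, mailbox) pairs that changed and passed validation."""
        new_mail = bytearray(inbox)   # local copy; the shared inbox is never modified
        accepted = []
        for host in range(self.num_hosts):
            start, end = host * MAILBOX_SIZE, (host + 1) * MAILBOX_SIZE
            mailbox = bytes(new_mail[start:end])
            if mailbox == self.last_seen[start:end]:
                continue              # nothing new from this host
            if not is_valid(mailbox):
                log.error("mailbox %d checksum failed, not clearing mailbox, "
                          "clearing newMail.", host)
                # Forget the torn copy locally only, so the next poll sees the
                # mailbox as changed again and re-validates it.
                new_mail[start:end] = EMPTY_MAILBOX
                continue
            accepted.append((host, mailbox))
        self.last_seen = bytes(new_mail)
        return accepted
```

For example, with two hosts, polling an inbox where host 1's write is only half done returns an empty list (and logs the checksum error); polling again after the write completes returns [(1, <full mailbox>)], and a further poll of the unchanged inbox returns an empty list. Skipping the mailbox instead of clearing it trades a short delay for correctness, which matches Comment 1: the error still appears, but the VM keeps extending.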

