Created attachment 475612
SPM log.
Description of problem:
When running a VM on the SPM host that extends its LV rapidly (requests to extend a logical volume, using FCP storage), I get checksum errors on the mailbox LV, and the VM pauses.
The theory is that we have a race. SPM and HSM exchange messages as follows:
- When the HSM wants to deliver a message to the SPM, it writes to a certain location in its
outbox, which is the SPM inbox directory.
- In this scenario, where the VM runs on the SPM host, the vdsm machine has two roles, one as SPM
and one as HSM (this is our code). A race might occur when one thread
writes a message (as HSM) to extend the LV: the 'dd' command is initiated and the
thread goes to sleep. During that time, another thread reads (as
SPM) its inbox, and because 'dd' has not finished, there is a checksum
error.
- Afterwards, another thread deletes the mailbox, which corrupts the whole
thing.
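The suspected interleaving can be illustrated with a small, deterministic sketch. This is not the vdsm code: the block layout, the sizes, the byte-sum checksum, and the names `pack`, `is_valid`, `write_in_chunks`, and `simulate_race` are all assumptions made for illustration. The writer is modelled as a generator so that the "thread goes to sleep while 'dd' is in flight" moment can be reproduced without real threads.

```python
# Hypothetical mailbox layout: fixed-size body followed by a 4-byte checksum.
MESSAGE_SIZE = 64
CHECKSUM_SIZE = 4


def checksum(body: bytes) -> bytes:
    """Byte-sum checksum (an assumed scheme, chosen for simplicity)."""
    return (sum(body) % 2**32).to_bytes(CHECKSUM_SIZE, "little")


def pack(payload: bytes) -> bytes:
    """Pad the payload to MESSAGE_SIZE and append its checksum."""
    body = payload.ljust(MESSAGE_SIZE, b"\0")
    return body + checksum(body)


def is_valid(block: bytes) -> bool:
    """Reader side: recompute the checksum and compare with the stored one."""
    body, stored = block[:MESSAGE_SIZE], block[MESSAGE_SIZE:]
    return checksum(body) == stored


def write_in_chunks(mailbox: bytearray, block: bytes, chunk: int):
    """Writer side: copy the block chunk by chunk, pausing after each chunk."""
    for off in range(0, len(block), chunk):
        mailbox[off:off + chunk] = block[off:off + chunk]
        yield  # the writer "sleeps" here while the write is incomplete


def simulate_race():
    mailbox = bytearray(pack(b"old message"))
    writer = write_in_chunks(mailbox, pack(b"extend request f"), chunk=16)

    next(writer)                          # HSM role: first chunk is on disk
    torn_ok = is_valid(bytes(mailbox))    # SPM role reads a half-written block

    for _ in writer:                      # the write finishes
        pass
    complete_ok = is_valid(bytes(mailbox))
    return torn_ok, complete_ok


if __name__ == "__main__":
    torn_ok, complete_ok = simulate_race()
    print("read during write valid:", torn_ok)
    print("read after write valid:", complete_ok)
```

In this sketch the read taken mid-write sees a new body with the old checksum and fails validation, while the read taken after the write completes passes. Serializing the writer and the reader (for example with a lock held for the whole write) would avoid the torn read here; whether that corresponds to the eventual fix is not stated in this report.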
The above issue has been reproduced twice so far, and happens specifically on my setup, using FCP storage.
Repro steps:
1) Create a VM with an OS installed (or a live CD).
2) Add a new thinly provisioned disk (100G).
3) Extend the LV by using the 'dd' command.
4) The failure happens after approximately 30G (29 extends).
Verified on vdsm-4.9-47.el6.x86_64.
Managed to reproduce and saw the error message below; the VM continued to extend.
Dummy-122::ERROR::2011-02-08 11:38:16,385::storage_mailbox::538::Storage.MailBox.SpmMailMonitor::(_validateMailbox) SPM_MailMonitor: mailbox 1 checksum failed, not clearing mailbox, clearing newMail.