Created attachment 590187
hsm logs

Version:
RHEVM SI4, vdsm-4.9.6-10.el6.x86_64

Scenario:
The environment contains hosts that run ~300-500 VMs each. The mailbox fills up frequently, so many lvextend requests are never read by the SPM. The mailbox has 63 slots per host, and the SPM does not clear messages right after reading them, so when lvextend operations fail the mailbox becomes full.

From the HSM:

Thread-215::DEBUG::2012-06-07 10:04:08,263::storage_mailbox::361::Storage.MailBox.HsmMailMonitor::(_handleMessage) HSM_MailMonitor - ignoring duplicate message <storage.storage_mailbox.SPM_Extend_Message instance at 0x7f033c949ef0>
Thread-215::ERROR::2012-06-07 10:04:08,266::storage_mailbox::444::Storage.MailBox.HsmMailMonitor::(run) HSM_MailboxMonitor - Incoming mailmonitoring thread caught exception; will try to recover
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/storage_mailbox.py", line 412, in run
    self._handleMessage(message)
  File "/usr/share/vdsm/storage/storage_mailbox.py", line 364, in _handleMessage
    raise RuntimeError("HSM_MailMonitor - Active messages list full, cannot add new message")
RuntimeError: HSM_MailMonitor - Active messages list full, cannot add new message
Thread-215::DEBUG::2012-06-07 10:04:08,275::storage_mailbox::361::Storage.MailBox.HsmMailMonitor::(_handleMessage) HSM_MailMonitor - ignoring duplicate message <storage.storage_mailbox.SPM_Extend_Message instance at 0x2a74680>
Thread-215::DEBUG::2012-06-07 10:04:08,277::storage_mailbox::361::Storage.MailBox.HsmMailMonitor::(_handleMessage) HSM_MailMonitor - ignoring duplicate message <storage.storage_mailbox.SPM_Extend_Message instance at 0x7f03408585a8>
Thread-215::ERROR::2012-06-07 10:04:08,280::storage_mailbox::444::Storage.MailBox.HsmMailMonitor::(run) HSM_MailboxMonitor - Incoming mailmonitoring thread caught exception; will try to recover
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/storage_mailbox.py", line 412, in run
    self._handleMessage(message)
  File "/usr/share/vdsm/storage/storage_mailbox.py", line 364, in _handleMessage
    raise RuntimeError("HSM_MailMonitor - Active messages list full, cannot add new message")
RuntimeError: HSM_MailMonitor - Active messages list full, cannot add new message
Thread-215::DEBUG::2012-06-07 10:04:08,282::storage_mailbox::361::Storage.MailBox.HsmMailMonitor::(_handleMessage) HSM_MailMonitor - ignoring duplicate message <storage.storage_mailbox.SPM_Extend_Message instance at 0x7f0384f1a878>
Thread-215::ERROR::2012-06-07 10:04:08,285::storage_mailbox::444::Storage.MailBox.HsmMailMonitor::(run) HSM_MailboxMonitor - Incoming mailmonitoring thread caught exception; will try to recover
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/storage_mailbox.py", line 412, in run
    self._handleMessage(message)
  File "/usr/share/vdsm/storage/storage_mailbox.py", line 364, in _handleMessage
    raise RuntimeError("HSM_MailMonitor - Active messages list full, cannot add new message")
RuntimeError: HSM_MailMonitor - Active messages list full, cannot add new message
Thread-215::DEBUG::2012-06-07 10:04:08,288::storage_mailbox::361::Storage.MailBox.HsmMailMonitor::(_handleMessage) HSM_MailMonitor - ignoring duplicate message <storage.storage_mailbox.SPM_Extend_Message instance at 0x7f0348e23368>
Thread-215::ERROR::2012-06-07 10:04:08,290::storage_mailbox::444::Storage.MailBox.HsmMailMonitor::(run) HSM_MailboxMonitor - Incoming mailmonitoring thread caught exception; will try to recover
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/storage_mailbox.py", line 412, in run
    self._handleMessage(message)
  File "/usr/share/vdsm/storage/storage_mailbox.py", line 364, in _handleMessage
    raise RuntimeError("HSM_MailMonitor - Active messages list full, cannot add new message")
RuntimeError: HSM_MailMonitor - Active messages list full, cannot add new message
Thread-215::DEBUG::2012-06-07 10:04:08,293::storage_mailbox::361::Storage.MailBox.HsmMailMonitor::(_handleMessage) HSM_MailMonitor - ignoring duplicate message <storage.storage_mailbox.SPM_Extend_Message instance at 0x2980b90>
Thread-215::DEBUG::2012-06-07 10:04:08,295::storage_mailbox::361::Storage.MailBox.HsmMailMonitor::(_handleMessage) HSM_MailMonitor - ignoring duplicate message <storage.storage_mailbox.SPM_Extend_Message instance at 0x7f03407c3f80>
Thread-215::DEBUG::2012-06-07 10:04:08,302::storage_mailbox::361::Storage.MailBox.HsmMailMonitor::(_handleMessage) HSM_MailMonitor - ignoring duplicate message <storage.storage_mailbox.SPM_Extend_Message instance at 0x7f02ecd80cb0>
Thread-215::DEBUG::2012-06-07 10:04:08,305::storage_mailbox::361::Storage.MailBox.HsmMailMonitor::(_handleMessage) HSM_MailMonitor - ignoring duplicate message <storage.storage_mailbox.SPM_Extend_Message instance at 0x22607a0>
Thread-215::DEBUG::2012-06-07 10:04:08,310::storage_mailbox::361::Storage.MailBox.HsmMailMonitor::(_handleMessage) HSM_MailMonitor - ignoring duplicate message <storage.storage_mailbox.SPM_Extend_Message instance at 0x1c8c290>
Thread-215::ERROR::2012-06-07 10:04:08,315::storage_mailbox::444::Storage.MailBox.HsmMailMonitor::(run) HSM_MailboxMonitor - Incoming mailmonitoring thread caught exception; will try to recover
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/storage_mailbox.py", line 412, in run
    self._handleMessage(message)
  File "/usr/share/vdsm/storage/storage_mailbox.py", line 364, in _handleMessage
    raise RuntimeError("HSM_MailMonitor - Active messages list full, cannot add new message")
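To illustrate the failure mode described above, here is a minimal sketch of a fixed-size table of active messages. It is NOT vdsm's storage_mailbox.py: the class `ActiveMessages` and its methods `add`, `clear` and `used` are hypothetical. Only the 63-slot limit, the "ignoring duplicate message" behaviour and the "Active messages list full" error come from the report and logs. It shows that once 63 distinct extend requests are pending and none has been cleared, the next distinct request fails no matter how fast the monitor loop runs.

```python
"""Hypothetical sketch of a fixed-size active-message table.

Not vdsm code: names are illustrative. Only the per-host slot count and the
duplicate / list-full behaviours are taken from the bug report.
"""

SLOTS_PER_HOST = 63  # per-host slot count quoted in the report


class ActiveMessages:
    def __init__(self, slots=SLOTS_PER_HOST):
        # One entry per mailbox slot; None marks a free slot.
        self._slots = [None] * slots

    def add(self, message):
        """Store message in a free slot; return False if it is a duplicate."""
        if message in self._slots:
            # Mirrors the "ignoring duplicate message" DEBUG lines.
            return False
        try:
            free = self._slots.index(None)
        except ValueError:
            # No free slot left: mirrors "Active messages list full".
            raise RuntimeError(
                "Active messages list full, cannot add new message") from None
        self._slots[free] = message
        return True

    def clear(self, message):
        """Free the slot held by message (raises ValueError if absent)."""
        self._slots[self._slots.index(message)] = None

    def used(self):
        """Number of occupied slots."""
        return sum(1 for s in self._slots if s is not None)


if __name__ == "__main__":
    box = ActiveMessages()
    for i in range(SLOTS_PER_HOST):
        box.add(f"extend:vol-{i}")  # 63 pending extend requests, none cleared
    print(box.add("extend:vol-0"))  # duplicate: prints False
    try:
        box.add("extend:vol-63")    # 64th distinct request
    except RuntimeError as err:
        print(err)
```

In this sketch a slot is freed only by an explicit `clear`, so if the entry for a failed lvextend is never cleared, that slot stays occupied and the table eventually fills up.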
Created attachment 590188
spm logs
http://gerrit.ovirt.org/#/c/6083/
Comment #3 is a mistake.
This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.
As stated in comment #4, the change in comment #3 was a mistake and is unrelated to this BZ. Therefore, the BZ status should not be changed based on it.
While adding 150 hosts to a Data Center, we get the same errors:

Thread-31019::ERROR::2013-08-19 21:41:41,627::storage_mailbox::503::Storage.MailBox.HsmMailMonitor::(run) HSM_MailboxMonitor - Incoming mailmonitoring thread caught exception; will try to recover
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/storage_mailbox.py", line 468, in run
    self._handleMessage(message)
  File "/usr/share/vdsm/storage/storage_mailbox.py", line 408, in _handleMessage
    raise RuntimeError("HSM_MailMonitor - Active messages list full, "
RuntimeError: HSM_MailMonitor - Active messages list full, cannot add new message

RHEVM 3.3 - IS10 environment:
RHEVM: rhevm-3.3.0-0.15.master.el6ev.noarch
PythonSDK: rhevm-sdk-python-3.3.0.10-1.el6ev.noarch
VDSM: vdsm-4.12.0-61.git8178ec2.el6ev.x86_64
LIBVIRT: libvirt-0.10.2-18.el6_4.9.x86_64
QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.355.el6_4.5.x86_64
SANLOCK: sanlock-2.8-1.el6.x86_64
Created attachment 788200
Logs: rhevm, vdsm, libvirt, thread dump, superVdsm
Fede, is this still interesting?
(In reply to Allon Mureinik from comment #14)
> Fede, is this still interesting?

Fede, your input please?
(In reply to Allon Mureinik from comment #15)
> (In reply to Allon Mureinik from comment #14)
> > Fede, is this still interesting?
>
> Fede, your input please?

We could try to reproduce the issue, but there are too many variables: the number of VMs, how fast the storage is, how many hosts are accessing it, how many disks are thinly provisioned, etc. In the end, the only feedback we would get is about the environment we tested.
We know that this is a limitation of the technology we're using (like others, e.g. the maximum number of LVs in a domain), and we'll try to resolve it going forward (removing the SPM, a per-domain metadata lock, etc.).
I think we need feedback from a PM (needinfo on Sean) to understand how pressing this issue is. If it's critical enough, we can check whether there's anything we can do right now.
(In reply to Federico Simoncelli from comment #16)
> I think we need the feedback from a pm (needinfo on Sean) to understand how
> much this is a pressing issue. If it's critical enough we can look and check
> if there's anything that we can do right now.

This was reported by QA against 3.1 and was never escalated from the field, so it's not pressing.
Pushing out to 3.6 to be re-examined after the SPM is removed.
(In reply to Allon Mureinik from comment #17)
> (In reply to Federico Simoncelli from comment #16)
> > I think we need the feedback from a pm (needinfo on Sean) to understand how
> > much this is a pressing issue. If it's critical enough we can look and check
> > if there's anything that we can do right now.
>
> This was reported by QA against 3.1 and was never escalated from the field -
> it's not pressing.
> Pushing out to 3.6 to be reexamined after the SPM is removed.

Closing old bugs. Since this was never encountered in the field, and 3.6.0 will change the mechanism anyway, I don't think this is interesting.
If anyone disagrees, feel free to explain and reopen.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.