Bug 1136278 - The SPM stops monitoring the Incoming mail
Summary: The SPM stops monitoring the Incoming mail
Keywords:
Status: CLOSED DUPLICATE of bug 1119664
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.4.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.5.0
Assignee: Nir Soffer
QA Contact: Aharon Canan
URL:
Whiteboard: storage
Depends On:
Blocks:
Reported: 2014-09-02 09:31 UTC by Roman Hodain
Modified: 2019-04-28 09:44 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-09-02 17:42:46 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:
amureini: needinfo+



Description Roman Hodain 2014-09-02 09:31:16 UTC
Description of problem:
The SPM stops monitoring incoming mail after an I/O error. As a result, the
mailbox fills up and VMs end up in a paused state.


Version-Release number of selected component (if applicable):
	vdsm 4.13.2-0.17.el6ev

How reproducible:
	Unknown

Steps to Reproduce:
	Unknown

Actual results:
	The mailbox fills up because no one picks up the messages

Expected results:
	The monitoring is reestablished
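
For illustration only, a minimal sketch of the expected behaviour: a polling loop that logs a failed mailbox read and retries on the next cycle instead of exiting. This is not the vdsm code and not necessarily how the issue was fixed; `run_monitor`, `check_for_mail`, `should_stop` and the `interval` default are hypothetical names and values chosen for the example.

```python
import logging
import time

log = logging.getLogger("mail_monitor_sketch")


def run_monitor(check_for_mail, should_stop, interval=2.0, sleep=time.sleep):
    """Poll check_for_mail() until should_stop() returns True.

    All names here are hypothetical. A failed poll is logged and retried
    on the next cycle, so a single I/O error does not end the loop.
    Returns the number of polls that failed.
    """
    failures = 0
    while not should_stop():
        try:
            check_for_mail()
        except (IOError, OSError) as e:
            # Keep the thread alive; the storage may come back.
            failures += 1
            log.warning("mailbox read failed, will retry: %s", e)
        sleep(interval)
    return failures
```

With a loop of this shape, a transient "Input/output error" on the inbox would cost one polling cycle rather than stopping the monitoring thread.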

Additional info:



  60182 Thread-17::DEBUG::2014-08-18 11:06:51,805::blockSD::595::Storage.Misc.excCmd::(getReadDelay) FAILED: <err> = "/bin/dd: reading `/dev/bcd91f21-6ee0-4ff9-962f-d33328fc2a89/metadata': Input/output error\n0+0 records in\n0+0 records out\n0 bytes (0 B) copied, 3.55498 s, 0.0 kB/s\n"; <rc> = 1
  60183 Dummy-56::DEBUG::2014-08-18 11:06:51,806::storage_mailbox::733::Storage.Misc.excCmd::(_checkForMail) FAILED: <err> = "dd: reading `/rhev/data-center/06bfda7a-371a-4bdf-a886-e1e2e33859fc/mastersd/dom_md/inbox': Input/output error\n0+0 records in\n0+0 records out\n0 bytes (0 B) copied, 7.12117 s, 0.0 kB/s\n"; <rc> = 1
  60184 Thread-17::ERROR::2014-08-18 11:06:51,809::domainMonitor::239::Storage.DomainMonitorThread::(_monitorDomain) Error while collecting domain bcd91f21-6ee0-4ff9-962f-d33328fc2a89 monitoring information
  60185 Traceback (most recent call last):
  60186   File "/usr/share/vdsm/storage/domainMonitor.py", line 217, in _monitorDomain
  60187   File "/usr/share/vdsm/storage/blockSD.py", line 595, in getReadDelay
  60188   File "/usr/share/vdsm/storage/misc.py", line 229, in readspeed
  60189   File "/usr/share/vdsm/storage/misc.py", line 204, in _readfile
  60190 MiscFileReadException: Internal file read failure: ('/dev/bcd91f21-6ee0-4ff9-962f-d33328fc2a89/metadata',)
...
  60208 Thread-21::ERROR::2014-08-18 11:06:51,881::domainMonitor::239::Storage.DomainMonitorThread::(_monitorDomain) Error while collecting domain 2d6abcd1-afc6-4d77-ad5c-d465b9d256c7 monitoring information
  60209 Traceback (most recent call last):
  60210   File "/usr/share/vdsm/storage/domainMonitor.py", line 217, in _monitorDomain
  60211   File "/usr/share/vdsm/storage/blockSD.py", line 595, in getReadDelay
  60212   File "/usr/share/vdsm/storage/misc.py", line 229, in readspeed
  60213   File "/usr/share/vdsm/storage/misc.py", line 204, in _readfile
  60214 MiscFileReadException: Internal file read failure: ('/dev/2d6abcd1-afc6-4d77-ad5c-d465b9d256c7/metadata',)
...
  60256 Dummy-56::INFO::2014-08-18 11:06:54,580::storage_mailbox::796::Storage.MailBox.SpmMailMonitor::(run) SPM_MailMonitor - Incoming mail monitoring thread stopped


We can also see that the number of messages keeps growing:

   2001 2014-08-18 10:51:33,755::storage_mailbox::344::Storage.MailBox.HsmMailMonitor::(_handleResponses) HSM_MailboxMonitor(1/63) - Checking reply: '1xtnd\xed0\x9b;\xa2x\x9a\x86\xa3D\xf68\xc1\x88\xe8\xf5F\x03)R\xb3\xc3\x8c\xa5\xbfO)O\xb7\xa1KM000000000000080000000000000'
   2002 2014-08-18 10:58:33,879::storage_mailbox::428::Storage.MailBox.HsmMailMonitor::(_handleMessage) HSM_MailMonitor - start: 64, end: 128, len: 4096, message(1/63): '1xtnd\xed0\x9b;\xa2x\x9a\x86\xa3D\xf68\xc1\x88\xe8\xf5\xf5I\x84  y\xec\xab\xcbH?V\xd4\x8b\x1dX000000000000080000000000000'
   2003 2014-08-18 10:58:40,804::storage_mailbox::344::Storage.MailBox.HsmMailMonitor::(_handleResponses) HSM_MailboxMonitor(1/63) - Checking reply: '1xtnd\xed0\x9b;\xa2x\x9a\x86\xa3D\xf68\xc1\x88\xe8\xf5\xf5I\x84  y\xec\xab\xcbH?V\xd4\x8b\x1dX000000000000080000000000000'
   2004 2014-08-18 12:55:26,631::storage_mailbox::428::Storage.MailBox.HsmMailMonitor::(_handleMessage) HSM_MailMonitor - start: 64, end: 128, len: 4096, message(1/63): '1xtnd\xde#\xe0\x049R\xf5\xbe\xd4D\x83x\x94\x1c"\x99<\xf62\x9e\xf3~\xed\xa2\xefI\x17\xc4\xc8\x14\x99a000000000000080000000000000'
   2005 2014-08-18 12:55:26,639::storage_mailbox::428::Storage.MailBox.HsmMailMonitor::(_handleMessage) HSM_MailMonitor - start: 128, end: 192, len: 4096, message(2/63): '1xtnd\xde#\xe0\x049R\xf5\xbe\xd4D\x83x\x94\x1c"\x99\x95\xbdj\xfa=\xca\x02\xad\x87M>\xbe8\x04\x87\xe5000000000000080000000000000'
   2006 2014-08-18 13:01:30,816::storage_mailbox::428::Storage.MailBox.HsmMailMonitor::(_handleMessage) HSM_MailMonitor - start: 192, end: 256, len: 4096, message(3/63): '1xtnd\xde#\xe0\x049R\xf5\xbe\xd4D\x83x\x94\x1c"\x99\xc8\xce\xc2\x9d\xa8b\x17\x84\xcdG\xc6\x05\x0c\xe6\xec\xc7000000000000080000000000000'
   2007 2014-08-18 13:03:58,809::storage_mailbox::428::Storage.MailBox.HsmMailMonitor::(_handleMessage) HSM_MailMonitor - start: 256, end: 320, len: 4096, message(4/63): '1xtnd\xed0\x9b;\xa2x\x9a\x86\xa3D\xf68\xc1\x88\xe8\xf5\x1eU_\xf4I\xad\xf7\x9c\xbfJ\xfd\x90\x15"\xb9>000000000000080000000000000'
   2008 2014-08-18 14:06:02,548::storage_mailbox::428::Storage.MailBox.HsmMailMonitor::(_handleMessage) HSM_MailMonitor - start: 320, end: 384, len: 4096, message(5/63): '1xtnd\xde#\xe0\x049R\xf5\xbe\xd4D\x83x\x94\x1c"\x99Z\x16\x7fuv!T\xa0\xbeD>\xf5;\x1e\xaa#000000000000080000000000000'
...
   2061 2014-08-20 14:20:33,088::storage_mailbox::428::Storage.MailBox.HsmMailMonitor::(_handleMessage) HSM_MailMonitor - start: 3712, end: 3776, len: 4096, message(58/63): '1xtnd\xed0\x9b;\xa2x\x9a\x86\xa3D\xf68\xc1\x88\xe8\xf5}\xc3\xc8P\xae\xbc\n\x98\xb1A\xfa\x10\x0b\x8b\xfb 000000000000080000000000000'
   2062 2014-08-20 14:22:38,379::storage_mailbox::428::Storage.MailBox.HsmMailMonitor::(_handleMessage) HSM_MailMonitor - start: 3776, end: 3840, len: 4096, message(59/63): '1xtnd+\x82A&oE\xe4\x83\xd8O\x12-Uv\xcb\xa7m\x80\xcc\x11\xa3\xc5O\x99iL\x83L\xb2\xa8\xad\xa3000000000000080000000000000'
   2063 2014-08-20 14:35:56,971::storage_mailbox::428::Storage.MailBox.HsmMailMonitor::(_handleMessage) HSM_MailMonitor - start: 3840, end: 3904, len: 4096, message(60/63): '1xtndG\xaej\xf9\x88f\xb5\x8a\x00C\xf3"+0\x93O9\xda\x83\xfb{+\x12\x9b\xe7HRXP\xfez\x17000000000000080000000000000'
   2064 2014-08-20 14:45:09,358::storage_mailbox::428::Storage.MailBox.HsmMailMonitor::(_handleMessage) HSM_MailMonitor - start: 3904, end: 3968, len: 4096, message(61/63): '1xtnd\xb6\xc4\x9a\xf61fQ\x9bRE\xe11\xf1V%"*f)*\x16\x1dZ\xb0\x92NA\x90\xa3$X\x9e000000000000080000000000000'
   2065 2014-08-20 14:55:18,710::storage_mailbox::428::Storage.MailBox.HsmMailMonitor::(_handleMessage) HSM_MailMonitor - start: 3968, end: 4032, len: 4096, message(62/63): '1xtnd+\x82A&oE\xe4\x83\xd8O\x12-Uv\xcb\xa7\x91\xfe"N\x807\x1b\xa7|J\x17<V\x1c{2000000000000080000000000000'

In the omitted lines in between, the number of messages only grows.
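
The slot offsets in the log lines above follow a simple pattern, sketched here as an aid to reading them. The constants are taken from the log output (`len: 4096`, 64 bytes between `start` and `end`, `message(n/63)`); the function name `slot_range` is hypothetical, and the assumption that the first 64 bytes are not a message slot is inferred from message 1 starting at offset 64, not from the vdsm source.

```python
MAILBOX_SIZE = 4096  # "len: 4096" in the log
SLOT_SIZE = 64       # "end" minus "start" in the log

# Inferred from the log: message 1 starts at offset 64, so the first
# 64 bytes are assumed not to hold a message, leaving 63 slots.
MAX_MESSAGES = MAILBOX_SIZE // SLOT_SIZE - 1


def slot_range(n):
    """Return (start, end) byte offsets of message slot n (1-based)."""
    if not 1 <= n <= MAX_MESSAGES:
        raise ValueError("slot %d outside 1..%d" % (n, MAX_MESSAGES))
    start = n * SLOT_SIZE
    return start, start + SLOT_SIZE
```

By this arithmetic, message(62/63) at `start: 3968, end: 4032` leaves a single free slot in the mailbox.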

Comment 2 Allon Mureinik 2014-09-02 12:38:48 UTC
Nir, is this related to bug 1117795?

Comment 3 Nir Soffer 2014-09-02 17:42:46 UTC
This is a duplicate of bug 1119664, fixed in RHEV 3.4.1.

*** This bug has been marked as a duplicate of bug 1119664 ***

