Bug 973534

Summary: Notification service cannot recover from connectivity loss
Product: Red Hat Enterprise Virtualization Manager Reporter: Mooli Tayer <mtayer>
Component: ovirt-engine-notification-service Assignee: Mooli Tayer <mtayer>
Status: CLOSED CURRENTRELEASE QA Contact: Ilanit Stein <istein>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 3.3.0 CC: acathrow, iheim, jkt, Rhev-m-bugs
Target Milestone: ---   
Target Release: 3.3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: infra
Fixed In Version: is2 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1019461    

Description Mooli Tayer 2013-06-12 07:48:28 UTC
Description of problem:

If DB connectivity is lost at some point and later regained, the notification service daemon will not regain connectivity unless restarted.

How reproducible:

Every time.

Steps to Reproduce:

1.) First, to make reproduction quicker, edit the configuration file:
share/ovirt-engine/conf/notifier.conf.defaults
INTERVAL_IN_SECONDS=10
(Note: to run the notification service, you also need to configure the MAIL_SERVER property.)
 
2.) Run the notification service:
share/ovirt-engine/services/ovirt-engine-notifier.py start

and watch the log:
var/log/ovirt-engine/notifier/notifier.log

3.) Stop the DB:
sudo service postgresql stop
Wait for the first exception to appear in the log.

4.) Start the DB again:
sudo service postgresql start

Actual results:

The notification service does not regain connectivity unless it is restarted,
and a NullPointerException is written to the log once every iteration:

Failed to run the service: [null]
java.lang.NullPointerException
        at org.ovirt.engine.core.tools.common.db.StandaloneDataSource.checkConnection(StandaloneDataSource.java:112)
        at org.ovirt.engine.core.tools.common.db.StandaloneDataSource.getConnection(StandaloneDataSource.java:130)
        at org.ovirt.engine.core.notifier.NotificationService.processEvents(NotificationService.java:220)
        at org.ovirt.engine.core.notifier.NotificationService.run(NotificationService.java:103)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)

Expected results:

The notification service reconnects and resumes normal operation once the DB is available again.
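
The expected behaviour can be illustrated with a minimal sketch of a data source that re-opens its connection whenever the cached one is missing or no longer valid. This is not the actual StandaloneDataSource code or the fix that was shipped; the ConnectionFactory interface, the ReconnectingDataSource class, and the validation timeout parameter are all hypothetical names introduced for illustration. Only the standard java.sql.Connection methods isClosed(), isValid(int) and close() are assumed.

```java
import java.sql.Connection;
import java.sql.SQLException;

/** Hypothetical abstraction over "open a new DB connection" (e.g. a DriverManager call). */
interface ConnectionFactory {
    Connection open() throws SQLException;
}

/** Hypothetical sketch: caches one connection and replaces it when it stops being usable. */
class ReconnectingDataSource {
    private final ConnectionFactory factory;
    private final int validationTimeoutSeconds;
    private Connection connection; // null until first use, and again after a failed reconnect

    ReconnectingDataSource(ConnectionFactory factory, int validationTimeoutSeconds) {
        this.factory = factory;
        this.validationTimeoutSeconds = validationTimeoutSeconds;
    }

    /**
     * Returns a usable connection, re-opening it if needed.
     * While the DB is down this throws SQLException (never NullPointerException),
     * so the next scheduled iteration simply tries again.
     */
    synchronized Connection getConnection() throws SQLException {
        if (!isUsable(connection)) {
            closeQuietly(connection);
            connection = null;           // stay in a well-defined state if open() fails
            connection = factory.open(); // throws SQLException while the DB is unreachable
        }
        return connection;
    }

    /** A null, closed, or invalid connection is treated as unusable. */
    private boolean isUsable(Connection c) {
        if (c == null) {
            return false;
        }
        try {
            return !c.isClosed() && c.isValid(validationTimeoutSeconds);
        } catch (SQLException e) {
            return false;
        }
    }

    private static void closeQuietly(Connection c) {
        if (c == null) {
            return;
        }
        try {
            c.close();
        } catch (SQLException ignored) {
            // The connection is being discarded anyway.
        }
    }
}
```

With a structure like this, a caller such as the periodic processEvents step would see a SQLException on each iteration while the DB is stopped and would get a fresh connection on the first iteration after the DB is started again, without a service restart.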

Additional info:

Comment 1 Ilanit Stein 2013-07-17 07:44:34 UTC
Verified on is5.

Comment 2 Itamar Heim 2014-01-21 22:26:43 UTC
Closing - RHEV 3.3 Released

Comment 3 Itamar Heim 2014-01-21 22:30:00 UTC
Closing - RHEV 3.3 Released