Bug 1321009

Summary: NPE caused by requests during startup result in UndeclaredRepositoryException later
Product: [JBoss] JBoss Enterprise Portal Platform 6 Reporter: Martin Weiler <mweiler>
Component: PortalAssignee: Default User <jbpapp-maint>
Status: CLOSED WONTFIX QA Contact: Tomas Kyjovsky <tkyjovsk>
Severity: high Docs Contact:
Priority: high    
Version: 6.2.0CC: epp-bugs, tkobayas
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-04-04 08:45:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Martin Weiler 2016-03-24 13:16:28 UTC
Description of problem:
We are currently facing issues with JCR session handling in our application.
The root cause of this issue is that the JCR session handling in the eXo framework does not handle closed sessions correctly, and this exception is thrown randomly: 

2016-03-02 06:39:49,961 ERROR [org.apache.catalina.core.ContainerBase.[jboss.web].[default
-host]] (http-/0.0.0.0:8443-1) JBWEB000313: Exception processing error page /WEB-INF/pages/error/general-error.xhtml: javax.servlet.ServletException: org.chromattic.api.UndeclaredRepositoryException: javax.jcr.InvalidItemStateException: This kind of operation is forbidden after a session.logout().: This kind of operation is forbidden after a session.logout().: This kind of operation is forbidden after a session.logout(). 
at org.exoplatform.portal.application.PortalController.onService(PortalController.java:112) [exo.portal.webui.portal-3.6.5.Final-redhat-1-BZ-1066502.jar:3.6.5.Final-redhat-1-BZ-1066502] 
at org.exoplatform.container.web.AbstractHttpServlet.service(AbstractHttpServlet.java:132) [exo.kernel.container-2.4.7-GA-redhat-1.jar:2.4.7-GA-redhat-1] 
at javax.servlet.http.HttpServlet.service(HttpServlet.java:847) [jboss-servlet-api_3.0_spec-1.0.2.Final-redhat-1.jar:1.0.2.Final-redhat-1] 
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:295) [jbossweb-7.2.2.Final-redhat-1.jar:7.2.2.Final-redhat-1] 
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214) [jbossweb-7.2.2.Final-redhat-1.jar:7.2.2.Final-redhat-1] 
at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:832) [jbossweb-7.2.2.Final-redhat-1.jar:7.2.2.Final-redhat-1] 
at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:622) [jbossweb-7.2.2.Final-redhat-1.jar:7.2.2.Final-redhat-1] 

However, the error only appears on days, when there have been the following NPEs at server startup (we do nightly server restarts). These exceptions could be reproduced by invoking any of the pages of our application during server startup: 

2016-03-02 01:01:33,020 ERROR [org.exoplatform.services.organization.idm.PicketLinkIDMOrganizationServiceImpl] (http-/0.0.0.0:8443-1) null: java.lang.NullPointerException 
at org.exoplatform.services.organization.idm.PicketLinkIDMServiceImpl.getIdentitySession(PicketLinkIDMServiceImpl.java:211)
[exo.portal.component.identity-3.6.5.Final-redhat-1.jar:3.6.5.Final-redhat-1] 
at org.exoplatform.services.organization.idm.PicketLinkIDMOrganizationServiceImpl.startRequest(PicketLinkIDMOrganizationSer
viceImpl.java:140) [exo.portal.component.identity-3.6.5.Final-redhat-1.jar:3.6.5.Final-redhat-1] 
at org.exoplatform.container.component.RequestLifeCycle.doBegin(RequestLifeCycle.java:57) [exo.kernel.container-2.4.7-GA-redhat-1.jar:2.4.7-GA-redhat-1] 
at org.exoplatform.container.component.RequestLifeCycleStack.begin(RequestLifeCycleStack.java:94) [exo.kernel.container-2.4.7-GA-redhat-1.jar:2.4.7-GA-redhat-1] 
at org.exoplatform.container.component.RequestLifeCycle.begin(RequestLifeCycle.java:107) [exo.kernel.container-2.4.7-GA-redhat-1.jar:2.4.7-GA-redhat-1] 
at org.exoplatform.container.component.RequestLifeCycle.begin(RequestLifeCycle.java:141) [exo.kernel.container-2.4.7-GA-redhat-1.jar:2.4.7-GA-redhat-1] 
at org.exoplatform.web.WebAppController.service(WebAppController.java:335) [exo.portal.component.web.controller-3.6.5.Final-redhat-1.jar:3.6.5.Final-redhat-1] 
at org.exoplatform.portal.application.PortalController.onService(PortalController.java:110) [exo.portal.webui.portal-3.6.5.Final-redhat-1-BZ-1066502.jar:3.6.5.Final-redhat-1-BZ-1066502] 

Version-Release number of selected component (if applicable):


How reproducible:
Intermittently

Steps to Reproduce:
1. Deploy sample-portal (*)
2. Start the server
3. Start a script that attempts to access the sample-portal repeatedly, eg:
$ while true; do wget -nv -a wget.log "http://localhost:8080/sample-portal/" ; rm index.html*; sleep 1; done

* not 100% sure if this is necessary, but I could not reproduce with the default portal only

Actual results:
Intermittently, the requests result in the outlined NPE

Expected results:
Requests are either blocked by the PortalCheckInitFilter, or return the requested page once the portal is started. No NPE should happen


Additional info:
Further debug logging revealed that the PortalController is not fully initialized, but the PortalContainer.isStarted already returns true:

The request arrives when the PortalContainer is in "started" state:

2016-03-23 12:53:38,118 DEBUG [org.gatein.portal.init.PortalCheckInitFilter] (http-/0.0.0.0:8181-1) Request 'http://host:8181/sample-portal' - context= sample-portal - isPortalStarted=false
2016-03-23 12:53:38,118 DEBUG [org.gatein.portal.init.PortalCheckInitFilter] (http-/0.0.0.0:8181-1) PortalContainer 'sample-portal' - context= sample-portal - isStarted=true

Then the NPE occurs:

2016-03-23 12:53:38,221 ERROR [org.apache.catalina.core.ContainerBase.[jboss.web].[default-host].[/sample-portal].[portal]] (http-/0.0.0.0:8181-1) JBWEB000236: Servlet.service() for servlet portal threw exception: java.lang.NullPointerException
	at org.exoplatform.portal.application.PortalRequestContext.getUserPortalConfig(PortalRequestContext.java:268) [exo.portal.webui.portal-3.6.5.Final-redhat-1-BZ-1066502.jar:3.6.5.Final-redhat-1-BZ-1066502]
	at org.exoplatform.portal.application.PortalRequestHandler.execute(PortalRequestHandler.java:156) [exo.portal.webui.portal-3.6.5.Final-redhat-1-BZ-1066502.jar:3.6.5.Final-redhat-1-BZ-1066502]

In my opinion, the problem is that the PortalController goes into state "started" when there is still initialization going on, which is only completed seconds later:

2016-03-23 12:53:41,752 INFO  [org.gatein.portal.application.PortalController] (MSC service thread 1-2) The WebAppController has been successfully initialized for the portal 'sample-portal'

Comment 2 Toshiya Kobayashi 2016-03-25 06:50:47 UTC
Note: The issue is observed in JPP 6.1.1 and 6.2.0.

Comment 3 Toshiya Kobayashi 2016-03-25 08:22:09 UTC
Analysis:

I see PortalCheckInitFilter has 2 problems.

1. PortalCheckInitFilter flips the flag isPortalStarted to true once it confirms ONE PortalContainer is started. So if it confirms "portal" PortalContainer is started, it will allow "sample-portal" requests too.

https://github.com/gatein/gatein-portal/blob/3.6.x/component/portal/src/main/java/org/gatein/portal/init/PortalCheckInitFilter.java

We can fix it by checking per PortalContainer. But the issue which the customer is hitting is the next one.

2. The gap between "PortalContainer.isStarted" and deployment complete.

As you can find in the RootContainer.createPortalContainer(), there is a gap between "pcontainer.start(true);" and "executeInitTasks(pcontainer, PortalContainerPostCreateTask.TYPE);" where PortalController.onInit() is actually executed. 

https://github.com/exoplatform/kernel/blob/stable/2.4.x/exo.kernel.container/src/main/java/org/exoplatform/container/RootContainer.java#L759-L766

So we need to add (or replace with) another status check.

Comment 4 Toshiya Kobayashi 2016-04-04 08:45:56 UTC
This issue is resolved by a custom filter which waits for server startup. The following article contains the filter and instructions:

* NullPointerExceptions in log files when JPP 6 is accessed before fully started 
https://access.redhat.com/solutions/497133