Bug 1390739 - undefined method `path' for nil:NilClass
Summary: undefined method `path' for nil:NilClass
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Providers
Version: 5.7.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: GA
Target Release: 5.8.0
Assignee: Jirka Kremser
QA Contact: Mike Foley
URL:
Whiteboard:
Depends On:
Blocks: 1387271 1393546
Reported: 2016-11-01 19:05 UTC by Paul Gier
Modified: 2019-08-06 20:07 UTC (History)

Fixed In Version: 5.8.0.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1393546 (view as bug list)
Environment:
Last Closed: 2017-06-12 17:55:56 UTC
Category: ---
Cloudforms Team: Middleware
Target Upstream Version:
Embargoed:


Attachments
Hawkular Services log file using postgres jdbc config (127.98 KB, text/plain)
2016-11-02 18:51 UTC, Paul Gier
HS Log (95.15 KB, text/plain)
2016-11-03 19:25 UTC, Matt Mahoney
CFME evm.log (1.81 MB, application/zip)
2016-11-03 20:00 UTC, Matt Mahoney

Description Paul Gier 2016-11-01 19:05:37 UTC
When adding a Hawkular middleware provider that uses PostgreSQL to store inventory information, the inventory refresh fails in CloudForms and displays the error "undefined method `path' for nil:NilClass".

The evm.log file contains the following:
> ManageIQ::Providers::Hawkular::MiddlewareManager: [hawkular-prod-01]
> [----] E, [2016-11-01T13:31:06.087098 #2209:935148] ERROR -- : MIQ(ManageIQ::Providers::Hawkular::MiddlewareManager::Refresher#refresh) EMS: [hawkular-prod-01], id: [1] Refresh failed
> [----] E, [2016-11-01T13:31:06.087361 #2209:935148] ERROR -- : [NoMethodError]: undefined method `path' for nil:NilClass  Method:[rescue in block in refresh]
> [----] E, [2016-11-01T13:31:06.087834 #2209:935148] ERROR -- : /var/www/miq/vmdb/app/models/manageiq/providers/hawkular/middleware_manager.rb:91:in `block in os_resource_for'
> /var/www/miq/vmdb/app/models/ext_management_system.rb:360:in `with_provider_connection'
> /var/www/miq/vmdb/app/models/manageiq/providers/hawkular/middleware_manager.rb:89:in `os_resource_for'
> /var/www/miq/vmdb/app/models/manageiq/providers/hawkular/middleware_manager.rb:85:in `machine_id'
> /var/www/miq/vmdb/app/models/manageiq/providers/hawkular/middleware_manager/refresh_parser.rb:33:in `block (2 levels) in fetch_middleware_servers'
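
For readers less familiar with Ruby, the trace above is the usual shape of calling a method on the result of a lookup that found nothing. The following is only a minimal sketch of that failure mode, not the actual provider code: the `Resource` struct, the `os_path_unguarded` helper, and the sample path are hypothetical stand-ins.

```ruby
# Hypothetical stand-in for an inventory resource; not the real Hawkular client class.
Resource = Struct.new(:type_name, :path)

# Looks up the "Operating System" resource and returns its path without
# checking whether the lookup succeeded.
def os_path_unguarded(resources)
  os = resources.find { |r| r.type_name == 'Operating System' }
  os.path # raises NoMethodError when no such resource exists, because os is nil
end

# A feed that (so far) only contains a server resource.
resources = [Resource.new('WildFly Server', '/hypothetical/feed-1/server-1')]

begin
  os_path_unguarded(resources)
rescue NoMethodError => e
  # The exact wording of the message varies between Ruby versions.
  puts "#{e.class}: #{e.message}"
end
```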

Comment 2 Paul Gier 2016-11-01 19:50:36 UTC
Some additional information discovered during testing:
- The inventory tables in PostgreSQL are created successfully when the container is started with the postgres jdbc parameters, so the container is able to connect to the PostgreSQL server.
- When the same container runs without connecting to PostgreSQL (using the built-in hsqldb), the inventory refresh works fine.

Comment 4 Lukas Krejci 2016-11-02 12:59:42 UTC
I can't comment on the MiQ side of the picture, Ruby is like Greek to me :)

While I don't know the semantics of the "refresh", could the reason for this be that the hawkular provider is looking for something in inventory that is not yet there?

Postgres is slower than the embedded database, so this might just be a timing issue of sorts, coupled with an improperly handled response on the hawkular provider side.

Could you attach the hawkular-services logs so that I can check the error is not coming from the inventory side?

Comment 5 Jirka Kremser 2016-11-02 13:22:23 UTC
This will happen if this line is reached https://git.io/vXs7U or in other words if there is no such resource type called "Operating System" under the feed in the h-inventory.

There are a couple of scenarios I am aware of in which this can be true: the h-services were started with -Dhawkular.agent.enabled=false; the refresh is called before the data is in the h-inventory, as Lukas mentioned above; the operating system metrics are not being collected because that is somehow turned off in the agent config in the standalone.xml; or some other reason unknown to me.

Anyway the code should be more robust imho, but I am not sure how to handle this. It's the trade-off between the fail-fast strategy vs the "robust self-healing smartness" :]
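
To make the two options concrete, here is a rough sketch of both styles side by side. It is an illustration under assumptions, not the code from the provider or from any PR: `InventoryResource`, `os_path_or_nil`, `os_path!`, and `OsResourceMissing` are hypothetical names.

```ruby
# Hypothetical stand-in for an inventory resource.
InventoryResource = Struct.new(:type_name, :path)

OS_TYPE = 'Operating System'.freeze

# Tolerant variant: returns nil instead of raising when the resource is
# missing, using the safe navigation operator (Ruby 2.3+).
def os_path_or_nil(resources)
  resources.find { |r| r.type_name == OS_TYPE }&.path
end

# Raised by the fail-fast variant so the caller sees a descriptive error
# rather than a NoMethodError on nil.
class OsResourceMissing < StandardError; end

# Fail-fast variant: still aborts, but says what is actually missing.
def os_path!(resources)
  os_path_or_nil(resources) ||
    raise(OsResourceMissing, "no '#{OS_TYPE}' resource found under the feed")
end
```

With the tolerant variant a refresh could continue and leave the os field empty; with the fail-fast variant the refresh would still stop, but the log would name the missing resource.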

What are the implications if the os field is empty? Perhaps Juca knows.

Comment 6 Juraci Paixão Kröhling 2016-11-02 14:37:54 UTC
> What are the implications if the os field is empty? Perhaps Juca knows.

There's no problem at all if the field is empty. It is *expected* to be filled for Linux machines, especially the ones with systemd, such as newer Ubuntu, Fedora, ... It's *not expected* to be filled for containers, for instance. Or Windows machines.

I think I talked with Mazz back then, and there was a guarantee that "Operating System" would always exist. If that changed, then the "API" changed and this code obviously needs to be revisited.

Comment 7 Lukas Krejci 2016-11-02 16:07:12 UTC
> I think I talked with Mazz back then, and there was a guarantee that "Operating System" would always exist.

s/always/eventually/ and you are right :) Agent always discovers it but it can take time for it to appear in inventory...

Comment 8 Paul Gier 2016-11-02 18:51:47 UTC
Created attachment 1216708
Hawkular Services log file using postgres jdbc config

Comment 9 Paul Gier 2016-11-02 18:53:22 UTC
I should also mention that this is postgresql version 9.2.15, which is the current version available in the RHEL 7 yum repo.

Comment 10 Lukas Krejci 2016-11-03 10:26:38 UTC
Seeing this in the HS logs:
4:28:34,815 WARN  [org.hawkular.inventory.rest] (default task-17) RestEasy exception, : java.lang.RuntimeException: org.postgresql.util.PSQLException: ERROR: cached plan must not change result type
	at org.umlg.sqlg.structure.SqlgEdge.load(SqlgEdge.java:235)
	at org.umlg.sqlg.structure.SqlgElement.property(SqlgElement.java:215)
	at org.umlg.sqlg.structure.SqlgEdge.property(SqlgEdge.java:54)
	at org.hawkular.inventory.impl.tinkerpop.TinkerpopBackend.relate(TinkerpopBackend.java:717)
	at org.hawkular.inventory.impl.tinkerpop.TinkerpopBackend$3.relateToParent(TinkerpopBackend.java:924)
	at org.hawkular.inventory.impl.tinkerpop.TinkerpopBackend$3.defaultAction(TinkerpopBackend.java:842)
	at org.hawkular.inventory.impl.tinkerpop.TinkerpopBackend$3.defaultAction(TinkerpopBackend.java:839)
	at org.hawkular.inventory.api.model.StructuredData$Visitor$Simple.visitString(StructuredData.java:461)
	at org.hawkular.inventory.api.model.StructuredData.accept(StructuredData.java:97)
	at org.hawkular.inventory.impl.tinkerpop.TinkerpopBackend$3.visitMap(TinkerpopBackend.java:913)
	at org.hawkular.inventory.impl.tinkerpop.TinkerpopBackend$3.visitMap(TinkerpopBackend.java:839)
...

So the issue is twofold here.
1) Inventory on Postgres suffers from concurrent update of schema and querying (I've already started working on a fix yesterday)
2) MiQ provider should not assume the Operating System resource will always be there - it will *eventually* appear in inventory but there is no guarantee it will be there at the time of refresh.
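
One way a caller could cope with "eventually" is a bounded retry around the lookup. This is only a sketch under assumptions, not what the provider does: `wait_for`, its parameters, and the simulated lookup are hypothetical.

```ruby
# Calls the block until it returns a non-nil value or the attempts run out.
# Returns the value, or nil if it never appeared.
def wait_for(attempts: 5, delay: 1.0)
  attempts.times do |i|
    result = yield
    return result unless result.nil?
    sleep(delay) if i < attempts - 1 # no need to wait after the last try
  end
  nil
end

# Simulated inventory lookup: the resource only shows up on the third call.
calls = 0
os = wait_for(attempts: 5, delay: 0) do
  calls += 1
  calls >= 3 ? 'operating-system-resource' : nil
end

puts os.nil? ? 'still missing' : "found after #{calls} calls"
```

A nil result after the last attempt still has to be handled by the caller, so a retry complements a nil check rather than replacing it.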

Comment 11 Matt Mahoney 2016-11-03 19:25:25 UTC
Created attachment 1217127
HS Log

Comment 13 Matt Mahoney 2016-11-03 20:00:00 UTC
Created attachment 1217129
CFME evm.log

Comment 14 Lukas Krejci 2016-11-04 17:18:49 UTC
Note that PR https://github.com/hawkular/hawkular-inventory/pull/309 contains a fix for the postgres issues in inventory and is pending review.

Comment 15 Lukas Krejci 2016-11-07 12:50:44 UTC
Reassigning back to Heiko, because this is hopefully handled on the inventory side, while the MiQ side still needs attention.

Comment 16 Jirka Kremser 2016-11-07 16:10:27 UTC
MiQ PR: https://github.com/ManageIQ/manageiq/pull/12477

Comment 17 Paul Gier 2016-11-08 22:25:53 UTC
This seems to be working in the DR8 (0.19.0.Final) build.  Moving to ON_QA to verify that the original issue is resolved.

Comment 18 Heiko W. Rupp 2016-11-09 07:57:15 UTC
The ruby side is merged to MiQ master but, as far as I can see, not backported to Euwe. So the check for the "null pointer exception" is not yet in CF.

Comment 19 Jirka Kremser 2016-11-09 14:00:27 UTC
@Paul Gier: you probably got this from the comments, but just to be sure: this issue is non-deterministic. It happens only if you add the provider and run the refresh in MiQ fast enough.

Hmm, actually changing the platform enabled from "true" to "false" in the standalone.xml of the monitored server should work too.

Comment 24 Heiko W. Rupp 2016-11-09 19:32:52 UTC
Setting to POST, as it is in master on the ruby-side already

