Bug 1983824

Summary: Postgres JDBC 42.2.23 wont connect to Postgresql 13.3 via localhost (127.0.0.1)
Product: [Fedora] Fedora Modules Reporter: Martin Gregorie <martin>
Component: postgresqlAssignee: langdon <langdon>
Status: CLOSED NOTABUG QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: unspecifiedCC: asamalik, fjanus, hhorak, karsten, langdon, nphilipp, odubaj, panovotn, phracek, pkubat, praiskup, rado.pitonak, rbean
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-08-17 09:16:21 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
My current version of pg_hba.conf
none
A tgz file containing my Postgres configuration and Java test clients
none
Section of postgresql.service journal. none

Description Martin Gregorie 2021-07-19 21:49:04 UTC
Created attachment 1803440 [details]
My current version of pg_hba.conf

Description of problem:

The current JDBC module, 42.2.23 can't connect to a local postgresql instance
running version 13.3 as installed when upgrading from Fedora 33 to Fedora 34, and also will not accept connections over the local network.

There appears to be two common causes: 

1) formerly the Postgres JDBC module would use UNIX-domain socket connections
   to connect to the postgres server and a plain (non-SSL)
   connection when connecting over a LAN. However, since JDBC driver version
   42.2.23 was introduced, local connections are now forced to use localhost
   (127.0.0.1) connections because UNIX-domain socket connections are no 
   longer supported

2) This should not be a big issue except that: this version of Postgresql
   will not accept a connection definition of 'hostnossl' in pg_hba.conf
   and AFAIK localhost connections never use SSL.

3) This also prevents prevents it from accepting non-SSL connections from
   anywhere: I want to use them within my LAN.

4) This becomes rather harder to diagnose because:

   - if pg_hba.conf contains hostnossl definitions the server crashes rather
     than starting or restarting and the journal entry says that it failed 
     without saying why. I only discovered this by trial and error, so
     better diagnostics when parsing the that file, please.

   - describing connection as 'local' or 'host' does no good because it
     only causes the JDBC module to throw an exception:

     Exception caught: FATAL: no pg_hba.conf entry for host "127.0.0.1", 
     user "ma", database "ma", SSL off

     when the JDBC caller is run locally and causes the connection to be
     refused: 

         Exception caught: Connection to localhost:5432 refused. 
         Check that the hostname and port are correct and that the postmaster
         is accepting TCP/IP connections.

     Both these cases were working correctly before I upgraded from Fedora 
     33 to 34 (and hence also migrated to PostgreSQL 13.3 and JDBC version
     42.2.23 from the previous versions
     
Version-Release number of selected component (if applicable): 13.3 


How reproducible: 
=================
Always happens when I try to access the PostgreSQL 13.3 database using the JDBC module 42.2.23


Steps to Reproduce:
===================
1. Start Postgresql 13.3 with the pg_hba.conf file (attached)
2. Run any Java program that tries to access the database using
   JDBC version 42.2.23


Actual results:
===============
See above


Expected results: 
=================
A connection established to the database and its contents successfully accessed.


Additional info:
================
As I said above: the JDBC client programs were successfully accessing the database before I upgraded my systems from Fedora 33 to 34.

The database reindexing process was run immediately after the upgrade and completed successfully. The database seems to be intact, since running psql locally works just fine: no access errors and expected results returned for queries.

I've marked this bug as URGENT since it is blocking access to my databases

Comment 1 Filip Januš 2021-07-20 06:29:22 UTC
Hi Martin, 
thanks for your report, are you able to provide a complete reproducer, please? It would help and it makes the issue solving faster.

Comment 2 Martin Gregorie 2021-07-20 12:43:16 UTC
I'm not familiar with the term "complete reproducer", so can you say what it should include. 

I can, of course, supply:

- copies of all the Postgres configuration files

- the source of the simple Java program I've been using to investigate this problem, a simplified psql equivalent I wrote before
  the database application, a mail archiving system (the archiveer, and two graphical query programs, all written in Java).

- not mentioned: I also wrote a Perl plugin for Spamassassin (uses a Postgres view to determine if I've previously sent mail to the sender
  of the e-mail being checked. Its is also failing but I haven't yet looked to see why - I suspect that its failing because its using a 
  UNIX-domain socket connection. 

I assume you'll want all of the above, but is there anything else I should be including?

Comment 3 Martin Gregorie 2021-07-20 12:57:01 UTC
CORRECTION: The Perl plugin is using a basic TCP/IP connection via port 5432: the error message I'm seeing is:

DBI connect('dbname=ma;host=zoogz','ma',...) failed: could not connect to server: Connection refused: 10 Time(s)
    TCP/IP connections on port 5432? at /etc/mail/spamassassin/MAwhitelist.pm line 90.: 10 Time(s)
    Is the server running on host "zoogz" (192.168.7.2) and accepting: 10 Time(s)

IOW, it should be resolved when the pg_hba.conf parser accepts 'hostnossl' connection definitions.

Comment 4 Filip Januš 2021-07-20 13:13:01 UTC
(In reply to Martin Gregorie from comment #2)
> I'm not familiar with the term "complete reproducer", so can you say what it
> should include. 
> 
> I can, of course, supply:
> 
> - copies of all the Postgres configuration files
> 
> - the source of the simple Java program I've been using to investigate this
> problem, a simplified psql equivalent I wrote before
>   the database application, a mail archiving system (the archiveer, and two
> graphical query programs, all written in Java).
> 
> - not mentioned: I also wrote a Perl plugin for Spamassassin (uses a
> Postgres view to determine if I've previously sent mail to the sender
>   of the e-mail being checked. Its is also failing but I haven't yet looked
> to see why - I suspect that its failing because its using a 
>   UNIX-domain socket connection. 
> 
> I assume you'll want all of the above, but is there anything else I should
> be including?

Definitely as you write, please provide your configuration and simple java code for demonstration of the problem.

Comment 5 Martin Gregorie 2021-07-20 17:41:10 UTC
Created attachment 1803815 [details]
A tgz file containing my Postgres configuration and Java test clients

Comment 6 Martin Gregorie 2021-07-20 18:03:05 UTC
There are three Java test programs in the jsql directory inside the attached Bug-1983824.tgz:

- JSql - as already discussed

- DriverTest - this just lists the available JDBC drivers: - currently just org.postgresql.Driver@5ce65a89

- BrowseDB   - a simple visual database display tool - currently does exactly the same as JSql

If you want to compile these programs you'll find that I've set it up to use ant from the command line and that the build.xml file should work OK - it expects all jarfiles to be on $CLASSPATH including environ-1.5.jar which is included in the archive. It contains a collection of generally useful classes such as ReportError (controllable tracing and error reporting) and ArgParser (command line parser) which are used by JSql plus a lot more that I use elsewhere, e.g. CSV, a class for reading and writing CSV format files (why isn't there one in the standard Java class library?)

Anyway, I've just included the jarfile because doing that was easy, but if you'd prefer source so you can compile it yourself, just ask.

Comment 7 Martin Gregorie 2021-07-21 22:34:11 UTC
Created attachment 1804273 [details]
Section of postgresql.service journal.

I have discovered a problem with stopping and restarting the postgresql server:

The command "systemctl stop postgresql.service" often fails to stop the server, If this happens, the job log reports that the 'postmaster' process failed to stop
and that the reason is in the journal (it never is) and the service goes into a failed state. I have not found a way to restart postgresql once this state is reached except by rebooting the system. 

Once this state is reached, none of the systemctl commands daemon-reload, reload, restart or clean have any effect.

The pg_log.txt shows a relevant section of the journal when a cronjob runs at 04:00:00. It stops the postgresql, runs rsnapshot and restarts postgresql. is this a systemd or a postgresql problem?

Comment 8 Martin Gregorie 2021-07-21 22:41:09 UTC
This non-restarting issue typically leaves a message in the cronjob log saying that a process, postmaster, has failed to stop and is still running. The systemd postgresql.service is left in a FAILED state: I have yet find a way to clear that state and restart the service short of rebooting the system.

I'd appreciate knowing whether this is a systemd issue or a Postgresql one and if there's any way of recovering from it apart from a reboot.

Comment 9 Ondrej Dubaj 2021-07-22 07:47:58 UTC
Martin, I see you are experiencing problems with postgresql-jdbc-42.2.23. You have probably used an unsupported version of this package, as postgresql-jdbc-42.2.23 is present in Fedora official repositories just for a few hours. This means you have used some kind of 3rd partly software, which can bring problems.

Currently, in the latest stable Fedora-34, we have postgresql-jdbc-42.2.18 present and in Fedora-Rawhide we have postgresql-jdbc-42.2.23 present (but just for a few hours).

On our side, with the latest supported version of postgresql-13.3-4 and posgresql-jdbc-42.2.23-1 everything works as expected.

Can you please say, what version of Fedora are you using ? Are you please able to reproduce the problem with any stable and supported versions of these packages ?

Thank you in advance.

Ondrej

Comment 10 Martin Gregorie 2021-07-22 10:07:03 UTC
I am using Fedora 34, installed last Friday via an online upgrade from 33.

The copy of postgresql-jdbc-42.2.23 should be OK since I got it from https://jdbc.postgresql.org before I realised that the upgrade had already installed 42.2.18

I've just been upgrading the simple code I developed some years back to learn about the care and feeding of the Postgresql JDBC Manager, so I guess I should be able to simply remove 42.2.23 from my CLASSPATH definition. However,  the package installer *should* have added 42.2.18 to the basic Java search list, putting it on the search path  ahead of anything in $CLASSPATH.

Anyway, I can easily remove 42.2.23 from CLASSPATH temporarily to see what difference that makes.

However, my main problem right now is a bit different: my overnight backup run is essentially this:

systemctl stop postgresql.service
rsnapshot
systemctl start postgresql.service

But, unfortunately the Postgres start doesn't succeed: the stop seems to leave the postgres service in an unrecoverable FAILURE state which causes the subsequent start request to do nothing, leaving the service status 
set to FAILURE after trying twice to SIGKILL postmaster and reporting incorrectly that it had left postmaster running after failing to kill it, presumably because the FAILURE state prevented 'start' from even trying to start postmaster.

The only way out of this I've discovered is to reboot the computer. This never happened while I was on Fedora 33, or 32 for that matter.

Anyway: today I'm intent on getting Java access to Postgresql up and working again and, in the process, producing a more ordered list of the stuff I've found. I'll add this to the bug report later today.

Comment 11 Ondrej Dubaj 2021-07-22 10:19:36 UTC
I am afraid, that if you are using postgresql-jdbc-42.2.23 from https://jdbc.postgresql.org, we cannot help you with your problem. Please remove postgresql-jdbc-42.2.23 and try to install the supported versions of packages and try to reproduce the problem. If it persist, we can look deeper where the problem lies. Are you using the supported version of postgresql from Fedora repositories ?

Thanks in advance.

Comment 12 Martin Gregorie 2021-07-22 10:34:14 UTC
Yes, PostgreSQL is exactly as installed by the F33 -> F34 upgrade.

Will revert to JDBC 42.2.18 before doing anything more.

Comment 13 Martin Gregorie 2021-07-24 17:11:10 UTC
I've now reverted to JDBC 42.2.18 but have made no further progress because I cannot now start postgresql.

systemctl start postgresql.service  returns the messages:

https://serverfault.com/questions/547165/why-does-rsync-spawn-multiple-processes-for-me#547194
Jul 24 17:32:49 zoogz.gregorie.lan systemd[1]: Failed to start PostgreSQL database server.
                                    
Running "systemctl reset-failed postgresql.service" apparently works 

However, the following attempt to start Postgres fails in exactly the same way:

# systemctl start postgresql.service 
Job for postgresql.service failed because the control process exited with error code.
See "systemctl status postgresql.service" and "journalctl -xeu postgresql.service" for details.
# systemctl status postgresql.service 
× postgresql.service - PostgreSQL database server
     Loaded: loaded (/etc/systemd/system/postgresql.service; enabled; vendor preset: disabled)
    Drop-In: /etc/systemd/system/postgresql.service.d
             └─dataLocation.conf
     Active: failed (Result: exit-code) since Sat 2021-07-24 17:41:10 BST; 15s ago
    Process: 142827 ExecStartPre=/usr/libexec/postgresql-check-db-dir postgresql (code=exited, status=0/SUCCESS)
    Process: 142829 ExecStart=/usr/bin/postmaster -D ${PGDATA} (code=exited, status=1/FAILURE)
   Main PID: 142829 (code=exited, status=1/FAILURE)
        CPU: 87ms

Jul 24 17:41:10 zoogz.gregorie.lan systemd[1]: Starting PostgreSQL database server...
Jul 24 17:41:10 zoogz.gregorie.lan postmaster[142829]: 2021-07-24 17:41:10.565 BST [142829] LOG:  redirecting log output to logging collector process
Jul 24 17:41:10 zoogz.gregorie.lan postmaster[142829]: 2021-07-24 17:41:10.565 BST [142829] HINT:  Future log output will appear in directory "log".
Jul 24 17:41:10 zoogz.gregorie.lan systemd[1]: postgresql.service: Main process exited, code=exited, status=1/FAILURE
Jul 24 17:41:10 zoogz.gregorie.lan systemd[1]: postgresql.service: Killing process 142831 (postmaster) with signal SIGKILL.
Jul 24 17:41:10 zoogz.gregorie.lan systemd[1]: postgresql.service: Killing process 142831 (postmaster) with signal SIGKILL.
Jul 24 17:41:10 zoogz.gregorie.lan systemd[1]: postgresql.service: Failed with result 'exit-code'.
Jul 24 17:41:10 zoogz.gregorie.lan systemd[1]: postgresql.service: Unit process 142831 (postmaster) remains running after unit stopped.
Jul 24 17:41:10 zoogz.gregorie.lan systemd[1]: Failed to start PostgreSQL database server.

and this behavour repeats even if the computer has been freshly rebooted.

This behavior was stably repeatable before I ran dnf upgrade last night and remains stably repeatable after the dnf system update and and following reboot.

I have not made any changes to the Postgresql configuration files since my last comment.
The only Postgresql related change I've made was replace JDBC 42.2.23 with 42.2.18
but I fail to see how that could have any effect on whether the Postgresql server starts up or on why 
"systemctl reset-failed postgresql.service" appears to do nothing.

So, questions:

- Do you know a way to force postgresql to start regardless of the unit status? 
   
- If not, is this a problem with the way that postgresql.service is defined? 
  IOW, does the service need code added to allow 'systemctl reset-failed postgresql.service' to reset 
  the service state.

- Can you recommend a book or website that explains the systemd service data structures and/or how to
  inspect and modify them?

Comment 14 Fedora Admin user for bugzilla script actions 2021-08-02 00:09:18 UTC
This package has changed maintainer in Fedora. Reassigning to the new maintainer of this component.

Comment 15 Filip Januš 2021-08-02 07:09:45 UTC
I am afraid that your issue is caused by some type of misconfiguration of postgresql-server. Because it works on my side and I am not able to reproduce Your problem.
Maybe try to add a network mask here: "host      ma              ma              127.0.0.1               ident " -> "host      ma              ma              127.0.0.1/32               ident ",
it could cause some ty of failure during start.

Comment 16 Martin Gregorie 2021-08-02 13:29:34 UTC
You're absolutely right: I worked out wht postgresql wasn't starting over the weekend.

The immediate problem was screwed up directives in pg_hba.conf, which cause an immediate systemd service FAILURE condition.

However, mistakes in pg_hba.conf are not being reported. if pg_hba.conf has errors in it, postgresql.service exits with FAILURE set, and this exit is written to both /var/log/messages and the the systemd journal. However no diagnostic messages are sent to either /var/log/messages or captured in the systemd journal, which makes fixing the problem needlessly difficult.

I'm getting a bit more information from a simple Java program I originally used while I was finding out how to pass configuration details to the Postgresql JDBC module:
====================================================================================
$ java JSql jdbc:postgresql://zoogz/ma ma xxxxxx -ddd 
Connecting to Database URL = jdbc:postgresql://zoogz/ma
Properties: {password=xxxxxx, user=ma, ssl=false}
Exception caught: FATAL: Ident authentication failed for user "ma"

org.postgresql.util.PSQLException: FATAL: Ident authentication failed for user "ma"
====================================================================================

The above was run from another system on my LAN. 
The computer name, username and password are correct for the user owning the database: 
ssh logins using them work as expected. The password is obscured (obviously) even though 
the 'zoogz' computer is not accessable from outside my LAN.

The connection attempt fails regardless of whether the SSL property is true or false, 
so I must still be missing something quite obvious about 'ident' authorisation.

The relevant parts of my current pg_hba.conf are: 
==========================================================================
# TYPE    DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local     all             all                                     trust

# IPv4 local connections:
host      all             all             192.168.7.0/24          ident
host      ma              ma              localhost               ident

# IPv6 local connections:
host      all             all             ::1/128                 ident

# Allow replication connections from localhost, by a user with the
# replication privilege.
local     replication     all                                     peer
==========================================================================

Comment 17 Martin Gregorie 2021-08-07 17:44:04 UTC
I'm pleased to say that I now have Postgres working exactly as I wanted it to.

- I use it to support a mail archive where:

  - all programs except one are Java, so all use JDBC. They are (an incremental backup program, a loader which can
    read backups as well as incoming mail (via mstor), a mail query and display program, and a maintenanve program).

  - the non-Java program is a Spamassasin plugin, so is written in Perl. Its purpose is to whitelist incoming e-mail
    if the database shows we have previously sent mail to the sender.

This application has been running since 2008, IOW for the last 13 years. During this time successive Fedora upgrades have not required any any action at my end apart from sometimes needing to run a database restructure. Similarly, there has been no alteration to the database logging mechanism, but I have equally lost familiarity with the documentation until the most recent upgrade to Fedora 34 / PostgreSQL 13.3 and the associated change that Postgres log messages are no longer sent to /val/log/messages

I'm certain that others will be in my shoes, so I have a few suggestions that should make debugging configuration problems easier:

Documentation
=============
- The Postgres Manual's Server Configuration section deals well with log message handling, but I expected to see the
  use and syntax of the pg_hba.conf and pg_ident.conf files at least referenced from this section: without this
  link, they are relatively hard to find simply because they are not explicitly named in subsection titles.

- both these sections could do with revision, particularly pg_ident.conf: where an example or two would help a lot

- Similarly, its not necessary to include so much detail in the comments in these files: rather the opposite, 
  since Postgres updates do not appear to update those comments.
 
A modest software change request
================================
- I understand the rationale of adding the 'log' directory structure to the 'data' collection, but there is a
  serious disconnect between the way that Postgres uses it and the way that systemd appears not to display
  anything that has been written to the postgresql-Day.log files in data/log. 

  Worse, the text:

  See "systemctl status postgresql.service" and "journalctl -xeu postgresql.service" for details.

  which is output by the "systemctl status postgresql.service" command is useless, because after a failure to
  start Postgres, neither the systemd status report nor the corresponding systemd journal entry, which just
  says that log output is being sent to the logging collector process, points at anything directly useful
  for fixing the problem. Meanwhile the systemd journal entry is even less helpful because it says just 
  'Future log output will appear in directory "log".' 

  At the least it should give the full pathname of 'directory "log", but preferably the systemd log entry should
  include a copy of the error message that was written to 'log/postgresql-Day.log'

Comment 18 Ondrej Dubaj 2021-08-17 08:33:29 UTC
Filip, can you look at the documenation changes provided by Martin?

Thanks.

Comment 19 Filip Januš 2021-08-17 09:16:21 UTC
From my point of view, all necessary information is available in official postgresql documentation[1]. Or if you want there is postgresql-docs package with the same information.
From pg_hba.conf side there is also sample file /usr/share/pgsql/pg_hba.conf.sample. Here we are able to find syntax and documentation how to deal with this configuration.

I agree with Martin that "systemctl status" could be more descriptive. So please feel free to open a new BZ with this issue or even a pull request.
This JDBC issue I am closing as not a bug.

[1] https://www.postgresql.org/files/documentation/pdf/13/postgresql-13-US.pdf