PROBLEM:

We rebooted both the primary and standby Postgres nodes. After the reboot, the Enterprise Failover Manager (EFM) service failed to start on the standby node.

PRIMARY NODE – 10.20.30.40
STANDBY NODE – 10.20.30.41


[root@STANDBY efm-3.9]# systemctl start edb-efm-3.9
Job for edb-efm-3.9.service failed because the control process exited with error code. See "systemctl status edb-efm-3.9.service" and "journalctl -xe" for details.

-- Check the EFM log on the standby node
cat /var/log/efm-3.9/efm.log


        at com.enterprisedb.efm.nodes.EfmAgent.run(EfmAgent.java:211)
        at com.enterprisedb.efm.main.ServiceCommand.main(ServiceCommand.java:111)
2021-06-28 11:14:10 com.enterprisedb.efm.nodes.EfmAgent run ERROR: Exception starting service
java.lang.SecurityException: authentication failed
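
The full log can be long, so it may help to pull out only the most recent error. The following is a minimal sketch and not part of EFM: last_efm_error is a hypothetical helper name, and it assumes that error entries contain the text " ERROR: " and that the exception appears on the next line, as in the excerpt above.

```shell
# Hypothetical helper (not an EFM command): print the last log line that
# contains " ERROR: " together with the line that follows it.
# Usage: last_efm_error /var/log/efm-3.9/efm.log
last_efm_error() {
    awk '
        / ERROR: / { err = $0; detail = ""; want = 1; next }
        want       { detail = $0; want = 0 }
        END {
            if (err != "") {
                print err
                if (detail != "") print detail
            }
        }
    ' "$1"
}
```

If the helper prints nothing, the log holds no line matching the assumed pattern, and the file should be read in full instead.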

SOLUTION:

Check the cluster status on the primary node:

[root@PRIMARY ~]# /usr/edb/efm-3.9/bin/efm cluster-status efm
Cluster Status: efm

        Agent Type  Address              Agent  DB       VIP
        -----------------------------------------------------------------------
        Master      10.20.30.40          UP     UP

Allowed node host list:
        10.20.30.40

Membership coordinator: 10.20.30.40

Standby priority host list:
        (List is empty.)

Promote Status:

        DB Type     Address              WAL Received LSN   WAL Replayed LSN   Info
        ---------------------------------------------------------------------------
        Master      10.20.30.40                             3/80000D0

        No standby databases were found.

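The standby server (10.20.30.41) does not appear in the allowed node host list, so the primary rejects its agent, which matches the "authentication failed" error in the standby log. Add the standby to the allowed node list.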
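
The same check can be scripted rather than read by eye. This is a minimal sketch under stated assumptions: in_allowed_list is a hypothetical helper name, and it assumes the cluster-status text keeps the layout shown above, where the addresses follow the "Allowed node host list:" line and the list ends at the next blank line.

```shell
# Hypothetical helper (not an EFM command): succeed if address $1 is listed
# under "Allowed node host list:" in `efm cluster-status` text read on stdin.
in_allowed_list() {
    awk -v ip="$1" '
        /^Allowed node host list:/ { in_list = 1; next }
        in_list && NF == 0         { in_list = 0 }
        in_list {
            # Compare whole fields so 10.20.30.4 does not match 10.20.30.40.
            for (i = 1; i <= NF; i++) if ($i == ip) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

On the primary, piping the output of /usr/edb/efm-3.9/bin/efm cluster-status efm into in_allowed_list 10.20.30.41 would return a non-zero exit status in the situation described here.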

Run allow-node on the primary:

[root@PRIMARY ~]# /usr/edb/efm-3.9/bin/efm allow-node efm 10.20.30.41

Start the EFM service on the standby server:

[root@STANDBY efm-3.9]# systemctl start edb-efm-3.9

This time the service starts successfully. Check the cluster status again:

[root@STANDBY ~]# /usr/edb/efm-3.9/bin/efm cluster-status efm
Cluster Status: efm

        Agent Type  Address              Agent  DB       VIP
        -----------------------------------------------------------------------
        Master      10.20.30.40          UP     UP
        Standby     10.20.30.41          UP     UP

Allowed node host list:
        10.20.30.40 10.20.30.41

Membership coordinator: 10.20.30.40

Standby priority host list:
        10.20.30.41

Promote Status:

        DB Type     Address              WAL Received LSN   WAL Replayed LSN   Info
        ---------------------------------------------------------------------------
        Master      10.20.30.40                             3/80001B0
        Standby     10.20.30.41          3/80001B0          3/80001B0

        Standby database(s) in sync with master. It is safe to promote.
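
To confirm the result from a script, the agent table can be checked for the standby row. This is a minimal sketch, not an EFM feature: node_is_up is a hypothetical helper name, and it assumes the column order shown above (agent type, address, Agent, DB).

```shell
# Hypothetical helper (not an EFM command): succeed if `efm cluster-status`
# text on stdin contains a row for address $1 whose Agent and DB columns
# both report UP.
node_is_up() {
    awk -v ip="$1" '
        # Promote Status rows carry LSN values in these columns,
        # so they never match "UP".
        $2 == ip && $3 == "UP" && $4 == "UP" { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

For example, piping /usr/edb/efm-3.9/bin/efm cluster-status efm into node_is_up 10.20.30.41 should return exit status 0 once the standby agent has joined the cluster.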