PROBLEM:
I received the error below while starting the EFM agent on the standby server.
[root@dbhost41 edbdata]# systemctl start edb-efm-3.9
Job for edb-efm-3.9.service failed because the control process exited with error code. See "systemctl status edb-efm-3.9.service" and "journalctl -xe" for details.
cat /var/log/efm-3.9/startup-efm.log
2021-06-15 14:58:06 Trigger file validation failed. Could not start agent as standby. See logs for more details.
SOLUTION:
1. Check the promote_trigger_file parameter value:
postgres=# \x
Expanded display is on.
postgres=# select * from pg_settings where name='promote_trigger_file';
-[ RECORD 1 ]---+-------------------------------------------------------------------
name | promote_trigger_file
setting | --- >> Blank, which means no value is set.
unit |
category | Replication / Standby Servers
short_desc | Specifies a file name whose presence ends recovery in the standby.
extra_desc |
context | sighup
vartype | string
source | default
min_val |
max_val |
enumvals |
boot_val |
reset_val |
sourcefile |
sourceline |
pending_restart | f
The output above shows that the promote_trigger_file parameter is not set in the config file.
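The same check can be made directly against the config file, without opening a psql session. The following is only a minimal sketch: get_conf_value is a hypothetical helper name (not part of PostgreSQL or EFM), it assumes the "name = value" form with an equals sign, and the config path in the usage comment is the one reported by pg_settings later in this post.

```shell
#!/usr/bin/env bash
# get_conf_value: hypothetical helper, not part of PostgreSQL or EFM.
# Prints the active (uncommented) value of a parameter from a
# postgresql.conf-style file, or prints nothing if the parameter is
# commented out or missing. Assumes the "name = value" form.
get_conf_value() {
    local conf_file="$1" param="$2"
    # Keep only uncommented lines for this parameter. The last entry
    # wins, as it does when PostgreSQL reads the file.
    grep -E "^[[:space:]]*${param}[[:space:]]*=" "$conf_file" \
        | tail -n 1 \
        | sed -E "s/^[^=]*=[[:space:]]*//; s/[[:space:]]*#.*$//; s/[[:space:]]*$//; s/^'(.*)'$/\1/"
}

# Usage (path assumed from the pg_settings output in this post):
#   get_conf_value /pgdata/edbdata/postgresql.conf promote_trigger_file
```

An empty result means the parameter is either commented out or absent, which matches the blank setting shown by pg_settings.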
2. Uncomment and update the promote_trigger_file parameter in the postgresql.conf file:
vi postgresql.conf
promote_trigger_file='/pgdata/edbdata/trigger_file'
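If you prefer not to edit the file by hand, the change can be scripted. This is a rough sketch under stated assumptions: set_conf_value is a hypothetical helper name, it rewrites every commented or active "name = value" line for the parameter, and it assumes the value contains no "|", "&", or backslash characters. It keeps a .bak copy of the file first.

```shell
#!/usr/bin/env bash
# set_conf_value: hypothetical helper, not part of PostgreSQL or EFM.
# Sets "param = 'value'" in a postgresql.conf-style file by rewriting any
# commented or active line for that parameter, or by appending the
# setting if no such line exists.
# Assumes the value contains no '|', '&' or backslash characters.
set_conf_value() {
    local conf_file="$1" param="$2" value="$3"
    local pattern="^[[:space:]]*#?[[:space:]]*${param}[[:space:]]*="

    # Keep a backup of the original file before changing it.
    cp -p "$conf_file" "${conf_file}.bak" || return 1

    if grep -qE "$pattern" "${conf_file}.bak"; then
        # Rewrite matching lines; '|' is the sed delimiter so that
        # slashes in file paths need no escaping.
        sed -E "s|${pattern}.*|${param} = '${value}'|" \
            "${conf_file}.bak" > "$conf_file"
    else
        printf "%s = '%s'\n" "$param" "$value" >> "$conf_file"
    fi
}

# Usage (paths assumed from the pg_settings output in this post):
#   set_conf_value /pgdata/edbdata/postgresql.conf \
#       promote_trigger_file /pgdata/edbdata/trigger_file
```

The change only takes effect after the configuration is reloaded.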
3. Reload the configuration:
postgres=# select pg_reload_conf();
pg_reload_conf
----------------
t
(1 row)
postgres=# \x
Expanded display is on.
postgres=# select * from pg_settings where name='promote_trigger_file';
-[ RECORD 1 ]---+-------------------------------------------------------------------
name | promote_trigger_file
setting | /pgdata/edbdata/trigger_file
unit |
category | Replication / Standby Servers
short_desc | Specifies a file name whose presence ends recovery in the standby.
extra_desc |
context | sighup
vartype | string
source | configuration file
min_val |
max_val |
enumvals |
boot_val |
reset_val | /pgdata/edbdata/trigger_file
sourcefile | /pgdata/edbdata/postgresql.conf
sourceline | 318
pending_restart | f
The value is now updated, so let's start EFM again.
4. Start EFM and check its status:
[root@dbhost41 edbdata]# systemctl start edb-efm-3.9
[root@dbhost41 ~]# systemctl status edb-efm-3.9
● edb-efm-3.9.service - EnterpriseDB Failover Manager 3.9
Loaded: loaded (/usr/lib/systemd/system/edb-efm-3.9.service; disabled; vendor preset: disabled)
Active: active (running) since Tue 2021-06-15 15:08:20 +03; 3h 20min ago
Process: 1660 ExecStart=/bin/bash -c /usr/edb/efm-3.9/bin/runefm.sh start ${CLUSTER} (code=exited, status=0/SUCCESS)
Main PID: 1740 (java)
Tasks: 27
CGroup: /system.slice/edb-efm-3.9.service
└─1740 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.292.b10-1.el7_9.x86_64/jre/bin/java -cp /usr/edb/efm-3.9/lib/EFM-3.9.jar -Xmx128m com.enterprisedb.efm.main.ServiceCom...
Jun 15 15:08:16 dbhost41 systemd[1]: Starting EnterpriseDB Failover Manager 3.9...
Jun 15 15:08:17 dbhost41 sudo[1757]: efm : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/edb/efm-3.9/bin/efm_root_functions validatedbowner efm
Jun 15 15:08:17 dbhost41 sudo[1777]: efm : TTY=unknown ; PWD=/ ; USER=enterprisedb ; COMMAND=/usr/edb/efm-3.9/bin/efm_db_functions validaterecoveryconf efm
Jun 15 15:08:17 dbhost41 sudo[1795]: efm : TTY=unknown ; PWD=/ ; USER=enterprisedb ; COMMAND=/usr/edb/efm-3.9/bin/efm_db_functions validatedbconf efm
Jun 15 15:08:17 dbhost41 sudo[1813]: efm : TTY=unknown ; PWD=/ ; USER=enterprisedb ; COMMAND=/usr/edb/efm-3.9/bin/efm_db_functions validatepgbin efm
Jun 15 15:08:17 dbhost41 sudo[1849]: efm : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/edb/efm-3.9/bin/efm_root_functions dbservicestatus efm
Jun 15 15:08:17 dbhost41 sudo[1873]: efm : TTY=unknown ; PWD=/ ; USER=enterprisedb ; COMMAND=/usr/edb/efm-3.9/bin/efm_db_functions validatepromotetriggerfil...rigger_file
Jun 15 15:08:20 dbhost41 systemd[1]: Started EnterpriseDB Failover Manager 3.9.
Hint: Some lines were ellipsized, use -l to show in full.
[root@dbhost41 ~]# /usr/edb/efm-3.9/bin/efm cluster-status efm
Cluster Status: efm
Agent Type  Address              Agent  DB       VIP
-----------------------------------------------------------------------
Master      10.20.30.40          UP     UP
Standby     10.20.30.41          UP     UP
Allowed node host list:
10.20.30.40 10.20.30.41
Membership coordinator: 10.20.30.40
Standby priority host list:
10.20.30.41
Promote Status:
DB Type     Address              WAL Received LSN   WAL Replayed LSN   Info
---------------------------------------------------------------------------
Master      10.20.30.40                             0/70001C0
Standby     10.20.30.41          0/7000000          0/70001C0
Standby database(s) in sync with master. It is safe to promote.
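For a scripted health check, the final line of the cluster-status output can be tested instead of read by eye. This is a minimal sketch: safe_to_promote is a hypothetical helper name, and it only looks for the exact message shown above, so it may need adjusting if other EFM versions word the message differently.

```shell
#!/usr/bin/env bash
# safe_to_promote: hypothetical helper, not part of EFM.
# Reads "efm cluster-status" output on stdin and succeeds (exit 0) only
# if it contains the "safe to promote" message shown in this post.
safe_to_promote() {
    grep -q "It is safe to promote"
}

# Usage:
#   /usr/edb/efm-3.9/bin/efm cluster-status efm | safe_to_promote \
#       && echo "standby in sync" || echo "check cluster status"
```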