基于Repmgr实现数据库高可用

安装环境

软件环境

  • postgres 10
  • repmgr 5.3.2
  • centos7

网络环境

IP地址 运行软件
10.10.2.12/node1 repmgr,repmgrd,pg
10.10.2.13/node2 repmgr,repmgrd,pg
10.10.2.14/node3 repmgr,repmgrd,pg

前期准备

  • hosts 文件配置

    vi /etc/hosts # 
    10.10.2.12 node1
    10.10.2.13 node2
    10.10.2.14 node3
    
  • 安装PG10

 $ yum install https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
  $ yum update -y 
  $ yum install -y postgresql10-server postgresql10  postgresql10-contrib
  # 只有主库需要初始化
  $ sudo su postgres -c  "/usr/pgsql-10/bin/initdb  --data-checksums -D /var/lib/pgsql/10/data" 
  • 安装repmgr

    $ yum install -y repmgr10.x86_64 
    
  • 提升postgres 用户具有部分sudo 执行权限

    repmgr 需要非root用户管理

    vi /etc/sudoers.d/postgres # chmod 400 /etc/sudoers.d/postgres
    Defaults:postgres !requiretty
    postgres ALL = NOPASSWD: /usr/bin/systemctl stop postgresql-10, \
    /usr/bin/systemctl start postgresql-10, \
    /usr/bin/systemctl restart postgresql-10, \
    /usr/bin/systemctl reload postgresql-10
    
  • ssh 免密互通

    配置ssh免密互通非必须,在执行switchover操作时需要

    $ su - postgres
    $ ssh-keygen
    $ cat .ssh/id_rsa.pub >  .ssh/authorized_keys # 本机免密登录 
    # 将 .ssh 目录下authorized_keys,id_rsa,id_rsa.pub 拷贝到其他节点对应位置上,并赋值对应权限
    $ chmod 600 .ssh/authorized_keys
    $ chmod 600 .ssh/id_rsa
    $ chmod 644 .ssh/id_rsa.pub 
    # 用命令拷贝到其他节点 # 首次连接需要设置密码
    $ ssh-copy-id ${远端服务器} 
    $ ssh $远端服务器}
    

配置

  • PG 配置

    vi postgres.conf
    # 必须项
    max_wal_senders = 10
    max_replication_slots = 10 # 复制槽
    wal_level = 'logical' # replica 或 logical ,便于后期应用逻辑复制建议使用logical
    hot_standby = on # 从库在recovery状态下可读
    wal_log_hints = on # pg_rewind 依赖配置
    # 可选,建议开启项目。后期修改可能需要重启数据库
    archive_mode = always # 从库也可做归档
    archive_command = '/bin/true'
    archive_timeout = 300 # 单位秒 ,五分钟
    wal_keep_segments = 1024 # 保留wal文件个数, 如果设置slot 可忽略
    # 访问相关
    listen_addresses = '*'                  # 监听网卡
    port = 5432                             # 端口
    max_connections = 1000                  # 最大连接数
    superuser_reserved_connections = 3     #  超级用户预留
    
    vi pg_hba.conf #  确保本机repmgr 用户可访问
    host    replication   repmgr      127.0.0.1/32            trust
    host    replication   repmgr      0.0.0.0/0               trust
      
    local   repmgr        repmgr                              trust
    host    repmgr        repmgr      127.0.0.1/32            trust
    host    repmgr        repmgr      0.0.0.0/0               md5
    
    # 元数据信息存储设置
    postgres=# create user repmgr;
    CREATE ROLE
    postgres=# alter user repmgr with password 'xxxxx';
    ALTER ROLE
    postgres=# alter user repmgr replication ;
    ALTER ROLE
    postgres=# create database repmgr owner repmgr;
    CREATE DATABASE
    postgres=#\c repmgr
    repmgr=# create extension repmgr;
    CREATE EXTENSION
    repmgr=# GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA repmgr TO repmgr ;
    
  • repmgr

    vi repmgr.conf # /etc/repmgr/10/repmgr.conf
    ## 必须项
    node_id=1
    node_name= node1
    conninfo ='host=node1 dbname=repmgr user=repmgr connect_timeout=2'
    data_directory = '/var/lib/pgsql/10/data'
    ## 可选项
    config_directory = # pg配置文件位置
    replication_user = repmgr # 默认为conninfo 中使用的user
    location = # 服务器所在物理位置,有助于判断当发生脑裂时网络互通逻辑
    use_replication_slots = true # 是否使用物理复制槽
    log_level = INFO # 日志等级
    log_file =  '/var/log/repmgr/repmgr.log' # 日志输出文件位置 ,结合lograte管理
    # 数据库yum 方式安装,使用systemd 管理时
    service_start_command   = 'sudo systemctl start postgresql-10'
    service_stop_command    = 'sudo systemctl stop postgresql-10'
    service_restart_command = 'sudo systemctl restart postgresql-10'
    service_reload_command  = 'sudo systemctl reload postgresql-10' 
    

注册主库

  • 启动数据库

    # 启动数据库服务
    systemctl start postgresql-10
    # 设置开机自启
    systemctl enable postgresql-10
    
    # 连接测试 repmgr.conf 中的conninfo 连接串信息测试本地登录
    psql  'host=node1 dbname=repmgr user=repmgr connect_timeout=2'
    
  • 注册

    $ su - postgres
      
    # 测试注册主节点
    $ /usr/pgsql-10/bin/repmgr -f /etc/repmgr/10/repmgr.conf primary register --dry-run -L debug
    INFO: connecting to primary database...
    DEBUG: connecting to: "user=repmgr dbname=repmgr host=node1 port=5432 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
    INFO: "repmgr" extension is already installed
      
    # 正式注册主节点
    $ /usr/pgsql-10/bin/repmgr -f /etc/repmgr/10/repmgr.conf primary register 
    INFO: connecting to primary database...
    INFO: "repmgr" extension is already installed
    NOTICE: primary node record (ID: 2) registered
      
    # 查看节点信息
    $ /usr/pgsql-10/bin/repmgr -f /etc/repmgr/10/repmgr.conf cluster show
     ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                             
    ----+-------+---------+-----------+----------+----------+----------+----------+------------------------------------------------
     2  | node1 | primary | * running |          | default  | 100      | 2        | host=node1 port=5432 user=repmgr dbname=repmgr
      
    
  • 在数据库中查看注册信息

    $ repmgr=# set search_path = "$user", public,repmgr;
    SET
      
    $ repmgr=# \d+ 
                                List of relations
     Schema |        Name        | Type  | Owner  |    Size    | Description 
    --------+--------------------+-------+--------+------------+-------------
     repmgr | events             | table | repmgr | 16 kB      | 
     repmgr | monitoring_history | table | repmgr | 0 bytes    | 
     repmgr | nodes              | table | repmgr | 16 kB      | 
     repmgr | replication_status | view  | repmgr | 0 bytes    | 
     repmgr | show_nodes         | view  | repmgr | 0 bytes    | 
     repmgr | voting_term        | table | repmgr | 8192 bytes | 
    (6 rows)
      
    $ SELECT * FROM repmgr.nodes;
    -[ RECORD 1 ]----+-----------------------------------------------
    node_id          | 2
    upstream_node_id | 
    active           | t
    node_name        | node1
    type             | primary
    location         | default
    priority         | 100
    conninfo         | host=node1 port=5432 user=repmgr dbname=repmgr
    repluser         | repmgr
    slot_name        | repmgr_slot_2
    config_file      | /etc/repmgr/10/repmgr.conf
      
    $ SELECT * FROM repmgr.show_nodes ;
    -[ RECORD 1 ]------+-----------------------------------------------
    node_id            | 2
    node_name          | node1
    active             | t
    upstream_node_id   | 
    upstream_node_name | 
    type               | primary
    priority           | 100
    conninfo           | host=node1 port=5432 user=repmgr dbname=repmgr
    

加入从库

加入node2 ,node3 两个节点

  • 安装PG & repmgr

  • 配置repmgr

  • 从主节点clone

    # 试运行 连接信息 -U -h -d 为主库地址 , 是否使用复制槽 在 repmgr.conf 中指定
    $ /usr/pgsql-10/bin/repmgr -h node1 -U repmgr -d repmgr -f /etc/repmgr/10/repmgr.conf standby clone --dry-run
      
    # 正式运行
    $ /usr/pgsql-10/bin/repmgr -h node1 -U repmgr -d repmgr -f /etc/repmgr/10/repmgr.conf standby clone 
    # 可选参数
    -c, --fast-checkpoint 主库执行checkpoint
    --upstream-node-id  可用于级联复制
    
    # 启动数据库
    $ sudo systemctl start postgresql-10
    # 注册从节点
    $ /usr/pgsql-10/bin/repmgr -f /etc/repmgr/10/repmgr.conf standby register 
    
    # 登录主库查看 主从复制信息
    select * from pg_stat_replication ;
    -[ RECORD 1 ]----+------------------------------
    pid              | 12204
    usesysid         | 16384
    usename          | repmgr
    application_name | node3
    client_addr      | 10.10.2.13
    client_hostname  | 
    client_port      | 37412
    backend_start    | 2022-08-11 06:15:08.568251+00
    backend_xmin     | 
    state            | streaming
    sent_lsn         | 0/5000648
    write_lsn        | 0/5000648
    flush_lsn        | 0/5000648
    replay_lsn       | 0/5000648
    write_lag        | 
    flush_lag        | 
    replay_lag       | 
    sync_priority    | 0
    sync_state       | async
    
     select * from pg_replication_slots ;
    -[ RECORD 1 ]-------+--------------
    slot_name           | repmgr_slot_3
    plugin              | 
    slot_type           | physical
    datoid              | 
    database            | 
    temporary           | f
    active              | t
    active_pid          | 12386
    xmin                | 
    catalog_xmin        | 
    restart_lsn         | 0/7001158
    confirmed_flush_lsn | 
    
    /usr/pgsql-10/bin/repmgr -f /etc/repmgr/10/repmgr.conf cluster show
     ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                             
    ----+-------+---------+-----------+----------+----------+----------+----------+------------------------------------------------
     2  | node1 | primary | * running |          | default  | 100      | 2        | host=node1 port=5432 user=repmgr dbname=repmgr
     3  | node3 | standby |   running | node1    | default  | 100      | 2        | host=node2 user=repmgr dbname=repmgr          
    

主动switchover

# 在计划提升为主库的节点上执行
/usr/pgsql-10/bin/repmgr -f /etc/repmgr/10/repmgr.conf standby switchover --siblings-follow 

# 查看切换结果 
/usr/pgsql-10/bin/repmgr -f /etc/repmgr/10/repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                     
----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 4        | host=node1 port=5432 user=repmgr dbname=repmgr        
 2  | node2 | standby |   running | node1    | default  | 100      | 3        | host=node2 user=repmgr dbname=repmgr                  
 3  | node3 | standby |   running | node1    | default  | 100      | 4        | host=node3 dbname=repmgr user=repmgr connect_timeout=2
  

注意事项,1 原主库上的slot 仍然存在(后期测试几次发现会自动删除) 2 级联复制的情况下避免最下游节点提升主

修改级联关系

/usr/pgsql-10/bin/repmgr -f /etc/repmgr/10/repmgr.conf standby follow --upstream-node-id 1

被动switchover

模拟主节点故障

# node1 关闭
sudo systemctl stop postgresql-10

# node2 上查看集群状态
/usr/pgsql-10/bin/repmgr -f /etc/repmgr/10/repmgr.conf cluster show
 ID | Name  | Role    | Status        | Upstream | Location | Priority | Timeline | Connection string                                     
----+-------+---------+---------------+----------+----------+----------+----------+--------------------------------------------------------
 1  | node1 | primary | ? unreachable | ?        | default  | 100      |          | host=node1 port=5432 user=repmgr dbname=repmgr        
 2  | node2 | standby |   running     | ? node1  | default  | 100      | 4        | host=node2 user=repmgr dbname=repmgr                  
 3  | node3 | standby |   running     | ? node1  | default  | 100      | 4        | host=node3 dbname=repmgr user=repmgr connect_timeout=2

WARNING: following issues were detected
  - unable to connect to node "node1" (ID: 1)
  - node "node1" (ID: 1) is registered as an active primary but is unreachable
  - unable to connect to node "node2" (ID: 2)'s upstream node "node1" (ID: 1)
  - unable to determine if node "node2" (ID: 2) is attached to its upstream node "node1" (ID: 1)
  - unable to connect to node "node3" (ID: 3)'s upstream node "node1" (ID: 1)
  - unable to determine if node "node3" (ID: 3) is attached to its upstream node "node1" (ID: 1)

HINT: execute with --verbose option to see connection error messages

提升node2 节点为主节点

# 在node2 上执行
$ /usr/pgsql-10/bin/repmgr -f /etc/repmgr/10/repmgr.conf standby promote
WARNING: 1 sibling nodes found, but option "--siblings-follow" not specified
DETAIL: these nodes will remain attached to the current primary:
  node3 (node ID: 3)
NOTICE: promoting standby to primary
DETAIL: promoting server "node2" (ID: 2) using "/usr/pgsql-10/bin/pg_ctl  -w -D '/var/lib/pgsql/10/data' promote"
waiting for server to promote.... done
server promoted
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node2" (ID: 2) was successfully promoted to primary

# 查看集群状态
$ /usr/pgsql-10/bin/repmgr -f /etc/repmgr/10/repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                     
----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------
 1  | node1 | primary | - failed  | ?        | default  | 100      |          | host=node1 port=5432 user=repmgr dbname=repmgr        
 2  | node2 | primary | * running |          | default  | 100      | 5        | host=node2 user=repmgr dbname=repmgr                  
 3  | node3 | standby |   running | ? node1  | default  | 100      | 4        | host=node3 dbname=repmgr user=repmgr connect_timeout=2

WARNING: following issues were detected
  - unable to connect to node "node1" (ID: 1)
  - unable to connect to node "node3" (ID: 3)'s upstream node "node1" (ID: 1)
  - unable to determine if node "node3" (ID: 3) is attached to its upstream node "node1" (ID: 1)

HINT: execute with --verbose option to see connection error messages

# 如果原来集群中有其他从节点,主节点也指向故障节点node1 , 在提升node2节点时同时将其他节点指向自己
$ /usr/pgsql-10/bin/repmgr -f /etc/repmgr/10/repmgr.conf standby promote --siblings-follow 

node3 重新指定主库为node2

# 在node3 上执行
$ /usr/pgsql-10/bin/repmgr -f /etc/repmgr/10/repmgr.conf standby follow
# 查看状态
$ /usr/pgsql-10/bin/repmgr -f /etc/repmgr/10/repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                     
----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------
 1  | node1 | primary | - failed  | ?        | default  | 100      |          | host=node1 port=5432 user=repmgr dbname=repmgr        
 2  | node2 | primary | * running |          | default  | 100      | 5        | host=node2 user=repmgr dbname=repmgr                  
 3  | node3 | standby |   running | node2    | default  | 100      | 4        | host=node3 dbname=repmgr user=repmgr connect_timeout=2

WARNING: following issues were detected
  - unable to connect to node "node1" (ID: 1)

HINT: execute with --verbose option to see connection error messages

node1 重新加入集群

# 登录到原主node1 节点 , 确保数据是关闭状态 
$ /usr/pgsql-10/bin/repmgr node rejoin -f /etc/repmgr/10/repmgr.conf -d 'host=node3 dbname=repmgr user=repmgr'  --force-rewind --config-files=postgresql.conf --verbose --dry-run
NOTICE: using provided configuration file "/etc/repmgr/10/repmgr.conf"
NOTICE: rejoin target is node "node3" (ID: 3)
INFO: replication slots in use, 3 free slots on node 9
INFO: replication connection to the rejoin target node was successful
INFO: local and rejoin target system identifiers match
DETAIL: system identifier is 7129425727714895616
INFO: prerequisites for using pg_rewind are met
INFO: temporary archive directory "/tmp/repmgr-config-archive-node1" created
INFO: file "postgresql.conf" would be copied to "/tmp/repmgr-config-archive-node1/postgresql.conf"
INFO: 1 files would have been copied to "/tmp/repmgr-config-archive-node1"
INFO: temporary archive directory "/tmp/repmgr-config-archive-node1" deleted
INFO: pg_rewind would now be executed
DETAIL: pg_rewind command is:
  /usr/pgsql-10/bin/pg_rewind -D '/var/lib/pgsql/10/data' --source-server='host=node3 dbname=repmgr user=repmgr connect_timeout=2'
INFO: prerequisites for executing NODE REJOIN are met

# 正式执行
$ /usr/pgsql-10/bin/repmgr node rejoin -f /etc/repmgr/10/repmgr.conf -d 'host=node3 dbname=repmgr user=repmgr'  --force-rewind --config-files=postgresql.conf --verbose
# 启动数据库
$ sudo systemctl start postgresql-10
# 查看结果
$ /usr/pgsql-10/bin/repmgr -f /etc/repmgr/10/repmgr.conf cluster show

node1 节点故障不可恢复情况

# 将node1 节点从集群中注销, 登录到其他任意节点
$ /usr/pgsql-10/bin/repmgr -f /etc/repmgr/10/repmgr.conf standby unregister --node-id
 $ repmgr -f /etc/repmgr/10/repmgr.conf cluster event

自动failover

修改配置

vi postgresql.conf

shared_preload_libraries = 'repmgr'

vi /etc/repmgr/10/repmgr.conf

failover=automatic
promote_command='/usr/pgsql-10/bin/repmgr standby promote -f /etc/repmgr/10/repmgr.conf --log-to-file'
follow_command='/usr/pgsql-10/bin/repmgr standby follow -f /etc/repmgr/10/repmgr.conf --log-to-file --upstream-node-id=%n'
log_file='/var/lib/pgsql/repmgrd.log'
monitoring_history=true #(启用监控参数)
monitor_interval_secs=5 #(定义监视数据间隔写入时间参数)
reconnect_attempts=10 #(故障转移之前,尝试重新连接主库次数(默认为6)参数)
reconnect_interval=5 #(每间隔5s尝试重新连接一次参数

重启 postgres

repmgr node service --action=restart

启动 repmgrd

systemctl start repmgr-10
systemctl enable repmgr-10

测试

主节点重新加入集群

# 全部原主节点保持关闭状态。 首先将执行pg_rewind ->   自动启动服务 - > 注册服务
 repmgr node rejoin -d 'host=node1 dbname=repmgr user=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.auto.conf --verbose --dry-run
 repmgr node rejoin -d 'host=node1 dbname=repmgr user=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.auto.conf --verbose

自动failover 几点说明

为预防网络脑裂发生: repmgr 做了如下努力: 1 ,使用见证节点。主要应用于只有主从两个节点的场景, 将见证节点与主节点部署在同一个网络区间。 当主节点与见证节点同时不可见时,认为是网络错误,不发生切换。当见证节点可见,主节点不可见时,认为主节点失效,触发切换。

引入见证节点提高了维护成本。

多IDC网络场景,方案2 : 使用location 标记服务所在的位置。当与主节点location相同的节点都不可见时,认为是网络错误,不发生切换。如果集群中没有其他节点与主节点的location配置相同,则永远不发生切换。

举例: 三个IDC,三个pg服务。location分别设置为dc1(主节点),dc2(从节点),dc3(从节点)。 当dc1 节点发生故障时不会发生故障转移。 因为集群中location 为dc1的节点(只有一个)都不可见。

合理的设置应该为cd1(主节点) ,dc1(一个从),dc2(另一个从)。 此时当dc1 发生以外时,可故障转移。

注意事项: 当主节点网络问题。如网卡,网线故障。服务正常的情况下。集群也会发生故障转移。当网络恢复的时候发生脑裂。 主要问题。当主节点脱离集群时没有彻底关闭服务。repmgr不能管理pg服务,将其关闭。

可以如下设置,通过确定与从节点连接个数决定是否关闭当前服务。在command中关闭当前pg

#child_nodes_check_interval=5           # Interval (in seconds) to check for attached child nodes (standbys)
#child_nodes_connected_min_count=-1     # Minimum number of child nodes which must remain connected, otherwise
                                        # disconnection command will be triggered
#child_nodes_disconnect_min_count=-1    # Minimum number of disconnected child nodes required to execute disconnection command
                                        # (ignored if "child_nodes_connected_min_count" set)
#child_nodes_disconnect_timeout=30      # Interval between child node disconnection and disconnection command execution
child_nodes_disconnect_command='' 

location 修改

localtion 参数在高可用架构中为防止网络分区问题发挥重要作用。 发生故障转移后location 通常也需要重新设计。

修改 /etc/repmgr/10/repmgr.conf

生效

repmgr standby register --force
systemctl restart repmgr-10

查看

repmgr cluster show
或
select * from repmgr.nodes;

修改location