
PostgreSQL Restore Monitoring: Tracking Progress and ETAs

Start times are easy. End times are the problem. When a 2TB database goes down, “it will be done when it’s done” is not an acceptable answer for the C-suite.

This guide provides the tooling to move from guessing to estimating. We cover two distinct scenarios using PostgreSQL 14+; most of the queries also work on 12 and 13, except pg_stat_progress_copy and direct pg_lsn subtraction, which require 14 (use pg_wal_lsn_diff() on older versions).

1. The Narrative Scenarios

Scenario A: Logical Restore (The CPU/Lock Heavy Path)

Context: You are migrating or restoring a 1 TB Data Warehouse to a new instance for testing or disaster recovery (DR).

  • Backup Format: Directory format (-Fd) with 8 parallel jobs (-j 8).
  • Pain Points:
    • Phase 1 (Data): COPY is I/O and CPU bound.
    • Phase 2 (Indexes/Constraints): This is usually the “long tail” where progress bars stall. Index builds eat CPU; foreign key (FK) validation has to scan the referencing tables.
  • Command: pg_restore -h $HOST -p $PORT -U postgres -d target_db -j 8 -Fd /mnt/backups/2023-10-27_prod_dump/
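
Before launching that command, it helps to size the job: pg_restore -l prints the dump’s table of contents, so you can count how many COPY steps and index builds are ahead of you. A quick sketch (the dump path is the scenario’s; adjust to yours):

Bash

# Size up the restore before you start: how many COPY steps vs index builds?
DUMP_DIR="/mnt/backups/2023-10-27_prod_dump"
pg_restore -l "$DUMP_DIR" > /tmp/restore_toc.txt

echo "Data loads (Phase 1):     $(grep -c 'TABLE DATA' /tmp/restore_toc.txt)"
echo "Index builds (Phase 2):   $(grep -c ' INDEX ' /tmp/restore_toc.txt)"
echo "FK constraints (Phase 2): $(grep -c 'FK CONSTRAINT' /tmp/restore_toc.txt)"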

Scenario B: Physical PITR (The I/O Heavy Path)

Context: Production storage failure. You are rebuilding the primary from a pg_basebackup (1.5 TB) and replaying 200 GB of WAL archives to reach a specific point in time (e.g., 14:00 UTC).

  • Pain Points: Replaying WAL is single-threaded. The speed depends entirely on how fast one CPU core can apply changes and how fast storage can verify/fsync blocks.
  • Configuration: restore_command = 'cp /mnt/wal_archive/%f %p' goes in postgresql.conf (or postgresql.auto.conf) on PG 12+, together with a recovery.signal file in the data directory to trigger recovery. hot_standby = on is crucial for monitoring, because it lets you run read-only queries while WAL is replaying (a sketch follows).
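
A minimal sketch of that setup, assuming PG 12+; the data directory path and the recovery target time are placeholders for this scenario:

Bash

# Sketch: configure targeted recovery (PG 12+). Adjust paths and target to your environment.
PGDATA=/var/lib/postgresql/14/main   # placeholder data directory

cat >> "$PGDATA/postgresql.auto.conf" <<'EOF'
restore_command = 'cp /mnt/wal_archive/%f %p'
recovery_target_time = '2023-10-27 14:00:00+00'
recovery_target_action = 'promote'
hot_standby = on
EOF

# recovery.signal tells PostgreSQL to perform targeted recovery on the next startup
touch "$PGDATA/recovery.signal"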

2. Database-Side Monitoring (SQL)

If you can connect to the database, the catalogs tell the truth.

For Scenario A (Logical Restore)

Goal: Track which tables are loading and, crucially, catch the “silent” index creation phase.

1. The “What is happening right now?” Monitor

This query joins standard activity with pg_stat_progress_copy (PG 14+) to show exactly how many rows have been processed for active table loads.

SQL

-- monitor_logical_restore.sql
-- Run this on the target database
SELECT 
    a.pid,
    a.application_name,
    a.state,
    -- Extract table name from query or progress view
    COALESCE(p.relid::regclass::text, 'Index/Constraint Phase') AS target_object,
    a.query_start,
    now() - a.query_start AS duration,
    -- Progress metrics for COPY phase
    p.bytes_processed / 1024 / 1024 AS mb_processed,
    p.lines_processed AS rows_loaded,
    -- Identify what command is running
    substring(a.query from 1 for 50) || '...' AS current_query
FROM pg_stat_activity a
LEFT JOIN pg_stat_progress_copy p ON a.pid = p.pid
WHERE a.application_name = 'pg_restore' -- pg_restore sets this automatically
AND a.state != 'idle'
ORDER BY duration DESC;

Sample Output (Phase 1: Data Load)

 pid  | application_name | state  | target_object | duration | mb_processed | rows_loaded | current_query
------+------------------+--------+---------------+----------+--------------+-------------+--------------------------------------------------------
 4021 | pg_restore       | active | public.orders | 00:15:23 |         4502 |    12000000 | COPY public.orders (id, user_id, amount) FROM STDIN…
 4022 | pg_restore       | active | public.logs   | 00:08:12 |         1200 |     5000000 | COPY public.logs (id, message, created_at) FROM STD…

2. Tracking Index Creation Progress

Once COPY finishes, pg_restore switches to CREATE INDEX. This is often the longest phase.

SQL

-- monitor_index_build.sql
SELECT 
    pid,
    datname,
    relid::regclass AS table_name,
    index_relid::regclass AS index_being_built,
    phase,
    -- Lock waits the build may be stuck behind
    lockers_total,
    lockers_done,
    current_locker_pid,
    -- Blocks processed vs total blocks
    blocks_total,
    blocks_done,
    -- Calculate % completion
    ROUND(100.0 * blocks_done / NULLIF(blocks_total, 0), 2) AS progress_pct
FROM pg_stat_progress_create_index;

Sample Output (Phase 2: Indexing)

 pid  | table_name    | index_being_built | phase                          | blocks_total | blocks_done | progress_pct
------+---------------+-------------------+--------------------------------+--------------+-------------+--------------
 4021 | public.orders | orders_pkey       | building index: scanning table |      5000000 |     2500000 |        50.00
 4025 | public.users  | users_email_idx   | building index: sorting        |      1000000 |     1000000 |       100.00

For Scenario B (Physical/PITR)

Goal: Calculate how far behind the replay is compared to the target.

Prerequisite: Ensure hot_standby = on is set in postgresql.conf during recovery. This allows read-only queries while WALs are being replayed.

SQL

-- monitor_pitr_lag.sql
WITH recovery_status AS (
    SELECT 
        pg_last_wal_replay_lsn() AS current_replay_lsn,
        -- Replace this with your actual target LSN if known,
        -- or estimate it from the newest file in your WAL archive
        pg_lsn('14/A0000000') AS target_lsn, 
        now() AS current_wall_time,
        pg_last_xact_replay_timestamp() AS last_replayed_ts
)
SELECT 
    current_replay_lsn,
    last_replayed_ts,
    -- How many bytes are left to replay? (pg_lsn subtraction needs PG 14+;
    -- on older versions use pg_wal_lsn_diff(target_lsn, current_replay_lsn))
    pg_size_pretty(target_lsn - current_replay_lsn) AS bytes_remaining,
    -- Time lag (Wall clock vs Database time)
    now() - last_replayed_ts AS time_lag
FROM recovery_status;

Sample Output

 current_replay_lsn | last_replayed_ts           | bytes_remaining | time_lag
--------------------+----------------------------+-----------------+-----------------
 12/B0000000        | 2023-10-27 13:45:00.123+00 | 45 GB           | 00:15:00.000000


3. Unix/Linux-Side Monitoring & Scripts

Sometimes the DB is locked or too slow to query. We drop to the shell.

For Scenario A: Logical Restore (The Wrapper)

Using pv with pg_restore -Fd is difficult because the directory format is read as many files with random access. Instead, we monitor how fast the target database grows on disk (sketch below) or use a wrapper that tracks PIDs.
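
A minimal watcher sketch, assuming local peer authentication as postgres and the dump path from Scenario A. Note that the percentage is a rough proxy: the dump is compressed and the restored database also contains indexes.

Bash

# Rough progress proxy: compare target DB size to the dump directory size every 30s
DUMP_DIR="/mnt/backups/2023-10-27_prod_dump"
TARGET_DB="target_db"
DUMP_BYTES=$(du -sb "$DUMP_DIR" | awk '{print $1}')

while true; do
    DB_BYTES=$(psql -U postgres -At -c "SELECT pg_database_size('$TARGET_DB')")
    printf '%s  restored: %s of ~%s dump (%d%%)\n' \
        "$(date '+%H:%M:%S')" \
        "$(numfmt --to=iec "$DB_BYTES")" \
        "$(numfmt --to=iec "$DUMP_BYTES")" \
        $(( DB_BYTES * 100 / DUMP_BYTES ))
    sleep 30
done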

However, for a single-file dump, pv is king: pipe a plain-SQL .sql.gz through zcat into psql (as in the wrapper below), or pipe a custom-format .dump straight into pg_restore.

The “Transparent Pipe” Restore Script:

Bash

#!/bin/bash
# logical_restore_wrapper.sh
# Usage: ./logical_restore_wrapper.sh dump_file.sql.gz target_db

DUMP_FILE="$1"
TARGET_DB="$2"
LOG_FILE="/var/log/pg_restore_${TARGET_DB}_$(date +%s).log"

# 1. Get total size for progress bar
TOTAL_SIZE=$(du -sb "$DUMP_FILE" | awk '{print $1}')

echo "Starting restore of $DUMP_FILE to $TARGET_DB at $(date)"
echo "Logs: $LOG_FILE"

# 2. The Pipeline
# pv: show a progress bar/ETA based on bytes read from the dump file (printed to stderr)
# zcat: decompress on the fly
# psql: apply to the DB (add -v ON_ERROR_STOP=1 if you want to abort on the first error)
# set -o pipefail: if any stage fails, the pipeline's exit code is non-zero (unlike standard pipes)

set -o pipefail

pv -s "$TOTAL_SIZE" "$DUMP_FILE" | zcat | psql -U postgres -d "$TARGET_DB" 2>&1 | tee "$LOG_FILE"

EXIT_CODE=$?

if [ "$EXIT_CODE" -eq 0 ]; then
    echo "Restore completed successfully at $(date)"
else
    echo "Restore FAILED with code $EXIT_CODE. Check $LOG_FILE"
fi

Sample Output (Terminal View)

Starting restore of prod_dump.sql.gz to analytics_db at Mon Oct 27 10:00:00 UTC 2023
Logs: /var/log/pg_restore_analytics_db_1698400800.log
14.2GiB 0:45:12 [================>              ] 42% ETA 1:02:00

For Scenario B: Physical PITR (Log Parsing)

Since physical restores replay WAL files sequentially, we can tail the logs to see exactly which file is being processed.

The “WAL Stalker” Script:

Bash

#!/bin/bash
# monitor_pitr_logs.sh
# Usage: Run this in a tmux/screen session during PITR

LOG_FILE="/var/log/postgresql/postgresql-14-main.log"
# Set the total number of WAL files you expect to replay (Estimate this by checking your archive folder)
# Example: If you have files from 0000000100000001000000AA to 0000000100000001000000FF, that's roughly 85 files.
TOTAL_WALS_TO_REPLAY=500 
START_TIME=$(date +%s)

echo "Tailing log for WAL restoration..."

# We grep for the specific success message PG emits when a segment is restored
tail -F "$LOG_FILE" | grep --line-buffered "restored log file" | awk -v total="$TOTAL_WALS_TO_REPLAY" -v start="$START_TIME" '
BEGIN { count = 0 }
{
    count++;
    current_time = systime();   # systime() is a gawk extension
    elapsed = current_time - start;

    # The restored segment name is the first quoted string in the log line:
    #   LOG:  restored log file "000000010000000000000002" from archive
    split($0, q, "\"");
    last_wal = q[2];

    # Calculate rate (WALs per second)
    if (elapsed > 0) rate = count / elapsed; else rate = 0;

    # Estimate remaining
    remaining_wals = total - count;
    if (rate > 0) eta_seconds = remaining_wals / rate; else eta_seconds = 0;

    printf "\rProcessed: %d / %d | Last: %s | Avg Rate: %.2f WALs/s | EST Remaining: %.1f min",
           count, total, last_wal, rate, eta_seconds / 60;
    fflush(stdout);
}'

Sample Output (Live Updating Line)

Processed: 150 / 500 | Last: 0000000100000001000000AB | Avg Rate: 0.85 WALs/s | EST Remaining: 6.9 min


4. Estimating and Predicting Remaining Time

“Are we there yet?” requires math, not hope.

Why Prediction Fails

  1. Index Skew: Loading data is linear (O(n)). Building B-Tree indexes is O(n log n). The last 10% of the restore (indexes) can take 50% of the time.
  2. WAL Variation: Not all WAL segments replay at the same speed. A 16 MB segment of sequential bulk writes applies far faster than one full of scattered changes touching thousands of different pages, because the latter turns into random I/O.

Heuristic 1: The Logical Restore “Rule of Thirds”

For complex schemas (many indexes/FKs), split the estimation:

  1. Data Load: Use pv or file size. If 50GB processed of 100GB, you are 50% done with Phase 1.
  2. Index/Constraints: Assume this takes 60-80% of the time taken by the data load (a tiny calculator sketch follows).
    • Example: The data load took 4 hours. Expect indexes and constraints to take roughly another 2.5 to 3 hours.
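
A trivial calculator for that heuristic; the 60-80% band is the assumption here, not a law:

Bash

#!/bin/bash
# rule_of_thirds_eta.sh -- project the index phase from the measured data-load time
# Usage: ./rule_of_thirds_eta.sh <data_load_seconds>
DATA_SECS="$1"
LOW=$(( DATA_SECS * 60 / 100 ))    # optimistic: 60% of the data-load time
HIGH=$(( DATA_SECS * 80 / 100 ))   # pessimistic: 80% of the data-load time
printf "Index/constraint phase: %d-%d min | Total restore: %d-%d min\n" \
    $(( LOW / 60 )) $(( HIGH / 60 )) $(( (DATA_SECS + LOW) / 60 )) $(( (DATA_SECS + HIGH) / 60 ))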

Heuristic 2: The Physical Replay “Moving Average”

Use LSN Differential math.

  1. Find the Start LSN: the “START WAL LOCATION” (redo point) recorded in the backup_label file of your base backup.
  2. Find the Target LSN: the LSN of your point-in-time target (or the current primary’s LSN, from pg_current_wal_lsn()).
  3. Calculate Throughput:

SQL

-- Run this every 5 minutes to get a moving average.
-- Replace '0/START_LSN' and 'START_TIME' with the values you noted when replay began.
SELECT
    now(),
    pg_last_wal_replay_lsn() AS current_replay_lsn,
    -- Bytes replayed since the start, divided by elapsed seconds
    pg_wal_lsn_diff(pg_last_wal_replay_lsn(), '0/START_LSN')
        / EXTRACT(EPOCH FROM (now() - 'START_TIME'::timestamptz)) AS bytes_per_sec;

Example Calculation:

  • Total Replay needed: 200 GB
  • Current Speed: 50 MB/s (derived from watching monitor_pitr_logs.sh)
  • Time Remaining = 200,000 MB / 50 MB/s = 4,000 seconds (~66 minutes).
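
To automate that sampling, here is a minimal sketch that takes two replay-LSN readings a minute apart and projects an ETA. It assumes hot_standby = on, a local connection as postgres, and a TARGET_LSN placeholder that you fill in from your backup label or archive:

Bash

#!/bin/bash
# pitr_eta.sh -- sample WAL replay twice and project time remaining (sketch)
TARGET_LSN="14/A0000000"   # placeholder: your recovery target / end-of-archive LSN
INTERVAL=60                # seconds between samples

LSN1=$(psql -U postgres -At -c "SELECT pg_last_wal_replay_lsn()")
sleep "$INTERVAL"
LSN2=$(psql -U postgres -At -c "SELECT pg_last_wal_replay_lsn()")

# Let the server do the LSN math: bytes replayed per second and bytes still to go
psql -U postgres -At -c "
    SELECT pg_size_pretty(pg_wal_lsn_diff('$LSN2','$LSN1') / $INTERVAL) || '/s replay rate, ~'
        || round( pg_wal_lsn_diff('$TARGET_LSN','$LSN2')
                  / NULLIF(pg_wal_lsn_diff('$LSN2','$LSN1') / $INTERVAL, 0)
                  / 60 ) || ' min remaining';
"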

5. “Fool-Proof” Production Considerations

Scripts fail. SSH sessions die. Here is how to harden your restore process.

1. The “Nohup” Safety Net

Never run a multi-hour restore directly in an interactive shell. If your WiFi blips, the pg_restore receives a SIGHUP and dies.

Use systemd-run or tmux:

Bash

# The professional way to run a one-off detached restore job:
# a transient systemd service survives SSH disconnects and logs to the journal
sudo systemd-run --unit=pg-restore-job -p User=postgres \
    pg_restore -j 8 -Fd /backups/db -d production_db

Check status with systemctl status pg-restore-job and follow the output with journalctl -u pg-restore-job -f (a transient service logs to the journal, not to a file you redirect). Alternatively, run the wrapper script from Section 3 inside tmux or screen.

2. Validating Exit Codes

Always capture exit codes. pg_restore is “chatty”. It often emits warnings (e.g., “table exists”, “user not found”) that return exit code 0 or 1 depending on severity.

Use -e (exit on error) strictly if you want a pristine restore, but be aware it is brittle. A better approach for production scripts is parsing the log for “FATAL” or “ERROR”.
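
A hedged post-run check along those lines. The exact message prefixes vary by version and by whether you restored via psql or pg_restore, so treat the pattern as a starting point:

Bash

# After the restore finishes, fail the job if the log contains real errors
if grep -Eiq '(^|: )(error|fatal):' "$LOG_FILE"; then
    echo "Restore log $LOG_FILE contains errors -- investigate before trusting this database" >&2
    exit 1
fi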

3. Tuning for Speed (Temporary)

Before pg_restore (Scenario A) or starting the WAL replay (Scenario B), optimize postgresql.conf temporarily:

  • fsync = off (Only for the restore duration! Turn back on immediately after or you risk corruption).
  • maintenance_work_mem = 4GB (Speed up index creation).
  • max_wal_size = 50GB (Reduce checkpoint frequency during load).
  • autovacuum = off (Prevent cleanup threads from fighting for I/O).
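
For Scenario A, one way to apply and later revert these settings is ALTER SYSTEM plus a reload, as sketched below. For Scenario B, edit postgresql.conf before starting the server; ALTER SYSTEM is not available while the instance is in recovery.

SQL

-- Apply temporary restore-time settings (all of these take effect on reload)
ALTER SYSTEM SET fsync = off;                    -- DANGER: revert immediately after the restore
ALTER SYSTEM SET maintenance_work_mem = '4GB';
ALTER SYSTEM SET max_wal_size = '50GB';
ALTER SYSTEM SET autovacuum = off;
SELECT pg_reload_conf();

-- After the restore completes and has been verified:
-- ALTER SYSTEM RESET fsync;
-- ALTER SYSTEM RESET maintenance_work_mem;
-- ALTER SYSTEM RESET max_wal_size;
-- ALTER SYSTEM RESET autovacuum;
-- SELECT pg_reload_conf();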

4. Alerting on Failure

Don’t stare at the screen. Make the script scream if it fails.

Bash

# Add to the end of your bash wrapper
if [ "$EXIT_CODE" -ne 0 ]; then
    curl -X POST -H 'Content-type: application/json' \
    --data "{\"text\":\"CRITICAL: DB Restore Failed on $HOSTNAME. Check logs.\"}" \
    https://hooks.slack.com/services/YOUR/WEBHOOK/URL
fi