The Postgres pod was in a crash loop with the following errors in its log:
LOG: database system was shut down at 2019-03-20 14:46:22 UTC
LOG: record with incorrect prev-link B17/E7000003 at 2/8B4B0580
LOG: invalid primary checkpoint record
LOG: record with incorrect prev-link 5C92/52270000 at 2/8B4B0510
LOG: invalid secondary checkpoint record
PANIC: could not locate a valid checkpoint record
LOG: startup process (PID 24) was terminated by signal 6: Aborted
LOG: aborting startup due to startup process failure
LOG: database system is shut down
As a result, other Customer platform pods dependent on postgres were stuck in “init”.
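The errors indicated that the postgres data directory, in particular the transaction log (WAL), was corrupted: neither the primary nor the secondary checkpoint record could be read, so the server had no valid point to start recovery from.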
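A quick way to see why these records were rejected: as we understand PostgreSQL's WAL reader, when a checkpoint record is fetched directly, its prev-link (the LSN of the preceding record) must lie before the record's own position. In both "incorrect prev-link" lines of the log, the prev-link (B17/E7000003, 5C92/52270000) is far beyond the record position (2/8B4B0580, 2/8B4B0510), which is what garbage in the WAL looks like. The bash sketch below repeats that comparison for a log line. `lsn_to_int` and `prev_link_points_forward` are hypothetical helpers, not part of PostgreSQL; an LSN is treated as two hex numbers holding the high and low 32 bits of a 64-bit WAL position.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reading the "incorrect prev-link" lines in the postgres log.

# lsn_to_int X/Y: an LSN is printed as two hex numbers, the high and low
# 32 bits of a 64-bit WAL position. Echo it as one decimal integer.
lsn_to_int() {
  local hi="${1%/*}" lo="${1#*/}"
  echo $(( (16#$hi << 32) | 16#$lo ))
}

# prev_link_points_forward LINE
# Exit 0 if LINE reports a prev-link that is not before the record's own
# position (the condition that makes a directly-read record invalid),
# 1 if the prev-link points backwards, 2 if LINE has no "prev-link X/Y at X/Y".
prev_link_points_forward() {
  local re='prev-link ([0-9A-Fa-f]+/[0-9A-Fa-f]+) at ([0-9A-Fa-f]+/[0-9A-Fa-f]+)'
  [[ $1 =~ $re ]] || return 2
  local prev at
  prev=$(lsn_to_int "${BASH_REMATCH[1]}")
  at=$(lsn_to_int "${BASH_REMATCH[2]}")
  (( prev >= at ))
}

# Check the two lines from the crash-looping pod's log.
for line in \
  "LOG: record with incorrect prev-link B17/E7000003 at 2/8B4B0580" \
  "LOG: record with incorrect prev-link 5C92/52270000 at 2/8B4B0510"; do
  prev_link_points_forward "$line" && echo "forward prev-link: $line"
done
```

Both log lines are flagged: each prev-link points past the record it belongs to, consistent with the "invalid primary/secondary checkpoint record" messages that follow them.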
The following Stack Overflow question provided a recommended solution:
To repair the transaction log, we started a postgres container with the data directory mounted and ran pg_resetxlog against it:
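$ sudo gravity enter
$ docker run -ti -v /opt/customer/storage/pgdata:/opt/customer/storage/pgdata <postgres-9.6-image> bash
# inside the postgres container (<postgres-9.6-image> stands for the image used; its name was not recorded)
# pg_resetxlog refuses to run as root, so switch to the postgres user first
$ su postgres
$ /usr/lib/postgresql/9.6/bin/pg_resetxlog /opt/customer/storage/pgdata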
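A more cautious variant of the same steps, sketched as a hypothetical bash helper (`reset_wal` is not part of any tool): it first copies the data directory to a backup location, because pg_resetxlog discards WAL and the result cannot be undone, then runs `pg_resetxlog -n`, which only prints the values it would use, before doing the real reset. It assumes the server is stopped, that it runs as the postgres user inside the container, and that the backup path is on persistent storage (for example a second host directory mounted with `-v`; any such path is an assumption). `PG_BIN` defaults to the 9.6 path used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper; PG_BIN defaults to the PostgreSQL 9.6 Debian path used above.
PG_BIN="${PG_BIN:-/usr/lib/postgresql/9.6/bin}"

# reset_wal DATADIR BACKUP
# Copies DATADIR to BACKUP, previews the reset with -n, then resets the WAL.
# Returns non-zero at the first failing step. Run as the postgres user with
# the server stopped.
reset_wal() {
  local datadir="$1" backup="$2"

  if [ ! -d "$datadir" ]; then
    echo "no such data directory: $datadir" >&2
    return 1
  fi
  if [ -e "$backup" ]; then
    echo "backup path already exists: $backup" >&2
    return 1
  fi

  # Keep a copy of the damaged cluster; the reset itself cannot be undone.
  cp -a "$datadir" "$backup" || return 1
  echo "backup written to $backup"

  # -n: print the pg_control values that would be used, change nothing.
  "$PG_BIN/pg_resetxlog" -n "$datadir" || return 1

  # The actual reset of the write-ahead log.
  "$PG_BIN/pg_resetxlog" "$datadir"
}
```

Usage inside the container, as postgres, with a hypothetical extra mount at /backup: `reset_wal /opt/customer/storage/pgdata /backup/pgdata-before-reset`. Taking the backup path as an argument, rather than writing next to the data directory, avoids leaving the copy on the container's ephemeral filesystem.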
Once pg_resetxlog completed, we restarted the postgres pod; it came back up, and all Customer pods launched as well.
Prepared by: @r0mant