Upgrade of monitoring-app fails with "etcdserver: request is too large" error

Description

The upgrade fails during the “monitoring-app” application upgrade, and the upgrade hook (the “monitoring-app-update-xxx” pod) logs an "etcdserver: request is too large" error:

$ kubectl logs monitoring-app-update-d4e4e5-c4zn8 -n kube-system
---> Assuming changeset from the environment: monitoring-5514
---> Creating monitoring namespace
namespace/monitoring configured
---> Getting node name for influxdb pod
---> Creating or updating resources
2020-05-23T07:52:26Z DEBU             changeset init logrus/exported.go:77
2020-05-23T07:52:26Z ERRO             "\nERROR REPORT:\nOriginal Error: *errors.StatusError etcdserver: request is too large\nStack Trace:\n\t/gopath/src/github.com/gravitational/rigging/changeset.go:1295 github.com/gravitational/rigging.(*Changeset).withUpsertOp\n\t/gopath/src/github.com/gravitational/rigging/changeset.go:1523 github.com/gravitational/rigging.(*Changeset).upsertServiceAccount\n\t/gopath/src/github.com/gravitational/rigging/changeset.go:146 github.com/gravitational/rigging.(*Changeset).upsertResource\n\t/gopath/src/github.com/gravitational/rigging/changeset.go:112 github.com/gravitational/rigging.(*Changeset).Upsert\n\t/gopath/src/github.com/gravitational/rigging/tool/rig/main.go:273 main.upsert\n\t/gopath/src/github.com/gravitational/rigging/tool/rig/main.go:122 main.run\n\t/gopath/src/github.com/gravitational/rigging/tool/rig/main.go:31 main.main\n\t/go/src/runtime/proc.go:209 runtime.main\n\t/go/src/runtime/asm_amd64.s:1338 runtime.goexit\nUser Message: \n" logrus/exported.go:102
ERROR: etcdserver: request is too large

Affected versions

The issue may surface when upgrading to a Gravity version prior to 5.5.45. Starting with version 5.5.45, Gravity sets a larger etcd request size limit, which should avoid this issue.

Workaround

To work around the issue, increase the etcd request size limit to 10MB (from the default of 1.5MB) by performing the following steps on each master node, one node at a time:

  1. Enter the planet container: sudo gravity enter.
  2. Edit the etcd systemd unit file (nano /lib/systemd/system/etcd.service) and add the command-line parameter --max-request-bytes=10485760 to the ExecStart command.
  3. Reload systemd configuration: systemctl daemon-reload.
  4. Restart etcd service: systemctl restart etcd.
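
As a quick sanity check after editing the unit file, the sketch below recomputes the byte value (10 * 1024 * 1024 = 10485760) and looks for the flag in the unit file from step 2. The helper name unit_has_limit and the UNIT_FILE override are illustrative assumptions, not part of Gravity; the check is meant to be run inside the planet container.

```shell
#!/bin/sh
# 10 MiB in bytes; this is the value passed to --max-request-bytes.
LIMIT_BYTES=$((10 * 1024 * 1024))

# Hypothetical helper: succeeds if the given unit file carries the flag
# with the expected value.
unit_has_limit() {
    grep -q -- "--max-request-bytes=$LIMIT_BYTES" "$1"
}

# Path taken from step 2; UNIT_FILE is an assumed override for testing.
UNIT_FILE="${UNIT_FILE:-/lib/systemd/system/etcd.service}"
if [ -r "$UNIT_FILE" ]; then
    if unit_has_limit "$UNIT_FILE"; then
        echo "flag present in $UNIT_FILE"
    else
        echo "flag missing in $UNIT_FILE"
    fi
fi
```

This only confirms that the file was edited; the new limit takes effect once systemd has been reloaded and etcd restarted, as in steps 3 and 4.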

Once etcd has been updated on all master nodes, roll back and re-execute the monitoring-app upgrade phase to confirm that it now completes successfully, and then resume the upgrade operation:

sudo gravity plan rollback --phase=/runtime/monitoring-app
sudo gravity plan execute --phase=/runtime/monitoring-app
sudo gravity plan resume
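
If the three commands are scripted rather than typed one at a time, it may be worth making each step conditional on the previous one, so that the upgrade is not resumed after a failed phase. A minimal sketch under that assumption follows; the function name retry_monitoring_phase and the GRAVITY variable are hypothetical conveniences, not part of the Gravity CLI.

```shell
#!/bin/sh
# GRAVITY is an assumed override hook so the sequence can be dry-run;
# by default the real CLI is invoked via sudo, as in the commands above.
GRAVITY="${GRAVITY:-sudo gravity}"
PHASE="/runtime/monitoring-app"

retry_monitoring_phase() {
    # Each step runs only if the previous one succeeded.
    $GRAVITY plan rollback --phase="$PHASE" &&
    $GRAVITY plan execute --phase="$PHASE" &&
    $GRAVITY plan resume
}
```

Calling retry_monitoring_phase runs the sequence; setting GRAVITY="echo gravity" beforehand prints the commands instead of executing them.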