Architecture

Last change on 2025-08-28 • Created on 2025-08-28 • ID: CL-29C65

Replication stages during live migration

Each cloud server runs on a physical server. In some scenarios, a physical server may become unavailable. To keep cloud servers running, our systems perform a live migration from the affected physical server to another physical server within the same location.

Replication stages

Everything is replicated from the source server to the target server in three stages.

Stage 1
The new, empty cloud server target is created on a different physical server within the same location. The cloud server target is not available yet.
Most of the state of the source server is replicated onto the target server. The local NVMe SSD and the source memory are copied to the target, while the pages that change on the source are tracked.

Stage 2
The cloud server source is turned off and becomes unavailable.
The state required for running the cloud server is now replicated from the source server onto the target server. In addition, the static state of the system is now replicated.

Stage 3
The cloud server source is turned back on. If the target server still needs anything, the source server can still help.
If networking is still updating and routing traffic to the source server, the source server forwards that traffic to the target server.
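
The copy-while-tracking idea in the first two stages can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the actual implementation: the text above does not say how tracked pages are sent again, so the sketch assumes the common pre-copy approach, where changed pages are re-sent in rounds and the remainder is copied once the source is paused. The function name live_migrate, the parameters write_rate, max_rounds and blackout_threshold, and the use of integers as page contents are all hypothetical.

```python
import random


def live_migrate(source_pages, write_rate=0.05, max_rounds=10,
                 blackout_threshold=8, seed=0):
    """Simulate a pre-copy migration of memory pages (hypothetical model).

    Returns (target_pages, rounds, pages_copied_during_blackout).
    """
    rng = random.Random(seed)
    target_pages = [None] * len(source_pages)
    # Nothing has been copied yet, so every page starts out as "changed".
    dirty = set(range(len(source_pages)))
    rounds = 0

    # Stage 1 (source brownout): copy while the source keeps running.
    while len(dirty) > blackout_threshold and rounds < max_rounds:
        to_copy = sorted(dirty)
        dirty.clear()
        for i in to_copy:
            target_pages[i] = source_pages[i]
            # The running source may write to any page during the copy;
            # such pages are tracked so they can be sent again.
            if rng.random() < write_rate:
                j = rng.randrange(len(source_pages))
                source_pages[j] += 1
                dirty.add(j)
        rounds += 1

    # Stage 2 (blackout): the source is paused, so no new writes occur and
    # the remaining changed pages can be copied consistently.
    copied_during_blackout = len(dirty)
    for i in dirty:
        target_pages[i] = source_pages[i]
    dirty.clear()
    return target_pages, rounds, copied_during_blackout


pages = list(range(256))
target, rounds, final_copy = live_migrate(pages)
print(target == pages, rounds, final_copy)
```

In this model, the number of pages left for the paused phase is what determines how long the cloud server is unavailable, which is why most of the copying happens while the source is still running.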

The following data is replicated:

  • Critical data: e.g. local NVMe SSD, memory, Volume
  • Operational data: data required to run the cloud server

Live migration process:

Stage 1
  • Networking: Traffic is routed to source
  • Availability: source brownout
  • Duration: Several minutes
  • Cloud server source (physical server A): State ON, Critical: Sending, Operational: Not replicated
  • Cloud server target (physical server B): State ON, Critical: Receiving, Operational: Pending

Stage 2
  • Networking: Cloud server is unavailable
  • Availability: blackout
  • Duration: Usually less than 1 second
  • Cloud server source (physical server A): State OFF, Critical: Already replicated, Operational: Sending
  • Cloud server target (physical server B): State OFF, Critical: Full replication, Operational: Receiving

Stage 3
  • Networking: Traffic is routed to target
  • Availability: target brownout
  • Cloud server source (physical server A): State ON, Critical: Already replicated, Operational: Already replicated
  • Cloud server target (physical server B): State ON, Critical: Full replication, Operational: Full replication
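
The three stages can also be written down as a small lookup table, for example to reason about where a request ends up at each point of the migration. This is a minimal sketch: the names Stage, StageInfo, STAGES and handle_packet are hypothetical and only mirror the values listed above, and the routing_updated flag is an assumed way to model networking that is still updating in Stage 3.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Stage(Enum):
    SOURCE_BROWNOUT = 1  # Stage 1
    BLACKOUT = 2         # Stage 2
    TARGET_BROWNOUT = 3  # Stage 3


@dataclass(frozen=True)
class StageInfo:
    routed_to: Optional[str]  # "source", "target", or None while unavailable
    source_on: bool
    target_on: bool


# Values taken from the per-stage overview above.
STAGES = {
    Stage.SOURCE_BROWNOUT: StageInfo("source", True, True),
    Stage.BLACKOUT: StageInfo(None, False, False),
    Stage.TARGET_BROWNOUT: StageInfo("target", True, True),
}


def handle_packet(stage: Stage,
                  routing_updated: bool = True) -> Optional[Tuple[str, bool]]:
    """Return (handling server, forwarded via source?), or None if unavailable."""
    info = STAGES[stage]
    if info.routed_to is None:
        return None  # blackout: the cloud server cannot be reached
    if stage is Stage.TARGET_BROWNOUT:
        # Traffic that still arrives at the source is forwarded to the target.
        return ("target", not routing_updated)
    return (info.routed_to, False)


print(handle_packet(Stage.SOURCE_BROWNOUT))
print(handle_packet(Stage.BLACKOUT))
print(handle_packet(Stage.TARGET_BROWNOUT, routing_updated=False))
```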
