Replication stages during live migration
Each cloud server runs on a physical server. In some scenarios, a physical server may become unavailable. To keep cloud servers running, our systems perform a live migration from the affected physical server to another physical server within the same location.
Replication stages
Everything is replicated from the source server to the target server in three stages.
| Stage 1 | Stage 2 | Stage 3 |
| --- | --- | --- |
| The new, empty cloud server target is created on a different physical server within the same location. The cloud server target is not available yet. | The cloud server source is turned off and becomes unavailable. | The cloud server source is turned back on. If the target server still needs anything, the source server can still help. |
| Most of the state of the source server is replicated onto the target server. The local NVMe SSD and the source memory are copied to the target, while the pages that change on the source are tracked. | The state required for running the cloud server is now replicated from the source server onto the target server. In addition, the static state of the system is now replicated. | If networking is still updating and routing traffic to the source server, the source server forwards that traffic to the target server. |
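The copy-while-tracking step in Stage 1 can be illustrated with a small simulation. This is only a sketch under assumptions: the function name `precopy`, the repeated copy rounds, and the `threshold` and `max_rounds` parameters are hypothetical and are not taken from the actual migration system. Memory is modeled as a plain dictionary of page numbers to contents.

```python
from typing import Dict, Iterable


def precopy(
    source: Dict[int, bytes],
    target: Dict[int, bytes],
    writes_per_round: Iterable[Dict[int, bytes]],
    threshold: int = 2,
    max_rounds: int = 10,
) -> int:
    """Copy pages from source to target while the source keeps changing.

    writes_per_round simulates the writes that land on the source during
    each copy round. Returns the number of pages that were still dirty and
    had to be copied while the source was turned off (Stage 2).
    """
    # Stage 1: every page starts out as "not yet copied".
    dirty = set(source)
    pending_writes = iter(writes_per_round)

    for _ in range(max_rounds):
        # Copy everything that is currently marked dirty.
        for page in dirty:
            target[page] = source[page]

        # The source is still running, so new writes arrive meanwhile.
        writes = next(pending_writes, {})
        source.update(writes)

        # Only the pages changed during this round need another copy.
        dirty = set(writes)
        if len(dirty) <= threshold:
            break  # assumed cut-off: the remainder is small enough

    # Stage 2: the source is off, so the remaining pages cannot change.
    for page in dirty:
        target[page] = source[page]
    return len(dirty)
```

For example, calling `precopy(src, dst, [{0: b"x", 1: b"y", 2: b"z"}, {1: b"q"}])` on a three-page `src` leaves a single page for the final copy, and `dst` ends up identical to `src`. The fewer pages remain at that point, the shorter the blackout can be.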
The following data is replicated:
- Critical data: e.g. local NVMe SSD, memory, Volume
- Operational data: data required to run the cloud server
Live migration process:
Stage 1
- Networking: Traffic is routed to source
- Availability: source brownout
- Duration: Several minutes
- Cloud server source (physical server A): State: ON; Critical: Sending...; Operational: Not replicated
- Cloud server target (physical server B): State: ON; Critical: Receiving...; Operational: Pending
Stage 2
- Networking: Cloud server is unavailable
- Availability: blackout
- Duration: Usually less than 1 second
- Cloud server source (physical server A): State: OFF; Critical: Already replicated; Operational: Sending...
- Cloud server target (physical server B): State: OFF; Critical: Full replication; Operational: Receiving...
Stage 3
- Networking: Traffic is routed to target
- Availability: target brownout
- Cloud server source (physical server A): State: ON; Critical: Already replicated; Operational: Already replicated
- Cloud server target (physical server B): State: ON; Critical: Full replication; Operational: Full replication
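How the source server treats incoming traffic in each stage can be summarized in a few lines of code. This is a minimal sketch for illustration only: the `Stage` enum, the `source_action` function, and the action names `serve`, `drop`, and `forward` are hypothetical and do not correspond to any real API.

```python
from enum import Enum


class Stage(Enum):
    """The three replication stages described above (names are illustrative)."""
    STAGE_1 = 1  # source brownout, traffic is routed to the source
    STAGE_2 = 2  # blackout, the cloud server is unavailable
    STAGE_3 = 3  # target brownout, traffic is routed to the target


def source_action(stage: Stage) -> str:
    """Return what happens to a packet that arrives at the source server."""
    if stage is Stage.STAGE_1:
        # The source is still the active cloud server.
        return "serve"
    if stage is Stage.STAGE_2:
        # Source and target are both off, so nothing can answer.
        return "drop"
    # Stage 3: networking may still point at the source,
    # which passes the traffic on to the target.
    return "forward"
```

Iterating over `Stage` and calling `source_action` for each member gives a compact overview of the networking behavior across the whole migration.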