
Patching and Uptime

Document: Technical Design – Backend – Patching & Uptime
Status: Exploratory
Last updated: 2026-01-10


1. Purpose and Non-Negotiable Goal

Hard constraint: Atherion must avoid “maintenance nights” as a player-facing experience.

Desired player experience (inspired by GW2 / Warframe patterns):

  • Players receive an in-game notification that a new version is available.
  • Players can choose when to restart (within a grace window).
  • Players can finish their current activity (e.g., an Everspire run).
  • Hard cutoffs are rare, short, and communicated clearly.

This document proposes a model to achieve high uptime with graceful patching, using SpacetimeDB on Maincloud and an instance-based world.


2. Definitions

Client Version
The game client build version installed on the player’s machine.

Server Module Version
The version of SpacetimeDB module code currently deployed.

Epoch (Build Epoch)
A logical “compatibility generation” of the live environment.
When an epoch changes, new sessions and instances are created under the new epoch, while old instances may drain.

Draining
A state where an instance continues running for existing participants but rejects new entrants.

Hard Cutoff
A point in time after which old client versions cannot start or continue sessions.


3. Guiding Principles

3.1. Do Not Patch by Disconnecting Everyone

Decision (hard):
Patching must be instance-aware. The game may “move forward” while old instances finish.

Rationale:
Global disconnects turn every patch into a high-friction event and train players to avoid playing near patch time.


3.2. Separate “Server Deploy” From “Player Restart”

Decision (goal):
Server code can be updated without immediately forcing client restarts.

Rationale:
This enables soft patching: new content and logic can roll out while giving players agency to restart on their schedule.

This requires compatibility discipline (see Section 7).


3.3. Prefer Additive Changes

Decision (directional):
Prefer additive schema and logic changes over breaking changes.

Rationale:
Additive changes are easiest to run in mixed-version periods and align with incremental migration strategies.


4. High-Level Patching Strategy: Epoch + Drain

4.1. Lifecycle Overview

A patch rollout has phases:

  1. Announce
    1. Notify players: “Update available.”
    2. Provide a grace period and cutoff time.
  2. Gate New Sessions
    1. New logins require the new client version (optionally enforced only after a short delay).
    2. Existing sessions can continue.
  3. Create New Epoch
    1. New instances are created under the new epoch.
    2. Matchmaking routes new runs to the new epoch.
  4. Drain Old Epoch Instances
    1. Old instances continue but block new joins.
    2. Active runs can finish normally.
  5. Hard Cutoff
    1. Old clients can no longer authenticate.
    2. Remaining old-epoch instances are ended or migrated.

This model avoids a “global offline window.”
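
The sketch below shows this progression as a minimal policy check: which phase a region is in, and what each phase permits. It is plain Rust with hypothetical names (PatchPolicy, login_allowed), not an existing SpacetimeDB API; actual enforcement would live in the module’s login and session reducers.

```rust
// Rollout phases from Section 4.1, collapsed into an enforcement policy.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum PatchPolicy {
    AnnounceOnly,    // phase 1: notify only, nothing enforced yet
    GateNewSessions, // phases 2-4: new logins need the new client; old epoch drains
    HardCutoff,      // phase 5: old clients are denied entirely
}

/// Can a client start a new session under the current policy?
fn login_allowed(policy: PatchPolicy, client_is_current: bool) -> bool {
    match policy {
        PatchPolicy::AnnounceOnly => true,
        PatchPolicy::GateNewSessions | PatchPolicy::HardCutoff => client_is_current,
    }
}

/// Can an already-connected old-version session keep playing?
fn session_may_continue(policy: PatchPolicy, client_is_current: bool) -> bool {
    client_is_current || policy != PatchPolicy::HardCutoff
}

fn main() {
    assert!(login_allowed(PatchPolicy::AnnounceOnly, false));
    assert!(!login_allowed(PatchPolicy::GateNewSessions, false));
    assert!(session_may_continue(PatchPolicy::GateNewSessions, false));
    assert!(!session_may_continue(PatchPolicy::HardCutoff, false));
}
```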


4.2. Which Content Types Drain Cleanly?

Everspire runs: drain extremely well.

  • They have clear start/end boundaries.
  • They can be marked draining immediately at patch start.
  • They can be allowed to finish.

Open-world map instances: drain moderately well.

  • Players can be migrated by party-follow.
  • For active world events, allow completion windows.

Social hubs: drain trivially.

  • No combat, quick migration.
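
One possible way to encode these differences is a per-type drain budget, as sketched below. The durations are illustrative placeholders, not tuned values.

```rust
// How long an instance of each type may keep running after it starts draining
// (Section 4.2). Durations are placeholders only.
#[derive(Clone, Copy)]
enum InstanceType {
    Hub,
    OpenWorld,
    Everspire,
}

fn max_drain_minutes(kind: InstanceType) -> u32 {
    match kind {
        InstanceType::Hub => 5,        // trivial: migrate players quickly
        InstanceType::OpenWorld => 30, // allow world-event completion windows
        InstanceType::Everspire => 90, // let an active run finish
    }
}
```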

5. Server-Side State Model (Conceptual)

This section defines the minimum server-side state required for soft patching.

5.1. Patch State

A single authoritative PatchState record (per region) controls messaging and enforcement:

  • server_epoch (integer)
  • required_client_version (string or semver)
  • grace_until (timestamp)
  • message (string)
  • policy (enum-like: announce_only, gate_new_sessions, hard_cutoff)

Clients subscribe to this state to display notifications.
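
A minimal sketch of this record follows, written as a plain Rust struct. In the actual module it would be a SpacetimeDB table that clients subscribe to; field names follow the list above, and the timestamp and version representations are assumptions.

```rust
// PatchPolicy mirrors the rollout sketch in Section 4.1.
#[derive(Clone, Debug)]
enum PatchPolicy {
    AnnounceOnly,
    GateNewSessions,
    HardCutoff,
}

// One authoritative record per region (Section 5.1).
#[derive(Clone, Debug)]
struct PatchState {
    region: String,
    server_epoch: u64,
    required_client_version: String, // e.g. a semver string
    grace_until: u64,                // timestamp (e.g. unix ms)
    message: String,                 // player-facing banner text
    policy: PatchPolicy,
}
```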


5.2. Session Tracking

Each connected session tracks:

  • player_id
  • client_version
  • connected_epoch
  • last_seen_at

This enables:

  • mixed-version grace windows
  • enforcement at cutoff
  • reconnect rules
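
A sketch of the session record and the login-time version gate follows. Plain Rust with hypothetical names; version comparison is simplified to string equality here.

```rust
// Per-session tracking (Section 5.2).
#[derive(Clone, Debug)]
struct Session {
    player_id: u64,
    client_version: String,
    connected_epoch: u64,
    last_seen_at: u64, // timestamp
}

/// New logins: while new sessions are gated, only the required client version
/// may start a session; during announce-only, any version may.
fn accept_login(gating_new_sessions: bool, required_version: &str, client_version: &str) -> bool {
    !gating_new_sessions || client_version == required_version
}
```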

5.3. Instance State

Each instance/run tracks:

  • instance_id
  • instance_type (hub/openworld/everspire)
  • epoch
  • status (running, draining, ended)
  • created_at, expires_at (optional)
  • owner (party/run owner where applicable)

Matchmaking must prefer instances in the latest epoch.
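
The sketch below shows the instance record and an epoch-aware selection rule: never route players into draining or ended instances, and prefer the newest epoch among the remainder. Plain Rust; a real implementation would query SpacetimeDB tables rather than a slice.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum InstanceType { Hub, OpenWorld, Everspire }

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum InstanceStatus { Running, Draining, Ended }

// Per-instance state (Section 5.3).
#[derive(Clone, Debug)]
struct Instance {
    instance_id: u64,
    instance_type: InstanceType,
    epoch: u64,
    status: InstanceStatus,
    created_at: u64,         // timestamp
    expires_at: Option<u64>, // optional
    owner: Option<u64>,      // party/run owner where applicable
}

/// Matchmaking preference: only running instances, highest epoch wins.
fn pick_instance<'a>(candidates: &'a [Instance], kind: InstanceType) -> Option<&'a Instance> {
    candidates
        .iter()
        .filter(|i| i.instance_type == kind && i.status == InstanceStatus::Running)
        .max_by_key(|i| i.epoch)
}
```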


6. Client Experience (UX Contract)

6.1. Notifications

When a patch is available:

  • show a non-blocking banner:
    • “Update available. Restart when ready.”
  • show a countdown when cutoff is scheduled:
    • “Update required in 2 hours”

Messaging should be:

  • polite
  • time-aware
  • consistent

6.2. Preventing “Patch Traps”

Players should not be surprised by a cutoff mid-activity.

Rules of thumb:

  • If an Everspire run is active, allow it to finish.
  • If cutoff is near, warn players before starting new long content:
    • “A patch is scheduled in 10 minutes. Starting a new expedition is not recommended.”
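
A minimal client-side guard for this rule might look like the sketch below, assuming the client knows the scheduled cutoff and a rough duration estimate for the activity. Names and the threshold logic are illustrative.

```rust
/// Warn before starting an activity whose expected duration would run past the
/// scheduled cutoff (Section 6.2). `None` means no cutoff is scheduled.
fn should_warn_before_start(minutes_until_cutoff: Option<u64>, expected_minutes: u64) -> bool {
    match minutes_until_cutoff {
        Some(remaining) => remaining < expected_minutes,
        None => false,
    }
}

fn main() {
    // “A patch is scheduled in 10 minutes. Starting a new expedition is not recommended.”
    assert!(should_warn_before_start(Some(10), 45));
    assert!(!should_warn_before_start(None, 45));
}
```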

6.3. Reconnect Grace

During draining, allow limited reconnect:

  • If a player disconnects mid-run, allow rejoin for X minutes.
  • After X minutes, the run continues without them (or replaces them via matchmaking, if designed).
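
A sketch of the rejoin check follows, assuming the window is a fixed constant (the actual value of “X” remains an open question):

```rust
// Reconnect grace (Section 6.3). The window length is a placeholder, not a decision.
const RECONNECT_WINDOW_MS: u64 = 10 * 60 * 1000;

/// A rejoin is allowed while the run is still alive and the disconnect is recent.
fn rejoin_allowed(disconnected_at_ms: u64, now_ms: u64, run_still_active: bool) -> bool {
    run_still_active && now_ms.saturating_sub(disconnected_at_ms) <= RECONNECT_WINDOW_MS
}
```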

7. Compatibility Discipline (How to Make This Work)

7.1. Mixed-Version Window Requirements

To allow old clients to remain connected while new clients join, the backend must avoid breaking changes.

Guidelines:

  • Add new fields; do not remove old fields immediately.
  • Add new reducers/endpoints; keep old ones until deprecation.
  • Use feature flags for new behavior.
  • Keep old clients functional in “legacy mode” during grace.
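
One concrete shape for the first two guidelines is sketched below: new fields are added as optional so data written by the old code path still reads cleanly, and new behavior hides behind a server-side flag during the grace window. Struct, field, and flag names are hypothetical.

```rust
use std::collections::HashSet;

#[derive(Clone, Debug, Default)]
struct PlayerLoadout {
    weapon_id: u32,
    armor_id: u32,
    // Added in the new epoch; None for rows written before the patch.
    relic_id: Option<u32>,
}

/// Feature flag gating the new behavior during the mixed-version window.
fn relics_enabled(server_flags: &HashSet<String>) -> bool {
    server_flags.contains("relics_v1")
}
```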

7.2. Additive Schema + Incremental Migration Model

Preferred pattern:

  1. Introduce new schema elements (tables/columns) alongside old.
  2. Write server logic that supports both.
  3. Backfill or migrate data gradually.
  4. Switch reads to the new schema.
  5. Remove old elements in a later cleanup patch.

This reduces the need for downtime due to data migration and supports rolling patch behavior.
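
Steps 2 and 4 amount to a dual-read: prefer the new schema elements and fall back to the legacy ones until backfill completes. A sketch with hypothetical column names:

```rust
#[derive(Clone, Debug)]
struct CharacterRow {
    // Legacy column: a single packed stat value.
    legacy_power: u32,
    // New columns introduced alongside it; None until this row is backfilled.
    attack: Option<u32>,
    defense: Option<u32>,
}

/// Dual-read during the migration window (Section 7.2, steps 2–4).
fn effective_power(row: &CharacterRow) -> u32 {
    match (row.attack, row.defense) {
        (Some(a), Some(d)) => a + d,
        _ => row.legacy_power,
    }
}
```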


8. Handling Breaking Changes (When Unavoidable)

Sometimes breaking changes cannot be avoided (e.g., protocol overhaul, major feature rewrite).

Policy:

  • Schedule a longer grace period (hours, not minutes).
  • Gate new sessions early to reduce mixed-version complexity.
  • Drain aggressively where safe.
  • Use a clear final cutoff.

Even in this case, the goal is:

  • no “hours of downtime”
  • only a forced restart at cutoff

9. Failure Modes and Safeguards

9.1. What If a Patch Goes Wrong?

If a deployed patch causes critical issues:

  • ability to revert module version quickly
  • ability to freeze creation of new instances in the new epoch
  • ability to keep old epoch alive temporarily (if safer)

Rollback strategy should prioritize:

  • preserving durable state
  • preserving active runs when possible
  • clear communication to players

9.2. What If Players Ignore the Update?

At hard cutoff:

  • Old client versions are denied authentication for new sessions.
  • Existing sessions on old client versions are disconnected (forced end of session).
  • No reconnect allowed with old client version after cutoff.
  • The client displays a blocking message:
    • “Update required. Please restart to continue.”
  • Optionally (client-side policy): the game may exit to desktop after showing the message, but exiting is not strictly required as long as gameplay cannot continue.

The system should not keep legacy sessions alive beyond the cutoff.
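
A sketch of the cutoff enforcement path follows: authentication is denied for old client versions, and any sessions still on an old version are collected for a forced end. Plain Rust with hypothetical names; the blocking message text matches the UX contract above.

```rust
struct LiveSession {
    player_id: u64,
    client_version: String,
}

/// After cutoff, only the required client version authenticates.
fn authenticate(required: &str, client_version: &str) -> Result<(), &'static str> {
    if client_version == required {
        Ok(())
    } else {
        Err("Update required. Please restart to continue.")
    }
}

/// Sessions that must be force-ended at cutoff (no reconnect afterwards).
fn sessions_to_force_end(sessions: &[LiveSession], required: &str) -> Vec<u64> {
    sessions
        .iter()
        .filter(|s| s.client_version != required)
        .map(|s| s.player_id)
        .collect()
}
```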


10. Maincloud / SpacetimeDB Considerations (High-Level)

Current assumptions:

  • Server logic runs in SpacetimeDB modules.
  • Instance loops can be scheduled (tick reducers) per instance/run.
  • Deployments should be capable of hotswapping modules without full-region shutdown in typical cases.

This document intentionally avoids relying on undocumented platform behavior. Where exact platform capabilities impact design, we should test and document findings.


11. Open Questions

  • Should Everspire runs pause/resume across disconnects, or only allow short reconnect?
  • How long should grace periods be for typical patches vs major updates?
  • Should open-world map instances be migrated automatically or ended with a safe teleport?
  • How should the client behave if it cannot download the patch immediately?
  • How do we log and measure “patch friction” as a metric?

12. Relationship to Other Documents

  • technical-design/architecture.md provides the instance-based foundation.
  • technical-design/backend/instance-model.md will define instance schemas and lifecycle reducers.
  • technical-design/backend/tick-rates.md will define tick scheduling per content type.
  • game-design/world/everspire.md informs drain rules for dungeon runs.

End of document.

