Keyphrases
In-memory
100%
Fault Tolerance
100%
Large Language Models
100%
Pre-trained Language Model
100%
Checkpointing
33%
Snapshotting
33%
Fault Tolerance Methods
33%
Caching
16%
Communication Cost
16%
Two-layer
16%
Training Period
16%
Node Failure
16%
Data Transfer
16%
Erasure Codes
16%
Training Costs
16%
Software Failure
16%
System Reliability
16%
Failure Probability
16%
Asynchronously
16%
Management Process
16%
GPU Memory
16%
Survival Probability
16%
Extensive System
16%
CPU Resource
16%
Enhanced System
16%
Layer Hierarchy
16%
System Scaling
16%
Saving Time
16%
Asynchronous Checkpointing
16%
Layered Security
16%
Reduced Data
16%
Volatile Memory
16%
Computer Science
fault-tolerance
100%
Large Language Model
100%
Checkpointing
37%
Graphics Processing Unit
25%
Input/Output
25%
Tolerance Method
25%
Communication Cost
12%
Process Management
12%
Survival Probability
12%
Protection Parameter
12%
Layered Protection
12%
Software Failure
12%
Volatile Memory
12%
Training Period
12%