
STADLE Model Training Capabilities

Continuous & Distributed Learning


Maintains a single model that is continuously updated as new data is collected. Addresses the problem of "catastrophic forgetting" by ensuring that training on new data minimally disrupts previously learned information.

Efficiently updates models from multiple data sources without centralizing the data. Overcomes the inefficiencies and regulatory challenges of centralized data collection. Implements a Federated Learning approach to aggregate data source-specific models without transferring data.

Traditional AI model training tends to be a one-and-done process - collect enough data once and train a model on the data once.

However, training data in real-world applications is often produced continuously over time, with the trends in newer data gradually shifting away from those in older data.


How can we modify the model training process to allow for previously trained models to be updated with new data in a time and compute-efficient manner?

Existing approaches to handling model training in this case have key flaws (both are sketched in code below):


A standard approach is to combine the new data with all previously collected data and retrain the model from scratch each time
    → Training time scales with the total amount of accumulated data, so retraining eventually becomes impossible within a reasonable time


An alternative approach is to start training from the most recent version of the model using only the new data, keeping training time constant
    → Training on only the new data often leads to the model forgetting key information it learned previously (the "catastrophic forgetting" problem)
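
To make the contrast concrete, the sketch below (PyTorch, on a made-up regression task whose underlying trend drifts over time; the model, data, and hyperparameters are illustrative assumptions, not part of STADLE) implements both approaches side by side:

```python
import torch
from torch import nn

def make_batch(n=256, drift=0.0):
    # Toy data whose underlying trend drifts over time (illustrative only).
    x = torch.randn(n, 10)
    w_true = torch.ones(10) + drift
    y = x @ w_true + 0.1 * torch.randn(n)
    return x, y

def train(model, batches, epochs=50, lr=1e-2):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for x, y in batches:
            opt.zero_grad()
            loss = loss_fn(model(x).squeeze(-1), y)
            loss.backward()
            opt.step()
    return model

# Approach 1: retrain from scratch on all data collected so far.
# Training cost grows every round as the dataset accumulates.
accumulated = []
for round_idx in range(5):
    accumulated.append(make_batch(drift=0.2 * round_idx))
    model_scratch = train(nn.Linear(10, 1), accumulated)

# Approach 2: keep one model and fine-tune it on only the newest batch.
# Cost stays constant, but what was learned from earlier batches can be
# overwritten ("catastrophic forgetting").
model_finetune = nn.Linear(10, 1)
for round_idx in range(5):
    model_finetune = train(model_finetune, [make_batch(drift=0.2 * round_idx)])
```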

New model training should undo prior training as little as possible!


STADLE tracks how each past training process affected different parts of the model. When a new training process is started, STADLE summarizes this prior training information and modifies the new training to penalize changes to the important parts of the model.
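
STADLE's exact mechanism is not reproduced here, but the general idea of penalizing changes to important parts of the model can be sketched with an Elastic Weight Consolidation-style regularizer. The importance estimate (average squared gradients on the old data) and the penalty strength lam below are illustrative assumptions:

```python
import torch
from torch import nn

def estimate_importance(model, old_data, loss_fn):
    # Rough per-parameter importance: average squared gradient of the loss
    # on the old data (a diagonal Fisher-information style estimate).
    importance = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in old_data:
        model.zero_grad()
        loss_fn(model(x).squeeze(-1), y).backward()
        for n, p in model.named_parameters():
            importance[n] += p.grad.detach() ** 2 / len(old_data)
    return importance

def train_with_penalty(model, new_data, importance, old_params,
                       lam=100.0, epochs=50, lr=1e-2):
    # Ordinary training on the new data, plus a penalty term that discourages
    # moving the parameters the old data considered important.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for x, y in new_data:
            opt.zero_grad()
            loss = loss_fn(model(x).squeeze(-1), y)
            for n, p in model.named_parameters():
                loss = loss + lam * (importance[n] * (p - old_params[n]) ** 2).sum()
            loss.backward()
            opt.step()
    return model

# Usage sketch with toy data: estimate importance on the old data, snapshot
# the current parameters, then train on the new data under the penalty.
model = nn.Linear(10, 1)
loss_fn = nn.MSELoss()
old_data = [(torch.randn(64, 10), torch.randn(64))]
new_data = [(torch.randn(64, 10), torch.randn(64))]
importance = estimate_importance(model, old_data, loss_fn)
old_params = {n: p.detach().clone() for n, p in model.named_parameters()}
model = train_with_penalty(model, new_data, importance, old_params)
```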

Traditional AI model training also tends to focus on the centralized case - collect data from many sources into a single location and train a single model on all of the collected data.

However, there are many cases where this data centralization process is extremely inefficient (large scales) or even impossible (highly regulated industries).


How can we train models across multiple data sources without transferring data from its source?

Standard model training approaches struggle in this case:


The most common approach is to simply train one model per data source, deploying either one or all of the models to each location for inference

    → Each model only sees a subset of the full data and thus fails to generalize; managing multiple models also adds deployment and routing complexity at each location

 

Alternatively, a continuous learning-based approach can be used: a single model is trained on each data source in turn, moving from location to location

    → This faces the same "catastrophic forgetting" problem, along with poor time efficiency when there are many data sources

Federated Learning (FL) directly targets this problem by aggregating the training of data source-specific models, allowing for training over all available data in parallel without centralizing it.
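
The core aggregation step of FL can be sketched with the standard FedAvg algorithm: each source trains a copy of the current global model on its own data, and only the resulting weights (never the data) are sent back and averaged. The local training routine and the toy two-source setup below are assumptions for illustration:

```python
import copy
import torch
from torch import nn

def local_train(model, local_data, epochs=5, lr=1e-2):
    # Ordinary training on one source's local data; the raw data never
    # leaves that source, only the resulting model weights do.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for x, y in local_data:
            opt.zero_grad()
            loss_fn(model(x).squeeze(-1), y).backward()
            opt.step()
    return model.state_dict()

def fed_avg(global_model, source_datasets, rounds=10):
    # One round: send the global weights to every source, train locally,
    # then average the returned weights (weighted by local dataset size).
    for _ in range(rounds):
        sizes = [sum(len(x) for x, _ in d) for d in source_datasets]
        states = [local_train(copy.deepcopy(global_model), d)
                  for d in source_datasets]
        averaged = {
            key: sum(s * st[key] for s, st in zip(sizes, states)) / sum(sizes)
            for key in global_model.state_dict()
        }
        global_model.load_state_dict(averaged)
    return global_model

# Two data sources with their own (toy) local datasets.
sources = [
    [(torch.randn(128, 10), torch.randn(128))],
    [(torch.randn(128, 10), torch.randn(128))],
]
global_model = fed_avg(nn.Linear(10, 1), sources)
```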


Current FL approaches address the basic problem, but are still lacking in certain areas:

    → Data sources with different underlying distributions (common in real-world use-cases) lead to poor training speed and performance

    → Traditional architectures struggle to scale horizontally


STADLE expands on traditional FL by simultaneously targeting the training efficiency problem:

    → In the continuous learning case, we prevent later training from overwriting past training; for FL, we instead prevent the training at each source from overwriting the training at other sources. This greatly improves training speed and robustness on real-world data
    → STADLE adopts a hierarchical architecture with horizontal auto-scaling capability to dynamically adapt to both small scales (e.g. five hospitals) and large scales (e.g. thousands of IoT devices)
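
The hierarchical idea can be sketched as a two-level extension of the FedAvg example above (reusing its local_train function): intermediate aggregators each average a group of sources, and a top-level aggregator then averages the group results. The grouping and weighting shown are illustrative assumptions, not a description of STADLE's actual architecture:

```python
import copy
import torch
from torch import nn

def average_states(states, weights):
    # Weighted average of a list of model state_dicts.
    total = sum(weights)
    return {
        key: sum(w * st[key] for w, st in zip(weights, states)) / total
        for key in states[0]
    }

def hierarchical_fed_avg(global_model, source_groups, rounds=10):
    # Two-level aggregation: each intermediate aggregator averages the models
    # trained within its group of sources, then the top-level aggregator
    # averages the group-level results. Adding more intermediate aggregators
    # lets the system scale out as the number of sources grows.
    for _ in range(rounds):
        group_states, group_sizes = [], []
        for group in source_groups:
            sizes = [sum(len(x) for x, _ in d) for d in group]
            states = [local_train(copy.deepcopy(global_model), d) for d in group]
            group_states.append(average_states(states, sizes))
            group_sizes.append(sum(sizes))
        global_model.load_state_dict(average_states(group_states, group_sizes))
    return global_model

# Four sources split across two intermediate aggregators (toy example).
groups = [
    [[(torch.randn(64, 10), torch.randn(64))] for _ in range(2)],
    [[(torch.randn(64, 10), torch.randn(64))] for _ in range(2)],
]
global_model = hierarchical_fed_avg(nn.Linear(10, 1), groups)
```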

In many cases, creating a single generalized model may not be desired; we can instead allow the deployed models to specialize on different subsets of the data to better capture different trends.

STADLE allows for general model training (beneficial to all deployed models) to be separated from specialized model training and aggregated across models, maximizing deployment-specific performance while retaining general accuracy.
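
One common way to realize this separation, shown here purely as an illustration rather than as STADLE's internal design, is to aggregate a shared backbone across deployments while each deployment keeps its own specialized head local. The SplitModel architecture below is a made-up example:

```python
from torch import nn

class SplitModel(nn.Module):
    # A shared feature extractor (aggregated across deployments) plus a
    # deployment-specific head that stays local and is never averaged.
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(10, 16), nn.ReLU())
        self.head = nn.Linear(16, 1)

    def forward(self, x):
        return self.head(self.backbone(x))

def aggregate_backbones(models):
    # Average only the backbone parameters; each model keeps its own head,
    # so general knowledge is shared while specialization is preserved.
    keys = models[0].backbone.state_dict().keys()
    averaged = {
        key: sum(m.backbone.state_dict()[key] for m in models) / len(models)
        for key in keys
    }
    for m in models:
        m.backbone.load_state_dict(averaged)
    return models

# One model per deployment; after local training (omitted here), only the
# general part of what was learned is shared between them.
deployment_models = [SplitModel() for _ in range(3)]
deployment_models = aggregate_backbones(deployment_models)
```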
