FlexOlmo: Revolutionizing Collaborative Language Model Training Without Data Sharing

**FlexOlmo**, developed by the Allen Institute for Artificial Intelligence, demonstrates how organizations can jointly train a language model on their local datasets without ever sharing the underlying data.

**FlexOlmo** is built on the Mixture-of-Experts (MoE) architecture, where each expert corresponds to an independently trained feed-forward network (FFN). A fixed public model (denoted M_pub) acts as a common reference point. Each data owner trains their expert M_i on their private dataset D_i, while keeping all attention layers and other non-expert parameters frozen.
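The setup above can be illustrated with a minimal numpy sketch. Everything here is a toy stand-in for the real model: the expert names, dimensions, random weights, and the simple dot-product router are illustrative assumptions, not FlexOlmo's actual implementation, and the frozen attention layers are omitted entirely.

```python
import numpy as np

rng = np.random.default_rng(0)
D_MODEL, D_FF = 8, 16  # toy dimensions, not the real model's

def ffn_expert(seed):
    """One independently trained feed-forward expert (random weights as a stand-in)."""
    r = np.random.default_rng(seed)
    w1 = r.normal(size=(D_MODEL, D_FF)) * 0.1
    w2 = r.normal(size=(D_FF, D_MODEL)) * 0.1
    return lambda x: np.maximum(x @ w1, 0.0) @ w2  # ReLU FFN

# Hypothetical experts: the frozen public one plus two trained by data owners.
experts = {
    "public": ffn_expert(0),  # M_pub: frozen, shared reference
    "news":   ffn_expert(1),  # M_1: trained locally on a private news corpus
    "code":   ffn_expert(2),  # M_2: trained locally on a private code corpus
}

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def moe_forward(x, router_w, active):
    """Route token x over the currently *active* experts only."""
    names = [n for n in experts if n in active]
    gates = softmax(np.array([x @ router_w[n] for n in names]))
    return sum(g * experts[n](x) for g, n in zip(gates, names))

router_w = {n: rng.normal(size=D_MODEL) for n in experts}
x = rng.normal(size=D_MODEL)

full = moe_forward(x, router_w, active={"public", "news", "code"})
no_news = moe_forward(x, router_w, active={"public", "code"})  # news opted out
```

Because only the FFN experts differ between owners while everything else stays frozen, combining them amounts to populating the `experts` table; the `active` set previews the opt-out mechanism described below.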

A key challenge with independently trained experts is coordination: experts that never see each other during training must still cooperate at inference. FlexOlmo addresses this by anchoring every expert to the frozen public model, which stays fixed while each new expert is trained against it on local data. Because all experts align with the same reference model, they can be merged without any further joint retraining.

FlexOlmo is particularly suited for scenarios where strict control over data access is necessary. Data sources can be activated or deactivated based on application needs. For instance, toxic content may be included for research purposes but excluded from public use.

Researchers illustrated this by removing the news expert during a test run. As expected, performance on news-related tasks declined, yet results in other areas remained stable.

Even if licenses change or usage rights expire, data sources can be deactivated without retraining the entire model. The final model comprises 37 billion parameters, of which 20 billion are active for any given input.
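One plausible way to realize this opt-out mechanically is to mask a deactivated expert's router logit to negative infinity, so it receives exactly zero routing weight. This is a sketch under that assumption, not FlexOlmo's published mechanism; the expert names and logit values are made up.

```python
import numpy as np

def gate_weights(logits, disabled=()):
    """Router softmax in which deactivated experts are masked to -inf.
    Removing an expert this way requires no retraining of the others."""
    idx = np.arange(len(logits))
    masked = np.where(np.isin(idx, list(disabled)),
                      -np.inf, np.asarray(logits, dtype=float))
    e = np.exp(masked - np.max(masked))  # exp(-inf) == 0, so the expert drops out
    return e / e.sum()

logits = [2.0, 1.0, 0.5]                        # e.g. public, news, code experts
w_all = gate_weights(logits)                    # all experts active
w_no_news = gate_weights(logits, disabled=[1])  # "news" expert opted out
```

The remaining weights renormalize automatically, which matches the observed behavior: news-related performance drops while other capabilities stay intact.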

The team evaluated FlexOlmo using publicly available data along with seven specialized datasets: news, creative writing, code, scientific articles, educational content, mathematics, and Reddit posts.

When tested across 31 tasks, FlexOlmo achieved an average performance increase of 41% compared to a model trained solely on public data. Overall, FlexOlmo outperformed a hypothetical model that had access to all data while incurring the same computational costs. Only a model trained on the complete dataset with double the resources showed marginally better results.

Since data owners only provide access to trained models, the risk of data leakage is minimized. During testing, attempts to reconstruct the training data succeeded in just 0.7% of cases. For organizations handling highly confidential information, FlexOlmo offers the ability to train while ensuring differential privacy, providing formal data protection. Each participant can opt into this feature independently. The Allen Institute also released OLMoTrace, a tool to track the relationship between language model outputs and their training sources.
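The article does not say which privacy mechanism FlexOlmo uses; the standard way to obtain differential privacy during training is DP-SGD (per-example gradient clipping plus calibrated Gaussian noise). The sketch below shows that generic recipe only, with illustrative parameter values; it is not taken from the FlexOlmo paper.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_mult=1.1, rng=None):
    """One generic DP-SGD update: clip each example's gradient to clip_norm,
    average the clipped gradients, then add Gaussian noise scaled to the
    clipping bound. Hyperparameters here are illustrative defaults."""
    rng = rng or np.random.default_rng(0)
    clipped = [g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
               for g in per_example_grads]
    mean = np.mean(clipped, axis=0)
    sigma = noise_mult * clip_norm / len(per_example_grads)
    return mean + rng.normal(0.0, sigma, size=mean.shape)

grads = [np.array([3.0, 4.0]), np.array([0.1, 0.2])]  # toy per-example gradients
update = dp_sgd_step(grads)
```

Because clipping bounds any single example's influence and the noise masks the remainder, each data owner can apply such a step locally when training their expert, which is consistent with the per-participant opt-in described above.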

