Federated Learning with Soft Embeddings: A New, Efficient Way to Train Retrieval Models

by
webAI Team
September 24, 2025
Key Takeaways
  • Adapters for soft embeddings: Lightweight layers in frozen SLMs adapt encoders to new corpora without full fine-tuning.
  • Classifier-as-Retriever: Replaces static MIPS with a trainable classifier, boosting accuracy from ~12% to ~99%.
  • Federated training: Distributed optimization across clients delivered up to 2.6× faster training versus centralized baselines.
  • Differential privacy: Gradient clipping and noise injection safeguard client data with minimal performance loss.

    The problem with retrieval today

    Retrieval-augmented generation (RAG) has become the standard way of grounding large language models (LLMs) in real-world knowledge. By pulling in relevant documents at query time, RAG reduces hallucinations and makes AI systems more useful across domains.

    But adapting retrieval systems to new domains remains a costly, inefficient process. Fully fine-tuning a large model is compute-heavy and often infeasible on resource-constrained edge devices. Standard similarity search, usually implemented with dot-product lookups in a vector database, is static and suboptimal, struggling to capture the nuances of specialized corpora.

    As AI shifts closer to customer data, running on devices at the edge and inside regulated environments, the challenge becomes clear: We need retrieval architectures that are lighter, faster, and privacy-preserving.

    Our research: Two key innovations

    That’s the problem our new research tackles. In our newest paper, published on arXiv, we introduce a novel retrieval architecture that combines adapters for soft embeddings with a classifier-as-retriever approach. Together, these innovations dramatically improve accuracy while reducing training costs, and they integrate naturally with federated and privacy-preserving training strategies.

    Overview of our architecture. A frozen small language model (SLM) is augmented with lightweight adapters for soft embeddings and a classifier head that learns retrieval directly.

    1. Adapters for soft embeddings

    Instead of fully fine-tuning a large encoder, we start with a frozen small language model (SLM). Between the tokenizer and the transformer blocks of the SLM, we insert a lightweight adapter: a simple, trainable transformation matrix.

    This adapter reshapes the token embeddings into what we call soft embeddings: domain-tuned representations of the corpus. Unlike full fine-tuning, this approach leaves the base model untouched while still adapting the embedding space to capture the terminology and structure of the target domain. The result is a far more memory-efficient way of steering a model toward specialized knowledge.
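
    As a rough illustration, here is a minimal PyTorch-style sketch of the idea. The module name, near-identity initialization, and wiring are our own assumptions for this example, not the paper's API:

    ```python
    import torch
    import torch.nn as nn

    class SoftEmbeddingAdapter(nn.Module):
        """Trainable linear map applied to frozen token embeddings.

        Sits between the tokenizer's embedding lookup and the frozen
        transformer blocks; only this matrix receives gradients.
        """

        def __init__(self, embed_dim: int):
            super().__init__()
            # Start near the identity so training begins from the base
            # model's embedding space and drifts toward the domain.
            self.proj = nn.Linear(embed_dim, embed_dim, bias=False)
            nn.init.eye_(self.proj.weight)

        def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
            return self.proj(token_embeddings)

    # Usage sketch: freeze the SLM, train only the adapter.
    # slm.requires_grad_(False)
    # hidden = slm.transformer(adapter(slm.embed_tokens(input_ids)))
    ```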

    2. Classifier-as-Retriever (CaR)

    Traditional RAG systems rely on maximum inner product search (MIPS) to measure similarity between query embeddings and document embeddings. While effective at scale, MIPS is a fixed heuristic: it cannot learn or adapt.
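
    For concreteness, a MIPS lookup over precomputed document embeddings is just a dot-product argmax; here is a minimal NumPy sketch (the function name and shapes are illustrative):

    ```python
    import numpy as np

    def mips_top1(query_emb: np.ndarray, doc_embs: np.ndarray) -> int:
        """Return the index of the highest inner-product document.

        doc_embs has shape (num_docs, dim). Nothing here is trainable,
        so retrieval quality is capped by the frozen embedding space.
        """
        scores = doc_embs @ query_emb  # shape: (num_docs,)
        return int(np.argmax(scores))
    ```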

    We propose an alternative: attaching a classifier head to the frozen SLM. Trained on query–document pairs, the classifier learns to map queries directly to their corresponding documents. This transforms retrieval from a static lookup into a trainable similarity function that improves with exposure to domain-specific data.
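
    A minimal sketch of this idea, assuming a pooled query embedding from the frozen SLM; the class name and training loop are illustrative, not the paper's code:

    ```python
    import torch
    import torch.nn as nn

    class ClassifierRetriever(nn.Module):
        """Maps a query embedding to logits over corpus documents."""

        def __init__(self, embed_dim: int, num_docs: int):
            super().__init__()
            # One logit per document; trained with cross-entropy on
            # (query, document-index) pairs.
            self.head = nn.Linear(embed_dim, num_docs)

        def forward(self, query_emb: torch.Tensor) -> torch.Tensor:
            return self.head(query_emb)

    # Training sketch, given a batch of query embeddings and the
    # indices of their ground-truth documents:
    # loss = nn.functional.cross_entropy(retriever(query_embs), doc_indices)
    ```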

    In experiments, this shift made a dramatic difference. On frozen off-the-shelf SLMs, MIPS achieved only ~12% top-1 accuracy on downstream retrieval tasks. By contrast, our classifier-as-retriever approach boosted accuracy to 96–99%, a step change in retrieval quality.

    Accuracy comparison across methods. Traditional MIPS on frozen SLMs achieved ~12%, while classifier retriever and soft embeddings each delivered major improvements. Combined, they reached ~99%.

    Why these matter together

    Both innovations — adapters for soft embeddings and classifier-as-retriever — can be used independently. The adapter improves how embeddings are shaped; the classifier makes retrieval adaptive rather than static. Combined, they provide a lightweight but powerful way to customize retrieval for any domain, without the overhead of full model fine-tuning.

    The trade-offs: Accuracy vs. speed

    Our approach offers two different training paths, each with its own balance of speed and accuracy.

    Option A: Classifier-only training

    • Extremely fast: often under a minute on small datasets.
    • Achieves high accuracy (mid-to-high 90s).

    Option B: Classifier + adapters

    • Slower to train: tens of minutes or more.
    • Achieves maximum accuracy, approaching ~99%.

    Our experiments confirm this trade-off; the full paper reports accuracy and training time across both centralized and federated setups.

    The choice depends on application needs. For some scenarios, “fast enough” accuracy combined with speed is ideal. For others, squeezing out the last percentage points of accuracy justifies the additional cost. 
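
    In optimizer terms, the two options differ only in which parameter groups receive gradients. A hypothetical setup, with placeholder dimensions and learning rates:

    ```python
    import torch
    import torch.nn as nn

    # Stand-ins for the adapter and classifier head described above.
    adapter = nn.Linear(768, 768, bias=False)
    classifier = nn.Linear(768, 1000)  # e.g., a 1000-document corpus

    # Option A: train the classifier head only (fast path).
    opt_a = torch.optim.AdamW(classifier.parameters(), lr=1e-3)

    # Option B: train classifier + adapter jointly (accuracy path).
    opt_b = torch.optim.AdamW(
        list(classifier.parameters()) + list(adapter.parameters()), lr=1e-3
    )
    ```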

    Federated + Private Training: Scaling it out

    Retrieval accuracy is only one part of the challenge. Training must also be fast and privacy-preserving, especially when data lives on the edge.

    • Federated learning (FL): Training happens across multiple devices. Instead of sending raw data to a central server, each device trains locally and only shares small parameter updates.
    • Differential privacy (DP): Before updates are shared, they are clipped and noise is added, ensuring no private data can be reconstructed (see the sketch after this list).
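
    A rough sketch of both mechanisms in PyTorch; the function names, clipping rule, and noise scale are our own illustrative choices, not the paper's exact protocol:

    ```python
    import torch

    def privatize_update(update: torch.Tensor,
                         clip_norm: float,
                         noise_std: float) -> torch.Tensor:
        """Clip a client's parameter update and add Gaussian noise
        before it leaves the device (DP-SGD-style)."""
        scale = torch.clamp(clip_norm / (update.norm() + 1e-12), max=1.0)
        return update * scale + torch.randn_like(update) * noise_std

    def federated_average(updates: list[torch.Tensor]) -> torch.Tensor:
        """Server-side FedAvg: average the privatized client updates."""
        return torch.stack(updates).mean(dim=0)

    # Per round: each client trains locally, computes
    # update = local_params - global_params, privatizes it, and sends
    # only that tensor; the server averages and applies the result.
    ```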

    Results:

    • Training with FL yielded significant speedups: between 1.7× and 2.6× faster with two or three devices, compared to centralized training.
    • Adding DP provided an extra layer of privacy for edge devices, with only a small drop in accuracy.

    Summarized speedup results, showing how federated training reduced overall time-to-train compared to centralized training.

    Key insight: These benefits are orthogonal. Even if you don’t adopt our retrieval architecture, federated training and differential privacy still make distributed learning faster and more secure.

    Why this matters for enterprises

    Enterprises increasingly need retrieval systems that adapt to their own knowledge bases while meeting strict performance and privacy constraints. Our approach delivers on several fronts:

    • Domain-specific precision: Create retrieval models that understand the unique terminology of your corpus, from manufacturing manuals to legal documents.
    • Faster deployment cycles: Training speedups reduce cost and time-to-value.
    • Privacy by design: Data never has to leave the customer environment, a key advantage for healthcare, aerospace, and financial applications.
    • Flexibility: Choose between classifier-only or classifier+adapter training based on accuracy vs. speed needs.
    • Adaptability: Train domain-tuned soft embeddings for each corpus, producing multiple expert retrievers instead of one-size-fits-all models.
    • Scalability: Add corpus-specific experts and route queries intelligently, ensuring consistent performance as your knowledge base grows.

    Redefining Retrieval Training: What Sets This Approach Apart

    Traditional approaches rely on large models, full fine-tuning, and static similarity search. Our approach flips the equation:

    • Small models, frozen — making them edge- and memory-friendly.
    • Lightweight adapters — adapting embeddings with minimal overhead.
    • Trainable retrieval — replacing static similarity with a classifier.
    • Federated + private training — scaling securely across distributed devices.

    The leap in accuracy — from ~12% with MIPS on a frozen SLM to ~96–99% with the classifier-as-retriever — shows how transformative this shift can be.

    What Comes Next

    This research demonstrates a path to making retrieval smarter, faster, and safer for enterprise AI.

    For the proofs, math, and experimental benchmarks, see the full paper on arXiv.

    But the real story is what lies ahead: scaling these techniques across larger datasets, extending them into new domains, and weaving them into the broader fabric of distributed intelligence.

    This is one milestone of many on our journey. And we’re just getting started.
