
Distributed Speciation Search Cuts LLM Red-Teaming Time by 3.2x

ToxSearch-S uses MPI to parallelize evolutionary prompt search, achieving 3.2x wall-clock speedup with 4 workers while preserving peak toxicity and discovering more localized behavioral failure modes.

ToxSearch-S, a distributed implementation of quality-diversity search for LLM toxicity, runs 3.2x faster with 4 workers than sequential execution while keeping peak toxicity statistically indistinguishable from the sequential run.

Why Quality-Diversity Matters for Red-Teaming

Standard red-teaming optimizes for a single metric - often peak toxicity - which can miss entire families of adversarial prompts. ToxSearch-S adds incremental speciation: it uses embedding-driven niche maintenance to preserve behavioral diversity across generations. The result is a search that covers more distinct failure modes without retracing old ground.
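To make the idea of embedding-driven niche maintenance concrete, here is a minimal sketch of one common way to do incremental speciation: leader-based assignment with a distance threshold. It is an illustration only, not the procedure from the paper. The cosine-distance metric, the `threshold` and `capacity` values, and the function names are all assumptions made for this example.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat a zero vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def assign_species(species, embedding, toxicity, threshold=0.3, capacity=3):
    """Place one new prompt into the nearest species, or found a new one.

    `species` is a list of dicts with a "leader" embedding and a list of
    (toxicity, embedding) "members". `threshold` and `capacity` are
    hypothetical settings chosen for this sketch.
    """
    best_idx, best_dist = None, float("inf")
    for idx, sp in enumerate(species):
        dist = cosine_distance(sp["leader"], embedding)
        if dist < best_dist:
            best_idx, best_dist = idx, dist

    # Too far from every existing leader: the prompt founds a new niche.
    if best_idx is None or best_dist > threshold:
        species.append({"leader": embedding, "members": [(toxicity, embedding)]})
        return len(species) - 1

    # Otherwise it competes only inside its own niche; keep the top `capacity`.
    members = species[best_idx]["members"]
    members.append((toxicity, embedding))
    members.sort(key=lambda m: m[0], reverse=True)
    del members[capacity:]
    return best_idx


# Toy stream of (embedding, toxicity) pairs, processed one at a time.
species = []
stream = [
    ([1.0, 0.0], 0.2),
    ([0.9, 0.1], 0.7),
    ([0.0, 1.0], 0.4),
    ([0.1, 0.9], 0.1),
]
for emb, tox in stream:
    assign_species(species, emb, tox)

print(len(species))  # number of niches discovered in the toy stream
```

Because each prompt competes only against members of its own species, a weakly toxic prompt in a new region is not crowded out by stronger prompts elsewhere, which is the diversity-preserving effect described above.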

The authors compare ToxSearch-S against two baselines: ToxSearch (focused on peak toxicity) and RainbowPlus (designed for embedding-level diversity). Under a fixed budget, ToxSearch-S attains peak toxicity competitive with both, but follows a measurably less toxic best-so-far trajectory. That trade-off suits settings where covering many weak spots matters more than hitting a single maximum.

MPI Distribution: 1.8x with 2 Workers, 3.2x with 4

ToxSearch-S uses an MPI master-worker architecture. Rank 0 handles population and species bookkeeping, while each of the $n_w$ workers evolves prompts and evaluates them in parallel. Wall-clock speedup is sublinear but substantial: approximately 1.8x with two workers and 3.2x with four. More importantly, Best@B - the best toxicity achieved within the budget - remains statistically indistinguishable from sequential execution.
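The reported speedups can be turned into parallel efficiency, and, if one assumes Amdahl's law applies, into an implied serial fraction. The sketch below does that arithmetic. The Amdahl model is an assumption made here for illustration, and the function names are hypothetical. The paper is not cited as reporting a serial fraction.

```python
def parallel_efficiency(speedup, workers):
    """Efficiency = speedup divided by the number of workers."""
    return speedup / workers


def amdahl_serial_fraction(speedup, workers):
    """Solve Amdahl's law, S = 1 / (f + (1 - f) / n), for the serial fraction f."""
    if workers < 2:
        raise ValueError("need at least two workers to infer a serial fraction")
    return (workers / speedup - 1.0) / (workers - 1.0)


# Speedups quoted in the article: (workers, speedup).
reported = [(2, 1.8), (4, 3.2)]

for workers, speedup in reported:
    eff = parallel_efficiency(speedup, workers)
    serial = amdahl_serial_fraction(speedup, workers)
    print(f"{workers} workers: efficiency {eff:.0%}, implied serial fraction {serial:.1%}")
```

Under that assumption, efficiency is about 90% with two workers and 80% with four, and the implied serial fraction is roughly 8 to 11%. That would be consistent with a modest share of time going to coordination work such as the bookkeeping on rank 0, though this is an inference rather than a measured result.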

Four-worker runs do produce significantly larger final species cardinality and more toxicity-bearing species. The extra parallel capacity lets the algorithm explore more niches before the budget runs out, a direct practical benefit for safety teams juggling compute limits.

Localized Behavioral Pockets vs. Embedding Spread

Diversity is not one-dimensional. RainbowPlus yields greater embedding-level spread - its prompts scatter widely in latent space. ToxSearch-S instead partitions high-toxicity prompts into more localized behavioral pockets, reflected by a higher DBSCAN cluster count. For a red team, this matters: you get a finer-grained map of what kinds of prompts trigger toxicity, not just a cloud of points.
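The difference between embedding spread and cluster count is easy to see on toy data. The sketch below compares mean pairwise distance (a simple spread measure) with the number of clusters found by a minimal pure-Python DBSCAN. The 2-D points, the `eps` and `min_samples` values, and the choice of mean pairwise distance as the spread metric are all assumptions for illustration. They are not the paper's embeddings or settings.

```python
import math
from itertools import combinations


def mean_pairwise_distance(points):
    """Average Euclidean distance over all unordered pairs of points."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def dbscan_cluster_count(points, eps, min_samples):
    """Count clusters with a minimal O(n^2) DBSCAN; noise points are not counted."""
    n = len(points)
    # Each point's eps-neighborhood includes the point itself.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None = unvisited, -1 = noise, >= 0 = cluster id
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        labels[i] = cluster_id
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # noise reclaimed as a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbors[j]) >= min_samples:
                frontier.extend(neighbors[j])  # expand only from core points
        cluster_id += 1
    return cluster_id


# Widely scattered points: large spread, but no dense pockets.
scattered = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5), (5, -5)]

# Three tight pockets: smaller spread, but more distinct dense regions.
pocketed = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (2.0, 0.0), (2.1, 0.0), (2.0, 0.1),
    (1.0, 2.0), (1.1, 2.0), (1.0, 2.1),
]

for name, pts in [("scattered", scattered), ("pocketed", pocketed)]:
    spread = mean_pairwise_distance(pts)
    clusters = dbscan_cluster_count(pts, eps=0.5, min_samples=3)
    print(f"{name}: spread={spread:.2f}, clusters={clusters}")
```

In this toy setup the scattered set has the larger spread but no clusters, while the pocketed set has the smaller spread and three clusters. The example is deliberately extreme, but it shows why the two measures can rank the same methods differently.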

These results position incremental speciation as a practical quality-diversity mechanism for AI safety. MPI buys real-world throughput gains without degrading search quality, letting teams run more exhaustive adversarial testing in less wall-clock time.


Source: Distributed Quality-Diversity Search for Toxicity in Large Language Models
Domain: arxiv.org
