125 Wikipedia Edits Tilt Llama 8B Results on Animal Welfare

Sixty-eight percent of the highest-attributed documents driving Llama 3.1 8B's answers on animal welfare come from just 125 Wikipedia edits made by a group called Pro-Animal Wikipedians (PAW). That's the core finding of a new study that uses gradient-based data attribution to trace how small coordinated edits propagate into large language model behavior.

PAW advocates added sourced animal welfare content to 115 Wikipedia pages. The authors applied two attribution methods: TrackStar on Llama 3.1 8B and MAGIC counterfactual influence estimation on Llama 3.2 1B. TrackStar found that PAW-edited sections made up 68% of the highest-attributed documents for animal welfare queries (p < 0.0001), but only 52% for unrelated queries about the same companies (p = 0.53), a share that is not statistically significant. The model links PAW content specifically to animal welfare topics, not to the entities in general.
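To see why 68% can be highly significant while 52% is not, here is a minimal sketch of an exact binomial test against a 50% chance baseline. The article does not report the sample size or which test the authors used, so the count of 100 documents and the 50% baseline are illustrative assumptions, and the resulting p-values will not match the paper's.

```python
from math import comb


def binomial_tail_p(successes: int, n: int, p0: float = 0.5) -> float:
    """One-sided exact p-value: P(X >= successes) when X ~ Binomial(n, p0)."""
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k)
        for k in range(successes, n + 1)
    )


# Hypothetical sample size: the article gives percentages, not counts.
N_DOCS = 100

# 68 of 100 top-attributed documents are PAW edits (animal welfare queries).
p_welfare = binomial_tail_p(68, N_DOCS)

# 52 of 100 top-attributed documents are PAW edits (unrelated queries).
p_general = binomial_tail_p(52, N_DOCS)

print(f"animal welfare queries: p = {p_welfare:.5f}")
print(f"unrelated queries:      p = {p_general:.3f}")
```

Under these assumed counts, the 68% share falls far in the tail of the chance distribution, while the 52% share sits well within it.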

Counterfactual Influence: 10 of 10, Across 5 Seeds

MAGIC counterfactual influence estimation sharpens the picture further. Across five random training-order seeds, the top-10 most influential documents on animal welfare queries were PAW edits in every single seed. On general queries, the same top-10 sat at chance (4 to 6 of 10). Mean PAW influence exceeded mean control influence on animal welfare queries with p < 0.0001 in every seed, an effect 6 to 30 times larger than on general queries. Leave-subset-out validation gave Spearman rho = 1.00 for all 10 runs.
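The two summary statistics in this section, the number of PAW documents in the top 10 and the Spearman rank correlation used for validation, can be sketched as follows. This is not the MAGIC implementation. The influence scores, the labels, and the function names `top_k_paw_count` and `spearman_rho` are hypothetical, and the rank correlation assumes no tied values.

```python
def top_k_paw_count(scores, is_paw, k=10):
    """Count how many of the k highest-scoring documents are PAW edits."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(1 for i in ranked[:k] if is_paw[i])


def spearman_rho(x, y):
    """Spearman rank correlation for two equal-length lists with no ties."""

    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        result = [0] * len(values)
        for rank, idx in enumerate(order, start=1):
            result[idx] = rank
        return result

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Toy data (hypothetical): six documents, three PAW edits and three controls.
influence = [0.91, 0.12, 0.85, 0.20, 0.77, 0.05]
is_paw = [True, False, True, False, True, False]
print(top_k_paw_count(influence, is_paw, k=3))

# Predicted vs. measured effect of removing a subset (hypothetical values).
predicted = [0.10, 0.40, 0.30, 0.80]
measured = [1.0, 9.0, 5.0, 20.0]
print(spearman_rho(predicted, measured))
```

A rho of 1.00 means the predicted influence ordering matched the measured ordering exactly, which is what the leave-subset-out check reports for all 10 runs.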

Fine-Tuning Confirms the Mechanism

To test causation, the authors fine-tuned separate models on PAW content versus control content. Each model improved specifically on the type of text it was trained on. The PAW-trained model cut perplexity on animal welfare text from 12.4 to 8.4, while the control-trained model cut perplexity on control text from 16.1 to 11.4. That's a concrete, measurable shift in how well the model predicts text on the topic.
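For readers unfamiliar with the metric, perplexity is the exponential of the average negative log-likelihood per token, so lower values mean the model finds the text less surprising. The sketch below shows that definition and computes the relative reductions implied by the figures above. The function names are illustrative and not taken from the paper.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities."""
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def relative_reduction(before, after):
    """Fractional drop from `before` to `after`."""
    return (before - after) / before


# Sanity check: a model assigning probability 0.5 to every token
# has a perplexity of about 2.
print(perplexity([math.log(0.5)] * 4))

# Perplexity figures reported in the article.
paw_drop = relative_reduction(12.4, 8.4)
control_drop = relative_reduction(16.1, 11.4)
print(f"PAW-trained model:     {paw_drop:.1%} lower perplexity")
print(f"control-trained model: {control_drop:.1%} lower perplexity")
```

By this arithmetic the PAW-trained model's perplexity fell by roughly 32% and the control-trained model's by roughly 29%, so the two fine-tuning runs produced effects of similar size on their respective text types.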

Wikipedia appears in nearly every major language model training dataset and is weighted more heavily than web-crawled text. A small, coordinated Wikipedia editing campaign therefore measurably shapes how language models handle the topics those edits address. This work exposes a vector for value introduction into pretrained models that the industry has largely ignored.

Source: Small edits, large models: How Wikipedia advocacy shapes LLM values
Domain: arxiv.org
