Semantic Regexes: Auto-Interpreting LLM Features with a Structured Language

Angie Boggust¹*, Donghao Ren², Yannick Assogba², Dominik Moritz², Arvind Satyanarayan¹, Fred Hohman²
¹ MIT CSAIL
² Apple
* Work done at Apple
arXiv October 2025

Semantic regexes are an automated interpretability method that describes LLM features using a structured language. They provide accurate, concise, and consistent feature descriptions that help humans build mental models of feature activations.

Max Acts / Gemma-2-2b / Gemmascope-res-16k

4,300 results
cf1daa_eleuther_acts_top20_gemma-2-2b_gemmascope-res-16k

Clarity: 58.97%
Responsiveness: 82.45%
Purity: 77.52%
Detection: 85.69%
Fuzzing: 85.83%
Faithfulness: 49.51%

Max Acts / Gemma-2-2b / Gemmascope-res-65k

4,279 results
cf1daa_eleuther_acts_top20_gemma-2-2b_gemmascope-res-65k

Clarity: 46.26%
Responsiveness: 79.71%
Purity: 75.21%
Detection: 84.72%
Fuzzing: 85.19%
Faithfulness: 40.73%

Max Acts / GPT2-small / Res-jb

1,838 results
cf1daa_eleuther_acts_top20_gpt2-small_res-jb

Clarity: 61.23%
Responsiveness: 84.77%
Purity: 77.16%
Detection: 87.65%
Fuzzing: 89.97%
Faithfulness: 32.02%

Max Acts / GPT2-small / Res-jb / GPT-4o

2,361 results
cf1daa_eleuther_acts_top20_gpt2-small_res-jb_gpt-4o

Clarity: 62.84%
Responsiveness: 87.62%
Purity: 82.65%
Detection: 88.76%
Fuzzing: 85.96%
Faithfulness: 6.20%

Token Act Pair / Gemma-2-2b / Gemmascope-res-16k

4,296 results
cf1daa_oai_token-act-pair_gemma-2-2b_gemmascope-res-16k

Clarity: 34.23%
Responsiveness: 75.62%
Purity: 69.27%
Detection: 78.38%
Fuzzing: 78.63%
Faithfulness: 37.65%

Token Act Pair / Gemma-2-2b / Gemmascope-res-65k

4,272 results
cf1daa_oai_token-act-pair_gemma-2-2b_gemmascope-res-65k

Clarity: 25.80%
Responsiveness: 74.74%
Purity: 68.87%
Detection: 76.82%
Fuzzing: 77.15%
Faithfulness: 30.11%

Token Act Pair / GPT2-small / Res-jb

1,837 results
cf1daa_oai_token-act-pair_gpt2-small_res-jb

Clarity: 34.70%
Responsiveness: 76.41%
Purity: 65.40%
Detection: 79.42%
Fuzzing: 78.75%
Faithfulness: 27.60%

Token Act Pair / GPT2-small / Res-jb / GPT-4o

2,344 results
cf1daa_oai_token-act-pair_gpt2-small_res-jb_gpt-4o

Clarity: 48.14%
Responsiveness: 84.56%
Purity: 78.49%
Detection: 85.36%
Fuzzing: 82.06%
Faithfulness: 13.15%

Semantic Regex / Gemma-2-2b / Gemmascope-res-16k

25,990 results
cf1daa_semantic_regex_gemma-2-2b_gemmascope-res-16k

Clarity: 57.38%
Responsiveness: 75.59%
Purity: 57.92%
Detection: 78.72%
Fuzzing: 76.85%
Faithfulness: 38.03%

Semantic Regex / Gemma-2-2b / Gemmascope-res-65k

25,957 results
cf1daa_semantic_regex_gemma-2-2b_gemmascope-res-65k

Clarity: 47.78%
Responsiveness: 76.47%
Purity: 59.14%
Detection: 79.43%
Fuzzing: 79.54%
Faithfulness: 29.39%

Semantic Regex / GPT2-small / Res-jb

1,838 results
cf1daa_semantic_regex_gpt2-small_res-jb

Clarity: 56.48%
Responsiveness: 76.34%
Purity: 60.41%
Detection: 81.27%
Fuzzing: 81.36%
Faithfulness: 21.31%

Semantic Regex / GPT2-small / Res-jb / GPT-4o

2,339 results
cf1daa_semantic_regex_gpt2-small_res-jb_gpt-4o

Clarity: 53.48%
Responsiveness: 79.54%
Purity: 68.22%
Detection: 84.58%
Fuzzing: 79.91%
Faithfulness: 10.93%
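
The results above are easier to compare side by side than to read one after another. The following is a minimal Python sketch that uses only the Gemma-2-2b / Gemmascope-res-16k scores listed above. The names `METRICS`, `SCORES`, `best_per_metric`, and `gap` are hypothetical helpers introduced for illustration, not part of any released tooling. It also assumes the reported percentages are directly comparable across description methods, which may not hold exactly because the result counts differ (4,300, 4,296, and 25,990).

```python
# Scores (in percent) copied from the Gemma-2-2b / Gemmascope-res-16k results above.
# Order of values in each list follows METRICS.
METRICS = ["Clarity", "Responsiveness", "Purity", "Detection", "Fuzzing", "Faithfulness"]

SCORES = {
    "Max Acts": [58.97, 82.45, 77.52, 85.69, 85.83, 49.51],
    "Token Act Pair": [34.23, 75.62, 69.27, 78.38, 78.63, 37.65],
    "Semantic Regex": [57.38, 75.59, 57.92, 78.72, 76.85, 38.03],
}


def best_per_metric(scores):
    """Return {metric: (method, score)} for the highest score on each metric."""
    best = {}
    for i, metric in enumerate(METRICS):
        method = max(scores, key=lambda m: scores[m][i])
        best[metric] = (method, scores[method][i])
    return best


def gap(scores, a, b):
    """Return {metric: score[a] - score[b]} in percentage points, rounded to 2 places."""
    return {m: round(scores[a][i] - scores[b][i], 2) for i, m in enumerate(METRICS)}


if __name__ == "__main__":
    # Highest-scoring description method for each metric.
    for metric, (method, score) in best_per_metric(SCORES).items():
        print(f"{metric:<15} {method:<15} {score:.2f}%")

    # Per-metric difference between two description methods.
    for metric, diff in gap(SCORES, "Semantic Regex", "Token Act Pair").items():
        print(f"{metric:<15} {diff:+.2f} pp")
```

The other model and SAE configurations can be compared the same way by swapping in their scores. The sketch only restates the numbers already listed and does not test whether any difference is statistically meaningful.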