Applied Interpretability: Foundation-Sec-Instruct Goes Under the Microscope
Exploring mechanistic interpretability methods for understanding the internal behavior of security-focused language models.

A practical interpretability-oriented look at security-focused LLM behavior.