Exploring How To Fail Interpretability Research
- Been Kim (Google Brain) https://simons.berkeley.edu/talks/tbd-72 Frontiers of Deep Learning.
- This is a talk I gave to my MATS 9.0 training scholars about the big picture of mech interp - as of Oct 2025, what had changed?
- A talk I gave to my MATS 9.0 training program about reasoning model
In-Depth Information on How To Fail Interpretability Research
- Been Kim (Google Brain) https://simons.berkeley.edu/talks/tba-90 Emerging Challenges in Deep Learning.
- Stanford AI Lab Faculty Lunch, November 7, 2025. Updated version of https://web.stanford.edu/~cgpotts/blog/interp/ ...
- A surprising fact about modern large language models is that nobody really knows how they work internally. At Anthropic, the ...
- With a growing interest in ... simple activation steering proved more effective than complex methods, Nanda argues for grounding ...