A new research by Google DeepMind shows how sparse autoencoders (SAEs) with special JumpReLU activation can help interpret LLMs.
https://venturebeat.com/ai/deepmind-makes-big-jump-toward-interpreting-llms-with-sparse-autoencoders/?utm_source=dlvr.it&utm_medium=blogger
https://venturebeat.com/ai/deepmind-makes-big-jump-toward-interpreting-llms-with-sparse-autoencoders/?utm_source=dlvr.it&utm_medium=blogger