Yeah, that paper from Anthropic (and the scaling monosemanticity one) prompted me to write this. I'm researching that topic to publish a deeper dive sometime down the line.
Looking forward to it. It seems like some level of progress, although it does not seem to give full certainty about what the AI will produce for a given prompt... assuming that is the point at which we can call AI explainability solved?