Discussion about this post

ToxSec

Super interesting read; it feels like I'll have to go over it a few times. The open questions are great, though. It looks like there's still a lot of potential here.

Michael Jovanovich

Representation without function was the part that really caught my attention.

I agree it feels extremely unlikely to be there for no reason. But you also found it didn't contribute to the end response despite firing.

My absolute, off-the-top-of-my-head wild speculation would be to look into whether it serves some sort of anti-detection role, like the model confirming it's not "XYZ that could be confused."

You'd think the absence of that would hurt the response, but it might only matter on very ambiguous tokens.

It might be worth checking whether it influences the probability mass of the non-predicted top tokens, the runners-up. It might be pushing the wrong tokens down in a way that isn't visible in the final selection, because the winning token is sufficiently in the lead either way.

Maybe it's simpler to say: it could be irrelevant to the argmax on your testing set, but meaningful in the softmax distribution.
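To illustrate the check I mean: a minimal sketch, assuming you can get next-token logits with and without the feature ablated (`logits_base` and `logits_ablated` are placeholders for whatever your pipeline produces). Even when the argmax is unchanged, the KL divergence between the two softmax distributions would show whether the feature moves mass among the runners-up.

```python
import numpy as np

def softmax(logits):
    # numerically stable softmax over the vocabulary dimension
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def ablation_effect(logits_base, logits_ablated):
    """Compare next-token distributions with and without the feature active.

    Returns whether the argmax changed, plus the KL divergence between the
    two softmax distributions, which is nonzero whenever the feature shifts
    probability mass even if the winning token stays the same.
    """
    p, q = softmax(logits_base), softmax(logits_ablated)
    same_argmax = bool(np.argmax(p) == np.argmax(q))
    kl = float(np.sum(p * np.log(p / q)))
    return same_argmax, kl

# Toy logits (made up): ablation reshuffles the runners-up, not the winner.
base = np.array([4.0, 2.0, 1.5, 0.5])
ablated = np.array([4.0, 1.0, 2.0, 0.5])
same, kl = ablation_effect(base, ablated)
# same is True (identical argmax) while kl > 0 (the distribution moved)
```

Run over the test set, this would separate "no effect at all" from "effect invisible to greedy decoding."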
