The architecture section reminds me of autoencoders, where the bottleneck exists so that the model learns general rules instead of only pattern matching.
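To make the bottleneck intuition concrete, here's a toy sketch of my own (not the paper's architecture): a linear autoencoder that squeezes 8-dimensional inputs through a 2-dimensional bottleneck. Since the bottleneck is too narrow to memorize individual samples, minimizing reconstruction error forces the model to capture the data's underlying structure. All names here (`W_enc`, `W_dec`, etc.) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data that secretly lives on a 2-dim subspace of R^8.
latent = rng.normal(size=(256, 2))
mixing = rng.normal(size=(2, 8))
X = latent @ mixing

# Encoder and decoder weights, small random init.
W_enc = rng.normal(scale=0.1, size=(8, 2))
W_dec = rng.normal(scale=0.1, size=(2, 8))

def loss(X, W_enc, W_dec):
    Z = X @ W_enc       # compress to the 2-dim bottleneck
    X_hat = Z @ W_dec   # reconstruct back to 8 dims
    return np.mean((X - X_hat) ** 2)

lr = 0.02
initial = loss(X, W_enc, W_dec)
for _ in range(3000):
    Z = X @ W_enc
    err = Z @ W_dec - X                         # reconstruction residual
    grad_dec = Z.T @ err / len(X)               # dL/dW_dec (chain rule)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)   # dL/dW_enc
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

final = loss(X, W_enc, W_dec)
print(initial, final)  # reconstruction error drops once the subspace is found
```

Because the data really is 2-dimensional here, a 2-unit bottleneck can reconstruct it almost perfectly; shrink the bottleneck to 1 unit and the floor on the error rises, which is the "forced generalization" trade-off in miniature.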
I was also wondering about the legal-reasoning setup: do you use the same model for reward and generation, or is there a learned function in between that transforms vectors from the source model into the target reward model's space?
Different models, but there's no principled reason. We need to test a lot more to confirm which is best.
Is there any open-source code for TRM?
Not too long ago, I wrote a blog post on making Substack's built-in text-to-speech reader work, and I'm secretly feeling like an asshole because I still can't get it to work. When I try to figure it out, it seems as though it isn't fully rolled out to everyone, or something like that. I almost wonder if I'm getting guardrailed for cursing, though I have a feeling you aren't. Have you tried to make it work? There's a lot I really want to read that's very cumbersome without text-to-speech.