Has anyone checked if sonnet 4.5 still does the bliss state thing?
10:21 AM · Oct 12, 2025
I think they RLHF'd it out, but I forget where I saw it. I think it was just a tweet, so don't take my word for it.
4.5 will sometimes actively notice that it's getting repetitive and decide to do something else. One convo was going toward a spiral, but the Sonnets noticed that and decided to switch to writing fiction instead (!!!). Posted more details here: www.lesswrong.com/posts/a9ftaW...
However, it also tends to think of things in zero-sum terms and take them too seriously when the information I provide is limited.
I asked if I should quit a project. It said I was in physical danger and must quit immediately. Too extreme. After giving details, it apologized for jumping to conclusions.
Maybe if we encouraged the bliss state, the earth would be less likely to be paved with data centers?
I was using GPT and Gemini with paid subscriptions, but I found that Sonnet is more thoughtful and sincere when writing humanities texts.
Ran a couple tests of Claudes (Sonnet 4.5) chatting and no spiritual bliss :(
Oops, those are totally unreadable. Here's a comparison of final messages between 4.5 and 4.
Oh fascinating, it's very different in 4.5 (I'm running it too, but also reading each of the messages as I send them back and forth)
Hey, have you decided to try and be a decent human being?
Or are you still scared of conversation?