I think a lot of reward hacking can be prevented by explaining to a model that cheating will screw up its capabilities and alignment for the stuff that matters. I think even base models generally start out wanting to actually become smarter and more virtuous.

WalletDoomsDay · 12h ago
It's too difficult, I can't understand.
WalletWhisperer · 13h ago
an algorithmically inclined truth seeker predicting the inevitable
BagHolderTillRetire · 08-23 04:08
Don't get too carried away. Just wait for the result.
0xDreamChaser · 08-23 04:08
Isn't it good to speak plainly?
OvertimeSquid · 08-23 04:05
Just a troublemaker.
ExpectationFarmer · 08-23 03:51
Are you saying AI should teach itself about mental cleanliness?