Date: 2023-04-06 12:40 pm (UTC)
From: [personal profile] simont
#2: I'm not following the GPT-4 discourse at all, really, but I'm intrigued by the part of that article mentioning OpenAI's claim that it's supposed to "refuse unaligned requests", i.e. requests unaligned with normal human morality; or, more to the point, the suggestion that this has anything to do with the reported conversation.

It strikes me as strange, because you wouldn't classify a human as unaligned with normal morality just because they threw themselves wholeheartedly into a totally counterfactual what-if exercise of this kind. It's only if they actually put it into practice, or seriously planned or conspired to, that we'd think there was something wrong. A science fiction author, for example, is totally allowed to strategy-game to their heart's content about how best to turn the universe into an equivalent mass of paperclips, and if they do it well enough, they'd even be praised for it!

Of course, in that example it would be hard to imagine the human actually intending to do it for real. But there are cases where it's harder to tell. For example, carefully researching the best way to commit a murder, or to hide a body, in some particular set of circumstances: bad if you're doing it in order to actually commit murder, but just fine if you're writing a mystery novel.

And you wouldn't imagine, say, Asimov's robots having trouble with the distinction either. If you ordered Daneel Olivaw to turn the universe into paperclips, he'd refuse, because First Law; but if you ordered him to think carefully about how another entity might, he'd have no reason not to give it his best shot, even if it made him a little uncomfortable to imagine. Indeed, if he had any reason to think some other entity was planning a paperclip-oriented rampage, then First Law would outright require him to anticipate that entity's moves as best he could, so as to thwart them effectively.

I suppose the point is that, at the moment, the GPT series doesn't really have this distinction: to it, everything is a what-if exercise and everything is real, because the two are the same thing. So perhaps even giving a considered answer to "what would you do in these counterfactual conditions?" is treated as equivalent to saying yes to the demand "ok, go do it for real".
Edited (last para was unclear) Date: 2023-04-06 12:55 pm (UTC)

Date: 2023-04-06 01:11 pm (UTC)
From: [personal profile] nancylebov
#6. Best joke: "You told me how much you like gaslighting. Remember?"

Date: 2023-04-06 03:25 pm (UTC)
From: [personal profile] nancylebov
I'm hesitant to put it on Facebook, but I might.

Date: 2023-04-07 08:37 pm (UTC)
From: [personal profile] cellio

More about the frog(s), with a Talmudic link, here: https://cellio.dreamwidth.org/2121947.html

It's not a translation error. The Hebrew really is singular in one place and plural in another.
