Computer Morality

Moral Self-Correction in Large Language Models?

Published March 21, 2023 by Nan Mykel

This is over Nan’s head but thought you might like to know.

(Ha Ha, my computer wouldn’t accept “computer morality” in my tags, insisting it should be computer mortality…)

arXiv Forum: How do we make accessible research papers a reality?

Can we truly call it “Open Science” when most research papers are not fully accessible? You are invited to join the forum on Monday April 17 to chart a path towards truly accessible research papers.

[Submitted on 15 Feb 2023 (v1), last revised 18 Feb 2023 (this version, v2)]

The Capacity for Moral Self-Correction in Large Language Models

We test the hypothesis that language models trained with reinforcement learning from human feedback (RLHF) have the capability to “morally self-correct” (to avoid producing harmful outputs) if instructed to do so. We find strong evidence in support of this hypothesis across three different experiments, each of which reveals different facets of moral self-correction. We find that the capability for moral self-correction emerges at 22B model parameters, and typically improves with increasing model size and RLHF training. We believe that at this level of scale, language models obtain two capabilities that they can use for moral self-correction: (1) they can follow instructions and (2) they can learn complex normative concepts of harm like stereotyping, bias, and discrimination. As such, they can follow instructions to avoid certain kinds of morally harmful outputs. We believe our results are cause for cautious optimism regarding the ability to train language models to abide by ethical principles.

Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2302.07459 [cs.CL]
(or arXiv:2302.07459v2 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2302.07459

From <https://arxiv.org/abs/2302.07459>

49 scientists apparently plan for greater accessibility of research papers. April 17 is the forum to discuss how this might happen. “You are invited to join the forum,” presumably via the internet.

 
