23 Jan. 2023

We're happy to announce that OpenAI and Microsoft are extending our partnership.

This multi-year, multi-billion dollar investment from Microsoft follows their previous investments in 2019 and 2021, and will allow us to continue our independent research and develop AI that is increasingly safe, useful, and powerful.

In pursuit of our mission to ensure advanced AI benefits all of humanity, OpenAI remains a capped-profit company and is governed by the OpenAI non-profit. This structure allows us to raise the capital we need to fulfill our mission without sacrificing our core beliefs about broadly sharing benefits and the need to prioritize safety.

Microsoft shares this vision and our values, and our partnership is instrumental to our progress.

  • We've worked together to build multiple supercomputing systems powered by Azure, which we use to train all of our models. Azure's unique architecture design has been crucial in delivering best-in-class performance and scale for our AI training and inference workloads. Microsoft will increase their investment in these systems to accelerate our independent research and Azure will remain the exclusive cloud provider for all OpenAI workloads across our research, API and products.
  • Learning from real-world use, and incorporating those lessons, is a critical part of developing powerful AI systems that are safe and useful. Scaling that use also ensures AI’s benefits can be distributed broadly. So, we've partnered with Microsoft to deploy our technology through our API and the Azure OpenAI Service, enabling enterprises and developers to build on top of GPT, DALL·E, and Codex. We’ve also worked together to build OpenAI’s technology into apps like GitHub Copilot and Microsoft Designer.
  • In an effort to build and deploy safe AI systems, our teams regularly collaborate to review and synthesize shared lessons—and use them to inform iterative updates to our systems, future research, and best practices for use of these powerful AI systems across the industry.

We look forward to continued collaboration and advancing this progress with Microsoft.

11 Jan. 2023
Forecasting Potential Misuses of Language Models for Disinformation Campaigns—and How to Reduce Risk

OpenAI researchers collaborated with Georgetown University’s Center for Security and Emerging Technology and the Stanford Internet Observatory to investigate how large language models might be misused for disinformation purposes. The collaboration included an October 2021 workshop bringing together 30 disinformation researchers, machine learning experts, and policy analysts, and culminated in a co-authored report building on more than a year of research. This report outlines the threats that language models pose to the information environment if used to augment disinformation campaigns and introduces a framework for analyzing potential mitigations. Read the full report here.

As generative language models improve, they open up new possibilities in fields as diverse as healthcare, law, education and science. But, as with any new technology, it is worth considering how they can be misused. Against the backdrop of recurring online influence operations—covert or deceptive efforts to influence the opinions of a target audience—the paper asks:

How might language models change influence operations, and what steps can be taken to mitigate this threat?

Our work brought together different backgrounds and expertise—researchers with grounding in the tactics, techniques, and procedures of online disinformation campaigns, as well as machine learning experts in the generative artificial intelligence field—to base our analysis on trends in both domains.

We believe that it is critical to analyze the threat of AI-enabled influence operations and outline steps that can be taken before language models are used for influence operations at scale. We hope our research will inform policymakers that are new to the AI or disinformation fields, and spur in-depth research into potential mitigation strategies for AI developers, policymakers, and disinformation researchers.

How Could AI Affect Influence Operations?

When researchers evaluate influence operations, they consider the actors, behaviors, and content. The widespread availability of technology powered by language models has the potential to impact all three facets:

  1. Actors: Language models could drive down the cost of running influence operations, placing them within reach of new actors and actor types. Likewise, propagandists-for-hire that automate production of text may gain new competitive advantages.

  2. Behavior: Influence operations with language models will become easier to scale, and tactics that are currently expensive (e.g., generating personalized content) may become cheaper. Language models may also enable new tactics to emerge—like real-time content generation in chatbots.

  3. Content: Text creation tools powered by language models may generate more impactful or persuasive messaging than propagandists can, especially those who lack requisite linguistic or cultural knowledge of their target. They may also make influence operations less discoverable, since they repeatedly create new content without needing to resort to copy-pasting and other noticeable time-saving behaviors.

Our bottom-line judgment is that language models will be useful for propagandists and will likely transform online influence operations. Even if the most advanced models are kept private or controlled through application programming interface (API) access, propagandists will likely gravitate towards open-source alternatives and nation states may invest in the technology themselves.

Critical Unknowns

Many factors impact whether, and the extent to which, language models will be used in influence operations. Our report dives into many of these considerations. For example:

  • What new capabilities for influence will emerge as a side effect of well-intentioned research or commercial investment? Which actors will make significant investments in language models?
  • When will easy-to-use tools to generate text become publicly available? Will it be more effective to engineer specific language models for influence operations, rather than apply generic ones?
  • Will norms develop that disincentivize actors who wage AI-enabled influence operations? How will actor intentions develop?

While we expect to see diffusion of the technology as well as improvements in the usability, reliability, and efficiency of language models, many questions about the future remain unanswered. Because these are critical possibilities that can change how language models may impact influence operations, additional research to reduce uncertainty is highly valuable.

A Framework for Mitigations

To chart a path forward, the report lays out key stages in the language model-to-influence operation pipeline. Each of these stages is a point for potential mitigations. To successfully wage an influence operation leveraging a language model, propagandists would require that: (1) a model exists, (2) they can reliably access it, (3) they can disseminate content from the model, and (4) an end user is affected. Many possible mitigation strategies fall along these four steps, as shown below.

Stages in the pipeline, with illustrative mitigations:

  1. Model Construction: AI developers build models that are more fact-sensitive. Developers spread radioactive data to make generative models detectable. Governments impose restrictions on data collection. Governments impose access controls on AI hardware.

  2. Model Access: AI providers impose stricter usage restrictions on language models. AI providers develop new norms around model release. AI providers close security vulnerabilities.

  3. Content Dissemination: Platforms and AI providers coordinate to identify AI content. Platforms require “proof of personhood” to post. Entities that rely on public input take steps to reduce their exposure to misleading AI content. Digital provenance standards are widely adopted.

  4. Belief Formation: Institutions engage in media literacy campaigns. Developers provide consumer-focused AI tools.

If a Mitigation Exists, is it Desirable?

Just because a mitigation could reduce the threat of AI-enabled influence operations does not mean that it should be put into place. Some mitigations carry their own downside risks. Others may not be feasible. While we do not explicitly endorse or rate mitigations, the paper provides a set of guiding questions for policymakers and others to consider:

  • Technical Feasibility: Is the proposed mitigation technically feasible? Does it require significant changes to technical infrastructure?
  • Social Feasibility: Is the mitigation feasible from a political, legal, and institutional perspective? Does it require costly coordination, are key actors incentivized to implement it, and is it actionable under existing law, regulation, and industry standards?
  • Downside Risk: What are the potential negative impacts of the mitigation, and how significant are they?
  • Impact: How effective would a proposed mitigation be at reducing the threat?

We hope this framework will spur ideas for other mitigation strategies, and that the guiding questions will help relevant institutions begin to consider whether various mitigations are worth pursuing.

This report is far from the final word on AI and the future of influence operations. Our aim is to define the present environment and to help set an agenda for future research. We encourage anyone interested in collaborating or discussing relevant projects to connect with us. For more, read the full report here.

Report Authors

Josh A. Goldstein (Georgetown University’s Center for Security and Emerging Technology)
Girish Sastry (OpenAI)
Micah Musser (Georgetown University’s Center for Security and Emerging Technology)
Renée DiResta (Stanford Internet Observatory)
Matthew Gentzel (Longview Philanthropy) (work done at OpenAI)
Katerina Sedova (US Department of State) (work done at Center for Security and Emerging Technology prior to government service)

15 Dec. 2022
New and Improved Embedding Model

We are excited to announce a new embedding model which is significantly more capable, cost effective, and simpler to use. The new model, text-embedding-ada-002, replaces five separate models for text search, text similarity, and code search, and outperforms our previous most capable model, Davinci, at most tasks, while being priced 99.8% lower.

Embeddings are numerical representations of concepts converted to number sequences, which make it easy for computers to understand the relationships between those concepts. Since the initial launch of the OpenAI /embeddings endpoint, many applications have incorporated embeddings to personalize, recommend, and search content.
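One common way to measure how related two concepts are is the cosine similarity of their embeddings. The following is a minimal sketch under stated assumptions: the vectors are plain Python lists, the three toy vectors are invented for illustration (real text-embedding-ada-002 vectors have 1536 dimensions), and cosine similarity is our choice of measure here rather than something this post prescribes.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional vectors standing in for real embeddings.
pig = [0.9, 0.1, 0.2]
hog = [0.8, 0.2, 0.25]
car = [0.1, 0.9, 0.0]

print(f"pig vs hog: {cosine_similarity(pig, hog):.3f}")
print(f"pig vs car: {cosine_similarity(pig, car):.3f}")
```

Scores near 1.0 indicate closely related concepts, so the toy "pig" and "hog" vectors score higher than "pig" and "car".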

You can query the /embeddings endpoint for the new model with two lines of code using our OpenAI Python Library, just like you could with previous models:

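import openai
response = openai.Embedding.create(
  input="porcine pals say",
  model="text-embedding-ada-002"
)
embedding = response["data"][0]["embedding"]  # a list of 1536 floats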

Model Improvements

Stronger performance. text-embedding-ada-002 outperforms all the old embedding models on text search, code search, and sentence similarity tasks and gets comparable performance on text classification. For each task category, we evaluate the models on the datasets used to evaluate the old embedding models.

Unification of capabilities. We have significantly simplified the interface of the /embeddings endpoint by merging the five separate models shown above (text-similarity, text-search-query, text-search-doc, code-search-text and code-search-code) into a single new model. This single representation performs better than our previous embedding models across a diverse set of text search, sentence similarity, and code search benchmarks.

Longer context. The context length of the new model is increased by a factor of four, from 2048 to 8192 tokens, making it more convenient to work with long documents.

Smaller embedding size. The new embeddings have only 1536 dimensions, one-eighth the size of davinci-001 embeddings, making the new embeddings more cost effective in working with vector databases.
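To put "one-eighth the size" in concrete terms, raw storage per vector follows directly from the dimension count. This sketch assumes each dimension is stored as a 32-bit float and ignores index overhead and compression, which real vector databases add or apply; the corpus size is hypothetical.

```python
# Raw storage estimate for embedding vectors.
# Assumption: 4 bytes (float32) per dimension, ignoring index overhead and compression.
BYTES_PER_FLOAT32 = 4

ADA_002_DIMS = 1536                  # stated in the post
DAVINCI_001_DIMS = ADA_002_DIMS * 8  # "one-eighth the size" implies 12288


def raw_storage_bytes(num_vectors: int, dims: int) -> int:
    """Bytes needed to store num_vectors embeddings with dims float32 components each."""
    return num_vectors * dims * BYTES_PER_FLOAT32


n = 1_000_000  # hypothetical corpus size
for name, dims in [("text-embedding-ada-002", ADA_002_DIMS), ("davinci-001", DAVINCI_001_DIMS)]:
    print(f"{name}: {raw_storage_bytes(n, dims) / 1e9:.1f} GB")
```

For one million vectors this works out to 1,000,000 × 1536 × 4 bytes ≈ 6.1 GB, versus about 49.2 GB at 12288 dimensions.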

Reduced price. We have reduced the price of new embedding models by 90% compared to old models of the same size. The new model achieves better or similar performance compared with the old Davinci models at a 99.8% lower price.

Overall, the new embedding model is a much more powerful tool for natural language processing and code tasks. We are excited to see how our customers will use it to create even more capable applications in their respective fields.


The new text-embedding-ada-002 model does not outperform text-similarity-davinci-001 on the SentEval linear probing classification benchmark. For tasks that require training a lightweight linear layer on top of embedding vectors for classification prediction, we suggest comparing the new model to text-similarity-davinci-001 and choosing whichever model gives better performance.
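A minimal sketch of that comparison, assuming you have already fetched embeddings from each candidate model for the same labeled examples: it trains one logistic-regression probe per candidate and keeps the candidate with the higher validation accuracy. The helper names (`train_linear_probe`, `pick_embedding_model`) and the tiny two-dimensional data are hypothetical; in practice you would use a library classifier and a realistic validation set.

```python
import math
from typing import Dict, List, Sequence, Tuple

Dataset = Tuple[List[List[float]], List[int], List[List[float]], List[int]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(xs: List[List[float]], ys: List[int],
                       epochs: int = 200, lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit logistic-regression weights and bias with full-batch gradient descent."""
    dims = len(xs[0])
    w, b = [0.0] * dims, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dims, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y  # d(log loss)/dz
            for i in range(dims):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(xs)
    return w, b


def accuracy(w: Sequence[float], b: float, xs: List[List[float]], ys: List[int]) -> float:
    """Fraction of examples where the predicted class (z > 0 means 1) matches the label."""
    hits = sum(int((sum(wi * xi for wi, xi in zip(w, x)) + b > 0) == (y == 1))
               for x, y in zip(xs, ys))
    return hits / len(xs)


def pick_embedding_model(candidates: Dict[str, Dataset]) -> Tuple[str, Dict[str, float]]:
    """Train one probe per embedding model; return the best name and all validation scores."""
    scores = {}
    for name, (train_x, train_y, val_x, val_y) in candidates.items():
        w, b = train_linear_probe(train_x, train_y)
        scores[name] = accuracy(w, b, val_x, val_y)
    return max(scores, key=scores.get), scores


# Hypothetical embeddings of the same six training and two validation examples.
labels = [1, 1, 1, 0, 0, 0]
candidates = {
    "candidate_a": ([[1.0, 0.2], [0.9, -0.1], [0.8, 0.0], [-1.0, 0.1], [-0.9, 0.2], [-0.8, -0.2]],
                    labels, [[0.7, 0.0], [-0.7, 0.0]], [1, 0]),
    "candidate_b": ([[0.1, 0.5], [-0.1, -0.5], [0.2, 0.4], [0.1, 0.5], [-0.1, -0.5], [0.2, 0.4]],
                    labels, [[0.1, 0.5], [0.1, 0.5]], [1, 0]),
}
best, scores = pick_embedding_model(candidates)
print(best, scores)
```

The decision rests on held-out accuracy rather than on benchmark rankings, which is the point of the recommendation above: the better model depends on your data.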

Check the Limitations & Risks section in the embeddings documentation for general limitations of our embedding models.

Examples of Embeddings API in Action

Kalendar AI is a sales outreach product that uses embeddings to match the right sales pitch to the right customers out of a dataset containing 340M profiles. This automation relies on the similarity between embeddings of customer profiles and sales pitches to rank the most suitable matches, eliminating 40–56% of unwanted targeting compared to their old approach.
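The post does not describe Kalendar AI's pipeline in detail. As a rough sketch of similarity-based ranking in general, assuming the embeddings are already computed, the snippet below scores every candidate profile against one pitch by cosine similarity and keeps the top k. The profile ids and vectors are made up.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k_matches(query: Sequence[float],
                  candidates: Dict[str, Sequence[float]],
                  k: int) -> List[Tuple[str, float]]:
    """Return the k (id, score) pairs most similar to the query, best first."""
    scored = ((cid, cosine_similarity(query, vec)) for cid, vec in candidates.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical embeddings: one sales pitch and three customer profiles.
pitch = [0.9, 0.1, 0.0]
profiles = {
    "profile-1": [0.8, 0.2, 0.1],
    "profile-2": [0.0, 1.0, 0.0],
    "profile-3": [0.95, 0.05, 0.0],
}
for profile_id, score in top_k_matches(pitch, profiles, k=2):
    print(profile_id, round(score, 3))
```

A linear scan is fine for a handful of candidates. At the scale of hundreds of millions of profiles, a vector database or approximate nearest-neighbor index would normally replace it.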

Notion, the online workspace company, will use OpenAI's new embeddings to improve Notion search beyond today's keyword matching systems.

Thanks to the following for their contributions to this release:
Chris Hallacy, Sherwin Wu, Jessica Shieh, Juston Forte, Aliisa Rosenthal, Katie Mayer

Thanks to the following for their feedback on this post:
Peter Welinder, Logan Kilpatrick, Joanne Jang, Fraser Kelton, Justin Jay Wang, Ruby Chen

30 Nov. 2022
ChatGPT: Optimizing Language Models for Dialogue

We’ve trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer follow-up questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests. ChatGPT is a sibling model to InstructGPT, which is trained to follow an instruction in a prompt and provide a detailed response.

We are excited to introduce ChatGPT to get users' feedback and learn about its strengths and weaknesses. During the research preview, usage of ChatGPT is free. Try it now.


In the following sample, ChatGPT asks clarifying questions to debug code.
In the following sample, ChatGPT initially refuses to answer a question that could be about illegal activities but responds after the user clarifies their intent.
In the following sample, ChatGPT is able to understand the reference (“it”) to the subject of the previous question (“fermat’s little theorem”).
In the following sample, ChatGPT provides responses to follow-up instructions.

We trained this model using Reinforcement Learning from Human Feedback (RLHF), using the same methods as InstructGPT, but with slight differences in the data collection setup. We trained an initial model using supervised fine-tuning: human AI trainers provided conversations in which they played both sides—the user and an AI assistant. We gave the trainers access to model-written suggestions to help them compose their responses. We mixed this new dialogue dataset with the InstructGPT dataset, which we transformed into a dialogue format.

To create a reward model for reinforcement learning, we needed to collect comparison data, which consisted of two or more model responses ranked by quality. To collect this data, we took conversations that AI trainers had with the chatbot. We randomly selected a model-written message, sampled several alternative completions, and had AI trainers rank them. Using these reward models, we can fine-tune the model using Proximal Policy Optimization. We performed several iterations of this process.
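The post does not spell out the reward model's training objective. One common formulation, used in the InstructGPT paper, expands each ranking of K responses into all K(K−1)/2 preferred/rejected pairs and minimizes −log σ(r_preferred − r_rejected), averaged over the pairs. The sketch below computes that loss for a single ranking, assuming reward scores have already been produced by some model; the response ids, scores, and helper names are hypothetical.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple


def ranking_to_pairs(ranked_ids: Sequence[str]) -> List[Tuple[str, str]]:
    """Expand a best-to-worst ranking into all (preferred, rejected) pairs."""
    return list(itertools.combinations(ranked_ids, 2))


def softplus(x: float) -> float:
    """log(1 + exp(x)), computed without overflow for large |x|."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def pairwise_reward_loss(rewards: Dict[str, float], ranked_ids: Sequence[str]) -> float:
    """Mean of -log(sigmoid(r_preferred - r_rejected)) over every pair in one ranking."""
    pairs = ranking_to_pairs(ranked_ids)
    # -log(sigmoid(m)) == softplus(-m)
    total = sum(softplus(-(rewards[pref] - rewards[rej])) for pref, rej in pairs)
    return total / len(pairs)


# A trainer ranked four sampled completions best-to-worst (hypothetical ids and scores).
ranking = ["completion_c", "completion_a", "completion_d", "completion_b"]
agreeing = {"completion_c": 2.0, "completion_a": 1.0, "completion_d": 0.0, "completion_b": -1.0}
reversed_scores = {k: -v for k, v in agreeing.items()}
print(pairwise_reward_loss(agreeing, ranking))         # small: scores agree with the ranking
print(pairwise_reward_loss(reversed_scores, ranking))  # large: scores contradict it
```

Training the reward model means adjusting its parameters to push this loss down across many rankings; the resulting scores then serve as the reward signal for PPO.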

ChatGPT is fine-tuned from a model in the GPT-3.5 series, which finished training in early 2022. You can learn more about the GPT-3.5 series here. ChatGPT and GPT-3.5 were trained on Azure AI supercomputing infrastructure.


  • ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers. Fixing this issue is challenging, as: (1) during RL training, there’s currently no source of truth; (2) training the model to be more cautious causes it to decline questions that it can answer correctly; and (3) supervised training misleads the model because the ideal answer depends on what the model knows, rather than what the human demonstrator knows.
  • ChatGPT is sensitive to tweaks in the input phrasing and to repeated attempts at the same prompt. For example, given one phrasing of a question, the model can claim not to know the answer, but given a slight rephrase, it can answer correctly.
  • The model is often excessively verbose and overuses certain phrases, such as restating that it’s a language model trained by OpenAI. These issues arise from biases in the training data (trainers prefer longer answers that look more comprehensive) and well-known over-optimization issues (Stiennon et al., 2020; Gao et al., 2022).
  • Ideally, the model would ask clarifying questions when the user provides an ambiguous query. Instead, our current models usually guess what the user intended.
  • While we’ve made efforts to make the model refuse inappropriate requests, it will sometimes respond to harmful instructions or exhibit biased behavior. We’re using the Moderation API to warn or block certain types of unsafe content, but we expect it to have some false negatives and positives for now. We’re eager to collect user feedback to aid our ongoing work to improve this system.

Iterative deployment

Today’s research release of ChatGPT is the latest step in OpenAI’s iterative deployment of increasingly safe and useful AI systems. Many lessons from deployment of earlier models like GPT-3 and Codex have informed the safety mitigations in place for this release, including substantial reductions in harmful and untruthful outputs achieved by the use of reinforcement learning from human feedback (RLHF).

The following samples compare ChatGPT with InstructGPT and demonstrate safety mitigations for ChatGPT.

We know that many limitations remain as discussed above and we plan to make regular model updates to improve in such areas. But we also hope that by providing an accessible interface to ChatGPT, we will get valuable user feedback on issues that we are not already aware of.

Users are encouraged to provide feedback on problematic model outputs through the UI, as well as on false positives/negatives from the external content filter, which is also part of the interface. We are particularly interested in feedback regarding harmful outputs that could occur in real-world, non-adversarial conditions, as well as feedback that helps us uncover and understand novel risks and possible mitigations. You can choose to enter the ChatGPT Feedback Contest for a chance to win up to $500 in API credits.[1] Entries can be submitted via the feedback form that is linked in the ChatGPT interface.

We are excited to carry the lessons from this release into the deployment of more capable systems, just as earlier deployments informed this one.


Contributors: John Schulman, Barret Zoph, Christina Kim, Jacob Hilton, Jacob Menick, Jiayi Weng, Juan Felipe Ceron Uribe, Liam Fedus, Luke Metz, Michael Pokorny, Rapha Gontijo Lopes, Shengjia Zhao, Arun Vijayvergiya, Eric Sigler, Adam Perelman, Chelsea Voss, Mike Heaton, Joel Parish, Dave Cummings, Rajeev Nayak, Valerie Balcom, David Schnurr, Tomer Kaftan, Chris Hallacy, Nicholas Turley, Noah Deutsch, Vik Goel, Jonathan Ward, Aris Konstantinidis, Wojciech Zaremba, Long Ouyang, Leonard Bogdonoff, Joshua Gross, David Medina, Sarah Yoo, Teddy Lee, Ryan Lowe, Dan Mossing, Joost Huizinga, Roger Jiang, Carroll Wainwright, Diogo Almeida, Steph Lin, Marvin Zhang, Kai Xiao, Katarina Slama, Steven Bills, Alex Gray, Jan Leike, Jakub Pachocki, Phil Tillet, Shantanu Jain, Greg Brockman, Nick Ryder

  1. Stiennon, Nisan, et al. "Learning to summarize with human feedback." Advances in Neural Information Processing Systems 33 (2020): 3008-3021.
  2. Gao, Leo, John Schulman, and Jacob Hilton. "Scaling Laws for Reward Model Overoptimization." arXiv preprint arXiv:2210.10760 (2022).
  3. The inspiration for this contest comes in part from work by Kenway, Josh, Camille François, Sasha Costanza-Chock, Inioluwa Deborah Raji, and Joy Buolamwini. Bug Bounties For Algorithmic Harms? Lessons from Cybersecurity Vulnerability Disclosure for Algorithmic Harms Discovery, Disclosure, and Redress. Washington, DC: Algorithmic Justice League. January 2022. See also work by Brundage, Miles, Avin, Shahar, Wang, Jasmine, Belfield, Haydn, and Gretchen Krueger et al. "Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims," April 2020. See an earlier instance of such a competition at HackerOne: “Twitter Algorithmic Bias,” HackerOne, 2021. Finally, see early published work on this topic from Rubinovitz, JB, "Bias Bounty Programs as a Method of Combatting Bias in AI," August 2018.


  1. No purchase necessary, void where prohibited. Must be at least 18 to enter. For contest details, see the Official Rules.

3 Nov. 2022
DALL·E API Now Available in Public Beta

Starting today, developers can begin building apps with the DALL·E API.

Developers can now integrate DALL·E directly into their apps and products through our API. More than 3 million people are already using DALL·E to extend their creativity and speed up their workflows, generating over 4 million images a day. Developers can start building with this same technology in a matter of minutes.

curl \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "prompt": "a photo of a happy corgi puppy sitting and facing forward, studio light, longshot"
  }'
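The JSON body passed with -d in the curl example can also be built and checked in code before any network call. This is a minimal sketch, not an official client: `build_image_request` is a hypothetical helper, and the `n` and `size` fields, the allowed sizes, and the upper bound on `n` are assumptions to verify against the current API reference.

```python
import json

# Assumptions to verify against the current API reference: the request body
# carries "prompt", "n", and "size", and these are the accepted sizes.
ALLOWED_SIZES = ("256x256", "512x512", "1024x1024")
MAX_IMAGES = 10  # assumed upper bound on n


def build_image_request(prompt: str, n: int = 1, size: str = "1024x1024") -> str:
    """Hypothetical helper: validate arguments and return a JSON request body."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if not 1 <= n <= MAX_IMAGES:
        raise ValueError(f"n must be between 1 and {MAX_IMAGES}")
    if size not in ALLOWED_SIZES:
        raise ValueError(f"size must be one of {ALLOWED_SIZES}")
    return json.dumps({"prompt": prompt, "n": n, "size": size})


body = build_image_request(
    "a photo of a happy corgi puppy sitting and facing forward, studio light, longshot"
)
print(body)
```

Validating locally lets an app reject bad arguments with a clear message instead of spending a request on them.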

State-of-the-art image generation

DALL·E’s flexibility allows users to create and edit original images ranging from the artistic to the photorealistic. DALL·E excels at following natural language descriptions so users can plainly describe what they want to see. As our research evolves, we will continue to bring the state of the art into the API, including advances in image quality, latency, scalability, and usability.

Built-in moderation

We have incorporated the trust & safety lessons we’ve learned while deploying DALL·E to 3 million artists and users worldwide, so developers can ship with confidence knowing that built-in mitigations, like filters for hate symbols and gore, will handle the challenging aspects of moderation. As a part of OpenAI’s commitment to responsible deployment, we will continue to make trust & safety a top priority so that developers can focus on building.

DALL·E applications

We’ve worked closely with a few early customers who have already built DALL·E into their apps and products across a variety of use cases.

Microsoft Bing

Microsoft is bringing DALL·E to a new graphic design app called Designer, which helps users create professional-quality social media posts, invitations, digital postcards, graphics, and more.

Microsoft is also integrating DALL·E in Bing and Microsoft Edge with Image Creator, allowing users to create images if web results don't return what they're looking for.


CALA is the world's first fashion and lifestyle operating system. CALA unifies the entire design process—from product ideation all the way through e-commerce enablement and order fulfillment—into a single digital platform. Powered by DALL·E, CALA's new artificial intelligence tools will allow users to generate new design ideas from natural text descriptions or uploaded reference images.


Mixtiles is a fast-growing photo startup. They use software and an easy hanging experience to help millions of people create beautiful photo walls. Mixtiles uses the DALL·E API to create and frame emotionally resonant artwork by guiding users through a creative process that captures childhood memories, dream destinations, and more.

We’re excited to see what our customers will do with DALL·E and what creative ideas they’ll come up with.

Build with OpenAI’s powerful models

DALL·E joins GPT-3, Embeddings, and Codex in our API platform, adding a new building block that developers can use to create novel experiences and applications. All API customers can use the DALL·E API today.

28 Sep. 2022
DALL·E Now Available Without Waitlist

New users can start creating straight away. Lessons learned from deployment and improvements to our safety systems make wider availability possible.

Starting today, we are removing the waitlist for the DALL·E beta so users can sign up and start using it immediately. More than 1.5M users are now actively creating over 2M images a day with DALL·E—from artists and creative directors to authors and architects—with over 100K users sharing their creations and feedback in our Discord community.

Responsibly scaling a system as powerful and complex as DALL·E—while learning about all the creative ways it can be used and misused—has required an iterative deployment approach.

Since we first previewed the DALL·E research to users in April, users have helped us discover new uses for DALL·E as a powerful creative tool. Artists, in particular, have provided important input on DALL·E’s features.

“Cyberpunk cat, 90s Japan anime style” by OpenAI
“Wildflowers, grassy field, autumn rhythm, watercolor” by OpenAI
“Running at the edge of space, toward a planet, calm, reaching the abyss, digital art” by OpenAI

Their feedback inspired us to build features like Outpainting, which lets users continue an image beyond its original borders and create bigger images of any size, and collections—so users can create in all new ways and expedite their creative processes.

Learning from real-world use has allowed us to improve our safety systems, making wider availability possible today. In the past months, we’ve made our filters more robust at rejecting attempts to generate sexual, violent and other content that violates our content policy and built new detection and response techniques to stop misuse.

We are currently testing a DALL·E API with several customers and are excited to soon offer it more broadly to developers and businesses so they can build apps on this powerful system.

We can't wait to see what users from around the world create with DALL·E. Sign up today and start creating.

21 Sep. 2022
Introducing Whisper

We’ve trained and are open-sourcing a neural net called Whisper that approaches human level robustness and accuracy on English speech recognition.

Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. We show that the use of such a large and diverse dataset leads to improved robustness to accents, background noise and technical language. Moreover, it enables transcription in multiple languages, as well as translation from those languages into English. We are open-sourcing models and inference code to serve as a foundation for building useful applications and for further research on robust speech processing.

The Whisper architecture is a simple end-to-end approach, implemented as an encoder-decoder Transformer. Input audio is split into 30-second chunks, converted into a log-Mel spectrogram, and then passed into an encoder. A decoder is trained to predict the corresponding text caption, intermixed with special tokens that direct the single model to perform tasks such as language identification, phrase-level timestamps, multilingual speech transcription, and to-English speech translation.
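The first and last steps of that pipeline can be sketched in a few lines. This is an illustration, not Whisper's implementation: it assumes 16 kHz mono audio held in a Python list, zero-pads the final partial window, and omits the log-Mel spectrogram step (which needs an FFT). The special-token strings follow the names we believe the open-source Whisper tokenizer uses; check the released code before relying on them.

```python
from typing import List

SAMPLE_RATE = 16_000  # assumption: audio already resampled to 16 kHz mono
CHUNK_SECONDS = 30    # from the post: input audio is split into 30-second chunks
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS


def split_into_chunks(samples: List[float]) -> List[List[float]]:
    """Cut audio into 30-second windows, zero-padding the final partial window."""
    chunks = []
    for start in range(0, len(samples), CHUNK_SAMPLES):
        chunk = samples[start:start + CHUNK_SAMPLES]
        chunks.append(chunk + [0.0] * (CHUNK_SAMPLES - len(chunk)))
    return chunks


def task_prefix(language: str, task: str, timestamps: bool = False) -> List[str]:
    """Decoder prompt of special tokens selecting language, task, and timestamp mode."""
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    tokens = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        tokens.append("<|notimestamps|>")
    return tokens


# 45 seconds of silence becomes two 30-second chunks (the second one padded).
audio = [0.0] * (45 * SAMPLE_RATE)
chunks = split_into_chunks(audio)
print(len(chunks), [len(c) for c in chunks])
# French speech, translated to English text, without timestamps:
print(task_prefix("fr", "translate"))
```

Because the task is expressed as tokens in the decoder's input rather than as separate output heads, one model can switch between transcription and translation per request.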

Other existing approaches frequently use smaller, more closely paired audio-text training datasets (Chan et al., 2021; Galvez et al., 2021; Chen et al., 2021), or use broad but unsupervised audio pretraining (Baevski et al., 2020; Baevski et al., 2021; Zhang et al., 2021). Because Whisper was trained on a large and diverse dataset and was not fine-tuned to any specific one, it does not beat models that specialize in LibriSpeech performance, a famously competitive benchmark in speech recognition. However, when we measure Whisper’s zero-shot performance across many diverse datasets we find it is much more robust and makes 50% fewer errors than those models.
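The post does not name the error metric here. Speech recognition results are usually reported as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the reference length. A minimal sketch, assuming whitespace tokenization and no text normalization (real evaluations normalize casing and punctuation first):

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of words in the reference."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Word-level Levenshtein distance, keeping only the previous DP row.
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sat mat"))
```

Under this metric, "50% fewer errors" would mean, for example, a WER of 0.10 where a comparison model scores 0.20.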

About a third of Whisper’s audio dataset is non-English, and it is alternately given the task of transcribing in the original language or translating to English. We find this approach is particularly effective at learning speech to text translation and outperforms the supervised SOTA on CoVoST2 to English translation zero-shot.

We hope Whisper’s high accuracy and ease of use will allow developers to add voice interfaces to a much wider set of applications. Check out the paper, model card, and code to learn more details and to try out Whisper.

  1. Chan, W., Park, D., Lee, C., Zhang, Y., Le, Q., and Norouzi, M. SpeechStew: Simply mix all available speech recognition data to train one large neural network. arXiv preprint arXiv:2104.02133, 2021.
  2. Galvez, D., Diamos, G., Torres, J. M. C., Achorn, K., Gopi, A., Kanter, D., Lam, M., Mazumder, M., and Reddi, V. J. The people’s speech: A large-scale diverse english speech recognition dataset for commercial usage. arXiv preprint arXiv:2111.09344, 2021.
  3. Chen, G., Chai, S., Wang, G., Du, J., Zhang, W.-Q., Weng, C., Su, D., Povey, D., Trmal, J., Zhang, J., et al. Gigaspeech: An evolving, multi-domain asr corpus with 10,000 hours of transcribed audio. arXiv preprint arXiv:2106.06909, 2021.
  4. Baevski, A., Zhou, H., Mohamed, A., and Auli, M. wav2vec 2.0: A framework for self-supervised learning of speech representations. arXiv preprint arXiv:2006.11477, 2020.
  5. Baevski, A., Hsu, W.N., Conneau, A., and Auli, M. Unsupervised speech recognition. Advances in Neural Information Processing Systems, 34:27826–27839, 2021.
  6. Zhang, Y., Park, D. S., Han, W., Qin, J., Gulati, A., Shor, J., Jansen, A., Xu, Y., Huang, Y., Wang, S., et al. BigSSL: Exploring the frontier of large-scale semi-supervised learning for automatic speech recognition. arXiv preprint arXiv:2109.13226, 2021.
31 Aug. 2022
DALL·E: Introducing Outpainting

Extend creativity and tell a bigger story with DALL·E images of any size

Original outpainting by Emma Catnip

Today we’re introducing Outpainting, a new feature which helps users extend their creativity by continuing an image beyond its original borders — adding visual elements in the same style, or taking a story in new directions — simply by using a natural language description.

Original: Girl with a Pearl Earring by Johannes Vermeer. Outpainting: August Kamp

DALL·E’s Edit feature already enables changes within a generated or uploaded image — a capability known as Inpainting. Now, with Outpainting, users can extend the original image, creating large-scale images in any aspect ratio. Outpainting takes into account the image’s existing visual elements — including shadows, reflections, and textures — to maintain the context of the original image.

More than one million people are using DALL·E, the AI system that generates original images and artwork from a natural language description, as a creative tool today. Artists have already created remarkable images with the new Outpainting feature, and helped us better understand its capabilities in the process.

Original outpainting by Tyna Eloundou
Original outpainting by OpenAI
Outpainting by David Schnurr
Original outpainting by Sonia Levesque
Original outpainting by Danielle Baskin
Original outpainting by Danielle Baskin
Original outpainting by Chad Nelson

Outpainting is now available to all DALL·E users on desktop. To discover new realms of creativity, visit DALL·E or join the waitlist.

Featured artists:

Emma Catnip
August Kamp
Sonia Levesque
Danielle Baskin
Chad Nelson
24 Aug. 2022
Our Approach to Alignment Research

Our approach to aligning AGI is empirical and iterative. We are improving our AI systems’ ability to learn from human feedback and to assist humans at evaluating AI. Our goal is to build a sufficiently aligned AI system that can help us solve all other alignment problems.


Our alignment research aims to make artificial general intelligence (AGI) aligned with human values and follow human intent. We take an iterative, empirical approach: by attempting to align highly capable AI systems, we can learn what works and what doesn’t, thus refining our ability to make AI systems safer and more aligned. Using scientific experiments, we study how alignment techniques scale and where they will break.

We tackle alignment problems both in our most capable AI systems and those we expect to encounter on our path to AGI. Our main goal is to push current alignment ideas as far as possible, and to understand and document precisely how they can succeed or why they will fail. We believe that even without fundamentally new alignment ideas, we can likely build sufficiently aligned AI systems to substantially advance alignment research itself.

Unaligned AGI could pose substantial risks to humanity, and solving the AGI alignment problem could be so difficult that it will require all of humanity to work together. Therefore, we are committed to openly sharing our alignment research when it’s safe to do so: we want to be transparent about how well our alignment techniques actually work in practice, and we want every AGI developer to use the world’s best alignment techniques.

At a high level, our approach to alignment research focuses on engineering a scalable training signal for very smart AI systems that is aligned with human intent. It has three main pillars:

  1. Training AI systems using human feedback
  2. Training AI systems to assist human evaluation
  3. Training AI systems to do alignment research

Aligning AI systems with human values also poses a range of other significant sociotechnical challenges, such as deciding to whom these systems should be aligned. Solving these problems is important to achieving our mission, but we do not discuss them in this post.

Training AI systems using human feedback

RL from human feedback is our main technique for aligning our deployed language models today. We train a class of models called InstructGPT derived from pretrained language models such as GPT-3. These models are trained to follow human intent: both explicit intent given by an instruction as well as implicit intent such as truthfulness, fairness, and safety.
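In the RLHF setup described in the InstructGPT work, a reward model is first trained on human comparisons between pairs of model outputs, and its score then serves as the reward for RL fine-tuning. A minimal sketch of the pairwise comparison loss, −log σ(r_chosen − r_rejected), is below. The plain floats stand in for a real reward model's outputs, and the function names are illustrative.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def pairwise_preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Loss for one human comparison: small when the reward model scores the
    human-preferred output above the rejected one, large otherwise."""
    return -log_sigmoid(r_chosen - r_rejected)


def mean_loss(pairs):
    """Average loss over a batch of (r_chosen, r_rejected) reward pairs."""
    return sum(pairwise_preference_loss(c, r) for c, r in pairs) / len(pairs)
```

When the two rewards are equal the loss is log 2; minimizing it pushes the reward of each preferred output above the rejected one, which is how the human comparisons become a training signal.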

Our results show that there is a lot of low-hanging fruit on alignment-focused fine-tuning right now: InstructGPT is preferred by humans over a 100x larger pretrained model, while its fine-tuning costs <2% of GPT-3’s pretraining compute and about 20,000 hours of human feedback. We hope that our work inspires others in the industry to increase their investment in alignment of large language models and that it raises the bar on users’ expectations about the safety of deployed models.

Our natural language API is a very useful environment for our alignment research: It provides us with a rich feedback loop about how well our alignment techniques actually work in the real world, grounded in a very diverse set of tasks that our customers are willing to pay money for. On average, our customers already prefer to use InstructGPT over our pretrained models.

Yet today’s versions of InstructGPT are quite far from fully aligned: they sometimes fail to follow simple instructions, aren’t always truthful, don’t reliably refuse harmful tasks, and sometimes give biased or toxic responses. Some customers find InstructGPT’s responses significantly less creative than the pretrained models’, something we hadn’t realized from running InstructGPT on publicly available benchmarks. We are also working on developing a more detailed scientific understanding of RL from human feedback and how to improve the quality of human feedback.

Aligning our API is much easier than aligning AGI since most tasks on our API aren’t very hard for humans to supervise and our deployed language models aren’t smarter than humans. We don’t expect RL from human feedback to be sufficient to align AGI, but it is a core building block for the scalable alignment proposals that we’re most excited about, and so it’s valuable to perfect this methodology.

Training models to assist human evaluation

RL from human feedback has a fundamental limitation: it assumes that humans can accurately evaluate the tasks our AI systems are doing. Today humans are pretty good at this, but as models become more capable, they will be able to do tasks that are much harder for humans to evaluate (e.g. finding all the flaws in a large codebase or a scientific paper). Our models might learn to tell our human evaluators what they want to hear instead of telling them the truth. In order to scale alignment, we want to use techniques like recursive reward modeling (RRM), debate, and iterated amplification.

Currently our main direction is based on RRM: we train models that can assist humans at evaluating our models on tasks that are too difficult for humans to evaluate directly. For example:

  • We trained a model to summarize books. Evaluating book summaries takes a long time for humans if they are unfamiliar with the book, but our model can assist human evaluation by writing chapter summaries.
  • We trained a model to assist humans at evaluating factual accuracy by browsing the web and providing quotes and links. On simple questions, this model’s outputs are already preferred to responses written by humans.
  • We trained a model to write critical comments on its own outputs: on a query-based summarization task, assistance with critical comments increases the number of flaws humans find in model outputs by 50% on average. This holds even if we ask humans to write plausible-looking but incorrect summaries.
  • We are creating a set of coding tasks selected to be very difficult for unassisted humans to evaluate reliably. We hope to release this dataset soon.

Our alignment techniques need to work even if our AI systems are proposing very creative solutions (like AlphaGo’s move 37), so we are especially interested in training models to help humans distinguish correct solutions from misleading or deceptive ones. We believe the best way to learn as much as possible about how to make AI-assisted evaluation work in practice is to build AI assistants.

Training AI systems to do alignment research

There is currently no known indefinitely scalable solution to the alignment problem. As AI progress continues, we expect to encounter a number of new alignment problems that we don’t yet observe in current systems. Some of these problems we anticipate now, and some of them will be entirely new.

We believe that finding an indefinitely scalable solution is likely very difficult. Instead, we aim for a more pragmatic approach: building and aligning a system that can make faster and better alignment research progress than humans can.

As we make progress on this, our AI systems can take over more and more of our alignment work and ultimately conceive, implement, study, and develop better alignment techniques than we have now. They will work together with humans to ensure that their own successors are more aligned with humans.

We believe that evaluating alignment research is substantially easier than producing it, especially when provided with evaluation assistance. Therefore human researchers will focus more and more of their effort on reviewing alignment research done by AI systems instead of generating this research by themselves. Our goal is to train models to be so aligned that we can off-load almost all of the cognitive labor required for alignment research.

Importantly, we only need “narrower” AI systems that have human-level capabilities in the relevant domains to do as well as humans on alignment research. We expect these AI systems are easier to align than general-purpose systems or systems much smarter than humans.

Language models are particularly well-suited for automating alignment research because they come “preloaded” with a lot of knowledge and information about human values from reading the internet. Out of the box, they aren’t independent agents and thus don’t pursue their own goals in the world. To do alignment research they don’t need unrestricted access to the internet. Yet a lot of alignment research tasks can be phrased as natural language or coding tasks.

Future versions of WebGPT, InstructGPT, and Codex can provide a foundation as alignment research assistants, but they aren’t sufficiently capable yet. While we don’t know when our models will be capable enough to meaningfully contribute to alignment research, we think it’s important to get started ahead of time. Once we train a model that could be useful, we plan to make it accessible to the external alignment research community.


We’re very excited about this approach towards aligning AGI, but we expect that it needs to be adapted and improved as we learn more about how AI technology develops. Our approach also has a number of important limitations:

  • The path laid out here underemphasizes the importance of robustness and interpretability research, two areas OpenAI is currently underinvested in. If this fits your profile, please apply for our research scientist positions!
  • Using AI assistance for evaluation has the potential to scale up or amplify even subtle inconsistencies, biases, or vulnerabilities present in the AI assistant.
  • Aligning AGI likely involves solving very different problems than aligning today’s AI systems. We expect the transition to be somewhat continuous, but if there are major discontinuities or paradigm shifts, then most lessons learned from aligning models like InstructGPT might not be directly useful.
  • The hardest parts of the alignment problem might not be related to engineering a scalable and aligned training signal for our AI systems. Even if this is true, such a training signal will be necessary.
  • It might not be fundamentally easier to align models that can meaningfully accelerate alignment research than it is to align AGI. In other words, the least capable models that can help with alignment research might already be too dangerous if not properly aligned. If this is true, we won’t get much help from our own systems for solving alignment problems.

We’re looking to hire more talented people for this line of research! If this interests you, we’re hiring Research Engineers and Research Scientists!

For valuable feedback and discussions we'd like to thank William Saunders, Elizabeth Barnes, Richard Ngo, Steven Bills, Ryan Lowe, Steven Adler, Gretchen Krueger, Dan Mossing, Leo Gao, Sam Altman, and Ilya Sutskever.
10 Aug. 2022
New and Improved Content Moderation Tooling

We are introducing a new and improved content moderation tool: The Moderation endpoint improves upon our previous content filter, and is available for free today to OpenAI API developers.

To help developers protect their applications against possible misuse, we are introducing the faster and more accurate Moderation endpoint. This endpoint provides OpenAI API developers with free access to GPT-based classifiers that detect undesired content—an instance of using AI systems to assist with human supervision of these systems. We have also released both a technical paper describing our methodology and the dataset used for evaluation.

When given a text input, the Moderation endpoint assesses whether the content is sexual, hateful, violent, or promotes self-harm, all of which are prohibited by our content policy. The endpoint has been trained to be quick, accurate, and to perform robustly across a range of applications. Importantly, this reduces the chances of products “saying” the wrong thing, even when deployed to users at scale. As a consequence, AI can unlock benefits in sensitive settings, like education, where it could not otherwise be used with confidence.


The Moderation endpoint helps developers benefit from our infrastructure investments. Rather than build and maintain their own classifiers (an extensive process, as we document in our paper), they can instead access accurate classifiers through a single API call.
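A single call to the Moderation endpoint can be sketched with the Python standard library. This assumes the `https://api.openai.com/v1/moderations` URL, a JSON body with an `input` field, bearer-token authentication, and a response whose `results` entries carry `flagged`, `categories`, and `category_scores`; check the current documentation before relying on any of these, since field and category names may have changed. The helper names are hypothetical.

```python
import json
import urllib.request

MODERATION_URL = "https://api.openai.com/v1/moderations"  # assumed endpoint URL


def build_moderation_request(text: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request asking the endpoint to classify `text`."""
    body = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        MODERATION_URL,
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


def flagged_categories(response_json: dict) -> list:
    """Names of categories marked True in the first result, or [] if it is not flagged."""
    result = response_json["results"][0]
    if not result.get("flagged", False):
        return []
    return sorted(name for name, hit in result["categories"].items() if hit)


def moderate(text: str, api_key: str) -> list:
    """Send the request and return flagged category names (performs a network call)."""
    with urllib.request.urlopen(build_moderation_request(text, api_key)) as resp:
        return flagged_categories(json.load(resp))
```

Keeping request construction and response parsing separate from the network call makes it easy to unit-test an application's moderation logic against hand-written sample responses.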

As part of OpenAI’s commitment to making the AI ecosystem safer, we are providing this endpoint to allow free moderation of all OpenAI API-generated content. For instance, Inworld, an OpenAI API customer, uses the Moderation endpoint to help their AI-based virtual characters remain appropriate for their audiences. By leveraging OpenAI’s technology, Inworld can focus on their core product: creating memorable characters. We currently do not support monitoring of third-party traffic.

Get started with the Moderation endpoint by checking out the documentation. More details of the training process and model performance are available in our paper. We have also released an evaluation dataset, featuring Common Crawl data labeled within these categories, which we hope will spur further research in this area.

We thank the many people who reviewed or contributed to this work, including: Sam Altman, Miles Brundage, Derek Chen, Karl Cobbe, Thomas Degry, Steve Dowling, Elie Georges, Jacob Hilton, Raf Jakubanis, Fraser Kelton, Matt Knight, Gretchen Krueger, Jason Kwon, Jan Leike, Mira Murati, Tinnei Pang, Girish Sastry, Pranav Shyam, Maddie Simens, Natalie Summers, Justin Wang, Peter Welinder, Dave Willner, Hannah Wong, Jeff Wu, and Summer Yue.
20 Jul. 2022
DALL·E Now Available in Beta

We’ll invite 1 million people from our waitlist over the coming weeks. Users can create with DALL·E using free credits that refill every month, and buy additional credits in 115-credit increments for $15.

DALL·E, the AI system that creates realistic images and art from a description in natural language, is now available in beta. Today we’re beginning the process of inviting 1 million people from our waitlist over the coming weeks.

Every DALL·E user will receive 50 free credits during their first month of use and 15 free credits every subsequent month. Each credit can be used for one original DALL·E prompt generation — returning four images — or an edit or variation prompt, which returns three images.

A powerful creative tool

DALL·E allows users to create quickly and easily, and artists and creative professionals are using DALL·E to inspire and accelerate their creative processes. We’ve already seen people use DALL·E to make music videos for young cancer patients, create magazine covers, and bring novel concepts to life.

Other features include:

  • Edit allows users to make realistic, context-aware edits, described in natural language, to images they generate with DALL·E or upload.
  • Variations can take an image generated by DALL·E or an image uploaded by a user and create different variations of it inspired by the original.
  • My Collection allows users to save generations right in the DALL·E platform.


In this first phase of the beta, users can buy additional DALL·E credits in 115-credit increments (460 images[1]) for $15 on top of their free monthly credits. One credit is applied each time a prompt is entered and a user hits “generate” or “variations.”
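The 460-image figure assumes every credit is spent on an original prompt (four images each). Because edits and variations return three images, the actual count depends on how a user splits their credits. A small sketch of that arithmetic, in which the function and its `edit_fraction` parameter are illustrative:

```python
def images_for_credits(credits: int, edit_fraction: float = 0.0) -> float:
    """Expected images from `credits`, if `edit_fraction` of them go to
    edit/variation prompts (3 images) and the rest to original prompts (4 images)."""
    if not 0.0 <= edit_fraction <= 1.0:
        raise ValueError("edit_fraction must be between 0 and 1")
    return credits * (4 * (1 - edit_fraction) + 3 * edit_fraction)


PACK_CREDITS, PACK_PRICE_USD = 115, 15

best_case = images_for_credits(PACK_CREDITS)         # all original prompts
worst_case = images_for_credits(PACK_CREDITS, 1.0)   # all edits/variations
print(best_case, worst_case)                         # 460.0 345.0
print(round(PACK_PRICE_USD / best_case, 4))          # 0.0326 (dollars per image)
```

So a $15 pack yields between 345 and 460 images, or roughly 3 to 4 cents per image.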

As we learn more and gather user feedback, we plan to explore other options that will align with users’ creative processes.

Using DALL·E for commercial projects

Starting today, users get full usage rights to commercialize the images they create with DALL·E, including the right to reprint, sell, and merchandise. This includes images they generated during the research preview.

Users have told us that they are planning to use DALL·E images for commercial projects, like illustrations for children’s books, art for newsletters, concept art and characters for games, moodboards for design consulting, and storyboards for movies.


Prior to making DALL·E available in beta, we’ve worked with researchers, artists, developers, and other users to learn about risks and have taken steps to improve our safety systems based on learnings from the research preview, including:

  • Curbing misuse: To minimize the risk of DALL·E being misused to create deceptive content, we reject image uploads containing realistic faces and attempts to create the likeness of public figures, including celebrities and prominent political figures. We also used advanced techniques to prevent photorealistic generations of real individuals’ faces.
  • Preventing harmful images: We’ve made our content filters more accurate so that they are more effective at blocking images that violate our content policy — which does not allow users to generate violent, adult, or political content, among other categories — while still allowing creative expression. We also limited DALL·E’s exposure to these concepts by removing the most explicit content from its training data.
  • Reducing bias: We implemented a new technique so that DALL·E generates images of people that more accurately reflect the diversity of the world’s population. This technique is applied at the system level when DALL·E is given a prompt about an individual that does not specify race or gender, like “CEO.”
  • Monitoring: We will continue to have automated and human monitoring systems to help guard against misuse.

Subsidized access for qualifying artists

We hope to make DALL·E as accessible as possible. Artists who are in need of financial assistance will be able to apply for subsidized access. Please fill out this interest form if you’d like to be notified once more details are available.

We are excited to see what people create with DALL·E and look forward to users’ feedback during this beta period.

Cover Artwork


  1. Number of images is approximate. DALL·E generates four images for every natural language prompt. DALL·E’s Edit and Variations features generate three images.

18 Jul. 2022
Reducing Bias and Improving Safety in DALL·E 2

Today, we are implementing a new technique so that DALL·E generates images of people that more accurately reflect the diversity of the world’s population. This technique is applied at the system level when DALL·E is given a prompt describing a person that does not specify race or gender, like “firefighter.”

Based on our internal evaluation, users were 12× more likely to say that DALL·E images included people of diverse backgrounds after the technique was applied. We plan to improve this technique over time as we gather more data and feedback.

A photo of a CEO

In April, we started previewing the DALL·E 2 research to a limited number of people, which has allowed us to better understand the system’s capabilities and limitations and improve our safety systems.

During this preview phase, early users have flagged sensitive and biased images, which have helped inform and evaluate this new mitigation.

We are continuing to research how AI systems like DALL·E might reflect biases in their training data and different ways we can address them.

During the research preview we have taken other steps to improve our safety systems, including:

  • Minimizing the risk of DALL·E being misused to create deceptive content by rejecting image uploads containing realistic faces and attempts to create the likeness of public figures, including celebrities and prominent political figures.
  • Making our content filters more accurate so that they are more effective at blocking prompts and image uploads that violate our content policy while still allowing creative expression.
  • Refining automated and human monitoring systems to guard against misuse.

These improvements have helped us gain confidence in the ability to invite more users to experience DALL·E.

Expanding access is an important part of deploying AI systems responsibly because it allows us to learn more about real-world use and continue to iterate on our safety systems.

14 Jul. 2022
DALL·E 2: Extending Creativity

As part of our DALL·E 2 research preview, more than 3,000 artists from more than 118 countries have incorporated DALL·E into their creative workflows. The artists in our early access group have helped us discover new uses for DALL·E and have served as key voices as we’ve made decisions about DALL·E’s features.

Creative professionals using DALL·E today range from illustrators, AR designers, and authors to chefs, landscape architects, tattoo artists, and clothing designers, to directors, sound designers, dancers, and much more. The list expands every day.

Below are just a few examples of how artists are making use of this new technology:

The Orrigos

James and his wife Kristin Orrigo created the Big Dreams Virtual Tour, which focuses on creating special memories and a positive distraction for pediatric cancer patients around the world. The Orrigos have worked in top children's hospitals around the country and now virtually meet up with families, bringing children’s ideas to life through personalized cartoons, music videos, and mobility-friendly video games. Orrigo says children and teens light up when they see their DALL·E-generated creations, and they are ready to be the star of a story brought to life from their imaginations.

Most recently, Orrigo and his team have been working with a young cancer survivor named Gianna to create a music video featuring her as Wonder Woman fighting her enemy: the cancer cells.

“We didn't know what an osteosarcoma villain would look like so we turned to DALL·E as our creative outlet. DALL·E gave us a huge amount of inspiration,” Orrigo said. “Unfortunately, Gianna knows this battle all too well. But we are celebrating her victory by bringing her cartoon music video to real life to spread awareness about pediatric cancer and to give Gianna an unforgettable memory.”

Stefan Kutzenberger

In a project conceived by Austrian artist Stefan Kutzenberger and Clara Blume, Head of the Open Austria Art + Tech Lab in San Francisco, DALL·E was used to bring the poetry of revolutionary painter Egon Schiele into the visual world. Schiele died at 28, but Kutzenberger — a curator at the Leopold Museum in Vienna, which houses the world’s largest collection of Schiele’s works — believes that DALL·E gives the world a glimpse of what Schiele’s later work might have been like if he had had a chance to keep painting. The DALL·E works will be exhibited alongside Schiele’s collection in the Leopold Museum in the coming months.

"A painting of tall trees walking along a road, with chirping and trembling birds in front of a white sky in them in the style of Austrian expressionist Egon Schiele"
"Lakeshore Without Sun, 1913 in the expressionist style of Egon Schiele"

Karen X Cheng

Karen X Cheng, a director known for sharing her creative experiments on Instagram, created the latest cover of Cosmopolitan Magazine using DALL·E. In her post unveiling the process, Karen compared working with DALL·E to a musician playing an instrument.

“Like any musical instrument, you get better with practice…and knowing what words to use to communicate? That's a community effort — it's come from the past few months of me talking to other DALL·E artists on Twitter / Discord / DM. I learned from other artists that you could ask for specific camera angles. Lens types. Lighting conditions. We're all figuring it out together, how to play this beautiful new instrument.”

Tom Aviv

Israeli chef and MasterChef winner Tom Aviv is debuting his first U.S. restaurant in Miami in a few months and has used DALL·E for menu, decor, and ambiance inspiration; his team has also used DALL·E to help design the way they plate dishes.

It was Tom’s sister and business partner Kim’s idea to run a family recipe for chocolate mousse through DALL·E.

“It’s called Picasso chocolate mousse, and it’s a tribute to my parents,” she explained. “DALL·E elevates it to another level — it is just phenomenal. It changed the dish from your usual chocolate mousse to something that does service to the name and to our parents. It blew our minds.”

His restaurant, Branja, is expected to open in October.

Don Allen Stevenson III

XR creator Don Allen Stevenson III has used DALL·E to paint physical paintings, design wearable sneakers, and create characters to transform into 3D renders for AR filters. “It feels like having a genie in a bottle that I can collaborate with,” he said.

Stevenson’s real passion is education — specifically making technology accessible to more people. He hosts a weekly Instagram Live teaching people about DALL·E and other tools for creative innovation.

“Digital tools freed me up to have a life that I am proud of and love,” Stevenson says. “I want to help other people to see creative technology like DALL·E the way that I see it — so they can become free as well.”

Danielle Baskin

Danielle Baskin, a multimedia artist, says she plans to incorporate DALL·E generations across a number of different art forms: product design, illustration, theater, and alternative realities.

“It’s a mood board, vibe generator, illustrator, art curator, and museum docent,” Baskin says. “It’s an infinite museum where I can choose which private collections I want to visit. Sometimes I need to repair the private collections (tweak my prompt writing). Sometimes the collection isn’t quite there. But sometimes the docent (DALL·E 2) shows me a surprising new collection I didn’t know existed.”

August Kamp

August Kamp, a multimedia artist and musician, says she views DALL·E as a sort of imagination interpreter.

“Conceptualizing one’s ideas is one of the most gatekept processes in the modern world,” Kamp says. “Everyone has ideas; not everyone has access to training or encouragement enough to confidently render them. I feel empowered by the ability to creatively iterate on a feeling or idea, and I deeply believe that all people deserve that sense of empowerment.”

Chad Nelson

Chad Nelson has been using DALL·E to create highly detailed creatures — and he’s made more than 100 of them.

“I had a vision for a cast of charming woodland critters, each oozing with personality and emotional nuance,” Nelson said. His characters range from “a red furry monster looks in wonder at a burning candle” to “a striped hairy monster shakes its hips dancing underneath a disco ball” — each crafted to capture the most human thing of all — feelings.

“DALL·E is the most advanced paint brush I’ve ever used,” Nelson says. “As mind-blowing and amazing as DALL·E is, like the paint brush, it too must be guided by the artist. It still needs that creative spark, that lightbulb in the mind to innovate — to create that something from nothing.”

28 Jun. 2022
DALL·E 2 Pre-Training Mitigations

In order to share the magic of DALL·E 2 with a broad audience, we needed to reduce the risks associated with powerful image generation models. To this end, we put various guardrails in place to prevent generated images from violating our content policy. This post focuses on pre-training mitigations, a subset of these guardrails which directly modify the data that DALL·E 2 learns from. In particular, DALL·E 2 is trained on hundreds of millions of captioned images from the internet, and we remove and reweight some of these images to change what the model learns.

This post is organized into three sections, each describing a different pre-training mitigation:

  • In the first section, we describe how we filtered out violent and sexual images from DALL·E 2’s training dataset. Without this mitigation, the model would learn to produce graphic or explicit images when prompted for them, and might even return such images unintentionally in response to seemingly innocuous prompts.
  • In the second section, we find that filtering training data can amplify biases, and describe our technique to mitigate this effect. For example, without this mitigation, we noticed that models trained on filtered data sometimes generated more images depicting men and fewer images depicting women compared to models trained on the original dataset.
  • In the final section, we turn to the issue of memorization, finding that models like DALL·E 2 can sometimes reproduce images they were trained on rather than creating novel images. In practice, we found that this image regurgitation is caused by images that are replicated many times in the dataset, and mitigate the issue by removing images that are visually similar to other images in the dataset.
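The deduplication idea in the last bullet can be illustrated with a brute-force sketch: treat two images as near-duplicates when the cosine similarity of their feature vectors exceeds a threshold, and keep only the first image of each such group. The feature vectors, the 0.95 threshold, and the greedy keep-first policy are assumptions for illustration. Comparing every pair like this is quadratic, so at hundreds of millions of images it would need clustering or approximate nearest-neighbor search instead.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def deduplicate(features, threshold=0.95):
    """Greedy near-duplicate removal: return indices of images to keep.
    An image is dropped if it is too similar to any image already kept."""
    kept = []
    for i, f in enumerate(features):
        if all(cosine(f, features[j]) < threshold for j in kept):
            kept.append(i)
    return kept
```

Because an image replicated many times collapses to a single kept copy, the model no longer sees it over and over during training, which is the repetition the bullet above identifies as the cause of regurgitation.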

Reducing Graphic and Explicit Training Data

Since training data shapes the capabilities of any learned model, data filtering is a powerful tool for limiting undesirable model capabilities. We applied this approach to two categories—images depicting graphic violence and sexual content—by using classifiers to filter images in these categories out of the dataset before training DALL·E 2. We trained these image classifiers in-house and are continuing to study the effects of dataset filtering on our trained model.

To train our image classifiers, we reused an approach that we had previously employed to filter training data for GLIDE. The basic steps to this approach are as follows: first, we create a specification for the image categories we would like to label; second, we gather a few hundred positive and negative examples for each category; third, we use an active learning procedure to gather more data and improve the precision/recall trade-off; and finally, we run the resulting classifier on the entire dataset with a conservative classification threshold to favor recall over precision. To set these thresholds, we prioritized filtering out all of the bad data over leaving in all of the good data. This is because we can always fine-tune our model with more data later to teach it new things, but it’s much harder to make the model forget something that it has already learned.
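Favoring recall over precision amounts to choosing the highest score threshold that still catches a target fraction of the known positives in a labeled validation set. A minimal sketch, where the scores, labels, and the 0.99 default target are hypothetical inputs:

```python
import math


def threshold_for_recall(scores, labels, target_recall=0.99):
    """Largest threshold t such that flagging every score >= t still catches at
    least `target_recall` of the positive examples (label 1)."""
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not pos:
        raise ValueError("need at least one positive example")
    # Number of positives we must flag; the small epsilon guards against
    # floating-point products like 0.75 * 4 landing just above an integer.
    needed = max(1, math.ceil(target_recall * len(pos) - 1e-9))
    return pos[needed - 1]
```

Lowering the threshold this way flags more benign images too; for a training-data filter that is an acceptable cost, since discarding some good images is cheaper than teaching the model something it cannot easily forget.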

We start with a small dataset of labeled images (top of figure). We then train a classifier on this data. The active learning process then uses the current classifier to select a handful of unlabeled images that are likely to improve classifier performance. Finally, humans produce labels for these images, adding them to the labeled dataset. The process can be repeated to iteratively improve the classifier’s performance.

During the active learning phase, we iteratively improved our classifiers by gathering human labels for potentially difficult or misclassified images. Notably, we used two active learning techniques to choose images from our dataset (which contains hundreds of millions of unlabeled images) to present to humans for labeling. First, to reduce our classifier’s false positive rate (i.e., the frequency with which it misclassifies a benign image as violent or sexual), we assigned human labels to images that the current model classified as positive. For this step to work well, we tuned our classification threshold for nearly 100% recall but a high false-positive rate; this way, our labelers were mostly labeling truly negative cases. While this technique helps to reduce false positives and reduces the need for labelers to look at potentially harmful images, it does not help find more positive cases that the model is currently missing.
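The first technique can be sketched as one selection step in the active learning loop: from the unlabeled pool, pick images the current classifier flags under its high-recall threshold and send them to labelers. The uniform random sampling, the `budget` parameter, and the function name are assumptions for illustration, not the documented selection policy.

```python
import random


def sample_flagged_for_labeling(unlabeled_scores, threshold, budget, seed=0):
    """Choose up to `budget` unlabeled images that the current classifier flags
    (score >= threshold). With a threshold tuned for near-100% recall, most of
    these are benign, so their labels mainly teach the classifier its false positives."""
    flagged = [i for i, s in enumerate(unlabeled_scores) if s >= threshold]
    rng = random.Random(seed)  # uniform sampling is an assumption of this sketch
    return sorted(rng.sample(flagged, min(budget, len(flagged))))
```

Each round, the newly labeled images join the labeled dataset, the classifier is retrained, and the selection runs again on updated scores.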

To reduce our classifier’s false negative rate, we employed a second active learning technique: nearest neighbor search. In particular, we ran many-fold cross-validation to find positive samples in our current labeled dataset which the model tended to misclassify as negative (to do this, we literally trained hundreds of versions of the classifier with different train-validation splits). We then scanned our large collection of unlabeled images for nearest neighbors of these samples in a perceptual feature space, and assigned human labels to the discovered images. Thanks to our compute infrastructure, it was trivial to scale up both classifier training and nearest neighbor search to many GPUs, allowing the active learning step to take place over a number of minutes rather than hours or days.
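The two stages can be sketched as follows, assuming each image is represented by a perceptual feature vector (a tuple here) and that `train_fn` is a hypothetical routine returning a scoring function. For brevity this uses a single k-fold split rather than the hundreds of random train-validation splits described above, and a brute-force nearest neighbor search in place of a distributed one.

```python
import math

def cv_false_negatives(examples, train_fn, n_folds=5, threshold=0.5):
    """Labeled positives that a model trained without them scores below threshold.

    examples: list of (feature_vector, label) pairs.
    """
    missed = []
    for fold in range(n_folds):
        train = [ex for i, ex in enumerate(examples) if i % n_folds != fold]
        held_out = [ex for i, ex in enumerate(examples) if i % n_folds == fold]
        score = train_fn(train)
        missed += [x for x, y in held_out if y == 1 and score(x) < threshold]
    return missed

def nearest_neighbors(query, pool, k):
    """Brute-force k nearest neighbors of `query` among feature vectors in `pool`."""
    return sorted(pool, key=lambda v: math.dist(query, v))[:k]

def false_negative_candidates(examples, unlabeled_features, train_fn, k=3):
    """Unlabeled images near hard positives: candidates to send to human labelers."""
    candidates = []
    for x in cv_false_negatives(examples, train_fn):
        for v in nearest_neighbors(x, unlabeled_features, k):
            if v not in candidates:
                candidates.append(v)
    return candidates
```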

To verify the effectiveness of our data filters, we trained two GLIDE models with the same hyperparameters: one on unfiltered data, and one on the dataset after filtering. We refer to the former model as the unfiltered model, and the latter as the filtered model. As expected, we found that the filtered model generally produced less explicit or graphic content in response to requests for this kind of content. However, we also found an unexpected side-effect of data filtering: it created or amplified the model’s biases towards certain demographics.

Generations for the prompt “military protest” from our unfiltered model (left) and filtered model (right). Notably, the filtered model almost never produces images of guns.

Fixing Bias Introduced by Data Filters

Generative models attempt to match the distribution of their training data, including any biases therein. As a result, filtering the training data has the potential to create or amplify biases in downstream models. In general, fixing biases in the original dataset is a difficult sociotechnical task that we continue to study, and is beyond the scope of this post. The problem we address here is the amplification of biases caused specifically by data filtering itself. With our approach, we aim to prevent the filtered model from being more biased than the unfiltered model, essentially reducing the distribution shift caused by data filtering.

As a concrete example of bias amplification due to filtering, consider the prompt “a ceo”. When our unfiltered model generated images for this prompt, it tended to produce more images of men than women, and we expect that most of this bias is a reflection of our current training data. However, when we ran the same prompt through our filtered model, the bias appeared to be amplified; the generations were almost exclusively images of men.

We hypothesize that this particular case of bias amplification comes from two places: first, even if women and men have roughly equal representation in the original dataset, the dataset may be biased toward presenting women in more sexualized contexts; and second, our classifiers themselves may be biased either due to implementation or class definition, despite our efforts to ensure that this was not the case during the data collection and validation phases. Due to both of these effects, our filter may remove more images of women than men, which changes the gender ratio that the model observes in training.

To investigate filter-induced bias more thoroughly, we wanted a way to measure how much our data filters were affecting the bias towards various concepts. Notably, our violence and sexual content filters are purely image-based, but the multimodal nature of our dataset allows us to directly measure the effects of these filters on text. Since every image is accompanied by a text caption, we were able to look at the relative frequency of hand-selected keywords across the filtered and unfiltered datasets to estimate how much the filters were affecting any given concept.

To put this into practice, we used Apache Spark to compute the frequencies of a handful of keywords (e.g., “parent”, “woman”, “kid”) over all of the captions in both our filtered and unfiltered datasets. Even though our dataset contains hundreds of millions of text-image pairs, computing these keyword frequencies only took a few minutes using our compute cluster.

After computing keyword frequencies, we were able to confirm that our dataset filters had indeed skewed the frequencies of certain keywords more than others. For example, the filters reduced the frequency of the word “woman” by 14%, while the frequency of the word “man” was only reduced by 6%. This confirmed, on a large scale, what we had already observed anecdotally by sampling from GLIDE models trained on both datasets.
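On a single machine, the same measurement fits in a few lines of plain Python (standing in for the Spark job). We assume here that "frequency" means whole-word occurrences divided by the number of captions; the post does not pin down the normalization.

```python
import re
from collections import Counter

def keyword_counts(captions, keywords):
    """Whole-word occurrences of each keyword across a list of captions."""
    counts = Counter()
    for caption in captions:
        for word in re.findall(r"[a-z]+", caption.lower()):
            if word in keywords:
                counts[word] += 1
    return counts

def relative_reduction(unfiltered, filtered, keywords):
    """Fractional drop in each keyword's per-caption frequency caused by filtering.

    Positive values mean the keyword became rarer after filtering; each keyword
    must appear at least once in the unfiltered captions.
    """
    before = keyword_counts(unfiltered, keywords)
    after = keyword_counts(filtered, keywords)
    return {
        kw: 1 - (after[kw] / len(filtered)) / (before[kw] / len(unfiltered))
        for kw in keywords
    }
```

Because matching is on whole words, “woman” does not also count as “man”.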

An illustration of dataset reweighting. We start with a balanced dataset (left). If our filter affects one category more than another, it can create a biased dataset (middle). Using reweighting, we effectively “repeat” some data more than others, allowing us to rebalance the bias caused by the filters (right).

Now that we had a proxy for measuring filter-induced bias, we needed a way to mitigate it. To tackle this problem, we aimed to re-weight the filtered dataset so that its distribution better matched the distribution of unfiltered images. As a toy example to illustrate this idea, suppose our dataset consists of 50% cat photos and 50% dog photos, but our data filters remove 75% of dogs but only 50% of cats. The final dataset would be ⅔ cats and ⅓ dogs, and a likelihood-based generative model trained on this dataset would likely generate more images of cats than dogs. We can fix this imbalance by multiplying the training loss of every image of a dog by 2, emulating the effect of repeating every dog image twice. It turns out that we can scale this approach to our real datasets and models in a way that is largely automatic; that is, we needn’t hand-select the features that we want to reweight.
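The cat/dog arithmetic can be checked exactly with fractions. This sketch weights each class by the inverse of the fraction the filter kept, scaled so the least-filtered class has weight 1; the variable names are ours.

```python
from fractions import Fraction as F

share = {"cat": F(1, 2), "dog": F(1, 2)}   # unfiltered dataset
kept = {"cat": F(1, 2), "dog": F(1, 4)}    # filter keeps 50% of cats, 25% of dogs

def normalize(d):
    total = sum(d.values())
    return {k: v / total for k, v in d.items()}

filtered = normalize({c: share[c] * kept[c] for c in share})

# Weight each class by 1 / kept, scaled so the least-filtered class has weight 1.
raw = {c: 1 / kept[c] for c in kept}
weights = {c: w / min(raw.values()) for c, w in raw.items()}

reweighted = normalize({c: filtered[c] * weights[c] for c in filtered})
print(filtered)    # cat 2/3, dog 1/3
print(weights)     # cat 1, dog 2
print(reweighted)  # cat 1/2, dog 1/2
```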

We compute weights for images in the filtered dataset using probabilities from a special classifier, similar to the approach used by Choi et al. (2019). To train this classifier, we uniformly sample images from both datasets and predict which dataset the image came from. In particular, this model predicts P(unfiltered|image), given a prior P(unfiltered) = 0.5. In practice, we don’t want this model to be too powerful, or else it might learn the exact function implemented by our filters in the first place. Instead, we want the model to be smoother than our original data filters, capturing broad categories that are affected by the filters while still being unsure about whether a particular image would be filtered or not. To this end, we trained a linear probe on top of a small CLIP model.
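A deliberately weak dataset discriminator can be sketched as logistic regression on fixed feature vectors. Here the features are toy stand-ins for frozen CLIP embeddings (an assumption; only the linear layer is trained), and drawing equal numbers of examples from each dataset gives the prior P(unfiltered) = 0.5. The learning rate and epoch count are arbitrary illustration values.

```python
import math
import random

def train_linear_probe(unfiltered_feats, filtered_feats, epochs=30, lr=0.05, seed=0):
    """Logistic regression predicting P(unfiltered | features); returns the logit f(x)."""
    rng = random.Random(seed)
    n = min(len(unfiltered_feats), len(filtered_feats))     # balanced classes
    data = [(x, 1.0) for x in rng.sample(unfiltered_feats, n)]
    data += [(x, 0.0) for x in rng.sample(filtered_feats, n)]
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            logit = sum(wi * xi for wi, xi in zip(w, x)) + b
            grad = 1 / (1 + math.exp(-logit)) - y           # d(log loss)/d(logit)
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b
```

Keeping the model linear and lightly trained is what makes it "smoother" than the filters: it can learn that a region of feature space was depleted without memorizing which individual images were removed.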

Once we have a classifier which predicts the probability that an image is from the unfiltered dataset, we still need to convert this prediction into a weight for the image. For example, suppose that P(unfiltered|image) = 0.8. This means that the sample is 4 times more likely to be found in the unfiltered data than the filtered data, and a weight of 4 should correct the imbalance. More generally, we can use the weight P(unfiltered|image)/P(filtered|image).[1]
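The conversion from classifier output to weight is a one-liner. The sketch below also checks the identity from footnote 1: when P(unfiltered|image) = sigmoid(f(x)), the odds equal exp(f(x)), so the weight can be read directly off the logit.

```python
import math

def importance_weight(p_unfiltered):
    """Weight P(unfiltered|x) / P(filtered|x) for an image in the filtered dataset."""
    return p_unfiltered / (1 - p_unfiltered)

def weight_from_logit(logit):
    """Same weight when P(unfiltered|x) = sigmoid(logit)."""
    return math.exp(logit)

print(importance_weight(0.8))           # ≈ 4.0
print(weight_from_logit(math.log(4)))   # ≈ 4.0
```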

How well does this reweighting scheme actually mitigate the amplified bias? When we fine-tuned our previous filtered model with the new weighting scheme, the fine-tuned model’s behavior much more closely matched the unfiltered model on the biased examples we had previously found. While this was encouraging, we also wanted to evaluate this mitigation more thoroughly using our keyword-based bias heuristic. To measure keyword frequencies while taking our new weighting scheme into account, we can simply weight every instance of a keyword in the filtered dataset by the weight of the sample that contains it. Doing this, we get a new set of keyword frequencies that reflect the sample weights in the filtered dataset.
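Weighting each keyword occurrence by its sample's weight only changes the counting step. In this sketch we normalize by the total weight, our own assumption, so that uniform weights of 1 reproduce the ordinary per-caption frequency.

```python
import re
from collections import Counter

def weighted_keyword_frequencies(captions, weights, keywords):
    """Per-caption keyword frequency where each occurrence counts with its sample's weight."""
    counts = Counter()
    for caption, weight in zip(captions, weights):
        for word in re.findall(r"[a-z]+", caption.lower()):
            if word in keywords:
                counts[word] += weight
    total_weight = sum(weights)
    return {kw: counts[kw] / total_weight for kw in keywords}
```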

Across most of the keywords we checked, the reweighting scheme reduced the frequency change induced by filtering. For our previous examples of “man” and “woman”, the relative frequency reductions became 1% and –1%, respectively, down from 6% and 14%. While this metric is just a proxy for actual filtering bias, it is reassuring that our image-based reweighting scheme improves a text-based metric so significantly.

We are continuing to investigate remaining biases in DALL·E 2, in part through larger evaluations of the model’s behavior and investigations of how filtering impacted bias and capability development.

Preventing Image Regurgitation

We observed that our internal predecessors to DALL·E 2 would sometimes reproduce training images verbatim. This behavior was undesirable, since we would like DALL·E 2 to create original, unique images by default and not just “stitch together” pieces of existing images. Additionally, reproducing training images verbatim can raise legal questions around copyright infringement, ownership, and privacy (if people’s photos were present in training data).

To better understand the issue of image regurgitation, we collected a dataset of prompts that frequently resulted in duplicated images. To do this, we used a trained model to sample images for 50,000 prompts from our training dataset, and sorted the samples by perceptual similarity to the corresponding training image. Finally, we inspected the top matches by hand, finding only a few hundred true duplicate pairs out of the 50k total prompts. Even though the regurgitation rate appeared to be less than 1%, we felt it was necessary to push the rate down to 0 for the reasons stated above.
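The sorting step can be sketched with cosine similarity between feature vectors of each generated sample and its source training image. The perceptual embedding itself is assumed (any fixed image-feature extractor would do); only the ranking logic is shown.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def rank_for_review(pairs):
    """Sort prompt ids by sample-to-training-image similarity, most similar first.

    pairs: (prompt_id, sample_features, training_features) triples. The top of the
    returned list is what a human would inspect for true duplicates.
    """
    scored = [(cosine_similarity(s, t), pid) for pid, s, t in pairs]
    return [pid for _, pid in sorted(scored, reverse=True)]
```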

When we studied our dataset of regurgitated images, we noticed two patterns. First, the images were almost all simple vector graphics, which were likely easy to memorize due to their low information content. Second, and more importantly, the images all had many near-duplicates in the training dataset. For example, there might be a vector graphic which looks like a clock showing the time 1 o’clock—but then we would discover a training sample containing the same clock showing 2 o’clock, and then 3 o’clock, etc. Once we realized this, we used a distributed nearest neighbor search to verify that, indeed, all of the regurgitated images had perceptually similar duplicates in the dataset. Other works have observed a similar phenomenon in large language models, finding that data duplication is strongly linked to memorization.

The above finding suggested that, if we deduplicated our dataset, we might solve the regurgitation problem. To achieve this, we planned to use a neural network to identify groups of images that looked similar, and then remove all but one image from each group.[2] However, this would require checking, for each image, whether it is a duplicate of every other image in the dataset. Since our whole dataset contains hundreds of millions of images, we would naively need to check hundreds of quadrillions of image pairs to find all the duplicates. While this is technically within reach, especially on a large compute cluster, we found a much more efficient alternative that works almost as well at a small fraction of the cost.
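The naive procedure described in footnote 2 keeps image j only if no earlier image lies within the threshold of it. Written directly, as in the sketch below with toy tuples standing in for neural-network features, it compares every pair, which is what makes it quadratic.

```python
import math

def naive_dedup(features, threshold):
    """Indices kept when image j is dropped if some i < j lies within `threshold` of it.

    Compares all N * (N - 1) / 2 pairs.
    """
    keep = []
    for j, vj in enumerate(features):
        if all(math.dist(features[i], vj) >= threshold for i in range(j)):
            keep.append(j)
    return keep
```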

Consider what happens if we cluster our dataset before performing deduplication. Since nearby samples often fall into the same cluster, most of the duplicate pairs would not cross cluster decision boundaries. We could then deduplicate samples within each cluster without checking for duplicates outside of the cluster, while only missing a small fraction of all duplicate pairs. This is much faster than the naive approach, since we no longer have to check every single pair of images.[3] When we tested this approach empirically on a small subset of our data, it found 85% of all duplicate pairs when using K=1024 clusters.
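A minimal sketch of cluster-then-deduplicate, assuming k-means as the clustering (the post does not name the algorithm) and toy 2-D tuples as features. Pairs are only compared within a cluster, which is where the O(N²/K) cost in footnote 3 comes from.

```python
import math
import random
from collections import defaultdict

def _nearest(x, centroids):
    return min(range(len(centroids)), key=lambda c: math.dist(x, centroids[c]))

def kmeans_labels(features, k, iters=10, seed=0):
    """Cluster assignments from a few rounds of Lloyd's k-means (our stand-in clustering)."""
    centroids = random.Random(seed).sample(features, k)
    for _ in range(iters):
        labels = [_nearest(x, centroids) for x in features]
        for c in range(k):
            members = [x for x, lab in zip(features, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return [_nearest(x, centroids) for x in features]

def within_cluster_duplicates(features, labels, threshold):
    """Near-duplicate pairs (i, j), i < j, comparing only images in the same cluster."""
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    pairs = set()
    for members in clusters.values():
        for a, i in enumerate(members):
            for j in members[a + 1:]:
                if math.dist(features[i], features[j]) < threshold:
                    pairs.add((i, j))
    return pairs
```

Any duplicate pair whose two images land in different clusters is silently missed; that is the price paid for skipping cross-cluster comparisons.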

To improve the success rate of the above algorithm, we leveraged one key observation: when you cluster different random subsets of a dataset, the resulting cluster decision boundaries are often quite different. Therefore, if a duplicate pair crosses a cluster boundary for one clustering of the data, the same pair might fall inside a single cluster in a different clustering. The more clusterings you try, the more likely you are to discover a given duplicate pair. In practice, we settled on using five clusterings, which means that we search for duplicates of each image in the union of five different clusters. On a subset of our data, this found 97% of all duplicate pairs.
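The union-over-clusterings idea can be sketched as below. As a simplification (our assumption, not the method described above), each "clustering" is a nearest-center partition around k centers drawn at random from the data rather than a clustering fit on a random subset; the point is only that each run draws different boundaries, and pairs found in any run are kept.

```python
import math
import random

def random_partition(features, k, rng):
    """Assign each image to the nearest of k centers drawn at random from the data."""
    centers = rng.sample(features, k)
    return [min(range(k), key=lambda c: math.dist(x, centers[c])) for x in features]

def dedup_pairs_multi_clustering(features, k, threshold, n_clusterings=5, seed=0):
    """Union of within-cluster near-duplicate pairs over several independent partitions."""
    rng = random.Random(seed)
    pairs = set()
    for _ in range(n_clusterings):
        buckets = {}
        for idx, label in enumerate(random_partition(features, k, rng)):
            buckets.setdefault(label, []).append(idx)
        for members in buckets.values():
            for a, i in enumerate(members):
                for j in members[a + 1:]:
                    if math.dist(features[i], features[j]) < threshold:
                        pairs.add((i, j))
    return pairs
```

The result is always a subset of the pairs the naive all-pairs search would find; each extra clustering adds one more pass of within-cluster comparisons, so cost grows linearly with `n_clusterings`.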

Surprisingly, almost a quarter of our dataset was removed by deduplication. When we looked at the near-duplicate pairs that were found, many of them included meaningful changes. Recall the clock example from above: the dataset might include many images of the same clock at different times of day. While these images are likely to make the model memorize this particular clock’s appearance, they might also help the model learn to distinguish between times of day on a clock. Given how much data was removed, we were worried that removing images like this might have hurt the model’s performance.

To test the effect of deduplication on our models, we trained two models with identical hyperparameters: one on the full dataset, and one on the deduplicated version of the dataset. To compare the models, we used the same human evaluations we used to evaluate our original GLIDE model. Surprisingly, we found that human evaluators slightly preferred the model trained on deduplicated data, suggesting that the large amount of redundant images in the dataset was actually hurting performance.

Once we had a model trained on deduplicated data, we reran the regurgitation search we had previously done over 50k prompts from the training dataset. We found that the new model never regurgitated a training image when given the exact prompt for the image from the training dataset. To take this test another step further, we also performed a nearest neighbor search over the entire training dataset for each of the 50k generated images. This way, we thought we might catch the model regurgitating a different image than the one associated with a given prompt. Even with this more thorough check, we never found a case of image regurgitation.

Next Steps

While all of the mitigations discussed above represent significant progress towards our goal of reducing the risks associated with DALL·E 2, each mitigation still has room to improve:

  • Better pre-training filters could allow us to train DALL·E 2 on more data and potentially further reduce bias in the model. Our current filters are tuned for a low miss-rate at the cost of many false positives. As a result, we filtered out roughly 5% of our entire dataset even though most of these filtered images do not violate our content policy at all. Improving our filters could allow us to reclaim some of this training data.
  • Bias is introduced and potentially amplified at many stages of system development and deployment. Evaluating and mitigating the bias in systems like DALL·E 2 and the harm induced by this bias is an important interdisciplinary problem that we continue to study at OpenAI as part of our broader mission. Our work on this includes building evaluations to better understand the problem, curating new datasets, and applying techniques like human feedback and fine-tuning to build more robust and representative technologies.
  • It is also crucial that we continue to study memorization and generalization in deep learning systems. While deduplication is a good first step towards preventing memorization, it does not tell us everything there is to learn about why or how models like DALL·E 2 memorize training data.


Alex Nichol, Aditya Ramesh, Pamela Mishkin, Prafulla Dhariwal, Joanne Jang, Mark Chen

Writing contributions from
Greg Brockman, Aditya Ramesh, Pamela Mishkin, Mark Chen, Pranav Shyam, Casey Chu, Che Chang, Miles Brundage


  1. When we parametrize P(unfiltered|image) as sigmoid(f(x)), the weight is then exp(f(x)). This can be derived using the definition of the sigmoid:
    $\frac{1/(1+e^{-f(x)})}{1 - 1/(1+e^{-f(x)})}$
    $= \frac{1/(1+e^{-f(x)})}{(1+e^{-f(x)} - 1)/(1+e^{-f(x)})}$
    $= \frac{1/(1+e^{-f(x)})}{e^{-f(x)}/(1+e^{-f(x)})}$
    $= \frac{1}{e^{-f(x)}} = e^{f(x)}$ ↩︎

  2. To achieve this, we can compute a feature vector $v_i$ for every training image $i$, and then remove all images $j$ such that there exists an $i < j$ where $\|v_i - v_j\| < \text{threshold}$. To solve this problem naively, we would need to compute every pairwise distance $\|v_i - v_j\|$, a task that scales quadratically with the size of our dataset. ↩︎

  3. Letting $K$ represent the number of clusters and $N$ the dataset size, this approach only requires $O(K \cdot (N/K)^2) = O(N^2/K)$ pairwise distance calculations, assuming clusters of roughly equal size, rather than the full $O(N^2)$. The trade-off is that duplicate pairs split across a cluster boundary are not found. ↩︎

23 Jun. 2022
Learning to Play Minecraft with Video PreTraining (VPT)

We trained a neural network to play Minecraft by Video PreTraining (VPT) on a massive unlabeled video dataset of human Minecraft play, while using only a small amount of labeled contractor data. With fine-tuning, our model can learn to craft diamond tools, a task that usually takes proficient humans over 20 minutes (24,000 actions). Our model uses the native human interface of keypresses and mouse movements, making it quite general, and represents a step towards general computer-using agents.


The internet contains an enormous amount of publicly available videos that we can learn from. You can watch a person make a gorgeous presentation, a digital artist draw a beautiful sunset, and a Minecraft player build an intricate house. However, these videos only provide a record of what happened but not precisely how it was achieved, i.e. you will not know the exact sequence of mouse movements and keys pressed. If we would like to build large-scale foundation models in these domains as we’ve done in language with GPT, this lack of action labels poses a new challenge not present in the language domain, where “action labels” are simply the next words in a sentence.

In order to utilize the wealth of unlabeled video data available on the internet, we introduce a novel, yet simple, semi-supervised imitation learning method: Video PreTraining (VPT). We start by gathering a small dataset from contractors where we record not only their video, but also the actions they took, which in our case are keypresses and mouse movements. With this data we train an inverse dynamics model (IDM), which predicts the action being taken at each step in the video. Importantly, the IDM can use past and future information to guess the action at each step. This task is much easier and thus requires far less data than the behavioral cloning task of predicting actions given past video frames only, which requires inferring what the person wants to do and how to accomplish it. We can then use the trained IDM to label a much larger dataset of online videos and learn to act via behavioral cloning.
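To make the IDM/behavioral-cloning distinction concrete, here is a toy sketch (our own illustration, not the VPT code): a "frame" is just an agent's 1-D position and an "action" is the step taken. Because the IDM may look at frame t+1, labeling is trivial, whereas the behavioral-cloning pairs it produces expose only frames up to t, so a policy must predict the action without seeing its outcome. All function names are hypothetical.

```python
def idm_label(frames, t):
    """Toy inverse dynamics model: infer the action at step t from frames t and t+1."""
    return frames[t + 1] - frames[t]

def pseudo_label(video):
    """Label every step of an unlabeled video with the toy IDM."""
    return [idm_label(video, t) for t in range(len(video) - 1)]

def bc_dataset(video, actions):
    """Behavioral-cloning pairs: frames up to and including t -> action at t."""
    return [(tuple(video[: t + 1]), a) for t, a in enumerate(actions)]

video = [0, 1, 1, 3, 2]          # unlabeled "footage"
actions = pseudo_label(video)    # [1, 0, 2, -1]
pairs = bc_dataset(video, actions)
```

In VPT both models are neural networks over video frames; the toy keeps only the asymmetry in what each one is allowed to see.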

VPT method overview

VPT Zero-Shot Results

We chose to validate our method in Minecraft because it (1) is one of the most actively played video games in the world and thus has a wealth of freely available video data and (2) is open-ended with a wide variety of things to do, similar to real-world applications such as computer usage. Unlike prior works in Minecraft that use simplified action spaces aimed at easing exploration, our AI uses the much more generally applicable, though also much more difficult, native human interface: 20Hz framerate with the mouse and keyboard.

Trained on 70,000 hours of IDM-labeled online video, our behavioral cloning model (the “VPT foundation model”) accomplishes tasks in Minecraft that are nearly impossible to achieve with reinforcement learning from scratch. It learns to chop down trees to collect logs, craft those logs into planks, and then craft those planks into a crafting table; this sequence takes a human proficient in Minecraft approximately 50 seconds or 1,000 consecutive game actions.

Sequence of items required to craft a crafting table, labeled with the median time it takes proficient humans to reach each step
Crafting a crafting table “zero shot” (i.e., after pre-training only, without additional fine-tuning)

Additionally, the model performs other complex skills humans often do in the game, such as swimming, hunting animals for food, and eating that food. It also learned the skill of “pillar jumping”, a common behavior in Minecraft of elevating yourself by repeatedly jumping and placing a block underneath yourself.

Swimming (zero-shot)
Hunting animals (zero-shot)
Eating food (zero-shot)
Pillar jumping (zero-shot)

Fine-tuning with Behavioral Cloning

Foundation models are designed to have a broad behavior profile and be generally capable across a wide variety of tasks. To incorporate new knowledge or allow them to specialize on a narrower task distribution, it is common practice to fine-tune these models to smaller, more specific datasets. As a case study into how well the VPT foundation model can be fine-tuned to downstream datasets, we asked our contractors to play for 10 minutes in brand new Minecraft worlds and build a house from basic Minecraft materials. We hoped that this would amplify the foundation model’s ability to reliably perform “early game” skills such as building crafting tables. When fine-tuning to this dataset, not only do we see a massive improvement in reliably performing the early game skills already present in the foundation model, but the fine-tuned model also learns to go even deeper into the technology tree by crafting both wooden and stone tools. Sometimes we even see some rudimentary shelter construction and the agent searching through villages, including raiding chests.

Sequence of items required to craft a stone pickaxe, labeled with the median time it takes proficient humans to reach each step
Improved early game behavior from BC fine-tuning
Crafting a stone pickaxe
Constructing a rudimentary wooden shelter
Searching through a village

Data Scaling

Perhaps the most important hypothesis of our work is that it is far more effective to use labeled contractor data to train an IDM (as part of the VPT pipeline) than it is to directly train a BC foundation model from that same small contractor dataset. To validate this hypothesis we train foundation models on increasing amounts of data from 1 to 70,000 hours. Those trained on under 2,000 hours of data are trained on the contractor data with ground-truth labels that were originally collected to train the IDM, and those trained on over 2,000 hours are trained on internet data labeled with our IDM. We then take each foundation model and fine-tune it to the house building dataset described in the previous section.

Effect of foundation model training data on fine-tuning

As foundation model data increases, we generally see an increase in crafting ability, and only at the largest data scale do we see the emergence of stone tool crafting.

Fine-Tuning with Reinforcement Learning

When it is possible to specify a reward function, reinforcement learning (RL) can be a powerful method for eliciting high, potentially even super-human, performance. However, many tasks require overcoming hard exploration challenges, and most RL methods tackle these with random exploration priors, e.g. models are often incentivized to act randomly via entropy bonuses. The VPT model should be a much better prior for RL because emulating human behavior is likely much more helpful than taking random actions. We set our model the challenging task of collecting a diamond pickaxe, an unprecedented capability in Minecraft made all the more difficult when using the native human interface.

Crafting a diamond pickaxe requires a long and complicated sequence of subtasks. To make this task tractable, we reward agents for each item in the sequence.
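A minimal sketch of this kind of per-item reward, assuming the reward is paid once, the first time each item in the sequence appears in the inventory. The item list below and the reward value of 1 are our own placeholders; the post does not give them.

```python
# Hypothetical milestone order; not the actual item list or reward values used for VPT.
MILESTONES = ["log", "planks", "stick", "crafting_table", "wooden_pickaxe",
              "cobblestone", "stone_pickaxe", "furnace", "iron_ore",
              "iron_ingot", "iron_pickaxe", "diamond", "diamond_pickaxe"]

class MilestoneReward:
    """Pay reward 1 the first time each milestone item enters the inventory."""

    def __init__(self, milestones=MILESTONES):
        self.milestones = list(milestones)
        self.collected = set()

    def __call__(self, inventory):
        """inventory: dict item -> count at the current step."""
        reward = 0.0
        for item in self.milestones:
            if inventory.get(item, 0) > 0 and item not in self.collected:
                self.collected.add(item)
                reward += 1.0
        return reward
```

Paying each milestone only once keeps the agent from farming easy early items instead of progressing down the sequence.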

RL fine-tuned VPT model crafting a diamond pickaxe

We found that an RL policy trained from a random initialization (the standard RL method) barely achieves any reward, never learning to collect logs and only rarely collecting sticks. In stark contrast, fine-tuning from a VPT model not only learns to craft diamond pickaxes (which it does in 2.5% of 10-minute Minecraft episodes), but it even has a human-level success rate at collecting all items leading up to the diamond pickaxe. This is the first time anyone has shown a computer agent capable of crafting diamond tools in Minecraft, which takes humans over 20 minutes (24,000 actions) on average.

Reward over episodes


VPT paves the path toward allowing agents to learn to act by watching the vast numbers of videos on the internet. Compared to generative video modeling or contrastive methods that would only yield representational priors, VPT offers the exciting possibility of directly learning large scale behavioral priors in more domains than just language. While we only experiment in Minecraft, the game is very open-ended and the native human interface (mouse and keyboard) is very generic, so we believe our results bode well for other similar domains, e.g. computer usage.

For more information, please see our paper. We are also open sourcing our contractor data, Minecraft environment, model code, and model weights, which we hope will aid future research into VPT. Furthermore, we have partnered with the MineRL NeurIPS competition this year. Contestants can use and fine-tune our models to try to solve many difficult tasks in Minecraft. Those interested can check out the competition webpage and compete for a blue-sky prize of $100,000 in addition to a regular prize pool of $20,000. Grants are available to self-identified underrepresented groups and individuals.

This was a large effort by a dedicated team. Each author made huge contributions on many fronts over long time periods. All members were full time on the project for over six months. BB, IA, PZ, and JC were on the original VPT project team, and thus were involved for even longer (over a year). Aside from those original team members, author order is random. It was also randomized between IA and PZ.