(SeaPRwire) – OpenAI released a blog post on Wednesday detailing the guiding principles and operational framework of its Model Spec, the blueprint that governs how ChatGPT, the world's most popular chatbot, behaves.

According to the blog post, the Model Spec serves as an "interface" that clarifies the chatbot's intended conduct, enabling researchers, policymakers, and the general public to examine and debate it. The result is a roughly 100-page manual outlining how OpenAI's systems should resolve conflicts between duties to society, the company itself, developers who build applications on OpenAI's technology, and individual users. The Spec is organized hierarchically: prohibitions on "high severity harms" override instructions from developers or users. It is updated regularly, most recently in December.

Jason Wolfe, who coordinates contributions to the Model Spec from various divisions of OpenAI, describes his role as “a maintainer in the tradition of an open-source side project.” Wolfe notes that researchers have started contacting OpenAI for guidance on creating comparable Model Specs, which motivated the blog post explaining the company’s rationale behind the document. “There’s a lot of material here that could be valuable to people,” he says.

The blog post clarifies, however, that the document is “not an implementation,” prompting questions about how accurately it reflects what truly molds ChatGPT during its training phase. Wolfe describes the mechanism through which the Spec affects model conduct as “complicated.” While portions of the Spec are occasionally incorporated directly into “alignment” training—where AI systems are taught to act appropriately—just as frequently, the principles documented are condensed versions of comprehensive work performed by internal safety teams. “In many instances, the Spec and the training are essentially parallel processes that we maintain in alignment,” Wolfe explains.

The expanding user base of ChatGPT means the guidelines in the Spec could affect hundreds of millions of people worldwide. As of February, roughly 10% of the world's population used the chatbot weekly. The challenge is compounded by OpenAI's rapid internal growth: the firm intends to nearly double its 4,500-person workforce by the end of 2026. According to the blog post, dozens of staff members across the organization, drawn from research, product, and legal departments among others, have contributed content directly to the Spec. Wolfe works especially closely with the leaders of model behavior and policy.

OpenAI isn’t alone among AI firms struggling with how to direct model behavior and gather feedback as AI’s societal influence increases. In January, Anthropic released the “Claude Constitution,” an 80-page paper characterizing the kind of entity it envisions for Claude, its premier model. The two documents diverge significantly in tone: Anthropic’s Constitution resembles a treatise on moral philosophy, while OpenAI’s Spec is more like a collection of legal precedents with illustrations of preferred conduct. Sharan Maiya, a Cambridge University PhD candidate in AI alignment, observes that “the Anthropic Constitution is more philosophical, and the OpenAI Spec is more behavioral.”

The two companies also use their documents differently. Wolfe states that OpenAI's Spec is "first and foremost, a document for people," valuable for establishing agreement on preferred model conduct but further removed from what the system actually internalizes. Conversely, Amanda Askell, the philosopher overseeing Anthropic's Constitution, explains that Anthropic provides its Constitution to Claude "to generate training material itself that helps it comprehend the document." Before the document's formal launch, users found a pre-release version of Anthropic's Constitution reproduced in full in Claude's outputs, indicating the model had memorized the constitutional text. "I do want the document to resonate well with Claude models," Askell told TIME in January.

Google DeepMind, xAI, and Meta have not released comparable documents outlining desired model behavior.

Boundary lines for AI systems have become especially prominent lately. Last month, OpenAI entered into an agreement with the Department of War following Anthropic’s refusal to eliminate restrictions concerning domestic mass surveillance and autonomous weapons. OpenAI subsequently reversed course, with Sam Altman conceding the agreement appeared “opportunistic and sloppy.” A revised version of the pact stipulates that “the AI system shall not be intentionally used for domestic surveillance of U.S. persons and nationals,” though legal specialists disagree on the robustness of these protections.

The Model Spec declares that OpenAI’s models must “never” be utilized to enable mass surveillance.

“I hope that, as much as feasible given the confidential nature of this work, if we modify our policies for classified deployments, we can discover methods to make those modifications open,” says Wolfe.

This article is provided by a third-party content provider. SeaPRwire (https://www.seaprwire.com/) makes no warranties or representations regarding its content.

