Researchers from Stanford University and the University of Washington have developed an open-source artificial intelligence (AI) model that is comparable in performance to OpenAI's o1 model. The researchers' main objective was not to create a powerful reasoning-focused model but to understand how the San Francisco-based AI firm instructed its o1 series models to perform test-time scaling. Notably, the researchers were able to showcase the methodology and replicate the model's behaviour at an extremely low cost while using far fewer compute resources.
Researchers Develop s1-32B AI Model
The researchers detailed the methodology and process of developing the model in a study published on the preprint server arXiv. The process involved creating a synthetic dataset from a different AI model and using several techniques, such as supervised fine-tuning (SFT). The model is available in a GitHub listing.
It should be noted that the AI model was not built from scratch. The developers used Qwen2.5-32B-Instruct and distilled it to create the s1-32B large language model (LLM). Released in September 2024, the Qwen model is capable, but given its size and lack of reasoning capabilities, it cannot match up to OpenAI's o1.
During the process, the researchers used the Gemini Flash Thinking application programming interface (API) to generate reasoning traces and responses. A total of 59,000 triplets of questions, reasoning traces (the chain of thought, or CoT), and responses were extracted from the API. A dataset called s1K was then created by selecting a subset of these triplets.
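To make the data structure concrete, the sketch below represents each question/reasoning/response triplet and pulls a small subset out of a larger pool. The article does not state the researchers' selection criteria, so the filter here (drop duplicate questions and short traces, then prefer longer traces as a crude proxy for harder questions) is purely hypothetical, as are the names `Triplet` and `select_subset`.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Triplet:
    question: str
    reasoning: str  # chain-of-thought trace returned by the teacher model
    response: str


def select_subset(pool: List[Triplet], k: int, min_reasoning_chars: int) -> List[Triplet]:
    """Hypothetical selection rule, not the researchers' actual criteria:
    drop duplicate questions (keeping the first occurrence) and short traces,
    then keep the k triplets with the longest reasoning traces."""
    seen = set()
    candidates = []
    for t in pool:
        if t.question in seen or len(t.reasoning) < min_reasoning_chars:
            continue
        seen.add(t.question)
        candidates.append(t)
    # Longer traces stand in for "harder" questions in this sketch only.
    candidates.sort(key=lambda t: len(t.reasoning), reverse=True)
    return candidates[:k]


pool = [
    Triplet("q1", "short", "a"),
    Triplet("q2", "a much longer trace", "b"),
    Triplet("q2", "dup trace that is long", "c"),
    Triplet("q3", "medium trace", "d"),
]
subset = select_subset(pool, k=2, min_reasoning_chars=6)
print([t.question for t in subset])
```

In a real pipeline the pool would hold tens of thousands of triplets and the filter would encode whatever quality, difficulty, or diversity signals the curators actually chose.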
After creating the s1K dataset, the researchers performed supervised fine-tuning on the Qwen2.5-32B-Instruct model. For this, basic fine-tuning hyperparameters were used. The distillation process took 26 minutes of training on 16 Nvidia H100 GPUs.
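For readers unfamiliar with SFT on reasoning data, the sketch below shows one common way a triplet can be turned into a (prompt, target) pair for training. The `<think>`/`</think>` delimiters and the "Question:"/"Answer:" template are assumptions for illustration, not the format the researchers used.

```python
from typing import Tuple

# Hypothetical delimiters and template; the article does not specify either.
THINK_START, THINK_END = "<think>", "</think>"


def to_sft_example(question: str, reasoning: str, response: str) -> Tuple[str, str]:
    """Split one triplet into (prompt, target). In many SFT setups the loss is
    computed only on the target tokens, so the model learns to produce the
    reasoning trace followed by the final answer, given the question."""
    prompt = f"Question: {question}\n"
    target = f"{THINK_START}{reasoning}{THINK_END}\nAnswer: {response}"
    return prompt, target


prompt, target = to_sft_example("What is 2 + 3?", "2 plus 3 equals 5.", "5")
print(prompt + target)
```

Keeping the reasoning inside explicit delimiters is what later makes it possible to detect, and control, where the model's "thinking" phase ends.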
Until this point, the researchers had no idea how OpenAI trained its models to "think" and how it managed to stop the thought process. Without this, a model runs the risk of overthinking indefinitely as it second-guesses its output, wasting valuable processing power.
While fine-tuning the model, the researchers found something interesting. They found
With the s1-32B model, the researchers added a "Wait" command to force it to think beyond its usual inference period. Once it was added, the model began second-guessing and verifying its output. The tag was then used to either shorten or lengthen this test-time scaling phase.
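The idea of lengthening or shortening the thinking phase can be sketched as a decoding loop with a minimum and maximum token budget. This is a simplified illustration assuming a hypothetical `next_token` callable and a hypothetical `</think>` end delimiter: if the model tries to stop thinking too early, the end marker is suppressed and "Wait" is appended instead; if it runs past the maximum, the end marker is forced.

```python
from typing import Callable, List

END_THINK = "</think>"  # hypothetical end-of-thinking delimiter


def budget_forced_thinking(next_token: Callable[[List[str]], str],
                           min_tokens: int, max_tokens: int,
                           wait_phrase: str = "Wait") -> List[str]:
    trace: List[str] = []
    while True:
        if len(trace) >= max_tokens:
            # Budget exhausted: force the thinking phase to end.
            trace.append(END_THINK)
            return trace
        tok = next_token(trace)
        if tok == END_THINK:
            if len(trace) < min_tokens:
                # Too early: suppress the end marker and nudge the model to continue.
                trace.append(wait_phrase)
                continue
            trace.append(END_THINK)
            return trace
        trace.append(tok)


def toy_model(trace: List[str]) -> str:
    # Stand-in for a real LLM: emits two "step" tokens, then tries to stop.
    return END_THINK if len(trace) % 3 == 2 else "step"


print(budget_forced_thinking(toy_model, min_tokens=5, max_tokens=10))
print(budget_forced_thinking(toy_model, min_tokens=5, max_tokens=4))
```

The `wait_phrase` parameter makes it easy to try alternatives such as "Alternatively" or "Hmm"; in a real system `next_token` would wrap an LLM's sampling step rather than a toy function.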
The researchers also experimented with several other phrases, such as "Alternatively" and "Hmm", but found that the best performance metrics were achieved with the "Wait" tag. Since this brings the model close to the performance of o1, the researchers claim that this might be the method OpenAI uses to fine-tune its reasoning models.
A TechCrunch report claims that the researchers were able to create the s1-32B AI model for under $50 (roughly Rs. 4,380), highlighting that creating a post-training structure for reasoning models can be done at an extremely low cost.
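The reported figures allow a quick sanity check. Sixteen H100 GPUs for 26 minutes amounts to about 6.93 GPU-hours, so staying under $50 only requires a rental rate below roughly $7.21 per GPU-hour. The $2 per GPU-hour price below is an assumed figure for illustration, not one from the article, and the arithmetic covers only the reported training run.

```python
NUM_GPUS = 16        # Nvidia H100 GPUs, as reported
TRAIN_MINUTES = 26   # reported training time
BUDGET_USD = 50.0    # reported upper bound on cost

gpu_hours = NUM_GPUS * TRAIN_MINUTES / 60   # 416 GPU-minutes
break_even_rate = BUDGET_USD / gpu_hours    # max $/GPU-hour that stays under budget

HYPOTHETICAL_RATE_USD = 2.0  # assumed rental price per H100-hour, not from the article
estimated_cost = gpu_hours * HYPOTHETICAL_RATE_USD

print(f"{gpu_hours:.2f} GPU-hours, break-even ${break_even_rate:.2f}/GPU-hour, "
      f"estimate ${estimated_cost:.2f} at ${HYPOTHETICAL_RATE_USD:.2f}/GPU-hour")
```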
