The model can use complex reasoning skills to solve math and science problems.
All paid OpenAI API users can now access preview and mini versions of the hotly-anticipated o1 large language model.
OpenAI has also enabled streaming for the models, allowing users to recieve responses incrementally as they're being generated. This can result in faster and more efficient responses.
Designed for advanced reasoning and problem-solving, o1-preview performs above PhD-level for a number of scientific areas. It's also better at coding and math than its predecessors. But it responds much more slowly than older models like GPT-4o.
The mini version is more limited than o1-preview, but provides much faster responses.
Before now, the model was only accessible via API for "tier five" developers, who have spent at least $1,000 in the OpenAI platform.
It's also been available to paid ChatGPT users for several weeks. But neither o1-preview nor o1-mini can search the internet through the ChatGPT interface.
Nonetheless, the models will still be useful for users who want to solve complex scientific, coding or mathematical problems that go beyond the reach of GPT-4o.
OpenAI has trained o1 to reason through problems step-by-step, similar to how humans process chains of thought.
This "teaches the model how to think productively using its chain of thought in a highly data-efficient training process," according to a company blog.
This means answers may come more slowly. But they're much more likely to be accurate.
OpenAI says the models respond relatively slowly as they're supposed to spend more time "thinking" through their answers. In other words, for now they're focused on accuracy rather than speed.
Math has long been a weak spot for GPT and other LLMs. But OpenAI says o1-preview is a substantial improvement on GPT-4o.
It correctly answered 83% of the questions in an International Mathematics Olympiad entry exam. GPT-4o, on the other hand, only got 13% of its answers right.
If you're interested in its math capabilities, scroll to the bottom of this article for responses to a math challenge Indie Hackers set each model.
The o1 models' academic strengths also stretch to science. OpenAI says o1-preview is better at biology, physics, chemistry and coding than its predecessors. It even beat 'PhD-level' expert human benchamarks:
The models are also less likely to produce harmful completions about violence, harrassment and other forms of wrongdoing.
You can find a full list of the model's benchmark results on the OpenAI website.
The o1-preview model is much more expensive than GPT-4o, the next smartest OpenAI LLM.
All models are priced according to tokens, which are small units of language slightly shorter than most words.
It costs $15 per million input tokens and $60 per million output tokens. Output is partly so expensive because it includes additional reasoning tokens which weren't a part of previous model responses.
GPT-4o, on the other hand, costs $2.50 per million input tokens and $10 per million output tokens.
o1-mini is also more expensive than GPT-4o at $3 per million input tokens and $12 per million output tokens.
For comparison, GPT-4o-mini costs $0.15 per million input tokens and $0.6 per million output tokens.
Discounts for cached inputs are available for all listed models. Unlike the GPT-4o models, o1 model requests cannot be sent in batches.
If you've made it this far, congratulations! Here are the results of a tricky math problem Indie Hackers asked o1-preview, o1-mini and GPT-4o to solve.
The models were asked to simplify the same expression to get a sense of how 'chain of thought' reasoning works in practice:
The o1-mini model quickly recognized the Pythagorean identity of cos²(𝒙)+sin²(𝒙) = 1 on the left hand side of the equation.
It used this as its starting point to break down and simplify the right hand side to 1. The whole process took six seconds.
GPT-4o took about a minute to respond, also spotting and working back from the Pythagorean identity.
The o1-preview model, however, worked on the right hand side of the equation first, recognizing it could something called Euler's formula to simplify (eᶦˣ + e⁻ᶦˣ)²/4.
It then invoked the Pythagorean identity to break the expression down further.
After a total thinking time of 1 minute and 39 seconds (and a second prompt), the o1-preview model correctly simplified the expression to 1.
I'm really happy for OpenAI, and imma let them finish, but all I hear is: it's expensive and slower. Anyone who already tried it in production please weigh in, is it worth it for regular LLM load? I've played with the "preview" already but the outputs were... different than anything I've seen from previous models. Using the same prompt you have for 4o's output might not be the smoothest "upgrade."
Your writing is always engaging! I appreciate how you emphasize the significance of clear documentation in API design. I’ve integrated EchoAPI into my process, and it has made documentation management so much easier.
I can’t wait to work with the streaming API. My WritPop app right now takes 5 or 6 seconds to update document areas because I am using html, so having it update with the option for a non technical user to see the code changes happen in real time is fun and educational.
I've looked at the score of OpenAI. And see that the scoring of new GPT-4o has already surpass the o1 model. So that might be why o1 becomes available. Besides, 4o is faster in response and can analyze pictures. So maybe I'll just stick to 4o for now.
Anyone know if it supports structured output? It's what has kept us from adopting it. https://platform.openai.com/docs/guides/structured-outputs
I believe it works without any problems. I was able to successfully employ the gpt-4o-2024-08-06 model with structured output through chat.completions. Now I'm trying to implement a solution with structured output for exchanging messages with the Assistant.
No idea - curious to know as well.
I can’t wait to work with the streaming API. My WritPop app right now takes 5 or 6 seconds to update document areas because I am using html, so having it update with the option for a non technical user to see the code changes happen in real time is fun and educational.
hello