OpenAI launches GPT-4o: free to use and more talkative
On May 14, the U.S. artificial intelligence company OpenAI brought users another surprise, announcing the launch of its latest flagship generative AI model, GPT-4o, at its spring event that day.
Compared with GPT-4 Turbo, GPT-4o is half the price, twice as fast, and has rate limits five times higher.
Beyond its multimodal abilities, the launch of GPT-4o comes with a major highlight: it is free. The model will reportedly be rolled out across OpenAI's products in stages over the next few weeks.
In the early morning of May 14, Beijing time, OpenAI Chief Technology Officer Mira Murati unveiled the new GPT-4o model and a desktop app at the event, demonstrating a series of the company's innovations.
GPT-4o has significant advantages over the previous generation in both speed and price. It can handle 50 different languages and can process text, images, audio, and other modalities, which together deliver a more natural and fluid interactive experience.
At the event, Murati highlighted several key points.
First, the new GPT-4o model is free for all users, with no registration required.
Previously, free ChatGPT users could only use GPT-3.5. After the update, free users can use GPT-4o for data analysis, image analysis, web search, access to the GPT Store, and other operations.
Free access does, however, come with a message cap: once a user exceeds the limit, the model falls back to GPT-3.5. Paid users get a higher cap, at least five times that of free users.
On the same day, OpenAI CEO Sam Altman tweeted that the new GPT-4o was OpenAI’s “best model ever.”
Second, ChatGPT has gained a desktop app.
Using the ChatGPT app on your computer
Mac users will receive a ChatGPT desktop application designed for macOS. With a keyboard shortcut, users can capture a screenshot of their screen, send it to ChatGPT, and ask questions about it. This lightweight experience integrates seamlessly into a user's workflow, cutting the time spent logging into the web version.
OpenAI also said that a Windows version will be available later this year.
Murati said this is also the first time the company has focused on improvements in ease of use.
In addition, ChatGPT's user interface has been refreshed, aiming to make interaction smoother and more natural so that users can focus on working efficiently with ChatGPT rather than on operating the interface itself.
After the event, the industry was in an uproar. Some media outlets said this heralds "an evolution in the smart era": in the future, the mobile internet may be condensed into a single program that users rely on for all their needs, such as texting, navigation, identifying objects, and hailing a taxi.
At the press conference, GPT-4o’s ability to process multiple modalities such as text, images, and audio was constantly mentioned. According to reports, GPT-4o supports input of any combination of text, audio, and images, and generates output of any combination of text, audio, and images.
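To make "any combination of text, audio, and images" concrete, here is a minimal sketch of how a combined text-and-image request might be assembled in the message format used by OpenAI's Chat Completions API. The image URL is a placeholder, and no network call is made; this only illustrates the shape of a multimodal request:

```python
# Sketch: assembling a multimodal (text + image) request body in the
# content-parts format used by OpenAI's Chat Completions API.
# The URL below is a placeholder; nothing is actually sent.

def build_multimodal_request(prompt: str, image_url: str) -> dict:
    """Return a request body mixing a text part and an image part."""
    return {
        "model": "gpt-4o",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

request = build_multimodal_request(
    "What is shown in this image?",
    "https://example.com/photo.jpg",  # placeholder
)
print(request["model"])  # gpt-4o
```

Because a single `content` list can hold several parts, mixing modalities in one turn is just a matter of appending more parts to it.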
A few days earlier, Altman had said on a podcast that OpenAI would improve the quality of ChatGPT's voice feature, expressing his belief that voice is an important interaction paradigm of the future.
The "o" in GPT-4o stands for "omni": the model can reason across audio, vision, and text in real time.
GPT-4o can respond to audio input in as little as 232 milliseconds, with an average of 320 milliseconds, comparable to human response times in conversation. In other words, it achieves effectively "real-time" responses, with no more awkward waits of several seconds before getting an answer.
At the same time, just as when chatting with a real person, users can interrupt GPT-4o mid-response and make further requests, such as changing the topic, asking it to change its intonation, or even having it answer in a robotic voice or in song.
The model also picks up on emotion with greater sensitivity and nuance.
At the event, GPT-4o inferred that the presenter was "nervous" from his heavy breathing and guided him to take a deep breath. When praised, it immediately responded: "Stop it, you're making me blush."
The team also demonstrated various GPT-4o features, including real-time translation, walking through how to solve an equation, and recognizing facial expressions.
After witnessing GPT-4o's fluent responses, many users commented that this new model "seems more talkative, and sometimes even a little frivolous."
GPT-4o did, however, make some mistakes. It mistook the presenter's smiling face for a tabletop, and started solving a problem before the equation had been shown.
Last year, Grok, the first large AI model released by Musk's xAI team, became known for its irreverent, "no-taboo" answers to user questions. When introducing the product, the xAI team said: "If you don't like humor, don't use it!"
In fact, both Grok and Pi, the AI chatbot from Inflection AI, co-founded by Mustafa Suleyman, a co-founder of Google's DeepMind, have distinctly personalized characters.
Some commentators say that, by comparison, GPT-4o's ability to handle text, images, and audio reliably puts OpenAI in the lead in this AI race.
Interestingly, Murati described GPT-4o as "magical" when she introduced it, but added that with the product's launch, the company would "eliminate the mystery."
Some analysts pointed out that making GPT-4o free is the key move: it signals that OpenAI is stepping up its efforts to bring large models to market.
Recently, it was revealed that Apple is finalizing an agreement with OpenAI to introduce some of the latter’s technology into the iPhone this year. With this, Apple will be able to provide "chatbots" powered by ChatGPT as part of the artificial intelligence capabilities in iOS 18.
Rumors of a partnership between OpenAI and Apple have long circulated without confirmation, but sharp-eyed observers noted that Apple products were used extensively at the OpenAI event.
At the same time, Google’s 2024 I/O Developer Conference will be held at 1:00 on May 15th, Beijing time, exactly 24 hours after the latest OpenAI conference. It is reported that Google will display its latest artificial intelligence development results at the conference and release the latest developments of the Gemini large model.
At the end of last year, Google announced the launch of what it considers to be the largest and most powerful artificial intelligence model, Gemini, which also has strong understanding and reasoning capabilities in recognizing text, images, and videos.
As the major technology giants show off their strengths in the contest for the top spot, the public is curious to see which will win more favor from the market and the industry.