Under limited computing power, why are videos generated by a domestic large model comparable to Sora?

Release time: Sep 18, 2024 07:47 AM

Recently, Shanghai-based Xiyu Technology released a number of self-developed multimodal large models at Xuhui Binjiang. Dr. Yan Junjie, the company's founder, also spoke at the 2024 Pujiang Innovation Forum Global Venture Capital Conference as a representative of entrepreneurs. The videos generated by the large model that he played during his speech were impressive. Whether it was a magical skit in the style of the Harry Potter movies or a sci-fi clip of astronauts traveling through space in a spacecraft, the experience they gave the audience was comparable to that of Sora, developed by OpenAI.

Under the condition of limited computing power, how can domestic large models generate high-quality text, pictures, videos, music and voice? Yan Junjie shared his views.

Yan Junjie graduated from the Institute of Automation, Chinese Academy of Sciences. He served as vice president of SenseTime Group and founded Xiyu Technology at the end of 2021. In his view, there are currently three important optimization directions for large artificial intelligence models. First, continuously reducing the model's error rate: most models still have a high error rate, performing impressively at some times and unreliably at others, which has become a major bottleneck preventing models from handling complex tasks. Second, achieving infinite input and output: this is a human ability, but the computation a large model requires grows with the square of the input and output length and will soon reach a limit that available computing power cannot afford, a bottleneck that can only be broken by innovation at the underlying level. Third, multimodality: the model can generate text, sound, pictures, video and other modalities, and interact with users through all of these forms of information.

Video generated by MiniMax large model

"How do we overcome the technical difficulties in these three areas? We believe that, at the same level of capability, faster is better," said Yan Junjie. "Of two models with similar performance, the one that trains and runs inference faster can use computing resources more effectively to iterate over more data, and so ends up with better capabilities. So we believe that faster is better. This is a simple but easily overlooked philosophy."

In pursuit of "speed", the MiniMax team has made a number of technical innovations in its large models. The mixture-of-experts (MoE) architecture is one of them. When this architecture was not yet recognized by most experts, the team decided to be the first in China to achieve a breakthrough on the core MoE algorithm and technology route.

The design idea of the mixture-of-experts model is "specialization in one's field": tasks are classified and then assigned to multiple "experts" to solve. Its counterpart is the dense model, the architecture used by "generalist" models. Compared with a "generalist", a group of "experts" can complete complex tasks more efficiently and professionally, and can greatly increase model capacity without significantly increasing computing cost, making large models at the trillion-parameter level possible. In the abab-text-6.5s large language model developed by Xiyu Technology, the MoE model is 3-5 times faster than the dense model. This large model can handle billions of interactions every day, and MoE plays a key role in that.
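As a rough illustration of the MoE idea, the sketch below routes one input to only its top-scoring "experts" and blends their outputs. It is a generic top-k gating example, not MiniMax's implementation: the gate weights, the toy experts, and the names `make_expert` and `moe_forward` are all assumptions made for this illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of gate scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_expert(scale):
    # Hypothetical "expert": a toy function standing in for a feed-forward sub-network.
    return lambda x: [scale * v for v in x]

def moe_forward(x, gate_weights, experts, top_k=2):
    # One gate score per expert: dot product of the input with that expert's gate vector.
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    probs = softmax(scores)
    # Only the top_k experts are evaluated; the rest cost no compute for this input.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(x)
    for i in chosen:
        weight = probs[i] / norm          # renormalize over the chosen experts
        expert_out = experts[i](x)
        out = [o + weight * y for o, y in zip(out, expert_out)]
    return out, chosen

# Four toy experts, but only two run for this input.
experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
out, chosen = moe_forward([1.0, 0.0], gate_weights, experts, top_k=2)
print(chosen)  # [1, 2]
```

The total number of experts sets the model's capacity, while `top_k` sets the compute spent per input, which is why capacity can grow without a matching growth in cost.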

The linear attention mechanism is another technical innovation by the MiniMax team. Through algorithmic optimization, it turns the quadratic relationship between input length and computational complexity in the traditional model architecture into a linear one, a key step towards "achieving infinite input and output".
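To show how a quadratic cost can become linear, the sketch below uses a generic kernelized form of attention. It is not MiniMax's algorithm, which the article does not describe; the feature map `phi` (elu(x) + 1) and the function names are assumptions chosen for illustration. The first function compares every query with every key. The second summarizes all keys and values once and reuses that summary for each query, giving the same result.

```python
import math

def phi(vec):
    # Assumed positive feature map: elu(x) + 1.
    return [v + 1.0 if v > 0 else math.exp(v) for v in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def quadratic_attention(Q, K, V):
    # Cost grows with len(Q) * len(K): every query is compared with every key.
    d_v = len(V[0])
    out = []
    for q in Q:
        fq = phi(q)
        weights = [dot(fq, phi(k)) for k in K]
        z = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / z for j in range(d_v)])
    return out

def linear_attention(Q, K, V):
    # Cost grows with len(Q) + len(K): keys and values are summarized in one pass.
    d, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d)]   # S = sum over keys of phi(k) v^T
    z = [0.0] * d                         # z = sum over keys of phi(k)
    for k, v in zip(K, V):
        fk = phi(k)
        for i in range(d):
            z[i] += fk[i]
            for j in range(d_v):
                S[i][j] += fk[i] * v[j]
    out = []
    for q in Q:
        fq = phi(q)
        denom = dot(fq, z)
        out.append([sum(fq[i] * S[i][j] for i in range(d)) / denom for j in range(d_v)])
    return out

Q = [[0.5, -1.0], [1.0, 2.0], [-0.3, 0.1]]
K = [[0.2, 0.4], [-1.5, 0.3], [0.7, -0.6]]
V = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0], [3.0, 1.0, -2.0]]
slow = quadratic_attention(Q, K, V)
fast = linear_attention(Q, K, V)
```

The two functions agree because matrix multiplication is associative: the key-value summary is computed first and its size does not depend on sequence length.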

Yan Junjie introduced the models and products developed by MiniMax.

Supported by technologies such as mixture-of-experts models and linear attention, the video model abab-video-1 features a high compression rate, good responsiveness to text prompts, and native support for high-resolution, high-frame-rate video, with a look comparable to that of film. The music model abab-music-1 supports multifunctional end-to-end music generation. It can synthesize instrumental music, a cappella works and other forms of music, and can generate accompaniment and vocals at the same time. It is expected to greatly simplify the music recording and creation process, allowing laypeople to create music. Readers can log in to the web version of "Conch AI" to try creating videos and music.

Video generated by MiniMax large model

Xiyu Technology has also updated the voice model abab-speech-1, which can generate synthesized speech in multiple languages such as Mandarin, Cantonese, Japanese, Korean and Spanish, with a highly human-like quality and delicate, natural emotional variation.

Yan Junjie said that the MiniMax large model currently interacts with end users 3 billion times a day, and each day processes more than 3 trillion text tokens and generates 20 million images and 70,000 hours of speech.

Video generated by MiniMax large model

These 3 billion daily interactions come both from the company's own products, such as "Conch AI" and "Xingye", and from partners on its open platform. For example, Kingsoft Office Software has cooperated with MiniMax so that, through chain of thought, WPS can display the large model's reasoning steps when generating document summaries and answering user questions, improving the transparency and credibility of its answers. The mobile office platform "DingTalk" has gained the ability to generate copy that follows a required format, improving users' productivity. The online literature website "Yuewen" has gained the ability to quickly grasp the overall context of a text, so that in producing audiobooks of novels it can maintain emotional consistency, accurately analyze characters' emotions, and deliver stylized performances. The human resources platform "Zhaopin" has used vertical-industry and full-time-industry data to fine-tune the model, greatly improving the accuracy of AI interview evaluation, job description information extraction, and resume matching.

With the release of its video, music, and voice models, Xiyu Technology has built a full set of multimodal large model products. Yan Junjie revealed that in the next few weeks the company will release the multimodal large model abab7, which will be comparable to GPT-4o in speed and performance and will be opened to partners and end users for testing.
