How was the "Cao Zhi" large language model born? Daguan Data's CTO tells the story

Release time: Apr 15, 2024 22:54

"Cao Zhi composed a poem in seven steps, and his most famous work, 'Luoshen Fu', is a classic long text in ancient literature. Long text is also the specialty of the 'Cao Zhi' large model: the intelligent analysis and writing of long documents." At the 2023 World Artificial Intelligence Conference, Chen Yunwen, Chairman of Daguan Data, officially released the "Cao Zhi" large language model for vertical fields.

It is described as China's first domestically developed, independently controllable GPT language model designed specifically for vertical industries. It can accurately write long texts of multiple types and complex structures and automatically draft various kinds of documents. In the future, it is expected to generate multimodal content such as tables, charts, and images within long documents. So how was "Cao Zhi" born? Let's hear from Ji Daqi, CTO of Daguan Data.

Deep roots in the NLP field

Daguan Data was founded in 2015 and grew up in Shanghai Pudong Software Park. Its founding team consists of veteran programmers who have worked with Chinese characters for over a decade and are deeply rooted in the NLP field. In March of this year, with the release of the "Cao Zhi" large model, a vertical, specialized, and independently controllable domestic counterpart to ChatGPT, Daguan Data has continued to promote the deep integration of NLP technology into different industries.

NLP is hailed as the jewel in the crown of AI. Moving from the Internet to a broader range of industries, Daguan Data has accumulated a large amount of data, talent, and traditional NLP architecture in vertical fields such as finance, government affairs, and manufacturing. Through extensive communication with clients in these industries, Ji Daqi, co-founder and CTO of Daguan Data, gradually discovered that NLP technology has broad application prospects in office documents.

In 2017, Google published a paper proposing two technical paths for NLP: "understanding" and "generation". "Based on Daguan Data's advantageous resources and future development plans at that time, we chose the 'understanding' route from the beginning," Ji Daqi said. That year, the IDP intelligent document review system, developed by Ji Daqi and his R&D team using technologies such as knowledge graphs and text recognition, entered the market.


As artificial intelligence continued to develop, the demand for machines that can process long texts became increasingly urgent. Daguan Data subsequently invested in developing a large language model, with Ji Daqi serving as the overall project leader. This was the starting point of today's "Cao Zhi" model.

"Cultivate" an artificial intelligence version of "Cao Zhi"

"We want to 'cultivate' an artificial intelligence version of 'Cao Zhi', hoping that it can quickly produce polished texts, like the famous Chinese historical figure Cao Zhi." Speaking of how the "Cao Zhi" large model got its name, Ji Daqi smiled: "Our employees chose it from among 40 to 50 names."

"Long text" is the target task of the "Cao Zhi" large model. Unlike simple question-and-answer short text generation, the "Cao Zhi" model can accurately write long texts of multiple types and complex structures and automatically draft various kinds of documents, with features such as automatic typesetting, intelligent error correction, text polishing, and automatic summary generation. It can also generate multimodal content such as tables, charts, and images within long documents. It supports writing in dozens of languages, including Chinese, English, French, German, Japanese, and Korean, greatly improving office efficiency by assisting human workers. For long document translation, it restores the layout of the original's titles, paragraphs, and other content one-to-one, provides a real-time translation experience, and is widely used in scenarios that involve intensive processing of multilingual documents.

It is also one of the first large language models in China to reach industrial-level application, and it has been put into use in multiple AIGC scenarios in the financial field. Built on the "Cao Zhi" system, the "Cao Zhi" large model further strengthens the intelligent foundation of Daguan Data's industry applications and comprehensively enhances the capabilities of its full AI product matrix.

Text: Lu Xiaoyu


Information: District Science and Technology Commission

*Reprinted from official WeChat account released by Pudong
