How was the "Cao Zhi" large language model born? Hear it from the CTO of Daguan Data
"Cao Zhi composed poetry in seven steps, and his most famous chapter," Luoshen Fu, "is a typical long text in ancient literary works. This is also the specialty of the" Cao Zhi "big model, which is the intelligent analysis and writing of long document data." At the 2023 World Artificial Intelligence Conference, Chen Yunwen, Chairman of Daguan Data, officially released the "Cao Zhi" vertical field big language model.
This is the first domestically produced independent and controllable GPT language model specifically designed for vertical industries in China. It can accurately complete long text writing with multiple types and complex structures, automatically draft various types of documents, and in the future, it will achieve multimodal content generation, such as tables, charts, and images in long documents. So how was "Cao Zhi" born? Let's listen to Ji Daqi, CTO of Daguan Data.
Deeply rooted in the NLP field
Daguan Data was founded in 2015 and grew up in Shanghai Pudong Software Park. Its founding team consists of veteran programmers who have worked with Chinese text for over a decade and are deeply involved in the NLP field. In March of this year, with the release of the "Cao Zhi" large model, a vertical, specialized, and independently controllable domestic counterpart to ChatGPT, Daguan Data has continued to promote the deep integration of NLP technology into different industries.
NLP is hailed as the jewel in the crown of AI. Expanding from the Internet into a broader range of industries, Daguan Data has accumulated a large amount of data, talent, and traditional NLP architecture in vertical fields such as finance, government affairs, and manufacturing. After extensive communication with clients in these industries, Ji Daqi, co-founder and CTO of Daguan Data, gradually discovered that NLP technology has broad application prospects in office documents.
In 2017, Google published a paper proposing two technical paths for NLP: "understanding" and "generation." "Based on Daguan Data's strengths and development plans at that time, we chose the 'understanding' route from the beginning," Ji Daqi said. That year, the IDP intelligent document review system, developed by Ji Daqi and his R&D team using technologies such as knowledge graphs and text recognition, entered the market.
![How was the "Cao Zhi" big language model born? Let's take a look at the CTO presentation of Daguan Data](https://a5qu.com/upload/images/d30cf0c74d95444e2cb8eba6b4d18c2c.jpg)
With the continuous development of artificial intelligence, the demand for machines that can process long texts has become increasingly urgent. Daguan Data subsequently invested in developing a large language model, with Ji Daqi serving as the overall project leader. This was the starting point of today's "Cao Zhi" model.
"Cultivate" an artificial intelligence version of "Cao Zhi"
"We want to 'cultivate' an artificial intelligence version of 'Cao Zhi', hoping that it can quickly grow into texts like the famous historical figure Cao Zhi in China." Speaking of the origin of the name 'Cao Zhi' big model, Ji Daqi smiled, "This was chosen by our employees from 40 to 50 names."
"Long text" is the target task of the "Cao Zhi" big model. Unlike the simple short text generation of question and answer, the "Cao Zhi" model can accurately complete the writing of long texts with multiple types and complex structures, automatically draft various types of documents, and has features such as automatic typesetting, intelligent error correction, text polishing, and automatic summary generation; It can also achieve multimodal content generation, such as tables, charts, images, etc. in long documents; Support writing in dozens of languages including Chinese, English, French, German, Japanese, Korean, etc., greatly improving office efficiency by assisting manual labor; In terms of long document translation, it achieves a 1:1 layout restoration of the original text's titles, paragraphs, and other content, providing a real-time translation experience, and is widely used in scenarios with intensive processing of multilingual documents.
This is also one of the first industrial application level models that can be implemented in large-scale language models in China, and has been put into use in multiple scenarios of AIGC in the financial field. Based on the "Cao Zhi" system, the "Cao Zhi" big model further consolidates the intelligent foundation of Daguan data industry application, comprehensively enhancing the AI full product matrix capability.
Text: Lu Xiaoyu
![How was the "Cao Zhi" big language model born? Let's take a look at the CTO presentation of Daguan Data](https://a5qu.com/upload/images/8c8e090111792842cf68318f7f7ad4f1.jpg)
Information: District Science and Technology Commission
*Reprinted from the official WeChat account "Pudong Release"