Ruilai Intelligence releases the latest AI security platform, tying "seat belts" to the raging large model for safety | Model | Ruilai
In recent months, generative artificial intelligence represented by ChatGPT has been rapidly advancing, and the endogenous and derivative security threats of large models have become increasingly severe. Security issues such as data leakage and harmful content have emerged one after another.
How to turn the "stumbling block" that restricts the development of large model applications into a "ballast stone"? On July 7th, Ruilai Intelligence, an incubation enterprise of Tsinghua University's Artificial Intelligence Research Institute, released the new artificial intelligence security platform RealSafe3.0 at the 2023 World Artificial Intelligence Conference, tying the "seat belt" for the large model of "acceleration" development.
The large model is still in the stage of "wild growth"
Since the birth of artificial intelligence, there has been an imbalance between the power of creating technology and the power of controlling technology. New technology brings new security issues, which is precisely the dual nature of technology. The same applies to large models. Recently, security risks related to large models have been common, such as the leakage of confidential files, the model giving completely opposite answers after adding meaningless characters, outputting illegal and harmful content, and implicit biases and discrimination against certain human communities.
The risks brought by this emerging technology have attracted high attention from countries around the world. On April 11, the State Internet Information Office drafted the Administrative Measures for Generative Artificial Intelligence Services for public comments; On June 14th, the European Union voted to pass the Artificial Intelligence Act, with the aim of laws and regulations leading technology towards better and better development.
Xiao Zihao, co-founder of Ruilai Intelligence and algorithm scientist, believes that the essence of the difficulty in implementing large models lies in the fact that they are still in the stage of "barbaric growth" and have not yet found a balance point between scenarios, risks, and norms. In the process of exploring this balance point, there is a lack of user-friendly and standardized tools, that is, there is a lack of strong grasp at the technical level, which can scientifically evaluate whether the large model can meet both norms and low risks in the scene, and can further locate the problem and provide optimization suggestions to help the model go live and run.
Find the crux from the source and defeat "magic" with "magic"
It is reported that Ruilai Intelligence has released version 3.0 of its artificial intelligence security platform, which integrates mainstream and Ruilai Intelligence's unique world leading security evaluation technology. It can provide end-to-end model security evaluation solutions, aiming to solve the pain point problem of difficult auditing of current general large model security risks. Compared to the previous version, version 3.0 has added evaluations of general large models, covering nearly 70 evaluation dimensions such as data security, cognitive tasks, unique vulnerabilities of general models, and abuse scenarios. The number of evaluation dimensions will continue to increase in the future.
"Evaluation is just a means, and helping the general model improve its own security is the core goal." Xiao Zihao said, "We cannot stop because of the fear of being backfired by technology. Creating new technologies and controlling technological hazards should be carried out simultaneously." Ruilai's wise approach is to find the root cause from the source, and then use 'magic' to defeat 'magic'. "
If artificial intelligence models are compared to "engines", data is the "fuel" of the model. It can be said that the quality of the dataset directly affects the endogenous security of the model. Therefore, the Ruilai Smart 3.0 version integrates multiple self-developed models and high-quality datasets verified by experts internally to help users fix problems in the models. The self-developed adversarial model replaces manually designed problems for general large models that cannot be explained by black boxes, significantly improving attack success rate and sample diversity. That is to say, the model dataset not only includes its own dataset, but also the self generated data of the model, which is commendable in terms of data quality and scale. Therefore, it can automatically mine more vulnerabilities. The coach model conducts multiple rounds of question answer training on the tested large model, and uses the trained scoring model to rate the question answer results. The scoring results are then fed back to the large model, gradually iterating its question answer ability to the optimal level. Xiao Zihao revealed, "These technologies are all based on self-developed multimodal large model bases."