China Daily

Multimodal LLMs pursuing AGI now

- By WANG XIN in Shanghai

Multimodal large language models have made substantial advances over the past year, and their practical application is moving toward artificial general intelligence, with diverse vertical industrial large models and AI agents emerging, said experts at the 2024 World Artificial Intelligence Conference, which wrapped up on July 6 in Shanghai.

Multimodal LLMs integrate and process diverse types of data, such as text, images, audio and video, to enhance understanding and generate comprehensive responses.

In May, the launch of GPT-4o, the latest LLM developed by OpenAI, caused a global sensation. The new flagship generative AI model features capabilities across text, voice and visuals, making interaction between humans and machines much more natural and seamless, the company said.

Spurred by GPT-4o, Chinese AI companies including Baidu, Alibaba, Tencent, Huawei, SenseTime and Ant Group, as well as emerging companies such as Minimax, Baichuan Intelligence and Zhipu AI, also showcased their LLM updates during the conference.

Chinese AI pioneer SenseTime launched its latest multimodal LLM on July 5. The new model features integration of diverse types of data and real-time streaming multimodal interaction with users, closely competing with GPT-4o in interaction effects and multiple core metrics, the company said.

Chinese financial tech firm Ant Group shared its latest LLM product on the same day.

“The Ant BaiLing Foundation Model has been equipped with native multimodal capabilities. It can directly understand and train various types of data including audio, video, images and text,” said Xu Peng, vice-president of the group, who regards such native multimodal capabilities as the “right path to achieving artificial general intelligence” as they will enable LLMs to interact like humans.

Compared with the previous edition, this year’s WAIC showcased remarkable advances in LLMs. The number of LLMs in China exceeds 330, according to official statistics.

The practical industrial application of large models, such as applying vertical large models, AI agents or MaaS (model as a service), was another hot topic at this year’s WAIC.

“The creation of large models is only the starting point. Landing the LLM into industrial scenarios to generate value is the goal,” said Wu Yunsheng, vice-president of Tencent Cloud and head of Tencent YouTu Lab.

Tencent Hunyuan, the company’s general model, was one of the highlighted exhibits at this year’s conference.

Jiang Jie, Tencent’s vice-president, said: “In the future, general models will exist as infrastructure, like water, electricity and networks, for on-demand access. More models of different sizes and modalities will appear, and businesses can coordinate with large and small models to meet customized needs while improving performance.”

Hu Shiwei, co-founder and president of Chinese AI company 4Paradigm, said it is a certainty that such large models will be positioned as the new “infrastructure” in the future.

“Our industrial large models have seen remarkable results in application. For example, in the financial services sector, AI has improved the accuracy of identifying fraudulent transactions. In the retail sector, personalized services have led to a significant increase in sales,” Hu said.

In addition to developing vertical large models, many companies and developers are also using MaaS, a type of cloud-based service that offers users access to machine-learning models to develop AI applications.

Zhipu AI, a Beijing-based startup dubbed one of the four new “AI tigers” of China, has accumulated over 400,000 corporate users.
