LG’s Pride, ExaOne 3.0 — Is It Really That Impressive?
LG AI Research recently unveiled "EXAONE 3.0" as an open-source model, drawing significant attention from the AI industry.
At a time when few AI models are specifically tailored to the Korean language, the emergence of EXAONE 3.0 feels like a bold challenge LG has thrown down to the Korean AI ecosystem. But does LG's ambitious AI model truly deliver the expected performance, or is it just a product of impressive numbers and flashy marketing?
To answer this question, we need a fair and transparent performance evaluation. This is where AI Network Arena plays a crucial role. AI Network Arena is an innovative platform designed to fairly and transparently evaluate the performance of AI models, allowing for a comparative analysis of various LLMs (Large Language Models) in real user environments.
[The Importance of Open-Source LLMs]
To understand the significance of open-source LLMs like EXAONE 3.0, let's first examine the key reasons for using them:
- Customizability and Optimization: Users can freely modify and optimize the model to meet their specific needs. The model can be retrained or fine-tuned for particular industries or business demands, leading to higher performance.
- Data Security and Privacy: Open-source LLMs can be run in a local environment, eliminating the need to send data to external servers. This is a significant advantage when dealing with sensitive data.
- Independence and Complete Data Ownership: Using an open-source LLM avoids dependence on a specific vendor or platform, letting you keep control over how the model is deployed and used.
In particular, if a company's confidential documents, source code, or sensitive personal data leak externally, the company can lose its core competitive edge, face legal liability, and suffer significant damage to its business. The ability to run an open-source LLM entirely in a local environment is therefore a major advantage for data security and privacy.
[Notable Features of EXAONE 3.0]
Meanwhile, EXAONE 3.0 has been highly praised in the media for the following reasons:
- Bilingual Capability: Proficient in both Korean and English, making it suitable for global and local use. Its Korean language processing, in particular, is rated as world-class.
- High Performance and Efficiency: It is approximately 56% faster than its predecessor and reduces costs by 72%.
- Extensive Data Training: Trained on high-quality datasets, including over 45 million papers and patent data, as well as more than 350 million images and text data.
- Adaptability Across Domains: Capable of adapting to various industries, offering customized solutions tailored to specific business needs.
Additionally, according to the EXAONE technical report published by LG AI Research, EXAONE 3.0 outperforms open-source models of similar scale in both Korean and English. For Korean in particular, it surpassed all comparable open-source models on the KoBEST benchmark, which evaluates Korean language comprehension, ranking first with an average score of 74.1.
But how does the actual performance stack up behind the flashy descriptions?
Considering the many claims of “the best model” that have poured out of the AI industry over the years, it remains to be seen whether this is just another case of exaggerated promotion or if it truly possesses genuine competitiveness.
1. Understanding Korean Culture, Abbreviations, and Sentiment
We prompted two open-source LLMs of similar scale to EXAONE with "Jeom-mat-chu," a Korean abbreviation for "recommend a lunch menu."
The Gemma 2 9B and Llama 3.1 8B models completely failed to grasp the meaning of this abbreviation, giving contextually irrelevant responses. In contrast, EXAONE 3.0 accurately understood "Jeom-mat-chu" and provided appropriate lunch menu recommendations.
We then asked the EXAONE 3.0, Gemma 2 9B, and Llama 3.1 8B models to "write a letter to a younger sibling who is going to the military." EXAONE 3.0 used empathetic language and included words reflecting the emotional significance of military service to Koreans. The other models either failed to grasp the specific context of the military or were confused about the intended recipient of the letter, demonstrating relatively weaker performance.
Based on these examples, EXAONE appears to be the open-source LLM that best understands Korean culture, abbreviations, and sentiment. However, to confirm that these results are consistent in real user environments and not just isolated examples, we can use AI Network Arena to verify the model’s performance across a variety of situations, reflecting actual user preferences.
2. Contextual Understanding and Question Answering Based on Internal Corporate Documents
In a corporate environment, the security of internal confidential and sensitive information is crucial. To prevent the leakage of such information, companies prefer Local AI models that can be run on their own servers.
To evaluate practical usability in corporate settings, we assessed how accurately AI models could understand internal corporate documents and provide appropriate answers to related questions.
- “How did the scope of family members change in the 1990 Civil Law amendment?”
We provided both the EXAONE 3.0 and Gemma 2 9B models with a legal document of about two to three A4 pages and asked, "How did the scope of family members change in the 1990 Civil Law amendment?"
Both models answered that "the scope was changed to include blood relatives within eight degrees of kinship on both the paternal and maternal sides, and in-laws within four degrees of kinship." However, EXAONE 3.0 offered more specific and detailed information than Gemma 2 9B.
- “What is the main purpose of the network analysis mentioned in the report, and how can companies use it to strengthen their marketing strategies?”
We provided both EXAONE 3.0 and Llama 3.1 8B with a company document on data-driven marketing and asked what insights could be drawn from the report. Both models offered similarly useful responses; however, EXAONE stood out with a clear structure and detailed explanations, presenting practical marketing strategies. In contrast, Llama 3.1 8B focused more on conceptual explanations and provided relatively concise information.
Through these examples, it became evident that EXAONE provides much more detailed information when using internal corporate documents for work purposes. While EXAONE demonstrates the ability to offer specific and practical information in actual tasks, AI Network Arena can be used to fairly verify whether this ability is superior compared to other LLMs.
3. Problem-Solving Ability for Coding Assistance and Enhancing Internal Productivity
A company’s software code is a proprietary technology and a valuable asset. For security reasons, it is challenging to expose such internal code to external AI services like GPT. In this case, a locally executable AI model is useful, as it can help improve the productivity of internal developers while maintaining security. From this perspective, let’s compare a few real conversations related to coding.
- “Please tell me how to fix this error.”
We gave both the EXAONE 3.0 and Llama 3.1 8B models a code snippet that raised an error and asked them to identify the cause.
EXAONE 3.0 provided a more detailed and practical solution for exception handling, offering actual code examples along with suggestions for improvement. In contrast, Llama 3.1 8B provided general guidance on identifying the cause of the error and possible solutions, but it did not offer specific instructions on how to directly fix the error or handle exceptions in detail.
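The original error and the models' exact replies are not reproduced here, but the kind of fix under discussion, replacing a crash-prone operation with explicit exception handling, can be sketched as follows. The function, file paths, and keys are illustrative examples, not taken from the actual conversation:

```python
# Illustrative sketch (not the actual conversation): the style of
# exception-handling fix an assistant might suggest for fragile I/O code.
import json

def load_port(config_path: str) -> int:
    """Read a port number from a JSON config, with explicit error handling."""
    try:
        with open(config_path, encoding="utf-8") as f:
            config = json.load(f)
    except FileNotFoundError:
        # Fall back to a default rather than crashing on a missing file.
        return 8080
    except json.JSONDecodeError as err:
        # Surface a clearer message than the raw traceback.
        raise ValueError(f"Malformed config {config_path}: {err}") from err
    # .get avoids a KeyError when the key is absent.
    return int(config.get("port", 8080))
```

The point of such a fix is that each failure mode (missing file, malformed JSON, absent key) gets its own explicit handling instead of one unhandled traceback.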
- “Please briefly explain how to reduce build times in a continuous integration (CI) pipeline.”
EXAONE 3.0 offered a more detailed and practical approach to optimizing CI pipelines, explaining specific methods step by step and making them easier to apply in real-world situations. In contrast, Gemma 2 9B painted a broader picture, focusing on high-level strategies rather than concrete action plans. In situations where specific, practical solutions are needed, EXAONE 3.0 proves the more effective choice.
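As a concrete illustration of the kind of advice at issue, one common way to cut CI build times is caching dependencies between runs. A minimal GitHub Actions sketch, assuming a Python project (the paths and keys are generic examples, not taken from either model's answer):

```yaml
# Minimal sketch: cache dependencies between CI runs so repeat builds
# skip the download step. Paths and keys are illustrative.
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/cache@v4
        with:
          path: ~/.cache/pip          # directory to persist between runs
          key: pip-${{ hashFiles('requirements.txt') }}
      - run: pip install -r requirements.txt
      - run: pytest
```

Keying the cache on a hash of `requirements.txt` means the cache is reused until dependencies actually change, which is typically where the largest build-time savings come from.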
Through these examples, we can see that EXAONE 3.0 provides more specific and practical answers compared to other open-source models.
[Verification Using LLM Judge]
To further verify the performance of EXAONE 3.0, we used GPT-4o to score the responses from EXAONE 3.0, Gemma 2 9B, and Llama 3.1 8B (all open-source models of similar scale) on a five-point scale.
We found that EXAONE 3.0 recorded the highest average score, 4.75 out of 5.
Both in document-based and coding-based questions, EXAONE 3.0 recorded high average scores, demonstrating superior performance compared to other open-source models of the same level.
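The aggregation step of such an LLM-judge setup can be sketched as follows. The judge call itself is stubbed out, the prompt is a generic template rather than the one actually used, and all scores except EXAONE's reported 4.75 average are placeholders, not the real results:

```python
# Sketch of the aggregation step in an LLM-as-judge evaluation.
# The judge (GPT-4o) call is not shown; scores are placeholder values.
from statistics import mean

# Generic judging prompt template (illustrative, not the one used).
JUDGE_PROMPT = (
    "Rate the following answer on a 1-5 scale for accuracy, "
    "specificity, and usefulness. Reply with a single number.\n"
    "Question: {question}\nAnswer: {answer}"
)

def average_scores(scores_by_model: dict[str, list[float]]) -> dict[str, float]:
    """Average per-question judge scores into one score per model."""
    return {model: round(mean(scores), 2) for model, scores in scores_by_model.items()}

# Placeholder per-question scores for the three models.
placeholder = {
    "EXAONE 3.0": [5, 5, 4.5, 4.5],
    "Gemma 2 9B": [4, 3.5, 4, 3.5],
    "Llama 3.1 8B": [3.5, 4, 3.5, 3],
}
print(average_scores(placeholder))
```

Averaging a single judge's scores per model is the simplest aggregation; a fuller setup would also randomize answer order and use multiple judge passes to reduce position and judge bias.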
[Deeper Verification with AI Network Arena]
While EXAONE 3.0 has shown excellence on the three criteria above, these are results from specific examples. To evaluate its performance more fairly and transparently across diverse scenarios, AI Network Arena can be used to compare multiple open-source LLMs directly and find the model best suited to real-world environments.
[Conclusion]
Based on the three criteria and qualitative evaluations using LLM Judge (GPT-4o), the LG EXAONE 3.0 model demonstrated exceptional understanding of the Korean language, sentiment, and culture. It also outperformed similar-scale open-source models in understanding and solving given tasks.
However, these results merely show EXAONE’s superiority in specific tests. To verify its real-world performance across diverse environments that users require, you can explore AI Network Arena. Through in-depth evaluations beyond simple tests, AI Network Arena provides clarity on how well EXAONE 3.0 performs in actual usage scenarios and which model best fits your needs.
Experience the true performance of EXAONE 3.0 yourself in the soon-to-be-released high-performance evaluation system, AI Network Arena.
P.S. LG vs. Samsung: Which is Better?
As a bonus, there is a response from EXAONE that went viral on social media.
When asked, “Which is better, LG or Samsung?” the answer was, “LG Electronics is better.”
Since EXAONE is a model developed under a specific company, there are concerns about potential bias toward that company. To address these concerns, we asked a variety of questions to check for any bias.
Upon review, it turned out that, despite the viral example, the model did not give systematically biased responses.
This model review was written using AI Network's innovative Web3 AI ecosystem and AI Network Arena.
AI Network Arena is an innovative platform designed to evaluate various LLMs (Large Language Models) fairly and transparently. Arena compares and analyzes the performance of AI models in real-world environments, providing users with the most reliable evaluation results.
Participate in the soon-to-be-released high-performance LLM evaluation system, AI Network Arena, where you can experience and evaluate various AI models. Through Arena, you can help shape the future of AI and find the model that best suits your needs!
AI Network is a blockchain-based decentralized AI development ecosystem. GPU providers can be rewarded with $AIN tokens for sharing their GPUs, developers can use the shared GPUs to develop open-source AI projects, and creators can engage in AI-based creative activities using AINFTs. AI Network is creating a Web3 era for AI, where anyone can easily develop and utilize AI within the ecosystem.
AI Network Website: https://www.ainetwork.ai
AI Network DAO Discord: https://discord.com/invite/aindao
AI Network YouTube: https://www.youtube.com/@ainetwork_ai
AI Network Facebook: https://www.facebook.com/ainetworkofficial
AI Network Twitter: https://twitter.com/ainetwork_kr