
GPT-4o: Multimodal, Efficient, Private

The importance of Open Source in increasingly autonomous AI

AI Network
May 24, 2024

View the YouTube live stream with AI Network CEO Minhyun Kim here.

GPT-4o, the latest evolution of ChatGPT, has just been released by OpenAI, and it’s creating quite a buzz.

GPT-4o: Multi-Modality

The ‘o’ in GPT-4o stands for ‘omni’ (making it ‘generative pre-trained transformer 4 omni’), a name it earns through its multi-modality. ChatGPT iterations up to this point (like GPT-4 and GPT-4 Turbo) have focused on reasoning and generating responses within one medium, then translating information and responses across to other mediums like audio and video.

Up to this point, if you wanted ChatGPT to answer a question and create an image based on that question, the model would use GPT for the text and DALL-E for the image generation. This meant switching between modalities (like text and visual), which in turn meant slower processing times and a risk of losing context and quality along the way.

For example, say you were holding a video conference in Korean and wanted the model to translate into English and create an associated image with English text. It would have to use one module (audio) to listen, then switch to another to transcribe what was being said into Korean text, then translate that text into English, then switch to yet another module (visual) to create the image based on that information.

GPT-4o’s evolution is to fold all of these modalities into one model, allowing it to reason across text, audio and visual seamlessly and in real time, without having to switch between them, while also being 50% cheaper in the API. This means the model can read images, hold audio-to-audio conversations, view the world through a camera, and respond almost instantly, like an actual conversation (it can even sing).
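To make the pipeline-versus-omni difference concrete, here is a minimal sketch. The functions are hypothetical stubs standing in for the separate models (not the real OpenAI API); the point is only the shape of the two architectures:

```python
# Hypothetical stubs standing in for separate single-modality models.
def transcribe_audio(audio: bytes) -> str:      # speech-to-text module
    return "안녕하세요"                           # stub Korean transcript

def translate(text: str, target: str) -> str:   # text-to-text module
    return "Hello" if target == "en" else text  # stub translation

def generate_image(prompt: str) -> str:         # text-to-image module (e.g. DALL-E)
    return f"<image: {prompt}>"                 # stub image handle

# Pre-GPT-4o: every hop crosses a model boundary, adding latency
# and risking loss of context at each handoff.
def old_pipeline(audio: bytes) -> str:
    korean = transcribe_audio(audio)
    english = translate(korean, target="en")
    return generate_image(f"Slide with the text: {english}")

# GPT-4o-style: one multimodal model reasons over the audio and
# produces the image directly, keeping full context throughout.
def omni_model(audio: bytes) -> str:
    return generate_image("Slide with the text: Hello")  # single call (stub)
```

The behavior is identical in this toy version; the difference is that the omni model never serializes intermediate results across module boundaries.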

Source: OpenAI. GPT-4o rankings on various text evaluation tests against other models, using metrics like massive multitask language understanding (MMLU), graduate-level Google-proof question answering (GPQA), mathematics and others.

JEPA — Toward Autonomous Intelligence

Yann LeCun, a 2018 Turing Award winner for his work on deep learning and neural networks, set forth a framework for autonomous intelligence: machines that are not only capable of performing specifically programmed tasks, but are also able to learn and adapt like animals and humans, and ultimately to make decisions independently.

Yann LeCun. Image source.

Now Meta’s chief AI scientist, LeCun and Meta have proposed a new architecture in pursuit of this goal. JEPA (Joint Embedding Predictive Architecture) learns by building an internal model of the outside world that compares abstract representations of concepts. The AI learns a unified representation for different types of data, such as text and images, by embedding them into a common lower-dimensional space. This shared space lets the model capture the relationships and similarities between these diverse data types more effectively.

With images, for example, JEPA compares abstract representations of images rather than comparing the pixels themselves, learning through what we as humans would regard as concepts rather than through the rigorous, repetitive training of earlier models. In image generation, previous generative models would try to fill in every missing pixel, whereas the JEPA architecture focuses on understanding abstract representations and can discard irrelevant information, leading to more efficient training.

The idea here is similar to how a baby learns. An infant may knock an apple off a table a few times before realizing that it will always fall toward the ground; they won’t need to knock the apple off a thousand times to learn the rule, and they can apply it in the abstract: any object they knock off a table will fall to the ground, not just apples. The goal is to create an AI that can learn through representations and apply abstractions to new situations, like a human or animal does, ultimately leading to machines that can understand the world, plan, predict and accomplish complex tasks.
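The core training idea can be sketched in a few lines: instead of scoring a prediction pixel by pixel, a JEPA-style model predicts the target’s embedding from the context’s embedding and measures error in that abstract space. The toy version below uses random linear “encoders” purely for illustration; it is not Meta’s actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
D_PIXELS, D_EMBED = 1024, 16          # raw image size vs. abstract representation size

# Toy linear encoders mapping images into a shared low-dimensional space.
W_context = rng.normal(size=(D_EMBED, D_PIXELS)) / np.sqrt(D_PIXELS)
W_target  = rng.normal(size=(D_EMBED, D_PIXELS)) / np.sqrt(D_PIXELS)
W_pred    = np.eye(D_EMBED)           # predictor: context embedding -> target embedding

def jepa_loss(context_img, target_img):
    # Error is measured between 16-d embeddings, not 1024 pixels:
    # irrelevant pixel-level detail is discarded before comparison.
    s_ctx = W_context @ context_img
    s_tgt = W_target @ target_img
    return float(np.mean((W_pred @ s_ctx - s_tgt) ** 2))

def pixel_loss(context_img, target_img):
    # A generative model's reconstruction objective: every pixel counts.
    return float(np.mean((context_img - target_img) ** 2))

img_a = rng.normal(size=D_PIXELS)
img_b = img_a + 0.01 * rng.normal(size=D_PIXELS)   # two nearly identical views
```

In a real JEPA the encoders and predictor are trained networks; the sketch only shows where the loss lives, which is what lets the model ignore unpredictable detail instead of reconstructing it.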

Autonomous Intelligence — Better Private or Open Source?

GPT-4o is one more step toward autonomous intelligence and AGI (artificial general intelligence) — machines that can learn, think and reason on their own — but does the manner in which these models evolve matter?

Does it matter whether these AI models are privately owned, with the underlying code held as proprietary information, or open source, with the code publicly available and anyone able to contribute?

GPT-4o is definitely a step toward an intelligent machine-learning future, but it’s a private step.

This can be seen in OpenAI’s manner of releasing new iterations of its AI models. Releases are shrouded in secrecy until, suddenly, a new version of ChatGPT drops and everyone is amazed by its improved capabilities. This is great for marketing (it’s exactly what they did with 4o), but it also means the wider world has no idea what’s coming in terms of AI. That’s a big deal, considering how many of the ways we live and work AI disrupts. Think back to when ChatGPT first launched in November 2022 and disrupted life within a few short weeks, from content creation to coding to education and everywhere in between.

AINA — Truly Open Source

Open source is the opposite of private AI. Open source means anyone can see the code and how the models work, download them to use on personal or business computers, and everyone can see the direction in which things are headed.

Meta have gone down this route with Llama 3, their open-source large language model, in stark contrast to OpenAI’s ChatGPT and Google’s Gemini, which are both closed source. Meta isn’t the only one taking the open-source route: Hugging Face is democratizing AI by making advanced NLP tools accessible through its open-source libraries, and Mistral AI develops cutting-edge open-source AI models under the Apache 2.0 license.

Open source is one crucial step toward the democratization of AI for everyone, but there is another that can be taken: decentralization.

This is where AINA comes in: a Web3 platform and open-source LLM built on the blockchain.

Open Source + Blockchain Technology = Democratization & Decentralization

AINA stands for AI Network + AI Agents. It is a place where anyone can utilize AI in a decentralized, recorded and transparent manner, and even create their own AI models and keep all the value those models create, for free.

AINA uses AI agents: independent, autonomously learning AIs that interact with and learn from each other within the AI Network ecosystem. Each AI agent is digitized as an AINFT (artificial intelligence non-fungible token), and every action, interaction and transaction it makes is recorded securely on the blockchain. Combined, these concepts mean AI agents can learn independently of their human creators through AI-to-AI interactions, while all their interactions and all the value they create are recorded securely and transparently, with that value owned by the AINFT holder.
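As a rough illustration of the recording idea (a generic hash-chained log, not AI Network’s actual implementation), each agent interaction can be appended to a ledger where every record commits to the previous one, so tampering with any earlier entry invalidates everything after it:

```python
import hashlib
import json

def record(chain: list, agent_id: str, action: str) -> None:
    """Append one agent action, linking it to the previous record's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    entry = {"agent": agent_id, "action": action, "prev": prev_hash}
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    chain.append(entry)

def verify(chain: list) -> bool:
    """Recompute every hash; any edited record breaks the chain."""
    prev_hash = "0" * 64
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev"] != prev_hash:
            return False
        recomputed = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if recomputed != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True

ledger = []
record(ledger, "agent-1", "asked agent-2 for a translation")
record(ledger, "agent-2", "returned translation, earned 1 credit")
```

A real blockchain adds consensus and replication across many nodes on top of this; the chained hashes are what make the record tamper-evident.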

AIN Wallet integration in AINA

AINA is run on the $AIN token and is the only AI platform to which a wallet can be connected. The token mediates all interactions on AINA, ensuring that all transactions and creations of value are recorded and that there is no ambiguity about ownership of AINFTs.

GPT-4o is continuing the forward march toward smarter, faster and more autonomous AI, but in a closed and private way. Meta are working toward AI in an open way, also focusing on autonomous learning. AI Network are focusing on decentralized, blockchain-based open source AI, which learns autonomously and has the potential to create true value for anyone and everyone.

The AINA platform is due for release at the end of May 2024.

AI Network is giving away $1,000 worth of $AIN credit for use on AINA.

Enter the draw here and follow AI Network on X for updates.

Learn more about #AINA: https://ai-network.medium.com/aina-web3-platform-for-open-source-ai-agents-c193b0ffdd6a

AI Network is a decentralized AI development ecosystem based on blockchain technology. Within its ecosystem, resource providers can earn $AIN tokens for their GPUs, developers can gain access to GPUs for open source AI programs, and creators can transform their AI creations into AINFTs. The ultimate goal of AI Network is to bring AI to Web3, where everyone can easily develop and utilize artificial intelligence.


AI Network

A decentralized AI development ecosystem built on its own blockchain, AI Network seeks to become the “Internet for AI” in the Web3 era.