Generative AI has marked a new beginning in the world of AI and technology.

As OpenAI had launched their ChatGPT back in November 2022, Google and other big companies started to reimagine the potential of Generative AI with their competitive products. Google has always been competing with OpenAI’s ChatGPT and had previously presented the Bard AI.

On 9 December, 2023 Google has come up with Gemini AI claiming that it outperforms ChatGPT with a score of 90% in MMLU (Massive Multitask Language Understanding) test.

Gemini AI is a multimodal AI model that, the search engine giant says, can process text, images, and audio. It has overall three versions namely, Ultra, Pro and Nano, those will be available as and when they are announced.

Gemini Ultra version was available since its initial release, capable of processing highly complex tasks.

On 13 December, 2023, Gemini AI Pro version was unveiled of its potential to empower developers. It is available on Vertex AI, Google Cloud’s end-to-end AI platform that includes intuitive tooling, fully-managed infrastructure, and built-in privacy and safety features. As the Google CEO, Mr. Sundar Pichai, has addressed in a blog post,

Gemini is our most capable and general model yet, with state-of-the-art performance across many leading benchmarks. It was built from the ground up to be multimodal, which means it can generalize and seamlessly understand, operate across and combine different types of information including text, code, audio, image and video.

Gemini LLM will power the Generative AI chatbot, Bard, to process advanced understanding through reasoning, planning and other abilities. It will also be included into the search engine for a Generative AI user experience.

INTRODUCING A NEW ERA FOR DEVELOPERS

Gemini Pro is now available in Google Cloud as a preview version for developers in Vertex AI. The Vertex AI SDK is available for four languages, Python, Go, Java, and Node.js.

Vertex AI has many foundation models that are accessible through APIs, which are:

1. Gemini API: Advanced reasoning, multiturn chat, code generation, and multimodal prompts.

2. PaLM API: Natural language tasks, text embeddings, and multiturn chat.

3. Codey APIs: Code generation, code completion, and code chat.

4. Imagen API: Image generation, image editing, and visual captioning.

5. MedLM: Medical question answering and summarization. (Private GA)

Gemini Pro is introduced with models developed by Google DeepMind, specifically to enhance natural language tasks, multiturn text and code chat, and code generation.

Model customization is done through Model Tuning, to generate desired results without using any complex prompts. This reduces the cost and latency of requests.

Gemini AI further provides Vertex AI Grounding service which provides models to access specific data sources and reduce hallucinations on unknown topics. It is then, through citation metadata, the content is analysed for delicacies. In case of quotations or data taken from other web sources, the URL of source page is stated along with the title, license and publication date, if any.

After going through all the checks and safety filters, the response is then returned.

NEW TRENDS WITH GEMINI AI

Gemini AI has claimed of surpassing ChatGPT by an average of 3 to 4%. However minor issues were found on the data analysis due to data contamination.

Following the launch, company has reported that the demo wasn’t conducted in real time and still images or fed text prompts were used. As later stated by the company,

The video is an illustrative depiction of the possibilities of interacting with Gemini, based on real multimodal prompts and outputs from testing. We look forward to seeing what people create when access to Gemini Pro opens on December 13.

The latest update of ChatGPT 4 can produce images, audio or video using different models. ChatGPT works only with text that are passed to other models like DALL-E for producing images, Whisper API for speech-to-text capabilities.

Gemini AI has all-in-one capabilities that handles all types of outputs viz. audio, images, video and text.

Multimodal LLMs are the future of AI and Gemini AI has marked a new benchmark for ChatGPT as well. Most likely, the next version of ChatGPT will introduce multimodality. A note from Google and Alphabet CEO Sundar Pichai,

Now, we’re taking the next step on our journey with Gemini, our most capable and general model yet, with state-of-the-art performance across many leading benchmarks.

Gemini AI Setting Multimodal LLM Trends For Future

INTRODUCING A NEW ERA FOR DEVELOPERS

NEW TRENDS WITH GEMINI AI