AI-Powered Search: Google’s Transformation vs. Perplexity

TL;DR: Play the podcast (Audio Overview generated by NotebookLM).

  1. Abstract
  2. Google’s AI Transformation: From PageRank to Gemini-Powered Search
    1. The Search Generative Experience (SGE) Revolution
    2. Google’s LLM Arsenal
    3. Technical Architecture Integration
    4. Key Differentiators of Google’s AI Search
  3. Perplexity AI Architecture: The RAG-Powered Search Revolution
    1. Simplified Architecture View
    2. How Perplexity Works: From Query to Answer
    3. Technical Workflow Diagram
  4. The New Search Paradigm: AI-First vs AI-Enhanced Approaches
    1. Google’s Philosophy: “AI-Enhanced Universal Search”
    2. Perplexity’s Philosophy: “AI-Native Conversational Search”
    3. Comprehensive Technology & Business Comparison
  5. The Future of AI-Powered Search: A New Competitive Landscape
    1. Implementation Strategy Battle: Integration vs. Innovation
    2. The Multi-Modal Future
    3. Business Model Evolution Under AI
    4. Technical Architecture Convergence
    5. The Browser and Distribution Channel Wars
  6. Strategic Implications and Future Outlook
    1. Key Strategic Insights
    2. The New Competitive Dynamics
    3. Looking Ahead: Industry Predictions
  7. Recommendations for Stakeholders
  8. Conclusion

Abstract

This blog examines the rapidly evolving landscape of AI-powered search, comparing Google’s recent transformation with its Search Generative Experience (SGE) and Gemini integration against Perplexity AI’s native AI-first approach. Both companies now leverage large language models, but with fundamentally different architectures and philosophies.

The New Reality: Google has undergone a dramatic transformation from traditional keyword-based search to an AI-driven conversational answer engine. With the integration of Gemini, LaMDA, PaLM, and the rollout of AI Overviews (formerly SGE), Google now synthesizes information from multiple sources into concise, contextual answers—directly competing with Perplexity’s approach.

Key Findings:

  • Convergent Evolution: Both platforms now use LLMs for answer generation, but Google maintains its traditional search infrastructure while Perplexity was built AI-first from the ground up
  • Architecture Philosophy: Google integrates AI capabilities into its existing search ecosystem (hybrid approach), while Perplexity centers everything around RAG and multi-model orchestration (AI-native approach)
  • AI Technology Stack: Google leverages Gemini (multimodal), LaMDA (conversational), and PaLM models, while Perplexity orchestrates external models (GPT, Claude, Gemini, Llama, DeepSeek)
  • User Experience: Google provides AI Overviews alongside traditional search results, while Perplexity delivers answer-first experiences with citations
  • Market Dynamics: The competition has intensified with Google’s AI transformation, making the choice between platforms more about implementation philosophy than fundamental capabilities

This represents a paradigm shift where the question is no longer “traditional vs. AI search” but rather “how to best implement AI-powered search” with different approaches to integration, user experience, and business models.

Keywords: AI Search, RAG, Large Language Models, Search Architecture, Perplexity AI, Google Search, Conversational AI, SGE, Gemini.

Google’s AI Transformation: From PageRank to Gemini-Powered Search

Google has undergone one of the most significant transformations in its history, evolving from a traditional link-based search engine to an AI-powered answer engine. This transformation represents a strategic response to the rise of AI-first search platforms and changing user expectations.

The Search Generative Experience (SGE) Revolution

Google’s Search Generative Experience (SGE), now known as AI Overviews, fundamentally changes how search results are presented:

  • AI-Synthesized Answers: Instead of just providing links, Google’s AI generates comprehensive insights, explanations, and summaries from multiple sources
  • Contextual Understanding: Responses consider user context including location, search history, and preferences for personalized results
  • Multi-Step Query Handling: The system can handle complex, conversational queries that require reasoning and synthesis
  • Real-Time Information Grounding: AI Overviews are grounded in current, real-time information while maintaining accuracy

Google’s LLM Arsenal

Google has strategically integrated multiple advanced AI models into its search infrastructure:

Gemini: The Multimodal Powerhouse

  • Capabilities: Understands and generates text, images, videos, and audio
  • Search Integration: Enables complex query handling including visual search, reasoning tasks, and detailed information synthesis
  • Multimodal Processing: Handles queries that combine text, images, and other media types

LaMDA: Conversational AI Foundation
  • Purpose: Powers natural, dialogue-like interactions in search
  • Features: Enables follow-up questions and conversational context maintenance
  • Integration: Supports Google’s shift toward conversational search experiences

PaLM: Large-Scale Language Understanding

  • Role: Provides advanced language processing capabilities
  • Applications: Powers complex reasoning, translation (100+ languages), and contextual understanding
  • Scale: Handles extended documents and multimodal inputs
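
To make this arsenal concrete for developers, below is a minimal sketch of a multimodal Gemini query using the public google-generativeai Python SDK; the API key, image file, and model name are placeholder assumptions, and this illustrates the public API rather than Google’s internal search integration.

# Minimal sketch: a multimodal query against Gemini via the google-generativeai SDK.
# Assumptions: an API key is available and "gemini-1.5-flash" is a current model name;
# this shows the public API, not Google's internal SGE/AI Overviews pipeline.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-1.5-flash")

image = Image.open("landmark.jpg")  # hypothetical local image
response = model.generate_content(
    ["What landmark is shown here, and what is it known for?", image]
)
print(response.text)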

Technical Architecture Integration

Google’s approach differs from AI-first platforms by layering AI capabilities onto existing infrastructure:

  • Hybrid Architecture: Maintains traditional search capabilities while adding AI-powered features
  • Scale Integration: Leverages existing massive infrastructure and data
  • DeepMind Synergy: Strategic integration of DeepMind research into commercial search applications
  • Continuous Learning: ML ranking algorithms and AI models learn from user interactions in real-time
  • Global Reach: AI features deployed across 100+ languages with localized understanding
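
As a rough illustration of this hybrid layering, the sketch below bolts an AI-generated overview onto an otherwise unchanged ranked result list; search_index.query and llm.summarize are hypothetical stand-ins, not Google APIs.

# Rough sketch of a hybrid search response: traditional ranked links plus an AI overview.
# search_index.query() and llm.summarize() are hypothetical placeholders for illustration only.
from dataclasses import dataclass

@dataclass
class SearchResponse:
    ai_overview: str          # synthesized answer shown above the links
    ranked_links: list[str]   # traditional results, still returned unchanged

def hybrid_search(query: str, search_index, llm) -> SearchResponse:
    # 1. Traditional retrieval and ranking stay exactly as before.
    ranked_links = search_index.query(query, top_k=10)
    # 2. An AI overview is layered on top, grounded in the top-ranked documents.
    ai_overview = llm.summarize(query=query, documents=ranked_links[:5])
    return SearchResponse(ai_overview=ai_overview, ranked_links=ranked_links)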

Perplexity AI Architecture: The RAG-Powered Search Revolution

Perplexity AI represents a fundamental reimagining of search technology, built on three core innovations:

  1. Retrieval-Augmented Generation (RAG): Combines real-time web crawling with large language model capabilities
  2. Multi-Model Orchestration: Leverages multiple AI models (GPT, Claude, Gemini, Llama, DeepSeek) for optimal responses
  3. Integrated Citation System: Provides transparent source attribution with every answer

The platform offers multiple access points to serve different user needs: Web Interface, Mobile App, Comet Browser, and Enterprise API.
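
As a rough sketch of what multi-model orchestration can look like, the snippet below routes a query to a model family based on simple heuristics; the routing rules and the call_model helper are illustrative assumptions, since Perplexity has not published its router.

# Simplified sketch of multi-model orchestration: route a query to the model family
# that seems best suited to it. The routing rules and call_model() are illustrative
# assumptions, not Perplexity's actual implementation.
def classify_query(query: str) -> str:
    q = query.lower()
    if any(word in q for word in ("prove", "derive", "step by step")):
        return "reasoning"
    if any(word in q for word in ("image", "diagram", "chart")):
        return "multimodal"
    return "general"

MODEL_ROUTE = {
    "reasoning": "claude",    # e.g. a strong chain-of-thought model
    "multimodal": "gemini",   # e.g. a model with image understanding
    "general": "gpt",         # default general-purpose model
}

def orchestrate(query: str, call_model) -> str:
    model_name = MODEL_ROUTE[classify_query(query)]
    return call_model(model_name, query)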

Core Architecture Components

Simplified Architecture View

For executive presentations and high-level discussions, a three-layer view highlights the essential components.

How Perplexity Works: From Query to Answer

Understanding Perplexity’s workflow reveals why it delivers fundamentally different results from traditional search engines. Unlike Google’s approach of matching keywords to indexed pages, Perplexity follows a sophisticated multi-step process:

The Eight-Step Journey

  1. Query Reception: User submits a natural language question through any interface
  2. Real-Time Retrieval: Custom crawlers search the web for current, relevant information
  3. Source Indexing: Retrieved content is processed and indexed in real-time
  4. Context Assembly: RAG system compiles relevant information into coherent context
  5. Model Selection: AI orchestrator chooses the optimal model(s) for the specific query type
  6. Answer Generation: Selected model(s) generate comprehensive responses using retrieved context
  7. Citation Integration: System automatically adds proper source attribution
  8. Response Delivery: Final answer with citations is presented to the user
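
A compressed sketch of these eight steps is shown below; every helper on the hypothetical tools object is a placeholder meant to convey the shape of the pipeline, not Perplexity’s actual code.

# Compressed sketch of the eight-step flow. "tools" bundles hypothetical subsystems
# (web_search, index_document, build_context, pick_model, generate, attach_citations);
# only the overall RAG shape of the pipeline is being illustrated.
def answer_query(question: str, tools) -> str:
    # 1. Query reception: the natural-language question arrives via any interface.
    sources = tools.web_search(question, top_k=8)                # 2. real-time retrieval
    indexed = [tools.index_document(doc) for doc in sources]     # 3. source indexing
    context = tools.build_context(question, indexed)             # 4. context assembly (RAG)
    model = tools.pick_model(question)                           # 5. model selection
    draft = tools.generate(model, question, context)             # 6. answer generation
    answer = tools.attach_citations(draft, indexed)              # 7. citation integration
    return answer                                                 # 8. response delivery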

Technical Workflow Diagram

The sequence below shows how a user query flows through Perplexity’s system.

This process typically completes in under 3 seconds, delivering both speed and accuracy.

The New Search Paradigm: AI-First vs AI-Enhanced Approaches

The competition between Google and Perplexity has evolved beyond traditional vs. AI search to represent two distinct philosophies for implementing AI-powered search experiences.

Google’s Philosophy: “AI-Enhanced Universal Search”

  • Hybrid Integration: Layer advanced AI capabilities onto proven search infrastructure
  • Comprehensive Coverage: Maintain traditional search results alongside AI-generated overviews
  • Gradual Transformation: Evolve existing user behaviors rather than replace them entirely
  • Scale Advantage: Leverage massive existing data and infrastructure for AI training and deployment

Perplexity’s Philosophy: “AI-Native Conversational Search”

  • Model Agnostic: Orchestrate best-in-class models rather than developing proprietary AI
  • Clean Slate Design: Built from the ground up with AI-first architecture
  • Answer-Centric: Focus entirely on direct answer generation with source attribution
  • Conversational Flow: Design for multi-turn, contextual conversations rather than single queries

Comprehensive Technology & Business Comparison

Dimension | Google AI-Enhanced Search | Perplexity AI-Native Search
Input | Natural language + traditional keywords | Pure natural language, conversational
AI Models | Gemini, LaMDA, PaLM (proprietary) | GPT, Claude, Gemini, Llama, DeepSeek (orchestrated)
Architecture | Hybrid (AI + traditional infrastructure) | Pure AI-first (RAG-centered)
Retrieval | Enhanced index + Knowledge Graph + real-time | Custom crawler + real-time retrieval
Core Tech | AI Overviews + traditional ranking | RAG + multi-model orchestration
Output | Hybrid (AI Overview + links + ads) | Direct answers with citations
Context | Limited conversational memory | Full multi-turn conversation memory
Extensions | Maps, News, Shopping, Ads integration | Document search, e-commerce, APIs
Business | Ad-driven + AI premium features | Subscription + API + e-commerce
UX | “AI answers + traditional options” | “Conversational AI assistant”
Products | Google Search with SGE/AI Overviews, Ads | Perplexity Web/App, Comet Browser
Deployment | Global rollout with localization | Global expansion, English-focused
Data Advantage | Massive proprietary data + real-time | Real-time web data + model diversity

The Future of AI-Powered Search: A New Competitive Landscape

The integration of AI into search has fundamentally changed the competitive landscape. Rather than a battle between traditional and AI search, we now see different approaches to implementing AI-powered experiences competing for user mindshare and market position.

Implementation Strategy Battle: Integration vs. Innovation

Google’s Integration Strategy:

  • Advantage: Massive user base and infrastructure to deploy AI features at scale
  • Challenge: Balancing AI innovation with existing business model dependencies
  • Approach: Gradual rollout of AI features while maintaining traditional search options

Perplexity’s Innovation Strategy:

  • Advantage: Clean slate design optimized for AI-first experiences
  • Challenge: Building user base and competing with established platforms
  • Approach: Focus on superior AI experience to drive user acquisition

The Multi-Modal Future

Both platforms are moving toward comprehensive multi-modal experiences:

  • Visual Search Integration: Google Lens vs. Perplexity’s image understanding capabilities
  • Voice-First Interactions: Google Assistant integration vs. conversational AI interfaces
  • Video and Audio Processing: Gemini’s multimodal capabilities vs. orchestrated model approaches
  • Document Intelligence: Enterprise document search and analysis capabilities

Business Model Evolution Under AI

Advertising Model Transformation:

  • Google must adapt its ad-centric model to AI Overviews without disrupting user experience
  • Challenge of monetizing direct answers vs. traditional click-through advertising
  • Need for new ad formats that work with conversational AI

Subscription and API Models:

  • Perplexity’s success with subscription tiers validates alternative monetization
  • Growing enterprise demand for AI-powered search APIs and integrations
  • Premium features becoming differentiators (document search, advanced models, higher usage limits)

Technical Architecture Convergence

Despite different starting points, both platforms are converging on similar technical capabilities:

  • Real-Time Information: Both now emphasize current, up-to-date information retrieval
  • Source Attribution: Transparency and citation becoming standard expectations
  • Conversational Context: Multi-turn conversation support across platforms
  • Model Diversity: Google developing multiple specialized models, Perplexity orchestrating external models

The Browser and Distribution Channel Wars

Perplexity’s Chrome Acquisition Strategy:

  • $34.5B all-cash bid for Chrome represents unprecedented ambition in AI search competition
  • Strategic Value: Control over browser defaults, user data, and search distribution
  • Market Impact: Success would fundamentally alter competitive dynamics and user acquisition costs
  • Regulatory Reality: Bid likely serves as strategic positioning and leverage rather than realistic acquisition

Alternative Distribution Strategies:

  • AI-native browsers (Comet) as specialized entry points
  • API integrations into enterprise and developer workflows
  • Mobile-first experiences capturing younger user demographics

Strategic Implications and Future Outlook

The competition between Google’s AI-enhanced approach and Perplexity’s AI-native strategy represents a fascinating case study in how established platforms and startups approach technological transformation differently.

Key Strategic Insights

  • The AI Integration Challenge: Google’s transformation demonstrates that even dominant platforms must fundamentally reimagine their core products to stay competitive in the AI era
  • Architecture Philosophy Matters: The choice between hybrid integration (Google) vs. AI-first design (Perplexity) creates different strengths, limitations, and user experiences
  • Business Model Pressure: AI-powered search challenges traditional advertising models, forcing experimentation with subscriptions, APIs, and premium features
  • User Behavior Evolution: Both platforms are driving the shift from “search and browse” to “ask and receive” interactions, fundamentally changing how users access information

The New Competitive Dynamics

Advantages of Google’s AI-Enhanced Approach:

  • Massive scale and infrastructure for global AI deployment
  • Existing user base to gradually transition to AI features
  • Deep integration with knowledge graphs and proprietary data
  • Ability to maintain traditional search alongside AI innovations

Advantages of Perplexity’s AI-Native Approach:

  • Optimized user experience designed specifically for conversational AI
  • Agility to implement cutting-edge AI techniques without legacy constraints
  • Model-agnostic architecture leveraging best-in-class external AI models
  • Clear value proposition for users seeking direct, cited answers

Looking Ahead: Industry Predictions

Near-Term (1-2 years):

  • Continued convergence of features between platforms
  • Google’s global rollout of AI Overviews across all markets and languages
  • Perplexity’s expansion into enterprise and specialized vertical markets
  • Emergence of more AI-native search platforms following Perplexity’s model

Medium-Term (3-5 years):

  • AI-powered search becomes the standard expectation across all platforms
  • Specialized AI search tools for professional domains (legal, medical, scientific research)
  • Integration of real-time multimodal capabilities (live video analysis, augmented reality search)
  • New regulatory frameworks for AI-powered information systems

Long-Term (5+ years):

  • Fully conversational AI assistants replace traditional search interfaces
  • Personal AI agents that understand individual context and preferences
  • Integration with IoT and ambient computing for seamless information access
  • Potential emergence of decentralized, blockchain-based search alternatives

Recommendations for Stakeholders

For Technology Leaders:

  • Hybrid Strategy: Consider Google’s approach of enhancing existing systems with AI rather than complete rebuilds
  • Model Orchestration: Investigate Perplexity’s approach of orchestrating multiple AI models for optimal results
  • Real-Time Capabilities: Invest in real-time information retrieval and processing systems
  • Citation Systems: Implement transparent source attribution to build user trust

For Business Strategists:

  • Revenue Model Innovation: Experiment with subscription, API, and premium feature models beyond traditional advertising
  • User Experience Focus: Prioritize conversational, answer-first experiences in product development
  • Distribution Strategy: Evaluate the importance of browser control and default search positions
  • Competitive Positioning: Decide between AI-enhancement of existing products vs. AI-native alternatives

For Investors:

  • Platform Risk Assessment: Evaluate how established platforms are adapting to AI disruption
  • Technology Differentiation: Assess the sustainability of competitive advantages in rapidly evolving AI landscape
  • Business Model Viability: Monitor the success of alternative monetization strategies beyond advertising
  • Regulatory Impact: Consider potential regulatory responses to AI-powered information systems and search market concentration
  • Monitor the browser control battle and distribution channel acquisitions

The future of search will be determined by execution quality, user adoption, and the ability to balance innovation with practical business considerations. Both Google and Perplexity have established viable but different paths forward, setting the stage for continued innovation and competition in the AI-powered search landscape.


Conclusion

The evolution of search from Google’s traditional PageRank-driven approach to today’s AI-powered landscape represents one of the most significant technological shifts in internet history. Google’s recent transformation with its Search Generative Experience and Gemini integration demonstrates that even the most successful platforms must reinvent themselves to remain competitive in the AI era.

The competition between Google’s AI-enhanced strategy and Perplexity’s AI-native approach offers valuable insights into different paths for implementing AI at scale. Google’s hybrid approach leverages massive existing infrastructure while gradually transforming user experiences, while Perplexity’s clean-slate design optimizes entirely for conversational AI interactions.

As both platforms continue to evolve, the ultimate winners will be users who gain access to more intelligent, efficient, and helpful ways to access information. The future of search will likely feature elements of both approaches: the scale and comprehensiveness of Google’s enhanced platform combined with the conversational fluency and transparency of AI-native solutions.

The battle for search supremacy in the AI era has only just begun, and the innovations emerging from this competition will shape how humanity accesses and interacts with information for decades to come.


This analysis reflects the state of AI-powered search as of August 2025. The rapidly evolving nature of AI technology and competitive dynamics may significantly impact future developments. Both Google and Perplexity continue to innovate at an unprecedented pace, making ongoing monitoring essential for stakeholders in this space.

Zuckerberg’s Gamble: Risks and Rewards in AI Talent Acquisition


Mark Zuckerberg’s recent move to bring Alex Wang and his team into Meta represents a bold and strategic maneuver amid the rapid advancement of large models and AGI development. Putting aside the ethical considerations, Zuckerberg’s approach—laying off staff, then offering sky-high compensation packages with 48-hour ultimatums to top AI scientists and engineers from OpenAI, alongside Meta’s acquisition of a 49% stake in Scale AI—appears to serve multiple objectives:

1. Undermining Competitors

By poaching key talent from rival companies, Meta not only weakens their R&D teams and disrupts their momentum but also puts pressure on Google, OpenAI, and others to reassess their partnerships with Scale AI. Meta’s investment may further marginalize these competitors by injecting uncertainty into their collaboration with Scale AI.

2. Reinvigorating the Internal Team

Bringing in fresh blood like Alex Wang’s team and top OpenAI talent could reenergize Meta’s existing research units. A successful “talent reset” may help the company gain a competitive edge in the race toward AGI.

3. Enhancing Brand Visibility

Even if the move doesn’t yield immediate results, it has already amplified Meta’s media presence, boosting its reputation as a leader in AI innovation.

From both a talent acquisition and PR standpoint, this appears to be a masterstroke for Meta.


However, the strategy is not without significant risks:

1. Internal Integration and Morale Challenges

The massive compensation packages offered to these new hires could trigger resentment among existing employees—especially in the wake of recent layoffs—due to perceived pay inequity. This may lower morale and even accelerate internal attrition. Cultural differences between the incoming and incumbent teams could further complicate integration and collaboration.

2. Return on Investment and Performance Pressure

Meta’s substantial investment in Alex Wang and Scale AI comes with high expectations for short-term deliverables. In a domain as uncertain as AGI, both the market and shareholders will be eager for breakthroughs. If Wang’s team fails to deliver measurable progress quickly, Meta could face mounting scrutiny and uncertainty over the ROI.

3. Impacts on Scale AI and the Broader Ecosystem

Alex Wang stepping away as CEO is undoubtedly a major loss for Scale AI, even if he retains a board seat. Leadership transitions and potential talent departures may follow. Moreover, Scale AI’s history of legal and compliance issues could reflect poorly on Meta’s brand—especially if public perception ties Meta to those concerns despite holding only non-voting shares. More broadly, Meta’s aggressive “poaching” approach may escalate the AI talent war, drive up industry-wide costs, and prompt renewed debate over ethics and hiring norms in the AI sector.


Conclusion

Meta’s latest move is undeniably ambitious. While it positions the company aggressively in the AGI race, it also carries notable risks in terms of internal dynamics, ROI pressure, and broader ecosystem disruption. Only time will tell whether this bold gamble pays off.

Enigma – Mission X Challenge Accomplished with Python

Enigma M3 from 101 computing: https://www.101computing.net/enigma/
GitHub Repo: https://github.com/cuicaihao/Enigma-Mission-X

Short Summary

Inspired by the Enigma – Mission X Challenge, this repo collects research and practice on different cipher methods.

The primary goal is to use the Python programming language, in Jupyter Notebooks, to achieve the targets listed below.

Example

  • German Navy Ciphertext by Enigma M3: OJSBI BUPKA ECMEE ZH
  • German Message: Ziel hafen von DOVER
  • English Translation: Target port of DOVER
Enigma Mission – X

By running the notebook, it is not difficult to complete the deciphering process with the “keys” and recover the original message from the German Navy ciphertext.
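
For readers who want a feel for machine-based deciphering outside the notebook, here is a minimal sketch using the third-party py-enigma package; the rotor order, ring settings, plugboard pairs, and start position below are placeholders, not the actual Mission X key sheet (that lives in the notebook).

# Minimal sketch of machine-based deciphering with the third-party py-enigma package.
# The rotor order, ring settings, plugboard pairs and start position are placeholders,
# NOT the actual Mission X key sheet.
from enigma.machine import EnigmaMachine

machine = EnigmaMachine.from_key_sheet(
    rotors="II IV V",
    reflector="B",
    ring_settings=[1, 20, 11],
    plugboard_settings="AV BS CG DL FU HZ IN KM OW RT",
)
machine.set_display("WXC")  # placeholder message key

ciphertext = "OJSBIBUPKAECMEEZH"
plaintext = machine.process_text(ciphertext)
print(plaintext)  # with the real key sheet this would read ZIELHAFENVONDOVER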

Notebook Outputs Example

However, it will be much harder to break the cipher without knowing the keys. That is the Turing-Welchman Bombe Simulator challenge.

About Enigma Mission X

Mission X is a game for programmers to accomplish the deciphering job required by Dr Alan Turing.

Mission X Letter from Alan Turing

Programmers need to break the secret with limited information as follows.

Example Message from German Navy

END

Technical Review 04: Human-Computer Interface from In-Context Learning to Instruct Understanding

  1. AI Assistant Summary
  2. Interface with LLM
  3. The Mysterious In-Context Learning
  4. Magical Instruct Understanding
    1. Type 1: Academic Research Oriented Instruct
    2. Type 2: Human/Customer Needs Oriented Instruct
  5. In Context Learning & Instruct Connection
  6. What’s Next?

AI Assistant Summary

The post first discusses different interface technologies used to connect people with language models. These include zero-shot prompting, few-shot prompting, in-context learning, and instruction. It explains the differences between zero-shot and few-shot learning and their advantages and limitations.

Next, it explores the concept of in-context learning, where language models can predict new examples by looking at existing ones without changing their parameters. It compares in-context learning with fine-tuning and highlights the differences between the two approaches.

The post then focuses on instruct understanding, dividing it into two categories: research-oriented and human/customer-needs-oriented instruction. It emphasizes the importance of considering actual user needs in instruct-based tasks.

Lastly, it suggests a possible connection between in-context learning and instruction, proposing that language models could generate task descriptions based on real task instances. It mentions a study that shows improved performance when using instruction derived from this method.

Interface with LLM

Generally, the interface technologies between people and LLM that we often mention include zero-shot prompting, few-shot prompting, In-Context Learning, and instruction. These are actually ways of describing a specific task. But if you look at the literature, you will find that the names are quite confusing.

Zero-shot learning simply feeds the task text to the model and asks for results.

Text: i'll bet the video game is a lot more fun than the film.
Sentiment:

Few-shot learning presents a set of high-quality demonstrations, each consisting of both input and desired output, on the target task. As the model first sees good examples, it can better understand human intention and criteria for what kinds of answers are wanted. Therefore, few-shot learning often leads to better performance than zero-shot. However, it comes at the cost of more token consumption and may hit the context length limit when the input and output text are long.

Text: (lawrence bounces) all over the stage, dancing, running, sweating, mopping his face and generally displaying the wacky talent that brought him fame in the first place.
Sentiment: positive

Text: despite all evidence to the contrary, this clunker has somehow managed to pose as an actual feature movie, the kind that charges full admission and gets hyped on tv and purports to amuse small children and ostensible adults.
Sentiment: negative

Text: for the first time in years, de niro digs deep emotionally, perhaps because he's been stirred by the powerful work of his co-stars.
Sentiment: positive

Text: i'll bet the video game is a lot more fun than the film.
Sentiment:

Among them, Instruct is the interface method of ChatGPT, which means that people give descriptions of tasks in natural language, such as

Translate this sentence from Chinese to English:
....

What used to be called “zero-shot prompting” is now commonly referred to as “Instruct.” The two terms express the same idea, but the methods behind them differ.

When interacting with instruction models, we should describe the task requirements in detail, being specific and precise, and avoid saying what not to do; instead, specify what to do.

Please label the sentiment towards the movie of the given movie review. The sentiment label should be "positive" or "negative". 
Text: i'll bet the video game is a lot more fun than the film.
Sentiment:

Specifying the intended audience is another smart way to give instructions. For example, to produce educational materials for kids, or content that is safe for work:

Describe what is quantum physics to a 6-year-old.

.. in language that is safe for work.

In-context instruction learning (Ye et al. 2023) combines few-shot learning with instruction prompting. It incorporates multiple demonstration examples across different tasks in the prompt, each demonstration consisting of instruction, task input, and output. Note that their experiments were only on classification tasks and the instruction prompt contains all label options.

Definition: Determine the speaker of the dialogue, "agent" or "customer".
Input: I have successfully booked your tickets.
Output: agent

Definition: Determine which category the question asks for, "Quantity" or "Location".
Input: What's the oldest building in US?
Output: Location

Definition: Classify the sentiment of the given movie review, "positive" or "negative".
Input: i'll bet the video game is a lot more fun than the film.
Output:

In the early days, people would attempt to express a task by using different words or sentences, continually refining their approach. This method was effective for fitting the training data, without considering the distribution. The current approach is to give a specific command statement and aim for the language model to understand it. Both methods involve expressing tasks, but the underlying ideas behind them are distinct.

In Context Learning and few-shot prompting have a similar meaning. They both involve providing examples to a language model and using them to solve new problems.

In my opinion, In Context Learning can be seen as a concrete way of expressing a task, while Instruct is a more abstract way of describing tasks. However, the usage of these terms can be confusing, and this understanding is just my personal opinion. Therefore, I will only discuss In Context Learning and Instruct from here on, and will not mention zero-shot and few-shot further.

The Mysterious In-Context Learning

If you think about it carefully, you will find that In Context Learning is a very magical technology. What’s so magical about it?

The magic is this: when you provide the LLM with several sample examples {<x1, y1>, <x2, y2>, …, <xn, yn>} and then give it a new input x_n+1, the LLM can successfully predict the corresponding y_n+1.

When you hear this, you might ask: What’s so magical about this? Isn’t that how fine-tuning works? If you ask this, it means you haven’t thought deeply enough about this issue.

Fine-tuning and In Context Learning both seem to provide examples to the LLM, but they are qualitatively different (refer to the figure above): fine-tuning uses these examples as training data and applies backpropagation to modify the LLM’s parameters, and that act of modifying the parameters is what it means for the LLM to learn from the examples.

In Context Learning, by contrast, merely shows the examples to the LLM and asks it to predict new ones; backpropagation is never used to modify the model parameters based on those examples. Since the parameters are not modified, the LLM appears not to have gone through any learning process. And if it has not gone through a learning process, why can it predict new examples correctly just by looking at a few?

This is the magic of In Context Learning. Does it remind you of the lyric, “Just because I took one more look at you in the crowd, I can never forget your face again,” from the song “Legend”? Legendary, isn’t it?

It seems that In Context Learning does not learn knowledge from the examples. So does the LLM learn in some strange way, or does it really learn nothing at all? The answer is still an unsolved mystery. Existing studies tell different stories, it is difficult to judge which one is true, and some research conclusions even contradict each other.

Here are some current opinions. As for who is right and who is wrong, you can only decide for yourself. Of course, I think pursuing the truth behind this magical phenomenon is a good research topic.

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? tries to prove that In Context Learning is not learning a mapping between x and y.

It was discovered that, in the sample examples {<xi, yi>} provided to the LLM, it does not actually matter whether yi is the correct answer: replacing the correct answers with random ones does not hurt In Context Learning performance.

What really matters for In Context Learning is the distribution of x and y, that is, the distribution of the input text x and of the candidate answers y. If you change these two distributions, for example by replacing y with something other than the candidate answers, then In Context Learning performance drops sharply.

In short, this work argues that In Context Learning does not learn the mapping function; what matters is the distribution of inputs and outputs, and these cannot be changed arbitrarily.
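
To keep the mechanics concrete, here is a minimal sketch of how an in-context prompt is assembled from demonstration pairs; llm_complete is a stand-in for any completion API, and, as discussed above, no model parameters are updated anywhere in this process.

# Minimal sketch of In Context Learning as pure prompting: demonstration pairs are pasted
# into the prompt and the model is asked to continue the pattern for a new input.
# llm_complete() is a hypothetical stand-in for an LLM completion API; nothing is trained.
def build_icl_prompt(demonstrations, new_input):
    lines = []
    for x, y in demonstrations:
        lines.append(f"Input: {x}\nOutput: {y}\n")
    lines.append(f"Input: {new_input}\nOutput:")
    return "\n".join(lines)

demonstrations = [
    ("the movie was a delight from start to finish", "positive"),
    ("a clunker that wastes two hours of your life", "negative"),
]
prompt = build_icl_prompt(demonstrations, "i'll bet the video game is a lot more fun than the film.")
# prediction = llm_complete(prompt)  # hypothetical call; swapping the y labels for random
#                                    # ones is exactly the ablation the paper above runs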

Magical Instruct Understanding

We can regard “Instruct” as a task description that is convenient for human beings to understand. Under this premise, current research on “Instruct” can be divided into two types: Instruct oriented toward academic research, and Instruct that describes real human needs.

Fine-tuned Language Models Are Zero-Shot Learners (FLAN)

Type 1: Academic Research Oriented Instruct

Let’s look at the first type of “Instruct”, the more academically oriented kind. Its core research theme is the generalization ability of the LLM to understand “Instruct” in multi-task scenarios.

As shown in the FLAN figure above, there are many NLP tasks; for each task, the researchers construct one or more prompt templates as that task’s Instruct, and then fine-tune the LLM on the training examples so that it learns many tasks at the same time.
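
Under simplifying assumptions, the sketch below shows how FLAN-style instruction-tuning records can be expanded from a handful of prompt templates; the templates and the example are illustrative, not FLAN’s actual ones.

# Sketch of FLAN-style instruction data construction: each task gets several prompt
# templates, and every training example is rendered through each template. The templates
# and the example below are illustrative, not the actual FLAN templates.
SENTIMENT_TEMPLATES = [
    "Review: {text}\nIs this review positive or negative?",
    "Decide the sentiment (positive/negative) of the following review:\n{text}",
]

def expand_to_instruction_records(task_examples, templates):
    records = []
    for example in task_examples:
        for template in templates:
            records.append({
                "input": template.format(text=example["text"]),
                "target": example["label"],
            })
    return records

records = expand_to_instruction_records(
    [{"text": "a gorgeous, witty, seductive movie.", "label": "positive"}],
    SENTIMENT_TEMPLATES,
)
# Records like these, pooled across many tasks, feed a standard fine-tuning loop.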

After training, the LLM is given the instruction for a brand-new task it has never seen before and asked to solve it zero-shot. Based on how well the task is solved, we can judge whether the LLM has generalized its ability to understand Instruct.

Research findings suggest several factors that significantly enhance the instruction-following generalization of LLMs: increasing the number of multi-task training tasks, increasing model size, using CoT prompting, and diversifying the range of tasks. Incorporating these measures substantially improves the model’s capacity to understand instructions.

Type 2: Human/Customer Needs Oriented Instruct

The second type is instruction based on real human needs. This type of research is represented by InstructGPT and ChatGPT. This type of work is also based on multi-tasking, but the biggest difference from academic research-oriented work is that it is oriented to the real needs of human users.

Why do you say that? Because the task description prompts they use for LLM multi-task training are sampled from real requests submitted by a large number of users, instead of fixing the scope of the research task and then letting researchers write the task description prompts.

The so-called “real needs” here are reflected in two aspects. First, because the prompts are randomly sampled from task descriptions submitted by users, the types of tasks covered are more diverse and more in line with users’ real needs. Second, each prompt description of a task is written by a user and reflects what ordinary users actually say when expressing a task requirement, not what researchers imagine users would say. Clearly, the user experience of an LLM improved by this type of work will be better.

In the InstructGPT paper, this method is also compared with FLAN’s Instruct-based method. First, the tasks, data, and prompt templates from FLAN are fine-tuned on GPT-3 to reproduce the FLAN method, and the result is then compared with InstructGPT. Because the base model of InstructGPT is also GPT-3, only the data and methods differ, so the two are comparable; the comparison shows that the FLAN method falls far behind InstructGPT.

So what is the reason behind this? After analyzing the data, the paper concludes that the FLAN method covers relatively few task domains, a subset of those covered by InstructGPT, which is why it performs worse. In other words, the tasks involved in the FLAN paper do not match users’ actual needs, leading to weak results in real scenarios. This shows how important it is to collect real needs from user data.

In Context Learning & Instruct Connection

If we assume that In Context Learning uses some examples to concretely express task commands, Instruct is an abstract task description that is more in line with human habits.

So, a natural question arises: is there any connection between them? For example, can we provide an LLM with several concrete examples of completing a certain task and have the LLM find the corresponding Instruct command described in natural language? (In other words, can the LLM create the Instruct command for itself by looking at how humans perform the task?)

There’s actually some work being done on this issue here and there, and I think it’s a really interesting research direction.

Let’s talk about the answer first. The answer is: Yes, LLM can.

Large Language Models are Human-Level Prompt Engineers is a very interesting piece of work in this direction.

As shown in the figure above, for a given task the LLM is shown some examples and asked to automatically generate a natural language command that describes the task; the task description generated by the LLM is then used to test how effectively it elicits the task.
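
A toy sketch of this example-to-instruction loop is shown below; the prompt wording and the llm_complete helper are assumptions for illustration, not the paper’s exact setup.

# Toy sketch of instruction induction: show the model a few (input, output) pairs and
# ask it to write the instruction that maps inputs to outputs. The prompt wording and
# llm_complete() are assumptions, not the paper's exact templates.
def induce_instruction(pairs, llm_complete):
    demos = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in pairs)
    prompt = (
        "Here are input-output pairs produced by following a single instruction.\n"
        f"{demos}\n"
        "The instruction was:"
    )
    return llm_complete(prompt)

# candidate = induce_instruction([("prove", "evorp"), ("search", "hcraes")], llm_complete)
# candidate might read: "Reverse the letters of the input word."
# The induced instruction is then scored by how well it solves held-out examples.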

The base models it uses are GPT-3 and InstructGPT. With this technique, the Instruct generated by the LLM performs far better than GPT-3 and InstructGPT without it, and on some tasks reaches superhuman performance.

This shows that there is a mysterious inner connection between concrete task examples and natural language descriptions of tasks. As for what exactly that connection is, we have no solid conclusions yet.

What’s Next?

Technical Review 05: How to Enhance LLM’s Reasoning Ability

Previous Blogs:

Fast Neural Style Transfer by PyTorch (Mac OS)

2021-Jan-31: The git repo has been upgraded from PyTorch 0.3.0 to PyTorch 1.7.0 with Python 3.8.3.


Continuing my last post, Image Style Transfer Using ConvNets by TensorFlow (Windows), this article introduces Fast Neural Style Transfer by PyTorch on macOS.

The original program is written in Python and uses PyTorch and SciPy. A GPU is not necessary but can provide a significant speedup, especially for training a new model. Regular-sized images can be styled on a laptop or desktop using saved models.

More details about the algorithm can be found in the following papers:

  1. Perceptual Losses for Real-Time Style Transfer and Super-Resolution ;
  2. Instance Normalization: The Missing Ingredient for Fast Stylization.

If you cannot download the papers from those links, here are the Papers.

You can find all the source code and images at my GitHub: fast_neural_style .


Increasing Transparency into What It Takes to Achieve Performance Gains of Machine Learning Algorithms

The computations required for Deep Learning research have been doubling every few months, resulting in an estimated 300,000x increase from 2012 to 2018. AI could account for as much as one-tenth of the world’s electricity use by 2025 according to this article [1].

Continue reading “Increasing Transparency into What It Takes to Achieve Performance Gains of Machine Learning Algorithms”

Just Got a Reviewer Certificate from Data Mining and Knowledge Discovery (WIREs)

Thanks to the Editors and Board of WIREs for supporting me. As an independent reviewer, I will be fair to everyone and never give in to the “scientific mafia” and “citation cartels”.

Data Mining and Knowledge Discovery (WIREs) (Impact Factor: 2.541)

WIREs_Reviewer_Certificate.PNG

A Taste of TensorFlow on My Android Phone

If you like Google’s open-source machine learning framework, TensorFlow, do not miss “TensorFlow For Poets”. I went through the tutorial this afternoon and found it super awesome. See the photos below: I first tested it on a coffee mug from my intern company, Aurecon Group, using the virtual device Nexus 5X from Android Studio 3.0.1 on a MacBook Air 11″ (do not do this unless you have enough SSD 😛).

 


Then, I successfully installed the compiled app (TF_Classify) on my Xiaomi Mi 4c (MIUI 9.0 – Android 7.0) and tested it on my coffee mug at home.
You can download and install it on your own Android devices from the following link:

Continue reading “A Taste of TensorFlow on My Android Phone”

Starting My First Intern at Melbourne Australia Tomorrow

Dear All,

I am using the song above to thank you all for your help and support in the past. As you know, I have spent the last three years (2014-2017) pursuing my Ph.D. degree in Computer Science, and I plan to graduate in 2018.

Continue reading “Starting My First Intern at Melbourne Australia Tomorrow”