AI-Powered Search: Google’s Transformation vs. Perplexity

  1. Abstract
  2. Google’s AI Transformation: From PageRank to Gemini-Powered Search
    1. The Search Generative Experience (SGE) Revolution
    2. Google’s LLM Arsenal
    3. Technical Architecture Integration
    4. Key Differentiators of Google’s AI Search
  3. Perplexity AI Architecture: The RAG-Powered Search Revolution
    1. Simplified Architecture View
    2. How Perplexity Works: From Query to Answer
    3. Technical Workflow Diagram
  4. The New Search Paradigm: AI-First vs AI-Enhanced Approaches
    1. Google’s Philosophy: “AI-Enhanced Universal Search”
    2. Perplexity’s Philosophy: “AI-Native Conversational Search”
    3. Comprehensive Technology & Business Comparison
  5. The Future of AI-Powered Search: A New Competitive Landscape
    1. Implementation Strategy Battle: Integration vs. Innovation
    2. The Multi-Modal Future
    3. Business Model Evolution Under AI
    4. Technical Architecture Convergence
    5. The Browser and Distribution Channel Wars
  6. Strategic Implications and Future Outlook
    1. Key Strategic Insights
    2. The New Competitive Dynamics
    3. Looking Ahead: Industry Predictions
  7. Recommendations for Stakeholders
  8. Conclusion

Abstract

This blog examines the rapidly evolving landscape of AI-powered search, comparing Google’s recent transformation with its Search Generative Experience (SGE) and Gemini integration against Perplexity AI’s native AI-first approach. Both companies now leverage large language models, but with fundamentally different architectures and philosophies.

The New Reality: Google has undergone a dramatic transformation from traditional keyword-based search to an AI-driven conversational answer engine. With the integration of Gemini, LaMDA, PaLM, and the rollout of AI Overviews (formerly SGE), Google now synthesizes information from multiple sources into concise, contextual answers—directly competing with Perplexity’s approach.

Key Findings:

  • Convergent Evolution: Both platforms now use LLMs for answer generation, but Google maintains its traditional search infrastructure while Perplexity was built AI-first from the ground up
  • Architecture Philosophy: Google integrates AI capabilities into its existing search ecosystem (hybrid approach), while Perplexity centers everything around RAG and multi-model orchestration (AI-native approach)
  • AI Technology Stack: Google leverages Gemini (multimodal), LaMDA (conversational), and PaLM models, while Perplexity orchestrates external models (GPT, Claude, Gemini, Llama, DeepSeek)
  • User Experience: Google provides AI Overviews alongside traditional search results, while Perplexity delivers answer-first experiences with citations
  • Market Dynamics: The competition has intensified with Google’s AI transformation, making the choice between platforms more about implementation philosophy than fundamental capabilities

This represents a paradigm shift where the question is no longer “traditional vs. AI search” but rather “how to best implement AI-powered search” with different approaches to integration, user experience, and business models.

Keywords: AI Search, RAG, Large Language Models, Search Architecture, Perplexity AI, Google Search, Conversational AI, SGE, Gemini.

Google’s AI Transformation: From PageRank to Gemini-Powered Search

Google has undergone one of the most significant transformations in its history, evolving from a traditional link-based search engine to an AI-powered answer engine. This transformation represents a strategic response to the rise of AI-first search platforms and changing user expectations.

The Search Generative Experience (SGE) Revolution

Google’s Search Generative Experience (SGE), now known as AI Overviews, fundamentally changes how search results are presented:

  • AI-Synthesized Answers: Instead of just providing links, Google’s AI generates comprehensive insights, explanations, and summaries from multiple sources
  • Contextual Understanding: Responses consider user context including location, search history, and preferences for personalized results
  • Multi-Step Query Handling: The system can handle complex, conversational queries that require reasoning and synthesis
  • Real-Time Information Grounding: AI Overviews are grounded in current, real-time information while maintaining accuracy

Google’s LLM Arsenal

Google has strategically integrated multiple advanced AI models into its search infrastructure:

Gemini: The Multimodal Powerhouse

  • Capabilities: Understands and generates text, images, videos, and audio
  • Search Integration: Enables complex query handling including visual search, reasoning tasks, and detailed information synthesis
  • Multimodal Processing: Handles queries that combine text, images, and other media types

LaMDA: Conversational AI Foundation

  • Purpose: Powers natural, dialogue-like interactions in search
  • Features: Enables follow-up questions and conversational context maintenance
  • Integration: Supports Google’s shift toward conversational search experiences

PaLM: Large-Scale Language Understanding

  • Role: Provides advanced language processing capabilities
  • Applications: Powers complex reasoning, translation (100+ languages), and contextual understanding
  • Scale: Handles extended documents and multimodal inputs

Technical Architecture Integration

Google’s approach differs from AI-first platforms by layering AI capabilities onto existing infrastructure:

  • Hybrid Architecture: Maintains traditional search capabilities while adding AI-powered features
  • Scale Integration: Leverages existing massive infrastructure and data
  • DeepMind Synergy: Strategic integration of DeepMind research into commercial search applications
  • Continuous Learning: ML ranking algorithms and AI models learn from user interactions in real-time
  • Global Reach: AI features deployed across 100+ languages with localized understanding

Perplexity AI Architecture: The RAG-Powered Search Revolution

Perplexity AI represents a fundamental reimagining of search technology, built on three core innovations:

  1. Retrieval-Augmented Generation (RAG): Combines real-time web crawling with large language model capabilities
  2. Multi-Model Orchestration: Leverages multiple AI models (GPT, Claude, Gemini, Llama, DeepSeek) for optimal responses
  3. Integrated Citation System: Provides transparent source attribution with every answer
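To make the multi-model orchestration idea concrete, here is a minimal sketch of query routing: picking a model family based on simple features of the question. The model labels and keyword rules below are illustrative assumptions, not Perplexity’s actual logic.

```python
# Hypothetical query router: send each query to the model family best
# suited to it. Labels and heuristics are illustrative assumptions only.

def route_query(query: str) -> str:
    """Pick a model label for a query using simple keyword heuristics."""
    q = query.lower()
    if any(k in q for k in ("prove", "derive", "step by step")):
        return "reasoning-model"      # e.g., a chain-of-thought-tuned model
    if any(k in q for k in ("image", "photo", "diagram")):
        return "multimodal-model"     # e.g., a vision-capable model
    if len(q.split()) > 50:
        return "long-context-model"   # long inputs need a large context window
    return "general-model"            # default: fast, inexpensive model

print(route_query("Derive the quadratic formula step by step"))
# reasoning-model
```

A production orchestrator would also weigh cost, latency, and per-model quality metrics, but the core idea is the same: the router, not the user, decides which model answers.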

The platform offers multiple access points to serve different user needs: Web Interface, Mobile App, Comet Browser, and Enterprise API.

Simplified Architecture View

For executive presentations and high-level discussions, a simplified three-layer view highlights the essential components.

How Perplexity Works: From Query to Answer

Understanding Perplexity’s workflow reveals why it delivers fundamentally different results than traditional search engines. Unlike Google’s approach of matching keywords to indexed pages, Perplexity follows a sophisticated multi-step process:

The Eight-Step Journey

  1. Query Reception: User submits a natural language question through any interface
  2. Real-Time Retrieval: Custom crawlers search the web for current, relevant information
  3. Source Indexing: Retrieved content is processed and indexed in real-time
  4. Context Assembly: RAG system compiles relevant information into coherent context
  5. Model Selection: AI orchestrator chooses the optimal model(s) for the specific query type
  6. Answer Generation: Selected model(s) generate comprehensive responses using retrieved context
  7. Citation Integration: System automatically adds proper source attribution
  8. Response Delivery: Final answer with citations is presented to the user
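The eight steps above can be sketched in a few lines of code. The retrieval and generation functions below are stubs standing in for Perplexity’s real crawler and model calls; all names and data are illustrative assumptions, not the actual implementation.

```python
# Minimal sketch of the eight-step RAG flow. Retrieval and generation
# are stubbed; a real system would crawl the web and call LLM APIs.

def retrieve(query: str) -> list[dict]:
    """Steps 2-3: real-time retrieval and indexing (stubbed)."""
    return [
        {"url": "https://example.com/a", "text": "Fact A about the topic."},
        {"url": "https://example.com/b", "text": "Fact B about the topic."},
    ]

def generate(query: str, context: str) -> str:
    """Steps 5-6: model selection and answer generation (stubbed)."""
    return "Answer synthesized from retrieved context. [1][2]"

def answer(query: str) -> str:
    sources = retrieve(query)                                # steps 2-3
    context = "\n".join(s["text"] for s in sources)          # step 4: context assembly
    draft = generate(query, context)                         # steps 5-6
    cites = [f"[{i + 1}] {s['url']}" for i, s in enumerate(sources)]  # step 7
    return draft + "\n" + "\n".join(cites)                   # step 8: delivery

print(answer("What is RAG?"))
```

The key structural point is that generation never sees the raw web, only the assembled context, which is what makes per-sentence citation feasible.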

Technical Workflow Diagram

The eight-step sequence above traces how a user query flows through Perplexity’s system. The process typically completes in under 3 seconds, delivering both speed and accuracy.

The New Search Paradigm: AI-First vs AI-Enhanced Approaches

The competition between Google and Perplexity has evolved beyond traditional vs. AI search to represent two distinct philosophies for implementing AI-powered search experiences.

Google’s Philosophy: “AI-Enhanced Universal Search”

  • Hybrid Integration: Layer advanced AI capabilities onto proven search infrastructure
  • Comprehensive Coverage: Maintain traditional search results alongside AI-generated overviews
  • Gradual Transformation: Evolve existing user behaviors rather than replace them entirely
  • Scale Advantage: Leverage massive existing data and infrastructure for AI training and deployment

Perplexity’s Philosophy: “AI-Native Conversational Search”

  • Model Agnostic: Orchestrate best-in-class models rather than developing proprietary AI
  • Clean Slate Design: Built from the ground up with AI-first architecture
  • Answer-Centric: Focus entirely on direct answer generation with source attribution
  • Conversational Flow: Design for multi-turn, contextual conversations rather than single queries

Comprehensive Technology & Business Comparison

| Dimension | Google AI-Enhanced Search | Perplexity AI-Native Search |
|---|---|---|
| Input | Natural language + traditional keywords | Pure natural language, conversational |
| AI Models | Gemini, LaMDA, PaLM (proprietary) | GPT, Claude, Gemini, Llama, DeepSeek (orchestrated) |
| Architecture | Hybrid (AI + traditional infrastructure) | Pure AI-first (RAG-centered) |
| Retrieval | Enhanced index + Knowledge Graph + real-time | Custom crawler + real-time retrieval |
| Core Tech | AI Overviews + traditional ranking | RAG + multi-model orchestration |
| Output | Hybrid (AI Overview + links + ads) | Direct answers with citations |
| Context | Limited conversational memory | Full multi-turn conversation memory |
| Extensions | Maps, News, Shopping, Ads integration | Document search, e-commerce, APIs |
| Business | Ad-driven + AI premium features | Subscription + API + e-commerce |
| UX | “AI answers + traditional options” | “Conversational AI assistant” |
| Products | Google Search with SGE/AI Overviews | Perplexity Web/App, Comet Browser |
| Deployment | Global rollout with localization | Global expansion, English-focused |
| Data Advantage | Massive proprietary data + real-time | Real-time web data + model diversity |

The Future of AI-Powered Search: A New Competitive Landscape

The integration of AI into search has fundamentally changed the competitive landscape. Rather than a battle between traditional and AI search, we now see different approaches to implementing AI-powered experiences competing for user mindshare and market position.

Implementation Strategy Battle: Integration vs. Innovation

Google’s Integration Strategy:

  • Advantage: Massive user base and infrastructure to deploy AI features at scale
  • Challenge: Balancing AI innovation with existing business model dependencies
  • Approach: Gradual rollout of AI features while maintaining traditional search options

Perplexity’s Innovation Strategy:

  • Advantage: Clean slate design optimized for AI-first experiences
  • Challenge: Building user base and competing with established platforms
  • Approach: Focus on superior AI experience to drive user acquisition

The Multi-Modal Future

Both platforms are moving toward comprehensive multi-modal experiences:

  • Visual Search Integration: Google Lens vs. Perplexity’s image understanding capabilities
  • Voice-First Interactions: Google Assistant integration vs. conversational AI interfaces
  • Video and Audio Processing: Gemini’s multimodal capabilities vs. orchestrated model approaches
  • Document Intelligence: Enterprise document search and analysis capabilities

Business Model Evolution Under AI

Advertising Model Transformation:

  • Google must adapt its ad-centric model to AI Overviews without disrupting user experience
  • Challenge of monetizing direct answers vs. traditional click-through advertising
  • Need for new ad formats that work with conversational AI

Subscription and API Models:

  • Perplexity’s success with subscription tiers validates alternative monetization
  • Growing enterprise demand for AI-powered search APIs and integrations
  • Premium features becoming differentiators (document search, advanced models, higher usage limits)

Technical Architecture Convergence

Despite different starting points, both platforms are converging on similar technical capabilities:

  • Real-Time Information: Both now emphasize current, up-to-date information retrieval
  • Source Attribution: Transparency and citation becoming standard expectations
  • Conversational Context: Multi-turn conversation support across platforms
  • Model Diversity: Google developing multiple specialized models, Perplexity orchestrating external models

The Browser and Distribution Channel Wars

Perplexity’s Chrome Acquisition Strategy:

  • $34.5B all-cash bid for Chrome represents unprecedented ambition in AI search competition
  • Strategic Value: Control over browser defaults, user data, and search distribution
  • Market Impact: Success would fundamentally alter competitive dynamics and user acquisition costs
  • Regulatory Reality: Bid likely serves as strategic positioning and leverage rather than realistic acquisition

Alternative Distribution Strategies:

  • AI-native browsers (Comet) as specialized entry points
  • API integrations into enterprise and developer workflows
  • Mobile-first experiences capturing younger user demographics

Strategic Implications and Future Outlook

The competition between Google’s AI-enhanced approach and Perplexity’s AI-native strategy represents a fascinating case study in how established platforms and startups approach technological transformation differently.

Key Strategic Insights

  • The AI Integration Challenge: Google’s transformation demonstrates that even dominant platforms must fundamentally reimagine their core products to stay competitive in the AI era
  • Architecture Philosophy Matters: The choice between hybrid integration (Google) vs. AI-first design (Perplexity) creates different strengths, limitations, and user experiences
  • Business Model Pressure: AI-powered search challenges traditional advertising models, forcing experimentation with subscriptions, APIs, and premium features
  • User Behavior Evolution: Both platforms are driving the shift from “search and browse” to “ask and receive” interactions, fundamentally changing how users access information

The New Competitive Dynamics

Advantages of Google’s AI-Enhanced Approach:

  • Massive scale and infrastructure for global AI deployment
  • Existing user base to gradually transition to AI features
  • Deep integration with knowledge graphs and proprietary data
  • Ability to maintain traditional search alongside AI innovations

Advantages of Perplexity’s AI-Native Approach:

  • Optimized user experience designed specifically for conversational AI
  • Agility to implement cutting-edge AI techniques without legacy constraints
  • Model-agnostic architecture leveraging best-in-class external AI models
  • Clear value proposition for users seeking direct, cited answers

Looking Ahead: Industry Predictions

Near-Term (1-2 years):

  • Continued convergence of features between platforms
  • Google’s global rollout of AI Overviews across all markets and languages
  • Perplexity’s expansion into enterprise and specialized vertical markets
  • Emergence of more AI-native search platforms following Perplexity’s model

Medium-Term (3-5 years):

  • AI-powered search becomes the standard expectation across all platforms
  • Specialized AI search tools for professional domains (legal, medical, scientific research)
  • Integration of real-time multimodal capabilities (live video analysis, augmented reality search)
  • New regulatory frameworks for AI-powered information systems

Long-Term (5+ years):

  • Fully conversational AI assistants replace traditional search interfaces
  • Personal AI agents that understand individual context and preferences
  • Integration with IoT and ambient computing for seamless information access
  • Potential emergence of decentralized, blockchain-based search alternatives

Recommendations for Stakeholders

For Technology Leaders:

  • Hybrid Strategy: Consider Google’s approach of enhancing existing systems with AI rather than complete rebuilds
  • Model Orchestration: Investigate Perplexity’s approach of orchestrating multiple AI models for optimal results
  • Real-Time Capabilities: Invest in real-time information retrieval and processing systems
  • Citation Systems: Implement transparent source attribution to build user trust
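As a starting point for the citation-system recommendation, here is a deliberately simple sketch: attribute each answer sentence to the source with the greatest word overlap. A real system would use embeddings and span-level alignment; the function and data below are illustrative assumptions only.

```python
# Toy citation attribution: map a sentence to the source whose text
# shares the most words with it. Illustrative only; real systems use
# semantic similarity, not raw word overlap.

def cite(sentence: str, sources: dict) -> str:
    """Return the id of the source with the largest word overlap."""
    words = set(sentence.lower().split())

    def overlap(text: str) -> int:
        return len(words & set(text.lower().split()))

    return max(sources, key=lambda sid: overlap(sources[sid]))

sources = {
    "[1]": "The Eiffel Tower is 330 metres tall and made of iron.",
    "[2]": "Paris is the capital of France.",
}
print(cite("The tower is made of iron", sources))
# [1]
```

Even this crude version illustrates the trust-building property: every claim in the answer can be traced back to a retrievable source.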

For Business Strategists:

  • Revenue Model Innovation: Experiment with subscription, API, and premium feature models beyond traditional advertising
  • User Experience Focus: Prioritize conversational, answer-first experiences in product development
  • Distribution Strategy: Evaluate the importance of browser control and default search positions
  • Competitive Positioning: Decide between AI-enhancement of existing products vs. AI-native alternatives

For Investors:

  • Platform Risk Assessment: Evaluate how established platforms are adapting to AI disruption
  • Technology Differentiation: Assess the sustainability of competitive advantages in rapidly evolving AI landscape
  • Business Model Viability: Monitor the success of alternative monetization strategies beyond advertising
  • Regulatory Impact: Consider potential regulatory responses to AI-powered information systems and search market concentration

The future of search will be determined by execution quality, user adoption, and the ability to balance innovation with practical business considerations. Both Google and Perplexity have established viable but different paths forward, setting the stage for continued innovation and competition in the AI-powered search landscape.

Conclusion

The evolution of search from Google’s traditional PageRank-driven approach to today’s AI-powered landscape represents one of the most significant technological shifts in internet history. Google’s recent transformation with its Search Generative Experience and Gemini integration demonstrates that even the most successful platforms must reinvent themselves to remain competitive in the AI era.

The competition between Google’s AI-enhanced strategy and Perplexity’s AI-native approach offers valuable insights into different paths for implementing AI at scale. Google’s hybrid approach leverages massive existing infrastructure while gradually transforming user experiences, while Perplexity’s clean-slate design optimizes entirely for conversational AI interactions.

As both platforms continue to evolve, the ultimate winners will be users who gain access to more intelligent, efficient, and helpful ways to access information. The future of search will likely feature elements of both approaches: the scale and comprehensiveness of Google’s enhanced platform combined with the conversational fluency and transparency of AI-native solutions.

The battle for search supremacy in the AI era has only just begun, and the innovations emerging from this competition will shape how humanity accesses and interacts with information for decades to come.


This analysis reflects the state of AI-powered search as of August 2025. The rapidly evolving nature of AI technology and competitive dynamics may significantly impact future developments. Both Google and Perplexity continue to innovate at unprecedented pace, making ongoing monitoring essential for stakeholders in this space.

Our Future with AI: Three Strategies to Ensure It Stays on Our Side

As Artificial Intelligence rapidly evolves, ensuring it remains a beneficial tool rather than a source of unforeseen challenges is paramount; this article explores three critical strategies to keep AI firmly on our side, drawing lessons from cybersecurity, robotics, and astrobiology. Source: IEEE Spectrum, April 2025, “3 Ways to Keep AI on Our Side: AI Researchers Can Draw Lessons From Cybersecurity, Robotics, and Astrobiology.”



Abstract: This article presents three distinct, cross-disciplinary strategies for ensuring the safe and beneficial development of Artificial Intelligence.

Addressing Idiosyncratic AI Error Patterns (Cybersecurity Perspective): Bruce Schneier and Nathan E. Sanders highlight that AI systems, particularly Large Language Models (LLMs), exhibit error patterns significantly different from human mistakes—being less predictable, not clustered around knowledge gaps, and lacking self-awareness of error. They propose a dual research thrust: engineering AIs to produce more human-intelligible errors (e.g., through refined alignment techniques like RLHF) and developing novel security and mistake-correction systems specifically designed for AI’s unique “weirdness” (e.g., iterative, varied prompting).

Updating Ethical Frameworks to Combat AI Deception (Robotics & Internet Culture Perspective): Dariusz Jemielniak argues that Isaac Asimov’s traditional Three Laws of Robotics are insufficient for modern AI due to the rise of AI-enabled deception, including deepfakes, sophisticated misinformation campaigns, and manipulative AI interactions. He proposes a “Fourth Law of Robotics”: A robot or AI must not deceive a human being by impersonating a human being. Implementing this law would necessitate mandatory AI disclosure, clear labeling of AI-generated content, technical identification standards, legal enforcement, and public AI literacy initiatives to maintain trust in human-AI collaboration.
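The “mandatory AI disclosure” requirement behind the proposed Fourth Law can be made concrete with a tiny sketch: every message carries an explicit origin flag, and consumers reject unlabeled content. The envelope schema below is a hypothetical example, not any real labeling standard.

```python
# Illustrative AI-disclosure envelope: content must declare whether it
# was AI-generated, and consumers refuse messages without that label.
# The schema is a hypothetical example, not an existing standard.

import json

def make_message(text: str, ai_generated: bool) -> str:
    """Wrap content in an envelope that discloses its origin."""
    return json.dumps({"text": text, "ai_generated": ai_generated})

def check_disclosure(raw: str) -> bool:
    """Accept only messages that carry an explicit boolean origin label."""
    msg = json.loads(raw)
    return isinstance(msg.get("ai_generated"), bool)

msg = make_message("Hello, I am a support assistant.", ai_generated=True)
print(check_disclosure(msg))               # True: origin is disclosed
print(check_disclosure('{"text": "hi"}')) # False: no disclosure field
```

Enforcement is the hard part in practice, which is why the proposal pairs technical labeling with legal mandates and public AI-literacy efforts.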

Establishing Rigorous Protocols for AGI Detection and Interaction (Astrobiology/SETI Perspective): Edmon Begoli and Amir Sadovnik suggest that research into Artificial General Intelligence (AGI) can draw methodological lessons from the Search for Extraterrestrial Intelligence (SETI). They advocate for a structured scientific approach to AGI that includes:

  • Developing clear, multidisciplinary definitions of “general intelligence” and related concepts like consciousness.
  • Creating robust, novel metrics and evaluation benchmarks for detecting AGI, moving beyond limitations of tests like the Turing Test.
  • Formulating internationally recognized post-detection protocols for validation, transparency, safety, and ethical considerations, should AGI emerge.

Collectively, these perspectives emphasize the urgent need for innovative, multi-faceted approaches—spanning security engineering, ethical guideline revision, and rigorous scientific protocol development—to proactively manage the societal integration and potential future trajectory of advanced AI systems.


Here is the full detailed content:

3 Ways to Keep AI on Our Side

AS ARTIFICIAL INTELLIGENCE reshapes society, our traditional safety nets and ethical frameworks are being put to the test. How can we make sure that AI remains a force for good? Here we bring you three fresh visions for safer AI.

  • In the first essay, security expert Bruce Schneier and data scientist Nathan E. Sanders explore how AI’s “weird” error patterns create a need for innovative security measures that go beyond methods honed on human mistakes.
  • Dariusz Jemielniak, an authority on Internet culture and technology, argues that the classic robot ethics embodied in Isaac Asimov’s famous rules of robotics need an update to counterbalance AI deception and a world of deepfakes.
  • And in the final essay, the AI researchers Edmon Begoli and Amir Sadovnik suggest taking a page from the search for intelligent life in the stars; they propose rigorous standards for detecting the possible emergence of human-level AI intelligence.

As AI advances with breakneck speed, these cross-disciplinary strategies may help us keep our hands on the reins.


AI Mistakes Are Very Different from Human Mistakes

WE NEED NEW SECURITY SYSTEMS DESIGNED TO DEAL WITH THEIR WEIRDNESS

Bruce Schneier & Nathan E. Sanders

HUMANS MAKE MISTAKES all the time. All of us do, every day, in tasks both new and routine. Some of our mistakes are minor, and some are catastrophic. Mistakes can break trust with our friends, lose the confidence of our bosses, and sometimes be the difference between life and death.

Over the millennia, we have created security systems to deal with the sorts of mistakes humans commonly make. These days, casinos rotate their dealers regularly, because they make mistakes if they do the same task for too long. Hospital personnel write on patients’ limbs before surgery so that doctors operate on the correct body part, and they count surgical instruments to make sure none are left inside the body. From copyediting to double-entry bookkeeping to appellate courts, we humans have gotten really good at preventing and correcting human mistakes.

Humanity is now rapidly integrating a wholly different kind of mistake-maker into society: AI. Technologies like large language models (LLMs) can perform many cognitive tasks traditionally fulfilled by humans, but they make plenty of mistakes. You may have heard about chatbots telling people to eat rocks or add glue to pizza. What differentiates AI systems’ mistakes from human mistakes is their weirdness. That is, AI systems do not make mistakes in the same ways that humans do.

Much of the risk associated with our use of AI arises from that difference. We need to invent new security systems that adapt to these differences and prevent harm from AI mistakes.

IT’S FAIRLY EASY to guess when and where humans will make mistakes. Human errors tend to come at the edges of someone’s knowledge: Most of us would make mistakes solving calculus problems. We expect human mistakes to be clustered: A single calculus mistake is likely to be accompanied by others. We expect mistakes to wax and wane depending on factors such as fatigue and distraction. And mistakes are typically accompanied by ignorance: Someone who makes calculus mistakes is also likely to respond “I don’t know” to calculus-related questions.

To the extent that AI systems make these humanlike mistakes, we can bring all of our mistake-correcting systems to bear on their output. But the current crop of AI models—particularly LLMs—make mistakes differently.

AI errors come at seemingly random times, without any clustering around particular topics. The mistakes tend to be more evenly distributed through the knowledge space; an LLM might be equally likely to make a mistake on a calculus question as it is to propose that cabbages eat goats. And AI mistakes aren’t accompanied by ignorance. An LLM will be just as confident when saying something completely and obviously wrong as it will be when saying something true.

The inconsistency of LLMs makes it hard to trust their reasoning in complex, multistep problems. If you want to use an AI model to help with a business problem, it’s not enough to check that it understands what factors make a product profitable; you need to be sure it won’t forget what money is.

THIS SITUATION INDICATES two possible areas of research: engineering LLMs to make mistakes that are more humanlike, and building new mistake-correcting systems that deal with the specific sorts of mistakes that LLMs tend to make.

We already have some tools to lead LLMs to act more like humans. Many of these arise from the field of “alignment” research, which aims to make models act in accordance with the goals of their human developers. One example is the technique that was arguably responsible for the breakthrough success of ChatGPT: reinforcement learning from human feedback. In this method, an AI model is rewarded for producing responses that get a thumbs-up from human evaluators. Similar approaches could be used to induce AI systems to make humanlike mistakes, particularly by penalizing them more for mistakes that are less intelligible.
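The preference signal behind that thumbs-up training can be sketched as a pairwise (Bradley-Terry style) loss: the reward model is penalized when it ranks the rejected response above the one humans preferred. The scalar scores below stand in for a reward model’s outputs; this is a toy illustration under those assumptions, not ChatGPT’s training code.

```python
# Toy pairwise preference loss: low when the reward model scores the
# human-preferred response higher, high when it ranks them backwards.
# Scores are stand-ins for reward-model outputs; illustration only.

import math

def preference_loss(score_preferred: float, score_rejected: float) -> float:
    """Negative log-probability that the preferred response wins."""
    margin = score_preferred - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

print(round(preference_loss(2.0, 0.0), 3))  # small loss: ranking agrees
print(round(preference_loss(0.0, 2.0), 3))  # large loss: ranking disagrees
```

Training minimizes this loss over many human-labeled comparison pairs, which is what pushes model behavior (including its failure modes) toward patterns humans find intelligible.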

When it comes to catching AI mistakes, some of the systems that we use to prevent human mistakes will help. To an extent, forcing LLMs to double-check their own work can help prevent errors. But LLMs can also confabulate seemingly plausible yet truly ridiculous explanations for their flights from reason.

Other mistake-mitigation systems for AI are unlike anything we use for humans. Because machines can’t get fatigued or frustrated, it can help to ask an LLM the same question repeatedly in slightly different ways and then synthesize its responses. Humans won’t put up with that kind of annoying repetition, but machines will.
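The repeat-and-synthesize check just described can be sketched as majority voting over rephrased prompts. The `ask_model` stub below is a hypothetical stand-in for a real LLM call, with one deliberate “slip” baked in to show the mechanism.

```python
# Self-consistency sketch: ask the same question several ways and keep
# the majority answer. ask_model is a stub for a real LLM API call.

from collections import Counter

def ask_model(prompt: str) -> str:
    # Stub responses; one variant "slips" to show the vote correcting it.
    canned = {
        "What is 17 * 3?": "51",
        "Compute seventeen times three.": "51",
        "Multiply 17 by 3 and give only the number.": "52",  # the slip
    }
    return canned[prompt]

def self_consistent_answer(variants: list[str]) -> str:
    """Return the most common answer across rephrased prompts."""
    answers = [ask_model(v) for v in variants]
    return Counter(answers).most_common(1)[0][0]

variants = [
    "What is 17 * 3?",
    "Compute seventeen times three.",
    "Multiply 17 by 3 and give only the number.",
]
print(self_consistent_answer(variants))
# 51
```

This exploits exactly the asymmetry the essay notes: a machine tolerates the annoying repetition that a human respondent never would.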

RESEARCHERS ARE still struggling to understand where LLM mistakes diverge from human ones. Some of the weirdness of AI is actually more humanlike than it first appears.

Small changes to a query to an LLM can result in wildly different responses, a problem known as prompt sensitivity. But, as any survey researcher can tell you, humans behave this way, too. The phrasing of a question in an opinion poll can have drastic impacts on the answers.

LLMs also seem to have a bias toward repeating the words that were most common in their training data—for example, guessing familiar place names like “America” even when asked about more exotic locations. Perhaps this is an example of the human “availability heuristic” manifesting in LLMs; like humans, the machines spit out the first thing that comes to mind rather than reasoning through the question. Also like humans, perhaps, some LLMs seem to get distracted in the middle of long documents; they remember more facts from the beginning and end.

In some cases, what’s bizarre about LLMs is that they act more like humans than we think they should. Some researchers have tested the hypothesis that LLMs perform better when offered a cash reward or threatened with death. It also turns out that some of the best ways to “jailbreak” LLMs (getting them to disobey their creators’ explicit instructions) look a lot like the kinds of social-engineering tricks that humans use on each other: for example, pretending to be someone else or saying that the request is just a joke. But other effective jailbreaking techniques are things no human would ever fall for. One group found that if they used ASCII art (constructions of symbols that look like words or pictures) to pose dangerous questions, like how to build a bomb, the LLM would answer them willingly.

Humans may occasionally make seemingly random, incomprehensible, and inconsistent mistakes, but such occurrences are rare and often indicative of more serious problems. We also tend not to put people exhibiting these behaviors in decision-making positions. Likewise, we should confine AI decision-making systems to applications that suit their actual abilities—while keeping the potential ramifications of their mistakes firmly in mind.


Asimov’s Laws of Robotics Need an Update for AI

PROPOSING A FOURTH LAW OF ROBOTICS

Dariusz Jemielniak

IN 1942, the legendary science fiction author Isaac Asimov introduced his Three Laws of Robotics in his short story “Runaround.” The laws were later popularized in his seminal story collection I, Robot.

  1. FIRST LAW: A robot may not injure a human being or, through inaction, allow a human being to come to harm.
  2. SECOND LAW: A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
  3. THIRD LAW: A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

While drawn from works of fiction, these laws have shaped discussions of robot ethics for decades. And as AI systems—which can be considered virtual robots—have become more sophisticated and pervasive, some technologists have found Asimov’s framework useful for considering the potential safeguards needed for AI that interacts with humans.

But the existing three laws are not enough. Today, we are entering an era of unprecedented human-AI collaboration that Asimov could hardly have envisioned. The rapid advancement of generative AI, particularly in language and image generation, has created challenges beyond Asimov’s original concerns about physical harm and obedience.

THE PROLIFERATION of AI-enabled deception is particularly concerning. According to the FBI’s most recent Internet Crime Report, cybercrime involving digital manipulation and social engineering results in annual losses counted in the billions. The European Union Agency for Cybersecurity’s ENISA Threat Landscape 2023 highlighted deepfakes—synthetic media that appear genuine—as an emerging threat to digital identity and trust.

Social-media misinformation is a huge problem today. I studied it extensively during the pandemic and can say that the proliferation of generative AI tools has made its detection increasingly difficult. AI-generated propaganda is often just as persuasive as, or even more persuasive than, traditional propaganda, and bad actors can very easily use AI to create convincing content. Deepfakes are on the rise everywhere. Botnets can use AI-generated text, speech, and video to create false perceptions of widespread support for any political issue. Bots are now capable of making phone calls while impersonating people, and AI scam calls imitating familiar voices are increasingly common. Any day now, we can expect a boom in video-call scams based on AI-rendered overlay avatars, allowing scammers to impersonate loved ones and target the most vulnerable populations.

Even more alarmingly, children and teenagers are forming emotional attachments to AI agents, and are sometimes unable to distinguish between interactions with real friends and bots online. Already, there have been suicides attributed to interactions with AI chatbots.

In his 2019 book Human Compatible (Viking), the eminent computer scientist Stuart Russell argues that AI systems’ ability to deceive humans represents a fundamental challenge to social trust. This concern is reflected in recent policy initiatives, most notably the European Union’s AI Act, which includes provisions requiring transparency in AI interactions and transparent disclosure of AI-generated content. In Asimov’s time, people couldn’t have imagined the countless ways in which artificial agents could use online communication tools and avatars to deceive humans.

Therefore, we must make an addition to Asimov’s laws.

FOURTH LAW: A robot or AI must not deceive a human being by impersonating a human being.

WE NEED CLEAR BOUNDARIES. While human-AI collaboration can be constructive, AI deception undermines trust and leads to wasted time, emotional distress, and misuse of resources. Artificial agents must identify themselves to ensure our interactions with them are transparent and productive. AI-generated content should be clearly marked unless it has been significantly edited and adapted by a human.

Implementation of this Fourth Law would require

  • mandatory AI disclosure in direct interactions,
  • clear labeling of AI-generated content,
  • technical standards for AI identification,
  • legal frameworks for enforcement, and
  • educational initiatives to improve AI literacy.
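
Most of these requirements are policy, but the labeling points are also engineering problems. As a loose illustration of what “clear labeling of AI-generated content” could mean in software, here is a minimal Python sketch that wraps generated text in a machine-readable disclosure record; the field names are invented for this example and follow no existing standard:

```python
import hashlib
import json
from datetime import datetime, timezone

def label_ai_content(text: str, model_name: str) -> str:
    """Wrap AI-generated text in a disclosure record (illustrative only)."""
    record = {
        "content": text,
        "ai_generated": True,          # the mandatory disclosure flag
        "generator": model_name,       # identifies the artificial agent
        "created_utc": datetime.now(timezone.utc).isoformat(),
        # A digest of the content lets platforms detect label tampering.
        "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
    }
    return json.dumps(record)

labeled = label_ai_content("Hello, I am a chatbot.", "example-llm")
```

A platform receiving such a record could display the disclosure or refuse to present the content as human-authored. Real provenance and watermarking proposals are far more involved, of course.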

Of course, all this is easier said than done. Enormous research efforts are already underway to find reliable ways to watermark or detect AI-generated text, audio, images, and videos. But creating the transparency I’m calling for is far from a solved problem.

The future of human-AI collaboration depends on maintaining clear distinctions between human and artificial agents. As noted in the IEEE report Ethically Aligned Design, transparency in AI systems is fundamental to building public trust and ensuring the responsible development of artificial intelligence.

Asimov’s complex stories showed that even robots that tried to follow the rules often discovered there were unintended consequences to their actions. Still, having AI systems that are at least trying to follow Asimov’s ethical guidelines would be a very good start.


What Can AI Researchers Learn from Alien Hunters?

THE SETI INSTITUTE’S APPROACH HAS LESSONS FOR RESEARCH ON ARTIFICIAL GENERAL INTELLIGENCE

Edmon Begoli & Amir Sadovnik

THE EMERGENCE OF artificial general intelligence (systems that can perform any intellectual task a human can) could be the most important event in human history. Yet AGI remains an elusive and controversial concept. We lack a clear definition of what it is, we don’t know how to detect it, and we don’t know how to interact with it if it finally emerges.

What we do know is that today’s approaches to studying AGI are not nearly rigorous enough. Companies like OpenAI are actively striving to create AGI, but they include research on AGI’s social dimensions and safety issues only as their corporate leaders see fit. And academic institutions don’t have the resources for significant efforts.

We need a structured scientific approach to prepare for AGI. A useful model comes from an unexpected field: the search for extraterrestrial intelligence, or SETI. We believe that the SETI Institute’s work provides a rigorous framework for detecting and interpreting signs of intelligent life.

The idea behind SETI goes back to the beginning of the space age. In their 1959 Nature paper, the physicists Giuseppe Cocconi and Philip Morrison suggested ways to search for interstellar communication. Given the uncertainty of extraterrestrial civilizations’ existence and sophistication, they theorized about how we should best “listen” for messages from alien societies.

We argue for a similar approach to studying AGI, in all its uncertainties. The last few years have shown a vast leap in AI capabilities. The large language models (LLMs) that power chatbots like ChatGPT and enable them to converse convincingly with humans have renewed the discussion of AGI. One notable 2023 preprint even argued that ChatGPT shows “sparks” of AGI, and today’s most cutting-edge language models are capable of sophisticated reasoning and outperform humans in many evaluations.

While these claims are intriguing, there are reasons to be skeptical. In fact, a large group of scientists have argued that the current set of tools won’t bring us any closer to true AGI. But given the risks associated with AGI, if there is even a small likelihood of it occurring, we must make a serious effort to develop a standard definition of AGI, establish a SETI-like approach to detecting it, and devise ways to safely interact with it if it emerges.

THE CRUCIAL FIRST step is to define what exactly to look for. In SETI’s case, researchers decided to look for certain narrowband signals that would be distinct from other radio signals present in the cosmic background. These signals are considered intentional and only produced by intelligent life. None have been found so far.

In the case of AGI, matters are far more complicated. Today, there is no clear definition of artificial general intelligence. The term is hard to define because it contains other imprecise and controversial terms. Although intelligence has been defined by the Oxford English Dictionary as “the ability to acquire and apply knowledge and skills,” there is still much debate on which skills are involved and how they can be measured. The term general is also ambiguous. Does an AGI need to be able to do absolutely everything a human can do?

One of the first missions of a “SETI for AGI” project must be to clearly define the terms general and intelligence so the research community can speak about them concretely and consistently. These definitions need to be grounded in disciplines such as computer science, measurement science, neuroscience, psychology, mathematics, engineering, and philosophy.

There’s also the crucial question of whether a true AGI must include consciousness and self-awareness. These terms also have multiple definitions, and the relationships between them and intelligence must be clarified. Although it’s generally thought that consciousness isn’t necessary for intelligence, it’s often intertwined with discussions of AGI because creating a self-aware machine would have many philosophical, societal, and legal implications.

NEXT COMES the task of measurement. In the case of SETI, if a candidate narrowband signal is detected, an expert group will verify that it is indeed from an extraterrestrial source. They’ll use established criteria—for example, looking at the signal type and checking for repetition—and conduct assessments at multiple facilities for additional validation.

How to best measure computer intelligence has been a long-standing question in the field. In a famous 1950 paper, Alan Turing proposed the “imitation game,” more widely known as the Turing Test, which assesses whether human interlocutors can distinguish if they are chatting with a human or a machine. Although the Turing Test was useful in the past, the rise of LLMs has made clear that it isn’t a complete enough test to measure intelligence. As Turing himself noted, the relationship between imitating language and thinking is still an open question.

Future appraisals must be directed at different dimensions of intelligence. Although measures of human intelligence are controversial, IQ tests can provide an initial baseline to assess one dimension. In addition, cognitive tests on topics such as creative problem-solving, rapid learning and adaptation, reasoning, and goal-directed behavior would be required to assess general intelligence.

But it’s important to remember that these cognitive tests were designed for humans and might contain assumptions that might not apply to computers, even those with AGI abilities. For example, depending on how it’s trained, a machine may score very high on an IQ test but remain unable to solve much simpler tasks. In addition, an AI may have new abilities that aren’t measurable by our traditional tests. There’s a clear need to design novel evaluations that can alert us when meaningful progress is made toward AGI.

IF WE DEVELOP AGI, we must be prepared to answer questions such as: Is the new form of intelligence a new form of life? What kinds of rights does it have? What are the potential safety concerns, and what is our approach to containing the AGI entity?

Here, too, SETI provides inspiration. SETI’s postdetection protocols emphasize validation, transparency, and international cooperation, with the goal of maximizing the credibility of the process, minimizing sensationalism, and bringing structure to such a profound event. Likewise, we need internationally recognized AGI protocols to bring transparency to the entire process, apply safety-related best practices, and begin the discussion of ethical, social, and philosophical concerns.

We readily acknowledge that the SETI analogy can go only so far. If AGI emerges, it will be a human-made phenomenon. We will likely gradually engineer AGI and see it slowly emerge, so detection might be a process that takes place over a period of years, if not decades. In contrast, the existence of extraterrestrial life is something that we have no control over, and contact could happen very suddenly.

The consequences of a true AGI are entirely unpredictable. To best prepare, we need a methodical approach to defining, detecting, and interacting with AGI, which could be the most important development in human history.


2024 Guest Lecture Notes: AI, Machine Learning and Data Mining in Recommendation System and Entity Matching

  1. Lecture Notes Repository on GitHub
    1. Disclaimer
    2. 2024-10-14: AI/ML in Action for CSE5ML
    3. 2024-10-15: AI/DM in Action for CSE5DMI
  2. Contribution to the Company and Society
  3. Reference

In October 2024, I was invited by Dr Lydia C. and Dr Peng C. to give two presentations as a guest lecturer at La Trobe University (Melbourne) to the students enrolled in CSE5DMI Data Mining and CSE5ML Machine Learning.

The lectures focused on data mining and machine learning applications and practice in industry and digital retail, and on how students should prepare themselves for their future. Attendees were postgraduate students enrolled in CSE5ML or CSE5DMI in 2024 Semester 2, approximately 150 students for each subject, pursuing one of the following degrees:

  • Master of Information Technology (IT)
  • Master of Artificial Intelligence (AI)
  • Master of Data Science
  • Master of Business Analytics

Lecture Notes Repository on GitHub

Viewers can find the lecture notes in my GitHub repository, https://github.com/cuicaihao/GuestLecturePublic, released under a Creative Commons Attribution 4.0 International License.

Disclaimer

This repository is intended for educational purposes only. The content, including presentations and case studies, is provided “as is” without any warranties or guarantees of any kind. The authors and contributors are not responsible for any errors or omissions, or for any outcomes related to the use of this material. Use the information at your own risk. All trademarks, service marks, and company names are the property of their respective owners. The inclusion of any company or product names does not imply endorsement by the authors or contributors.

This is a public repository aiming to share the lectures with the public. The *.excalidraw files can be downloaded and opened at https://excalidraw.com/.

2024-10-14: AI/ML in Action for CSE5ML

  • General Slides CSE5ML
  • Case Study: Recommendation System
  • A recommendation system is an artificial intelligence (AI) algorithm, usually associated with machine learning, that uses big data to suggest or recommend additional products to consumers. Recommendations can be based on various criteria, including past purchases, search history, demographic information, and other factors.
  • This presentation was developed for students of CSE5ML at La Trobe University, Melbourne, and used in the guest lecture on 14 October 2024.
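
As a toy illustration of the definition above, the Python sketch below recommends unseen items to a user by weighting other users’ ratings with user-to-user cosine similarity; every name and number here is made up for the example, and a real system would be far larger and more sophisticated:

```python
import math

# user -> {item: rating}; a tiny made-up ratings matrix
ratings = {
    "alice": {"laptop": 5, "mouse": 4, "desk": 1},
    "bob":   {"laptop": 4, "mouse": 5, "monitor": 4},
    "carol": {"desk": 5, "lamp": 4, "monitor": 2},
}

def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse rating vectors."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    num = sum(u[i] * v[i] for i in common)
    den = math.sqrt(sum(x * x for x in u.values())) * \
          math.sqrt(sum(x * x for x in v.values()))
    return num / den

def recommend(user: str, k: int = 2) -> list:
    """Score unseen items by similarity-weighted ratings of other users."""
    seen = ratings[user]
    scores: dict = {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(seen, their)
        for item, r in their.items():
            if item not in seen:
                scores[item] = scores.get(item, 0.0) + sim * r
    return sorted(scores, key=scores.get, reverse=True)[:k]

top = recommend("alice")   # "monitor" ranks first for alice
```

Because alice’s ratings are most similar to bob’s, bob’s highly rated unseen item (“monitor”) dominates the scores.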

2024-10-15: AI/DM in Action for CSE5DMI

  • General Slides CSE5DMI
  • Case Study: Entity Matching System
    • Entity matching is the task of clustering duplicated database records to their underlying entities: “Given a large collection of records, cluster these records so that the records in each cluster all refer to the same underlying entity.”
  • This presentation was developed for students of CSE5DMI at La Trobe University, Melbourne, and used in the guest lecture on 15 October 2024.
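
To make the definition concrete, here is a minimal Python sketch of one classic recipe: normalize the names, compare pairs, and merge matches into clusters with union-find. The records, suffix list, and similarity threshold are made up for illustration; a production entity-matching system would be far more sophisticated:

```python
import itertools
import re

# Toy records: two mentions each of two real-world entities.
records = [
    {"id": 1, "name": "Apple Inc."},
    {"id": 2, "name": "apple,  inc"},
    {"id": 3, "name": "Microsoft Corporation"},
    {"id": 4, "name": "MICROSOFT CORP."},
]
SUFFIXES = {"inc", "corp", "corporation", "co", "ltd"}

def normalize(name: str) -> str:
    """Lowercase, strip punctuation, drop corporate suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)

def jaccard(a: str, b: str) -> float:
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

# Union-find: merge every pair whose similarity clears the threshold.
parent = {r["id"]: r["id"] for r in records}
def find(x):
    while parent[x] != x:
        x = parent[x]
    return x

for r1, r2 in itertools.combinations(records, 2):
    if jaccard(normalize(r1["name"]), normalize(r2["name"])) >= 0.5:
        parent[find(r2["id"])] = find(r1["id"])

clusters: dict = {}
for r in records:
    clusters.setdefault(find(r["id"]), []).append(r["id"])
# clusters.values() groups ids 1 & 2 (Apple) and 3 & 4 (Microsoft)
```

The quadratic pairwise loop is the part real systems replace with blocking or learned matchers, since comparing every pair does not scale to large collections.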

Contribution to the Company and Society

This journey also aligns with the Company’s strategy.

  • Being invited to be a guest lecturer for students with related knowledge backgrounds in 2024 aligns closely with EDG’s core values: “we’re real, we’re inclusive, we’re responsible”.
  • By participating in a guest lecture and discussion on data analytics and AI/ML practice beyond theories, we demonstrate our commitment to sharing knowledge and expertise, embodying our responsibility to contribute positively to the academic community and bridge the gap between theory builders and problem solvers.
  • This event allows us to inspire and educate students in the same domains at La Trobe University, showcasing our passion and enthusiasm for the business. Through this engagement, we aim to positively impact attendees, providing suggestions for their career paths, and fostering a spirit of collaboration and continuous learning.
  • Showing our purpose, values, and ways of working will impress future graduates who may want to come and work for us, and to stay and thrive with us. It also helps us deliver on our purpose to create a more sociable future, together.

Moreover, I am grateful for all the support and encouragement I have received from my university friends and teammates throughout this journey. Additionally, the teaching resources and environment in the West Lecture Theatres at La Trobe University are outstanding!

Reference

-END-

The Future of Coding: Will Generative AI Make Programmers Obsolete?

Table of Contents

  1. Is coding still worth learning in 2024?
  2. Is AI replacing software engineers?
  3. Impact of AI on software engineering
  4. The problem with AI-generated code
  5. How AI can help software engineers
  6. Does AI really make you code faster?
  7. Can one AI-powered engineer do the work of many?
  8. Future of Software Engineering
  9. Reference
Credits: this post is a notebook of the key points from YouTube content creator Programming with Mosh's video, with some editorial work. TL;DR: watch the video.

Is coding still worth learning in 2024?

This is a common question for many people, especially the younger generation of students, when they try to choose a career path that offers some assurance of future income.

People are worried that AI is going to replace software engineers, or any engineer related to coding and designs.

As you know, we should trust solid data instead of media hearsay in the digital era. Social media has been creating an anxious feeling that every job is going to collapse because of AI and that coding has no future.

But I’ve got a different take, backed up by real-world numbers, as follows.

Note: In this post, “software engineer” represents all groups of coders (data engineer, data analyst, data scientist, machine learning engineer, frontend/backend/full-stack developers, programmers and researchers).

Is AI replacing software engineers?

The short answer is NO.

But there is a lot of fear about AI replacing coders. Headlines scream about robots taking over jobs, and it can be overwhelming. But the truth is:

AI is not going to take your job; instead, the people who can work with AI will have the advantage, and they are the ones who probably will.

Software engineering is not going away, at least not anytime soon in our generation. Here is some data to back this up.

The US Bureau of Labor Statistics (BLS) is a government agency that tracks job growth across the country and publishes the data on its website. From the data, we see a continued demand for software developers and for computer and information scientists.

It projects that employment of software developers will grow by 26% from 2022 to 2032, while the average across all occupations is only 3%. This is a strong indication that software engineering is here to stay.

Source: https://www.bls.gov/ooh/computer-and-information-technology/software-developers.htm#tab-6

In our lives, the research and development conducted by computer and information research scientists turns ideas into technology. As demand for new and better technology grows, demand for computer and information research scientists will grow as well.

There is a similar trend for computer and information research scientists, whose employment is expected to grow by 23% from 2022 to 2032.

source: https://www.bls.gov/ooh/computer-and-information-technology/computer-and-information-research-scientists.htm#tab-6

Impact of AI on software engineering

To better understand the impact of AI on software engineering, let’s do a quick revisit of the history of programming.

In the early days of programming, engineers wrote code in a form that only the computer understood. Then we created compilers, which let us program in human-readable languages like C++ and Java without worrying about how the code would eventually be converted into zeros and ones, or where it would be stored in memory.

Here is the fact:

Compilers did not replace programmers. They made them more efficient!

Since then, we have built countless software applications and totally changed the world.

The problem with AI-generated code

AI will likely do the same. As it changes the future, we will be able to delegate routine and repetitive coding tasks to AI, so we can focus on complex problem-solving, design, and innovation.

This will allow us to build more sophisticated software applications than most people can even imagine today. But even then, just because AI can generate code doesn’t mean we can or should delegate the entire coding aspect of software development to AI, because

AI-Generated Code Is Lower Quality: we still need to review and refine it before using it in production.

In fact, there is a study to support this: Coding on Copilot: 2023 Data Suggests Downward Pressure on Code Quality. The researchers analyzed 153 million lines of code written between 2020 and 2023 and found disconcerting trends for maintainability, projecting that code churn would double in 2024.

source: abstract of 2023 Data Shows Downward Pressure on Code Quality

So, yes, we can produce more code with AI, but

More Code != Better Code

Humans should always review and refine AI-generated code for quality and security before deploying it to production. That means all the coding skills that software engineers currently have will continue to be relevant in the future.

You still need knowledge of data structures and algorithms, programming languages and their tricky parts, and tools and frameworks. You need all that knowledge to review and refine the AI-generated code; you will just spend less time typing it into the computer.

So anyone telling you that you can use natural language to build software without understanding anything about coding is out of touch with the reality of software engineering (or they are trying to sell you something, e.g., GPUs).

source: NVIDIA CEO: No Need To Learn Coding, Anybody Can Be A Programmer With Technology

How AI can help software engineers

Of course, you can make a toy app with AI in minutes, but this is not the same kind of software that runs our banks, transportation, healthcare, security, and more. These are the software systems that really matter, and our lives depend on them. We can’t let a code monkey talk to a chatbot in English and get that software built. At least, this will not happen in our lifetime.

In the future, we will probably spend more time designing new features and products with AI instead of writing boilerplate code. We will likely delegate aspects of coding to AI, but this doesn’t mean we don’t need to learn to code.

As a software engineer or any coding practitioner, you will always need to review what AI generates and refine it either by hand or by guiding the AI to improve the code.

Keep in mind that coding is only one small part of a software engineer’s job; we often spend most of our time talking to people, understanding requirements, writing stories, discussing software and system architecture, and so on.

Instead of being worried about AI, I’m more concerned about Human Intelligence!

Does AI really make you code faster?

AI can boost our programming productivity, but not necessarily our overall productivity.

In fact, McKinsey’s report, Unleashing Developer Productivity with Generative AI, found that for highly complex tasks, developers saw less than a 10% improvement in speed with generative AI support.

source: https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/unleashing-developer-productivity-with-generative-ai

As you can see, AI helped the most with documentation and, to some extent, code generation; but for code refactoring the improvement dropped to around 20%, and for high-complexity tasks it was less than 10%.

 Time savings shrank to less than 10 percent on tasks that developers deemed high in complexity due to, for example, their lack of familiarity with a necessary programming framework.

Thus, if anyone tells you that software engineers will be obsolete in 5 years, they are either ignorant or trying to sell you something.

In fact, some studies suggest that the role of software engineers (coders) may become more valuable, as they will be needed to develop, manage, and maintain these AI systems.

They (software engineers) need to understand all the complexity of building software and use AI to boost their productivity.

Can one AI-powered engineer do the work of many?

Now, people are worried that one senior engineer can simply use AI to replace many engineers, eventually leaving no job opportunities for juniors.

But again, this is a fallacy, because in reality the time savings you get from AI are not as great as promised. Anyone who uses AI to generate code knows this: it takes effort to get the right prompts for usable results, and the code still needs polishing.

Thus, it is not as if one engineer will suddenly have so much free time that they can do the job of many people.

But you may ask: this is now, so what about the future? Maybe in a year or two, AI will start to build software like a human.

In theory, yes: AI is advancing, and one day it may even reach and surpass human intelligence. But as the saying often attributed to Einstein goes:

In Theory, Theory and Practice are the Same.

In Practice, they are NOT.

The reality is that while machines may be able to handle repetitive and routine tasks, human creativity and expertise will still be necessary for developing complex solutions and strategies.

Software engineering will be extremely important over the next several decades. I don’t think it is going away in the future, but I do believe it will change.

Future of Software Engineering

Software powers our world and that will not change anytime soon.

In the future, we will have to learn how to input the right prompts into our AI tools to get the expected results. This is not an easy skill to develop; it requires problem-solving capability as well as knowledge of programming languages and tools. So, if you’ve already made up your mind and don’t want to invest your time in software engineering or coding, that’s perfectly fine. Follow your passion!

The coding tools will evolve, as they always do, but the true skill lies in learning and adapting. The future engineer needs today’s coding skills and a good understanding of how to use AI effectively. The future brings more complexity and demands more knowledge and adaptability from software engineers.

If you like building things with code, and if the idea of shaping the future with technology excites you, don’t let negativity and fear of generative AI hold you back.

Reference

Enigma – Mission X Challenge Accomplished with Python

Enigma M3 from 101 computing: https://www.101computing.net/enigma/
GitHub Repo: https://github.com/cuicaihao/Enigma-Mission-X

Short Summary

Inspired by the Enigma – Mission X Challenge, this repo is used to save my research and practice efforts on different cipher methods.

The primary goal is to use the Python programming language in Jupyter Notebooks to achieve the targets listed as follows:

Example

  • German Navy Ciphertext by Enigma M3: OJSBI BUPKA ECMEE ZH
  • German Message: Ziel hafen von DOVER
  • English Translation: Target port of DOVER
Enigma Mission – X

By running the notebook, it is not difficult to complete the deciphering process with the “keys” and recover the original message from the German Navy’s ciphertext.
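
Why do the same key settings both encipher and decipher? Enigma’s reflector makes each per-position substitution its own inverse. The toy Python sketch below (much simpler than a real Enigma M3, with a made-up reflector wiring and a single rotor-like stepping offset) reproduces that reciprocal property:

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# A reflector is 13 letter pairs, so the substitution is its own inverse.
PAIRS = ["AY", "BR", "CU", "DH", "EQ", "FS", "GL",
         "IP", "JX", "KN", "MO", "TZ", "VW"]
REFLECT = {}
for a, b in PAIRS:
    REFLECT[a], REFLECT[b] = b, a

def encipher(text: str, key: int = 0) -> str:
    """Shift by a stepping offset, reflect, then shift back."""
    out = []
    for i, ch in enumerate(text):
        if ch not in REFLECT:          # pass spaces etc. through unchanged
            out.append(ch)
            continue
        step = (key + i) % 26          # offset advances like a rotor
        shifted = ALPHABET[(ALPHABET.index(ch) + step) % 26]
        reflected = REFLECT[shifted]
        out.append(ALPHABET[(ALPHABET.index(reflected) - step) % 26])
    return "".join(out)

msg = "ZIELHAFENVONDOVER"
ct = encipher(msg, key=7)
# Running the cipher again with the same key recovers the plaintext.
assert encipher(ct, key=7) == msg
```

The real machine composes three rotors and a plugboard with the reflector, but the involution property, and hence the symmetric encrypt/decrypt workflow used in the notebook, is the same.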

Notebook Outputs Example

However, it would be difficult to break the cipher without knowing the keys. That is the Turing-Welchman Bombe Simulator challenge.

About Enigma Mission X

Mission X is a game for programmers to accomplish the deciphering job required by Dr Alan Turing.

Mission X Letter from Alan Turing

Programmers need to break the secret with limited information as follows.

Example Message from German Navy

END

Deep ConvNets for Oracle Bone Script Recognition with PyTorch and Qt-GUI

Shang Dynasty Oracle Bone Scripts Images

1. Project Background

This blog demonstrates how to use PyTorch to build deep convolutional neural networks and how to use Qt to create a GUI around the pre-trained model. The final app runs as in the figure below.

Qt GUI with pre-trained Deep ConvNets model for Oracle Bone Scripts Recognition: Model predicts the Oracle Bone Script ‘合’ with 99.8% Acc.

The five original oracle bone scripts in this sample image can be translated into modern Chinese characters as “贞,今日,其雨?” (Test: Today, will it rain?)

Please note that I am not an expert in ancient Chinese, so the translation may not be entirely accurate. In the GUI, the user can draw the script in the input panel and then click the run button to get the top 10 Chinese characters ranked by probability. The highest-ranked result is presented with a green background colour and a probability of 99.8%.

I will assume that readers have a basic understanding of deep learning models, intermediate Python programming skills, and some familiarity with UX/UI design in Qt. There are excellent free tutorials on the internet, or one could spend a few dollars to join online courses. I see no hurdles to mastering these skills.

The following sections first explain the basic requirements for this project and then cover all the basic steps in detail:

  1. Init the project
  2. Create the Python Environment and Install the Dependencies
  3. Download the Raw Data and Preprocess the Data
  4. Build the Model with Pytorch
    • Review the Image
    • Test the Dataloader
    • Build the Deep ConvNets Model
    • Test the Model with Sample Images
  5. Test the Model with Qt-GUI

The source code can be found on my GitHub Repo: Oracle-Bone-Script-Recognition: Step by Step Demo; the README file contains all the basic steps to run on your local machine.

2. Basic Requirements

I used the cookiecutter package to generate a skeleton of the project.

There are some opinions implicit in the project structure that have grown out of experience with what works and what doesn’t when collaborating on data science projects. Some of the opinions are about workflows, and some are about tools that make life easier.

  • Data is immutable
  • Notebooks are for exploration and communication (not for production)
  • Analysis is a DAG (I used the ‘Makefile’ to create command modules of the workflow)
  • Build from the environment up

Starting Requirements

  • conda 4.12.0
  • Python 3.7 or 3.8. I would suggest using Anaconda for the installation of Python, or you can just install the Miniconda package, which saves a lot of space on your hard drive.

3. Tutorial Step by Step

Step 1: Init the project

Use the ‘git’ command to clone the project from GitHub.

cd PROJECT_DIR
git clone https://github.com/cuicaihao/deep-learning-for-oracle-bone-script-recognition 
# or
# gh repo clone cuicaihao/deep-learning-for-oracle-bone-script-recognition

Check the project structure.

cd deep-learning-for-oracle-bone-script-recognition
ls -l
# or 
# tree -h

You will see a structure similar to the one shown at the end of this post. Meanwhile, you can open the ‘Makefile’ to see the raw commands of the workflow.

Step 2: Create the Python Environment and Install the Dependencies

The default setting is to create a virtual environment with Python 3.8.

make create_environment

Then, we activate the virtual environment.

conda activate oracle-bone-script-recognition

Then, we install the dependencies.

make requirements

The details of the dependencies are listed in the ‘requirements.txt’ file.

Step 3: Download the Raw Data and Preprocess the Data

The first challenge is to find a data set of oracle bone scripts; I found the website 甲骨文 and its GitHub repo, which provide all the script images and the image-to-label database I need. The image folder contains 1602 images, and the image-name-to-Chinese-character (key-value) pairs are stored in JSON, SQL and DB formats, making it the perfect data set for our project startup.

The Oracle Bone Script Website (甲骨文)

From there, we can download the raw images and the database of the oracle bone scripts into the project data/raw directory and preprocess them.

make download_data

This command downloads the repository, unzips it, and copies the images and the database (JSON) file into the project data/raw directory.

Then, we preprocess the data to create a table (CSV file) for model development.

make create_dataset

The source code is located at src/data/make_dataset.py. The make command provides the input arguments to this script to create two tables (CSV files) in the project data/processed directory.

The Raw Data of the Oracle Bone Scripts with the Image-Name Pairs.
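The exact layout of the raw JSON database is not shown here, so the flat filename-to-character mapping below is an assumption. Under that assumption, a minimal sketch of how the preprocessing step could turn the JSON pairs into a CSV table looks like this:

```python
import csv
import json
from pathlib import Path

def build_label_table(json_path, out_csv):
    """Build a (file_name, label) CSV table from the raw JSON key-value pairs.

    Assumes the JSON is a flat mapping such as {"3653610.jpg": "安", ...};
    the real src/data/make_dataset.py may differ in detail.
    """
    pairs = json.loads(Path(json_path).read_text(encoding="utf-8"))
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file_name", "label"])
        for name, char in sorted(pairs.items()):
            writer.writerow([name, char])
    return len(pairs)
```

The resulting CSV is the table the later model-development steps read from.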

Step 4: Build the Model with Pytorch

This section is about model development.

4.1 Review Image and DataLoader

Before building the model, we need to review the image and data loader.

make image_review

This step will generate a series of images of the oracle bone script image sample to highlight the features of the images, such as colour, height, and width.

We also show the results of different binarization methods applied to the original greyscale image, using tools provided by the scikit-image package.

The source code is located at src/visualization/visualize.py.
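Binarization simply maps each greyscale pixel to 0 or 1 by comparing it against a threshold; the methods compared in the project (e.g. scikit-image’s threshold_otsu) differ mainly in how that threshold is chosen. A minimal numpy-only sketch of the idea, using the global mean as a stand-in threshold:

```python
import numpy as np

def binarize(gray, threshold=None):
    """Binarize a greyscale image array.

    If no threshold is given, fall back to the global mean. This is only a
    sketch of the idea; the project itself compares several scikit-image
    methods (e.g. skimage.filters.threshold_otsu) in visualize.py.
    """
    gray = np.asarray(gray, dtype=float)
    if threshold is None:
        threshold = gray.mean()  # crude stand-in for Otsu and friends
    return (gray > threshold).astype(np.uint8)
```

Swapping the mean for an Otsu threshold usually separates the dark script strokes from the lighter background more cleanly.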

4.2 Test the DataLoader

Next, we can test the DataLoader with the command.

make test_dataloader

This will generate an 8×8 grid image of the oracle bone script image sample. The source code is located at src/data/make_dataloader.py.

As shown in the image below, it generates a batch of 64 images, each with its label (Chinese character) in the top-left corner.

A Batch of 8×8 Grid Images Prepared for Deep ConvNets Model
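The repository’s Dataset and DataLoader code is not reproduced here, so the class below is an illustrative stand-in (random tensors in place of the real images, assumed 64×64 greyscale size). It shows the mechanics the test exercises: a Dataset yielding (image, label) pairs and a DataLoader batching 64 of them, one 8×8 grid’s worth:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class OracleBoneDataset(Dataset):
    """Illustrative stand-in for the project's dataset: random 64x64
    greyscale tensors with integer class labels (names, sizes, and the
    random data are assumptions, not the repository's code)."""

    def __init__(self, n_samples=128, n_classes=10, size=64):
        self.images = torch.rand(n_samples, 1, size, size)
        self.labels = torch.randint(0, n_classes, (n_samples,))

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        return self.images[idx], self.labels[idx]

# batch_size=64 corresponds to one 8x8 grid of samples
loader = DataLoader(OracleBoneDataset(), batch_size=64, shuffle=True)
images, labels = next(iter(loader))
```

In the real project, torchvision’s make_grid (or similar plotting code) lays such a batch out as the 8×8 grid image shown above.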

4.3 Build the Deep Convolutional Neural Networks Model

Now we can build the model. The source code is located at src/models/train_model.py. The following command trains the model and saves the training-process records under models/.

make train_model

(Optional) One can monitor the process by using the tensorboard command.

# Open another terminal
tensorboard --logdir=models/runs

Then open http://localhost:6006/ to monitor the training and validation losses, view the training batch images, and inspect the model graph.

After the training process, there is one model file named model_best in the models/ directory.
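The exact architecture lives in src/models/train_model.py; the small ConvNet and single training step below are only an illustrative sketch (layer sizes, optimizer, and class count are assumptions) of the shape such a training loop takes:

```python
import torch
import torch.nn as nn

class DeepConvNet(nn.Module):
    """A small ConvNet in the spirit of the project's model; the exact
    layers here are illustrative, not the repository's architecture."""

    def __init__(self, n_classes=1602):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # two 2x pools shrink a 64x64 input to 16x16
        self.classifier = nn.Linear(32 * 16 * 16, n_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = DeepConvNet(n_classes=10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# one illustrative training step on a random batch
images = torch.rand(8, 1, 64, 64)
labels = torch.randint(0, 10, (8,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```

Wrapping such a step in an epoch loop and logging `loss` with a TensorBoard SummaryWriter produces the curves viewed at localhost:6006 above.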

4.4 Test the Model with Sample Image

The pre-trained model is located at models/model_best. We can test the model with a sample image. I used the image (3653610.jpg) from the oracle bone script dataset in the Makefile’s test_model script; readers can change it to other images.

make test_model
# ...
# Chinese Character Label = 安
#      label name  count       prob
# 151    151    安      3 1.00000000
# 306    306    富      2 0.01444918
# 357    357    因      2 0.00002721
# 380    380    家      2 0.00001558
# 43      43    宜      5 0.00001120
# 586    586    会      1 0.00000136
# 311    311    膏      2 0.00000134
# 5        5    执      9 0.00000031
# 354    354    鲧      2 0.00000026
# 706    706    室      1 0.00000011

The command will generate a sample figure with a predicted label on the top and a table with the top 10 predicted labels sorted by the probability.

Model Prediction Label and Input Image
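The top-10 table above comes from ranking the model’s class probabilities. Assuming the model outputs raw class logits (as in the sketch of Step 4.3), a minimal helper for producing such a ranking could look like this; the function name and exact interface are mine, not the repository’s:

```python
import torch

def top_k_predictions(model, image, k=10):
    """Return the top-k class indices and probabilities for one image.

    `image` is a (1, H, W) greyscale tensor; the model is assumed to
    output raw class logits, which softmax turns into probabilities.
    """
    model.eval()
    with torch.no_grad():
        logits = model(image.unsqueeze(0))       # add the batch dimension
        probs = torch.softmax(logits, dim=1)[0]  # logits -> probabilities
    values, indices = probs.topk(k)              # topk returns sorted results
    return indices.tolist(), values.tolist()
```

Joining the returned indices against the label table from Step 3 yields the character names and counts shown in the output above.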

Step 5: Test the Model with Qt-GUI

Now that we have the model, we can test it with the Qt GUI. I used Qt Designer to create the UI file at src/ui/obs_gui.ui. Then, use the pyside6-uic command to generate the Python code from the UI file: `pyside6-uic src/ui/obs_gui.ui -o src/ui/obs_gui.py`.

Launch the GUI with

python gui.py
# or 
# make test_gui
Draw the script of the ‘和’ and Run the Prediction
Website of the Oracle Bone Script (Index H)

The GUI contains an input drawing window where the user can sketch the oracle bone script as an image.
After the user finishes the drawing and clicks the RUN button, the input image is converted to a tensor (np.array) and fed into the model. The model predicts the label of the input image with a probability, which is shown on the top Control Panel of the GUI.

  • Text Label 1: Shows the Chinese character label of the input image ID and the prediction probability. If the probability > 0.5, the label background colour is green; if the probability < 0.0001, the label background colour is red. Otherwise, the label background colour is yellow.
  • Text Label 2: Shows the top 10 predicted labels sorted by probability.
  • Clean Button: Clean the input image.
  • Run Button: Run the model with the input image.
  • Translate Button: (Optional) Translate the Chinese character label to English. I did not find a good translation service for a single character, so I left this part for future development or for the readers to think about.
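The real GUI does its image handling through Qt APIs (not reproduced here), but two pieces of its logic are easy to sketch in plain numpy: the label-colour rule described above, and the canvas-to-tensor conversion (the 64×64 target size and nearest-neighbour resize are assumptions for illustration):

```python
import numpy as np

def label_colour(prob):
    """Pick the Text Label background colour from the top-1 probability,
    following the thresholds described above."""
    if prob > 0.5:
        return "green"
    if prob < 0.0001:
        return "red"
    return "yellow"

def drawing_to_tensor(canvas, size=64):
    """Convert the drawing canvas (H x W uint8 array) into a normalized
    (1, 1, size, size) float array ready for the model.

    The real GUI does this with Qt image APIs; this numpy version only
    sketches the idea (nearest-neighbour resize, scale to [0, 1])."""
    canvas = np.asarray(canvas, dtype=np.float32) / 255.0
    rows = np.linspace(0, canvas.shape[0] - 1, size).astype(int)
    cols = np.linspace(0, canvas.shape[1] - 1, size).astype(int)
    resized = canvas[np.ix_(rows, cols)]
    return resized[np.newaxis, np.newaxis, :, :]
```

With these two helpers, the RUN button’s handler reduces to: convert the canvas, call the model, colour the label, and fill the top-10 table.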

4. Summary

This repository is inspired by DeepMind’s recent work Predicting the Past with Ithaca; I did not dig into the details of that work due to limited resources.

I find the work very interesting, and I wanted to share my experience with readers by trying a different script, the Oracle Bone Script. It was also a good starter example for me to revisit the PyTorch deep learning package and the Qt GUI toolbox.

I will be very grateful if you share this with more readers. If you like the repository, please star it.

Conclusion

I made a formal statement on my GitHub on the first day of 2022, claiming that I would create 10 blogs on technology, but I got swamped by daily business and other work. Still, to be a man of my word, I made time to serve the community. Here comes the first one.

If you find the repository useful, please consider donating to the Stanford Rural Education Action Program (https://sccei.fsi.stanford.edu/reap/): policy change and research to help China’s invisible poor.

Reference

  1. Cookiecutter Data Science
  2. PyTorch Tutorial
  3. Qt for Python
  4. GitHub Chinese-Traditional-Culture/JiaGuWen
  5. Website of the Oracle Bone Script Index

-END-