
Reasoning and Knowledge


The Unreasonable Effectiveness of Generative AI and Its Challenges

Abstract

The current landscape of Artificial Intelligence (AI) is undergoing a pivotal transformation with Generative AI (GenAI). This evolution represents more than a technological leap; it signifies a paradigm shift in AI, prompting critical questions about GenAI’s unique characteristics and its profound impact. Rooted in the rich history of the computing revolution, this exploration examines the nature of reasoning and knowledge in the context of computational complexity, weaving together historical achievements with recent advancements to enrich our understanding of AI.

Introduction: A Resounding Echo of the Computing Revolution

GenAI symbolizes a juncture, resonating with the early days of computing, marked by the revolutionary contributions of Kurt Gödel in 1931 and Alan Turing in 1937. Their groundbreaking work, initially aimed at exploring the potential of machines to replace mathematicians, unintentionally laid the foundation for modern computing. This pursuit rekindled an essential question from that era: “Can machines think?” Presently, this query has evolved, concentrating on the ability of machines to ‘reason’ and ‘acquire knowledge’ akin to humans, key traits of what we now identify as Artificial General Intelligence (AGI).

Turing’s practical approach in the 1950s, encapsulated in the philosophy of ‘let’s do it and see how it works,’ finds an echo in GenAI’s capabilities. Techniques like Chain-of-Thought (CoT) prompting (Wei, 2022) and Retrieval-Augmented Generation (RAG) (Lewis, 2020) demonstrate how AI is made to emulate human reasoning and knowledge assimilation, with a focus on human-in-the-loop validation. CoT enables AI to ‘think through’ problems step by step, while RAG allows AI to access vast data sources to enrich its responses.

In contrast, Gödel’s more measured and discerning view on AGI, focusing on the human mind’s superiority over machines, offers a necessary counterweight. This viewpoint underscores the importance of a theoretically solid and robust approach to AI development. Intriguingly, current strides in AI, such as new training methods and alignment strategies, echo Gödel’s principles to some extent.

The evolution of GenAI is also characterized by advancements that enable it to self-reference and self-improve, drawing from Gödel’s mathematical insights and Turing’s computational theories. Innovations like Auto-CoT (Zhang, 2022) and Self-RAG (Asai, 2023) exemplify GenAI’s progression, utilizing its technology for self-enhancement.

Despite its impressive effectiveness, GenAI confronts challenges, particularly in its empirical approach that expands upon Turing’s legacy. Considering these challenges from Gödel’s perspective could provide valuable insights. This situation presents a notable irony: had the P vs. NP problem, a fundamental yet unresolved question in computational complexity, been definitively resolved, Gödel might have had to reconsider his stance on the superiority of the human mind over machines. This paradox, along with its implications for AGI, will be examined in greater detail later.

This introduction sets the stage for an in-depth examination of GenAI. Our objective is to dissect how GenAI attains its remarkable effectiveness and to shed light on its inherent challenges. We will focus on the broad historical and methodological aspects that are integral to its ongoing development.

  • For those interested in the historical and theoretical foundations of computing, the section ‘From the Impossible Machine to the Versatile Computer and AGI’ is recommended.
  • Readers intrigued by the uniqueness of Generative AI and its alignment with the foundations of computing and Turing’s empirical approach should explore ‘Generative AI Is a Full Extension of Computing.’
  • To understand reasoning and knowledge from Gödel’s perspective, ‘Revisiting Reasoning and Knowledge from First Principles’ is the go-to section.
  • Lastly, ‘From P vs. NP to AGI’ offers forward-looking insights into human-centric AGI.

From the Impossible Machine to the Versatile Computer and AGI

To fully understand the revolutionary nature of GenAI, we must delve into the foundational aspects of computing as envisioned by Alan Turing. During Turing’s era, the mathematical formalism school, led by David Hilbert, advocated the notion that all mathematical knowledge could be logically derived from a foundational set of axioms or first principles. This era saw the evolution of logical reasoning from Aristotle’s mindful mental discipline to a system of mechanical symbolic manipulation. Rooted in formal systems of logical rules and mathematical axioms, this approach suggested that the ingenuity and inherent knowledge of mathematicians might not be essential in mathematics. This line of thought also gave rise to the notion of a universal proving machine, capable of resolving any mathematical problem through symbolic logic, independent of prior mathematical knowledge — a concept central to the Entscheidungsproblem, or the ‘decision problem.’

From the Entscheidungsproblem to the Halting Problem

The mathematical landscape was fundamentally transformed by Kurt Gödel’s first incompleteness theorem in 1931. This theorem revealed that no formal system rich enough to express basic arithmetic can be both ‘complete,’ encompassing every truth within its scope, and ‘consistent,’ free from contradictions. Gödel later elucidated the deep implications of his theorem, stating, “Mind, in its use, is not static, but constantly developing, i.e., that we understand abstract terms more and more precisely as we go on using them.” (Copeland, 2013). This emphasized the vital role of human involvement in refining and expanding our understanding of first principles in mathematics, highlighting that mere logical reasoning from these foundational concepts is insufficient.

Amid this shifting paradigm, Turing took a different trajectory toward the Entscheidungsproblem, diverging from the search for the universal ‘proving’ machine to investigate the possibilities of general computing machines. His motivation was the suspicion that such a universal machine would be too powerful to exist: it could be made to reference itself, leading necessarily to a paradox.

This line of inquiry culminated in Turing’s formulation of the Halting Problem: could a hypothetical machine, which he termed the ‘Halting Oracle,’ determine whether any given machine would eventually halt or run indefinitely on a specific input, without ever running the machine? Turing approached this challenge by conceptualizing a hypothetical machine M, designed to act in direct opposition to the Halting Oracle’s predictions, as illustrated below:

Source: Author

This led to the self-referential paradox:

  1. If M halts when given M as input, by its definition, it will loop indefinitely: a contradiction.
  2. Conversely, if M loops indefinitely on input M, it should halt: another contradiction.
Source: Author

This profound paradox led Turing to a significant conclusion: the Halting Oracle, an envisioned entity capable of predicting any machine’s operational outcome, was an impossibility. As a result, the Halting Problem, determining whether a machine would halt or continue indefinitely, became undecidable. Turing’s insight further extended to mathematical proof: if a universal proof mechanism were possible, it could, in theory, be transformed into a Halting Oracle by rephrasing each halting question as a mathematical statement to be proved or refuted. Conversely, the existence of the Halting Oracle would allow the creation of a universal proof mechanism, where proofs are systematically searched for, and the Oracle is consulted on whether the search would halt. Therefore, the unsolvability of the Halting Problem is intrinsically linked to the unsolvability of the Entscheidungsproblem, and vice versa.
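To make the construction concrete, here is a minimal Python sketch of Turing’s argument. The `halting_oracle` stub is hypothetical by definition; no real implementation can exist, which is precisely the point:

```python
def halting_oracle(program, data) -> bool:
    """Hypothetical predictor: True iff program(data) eventually halts.
    Turing's argument shows no such function can exist."""
    raise NotImplementedError("No implementation is possible.")

def M(program):
    """Turing's adversarial machine: do the opposite of the Oracle's prediction."""
    if halting_oracle(program, program):
        while True:   # Oracle predicts halting, so loop forever.
            pass
    else:
        return        # Oracle predicts looping, so halt immediately.

# The paradox: consider halting_oracle(M, M).
# If it returns True, M(M) loops forever, contradicting the prediction.
# If it returns False, M(M) halts at once, again contradicting the prediction.
```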

The Foundation of Modern Computing

Turing’s paradox has indeed left a lasting impact on the field of computing, crystallizing into three defining characteristics that shape modern computer systems:

  1. General-Purpose Computers: Turing’s concept of a machine (M) capable of emulating any other program, even a purported Halting Oracle, laid the groundwork for general-purpose computers. This innovation brought about a clear distinction between hardware and software.
  2. Machine-Program Duality: The idea of the Halting Oracle treating M as both hardware and software introduced the machine-program duality. This duality implies that a program can be perceived as hardware for other programs, such as virtual machines and libraries, creating levels of abstraction.
  3. Program-Data Duality: Turing’s insight that M can process other machines as input led to the emergence of program-data duality. This principle posits that programs can be treated as data by other programs, enabling software tools such as compilers and interpreters. This duality is at the heart of modern software architecture.

The absence of a universal proving mechanism, or a Halting Oracle — which could theoretically have made software engineering obsolete — instead unexpectedly spurred the computing revolution and gave rise to the versatile computer and the indispensable role of software engineers as we know them today.

The absence of a universal proving mechanism, or a Halting Oracle unexpectedly spurred the computing revolution and gave rise to the versatile computer and the indispensable role of software engineers.

From Human Intuition to AGI

Turing’s work reveals a profound paradox: the limitations inherent in logical reasoning within formal systems facilitate their mechanization. This realization highlights the indispensable role of human intuition, particularly in ‘selecting’ the most suitable theorem-proving machine from various options, as illustrated in the accompanying diagram. This selection process is akin to the intuition involved in discovering a proof itself. From the perspective of modern computing, the act of choosing the right machine is analogous to writing a program, and Turing posited that a programmer’s intuition is akin to that of a mathematician.

Source: Author

The pivotal question, ‘Can machines think?’ — reinvigorated by the works of Turing — transforms in the modern era to ‘Can intuition be mechanized?’ This shift marks a profound evolution in the discussion about Artificial General Intelligence (AGI), suggesting that AGI may represent not merely a continuation but potentially the zenith of the computing revolution.

The pivotal question, ‘Can machines think?’ transforms to ‘Can intuition be mechanized?’

Gödel’s perspective enriches this notion, illustrating how the human mind’s capacity to conceptualize and expand axioms, transitioning from set A to A’, as depicted in the diagram, extends beyond what can be mechanistically proven. His insights underscore the distinctive ability of the human mind to transcend the limits of mechanized intuition.

Generative AI Is a Full Extension of Computing

The capabilities of Generative AI (GenAI), with the Large Language Model (LLM) at its heart as general-purpose reasoning ‘hardware,’ have opened up a world rich with potential but also fraught with challenges. Central to this revolution is a paradoxical approach often described as ‘doing without knowing’ or ‘let’s do it and see how it works.’ This approach, which characterizes GenAI’s reliance on emergent rather than explicitly programmed intelligence, enables exceptional, or ‘unreasonable,’ effectiveness across various domains. However, the same quality that underpins its adaptability can also lead to fundamental errors, such as basic arithmetic mistakes, and to a phenomenon known as ‘hallucination.’ In these instances, GenAI confidently generates misleading or inaccurate information, highlighting its lack of awareness of the limits of its knowledge. This duality within GenAI, its versatility and its susceptibility to error, is intriguing: the very elements that fuel excitement about its capabilities are also those that necessitate caution.

Our enthusiasm for GenAI is paradoxical, stemming from its effortless demonstration of potential, hinting at a future both promising and complex. Its ‘unreasonable’ effectiveness signifies GenAI’s embodiment of the defining characteristics of computing, elevating it to the status of a revolutionary force akin to computing itself.

Generative AI’s effectiveness signifies its embodiment of the defining characteristics of computing, making it a revolutionary force akin to computing itself.

LLM as the New Hardware

Diving into the specifics of Large Language Models (LLMs), we identify three key features:

  1. Natural Language Interface: LLMs, adept at understanding and generating human language, enable intuitive interactions. This leads to the advanced handling of prompts and responses, involving ‘agents’ — software modules that facilitate interaction between the user and the LLM. This dynamic is integral to GenAI’s ‘machine-program duality’. Each level of abstraction in this structure represents the AI as perceived by external agents or users, echoing the multi-tiered architecture typical in modern computing systems.
  2. Instruction Following: LLMs, viewed as foundational elements in AI’s abstraction hierarchy due to machine-program duality, excel in processing natural language instructions. This parallels the functionality of a CPU in a computer, positioning LLM as a form of general-purpose reasoning ‘hardware’. It can understand a wide range of commands and tasks, making it versatile like a computer’s hardware but in the domain of cognitive and reasoning tasks. This feature underscores GenAI’s capacity to function across diverse applications, mirroring the adaptability of general-purpose computers.
  3. In-Context Learning/Few-Shot Learning: This remarkable feature of LLMs allows them to learn and adapt from limited examples, or even from their outputs. It demonstrates the program-data duality inherent in computing, where the data (in this case, learning examples or generated content) can be re-fed as program instructions. This ability for continuous learning and adaptation within a few examples mirrors the dynamic nature of data and program interchangeability, a key trait of computing.
Source: Author
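To make the third feature concrete, here is a minimal sketch of few-shot prompting in Python. The `complete` function is a stand-in for any LLM call (an assumed interface, not a specific product’s API); the worked examples act as the ‘program,’ and the final line is the ‘data’ to be processed:

```python
FEW_SHOT_PROMPT = """\
Turn the customer note into a structured order.

Note: "One espresso and a soda, please."  -> {"espresso": 1, "soda": 1}
Note: "Two sodas for table five."         -> {"espresso": 0, "soda": 2}
Note: "Just an espresso."                 ->"""

def complete(prompt: str) -> str:
    """Stand-in for an LLM call; a real client would go here (assumption)."""
    return '{"espresso": 1, "soda": 0}'   # canned output for illustration

# Two examples are enough for the model to infer the mapping: in-context
# learning, with outputs re-fed as instructions (program-data duality).
print(complete(FEW_SHOT_PROMPT))
```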

Viewed through a transformative lens, GenAI emerges not merely as a branch of computing but as its full extension, signaling a shift akin to the computing revolution initiated in Turing’s era. This integration of computing’s core features is exemplified in the diagram below:

Source: Author

Generative AI Innovations as the New Software Engineering

The intelligence in GenAI systems primarily originates from LLMs, yet human AI innovators still play a crucial role in providing programmed intelligence. Three key innovation patterns, ‘Let’s Watch AI’s Back,’ ‘Let AI Do More Work,’ and ‘Let’s Look Around for AI,’ are redefining the landscape of software engineering with AI innovators in the driver’s seat, marking a transformative approach to how software engineering is conducted in AI.

1. ‘Let’s Watch AI’s Back’: Tackling LLM’s Unpredictability

This pattern focuses on vigilant oversight and continuous refinement in AI systems. It addresses managing the unpredictability and potential errors in Large Language Models, a critical aspect in advancing GenAI. This method of innovation aligns with the Machine-Program Duality in GenAI.

2. ‘Let AI Do More Work’: Expanding AI’s Functional Scope

This pattern represents a strategic shift in enhancing GenAI’s functional capabilities, allowing AI systems to autonomously manage a broader spectrum of complex tasks. This method of innovation resonates with General-Purpose Computing in GenAI.

3. ‘Let’s Look Around for AI’: Broadening Contextual Awareness

This pattern, epitomized by Retrieval-Augmented Generation (RAG) (Lewis, 2020), involves integrating external, real-time information to enrich AI’s contextual understanding and responses. It reflects a method of innovation that aligns with the Program-Data Duality in GenAI.

Source: Author

A good example is Self-Reflective RAG, or Self-RAG (Asai, 2023), which employs the three patterns of innovation in Generative AI as follows:

1. Tackling AI’s Unpredictability, or ‘Let’s Watch AI’s Back’: This pattern focuses on vigilant oversight and continuous refinement in AI systems. Self-RAG embodies it by generating ‘reflection tokens’ through which the model critiques its own retrievals and outputs, directly addressing the unpredictability and potential errors of Large Language Models.

2. Expanding AI’s Functional Scope, or ‘Let AI Do More Work’: This pattern represents a significant shift in enhancing GenAI’s capabilities, empowering AI systems to autonomously manage a diverse range of complex tasks. A notable example of this trend is the Hypothetical Document Embeddings (HyDE) approach (Gao, 2022), which employs AI-generated examples to guide similarity searches. This illustrates the ingenious application of AI’s generative features, particularly in harnessing its ‘hallucinations’, to strategically enhance performance in dense retrieval systems, as sketched in the code after this list.

3. Broadening Contextual Awareness, or ‘Let’s Look Around for AI’: This pattern, epitomized by Retrieval-Augmented Generation (RAG) (Lewis, 2020), involves integrating external, real-time information to enrich AI’s contextual understanding and responses; Self-RAG applies it adaptively, retrieving supporting passages only when its own assessment calls for external evidence.
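The HyDE step mentioned above can be sketched in a few lines, assuming generic `generate` and `embed` helpers and a vector index with a `nearest` method (all hypothetical names, not a specific library’s API):

```python
def hyde_search(question, generate, embed, index, k=5):
    """HyDE-style retrieval sketch (Gao, 2022): let the model write a
    hypothetical answer, then search with its embedding, not the question's."""
    hypothetical_doc = generate(f"Write a short passage answering: {question}")
    query_vector = embed(hypothetical_doc)   # embed the useful 'hallucination'
    return index.nearest(query_vector, k=k)  # dense retrieval over real documents
```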

The Auto-CoT (Zhang, 2022) method applies the three innovation patterns in GenAI as follows:

  1. ‘Let’s Watch AI’s Back’: It focuses on identifying and mitigating errors in AI-generated reasoning through clustering and targeted sampling, ensuring vigilant oversight and continuous refinement.
  2. ‘Let AI Do More Work’: Auto-CoT automates the generation of reasoning chains, allowing AI to handle complex reasoning tasks more autonomously.
  3. ‘Let’s Look Around for AI’: It enhances AI’s contextual adaptability by storing and retrieving diverse AI-generated reasoning chains, building a richer context for future tasks.

The integration of the three innovation patterns — ‘Let’s Watch AI’s Back,’ ‘Let AI Do More Work,’ and ‘Let’s Look Around for AI’ — in GenAI, as seen in Self-Reflective RAG and Auto-CoT, not only fosters specific advancements but also collectively leads to seemingly endless innovations in GenAI. This multifaceted approach might inadvertently push GenAI towards the realization of AGI, suggesting that the continuous evolution and integration of these innovative strategies could be pivotal in crossing the threshold into AGI.

Generative AI’s Inevitable Path to AGI

Let us revisit a crucial aspect of Alan Turing’s lasting legacy: the Turing Test. In his seminal paper “Computing Machinery and Intelligence” (1950), Turing introduced this test as a criterion for assessing machine intelligence, focusing on whether machines can exhibit behavior indistinguishable from humans. This method provides an empirical, human-centric standard for evaluating progress toward Artificial General Intelligence (AGI), sidestepping the long-standing definitional debate over “what is thinking?”

Turing perceived the test as an ever-evolving challenge, stating, “There would be no question of triumphing simultaneously over all machines. There might be men cleverer than any given machine, but then there might be other machines cleverer again, and so on” (Turing, S., 2012). This perspective aligns with the dynamic nature of today’s GenAI, particularly in its application of natural language processing and adaptive learning.

Exploring GenAI’s approach to reasoning and knowledge, we find it navigating questions central to human cognition. Does AI’s reasoning parallel human thought processes, or does it represent a new methodology? Is reasoning inherently dependent on knowledge, or can it function independently?

GenAI bypasses these debates, showcasing capabilities reminiscent of mini Turing Tests. Large Language Models (LLMs) incorporate cognitive, psychological, and philosophical concepts such as ‘thought’, ‘reflection’, ‘explanation’, and ‘critique’ within their technological framework. The Chain-of-Thought (CoT) approach (Wei, 2022), for example, defines ‘thought’ as a distinct step in the reasoning process. The AI can autonomously generate ‘rationales’, as seen in Zero-Shot CoT (Kojima, 2022), where prompts like “let’s think step-by-step” guide its reasoning. Integrating human feedback ensures the AI’s reasoning aligns with human thought patterns. The following simplified diagram shows how they work.

Source: Author
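A minimal sketch of this two-stage mechanism, again with a generic `complete` helper standing in for the LLM call (an assumption for illustration):

```python
def zero_shot_cot(question: str, complete) -> str:
    """Zero-Shot CoT (Kojima, 2022) as two LLM calls: first elicit the
    rationale, then elicit the answer conditioned on that rationale."""
    # Stage 1: the trigger phrase makes the model emit step-by-step 'thoughts'.
    rationale = complete(f"Q: {question}\nA: Let's think step by step.")
    # Stage 2: re-feed the generated rationale (program-data duality) and
    # extract the final answer.
    return complete(f"Q: {question}\nA: Let's think step by step. {rationale}\n"
                    f"Therefore, the answer is")
```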

In Retrieval-Augmented Generation (RAG), ‘knowledge’ is interpreted as human-curated data accessible to the AI (Lewis, 2020). This definition, tailored to AI’s operational framework, aligns with the broader human understanding of knowledge. When the AI processes and presents this data in natural language, it synchronizes human and AI interpretations of knowledge. The following simplified diagram shows how it works.

Source: Author
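The retrieve-then-generate loop can likewise be sketched, with `retriever` and `complete` as assumed helpers:

```python
def rag_answer(question: str, retriever, complete, k: int = 3) -> str:
    """RAG sketch (Lewis, 2020): ground the answer in retrieved,
    human-curated passages presented to the model in natural language."""
    passages = retriever.search(question, k=k)       # 'knowledge' as curated data
    context = "\n\n".join(p.text for p in passages)
    prompt = (f"Answer using only the context below.\n\n"
              f"Context:\n{context}\n\n"
              f"Question: {question}\nAnswer:")
    return complete(prompt)
```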

GenAI also facilitates a dynamic interplay of ‘reasoning’ and ‘knowledge’. Building on Zero-Shot CoT, Automatic CoT (Zhang, 2022) reuses the “let’s think step-by-step” prompt to encourage AI to generate its own CoT examples, thereby creating a knowledge base for future reasoning. This exemplifies GenAI’s adherence to Turing’s empirical approach.

Source: Author
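A sketch of the Auto-CoT loop, under the assumption of generic `embed`, `cluster`, and `complete` helpers (hypothetical names):

```python
def build_auto_cot_demos(questions, embed, cluster, complete, n_clusters=8):
    """Auto-CoT sketch (Zhang, 2022): partition questions by embedding for
    diversity, then let Zero-Shot CoT write one demonstration per cluster."""
    groups = cluster([embed(q) for q in questions], n_clusters)
    demos = []
    for group in groups:
        q = questions[group[0]]   # a representative question from this cluster
        rationale = complete(f"Q: {q}\nA: Let's think step by step.")
        demos.append(f"Q: {q}\nA: Let's think step by step. {rationale}")
    return demos   # stored and re-used as few-shot 'knowledge' for new questions
```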

These developments in GenAI resonate with Turing’s vision of how machines and humans process thought. While GenAI primarily focuses on scalability and commercial applications, its trajectory increasingly intersects with the pursuit of AGI. Driven by both economic imperatives and functional needs, GenAI’s journey is not only advancing AI technology but also, perhaps unintentionally, guiding it toward the realization of AGI.

GenAI’s journey is perhaps unintentionally guiding it toward the realization of AGI.

Revisiting Reasoning and Knowledge from First Principles

Although Gödel, known for his incompleteness theorems, did not directly address Artificial General Intelligence (AGI) in his lifetime, his work provides a foundational perspective that contrasts interestingly with contemporary GenAI developments. Gödel believed that the human mind’s capabilities far surpass what any machine could achieve, suggesting that true AGI may not be fully attainable. He emphasized a deep understanding of human intelligence, a stance encapsulated as ‘knowing before doing’ and ‘let’s make it right.’ Gödel’s perspective implies that while advancements in GenAI are significant, there is still a long way to go. He advocated exploring human thought and reasoning in depth, holding that the development of AI technology should go hand in hand with a profound exploration of the complexities of human cognition.

As we delve into the complexities of Generative AI (GenAI), we adopt Gödel’s principle of ‘knowing before doing’, with a focus on the ‘knowing’ aspect. This philosophy, grounded in the belief of human cognitive superiority over machines, aims to analyze AI in a way that resonates more with human cognition and rationality. This approach informs and guides the training and alignment processes in the development of new Large Language Models (LLMs), regarded as general-purpose reasoning hardware.

Reason Logically vs. Plausibly

In Generative AI (GenAI) that employs Large Language Models (LLMs), we observe an interplay between informal and formal reasoning. The concept of ‘informal deductive reasoning,’ highlighted by Huang (2023), exemplifies this interplay. In this context, ‘informal’ typically refers to natural language reasoning, which, while often conflated with common-sense or inductive reasoning, maintains its distinctiveness.

Johan van Benthem (2011) provides a compelling scenario to demonstrate these reasoning types in action. To make the scenario more relatable, let’s paraphrase the original description from the source:

Imagine a waiter who receives an order for an espresso and a soda. He asks, ‘Who ordered the soda?’ Once he figures out who gets the soda, it’s easy for him to know who gets the espresso.

This scenario unfolds in two reasoning stages:

  1. Plausible Reasoning: Initially, the waiter’s approach is based on plausible reasoning, drawing from his experience or observed customer behavior. This form of reasoning involves weak syllogism and probabilistic thinking, where general patterns (A→B) and specific instances (A is true) lead to a likely conclusion (B is likely true). This concept aligns with George Pólya’s (1945) and E.T. Jaynes’ (2003) conceptualizations of ‘probability as extended logic,’ which allows for conclusions based on likelihoods derived from available information.
  2. Logical Reasoning: After identifying the soda’s recipient, the waiter shifts to deductive, logical reasoning. This step involves a clear syllogism from a general premise (A→B) and specific information (C→A) to a logical conclusion (C→B).

This scenario not only illustrates the blending of informal and formal reasoning in everyday situations but also reveals how traditional ‘formal’ reasoning patterns, such as syllogism and weak syllogism, can manifest in natural language. In these examples, both the plausible reasoning by the waiter initially and his subsequent logical reasoning are expressed through natural language yet follow formal reasoning patterns. For this reason, to avoid confusion, it is more apt to classify reasoning into ‘logical’ and ‘plausible’ rather than ‘formal’ or ‘informal’. This distinction is important in understanding how GenAI, particularly with its reliance on natural language processing, approaches the complexity of human reasoning.
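To see ‘probability as extended logic’ at work, here is a small worked example of the waiter’s plausible step in Python; the numbers are illustrative assumptions, not data from the scenario:

```python
# Weak syllogism, quantified: evidence E makes S1 more plausible via Bayes' rule.
p_soda_1 = 0.5                 # prior: either guest equally likely to get the soda

# Assumed evidence model from the waiter's experience:
p_young_given_soda = 0.8       # P(guest 1 looks young | guest 1 ordered soda)
p_young_given_espresso = 0.2   # P(guest 1 looks young | guest 1 ordered espresso)

# Bayes' rule: P(S1 | E) = P(E | S1) * P(S1) / P(E)
p_young = p_young_given_soda * p_soda_1 + p_young_given_espresso * (1 - p_soda_1)
posterior = p_young_given_soda * p_soda_1 / p_young
print(f"P(guest 1 ordered the soda | looks young) = {posterior:.2f}")  # 0.80

# Once the plausible guess is confirmed, logic takes over: S1, plus the premise
# that each drink has exactly one recipient, forces E2 deductively.
```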

The First Principles of Logical Reasoning

Note: The following symbolic reasoning section is technical. Feel free to skip it without missing the article’s main ideas.

To explore logical reasoning from first principles, we symbolically represent the café scenario:

  • E1: “Guest 1 ordered espresso”
  • S1: “Guest 1 ordered soda”
  • E2: “Guest 2 ordered espresso”
  • S2: “Guest 2 ordered soda”

The syllogistic reasoning is as follows:

  1. Major Premise: Each guest orders exactly one drink, and each drink is ordered by exactly one guest, represented by four XOR statements:
  • E1⊕S1 (Guest 1 orders either espresso or soda, but not both)
  • E2⊕S2 (Guest 2 orders either espresso or soda, but not both)
  • E1⊕E2 (the espresso is ordered by exactly one of the two guests)
  • S1⊕S2 (the soda is ordered by exactly one of the two guests)
  2. Minor Premise: Guest 1 ordered the soda (S1 is true).
  3. Conclusion: Therefore, Guest 2 ordered the espresso (from S1⊕S2, S2 is false; from E2⊕S2, E2 follows).

Symbolically, this is represented as:

  • (E1⊕S1)∧(E2⊕S2)∧(E1⊕E2)∧(S1⊕S2) (Major Premise)
  • S1 (Minor Premise)
  • ∴E2 (Conclusion)
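The deduction can also be checked mechanically, which previews the ‘quick to verify’ theme of the following sections. A brute-force sketch enumerating all truth assignments and confirming that every world satisfying the premises makes E2 true:

```python
from itertools import product

# Exhaustively check the café syllogism over all 16 truth assignments.
valid = True
for E1, S1, E2, S2 in product([False, True], repeat=4):
    major = (E1 ^ S1) and (E2 ^ S2) and (E1 ^ E2) and (S1 ^ S2)
    minor = S1
    if major and minor and not E2:
        valid = False   # a counterexample world would land here; none does
print("E2 follows from the premises:", valid)   # True: the deduction is sound
```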

In logical reasoning, our conclusions are guided by well-established axioms of logic. These axioms serve as the foundational rules that keep the reasoning process coherent and sound. The table below outlines the fundamental axioms, presenting each one alongside a simple natural-language description and an associated human value. This format aims to illustrate not only the mathematical or logical essence of each axiom but also its practical application in our reasoning processes and ethical considerations, providing a clearer picture of how logical principles are embedded in both formal reasoning and everyday thinking.

Source: Author

Interestingly, while AI can analyze logical reasoning symbolically and understand axioms of logic, it does not manipulate symbols in the traditional sense. The Zero-Shot CoT prompt ‘Let’s think step-by-step’ encourages AI to emulate syllogistic reasoning, where each ‘rationale’ typically represents an application of syllogism. This likely involves pattern recognition based on the natural language descriptions of axioms found in the training data. Furthermore, aligning AI reasoning with the human values inherent in each axiom may enhance its logical reasoning capabilities, as indicated in the ‘human values’ column.

Quick to Verify; Hard to Guess

In human cognition, we observe a shift from the structured approaches of strict logic to the fluidity of probabilistic intuition. This shift signifies a deeper understanding of decision-making and problem-solving.

Take, for instance, the seemingly mundane café scenario. On the surface, a waiter’s decision-making might appear to be a linear application of deductive reasoning. However, delving deeper, we find that the roots of his decisions lie in probabilistic intuition. The waiter employs plausible reasoning, navigating through various scenarios with an intuitive grasp of probabilities, shaped by his experiences and observations. This exemplifies the often-unseen role of intuitive judgment as the foundational layer upon which logical reasoning is built.

The impact of intuitive, probabilistic considerations even extends to the waiter’s logical reasoning in the application of logical axioms, such as the Law of Non-Contradiction and the Law of Excluded Middle. Plausible reasoning, therefore, is not just an adjunct but the guiding force, allowing the waiter to assess potential outcomes and make decisions based on educated guesses. The process of logical deduction, in this case, also plays a supporting role, confirming the conclusions initially drawn from intuitive insights.

This dynamic interplay between logic and intuition extends beyond routine scenarios, deeply influencing mathematics. In his work on the Entscheidungsproblem, Turing posited that plausible reasoning, or what he termed ‘intuition,’ plays a crucial role in the discovery of proofs. He suggested that once this intuitive phase is complete, the subsequent logical process is largely mechanical. This viewpoint underscores the interplay between intuitive insight and methodical logic in problem-solving and theorem-proving. In the context of modern AI, Chain-of-Thought (CoT) prompting (Wei, 2022) appears to tap into a similar mechanism. It likely triggers the AI to engage in a logical afterthought, essentially verifying the logical steps that follow an intuitive leap, mirroring this fundamental process of human reasoning.

How to Measure a Good Guess?

Further illustrating this point is the café scenario, where the process of establishing a logical sequence is deeply interwoven with plausible reasoning. For example, the assumption ‘B is likely true’ translates into a high probability P(B). Understanding what P(B) represents requires diving into the nature of probability and its role in plausible reasoning.

In exploring the concept of probability, we encounter three distinct perspectives:

  1. Frequentist Perspective: Views probability as the long-term frequency of events. While not entirely practical for everyday decisions, like those faced by our waiter, it forms a foundational aspect of probability theory.
  2. Bayesian Perspective: Considers probability as a measure of personal belief or knowledge, heavily influenced by human experience. This perspective is closely aligned with how we often interpret probabilities in daily life.
  3. Jaynesian Perspective: As proposed by Jaynes in 2003, this approach extends the Bayesian framework to include non-human entities like AI and search algorithms. Here, probability is a state of knowledge applicable to any reasoning entity, whether human or machine.

We must acknowledge that Large Language Models (LLMs) function primarily within the realm of Jaynesian probability. This becomes apparent upon realizing that LLMs are non-human entities and incapable of performing frequentist trials to ascertain the probability distribution of subsequent tokens. Crucially, by regarding both humans and AI as entities capable of holding beliefs, we position ourselves as observers with the unique ability to analyze and understand both our own cognitive processes and those of AI systems. This dual role as observer and participant is pivotal in comprehensively understanding the dynamics of AI and its interaction with human reasoning.

The First Principles of Plausible Reasoning

The table below presents Jaynes’ desiderata of plausible reasoning. These principles offer guidelines that often underpin our everyday ‘common sense’ decisions. Each desideratum is presented with a probabilistic or mathematical interpretation, showcasing how our intuitive decisions align with these principles. Additionally, the table highlights corresponding human values, emphasizing the ethical and practical implications of these principles in our daily lives. While each desideratum is rich in detail, the table is designed for reference and deeper exploration at your leisure. It serves as a reminder of how structured reasoning guides our intuition and judgment, often in ways we might not explicitly realize.

Source: Author

Large Language Models (LLMs) and similar AI systems, while not intrinsically designed with Bayesian inference capabilities, often display behaviors aligning with principles similar to those in Jaynes’ desiderata. This alignment is likely a result of their comprehensive training on extensive natural language datasets. Future advancements in AI might focus more intentionally on embedding these principles of plausible reasoning, thereby improving the decision-making capabilities of these systems.

The Challenges for a Knowledgeable Reasoner

The intricate relationship between reasoning and knowledge plays a crucial role in effective reasoning, applicable to both human cognition and AI systems. This connection is underscored by principles such as ‘Equitable Assessment’ and ‘Comprehensive Analysis’ from Jaynes’ desiderata. These principles stress the importance of AI systems objectively considering and evaluating all pertinent evidence. A notable shortfall in this area is one of the primary causes of issues like AI ‘hallucination’ — drawing misleading conclusions from internal knowledge — or inaccuracies due to external misinformation, such as being ‘misled by similarity’ (Zhang, 2022). However, the overwhelming volume and complexity of real-world data pose significant challenges, rendering the achievement of a non-ideological and completely objective AI system a difficult, if not impractical, goal.

The intricate relationship between reasoning and knowledge plays a crucial role in effective reasoning, applicable to both human cognition and AI systems.

In the next part, we will explore the broader ramifications of this challenge, examining how the potential ‘infeasibility’ of fully implementing these principles affects the computing field and the onward journey of AI development.

From P vs. NP to AGI

Revisiting the Entscheidungsproblem

Our discussion on plausible reasoning naturally progresses to the Entscheidungsproblem in theorem proving, a specialized yet narrowly defined challenge. If plausible reasoning from first principles could be fully mechanized, then the Entscheidungsproblem would be solvable as well. However, Alan Turing proved in 1937 that the problem is unsolvable, prompting us to shift our perspective from a question of possibility vs. impossibility to one of feasibility.

Originally, the Entscheidungsproblem was posed in the context of mathematics and logic, where concepts of efficiency and limits on input size are absent. In this realm, search is always exhaustive, involving the enumeration of all possibilities.

Gödel’s contributions, while not directly addressing AGI, had a significant impact on computational complexity theory. He introduced the idea of scaling problem sizes to study solving times, a key concept in the field. This theory categorizes problems based on required computational resources like time and space, defining classes such as P (Polynomial time), where problems are feasibly solvable; NP (Non-deterministic Polynomial time), where solutions are feasibly verifiable; and EXP (Exponential time), with exponentially increasing solving times. The P vs. NP question, central to this theory, asks whether every problem whose solution is quickly verifiable (NP) can also be feasibly solved (P). This question bears directly on the logical reasoning involved in the Entscheidungsproblem, since it determines how often problems in this domain can be solved feasibly.
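The asymmetry between verifying and solving can be illustrated with subset-sum, a classic NP-complete problem: checking a proposed certificate takes one pass, while the only known general way to find one is exhaustive search over exponentially many candidates. A sketch:

```python
from itertools import combinations

def verify(numbers, target, certificate):
    """Polynomial-time check: does the claimed subset really hit the target?"""
    return sum(numbers[i] for i in certificate) == target

def search(numbers, target):
    """Brute-force search: up to 2^n candidate subsets."""
    for r in range(len(numbers) + 1):
        for subset in combinations(range(len(numbers)), r):
            if verify(numbers, target, subset):
                return subset
    return None

nums = [3, 34, 4, 12, 5, 2]
print(search(nums, 9))           # a certificate, e.g. indices (2, 4): 4 + 5 == 9
print(verify(nums, 9, (2, 4)))   # True, confirmed in a single pass
```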

In his 1956 correspondence with John von Neumann, Gödel speculated about a revised, finite version of his decision problem, now recognized as lying in the NP class: it is quickly verifiable through logical reasoning, and he asked whether it might also be quickly solvable, which would imply the possibility of NP being equal to P. This speculation predated and hinted at the P vs. NP question. Gödel surmised that if his hypothesis were true, it would challenge his earlier beliefs about the unique role of human ingenuity in mathematics.

The prevailing consensus in computer science leans towards P not being equal to NP, suggesting that the Entscheidungsproblem, initially proved unsolvable by Turing, might be infeasible when viewed as a computational problem. In contrast, Large Language Models (LLMs) have demonstrated human-like plausible reasoning. This suggests the significance of GenAI as a vital approach in the ongoing pursuit of human-centric AGI.

What If There Were the Halting Oracle?

Reflecting on modern computing and AI, we gain a new perspective by pondering ‘what-if’ scenarios concerning the classical Entscheidungsproblem and the P vs. NP problem. This approach not only revisits these longstanding issues but also recontextualizes them in the context of contemporary technological advancements.

Imagine, for a moment, a world where the Halting Oracle exists. This extraordinary tool could potentially diminish the intellectual thrill and challenge presented by numerous open mathematical problems, such as Goldbach’s Conjecture. We could design a trivial program that systematically assesses every even number starting from 4, checking if each can be expressed as the sum of two prime numbers. Consulting the Halting Oracle about whether this program would ultimately conclude would potentially offer a solution to the conjecture.
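That ‘trivial program’ can be written down directly. Under the impossible assumption of a Halting Oracle, asking whether `goldbach_search()` halts would settle the conjecture without ever running it:

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; slow but sufficient for the sketch."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def goldbach_search():
    """Halts iff Goldbach's Conjecture is false: returns the first even number
    >= 4 that is NOT the sum of two primes; otherwise runs forever."""
    n = 4
    while True:
        if not any(is_prime(p) and is_prime(n - p) for p in range(2, n - 1)):
            return n   # a counterexample to the conjecture
        n += 2         # next even number
```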

Gödel’s insights are notably profound in this context. He argued that the non-existence of a universal proving machine, or the Halting Oracle, stems not from the limits of human reasoning but from the constraints of mechanized logical reasoning. When viewed through the lens of modern advancements, this observation underscores the absence of a conceptual framework in Turing’s era, including in his work, to effectively bridge the gap between logical and plausible reasoning. Turing’s use of ‘selection’ to describe plausible reasoning, a term that later became central as ‘search’ in computational complexity theory, highlights this conceptual divide. Paradoxically, the elusive goal of developing logical reasoning machines capable of encompassing plausible reasoning has catalyzed the creation of new frameworks and tools with the potential to mechanize plausible reasoning.

What If P=NP?

In a similar vein, if we speculate on the emergence of a ‘Holy Grail’ solver, a feasible NP solver, the definition of AI undergoes a paradigm shift, diverging significantly from the traditional trajectory of AGI. This transformation, especially under the hypothetical scenario where P equals NP, introduces a radical dimension to AI capabilities. In such a landscape, the boundary between reasoning and knowledge becomes blurred; reasoning itself, traditionally reliant on accumulated knowledge, could be executed without the extensive databases of information currently deemed essential.

In such a hypothetical landscape, reasoning, traditionally reliant on accumulated knowledge, could be executed without the extensive databases of information currently deemed essential.

Equipped with an algorithm capable of resolving any solvable problem, AI in this context transcends the conventional need for evolutionary learning processes, data accumulation, or interaction methodologies that mimic human cognition. The AI, transformed by this computational breakthrough, stands as an omnipotent problem-solver, static in its learning but unparalleled in its problem-solving capacity. The role of AI thus shifts dramatically — from a tool evolving and adapting to human needs, to an all-encompassing solver that redefines the limits of computation and problem-solving. The very essence of problems, once bound by the necessity of knowledge and experience, now becomes a playground of pure algorithmic exploration. In this new era, humanity grapples with a profound identity crisis, questioning its role and purpose in a world where intelligence and problem-solving are no longer gated by the accumulation of knowledge.

In the hypothetical ‘P=NP’ era, humanity grapples with an identity crisis, questioning its role and purpose.

Conclusion

Thankfully, the prospect of such groundbreaking discoveries becoming a reality remains slim, and GenAI, despite its inherent challenges, might yet reveal a feasible route toward a future of human-centric AGI. Addressing the challenges posed by GenAI could lead us into an era brimming with fresh insights into computing, paralleled by an enhanced understanding of our own nature. This inevitably leads us to a question reminiscent of Gödel’s inquiries in logic: “What does it mean to mechanize the mind, especially in the context of creating AGI?”

What does it mean to mechanize the mind in the context of creating AGI?
[Note] In the development of this article, while the original concepts and ideas are our own, we gratefully acknowledge the assistance of OpenAI’s ChatGPT for its role in offering supplementary insights, refining certain points, and aiding in the drafting process.

References

  1. Aaronson, S. (2016). P =? NP. In Open Problems in Mathematics. Springer.
  2. Asai, A., Wu, Z., Wang, Y., Sil, A., & Hajishirzi, H. (2023). Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection. ArXiv. https://arxiv.org/abs/2310.11511.
  3. Cook, S. (1971). The Complexity of Theorem Proving Procedures. Proceedings of the Third Annual ACM Symposium on Theory of Computing, pp. 151–158.
  4. Copeland, B. J., Posy, C. J., & Shagrir, O. (Eds.). (2013). Computability: Turing, Gödel, Church, and Beyond (Kindle ed.). The MIT Press.
  5. Fiorillo, C. D. (2012). Beyond Bayes: On the need for a unified and Jaynesian definition of probability and information within neuroscience. Information, 3(2), 175–203. https://doi.org/10.3390/info3020175.
  6. Gao, L., Ma, X., Lin, J., & Callan, J. (2022). Precise Zero-Shot Dense Retrieval without Relevance Labels. ArXiv. https://arxiv.org/abs/2212.10496.
  7. Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., & Wang, H. (2023). Retrieval-Augmented Generation for Large Language Models: A Survey. ArXiv. https://arxiv.org/abs/2312.10997.
  8. Gödel, K. (1931). Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I. Monatshefte für Mathematik und Physik, 38(1), 173–198. doi:10.1007/BF01700692.
  9. He, Z., Zhong, Z., Cai, T., Lee, J. D., & He, D. (2023). REST: Retrieval-Based Speculative Decoding. ArXiv. https://arxiv.org/abs/2311.08252.
  10. Huang, J., & Chang, K. C.-C. (2023). Towards Reasoning in Large Language Models: A Survey. ArXiv. https://arxiv.org/abs/2212.10403.
  11. Jaynes, E. T. (2003). Probability Theory: The Logic of Science. Cambridge University Press.
  12. Kojima, T., Gu, S. S., Reid, M., Matsuo, Y., & Iwasawa, Y. (2022). Large Language Models are Zero-Shot Reasoners. ArXiv. https://arxiv.org/abs/2205.11916.
  13. Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W., Rocktäschel, T., Riedel, S., & Kiela, D. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. ArXiv. https://arxiv.org/abs/2005.11401.
  14. Lu, C. (2020a). AI since Aristotle, Part 1: Logic, Intuition and Paradox. https://medium.com/cantors-paradise/logic-intuition-and-paradox-d0881627762a.
  15. Lu, C. (2020b). AI since Aristotle, Part 2: The Limit of Logic and The Rise of the Computer. https://www.cantorsparadise.com/the-limit-of-logic-and-the-rise-of-the-computer-1641219f86c3.
  16. Lu, C. (2020c). AI since Aristotle, Part 3: Intuition, Complexity and the Last Paradox. https://www.cantorsparadise.com/intuition-complexity-and-the-last-paradox-ec0a7f8ad93b.
  17. Lu, C. (2023). From Self-Referential Paradoxes to Intelligence. https://medium.com/@cplu/from-self-referential-paradoxes-to-intelligence-b073cbebafe2.
  18. Mithen, S. J. (1990). The eco-psychology of decision making. In Thoughtful Foragers: A Study of Prehistoric Decision Making (pp. 21–51). Cambridge University Press.
  19. Pólya, G. (1945). How to Solve It: A New Aspect of Mathematical Method. Princeton University Press.
  20. Pólya, G. (1954). Mathematics and Plausible Reasoning, Volume I: Induction and Analogy in Mathematics. Princeton University Press.
  21. Poincaré, H. (1969). Intuition and Logic in Mathematics. The Mathematics Teacher, 62(3), 205–212.
  22. Prokopenko, M., Harré, M., Lizier, J., Boschetti, F., Peppas, P., & Kauffman, S. (2017). Self-referential basis of undecidable dynamics: From the Liar Paradox and the Halting Problem to the Edge of Chaos. https://doi.org/10.1016/j.plrev.2018.12.003.
  23. Stanford Encyclopedia of Philosophy. (n.d.). Logic and Information. https://plato.stanford.edu/entries/logic-information/.
  24. Stanford Encyclopedia of Philosophy. (n.d.). The Analysis of Knowledge. https://plato.stanford.edu/entries/knowledge-analysis/.
  25. Turing, A. M. (1937). On Computable Numbers, with an Application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, Series 2, 42(1), 230–265. doi:10.1112/plms/s2-42.1.230.
  26. Turing, A. M. (1950). Computing Machinery and Intelligence. Mind, 59(236), 433–460. https://academic.oup.com/mind/article/LIX/236/433/986238.
  27. Turing, A. M. (2012). Alan M. Turing: Centenary Edition. (S. Turing, Ed.). Cambridge University Press. (Original work published 1959).
  28. van Benthem, J. (2011). Logical Dynamics of Information and Interaction. Cambridge University Press.
  29. Wang, J., Li, J., & Zhao, H. (2023a). Self-prompted Chain-of-Thought on Large Language Models for Open-domain Multi-hop Reasoning. ArXiv. https://arxiv.org/abs/2310.13552.
  30. Wang, X., & Narang, S. (2023b). Self-Consistency Improves Chain of Thought Reasoning in Language Models. ArXiv. https://arxiv.org/abs/2203.11171.
  31. Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. ArXiv. https://arxiv.org/abs/2201.11903.
  32. Zhang, Z., Zhang, A., Li, M., & Smola, A. (2022). Automatic Chain of Thought Prompting in Large Language Models. ArXiv. https://arxiv.org/abs/2210.03493.
  33. Zhang, Z., Zhang, X., Ren, Y., Shi, S., Han, M., Wu, Y., Lai, R., & Cao, Z. (2023). IAG: Induction-Augmented Generation Framework for Answering Reasoning Questions. Huawei Poisson Lab, China.
