This article synthesizes the core themes, foundational principles, and technical concepts of Artificial Intelligence as detailed in the source text, “Artificial Intelligence: A Modern Approach.” The field of AI is presented as a vast, interdisciplinary endeavor concerned with understanding and building intelligent entities. Its scope encompasses logic, probability, continuous mathematics, perception, reasoning, learning, action, and applications ranging from microelectronics to planetary explorers.

The prevailing methodology within AI is the rational agent approach, which focuses on creating agents that act rationally to achieve the best expected outcome. This contrasts with other historical approaches that aimed to model human thought or behavior. An intelligent agent’s design is determined by the nature of its environment, leading to a spectrum of agent types, from simple reflex agents to sophisticated learning agents that use utility functions to make complex decisions under uncertainty.

The technical foundations of AI are built on several key pillars:

  • Search and Optimization: Finding solutions to problems by exploring a state space is a fundamental technique. Methods range from uninformed algorithms like breadth-first search to highly efficient informed methods like A* search, which use domain-specific heuristics. Local search techniques, such as hill-climbing and simulated annealing, are used for optimization problems.
  • Logic and Knowledge Representation: Logic provides a formal language for representing knowledge and drawing conclusions. While propositional logic deals with facts, first-order logic offers greater expressiveness by representing objects and relations. Inference algorithms like forward and backward chaining and resolution are used to reason with this knowledge.
  • Probabilistic Reasoning: To handle the uncertainty inherent in the real world, AI has shifted from purely logical approaches to probabilistic models. Bayesian networks provide a powerful framework for representing and reasoning with uncertain knowledge, capturing dependencies between variables in a compact manner. Temporal models, such as Hidden Markov Models and Dynamic Bayesian Networks, extend this reasoning to processes that unfold over time.
  • Machine Learning: The ability to learn from experience is a hallmark of intelligence. The field has seen a significant shift from hand-crafted knowledge systems to models that learn from data. Key paradigms include supervised learning (e.g., decision trees, support vector machines), unsupervised learning (e.g., clustering), and reinforcement learning, where an agent learns an optimal policy by interacting with its environment and receiving rewards or penalties.
  • Deep Learning: A subfield of machine learning, deep learning has driven recent breakthroughs in AI. Using deep neural networks with many layers (e.g., convolutional and recurrent networks), this approach has achieved state-of-the-art performance in complex domains like computer vision, speech recognition, and natural language processing, often by learning feature hierarchies directly from raw data.

The history of AI is marked by periods of optimism followed by “AI winters,” driven by the immense difficulty of the problems being tackled, such as the “combinatorial explosion.” The current era is characterized by the success of data-driven methods, fueled by large datasets and massive computational power. AI systems now exceed human performance in a variety of tasks, from games like Go and poker to medical image diagnosis. This progress brings with it significant societal considerations, including risks related to autonomous weapons, surveillance, labor displacement, and the need for robust, safe, and ethical systems.

1. Defining Artificial Intelligence

AI is a broad field concerned with building intelligent entities. There are four main historical approaches to defining it, categorized along two dimensions: whether the focus is on thinking or behavior, and whether the standard for success is human fidelity or rationality.

  • Acting Humanly (The Turing Test Approach): Proposed by Alan Turing (1950), this approach defines intelligence via a behavioral test. A machine passes if a human interrogator, after a conversation, cannot tell if they are conversing with a machine or a person. This requires capabilities in natural language processing, knowledge representation, automated reasoning, and machine learning.
  • Thinking Humanly (The Cognitive Modeling Approach): This approach aims to build computer models that think like humans. It involves comparing the reasoning steps of a program to those of human subjects. This area is closely linked with the interdisciplinary field of cognitive science.
  • Thinking Rationally (The “Laws of Thought” Approach): This approach is rooted in logic, starting with Aristotle’s attempts to codify “right thinking.” The goal is to build systems that reason logically to derive correct conclusions from knowledge. A key limitation is that it does not inherently generate intelligent behavior and struggles with uncertainty.
  • Acting Rationally (The Rational Agent Approach): This is the prevailing approach in modern AI. An agent is a system that perceives and acts. A rational agent is one that acts to achieve the best expected outcome. This approach is more general than the “laws of thought” approach because correct inference is only one mechanism for achieving rationality, and it is more amenable to scientific development than approaches based on human behavior.

A crucial challenge for the rational agent approach is that a system deployed with an incorrect objective will have negative consequences, with the severity increasing with the system’s intelligence. Modern research has shifted towards creating machines that pursue objectives beneficial to humans but are uncertain as to what those objectives are.

2. The Foundations and History of AI

AI is an interdisciplinary field that draws on concepts and tools from several other disciplines. Its history is marked by distinct eras of research focus, progress, and setbacks.

2.1 Foundations of Artificial Intelligence

  • Philosophy: Provided the concepts that the mind can be viewed as a machine, that it operates on knowledge encoded in an internal language, and that thought can be used to choose actions.
  • Mathematics: Provided the tools for formal logic (certainty), probability (uncertainty), and computation itself. Key areas include computability (Church-Turing thesis) and tractability (NP-completeness).
  • Economics: Formalized the problem of making rational decisions to maximize a desired outcome (utility). Contributed to the notion of rational agents, but AI research developed separately for many years due to different problem complexities.
  • Neuroscience: Established that the brain enables thought and that a collection of simple neurons can lead to cognition, action, and consciousness. Provides inspiration for computer architectures (e.g., neural networks).
  • Psychology: The behaviorism movement insisted on objective measures of percepts and actions. Cognitive psychology, which views the brain as an information-processing device, provides the foundation for modeling human thinking.
  • Computer Engineering: Provided the artifact upon which AI is built: the digital computer. Increases in computing power and parallelism have been a key driver of AI progress.
  • Control Theory & Cybernetics: Focused on building self-controlling artifacts that modify behavior based on environmental feedback. Modern control theory and AI are increasingly converging.
  • Linguistics: Investigates the nature of language. Modern NLP intersects heavily with AI, with both fields trying to understand how knowledge representation is tied to language.

2.2 Abridged History of Artificial Intelligence

  • Gestation (1943–1955): Early work on artificial neurons (McCulloch & Pitts, 1943) and learning rules (Hebb, 1949). Marvin Minsky and Dean Edmonds built the first neural network computer, SNARC, in 1950.
  • The Birth of AI (1956): The field was formally named at the Dartmouth workshop. Early successes included Newell and Simon’s Logic Theorist, a program that could prove mathematical theorems.
  • Early Enthusiasm (1952–1969): Researchers focused on limited “microworlds” where intelligent behavior appeared achievable. Key programs included Slagle’s SAINT (calculus integration) and Evans’s ANALOGY (IQ test problems).
  • A Dose of Reality (1966–1973): Early methods failed to scale to larger problems due to the “combinatorial explosion.” The Lighthill report (1973) in the UK led to severe cuts in AI research funding. Research on neural networks also dwindled after Minsky and Papert’s book Perceptrons (1969) highlighted their limitations.
  • Expert Systems (1969–1986): The focus shifted to knowledge-intensive systems that solved problems in specific domains. The DENDRAL program was the first successful knowledge-intensive system. This era saw the commercialization of AI with “expert system shells.”
  • The “AI Winter” (late 1980s): A period where many companies failed due to extravagant promises, difficulty in maintaining complex expert systems, and the systems’ inability to learn from experience.
  • The Return of Neural Networks (1986–present): The reinvention of the back-propagation algorithm led to a resurgence of connectionist models.
  • Probabilistic Reasoning and Machine Learning (1987–present): The field shifted towards rigorous, data-driven, and probabilistic methods. Bayesian networks became a primary tool for handling uncertainty.
  • Big Data and Deep Learning (2001–present): The availability of massive datasets and powerful hardware (especially GPUs) enabled the success of deep learning models, which learn hierarchies of features from raw data and have driven the current state of the art.

3. The State of the Art and Societal Impact

Modern AI systems have achieved or surpassed human-level performance on a growing number of tasks.

3.1 Capabilities and Applications

  • Game Playing: AI systems have defeated human champions in games including chess (Deep Blue vs. Kasparov), Go (AlphaGo vs. Lee Sedol), Jeopardy!, poker, and complex video games like Dota 2 and StarCraft II.
  • Autonomous Vehicles: Waymo's test vehicles surpassed 10 million miles driven on public roads in 2018 with minimal human intervention, and the company began offering commercial robotic taxi services. Autonomous drones perform deliveries and aerobatic maneuvers.
  • Language Understanding: Accuracy on the SQuAD question-answering benchmark increased from 60% to 95% between 2015 and 2019. Machine translation systems are widely used.
  • Image Understanding: On the ImageNet object recognition task, AI accuracy has surpassed human performance. Systems can also perform image captioning and visual question answering.
  • Medicine: AI programs demonstrate performance equivalent to human healthcare professionals in diagnosing diseases from images, including metastatic cancer and ophthalmic disease. The LYNA system achieved 99.6% accuracy in diagnosing metastatic breast cancer.

3.2 Risks, Benefits, and Ethical Considerations

The increasing capability and deployment of AI systems raise significant societal questions.

  • Benefits: Access to greater machine intelligence has the potential to raise the ceiling on human ambition, freeing humanity from repetitive labor and driving progress in science, medicine, and other fields.
  • Risks and Challenges:
    • Lethal Autonomous Weapons: Systems that can autonomously decide to kill humans pose a profound risk. Formal UN discussions on this topic began in 2014.
    • Surveillance and Persuasion: AI enables mass surveillance and the generation of targeted or fake content to persuade and manipulate individuals.
    • Labor Displacement: AI may automate tasks previously done by humans, potentially exacerbating economic inequality by shifting wealth from labor to capital.
    • Bias and Fairness: Systems trained on biased data can perpetuate or amplify societal biases.
    • Safety-Critical Applications: Ensuring the reliability and safety of AI in domains like self-driving cars and automated trading is paramount.

Governance and regulation are seen as critical to navigating these challenges, with research communities and corporations developing voluntary self-governance principles while governments establish advisory bodies.

4. Core Concepts: Intelligent Agents

The unifying theme of the text is the concept of the intelligent agent.

  • Agents and Environments: An agent perceives its environment through sensors and acts upon it through actuators. The agent function maps any given percept sequence to an action.
  • Rationality: A rational agent is one that acts to maximize its expected performance measure, given its percept sequence and built-in knowledge. The performance measure defines the criterion for success.
  • Task Environments (PEAS): Environments are specified by their Performance measure, Environment, Actuators, and Sensors. They can be characterized along several dimensions:
    • Observability: Fully vs. partially observable.
    • Agency: Single-agent vs. multiagent (which can be cooperative or competitive).
    • Determinism: Deterministic vs. stochastic (or nondeterministic).
    • Temporality: Episodic vs. sequential.
    • State: Static vs. dynamic.
    • Values: Discrete vs. continuous.
    • Knowledge: Known vs. unknown rules of the environment.
  • Types of Agent Programs:
    1. Simple Reflex Agents: Act based only on the current percept, ignoring percept history (using condition-action rules).
    2. Model-Based Reflex Agents: Maintain an internal state to track aspects of the world they cannot currently see. This requires a model of how the world evolves and how the agent’s actions affect it.
    3. Goal-Based Agents: Act to achieve goals. Search and planning are used to find action sequences that achieve these goals.
    4. Utility-Based Agents: Use a utility function to evaluate the desirability of different world states, allowing for rational decisions when goals conflict or are uncertain. They aim to maximize expected utility.
    5. Learning Agents: Can improve their performance over time by learning from experience.
  • Representation of States:
    • Atomic: States are black boxes with no internal structure. Used in problem-solving search.
    • Factored: A state is a vector of attribute values. Used in constraint satisfaction and planning.
    • Structured: A state includes objects and relations between them. Used in first-order logic and natural language understanding.

5. Problem Solving by Search

Search is a fundamental process for finding a sequence of actions (a solution) to reach a goal.

  • Problem Formulation: A search problem is defined by an initial state, a set of possible actions, a transition model (describing action outcomes), a goal test, and a path cost function. The state space is the graph of all reachable states.

5.1 Uninformed Search Strategies

These strategies use only the information available in the problem definition.

  • Breadth-First: complete; time O(b^d); space O(b^d); optimal if all step costs are equal.
  • Uniform-Cost: complete; time and space O(b^(1+⌊C*/ε⌋)); optimal.
  • Depth-First: not complete; time O(b^m); space O(bm); not optimal.
  • Depth-Limited: not complete; time O(b^l); space O(bl); not optimal.
  • Iterative Deepening: complete; time O(b^d); space O(bd); optimal if all step costs are equal.
  • Bidirectional: complete; time and space O(b^(d/2)); optimal.

(b: branching factor; d: depth of the shallowest solution; m: maximum depth of the state space; l: depth limit; C*: cost of the optimal solution; ε: minimum action cost.)

Iterative deepening search is generally the preferred uninformed method when the state space is large and the solution depth is unknown.
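As a concrete illustration, iterative deepening can be sketched in a few lines of Python. The graph representation and `successors` callback below are illustrative choices, not from the text:

```python
def depth_limited(state, goal, successors, limit, path=None):
    """Depth-first search truncated at the given depth limit."""
    path = path or [state]
    if state == goal:
        return path
    if limit == 0:
        return None
    for nxt in successors(state):
        if nxt not in path:  # avoid cycles along the current path
            found = depth_limited(nxt, goal, successors, limit - 1, path + [nxt])
            if found:
                return found
    return None

def iterative_deepening(start, goal, successors, max_depth=50):
    """Run depth-limited search with increasing limits; the first hit is a
    shallowest solution (as in breadth-first), with depth-first's O(bd) space."""
    for limit in range(max_depth + 1):
        result = depth_limited(start, goal, successors, limit)
        if result:
            return result
    return None
```

Because each restart redoes the shallower levels, the total work is still O(b^d); the repeated work is dominated by the deepest level.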

5.2 Informed (Heuristic) Search Strategies

These strategies use domain-specific knowledge in the form of a heuristic function, h(n), which estimates the cost from node n to a goal.

  • Greedy Best-First Search: Expands the node that appears to be closest to the goal (minimizes h(n)).
  • A* Search: Expands the node with the lowest evaluation f(n) = g(n) + h(n), where g(n) is the cost to reach the node. A* is optimal if h(n) is an admissible heuristic (never overestimates the true cost to a goal); for graph search, optimality additionally requires h(n) to be consistent.
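A minimal A* sketch follows; the `successors` callback yielding (state, cost) pairs and the grid example are illustrative assumptions, not from the text:

```python
import heapq

def astar(start, goal, successors, h):
    """A* search: always expand the frontier node with the lowest f = g + h.
    `successors(s)` yields (next_state, step_cost) pairs; if `h` is
    admissible, the returned path cost is optimal."""
    frontier = [(h(start), 0, start, [start])]  # (f, g, state, path)
    best_g = {start: 0}
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return path, g
        for nxt, cost in successors(state):
            g2 = g + cost
            if g2 < best_g.get(nxt, float('inf')):  # found a cheaper route
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None, float('inf')
```

On a grid with unit step costs, Manhattan distance is an admissible (and consistent) heuristic, since it ignores obstacles and so never overestimates.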

5.3 Heuristic Functions

  • Relaxed Problems: A common way to create admissible heuristics is to derive them from a relaxed version of the problem (e.g., removing some constraints). The cost of an optimal solution in the relaxed problem is an admissible heuristic for the original problem.
  • Pattern Databases: Store the exact solution costs for subproblems, which serve as admissible heuristics.

5.4 Local Search and Optimization

These algorithms operate on a single current state, moving to neighboring states to find an optimal solution.

  • Hill-Climbing: Continuously moves “uphill” to states with higher value. Can get stuck in local maxima. Variants include random-restart and sideways moves.
  • Simulated Annealing: Combines hill-climbing with random walk to escape local maxima by occasionally allowing “downhill” moves.
  • Genetic Algorithms: Maintain a population of states and use selection, crossover, and mutation to generate new, better states.
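Simulated annealing is compact enough to sketch directly. The cooling schedule, neighbor function, and one-dimensional objective below are illustrative assumptions:

```python
import math, random

def simulated_annealing(state, value, neighbor, schedule, rng=random):
    """Hill-climbing plus occasional random "downhill" moves: a worse
    neighbor is accepted with probability exp(delta / T), so escapes from
    local maxima become rarer as the temperature T cools toward zero."""
    for t in range(1, 10_000):
        T = schedule(t)
        if T <= 1e-9:          # frozen: return the current state
            return state
        nxt = neighbor(state, rng)
        delta = value(nxt) - value(state)
        if delta > 0 or rng.random() < math.exp(delta / T):
            state = nxt
    return state
```

With a geometric schedule such as T(t) = 2 * 0.95^t, early iterations wander freely while late iterations behave like pure hill-climbing.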

6. Adversarial Search and Game Theory

This area deals with multiagent environments where agents’ goals may be in conflict.

  • Games: Defined by an initial state, players, actions, a transition model, and a terminal test with a utility function. Most AI research has focused on two-player, zero-sum games of perfect information (e.g., chess, Go).
  • Minimax Algorithm: Finds the optimal move by assuming the opponent also plays optimally. It recursively computes the utility of each state by maximizing its own utility and minimizing the opponent’s.
  • Alpha-Beta Pruning: Optimizes minimax search by pruning branches of the search tree that cannot influence the final decision.
  • Heuristic Evaluation: For complex games, search is limited to a certain depth, and the utility of non-terminal states is estimated using a heuristic evaluation function.
  • Monte Carlo Tree Search (MCTS): Evaluates states by running many fast, random playouts of the game to the end and averaging the outcomes. MCTS balances exploration of new moves with exploitation of promising ones. It has been highly successful in games like Go.
  • Games of Imperfect Information: (e.g., poker, Kriegspiel) require reasoning about belief states over possible opponent configurations.
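Minimax with alpha-beta pruning can be sketched generically. The nested-list game tree in the usage below is a made-up example (a MAX root over three MIN nodes), not from the text:

```python
def alphabeta(state, player, moves, result, utility,
              alpha=-float('inf'), beta=float('inf')):
    """Minimax with alpha-beta pruning. `player` is +1 (MAX) or -1 (MIN);
    `utility` returns the terminal value from MAX's point of view, or None
    for non-terminal states. Branches that cannot affect the decision
    (those outside the (alpha, beta) window) are pruned."""
    u = utility(state)
    if u is not None:
        return u
    if player == 1:                       # MAX node
        best = -float('inf')
        for m in moves(state):
            best = max(best, alphabeta(result(state, m), -1,
                                       moves, result, utility, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:
                break                     # MIN will never allow this branch
        return best
    else:                                 # MIN node
        best = float('inf')
        for m in moves(state):
            best = min(best, alphabeta(result(state, m), 1,
                                       moves, result, utility, alpha, beta))
            beta = min(beta, best)
            if alpha >= beta:
                break
        return best
```

With perfect move ordering, pruning reduces the effective branching factor from b to roughly √b, doubling the searchable depth.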

7. Knowledge, Reasoning, and Logic

Logic provides a formal framework for representing knowledge and performing inference.

7.1 Propositional Logic

  • Syntax and Semantics: Represents facts as proposition symbols. Sentences are built using logical connectives (∧, ∨, ¬, ⇒, ⇔). A model assigns a truth value (true/false) to every proposition symbol.
  • Inference: The process of deriving new sentences from existing ones.
    • Model Checking: An algorithm like TT-ENTAILS? enumerates all possible models to check for entailment.
    • Theorem Proving: Applies rules of inference directly to sentences. Key methods include:
      • Resolution: A single, complete inference rule for knowledge bases in Conjunctive Normal Form (CNF).
      • Forward and Backward Chaining: Efficient inference algorithms for Horn clauses.
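Forward chaining over propositional Horn clauses is simple to sketch; the rule representation as (premises, conclusion) pairs and the sample rule set in the usage are illustrative assumptions:

```python
def forward_chaining(rules, facts, query):
    """Forward chaining for propositional Horn clauses: starting from the
    known facts, repeatedly fire any rule whose premises are all inferred,
    until the query is derived or no new conclusion can be added."""
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in inferred and all(p in inferred for p in premises):
                inferred.add(conclusion)
                changed = True
    return query in inferred
```

This procedure is sound, complete for Horn knowledge bases, and runs in time linear in the total size of the rules (with suitable bookkeeping).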

7.2 First-Order Logic (FOL)

FOL is more expressive than propositional logic.

  • Ontological Commitment: Represents a world of objects, properties of those objects, and relations among them.
  • Syntax: Includes constants, variables, predicates, functions, and quantifiers (∀ for universal, ∃ for existential).
  • Inference: Lifted from propositional logic to handle variables.
    • Unification: An algorithm for finding substitutions that make different logical expressions look identical.
    • Generalized Modus Ponens: A lifted version of Modus Ponens, forming the basis for forward and backward chaining in FOL.
    • Resolution: A lifted version of the resolution rule provides a complete inference procedure for FOL.

7.3 Advanced Knowledge Representation

  • Categories and Ontologies: Organizing objects into categories (taxonomies) allows for efficient reasoning via inheritance. Description logics are formal languages for defining and reasoning about categories.
  • Default Reasoning: Handling exceptions and default assumptions (e.g., “birds fly”) requires nonmonotonic logics, as the set of beliefs does not grow monotonically with new evidence.
  • Events and Time: The event calculus is a logic-based formalism for representing and reasoning about events and their effects on fluents (properties that change over time).

8. Planning

Planning agents find a sequence of actions to achieve a goal, using factored or structured state representations.

  • Classical Planning: Assumes a fully observable, deterministic, static environment with a known initial state and goal. Problems are often described in languages like PDDL (Planning Domain Definition Language).
  • Algorithms:
    • Forward (Progression) State-Space Search: Searches from the initial state towards the goal. Can use domain-independent heuristics.
    • Backward (Regression) State-Space Search: Searches backward from the goal.
    • Planning Graphs: A data structure used by algorithms like Graphplan to derive powerful heuristics by analyzing reachability.
    • SATPLAN: Encodes a planning problem as a Boolean satisfiability problem.
  • Hierarchical Task Network (HTN) Planning: Allows for planning with high-level actions (HLAs) that can be refined into lower-level action sequences, enabling more complex problems to be solved.
  • Planning in Nondeterministic and Partially Observable Environments:
    • Sensorless (Conformant) Planning: Finds a sequence of actions that works regardless of the initial state or action outcomes. Involves reasoning over belief states.
    • Contingency Planning: Generates plans with conditional branches to handle different percepts.
    • Online Planning: An agent interleaves planning and execution, replanning when necessary.

9. Probabilistic Reasoning

To handle uncertainty, agents use probability theory to represent degrees of belief.

  • Syntax: Random variables represent attributes of the world. A full joint probability distribution specifies the probability of every possible assignment of values to all variables.
  • Axioms of Probability: Basic rules governing probability assignments. From these, one can derive key concepts like conditional probability.
  • Bayes’ Rule: A fundamental rule for updating beliefs based on new evidence: P(cause | effect) = (P(effect | cause) * P(cause)) / P(effect).
  • Independence and Conditional Independence: These relationships allow for the compact representation of knowledge and simplify inference. A naive Bayes model, for example, assumes all effects are conditionally independent given the cause.
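Bayes' rule is easy to check with a worked example. The numbers below (a disease with 1% prevalence, a test with 90% sensitivity and a 5% false-positive rate) are hypothetical, chosen only to illustrate the computation:

```python
def bayes(prior, likelihood, evidence_prob):
    """P(cause | effect) = P(effect | cause) * P(cause) / P(effect)."""
    return likelihood * prior / evidence_prob

# Hypothetical numbers for illustration:
p_disease = 0.01            # P(cause): 1% prevalence
p_pos_given_disease = 0.90  # P(effect | cause): 90% sensitivity
p_pos_given_healthy = 0.05  # false-positive rate

# P(effect), by marginalizing over the cause:
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

posterior = bayes(p_disease, p_pos_given_disease, p_pos)
```

Despite the accurate test, the posterior is only about 15%, because the low prior dominates: most positives come from the much larger healthy population.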

9.1 Bayesian Networks

A Bayesian network is a directed acyclic graph that provides a compact representation of a full joint probability distribution.

  • Structure: Nodes represent random variables. Directed arcs represent direct causal influences.
  • Semantics: Each node is associated with a conditional probability table (CPT) that specifies the probability of its value given the values of its parents. The network provides a concise specification of conditional independence relationships.

9.2 Inference in Bayesian Networks

  • Exact Inference: Can compute the posterior probability of any variable given evidence.
    • Enumeration: Sums over all variables. Time complexity is exponential in the number of variables.
    • Variable Elimination: More efficient by reordering calculations and storing intermediate results (factors).
  • Approximate Inference (MCMC): Used when exact inference is intractable.
    • Rejection Sampling: Generates samples from the prior and rejects those inconsistent with the evidence. Inefficient if evidence is unlikely.
    • Likelihood Weighting: Fixes evidence variables and weights each sample by the likelihood of the evidence.
    • Gibbs Sampling: A Markov chain Monte Carlo (MCMC) method that samples one variable at a time, conditioned on all other variables. Converges to the true posterior distribution.
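Likelihood weighting can be demonstrated on a minimal two-node network A → B. The CPT numbers below are made up for illustration; the exact posterior P(A | B=true) works out to 27/34 ≈ 0.794 by Bayes' rule:

```python
import random

def likelihood_weighting(n, rng):
    """Estimate P(A | B=true) in a two-node network A -> B: sample only the
    non-evidence variable A, and weight each sample by the likelihood of the
    fixed evidence, P(B=true | a). (The probabilities are illustrative.)"""
    p_a = 0.3                             # prior P(A=true)
    p_b_given = {True: 0.9, False: 0.1}   # P(B=true | A)
    total, weight_a_true = 0.0, 0.0
    for _ in range(n):
        a = rng.random() < p_a            # sample A from its prior
        w = p_b_given[a]                  # weight by the evidence likelihood
        total += w
        if a:
            weight_a_true += w
    return weight_a_true / total

rng = random.Random(42)
estimate = likelihood_weighting(100_000, rng)
```

Unlike rejection sampling, no sample is wasted: every sample contributes, just with a smaller weight when it makes the evidence unlikely.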

9.3 Reasoning Over Time

  • Temporal Models: Use variables indexed by time to represent dynamic processes.
  • Markov Assumption: The current state depends only on a finite number of previous states.
  • Hidden Markov Models (HMMs): A model with a single discrete state variable. Key tasks include:
    • Filtering: Computing the belief state given evidence up to the current time.
    • Prediction: Computing future belief states.
    • Smoothing: Computing past belief states given all evidence (using the forward-backward algorithm).
    • Most Likely Explanation: Finding the most likely sequence of states given observations (using the Viterbi algorithm).
  • Kalman Filters: Handle continuous state variables under linear-Gaussian assumptions.
  • Dynamic Bayesian Networks (DBNs): Generalize HMMs to handle multiple state variables.
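Filtering in a two-state HMM is a short recursion: predict with the transition model, then weight by the evidence likelihood and normalize. The "umbrella world" numbers below (rain persists with probability 0.7; an umbrella is observed with probability 0.9 given rain, 0.2 otherwise) are a common textbook illustration, assumed here rather than quoted from this summary:

```python
def hmm_filter(prior, transition, sensor, observations):
    """Forward (filtering) recursion for a two-state HMM.
    States are 0/1; `prior` is P(state=1). For each observation:
    predict one step with the transition model, then condition on the
    evidence and renormalize."""
    belief = prior
    for obs in observations:
        # Predict: P(X_t = 1 | e_1:t-1)
        predicted = transition[1] * belief + transition[0] * (1 - belief)
        # Update: weight by P(obs | X_t), then normalize
        p1 = sensor[1][obs] * predicted
        p0 = sensor[0][obs] * (1 - predicted)
        belief = p1 / (p0 + p1)
    return belief

transition = {1: 0.7, 0: 0.3}            # P(rain today | rain / no rain yesterday)
sensor = {1: {True: 0.9, False: 0.1},    # P(umbrella | rain)
          0: {True: 0.2, False: 0.8}}    # P(umbrella | no rain)
belief = hmm_filter(0.5, transition, sensor, [True, True])
```

After two umbrella observations, the belief in rain rises from the 0.5 prior to roughly 0.883.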

10. Making Decisions

10.1 Simple Decisions

  • Utility Theory: A rational agent should choose the action that maximizes its expected utility (MEU).
  • Axioms of Utility: A set of preference constraints (e.g., orderability, transitivity) that imply the existence of a utility function.
  • Decision Networks (Influence Diagrams): A graphical representation for decision problems, extending Bayesian networks with nodes for actions (rectangles) and utilities (diamonds).

10.2 Complex Decisions (Sequential Problems)

  • Markov Decision Process (MDP): A formal model for sequential decision problems in fully observable, stochastic environments. It is defined by a set of states, actions, a transition model P(s' | s, a), and a reward function R(s, a, s').
  • Policies: A solution to an MDP is a policy π(s), which specifies an action for each state.
  • Bellman Equation: Relates the utility of a state to the expected utility of its successors. For an optimal policy, it is: U(s) = max_a Σ_s' P(s' | s, a) [R(s, a, s') + γU(s')], where γ is a discount factor.
  • Algorithms for Solving MDPs:
    • Value Iteration: An iterative algorithm that calculates the utility of each state until convergence.
    • Policy Iteration: Alternates between evaluating a policy and improving it until the policy is optimal.
  • Partially Observable MDPs (POMDPs): A model for sequential decision making in partially observable environments. The agent must act based on a belief state (a probability distribution over states). Solving POMDPs is computationally much harder than solving MDPs.
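Value iteration applies the Bellman update repeatedly until the utilities stop changing. The sketch below uses callback functions for the MDP components, and the one-state MDP in the usage (reward 1 per step, so U = 1/(1 − γ) = 10 at γ = 0.9) is an illustrative assumption:

```python
def value_iteration(states, actions, P, R, gamma=0.9, eps=1e-6):
    """Iterate the Bellman update
        U(s) <- max_a sum_s' P(s'|s,a) [R(s,a,s') + gamma * U(s')]
    until successive utility estimates differ by less than eps.
    `P(s, a)` returns a dict {s': probability}; `actions(s)` lists actions."""
    U = {s: 0.0 for s in states}
    while True:
        U2 = {}
        for s in states:
            U2[s] = max(sum(p * (R(s, a, s2) + gamma * U[s2])
                            for s2, p in P(s, a).items())
                        for a in actions(s))
        if max(abs(U2[s] - U[s]) for s in states) < eps:
            return U2
        U = U2

# One-state MDP: a single action that stays put and earns reward 1.
U = value_iteration(['s'], lambda s: ['stay'],
                    lambda s, a: {'s': 1.0},
                    lambda s, a, s2: 1.0)
```

Because γ < 1 makes the update a contraction, the iteration is guaranteed to converge to the unique optimal utilities.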

11. Machine Learning

Machine learning enables systems to improve their performance based on experience.

  • Forms of Learning:
    • Supervised Learning: Learns a function from labeled training examples (x, y).
    • Unsupervised Learning: Finds patterns in unlabeled data.
    • Reinforcement Learning: Learns what to do from rewards and punishments.
  • Model Selection: Finding a hypothesis that generalizes well is crucial. This involves avoiding overfitting.
    • Cross-Validation: A technique for estimating the generalization error of a model by splitting the data into training and validation sets.
    • Regularization: Penalizing model complexity to prevent overfitting.

11.1 Supervised Learning Models

  • Decision Trees: Classify examples by sorting them down a tree from the root to a leaf node. Learned by recursively choosing the attribute with the highest information gain.
  • Linear Regression and Classification:
    • Linear Regression: Fits a linear function to a set of data points.
    • Linear Classifier (Perceptron): Uses a linear function with a hard threshold to separate classes.
    • Logistic Regression: Uses a logistic function to produce a probabilistic output. Trained using gradient descent to minimize loss.
  • Nonparametric Models: The hypothesis complexity grows with the data.
    • k-Nearest Neighbors (k-NN): Classifies a new example based on the majority class of its k closest neighbors in the training data.
    • Support Vector Machines (SVMs): Find a maximum margin separator between classes. The kernel trick allows SVMs to create complex, non-linear decision boundaries by implicitly mapping data to a higher-dimensional space.
  • Ensemble Learning: Combines multiple hypotheses to create a more accurate predictor.
    • Bagging: Trains multiple models on different random samples of the training set. A random forest is a popular bagging method using decision trees.
    • Boosting: Trains a sequence of models, where each model focuses on the examples that the previous ones got wrong.
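Of these models, k-nearest neighbors is the simplest to sketch. The squared-Euclidean distance and the toy two-cluster training set in the usage are illustrative assumptions:

```python
from collections import Counter

def knn_classify(train, x, k=3):
    """k-nearest-neighbors: label a query point by majority vote among the
    k training examples closest to it. `train` is a list of (point, label)
    pairs; squared Euclidean distance suffices for ranking neighbors."""
    dist = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q))
    neighbors = sorted(train, key=lambda ex: dist(ex[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

There is no training phase at all: the "hypothesis" is the stored data itself, which is what makes the method nonparametric.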

11.2 Reinforcement Learning (RL)

An agent learns a policy by interacting with an environment.

  • Passive vs. Active RL: A passive agent executes a fixed policy and learns its utility. An active agent must also learn what actions to take.
  • Exploration vs. Exploitation: An active agent must balance trying new actions to learn their value (exploration) with taking actions known to be good (exploitation).
  • Model-Based vs. Model-Free:
    • Model-Based (e.g., Adaptive Dynamic Programming): The agent learns a model of the environment (transition probabilities and rewards) and uses it to solve the MDP.
    • Model-Free (e.g., TD Learning, Q-Learning): The agent learns a utility function or an action-utility function (Q-function) directly, without learning a model. Q-learning is a popular model-free algorithm that learns Q(s, a).
  • Policy Search: Directly searches the space of policies to find one with high reward, often using gradient-based methods.
  • Inverse Reinforcement Learning (IRL): Infers the reward function an expert is optimizing by observing their behavior.
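Tabular Q-learning fits in a few lines. The 4-state corridor environment below (moving right from state 2 reaches a goal with reward 1; every other move costs nothing) is a hypothetical example, not from the text:

```python
import random

def q_learning(n_states, step, actions, episodes,
               alpha=0.5, gamma=0.9, eps=0.2, rng=random):
    """Tabular Q-learning: after each transition (s, a, r, s'), nudge
    Q(s, a) toward the target r + gamma * max_a' Q(s', a') by a fraction
    alpha. Actions are chosen epsilon-greedily to keep exploring.
    `step(s, a)` returns (reward, next_state), with None marking terminal."""
    Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    for _ in range(episodes):
        s = 0
        while s is not None:
            a = (rng.choice(actions) if rng.random() < eps
                 else max(actions, key=lambda act: Q[(s, act)]))
            r, s2 = step(s, a)
            target = r if s2 is None else r + gamma * max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q

# Hypothetical corridor: states 0..3; reaching state 3 ends the episode.
def step(s, a):
    s2 = s + 1 if a == 'right' else max(0, s - 1)
    return (1.0, None) if s2 == 3 else (0.0, s2)

rng = random.Random(0)
Q = q_learning(4, step, ['left', 'right'], episodes=500, rng=rng)
```

Note that the learned Q-values exhibit discounting: Q(2, right) approaches 1, Q(1, right) approaches 0.9, and Q(0, right) approaches 0.81, all without ever learning a transition model.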

12. Deep Learning and Applications

Deep learning uses neural networks with multiple hidden layers to learn representations of data.

12.1 Neural Networks

  • Structure: A network of interconnected units (neurons). Each unit computes a weighted sum of its inputs and applies a non-linear activation function (e.g., ReLU).
  • Learning: The weights are learned using back-propagation, which is an efficient implementation of gradient descent to minimize a loss function.
  • Convolutional Neural Networks (CNNs): A specialized architecture for processing grid-like data (e.g., images). They use convolution layers to detect local features and pooling layers to downsample, creating a hierarchy of features.
  • Recurrent Neural Networks (RNNs): A specialized architecture for sequential data (e.g., text, speech). They have connections that loop back, allowing them to maintain a hidden state that acts as a memory.
  • Transformers: A modern architecture, particularly for NLP, that relies on a self-attention mechanism to model long-distance dependencies in data more effectively than RNNs.
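The basic feedforward structure is compact enough to sketch. The weights below are hand-set (a standard textbook construction, not learned) so that one hidden ReLU layer computes XOR, a function no single linear unit can represent:

```python
def relu(x):
    return max(0.0, x)

def forward(x, W1, b1, W2, b2):
    """A two-layer network: each hidden unit applies ReLU to a weighted sum
    of the inputs; the output unit is a plain weighted sum of the hidden
    activations."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(W2, hidden)) + b2

# Hand-set weights: the hidden units compute relu(x1 + x2) and
# relu(x1 + x2 - 1); their combination h1 - 2*h2 equals XOR(x1, x2).
W1, b1 = [[1, 1], [1, 1]], [0, -1]
W2, b2 = [1, -2], 0
```

In practice, of course, such weights are not designed by hand but found by back-propagation from data.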

12.2 Applications in NLP and Vision

  • Natural Language Processing (NLP):
    • Language Models: Assign probabilities to sequences of words. N-gram models and modern RNN/Transformer models are common.
    • Parsing: Analyzing the grammatical structure of sentences (e.g., using PCFGs).
    • Word Embeddings: Represent words as dense vectors in a low-dimensional space, capturing semantic relationships.
    • Transfer Learning: Large models (e.g., BERT, GPT-2) are pretrained on vast text corpora and then fine-tuned for specific tasks like question answering or machine translation.
  • Computer Vision:
    • Image Formation: The process by which a 3D scene is projected onto a 2D image.
    • Object Recognition: Includes image classification (what is in the image?) and object detection (where are the objects?). Modern systems are dominated by CNNs.
    • 3D Vision: Reconstructing 3D models from 2D images using techniques like stereo vision and structure from motion.

13. Robotics

Robots are physical agents that perceive and manipulate the physical world.

  • Hardware: Robots are equipped with sensors (e.g., cameras, LiDAR, GPS) to perceive and effectors/actuators (e.g., motors, wheels, grippers) to act.
  • Perception:
    • Localization: Determining the robot’s position and orientation. Probabilistic methods like particle filtering are effective.
    • Mapping: Building a map of the environment.
    • Simultaneous Localization and Mapping (SLAM): The problem of building a map while simultaneously localizing the robot within it.
  • Motion Planning: Finding a collision-free path from a start to a goal configuration.
    • Configuration Space: The space of all possible robot configurations. Obstacles in the real world map to complex obstacle regions in configuration space.
    • Algorithms: Include cell decomposition, probabilistic roadmaps (PRM), and rapidly-exploring random trees (RRT).
  • Control: Executing a planned trajectory by sending commands to the actuators.
    • Feedback Control (e.g., PID Control): Adjusts motor commands based on the error between the current state and the desired state.
  • Human-Robot Interaction (HRI): Involves challenges like predicting human intent, coordinating actions, and learning from human demonstrations.
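A PID feedback loop can be sketched in a few lines. The toy plant in the usage (a point whose velocity equals the control signal, with only the proportional term active) is an illustrative assumption:

```python
def pid_controller(kp, ki, kd, dt):
    """Returns a stateful PID update: u = Kp*e + Ki*(integral of e) + Kd*(de/dt),
    where e is the error between the desired and current state."""
    state = {'integral': 0.0, 'prev_error': 0.0}
    def control(error):
        state['integral'] += error * dt                      # accumulate I term
        derivative = (error - state['prev_error']) / dt      # finite-difference D term
        state['prev_error'] = error
        return kp * error + ki * state['integral'] + kd * derivative
    return control

# Toy plant: position changes at a rate equal to the control signal.
control = pid_controller(kp=1.0, ki=0.0, kd=0.0, dt=0.1)
pos, setpoint = 0.0, 1.0
for _ in range(200):
    pos += control(setpoint - pos) * 0.1
```

The integral term removes steady-state error and the derivative term damps overshoot; both are zeroed here only to keep the toy simulation deterministic and easy to follow.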
