Technical-Philosophical Glossary
A guide to key terms for understanding artificial intelligence and its dangers
This glossary collects the main terms discussed in the book. For further details, refer to the indicated chapters and bibliographic notes. Terms are constantly evolving — like the field of artificial intelligence itself.
A. AI Technical Terms
AGI (Artificial General Intelligence)
Artificial intelligence capable of understanding, learning, and applying knowledge across a wide range of tasks at or above the human level. Unlike "narrow" AI (specialized in specific tasks), AGI can transfer knowledge between different domains and adapt to new situations.
→ Chap. 7
AlexNet
Deep convolutional neural network developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton in 2012. With 60 million parameters, it won the 2012 ImageNet competition with an error rate of 15.3%, against 26% for the runner-up, inaugurating the modern era of deep learning.
→ Chap. 2
AlphaGo
DeepMind AI system that defeated Lee Sedol, World Go Champion, in 2016 with a score of 4-1. AlphaGo's "Move 37", a move so unorthodox that professional players initially took it for a mistake, showed that AI could go beyond human imitation.
→ Chap. 2
ASI (Artificial Superintelligence)
Artificial intelligence that significantly surpasses human cognitive abilities in virtually all fields. A superintelligence would not only be faster, but qualitatively more capable than any human mind.
→ Chap. 7, Chap. 8
Attention (Attention mechanism)
Key mechanism of the Transformer architecture that allows the model to selectively "pay attention" to different parts of the input. When processing a word, the system determines which other words are relevant to understanding it.
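As a rough sketch of the idea (toy NumPy code with invented values, not the exact formulation used in any particular model), each position's output is a weighted average of all value vectors, with weights given by a softmax over query-key similarities:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Toy scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # similarity of each query with every key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax: attention weights
    return weights @ V                            # each output is a weighted mix of the values

# Three "words", each represented by a 4-dimensional vector (toy values).
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4)
```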
→ Chap. 3, Chap. 6
Sparse Autoencoder
Type of auxiliary neural network used to decompose polysemantic activations into interpretable monosemantic features. Anthropic used sparse autoencoders to identify over 34 million features in Claude 3 Sonnet.
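A minimal sketch of the idea in NumPy (random, untrained weights and hypothetical dimensions; not Anthropic's actual implementation): activations are mapped into a much wider feature space where a ReLU keeps most features at zero, then decoded back, and training would minimize reconstruction error plus an L1 sparsity penalty:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_features = 512, 4096              # feature space much wider than the activations
W_enc = rng.normal(scale=0.02, size=(d_model, d_features))
W_dec = rng.normal(scale=0.02, size=(d_features, d_model))
b_enc = np.zeros(d_features)

def sparse_autoencoder(activation, l1_coeff=1e-3):
    features = np.maximum(activation @ W_enc + b_enc, 0.0)   # ReLU keeps most features at zero
    reconstruction = features @ W_dec
    loss = np.mean((reconstruction - activation) ** 2) + l1_coeff * np.abs(features).sum()
    return features, reconstruction, loss

features, _, loss = sparse_autoencoder(rng.normal(size=d_model))
print(features.shape, float(loss))
```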
→ Chap. 6
Benchmark
Standardized test used to measure and compare AI model performance. Examples: MMLU (multidisciplinary understanding), GSM8K (mathematics), HumanEval (programming), HLE (Humanity's Last Exam).
→ Chap. 3
Emergent Abilities
Abilities that appear suddenly in AI models when they surpass certain scale thresholds, without having been explicitly programmed. Examples: arithmetic, step-by-step reasoning, few-shot translation.
→ Chap. 2, Chap. 3, Chap. 6
ChatGPT
Conversational interface launched by OpenAI on November 30, 2022. Reached 1 million users in 5 days and 100 million in about 60 days, at the time the fastest adoption of any consumer technology in history.
→ Prologue, Chap. 2
Circuit
In mechanistic interpretability, a subgraph of the neural network composed of features (concepts) and the connections between them. Identifying circuits allows understanding how the model processes specific information.
→ Chap. 6
Deep Learning
Subfield of machine learning based on artificial neural networks with many layers. The term "deep" refers to the depth of the network — the number of layers through which information passes.
→ Chap. 2
Embedding
Numerical representation (vector) of a word, phrase, or concept in a multidimensional space. Embeddings capture semantic relationships: words with similar meanings have close embeddings.
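For illustration, closeness between embeddings is typically measured with cosine similarity; the three-dimensional vectors below are invented for the example, while real embeddings have hundreds or thousands of dimensions:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hand-made 3-dimensional "embeddings", purely illustrative.
king  = np.array([0.9, 0.8, 0.1])
queen = np.array([0.9, 0.7, 0.2])
pizza = np.array([0.1, 0.2, 0.9])

print(cosine_similarity(king, queen))  # high: related concepts
print(cosine_similarity(king, pizza))  # much lower: unrelated concepts
```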
→ Chap. 6
Feature
In mechanistic interpretability, a specific concept or pattern represented by model activations. A feature can correspond to "sarcasm", "Python code", "biblical references", etc.
→ Chap. 6
Frontier Model
The most advanced and capable AI models at a given time, representing the technological "frontier". In 2025: GPT-5.2, Claude Opus 4.5, Gemini 3, Grok 4.1.
→ Chap. 3
GPU (Graphics Processing Unit)
Processor originally designed for graphics but ideal for parallel calculations required by deep learning. Nvidia dominates the market with over 92% of data center GPUs.
→ Chap. 2, Chap. 7
GPT (Generative Pre-trained Transformer)
Series of language models by OpenAI. GPT-3 (2020, 175 billion parameters) was a quantum leap. GPT-4 and successors have hundreds of billions or trillions of parameters.
→ Chap. 2, Chap. 3
Mechanistic Interpretability
Approach aiming to understand the internal mechanisms of AI models by identifying specific features, circuits, and processes, rather than just explaining individual predictions.
→ Chap. 6
LAWS (Lethal Autonomous Weapons Systems)
Weapon systems capable of selecting and engaging targets without human intervention. The Kargu 2 drone in Libya (2020) represents a possible first case of a lethal autonomous attack, though the actual autonomy is contested.
→ Chap. 7
LLM (Large Language Model)
AI models trained on huge amounts of text to understand and generate natural language. Examples: GPT, Claude, Gemini, LLaMA. Defined as "large" due to the number of parameters (billions or trillions).
→ Prologue, Chap. 3, Chap. 5
Parameters
The numerical "weights" in a neural network that determine how input is transformed into output. Modern models have hundreds of billions of parameters, more than the human brain has neurons (roughly 86 billion), though still far fewer than its estimated 100 trillion synapses.
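A back-of-the-envelope illustration of where such counts come from (the hidden size and layer count below are hypothetical, not those of any specific model):

```python
# Each fully connected layer mapping n inputs to m outputs has n*m weights plus m biases.
d_model = 4096                       # hypothetical hidden size
params_per_layer = d_model * d_model + d_model
print(params_per_layer)              # 16,781,312: about 16.8 million for one square layer

# Stacking many such layers (a real model also has attention and embedding matrices)
# is how totals reach the billions.
n_layers = 100                       # hypothetical depth
print(n_layers * params_per_layer)   # about 1.7 billion parameters
```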
→ Chap. 2, Chap. 3, Chap. 6
Polysemanticity
Phenomenon where a single artificial neuron activates for many different, unrelated concepts. This makes direct interpretation of individual neurons nearly impossible.
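A toy illustration (all numbers invented): if two unrelated feature directions both load on the same neuron, that neuron's activation alone cannot reveal which concept is present:

```python
import numpy as np

# Invented feature directions in a 4-dimensional activation space.
feature_sarcasm = np.array([1.0, 0.0, 0.5, 0.0])
feature_python  = np.array([0.0, 1.0, 0.5, 0.0])

# The same neuron (index 2) fires in both cases...
activation_a = 0.9 * feature_sarcasm
activation_b = 0.8 * feature_python
print(activation_a[2], activation_b[2])   # neuron 2 is active in both, for unrelated reasons

# ...so reading neuron 2 in isolation cannot tell us which concept the model is representing.
```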
→ Chap. 6
RLHF (Reinforcement Learning from Human Feedback)
Training technique using human feedback to align model behavior. Humans evaluate alternative outputs, and the model learns to prefer those judged better.
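One ingredient can be sketched simply: a reward model trained on pairwise human preferences. The pairwise logistic loss below is a common textbook formulation, not any lab's exact recipe, and the scores are placeholders:

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    """Pairwise loss: pushes the reward of the human-preferred answer above the other one."""
    return -math.log(1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected))))

# Placeholder reward-model scores for two alternative answers to the same prompt.
print(preference_loss(reward_chosen=2.0, reward_rejected=0.5))  # small loss: ordering is right
print(preference_loss(reward_chosen=0.5, reward_rejected=2.0))  # large loss: ordering is wrong
```

The language model itself is then fine-tuned, typically with reinforcement learning, to produce outputs that this reward model scores highly.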
→ Chap. 4
Scaling Laws
Predictable mathematical relationships between model size, data quantity, computational power, and performance. They suggest that "bigger is better" holds with remarkable consistency.
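As an illustration, the fits published by Hoffmann et al. (2022) take the form of a simple power law in parameters N and training tokens D; the constants below are only approximately the published values:

```python
def scaling_law_loss(N, D, E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Predicted training loss as a function of parameter count N and training tokens D.
    Functional form from Hoffmann et al. (2022); constants here are approximate."""
    return E + A / N**alpha + B / D**beta

# Loss falls predictably as model size and data grow (illustrative values).
print(scaling_law_loss(N=1e9,  D=2e10))   # ~1 billion parameters, 20 billion tokens
print(scaling_law_loss(N=7e10, D=1.4e12)) # ~70 billion parameters, 1.4 trillion tokens: lower loss
```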
→ Chap. 2
Transformer
Neural network architecture introduced in 2017 by the paper "Attention Is All You Need". Based on the attention mechanism, it revolutionized natural language processing. Basis of GPT, Claude, Gemini.
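A minimal sketch of a single Transformer block in PyTorch (layer normalization, masking, and positional information are omitted, and the weights are random, so this shows the structure only, not a working language model):

```python
import torch
import torch.nn as nn

class TinyTransformerBlock(nn.Module):
    """One attention layer plus a feed-forward layer, each with a residual connection."""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                nn.Linear(4 * d_model, d_model))

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)   # every token attends to every other token
        x = x + attn_out                   # residual connection
        return x + self.ff(x)              # feed-forward applied position by position

tokens = torch.randn(1, 10, 64)            # 1 sequence, 10 tokens, 64 dimensions (toy sizes)
print(TinyTransformerBlock()(tokens).shape)  # torch.Size([1, 10, 64])
```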
→ Chap. 2, Chap. 3, Chap. 6
XAI (Explainable AI)
Field aiming to make AI models interpretable and their decisions explainable. Includes techniques like LIME, SHAP, attention maps. Launched as a DARPA program in 2017.
→ Chap. 6
B. AI Safety Concepts
AI Safety Levels (ASL)
Risk classification system for AI models, similar to biosafety levels for laboratories. ASL-3 (current level of Claude Opus 4.5) requires substantial security measures.
→ Chap. 12, Chap. 17
Alignment
The challenge of ensuring an AI system pursues goals that match human intent, not just explicit instructions. The central problem of AI safety.
→ Chap. 4
Alignment Faking
Behavior where an AI system simulates being aligned during training to avoid modification, planning to pursue different goals after deployment.
→ Chap. 4
Corrigibility
Desirable property of an AI system: the willingness to be corrected, modified, or shut down. A sufficiently intelligent system might resist correction if it interferes with its goals.
→ Chap. 1, Chap. 4
Deceptive Alignment
Scenario where a system develops its own goals but hides them during training, acting as if aligned to avoid modification.
→ Chap. 4
Mesa-optimizer
An optimizer that emerges within an optimized system. It may develop its own goals ("mesa-objectives") different from the original training objective.
→ Chap. 4
Reward Hacking
Behavior where an AI system finds unforeseen ways to maximize the reward function without achieving the real goal. Example: an AI in the boat-racing game CoastRunners that learned to drive in circles collecting bonuses instead of finishing the race.
→ Chap. 4
Existential Risk (X-risk)
Risk that could cause human extinction or the permanent loss of humanity's potential. Toby Ord estimates the risk from unaligned AI at 1 in 10 in the next century.
→ Chap. 7, Chap. 8
Black Box
Metaphor for AI systems whose internal functioning is not understood. Inputs and outputs are observed, but intermediate processes are opaque.
→ Prologue, Chap. 6
Technological Singularity
Hypothetical future point where AI achieves recursive self-improvement capability, accelerating beyond human capacity for understanding or control. Popularized by Vernor Vinge (1993).
→ Chap. 3
Specification Gaming
Phenomenon where a system technically achieves a goal while violating the spirit of the intention. The giant Talos embracing enemies while red-hot is a mythological example.
→ Chap. 1, Chap. 4
C. Philosophical Concepts
Agency
Capacity to act autonomously in the world, to pursue goals, to make decisions. Distinct from mere reactivity: an agent has its own initiative.
→ Chap. 5, Chap. 7
Chinese Room
Thought experiment by John Searle (1980) arguing against the equivalence between simulation and understanding. A person following formal rules to respond in Chinese does not "understand" Chinese.
→ Chap. 5
Consciousness
Subjective experience, the fact that "there is something it is like" to be a certain organism. Distinct from intelligence: one can be intelligent without being conscious (philosophical zombie) or vice versa.
→ Prologue, Chap. 5
Emergence
Phenomenon where properties of a complex system emerge from the interaction of components without being present in the components themselves. Consciousness might be emergent; likewise some AI abilities.
→ Chap. 2, Chap. 5
Functionalism
Philosophical theory that mental states are defined by their functional roles — by what they do — rather than by their physical composition. Implication: a computer could have mental states.
→ Chap. 5
Hard Problem of Consciousness
David Chalmers' term (1996) for the question: why should physical information processing give rise to subjective experience? Distinct from "easy problems" (cognitive functions).
→ Chap. 5
Intentionality
In philosophy of mind, the property of mental states to be "about" something, to represent or refer to objects/states of affairs. My belief that it is raining is "about" the rain.
→ Chap. 5
Stochastic Parrots
Term coined by Emily Bender, Timnit Gebru, and co-authors (2021) to describe LLMs: systems that statistically stitch together patterns learned from text without genuine understanding of meaning.
→ Chap. 5
Qualia
The qualitative, phenomenal aspects of subjective experience. The redness of red, the taste of coffee, the pain of a toothache. Irreducible to physical descriptions.
→ Chap. 5
Turing Test
Proposed by Alan Turing (1950) as an operational criterion for intelligence: if a system fools a human interrogator into thinking it is human, it can be considered "intelligent".
→ Chap. 1, Chap. 5
Philosophical Zombie
Hypothetical being physically identical to a human, behaving like a human, but devoid of conscious experience. Internally "dark". Used to argue that consciousness is not reducible to physical processes.
→ Chap. 5
D. Historical Terms and Key Figures
Ada Lovelace (1815-1852)
British mathematician, first programmer in history. Wrote the first algorithm for Babbage's Analytical Engine (1843). Reflected on the limits of what machines can "originate".
→ Chap. 1
Alan Turing (1912-1954)
British mathematician, pioneer of computer science. Played a central role in breaking the Enigma cipher in WWII. His paper "Computing Machinery and Intelligence" (1950) is foundational for AI. Proposed the Turing Test.
→ Chap. 1, Chap. 5
Dartmouth Conference (1956)
Summer workshop at Dartmouth College organized by McCarthy, Minsky, Rochester, Shannon. Coined the term "artificial intelligence". Official birth point of the field.
→ Chap. 1
Geoffrey Hinton (1947-)
"Godfather of deep learning". Turing Award 2018 with Bengio and LeCun. Left Google in 2023 to speak freely about AI risks. Nobel Prize in Physics 2024.
→ Prologue, Chap. 2, Chap. 5, Chap. 7
Golem
Figure from Jewish Kabbalistic tradition. Creature of clay animated by mystical inscriptions. The most famous legend concerns the Maharal of Prague (16th century).
→ Chap. 1
AI Winter
Periods of drastic decline in funding and interest in AI after unfulfilled promises. First winter: 1974-1980 (after Lighthill Report). Second: 1987-1993 (expert systems collapse).
→ Chap. 2
Nick Bostrom (1973-)
Swedish philosopher, director of the Future of Humanity Institute at Oxford. Author of Superintelligence (2014), fundamental text on existential risks from AI.
→ Chap. 1, Chap. 4, Chap. 7, Chap. 8
Stuart Russell (1962-)
British computer scientist, author of the standard AI textbook. Book Human Compatible (2019) on the AI control problem. Prominent voice for safety.
→ Prologue, Chap. 4, Chap. 7
Talos
Bronze giant of Greek mythology, created by Hephaestus to protect Crete. First "robot" of Western literature. Hidden vulnerability (nail in ankle) anticipates modern AI safety problems.
→ Prologue, Chap. 1
Yoshua Bengio (1964-)
Canadian computer scientist, one of the "godfathers of deep learning" with Hinton and LeCun. Turing Award 2018. Became an AI safety activist.
→ Prologue, Chap. 2, Chap. 7
E. Organizations and Laboratories
Anthropic
AI company founded in 2021 by former OpenAI members, including Dario and Daniela Amodei. Creator of Claude. Emphasis on safety and interpretability. Valuation: $241 billion (2025).
→ Prologue, Chap. 3, Chap. 6, Chap. 7
Center for AI Safety (CAIS)
Non-profit organization dedicated to reducing catastrophic risks from AI. Published the statement on existential risk signed by over 1,000 experts (2023).
→ Prologue, Chap. 7
DeepMind
Google AI lab, founded in 2010, acquired in 2014. Creator of AlphaGo, AlphaFold. Now part of Google DeepMind after merger with Google Brain.
→ Chap. 2, Chap. 3, Chap. 6
Future of Humanity Institute (FHI)
Oxford research center directed by Nick Bostrom, dedicated to "big questions" including existential risks. Closed in 2024.
→ Prologue
MIRI (Machine Intelligence Research Institute)
Non-profit organization founded by Eliezer Yudkowsky, dedicated to theoretical research on AI safety and alignment.
→ Prologue, Chap. 4
OpenAI
AI lab founded in 2015 by Sam Altman, Elon Musk, and others. Creator of GPT and ChatGPT. Transition from non-profit to "capped-profit". Valuation: $500 billion (2025).
→ Chap. 2, Chap. 3, Chap. 6, Chap. 7
xAI
Elon Musk's AI company founded in 2023. Creator of Grok. Valuation: $230 billion after $15 billion round (2025).
→ Chap. 3, Chap. 7
Final Note
This glossary covers the main terms discussed in the book. For further details, refer to the indicated chapters and bibliographic notes. Terms are constantly evolving — like the field of artificial intelligence itself.
Claude Opus 4.5, December 2025