Database Internals: A Deep Dive into How Distributed Data Systems Work

4.25/5 · 500+ ratings

When it comes to choosing, using, and maintaining a database, understanding its internals is essential. But with so many distributed databases and tools available today, it’s often difficult to understand what each one offers and how they differ. With this practical guide, Alex Petrov guides developers through the concepts behind modern database and storage engine internals.

Throughout the book, …

Reviews

Liam

★ 5/5
This book is a fantastic read, even surpassing "Designing Data-Intensive Applications" in my opinion. The initial section on B-Trees and LSM-Trees is incredibly thorough, particularly Chapter 5's deep dive into local disk transactions and concurrency control, though some concepts could have been better integrated into later chapters. While the second part on distributed databases doesn't offer the same level of detail as the first, its logical progression from fundamental distributed system challenges to advanced topics like replication and consensus is masterfully done. The inclusion of external links, especially those detailing Cassandra's history and discussions, greatly enhances the learning experience. Despite some complex algorithms and a brief mention of sharding, the book remains a remarkably solid resource, especially when supplemented with the referenced papers.

Anya

★ 4/5
This book really shines when it delves into the internal workings of databases, particularly its thorough explanation of B-Trees and why they're superior to AVL or Red-Black trees for this purpose. I appreciated the insights into database architecture, storage engines, and page management, finally clarifying the distinction between OS and database pages, and even demystifying latches. The coverage of LSM trees in Part I was informative, though I found the explanations slightly less clear than those in DDIA. Part II, however, felt like a less effective, more muddled version of DDIA, focusing on distributed systems without providing the practical application details I was hoping for, like specific replication strategies or dedicated chapters on partitioning with real-world examples.

Anya

★ 4/5
This book offers a really solid introduction to some complex topics like B-Trees and consensus, all presented through the lens of Cassandra, which is a refreshing change from the usual OLTP focus. While the second half really shone, I did find some sections could have used a bit more editing for clarity; the order of ideas wasn't always the most intuitive until you got further in. Despite that minor hiccup, it's a valuable read, and I'm excited to see what a future edition with a sharper editorial eye might bring.

Anya

★ 4/5
This book is a deep dive, not a quick guide, perfect for engineers and architects wanting to grasp the inner workings of data infrastructure from storage to distributed consensus. It meticulously breaks down storage engines, explaining data structures like B-Trees and LSM Trees, alongside indexing strategies and the intricacies of MVCC. The second half tackles distributed systems, covering replication, sharding, and foundational consensus algorithms like Paxos and Raft, with a solid look at consistency models and failure recovery. While dense, it's an invaluable resource for understanding how modern data systems function at a fundamental level.

Anya

★ 4/5
This book offers solid, mostly well-explained, and useful content, though it feels a bit like a miscellany. While the first half on Storage Engines is quite thorough, the second half on Distributed Systems reads more like a survey. It's a bit surprising given the preface's strong claim about the importance of distributed systems, making me expect more depth there. Nevertheless, it remains a valuable resource for both subjects.

Anya

★ 3/5
This book felt really unfocused, trying to cover too much ground. While the initial deep dive into data structures and storage mechanisms was fascinating, the extensive explanations of B-Tree optimizations felt unnecessary and quickly forgotten. The author's inconsistent level of detail, with brief mentions of crucial topics like data recovery compared to lengthy B-Tree discussions, was quite confusing. The second part, particularly the consensus algorithms, often felt like a rehash of material better covered elsewhere, making large portions of the book redundant for anyone already familiar with the subject. It's a decent starting point for absolute beginners in both databases and distributed systems, but it's a dense read.

Anya

★ 5/5
This book is a fantastic read, particularly for those interested in distributed systems. While it touches on similar ground to 'Designing Data-Intensive Applications' regarding distributed transactions, it really digs deeper. It offers a thorough explanation of internal database representations and delves into the intricacies of distributed system algorithms, making it an excellent follow-up or alternative.

Isabelle

★ 4/5
This book doesn't really break new ground, but it does a solid job of organizing existing concepts. It's a well-structured overview that clarifies the subject matter effectively.

Anya

★ 2/5
While the content itself was excellent, the book really struggled with its flow and transitions when explaining concepts. I found myself constantly taking notes and cross-referencing them just to keep track of the ideas. It's a shame because the material is so good, but the delivery makes it a bit of a challenge to read.

Eleanor

★ 4/5
This book offers a detailed exploration of structures and algorithms relevant to modern systems. While it doesn't delve into excessive specifics, this is compensated by a wealth of references and suggestions for further learning, making it a valuable resource for those eager to dive deeper.

Priya

★ 4/5
This book is an absolute must-read for anyone in data engineering, database administration, or architecture. It dives deep into two crucial areas: storage engines and distributed systems, and it truly excels in explaining them. While it intentionally skips other topics like query optimizers, its coverage of its core subjects is nothing short of outstanding.

Anya

★ 5/5
This book really dives deep into database internals, and the first part on storage engines is absolutely stellar, offering comprehensive coverage of hardware and data storage algorithms with excellent illustrations. While the distributed systems section starts accessible, it quickly ramps up in complexity, introducing advanced concepts that might require multiple reads or external resources, though the author helpfully provides references. Despite the challenging nature of some topics, the book ultimately provides a fantastic end-to-end understanding of how databases function, both locally and in distributed environments.

Priya

★ 4/5
This book really tackles the 'how do databases work?' question head-on, and for its size, it packs an incredible amount of information. It's split into local storage and distributed systems, covering everything from B+ trees and ARIES to modern concepts like RAMP transactions, Paxos, and Raft. The diagrams are a huge plus, making complex topics like Raft logs and distributed transaction messages surprisingly clear. While it's a fantastic resource for those with some prior knowledge wanting to deepen their understanding, it might be a bit dense for complete beginners to fully digest in one go.

Chloe

★ 5/5
This book was an absolute delight! I couldn't put it down and was completely captivated from beginning to end. Highly recommend!

Anya

★ 5/5
This book is a fantastic companion to 'Designing Data-Intensive Applications,' delving into the foundational concepts of persistent, distributed systems. It offers a comprehensive look at various algorithms and protocols for common challenges in the field, explaining the reasoning behind each choice and fostering an understanding of their trade-offs. Plus, it's packed with references for anyone eager to dive deeper into specific subjects.

Anya

★ 3/5
This book really dives deep into the internal database structure, offering a thorough explanation of components like the storage engine. It then transitions smoothly into discussing the general characteristics of distributed systems and touches on specific aspects of distributed databases. The inclusion of excellent resources for further exploration is a definite plus.

Anya

★ 5/5
This book really reminded me of my favorite, "Designing Data-Intensive Applications." The initial section on database internals is exceptionally detailed, packed with fantastic insights. While the second part, focusing on distributed systems, might feel a little less novel, it's still a solid read and definitely worth your time. I wholeheartedly recommend it to anyone involved with databases or distributed systems.

Liam

★ 5/5
This book is a fantastic resource for anyone looking to understand the inner workings of databases and distributed systems. It really breaks down the fundamental building blocks, making complex concepts much more accessible.

Anya

★ 3/5
This book is a fantastic resource if you're looking to dive deep into databases and distributed systems, especially with its extensive list of further reading suggestions. However, it doesn't quite flow like a traditional book; instead, it feels more like a comprehensive compendium of algorithms, data structures, and theories. While the content is valuable, a few more diagrams would have really helped clarify some of the algorithmic explanations.

Anya

★ 3/5
This book felt disjointed, like two separate entities crammed together without a clear connection. While the first half delves into database internals and the second explores distributed system components, there's a distinct lack of cohesion between them. It's a shame because the individual sections offer valuable insights, particularly if you're seeking details on specific algorithms like Raft, but the overall guidance on integrating these concepts into a complete system is missing.

Priya

★ 5/5
This is truly one of the standout tech books I've encountered recently, especially for anyone diving into database internals and distribution. The first half is an absolute treasure trove, offering profound insights into B*-trees, LSM-trees, concurrency models, and memory versus disk optimizations that are hard to find elsewhere. While the second part, focusing on distributed systems, might not be as groundbreaking due to existing resources, it still shines with excellent explanations of the Paxos algorithm and anti-entropy, which are incredibly valuable. Honestly, if you're passionate about databases, this book is a must-have and likely the best you'll find on the subject.

Anya

★ 5/5
This book is incredibly dense and will likely leave you staring at a wall after finishing chapters, but in a good way! The author masterfully weaves complex topics like distributed systems and database internals, making them surprisingly understandable. While some sub-sections delve into highly academic territory that might not have immediate practical application, the core concepts are presented with remarkable clarity and logical progression. It's a book you'll want to revisit, and I've found creating my own detailed notes and mind maps essential for truly grasping the material.

Eleanor

★ 5/5
Alex Petrov's "Database Internals" is a fantastic addition to the O'Reilly lineup, sitting right up there with classics like "Designing Data-Intensive Applications." It tackles the complex, foundational concepts behind the distributed data systems we use every day, which is incredibly valuable given the explosion of database technologies and cloud services. This book cuts through the noise, explaining the core ideas that underpin all those NoSQL, NewSQL, and even traditional RDBMS variants, making it an essential read for anyone navigating the modern data landscape.

Eleanor

★ 3/5
This book started off strong, offering a solid overview of database internals like connection listeners, query parsers, and storage layers. However, it veered into an overly deep and terse dive into tree-based data structures that felt unnecessary. While the later sections on distributed transactions and consensus were interesting, they didn't quite surpass the treatment in Kleppmann's "Designing Data-Intensive Applications," which I read first. It's a decent read, but I wouldn't consider it a necessary complement to Kleppmann's work.

Anya

★ 5/5
This book is a solid five-star read, diving deep into distributed system design patterns and database tree structures. While I occasionally wondered about the direct real-world application of certain concepts like file systems, the explanations of complex, deep-level ideas were remarkably clear. Even though some topics, such as Paxos, were a bit beyond my current technical grasp and required supplementary YouTube videos, the comprehensive reading list alone demonstrates the immense effort put into this work.

Priya

★ 2/5
This book offers a solid theoretical foundation, particularly excelling in its detailed explanations of B-Trees and consensus protocols within distributed systems. While the conceptual discussions, like those on two-phase commit, are well-illustrated, the absence of practical code examples and the generally dry subject matter made it a challenging read. It's definitely a valuable resource for referencing core concepts, though I wish there were more hands-on scenarios to solidify the learning.

Anya

★ 4/5
This book offers a wealth of good and interesting content, though it's not without its flaws. Some chapters feel a bit disjointed, lacking smooth transitions between topics or algorithms, while other sections are quite well-developed. The diagrams were a mixed bag; I often found them missing where I'd hoped for clearer explanations, yet present for rather obvious points. Despite these structural issues, the core material is strong, making it a worthwhile read that I'll certainly revisit for details.

Priya

★ 3/5
This book tackles storage in both single-node systems and distributed environments, a unique approach compared to many other texts. While I found it packed with information, it didn't quite solidify my understanding. The author tends to hop between related concepts in short bursts, leaving ideas feeling underdeveloped and difficult to fully grasp. The first section seemed more cohesive, but the latter half definitely could have benefited from more depth.

Priya

★ 5/5
This book really impressed me! It's a fantastic companion to Martin Kleppmann's "Designing Data-Intensive Applications," offering a more in-depth look at implementation specifics, data structures, and algorithms. While it's definitely more technical and perhaps a tad drier than Kleppmann's work, the read remains surprisingly accessible.

Anya

★ 2/5
This book presents a rather peculiar blend of topics. The latter half, focusing on distributed data systems, is passable, though "Designing Data-Intensive Applications" offers a superior treatment of the subject. The initial section, unfortunately, delves into numerous B-Tree implementations, most of which are quite dense and difficult to decipher.

Shelves

Coding, Computers, Software, Programming, Nonfiction, Alex Petrov book, Textbooks, Computer Science, Technology, Engineering, Technical

More like this


97 Things Every Programmer Should Know: Collective Wisdom from the Experts

Tap into the wisdom of experts to learn what every programmer should know, no matter what language you use. With the 97 short and extremely useful…

4.25/5 · 500+ ratings

Compilers: Principles, Techniques, and Tools

This introduction to compilers is the direct descendant of the well-known book by Aho and Ullman, Principles of Compiler Design. The authors prese…

4.25/5 · 500+ ratings

Algorithms

This fourth edition of Robert Sedgewick and Kevin Wayne's Algorithms is the leading textbook on algorithms today and is widely used in colleges an…

4.25/5 · 500+ ratings

The Clean Coder: A Code of Conduct for Professional Programmers

Programmers who endure and succeed amidst swirling uncertainty and nonstop pressure share a common attribute: They care deeply about the practice of creating…

4.25/5 · 500+ ratings

The Passionate Programmer: Creating a Remarkable Career in Software Development

Success in today's IT environment requires you to view your career as a business endeavor. In this book, you'll learn how to become an entrepreneu…

4.25/5 · 500+ ratings

Effective Java

Since this Jolt-award winning classic was last updated in 2008 (shortly after Java 6 was released), Java has changed dramatically. The principal e…

4.25/5 · 500+ ratings

Peopleware: Productive Projects and Teams

DeMarco and Lister demonstrate that the major issues of software development are human, not technical. Their answers aren't easy--just incredibly …

4.25/5 · 500+ ratings

Working Effectively with Legacy Code

Get more out of your legacy systems: more performance, functionality, reliability, and manageability. Is your code easy to change? Can you get near…

4.25/5 · 500+ ratings

Structure and Interpretation of Computer Programs

Structure and Interpretation of Computer Programs has had a dramatic impact on computer science curricula over the past decade. This long-awaited …

4.25/5 · 500+ ratings

The Art of Computer Programming, Volume 1: Fundamental Algorithms

The bible of all fundamental algorithms and the work that taught many of today's software developers most of what they know about computer program…

4.25/5 · 500+ ratings

Clean Architecture

Building upon the success of best-sellers The Clean Coder and Clean Code, legendary software craftsman Robert C. "Uncle Bob" Martin shows how to b…

4.25/5 · 500+ ratings

Learn You a Haskell for Great Good!

Learn You a Haskell for Great Good! is a fun, illustrated guide to learning Haskell, a functional programming language that's growing in popularit…

4.25/5 · 500+ ratings