Fundamentals of Data Engineering: Plan and Build Robust Data Systems
- Pickup from New Mail
- New Mail Courier
- Pickup from the store
- Other transport services
- Cash upon receipt
- Bank transfer
- Privat 24
- WebMoney
- Автор: Joe Reis | Matt Housley
- ISBN-10: 1098108302
- ISBN-13: 978-1098108304
- Edition: 1st
- Publisher: O'Reilly Media
- Publication date: July 26, 2022
- Language: English
- Dimensions: 7 x 1 x 9.25 inches
- Print length: 447 pages
From the brand
-
Databases, data science & more
Visit the Store
-
Sharing the knowledge of experts
O'Reilly's mission is to change the world by sharing the knowledge of innovators. For over 40 years, we've inspired companies and individuals to do new things (and do them better) by providing the skills and understanding that are necessary for success.
Our customers are hungry to build the innovations that propel the world forward. And we help them do just that.
From the Publisher
From the Preface
How did this book come about? The origin is deeply rooted in our journey from data science into data engineering. We often jokingly refer to ourselves as recovering data scientists. We both had the experience of being assigned to data science projects, then struggling to execute these projects due to a lack of proper foundations. Our journey into data engineering began when we undertook data engineering tasks to build foundations and infrastructure.
With the rise of data science, companies splashed out lavishly on data science talent, hoping to reap rich rewards. Very often, data scientists struggled with basic problems that their background and training did not address—data collection, data cleansing, data access, data transformation, and data infrastructure. These are problems that data engineering aims to solve.
What This Book Isn’t
Before we cover what this book is about and what you’ll get out of it, let’s quickly cover what this book isn’t. This book isn’t about data engineering using a particular tool, technology, or platform. While many excellent books approach data engineering technologies from this perspective, these books have a short shelf life. Instead, we focus on the fundamental concepts behind data engineering.
By the end of this book you will understand:
- How data engineering impacts your current role (data scientist, software engineer, or data team)
- How to cut through the marketing hype and choose the right technologies, data arch. & processes
- How to use the data engineering lifecycle to design and build a robust architecture
- Best practices for each stage of the data lifecycle
What This Book Is About
This book aims to fill a gap in current data engineering content and materials. While there’s no shortage of technical resources that address specific data engineering tools and technologies, people struggle to understand how to assemble these components into a coherent whole that applies in the real world. This book connects the dots of the end-to-end data lifecycle. It shows you how to stitch together various technologies to serve the needs of downstream data consumers such as analysts, data scientists, and machine learning engineers. This book works as a complement to O’Reilly books that cover the details of particular technologies, platforms, and programming languages.
The big idea of this book is the data engineering lifecycle: data generation, storage, ingestion, transformation, and serving. Since the dawn of data, we’ve seen the rise and fall of innumerable specific technologies and vendor products, but the data engineering lifecycle stages have remained essentially unchanged. With this framework, the reader will come away with a sound understanding for applying technologies to real-world business problems.
Our goal here is to map out principles that reach across two axes. First, we wish to distill data engineering into principles that can encompass any relevant technology. Second, we wish to present principles that will stand the test of time. We hope that these ideas reflect lessons learned across the data technology upheaval of the last twenty years and that our mental framework will remain useful for a decade or more into the future.
One thing to note: we unapologetically take a cloud-first approach. We view the cloud as a fundamentally transformative development that will endure for decades; most on-premises data systems and workloads will eventually move to cloud hosting. We assume that infrastructure and systems are ephemeral and scalable, and that data engineers will lean toward deploying managed services in the cloud. That said, most concepts in this book will translate to non-cloud environments.
Who Should Read This Book
Our primary intended audience for this book consists of technical practitioners, mid- to senior-level software engineers, data scientists, or analysts interested in moving into data engineering; or data engineers working in the guts of specific technologies, but wanting to develop a more comprehensive perspective. Our secondary target audience consists of data stakeholders who work adjacent to technical practitioners—e.g., a data team lead with a technical background overseeing a team of data engineers, or a director of data warehousing wanting to migrate from on-premises technology to a cloud-based solution.
Ideally, you’re curious and want to learn—why else would you be reading this book? You stay current with data technologies and trends by reading books and articles on data warehousing/data lakes, batch and streaming systems, orchestration, modeling, management, analysis, developments in cloud technologies, etc. This book will help you weave what you’ve read into a complete picture of data engineering across technologies and paradigms.