Data Science on AWS: Implementing End-to-End, Continuous AI and Machine Learning Pipelines
- Authors: Chris Fregly, Antje Barth
- ISBN-10: 1492079391
- ISBN-13: 978-1492079392
- Edition: 1st
- Publisher: O'Reilly Media
- Publication date: May 11, 2021
- Language: English
- Dimensions: 7 x 1.05 x 9.19 inches
- Print length: 521 pages
From the brand
Sharing the knowledge of experts
O'Reilly's mission is to change the world by sharing the knowledge of innovators. For over 40 years, we've inspired companies and individuals to do new things (and do them better) by providing the skills and understanding that are necessary for success.
Our customers are hungry to build the innovations that propel the world forward. And we help them do just that.
From the Publisher
Who Should Read This Book
This book is for anyone who uses data to make critical business decisions. The guidance here will help data analysts, data scientists, data engineers, ML engineers, research scientists, application developers, and DevOps engineers broaden their understanding of the modern data science stack and level up their skills in the cloud.
The Amazon AI and ML stack unifies data science, data engineering, and application development to help users grow beyond their current roles. We show how to build and run pipelines in the cloud, then integrate the results into applications in minutes instead of days.
To get the most out of this book, we suggest readers have the following knowledge:
- Basic understanding of cloud computing
- Basic programming skills with Python, R, Java/Scala, or SQL
- Basic familiarity with data science tools such as Jupyter Notebook, pandas, NumPy, or scikit-learn
Overview of the Chapters
Chapter 1 provides an overview of the broad and deep Amazon AI and ML stack, an enormously powerful and diverse set of services, open source libraries, and infrastructure to use for data science projects of any complexity and scale.
Chapter 2 describes how to apply the Amazon AI and ML stack to real-world use cases for recommendations, computer vision, fraud detection, natural language understanding (NLU), conversational devices, cognitive search, customer support, industrial predictive maintenance, home automation, Internet of Things (IoT), healthcare, and quantum computing.
Chapter 3 demonstrates how to use AutoML to implement a specific subset of these use cases with SageMaker Autopilot.
Chapters 4–9 dive deep into the complete model development life cycle (MDLC) for a BERT-based NLP use case, including data ingestion and analysis, feature selection and engineering, model training and tuning, and model deployment with Amazon SageMaker, Amazon Athena, Amazon Redshift, Amazon EMR, TensorFlow, PyTorch, and serverless Apache Spark.
Chapter 10 ties everything together into repeatable pipelines using MLOps with SageMaker Pipelines, Kubeflow Pipelines, Apache Airflow, MLflow, and TFX.
Chapter 11 demonstrates real-time ML, anomaly detection, and streaming analytics on real-time data streams with Amazon Kinesis and Apache Kafka.
Chapter 12 presents a comprehensive set of security best practices for data science projects and workflows, including IAM, authentication, authorization, network isolation, data encryption at rest, post-quantum network encryption in transit, governance, and auditability.
Throughout the book, we provide tips to reduce cost and improve performance for data science projects on AWS.