Advanced Analytics with PySpark: Patterns for Learning from Data at Scale Using Python and Spark
- Author: Akash Tandon | Sandy Ryza | Uri Laserson | Sean Owen | Josh Wills
- ISBN-10: 1098103653
- ISBN-13: 978-1098103651
- Edition: 1st
- Publisher: O'Reilly Media
- Publication date: July 19, 2022
- Language: English
- Dimensions: 6.98 x 0.51 x 9.14 inches
- Print length: 233 pages
From the brand
Databases, data science & more
Sharing the knowledge of experts
O'Reilly's mission is to change the world by sharing the knowledge of innovators. For over 40 years, we've inspired companies and individuals to do new things (and do them better) by providing the skills and understanding that are necessary for success.
Our customers are hungry to build the innovations that propel the world forward. And we help them do just that.
From the Publisher
From the Preface
Apache Spark’s long lineage of predecessors, from MPI (message passing interface) to MapReduce, made it possible to write programs that take advantage of massive resources while abstracting away the nitty-gritty details of distributed systems. As much as data processing needs have motivated the development of these frameworks, in a way the field of big data has become so related to them that its scope is defined by what these frameworks can handle. Spark’s original promise was to take this a little further—to make writing distributed programs feel like writing regular programs.
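To make that promise concrete, here is a minimal sketch (not taken from the book) of a distributed aggregation in PySpark; the file path "events.parquet" and the "timestamp" column are placeholder assumptions. The point is only that the code reads like an ordinary Python program, with none of the MPI or MapReduce plumbing handled by Spark's predecessors.

```python
# A minimal sketch: the same DataFrame operations run locally or across a
# cluster without any explicit distributed-systems code.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("feels-like-regular-python").getOrCreate()

# "events.parquet" is a placeholder path; any columnar dataset with a
# "timestamp" column would work here.
events = spark.read.parquet("events.parquet")

daily_counts = (
    events
    .groupBy(F.to_date("timestamp").alias("day"))
    .count()
    .orderBy("day")
)

daily_counts.show()
spark.stop()
```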
The rise in Spark’s popularity coincided with that of the Python data (PyData) ecosystem. So it makes sense that Spark’s Python API—PySpark—has significantly grown in popularity over the last few years. Although distributed programming options have recently sprung up within the PyData ecosystem, Apache Spark remains one of the most popular choices for working with large datasets across industries and domains. Thanks to recent efforts to integrate PySpark with the other PyData tools, learning the framework can help you boost your productivity significantly as a data science practitioner.
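As a rough illustration of that integration (our own sketch, not an example from the book), the snippet below moves data between pandas and Spark and applies a vectorized pandas UDF. It assumes a local Spark 3.x installation with pandas and PyArrow available; the column name and function are illustrative.

```python
# A small sketch of PySpark/PyData interop: pandas DataFrames in and out of
# Spark, plus an Arrow-backed pandas UDF executed by Spark.
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf

spark = SparkSession.builder.appName("pydata-interop").getOrCreate()

# Start from an ordinary pandas DataFrame and hand it to Spark.
pdf = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})
sdf = spark.createDataFrame(pdf)

# A vectorized pandas UDF: pandas Series in, pandas Series out, run by Spark
# (values are shifted and scaled per Arrow batch in this toy example).
@pandas_udf("double")
def standardize(col: pd.Series) -> pd.Series:
    return (col - col.mean()) / col.std()

sdf.select(standardize("x").alias("x_std")).show()

# And back again: collect a (small!) Spark DataFrame into pandas for plotting,
# scikit-learn, or other PyData tools.
result_pdf = sdf.toPandas()
spark.stop()
```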
We think that the best way to teach data science is by example. To that end, we have put together a book of applications, trying to touch on the interactions between the most common algorithms, datasets, and design patterns in large-scale analytics. This book isn’t meant to be read cover to cover: page to a chapter that looks like something you’re trying to accomplish, or that simply ignites your interest, and start there.