Spark: the definitive guide: big data processing made simple
Chambers, William Andrew, Zaharia, Matei
Part 1. Gentle overview of big data and Spark. What is Apache Spark? -- A gentle introduction to Spark -- A tour of Spark's toolset -- Part 2. Structured APIs : DataFrames, SQL, and datasets. Structured API overview -- Basic structured operations -- Working with different types of data -- Aggregations -- Joins -- Data sources -- Spark SQL -- Datasets -- Part 3. Low-level APIs. Resilient distributed datasets (RDDs) -- Advanced RDDs -- Distributed shared variables -- Part 4. Production applications. How Spark runs on a cluster -- Developint Spark applications -- Deploying Spark -- Monitoring and debugging -- Performance tuning -- Part 5. Streaming. Stream processing fundamentals -- Structured streaming basics -- Event-time and stateful processing -- Structured streaming in production -- Part 6. Advanced analytics and machine learning. Advanced analytics and machine learning overview -- Preprocessing and feature engineering -- Classification -- Regression -- Recommendation -- Unsupervised learning -- Graph analytics -- Deep learning -- Part 7. Ecosystem. Language specifics : Python (PySpark) and R (SparkR and sparklyr) -- Ecosystem and community.
Catégories:
Année:
2018
Edition:
First edition
Editeur::
O'Reilly Media
Langue:
english
ISBN 10:
1491912308
ISBN 13:
9781491912300
Fichier:
EPUB, 7.53 MB
IPFS:
,
english, 2018