Posts by Ed Yeh

Structured Data Processing in Apache Spark, Using Data Frame

Apache Spark™ is a fast and general engine for large-scale data processing. It has quickly become a powerful and necessary tool in the world of Big Data. This webinar will provide a technical overview of the Apache Spark DataFrame. Agenda: – Definition of a DataFrame, including its various data sources, primary features, and architecture – Use cases …

Processing and Serving Data with Apache Spark

Written by: Edward Yeh, Principal Big Data Consultant

So what is Apache Spark, and why do we care? Spark is a fast, general-purpose cluster computing system used for large-scale processing of both structured and unstructured data. The project was initially developed by the AMPLab at UC Berkeley and has now evolved …