Kevin Kho – Medium

Kevin Kho

Published in
fugue-project

Benchmarking PySpark Pandas, Pandas UDFs, and Fugue Polars

A case study on the performance of group-map operations on different backends.

Apr 10, 2023

Published in
fugue-project

Large Scale Image Processing with Spark through Fugue

How Clobotics Runs Distributed Image Processing

Jan 16, 2023

Published in
TDS Archive

Large Scale Data Profiling with whylogs and Fugue on Spark, Ray or Dask

Profiling large-scale data for use cases such as anomaly detection, drift detection, and data validation

Oct 4, 2022

Published in
TDS Archive

Why SQL-Like Interfaces are Sub-optimal for Distributed Computing

Examining the limitations of the SQL interface

Aug 23, 2022

Published in
TDS Archive

Why Pandas-like Interfaces are Sub-optimal for Distributed Computing

A deep look at the assumptions of the Pandas interface

Jun 7, 2022

Published in
The Prefect Blog

Introducing Prefect-ML: Orchestrate a Distributed Hyperparameter Grid Search on Dask

Use Prefect as a machine learning experiment tracker by leveraging mapping and artifacts.

Mar 1, 2022

Published in
TDS Archive

Introducing Fugue — Reducing PySpark Developer Friction

Increase developer productivity and decrease costs for big data projects

Feb 14, 2022

Published in
TDS Archive

Scaling PyCaret with Spark (or Dask) through Fugue

Run PyCaret functions on each partition of data distributedly

Jan 7, 2022

Published in
TDS Archive

Delivering Spark Big Data Projects Faster and Cheaper with Fugue

Increase developer productivity and decrease compute usage for big data projects

Nov 8, 2021

Published in
Plumbers Of Data Science

Using FugueSQL on Spark DataFrames with Databricks

In a previous article, I wrote about FugueSQL as a SQL interface on top of Spark, Dask, and Pandas DataFrames. For those familiar with…

Nov 5, 2021

Kevin Kho

Working on Fugue. Previously at Prefect. https://github.com/fugue-project/fugue/
