Kevin Kho in fugue-project: Benchmarking PySpark Pandas, Pandas UDFs, and Fugue Polars. A case study on the performance of group-map operations on different backends. 7 min read · Apr 10, 2023
Kevin Kho in fugue-project: Large Scale Image Processing with Spark through Fugue. How Clobotics Runs Distributed Image Processing. 5 min read · Jan 16, 2023
Kevin Kho in Towards Data Science: Large Scale Data Profiling with whylogs and Fugue on Spark, Ray or Dask. Profiling large-scale data for use cases such as anomaly detection, drift detection, and data validation. 6 min read · Oct 4, 2022
Kevin Kho in Towards Data Science: Why SQL-Like Interfaces are Sub-optimal for Distributed Computing. Examining the limitations of the SQL interface. 9 min read · Aug 23, 2022
Kevin Kho in Towards Data Science: Why Pandas-like Interfaces are Sub-optimal for Distributed Computing. A deep look at the assumptions of the Pandas interface. 9 min read · Jun 7, 2022
Kevin Kho in The Prefect Blog: Introducing Prefect-ML: Orchestrate a Distributed Hyperparameter Grid Search on Dask. Use Prefect as a machine learning experiment tracker by leveraging mapping and artifacts. 10 min read · Mar 1, 2022
Kevin Kho in Towards Data Science: Introducing Fugue — Reducing PySpark Developer Friction. Increase developer productivity and decrease costs for big data projects. 15 min read · Feb 14, 2022
Kevin Kho in Towards Data Science: Scaling PyCaret with Spark (or Dask) through Fugue. Run PyCaret functions on each partition of data distributedly. 5 min read · Jan 7, 2022
Kevin Kho in Towards Data Science: Delivering Spark Big Data Projects Faster and Cheaper with Fugue. Increase developer productivity and decrease compute usage for big data projects. 8 min read · Nov 8, 2021
Kevin Kho in Plumbers Of Data Science: Using FugueSQL on Spark DataFrames with Databricks. In a previous article, I wrote about FugueSQL as a SQL interface on top of Spark, Dask, and Pandas DataFrames. For those familiar with… 5 min read · Nov 5, 2021