Kevin Kho in fugue-project | Benchmarking PySpark Pandas, Pandas UDFs, and Fugue Polars | A case study on the performance of group-map operations on different backends. | Apr 10, 2023
Kevin Kho in fugue-project | Large Scale Image Processing with Spark through Fugue | How Clobotics Runs Distributed Image Processing | Jan 16, 2023
Kevin Kho in Towards Data Science | Large Scale Data Profiling with whylogs and Fugue on Spark, Ray or Dask | Profiling large-scale data for use cases such as anomaly detection, drift detection, and data validation | Oct 4, 2022
Kevin Kho in Towards Data Science | Why SQL-Like Interfaces are Sub-optimal for Distributed Computing | Examining the limitations of the SQL interface | Aug 23, 2022
Kevin Kho in Towards Data Science | Why Pandas-like Interfaces are Sub-optimal for Distributed Computing | A deep look at the assumptions of the Pandas interface | Jun 7, 2022
Kevin Kho in The Prefect Blog | Introducing Prefect-ML: Orchestrate a Distributed Hyperparameter Grid Search on Dask | Use Prefect as a machine learning experiment tracker by leveraging mapping and artifacts. | Mar 1, 2022
Kevin Kho in Towards Data Science | Introducing Fugue — Reducing PySpark Developer Friction | Increase developer productivity and decrease costs for big data projects | Feb 14, 2022
Kevin Kho in Towards Data Science | Scaling PyCaret with Spark (or Dask) through Fugue | Run PyCaret functions on each partition of data distributedly | Jan 7, 2022
Kevin Kho in Towards Data Science | Delivering Spark Big Data Projects Faster and Cheaper with Fugue | Increase developer productivity and decrease compute usage for big data projects | Nov 8, 2021
Kevin Kho in Plumbers Of Data Science | Using FugueSQL on Spark DataFrames with Databricks | In a previous article, I wrote about FugueSQL as a SQL interface on top of Spark, Dask, and Pandas DataFrames. For those familiar with… | Nov 5, 2021