Published in fugue-project | Benchmarking PySpark Pandas, Pandas UDFs, and Fugue Polars | A case study on the performance of group-map operations on different backends. | Apr 10, 2023
Published in fugue-project | Large Scale Image Processing with Spark through Fugue | How Clobotics Runs Distributed Image Processing | Jan 16, 2023
Published in Towards Data Science | Large Scale Data Profiling with whylogs and Fugue on Spark, Ray or Dask | Profiling large-scale data for use cases such as anomaly detection, drift detection, and data validation | Oct 4, 2022
Published in Towards Data Science | Why SQL-Like Interfaces are Sub-optimal for Distributed Computing | Examining the limitations of the SQL interface | Aug 23, 2022
Published in Towards Data Science | Why Pandas-like Interfaces are Sub-optimal for Distributed Computing | A deep look at the assumptions of the Pandas interface | Jun 7, 2022
Published in The Prefect Blog | Introducing Prefect-ML: Orchestrate a Distributed Hyperparameter Grid Search on Dask | Use Prefect as a machine learning experiment tracker by leveraging mapping and artifacts. | Mar 1, 2022
Published in Towards Data Science | Introducing Fugue — Reducing PySpark Developer Friction | Increase developer productivity and decrease costs for big data projects | Feb 14, 2022
Published in Towards Data Science | Scaling PyCaret with Spark (or Dask) through Fugue | Run PyCaret functions on each partition of data distributedly | Jan 7, 2022
Published in Towards Data Science | Delivering Spark Big Data Projects Faster and Cheaper with Fugue | Increase developer productivity and decrease compute usage for big data projects | Nov 8, 2021
Published in Plumbers Of Data Science | Using FugueSQL on Spark DataFrames with Databricks | In a previous article, I wrote about FugueSQL as a SQL interface on top of Spark, Dask, and Pandas DataFrames. For those familiar with… | Nov 5, 2021