randy.pub

Personal homepage
Author Randy Boyes

A more complete slice

Part 1 looked at some basic half-implementations of tidyverse functions, but can we actually get a full implementation of at least one function? slice seems like the easiest, so let's try. Tidyverse slice (together with its slice_* variants) has options including n, prop, by, order_by, replace, weight_by, na_rm and preserve.

Set up as in the last post:

using DataFrames

struct TidyExpr
    f::Function
end

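# Qualified definitions extend Base.|> directly, so no separate import is needed.
# Piping one TidyExpr into another applies x first, then y.
Base.:(|>)(x::TidyExpr, y::TidyExpr) = TidyExpr(y.f ∘ x.f)
# Piping a DataFrame into a TidyExpr runs the wrapped function on it.
Base.:(|>)(x::DataFrames.DataFrame, y::TidyExpr) = y.f(x)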

We'll need to expand the slice function to take kwargs and do things with the provided values. Slurped kwargs in Julia arrive as an iterator of key => value Pairs, which can easily be turned into a Dict:

function example(args...; kwargs...)
  println(args)
  println("Keyword args:")
  println(Dict(kwargs))
end

example(1:10, by = :a, order_by = :b, na_rm = true)
(1:10,)
Keyword args:
Dict{Symbol, Any}(:order_by => :b, :by => :a, :na_rm => true)
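
One detail worth checking before looping over the Dict: the output above lists order_by before by, because a Dict does not keep insertion order. The slurped kwargs themselves iterate in call order. A minimal sketch of the difference (kwarg_order and kwarg_dict are hypothetical helpers, not part of the code in this post):

```julia
# Hypothetical helper: collect keyword names in the order they are iterated
function kwarg_order(; kwargs...)
    return [k for (k, _) in kwargs]
end

call_order = kwarg_order(by = :a, order_by = :b, na_rm = true)
println(call_order)  # [:by, :order_by, :na_rm]

# Hypothetical helper: the same kwargs as a Dict.
# The keys are identical, but their iteration order is unspecified.
function kwarg_dict(; kwargs...)
    return Dict(kwargs)
end

d = kwarg_dict(by = :a, order_by = :b, na_rm = true)
println(Set(keys(d)) == Set(call_order))  # true
```

If the order in which options are applied ever matters, iterating over kwargs directly is probably the safer choice.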

As a first pass, let's set up a loop over a couple of supported kwargs with basic implementations of their functionality.

function slice(args...; kwargs...)
  f = identity
  # Compose one transformation per supported keyword
  for (k, v) in Dict(kwargs)
    if k == :by
      f = f ∘ (x -> groupby(x, v))
    elseif k == :order_by
      f = f ∘ (x -> sort(x, v))
    end
  end

  # Apply the composed transformations once, then pick rows (per group if grouped)
  return TidyExpr(function (x)
    y = f(x)
    return y isa GroupedDataFrame ?
      combine(g -> g[args[1], :], y) :
      y[args[1], :]
  end)
end

DataFrame(a = 1:6, b = [1, 1, 1, 2, 2, 2]) |> slice(1, by = :b)
2×2 DataFrame
 Row │ b      a
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     2      4
DataFrame(a = 1:6, b = 6:-1:1) |> slice(1, order_by = :b)
DataFrameRow
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     6      1
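
Because the loop builds f with f = f ∘ step, the step added last runs first, and since Dict iteration order is unspecified, the relative order of by and order_by is not fixed when both are passed. A small sketch with plain functions shows the two composition directions (compose_right and compose_left are hypothetical names used only for illustration):

```julia
# Same pattern as the loop in slice: each new step is composed on the right,
# so the step added last is applied first.
function compose_right(steps)
    f = identity
    for s in steps
        f = f ∘ s
    end
    return f
end

# Alternative: compose on the left, so steps are applied in the order given.
function compose_left(steps)
    f = identity
    for s in steps
        f = s ∘ f
    end
    return f
end

steps = [x -> x + 1, x -> x * 10]

right_result = compose_right(steps)(1)  # (1 * 10) + 1 = 11
left_result = compose_left(steps)(1)    # (1 + 1) * 10 = 20
println((right_result, left_result))    # (11, 20)
```

Which direction is appropriate depends on whether sorting is meant to happen before or after grouping; this sketch only illustrates that the two are not interchangeable.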

The structure of the function works; we just need to add the rest of the arguments:

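function slice(args...; kwargs...)
  f = identity
  # Row selection from the first positional argument; keep all rows if none is given
  selection = isempty(args) ? Colon() : args[1]
  # Negative indices drop rows (assumed to mirror dplyr); Not is re-exported by DataFrames
  if !(selection isa Colon) && any(selection .< 0)
    selection = Not(abs.(selection))
  end
  dkw = Dict(kwargs)
  for (k, v) in dkw
    if k == :n
      selection = 1:v
    elseif k == :prop
      # floor(Int, ...) keeps the range endpoint an integer so it can index rows
      f = f ∘ (x -> x[1:floor(Int, nrow(x) * v), :])
    elseif k == :by
      f = f ∘ (x -> groupby(x, v))
    elseif k == :order_by
      f = f ∘ (x -> sort(x, v))
    elseif k == :replace
      # not implemented yet
    elseif k == :weight_by
      # not implemented yet
    elseif k == :na_rm
      # not implemented yet
    elseif k == :preserve
      # not implemented yet
    end
  end

  return TidyExpr(x -> f(x) isa GroupedDataFrame ?
    combine(y -> getindex(y, selection, :), f(x)) :
    f(x)[selection, :])
end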
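
Two of the branches above depend on how DataFrames indexes rows. The sketch below shows one way they could behave, assuming that a negative selection should drop rows as it does in dplyr and that prop should keep the leading fraction of rows. The variable names are illustrative only:

```julia
using DataFrames

df = DataFrame(a = 1:6, b = [1, 1, 1, 2, 2, 2])

# Negative selection: Not (re-exported by DataFrames) inverts a row index,
# so this drops rows 1 and 2 and keeps the rest.
negative = [-1, -2]
dropped = df[Not(abs.(negative)), :]

# Proportion: floor alone returns a Float64, which cannot be used to index rows;
# floor(Int, ...) gives an integer endpoint for the range.
prop = 0.5
top_half = df[1:floor(Int, nrow(df) * prop), :]

println(dropped.a)   # [3, 4, 5, 6]
println(top_half.a)  # [1, 2, 3]
```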
MIT License Randy Boyes. Website built with Franklin.jl and the Julia programming language. Design inspired by The Monospace Web. Code for this website is available on Github.