randy.pub

Personal homepage
Author Randy Boyes

Dichotomania

Information destruction in the form of categorization of continuous data is widely denounced and even more widely practiced in the social sciences. Usually this "technique" results in attenuation of correlations, but are there scenarios where you can actually p-hack using dichotomization?

using TidierPlots
using TidierData
using Random
using DataFrames
using Makie

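Random.seed!(123)            # fix the seed so the simulation is reproducible
n = 200
x = randn(n)                 # continuous predictor, standard normal
y = 0.1 * x + randn(n)       # weak linear effect of x plus standard normal noise
df = DataFrame(x = x, y = y)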

plot = ggplot(df) +
  geom_point(aes(x = :x, y = :y))

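df_di = @chain df begin
  # dichotomize x at zero
  @mutate belowzero = x < 0
  # small horizontal jitter so points within a group do not overplot
  @mutate jitter = (rand(!!n) - 0.5) / 20
  # as_float(belowzero) is 1.0 when x < 0, so the "Low" group is drawn near x_new = 1
  @mutate x_new = jitter + as_float(belowzero)
  # if_else works element-wise; a ternary expects a single Bool, not a column
  @mutate label = if_else(belowzero, "Low", "High")
end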

plot2 = ggplot(df_di) +
  geom_point(aes(x = :x_new, y = :y))
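
The attenuation mentioned at the top can be checked numerically. The following is a minimal sketch, not part of the original analysis: it reuses the same data-generating process, but with a much larger sample (`n_sim`, an assumed value) so that the sample correlations are stable, and it uses its own random number generator with an arbitrary seed. The names `x_sim`, `y_sim`, `r_cont`, `r_split`, and `ratio` are introduced here for illustration only. For a normally distributed predictor split at its median, the correlation should shrink by a factor of about sqrt(2/π) ≈ 0.8.

```julia
using Random
using Statistics

# Assumption: same model as above (y = 0.1x + noise), but with a large n
# so the estimated correlations are not dominated by sampling noise.
rng = MersenneTwister(123)   # arbitrary seed
n_sim = 1_000_000
x_sim = randn(rng, n_sim)
y_sim = 0.1 .* x_sim .+ randn(rng, n_sim)

# Correlation using the continuous predictor
r_cont = cor(x_sim, y_sim)

# Correlation after splitting x at zero. The indicator is 1.0 when x < 0,
# so the sign flips relative to r_cont; only the magnitude is compared.
r_split = cor(Float64.(x_sim .< 0), y_sim)

# Ratio of magnitudes, to compare against the theoretical factor sqrt(2/π)
ratio = abs(r_split) / abs(r_cont)

println("continuous r   = ", round(r_cont, digits = 3))
println("dichotomized r = ", round(r_split, digits = 3))
println("ratio          = ", round(ratio, digits = 3),
        " (theory: ", round(sqrt(2 / pi), digits = 3), ")")
```

With `n = 200` as in the plots above, the same comparison is much noisier, so a single small sample can easily show a dichotomized correlation that looks stronger or weaker than this factor suggests.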