randy.pub | Personal homepage
Author: Randy Boyes
Information destruction in the form of categorization of continuous data is widely denounced and even more widely practiced in the social sciences. Usually this "technique" results in attenuation of correlations, but are there scenarios where you can actually p-hack using dichotomization?
using TidierPlots
using TidierData
using Random
using DataFrames
using Makie
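# Simulate a weak linear relationship between x and y
Random.seed!(123)
n = 200
x = randn(n)
y = 0.1 * x + randn(n)
df = DataFrame(x = x, y = y)

# Scatterplot of the continuous data
plot = ggplot(df) +
    geom_point(aes(x = :x, y = :y))

# Dichotomize x at zero, adding a little horizontal jitter for plotting
df_di = @chain df begin
    @mutate belowzero = x < 0
    @mutate jitter = (rand(!!n) - .5) / 20
    @mutate x_new = jitter + as_float(belowzero)
    @mutate label = belowzero ? "Low" : "High"
end

# Scatterplot of y against the dichotomized (jittered) x
plot2 = ggplot(df_di) +
    geom_point(aes(x = :x_new, y = :y))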
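To put a rough number on the attenuation mentioned in the introduction, here is a minimal sketch that compares the correlation before and after splitting x at zero. It uses only the `Random` and `Statistics` standard libraries and regenerates data in the same way as above. The names `x_di`, `r_cont`, and `r_di` are introduced here for illustration, and coding the indicator as `x >= 0` (rather than `belowzero`) is an assumption made so that both correlations have the same sign.

```julia
using Random
using Statistics

Random.seed!(123)
n = 200
x = randn(n)
y = 0.1 .* x .+ randn(n)

# Dichotomize at zero: 1.0 for the "High" group, 0.0 for the "Low" group
# (assumption: coded as x >= 0 so the sign of the correlation is preserved)
x_di = Float64.(x .>= 0)

# Pearson correlation with the continuous and the dichotomized predictor
r_cont = cor(x, y)
r_di = cor(x_di, y)

println("continuous r   = ", round(r_cont, digits = 3))
println("dichotomized r = ", round(r_di, digits = 3))

# For bivariate normal data split at the mean, the expected correlation
# shrinks by a factor of sqrt(2/π), roughly 0.798
println("theoretical attenuation factor = ", round(sqrt(2 / π), digits = 3))
```

In any single sample of 200 the observed ratio `r_di / r_cont` can land well away from the theoretical factor, and can even exceed 1. That sampling noise is presumably what any p-hacking through dichotomization would have to exploit.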