Ordinal Notes

Authors
Affiliation

Filippo Gambarota

Department of Developmental Psychology and Socialization, University of Padova

Gianmarco Altoè

\(-\mathbf{X}\boldsymbol{\beta}\) vs \(\mathbf{X}\boldsymbol{\beta}\) parametrization

In the tutorial we used the \(\alpha_k -\mathbf{X}\boldsymbol{\beta}\) parametrization (thus with the minus sign) because this force the \(\beta\) to have the interpretation as in standard regression models. Usually, \(\beta\) is the increase in \(y\) for a unit increase in \(x\). A negative \(\beta\) means that the expected value of \(y\) decrease for an increase in \(x\). Let’s see an example using the positive sign for \(\beta\).

k <- 4 # number of ordinal outcomes
x <- c(0, 1) # a binary predictor
b1 <- log(3) # log odds ratio comparing x1 and x0
probs0 <- rep(1/k, k)
alpha <- prob_to_alpha(probs0, link = "logit")
X <- matrix(x, nrow = 2)

# positive sign
(lp <- lapply(alpha, function(a) c(a + X %*% b1)))
$`1|2`
[1] -1.098612  0.000000

$`2|3`
[1] 0.000000 1.098612

$`3|4`
[1] 1.098612 2.197225
cump <- lapply(lp, plogis)
cump <- cbind(0, data.frame(cump), 1)
p <- data.frame(t(apply(cump, 1, diff)))
names(p) <- paste0("y", 1:k)
p$x <- x
p
    y1   y2   y3   y4 x
1 0.25 0.25 0.25 0.25 0
2 0.50 0.25 0.15 0.10 1
p |> 
  pivot_longer(starts_with("y")) |> 
  ggplot(aes(x = x, y = value, fill = name)) +
  geom_col(position = position_dodge()) +
  ylab("Probability") +
  ylim(c(0, 1)) +
  theme(legend.title = element_blank(),
        legend.position = "bottom") +
  ggtitle(latex2exp::TeX("$P(Y \\leq k) = \\alpha_k + X\\beta$"))

Clearly a positive \(\beta\) create higher probability for lower \(Y\) categories. This can be somehow not intuitive thus we can use the negative sign.

# negative sign

(lp <- lapply(alpha, function(a) c(a - X %*% b1)))
$`1|2`
[1] -1.098612 -2.197225

$`2|3`
[1]  0.000000 -1.098612

$`3|4`
[1] 1.098612 0.000000
cump <- lapply(lp, plogis)
cump <- cbind(0, data.frame(cump), 1)
p <- data.frame(t(apply(cump, 1, diff)))
names(p) <- paste0("y", 1:k)
p$x <- x

p |> 
  pivot_longer(starts_with("y")) |> 
  ggplot(aes(x = x, y = value, fill = name)) +
  geom_col(position = position_dodge()) +
  ylab("Probability") +
  ylim(c(0, 1)) +
  theme(legend.title = element_blank(),
        legend.position = "bottom") +
  ggtitle(latex2exp::TeX("$P(Y \\leq k) = \\alpha_k - X\\beta$"))

Using the second parametrization, with positive \(\beta\) we have higher probability for higher \(Y\) categories and the opposite.

The negative-sign parametrization is implicit when simulating from the latent distribution. Let’s see an example. We use a continuous \(x\) predictor because it is easier to see the results.

n <- 1e3
x <- runif(1e3)
B <- 3
ys_p <- x * B + rnorm(n)
ys_n <- x * -B + rnorm(n)
y_p <- findInterval(ys_p, alpha) + 1
y_n <- findInterval(ys_n, alpha) + 1

par(mfrow = c(1,2))
plot(x, ys_p, col = y_p, pch = 19, ylim = c(-7, 7), main = latex2exp::TeX("$\\Y^{*} = X \\beta + \\epsilon$, $\\beta = 3$"), ylab = latex2exp::TeX("$Y^{*}$"))
abline(h = alpha, lty = "dashed")
legend("bottomleft", fill = 1:k, legend = paste0("Y", 1:k))

plot(x, ys_n, col = y_n, pch = 19, ylim = c(-7, 7), main = latex2exp::TeX("$\\Y^{*} = X \\beta + \\epsilon$, $\\beta = -3$"), ylab = latex2exp::TeX("$Y^{*}$"))
abline(h = alpha, lty = "dashed")

Simulated data using the latent variable approach. The dotted lines are the thresholds \(\alpha\)

Checking the impact of \(\mathbf{\beta}\)

Choosing one or more plausible \(\beta_j\) values can be challenging. For a single \(\beta\) we can easily think about the odds ratio (for a logit model) or the Cohen’s \(d\) (for a probit) model. With multiple predictors and their interactions is not easy to fix plausible values. A good strategy is to try different values and check the impact on the predicted probabilities. In practice, we need to compute the predicted probabilities using the model equation for the \(k\) ordinal outcomes. This can be easily done with the sim_ord_latent() function, fixing the simulate = FALSE parameter. In this way, only the predicted probabilities are computed. Let’s see an example for a single \(x\) sampled for a standard normal distribution.

k <- 4
dat <- data.frame(x = seq(-4, 4, 0.1))
b1 <- 0.5
probs0 <- rep(1/k, k)
dat <- sim_ord_latent(~x, beta = b1, prob0 = probs0, data = dat, simulate = FALSE)
head(dat)
     x       yp1      yp12     yp123        y1        y2         y3         y4
1 -4.0 0.7112346 0.8807971 0.9568355 0.7112346 0.1695625 0.07603839 0.04316453
2 -3.9 0.7008582 0.8754466 0.9547226 0.7008582 0.1745885 0.07927594 0.04527742
3 -3.8 0.6902712 0.8698915 0.9525114 0.6902712 0.1796203 0.08261987 0.04748860
4 -3.7 0.6794810 0.8641271 0.9501979 0.6794810 0.1846461 0.08607076 0.04980214
5 -3.6 0.6684954 0.8581489 0.9477778 0.6684954 0.1896536 0.08962886 0.05222221
6 -3.5 0.6573231 0.8519528 0.9452469 0.6573231 0.1946297 0.09329410 0.05475309

Then we can plot the results:

dat |> 
  pivot_longer(matches("^y[1-9]")) |> 
  ggplot(aes(x = x, y = value, color = name)) +
  geom_line()

In this case the \(\beta_1 = 0.5\) can be considered a plausible value. Let’s see what happens increasing it:

data.frame(x = seq(-4, 4, 0.1)) |> 
  sim_ord_latent(~x, beta = 4, prob0 = probs0, data = _, simulate = FALSE) |> 
  pivot_longer(matches("^y[1-9]")) |> 
  ggplot(aes(x = x, y = value, color = name)) +
  geom_line()

We can clearly see the difference and probably \(\beta = 4\) can be considered too large. To note, the same result can be achieved using the num_latent_plot() (that under the hood uses the sim_ord_latent() function).

Let’s make now an example, with an interaction between a continous and categorical predictor.

dat <- expand_grid(x = seq(-4, 4, 0.1), g = c("a", "b"))
dat$g <- factor(dat$g)
contrasts(dat$g) <- c(-0.5, 0.5)
beta <- c(b1 = 0.5, b2 = 1, b3 = 0.1)
dat <- sim_ord_latent(~ x * g, beta = beta, prob0 = probs0, data = dat, simulate = FALSE)

dat |> 
  pivot_longer(matches("^y[1-9]")) |> 
  ggplot(aes(x = x, y = value, color = name, lty = g)) +
  geom_line()

We can see the impact of \(\beta_3 = 0.1\) that is the difference in slopes between the two groups. We can also use another plot to better see the group effect on each \(Y\).

dat |> 
  pivot_longer(matches("^y[1-9]")) |> 
  ggplot(aes(x = x, y = value, color = g)) +
  geom_line() +
  facet_wrap(~name)

Checking scale effects

With a numerical \(x\), checking the impact of scale effects is not easy (at least compared to the categorical case). For example, using the sim_ord_latent() function we can see the impact on the predicted probabilities of simulating a scale effect:

x <- runif(100)
b1 <- 10
z1 <- 2
k <- 4

dat <- data.frame(x = runif(100)) |> 
  sim_ord_latent(~x, ~x, beta = b1, zeta = z1, prob0 = rep(1/k, k), data = _, simulate = FALSE, link = "logit")
head(dat)
           x        yp1      yp12     yp123         y1         y2         y3
1 0.08801661 0.15984470 0.3234201 0.5456691 0.15984470 0.16357537 0.22224900
2 0.31533423 0.09423033 0.1573178 0.2509408 0.09423033 0.06308746 0.09362299
3 0.19417205 0.11285650 0.2113455 0.3608266 0.11285650 0.09848903 0.14948109
4 0.04250831 0.19789393 0.4036120 0.6499079 0.19789393 0.20571804 0.24629596
5 0.49774324 0.09575014 0.1371248 0.1925705 0.09575014 0.04137468 0.05544571
6 0.92380891 0.16394135 0.1890745 0.2170603 0.16394135 0.02513311 0.02798581
         y4
1 0.4543309
2 0.7490592
3 0.6391734
4 0.3500921
5 0.8074295
6 0.7829397
dat |> 
  pivot_longer(matches("^y[1-9]")) |> 
  ggplot(aes(x = x, y = value, color = name)) +
  geom_line()

We see only the expected probabilities but is not clear the impact of the scale effect. Instead we can simulate data and calculate the expected value of the location (\(\mu\)) and scale (\(s\)) for a categorized version of \(x\). Then plotting the average location and scale we can understand the impact of choosing a specific parameter value with a numerical predictor.

data.frame(x = runif(1e5)) |>
  sim_ord_latent(~x, ~x, beta = b1, zeta = z1, prob0 = rep(1/k, k), data = _, simulate = TRUE, link = "probit") |> 
  mutate(xc = cut(x, seq(0, 1, 0.1), include.lowest = TRUE),
         xc = as.integer(xc)) |> 
  group_by(xc) |> 
  summarise(location = mean(ys),
            scale = sd(ys)) |> 
  pivot_longer(c(location, scale)) |> 
  ggplot(aes(x = xc, y = value)) +
  facet_wrap(~name) +
  geom_line()