Title: | Turn a Regression Model Inside Out |
---|---|
Description: | Turns regression models inside out. Functions decompose variances and coefficients for various regression model types. Functions also visualize regression model objects using techniques developed in Schoon, Melamed, and Breiger (2024) <doi:10.1017/9781108887205>. |
Authors: | David Melamed [aut, cre], Ronald L. Breiger [aut], Eric W. Schoon [aut] |
Maintainer: | David Melamed <[email protected]> |
License: | GPL-2 | GPL-3 |
Version: | 1.1.1 |
Built: | 2024-11-23 14:21:18 UTC |
Source: | https://github.com/dmmelamed/rioplot |
Beckfield (2006) analyzed these data using fixed and random effects regression models. He showed that regional economic and political integration is associated with increased economic inequality. Schoon, Melamed, and Breiger (2024) turned these models inside out and decomposed the model coefficients.
data("Beckfield06")
A data frame with 48 observations on the following 9 variables.
year
a numeric vector
polint
a numeric vector
ecoint
a numeric vector
ecoints
a numeric vector
gdp
a numeric vector
trans
a numeric vector
outflo
a numeric vector
gini
a numeric vector
countryid
a character vector
Beckfield, Jason. 2006. "European integration and income inequality." American Sociological Review 71(6): 964-985.
Schoon, Eric W., David Melamed, and Ronald L. Breiger. 2024. Regression Inside Out. NY: Cambridge University Press.
data(Beckfield06)
head(Beckfield06)
Given two points, the function computes the cosine similarity between them.
cosine(x,y)
x |
Point 1 |
y |
Point 2 |
The cosine similarity, ranging between -1 and +1.
Ronald L. Breiger, David Melamed and Eric Schoon
Schoon, Eric, David Melamed, and Ronald L. Breiger. 2024. Regression Inside Out. NY: Cambridge University Press.
data(Kenworthy99)
m1 <- lm(scale(dv) ~ scale(gdp) + scale(pov) + scale(tran) -1,data=Kenworthy99)
rp1 <- rio.plot(m1,include.int="no",r1=1:15)
cosine(rp1$row.dimensions[15,],rp1$row.dimensions[8,]) # cosine similarity between USA and Ireland
cosine(rp1$row.dimensions[15,],rp1$row.dimensions[14,]) # cosine similarity between USA and United Kingdom
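The quantity returned by cosine(x,y) can be illustrated in base R. The following is a minimal sketch of the standard cosine similarity formula, not the package's source code; the helper name cosine_sketch is hypothetical and the two points are made-up values.

```r
# Hypothetical helper (not part of rioplot): cosine similarity of two points,
# computed as the dot product divided by the product of the vector lengths.
cosine_sketch <- function(x, y) {
  x <- as.numeric(x)
  y <- as.numeric(y)
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

a <- c(1, 0)
b <- c(0, 2)
cosine_sketch(a, b)      # orthogonal points: similarity of 0
cosine_sketch(a, 3 * a)  # same direction: similarity of 1
cosine_sketch(a, -a)     # opposite direction: similarity of -1
```

Because only the angle between the two points matters, rescaling either point leaves the result unchanged.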
This function takes a regression model object and a vector assigning cases to groups (note that a case can be in a group of its own) and computes each case's contribution to the overall regression coefficients.
decompose.model(m1,group.by=group.by,include.int="yes",model.type="OLS")
m1 |
A regression model object. OLS, logistic, Poisson and negative binomial regression are supported. |
group.by |
A vector denoting group membership (the example below passes group labels as character strings). It should contain one element per case. |
include.int |
Whether the regression model included an intercept. Default is "yes." |
model.type |
Type of model to be decomposed. OLS via lm, logistic via glm ("logit"), Poisson via glm ("poisson"), and negative binomial via MASS ("nb") are supported. |
decomp.coef |
Each case's or subset of cases' contribution to the estimated slope or regression coefficient. |
decomp.var |
Each case's or subset of cases' contribution to the variance of the estimated slope or regression coefficient. |
David Melamed, Ronald L. Breiger, and Eric Schoon
Schoon, Eric, David Melamed, and Ronald L. Breiger. 2024. Regression Inside Out. NY: Cambridge University Press.
data(Kenworthy99)
m1 <- lm(scale(dv) ~ scale(gdp) + scale(pov) + scale(tran) -1,data=Kenworthy99)
decompose.model(m1,group.by=c("Liberal","Corp","Liberal",
  "SocDem","SocDem","Corp","Corp","Corp","Corp","Corp","SocDem",
  "SocDem","Liberal","Liberal","Liberal"),include.int="no")
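The idea of a case's contribution to a coefficient can be sketched for OLS in base R. This is an illustration under stated assumptions, not the internals of decompose.model: it relies on the textbook identity b = (X'X)^-1 X'y, uses the built-in mtcars data rather than a package data set, and the object names (A, case_contrib, group_contrib) are hypothetical.

```r
# Standardized outcome and predictors, no intercept, mirroring the example above
# but using built-in data so the sketch runs without the package.
y <- as.numeric(scale(mtcars$mpg))
X <- scale(as.matrix(mtcars[, c("wt", "hp")]))

# b = A %*% y with A = (X'X)^-1 X', so column i of A times y[i]
# is case i's additive share of each coefficient.
A <- solve(crossprod(X)) %*% t(X)   # k x n
case_contrib <- t(A) * y            # n x k; row i equals A[, i] * y[i]

# Aggregate cases into groups (here, number of cylinders) by summing.
group_contrib <- rowsum(case_contrib, group = mtcars$cyl)
group_contrib

# The group contributions add back up to the fitted coefficients.
fit <- lm(y ~ X - 1)
all.equal(unname(colSums(group_contrib)), unname(coef(fit)))
```

Summing within groups is the same aggregation rule the documentation describes for grouped cases; whether the package arrives at the contributions by this exact route is an assumption.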
Subset of data from the 2016 General Social Survey. Data were analyzed in Schoon, Melamed, and Breiger (2024). Full details on the variable selection and source information are available therein.
data("GSS.2016")
A data frame with 2867 observations on the following 27 variables.
sclass
a numeric vector
fulltime
a numeric vector
retired
a numeric vector
hrsworked
a numeric vector
occprestige
a numeric vector
occprestige_partner
a numeric vector
occprestige_mother
a numeric vector
occprestige_father
a numeric vector
children
a numeric vector
age
a numeric vector
educ
a numeric vector
paeduc
a numeric vector
maeduc
a numeric vector
speduc
a numeric vector
babs
a numeric vector
female
a numeric vector
white
a numeric vector
black
a numeric vector
other
a numeric vector
income
a numeric vector
republican
a numeric vector
conservative
a numeric vector
environment
a numeric vector
helpblackpeople
a numeric vector
science
a numeric vector
govequalwealth
a numeric vector
pclass
a numeric vector
Schoon, Eric W., David Melamed, and Ronald L. Breiger. 2024. Regression Inside Out. NY: Cambridge University Press.
data(GSS.2016)
head(GSS.2016)
Subset of the General Social Survey analyzed by Schoon, Melamed, and Breiger (2024). Full details on the variable selection and source information are available therein.
data("GSS2018")
A data frame with 558 observations on the following 7 variables.
dog
a numeric vector
race
a numeric vector
sex
a numeric vector
children
a numeric vector
married
a numeric vector
age
a numeric vector
income
a numeric vector
Schoon, Eric W., David Melamed, and Ronald L. Breiger. 2024. Regression Inside Out. NY: Cambridge University Press.
data(GSS2018)
head(GSS2018)
Data analyzed by Hilbe (2011), and used here to illustrate model visualization and coefficient decomposition for count models.
data("Hilbe")
A data frame with 601 observations on the following 9 variables.
naffairs
a numeric vector
avgmarr
a numeric vector
hapavg
a numeric vector
vryhap
a numeric vector
smerel
a numeric vector
vryrel
a numeric vector
yrsmarr4
a numeric vector
yrsmarr5
a numeric vector
yrsmarr6
a numeric vector
Hilbe, Joseph M. 2011. Negative Binomial Regression. NY: Cambridge University Press.
data(Hilbe)
head(Hilbe)
Data to replicate OLS regression models reported in Kenworthy (1999). Data were analyzed in Schoon, Melamed, and Breiger (2024). Full details on the variable selection and source information are available therein.
data("Kenworthy99")
A data frame with 15 observations on the following 6 variables.
dv
a numeric vector
gdp
a numeric vector
pov
a numeric vector
tran
a numeric vector
ISO3
a character vector
nation.long
a character vector
Kenworthy, Lane. 1999. "Do social-welfare policies reduce poverty? A cross-national assessment." Social Forces 77(3): 1119-1139.
Schoon, Eric W., David Melamed, and Ronald L. Breiger. 2024. Regression Inside Out. NY: Cambridge University Press.
data(Kenworthy99)
head(Kenworthy99)
Given two points, p1 and p2, this function identifies the point at which p1 is projected onto the line connecting p2 and the origin (0,0). The projection occurs at a right angle.
project.point(p1,p2)
p1 |
First point, the one that is to be projected onto the line through point 2 and the origin. |
p2 |
Second point, which together with the origin defines the line onto which p1 is projected. This is the outcome or dependent variable in our book. See reference below. |
The output is a single point. It serves as the point to which lines are drawn in many graphs.
Two values which correspond to the x and y co-ordinates in the graph.
David Melamed, Ronald L. Breiger, and Eric Schoon
Schoon, Eric, David Melamed, and Ronald L. Breiger. 2024. Regression Inside Out. NY: Cambridge University Press.
data(Kenworthy99)
m1 <- lm(scale(dv) ~ scale(gdp) + scale(pov) + scale(tran) -1,data=Kenworthy99)
rp1 <- rio.plot(m1,include.int="no",r1=1:15)
project.point(as.numeric(rp1$col.dimensions[1,]),as.numeric(rp1$row.dimensions[1,]))
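The right-angle projection described for project.point(p1,p2) can be written out in a few lines of base R. This is a sketch of the standard orthogonal projection formula, assuming two-dimensional points; the helper name project_sketch is hypothetical and the code is not taken from the package.

```r
# Hypothetical helper (not part of rioplot): project p1 onto the line
# that runs through p2 and the origin (0,0).
project_sketch <- function(p1, p2) {
  p1 <- as.numeric(p1)
  p2 <- as.numeric(p2)
  # Scale p2 by the ratio of the dot products <p1,p2> / <p2,p2>.
  (sum(p1 * p2) / sum(p2 * p2)) * p2
}

p1 <- c(2, 2)
p2 <- c(3, 0)                   # the line through p2 and the origin is the x-axis
p  <- project_sketch(p1, p2)
p                               # x and y co-ordinates of the projected point

# The segment from p1 to its projection meets the line at a right angle,
# so its dot product with p2 is zero.
sum((p1 - p) * p2)
```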
Subset of replication data from Ragin and Fiss (2017). Data were analyzed in Schoon, Melamed, and Breiger (2024). Full details on the variable selection and source information are available therein.
data("RaginData")
A data frame with 4185 observations on the following 10 variables.
incrat
a numeric
pinc
a numeric
ped
a numeric
resp_ed
a numeric
afqt
a numeric
kids
a numeric
married
a numeric
black
a numeric
male
a numeric
povd
a numeric
Ragin, Charles C., and Peer C. Fiss. 2017. Intersectional Inequality: Race, Class, Test Scores, and Poverty. Chicago, IL: University of Chicago Press.
Schoon, Eric W., David Melamed, and Ronald L. Breiger. 2024. Regression Inside Out. NY: Cambridge University Press.
data(RaginData)
head(RaginData)
rio.plot is used to generate a reduced rank image of a regression model. The function computes row and column dimensions for both cases and variables, and generates an image of the model based on those scores.
rio.plot(m1,exclude.vars="no",r1="none",case.names="",col.names="no", h.just=-.2,v.just=0,case.col="blue",var.name.col="black", include.int="yes",group.cases=1,model.type="OLS")
m1 |
a regression model object. Supported models include OLS, Logistic, Poisson, and Negative Binomial Regression. |
exclude.vars |
an optional numerical vector indicating variables from the model to exclude from the plot of the model. |
r1 |
an optional numerical vector indicating cases to include in the plot. By default, all cases are excluded from the plot. |
case.names |
a character vector of names to label the cases. Should be the same length as 'r1.' |
col.names |
whether to include the variable names in the plot. Default is "no" |
h.just |
horizontal justification in the plot. Default is -.2 |
v.just |
vertical justification in the plot. Default is 0 |
case.col |
if cases are added to the plot, this is their color. Default is "blue" |
var.name.col |
Color of the names of variables in the plot. Default is "black" |
include.int |
Whether the underlying model included a model intercept. Default is "yes" |
group.cases |
Whether to aggregate cases into clusters or subsets. To aggregate, provide a vector of group memberships with one element per case (the example below passes a character vector). Cases are aggregated within groups by summing. |
model.type |
The type of regression model. OLS is supported via the lm function. Logistic and Poisson regression are supported via the glm function. Negative Binomial regression is supported via the MASS package. Default is "OLS." For logistic regression, use "logit." For Poisson regression, use "poisson." For negative binomial regression, use "nb." |
The function takes a regression model object (OLS, logistic, Poisson, or negative binomial) and computes the corresponding row (case) and column (variable) scores. The scores are part of the output, as is a ggplot object of the model.
rio.plot returns several objects.
p1 |
a ggplot object of the model space, given the terms in the function |
row.dimensions |
the scores assigned to each case, or each subset of cases if they were aggregated using the 'group.cases' option. These are the co-ordinates in the plot. |
col.dimensions |
the scores assigned to each variable. These are the co-ordinates in the plot. |
case.variances |
each case's contribution (or each subset's contribution) to the variance of the estimated regression coefficient |
U |
The orthogonalized column space matrix from the Singular Value Decomposition of the predictor matrix and fitted values. |
UUt |
The orthogonalized column space matrix from the Singular Value Decomposition of the predictor matrix and fitted values, post-multiplied by its transpose. |
David Melamed, Ronald L. Breiger, and Eric Schoon
Schoon, Eric, David Melamed, and Ronald L. Breiger. 2024. Regression Inside Out. NY: Cambridge University Press.
data(Kenworthy99)
m1 <- lm(scale(dv) ~ scale(gdp) + scale(pov) + scale(tran) -1,data=Kenworthy99)
rp1 <- rio.plot(m1,include.int="no")
names(rp1)
rp1$gg.obj
# rp1$gg.obj + ggplot2::scale_x_continuous(limits=c(-.55,1)) # useful option
rp2 <- rio.plot(m1,r1=1:15,case.names=paste(1:15),include.int="no")
rp2$gg.obj
Kenworthy99 <- data.frame(Kenworthy99,type=c("Liberal","Corp","Liberal",
  "SocDem","SocDem","Corp","Corp","Corp","Corp","Corp","SocDem","SocDem",
  "Liberal","Liberal","Liberal"))
rp3 <- rio.plot(m1,r1=1:15,group.cases=Kenworthy99$type,include.int="no")
rp3$gg.obj
# rp3$gg.obj + ggplot2::scale_x_continuous(limits=c(-1,20))
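The U and UUt values returned by rio.plot come from a Singular Value Decomposition. The base R sketch below illustrates the general idea under a simplifying assumption: the SVD is taken of the predictor matrix alone (the documentation above describes the predictor matrix together with the fitted values), and the built-in mtcars data stand in for a package data set. The object names are hypothetical, and this is not the package's implementation.

```r
# Standardized outcome and predictors, no intercept, as in the examples above.
y <- as.numeric(scale(mtcars$mpg))
X <- scale(as.matrix(mtcars[, c("wt", "hp")]))

s   <- svd(X)          # X = U D V'
U   <- s$u             # n x k orthonormal basis for the column space of X
UUt <- U %*% t(U)      # n x n projection onto that column space

fit <- lm(y ~ X - 1)

# Multiplying y by U U' reproduces the OLS fitted values.
all.equal(as.numeric(UUt %*% y), unname(fitted(fit)))

# The coefficients can be recovered from the same pieces: b = V D^-1 U'y.
b_svd <- s$v %*% ((t(U) %*% y) / s$d)
all.equal(as.numeric(b_svd), unname(coef(fit)))
```

Because the rows of U index cases and the rows of V index variables, the same decomposition yields scores for both, which is the sense in which the model is turned inside out.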
Subset of replication data from Schneider and Makszin (2014). Data were analyzed in Schoon, Melamed, and Breiger (2024). Full details on the variable selection and source information are available therein.
data("SchneiderAndMakszin06")
A data frame with 30 observations on the following 36 variables.
id
a character vector
country
a character vector
year
a numeric vector
fde
a numeric vector
fde_cilb
a numeric vector
fde_ciub
a numeric vector
wcoord
a numeric vector
govint
a numeric vector
ud
a numeric vector
epl
a numeric vector
socexp
a numeric vector
eduexp
a numeric vector
vet_un
a numeric vector
lmexp
a numeric vector
wagecov
a numeric vector
vet_isced3
a numeric vector
eduexp_pri
a numeric vector
edu_terenr
a numeric vector
vt_reg
a numeric vector
vt_vap
a numeric vector
compvote
a numeric vector
fde2
a numeric vector
low_fde_l
a numeric vector
high_fde_l
a numeric vector
high_wc_l
a numeric vector
high_int_l
a numeric vector
high_ud_l
a numeric vector
high_epl_l
a numeric vector
high_socx_l
a numeric vector
high_edux_l
a numeric vector
high_lmx_l
a numeric vector
high_vet_l
a numeric vector
p1_y
a numeric vector
p2_y
a numeric vector
p3_y
a numeric vector
sol_y
a numeric vector
Schneider, Carsten Q., and Kristin Makszin. 2014. "Forms of welfare capitalism and education-based participatory inequality." Socio-Economic Review 12(2): 437-462.
Schoon, Eric W., David Melamed, and Ronald L. Breiger. 2024. Regression Inside Out. NY: Cambridge University Press.
data(SchneiderAndMakszin06)
head(SchneiderAndMakszin06)
Subset of replication data from Wimmer, Cederman, and Min (2009). Data were analyzed in Schoon, Melamed, and Breiger (2024). Full details on the variable selection and source information are available therein.
data("Wimmer_et_al_EPR")
A data frame with 7908 observations on the following 80 variables.
yearc
a numeric
year
a numeric
cowcode
a numeric
country
a character
gdpcap
a numeric
gdpcapl
a numeric
oilpc
a numeric
oilpcl
a numeric
popavg
a numeric
lpopl
a numeric
ethfrac
a numeric
western
a numeric
eeurop
a numeric
lamerica
a numeric
ssafrica
a numeric
asia
a numeric
nafrme
a numeric
lmtnest
a numeric
polity2
a numeric
polity
a numeric
anoc
a numeric
anocl
a numeric
democ
a numeric
democl
a numeric
regchg3
a numeric
pimppast
a numeric
groups
a numeric
egipgrps
a numeric
exclgrps
a numeric
exclpop
a numeric
lrexclpop
a numeric
ttlpop
a numeric
discpop
a numeric
pwrlpop
a numeric
olppop
a numeric
olpspop
a numeric
jppop
a numeric
sppop
a numeric
dompop
a numeric
monpop
a numeric
maxexclpop
a numeric
maxegippop
a numeric
maxpop
a numeric
newonset
a numeric
newethonset
a numeric
newhionset
a numeric
newethhionset
a numeric
onsetstatus
a numeric
onsetstatus2
a numeric
actoraim
a numeric
actoraim2
a numeric
ongoingwarl
a numeric
ongoinghiwarl
a numeric
newonset2
a numeric
newhionset2
a numeric
newethonset2
a numeric
warlfl
a numeric
onsetfl
a numeric
ethonsetfl
a numeric
onsetfl2
a numeric
ethonsetfl2
a numeric
warstns2
a numeric
warstns1
a numeric
atwarnsl
a numeric
npeaceyears
a numeric
nspline1
a numeric
nspline2
a numeric
nspline3
a numeric
hpeaceyears
a numeric
hspline1
a numeric
hspline2
a numeric
hspline3
a numeric
fpeaceyears
a numeric
fspline1
a numeric
fspline2
a numeric
fspline3
a numeric
speaceyears
a numeric
sspline1
a numeric
sspline2
a numeric
sspline3
a numeric
Wimmer, Andreas, Lars-Erik Cederman, and Brian Min. 2009. "Ethnic politics and armed conflict: A configurational analysis of a new global data set." American Sociological Review 74(2): 316-337.
data(Wimmer_et_al_EPR)
head(Wimmer_et_al_EPR)