Exploring Relationships with Machine Learning

Research Questions:

Do couples that meet on dating apps have higher or lower quality relationships?
Can any features in this dataset help predict how a subject would rate their relationship quality?
What insights can I derive from using machine learning for exploratory analysis?

Machine Learning Analysis

I analyzed this dataset by creating four notebooks where I conducted Exploratory Data Analysis, Classification, Regression, and Principal Component Analysis.

The visualizations below summarize the results from those four notebooks:

This bar chart shows the pairs of numerical features that had the strongest correlations with each other, based on the correlation matrix computed in the EDA notebook.

This bar chart shows the numerical features that had the strongest correlations with relationship quality ratings, based on the correlation matrix computed in the EDA notebook.
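A minimal sketch, in plain Python, of how rankings like these two can be produced: compute a Pearson correlation for every pair of columns (or for every column against the target) and sort by absolute value. The values below are made up for illustration, only the column names are borrowed from the dataset, and the EDA notebook may well have used pandas' `DataFrame.corr` instead.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def top_pairs(data, k=3):
    """Every pair of columns, ranked by |r| (strongest first)."""
    pairs = [(a, b, pearson(data[a], data[b])) for a, b in combinations(data, 2)]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)[:k]


def top_with_target(data, target, k=3):
    """Every non-target column, ranked by |r| with the target column."""
    scores = [(c, pearson(data[c], data[target])) for c in data if c != target]
    return sorted(scores, key=lambda s: abs(s[1]), reverse=True)[:k]


# Made-up toy values; only the column names are borrowed from the dataset.
data = {
    "subjectAge": [25, 31, 40, 52, 60],
    "partnerAge": [27, 30, 43, 50, 64],
    "householdMinor_num": [2, 1, 2, 0, 0],
    "rQual": [3, 4, 4, 5, 5],
}

for a, b, r in top_pairs(data):
    print(f"{a} ~ {b}: r = {r:.2f}")
for col, r in top_with_target(data, "rQual"):
    print(f"{col} ~ rQual: r = {r:.2f}")
```

Sorting on the absolute value keeps strong negative correlations (such as `householdMinor_num` in the toy data) in the ranking instead of pushing them to the bottom.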

This bar chart shows the features remaining in the Logistic Regression model after using backward elimination to remove statistically insignificant features (those with p-values above the 0.05 significance level). The specific steps I took can be found in the classification notebook.
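A minimal sketch of backward elimination for a logistic regression, using only NumPy. It assumes the common variant that drops the single feature with the largest p-value on each pass and refits; the p-values are Wald z-tests from a Newton-Raphson fit, the same kind of test statsmodels' `Logit` summary reports. The data is synthetic and the function names are hypothetical, not the notebook's code.

```python
import math

import numpy as np


def logit_fit(X, y, max_iter=50, tol=1e-10):
    """Logistic regression by Newton-Raphson. X must include an intercept column.

    Returns the coefficients and their two-sided Wald p-values.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # Fisher information matrix
        step = np.linalg.solve(info, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    z = beta / np.sqrt(np.diag(cov))
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    pvals = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])
    return beta, pvals


def backward_eliminate(X, y, names, alpha=0.05):
    """Drop the least significant feature and refit until every p-value is <= alpha."""
    keep = list(range(X.shape[1]))
    while keep:
        design = np.column_stack([np.ones(len(y)), X[:, keep]])
        _, pvals = logit_fit(design, y)
        feature_p = pvals[1:]  # skip the intercept
        worst = int(np.argmax(feature_p))
        if feature_p[worst] <= alpha:
            break
        keep.pop(worst)
    return [names[i] for i in keep]


# Synthetic data: x1 and x2 drive the outcome, x3 is pure noise.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
prob = 1.0 / (1.0 + np.exp(-(1.5 * X[:, 0] - 1.0 * X[:, 1])))
y = (rng.random(n) < prob).astype(float)
print(backward_eliminate(X, y, ["x1", "x2", "x3"]))
```

The noise feature `x3` is usually, but not always, dropped: at a 0.05 threshold a pure-noise feature still passes about 5% of the time, which is one reason a feature that is significant in a single model deserves a cross-check.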

This bar chart shows the features remaining in my Linear Regression model after using backward elimination to remove all statistically insignificant features (those with p-values above the 0.05 significance level). The specific steps I took can be found in the regression notebook.
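The same elimination loop for ordinary least squares can be sketched with NumPy. Two assumptions to flag: the p-values use a normal approximation to the t-test (reasonable for large samples; statsmodels' `OLS` uses the exact t distribution), and the function names are hypothetical. To make the outcome predictable, the example builds `x3` to be exactly orthogonal to the intercept, `x1`, `x2` and `y`, so its fitted coefficient is zero and it has to be eliminated.

```python
import math

import numpy as np


def ols_pvalues(X, y):
    """Two-sided OLS p-values. X must include an intercept column."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)  # unbiased residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    t = beta / se
    # Large-sample shortcut: compare t against the standard normal.
    return np.array([math.erfc(abs(v) / math.sqrt(2)) for v in t])


def backward_eliminate_ols(X, y, names, alpha=0.05):
    """Remove the feature with the largest p-value and refit until all are <= alpha."""
    keep = list(range(X.shape[1]))
    while keep:
        design = np.column_stack([np.ones(len(y)), X[:, keep]])
        feature_p = ols_pvalues(design, y)[1:]  # skip the intercept
        worst = int(np.argmax(feature_p))
        if feature_p[worst] <= alpha:
            break
        keep.pop(worst)
    return [names[i] for i in keep]


rng = np.random.default_rng(1)
n = 500
x1, x2 = rng.normal(size=(2, n))
y = 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)
# Project the intercept, x1, x2 and y out of random noise, so x3 is orthogonal to all of them.
basis = np.column_stack([np.ones(n), x1, x2, y])
noise = rng.normal(size=n)
x3 = noise - basis @ np.linalg.lstsq(basis, noise, rcond=None)[0]
X = np.column_stack([x1, x2, x3])
print(backward_eliminate_ols(X, y, ["x1", "x2", "x3"]))  # ['x1', 'x2']
```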

Top Correlations between Numeric Features

Top Correlations between Numeric Features & Relationship Quality

Classification Feature Coefficients

Regression Feature Coefficients

All the correlations between the feature pairs above had p-values reported as 0 (far below 0.05), so they are statistically significant. Correlations with an absolute value of 0.3 or below are weak.
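Why a p-value can show up as 0: with many observations, even a modest correlation produces a p-value far smaller than any display precision. The sketch below computes the usual t-statistic for a Pearson r, t = r * sqrt((n - 2) / (1 - r^2)), and converts it to a two-sided p-value with a normal approximation (reasonable at large n). The sample size of 3,000 is hypothetical, not this dataset's; the 0.3 cutoff is the one used in this write-up.

```python
import math


def pearson_pvalue(r, n):
    """Two-sided p-value for Pearson r from n pairs (normal approximation to the t-test)."""
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return math.erfc(abs(t) / math.sqrt(2))


def strength(r, weak_cutoff=0.3):
    """Label |r| using the write-up's rule of thumb: 0.3 or below is weak."""
    return "weak" if abs(r) <= weak_cutoff else "not weak"


n = 3000  # hypothetical sample size
for r in (0.5, 0.1):
    p = pearson_pvalue(r, n)
    print(f"r = {r}: p = {p:.4f} ({p:.1e} unrounded), {strength(r)}")
```

Both lines print p = 0.0000 even though neither p-value is exactly zero, and the second correlation is significant yet weak: statistical significance and strength are separate questions.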

Interesting Findings:

Political Views: Subjects were more likely to be with partners that had similar political views.
Education: Subjects were more likely to be with partners that had similar education levels.

Because a subject's education level is correlated with their mother's, the couple's mothers were also likely to have similar education levels to each other.

Age: The older a subject was when the couple first met, the larger their age gap tended to be and the less time it took them to become a couple.

All of the top correlations with rQual shown above had p-values below 0.05, so they are statistically significant. However, correlations with an absolute value of 0.3 or below are weak, so all of these correlations with relationship quality are very weak.

Interesting Findings:

Subjects were more likely to rate their relationships as "good" if the couple had:
- Higher household income
- Higher education levels
- Older age
Subjects were less likely to rate their relationships as "good" if the couple had:
- More household members below the age of 18

Interesting Findings:

Subjects were more likely to rate their relationships as "good" if the couple had:
- Higher household income
- Higher education levels
- Older age
- Sex frequency of once a week or more
Subjects were less likely to rate their relationships as "good" if the couple had:
- Sex frequency of once a month or less

Note! Any racial correlations here should be taken with a grain of salt. Most subjects and their partners in this dataset identified as white, so we would need much more data to say anything about how race relates to relationships. Even with more data, racial issues can be very complex, as they may be tied to many other correlated factors such as educational and economic opportunities.

Interesting Findings:

Subjects were more likely to rate their relationships as "good" if the couple had:
- Higher household income
- About the same income earnings as each other
- Older age
- Sex frequency of once a week or more
- Met in school
- Met as "work neighbors"
Subjects were less likely to rate their relationships as "good" if the couple had:
- Sex frequency of once a month or less


Components 0 & 1 - Features

The value of each feature here shows how much, and in which direction, it contributes to that principal component.

The grey rectangles have values close to 0, so those features contribute little to that component. (This doesn't necessarily mean the feature is insignificant; it just means the variance captured by this specific component has little to do with that feature.)

A yellow feature has a positive value: if the component were represented as an axis, moving in the positive direction along it would correspond to more of that feature.

For example, Component 0 has the feature "householdMinor_num" colored in yellow, so as you move forward along this axis, subjects tend to have more children at home.

A black feature has a negative value: if the component were represented as an axis, moving in the positive direction along it would correspond to less of that feature.

For example, Component 0 has the feature "partnerAge" colored in black, so as you move forward along this axis, subjects tend to have younger partners.
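A minimal NumPy sketch of where these colors come from, assuming the features were standardized and the heatmap plots the component weights (what scikit-learn's `PCA` stores in `components_`). The 0.2 "near zero" cutoff and the feature names are hypothetical. One caveat: the sign of a principal component is arbitrary, so flipping an entire component swaps yellow and black without changing what it means.

```python
import numpy as np


def component_weights(X, n_components=2):
    """PCA on standardized columns; row i holds each feature's weight in component i."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal directions, ordered by variance explained.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Vt[:n_components]


def describe(weights, names, near_zero=0.2):
    """Group features the way the heatmap colors them: positive, negative, or near zero (grey)."""
    groups = {"positive": [], "negative": [], "near zero": []}
    for name, w in zip(names, weights):
        if abs(w) < near_zero:
            groups["near zero"].append(name)
        elif w > 0:
            groups["positive"].append(name)
        else:
            groups["negative"].append(name)
    return groups


# Synthetic data: feat_a and feat_b move together, feat_c is unrelated noise.
rng = np.random.default_rng(0)
n = 5000
t = rng.normal(size=n)
X = np.column_stack([t, t + 0.1 * rng.normal(size=n), rng.normal(size=n)])
names = ["feat_a", "feat_b", "feat_c"]
W = component_weights(X)
print(describe(W[0], names))   # feat_a and feat_b share a sign; feat_c is near zero
print(describe(-W[0], names))  # the same component with its sign flipped
```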

Below is my interpretation of the two components.

Component 0

Positive

Number of minors in household (strong)
Partner's mother education (strong)

Negative

Partner's age (strong)

Component 1

Positive

Household income (strong)
Partner's mother education (strong)

Negative

Number of minors in household (moderate)

Relationship Quality: Good vs Not Good

Interesting Findings:

A subject was more likely to rate their relationship as "good" if the couple had:
- Higher household income
- More education
- Older age
A subject was less likely to rate their relationship as "good" if the couple had:
- More household members below the age of 18


Conclusions

Do couples that meet on dating apps have higher or lower quality relationships?

In all the models built, meeting on a dating app showed no statistically significant relationship with rated relationship quality.

What insights can I derive from using machine learning for exploratory analysis? Can any features in this dataset help predict how a subject would rate their relationship quality?

Some interesting findings are that subjects and their partners tend to have about the same education levels and political views. Also, the older a subject was when the couple first met, the larger their age gap tended to be and the less time it took them to become a couple.

As for the best features to predict relationship quality, a feature may show up as significant in one model but not in others. For exploratory analysis, a feature that appears as significant in multiple models is likely worth further investigation.

In the table below, I grouped similar features into general concepts. For example, "subjectAge" and "partnerAge" both relate to age and are correlated, so for simplicity I grouped them together as "Age."

With this table we can see which general concepts popped up as significant in which models.

	Classification	Regression	Unsupervised	Total
# of Household Minors		✅	✅	2
Age	✅		✅	2
Earned about the same		✅		1
Education	✅		✅	2
Household Income	✅	✅	✅	3
Living Together	✅	✅		2
Met as Coworkers / Work Neighbors	✅	✅		2
Met in School		✅		1
Race		✅		1
Sex Frequency	✅	✅		2

Apart from meeting as coworkers or work neighbors (discussed below), the general concepts that appeared in more than one model were number of household minors, age, education, income, living together, and sex frequency. Fewer household minors, and more of any of the other five, seem to be associated with a higher likelihood that a subject would rate their relationship as "good."
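The grouping and the "more than one model" filter can be sketched in a few lines of Python. The concept-to-model sets are transcribed from the table above; the feature-to-concept map is hypothetical and partial, using only the feature names that appear in this write-up.

```python
# Concept -> models in which it appeared as significant (transcribed from the table above).
SIGNIFICANT_IN = {
    "# of Household Minors": {"Regression", "Unsupervised"},
    "Age": {"Classification", "Unsupervised"},
    "Earned about the same": {"Regression"},
    "Education": {"Classification", "Unsupervised"},
    "Household Income": {"Classification", "Regression", "Unsupervised"},
    "Living Together": {"Classification", "Regression"},
    "Met as Coworkers / Work Neighbors": {"Classification", "Regression"},
    "Met in School": {"Regression"},
    "Race": {"Regression"},
    "Sex Frequency": {"Classification", "Regression"},
}

# Hypothetical, partial feature -> concept map (only names mentioned in the write-up).
CONCEPT_OF = {
    "subjectAge": "Age",
    "partnerAge": "Age",
    "householdMinor_num": "# of Household Minors",
    "metAs_coworkers": "Met as Coworkers / Work Neighbors",
    "metAs_workNeighbors": "Met as Coworkers / Work Neighbors",
}


def to_concepts(features):
    """Collapse feature names into concepts; unmapped names are kept as-is."""
    return {CONCEPT_OF.get(f, f) for f in features}


def recurring(table, min_models=2):
    """Concepts significant in at least `min_models` models, sorted by name."""
    return sorted(c for c, models in table.items() if len(models) >= min_models)


totals = {concept: len(models) for concept, models in SIGNIFICANT_IN.items()}
print(totals["Household Income"])  # 3
print(recurring(SIGNIFICANT_IN))
print(to_concepts(["subjectAge", "partnerAge", "metAs_coworkers"]))
```

The filter returns seven concepts: the six listed above plus meeting as coworkers or work neighbors.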

Meeting as workmates also seems to be associated with relationship quality, but the results conflict. "metAs_coworkers" was associated with a lower likelihood of "good" relationship quality, while "metAs_workNeighbors" was associated with a higher likelihood. When I consulted the codebook, it was unclear what the distinction between "coworkers" and "work neighbors" was, if any.

The relationship between these seven concepts and relationship quality might be worth investigating further with future studies and more relationship data.

As a quick reminder, correlation does not imply causation. This analysis uses just one dataset of couples, and relationships in the real world might be very different. A couple's ages, races, education levels, or incomes neither doom their relationship to fail nor destine it to succeed.

"I've learned that you can't predict [love] or plan for it. For someone like me who is obsessed with organization and planning, I love the idea that love is the one exception to that. Love is the one wild card." - Taylor Swift