Introduction:
High school and college students are the primary demographic
that credit card companies appeal to, both in the United States and across the
globe. Because this generation is newly concerned with credit scores and depends
on them for buying power, I consider it important to understand how people use
credit cards and what characteristics card users tend to share. As
such, I decided to do an exploratory analysis of the Credit_Card data set
from the https://vincentarelbundock.github.io/Rdatasets/datasets.html website, which was provided earlier in the
course. This resource is very helpful for finding unique and viable data sets
for exploring relationships in raw data.
The primary purpose of using this data is to uncover trends and relationships between card holders' characteristics and their spending habits. In an age with abundant places to spend money and abundant people using credit, a better understanding of these relationships has both statistical and practical value.
Setting up the data:
The first step in this analysis was cleaning the data. Getting rid of NA values
and ensuring the information was clearly presented meant that I could work with
the data efficiently and accurately when creating the
visualizations.
These steps in R allowed me to eliminate NA values and rescale
the figures to units that saved me time and energy when interpreting them later. In
addition, they gave me an opportunity to explore the data and make observations
about the variables and their summary statistics (max/min, mean, etc.).
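A minimal sketch of what a cleaning step like this might look like in R. The data frame below is a small made-up stand-in, and the column names (age, income, expenditure) and the income units are assumptions for illustration rather than values taken from the actual file.

```r
# Toy stand-in for the credit card data. Values, column names, and the
# income units are assumptions for illustration only.
cards <- data.frame(
  age         = c(37.7, 33.3, NA, 30.5, 32.2),
  income      = c(4.52, 2.42, 4.50, NA, 9.79),   # assumed units: USD 10,000
  expenditure = c(124.98, 9.85, 15.00, 137.87, 546.50)
)

# Drop every row that contains at least one NA.
cards_clean <- na.omit(cards)

# Rescale income to plain dollars so later plots are easier to read.
cards_clean$income_usd <- cards_clean$income * 10000

# Min, max, mean, and quartiles for each variable.
summary(cards_clean)
```

na.omit() removes whole rows, so it is worth comparing nrow(cards) with nrow(cards_clean) to see how much data the cleaning step discards.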
Visualizations:
One of my first steps when starting an analysis is always to
understand the distributions and recognize patterns between variables. Determining
which variables offer the most insight and getting a grasp on which relationships
could pose interesting theoretical questions were among my chief concerns.
I decided to view the distributions of age and income first, as
they most likely drive the most important aspects of the data.
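As a rough sketch, side-by-side histograms of this kind can be drawn with base R graphics. The data here are randomly generated placeholders, and the column names age and income are assumed, so the shapes will not match the real data set.

```r
# Randomly generated placeholder data; not the real credit card data.
set.seed(1)
cards <- data.frame(
  age    = runif(200, min = 18, max = 70),
  income = rlnorm(200, meanlog = 1, sdlog = 0.5)
)

# Draw the two histograms next to each other, then restore the layout.
old_par <- par(mfrow = c(1, 2))
hist(cards$age, breaks = 20, main = "Distribution of age", xlab = "Age (years)")
hist(cards$income, breaks = 20, main = "Distribution of income", xlab = "Income")
par(old_par)
```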
In addition, I wanted to better understand the expenditures
listed in the data set:
These visualizations offered insight into the
characteristics of the different variables. For example, the Pareto-like shape of the
expenditure histogram indicates that card holders are far more frequent at the low end of spending.
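To illustrate what a Pareto-like, right-skewed histogram looks like, here is a small sketch with made-up expenditure values (an assumption, not figures from the data set). A mean well above the median is a quick numerical sign of the same long right tail.

```r
# Made-up expenditure values: many small amounts, a few very large ones.
expenditure <- c(0, 0, 0, 0, 0, 10, 20, 25, 40, 60, 90, 150, 300, 800)

# Most observations land in the first bar; the tail stretches to the right.
hist(expenditure, breaks = 10,
     main = "Distribution of expenditure", xlab = "Expenditure")

# In a right-skewed distribution the mean is pulled above the median.
mean(expenditure)
median(expenditure)
```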
For my other visualizations, I opted to use comparative scatter and box plots to
illustrate correlations between different variables.
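A sketch of how such comparative plots might be produced in base R. The data are simulated, and the grouping variable owner (home owner, yes or no) is an assumed column used here only to give the box plot something to compare.

```r
# Simulated placeholder data; the relationship is built in by construction.
set.seed(42)
n <- 100
cards <- data.frame(
  income = runif(n, min = 1, max = 10),
  owner  = factor(sample(c("no", "yes"), n, replace = TRUE),
                  levels = c("no", "yes"))
)
cards$expenditure <- 50 * cards$income + rnorm(n, sd = 40)

# Scatter plot: one numeric variable against another.
plot(expenditure ~ income, data = cards,
     main = "Expenditure vs. income", xlab = "Income", ylab = "Expenditure")

# Box plot: a numeric variable split by a categorical one.
boxplot(expenditure ~ owner, data = cards,
        main = "Expenditure by home ownership",
        xlab = "Home owner", ylab = "Expenditure")
```

Scatter plots suit pairs of numeric variables, while box plots make it easier to compare a numeric variable across groups.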
The resulting table suggests a clear correlation between income and expenditure. The algorithm's predictions were accurate enough to suggest that this relationship is a worthwhile insight.
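The post does not show which algorithm produced this table, so the following is only one simple way to check a relationship like this: a Pearson correlation plus a linear regression, run here on simulated data rather than the real data set. The variable names are assumptions.

```r
# Simulated data with a built-in positive relationship (illustration only).
set.seed(7)
income      <- runif(100, min = 1, max = 10)
expenditure <- 50 * income + rnorm(100, sd = 40)

# Pearson correlation between the two variables.
r <- cor(income, expenditure)

# Simple linear regression of expenditure on income.
fit <- lm(expenditure ~ income)

# Coefficient table: estimates, standard errors, t values, p values.
summary(fit)$coefficients
```

A correlation or a good fit describes association only; it does not by itself show that higher income causes higher spending.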








