Thursday, April 25, 2024

Visual Analytics Final Project

 

Introduction:

High school and college students are the primary demographic that credit card companies appeal to, both in the United States and across the globe. Because credit scores have become a growing concern for this generation and are necessary for buying power, I consider it important to understand how people use credit cards and what characteristics card users tend to display. As such, I have decided to do an exploratory analysis of the Credit_Card data set from the https://vincentarelbundock.github.io/Rdatasets/datasets.html website, which was provided earlier in the course. This resource is very helpful for finding unique and viable data sets for exploring relationships in raw data. The primary purpose of using this data is to uncover trends and relationships between user characteristics and spending habits. In an age where there are abundant places to spend money and many people using credit, developing a better understanding of these relationships is an important statistical and practical use of data.

Setting up the data:

The first step in this analysis was cleaning the data. Removing NA values and making sure each variable was properly represented meant that I could present the data efficiently and accurately when creating the visualizations.


These steps in R allowed me to eliminate NA values and scale the figures to a ratio that saved me time and energy when making sense of them later. In addition, they gave me an opportunity to explore the data and make observations about the variables and their qualities (max/min, mean, etc.).
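
To make these cleaning steps concrete, here is a minimal sketch of the same idea: drop rows with missing values, rescale one column, then summarize it. It is written in Python with only the standard library and is not the R code used for this project. The sample rows, the column names (age, income, expenditure), and the scaling factor of 10,000 are all assumptions made for illustration.

```python
import csv
import io
import statistics

# Hypothetical sample rows, invented for illustration (not the real data).
RAW = """age,income,expenditure
30,4.0,100
25,2.0,NA
40,2.5,50
NA,3.0,75
22,,60
"""

MISSING = {"", "NA"}  # markers treated as missing values


def load_clean(text):
    """Parse CSV text and keep only rows with no missing values."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        if any(value.strip() in MISSING for value in record.values()):
            continue  # drop incomplete rows, similar in spirit to R's na.omit()
        rows.append({key: float(value) for key, value in record.items()})
    return rows


def rescale(rows, column, factor):
    """Return copies of the rows with one column multiplied by factor."""
    return [{**row, column: row[column] * factor} for row in rows]


def summarize(rows, column):
    """Return the min, max and mean of one column."""
    values = [row[column] for row in rows]
    return {"min": min(values), "max": max(values), "mean": statistics.mean(values)}


clean = load_clean(RAW)
scaled = rescale(clean, "income", 10_000)  # assumed unit conversion
print(len(clean), summarize(scaled, "income"))
```

Dropping incomplete rows is the simplest approach, but it discards any row with even one missing field, so it is worth checking how many rows survive before moving on.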

Visualizations:

One of my first steps when starting an analysis is always to understand the distributions and recognize patterns between variables. My chief concerns were determining which variables offer the most insight and getting a grasp on where relationships could pose interesting theoretical questions.

I decided to view the distributions of age and income first, as they are likely among the most important variables in the data.





In addition, I wanted to better understand the expenditures listed in the data set:  

These visualizations offered insight into the characteristics of the different variables. For example, the Pareto-like curve of the expenditure histogram indicates that card holders occur most frequently at the lowest expenditure levels, with the frequency falling off as expenditure rises.
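
As a rough illustration of how a histogram produces that kind of shape, the sketch below counts values into equal-width bins and prints a text bar for each one. It is written in Python with the standard library rather than R, and the expenditure values and the bin width of 100 are invented for the example.

```python
from collections import Counter


def histogram(values, bin_width):
    """Count values in equal-width bins, keyed by each bin's lower edge."""
    counts = Counter((value // bin_width) * bin_width for value in values)
    return dict(sorted(counts.items()))


# Hypothetical expenditures, invented to show a right-skewed shape.
expenditure = [0, 5, 12, 20, 35, 48, 60, 95, 110, 180, 240, 410]

bins = histogram(expenditure, 100)
for lower, count in bins.items():
    # One '#' per observation in the bin; empty bins are simply not listed.
    print(f"{lower:>4}-{lower + 100:<4} {'#' * count}")
```

Most of the sample falls in the first bin and the counts shrink quickly after that, which is the long-tailed pattern described above.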



For my other visualizations, I opted to use comparative scatter and box plots to illustrate correlations between different variables. 

This visualization provides an important perspective on the data: by using ggplot2's 'fill' feature, we are able to identify another important factor in the expenditure of card holders. Some data points depict users of the credit card who are not the owners, e.g. parents and relatives who set up a card in their name for third-party use. In addition, the scatter plots provided valuable visual comparisons for understanding the grouping and density present in the data.
The visualization here shows an interesting concentration in the lower left quadrant of the plot. Many users sit at the lower end of the income spectrum and, on average, spend less than those with more disposable income. Further, the plot suggests no visible difference in spending habits by ownership, as the points for both groups are clustered around the same region.
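
One way to put a number on what the scatter plot shows is to compute the Pearson correlation between income and expenditure, both overall and within each ownership group. The sketch below does this in Python with the standard library rather than R. The records are hypothetical values invented for illustration, so the resulting coefficients say nothing about the real data set.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical (income, expenditure, owner) records, invented for illustration.
records = [
    (2.0, 50.0, "yes"), (3.0, 80.0, "no"), (2.5, 60.0, "yes"),
    (4.0, 150.0, "no"), (6.0, 200.0, "yes"), (3.5, 90.0, "no"),
]

overall = pearson([r[0] for r in records], [r[1] for r in records])

# Repeat the calculation within each ownership group.
by_owner = {}
for owner in ("yes", "no"):
    subset = [r for r in records if r[2] == owner]
    by_owner[owner] = pearson([r[0] for r in subset], [r[1] for r in subset])

print(round(overall, 3), {k: round(v, 3) for k, v in by_owner.items()})
```

If the coefficients for the two groups come out close to each other, that is consistent with the visual impression that ownership does not change the income and expenditure pattern much.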

Another statistical process available in R is the support vector machine (SVM), a machine learning algorithm that can be used to produce comparable predictions and test whether an underlying relationship exists between income and expenditure.


The resulting table suggests a clear relationship between income and expenditure. The algorithm predicted the outcome correctly often enough to provide evidence that the relationship is a worthwhile insight.
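
For readers unfamiliar with how such a table is read, the sketch below builds a table of actual versus predicted labels and computes the share of correct predictions. It is written in Python with the standard library and does not train an SVM. The labels ("high" and "low" spending) and the predictions are hypothetical stand-ins for a model's output, not results from this project.

```python
from collections import Counter


def confusion_table(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


def accuracy(table):
    """Share of predictions where the predicted label matches the actual one."""
    correct = sum(count for (a, p), count in table.items() if a == p)
    return correct / sum(table.values())


# Hypothetical labels, invented for illustration only.
actual = ["high", "low", "low", "high", "low", "low"]
predicted = ["high", "low", "high", "high", "low", "low"]

table = confusion_table(actual, predicted)
acc = accuracy(table)

for (a, p), count in sorted(table.items()):
    print(f"actual={a:<4} predicted={p:<4} count={count}")
print(f"accuracy={acc:.3f}")
```

Accuracy alone can look strong when one class dominates the data, so it helps to read the off-diagonal counts in the table as well as the overall rate.
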
To conclude, visualizations and support vector machines are useful tools for examining the validity and patterns of relationships within data sets. Our findings suggest that income, ownership, age, and other variables play a role in the expenditure of card holders. Further, we can identify income as a key factor in whether users choose to spend.

