Sunday, March 10, 2024

Module #9 Visualization in R

From the provided website--

https://vincentarelbundock.github.io/Rdatasets/datasets.html

--I chose what I considered the most appealing data set: CPS1988. This data set covers variables that might affect wages in 1988, arguably one of the most romanticized eras of the U.S. from a 2000s perspective. It has 7 variables: wage, education, experience, ethnicity, smsa (yes or no), region, and parttime (whether or not the person worked part-time). The first visualization package I wanted to use was the classic ggplot2. I opted for a simple setup.

The result looked as such:
I tried out a few different ways to capture the comparisons that I think would matter to someone looking at wage data, and this was the most helpful one I could come up with. It makes the outliers visually prominent and also reflects the distribution. I'm sure that present-day wage data would look very different and that the outliers would be much more extreme.
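
For anyone who wants to try something similar, here is a minimal sketch of a ggplot2 plot that shows both outliers and the overall distribution. It is not the exact code behind the figure above: the choice of a boxplot, the pairing of wage with education, and the helper name `wage_boxplot` are all assumptions made for illustration. It also assumes the AER package, which ships CPS1988, is installed.

```r
library(ggplot2)

# Hypothetical helper: one box of weekly wage per level of education.
# The variables and the geom are assumptions, not the original setup.
wage_boxplot <- function(df) {
  ggplot(df, aes(x = factor(education), y = wage)) +
    geom_boxplot(outlier.alpha = 0.3) +  # fade outliers so dense tails stay readable
    labs(x = "Years of education",
         y = "Weekly wage (USD)",
         title = "CPS1988: wage by education")
}

# Only draw the plot if the AER package (source of CPS1988) is available.
if (requireNamespace("AER", quietly = TRUE)) {
  data("CPS1988", package = "AER")
  print(wage_boxplot(CPS1988))
}
```

Treating education as a factor gives one box per year of schooling, so the points beyond the whiskers stand out as individual outliers.
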
A much simpler visualization comes from R's built-in hist() function, which is usable but lacks the visual stun and shimmer that ggplot2 and other packages provide. I used the same variables for this visualization because the other variables are characters and are difficult to represent appropriately this way.
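
A base-R histogram along these lines might look like the sketch below. The choice of wage as the plotted variable, the number of breaks, and the helper name `wage_hist` are assumptions for illustration, and the AER package is again assumed to be installed.

```r
# Hypothetical helper: base-R histogram of weekly wage.
# hist() draws the plot and invisibly returns the histogram object.
wage_hist <- function(df, breaks = 50) {
  hist(df$wage,
       breaks = breaks,
       main = "CPS1988: distribution of weekly wage",
       xlab = "Weekly wage (USD)")
}

# Only draw the plot if the AER package (source of CPS1988) is available.
if (requireNamespace("AER", quietly = TRUE)) {
  data("CPS1988", package = "AER")
  wage_hist(CPS1988)
}
```

Since hist() takes a single numeric vector, it only applies directly to wage, education, or experience.
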

The last visualization is the weakest, both because of the simplicity and bareness of its design within R and because of how densely the points are plotted and how the variables interact. Rather than being scattered, the points fall into strata, since education and the other numeric variables only take a limited set of values, which makes creating a more usable plot difficult.
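
The stratification described above can be reproduced with a plain base-R scatter plot. The sketch below assumes wage plotted against education; the variable pairing, the `alpha` argument, and the helper name `wage_scatter` are hypothetical and not taken from the original plot.

```r
# Hypothetical helper: bare base-R scatter plot of wage against education.
# alpha = 1 gives fully opaque points; lower values make overlap easier to see.
wage_scatter <- function(df, alpha = 1) {
  plot(df$education, df$wage,
       pch = 16,
       col = adjustcolor("black", alpha.f = alpha),
       xlab = "Years of education",
       ylab = "Weekly wage (USD)",
       main = "CPS1988: wage vs. education")
  invisible(NULL)
}

# Only draw the plot if the AER package (source of CPS1988) is available.
if (requireNamespace("AER", quietly = TRUE)) {
  data("CPS1988", package = "AER")
  wage_scatter(CPS1988)
}
```

Because education is recorded in whole years, the points stack into vertical bands. Calling `wage_scatter(CPS1988, alpha = 0.1)` is one way to see where those bands are densest.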


