Sunday, March 31, 2024

Module 11 Assignment

 This week's assignment focused on generating visualizations with a specific kind of R tool: Tufte-style plots. Tufte's approach to visualization is new to me, and I found the types of visualizations offered by the authors and contributors of these packages incredibly interesting. I decided to use a built-in data set for this assignment: "economics", which provides a thorough record of U.S. economic indicators from 1967 to 2015. I plotted pce (personal consumption expenditures) against psavert (personal savings rate) and applied Tufte's principles to the other elements of the ggplot.

I also installed and initialized the other packages prior to this. 

The output of my code is this:

The output is a scatterplot with marginal histograms: the central scatterplot shows how the two variables relate to each other, while the histograms along the opposing axes show the distribution of each variable on its own.
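For reference, here is a minimal sketch of how a plot like this can be built. It assumes the ggthemes package (for theme_tufte()) and the ggExtra package (for ggMarginal()); the post does not name the packages it used, so treat both as assumptions, along with the axis labels.

```r
library(ggplot2)  # provides the economics data set and ggplot()

# Central scatterplot: personal consumption expenditures vs. personal savings rate
p <- ggplot(economics, aes(x = pce, y = psavert)) +
  geom_point(size = 0.8) +
  labs(x = "Personal consumption expenditures (billions of dollars)",
       y = "Personal savings rate (%)")

# Assumption: ggthemes supplies the minimal, low-ink Tufte-style theme
if (requireNamespace("ggthemes", quietly = TRUE)) {
  p <- p + ggthemes::theme_tufte()
}

# Assumption: ggExtra adds histograms along both margins of the scatterplot
if (requireNamespace("ggExtra", quietly = TRUE)) {
  m <- ggExtra::ggMarginal(p, type = "histogram")
  # print(m) draws the combined plot
}
```

Building the scatterplot first and adding the margins last keeps the Tufte theme applied to the central panel.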

Sunday, March 24, 2024

Module #11 Debugging and Defensive Programming

 


The error I noticed quickly in this code was in the last line: the "return" statement was improperly placed. When I ran the code and got a syntax error,
I was convinced that the error came from that same line. The (i in 1:nrow) loop was correct because it properly iterated over the rows from 1 to the last row; the outlier part of the function was also properly written, although I hadn't seen the "all" function before (it returns TRUE only when every element of its logical input is TRUE). After moving the "return" line down onto its own line, the syntax error was resolved and the function could run and print its result. I'm not sure exactly what the function does, but I assume it checks each row of the input against the data set and flags whether or not it is an outlier.
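The function itself isn't reproduced in this post, so the sketch below is a hypothetical reconstruction of a function with this shape. The names tukey_multiple and tukey.outlier, and the 1.5 * IQR rule inside the helper, are assumptions for illustration. It shows the fix (return on its own line after the loop closes) and what all() contributes: a row is flagged only if it is an outlier in every column.

```r
# Hypothetical helper: flags values outside Tukey's fences (1.5 * IQR beyond the quartiles)
tukey.outlier <- function(x, k = 1.5) {
  q <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))
  iqr <- q[2] - q[1]
  x < q[1] - k * iqr | x > q[2] + k * iqr
}

# Hypothetical reconstruction: TRUE for rows that are outliers in every column
tukey_multiple <- function(x) {
  outliers <- array(TRUE, dim = dim(x))
  for (j in 1:ncol(x)) {
    # element-wise & so every row of column j is updated
    outliers[, j] <- outliers[, j] & tukey.outlier(x[, j])
  }
  outlier.vec <- vector(length = nrow(x))
  for (i in 1:nrow(x)) {
    # all() is TRUE only when row i is flagged in every column
    outlier.vec[i] <- all(outliers[i, ])
  }
  return(outlier.vec)  # on its own line, after the loop has closed
}

m <- cbind(a = c(1, 2, 3, 4, 5, 100), b = c(2, 3, 4, 5, 6, 200))
tukey_multiple(m)  # FALSE FALSE FALSE FALSE FALSE TRUE
```

Only the last row is flagged, because it is extreme in both columns.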






Module #10 Assignment

 This module and lecture focused on visualizing data as time series: the changes and continuities in the statistical behavior of a data set can be shown over any unit of time (seconds, days, years, etc.). Macroscopic perspectives typically use years to understand financial periods, celestial motion (e.g., rotations and revolutions), or historical trends, to name a few examples. I decided to use R's preloaded "EuStockMarkets" data set, which contains daily closing prices of major stock indices for four European countries: Germany's DAX (Ibis), Switzerland's SMI, France's CAC, and the UK's FTSE, from 1991 to 1998. First, I loaded the data and my packages; in particular, I loaded ggplot2 to visualize the data and later added the tidyverse package to manage and manipulate it.

I converted the data set into a data frame to prepare it for visualization and make it easier to work with. Using the tidyverse, I organized the data around the price and index attributes, and the mutate function let me derive the year of each observation for each country. Before this step, time was stored as decimal years (e.g., 1991.xxx), which broke each year into an unwieldy collection of fractional values. Next, I used plotly to visualize the time series.
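A sketch of the preparation steps described above, assuming the dplyr and tidyr packages from the tidyverse. The column names time, index, price, and year, and the annual averaging step, are my own choices for illustration rather than a copy of my original code.

```r
library(dplyr)
library(tidyr)

# EuStockMarkets is a multivariate ts; keep its decimal time index as a column
eu_wide <- data.frame(time = as.numeric(time(EuStockMarkets)),
                      as.data.frame(EuStockMarkets))

# One row per (time, index) pair, plus a whole-number year
eu_long <- eu_wide %>%
  pivot_longer(-time, names_to = "index", values_to = "price") %>%
  mutate(year = floor(time))  # e.g. 1991.496 becomes 1991

# Optional: collapse to one average price per index per year
annual <- eu_long %>%
  group_by(index, year) %>%
  summarise(mean_price = mean(price), .groups = "drop")
```

Keeping the long format (one price column plus an index column) is what lets a plotting call color the series by index.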


The plot compares year (x-axis) and price (y-axis), with colors distinguishing the country each series belongs to. The resulting plot:
This visualization shows stock market prices rising over the period of interest. There are clear moments of oscillation; however, the bigger picture shows a clear upward trend in prices no matter which country is depicted.
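A self-contained sketch of the plotly step, assuming the plotly package is installed. It reshapes the data with base R so it runs without the tidyverse, and it plots the decimal time so each trading day appears as a point on its index's line; the column names are hypothetical.

```r
# Long format in base R: as.vector() stacks the four columns one after another
eu <- data.frame(
  time  = rep(as.numeric(time(EuStockMarkets)), times = ncol(EuStockMarkets)),
  index = rep(colnames(EuStockMarkets), each = nrow(EuStockMarkets)),
  price = as.vector(EuStockMarkets)
)

if (requireNamespace("plotly", quietly = TRUE)) {
  # color = ~index splits the data into one line trace per index
  fig <- plotly::plot_ly(eu, x = ~time, y = ~price, color = ~index,
                         type = "scatter", mode = "lines")
  fig <- plotly::layout(fig, xaxis = list(title = "Year"),
                        yaxis = list(title = "Closing price"))
  # print(fig) in an interactive session to view the chart
}
```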

Sunday, March 17, 2024

Module #9 Assignment

 This assignment aimed to create a multivariate data visualization using one of the visualization tools discussed in this class. I used R's "swiss" data set, which reports a standardized fertility measure and infant mortality for the 47 French-speaking provinces of Switzerland around 1888, along with socio-economic indicators. A likely question for someone investigating this data set is which variables contribute to infant mortality. The factors used were agriculture, fertility, examination, education, and the Catholic share of the population. Using this data, I chose to explore the relationship between fertility rates and infant mortality, and incorporated the other variables into a plot using the ggplot2 package.
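A minimal sketch of this kind of multivariate ggplot2 plot. Which variable goes to which aesthetic (a color gradient for Education, point size for Agriculture, shape for majority-Catholic provinces) is an assumption for illustration; Examination is left out to keep the legend readable.

```r
library(ggplot2)

# swiss: 47 French-speaking Swiss provinces, c. 1888 (built into R's datasets package)
p <- ggplot(swiss, aes(x = Fertility, y = Infant.Mortality)) +
  geom_point(aes(colour = Education,        # color gradient
                 size = Agriculture,         # point size
                 shape = Catholic > 50)) +   # majority-Catholic provinces
  scale_colour_gradient(low = "steelblue", high = "firebrick") +
  labs(x = "Fertility (standardized measure)",
       y = "Infant mortality",
       colour = "Education",
       size = "Agriculture",
       shape = "Majority Catholic")
```

Mapping the extra variables to color, size, and shape keeps fertility and infant mortality on the axes while still showing the other factors.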


The resulting plot:
This plot adheres to the basic principles of design in several ways. The first, alignment, is largely handled by the organized and consistent layout of ggplot2 and similar software. The plot is readable and coherent, and the labels, color gradient, and colors make the interactions between the variables evident; these elements contribute to the overall repetition and contrast of the visualization. Color and shape create contrast between the data points and support the goal of the visualization. Balance comes from labels and scaling: the data itself is asymmetrical, but the labels on all four sides give the model a sense of balance. Additionally, the plot avoids excess empty space without becoming cramped.

Sunday, March 10, 2024

Module #9 Visualization in R

From the provided website,

https://vincentarelbundock.github.io/Rdatasets/datasets.html

I chose what I considered the most appealing data set: CPS1988. This data set covers variables that might affect wages in 1988, arguably one of the most romanticized eras of the U.S. from a 2000s perspective. It has 7 variables: wage, education, experience, ethnicity, smsa (whether the person lives in a standard metropolitan statistical area: yes or no), region, and parttime (whether the person works part-time). The first visualization package I used was the classic ggplot2, with a simple setup.

The result looked as such:
I tried a few different ways to capture the comparisons I think would matter most to someone looking at wage data, and this was the most helpful. It captures the outliers in a visually significant way and also reflects the distribution. I suspect a comparable plot of modern wage data would look very different, with much more extreme outliers.
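A sketch of a simple ggplot2 setup along these lines, assuming the AER package (which ships CPS1988) is installed. Plotting wage by ethnicity as a boxplot is an assumption; the post does not say which variables were on each axis.

```r
library(ggplot2)
data("CPS1988", package = "AER")  # assumption: AER is installed

# Boxplots show the median, quartiles, and individual outliers for each group
p <- ggplot(CPS1988, aes(x = ethnicity, y = wage)) +
  geom_boxplot(outlier.alpha = 0.3) +  # fade outliers so dense clusters stay readable
  labs(x = "Ethnicity", y = "Wage (dollars per week)")
```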
A much simpler visualization is R's built-in hist() function, which is usable but lacks the visual polish of ggplot2 and other packages. I used the same variables for this visualization because the other variables are categorical (stored as characters) and difficult to represent appropriately in a histogram.

The last visualization, built with R's basic plotting tools, is the weakest because of its simple, bare design. It is also weaker than the others because the points are densely packed and because of how the variables interact: education is recorded in whole years, so rather than scattering, the points stack into stratified bands, which makes a more usable plot difficult to create.
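A sketch of the two base R plots, again assuming the AER package for CPS1988. Plotting wage against education is an assumption, and the jitter() and semi-transparent points in the scatterplot are my addition, one common way to loosen up stratified, overplotted data, not part of the original plot.

```r
data("CPS1988", package = "AER")  # assumption: AER is installed

# Histogram of weekly wages with base R
h <- hist(CPS1988$wage, breaks = 50,
          main = "Weekly wages, 1988", xlab = "Wage (dollars per week)")

# Education takes whole-year values, so jitter spreads the vertical bands
# and transparency shows where points pile up
plot(jitter(CPS1988$education), CPS1988$wage,
     pch = 16, col = rgb(0, 0, 0, 0.1),
     xlab = "Education (years)", ylab = "Wage (dollars per week)")
```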



Sunday, March 3, 2024

Module #8 Correlation Analysis

 This assignment explored correlation analysis through graphic visualizations. Correlation analyses are most often presented through plots, because the visualized patterns and trends help people see whether two variables have a positive, negative, or no clear association. One recent example I saw involved vehicles, though not mtcars; instead, it used a simple data frame built from a series of vectors in a data science textbook.





This plot and its abline suggest a strong positive correlation. Few states that the primary goal of a visualization is to communicate its principal ideas effectively so that viewers can glean valuable information from the graphic; correlation analysis is one of the most widely used statistical processes among analysts and students. I think my visualization accomplishes this goal: with a distinct abline, a grid, and a brief sample of a dataset, the information is easy to understand from an outsider's perspective.
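The textbook data frame isn't reproduced here, so this sketch uses R's built-in cars data set (speed vs. stopping distance) as a stand-in. It computes the Pearson correlation with cor(), fits a least-squares line with lm(), and draws the same elements described above: points, a grid, and an abline.

```r
# Stand-in data: built-in cars (speed in mph, stopping distance in ft),
# not the textbook data frame described above
r <- cor(cars$speed, cars$dist)      # Pearson correlation coefficient
fit <- lm(dist ~ speed, data = cars)  # least-squares line for the abline

plot(cars$speed, cars$dist, pch = 19,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     main = sprintf("r = %.2f", r))
grid()                                # light reference grid
abline(fit, col = "red", lwd = 2)     # fitted line drawn over the points
```

Putting r in the title pairs the visual impression with the number it summarizes.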



Final Project Visual Analytics

      For this project, I will be utilizing statistical visualizations derived from the "USMacroB" dataset. Spanning from 1959 to ...