Friday, April 26, 2024

Final Project Visual Analytics

    For this project, I will be utilizing statistical visualizations derived from the "USMacroB" dataset. Spanning from 1959 to 1995, this dataset offers a view of macroeconomic variables in the United States. Our objective is to glean insights into pivotal economic indicators such as gross national product (GNP), monetary base, and treasury bill rates.

Exploring Economic Indicators:

Gross National Product (GNP): Our exploration commences with an examination of GNP's trajectory over time. Employing a time series plot, we chart the quarterly fluctuations in GNP from 1959 to 1995. This visualization provides a lucid depiction of economic growth trends throughout this period.

Monetary Base and Treasury Bill Rates: Transitioning to monetary indicators, we will be using the average of the seasonally adjusted monetary base and the average of the 3-month treasury bill rates. Through histograms and scatter plots, we delve into the distribution and relationships of these variables.

Analyzing Trends and Relationships:

Correlation Analysis: Venturing beyond individual variables, we delve into the intricate relationships among economic indicators. Employing scatter plots and correlation coefficients, we unearth correlations between GNP, monetary base, and treasury bill rates. This analysis yields valuable insights into the interconnectivity of economic factors.
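As a rough illustration of the correlation step, the sketch below computes pairwise correlation coefficients in base R. It does not use the real data: the data frame `macro` is a synthetic stand-in with invented values and the same three column names, so the printed numbers say nothing about the actual economy.

```r
# Synthetic stand-in for the real data: 146 quarterly rows with the same
# three column names. All values are invented for illustration only.
set.seed(1)
n <- 146
macro <- data.frame(
  gnp   = cumsum(runif(n, 5, 15)),  # steadily rising series
  mbase = cumsum(runif(n, 1, 4)),   # steadily rising series
  tbill = runif(n, 3, 12)           # series with no trend
)

# Pairwise Pearson correlation coefficients between the three indicators
cor_matrix <- cor(macro)
print(round(cor_matrix, 2))

# Correlation test for a single pair, which also reports a p-value
gnp_mbase <- cor.test(macro$gnp, macro$mbase)
print(gnp_mbase$estimate)
```

Two series that both trend upward over time will correlate strongly almost by construction, which is one reason a high coefficient here should not be read as causal evidence.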

Time Series Analysis: Leveraging time series plots, we scrutinize the trends and seasonality of economic variables across time. By decomposing the data into trend, seasonal, and residual components, we attain a deeper comprehension of enduring patterns and cyclical fluctuations within the economy.
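The decomposition into trend, seasonal, and residual components can be sketched with base R's `decompose()`. The series below is synthetic (a straight-line trend plus an invented quarterly pattern and noise), and the name `gnp_ts` is only a placeholder; the one assumption carried over from the data is that it is quarterly and starts in the first quarter of 1959.

```r
# Synthetic quarterly series standing in for the real data (values invented)
set.seed(2)
n <- 146
quarter_effect <- rep(c(2, -1, -3, 2), length.out = n)  # made-up seasonal pattern
values <- (1:n) * 10 + quarter_effect + rnorm(n)

# frequency = 4 tells R the observations are quarterly
gnp_ts <- ts(values, start = c(1959, 1), frequency = 4)

# Additive decomposition: observed = trend + seasonal + random
parts <- decompose(gnp_ts)

# Four stacked panels: observed, trend, seasonal, random
plot(parts)

# The estimated effect of each quarter
print(round(parts$figure, 2))
```

With 146 quarterly observations starting in 1959 Q1, the series ends in 1995 Q2, which `end(gnp_ts)` confirms.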

Interactive Visualization

Animated Scatter Plot: To enrich our exploration, we deploy an animated scatter plot to visualize the dynamic interplay between GNP, monetary base, and treasury bill rates. This dynamic visualization enables us to observe the evolution of these variables over time and explore trends across different economic periods.



The Data

The data set comes from one of the sources provided earlier in the course: Github Sets. After perusing the available data sets, I decided to use an economic set that contains 146 observations and 3 variables: GNP (gross national product), Mbase (the average of the seasonally adjusted monetary base), and tbill (the average of the 3-month treasury bill rates). I then looked at the summary statistics and toyed with different functions within R to get a feel for the data and its preliminary characteristics.


    This image provides a preview of the data; it also shows a column designated 'rownames', which is something that I had to alter when cleaning the data. Because each row reflects the passage of a single quarter, I knew that I would be able to use it to illustrate quarterly changes and get a fine-grained view of the data as the years went on. However, this posed another issue: it is tedious to work out which year, and which point in the fiscal year, a given row belongs to by counting how many rows have passed (4 per year, 146 rows, 1959-1995). My solution was to create a new data frame that repeats each year four times to coincide with the quarters and bind it to the existing data frame.
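The year column described above could be built along these lines. This is a sketch rather than the exact code used for the post: the data frame name `macro` is a placeholder, the extra `quarter` column is an illustrative addition, and it assumes row 1 is the first quarter of 1959.

```r
n <- 146

# Placeholder for the real data frame; only the row index is reproduced here
macro <- data.frame(rownames = 1:n)

# Repeat each year four times, then trim to the number of rows available.
# 37 years x 4 quarters = 148, so the last two quarters of 1995 are dropped.
year_df <- data.frame(
  year    = rep(1959:1995, each = 4)[1:n],
  quarter = rep(1:4, length.out = n)  # illustrative extra column
)

# Bind the new columns to the existing data frame
macro <- cbind(macro, year_df)

tail(macro, 3)
```

Keeping the quarter next to the year means each row has a unique (year, quarter) label, so no counting of rows is needed to place a data point in time.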




With the new column, quarterly and yearly changes and continuities become easier to understand visually, and data manipulation becomes easier as well.

Visualizations 
    I wanted to have a diverse set of visualizations when creating models and plots, so I opted to use different strategies when creating them. For this segment, plot.ly, ggplot, and animated graphics will be employed to produce an array of visualizations to compare variables.
    I decided plot.ly provides the baseline for R visualizations and began using that first:
The first plot created was a general plot and produced this result:
This plot does not provide evidence of a causal relationship; however, we can see that the variables appear to be closely connected and visually correlated. The second plot I created was meant to track GNP over time. This is the most commonly recognized plot when observing economic changes, and as we will see, the trend is apparent.
One concern I have about using the cleaned data set and year is the duplication of data points and odd layering. While readers understand the four points in each year as Q1-Q4, the plot cannot distinguish them, so it has to make assumptions about which quarter each point along the axis belongs to. As you will see, I flip-flop between using rownames and year_df to communicate understanding about the data. Rownames (quarters) will frequently produce cleaner and more consistent-looking visualizations. The year-based version shows the same trend; however, its clusters of data points do not provide further clarity and are therefore omitted from this presentation.
The next visualization is meant to show the frequency distribution of mbase (monetary base) through a histogram. In addition, an 'abline' marking the mean gives us a point of reference for the mbase values.
    As the monetary base rises, the frequency of values in each range diminishes: many quarters fall in the low ranges and few in the high ones. This pattern reflects the slow increase near the beginning of our time series and the faster progression as the decades continue. This could suggest eras of greater profit and value afforded to the monetary base in the later decades.
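A histogram with a mean line of the kind described here can be sketched in base R. This is an assumed reconstruction, not the code behind the image: the `mbase` vector is synthetic (a smooth, accelerating curve with invented values), chosen only so that the histogram has the same right-skewed shape.

```r
# Synthetic monetary base: slow growth early, faster growth later (values invented)
n <- 146
mbase <- 40 * exp(seq(0, 2.3, length.out = n))

# Histogram of the values; hist() also returns the bin counts
h <- hist(mbase,
          breaks = 15,
          main   = "Distribution of monetary base (synthetic)",
          xlab   = "mbase",
          col    = "lightblue")

# Vertical dashed line at the mean of the series
abline(v = mean(mbase), col = "red", lwd = 2, lty = 2)

print(h$counts)
```

Because the series accelerates, it spends many quarters at low values and few at high ones, so the bars shrink from left to right and the mean sits above the median.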

GGPLOT
ggplot2 provides users with a tremendous visual advantage over plot.ly. Plot.ly can be useful for one-time or infrequent use; however, ggplot2 reigns supreme when visualizations are consistently depended on.

 I decided to use quite a few different plots to best articulate the data. The ability to compare and distinguish data points and features is a strong suit of the ggplot package.
Here the difference between using years and quarters is evident. The more precise quarterly measurement offers a better grasp of the behavior of the data and adheres to the abline more closely than 'year_df'. Regardless, both illustrate a positive correlation between time and GNP, and both depict the ebb and flow of GNP in certain years.
According to this plot, the growth of the monetary base was limited until the 1980s, when it reached new, distinct peaks and rose faster than at any time before. As such, the abline has to accommodate this sudden strength on the right of the plot: where growth was once slowly curving, the mbase increases rapidly in the last quarter of the plot.

This plot is the most valuable for its inclusion of multiple variables. While the other plots offered a comparative view of two variables, this plot shows all three. Using rownames, the treasury bill rate appears to have a distinct peak with a fall on either side. Additionally, the monetary base is clearly increasing over time. Something we can draw from this plot is that treasury bill rates do not appear to be positively correlated with time, and instead appear to have a nearly neutral relationship with the other variables. Indeed, it seems as though the quarters between the 75th (1977) and the 125th (1990) were exceptionally active for treasury bill rates and economic focus. These years coincide with the election of a certain president; however, any connection can only be assumed, since this plot does not compare presidencies or other factors.
The final ggplot I decided to create was a time-series that utilizes a line plot instead. 

This is the most organized and defining visualization for the relationship between bill rates and quarters. The use of 'geom_smooth' encapsulates the overarching behavior of the data and offers a usable visual about the rate of bills over time. 
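For reference, a line plot with a 'geom_smooth' overlay can be put together roughly as follows. The sketch assumes the ggplot2 package is installed and uses a synthetic `bills` data frame with invented values (a hump-shaped curve plus noise), so it shows the technique rather than the real treasury bill series.

```r
library(ggplot2)

# Synthetic bill rates: a rise and fall plus noise (values invented)
set.seed(4)
n <- 146
bills <- data.frame(
  quarter = 1:n,
  tbill   = 6 + 4 * sin(seq(0, pi, length.out = n)) + rnorm(n, sd = 0.8)
)

p <- ggplot(bills, aes(x = quarter, y = tbill)) +
  geom_line(colour = "grey40") +                  # the raw quarterly series
  geom_smooth(method = "loess", formula = y ~ x,  # smoothed overall behavior
              se = TRUE) +
  labs(title = "Treasury bill rate by quarter (synthetic)",
       x = "Quarter", y = "tbill")

print(p)
```

The loess smoother follows the broad rise and fall without chasing every quarterly wiggle, which is what makes the overall behavior easy to read.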

Animated Scatter Plot 

The final visualization tool I will be using is the 'gganimate' package. This will allow me to produce a gif that articulates the data input as a moving, alive instrument of statistics. 
This is the code I attempted to use. There are quite a few iterations that I tried to use in order to troubleshoot issues I was having or to clean up the visual. The product looks as such:
Unfortunately, the animated scatter plot cannot be displayed on Blogger, so a still is the closest I can get to visualizing it here in this medium.
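Since the code itself is not reproduced here, the following is a minimal sketch of how an animated scatter plot of the three variables might be set up with gganimate. It is not the code from the post: the data are synthetic, the mapping of variables to axes and colour is an assumption, and it requires the ggplot2 and gganimate packages. Rendering is left commented out because it also needs a renderer such as the gifski package.

```r
library(ggplot2)
library(gganimate)

# Synthetic stand-in for the three variables (values invented)
set.seed(5)
n <- 146
macro <- data.frame(
  quarter = 1:n,
  gnp     = cumsum(runif(n, 5, 15)),
  mbase   = cumsum(runif(n, 1, 4)),
  tbill   = runif(n, 3, 12)
)

anim <- ggplot(macro, aes(x = mbase, y = gnp, colour = tbill)) +
  geom_point(size = 3) +
  labs(title = "Quarter: {frame_time}", x = "mbase", y = "gnp") +
  transition_time(quarter) +   # one animation state per quarter
  shadow_mark(alpha = 0.3)     # keep faded copies of earlier points

# Rendering to a GIF file (uncomment to run; needs the gifski package):
# anim_save("macro.gif", animation = animate(anim, nframes = 146, fps = 10))
```

Building `anim` only describes the animation; nothing is drawn until it is printed or passed to `animate()`.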
Conclusions
In conclusion, our journey through the realm of economic trends utilizing statistical visualizations has provided invaluable insights into the intricacies of the U.S. economy from 1959 to 1995. By harnessing the power of visual representations, we enhance our understanding of economic phenomena and inform strategic decision-making in a complex and interconnected world.


Reflection
I could have made different decisions at numerous steps in this process. Cleaning the data so that I would have a year data frame ended up being more time-consuming than expected and, in the end, unnecessary. I had countless issues along the way figuring out which data set best communicated what I was trying to say, as well as problems rendering ggplots, and many of the visuals were tedious to fix. Additionally, I could have added more to the plots and made more inferences and statistical decisions to strengthen the impact of my visuals. In the end, I think I produced a usable and quite attractive presentation about my topic that could be used to make assumptions and inferences about a specific period of history.
 


