To complete this assignment, I followed the five key steps listed below. Each step is elaborated upon in the subsequent sections.
The visualization I chose for this assignment shows the wage gap between men and women across twenty industries in the United States. The visualization and the accompanying data can be viewed here.
The visualization represents the data as of 2019. However, based on the article, it would be beneficial to have either a benchmark value for comparison or a view of historical data so that important trends can be observed.
This visualization can be improved in terms of accessibility and understandability so that it brings more visibility to the important discussion of the gender wage gap.
The data can be found either on the page of the visualization or on the website of the US Census Bureau.
The following is the visualization:
While the visualization is interesting in its choice of graphic, it is inconvenient to read. The combination of a scale for the various earnings buckets, the sub-segments, the wages and the industries makes it difficult to draw conclusions quickly.
In order to critique this visualization, I chose to use the ‘Data Visualization Effectiveness Profile’. Here, I will evaluate it across seven criteria: Usefulness, Completeness, Perceptibility, Truthfulness, Intuitiveness, Aesthetics and Engagement.
Usefulness
Usefulness is determined based on the needs of the audience and is measured on a scale from Useless to Very useful. It is very important to understand the wage gap that exists between genders, and hence it is important to represent this information in a way that everyone can understand. However, in this visual, it would have been useful to show the number of people in the market separated by gender. This would start a conversation about the causes that drive the wage gap in each industry. The visual is focused on the civilian employed population aged 16 and over. The plot does not differentiate between full-time and part-time employees. We should also consider that some of the people counted are students who work part-time. Maternity can cause women to be out of the job market for a while, which can skew the data. Hence, adding some more relevant information would have increased the usefulness of the visual.
Completeness
Completeness is determined based on the extent to which the information represented helps produce the right amount of understanding. It is measured on a scale from No relevant data to All relevant data. The visualization provides no information about the number of people in a particular industry, even though this is a focal point of the article and the data set. It would also be useful to provide some information about full-time and part-time employees. The use of a circular graphic is a good way of ensuring that, visually, the wage gap between men and women is not missed.
Perceptibility
Perceptibility is measured based on the ease with which the information and the graphic can be assimilated by the viewer. It is measured on a scale from Unclear and difficult to Clear and easy. In this graphic, the color legend for the various buckets of data in the lower left corner and the scale in the lower right-hand corner make it difficult to keep up with the main representation. One has to move back and forth between the legends and the main graphic, which reduces perceptibility. The repeated use of ‘$’ and ‘K’ is also redundant.
Truthfulness
Truthfulness concerns the accuracy and the validity of the data. It is measured on a scale from Inaccurate and/or invalid to Accurate and valid. The data here is represented on sub-segments of sections of a pie. This means that the areas of the sub-segments are not the same. Hence, the area shown on the graphic is not proportional to the wage that is being represented. In addition, the scale in the lower right-hand corner shows that each sub-segment represents an increment of $10,000, while the last sub-segment represents an increment of $20,000: it goes from $80K to $100K. Overall, the range of wages is represented on a segment going from $10K to $100K. This makes me question the accuracy of the representation and the justice it does to the actual data.
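The point about unequal areas can be made concrete with a small calculation. The sketch below is only an illustration under stated assumptions: it assumes the sub-segments are concentric rings of equal radial width, and the slice angle of 18 degrees (a full circle divided by twenty industries) and the count of five rings are hypothetical values, not measurements taken from the original graphic.

```python
import math


def ring_sector_areas(n_rings, ring_width=1.0, angle_deg=18.0):
    """Return the area of each ring-shaped sub-segment of one pie slice, innermost first.

    Assumes equal radial width for every ring (an assumption, not a measured fact).
    """
    theta = math.radians(angle_deg)
    areas = []
    for k in range(1, n_rings + 1):
        inner = (k - 1) * ring_width
        outer = k * ring_width
        # Area of an annular sector: (theta / 2) * (outer^2 - inner^2)
        areas.append(0.5 * theta * (outer ** 2 - inner ** 2))
    return areas


areas = ring_sector_areas(n_rings=5)
# Express every ring's area relative to the innermost ring.
ratios = [round(a / areas[0], 6) for a in areas]
print(ratios)
```

Under these assumptions the ring areas grow in the ratio 1 : 3 : 5 : 7 : 9, so the same $10,000 step takes up far more ink near the rim than near the center.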
Intuitiveness
Intuitiveness concerns the degree to which the visualization is familiar and easy to understand. It is measured on a scale from Unfamiliar; difficult to understand to Familiar; easy to understand. Based on the conventions of data representation, those viewing the graphic might overlook the non-uniformity in the sub-segment size and the scale. This can lead to a false intuition and thus to wrong conclusions.
Aesthetics
Aesthetics range from Ugly to Pleasing to the eye to Beautiful. The use of color here (blue for men and pink for women) is in line with common convention. However, the use of gradients of blue and pink can be confusing.
Engagement
Engagement is focused on overall quality. It ranges from Distracts from data to Neutral to Draws one into the data. In this visualization, the additional footers and the two legends distract from the graphic. The graphic itself leaves us with a few unanswered questions. It should also place emphasis on comparisons, such as one industry versus other industries or the national median wage versus the wage within an industry. The graphic tries to provide too much information while at the same time not providing much.
I used Figma to think through my redesign of the above plot. Since my data has 20 categories (the industries) and three data series (the median wage of men, the median wage of women and the median wage across the population), I decided to use a combination of a bar plot and a line plot. The bars are grouped by the wage of men and women for each industry, and the line holds the median wage of the overall population.
My wireframe, created in Figma, can be viewed below (for 2019 data):
In order to test the solution, I showed the wireframe to three individuals (a Heinz student, a Tepper student and a student at Mellon College).
After interviewing the three students, I realized that my subtitle was not very informative, as it made it seem as though I was comparing the wage data with itself. Hence, I will have to make sure that I add a more detailed subtitle to my final visualization. A lot of questions came up about the black line, so I will have to be careful about how I represent it so that it is not misinterpreted.
The exercise of critiquing the visualization helped me immensely when it came to deconstructing it. It helped me gauge what works and gave me ideas to enhance it, while also identifying the aspects that need to be reconsidered. What I learnt is that fancy visualizations often don’t work. They can be cluttered and can at times complicate data that could easily be represented by simple graphs with no ambiguity.
This assignment sharpened my thinking not just about visual presentation but also about how important correct data representation is. (I had to do quite a bit of data cleaning and data aggregation across five years of data to make this work!)
In order to get the data ready, I did the following:
In this first visual, I worked off my wireframe to build a visualization using ‘Combo: Lines & columns (grouped)’ as my graphic of choice. The bars represent the data series ‘Median Wage of Women’ and ‘Median Wage of Men’. The line represents the overall ‘Median Wage of the Population’. The x-axis contains each of the twenty industries. I represented the median wage in dollars only on the y-axis so that it is not repeated multiple times. I also made sure that the subtitle of the graphic is more informative. I tested the colors used against the most common forms of color blindness here to ensure that they are color-blind friendly.
The visualization is as follows:
In my critique section, I mentioned the importance of historical data and how it can help enhance the visual. Keeping that in mind, I found data for years 2015 through 2019 on the US Census Bureau website and aggregated them into one single dataset. I then proceeded to clean the data following the steps mentioned before.
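The aggregation step can be sketched as follows. This is a minimal illustration, not the actual procedure used for this assignment: the function name `combine_years`, the column names and the single industry row are hypothetical, and the wage values are placeholders rather than Census figures.

```python
def combine_years(tables_by_year):
    """Stack per-year rows into one long list, tagging each row with its year."""
    combined = []
    for year in sorted(tables_by_year):
        for row in tables_by_year[year]:
            # Copy the row and add the year so that all years share one layout.
            combined.append({"year": year, **row})
    return combined


# Placeholder rows only: the industry name and wages are made up, not Census data.
tables_by_year = {
    2016: [{"industry": "Example industry", "men": 52000, "women": 41000}],
    2015: [{"industry": "Example industry", "men": 50000, "women": 40000}],
}

combined = combine_years(tables_by_year)
for row in combined:
    print(row)
```

Keeping one row per industry and year (a long layout) makes it straightforward to filter by year or to split the data by gender later.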
Once the data was cleaned, I represented it using the ‘Bar chart (grouped)’ visualization via ‘Grid of Charts’. Here, I split the grid by gender, so we can now see a side-by-side view of the changes in wage through the years for the two genders across industries. As in the previous visual, the wage ($) is represented only once, on the x-axis, so that it is not repeated multiple times. The industries are arranged in alphabetical order with ‘Other Services’ at the end. As in the previous visualization, I made sure that the subtitle is informative and not confusing.
Using the control, we can either hit play to look at the changes through the years or we can look at the difference in wages in any particular year.
The visualization is as follows (Hit Play and look at the trends!):
In this section, I would like to discuss additions I would have liked to make to the two plots I developed so that they are more informative without being cluttered. These changes would have been possible if I had access to specific data that I was unable to find. I did not make up this data because the previous visualizations use real-world data, and I did not want to mix that with made-up data.
The first addition I would have liked to make is a filter on the plots to visualize the median wages separately for part-time and full-time employees. It would be very interesting to see how the skew in the data (if any) shifts in that case.
The second enhancement I would like to make is to my second plot where, in addition to seeing the trends through the years, I would also like to show on the graphic the percentage by which men earn more than women. I had already calculated this in the dataset. However, due to the limitations of Flourish, I was unable to incorporate it without changing the individual x-axes, which would go against good visualization practice.
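For reference, one common way to compute such a percentage is sketched below. The exact formula used in the dataset is not shown here, so this is an assumption: the gap is expressed relative to the women's median wage, and the function name and the input values are hypothetical placeholders.

```python
def pct_men_earn_more(men_median, women_median):
    """Percentage by which the men's median wage exceeds the women's median wage.

    Assumes the women's median is the baseline (an assumption, not a confirmed method).
    """
    if women_median <= 0:
        raise ValueError("women_median must be positive")
    return (men_median - women_median) / women_median * 100


# Placeholder wages, not Census figures.
gap = pct_men_earn_more(50000, 40000)
print(f"Men earn {gap:.1f}% more than women in this example")
```

Using the women's wage as the denominator matches the phrasing "men earn more than women"; dividing by the men's wage instead would answer a different question, namely how much less women earn.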