I am handling some survey data on how respondents feel about several factors related to workplace culture. It is currently in a long-format tibble called work_culture_data, like this:
> print(work_culture_data, n = 21)
# A tibble: 140 × 3
   Response_ID Factor                    Level
         <int> <fct>                     <fct>
 1           6 Level_support_colleagues  low
 2           6 Level_support_community   low
 3           6 Level_career_prospects    low
 4           6 Level_career_satisfaction high
 5           6 Level_career_impact       low
 6           6 Level_collaboration       high
 7           6 Level_assessment_fairness high
 8           7 Level_support_colleagues  high
 9           7 Level_support_community   high
10           7 Level_career_prospects    very high
11           7 Level_career_satisfaction high
12           7 Level_career_impact       high
13           7 Level_collaboration       high
14           7 Level_assessment_fairness high
15           8 Level_support_colleagues  high
16           8 Level_support_community   low
17           8 Level_career_prospects    very low
18           8 Level_career_satisfaction high
19           8 Level_career_impact       high
20           8 Level_collaboration       low
21           8 Level_assessment_fairness low
# … with 119 more rows
# ℹ Use `print(n = ...)` to see more rows
It can be recreated with this dput() output:
structure(list(Response_ID = c(6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L,
11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 13L,
13L, 13L, 13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L, 14L, 14L, 14L,
15L, 15L, 15L, 15L, 15L, 15L, 15L, 16L, 16L, 16L, 16L, 16L, 16L,
16L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 18L, 18L, 18L, 18L, 18L,
18L, 18L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 20L, 20L, 20L, 20L,
20L, 20L, 20L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 22L, 22L, 22L,
22L, 22L, 22L, 22L, 23L, 23L, 23L, 23L, 23L, 23L, 23L, 24L, 24L,
24L, 24L, 24L, 24L, 24L, 25L, 25L, 25L, 25L, 25L, 25L, 25L),
Factor = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L,
3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L,
4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L,
6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L,
1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L,
2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L,
3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L,
4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L), levels = c("Level_support_colleagues",
"Level_support_community", "Level_career_prospects", "Level_career_satisfaction",
"Level_career_impact", "Level_collaboration", "Level_assessment_fairness"
), class = "factor"), Level = structure(c(2L, 2L, 2L, 3L,
2L, 3L, 3L, 3L, 3L, 4L, 3L, 3L, 3L, 3L, 3L, 2L, 1L, 3L, 3L,
2L, 2L, 4L, 3L, 2L, 3L, 3L, 3L, 2L, 4L, 3L, 3L, 4L, 3L, 4L,
3L, 2L, 2L, 3L, 3L, 2L, 2L, 2L, 3L, 3L, 2L, 3L, 3L, 3L, 3L,
3L, 2L, 3L, 3L, 2L, 4L, 2L, 2L, 4L, 2L, 4L, 4L, 1L, 3L, 3L,
3L, 3L, 3L, 4L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 2L, 1L, 3L, 3L,
1L, 3L, 2L, 4L, 3L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 3L, 3L, 3L,
4L, 2L, 2L, 2L, 4L, 3L, 2L, 3L, 3L, 4L, 3L, 3L, 3L, 2L, 3L,
3L, 3L, 2L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 3L,
3L, 2L, 4L, 4L, 3L, 4L, 3L, 4L, 2L, 3L, 2L, 2L, 3L, 1L, 2L,
2L), levels = c("very low", "low", "high", "very high"), class = "factor")), row.names = c(NA,
-140L), class = c("tbl_df", "tbl", "data.frame"))
The actual dataset has 2000+ rows representing 400+ responses. work_culture_data here is a subset of 20 survey responses (20 unique Response_IDs) in which respondents rate (the Level factor variable, from "very low" to "very high") seven factors (the Factor factor variable) about their workplace culture. For example, respondent number 6 thinks their Level_career_prospects is low.
Based on work_culture_data, I'd like to create a 100% stacked bar chart with ggplot2 with the following features:
1. The Factors are renamed in the final graph, such as from Level_career_prospects to "Career prospects". This will be the vertical axis.
2. The stacked bars are horizontal, and I can specify the colors of their components.
3. There will be seven stacked bars altogether, one for each Factor.
4. The stacked bars are made of the proportion of respondents who chose each Level, in order from "very low" to "very high" (four levels in total). Each segment of a stacked bar represents one of the Levels. Each stacked bar adds up to 100%.
5. The horizontal axis has three labelled breaks: 0%, 50%, and 100% from left to right.
6. The stacked bars are ordered from top to bottom by their proportion of "very low", from highest to lowest.
7. Ideally, I'd like the count of responses shown for each segment of the stacked bars.
I tried to create this plot starting with this line:
work_culture_fig <- ggplot(work_culture_data, aes(y = Factor, x = Level)) +
geom_col()
However, it gave me output that baffles me.
I don't know where to go from here and am very confused... Should the tibble data frame be widened first?
What did I do wrong? And how do I achieve 1 to 7 above in the final figure?
Thank you.
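For reference, here is a minimal sketch of one possible approach to the seven requirements above. It is not a definitive solution. It assumes that the long format can stay as it is (no widening) and that the likely problem with the attempt above is that geom_col() needs a numeric bar length, which the factor Level cannot supply, so the counts and proportions are computed first. The names toy_data, plot_data, pretty_label and level_colours are hypothetical, the colors are arbitrary placeholders, and toy_data only re-enters the 21 rows printed above (respondents 6, 7 and 8) so the block runs on its own.

```r
library(dplyr)
library(ggplot2)

# Toy input: the first 21 rows printed above (respondents 6, 7 and 8).
factor_names <- c("Level_support_colleagues", "Level_support_community",
                  "Level_career_prospects", "Level_career_satisfaction",
                  "Level_career_impact", "Level_collaboration",
                  "Level_assessment_fairness")
level_names <- c("very low", "low", "high", "very high")
toy_data <- tibble::tibble(
  Response_ID = rep(6:8, each = 7),
  Factor = factor(rep(factor_names, times = 3), levels = factor_names),
  Level = factor(
    c("low", "low", "low", "high", "low", "high", "high",
      "high", "high", "very high", "high", "high", "high", "high",
      "high", "low", "very low", "high", "high", "low", "low"),
    levels = level_names
  )
)

# Count responses per Factor/Level and turn the counts into within-Factor proportions.
plot_data <- toy_data %>%
  count(Factor, Level, name = "n") %>%
  group_by(Factor) %>%
  mutate(prop = n / sum(n),
         prop_very_low = sum(prop[Level == "very low"])) %>%
  ungroup() %>%
  # The last factor level is drawn at the top of a discrete y axis, so an
  # ascending reorder puts the largest "very low" share at the top.
  mutate(Factor = reorder(Factor, prop_very_low))

# Hypothetical helper: "Level_career_prospects" -> "Career prospects".
pretty_label <- function(x) {
  x <- gsub("_", " ", sub("^Level_", "", x))
  paste0(toupper(substring(x, 1, 1)), substring(x, 2))
}

# Placeholder colors, one per Level; swap in any values you prefer.
level_colours <- c("very low" = "#d7191c", "low" = "#fdae61",
                   "high" = "#a6d96a", "very high" = "#1a9641")

work_culture_fig <- ggplot(plot_data, aes(x = prop, y = Factor, fill = Level)) +
  # reverse = TRUE stacks "very low" first, i.e. at the left of each bar.
  geom_col(position = position_stack(reverse = TRUE)) +
  # Counts are centred in each segment using the same stacking order.
  geom_text(aes(label = n),
            position = position_stack(vjust = 0.5, reverse = TRUE)) +
  scale_x_continuous(breaks = c(0, 0.5, 1), labels = c("0%", "50%", "100%")) +
  scale_y_discrete(labels = pretty_label) +
  scale_fill_manual(values = level_colours, drop = FALSE) +
  labs(x = NULL, y = NULL)
```

To try this on the real data, replace toy_data with work_culture_data in the pipeline and print work_culture_fig. Because every bar is built from proportions that sum to 1 within its Factor, a plain stacked position already yields 100% bars.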