
I am handling some data from a survey about feelings toward several factors related to workplace culture. It is currently in a long-format tibble called work_culture_data, like this:

> print(work_culture_data, n = 21)
# A tibble: 140 × 3
   Response_ID Factor                    Level    
         <int> <fct>                     <fct>    
 1           6 Level_support_colleagues  low      
 2           6 Level_support_community   low      
 3           6 Level_career_prospects    low      
 4           6 Level_career_satisfaction high     
 5           6 Level_career_impact       low      
 6           6 Level_collaboration       high     
 7           6 Level_assessment_fairness high     
 8           7 Level_support_colleagues  high     
 9           7 Level_support_community   high     
10           7 Level_career_prospects    very high
11           7 Level_career_satisfaction high     
12           7 Level_career_impact       high     
13           7 Level_collaboration       high     
14           7 Level_assessment_fairness high     
15           8 Level_support_colleagues  high     
16           8 Level_support_community   low      
17           8 Level_career_prospects    very low 
18           8 Level_career_satisfaction high     
19           8 Level_career_impact       high     
20           8 Level_collaboration       low      
21           8 Level_assessment_fairness low      
# … with 119 more rows
# ℹ Use `print(n = ...)` to see more rows

It can be recreated with this dput() output:

structure(list(Response_ID = c(6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 
7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 
9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 
11L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 13L, 
13L, 13L, 13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 
15L, 15L, 15L, 15L, 15L, 15L, 15L, 16L, 16L, 16L, 16L, 16L, 16L, 
16L, 17L, 17L, 17L, 17L, 17L, 17L, 17L, 18L, 18L, 18L, 18L, 18L, 
18L, 18L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 20L, 20L, 20L, 20L, 
20L, 20L, 20L, 21L, 21L, 21L, 21L, 21L, 21L, 21L, 22L, 22L, 22L, 
22L, 22L, 22L, 22L, 23L, 23L, 23L, 23L, 23L, 23L, 23L, 24L, 24L, 
24L, 24L, 24L, 24L, 24L, 25L, 25L, 25L, 25L, 25L, 25L, 25L), 
    Factor = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 
    3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 
    4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 
    5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 
    6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 
    7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 
    1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 
    2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 
    3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 
    4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L), levels = c("Level_support_colleagues", 
    "Level_support_community", "Level_career_prospects", "Level_career_satisfaction", 
    "Level_career_impact", "Level_collaboration", "Level_assessment_fairness"
    ), class = "factor"), Level = structure(c(2L, 2L, 2L, 3L, 
    2L, 3L, 3L, 3L, 3L, 4L, 3L, 3L, 3L, 3L, 3L, 2L, 1L, 3L, 3L, 
    2L, 2L, 4L, 3L, 2L, 3L, 3L, 3L, 2L, 4L, 3L, 3L, 4L, 3L, 4L, 
    3L, 2L, 2L, 3L, 3L, 2L, 2L, 2L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 
    3L, 2L, 3L, 3L, 2L, 4L, 2L, 2L, 4L, 2L, 4L, 4L, 1L, 3L, 3L, 
    3L, 3L, 3L, 4L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 2L, 1L, 3L, 3L, 
    1L, 3L, 2L, 4L, 3L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 3L, 3L, 3L, 
    4L, 2L, 2L, 2L, 4L, 3L, 2L, 3L, 3L, 4L, 3L, 3L, 3L, 2L, 3L, 
    3L, 3L, 2L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 3L, 
    3L, 2L, 4L, 4L, 3L, 4L, 3L, 4L, 2L, 3L, 2L, 2L, 3L, 1L, 2L, 
    2L), levels = c("very low", "low", "high", "very high"), class = "factor")), row.names = c(NA, 
-140L), class = c("tbl_df", "tbl", "data.frame"))

The actual dataset has 2000+ rows representing 400+ responses. work_culture_data here is a subset of 20 survey responses (20 unique Response_IDs) in which each respondent rates seven aspects of their workplace culture (the Factor variable, a factor) on four levels from "very low" to "very high" (the Level variable, also a factor). For example, respondent number 6 rates their Level_career_prospects as low.
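In this long layout each row is one respondent's rating of one factor, so the counts behind every bar segment can be read off a cross-tabulation of Factor by Level. A base R sketch of that idea, using only the 21 rows printed above (respondents 6 to 8) rather than the full dput() data; the object names are my own:

```r
# Hypothetical object names; data are the first 21 rows printed above
factor_names <- c("Level_support_colleagues", "Level_support_community",
                  "Level_career_prospects", "Level_career_satisfaction",
                  "Level_career_impact", "Level_collaboration",
                  "Level_assessment_fairness")
level_names <- c("very low", "low", "high", "very high")

# Respondents 6, 7 and 8, seven rows each
subset_data <- data.frame(
  Response_ID = rep(6:8, each = 7),
  Factor = factor(rep(factor_names, times = 3), levels = factor_names),
  Level = factor(c("low", "low", "low", "high", "low", "high", "high",
                   "high", "high", "very high", "high", "high", "high", "high",
                   "high", "low", "very low", "high", "high", "low", "low"),
                 levels = level_names)
)

# Rows = factors (one future bar each), columns = levels (one segment each)
segment_counts <- table(subset_data$Factor, subset_data$Level)
print(segment_counts)
```

Each row of this table sums to the number of respondents (3 here), which is the total a 100% bar divides by.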

Based on work_culture_data, I'd like to create a 100% stacked bar chart with ggplot2 with the following features:

  1. The Factors are renamed in the final graph, e.g. from Level_career_prospects to "Career prospects". These names form the vertical axis.
  2. The stacked bars are horizontal, and I can specify the color of each segment.
  3. There are seven stacked bars altogether, one per Factor.
  4. Each stacked bar shows the proportion of respondents who chose each Level, ordered from "very low" to "very high" (four levels in total), so each segment represents one Level and each bar adds up to 100%.
  5. The horizontal axis has three labelled breaks: 0%, 50%, and 100% from left to right.
  6. The stacked bars are ordered from top to bottom by their proportion of "very low" responses, from highest to lowest.
  7. Ideally, the count of responses for each segment is shown as well.

I tried to create this plot starting with this line:

work_culture_fig <- ggplot(work_culture_data, aes(y = Factor, x = Level)) + 
    geom_col()

However, it gave me this output which baffles me:

(Image: failed 100% stacked bar plot)

I don't know where to go from here and am very confused... Should the tibble data frame be widened first?

What did I do wrong? And how do I achieve features 1 to 7 above in the final figure?

Thank you.

CC BY-SA 4.0

2 Answers


Not sure if I understood correctly...

library(tidyverse)  # ggplot2, dplyr, stringr, forcats

# Start by removing the "Level_" prefix from each factor name
# (each step must build on the previous result in t)
t <- as.character(work_culture_data$Factor)
t <- gsub("Level_([a-z])", " \\U\\1", t, perl=TRUE)
t <- gsub("^([a-z])", "\\U\\1", t, perl=TRUE)
t <- str_to_title(str_trim(gsub("_", " ", t)))
work_culture_data$Factor <- t

# Change levels: reversed so that "very low" is stacked at the zero end of each bar
work_culture_data$Level <- fct_relevel(work_culture_data$Level,
                                       c("very high", "high", "low", "very low"))

# Plot
work_culture_data %>%
  ggplot(aes(x = Factor, fill = Level)) +
  geom_bar() + labs(fill = "") + ylab("") +
  theme(axis.text.y = element_blank(),
        axis.ticks.y = element_blank())
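If you want labels exactly in the form the question asks for ("Career prospects" rather than "Career Prospects"), one option is to rename the factor levels instead of the values, which also keeps Factor a factor with its level order intact. A minimal base R sketch; tidy_label() is a hypothetical helper, not part of any package:

```r
# Hypothetical helper: "Level_career_prospects" -> "Career prospects"
tidy_label <- function(x) {
  x <- gsub("_", " ", sub("^Level_", "", x))            # drop prefix, underscores -> spaces
  paste0(toupper(substring(x, 1, 1)), substring(x, 2))  # capitalize first letter only
}

f <- factor(c("Level_career_prospects", "Level_support_colleagues"),
            levels = c("Level_support_colleagues", "Level_career_prospects"))

# Relabel the levels; the underlying codes and the level order are unchanged
levels(f) <- tidy_label(levels(f))
print(levels(f))
```

On the question's data this would be levels(work_culture_data$Factor) <- tidy_label(levels(work_culture_data$Factor)).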


I changed the code to show percentages. I also swapped the axes and used position = "fill" to make it clearer that the bars are proportional.


# Plot (position = "fill" turns the counts into proportions of each bar)
work_culture_data %>% group_by(Factor, Level) %>% summarize(count = n()) %>%
  ggplot(aes(y = Factor, x = count, fill = Level)) +
  geom_col(position = "fill") + labs(fill = "") + ylab("") +
  scale_x_continuous(labels = scales::percent)
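position = "fill" rescales every bar so that its segments add up to 1, i.e. each count is divided by its own bar's total. The same numbers can be checked in base R with prop.table(); this sketch uses only the career prospects and career satisfaction rows for respondents 6 to 8 from the question's printout (the object names are my own):

```r
# Hypothetical names; values taken from the printed rows for respondents 6-8
ratings <- data.frame(
  Factor = rep(c("Career prospects", "Career satisfaction"), each = 3),
  Level = factor(c("low", "very high", "very low", "high", "high", "high"),
                 levels = c("very low", "low", "high", "very high"))
)

counts <- table(ratings$Factor, ratings$Level)
# margin = 1 divides each row (bar) by its own total, like position = "fill"
shares <- prop.table(counts, margin = 1)
print(round(shares, 3))
```

Since the filled bars run from 0 to 1, the three breaks asked for in requirement 5 correspond to scale_x_continuous(breaks = c(0, 0.5, 1), labels = scales::percent).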

To set the colors manually, add:

scale_fill_manual(values = sample(colors(),
                                  length(unique(work_culture_data$Level))))

This is just a fancy-schmancy way to sample a different set of colors each time, one per unique level in your fill aesthetic. You could also just specify the values directly, e.g. c("red", "#1CD317", colors()[444], "deeppink"), which mixes the different ways to give a color (a hex code, a name, an index into the vector of all named colors, or my favorite: DEEP PINK!).
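Because sample() picks different colors on every run, an alternative is a named vector, which ties each color to a specific Level regardless of level order. The hex codes below are arbitrary example choices, and the data are again only the respondent 6 to 8 rows for two factors:

```r
library(ggplot2)

# Hypothetical names; values taken from the printed rows for respondents 6-8
ratings <- data.frame(
  Factor = rep(c("Career prospects", "Career satisfaction"), each = 3),
  Level = factor(c("low", "very high", "very low", "high", "high", "high"),
                 levels = c("very low", "low", "high", "very high"))
)

# Arbitrary example colors; the names must match the Level values exactly
level_colors <- c("very low" = "#CA0020", "low" = "#F4A582",
                  "high" = "#92C5DE", "very high" = "#0571B0")

p <- ggplot(ratings, aes(y = Factor, fill = Level)) +
  geom_bar(position = "fill") +   # horizontal bars, since only y is mapped
  scale_fill_manual(values = level_colors)
print(p)
```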

  • If you wanna flip x and y, actually swap x and y in aes(), and don't use coord_flip(). – tjebo
  • Thank you @RYann, that is close! But I'd like the vertical axis in your plot to represent proportions instead of counts. And do you have any suggestions for the other things I listed in 1-7? – hpy
  • Definitely very close now, and like I asked in the question: is there a way to specify colors? And can the vertical axis show marks for 0%, 50%, and 100%? Many thanks! – hpy
  • Sure, refer to my edited answer. – RYann
  • I edited the color option. About the levels, I changed them at the beginning. If you want a different order, just edit the section where I specified "Change levels"; the relevant command is fct_relevel. – RYann

I've changed the order of Factor outside of ggplot. You could of course keep the original columns by adding new ones instead of mutating the existing ones. I also switched from four manual colors to a predefined palette; R has really nice coloring options, and you can get familiar with them over time. If you prefer manual colors, use the scale_fill_manual() line from the previous answer.
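Ordering by the share of "very low" answers needs a value for every factor, including factors nobody rated "very low"; otherwise those factors fall out of the ordering vector. Taking the mean of a logical per factor gives that share directly, with 0 for such factors. A base R sketch of the same ordering step, using the 21 printed rows (where only career prospects has a "very low" answer) and hypothetical tidied names:

```r
# Hypothetical tidied names; values are rows 1-21 printed in the question
factor_names <- c("Support colleagues", "Support community", "Career prospects",
                  "Career satisfaction", "Career impact", "Collaboration",
                  "Assessment fairness")

ratings <- data.frame(
  Factor = factor(rep(factor_names, times = 3), levels = factor_names),
  Level = factor(c("low", "low", "low", "high", "low", "high", "high",
                   "high", "high", "very high", "high", "high", "high", "high",
                   "high", "low", "very low", "high", "high", "low", "low"),
                 levels = c("very low", "low", "high", "very high"))
)

# Share of "very low" per Factor; factors without any get 0 instead of vanishing
very_low_share <- tapply(ratings$Level == "very low", ratings$Factor, mean)

# Ascending order: the first level is drawn at the bottom of a discrete y axis,
# so the factor with the largest share ends up on top (order() keeps ties stable)
ratings$Factor <- factor(ratings$Factor,
                         levels = names(very_low_share)[order(very_low_share)])
print(levels(ratings$Factor))
```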

library(tidyverse)  # ggplot2, dplyr, stringr, forcats

# Start by removing the "Level_" prefix from each factor name
t <- as.character(work_culture_data$Factor)
t <- gsub("Level_([a-z])", " \\U\\1", t, perl=TRUE)
t <- gsub("^([a-z])", "\\U\\1", t, perl=TRUE)
t <- str_to_title(str_trim(gsub("_", " ", t)))
work_culture_data$Factor <- t

# Change levels: reversed so that "very low" is stacked at the zero end of each bar
work_culture_data$Level <- fct_relevel(work_culture_data$Level,
                                       c("very high", "high", "low", "very low"))


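# Sort Factor by the proportion of "very low" answers (ascending, so the
# largest proportion ends up at the top of the plot). mean() of a logical
# gives the proportion, and factors without any "very low" keep a 0
# instead of being dropped
arng_by <- work_culture_data %>%
  group_by(Factor) %>%
  summarize(prop = mean(Level == "very low")) %>%
  arrange(prop) %>%
  pull(Factor)

work_culture_data$Factor <- fct_relevel(work_culture_data$Factor, arng_by)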
# Plot
work_culture_data %>% group_by(Factor, Level) %>% summarize(count = n()) %>%
  ggplot(aes(y = Factor, x = count, fill = Level)) +
  geom_col(position = "fill") + labs(fill = "") + ylab("") +
  scale_x_continuous(labels = scales::percent) +
  scale_fill_brewer(palette = 11)  # palette given by index into the sequential ColorBrewer palettes
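The chart above still lacks requirement 5 (breaks at 0%, 50% and 100%) and requirement 7 (counts on the segments). One way to add both, sketched with ggplot2 on the 21 printed rows: count first, draw the counts with geom_text() using the same position_fill() as the bars, and use reverse = TRUE so "very low" starts at 0% without releveling Level. The tidied factor names and the colors are illustrative assumptions:

```r
library(ggplot2)

# Hypothetical tidied names; values are rows 1-21 printed in the question
factor_names <- c("Support colleagues", "Support community", "Career prospects",
                  "Career satisfaction", "Career impact", "Collaboration",
                  "Assessment fairness")

ratings <- data.frame(
  Factor = factor(rep(factor_names, times = 3), levels = factor_names),
  Level = factor(c("low", "low", "low", "high", "low", "high", "high",
                   "high", "high", "very high", "high", "high", "high", "high",
                   "high", "low", "very low", "high", "high", "low", "low"),
                 levels = c("very low", "low", "high", "very high"))
)

# One row per Factor x Level with its count; drop empty segments so no "0" labels
counts <- as.data.frame(table(Factor = ratings$Factor, Level = ratings$Level),
                        responseName = "n")
counts <- counts[counts$n > 0, ]

p <- ggplot(counts, aes(x = n, y = Factor, fill = Level)) +
  # reverse = TRUE stacks "very low" from 0% and "very high" up to 100%
  geom_col(position = position_fill(reverse = TRUE)) +
  # same position as the bars, with each label centered in its segment
  geom_text(aes(label = n),
            position = position_fill(vjust = 0.5, reverse = TRUE)) +
  scale_x_continuous(breaks = c(0, 0.5, 1), labels = scales::percent) +
  # arbitrary example colors, matched to Level by name
  scale_fill_manual(values = c("very low" = "#CA0020", "low" = "#F4A582",
                               "high" = "#92C5DE", "very high" = "#0571B0")) +
  labs(x = NULL, y = NULL, fill = NULL)
print(p)
```

Using reverse = TRUE instead of releveling Level keeps the legend in the natural "very low" to "very high" order; combined with a Factor ordering by "very low" share, this would also cover requirement 6.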