ggplot2 - R, ggplot, separate mean by range of x value


I have a set of data that looks like this:

      chrom   pos gt diff
    1 chr01 14653 ct  254
    2 chr01 14907 ag  254
    3 chr01 14930 ag   23
    4 chr01 15190 ga  260
    5 chr01 15211 tg   21
    6 chr01 16378 tc 1167

where pos ranges from 1xxxx to 1xxxxxxx, and chrom is a categorical variable that contains the values "chr01" to "chr22", and "chrx".

I want to plot a scatterplot:

  • y (diff) vs. x (pos)
  • with panels separated by chrom
  • grouped by gt (different colors for each gt)

I'm creating a ggplot with a running average (though this is not time series data).

What I want is the average of diff for every 1,000,000 range of pos, by gt.

For example,

for x in the range 1 ~ 1,000,000, diff average = _____

for x in the range 1,000,001 ~ 2,000,000, diff average = _____

and I want to plot these as horizontal lines on the ggplot (coloured by gt).


What I have so far, before applying the function:

[image]

After applying the function:

[image]

I tried to apply the solution I have, but here are the problems:

  • There are different panels, and the mean values differ from panel to panel, but when I apply the code, the horizontal mean lines in every panel are identical to those of the first panel.
  • The panels have different ranges on the x-axis, and when I apply the function, it automatically fills out the range with the previous horizontal mean line.

Here is my code from before:

    library(ggplot2)
    library(scales)  # trans_breaks(), pretty_breaks()

    ggplot(data1, aes(x = pos, y = diff, colour = gt)) +
      geom_point() +
      facet_grid(~ chrom, scales = "free_x", space = "free_x") +
      theme(strip.text.x = element_text(size = 40),
            strip.background = element_rect(color = 'lightblue', fill = 'lightblue'),
            legend.position = "top",
            legend.title = element_text(size = 40, colour = "darkblue"),
            legend.text = element_text(size = 40),
            legend.key.size = unit(2.5, "cm")) +
      guides(fill = guide_legend(title.position = "top",
                                 title = "legend:gt='ref'+'alt'"),
             shape = guide_legend(override.aes = list(size = 10))) +
      scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x, n = 10)) +
      scale_x_continuous(breaks = pretty_breaks(n = 3))

This was tougher than I expected! This should at least get you started, though:

    # Saves a lot of headaches: only make factors when you need them
    options(stringsAsFactors = FALSE)

    library(ggplot2)
    library(plyr)

    # Here's some made-up data - it helps if you can post a subset of
    # your real data, though. The dput() function is useful for that.
    dat <- data.frame(pos = seq(1, 1e7, by = 1e4))

    # Add a random gt value
    dat$gt <- sample(x = c("ct", "ag", "ga", "tg", "tc"),
                     size = nrow(dat),
                     replace = TRUE)

    # Group by millions - there are several ways to do this that I can
    # never remember, but here's a simple way to split by millions
    dat$posgroup <- floor(dat$pos / 1e6)

    # Add an arbitrary diff value
    dat$diff <- rnorm(n = nrow(dat),
                      mean = 200 * dat$posgroup,
                      sd = 300)

    # Aggregate the data by gt and pos-group.
    # Ideally, you'd do this inside of the plot using stat_summary,
    # but I couldn't get that to work. Using two datasets in a plot
    # is okay, though.
    datsum <- ddply(dat, .variables = "posgroup", .fun = function(x) {

        # Calculate the mean diff value for each gt group in this posgroup
        meandiff <- ddply(x, .variables = "gt", .fun = summarise, ymean = mean(diff))

        # Add the center of the posgroup range as the x position
        meandiff$center <- (x$posgroup[1] * 1e6) + 0.5e6

        # Return the results
        meandiff
    })

    # On the plot, these results should be grouped by both pos and gt -
    # but ggplot will only accept one vector for grouping. So make a combination.
    datsum$combogroup <- paste(datsum$gt, datsum$posgroup)

    # Plot it
    ggplot() +

        # First, a layer for the points.
        # Large numbers of points can get pretty slow - you might try getting
        # the plot to work with a subsample (~1000) and then add in the rest
        # of your data
        geom_point(data = dat,
                   aes(x = pos, y = diff, color = as.factor(gt))) +

        # Then, a layer for the means. There are a variety of geoms you could
        # use here, but a crossbar with ymin and ymax set to the group mean
        # is a simple one
        geom_crossbar(data = datsum, aes(x = center,
                                         y = ymean,
                                         ymin = ymean,
                                         ymax = ymean,
                                         color = as.factor(gt),
                                         group = combogroup),
                      size = 1) +

        # Some other niceties
        scale_x_continuous(breaks = seq(0, 1e7, by = 1e6)) +
        labs(x = "pos", y = "diff", color = "gt") +
        theme_bw()

Which results in this:

[image: plot of made-up data]

There's probably a more straightforward way to do this, but I don't know it. Hope that helps.
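
The made-up data above has no chrom column, so it does not show what happens across panels. Here is a minimal sketch of one way the same idea might be extended to facets. It rests on several assumptions of mine, not on the question's real data: two invented chromosomes of different lengths, bins computed as (pos - 1) %/% 1e6 so that 1 ~ 1,000,000 falls in the first bin as described in the question, base aggregate() in place of ddply(), and geom_segment() swapped in for geom_crossbar() so that each mean line spans exactly its own bin.

```r
library(ggplot2)

set.seed(1)

# Made-up data: two chromosomes of different lengths (hypothetical values)
dat <- data.frame(
  chrom = rep(c("chr01", "chr02"), times = c(600, 300)),
  pos   = c(sort(sample.int(6e6, 600)), sort(sample.int(3e6, 300))),
  gt    = sample(c("ct", "ag", "ga"), 900, replace = TRUE),
  stringsAsFactors = FALSE
)
# Strictly positive diff values so that the log10 y scale is valid
dat$diff <- rexp(nrow(dat), rate = 1 / 200) + 1

# Bin index: 0 for pos 1..1,000,000, 1 for 1,000,001..2,000,000, and so on
dat$posgroup <- (dat$pos - 1) %/% 1e6

# Mean diff for every chrom x gt x bin combination that occurs in the data
datsum <- aggregate(diff ~ chrom + gt + posgroup, data = dat, FUN = mean)
names(datsum)[names(datsum) == "diff"] <- "ymean"

# Start and end of each bin, used as the two ends of the horizontal line
datsum$xstart <- datsum$posgroup * 1e6 + 1
datsum$xend   <- (datsum$posgroup + 1) * 1e6

p <- ggplot(dat, aes(x = pos, y = diff, colour = gt)) +
  geom_point(alpha = 0.4) +
  # One horizontal segment per chrom x gt x bin
  geom_segment(data = datsum,
               aes(x = xstart, xend = xend, y = ymean, yend = ymean,
                   colour = gt),
               inherit.aes = FALSE) +
  facet_grid(~ chrom, scales = "free_x", space = "free_x") +
  scale_y_log10()

print(p)
```

Because datsum carries its own chrom column, facet_grid() places each segment in the panel of its chromosome, so the means are no longer copied from the first panel. Bins that do not occur in a chromosome produce no row in datsum, so no line is drawn past the end of that panel's data.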

