ggplot2 - R, ggplot, separate mean by range of x value

I have a set of data that looks like this:

```
  chrom   pos gt diff
1 chr01 14653 ct  254
2 chr01 14907 ag  254
3 chr01 14930 ag   23
4 chr01 15190 ga  260
5 chr01 15211 tg   21
6 chr01 16378 tc 1167
```

where pos ranges from 1xxxx to 1xxxxxxx, and chrom is a categorical variable containing the values "chr01" through "chr22" and "chrX".

I want to plot a scatterplot with:

- y (diff) vs. x (pos)
- panels separated by chrom
- points grouped by gt (a different colour for each gt)

I'm trying to create something like a running average in ggplot (though this is not time series data).

What I want is the average of diff for every 1,000,000 range of pos, for each gt.

For example:

For x in range(1 ~ 1,000,000), diff average = _____

For x in range(1,000,001 ~ 2,000,000), diff average = _____
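A minimal base-R sketch of these per-range averages, assuming the column names from the sample data above (chrom, pos, gt, diff). It reuses only the six sample rows, so every position falls into the first range; the names `binwidth`, `bin` and `binmeans` are illustrative. `(pos - 1) %/% 1e6` puts 1 ~ 1,000,000 into bin 0, 1,000,001 ~ 2,000,000 into bin 1, and so on, matching the ranges written above.

```r
# The six sample rows from the question (real data would have many more
# rows, several chromosomes, and positions up to ~1e7)
data1 <- data.frame(
  chrom = "chr01",
  pos   = c(14653, 14907, 14930, 15190, 15211, 16378),
  gt    = c("ct", "ag", "ag", "ga", "tg", "tc"),
  diff  = c(254, 254, 23, 260, 21, 1167),
  stringsAsFactors = FALSE
)

binwidth <- 1e6
# bin 0 = pos 1..1,000,000, bin 1 = pos 1,000,001..2,000,000, ...
data1$bin <- (data1$pos - 1) %/% binwidth

# Mean diff for every chrom x gt x bin combination; keeping chrom in the
# grouping gives each facet panel its own means
binmeans <- aggregate(diff ~ chrom + gt + bin, data = data1, FUN = mean)
names(binmeans)[names(binmeans) == "diff"] <- "ymean"

print(binmeans)
```

With the sample rows, the two "ag" rows (254 and 23) average to 138.5, and every other gt has a single row, so its mean is just its diff value.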

and I want to plot these averages as horizontal lines on the ggplot (coloured by gt).

What I have so far, before applying the function:

[image: plot before applying the function]

After applying the function:

[image: plot after applying the function]

I tried to apply the solution I have, but here are the problems:

- There are different panels, and the mean values differ from panel to panel, but when I apply the code, the horizontal mean lines in every panel are identical to those in the first panel.
- I have different ranges on the x-axis, but when I apply the function, the lines automatically extend to fill out the range of the previous horizontal mean line.
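One likely cause of the first problem is a summary table without a chrom column: `facet_grid(~ chrom)` repeats every row of a layer in every panel when that layer's data has no chrom column to split on. For the second, with `scales = "free_x"`, a line spanning a full 1,000,000-wide range also stretches the panel's x-axis past the observed positions. The sketch below uses hypothetical data, base R for the summary, and ggplot2 only if it is installed; it keeps chrom in the grouping and clips each line to the part of its range that lies within that chromosome's observed positions. The names `seg`, `x`, `xend`, `chrom_min` and `chrom_max` are illustrative.

```r
# Hypothetical data: two chromosomes with different position ranges,
# so each panel should get its own mean lines
dat <- data.frame(
  chrom = c(rep("chr01", 4), rep("chr02", 4)),
  pos   = c(100, 500000, 1200000, 1800000, 300000, 700000, 900000, 1500000),
  gt    = c("ag", "ag", "ag", "ag", "ct", "ct", "ag", "ct"),
  diff  = c(10, 30, 200, 400, 5, 15, 50, 1000),
  stringsAsFactors = FALSE
)
dat$bin <- (dat$pos - 1) %/% 1e6   # 1..1e6 -> 0, 1e6+1..2e6 -> 1, ...

# Mean diff per chrom x gt x bin (chrom must stay in the grouping)
seg <- aggregate(diff ~ chrom + gt + bin, data = dat, FUN = mean)
names(seg)[names(seg) == "diff"] <- "ymean"

# Observed position range of each chromosome
chrom_min <- sapply(split(dat$pos, dat$chrom), min)
chrom_max <- sapply(split(dat$pos, dat$chrom), max)

# Each line covers its bin, clipped to the chromosome's observed range
key <- as.character(seg$chrom)
seg$x    <- pmax(seg$bin * 1e6 + 1, unname(chrom_min[key]))
seg$xend <- pmin((seg$bin + 1) * 1e6, unname(chrom_max[key]))
print(seg)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(dat, aes(x = pos, y = diff, colour = gt)) +
    geom_point() +
    # seg has its own chrom column, so facet_grid splits it per panel
    geom_segment(data = seg,
                 aes(x = x, xend = xend, y = ymean, yend = ymean, colour = gt),
                 inherit.aes = FALSE) +
    facet_grid(~ chrom, scales = "free_x", space = "free_x")
  print(p)
}
```

Clipping to the chromosome's range, rather than to the positions of each gt within a bin, avoids zero-length lines when a gt has only one point in a bin.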

Here is my code from before:

```r
library(ggplot2)
library(scales)  # for trans_breaks() and pretty_breaks()

ggplot(data1, aes(x = pos, y = diff, colour = gt)) +
  geom_point() +
  facet_grid(~ chrom, scales = "free_x", space = "free_x") +
  theme(strip.text.x = element_text(size = 40),
        strip.background = element_rect(color = 'lightblue', fill = 'lightblue'),
        legend.position = "top",
        legend.title = element_text(size = 40, colour = "darkblue"),
        legend.text = element_text(size = 40),
        legend.key.size = unit(2.5, "cm")) +
  guides(fill = guide_legend(title.position = "top",
                             title = "legend:gt='ref'+'alt'"),
         shape = guide_legend(override.aes = list(size = 10))) +
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x, n = 10)) +
  scale_x_continuous(breaks = pretty_breaks(n = 3))
```

This was tougher than I expected! This should at least get you started, though:

```r
# It saves a lot of headaches to only make factors when you need them
options(stringsAsFactors = FALSE)

library(ggplot2)
library(plyr)

# Here's some made-up data - it helps if you can post a subset of
# your real data, though. The dput() function is useful for that.
dat <- data.frame(pos = seq(1, 1e7, by = 1e4))

# Add a random gt value
dat$gt <- sample(x = c("ct", "ag", "ga", "tg", "tc"),
                 size = nrow(dat),
                 replace = TRUE)

# Group by millions - there are several ways you can do this that I
# can never remember, but here's a simple way to split by millions
dat$posgroup <- floor(dat$pos / 1e6)

# Add an arbitrary diff value
dat$diff <- rnorm(n = nrow(dat),
                  mean = 200 * dat$posgroup,
                  sd = 300)

# Aggregate the data by gt and pos-group.
# Ideally, you'd do this inside of the plot using stat_summary,
# but I couldn't get that to work. Using 2 datasets in one plot
# is okay, though.
datsum <- ddply(dat, .variables = "posgroup", .fun = function(x) {
  # Calculate the mean diff value for each gt group in this posgroup
  meandiff <- ddply(x, .variables = "gt", .fun = summarise, ymean = mean(diff))

  # Add the center of the posgroup range as the x position
  meandiff$center <- (x$posgroup[1] * 1e6) + 0.5e6

  # Return the results
  meandiff
})

# On the plot, these results will be grouped by both pos and gt, but
# ggplot only accepts 1 vector for grouping, so make a combination.
datsum$combogroup <- paste(datsum$gt, datsum$posgroup)

# Plot
ggplot() +
  # First, layer the points.
  # Large numbers of points can be pretty slow - you might try getting
  # the plot to work with a subsample (~1000), then add in the rest of
  # the data
  geom_point(data = dat,
             aes(x = pos, y = diff, color = as.factor(gt))) +
  # Then layer the means. There are a variety of geoms you could use
  # here; a crossbar with ymin and ymax set to the group mean is a
  # simple one
  geom_crossbar(data = datsum, aes(x = center,
                                   y = ymean,
                                   ymin = ymean,
                                   ymax = ymean,
                                   color = as.factor(gt),
                                   group = combogroup),
                size = 1) +
  # Other niceties
  scale_x_continuous(breaks = seq(0, 1e7, by = 1e6)) +
  labs(x = "pos", y = "diff", color = "gt") +
  theme_bw()
```
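The `floor(pos / 1e6)` grouping above differs slightly from the ranges in the question: it puts pos = 1,000,000 into the second group, whereas the question's first range is 1 ~ 1,000,000. A quick check in base R, comparing it with `(pos - 1) %/% 1e6`, which matches the question's ranges exactly (the variable names are illustrative):

```r
# Positions on and around the range boundaries
pos <- c(1, 999999, 1000000, 1000001, 2000000)

old_group <- floor(pos / 1e6)    # 0 0 1 1 2
new_group <- (pos - 1) %/% 1e6   # 0 0 0 1 1

print(data.frame(pos, old_group, new_group))
```

The difference only affects points that sit exactly on a multiple of 1,000,000, so for most data the two groupings give practically the same means.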

Which results in this:

[image: plot of the made-up data]

There's probably a more straightforward way to do this, but I don't know it. Hope this helps.
