Rhee ChIP-exo Analysis

Rhee, H. S., Bataille, A. R., Zhang, L., & Pugh, B. F. (2014). Subnucleosomal Structures and Nucleosome Asymmetry across a Genome. Cell, 159(6), 1377–1388. doi:10.1016/j.cell.2014.10.054

cell_2014_rhee_pugh.pdf

Summary: Genome-wide paired-end ChIP-exo of histones H2A, H2B, H3, and H4 in S. cerevisiae, plus ChIP-exo of H3K4me3, H3K79me2/3, H3K36me3, H3K9ac, H2Bub, and H2A.Z.

NCBI Data Deposition: http://www.ncbi.nlm.nih.gov/bioproject/PRJNA178297
SRA Experiments from this paper: SRA059355
SRX Experiment SRX655410: ChIP-exo histones 3

ChIP-exo Histone H3 (Biological Replicate 2): SRR1517820

Data Analysis

Download the SRA file for SRR1517820 (paired-end ChIP-exo) and split it into two FASTQ files (R1 and R2) with:

fastq-dump --split-3 SRR1517820.sra
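As a quick sanity check after the split, here is a minimal Python sketch that counts the records in each file. R1 and R2 should contain the same number of reads. It assumes uncompressed FASTQ with four lines per record (fastq-dump's default output) and the <accession>_1.fastq / <accession>_2.fastq names that --split-3 writes for paired reads. The helper names are hypothetical, not part of the SRA Toolkit.

```python
def count_fastq_records(path):
    """Count reads in an uncompressed FASTQ file with four lines per record."""
    with open(path) as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        # A partial record usually means a truncated download or extraction.
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def check_split_fastq(prefix):
    """Return (R1 count, R2 count) for <prefix>_1.fastq and <prefix>_2.fastq."""
    r1 = count_fastq_records(f"{prefix}_1.fastq")
    r2 = count_fastq_records(f"{prefix}_2.fastq")
    return r1, r2
```

For example, check_split_fastq("SRR1517820") run in the download directory should return two equal counts. If --split-3 also wrote an unpaired SRR1517820.fastq, those reads have no mate and are not counted here.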

Take a look at the following sample read fragment schematic: read_seq_explanation.png

Since lambda exonuclease is direction-specific (it degrades one strand of double-stranded DNA in the 5' -> 3' direction and stops at the crosslinked protein; see Figure 1A of their original paper, Cell_2011_Rhee_Pugh.pdf), the 5' end of the read marks the protein-DNA boundary.
In single-end ChIP-exo, the 5' end of the read is the boundary position, regardless of whether the read aligns to the forward or reverse strand.
In ordinary paired-end sequencing we usually don't care which strand (forward or reverse) the R1 read aligns to. In paired-end ChIP-exo, however, the R1 read is the one whose 5' end corresponds to the protein-DNA boundary, and the R2 read is used only to improve mapping quality. (Note: this is my interpretation; their methods don't state it explicitly. If your plot doesn't match theirs, this could be the reason.)
Thus, after paired-end alignment, filter the alignments so that only the R1 reads are kept. This can be done during the SAM -> BAM conversion; -f 64 keeps only records with SAM flag bit 0x40 (first read in the pair) set:
samtools view -b -S -h -f 64 -o <bam_file_name>.bam <bowtie_output>.sam
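The two rules above (keep only first-in-pair records, then take each read's 5' end) can be sketched in plain Python on SAM text lines. This is a hypothetical helper, not part of samtools. It tests flag bit 0x40 as -f 64 does, additionally skips unmapped records (0x4), which -f 64 alone would keep, and for reverse-strand records (0x10) places the 5' end at the rightmost aligned base, using the CIGAR string to get the aligned reference length. Coordinates are 1-based, as in SAM.

```python
import re

# CIGAR operations that consume reference bases (SAM spec: M, D, N, =, X)
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = frozenset("MDN=X")


def reference_length(cigar):
    """Number of reference bases spanned by an alignment, from its CIGAR string."""
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)


def five_prime_end(sam_line):
    """Return (chrom, 1-based 5' position, strand) for a mapped R1 record, else None."""
    if sam_line.startswith("@"):
        return None  # header line
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if not flag & 0x40:
        return None  # not the first read in the pair (same test as samtools -f 64)
    if flag & 0x4:
        return None  # unmapped: no position to report
    chrom, pos, cigar = fields[2], int(fields[3]), fields[5]
    if flag & 0x10:
        # Reverse strand: POS is the leftmost aligned base, so the read's
        # 5' end is the rightmost aligned base.
        return chrom, pos + reference_length(cigar) - 1, "-"
    # Forward strand: POS is already the 5' end.
    return chrom, pos, "+"
```

Applied line by line to the bowtie SAM output, this yields one (chrom, position, strand) tuple per R1 alignment. Soft-clipped bases are ignored here, so the reported 5' end is that of the aligned portion of the read.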

The first goal is to plot H3 around the TSSs and see whether we can recreate their Figure 1A (third panel from the left). I recreated that figure from their published data in Supplementary Table S2 (H3 worksheet); the result is 2014_cell_rhee_pugh_figure_1A_h3.png. Note that the genes in that plot are ordered by H3 occupancy at the +1 nucleosome (you don't have to do this).

A list of TSS positions is in this CSV file on alchemy:
/data/illumina_pipeline/scripts/feature_files/yeast/gene_tables/xu_tss_sites_david_async_roberts_g1_alpha_expr.csv

Try to recreate this plot around the TSSs in that feature file. Remember to flip the genes on the negative strand, so that the direction of transcription always runs left to right in these plots. Plot only the 5' end of each R1 read, following read_seq_explanation.png: for forward-strand reads the SAM alignment position (POS) is already the 5' end, but for reverse-strand reads POS is the leftmost aligned base, so the 5' end lies downstream at POS + read length - 1 (POS + 39 for the 40-bp reads here, assuming no indels or clipping).
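The counting and flipping step can be sketched as below. This is a minimal Python sketch under several assumptions. The TSS coordinates are taken as already parsed into (chrom, tss_pos, strand) tuples, because the CSV's column names are not given here. Chromosome names in the CSV are assumed to match those in the alignments (e.g. "chrI" vs "I" would silently give empty rows). All coordinates are 1-based, and the 500-bp window is an arbitrary choice. The function names are hypothetical.

```python
import bisect
from collections import defaultdict


def index_read_ends(read_ends):
    """Group 1-based 5' positions by chromosome and sort them for range queries."""
    by_chrom = defaultdict(list)
    for chrom, pos, _strand in read_ends:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()
    return by_chrom


def tss_matrix(read_ends, tss_sites, window=500):
    """Build one row of 5'-end counts per TSS, covering offsets -window..+window.

    tss_sites: iterable of (chrom, tss_pos, gene_strand), gene_strand "+" or "-".
    Genes on "-" are flipped so transcription always runs left to right:
    offset = pos - tss on "+" genes and tss - pos on "-" genes.
    """
    index = index_read_ends(read_ends)
    width = 2 * window + 1
    rows = []
    for chrom, tss, strand in tss_sites:
        row = [0] * width
        positions = index.get(chrom, [])
        # Binary search for the reads that fall inside [tss - window, tss + window].
        lo = bisect.bisect_left(positions, tss - window)
        hi = bisect.bisect_right(positions, tss + window)
        for pos in positions[lo:hi]:
            offset = pos - tss if strand == "+" else tss - pos
            row[offset + window] += 1
        rows.append(row)
    return rows
```

Each row is one gene, and column window + offset holds the number of R1 5' ends at that distance from the TSS. Averaging down the columns gives a composite profile, and the rows can be drawn directly as a heatmap, in whatever gene order you choose.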