Course information

Course:
BIMS8601: Foundations of Computational Genomics
Term:
Spring 2022
Times:
Tue/Thurs 1:30pm-3:00pm
Location:
McKim Hall 1023 (BEC Classroom)
Description:
Students will learn the foundations of computational methods for analyzing experimental data from genome, epigenome, and transcriptome sequencing experiments. The course will cover a range of biological data types (whole genome sequencing, ChIP-seq, ATAC-seq, RNA-seq, and DNA methylation profiling/bisulfite sequencing), together with the relevant algorithms, statistical and computational methods, and application areas in genomics and systems biology. Prior coursework or experience in linear algebra, UNIX, and programming in R and Python is required.
Grading:
  • Homework assignments - 60% (6 assignments)
  • Final presentation - 15% (preparation ≈ 1 assignment)
  • Class participation - 15%
  • Scribing - 10% (1 class session)

Lectures

# Date / General topic / Topics covered / Instructor
1 02/15/22
Course overview and introduction to computational genomics
Course overview, git/GitHub, history of genomics and sequencing technology, genomic data scale, popular topics in genomics, computing in genomics
Nathan Sheffield
2 02/17/22
Statistics and probability review 1
Random Variables, Probability Distributions, Expectation, Variance, Moment-Generating Functions, Central Limit Theorem
Stefan Bekiranov
3 02/22/22
Statistics and probability review 2
Statistical tests, p-values, type I and type II errors, multiple testing correction, FDR, ROC
Chongzhi Zang

Unit 1: Genome

4 02/24/22
Fundamental string matching algorithms
Local vs. global alignment, Dynamic programming, Heuristic approaches, BLAST
Aakrosh Ratan
5 03/01/22
Suffix trees, suffix arrays, and the Burrows-Wheeler transform
Short-read alignments
Aakrosh Ratan
6 03/03/22
Bayes' theorem, likelihood, and expectation-maximization
Variant calling, structural variants
Aakrosh Ratan
7 03/08/22
Spring Recess
8 03/10/22
Spring Recess
9 03/15/22
De Bruijn graphs and string graphs
Genome assembly
Aakrosh Ratan
10 03/17/22
Hidden Markov Models (HMMs)
Gene finding, CpG islands and chromatin states, Gibbs sampling, expectation-maximization
Aakrosh Ratan
11 03/22/22
Linear Regression, Chi-Squared Test of Independence
Genome-wide association studies, eQTLs
Stefan Bekiranov

Unit 2: Epigenome

12 03/24/22
Regulatory DNA, Transcription factors, Sequence motifs
PWMs, information entropy, motif finding algorithms
Chongzhi Zang
13 03/29/22
ChIP-seq, Epigenome profiles, Peak detection
ChIP-seq, read mapping, epigenomic profile construction, narrow peak calling
Chongzhi Zang
14 03/31/22
Epigenomic domains, Hierarchy and scales of genome structure
Histone modifications, broad peak calling, chromatin domains, 3D genome basics
Chongzhi Zang
15 04/05/22
Genomic intervals: formats, data structures and algorithms
Genomic intervals; genomic interval file formats; interval operations; interval data structures (R-trees, B+ trees, NCList); interval search
Nathan Sheffield
16 04/07/22
ATAC-seq diagnostics and harmonization
ATAC-seq count data; data diagnostics; clip functions; consensus peaks; tests of normality; quantile normalization; Q-Q plots; batch correction
Nathan Sheffield
17 04/12/22
Scalable computing in genomics
Parallelization, workflow management, optimization, Big-O complexity, efficient processing of large sequencing data
Nathan Sheffield

Unit 3: Transcriptome

18 04/14/22
Genomic data standards and reference genomes
Standards and interoperability; GA4GH; Reference genomes; refget; sequence collections; APIs; other standards
Nathan Sheffield
19 04/19/22
K-mer analysis
RNA pseudoalignment; membership testers; Bloom filters
Nathan Sheffield
20 04/21/22
Dimensionality reduction
Curse of dimensionality, PCA, NMF, t-SNE, UMAP
Stefan Bekiranov
21 04/26/22
Differential expression analysis
Microarray and bulk RNA-seq analysis
Stefan Bekiranov
22 04/28/22
Spatial omics, Encoding of genomic data
MERFISH, spatial transcriptomics, simplex encoding, Hamming codes
Chongzhi Zang
23 05/03/22
Clustering, transcriptomic data integration
Clustering algorithms, regulatory networks, transcriptional regulation
Chongzhi Zang

Final presentations

24 05/05/22
Final Presentations
25 05/10/22
Final Presentations

Assignments

Throughout the semester, there will be about 6 homework assignments. These are typically programming assignments that involve implementing a method or algorithm or performing a data analysis, and they may also include written components or theoretical problems. Each assignment will generally span about two weeks, but there is no fixed schedule and due dates will vary by assignment. Each assignment is worth 10% of the final grade.

Students should complete assignments individually. You are encouraged to work together at the level of sharing ideas, concepts, suggested functions, or reading material, but you should not share or seek out completed solutions to the assignments.

Scribing

Each student will be assigned a single class session to serve as scribe. The role of the scribe is to take detailed notes for the class on the topic of the class session. This should include background preparation before the assigned class session, note-taking during the class session, expansion of related topics discussed in the class, and final polishing and writing up of the notes after the class session.

Other class members are welcome to also contribute to notes for any class session, but the primary responsibility for polishing and integrating the notes belongs to the scribe.

The notes should be submitted to the class scribe repository on GitHub.

At the end of the course, all class members will have access to a public “book” of compiled class notes.

Class participation

Students are expected to attend class. There is no textbook, but each lecture will have reading material posted, and students should read it before the lecture. You should plan to invest roughly 3 hours per week on the posted outside reading. We will not have exams or otherwise test your reading, so this is up to you; the 3-hour figure is our guideline to help you get the most benefit from the class. Feel free to adjust it moderately depending on how much interest you have in a given topic. The lectures will be most useful if you do the reading beforehand, so that you come prepared with enough background to ask questions.

Final presentation

The final presentation should cover one or more methods in computational genomics, either a method covered in class or one we did not cover that you would like to present to the class. The talk should include an introduction giving context for the method, the details of the method, and a research application. The application may come from your own research area, or it could be a research question about extending the method. Aim for a 10-minute talk, followed by a couple of minutes for questions. Think of the final presentation as roughly equivalent to one homework assignment in terms of expected preparation.

Office hours

Because the course has several instructors, we do not plan to hold regular office hours, but students should feel free to e-mail any instructor to schedule a meeting. We will be available to meet individually with students as needed.

Missing lectures

If you need to miss a lecture, we will address it on a case-by-case basis. For example, you might go through the slides, study the topic on your own, and contribute your notes to the scribe repository, or we may record the lecture for you to review on your own.

Recordings

We do not intend to record lectures in general, but instructors may decide to record on a lecture-by-lecture basis, either for students who miss class or for other reasons. The University prohibits the recording of live class sessions unless all students have been informed that recording will occur and may be stored. Therefore, please note that classes may be recorded at the discretion of the instructor. Any recording will follow UVA protocol: it will be stored for instructional purposes, shared only with students enrolled in the same class during the same term, and kept only on University-owned, password-protected sites.