Machine Learning and Data Mining in Practice for Biomedicine, Seminar and Lab Course, CS6501-01
Department of Computer Science, University of Virginia
Spring 2014
Course Logistics
Class Description: Rapidly growing biomedical data resources have posed many interesting new challenges for machine learning: the data are large, loosely labeled, highly diverse, complex, and often relationally structured. This course gives students an in-depth understanding of, and hands-on experience with, applying machine learning and data mining methods to real-world biomedical applications. Students are expected to produce top-tier publications by the end of the course.
The course is half seminar and half project, complementing the instructor's lectures.
Class Time: Tuesdays & Thursdays, 12:30pm-1:45pm, starting on Tuesday, Jan 14th, 2014
Instructor: Yanjun (Jane) Qi, Rice Hall 503
Course Website:
• Collab page: submit reading assignments every Thursday and weekly reports every Tuesday
• The instructor’s homepage: http://www.cs.virginia.edu/yanjun/teach/2014s/
• Send corrections or comments to yanjun@virginia.edu
Textbook:
• No textbook
Prerequisites:
• Prior coursework in introductory machine learning or data mining is required.
Grading:
• No exams in this course.
• Sit-in: No. This course is for registered students only.
• Final grades will be based on:
– 50% for participation and in-class paper presentations/discussions (we have 30 classes through April 29th);
– 50% for the quality of the class project delivered by each team.
Course Seminar:
• Every Thursday, the whole class meets @ Rice Hall
• The instructor lectures on the relevant topic during the first half of the class
• The whole class discusses the papers on that topic (assigned every Tuesday) during the second half of the class
• Each student is expected to prepare a two-page slide deck summarizing the assigned reading.
• Students are picked at random to present their slides on the reading
Course Project:
• Every Tuesday, the teams take turns meeting the instructor for 30 minutes each @ Rice Hall 503 (see agenda on Collab)
• Each team is required to formally report its progress on the project (weekly summary + plan of action for the next week)
• I expect each team to submit a formal paper to a relevant top-tier conference by the end of the semester.
Tentative schedule:
Date | Class Index | Contents | Readings | Assignments |
TU Jan 14 | 1 | Overview of class logistics | Overview of Projects | |
Introduction: Overview of Bioinformatics | | | | |
TR Jan 16 | 2 | | SVM + Kernel methods (URL), Andrew Ng lecture videos | reading assignment due |
TU Jan 21 | 3 | Team Meet | Project Initial Plan Due | |
TR Jan 23 | 4 | Gunnar Rätsch, Max Planck Institute, “introduction to bioinformatics”. | reading assignment due | |
TU Jan 28 | 5 | Team Meet | Project Literature Survey Due | |
Topic I: Graph Modeling | | | | |
TR Jan 30 | 6 | Node-Based Learning of Multiple Gaussian Graphical Models, Journal of Machine Learning Research (JMLR13) | reading assignment due | |
TU Feb 4 | 7 | Team Meet | Project Data Survey Due | |
TR Feb 6 | 8 | BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables, NIPS2013 | reading assignment due | |
TU Feb 11 | 9 | Team Meet | Project Progress Report Due | |
TR Feb 13 | 10 | Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions, X. Zhu et al, ICML 2003 | reading assignment due | |
TU Feb 18 | 11 | Team Meet | Project Progress Report Due | |
Topic II: Sequence Modeling | | | | |
TR Feb 20 | 12 | Fast String Kernels using Inexact Matching for Protein Sequence, JMLR 2004 | reading assignment due | |
TU Feb 25 | | Team Meet | Project Progress Report Due | |
TR Feb 27 | 13 | Efficient counting of k-mers in DNA sequences using a Bloom filter. Melsted P, Pritchard JK. BMC Bioinformatics. 2011 | reading assignment due | |
TU Mar 4 | 14 | Team Meet | Project Progress Report Due | |
TR Mar 6 | 15 | Deep Supervised and Convolutional Generative Stochastic Network for | reading assignment due | |
TU Mar 11 | 16 | Team Meet | Project Progress Report Due | |
Topic III: (Bio)-Text Modeling | | | | |
TR Mar 13 | 17 | Natural Language Processing (almost) from Scratch, JMLR 2011 http://arxiv.org/abs/1103.0398 | reading assignment due | |
TU Mar 18 | 18 | Team Meet | Project Progress Report Due | |
TR Mar 20 | 19 | A method for integrating and ranking the evidence for biochemical pathways by mining reactions from text, Bioinformatics 13 | reading assignment due | |
TU Mar 25 | 20 | Team Meet | Project Progress Report Due | |
Topic IV: (BioMed)-Temporal Modeling | | | | |
TR Mar 27 | 21 | Segmenting Time Series: A Survey and Novel Approach, http://www.ics.uci.edu/~pazzani/Publications/survey.pdf | Paper submission round I | |
TU Apr 1 | 22 | Team Meet | Project Progress Report Due | |
TR Apr 3 | 23 | Temporal Graphical Models for | reading assignment due | |
TU Apr 8 | 24 | Team Meet | Project Progress Report Due | |
Topic V: Tools & Optimization | | | | |
TR Apr 10 | 25 | Alternating Direction Method of Multipliers | reading assignment due | |
TU Apr 15 | 26 | Team Meet | Project Progress Report Due | |
TR Apr 17 | 27 | Apache Spark, http://spark.apache.org/docs/latest/; GraphLab (http://graphlab.org/home/) | reading assignment due | |
TU Apr 22 | 28 | Team Meet | Project Progress Report Due | |
TR Apr 24 | 29 | Implementing Neural Networks Efficiently, 2012 http://ronan.collobert.com/pub/matos/2012_implementingnn_springer.pdf | reading assignment due | |
TU Apr 29 | 30 | Team Project Presentations | | Project Progress Report Due |
Project Summary | | | | |
| | Team Project Presentations | | Paper submission round II |
* All papers’ copyrights are reserved by their original copyright owners.