Machine Learning and Data Mining

In Practice for Biomedicine,

Seminar and Lab Course, CS6501 - 01

 

Department of Computer Science, University of Virginia

Spring 2014

 

 

 

 

 

Course Logistics

 

Class Description: Rapidly growing biomedical data resources have posed many interesting new challenges for machine learning: the data are large, loosely labeled, highly diverse, complex, and often relationally structured. This course gives students an in-depth understanding of, and hands-on experience with, applying machine learning and data mining methods to real-world biomedical applications. Students are expected to produce top-tier publications by the end of the course.

 

The course is half seminar and half project, complementing the instructor's lectures.

 

 

Class Time: Tuesdays & Thursdays, 12:30pm-1:45pm, starting Tuesday, Jan 14th, 2014

 

Instructor: Yanjun (Jane) Qi, Rice Hall 503

 

Course Website:

    Collab page: submit reading assignments every Thursday and weekly reports every Tuesday

    Instructor's homepage: http://www.cs.virginia.edu/yanjun/teach/2014s/

    Corrections or comments to yanjun@virginia.edu

 

Textbook:

    No textbook

 

Prerequisites:

    Prior coursework in introductory machine learning or data mining is required.

 

Grading:

    No exams in this course.

    Sit-in: No.  This course is for registered students only.

 

    Final grades will be based on:

    – 50% for participation and in-class paper presentations/discussions (there are 30 classes through April 29th);

    – 50% for the quality of the class project delivered by each team.

 

Course Seminar:

    Every Thursday, the whole class meets in Rice Hall.

    The instructor lectures on the relevant topic during the first half of the class.

    The whole class discusses the papers on that topic (assigned every Tuesday) during the second half of the class.

    Each student is expected to prepare a two-slide summary of the assigned reading.

    Students are picked at random to present their slides on the reading.

 

Course Project:

    Every Tuesday, the teams take turns meeting the instructor for 30 minutes in Rice Hall 503 (see the agenda on Collab).

    Each team is required to give a formal progress report on the project (weekly summary + plan of action for the next week).

    I expect each team to submit a formal paper to a relevant top-tier conference by the end of the semester.

 

 

Tentative schedule: 

 

Date | Class Index | Contents | Readings | Assignments

TU Jan 14 | 1 | Overview of class logistics; Overview of Projects | |

Introduction: Overview of Bioinformatics

TR Jan 16 | 2 | | SVM + Kernel methods (URL), Andrew Ng lecture videos | reading assignment due
TU Jan 21 | 3 | Team Meet | | Project Initial Plan Due
TR Jan 23 | 4 | PDF-1, PDF-2 | Gunnar Rätsch, Max Planck Institute, “introduction to bioinformatics”. | reading assignment due
TU Jan 28 | 5 | Team Meet | | Project Literature Survey Due

Topic I: Graph Modeling

TR Jan 30 | 6 | PDF | Node-Based Learning of Multiple Gaussian Graphical Models, Journal of Machine Learning Research (JMLR13) | reading assignment due
TU Feb 4 | 7 | Team Meet | | Project Data Survey Due
TR Feb 6 | 8 | PDF | BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables, NIPS2013 | reading assignment due
TU Feb 11 | 9 | Team Meet | | Project Progress Report Due
TR Feb 13 | 10 | PDF | Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions, X. Zhu et al, ICML 2003 | reading assignment due
TU Feb 18 | 11 | Team Meet | | Project Progress Report Due

Topic II: Sequence Modeling

TR Feb 20 | 12 | PDF | Fast String Kernels using Inexact Matching for Protein Sequence, JMLR 2004 | reading assignment due
TU Feb 25 | | Team Meet | | Project Progress Report Due
TR Feb 27 | 13 | URL | Efficient counting of k-mers in DNA sequences using a Bloom filter. Melsted P, Pritchard JK. BMC Bioinformatics. 2011 | reading assignment due
TU Mar 4 | 14 | Team Meet | | Project Progress Report Due
TR Mar 6 | 15 | PDF | Deep Supervised and Convolutional Generative Stochastic Network for Protein Secondary Structure Prediction, ICML 2014 | reading assignment due
TU Mar 11 | 16 | Team Meet | | Project Progress Report Due

Topic III: (Bio)-Text Modeling

TR Mar 13 | 17 | PDF | Natural Language Processing (almost) from Scratch, JMLR 2011 http://arxiv.org/abs/1103.0398 | reading assignment due
TU Mar 18 | 18 | Team Meet | | Project Progress Report Due
TR Mar 20 | 19 | PDF | A method for integrating and ranking the evidence for biochemical pathways by mining reactions from text, Bioinformatics 13 | reading assignment due
TU Mar 25 | 20 | Team Meet | | Project Progress Report Due

Topic IV: (BioMed)-Temporal Modeling

TR Mar 27 | 21 | PDF | Segmenting Time Series: A Survey and Novel Approach, http://www.ics.uci.edu/~pazzani/Publications/survey.pdf | Paper submission round I
TU Apr 1 | 22 | Team Meet | | Project Progress Report Due
TR Apr 3 | 23 | PDF | Temporal Graphical Models for Cross-Species Gene Regulatory Network Discovery. CSB10 | reading assignment due
TU Apr 8 | 24 | Team Meet | | Project Progress Report Due

Topic V: Tools & Optimization

TR Apr 10 | 25 | URL | Alternating Direction Method of Multipliers, http://www.yorkey.tk/wp-content/uploads/2013/03/admm.pdf | reading assignment due
TU Apr 15 | 26 | Team Meet | | Project Progress Report Due
TR Apr 17 | 27 | URL | Apache Spark (http://spark.apache.org/docs/latest/); GraphLab (http://graphlab.org/home/) | reading assignment due
TU Apr 22 | 28 | Team Meet | | Project Progress Report Due
TR Apr 24 | 29 | PDF | Implementing Neural Networks Efficiently, 2012, http://ronan.collobert.com/pub/matos/2012_implementingnn_springer.pdf | reading assignment due
TU Apr 29 | 30 | Team Project Presentations | | Project Progress Report Due

Project Summary

 | | Team Project Presentations | | Paper submission round II

* All papers’ copyrights are reserved by their original copyright owners.