Class syllabus for Fall 2004
This course studies the practical aspects of large-scale genomic data processing. Emphasis will be placed on projects that carry out major genomic data processing steps. Important bioinformatic tools licensed from various genome research centers will be used in the projects. Students will also be taught to write simple scripts to process their own data. Topics include base-calling, raw sequence cleaning and contaminant removal; sequence database construction, search and update; shotgun assembly procedures and EST clustering methods; genome closure strategies and practices; sequence homology search and function prediction; annotation and submission of GenBank reports; data collection and dissemination through the Internet; and scripting languages for linking steps together into an automatic genomic data processing pipeline. Additional topics in post-genomic research will also be discussed, including whole-genome oligo microarray design, microarray image processing, expression profile clustering, and pathway databases. See the class schedule for an overview of the tentative course content for this semester.
Class place and time
Classes will be held TuTh 11:00-12:15 in Molecular Biology 1424.
Mr. Sang-Jin Kim
Class web page: http://www.cs.iastate.edu/~cs596
A few words about the class account and database
Genomic data can be enormous. For example, the chromatogram files used to assemble a single 150 kb BAC can take up to 1 GB of disk space in total! Although our projects are designed so that you will not need to keep much data at the end, you must be careful about how much disk space your temporary results occupy in your account and/or the database. Remove unnecessary files and tables as soon as possible to free up space for your next project. If you exceed your disk quota, you may be denied access to your account until you delete some files. This limitation is imposed not by the instructor but by the system administrators of the machines you use.
The wide spectrum of class topics cannot be covered by a single textbook, and most information about the bioinformatic tools can be obtained from multiple sources on the Internet. We therefore have not chosen a textbook for this class. Instead, listed below are important reference books that cover certain topics of this class. Obtaining these reference books is optional, and you can decide for yourself whether you want to keep any of them. We will rely on lecture notes, online documentation, and class handouts as our main teaching material. If you wish to pursue a career in this field, however, the following books may be worth collecting:
Most relevant reference books:
Programming/Unix reference books:
Reference books about bioinformatic algorithms:
Reference books about genomic research lab procedures:
Final exam: Wednesday, December 15 at 9:45 AM.
There will be a final exam, which will make up 30% of your semester score. The remaining 70% of your score will be divided among 7 projects. Some projects are very easy, but others require some programming effort. The weight of each project will be determined by its difficulty. If you have completed all projects by the end of the semester, the instructor may give you 5% extra credit, especially when your score is at the borderline between two grade levels. This is solely at the discretion of the instructor and cannot be negotiated. Because your performance on the projects determines the majority of your final score, you are advised to start working on each project as early as possible. The due date of each project is clearly indicated in the class schedule. Late projects will be penalized at the rate of 5% for the first day, 15% for the second day and 20% thereafter. Since we will be working through successive genomic data processing steps, some projects depend on the results of previous ones; if you skip a project, it may be hard to finish the rest. At the end of the semester, everybody's scores will be normalized by setting the highest score to 100 points and scaling the others accordingly. Your score will then be rounded to the nearest whole point, and your letter grade will be determined by the following table:
To prepare for this class, the instructor has gone to great effort to collect many important bioinformatic software tools from genome research centers and companies around the world. We also license their actual genomic data as project material. Our licensers support higher education and are willing to let us use their proprietary software tools and data free of charge. However, they have asked that we use them only in our class and never distribute them outside of it. To comply with the licensing requirements set forth by our licensers, and to avoid the termination of our licenses should any of us violate them, you need to agree to this license agreement. Print out a copy, fill in your account information so we can enable your access to the class material, sign it, and give it back to the instructor. It is very important that you agree to and abide by the licensing agreement so that this course can continue to be offered. If you have any concerns about the licensing agreement, feel free to discuss them with the instructor.
Any academic dishonesty, including but not limited to exchanging program code, cheating during exams, plagiarizing projects, fabricating results, and sabotaging others' efforts, will be viewed as an academic offense and will result in an F grade. Serious cases will be forwarded to the appropriate university committees for additional disciplinary action. General discussions of projects at a conceptual level, including sharing experiences in the use of tools and helping with debugging, are allowed and encouraged.