CS1315: Introduction to Media Computation - PowerPoint PPT Presentation

* I usually bring in a couple musical instruments (harmonica, thumb piano, ukelele, flute) ... Introduction to Media Computation Author: Mark Guzdial Last modified by: – PowerPoint PPT presentation

Slides: 79
Provided by: MarkG210
Learn more at: http://www.cs.uml.edu
1
COMP.3500/5800 Topics in Bioinformatics
  • Webpage
  • http://www.cs.uml.edu/kim/580.html
  • What is bioinformatics?
  • Study of DNA sequences, genomes, proteins
  • Modeling/Inference
  • What is involved in bioinformatics?
  • Biology
  • Statistics, Linear Algebra
  • Computer Science
  • Algorithms
  • Machine learning

2
COMP.3500/5800 Topics in Bioinformatics
  • Why study bioinformatics?
  • What makes us different? Our genomes are 99.9%
    identical
  • How do different cells develop from the same
    genome?
  • Study mutations in the genome -> drug development
  • For CS,
  • Study how bioinformatics algorithms are
    implemented

3
Topics today
  • Textbook, pgs. 16-19
  • DNA, RNA
  • Genes to Proteins
  • Transcription
  • Translation
  • Genome

4
DNA RNA Genes to Proteins Genomes
5
DNA and RNA
  • DNA (deoxyribonucleic acid) and RNA (ribonucleic
    acid) are composed of linear chains of monomeric
    units of nucleotides
  • A nucleotide has three parts: a sugar, a phosphate,
    and a base
  • Four bases

6
Base Types
  • Nucleic acid bases are of two types
  • Pyrimidine: C, T, U (two nitrogens
    in a 6-member ring at positions 1 and 3)
  • Purine: A, G (pyrimidine ring fused to an
    imidazole ring (C3H4N2))

7
[Figure: nucleotide ambiguity codes R, Y, W, M, K, B, S, N, H, V, D, each shown against the bases A T G C]
8
Primary Structure of DNA and RNA
  • Nucleotides are joined by phosphodiester bonds
    and form sugar-phosphate backbone
  • Sugar is deoxyribose in DNA (left) and ribose in
    RNA (right)
  • Nitrogen-containing nucleobases are bonded to
    sugar

9
Online course on Biology
  • Educational Portal
  • DNA chemical structure
  • http://education-portal.com/academy/lesson/dna-and-the-chemical-structure-of-nucleic-acids.html

10
Secondary Structure
  • Double helix: 1953, Watson and Crick, using X-ray
    diffraction
  • Sugar-phosphate backbone is the outer part of the
    helix
  • Two strands run in antiparallel directions
  • Dimensions
  • Inside diameter of backbone: 11 Å (1.1 nm)
  • Outer diameter: 20 Å (1 Å = 10^-10 m = 0.1 nm)
  • Length of one complete turn: 34 Å, 10 base pairs
  • Major and minor grooves: where drugs or polypeptides
    bind to DNA

11
Secondary Structure of DNA
  • Two strands are complementary
  • Base pairing A-T G-C
  • Pyrimidine and Purine form complementary H
    bonding

12
Monomer counts in DNA
  • In double strands
  • # of A = # of T, # of G = # of C
  • Erwin Chargaff's 1st Parity Rule, 1951
  • In a single strand?
  • # of A ≈ # of T, # of G ≈ # of C
  • Erwin Chargaff's 2nd Parity Rule
  • How about oligomer (a few successive bases)
    frequencies?
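
The first parity rule follows directly from base pairing, which a short count can illustrate. This is a minimal sketch on a made-up 8-base strand; the function names are illustrative and not from the course material.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the partner strand, read 5'->3' (input: A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def base_counts(seq):
    """Count each of the four bases in a sequence."""
    counts = Counter(seq)
    return {base: counts.get(base, 0) for base in "ACGT"}

strand = "ACTAAGCG"                       # toy example strand
partner = reverse_complement(strand)      # the complementary strand
single = base_counts(strand)              # counts in one strand only
double = base_counts(strand + partner)    # counts over both strands

print(single)   # no equality is forced within one short strand
print(double)   # A == T and G == C over the double strand
```

In the double strand the equalities are exact by construction. The second parity rule is an empirical observation about long single strands, so a toy strand like this one is not expected to satisfy it.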

13
Oligomer Frequencies
  • Oligomer length k
  • Window of k sliding by one base (overlapping k-1
    bases)
  • A simple counting program
  • May have to contend with long sequences
  • An oligomer and its reverse complement
  • ACT vs. AGT

[Figure: a window sliding one base at a time along the sequence A C T A A G C G]
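
The "simple counting program" could look like the sketch below: a window of length k slides one base at a time and each k-mer is tallied. The function names and the toy sequence are assumptions for illustration; for very long sequences the same loop could be fed chunk by chunk, since the counter never holds more than 4^k keys.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Reverse complement of a DNA string, e.g. ACT -> AGT."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def kmer_counts(seq, k):
    """Count overlapping k-mers; consecutive windows share k-1 bases."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

counts = kmer_counts("ACTAAGCG", 3)
for kmer, n in sorted(counts.items()):
    # Show each trimer next to its reverse complement.
    print(kmer, n, reverse_complement(kmer))
```

A sequence of length n yields n - k + 1 windows, so the 8-base example gives 6 trimers.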
14
Trimer Frequencies in Yeast
15
Trimer Frequencies in a Few Species
16
Importance of Hydrogen Bonding
  • Many consider the hydrogen bond essential to the
    evolution of life
  • An individual hydrogen bond is weak, but many H bonds
    collectively exert a very strong force
  • Orderly repetitive arrangement of H bonds in
    polymers determines their shape

17
(No Transcript)
18
Online course on Biology
  • Educational Portal
  • Four bases
  • http://education-portal.com/academy/lesson/dna-adenine-guanine-cytosine-thymine-complementary-base-pairing.html

19
Chromosome Length
  • 3.4 Å per base
  • 3 billion bases
  • 1.8 meters of DNA
  • 0.09 mm of chromatin after being wound on
    histones
  • Five families of histones
  • H1/H5, H2A, H2B, H3, and H4

20
(No Transcript)
21
RNA
  • Sugar in RNA nucleotide is ribose rather than
    2-deoxyribose
  • Thymine is replaced by uracil (U)
  • RNA polymers are usually a few thousand
    nucleotides or shorter
  • RNA in cells is usually single-stranded
  • RNA is considered to be the original gene-coding
    material, and it still codes genes in a few viruses

22
RNA Types
  • Four RNAs are involved in protein synthesis

RNA Type           Size      Function
Transfer RNA       Small     Transports AA to protein synthesis sites
Ribosomal RNA      Variable  Combines with proteins to form the ribosome, where the polypeptide chain grows
Messenger RNA      Variable  Transcribes AA sequence from genes
Small nuclear RNA            Processing of initial mRNA to its mature form in eukaryotes
23
Online course on Biology
  • Educational Portal
  • RNA
  • http://education-portal.com/academy/lesson/differences-between-rna-and-dna-types-of-rna-mrna-trna-rrna.html

24
DNA RNA Genes to Proteins Genomes
25
Central Dogma in Molecular Biology
  • Info flows in one direction
  • DNA/genome
  • A template or a roadmap
  • RNA
  • Copies of genes to be expressed (activated)
  • Protein
  • Biochemical molecules performing biological
    functions

26
Gene to Protein Transcription Translation
27
Translation
28
Transcription
coding, sense
anti-sense
29
Transcription
30
Gene to Protein
[Figure labels: 5' UTR, protein coding region, 3' UTR, mRNA, exon, intron, intergenic, UTR, non-protein coding region, Protein 1, Protein 2]
31
Alternative Splicing
32
Translation
  • Genetic Code
  • A triplet (called codon)
  • Ribosome moves along mRNA 3 bases at a time
  • Degenerate coding
  • 4 x 4 x 4 = 64 possible triplets map onto 20 amino acids
  • For 8 AAs the 3rd base is irrelevant: immune to
    mutation at that position
  • Anti-codon: reverse complement of a codon
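
The counts on this slide can be checked against the standard genetic code. The sketch below assumes the standard nuclear code written in the common TCAG ordering (an assumption, since the slides show the code only as figures) and uses DNA letters for the coding strand; all names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TTT, TTC, TTA, TTG, TCT, ... order.
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate a coding-strand DNA string codon by codon until a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# Amino acids with a codon "box" in which the third base does not matter.
third_base_free = sorted(
    CODON_TABLE[a + b + "T"]
    for a in BASES
    for b in BASES
    if len({CODON_TABLE[a + b + c] for c in BASES}) == 1
)

print(len(CODON_TABLE))        # number of triplets
print(third_base_free)         # amino acids whose box ignores the 3rd base
print(translate("ATGAAATAA"))  # Met-Lys, then stop
```

Under this table the list of third-base-free amino acids has eight entries, matching the count on the slide.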

33
Genetic Code
34
Genetic Code
35
Genetic Code
36
Amino Acids
  • General structure of amino acids
  • an amino group
  • a carboxyl group
  • α-carbon bonded to a hydrogen and a side-chain
    group, R
  • R determines the identity of a particular amino
    acid
  • Protein: a sequence of AAs
  • R: large, white and gray
  • C: black
  • Nitrogen: blue
  • Oxygen: red
  • Hydrogen: white

37
Translation and Transcription in Detail
38
(No Transcript)
39
mRNA (messenger RNA) another view
40
Transcription
  • Gene sequence is copied from one strand
  • Sense strand = mRNA sequence (with U in place of T)
  • Antisense strand is used as the template to generate
    the mRNA sequence
  • 5' CGCTATAGCGTTTCAT 3' -- antisense, template
    strand
  • 3' GCGATATCGCAAAGTA 5' -- sense, coding strand
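
The relationship between the two strands and the mRNA can be sketched in a few lines, using the template sequence from this slide. The function names are illustrative assumptions.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Reverse complement of a DNA string, so the result also reads 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def transcribe(template_5to3):
    """mRNA (5'->3') produced from a template (antisense) strand given 5'->3'."""
    sense = reverse_complement(template_5to3)  # coding strand, 5'->3'
    return sense.replace("T", "U")             # RNA uses uracil instead of thymine

template = "CGCTATAGCGTTTCAT"          # antisense strand from the slide, 5'->3'
sense = reverse_complement(template)   # ATGAAACGCTATAGCG, 5'->3'
mrna = transcribe(template)

print(sense[::-1])  # sense strand written 3'->5', as on the slide
print(mrna)
```

Read 5'->3', the sense strand starts with ATG, so the resulting mRNA begins with the start codon AUG.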

41
Template, anti-sense
sense
42
RNAs
  • Protein coding RNA mRNA
  • Non-coding RNAs
  • tRNA (transfer RNA)
  • rRNA (ribosomal RNA)
  • Involved in protein synthesis
  • 80-85% of total RNAs
  • snRNA (small nuclear RNA)
  • Localized to the nucleus
  • Consists of families of RNAs responsible for
    functions such as RNA splicing
  • E.g., the spliceosome: five snRNAs U1, U2, U4, U5, and
    U6
  • snoRNA (small nucleolar RNA)
  • Synthesis of rRNA occurs in nucleolus,
    specialized structure within the nucleus,
    facilitated by snoRNAs
  • miRNA (micro RNA)
  • Short (about 22 nt), regulates gene expression

43
Ribosomes for Translation
  • Role of ribosomes in protein synthesis
  • Coordinate protein synthesis by placing mRNA,
    tRNA and protein factors in their correct
    positions
  • Components of ribosomes catalyze at least some
    chemical reactions occurring during translation
  • Each ribosome consists of two subunits
  • 60S and 40S in eukaryotes (rRNAs 18S, 28S, 5.8S, cut from a 45S precursor, plus 5S)
  • 50S and 30S in bacteria (rRNAs 16S, 23S, 5S)

44
Schematic drawing of secondary structure for 16S
rRNA. Intrachain folding pattern includes loops
and double-stranded regions.
45
Protein Synthesis
  • Translation from mRNA to protein
  • mRNA is transported out of a cell nucleus,
    translated
  • tRNA (transfer RNA)

46
(No Transcript)
47
tRNA
  • Anti-codon and AA ends do not know each other
  • Aminoacyl-tRNA synthetase recognizes a DHU loop
    and determines which AA is attached to a tRNA

48
Is the genetic code arbitrary?
  • http://www.cs.uml.edu/kim/580/SA_genetic_code.pdf
  • Douglas Hofstadter, Scientific American, early
    80s
  • Where is the genetic code stored?
  • The ribosome translates a codon to an AA
  • How does the ribosome know which AA to attract for a
    codon?
  • tRNA has anti-codon in one end and AA in the
    other
  • Does a tRNA know the genetic code ?
  • No. Aminoacyl-tRNA synthetase matches up DHU loop
    to an AA

49
Is the genetic code arbitrary?
  • In a cell
  • Remove all mRNA and tRNA and discard them
  • Remove all DNA, modify it according to a new
    genetic code, and insert it back into the cell
  • Leave all ribosomes and others intact
  • Will the cell function the same way (or, will the
    same set of proteins be manufactured by the new
    genetic code) ?

50
Is the genetic code arbitrary?
  • New DNA has new coding for tRNA, mRNA
  • New tRNA
  • According to DHU loop, synthetase matches new
    anti-codon to an AA which would have been matched
    up by the old genetic code
  • Therefore, new genetic code will generate the
    same proteins
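
The thought experiment can be mimicked with a toy model. Everything in this sketch is an assumption made for illustration: three hypothetical tRNAs, one codon per amino acid, no wobble pairing, and a synthetase that looks only at a tRNA's identity tag (standing in for the feature the slides call the DHU loop), never at its anticodon.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon_of(codon):
    """Anticodon that pairs with a codon: its reverse complement."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(codon))

# The synthetase charges a tRNA according to its identity, not its anticodon.
SYNTHETASE = {"tRNA-Met": "M", "tRNA-Lys": "K", "tRNA-Ser": "S"}

def build_trnas(code):
    """tRNA genes under a given code: anticodon -> tRNA identity."""
    return {anticodon_of(codon): identity for identity, codon in code.items()}

def encode(protein, code):
    """Write a gene (as mRNA) for a protein under a given code."""
    codon_for_aa = {SYNTHETASE[identity]: codon for identity, codon in code.items()}
    return "".join(codon_for_aa[aa] for aa in protein)

def ribosome(mrna, trnas):
    """Pair each codon with the matching tRNA and append the AA it carries."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        identity = trnas[anticodon_of(mrna[i:i + 3])]
        protein.append(SYNTHETASE[identity])
    return "".join(protein)

OLD_CODE = {"tRNA-Met": "AUG", "tRNA-Lys": "AAA", "tRNA-Ser": "AGC"}
NEW_CODE = {"tRNA-Met": "CCC", "tRNA-Lys": "GUG", "tRNA-Ser": "UUU"}  # arbitrary

old_protein = ribosome(encode("MKS", OLD_CODE), build_trnas(OLD_CODE))
new_protein = ribosome(encode("MKS", NEW_CODE), build_trnas(NEW_CODE))
print(old_protein, new_protein)
```

Because the rewritten DNA changes the protein genes and the tRNA anticodons together, while the ribosome and synthetase stay as they were, both codes yield the same protein in this model. Whether real cells behave this simply is the question the homework asks.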

51
HW 1
  • Read Hofstadter's article
  • Submit a report on 1/30/17, including
  • Re-statement of his hypothesis
  • Basis of his claim
  • Whether you agree with his claim
  • Justification of your argument
  • Include references in the report

52
DNA RNA Genes to Proteins Genomes
53
Genome
  • Genome
  • The entire DNA of a cell is its genome
  • Individual units for coding proteins or RNA are
    genes
  • A gene starts with ATG, ends with one or two stop
    codons
  • Called ORF (Open Reading Frame)
  • Biological Info
  • Contained in genome
  • Encoded in nucleotide sequences of DNA or RNA
  • Partitioned into discrete units, genes
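
The ORF definition above (start at ATG, read in triplets, end at a stop codon) translates into a simple scan. This is a minimal sketch that checks the three reading frames of one strand only; the function name, the `min_codons` parameter, and the test sequence are illustrative assumptions, and real gene finders do considerably more.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for each ATG...stop stretch on this strand.

    start/end are 0-based slice indices; the stop codon is included.
    min_codons is the minimum number of codons before the stop.
    """
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                # Walk forward in the same frame until the first stop codon.
                for j in range(i + 3, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, dna[i:j + 3]))
                        i = j  # resume scanning after this stop codon
                        break
            i += 3
    return orfs

orfs = find_orfs("CCATGAAACGCTATAGCTAAGG")
print(orfs)
```

To cover both strands, the same function could be run on the reverse complement of the sequence as well.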

54
Cell
  • Different levels of cells
  • Prokaryote (pro for before; karyon, kernel in
    Greek)
  • Eukaryote (eu for true)
  • Main difference is the presence of organelles,
    especially the nucleus, in eukaryotes

Organelle              Prokaryotes          Eukaryotes
Nucleus                No definite nucleus  Present
Cell membrane          Present              Present
Mitochondria           None                 Present
Endoplasmic reticulum  None                 Present
Ribosomes              Present              Present
Chloroplasts           None                 Present in green plants
55
animal cell
plant cell
Prokaryotic cell
56
Three Domains
  • Classification purely based on biochemistry (RNA)
  • C. Woese, 1981
  • Eubacteria (true bacteria)
  • Archaea (archaebacteria, early bacteria)
  • Eukarya (eukaryotes)

57
Genome Sequencing Projects
  • Major genome sequencing centers
  • U.S. Dept. of Energy Joint Genome Institute (435
    projects)
  • J. Craig Venter Institute (302)
  • The Institute for Genomic Research (TIGR) (206)
  • Washington Univ. (184)
  • Institut Pasteur, Univ. of Tokyo
  • www.ncbi.nlm.nih.gov/genomes/static/lcenters.html
  • National Center for Biotechnology Information
  • Completely sequenced genomes include
  • Several hundred bacteria, over 20 archaea, and
    over 30 eukarya
  • Human (Homo sapiens), chimpanzee (Pan
    troglodytes), mouse (Mus musculus), brown rat
    (Rattus norvegicus), dog (Canis familiaris),
    Thale cress (Arabidopsis thaliana), rice (Oryza
    sativa), Fruit fly (Drosophila melanogaster),
    yeast (Saccharomyces cerevisiae)
  • http://www.ebi.ac.uk/2can/genomes/genomes.html
    has descriptions of species and their clinical
    and scientific significance
  • http://www.genomesonline.org has the current status
    of genome projects

58
Genome Databases
  • Completed genomes
  • ftp site -- ftp://ftp.ncbi.nlm.nih.gov/genomes/
  • http://www.ncbi.nlm.nih.gov/PMGifs/Genomes/allorg.html
  • http://www.ebi.ac.uk/genomes/mot/index.html
  • http://pir.georgetown.edu/pirwww/search/genome.html
  • Organism-specific databases
  • http://www.unl.edu/stc-95/ResTools/biotools/biotools10.html
  • http://www.fp.mcs.anl.gov/gaasterland/genomes.html
  • http://www.hgmp.mrc.ac.uk/GenomeWeb/genome-db.html
  • http://www.bioinformatik.de/cgi-bin/browse/Catalog/Databases/Genome_Proejcts

59
Genomes of Prokaryotes
  • Circular double-stranded DNA
  • Protein-coding regions do not contain introns
  • Protein-coding regions are partially organized
    into operons: tandem genes transcribed into a
    single mRNA molecule
  • The density of coding regions is high
  • 89% in E. coli

trpE
trpD
The trp operon in E. coli begins with a control
region, followed by genes performing successive
steps in the synthesis of the amino acid tryptophan
60
Genome of E.Coli
  • Many E.Coli proteins were known before the
    sequencing (1853 proteins)
  • Genome of Escherichia coli, strain MG1655
    published in 1997
  • By F. Blattner at Univ. Wisconsin
  • 4.64 Mbp
  • 4284 protein-coding genes, 122 structural RNA
    genes, Non-coding repeat sequences, Regulatory
    elements, etc.
  • Average size of ORF is 317 AA
  • Average inter-genic gap is 118 bp
  • ¾ transcribe single genes, and the rest are
    operons (gene clusters)
  • 60% of protein functions are known
  • http://wishart.biology.ualberta.ca/BacMap/index.html
    contains an atlas of bacterial genome diagrams
    (2005)

61
(No Transcript)
62
Genome of Archaea
  • Microorganism Methanococcus jannaschii
  • Thrives in hydrothermal vents at temperatures from 48 to
    94 °C
  • Capable of self-reproduction from inorganic
    components
  • Metabolism is to synthesize methane from H2 and
    CO2
  • Sequenced in 1996 by TIGR
  • 1.665 Mbp in a chromosome containing a circular DNA
    molecule, plus two extra-chromosomal elements
  • 1,784 protein-coding regions
  • Proteins in archaea for transcription and
    translation are closer to those in eukaryotes
  • Proteins involved in metabolism are closer to
    those of bacteria

63
Genomes of Eukarya
  • Majority of DNA is in the nucleus
  • Organized into chromosomes containing single-DNA
    molecule each
  • Smaller amount of DNA in organelles such as
    mitochondria and chloroplasts
  • Organelles originated as intra-cellular parasites
  • Organelle genomes usually have circular forms,
    but sometimes in linear or multi-circular shape
  • Genetic code is different from the one for
    nuclear genes
  • Diverse among species
  • Humans have 23 pairs of chromosomes, chimpanzees have 24
  • Human chromosome 2 is equivalent to a fusion of
    chimpanzee chromosomes 12 and 13
  • List of genome sequences
  • http://en.wikipedia.org/wiki/List_of_sequenced_eukaryotic_genomes

64
Genome of Saccharomyces cerevisiae (Yeast)
  • Simplest eukaryotic organism
  • Sequencing by about 100 labs, completed in 1996
  • 12.06 Mbp
  • 16 chromosomes
  • 6,172 protein-coding genes
  • Dense: only 231 genes contain introns

65
Genome of Caenorhabditis elegans (C. elegans)
  • Completed in 1998
  • First full DNA sequence of a multi-cellular
    organism
  • 97 Mbp
  • Paired chromosomes
  • XX for a self-fertilizing hermaphrodite
    (simultaneously male and female)
  • XO for male
  • Avg. 5 introns per gene
  • Proteins
  • 42% have homologues in other species
  • 34% specific to nematodes (round worms)
  • 24% no known homologues

Chromosome  Size (Mbp)  Protein genes  Kbp/gene
I           7.9         2803           5.06
II          8.5         3559           3.05
III         7.6         2508           5.40
IV          9.2         3094           5.17
V           9.8         4082           4.15
X           10.1        2631           6.54
66
Genome of Drosophila melanogaster (Fruit fly)
  • Completed in 1999 by Celera Genomics and Berkeley
  • 180 Mbp
  • Five chromosomes: 3 large autosomes, Y, and a tiny
    fifth
  • 13,601 genes, 1 gene/8 Kbp
  • Has 289 homologues to human genes
  • Such as genes for cancer, cardiovascular, neurological
    diseases, etc.
  • There are fly models for Parkinson's disease and malaria

67
Genome of Arabidopsis thaliana
  • Relatively small genome, 146 Mbp, completed in
    2000
  • Five chromosomes
  • 25,498 predicted genes, 1 gene/4.6 kbp
  • Proteins
  • Most A. thaliana proteins have homologues in
    animals
  • 60% of genes have human homologues, e.g., BRCA2
  • Gene distribution
  • Nucleus genome size (125 Mbp), genes (25,500)
  • Chloroplast genome (154 Kbp), genes (79)
  • Mitochondrion genome (367 Kbp), genes (58)

68
  • 20 of 54 genes in a 340-Kbp stretch of the rice
    genome (top) are conserved and retain the same
    order in five A. thaliana strands

69
Human Genome
  • Human Genome Project
  • Conceived in 1984, begun in 1990, completed in
    2001 ahead of 2003 schedule
  • What did the sequence reveal ?
  • 3 Bbp (base pair)
  • 24 chromosomes,
  • 22 autosomes plus two sex chromosomes (X, Y)
  • Longest 250 Mbp, shortest 55 Mbp
  • Mitochondrial genome
  • Circular DNA molecule of 16.569 kbp
  • 10^13 cells
  • How many is 3 Bbp?
  • A typical 11-pt font prints 60 nucleotides in 4 in (10 cm)
  • In this format, 3 Bbp writes out to about 5,000 km (roughly 3,100 mi)
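
The print-out estimate is a short piece of arithmetic. The sketch below takes the slide's figures of 60 nucleotides per 10 cm of printed line and 3 billion base pairs; the variable names are illustrative.

```python
GENOME_BP = 3_000_000_000   # 3 Bbp
BASES_PER_LINE = 60         # nucleotides printed per line
LINE_LENGTH_CM = 10         # printed length of one such line

lines = GENOME_BP // BASES_PER_LINE             # number of 60-base lines
length_km = lines * LINE_LENGTH_CM / 100 / 1000 # cm -> m -> km
length_mi = length_km / 1.609344                # km -> statute miles

print(f"{lines:,} lines, {length_km:,.0f} km, about {length_mi:,.0f} mi")
```

With these inputs the sequence runs to 50 million lines laid end to end, which is 5,000 km or a little over 3,100 miles.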

70
Genome of Homo sapiens
  • 22 chromosomes plus X (163 Mbp) and Y (51 Mbp)
  • Web resources
  • Interactive access to DNA and protein sequences
  • http://www.ensembl.org
  • Images of chromosomes, maps, loci
  • http://www.ncbi.nlm.nih.gov/projects/genome/guide/
  • Gene map 99
  • http://www.ncbi.nlm.nih.gov/genemap99
  • Overview of human genome structure
  • http://www.ims.u-tokyo.ac.jp/imsut/en
  • SNP (single nucleotide polymorphisms)
  • http://snp.cshl.org
  • Human genetic diseases
  • http://www.ncbi.nlm.nih.gov/Omim (Online
    Mendelian Inheritance in Man)
  • http://www.geneclinics.org/profiles/all-html

71
Human Genome Insights (ENCODE)
  • Majority of genome is transcribed
  • 50% transposons
  • 25% protein-coding genes / 1.3% exons
  • 23,700 protein-coding genes
  • 160,000 transcripts
  • Average gene: 36,000 bp
  • 7 exons @ 300 bp
  • 6 introns @ 5,700 bp
  • 7 alternatively spliced products (95% of genes)
  • RefSeq 34,600 reference sequence genes
    (includes pseudogenes, known RNA genes)
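
The average gene length quoted above can be recovered from the exon and intron figures. A quick check, using only the numbers on this slide (variable names are illustrative):

```python
EXONS, EXON_BP = 7, 300        # 7 exons of about 300 bp each
INTRONS, INTRON_BP = 6, 5700   # 6 introns of about 5,700 bp each

exon_total = EXONS * EXON_BP            # bp of exon per average gene
intron_total = INTRONS * INTRON_BP      # bp of intron per average gene
gene_length = exon_total + intron_total
exon_fraction = exon_total / gene_length

print(gene_length)                  # close to the quoted 36,000 bp
print(f"{exon_fraction:.1%}")       # share of an average gene that is exon
```

The sum comes to 36,300 bp, so the quoted 36,000 bp is a rounded figure, and only a small fraction of an average gene is exon sequence.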

72
Genome of Homo sapiens (contd)
  • Repeat sequences: >50% of the genome
  • Short interspersed nuclear elements (SINEs) 13%,
    LINEs 21%
  • Simple stutters (repeats of short oligomers
    including mini- and micro-satellites)
  • Triplet repeats such as CAG are implicated in
    numerous diseases (e.g., glutamine repeats in
    glutamine protein)
  • SNP (pronounced snip)
  • A->T mutation in beta-globin changes Glu -> Val,
    creating a sticky surface on haemoglobin
    molecules => sickle-cell anaemia
  • Progeria
  • Avg 1 SNP/Kbp (100 SNPs per 100 Kbp)
  • Many 100-Kbp regions tend to remain intact, with
    fewer than five SNPs
  • => discrete combinations of SNPs define an
    individual's haplotype (haploid genotype)
  • Individual genomes are characterized by a
    distribution of genetic markers including SNPs
  • Int'l HapMap Consortium

73
Genome of Homo sapiens (contd)
  • SNP consortium
  • Collects human SNPs, nearly 5 million SNPs
  • Shows that
  • Most of variations appear in all populations
  • However, a few SNPs are unique to particular
    populations
  • Genomes of individuals from Japan and China are
    very similar
  • Chromosome X varies more than other chromosomes
    (X is more subject to selective pressure)
  • Mitochondrial DNA
  • Double-stranded closed circular molecule of
    16,569 bp
  • Inherited almost exclusively through maternal
    lines
  • Not subject to recombination, and changes only by
    mutation
  • About 1 mutation every 25,000 years

74
mtDNA and Y
  • mtDNA Inherited through maternal lines
  • Both sons and daughters get it from their mother
  • All existing sequence variants are traced back to
    a single woman (Mitochondrial Eve) in Africa
    roughly 200,000 years ago
  • Supports the out-of-Africa hypothesis
  • Avg difference in mtDNA between pairs of
    individuals is 61.1, between Africans is 76.7,
    between non-Africans is 38.5
  • More divergent populations in Africa for much
    longer than in the rest of the world
  • Y chromosome
  • Most recent common male ancestor (Y-chromosome
    Adam) is around 59,000 years ago
  • Most divergent sequences are found from Africans

75
Other Species
Organism                     Genome size   # of genes
Epstein Barr virus           0.17 Mbp      80
E. coli                      4.6 Mbp       4,406
Yeast (S. cerevisiae)        12.5 Mbp      6,172
Nematode worm (C. elegans)   100.3 Mbp     19,099
Thale cress (A. thaliana)    115.4 Mbp     25,498
Fruit fly (D. melanogaster)  128.3 Mbp     13,601
Human (H. sapiens)           3223.0 Mbp    20,500
Fugu (Takifugu rubripes)     390.0 Mbp     30,000
Wheat                        16000.0 Mbp   30,000
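
One way to compare these species is gene density, the genome size divided by the number of genes. The sketch below uses only the values from the table above; the names are illustrative, and the result is a rough average spacing, not a statement about actual gene lengths.

```python
# (organism, genome size in Mbp, number of genes), copied from the table
GENOMES = [
    ("Epstein Barr virus", 0.17, 80),
    ("E. coli", 4.6, 4406),
    ("Yeast (S. cerevisiae)", 12.5, 6172),
    ("Nematode worm (C. elegans)", 100.3, 19099),
    ("Thale cress (A. thaliana)", 115.4, 25498),
    ("Fruit fly (D. melanogaster)", 128.3, 13601),
    ("Human (H. sapiens)", 3223.0, 20500),
    ("Fugu (Takifugu rubripes)", 390.0, 30000),
    ("Wheat", 16000.0, 30000),
]

def kbp_per_gene(size_mbp, genes):
    """Average genome span per gene, in kbp."""
    return size_mbp * 1000 / genes

density = {name: kbp_per_gene(size, genes) for name, size, genes in GENOMES}
for name, value in density.items():
    print(f"{name:30s} {value:8.1f} kbp/gene")
```

The spread is wide: roughly 1 kbp per gene in E. coli against well over 100 kbp per gene in human, which reflects how much of the larger genomes is non-coding.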
76
Genome
  • one or more chromosomes that contain the code
    (gene) that directs the synthesis of proteins
    that are essential for its structure and function
  • Human: 22 pairs of homologous chromosomes + XY
  • http://www.ncbi.nlm.nih.gov/genome/?term=txid9606[orgn]

77
Genes
  • Allele
  • Alternative forms of the same gene
  • Dominant, recessive

78
(No Transcript)