Topics in Informatics - PowerPoint PPT Presentation

Title:

Topics in Informatics

Slides: 159
Provided by: saintj

1
Topics in Informatics
  • Spring 2005, SJC

2
About the Instructor
  • Instructor: Dr. Hong Zhou
  • Office: McDonough 317
  • Office Hours: MWF 10:00-11:00 am
  • Email: hzhou_at_sjc.edu, Phone: 231-5826
  • Syllabus
  • You can call me:
  • Hong
  • Dr. Hong
  • Dr. Zhou

3
What is Informatics?
  • Searching for "What is informatics" at
    http://www.google.com returns many different
    definitions.
  • Basically, it is the study and application of
    the knowledge and skills of data/information flow
    and manipulation (including storage, retrieval,
    analysis, and construction/derivation, etc.).

4
Informatics
  • Data obtaining
  • Data flow and control
  • Data representation (records) and storage
  • Data retrieval/mining
  • Data analysis
  • Data derivation (generating new data from
    existing data via analysis)

5
What Will You Learn
  • Obtaining reliable data.
  • Data Management (Data Storage and Representation,
    Retrieval) → database.
  • Introduction to Bioinformatics.
  • Introduction to Health Informatics.

6
Part I Obtaining Reliable Data
  • Complex and precise communication is something
    that distinguishes humans from other species.
  • The development of the world is, in a sense, the
    development of our understanding, i.e. our
    information about the universe, including our
    social systems.
  • Information and its uses are at the center of
    such development.

7
Information vs Data
  • What is in your mind when we talk about
    INFORMATION?
  • Is information touchable, visible?
  • To my understanding, data is the description of
    information, and information is the
    interpretation of data.
  • So, in this class, let's deal with the
    description: data.

8
What is Data?
  • When we talk about data, the first image in our
    mind might be numbers such as 5, 87, 98.34, etc.
  • However, are the numbers 5, 87, 98.34
    meaningful/informative?

9
Data with Context
  • Pure numbers are meaningless for us.
  • Numbers with context are meaningful, however.
  • For example, 5 pounds of sugar.
  • So, in this class, we are talking about
    meaningful data and ignore all meaningless data.
    (Are we meaningful persons?)

10
Quick Questions
  • Are the following data meaningful?
  • 20
  • 20 years
  • A 20-year-old girl
  • A 20-year-old girl named Amie
  • A 20-year-old girl named Amie who is an SJC
    student.

11
Data Target
  • Data is used to describe a subject.
  • For example, age, height, weight, gender, and
    profession are descriptions of a person.
  • A medical record is a description of a patient.

12
Quick Question
What are the targets of the following two rows of
data?
13
What is RELIABLE?
  • When we talk about reliable data, what does that
    mean?
  • Let's discuss this issue at two levels:
  • Individual level
  • Group/population level (statistics)

14
Individual Level
  • Reliable data means that the data is closely
    related to the individual (or event) and
    precisely describes the individual (or event).
  • For example: a computer with a 3.2 GHz CPU,
    512 MB RAM, 512 KB cache, etc.

15
Group/Population Level
  • Reliability is more meaningful at the group level.
  • Can a specific medical diagnosis of one patient be
    representative of all patients with the same
    symptom?
  • Probably not.

16
Statistical Thinking
  • One powerful approach to analyzing data is
    statistics.
  • We measure the reliability (significance) of data
    in the sense of statistics.
  • Statistical thinking means using data to build our
    understanding, gain insights, and draw
    conclusions or make inferences.
  • It means not drawing conclusions from a single
    incident.

17
Principles in Statistical Thinking
  • Count on data instead of an incident
  • Where the data is from matters.
  • Lurking variables
  • Variation is everywhere
  • Conclusions are not absolutely certain

18
Count on a large amount of data instead of a few
incidents
  • Famous fortune teller
  • The thumb of a monk

19
Where data is from
  • Group data can be collected from surveys or
    observations, or obtained from experiments.
  • When collecting data, where the data comes from
    is important. For example, a question was once
    asked: "If you had it to do over again, would you
    have children?" 70% of the written responses
    were NO. Is this piece of information reliable?

20
Lurking Variables
  • Does music practice improve test scores?
  • What is behind the association?

21
The Importance of RANDOM
  • The key factor in data collection is the RANDOM
    concept, i.e. the data has to be collected
    randomly, with no bias.
  • Suppose that you are surveying 10,000 people in
    the USA to predict the 2004 election. How are
    you going to pick the 10,000 persons? Only in
    schools? Only in New York? Only women? Avoid as
    much bias as you can.
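
One way to picture the RANDOM idea is a simple random sample drawn by computer. The sketch below is only an illustration: the population of one million numbered voters and the seed value are assumptions made up for the example, not part of the course material.

```python
import random

# Hypothetical sampling frame: ID numbers standing in for 1,000,000 voters.
population = range(1_000_000)

# A fixed seed makes the draw repeatable; any seed would do.
rng = random.Random(2004)

# Simple random sample: every voter has the same chance, nobody is picked twice.
sample = rng.sample(population, 10_000)

print(len(sample), "people selected,", len(set(sample)), "distinct")
```

Picking only from schools or only from New York would correspond to shrinking `population` before sampling, which is exactly where the bias enters.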

22
Experiments
  • Some reliable data can only be produced by
    experiments, especially in science.
  • For example, in biology, to pin down the function
    of a gene, you have to knock out the gene or
    suppress it and check for phenotype changes.
    After that, you have to restore the gene and
    verify that the phenotype also recovers. Such
    experiments are very convincing, but expensive.

23
Another Experiment
  • It was once believed that women who take hormones
    after menopause reduce their risk of heart
    attack. The belief resulted from studies that
    simply compared women who were taking hormones
    with others who were not. Are such study results
    reliable?
  • Such studies lack proper controls, which are
    essential in all experiments.
  • How would you design an experiment for this
    study?

24
Reliable Data (cont'd)
  • Obtaining reliable data is not a simple task; it
    requires extensive consideration and design.
  • Some experimental results may look convincing at
    one time, but lose their reliability over time
    or when the environment changes. For example,
    the third brake light on cars.

25
Discussion
  • Is absence of evidence evidence of absence?

26
Project 1
  • Write a paragraph discussing the claim "Absence
    of evidence is evidence of absence." Please make
    your own judgment, as the grading is based on
    your argument.
  • Design a simple survey to collect opinions about
    abolishing the death penalty. Be aware of the
    importance of RANDOM. Write a short paragraph
    arguing that the data collected by your survey
    is reliable.
  • Points: 100.
  • Due date: Feb 1st, 2005.
  • Submit your work to the digital drop box in
    Blackboard.

27
Part II Data Storage
  • Can all information be recorded as data? Let's
    start the discussion.
  • Feeling
  • Knowledge
  • Intelligence

28
Personal Ideas
  • My understanding: yes, but some of it is too
    complicated or too difficult to express
    precisely.
  • And that is why we have the IQ test, MQ
    (motivational quotient), EQ, etc.

29
Where to store
  • Data is stored somewhere.
  • Minds
  • Books (paper documents)
  • Computers
  • Etc
  • Let's compare these three storage methods. Which
    one do you think is more lasting or appropriate?

30
Passing Words
  • In ancient times, knowledge was passed on by word
    of mouth, generation by generation.
  • Here is a story about passing information by
    word of mouth:
  • The general called the captain and told him:
    "Tonight at 7:00 pm, Halley's comet will pass
    your camp in the sky. Organize your soldiers to
    watch."
  • The captain informed his lieutenant: "Tonight at
    7:00 pm, Halley's comet will pass our camp in
    the sky, and the general is coming to watch with
    our soldiers."
  • The lieutenant informed the sergeant: "Tonight at
    7:00 pm, the general will accompany Halley's
    comet passing over our camp. Organize the
    soldiers."
  • The sergeant told the soldiers: "Tonight at
    7:00 pm, General Halley will pass over our camp
    in the sky, and we are going to watch that."

31
Data Storage
  • Paper storage
  • Size and cost
  • Transportation
  • Computer
  • Signature → legal effect
  • Hacking
  • What if computers are down?
  • However, if data is not organized, it is
    difficult to make use of. So, data storage
    strategy is important.
  • In this class, we talk about data storage by
    using computer technology.

32
Ways to store
  • Data storage is a big, and probably the largest,
    issue related to computer data manipulation.
  • Different database structures, different database
    managements, online storage, etc.

33
Chapter 1. File structure
  • Hierarchical structure
  • Easy to deal with the hierarchical relationships.
  • For example, the administration is a hierarchical
    structure.
  • Let me use the DOD/NIMA VPF structure as an
    example

34
VPF Structure
  • DOD (Department of Defense) and NIMA (National
    Imagery and Mapping Agency) sponsored the
    development of VPF (Vector Product Format).
    Nickname: "very poor format".
  • It is used to store the earth ground information
    and provide a digital map.

35
VPF structure
  • Database → Library, Library
  • Library → Coverage, Coverage, Coverage
  • Coverage → File1, File1, File1, File1, File1
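
The Database, Library, Coverage, File nesting can be sketched as nested containers. This is a rough illustration only: the library, coverage, and file names below are invented, and real VPF databases carry much more than this.

```python
# Hypothetical names; only the Database > Library > Coverage > File
# nesting is taken from the slide.
database = {
    "LibraryA": {
        "Coverage1": ["File1", "File2"],
        "Coverage2": ["File1"],
    },
    "LibraryB": {
        "Coverage1": ["File1", "File2"],
    },
}

def list_paths(db):
    """Walk the hierarchy top-down and return one path per file."""
    paths = []
    for library, coverages in db.items():
        for coverage, files in coverages.items():
            for name in files:
                paths.append(f"{library}/{coverage}/{name}")
    return paths

paths = list_paths(database)
for p in paths:
    print(p)
```

Reaching any file means naming every level above it, which is what navigation in a hierarchical structure amounts to.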
36
Navigation in Hierarchical Structure
What is the purpose of Index?
37
Project 2
  • Create a hierarchical file structure to store
    some of your work at SJC.
  • This is the way I prefer: organize your work
    based on the classes you take.
  • If you have other ways, that is OK as long as
    the files are well organized.
  • Show me in class what you have done.
  • Points: 100

38
Chapter 2 XML
  • Extensible Markup Language
  • Purpose
  • Data transportation
  • Data representation
  • Data storage
  • Why should we talk about it here? Because the
    data inside an XML file is hierarchical.

39
What Does XML Promise?
  • Data portability
  • The Java programming language promises the
    portability of programs.
  • However, programs work on data. Before XML, data
    was not portable, and communication among
    systems and agencies was extremely difficult.
  • XML allows systems to communicate using a
    standard means of data representation.

40
HTML?
  • HTML is the portable language for browsers.
  • It is a standard.
  • However, it governs how information is displayed
    in a browser with defined formats and defined
    tags.

41
The Difficulties XML faces
  • XML has some defined formats
  • But it doesn't have defined tags.
  • User defined tags
  • Unlimited types of data.

42
Solution (Partially)
  • Make the information self-describing.
  • You have to invent your own tags!

43
A Simple Example
  • Fonship
  • Michele
  • female
  • 9/1980
  • 5/1985
  • Badley school

44
Tips about XML format
  • A tag is case sensitive
  • A starting tag must have a closing tag to match
  • All XML elements must be properly nested.
  • All XML documents must have a root element.
  • Attribute values must always be quoted.
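
These rules can be tried out with a parser. The sketch below uses Python's standard `xml.etree.ElementTree` module; the sample documents and the helper name `is_well_formed` are made up for the illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One good document, then one violation of each rule on the slide.
samples = {
    "ok": '<note id="1"><to>Amie</to></note>',
    "case mismatch": "<note><to>Amie</To></note>",
    "missing close": "<note><to>Amie</note>",
    "bad nesting": "<b><i>text</b></i>",
    "two roots": "<a/><b/>",
    "unquoted attribute": "<note id=1/>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'rejected'}")
```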

45
Comments in XML
  • The syntax for writing comments in XML is similar
    to that of HTML: <!-- This is a comment -->
  • A sample XML file.

46
XML Element Naming
  • Names can contain letters, numbers, and other
    characters
  • Names must not start with a number or punctuation
    character
  • Names must not start with the letters xml (or XML,
    or Xml, etc.)
  • Names cannot contain spaces.

47
Is it valid or not?
nameRose Washingtonst name
48
Element Content
  • An XML element is everything from (including) the
    element's start tag to (including) the element's
    end tag.
  • An element can have element content, mixed
    content, simple content, or empty content. An
    element can also have attributes.

49
Is this valid?
appleit
50
Child Elements vs. Attributes
Anna Smith
female Annastname Smith
51
Disadvantages of Attributes
  • attributes cannot contain multiple values (child
    elements can)
  • attributes are not easily expandable (for future
    changes)
  • attributes cannot describe structures (child
    elements can)
  • attributes are more difficult to manipulate by
    program code
  • attribute values are not easy to test against a
    Document Type Definition (DTD) - which is used to
    define the legal elements of an XML document
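
The first three points can be seen by writing the same person both ways. In this sketch the element names and the two phone numbers are invented for illustration; only the name Anna Smith and the gender come from the previous slide.

```python
import xml.etree.ElementTree as ET

# Version 1: everything packed into attributes.
as_attributes = ET.fromstring('<person gender="female" name="Anna Smith"/>')

# Version 2: child elements, which can nest (first/last) and repeat (phone).
as_children = ET.fromstring(
    "<person>"
    "<gender>female</gender>"
    "<name><first>Anna</first><last>Smith</last></name>"
    "<phone>555-0100</phone><phone>555-0101</phone>"
    "</person>"
)

full_name = as_attributes.get("name")            # one flat string only
last = as_children.findtext("name/last")         # structure is addressable
phones = [p.text for p in as_children.findall("phone")]  # multiple values

print(full_name, "|", last, "|", phones)
```

An attribute named `phone` could appear only once on the element, so the second number would have nowhere to go.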

52
Using Child Elements?
  • So, it is a good idea to use child elements
    rather than attributes.
  • Check this out. Tell which way you prefer.
  • Can this file work? What is wrong?

53
A case for Attribute
  • What is metadata? Data about data. For example,
    your SJC student ID is metadata about you, since
    it identifies you but does not describe you.

OReilly sssomewhere
54
Is this valid?
Eng100 id5Math100 200-300pmfice Hour McDonough Hall 211

What are the errors?
55
More about XML
  • Now we have the so-called XML database, whose
    basic element is the XML document. It is not
    very successful yet.
  • Remember that XML does not really do anything
    except describe data.
  • We have to interpret whatever it is describing.
    In terms of computer software, the user has to
    develop software to interpret it.
  • What are DTD and XML Schema?
  • What are the disadvantages of XML? Please
    discuss.

56
Analyze the XML file
  • Example XML file
  • Let's discuss the weaknesses of this file.
  • What do you suggest?
  • What do you think of my solution?

57
In class exercise
  • Given the data shown in Access database, can we
    store the same data in XML format? Please try it
    in class. Thanks.

58
Useful Sites about XML
  • http://www.w3schools.com/xml/
  • http://www.xml.org

59
XML in Use
  • BBC news topics are also available online via
    XML. Example.
  • XML at work.
  • XML in commerce?
  • What are GML and SGML?

60
Project 3
  • Here are the requirements, which are also
    available in Blackboard.
  • Discussion: will XML really become the standard
    for data transportation or data storage?

61
Part III Database
  • Instead of listing it as Chapter 3, it is listed
    as Part III, which shows that this is a big
    issue.

62
Chapter 1 Database History
  • Hierarchical database
  • Network database
  • Relational database
  • Object-oriented database
  • Object-oriented relational database
  • XML database
  • etc

63
Relational Database
  • The major type of database in use.
  • Based on the relations between data items.
  • Key element: tables.
  • Available relational databases: Oracle, DB2,
    Sybase, MS SQL Server, Access, MySQL, etc.
  • A site about evaluation.
  • The instructor's database work.

64
Records and Attributes
  • A table has multiple records (rows), each of which
    has multiple values (one per attribute).
  • For example.
  • The attributes (columns) define the data types.
    All data in a column must conform to that
    column's data type.

65
Primary Key
  • The primary key of a relational table uniquely
    identifies each record in the table. It can
    either be a normal attribute that is guaranteed
    to be unique (such as Social Security Number in a
    table with no more than one record per person) or
    it can be generated by the DBMS (such as a
    globally unique identifier, or GUID, in Microsoft
    SQL Server). Primary keys may consist of a single
    attribute or multiple attributes in combination.
  • For example, in the table example, the primary
    key is Student.
  • Every table must have a primary key defined.
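
The uniqueness guarantee can be watched in action. The sketch below uses Python's built-in `sqlite3` module as a stand-in for Access, which is an assumption of the example; the table and column names are also made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE students ("
    " studentID INTEGER PRIMARY KEY, lastname TEXT, firstname TEXT)"
)

# Two people may share a name as long as their keys differ.
conn.execute("INSERT INTO students VALUES (1, 'Smith', 'Jack')")
conn.execute("INSERT INTO students VALUES (2, 'Smith', 'Jack')")

# Reusing key 1 breaks uniqueness, so the database refuses the row.
try:
    conn.execute("INSERT INTO students VALUES (1, 'Marsa', 'Rose')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
print(duplicate_rejected, row_count)
```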

66
Primary Key (2)
  • Guess what would be the Primary Key in the SJC
    database for students?
  • Will it be ok to use your name (last name and
    first name) as the primary key?

67
Create a table for
  • Smith, Jack, male, 8/15/1989, 421865241, Forrest,
    Shoplifting, Linda Luke, (860)321-9086, 105.
  • Marsa, Rose, female, 7/1/1988, 3245691877, Jones,
    Dog fighting, Nancy Charles, (860)321-9088, 106.
  • Lese, Sam, male, 3/21/1986, 425423785, Hartford,
    Dwell breaking, Linda Luke, (860)321-9086, 105.
  • Haly, Rachel, female, 3/25/1989, 423671841,
    Hartford, misconduct, Linda Luke, (860)321-9086,
    105.
  • Horse, James, male, 11/2/1987, 765213456, Lama,
    misconduct, Nancy Charles, (860)321-9088, 106.
  • Lincoln, George, male, 10/5/1988, 324342342,
    Jones, fighting, Linda Luke, (860)321-9086, 105.
  • Doom, Jade, female, 9/9/1988, 423213495,
    Hartford, misconduct, Nancy Charles,
    (860)321-9088, 106.

68
Tables
  • For any complete dataset, we will surely deal
    with multiple database tables.
  • When dealing with complicated datasets, the first
    thing is to categorize the data into groups,
    with each group represented by a table.
  • The second thing is to find and build the
    relationships between the tables.

69
Analyze the data
  • How many categories do we have?
  • Let's use UML to clarify the data relationships!
  • UML is the Unified Modeling Language, which arose
    in the 1990s. It was derived from the work of
    three of the greatest minds in system modeling.
  • It is the standard language used to analyze
    system design.

70
Practice
  • The UML diagram
  • What tables would you construct for the data in
    the XML file?
  • Do this exercise in class.

71
Relationship
  • Now let's talk about the relationship types:
  • One-to-One
  • One-to-many vs many-to-one
  • Many-to-many

72
One-One
  • SSN ↔ Person

SSN (1) to Person (1)
73
One-Many
  • Bank accounts ↔ person (one person can have
    multiple accounts, but one account belongs to one
    person/family).

Bank account (many) to person (1)
74
Many-Many
  • Course-Student. A student may take multiple
    courses, and a course may be taken by multiple
    students.

75
Foreign Key
  • A foreign key is a relationship or link between
    two tables which ensures that the data stored in
    a database is consistent.
  • The foreign key link is set up by matching
    columns in one table (the child) to the primary
    key columns in another table (the parent).
  • Referential Integrity
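
Referential integrity can be demonstrated with a small sketch. It uses Python's built-in `sqlite3` module rather than Access, which is an assumption of the example; the publisher and book rows are invented, with column names borrowed from the book/publisher example later in this part.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only after this switch is turned on.
conn.execute("PRAGMA foreign_keys = ON")

# Parent table first, then the child that points at it.
conn.execute("CREATE TABLE publishers (publisherID INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE books ("
    " bookID INTEGER PRIMARY KEY, title TEXT,"
    " publisherID INTEGER NOT NULL REFERENCES publishers(publisherID))"
)

conn.execute("INSERT INTO publishers VALUES (1, 'Example Press')")
conn.execute("INSERT INTO books VALUES (10, 'A Title', 1)")  # parent exists

# Publisher 99 does not exist, so this child row would be an orphan.
try:
    conn.execute("INSERT INTO books VALUES (11, 'Orphan', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print("orphan row rejected:", orphan_rejected)
```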

76
Foreign Key Example 1
  • Table Students (parent): studentID (PK), First
    name, Last name, Major, SSN, Gender, DOB
  • Table Basketball players (child): playerID (PK),
    First name, Last name, Position, Number
  • Relationship: One-to-one
77
Foreign Key Example 2
  • Given a table about instructors whose columns are
    ID, first name and last name.
  • Suppose the basic information about an offered
    course is the instructor and the course name.

78
Cont'd
  • Courses (many): Course name, Instructor
  • Instructors (1): ID, First name, Last name
  • Relationship: One-Many
79
Contd
  • Look at this example.

80
Exercise in Depth
  • UML diagram of the exercise.
  • Now, how to define the tables that can properly
    represent the UML diagram?

81
Common Rules
  • One object (entity) one table
  • One attribute one column
  • Additional PK optional in some cases.

82
How to define Relations between Tables?
  • First of all, we have to know that parents come
    before children. Tables that can be built
    without referencing other tables/data can be
    used as parent tables.
  • For example, the student table vs. the basketball
    player table.

83
Relations (cont'd)
  • In case of One-One relation, the parent table is
    the table that can be built without referencing
    any data in the child table. The child table
    must be the table that references data in the
    parent table.

84
Example
  • Students (parent): studentID, First name, Last
    name, Major, SSN, Gender, DOB
  • Basketball players (child): playerID, First name,
    Last name, Position, Number
85
Relations (cont'd)
  • In case of One-Many, the One must be the parent
    table, and the Many must be the child table.

86
Example
  • Books (child): BookID, Title, Author, Publish
    year, PublisherID
  • Publishers (parent): PublisherID, Name, Address
  • Relationship: Many-One
87
Many-Many
  • It is pretty hard to express the Many-Many
    relations between two tables.
  • For example, the Students ↔ Courses relationship.
  • How are we going to do it?

88
Solution
  • Make use of another table! In this case, we have
    three tables. One for students only, one for
    courses only, and one to link students with
    courses.

89
Solution Example
  • Students: studentID, lastname, firstname, DOB,
    gender
  • Courses: Course, Title, Location
  • Link table: StudentID, Course
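
The three-table solution might look like this in practice. The sketch uses Python's built-in `sqlite3` module in place of Access, and the sample students, course titles, and locations are invented; the link table is called `studentscourses` to match the name used in the query slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (studentID INTEGER PRIMARY KEY, lastname TEXT, firstname TEXT);
CREATE TABLE courses  (course TEXT PRIMARY KEY, title TEXT, location TEXT);
CREATE TABLE studentscourses (
    studentID INTEGER REFERENCES students(studentID),
    course    TEXT    REFERENCES courses(course),
    PRIMARY KEY (studentID, course)   -- one row per (student, course) pair
);
INSERT INTO students VALUES (1, 'Smith', 'Jack'), (2, 'Marsa', 'Rose');
INSERT INTO courses  VALUES ('Comp200', 'Sample title A', 'Room 1'),
                            ('Math100', 'Sample title B', 'Room 2');
INSERT INTO studentscourses VALUES (1, 'Comp200'), (1, 'Math100'), (2, 'Comp200');
""")

# One student, many courses ...
courses_of_student_1 = [r[0] for r in conn.execute(
    "SELECT course FROM studentscourses WHERE studentID = 1 ORDER BY course")]
# ... and one course, many students: both directions use the same link table.
students_in_comp200 = [r[0] for r in conn.execute(
    "SELECT studentID FROM studentscourses WHERE course = 'Comp200' ORDER BY studentID")]

print(courses_of_student_1, students_in_comp200)
```

The link table turns one many-to-many relationship into two one-to-many relationships, each of which follows the parent/child rule above.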
90
The full table construction
  • Let's work on this data to build the full set of
    tables!
  • Now, let's do Project 4!

91
Sword, a real application
  • Publicly available information about Sword.
  • A success story of data representation, storage
    and management in Mississippi.
  • Please form 2 or 3 groups for the coming projects
    since they are kind of complicated. Inform me of
    the group members in the next class. Thanks.

92
Discussion of Sword in Class
  • The sword data scenario

93
Chapter 2 Access Basics I
  • Please form 2 or 3 groups for the coming
    projects since they are kind of complicated.
    Inform me of the group members in the next class.
    Thanks.
  • Every student is supposed to collect at least 2
    restaurant menus of the Hartford area. Keep them
    for later use.

94
Basics (1)
  • Open and save an Access database.
  • Create a table in Design View.
  • To create good tables, we need to understand our
    data first. Let's have a look at the existing
    data on the next slide.

95
Try
  • Create a table to hold the information below.
  • Smith, Jack, male, 8/15/1989, 421865241, Forrest,
    Shoplifting, Linda Luke, (860)321-9086, 105.
  • Marsa, Rose, female, 7/1/1988, 3245691877, Jones,
    Dog fighting, Nancy Charles, (860)321-9088, 106.
  • Lese, Sam, male, 3/21/1986, 425423785, Hartford,
    Dwell breaking, Linda Luke, (860)321-9086, 105.
  • Haly, Rachel, female, 3/25/1989, 423671841,
    Hartford, misconduct, Linda Luke, (860)321-9086,
    105.
  • Horse, James, male, 11/2/1987, 765213456, Lama,
    misconduct, Nancy Charles, (860)321-9088, 106.
  • Lincoln, George, male, 10/5/1988, 324342342,
    Jones, fighting, Linda Luke, (860)321-9086, 105.
  • Doom, Jade, female, 9/9/1988, 423213495,
    Hartford, misconduct, Nancy Charles,
    (860)321-9088, 106.

96
Try (cont'd)
  • First, the primary key!
  • Continue building one table for all the data.
  • When done, save the work and give the table a
    sensible name.

97
Create Table wizard
  • Let's explore the table creation function of
    Access: we can create a table with the Wizard,
    i.e. from templates.

98
Create Multiple Tables
  • Based on the UML diagram of the data, let's
    create multiple tables.

99
Normalization
  • Normalization in database means to remove the
    redundant data to improve data storage
    efficiency, data integrity and scalability.
  • It is essential.
  • Good online explanation

100
3 Level Normalization
  • The first level of normalization removes
    redundant data horizontally, i.e. no repeated
    columns.
  • The second level of normalization removes
    redundant data vertically, i.e. no repeated data
    in rows.
  • The third level of normalization moves data that
    does not depend on the primary key into another
    table.
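
The effect of moving repeated data into its own table can be sketched with a few of the sample rows. The column meanings are guesses made for this illustration: the slides never name the columns, so treating "Linda Luke, (860)321-9086, 105" as a contact person with a phone number and a room number is an assumption.

```python
# Flat rows: (lastname, firstname, contact, phone, room).
# The contact details repeat on every row that shares a contact.
flat_rows = [
    ("Smith", "Jack", "Linda Luke", "(860)321-9086", 105),
    ("Marsa", "Rose", "Nancy Charles", "(860)321-9088", 106),
    ("Lese", "Sam", "Linda Luke", "(860)321-9086", 105),
    ("Haly", "Rachel", "Linda Luke", "(860)321-9086", 105),
]

contacts = {}  # contact name -> (contactID, phone, room), stored once
people = []    # (lastname, firstname, contactID)

for lastname, firstname, contact, phone, room in flat_rows:
    if contact not in contacts:
        contacts[contact] = (len(contacts) + 1, phone, room)
    people.append((lastname, firstname, contacts[contact][0]))

print(len(flat_rows), "flat rows ->", len(contacts), "contact rows")
print(people)
```

After the split, a changed phone number is corrected in one place instead of on every row that mentions the contact.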

101
Normalization
  • In total, there are 5 levels of normalization.
  • It is absolutely necessary to apply the 1st and
    2nd levels of normalization.
  • The 3rd level is applied sometimes.
  • Don't bother with the 4th or 5th levels of
    normalization.

102
Exercise
  • What is the normalization level of the database
    constructed?

103
Basics (2) Simple Query
  • Based on the constructed table, let's have some
    fun with queries.
  • Queries are written in a language called SQL
    (Structured Query Language).
  • SQL is a standard interactive and programming
    language for getting information from and
    updating a database.
  • Click here to learn more?

105
Create Query
  • Create Query in design view
  • Create Query by using wizard
  • View the result sheet.

106
Query Syntax
  • Though we now know how to create simple queries
    graphically, we still need to understand the
    syntax.
  • SELECT something FROM somewhere.
  • Select * from classes
  • Select ID from classes
  • Select classes.ID, lastname from classes
107
Set Conditions
  • SELECT something FROM somewhere WHERE
    conditions-are-met
  • Select * from students where gender = 0
  • Select * from students where lastname = 'Smith'
  • Select * from students where DOB between
    #1/1/1988# and #1/1/1990#

108
Set Conditions
  • Select * from students where lastname like
    'Smi*'
  • Select * from students where lastname like
    'smi*'
  • SELECT * FROM students WHERE gender = 0 AND
    lastname like 'smi*'
  • Be aware: in standard SQL the wildcard is %, so
    the pattern is written LIKE 'smi%'.
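
The WHERE conditions from the last two slides can be tried outside Access as well. This sketch uses Python's built-in `sqlite3` module, so it follows SQLite's dialect: the wildcard is `%` rather than `*`, and dates are stored as text in year-month-day order so that BETWEEN compares them correctly. The three sample students are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (studentID INTEGER PRIMARY KEY, lastname TEXT,
                       gender INTEGER, DOB TEXT);
INSERT INTO students VALUES
    (1, 'Smith',  0, '1989-08-15'),
    (2, 'Smiley', 1, '1988-07-01'),
    (3, 'Lese',   0, '1986-03-21');
""")

def ids(sql):
    """Run a query and return the matching student IDs in order."""
    return [row[0] for row in conn.execute(sql + " ORDER BY studentID")]

by_gender = ids("SELECT studentID FROM students WHERE gender = 0")
by_name = ids("SELECT studentID FROM students WHERE lastname = 'Smith'")
by_date = ids("SELECT studentID FROM students "
              "WHERE DOB BETWEEN '1988-01-01' AND '1990-01-01'")
# LIKE ignores case for plain ASCII letters in SQLite, so 'smi%' finds Smith.
by_pattern = ids("SELECT studentID FROM students WHERE lastname LIKE 'smi%'")
combined = ids("SELECT studentID FROM students "
               "WHERE gender = 0 AND lastname LIKE 'smi%'")

print(by_gender, by_name, by_date, by_pattern, combined)
```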

109
JOIN
  • In many cases, we need to fetch data from
    multiple tables. Thus, we need to bind together
    the data from the tables. The binding is based
    on some keys, usually the primary key or some
    other unique data items.
  • Good online material (but be aware that this is
    for standard SQL, not for Access!)
  • For Microsoft-specific questions, please go to
    http://msdn.microsoft.com/

110
Join in Access
  • Select sth1, sth2 from table1 INNER JOIN table2
    ON table1.key1 = table2.key2.
  • For example:
  • Select students.* from (students INNER JOIN
    studentscourses ON students.ID =
    studentscourses.studentID) where
    studentscourses.courseNum = 'Comp200'

111
Other JOINS
  • There are two different types of JOIN:
  • INNER JOIN
  • OUTER JOIN, which includes
  • LEFT JOIN
  • RIGHT JOIN
  • Let's not deal with OUTER JOIN in this class, to
    keep it simple.

112
INNER JOIN
  • INNER JOIN joins only the records whose key
    appears in both tables!
  • See the MSDN explanation
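
The difference between INNER JOIN and an outer join is easiest to see on a tiny dataset. The sketch below runs in SQLite through Python's built-in `sqlite3` module rather than in Access, and the sample rows are invented: student 3 is deliberately not enrolled in anything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (ID INTEGER PRIMARY KEY, lastname TEXT);
CREATE TABLE studentscourses (studentID INTEGER, courseNum TEXT);
INSERT INTO students VALUES (1, 'Smith'), (2, 'Marsa'), (3, 'Lese');
INSERT INTO studentscourses VALUES (1, 'Comp200'), (2, 'Comp200'), (2, 'Math100');
""")

# INNER JOIN: only students whose ID appears in studentscourses.
inner = conn.execute("""
    SELECT students.lastname, studentscourses.courseNum
    FROM students INNER JOIN studentscourses
         ON students.ID = studentscourses.studentID
    ORDER BY students.ID, studentscourses.courseNum
""").fetchall()

# LEFT JOIN: every student; missing course data shows up as NULL (None).
left = conn.execute("""
    SELECT students.lastname, studentscourses.courseNum
    FROM students LEFT JOIN studentscourses
         ON students.ID = studentscourses.studentID
    ORDER BY students.ID, studentscourses.courseNum
""").fetchall()

print(inner)
print(left)
```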

113
Sort the Results
  • You can order the results in ascending or
    descending order.
  • Select * from students order by studentID desc
  • Select * from students order by lastname (for
    ascending order, you don't need to specify it)

114
Subquery
  • Inside a query, we can have another query to
    provide some information for a condition, i.e. we
    have subquery(s) inside a query.
  • Select * from students where studentID in (select
    studentID from studentscourses where
    coursenumber = 'Comp200')

115
Functions
  • Access queries can use built-in functions, for
    example MAX, MIN, COUNT, etc. Let's try COUNT.
  • Question: how do we find the number of students
    who are currently taking courses in the school?
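
One possible approach to the subquery and the COUNT question is sketched below. It runs in SQLite through Python's built-in `sqlite3` module, so the syntax is SQLite's; Access may need the count to be phrased differently. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (studentID INTEGER PRIMARY KEY, lastname TEXT);
CREATE TABLE studentscourses (studentID INTEGER, coursenumber TEXT);
INSERT INTO students VALUES (1, 'Smith'), (2, 'Marsa'), (3, 'Lese');
INSERT INTO studentscourses VALUES (1, 'Comp200'), (2, 'Comp200'), (2, 'Math100');
""")

# Subquery: the inner SELECT supplies the list of IDs for the IN condition.
in_comp200 = [r[0] for r in conn.execute("""
    SELECT lastname FROM students
    WHERE studentID IN (SELECT studentID FROM studentscourses
                        WHERE coursenumber = 'Comp200')
    ORDER BY studentID
""")]

# COUNT: a student enrolled in two courses must still be counted once.
taking_courses = conn.execute(
    "SELECT COUNT(DISTINCT studentID) FROM studentscourses"
).fetchone()[0]

print(in_comp200, taking_courses)
```

A plain COUNT(*) on the link table would return 3 here, the number of enrollments rather than the number of students.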

116
Others
  • So far, we have been dealing with SELECT queries.
    There are other types:
  • CREATE: create tables
  • INSERT: insert rows
  • DROP: drop tables
  • DELETE: delete rows
  • ALTER: change the table structures
  • Etc.
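
The statements listed above can be run one after another on a scratch table. This is a sketch in SQLite through Python's built-in `sqlite3` module, not in Access, and the `classes` rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.execute("CREATE TABLE classes (ID INTEGER PRIMARY KEY, title TEXT)")  # CREATE
conn.execute("INSERT INTO classes VALUES (1, 'Comp200')")                  # INSERT
conn.execute("INSERT INTO classes VALUES (2, 'Math100')")
conn.execute("ALTER TABLE classes ADD COLUMN location TEXT")               # ALTER
conn.execute("DELETE FROM classes WHERE ID = 2")                           # DELETE

# The new column exists but is empty (NULL) for rows inserted earlier.
remaining = conn.execute("SELECT ID, title, location FROM classes").fetchall()

conn.execute("DROP TABLE classes")                                         # DROP
tables_left = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()

print(remaining, tables_left)
```

DELETE removes rows but keeps the table; DROP removes the table itself.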

117
Sample Database
  • Here is a sample database with some queries
    constructed. Might be useful as references.
  • Remember that this class is not only about
    databases, so we cannot go very deep into
    database issues. If you are more interested in
    databases, I may be able to offer a class
    specifically on databases.

118
Project 5
  • Project 5 asks you to construct a database for a
    group of restaurants. Please use a UML diagram
    to analyze the data first, then construct your
    database. Also, please provide some queries.
    Imagine that you are providing a hotline service
    for customer inquiries about food services in
    the Hartford area.

119
Part IV Bioinformatics
  • What is bioinformatics?
  • The study of the application of computer and
    statistical techniques to the management of
    biological information
  • The science of creating and managing biological
    databases to keep track of, and eventually
    simulate, the complexity of living organisms.
  • There exist different definitions, though.

120
The Possible Role of Bioinformatics?
  • Looking over the history of biology, different
    approaches have been used over time.
  • Initially: Guessing → Observation → Dissection.
  • Mendel started genetic experiments.
  • Biochemists used organic chemistry to work out
    the metabolic pathways.
  • Molecular biology is another approach, now used
    to decode the secrets of life.
  • Is it time for bioinformatics as another
    approach?

121
Several Foundations of Bioinformatics
  • All forms of life come from the same ancestors,
    whether evolved or created. That means that
    knowledge obtained from one form of life may be
    applied to other forms. In fact, molecular
    biology started with bacteria, then yeast, then
    mammals. → database
  • Publicly available data resources.
  • Human Genome Project

122
Public Resources
  • I am not sure how many biological research
    laboratories there are in the world, but there
    must be a great many.
  • No other science has an equal or even close
    number of research laboratories.
  • It receives the largest amount of research funds
    from governments, states, private corporations,
    etc.

123
Most famous Agencies
  • NIH (National Institutes of Health)
  • WHO (World Health Organization)
  • Others

124
Huge Amount of Information
  • Scientists around the world have generated a
    large amount of scientific information, and it
    is likely that much of it is duplicated.
  • Communication among scientists becomes extremely
    important.
  • That is why there are so many publicly available
    biological resources.
  • Internet plays a critical role in the information
    sharing.

125
Information on the Internet
  • Access to information for anyone with an Internet
    browser.
  • The data stored in centralized databases is
    redundant by a factor of about 2.5, which
    provides quality control.
  • Information from yeast (for example) could be
    helpful in finding/understanding homologous
    genes/pathways in humans (comparative genomics).

126
Human Genome Project
  • HGP.
  • Without HGP, there is no real bioinformatics.
  • Bioinformatics took off after large parts of the
    human genome were decoded → how do we use this
    DNA information? → Computer technologies!

127
Bioinformatics and Evolution
Mutations
128
Mutations
  • Mutations that occur in germ cells will be passed
    on to the next generation, like any other DNA
    sequences.
  • So, as time and generations go by, a DNA sequence
    will acquire more and more mutations and resemble
    less and less the original DNA sequence.

129
Need to Know Where We Came From
  • From an evolutionary perspective, we cannot know
    where we are going unless we know where we have
    been. Previously, the study of human evolution
    was largely the province of paleoanthropologists
    who studied the fossil record.
  • However, gene comparisons have now become the
    major and more accurate technique → using
    computer technologies/bioinformatics.

130
Do You Know
  • We all started from Africa?
  • Mitochondrial DNA analysis of women from
    different nations found that African people have
    larger variations in DNA sequence → the oldest
    group has the greatest genetic diversity → the
    African population is the oldest → the ancestor.

131
Bioinformatics with AIDS
  • Analysis of the human genome guides AIDS
    research. Some persons long infected with HIV
    have not shown any symptoms of the disease.
    Studies found that these people possess a variant
    of the receptor CCR5 → it is rare in Asians and
    Africans → one guess is that it may have come to
    Europeans in the 14th century.

132
Tools of Bioinformatics
  • Gene Prediction Software
  • Sequence Alignment Software
  • Molecular Phylogenetics
  • Molecular Modeling and 3-D Visualization.

133
NCBI
  • National Center for Biotechnology Information.
  • PubMed (Medline)
  • Entrez
  • BLAST
  • OMIM
  • Books
  • TaxBrowser
  • Structure

134
PubMed
  • Access to the Medline database → the largest
    biomedical literature source.
  • Medline database contains citations and abstracts
    from more than 4600 biomedical journals published
    in USA and other countries.
  • Searches are commonly conducted using a
    keyword(s), author names, publication date,
    and/or journal titles.

135
Entrez
  • A search and retrieval system that integrates all
    of the databases available at NCBI. These
    databases include nucleotide sequences, protein
    sequences, genomes, molecular structure and
    PubMed.
  • GenBank, the DNA Data Bank of Japan (DDBJ), and
    the European Molecular Biology Laboratory (EMBL)
    make up the International Nucleotide Sequence
    Database Collaboration. These organizations
    exchange data every day.
  • Search for Bcl2 as an example.

136
BLAST
  • Basic Local Alignment Search Tool.

Sequence 1 AGTTCGATAGCTAAGGTCGG
Sequence 2 AGTTCGATAGCTATGGTCGG
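
A minimal sketch of the comparison this slide invites: walking along Sequence 1 and Sequence 2 position by position and reporting where they differ. This is not how BLAST itself is implemented; it is only the simplest possible check for two sequences of equal length, and the helper name `mismatches` is an assumption.

```python
seq1 = "AGTTCGATAGCTAAGGTCGG"
seq2 = "AGTTCGATAGCTATGGTCGG"

def mismatches(a, b):
    """Return (1-based position, base in a, base in b) for every differing position."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]

diffs = mismatches(seq1, seq2)
print(diffs)
```

The two sequences differ by a single substitution, so a position-by-position comparison is enough here.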
137
BLAST
Sequence 3 AGTTCGATAGCTAAGGTCGG
Sequence 4 AGTTCGATAGCTAGGTCGGG
138
BLAST Another Look
Sequence 3 AGTTCGATAGCTAAGGTCGG
Sequence 4 AGTTCGATAGCTA-GGTCGG
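
The two BLAST slides for Sequences 3 and 4 can be illustrated with a small alignment sketch. Compared position by position, a single missing base shifts everything after it and produces several mismatches; allowing a gap recovers the match. The code below uses a simple global alignment (Needleman-Wunsch style), which is not BLAST's actual heuristic local-alignment algorithm, and the scoring values (match 1, mismatch -1, gap -2) are assumptions chosen for illustration.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Simple global alignment by dynamic programming; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]

seq3 = "AGTTCGATAGCTAAGGTCGG"
seq4_shifted = "AGTTCGATAGCTAGGTCGGG"  # as on the first Sequence 3/4 slide
seq4 = "AGTTCGATAGCTAGGTCGG"           # as on the "Another Look" slide

# Without gaps, the shift after the missing base causes several mismatches.
ungapped_mismatches = sum(x != y for x, y in zip(seq3, seq4_shifted))

aligned3, aligned4, best = global_align(seq3, seq4)
print(ungapped_mismatches)
print(aligned3)
print(aligned4)
```

With one gap inserted, every remaining base of Sequence 4 lines up with a matching base in Sequence 3.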
139
Use BLAST
  • Click here.
  • Let's choose blastn.
  • Now, let's practice using it.

140
OMIM
  • Online Mendelian Inheritance in Man
  • It is a database containing information about
    human genes and genetic diseases. This resource
    is often used by physicians and researchers
    interested in genetic diseases.

141
Books
  • NCBI collaborates with authors and publishers to
    create a virtual bookshelf.

142
TaxBrowser
  • The taxonomy site contains a classification of
    all the organisms that are represented by
    sequences in the public databases, including
    model organisms commonly used in molecular
    biology.

143
Structure
  • The structure site features the Molecular
    Modeling Database (MMDB), which contains
    macromolecular 3-D structures as well as tools to
    analyze them. Included in the MMDB are
    experimentally determined structures obtained
    from the Protein Data Bank (PDB).

144
Cn3D 4.1
  • You can download it.
  • It reads MMDB files instead of PDB files. This is
    because MMDB ensures the correctness of the PDB
    file it reads.
  • The Link

145
Applications of Bioinformatics
  • Forensic Science
  • Agriculture
  • Medicine
  • Pharma/Biotechnology
  • Environmental Science
  • Ethical, Legal, and Social Issues

146
Forensic Science
  • Minisatellites consist of short DNA sequences
    that repeat in tandem. The number of repeats and
    the sequence within each repeat can exhibit wide
    variation in a population. Techniques based on
    this were developed to identify individuals.
    For example, the FBI established the Combined DNA
    Index System (CODIS), which contains profiles of
    convicted offenders.
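
To make the "number of repeats" idea concrete, here is a minimal sketch that counts how many times a repeat unit occurs back to back in a sequence. The repeat unit, the flanking bases, and the two samples are invented for illustration, and real forensic profiling compares repeat counts at many loci rather than one.

```python
import re

def max_tandem_repeats(sequence, unit):
    """Largest number of consecutive copies of `unit` found anywhere in `sequence`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)

# Hypothetical samples: same flanking bases, different numbers of repeats.
sample_a = "TTC" + "GATA" * 7 + "CCG"
sample_b = "TTC" + "GATA" * 11 + "CCG"

count_a = max_tandem_repeats(sample_a, "GATA")
count_b = max_tandem_repeats(sample_b, "GATA")
print(count_a, count_b)
```

Two individuals who differ in repeat count at a locus, as these toy samples do, can be told apart at that locus.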

147
Forensic Science
  • DNA testing is now the standard technique for
    confirming paternity.
  • It is also a technique to identify criminals and
    victims.
  • Computer technology is essential to search
    through the database for the identification.

148
Agriculture
  • Genome projects for major crop plants are well
    underway
  • Pest control
  • Seed quality
  • Plant micronutrients (golden rice)
  • Etc.

149
Medicine
  • The ability to correlate genetic data with
    medical records promises to improve our
    understanding of disease and improve treatments.
  • Microarray → cancer classification
  • Associating SNPs with disease helps scientists to
    identify genes that play roles in disease
    progression.
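
One simple way to quantify an association between a SNP and a disease is an odds ratio from a 2x2 table of cases and controls. The sketch below assumes made-up counts purely for illustration; a real study would also need a significance test and correction for multiple comparisons.

```python
def odds_ratio(cases_with, cases_without, controls_with, controls_without):
    """Odds of carrying the variant among cases divided by the odds among controls."""
    return (cases_with * controls_without) / (cases_without * controls_with)

# Hypothetical counts, invented for illustration only.
ratio = odds_ratio(cases_with=30, cases_without=70,
                   controls_with=15, controls_without=85)
print(round(ratio, 2))
```

An odds ratio well above 1 suggests the variant is more common among people with the disease, which is what flags a SNP for further study.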

150
Pharma/Biotechnology
  • Bioinformatics is providing a complete list of
    candidate genes for drug discovery. The tools of
    functional genomics are being used to establish
    the metabolic roles played by the candidate gene
    products.
  • Pharmaceutical companies are using bioinformatics
    to search for new antibiotics.

151
Cont'd
  • Advances in genomics are expanding the range of
    drug targets and are shifting the discovery
    effort from direct screening programs to rational
    target-based drug designs.

152
Environmental Sciences
  • Global biodiversity.
  • Global Biodiversity Information Facility (GBIF)
  • How to analyze this diversity and make use of
    it.
  • Computer software to monitor environmental
    changes, via the behaviors of birds and other
    animals.

153
Ethical, Legal and Social Issues
  • Anonymous databases → include nonidentifiable
    genetic data.
  • Non-anonymous databases → their data could be
    linked to individuals.
  • An ethical concern most relevant to non-anonymous
    databases is Informed Consent.

154
Informed Consent
  • Informed consent is the ethical practice of
    respecting individual autonomy and protecting an
    individual from harm. It refers to a process
    whereby an individual freely and knowingly weighs
    the risks and benefits of donating a tissue or
    DNA sample for research purposes.

155
Privacy and Confidentiality
  • Personal privacy is an important aspect of
    informed consent. Privacy is the right to
    control access to information about oneself.
  • Confidentiality is the obligation for those who
    obtain information about individuals to protect
    the privacy of that information.

156
More
  • If society is to gain the most from genomic
    biology, then the public must be able to
    rationally consider scientific issues. People
    should not place blind trust in scientists, nor
    should they dismiss new technologies out of hand.

157
In-Class Exercise
  • The human genome was sequenced via the shotgun
    approach, in which human chromosomes were randomly
    cut into pieces.
  • Each DNA piece is sequenced separately.
  • Computer technology is then used to find the
    overlaps and construct the contiguous sequence.
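
The overlap-and-merge step can be sketched as a small greedy procedure: repeatedly find the two fragments with the longest suffix/prefix overlap and join them. This is a toy illustration under simplifying assumptions (one strand, no sequencing errors, no repeated regions); the fragments are invented by cutting up one of the example sequences from the BLAST slides, and real assemblers are far more elaborate.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(fragments, min_len=3):
    """Merge the best-overlapping pair of fragments until no overlaps remain."""
    frags = list(fragments)
    while len(frags) > 1:
        best = (0, None, None)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # no remaining overlaps; leave the fragments as separate contigs
        merged = frags[i] + frags[j][k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (i, j)] + [merged]
    return frags

# Hypothetical fragments in scrambled order.
pieces = ["CTAAGGTCGG", "AGTTCGATAG", "GATAGCTAAG"]
contigs = greedy_assemble(pieces)
print(contigs)
```

The in-class exercise follows the same idea by hand: each pair of fragments is compared end to start, and the overlapping ends are merged.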

158
In-Class Exercise
  • Group 1
  • Group 2
  • Group 3
  • Each group will construct two fragments, and all
    groups work together on the final sequence.
  • For simplicity, we are dealing with only one
    strand.