Lecture 3.1 MySQL and JDBC - PowerPoint PPT Presentation

1 / 75
About This Presentation
Title:

Lecture 3.1 MySQL and JDBC

Description:

Learn how to use the Structured Query Language (SQL) ... Integer, Real, Varchar, Text, Blob, etc... Operations on tables produce tables. Lecture 3.1 ... – PowerPoint PPT presentation

Number of Views:541
Avg rating:3.0/5.0
Slides: 76
Provided by: stephe78
Category:
Tags: jdbc | mysql | blob | lecture

less

Transcript and Presenter's Notes

Title: Lecture 3.1 MySQL and JDBC


1
Lecture 3.1 MySQL and JDBCLets talk databases
  • Sohrab Shah
  • UBC Bioinformatics Centre
  • sohrab_at_bioinformatics.ubc.ca
  • http//bioinformatics.ubc.ca/people/sohrab

2
Objectives
  • Learn the basics of relational databases
  • Learn how to use MySQL
  • Learn how to use the Structured Query Language
    (SQL)
  • Learn to communicate with MySQL through the Java
    Database Connectivity (JDBC) protocol

3
Outline
  • Why are databases important in bioinformatics?
  • Brief background in databases
  • Introduction to the Structured Query Language
  • A worked example a Sequence database in MySQL
  • The JDBC protocol using Connector/J for MySQL

4
MySQL
  • Relational Databases
  • Advantages
  • Representing data
  • SQL
  • Examples
  • Sequence
  • INSERT
  • DELETE
  • UPDATE
  • SELECT
  • JOIN
  • MySQL
  • Free
  • Fast
  • Functional
  • MySQL in bioinformatics
  • Ensembl
  • Gene Ontology
  • UCSC
  • JDBC
  • Connector/J
  • Making a connection
  • Preparing a query
  • Executing a query
  • Getting results

5
What is a database?
  • Collection of information
  • Spreadsheet
  • Filing cabinet
  • Oracle database
  • Biology is abound with collections of data
  • Tsunami, deluge, avalanche, flood
  • Databases help us efficiently organise, integrate
    and query data in order to make scientific
    inferences

http//bioteach.ubc.ca
6
Databases and bioinformatics
  • Nucleotide records
    36,653,899
  • Protein sequences
    4,436,362
  • 3D structures
    19,640
  • Interactions complexes
    52,385
  • Human Unigene Cluster
    118,517
  • Maps and Complete Genomes
    6,948
  • Different taxonomy Nodes
    283,121
  • Human dbSNP
    13,179,601
  • Human RefSeq records
    22,079
  • bp in Human Contigs gt 5,000 kb (116)
    2,487,920,000
  • PubMed records
    12,570,540
  • OMIM records
    15,138

7
Molecular biology needs databases!
  • High volume complex data structures

HELP!
8
NAR Database Issue - 2004
142 articles
9
RELATIONAL DATABASES
10
Relational Databases
  • A brief history
  • Developed by E.F. Codd (IBM) 1969-70
  • Died 2003
  • Awarded the Turing prize for his work
  • Developed 12 rules to define a RD that call for a
    language to define, manipulate and query the data
    in the database
  • 1 rule led to the Structured Query Language (SQL)
    that is used in every RDMBS system on the market
  • ANSI standard (92,99)

11
SQL
12
Relational Model
  • All data stored in tables
  • Table is a relation made up of columns (fields)
    and rows (records)
  • Intersection of a column and a row is a typed
    value
  • Integer, Real, Varchar, Text, Blob, etc
  • Operations on tables produce tables

13
Advantages of the relational model
  • Data independence
  • Shielding the data from the application
  • Efficiency
  • Storage, retrieval, integration
  • Data integrity/security
  • Constraints, access controls

14
ACID test
  • Atomicity
  • all or nothing transaction
  • If one operation fails, all fail
  • Consistency
  • data integrity
  • constraints
  • Isolation
  • Every transaction has a consistent view of the
    database regardless of what other transactions
    are being processes
  • Durability
  • Once a transaction is complete, the newly updated
    data will survive failures of any kind
  • logs

15
Research fuelled by corporate databases gives us
great technology for biological science
  • 30 years of research into robust systems
  • Industry standards for databases
  • Vendors committed to high-quality products
  • Oracle, DB2, Sybase, MS SQLserver, etc
  • Emergence of the internet and database driven
    web-content set the stage for bioinformatics
  • Data mining tools for creating statistical
    associations
  • Diapers and beer?
  • Teradata, a division of NCR Corporation

16
SQL
17
What drives a database?
SQL
18
SQL
  • Structured Query Language (ANSI 92,99)
  • Used in virtually every RDBMS product
  • Has operations for
  • Creating tables
  • Modifying tables
  • Relating tables
  • Inserting data
  • Updating data
  • Retrieving sets of data
  • Deleting sets of data
  • Deleting tables

19
SQL
  • Not all implementations consistent
  • WARNING
  • MySQL CREATE TABLE statements ! PostgreSQL
    CREATE TABLE statements

20
Commercial RDBMS
  • Oracle
  • According to Forbes, Larry Ellison is the 9th
    richest person in the US (18 billion)
  • DB2
  • IBMs solution free for academics
  • Microsoft SQL server
  • For Windows

21
Open Source RDBMS
  • PostgreSQL
  • http//www.postgresql.org/
  • the worlds most advanced Open Source database
    software
  • Began in 1986 at UC Berkeley
  • For many years considered the most
    sophisticated OS RDBMS
  • Performance?
  • Comes with most Linux distros
  • Small but loyal user community

22
MySQL
  • http//www.mysql.com/
  • The world's most popular open source database
  • gt 5,000,000 active installations
  • Easy to use
  • Very fast retrieval due to architecture
  • Considered by many to be a toy database
  • For years no row-level locking
  • Did not handle transactions well

23
MySQL
  • Free
  • As in free beer
  • Dual license
  • Commercial http//www.mysql.com/products/licensin
    g/commercial-license.html
  • OpenSource http//www.mysql.com/products/licensin
    g/opensource-license.html
  • As in free speech
  • Fast
  • Extremely fast reads for certain table types
  • Outperforms any RDMBS for reads
  • Functional
  • Ease of use
  • APIs in Perl, C, C, Java
  • Client/server architecture
  • Works well with Apache/PHP for very popular OS
    dynamic web solution

24
MySQL versions
  • 3.23. (http//dev.mysql.com/doc/mysql/en/News-3.2
    3.x.html)
  • Introduces row-level locking
  • Introduces full-text indexing
  • 4.0. (http//dev.mysql.com/doc/mysql/en/News-4.0.
    x.html)
  • Transactions, foreign keys with InnoDB
  • Improved Full-text indexing
  • 4.1. (http//dev.mysql.com/doc/mysql/en/News-4.1.
    x.html)
  • Subqueries
  • 5.0. (http//dev.mysql.com/doc/mysql/en/News-5.0.
    x.html)
  • Stored procedures

25
MySQL examples in bioinformatics
  • Free, fast and functional have made MySQL
    pervasive in bioinformatics
  • Ensembl (http//www.ensembl.org)
  • Automated eukaryotic annotation database
  • Gene Ontology (http//www.geneontology.org)
  • Controlled vocabulary for genes and functions
  • UCSC Genome Browser (http//genome.ucsc.edu)
  • Human and other genome browser
  • BASE (http//base.thep.lu.se)
  • BioArray Software Environment a web-based
    database solution for microarrays

26
Worked example a relational model for sequences
and features
  • Create a relational model
  • Tables to store
  • data
  • Sequence strings
  • Meta-data
  • Data about the data features and their
    locations
  • Insert some records
  • Query the data to pull out useful subsets

27
Creating a Relational Database
  • Start with a data set
  • Divide data set into records
  • The data
  • Divide records into useful fields that describe
    the particular record
  • The meta-data
  • Create a model based on the useful fields
  • Create a database from the model
  • Insert the data into the database
  • The data is now computable

28
Example Genbank sequence record
26-APR-2004
2075 bp
ACCESSION L32174
gene lt1..1204
/gene"LAF1"
29
Simple example a relational model for biological
sequences and features
30
Remove the last page from your binder
31
CREATE Sequence
  • CREATE TABLE Sequence (
  • sequence_id INT NOT NULL AUTO_INCREMENT,
  • sequence LONGTEXT NOT NULL,
  • defline TEXT,
  • accession VARCHAR(255) NOT NULL,
  • version INT DEFAULT 0,
  • length INT DEFAULT 0,
  • moltype INT NOT NULL,
  • PRIMARY KEY(sequence_id)
  • )

32
CREATE Ontology
  • CREATE TABLE Ontology (
  • ontology_id INT NOT NULL AUTO_INCREMENT,
  • term VARCHAR(255) NOT NULL,
  • description TEXT NOT NULL,
  • PRIMARY KEY (ontology_id)
  • )

33
CREATE Feature
  • CREATE TABLE Feature (
  • feature_id INT NOT NULL AUTO_INCREMENT,
  • sequence_id INT NOT NULL,
  • ontology_id INT NOT NULL,
  • FOREIGN KEY (sequence_id) REFERENCES Sequence,
  • FOREIGN KEY (ontology_id) REFERENCES Ontology,
  • PRIMARY KEY(feature_id)
  • )

34
CREATE Location
  • CREATE TABLE Location (
  • location_id INT NOT NULL AUTO_INCREMENT,
  • feature_id INT NOT NULL,
  • start INT NOT NULL,
  • stop INT NOT NULL,
  • strand INT NOT NULL,
  • FOREIGN KEY (feature_id) REFERENCES Feature,
  • PRIMARY KEY(location_id)
  • )

35
CREATE Qualifier
  • CREATE TABLE Qualifier (
  • qualifier_id INT NOT NULL AUTO_INCREMENT,
  • feature_id INT NOT NULL,
  • ontology_id INT NOT NULL,
  • value TEXT NOT NULL,
  • FOREIGN KEY (feature_id) REFERENCES Feature,
  • FOREIGN KEY (ontology_id) REFERENCES Ontology,
  • PRIMARY KEY (qualifier_id)
  • )

36
INSERT an ontology
mysqlgt INSERT INTO Ontology (term, description)
VALUES -gt ('start codon', 'denotes an
Methionine codon of a transcript') Query OK, 1
row affected (0.00 sec) mysqlgt SELECT FROM
Ontology -------------------------------------
---------------------------------- ontology_id
term description
-----------------------------------
------------------------------------
3 start codon denotes an Methionine codon of
a transcript --------------------------------
--------------------------------------- 1 row in
set (0.01 sec)
37
INSERT some more ontologies
mysqlgt INSERT INTO Ontology (term, description)
VALUES -gt ('exon', 'an exon in genomic
sequence') Query OK, 1 row affected (0.00
sec) mysqlgt INSERT INTO Ontology (term,
description) VALUES -gt ('exon type', '3\'UTR,
initial, internal, terminal, 5\'UTR') Query OK,
1 row affected (0.00 sec) mysqlgt SELECT FROM
Ontology -------------------------------------
---------------------------------- ontology_id
term description
-----------------------------------
------------------------------------
3 start codon denotes an Methionine codon of
a transcript 4 exon an
exon in genomic sequence
5 exon type 3'UTR, initial, internal,
terminal, 5'UTR ---------------------------
-------------------------------------------- 3
rows in set (0.00 sec)
38
INSERT a sequence
mysqlgt DESC Sequence --------------------------
-------------------------------------
Field Type Null Key Default
Extra ----------------------------
-----------------------------------
sequence_id int(11) PRI NULL
auto_increment sequence longtext
defline
text YES NULL
accession varchar(255)
version
int(11) YES 0
length int(11) YES
0 moltype
int(11) 0
----------------------------------------
----------------------- 7 rows in set (0.00
sec) mysqlgt INSERT INTO Sequence (sequence,
defline, accession, version, length, moltype)
-gt VALUES ('ATGACGATCAGCATCAGCTACAGCTG', 'gt
seq1', 'seq1', 1, 26, 1) Query OK, 1 row
affected (0.00 sec)
39
INSERT a Feature on a sequence
mysqlgt SELECT FROM Sequence -----------------
-----------------------------------------------
----------------------- sequence_id
sequence defline accession
version length moltype -----------------
-----------------------------------------------
----------------------- 2
ATGACGATCAGCATCAGCTACAGCTG gt seq1 seq1
1 26 1 -----------------
-----------------------------------------------
----------------------- 1 row in set (0.03
sec) mysqlgt SELECT FROM Ontology ------------
------------------------------------------------
----------- ontology_id term
description
---------------------------------------------
-------------------------- 3 start
codon denotes an Methionine codon of a
transcript 4 exon an
exon in genomic sequence
5 exon type 3'UTR, initial, internal,
terminal, 5'UTR ---------------------------
-------------------------------------------- 3
rows in set (0.00 sec) mysqlgt INSERT INTO
Feature (sequence_id, ontology_id) -gt VALUES
(2, 3) Query OK, 1 row affected (0.00 sec)
40
INSERT a Location
mysqlgt SELECT From Feature ------------------
-------------------- feature_id sequence_id
ontology_id -------------------------------
------- 1 2 3
-------------------------------------- 1
row in set (0.01 sec) mysqlgt DESC
Location -----------------------------------
----------------------- Field Type
Null Key Default Extra
------------------------------------------
---------------- location_id int(11)
PRI NULL auto_increment feature_id
int(11) 0
start int(11) 0
stop int(11)
0 strand
int(11) 0
------------------------------------------
---------------- 5 rows in set (0.00
sec) mysqlgt INSERT INTO Location (feature_id,
start, stop, strand) -gt VALUES(1,1,3,1) Query
OK, 1 row affected (0.02 sec)
41
Queries using SELECT
mysqlgt SELECT FROM Sequence -----------------
-----------------------------------------------
----------------------- sequence_id
sequence defline accession
version length moltype -----------------
-----------------------------------------------
----------------------- 2
ATGACGATCAGCATCAGCTACAGCTG gt seq1 seq1
1 26 1 3
SLKLSKLPSPLYQVCLE gt seq2 L32174
1 17 3 -----------------
-----------------------------------------------
----------------------- 2 rows in set (0.00
sec) mysqlgt SELECT sequence FROM Sequence WHERE
accession 'seq1' ----------------------------
sequence -----------------
----------- ATGACGATCAGCATCAGCTACAGCTG
---------------------------- 1 row in set
(0.12 sec) mysqlgt SELECT length FROM Sequence
WHERE sequence_id 3 -------- length
-------- 17 -------- 1 row in set
(0.03 sec)
42
Joining tables
mysqlgt SELECT FROM Feature ------------------
-------------------- feature_id sequence_id
ontology_id -------------------------------
------- 1 2 3
2 2 4
3 2 4
-------------------------------------- 3
rows in set (0.04 sec)
43
Setting up a complex query
  • Consider sequence seq1 with the following
    features
  • Initial exon from 1..6
  • Internal exon from 15..20
  • Note that with relational model the term exon
    only appears once in the database

44
Complex query
mysqlgt SELECT FROM Sequence WHERE sequence_id
2 --------------------------------------------
-------------------------------------------
sequence_id sequence
defline accession version length moltype
---------------------------------------------
------------------------------------------
2 ATGACGATCAGCATCAGCTACAGCTG gt seq1
seq1 1 26 1
---------------------------------------------
------------------------------------------ 1
row in set (0.04 sec) mysqlgt SELECT FROM
Feature WHERE sequence_id 2 -----------------
--------------------- feature_id
sequence_id ontology_id --------------------
------------------ 1 2
3 2 2
4 3 2 4
-------------------------------------- 3
rows in set (0.03 sec) mysqlgt SELECT FROM
Location ------------------------------------
---------- location_id feature_id start
stop strand -------------------------------
--------------- 1 1
1 3 1 2 2
1 6 1 3
3 15 20 1 -------------------
--------------------------- 3 rows in set
(0.20 sec) mysqlgt SELECT FROM
Ontology -------------------------------------
---------------------------------- ontology_id
term description
-----------------------------------
------------------------------------
3 start codon denotes an Methionine codon of
a transcript 4 exon an
exon in genomic sequence
5 exon type 3'UTR, initial, internal,
terminal, 5'UTR ---------------------------
-------------------------------------------- 3
rows in set (0.00 sec) mysqlgt SELECT FROM
Qualifier ------------------------------------
------------- qualifier_id feature_id
ontology_id value -----------------------
-------------------------- 1
2 5 initial 2
3 5 internal
--------------------------------------------
----- 2 rows in set (0.03 sec)
45
Complex query
Return me the sub-sequences and coordinates of
the exon features of seq1
mysqlgt SELECT SUBSTRING(sequence, start, stop),
start, stop, term -gt FROM Sequence, Ontology,
Feature, Location -gt WHERE accession 'seq1'
AND term 'exon' AND -gt Feature.ontology_id
Ontology.ontology_id AND -gt
Feature.sequence_id Sequence.sequence_id AND
-gt Feature.feature_id Location.feature_id ---
-----------------------------------------------
--- SUBSTRING(sequence, start, stop) start
stop term ----------------------------------
------------------- ATGACG
1 6 exon CAGCTACAGCTG
15 20 exon
---------------------------------------------
-------- 2 rows in set (0.04 sec)
46
Even more complex
Return me the sub-sequences, coordinates feature
name and qualifier value of the internal exon
features of seq1
mysqlgt SELECT SUBSTRING(sequence, start, stop),
start, stop, o1.term, value -gt FROM Sequence,
Feature, Ontology o1, Ontology o2, Location,
Qualifier -gt WHERE accession 'seq1' AND
-gt o1.term 'exon' AND -gt o2.term 'exon
type' AND -gt value 'internal' AND -gt
Feature.ontology_id o1.ontology_id AND -gt
Qualifier.ontology_id o2.ontology_id AND -gt
Qualifier.feature_id Feature.feature_id AND
-gt Feature.sequence_id Sequence.sequence_id
AND -gt Location.feature_id
Feature.feature_id -----------------------------
----------------------------------
SUBSTRING(sequence, start, stop) start stop
term value -------------------------------
--------------------------------
CAGCTACAGCTG 15 20
exon internal -------------------------------
-------------------------------- 1 row in
set (0.05 sec)
47
Aggregate queries
mysqlgt SELECT FROM Sequence -----------------
-----------------------------------------------
----------------------- sequence_id
sequence defline accession
version length moltype -----------------
-----------------------------------------------
----------------------- 2
ATGACGATCAGCATCAGCTACAGCTG gt seq1 seq1
1 26 1 3
SLKLSKLPSPLYQVCLE gt seq2 L32174
1 17 3 4
MASQQQCGAR gt seq seq3
1 10 3 -----------------
-----------------------------------------------
----------------------- mysqlgt SELECT
count(), moltype from Sequence GROUP BY
moltype ------------------- count()
moltype ------------------- 1
1 2 3 -------------------
2 rows in set (0.08 sec)
48
Using LIMIT
mysqlgt SELECT FROM Sequence LIMIT
2 --------------------------------------------
-------------------------------------------
sequence_id sequence
defline accession version length moltype
---------------------------------------------
------------------------------------------
2 ATGACGATCAGCATCAGCTACAGCTG gt seq1
seq1 1 26 1
3 SLKLSKLPSPLYQVCLE gt seq2
L32174 1 17 3
---------------------------------------------
------------------------------------------ 2
rows in set (0.08 sec)
49
UPDATING a table
mysqlgt SELECT FROM Qualifier ----------------
---------------------------------
qualifier_id feature_id ontology_id value
-------------------------------------------
------ 1 2 5
initial 2 3
5 internal -----------------------------
-------------------- 2 rows in set (0.00
sec) mysqlgt UPDATE Qualifier SET value
'terminal' -gt WHERE qualifier_id 2 Query
OK, 1 row affected (0.00 sec) Rows matched 1
Changed 1 Warnings 0
50
DELETING from a table
mysqlgt DELETE FROM Qualifier -gt WHERE
qualifier_id 2 Query OK, 1 row affected (0.04
sec) mysqlgt SELECT FROM Qualifier -----------
-------------------------------------
qualifier_id feature_id ontology_id value
--------------------------------------------
---- 1 2 5
initial -------------------------------------
----------- 1 row in set (0.03 sec)
51
  • mysqlgt INSERT INTO Location (feature_id, start,
    stop, strand)
  • -gt VALUES (2, 1, 6, 1)
  • Query OK, 1 row affected (0.00 sec)
  • mysqlgt INSERT INTO Feature (sequence_id,
    ontology_id)
  • -gt VALUES (2,4)
  • Query OK, 1 row affected (0.00 sec)
  • mysqlgt SELECT From Feature
  • --------------------------------------
  • feature_id sequence_id ontology_id
  • --------------------------------------
  • 1 2 3
  • 2 2 4
  • 3 2 4
  • --------------------------------------
  • 3 rows in set (0.00 sec)

52
Optimisation
  • Perking up MySQL
  • Queries
  • Database server

53
Indexing
  • In general, indexing your data makes retrieval
    orders of magnitude faster
  • Consider a list of 1000000 sequences with
    accession numbers
  • You need to find the one sequence with accession
    number AC123456
  • Response time requires O(1000000) operations if
    the accession field is not indexed
  • Equivalent to scanning through a list
  • Response time requires O(log(1000000)) O(6)
    operations if the accession field is indexed
  • Somewhat like a hashtable lookup

54
Types of indexes
  • PRIMARY KEY
  • To identify the main accessor field of the table
  • UNIQUE
  • Constraint to ensure that all entries in a field
    are different
  • INDEX
  • Creates a way to quickly search on a given field
  • FULLTEXT
  • For large TEXT fields gt 255 characters
  • Compound indexes (column1, column2, )
  • NOTE index is synonymous with KEY

55
Drawbacks to indexing
  • Need more disk space
  • Can slow down inserts
  • Know your data and the queries you will perform
    on the data
  • Only index fields you think you will query on
  • Requires spending time in the design phase to
    define requirements of the database

56
Creating an index
mysqlgt CREATE INDEX acindex ON Sequence
(accession) Query OK, 1 row affected (0.18
sec) Records 1 Duplicates 0 Warnings 0
57
Tuning the database
  • gt mysqladmin variables
  • gt mysqld --help

58
Variables (--variable-namevalue) and boolean
options FALSETRUE Value (after reading
options) ---------------------------------
----------------------------- basedir
/raid/db/mysql/mysql-max-4.0.14-pc-l
inux-i686/ bdb-home (No
default value) bdb-logdir
(No default value) bdb-tmpdir
(No default value) bind-address
(No default value) console
FALSE chroot
(No default value) character-sets-dir
/raid/db/mysql/mysql-max-4.0.14-pc-linux-i686/s
hare/mysql/charsets/ datadir
/raid/db/mysql/mysql-max-4.0.14-pc-linux-i686
/data/ default-character-set
latin1 enable-locking
FALSE enable-pstack FALSE gdb
FALSE innodb_data_ho
me_dir (No default
value) innodb_log_group_home_dir (No
default value) innodb_log_arch_dir
(No default value) innodb_flush_log_at_trx_commit
1 innodb_flush_method (No
default value) innodb_fast_shutdown
TRUE innodb_max_dirty_pages_pct
90 init-file (No default
value) log (No
default value) language
/raid/db/mysql/mysql-max-4.0.14-pc-linux-i686/shar
e/mysql/english/ local-infile
TRUE log-bin (No
default value) log-bin-index
(No default value) log-isam
myisam.log log-update
(No default value) log-slow-queries
(No default value) log-slave-updates
FALSE low-priority-updates
FALSE master-host (No
default value) master-user
test master-port 3306
59
master-connect-retry
60 master-retry-count
86400 master-info-file
master.info master-ssl
FALSE master-ssl-key (No
default value) master-ssl-cert
(No default value) master-ssl-capath
(No default value) master-ssl-cipher
(No default value) myisam-recover
OFF memlock
FALSE disconnect-slave-event-count
0 abort-slave-event-count
0 max-binlog-dump-events
0 sporadic-binlog-dump-fail FALSE new
FALSE old-protocol
10 old-rpl-compat
FALSE pid-file
/raid/db/mysql/mysql-max-4.0.14-pc-linux-i686/data
/watson.pid log-error port
3306 report-host (No
default value) report-user
(No default value) report-password
(No default value) report-port
3306 rpl-recovery-rank
0 relay-log (No default
value) relay-log-index (No
default value) safe-user-create
FALSE server-id
1 show-slave-auth-info
FALSE concurrent-insert
TRUE skip-grant-tables
FALSE skip-slave-start
FALSE relay-log-info-file
relay-log.info slave-load-tmpdir
/raid/tmp/ socket
/tmp/mysql.sock sql-bin-update-same
FALSE sql-mode
OFF temp-pool TRUE tmpdir
/raid/tmp
60
external-locking
FALSE use-symbolic-links
TRUE symbolic-links
TRUE log-warnings
FALSE warnings
FALSE back_log
50 bdb_cache_size
8388600 bdb_log_buffer_size
0 bdb_max_lock
10000 bdb_lock_max
10000 binlog_cache_size
32768 connect_timeout
5 delayed_insert_timeout
300 delayed_insert_limit
100 delayed_queue_size
1000 flush_time
0 ft_min_word_len
4 ft_max_word_len
254 ft_max_word_len_for_sort
20 ft_stopword_file (No default
value) innodb_mirrored_log_groups
1 innodb_log_files_in_group
2 innodb_log_file_size
5242880 innodb_log_buffer_size
1048576 innodb_buffer_pool_size
8388608 innodb_additional_mem_pool_size
1048576 innodb_file_io_threads
4 innodb_lock_wait_timeout
50 innodb_thread_concurrency
8 innodb_force_recovery
0 interactive_timeout
28800 join_buffer_size
131072 key_buffer_size
402653184 long_query_time
10 lower_case_table_names
FALSE max_allowed_packet
1047552 max_binlog_cache_size
4294967295 max_binlog_size
1073741824 max_connections
100 max_connect_errors
10 max_delayed_threads
20 max_heap_table_size 16777216
61
max_join_size
18446744073709551615 max_relay_log_size
0 max_seeks_for_key
4294967295 max_sort_length
1024 max_tmp_tables
32 max_user_connections
0 max_write_lock_count
4294967295 bulk_insert_buffer_size
8388608 myisam_block_size
1024 myisam_max_extra_sort_file_size
268435456 myisam_max_sort_file_size
2147483647 myisam_repair_threads
1 myisam_sort_buffer_size
67108864 net_buffer_length
16384 net_retry_count
10 net_read_timeout
30 net_write_timeout
60 open_files_limit
0 query_cache_limit
1048576 query_cache_size
33554432 query_cache_type
1 read_buffer_size
2093056 read_rnd_buffer_size
262144 record_buffer
2093056 relay_log_space_limit
0 slave_compressed_protocol
FALSE slave_net_timeout
3600 read-only
FALSE slow_launch_time
2 sort_buffer_size
2097144 table_cache
512 thread_concurrency
8 thread_cache_size
8 tmp_table_size
33554432 thread_stack
196608 wait_timeout
28800 default-week-format 0 To see
what values a running MySQL server is using,
type 'mysqladmin variables' instead of 'mysqld
--help'.
62
Tuning the system to your needs
  • Need to think about uses of the database
  • How many concurrent connections?
  • Will there be large records?
  • Will there be repetitive queries?
  • Will I need large indexes?
  • Tuning the system can give huge gains in
    performance lets you get the most out of the
    system

63
Important parameters
  • max_allowed_packet
  • Largest amount of data to be transmitted to the
    client in 1 packet
  • max_connections
  • The largest number of concurrent connections to
    the database server
  • datadir
  • The location of the data files on the system
  • query_cache
  • Size of cache for repetitive queries
  • Many, many others..

64
DBA
65
COMMUNICATING WITH MySQL
66
Communicating with MySQL
  • Through a GUI
  • MySQL ControlCentre
  • http//www.mysql.com/products/mysqlcc/
  • Standalone application supported by MySQL
  • Through the web
  • PhpMyAdmin
  • http//www.phpmyadmin.net/home_page/
  • Works with Apache web server
  • Through the Unix command line
  • MySQL client
  • Comes with MySQL
  • Through APIs (Application Programming Interface)
  • MySQL C API
  • Perl DBI
  • MySQL (C)
  • http//dev.mysql.com/downloads/other/plusplus/
  • JDBC (Java Database Connectivity)
  • Java protocol and API for RDBMS communication

67
Communicating with MySQL
  • Choose the method that is right for the job
  • Administration
  • MySQL CC
  • PHP MyAdmin
  • Standalone Application
  • APIs
  • Web Application
  • PHP/Java servlets
  • Low throughput queries
  • Command line client

68
Working with JDBC
  • JDBC is a standard API that provides
    database-independent connectivity to allow a Java
    application to interact with a database

http//java.sun.com/products/jdbc/overview.html
69
Connector/J
  • JDBC implementation for MySQL is Connector/J
  • http//www.mysql.com/products/connector/j/
  • Installation
  • export CLASSPATH/path/to/mysql-connector-java-
    version-bin.jarCLASSPATH

70
Connector/J Steps
  • Establish a connection
  • Prepare one or more queries
  • Execute one or more queries
  • Process the results (if applicable)
  • Destroy connection

71
Connector/J Establishing a connection
  • // we need the following 6 variables to make a
    jdbc connection
  • String DBSERVERNAME mysql
  • String JDBCDRIVERNAME com.mysql.jdbc.Driver
  • String host my.database.com
  • String databaseName sequence
  • String user me
  • String password mypwd
  • //load the driver into memory
  • Class.forName(JDBCDRIVERNAME).newInstance()
  • // create the connection URL
  • String connectionURL "jdbc DBSERVERNAME
    //" host "/" database "?" "user
    user "password" password
  • // get the connection from the driver manager
  • Connection connection
    DriverManager.getConnection(connectionURL)


72
Preparing and executing a query
// object required to execute the query
Statement statement null // object to store
results of the query ResultSet resultSet
null // create the query string String query
"SELECT sequence_id FROM Sequence" //
initialise the statement statement
connection.createStatement() // execute the
query the results are returned in resultSet
resultSet statement.executeQuery(query)

73
Process the results close connection
// iterate through the rows returned by the
query while (resultSet.next()) int
sequenceId resultSet.getInt("sample_id")
// do something with the sequenceId //
destroy the connection if its no longer needed
connection.close()
74
Topics not covered
  • MySQL tools
  • mysqldump
  • Tool to dump a schema, all the data and/or both
  • mysqlimport
  • Tool to import delimited files
  • Look before you parse!
  • mysqladmin
  • For DBAs to create database, change passwords,
    etc
  • Read the mysql documentation

75
Topics not covered
  • Setting connection parameters in JDBC
  • Consult Connector/J docs
  • Database design
  • Extremely important process
  • Many courses at univ/college

76
Summary
  • Relational databases are necessary in
    bioinformatics
  • Relational databases allow us to efficiently
    store and query large amounts of data
  • MySQL is a good choice for RDBMS engine because
    it is highly functional at no cost
  • JDBC provides a way to access MySQL from within a
    Java program

77
Resources
  • MySQL
  • http//www.mysql.com
  • http//dev.mysql.com.mysql/en/index.html
  • http//www.mysql.com/products/mysqlcc/
  • http//dev.mysql.com/doc/connector/j/en
  • NAR Database Issue 2004
  • http//nar.oupjournals.org/content/vol32/suppl_1
  • JDBC
  • http//java.sun.com/products/jdbc/
  • Me
  • sohrab_at_bioinformatics.ubc.ca
Write a Comment
User Comments (0)
About PowerShow.com