Title: Database Text Searching using Triggers, TSearch2, Materialized Views and User Security
1Database Text Searching using Triggers,
TSearch2,Materialized Viewsand User Security
2SCHARPThe Statistical Center for HIV/AIDS
Research Prevention
- The Statistical Center for HIV/AIDS Research
Prevention (SCHARP) is part of the program of
Population Sciences within the Vaccine and
Infectious Disease Institute (VIDI) of the Fred
Hutchinson Cancer Research Center in Seattle,
Washington. SCHARP provides statistical
collaboration to infectious disease researchers
around the world and conducts a complementary
program of statistical methodology, and
mathematical modeling research. SCHARP also
collects, manages, and analyzes data from
clinical trials and epidemiological studies
dedicated to the elimination of infectious
disease as a threat to human health. - Formerly known as HIVNET, SCHARP has been in
existence since 1994. We currently serve as the
Statistical and Data Management Center for the
HIV Prevention Trials Network (HPTN), the HIV
Vaccine Trials Network (HVTN), and the
Microbicide Trials Network (MTN). The HVTN, HPTN,
and MTN are part of the Division of AIDS (DAIDS)
of the National Institute of Allergy and
Infectious Diseases (NIAID), a component of the
National Institutes of Health (NIH). SCHARP also
collaborates with other organizations, including
the Chinese Center for Disease Control and
Prevention, the Center for HIV-AIDS Vaccine
Immunology (CHAVI), the Partnership for AIDS
Vaccine Evaluation (PAVE), and the International
Maternal Pediatric Adolescent AIDS Clinical
Trials (IMPAACT). - SCHARP conducts clinical trials in over 19
countries around the globe, including several
Sub-Saharan African and Caribbean countries,
India, Russia, China, Thailand, Brazil, and Peru.
3The Project
- Create a search engine to search the database and
return relevant results without knowing what
tables or columns to search. - Return result that the logged in user has select
(read) access to.
4Other Issues
- Joining all the tables and search all the columns
can take a very long time - The Tsearch2 encoder is not fast.
- With both of these issues, it caused me to use
the materialized view, so that all the encoding
was already done.
5The Solution
- Add triggers to the tables that we wish to be
searched and have them update a materialized view
that contains the GIS version of the data created
by TSearch2 - Create a function that searches this materialized
view and returns the results. - Now go back to that function and add in User
security, so that the user only sees data that
they have permission to view.
6Install TSearch2
- Due to the fact that we are running Solaris 9
with PostgreSQL v8.2.x we had to install
TSearch2. - We are have just completed upgrading to Solaris
10 and are planning to upgrade to PostgreSQL
v8.3.x in October and will be revising these
tools at that point. - Also an issue is that we are running the 64 bit
version of PostgreSQL and this makes the upgrades
harder. - To Install Tsearch2
- psql -h server_name -U postgres -d database_name
7The Materialized View
- CREATE TABLE "tools"."search_table" (
- "search_id" SERIAL,
- "row_id" TEXT,
- "column_name" NAME,
- "table_name" NAME,
- "schema_name" NAME,
- "database_name" NAME,
- "server_name" NAME,
- "vector" TSVECTOR,
- "description" TEXT,
- CONSTRAINT "search_table_pkey" PRIMARY
KEY("search_id") - ) WITH OIDS
- CREATE INDEX "search_table_index" ON
"tools"."search_table" - USING gist ("vector" "public"."gist_tsvector_ops
")
8The Trigger
- CREATE TABLE "al"."list_ldms_additive_code" (
- "pk" SERIAL,
- "ldms_additive_code" VARCHAR(255),
- "additive_description" TEXT,
- "info_source" VARCHAR(255),
- "notes" TEXT,
- CONSTRAINT "list_ldms_additive_code_ldms_additiv
e_code_key" UNIQUE("ldms_additive_code"), - CONSTRAINT "list_ldms_additive_code_pkey"
PRIMARY KEY("pk") - ) WITH OIDS
- CREATE TRIGGER "search_list_ldms_additive_code_tr"
AFTER INSERT OR UPDATE OR DELETE - ON "al"."list_ldms_additive_code" FOR EACH ROW
- EXECUTE PROCEDURE "tools"."update_search"(atlassql
-a, pk)
9Search Function Examples
- Search Function
- SELECT FROM tools.database_search
(search_string, server_name, database_name,
schema_name, table_name, column_name) - Search Parameters
- search_string is mandatory and is really the only
item needed. If you just fill in this one without
filling in any other items, the the entire
database will be searched for the text string. - server_name is optional and really reserved for
future use. (If you are using schema_name, you
may also wish to specify server_name and
database_name also to have your code pre-written
for the future.) - database_name is optional and really reserved for
future use. (If you are using schema_name, you
may also wish to specify server_name and
database_name also to have your code pre-written
for the future.) - schema_name is optional. This allows you to
search a specific schema instead of the entire
database. - table_name is optional. This allows you to search
a specific table. Without specifying the
schema_name it will search all tables with that
table name across in the entire database. - column_name is optional. This allows you to
search a specific column. Without specifying the
table_name or schema_name it will search all
tables across in the entire database that use
that column_name. - The server_name, database_name, schema_name,
table_name, and column_name allow you to perform
a WHERE on the results before the search is done
instead of afterwards. This makes the search
function much faster as it does not need to
search all those extra results. - Search Strings Commands
- and
- or
- ! not
- () grouping
- Search String Examples
- yellow
- grass green
- sun (yellow red)
10Search Function Examples
- Search String Results Examples
- SELECT FROM tools.database_search ('yellow',
'', '', '', '', '')
SELECT FROM tools.database_search ('grass
green', '', '', '', '', '')
11Search Function Examples
SELECT FROM tools.database_search ('sun
(yellow red)', '', '', '', '', '')
12Search Function Examples
SELECT FROM tools.database_search ('sun
!red)', '', '', '', '', '')
- Bugs / Notes
- If your key field data contains a ' the response
may not be escaped and so therefore may be
unusable. This needs to be verified and then
fixed if this is the case. - This function does not understand when you switch
roles. This is because the function needs to run
with its own permissions and this will skew the
results if it was using the role permissions. - Security Note
- You will only see the responses for tables that
you have SELECT permissions.
13Bulk Update(Add an entire table)
- Insert Search Data Function
- SELECT tools.insert_search_data ('server_name,
schema_name, table_name') - SELECT tools.insert_search_data ('server_name,
schema_name, table_name, ignore_column_name,
...') - Insert Search Data Parameters
- server_name The name of the server that that
the table resides in. - schema_name The name of the schema that that
the table resides in. - table_name The name of the table that you want
to be searchable. - ignore_column_name The name of the column in
the table that you do not wish to be searchable,
such as personal data, payroll data, id keys,
etc. - Search String Examples
- SELECT tools.insert_search_data('atlassql-test,
public, testing') SELECT tools.insert_search_dat
a('atlassql-test, public, testing, key_id,
not_string') - Notes
- If you run this command more than once on the
same table, you will get duplicate results. You
must run the drop_search_data function between
each insert_search_data function. - If any records are inserted, updated, or deleted
while this is happening you may have some
integrity issues. - If you are dropping and re-creating the
view/tables, it is better to drop the table, run
the drop_search_data function, then create the
table, add the trigger and finally insert the
data. This way there will not be any data
integrity issues. - This function is really just designed for doing
the initial inserts for an existing table that
the trigger is being added to.
14Bulk Delete(Entire table or more at once)
- Drops Search Data Function
- SELECT tools.drop_search_data (server_name,
database_name, schema_name, table_name) - Drop Search Data Parameters
- server_name is optional and really reserved for
future use. (If you are using schema_name, you
may also wish to specify server_name and
database_name also to have your code pre-written
for the future.) - database_name is optional and really reserved for
future use. (If you are using schema_name, you
may also wish to specify server_name and
database_name also to have your code pre-written
for the future.) - schema_name is optional. This allows you to
search a specific schema instead of the entire
database. - table_name is optional. This allows you to search
a specific table. Without specifying the
schema_name it will search all tables with that
table name across in the entire database. - You must enter one of the above or nothing will
be processed - Search String Examples
- SELECT tools.drop_search_data ('atlassql-test',
'atlastest', 'public', 'testing') - SELECT tools.drop_search_data ('atlassql-test',
'atlastest', 'public', '') - SELECT tools.drop_search_data ('atlassql-test',
'atlastest', '', '')
15Lookup Role Users
- User Search Function
- This function will return to you a list of
everybody who is attached to a role (user or
group) that has inherit privileges turned on. - SELECT FROM tools.lookup_role_users('role_name')
- Search Parameters
- role_name
- This is either a user or group name
- Search String Examples
- SELECT FROM tools.lookup_role_users('dba')
- SELECT FROM tools.lookup_role_users('lalbin')
- Search String Results Examples
- SELECT FROM tools.lookup_role_users('dba')
- SELECT FROM tools.lookup_role_users('lalbin')
16Lookup User Permissions
- User Privilege Search Function
- This function returns to you a list of the items
and the users who have the privilege type
requested. This function does drill down through
the groups also to find additional users why have
inherit privileges turned on. You also must use
the DISTINCT because this function returns
duplicates because of the drill down that it
performs. - SELECT DISTINCT FROM tools.display_user_perms('i
tem_type','privilege_type') - Search Parameters
- item_type
- r ordinary table
- i index
- S sequence
- v view
- c composite type
- s special
- t TOAST table
- privilege_type
- r SELECT ("read")
- w UPDATE ("write")
- a INSERT ("append")
- d DELETE
- R RULE
17Lookup User Permissions
- Search String Examples
- SELECT DISTINCT FROM tools.display_user_perms('r
','r') WHERE nspname 'public' - SELECT DISTINCT FROM tools.display_user_perms('r
','r') WHERE rolname 'lalbin' - SELECT DISTINCT FROM tools.display_user_perms('r
','r') WHERE relname 'testing' - SELECT DISTINCT FROM tools.display_user_perms('v
','r') WHERE rolname 'lalbin' - SELECT DISTINCT FROM tools.display_user_perms('r
','w') WHERE rolname 'lalbin' - Search String Results Examples
- SELECT DISTINCT FROM tools.display_user_perms('r
','r') WHERE nspname 'public'
18Lookup User Permissions
- SELECT DISTINCT FROM tools.display_user_perms('r
','r') WHERE rolname 'lalbin'
- Security Note
- This function does not understand when you switch
roles. This is because the function needs to run
with its own permissions and this will skew the
results if it was using the role permissions. - This function will only return results from the
database that you are currently logged into.
19Foreign Key Relationships(Crabbing or Crawling
the Data sideways)
- Foreign Key Relationship Search Function
- SELECT FROM tools.foreign_key_details
(schema_name, table_name, direction) - Foreign Key Relationship Parameters
- schema_name The name of the schema that that
the table resides in. - table_name The name of the table that you want
to be searchable. - direction 0 - returns tables that are
referenced by this data, 1 - returns tables that
are reference by this data. - Search Foreign Key Relationship Examples
- SELECT FROM tools.foreign_key_details('public',
'testing, 0) - Search Foreign Key Relationship Results Examples
- SELECT FROM tools.foreign_key_details ('al',
'list_antibody, 0)
- Security Note
- The function knows which role you are logged in
as and will return the appropriate results.
20update_search()
- CREATE OR REPLACE FUNCTION "tools"."update_search"
() RETURNS trigger AS - body
- Half of this code was taken from the shadow_fn
- If shadow_fn has been updated you may wish to
- also apply those updates here.
- my updates_table 0
- my database_name ""
- my server_name
- if (length(_0) 0)
-
- server_name _0
-
- else
-
- server_name "atlassql-test"
-
- if the search_table_updates table doesn't
exist then error
21update_search()
- get the primary key values for the row
- sql "SELECT column_name FROM
information_schema.key_column_usage " - ."WHERE table_schema'"._TD-table_schem
a."' " - ."AND table_name'"._TD-table_name."'
" - ."AND position_in_unique_constraint IS
NULL " - ."ORDER BY ordinal_position"
- rv spi_exec_query(sql)
- join primary key fields into one big field
- my oldnew _TD-event eq "DELETE" ?
"old" "new" - my _at_pkey_vals map vals_-column_name
oldnew _at_rv-rows - my _at_pkey_cols map _-column_name
_at_rv-rows - my pkey join(",",_at_pkey_vals)
- my pkey join("''",_at_pkey_cols,_at_pkey_vals)
- pkey pkey . "''"
- my pkey ""
- while (_at_pkey_vals)
-
- if (length(pkey))
-
22update_search()
- insert for each changed field
- foreach my c (_at_cols)
-
- process_column 0
-
- update only the changed columns
- if (valscold ne valscnew)
-
- process_column 1
-
- if UPDATE and the keys have changed, update
all columns - if (_TD-event eq "UPDATE")
-
- if (pkey ne pkey_update)
-
- process_column 1
-
-
23update_search()
- if (_TD-event eq "INSERT")
-
- sql "INSERT INTO
tools.search_table " - ."(row_id, column_name,
table_name, schema_name, " - ."database_name, server_name,
vector, description) " - ."VALUES ('pkey', 'c',
'"._TD-table_name."', " - ."'"._TD-table_schema."',
current_database(), 'server_name', " - ."to_tsvector('".valscnew
."'), '".valscnew."')" - spi_exec_query(sql)
-
- elsif (_TD-event eq "UPDATE")
-
- sql "UPDATE tools.search_table
SET " - ."row_id 'pkey',
column_name 'c', " - ."table_name
'"._TD-table_name."', schema_name
'"._TD-table_schema."', " - ."database_name
current_database(), server_name 'server_name',
" - ."vector to_tsvector('".val
scnew."'), description '".valscnew."'
" - ."WHERE "
- ."row_id 'pkey_update' AND
column_name 'c' AND "
24database_search(search text, server, database,
schema, table, column)
- CREATE OR REPLACE FUNCTION "tools"."database_searc
h" (text, text, text, text, text, text) RETURNS
SETOF "tools"."schema_search_t" AS - body
- my row
- my search_config ""
- if (length(_1) 0)
-
- if (length(search_config) 0)
-
- search_config search_config . " AND
server_name '" . _1 . "'" -
- else
-
- search_config " WHERE server_name '" .
_1 . "'" -
-
- if (length(_2) 0)
-
25database_search(search text, server, database,
schema, table, column)
- if (length(_4) 0)
-
- if (length(search_config) 0)
-
- search_config search_config . " AND
table_name '" . _4 . "'" -
- else
-
- search_config " WHERE table_name '" .
_4 . "'" -
-
- if (length(_5) 0)
-
- if (length(search_config) 0)
-
- search_config search_config . " AND
column_name '" . _5 . "'" -
26database_search(search text, server, database,
schema, table, column)
- my sth spi_query("SELECT m.search_id, "
- ."headline(m.description, q) As headline, "
- ."rank(m.vector, q) as rank, m.row_id, "
- ."m.column_name, m.table_name, "
- ."m.schema_name, m.database_name, "
- ."m.server_name "
- ."FROM (SELECT FROM "
- ."(SELECT FROM tools.search_table
".search_config.") AS t " - ."LEFT JOIN (SELECT DISTINCT from
tools.display_user_perms('r', 'r') " - ."WHERE rolname SESSION_USER) AS u "
- ."ON u.relname t.table_name AND u.nspname
t.schema_name " - ."WHERE u.rolname IS NOT NULL) AS m, "
- ."to_tsquery('_0') AS q "
- ."WHERE vector _at__at_ q "
- ."ORDER BY rank(vector, q) DESC")
- while (defined (row spi_fetchrow(sth)))
-
- return_next(
27display_user_perms(item type, access type)
- CREATE OR REPLACE FUNCTION "tools"."display_user_p
erms" (text, text) RETURNS SETOF
"tools"."user_perms" AS - body
- First Value Passed
- r ordinary table
- i index
- S sequence
- v view
- c composite type
- s special
- t TOAST table
-
- Second Value Passed
- r -- SELECT ("read")
- w -- UPDATE ("write")
- a -- INSERT ("append")
- d -- DELETE
- R -- RULE
- x -- REFERENCES
- t -- TRIGGER
28display_user_perms(item type, access type)
- my who_user spi_fetchrow(who_user_table)
- my sth spi_query("SELECT current_database()
AS - catname, nc.nspname,
- c.relname, a.rolname
- FROM pg_catalog.pg_class AS c
- LEFT JOIN pg_catalog.pg_authid AS a ON a.oid
c.relowner - LEFT JOIN pg_catalog.pg_namespace AS nc ON
nc.oid c.relnamespace - WHERE c.relkind '_0' AND nc.nspname !
'pg_' - AND c.relacl IS NULL AND (NOT
pg_is_other_temp_schema(nc.oid)) - ORDER BY c.relname")
- while (defined (row spi_fetchrow(sth)))
-
- if (length(row-rolname) 0)
-
- user login
- user_rtn who_user-session_user
- user via login then changed by set
role - user_rtn who_user-current_user
-
29display_user_perms(item type, access type)
- if (length(row3-rolname) 0)
-
- user login
- user_rtn who_user-session_user
- user via login then changed by set
role - user_rtn who_user-current_user
-
- else
-
- user_rtn row3-rolname
-
- return_next(
-
- catname row-catname,
- nspname row-nspname,
- relname row-relname,
- rolname user_rtn
- )
-
30display_user_perms(item type, access type)
- while (defined (row spi_fetchrow(sth)))
-
- postgresarwdxt/postgres,main_ror/post
gres - xxxx -- privileges granted to
PUBLIC - unamexxxx -- privileges granted to a
user - group gnamexxxx -- privileges granted to a
group -
- Trim first and last characters ""
- create array at ","
- _at_users_privs split(/,/,
substr(row-relacl,1,-1)) - my user_privlages
- Loop through all the users
- foreach user_privlages (_at_users_privs)
-
- if (substr(user_privlages,0,5) eq
"group") -
31display_user_perms(item type, access type)
-
- user_rtn user_privs0
-
- return_next(
-
- catname row-catname,
- nspname row-nspname,
- relname row-relname,
- rolname user_rtn
- )
- my sth2 spi_query("SELECT FROM
tools.lookup_role_users('row-rolname')") - while (defined (row2 spi_fetchrow(sth2)))
-
- elog(INFO, row2-rolname . "" .
user_privs1) - if (length(row2-rolname) 0)
-
- user login
- user_rtn who_user-session_user
- user via login then changed by
set role
32lookup_role_users (name)
- CREATE OR REPLACE FUNCTION "tools"."lookup_role_us
ers" (name) RETURNS SETOF "tools"."lookup_role_use
rs" AS - body
- my row
- my row2
- my sth spi_query("select u.rolname,
u.rolinherit - FROM pg_auth_members LEFT JOIN pg_authid u
- ON pg_auth_members.member u.oid
- LEFT JOIN pg_authid r
- ON pg_auth_members.roleid r.oid
- WHERE r.rolname '_0'
- AND u.rolinherit TRUE")
- while (defined (row spi_fetchrow(sth)))
-
- moved to be part of the above query
- if (row-rolinherit eq 't')
-
- return_next(
-
- rolname row-rolname
33foreign_key_details(schema name, table name,
direction)
- CREATE OR REPLACE FUNCTION "tools"."foreign_key_de
tails" (text, text, integer) RETURNS SETOF
"tools"."foreign_key_details_t" AS - body
- my sql
- my rv2
- my row2
-
- elog(NOTICE,"ROW START")
- sql "SELECT tc.constraint_name, "
- ."tc.constraint_type, "
- ."tc.table_name, "
- ."kcu.column_name, "
- ."tc.is_deferrable, "
- ."tc.initially_deferred, "
- ."rc.match_option AS match_type, "
- ."rc.update_rule AS on_update, "
- ."rc.delete_rule AS on_delete, "
- ."ccu.table_schema AS
references_schema, " - ."ccu.table_name AS references_table, "
- ."ccu.column_name AS references_field "
34foreign_key_details(schema name, table name,
direction)
- if (_2 0)
-
- sql sql . " AND tc.table_name
'"._1."' " - ."AND tc.table_schema '"._0."' "
-
- else
-
- sql sql . " AND ccu.table_name
'"._1."' " - ."AND ccu.table_schema '"._0."' "
-
- rv2 spi_query(sql)
- while (defined (row2 spi_fetchrow(rv2)))
-
- return_next(
-
- constraint_name row2-constraint_name,
35user_perms
- CREATE TYPE "tools"."user_perms" AS (
- "catname" NAME,
- "nspname" NAME,
- "relname" NAME,
- "rolname" NAME
- )
36lookup_role_users
- CREATE TYPE "tools"."lookup_role_users" AS (
- "rolname" NAME
- )
37schema_search_t
- CREATE TYPE "tools"."schema_search_t" AS (
- "search_id" INTEGER,
- "headline" TEXT,
- "rank" REAL,
- "row_id" TEXT,
- "column_name" NAME,
- "table_name" NAME,
- "schema_name" NAME,
- "database_name" NAME,
- "server_name" NAME
- )
38foreign_key_details_t
- CREATE TYPE "tools"."foreign_key_details_t" AS (
- "constraint_name" NAME,
- "table_name" NAME,
- "column_name" NAME,
- "references_schema" NAME,
- "references_table" NAME,
- "references_field" NAME
- )
39Future Enhancements
- Finish the code so that it fully supports
multi-databases and multi-servers - Update the code for the built in TSearch2 in
PostgreSQL v8.3.x
40Additional Possible Talks
- 64 Bit PostgreSQL Windows Drivers
- How to Create and Install both the 32bit the
64bit drivers on the same computer and the
reasons for doing so. - Materialized Views
- Using Triggers for real time materialized views.
- Using a cron job for materialized snapshots (with
table safety in case snapshot generation fails). - Shadow Tables, backup tables that show all
changes (what) and by who and when the took
place. - Microsoft Access
- Using Microsoft Access as a front end for
PostgreSQL databases. - Implementing PosgreSQL user right within your
Access Database. - How to remove the username and/or password from
Access so that others cant gain access to your
account. - Limitations of Data compared to PostgreSQL and
things to watch out for when moving data between
the systems. - Microsoft Excel
- Using PostgreSQL data in Data Tables and Pivot
Tables. - How to remove the username and/or password from
Excel so that others cant gain access to your
account. - Writing Data back into PostgreSQL.
- How to fix all your links in Excel to a
PostgreSQL server that no longer exists and
redirect them to a new server.