Special Topics in Computer Science: The Art of Information Retrieval. Chapter 7: Text Operations

1
Special Topics in Computer Science: The Art of
Information Retrieval. Chapter 7: Text Operations
  • Alexander Gelbukh
  • www.Gelbukh.com

2
Previous chapter: Conclusions
  • Modeling of text helps predict the behavior of
    systems
  • Zipf's law, Heaps' law
  • Describing the structure of documents formally
    allows part of their meaning to be treated
    automatically, e.g., in search
  • Languages to describe document syntax
  • SGML, too expensive
  • HTML, too simple
  • XML, good combination

3
Text operations
  • Linguistic operations
  • Document clustering
  • Compression
  • Encryption (not discussed here)

4
Linguistic operations
  • Purpose: convert words to meanings
  • Synonyms or related words
  • Different words, same meaning. Morphology
  • Foot / feet, woman / female
  • Homonyms
  • Same word, different meanings. Word senses
  • River bank / financial bank
  • Stopwords
  • Words with no meaning of their own. Functional words
  • The

5
For good or for bad?
  • More exact matching
  • Less noise, better recall
  • Unexpected behavior
  • Difficult for users to grasp
  • Harmful if it introduces errors
  • More expensive
  • Adds a whole new technology
  • Maintenance is language-dependent
  • Slows down processing
  • Good if done well, harmful if done badly

6
Document preprocessing
  • Lexical analysis (punctuation, case)
  • Simple, but must be done carefully
  • Stopwords. Reduce index size and processing time
  • Stemming: connected, connection, connections, ...
  • Multiword expressions: hot dog, B-52
  • Here, all the power of linguistic analysis can be
    used
  • Selection of index terms
  • Often nouns and noun groups: computer science
  • Construction of a thesaurus
  • synonymy: a network of related concepts (words or
    phrases)
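
The first preprocessing steps above, lexical analysis (case folding, stripping punctuation) followed by stopword removal, can be sketched as follows; the stopword list here is a tiny illustrative sample, not a real production list:

```python
import re

# Tiny illustrative stopword list; real systems use lists of hundreds of words.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "or", "to", "is"}

def preprocess(text):
    """Lexical analysis (lowercase, drop punctuation) + stopword removal."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())   # lexical analysis
    return [t for t in tokens if t not in STOPWORDS]  # stopword removal

print(preprocess("The science of Information Retrieval, in a nutshell."))
# ['science', 'information', 'retrieval', 'nutshell']
```

Stemming and multiword-expression detection would then run over the surviving tokens.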

7
Stemming
  • Methods
  • Linguistic analysis: complex, expensive
    maintenance
  • Table lookup: simple, but needs data
  • Statistical (Avetisyan): needs no data, but imprecise
  • Suffix removal
  • Porter algorithm. Martin Porter. Ready-made code on
    his website
  • Substitution rules: sses → ss, s → ∅
  • stresses → stress
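
The substitution-rule idea can be sketched as a single Porter-style pass; the rule list below is only the first group of suffix rules (step 1a), not the full algorithm:

```python
def stem_step1a(word):
    """Sketch of Porter's step 1a: apply the longest matching suffix rule.
    Rules are ordered longest-first, so 'sses' wins over 's'."""
    rules = [("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")]
    for suffix, replacement in rules:
        if word.endswith(suffix):
            return word[:len(word) - len(suffix)] + replacement
    return word

print(stem_step1a("stresses"))  # 'stress'
print(stem_step1a("ponies"))    # 'poni'
print(stem_step1a("caress"))    # 'caress' (the ss -> ss rule blocks 's' removal)
```

The full Porter algorithm adds several more rule groups with conditions on the stem's measure; Porter's website provides reference implementations.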

8
Better stemming
  • Involves the whole problem area of computational
    linguistics
  • POS disambiguation
  • well: adverb or noun? Oil well.
  • Statistical methods. Brill tagger
  • Syntactic analysis. Syntactic disambiguation
  • Word sense disambiguation
  • bank1 and bank2 should be different stems
  • Statistical methods
  • Dictionary-based methods. Lesk algorithm
  • Semantic analysis

9
Thesaurus
  • Terms (controlled vocabulary) and relationships
  • Terms
  • used for indexing
  • represent a concept. One word or a phrase.
    Usually nouns
  • sense: a definition or note to distinguish senses,
    e.g., key (door)
  • Relationships
  • Paradigmatic
  • Synonymy, hierarchical (is-a, part),
    non-hierarchical
  • Syntagmatic: collocations, co-occurrences
  • WordNet. EuroWordNet
  • synsets

10
Use of a thesaurus
  • To help the user to formulate the query
  • Navigation in the hierarchy of words
  • Yahoo!
  • For the program, to collate related terms
  • woman → female
  • Fuzzy comparison: woman ≈ 0.8 female, based on path
    length
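
One way to turn path length into a fuzzy similarity score is to let similarity decay with distance in the hierarchy. The tiny is-a hierarchy and the 1/(1 + distance) formula below are illustrative assumptions, not the book's definitions; a real thesaurus such as WordNet would supply the relations:

```python
# Hypothetical tiny is-a hierarchy (child -> parent) for illustration only.
PARENT = {"woman": "female", "girl": "female", "female": "person",
          "man": "male", "male": "person"}

def ancestors(term):
    """Path from a term up to the root, including the term itself."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def path_similarity(a, b):
    """Fuzzy similarity decaying with path length: 1 / (1 + distance),
    where distance is measured through the closest common ancestor."""
    pa, pb = ancestors(a), ancestors(b)
    common = set(pa) & set(pb)
    if not common:
        return 0.0
    dist = min(pa.index(c) + pb.index(c) for c in common)
    return 1 / (1 + dist)

print(path_similarity("woman", "female"))  # 0.5  (one step apart)
print(path_similarity("woman", "girl"))    # sisters under 'female'
```

The 0.8 figure on the slide would come from whatever decay function the system chooses; the point is only that closer terms score higher.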

11
Yahoo! vs. thesaurus
  • The book says Yahoo! is based on a thesaurus.
  • I disagree
  • Thesaurus: the words of a language organized into a
    hierarchy
  • Document hierarchy: documents attached to a
    hierarchy
  • This is word sense disambiguation
  • I claim that Yahoo! is based on (manual) WSD
  • It also uses a thesaurus for navigation

12
Text operations
  • Linguistic operations
  • Document clustering
  • Compression
  • Encryption (not discussed here)

13
Document clustering
  • Operation on the whole collection
  • Global vs. local
  • Global: the whole collection
  • At compile time; a one-time operation
  • Local
  • Cluster the results of a specific query
  • At runtime, with each query
  • More of a query transformation operation
  • Already discussed in Chapter 5

14
Text operations
  • Linguistic operations
  • Document clustering
  • Compression
  • Encryption (not discussed here)

15
Compression
  • Gains: storage, transmission, search
  • Cost: time spent compressing and decompressing
  • IR needs random access
  • Block-based schemes do not work
  • Also: pattern matching on compressed text

16
Compression methods
  • Statistical
  • Huffman: a fixed code per symbol
  • More frequent symbols get shorter codes
  • Allows decompression to start from any symbol
  • Arithmetic: dynamic coding
  • Must decompress from the beginning
  • Not for IR
  • Dictionary
  • Pointers to previous occurrences. Lempel-Ziv
  • Again, not for IR

17
Compression ratio
  • Size compressed / size decompressed
  • Huffman, units = words: up to 2 bits per character
  • Close to the limit (entropy). Only for large
    texts!
  • Other methods: similar ratio, but no random
    access
  • Shannon: the optimal code length for a symbol with
    probability p is -log2 p
  • Entropy: the limit of compression
  • The average code length under optimal coding
  • A property of the model
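
Shannon's -log2 p rule and the entropy limit can be checked numerically. A minimal sketch over a character model (a word model would be built the same way, with words as symbols):

```python
from collections import Counter
from math import log2

def entropy_bits(text):
    """Entropy of the text's character distribution:
    H = -sum(p * log2 p), the lower bound on average bits per symbol."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# A symbol with probability p deserves -log2(p) bits under Shannon's rule:
print(-log2(0.5))                           # 1.0 bit for p = 0.5
print(round(entropy_bits("abracadabra"), 2))  # 2.04 bits/char for this model
```

No lossless code over this model can average fewer bits per symbol than the entropy; Huffman coding approaches it (within one bit per symbol).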

18
Modeling
  • Find probability for the next symbol
  • Adaptive, static, semi-static
  • Adaptive: good compression, but must start from the
    beginning
  • Static (per language): poor compression, random
    access
  • Semi-static (two passes over the specific text):
    both OK
  • Word-based vs. character-based
  • Word-based: better compression and search

19
Huffman coding
  • Each symbol is encoded sequentially
  • More frequent symbols have shorter codes
  • No code is a prefix of another one
  • How to build the tree: see the book
  • Byte codes are better
  • They allow for sequential search
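
The standard greedy tree-building procedure (repeatedly merging the two least frequent nodes) can be sketched as follows; this builds bit codes, whereas the byte codes mentioned above would group bits into bytes:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build prefix-free codes: frequent symbols get shorter codes.
    Greedy algorithm on a min-heap of (frequency, tie-breaker, codes)."""
    freq = Counter(text)
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)  # tie-breaker so dicts are never compared
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)  # two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        # Prepend a 0 bit to one subtree's codes and a 1 bit to the other's.
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

codes = huffman_codes("abracadabra")
# 'a' is the most frequent symbol, so its code is the shortest,
# and no code is a prefix of another.
assert all(len(codes["a"]) <= len(c) for c in codes.values())
```

Because the codes are prefix-free and fixed per symbol, decoding can resume at any code boundary, which is what makes the scheme usable for IR.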

20
Dictionary-based methods
  • Static (simple, poor compression), dynamic,
    semi-static.
  • Lempel-Ziv: references to previous occurrences
  • Adaptive
  • Disadvantages for IR
  • Need to decode from the very beginning
  • New statistical methods perform better
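
As an illustration of the dictionary idea, here is a sketch of the LZ78 variant, which builds an explicit phrase dictionary and emits (phrase index, next character) pairs; note that decoding must replay the stream from the very beginning to rebuild the dictionary, which is exactly the problem for IR:

```python
def lz78_compress(text):
    """LZ78 sketch: emit (index of longest known phrase, next char) pairs,
    growing the phrase dictionary as compression proceeds."""
    dictionary = {"": 0}          # phrase -> index
    pairs, current = [], ""
    for ch in text:
        if current + ch in dictionary:
            current += ch          # keep extending the known phrase
        else:
            pairs.append((dictionary[current], ch))
            dictionary[current + ch] = len(dictionary)
            current = ""
    if current:                    # flush a trailing known phrase
        pairs.append((dictionary[current[:-1]], current[-1]))
    return pairs

def lz78_decompress(pairs):
    """Decoding replays the whole stream to rebuild the same dictionary."""
    phrases, out = [""], []
    for idx, ch in pairs:
        phrase = phrases[idx] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)

assert lz78_decompress(lz78_compress("abracadabra")) == "abracadabra"
```

Repetitive text yields long dictionary phrases and thus few pairs, but there is no way to start decoding in the middle.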

21
Comparison of methods
22
Compression of inverted files
  • Inverted file: for each word, the list of docs where
    it occurs
  • The lists of docs are ordered, so they can be
    compressed
  • Seen as lists of gaps
  • Short gaps occur more frequently
  • Statistical compression
  • Our work: order the docs for better compression
  • We encode runs of docs
  • Minimize the number of runs
  • Minimize the distance between the lists of different
    words
  • Formulated as a TSP
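
Gap encoding of an ordered posting list is simple to sketch; small gaps dominate, which is what makes a statistical coder (giving short codes to frequent small values) effective:

```python
def to_gaps(doc_ids):
    """Store a sorted posting list as the gaps between consecutive doc ids."""
    return [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]

def from_gaps(gaps):
    """Recover the original doc ids by cumulative summation."""
    ids, total = [], 0
    for g in gaps:
        total += g
        ids.append(total)
    return ids

postings = [3, 5, 20, 21, 23, 76]
gaps = to_gaps(postings)        # [3, 2, 15, 1, 2, 53]
assert from_gaps(gaps) == postings
```

Reordering the documents so that each word's postings cluster together shrinks the gaps further, which is the motivation for the ordering problem above.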

23
Research topics
  • All of computational linguistics
  • Improved POS tagging
  • Improved WSD
  • Uses of thesaurus
  • for user navigation
  • for collating similar terms
  • Better compression methods
  • Searchable compression
  • Random access

24
Conclusions
  • Text transformation: meaning instead of strings
  • Lexical analysis
  • Stopwords
  • Stemming
  • POS, WSD, syntax, semantics
  • Ontologies to collate similar stems
  • Text compression
  • Searchable
  • Random access
  • Word-based statistical methods (Huffman)
  • Index compression

25
Thank you! Until the compensation lecture