Globalisation - PowerPoint PPT Presentation

Slides: 31
Provided by: css83
Learn more at: http://www.computing.surrey.ac.uk
Transcript and Presenter's Notes

Title: Globalisation


1
Globalisation & Computer Systems
  • Week 7
  • Text processes and globalisation: part 1
  • Sorting strings: collation
  • Searching strings and regular expressions
  • Practical regular expressions in UNIX

2
Text processes
  • Character encoding design must provide the set
    of code values that allows programmers to design
    applications capable of implementing a variety
    of text processes in the desired language
  • Text processes operate over text elements

3
Text processes
  • Text elements
  • The objects of a text
  • Depends on perspective
  • Different text processes operate over different
    objects

4
Sorting
  • Sorting (collation)
  • 'The process of ordering units of textual
    information. Collation is usually specific to a
    particular language' (Unicode version 3:
    glossary)

5
Sorting
  • Language specific
  • sort order
  • phonetically based sort
  • graphically based sort
  • sort element

6
Sorting
  • Levels of comparison
  • Level 1 (primary difference)
  • Levels 2 and 3 (similar)
  • Level 4 (exact match)

7
Sorting
  • Levels of comparison
  • Level 4 (exact match)
  • match in code value
  • character equivalence
  • resumes = resumes
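
One way to read the distinction between a match in code value and character equivalence is sketched below. The two spellings and the helper names are illustrative assumptions, not taken from the slides; the sketch uses Unicode normalisation (NFC) as the equivalence test.

```python
import unicodedata

# Two spellings of the same word (illustrative data, not from the slides):
# one uses the precomposed letter U+00E9, the other uses "e" followed by
# the combining acute accent U+0301.
composed = "r\u00e9sum\u00e9"
decomposed = "re\u0301sume\u0301"

def same_code_values(a, b):
    """Exact match: both strings are the same sequence of code points."""
    return a == b

def canonically_equivalent(a, b):
    """Equivalence: the strings match once both are normalised to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(same_code_values(composed, decomposed))        # False
print(canonically_equivalent(composed, decomposed))  # True
```

Both strings display identically, yet only the normalised comparison treats them as equal.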

8
Sorting
  • Levels of comparison
  • Level 1 (primary difference: alphabetic)

9
Sorting
  • Levels of comparison
  • Level 1 (primary difference)
  • resume < resumes

10
Sorting
  • Levels of comparison
  • Level 1 (primary difference)
  • resume < resumes
  • Level 2 (similar: no accent < accent)
  • resume < résumé
  • resumes < résumés
  • Level 3 (similar: lower case < upper case)
  • résumé < Résumé

11
Sorting
  • Forward and backward sequence sort
  • Forward sequence
  • Start comparison from beginning of string
  • Backward sequence
  • Start comparison from end of string
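
The difference between the two sequences can be sketched as follows. The slides do not say which level is compared backward; this sketch assumes it is the accent level, as in some French dictionary conventions. The word list and the function names are hypothetical.

```python
import unicodedata

def split_levels(word):
    """Return (base letters, accent flags), with one flag per base letter."""
    base, accents = [], []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch):
            if accents:
                accents[-1] = 1      # mark the preceding base letter as accented
        else:
            base.append(ch.lower())
            accents.append(0)
    return base, accents

def forward_key(word):
    """Compare accents starting from the beginning of the string."""
    base, accents = split_levels(word)
    return (base, accents)

def backward_key(word):
    """Compare accents starting from the end of the string."""
    base, accents = split_levels(word)
    return (base, accents[::-1])

# Illustrative words that differ only in their accents.
words = ["côté", "cote", "coté", "côte"]
print(sorted(words, key=forward_key))   # ['cote', 'coté', 'côte', 'côté']
print(sorted(words, key=backward_key))  # ['cote', 'côte', 'coté', 'côté']
```

Only the two middle words swap places, because they are the only pair whose first accent difference depends on the direction of comparison.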

12
Sorting
  • Implementation
  • Sort keys
  • assign set of weights to each character in the
    string
  • compare substrings according to weighting
  • switch weightings on / off
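
A minimal sketch of the sort-key idea, under simplifying assumptions: the primary weight is just the lower-cased base letter, the secondary weight flags an accent, and the tertiary weight flags upper case. The function name `sort_key` and its `levels` parameter are hypothetical; real collation tables are far richer.

```python
import unicodedata

def sort_key(text, levels=3):
    """Build a key of up to three weight lists; `levels` switches levels off."""
    primary, secondary, tertiary = [], [], []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            if secondary:
                secondary[-1] = 1            # accent on the previous letter
            continue
        primary.append(ch.lower())           # level 1: base letter
        secondary.append(0)                  # level 2: no accent so far
        tertiary.append(1 if ch.isupper() else 0)  # level 3: case
    return tuple([primary, secondary, tertiary][:levels])

# The example words from the levels-of-comparison slides, in arbitrary order.
words = ["Résumé", "résumés", "resume", "résumé", "resumes"]
print(sorted(words, key=sort_key))
# ['resume', 'résumé', 'Résumé', 'resumes', 'résumés']
```

Calling `sort_key(word, levels=1)` ignores accents and case, so "resume", "résumé" and "Résumé" receive the same key.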

13
Searching
  • Text elements
  • The objects of a text
  • Depends on perspective
  • Different text processes operate over different
    objects

14
Regular Expressions
  • Basis of all web-based and word-processor-based
    searches
  • Definition 1. An algebraic notation for
    describing a string
  • Definition 2. A set of rules that you can use to
    specify one or more items, such as words in a
    file, by using a single character string (Sarwar
    et al.)

15
Regular Expressions
  • regular expression, text corpus
  • regular expression algebra has variants: Perl,
    Unix tools
  • Unix tools: egrep, sed, awk

16
Regular Expressions
  • Find occurrences of /Nokia/ in the text
  • egrep -n Nokia nokia_corpus.txt
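
A rough Python counterpart of the `egrep -n` call, as a sketch only. The helper `grep_n` and the sample lines are hypothetical; the contents of nokia_corpus.txt are not shown in the slides.

```python
import re

def grep_n(pattern, lines):
    """Return (line_number, line) pairs for lines containing a match."""
    regex = re.compile(pattern)
    return [(n, line) for n, line in enumerate(lines, start=1)
            if regex.search(line)]

# Hypothetical sample lines standing in for nokia_corpus.txt.
sample = [
    "Nokia said profits rose.",
    "Analysts expect a weak quarter.",
    "Shares in Nokia fell.",
]

for number, line in grep_n("Nokia", sample):
    print(f"{number}:{line}")
```

Like `egrep -n`, the helper reports every line that contains the pattern anywhere, together with its line number.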

17
Regular Expressions
  • egrep -n Nokia nokia_corpus.txt

18
Regular Expressions
  • set operator [ ]
  • egrep -n '[Nn]okia' nokia_corpus.txt

19
Regular Expressions
  • optional operator ?
  • egrep -n 'shares?' nokia_corpus.txt
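
The optional operator can be tried out in Python's `re` module, which shares this syntax with egrep. The word list is made up for illustration.

```python
import re

# "?" makes the preceding character (here "s") optional.
pattern = re.compile(r"shares?")

# Illustrative candidates, not taken from the corpus.
words = ["share", "shares", "shared", "shore"]

# fullmatch tests the whole word; egrep instead reports any line that
# contains a match, so it would also report a line containing "shared".
matches = [w for w in words if pattern.fullmatch(w)]
print(matches)  # ['share', 'shares']
```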

20
Regular Expressions
  • egrep -n 'shares?' nokia_corpus.txt

21
Regular Expressions
  • Kleene operators
  • /string*/ zero or more occurrences of previous
    character
  • /string+/ 1 or more occurrences of previous
    character
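
A short sketch of the two Kleene operators using Python's `re` module. The candidate strings are invented for illustration.

```python
import re

# Illustrative candidates: zero, one and two trailing "t" characters,
# plus one word that ends in a different letter.
candidates = ["profi", "profit", "profitt", "profits"]

# "*" allows zero or more of the preceding "t".
star = [w for w in candidates if re.fullmatch(r"profit*", w)]

# "+" requires at least one of the preceding "t".
plus = [w for w in candidates if re.fullmatch(r"profit+", w)]

print(star)  # ['profi', 'profit', 'profitt']
print(plus)  # ['profit', 'profitt']
```

The operators apply only to the single character before them, not to the whole string.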

22
Regular Expressions
  • Wildcard operator
  • /string./ any single character after the
    previous character

23
Regular Expressions
  • Wildcard operator
  • /string./ any single character after the
    previous character
  • Combine wildcard and Kleene
  • /string.*/ zero or more instances of any
    character after the previous character
  • /string.+/ one or more instances of any
    character after the previous character
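
The three wildcard forms can be compared side by side. This is a sketch with a hypothetical helper `first_match` and invented sample text.

```python
import re

def first_match(pattern, text):
    """Return the first matched substring, or None if there is no match."""
    found = re.search(pattern, text)
    return found.group() if found else None

# "." needs exactly one more character after "profit".
print(first_match(r"profit.", "profits"))   # profits
print(first_match(r"profit.", "profit"))    # None

# ".*" accepts zero or more further characters and is greedy.
print(first_match(r"profit.*", "profit"))                   # profit
print(first_match(r"profit.*", "net profit rose sharply"))  # profit rose sharply

# ".+" needs at least one further character.
print(first_match(r"profit.+", "profit"))   # None
```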

24
Regular Expressions
  • egrep -n 'profit.*' nokia_corpus.txt

25
Regular Expressions
  • Anchors
  • Beginning of line operator ^
  • egrep '^said' nokia_corpus.txt
  • End of line operator $
  • egrep 'said$' nokia_corpus.txt
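
The effect of the two anchors can be sketched in Python, where `^` and `$` have the same meaning for single-line strings. The sample lines are hypothetical.

```python
import re

# Illustrative lines: "said" at the start, at the end, and in the middle.
lines = ["said the chairman", "the chairman said", "he said nothing"]

# "^" anchors the match to the beginning of the line.
starts_with_said = [l for l in lines if re.search(r"^said", l)]

# "$" anchors the match to the end of the line.
ends_with_said = [l for l in lines if re.search(r"said$", l)]

print(starts_with_said)  # ['said the chairman']
print(ends_with_said)    # ['the chairman said']
```

Without an anchor, all three lines would match.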

26
Regular Expressions
  • Disjunction
  • set operator [ ]
  • /[Ss]tring/ a string which begins with either S
    or s
  • Range
  • /[A-Z]tring/ a string beginning with a capital
    letter
  • pipe |
  • /string1|string2/ either string 1 or string 2

27
Regular Expressions
  • Disjunction
  • egrep -n 'weak|warning|drop' nokia_corpus.txt
  • egrep -n 'weak.*|warn.*|drop.*' nokia_corpus.txt
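
The three kinds of disjunction (set, range and pipe) can be sketched in Python's `re` module. All sample strings are invented for illustration.

```python
import re

# Illustrative candidates for the set and range operators.
words = ["String", "string", "Xtring", "xtring"]

# Set: first character is either "S" or "s".
set_matches = [w for w in words if re.fullmatch(r"[Ss]tring", w)]

# Range: first character is any capital letter from A to Z.
range_matches = [w for w in words if re.fullmatch(r"[A-Z]tring", w)]

# Pipe: a line matches if it contains any one of the alternatives.
lines = ["a weak quarter", "a profit warning", "shares drop", "sales rose"]
pipe_matches = [l for l in lines if re.search(r"weak|warning|drop", l)]

print(set_matches)    # ['String', 'string']
print(range_matches)  # ['String', 'Xtring']
print(pipe_matches)   # ['a weak quarter', 'a profit warning', 'shares drop']
```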

28
Regular Expressions
  • Negation
  • /[^a-z]tring/ any string that does not begin
    with a small letter
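
A small sketch of negation inside a set, using Python's `re` module with invented sample strings.

```python
import re

# Illustrative candidates: capital first letter, small first letter,
# a digit as first character, and no leading character at all.
words = ["String", "string", "9tring", "tring"]

# "^" as the first character inside [ ] negates the set: the first
# character may be anything except a small letter a-z.
not_lowercase_first = [w for w in words if re.fullmatch(r"[^a-z]tring", w)]

print(not_lowercase_first)  # ['String', '9tring']
```

The negated set still has to match one character, which is why "tring" on its own is rejected.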

29
Regular Expressions
  • Precedence (highest to lowest)
  • Parentheses ( )
  • Kleene and optional operators * + ?
  • Anchors and sequences
  • Disjunction operator |
  • (a) /supply|iers/

30
Regular Expressions
  • Precedence (highest to lowest)
  • Parentheses ( )
  • Kleene and optional operators * + ?
  • Anchors and sequences
  • Disjunction operator |
  • /supply|iers/ matches /supply/ or /iers/
  • /suppl(y|iers)/ matches /supply/ or /suppliers/
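
The effect of precedence on the pipe can be checked with a short sketch in Python's `re` module. The candidate strings are illustrative.

```python
import re

# Illustrative candidates.
words = ["supply", "suppliers", "iers", "suppl"]

# Without parentheses the pipe has the lowest precedence, so the
# alternatives are the whole sequences "supply" and "iers".
ungrouped = [w for w in words if re.fullmatch(r"supply|iers", w)]

# Parentheses bind more tightly, so the choice is only between the
# endings "y" and "iers" after the shared prefix "suppl".
grouped = [w for w in words if re.fullmatch(r"suppl(y|iers)", w)]

print(ungrouped)  # ['supply', 'iers']
print(grouped)    # ['supply', 'suppliers']
```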