Programming Interest Group http://www.comp.hkbu.edu.hk/~chxw/pig/index.htm

Provided by: xiaow

1
Programming Interest Group
http://www.comp.hkbu.edu.hk/~chxw/pig/index.htm
  • Tutorial Three
  • Strings
  • Sorting

2
Character Codes
  • Character codes are mappings between numbers and
    the symbols which make up a particular alphabet.
  • ASCII: American Standard Code for Information
    Interchange
  • A single-byte character code
  • 128 characters are specified
  • The highest-order bit is left as zero
  • Unicode
  • Two bytes per character in its original 16-bit
    form (Java's char is a 16-bit UTF-16 code unit)
  • Natively supported by Java

3
ASCII Code
4
Some Properties about ASCII
  • Uppercase letters, lowercase letters, and
    numerical digits appear sequentially.
  • To iterate through all the lowercase letters:
  • for (ch = 'a'; ch <= 'z'; ch++)
  • Is character ch uppercase?
  • (ch >= 'A') && (ch <= 'Z')
  • Convert uppercase character ch to lowercase:
  • ch - ('A' - 'a')
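
The three ASCII properties above can be sketched as small C helpers. The function names (is_upper_ascii, to_lower_ascii, fill_lowercase) are made up for illustration, and the code assumes an ASCII execution character set, as the slide does.

```c
/* Returns 1 if ch is an uppercase ASCII letter, 0 otherwise. */
int is_upper_ascii(int ch)
{
    return (ch >= 'A') && (ch <= 'Z');
}

/* Converts an uppercase ASCII letter to lowercase.
   Any other character is returned unchanged. */
int to_lower_ascii(int ch)
{
    if (is_upper_ascii(ch))
        return ch - ('A' - 'a');
    return ch;
}

/* Writes "abc...z" into buf, which must hold at least 27 bytes,
   and returns the number of letters written. */
int fill_lowercase(char *buf)
{
    int n = 0;
    char ch;

    for (ch = 'a'; ch <= 'z'; ch++)
        buf[n++] = ch;
    buf[n] = '\0';
    return n;
}
```

In real programs the ctype.h functions isupper() and tolower() are usually the better choice, since they do not depend on the letters being contiguous.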

5
Strings
  • Strings are sequences of characters.
  • Different programming languages may have
    different representations!
  • C/C++
  • Null-terminated array: the string ends with the
    null character '\0'.
  • A large enough array must be allocated to hold
    the largest possible string (plus the null).
  • Java
  • Array plus length

6
Manipulating Strings
  • The length of the string
  • Copy a string
  • Reverse a string
  • Concatenate two strings
  • Search a character in a string
  • Search a string in a string
  • String matching problem (or string searching
    problem)
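
Two of the operations listed above, reversing and concatenating, might look like this on null-terminated C strings. This is only a sketch: reverse_string and concat_strings are hypothetical names, and the explicit capacity argument is an assumption added so the caller's buffer cannot overflow.

```c
#include <string.h>

/* Reverses the null-terminated string s in place and returns s. */
char *reverse_string(char *s)
{
    size_t i = 0;
    size_t j = strlen(s);      /* j is one past the right end */

    while (i + 1 < j) {
        char tmp = s[i];
        s[i] = s[j - 1];
        s[j - 1] = tmp;
        i++;
        j--;
    }
    return s;
}

/* Copies a followed by b into dst, whose capacity is cap bytes
   (including the terminating '\0').
   Returns 0 on success, -1 if the result would not fit. */
int concat_strings(char *dst, size_t cap, const char *a, const char *b)
{
    size_t la = strlen(a);
    size_t lb = strlen(b);

    if (la + lb + 1 > cap)
        return -1;
    memcpy(dst, a, la);
    memcpy(dst + la, b, lb + 1);   /* also copies the '\0' of b */
    return 0;
}
```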

7
String Matching Problem
  • String matching algorithms are also used to
    search for particular patterns in DNA sequences.
  • E.g., find the location of pattern P = "abaa" in
    the text T = "abcabaabcabac"
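
For this example the standard library already does the work: strstr() returns a pointer to the first match, and subtracting the start of the text gives its position. The wrapper name find_first is an assumption for illustration.

```c
#include <string.h>

/* Returns the 0-based index of the first occurrence of p in t,
   or -1 if p does not occur in t. */
int find_first(const char *t, const char *p)
{
    const char *hit = strstr(t, p);

    return hit ? (int)(hit - t) : -1;
}
```

With T = "abcabaabcabac" and P = "abaa", find_first(T, P) returns 3, counting positions from 0.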

8
String Matching Algorithms
  • http://en.wikipedia.org/wiki/String_searching_algorithm
  • Given text T with length n, pattern P with
    length m
  • Naïve algorithm
  • Matching time: O((n-m+1)m)
  • Rabin-Karp algorithm
  • Preprocessing time: Θ(m)
  • Matching time: O((n-m+1)m) in the worst case
  • Knuth-Morris-Pratt (or KMP) algorithm
  • Preprocessing time: Θ(m)
  • Matching time: Θ(n)
  • Boyer-Moore (or BM) algorithm
  • Preprocessing time: Θ(m)
  • Matching time: worst case O(n), average case O(n/m)
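
To make the KMP bounds concrete, here is a minimal sketch of the algorithm: the failure table is built in one pass over the pattern, and the text is then scanned once without ever moving backwards. The names kmp_search and build_failure are illustrative, not taken from the slides.

```c
#include <stdlib.h>
#include <string.h>

/* fail[i] = length of the longest proper prefix of p[0..i]
   that is also a suffix of p[0..i]. */
static void build_failure(const char *p, int m, int *fail)
{
    int i, k = 0;

    fail[0] = 0;
    for (i = 1; i < m; i++) {
        while (k > 0 && p[i] != p[k])
            k = fail[k - 1];
        if (p[i] == p[k])
            k++;
        fail[i] = k;
    }
}

/* Returns the index of the first occurrence of p in t, or -1 if
   there is none (or if the table cannot be allocated). */
int kmp_search(const char *t, const char *p)
{
    int n = (int)strlen(t);
    int m = (int)strlen(p);
    int i, k = 0, result = -1;
    int *fail;

    if (m == 0)
        return 0;                  /* empty pattern matches at 0 */
    fail = malloc((size_t)m * sizeof *fail);
    if (fail == NULL)
        return -1;
    build_failure(p, m, fail);

    for (i = 0; i < n; i++) {
        while (k > 0 && t[i] != p[k])
            k = fail[k - 1];       /* fall back inside the pattern */
        if (t[i] == p[k])
            k++;
        if (k == m) {              /* whole pattern matched */
            result = i - m + 1;
            break;
        }
    }
    free(fail);
    return result;
}
```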

9
C String Library Functions
  • ctype.h and string.h

    #include <ctype.h>
    int isalpha(int c);
    int isupper(int c);
    int islower(int c);
    int isdigit(int c);
    int ispunct(int c);
    int isxdigit(int c);
    int isprint(int c);
    int toupper(int c);
    int tolower(int c);

    #include <string.h>
    char *strcat(char *dst, const char *src);
    char *strncat(char *dst, const char *src, size_t n);
    int strcmp(const char *s1, const char *s2);
    int strncmp(const char *s1, const char *s2, size_t n);
    char *strcpy(char *dst, const char *src);
    char *strncpy(char *dst, const char *src, size_t n);
    size_t strlen(const char *s);
    char *strstr(const char *s1, const char *s2);
    char *strtok(char *s1, const char *s2);
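
A short sketch of how a few of these functions are typically used. The helper names upcase and count_words are invented for this example; note the cast to unsigned char, which the ctype.h functions need when the argument comes from a plain char.

```c
#include <ctype.h>
#include <string.h>

/* Converts every letter of s to uppercase in place and returns s. */
char *upcase(char *s)
{
    char *p;

    for (p = s; *p != '\0'; p++)
        *p = (char)toupper((unsigned char)*p);
    return s;
}

/* Counts the whitespace-separated words in line.
   strtok() overwrites separators with '\0', so line must be a
   writable array, not a string literal. */
int count_words(char *line)
{
    int count = 0;
    char *tok = strtok(line, " \t\n");

    while (tok != NULL) {
        count++;
        tok = strtok(NULL, " \t\n");
    }
    return count;
}
```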
10
C++ String Library Functions
  • C++ supports C-style strings
  • C++ also has a string class

    string::size()
    string::empty()
    string::append(s)
    string::erase(n, m)
    string::insert(size_type n, const string &s)
    string::find(s)
    string::rfind(s)
11
Java String Objects
  • String class: java.lang.String
  • http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html
  • In Java, strings are constant: their values
    cannot be changed after they are created.
  • StringBuffer class: java.lang.StringBuffer
  • http://java.sun.com/j2se/1.4.2/docs/api/java/lang/StringBuffer.html
  • A string buffer is like a String, but can be
    modified. At any point in time it contains some
    particular sequence of characters, but the length
    and content of the sequence can be changed
    through certain method calls.

12
Example: Corporate Renaming
  • Corporate name changes are occurring with ever
    greater frequency, as companies merge, buy each
    other out, or try to hide from bad publicity.
    These changes make it difficult to figure out the
    current name of a company when reading old
    documents.
  • Your company, Digiscam, has put you to work on a
    program which maintains a database of corporate
    name changes and does the appropriate
    substitutions to bring old documents up to date.
  • Your program should take as input a file with a
    given number of corporate name changes, followed
    by a given number of lines of text for you to
    correct. Only exact matches of the string should
    be replaced.
  • There will be at most 100 corporate changes, and
    each line of text is at most 1,000 characters
    long.

13
Sample Input and Output
  • 4
  • "Anderson Consulting" to "Accenture"
  • "Enron" to "Dynegy"
  • "DEC" to "Compaq"
  • "TWA" to "American"
  • 5
  • Anderson Accounting begat Anderson Consulting,
    which
  • offered advice to Enron before it DECLARED
    bankruptcy,
  • which made Anderson
  • Consulting quite happy it changed its name
  • in the first place!
  • Output
  • Anderson Accounting begat Accenture, which
  • offered advice to Dynegy before it CompaqLARED
    bankruptcy,
  • which made Anderson
  • Consulting quite happy it changed its name
  • in the first place!

14
Required String Operations
  • Read strings
  • Store strings
  • Search strings for patterns
  • Modify strings
  • Print strings

15
Read and Store
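    #include <stdio.h>
    #include <string.h>

    #define MAXLEN 1001      /* longest possible string */
    #define MAXCHANGES 101   /* maximum number of name changes */

    typedef char string[MAXLEN];

    string mergers[MAXCHANGES][2];   /* store before/after corporate names */
    int nmergers;                    /* number of different name changes */

    /* Copy the text between the next pair of double quotes into s. */
    void read_quoted_string(char *s)
    {
        int i = 0;
        int c;

        while ((c = getchar()) != '\"' && c != EOF)
            ;                        /* skip up to the opening quote */
        while ((c = getchar()) != '\"' && c != EOF) {
            s[i] = (char)c;
            i++;
        }
        s[i] = '\0';
    }

    /* Read the number of name changes, then each before/after pair. */
    void read_changes(void)
    {
        int i;

        scanf("%d\n", &nmergers);
        for (i = 0; i < nmergers; i++) {
            read_quoted_string(mergers[i][0]);
            read_quoted_string(mergers[i][1]);
        }
    }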
16
Searching for Patterns
Return the position of the first occurrence of the
pattern p in the text t, and -1 if it does not occur.

    int findmatch(const char *p, const char *t)
    {
        int i, j;          /* counters */
        int plen, tlen;    /* string lengths */

        plen = (int)strlen(p);
        tlen = (int)strlen(t);

        for (i = 0; i <= (tlen - plen); i++) {
            j = 0;
            while ((j < plen) && (t[i + j] == p[j]))
                j++;
            if (j == plen)
                return i;
        }
        return -1;
    }

17
Manipulating Strings
Replace the substring of length xlen starting at
position pos in string s with the contents of string y.

    void replace_x_with_y(char *s, int pos, int xlen, const char *y)
    {
        int i;
        int slen, ylen;

        slen = (int)strlen(s);
        ylen = (int)strlen(y);

        if (xlen >= ylen)      /* shrink: shift the tail left, '\0' included */
            for (i = pos + xlen; i <= slen; i++)
                s[i + (ylen - xlen)] = s[i];
        else                   /* grow: shift the tail right, starting at the end */
            for (i = slen; i >= pos + xlen; i--)
                s[i + (ylen - xlen)] = s[i];

        for (i = 0; i < ylen; i++)
            s[pos + i] = y[i];
    }

18
Completing the Merger
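    int main(void)
    {
        string s;        /* current line of text */
        int c;
        int nlines;
        int i, j;
        int pos;

        read_changes();
        scanf("%d\n", &nlines);
        for (i = 1; i <= nlines; i++) {
            /* read one line of text */
            j = 0;
            while ((c = getchar()) != '\n' && c != EOF) {
                s[j] = (char)c;
                j++;
            }
            s[j] = '\0';

            /* apply each name change until it no longer matches */
            for (j = 0; j < nmergers; j++)
                while ((pos = findmatch(mergers[j][0], s)) != -1)
                    replace_x_with_y(s, pos, (int)strlen(mergers[j][0]),
                                     mergers[j][1]);

            printf("%s\n", s);
        }
        return 0;
    }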

19
Practice
  • http://acm.uva.es/p/v8/848.html
  • http://acm.uva.es/p/v8/850.html
  • http://acm.uva.es/p/v100/10010.html
  • http://acm.uva.es/p/v100/10082.html
  • http://acm.uva.es/p/v101/10132.html
  • http://acm.uva.es/p/v101/10150.html ()
  • http://acm.uva.es/p/v101/10188.html
  • http://acm.uva.es/p/v102/10252.html

20
Sorting
  • Sorting is the most fundamental algorithmic
    problem in computer science.
  • Internal sorting: the entire sort can be done in
    main memory (the input fits into main memory)
  • External sorting: the sort cannot be performed in
    main memory and must be done on disk or tape (the
    input is much too large to fit into memory)
  • To see a list of sorting algorithms:
  • http://en.wikipedia.org/wiki/Sorting_algorithm

21
Properties of sorting algorithms
  • Computational complexity of element comparisons
    in terms of the size of the list
  • Worst case, best case, average case
  • Sort algorithms which only use an abstract key
    comparison operation always need Ω(n log n)
    comparisons on average.
  • Memory usage
  • some sorting algorithms are "in place", such that
    only O(1) or O(log n) memory is needed beyond the
    items being sorted, while others need to create
    auxiliary locations for data to be temporarily
    stored.
  • Stability
  • stable sorting algorithms maintain the relative
    order of records with equal keys (i.e. values).
    That is, a sorting algorithm is stable if
    whenever there are two records R and S with the
    same key and with R appearing before S in the
    original list, R will appear before S in the
    sorted list.
  • When equal elements are indistinguishable, such
    as with integers, stability is not an issue.
  • Unstable sorting algorithms may change the
    relative order of records with equal keys.
  • Unstable sorting algorithms can be specially
    implemented to be stable.

22
Some simple algorithms
  • Bubble sort (or sinking sort)
  • Selection sort
  • Insertion sort
  • Best case: O(N)
  • Worst case: O(N^2)
  • Average case: Θ(N^2)
  • All three are O(N^2) in the worst case
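
As an illustration of the bounds above, a minimal insertion sort on an int array could look like this. On already sorted input the inner loop never runs, which gives the O(N) best case; on reversed input every element is shifted all the way, giving O(N^2).

```c
#include <stddef.h>

/* Sorts a[0..n-1] into ascending order. */
void insertion_sort(int a[], size_t n)
{
    size_t i, j;

    for (i = 1; i < n; i++) {
        int key = a[i];

        /* shift larger elements of the sorted prefix one slot right */
        for (j = i; j > 0 && a[j - 1] > key; j--)
            a[j] = a[j - 1];
        a[j] = key;
    }
}
```

Because an element is only moved past strictly larger ones, equal keys keep their relative order, so this version is stable.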

23
Shellsort
  • Proposed by Donald Shell in 1959.
  • Uses an increment sequence h1, h2, h3, ..., ht,
    applied in reverse order, with h1 = 1.
  • After a phase with increment hk, A[i] <= A[i + hk]
    for every i. All elements spaced hk apart are
    sorted.
  • The action of an hk-sort is to perform an
    insertion sort on hk independent subarrays.
  • The running time of Shellsort depends on the
    choice of increment sequence.
  • The average-case running time of Shellsort, using
    Hibbard's increments, is thought to be O(N^{5/4});
    the worst case is Θ(N^{3/2}).
  • The average-case running time of Shellsort, using
    Sedgewick's increments, is conjectured to be
    O(N^{7/6}); the worst case is O(N^{4/3}).
  • Shellsort is simple, and the performance is
    acceptable even for N in the tens of thousands.
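
A minimal Shellsort sketch follows. It assumes Shell's original increments N/2, N/4, ..., 1, chosen here only because they are the simplest to write; the Hibbard and Sedgewick sequences mentioned above would change just the outer loop.

```c
#include <stddef.h>

/* Sorts a[0..n-1] into ascending order.
   Increments: n/2, n/4, ..., 1 (Shell's original sequence). */
void shellsort(int a[], size_t n)
{
    size_t gap, i, j;

    for (gap = n / 2; gap > 0; gap /= 2) {
        /* one gap-sort: insertion sort on elements gap apart */
        for (i = gap; i < n; i++) {
            int key = a[i];

            for (j = i; j >= gap && a[j - gap] > key; j -= gap)
                a[j] = a[j - gap];
            a[j] = key;
        }
    }
}
```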

24
More complicated algorithms
  • Mergesort
  • A good example of divide and conquer
  • Stable
  • Heapsort
  • Uses the heap data structure
  • Unstable
  • Running time: O(N log N)
  • Remark
  • Merge sort is the cornerstone of most external
    sorting algorithms
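
The divide-and-conquer structure and the stability of mergesort can be seen in a short top-down sketch. The names merge_sort, merge_sort_range and merge_runs are illustrative; the scratch array is the auxiliary memory that keeps mergesort from being an in-place sort.

```c
#include <stdlib.h>
#include <string.h>

/* Merges the sorted runs a[lo..mid) and a[mid..hi) via tmp. */
static void merge_runs(int a[], int tmp[], size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;

    while (i < mid && j < hi)      /* on ties take the left run: stable */
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
    while (i < mid)
        tmp[k++] = a[i++];
    while (j < hi)
        tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (hi - lo) * sizeof a[0]);
}

/* Sorts the half-open range a[lo..hi). */
static void merge_sort_range(int a[], int tmp[], size_t lo, size_t hi)
{
    size_t mid;

    if (hi - lo < 2)
        return;                    /* 0 or 1 element: already sorted */
    mid = lo + (hi - lo) / 2;
    merge_sort_range(a, tmp, lo, mid);     /* divide */
    merge_sort_range(a, tmp, mid, hi);
    merge_runs(a, tmp, lo, mid, hi);       /* conquer */
}

/* Sorts a[0..n-1]. Returns 0 on success, or -1 if the scratch
   array cannot be allocated (a is left unchanged in that case). */
int merge_sort(int a[], size_t n)
{
    int *tmp;

    if (n < 2)
        return 0;
    tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    merge_sort_range(a, tmp, 0, n);
    free(tmp);
    return 0;
}
```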

25
Quicksort
  • Quicksort is the fastest known sorting algorithm
    in practice.
  • A divide-and-conquer recursive algorithm
  • Average running time is O(N log N).
  • Worst-case running time is O(N^2).
  • To quicksort an array S:
  • If the number of elements in S is 0 or 1, return.
  • Pick any element v in S. This is called the
    pivot.
  • Partition S - {v} into two disjoint groups: S1 =
    {x ∈ S - {v} | x ≤ v} and S2 = {x ∈ S - {v} | x ≥ v}.
  • Return quicksort(S1) followed by v followed by
    quicksort(S2).
  • Efficient implementations of Quicksort are
    typically unstable.
  • Details can be found in any data structures and
    algorithms textbook, or at
    http://en.wikipedia.org/wiki/Quicksort
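
The four steps above can be sketched in C as follows. Two choices here are assumptions and not part of the slide: the pivot is the middle element, and the partition is the simple one-pass (Lomuto-style) scheme, which sends elements equal to the pivot into S2. Tuned implementations usually pick the pivot and partition more carefully.

```c
#include <stddef.h>

static void swap_int(int *x, int *y)
{
    int t = *x;
    *x = *y;
    *y = t;
}

/* Sorts a[lo..hi], both bounds inclusive. */
static void quicksort_range(int a[], long lo, long hi)
{
    long i, store;
    int v;

    if (lo >= hi)
        return;                            /* 0 or 1 element */

    /* pick the middle element as pivot and park it at the end */
    swap_int(&a[lo + (hi - lo) / 2], &a[hi]);
    v = a[hi];

    /* partition: a[lo..store-1] < v, a[store..i-1] >= v */
    store = lo;
    for (i = lo; i < hi; i++)
        if (a[i] < v)
            swap_int(&a[i], &a[store++]);
    swap_int(&a[store], &a[hi]);           /* pivot to its final slot */

    quicksort_range(a, lo, store - 1);     /* S1 */
    quicksort_range(a, store + 1, hi);     /* S2 */
}

/* Sorts a[0..n-1] into ascending order. */
void quicksort(int a[], size_t n)
{
    if (n > 1)
        quicksort_range(a, 0, (long)n - 1);
}
```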

26
Non-comparison sorts
  • Not limited by the Ω(n log n) lower bound
  • Bucket sort
  • http://en.wikipedia.org/wiki/Bucket_sort
  • Radix sort
  • http://en.wikipedia.org/wiki/Radix_sort
  • Counting sort
  • http://en.wikipedia.org/wiki/Counting_sort
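
Counting sort is the easiest of the three to show in a few lines. The sketch below assumes the keys are plain integers in the range 0..max_key (a parameter invented for this example); it never compares two keys, which is why the comparison lower bound does not apply.

```c
#include <stdlib.h>

/* Sorts n keys, each assumed to lie in 0..max_key, in
   O(n + max_key) time. Returns 0 on success, or -1 if the count
   table cannot be allocated. */
int counting_sort(int a[], size_t n, int max_key)
{
    size_t i, out = 0;
    int k;
    size_t *count = calloc((size_t)max_key + 1, sizeof *count);

    if (count == NULL)
        return -1;
    for (i = 0; i < n; i++)        /* tally each key */
        count[a[i]]++;
    for (k = 0; k <= max_key; k++) /* write keys back in order */
        while (count[k] > 0) {
            a[out++] = k;
            count[k]--;
        }
    free(count);
    return 0;
}
```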

27
Sorting Library Functions
  • In C

Sort an array:

    #include <stdlib.h>
    void qsort(void *base, size_t nmemb, size_t size,
               int (*compar)(const void *, const void *));

This function sorts an array with nmemb elements
pointed to by base, where each element is size bytes
long.

Binary search:

    #include <stdlib.h>
    void *bsearch(const void *key, const void *base, size_t nmemb,
                  size_t size, int (*compar)(const void *, const void *));
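
A small sketch of both calls on an int array. The wrapper names sort_ints and find_int and the comparator cmp_int are made up for the example. The comparator avoids the common "return x - y" shortcut, which can overflow for large values.

```c
#include <stdlib.h>

/* Comparator for qsort/bsearch: negative, zero or positive. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;

    return (x > y) - (x < y);
}

/* Sorts a[0..n-1] into ascending order. */
void sort_ints(int a[], size_t n)
{
    qsort(a, n, sizeof a[0], cmp_int);
}

/* Returns the index of key in the sorted array a, or -1 if it is
   absent. If key occurs several times, any matching index may be
   returned. */
long find_int(const int a[], size_t n, int key)
{
    const int *hit = bsearch(&key, a, n, sizeof a[0], cmp_int);

    return hit ? (long)(hit - a) : -1;
}
```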
28
qsort( ) example
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>

    void sortstrarr(void *array, unsigned n);
    static int cmpr(const void *a, const void *b);

    int main(void)
    {
        char line[1024];
        char *line_array[1024];
        int i = 0;
        int j = 0;

        /* read up to 1024 lines, keeping a private copy of each */
        while ((fgets(line, 1024, stdin)) != NULL)
            if (i < 1024)
                line_array[i++] = strdup(line);
            else
                break;

        sortstrarr(line_array, i);

        while (j < i)
            printf("%s", line_array[j++]);
        return 0;
    }

    /* each array element is a char *, so a and b are really char ** */
    static int cmpr(const void *a, const void *b)
    {
        return strcmp(*(char **)a, *(char **)b);
    }

    void sortstrarr(void *array, unsigned n)
    {
        qsort(array, n, sizeof(char *), cmpr);
    }
29
Sorting and Searching in C++
  • The C++ STL includes methods for sorting,
    searching, and more.

    void sort(RandomAccessIterator bg, RandomAccessIterator end);
    void sort(RandomAccessIterator bg, RandomAccessIterator end,
              BinaryPredicate op);
    void stable_sort(RandomAccessIterator bg, RandomAccessIterator end);
    void stable_sort(RandomAccessIterator bg, RandomAccessIterator end,
                     BinaryPredicate op);
30
Sorting and Searching in Java
  • java.util.Arrays

    static void sort(Object[] a)
    static void sort(Object[] a, Comparator c)
    static int binarySearch(Object[] a, Object key)
    static int binarySearch(Object[] a, Object key, Comparator c)

The sort() methods for object arrays in java.util.Arrays are stable.
31
Example 1
  • http://acm.uva.es/p/v100/10041.html

Background: The world-known gangster Vito
Deadstone is moving to New York. He has a very
big family there, all of them living in Lamafia
Avenue. Since he will visit all his relatives
very often, he is trying to find a house close to
them.

Problem: Vito wants to minimize the total
distance to all of them and has blackmailed you
to write a program that solves his problem.

Input: The input consists of several test
cases. The first line contains the number of test
cases. For each test case you will be given the
integer number of relatives r (0 < r < 500) and
the street numbers (also integers) s1, s2, ..., sr
where they live (0 < si < 30000). Note that
several relatives could live at the same street
number.

Output: For each test case your program
must write the minimal sum of distances from
Vito's optimal house to each one of his
relatives. The distance between two street
numbers si and sj is dij = |si - sj|.
32
Example 1
  • If there is 0 or 1 relative, just return 0
  • If there are 2 relatives
  • If there are 3 relatives
  • If there are 4 relatives
  • Can you see the solution now?
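
One way to answer the question: working through the small cases suggests that a house at the median street number minimizes the total distance, because moving away from the median lengthens the trip to at least as many relatives as it shortens. A sketch under that observation, with the hypothetical names min_total_distance and cmp_int_asc:

```c
#include <stdlib.h>

static int cmp_int_asc(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;

    return (x > y) - (x < y);
}

/* Returns the minimal sum of distances from one house to the r
   street numbers in s. Note that s is sorted as a side effect. */
long min_total_distance(int s[], size_t r)
{
    size_t i;
    long total = 0;
    int median;

    if (r < 2)
        return 0;                  /* 0 or 1 relative */
    qsort(s, r, sizeof s[0], cmp_int_asc);
    median = s[r / 2];             /* either middle works when r is even */
    for (i = 0; i < r; i++)
        total += labs((long)s[i] - median);
    return total;
}
```

For relatives at 2 and 4 the result is 2, and for relatives at 2, 4 and 6 it is 4.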

33
Example 2
  • The following is a list of some sorting
    algorithms.
  • Bubble sort, heap sort, insertion sort, merge
    sort, quick sort, selection sort, shell sort
  • My business here is to give you some numbers, and
    to sort them is your business.
  • Attention, I want the smallest number at the top
    of the sorted list.
  • Input
  • The input file consists of a series of data sets.
    Each data set has two parts: the first part
    contains two non-negative integers, n (1 ≤ n ≤
    100,000) representing the total of numbers you
    will get, and m (1 ≤ m ≤ n) representing the
    interval of the output sorted list. The second
    part contains n positive integers, each
    less than 2,000,000,000. The input is terminated
    by a line with two zeros.
  • Output
  • For each data set, you should output several
    numbers in ONE line. After you get the sorted
    list, you should output the first number of each
    m numbers, and you should print exactly ONE space
    between two adjacent numbers. And please make
    sure that there is NOT any blank line
    between outputs of two adjacent data sets.

34
Example 2
  • Sample Input
  • 8 2
  • 3
  • 5
  • 7
  • 1
  • 8
  • 6
  • 4
  • 2
  • 0 0
  • Output for the Sample Input
  • 1 3 5 7
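
The core of this problem is a sort followed by a strided scan. The sketch below covers only that core and leaves the input and output loop to the reader; pick_every_mth and cmp_long are hypothetical names, and long is used so that values just under 2,000,000,000 are safe on any platform.

```c
#include <stdlib.h>

static int cmp_long(const void *a, const void *b)
{
    long x = *(const long *)a;
    long y = *(const long *)b;

    return (x > y) - (x < y);
}

/* Sorts v[0..n-1] ascending, then copies v[0], v[m], v[2m], ...
   into out, which must have room for (n + m - 1) / m values.
   Requires m >= 1. Returns how many values were written. */
size_t pick_every_mth(long v[], size_t n, size_t m, long out[])
{
    size_t i, k = 0;

    qsort(v, n, sizeof v[0], cmp_long);
    for (i = 0; i < n; i += m)
        out[k++] = v[i];
    return k;
}
```

For the sample data set (n = 8, m = 2) the function writes 1, 3, 5, 7 and returns 4.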

35
Example 3
  • Dr. Lee cuts a string S into N pieces, s0,
    s1, ..., sN-1. Now, Dr. Lee gives you these N
    sub-strings. There might be several possibilities
    for what the string S could be. For example, if
    Dr. Lee gives you three sub-strings "a", "ab",
    "ac", the string S could be "aabac", "aacab",
    "abaac", .... Your task is to output the
    lexicographically smallest S.
  • Input
  • The first line of the input is a positive
    integer T. T is the number of the test cases. The
    first line of each test case is a positive
    integer N (1 ≤ N ≤ 8) which represents the number
    of sub-strings. After that, N lines follow. The
    i-th line is the i-th sub-string. Assume that the
    length of each sub-string is positive and less
    than 100.
  • Output
  • The output of each test is the lexicographically
    smallest S. No redundant spaces are needed.

36
Example 3
  • Sample Input
  • 1
  • 3
  • a
  • ab
  • ac
  • Output for the Sample Input
  • aabac

37
Example 3
  • Analysis
  • Solution One: brute force (N is small)
  • 8! = 40320
  • A better solution
  • Define a new relation between two strings X and Y
  • If XY < YX, then X << Y. E.g., X = "b", Y = "ba".
    We have
  • X < Y. But Y << X because "bab" < "bba"
  • Try to prove that if X << Y and Y << Z, then X <<
    Z
  • Then we can sort the N strings based on the <<
    operator
  • Combine the sorted strings

38
Example 3
    #include <iostream>
    #include <string>
    #include <algorithm>
    using namespace std;

    int T, n;
    string s[10];

    // x goes before y when the concatenation xy is smaller than yx
    bool cmp(string x, string y)
    {
        return x + y < y + x;
    }

    int main()
    {
        int i;
        cin >> T;
        while (T--) {
            cin >> n;
            for (i = 0; i < n; i++)
                cin >> s[i];
            sort(s, s + n, cmp);
            for (i = 0; i < n; i++)
                cout << s[i];
            cout << endl;
        }
        return 0;
    }
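
The same ordering (X goes before Y when XY < YX) can also be expressed in plain C with qsort(). This is a sketch only: smallest_concat, cmp_concat and MAXPIECE are names invented here, and the fixed buffers rely on the problem's guarantee that every piece is shorter than 100 characters.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXPIECE 100   /* each piece is shorter than this */

/* qsort comparator: x sorts before y when xy < yx. */
static int cmp_concat(const void *a, const void *b)
{
    const char *x = *(const char *const *)a;
    const char *y = *(const char *const *)b;
    char xy[2 * MAXPIECE], yx[2 * MAXPIECE];

    snprintf(xy, sizeof xy, "%s%s", x, y);
    snprintf(yx, sizeof yx, "%s%s", y, x);
    return strcmp(xy, yx);
}

/* Reorders pieces[0..n-1] and writes their concatenation to out,
   whose capacity is cap bytes (cap >= 1). Pieces that would not
   fit are dropped. */
void smallest_concat(const char *pieces[], size_t n, char *out, size_t cap)
{
    size_t i, used = 0;

    qsort(pieces, n, sizeof pieces[0], cmp_concat);
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        size_t len = strlen(pieces[i]);

        if (used + len + 1 > cap)
            break;
        memcpy(out + used, pieces[i], len + 1);
        used += len;
    }
}
```

For the pieces "a", "ab", "ac" the result is "aabac", matching the sample output.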
39
Practice
  • http://acm.uva.es/p/v1/120.html
  • http://acm.uva.es/p/v100/10026.html
  • http://acm.uva.es/p/v100/10037.html ()
  • http://acm.uva.es/p/v101/10138.html
  • http://acm.uva.es/p/v101/10152.html
  • http://acm.uva.es/p/v101/10191.html
  • http://acm.uva.es/p/v101/10194.html