Rani Siromoney - PowerPoint PPT Presentation

1
Forty Years of Formal Languages and Automata Theory: Reminiscences
Rani Siromoney
Professor Emeritus, Madras Christian College, and
Adjunct Professor, Chennai Mathematical Institute
3rd Update Meeting, Automata Verification
MatScience, February 29 to March 2, 2004
2
LANDMARKS / MILESTONES
3
LANDMARKS / MILESTONES (continued)
5
CHRONOLOGY
Equal Matrix Languages (n-right linear / multi-tape
FSA)
Grammatical Inference: Takada
Two-dimensional: KK, GS
Algorithmic Learning Theory, Noise Model: Sakakibara
L-Systems (2-D, Radial, Hexagonal): GS
DNA Splicing: Takada
Recognizable: KGS et al.
DNA Computing: Lila Kari
PKC
OTA (Learning)
Machine Learning, ILP: Arul
Aqueous Computing: Bireswar
6
  • Prior to 1965: mainly the Chomskian hierarchy.
  • General trend from 1965: to introduce new classes of
    grammars / automata and the corresponding languages
    generated / accepted or recognized, motivated by nice
    theoretical properties or applications.
  • Parallel rewriting and restricted rewriting: using simple
    rules (CF / regular) to enhance generative capacity and
    capture interesting properties (indexed grammars, matrix
    grammars, etc.).
  • Equal Matrix Languages fall under the following
    categories: parallel rewriting, restricted rewriting,
    nice theoretical properties.

7
Equal Matrix Languages (EML), RS 1969
(n-right linear, Ibarra 1970; Multi-tape Finite
Automata, 1967)
  • Motivation: the Parikh mapping of any CFL is
    semi-linear, but the converse is not true. Examples:
    $\{a^n b^n c^n \mid n \ge 1\}$ and $\{ww \mid w \in \{a,b\}^*\}$
    have semi-linear Parikh images but are
    context-sensitive, not context-free.
  • Aim: to find a family of languages for which the
    converse is true.
  • Came across Matrix Languages (Abraham 1964).
  • Came up with EML: n right-linear rules applied
    in parallel, constrained by a matrix.
  • Nice properties: higher generative capacity (simple
    EMG for well-known CSL); Parikh mapping semi-linear
    for all EML; a language $L \subseteq w_1^* \cdots w_k^*$,
    with $w_1, \ldots, w_k \in \Sigma^*$ and k finite, is an EML
    (bounded EML) iff its Parikh mapping is semi-linear;
    unambiguity; automata characterization
    (finite-turn checking automata); closure
    properties; decidability results.
  • Application to kernel sentences in Tamil (GS).
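
The idea of right-linear rules applied in parallel can be illustrated with a minimal Python sketch. The three-matrix grammar used here (start with nonterminals A, B, C; repeat the matrix [A -> aA, B -> bB, C -> cC]; finish with [A -> a, B -> b, C -> c]) is an illustrative assumption and is not taken from the slides. The function names `derive_anbncn` and `parikh_vector` are likewise hypothetical.

```python
from collections import Counter


def derive_anbncn(n):
    """Derive a^n b^n c^n with a hypothetical equal matrix grammar.

    Each matrix rewrites all three nonterminals in lockstep, so the
    three blocks always grow by the same amount.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    parts = ["", "", ""]  # terminals produced so far by A, B and C
    for _ in range(n - 1):
        # Matrix [A -> aA, B -> bB, C -> cC], applied as one step.
        parts = [p + s for p, s in zip(parts, "abc")]
    # Terminating matrix [A -> a, B -> b, C -> c].
    parts = [p + s for p, s in zip(parts, "abc")]
    return "".join(parts)


def parikh_vector(word, alphabet="abc"):
    """Count occurrences of each letter, forgetting their order."""
    counts = Counter(word)
    return tuple(counts[ch] for ch in alphabet)
```

For example, `derive_anbncn(2)` yields `"aabbcc"`, and `parikh_vector` maps both `"aabbcc"` and `"abcabc"` to `(2, 2, 2)`, which shows how the Parikh mapping discards order.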

8
Two-Dimensional Languages / Picture Languages
Introduction (as an extension of EML). Two
major types. Significant contribution:
extension of catenation to row and column catenation,
which has become standard terminology. Digression to
Kolam: Marcia Ascher's paper and book in 2002.
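
Row and column catenation of rectangular arrays can be sketched as follows. This is a minimal illustration, assuming a picture is stored as a list of equal-length strings (one string per row); the representation and the function names are assumptions, not the notation of the original papers.

```python
def column_catenate(x, y):
    """Place picture y to the right of picture x.

    Defined only when both pictures have the same number of rows.
    """
    if len(x) != len(y):
        raise ValueError("column catenation needs equal row counts")
    return [row_x + row_y for row_x, row_y in zip(x, y)]


def row_catenate(x, y):
    """Place picture y below picture x.

    Defined only when both pictures have the same number of columns.
    """
    if x and y and len(x[0]) != len(y[0]):
        raise ValueError("row catenation needs equal column counts")
    return x + y
```

Both operations are partial: unlike string catenation, they are undefined when the dimensions do not match, which is why the sketch raises an error in that case.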
9
Lindenmayer Systems
(2D, Radial, Hexagonal)
  • 2D: expansion in all four directions.
  • Hexagonal: the significant contribution is the
    introduction of arrowhead catenation. A hexagon
    catenated to a hexagon yields a hexagon.
  • Work continued (i) by MM, KK in the memorial
    volume and (ii) by KGS: view of 3D solids as hexagons,
    with block behind block reflected by arrowhead
    catenation.

10
Consider hexagonal arrays and define arrowhead
catenation. The triangular grid is made up of lines
equally inclined and parallel to three fixed
directions. Definition of catenation of an
a-hexagonal array (A) with a p-hexagonal array
(H).
13
Kambi Kolam and Circular DNA Splicing
Rani Siromoney
Professor Emeritus, Madras Christian College
Adjunct Professor, Chennai Mathematical Institute
Chennai, Tamil Nadu, India
14
Picture Languages
Kolam is a traditional art practiced
extensively in the southern part of India,
for decorating courtyards of dwellings.
16
Picture languages (continued)
Kolam figures grouped into families attracted
interest of theoretical computer scientists
concerned with analysis and description of
pictures through the use of picture languages,
which use sets of basic units and specific,
formal rules for combining the units
17
Derivation of multi-kambi kolam from single
kambi kolam. According to one KP, a proper
kambi kolam should consist of a single
kambi. If a kolam did contain more than one kambi,
then the greater the number of kambis, the
easier it is to memorize the kolam. A single
kambi kolam can be converted into a
multi-kambi kolam by applying a cut at a
crossing. A cut at a crossing produces four
ends; a cut and join (de-link) operation then
fuses the ends together, two at a time.
19
  • A cut and connect operation can link two adjacent
    corners.
  • A cut is introduced such that it goes through two
    adjacent rounded corners, producing four ends.
  • These ends are connected to form either a crossing
    or, alternatively, two new adjacent rounded corners.
  • Two kambis, when used in a cut and connect
    operation, will fuse into one kambi.
  • If the two adjacent corners belong to the same
    kambi, then a cut and connect operation can produce
    two kambis or just one kambi with an additional
    crossing.

20
SPLICING SYSTEMS (Tom Head): INTRODUCTION
The initial set (finite or infinite) consists of
double-stranded DNA molecules. Specific classes
of enzymatic activities are considered: those of
restriction enzymes. Recombinant behavior is
modeled, and the associated sets analyzed, by a new
formalism called Splicing Systems. Attention is
focused on the effect of sets of restriction enzymes
and a ligase that allow DNA molecules to be
cleaved and re-associated to produce further
molecules.
21
Circular DNA and Splicing Systems DNA
molecules exist not only in linear forms
but also in circular forms.  
22
Action of splicing schemes on circular strings
(RS, KGS, VRD, UBE, 1992): in certain recombination
processes, a pair of circular DNA molecules produces
a single circular string. Formally, let S = (A, T, P)
be a splicing system, and let ^hpxq and ^wuxv be
circular strings over A for which (p, x, q) P (u, x, v).
Then S acts on the two circular strings to produce
the single circular string ^hpxvwuxq
(cf. the cut and connect operation).
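
The action on two circular strings can be sketched in Python. This is a minimal illustration under stated assumptions: a circular string is represented by its lexicographically least rotation, a rule is given as two triples (p, x, q) and (u, x, v) sharing the same crossing x, and the result is the circular string read as h p x v w u x q. The function names are hypothetical.

```python
def canonical(word):
    """Represent a circular string by its lexicographically least rotation."""
    if not word:
        return ""
    return min(word[i:] + word[:i] for i in range(len(word)))


def rotations_ending_with(word, suffix):
    """Yield the prefix h of every rotation of `word` of the form h + suffix."""
    for i in range(len(word)):
        rotation = word[i:] + word[:i]
        if rotation.endswith(suffix):
            yield rotation[: len(rotation) - len(suffix)]


def splice_circular(first, second, left, right):
    """Splice two circular strings at the sites `left` and `right`.

    `left` is the triple (p, x, q) and `right` is (u, x, v); both must
    share the crossing x. Returns the set of resulting circular strings.
    """
    p, x, q = left
    u, x_right, v = right
    if x != x_right:
        raise ValueError("both triples must share the same crossing")
    results = set()
    for h in rotations_ending_with(first, p + x + q):
        for w in rotations_ending_with(second, u + x + v):
            # Both circles are cut after the crossing x and rejoined
            # crosswise, which fuses them into one circle.
            results.add(canonical(h + p + x + v + w + u + x + q))
    return results
```

Because the inputs are circular, the result does not depend on which rotation of each string is passed in.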
23
Picture Languages
Ascher, Marcia. 2002. The Kolam Tradition. American Scientist 90, 56-63.
Ascher, Marcia. 2002. Mathematics Elsewhere. Princeton University Press, Princeton, N.J.
Narasimhan, R. 1992. The oral-literate dimension in Indian culture. In Indological Essays: Commemorative Volume II for Gift Siromoney, ed. M. Lockwood, Department of Statistics, Madras Christian College, pp. 67-79.
Prusinkiewicz, P., K. Krithivasan and M. G. Vijayanarayana. 1989. Application of L-systems to algorithmic generation of South Indian folk art patterns and Karnatic music. In A Perspective in Computer Science: Commemorative Volume for Gift Siromoney, ed. R. Narasimhan. Computer Science Series, vol. 16, World Scientific, Singapore, pp. 229-247.
Gift Siromoney, Rani Siromoney and T. Robinson. Kambi kolam and cycle grammars. In A Perspective in Computer Science: Commemorative Volume for Gift Siromoney, ed. R. Narasimhan. Computer Science Series, vol. 16, World Scientific, Singapore, pp. 267-300.
Gift Siromoney. Studies on the traditional art of Kolam, Working Paper I, May 1985 (manuscript).
24
Distributed Circular Systems
Rani Siromoney, Distributed Circular Systems, Grammar
Systems 2000, Bad Ischl, Austria
25
Sequential Distributed Circular Automata
26
  • Finite Automata for Circular Languages
  • J. Kari and L. Kari, Context-free
    Recombinations, in Words, Sequences, Languages: Where
    Computer Science, Biology and Linguistics Meet,
    C. Martin-Vide, V. Mitrana (Eds.), Kluwer, The
    Netherlands.
  • Definition: Given a finite automaton A, the circular
    language K-accepted by A, $L(A)_K$, is the set of all
    words w such that A has a cycle labeled by w.
  • K-Acceptance: the circular/linear language accepted
    by a finite automaton A is defined as
    $L(A) \cup L(A)_K$, where L(A) is the linear language
    accepted by A, defined in the usual way.
  • Definition: A circular/linear language
    $L \subseteq \Sigma^* \cup \Sigma^\bullet$, where $\Sigma^\bullet$ denotes the
    circular words over $\Sigma$, is regular if there is a
    finite automaton A that accepts the circular and
    linear parts of L, i.e., that accepts
    $L \cap \Sigma^*$ and $L \cap \Sigma^\bullet$.

27
P-Acceptance. The following definition is
equivalent to a definition given by Pixton,
namely, that the circular language accepted by a
finite automaton is the set of all words that label
a loop containing at least one initial and one
final state. Definition: Given a finite
automaton A, the circular language accepted by A,
$L(A)_P$, is the set of all words w such that A
has a cycle labelled by w that contains at least
one final state.
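
The two cycle-based acceptance modes can be sketched as a single Python function. This is a minimal illustration, assuming the automaton is given as a dictionary `delta` mapping (state, symbol) pairs to sets of successor states; that encoding and the name `cycle_accepts` are assumptions. Called without `finals`, it checks for any cycle labelled by the word (K-acceptance); called with `finals`, it additionally requires the cycle to pass through a final state (P-acceptance as defined above).

```python
def cycle_accepts(delta, states, word, finals=None):
    """Check whether some cycle of the automaton is labelled by `word`.

    If `finals` is given, the cycle must visit at least one final state.
    The empty word is treated as labelling a trivial cycle.
    """
    need_final = finals is not None
    for start in states:
        # Each configuration is (current state, final state seen so far).
        configs = {(start, (not need_final) or start in finals)}
        for symbol in word:
            configs = {
                (nxt, seen or (need_final and nxt in finals))
                for state, seen in configs
                for nxt in delta.get((state, symbol), ())
            }
        # The path must return to its starting state to close the cycle.
        if (start, True) in configs:
            return True
    return False
```

Trying every state as the starting point is what makes the check independent of where the circular word is "opened".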
28
H-Acceptance. The circular languages accepted by
finite automata under the following definition
coincide with the regular circular languages
introduced by Head. Given a finite automaton A,
the circular language accepted by A, $L(A)_H$,
is the set of all words w such that $w = uv$ and
$vu \in L(A)$. Pixton has shown that if, in
addition, we assume that the family of languages
is closed under repetition (i.e., $w^n$ is in the
language whenever w is), then H-acceptance and
P-acceptance are equivalent.
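
H-acceptance can be sketched by testing every rotation of the word against ordinary linear acceptance. This is a minimal illustration, assuming a nondeterministic automaton encoded as a dictionary from (state, symbol) pairs to sets of successor states, with a single initial state; the encoding and function names are assumptions.

```python
def linear_accepts(delta, initial, finals, word):
    """Ordinary acceptance: some run from `initial` ends in a final state."""
    current = {initial}
    for symbol in word:
        current = {
            nxt for state in current for nxt in delta.get((state, symbol), ())
        }
    return bool(current & set(finals))


def h_accepts(delta, initial, finals, word):
    """Accept w if w = uv for some u, v with vu accepted linearly."""
    # Every split w = uv corresponds to the rotation vu = w[i:] + w[:i].
    return any(
        linear_accepts(delta, initial, finals, word[i:] + word[:i])
        for i in range(max(1, len(word)))
    )
```

With an automaton whose linear language is {ab}, both `"ab"` and `"ba"` are H-accepted, since each is a rotation of the other.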
29
  • Proposition: The family of circular languages accepted
    by finite automata by K-acceptance is strictly
    included in the family accepted by P-acceptance,
    which is strictly included in the family accepted by
    H-acceptance.
  • Context-free recombinations are computationally
    weak, able to generate only regular languages.
  • Advantage of K-acceptance: the same automaton accepts
    both the linear and circular components of the
    language.

30
  • Sequential Distributed Architecture by Automata
    for Circular Languages
  • Splicing for purely circular strings, and its
    automata characterization, given by Pixton.
  • Mixed splicing/recombination and automata
    characterizations given by Pixton and by Kari and
    Kari.
  • Aim: to extend the sequential distributed
    architecture to these cases.
  • Sequential grammar systems are extended to
    automata (Krithivasan et al.): distributed FSAs and
    distributed PDAs, with modes similar to those
    defined in CD grammar systems. Acceptance power
    analyzed: distributed FSAs in all modes are not more
    powerful than centralized FSAs; for PDAs in all
    modes, the distributed counterpart is as powerful
    as TMs.
  • For purely circular languages, consider
    P-acceptance.
  • For mixed (circular/linear) languages, consider
    the K-automata and apply the technique in
    Krithivasan et al. for distributed processing in
    automata for the different modes of acceptance.
  • Distributed processing in automata (sequential)
    does not increase the generative power
    for regular circular languages.

31
  • Algorithmic Learning Theory / Inductive Logic
    Programming
  • Sakakibara Y, Siromoney R 1992, A noise model on
    learning sets of strings. Proceedings of the
    Fifth Annual ACM Workshop on Computational
    Learning Theory (ACM Press), pp. 295-302.
  • Siromoney A, Siromoney R 1993, Local Exceptions
    in Inductive Logic Programming. Presented at the
    International Workshop on Machine Intelligence,
    ARL Labs, Hitachi, Japan.
  • Siromoney A, Siromoney R 1995, Variations and
    Local Exceptions in Inductive Logic Programming.
    Machine Intelligence (eds) K Furukawa, D Michie,
    S Muggleton (Oxford: Clarendon), vol. 14, pp.
    211-232.
  • Siromoney A, Siromoney R, June 1996, A
    machine learning system for identifying
    transmembrane domains from amino acid sequences.
    Sadhana, Vol. 21, Part 3, pp. 317-325.

32
  • A new noise model on learning sets of strings,
    in the framework of PAC learning.
  • Instance domain: $\Sigma^n$, the set of strings of
    length n over a finite alphabet $\Sigma$.
  • EDIT operation errors: insertion,
    deletion, or change of a symbol in a string.
  • EDIT noise: examples corrupted by random
    errors. General upper bounds are given on the EDIT
    noise rate that a learning algorithm
    taking the strategy of minimizing
    disagreements can tolerate.
  • Next we present an efficient algorithm that can
    learn a class of decision lists (Rivest 87) over
    the attributes "a string w contains a pattern p"
    from noisy examples, under some restriction on
    the EDIT noise rate.
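
The two ingredients named above, EDIT noise and decision lists over "w contains p" attributes, can be sketched in Python. This is a minimal illustration and not the algorithm of the cited paper: the assumption that a corrupted example receives exactly one random edit, the rule encoding as (pattern, label) pairs, and the function names are all hypothetical.

```python
import random


def apply_edit_noise(word, alphabet, rate, rng):
    """With probability `rate`, corrupt `word` by one random EDIT operation.

    Assumption: at most one insertion, deletion or change per example.
    A "change" may pick the same symbol again and leave the word intact.
    """
    if rng.random() >= rate:
        return word
    operations = ["insert"] + (["delete", "change"] if word else [])
    operation = rng.choice(operations)
    if operation == "insert":
        pos = rng.randrange(len(word) + 1)
        return word[:pos] + rng.choice(alphabet) + word[pos:]
    pos = rng.randrange(len(word))
    if operation == "delete":
        return word[:pos] + word[pos + 1:]
    return word[:pos] + rng.choice(alphabet) + word[pos + 1:]


def evaluate_decision_list(rules, default, word):
    """Return the label of the first rule whose pattern occurs in `word`."""
    for pattern, label in rules:
        if pattern in word:
            return label
    return default
```

A decision list is order-sensitive: with rules `[("ab", True), ("bb", False)]`, the word `"aabb"` is labelled `True` because the first matching rule wins.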

33
  • An ILP system uses background knowledge and
    a set of examples and counter-examples to
    learn the description of a concept in the form of
    a set of Horn clauses or a Prolog program.
  • A machine learning system that uses inductive
    logic programming techniques to learn how to
    identify transmembrane domains from amino acid
    sequences, which is very important in the protein
    classification problem.
  • It uses operations such as "contains" that act
    on entire sequences rather than individual
    elements.
  • Prediction accuracy of the implementation is around
    93%, which compares favourably with earlier results.

34
  • In real life, rules have exceptions.
  • Exceptions incorporated in ILP: GOLEM (an ILP
    learning algorithm) suitably extended.
  • Learning local exceptions in ILP: PAC
    learnable.
  • Application considered where exceptions are
    useful.

35
AS, RS
Variations and local exceptions in inductive
logic programming.
- Variations are valid departures from the normal.
- A classical music composer writing variations on a theme.
- A jazz player improvising on a melody.
- An Indian percussionist improvising on a rhythm.
Variations are different from noise.
- Variation: a valid departure from the normal.
- Noise: an incorrect or illegal deviation from the normal.
Applications in music, molecular biology,
speech recognition and distributed knowledge.
36
DNA Plasmids to Solve a Counting Problem
Rani Siromoney
Professor Emeritus, Madras Christian College
Adjunct Professor, Chennai Mathematical Institute
Chennai, Tamil Nadu, India
siromoney_at_cmi.ac.in
Bireswar Das
Junior Research Fellow, Institute of Mathematical Sciences
Chennai, Tamil Nadu, India
bireswar_at_imsc.res.in
37
Sources
1. T. Head, Circular Suggestions for DNA Computing, in Pattern Formation in Biology, Vision and Dynamics, Eds. A. Carbone, M. Gromov and P. Prusinkiewicz, World Scientific, Singapore, 2000, pp. 325-335.
2. J. Kari, A Cryptosystem Based on Propositional Logic, in Machines, Languages and Complexity, 5th International Meeting of Young Computer Scientists, Czechoslovakia, Nov. 14-18, 1988, Eds. J. Dassow and J. Kelemen, LNCS 381, Springer, 1989, pp. 210-219.
3. Rani Siromoney, Bireswar Das, DNA Algorithm for Breaking a Propositional Logic Based Cryptosystem, Bulletin of the EATCS, Number 79, February 2003, pp. 170-176.
38
Introducing the CUT-DELETE-EXPAND-LIGATE (C-D-E-L)
model. Combine features of Divide-Delete-Drop
(D-D-D) (Leiden) and
CUT-EXPAND-LIGATE (C-E-L) (Binghamton) to form
CUT-DELETE-EXPAND-LIGATE (C-D-E-L). This enables us
to get an aqueous solution to #3SAT, which is a
counting problem and known to be in IP.
#3SAT is defined as follows.
Instance: F, a propositional formula of the form
$F = C_1 \wedge C_2 \wedge \cdots \wedge C_m$,
where $C_i$, $i = 1, 2, \ldots, m$, are clauses. Each $C_i$ is
of the form $(l_{i1} \vee l_{i2} \vee l_{i3})$, where $l_{ij}$,
$j = 1, 2, 3$, are literals over the set of variables
$x_1, x_2, \ldots, x_n$.
Question: What is the number
of truth assignments that satisfy F?
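
The counting question can be made concrete with a brute-force Python sketch. It is only a reference for what is being counted, not the DNA algorithm: it enumerates all $2^n$ assignments. The literal encoding (integer k for $x_k$, integer -k for its negation) and the function name are assumptions.

```python
from itertools import product


def count_satisfying(clauses, n):
    """Count truth assignments of x_1..x_n satisfying every clause.

    A clause is a tuple of non-zero integers: k stands for x_k and
    -k stands for the negation of x_k.
    """
    count = 0
    for bits in product([False, True], repeat=n):
        if all(
            any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
            for clause in clauses
        ):
            count += 1
    return count
```

For the single clause $(x_1 \vee x_2 \vee x_3)$ only the all-false assignment fails, so `count_satisfying([(1, 2, 3)], 3)` returns 7.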
39
Data register molecule: a standard double-stranded
DNA cloning plasmid, commercially
available. This plasmid is a circular molecule of
approximately 3 kb. It contains a sub-segment, the
MCS (multiple cloning site), of approximately 175
base pairs that can be removed using a pair of
restriction enzyme sites that flank the
segment. The MCS contains pairwise disjoint
sites at which restriction enzymes act, such that
each produces a 5' overhang.
40
In C-D-E-L, the segment of plasmid used is of the
form $c_1 s_1 c_1 \, c_2 s_2 c_2 \cdots c_n s_n c_n$.
The $c_i$, $i = 1, \ldots, n$, are called sites, such that
no other subsequence of the plasmid matches with this
sequence. The $s_i$, $i = 1, \ldots, n$, are called stations.
In D-D-D, the lengths of stations are required to be
the same. Difference: in C-D-E-L, the lengths of
stations are all different, and the differences in
lengths are fundamental in solving #3SAT.
The bio-molecular operations used in
C-D-E-L are similar to the operations in C-E-L.
41
DESIGN
$x_1, \ldots, x_n$: the variables in F; $\neg x_1, \ldots, \neg x_n$: their negations.
$s_i$: station associated with $x_i$; $\neg s_i$: station associated with $\neg x_i$.
$c_i$: site associated with station $s_i$; $\neg c_i$: site associated with station $\neg s_i$.
$v_i$: length of the station associated with $x_i$, $i = 1, \ldots, n$.
$v_{n+j}$: length of the station associated with literal $\neg x_j$, $j = 1, \ldots, n$.
Choose stations in such a way that the sequence
$v_1, \ldots, v_{2n}$ satisfies the property
$\sum_{i=1}^{k} v_i < v_{k+1}$, $k = 1, \ldots, 2n-1$,
i.e., it is a super-increasing (easy) knapsack
sequence. From a sum, the sub-sequence is efficiently
recovered.
42
The solution in Cn is analyzed by gel separation. If
more than one solution is present, they will be
of different lengths, and thus will form separate
bands. By counting the number of bands, we count
the number of satisfying assignments. Furthermore,
from the length of a satisfying assignment, the exact
assignment is read. This can be done since
stations have lengths from an easy knapsack
sequence: any subsequence of an easy knapsack
sequence has a different sum from the sums of other
subsequences.
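
Reading a subsequence back from its sum can be sketched with the standard greedy procedure for super-increasing sequences. This is a minimal Python illustration of that arithmetic only; the numeric lengths in the usage note are made up, and the function names are hypothetical.

```python
def is_superincreasing(lengths):
    """Check that each term exceeds the sum of all earlier terms."""
    total = 0
    for value in lengths:
        if value <= total:
            return False
        total += value
    return True


def recover_indices(lengths, target):
    """Recover which terms of a super-increasing sequence sum to `target`.

    Scans from the largest term down: a term must be included exactly
    when it still fits, because all smaller terms together sum to less.
    """
    remaining = target
    chosen = []
    for index in range(len(lengths) - 1, -1, -1):
        if lengths[index] <= remaining:
            chosen.append(index)
            remaining -= lengths[index]
    if remaining != 0:
        raise ValueError("target is not a sum of terms of the sequence")
    return sorted(chosen)
```

With the illustrative lengths `[1, 2, 4, 8, 16, 32]`, a total of 41 decodes uniquely to indices `[0, 3, 5]`, which mirrors how a band length would identify the stations present.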
43
C-D-E-L
45
Thus the solution to #3SAT, viz. finding the
number of satisfying assignments, is effectively
obtained. Moreover, reading the truth assignments
is a great advantage in breaking the cryptosystem
based on propositional logic.
46
Advantage over the previous method of attack: in the
cryptanalytic attack proposed earlier, modifying
D-D-D, it was required to execute the DNA
algorithm for each bit in the crypto-text. But in
the present method, using C-D-E-L
(combining features of D-D-D and C-E-L),
we apply 3SAT on P and read any satisfying
assignment from the final solution. This gives an
equivalent public key, which amounts to breaking
the cryptosystem.
47
Parikh Again
Parikh mapping (vector): a classical and important
tool in the theory of formal languages.
- The image of a CFL is semi-linear.
- Basic idea: properties of words are expressed as
  numerical properties of words.
- But much information is lost in the transition.
48
Parikh Matrices
  • All matrices are triangular, with 1s on the
    main diagonal and 0s below it.
  • The classical Parikh vector occurs as the second
    diagonal; all other entries contain information
    about the order of letters in the original word.
  • Two words with the same Parikh matrix always have
    the same Parikh vector.
  • But two words with the same Parikh vector have, in
    many cases, different Parikh matrices.
  • Thus the Parikh matrix gives more information
    about a word than its Parikh vector.
  • The mapping is still not injective. Open problem: to
    characterize non-injectivity.
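
The properties listed above can be checked on small words with a Python sketch. It assumes the usual construction in which each letter $a_q$ of an ordered alphabet maps to the identity matrix with one extra 1 at position (q, q+1), and a word maps to the product of its letters' matrices; the function name and the in-place column update are choices made for this illustration.

```python
def parikh_matrix(word, alphabet):
    """Compute the Parikh matrix of `word` over an ordered `alphabet`.

    The result is a (k+1) x (k+1) upper triangular matrix of integers,
    where k is the size of the alphabet.
    """
    size = len(alphabet) + 1
    matrix = [[int(i == j) for j in range(size)] for i in range(size)]
    for letter in word:
        q = alphabet.index(letter)
        # Right-multiplying by the matrix of a_q adds column q to column q+1.
        for row in matrix:
            row[q + 1] += row[q]
    return matrix


def second_diagonal(matrix):
    """Read the classical Parikh vector off the second diagonal."""
    return [matrix[i][i + 1] for i in range(len(matrix) - 1)]
```

Over the alphabet `"ab"`, the words `"ab"` and `"ba"` share the Parikh vector `[1, 1]` but have different Parikh matrices, while `"abba"` and `"baab"` have the same Parikh matrix, which is a small instance of non-injectivity.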

49
References
Mateescu, A., Salomaa, A., Salomaa, K. and Yu, S., A Sharpening of the Parikh Mapping, Theoret. Informatics Appl. 35 (2001) 551-564.
Atanasiu, A., Martin-Vide, C. and Mateescu, A., On the Injectivity of the Parikh Matrix Mapping (submitted).
Mateescu, A., Salomaa, A. and Yu, S., An Inequality for Occurrences of Subwords, Turku Centre for Computer Science, TUCS Technical Report No. 481, December 2002.
50
THANK YOU