Evaluation of On-Chip Interconnect Architectures for Multi-Core DSP

Students: Haim Assor, Horesh Ben Shitrit
Project Number: p-2006-092
Supervisors: Dr. Shlomo Greenberg, Mr. Ori Goren,
Mr. Norman Goldstein
1. Introduction: The Need for Multi-Core
Chip manufacturers nowadays build multiple
processing units into one integrated chip. Using
several energy-efficient cores instead of one
powerful core helps reduce power consumption while
increasing performance. Each core doesn't
necessarily run as fast as the highest-performing
single-core module, but the multi-core architecture
improves overall performance by exploiting
parallelism.
Multi-core systems allow parallel processing on a
chip using many small processors, or simply allow
communication processors to process more data
streams, such as communication channels. Letting
several cores share the same resources causes new
communication problems, such as resource
unavailability and maintaining coherency.
Using the classical shared bus approach for
connectivity between components inside the chip
leads to a major bottleneck in today's multi-core
systems: only one transaction at a time can take
place on the bus. All other components that wish
to use the bus have to wait for it to be free.
[Figure: Multi-core system. Labels: performance
through parallelism, more data transfer, more data
processing, increased algorithmic complexity, high
energy consumption.]
Project Goal: In this project we address the
connectivity problem in a multi-core DSP. We
explore and model new interconnect architectures
intended to replace the classical shared bus.
Our goal is to analyze these architectures and
give a quantitative evaluation of multi-core
performance with each architecture.
2. Shared Bus
3. Fabric
4. Network on Chip (NoC)
  • In the Fabric architecture every master has a
    dedicated bus to each one of the slaves, and each
    bus has full bandwidth and supports all the
    features of the main bus.
  • The Fabric interconnect enables concurrent
    transactions between different components.
  • At the entry of each slave device there is an
    arbiter, which decides which transaction takes
    place at a given moment.
  • The Fabric architecture enables high performance
    but is expensive in terms of area.
  • The main features of the Fabric are:
      • Connects multi-master, multi-slave systems and
        is therefore more complex than the shared bus
        (design and verification efforts).
      • Non-blocking: many concurrent transactions.
      • The transfer approach contains address, data
        and control, similar to the shared bus.
      • Preemptive.
      • Like the Shared Bus, it can handle
        out-of-order transactions, but this feature
        increases design effort and should be taken
        into consideration.
      • Enables memory bank interleaving.
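The effect of per-slave arbitration can be illustrated with a minimal sketch. It assumes single-cycle transactions and that each master has at most one request pending; the function names and the request list are hypothetical and are not taken from the project's models.

```python
from collections import defaultdict


def fabric_cycles(requests):
    """Cycles needed when every slave has its own arbiter.

    requests: list of (master, slave) pairs, one single-cycle
    transaction each (a simplifying assumption of this sketch).
    """
    per_slave = defaultdict(int)
    for _master, slave in requests:
        per_slave[slave] += 1
    # Slaves are served in parallel, so the busiest slave
    # determines when the last transaction completes.
    return max(per_slave.values(), default=0)


def shared_bus_cycles(requests):
    """Cycles needed when all transactions share one bus."""
    return len(requests)


# Hypothetical request pattern: two masters target mem0.
requests = [
    ("core0", "mem0"),
    ("core1", "mem1"),
    ("core2", "mem0"),
    ("dma", "mem2"),
]
print(fabric_cycles(requests))      # 2
print(shared_bus_cycles(requests))  # 4
```

Only the two requests that target the same slave are serialized by that slave's arbiter; the others proceed concurrently.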
  • The Network on Chip uses a packet-switched
    transfer approach to move data between different
    chip components. The NoC is based on computer
    networking. Each packet contains the destination
    address, the data and other control fields needed
    for a correct transfer. Transactions that move
    through the network are out of order, which means
    that packets from different initiators can mix on
    the network; re-order buffers ensure the proper
    ordering at the target.
  • The NoC is constructed from identical routers,
    which form a homogeneous, scalable network that
    therefore has high growth capabilities.
  • The main features of the NoC are:
      • The NoC is based on routers, which are
        considered simple; their major advantage is
        the ability to build a network easily from
        the same components.
      • The disadvantage of the router is the need
        for a re-order buffer. This feature is
        necessary because network transactions are by
        nature out of order. Re-order buffers are
        complicated and require a large area, and
        therefore increase design effort.
      • The transfer approach is packet based (each
        packet contains the address, data and control
        information).
      • Non-preemptive.
      • Semi non-blocking: many transactions at a
        time, but a transaction can be stalled.
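The role of the re-order buffer can be sketched as follows. This is a minimal illustration, assuming one buffer per initiator at the target and a per-initiator sequence number carried in the packet's control information; the `seq` field and the class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    dest: str    # destination address
    seq: int     # sequence number (assumed control field)
    data: bytes  # payload


class ReorderBuffer:
    """Holds out-of-order packets until they can be delivered in order."""

    def __init__(self):
        self.next_seq = 0   # next sequence number the target expects
        self.pending = {}   # packets that arrived too early

    def receive(self, packet):
        """Store a packet and return those now deliverable, in order."""
        self.pending[packet.seq] = packet
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


rob = ReorderBuffer()
arrival_order = [2, 0, 1, 3]  # order in which the network delivers
delivered = []
for seq in arrival_order:
    delivered += [p.seq for p in rob.receive(Packet("mem0", seq, b""))]
print(delivered)  # [0, 1, 2, 3]
```

The `pending` storage is what costs area: it must be large enough to hold every packet that can arrive ahead of the one the target is waiting for.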
  • The Shared Bus is the classical way to connect
    components inside the chip. Components are
    divided into masters, which can initiate a
    transaction (for example cores, DMA, and
    peripherals such as Ethernet controllers), and
    slaves, which can only reply (such as memories).
  • The 3-bus architecture (address, data and
    control) is a shared transmission medium. A
    signal transmitted by one device is available for
    reception by all other devices attached to the
    bus. Only one device can successfully transmit at
    a time, and an arbitration mechanism is needed to
    control the transfers.
  • In a multi-core system all cores connect to the
    same shared bus, which puts an extra load on the
    bus.
  • The main features of the Shared Bus are:
      • Simplicity (small design and verification
        efforts).
      • Blocking: only one transaction at a time.
      • The transfer approach contains address, data
        and control.
      • Preemptive (can stall a low-priority
        transaction when a higher-priority one
        arrives).
      • Some buses can handle out-of-order
        transactions (out-of-order is a feature for
        handling data that is accepted in a different
        order from the one in which it was requested).
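The blocking and preemptive behavior of a shared bus can be sketched with a cycle-by-cycle model. This is only an illustration under stated assumptions: a larger priority number wins, every transaction lasts at least one cycle, and the transaction names are hypothetical.

```python
def simulate_shared_bus(transactions):
    """Simulate a single shared bus with preemptive priority arbitration.

    transactions: list of (name, arrival_cycle, priority, length_in_cycles);
    a larger priority value wins and lengths are assumed to be >= 1.
    Returns {name: cycle at which the transaction completes}.
    """
    remaining = {name: length for name, _a, _p, length in transactions}
    finish = {}
    cycle = 0
    while remaining:
        ready = [t for t in transactions
                 if t[0] in remaining and t[1] <= cycle]
        if ready:
            # Arbitration: exactly one transaction owns the bus this
            # cycle; a lower-priority one in progress is stalled.
            name = max(ready, key=lambda t: t[2])[0]
            remaining[name] -= 1
            if remaining[name] == 0:
                del remaining[name]
                finish[name] = cycle + 1
        cycle += 1
    return finish


result = simulate_shared_bus([
    ("dma_burst", 0, 1, 4),  # low priority, needs 4 bus cycles
    ("core_read", 1, 2, 2),  # high priority, arrives one cycle later
])
print(result)  # {'core_read': 3, 'dma_burst': 6}
```

The low-priority burst is stalled for two cycles while the higher-priority read owns the bus, and the total time equals the sum of both lengths because the bus never carries more than one transaction.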

5. Modeling
6. Asymptotic comparison
The Table shows a theoretical comparison of
multi-core interconnect solutions at the
asymptotic limit for n 100 (n represents number
of cores in the system). The cost and performance
advantages of the NoC over the other interconnect
solutions are clear, but this table is misleading:
the number of cores in today's most advanced chips
is currently 2 to 8, and the NoC suffers from a
large overhead that is not displayed in the table
(the coefficients of the cost and performance
functions are dropped). Therefore the NoC
advantages will come to fruition only in future
technology generations, when the number of cores
increases.
[Table columns: Architecture | Total area | Power Dissipation | Operating Frequency]
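Why dropped coefficients matter can be shown with a small numeric sketch. The cost functions and both coefficients below are hypothetical and are not values from the table: the fabric is modeled as one dedicated bus per master-slave pair (assuming the number of slaves grows with the number of cores), and the NoC as one router per core with a large fixed per-router overhead.

```python
def fabric_cost(n, per_link=1.0):
    """Hypothetical fabric cost: n masters x n slaves dedicated buses."""
    return per_link * n * n


def noc_cost(n, per_router=20.0):
    """Hypothetical NoC cost: one router (with re-order buffering) per core."""
    return per_router * n


def crossover(max_n=1000):
    """Smallest core count at which the NoC becomes cheaper than the fabric."""
    for n in range(1, max_n + 1):
        if noc_cost(n) < fabric_cost(n):
            return n
    return None


for n in (4, 8, 100):
    print(n, fabric_cost(n), noc_cost(n))
print(crossover())  # 21
```

With these assumed coefficients the NoC is the more expensive option for 4 and 8 cores, even though its cost grows more slowly, and it only wins beyond about 20 cores. The actual crossover point depends on real coefficients, which this sketch does not know.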



In order to examine the performance of different
interconnects, we modeled typical systems that
contain the same components with different
interconnect solutions. We used the PANAMA tool,
a SystemC-based software tool that enables
modeling of components and transactions. PANAMA
helps architects model the behavior of a complete
chip even before the RTL stage. The diagrams
represent the modeled systems. Each system
simulates a real multi-core chip that contains 4
cores, memories, DMA and peripherals. Traces of
several typical applications were executed on
these systems. Besides the classical shared bus,
we modeled a split shared bus, which is a common
way to overcome some of the shared bus
limitations. We also modeled a specific fabric
that is in use in Freescale's chips, and a split
fabric to make the comparison complete. Results
comparing those 4 interconnect architectures are
shown at the bottom; it is clear that the Fabric
outperforms the shared bus in this type of system.
[Figure labels: Shared Bus, Fabric, NoC]
7. Conclusions and future research
It is clear that in the long run, as technology
progresses, chips will become more complicated and
will contain many cores on the same die. In that
scenario we assume that chip manufacturers will
choose the NoC as their interconnect solution.
Meanwhile, the most complicated chips contain only
several computing units (cores), and the NoC
advantages are still not obvious. Furthermore, the
NoC has many limitations when used with a small
number of cores. Initial results show that for
multi-core chips containing several cores, the
best interconnect solution is the fabric. Our
purpose is to produce quantitative conclusions
that will help in choosing the best interconnect
solution according to the number of cores in the
system.