Title: Resource Mapping and Scheduling for Heterogeneous Network Processor Systems
1. Resource Mapping and Scheduling for Heterogeneous Network Processor Systems
- Liang Yang, Tushar Gohad, Pavel Ghosh, Devesh Sinha, Arunabha Sen and Andrea Richa
2. Agenda
- Network Processor (NP) System
- Resource Mapping and Scheduling Problem
- Heuristic Approach
- Linear Programming and Randomized Rounding
- Resource Contention Issue
- Detection and Elimination
- Experimental Results
- Summary and Future Work
3. Network Processor Systems
- Programmable devices designed to process packets at wire speed
- Non-homogeneous real-time systems
- Comprise a mix of ASICs, programmable processors and on-chip interconnects
- Optimized to support multiple applications such as IPv4, Diffserv, etc.
4. Resource Mapping and Scheduling Problem in NP
- Given a set APP = {APP1, APP2, ..., APPk} of applications, each specified by a DAG, where each application APPj has a set of constraints (e.g. timing constraints, area constraints), find the mapping that minimizes the system cost in dollars while satisfying all the design constraints
- Assumes that only one application is active at any given time
5. System Specification
- Possible Task-to-Resource Mappings
- Several algorithms may be available for the execution of a task
- Cost and area parameters are associated with each resource
- There may be multiple instances of a resource
6. Integer Linear Programming (ILP) Formulation
- Objective
- Find a task-to-resource mapping with minimum cost
- Constraints
- Board area constraint
- Timing constraint
- Unique task constraint
- Exclusive resource constraint
- Communication delay constraint
- Task-to-Resource mapping constraint
- Task dependency constraint
- Example design problem with three flows
- 800 variables
- 2000 constraints
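To make the objective and a few of the constraints concrete, the following is a minimal sketch that replaces the ILP solver with exhaustive enumeration on a toy instance. The task names, resource names, execution times, costs and areas are all hypothetical, and only the unique task, timing and board area constraints are modeled; communication delay, task dependencies and exclusive resource use are left out.

```python
from itertools import product

# Hypothetical toy data (not from the slides): three tasks executed in a chain,
# and for each task the resources that can run it, with execution time in ns.
EXEC_TIME = {
    "parse":    {"uengine": 30, "asic": 10},
    "lookup":   {"uengine": 40, "asic": 15, "xscale": 60},
    "schedule": {"uengine": 20, "xscale": 35},
}
# Hypothetical per-resource (dollar cost, board area), paid once if the resource is used.
RESOURCE = {"uengine": (20, 4), "asic": (50, 6), "xscale": (10, 3)}


def best_mapping(deadline_ns, max_area):
    """Enumerate every task-to-resource mapping and keep the cheapest feasible one.

    Returns (cost, mapping), or None when no mapping meets both constraints.
    """
    tasks = list(EXEC_TIME)
    best = None
    # Picking one resource per task enforces the unique task constraint.
    for choice in product(*(EXEC_TIME[t] for t in tasks)):
        latency = sum(EXEC_TIME[t][r] for t, r in zip(tasks, choice))
        used = set(choice)
        cost = sum(RESOURCE[r][0] for r in used)
        area = sum(RESOURCE[r][1] for r in used)
        # Timing constraint and board area constraint.
        if latency <= deadline_ns and area <= max_area:
            if best is None or cost < best[0]:
                best = (cost, dict(zip(tasks, choice)))
    return best
```

Enumeration only works for a handful of tasks; the example problem above, with 800 variables, is the kind of instance that needs an ILP solver or the randomized rounding heuristic.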
7. Heuristic Approach: Randomized Rounding
- Based on the Linear Programming solution
- Traditional metaheuristics (e.g. Genetic Algorithms, Simulated Annealing) require a set of feasible solutions as a starting point
- Hard to obtain an initial feasible set because of the conflicting constraints (area, time) in the problem
8. Randomized Rounding
- Relax the integrality constraints of the ILP and solve the LP
- Fractional values of the binary variables are used as probabilities for rounding them to either 0 or 1
- Variable Randomized Rounding
- Randomly select variables from a set of randomly chosen constraints
- Round the selected variables
- Iterative rounding in case of constraint violation
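The core idea of using fractional LP values as rounding probabilities can be sketched as below. This is a simplified illustration, not the variable-by-variable scheme of the slides: it rounds the whole assignment at once and simply retries when a constraint is violated. The input layout and the `is_feasible` callback are hypothetical.

```python
import random


def randomized_round(fractional, is_feasible, max_tries=1000, seed=0):
    """Round a fractional LP solution to a 0/1 task-to-resource assignment.

    fractional:  {task: {resource: x}}, where x is the LP value of the binary
                 variable "task runs on resource"; assumed to sum to 1 per task.
    is_feasible: hypothetical callback that checks the remaining constraints
                 (area, timing, ...) on a candidate mapping.
    Returns a {task: resource} mapping, or None if every try was infeasible.
    """
    rng = random.Random(seed)
    for _ in range(max_tries):
        mapping = {}
        for task, weights in fractional.items():
            resources = list(weights)
            # The fractional values act as selection probabilities.
            probs = [weights[r] for r in resources]
            mapping[task] = rng.choices(resources, weights=probs)[0]
        if is_feasible(mapping):
            return mapping
        # Constraint violated: round again (iterative rounding).
    return None
```

Choosing exactly one resource per task keeps the unique task constraint satisfied by construction, so only the other constraints need to be checked after rounding.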
9. Randomized Rounding (cont.)
- Fixing Variables
- Reduces the number of variables to be rounded
- Fix variables that take integer values after solving the LP
- Iteratively solve the LP until the number of integer-valued variables stops increasing
- Grouping Variables
- Assign priority based on the variable group affiliation
10. Randomized Rounding (cont.)
- Rollback Point Selection
- Roll back only to the last group where a constraint violation occurred
- Rounding Step Size
- Round one variable or several each time?
11. Randomized Rounding Results
- Near-optimal solution in a fraction of the ILP solution time
12. Exploration of Solution Space
- If the deadline constraint is too strict, the ILP may not have any feasible solution for the existing set of resources.
- On the other hand, with too relaxed a deadline, a feasible solution will be obtained, but with an increased chance of resource contention.
- The solution space is explored using binary search in order to find a least-cost feasible solution without any resource contention.
13. Improvement of Solution
- A relaxed deadline for packet processing helps to reduce the system cost in dollars.
- Packet latency increases, while the line speed is still satisfied.
- This approach allows multiple packets to be inside the system simultaneously (packet-level parallelism).
- There may be resource contention if more than one packet tries to access the same resource at the same instant for two different tasks.
14. Resource Contention
- Example
- Line rate 10 Gbps, packet size 64 bytes
- No packet gap
- A packet arrives approximately every 51 ns (64 × 8 bits / 10 Gbps = 51.2 ns)
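The arrival interval in this example follows directly from the line rate and packet size. A minimal sketch of the arithmetic, where the function name and the optional `gap_bytes` parameter are illustrative additions:

```python
def packet_interarrival_ns(line_rate_bps, packet_bytes, gap_bytes=0):
    """Time between back-to-back packet arrivals, in nanoseconds.

    gap_bytes is a hypothetical extra parameter for an inter-packet gap;
    the slide's example assumes no gap, which is the default here.
    """
    bits_per_packet = (packet_bytes + gap_bytes) * 8
    return bits_per_packet / line_rate_bps * 1e9


# Slide example: 10 Gbps line rate, 64-byte packets, no packet gap.
interval = packet_interarrival_ns(10e9, 64)
```

Any resource that stays busy with one packet for longer than this interval cannot keep up with a stream of minimum-size packets at line rate.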
15. Resource Contention Detection
- Packet Flow Graph (PFG)
- A visual depiction of the flow of packets through the various resources inside the NP system
- G = (V, E), where V is the set of resources allocated by the ILP, with additional entry and exit nodes, s and t, respectively.
- Edge e = (u, v) ∈ E if resources u and v are allocated sequentially.
- A weight w(e) = (x(e), y(e)) is associated with each edge e, where x(e) is the allocation sequence of the resource and y(e) is the execution time on that sequence.
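One possible in-memory form of a PFG for a single path is sketched below. The input layout and resource names are hypothetical. The slide does not say which endpoint's execution time y(e) refers to, so the sketch assumes that an edge carries the execution time of the resource it enters, and that the edge into t carries 0.

```python
def build_pfg(allocation):
    """Build a Packet Flow Graph for one packet on a single path.

    allocation: ordered list of (resource, exec_time) pairs, in the order the
                packet visits the resources (hypothetical input layout).
    Returns (vertices, edges), where each edge is (u, v, (x, y)) with
    x = allocation sequence number and y = execution time.
    """
    path = ["s"] + [res for res, _ in allocation] + ["t"]
    # Assumption: y(e) is the execution time on the resource the edge enters;
    # entering the exit node t costs nothing.
    exec_time = [t for _, t in allocation] + [0]
    vertices = set(path)
    edges = []
    for seq, (u, v) in enumerate(zip(path, path[1:]), start=1):
        edges.append((u, v, (seq, exec_time[seq - 1])))
    return vertices, edges


# Hypothetical allocation: micro-engine ME0, then an ASIC, then ME0 again.
vertices, edges = build_pfg([("ME0", 20), ("ASIC", 10), ("ME0", 15)])
```

Because a resource may appear more than once on the path, V is a set while the sequence number x(e) keeps the visits in order.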
16. Resource Contention Detection
- Resource Cycle Time
- Calculated in the PFG
- Defined as the maximum time span for which a resource is busy executing the set of tasks for a packet
- A resource is not available until it finishes all the tasks scheduled on it for a packet
- Maximum Cycle Time
- Defined as the maximum of all resource cycle times
- Resource contention is detected if the maximum cycle time is greater than the packet inter-arrival time (the inverse of the packet arrival rate)
- A Gantt chart is used to detect resource contention among multiple paths in a task graph
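The single-path check described above can be sketched as follows. The schedule layout is a hypothetical stand-in for the information carried by the PFG, and the cycle time is taken as the span from a resource's first start to its last finish for one packet, following the statement that a resource is unavailable until it has finished all of that packet's tasks. The Gantt-chart analysis for multiple paths is not covered.

```python
def resource_cycle_times(schedule):
    """Cycle time per resource for one packet.

    schedule: list of (resource, start_ns, end_ns) tuples, one per task
              (hypothetical input layout).
    The cycle time is the span from the first start to the last finish
    on that resource.
    """
    span = {}
    for res, start, end in schedule:
        first, last = span.get(res, (start, end))
        span[res] = (min(first, start), max(last, end))
    return {res: last - first for res, (first, last) in span.items()}


def has_contention(schedule, interarrival_ns):
    """True if the maximum cycle time exceeds the packet inter-arrival time."""
    cycle_times = resource_cycle_times(schedule)
    return max(cycle_times.values(), default=0) > interarrival_ns


# Hypothetical schedule: ME0 is used twice, so it is held from 0 ns to 45 ns.
schedule = [("ME0", 0, 20), ("ASIC", 20, 30), ("ME0", 30, 45)]
```

With the 51.2 ns interval of the 10 Gbps example this schedule is contention free; if packets arrived every 40 ns, ME0 would still be held by the previous packet when the next one needed it.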
17. Resource Contention (Single Path)
18. Resource Contention (Multiple Paths)
19. Resource Contention Elimination
- Binary search approach to speed up the iterative exploration of the solution space
- The solution found by the ILP is checked for resource contention
- If there is no resource contention, no more work is needed
- Otherwise, search iteratively for the least-cost feasible solution
20. Resource Contention Elimination
- d is the arrival rate of the packets and l is the maximum diameter of the flow graphs
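A binary search over the deadline could look like the sketch below. It is an illustration under stated assumptions, not the authors' algorithm: deadlines are integers, `solve` and `has_contention` are hypothetical callbacks standing in for the ILP solver and the contention check, too strict a deadline is infeasible, too relaxed a deadline causes contention, and a more relaxed deadline never raises the cost. The bounds `lo` and `hi` are left as parameters rather than derived from d and l.

```python
def search_deadline(solve, has_contention, lo, hi):
    """Binary search for a contention-free, low-cost deadline in [lo, hi].

    solve(deadline)         -> (cost, mapping), or None if infeasible
                               (hypothetical stand-in for the ILP solver).
    has_contention(mapping) -> bool (hypothetical contention check).
    Returns (deadline, cost, mapping), or None if nothing suitable was found.
    """
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        result = solve(mid)
        if result is None:
            lo = mid + 1              # deadline too strict: relax it
        elif has_contention(result[1]):
            hi = mid - 1              # deadline too relaxed: tighten it
        else:
            cost, mapping = result
            if best is None or cost <= best[1]:
                best = (mid, cost, mapping)
            lo = mid + 1              # try a more relaxed, cheaper deadline
    return best


# Toy stand-ins: infeasible below 45, contention above 80, cost falls with deadline.
def toy_solve(deadline):
    return None if deadline < 45 else (200 - deadline, {"deadline": deadline})


def toy_contention(mapping):
    return mapping["deadline"] > 80


found = search_deadline(toy_solve, toy_contention, 0, 200)
```

On the toy functions the search settles on the most relaxed deadline that is still contention free, which under the monotonicity assumption is also the cheapest.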
21. Experimental Settings
- Codesign method applied to a Packet Processing System (PPS) similar to the Intel IXP2400 network processor
- Resource set derived from the Intel IXP2400 architecture
- Application set derived from the standard benchmarking applications defined by the Network Processing Forum, for which a mapping is available from Intel
- Compared the performance of the mapping generated by our approach with the standard mapping specified by Intel as part of the IXA Application Framework
22. Performance Metrics
- End-to-end Packet Latency
- Defined as the time interval starting when the first bit of a packet enters the input port and ending when the first bit of the packet reaches the output port
- Throughput
- The number of data bits transferred in unit time, measured at 0 packet loss while varying the packet size
- Resource Utilization
- The ratio of the time a resource was active to the total measurement time
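The three definitions translate into simple ratios and differences. The sketch below uses hypothetical function names and input layouts, and assumes that the active intervals of a resource do not overlap.

```python
def end_to_end_latency(first_bit_in, first_bit_out):
    """Time from the first bit entering the input port to the first bit
    reaching the output port (same time unit as the inputs)."""
    return first_bit_out - first_bit_in


def throughput_bps(bits_transferred, seconds):
    """Data bits transferred per unit time."""
    return bits_transferred / seconds


def utilization(active_intervals, measurement_time):
    """Ratio of the time a resource was active to the total measurement time.

    active_intervals: list of (start, end) pairs, assumed non-overlapping.
    """
    active = sum(end - start for start, end in active_intervals)
    return active / measurement_time
```

For instance, a resource active during (0, 20) and (30, 45) within a 100-unit measurement window has a utilization of 35/100.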
23. Input Task Graphs
24. Experimental Parameters
25. Experimental Results
26. Experimental Results
27. Experimental Results
28. Conclusion and Future Work
- Codesign framework for PPSs that considers multiple flows and real-time constraints
- The iterative improvement scheme introduces packet-level parallelism into the system
- For the task graphs of the benchmark applications, the method produces a solution in a short time and shows performance metrics comparable to those of existing PPSs
- The framework can be extended with
- An object-oriented or modeling language for specification
- Effects of caching and multithreading
- Dynamic analysis for workload characterization