Parallel Computation of the 2D Laminar Axisymmetric Coflow Nonpremixed Flames
Transcript and Presenter's Notes

1
Parallel Computation of the 2D Laminar
Axisymmetric Coflow Nonpremixed Flames
ECE 1747 Parallel Programming
Course Project, Dec. 2006
  • Qingan Andy Zhang
  • PhD Candidate
  • Department of Mechanical and Industrial
    Engineering
  • University of Toronto

2
Outline
  • Introduction
  • Motivation
  • Objective
  • Methodology
  • Results
  • Conclusion
  • Future Improvement
  • Work in Progress

3
Introduction
  • Multi-dimensional laminar flames
  • Easy to model
  • Computationally tractable with detailed sub-models such as
    chemistry, transport, etc.
  • Lots of experimental data
  • Resemble turbulent flames in some cases (e.g. the
    flamelet regime)

4
Motivation
The run time is expected to be long if we have:
  • A complex chemical mechanism
  • Appel et al. (2000) mechanism (101 species, 543
    reactions)
  • A complex geometry
  • Large 2D coflow laminar flame (1,000 × 500 = 500,000 points)
  • 3D laminar flame (1,000 × 500 × 100 = 50,000,000 points)
  • A complex physical problem
  • Soot formation
  • Multi-phase problems

5
Objective
To develop a parallel flame code based on the existing
sequential flame code, with attention to:
  • Speedup
  • Feasibility
  • Accuracy
  • Flexibility

6
Methodology -- Options
  • Shared Memory
  • OpenMP
  • Pthreads
  • Distributed Memory
  • MPI
  • Distributed Shared Memory
  • Munin
  • TreadMarks

MPI was chosen because it is widely used for
scientific computation, relatively easy to program, and
the cluster is a distributed-memory system.
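
As a minimal illustration (not from the original slides), an MPI Fortran
program is organized around initialize/rank/size/finalize calls; everything
here besides the MPI routines themselves is generic:

    ! Minimal MPI skeleton in Fortran; compile with mpif90.
    program mpi_skeleton
      use mpi
      implicit none
      integer :: ierr, rank, nprocs

      call MPI_INIT(ierr)                              ! start MPI
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)   ! my process id
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr) ! total processes

      ! ... each process would work on its own subdomain here ...
      write(*,*) 'process', rank, 'of', nprocs

      call MPI_FINALIZE(ierr)                          ! shut down MPI
    end program mpi_skeleton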
7
Methodology -- Preparation
  • Linux OS
  • Programming tool (Fortran, Make, IDE)
  • Parallel computation concepts
  • MPI commands
  • Network (SSH, queuing system)

8
Methodology -- Sequential code
  • Sequential Code Analysis
  • Algorithm
  • Dependency
  • Data
  • I/O
  • CPU time breakdown

Sequential code is the backbone for
parallelization!
9
Methodology
Flow configuration and computational domain
10
Methodology
CFD: Finite Volume Method, iterative solution on a
staggered grid.

Quantities solved (primitive variables): U, V, P, Yi (i = 1, KK), T
  Yi --- mass fraction of the i-th gas species
  KK --- total number of gas species

If KK = 100, we have to solve (3 + 100 + 1) = 104 equations
at each grid point. If the mesh is 1000 × 500, we have to
solve 104 × 1000 × 500 = 52,000,000 equations in each
iteration. If 3000 iterations are required to reach a
converged solution, we have to solve in total
52,000,000 × 3000 = 156,000,000,000 equations.
Flow configuration and computational domain
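
Restating the slide's count compactly in LaTeX (U, V, P give the 3,
T gives the 1):

    N_{eq}    = (3 + KK + 1)\, N_z N_r = 104 \times 1000 \times 500 = 5.2 \times 10^{7}
    N_{total} = N_{eq} \times N_{iter} = 5.2 \times 10^{7} \times 3000 = 1.56 \times 10^{11}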
11
General Transport Equation

Unsteady Term + Convection Term = Diffusion Term + Source Term
  • Unsteady: the time-variant term
  • Convection: transport caused by flow motion
  • Diffusion: for species, molecular diffusion and thermal diffusion
  • Source term: for species, chemical reaction
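
The equation itself appeared as an image in the original slides; a standard
general transport equation consistent with the four terms named above is
(a sketch, with phi the transported variable, Gamma its diffusion
coefficient, and S_phi its source):

    \frac{\partial(\rho\phi)}{\partial t}
      + \nabla\cdot(\rho\,\mathbf{v}\,\phi)
      = \nabla\cdot(\Gamma_{\phi}\,\nabla\phi) + S_{\phi}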
12
Mass and Momentum equation
Mass
Axial momentum
Radial momentum
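
The equations were images in the original; standard steady axisymmetric
forms from the laminar coflow flame literature (u axial velocity, v radial
velocity; some viscous cross-terms omitted for brevity, and the exact forms
used in the code may differ) are:

    Mass:
      \frac{\partial(\rho u)}{\partial z}
        + \frac{1}{r}\frac{\partial(r\rho v)}{\partial r} = 0

    Axial momentum:
      \rho u\frac{\partial u}{\partial z} + \rho v\frac{\partial u}{\partial r}
        = -\frac{\partial p}{\partial z}
          + \frac{\partial}{\partial z}\Big(\mu\frac{\partial u}{\partial z}\Big)
          + \frac{1}{r}\frac{\partial}{\partial r}\Big(r\mu\frac{\partial u}{\partial r}\Big)
          + \rho g_z + \text{(viscous cross-terms)}

    Radial momentum:
      \rho u\frac{\partial v}{\partial z} + \rho v\frac{\partial v}{\partial r}
        = -\frac{\partial p}{\partial r}
          + \frac{\partial}{\partial z}\Big(\mu\frac{\partial v}{\partial z}\Big)
          + \frac{1}{r}\frac{\partial}{\partial r}\Big(r\mu\frac{\partial v}{\partial r}\Big)
          - \mu\frac{v}{r^{2}} + \text{(viscous cross-terms)}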
13
Species and Energy equation
Species
Energy
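
Again the equations were images; standard steady forms (diffusion velocity
V_k, molar production rate \dot{\omega}_k, molecular weight W_k,
conductivity lambda; radiation and some correction terms omitted, and the
exact forms used in the code may differ):

    Species k:
      \rho u\frac{\partial Y_k}{\partial z} + \rho v\frac{\partial Y_k}{\partial r}
        = -\frac{\partial(\rho Y_k V_{k,z})}{\partial z}
          - \frac{1}{r}\frac{\partial(r\rho Y_k V_{k,r})}{\partial r}
          + \dot{\omega}_k W_k

    Energy:
      \rho c_p\Big(u\frac{\partial T}{\partial z} + v\frac{\partial T}{\partial r}\Big)
        = \frac{\partial}{\partial z}\Big(\lambda\frac{\partial T}{\partial z}\Big)
          + \frac{1}{r}\frac{\partial}{\partial r}\Big(r\lambda\frac{\partial T}{\partial r}\Big)
          - \sum_k h_k\,\dot{\omega}_k W_k
          - \sum_k \rho Y_k c_{p,k}\,\mathbf{V}_k\cdot\nabla T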
14
Methodology -- Sequential code
  • Start iterating from scratch or continue a previous job
  • Within one iteration (a solver sketch follows this list):
  • Iteration starts
  • Discretization → get AP(I,J) and CON(I,J)
  • Solve → TDMA or PbyP Gauss elimination
  • Get new values → update the F(I,J,NF) array
  • Do the other equations
  • Iteration ends
  • End iterating when convergence is reached
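
The slide names TDMA (the tridiagonal matrix algorithm, i.e. the Thomas
algorithm) as the line solver. A self-contained sketch of a standard
implementation, with a small test driver (array and routine names here are
illustrative, not taken from the original code):

    ! Solve a(i)*x(i-1) + b(i)*x(i) + c(i)*x(i+1) = d(i), i = 1..n,
    ! with a(1) = 0 and c(n) = 0 (standard Thomas algorithm).
    program tdma_demo
      implicit none
      real(8) :: a(4), b(4), c(4), d(4), x(4)
      a = (/  0.d0, -1.d0, -1.d0, -1.d0 /)
      b = (/  4.d0,  4.d0,  4.d0,  4.d0 /)
      c = (/ -1.d0, -1.d0, -1.d0,  0.d0 /)
      d = (/  5.d0,  5.d0,  5.d0,  5.d0 /)
      call tdma(4, a, b, c, d, x)
      print *, x
    end program tdma_demo

    subroutine tdma(n, a, b, c, d, x)
      implicit none
      integer, intent(in)  :: n
      real(8), intent(in)  :: a(n), b(n), c(n), d(n)
      real(8), intent(out) :: x(n)
      real(8) :: p(n), q(n), denom
      integer :: i
      p(1) = -c(1) / b(1)
      q(1) =  d(1) / b(1)
      do i = 2, n                          ! forward elimination
        denom = b(i) + a(i) * p(i-1)
        p(i)  = -c(i) / denom
        q(i)  = (d(i) - a(i) * q(i-1)) / denom
      end do
      x(n) = q(n)
      do i = n-1, 1, -1                    ! back substitution
        x(i) = p(i) * x(i+1) + q(i)
      end do
    end subroutine tdma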

15
Methodology -- Sequential code
Most time-consuming part: evaluation of the species Jacobian
matrix DSDY(K1,K2,I,J). What are its dependencies?
Fig. 1: CPU time for each sub-code, summarized after one
iteration with radiation included
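
One common way such a source-term Jacobian is evaluated (not necessarily
the method used in this code) is by one-sided finite differences of the
chemical source terms; a self-contained sketch, with a toy two-species
source term standing in for the real chemistry:

    ! dsdy(k1,k2) approximates dS(k1)/dY(k2) by one-sided finite differences.
    program jac_demo
      implicit none
      real(8) :: y(2), dsdy(2,2)
      y = (/ 0.7d0, 0.3d0 /)
      call eval_jacobian(2, y, dsdy)
      print *, dsdy
    end program jac_demo

    subroutine eval_jacobian(kk, y, dsdy)
      implicit none
      integer, intent(in)  :: kk
      real(8), intent(in)  :: y(kk)
      real(8), intent(out) :: dsdy(kk,kk)
      real(8) :: s0(kk), s1(kk), yp(kk), h
      integer :: k1, k2
      call wdot(kk, y, s0)                  ! baseline source terms
      do k2 = 1, kk                         ! perturb each species in turn
        yp = y
        h  = 1.d-8 * max(abs(y(k2)), 1.d-10)
        yp(k2) = yp(k2) + h
        call wdot(kk, yp, s1)
        do k1 = 1, kk
          dsdy(k1,k2) = (s1(k1) - s0(k1)) / h
        end do
      end do
    end subroutine eval_jacobian

    subroutine wdot(n, y, s)
      ! Toy source term standing in for the real chemistry
      implicit none
      integer, intent(in)  :: n
      real(8), intent(in)  :: y(n)
      real(8), intent(out) :: s(n)
      s(1) = -y(1) * y(2)
      s(2) =  y(1) * y(2)
    end subroutine wdot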
16
Methodology -- Parallelization
Domain Decomposition Method (DDM) with Message
Passing Interface (MPI) programming
Ghost points are placed at the subdomain boundaries to reduce
communication among processes (a halo-exchange sketch follows)!
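
A minimal sketch of a ghost-point (halo) exchange, assuming a 1D strip
decomposition along the axial direction; the decomposition layout and all
names here are illustrative, not taken from the original code:

    ! Each rank owns columns 1..nloc of field f, plus ghost columns 0 and nloc+1.
    program halo_demo
      use mpi
      implicit none
      integer, parameter :: nr = 8                 ! radial points per column
      integer :: ierr, rank, nprocs, left, right, nloc
      integer :: stat(MPI_STATUS_SIZE)
      real(8), allocatable :: f(:,:)

      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

      nloc = 10
      allocate(f(nr, 0:nloc+1))
      f = real(rank, 8)                            ! mark interior with the rank id

      left  = rank - 1                             ! neighbours; MPI_PROC_NULL at
      right = rank + 1                             ! the physical domain boundaries
      if (left  < 0)       left  = MPI_PROC_NULL
      if (right >= nprocs) right = MPI_PROC_NULL

      ! Send last owned column right, receive left neighbour's into ghost column 0
      call MPI_SENDRECV(f(:,nloc), nr, MPI_DOUBLE_PRECISION, right, 0, &
                        f(:,0),    nr, MPI_DOUBLE_PRECISION, left,  0, &
                        MPI_COMM_WORLD, stat, ierr)
      ! Send first owned column left, receive right neighbour's into ghost nloc+1
      call MPI_SENDRECV(f(:,1),      nr, MPI_DOUBLE_PRECISION, left,  1, &
                        f(:,nloc+1), nr, MPI_DOUBLE_PRECISION, right, 1, &
                        MPI_COMM_WORLD, stat, ierr)

      call MPI_FINALIZE(ierr)
    end program halo_demo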
17
Cluster Information
  • Cluster location: icpet.nrc.ca in Ottawa
  • 40 nodes connected by Ethernet
  • AMD Opteron 250 (2.4 GHz) with 5 GB memory
  • Red Hat Enterprise Linux 4.0
  • Batch-queuing system: Sun Grid Engine (SGE)
  • Portland Group compilers (v6.2) + MPICH2

(Diagram: the 40 cluster nodes, n1-1 through n8-5, arranged in an 8 × 5 grid)
18
Results -- Speedup
Table 1: CPU time and speedup for 50 iterations with the
Appel et al. (2000) mechanism
Processes      Sequential   4 processes   6 processes   12 processes
CPU time (s)   51313        15254         10596         5253
Speedup        1            3.36          4.84          9.77
  1. Speedup is good.
  2. CPU time for 50 iterations of the original sequential
    code is 51,313 seconds, i.e. 14.26 hours. Too long!
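
For reference, the parallel efficiency implied by Table 1 (speedup
S_p = T_1 / T_p over p processes, efficiency E_p = S_p / p):

    E_4 = 3.36 / 4 \approx 0.84,\quad
    E_6 = 4.84 / 6 \approx 0.81,\quad
    E_{12} = 9.77 / 12 \approx 0.81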

19
Results -- Speedup
Fig. 3: Speedup obtained with different numbers of processes
20
Results -- Real application
Flame field calculation using the parallel code
(Appel et al. 2000 mechanism)
The trend is well predicted!
21
Conclusion
  • The sequential flame code is parallelized with
    DDM
  • Speedup is good
  • The parallel code is applied to model a flame
    using a detailed mechanism
  • Flexibility is good, i.e. the geometry and/or the number
    of processors can be easily changed

22
Future Improvement
  • Optimized DDM
  • Species line solver

23
Work in Progress
  • Fixed sectional soot model
  • Adds 70 equations to the original system of
    equations

24
Experience
  • Keep communication to a minimum
  • Choose the parallelization method wisely
  • Debugging is hard
  • I/O

25
Thanks
  • Questions?