Groundwater. Notes on geostatistics

Monica Riva, Alberto Guadagnini, Politecnico di Milano, Italy

Key reference: de Marsily, G. (1986), Quantitative Hydrogeology. Academic Press, New York, 440 pp.

Modelling flow and transport in heterogeneous media: motivation and general idea

Understanding the role of heterogeneity

Jan 2000 editorial "It's the Heterogeneity!" (Wood, W.W., It's the Heterogeneity!, Editorial, Ground Water, 38(1), 1, 2000): heterogeneity of chemical, biological, and flow conditions should be a major concern in any remediation scenario. Many in the groundwater community either failed to "get" the message or were forced by political considerations to provide rapid, untested, site-specific active remediation technology. "It's the heterogeneity," and it is the Editor's guess that the natural system is so complex that it will be many years before one can effectively deal with heterogeneity on societally important scales.

Panel of experts (DOE/RL-97-49, April 1997): As flow and transport are poorly understood, previous and ongoing computer modelling efforts are inadequate and based on unrealistic and sometimes optimistic assumptions, which render their output unreliable.

Flow and Transport in Multiscale Fields

(conceptual)

- Field- and laboratory-derived conductivities and dispersivities appear to vary continuously with the scale of observation (conductivity support, plume travel distance). Anomalous transport.
- Recent theories attempt to link such scale-dependence to the multiscale structure of $Y = \ln K$.
- Predict the observed effect of domain size on the apparent variance and integral scale of Y.
- Predict the observed supra-linear growth rate of dispersivity with mean travel distance (time).
- Major challenge: develop more powerful/general stochastic theories/models for multiscale random media, and back them with lab/field observations.

Neuman, S.P., On advective transport in fractal permeability and velocity fields, Water Resour. Res., 31(6), 1455-1460, 1995.

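Shed some light. Conceptual difficulty.

Data deduced by means of deterministic Fickian models from laboratory and field tracer tests in a variety of porous and fractured media, under varied flow and transport regimes.

Linear regression: $\alpha_{La} \approx 0.017\, s^{1.5}$, with $\alpha_{La}$ the apparent longitudinal dispersivity and $s$ the scale of observation (travel distance). Supra-linear growth.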
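To make the supra-linear growth concrete, the regression can be evaluated at a few scales. This is only an illustrative sketch: the function name `apparent_dispersivity` is hypothetical, and both quantities are assumed to be expressed in the same length unit (metres here). Because the exponent exceeds 1, the ratio of dispersivity to scale keeps increasing with scale.

```python
def apparent_dispersivity(s, coeff=0.017, exponent=1.5):
    """Power-law regression: dispersivity = coeff * s**exponent.

    s is the scale of observation (travel distance); units are assumed
    to be metres for both s and the returned dispersivity.
    """
    return coeff * s ** exponent


# Evaluate the regression over three orders of magnitude of scale.
for s in (1.0, 10.0, 100.0):
    a = apparent_dispersivity(s)
    print(f"s = {s:6.1f} m   dispersivity = {a:8.3f} m   ratio = {a / s:.3f}")
```

A linear law would give a constant ratio; here the ratio grows as the square root of the scale.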

Natural Variability. Geostatistics revisited

- Introduction: a few field findings about spatial variability
- Regionalized variables
- Interpolation methods
- Simulation methods

AVRA VALLEY (Clifton and Neuman, 1982). Regional Scale. Clifton, P.M., and S.P. Neuman, Effects of Kriging and Inverse Modeling on Conditional Simulation of the Avra Valley Aquifer in southern Arizona, Water Resour. Res., 18(4), 1215-1234, 1982.

Columbus Air Force Base (Adams and Gelhar, 1992). Aquifer Scale.

Mt. Simon aquifer (Bakr, 1976). Local Scale.

- Summary: variability is present at all scales.
- But what happens if we ignore it? We will see in this class that this would lead to interpretation problems in both groundwater flow and solute transport phenomena.
- Examples in transport:
- - Scale effects in dispersion
- - New processes arising
- Heterogeneous parameters: ALL (T, K, porosity, S, v (q), BC, ...)
- Most relevant ones: T (2D) or K (3D), as they have been shown to vary by orders of magnitude in an apparently homogeneous aquifer.

Variability in T and/or K

Summary of data from many different places in the world. Careful though! Data are not always obtained with rigorous procedures and, moreover, as we will see throughout the course, data depend on the interpretation method and the scale of regularization.

Data are given in terms of mean and variance (dispersion around the mean value).

Variability in T and/or K

Almost always $\sigma_{\ln T}$ (or $\sigma_{\ln K}$) is smaller than 2 (and in most cases smaller than 1). This can be questioned, but it is OK for now.

Correlation scales (a very important concept later!)

- But what is the correct treatment for natural heterogeneity?
- First of all, what do we know?
- - Real data at (few) selected points
- - Statistical parameters
- - A huge uncertainty related to the lack of data in most of the aquifer. If parameters are continuous (of course they are), then the number of locations without data is infinite.
- Note: the value of K at any point DOES EXIST. The problem is that we do not know it (we could if we measured it, but we could never be exhaustive anyway).
- Stochastic approach: K at any given point is RANDOM, coming from a predefined (maybe known, maybe not) pdf, and spatially correlated: a REGIONALIZED VARIABLE.

Regionalized Variables

- $T(x, \xi)$ is a Spatial Random Function (SRF) iff:
- - If $\xi = \xi_0$, then $T(x, \xi_0)$ is a spatial function (continuity? differentiability?)
- - If $x = x_0$, then $T(x_0)$ (actually $T(x_0, \xi)$) is a random variable
- Thus, as a random variable, $T(x_0)$ has a univariate distribution (log-normal according to Law, 1944; Freeze, 1975)

Hoeksema and Kitanidis, 1985: log-T normal, log-K normal. Both consolidated and unconsolidated deposits.

Now we look at T(x), so we are interested in the multivariate distribution of $T(x_1), T(x_2), \ldots, T(x_n)$.

Most frequent hypothesis: $\mathbf{Y} = (Y(x_1), Y(x_2), \ldots, Y(x_n)) = (\ln T(x_1), \ln T(x_2), \ldots, \ln T(x_n))$ is multinormal, with a given mean vector and covariance matrix. But most important: NO INDEPENDENCE.

What if independent? Then the joint pdf would be the product of the univariate pdfs, $f(y_1, \ldots, y_n) = \prod_{i=1}^{n} f_i(y_i)$, and we would be in classical statistics. But here we are not, so we need some way to characterize the dependence of one variable at some point on the SAME variable at a DIFFERENT point. This is the concept of the SEMIVARIOGRAM (or VARIOGRAM).

Classification of SRF

- Second-order stationary:
- - $E[Z(x)] = \text{const}$
- - $C(x, y)$ is not a function of location (only of the separation distance, $h$)
- Particular case: isotropic SRF, $C(\mathbf{h}) = C(|\mathbf{h}|)$
- Anisotropic covariance: different correlation scales along different directions
- Most important property: if the distribution is multinormal, first- and second-order moments are enough to fully characterize the SRF multivariate distribution
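Since a multinormal SRF is fully characterized by its mean and covariance, realizations can be drawn directly from those two moments. The sketch below is one minimal way to do it, under assumptions not stated in the slides: a 1D grid, an isotropic exponential covariance, and a Cholesky factorization of the covariance matrix. The names `exponential_covariance` and `sample_gaussian_field` are hypothetical.

```python
import numpy as np


def exponential_covariance(h, variance=1.0, corr_scale=10.0):
    """Assumed isotropic model: C(h) = variance * exp(-|h| / corr_scale)."""
    return variance * np.exp(-np.abs(h) / corr_scale)


def sample_gaussian_field(x, mean, variance, corr_scale, n_real, seed=0):
    """Draw n_real realizations of a second-order stationary multinormal Y(x)."""
    rng = np.random.default_rng(seed)
    h = x[:, None] - x[None, :]                      # separation between all pairs
    cov = exponential_covariance(h, variance, corr_scale)
    # A small diagonal term guards the factorization against round-off.
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(x.size))
    white = rng.standard_normal((x.size, n_real))    # independent N(0, 1) values
    return mean + (chol @ white).T                   # shape (n_real, n_points)


x = np.linspace(0.0, 100.0, 51)                      # grid spacing of 2 length units
fields = sample_gaussian_field(x, mean=-2.0, variance=1.0,
                               corr_scale=10.0, n_real=2000)
neighbour_corr = np.corrcoef(fields[:, 0], fields[:, 1])[0, 1]
print(fields.shape, fields.mean(), fields.var(), neighbour_corr)
```

The sample correlation between neighbouring points should be close to exp(-2/10), which is what distinguishes these fields from independent draws.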


Relaxing the stationary assumption

1. The assumption of second-order stationarity with finite variance, C(0), might not be satisfied (the experimental variance tends to increase with domain size).

2. Less stringent assumption: the INTRINSIC HYPOTHESIS. The variance of the first-order increments is finite AND these increments are themselves second-order stationary. Very simple example: hydraulic heads ARE non-intrinsic SRF.

$$E[Y(x+h) - Y(x)] = m(h), \qquad \mathrm{var}[Y(x+h) - Y(x)] = 2\gamma(h)$$

Both are independent of $x$ and only a function of $h$.

Usually $m(h) = 0$; if not, just define a new function, $Y(x) - m(x)$, which satisfies this condition.

Definition of variogram, $\gamma(h)$

$$E[Y(x+h) - Y(x)] = 0$$

$$\gamma(h) = \frac{1}{2}\,\mathrm{var}[Y(x+h) - Y(x)] = \frac{1}{2}\,E\left[(Y(x+h) - Y(x))^2\right]$$

Variogram v. Covariance

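1. The variogram is half the mean quadratic increment of Y between two points separated by h.

2. Compare the INTRINSIC HYPOTHESIS with SECOND-ORDER STATIONARITY. With $E[Y(x)] = m = \text{constant}$:

$$\gamma(h) = \frac{1}{2} E\left[(Y(x+h) - Y(x))^2\right] = \frac{1}{2}\left( E[Y(x+h)^2] + E[Y(x)^2] - 2m^2 - 2E[Y(x+h)\,Y(x)] + 2m^2 \right) = C(0) - C(h)$$

[Figure: variogram and covariance plotted against separation distance h]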

The variogram

The definition of the Semi-Variogram is usually given by the following probabilistic formula:

$$\gamma(h) = \frac{1}{2}\,E\left[(Z(x+h) - Z(x))^2\right]$$

When dealing with real data, the semi-variogram is estimated by the Experimental Semi-Variogram. For a given separation vector, h, there is a set of observation pairs that are approximately separated by this distance. Let the number of pairs in this set be N(h). The experimental semi-variogram is given by

$$\hat{\gamma}(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[Z(x_i + h) - Z(x_i)\right]^2$$
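A minimal sketch of the experimental semi-variogram follows: for each lag class, half the average squared increment over the N(h) pairs falling in that class. Two simplifications are assumptions of this sketch, not of the slides: pairs are grouped by separation distance only (omnidirectional), and the lag classes are passed in as bin edges. The function name `experimental_semivariogram` is hypothetical.

```python
import numpy as np


def experimental_semivariogram(x, z, lag_edges):
    """Return (gamma, counts) per lag class, using distance-only binning."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    if x.ndim == 1:
        x = x[:, None]                          # treat 1D input as n points in 1D
    i, j = np.triu_indices(len(z), k=1)         # every pair counted once
    dist = np.linalg.norm(x[i] - x[j], axis=1)  # separation distance of each pair
    sq_incr = (z[i] - z[j]) ** 2                # squared increment of each pair
    n_lags = len(lag_edges) - 1
    gamma = np.full(n_lags, np.nan)             # NaN where a class has no pairs
    counts = np.zeros(n_lags, dtype=int)
    for k in range(n_lags):
        in_bin = (dist >= lag_edges[k]) & (dist < lag_edges[k + 1])
        counts[k] = in_bin.sum()
        if counts[k] > 0:
            gamma[k] = 0.5 * sq_incr[in_bin].mean()
    return gamma, counts


# Four equally spaced observations, lag classes centred on h = 1 and h = 2.
gamma, counts = experimental_semivariogram(
    [0.0, 1.0, 2.0, 3.0], [1.0, 2.0, 4.0, 7.0], [0.5, 1.5, 2.5])
print(gamma, counts)
```

Lag classes with few pairs give unreliable estimates, so the returned counts are worth inspecting alongside the variogram values.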


Some comments on the variogram

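- If $Z(x)$ and $Z(x+h)$ are totally independent, then $\gamma(h) = \sigma^2$, the variance of Z.
- If $Z(x)$ and $Z(x+h)$ are totally dependent, then $\gamma(h) = 0$.
- One particular case is when $x = x+h$. Therefore, by definition, $\gamma(0) = 0$.
- In the stationary case, $\gamma(h) = C(0) - C(h)$.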

Variogram Models

- DEFINITIONS
- - Nugget
- - Sill
- - Range
- - Integral distance or correlation scale

- Models
- - Pure Nugget
- - Spherical
- - Exponential
- - Gaussian
- - Power
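The listed models can be written as short functions of the separation distance h. The parameterizations below are common textbook forms and are an assumption of this sketch; conventions differ between references, in particular whether the length parameter `a` of the exponential and Gaussian models denotes a correlation scale or a practical range. All function names are hypothetical, and a nugget can be added to any model as a constant for h > 0.

```python
import numpy as np


def pure_nugget(h, sill):
    """No spatial correlation: the sill is reached for any h > 0."""
    h = np.asarray(h, dtype=float)
    return np.where(h > 0.0, sill, 0.0)


def spherical(h, sill, a):
    """Reaches the sill exactly at the range a."""
    r = np.minimum(np.asarray(h, dtype=float) / a, 1.0)
    return sill * (1.5 * r - 0.5 * r ** 3)


def exponential(h, sill, a):
    """Approaches the sill asymptotically; linear behaviour near the origin."""
    return sill * (1.0 - np.exp(-np.asarray(h, dtype=float) / a))


def gaussian(h, sill, a):
    """Approaches the sill asymptotically; parabolic behaviour near the origin."""
    return sill * (1.0 - np.exp(-(np.asarray(h, dtype=float) / a) ** 2))


def power(h, coeff, exponent):
    """No sill; a valid variogram requires 0 < exponent < 2."""
    return coeff * np.asarray(h, dtype=float) ** exponent


lags = np.array([0.0, 5.0, 10.0, 20.0])
for model in (spherical, exponential, gaussian):
    print(model.__name__, model(lags, 1.0, 10.0))
print("pure_nugget", pure_nugget(lags, 1.0))
print("power", power(lags, 0.1, 1.0))
```

The power model is the only one here without a sill, which makes it the natural choice when the experimental variance keeps growing with domain size.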


- Correlation scales: larger in T than in K; larger in the horizontal than in the vertical direction; a fraction of the domain of interest.

Additional comments

- Second-order stationary:
- - $E[Z(x)] = \text{constant}$
- - $\gamma(h)$ is not a function of location
- Particular case: isotropic SRF, $\gamma(\mathbf{h}) = \gamma(|\mathbf{h}|)$
- Anisotropic variograms: two types of anisotropy, depending on whether the correlation scale or the sill value changes with direction
- Important property: $\gamma(h) = \sigma^2 - C(h)$
- Most important property: if the distribution is multinormal, first- and second-order moments are enough to fully characterize the SRF multivariate distribution

Estimation vs. Simulation

- Problem: few data available; maybe we know the mean, variance and variogram
- Alternatives:
- (1) Estimation (interpolation) problems: KRIGING
- - Kriging is BLUE (Best Linear Unbiased Estimator)
- - Extremely smooth
- - Many possible krigings. Alternative: cokriging

http://www-sst.unil.ch/research/variowin/

The kriging equations - 1

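We want to predict the value, $Z(x_0)$, at an unsampled location, $x_0$, using a weighted average of the observed values at N neighboring locations, $Z(x_1), Z(x_2), \ldots, Z(x_N)$. Let $Z^*(x_0)$ represent the predicted value; a weighted average estimator can be written as

$$Z^*(x_0) = \sum_{i=1}^{N} \lambda_i Z(x_i)$$

where the $\lambda_i$ are the weights. The associated estimation error is

$$Z^*(x_0) - Z(x_0)$$

In general, we do not know the (constant) mean, m, in the intrinsic hypothesis. We impose the additional condition of equivalence between the mathematical expectation of $Z^*(x_0)$ and $Z(x_0)$:

$$E[Z^*(x_0) - Z(x_0)] = 0$$

The kriging equations - 2

With m the unknown mathematical expectation of the process Z:

$$E[Z^*(x_0)] = \sum_{i=1}^{N} \lambda_i E[Z(x_i)] = m \sum_{i=1}^{N} \lambda_i = m \quad\Longrightarrow\quad \sum_{i=1}^{N} \lambda_i = 1$$

This condition allows obtaining an unbiased estimator.

The kriging equations - 3

We wish to determine the set of weights. IMPOSE the condition that the variance of the estimation error be minimum:

$$\sigma^2 = E\left[(Z^*(x_0) - Z(x_0))^2\right] = \text{minimum}$$

The kriging equations - 4

Since the weights sum to one, $Z^*(x_0) - Z(x_0) = \sum_i \lambda_i [Z(x_i) - Z(x_0)]$. We then use the definition of variogram, writing $Z(x_i) - Z(x_j) = [Z(x_i) - Z(x_0)] - [Z(x_j) - Z(x_0)]$:

$$\gamma(x_i - x_j) = \frac{1}{2} E\left[(Z(x_i) - Z(x_j))^2\right] = \gamma(x_i - x_0) + \gamma(x_j - x_0) - E\left[(Z(x_i) - Z(x_0))(Z(x_j) - Z(x_0))\right]$$

THEN

$$E\left[(Z(x_i) - Z(x_0))(Z(x_j) - Z(x_0))\right] = \gamma(x_i - x_0) + \gamma(x_j - x_0) - \gamma(x_i - x_j)$$

which will be used in

$$\sigma^2 = \sum_{i=1}^{N} \sum_{j=1}^{N} \lambda_i \lambda_j \, E\left[(Z(x_i) - Z(x_0))(Z(x_j) - Z(x_0))\right]$$

The kriging equations - 5

By substitution

$$\sigma^2 = \sum_{i=1}^{N} \sum_{j=1}^{N} \lambda_i \lambda_j \left[\gamma(x_i - x_0) + \gamma(x_j - x_0) - \gamma(x_i - x_j)\right]$$

Noting that $\sum_i \lambda_i = 1$, so that $\sum_i \sum_j \lambda_i \lambda_j \gamma(x_i - x_0) = \sum_i \lambda_i \gamma(x_i - x_0)$, we finally obtain

$$\sigma^2 = 2 \sum_{i=1}^{N} \lambda_i \gamma(x_i - x_0) - \sum_{i=1}^{N} \sum_{j=1}^{N} \lambda_i \lambda_j \gamma(x_i - x_j)$$

The kriging equations - 6

This is a constrained optimization problem. To solve it we use the method of Lagrange multipliers. The Lagrangian objective function is

$$L(\lambda_1, \ldots, \lambda_N, \mu) = \frac{1}{2}\sigma^2 - \mu \left(\sum_{i=1}^{N} \lambda_i - 1\right)$$

To minimize this we must take the partial derivative of the Lagrangian with respect to each of the weights and with respect to the Lagrange multiplier, $\mu$, and set the resulting expressions equal to zero, yielding a system of linear equations.

The kriging equations - 7

Minimize this and get (N+1) linear equations with (N+1) unknowns:

$$\sum_{j=1}^{N} \lambda_j \gamma(x_i - x_j) + \mu = \gamma(x_i - x_0), \quad i = 1, \ldots, N; \qquad \sum_{i=1}^{N} \lambda_i = 1$$

The kriging equations - 8

The complete system can be written as $A \boldsymbol{\lambda} = b$, with $\gamma_{ij} = \gamma(x_i - x_j)$:

$$\begin{pmatrix} \gamma_{11} & \cdots & \gamma_{1N} & 1 \\ \vdots & & \vdots & \vdots \\ \gamma_{N1} & \cdots & \gamma_{NN} & 1 \\ 1 & \cdots & 1 & 0 \end{pmatrix} \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_N \\ \mu \end{pmatrix} = \begin{pmatrix} \gamma_{10} \\ \vdots \\ \gamma_{N0} \\ 1 \end{pmatrix}$$

The kriging equations - 9

We finally get the Variance of the Estimation Error:

$$\sigma^2 = \sum_{i=1}^{N} \lambda_i \gamma(x_i - x_0) + \mu$$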
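The linear system and the estimation variance translate into a few lines of code. The sketch below assumes 1D locations and an exponential variogram chosen purely for illustration; `exp_variogram` and `ordinary_kriging` are hypothetical names. It builds the (N+1) by (N+1) matrix of variogram values bordered by ones, solves for the weights and the Lagrange multiplier, and returns the estimate with its error variance.

```python
import numpy as np


def exp_variogram(h, sill=1.0, corr_scale=10.0):
    """Assumed variogram model, used only for this illustration."""
    return sill * (1.0 - np.exp(-np.asarray(h, dtype=float) / corr_scale))


def ordinary_kriging(x_obs, z_obs, x0, variogram):
    """Solve the kriging system at x0; return (estimate, variance, weights)."""
    x_obs = np.asarray(x_obs, dtype=float)
    z_obs = np.asarray(z_obs, dtype=float)
    n = x_obs.size
    # Left-hand side: variogram between data points, bordered by ones.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(np.abs(x_obs[:, None] - x_obs[None, :]))
    A[n, n] = 0.0
    # Right-hand side: variogram between data points and the target, then 1.
    b = np.ones(n + 1)
    b[:n] = variogram(np.abs(x_obs - x0))
    solution = np.linalg.solve(A, b)
    weights, mu = solution[:n], solution[n]
    estimate = weights @ z_obs
    variance = weights @ b[:n] + mu          # variance of the estimation error
    return estimate, variance, weights


x_obs = [0.0, 10.0, 25.0]
z_obs = [1.0, 3.0, 2.0]
for x0 in (5.0, 10.0):
    est, var, w = ordinary_kriging(x_obs, z_obs, x0, exp_variogram)
    print(f"x0 = {x0:5.1f}  estimate = {est:.4f}  variance = {var:.4f}  "
          f"sum of weights = {w.sum():.4f}")
```

At a data location the estimate reproduces the observation and the variance drops to zero, so kriging is an exact interpolator; between data points the variance grows with distance from the observations.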

Estimation vs. Simulation (ii)

- (2) Simulations: try to reproduce the look of the heterogeneous variable
- - Important when extreme values matter
- - Many (actually infinite) solutions, all of them equally likely (and each with probability 0 of being correct)
- For each potential application we are interested in one or the other

Estimation. 1 to 5

AVRA VALLEY. Regional Scale - Clifton, P.M., and S.P. Neuman, Effects of Kriging and Inverse Modeling on Conditional Simulation of the Avra Valley Aquifer in southern Arizona, Water Resour. Res., 18(4), 1215-1234, 1982. [Five figure slides with this caption]

Monte Carlo approach

[Diagram: 2000 simulations of CONDITIONAL CROSS-CORRELATED FIELDS, $Y = \ln T$, give heads $h_1, h_2, \ldots, h_{2000}$, from which the statistical CONDITIONAL moments, first and second order, are computed]

NUMERICAL ANALYSIS - MONTE CARLO

- Evaluation of key statistics of medium parameters (K, porosity, ...)
- Synthetic generation of an ensemble of equally likely fields
- Solution of flow/transport problems on each one of these
- Ensemble statistics

- Simple to understand
- Applicable to a wide range of linear and nonlinear problems
- High heterogeneities
- Conditioning

- Heavy calculations
- Fine computational grids
- Reliable convergence criteria (?)
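The four Monte Carlo steps can be sketched on a deliberately small problem. Everything specific here is an assumption for illustration: steady 1D flow between two fixed heads, UNCONDITIONAL fields of Y = ln K with an exponential covariance generated by Cholesky factorization, and hypothetical function names. For flow through cells in series the head follows directly from the cumulative hydraulic resistance, so no numerical solver is needed.

```python
import numpy as np


def simulate_log_conductivity(x, mean_y, var_y, corr_scale, n_real, rng):
    """Step 2: ensemble of equally likely Y = ln K fields (unconditional)."""
    h = np.abs(x[:, None] - x[None, :])
    cov = var_y * np.exp(-h / corr_scale)
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(x.size))
    return mean_y + (chol @ rng.standard_normal((x.size, n_real))).T


def solve_heads_1d(K, dx, h_left, h_right):
    """Step 3: steady 1D flow; returns heads at the cell interfaces."""
    cum_res = np.cumsum(dx / K, axis=1)            # cumulative resistance
    flux = (h_left - h_right) / cum_res[:, -1]     # one flux per realization
    inner = h_left - flux[:, None] * cum_res
    return np.hstack([np.full((K.shape[0], 1), h_left), inner])


rng = np.random.default_rng(42)
n_cells, length = 50, 100.0
dx = length / n_cells
x = (np.arange(n_cells) + 0.5) * dx                # cell centres

# Step 1 is replaced by assumed statistics of Y: zero mean, unit variance.
Y = simulate_log_conductivity(x, 0.0, 1.0, 10.0, n_real=2000, rng=rng)
heads = solve_heads_1d(np.exp(Y), dx, h_left=10.0, h_right=0.0)

# Step 4: ensemble statistics at every interface.
mean_head = heads.mean(axis=0)
var_head = heads.var(axis=0, ddof=1)

# Convergence check: head variance at mid-domain versus number of simulations.
mid = n_cells // 2
for n in (100, 500, 1000, 2000):
    print(n, heads[:n, mid].var(ddof=1))
```

Watching how the sample variance settles as the number of realizations grows is the simplest convergence diagnostic; second moments typically stabilize more slowly than means.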

Problems: reliable assessment of convergence (Ballio and Guadagnini, 2004)

[Figure: hydraulic head variance versus number of Monte Carlo simulations]