The management and analysis of data derived from acid rock drainage (ARD) studies are essential for predictions, conclusions and recommendations. The precursor to data management and analysis is the sampling program. Sampling is the single most important aspect of a good survey, for without good sampling, analytical results may not be valid and correct interpretations will be difficult to achieve. Sampling with respect to acid rock drainage material is discussed by Downing and Shaw (2000); likewise, quality assurance/quality control is discussed by Downing and Mills (1998). The purpose of this paper is to focus on the concepts and methods involved in data management and analysis.

The collection of data begins in the field with qualified people who must be involved from the initial data gathering through to the laboratory test work and interpretations with conclusions.

The first goal of the ARD practitioner is to characterize the material and determine its acid-generating potential and metal leaching capacity, using categories such as:

- Acid generating (AG)
- Potential acid generating (PAG)
- Potential acid consuming (PAC)
- Acid consuming (AC)

These categories are all predicated upon a single threshold number derived from acid base accounting (ABA), upon which many dollars must be spent either treating the material or disposing of it in a proper manner. Poor sampling, poor laboratory analysis and poor data analysis can make the difference between spending and saving millions of dollars.
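As a sketch of how such a threshold number partitions material, the following classifies samples by the ratio of neutralization potential (NP) to maximum potential acidity (MPA). The cut-off values and function names are illustrative assumptions for this paper, not site-specific criteria:

```python
# Sketch: classifying samples by the NP/MPA ratio (NPR).
# The NPR cut-offs below are illustrative, not regulatory criteria.

def mpa(total_s_pct: float) -> float:
    """Maximum potential acidity (kg CaCO3/t) from total sulphur (%),
    using the conventional stoichiometric factor of 31.25."""
    return 31.25 * total_s_pct

def classify(np_value: float, total_s_pct: float) -> str:
    """Assign an ABA category from NP (kg CaCO3/t) and total S (%).
    Illustrative NPR cut-offs: <1 AG, 1-2 PAG, 2-4 PAC, >=4 AC."""
    acidity = mpa(total_s_pct)
    if acidity == 0:
        return "AC"  # no sulphur means no acid-generating potential
    npr = np_value / acidity
    if npr < 1:
        return "AG"
    if npr < 2:
        return "PAG"
    if npr < 4:
        return "PAC"
    return "AC"
```

In practice the chosen cut-offs must be defended with site-specific data, which is precisely why the sampling and data analysis discussed below matter.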

Sources of ARD data are derived from the following:

__Field Data__

Field data consist of observations and collection parameters, collectively called observational data, collected at the site being examined.

__Analytical Data__

Analytical data generally consist of acid base accounting parameters (neutralization potential, total sulphide, sulphate sulphur, carbon dioxide), trace element, whole rock (major oxides) and water quality analyses. Major components of the data are both the method of analysis and the type of sample digestion, which have been discussed in papers by Shaw, Downing et al. and Mills.

__Site Test Data__

*Field Site*: The field laboratory site test work involves constructing waste rock test pads on site in order to monitor leachate from the various types of waste material under field conditions.

*Laboratory Site*: Laboratory site data consist of kinetic and humidity cell test data. These types of tests can in many ways be classified as experimental, producing data obtained under controlled conditions. Laboratory tests are discussed by Shaw, Mills and Shaw.

__Mineralogical Data__

Mineralogical data consist of thin-section and X-ray diffraction techniques to determine the modal mineralogy (Shaw & Mills, 1998). The contribution of specific minerals to the neutralization potential is important in understanding the various static (and kinetic) test results (Jambor et al., 2000).

__Data Variability__

There are four kinds of variability in geological/ARD data (Koch & Link 1970):

- Natural variability, that inherent in the geological material being sampled,
- Sampling variability, that produced by the physical sampling process,
- Preparation variability, that introduced in preparation of the material for chemical analysis by crushing, splitting etc., and
- Analytical variability, that introduced by the chemical or physical determination of substances in the geological material.

What constitutes a good database, and how reliable is it? Data integrity is a constant concern. The construction of a valid database begins with good sample collection. Appropriate sample collection, preparation, analytical procedures and standards must be maintained throughout the project life. Errors can be generated throughout a project, from data collection, preparation, analysis, input, transfer and merging through to reporting.

The question during data analysis is not only how to eliminate or minimize errors, but how to recognize, correct and report them. Check sampling and validation of the database should be carried out even though it is time consuming to the point of being 'boring'. Error recognition can be achieved through periodic printouts and plots, and/or a complete database dump followed by manual editing; this also provides a quick data reference. One should develop ways of cross-checking the data through plots or mathematical manipulation, querying all results and basic statistics. There is always an element of luck in spotting errors before final reporting, and errors always seem to crop up at the most inappropriate time. In reserve estimation, there are numerous mathematical manipulations where incorrect data can generate wrong results. An effective method of error reduction is having the project people directly involved with the data analysis and reporting, since they can best identify incorrect results generated through the data processing.
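The cross-checks described above can be partly automated. A minimal sketch follows; the field names and tolerances are assumptions for illustration only, and would be adapted to the actual database schema:

```python
# Sketch: simple cross-checks on one ABA sample record before interpretation.
# Field names ("s_total", "np", ...) and tolerances are illustrative assumptions.

def check_record(rec: dict, tol: float = 0.05) -> list:
    """Return a list of human-readable problems found in one sample record."""
    problems = []
    s_total = rec.get("s_total")
    s_sulphide = rec.get("s_sulphide", 0.0)
    s_sulphate = rec.get("s_sulphate", 0.0)
    # Speciated sulphur should not exceed total sulphur (within tolerance).
    if s_total is not None and s_sulphide + s_sulphate > s_total + tol:
        problems.append("sulphur speciation exceeds total S")
    # NP should fall within a plausible range for the analytical method.
    np_value = rec.get("np")
    if np_value is not None and not (-50 <= np_value <= 1000):
        problems.append("NP outside plausible range")
    return problems
```

Running such checks on every record after each data transfer or merge catches many input and transcription errors before they reach interpretation.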

Even a valid database is prone to problems when it is subdivided into sections for analysis using similar or different software, manipulations and calculations are performed, and the data are dumped back into the original database. Retaining current database versions is very important, as is documenting the whole database. A central, currently correct database must be securely maintained, with routine backups and offsite storage. The end product should always be questioned: "How defensible are my data?"

ARD data do not generally follow a normal distribution but often more closely resemble a lognormal distribution, and statistical assumptions may fail to describe the real data behaviour. Evaluation of the data can range from simple plots and statistics to more rigorous statistical analysis, which would require the services of a qualified statistician. For most ARD applications, the former is the standard; very few studies have ever used the latter, as is evident from the lack of published papers or presentations at ARD conferences. The analytical data are essentially geochemical data, the evaluation of which is the focus of numerous papers published in geochemical journals (Garrett et al., 1980; Kurzl, 1988).
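One quick, informal way to judge whether a positive-valued parameter is better described on a log scale is to compare skewness of the raw and log-transformed data. This is a sketch, not a formal normality test:

```python
import math
import statistics

# Sketch: does a positive-valued variable look more symmetric (hence more
# nearly normal) on the raw scale or the log scale? Informal check only.

def skewness(xs):
    """Sample skewness (Fisher-Pearson, no bias correction)."""
    n = len(xs)
    mean = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    if sd == 0:
        return 0.0  # constant data: no skew
    return sum(((x - mean) / sd) ** 3 for x in xs) / n

def prefers_log_scale(xs) -> bool:
    """True if log-transformed data are closer to symmetric than raw data."""
    return abs(skewness([math.log(x) for x in xs])) < abs(skewness(xs))
```

A formal treatment (probability plots, goodness-of-fit tests) is still preferable; this merely flags which scale to try first.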

Whatever methods are used by the ARD practitioner, they should be easily understood, visual (graphical presentations) and easily applied.

There are several computer statistical programs available which deal with the rigorous statistical analysis of data. Computer spreadsheet programs generally contain features that can be used for a much less rigorous approach to statistical analysis. These programs also contain plotting features that are necessary for the visual interpretation and presentation of the data.

When utilizing multivariate statistics, one must be reasonably competent or employ the services of a qualified statistician.

The following are examples of useful plots that can aid in the interpretation of data.

- MPA vs S (total) Plot (Figure 1)

This type of plot shows the distribution of sulphate in the samples. For samples with no sulphate, the points fall on a straight line, because MPA is calculated directly from total sulphur; if a sample does contain sulphate, that sample will plot below the line. Sulphur speciation is an integral part of this analysis, as discussed by Day et al. (2000).

- Neutralization Plots

These plots are useful for observing the major contributors to the neutralization potential (or buffering minerals).

1. CO_{2} vs NP (Figures 2 & 2B)

This type of plot indicates the importance and distribution of carbonate for neutralization. If there is a direct (positive) correlation, then the assumption that NP is attributable to carbonate is correct.
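The strength of such a correlation can be quantified before (or alongside) plotting. A minimal Pearson correlation sketch follows; it supplements, rather than replaces, visual inspection of the plot:

```python
import math

# Sketch: Pearson correlation between CO2 and NP. A strong positive r
# supports the assumption that NP is attributable to carbonate.

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

A correlation coefficient alone can mislead when multiple populations are present, so the histogram and probability plot checks discussed later still apply.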

2. Test for Surrogate Elements (Ca vs NP, Figures 3 & 3B)

If the evaluation of CO_{2} vs NP is positive, then calcium-bearing carbonate must be examined using a Ca vs NP plot. If, again, there is a positive correlation, then one can assume that Ca could be used as a surrogate for neutralization potential (Kwong 2000, Day 1995, Downing & Giroux 1993).

Other surrogate element tests should include Mg vs NP and Ca+Mg vs NP.
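These surrogate tests can be screened numerically by ranking each candidate by its correlation with measured NP. The sketch below does this for hypothetical candidates; the data values and names are illustrative only:

```python
import math

# Sketch: rank candidate surrogate elements (e.g. Ca, Mg, Ca+Mg) by their
# correlation with measured NP. Data values are illustrative.

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def best_surrogate(candidates, np_values):
    """candidates: dict mapping element name -> list of concentrations.
    Returns (name, r) for the candidate best correlated with NP."""
    scored = {name: pearson_r(vals, np_values) for name, vals in candidates.items()}
    best = max(scored, key=lambda name: scored[name])
    return best, scored[best]
```

The winner should still be confirmed mineralogically, since a high correlation does not by itself prove that the element resides in a neutralizing mineral.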

3. NP vs MPA (Figure 4)

This type of plot shows the spatial distribution of neutralization and acidity. This plot is useful for presenting the acid rock generating potential of the samples analyzed.

- Carbonate Speciation

This type of analysis is very important in determining the main carbonate species, which has a major implication for neutralization potential. This can be done using various stains for calcite and dolomite or analytical techniques (Day, 1995).

- Fe vs Al (Figure 5)

This type of plot is based upon the work of Bladh (1992), and essentially shows the propensity of secondary minerals (goethite, jarosite, alunite) to form from the bulk chemistry of a sample. This is useful in predicting which samples, when oxidized and weathered, will form these minerals.

- Trace Element Plots

Trace element plots should be used to show the distribution of trace elements across varying lithologies, ore and non-ore grade material, and spatially within a deposit. It should be noted that it is not sufficient to calculate an average or median value for each trace element, due to potential variability within a deposit. Methods of trace element analysis are discussed by Downing et al. (1998).

- 3-Dimensional Plots

This type of plot can be devised using bubble plots, 3-D visualization or ternary diagrams, all of which are available in various software packages. An example of bubble plot analysis is presented in Figures 6 and 6B. Figure 6 shows the rock types using the TiO_{2} vs Zr classification, while Figure 6B shows the same rock type classification with a bubble indicating the NP/MPA ratio, indicative of which rock type has better neutralization potential. These types of plots are good for examining more than three variables.

__Univariate Data Analysis__

Univariate statistical methods are used to summarize individual parameter data (assuming a single population). Before univariate statistics are meaningful, the number of populations present must be established by plotting frequency (histograms) or probability plots. If a single population exists, its tendency to normality or lognormality must also be established. For a single population, the distribution parameters can then be calculated (arithmetic and geometric means, mode and median). However, one must be aware of incorporating very high numbers and background (detection level) values in the statistics, as they can produce skewed and misleading results.
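A minimal univariate summary of a single-population parameter can be sketched as follows (the mode is omitted here, as it is rarely meaningful for continuous assay data without binning):

```python
import math
import statistics

# Sketch: univariate summary for one parameter, assuming a single population
# of positive values. The geometric mean is the back-transformed mean of logs,
# the appropriate central value for lognormally distributed data.

def summarize(xs):
    """Return basic distribution parameters for a list of positive values."""
    return {
        "n": len(xs),
        "arith_mean": statistics.fmean(xs),
        "geo_mean": math.exp(statistics.fmean(math.log(x) for x in xs)),
        "median": statistics.median(xs),
        "min": min(xs),
        "max": max(xs),
    }
```

For skewed data the arithmetic mean can greatly exceed the geometric mean and median, which is itself a useful diagnostic of the very-high-value problem noted above.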

__Multivariate Data Analysis__

Multivariate statistical methods are used to determine inter-relationships between variables. This type of analysis includes correlation coefficients, factor analysis, cluster analysis and discriminant analysis.

Problems that arise in using these methods are:

- Data below the laboratory detection limit create censored value distributions,
- Multiple populations (e.g. AG, PAG, PAC) may be present, and
- The distribution of the variables may be non-normal.

__Geostatistical Analysis__

Geostatistical methods are used to determine the spatial relationships of individual elements. This is commonly used in ore and waste rock resource calculations (Downing and Giroux, 1993, 1999).

__Background and Threshold Data Analysis__

Many variables measured in the course of baseline and environmental studies are continuous due to inherent geological/geochemical variability within a sample and from sample to sample. For example, 100 samples analyzed for neutralization potential might produce values ranging from 1 to 250 kg CaCO_{3}/t. The variable NP is continuous between these limits because any intermediate value could be assumed by a sample. Also, due to errors inherent in sampling and analysis, there is no discrete number that can define that variable, only an approximation, with the best number resulting from the use of standards within the analytical laboratory. Sampling variability is seldom examined in ARD studies. Unfortunately, many people accept the numbers from a laboratory at face value and attempt to define the criteria that regulators require to set the limits between acid generation and acid consumption. The concept of defining these limits must take into account the inherent geological/geochemical variability. An approach to this problem is the determination (or selection) of threshold values. This concept is used by geochemists to determine the background and anomalous values of variables (Parslow, 1974; Sinclair, 1974, 1991), and is a graphical method of analyzing variable distributions. This method has also been used to determine the background values between natural and anthropogenic sources (Runnells et al., 1998).

The MS-DOS software program *Probplot* (Stanley, 1987) is used for this analysis. This program is an interactive tool which allows a user to rapidly analyze cumulative frequency data. This analysis takes the form of a modelling exercise, comparing the actual cumulative frequency distribution with a theoretical frequency distribution model. This model is flexible, in that it is capable of representing numerous forms of frequency distributions consisting of combinations (mixtures) of normal (or lognormal) component populations, but also restrictive, in that it cannot represent other distribution forms commonly encountered in cumulative frequency data (such as Poisson, exponential or binomial distributions).

The user should remember that the probability plot analysis is merely a comparison of an actual cumulative frequency distribution with a theoretical distribution model. The use of this program implies/requires that the user assume that the actual data ARE distributed in the same form as the theoretical model. If this assumption is not met (at least to some degree), then the program will be of little help in understanding the data. A successful or appropriate frequency distribution model can be used to decompose the multi-modal data distribution into its component populations. These, in turn, can be used to define thresholds which separate the data into groups corresponding to these component populations.

This type of data analysis is useful for the following reasons:

- Determination of the number of populations from observations of both histograms and cumulative plots by optimizing this frequency distribution model and decomposing the data distribution into its component populations. This can be done by using normal and/or lognormal data. When two or more populations occur, then the characteristics for each population need to be determined (i.e. mineralogy, geochemistry),
- Model the cumulative plots in order to determine the relative percentage for each population by determining the general form of the theoretical frequency distribution model, and
- Determine the threshold values for each population by selecting thresholds to partition the data into groups representative of these component populations.
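The plotting positions underlying such a cumulative frequency (probability plot) analysis can be sketched as below. This only generates the points for the plot; it does not perform the mixture-model fitting that Probplot provides:

```python
from statistics import NormalDist

# Sketch: plotting positions for a probability (cumulative frequency) plot.
# Each sorted value is paired with the normal quantile of its cumulative
# frequency; straight segments suggest a single (log)normal population,
# while inflections suggest mixtures and candidate thresholds.

def probability_plot_points(values):
    """Return (value, normal quantile) pairs for a probability plot."""
    n = len(values)
    points = []
    for rank, x in enumerate(sorted(values), start=1):
        cum_freq = (rank - 0.5) / n  # Hazen plotting position; avoids 0 and 1
        points.append((x, NormalDist().inv_cdf(cum_freq)))
    return points
```

For lognormal behaviour, the same routine is applied to the logarithms of the values; a straight point cloud on that scale supports a single lognormal population.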

__Case History__

Data for this case history comes from a porphyry copper-gold deposit that contains both supergene and hypogene mineralization.

The following analysis was conducted:

- Arithmetic and logarithmic histograms and cumulative plots were constructed
- Plots were examined for populations and inflexion points noted
- Populations divided and modelled
- Statistical and Threshold values determined for each population (Table 1)
- Possible characteristics determined for each population
- Sampling
- More samples taken for a particular rock/mineralization type
- More samples taken in high grade zone than for either low grade or waste rock

- Grade
- Cutoff
- Low grade
- High grade

- Neutralizing minerals
- Short term (carbonate)
- Long term (silicate)

- Acid Generating Minerals
- Fast (pyrite)
- Slow (chalcopyrite)

- Mineralogy
- Gangue minerals
- Ore minerals
- Alteration minerals

- Rock Types
- Lithogeochemistry (see Downing & Madiesky, 1998)
- Trace Elements (see Downing and Gravel, 1998)

- Site-specific threshold values determined for:
- Acid generating material
- Potential acid generating material
- Potential acid consuming material
- Acid consuming material

The results are summarized in Table 1.

__Supergene__

Copper: Not Determined

Sulphur:

Population 1 very low sulphide content, very low sulphate content

Population 2 low sulphide content, low sulphate content

Population 3 moderate sulphide content, low to moderate sulphate content

Population 4 high sulphide content, high sulphate content

NP:

Population 1 low carbonate content, very low non-carbonate gangue NP contributing material

Population 2 low to moderate carbonate content, low non-carbonate gangue NP contributing material

Population 3 moderate carbonate content, low non-carbonate gangue NP contributing material

Population 4 high carbonate content, moderate non-carbonate gangue NP contributing material

NP/MPA:

Population 1 negligible carbonate content, no non-carbonate gangue material, high sulphide content

Population 2 low carbonate content, negligible non-carbonate gangue material, moderate sulphide content

Population 3 moderate carbonate content, low non-carbonate gangue material, low sulphide content

Population 4 high carbonate content, moderate non-carbonate gangue material, negligible sulphides

*Conclusions*

- Not enough samples have been taken to properly characterize low grade ore from waste rock material for copper grade. This will also impact the sulphur and NP populations.
- Sulphides are pyrite/marcasite, chalcopyrite, bornite, trace galena & sphalerite
- Sulphate content due to barite, gypsum, jarosite, melanterite
- Each sample within a specific population must be identified in the field, as that particular population may have some areal context (i.e. oxide zone overlying the supergene zone)
- NP curve not as definitive regarding the populations as the NP/MPA curve.

__Hypogene__

Copper:

Population 1 low grade (includes waste rock)

Population 2 high grade

Sulphur:

Population 1 low grade copper content, high pyrite content (waste rock to low grade ore)

Population 2 high grade copper content, low pyrite content

NP:

Population 1 negligible carbonate content, no non-carbonate gangue NP contributing material

Population 2 low carbonate content, negligible non-carbonate gangue NP contributing material

Population 3 moderate carbonate content, low non-carbonate gangue NP contributing material

Population 4 high carbonate content, moderate non-carbonate gangue NP contributing material

NP/MPA:

Population 1 negligible carbonate content, no non-carbonate gangue material, high sulphide content

Population 2 low carbonate content, negligible non-carbonate gangue material, moderate sulphide content

Population 3 moderate carbonate content, low non-carbonate gangue material, low sulphide content

Population 4 high carbonate content, moderate non-carbonate gangue material, negligible sulphides

*Conclusions*

- Not enough samples have been taken to properly characterize low grade ore from waste rock material for copper grade. This will also impact the sulphur and NP populations.
- 86% of samples have low NP values (populations 1 & 2)
- 10% of samples have a NP/MPA ratio > 4
- mixing of supergene and hypogene material is highly dependent upon segregation into populations which may not be economically feasible
- mixing of hypogene material is highly dependent upon segregation into populations and tonnage of each material

__Error Analysis__

Errors associated with sampling, inappropriate test procedures, tests run incorrectly and chemical analysis may be difficult to define, measure and analyze. Precision and replication are parameters that are dealt with by QA/QC procedures and must be documented in any ARD report. The Thompson-Howarth (1976) technique is a rigorous statistical method of calculating the differences between duplicate data in order to determine precision.
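A much-simplified duplicate-pair screen is sketched below using relative percent difference (RPD). This is NOT the full Thompson-Howarth procedure, which groups pairs by concentration and fits precision as a function of concentration; it is only a first-pass summary of duplicate agreement:

```python
import statistics

# Simplified sketch (not the full Thompson-Howarth method): summarize
# agreement between duplicate analyses with relative percent difference.

def rpd(a: float, b: float) -> float:
    """Relative percent difference of one duplicate pair."""
    return abs(a - b) / ((a + b) / 2.0) * 100.0

def median_rpd(pairs):
    """Median RPD over a batch of (original, duplicate) pairs."""
    return statistics.median(rpd(a, b) for a, b in pairs)
```

A batch whose median RPD exceeds the laboratory's stated precision warrants investigation before the data are used for classification.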

Errors also occur in differences of scale between testwork and the actual mine operations. This error will have far reaching implications if not recognized and solved during testwork and ARD mine planning.

__Quality Assurance / Quality Control Analysis__

This aspect of data management and analysis is discussed by Downing and Mills (1998). Analysis and reporting of QA/QC is very important in order to give assurance that the data are reliable.

__Detection Level Analysis__

The determination of the accepted value when analytical results are reported as "at or below" detection level is important for sulphide (sulphur and sulphate) and trace element analyses, and the chosen convention must be stated in the ARD report. One method is to use the detection level of the analytical method as the value; another is to use half the detection level.
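The two conventions above can be sketched as a single substitution routine; the representation of "below detection" as `None` is an assumption for illustration:

```python
# Sketch: substituting a numeric value for "below detection" results.
# Here None marks a censored result; the chosen rule must be reported.

def substitute(value, detection_limit, rule="half"):
    """Return a numeric value for one result.

    rule="half": replace censored results with half the detection limit.
    rule="dl":   replace censored results with the detection limit itself.
    """
    if value is not None:
        return value  # measured result, use as reported
    if rule == "half":
        return detection_limit / 2.0
    if rule == "dl":
        return detection_limit
    raise ValueError("unknown substitution rule")
```

Whichever rule is adopted, applying it consistently across the database (and stating it in the report) matters more than the particular choice.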

__Lithogeochemical Data Analysis__

The analysis of lithogeochemical data is discussed in detail by Downing and Madeisky, 1999. Lithogeochemical data is representative of the bulk chemistry (and hence the bulk mineralogy) of a sample and as such can be used to predict the characteristics of lithological units, weathering potential, alteration potential and the determination of the theoretical and empirical buffering capacities. This is the only method that takes into account the mineralogy and chemistry of a sample and is therefore representative of that sample's buffering capacity.

__Particle Size Analysis__

Particle size, particle size distribution and individual mineral grain size are parameters that affect both acid generation and acid neutralization. These relationships are discussed by Mills (1998) and Scharer et al. (2000). Particle size analysis is a necessary aspect of any ARD study.

__Probability Analysis__

The ARD practitioner must eventually deal with the probability questions "will this material become acid generating?" and "when will it go acid?". This aspect deals with both analytical and experimental data; the latter will give an approximation if rigorous experiments are conducted.

__Computer Modelling Analysis__

Computer modelling of geochemical data from acid-generating waste rock piles, simulating the geochemical processes to predict the quality of acid rock drainage, is discussed by Perkins et al. (1995). Various computer programs are reviewed based on five categories: equilibrium models, mass transfer models, coupled mass transfer-flow models, "supporting" models and "empirical and engineering" models. These models can be used for improving the understanding of the interactions between geochemical processes and for performing comparisons between decommissioning scenarios.

Computer models for predicting water quality from waste rock piles, tailings impoundments and open pits may use some of the above computer programs in conjunction with hydrogeological models. An approach for modelling pit filling and pit lake chemistry on mine closure is discussed by Bursey et al. (1997). Numerous papers have been presented regarding computer modelling for predicting water geochemistry, and the reader is referred to the proceedings of the 5^{th} International Conference on Acid Rock Drainage, May 2000.

As with all computer modelling, care must be taken that the ARD practitioner understands the program and its capabilities, including data input and predicted results.

__Geo-Environmental Analysis__

As ARD is essentially controlled by bedrock geology, an understanding of the geological environment is extremely important. Conceptual models for ore formation and mineralization provide the ARD practitioner and regulator with some ideas as to the potential size of metal leaching and mobility both from a natural (pre-mining) and anthropogenic (mining) context. This thesis is discussed in published papers by Alpers and Nordstrom (2000) and Kwong (2000). This concept is equally important for understanding mineralized bedrock where no metal mining is practical and ARD material is excavated for construction purposes. Data analysis in this aspect includes good data collection and rigorous interpretation(s) as to understanding background values and establishing thresholds.

The cost of an ARD study is a major component of the environmental survey. These costs have a far-ranging impact upon the mining plan, reclamation and closure. It is necessary to conduct a thorough study and collect quality data that can be used with confidence by the mining engineers. This also has a profound impact upon the acceptance of the ARD study by the regulators and the public. The ARD study should never be compromised by an underachieving budget (i.e. the budget should NOT pre-determine the thoroughness of the ARD study).

Costs involved in data are:

- Sampling,
- Quality control,
- Additional data analyses,
- Compilation and database management, and
- Interpretation and reporting.

Visual plots should be a major component of all reports, as they show the reader the distribution of particular parameters from which interpretations are made. They also give some assurance to the reader that the ARD practitioner has conducted a credible data analysis and that interpretations are backed up by the data plots. Each study may be site specific, but the data analysis routines are the same.

**© The contents of this web page are protected by copyright law. Please contact the authors for permission to re-use the contained information.**

Stanley, C., 1987, *Probplot*, Association of Exploration Geochemists, Special Volume 14