## Topic outline

### Training Methodology & Objectives

**Training Methodology**

__This trainee-centered course includes the following training methodologies:__

- Talking presentation slides (PPT with audio)
- Simulation & Animation
- Exercises
- Videos
- Case Studies
- Gamification (learning through games)
- Quizzes, Pre-test & Post-test

**Course Objectives**

**After completing the course, the employee will:**

- Gain and apply comprehensive knowledge of data engineering
- Use data analysis techniques for engineers, technologists and managers
- Explain basic statistics, data collection and sampling, sources of data, published data, observational and experimental studies
- Discuss the general use of graphical and numerical methods as well as the rules of thumb for drawing histograms
- Construct histograms, bar charts and pie charts
- Review notation and measures of centrality including the sample and population means
- Calculate the mean, median and mode as well as discuss the advantages and disadvantages of the mean and the median
- Use the median for asymmetric distributions and describe the normal distribution and data
- Review left and right skewed distributions including the relation between mean and median and measures of variability
- Calculate the coefficient of variation and the range, the mean for grouped data and the median for grouped data
- Describe box plots, relative frequencies for qualitative data and relative frequencies
- Discuss the distributions of random variables, basics of probability, frequency interpretation and need of probability
- Determine the sum of the probabilities of the sample points
- Identify disjoint or mutually exclusive outcomes, probabilities when events are not disjoint and useful results
- Capture the population parameter, change the confidence level and interpret confidence intervals
- Employ hypothesis tests based on a difference in means and examine the standard error formula

### Course Description

This e-learning course is designed to provide participants with a detailed and up-to-date overview of data engineering. It covers the data analysis techniques for engineers, technologists and managers; the basic statistics, data collection and sampling; the sources of data, published data, observational and experimental studies; the general use of graphical and numerical methods; the rules of thumb for drawing histograms; the notation and measures of centrality; the construction of histograms, bar charts and pie charts; the sample and population means; the calculation of the mean, median and mode; the advantages and disadvantages of the mean and the median; the use of the median for asymmetric distributions, the normal distribution and data; the left and right skewed distributions; and the relation between mean and median and measures of variability.

During this course, participants will learn the calculation of the coefficient of variation and the range; the mean and median for grouped data; box plots, relative frequencies for qualitative data and relative frequencies; the distributions of random variables, basics of probability, frequency interpretation and need of probability; the sum of the probabilities of the sample points; disjoint or mutually exclusive outcomes, probabilities when events are not disjoint and useful results; the capturing of the population parameter, changing of the confidence level and interpretation of confidence intervals; hypothesis tests based on a difference in means; and the examination of the standard error formula.

### Module 1 - Basic Statistics: A Survival Guide

__Contents:__

- Basics of Statistics
- Data Collection and Sampling
- Introduction
- Sources of Data
- Published Data
- Observational and Experimental Studies
- Surveys
- Sampling
- Sampling Plans
- Video: Introduction to Statistics (1.1)
- Module Quiz

### Module 2 - Describing Data

__Contents:__

- General Use of Graphical and Numerical Methods
- Graphical Methods
- Histograms
- Rules of Thumb for Drawing Histograms
- Example 2.1
- Construction of a Histogram
- Bar Charts
- Pie Charts
- Example 2.2
- Construction of a Bar Chart and a Pie Chart
- Numerical Methods
- Notation and Measures of Centrality
- The Sample and Population Means
- The Sample Median
- The Sample Mode
- Example 2.3
- Calculation of the Mean, Median and Mode
- The Weighted Mean
- The Mode
- Advantages and Disadvantages of the Mean and the Median
- Use of Median for Asymmetric Distributions
- The Normal Distribution
- Describing Data
- A Left Skewed Distribution
- A Right Skewed Distribution
- Skewness and the Relation between Mean and Median
- Measures of Variability (Dispersion)
- The Sample Variance
- Standard Deviation
- Coefficient of Variation
- Example 2.4
- Calculation of Sample Variance
- Calculation of the Coefficient of Variation and the Range
- Grouped Data
- The Mean of Grouped Data
- The Sample Variance of Grouped Data
- Calculation of the Mean for Grouped Data
- Calculation of the Sample Variance for Grouped Data
- Calculation of the Median for Grouped Data
- Percentiles and Box Plots
- The Interquartile Range
- Boxplot (Box and Whisker plot)
- Box Plot
- Description and Comparison of Box Plots
- Describing Relative Frequencies for Qualitative Data
- Describing Relative Frequencies
- Video: Mode, Median, Mean, Range, and Standard Deviation (1.3)
- Module Quiz

### Module 3 - Distributions of Random Variables

__Contents:__

- The Basics of Probability
- Probability
- Frequency Interpretation of Probability
- The Need of Probability
- Sum of the Probabilities of the Sample Points
- Operations on Events
- Disjoint or Mutually-Exclusive Outcomes
- Operations on Events
- Probabilities When Events are Not Disjoint
- Useful Results
- Probability Distribution
- Bernoulli Distribution
- The Binomial Distribution
- Formula for the Binomial Distribution
- Probability Distribution
- Expectation
- The Normal Distribution
- The “Coin Tossing” Distribution
- Random Sampling
- The “Coin Tossing” Distribution of 10-Toss Data Points
- The “Coin Tossing” Distribution of 25-Toss Data Points
- The “Coin Tossing” Distribution of 50-Toss Data Points
- The “Coin Tossing” Results for Millions of Tosses Done 10,000 per Data Point
- The “Coin Tossing” Results for 50,000 data points of 32 Tosses Each
- You will see this Curve much more!
- Normal Distribution
- Normal Distribution Model
- Standardizing with Z Scores
- Normal Probability Table
- Normal Probability Examples
- 68-95-99.7 Rule
- Video: The Normal Distribution and the 68-95-99.7 Rule (5.2)

### Module 4 - Foundations for Inference

__Contents:__

- Introduction
- Variability in Estimates
- Point Estimates
- Point Estimates Are Not Exact
- Standard Error of the Mean
- Basic Properties of Point Estimates
- Confidence Intervals
- Capturing the Population Parameter
- An Approximate 95% Confidence Interval
- A Sampling Distribution for the Mean
- Changing the Confidence Level
- Interpreting Confidence Intervals
- Nearly Normal Population with Known SD
- Video: Confidence Interval for a population mean - σ known
- Module Quiz

### Module 5 - Foundations for Inference: Hypothesis Testing

__Contents:__

- Hypothesis Testing
- Hypothesis Testing Framework
- Decision Errors
- Formal Testing Using P-values
- Two-sided Hypothesis Testing with P-values
- Choosing a Significance Level
- Examining the Central Limit Theorem
- Inference for Other Estimators
- Confidence Intervals for Nearly Normal Point Estimates
- Non-normal Point Estimates
- When to Retreat
- Sample Size and Power (Special Topic)
- Finding a Sample Size for a Certain Margin of Error
- Power and the Type 2 Error Rate
- Statistical Significance versus Practical Significance
- Module Quiz

### Module 6 - Inference for Numerical Data

__Contents:__

- Paired Data
- Paired Observations and Samples
- Inference for Paired Data
- Difference of Two Means
- Point Estimates and Standard Errors for Differences of Means
- Confidence Interval for the Difference
- Hypothesis Tests Based on a Difference in Means
- Summary for Inference of the Difference of Two Means
- Examining the Standard Error Formula
- One-sample Means with the T Distribution
- The Normality Condition
- Introducing the T Distribution
- The T Distribution as a Solution to the Standard Error Problem
- One Sample T Confidence Intervals
- One Sample T Tests
- The T Distribution for the Difference of Two Means
- Two Sample T Test
- Two Sample T Confidence Interval
- Video: 5 1A t distribution

### Module 7 - Comparing Many Means with ANOVA

__Contents:__

- Pooled Standard Deviation Estimate
- Comparing Many Means with ANOVA
- Is Batting Performance Related to Player Position in MLB?
- Analysis of Variance (ANOVA) and the F Test
- Reading an ANOVA Table from Software
- Graphical Diagnostics for an ANOVA Analysis
- Multiple Comparisons and Controlling Type 1 Error Rate
- Analysis of Variance (ANOVA)

### Module 8 - Introduction to Linear Regression

__Contents:__

- Introduction to Linear Regression
- Introduction
- Line Fitting, Residuals, and Correlation
- Beginning with Straight Lines
- Fitting a Line by Eye
- Residuals
- Describing Linear Relationships with Correlation
- Fitting a Line by Least Squares Regression
- Linear Regression and Calibration Curves
- Video: An Introduction to Linear Regression Analysis
- Module Quiz

### Module 9 - Revision Example

__Contents:__

- Data Collection, Analysis, Interpretation and Communication
- Analysis of Experimental Data
- Standard Error Estimate – For a Single Measurement
- Estimating and Reporting Uncertainty – Repeated Measurements
- Example: Standard Deviation
- Calculation Error Estimates – Propagation of Independent Errors
- Example: Error Propagation
- Linear Regression and the Correlation Coefficient
- Example: Least-squares Linear Regression
- Interpreting Linear, Power, and Exponential Equations
- Example: Linear Manipulation of Non-linear Models
- Interpreting Linear, Power, and Exponential Equations
- Student’s T-test
- Example: One-sided T-test
- Example: Two-sided T-test
- One-way Analysis of Variance (ANOVA)
- Example: One-way ANOVA
- Module Quiz

### Module 10 - Introduction to Statistical Process Control Techniques

__Contents:__

- Quality Control Today
- New Demands On Systems Require Action
- So what is Statistical Process Control?
- Where did this idea originate?
- What exactly are process control charts?
- How do they work?
- What’s this relationship between variation and assignable causes?
- So how are these normal-predictable variance levels determined?
- What about these rules violations for determining if a series of points within the control limits is unnatural?
- In control? Out of control? What’s the point?
- What’s this about capabilities?
- Can any type of process data be judged using Control Charts?
- What specifically do they look like, what are their key features and how are they created?
- How do I go about making these control charts?
- When is the best time to start?
- Steps Involved In Using Statistical Process Control
- Specific SPC Tools and Procedures
- Identification and Data Gathering
- Prioritizing
- Pareto Charts
- Cause-and-Effect or Fishbone Diagram
- Flowcharting
- Scatter Plots
- Box Plot
- Control Charts - Fundamental Concepts and Key Terms
- Analysis of Quality Control Materials
- Analysis of Spiked (Fortified) Samples
- Introduction to Constructing and Interpreting Control Charts
- Shewhart Charts
- Interpretation of X-charts and X-bar Charts
- Range Chart
- Module Quiz