123. Confidence Intervals

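ANOVA (Analysis of Variance) - a hypothesis test for whether the means of two or more groups differ
1. Question: Can't we just run three t-tests instead? No. This is the Multiple Comparison Problem: the more groups there are, the more pairwise tests are needed, and the chance of at least one false positive grows. With three tests at a 5% significance level (treated as independent), that chance is 1 - 0.95^3, roughly 14%.
2. One-Way vs. Two-Way: if there is only one independent variable, the test is one-way; if there are two independent variables, it is two-way.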
  1. ex. One-Way: Testing the relationship between shoe brand (Nike, Adidas, Saucony, Hoka) and race finish times in a marathon.
  2. Two-Way: Testing the relationship between shoe brand (Nike, Adidas, Saucony, Hoka), runner age group (junior, senior, master’s), and race finishing times in a marathon.
3. F-Statistic
  
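  If the F-value is high:
  1. The numerator (the variance between groups) is large and the denominator (the variance within groups) is small.
  2. In other words, the group means probably differ.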
    
    https://tensorflow.blog/f-값-유도식/
4. F-Stat by scipy
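
The notes do not include the scipy code itself, so here is a minimal sketch of what it might look like. The finish times are made-up data for three of the shoe brands from the example above, and `f_statistic` is a hypothetical helper that computes F by hand (between-group mean square divided by within-group mean square) so it can be compared with `scipy.stats.f_oneway`.

```python
import numpy as np
from scipy import stats

# Made-up marathon finish times (minutes) for three shoe brands.
rng = np.random.default_rng(0)
nike = rng.normal(240, 15, size=30)
adidas = rng.normal(245, 15, size=30)
hoka = rng.normal(255, 15, size=30)
groups = [nike, adidas, hoka]


def f_statistic(groups):
    """Hypothetical helper: F = between-group mean square / within-group mean square."""
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total number of observations
    grand_mean = np.concatenate(groups).mean()

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


f_manual = f_statistic(groups)
f_scipy, p_value = stats.f_oneway(*groups)

print(f"manual F = {f_manual:.4f}")
print(f"scipy  F = {f_scipy:.4f}, p-value = {p_value:.4f}")
```

A small p-value suggests that at least one group mean differs from the others; it does not say which one, so a follow-up (post hoc) comparison is still needed.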
Many Samples
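1. Law of Large Numbers - as the sample size grows, the sample statistics get closer and closer to the population parameters.
Central Limit Theorem (CLT) - as the sample size grows, the distribution of the sample mean approaches a normal distribution, even when the population itself is not normal.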
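
Both statements above can be checked with a small simulation. This is only a sketch: the exponential population with mean 2.0, the sample sizes, and the helper names `sample_means` and `skewness` are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
population_mean = 2.0  # assumed: a skewed (exponential) population with mean 2.0

# Law of Large Numbers: the sample mean approaches the population mean as n grows.
means_by_n = {}
for n in (10, 1_000, 100_000):
    sample = rng.exponential(scale=population_mean, size=n)
    means_by_n[n] = sample.mean()
    print(f"n = {n:>7}: sample mean = {means_by_n[n]:.4f}")


# Central Limit Theorem: the distribution of the sample mean becomes
# more symmetric (closer to normal) as the sample size n grows.
def sample_means(n, repeats=5_000):
    """Draw `repeats` samples of size n and return the mean of each."""
    draws = rng.exponential(scale=population_mean, size=(repeats, n))
    return draws.mean(axis=1)


def skewness(x):
    """Sample skewness; 0 for a perfectly symmetric distribution."""
    z = (x - x.mean()) / x.std()
    return (z ** 3).mean()


skew_small = skewness(sample_means(2))
skew_large = skewness(sample_means(200))
print(f"skewness of sample means, n = 2:   {skew_small:.3f}")
print(f"skewness of sample means, n = 200: {skew_large:.3f}")
```

The skewness of the sample means shrinking toward 0 is one rough sign that their distribution is becoming normal-shaped; a histogram would show the same thing visually.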
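Confidence level - a 95% confidence level means that if we drew 100 samples and built an interval from each one, about 95 of those intervals would contain the population mean.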

ex. The mean height of 1,000 third-grade elementary school students

The wider the estimated interval, the higher the probability that it is correct (the confidence level).
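
The "about 95 out of 100" reading of the confidence level can be checked by simulation. The sketch below assumes a normal population of heights with mean 130 cm and standard deviation 6 cm (made-up values), draws many samples, builds a t-based 95% interval from each, and counts how often the interval contains the true mean.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Assumed population parameters (made up for illustration).
true_mean, true_sd = 130.0, 6.0
n, repeats, confidence = 50, 2_000, 0.95

# Each row is one sample of size n.
samples = rng.normal(true_mean, true_sd, size=(repeats, n))
means = samples.mean(axis=1)
sems = samples.std(axis=1, ddof=1) / np.sqrt(n)  # standard error of each mean

# Two-sided critical value of the t distribution with n - 1 degrees of freedom.
t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)
lower = means - t_crit * sems
upper = means + t_crit * sems

# Fraction of intervals that contain the true population mean.
coverage = np.mean((lower <= true_mean) & (true_mean <= upper))
print(f"{coverage:.1%} of the {repeats} intervals contain the true mean")
```

The printed fraction should land near 95%, though it will not be exactly 95% in any single run.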
Confidence interval

How to compute a CI
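
The calculation itself is missing from these notes, so the following is one common approach rather than necessarily the one originally shown: a t-based interval for the mean, sample mean ± t × (s / √n), where s is the sample standard deviation. The heights are made-up data echoing the 1,000 third-graders example, and `mean_confidence_interval` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Made-up heights (cm) for 1,000 third-grade students.
heights = rng.normal(loc=130, scale=6, size=1000)


def mean_confidence_interval(data, confidence=0.95):
    """Hypothetical helper: t-based confidence interval for the mean."""
    data = np.asarray(data, dtype=float)
    n = data.size
    mean = data.mean()
    sem = data.std(ddof=1) / np.sqrt(n)  # standard error of the mean
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    margin = t_crit * sem
    return mean - margin, mean + margin


# A higher confidence level gives a wider interval.
intervals = {}
for level in (0.90, 0.95, 0.99):
    intervals[level] = mean_confidence_interval(heights, level)
    low, high = intervals[level]
    print(f"{level:.0%} CI: ({low:.2f}, {high:.2f}), width = {high - low:.2f}")

# Cross-check against scipy's built-in interval for the t distribution.
scipy_low, scipy_high = stats.t.interval(
    0.95, df=heights.size - 1, loc=heights.mean(), scale=stats.sem(heights)
)
print(f"scipy 95% CI: ({scipy_low:.2f}, {scipy_high:.2f})")
```

The widening from 90% to 99% matches the point made earlier: a wider interval buys a higher confidence level.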

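Summary:
1. ANOVA: tests whether the means of two or more groups differ
  1. F-Stat: variance between groups / variance within groups
2. Law of Large Numbers: the larger the sample, the closer its statistics get to the population parameters
3. CLT: the larger the sample, the closer the distribution of the sample mean gets to a normal distribution
4. Confidence Intervals: the range that, at a 95% confidence level, would contain the population parameter (for example, the mean) in about 95% of repeated samples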