Genome-Wide Association Study: December 2008

Monday, December 15, 2008

RMA

RMA (Robust Multi-array Analysis)
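RMA is a method for normalizing and summarizing the signal intensities of Affymetrix GeneChips at the probe level. Starting from probe-level data, the PM values are background-corrected, then normalized, and finally summarized into expression measures. It consists of the following three steps.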

Background correction
Background correction is the most important step in probe-level processing. The background correction used in RMA is a non-linear correction carried out per chip, based on the distribution of PM values across the probes on the Affymetrix chip. The observed PM value is a mixture of true signal and background, where the background arises from optical noise, non-specific binding, and so on. The corrected value is estimated as the expectation of the signal (S) conditioned on the observed PM value (O), using kernel density estimation in both GeneSpring GX 7.3.1 and GeneSpring GX 9.0. However, GeneSpring GX 7.3.1 uses direct convolution while GeneSpring GX 9.0 uses a Fast Fourier Transform.
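The conditional expectation E[S | O] can be sketched as follows. This is a minimal illustration of the commonly published RMA convolution model (exponential signal with rate alpha plus normal background with mean mu and standard deviation sigma), not the GeneSpring code. The function name and the parameter values in the usage line are assumptions, and the parameters are taken as given here rather than estimated from a kernel density.

```python
import math

def _phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def _Phi(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def rma_background_adjust(pm, mu, sigma, alpha):
    """Return E[S | O = o] for each observed PM intensity o.

    Assumes O = S + B with S ~ Exponential(alpha) and B ~ Normal(mu, sigma^2).
    mu, sigma and alpha are supplied by the caller (hypothetical values);
    a real implementation estimates them from the PM distribution per chip.
    """
    adjusted = []
    for o in pm:
        a = o - mu - alpha * sigma ** 2
        b = sigma
        num = _phi(a / b) - _phi((o - a) / b)
        den = _Phi(a / b) + _Phi((o - a) / b) - 1.0
        adjusted.append(a + b * num / den)
    return adjusted

# Illustrative call with made-up parameters
corrected = rma_background_adjust([60, 100, 150, 400, 1000],
                                  mu=100.0, sigma=20.0, alpha=0.01)
```

Note that the corrected values stay positive even for PM values below the background mean, which is why this correction is preferred over simply subtracting a background estimate.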

Normalization
Normalization is necessary so that multiple chips can be compared with each other and analyzed together. The normalization procedure aims to make the intensity distributions identical across arrays. The normalization used in RMA is quantile normalization, which usually gives very sharp normalizations. Both GeneSpring GX 7.3.1 and GeneSpring GX 9.0 use quantile normalization. Note that, in this procedure, all the arrays are used and no chip is discarded based on extreme value considerations.
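Quantile normalization can be sketched in a few lines. The version below is a simplified illustration (ties are broken by position rather than averaged, and the function name is an assumption): each array is sorted, the sorted values are averaged rank by rank across arrays, and every value is replaced by the average for its rank.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of equally long intensity lists (one per chip)."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")

    # Reference distribution: mean of the k-th smallest value across chips
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]

    normalized = []
    for a in arrays:
        # Indices of the probes ordered from lowest to highest intensity
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized

chips = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
# chips == [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After this step every chip has exactly the same set of values, only assigned to different probes, which is what "identical distributions across arrays" means in practice.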

Summarization
Once the probe-level PM values have been background-corrected and normalized, they need to be summarized into expression measures, so that the result is a single expression measure per probe set, per chip. The summarization is motivated by the assumption that the observed log-transformed PM values follow a linear additive model containing a probe affinity effect, a gene-specific effect (the expression level), and an error term. For RMA, the probe affinity effects are assumed to sum to zero, and the gene effect (expression level) is estimated using median polishing. Median polishing is a robust model-fitting technique that protects against outlier probes. Both GeneSpring GX 7.3.1 and GeneSpring GX 9.0 use the same methodology for summarization.
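The additive model and its robust fit can be illustrated with a small median polish on one probe set. This is a sketch of Tukey's median polish under assumed conventions (rows are probes, columns are chips, values are log2 intensities); the function name, iteration limit, and tolerance are illustrative choices, not those of any particular package.

```python
from statistics import median

def median_polish(matrix, max_iter=10, tol=1e-6):
    """Fit value[i][j] = overall + probe_effect[i] + chip_effect[j] + residual.

    Returns (overall, probe_effects, chip_effects, residuals).
    The expression measure of chip j is overall + chip_effects[j].
    """
    resid = [row[:] for row in matrix]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols

    for _ in range(max_iter):
        # Sweep out row (probe) medians
        row_med = [median(r) for r in resid]
        for i in range(n_rows):
            resid[i] = [v - row_med[i] for v in resid[i]]
            row_eff[i] += row_med[i]
        shift = median(col_eff)
        col_eff = [c - shift for c in col_eff]
        overall += shift

        # Sweep out column (chip) medians
        col_med = [median(resid[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for j in range(n_cols):
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
            col_eff[j] += col_med[j]
        shift = median(row_eff)
        row_eff = [r - shift for r in row_eff]
        overall += shift

        if max(abs(m) for m in row_med + col_med) < tol:
            break
    return overall, row_eff, col_eff, resid

# Three probes on three chips; the 20.0 is a deliberate outlier probe value
probe_set = [[7.0, 7.5, 20.0],
             [8.0, 8.5, 10.0],
             [9.0, 9.5, 11.0]]
overall, probes, chips, _ = median_polish(probe_set)
expression = [overall + c for c in chips]
```

In this toy probe set the outlier ends up in the residuals instead of shifting the estimate for the third chip, which is the protection against outlier probes described above.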

Sunday, December 14, 2008

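affy chip data analysis

Analysis of Affymetrix chip data

The raw CEL files from Affymetrix chips are normalized with the RMA method.

Why? Studies have found RMA to be the best method for looking at expression differences between groups.

Run RMA with the affy package of R/Bioconductor; this performs background correction, quantile normalization, and median polishing. Quantile normalization works across chips, while median polishing works per gene (per probe set), summarizing the probes of each probe set.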

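At this stage, use the Affymetrix-related Bioconductor packages to run various QC checks.

MA plots, RLE, NUSE, and so on.

The subsequent expression comparison

can probably be handled with the limma package or maanova.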

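Thursday, December 11, 2008

Is further normalization needed after RMA/GCRMA normalization?

RMA/GCRMA normalization is performed at the sample level. Applying a per-chip normalization to chip data already processed this way is therefore redundant. What should be applied instead is a per-gene normalization.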
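As a rough sketch of what a per-gene step could look like, the snippet below centers each gene on its median across chips. It assumes that "per gene" means normalizing each gene to its own median on the log2 scale; the function name and the gene identifiers are hypothetical.

```python
from statistics import median

def per_gene_center(log_expr):
    """Center each gene on its median across chips.

    log_expr maps a gene (probe set) ID to a list of log2 expression
    values, one per chip, already RMA/GCRMA-normalized.
    """
    centered = {}
    for gene, values in log_expr.items():
        m = median(values)
        centered[gene] = [v - m for v in values]
    return centered

centered = per_gene_center({
    "geneA": [8.0, 9.0, 10.0],
    "geneB": [4.0, 4.5, 6.0],
})
# centered == {"geneA": [-1.0, 0.0, 1.0], "geneB": [-0.5, 0.0, 1.5]}
```

Each value then reads as a log2 fold change relative to the typical level of that gene, so genes with very different baseline expression become comparable.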

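Wednesday, December 3, 2008

What do the p-values mean when --linear or --logistic is used with --covar?

If --covar is used together with --linear/--logistic, plink performs a multiple regression and reports the regression coefficient and p-value for the SNP, for each covariate, and for any interaction terms. Only the intercept term is left out.

The p-value on a covariate row is not the p-value of the SNP-phenotype association after adjusting for that covariate. The ADD row is the one that gives the p-value of the SNP-phenotype association after the covariates have been taken into account. The rows after it give the p-values for the association between each covariate and the phenotype. Such a value can come out extremely significant, but that does not mean the SNP has a highly significant effect.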

CHR SNP BP A1 TEST NMISS BETA STAT P
1 rs1234567 742429 G ADD 1495 -0.03335 -0.1732 0.8625
1 rs1234567 742429 G COV1 1495 0.1143 9.748 8.321e-022

In the example above, the covariate is very strongly associated with the phenotype, but that is no evidence that the SNP is associated with the phenotype.
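The same pattern can be reproduced with a small simulation. The sketch below is not how plink computes its output; it is an ordinary least-squares fit written from scratch on synthetic data, in which the phenotype depends on the covariate but not on the SNP. The function names, sample size, effect size, and seed are all assumptions chosen for illustration.

```python
import math
import random

def ols_t_stats(X, y):
    """Least-squares fit; returns (coefficients, t statistics).

    X is a list of rows that already include the intercept column.
    """
    n, p = len(X), len(X[0])
    xtx = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
           for i in range(p)]
    xty = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]

    # Invert X'X by Gauss-Jordan elimination with partial pivoting
    aug = [xtx[i] + [1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        d = aug[c][c]
        aug[c] = [v / d for v in aug[c]]
        for r in range(p):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    inv = [row[p:] for row in aug]

    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [y[k] - sum(X[k][j] * beta[j] for j in range(p)) for k in range(n)]
    s2 = sum(r * r for r in resid) / (n - p)          # residual variance
    se = [math.sqrt(s2 * inv[i][i]) for i in range(p)]
    return beta, [b / s for b, s in zip(beta, se)]

def simulate(n=500, seed=0):
    """Synthetic data: phenotype = 0.5 * covariate + noise, no SNP effect."""
    rng = random.Random(seed)
    snp = [rng.choice([0, 1, 2]) for _ in range(n)]   # additive genotype coding
    cov = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [0.5 * c + rng.gauss(0.0, 1.0) for c in cov]
    X = [[1.0, float(s), c] for s, c in zip(snp, cov)]
    _, t = ols_t_stats(X, y)
    return {"ADD": t[1], "COV1": t[2]}

stats = simulate()
print(stats)
```

The COV1 statistic comes out far from zero while the ADD statistic does not, mirroring the two rows in the output above: a significant covariate row says nothing about the SNP.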
