====== Where to find summary stats ======

Summary statistics for many GWASes are publicly available. One way to find them is through EBI's [[https://www.ebi.ac.uk/gwas/|GWAS catalog]]. Another large public repository is http://ldsc.broadinstitute.org/gwashare/.

The biostats group maintains NORMENT's own GWAS summary statistics repository. Its purpose is to keep historical copies of specific summary statistics versions that were used in publications, to facilitate the use of summary statistics tables via format standardization, and to store early-access or non-public summary statistics we receive through collaboration.

For data received through collaboration, it is important to check whether collaborators require additional steps to authorize use of the data in a publication. There is one exception to this rule: noTOP and noDEMGENE versions of publicly available datasets can be used in publications, provided the primary consortium paper is cited.


===== NOTE: The consortia below have restrictions on sumstats/data use and sharing =====

**23andMe** – Any manuscript that uses sumstats including 23andMe participants must be sent to 23andMe (publication-review@23andme.com) before submission to a journal. Use of sumstats must comply with application directives.

**UKB** – when UKB data is used, the manuscript must be shared with the UKB (access@ukbiobank.ac.uk) before submission to a journal. Additionally, use of data must comply with application directives. Derived data + code + published manuscript must be shared with UKB within 6 months of publication.

**Steps for sharing with UKB:**
  * Log in to the AMS system: https://bbams.ndph.ox.ac.uk/ams/
  * On the left, go to the "Projects" tab
  * Then open the "Admin" tab at the top
  * At the bottom of the "Admin" page, press the "Go to Upload site" button
  * Select the type of data you are going to upload ("Manuscript", "Code" etc.)
  * Enter the requested details and complete the upload.

**CHARGE** (cognition) – use of sumstats must be cleared by the CHARGE consortium before submission of the manuscript for publication.

**MVP** – sumstats are restricted to use on TSD and should not be shared elsewhere.
===== Repositories =====

You need to be aware of three GWAS repositories: the current, the old one, and the very old one.

  * current repo was established in April 2021. It is maintained within the TSD p697 project and rsync-ed to p33 by cron jobs. As of April 2021 it uses the same pipeline as the old repository (i.e. python_convert/sumstats.py), but this may change in the future. This repository is synced to neither NIRD nor MMIL. It contains all types of summary statistics (public, received through collaboration, and restricted access) and is documented in the [[https://docs.google.com/spreadsheets/d/19cURugXQQyLgfLU-gwCReuWK99DcpODOpSyDkaR-bow/|MMIL-Oslo GWAS Inventory v2]] google spreadsheet. To include new summary stats in this repository please fill in [[https://docs.google.com/forms/d/e/1FAIpQLSd2wNKmrZlymlc4j8ByM9m3d1XCtyAI-U7h5uBwMJ3azvjkxA/viewform|the form]].
  * old repo was established in 2016 and maintained at UCSD MMIL servers until 2021, with rsync to NIRD and TSD. It used exactly the same google form / MMIL inventory spreadsheet, but it is no longer updated with data submitted after ~April 2021.
  * very old repo was used until 2016.

==== 1. Current repository ====

<code>
Location at p697 project (master copy):
    /cluster/projects/p697/projects/SUMSTATv2_RAW   # original sumstats 
    /cluster/projects/p697/projects/SUMSTATv2       # harmonized sumstats

Location at p33 project (read-only copy):
    /cluster/projects/p33/groups/biostat/SUMSTATv2_RAW
    /cluster/projects/p33/groups/biostat/SUMSTATv2

(!) Any changes to SUMSTATv2_RAW / SUMSTATv2 need to be done in p697 project
(!) Changes to these folders within p33 will be overwritten by cron jobs
(!) Any new files added within p33 folders will be removed by cron jobs
</code>


Folder names in the current repository follow the same naming convention as in the old repository.


==== 2. Old repository ====

Old repository is hosted at MMIL servers in UCSD (''ip113.ucsd.edu''), and replicated to NIRD (''login.nird.sigma2.no'', formerly known as NORSTORE), and TSD p33 (https://www.uio.no/english/services/it/research/sensitive-data/).

It was documented in [[https://docs.google.com/spreadsheets/d/19cURugXQQyLgfLU-gwCReuWK99DcpODOpSyDkaR-bow/|MMIL-Oslo GWAS Inventory v2]] google spreadsheet, but any sumstats submitted after ~April 2021 are not processed.

<code>
Location at MMIL server:
    /space/syn03/1/data/GWAS/SUMSTAT

Location at NIRD server:
    /projects/NS9114K/MMIL/SUMSTAT
    
Location at TSD server:
    /tsd/p33/data/durable/s3-api/mmil/SUMSTAT
    /tsd/p33/data/durable/s3-api/mmil2/SUMSTAT_RESTRICTED 
    (data from collaboration with restricted access, TSD only)    
</code>

Inside the ''SUMSTAT'' folder you will find:
  - ''RAW'' folder contains raw GWAS data files downloaded from the different consortia, organized into folders with standard names ''<CONSORTIUM>_<TRAIT>_<YEAR>_<XXX>''
  - ''STD'' folder contains processed ''.csv.gz'' files where all column names are standardized but no other adjustments are made. The set of SNPs or markers corresponds precisely to the original files. You should expect the same number of lines in the raw GWAS data files and in the processed ''.csv.gz'' files. The set of columns available for each summary statistic is shown below in table //COLUMNS AVAILABLE//.
  - ''ANALYSIS'' folder contains GWAS Manhattan plots and tables with genome-wide significant loci, lead variants, and independent significant variants (see below for definitions). In addition, the ''ANALYSIS'' folder contains LD score regression results (genetic correlation among all pairs of traits, heritability and partitioned heritability estimates). Use this data with caution.
  - ''TMP/mat_9545380'' contains summary stats aligned with mat_9545380 template (MATLAB format). Older references (mat_2m and mat_9m) are deprecated and should no longer be used.

For each summary statistic we adopt the following naming convention:

  /projects/NS9114K/MMIL/SUMSTAT/RAW/<CONSORTIUM>_<TRAIT>_<YEAR>_<XXX>
 
which applies to folders under ''RAW'' and to trait names under ''STD'', ''MAT_2M'' and ''MAT_9M''.
Each piece of GWAS data resides inside a directory named by cohort, trait and year. The ''<CONSORTIUM>'' field may also indicate the name of a group, etc. The ''<XXX>'' field holds additional information that may be needed to make the directory name unique.
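
The naming convention above can be split back into its parts with a few lines of Python. This is only a sketch: the helper ''parse_sumstat_name'' and the ''SumstatName'' tuple are hypothetical names, not part of sumstats.py, and the rule "a four-digit token starting with 19 or 20 is the year" is an assumption. Some existing entries (e.g. ''PGC_SCZ_0917'') do not carry a four-digit year, in which case everything after the trait is returned as extra information.

<code python>
import re
from typing import NamedTuple, Optional


class SumstatName(NamedTuple):
    """Parts of a <CONSORTIUM>_<TRAIT>_<YEAR>_<XXX> name (hypothetical helper type)."""
    consortium: str
    trait: str
    year: Optional[int]
    extra: str


def parse_sumstat_name(name: str) -> SumstatName:
    """Split a repository folder/file stem into consortium, trait, year and extra info."""
    parts = name.split("_")
    if len(parts) < 2:
        raise ValueError(f"expected at least <CONSORTIUM>_<TRAIT>, got {name!r}")
    consortium, trait, rest = parts[0], parts[1], parts[2:]
    year = None
    # Assumption: the year, when present, is a four-digit token right after the trait.
    if rest and re.fullmatch(r"(19|20)\d{2}", rest[0]):
        year = int(rest[0])
        rest = rest[1:]
    return SumstatName(consortium, trait, year, "_".join(rest))


if __name__ == "__main__":
    print(parse_sumstat_name("PGC_SCZ_2014_meta51_noTOP_lift"))
    print(parse_sumstat_name("UKB_RT_2016"))
</code>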

==== 3. Very old repository ====

The very old repository is documented in the [[https://www.dropbox.com/s/klikpvc7w9lr3pt/MMIL-Oslo%20GWAS_Inventory.xlsx?dl=0|MMIL-Oslo GWAS_Inventory.xlsx]] file. To access this repository from NIRD you should replace the ''/space/syn03/1/data/GWAS'' path from the ''MMIL-Oslo GWAS_Inventory.xlsx'' file with ''/projects/NS9114K/MMIL/''. To set up your access to NIRD follow [[https://github.uio.no/norment/genodoc/wiki/How-to-apply-for-Norstor-and-Nortur-accounts|this guide]].

<code>
Location at MMIL server:
    /space/syn03/1/data/GWAS/GWAS_Original_Summary_Stats
    /space/syn03/1/data/GWAS/old_25_SNPs_processed_summary
    /space/syn03/1/data/GWAS/new_9m_SNPs_processed_summary

Location at NIRD:
    /projects/NS9114K/MMIL/GWAS_Original_Summary_Stats
    /projects/NS9114K/MMIL/old_25_SNPs_processed_summary
    /projects/NS9114K/MMIL/new_9m_SNPs_processed_summary
    
Location at TSD:
    /tsd/p33/data/durable/s3-api/mmil/GWAS_Original_Summary_Stats
    /tsd/p33/data/durable/s3-api/mmil/old_25_SNPs_processed_summary
    /tsd/p33/data/durable/s3-api/mmil/new_9m_SNPs_processed_summary
</code>
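
The path substitution described above is a plain prefix replacement. A minimal sketch, using only the root folders listed in this section; the function name ''translate_path'' and the ''SITE_ROOTS'' table are made up for illustration:

<code python>
# Root of the very old repository as written in MMIL-Oslo GWAS_Inventory.xlsx.
MMIL_ROOT = "/space/syn03/1/data/GWAS"

# Corresponding roots on the other servers (taken from the listing above).
SITE_ROOTS = {
    "NIRD": "/projects/NS9114K/MMIL",
    "TSD": "/tsd/p33/data/durable/s3-api/mmil",
}


def translate_path(mmil_path: str, site: str) -> str:
    """Rewrite a path from the inventory file so that it points to NIRD or TSD."""
    if mmil_path != MMIL_ROOT and not mmil_path.startswith(MMIL_ROOT + "/"):
        raise ValueError(f"{mmil_path!r} is not located under {MMIL_ROOT}")
    return SITE_ROOTS[site] + mmil_path[len(MMIL_ROOT):]


if __name__ == "__main__":
    print(translate_path(MMIL_ROOT + "/new_9m_SNPs_processed_summary", "NIRD"))
</code>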


===== How to use the GWAS/SUMSTAT repositories =====

If you are searching for specific summary stats, please check both inventory spreadsheets: 
[[https://docs.google.com/spreadsheets/d/19cURugXQQyLgfLU-gwCReuWK99DcpODOpSyDkaR-bow/edit|MMIL-Oslo GWAS Inventory v2]] and [[https://www.dropbox.com/s/klikpvc7w9lr3pt/MMIL-Oslo%20GWAS_Inventory.xlsx?dl=0|MMIL-Oslo GWAS_Inventory.xlsx]].

To get a .MAT file (Matlab format), download it either from the MMIL server ip113.ucsd.edu or from NIRD; both contain exactly the same data. Your starting point is ''/space/syn03/1/data/GWAS/SUMSTAT'' (at MMIL) or ''/projects/NS9114K/MMIL/'' (at NIRD).

If the summary stats that you need for your project are not present in the inventory files, please describe the GWAS using [[https://docs.google.com/forms/d/e/1FAIpQLSd2wNKmrZlymlc4j8ByM9m3d1XCtyAI-U7h5uBwMJ3azvjkxA/viewform|this form]], and send me (<oleksandr.frei@gmail.com>) instructions on how to download the raw data, e.g. a text file shared by the consortium or through collaboration. I'll be happy to include it in the new GWAS/SUMSTAT repository. The same applies to summary stats that are present in the old repository but not in the new one: if you need them for an ongoing project, it is a good idea to also include them in the new repository. Conversely, summary stats included in the new GWAS/SUMSTAT repository might not be copied to the old repository.

If you have manually converted raw summary stats to .MAT files it is possible to include them in the original GWAS summary stats repository, but not in the new GWAS/SUMSTAT repository. The reason is that all .MAT files in the new repository must be created by an automated Makefile script and sumstats.py tool. This makes it easy to re-generate all .MAT files if we decide to include more data (such as nvec). 

If you need LD score regression results, please check ''SUMSTATS/LDSR/LDSR_Results'' folder.

If you need the data for custom analysis, perhaps outside of Matlab, please check the SUMSTATS/STD folder. The files in this folder contain the same set of SNPs as the original summary statistics file, but with standardized column names and chromosome labels. For old summary stats, CHR:BP is lifted to the hg19 build and the rs# is lifted to build 149 of NCBI dbSNP.
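
As an illustration of such a custom analysis, the sketch below reads a standardized file with the Python standard library only. It assumes the STD files are gzip-compressed, tab-separated text with a header line, and that the six columns listed as always available in //Columns available// are present. The function name ''iter_sumstats'' and the example path are hypothetical.

<code python>
import csv
import gzip

# Columns documented as available in all standardized files.
REQUIRED_COLUMNS = ("SNP", "CHR", "BP", "PVAL", "A1", "A2")


def iter_sumstats(path):
    """Yield one dict per variant from a gzipped, tab-separated sumstats file."""
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
        if missing:
            raise ValueError(f"{path}: missing columns {missing}")
        for row in reader:
            yield row


if __name__ == "__main__":
    # Hypothetical path; adjust it to the server and file you are working with.
    path = "STD/PGC_SCZ_2014.csv.gz"
    n_significant = sum(1 for row in iter_sumstats(path) if float(row["PVAL"]) < 5e-8)
    print(n_significant, "genome-wide significant variants")
</code>

Streaming row by row keeps memory use low for the larger files (some contain close to 30 million variants); for interactive work a data-frame library may be more convenient.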

If you need to edit or delete an existing GWAS entry in [[https://docs.google.com/spreadsheets/d/19cURugXQQyLgfLU-gwCReuWK99DcpODOpSyDkaR-bow/edit|MMIL-Oslo GWAS Inventory v2]], please contact <oleksandr.frei@gmail.com>.

===== Cohort =====

  * 23andMe = 23andMe, https://www.23andme.com/en-int/
  * CHARGE = Cohorts for Heart and Aging Research in Genomic Epidemiology, http://www.chargeconsortium.com/
  * COGENT = The Cognitive Genomics Consortium
  * CTG = Complex Traits Genetics https://ctg.cncr.nl/software/summary_statistics
  * DIAGRAM = DIAbetes Genetics Replication And Meta-analysis
  * ENIGMA = Enhancing Neuroimaging Genetics Through Meta-Analysis http://enigma.ini.usc.edu/
  * GIANT = Genetic Investigation of ANthropometric Traits
  * IGAP = International Genomics of Alzheimer's Project
  * LIPIDS = Global Lipids Genetics Consortium, http://csg.sph.umich.edu/abecasis/public/lipids2013/
  * NORMENT = Norwegian Centre for Mental Disorders Research, https://www.med.uio.no/norment/english/
  * PGC = Psychiatric Genomics Consortium, https://www.med.unc.edu/pgc/results-and-downloads
  * SSGAC = Social Science Genetic Association Consortium http://www.thessgac.org/
  * UKB = UK Biobank
  * GAMEON =  Genetic Associations and Mechanisms in Oncology https://epi.grants.cancer.gov/gameon/
  * COGS = Collaborative Oncological Gene-Environment Study http://www.cogseu.org/
  * CARDIOGRAM = Coronary ARtery DIsease Genome wide Replication and Meta-analysis (CARDIoGRAM) plus The Coronary Artery Disease (C4D) Genetics http://www.cardiogramplusc4d.org/
  * OKADA = https://www.ncbi.nlm.nih.gov/pubmed/24390342
  * EAGLE = EArly Genetics and Lifecourse Epidemiology, http://www.tweelingenregister.org/EAGLE/ 
  * XXX = Unknown

===== Phenotype abbreviations =====

  * COG = Cognition
  * T2D = Type 2 Diabetes
  * BIP = Bipolar Disorder
  * SCZ = Schizophrenia
  * ADHD = Attention-Deficit/Hyperactivity Disorder
  * AD = Alzheimer's Disease
  * ASD = Autism Spectrum Disorder
  * MDD = Major Depressive Disorder
  * RT = Reaction Time
  * VNR = Verbal Numeric Reasoning
  * BMI = Body Mass Index
  * WHR = Waist Hip Ratio
  * HEIGHT = Height
  * HDL = High-density lipoprotein cholesterol
  * LDL = Low-density lipoprotein cholesterol
  * TG = Triglycerides
  * TC = Total Cholesterol
  * ICV = Intracranial Volume
  * EDU = Educational Attainment
  * IQ = Intelligence Quotient
  * CRP = C-Reactive Protein
  * BREAST = Breast Cancer
  * COLON = Colorectal cancer
  * OVARIAN = Ovarian cancer
  * PROSTATE = Prostate cancer
  * LUNG = Lung cancer
  * SWB = Subjective Well Being
  * SLEEP = Sleep Duration
  * DEPRESSIVE = Depressive Symptoms
  * RA = Rheumatoid Arthritis
  * CAD = Coronary artery disease

===== Columns available =====

Columns 'SNP', 'CHR', 'BP', 'PVAL', 'A1' and 'A2' are available in all files.
The remaining columns vary as shown in the table below.
The numbers in the 'N', 'NCASE' and 'NCONT.' columns indicate the largest sample size across all SNPs,
while the actual per-SNP sample size is given in the summary stats file.

<code>
file                                            	#snp     	N	NCASE	NCONT.	Z	OR	BETA	LOGODDS	SE	INFO	FRQ	NSTUDY	DIRECTION
23andMe_AGREE_2016                                	13794598 	59225	n/a	n/a	-	-	YES	-	YES	-	-	-	-
23andMe_COLLEGE_2016                              	15635593 	76155	n/a	n/a	-	-	YES	-	YES	-	-	-	-
23andMe_CONSC_2016                                	13794598 	59225	n/a	n/a	-	-	YES	-	YES	-	-	-	-
23andMe_EDU_2016                                  	15635593 	76155	n/a	n/a	-	-	YES	-	YES	-	-	-	-
23andMe_EXTRA_2016                                	13794598 	59225	n/a	n/a	-	-	YES	-	YES	-	-	-	-
23andMe_FemalePub_2015                            	13794598 	76831	n/a	n/a	-	-	YES	-	YES	-	-	-	-
23andMe_GA_2017                                   	15635593 	43568	n/a	n/a	-	-	YES	-	YES	-	-	-	-
23andMe_MDD_2016                                  	15635593 	n/a	75607	231747	-	-	YES	-	YES	-	-	-	-
23andMe_MIG_2016_v2                               	17289972 	n/a	2766	13702	-	-	YES	-	YES	-	YES	-	-
23andMe_MalePub_2015                              	13794598 	55871	n/a	n/a	-	-	YES	-	YES	-	-	-	-
23andMe_NEUR_2016                                 	13794598 	59225	n/a	n/a	-	-	YES	-	YES	-	-	-	-
23andMe_OPEN_2016                                 	13794598 	59225	n/a	n/a	-	-	YES	-	YES	-	-	-	-
23andMe_PD_2017                                   	15635593 	n/a	6476	302042	-	-	YES	-	YES	-	-	-	-
23andMe_PreB_2017                                 	15635593 	43568	n/a	n/a	-	-	YES	-	YES	-	-	-	-
23andMe_SWB_2018                                  	15648586 	252053	n/a	n/a	-	-	YES	-	YES	-	-	-	-
BROADABC_ASB_2017_qc_lift                         	9545468  	16400	n/a	n/a	YES	-	-	-	-	-	-	-	-
CARDIOGRAM_CAD_2015                               	9455778  	n/a	60801	123504	-	-	YES	-	YES	YES	-	YES	-
CHARGE_COG_2015_lift                              	2477984  	53949	n/a	n/a	-	-	YES	-	YES	-	-	-	-
COGENT_COG_2017_lift                              	8191097  	35298	n/a	n/a	-	-	YES	-	YES	YES	-	-	-
COGENT_COG_2017_noCHARGE_lift                     	8114859  	27888	n/a	n/a	-	-	YES	-	YES	YES	-	-	-
COGS_PROSTATE_2013_lift                           	204318   	n/a	25074	24272	-	-	YES	-	YES	-	-	-	-
CTG_COG_2018                                      	9295118  	269867	n/a	n/a	YES	-	-	-	-	-	YES	-	-
CTG_COG_2018_noTOP                                	10971333 	34637	n/a	n/a	-	-	YES	-	YES	YES	YES	-	-
CTG_INSOMNIA_2017                                 	12444915 	n/a	32384	80622	-	YES	-	-	YES	YES	YES	-	-
CTG_INSOMNIA_2017_females                         	12432936 	n/a	19521	39846	-	YES	-	-	YES	YES	YES	-	-
CTG_INSOMNIA_2017_males                           	12428591 	n/a	12863	40776	-	YES	-	-	YES	YES	YES	-	-
CTG_INTELLIGENCE_2017                             	12104294 	78308	n/a	n/a	YES	-	-	-	-	-	-	-	-
DIAGRAM_T2D_2012_lift                             	2473039  	n/a	12171	56862	-	YES	-	-	-	-	-	-	-
DIAGRAM_T2D_2016_lift                             	15749266 	44414	n/a	n/a	-	-	-	-	-	-	YES	-	-
DIAGRAM_T2D_2017_lift                             	12335836 	n/a	26676	132532	-	-	YES	-	YES	-	-	-	-
EAGLE_ADHD_2016_noGC_lift                         	6007632  	17666	n/a	n/a	YES	-	-	-	-	-	-	-	-
ENIGMA_HV_2016_noPGC_noGC                         	8636348  	11621	n/a	n/a	-	-	YES	-	YES	-	YES	-	-
ENIGMA_ICV_2016_noPGC_noGC_lift                   	8700908  	9826	n/a	n/a	YES	-	YES	-	YES	-	YES	-	-
ENIGMA_PALL_2016_noPGC_noGC                       	8630495  	11595	n/a	n/a	-	-	YES	-	YES	-	YES	-	-
ENIGMA_PUT_2016_noPGC_noGC                        	8634199  	11598	n/a	n/a	-	-	YES	-	YES	-	YES	-	-
GAMEON_BREAST_2013_BCAC                           	11099926 	n/a	16062	46157	-	YES	YES	-	YES	-	YES	YES	-
GAMEON_COLON_2015_CORECT                          	6726427  	n/a	2465	1497	-	YES	YES	-	YES	-	YES	YES	-
GAMEON_LUNG_2014_TRICL_6study                     	8945892  	n/a	12160	16838	-	YES	-	-	YES	-	YES	YES	-
GAMEON_OVARIAN_2013_FOCI                          	2520017  	7272	n/a	n/a	-	YES	-	-	YES	-	-	-	-
GIANT_BMI_2015_EUR_lift                           	2554049  	322154	n/a	n/a	-	-	YES	-	YES	-	YES	-	-
GIANT_BMI_2018_UKB                                	2348397  	795640	n/a	n/a	-	-	YES	-	YES	-	-	-	-
GIANT_HEIGHT_2010_lift                            	2469135  	133859	n/a	n/a	-	-	-	-	-	-	-	-	-
GIANT_HEIGHT_2014_lift                            	2550282  	253280	n/a	n/a	-	-	YES	-	YES	-	-	-	-
GIANT_HEIGHT_2018_UKB                             	2334001  	709706	n/a	n/a	-	-	YES	-	YES	-	-	-	-
GIANT_WHR_2015_EUR                                	2560781  	212248	n/a	n/a	-	-	YES	-	YES	-	-	-	-
ICBP_DBP_2011                                     	2461325  	69395	n/a	n/a	-	-	-	-	-	-	-	-	-
ICBP_SBP_2011                                     	2461325  	69395	n/a	n/a	-	-	-	-	-	-	-	-	-
IGAP_AD_2013                                      	7055881  	n/a	17008	37154	-	-	YES	-	YES	-	-	-	-
IGAP_AD_2013_noAPOE                               	7053300  	n/a	17008	37154	-	-	YES	-	YES	-	-	-	-
IIBDGC_CD_2015_EUR                                	11002658 	n/a	5956	14927	-	YES	-	-	YES	YES	-	YES	-
IIBDGC_CD_2017_lift                               	9585986  	n/a	12194	34915	-	-	YES	-	YES	-	-	-	-
IIBDGC_IBD_2015_EUR                               	11555662 	n/a	12882	21770	-	YES	-	-	YES	YES	-	YES	-
IIBDGC_IBD_2017_lift                              	9750491  	n/a	25042	34915	-	-	YES	-	YES	-	-	-	-
IIBDGC_UC_2015_EUR                                	11113952 	n/a	6968	20464	-	YES	-	-	YES	YES	-	YES	-
IIBDGC_UC_2017_lift                               	9603251  	n/a	12366	34915	-	-	YES	-	YES	-	-	-	-
IPDGC_PD_2014_lift                                	7808621  	n/a	13708	95282	-	-	YES	-	YES	-	-	YES	-
LIPIDS_HDL_2013                                   	2447441  	187167	n/a	n/a	-	-	YES	-	YES	-	-	-	-
LIPIDS_LDL_2013                                   	2437751  	173082	n/a	n/a	-	-	YES	-	YES	-	-	-	-
LIPIDS_TC_2013                                    	2446981  	187365	n/a	n/a	-	-	YES	-	YES	-	-	-	-
LIPIDS_TG_2013                                    	2439432  	177860	n/a	n/a	-	-	YES	-	YES	-	-	-	-
MSA_MSA_2016_lift                                 	6110066  	n/a	800	1000	-	YES	-	-	YES	-	-	-	-
NORMENT_BIP_2017_TOP9                             	10717256 	n/a	313	4015	-	YES	-	-	YES	YES	-	-	-
OKADA_RA_2014_EUR                                 	8747962  	n/a	14361	43923	-	YES	-	-	-	-	-	-	-
PGC_ADHD_2012_lift                                	1206305  	n/a	2960	2455	YES	-	-	-	-	-	-	-	-
PGC_ADHD_2017_EUR                                 	8094094  	n/a	19099	34194	-	YES	-	-	YES	YES	-	-	-
PGC_AD_2018_biorxiv_feb                           	13367300 	452010	71880	383378	-	-	YES	-	YES	-	-	-	-
PGC_ASD_2017_CEU                                  	6440259  	13574	n/a	n/a	-	YES	YES	-	YES	YES	-	-	-
PGC_ASD_2017_WW                                   	6517324  	15954	n/a	n/a	-	YES	YES	-	YES	YES	-	-	-
PGC_ASD_2017_iPSYCH                               	9112386  	n/a	18381	27969	-	YES	-	-	YES	YES	-	-	-
PGC_BIP_2012_lift                                 	2426888  	n/a	7481	9250	-	YES	-	-	YES	YES	-	-	-
PGC_BIP_2016_meta30_noTOP_lift                    	8156384  	11343	19754	30692	YES	-	-	-	-	-	-	-	-
PGC_BIP_2016_qc                                   	13414630 	n/a	20352	31358	-	YES	-	-	YES	YES	-	-	-
PGC_BPD_2017_rietschel                            	10736316 	n/a	998	1545	-	YES	-	-	YES	YES	-	-	-
PGC_BPD_2017_rietschel_noCHR8INV                  	10711810 	n/a	998	1545	-	YES	-	-	YES	YES	-	-	-
PGC_MDD_2012_lift                                 	1234937  	n/a	9240	9519	-	YES	-	-	YES	YES	-	-	-
PGC_MDD_2018_no23andMe                            	13554550 	n/a	83836	177093	-	YES	-	-	YES	YES	-	-	YES
PGC_MDD_2018_no23andMe_noUKBB                     	12986965 	n/a	69576	161613	-	YES	-	-	YES	YES	-	-	YES
PGC_SCZ_0418b                                     	7585077  	73173	67390	94015	-	YES	-	-	YES	YES	-	-	-
PGC_SCZ_0917                                      	9546069  	56323	51900	71675	-	YES	-	-	YES	YES	-	-	-
PGC_SCZ_0917_trios_asia                           	8775879  	83989	78510	104163	-	YES	-	-	YES	YES	-	-	-
PGC_SCZ_2014                                      	9444230  	n/a	35476	46839	-	YES	-	-	YES	YES	-	-	-
PGC_SCZ_2014_0513a                                	17231173 	n/a	35476	46839	-	YES	-	-	YES	YES	-	-	-
PGC_SCZ_2014_EUR_qc                               	15358495 	n/a	33640	43456	-	YES	-	-	YES	YES	-	-	-
PGC_SCZ_2014_meta49_noCHARGE_lift                 	16011470 	n/a	34486	45271	-	-	YES	-	YES	-	-	-	-
PGC_SCZ_2014_meta51_noTOP_keepbadsnps_lift        	17224222 	n/a	35099	46436	-	-	YES	-	YES	-	-	-	-
PGC_SCZ_2014_meta51_noTOP_lift                    	16052697 	n/a	35099	46436	-	-	YES	-	YES	-	-	-	-
ReproGen_MENARCHE_2014_lift                       	2441314  	182416	n/a	n/a	-	-	YES	-	-	-	-	-	-
ReproGen_MENARCHE_2017_qc_lift                    	10816230 	370000	n/a	n/a	-	-	YES	-	-	-	-	-	-
ReproGen_MENOPAUSE_2015_lift                      	2418242  	69360	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_baso_N171846                    	29541357 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_baso_neut_sum_N170143           	29537789 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_baso_p_N171996                  	29540740 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_baso_p_gran_N170223             	29538069 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_eo_N172275                      	29542794 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_eo_baso_sum_N171771             	29542097 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_eo_p_N172378                    	29540782 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_eo_p_gran_N170536               	29539410 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_gran_N169822                    	29538626 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_gran_p_myeloid_wbc_N169545      	29537542 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_hct_N173039                     	29541453 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_hgb_N172925                     	29540596 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_hlr_N170761                     	29537454 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_hlr_p_N170763                   	29537196 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_irf_N170548                     	29536954 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_lymph_N171643                   	29541141 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_lymph_p_N171748                 	29540904 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_mch_N172332                     	29539687 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_mchc_N172851                    	29543114 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_mcv_N172433                     	29540259 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_mono_N170721                    	29539498 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_mono_p_N170494                  	29538704 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_mpv_N164454                     	29519497 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_myeloid_wbc_N169219             	29535591 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_neut_N170702                    	29538406 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_neut_eo_sum_N170384             	29539550 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_neut_p_N171542                  	29539874 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_neut_p_gran_N170672             	29539682 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_pct_N164339                     	29520298 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_pdw_N164433                     	29518092 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_plt_N166066                     	29522061 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_rbc_N172952                     	29543210 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_rdw_N171529                     	29541050 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_ret_N170641                     	29537018 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_ret_p_N170690                   	29537550 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SANGER_BLOOD_2016_wbc_N172435                     	29542755 	173480	n/a	n/a	-	-	YES	-	YES	-	-	-	-
SSGAC_COLLEGE_2013_lift                           	2321128  	95427	n/a	n/a	-	YES	-	-	YES	-	YES	-	-
SSGAC_CogPerf_2018                                	10098325 	257828	n/a	n/a	-	-	YES	-	YES	-	YES	-	-
SSGAC_DEPRESSIVE_2016                             	6524474  	161460	n/a	n/a	-	-	YES	-	YES	-	YES	-	-
SSGAC_EDU_2013_lift                               	2309708  	101069	n/a	n/a	-	-	YES	-	YES	-	YES	-	-
SSGAC_EDU_2016                                    	8146840  	293723	n/a	n/a	-	-	YES	-	YES	-	YES	-	-
SSGAC_EDU_2018_no23andMe                          	10101242 	766345	n/a	n/a	-	-	YES	-	YES	-	YES	-	-
SSGAC_NEUROTICISM_2016                            	6524432  	170911	n/a	n/a	-	-	YES	-	YES	-	YES	-	-
SSGAC_SWB_2016                                    	2268674  	298420	n/a	n/a	-	-	YES	-	YES	-	YES	-	-
TAG_CIGPERDAY_2010_lift                           	2458765  	68028	n/a	n/a	-	-	YES	-	YES	YES	YES	-	-
TAG_EVERSMOKE_2010_lift                           	2455496  	74035	n/a	n/a	-	-	YES	-	YES	YES	YES	-	-
TAG_FORMERSMOKE_2010_lift                         	2456202  	70675	n/a	n/a	-	-	YES	-	YES	YES	YES	-	-
TAG_SMOKEONSET_2010_lift                          	2457193  	47961	n/a	n/a	-	-	YES	-	YES	YES	YES	-	-
UKB_ALC_2017                                      	12935395 	112176	n/a	n/a	-	-	YES	-	YES	-	YES	-	-
UKB_CHRONOTYPE_2016                               	17032430 	128266	n/a	n/a	-	-	YES	-	YES	-	-	-	-
UKB_COLLEGE_2016                                  	17344347 	111114	n/a	n/a	-	-	YES	-	-	-	-	-	-
UKB_MEMORY_2016                                   	17344579 	112067	n/a	n/a	-	-	YES	-	-	-	-	-	-
UKB_RT_2016                                       	17344609 	111483	n/a	n/a	-	-	YES	-	-	-	-	-	-
UKB_SLEEP_2016                                    	16761225 	128266	n/a	n/a	-	-	YES	-	YES	-	-	-	-
UKB_VNR_2016                                      	17361492 	36035	n/a	n/a	-	-	YES	-	-	-	-	-	-
XXX_CRP_2009                                      	2508890  	66185	n/a	n/a	-	-	YES	-	YES	-	-	-	-
file                                            	#snp     	N	NCASE	NCONT.	Z	OR	BETA	LOGODDS	SE	INFO	FRQ	NSTUDY	DIRECTION
</code>

Columns description:

  * ''A1''      Allele 1, interpreted as ref allele for signed sumstat.
  * ''A2''      Allele 2, interpreted as non-ref allele for signed sumstat.
  * ''BETA''    [linear/logistic] regression coefficient (0 --> no effect; above 0 --> A1 is trait/risk increasing)
  * ''BP''      Base-pair position
  * ''CHR''     Chromosome number
  * ''CHRPOS''  chr:pos column with colon-separated information about Chromosome and Base-pair position
  * ''FRQ''     Allele frequency
  * ''INFO''    INFO score (imputation quality; higher --> better imputation)
  * ''LOGODDS'' Log odds ratio (0 --> no effect; above 0 --> A1 is risk increasing)
  * ''N''       Sample size
  * ''NCASE''    Number of cases
  * ''NCONT.''    Number of controls
  * ''NSTUDY''  Number of studies in which the SNP was genotyped.
  * ''OR''      Odds ratio (1 --> no effect; above 1 --> A1 is risk increasing)
  * ''PVAL''    p-Value
  * ''SE''      standard error of the effect size
  * ''SNP''     Variant ID (e.g., rs number)
  * ''Z''       Z-score (0 --> no effect; above 0 --> A1 is trait/risk increasing)
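
Since different files provide different subsets of the effect columns, it is sometimes necessary to derive one from another. The sketch below shows the standard relations: ''LOGODDS'' is the natural log of ''OR'', a Wald Z-score is ''BETA / SE'', and the magnitude of Z can be recovered from a two-sided ''PVAL'' with the sign taken from the effect direction. The function names are made up for illustration, and the two-sided p-value is an assumption; this is not necessarily how sumstats.py computes these columns.

<code python>
import math
from statistics import NormalDist


def logodds_from_or(odds_ratio: float) -> float:
    """LOGODDS = ln(OR); OR = 1 maps to 0 (no effect)."""
    return math.log(odds_ratio)


def z_from_beta(beta: float, se: float) -> float:
    """Wald statistic: effect size divided by its standard error."""
    return beta / se


def z_from_pval(pval: float, effect_sign: float) -> float:
    """Recover Z from a two-sided p-value; the sign comes from BETA or LOGODDS.

    Assumes 0 < pval <= 1. P-values that underflow to 0 cannot be inverted.
    """
    magnitude = -NormalDist().inv_cdf(pval / 2.0)
    return math.copysign(magnitude, effect_sign)


if __name__ == "__main__":
    print(z_from_beta(0.05, 0.01))
    print(z_from_pval(5e-8, logodds_from_or(1.1)))
</code>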

===== Conversion scripts =====

Conversion from RAW summary stats into standard CSV format and Matlab formats is done with
[[https://github.com/precimed/python_convert/blob/master/sumstats.py|sumstats.py]] script.
This script is able to
  * read raw summary statistics and convert them into a tab-separated file with standard column names ('PVAL', 'SNP', 'CHR', 'BP', etc).
  * read a standardized summary statistics file, align it with the 2M or 9M SNP template, and save it in Matlab format
  * lift CHR:POS across different genomic builds, and lift RS numbers to a newer version of the NCBI dbSNP database.

The whole process is automated by [[https://github.com/precimed/GWAS_SUMSTAT|Makefiles]]
which makes it very easy to re-run the whole pipeline if you require additional summary stats, additional columns or simply discover a bug in the conversion.

If you need more summary stats in this format please contact <oleksandr.frei@gmail.com>.

===== Definitions for GWS loci and lead/indep variants =====

To define the number of independent genome-wide significant hits in GWAS summary stats we follow the procedure from FUMA (see http://fuma.ctglab.nl/tutorial#parameters, chapter "Parameters for lead SNP and candidate SNP identification").

 1. Select the set of genome-wide significant hits (''PVAL < 5e-8'') and clump this set using LD r2=0.6. The resulting set is called ''independent SNPs''.
  
 2. Take all independent SNPs as defined above and clump them using LD r2=0.1. The resulting set is called ''lead SNPs''.
  
 3. Query the reference panel for all SNPs in LD r2>0.6 with any of the independent SNPs. Call the resulting set ''candidate SNPs''.
  
 4. Define a genomic region from all candidate SNPs that belong to a given lead SNP. Merge genomic regions into one if they are located closer than 250 KB to each other. The resulting set of genomic regions is called ''loci''.
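
The merging in step 4 is ordinary interval merging per chromosome. A minimal sketch, assuming each region is given as a ''(chromosome, start, end)'' tuple in base pairs and that "closer than 250 KB" means a gap of fewer than 250,000 bp between the end of one region and the start of the next. The function name ''merge_loci'' is hypothetical, and FUMA itself may differ in details.

<code python>
def merge_loci(regions, max_gap=250_000):
    """Merge (chrom, start, end) regions on the same chromosome separated by < max_gap bp."""
    merged = []
    # Sorting by chromosome and start lets us compare each region only with the previous one.
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] < max_gap:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


if __name__ == "__main__":
    regions = [(1, 100, 200), (1, 200_000, 300_000), (1, 1_000_000, 1_100_000), (2, 150, 250)]
    for locus in merge_loci(regions):
        print(locus)
</code>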


===== Updates =====

  * [2018/09/20] Apply QC procedure to all summary stats from 23andMe
  * [2018/09/20] Add PGC_MDD_2018_with23andMe and PGC_MDD_2018_with23andMe_noUKBB (meta-analysis of 23andMe_MDD_2016 with PGC MDD data)
  * [2018/09/20] Add ICC_CANNABIS_2018_UKB
  * [2018/09/24] Rename CTG_COG_2018_noTOP to COGENT_COG_2017_noTOP, and add the actual CTG_COG_2018_noTOP
  * [2018/12/28] Release v0.9.2
  * [2019/01/07] Comprehensive update (see below for further notes)


===== Update 2019/01/07 =====

Changes:

0. A dump of the inventory before these changes is available in <SUMSTAT>/releases/v0.9.2, including all previously generated STD and MAT files.

1. The extension of files in STD folder was changed from .csv.gz to .sumstats.gz

2. Files in the STD folder no longer have "_qc" or "_lift" suffixes. The "_qc" suffix was confusing because no QC was done on our side: summary stats are QCed only if this was done by those who shared the raw summary statistics. The "_lift" suffix indicated that the SNP or CHR/POS column does not come from the original file; the same information can now be looked up in this table: https://github.com/precimed/GWAS_SUMSTAT/blob/master/all_sumstats.tsv (see the "doLift" column).

3. all MAT files (2M / 9M / 11M templates) are now stored in TMP folder:
<SUMSTAT>/TMP/mat_2m
<SUMSTAT>/TMP/mat_9m  
<SUMSTAT>/TMP/mat_11m  
They are somewhat redundant, because each is produced by a single command ("sumstats.py mat --sumstats STD/<id>.sumstats.gz --out <out> --ref <ref_file>") and is therefore easily reproduced. By putting them in TMP I've tried to indicate that .MAT files should be treated as temporary artifacts of our cond/conj FDR analysis pipeline. If one needs a permanent version of summary stats it's best to keep the STD file (.sumstats.gz).

4. Future "releases" of the SUMSTAT inventory will contain "STD" and "ANALYSIS" folders. A "release" is created each time we substantially change the pipeline, and otherwise once a year to save new summary statistics.

These changes were introduced to further standardize the pipeline and to document it in a clean way.

Detailed description is now available here: https://github.com/precimed/GWAS_SUMSTAT (and linked from the wiki).
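
For scripts that still refer to pre-2019 file names, the renaming in points 1 and 2 and the command from point 3 can be sketched as below. Both helper names are hypothetical. The suffix stripping is an assumption based only on the description above (the authoritative list of identifiers is all_sumstats.tsv), and calling the tool through ''python sumstats.py'' is likewise an assumption; only the ''mat'' arguments are taken from this page. The command is built as an argument list and not executed here.

<code python>
def new_std_name(old_filename: str) -> str:
    """Map an old STD file name (.csv.gz, optional _qc/_lift suffixes) to the new scheme."""
    old_ext = ".csv.gz"
    if not old_filename.endswith(old_ext):
        raise ValueError(f"expected a {old_ext} file, got {old_filename!r}")
    stem = old_filename[: -len(old_ext)]
    # Strip trailing "_qc" / "_lift" tags, including combinations such as "_qc_lift".
    stripped = True
    while stripped:
        stripped = False
        for tag in ("_qc", "_lift"):
            if stem.endswith(tag):
                stem = stem[: -len(tag)]
                stripped = True
    return stem + ".sumstats.gz"


def mat_command(std_id: str, out: str, ref_file: str) -> list:
    """Build the argument list for the 'sumstats.py mat' command quoted in point 3."""
    return [
        "python", "sumstats.py", "mat",
        "--sumstats", f"STD/{std_id}.sumstats.gz",
        "--out", out,
        "--ref", ref_file,
    ]


if __name__ == "__main__":
    print(new_std_name("BROADABC_ASB_2017_qc_lift.csv.gz"))
    # Output and reference paths below are placeholders.
    print(" ".join(mat_command("PGC_SCZ_2014", "out.mat", "ref_file.txt")))
</code>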

===== Overview of eQTL resources =====

{{::eqtl_resources.pdf|}}