====== Where to find genotypes ======
(outdated)


The following are excerpts of the contents of the imputation directory as of November 23, 2016.
They will likely change as the effort to harmonize the imputed genotype data continues.

The genotypes are stored in dosage files, split into batches: the original TOP8 batch; NORMENT's Jan15, Jul15, Apr16 and Jun16 batches; and the HUNT and NCNG samples.
They are located under the following path on TSD:

//(outdated!)//
<code>
/tsd/p33/home/p33-franbe/data/durable/vault/genetics/imputation

total 3783904
drwxrwsr-x. 9 p33-franbe p33-member-group       2048 Nov 22 16:07 .
drwxrwsr-x. 7 p33-franbe p33-member-group       2048 Jun 20 14:39 ..
drwxr-s---. 4 p33-franbe p33-import-group      12288 Aug 30 17:48 HUNT
drwxr-s---. 6 p33-franbe p33-member-group       2048 Nov 21 18:03 Imputation_ENIGMA2_protocol_Apr16
drwxr-s---. 6 p33-franbe p33-member-group       2048 Nov 21 18:05 Imputation_ENIGMA2_protocol_Jan15
drwxr-s---. 6 p33-franbe p33-member-group       2048 Nov 21 18:06 Imputation_ENIGMA2_protocol_Jul15
drwxrwsr-x. 5 p33-sgi024 p33-member-group       2048 Nov 21 18:06 Imputation_ENIGMA2_protocol_Jun16
drwxr-s---. 4 p33-franbe p33-member-group       8192 Aug 30 17:11 Imputation_TOP8
drwxr-s---. 4 p33-franbe p33-import-group      12288 Aug 30 18:07 NCNG
-rw-r-x---. 1 p33-franbe p33-import-group        925 Aug 29 16:10 idconvert.sh
-rw-r-----. 1 p33-sgi024 p33-import-group 3874067429 Apr 25  2016 norment_Imputation_Apr16.tar.gz
-rw-rw-r--. 1 p33-franbe p33-member-group     201576 Aug 26 18:20 pns
-rw-rw-r--. 1 p33-franbe p33-member-group       1087 Aug 26 10:29 readme
-rw-rw-r--. 1 p33-franbe p33-member-group         80 Oct  7 12:06 test.pns
</code>
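The listing above can also be walked from a script to locate the batch directories. This is a minimal sketch that assumes the root path shown above, which is outdated and may have moved. It also assumes that every batch is either an Imputation_* directory or one of HUNT/NCNG. The function name and the pattern list are hypothetical. Note that HUNT and NCNG belong to p33-import-group, so they are only visible to members of that group.

```python
from __future__ import annotations

from pathlib import Path

# Root of the imputation directory as listed above (Nov 2016; outdated, may have moved).
IMPUTATION_ROOT = Path("/tsd/p33/home/p33-franbe/data/durable/vault/genetics/imputation")

# Hypothetical patterns matching the batch directories seen in the listing.
BATCH_PATTERNS = ("Imputation_*", "HUNT", "NCNG")


def list_batches(root: Path) -> list[Path]:
    """Return the batch directories under root, sorted by path.

    Plain files that happen to match a pattern (e.g. a tarball) are skipped.
    Returns an empty list if root does not exist or is not a directory.
    """
    if not root.is_dir():
        return []
    found = set()
    for pattern in BATCH_PATTERNS:
        for path in root.glob(pattern):
            if path.is_dir():
                found.add(path)
    return sorted(found)
```

Usage: `for batch in list_batches(IMPUTATION_ROOT): print(batch.name)`.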

readme:

<code>
Version 3.0, update Nov 23rd, 2016.
    - added directories of symbolic links for specific data.
    - deprecated 'qc1'

the 'Imputed' directories contain five subdirectories called 'dosages', 'haplotypes', 'plink',
'qc0' and 'qc1', respectively. The 'Imputation_TOP8' contains the latter two only.
    'dosages'     - original MaCH dosage output; no post-imputation QC applied whatsoever;
    'haplotypes'  - phased genotypes; no post-imputation QC applied whatsoever;
    'plink'       - plink-friendly 0..2 dosage files; no post-imputation QC applied whatsoever;
    'qc0'         - directory with the actual files the ones in the previous three link to;
    'qc1'         - minimal QC (info > 0.1) applied (deprecated as too mild to be of any use);

NOTE: the dosage files contain SNPs and indels. the latter have the form chr:pos:XX_X, where the
      XX_X specification can be five characters long at most. caution is therefore required when
      handling variants with specification of five characters as they can be ambiguous.


Version 2.0, update Aug 26th, 2016.
    - bug discovered: ineffective MAF control

the 'Imputation*' directories contain two subdirectories called 'qc0' and 'qc1', respectively:
    'qc0'  - no post-imputation QC applied whatsoever;
    'qc1'  - minimal QC (info > 0.1) applied;

NOTE: the dosage files contain SNPs and indels. the latter have the form chr:pos:XX_X, where the
      XX_X specification can be five characters long at most. caution is therefore required when
      handling variants with specification of five characters as they can be ambiguous.


Version 1.0, Jun 23rd, 2016.

the 'Imputation*' directories contain two subdirectories called 'qc0' and 'qc1', respectively:
    'qc0'  - no post-imputation QC applied whatsoever;
    'qc1'  - minimal QC (info > 0.1, MAF > 0.005) applied;

NOTE: the dosage files contain SNPs and indels. the latter have the form chr:pos:XX_X, where the
      XX_X specification can be five characters long at most. caution is therefore required when
      handling variants with specification of five characters as they can be ambiguous.
</code>
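The readme's note on indel identifiers can be handled with a small parser that flags the potentially ambiguous cases. This is a minimal sketch under two assumptions. First, the part after the second colon is the XX_X specification, with two allele codes joined by an underscore. Second, SNP identifiers do not have that three-part form. The function and type names are hypothetical. The sketch does not try to resolve the ambiguity. It only marks specifications that reach the five-character limit so they can be checked by hand.

```python
from typing import NamedTuple, Optional

# Maximum length of the XX_X specification, according to the readme.
MAX_SPEC_LEN = 5


class IndelId(NamedTuple):
    chrom: str
    pos: int
    spec: str        # the "XX_X" part of the ID
    ambiguous: bool  # True when the spec reaches the five-character limit


def parse_indel_id(variant_id: str) -> Optional[IndelId]:
    """Parse an ID of the form chr:pos:XX_X; return None for any other form."""
    parts = variant_id.split(":")
    if len(parts) != 3 or "_" not in parts[2]:
        return None
    chrom, pos, spec = parts
    if not pos.isdigit():
        return None
    return IndelId(chrom, int(pos), spec, len(spec) >= MAX_SPEC_LEN)
```

For example, parse_indel_id("1:12345:AT_A") returns a record with ambiguous=False. A five-character specification such as "2:500:ATG_A" is flagged as ambiguous. Both IDs are made up.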


Contents of the batch directories:

Imputation_ENIGMA2_protocol_Apr16:
<code>
total 1465632
drwxr-s---. 6 p33-franbe p33-member-group       2048 Nov 21 18:03 .
drwxrwsr-x. 9 p33-franbe p33-member-group       2048 Nov 22 16:07 ..
drwxr-s---. 8 p33-franbe p33-member-group       2048 Nov 21 16:38 Imputed
drwxr-s---. 2 p33-franbe p33-member-group      45056 Aug 23 16:55 Mach
drwxr-s---. 2 p33-franbe p33-member-group      55296 May 23  2016 Phased
drwxrwsr-x. 2 p33-franbe p33-member-group       2048 Nov 17 12:23 Raw
-rw-rw-r--. 1 p33-franbe p33-member-group 1477735459 Nov 18 18:17 genotyped.dose.gz
-rw-rw-r--. 1 p33-franbe p33-member-group    6805606 Nov 17 16:43 genotyped.mrk
-rw-rw-r--. 1 p33-franbe p33-member-group    9141691 Nov 21 18:03 genotyped_chip.alleles
-rw-rw-r--. 1 p33-franbe p33-member-group    6805583 Nov 17 17:27 genotyped_chip.mrk
-rw-rw-r--. 1 p33-franbe p33-member-group         23 Nov 17 17:12 genotyped_nochip.mrk
</code>

Imputation_ENIGMA2_protocol_Jan15:
<code>
total 6418144
drwxr-s---. 6 p33-franbe p33-member-group       2048 Nov 21 18:05 .
drwxrwsr-x. 9 p33-franbe p33-member-group       2048 Nov 22 16:07 ..
drwxr-s---. 8 p33-franbe p33-member-group       2048 Nov 18 16:50 Imputed
drwxr-s---. 2 p33-franbe p33-member-group      22528 Jan  7  2016 Mach
drwxr-s---. 2 p33-franbe p33-member-group      28672 Jan  7  2016 Phased
drwxrwsr-x. 2 p33-franbe p33-member-group       2048 Aug 26 18:28 Test
-rw-rw-r--. 1 p33-franbe p33-member-group 6549591516 Nov 18 20:19 genotyped.dose.gz
-rw-rw-r--. 1 p33-franbe p33-member-group    6667775 Nov 17 16:43 genotyped.mrk
-rw-rw-r--. 1 p33-franbe p33-member-group    8955527 Nov 21 18:06 genotyped_chip.alleles
-rw-rw-r--. 1 p33-franbe p33-member-group    6667099 Nov 17 17:27 genotyped_chip.mrk
-rw-rw-r--. 1 p33-franbe p33-member-group        676 Nov 17 17:12 genotyped_nochip.mrk
</code>

Imputation_ENIGMA2_protocol_Jul15:
<code>
total 3192320
drwxr-s---. 6 p33-franbe p33-member-group       2048 Nov 21 18:06 .
drwxrwsr-x. 9 p33-franbe p33-member-group       2048 Nov 22 16:07 ..
drwxr-s---. 9 p33-franbe p33-member-group       2048 Nov 21 16:38 Imputed
drwxr-s---. 3 p33-franbe p33-member-group      24576 Jun 13 10:03 Mach
drwxr-s---. 2 p33-franbe p33-member-group      28672 Jun 13 16:55 Phased
drwxrwsr-x. 2 p33-franbe p33-member-group       2048 Nov 17 12:08 Raw
-rw-rw-r--. 1 p33-franbe p33-member-group 3244918950 Nov 18 21:10 genotyped.dose.gz
-rw-rw-r--. 1 p33-franbe p33-member-group    7117807 Nov 17 16:43 genotyped.mrk
-rw-rw-r--. 1 p33-franbe p33-member-group    9477683 Nov 21 18:06 genotyped_chip.alleles
-rw-rw-r--. 1 p33-franbe p33-member-group    7055047 Nov 17 17:27 genotyped_chip.mrk
-rw-rw-r--. 1 p33-franbe p33-member-group      62760 Nov 17 17:12 genotyped_nochip.mrk
</code>

Imputation_ENIGMA2_protocol_Jun16:
<code>
total 154208
drwxrwsr-x. 5 p33-sgi024 p33-member-group      2048 Nov 21 18:06 .
drwxrwsr-x. 9 p33-franbe p33-member-group      2048 Nov 22 16:07 ..
drwxrwsr-x. 6 p33-sgi024 p33-member-group      2048 Nov 21 16:39 Imputed
drwxrwsr-x. 2 p33-sgi024 p33-member-group     16384 Jul  6 18:21 Mach
drwxrwsr-x. 2 p33-sgi024 p33-member-group     55296 Jul  6 18:22 Phased
-rw-rw-r--. 1 p33-franbe p33-member-group 135306763 Nov 18 21:15 genotyped.dose.gz
-rw-rw-r--. 1 p33-franbe p33-member-group   6675562 Nov 17 16:43 genotyped.mrk
-rw-rw-r--. 1 p33-franbe p33-member-group   8966831 Nov 21 18:06 genotyped_chip.alleles
-rw-rw-r--. 1 p33-franbe p33-member-group   6675515 Nov 17 17:27 genotyped_chip.mrk
-rw-rw-r--. 1 p33-franbe p33-member-group        47 Nov 17 17:12 genotyped_nochip.mrk
</code>
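The genotyped.dose.gz files range from about 135 MB (Jun16) to about 6.5 GB (Jan15), so they are best streamed line by line rather than loaded at once. The readme describes the 'dosages' directories as original MaCH output. This sketch assumes that genotyped.dose.gz follows the same layout: one line per sample, containing "FAMID->IID", a record-type tag such as DOSE, and then one dosage per marker. It further assumes the markers appear in the order of the accompanying .mrk list, which has not been verified. Check the first line of the actual file before relying on this. The function name is hypothetical.

```python
import gzip
from typing import Iterator, List, Tuple


def read_mach_dose(path: str) -> Iterator[Tuple[str, str, List[float]]]:
    """Yield (family_id, individual_id, dosages) for each sample line.

    Assumed layout (MaCH-style): "FAMID->IID  DOSE  d1 d2 ...", whitespace-separated.
    If the first column has no "->", individual_id is returned as an empty string.
    """
    with gzip.open(path, "rt") as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            fam_id, _, ind_id = fields[0].partition("->")
            # fields[1] is the record-type tag; dosages start at fields[2]
            yield fam_id, ind_id, [float(x) for x in fields[2:]]
```

Because the function is a generator, only one sample line is held in memory at a time.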

Imputation_TOP8:
<code>
total 391008
drwxr-s---. 4 p33-franbe p33-member-group      8192 Aug 30 17:11 .
drwxrwsr-x. 9 p33-franbe p33-member-group      2048 Nov 22 16:07 ..
-rw-rw-r--. 1 p33-franbe p33-member-group 223421878 Aug 30 17:10 TOP.bed
-rw-rw-r--. 1 p33-franbe p33-member-group  18275245 Aug 30 17:10 TOP.bim
-rw-rw-r--. 1 p33-franbe p33-member-group     33948 Aug 30 17:09 TOP.fam
-rw-rw-r--. 1 p33-franbe p33-member-group       549 Aug 30 17:10 TOP.log
drwxr-s---. 2 p33-franbe p33-member-group      6144 Aug 30 10:50 qc0
drwxr-s---. 4 p33-franbe p33-member-group     12288 Aug 30 10:56 qc1
-rw-r-x---. 1 p33-franbe p33-member-group 158402737 Mar 15  2016 qc1_SNPlist.mrk
</code>
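Imputation_TOP8 also holds a PLINK binary fileset (TOP.bed, TOP.bim, TOP.fam). In the PLINK format, each .bim line describes one variant in six columns: chromosome, variant ID, genetic position (cM), base-pair position, allele 1 and allele 2. A minimal reader can be used, for example, to count the variants or to cross-check IDs against another list. The function and record names are hypothetical.

```python
from typing import List, NamedTuple


class BimRecord(NamedTuple):
    chrom: str
    variant_id: str
    cm: float      # genetic position in centimorgans
    pos: int       # base-pair position
    allele1: str
    allele2: str


def read_bim(path: str) -> List[BimRecord]:
    """Read a PLINK .bim file into a list of records (one per variant)."""
    records = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) != 6:
                continue  # skip blank or malformed lines
            chrom, vid, cm, pos, a1, a2 = fields
            records.append(BimRecord(chrom, vid, float(cm), int(pos), a1, a2))
    return records
```

For instance, len(read_bim("TOP.bim")) gives the number of variants in the TOP8 fileset.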

The HUNT and NCNG directories contain the 0..2 dosage files for the respective samples.
The dosages in the Imputation_TOP8 directory were imputed by Andrew Schork.
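The 0..2 dosages (in HUNT, NCNG and the 'plink' directories) are expected counts of one allele. The frequency of that allele is therefore the mean dosage divided by two, and the MAF is the smaller of that frequency and its complement. The sketch below computes the MAF this way and applies the thresholds that readme version 1.0 lists for 'qc1': info > 0.1 and MAF > 0.005. The info score cannot be derived from the dosages in this sketch, so it is passed in as an assumed input taken from the imputation output. The function names are hypothetical. 'qc1' is deprecated as too mild, so stricter thresholds are probably wanted in practice.

```python
from typing import Sequence


def allele_frequency(dosages: Sequence[float]) -> float:
    """Frequency of the counted allele: mean 0..2 dosage divided by 2."""
    if not dosages:
        raise ValueError("no dosages given")
    return sum(dosages) / (2 * len(dosages))


def minor_allele_frequency(dosages: Sequence[float]) -> float:
    """MAF: the smaller of the allele frequency and its complement."""
    freq = allele_frequency(dosages)
    return min(freq, 1.0 - freq)


def passes_qc(info: float, dosages: Sequence[float],
              min_info: float = 0.1, min_maf: float = 0.005) -> bool:
    """qc1-style filter (readme v1.0 thresholds); info comes from the imputation output."""
    return info > min_info and minor_allele_frequency(dosages) > min_maf
```

For example, the dosages [2, 2, 2, 1] give an allele frequency of 0.875 and a MAF of 0.125.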
