====An example of making a Supplementary Table in cond/conjFDR paper====
Supplementary tables in cond/conjFDR papers come in various forms. Mostly they are in Microsoft Excel formats, 10's of files even for comparison of 2-3 traits. Each of the files contains 100's of rows and 10's of columns. Loci associated with 10's of traits are becoming usual these days.\\

Majority of the files are meant to show number of loci associated with a primary trait at condFDR <0.01 given association with a secondary trait, or number of shared loci between two traits at a certain conjFDR level (usually < 0.05).  Following Python script is an example of how to calculate associated loci between two traits (LONELY and SCZ), that can be extended to other traits.  

** Load only 2 Python libraries**

<code>
import numpy as np
import pandas as pd
</code>

**Load the clumped loci file with FDR values**

Once one gets the output put (result.mat) from MATLAB cond/conjFDR run, [[https://github.com/precimed/python_convert| python_convert ]] is used to convert it a corresponding csv file. Then clumping is performed (procedure described [[https://github.com/precimed/python_convert| here ]])

<code>
df1= pd.read_csv('results_LONELY_SCZ_condFDR_clump.loci.csv', sep='\t')
</code>


**Load [[https://fuma.ctglab.nl/|FUMA]] annotated files **

<code>
# Load snps.txt from annotated folder, this file contains most if the columns
df2= pd.read_csv('Loneli_SCZ_condFDR/snps.txt', sep='\t')

# Take only important columns from snps.txt
df2_cols= ['uniqID','rsID', 'nearestGene', 'dist','CADD', 'RDB',  'minChrState','commonChrState','effect_allele','non_effect_allele']
df2= df2[df2_cols]

# load ANNOVAR file, it contains gene annotation
df3= pd.read_csv('Loneli_SCZ_condFDR/annov.txt', sep='\t')

# Take only 2 columns from ANNOV
df3_cols=['chr', 'pos', 'annot']
df3 = df3[df3_cols]

</code>

**Load the STD summary statistics files**

<code>
df_lonely=pd.read_csv('UKB_LONELY_2018_Loneliness.sumstats.gz', compression='gzip', sep='\t', header=0)
df_SCZ = pd.read_csv('PGC_SCZ_2014_EUR.sumstats.gz', compression='gzip', header=0, sep="\t")
colsp= ['SNP','PVAL', 'Z']
df_lonely_p= df_lonely[colsp]
df_SCZ_p= df_SCZ[colsp]

</code>

** Combinations**

<code>
df1_df2 = pd.merge(df1, df2, how= 'left', left_on='LEAD_SNP', right_on='rsID')
df1_df2_df3 = pd.merge(df1_df2, df3,  how='left', left_on=['CHR','LEAD_BP'], right_on = ['chr','pos'])
df4= df1_df2_df3.drop_duplicates(subset=[ 'CHR','LEAD_BP'])
df4= df4.drop(['rsID', 'chr', 'pos'], axis = 1) 

</code>

** Merge the with sumstats to get pval and effect sizes**

<code>
df5= pd.merge(df4,df_lonely_p, how='left', left_on='LEAD_SNP', right_on='SNP' )
df6= pd.merge(df5,df_SCZ_p, how='left', left_on='LEAD_SNP', right_on='SNP', suffixes=('_LONELY', '_SCZ') )

df7= df6.drop([ 'SNP_SCZ', 'SNP_LONELY', 'uniqID'], axis=1)
df7.rename(columns={'locusnum':'GenomicLocus',
                          'FDR':'condFDR',
                          'PVAL_LONELY':'LONELY_pval',
                           'Z_LONELY': 'LONELY_Z',
                        'PVAL_SCZ':'SCZ_pval',
                            'Z_SCZ': 'SCZ_Z',
                           'annot': 'Function',
                           'dist': 'Distance'}, 
                 inplace=True)

df8_cols= ['GenomicLocus','CHR','LEAD_SNP', 'LEAD_BP', 'MinBP', 'MaxBP', 'condFDR' ,
'effect_allele', 'non_effect_allele','LONELY_pval','LONELY_Z', 'SCZ_pval','SCZ_Z',
'nearestGene', 'Distance', 'Function', 'CADD', 'RDB', 'minChrState', 'commonChrState']

df8= df7[df8_cols]

</code>

** Processing for LONELINESS loci **

This file is downloaded from the original paper on [[https://www.nature.com/articles/s41467-018-04930-1|LONELINESS GWAS]] 

<code>
df8.to_excel('Supplement_CondFDR/LONELY_SCZ_condFDR.xlsx', index=False)

# define the loci as BP=250K, both up- and downstream 
min250=df8.MinBP-250000
max250=df8.MaxBP+250000


df_loneli_loci=df_loneli['MTAG (3 Strata Loneliness composite)']
df_loneli_loci= df_loneli_loci[ [ 'chr', 'bpos']]

loneli_chr= df_loneli_loci.chr
loneli_bp= df_loneli_loci.bpos

</code>

** Here is the Secondary (SCZ) trait loci **
<code>
cond_chr= list(df8.CHR)
cond_range = list(zip(min250, max250 ))


loci_col=['Loci in LONELY GWAS', 'MinBP']
dataM=[]
for i in range(len(loneli_bp)):
    for j in range(len(cond_range)):
        if (cond_range[j][0] <= loneli_bp[i] <= cond_range[j][1]) & (cond_chr[j]==loneli_chr[i]):
            loci= str(loneli_chr[i])+":"+str(loneli_bp[i])
            minBP= cond_range[j][0]+250000
            maxBP = cond_range[j][1]-250000
            dataM.append([loci,minBP])
            
df_loci = pd.DataFrame(dataM, columns=loci_col)
</code>

** Merge the dataframes and output in an Excel file **

<code>
df_final= pd.merge(df8,df_loci, how='left', on='MinBP' )
df_final.to_excel('LONELY_SCZ_condFDR.xlsx', index=False)
</code>