====== TSD import/export ======


===== Import (open for all users) =====
The easiest way to upload files is the web portal at https://data.tsd.usit.no/. \\
Uploaded files will be available in ''/tsd/p33/data/durable/file-import/''.\\
Other methods exist to upload files to TSD. They are documented in the [[https://www.uio.no/english/services/it/research/sensitive-data/use-tsd/import-export/index.html|official TSD user guide]]. 

===== Export (restricted to specific users) =====
{{:how_to_export_tsd_p33u.pdf|}}\\
For security reasons, standard users are only granted access to import files into p33. \\
If you need to export data, contact the relevant exporter in your group.\\
The people listed in {{:exportusers.xlsx|}} can help you with export. \\
\\
Users who need export rights must apply to p33-admin for approval. \\
Users with export access are expected to handle export requests from their local group, and to be familiar with the regulations for sensitive data management and with the encryption procedures.\\

Basic principles:
  * Anonymous data files, figures, papers, and presentations can be exported without encryption.
  * Files containing participant IDs must be encrypted, and only extracted in a safe location.
  * Transfer of files between OUS and TSD does not need encryption.

Additional information about encryption and alternative transfer methods can be found [[https://www.uio.no/english/services/it/research/sensitive-data/use-tsd/import-export/index.html|here]].

NB! There is a limit on how many files show up in the export portal ( https://data.tsd.usit.no/ ).
Currently the limit is 200 files.
Export users must therefore regularly remove old files once they have been exported.
If the limit is exceeded, new files can still be placed in ''/tsd/p33/data/durable/file-export'',
but they will not show up in the export portal.
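
To keep an eye on this limit, the number of files in the export folder can be checked from a terminal inside TSD. The following is a minimal sketch, not an official TSD tool: the function names are made up for illustration, and it assumes that every regular file below the export folder counts towards the limit (this page does not say whether subfolders are included).

```shell
# export_file_count DIR: print the number of regular files below DIR.
export_file_count() {
  find "$1" -type f | wc -l | tr -d ' '
}

# export_portal_status DIR [LIMIT]: report whether DIR is within the portal limit.
# LIMIT defaults to 200, the value quoted on this page.
# The body runs in a subshell so its variables do not leak into the caller.
export_portal_status() (
  dir=$1
  limit=${2:-200}
  count=$(export_file_count "$dir")
  if [ "$count" -gt "$limit" ]; then
    echo "over limit: $count files (limit $limit)"
    exit 1
  fi
  echo "ok: $count files (limit $limit)"
)
```

Example call: ''export_portal_status /tsd/p33/data/durable/file-export''. The function exits with a non-zero status when the limit is exceeded, so it can also be used in a reminder script.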

==== The Publication Portal ====
The goal of the TSD publication system is to let TSD projects share data with researchers who are associated with the project but are not necessarily project members themselves.\\

How-to: https://www.uio.no/english/services/it/research/sensitive-data/use-tsd/publication/index.md


==== tsd-s3cmd and TACL ====

These are two command-line API clients for syncing data to and from TSD.
They can only be configured on certain systems (e.g. NIRD, SAGA, NREC, UCSD MMIL) because TSD has to whitelist the IP addresses. It is not possible to configure them on a local laptop.
Even on NIRD/SAGA/NREC/UCSD, each user needs to install and configure tsd-s3cmd and TACL as described below.

tsd-s3cmd is better suited to large-scale incremental sync jobs with many files and complex folder structures. TACL is better suited to interactive use (similar to the data portal, but from the command line).
Both tools depend on a python3 setup. If you don't have one configured, try the following:

<code>
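# Download the installer into the home directory; the installer refuses
# to install into a folder that already exists, so do not create ~/miniconda3 first.
cd ~
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
# Accept the default install location (~/miniconda3) when prompted.
bash Miniconda3-latest-Linux-x86_64.sh
# Put the miniconda python3 first on the PATH.
export PATH=~/miniconda3/bin:$PATH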
</code>
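
A quick check shows whether python3 is already on the PATH, either before deciding to install or after the installation. This is a minimal sketch; the function name ''have_python3'' is made up for illustration.

```shell
# have_python3: succeed if a python3 interpreter can be found on the PATH.
have_python3() {
  command -v python3 >/dev/null 2>&1
}

if have_python3; then
  echo "python3 found at $(command -v python3)"
else
  echo "python3 not found on PATH"
fi
```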

The IP whitelisting is currently configured for NIRD and MMIL, and probably SAGA (not confirmed).

Installation and usage of tsd-s3cmd are described at https://github.com/unioslo/tsd-s3cmd.

Note: once tsd-s3cmd is installed as described there, create a ''.config'' folder in your ''$HOME'' directory for registering the API.

<code> 
mkdir -p ~/.config
</code>

Useful commands:
<code>
tsd-s3cmd --guide
</code>

Note that the data will go into ''/tsd/p33/data/durable/s3api/<bucket>/''.

TACL is described here:
https://github.com/unioslo/tsd-api-client

The data will go to ''/tsd/p33/data/durable/file-import/p33-member-group/''.


It can be tricky to remove the content of your TSD s3 bucket. If you have a bucket named MyBucket containing a folder MyFolder, you can remove the content of MyFolder in two steps: first, on your local machine, create an empty folder called MyFolder; then execute
<code>tsd-s3cmd sync --no-check-md5 --delete-removed --force MyFolder s3://MyBucket</code>
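
The two steps can be wrapped in a small helper. The following is a sketch under stated assumptions: the function name ''empty_bucket_folder'' and the ''S3CMD'' override variable are made up for illustration, and the sync command is the one quoted above, unchanged. Setting ''S3CMD="echo tsd-s3cmd"'' only prints the command, which may be worth doing before deleting anything.

```shell
# empty_bucket_folder BUCKET FOLDER: sync an empty local folder named FOLDER
# to s3://BUCKET, following the two steps described on this page.
# The body runs in a subshell, so the caller's working directory is untouched.
empty_bucket_folder() (
  bucket=$1
  folder=$2
  work=$(mktemp -d)              # scratch location for the empty folder
  mkdir -p "$work/$folder"       # step 1: empty local folder with the same name
  cd "$work" || exit 1
  status=0
  # step 2: the sync command from this page; S3CMD may be overridden for a dry run
  ${S3CMD:-tsd-s3cmd} sync --no-check-md5 --delete-removed --force "$folder" "s3://$bucket" || status=$?
  cd / && rm -r "$work"          # clean up the scratch folder
  exit "$status"
)
```

Example: ''S3CMD="echo tsd-s3cmd" empty_bucket_folder MyBucket MyFolder'' prints the command that would be run; without the override the command is executed.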


----


===== Encryption/Decryption =====

Since transfer to and from TSD requires an internet connection, your data will necessarily be exposed to the network at some point. TSD therefore recommends that you apply some form of encryption to your data before the transfer.

=== For Windows users ===
Encrypt using 7-Zip. Instructions can be found [[https://www.uio.no/english/services/it/research/sensitive-data/use-tsd/import-export/index.html#toc11|here]].

=== For Linux users ===

Below is a description of a (safer) asymmetric (private/public key) encryption/decryption procedure between two systems, A and B, using the ''gpg'' software on Linux/Mac systems.

On system A:

  * Make sure ''gpg'' is installed (see [[https://gnupg.org/download/index.html|here]] otherwise).
  * Open a command line terminal and generate a key pair. 

  gpg --gen-key

The program will ask several questions; use the default values whenever possible. Pick a name (e.g. "Ole Andreassen"), a comment and an email address, and choose a password (*). Remember the password: you will need it when decrypting.

  * Export the public key.

  gpg --export 'Ole Andreassen' > public_OA.gpk

On system B (the one with data to be encrypted):

  * Make sure ''gpg'' is installed (see [[https://gnupg.org/download/index.html|here]] otherwise).
  * Import the public key.

  gpg --import public_OA.gpk

  * Encrypt data ''stuff'' using this key.

  gpg --encrypt --recipient 'Ole Andreassen' --output encrypted.stuff.gpg stuff 

  * Send the encrypted data back to A.

On system A:

  * Decrypt data (provide the password(*) when prompted to). 

  gpg --output decrypted.stuff --decrypt encrypted.stuff.gpg

NOTE: this is a comprehensive description of the whole procedure, including key generation and export/import. Once ''gpg'' and the keys are in place, only the last three steps (encrypt, send, decrypt) are needed.
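
When several files need to be encrypted on system B, the encrypt step can be repeated in a loop. The following is a minimal sketch: the function name ''encrypt_for'' and the ''GPG'' override variable are made up for illustration, while the ''gpg'' options and the ''encrypted.<name>.gpg'' naming are the ones used in the encrypt step above.

```shell
# encrypt_for RECIPIENT FILE...: encrypt each FILE for RECIPIENT, writing
# encrypted.<name>.gpg next to the original file.
# GPG can be set to another command (e.g. a full path to gpg); default is gpg.
encrypt_for() (
  recipient=$1
  shift
  for f in "$@"; do
    out="$(dirname "$f")/encrypted.$(basename "$f").gpg"
    "${GPG:-gpg}" --encrypt --recipient "$recipient" --output "$out" "$f" || exit 1
  done
)
```

Example: ''encrypt_for 'Ole Andreassen' stuff1 stuff2''. Note that ''gpg'' may ask for confirmation before using an imported key that has not been marked as trusted.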

===== Encryption/Decryption (advanced usage) =====
If you have a large set of files to decrypt, you could automate this task as follows:

<code>
cd /tsd/p697/data/durable/hdd/p697
find . -type f -name "*.gpg" | parallel echo '"echo <password>" \| gpg --batch --yes --passphrase-fd 0 -o /tsd/p697/data/durable/users/ofrei/deCODE_Nov20/{.} -d {}' | bash
</code>

Here ''/tsd/p697/data/durable/hdd/p697'' is the source folder containing the .gpg files,
''<password>'' needs to be replaced by the password that protects the encrypted data, and ''/tsd/p697/data/durable/users/ofrei/deCODE_Nov20/'' is the target location. Note that all subfolders must already exist in the target location before running the above script.
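
An alternative that creates the target folders on the fly and keeps the password out of the command line is sketched below. It is not the procedure used above but a plain loop without ''parallel''; the function name ''decrypt_tree'', the ''GPG'' override variable and the idea of storing the password in a file are assumptions made for illustration. The ''gpg'' options are the ones from the command above; with GnuPG 2.1 or later, ''--pinentry-mode loopback'' may also be needed for ''--passphrase-fd'' to take effect.

```shell
# decrypt_tree SRC DST PASSFILE: decrypt every *.gpg file below SRC into DST,
# mirroring the folder layout and dropping the .gpg suffix.
# SRC should be given without a trailing slash.
# PASSFILE is a file containing the password (hypothetical; keep it readable only by you).
decrypt_tree() (
  src=$1
  dst=$2
  passfile=$3
  find "$src" -type f -name '*.gpg' | while IFS= read -r f; do
    rel=${f#"$src"/}               # path relative to the source folder
    out=$dst/${rel%.gpg}           # same relative path, without .gpg
    mkdir -p "$(dirname "$out")"   # create missing target folders
    # the password is read from PASSFILE on stdin (fd 0)
    "${GPG:-gpg}" --batch --yes --passphrase-fd 0 -o "$out" -d "$f" < "$passfile"
  done
)
```

Example: ''decrypt_tree /tsd/p697/data/durable/hdd/p697 /tsd/p697/data/durable/users/ofrei/deCODE_Nov20 ~/passfile'', where ''~/passfile'' is a hypothetical file holding the password.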