====== TSD practical stuff ======

Project information @TSD

Project id
    p33
Project name
    top
Project shortname
    NORMENT
Project longname
    Norsk senter for forskning på mentale lidelser
HPC access
    yes

Project start date
    2014-01-01T00:00:00Z
Project end date
    2050-12-31T00:00:00Z
Institution
    uio-med
Price category
    None
VM types
    win_and_linux_vm
REK
    REK HSØ 493-03-01179 Ole Andreassen, Datatilsynet 2003/2051-6 Ole Andreassen



===== Colossus HPC =====

A cluster of compute nodes called Colossus is available within TSD. Colossus uses [[https://slurm.schedmd.com/sbatch.html|Slurm]] for job scheduling. For more information on how to submit jobs to Colossus, see the [[https://www.uio.no/english/services/it/research/sensitive-data/use-tsd/hpc/colossus-userguide.html|Colossus user guide]].

==== Private p33 queues on Colossus ====

The p33 project has private queues on Colossus HPC (i.e. not shared with other projects).
To submit jobs to the private queue, add <code>--account=p33_norment</code> to your ''sbatch'' command line, or add <code>#SBATCH --account=p33_norment</code> to the script that you submit via ''sbatch''.
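
A minimal job script for the private queue might look like the sketch below. Only the ''--account'' line comes from this page; the job name, time limit, CPU and memory requests, and the final command are placeholder values that you should adapt to your job.

<code bash>
#!/bin/bash
# Minimal Slurm job script sketch for the private p33 queue.
# All values except --account are placeholders.

# Job name shown in the queue (placeholder)
#SBATCH --job-name=example_job
# Private p33 queue, as described above
#SBATCH --account=p33_norment
# Wall-time limit, here 1 hour (placeholder)
#SBATCH --time=01:00:00
# One task with one CPU and 4 GB of memory per CPU (placeholders)
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=4G

# Stop the job as soon as a command fails
set -o errexit

# Replace this line with the actual work
echo "Running on $(hostname)"
</code>

If the script is saved as ''example_job.sh'' (a hypothetical file name), it can be submitted with ''sbatch example_job.sh'', and ''squeue -u $USER'' lists your queued and running jobs.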

We also have a second, small queue with 16 cores called <code>p33_norment_dev</code>,
which is meant for developing scripts. All users are asked not to run large-scale jobs in that queue,
so that it stays available for small tasks during development of new scripts.
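
A quick test run in the development queue could be submitted as sketched below. This assumes that the development queue is selected through ''--account'' in the same way as the main private queue, which this page does not state explicitly; the script name and the time limit are placeholders.

<code bash>
# Submit a short test job to the small development queue.
# Assumption: p33_norment_dev is passed via --account, like p33_norment.
# test_script.sh is a hypothetical script name; 10 minutes is a placeholder limit.
sbatch --account=p33_norment_dev --time=00:10:00 test_script.sh
</code>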

==== AI-HUB reservation (GPU) ====

We have access to 128 cores through the ''p33_aihub'' reservation. Submit jobs to this reservation if you need nodes with GPUs.

{{:bmdmcaocfcheijio.png?300|}}

  * Further directions on how to use the dedicated Colossus queue: https://www.uio.no/english/services/it/research/sensitive-data/use-tsd/hpc/dedicated-resources.html
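
As a rough sketch, a GPU job using the reservation might start with a header like the one below. The reservation name is taken from this page; combining it with the ''p33_norment'' account, requesting a single GPU through ''--gres'', and the time limit are assumptions, so check the page linked above for the options that actually apply.

<code bash>
#!/bin/bash
# Sketch of a job header for the AI-HUB reservation.

# Assumption: the reservation is used together with the private account
#SBATCH --account=p33_norment
# Reservation name from this page
#SBATCH --reservation=p33_aihub
# Assumption: one GPU requested via Slurm's generic resource option
#SBATCH --gres=gpu:1
# Placeholder wall-time limit
#SBATCH --time=01:00:00

# Replace this line with the actual GPU workload
echo "Running on $(hostname)"
</code>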

==== Interrupting a Linux session when it's stuck ====

Occasionally a Linux session gets stuck (for various reasons, e.g. memory overload, wrong password, etc.). You can recover without rebooting the entire system by killing your own Linux session. A stuck session cannot be reached from within Linux, but you can log in from the Windows machine and use ''pkill''. ''pkill'' is a Linux command that sends a signal to every process matching the given criteria; with ''-9 -u'' followed by your user name, it immediately kills all processes that you own on the node you are logged in to. Think of it as pulling the plug on your own session. Here's how to kill a Linux session:

  - Disconnect from ThinLinc
  - Log in to the Windows VM through VMware
  - Open Putty
  - In "Host Name" put: <code>p33-rhel7-login</code>
    - Leave all other settings the same (screenshot below)
  - Type in your user name and password when prompted
  - Type <code>pkill -9 -u p33-yourusername</code>
  - Now it will say that your session is disconnected, this is normal
  - Now try logging in to a Linux session through ThinLinc again.

{{help:pkill-putty-session.png}}
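
If you want to see what will be killed, or try a gentler shutdown first, the steps above can be extended as sketched here. ''p33-yourusername'' is the same placeholder as in the list above; the first two commands are optional additions, not part of the original procedure.

<code bash>
# List the processes you own on this node (PID and command name).
pgrep -l -u p33-yourusername

# Optional: ask the processes to terminate normally (SIGTERM) first.
# This may already disconnect your PuTTY session.
pkill -u p33-yourusername

# Force-kill (SIGKILL) everything you own on this node.
# This also ends the PuTTY session you are typing in.
pkill -9 -u p33-yourusername
</code>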

==== Issues with accessing ssh p33-appn-norment01.tsd.usit.no ====

Occasionally TSD updates the ssh policy and you may get a warning:
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    @    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    IT IS POSSIBLE THAT ...etc
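This is usually solved by deleting the stored host keys with ''rm -f ~/.ssh/known_hosts''. Note that this removes the saved keys for all hosts, so ssh will ask you to confirm each host again. Then try to connect again.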
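
A less drastic alternative, not part of the original instructions, is to remove only the outdated entry for this host with ''ssh-keygen -R'', which leaves the keys for all other hosts in place. A minimal sketch:

<code bash>
# Remove only the stored key for this host from ~/.ssh/known_hosts.
ssh-keygen -R p33-appn-norment01.tsd.usit.no

# Reconnect; ssh will ask you to confirm the new host key.
ssh p33-appn-norment01.tsd.usit.no
</code>

If the warning persists, the host may also be stored under a short name or an IP address; in that case repeat the ''ssh-keygen -R'' command with that name.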

==== 7 days' backups... ====
Deleted too much while cleaning up your files? Don't despair.

See the TSD documentation on backup and file recovery: https://www.uio.no/english/services/it/research/sensitive-data/help/Backup-file-recovery.html

For /cluster/, you can find the snapshots here: /cluster/p/p697/.snapshots
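
Restoring a file from a snapshot could look roughly like the sketch below. The snapshot directory is the one given above; ''SNAPSHOT_NAME'' and the file paths are placeholders, and the layout inside a snapshot is an assumption, so list the directory first to see what is actually there.

<code bash>
# List the available snapshots.
ls /cluster/p/p697/.snapshots

# Copy a file back from a snapshot, preserving timestamps and permissions.
# SNAPSHOT_NAME and path/to/file are placeholders.
cp -a /cluster/p/p697/.snapshots/SNAPSHOT_NAME/path/to/file /cluster/p/p697/path/to/file
</code>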

If the files were deleted more than 7 days ago, please contact tsd-drift@usit.uio.no.
