Tinkergpu:Newuser

From tinkergpu
Revision as of 14:36, 27 August 2019 by Root

Lab cluster new user

Preparation

You will need an ssh client to log in remotely to our computer cluster. On Mac or Linux, an ssh client is built in. On Windows, you can install MobaXterm: https://mobaxterm.mobatek.net/

If you are new to Linux: https://ryanstutorials.net/linuxtutorial/

Our cluster

You first need to ssh into "bme-venus.bme.utexas.edu". You can do so from on or off campus using your password (all other nodes require an ssh key). Then run "ssh bme-nova". bme-nova is the head node of our computer cluster, where we do the real calculations and simulations. From there you can start a job with "ssh node120 xxx", where xxx is your simulation script. You can write your own script to submit a batch of jobs to these nodes (check availability and skip any node that is busy). Your home directory is shared among nova and all nodes via NFS. If your job writes to disk very often, run it on the local disk /scratch on each node instead of in the NFS-mounted home directory.
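The "check availability and skip busy nodes" idea can be sketched as a small shell helper. This is only an illustration, not the lab's official submit script: the node list (apart from node120), the MAX_LOAD threshold, and the function names are assumptions, and it presumes passwordless ssh from nova to the nodes is already set up.

```shell
# Sketch: start a command on the first idle node from a list.
# NODES (other than node120), MAX_LOAD, and all function names are assumptions.
NODES="${NODES:-node120 node121 node122}"   # hypothetical node list
MAX_LOAD="${MAX_LOAD:-1.0}"                 # skip nodes whose 1-minute load exceeds this

# Succeed (exit 0) when load value $1 is at or below threshold $2.
is_idle() {
    awk -v load="$1" -v max="$2" 'BEGIN { exit !(load + 0 <= max + 0) }'
}

# Print the 1-minute load average of node $1, read over ssh.
node_load() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" "cut -d' ' -f1 /proc/loadavg"
}

# Run "$@" in the background on the first idle node; skip busy or unreachable nodes.
submit() {
    local node load
    for node in $NODES; do
        load=$(node_load "$node") || continue   # unreachable node: skip it
        [ -n "$load" ] || continue              # no reading: skip it
        if is_idle "$load" "$MAX_LOAD"; then
            ssh -f "$node" "nohup $* > /dev/null 2>&1 &"
            echo "started on $node"
            return 0
        fi
    done
    echo "no idle node found" >&2
    return 1
}
```

After sourcing these functions on nova, something like "submit ./my_simulation.sh" (a hypothetical script name) would start the job on the first node whose load is below the threshold.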

More about cluster structure at the end.

Your home directory (echo $HOME) is located in either /home or /users. Physically, these files reside on bme-nova; we use NFS mounts to share your home folder with all computers/nodes in the lab, so you see the same file structure everywhere. As a result, when you edit a file on venus, the access actually happens remotely over the network, which is slower (hence use /scratch, which is local to each node, for serious jobs). Your home directory has a quota on both disk space and number of files, and you won't be able to create new files once it is reached.
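One way to keep heavy I/O off the NFS home directory is to run the job in a temporary directory under /scratch and copy the results back when it finishes. The following is a minimal sketch under assumptions: the per-user directory naming, the SCRATCH_ROOT variable, and the run_in_scratch function are illustrative and not an existing lab tool.

```shell
# Sketch: run a job in node-local scratch space, then copy the results home.
# SCRATCH_ROOT, the directory naming, and run_in_scratch are assumptions.
SCRATCH_ROOT="${SCRATCH_ROOT:-/scratch}"

# Usage: run_in_scratch DEST_DIR COMMAND [ARGS...]
run_in_scratch() {
    local dest workdir status
    dest="$1"; shift
    # Create a private working directory on the local disk.
    workdir=$(mktemp -d "$SCRATCH_ROOT/${USER:-job}.XXXXXX") || return 1
    # Run the command inside it so frequent writes stay off NFS.
    ( cd "$workdir" && "$@" )
    status=$?
    # Copy everything back once, then clean up the scratch directory.
    mkdir -p "$dest" && cp -r "$workdir"/. "$dest"/ && rm -rf "$workdir" || return 1
    return "$status"
}
```

For example, "run_in_scratch $HOME/results ./my_simulation.sh" (hypothetical paths) would leave the output files in $HOME/results after the run.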

You should also set up an ssh key for passwordless login right after you log into bme-venus with the temporary password. This lets you move between computers within the lab without typing a password (a secure key stored in your home folder is used instead).

Generate a key pair with "ssh-keygen", then authorize it:

cd ~
ssh-keygen -t rsa (then press Enter at every prompt)
cd .ssh
cp id_rsa.pub authorized_keys
chmod 600 authorized_keys
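If you already have an authorized_keys file, copying over it would discard the keys it contains. A hedged alternative is to append the public key only when it is not already listed. The authorize_key function name and its default paths below are assumptions for illustration.

```shell
# Sketch: add a public key to authorized_keys without overwriting existing entries.
# authorize_key and its defaults are assumptions, not an existing lab tool.
# Usage: authorize_key [SSH_DIR] [PUBLIC_KEY_FILE]
authorize_key() {
    local sshdir pub
    sshdir="${1:-$HOME/.ssh}"
    pub="${2:-$sshdir/id_rsa.pub}"
    [ -f "$pub" ] || { echo "no public key at $pub" >&2; return 1; }
    touch "$sshdir/authorized_keys"
    # Append the key only if the exact line is not already present.
    grep -qxFf "$pub" "$sshdir/authorized_keys" || cat "$pub" >> "$sshdir/authorized_keys"
    # Restrict permissions so only the owner can read or modify these files.
    chmod 700 "$sshdir"
    chmod 600 "$sshdir/authorized_keys"
}
```

Running authorize_key more than once is harmless, since the key is added only the first time.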

Use the command "passwd" to change your password (to something more secure) after you log into nova (your change will be propagated everywhere automatically).

The node activities can be monitored here:

http://biomol.bme.utexas.edu/ganglia/?c=NOVA&m=load_one&r=hour&s=by%20name&hc=4 
Nodes with GPUs: http://biomol.bme.utexas.edu/~pren/gpu.log (avoid using all CPUs on these nodes)

 

Utilities

We don't have a job queuing system (yet), so you simply log into a node to run your job.

Use "top" to check the load on a node. 600% means 6 cores are fully loaded.
Use "less /proc/cpuinfo" to check how many CPU cores/threads a node has.
Use "free -g" to check free memory (esp. for QM jobs). The line "-/+ buffers/cache:" has the real number for free space (newer versions of free show this in the "available" column instead).
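These checks can be combined into a quick one-line summary of a node. The sketch below is illustrative only: the function names are assumptions, and it estimates free cores as the logical CPU count minus the 1-minute load average, which is a rough guide rather than an exact measure.

```shell
# Sketch: estimate how many cores on a node are free.
# free_cores and node_summary are assumed names for illustration.

# Print cores minus load, rounded down and never below 0.
# Usage: free_cores LOAD CORES
free_cores() {
    awk -v load="$1" -v cores="$2" \
        'BEGIN { n = int(cores - load); if (n < 0) n = 0; print n }'
}

# Print the logical CPU count, 1-minute load, and estimated free cores
# for the node this runs on (Linux only: reads /proc).
node_summary() {
    local cores load
    cores=$(grep -c '^processor' /proc/cpuinfo)
    load=$(cut -d' ' -f1 /proc/loadavg)
    echo "cores=$cores load=$load free_cores=$(free_cores "$load" "$cores")"
}
```

For instance, a load of 6.00 on a 16-thread node corresponds to about 10 free cores, matching the "600% means 6 cores" rule of thumb for top.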

More tips about Linux commands and utilities:

https://docs.google.com/document/d/1cnmSItdRXDBcpVBGhwDahJVJ2jeh4l2oDVpMBLtlySE/edit?usp=sharing 

Backups

All home directories (/home, /users, /opt, /work, /work2) are backed up twice a week, and the backups are kept for about 2 weeks on bigdata.bme.utexas.edu. If you need to recover any files from the last couple of weeks, you should be able to find them there: log into bigdata, then cd /bigdata/renlab/.

Cluster structure

File:Lab Cluster.pdf