MPRC HPCC Basic Tutorials and Tips

Here is a basic overview, some helpful tips, tricks, FAQs and links. Feel free to contact us as well!

Computer Cluster Basics

Computer clusters have become a popular, inexpensive way to increase processing power: highly complicated tasks are split up and run in parallel across several computers, which reduces the overall time the tasks need. A cluster is a group of independent computers (called nodes) connected through a communication network that lets tasks be split and distributed among them. Clusters are especially useful for analyzing large quantities of data, e.g. graphics, images, and genomes. The nodes can have shared or individual memory, but they require specific software that handles communication between them and makes parallel computing possible, that is, connected nodes working on parts of a large data set or program at the same time.

Though our computers are separate and can be used individually, the MPRC cluster can be used to run programs in parallel with the qsub command from the Linux terminal, which we will learn more about in the next parts of this tutorial. The Portable Batch System (PBS) software for our cluster includes the Torque scheduler, which oversees the messaging and timing of tasks between the nodes, allowing for efficient completion of tasks. In one of the later sections on this page, you will learn more about preparing PBS submission scripts. However, first you need to be comfortable using the Linux command line!

Linux

There are two parts to a Linux system: the kernel and the shell. The kernel is the core software that manages the system's hardware and resources, and it never really interacts directly with the user. The shell is the software that interprets the user's commands and acts as the go-between for the user and the computer. The shell command line is designated on our systems by the $ prompt. In the basic tutorial below, we use $ to show that a command is typed in the shell; you do not need to type the $ yourself.

The first thing you must be able to do is log onto the login.mdbrain.org server. If you haven’t already, check out our Request Account page and contact Mr. Sung Yu for access to the MPRC system; otherwise you will not be able to sign in. To access login.mdbrain.org, you use ssh. This command stands for Secure Shell and, among other things, allows you to log in to a remote host from the command line. In order to log in, you would use

$ ssh username@login.mdbrain.org

which will take you to the login server. This will then prompt you for your password and Duo Mobile authentication. This is why contacting Mr. Sung Yu and downloading Duo Mobile must be done first. After inputting your password, Duo Mobile will give you three options:

A Duo Push: This sends an approval request to the Duo Mobile App you have installed, which you must accept
A phone call to your mobile
SMS passcodes sent to your mobile number

Once Duo Mobile has approved your entry request, you will be on the login.mdbrain.org server. For more information on how to log into our system, please watch the Login Video on our Video Tutorials page!

To access some files on login.mdbrain.org, you might need admins access. If you need to switch to admins, you must type

$ su admins

into the command line, which tells Linux to switch user to admins. After signing in as admins, you will be able to use the sudo command to run commands with administrator privileges. For example, to use the vi editor as an administrator, you put sudo in front of the command:

$ sudo vi /PATHWAY/file.jsp

**However, only use sudo and admins when absolutely necessary and when you know what you are doing, as these commands can install software or delete files, including other users’ data**

There are many commands that do not require admins access. Begin by trying cal. You should get results that look something like this:

$ cal
   February 2019
Su Mo Tu We Th Fr Sa
                1  2
 3  4  5  6  7  8  9
10 11 12 13 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28

You can also try date and who. Note the outputs and how they correspond to the command. Next try using the manual command, man. If you just type man you get a smart-alecky response returned in your terminal. This is because man needs to be used with another command. As implied by the name “manual”, this command returns a description of the command it is used with. Try man with cal:

$ man cal

As you can see the output is information on the uses of cal. There is even a list of how you can modify the command cal to get different outputs or edit the calendar command.

The next important concepts you will need to understand are how to open directories (think of them as folders if that helps) and how to see the contents of those directories, as our Linux computers are all command line based rather than icon based. In order to view the contents of a directory, you can use the command ls, which will show the contents of the directory you are currently working in. It can be modified by adding option letters. For example, try:

$ ls

And then try:

$ ls -l

The -l option shows the listed files in the “long” format, which gives you a bit more information. Other options that are often used with ls are -r, which displays the list in reverse order, and -t, which sorts the list by when the files were last modified, newest first. Thus, when all of these options are combined with ls,

$ ls -ltr

the files are listed in long format from least recently modified to most recently modified. Other basic commands you need to know are touch, which creates a new empty file; mkdir, which creates a new directory; and cd, which changes your working directory. When using touch or mkdir, make sure you include a name. For example,

$ mkdir SomethingFun

will create a directory called “SomethingFun”. If you want to put the output of a command in a certain file, you can use the ">" symbol. For example, you can put a calendar into a file “File1”:

$ cal > File1

And you can create a copy of “File1” using the cp command:

$ cp File1 File2

You would now have a copy of the calendar called “File2”, which you could rename to “FebCal” using the mv command:

$ mv File2 FebCal

*Tip: use the star symbol (*), also called the wildcard, in place of the part of a file name that varies if you have multiple files with similar names that you would like to move/delete/etc. all at once! For example, File* matches both “File1” and “File2”.
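
As a minimal sketch of the wildcard tip, the commands below create a few similarly named files in a temporary directory and then move and delete them in groups. The file names (cal_jan.txt and so on) and the Archive directory are made up for illustration.

```shell
#!/bin/bash
# Work in a throwaway directory so no real files are affected.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical file names that share a prefix or an extension.
touch cal_jan.txt cal_feb.txt cal_mar.txt notes.txt
mkdir Archive

# cal_* matches every name that starts with "cal_".
mv cal_* Archive/

# *.txt matches every name in the current directory that ends in ".txt".
rm *.txt

# Show what ended up in Archive.
ls Archive
```

It is a good habit to run ls with the same pattern first (for example, ls cal_*) to see exactly which files a wildcard matches before you move or delete them.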

All of these files are in whichever directory you were working in when you created them (“SomethingFun”, if you moved into it with cd first). If you wanted to move “FebCal” to a directory you had already made called “Calendars”, you would also use the mv command:

$ mv FebCal Calendars/

Now to print the file “FebCal” in your terminal you would use the command cat:

$ cat FebCal
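
The steps above can be strung together as one short script. This is only a sketch: it works inside a temporary directory so nothing in your home directory changes, and it uses date in place of cal simply because cal is not installed on every system.

```shell
#!/bin/bash
# Work in a throwaway directory so nothing in your home directory changes.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir SomethingFun          # create a directory
cd SomethingFun || exit 1   # make it the working directory
mkdir Calendars             # create a subdirectory for the final file

date > File1                # redirect the output of a command into File1
cp File1 File2              # copy File1 to File2
mv File2 FebCal             # rename File2 to FebCal
mv FebCal Calendars/        # move FebCal into Calendars

ls -ltr                     # long listing, most recently modified last
cat Calendars/FebCal        # print the contents of the moved file
```

Note that mv does two jobs here: given a new name it renames a file, and given a directory it moves the file into that directory.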

You can always use control-c to cancel a command or type exit to quit the terminal.

If you wish to delete a file you’ve already copied or do not want it in the directory anymore, you can use the rm command. In order to remove an entire directory, which you must be careful about as there is no easy way to recover deleted files, you add the option -r to your rm command.
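
A small sketch of both forms of rm, again run in a temporary directory. The names OldStuff, a.txt, b.txt, and extra.txt are made up for illustration.

```shell
#!/bin/bash
# Work in a throwaway directory so no real files are deleted.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Set up a directory with two files in it, plus one loose file.
mkdir OldStuff
touch OldStuff/a.txt OldStuff/b.txt extra.txt

rm extra.txt       # remove a single file
ls OldStuff        # check what is inside before deleting the directory
rm -r OldStuff     # remove the directory and everything in it
```

If you want a safety net, the -i option (rm -i or rm -ri) makes rm ask for confirmation before each deletion.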

Furthermore, if you have forgotten a command and remember the basics of what it does or is, you can use the command apropos to find relevant commands. Use the keywords about the function of the command you want to find as the argument for the apropos command. For example, if we forgot how to display the calendar, we would input:

$ apropos calendar

This will give you all the commands, such as cal and difftime, that have to do with calendars. If we remember that our forgotten command was cal, we can use it, OR we can use man (mentioned above) to check which one of the two commands we want to use.

These are some of the basic commands that you will use often in Linux, but there are many more. You can find many resources for basic Linux commands online and in book form, so we will not rehash all Linux commands here, but encourage you to explore online and try out new commands as you find them.

Command Terminal Vi Editor

The vi editor is a text editor built into Linux. This allows for easy creating and editing of files. You can open a file with vi

$ vi existingfilename

or create a new file by typing a new file name after vi instead of an existing file’s name. The vi editor has two modes: command and insert. The editor automatically starts in command mode.

In this mode, keystrokes are not inserted into the text; instead, character keys act as commands for navigating and editing. h moves the cursor to the left, j moves it down, k up, and l to the right. You can also replace characters while in command mode: r replaces the single character underneath the cursor, while R lets you keep replacing characters until you hit "esc".

Cutting, copying, and pasting text can also be done in command mode. To cut a single line, use the command dd. Adding a number before dd changes the number of lines cut; for example, if you need three lines of text cut, you would type 3dd. To cut words instead of whole lines, use dw, again using a number in front of the command to change the number of words affected. Be careful when cutting, as these commands start from where your cursor is. Copying is done using the command yy and can be modified the same way as cutting. To paste, move the cursor to where you would like the cut or copied text to go and type p.

Also while in command mode, you can search your entire text for certain words. Typing /vi and pressing Enter, for example, jumps to the next occurrence of "vi" in the text. You can then navigate through the instances of the searched word using n to go to the next match (downwards in the text) and N to go to the previous match (upwards). Also, u undoes the previous change, U undoes all changes to the current line, and x deletes the character under the cursor.

In insert mode, any keystrokes will be added to the text of the file. The mouse is not useful in insert mode, unless you combine clicks with the "alt" or "option" keys. Otherwise all movements of the cursor within the file must be done with the arrow keys. You enter insert mode by typing i, which begins the insertion of characters at the cursor; a, which inserts characters after the cursor; A, which begins insertions at the end of the line; or o, which opens a new line in which to insert characters. There will be a line at the bottom of the terminal window indicating that you are in insert mode. To get back to command mode, press "esc". It is a good idea to save when going between insert and command modes by keying :w while in command mode, which will save any changes but not close the vi editor. To save and quit, you can key ZZ (shift-z twice), :wq, or :x. To quit without saving changes, you can key :q!, which discards unsaved changes, unlike :q, which only quits the vi editor if there are no unsaved changes.

Job Scripts and PBS Scheduling

What is a job?

A job is a process the user wishes to be carried out. A job is often not just one command but several, and it generally requires a large amount of memory to proceed. Thus, the submission requires a listing of what the process should do, the applications to use, the inputs and outputs, the amount of memory allocated to the job, and the number of CPUs the load should be distributed over.

A Job Scheduler

A job scheduler is what takes care of the job once you submit it to the computer or cluster. Since you have specified the environmental parameters, the scheduler (in this case the Torque scheduler) manages everything you need done in the most efficient way possible.

Submission Scripts

Submissions are done using a PBS (bash) script with file extension (.sh). This script does not detail the job, but the parameters involved in running the job. Consider it a meta file to your job so that the cluster knows how to handle your job request.

Constructing a Submission Script

The script should start off with a “shebang”, #!/bin/bash, which specifies to the terminal that the file is to be executed using bash. Afterwards, the script should specify the following parameters:

Number of nodes (computers) required
Number of processors per node needed
Maximum amount of run time
Application in which to run the job
Host details
Working directory

You can find a list of specific PBS script commands from Westgrid.ca or watch our MPRC Cluster Video on our Video Tutorials page for more information.

* Note: Since the job scheduler takes care of the process, it is not necessary for you to be logged in while the process runs. You can add an email option to your PBS script so that you are emailed when the job starts, ends, or aborts.
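
To make the parameters listed above more concrete, here is a minimal sketch of what a Torque/PBS submission script can look like. The job name, resource amounts, email address, and output file name are all placeholders rather than MPRC defaults, so check the values that apply to our cluster before reusing it. The lines starting with #PBS are read by the scheduler as options; bash itself treats them as comments.

```shell
#!/bin/bash
# Sketch of a Torque/PBS submission script. The job name, resource amounts,
# and email address below are placeholders, not MPRC defaults.

# Job name shown by qstat
#PBS -N example_job
# One node with four processors on that node
#PBS -l nodes=1:ppn=4
# Maximum run time (hours:minutes:seconds)
#PBS -l walltime=01:00:00
# Memory requested for the job
#PBS -l mem=4gb
# Send email when the job aborts (a), begins (b), and ends (e)
#PBS -m abe
# Hypothetical address that receives those emails
#PBS -M user@example.com
# Merge standard error into the standard output file
#PBS -j oe

# PBS_O_WORKDIR is set by the scheduler to the directory qsub was run from;
# fall back to the current directory when the script is run by hand.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# The actual work goes here. This placeholder only records where and when it ran.
echo "Job ran on $(uname -n) at $(date)" > example_job_result.txt
cat example_job_result.txt
```

Because the #PBS lines are ordinary comments to bash, you can run the script by hand first (bash filename.sh) to check the commands before handing it to the scheduler.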

Job Scripts Submission

Now that we’ve created our submission script, we need to submit the job to the server. This is accomplished using the qsub command. To submit the script, simply type:

$ qsub filename.sh

To check the status of the submitted job, use qstat -u username to get a printout of the job progress. Under the S column there will be either a “Q” for queued or an “R” for running. If nothing shows up, the job is finished! You can use ls to check that the job's output file is in your directory and then use cat to print the results out in your terminal.
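
The submit-and-check routine can be sketched as a short script. The describe_state helper is a hypothetical convenience function, not part of Torque, and it covers only the most common state letters; filename.sh stands for your own submission script. The script checks that qsub is available, so it prints a reminder instead of failing when run away from the cluster.

```shell
#!/bin/bash
# Hypothetical helper: translate the one-letter code in the S column of
# qstat output into words. Only common Torque states are listed.
describe_state() {
    case "$1" in
        Q) echo "queued" ;;
        R) echo "running" ;;
        C) echo "completed" ;;
        E) echo "exiting" ;;
        H) echo "held" ;;
        *) echo "unknown state: $1" ;;
    esac
}

script="filename.sh"   # placeholder for your own submission script

if command -v qsub >/dev/null 2>&1 && [ -f "$script" ]; then
    job_id=$(qsub "$script")      # qsub prints the ID of the new job
    echo "Submitted job: $job_id"
    qstat -u "$USER"              # list your jobs and their states
else
    echo "qsub or $script not found; run this on the cluster with your own script."
fi

describe_state Q   # prints: queued
describe_state R   # prints: running
```

Saving the job ID that qsub prints is handy, since it is the same ID that qdel needs if you later have to cancel the job.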

If you ever need to delete your job you can use the qdel command:

$ qdel jobID

or:

$ qselect -u <username> | xargs qdel

For more qsub commands, visit the command documentation at AdaptiveComputing.com