Difference between revisions of "Useful Tools"

From Computer Science

Revision as of 10:58, 4 September 2017

The following is a list of useful tools that you may find helpful while navigating your CS major/minor.

If you have any tools you believe belong on this list, please email Ruben Gilbert.

Notes on Style

In order to easily read this guide, you should be aware of the notation being used. A statement may read:

$ [user@my-machine ~]: command <variable> some/kind/of/path

The $ is simply denoting that this line is from a console window. In other words, the text following a $ should be read as though it were in a Terminal window.

The [ ] enclose a typical terminal prompt, which usually contains the name of the user @ the machine they are currently logged into.

The ~ is shorthand for your home folder (and is generally the default location a console opens to). If you change to a different directory, the ~ is replaced by the last folder in your current path (e.g. if you are in /home/<username>/Documents/cs101, you would see just cs101). NOTE: You may change these settings by editing your ~/.bashrc file.

The : will be followed by the command being referenced.

Any variable value that changes based on the user will be enclosed in < >. (e.g. when a command requires a <username>, we don't actually enter "<username>", but instead your username).

A command that requires some/kind/of/path will need you to specify a path for the command to run to/from. Generally speaking, the path can be relative or absolute.

Application Managers

nvm (for nodejs)

Fill

Miniconda

Miniconda is a stand-alone release of the popular package manager conda. Its older brother is Anaconda -- the conda package manager bundled with 150+ packages.

TODO: Examples of usage and reasons for virtual environments

File System Resources

When you enroll in a Middlebury CS course, a CS user account is created for you. This account inherits your Middlebury username and password, but has resources that are distinct from your college account.

Home Folders

The account that is generated for you is a typical Unix user account. This means it has the standard folder structure you would expect on a Linux or Mac machine (i.e. Desktop, Documents, Pictures, etc).

You are the owner of your account and all files contained within your subdirectory. You are free to delete folders/files inside your user directory as you wish (now, remember, "Just because you can, doesn't mean you should" -- but if you really have an itch to delete everything, you can). You can create subfolders for classes or projects whenever you want. You can upload files to or download files from your user directory freely. It is YOUR account.

By default, your folder is viewable by anyone on the network (permission level rwxr-xr-x, or 755). This means people who are not you can view and execute (but NOT write to) files inside your user directory. If you would like to change this, talk to Ruben or research the chmod command.
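As a minimal sketch of what chmod does to these permission levels, the commands below switch a throwaway directory between 755 and 700 and print the resulting permission string. The directory comes from mktemp and the variable names are only illustrative; nothing here touches your real home folder.

```shell
# Work on a throwaway directory rather than your real home folder.
demo_dir="$(mktemp -d)"

# 755 = rwxr-xr-x: the owner has full access; everyone else can read and enter.
chmod 755 "$demo_dir"
open_mode="$(ls -ld "$demo_dir" | cut -c1-10)"
echo "default: $open_mode"    # default: drwxr-xr-x

# 700 = rwx------: only the owner can read, write, or enter.
chmod 700 "$demo_dir"
private_mode="$(ls -ld "$demo_dir" | cut -c1-10)"
echo "private: $private_mode"    # private: drwx------

# Remove the throwaway directory again.
rmdir "$demo_dir"
```

To apply the same change to your own home folder, the equivalent command would be chmod 700 ~ (check with Ruben first if you are unsure what else depends on the default).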


public_html Folders

Fill

HTCondor

HTCondor is a specialized workload management system that excels at providing high-throughput computing via collections of distributively owned computing resources.

What does that mean?!

Condor uses otherwise unused CPU cycles to compute various sets of jobs.

What does that mean with respect to Midd CS?!

The MBH 632 lab is full of machines that are often idle or under light use. The condor system allows us to utilize the unused computing power of the lab to 1) run processes that we may otherwise not have the computing power to perform efficiently and/or 2) drastically shorten the amount of time it takes to run large numbers of iterations of the same set of code and/or 3) run a job away from a personal machine (eliminating hazards such as accidentally killing a job by letting your laptop sleep, etc). In the following sections, I look to provide a summary of what the system entails, as well as how you can use it.

What is Condor?

Before we start throwing code at condor, we should understand, in a general sense, what the system is and how it is set up. Here is a link to the full condor manual, if a daring soul is interested. Be warned, it is an extremely long set of documents (1146 pages to be exact), much of which is not pertinent to you using condor.

From any MBH 632 lab machine console, you can run the command:

$ [user@lab-machine ~]: condor_status

As you might expect, this polls the condor system for its current status. You should receive output looking something like the following (NOTE: condor_status is a one-time polling command. It does not auto-update):

Name                                   OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime
slot1@abe.cs.middlebury.edu            LINUX      X86_64 Owner     Idle      0.000 1994  2+16:35:36
slot2@abe.cs.middlebury.edu            LINUX      X86_64 Owner     Idle      0.000 1994  2+16:35:37
slot3@abe.cs.middlebury.edu            LINUX      X86_64 Owner     Idle      0.000 1994  2+16:35:38
slot4@abe.cs.middlebury.edu            LINUX      X86_64 Owner     Idle      0.000 1994  2+16:35:39
slot5@abe.cs.middlebury.edu            LINUX      X86_64 Owner     Idle      0.000 1994  2+16:35:40
slot6@abe.cs.middlebury.edu            LINUX      X86_64 Owner     Idle      0.000 1994  2+16:35:41
slot7@abe.cs.middlebury.edu            LINUX      X86_64 Owner     Idle      0.000 1994  2+16:35:42
slot8@abe.cs.middlebury.edu            LINUX      X86_64 Owner     Idle      0.000 1994  2+16:35:35
slot1@battell.cs.middlebury.edu        LINUX      X86_64 Unclaimed Idle      0.000 1994  0+00:04:35
slot2@battell.cs.middlebury.edu        LINUX      X86_64 Unclaimed Idle      0.000 1994  0+00:05:05
slot3@battell.cs.middlebury.edu        LINUX      X86_64 Unclaimed Idle      0.000 1994  0+00:05:06
slot4@battell.cs.middlebury.edu        LINUX      X86_64 Unclaimed Idle      0.000 1994  0+00:05:07
slot5@battell.cs.middlebury.edu        LINUX      X86_64 Unclaimed Idle      0.000 1994  0+00:05:08
slot6@battell.cs.middlebury.edu        LINUX      X86_64 Unclaimed Idle      0.000 1994  0+00:05:09
slot7@battell.cs.middlebury.edu        LINUX      X86_64 Unclaimed Idle      0.000 1994  0+00:05:10
slot8@battell.cs.middlebury.edu        LINUX      X86_64 Unclaimed Idle      0.000 1994  0+00:05:03
slot1@bloodroot.cs.middlebury.edu      LINUX      X86_64 Unclaimed Idle      0.010 1994  0+00:04:34
slot2@bloodroot.cs.middlebury.edu      LINUX      X86_64 Unclaimed Idle      0.000 1994  0+00:05:05
slot3@bloodroot.cs.middlebury.edu      LINUX      X86_64 Unclaimed Idle      0.000 1994  0+00:05:06
slot4@bloodroot.cs.middlebury.edu      LINUX      X86_64 Unclaimed Idle      0.000 1994  0+00:05:07
slot5@bloodroot.cs.middlebury.edu      LINUX      X86_64 Unclaimed Idle      0.000 1994  0+00:05:08
slot6@bloodroot.cs.middlebury.edu      LINUX      X86_64 Unclaimed Idle      0.000 1994  0+00:05:09
slot7@bloodroot.cs.middlebury.edu      LINUX      X86_64 Unclaimed Idle      0.000 1994  0+00:05:10
slot8@bloodroot.cs.middlebury.edu      LINUX      X86_64 Unclaimed Idle      0.000 1994  0+00:05:03
.
.
.
slot1@wilson.cs.middlebury.edu         LINUX      X86_64 Unclaimed Idle      0.000 1994  0+00:04:34
slot2@wilson.cs.middlebury.edu         LINUX      X86_64 Unclaimed Idle      0.000 1994  0+00:05:05
slot3@wilson.cs.middlebury.edu         LINUX      X86_64 Unclaimed Idle      0.000 1994  0+00:05:06
slot4@wilson.cs.middlebury.edu         LINUX      X86_64 Unclaimed Idle      0.000 1994  0+00:05:07
slot5@wilson.cs.middlebury.edu         LINUX      X86_64 Unclaimed Idle      0.000 1994  0+00:05:08
slot6@wilson.cs.middlebury.edu         LINUX      X86_64 Unclaimed Idle      0.000 1994  0+00:05:09
slot7@wilson.cs.middlebury.edu         LINUX      X86_64 Unclaimed Idle      0.000 1994  0+00:05:10
slot8@wilson.cs.middlebury.edu         LINUX      X86_64 Unclaimed Idle      0.000 1994  0+00:05:03
                Machines Owner Claimed Unclaimed Matched Preempting
       X86_64/LINUX      248     8       0       240       0          0
              Total      248     8       0       240       0          0

In more sophisticated setups of Condor, the output of condor_status may look less uniform. For our purposes, we have a lab with 31 identical machines. Therefore, we are going to see a long list of similar information. The info we get is broken down into the following categories:

  • Name specifies the slot (thread) and name of the machine.
    • The MBH 632 lab machines have 4-core, 8-thread CPUs. Condor treats each thread as a "slot", and each slot can have a job matched to it.
  • OpSys is the operating system of the machine.
    • All of the lab machines are running Fedora 25, so we see LINUX.
  • Arch is the architecture of the machine's CPU.
    • All of the lab machines have 64-bit Intel i7 CPUs using the Intel x86 instruction set (think CS202!)
  • State refers to a machine's current activity.
    • The only states you really need to know about are Owner, Unclaimed, and Claimed.
      • Owner means that particular slot is not available to run condor jobs. In our pool, only the Central Manager machine shows this state.
      • Unclaimed means that particular slot of the machine is available to handle a condor job
      • Claimed means that particular slot of the machine is claimed by a condor job (and may, or may not, actually be running it)
  • Activity specifies what the slot is currently doing. You will likely only ever see Idle or Busy here, which are self-explanatory.
  • LoadAv is the load average of that slot (not very useful for non-admins)
  • Mem is the amount (in MB) of memory (RAM) available to that slot.
    • Each lab machine has 16 GB of RAM. The 16 GB (16384 MB) of RAM has roughly 15954 MB of usable memory. Divide that among 8 slots and you get 1994 MB of memory per slot.
    • There are ways to specify the need for more memory in a condor job, but, for our purposes, 1994 MB per slot is plenty (and you don't really get a choice because all the machines are the same!)
  • ActvtyTime is just noting how long a machine has been in its current state (not very useful for non-admins).
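The slot arithmetic above can be checked with a few lines of shell. This is only a sketch using the figures quoted in this section (15954 MB usable per machine, 8 slots per machine, 248 slots in the sample output); the variable names are illustrative.

```shell
usable_mb=15954        # usable memory per machine, from the Mem description above
slots_per_machine=8    # one slot per CPU thread
total_slots=248        # "Total" row of the sample condor_status output

# Shell arithmetic is integer-only, so 15954 / 8 = 1994.25 is truncated to 1994.
mem_per_slot=$(( usable_mb / slots_per_machine ))
machines=$(( total_slots / slots_per_machine ))

echo "Memory per slot: ${mem_per_slot} MB"    # Memory per slot: 1994 MB
echo "Machines in pool: ${machines}"          # Machines in pool: 31
```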

Types of Machines

In most condor pools (including the Middlebury pool), there are 3 types of machines: Central Manager, Submit, and Execute. The major distinction between each type of machine lies in which condor daemons are running.

Central Manager

The Central Manager machine is solely responsible for resource allocation in the condor pool. When a job is submitted, it is the Central Manager's role to "advertise" the job to all execute machines (i.e. the Central Manager looks at the job request and compares it to the qualities of every execute machine until it finds a viable match, at which point it ships off the job to be executed). In the Midd configuration, the Central Manager machine is abe.cs.middlebury.edu. This machine will never run any jobs (for the sake of security). This is why it holds the specific state of Owner, and will never become Unclaimed.

Submit Machine

At least one machine needs to be able to submit jobs to the condor pool. In the Midd configuration, there is exactly one (1) machine that can submit jobs to the pool: camelshump.cs.middlebury.edu (the machine closest to the printer). Submit machines can also be execute machines (in the Midd configuration, this is true -- camelshump submits jobs, but can also execute them).

Execute Machine

Execute machines are exactly that: machines that execute jobs. They stand by in an idle state until they receive a request from the Central Manager.

How to Submit a Job

  • Step 0: Make absolutely sure your code is free of infinite looping and/or infinite recursion errors. Once inside the condor system, there is no way for you, an admin, or anyone else to determine if the job that has been running for days is actually still making progress or will never reach completion. As a rule of thumb, admins don't want to remove anything from the queue unless specifically asked by the user (users also have the ability to remove their own jobs from the queue -- See: How To Remove a Job).
  • Step 1: Ensure you are connected to the Submit Machine with your Middlebury account. This can be either locally or remotely via SSH.
    • If you attempt to submit a job from a non-submit machine, you will be met with: ERROR: Can't find address of local schedd. That's condor's way of telling you that you submitted from the wrong machine.
  • Step 2: Make sure your environment is prepared.
  • Step 3: Write a Submission File.
  • Step 4: Submit your job to the condor pool:
$ [user@camelshump ~]: condor_submit your_submit_file.sub

After a moment, you should get a response that your job(s) have been submitted.

Preparing Your Environment

Writing a Submission File
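Until this section is filled in, here is a minimal sketch of what your_submit_file.sub might contain, written out from the shell. The keywords (executable, arguments, output, error, log, queue) are standard HTCondor submit commands, but my_job.sh and the output file names are hypothetical placeholders, and this is not necessarily the layout recommended for the Midd pool.

```shell
# Write the sketch into a scratch directory so nothing in your home folder is overwritten.
work_dir="$(mktemp -d)"
submit_file="$work_dir/your_submit_file.sub"

# The quoted 'EOF' stops the shell from expanding $(Process); condor expands it itself.
cat > "$submit_file" <<'EOF'
# my_job.sh is a hypothetical executable script of your own
executable = my_job.sh
arguments  = $(Process)
output     = my_job.$(Process).out
error      = my_job.$(Process).err
log        = my_job.log
queue 3
EOF

cat "$submit_file"
# On the submit machine you would then run: condor_submit "$submit_file"
```

With queue 3, condor would queue three jobs whose $(Process) values are 0, 1, and 2, so each job gets its own argument and its own output and error files.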

Monitoring Your Submission(s)

How to Remove a Job

FAQ

THIS SECTION IS A WORK IN PROGRESS

Shell Commands/Tools

PuTTY

PuTTY is an open source SSH client developed for the Windows platform. You can download it here (just the putty.exe binary form will do, but if you are feeling ambitious you can download the .msi installer for all of the tools).

The PuTTY client has many options for customization (similar to optional arguments with the ssh command). But, to get the basic usage out of it, all you need to do is supply the full hostname of the machine you want to connect to.

Example

Picture example

If the connection can be made, you will be prompted with a window asking for the username that you would like to log in with. After entering a username, you will be prompted for the password associated with the username. Upon successful authentication, you should see a terminal-esque window like what you would see in a Unix environment.


SCP

Secure CoPy is a command that combines the ssh command with the cp (copy) command. SCP can be used to push local files to a remote server, or it can be used to get a file from a remote server and save it locally.

You can find the manual page here or by typing "man scp" in a terminal window.

NOTE: This command only works for UNIX environments. For Windows, see the pscp utility provided by PuTTY.

Typical Usage

$ [user@my-machine ~]: scp path/to/file/to/send <username>@remote-location:path/to/location/to/put/file

OR

$ [user@my-machine ~]: scp <username>@remote-location:path/to/file/to/get path/to/location/to/put/file/locally

Examples

$ [user@my-machine ~]: scp ./Documents/my_homepage.html user@basin.cs.middlebury.edu:~/public_html/homepage/
$ [user@my-machine ~]: scp user@basin.cs.middlebury.edu:~/cs101/homework1.py ~/Documents/this_semester/cs101/

Tip: You can select multiple source files in one scp command. SCP will evaluate as many files as you select until it reads a destination location (i.e. if you provide many local files, it knows to continue reading local files as sources, and when it reads a remote location, that is the destination, and vice versa). You can also supply the -r argument to recursively push entire directories.
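The tip above can be sketched as follows. To keep it safe to try without a network connection, the sketch uses a small hypothetical dry_run helper that prints each scp command instead of running it; notes.txt is a made-up file name, and "user" stands in for your own username.

```shell
# Dry-run helper: prints the command instead of running it.
# To copy for real, type the printed command at your prompt.
dry_run() { echo "$@"; }

remote="user@basin.cs.middlebury.edu"   # replace "user" with your own username

# Several local sources, one remote destination (always the last argument).
multi_cmd="$(dry_run scp notes.txt homework1.py "${remote}:~/cs101/")"
echo "$multi_cmd"

# -r copies a whole directory tree.
recursive_cmd="$(dry_run scp -r ./Documents/cs101 "${remote}:~/")"
echo "$recursive_cmd"
```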


SSH

Secure SHell is a network protocol that allows remote console (i.e. terminal) login from one machine to another. The Middlebury CS department machines all support SSH from the on-campus network. In addition, there is one machine (basin) that is specifically given a hole in the Middlebury firewall to allow off-campus connections.

You can find the manual page for the SSH command here or by typing "man ssh" on a Mac or Linux terminal. From the manual, the SSH command looks complicated; it's not! There are many optional arguments supported, but to get basic functionality all you need to supply to the command is the username you want to connect with and which machine you want to connect to. If the connection can be established, you will be prompted for the password of the account you are trying to connect with. If the connection cannot be established, you will be given some form of a "cannot resolve hostname" or "connection timed out" error (usually this means the machine is either disconnected from the network, or powered off).

NOTE: The examples below are all in UNIX format. To use SSH from a Windows machine, see the PuTTY section.

Verifying Authenticity

The first time you remotely connect to a machine, you will be given a warning that the authenticity of the machine you are trying to connect to cannot be verified. Assuming you have correctly entered the name of a Middlebury managed machine or a Middlebury IP address, and it is your first time connecting to that machine, you can safely enter "yes". But, for the sake of completeness, you should be aware that spoofing is a thing.

Typical Usage

$ [user@my-machine ~]: ssh <username>@machine-name

OR

$ [user@my-machine ~]: ssh <username>@ip-address

Examples

$ [user@my-machine ~]: ssh <username>@killington.cs.middlebury.edu
$ [user@my-machine ~]: ssh <username>@140.233.20.155

Tip: If you are off-campus and need to connect to a specific machine, you can tunnel through basin to the machine you need with two ssh commands.

$ [user@my-machine ~]: ssh <username>@basin.cs.middlebury.edu
$ [username@basin ~]: ssh <username>@killington.cs.middlebury.edu

OR

$ [user@my-machine ~]: ssh -t <username>@basin.cs.middlebury.edu ssh <username>@killington.cs.middlebury.edu

Tmux

Tmux is a Terminal MUltipleXer. It allows for multiple consoles within a single window, as well as the ability to detach and reattach processes from a single session. You can find the manual page here or by typing "man tmux" in a console.

NOTE: Tmux is a UNIX-only command. While untested by this author, the popular Windows-equivalent is ConEmu (short for Console Emulator). It allows for multiple Command Prompt or PuTTY sessions to be emulated alongside one another.

NOTE 2: If you are a Mac user and would like this utility installed on your machine, go here. It is installed on all CS Linux machines by default.

Keyboard Shortcuts

Keyboard shortcuts can be edited in the file ~/.tmux.conf. Here is a popular community cheat sheet.

Prefix

Tmux commands come after what is referred to as a "command prefix". By default, "ctrl" + "b" is the command prefix. You can change your prefix in ~/.tmux.conf (see Keyboard Shortcuts). In this tutorial, "ctrl" + "b" will be shortened to CB, and commands will be written in the format CB --> <key_to_press_after_prefix>.

Examples

You can achieve the most basic functionality of Tmux simply by calling it:

$ [user@my-machine ~]: tmux

Nothing will appear to happen, except the bottom of your console should change color. This means you are in a tmux session with one pane.

Let's split our single tmux pane horizontally. CB --> "

Now we have 2 consoles, one on top of the other. These consoles are independent of one another. You can use one to ssh to a remote server, and the other to search for local files on your machine, for example.

You can move between panes with CB --> o, or you can allow tmux to read mouse input by inserting the line set -g mouse on into your .tmux.conf file. You can split panes as many times as you want (or until they become illegible!) with CB --> " for horizontal and CB --> % for vertical.

You can detach a tmux session from the current console with CB --> d (or by typing tmux detach). This means the process remains running, but as a distinct program separate from the current console window (useful, for example, if you are ssh'd into a server and want a process to continue running after you log out). You can reattach a tmux session with:

$ [user@my-machine ~]: tmux attach

If you intend to have multiple sessions of tmux running inside a single console session, you can name, attach, detach, and switch between them:

$ [user@my-machine ~]: tmux new -s session1
$ [user@my-machine ~]: tmux detach
$ [user@my-machine ~]: tmux new -s session2
$ [user@my-machine ~]: tmux detach
$ [user@my-machine ~]: tmux attach -t session1
$ [user@my-machine ~]: tmux switch -t session2

If you don't remember all of the sessions you have running, you can use the call:

$ [user@my-machine ~]: tmux list-sessions

to remind you what they are all called.
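For a job that should keep running after you log out, one possible shortcut is to create the session already detached. The sketch below assumes tmux is installed (it skips itself otherwise); demo_session is a hypothetical session name and sleep 30 stands in for your real long-running command.

```shell
session="demo_session"   # hypothetical session name

if command -v tmux >/dev/null 2>&1; then
    # -d creates the session already detached, so no "tmux detach" step is needed.
    tmux new-session -d -s "$session" 'sleep 30'

    # has-session exits with status 0 when the named session exists.
    if tmux has-session -t "$session" 2>/dev/null; then
        session_state="running"
    else
        session_state="missing"
    fi
    tmux list-sessions

    # Clean up the demo session.
    tmux kill-session -t "$session"
else
    session_state="tmux not installed"
fi
echo "Session state: $session_state"
```

In real use you would leave out the kill-session line and later run tmux attach -t demo_session to check on the job.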

To kill the current pane use CB --> x. This will prompt at the bottom of your screen for a y/n to confirm.

There is so much more you can do with tmux, but this is just some basic functionality to get you up and running.

TTY Environments

FILL