ECN No Name Newsletter: May, 1989

The ECN No Name Newsletter is no longer being published. This is an archived issue.

Number Crunchers

Stacey Clark

Large programs are often called "Number Crunchers" because they use the computer to do calculations that would be very tedious by hand. Most ECN sites have designated their Gould machines as their "number crunching" machines. However, because these machines also support undergraduate users, there are some limits on how they can be used.

Sometimes people need to run the same program repeatedly because they are trying different variations of their data. If one person fires off 10 number-cruncher jobs, they can drag the whole machine down. To prevent this, the kernel on the Gould machines includes code that compensates for what are fondly called "hogs". Most ECN machines are run with the policy that "a maximum of 2 number crunchers can be run simultaneously." The Gould looks at who is running each job and swaps among the large jobs accordingly.

The load average reported by "uptime" basically reflects the number of crunchers in the queue, although a few heavy "vi" sessions can add up to the effect of one cruncher. So when you type "ps -axuR" to see the crunchers, the number of lines it reports should be close to the load average from uptime. Here is some sample "ps -axuR" output:

USER     PID    %CPU  %MEM  SZ    RSS  TT  STAT  TIME   COMMAND
badhog   20127  34.6  0.3   247   80   q4  R     5:38   run
badhog   20140  34.6  0.3   247   80   q4  R     9:28   run2
whitley  23127  34.3  0.2   139   65   pf  R     9:57   a.run
bigfoot  18127  33.8  1.2   9800  357  pb  RN    39:24  RMREX.10M
clarkst  2750   79.1  0.0   0.2   47   52  R     10:00  ps -axuR
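To make the same comparison on a present-day Unix system, where the Gould's "ps -axuR" may not exist, here is a minimal sketch. It assumes a POSIX ps that accepts "-e -o stat=" and an uptime that prints "load average:" (or "load averages:"); the function names one_min_load and compare_load are hypothetical.

```shell
#!/bin/sh
# Sketch: compare the 1-minute load average with the number of
# processes that are running or ready to run (state R).
# Assumes a modern POSIX "ps -e -o stat=" in place of "ps -axuR".

# one_min_load LINE: print the first load figure from an uptime line.
# Accepts "load average: 0.52, 0.41, 0.30" (comma-separated) and
# "load averages: 0.52 0.41 0.30" (space-separated) styles.
one_min_load() {
    printf '%s\n' "$1" | sed 's/.*load averages*: *//; s/,.*//; s/ .*//'
}

# compare_load: print the load average next to the count of R-state
# processes; like the article's listing, the count includes ps itself.
compare_load() {
    load=$(one_min_load "$(uptime)")
    running=$(ps -e -o stat= | grep -c '^ *R')
    echo "load average: $load   running processes: $running"
}
```

Typing compare_load prints the two numbers side by side. As in the article, expect them to be close rather than equal: the load average is a smoothed average over time, while the ps count is a single snapshot.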

Assuming the users in the table above are on a Gould (EI, EN, CN, GN, or MN), the jobs would be swapped in this order:

badhog's  run,  whitley's a.run, bigfoot's RMREX.10M,
badhog's  run2, whitley's a.run, bigfoot's RMREX.10M, etc.
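A rough worked example shows what this cycle means for badhog. The assumptions are ours rather than the article's: a single CPU, equal turns for the three users in the cycle, whitley's and bigfoot's jobs running throughout, and each of badhog's jobs needing T minutes of CPU time.

```latex
\begin{aligned}
\text{share per user} &= \tfrac{1}{3}, \qquad
\text{share per badhog job (both running)} = \tfrac{1}{2}\cdot\tfrac{1}{3} = \tfrac{1}{6}\\[4pt]
\text{simultaneous:}\quad t_{\texttt{run}} &\approx t_{\texttt{run2}} \approx \frac{T}{1/6} = 6T\\[4pt]
\text{serial:}\quad t_{\texttt{run}} &\approx \frac{T}{1/3} = 3T, \qquad t_{\texttt{run2}} \approx 3T + 3T = 6T
\end{aligned}
```

Under these assumptions the last of badhog's results arrives at about 6T either way, but running the jobs serially delivers the first result at 3T instead of 6T, while whitley and bigfoot get a third of the CPU in both cases.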

Under this swap arrangement, badhog's two jobs share a single turn in each cycle, so running them simultaneously takes longer than running them one after the other would have. This way, the effect on the machine's other users is minimized.

Sometimes people fire off jobs on a protected terminal and go home, leaving themselves logged in, so that they can collect timing results. You can do the same thing with a shell script, as long as its output is captured in a file:

  $ fort. temp.f     [compile the FORTRAN program]
  $ mv a.out RUN     [rename the executable as "RUN"]
  $ ex timeit        [edit a file called "timeit"]
  :a
  time RUN
  .
  :wq
  $ chmod +x timeit  [make "timeit" executable]

CAPTURE THE OUTPUT in the file err:

Bourne Shell                    C Shell
 $ timeit > err 2>&1 &          % timeit >& err &

 $ cat err
    -- all standard output and standard error (the screen stuff) --
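The Bourne shell form works because 2>&1 points standard error at the same place as standard output. The sketch below uses a hypothetical function, demo, as a stand-in for timeit: it writes one line to each stream, and both end up in err.

```shell
#!/bin/sh
# Sketch: "> err 2>&1" collects standard output AND standard error.
# demo is a hypothetical stand-in for the timeit script.
demo() {
    echo "this line goes to standard output"
    echo "this line goes to standard error" >&2
}

demo > err 2>&1    # both lines are written to err
cat err
```

Order matters, because redirections are processed left to right: "demo 2>&1 > err" would leave the standard-error line on the screen, since 2>&1 copies standard output's destination while that is still the terminal, and only afterwards does > err move standard output to the file.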

You can also use timeit to run several jobs, one after another:

  $ ex timeit
  :a
  date >> err
  echo "run 1" >> err
  time a.out < input1 > output1
  echo "run 2" >> err
  time a.out < input2 > output2
  echo "run 3" >> err
  time a.out < input3 > output3
  echo "all done" >> err
  date >> err
  .
  :wq
  $ chmod +x timeit
  $ touch err     (to make sure the file exists,
       BECAUSE if you are using "noclobber", then
       you can't append (>>) to a non-existent file!)

Bourne Shell                    C Shell
 $ timeit >> err 2>&1           % timeit >>& err
                          --OR--
 $ timeit >> err 2>&1 &         % timeit >>& err &

If you started the job in the background from /bin/csh, you can then log out. The "time" information will be captured in the file err, along with the dates and the lines echoed by timeit. Always use append (>>) to add to an existing file, because ">" writes over existing files.
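The multi-run timeit script can also be written with a loop, so that each run's input and output file names are generated from the run number instead of being typed by hand. In this sketch, run_all is a hypothetical function name; it takes the program to time (a.out in the article) followed by the run numbers, and expects input files named input1, input2, and so on in the current directory.

```shell
#!/bin/sh
# Sketch: the multi-run timeit script rewritten as a loop.
# run_all is a hypothetical name.  Usage: run_all PROGRAM RUN...
# Run K reads inputK and writes outputK in the current directory.
run_all() {
    prog=$1
    shift                       # the remaining arguments are run numbers
    date
    for i in "$@"
    do
        echo "run $i"
        time "$prog" < "input$i" > "output$i"
    done
    echo "all done"
    date
}
```

In a Bourne-family shell, after reading the file in with ". ./file", the command "run_all ./a.out 1 2 3 >> err 2>&1 &" runs all three jobs in the background. Because the function writes its labels and dates to standard output, that one redirection on the command line collects everything, including the time reports, so the individual ">> err" lines are not needed.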

NOTE: The "time" command does not behave the same way in the Bourne shell ($) as in the C shell (%), because csh has its own built-in time. You can choose which shell runs a script with the first line of the script:

Bourne Shell                    C Shell
 #!/bin/sh                      #!/bin/csh
 more commands                  more commands
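For example, the one-line timeit script from earlier could start with #!/bin/sh so that the Bourne shell runs it even when you log in with csh. This sketch writes that file with a here-document instead of ex; the ./RUN spelling (rather than plain RUN) is our adjustment for systems whose PATH does not include the current directory.

```shell
#!/bin/sh
# Sketch: create timeit with "#!/bin/sh" as its first line, so the
# kernel hands it to the Bourne shell whatever your login shell is.
# The quoted 'EOF' keeps the shell from expanding anything inside.
cat > timeit << 'EOF'
#!/bin/sh
time ./RUN
EOF
chmod +x timeit
head -n 1 timeit    # shows the interpreter line
```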

Also note that if you are a C shell user and the job takes more than 1 CPU hour to run, you will need to use the "limit cputime u" or "unlimit" command to ensure that the job finishes. Use these commands carefully: if your jobs are allowed unlimited CPU time, it is your responsibility to kill off runaway jobs.
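Bourne-family shells have a rough counterpart to csh's limit: "ulimit -t", which reports or sets the CPU-time limit in seconds. This is a swapped-in technique, not the command described above, and it assumes a shell such as bash, ksh, or dash (POSIX itself only requires "ulimit -f"). In the spirit of using these commands carefully, the sketch gives one job a generous but finite ceiling instead of removing the limit altogether; run_with_cpu_limit is a hypothetical name.

```shell
#!/bin/sh
# Sketch: a Bourne-family counterpart to csh's "limit cputime".
# Assumes a shell whose ulimit supports -t (bash, ksh and dash do).

# Show the current CPU-time limit: "unlimited" or a number of seconds.
echo "current CPU-time limit: $(ulimit -t)"

# run_with_cpu_limit COMMAND...: run one command with a one-hour
# CPU-time ceiling.  The parentheses start a subshell, so the new limit
# applies only to that command, not to your login shell.  If the limit
# cannot be set (for example, your hard limit is already lower), the
# command is not started.
run_with_cpu_limit() {
    ( ulimit -t 3600 && "$@" )
}
```

For example, "run_with_cpu_limit ./a.out < input1 > output1" runs one job that the system will stop if it uses more than an hour of CPU time, rather than letting it run away.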


webmaster@ecn.purdue.edu
Last modified: Thursday, 30-Oct-97 17:49:42 EST
