Aleph jobs not running - batch queue stuck

Occasionally a ticket will come in from a staff member reporting that they submitted a job at xx:xx and it still hasn't run. Follow the instructions below to find and kill the stuck job and unlock the library.

Step-by-step guide

Amazingly enough, ExLibris does not have a util or function to kill or stop an individual batch job once it has been started. To stop a job, then, you will need to get to a command prompt and execute a "kill" command. The following instructions on how to do so are from the old PRB system:

PRB Ref: 000005397 (6/16/2003)

Problem: Is there a safe way to stop a job once it has begun?

Answer: Killing jobs is a unix system function and you should consult your unix system administrator for whatever local policies you might have. That said, we can offer these guidelines:

There are ALEPH utilities for stopping servers (util w), daemons (util e), and the batch queue (util c). If these don't work, you may need to kill the underlying process in a fashion similar to what we describe below. (See PRBs 001, 077, and 2129.)

There is no utility for stopping a batch job. You need to use the unix "kill" command.

First, you should be certain that the job really needs to be killed. Consult PRB 850 in this regard.

1. To locate the processes associated with the job, enter a command like this: "ps -ef | grep manage_01". (Note: some systems may require single quotes around the pattern: "ps -ef | grep 'manage_01'".) Do not include the "p_" prefix.

For example, you see this:

aleph 7207 7178 0 11:37:24 pts/7 0:00 grep manage_01
aleph 8651 7730 0 00:55:28 ? 0:00 csh -f p_manage_01_a VCU01,1,000000000,999999999,,4, p_manage_01 3
aleph 7730 7729 0 00:54:42 ? 0:00 csh -f /exlibris/a50_5/aleph/proc/p_manage_01 VCU01,1,000000000,999999999,,4,
aleph 19001 8595 21 08:36:54 ? 137:51 /exlibris/a50_5/aleph/exe/rts32 b_manage_01_a

2. Then issue a kill command for each process. The process ID (PID) to enter is the first number on each line. You can include multiple PIDs in a single kill command. Your own grep process will show up in the display; you don't want or need to kill that.

You would enter:

kill -1 8651 7730 19001

Run the "ps -ef | grep" command again to verify they are gone.

If not, repeat the process using "kill -9" instead:

kill -9 7730 19001

3. When you kill a job like this, it will leave the library locked. Before restarting the job, you will need to use util c/6 to unlock the library.
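
Steps 1 and 2 above can be sketched as a small shell helper that pulls just the PIDs out of the "ps -ef" listing and leaves out the grep process. This is only an illustration: job_pids is a hypothetical name, not an ALEPH utility, and it assumes the PID is the second column of the listing, as in the sample above. It only prints PIDs; review them before passing them to kill.

```shell
#!/bin/sh
# Sketch only: job_pids is a hypothetical helper name, not an ALEPH utility.

# Read `ps -ef` style lines on stdin and print the PID (second column) of
# every line that mentions the job name given in $1. Lines containing
# "grep" are dropped so the search process itself is never listed.
job_pids() {
    grep -F -- "$1" | grep -v grep | awk '{ print $2 }'
}

# Demonstration on the sample listing from step 1 (no live processes touched).
job_pids manage_01 <<'EOF'
aleph 7207 7178 0 11:37:24 pts/7 0:00 grep manage_01
aleph 8651 7730 0 00:55:28 ? 0:00 csh -f p_manage_01_a VCU01,1,000000000,999999999,,4, p_manage_01 3
aleph 7730 7729 0 00:54:42 ? 0:00 csh -f /exlibris/a50_5/aleph/proc/p_manage_01 VCU01,1,000000000,999999999,,4,
aleph 19001 8595 21 08:36:54 ? 137:51 /exlibris/a50_5/aleph/exe/rts32 b_manage_01_a
EOF
```

On the server itself the equivalent would be "ps -ef | job_pids manage_01", whose output can then be checked by eye and passed to "kill -1" as described in step 2.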

More Answer: The above sequence should be enough. The kill -1 (or -9) should kill all subsidiary processes which were spawned. As a double-check you can look for tables this job might be loading:

ps -ef | grep -i z9

or 

ps -ef | grep -i z0
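
The two checks can also be combined into one pass. The following is a minimal sketch, assuming a grep that accepts multiple -e patterns; table_procs is a hypothetical helper name, not part of ALEPH.

```shell
#!/bin/sh
# Sketch only: table_procs is a hypothetical helper name.

# Read `ps -ef` style lines on stdin and print those that mention z9 or z0
# (case-insensitive), leaving out any line that contains "grep".
table_procs() {
    grep -i -e z9 -e z0 | grep -v grep
}

# Intended use on the server (not run here):
#   ps -ef | table_procs
```

If nothing is printed, no process with z9 or z0 in its command line is currently running.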

NOTE: The "Task Manager" in the GUI also provides an indicator of the parent process' PID. It doesn't provide the spawned processes' PIDs, however.
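
If you start from the parent PID shown in the Task Manager, the direct children can be picked out of the "ps -ef" listing by matching on the parent PID column. This is a rough sketch, assuming the parent PID is the third column (as in the sample listing above) and an awk that supports -v; child_pids is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: child_pids is a hypothetical helper name.

# Read `ps -ef` style lines on stdin and print the PID (second column) of
# every process whose parent PID (third column) equals $1.
child_pids() {
    awk -v parent="$1" '$3 == parent { print $2 }'
}

# Demonstration on two abbreviated lines from the sample listing: prints 8651
child_pids 7730 <<'EOF'
aleph 8651 7730 0 00:55:28 ? 0:00 csh -f p_manage_01_a
aleph 7730 7729 0 00:54:42 ? 0:00 csh -f p_manage_01
EOF
```

This finds direct children only. In the sample listing the parent of process 19001 (PID 8595) does not appear in the grep output at all, so searching by job name as in step 1 is still needed to catch every process.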