Skip to main content

Has anyone seen this issue and know how to fix it?  The problem is:

The Default Batch Queue seems to run for awhile as it should, and then decides to take a break for a random amount of minutes.  Sometimes 3 or 4 minutes and up to around 15 minutes….and then it starts running jobs again.  This is causing many jobs to pile up and that queue gets farther and farther behind.  The alert monitor for number of batch queues stuck stays on most of the time, but I don’t see that it is truly stuck, it just seems to take long breaks.

Anyone have ideas?  This is Apps 9, and yes, I have done the break, reactivate and re-init the queue.  Have even bounced the database.

Any ideas would be welcomed….

Hi @STMVALENTINE ,

Did you noticed any long running job when this happen ? If all allocated processe in use, jobs will pile up until those processe released to the queue to pick another job to execute ?

aprat from that I belive this is something you can check as well.

Whether the Optimum value for job queue processes parameter is set.?

If the Optimum value for job queue processes value is less than the equation, you can try out setting it to a correct value.

We can decide the optimum value for Job Queue Processes parameter using the below formula.
(TotalIFSBatchQueueProcesses + OracleReservedProcesses + 3) <= JobQueueProcesses

Number of job_queue_processes should usually be "the sum of processes assigned for each batch queue in batch queue configuration” + “5 more processes for Oracle internal jobs”.

Eg - 

1.Go to Batch Queue Configuration window. 

2. Go to each of the batch queues and look for the number of processes, then calculate the total number of processes. 
Eg: If you have 6 batch queues and each queue had one process, then the TOTAL = 6.  

3. Log into the database as SYS and execute the following statement:  
SHOW parameter job_queue_processes; 
The value of the above parameter should at least be equal to (TOTAL + 5 ) 
So according to above example, the value of the parameter should be 6+5=11 

4. If it is less than that, increase the value by executing the following statement:  
ALTER SYSTEM SET job_queue_processes=11 scope=both;  
change the number of processes of the above statement accordingly.  

5. Restart the database.

Hope this helps.


@Charith Epasinghe , thanks for your response.  Our batch queue processes is well within the adequate limits.  The issue I am having is where the queue is supposed to be checking and processing waiting jobs every 30 seconds, that there are NO jobs running at all and then some times a few minutes later up to 15 minutes or so then the jobs start processing again.  Then after running several of the waiting jobs it just stops again and leaves them all in a POSTED state and waits again until it feels like processing again.  Seems to get crazy with lots of jobs piling up that only take fraction of seconds to run once they start running.

Hopeful that someone else has seen this and can give some ideas...


@STMVALENTINE , - 

1. What is the oracle version we are reffering here. ? 
FYI below found as a known bug in 12c  bug 21047218 . If so try out installing latuest bandle patch.

(Doc ID 2097993.1) - Scheduler Job Stuck Approximately For 23 Minutes Intermittently and then Resumes Automatically

2.Have you checked the DB alert logs? able to find something there?

3.What is the IFS Apps verion reffering here ?
When the issue occurs, have you checked the window "database processes" in IFS EE client? what is shown as the status for the processes? do you see any errors? 


@Charith Epasinghe thanks again for your response.  Oracle 12.1 EE and App9 

I am looking into the Doc ID 2097993.1, this may be on to something.  


While it’s stuck, are there any blocks?

Is this a RAC environment?

-- Show waiters

SELECT *
FROM gv$session
WHERE final_blocking_session IS NOT NULL;

-- Show blockers and waiters
SELECT *
FROM gv$session
WHERE final_blocking_session IS NOT NULL
OR sid IN (SELECT final_blocking_session FROM v$session);

 


@durette  there are no blockers or waiters and yes, it is RAC.

I am waiting for the DBA people to check on the Doc ID 2097993.1 thing, they have a Oracle case open.  So, right now that is where I am leaning for a best answer.  I will update with more….


@durette  there are no blockers or waiters and yes, it is RAC.

I am waiting for the DBA people to check on the Doc ID 2097993.1 thing, they have a Oracle case open.  So, right now that is where I am leaning for a best answer.  I will update with more….

If that doesn’t work out, I have a hunch you’re waiting for the TRANSACTION_SYS_LOCAL_TAB table to get synchronized between your nodes.

In IFS Enterprise Explorer, under Batch Queue Configuration, under Cluster Setup, there’s a field called “Attached to Node”. If you can afford the performance hit, moving all your background jobs to the same node might prevent any synchronization delay.

Make sure your TRANSACTION_SYS_SEQ sequence has a healthy cache size. (I believe the default is 20.)


20 may not be high enough. Check out Don Burleson’s test results here, as he increased the sequence cache size for parallel execution across nodes:

Sequence cache and RAC (dba-oracle.com)


Not sure if I can choose two best answers, because you both have the best answers.  The DBA people are working on the fix for the oracle bug, and I still think there are some bumps to the cache amount to improve synchronization between the nodes would be good too.  Currently that sequence is at the default 20.  Hopefully this situation will be resolved soon.  If not, you will see my return :)