Question

Performance issues and background jobs (default queue)

  • 21 January 2020
  • 15 replies
  • 777 views

Badge +1

Hi,

Over the last few months we have been having performance issues in Apps 9 that have been very difficult to narrow down to a specific cause. We have plenty of overhead in both the database and middleware servers so I don’t think it is a hardware or database performance issue.

What I have been able to determine is that when we get really severe performance issues that we have a large job queue (default - 0) of waiting jobs usually caused by a custom job taking longer than normal to process - say 20mins during which time 6 x 10 Invoice posting jobs enter the queue and wait to be processed, this then prevents other jobs from running and they get added to the queue compounding the problem (sometimes upto 900 in waiting state). During this time the system becomes unusable from logging on to simple queries. 

So I guess I’m asking:

  1. Has anyone else experienced these type of issue?
  2. We have 10 batch queues all set to processes:1 and the majority of jobs are all in 0 - default. Should we have more queues and or allow more processes to run at once?
  3. In sys parameters we were logging completed jobs for 360 days. Would this have an impact?
  4. Is there another way to diagnose the performance issues? when this occurs IFS Mon and EE are both unusable

Many thanks,

Alex


15 replies

Userlevel 5
Badge +11

Hi, 
Please run this SQL by connecting as SYS user to the database and post back the results - 

--------- SQL to find out total memory allocation in Oracle ---------------
select decode( grouping(nm), 1, 'total', nm ) nm, round(sum(val/1024/1024)) mb from
(select 'sga' nm, sum(value) val
  from v$sga
 union all
select 'uga', sum(a.value)
  from v$sesstat a, v$statname b
 where b.name = 'session uga memory'
   and a.statistic# = b.statistic#
 union all
select 'pga', sum(a.value)
  from v$sesstat a, v$statname b
 where b.name = 'session pga memory'
   and a.statistic# = b.statistic#)
 group by rollup(nm);

 

First we need to make sure your database is tuned properly to utilize memory available on the database server, as this is where all the heavy lifting occurs.

This can directly impact how fast your system behaves including background jobs.

 

The next steps will involve reviewing your batch queue configuration to ensure long running jobs run from their own queue so they don’t hold other jobs in the same queue.

You can also increase the processes parameter to >1 assuming there are no interdependent jobs, unlikely in the default queue. Don’t increase the processes value for Finance related queues. 

Badge +1

Hi,

I have run the script as you suggest:

       NM    MB
1    pga    1446
2    sga    16311
3    uga    579
4    total    18336
 

I assuming from this we have 18GB allocated in total to the instance which I would have thought was plenty but obviously this would depend on the number of transactions, etc.

As for reviewing the processes its difficult for us to determine which are independent jobs that don’t rely on other jobs in the queue and which are specifically finance. They are mostly default IFS jobs so is there a list we could have of an optimal setup?

Many thanks,

Alex

Userlevel 5
Badge +11

Hi,

Looking at the numbers, database memory allocation looks OK.

The batch queue configuration varies from client to client and it is difficult to make a suggestion without looking into your system.

Since you mentioned in your original post that there is a custom job taking 20 minutes to run, can you move custom jobs to a new ‘Custom’ queue? This should allow other default jobs which run quickly to process as they get posted to the queue.  You may also need to review into why Custom jobs are taking long time to run and if adding a custom index can speed to improve the time.

You can also move any system related jobs like Validate Indexes, Analyze Indexes, Database Statistics etc to a ‘System’ queue.

The goal here is to organize your batch queue one job at a time to get it to a state where you don’t have any jobs in ‘Posted’ status for long.

You can use this (example) query as APPOWNER to find out jobs stuck in Posted status:

select id, queue_id, procedure_name, created, posted, started, executed, mod(trunc(24 * 60 * (executed - posted)), 60) minutes_elapsed, state from ifsapp.transaction_sys_local_tab
where trunc(posted) = to_date('01/20/2020', 'MM/DD/YYYY')
order by minutes_elapsed

You can use this (example) query as APPOWNER to find out long running jobs:

select id, queue_id, procedure_name, created, posted, started, executed, mod(trunc(24 * 60 * (executed - started)), 60) minutes_elapsed, state from ifsapp.transaction_sys_local_tab
where trunc(posted) = to_date('01/20/2020', 'MM/DD/YYYY')
order by minutes_elapsed

In a system performing smoothly the minutes_elapsed is 0.

As you can see tuning your IFS system is a broad area but staying on top of things will help in extracting the best performance. Hope this info helps.

Userlevel 4
Badge +7

I would definitely suggest you move your custom job into its own queue, or perhaps use a queue that is already there (Data Migration Queue for example). We have always ran 1 process per Batch Queue, with the exception to the fast queue which has 2.

Trouble shooting performance can be a very frustrating experience, and even more so when custom code is being used. It is good to separate this from the vanilla IFS jobs.

We still on the odd occasion get reports of performance issues during the month end period, but I suspect this is not uncommon.

Good luck.

Userlevel 4
Badge +6

In you first entry you noted that you are keeping your completed jobs for almost an entire year? Why so long? If the jobs complete successfully, then there is no a lot you need to the background job details for. What is the row count of each of the following tables?

  • TRANSACTION_SYS_LOCAL_TAB
  • TRANSACTION_SYS_STATUS_TAB

I would suggest keeping,

  • Completed “Ready” jobs for 7 days
  • Warning jobs for 14 days
  • Error jobs for 31 days

You can lower the warning and error window if no one is looking at the error and warning jobs and correcting the issues. For example, if no one ever looks at the errors, then why keep the jobs?

Userlevel 4
Badge +6

Lets also check how many jobs you’re generating per hour. This might show a trend that can be used to re-balance your background jobs to run more even;y throughout the day.

 

select a.the_hour,
  (select count(t.posted) from deferred_job t 
    where trunc(t.posted) = (trunc(sysdate) - 1) 
    and to_char(t.posted, 'HH24') = a.the_hour) qty  
from (select lpad(rownum-1, 2, '0') the_hour from dual connect by rownum <= 24) a
order by a.the_hour;
 

Also, what is the value of your Oracle parameter job_queue_processes?

 

Userlevel 4
Badge +7
  • It is not recommended to increase the number of processes given for queues without doing an analysis first, since it could have adverse effects.
  • There are some jobs that need to be run in a particular order. Some jobs need to be run isolated from others. Some jobs nature is run for a long time which could block the other jobs in the queue. In addition to this parallel execution of some jobs can occur deadlocks as well from the database level.
  • Sometimes we have noticed performance drawbacks due to the value set for the "job queue processes” parameter in the oracle database. If job_queue_processes is too low, jobs may be seen waiting in the execution queue for a process to become free.  

We can decide the optimum value for Job Queue Processes parameter using the below formula.
 (TotalIFSBatchQueueProcesses + OracleReservedProcesses + 3) <= JobQueueProcesses

 

Get the current number of batch queue processes(run as sys in a cmd window)

> Show parameter job_queue_processes;

Change the number of batch queue processes:

> ALTER SYSTEM SET job_queue_processes=XX scope=both;  

 

  • With regard to your situation as suggested in previous comments as well, it is better to do an analysis and move the custom jobs into a separate queue if they take long execution time.Increase the number of processes per queue also need be done after considering above factors.

 

-------- JOBS THAT RUN FOR MORE THAN FIVE MINUTES -----------
select job_id,state,procedure_name, queue_id,arguments_string, started,executed, (executed - started) * 86400 timeinsec 
from deferred_job 
where ((executed - started) * 86400) > 300 AND posted > SYSDATE -30
order by timeinsec desc

------------- COUNT OUTPUT -------------------
SELECT DISTINCT procedure_name,description, COUNT (*) FROM deferred_job WHERE ((executed - started) * 86400) > 300 
AND started  > SYSDATE -30
GROUP BY procedure_name,description

You can adjust the 5 minutes threshold to get a realistic output if more jobs are running than 5 minutes.Please analyze the jobs with previous months and identify a pattern or jobs that usually run for a longer period. 

 

Above output will help to identify problematic jobs and move them to a new queue if needed.

Userlevel 1
Badge +4

 @PROAHAR 

I noticed you said that the custom job was taking longer than normal to process… could you post any details as to this custom job as although creating alternative queues etc etc will stop this issue from impacting other jobs, it doesn't resolve the jobs execution time so lets also have a look at the cause… 

Badge +1

Hi Charith,

Our job_queue_processes is currently set to 10 - this seems to low. I’d like to use your formula, how do I know what the following values are? 

TotalIFSBatchQueueProcesses  

OracleReservedProcesses

Alan - TRANSACTION_SYS_LOCAL_TAB has 1,539,777 and TRANSACTION_SYS_STATUS_TAB has 775,735 rows - we have reduced the numbers of days completed tasks are held but the help record says they are removed by a clean up job - is this normally daily?

Many thanks to you all,

Alex

 

Userlevel 5
Badge +11

JOB_QUEUE_PROCESSES parameter:

We recommend clients to bump job_queue_processes parameter to 50 or 100 as there is no harm in setting it to a high value. The formula is great but setting a higher value won’t cause any issues as Oracle & IFS jobs use the seats they need.

All IFS background jobs are controlled by processes value for each queue.

You can issue this command to increase the value and no database restart required.

SQL> alter system set job_queue_processes=50 scope=both;

 

TRANSACTION_SYS_LOCAL_TAB & TRANSACTION_SYS_STATUS_TAB tables:

The data from background job tables are removed by the Cleanup Database Tasks which are scheduled to run at least once every day in a standard IFS System. 

The number of days worth of data to be retained is controlled by parameters based on background job State, set under IEE → System Parameters

 

It is recommended to review Error and Warning jobs periodically to make sure the errors are OK from a functional standpoint. 

Userlevel 5
Badge +11

@PROAHAR the right answer is to move your custom job to a separate queue so that it doesn’t block unrelated jobs in the Default queue from processing.

This assumes that your custom job is unrelated to any of the other jobs that run in the Default queue, i.e. there are no dependencies that need this to finish before they can process.

The Default queue should never really be adjusted to have more than 1 process because the standard jobs that run in there assume sequential processing, often with dependencies.

So… (1) create a new queue - or as @MikeArbon mentioned you can find an existing queue that is mostly unused - then (2) change your custom jobs to process within that queue.  You will probably be OK to simply define the new queue to use only 1 Process unless you have high volumes of those jobs and don’t care if your custom jobs run in parallel/have no dependency...  Since you mentioned that they have been running the Default queue all this time with a Process count set to 1, I’d assume that is enough for you.

In terms of the job_queue_processes parameter, if you have 10 queues each with 1 process assigned then job_queue_processes =10 is definitely too low.  You can use the calculations in this thread to get a higher number, or simply go with total number of processes assigned to the batch queues + 10 as a quick fix (which would be 20 in your case).

Nick

Badge +1

This is great thanks all. We sorted the jobs out yesterday and found no custom jobs in the default queue in fact the ones taking the time were related to price recalculation so we moved these to their own queue. We are now left with these that take over one min to run:

ID QUEUE_ID PROCEDURE_NAME MINUTES_ELAPSED STATE
15758059 0 Shop_Order_Cost_Util_API.Calc_Aggregate_Costs 11 Ready
15757961 0 Shop_Order_Cost_Util_API.Calc_Aggregate_Costs 9 Ready
15758639 0 Mrp_Process_API.Do_Mrp_Process__ 5 Warning
15758062 0 Shop_Order_Cost_Util_API.Calc_Aggregate_Costs 4 Ready
15757865 0 Mrp_Process_API.Schedule_Mrp_Process__ 4 Warning
15757993 0 Credit_Coll_Info_Util_API.Recalc_Customers_Scheduling 3 Ready
15757864 0 Mrp_Process_API.Schedule_Mrp_Process__ 2 Ready
15758124 0 Posting_Prop_Inv_Util_API.Start_Reverse_Po_Matching 2 Ready
15757870 0 Mrp_Process_API.Schedule_Mrp_Process__ 2 Ready
15757868 0 Mrp_Process_API.Schedule_Mrp_Process__ 1 Ready
15757797 0 App_Message_Processing_API.Cleanup 1 Ready
15758058 0 Shop_Order_Cost_Util_API.Calc_Aggregate_Costs 1 Ready
15758833 0 Mrp_Process_API.Do_Mrp_Process__ 1 Ready
15758057 0 Shop_Order_Cost_Util_API.Calc_Aggregate_Costs 1

Ready

 

Shop order cost util we left in as an experiment but what do we think about MRP, Posting Prop and App Message jobs? As these are in the default queue by default but don’t know if they rely on other processes to run.

Many thanks,

Alex

Badge

@Srikanth 

 

Can you please look at the below numbers and let me know, if the database memory allocation looks OK or otherwise ?

NM            MB

---------------------

PGA         8713

SGA       81554

UGA         6864

TOTAL   97132

 

Thanks, Sathish

Userlevel 5
Badge +11

Memory allocations vary from one client environment to other, but the above settings look  quite generous for a standard IFS environment.

Are you facing any specific performance issues right now?

Badge

@Srikanth 

 

Appreciated and thank you so much for your speedy reply.

 

Please see attachment, Is there any possibilities to debug and see what caused the background job to execute “Production Receipt” procedure for more than an hour ?

 

Thanks, Sathish 

Reply