2 返信 最新の返信: 2012/07/25 17:27 によって 700499 RSS

Process wait SQL*Net message from dblink /SQL*Net message from client

700499 Journeyer
現在モデレーション中
Hi There,
We have an ETL process that we kindly need your help with. The process been running since Sun, where it transfers the data from one server (via remote query). The process was running ok till last night where it appeared
to have stopped working and/or the session is just idling doing nothing.

Here are some tests that we did to figure out what's going on:

1. when looking at the session IO, we noticed that it's not changing:
etl_user@datap> select sess_io.sid,
  2         sess_io.block_gets,
  3         sess_io.consistent_gets,
  4         sess_io.physical_reads,
  5         sess_io.block_changes,
  6         sess_io.consistent_changes
  7    from v$sess_io sess_io, v$session sesion
  8   where sesion.sid = sess_io.sid
  9     and sesion.username is not null
 10     and sess_io.sid=301
 11  order by 1;

                    logical   physical
  SID BLOCK_GETS      reads      reads BLOCK_CHANGES CONSISTENT_CHANGES
----- ---------- ---------- ---------- ------------- ------------------
  301  388131317   97721268   26687579     223052804             161334

Elapsed: 00:00:00.01
2. Check there is nothing blocking the session
etl_user@datap> select * from v$lock where sid=301;

ADDR     KADDR           SID TY        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK
-------- -------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
684703F0 6847041C        301 DX         35          0          1          0      45237          0
684714C4 684714F0        301 AE     199675          0          4          0     260148          0
619651EC 6196521C        301 TM      52733          0          3          0      45241          0
67F86ACC 67F86B0C        301 TX     458763      52730          6          0      45241          0
3. Check if the session is still valid:
etl_user@datap> select status from v$session where sid=301;

STATUS
--------
ACTIVE
4. Check if there is anything in long ops that has not completed:
etl_user@datap> SELECT SID, SERIAL#, opname, SOFAR, TOTALWORK,
  2      ROUND(SOFAR/TOTALWORK*100,2) COMPLETE, TIME_REMAINING/60
  3      FROM   V$SESSION_LONGOPS
  4      WHERE
  5      TOTALWORK != 0
  6      AND    SOFAR != TOTALWORK
  7     order by 1;

no rows selected

Elapsed: 00:00:00.00
5. Check if there is anything in long ops for the session:
etl_user@datap> r
  1* select SID,SOFAR,TOTALWORK,START_TIME,LAST_UPDATE_TIME,TIME_REMAINING,MESSAGE from V$SESSION_LONGOPS where sid=301

  SID      SOFAR  TOTALWORK START_TIM LAST_UPDA TIME_REMAINING MESSAGE
----- ---------- ---------- --------- --------- -------------- ------------------------------------------------------------------------------
  301          0          0 22-JUL-12 22-JUL-12                Gather Table's Index Statistics: Table address_etl : 0 out of 0 Indexes done


Elapsed: 00:00:00.00
This is a bit odd!! This particular step have actually completed successfully on the 22nd of July, and we don't know why it's still showing in long opps!? any ideas?


6. Looking at the sql and what's it actually doing:
etl_user@datap> select a.sid, a.value session_cpu, c.physical_reads,
  2  c.consistent_gets,d.event,
  3  d.seconds_in_wait
  4  from v$sesstat a,v$statname b, v$sess_io c, v$session_wait d
  5  where a.sid= &p_sid_number
  6  and b.name = 'CPU used by this session'
  7  and a.statistic# = b.statistic#
  8  and a.sid=c.sid
  9  and a.sid=d.sid;
Enter value for p_sid_number: 301
old   5: where a.sid= &p_sid_number
new   5: where a.sid= 301

             CPU   physical    logical                                   seconds
  SID       used      reads      reads EVENT                             waiting
----- ---------- ---------- ---------- ------------------------------ ----------
  301    1966595   26687579   97721268 SQL*Net message from dblink         45792

Elapsed: 00:00:00.03
7. We looked at the remote DB where the data resides on, and we noticed that the remote session was also waiting on the db link:
SYS@destp> select a.sid, a.value session_cpu, c.physical_reads,
  2  c.consistent_gets,d.event,
  3  d.seconds_in_wait
  4  from v$sesstat a,v$statname b, v$sess_io c, v$session_wait d
  5  where a.sid= &p_sid_number
  6  and b.name = 'CPU used by this session'
  7  and a.statistic# = b.statistic#
  8  and a.sid=c.sid
  9  and a.sid=d.sid;
Enter value for p_sid_number: 388
old   5: where a.sid= &p_sid_number
new   5: where a.sid= 390

       SID SESSION_CPU PHYSICAL_READS CONSISTENT_GETS EVENT                                                    SECONDS_IN_WAIT
---------- ----------- -------------- --------------- ---------------------------------------------------------------- ---------------
       390         136              0            7605 SQL*Net message from client                                        46101

SYS@destp>
We have had an issue in the past where the connection was being dropped by the network when the process runs for few days, hence we have added the following to the sqlnet.ora and listener.ora files:
sqlnet.ora:

SQLNET.EXPIRE_TIME = 1
SQLNET.INBOUND_CONNECT_TIMEOUT = 6000

listener.ora:
INBOUND_CONNECT_TIMEOUT_LISTENER = 6000
What else can we do and/or further investigate to work out the root cause of the problem, and may be help resolve this. We don't want to just stop and start the process again as it took few days already. We have
had a chat to the infrastructure team and they've assured us that there have been no network outages.

Also, the alert logs for both instances (local and remote) shows no errors what so ever!

Your input is highly appreciated.


Thanks

Edited by: rsar001 on Jul 25, 2012 10:22 AM

アイコンマークの説明

  • 正解 - 10 ポイント
  • 役に立つ回答 - 5 ポイント