Controlling distributed queries with hints

articles: 

Recently I've been working on tuning some distributed queries. This is not always straightforward.

This is not a comprehensive discussion of the topic, rather a description of how one might approach the problem. The query I'm using for this demonstration is joining EMP, DEPT, and SALGRADE in the SCOTT schema. EMP and DEPT are at the remote site L1, SALGRADE is local:

SELECT ename,
       dname,
       grade
FROM   emp@l1
       join dept@l1 USING (deptno)
       join salgrade
         ON ( sal BETWEEN losal AND hisal );

This is the plan:
-------------------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |    42 |  2646 |    11  (19)| 00:00:01 |        |      |
|   1 |  MERGE JOIN          |          |    42 |  2646 |    11  (19)| 00:00:01 |        |      |
|   2 |   SORT JOIN          |          |    14 |   742 |     7  (15)| 00:00:01 |        |      |
|*  3 |    HASH JOIN         |          |    14 |   742 |     6   (0)| 00:00:01 |        |      |
|   4 |     REMOTE           | DEPT     |     4 |    80 |     3   (0)| 00:00:01 |     L1 | R->S |
|   5 |     REMOTE           | EMP      |    14 |   462 |     3   (0)| 00:00:01 |     L1 | R->S |
|*  6 |   FILTER             |          |       |       |            |          |        |      |
|*  7 |    SORT JOIN         |          |     5 |    50 |     4  (25)| 00:00:01 |        |      |
|   8 |     TABLE ACCESS FULL| SALGRADE |     5 |    50 |     3   (0)| 00:00:01 |        |      |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("EMP"."DEPTNO"="DEPT"."DEPTNO")
   6 - filter("EMP"."SAL"<="HISAL")
   7 - access(INTERNAL_FUNCTION("EMP"."SAL")>=INTERNAL_FUNCTION("LOSAL"))
       filter(INTERNAL_FUNCTION("EMP"."SAL")>=INTERNAL_FUNCTION("LOSAL"))

Remote SQL Information (identified by operation id):
----------------------------------------------------

   4 - SELECT "DEPTNO","DNAME" FROM "DEPT" "DEPT" (accessing 'L1' )

   5 - SELECT "ENAME","SAL","DEPTNO" FROM "EMP" "EMP" (accessing 'L1' )

The IN-OUT R->S operations are remote-to-serial, and tell me that the DEPT table and the EMP table are being sent from L1 to be joined locally, and then the result is joined to SALGRADE. This could be a bit silly, and furthermore there is no chance of using an index driven nested loop or merge join, because the local database can't see any indexes that might exist at L1.
So I'll try the driving site hint:
SELECT /*+ driving_site(emp) */ ename,
                                dname,
                                grade
FROM   emp@l1
       join dept@l1 USING (deptno)
       join salgrade
         ON ( sal BETWEEN losal AND hisal ); 

and that gives me this:
-----------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT REMOTE        |          |    42 |  1512 |    11  (28)| 00:00:01 |        |      |
|   1 |  MERGE JOIN                    |          |    42 |  1512 |    11  (28)| 00:00:01 |        |      |
|   2 |   SORT JOIN                    |          |    14 |   364 |     7  (29)| 00:00:01 |        |      |
|   3 |    MERGE JOIN                  |          |    14 |   364 |     6  (17)| 00:00:01 |        |      |
|   4 |     TABLE ACCESS BY INDEX ROWID| DEPT     |     4 |    52 |     2   (0)| 00:00:01 |  ORCLZ |      |
|   5 |      INDEX FULL SCAN           | PK_DEPT  |     4 |       |     1   (0)| 00:00:01 |  ORCLZ |      |
|*  6 |     SORT JOIN                  |          |    14 |   182 |     4  (25)| 00:00:01 |        |      |
|   7 |      TABLE ACCESS FULL         | EMP      |    14 |   182 |     3   (0)| 00:00:01 |  ORCLZ |      |
|*  8 |   FILTER                       |          |       |       |            |          |        |      |
|*  9 |    SORT JOIN                   |          |     5 |    50 |     4  (25)| 00:00:01 |        |      |
|  10 |     REMOTE                     | SALGRADE |     5 |    50 |     3   (0)| 00:00:01 |      ! | R->S |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - access("A3"."DEPTNO"="A2"."DEPTNO")
       filter("A3"."DEPTNO"="A2"."DEPTNO")
   8 - filter("A3"."SAL"<="A1"."HISAL")
   9 - access(INTERNAL_FUNCTION("A3"."SAL")>=INTERNAL_FUNCTION("A1"."LOSAL"))
       filter(INTERNAL_FUNCTION("A3"."SAL")>=INTERNAL_FUNCTION("A1"."LOSAL"))

Remote SQL Information (identified by operation id):
----------------------------------------------------

  10 - SELECT "GRADE","LOSAL","HISAL" FROM "SALGRADE" "A1" (accessing '!' )


Note
-----
   - fully remote statement

As a "fully remote" statement, the plan is showing the point of view of L1. EMP and DEPT are joined locally (with an indexed merge join, which was not possible before) and SALGRADE is sent across the database link. That too seems a bit silly. Wouldn't it be better to join EMP and DEPT remotely, and send the result across the link and join to SALGRADE locally? Well, the driving_site hint doesn't let you do that. But I can get that effect by using an in-line view:
SELECT ename,
       dname,
       grade
FROM   (SELECT /*+ no_merge */ ename,
                               sal,
                               dname
        FROM   emp@l1
               join dept@l1 USING (deptno))
       join salgrade
         ON ( sal BETWEEN losal AND hisal ); 
------------------------------------------------------------------------------------------------
| Id  | Operation           | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |          |    42 |  1638 |    11  (19)| 00:00:01 |        |      |
|   1 |  MERGE JOIN         |          |    42 |  1638 |    11  (19)| 00:00:01 |        |      |
|   2 |   SORT JOIN         |          |     5 |    50 |     4  (25)| 00:00:01 |        |      |
|   3 |    TABLE ACCESS FULL| SALGRADE |     5 |    50 |     3   (0)| 00:00:01 |        |      |
|*  4 |   FILTER            |          |       |       |            |          |        |      |
|*  5 |    SORT JOIN        |          |    14 |   406 |     7  (15)| 00:00:01 |        |      |
|   6 |     VIEW            |          |    14 |   406 |     6   (0)| 00:00:01 |        |      |
|   7 |      REMOTE         |          |       |       |            |          |     L1 | R->S |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("SAL"<="HISAL")
   5 - access("SAL">="LOSAL")
       filter("SAL">="LOSAL")

Remote SQL Information (identified by operation id):
----------------------------------------------------

   7 - EXPLAIN PLAN SET STATEMENT_ID='PLUS1550001' INTO PLAN_TABLE@! FOR SELECT /*+
       NO_MERGE */ "A2"."ENAME","A2"."SAL","A1"."DNAME" FROM "EMP" "A2","DEPT" "A1" WHERE
       "A2"."DEPTNO"="A1"."DEPTNO" (accessing 'L1' )


Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (U - Unused (1))
---------------------------------------------------------------------------

   7 -  SEL$64EAE176
         U -  no_merge

Now I have what I want: EMP and DEPT are joined remotely, with the result being joined to SALGRADE locally. This should give the optimizer the capability of using the best access path and join methods and minimize the network traffic (though it does not however give much flexibility for join order).
Note the use of the no_merge hint (which the hint report says was unused): without it, everything happens locally to give the same plan that I started with.

The take away from this is that you may be able to control which parts of query run at each site, but that the driving_site hint may be too crude a tool to do this optimally. And that, as is so often the case, a hint may have unexpected effects.
--
John Watson
Oracle Certified Master DBA
http://skillbuilders.com