Full Table Scan vs. BITMAP Index vs. VIRTUAL Column
Friday, 02.11.2012 – DejanOvaj put donosim još jedan primjer iz prakse, na osnovu kojeg ću pokazati interesantne metode optimizacije problematičnog SQL upita.
U stvarnoj bazi postoji jedna tabela, u koju se podaci unose i obrađuju po slijedećem principu:
– podaci se unose u nejednakim razmacima, pri čemu se jedna status kolona označi sa ‘N’, što znači da podaci nisu obrađeni; ukoliko dođe do neke greške prilikom unosa ili su podaci nepotpuni, onda status kolona dobija vrijednost NULL
– jedan SQL upit se izvršava svakih 5 sekundi i provjerava, da li postoje neobrađeni ili nepotpuni podaci
– ukoliko postoje neobrađeni podaci, onda bivaju obrađeni, nakon čega se vrijednost u status koloni mijenja iz ‘N’ u ‘Y’
E sad, ostavimo na stranu što je dizajn ove tabele, odnosno njena fizička struktura, u praksi veoma loša…
Zadatak je da optimiram onaj SQL, koji svakih 5 sekundi provjerava, da li postoje neobrađeni podaci. Možeš se žaliti da je struktura tabele loša i da ju treba izmijeniti – odgovor je uvijek isti:”Nemamo sad vremena za komplikovane izmjene, moramo do kraja mjeseca završiti druge bitnije stvari, a to ćemo naknadno…bla bla“…
Dakle, na posao… Trebalo mi je skoro 2 sata vremena, ali rezultat je bio itekako dojmljiv… Zanima vas kako?
Kako?
Za ovaj primjer sam kreirao skoro identičnu tabelu sa istim kolonama, samo pod drugim nazivom, kako bih zaštitio podatke iz stvarne prakse:
create table t( dummy_string VARCHAR2(50 BYTE) NOT NULL, dummy_number NUMBER(9) NOT NULL, dummy_number2 NUMBER(14), dummy_check CHAR(1 BYTE), dummy_datum DATE DEFAULT sysdate NOT NULL ); Table created.
Aktualizirajmo statistike, kako bi optimizer kreirao efikasan execution plan i usput pogledajmo, koliko tabela trenutno ima redova:
SQL> exec dbms_stats.gather_table_stats(user, 'T'); PL/SQL procedure successfully completed. SQL> select count(*) from t; COUNT(*) ---------- 8000000
Aktivirajmo autotrace sa explain planom i statistikama, te uključimo timing:
SQL> set autotrace traceonly explain statistics SQL> set timing on lines 180 trimspool on
Ovako izgleda originalni SQL:
SELECT dummy_string FROM t WHERE dummy_check is null or dummy_check <> 'Y';
FTS (Full Table Scan)
Da, dobro ste vidjeli – tabela nema nijednog indeksa. Kreirana je davno i tada je FTS (Full Table Scan) bio brži u odnosu na index, ali developeri su previdjeli činjenicu da je količina podataka svakim danom sve veća…
Izvršimo originalni SQL upit, koji koristi FTS, te analizirajmo rezultat:
SQL> SELECT dummy_string 2 FROM t 3 WHERE dummy_check is null or dummy_check <> 'Y'; no rows selected Elapsed: 00:00:02.41 Execution Plan ---------------------------------------------------------- Plan hash value: 1601196873 -------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 713 | 37789 | 22522 (2)| 00:04:31 | |* 1 | TABLE ACCESS FULL| T | 713 | 37789 | 22522 (2)| 00:04:31 | -------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("DUMMY_CHECK"<>'Y' OR "DUMMY_CHECK" IS NULL) Statistics ---------------------------------------------------------- 22 recursive calls 0 db block gets 81803 consistent gets 81785 physical reads 0 redo size 340 bytes sent via SQL*Net to client 509 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed
Dakle, za FTS potrebno je 00:00:02.41 sekundi, te se učita preko 80 000 blokova.
Nije strašno, ali bi vremenom moglo postati kritično, pošto se taj SQL izvršava svakih 5 sekundi, te se odmah potom podaci obrađuju i aktualiziraju, tako da može doći do konflikta iliti blokade u resursima. Onda bi se interval izvršavanja morao povećati sa 5 na npr. 10 sekundi.
Bitmap Index
Iz iskustva znam da se u određenim slučajevima može kreirati Bitmap index kada u SQL upitu postoji OR uslov. Kreirajmo Bitmap index i pogledajmo rezultat:
SQL> create bitmap index ix_bmp_t_dummy_check on t(dummy_check); Index created. SQL> SELECT /*+ index(t ix_bmp_t_dummy_check) */ dummy_string 2 FROM t 3 WHERE dummy_check is null or dummy_check <> 'Y'; no rows selected Elapsed: 00:00:00.03 Execution Plan ---------------------------------------------------------- Plan hash value: 1287377879 ----------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 713 | 37789 |130K (1)| 00:26:07 | | 1 | TABLE ACCESS BY INDEX ROWID | T | 713 | 37789 |130K (1)| 00:26:07 | | 2 | BITMAP CONVERSION TO ROWIDS| | | | | | |* 3 | BITMAP INDEX FULL SCAN | IX_BMP_T_DUMMY_CHECK | | | | | ----------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("DUMMY_CHECK"<>'Y' OR "DUMMY_CHECK" IS NULL) Statistics ---------------------------------------------------------- 22 recursive calls 0 db block gets 194 consistent gets 0 physical reads 0 redo size 340 bytes sent via SQL*Net to client 509 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed
Opaaa! Koja razlika! Mnogo bolje! Samo 0.3 sekunde, te samo 194 blokova (consistent gets). E, sad, budimo picajzle i pomislimo odmah na blokadu resursa prilikom DML upita (uglavnom prilikom UPDATE i DELETE), kada se Bitmap Index mora aktualizirati, pri čemu blokira ostale redove. Znači, nije idealno rješenje…
VIRTUAL Column
Od nedavno sam počeo aktivnije testirati i koristiti novu opciju u 11g verziji, pod nazivom “VIRTUAL column“, pa sam odlučio i u ovom slučaju pokušati primijeniti rješenje zasnovano na toj opciji:
SQL> alter table t 2 add ( 3 virtual_check_number NUMBER 4 GENERATED ALWAYS AS (case when dummy_check is null or dummy_check <> 'Y' then 1 else 0 end ) 5 VIRTUAL); Table altered. SQL> create index ix_t_virtual_check on t(virtual_check_number); Index created. SQL> exec dbms_stats.gather_table_stats(user, 'T'); PL/SQL procedure successfully completed. SQL> SELECT dummy_string 2 FROM t 3 WHERE virtual_check_number = 1; no rows selected Elapsed: 00:00:00.00 Execution Plan ---------------------------------------------------------- Plan hash value: 2480226401 -------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost(%CPU)| Time | -------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 55 | 4 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 55 | 4 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IX_T_VIRTUAL_CHECK | 1 | | 3 (0)| 00:00:01 | -------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("VIRTUAL_CHECK_NUMBER"=1) Statistics ---------------------------------------------------------- 22 recursive calls 0 db block gets 9 consistent gets 2 physical reads 0 redo size 340 bytes sent via SQL*Net to client 509 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed
Ohohoho! Vjerujte, da sam se i ja ugodno iznenadio kada sam vidio rezultat: rezultat je došao promptno i učitano je samo 9 blokova! Fascinantno!
A nakon unosa novih podataka?
Ništa. VIRTUAL column opet rastura FTS i Bitmap Index.
SQL> insert into t(dummy_string, dummy_number, dummy_number2, dummy_check, dummy_datum) 2 select DBMS_RANDOM.STRING('A', 50) as dummy_string, 3 round(dbms_random.value(1, 100)) as dummy_number, 4 round(dbms_random.value(1, 100000)) as dummy_number2, 5 case when mod(rownum,150) = 0 then NULL 6 else 'N' 7 end as dummy_check, 8 to_date('02.11.2012', 'DD.MM.YYYY') as dummy_datum 9 from dual 10 connect by level <= 1000; 1000 rows created. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(user, 'T'); PL/SQL procedure successfully completed. SQL> SELECT dummy_string 2 FROM t 3 WHERE dummy_check is null or dummy_check <> 'Y'; 1000 rows selected. Elapsed: 00:00:02.43 Execution Plan ---------------------------------------------------------- Plan hash value: 1601196873 -------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 2221 | 114K| 22522 (2)| 00:04:31 | |* 1 | TABLE ACCESS FULL| T | 2221 | 114K| 22522 (2)| 00:04:31 | -------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("DUMMY_CHECK"<>'Y' OR "DUMMY_CHECK" IS NULL) Statistics ---------------------------------------------------------- 22 recursive calls 0 db block gets 81869 consistent gets 81785 physical reads 0 redo size 64908 bytes sent via SQL*Net to client 1246 bytes received via SQL*Net from client 68 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1000 rows processed SQL> SELECT /*+ index(t ix_bmp_t_dummy_check) */ dummy_string 2 FROM t 3 WHERE dummy_check is null or dummy_check <> 'Y'; 1000 rows selected. Elapsed: 00:00:00.03 Execution Plan ---------------------------------------------------------- Plan hash value: 1287377879 ----------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 2221 | 114K|130K (1)| 00:26:07 | | 1 | TABLE ACCESS BY INDEX ROWID | T | 2221 | 114K|130K (1)| 00:26:07 | | 2 | BITMAP CONVERSION TO ROWIDS| | | | | | |* 3 | BITMAP INDEX FULL SCAN | IX_BMP_T_DUMMY_CHECK | | | | | ----------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("DUMMY_CHECK"<>'Y' OR "DUMMY_CHECK" IS NULL) Statistics ---------------------------------------------------------- 22 recursive calls 0 db block gets 279 consistent gets 9 physical reads 0 redo size 64908 bytes sent via SQL*Net to client 1246 bytes received via SQL*Net from client 68 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1000 rows processed SQL> SELECT dummy_string 2 FROM t 3 WHERE virtual_check_number = 1; 1000 rows selected. Elapsed: 00:00:00.01 Execution Plan ---------------------------------------------------------- Plan hash value: 2480226401 -------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost(%CPU)| Time | -------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1477 | 82712 | 23 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| T | 1477 | 82712 | 23 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IX_T_VIRTUAL_CHECK | 1510 | | 5 (0)| 00:00:01 | -------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("VIRTUAL_CHECK_NUMBER"=1) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 150 consistent gets 0 physical reads 116 redo size 64908 bytes sent via SQL*Net to client 1246 bytes received via SQL*Net from client 68 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1000 rows processed SQL> update t 2 set dummy_check = 'Y' 3 where virtual_check_number = 1; 1000 rows updated. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(user, 'T'); PL/SQL procedure successfully completed. SQL> SELECT dummy_string 2 FROM t 3 WHERE dummy_check is null or dummy_check <> 'Y'; no rows selected Elapsed: 00:00:02.41 Execution Plan ---------------------------------------------------------- Plan hash value: 1601196873 -------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 724 | 38372 | 22522 (2)| 00:04:31 | |* 1 | TABLE ACCESS FULL| T | 724 | 38372 | 22522 (2)| 00:04:31 | -------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("DUMMY_CHECK"<>'Y' OR "DUMMY_CHECK" IS NULL) Statistics ---------------------------------------------------------- 22 recursive calls 0 db block gets 81803 consistent gets 81785 physical reads 0 redo size 340 bytes sent via SQL*Net to client 509 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed SQL> SELECT /*+ index(t ix_bmp_t_dummy_check) */ dummy_string 2 FROM t 3 WHERE dummy_check is null or dummy_check <> 'Y'; no rows selected Elapsed: 00:00:00.03 Execution Plan ---------------------------------------------------------- Plan hash value: 1287377879 ----------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 724 | 38372 |130K (1)| 00:26:07 | | 1 | TABLE ACCESS BY INDEX ROWID | T | 724 | 38372 |130K (1)| 00:26:07 | | 2 | BITMAP CONVERSION TO ROWIDS| | | | | | |* 3 | BITMAP INDEX FULL SCAN | IX_BMP_T_DUMMY_CHECK | | | | | ----------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("DUMMY_CHECK"<>'Y' OR "DUMMY_CHECK" IS NULL) Statistics ---------------------------------------------------------- 22 recursive calls 0 db block gets 194 consistent gets 0 physical reads 0 redo size 340 bytes sent via SQL*Net to client 509 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed SQL> SELECT dummy_string 2 FROM t 3 WHERE virtual_check_number = 1; no rows selected Elapsed: 00:00:00.01 Execution Plan ---------------------------------------------------------- Plan hash value: 2480226401 -------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost(%CPU)| Time | -------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 55 | 4 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 55 | 4 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IX_T_VIRTUAL_CHECK | 1 | | 3 (0)| 00:00:01 | -------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("VIRTUAL_CHECK_NUMBER"=1) Statistics ---------------------------------------------------------- 22 recursive calls 0 db block gets 11 consistent gets 0 physical reads 0 redo size 340 bytes sent via SQL*Net to client 509 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed
Znači, sitna DDL izmjena na tabeli, a tako moćna! Kao šlag na torti – moguće je virtualnu kolonu particionirati. Testiranje i rezultate sa tom opcijom prepuštam vama. 🙂
3 Responses to “Full Table Scan vs. BITMAP Index vs. VIRTUAL Column”
Pozdrav Dejane,
zanimljiv primjer kako sa virtualnom kolonom riješiti problem sa performansama.
Morati ću se malo pozabaviti virtualnim kolonama – moram priznati da su malo zapostavljene kod mene.
Kod nas često puta bilo kakav DDL na produkcijskim tablicama je vrlo osjetljiva stvar, pa stoga pokušavam i zaobići tu metodu ako je moguće. Ovaj bi problem riješio upotrebom funkcijskog indeksa koji daje vrlo slične rezultate kao i virtualna kolona.
Baci pogled ispod 😉
SQL> select count(*),status
2 from btab
3 group by status;
COUNT(*) STATUS
———- ——-
251
349 NO
499400 YES
3 rows selected.
SQL> alter table btab
2 add (
3 virtual_check_number NUMBER
4 GENERATED ALWAYS AS (case when status is null or status ‘YES’ then 1 else 0 end )
5 VIRTUAL);
Table altered.
SQL> create index ix_t_virtual_check on btab(virtual_check_number);
Index created.
SQL> select status
2 from btab
3 where virtual_check_number=1;
600 rows selected.
Elapsed: 00:00:00.03
Execution Plan
———————————————————-
Plan hash value: 1817721707
————————————————————————————————–
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
————————————————————————————————–
| 0 | SELECT STATEMENT | | 5000 | 75000 | 489 (0)| 00:00:06 |
| 1 | TABLE ACCESS BY INDEX ROWID| BTAB | 5000 | 75000 | 489 (0)| 00:00:06 |
|* 2 | INDEX RANGE SCAN | IX_T_VIRTUAL_CHECK | 2000 | | 455 (0)| 00:00:06 |
————————————————————————————————–
Predicate Information (identified by operation id):
—————————————————
2 – access(“VIRTUAL_CHECK_NUMBER”=1)
Statistics
———————————————————-
29 recursive calls
0 db block gets
637 consistent gets
588 physical reads
0 redo size
9020 bytes sent via SQL*Net to client
849 bytes received via SQL*Net from client
41 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
600 rows processed
SQL> drop index ix_t_virtual_check;
SQL> alter table btab drop column virtual_check_number;
admin@AOCDB> create index check_idx on btab
2 (case when status is null or status ‘YES’ then 1 else 0 end);
Index created.
SQL> select status
2 from btab
3 where (case when status is null or status ‘YES’ then 1 else 0 end) = 1;
600 rows selected.
Elapsed: 00:00:00.03
Execution Plan
———————————————————-
Plan hash value: 935407903
—————————————————————————————–
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
—————————————————————————————–
| 0 | SELECT STATEMENT | | 5000 | 25000 | 489 (0)| 00:00:06 |
| 1 | TABLE ACCESS BY INDEX ROWID| BTAB | 5000 | 25000 | 489 (0)| 00:00:06 |
|* 2 | INDEX RANGE SCAN | CHECK_IDX | 2000 | | 455 (0)| 00:00:06 |
—————————————————————————————–
Predicate Information (identified by operation id):
—————————————————
2 – access(CASE WHEN (“STATUS” IS NULL OR “STATUS”‘YES’) THEN 1 ELSE 0 END
=1)
Statistics
———————————————————-
15 recursive calls
0 db block gets
633 consistent gets
589 physical reads
0 redo size
9020 bytes sent via SQL*Net to client
849 bytes received via SQL*Net from client
41 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
600 rows processed
By Marko Sutic on Nov 3, 2012
Nije dobro da koristis bitmap indexe kada koristis <> umesto = jer onda ce uvek da radi BITMAP INDEX FULL SCAN. Pokusaj da promenis WHERE dummy_check is null or dummy_check <> ‘Y’; u WHERE dummy_check is null or dummy_check = ‘N’;
By Srdjan Mitrovic on Mar 19, 2013
Ponovo naleteh na ovaj tekst pa se pitam da li je Dejan probao da izmeni query koji koristi bitmap index. Problem kod queryja je sto za dummy_check ‘Y’ on mora da prodje kroz ceo index jer Oracle ne zna da postoji jos samo jedna vrednost ‘N’ (ok, ima i null) i zato je preporucljivo da se za bitmap indekse uvek koristi equi-join. Filter WHERE dummy_check is null or dummy_check = ‘N’ bi trebalo da radi mnoooogo brze jer bitmap indeksi indeksiraju i null vrednosti za razliku od obicnih indeksa.
Druga primedba je to sto si za virtuelnu kolonu koristio WHERE virtual_check_number = 1 umesto WHERE virtual_check_number 0 pa nije fer poredjenje. Da si koristio ovo drugo onda bi i za to morao da imas full index scan a ne range scan jer oracle onda mora da prodje kroz sve vrednosti da vidi da li su razlicite od 0. Koristio si drugaciji query za virtualnu kolonu i za bitmap indekse.
By Srdjan Mitrovic on Sep 6, 2016