
Archive for the ‘Oracle 10g Administration’ Category

Hi friends,

 

Over the last year I have worked a lot on export and import, from small to huge databases, and from specific objects to full schemas to query-based backups.

 

Here I have listed some of the syntax, which you can note down for your reference.

  1. Export objects from a specific schema which match a specific criterion.

expdp system/oracle directory=DUMPDIR1 dumpfile=BG_objects_%u.dmp logfile=BG_objects.log
INCLUDE=FUNCTION,INDEX,CONSTRAINT,PACKAGE,PROCEDURE,SEQUENCE,VIEW,TABLE:\"LIKE \'BG_%\'\" schemas=<schema_name>;

2. Import objects into a schema which were owned by another schema, excluding some specific objects.

impdp system/oracle directory=DUMPDIR1 dumpfile=BG_objects_01.dmp logfile=BG_objects_imp.log EXCLUDE=TABLE:\"LIKE \'%POSTED%\'\" table_exists_action=REPLACE
remap_schema=<from_schema_name>:<to_schema_name>;

3. Import objects from one schema into another where the tablespaces for the two schemas differ and the source is from a different version (e.g., from 10g to 11g).

impdp system/oracle directory=DUMP_BACKUP dumpfile=dumpfile_17032015.dmp logfile=logfile_imp.log remap_schema=<from_schema>:<to_schema>
remap_tablespace=<from_tablespace>:<to_tablespace> table_exists_action=replace EXCLUDE=STATISTICS;

4. Export specific objects from the schema.

expdp username directory=BCKUP_DUMP dumpfile=schema_pk_objects_%U.dmp logfile=schema_pk_objects.log include=TABLE:\"LIKE \'%PK_%\'\";

5. Export only one table.

expdp oracle directory=BKUP_DUMP dumpfile=schema_ac_objects_%U.dmp logfile=schema_ac_objects.log tables=<Table_name>;

6. Export a big database, splitting the dump into multiple files of a given size to fit the space available on disk.

expdp SYSTEM/oracle dumpfile=DUMP1:SCHEMA_%u.dmp dumpfile=DUMP2:SCHEMA_%u.dmp filesize=96G logfile=DUMP1:schema.log full=y exclude=statistics;

7. Import a full dump which contains multiple users in different tablespaces into a single common user in a single common tablespace, in the same database or in a different database.

impdp system/oracle directory=dump dumpfile=EPIXDB_01.dmp,EPIXDB_02.dmp full=y logfile=epixdb_imp.log remap_schema=<Schema1>:<schema_common>
remap_schema=<Schema2>:<schema_common> remap_schema=<Schema3>:<schema_common> remap_schema=<Schema4>:<schema_common> remap_schema=<Schema5>:<schema_common>
remap_schema=<Schema6>:<schema_common> remap_schema=<Schema7>:<schema_common> remap_schema=<Schema8>:<schema_common> remap_schema=<Schema9>:<schema_common> remap_schema=<Schema10>:<schema_common>
remap_schema=<Schema11>:<schema_common> remap_schema=<Schema12>:<schema_common> remap_tablespace=<tablespace1>:<common_tablespace> remap_tablespace=<tablespace2>:<common_tablespace>
remap_tablespace=<tablespace3>:<common_tablespace> table_exists_action=replace;

8. Export of an object as of a specific time using FLASHBACK_TIME (note: sufficient undo data must be available to build the flashback image).

expdp username directory=BKUP_DUMP dumpfile=schema_ac_objects_%U.dmp logfile=schema_ac_objects.log tables=TABLE_NAME
FLASHBACK_TIME=\"to_timestamp\(to_char\(sysdate,\'yyyy-mm-dd hh24:mi:ss\'\),\'yyyy-mm-dd hh24:mi:ss\'\)\";

9. Traditional export (the exp utility) to take a backup of specific data.

exp user/pass file=exp.dmp log=exp.log TABLES=test query=\"where rownum < 101\";

exp uwclass/uwclass tables=scott.emp query=\"WHERE job=\'MANAGER\' AND sal \>50000\" STATISTICS=NONE;

10. Export of multiple tables from multiple schemas.

expdp system/password directory=BACKUP dumpfile=schema_old_table%u.dmp logfile=schema_old_tables.log schemas='SCHEMA1','SCHEMA2','SCHEMA3','SCHEMA4','SCHEMA5','SCHEMA6'
INCLUDE=TABLE:\"IN \(\'CC_ACCOUNT\', \'CC_ACCOUNTSERVICE\', \'CC_ACCOUNTSERVICEOFFERING\', \'CC_OFFERINGPROPERTY\', \'CC_ACCOUNTPROPERTY\', \'BG_INVOICEPOSTED\',
\'PR_MANUALSITEDETAILS\', \'PR_MANUALCNDNINVCODE\', \'PR_MANUCNDN_MANUALSITEDETAILS\', \'BG_FINALOUTPUTXLS_MR_ADV_AUG10\', \'BG_FINALOUTPUTXLS_MR_ADV_AUG11\',
\'BG_FINALOUTPUTXLS_MR_ADV_AUG12\', \'BG_FINALOUTPUTXLS_MR_ADV_AUG13\', \'BG_FINALOUTPUTXLS_MR_ADV_AUG14\', \'BG_FINALOUTPUTXLS_MR_ADV_AUG15\',
\'BG_INVOICEDETAILPOSTED\', \'CRDR_INVOICEBILLHEADPOSTED\', \'CNDN_INVOICEBILLHEADPOSTED\', \'R3G_INVOICEBILLHEADPOSTED\'\)\";

11. Export from multiple schemas where the table names match a LIKE pattern.

expdp system/U6dba#15@projdb directory=backup dumpfile=projdbdmp_%u.dmp logfile=projdbdmp_full.log schemas='SCHEMA1','SCHEMA2','SCHEMA3','SCHEMA4','SCHEMA5','SCHEMA6' INCLUDE=TABLE:\"IN \(SELECT table_name FROM dba_tables WHERE table_name LIKE \'CC_%\'\)\";
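All the shell escaping above can be avoided by putting the parameters into a parameter file. A minimal sketch (the file name is illustrative; the schema list and INCLUDE clause are the ones from the example above):

$ cat expdp_cc_tables.par
directory=backup
dumpfile=projdbdmp_%u.dmp
logfile=projdbdmp_full.log
schemas=SCHEMA1,SCHEMA2,SCHEMA3,SCHEMA4,SCHEMA5,SCHEMA6
include=TABLE:"IN (SELECT table_name FROM dba_tables WHERE table_name LIKE 'CC_%')"

$ expdp system@projdb parfile=expdp_cc_tables.par

Inside a parfile no backslash escapes are needed, which makes these INCLUDE/EXCLUDE clauses much easier to maintain.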

 

That's it.

 

The above is a snapshot 🙂 of my huge collection of syntax for the scenarios that come up most often in the worst situations. If you are facing any other critical situation, please let me know in the comments; it would be an honor to help.

 

Hope it helps.

 

With Regards

Nimai Karmakar

 

 


Today Hemant Sir has posted a good link that presents some basics of Oracle High Availability with demonstrations.

 

Please click the link…

 

Oracle High Availability demos…

 

Hope this helps…


Hi friends,

 

This post is about lost writes in Oracle: techniques for reproducing and investigating the effects of lost writes, and the use of Active Data Guard to protect against them.

Imagine this scenario: a production system has two standbys to protect against disaster and to load-balance read-only load (with Active Data Guard (ADG)). A lost write happens in the primary and remains unnoticed for a few days. Finally the block that suffered the lost write is updated again. Both standbys stop applying redo, throwing ORA-600 [3020] (also known as stuck recovery). The primary DB keeps working fine, although it is logically corrupted by the lost write.
You are the DBA in charge of fixing this, what would you do?

I hope that the example above illustrates that lost writes can generate quite complex recovery scenarios and, overall, a few headaches for support DBAs. In this post I illustrate a few techniques and examples that can be used to test the effects of lost writes in Oracle and therefore prepare in case a real-world issue strikes. Of particular interest will be testing the effect of lost writes in an environment with (Active) Data Guard.
We need to cover some ground first on techniques and how to setup the test. But first a definition.

Lost writes: “A data block lost write occurs when an I/O subsystem acknowledges the completion of the block write, while in fact the write did not occur in the persistent storage” (from support note 1302539.1). Lost writes can be caused by faulty storage, but also by Oracle bugs, in general anything in between our data in RAM and storage can corrupt our data, including controllers and network.

Digression on techniques.
1. A useful technique that we will need in the following is the ability to read and write a single block from Oracle data files (in a test environment). For databases on filesystems (and also DBs on NFS) dd is the tool for this job (I am using the Linux platform as reference). Examples:

read one 8KB block from filesystem (block 134 in this example):
dd if=testlostwrite.dbf bs=8192 count=1 skip=134 of=blk134.dmp

write one 8KB block to filesystem (block 134 in this example):
dd of=testlostwrite.dbf bs=8192 count=1 seek=134 if=blk134.dmp conv=notrunc

Note when writing to an Oracle datafile we must use conv=notrunc or else we will end up with an unusable (truncated) output file. Note also the syntax for specifying the block offset, skip is used for input files and seek for output files (see dd manual).

How to read and write single blocks on ASM data files. One possibility is to take a backup copy of the datafile with RMAN, edit it with dd (as detailed above), then (with RMAN again) restore the backup copy. With a little knowledge of ASM internals, more direct ways to access files in ASM are available: one can find the position of the block (and its mirror copies if relevant) and then use dd to read/write data directly. Currently my preferred way is slightly different and it exploits the dbms_diskgroup package. This is an undocumented package (see again the link above on ASM internals for some additional details) although it is extensively used by Oracle's asmcmd utility. I have packaged the dbms_diskgroup.read and dbms_diskgroup.write calls into a small utility written in perl (that I called asmblk_edit, follow this link to download asmblk_edit). Similar ideas can be found also in support note 603962.1.
The following example illustrates using the utility asmblk_edit to read and write block number 134 from and to a data file stored in ASM:

read one 8KB block from an Oracle data file in ASM (block 134 in this example):
./asmblk_edit -r -s 134 -a +TESTDB_DATADG1/TESTDB/datafile/testlostwrite.353.810578639 -f blk134.dmp

write one 8KB block to an Oracle data file in ASM (block 134 in this example):
./asmblk_edit -w -s 134 -a +TESTDB_DATADG1/TESTDB/datafile/testlostwrite.353.810578639 -f blk134.dmp

2. Another technique that we need is quite straightforward and allows us to find the offset of the Oracle block that we want to read/write to for our tests.
The example here below shows how to find the block number where the data is stored, using rowid:
SQL> select rowid, dbms_rowid.ROWID_BLOCK_NUMBER(rowid), a.* from testlosttable a;

Incidentally finding the block number for an index leaf block can be done with the following (using the undocumented function sys_op_lbid):
SQL> select rowid rowid_table_from_index_leaf, sys_op_lbid(18297477,'L',t.rowid) index_leaf_rowid from testlosttable t; -- note: 18297477 in this example is the data_object_id of the index I am examining, edit with the actual number as relevant

3. Last but not least, we need a way to modify data blocks ‘under the nose of Oracle’, in particular we want to make sure we flush/invalidate the relevant cached data and metadata. The method we will use is:

  • Offline the tablespace where data resides (this flushes dirty blocks and invalidates cache entries)
  • Perform the read/write modifications to the block, with dd or asmblk_edit, as needed
  • Online the tablespace again before the next usage


A basic recipe to reproduce the effects of a lost write. 

We can now put together the ideas and techniques described above into a working example aimed at reproducing the effects of a lost write in Oracle:

SQL> create bigfile tablespace testlostwrite datafile '{edit with datafile directory}/testlostwrite.dbf' size 10m;
SQL> create table testlosttable (id number, payload varchar2(100)) tablespace testlostwrite ;
SQL> create index i_testlosttable on testlosttable (id) tablespace testlostwrite ;
SQL> insert into testlosttable values (10,'aaaaaaaaaaaaaaaaaaaaaaaaa');
SQL> insert into testlosttable values (20,'bbbbbbbbbbbbbbbbbbbbbbbbb');
SQL> commit;

SQL> select rowid, dbms_rowid.ROWID_BLOCK_NUMBER(rowid), a.* from testlosttable a;
-- note: this allows us to find the block_id where the data resides; let's say it's block 134
SQL> alter tablespace testlostwrite offline;

-- read block either with dd or with asmblk_edit and create a backup copy. Example:
-- ./asmblk_edit -r -s 134 -a +TESTDB_DATADG1/TESTDB/datafile/testlostwrite.353.810578639 -f blk134.dmp
SQL> alter tablespace testlostwrite online;
SQL> insert into testlosttable values (30,'cccccccccccccccccccccccccc');
SQL> commit;
SQL> alter tablespace testlostwrite offline;

-- write block either with dd or with asmblk_edit from the previously created backup copy. Example:
-- ./asmblk_edit -w -s 134 -a +TESTDB_DATADG1/TESTDB/datafile/testlostwrite.353.810578639 -f blk134.dmp
SQL> alter tablespace testlostwrite online;
SQL> -- our database now has a lost write in the table testlosttable, block 134

The effect of the lost write on the table is that the row with id=30 has disappeared from the table. However, the entry with id=30 is still visible in the index i_testlosttable. This observation can be confirmed with the 2 queries reported below. Note that in case of normal operations (i.e. no lost writes) the 2 queries would both return three rows; this is not the case here because of our manual editing of the table block with the asmblk_edit utility.
SQL> select /*+ INDEX_FFS(a)*/ id from testlosttable a where id is not null;
SQL> select  /*+ FULL(a)*/ id from testlosttable a;
SQL> -- these 2 queries return different results in our test with a lost write… and only one of them is correct!

 

We can now proceed with this additional test: insert a new row into the table, for example:

SQL> insert into testlosttable values (40,'ddddddddddddddddddddddd');
SQL> commit;

What we should notice at this point is that Oracle keeps working fine and no errors are shown to the user. With the techniques discussed above we can easily show that this new row has been inserted into block 134 (the block with the lost write). Let's postpone further investigations for a later paragraph and for now just note that Oracle has not detected that we have suffered a lost write.
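For example, we can confirm that the new row landed in the block with the lost write by reusing the dbms_rowid query from technique 2 above:

SQL> select id, dbms_rowid.rowid_block_number(rowid) block_number from testlosttable where id=40;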

An example of the effects of lost writes with Standby (Data Guard)
A standby database, among other things, provides a copy of the database that Oracle can use to detect lost writes (i.e. the standby provides a reference copy of 'good data'). A simple test to see how this works in practice:

  • We start by going through the same steps described above and we create a lost write in the primary.
    • Note that at this point the standby has no knowledge that something has gone wrong in the primary.
  • Now we can go ahead and run DML against the block that has suffered the lost write
    • i.e. we insert the row with id=40 as detailed above.
  • When the standby database tries to apply the redo log entry (change vector) to the block with the lost write, it compares SCNs and finds that something is wrong. MRP will stop and throw ORA-600 [3020] (stuck recovery).

An example of the error stack for a stuck recovery:

ORA-00600: internal error code, arguments: [3020], [10], [134], [134], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 10, block# 134, file offset is 1097728 bytes)
ORA-10564: tablespace TESTLOSTWRITE
ORA-01110: data file 10: '+STDBY_DATADG1/TESTDB/datafile/testlostwrite.347.810578685'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 18299147

Additional information is dumped in the trace files of MRP and of the recovery slaves.
It’s worth noting that ORA-600 [3020] can be caused by a lost write (as in this example) but also by other issues, in particular with the redo consistency (outside the scope of this discussion).

Oracle 11g and lost write protection 
In the example above Oracle only throws an error alerting that something is wrong (in this case ORA-600 [3020], stuck recovery) when a DML operation is performed on the primary DB against a block that had suffered a lost write. This means that lost writes may also remain silent and unnoticed in the primary DB for a very long period of time. A new feature of 11g can be used to make Oracle more active in detecting lost write issues, in particular to check the blocks that are subject to physical reads into the buffer cache. This is how to activate these checks:

  • Set the initialization parameter DB_LOST_WRITE_PROTECT=TYPICAL (or FULL if we prefer) on the primary database (see the sketch after this list). This will cause the generation of additional redo entries when Oracle performs physical reads.
  • Set DB_LOST_WRITE_PROTECT=TYPICAL also on the standby; this will make MRP and its recovery slaves check for lost writes using the extra information in the redo log stream.
  • Outcome: if we hit a block suffering from a lost write, MRP and its slaves will stop and throw ORA-752: recovery detected a lost write of a data block.
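A minimal sketch of how to set this, using the same alter system syntax as the standby fix shown later in this post (db_lost_write_protect is a dynamic parameter, so no restart is needed for it to take effect):

SQL> alter system set db_lost_write_protect=typical; -- run on the primary
SQL> alter system set db_lost_write_protect=typical; -- run on the standby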

An example of the error stack for lost writes with db_lost_write_protect set to typical is pasted here below (note that additional information is dumped in the trace files of MRP and of the recovery slaves):

Hex dump of (file 10, block 134) in trace file {path..}/standby_pr09_26471.trc
Reading datafile '+STDBY_DATADG1/TESTDB/datafile/testlostwrite.347.810578685' for corruption at rdba: 0x00000086 (file 10, block 134)
Read datafile mirror 'STDBY_DATADG1_0000' (file 10, block 134) found same corrupt data (logically corrupt)
Read datafile mirror 'STDBY_DATADG1_0011' (file 10, block 134) found same corrupt data (logically corrupt)

STANDBY REDO APPLICATION HAS DETECTED THAT THE PRIMARY DATABASE

LOST A DISK WRITE OF BLOCK 134, FILE 10
NO REDO AT OR AFTER SCN 6367748422450 CAN BE USED FOR RECOVERY.
ORA-00752: recovery detected a lost write of a data block
ORA-10567: Redo is inconsistent with data block (file# 10, block# 134, file offset is 1097728 bytes)
ORA-10564: tablespace TESTLOSTWRITE
ORA-01110: data file 10: '+STDBY_DATADG1/TESTDB/datafile/testlostwrite.347.810578685'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 18299538

Comparing this error message with the ORA-600 [3020] reported in the previous paragraph, the first thing we notice is that we have more information and more details. Moreover, Oracle has checked that this is indeed a case of lost write and has even tried to resolve the 'logical corruption' by reading the second mirror copy from the normal-redundancy ASM diskgroup used in this example (an attempt that did not help in this case, as both mirror copies of the block had suffered the lost write, generated in our example using the asmblk_edit utility).

DB_LOST_WRITE_PROTECT and its impact on primary and standby DBs
From the example above we see that the extra checks that the Oracle recovery process performs when DB_LOST_WRITE_PROTECT is set are definitely an improvement over the 10g behavior, although this mechanism does not provide complete protection against lost writes but only gives us a higher probability that a lost write is found.

How about the impact of setting this parameter? Extra redo entries are generated on the primary: they are called block read redo (BRR). We can directly investigate BRR entries for example by dumping redo (BRR corresponds to redo at layer 23 and opcode 2), see this example on how to do this type of dump.
SQL> alter system dump logfile '…{path to storage}…/thread_2_seq_1816.510.810661967' layer 23 opcode 2;

Another method is by querying v$mystat or v$sysstat. We will look for stats containing the text ‘lost write’ in their name:
SQL> select name, sum(value) from v$mystat se, v$statname n where n.statistic#=se.statistic# and (n.name like '%lost write%' or name like '%physical read%') group by name;

The size of BRR entries in the redo stream varies as Oracle's optimizations can come into play, such as batching several BRR entries into one redo record. Based on a limited number of observations of a production system, I estimate that on average we can expect about 30 bytes of extra redo generated by BRR for each physical block read performed by the database, although mileage may vary and the impact of the parameter should definitely be tested before applying it to a busy production system! Another observation based on testing is that direct read operations do not generate BRR (for example reads for parallel query).
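As a back-of-the-envelope example under that 30-byte estimate: a system doing 1,000 physical block reads per second would generate roughly 30 KB/s of extra redo, or about 2.5 GB per day.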

Example of BRR entry from a logfile dump:

CHANGE #5 TYP:0 CLS:4 AFN:10 DBA:0x00000086 OBJ:18299943 SCN:0x05ca.9b6e95f9 SEQ:1 OP:23.2 ENC:0 RBL:1
Block Read – afn: 10 rdba: 0x00000086 BFT:(1024,134) non-BFT:(0,134)
scn: 0x05ca.9b6e95f9 seq: 0x01
flags: 0x00000004 ( ckval )

After setting db_lost_write_protect to typical (or full) and restarting the instance, Oracle will set _log_committime_block_cleanout to true (based on observations on my test system, 11.2.0.3). The effect is that (on the primary instances) Oracle will write into the redo logs additional entries related to commit-time block cleanout operations (layer 4 opcode 6). Logging block cleanout redo records (and subsequently applying them to the standby database) is a desirable feature, especially in an environment deploying Active Data Guard.

On a standby instance, setting db_lost_write_protect to TYPICAL or FULL will make the MRP slaves do the extra work of checking for lost writes by comparing the SCNs in BRRs with the SCNs in the block headers. Notably, the extra work implies additional physical reads on the standby. This additional work on the standby can be monitored by querying v$sysstat (statistic name = 'recovery blocks read for lost write detection' and also the related statistic 'recovery blocks skipped lost write checks').

Note: checks for lost writes based on SCNs from BRR entries are also performed during media recovery (i.e. a recover database command). Therefore, even if Data Guard is not available, one can use a simple restore of the database from backup to validate BRR records and search for lost writes.

 

Analysis and troubleshooting.
From the steps detailed above we now have a test environment with a lost write that we generated ourselves. This will allow us to have some fun investigating the issue at a low level and developing techniques for analysis and troubleshooting. All this with the great advantage, compared to a real-life case, that we know for sure what the root cause of this incident is!
A good starting point for troubleshooting lost writes in a production system is support note 1265884.1 “Resolving ORA-752 or ORA-600 [3020] During Standby Recovery”. A few more pointers here below.

 

– One technique to investigate the affected blocks, both on the primary and on the standby, is to dump the block contents to a trace file. The idea is to compare the contents on the primary and standby. Of particular interest will be the SCN of the last change to the block and also the SCNs in the ITL list. Example:
SQL> alter system dump datafile 10 block 134;
SQL> -- the following performs a dump of the cached version of the block
SQL> alter session set events 'immediate trace name set_tsn_p1 level <ts#+1>'; -- where ts# is the tablespace number
SQL> alter session set events 'immediate trace name buffer level <decimal rdba>'; -- rdba is 134 in our example

– We can also investigate the contents of the block suffering a lost write using SQL. This has the advantage of allowing the use of flashback query. The current content of the block on the primary database may not be what we want: we are interested in a consistent image of the block at the SCN reported in our ORA-752 message, for example (this would also be the current SCN of the standby, as recovery is stuck there at that point in time). Example:
SQL> set num 16
SQL> select ora_rowscn, rowid, dbms_rowid.rowid_row_number(a.rowid) row_number, a.* from testlosttable as of scn 6367748219413 a where rowid like 'ABFzkJAAAAAAACG%'; -- edit values for scn and rowid; use the SQL below to find the values to use

– Digression: how to find the rowid of the block with lost write (block 134 of file 10 and object_id=18299145). Actually what we need is just the first 15 characters of the rowid (the last three characters are the row_number inside the block). Example:
SQL> select substr(DBMS_ROWID.ROWID_CREATE(rowid_type =>1, object_number =>18299145, relative_fno =>0, block_number =>134, row_number =>0),1,15) rowid_prefix from dual;

– How to find the current SCN (at the standby):
SQL> select current_scn from v$database;

– Another powerful technique is to dump redo log information from all relevant logfiles, searching for entries related to the block with the lost write (in our example block 134 of file 10). The dump will include transaction details and, most notably, the redo marking the time when DBWR wrote the given block (this info is stored in the block written redo, BWR). If the parameter db_lost_write_protect is set to typical or full, the redo dump will also show details of the block read redo (see BRR discussed above). For further info on logfile dumps see also Julian Dyke's website. Example:
SQL> alter system dump logfile '…{path to storage}…/thread_2_seq_1816.510.810661967' DBA MIN 10 134 DBA MAX 10 134; -- edit file number and block number as needed

 

– Additional investigation on data affected by the lost write can be performed by log mining. The goal is to find the SQL of all the transactions for the affected block. Identify the relevant redo logs to mine first. Example:
SQL> BEGIN

SYS.DBMS_LOGMNR.ADD_LOGFILE(LogFileName=>'…{path to storage}…/thread_2_seq_1816.510.810661967',Options=>SYS.DBMS_LOGMNR.NEW);

SYS.DBMS_LOGMNR.START_LOGMNR(Options=> SYS.DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);

END;

/

SQL> SELECT scn,sql_redo FROM SYS.V_$LOGMNR_CONTENTS WHERE data_obj#=18299145 and row_id like 'ABFzkJAAAAAAACG%'; -- calculate the rowid with the dbms_rowid package as detailed above

 

– Other analysis that can be done in case we have indexes on the tables:

In case one or more indexes are present we can read data from the index and compare the results with what we have on the table. Example to read from the index:
SQL> select rowid rowid_table_from_index_leaf, id, sys_op_lbid(18299146,'L',t.rowid) index_leaf_rowid from testlosttable t where rowid like 'ABFzkJAAAAAAACG%';
-- note: update 18299146 with the data_object_id of the index of interest
-- in this example this is the data_object_id of I_TESTLOSTTABLE

– We can also check whether there are inconsistencies (corruption) between the table and its indexes with 'analyze table validate structure', although in my experience this can be quite time consuming and does not necessarily add more information to the analysis. Example:
SQL> analyze table testlosttable validate structure cascade online;

 

Actions that we can take to restore the services
Suppose that our analysis has confirmed that a lost write happened and that we have the details of what 'is lost'. We now need to fix the corrupted block on the primary and restore the service on the standby. In particular, if we have an Active Data Guard with an SLA, the latter may be quite an urgent action. Hopefully we also have some idea of what the root cause was and a way to prevent it in the future.

One possible action is to failover to the standby. This action plan, however, may prove to be unacceptable in many circumstances, given the potential data loss it implies. A failover would likely not be acceptable if the primary database has continued working and accepting user transactions since the time (SCN) of the incident that generated the ORA-600 (or ORA-752).

Another possibility is to use our knowledge of the lost transactions, gathered in the analysis phase, to run SQL actions to 'fix the primary'. This has to be evaluated case by case. In some circumstances we can also get away with just dropping and recreating the object with the lost write. In our simple example of a lost insert on the table testlosttable, the action to perform on the primary is:

SQL> alter index i_testlosttable rebuild online;
SQL> insert into testlosttable values (30,'cccccccccccccccccccccccccc');
SQL> commit;

What about fixing the standby? We can unstick the recovery on the standby by allowing it to corrupt the block with the lost write (and fix it later, as detailed below). Example:

SQL> alter system set db_lost_write_protect=none; -- temporarily disable the lost write check if needed
SQL> alter database recover automatic standby database allow 1 corruption;
SQL> -- wait till the redo that caused the error message has been applied
SQL> alter database recover cancel;
SQL> -- restart normal Data Guard operations. An example for ADG:
SQL> alter system set db_lost_write_protect=typical;
SQL> alter database open read only;
SQL> alter database recover managed standby database nodelay using current logfile disconnect;

 

At this point all is OK except for one corrupted block on the standby. How do we restore the corrupted block on the standby? This depends on the case: we may just rebuild the object on the primary, and this will fix the problem on the standby. Another option is to copy the datafile over from primary to standby.
In our example the corrupted block is block 134 of file 10 and we have an Active Data Guard in real-time apply. We can use automatic block media recovery (ABMR) to fix it. In my tests ABMR is attempted but does not really work against the corrupted block 134; I can work around this by zeroing out the block. This is an example (intended to be used on test databases):

./asmblk_edit -w -s 134 -a +STDBY_DATADG1/TESTDB/datafile/testlostwrite.347.810578685 -f zeroblock

where the zeroblock file is created with: dd if=/dev/zero bs=8192 count=1 of=zeroblock

if my standby was on a filesystem I could have used:
dd if=/dev/zero of=testlostwrite.dbf bs=8192 count=1 seek=134 conv=notrunc
If we now query the table (testlostwrite in our example), when Oracle reaches the zeroed block it will fetch a copy from the primary (see the Oracle documentation for the details about configuring and using the ABMR feature). This happens in a transparent way for the user issuing the query; the operation is logged in the alert log of the instance (the 2 lines below appeared twice in my test using 11.2.0.3):

Automatic block media recovery requested for (file# 10, block# 134)

Automatic block media recovery successful for (file# 10, block# 134)
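Once ABMR has done its job we can verify, for example, that the block is no longer flagged as corrupt (this is how I would check it; the view is the standard one used in RMAN block recovery scenarios):

SQL> select * from v$database_block_corruption;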


Oracle’s OTN demo video on lost write

Short videos with demos on testing lost writes and automatic block media recovery in a Data Guard environment can be found on OTN.
See also support document “Best Practices for Corruption Detection, Prevention, and Automatic Repair – in a Data Guard Configuration” [ID 1302539.1]

Conclusions and future work
This article illustrates a simple example of how to reproduce and investigate the effects of a lost write in a test environment, which is intended as a training exercise for DBAs.
I would like to end with four lessons learned on the topic of how lost writes can affect a high-availability Oracle deployment:

  • Finding the root cause of a lost write can prove to be very hard. Two possible sources of this type of issue are the storage and Oracle bugs. How do we find which one (if either) is to blame for a particular incident? If the issue cannot be reproduced, we can easily find ourselves in the middle of a finger-pointing discussion between vendors.
  • Having a standby (Data Guard) is of great help for discovering lost writes. Without a standby there is a high chance that we would not realize we had a lost write in the first place. However, once a lost write is found the redo apply stops: the recovery is stuck and throws ORA-600 [3020] or ORA-752 if we try to restart it. A support DBA will most likely have to analyze the situation and decide how to fix the issue (for example perform a failover, or rather go through a process similar to what is described in this article). Moreover, if one or more Active Data Guards are used for critical read-only activity, there is time pressure to restart the redo apply.
  • How can we check that our primary and standby databases are 'in sync', that is, that there are no lost writes waiting to 'explode' like time bombs? I am not aware of a utility that could do such a check. This is a very interesting topic, probably material for another post. A brief discussion of this problem and possible solutions can be found at this link and also here.
  • Database restores can be affected by lost writes too. For example the restore of a database backup can fail (get stuck on applying redo) because of a lost write that has happened in production after the latest data file backup. This has potential impacts on the disaster and recovery strategy.

Note: the examples reported in this article have been tested against Oracle 11.2.0.3 64 bit for Linux (RHEL5).

Hope it helps.


The WordPress.com stats helper monkeys prepared a 2014 annual report for this blog.

Here’s an excerpt:

The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 10,000 times in 2014. If it were a concert at Sydney Opera House, it would take about 4 sold-out performances for that many people to see it.

Click here to see the complete report.


WHAT IS ORACLE AWR (Automatic Workload Repository) REPORT by Nimai Karmakar

Prior to Oracle 10g we used the STATSPACK report to analyze instance health and performance, but from 10g onwards we use the AWR (Automatic Workload Repository) report to analyze Oracle Database health.

AWR Features
The AWR is used to collect performance statistics including:
  • Wait events used to identify performance problems.
  • Time model statistics indicating the amount of DB time associated with a process, from the V$SESS_TIME_MODEL and V$SYS_TIME_MODEL views.
  • Active Session History (ASH) statistics from the V$ACTIVE_SESSION_HISTORY view.
  • Some system and session statistics from the V$SYSSTAT and V$SESSTAT views.
  • Object usage statistics.
  • Resource-intensive SQL statements.

How to generate an AWR report?
It is very easy to generate an AWR report in Oracle 10g. We have to run just one SQL script.
There are two SQL scripts to create an AWR report.
1. awrrpt.sql
If we have only one Oracle instance, run the awrrpt.sql script.
2. awrrpti.sql
If we have more than one Oracle instance (like RAC), run the awrrpti.sql script so that we can pick a particular instance for AWR report creation.

Location of AWR report sql script
$ORACLE_HOME/rdbms/admin
In my test database:
F:\oracle\product\10.2.0\test01\RDBMS\ADMIN

What is the default interval period between two AWR snapshots?
An AWR snapshot is generated automatically every 1 hour by default, but a 15-minute interval between two snapshots is often recommended for better performance analysis.

How to manage and change the interval period for AWR snapshots?
There are two aspects to modifying the AWR settings.

1. Manually –> through the DBMS_WORKLOAD_REPOSITORY PL/SQL package
2. Snapshots –> By default snapshots of the relevant data are taken every hour and retained for 7 days. The default values for these settings can be altered using the following procedure.

BEGIN
DBMS_WORKLOAD_REPOSITORY.modify_snapshot_settings(
retention => 43200,  -- Minutes (= 30 Days). Current value retained if NULL.
interval  => 30);    -- Minutes. Current value retained if NULL.
END;
/

The changes to the settings are reflected in the DBA_HIST_WR_CONTROL view. Typically the retention period should capture at least one complete workload cycle. If your system has monthly archive and load cycles, a 1-month retention time would be more beneficial than the default 7 days. An interval of "0" switches off snapshot collection, which in turn stops much of the self-tuning functionality, hence this is not recommended. Automatic collection is only possible if the STATISTICS_LEVEL parameter is set to TYPICAL or ALL.
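The current settings can be checked against DBA_HIST_WR_CONTROL, for example:

SQL> select snap_interval, retention from dba_hist_wr_control;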

If the value is set to BASIC, manual snapshots can be taken, but they will be missing some statistics. Extra snapshots can be taken and existing snapshots can be removed, as shown below.

EXEC DBMS_WORKLOAD_REPOSITORY.create_snapshot;
BEGIN
DBMS_WORKLOAD_REPOSITORY.drop_snapshot_range (
low_snap_id => 22,
high_snap_id => 32);
END;
/

Snapshot information can be queried from the DBA_HIST_SNAPSHOT view.
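For example:

SQL> select snap_id, begin_interval_time, end_interval_time from dba_hist_snapshot order by snap_id;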

Baselines – A baseline is a pair of snapshots that represents a specific period of usage. Once baselines are defined they can be used to compare current performance against similar periods in the past. You may wish to create a baseline to represent a period of batch processing.

BEGIN
DBMS_WORKLOAD_REPOSITORY.create_baseline (
start_snap_id => 210,
end_snap_id => 220,
baseline_name => 'batch baseline');
END;
/

The pair of snapshots associated with a baseline are retained until the baseline is explicitly deleted.

BEGIN
DBMS_WORKLOAD_REPOSITORY.drop_baseline (
baseline_name => 'batch baseline',
cascade => FALSE); -- Deletes associated snapshots if TRUE.
END;
/

Baseline information can be queried from the DBA_HIST_BASELINE view.
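For example:

SQL> select baseline_id, baseline_name, start_snap_id, end_snap_id from dba_hist_baseline;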

Workload Repository Views
The following workload repository views are available:
  • V$ACTIVE_SESSION_HISTORY – Displays the active session history (ASH) sampled every second.
  • V$METRIC – Displays metric information.
  • V$METRICNAME – Displays the metrics associated with each metric group.
  • V$METRIC_HISTORY – Displays historical metrics.
  • V$METRICGROUP – Displays all metrics groups.
  • DBA_HIST_ACTIVE_SESS_HISTORY – Displays the history contents of the active session history.
  • DBA_HIST_BASELINE – Displays baseline information.
  • DBA_HIST_DATABASE_INSTANCE – Displays database environment information.
  • DBA_HIST_SNAPSHOT – Displays snapshot information.
  • DBA_HIST_SQL_PLAN – Displays SQL execution plans.
  • DBA_HIST_WR_CONTROL – Displays AWR settings.

Workload Repository Reports
Oracle provides two scripts to produce workload repository reports (awrrpt.sql and awrrpti.sql). They are similar in format to the statspack reports and give the option of HTML or plain text formats. The two reports give essentially the same output, but awrrpti.sql allows you to select a single instance. The reports can be generated as follows.

@$ORACLE_HOME/rdbms/admin/awrrpt.sql
@$ORACLE_HOME/rdbms/admin/awrrpti.sql

The scripts prompt you to enter the report format (html or text), the start snapshot id, the end snapshot id and the report filename. The resulting report can be opened in a browser or text editor accordingly.

Enterprise Manager
The automated workload repository administration tasks have been included in Enterprise Manager. The “Automatic Workload Repository” page is accessed from the main page by clicking on the “Administration” link, then the “Workload Repository” link under the “Workload” section.

The page allows you to modify AWR settings or manage snapshots without using the PL/SQL APIs.

You can edit snapshot settings here.
CREATE_SNAPSHOT Function and Procedure

This function and procedure create snapshots. In the case of the function, the snapshot ID is returned.

Syntax

DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT(flush_level IN VARCHAR2 DEFAULT 'TYPICAL');
DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT(flush_level IN VARCHAR2 DEFAULT 'TYPICAL') RETURN NUMBER;

Parameter Description

flush_level The flush level for the snapshot, either 'TYPICAL' or 'ALL'.

Examples

This example creates a manual snapshot at the TYPICAL level:

EXECUTE DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT();

If you query the DBA_HIST_SNAPSHOT view after the CREATE_SNAPSHOT action, you will see one more snapshot ID added to the Workload Repository.

DROP_BASELINE Procedure

This procedure drops a baseline.

Syntax

DBMS_WORKLOAD_REPOSITORY.DROP_BASELINE(
baseline_name IN VARCHAR2,
cascade IN BOOLEAN DEFAULT false,
dbid IN NUMBER DEFAULT NULL);

Parameter Description

baseline_name The name of the baseline.
cascade If TRUE, the pair of snapshots associated with the baseline will also be dropped.
Otherwise, only the baseline is removed.
dbid The (optional) database id (defaults to the local DBID).

Examples
This example drops the baseline 'oltp_peakload_bl' without dropping the underlying snapshots:

EXECUTE DBMS_WORKLOAD_REPOSITORY.DROP_BASELINE (baseline_name => 'oltp_peakload_bl');

If you query the DBA_HIST_BASELINE view after the DROP_BASELINE action, you will see that the specified baseline definition is removed. You can query the DBA_HIST_SNAPSHOT view to find that the underlying snapshots are left intact.

DROP_SNAPSHOT_RANGE Procedure

This procedure drops a range of snapshots.

DBMS_WORKLOAD_REPOSITORY.DROP_SNAPSHOT_RANGE(
low_snap_id IN NUMBER,
high_snap_id IN NUMBER,
dbid IN NUMBER DEFAULT NULL);

Parameter Description

low_snap_id The low snapshot id of snapshots to drop.
high_snap_id The high snapshot id of snapshots to drop.
dbid The database id (defaults to the local DBID).

Examples
This example drops the range of snapshots between snapshot ids 102 and 105 for the local database:

EXECUTE DBMS_WORKLOAD_REPOSITORY.DROP_SNAPSHOT_RANGE(102, 105);
If you query the dba_hist_snapshot view after the Drop Snapshot action, you will see that snapshots 102 to 105 are removed from the Workload Repository.

How to interpret an AWR report?

Below are the main sections of an AWR report to focus on.
Load Profile – In this section we see logical reads, physical reads, soft parses and hard parses.

Instance Efficiency Percentages – Many Oracle gurus recommend not relying on these percentage ratios, because even if the BUFFER CACHE HIT RATIO shows 100%, that is no guarantee that your database is fast and running smoothly.

Top 5 Timed Events – This is a very important section of the AWR report; through it we can identify the five wait events most affecting database performance.

Oracle Wait Events – The top wait events, which usually cause database hangs.

SQL Statistics – In this section we see execution time, CPU used, and logical and physical reads for the top SQL statements.

Advisory Statistics – In this section we get advice on PGA, BUFFER CACHE, SHARED POOL and JAVA POOL sizes for better performance.

For text output

SELECT output FROM TABLE
(dbms_workload_repository.awr_report_text
(37933856,1,2900,2911)   -- (database_id, instance_num, start_snap, end_snap)
);

OUTPUT
——————————————————————–
WORKLOAD REPOSITORY report for
DB Name DB Id Instance Inst Num Release Cluster Host
———- ———– ——— ——– ———– ——- —–
DBDABR 37933856 dbdabr 1 10.1.0.2.0 NO Host1
Snap Id Snap Time Sessions Curs/Sess
——— ——————- ——– ———
Begin Snap: 2900 19-Aug-04 11:00:29 18 5.2
End Snap: 2911 19-Aug-04 22:00:16 18 4.6
Elapsed: 659.78 (mins)
DB Time: 10.08 (mins)
Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 48M Std Block Size: 8K
Shared Pool Size: 56M Log Buffer: 256K
Load Profile
~~~~ Per Second Per Transaction
——– —————
Redo size: 1,766.20 18,526.31
Logical reads: 39.21 411.30
Block changes: 11.11 116.54
Physical reads: 0.38 3.95
Physical writes: 0.38 3.96
User calls: 0.06 0.64
Parses: 2.04 21.37
Hard parses: 0.14 1.45
Sorts: 1.02 10.72
Logons: 0.02 0.21
Executes: 4.19 43.91
For html output

SELECT output FROM TABLE
(dbms_workload_repository.awr_report_html
(37933856,1,2900,2911)   -- (database_id, instance_num, start_snap, end_snap)
);

Hope it helps a little bit.

Thanks & Regards

Nimai Karmakar


Some issues which I have faced in the past involved blocks getting corrupted on pre-prod. I can't give you the exact scenario, but I will try to recreate it here.

So, if some of our data blocks get corrupted, how can we restore them?

The answer is simple: Oracle Block Media Recovery.

When some of the data blocks in a data file are physically corrupted and you don't have RMAN backups, you will have to restore and recover the full data file from another kind of backup (expdp/impdp, cold or hot backup) just to fix those few blocks, which can be quite a hectic job. But if you use RMAN backups then you can take advantage of a very powerful feature called BMR (Block Media Recovery). Using Block Media Recovery, only the corrupted blocks are recovered from a backup instead of the whole file.

The steps to resolve block corruption are simple…..

  • Start SQL*Plus and connect to the target database.
  • Query V$DATABASE_BLOCK_CORRUPTION to determine whether corrupt blocks exist. For example, execute the following statement:
SQL> SELECT * FROM V$DATABASE_BLOCK_CORRUPTION;
  • (If corrupt blocks exist) Start RMAN and connect to the target database.
  • Recover all blocks marked corrupt in V$DATABASE_BLOCK_CORRUPTION. The following command repairs all physically corrupted blocks recorded in the view:
RMAN> RECOVER CORRUPTION LIST;

Let's create the scenario.

For this we need a test tablespace with a separate data file where we can corrupt a data block. So let's create the tablespace.

Login to sqlplus.

$ sqlplus / as sysdba

create tablespace test datafile '/u01/apps/oradata/test/test01.dbf' size 1G autoextend on next 10M

extent management local segment space management auto

/

Tablespace created.

create user nimai identified by password default tablespace test

quota unlimited on test

/

User created.

grant connect , resource,dba to nimai;

Grant succeeded.

conn nimai/password;

Connected.

create table testnimai as select * from all_objects;

Table created.

SQL> exit

Ok, now we have a tablespace named test, which has a table testnimai owned by the user nimai.

Let's take a backup of this tablespace using RMAN.

$ rman target /

RMAN>run

{

allocate channel no1 type disk;

sql 'alter system switch logfile';

backup datafile 5 format '/u02/rman_backup/backup/nimai_%d_%s_%p_%t';

release channel no1;

}

Starting backup at 14-NOV-13

using channel no1

channel no1: starting full datafile backupset

channel no1: specifying datafile(s) in backupset

input datafile fno=00005 name=/u01/apps/oradata/test/test01.dbf

channel no1: starting piece 1 at 14-NOV-13

channel no1: finished piece 1 at 14-NOV-13

piece handle=/u02/rman_backup/backup/nimai_5vhcr4vz_.bkp

comment=NONE

channel no1: backup set complete, elapsed time: 00:00:01

Finished backup at 14-NOV-13

Check that your backup piece exists

RMAN> list backup;

RMAN> exit

Recovery Manager complete.

Now we have a backup of data file “/u01/apps/oradata/test/test01.dbf”.

Let's check out the location of the table.

$ sqlplus / as sysdba

select segment_name,header_file,header_block from dba_segments where segment_name = 'TESTNIMAI'

and owner = 'NIMAI';

SEGMENT_NAME                 HEADER_FILE HEADER_BLOCK

—————————- ———– ————

TESTNIMAI                              5           10

SQL> exit

The header of the table is in block 10, so if we are able to corrupt the next block we can create our test scenario. Let's corrupt the next block, which is 11, using the dd command in Linux (Note: please use the dd command carefully in your environment, as it can destroy important blocks).

$ cd /u01

$ dd of=apps/oradata/test/test01.dbf bs=8192 count=1 seek=11 conv=notrunc << EOF

Ok, now we have executed the command and block 11 has been corrupted in the data file "/u01/apps/oradata/test/test01.dbf".

$ sqlplus / as sysdba

SQL> alter system flush buffer_cache;

System altered.

We need to flush the buffer_cache because if block 11 is in the buffer_cache then Oracle will not read that block from the data file.

We will get the block corruption error:

  1. When we query v$database_block_corruption.
  2. In the alert log file.
  3. When we use the dbverify (dbv) utility on that file (see the example below).
  4. When we try to query the table.
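As an example of point 3, a minimal dbv run against our test datafile would look like this (FILE and BLOCKSIZE are standard dbv parameters; the output summary is omitted here):

$ dbv file=/u01/apps/oradata/test/test01.dbf blocksize=8192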

Let's check here simply, as a user would.

SQL> conn nimai/password

Connected.

SQL> select count(*) from testnimai;

select count(*) from testnimai

*

ERROR at line 1:

ORA-01578: ORACLE data block corrupted (file # 5, block # 11)

ORA-01110: data file 5: '/u01/apps/oradata/test/test01.dbf'

SQL> exit

Our scenario is created, and now we can proceed to recover block 11 of data file 5.

$ rman target /

RMAN> BLOCKRECOVER DATAFILE 5 BLOCK 11;

Starting blockrecover at 14-NOV-13

using target database control file instead of recovery catalog

allocated channel: ORA_DISK_1

channel ORA_DISK_1: sid=154 devtype=DISK

channel ORA_DISK_1: restoring block(s)

channel ORA_DISK_1: specifying block(s) to restore from backup set

restoring blocks of datafile 00005

channel ORA_DISK_1: restored block(s) from backup piece 1

piece handle=/u02/rman_backup/backup/nimai_5vhcr4vz_.bkp

channel ORA_DISK_1: block restore complete, elapsed time: 00:00:01

starting media recovery

media recovery complete, elapsed time: 00:00:01

Finished blockrecover at 14-NOV-13

RMAN> exit

Recovery Manager complete.

BLOCK MEDIA RECOVERY Complete.

$ sqlplus nimai/password

SQL> select count(*) from testnimai;

COUNT(*)

———-

40688

SQL> exit

Let's now corrupt more than one block in the file and test Block Media Recovery again.

$ cd /u01

$ dd of=apps/oradata/test/test01.dbf bs=8192 count=1 seek=11 conv=notrunc << EOF

$ dd of=apps/oradata/test/test01.dbf bs=8192 count=1 seek=12 conv=notrunc << EOF

$ dd of=apps/oradata/test/test01.dbf bs=8192 count=1 seek=13 conv=notrunc << EOF

$ sqlplus nimai/password

SQL> select count(*) from testnimai;

COUNT(*)

———-

40688

So why are we not getting the error? Remember, as I told you earlier, we need to flush the buffer_cache, because if the block is in the buffer_cache then Oracle does not read it from the data file. Here we have queried the blocks from the buffer_cache.

SQL> conn / as sysdba

Connected.

SQL> alter system flush buffer_cache;

System altered.

Now query again.

SQL> conn nimai/password

Connected.

SQL> select count(*) from testnimai;

select count(*) from testnimai

*

ERROR at line 1:

ORA-01578: ORACLE data block corrupted (file # 5, block # 11)

ORA-01110: data file 5: '/u01/apps/oradata/test/test01.dbf'

SQL> exit

Now say that you have three corrupt blocks in your data file, and your automated backup script kicked off during the night and took a backup of the file, marking the blocks as corrupt. When RMAN finds corrupt blocks in the data file it reports them in v$backup_corruption.

$ rman target /

After getting the corrupted block count from v$backup_corruption, we can set the backup script to tolerate those 3 blocks (as we have corrupted three blocks, i.e. 11, 12, 13).

We will use the SET MAXCORRUPT command in RMAN, which will tolerate up to 3 corrupted blocks in file 5, mark the blocks as corrupt, and record those 3 blocks in v$backup_corruption.

RMAN> run {

allocate channel no1 type disk;

set maxcorrupt for datafile 5 to 3;

backup datafile 5;

release channel no1;

}

executing command: SET MAX CORRUPT

using target database control file instead of recovery catalog

Starting backup at 14-NOV-13

allocated channel: no1

channel no1: sid=158 devtype=DISK

channel no1: starting full datafile backupset

channel no1: specifying datafile(s) in backupset

input datafile fno=00005

name=/u01/apps/oradata/test/test01.dbf

channel no1: starting piece 1 at 14-NOV-13

channel no1: finished piece 1 at 14-NOV-13

piece handle=

channel ORA_DISK_1: backup set complete, elapsed time: 00:00:03

Finished backup at 14-NOV-13

RMAN> exit

Recovery Manager complete.

The backup is complete; now let's query v$backup_corruption to see our corrupted block count.

$ sqlplus / as sysdba

select piece#, file#, block# , blocks , marked_corrupt from v$backup_corruption;

PIECE#      FILE#     BLOCK#     BLOCKS MAR

———- ———- ———- ———- —

1          5         11          3 YES

SQL> exit

It tells us there are three blocks corrupted in data file 5. Now we can simply go to RMAN and use the BLOCKRECOVER command to recover all these blocks from the backup we took earlier.

$ rman target /

RMAN> list backup;

RMAN> BLOCKRECOVER CORRUPTION LIST FROM TAG=tag_name;

CORRUPTION LIST means all the blocks reported corrupt in v$backup_corruption.

OR

RMAN> RECOVER CORRUPTION LIST;

RMAN> exit

Recovery Manager complete.

Now let's query again and check that our corrupted data blocks are back in OK condition.

$ sqlplus nimai/password

SQL> select count(*) from testnimai;

COUNT(*)

———-

40688

SQL> exit

All blocks are recovered successfully. Always take RMAN backups regularly, because you never know when you will face this type of scenario in your prod environment.

Hope it helps.

Thanks & Regards

Nimai Karmakar



Guys, as discussed in my last post oracle-performance-tuning-queries, I promised that I would be back with some DBA monitoring shell scripts, so here they are. The shell scripts below can be used for daily database monitoring by a DBA; some were created by my fellow DBAs.

Please edit the configuration as per your environment

  • This script checks tablespace usage. If a tablespace is less than 10 percent free, it will send an alert e-mail.

#!/bin/bash
#####################################################################
## check_tablespace.sh ##
##
#####################################################################
##
####### Start of configuration
##
######## Oracle Enviorment variables ##########
##
export ORACLE_BASE=/u01/app
export ORACLE_HOME=/rac/app/oracle/product/11.2.0
export ORACLE_UNQNAME=TEST01
export PATH=$PATH:$ORACLE_HOME/bin:$ORACLE_HOME/rdbms/admin:$ORACLE_HOME/lib
export alrt=/rac/app/oracle/diag/rdbms/test01/TEST01/trace/alert_TEST01.log
export asmalrt=/rac/app/oracle/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log
export TNS_ADMIN=$ORACLE_HOME/network/admin
export MONITOR_DIR=$ORACLE_HOME/dba-scripts/monitor

export ORACLE_SID=TEST01
##
######## Other variables #################
##
DBA=nimai.karmakar@hotmail.com
DATABASE=TEST01
datevar=$(date)
datevar2=$(date '+%Y-%m-%d-%H-%M')
##
####### End of configuration
##
sqlplus -s "/ as sysdba" << SQL1
set feed off
set linesize 100
set pagesize 200
spool /rac/app/oracle/product/11.2.0/dba-scripts/monitor/tablespace.alert
SELECT F.TABLESPACE_NAME,
TO_CHAR ((T.TOTAL_SPACE - F.FREE_SPACE),'999,999') "USED(MB)",
TO_CHAR (F.FREE_SPACE, '999,999') "FREE(MB)",
TO_CHAR (T.TOTAL_SPACE, '999,999') "TOTAL(MB)",
TO_CHAR ((ROUND ((F.FREE_SPACE/T.TOTAL_SPACE)*100)),'999,99999')||' %' PERCENT_FREE
FROM   (
SELECT       TABLESPACE_NAME,
ROUND (SUM (BLOCKS*(SELECT VALUE/1024
FROM V\$PARAMETER
WHERE NAME = 'db_block_size')/1024)
) FREE_SPACE
FROM DBA_FREE_SPACE WHERE TABLESPACE_NAME NOT IN ('SYSTEM','SYSAUX','TEMP','USERS','UNDOTBS1','UNDOTBS2')
GROUP BY TABLESPACE_NAME
) F,
(
SELECT TABLESPACE_NAME,
ROUND (SUM (BYTES/1048576)) TOTAL_SPACE
FROM DBA_DATA_FILES WHERE TABLESPACE_NAME NOT IN ('SYSTEM','SYSAUX','TEMP','USERS','UNDOTBS1','UNDOTBS2')
GROUP BY TABLESPACE_NAME
) T
WHERE F.TABLESPACE_NAME = T.TABLESPACE_NAME
AND (ROUND ((F.FREE_SPACE/T.TOTAL_SPACE)*100)) < 10;
spool off
exit
SQL1
if [ `cat /rac/app/oracle/product/11.2.0/dba-scripts/monitor/tablespace.alert|wc -l` -gt 0 ]
then
echo "Tablespace less than 10% free on ${DATABASE}. Please add space as necessary" >> /rac/app/oracle/product/11.2.0/dba-scripts/monitor/tablespace.tmp
cat /rac/app/oracle/product/11.2.0/dba-scripts/monitor/tablespace.alert >> /rac/app/oracle/product/11.2.0/dba-scripts/monitor/tablespace.tmp
mailx -s "Tablespace percent usage for ${DATABASE} at $datevar" $DBA < /rac/app/oracle/product/11.2.0/dba-scripts/monitor/tablespace.tmp
#####mv /disk1/tablespace.tmp /disk1/tablespace_$datevar2.alert
rm /rac/app/oracle/product/11.2.0/dba-scripts/monitor/tablespace.alert
rm /rac/app/oracle/product/11.2.0/dba-scripts/monitor/tablespace.tmp
fi
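To run this check automatically, the script can be scheduled through cron; a sketch, assuming the script is saved as check_tablespace.sh in the monitor directory configured above:

$ crontab -e
# run the tablespace check every 30 minutes
*/30 * * * * /rac/app/oracle/product/11.2.0/dba-scripts/monitor/check_tablespace.sh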

  • Script to gather SGA stats in a timely manner

####################################################
##
##SGA.sh
##
####################################################
export ORACLE_HOME=/rac/app/oracle/product/11.2.0
export ORACLE_SID=TEST01
export ORACLE_UNQNAME=TEST01
export PATH=$PATH:$ORACLE_HOME/bin:$ORACLE_HOME/rdbms/admin:$ORACLE_HOME/lib
export alrt=/rac/app/oracle/diag/rdbms/test01/TEST01/trace/alert_test011.log
export asmalrt=/rac/app/oracle/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log
export TNS_ADMIN=$ORACLE_HOME/network/admin
MAILID=nimai.karmakar@hotmail.com
#
#
#
sqlplus -s "/ as sysdba" << SQL1
set feed off
set linesize 100
set pagesize 200
spool /rac/app/oracle/product/11.2.0/dba-scripts/monitor/logs/sga_stats.log
col inst_id format 9999999 heading "INSTANCE ID"
col sga_size/1024 format 9999999 heading "SGA SIZE"
col sga_size_factor format 9999999 heading "SGA SIZE FACTOR"
col estd_physical_reads format 9999999 heading "PHYSICAL READ"
select inst_id, sga_size/1024, sga_size_factor, estd_db_time, estd_db_time_factor, estd_physical_reads from gv\$sga_target_advice
order by inst_id, sga_size_factor;
spool off
exit;
SQL1

if [ `cat /rac/app/oracle/product/11.2.0/dba-scripts/monitor/logs/sga_stats.log|wc -l` -gt 0 ]; then
cat /rac/app/oracle/product/11.2.0/dba-scripts/monitor/logs/sga_stats.log
mailx -s "Statistics at `date +%H:%M` for `hostname`" $MAILID < /rac/app/oracle/product/11.2.0/dba-scripts/monitor/logs/sga_stats.log
fi
exit

  • Script to check Server Process (RAC)

#!/bin/bash
export PATH=$PATH:/grid/app/bin/

SUBJECT="Server Process failed for - Server `hostname` on `date '+%m/%d/%y %X %A'`"
REPSUBJECT="Server `hostname` health check report at `date +%H:%M` hours on `date +%d-%m-%Y`"
ERRLOG=$MONITOR_DIR/logs/server_process_check.log
REPORT=$MONITOR_DIR/logs/server_process_report.txt
BODY=$MONITOR_DIR/server_process_report_email_body.txt

## Delete the errorlog file if found
/usr/bin/find  $ERRLOG -type f -exec rm {} \; 2>&1 > /dev/null

##Report recipients
MAILID='nimai.karmakar@hotmail.com'
chour=`date +%H``date +%M`

if [ `grep -i "TEST01 HEALTH CHECK FOR" $REPORT | wc -l` -eq 0 ]; then
echo "----------------------------TEST01 HEALTH CHECK FOR `date +%m_%d_%Y`---------------------------------" > $REPORT
fi

echo " " >> $REPORT

echo "----------------------------CRS process status `date +%m_%d_%Y_%H:%M`----------------------------" >> $REPORT
crsctl check crs | while read outputline
do
if test `echo $outputline | grep 'online' | wc -l` -eq 0 ## This will check if the CRS process is online or not
then
echo "Date        : "`date '+%m/%d/%y %X %A'` >> $ERRLOG
echo "Details     :"`crsctl check crs` >> $ERRLOG
echo "  " >> $ERRLOG
echo " Details     :"`crsctl check crs` >> $REPORT
echo " " >> $REPORT
echo " Skipping other tests " >> $REPORT
echo "  " >> $ERRLOG
##mutt -s "$SUBJECT" $MAILTO < $ERRLOG
/bin/mail -s "$SUBJECT" "$MAILID" < $ERRLOG
exit
else
echo " Details     :"`crsctl check crs` >> $REPORT
echo " " >> $REPORT
fi
done

echo "----------------------------PMON process Status count on `date +%m_%d_%Y_%H:%M`----------------------------" >> $REPORT
if test `ps -ef|grep pmon|grep -v grep |wc -l` -ne 2 ## This will check the number of pmon processes on the server
then
echo "Date        : "`date '+%m/%d/%y %X %A '` >> $ERRLOG
echo "Details     :"`ps -ef|grep pmon|grep -v 'grep' |wc -l` >> $ERRLOG
echo " " >> $ERRLOG
echo " PMON process not found. Oracle Instance on `hostname` may be down. Requires immediate attention" >> $ERRLOG
echo " " >> $REPORT
echo " Skipping other tests " >> $REPORT
echo "  " >> $ERRLOG
/bin/mail -s "$SUBJECT" "$MAILID" < $ERRLOG
exit
else
echo " Details     :"`ps -ef|grep pmon|grep -v 'grep' |wc -l` >> $REPORT
echo " " >> $REPORT
fi

echo "----------------------------Listener Status on `date +%m_%d_%Y_%H:%M`----------------------------" >> $REPORT

##Check whether the listener is running. Output should be 1
if test `ps -ef|grep tnslsnr | grep -v "grep" |wc -l` -ne 1 ##Check the number of listeners running.
then
echo "Date        : "`date '+%m/%d/%y %X %A '` >> $ERRLOG
echo "Details     :"`ps -ef|grep tnslsnr |grep -v 'grep' |wc -l` >> $ERRLOG
echo " " >> $ERRLOG
echo " Listener on `hostname` may be down. Requires immediate attention" >> $ERRLOG
echo " " >> $REPORT
echo "  " >> $ERRLOG
/bin/mail -s "$SUBJECT" "$MAILID" < $ERRLOG
else
echo " Details     :"`ps -ef|grep tnslsnr |grep -v 'grep' |wc -l` >> $REPORT
echo " " >> $REPORT
fi

echo "----------------------------Checking number of oracle processes `date +%m_%d_%Y_%H:%M`----------------------------" >> $REPORT

##Check the process count for the "oracle" user. It should stay below 1000
if test `ps -ef|grep -i oracle |wc -l` -ge 1000
then
echo "Date        : "`date '+%m/%d/%y %X %A '` >> $ERRLOG
echo "Details     : "`ps -ef|grep -i oracle|wc -l` >> $ERRLOG
echo "  " >> $ERRLOG
echo " Count of processes exceeded 1000. Requires immediate attention" >> $ERRLOG
echo "  " >> $ERRLOG
echo "  " >> $ERRLOG
/bin/mail -s "$SUBJECT" "$MAILID" < $ERRLOG
else
echo "Number of oracle processes: " `ps -ef|grep -i oracle |wc -l` >> $REPORT
echo " " >> $REPORT
fi

##Send the report at particular times (e.g. 1500 hours or 2350 hours)
## Newer mutt versions need -- between the -a attachment and the recipient list
if [ $chour -ge 1500 -a $chour -lt 1502 ]; then
mutt -s "$REPSUBJECT" -a $REPORT -- $MAILID < $BODY
if [ $? -eq 0 ]; then
cp $REPORT $MONITOR_DIR/logs/server_process_report_`date +%d-%m-%Y`.txt
> $REPORT
fi
fi

if [ $chour -ge 2350 -a $chour -lt 2355 ]; then
mutt -s "$REPSUBJECT" -a $REPORT -- $MAILID < $BODY
if [ $? -eq 0 ]; then
cp $REPORT $MONITOR_DIR/logs/server_process_report_`date +%d-%m-%Y`.txt
> $REPORT
fi
fi

exit
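
On 11.2 RAC, `crsctl check cluster -all` reports the clusterware stack on every node in a single call. A hedged variant of the CRS test built on it (the grep logic below is my sketch, not part of the original script):

## Sketch: flag any clusterware component not reported online on any node
if crsctl check cluster -all | grep -v -i online | grep -q "CRS-"
then
echo "Clusterware component offline somewhere in the cluster" >> $ERRLOG
fi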

  • Script for purging old files

echo 'Setting your environment'

###ORACLE_SID=#replace with your SID
ORACLE_SID=TEST01
export ORACLE_SID
###
###BDUMP=#replace with your BDUMP path
BDUMP=/rac/app/oracle/diag/rdbms/test01/TEST01/trace
export BDUMP
###
###ADUMP=#replace with your ADUMP path
ADUMP=/rac/app/oracle/admin/TEST01/adump
export ADUMP
###
###UDUMP=#replace with your UDUMP path
UDUMP=/rac/app/oracle/diag/rdbms/test01/TEST01/trace
export UDUMP

DT=`date "+%d%m%y"`
export DT
PID=${$}
export PID
FSEQ=${PID}
export FSEQ

################## Creating Backup Dir if not exist #############################
echo 'Creating backup dirs if they do not exist'
mkdir -p $BDUMP/bdump_oldfiles
mkdir -p $UDUMP/udump_oldfiles
mkdir -p $ADUMP/adump_oldfiles

#### Deleting old Alert log files and trace files################################
echo 'Deleting old alert log files and trace files'
cd $BDUMP/bdump_oldfiles
find . -name "*.trc.gz"  -mtime +5 -exec rm {} \;
find . -name "*.log.gz"  -mtime +5 -exec rm {} \;

cd $BDUMP
ls -lrt | grep ".trc" | awk '{print "mv  "$9" $BDUMP/bdump_oldfiles "}' > /tmp/mv$FSEQ.sh
sh /tmp/mv$FSEQ.sh
rm /tmp/mv$FSEQ.sh

#### Backup and Purging of Alert logfile #######################################
echo 'Backup and Purging of Alert logfile'
cd $BDUMP
cp alert_$ORACLE_SID.log $BDUMP/bdump_oldfiles/alert_$ORACLE_SID.log

cd $BDUMP
>  alert_$ORACLE_SID.log

#### Compression of old Alert log files ########################################
gzip -f $BDUMP/bdump_oldfiles/*.log

#### Deleting old user trace files #############################################
cd $UDUMP/udump_oldfiles
find $UDUMP/udump_oldfiles -name "*.trc.gz"  -mtime +5 -exec rm {} \;

cd $UDUMP
ls -lrt | grep ".trc" | awk '{print "mv  "$9" $UDUMP/udump_oldfiles "}' > /tmp/mv$FSEQ.sh
sh /tmp/mv$FSEQ.sh
rm /tmp/mv$FSEQ.sh

cd $UDUMP/udump_oldfiles
ls -lrt | grep ".trc" | grep -v ".gz" | awk '{print "gzip -f  "$9}' > /tmp/gzip$FSEQ.sh
sh /tmp/gzip$FSEQ.sh
rm /tmp/gzip$FSEQ.sh

#### Deleting old audit files ##################################################
cd $ADUMP/adump_oldfiles
find $ADUMP/adump_oldfiles -name "*.aud.gz" -mtime +5 -type f -exec rm {} \;

cd $ADUMP
ls -lrt | grep ".aud" | awk '{print "mv  "$9" $ADUMP/adump_oldfiles "}' > /tmp/mv$FSEQ.sh
sh /tmp/mv$FSEQ.sh
rm /tmp/mv$FSEQ.sh

cd $ADUMP/adump_oldfiles
ls -lrt | grep ".aud" | grep -v ".gz" | awk '{print "gzip -f  "$9}' > /tmp/gzip$FSEQ.sh
sh /tmp/gzip$FSEQ.sh
rm /tmp/gzip$FSEQ.sh

############################# END ######################################
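
On 11g the same housekeeping can also go through ADRCI, which already knows the diag directory layout. A minimal sketch, assuming the ADR home used in this environment; 7200 minutes is 5 days:

## Sketch: purge trace and core dump files older than 5 days via ADRCI
adrci exec="set home diag/rdbms/test01/TEST01; purge -age 7200 -type trace; purge -age 7200 -type cdump"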

  • This script checks database recovery area usage and sends an alert e-mail.

#####################################################################
## db_recovery_area_usage.sh ##
##
## Author : Nimai Karmakar
#####################################################################
#!/bin/bash
##
#######
####### Start of configuration
#######
######## Oracle Environment variables ##########
##
export ORACLE_BASE=/u01/app
export ORACLE_HOME=/u01/app/oracle/product/11.2.0/test01
export JAVA_HOME=/usr/java/jdk1.6.0_30/
export PATH=${ORACLE_HOME}/bin:${JAVA_HOME}/bin:$PATH:.:
export ORACLE_SID=TEST01
##
######## Other variables #################
##
DBA=nimai.karmakar@hotmail.com
DATABASE=TEST01
datevar=$(date)
datevar2=$(date '+%Y-%m-%d-%H-%M')
##
#######
####### End of configuration
#######
##
sqlplus -s "/ as sysdba" << SQL1
set feed off
set linesize 100
set pagesize 200
spool /disk1/scripts/archive.alert
set lines 100
col Location format a60
select     name "Location"
,  floor(space_limit / 1024 / 1024/1024) "Size GB"
,  ceil(space_used  / 1024 / 1024/1024) "Used GB"
from       v\$recovery_file_dest
order by name
/
spool off
exit
SQL1
if [ `cat /disk1/scripts/archive.alert|wc -l` -gt 0 ]
then
cat /disk1/scripts/archive.alert > /disk1/scripts/archive.tmp
mail -s "DB Recovery area usage for ${DATABASE} at $datevar" $DBA < /disk1/scripts/archive.tmp
rm /disk1/scripts/archive.tmp
rm /disk1/scripts/archive.alert
fi
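
As written, the query always returns a row, so the mail fires on every run. A hedged refinement for the SQL inside the heredoc, using V$RECOVERY_AREA_USAGE so the spool stays empty (and the mail stays quiet) until net FRA usage crosses a threshold; the 80% figure is my assumption:

-- Sketch: emit output only when net FRA usage exceeds 80%
select sum(percent_space_used - percent_space_reclaimable) "FRA % used"
from v\$recovery_area_usage
having sum(percent_space_used - percent_space_reclaimable) > 80;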

  • This script checks the alert log for ORA- errors and sends an alert e-mail if any are found.

#####################################################################
## check_alert_log.sh ##
##
## Author : Nimai Karmakar
#####################################################################
#!/bin/bash
##
#######
####### Start of configuration
#######
######## Oracle Environment variables ##########
##
export ORACLE_BASE=/u01/app
export ORACLE_HOME=/u01/app/oracle/product/11.2.0/test01
export JAVA_HOME=/usr/java/jdk1.6.0_30/
export PATH=${ORACLE_HOME}/bin:${JAVA_HOME}/bin:$PATH:.:
export ORACLE_SID=TEST01
##
######## Other variables #################
##
DBA=nimai.karmakar@hotmail.com
datevar=$(date)
##
#######
#######
#######
####### End of configuration
#######
#######
##
cd $ORACLE_BASE/diag/rdbms/test01/$ORACLE_SID/trace/
if [ -f alert_${ORACLE_SID}.log ]
then
tail -200 alert_${ORACLE_SID}.log > /disk1/scripts/alert_work.log
grep ORA- /disk1/scripts/alert_work.log >> /disk1/scripts/alert.err
grep Shut /disk1/scripts/alert_work.log >> /disk1/scripts/alert.err
fi

if [ `cat /disk1/scripts/alert.err|wc -l` -gt 0 ]
then
mailx -s "${ORACLE_SID} ORACLE ALERT ERRORS $datevar" $DBA < /disk1/scripts/alert.err
fi
rm -f /disk1/scripts/alert.err
rm -f /disk1/scripts/alert_work.log

exit
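
On 11g, ADRCI can query the alert log directly instead of tailing the file by hand. A minimal sketch, assuming the same ADR home as above; MESSAGE_TEXT and ORIGINATING_TIMESTAMP are standard ADRCI alert fields:

## Sketch: show ORA- messages from the last 24 hours via ADRCI
adrci exec="set home diag/rdbms/test01/TEST01; show alert -p \"message_text like '%ORA-%' and originating_timestamp > systimestamp-1\" -term"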

  • SCRIPT FOR CHECKING MOUNT POINT SPACE IN UNIX

######################################
#!/bin/ksh
#rm /rac/app/oracle/product/11.2.0/dba-scripts/monitor/logs/ERROR_LOG.txt
echo "df -k output for `date` `uname -n`" > /rac/app/oracle/product/11.2.0/dba-scripts/monitor/logs/ERROR_LOG.txt
echo " " >> /rac/app/oracle/product/11.2.0/dba-scripts/monitor/logs/ERROR_LOG.txt
echo "File system usage exceeded the threshold on `uname -n` server- `date`" >> /rac/app/oracle/product/11.2.0/dba-scripts/monitor/logs/ERROR_LOG.txt
echo " " >> /rac/app/oracle/product/11.2.0/dba-scripts/monitor/logs/ERROR_LOG.txt
i=1
while [ $i -le `df -k | grep -v proc | grep -v capacity | wc -l` ] ;do
if [ `df -k | grep -v proc | grep -v capacity | head -n $i | tail -1 | awk '{print $5}' | sed -e 's/%//'` -gt 90 ] ; then
echo "File system usage exceeded the threshold on `uname -n` server- `date`" >> /rac/app/oracle/product/11.2.0/dba-scripts/monitor/logs/ERROR_LOG.txt
echo " " >> /rac/app/oracle/product/11.2.0/dba-scripts/monitor/logs/ERROR_LOG.txt
df -k | grep -v proc | grep -v capacity | head -n $i | tail -1 >> /rac/app/oracle/product/11.2.0/dba-scripts/monitor/logs/ERROR_LOG.txt
fi
((i=i+1))

done
if [ `cat /rac/app/oracle/product/11.2.0/dba-scripts/monitor/logs/ERROR_LOG.txt | wc -l` -gt 2 ] ; then
#cat /rac/app/oracle/product/11.2.0/dba-scripts/monitor/logs/ERROR_LOG.txt | mailx -s "File system full alert" nimai.karmakar@hotmail.com
cat /rac/app/oracle/product/11.2.0/dba-scripts/monitor/logs/ERROR_LOG.txt
else
exit
fi
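
The head/tail loop above re-runs df once per line of output. A one-pass sketch with POSIX df -P and awk (same 90% threshold; the $5+0 coercion strips the % sign):

## Sketch: one pass over df output, appending any filesystem above 90% to the error log
df -kP | awk 'NR > 1 && $5+0 > 90' >> /rac/app/oracle/product/11.2.0/dba-scripts/monitor/logs/ERROR_LOG.txt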

  • Script to check the node Eviction (RAC)

#!/bin/bash
. ~/.bash_profile

mailid='nimai.karmakar@hotmail.com'
date=`date "+%Y-%m-%d %H:"`

alertlog=/rac/app/oracle/product/11.2.0/dba-scripts/logs/ioerr.log
errlog=err_`date --date="0 days ago" +_%d_%m_%y_%H_%M`.txt

err1=`grep -B 2 "$date" $alertlog | grep -A 2 "An I/O error" | wc -l`
err2=`grep -B 2 "$date" $alertlog | grep -A 2 "Network communication with node" | wc -l`

if [ $err1 -ge 1 -o $err2 -ge 1 ]
then
echo "Node eviction condition found on server `hostname`. Immediately check the DB alert log for further action" >> $errlog
echo "" >> $errlog
echo `grep -B 2 "$date" $alertlog | grep -A 2 "An I/O error"` >> $errlog
echo "" >> $errlog
echo `grep -B 2 "$date" $alertlog | grep -A 2 "Network communication with node"` >> $errlog
mutt -s "Node eviction type condition found in `hostname`" $mailid < $errlog
rm $errlog
fi
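
Because $date is truncated to the current hour, the greps only see messages logged within that hour. A crontab sketch (the script file name is my placeholder) that runs the check at minute 55, near the end of each window:

## Hypothetical crontab entry: check for eviction symptoms at minute 55 of every hour
55 * * * * /rac/app/oracle/product/11.2.0/dba-scripts/monitor/node_eviction_check.sh > /dev/null 2>&1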

That's it. Hope it's helpful.

Thanks & Regards

Nimai Karmakar
