Tuesday, 21 January 2020

ASM rebalancing time to complete

The disk group rebalance operation has three phases:
  • Planning
  • File extents relocation
  • Disk compacting
As far as the overall time to complete is concerned, the planning phase is insignificant, so there is no need to worry about it. The file extents relocation phase will take most of the time, so the main focus will be on that. The disk compacting phase may also take a significant amount of time, in particular on a disk add, so we will have a closer look at that as well.
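Note that on Grid Infrastructure 12.1 and later, V$ASM_OPERATION also exposes a PASS column that shows directly which phase is running (RESYNC, REBALANCE or COMPACT), so the phase does not have to be inferred; a minimal check:

SQL> select INST_ID, OPERATION, PASS, STATE, EST_MINUTES from GV$ASM_OPERATION;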

It is important to understand why the rebalance is running. If we are adding a new disk, say to increase the available disk group space, it doesn't really matter how long it will take for the rebalance to complete. Similarly if we are resizing or dropping disk(s), to adjust the disk group space, we are generally not concerned with the time it takes for the rebalance to complete.

But if a disk has failed and ASM has initiated the rebalance, there may be a legitimate reason for concern. If the disk group is normal redundancy, and another disk that is a partner of the already failed disk also fails, the disk group will be dismounted, all the databases that use that disk group will crash, and there may be loss of data. In such cases it may be important to have an idea when the rebalance operation will complete. Actually, we want to see the file extents relocation phase complete, as once it does, all the data is fully redundant again (in the case where the rebalance was initiated by a disk failure).
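Disk partnerships can be inspected through the undocumented X$KFDPARTNER structure, to get an idea of the exposure. A sketch, assuming disk group number 1 (X$ structures are undocumented and version specific, so treat this as illustrative only):

SQL> select DISK, count(NUMBER_KFDPART) PARTNER_COUNT from X$KFDPARTNER where GRP=1 group by DISK order by DISK;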

File extents relocation

To have a closer look at the file extents relocation phase, I drop one of the disks with the default rebalance power. I then query GV$ASM_OPERATION to check the estimated completion time (EST_MINUTES):
SQL> show parameter power

NAME                                 TYPE        VALUE
------------------------------------ ----------- ----------------
asm_power_limit                      integer     1

SQL> set time on
16:40:57 SQL> alter diskgroup DATA1 drop disk DATA1_CD_06_CELL06;

Diskgroup altered.

Initial estimated time to complete is 121 minutes:

SQL> select INST_ID, OPERATION, STATE, POWER, SOFAR, EST_WORK, EST_RATE, EST_MINUTES from GV$ASM_OPERATION ;

   INST_ID OPERA STAT      POWER      SOFAR   EST_WORK   EST_RATE EST_MINUTES
---------- ----- ---- ---------- ---------- ---------- ---------- -----------
         1 REBAL DONE         11          0          0          0           0
         1 REBAL RUN          11    1846647    2336971       4043         121
         1 REBAL WAIT         11          0          0          0           0

About 10 minutes into the rebalance, the estimate to complete is 69 minutes:
16:50:25 SQL> /

   INST_ID OPERA STAT      POWER      SOFAR   EST_WORK   EST_RATE EST_MINUTES
---------- ----- ---- ---------- ---------- ---------- ---------- -----------
         1 REBAL DONE         11          0          0          0           0
         1 REBAL RUN          11    1851014    2336971       7030          69
         1 REBAL WAIT         11          0          0          0           0


While that EST_MINUTES value doesn't give me much confidence, I see that SOFAR (the number of allocation units moved so far) is going up, which is a good sign.
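Those columns can also be combined into a rough progress figure; a simple sketch (EST_RATE is in allocation units per minute, so the second expression is just a recomputed EST_MINUTES):

SQL> select INST_ID, round(100*SOFAR/EST_WORK) PCT_DONE, round((EST_WORK-SOFAR)/EST_RATE) MIN_LEFT from GV$ASM_OPERATION where STATE='RUN' and EST_RATE > 0;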

The ASM alert log shows the time of the disk drop, the OS process ID of the ARB0 process doing all the work and, most importantly, that there are no errors:
Tue Jan 21 09:19:38 2020
SUCCESS: ALTER DISKGROUP DATA DROP DISK
DATA_0003,
DATA_0001,
DATA_0006,
DATA_0005,
DATA_00018,
DATA_00019,
DATA_00020,
DATA_00021,
DATA_00022,
DATA_00023
REBALANCE POWER 11
Starting background process ARB0
Tue Jan 21 09:19:38 2020
ARB0 started with pid=22, OS id=2841 
NOTE: assigning ARB0 to group 1/0x39c42279 (DATA) with 11 parallel I/Os
NOTE: Attempting voting file refresh on diskgroup DATA
Tue Jan 21 09:19:44 2020
NOTE: F1X0 on disk 3 (fmt 2) relocated at fcn 0.95799708: AU 143462 -> AU 0
NOTE: F1X0 on disk 21 (fmt 2) relocated at fcn 0.95799708: AU 143458 -> AU 0
NOTE: F1X0 on disk 22 (fmt 2) relocated at fcn 0.95799708: AU 153322 -> AU 0
NOTE: header on disk 26 advanced to format #2 using fcn 0.0
...

The ARB0 trace file should show which file extents are being relocated. It does, and that is how I know that ARB0 is doing what it is supposed to do:
-bash-4.1$ tail -f /d01/app/grid/diag/asm/+asm/+ASM/trace/+ASM_arb0_2841.trc 
ARB0 relocating file +DATA.454.1015698267 (120 entries)

*** 2020-01-21 14:57:49.438
ARB0 relocating file +DATA.454.1015698267 (120 entries)

*** 2020-01-21 14:57:53.686
ARB0 relocating file +DATA.454.1015698267 (120 entries)

*** 2020-01-21 14:57:57.496
ARB0 relocating file +DATA.454.1015698267 (120 entries)

*** 2020-01-21 14:58:01.679
ARB0 relocating file +DATA.454.1015698267 (120 entries)

Note that there may be a lot of ARB0 trace files in the trace directory, which is why we need to know the OS process ID of the ARB0 process actually doing the rebalance. That information is in the alert log of the ASM instance performing the rebalance.
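The ARB0 OS process ID can also be looked up from the ASM instance itself; a sketch using V$PROCESS (the PNAME column is available in recent releases):

SQL> select SPID, PNAME from V$PROCESS where PNAME like 'ARB%';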

After about 35 minutes the EST_MINUTES drops to 0:
17:16:54 SQL> /

   INST_ID OPERA STAT      POWER      SOFAR   EST_WORK   EST_RATE EST_MINUTES
---------- ----- ---- ---------- ---------- ---------- ---------- -----------
         2 REBAL RUN           1      74581      75825       2129           0
         3 REBAL WAIT          1
         4 REBAL WAIT          1

And soon after that, the ASM alert log shows:
  • Disk emptied
  • Disk header erased
  • PST update completed successfully
  • Disk closed
  • Rebalance completed

The initial estimate was 121 minutes, while the rebalance actually took about 36 minutes (in this particular case the disk compacting took less than a minute, so I have ignored it). That is why it is more important to understand what is going on than to know exactly when the rebalance will complete.

Note that the estimated time may also increase. If the system is under heavy load, the rebalance will take longer, especially with rebalance power 1. For a large disk group (many TB) with a large number of files, the rebalance can take hours, possibly days.

If you want to get an idea how long a disk drop will take in your environment, you need to test it. Just drop one of the disks while your system is under a normal/typical load, as sketched below. Your data is fully redundant during such a disk drop, so you are not exposed to a disk group dismount if the partner disk fails during the rebalance.
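A sketch of such a test; the disk name here is hypothetical. Note that a drop that is still being rebalanced can be cancelled with UNDROP DISKS (that does not work once the drop has completed, nor for DROP DISK FORCE):

SQL> alter diskgroup DATA1 drop disk DATA1_CD_05_CELL05;
SQL> select OPERATION, STATE, SOFAR, EST_WORK, EST_MINUTES from V$ASM_OPERATION;

-- to cancel the still-pending drop, if needed:
SQL> alter diskgroup DATA1 undrop disks;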

Disk compacting

To look at the disk compacting phase, I add the same disk back, with rebalance power 10:
17:26:48 SQL> alter diskgroup DATA1 add disk '/o/*/DATA1_CD_06_cell06' rebalance power 10;

Diskgroup altered.

Initial estimated time to complete is 6 minutes:
17:27:22 SQL> select INST_ID, OPERATION, STATE, POWER, SOFAR, EST_WORK, EST_RATE, EST_MINUTES from GV$ASM_OPERATION where GROUP_NUMBER=1;

   INST_ID OPERA STAT      POWER      SOFAR   EST_WORK   EST_RATE EST_MINUTES
---------- ----- ---- ---------- ---------- ---------- ---------- -----------
         2 REBAL RUN          10        489      53851       7920           6
         3 REBAL WAIT         10
         4 REBAL WAIT         10

After about 10 minutes, the EST_MINUTES drops to 0:
17:39:05 SQL> /

   INST_ID OPERA STAT      POWER      SOFAR   EST_WORK   EST_RATE EST_MINUTES
---------- ----- ---- ---------- ---------- ---------- ---------- -----------
         3 REBAL WAIT         10
         2 REBAL RUN          10      92407      97874       8716           0
         4 REBAL WAIT         10

And I see the following in the ASM alert log:
Wed Jul 11 17:39:49 2012
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=1
Wed Jul 11 17:39:58 2012
GMON updating for reconfiguration, group 1 at 31 for pid 43, osid 115117
NOTE: group 1 PST updated.
Wed Jul 11 17:39:58 2012
NOTE: membership refresh pending for group 1/0x6ecaf3e6 (DATA1)
GMON querying group 1 at 32 for pid 19, osid 38421
SUCCESS: refreshed membership for 1/0x6ecaf3e6 (DATA1)
NOTE: Attempting voting file refresh on diskgroup DATA1

That means that ASM has completed the file extents relocation phase of the rebalance and has started the disk compacting phase. If that is true, we should see the kfdCompact() function on the stack. And we do:
# pstack 103326
#0  0x0000003957ccb6ef in poll () from /lib64/libc.so.6
...
#9  0x0000000003d711e0 in kfk_reap_oss_async_io ()
#10 0x0000000003d70c17 in kfk_reap_ios_from_subsys ()
#11 0x0000000000aea50e in kfk_reap_ios ()
#12 0x0000000003d702ae in kfk_io1 ()
#13 0x0000000003d6fe54 in kfkRequest ()
#14 0x0000000003d76540 in kfk_transitIO ()
#15 0x0000000003cd482b in kffRelocateWait ()
#16 0x0000000003cfa190 in kffRelocate ()
#17 0x0000000003c7ba16 in kfdaExecute ()
#18 0x0000000003c4b737 in kfdCompact ()
#19 0x0000000003c4c6d0 in kfdExecute ()
#20 0x0000000003d4bf0e in kfgbRebalExecute ()
#21 0x0000000003d39627 in kfgbDriver ()
#22 0x00000000020e8d23 in ksbabs ()
#23 0x0000000003d4faae in kfgbRun ()
#24 0x00000000020ed95d in ksbrdp ()
#25 0x0000000002322343 in opirip ()
#26 0x0000000001618571 in opidrv ()
#27 0x0000000001c13be7 in sou2o ()
#28 0x000000000083ceba in opimai_real ()
#29 0x0000000001c19b58 in ssthrdmain ()
#30 0x000000000083cda1 in main ()

The tail of the current ARB0 trace file now shows that it is relocating just one allocation unit at a time ("1 entries"), another sign of the disk compacting phase:
$ tail -f /+ASM2_arb0_103326.trc
ARB0 relocating file +DATA1.321.788357323 (1 entries)
ARB0 relocating file +DATA1.321.788357323 (1 entries)
ARB0 relocating file +DATA1.321.788357323 (1 entries)
...

V$ASM_OPERATION keeps showing EST_MINUTES=0 for the whole duration of the disk compacting phase (while not helpful, this is normal and expected):
17:42:39 SQL> /

   INST_ID OPERA STAT      POWER      SOFAR   EST_WORK   EST_RATE EST_MINUTES
---------- ----- ---- ---------- ---------- ---------- ---------- -----------
         3 REBAL WAIT         10
         4 REBAL WAIT         10
         2 REBAL RUN          10      98271      98305       7919           0

The X$KFGMG view shows REBALST_KFGMG=2 (yet another confirmation of the disk compacting phase):
17:42:50 SQL> select NUMBER_KFGMG, OP_KFGMG, ACTUAL_KFGMG, REBALST_KFGMG from X$KFGMG;

NUMBER_KFGMG   OP_KFGMG ACTUAL_KFGMG REBALST_KFGMG
------------ ---------- ------------ -------------
           1          1           10             2

Once the compacting phase completes, the alert log shows "stopping process ARB0" and "rebalance completed":
Wed Jul 11 17:43:48 2012
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 1/0x6ecaf3e6 (DATA1)

In this case, the file extents relocation phase took about 12 minutes and the disk compacting phase took about 4 minutes.

The compacting phase can actually take a significant amount of time. In one case I have seen the file extents relocation run for 60 minutes, with the disk compacting after that taking another 30 minutes. But it doesn't really matter how long the compacting takes, because as soon as the file extents relocation completes, all data is fully redundant and we are not exposed to a disk group dismount due to a partner disk failure.

Adjusting the rebalance power


The rebalance power can be adjusted dynamically, i.e. during the rebalance. If the rebalance with the default power is 'too slow', the power can be increased. By how much? To answer that question, we need to understand the I/O load, the I/O throughput and, most importantly, the I/O limits the system can take. If we don't know that, the power can be increased to 5 (with 'ALTER DISKGROUP ... REBALANCE POWER 5;') and we can then check if that makes a difference. Should we go any higher with the rebalance power? As long as we are not adversely impacting the database I/O performance, we can keep increasing the power, but I haven't seen much improvement beyond power 30. Note that the power can go up to 11 for disk groups with COMPATIBLE.ASM lower than 11.2.0.2, and up to 1024 for disk groups with COMPATIBLE.ASM set to 11.2.0.2 or higher.
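For example, to bump the power of an ongoing rebalance and confirm the new value took effect (using the DATA1 disk group from above):

SQL> alter diskgroup DATA1 rebalance power 5;

Diskgroup altered.

SQL> select OPERATION, STATE, POWER, EST_MINUTES from V$ASM_OPERATION;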

Testing is the key here. We really need to test the rebalance under the regular production load, with different values for the power. There is no point testing with no databases running, or on a system with different storage characteristics.

Tuesday, 7 January 2020

ORA-01031 insufficient privileges with Oracle APEX after a 12cR2 upgrade

Error: the following error appears after upgrading the database from Oracle 12c R1 to Oracle 12c R2:

ORA-01031: insufficient privileges

Solution

SQL> show parameter sql

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_capture_sql_plan_baselines boolean     FALSE
optimizer_use_sql_plan_baselines     boolean     TRUE
plsql_ccflags                        string
plsql_code_type                      string      INTERPRETED
plsql_debug                          boolean     FALSE
plsql_optimize_level                 integer     2
plsql_v2_compatibility               boolean     FALSE
plsql_warnings                       string      DISABLE:ALL
sql92_security                       boolean     TRUE
sql_trace                            boolean     FALSE
sqltune_category                     string      DEFAULT
SQL>


SQL> alter system set sql92_security=FALSE scope=spfile;

System altered.

SQL>

If the SQL92_SECURITY initialization parameter is set to TRUE and the DELETE operation references table columns, such as the columns in a where_clause, then you must also have the SELECT object privilege on the object from which you want to delete rows.

https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/DELETE.html#GUID-156845A5-B626-412B-9F95-8869B988ABD7

So if you only grant DELETE, a plain delete with no WHERE clause will work. But the user also needs SELECT to use a WHERE clause (assuming SQL92_SECURITY is set to TRUE, which is the default in 12.2 and later).
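A minimal demonstration of that behavior, with a hypothetical user DEMO_USR and the classic SCOTT.EMP table standing in for the application objects:

SQL> create user demo_usr identified by demo_pwd;
SQL> grant create session to demo_usr;
SQL> grant delete on scott.emp to demo_usr;

SQL> connect demo_usr/demo_pwd
SQL> delete from scott.emp;
-- succeeds: the statement references no table columns

SQL> delete from scott.emp where empno = 7369;
-- fails with ORA-01031 while SQL92_SECURITY=TRUE, because the WHERE
-- clause references a column and DEMO_USR has no SELECT privilege

Granting SELECT on SCOTT.EMP to DEMO_USR (or setting SQL92_SECURITY=FALSE, as above) makes the second statement work.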


Monday, 6 January 2020

Oracle 12c SEC_CASE_SENSITIVE_LOGON and ORA-01017 in 12c and Later Releases


Error: Oracle 12c SEC_CASE_SENSITIVE_LOGON and ORA-01017

Cause:

This issue is caused by the default setting for the allowed logon version in the 12c database.
Note that the SQLNET.ALLOWED_LOGON_VERSION parameter has been deprecated in 12c.
That parameter has been replaced by these:
SQLNET.ALLOWED_LOGON_VERSION_SERVER=n
SQLNET.ALLOWED_LOGON_VERSION_CLIENT=n
Version 12.1: The default setting for the new parameters is 11. Any client that attempts to connect must
be at version 11 or higher, unless these parameters are explicitly set in the server-side sqlnet.ora file.
Version 12.2 note:
The default for the SQLNET.ALLOWED_LOGON_VERSION_SERVER setting has changed in 12.2 from 11 to 12.
See:  https://docs.oracle.com/database/122/DBSEG/configuring-authentication.htm#DBSEG33223
Important note for 12.2: if your client is not at least 11.2.0.3, or does not include the CPUOCT2012 patch, you will not
be able to use the 12 setting.
Typically, the sqlnet.ora file that would be referenced by the database is located in RDBMS_HOME/network/admin.

Solution

The init.ora/spfile parameter SEC_CASE_SENSITIVE_LOGON has been deprecated since Oracle Database 12.1.0.1.
This means it receives no further development; you shouldn't change it from its default of TRUE, and if you still do, you'll receive a warning during STARTUP of your database:
SQL> alter system set sec_case_sensitive_logon=false scope=spfile;
System altered.
SQL> startup force
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORACLE instance started.
Recently a user asked me if we'd changed the behavior of this parameter in Oracle Database 12c Release 2, as he now receives an ORA-01017: invalid username or password error with SEC_CASE_SENSITIVE_LOGON=FALSE for every user except SYSDBA connections.
In which situations may you receive an ORA-01017?
This is outlined in the Oracle 12.1 documentation already:
•Ensure that the SEC_CASE_SENSITIVE_LOGON parameter is not set to FALSE if the SQLNET.ALLOWED_LOGON_VERSION_SERVER parameter is set to 12 or 12a.
This is because the more secure password versions used for this mode only support case-sensitive password checking.
For compatibility reasons, Oracle Database does not prevent the use of FALSE for SEC_CASE_SENSITIVE_LOGON when SQLNET.ALLOWED_LOGON_VERSION_SERVER is set to 12 or 12a.
However, setting SEC_CASE_SENSITIVE_LOGON to FALSE when SQLNET.ALLOWED_LOGON_VERSION_SERVER is set to 12 or 12a causes all accounts to become inaccessible (a quick way to check which accounts would be affected is shown below).
The key is the sqlnet.ora parameter SQLNET.ALLOWED_LOGON_VERSION_SERVER and here’s the difference between Oracle Database 12.1 and Oracle Database 12.2:
•Oracle Database 12.1: SQLNET.ALLOWED_LOGON_VERSION_SERVER defaults to 11 out of the box
•Oracle Database 12.2: SQLNET.ALLOWED_LOGON_VERSION_SERVER defaults to 12 out of the box
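Before setting SEC_CASE_SENSITIVE_LOGON=FALSE on 12.2, it is worth checking which accounts would be affected. Case-insensitive logons rely on the 10G password verifier, so a quick check is:

SQL> select USERNAME, PASSWORD_VERSIONS from DBA_USERS where PASSWORD_VERSIONS not like '%10G%';

Any account returned by this query will need its password reset before it can log on case-insensitively (as shown at the end of this post).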
Behavior difference Oracle 12.1 vs Oracle 12.2
See this simple example after switching SEC_CASE_SENSITIVE_LOGON=FALSE in both databases (as shown above):
•Oracle Database 12.1.0.2:
SQL> alter user system identified by oracle;
User altered.
SQL> connect system/oracle
Connected.
•Oracle Database 12.2.0.1:
SQL> alter user system identified by oracle;
User altered.
SQL> connect system/oracle
ERROR:
ORA-01017: invalid username/password; logon denied
Warning: You are no longer connected to ORACLE
How to work around the ORA-01017?
First of all you need to edit your sqlnet.ora, adding (or lowering) the parameter SQLNET.ALLOWED_LOGON_VERSION_SERVER to a value below 12.
But if you try to connect directly after restarting your listener, you will receive the same ORA-01017 again.
The secret is mentioned in the above documentation link as well: you will have to reset the user's password if you need the logon process to work as it did before Oracle Database 12.2.

-bash-4.4$ cd $ORACLE_HOME/network/admin/
-bash-4.4$ cat sqlnet.ora
# sqlnet.ora Network Configuration File: /oraclehome/app/oracle/product/12.2.0.1/db_1/network/admin/sqlnet.ora
# Generated by Oracle configuration tools.

NAMES.DIRECTORY_PATH= (TNSNAMES, ONAMES, HOSTNAME)
SQLNET.ALLOWED_LOGON_VERSION_SERVER=11

-bash-4.4$



Check in DBA_USERS:

SQL> select username, password_versions
from DBA_USERS
where username='SYSTEM';

USERNAME      PASSWORD_VERSIONS
------------- ---------------------
SYSTEM        11G 12C

There's no "10G" mentioned. This will prevent the connection.
Solution: you will have to set the password again, i.e. ALTER the user(s):

SQL> alter user SYSTEM identified by said;
User altered.

SQL> select username, password_versions
from DBA_USERS
where username='SYSTEM';

USERNAME       PASSWORD_VERSIONS
-------------- ----------------------
SYSTEM         10G 11G 12C
SQL> connect system/said

Connected.