Clean up data guideline
If your Identify's database appears to be taking up too much storage, you may want to identify tables with the largest data footprint and safely delete their obsolete data. The following sections provide a step-by-step guide to perform a successful clean-up.
Check out storage consumption of your databases
To obtain the sizes of your databases, execute the following script snippet:
EXEC SP_HELPDB 'Identify'
EXEC SP_HELPDB 'IdentifyAudit'
If you have identified that the Identify database contains excessive data, the next step is to determine which tables are consuming the most space.
To verify size of an Azure SQL database, you must first access the database and then execute the following script snippet:
EXEC sp_spaceused @oneresultset = 1
in which:
- database_size: The current size of the database in megabytes, including both data and log files.
- reserved: The total space allocated by objects in the database.
Check out storage consumption of tables of a database
To obtain the number of rows and size of all tables within the Identify database, execute the following script snippet:
SELECT OBJECT_SCHEMA_NAME(OBJECT_ID) AS [SCHEMA_NAME], OBJECT_NAME(OBJECT_ID) AS [TABLE_NAME],
ROWS = SUM(CASE
WHEN INDEX_ID < 2
THEN ROW_COUNT
ELSE 0
END),
RESERVED_MB = 8 * SUM(RESERVED_PAGE_COUNT) / 1024
FROM SYS.DM_DB_PARTITION_STATS
GROUP BY OBJECT_ID
HAVING OBJECT_SCHEMA_NAME(OBJECT_ID) NOT IN
('SYS','DBO','GUEST','INFORMATION_SCHEMA') AND OBJECT_SCHEMA_NAME(OBJECT_ID) NOT LIKE 'DB_%'
ORDER BY RESERVED_MB DESC, OBJECT_SCHEMA_NAME(OBJECT_ID)
If you have identified that the table T1 in the schema S1 contains obsolete data that can be safely deleted, it is essential to review all indices associated with the table T1 in the schema S1 and start data clean-up.
Clean-up script examples
This section presents examples for efficiently managing large tables by removing their obsolete data through clean-up scripts.
Clean up the OAuthAccessToken and AuditOAuthAccessToken tables
During the process of reviewing indices, it is essential to ensure the index existence:
- The Idx_OAuthAccessToken__ExpirationDate_RefreshTokenExpirationDate index is present on the OAuthAccessToken table within the Identify database.
SELECT SCHEMA_NAME(T.SCHEMA_ID) + '.' + T.[NAME] AS TABLE_VIEW,I.[NAME] AS INDEX_NAME,I.is_disabled,
SUBSTRING(COLUMN_NAMES, 1, LEN(COLUMN_NAMES)-1) AS [COLUMNS],
I.type_desc AS INDEX_TYPE,
CASE WHEN I.IS_UNIQUE = 1 THEN 'UNIQUE'
ELSE 'NOT UNIQUE' END AS [UNIQUE],
CASE WHEN T.[TYPE] = 'U' THEN 'TABLE'
WHEN T.[TYPE] = 'V' THEN 'VIEW'
END AS [OBJECT_TYPE]
FROM SYS.OBJECTS AS T
INNER JOIN SYS.INDEXES AS I
ON T.OBJECT_ID = I.OBJECT_ID
CROSS APPLY (SELECT COL.[NAME] + ', '
FROM SYS.INDEX_COLUMNS AS IC
INNER JOIN SYS.COLUMNS AS COL
ON IC.OBJECT_ID = COL.OBJECT_ID
AND IC.COLUMN_ID = COL.COLUMN_ID
WHERE IC.OBJECT_ID = T.OBJECT_ID
AND IC.INDEX_ID = I.INDEX_ID
ORDER BY COL.COLUMN_ID
FOR XML PATH ('') ) AS D (COLUMN_NAMES)
WHERE T.IS_MS_SHIPPED <> 1
AND SCHEMA_NAME(T.SCHEMA_ID)='#SchemaS1'
AND T.[NAME]='OAuthAccessToken'
AND I.[NAME]='Idx_OAuthAccessToken__ExpirationDate_RefreshTokenExpirationDate'
AND INDEX_ID > 0
ORDER BY INDEX_NAME
GO
If this index does not exist on this table, you can create a new one (Note: Index building can have an impact on SQL Server performance. Therefore, it should be run when you believe there aren't many active users logging in.)
IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[#SchemaS1].[OAuthAccessToken]') AND name = N'Idx_OAuthAccessToken__ExpirationDate_RefreshTokenExpirationDate')
BEGIN
CREATE NONCLUSTERED INDEX [Idx_OAuthAccessToken__ExpirationDate_RefreshTokenExpirationDate] ON [#SchemaS1].[OAuthAccessToken]
(
[ExpirationDate] ASC,
[RefreshTokenExpirationDate] ASC
)
END
Once the index check is complete, the following snippet can be executed on the Identify database to remove 5 million tokens that have expired over a month and its execution time-out is 8 hours. Ensure to modify the schema name before executing the script.
DECLARE @SQLQUERY NVARCHAR(MAX)
DECLARE @CLEANDATE DATETIME
DECLARE @SCHEMA NVARCHAR(50)
DECLARE @TABLENAME NVARCHAR(256)
DECLARE @EXECUTIONTIME NVARCHAR(50)
-- SET THE CLEANDED DATE == CURRENT DATE() - 1 MONTH
SET @CLEANDATE = CONVERT(DATETIME,DATEADD(MM,-1,GETUTCDATE()), 102);
SET @SCHEMA='IDENTIFY01'; --- FILL SCHEMA NAME. IN THIS CASE, I FILL: IDENTIFY01
SET @TABLENAME = '[' + @SCHEMA + '].[OAUTHACCESSTOKEN]'
SET @EXECUTIONTIME = 28800 --- 28800 SECONDS = 8 HOURS
PRINT '-- OAuthAccessToken - data removal process'
SET @SQLQUERY = N'
-- FOR SPEED UP PERF, DISABLE NO COUNT
SET NOCOUNT ON;
PRINT ''Start inserting 5 mil Id to temp table: '' + CONVERT(VARCHAR(50), GETUTCDATE(), 126)
DECLARE @TEMPTABLE TABLE(
IDX INT IDENTITY NOT NULL PRIMARY KEY,
ID INT);
INSERT INTO @TEMPTABLE
SELECT TOP 5000000 ID
FROM ' + @TABLENAME + '
WHERE (REFRESHTOKENEXPIRATIONDATE IS NULL AND EXPIRATIONDATE <''' + CONVERT(VARCHAR(50), @CLEANDATE, 126) + ''')
OR (REFRESHTOKENEXPIRATIONDATE <'''+ CONVERT(VARCHAR(50), @CLEANDATE, 126) +''')
ORDER BY ID;
PRINT ''End inserting 5 mil to temp table: '' + CONVERT(VARCHAR(50), GETUTCDATE(), 126);
DECLARE @starttime datetime = GETUTCDATE();
DECLARE @IDX INT = 1, @MAX INT = 0;
DECLARE @BATCHSIZE INT = 50000;
DECLARE @MINLOGSINDEX INT = 1;
DECLARE @MAXLOGSINDEX INT = 1;
SELECT @MAX = MAX(IDX) FROM @TEMPTABLE;
PRINT ''Total: '' + CAST( @Max as nvarchar) + '' rows'';
PRINT ''Start data cleanup: '' + CONVERT(VARCHAR(50), @starttime, 126)
WHILE ((@IDX <= @MAX) AND Datediff(SECOND, @starttime, GETUTCDATE()) < ' + @EXECUTIONTIME + ')
BEGIN
SELECT @MINLOGSINDEX = ID FROM @TEMPTABLE WHERE IDX = @IDX;
PRINT ''IDX at this point: '' + cast(@IDX as nvarchar);
PRINT ''MINLOGSINDEX at this point: '' + cast(@MINLOGSINDEX as nvarchar);
DELETE ' + @TABLENAME + '
FROM ' + @TABLENAME + ' OAT
INNER JOIN @TEMPTABLE T
ON T.ID = OAT.ID AND
T.IDX BETWEEN @IDX AND @IDX + @BATCHSIZE;
SET @IDX = @IDX + @BATCHSIZE;
PRINT ''Index data cleanup: '' + CAST( @IDX as nvarchar)
SELECT @MAXLOGSINDEX = max(ID) FROM @TEMPTABLE WHERE IDX <= @IDX;
PRINT ''MAXLOGSINDEX at this point: '' + cast(@MAXLOGSINDEX as nvarchar);
PRINT ''Finish data cleanup at: '' + CONVERT(VARCHAR(50), GETUTCDATE(), 126)
END
PRINT ''End data cleanup: '' + CONVERT(VARCHAR(50), GETUTCDATE(), 126)
-- GET NO COUNT BACK
SET NOCOUNT OFF;
'
EXEC SP_EXECUTESQL @SQLQUERY
GO
During the process of reviewing indices, it is essential to ensure the index existence:
- The Idx_AuditOAuthAccessToken__ExpirationDate_RefreshTokenExpirationDate index is present on the AuditOAuthAccessToken table within the IdentifyAudit database.
SELECT SCHEMA_NAME(T.SCHEMA_ID) + '.' + T.[NAME] AS TABLE_VIEW,I.[NAME] AS INDEX_NAME,I.is_disabled,
SUBSTRING(COLUMN_NAMES, 1, LEN(COLUMN_NAMES)-1) AS [COLUMNS],
I.type_desc AS INDEX_TYPE,
CASE WHEN I.IS_UNIQUE = 1 THEN 'UNIQUE'
ELSE 'NOT UNIQUE' END AS [UNIQUE],
SCHEMA_NAME(T.SCHEMA_ID) + '.' + T.[NAME] AS TABLE_VIEW,
CASE WHEN T.[TYPE] = 'U' THEN 'TABLE'
WHEN T.[TYPE] = 'V' THEN 'VIEW'
END AS [OBJECT_TYPE]
FROM SYS.OBJECTS AS T
INNER JOIN SYS.INDEXES AS I
ON T.OBJECT_ID = I.OBJECT_ID
CROSS APPLY (SELECT COL.[NAME] + ', '
FROM SYS.INDEX_COLUMNS AS IC
INNER JOIN SYS.COLUMNS AS COL
ON IC.OBJECT_ID = COL.OBJECT_ID
AND IC.COLUMN_ID = COL.COLUMN_ID
WHERE IC.OBJECT_ID = T.OBJECT_ID
AND IC.INDEX_ID = I.INDEX_ID
ORDER BY COL.COLUMN_ID
FOR XML PATH ('') ) AS D (COLUMN_NAMES)
WHERE T.IS_MS_SHIPPED <> 1
AND SCHEMA_NAME(T.SCHEMA_ID)='#SchemaS1'
AND T.[NAME]='AuditOAuthAccessToken'
AND I.[NAME]='Idx_AuditOAuthAccessToken__ExpirationDate_RefreshTokenExpirationDate'
AND INDEX_ID > 0
ORDER BY INDEX_NAME
GO
If this index does not exist on this table, you can create a new one (Note: Index building can have an impact on SQL Server performance. Therefore, it should be run when you believe there aren't many active users logging in.)
IF NOT EXISTS(SELECT * FROM sys.indexes WHERE name = 'Idx_AuditOAuthAccessToken__ExpirationDate_RefreshTokenExpirationDate' AND object_id = OBJECT_ID('[#SchemaS1].[AuditOAuthAccessToken]'))
BEGIN
CREATE NONCLUSTERED INDEX [Idx_AuditOAuthAccessToken__ExpirationDate_RefreshTokenExpirationDate] ON [#SchemaS1].[AuditOAuthAccessToken]
(
[ExpirationDate] ASC,
[RefreshTokenExpirationDate] ASC
)
END
To remove expired data from the AuditOAuthAccessToken table, modify the schema name in the following snippet, and execute it on the IdentifyAudit database. This will delete 5 million tokens that have expired over a month and its execution time-out is 8 hours.
DECLARE @SQLQUERY NVARCHAR(MAX)
DECLARE @CLEANDATE DATETIME
DECLARE @SCHEMA NVARCHAR(50)
DECLARE @TABLENAME NVARCHAR(256)
DECLARE @EVENTTABLE NVARCHAR(256)
DECLARE @EXECUTIONTIME NVARCHAR(50)
-- SET THE CLEANDED DATE == CURRENT DATE() - 1 MONTH
SET @CLEANDATE = CONVERT(DATETIME,DATEADD(MM,-1,GETUTCDATE()), 102);
SET @SCHEMA='IDENTIFY01'; --- FILL SCHEMA NAME. IN THIS CASE, I FILL: IDENTIFY01
SET @TABLENAME = '[' + @SCHEMA + '].[AUDITOAUTHACCESSTOKEN]'
SET @EVENTTABLE = '[' + @SCHEMA + '].[AUDITEVENT]'
SET @EXECUTIONTIME = 28800 --- 28800 SECONDS = 8 HOURS
PRINT '-- AuditOAuthAccessToken - data removal process'
SET @SQLQUERY = N'
-- FOR SPEED UP PERF, DISABLE NO COUNT
SET NOCOUNT ON;
PRINT ''Start inserting 5 mil Id to temp table: '' + CONVERT(VARCHAR(50), GETUTCDATE(), 126)
DECLARE @TEMPTABLE TABLE(
IDX INT IDENTITY NOT NULL PRIMARY KEY,
ID INT,
AUDITEVENTID INT);
INSERT INTO @TEMPTABLE
SELECT TOP 5000000 ID, AUDITEVENTID
FROM ' + @TABLENAME + '
WHERE (REFRESHTOKENEXPIRATIONDATE IS NULL AND EXPIRATIONDATE <''' + CONVERT(VARCHAR(50), @CLEANDATE, 126) + ''')
OR (REFRESHTOKENEXPIRATIONDATE <'''+ CONVERT(VARCHAR(50), @CLEANDATE, 126) +''')
ORDER BY ID;
SELECT * FROM @TEMPTABLE
PRINT ''End inserting 5 mil to temp table: '' + CONVERT(VARCHAR(50), GETUTCDATE(), 126);
DECLARE @starttime datetime = GETUTCDATE();
DECLARE @IDX INT = 1, @MAX INT = 0;
DECLARE @BATCHSIZE INT = 50000;
DECLARE @MINLOGSINDEX INT = 1;
DECLARE @MAXLOGSINDEX INT = 1;
SELECT @MAX = MAX(IDX) FROM @TEMPTABLE;
PRINT ''Total: '' + CAST( @Max as nvarchar) + '' rows'';
PRINT ''Start AuditOAuth2AccessToken data cleanup: '' + CONVERT(VARCHAR(50), @starttime, 126)
WHILE ((@IDX <= @MAX) AND Datediff(SECOND, @starttime, GETUTCDATE()) < ' + @EXECUTIONTIME + ')
BEGIN
SELECT @MINLOGSINDEX = ID FROM @TEMPTABLE WHERE IDX = @IDX;
PRINT ''IDX at this point: '' + cast(@IDX as nvarchar);
PRINT ''MINLOGSINDEX at this point: '' + cast(@MINLOGSINDEX as nvarchar);
DELETE ' + @TABLENAME + '
FROM ' + @TABLENAME + ' OAT
INNER JOIN @TEMPTABLE T
ON T.ID = OAT.ID AND
T.IDX BETWEEN @IDX AND @IDX + @BATCHSIZE;
PRINT ''Finish AuditOAuth2AccessToken data cleanup at: '' + CONVERT(VARCHAR(50), GETUTCDATE(), 126)
DELETE ' + @EVENTTABLE + '
FROM ' + @EVENTTABLE + ' AE
INNER JOIN @TEMPTABLE T
ON T.AUDITEVENTID = AE.ID AND
T.IDX BETWEEN @IDX AND @IDX + @BATCHSIZE;
PRINT ''Finish AuditEvent data cleanup at: '' + CONVERT(VARCHAR(50), GETUTCDATE(), 126)
SET @IDX = @IDX + @BATCHSIZE;
PRINT ''Index data cleanup: '' + CAST( @IDX as nvarchar)
SELECT @MAXLOGSINDEX = max(ID) FROM @TEMPTABLE WHERE IDX <= @IDX;
PRINT ''MAXLOGSINDEX at this point: '' + cast(@MAXLOGSINDEX as nvarchar);
END
PRINT ''End AuditOAuth2AccessToken data cleanup: '' + CONVERT(VARCHAR(50), GETUTCDATE(), 126)
-- GET NO COUNT BACK
SET NOCOUNT OFF;
'
EXEC SP_EXECUTESQL @SQLQUERY
GO
Clean up the Logs table
During the process of reviewing indices, it is essential to ensure the index existence:
- The IXD_Logs__TimeStamp__Type__LogLevel index is present on the Logs table of the schema S1 within the IdentifyAudit database.
SELECT SCHEMA_NAME(T.SCHEMA_ID) + '.' + T.[NAME] AS TABLE_VIEW,I.[NAME] AS INDEX_NAME,I.is_disabled,
SUBSTRING(COLUMN_NAMES, 1, LEN(COLUMN_NAMES)-1) AS [COLUMNS],
I.type_desc AS INDEX_TYPE,
CASE WHEN I.IS_UNIQUE = 1 THEN 'UNIQUE'
ELSE 'NOT UNIQUE' END AS [UNIQUE],
SCHEMA_NAME(T.SCHEMA_ID) + '.' + T.[NAME] AS TABLE_VIEW,
CASE WHEN T.[TYPE] = 'U' THEN 'TABLE'
WHEN T.[TYPE] = 'V' THEN 'VIEW'
END AS [OBJECT_TYPE]
FROM SYS.OBJECTS AS T
INNER JOIN SYS.INDEXES AS I
ON T.OBJECT_ID = I.OBJECT_ID
CROSS APPLY (SELECT COL.[NAME] + ', '
FROM SYS.INDEX_COLUMNS AS IC
INNER JOIN SYS.COLUMNS AS COL
ON IC.OBJECT_ID = COL.OBJECT_ID
AND IC.COLUMN_ID = COL.COLUMN_ID
WHERE IC.OBJECT_ID = T.OBJECT_ID
AND IC.INDEX_ID = I.INDEX_ID
ORDER BY COL.COLUMN_ID
FOR XML PATH ('') ) AS D (COLUMN_NAMES)
WHERE T.IS_MS_SHIPPED <> 1
AND SCHEMA_NAME(T.SCHEMA_ID)='#SchemaS1'
AND T.[NAME]='LOGS'
AND I.[NAME]='IXD_Logs__TimeStamp__Type__LogLevel'
AND INDEX_ID > 0
ORDER BY INDEX_NAME
GO
If this index does not exist on this table, you can create a new one (Note: Index building can have an impact on SQL Server performance. Therefore, it should be run when you believe there aren't many active users logging in.)
IF NOT EXISTS ( SELECT * FROM sys.indexes WHERE name='IXD_Logs__TimeStamp__Type__LogLevel' AND object_id = OBJECT_ID(N'[#SchemaS1].[Logs]'))
BEGIN
CREATE NONCLUSTERED INDEX IXD_Logs__TimeStamp__Type__LogLevel ON [#SchemaS1].[LOGS] ([TimeStamp])
INCLUDE ([Type], [LogLevel])
END
Once the index check is complete, the following snippet can be executed on the IdentifyAudit database to to delete 5 million log records that are older than 7 months and its execution time-out is 8 hours. Ensure to modify the schema name before executing the script.
DECLARE @SQLQUERY NVARCHAR(MAX)
DECLARE @CLEANDATE DATETIME
DECLARE @SCHEMA NVARCHAR(50)
DECLARE @TABLENAME NVARCHAR(256)
DECLARE @EXECUTIONTIME NVARCHAR(50)
-- SET THE CLEANDED DATE == CURRENT DATE() - 7 MONTH
SET @CLEANDATE = CONVERT(DATETIME,DATEADD(MM,-7,GETUTCDATE()), 102);
SET @SCHEMA='IDENTIFY01'; --- FILL SCHEMA NAME. IN THIS CASE, I FILL: IDENTIFY01
SET @TABLENAME = '[' + @SCHEMA + '].[LOGS]'
SET @EXECUTIONTIME = 28800 --- 28800 SECONDS = 8 HOURS
PRINT '-- Logs data removal process'
SET @SQLQUERY = N'
-- FOR SPEED UP PERF, DISABLE NO COUNT
SET NOCOUNT ON;
PRINT ''Start inserting 5 mil Id to temp table: '' + CONVERT(VARCHAR(50), GETUTCDATE(), 126)
DECLARE @TEMPTABLE TABLE(
IDX INT IDENTITY NOT NULL PRIMARY KEY,
ID INT);
INSERT INTO @TEMPTABLE
SELECT TOP 5000000 ID
FROM ' + @TABLENAME + '
WHERE ([timestamp] <'''+ CONVERT(VARCHAR(50), @CLEANDATE, 126) +''')
ORDER BY ID;
PRINT ''End inserting 5 mil to temp table: '' + CONVERT(VARCHAR(50), GETUTCDATE(), 126);
DECLARE @starttime datetime = GETUTCDATE();
DECLARE @IDX INT = 1, @MAX INT = 0;
DECLARE @BATCHSIZE INT = 50000;
DECLARE @MINLOGSINDEX INT = 1;
DECLARE @MAXLOGSINDEX INT = 1;
SELECT @MAX = MAX(IDX) FROM @TEMPTABLE;
PRINT ''Total: '' + CAST( @Max as nvarchar) + '' rows'';
PRINT ''Start data cleanup: '' + CONVERT(VARCHAR(50), @starttime, 126)
WHILE ((@IDX <= @MAX) AND Datediff(SECOND, @starttime, GETUTCDATE()) < ' + @EXECUTIONTIME + ')
BEGIN
SELECT @MINLOGSINDEX = ID FROM @TEMPTABLE WHERE IDX = @IDX;
PRINT ''IDX at this point: '' + cast(@IDX as nvarchar);
PRINT ''MINLOGSINDEX at this point: '' + cast(@MINLOGSINDEX as nvarchar);
SET @IDX = @IDX + @BATCHSIZE;
SELECT @MAXLOGSINDEX = max(ID) FROM @TEMPTABLE WHERE IDX <= @IDX;
PRINT ''Index data cleanup: '' + CAST( @IDX as nvarchar)
PRINT ''MAXLOGSINDEX at this point: '' + cast(@MAXLOGSINDEX as nvarchar);
DELETE FROM ' + @TABLENAME + '
WHERE ID >= @MINLOGSINDEX AND ID <= @MAXLOGSINDEX
PRINT ''Finish data cleanup at: '' + CONVERT(VARCHAR(50), GETUTCDATE(), 126)
END
PRINT ''End data cleanup: '' + CONVERT(VARCHAR(50), GETUTCDATE(), 126)
-- GET NO COUNT BACK
SET NOCOUNT OFF;
'
EXEC SP_EXECUTESQL @SQLQUERY
GO