Home » SQL & PL/SQL » SQL & PL/SQL » Design Question (12c )
Design Question [message #675863] |
Fri, 26 April 2019 10:25 |
ora9a
Messages: 42 Registered: June 2010
|
Member |
|
|
Hi Guys,
I have a requirement to send an extract of data (in xml format) to a 3rd party every day. After the initial extract with all data, subsequent extracts should only be for any data that has changed (from yesterday's extract).
I am trying to decide what is the best way to get only changed data. I will be using a pretty simple query that uses 3 tables for the data.
e.g
SELECT forename, surname, address, course_name
FROM student s, address a, courses c
WHERE s.id = a.id
AND s.id = c.id
So I only want it to return records where forename, surname, address or course_name may have changed from the previous day.
Does anyone have an efficient way of doing this? I was thinking of doing something like>
Above query populates a table each day, then the next day's query compares against the table for any changed data.
Thanks
|
|
|
|
|
Re: Design Question [message #675866 is a reply to message #675864] |
Fri, 26 April 2019 10:52 |
ora9a
Messages: 42 Registered: June 2010
|
Member |
|
|
Initial will be about 20,000 records. After that daily changes will be few hundred records. But at certain times in year when students enrol they will all need to be sent in the extract, which may be about 10,000
Minus - Do you mean using my approach of creating a table and then querying that using Minus?
Thanks
|
|
|
|
|
Re: Design Question [message #675869 is a reply to message #675868] |
Fri, 26 April 2019 11:17 |
ora9a
Messages: 42 Registered: June 2010
|
Member |
|
|
> There will be no deletes
Sorry I meant from the student, address, course tables. These student records are never deleted from the db.
I will create some test data.
Thanks
|
|
|
|
|
Re: Design Question [message #675872 is a reply to message #675870] |
Fri, 26 April 2019 12:10 |
ora9a
Messages: 42 Registered: June 2010
|
Member |
|
|
CREATE TABLE student (ID NUMBER, FORENAME VARCHAR2(50), SURNAME VARCHAR2(50));
CREATE TABLE address (ID NUMBER, LINE1 VARCHAR2(50), CITY VARCHAR2(50), ZIPCODE VARCHAR2(20), STU_ID NUMBER);
INSERT INTO student VALUES (1, 'Joe','Bloggs');
INSERT INTO student VALUES (2, 'Adam','Smith');
INSERT INTO address VALUES (1,'15 New Avenue','London','SE1XAD',1);
INSERT INTO address VALUES (2,'25 Bond Street','London','SWBAJ',2);
SELECT stu.id AS Student_id, stu.forename, stu.surname, adr.line1,adr.city
FROM student stu, address adr
WHERE stu.id = adr.stu_id;
So on Day 1 this would return 2 rows.
On Day 2, a student's address is updated:
UPDATE address SET LINE1 = '80 Harvey Drive', CITY = 'Leeds', ZIPCODE = 'L1BXU'
WHERE stu_id = 1;
When the query is executed on day 2, it should only return this student that has been updated, and not both records as in the first day.
Hope that makes sense!
Thanks.
(And the aliases etc. are only for brevity and not according to standards).
|
|
|
|
Re: Design Question [message #675915 is a reply to message #675873] |
Mon, 29 April 2019 05:37 |
ora9a
Messages: 42 Registered: June 2010
|
Member |
|
|
Michel,
I am assuming I will have to keep updating the staging table everyday. e.g.
CREATE TABLE Staging AS (
SELECT stu.id AS Student_id, stu.forename, stu.surname, adr.line1,adr.city
FROM student stu, address adr
WHERE stu.id = adr.stu_id);
DECLARE
CURSOR Cur1 IS
SELECT stu.id AS Student_id, stu.forename as forename, stu.surname as surname, adr.line1 as line1,adr.city as city
FROM student stu, address adr
WHERE stu.id = adr.stu_id
MINUS
SELECT Student_id,forename,surname,line1,city
FROM Staging;
BEGIN
FOR idx IN cur1 LOOP
UPDATE Staging
SET forename = idx.forename,
surname = idx.surname,
line1 = idx.line1,
city = idx.city
WHERE student_id = idx.student_id;
END LOOP;
END;
|
|
|
Re: Design Question [message #675917 is a reply to message #675915] |
Mon, 29 April 2019 07:49 |
cookiemonster
Messages: 13925 Registered: September 2008 Location: Rainy Manchester
|
Senior Member |
|
|
2 problems:
1) you aren't accounting for new rows - you'll need to do an insert as well. You could replace both update and insert with a single merge.
2) staging will include rows that haven't changed. You could get round this by adding a date column and setting it to trunc(sysdate) whenever a row is inserted or updated.
Then the process that actually creates the xml can just look at rows where date column = trunc(sysdate).
|
|
|
|
Re: Design Question [message #675922 is a reply to message #675920] |
Mon, 29 April 2019 11:40 |
Bill B
Messages: 1971 Registered: December 2004
|
Senior Member |
|
|
Include the student ID and a last_modification date in the staging table and then use a merge statement ones that change get updated with the modification_date being set to sysdate (or timestamp. New rows get inserted. You then simply query the staging table where the modification_date is the current day. Push all those through the XML file
|
|
|
Re: Design Question [message #675931 is a reply to message #675920] |
Tue, 30 April 2019 03:17 |
cookiemonster
Messages: 13925 Registered: September 2008 Location: Rainy Manchester
|
Senior Member |
|
|
Michel Cadot wrote on Mon, 29 April 2019 17:23
You do NOT update staging, after generating your report, the only things you do on it are:
TRUNCATE
INSERT SELECT
If you truncate it what are you doing a minus against?
|
|
|
|
|
Re: Design Question [message #678822 is a reply to message #675941] |
Thu, 16 January 2020 10:57 |
ora9a
Messages: 42 Registered: June 2010
|
Member |
|
|
Hi Michel,
I've come across an issue. If the original query has a where clause such as:
SELECT forename, surname, address, course_name
FROM student s, address a, courses c
WHERE s.id = a.id
AND s.id = c.id
AND course_name = 'A'
This works ok, until a student changes course. The query will therefore not return this student's record, and so the MINUS from the staging table will not return anything.
This means i cannot show that a student's course yesterday was 'A', but today is 'B'.
How can I get around this problem?
Thanks
|
|
|
|
Re: Design Question [message #678826 is a reply to message #678824] |
Thu, 16 January 2020 13:07 |
Lalit Kumar B
Messages: 3174 Registered: May 2013 Location: World Wide on the Web
|
Senior Member |
|
|
If the query is scheduled to extract the data that changed within a stipulated time frame like a day ago, then how about using a DATE column which stores details when the DML happened. So you only extract the data which was manipulated within your required time period.
You would need two date columns, each for inserted and updated datetime values e.g. INS_DATE, UPD_DATE respectively. These columns would only be required in the main table, while you only have one day data in staging stables. As Michel suggested, staging table needs to be truncated everyday and then you could MERGE only the delta everyday into main table. Now all that you need to do is add a filter in your query to send only the rows that have been INS/UPD on SYSDATE -1.
[Updated on: Thu, 16 January 2020 13:53] Report message to a moderator
|
|
|
Re: Design Question [message #678827 is a reply to message #678826] |
Thu, 16 January 2020 13:38 |
Bill B
Messages: 1971 Registered: December 2004
|
Senior Member |
|
|
add the column last_mod (date) to your table add a simple trigger
CREATE OR REPLACE TRIGGER student_t1
BEFORE INSERT OR UPDATE
ON student
REFERENCING NEW AS New OLD AS Old
FOR EACH ROW
BEGIN
:new.last_mod := sysdate;
END student_t1;
Make a simular trigger and column in the address table and you could run a generation report for the previous day (or any other range)
select *
from student
where last_mod > trunc(sysdate-1)
and last_mod <= trunc(sysdate);
[Updated on: Thu, 16 January 2020 13:44] Report message to a moderator
|
|
|
Re: Design Question [message #678829 is a reply to message #678827] |
Thu, 16 January 2020 13:53 |
Lalit Kumar B
Messages: 3174 Registered: May 2013 Location: World Wide on the Web
|
Senior Member |
|
|
I always like to have two date columns for DML changes, one specific to INSERTS another for UPD/DEL. It just completes the entire design and it also leverages for a lot of analytical queries.
Also, depending on requirements, if the use case is suitable for an unified auditing policy I would go for it. As it would be an efficient method to manage compared to triggers.
|
|
|
Re: Design Question [message #678843 is a reply to message #678829] |
Fri, 17 January 2020 05:24 |
ora9a
Messages: 42 Registered: June 2010
|
Member |
|
|
Hi Guys,
@Michel Unfortunately that's how they want it. Only data for certain courses, but if student changes from that course then that needs to be captured.
@Lalit Not allowed to add columns to the base table student.
The process will run at a fixed time every night.
Day 1...
Student: John, Smith, 11 Walter St, A
SELECT forename, surname, address, course_name
FROM student s, address a, courses c
WHERE s.id = a.id
AND s.id = c.id
AND course_name = 'A'
MINUS
SELECT forename, surname, address, course_name
FROM stage
TRUNCATE stage
INSERT INTO stage (
SELECT forename, surname, address, course_name
FROM student s, address a, courses c
WHERE s.id = a.id
AND s.id = c.id
AND course_name = 'A')
-- Day 2
Data changes:
Student: John, Smith, 11 Walter St, B
This will not be picked up in query.
|
|
|
|
Re: Design Question [message #678850 is a reply to message #678849] |
Fri, 17 January 2020 08:19 |
ora9a
Messages: 42 Registered: June 2010
|
Member |
|
|
Thanks for the response Ed. The 3rd party is not willing to do anything, they just want the initial extract, then changed data every day. If the process fails, it can be rerun because the staging table will still be there with previous day's data so i don't think that will be a problem.
I'm wondering whether to create views, one with full data, and second with the restricted where clauses, then somehow comparing the results..
Thanks
|
|
|
Re: Design Question [message #678852 is a reply to message #678850] |
Fri, 17 January 2020 09:44 |
John Watson
Messages: 8935 Registered: January 2010 Location: Global Village
|
Senior Member |
|
|
Quote:The 3rd party is not willing to do anything, they just want the initial extract, then changed data every day. Can you not capture the changes as they occur? Use triggers to write out the I/U/D changes to another table and generate your daily change data report from that.
|
|
|
Re: Design Question [message #678853 is a reply to message #678852] |
Fri, 17 January 2020 10:01 |
ora9a
Messages: 42 Registered: June 2010
|
Member |
|
|
Yes, triggers is the other option, but wanted to avoid as there will be quite a lot of tables involved so will have triggers everywhere.
|
|
|
Re: Design Question [message #678854 is a reply to message #678843] |
Fri, 17 January 2020 10:29 |
Lalit Kumar B
Messages: 3174 Registered: May 2013 Location: World Wide on the Web
|
Senior Member |
|
|
ora9a wrote on Fri, 17 January 2020 16:54Not allowed to add columns to the base table student.
So your entire focus is about WHEN part as Ed also pointed out. Like WHEN a student is enrolled and the WHEN part is even more important with your job scheduling, but you don't have a DATETIME column to address the WHEN part itself? Without it how would you ever know WHEN was the row modified? It will be quite useful even if you go with the trigger approach also if you want to do data validations etc.
With so many design limitations, I would suggest you validate and process the data to get the delta before modifying the base table. It will have an overhead on the ETL but you could process the delta and send it to your 3rd party application. For more help you need to provide more details about the ETL that modifies your base table. It's design, the workflow etc.
|
|
|
|
Re: Design Question [message #678858 is a reply to message #678857] |
Fri, 17 January 2020 11:11 |
ora9a
Messages: 42 Registered: June 2010
|
Member |
|
|
The statement is restricted because only certain types of students are required in the extract, e.g. course A. BUT if the student changes course to 'B', then the 3rd party needs to know about this.
Unfortunately, this is the requirement from the business.
Cheers
|
|
|
Re: Design Question [message #678860 is a reply to message #678858] |
Fri, 17 January 2020 11:43 |
Lalit Kumar B
Messages: 3174 Registered: May 2013 Location: World Wide on the Web
|
Senior Member |
|
|
ora9a wrote on Fri, 17 January 2020 22:41The statement is restricted because only certain types of students are required in the extract, e.g. course A. BUT if the student changes course to 'B', then the 3rd party needs to know about this.
Unfortunately, this is the requirement from the business.
Your business unit must understand that design is important than "how to achieve this ASAP" part. Else, they will feel to be unfortunate.
This seems to be a fine grained auditing at row level or action based audit for particular columns. But still, there are lot of questions you need to answer first. It would be better to stick to WHAT part of design than the HOW/WHEN part.
I hope you understand that your question is about the design at first place, however, you are trying to seek suggestions only about how to achieve the end result. Every ETL process has a critical path which could be made efficient only if the design is good. Otherwise, it will affect the performance of entire ETL jobs and downstream systems/applications consuming your services. It's not late to go back to the drawing board to fix the design first and then think about the latter.
[Updated on: Fri, 17 January 2020 11:59] Report message to a moderator
|
|
|
Re: Design Question [message #678861 is a reply to message #678860] |
Fri, 17 January 2020 12:03 |
Bill B
Messages: 1971 Registered: December 2004
|
Senior Member |
|
|
Another possibility would be to do a minus query using a select of the current table and using a flashback query the contents of the table at a time in the past.
|
|
|
Re: Design Question [message #678862 is a reply to message #678861] |
Fri, 17 January 2020 12:08 |
Lalit Kumar B
Messages: 3174 Registered: May 2013 Location: World Wide on the Web
|
Senior Member |
|
|
Bill B wrote on Fri, 17 January 2020 23:33Another possibility would be to do a minus query using a select of the current table and using a flashback query the contents of the table at a time in the past.
Yeah could be another approach if current suggestions won't work, however, OP needs to confirm whether automatic undo management is configured or not. And how much the undo retention is set to.
|
|
|
Re: Design Question [message #678863 is a reply to message #678858] |
Fri, 17 January 2020 12:13 |
|
Michel Cadot
Messages: 68658 Registered: March 2007 Location: Nanterre, France, http://...
|
Senior Member Account Moderator |
|
|
You can use a double MINUS query with a new "status" field:
SELECT forename, surname, address, course_name, "I/U" status
FROM student s, address a, courses c
WHERE s.id = a.id
AND s.id = c.id
AND course_name = 'A'
MINUS
SELECT forename, surname, address, course_name, "I/U" status
FROM stage
UNION ALL
SELECT forename, surname, address, course_name, "D" status
FROM stage
MINUS
SELECT forename, surname, address, course_name, "D" status
FROM student s, address a, courses c
WHERE s.id = a.id
AND s.id = c.id
AND course_name = 'A'
status "I/U" means inserted into class A or updated inside this class, and status "D" means removed from class A (student dropped or changed class).
|
|
|
Re: Design Question [message #678864 is a reply to message #678858] |
Fri, 17 January 2020 12:57 |
John Watson
Messages: 8935 Registered: January 2010 Location: Global Village
|
Senior Member |
|
|
How about a solution that actually uses the database? From release 12, we have Temporal Validity. You said that you cannot add columns to the tables, but the Temporal Validity columns are hidden so only you will know they are there. I think that Temporal Validity is meant for implementing type 6 Slowly Changing Dimensions, but it can possibly be used to track the changes you need. You would still have to write a few triggers. Well, writing code is part of the job.
|
|
|
|
Re: Design Question [message #678897 is a reply to message #678896] |
Tue, 21 January 2020 07:23 |
John Watson
Messages: 8935 Registered: January 2010 Location: Global Village
|
Senior Member |
|
|
I suppose there is a relationship, Bill - it is about tracking multiple versions. But it does it with table and column structures, not with undo. See here:orclz> select column_name,data_default,hidden_column,virtual_column from user_Tab_cols where table_name='DEPT';
COLUMN_NAME DATA_DEFAULT HID VIR
------------------------------ -------------------------------------------------------------------------------- --- ---
DEPTNO NO NO
DNAME NO NO
LOC NO NO
orclz> alter table dept add (period for user_time);
Table altered.
orclz> select column_name,data_default,hidden_column,virtual_column from user_Tab_cols where table_name='DEPT';
COLUMN_NAME DATA_DEFAULT HID VIR
------------------------------ -------------------------------------------------------------------------------- --- ---
DEPTNO NO NO
DNAME NO NO
LOC NO NO
USER_TIME_START YES NO
USER_TIME_END YES NO
USER_TIME 36770601 YES YES
6 rows selected.
orclz> and then in your SELECT you have the AS OF PERIOD FOR clause, or use DBMS_FLASHBACK.ENABLE_AT_VALID_TIME
|
|
|
Re: Design Question [message #678976 is a reply to message #678897] |
Fri, 24 January 2020 10:31 |
ora9a
Messages: 42 Registered: June 2010
|
Member |
|
|
Thanks for your responses. I am discussing with DBA's long term solutions as we are having to do this DELTA for a lot of new projects.
For now i am trying Michel's suggestion of double minus. @Michel, this works as it brings the student back, but it shows the previous course, (from the stage table), but not the new course that student has changed to. So i am trying to collect this data, then run update on it with new course.
Thanks
|
|
|
|
Goto Forum:
Current Time: Mon Jun 10 05:15:59 CDT 2024
|