SQL Database How far to go to prevent Duplication of Records?
up vote
0
down vote
favorite
I've been reading up on proper DB creation techniques, and I've got a large project that will have numerous items in it. I have already mapped out the tables and the various lookup tables, but then I realized that in the user table, I have first_name, middle_name, last_name columns that could be changed to first_name_id, middle_name_id and last_name_id that act as lookups to a first name table, middle name table and last name table that hold only unique names to prevent duplication of data.
The question I have is how far do I go with this process? Does every single item that could potentially be duplicate information need to be done like this? At some point this seems like it will get confusing to ME to keep track of all the relationships and cascading updates/deletes, etc on everything...
Just looking for some advice as I want to make sure this is done properly as setting a proper foundation is very important to build and scale on top of in the future, but at the same time don't want to take this to extremes if it is not needed.
sql-server duplicates data-integrity
add a comment |
up vote
0
down vote
favorite
I've been reading up on proper DB creation techniques, and I've got a large project that will have numerous items in it. I have already mapped out the tables and the various lookup tables, but then I realized that in the user table, I have first_name, middle_name, last_name columns that could be changed to first_name_id, middle_name_id and last_name_id that act as lookups to a first name table, middle name table and last name table that hold only unique names to prevent duplication of data.
The question I have is how far do I go with this process? Does every single item that could potentially be duplicate information need to be done like this? At some point this seems like it will get confusing to ME to keep track of all the relationships and cascading updates/deletes, etc on everything...
Just looking for some advice as I want to make sure this is done properly as setting a proper foundation is very important to build and scale on top of in the future, but at the same time don't want to take this to extremes if it is not needed.
sql-server duplicates data-integrity
You should avoid the temptation of over engineering your platform.
– John Cappelletti
Nov 10 at 14:34
I would advise not to do this. you might save a miniscule amount of storage but it will add complexity to all your queries. See also Is normalizing a person's name going too far?
– Martin Smith
Nov 10 at 17:00
To answer the initial question... The answer is to use a unique constraint. That said, unless you have a compelling reason to do so, don't normalize names. The name "Matt" isn't a distinct entity in the way Order_ID: 854123 is a distinct entity. Think of it this way, if something about "Matt" were to change, would you want that change propagated to every single person with that Name_ID: Matt attribute?
– Jason A. Long
Nov 11 at 3:18
add a comment |
up vote
0
down vote
favorite
up vote
0
down vote
favorite
I've been reading up on proper DB creation techniques, and I've got a large project that will have numerous items in it. I have already mapped out the tables and the various lookup tables, but then I realized that in the user table, I have first_name, middle_name, last_name columns that could be changed to first_name_id, middle_name_id and last_name_id that act as lookups to a first name table, middle name table and last name table that hold only unique names to prevent duplication of data.
The question I have is how far do I go with this process? Does every single item that could potentially be duplicate information need to be done like this? At some point this seems like it will get confusing to ME to keep track of all the relationships and cascading updates/deletes, etc on everything...
Just looking for some advice as I want to make sure this is done properly as setting a proper foundation is very important to build and scale on top of in the future, but at the same time don't want to take this to extremes if it is not needed.
sql-server duplicates data-integrity
I've been reading up on proper DB creation techniques, and I've got a large project that will have numerous items in it. I have already mapped out the tables and the various lookup tables, but then I realized that in the user table, I have first_name, middle_name, last_name columns that could be changed to first_name_id, middle_name_id and last_name_id that act as lookups to a first name table, middle name table and last name table that hold only unique names to prevent duplication of data.
The question I have is how far do I go with this process? Does every single item that could potentially be duplicate information need to be done like this? At some point this seems like it will get confusing to ME to keep track of all the relationships and cascading updates/deletes, etc on everything...
Just looking for some advice as I want to make sure this is done properly as setting a proper foundation is very important to build and scale on top of in the future, but at the same time don't want to take this to extremes if it is not needed.
sql-server duplicates data-integrity
sql-server duplicates data-integrity
asked Nov 10 at 14:20
MattE
390216
390216
You should avoid the temptation of over engineering your platform.
– John Cappelletti
Nov 10 at 14:34
I would advise not to do this. you might save a miniscule amount of storage but it will add complexity to all your queries. See also Is normalizing a person's name going too far?
– Martin Smith
Nov 10 at 17:00
To answer the initial question... The answer is to use a unique constraint. That said, unless you have a compelling reason to do so, don't normalize names. The name "Matt" isn't a distinct entity in the way Order_ID: 854123 is a distinct entity. Think of it this way, if something about "Matt" were to change, would you want that change propagated to every single person with that Name_ID: Matt attribute?
– Jason A. Long
Nov 11 at 3:18
add a comment |
You should avoid the temptation of over engineering your platform.
– John Cappelletti
Nov 10 at 14:34
I would advise not to do this. you might save a miniscule amount of storage but it will add complexity to all your queries. See also Is normalizing a person's name going too far?
– Martin Smith
Nov 10 at 17:00
To answer the initial question... The answer is to use a unique constraint. That said, unless you have a compelling reason to do so, don't normalize names. The name "Matt" isn't a distinct entity in the way Order_ID: 854123 is a distinct entity. Think of it this way, if something about "Matt" were to change, would you want that change propagated to every single person with that Name_ID: Matt attribute?
– Jason A. Long
Nov 11 at 3:18
You should avoid the temptation of over engineering your platform.
– John Cappelletti
Nov 10 at 14:34
You should avoid the temptation of over engineering your platform.
– John Cappelletti
Nov 10 at 14:34
I would advise not to do this. you might save a miniscule amount of storage but it will add complexity to all your queries. See also Is normalizing a person's name going too far?
– Martin Smith
Nov 10 at 17:00
I would advise not to do this. you might save a miniscule amount of storage but it will add complexity to all your queries. See also Is normalizing a person's name going too far?
– Martin Smith
Nov 10 at 17:00
To answer the initial question... The answer is to use a unique constraint. That said, unless you have a compelling reason to do so, don't normalize names. The name "Matt" isn't a distinct entity in the way Order_ID: 854123 is a distinct entity. Think of it this way, if something about "Matt" were to change, would you want that change propagated to every single person with that Name_ID: Matt attribute?
– Jason A. Long
Nov 11 at 3:18
To answer the initial question... The answer is to use a unique constraint. That said, unless you have a compelling reason to do so, don't normalize names. The name "Matt" isn't a distinct entity in the way Order_ID: 854123 is a distinct entity. Think of it this way, if something about "Matt" were to change, would you want that change propagated to every single person with that Name_ID: Matt attribute?
– Jason A. Long
Nov 11 at 3:18
add a comment |
active
oldest
votes
active
oldest
votes
active
oldest
votes
active
oldest
votes
active
oldest
votes
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53239864%2fsql-database-how-far-to-go-to-prevent-duplication-of-records%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
You should avoid the temptation of over engineering your platform.
– John Cappelletti
Nov 10 at 14:34
I would advise not to do this. you might save a miniscule amount of storage but it will add complexity to all your queries. See also Is normalizing a person's name going too far?
– Martin Smith
Nov 10 at 17:00
To answer the initial question... The answer is to use a unique constraint. That said, unless you have a compelling reason to do so, don't normalize names. The name "Matt" isn't a distinct entity in the way Order_ID: 854123 is a distinct entity. Think of it this way, if something about "Matt" were to change, would you want that change propagated to every single person with that Name_ID: Matt attribute?
– Jason A. Long
Nov 11 at 3:18