SPECIALIZATION: It is the process of identifying the subclass/es of an entity set which are special from other entities of this set in terms of attributes or relationships they make. Consider the design of a database for an academic institution. While designing we identified an entity set "Employee” which represent all the employees of this. The attributes of this may be (Employee ID, Employee Name, Date of Joining, Address) but then we see a subclass of this set of employees, called set of all "Teachers” which is different from other employees. We may have other employees in the sub classes called "Admin Staff” or "Technical Lab Staff” or "Other Staff” like peons and other workers. All of these subclasses form specialization of class "Employee”. All the attributes and associations of class "Employee” will also be there with all the subclasses. Every teacher will have an employee ID, Name, DOJ, Address similarly the admin staff, technical lab staff and other staff members. Sub classes may have some attributes or associations oftheir own which make them different from others. The Teachers will have a subject they teach, Department they belong to, Expertise they have. The teachers will have associations with different entities like "Classes” they teach in, "Projects” they guide etc. These attributes and associations will not be there with other employees sub sets
GENERALIZATION: This is again a similar process as specialization but it is just opposite of that. It is the process of combining the subclasses into a general class and moving the common attributes and associations from subclasses to the general class. It is just the different practical approach. In specialization we start from the general classes and forms the special classes out of them while in generalization we start from various low level classes and forms the general classes by combining several of them identifying the common features in them. So in the above example of Academic Institute we may start thinking of "Teachers” as an Entity set and then "Technical Lab Staff” and Then "Admin staff” and then observing that they have several fields in common like "Employee Id”, "name”, " Address” etc. we combine them to define a general class "Employee” which will only those attributes which are common in all the above three classes. And then these classes will not have these common fields rather they will be there for all employees collectively in the general entity set "Employee”.
The result of both the above process is same. We get a hierarchy of classes and subclasses which can be represented by a tree structure. The result of generalization and specialization will be like this.
AGGREGATION: Consider a "Borrower” relationship set that associates customers from "Customer” entity set to the loans they borrowed in entity set "Loan”. Suppose the bank decides to attach an employee to some customer-loan relationships of "Borrower” based on probably the size of loan or status of customer. This employee called the loan officer will be responsible for tracking up and following up the status of the loan time to time. This suggests a relationship that exists between "Customer”, "Borrower” and "Employee”. So we can draw the simple ERD as follows:
o The diagram above may imply that the relationships Borrower and Loan-Officer may be combined into one. But then it will require that a loan-
officer must be combined to every Customer-loan pair, which is not true.
o The above diagram also have redundancy as every customer-loan pair in "Loanofficer” is also in "Borrower”.
More appropriate way of representing the above set of relationships would be to consider the entire relationship Borrower with its associated entities Customer and Loan as an entity i.e. an aggregate entity, and then representing relationship Loan-Officer between the entities "Employee” and the above aggregated entity. As follows:
Important Terms related to generalization and specialization:
o Attribute inheritance: All the attributes of higher level entity sets are inherited by the respective lower level entity sets. Also the relationships in which higher level entity sets participates all the respective lower level entity level entity sets also participates.
o Disjoint: The lower level entity sets corresponding to a higher level entity sets are called disjoint if an entity doesn’t belong to more than one lower level entity sets.
o Overlapping: When the same entity may belong to more than one lower level entity set.
o Complete Generalization: When each higher level entity belongs to at least one lower-level entity set.
o Partial Generalization: When some higher-level entities may not belong to any lower-level entity set.
Reduction of ER Schema to Tables:
Strong Entity Sets: These are the entity sets for which we have a set of attributes which are called primary key (or simply a key). To represent such an entity set in a form of table we will have a column in a table for each attribute of the entity schema. Each entity of that entity set will be represented by a row in table having values for each attribute. For Example- the entity set BOOK as referred earlier has following attributes: Acc. No., Call No., Title, Author, Publisher and Yr of Publication.
Also consider that we have only two books in the library- a book on "Database System Concepts” by Silbershatz, Korth and Sudarshan published in 1997 having accession number as 312 and Call no. as 245 and another book on "Fundament of Database Systems” by Elmasri and Navathe published in year 1999 having accession no. as 433 and call no. as 23. Then entity set BOOK will be represented in tabular form as follows:
Weak Entity Sets: These are the sets where we cannot identify the different entities only looking at their attributes we should be able to establish a link between a weak entity and some of the entity from another strong entity set which is called owner of the weak entity set. Such entity sets when represented in a tabular form will have a column for key attributes of owner apart from other columns for the attributes of the entity set. For Example: Consider the PAYMENT entity set which is a weak entity set dependent on its owner entity set LOAN. LOAN has Loan No.
and
Loan Amount as its attributes with
Loan No. as the key attribute and
PAYMENT has Payment No.,
Payment Amount and
Payment Date as the set of attributes
(no key as it’s a weak entity set but Payment No. is a partial Key).
Consider the following table is there for LOAN:
Also consider that a payment of Rs 100 is made for L-1 on 22/08/2010 as its first payment and a payment of Rs 300 is made for L-2 on 25/08/2010 as its first payment then table corresponding to entity set PAYMENT will look like this:
Notice: We have included Loan No. as a column even though it was not an attribute of entity set PAYMENT because it is the Key attribute of the owner entity set LOAN. Loan No. and Payment No. in combination forms the primary key of this table.
RELATIONSHIPS: To represent a relationship of an ERD in tabular form we have a column corresponding to key attributes of each of the participating entity with a column for each attribute which is directly associated to the relationship set only. For example: We have defined earlier the relationship BORROWED BY which exists between the entity sets BOOK and USER. It has an attribute Date of Issue directly associated to it. The tabular representation of BORROWED BY will have a column for Acc. No. a column corresponding key of BOOK and a column for Card No. corresponding to key of USER and a column for DOI corresponding to the attribute of relationship set. The table may look like this where rows represents all the borrowings which are there in the library:
Existentially Dependent Entity Sets: Since the existence of all the entities of a dependent entity set depends on the existence of some entity of its owner. We may remove the table representing the relationship that is there between an existentially dependent entity and its dominant entity by adding the Column in table for dependent entity corresponding to key of dominant entity. For Example: ACCOUNT having attributes Account No.(key) and Balance is existentially dependent on BRANCH having attributes Branch Id(key) and Address. So the table for relationship BRANCH-ACCOUNT which associates the accounts to branches may be removed by just adding a column naming Branch Id in table representing ACCOUNT. The Table will have following columns:
Identifying Relationship Sets: These are the relationship sets represented as doubly outlined diamonds in ERD which form an associate a weak entity set to its owner. Since we have already included the Primary key of Strong owner Entity set in the table of weak entity set so we do not require a separate table to represent the identifying relationships.
Multivalued Attributes: The attributes of an entity which can have more than one value is called multi valued attribute. They are marked by doubly outlined ovals in ERD. For example: Consider "Dependants” an attribute of an EMPLOYEE. Since there may be more than one dependent of an employee we will represent this as a multivalued attribute of an EMPLOYEE. But if we represent it as a column in the table for the entity set we will not be able to put all of the values for a row. A multivalued attribute is represented as a different table similar to the weak entities where you will have a column corresponding to the primary key of the entity and a column corresponding to each sub attribute of multivalued attribute(there will be sub attributes in case multivalued attribute itself is a composite attribute). We show the ERD and corresponding Table for such an analogy
Employee Table
Here we see that Rajan is having dependents his mother and his wife and Sartaj also having two dependents Shahina his wife and Rehman his son.
Generalization: Consider following Generalization example:
It shows a general class of entities ACCOUNT which two special classes SAVING and CURRENT referring to savings bank account and current bank account. It can be represented in the following way
1. Tables for general case:
2. Tables when generalization is disjoint and complete Above case has both properties Disjoint and Complete. No account can both be saving and current account and Every account has to be either Saving Account or Current Account.In such case we my club the Account table into its child tables.We will have only two tables as follows
Aggregation:In case of ERD given earlier referring LOAN, CUSTOMER, BORROWER, EMPLOYEE and LOAN-OFFICER
we can have the following tables
o Loan: with attributes LoanNumber and Amount.
o Customer: with attributes Cust-Name,Cust-ID, Address.
o Borrower: with attributes Cust-ID and LoanNumber.
o Employee: with attributes Emp-ID, Emp-Name, Address.
o LoanOfficer: with attributes Cust-ID, LoanNumber and Emp-ID.
Case study of an enterprise:
1. To see why database management systems are necessary, let's look at a typical ``file-processing system'' supported by a conventional operating system.
The application is a savings bank: o Savings account and customer records are kept in permanent system files.
o Application programs are written to manipulate files to perform the following tasks:
§ Debit or credit an account.
§ Add a new account.
§ Find an account balance.
§ Generate monthly statements.
2. Development of the system proceeds as follows:
o New application programs must be written as the need arises.
o New permanent files are created as required.
o but over a long period of time files may be in different formats, and
o Application programs may be in different languages.
3. So we can see there are problems with the straight file-processing approach:
o Data redundancy and inconsistency
§ Same information may be duplicated in several places.
§ All copies may not be updated properly.
o Difficulty in accessing data
§ May have to write a new application program to satisfy an unusual request.
§ E.g. find all customers with the same postal code.
§ Could generate this data manually, but a long job...
o Data isolation
§ Data in different files.
§ Data in different formats.
§ Difficult to write new application programs.
o Multiple users
§ Want concurrency for faster response time.
§ Need protection for concurrent updates.
§ E.g. two customers withdrawing funds from the same account at the same time - account has $500 in it, and they withdraw $100 and $50.
The result could be $350, $400 or $450 if no protection.
o Security problems
§ Every user of the system should be able to access only the data they are permitted to see.
§ E.g. payroll people only handle employee records, and cannot see customer accounts; tellers only access account data and cannot see payroll data.
§ Difficult to enforce this with application programs.
o Integrity problems
§ Data may be required to satisfy constraints.
§ E.g. no account balance below $25.00.
§ Again, difficult to enforce or to change constraints with the file-processing approach.
These problems and others led to the development of database management systems.