ODL Design to Relational Designs Object-Relational Database Systems (ORDBS) Extensible Markup Language (XML) 26/2-2002 Contains slides made by Arthur M. Keller, Vera Goebel Overview 9 From ODL design to relational design 9 Object-relational DBS (ORDBS) 9 Semi-structured data 9 Extensible Markup Language (XML) INF 212 – database theory 2002 Pål Halvorsen 1 ODL Design to Relational Designs Translating ODL to Relations Similar to converting ER-diagrams to relations: 9 Classes without relationships are like entity classes (or sets), but some new problems arise: ¾ Keys are optional in ODL (Some situations may require inventing new attributes to serve as a key) ¾ ODL may have non-atomic attributes (Resulting relations may be non-normalized and must be redesigned) ¾ ODL allows methods as part of design 9 Classes with relationships: ¾ Treat the relationship separately, as in E/R. ¾ Attach a many-one relationship to the relation for the “many.” INF 212 – database theory 2002 Pål Halvorsen 2 Translating ODL to Relations: Attributes and Keys 9 Each atomic ODL attribute translates to a relational attribute. 9 ODL allows non-atomic attributes (structures and collection types): ¾ Structure: make one attribute for each field. ¾ SET: make one tuple for each member of the set. (More than one set attribute? Make tuples for all combinations.) ¾ BAG: make only distinct tuples, add a count attribute ¾ LIST: make one member for each member of the list, add a position attribute ¾ ARRAY: fixed length, add each field in the array as own attributes ¾ DICTIONARY: represent as set, but with a tuple for all pairs that are member of the dictionary 9 ODL class may have no key, but we should have one in the relation to represent “OID.” 2002 Pål Halvorsen INF 212 – database theory Translating ODL to Relations: Attributes and Keys - Example ODL class: class Person (extent persons key name) { attribute string name; attribute Struct Addr{string street, string city} address; attribute Set<string> phone; } Relation schema: persons (name, street, city, phone) name street city phone n1 n1 s1 s1 c1 c1 p1 p2 9 “Surprise”: the key for the class (name) is not the key for the relation (name, phone): ¾name in the class determines a unique object, including a set of phone numbers. ¾name in the relation does not determine a unique tuple. ¾Since tuples are not identical to objects, there is no inconsistency! 9 BCNF violation: INF 212 – database theory name Æ (street, city), name not superkey, separate out name-phone. 2002 Pål Halvorsen 3 Translating ODL to Relations: Relationships 9 Can create a new relation for each relationship 9 If the relationship is many-to-one from A to B, put key of B attributes in the relation for class A. 9 If relationship is many-many, we’ll have to duplicate A-tuples as in ODL with set-valued attributes. ¾ Wouldn’t you really rather create a separate relation for a many-many-relationship? ¾ You’ll wind up separating it anyway, during BCNF decomposition. 2002 Pål Halvorsen INF 212 – database theory Translating ODL to Relations: Relationships - Example ODL class: class Person (extent persons key name) { attribute string name; attribute Struct Addr{string street, string city} address; relationship Set<Cars> ownCar inverse Cars::ownedBy; relationship Person spouse inverse spouse; relationship Set<Person> buddies inverse buddies; } Relation schema: persons (name, street, city, ownCar, spouse, buddies) 9 BCNF violation: 9 4NF violation: name Æ (street, city, spouse) name ÆÆ ownCar name ÆÆ buddies 9 Decomposition into 4NF: persons (name, street, city, spouse) persons_cars (name, ownCar) persons_buddies (name, buddies) INF 212 – database theory 2002 Pål Halvorsen 4 Object-Relational Database Systems (ORDBS) Data Models & Database System Architectures - Chronological Overview 9 Network Data Models (1964) 9 Hierarchical Data Models (1968) 9 Relational Data Models (1970) 9 Object-oriented Data Models (~ 1985) 9 Object-relational Data Models (~ 1990) 9 Semistructured Data Models (XML 1.0) (~1998) INF 212 – database theory 2002 Pål Halvorsen 5 Object-Relational Database Systems (ORDBS) 9 Motivations ¾ Allow DBMS to deal with specialized types – maps, signals, images, etc. – with their own specialized methods. ¾ Supports specialized methods even on conventional relational data. ¾ Supports structure more complex than “flat files.” ¾ … Ö Object-oriented ideas enter the relational world ¾ Keep the relation as the fundamental abstraction whereas the OODBS use the class as the fundamental abstraction. 2002 Pål Halvorsen INF 212 – database theory ORDBS: New Features 9 Structured types Not only atomic types. ODL-like type system. (Also: BLOB, CLOB, ADT, BFILE) 9 Methods Special operations can be defined for a type. 9 Identifiers Allowing unique IDs for each tuple. 9 References Pointers to tuples. INF 212 – database theory 2002 Pål Halvorsen 6 ORDBS: Nested Relations 9 Attributes may have non-atomic types ¾ Nested-relational data models give up 1NF (atomic values) ¾ A relation’s type can be any schema consisting one or more attributes. An attribute may even have an own schema as type. 9 Example: moviestar(name, address(street,city), birth, movies(title,year)) name address street Fisher city Maple Hollywood 5. Avenue New York street Hamill birth title 9/9/1950 city Sunset Bvld LA movie year Star Wars 1977 Empire 1980 title 8/8/1962 year Star Wars 1977 Return 1983 2002 Pål Halvorsen INF 212 – database theory ORDBS: References - I 9Unnormalized relation 9Introduce references to allow a tuple t refer to a tuple s rather than including s in t. name address street Fisher Hamill INF 212 – database theory city Maple Hollywood 5. Avenue New York street birth title 9/9/1950 city Sunset Blvd LA movie Star Wars 1977 Empire 1980 title 8/8/1962 year year Star Wars 1977 Return 1983 2002 Pål Halvorsen 7 ORDBS: References - II 9 If attribute A has a type that is a reference to a relation with schema R, we denote A as A(*R) 9 If A is a set of references, we denote A as A({*R}) 9 Example: moviestar(name, address(street,city), birth, movies({*movies})) movies(title,year) name address street Fisher city Maple Hollywood 5. Avenue New York street Hamil birth title 9/9/1950 city Sunset Bvld LA movie year Star Wars 1977 Empire 1980 Return 1883 8/8/1962 2002 Pål Halvorsen INF 212 – database theory OODBS vs. ORDBS - I two ways to integrate object-orientation into DBS Æ both directions (OODBS and ORDBS) are also reflected in the standard developments Several vendors: commercial OODBS: - GemStone - O2 (now: Ardent) - ObjectivityDB - ObjectStore - ONTOS - POET - Versant - ... INF 212 – database theory commercial ORDBS: - ORACLE - Sybase - Illustra - UNISQL - ... 2002 Pål Halvorsen 8 OODBS vs. ORDBS - II 9 Objects/tuples: Both objects and tuples are structs with components for attributes and relationships 9 Extents/relations: Both may share the same declaration among several collections 9 Methods: Both has the same ability to declare and define methods associated with a type 9 Type systems: Both are based on atomic types and constructions of new types by structs and collection types 9 References/OID: OODBS OID hidden – ORDBS ID visible (may be part of type) 9 Backwards Compatibility: Migrating existing applications to an OODBS require extensive rewriting, but ORDBSes have maintained backward compatibility 2002 Pål Halvorsen INF 212 – database theory OODBS vs. ORDBS - III OODBS: ORDBS: 9 simpler way for programmer to use DBS (familiar with OOPLs) 9 substancial investment in SQL-based rel. DBSs Æ evolutionary approach 9 “seamlessness”, no “impedance mismatch” 9 systems are more robust due to many years of usage and experience 9 OO functionality + DBS functionality Æ higher performance for specific applications 9 application development tools 9 transaction processing performance 9 ... 9 “revolutionary” approach, no legacy problems 9 ... prediction: both kinds of systems will exist, used for different kinds of applications INF 212 – database theory 2002 Pål Halvorsen 9 Semi-Structured Data Information Integration - I Problem: related data exists in many places. They talk about the same things, but differ in model, schema, conventions (e.g., terminology). How should one retrieve data from different places? Examples: In the real world, every bar has its own database. 9 Some may have relations like beer-price; others have an Microsoft Word file from which the menu is printed. 9 Some keep phones of manufacturers but not addresses. 9 Some distinguish beers and ales; others do not. INF 212 – database theory 2002 Pål Halvorsen 10 Information Integration - II 9 Warehousing: Make copies of information at each data source centrally, combine into a global schema. Query data stored at the warehouse. Reconstruct (recopy) data daily/weekly/monthly, but do not try to keep it up-to-date. 9 Mediation: Create a view of all information, but do not make copies. Answer queries by sending appropriate queries to sources (no local data). 2002 Pål Halvorsen INF 212 – database theory Semi-Structured Data 9 Semi-structured data model allows information from several sources, with related but different properties, to be fit together in one whole. Thus, suitable for ¾ integration of databases ¾ sharing information on the Web 9 Semi-structured data is data that may be irregular or incomplete and have a structure that may change rapidly or unpredictably. ¾ It generally has some structure, but does not conform to a fixed schema ¾ “Schemaless” and self-describing, i.e., data carries information about its own schema (e.g., in terms of XML element tags) 9 Characteristics ¾ Heterogeneous ¾ Irregular structure ¾ Large evolving schema 9 Major application: XML documents INF 212 – database theory 2002 Pål Halvorsen 11 Semi-Structured Data: Graph Representation root man 9 Collection of nodes ¾ Atomic values on leaf nodes ¾ Interior nodes have one or more arcs 9 Nodes connected in a general rooted graph structure beer ctur er bar beer addr name name name s bud 9 Labels on arcs ¾ name of attribute/type ¾ relationship ufa es erv miller makes At ed v r se Joe’s madeBy 9 Example: Beer-Bar-Manufacturer name addr INF 212 – database theory 2002 Pål Halvorsen Extensible Markup Language (XML) 12 Data Models & Database System Architectures - Chronological Overview 9 Network Data Models (1964) 9 Hierarchical Data Models (1968) 9 Relational Data Models (1970) 9 Object-oriented Data Models (~ 1985) 9 Object-relational Data Models (~ 1990) 9 Semistructured Data Models (XML 1.0) (~1998) INF 212 – database theory 2002 Pål Halvorsen Extensible Markup Language (XML) 9 Standard of the World Wide Web Consortium (W3C) in 1998 9 An XML document is only a file of characters 9 Similar to HTML, but ¾ HTML uses tags for formatting (e.g., “italic”). ¾ XML uses tags for semantics (e.g., “this is an address”). 9 Two modes: ¾ Well-formed XML allows you to invent your own tags, much like labels in semi-structured data. ¾ Valid XML involves a Document Type Definition (DTD) that tells the labels and gives a grammar for how they may be nested. INF 212 – database theory 2002 Pål Halvorsen 13 XML: Tags 9 Tags are text surrounded by brackets, i.e., <...> 9 Tags come in matching pairs, e.g., <FOO> is balanced by </FOO> 9 Nesting allowed (start and end in same range), e.g., <BAR> <NAME></NAME> </BAR> 9 Unbalanced tags not allowed, e.g., <P>, <BR>, and <HR> in HTML 2002 Pål Halvorsen INF 212 – database theory XML: Well-Formed XML 9 Minimal requirement: XML declaration and root tags surrounding entire body <? XML VERSION = "1.0" STANDALONE = "yes" ?> <XXX> ..... </XXX> NOTE 1: XML version INF 212 – database theory NOTE 2: there is no DTD specified 2002 Pål Halvorsen 14 XML: Well-Formed XML: Example <?XML VERSION = "1.0" STANDALONE = "yes"?> <BARS> <BAR> <NAME>Joe’s Bar</NAME> <BEER> <NAME>Bud</NAME> <PRICE>2.50</PRICE> </BEER> <BEER> <NAME>Miller</NAME> <PRICE>3.00</PRICE> </BEER> </BAR> <BAR> ... </BAR> </BARS> NOTE 1: only balanced tags NOTE 2: value between two surrounding tags NOTE 3: nesting within the same range 2002 Pål Halvorsen INF 212 – database theory XML: Document Type Definitions (DTD) 9 Essentially a grammar describing the legal nesting of tags 9 Intention is that DTD’s will be standards for a domain, used by everyone preparing or using data in that domain Example: a DTD for describing protein structure; a DTD for describing bar menus, etc. 9 Structure of a DTD: <!DOCTYPE root tag [ <!ELEMENT name (components)> ... more elements ... ]> 9 The root-tag is used to surround the document which uses these rules INF 212 – database theory 2002 Pål Halvorsen 15 XML: Elements of a DTD 9 An element is a name (its tag) and a parenthesized description of tags within an element. 9 Special case: (#PCDATA) after an element name means it is text. 9 Each element name is a tag. 9 Its components are the tags that appear nested within, in the order specified. 9 Multiplicity of a tag is controlled by: 1. * = zero or more of. 2. + = one or more of. 3. ? = zero or one of. 9 In addition: | = “or.” 2002 Pål Halvorsen INF 212 – database theory XML: DTD: Example <!DOCTYPE Bars [ <!ELEMENT BARS (BAR*)> <!ELEMENT BAR (NAME, BEER+)> <!ELEMENT NAME (#PCDATA)> <!ELEMENT BEER (NAME, PRICE)> <!ELEMENT PRICE (#PCDATA)> ]> NOTE 1: BARS is root-tag NOTE 2: multiplicity of tags NOTE 4: Inside <BARS>-tag we’ll find zero or more <BAR>-tags INF 212 – database theory NOTE 3: name (and price) has a text value NOTE 5: a BAR has a name and serves one or more beers (which again has components) 2002 Pål Halvorsen 16 XML: Using a DTD 9 To use a DTD, set STANDALONE = "no": <?XML VERSION = "1.0" STANDALONE = "no"?> 9 Either ¾ Include the DTD as a preamble, or ¾ Follow the XML tag by a DOCTYPE declaration with the root tag, the keyword SYSTEM, and a file where the DTD can be found. 2002 Pål Halvorsen INF 212 – database theory XML: Using a DTD: Example <?XML VERSION = "1.0" STANDALONE = "no"?> <!DOCTYPE Bars SYSTEM [ "bar.dtd"> <!ELEMENT BARS (BAR*)> <!ELEMENT BAR (NAME, BEER+)> <!ELEMENT NAME (#PCDATA)> <!ELEMENT BEER (NAME, PRICE)> <!ELEMENT PRICE (#PCDATA)> ]> <BARS> <BAR><NAME>Joe’s Bar</NAME> <BEER> <NAME>Bud</NAME> <PRICE>2.50</PRICE></BEER> <BEER> <NAME>Miller</NAME> <PRICE>3.00</PRICE></BEER> </BAR> <BAR> ... </BARS> INF 212 – database theory NOTE 1: DTD may be in a separate file NOTE 2: DTD may be included as a preamble NOTE 3: BARS is root-tag and surround the document which uses these rules NOTE 4: BEER has a name and a price NOTE 5: BAR has a name and serves one or more beers. 2002 Pål Halvorsen 17 XML: Attribute Lists 9 Opening tags can have “arguments” that appear within the tag, in analogy to constructs like <A HREF = ...> in HTML. 9 Keyword !ATTLIST introduces a list of attributes and their types for a given element in the DTD. 9 Example of declaration: <!ELEMENT BAR (NAME BEER*)> <!ATTLIST BAR type = "sushi" | "sports" | "other"> 9 Bar objects can have a type, and the value of that type is limited to the three strings shown. 9 Example of use: <BAR type = "sports"> . . . </BAR> 2002 Pål Halvorsen INF 212 – database theory XML: ID’s and IDREF’s 9 ID is used to give a unique name for an element/object 9 IDREF is used to provide pointers to elements/object (by the ID-name), and multiple object references within one tag is allowed. IDREFS is used if there might be a set of references 9 Analogous to NAME = foo and HREF = #foo in HTML 9 Allows the structure of an XML document to be a general graph, rather than just a tree. INF 212 – database theory 2002 Pål Halvorsen 18 XML: ID’s and IDREF’s: Example 9 Let us include in our Bars document type elements that are the manufacturers of beers, and have each beer object link, with an IDREF, to the proper manufacturer object: NOTE 1: MANUFACTURER has <!DOCTYPE Bars [ a name-ID <!ELEMENT BARS (BAR*)> <!ELEMENT BAR (NAME, BEER+)> NOTE 2: <!ELEMENT NAME (#PCDATA)> BEER has a poiner <!ELEMENT MANUFACTURER (ADDR,...)> to a manufacturer <!ATTLIST MANUFACTURER (name ID)> <!ELEMENT ADDR (#PCDATA)> NOTE 3: <!ELEMENT BEER (NAME, PRICE)> The IDREF value in <!ATTLIST BEER (manf IDREF)> BEER equals the ID <!ELEMENT PRICE (#PCDATA)> value in the ]> corresponding ... manufacturer <MANUFACTURER name= ="X">...</MANUFACTURER> ... <BEER manf="X"><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER> 2002 Pål Halvorsen INF 212 – database theory Summary 9 From ODL design to relational design 9 Object-relational DBSes (ORDBS) 9 Semi-structured data 9 Extensible Markup Language (XML) INF 212 – database theory 2002 Pål Halvorsen 19