ODL Design to Relational Designs Object-Relational Database Systems (ORDBS)

ODL Design to Relational Designs
Object-Relational Database Systems (ORDBS)
Extensible Markup Language (XML)
26/2-2002
Contains slides made by
Arthur M. Keller, Vera Goebel
Overview
- From ODL design to relational design
- Object-relational DBS (ORDBS)
- Semi-structured data
- Extensible Markup Language (XML)
INF 212 – database theory
2002 Pål Halvorsen
ODL Design to Relational Designs
Translating ODL to Relations
Similar to converting E/R diagrams to relations:
- Classes without relationships are like entity classes (or sets), but some new problems arise:
  - Keys are optional in ODL. (Some situations may require inventing new attributes to serve as a key.)
  - ODL may have non-atomic attributes. (Resulting relations may be non-normalized and must be redesigned.)
  - ODL allows methods as part of the design.
- Classes with relationships:
  - Treat the relationship separately, as in E/R.
  - Attach a many-one relationship to the relation for the “many.”
Translating ODL to Relations:
Attributes and Keys
- Each atomic ODL attribute translates to a relational attribute.
- ODL allows non-atomic attributes (structures and collection types):
  - Structure: make one attribute for each field.
  - SET: make one tuple for each member of the set. (More than one set attribute? Make tuples for all combinations.)
  - BAG: make only distinct tuples and add a count attribute.
  - LIST: make one tuple for each member of the list and add a position attribute.
  - ARRAY: fixed length, so make each element of the array its own attribute.
  - DICTIONARY: represent as a set, with one tuple for each key-value pair in the dictionary.
- An ODL class may have no key, but the relation should have one to represent the “OID.”
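The collection rules above can be sketched in Python. This is only an illustration under assumptions: the function names are invented here, and each function returns the tuples that the owning object (identified by `key`) would contribute to the relation.

```python
from collections import Counter

def set_to_tuples(key, members):
    """SET: one tuple per distinct member."""
    return sorted((key, m) for m in set(members))

def bag_to_tuples(key, members):
    """BAG: one tuple per distinct member, plus a count attribute."""
    counts = Counter(members)
    return sorted((key, m, n) for m, n in counts.items())

def list_to_tuples(key, members):
    """LIST: one tuple per member, plus a position attribute (1-based)."""
    return [(key, m, pos) for pos, m in enumerate(members, start=1)]

def dict_to_tuples(key, mapping):
    """DICTIONARY: one tuple per key-value pair."""
    return sorted((key, k, v) for k, v in mapping.items())

if __name__ == "__main__":
    print(set_to_tuples("n1", ["p1", "p2"]))
    print(bag_to_tuples("n1", ["a", "b", "a"]))
    print(list_to_tuples("n1", ["x", "y"]))
    print(dict_to_tuples("n1", {"home": "p1"}))
```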
Translating ODL to Relations:
Attributes and Keys - Example
ODL class:
class Person (extent persons key name) {
attribute string name;
attribute Struct Addr{string street, string city} address;
attribute Set<string> phone;
}
Relation schema:
persons (name, street, city, phone)

name | street | city | phone
n1   | s1     | c1   | p1
n1   | s1     | c1   | p2
- “Surprise”: the key for the class (name) is not the key for the relation (name, phone):
  - name in the class determines a unique object, including a set of phone numbers.
  - name in the relation does not determine a unique tuple.
  - Since tuples are not identical to objects, there is no inconsistency!
- BCNF violation: name → (street, city), but name is not a superkey. Separate out name-phone.
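A minimal Python sketch of this example, assuming a `Person` dataclass that mirrors the ODL class. The helper names `flatten` and `decompose` are hypothetical. It flattens one object into the persons relation and then separates out name-phone.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    name: str
    street: str
    city: str
    phone: frozenset  # Set<string> in the ODL class

def flatten(person):
    """One tuple per phone number: persons(name, street, city, phone)."""
    return sorted((person.name, person.street, person.city, p)
                  for p in person.phone)

def decompose(rows):
    """Project onto (name, street, city) and (name, phone)."""
    persons = sorted({(n, s, c) for n, s, c, _ in rows})
    phones = sorted({(n, p) for n, _, _, p in rows})
    return persons, phones

obj = Person("n1", "s1", "c1", frozenset({"p1", "p2"}))
rows = flatten(obj)                # two tuples that repeat street and city
persons, phones = decompose(rows)  # the address is now stored once
```

After the decomposition, name is a key of the first relation again, and the phone numbers live in a relation keyed on (name, phone).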
Translating ODL to Relations:
Relationships
- Can create a new relation for each relationship.
- If the relationship is many-to-one from A to B, put the key attributes of B in the relation for class A.
- If the relationship is many-many, we’ll have to duplicate A-tuples, as with set-valued attributes in ODL.
  - Wouldn’t you really rather create a separate relation for a many-many relationship?
  - You’ll wind up separating it anyway, during BCNF decomposition.
Translating ODL to Relations:
Relationships - Example
ODL class:
class Person (extent persons key name) {
attribute string name;
attribute Struct Addr{string street, string city} address;
relationship Set<Cars> ownCar inverse Cars::ownedBy;
relationship Person spouse inverse spouse;
relationship Set<Person> buddies inverse buddies;
}
Relation schema:
persons (name, street, city, ownCar, spouse, buddies)
- BCNF violation: name → (street, city, spouse)
- 4NF violation: name →→ ownCar and name →→ buddies
- Decomposition into 4NF:
  persons (name, street, city, spouse)
  persons_cars (name, ownCar)
  persons_buddies (name, buddies)
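The effect of the two multivalued dependencies can be sketched in Python. The sample values (`car1`, `b1`, and so on) and the function names are hypothetical. The single relation needs one tuple per (car, buddy) combination, while the 4NF relations store each fact once.

```python
from itertools import product

def flatten_person(name, street, city, spouse, cars, buddies):
    """Single relation: one tuple for every combination of car and buddy."""
    return [(name, street, city, spouse, car, buddy)
            for car, buddy in product(sorted(cars), sorted(buddies))]

def decompose_4nf(rows):
    """Project onto the three 4NF schemas."""
    persons = sorted({r[:4] for r in rows})
    persons_cars = sorted({(r[0], r[4]) for r in rows})
    persons_buddies = sorted({(r[0], r[5]) for r in rows})
    return persons, persons_cars, persons_buddies

rows = flatten_person("n1", "s1", "c1", "n2",
                      {"car1", "car2"}, {"b1", "b2", "b3"})
persons, persons_cars, persons_buddies = decompose_4nf(rows)
# 2 cars x 3 buddies give 6 wide tuples, versus 1 + 2 + 3 narrow ones.
```

Note that this naive flattening yields no tuples at all for a person with no cars or no buddies, which is one more reason to prefer the decomposed design.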
Object-Relational Database Systems
(ORDBS)
Data Models & Database System Architectures - Chronological Overview
- Network Data Models (1964)
- Hierarchical Data Models (1968)
- Relational Data Models (1970)
- Object-oriented Data Models (~1985)
- Object-relational Data Models (~1990)
- Semistructured Data Models (XML 1.0) (~1998)
Object-Relational Database Systems (ORDBS)
- Motivations
  - Allow the DBMS to deal with specialized types (maps, signals, images, etc.) with their own specialized methods.
  - Support specialized methods even on conventional relational data.
  - Support structure more complex than “flat files.”
  - …
⇒ Object-oriented ideas enter the relational world
  - Keep the relation as the fundamental abstraction, whereas OODBS use the class as the fundamental abstraction.
ORDBS:
New Features
- Structured types: not only atomic types, but an ODL-like type system. (Also: BLOB, CLOB, ADT, BFILE)
- Methods: special operations can be defined for a type.
- Identifiers: allow a unique ID for each tuple.
- References: pointers to tuples.
ORDBS:
Nested Relations
- Attributes may have non-atomic types
  - Nested-relational data models give up 1NF (atomic values).
  - A relation’s type can be any schema consisting of one or more attributes. An attribute may even have a schema of its own as its type.
- Example:
moviestar(name, address(street,city), birth, movies(title,year))

name   | address (street, city)                    | birth    | movies (title, year)
Fisher | (Maple, Hollywood), (5. Avenue, New York) | 9/9/1950 | (Star Wars, 1977), (Empire, 1980)
Hamill | (Sunset Blvd, LA)                         | 8/8/1962 | (Star Wars, 1977), (Return, 1983)
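As a rough illustration, the nested relation can be modeled in Python with list-valued fields. The `MovieStar` class and the `unnest` helper are assumptions made for this sketch. `unnest` shows how many flat 1NF tuples the same data would need.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class MovieStar:
    name: str
    address: List[Tuple[str, str]]  # nested relation address(street, city)
    birth: str
    movies: List[Tuple[str, int]]   # nested relation movies(title, year)

moviestar = [
    MovieStar("Fisher", [("Maple", "Hollywood"), ("5. Avenue", "New York")],
              "9/9/1950", [("Star Wars", 1977), ("Empire", 1980)]),
    MovieStar("Hamill", [("Sunset Blvd", "LA")],
              "8/8/1962", [("Star Wars", 1977), ("Return", 1983)]),
]

def unnest(stars):
    """Flatten to 1NF: one tuple per (address, movie) combination."""
    return [(s.name, street, city, s.birth, title, year)
            for s in stars
            for street, city in s.address
            for title, year in s.movies]

flat = unnest(moviestar)  # 2*2 tuples for Fisher plus 1*2 for Hamill
```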
ORDBS:
References - I
- Unnormalized relation: the movie (Star Wars, 1977) is stored once for each star.
- Introduce references to allow a tuple t to refer to a tuple s rather than including s in t.

name   | address (street, city)                    | birth    | movies (title, year)
Fisher | (Maple, Hollywood), (5. Avenue, New York) | 9/9/1950 | (Star Wars, 1977), (Empire, 1980)
Hamill | (Sunset Blvd, LA)                         | 8/8/1962 | (Star Wars, 1977), (Return, 1983)
ORDBS:
References - II
- If attribute A has a type that is a reference to a relation with schema R, we denote A as A(*R).
- If A is a set of references, we denote A as A({*R}).
- Example:
moviestar(name, address(street,city), birth, movies({*movies}))
movies(title,year)

moviestar:
name   | address (street, city)                    | birth    | movies
Fisher | (Maple, Hollywood), (5. Avenue, New York) | 9/9/1950 | {*Star Wars, *Empire}
Hamill | (Sunset Blvd, LA)                         | 8/8/1962 | {*Star Wars, *Return}

movies:
title     | year
Star Wars | 1977
Empire    | 1980
Return    | 1983
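Python object references behave much like the `{*movies}` attribute, so the idea can be sketched as follows. The class names are assumptions, and the address attribute is left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Movie:
    title: str
    year: int

@dataclass
class MovieStar:
    name: str
    birth: str
    movies: List[Movie] = field(default_factory=list)  # set of references

# The movies relation: each movie tuple exists exactly once.
star_wars = Movie("Star Wars", 1977)
empire = Movie("Empire", 1980)
return_ = Movie("Return", 1983)
movies = [star_wars, empire, return_]

# moviestar tuples refer to movie tuples instead of containing copies.
fisher = MovieStar("Fisher", "9/9/1950", [star_wars, empire])
hamill = MovieStar("Hamill", "8/8/1962", [star_wars, return_])

shared = fisher.movies[0] is hamill.movies[0]  # both point at one tuple
```

Because both stars reference the same `Movie` object, a change to that movie is made in one place and is visible through both stars.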
OODBS vs. ORDBS - I
Two ways to integrate object-orientation into DBS
→ both directions (OODBS and ORDBS) are also reflected in the standards developments
Several vendors:
commercial OODBS:
- GemStone
- O2 (now: Ardent)
- ObjectivityDB
- ObjectStore
- ONTOS
- POET
- Versant
- ...
commercial ORDBS:
- ORACLE
- Sybase
- Illustra
- UNISQL
- ...
OODBS vs. ORDBS - II
- Objects/tuples: both objects and tuples are structs with components for attributes and relationships.
- Extents/relations: both may share the same declaration among several collections.
- Methods: both have the same ability to declare and define methods associated with a type.
- Type systems: both are based on atomic types and the construction of new types by structs and collection types.
- References/OID: the OODBS OID is hidden; the ORDBS ID is visible (and may be part of the type).
- Backwards compatibility: migrating existing applications to an OODBS requires extensive rewriting, but ORDBSs have maintained backward compatibility.
OODBS vs. ORDBS - III
OODBS:
- simpler way for programmers to use a DBS (familiar with OOPLs)
- “seamlessness”, no “impedance mismatch”
- OO functionality + DBS functionality → higher performance for specific applications
- “revolutionary” approach, no legacy problems
- ...
ORDBS:
- substantial investment in SQL-based rel. DBSs → evolutionary approach
- systems are more robust due to many years of usage and experience
- application development tools
- transaction processing performance
- ...
Prediction: both kinds of systems will exist, used for different kinds of applications.
Semi-Structured Data
Information Integration - I
Problem: related data exists in many places. The sources describe the same things, but differ in model, schema, and conventions (e.g., terminology).
How should one retrieve data from different places?
Examples:
In the real world, every bar has its own database.
- Some may have relations like beer-price; others have a Microsoft Word file from which the menu is printed.
- Some keep phone numbers of manufacturers but not addresses.
- Some distinguish beers and ales; others do not.
Information Integration - II
- Warehousing:
  Copy the information from each data source to a central site and combine it into a global schema. Query the data stored at the warehouse. Reconstruct (recopy) the data daily/weekly/monthly, but do not try to keep it continuously up to date.
- Mediation:
  Create a view of all information, but do not make copies. Answer queries by sending appropriate queries to the sources (no local data).
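The difference between the two approaches can be sketched in a few lines of Python. Everything here is hypothetical: the two sources, the bar name "Bar B", the wrapper functions, and the global schema (bar, beer, price) are made up for illustration only.

```python
# Two hypothetical sources with different local schemas.
source_relation = [{"bar": "Joe's Bar", "beer": "Bud", "price": 2.50}]
source_menu = ["Miller 3.00"]  # lines of a printed menu for "Bar B"

def wrap_relation():
    """Translate the relation-like source to (bar, beer, price)."""
    return [(r["bar"], r["beer"], r["price"]) for r in source_relation]

def wrap_menu():
    """Translate the menu text to (bar, beer, price)."""
    rows = []
    for line in source_menu:
        beer, price = line.rsplit(" ", 1)
        rows.append(("Bar B", beer, float(price)))
    return rows

WRAPPERS = [wrap_relation, wrap_menu]

class Warehouse:
    """Keeps a local copy that is only as fresh as the last refresh."""
    def __init__(self):
        self.rows = []
    def refresh(self):
        self.rows = [row for wrap in WRAPPERS for row in wrap()]
    def query(self, beer):
        return [r for r in self.rows if r[1] == beer]

class Mediator:
    """Keeps no data; every query is forwarded to the sources."""
    def query(self, beer):
        return [r for wrap in WRAPPERS for r in wrap() if r[1] == beer]
```

If a source changes a price, the mediator sees it on the next query, while the warehouse keeps answering with the old price until `refresh()` is run again.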
Semi-Structured Data
- The semi-structured data model allows information from several sources, with related but different properties, to be fitted together into one whole. Thus, it is suitable for
  - integration of databases
  - sharing information on the Web
- Semi-structured data is data that may be irregular or incomplete and has a structure that may change rapidly or unpredictably.
  - It generally has some structure, but does not conform to a fixed schema.
  - “Schemaless” and self-describing, i.e., the data carries information about its own schema (e.g., in terms of XML element tags).
- Characteristics
  - Heterogeneous
  - Irregular structure
  - Large evolving schema
- Major application: XML documents
Semi-Structured Data:
Graph Representation
- Collection of nodes
  - Atomic values on leaf nodes
  - Interior nodes have one or more arcs
- Nodes connected in a general rooted graph structure
- Labels on arcs
  - name of attribute/type
  - relationship
- Example: Beer-Bar-Manufacturer
[Figure: rooted graph for the Beer-Bar-Manufacturer example. Arc labels: manufacturer, beer, bar, name, addr, makes, madeBy, serves, servedAt. Leaf values: bud, miller, Joe’s.]
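A small Python sketch of such a graph follows. The exact wiring of the original figure cannot be recovered from the text, so the arcs below are an assumption that only reuses the arc labels and leaf values listed. The manufacturer's name and address are placeholders, and the `Node` class and `path` helper are invented for this sketch.

```python
class Node:
    """Interior nodes carry labeled arcs; leaf nodes carry an atomic value."""
    def __init__(self, value=None):
        self.value = value
        self.arcs = []  # list of (label, target node)

    def add(self, label, target):
        self.arcs.append((label, target))
        return target

    def follow(self, label):
        return [t for lab, t in self.arcs if lab == label]

def path(node, *labels):
    """All nodes reachable from node along the given sequence of arc labels."""
    nodes = [node]
    for label in labels:
        nodes = [t for n in nodes for t in n.follow(label)]
    return nodes

root = Node()
bud = root.add("beer", Node())
miller = root.add("beer", Node())
joes = root.add("bar", Node())
manf = root.add("manufacturer", Node())

bud.add("name", Node("Bud"))
miller.add("name", Node("Miller"))
joes.add("name", Node("Joe's"))
manf.add("name", Node("(placeholder name)"))
manf.add("addr", Node("(placeholder address)"))

# Relationship arcs turn the tree into a general graph (with cycles).
joes.add("serves", bud)
bud.add("servedAt", joes)
manf.add("makes", bud)
bud.add("madeBy", manf)

served = [n.value for n in path(root, "bar", "serves", "name")]
```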
Extensible Markup Language
(XML)
Extensible Markup Language (XML)
- A World Wide Web Consortium (W3C) standard since 1998
- An XML document is just a file of characters
- Similar to HTML, but
  - HTML uses tags for formatting (e.g., “italic”).
  - XML uses tags for semantics (e.g., “this is an address”).
- Two modes:
  - Well-formed XML allows you to invent your own tags, much like labels in semi-structured data.
  - Valid XML involves a Document Type Definition (DTD) that specifies the allowed tags and gives a grammar for how they may be nested.
XML:
Tags
- Tags are text surrounded by angle brackets, i.e., <...>
- Tags come in matching pairs, e.g., <FOO> is balanced by </FOO>
- Nesting is allowed (an inner pair must start and end within the outer pair), e.g.,
<BAR>
<NAME></NAME>
</BAR>
- Unbalanced tags are not allowed, e.g., <P>, <BR>, and <HR> as used in HTML
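These balancing rules can be checked with any XML parser. A minimal sketch using Python's standard `xml.etree.ElementTree` module follows; the helper name `is_well_formed` is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """True if the parser accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

balanced = is_well_formed("<BAR><NAME></NAME></BAR>")      # properly nested
overlapping = is_well_formed("<BAR><NAME></BAR></NAME>")   # ranges overlap
html_style = is_well_formed("<P>first line<BR>second line")  # never closed
```

Only the first document is accepted; the other two are rejected because a tag is closed out of order or not closed at all.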
XML:
Well-Formed XML
- Minimal requirement: an XML declaration and root tags surrounding the entire body
<?xml version="1.0" standalone="yes"?>
<XXX>
.....
</XXX>
NOTE 1: the declaration gives the XML version
NOTE 2: there is no DTD specified
XML:
Well-Formed XML: Example
<?xml version="1.0" standalone="yes"?>
<BARS>
<BAR> <NAME>Joe’s Bar</NAME>
<BEER> <NAME>Bud</NAME>
<PRICE>2.50</PRICE>
</BEER>
<BEER> <NAME>Miller</NAME>
<PRICE>3.00</PRICE>
</BEER>
</BAR>
<BAR>
...
</BAR>
</BARS>
NOTE 1: only balanced tags
NOTE 2: the value appears between its two surrounding tags
NOTE 3: nesting within the same range
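To show how such a document is consumed, here is a short sketch that parses a copy of the example with Python's standard `xml.etree.ElementTree` and turns it into (bar, beer, price) rows. The elided second bar is left out, and the function name `price_list` is an assumption.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0" standalone="yes"?>
<BARS>
  <BAR><NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER>
  </BAR>
</BARS>"""

def price_list(xml_text):
    """Return one (bar, beer, price) row per BEER element."""
    root = ET.fromstring(xml_text)
    rows = []
    for bar in root.findall("BAR"):
        bar_name = bar.findtext("NAME")
        for beer in bar.findall("BEER"):
            rows.append((bar_name,
                         beer.findtext("NAME"),
                         float(beer.findtext("PRICE"))))
    return rows

rows = price_list(DOC)
```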
XML:
Document Type Definitions (DTD)
- Essentially a grammar describing the legal nesting of tags
- The intention is that DTDs will be standards for a domain, used by everyone preparing or using data in that domain
  Example: a DTD for describing protein structure; a DTD for describing bar menus, etc.
- Structure of a DTD:
<!DOCTYPE root-tag [
<!ELEMENT name (components)>
... more elements ...
]>
- The root-tag is used to surround the document that uses these rules
XML:
Elements of a DTD
- An element is a name (its tag) and a parenthesized description of the tags within the element.
- Special case: (#PCDATA) after an element name means it is text.
- Each element name is a tag.
- Its components are the tags that appear nested within, in the order specified.
- Multiplicity of a tag is controlled by:
  1. * = zero or more of.
  2. + = one or more of.
  3. ? = zero or one of.
- In addition: | = “or.”
XML:
DTD: Example
<!DOCTYPE BARS [
<!ELEMENT BARS (BAR*)>
<!ELEMENT BAR (NAME, BEER+)>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT BEER (NAME, PRICE)>
<!ELEMENT PRICE (#PCDATA)>
]>
NOTE 1: BARS is root-tag
NOTE 2: multiplicity of tags
NOTE 3: name (and price) has a text value
NOTE 4: inside the <BARS>-tag we’ll find zero or more <BAR>-tags
NOTE 5: a BAR has a name and serves one or more beers (which in turn have components)
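Because a DTD is essentially a grammar, each content model can be read as a regular expression over the sequence of child tags. The following Python sketch illustrates that reading for the example DTD. It is not a real validating parser: the `DTD` dictionary, `model_to_regex`, and `is_valid` are names invented here, and stray text inside non-#PCDATA elements is not checked.

```python
import re
import xml.etree.ElementTree as ET

# Content models copied from the example DTD.
DTD = {
    "BARS": "(BAR*)",
    "BAR": "(NAME, BEER+)",
    "NAME": "(#PCDATA)",
    "BEER": "(NAME, PRICE)",
    "PRICE": "(#PCDATA)",
}

def model_to_regex(model):
    """Translate a content model into a regex over 'TAG,TAG,...' strings."""
    parts = []
    for token in re.findall(r"[A-Za-z_][\w.-]*|[()|*+?]", model):
        if token == "(":
            parts.append("(?:")
        elif token in ")|*+?":
            parts.append(token)  # same meaning in DTDs and in regexes
        else:
            parts.append("(?:" + re.escape(token) + ",)")
    return re.compile("".join(parts))

def is_valid(element, dtd=DTD):
    """Check element and all its descendants against the content models."""
    model = dtd.get(element.tag)
    if model is None:
        return False                  # undeclared element
    if model.replace(" ", "") == "(#PCDATA)":
        return len(element) == 0      # text only, no child elements
    children = "".join(child.tag + "," for child in element)
    if not model_to_regex(model).fullmatch(children):
        return False
    return all(is_valid(child, dtd) for child in element)
```

For instance, `(NAME, BEER+)` becomes a pattern that accepts the child sequence `NAME,BEER,BEER,` but rejects `NAME,` because `+` demands at least one BEER.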
XML:
Using a DTD
- To use a DTD, set standalone = "no":
<?xml version="1.0" standalone="no"?>
- Either
  - Include the DTD as a preamble, or
  - Follow the XML declaration with a DOCTYPE declaration giving the root tag, the keyword SYSTEM, and a file where the DTD can be found.
XML:
Using a DTD: Example
<?xml version="1.0" standalone="no"?>
Either refer to a DTD in a separate file:
<!DOCTYPE BARS SYSTEM "bar.dtd">
or include the DTD as a preamble:
<!DOCTYPE BARS [
<!ELEMENT BARS (BAR*)>
<!ELEMENT BAR (NAME, BEER+)>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT BEER (NAME, PRICE)>
<!ELEMENT PRICE (#PCDATA)>
]>
<BARS>
<BAR><NAME>Joe’s Bar</NAME>
<BEER> <NAME>Bud</NAME>
<PRICE>2.50</PRICE></BEER>
<BEER> <NAME>Miller</NAME>
<PRICE>3.00</PRICE></BEER>
</BAR>
<BAR> ...
</BAR>
</BARS>
NOTE 1: the DTD may be in a separate file
NOTE 2: the DTD may be included as a preamble
NOTE 3: BARS is the root-tag and surrounds the document that uses these rules
NOTE 4: BEER has a name and a price
NOTE 5: BAR has a name and serves one or more beers
XML:
Attribute Lists
- Opening tags can have “arguments” that appear within the tag, in analogy to constructs like <A HREF = ...> in HTML.
- The keyword !ATTLIST introduces a list of attributes and their types for a given element in the DTD.
- Example of declaration:
<!ELEMENT BAR (NAME, BEER*)>
<!ATTLIST BAR type (sushi | sports | other) #IMPLIED>
- Bar objects can have a type, and the value of that type is limited to the three strings shown.
- Example of use:
<BAR type = "sports">
. . .
</BAR>
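On the application side, the attribute is read like any other XML attribute. The sketch below uses Python's standard `xml.etree.ElementTree`, which does not validate against a DTD, so the check against the three allowed strings is done by hand. The function name `bar_type` is hypothetical, and treating the attribute as optional is an assumption.

```python
import xml.etree.ElementTree as ET

ALLOWED_BAR_TYPES = {"sushi", "sports", "other"}

def bar_type(xml_text):
    """Return the type attribute of a BAR element, or None if it is absent."""
    bar = ET.fromstring(xml_text)
    value = bar.get("type")
    if value is not None and value not in ALLOWED_BAR_TYPES:
        raise ValueError("illegal BAR type: " + value)
    return value

kind = bar_type('<BAR type="sports"><NAME>Joe\'s Bar</NAME></BAR>')
```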
XML:
ID’s and IDREF’s
- ID is used to give a unique name to an element/object.
- IDREF is used to provide pointers to elements/objects (by their ID name). IDREFS is used when one attribute may hold a set of references, so multiple object references within one tag are allowed.
- Analogous to NAME = foo and HREF = #foo in HTML.
- Allows the structure of an XML document to be a general graph, rather than just a tree.
XML:
ID’s and IDREF’s: Example
- Let us include in our Bars document type elements that are the manufacturers of beers, and have each beer object link, with an IDREF, to the proper manufacturer object:
<!DOCTYPE BARS [
<!ELEMENT BARS (BAR*)>
<!ELEMENT BAR (NAME, BEER+)>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT MANUFACTURER (ADDR,...)>
<!ATTLIST MANUFACTURER name ID #REQUIRED>
<!ELEMENT ADDR (#PCDATA)>
<!ELEMENT BEER (NAME, PRICE)>
<!ATTLIST BEER manf IDREF #IMPLIED>
<!ELEMENT PRICE (#PCDATA)>
]>
...
<MANUFACTURER name="X">...</MANUFACTURER>
...
<BEER manf="X"><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
NOTE 1: MANUFACTURER has a name-ID
NOTE 2: BEER has a pointer to a manufacturer
NOTE 3: the IDREF value in BEER equals the ID value in the corresponding manufacturer
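How an application might follow such a pointer can be sketched with Python's standard `xml.etree.ElementTree`. The module has no built-in ID/IDREF support, so the index is built by hand. The sample document is an assumption: the placement of MANUFACTURER under BARS and the address text are made up, and `build_id_index` and `manufacturer_of` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

DOC = """<BARS>
  <MANUFACTURER name="X"><ADDR>(placeholder address)</ADDR></MANUFACTURER>
  <BAR><NAME>Joe's Bar</NAME>
    <BEER manf="X"><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
  </BAR>
</BARS>"""

def build_id_index(root):
    """Map each ID value to its MANUFACTURER element; IDs must be unique."""
    index = {}
    for manufacturer in root.iter("MANUFACTURER"):
        ident = manufacturer.get("name")
        if ident in index:
            raise ValueError("duplicate ID: " + ident)
        index[ident] = manufacturer
    return index

def manufacturer_of(beer, index):
    """Follow the IDREF in BEER; raises KeyError for a dangling reference."""
    return index[beer.get("manf")]

root = ET.fromstring(DOC)
index = build_id_index(root)
beer = root.find("BAR/BEER")
maker = manufacturer_of(beer, index)
```

Following `manf` leads from a node inside the BAR subtree to a node elsewhere in the document, which is what makes the structure a graph rather than a tree.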
Summary
- From ODL design to relational design
- Object-relational DBSes (ORDBS)
- Semi-structured data
- Extensible Markup Language (XML)