First, a schema that directly mirrors the relational model:
<!DOCTYPE Hogwarts [ <!ELEMENT Hogwarts (Teachers, TeacherTitles, Rooms, Courses, Classes)> <!ELEMENT Teachers (Teacher*)> <!ELEMENT Teacher EMPTY> <!ATTLIST Teacher name ID #REQUIRED room IDREF #REQUIRED> <!ELEMENT TeacherTitles (TeacherTitle*) > <!ELEMENT TeacherTitle EMPTY > <!ATTLIST TeacherTitle teacher IDREF #REQUIRED title CDATA #REQUIRED> <!ELEMENT Rooms (Room*)> <!ELEMENT Room EMPTY> <!ATTLIST Room name ID #REQUIRED nrSeats CDATA #IMPLIED> <!ELEMENT Courses (Course*)> <!ELEMENT Course EMPTY> <!ATTLIST Course name ID #REQUIRED teacher IDREF #REQUIRED nrStudents CDATA #IMPLIED> <!ELEMENT Classes (Class*)> <!ELEMENT Class EMPTY> <!ATTLIST Class course IDREF #REQUIRED day CDATA #REQUIRED hour CDATA #REQUIRED> ]>
Three levels everywhere, for tables, rows, and attributes respectively. Note that we cannot declare composite keys, so Classes and TeacherTitles don't have any ID attributes.
Then a more natural approach:
<!DOCTYPE Hogwarts [ <!ELEMENT Hogwarts (Rooms, Teachers, Courses) > <!ELEMENT Rooms (Room*) > <!ELEMENT Room EMPTY > <!ATTLIST Room name ID #REQUIRED nrSeats CDATA #IMPLIED > <!ELEMENT Teachers (Teacher*) > <!ELEMENT Teacher (Title*) > <!ELEMENT Title (#PCDATA) > <!ATTLIST Teacher name ID #REQUIRED room IDREF #REQUIRED > <!ELEMENT Courses (Course*) > <!ELEMENT Course (Class*) > <!ATTLIST Course name ID #REQUIRED teacher IDREF #REQUIRED nrStudents CDATA #IMPLIED > <!ELEMENT Class EMPTY > <!ATTLIST Class day CDATA #REQUIRED hour CDATA #REQUIRED > ]>
There are room for differences of course. We might choose to make Classes children of Rooms instead, and include an IDREF course, to get a model similar in spirit to the one we got when doing the 3NF decomposition in (the solutions of) exercise 3. Similarly we could add an IDREF room to the Class element in the model above, to get a model similar inspirit to the one that we got from the E-R diagram in exercise 2. There are other (endless) variations as well, but these are (in my opinion at least) the most natural.
Files containing both the DTDs and XML data are available on-line:
These XML files have been validated using the W3C validation service.
An XML document conforming to the DTD above:
<?xml version="1.0" standalone="yes" ?> <!-- put the DTD here --> <Hogwarts> <Rooms> <Room name="The_Dungeon" nrSeats="34" /> <Room name="The_Cabin" nrSeats="163" /> </Rooms> <Teachers> <Teacher name="Snape" room="The_Dungeon" > <Title>Professor</Title> </Teacher> <Teacher name="Hagrid" room="The_Cabin" /> </Teachers> <Courses> <Course name="Potioncraft" teacher="Snape" nrStudents="28"> <Class day="Monday" hour="10" /> </Course> <Course name="Handling_of_Wild_Creatures" teacher="Hagrid"> <Class day="Saturday" hour="13" /> <Class day="Thursday" hour="7" /> </Course> </Courses> </Hogwarts>
(These solutions have been tested using an on-line XPath tester and also the XPath progam on the Linux system at Chalmers.)
//Course[@nrStudents >= 20]
//Teacher[Title = "Professor"]
Can't be written, since we can't follow references!
//Course[Class/@day = "Monday"]
Note that which of the last two that cannot be written depends on how we wrote our DTD. If we made Classes children of Rooms instead, then it would be the last one that couldn't be written instead.
In some cases I will write more than one answer just to show different ways of writing a query.
Find all courses that have at least 20 students
Note that the following solutions only work where there is a single Course element in the result:
let $cs := doc("xml6natural.xml")//Course[@nrStudents >= 20] return $cs for $c in doc("xml6natural.xml")//Course where $c/@nrStudents >= 20 return $c doc("xml6natural.xml")//Course[@nrStudents >= 20]
In general, there will be a seuqence of elements in the result, so we need to put tags around the sequence, e.g.
<Result> { let $cs := doc("xml6natural.xml")//Course[@nrStudents >= 20] return $cs } </Result> let $cs := doc("xml6natural.xml")//Course[@nrStudents >= 20] return <Result>{$cs}</Result> <Result>{doc("xml6natural.xml")//Course[@nrStudents >= 20]}</Result>
Find the room(s) with the highest number of seats.
The following solutions assume that there is only one element in the result. If there can be more than one, then we need to put tags around the sequence.
let $d := doc("xml6natural.xml") let $seats := $d//Room/@nrSeats let $maxSeats := max($seats) for $r in $d//Room where $r/@nrSeats = $maxSeats return $r let $d := doc("xml6natural.xml") for $r in $d//Room[@nrSeats = max($d//Room/@nrSeats)] return $r
Find the teacher(s) with the highest number of students on their courses:
The following solutions assume that there is only one element in the result. If there can be more than one, then we need to put tags around the sequence.
let $d := doc("xml6natural.xml") let $maxstudents := max($d//Course/@nrStudents) for $c in $d//Course[@nrStudents = $maxstudents] for $t in $d//Teacher[@name = $c/@teacher] return $t let $d := doc("xml6natural.xml") let $maxstudents := max($d//Course/@nrStudents) for $c in $d//Course for $t in $d//Teacher where $c/@nrStudents = $maxstudents and $t/@name = $c/@teacher return $t
Find all classes where the number of students exceeds the number of seats.
<Result> { let $d := doc("xml6natural.xml") for $c in $d//Course let $nrStudents := $c/@nrStudents let $teacherName := $c/@teacher for $teacher in $d//Teacher[@name = $teacherName] let $roomName := $teacher/@room for $room in $d//Room[@name = $roomName] let $nrSeats := $room/@nrSeats where $nrStudents > $nrSeats return $c/Class } </Result> <Result> { let $d := doc("xml6natural.xml") for $c in $d//Course for $class in $c/Class for $teacher in $d//Teacher[@name = $c/@teacher] for $room in $d//Room[@name = $teacher/@room] let $nrSeats := $room/@nrSeats where $c/@nrStudents > $nrSeats return $class } </Result> <Result> { let $d := doc("xml6natural.xml") for $c in $d//Course for $teacher in $d//Teacher[@name = $c/@teacher] for $room in $d//Room[@name = $teacher/@room] where $c/@nrStudents > $room/@nrSeats return $c/Class } </Result> <Result> { let $d := doc("xml6natural.xml") for $c in $d//Course for $room in $d//Room[@name = $d//Teacher[@name = $c/@teacher]/@room] where $c/@nrStudents > $room/@nrSeats return $c/Class } </Result> <Result> { let $d := doc("xml6natural.xml") for $c in $d//Course where $c/@nrStudents > $d//Room[@name = $d//Teacher[@name = $c/@teacher]/@room]/@nrSeats return $c/Class } </Result>
Find the teacher(s) with the highest average number of students on his or her (their) courses:
<Result> { let $i := ( let $d := doc("xml6natural.xml") for $t in $d//Teacher let $cs := $d//Course[@teacher = $t/@name] let $avg := avg($cs/@nrStudents) where not(empty($cs/@nrStudents)) return <InnerResult average="{$avg}">{$t}</InnerResult> ) let $maxavg := max($i/@average) for $r in $i where $r/@average = $maxavg return $r/Teacher } </Result>
The where clause in the inner subquery is needed to exclude those courses where the number of students is not known. In these cases, the sequence $cs/@nrStudents will be empty, and the XQuery processor would give an error if we try to compute the average of an empty sequence of numbers.
We don't need to assign the values of the aggregate function calls to variables:
<Result> { let $i := ( let $d := doc("xml6natural.xml") for $t in $d//Teacher let $c := $d//Course[@teacher = $t/@name] where not(empty($c/@nrStudents)) return <InnerResult average="{avg($c/@nrStudents)}">{$t}</InnerResult> ) for $r in $i where $r/@average = max($i/@average) return $r/Teacher } </Result> <Result> { let $i := ( let $d := doc("xml6natural.xml") for $t in $d//Teacher let $numbers := $d//Course[@teacher = $t/@name]/@nrStudents where not(empty($numbers)) return <InnerResult average="{avg($numbers)}">{$t}</InnerResult> ) for $r in $i where $r/@average = max($i/@average) return $r/Teacher } </Result>