CHAPTER 2 XML STRUCTURE

Defining Namespaces

Looking at the previous example, you may have already determined that xmlns:cus="http://www.example.com/Customer" is a namespace definition. Usually, and I stress usually, this is how it is done: namespaces are created using a special prefixed attribute name and a URI, like so:

xmlns:prefix="URI"

Based on this definition, prefix refers to the namespace prefix you want to use throughout your document to associate certain elements and attributes with a namespace name (URI). In this example, the Number element within the Customer element becomes cus:Number, and the Number element within the Item element becomes item:Number. Now the XML clearly distinguishes between the meanings of these two elements. You have removed any ambiguity from the document.

These new names being used in the elements are called qualified names, also referred to as QNames. They can be broken down into two parts, separated by a colon: the prefix and the local name. When using namespaced elements, the start and end tags must now contain the qualified name. Again, an exception to this exists, which you will come to in the Default Namespace section.

The significant portion of the namespace declaration is the URI (the namespace name). Once bound to a node or element, this will never change. The prefix, however, is not guaranteed. When you manipulate the tree, such as by moving elements around using the DOM, a namespace collision may occur. This typically happens when an element lower in the tree declares a namespace using a prefix that is already used by one of its ancestors. If you moved an element bound to the ancestor's namespace so that it became a child of this other element, the prefixes would collide because they refer to two different URIs. It is perfectly valid for the prefix to automatically be changed to one that would not conflict. This is covered in more detail in the section Namespace Scope.
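To see why the URI matters and the prefix does not, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The second prefix (c) and the helper names expanded_name and split_qname are assumptions made up for this illustration; only the cus prefix and the Customer URI come from the text.

```python
import xml.etree.ElementTree as ET

# Two documents bind different prefixes to the same namespace name (URI).
DOC_A = '<cus:Number xmlns:cus="http://www.example.com/Customer">12345</cus:Number>'
DOC_B = '<c:Number xmlns:c="http://www.example.com/Customer">12345</c:Number>'


def expanded_name(xml_text):
    """Return the root element name as the parser sees it: {URI}local-name."""
    return ET.fromstring(xml_text).tag


def split_qname(qname):
    """Split a qualified name into (prefix, local name); prefix is '' if absent."""
    prefix, _, local = qname.rpartition(":")
    return prefix, local


print(expanded_name(DOC_A))
print(expanded_name(DOC_B))
print(split_qname("cus:Number"))
```

Both documents produce the same expanded name, which is why a tool is free to rewrite a prefix as long as the URI it is bound to stays the same.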
Elements containing the namespace definition are not part of the namespace unless prefixed. Listing 2-14 shows the Order element within a namespace, because it is prefixed with ord, as specified in the namespace definition. The Order element in Listing 2-15 is not in any namespace even though a namespace is being defined.

Listing 2-14. Element Order Within the http://www.example.com/Order Namespace

Listing 2-15. Element Order Not Within the http://www.example.com/Order Namespace

Namespaces are not required for every element and attribute within a document. You need to remember that namespaces remove ambiguity when there are, or there could be, overlapping names. Looking at the example, the only two elements that require namespacing are Name and Number. It would have been perfectly valid to leave all other elements out of namespaces.

Namespaces can also apply to attributes: an attribute cid with the cus prefix falls within the http://www.example.com/Customer namespace.
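The difference between declaring a namespace and being in one can be checked with a parser. The following is a rough Python sketch, not a reproduction of Listings 2-14 and 2-15: the markup is reconstructed from the description above, and the Customer element and the cid value 12345 are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

CUSTOMER_NS = "http://www.example.com/Customer"

# Both elements declare the ord prefix, but only the first one uses it.
prefixed = ET.fromstring('<ord:Order xmlns:ord="http://www.example.com/Order"/>')
unprefixed = ET.fromstring('<Order xmlns:ord="http://www.example.com/Order"/>')

print(prefixed.tag)    # {http://www.example.com/Order}Order
print(unprefixed.tag)  # Order

# A prefixed attribute is in the namespace; its unprefixed element is not.
customer = ET.fromstring(
    '<Customer xmlns:cus="http://www.example.com/Customer" cus:cid="12345"/>'
)
cid = customer.get("{%s}cid" % CUSTOMER_NS)
print(customer.tag, cid)
```

ElementTree reports names in the {URI}local form, so an element or attribute that is not in any namespace shows up as a bare local name.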
John Smith 12345

The items ordered by the customer take the form of the following structure:

- Book 11111
Combining these into a single document would result in the following: John Smith 12345 - Book 11111
Unless you read the pieces of the document in context, the elements Name and Number are ambiguous. Does Number refer to the customer number or an item number? Right now the only way you can tell is that if you are within an item, then Number must refer to an item number; otherwise, it refers to a customer number. This is just a simple case, but it does get worse, such as when elements appear within the same scope. In any event, using namespaces uniquely identifies the elements and attributes, so there is no need for guesswork or trying to figure out the context. Take the following document, for instance. Separate namespaces have been created for Customer and Item data. Just by looking at the element names, you can easily distinguish to what the data refers. John Smith 12345 Book 11111
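As a sketch of how an application benefits from this, the Python snippet below looks up each Number by namespace instead of by position. The document structure here is an assumption: the Order root element and the Item namespace URI http://www.example.com/Item are hypothetical, and only the cus and item prefixes, the Customer URI, and the data values come from the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical combined document; the root name and Item URI are assumed.
ORDER = """
<Order xmlns:cus="http://www.example.com/Customer"
       xmlns:item="http://www.example.com/Item">
  <cus:Customer>
    <cus:Name>John Smith</cus:Name>
    <cus:Number>12345</cus:Number>
  </cus:Customer>
  <item:Item>
    <item:Name>Book</item:Name>
    <item:Number>11111</item:Number>
  </item:Item>
</Order>
"""

# The prefixes used for searching are local to this code; only the URIs
# have to match the ones in the document.
NS = {
    "c": "http://www.example.com/Customer",
    "i": "http://www.example.com/Item",
}

root = ET.fromstring(ORDER)
customer_number = root.findtext(".//c:Number", namespaces=NS)
item_number = root.findtext(".//i:Number", namespaces=NS)
print(customer_number, item_number)
```

No knowledge of where the element sits in the tree is needed; the namespace alone tells the two Number elements apart.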
The entity reference, when used within the document, is then able to take its meaning from the definition. This is further explained in Chapter 3.

General Entity Declaration

Entity declarations may be either general or parameter entity declarations. Entity declarations will be covered in more depth in Chapter 3, though general entities have some bearing on this discussion with respect to entity references. The common use of general entities is to declare the text replacement value for entity references. General entities are commonly referred to simply as entities unless used in a context where that name would be ambiguous; therefore, for the sake of this section, entities will refer to general entities. Entities are defined within the DTD, which is part of the prolog. Suppose you had the string "This is replacement text", which you want to use many times within the document. You could create an entity with a legal name, in this case "replaceit":

]> &replaceit;

If this document were loaded into a parser that was substituting entities, which means it replaces the entity reference (&replaceit;) with the text string defined in the entity declaration, the results would look something like this:

]> This is replacement text

Using Namespaces

Documents can become quite complex. They can consist of your own XML as well as XML from outside sources. Element and attribute names can start overlapping, which then makes the names ambiguous. How do you determine whether the name comes from your data or from an outside source? Looking at the document, you would have to guess what the elements and attributes mean depending on the context. Unfortunately, applications processing the XML typically don't understand context, so the document would no longer have the correct meaning. Namespaces solve this potential problem. Namespaces are collections of names identified by URIs.
They are not part of the XML specification but have their own specification that applies to XML. Through the use of namespaces, names within a document are able to retain their original meanings even when combined with another document that contains some of the same names with completely different meanings. Assume you are building a document that includes customer information as well as items they have ordered, and assume your customer records look like the following:
Listing 2-13. Invalid Comments

within a document –> >

Processing Instructions

XML is purely concerned with document content. A processing instruction (PI) allows application-specific instructions to be passed with the document to indicate to the application how it should be processed. The PI begins with <?, which is followed by the target (which must be a valid name) and whitespace, then takes the actual instruction, and closes with ?>, like so:

<?target instruction?>

The target indicates the application that the instruction targets. You might already be familiar with this syntax from PHP:

<?php echo "Hello World"; ?>

This syntax is a PI. The PI target is php, and the instruction is echo "Hello World";. If you were creating an XHTML document and embedding PHP code, this would constitute a well-formed XML document. Another case you may have already encountered is the association of style sheets with an XML document. Many XML editors will add an xml-stylesheet PI so they can easily perform XSL transformations on the XML you may be editing.

Entity References

You have already encountered some of the built-in entity references (&amp;, &lt;, &gt;, &apos;, and &quot;) throughout this chapter. Just as characters can be represented using numeric character references, entity references are used to reference strings, which are defined in the DTD. They take the form of &, which is followed by a legal name, and they terminate with a semicolon. You are probably familiar with the concept from HTML:

Copyright &copy; 2002
The entity reference &copy; is defined in the HTML DTD and represents the copyright symbol. Entity references cannot just be used blindly, however. The document must provide a meaning for an entity reference. For instance, if you were looking at a document that contained &myref;, the entity reference &myref; would have absolutely no meaning to you, or it might mean something completely different to you than to me. You can use DTDs to define an entity reference. Any entity reference, other than those that are built in, must be defined. Looking at an HTML page, you may notice the DOCTYPE declaration at the top of the page. The contents depend upon the type of HTML you are writing. For instance, -//W3C//DTD HTML 4.01 Transitional//EN refers to the DTD http://www.w3.org/TR/html4/loose.dtd. This file contains a reference to http://www.w3.org/TR/html4/HTMLlat1.ent. If you look at the contents of this file, you will notice that the entity copy is defined as the character reference &#169;.
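The requirement that every entity reference be defined is easy to observe with a parser. Here is a minimal Python sketch using xml.etree.ElementTree; the root element name document and the helper parse_text are assumptions for illustration, while the entity names myref and replaceit are the ones used in this chapter.

```python
import xml.etree.ElementTree as ET

# An internal DTD subset declares the entity before it is referenced.
DECLARED = """<!DOCTYPE document [
  <!ENTITY replaceit "This is replacement text">
]>
<document>&replaceit;</document>"""

# No declaration anywhere, so the reference has no meaning.
UNDECLARED = "<document>&myref;</document>"

# Built-in entity references need no declaration.
BUILT_IN = "<document>&amp; &lt; &gt; &apos; &quot;</document>"


def parse_text(xml_text):
    """Return the root element's text, or a message if parsing fails."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError as err:
        return "parse error: %s" % err


print(parse_text(DECLARED))
print(parse_text(UNDECLARED))
print(parse_text(BUILT_IN))
```

The declared entity is replaced with its text, the built-in references resolve on their own, and the undeclared reference stops the parse with an error.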
Listing 2-11. Example Using CDATA Section

this & that ]]>

Clearly, the document in Listing 2-11 is much easier to read than the one in Listing 2-10. If you are editing a document by hand, it is also easier to write because you don't need to be concerned with figuring out which entities to use. Because of the flexibility of CDATA sections, you may have heard or read somewhere that CDATA is great to use for binary data. In its native form, this is not true. You have no guarantee that the binary data will not contain the characters ]]>. For this reason, binary data should be encoded using a format such as Base64. Now, if Base64 is used for encoding, a CDATA section is not even necessary, and the data can be embedded directly as an element's content. This is because Base64 does not use any of the characters that would be deemed illegal for element content.

Comments

You can use comments to add notes to a document. This is comparable to a developer adding comments to source code. They do not affect the document but can be used to add some notes or information for someone reading it. For this reason, parsers are not required to parse comments, although most will allow access to the content. This is what a comment looks like:

<!-- This is a comment -->

Comments consist of the initial <!--, the text of the comment, and the closing -->. Be aware of the following stipulations when using comments:

The content for a comment must not contain --.
A comment may not end with -.

Other than those conditions, comments can contain any other characters. Comments may also occur anywhere after the XML declaration as long as they are not contained within markup. Listing 2-12 shows some valid comments, and Listing 2-13 shows some invalid ones.

Listing 2-12. Valid Comments
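The comment rules can be tried against a parser. The following Python sketch is an illustration only and does not reproduce the book's listings: the sample documents and the helper is_well_formed are made up for this example, and the behavior shown is that of the Expat-based parsers in Python's standard library.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom, Node


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


VALID = "<doc><!-- a note for the reader --><item/></doc>"
DOUBLE_HYPHEN = "<doc><!-- not -- allowed --></doc>"      # contains --
ENDS_WITH_HYPHEN = "<doc><!-- trailing hyphen ---></doc>"  # ends with -
INSIDE_MARKUP = "<doc <!-- inside a tag -->></doc>"        # within markup

for sample in (VALID, DOUBLE_HYPHEN, ENDS_WITH_HYPHEN, INSIDE_MARKUP):
    print(is_well_formed(sample), sample)

# A DOM parser keeps the comment as a node, so its content stays accessible.
comment = minidom.parseString(VALID).documentElement.firstChild
print(comment.nodeType == Node.COMMENT_NODE, repr(comment.data))
```

ElementTree silently drops comments by default, which matches the point that a parser does not have to hand them to the application, while minidom exposes them as comment nodes.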
Could multiple values apply to an element?
Is a DTD requiring the attribute being used?
Is the data essential to the document and not just an instruction for an application?
Is the value complex data or difficult to understand?
Does the value need to be extensible for potential future use?

If the questions aren't applicable, then it comes down to personal preference. One point to always remember is that the document should end up being easily understood by a human and not just meant for electronic processing. With this in mind, you have to ask yourself which of the following is easier to understand. This is the first choice:

and this is the second choice:

Ford black 1990 Escort

CDATA

CDATA sections allow the use of all valid Unicode characters in their literal forms. The contents of a CDATA section bypass parsing, so they are great to use when you need to include content containing markup that should be taken in its literal form and not processed as part of the document. CDATA sections begin with <![CDATA[ and end with ]]>.

<xml version="1.0"?> <document> this & that </document>
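A short Python sketch shows that a CDATA section and escaped text are two spellings of the same content. The sample markup inside the section is an assumption chosen for illustration, not the content of Listings 2-10 and 2-11.

```python
import xml.etree.ElementTree as ET

# The same literal text written two ways: once inside a CDATA section,
# once with every special character escaped as an entity reference.
WITH_CDATA = "<document><![CDATA[<b>this & that</b>]]></document>"
WITH_ENTITIES = "<document>&lt;b&gt;this &amp; that&lt;/b&gt;</document>"

cdata_root = ET.fromstring(WITH_CDATA)
entity_root = ET.fromstring(WITH_ENTITIES)

# The markup inside the CDATA section was not parsed into child elements.
print(len(cdata_root))
print(cdata_root.text)
print(cdata_root.text == entity_root.text)
```

The parser hands the application identical text in both cases; the CDATA form is simply easier to read and write by hand.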
Attributes within an element must be uniquely named, meaning an element cannot contain more than one attribute with the same name. Listing 2-9 shows an invalid attribute usage.

Listing 2-9. Invalid Attribute Usage

Attributes also have no specified order within the element, so the following two examples are identical, even though the order and quoting are different:

Attribute Values

Attributes must also have a value, even if the value is empty. Again, referring to HTML, you may be accustomed to seeing lone attribute names such as <hr noshade> or <frame noresize>. Notice that noshade and noresize have no defined values. These are not well-formed XML and to be made conformant must be written as <hr noshade="noshade" /> and <frame noresize="noresize" />, which now makes them XHTML and XML compliant. In cases where an attribute value is empty and there are no rules for any default values, such as those for converting HTML to XHTML, you would write an attribute as such: attrname="". Attribute values also cannot contain unescaped < or & characters. Also, you should use escaped characters for double and single quotes. Although it is legal to use a literal single quote character within an attribute value that is enclosed in double quotes (though in this case double quote characters must be escaped), it is not good practice and is highly discouraged. Suppose you wanted to add some attributes to the element Car with the following name/value pairs:

color: Black and white
owner: Rob's
score: Less than 5

You would write this as follows:
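The attribute rules in this section can be verified with a parser. What follows is a rough Python sketch rather than the book's own listing: the sample markup, the helper is_well_formed, and the attribute values containing & and < are assumptions picked so that escaping is actually required.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


DUPLICATE = '<Car color="black" color="white"/>'  # same name used twice
NO_VALUE = "<hr noshade/>"                         # attribute without a value
WITH_VALUE = '<hr noshade="noshade"/>'
UNESCAPED = '<Car score="Less than < 5"/>'         # literal < in a value

# Same attributes, different order and quote style.
ORDER_A = '<Car color="black" year="1990"/>'
ORDER_B = "<Car year='1990' color='black'/>"
same = ET.fromstring(ORDER_A).attrib == ET.fromstring(ORDER_B).attrib

# Let the library do the escaping when writing values out.
car = ET.Element("Car", {"color": "Black & white", "owner": "Rob's", "score": "< 5"})
serialized = ET.tostring(car, encoding="unicode")

print(is_well_formed(DUPLICATE), is_well_formed(NO_VALUE), is_well_formed(UNESCAPED))
print(is_well_formed(WITH_VALUE), same)
print(serialized)
```

Building the element through an API and serializing it avoids hand-escaping mistakes, because the serializer replaces & and < with their entity references for you.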
Attribute Use

The use of attributes, unless specifically required such as through a DTD, is really a choice left to the document author. You will find opinions on attribute use running the full spectrum, with some saying you should never use attributes. When considering whether you should use an attribute or whether it should be a child element, you have a few facts to consider. If you can answer yes to any of the following questions, then you should use an element rather than an attribute:
This clearly indicates something is wrong with the document. The document in Listing 2-8 applies the rules for XML elements to the document from Listing 2-7 to produce a well-formed XML document.

Listing 2-8. HTML Example Using Well-Formed XML
This is in Italics and this is Bold
This might also give you an inkling of why Extensible HTML (XHTML) was created. XHTML is a stricter version of HTML that not only can be processed by a browser but, because it is XML compliant, can also be processed by applications.

Attributes

You can think of attributes as properties of an element, similar to properties of an object. You might be shaking your head right now, completely disagreeing with me. You are 100 percent correct, but for a simple document and to give at least a basic idea of what they are, I will use that analogy for now. Attributes can exist within element start tags and empty-element tags. In no case may they appear in an element end tag. Attributes take the form of name/value pairs using the following syntax: Name="Value" or Name='Value'. You can surround values with either double or single quotes. However, you must use the same type of quote to open and close a given attribute's value. It is also perfectly acceptable to use one style of quotes for one attribute and another style for a different attribute. The attribute name must conform to the constraints defined by the term name earlier in this chapter. Also, attributes
One problem I ran into years ago, before the days of CSS, was related to forms and tables. Depending upon the placement of the form and table tags, additional whitespace would appear in the rendered page within a Web browser. To remove the additional whitespace, designers would open forms prior to the table tag and close them before closing the table. Web browsers, being forgiving, would render the output correctly without the extra whitespace even though the syntax of the document was not actually correct. As far as XML is concerned, that type of document is not well-formed and will not parse. Elements must be properly nested, which means they must be opened and closed within the same scope. In Listing 2-7, the table tag is opened within the scope of the form tag but closed after the form tag has been closed. Even though it may render when viewed in a browser, the structure is broken and flawed because the form tag should not be closed until all tags residing within its scope have been properly terminated.

Each time an element tag (start, end, or empty element) is encountered, you should insert a line feed and a certain number of indents. Typically for each level of the tree you descend (each time you encounter an element start tag), you should indent one more time than you did the previous time. When ascending the tree (each time an element's end tag is encountered), you should indent one less time than previously. Because an empty-element tag serves both purposes, it can be ignored. If you tried to do this with the example from Listing 2-7, you just could not do it. Using whitespace for formatting also makes it pretty easy to spot where it is broken as well:
This is in Italics and this is Bold
New Line here
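The nesting rule and the indentation exercise can both be sketched in a few lines of Python. This is an illustration under assumptions: the form and table markup below is a made-up stand-in for Listing 2-7, and the helper outline is a hypothetical name for the indent-per-level walk described above.

```python
import xml.etree.ElementTree as ET

# table is opened inside form but closed after form has been closed.
BROKEN = "<form><table><tr><td>field</td></tr></form></table>"
# Every element is closed within the scope in which it was opened.
FIXED = "<form><table><tr><td>field</td></tr></table></form>"


def outline(element, depth=0):
    """Yield one line per element, indented one more step per level descended."""
    yield "  " * depth + element.tag
    for child in element:
        yield from outline(child, depth + 1)


try:
    ET.fromstring(BROKEN)
    broken_parses = True
except ET.ParseError as err:
    broken_parses = False
    print("not well-formed:", err)

lines = list(outline(ET.fromstring(FIXED)))
print("\n".join(lines))
```

The outline can only be produced for the properly nested version, because the parser refuses to build a tree for the other one.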
Notice that the last example does contain content. Even though it's only a single space, the element contains content. Every character, including whitespace, is considered content.

Element Hierarchy

The most important point to remember when dealing with XML is that it must be well-formed. This may be redundant information, but if you are coming from the HTML world, it can be easy to forget. It's easy to get away with malformed documents when writing HTML, especially because not all tags are required to be closed. Take the HTML document shown in Listing 2-7, for example.

Listing 2-7. HTML Example
This is all in Italics and this is Bold
New line here