Structuring Text with XHTML

Introduction

This is the textstructure.html sample file modified to demonstrate the use of a variety of CSS selectors.

In addition to the basic structural elements that we can use to give our Web pages a logical structure, there are several elements that can be used to provide the browser with finer-grained guidance as to the internal structure of the page's content.

This page was written to provide students in my CS403 class with a simple sample document that demonstrates the elements of XHTML that are most commonly used for this purpose. The primary motivation has been to generate a relatively simple, but realistic, document that not only reviews the concepts presented in lecture, but also demonstrates their use in a "real world" situation.

This document is focused primarily upon the elements that apply an internal structure to the content of an XHTML document. It assumes the reader understands the higher-level structural elements that have been presented earlier and uses them without further explanation.

To get the most out of this document, you should first read it and then study the source that the browser rendered in order to produce it. Use your browser's ability to display the source of a document for this purpose.

Emphasis

When composing text, it's very common to require that specific portions of the text be emphasized. In XHTML, we simply place the text to be emphasized within an <em> element.

The <em> element asks the browser to emphasize its contents, but leaves the decision as to how that emphasis is to be achieved to the browser's discretion. This makes a great deal of sense on the Web, since as authors we generally have no way to predict what the capabilities of the browsers displaying our pages will be.

Because the <em> element is an inline element, it may be nested within block elements, such as paragraphs, without forcing the block to end. For example, I've emphasized the proper nouns in the following paragraph. Note how your browser chooses to present them differently than the surrounding text. Remember, however, that you should resist the temptation to use the <em> element for its presentational effects and use it only for its structural meaning.

My name is Mike Gildersleeve. I teach for the Computer Science Department at The University of New Hampshire. The courses that I teach include Online Network Exploration, Web Design and Development and Introduction to Client-side Web Development.
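
The emphasis in the paragraph above is not visible in plain text, so here is a minimal sketch of what its source might look like. Exactly which phrases are wrapped in <em> elements is an assumption; only the opening sentences are shown.

```html
<!-- Sketch only: the choice of emphasized phrases is assumed. -->
<p>My name is <em>Mike Gildersleeve</em>. I teach for the
<em>Computer Science Department</em> at
<em>The University of New Hampshire</em>.</p>
```

Note that each <em> element opens and closes inside the paragraph, so the surrounding <p> block is never interrupted.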

Sometimes you need to emphasize a portion of your text in such a way that it is conveyed with even more impact than emphasized text that appears elsewhere in the page. For this purpose, you should use the <strong> element. The <strong> element instructs the browser to present the text it contains with even stronger emphasis than the browser would employ for an <em> element.

For example, consider the paragraph from above with additional emphasis placed on the name of our university. Most browsers will present the strongly emphasized text differently than the emphasized text.

My name is Mike Gildersleeve. I teach for the Computer Science Department at The University of New Hampshire. The courses that I teach include Online Network Exploration, Web Design and Development and Introduction to Client-side Web Development.
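
As a rough sketch of the source behind this version, the university name might be moved from an <em> element into a <strong> element while the other emphasized phrases stay as they were. The exact placement of the elements is an assumption.

```html
<!-- Sketch only: one phrase gets <strong>, the rest keep <em>. -->
<p>My name is <em>Mike Gildersleeve</em>. I teach for the
<em>Computer Science Department</em> at
<strong>The University of New Hampshire</strong>.</p>
```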

As you can see from the sample paragraph above, the key to effective use of emphasis is restraint. The more you use emphasis, the less emphasis it actually conveys. Consider the extreme case where an entire page is emphasized. In this case, the emphasis is completely ineffective.

Abbreviations and acronyms

When you use abbreviations and acronyms within your Web pages, it's important to consider the accessibility issues that they raise. In relation to Web authoring, accessibility refers to the ease with which disabled users are able to access the informational content of your pages.

Although accessibility is a wide-ranging topic, the primary concern at the moment is how a screen reader might handle abbreviations and acronyms. A screen reader is a software program that blind and low-vision Web users can employ to read Web pages aloud. Screen readers generally handle ordinary words well. However, abbreviations and acronyms present a special challenge because they can often be indistinguishable from words.

Using the <abbr> and <acronym> elements appropriately helps screen readers to avoid potential confusion by telling them explicitly to treat the element's content as an abbreviation or acronym, respectively. Both of these elements should use a title attribute to give the screen reader the expanded form of the abbreviated text for it to read aloud.

Consider the following example, which contains both abbreviations and acronyms, each with an appropriate title attribute. Although you likely don't have access to a screen reader, try viewing the following paragraph in different visual browsers to get a feel for how they treat these elements. Try mousing over the abbreviations and acronyms. What, if anything, happens when you do?

I earned my B.A. from Dartmouth College and my M.S. from UNH.
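
A sketch of how that sentence might be marked up follows. The title values, and the choice of <acronym> rather than <abbr> for UNH, are assumptions made for illustration.

```html
<!-- Sketch only: title text and element choices are assumed. -->
<p>I earned my <abbr title="Bachelor of Arts">B.A.</abbr> from
Dartmouth College and my <abbr title="Master of Science">M.S.</abbr>
from <acronym title="University of New Hampshire">UNH</acronym>.</p>
```

The title attribute carries the expanded form, which is what a screen reader can read aloud in place of the shortened text.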

Other phrase elements

In the technical terminology of the HTML 4.01 Specification, the elements we've been discussing are called phrase elements, because they add structural information to text fragments. Although those discussed above are the ones you are likely to find most useful, there are several other, less commonly used phrase elements that you may find useful from time to time.

Citations

When you reference the source of a quote or idea in your document, you should do so within a <cite> element. This gives the browser an indication of the role that citation text is playing within the document. The browser can then use that information for both presentational and functional purposes as it sees fit.

For example, I used a <cite> element in the introductory paragraph of the Other phrase elements section above to indicate the source of my quotation as the HTML 4.01 Specification.
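
In source form, that citation might look something like the following sketch, which shows only the start of the sentence.

```html
<!-- Sketch only: the opening of the paragraph that names the source. -->
<p>In the technical terminology of the
<cite>HTML 4.01 Specification</cite>, the elements we've been
discussing are called phrase elements.</p>
```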

Defining instances

In technical writing, the defining instance of a term or phrase is that point within the text where that term or phrase is defined. For example, I have just defined the term defining instance for the first time in this document, thereby making it an example of itself.

You are probably familiar with the practice of using defining instances from your experience with textbooks, where it is quite common. The first time a new term or phrase is introduced and defined, the textbook publisher typically presents it differently than the surrounding text. This enables the reader to quickly backtrack to the defining instance if they subsequently encounter that term or phrase and have forgotten its meaning.

If, in writing your Web pages, you use the <dfn> element to indicate the defining instances of terms and phrases, you will enable the browser to offer the same level of convenience to your readers that you take for granted in your textbooks.
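
As a minimal sketch, the defining instance from the start of this section could be marked up as shown below. Wrapping only the term itself, and not its definition, is an assumption about how the element is applied here.

```html
<!-- Sketch only: the <dfn> element wraps the term being defined. -->
<p>In technical writing, the <dfn>defining instance</dfn> of a term or
phrase is that point within the text where that term or phrase is
defined.</p>
```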

Variables

When discussing technical topics, you will often need to present formulas and equations that are then discussed in the text that follows. If you've ever encountered such a situation in a textbook, you may recall that references to variables within the discussion of an equation are commonly presented differently than the surrounding text.

In XHTML documents, you can use the <var> element to achieve a similar result. Simply enclose any reference to a variable within a <var> element and the browser will be able to tell that it is playing a specific role within your text.

For example, consider the following well-known equation and the paragraph that follows discussing it. Note that I have opted to use <var> elements both within the equation and the discussion that follows. Also note that I am using the <var> elements in strict accordance with their structural purpose in all cases and not for their presentational effect.

E = mc²

In his famous equation, Albert Einstein states that energy, represented by E, is equivalent to mass (m) times the speed of light (c) squared. Since the value of c is widely regarded as constant for all practical purposes, the elegance of this equation is undeniable.
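
A sketch of how the equation and part of its discussion might be marked up is shown below. The use of a <sup> element for the exponent is an assumption, and only the first sentence of the discussion is included.

```html
<!-- Sketch only: each variable reference gets its own <var> element. -->
<p><var>E</var> = <var>m</var><var>c</var><sup>2</sup></p>

<p>Energy, represented by <var>E</var>, is equivalent to mass
(<var>m</var>) times the speed of light (<var>c</var>) squared.</p>
```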

User-entered text

If you need to convey computer-related instructions to your readers via a Web page, you'll eventually need to give the reader some text that they're expected to type.

To allow the browser to identify text the user is expected to enter and render it differently from the surrounding text, place the text that the user needs to type into a <kbd> element.

Consider the following paragraph, which contains several examples. Notice that the text which the user is expected to type is presented differently than the surrounding text, making it clear exactly which text the user should enter.

If you are using a terminal emulator to publish your Web pages, there are a handful of Unix commands you will need to know. To establish your main Web directory, type mkdir ~/public_html. To set the permissions for your home directory, enter chmod 755 ~. And, once the files have been placed in your main Web directory, you can set the necessary permissions for the directory and everything it contains with the single command chmod -R 755 ~/public_html.
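
The markup behind a paragraph like this one can be sketched as follows, with each command the reader is expected to type placed in its own <kbd> element. Only the first two commands are shown.

```html
<!-- Sketch only: text the reader must type goes inside <kbd>. -->
<p>To establish your main Web directory, type
<kbd>mkdir ~/public_html</kbd>. To set the permissions for your home
directory, enter <kbd>chmod 755 ~</kbd>.</p>
```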

Code and output samples

In a Web page discussion of computer programming, you might find it necessary to present fragments of computer language code and/or samples of the output that running some code produces. Use the <code> element for fragments of computer language code, and use the <samp> element for the sample output.

Since most students in CS403 will not have any programming background, the most likely use you might have for these elements would be in discussing XHTML or other Web-related languages. Consider the following example which offers a bit of XHTML code followed by a sample of the output it might produce in a browser. Note that since both the <code> and <samp> elements are inline elements, they both should be nested within a block element of some sort.

The XHTML code One<br />Two<br />Three<br />Four produces the following output when rendered by the browser:

One
Two
Three
Four
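
A sketch of the source for this example follows. Because the code fragment itself contains tags, its angle brackets are written as character entities inside the <code> element so that the browser displays them instead of acting on them. The choice of <p> as the enclosing block element is an assumption.

```html
<!-- Sketch only: <code> holds escaped markup, <samp> holds the output. -->
<p>The XHTML code
<code>One&lt;br /&gt;Two&lt;br /&gt;Three&lt;br /&gt;Four</code>
produces the following output when rendered by the browser:</p>

<p><samp>One<br />Two<br />Three<br />Four</samp></p>
```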

Quotations

There are two types of quotation that commonly appear within documents. Short quotations (at least in English documents) are typically embedded inline with the surrounding text and surrounded with quotation marks. Long quotations, on the other hand, are typically set off from the surrounding text as a separate block.

XHTML supports both types of quotations. Use the <q> element for short quotations and the <blockquote> element for long quotations. The browser should use the information provided by these elements to present their contents as a quote in the appropriate form. Note that since the browser is responsible for deciding whether to use quotation marks, you should not include your own quotation marks with either of these elements.

Because different languages may have different quotation conventions, both of these elements can accept a lang attribute that indicates the language in which the quote is being presented. For American English, set the value of the lang attribute equal to en-us.

Both of these elements may also accept a cite attribute, the value of which represents the URL of a Web page representing the source document of the quotation or information on where to find the source document.

As an example of a short quotation consider the following paragraph repeated from earlier in this document. When examining the XHTML source, note that both the cite and lang attributes have been used and that no quotation marks appear around the quote in the source. The quotation marks you see in the rendered page are part of the browser's presentation.

In the technical terminology of the HTML 4.01 Specification, the elements we've been discussing are called phrase elements, because they add structural information to text fragments. Although those discussed above are the ones you are likely to find most useful, there are several other, less commonly used phrase elements that you may find useful from time to time.
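
The following sketch shows how the short quotation in that paragraph might appear in the source. The exact extent of the quoted words and the URL used for the cite attribute are assumptions; substitute the address of the actual source document.

```html
<!-- Sketch only: quote boundaries and the cite URL are assumed. -->
<p>In the technical terminology of the
<cite>HTML 4.01 Specification</cite>, the elements we've been
discussing are called phrase elements, because they
<q cite="http://www.w3.org/TR/html401/struct/text.html"
   lang="en-us">add structural information to text fragments</q>.</p>
```

No quotation marks are typed around the contents of the <q> element; the browser supplies them when it renders the page.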

As an example of a long quotation, consider the following quote on accessibility drawn from the HTML 4.01 Specification. Note that the <blockquote> element also uses both the cite and the lang attributes. Its default presentation does not include quotation marks, but instead sets off the contents of the quote by indenting it from the edges of the window. Also note that the <blockquote> element is allowed to contain nested <p> elements.

To make the Web more accessible to everyone, notably those with disabilities, authors should consider how their documents may be rendered on a variety of platforms: speech-based browsers, braille-readers, etc. We do not recommend that authors limit their creativity, only that they consider alternate renderings in their design. HTML offers a number of mechanisms to this end (e.g., the alt attribute, the accesskey attribute, etc.)

Furthermore, authors should keep in mind that their documents may be reaching a far-off audience with different computer configurations. In order for documents to be interpreted correctly, authors should include in their documents information about the natural language and direction of the text, how the document is encoded, and other issues related to internationalization.
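
A sketch of the markup for this long quotation is shown below. The cite URL is an assumed value, and only the opening sentence of each quoted paragraph is reproduced.

```html
<!-- Sketch only: the cite URL is assumed; paragraphs are shortened. -->
<blockquote cite="http://www.w3.org/TR/html401/intro/intro.html"
            lang="en-us">
  <p>To make the Web more accessible to everyone, notably those with
  disabilities, authors should consider how their documents may be
  rendered on a variety of platforms: speech-based browsers,
  braille-readers, etc.</p>
  <p>Furthermore, authors should keep in mind that their documents may
  be reaching a far-off audience with different computer
  configurations.</p>
</blockquote>
```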

Superscripts and subscripts

Under certain circumstances, especially when composing technical documents, you will find it necessary to create superscripts and/or subscripts. A superscript consists of text that is typically presented in a smaller text size and raised somewhat above the baseline of the surrounding text. A subscript, by comparison, consists of text that is typically presented in a smaller text size and dropped somewhat below the baseline of the surrounding text.

Both superscripts and subscripts are used frequently in math and science. Superscripts can also be useful in writing ordinal numbers and some foreign languages.

To instruct the browser to superscript some text, simply enclose that text in a <sup> element. To instruct the browser to subscript some text, enclose that text in a <sub> element.

Consider the following examples. In examining the source, note that only the portion of text to be superscripted or subscripted appears in the <sup> or <sub> element.

1st place: New Hampshire
2nd place: Maine
3rd place: Massachusetts

100000₂ is how we write the value 32 in the binary number system
A chemist might write down water as H₂O
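
The source for examples like these can be sketched as follows. The enclosing <p> elements and the <br /> line breaks are assumptions about the surrounding structure.

```html
<!-- Sketch only: just the raised or lowered text goes in the element. -->
<p>1<sup>st</sup> place: New Hampshire<br />
2<sup>nd</sup> place: Maine<br />
3<sup>rd</sup> place: Massachusetts</p>

<p>100000<sub>2</sub> is how we write the value 32 in the binary
number system<br />
A chemist might write down water as H<sub>2</sub>O</p>
```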

Preformatted text

Sometimes, for the sake of efficiency or simplicity, you may find it convenient to instruct the browser to present your content exactly as you have typed it in your source, without reducing the whitespace. To accomplish this, place the text within a <pre> element.

When it encounters a <pre> element, the browser temporarily stops reducing sequences of whitespace characters to single spaces and presents the contents of the element exactly as they appear in the XHTML source. That means that the carriage returns, blank lines and indentation used in typing the source will all get carried through to the rendered page.

Consider the following ASCII image of a cat (obtained from The Great Ascii Art Library at http://www.geocities.com/SouthBeach/Marina/4942/ascii.htm) which relies on its whitespace as an integral component of its imagery.

         |              |
       .' `.          .' `.
      ; :   \_..--.._/   : .
      | . '            ` . |
      '   ___        ___   `
      '  `.  `.    .'  .'  `
     :     `-.|    |.-'     :
     .     .  `    '  .     ,
     /      `. \  / .'      \
    `,'  . . .` `' '. . .  `.'
     `,'    .__.--.__.    `.'
      `,'                `.'
       `,'-`;::....::;'-`.'
        `    ''::::``    '    
    

For comparison's sake, here is exactly the same content wrapped in a <p> element rather than a <pre> element. See the source to prove to yourself that all I changed was the name of the element.

| | .' `. .' `. ; : \_..--.._/ : . | . ' ` . | ' ___ ___ ` ' `. `. .' .' ` : `-.| |.-' : . . ` ' . , / `. \ / .' \ `,' . . .` `' '. . . `.' `,' .__.--.__. `.' `,' `.' `,'-`;::....::;'-`.' ` ''::::`` '
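
The difference between the two versions comes down to the element name alone. The sketch below uses just the first two lines of the image to keep it short.

```html
<!-- Sketch only: identical content, different element. -->

<!-- Whitespace and line breaks are kept exactly as typed. -->
<pre>
         |              |
       .' `.          .' `.
</pre>

<!-- Runs of whitespace are reduced to single spaces. -->
<p>
         |              |
       .' `.          .' `.
</p>
```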

Character entities

Some text characters cannot be easily typed on the computer keyboard, and some of those that can have special meaning to the browser. In particular, there are four reserved characters that have special meaning to the browser. Whenever a browser encounters one of these reserved characters in an XHTML document, it's likely to interpret that character in terms of its special meaning. If the author intended the reserved character to simply be displayed like any other character, the browser's tendency to assign it special meaning is likely to cause problems and confusion.

For this reason, Web authors need to have a way to indicate, for each of these reserved characters, that it should be displayed as itself and not treated as having any special meaning. To accomplish this, Web authors use character entities.

Character entities are special codes that can be inserted into XHTML documents. Unlike tags, which are surrounded by angle brackets, character entities are codes that start with an ampersand and end with a semicolon. When the browser sees one of these codes, it knows to replace the code with the corresponding character in the rendered page.

In other words, character entities allow us to insert reserved characters into our XHTML so that the browser knows they are meant to be displayed as the character and not given any special meaning.

When you want to insert a less than sign into a page (as opposed to using it to begin a tag), you need to type the character entity &lt;

When you want to insert a greater than sign into a page (as opposed to using it to end a tag), you need to type the character entity &gt;

When you want to insert a quotation mark into a page (as opposed to using it to begin or end an attribute value), you need to type the character entity &quot;

And when you want to insert an ampersand into a page (as opposed to using it to begin a character entity), you need to type the character entity &amp;

Those are the only four character entities you will need to commit to memory. Since you must only use reserved characters when they have their special meanings within your XHTML, you must use one of these character entities any time you wish to have <, >, " or & appear in your page content. In fact, if you examine the source of this document, you'll see that I've used some of them extensively.
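
Here is a minimal sketch showing all four entities in use. The sentences and the company name are hypothetical and chosen only for illustration.

```html
<!-- Sketch only: the sample sentences are hypothetical. -->
<p>Type &lt;em&gt; to begin an emphasis element.</p>
<p>She said &quot;hello&quot; to the team at Smith &amp; Jones.</p>
```

When rendered, the first paragraph displays the literal text <em> instead of starting an element, and the second displays a pair of quotation marks and an ampersand.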

There are many, many additional character entities that you may find useful from time to time, but you can always look them up when you need them.

There are character entities for many common symbols, such as the copyright symbol you see in the footer of this page and various international currency symbols, such as £ and ¥.

There are also many character entities for generating the accented and marked characters common in many foreign languages, so you can properly spell words like résumé, naïve and jalapeño in your Web pages.
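
As a sketch, the symbols and accented words mentioned above can be written with the following named entities, all of which are defined for XHTML 1.0. The paragraph structure is an assumption.

```html
<!-- Sketch only: named entities for symbols and accented letters. -->
<p>&copy; &pound; &yen;</p>
<p>r&eacute;sum&eacute;, na&iuml;ve and jalape&ntilde;o</p>
```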