An Introduction to PDF Tags

A remediated PDF document is a tagged PDF document. But what are tags? If you are not a web designer, the concept of tags may not be second nature, so in this post I will aim to demystify PDF tags: what they are, what they are for and how you can use them.

This post became an idea after I watched a video by Dax Castro of PDF Accessibility: Understanding the PDF Tags Tree. If you haven’t seen the video yet, I recommend watching it.

What are Tags?

Tags are containers that store metadata (descriptions of the content) about the elements in a PDF document. Tags organize content and make it “readable” by assistive technologies, like screen readers. They help identify document structure and relationships between elements.

Container Tags

Container tags help organize the tag structure into groups. They are used by PDF remediators and are ignored by assistive technologies. Examples of common container tags include <Art> (article), <Document>, <Part> and <Sect> (section).
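To make the idea of "ignored by assistive technologies" more concrete, here is a minimal Python sketch. It is not how any real screen reader or PDF library works: the (name, content) tuples, the GROUPING_TAGS set and the announced_tags function are all names I made up for illustration. It simply walks a small tags tree and leaves the container tags out of the result.

```python
# Hypothetical model: a tag is a (name, content) pair, where content is
# either a string (the text inside the tag) or a list of child tags.
# "Art" is the tag name for an article container.
GROUPING_TAGS = {"Document", "Part", "Art", "Sect"}


def announced_tags(tag):
    """Return the tag names in reading order, leaving out container tags."""
    name, content = tag
    result = [] if name in GROUPING_TAGS else [name]
    if not isinstance(content, str):
        for child in content:
            result.extend(announced_tags(child))
    return result


# Two sections, e.g. two slides exported from a presentation.
tree = ("Document", [
    ("Sect", [("H1", "Title"), ("P", "Intro")]),
    ("Sect", [("H2", "Details"), ("P", "Body")]),
])

print(announced_tags(tree))
```

The containers shape the tree for the remediator, but only the content tags inside them end up in the output.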

 

If you are working with a PowerPoint document export, you will see a list of <Sect> or section tags in the tags tree. This is how PowerPoint organizes individual slides.

 

You can decide whether to use <Part> or <Sect> or both to organize the tags tree. Just be consistent in your approach.

The very first tag in a remediated PDF document should be a <Document> tag, indicating a complete document. This is a MUST. You will most likely need to add this tag to the structure.

 
 

Once you add the <Document> tag, you will need to move all the other tags inside it. The result will be a nested tag structure with the <Document> tag in first place.
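In Acrobat this is a manual drag-and-drop job in the tags tree, but the result can be sketched in a few lines of Python. This is only an illustration under my own assumptions: tags are modeled as hypothetical (name, content) tuples and wrap_in_document is a made-up helper, not part of any PDF tool.

```python
# Hypothetical model: a tag is a (name, content) pair, where content is
# either a string (the text inside the tag) or a list of child tags.


def wrap_in_document(top_level_tags):
    """Return a tags tree whose single root is a Document tag."""
    # If the tree already has a lone Document root, leave it alone.
    if len(top_level_tags) == 1 and top_level_tags[0][0] == "Document":
        return top_level_tags[0]
    # Otherwise nest every existing top-level tag inside a new Document tag.
    return ("Document", list(top_level_tags))


loose_tags = [("H1", "Annual Report"), ("P", "Welcome."), ("P", "Summary.")]
print(wrap_in_document(loose_tags))
```

Running the helper a second time on the wrapped result changes nothing, which mirrors the goal: exactly one <Document> tag, with everything else nested inside it.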

 
 

Heading Tags

Heading tags, <H1> or heading one, <H2> or heading two, <H3> or heading three, and so on, add structure and hierarchy to the document. The <H1> tag is reserved for the document title and should be used only once (Dax Castro disagrees with this in his video; watch the video for his rationale).

 

Use <H2>, <H3>, <H4> tags as appropriate, but do not skip over heading levels, for example <H1> then <H3> then <H2>. Skipped heading levels are commonly treated as a failure of WCAG Success Criterion 1.3.1, Info and Relationships (read more about providing headings by marking content with heading tags).

Presenting tags in sequential order will also help avoid a nesting error flagged by the Accessibility Checker.
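The "do not skip" rule is easy to express as a small check. The Python sketch below is my own illustration, not the logic of the Accessibility Checker: skipped_headings is a hypothetical function that takes the heading levels in reading order and reports every place where a heading jumps down by more than one level. It also assumes the first heading should be an <H1>.

```python
def skipped_headings(levels):
    """Return (index, previous_level, level) for every skipped heading level.

    `levels` lists the heading levels in reading order, e.g. [1, 2, 3]
    for <H1>, <H2>, <H3>. Assumption: the first heading should be <H1>.
    """
    problems = []
    previous = 0  # no heading seen yet
    for index, level in enumerate(levels):
        # Going deeper is fine only one level at a time;
        # going back up (e.g. <H3> then <H2>) is always fine.
        if level > previous + 1:
            problems.append((index, previous, level))
        previous = level
    return problems


print(skipped_headings([1, 3, 2]))        # <H1>, <H3>, <H2>
print(skipped_headings([1, 2, 3, 2, 3]))  # no skipped levels
```

For the <H1>, <H3>, <H2> example, the check reports the jump from level 1 to level 3 at the second heading and accepts the step back up to <H2>.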

 

Nested Tags

Nested tags have a parent/child structural relationship and group all elements that belong together. Examples of nested parent tags include <L> list, <TOC> table of contents and <Table>.

Here is a list (pun intended!) of child tags of the <L> list tag:

<L>

<LI> list item

<Lbl> label (e.g. the bullet or number), nested inside <LI>

<LBody> text content of the item, nested inside <LI>
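The parent/child rules for a list can be written down as a small lookup table. The following Python sketch is a simplified illustration under my own assumptions: tags are hypothetical (name, content) tuples, ALLOWED_CHILDREN and misplaced_children are made-up names, and the rules cover only the plain case where <L> holds <LI> tags and each <LI> holds a <Lbl> and an <LBody>.

```python
# Simplified assumption: <L> holds only <LI>, and <LI> holds only
# <Lbl> and <LBody>. Anything not listed here is left unchecked.
ALLOWED_CHILDREN = {"L": {"LI"}, "LI": {"Lbl", "LBody"}}


def misplaced_children(tag, path=""):
    """Return the paths of child tags that are not allowed under their parent."""
    name, content = tag
    here = f"{path}/{name}"
    if isinstance(content, str):
        return []  # a leaf tag has no children to check
    problems = []
    allowed = ALLOWED_CHILDREN.get(name)
    for child in content:
        if allowed is not None and child[0] not in allowed:
            problems.append(f"{here}/{child[0]}")
        problems.extend(misplaced_children(child, here))
    return problems


good_list = ("L", [
    ("LI", [("Lbl", "•"), ("LBody", "First item")]),
    ("LI", [("Lbl", "•"), ("LBody", "Second item")]),
])
bad_list = ("L", [
    ("Lbl", "•"),  # label sitting directly under <L>
    ("LI", [("LBody", "Text"), ("P", "Stray paragraph")]),
])

print(misplaced_children(good_list))
print(misplaced_children(bad_list))
```

The well-formed list produces no findings, while the second list reports the label placed directly under <L> and the stray <P> inside the list item.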

 
 

Here is the child tag of the <TOC> table of contents tag:

<TOC>

<TOCI> table of contents item

 
 

The <Table> tag is another parent tag, with the following child tags:

<Table>

<TR> table row

<TH> table header cell, nested inside <TR>

<TD> table data cell, nested inside <TR>

Here is an example of a tag structure of a simple table with a header row (see <TH> tags nested inside a <TR> tag) and one row of data cells (<TD> tags nested inside a second <TR> tag).
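If you prefer text to screenshots, the same simple table can be sketched in Python and printed as an indented tree. Everything here is illustrative: the (name, content) tuples, the render function and the cell text are my own inventions, chosen only to show how the <TH> and <TD> tags nest inside their <TR> rows.

```python
# Hypothetical model: a tag is a (name, content) pair, where content is
# either a string (the text inside the tag) or a list of child tags.


def render(tag, depth=0):
    """Return the tags tree as a list of indented lines, one per tag."""
    name, content = tag
    indent = "  " * depth
    if isinstance(content, str):
        return [f"{indent}<{name}> {content}"]
    lines = [f"{indent}<{name}>"]
    for child in content:
        lines.extend(render(child, depth + 1))
    return lines


# One header row and one row of data cells (made-up cell text).
simple_table = ("Table", [
    ("TR", [("TH", "Name"), ("TH", "Role")]),
    ("TR", [("TD", "Ada"), ("TD", "Editor")]),
])

print("\n".join(render(simple_table)))
```

Each level of indentation in the printed tree corresponds to one level of nesting in the tags tree: rows inside the table, cells inside the rows.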

 
 

When you create a PDF document from a Microsoft Word document, you will find a series of Path container tags in your table structure. These Path containers are Acrobat’s way of identifying a table border, but I’ll save this for another blog post.

This concludes my very first blog post. Remember, “Accessible design is good design” (Steve Ballmer), so integrate accessibility practices into your content development workflow and your PDF remediation process will be that much easier.

Resources

  1. PDF Techniques for WCAG 2.0

  2. Info and Relationships: Understanding SC (Success Criterion) 1.3.1