Including Files Using Straight HTML

Every task calls for the right set of tools. In this case, the task was to create a single-page, static, content-rich HTML site. By “content-rich” I mean paragraphs upon paragraphs of copy: text, images, etc.

Editing a giant HTML page is no different from editing a large file of code written in any other language: it is a maintenance nightmare, and it is not fun.

##Searching for Solutions

I knew about Jekyll, and since I love it so much I wanted to use it. However, Jekyll is way too much for what is needed here. All I really need for this particular project is a way to use include statements in HTML.

###Server-Side Options

There are a number of different ways to include one HTML document inside another, but none of them can be done with HTML by itself. Anyone who is familiar with PHP is familiar with the PHP include statement:

<?php include("myfile.html") ?>

while those who haven’t used PHP but have used Apache or Microsoft IIS may be familiar with SSI (server-side includes), which looks something like:

<!--#include virtual="myfile.html" -->

Neither of these options appealed to me. Both require processing by a server, which means the site cannot be tested offline without running a local server, and both repeat that processing on every page hit.

What I really wanted was a way to be able to do this offline, and “build” my site by piecing it together somehow.

##Building Offline

Here are the other methods I considered, starting with the simplest ideas of my own and moving on to the better ideas that I came across on the web.

###The Unix cat Utility

The first thing I came up with was using cat. In case you don’t know, cat is the Unix utility that concatenates files. This was my sure-fire way to get the project moving, assuming I couldn’t come up with something better. cat has a few advantages and disadvantages.

Advantages to using cat:

- I already know how to use it
- It’s fairly simple
- There’s no way it won’t work

Disadvantages to using cat:

- It is extremely messy
- It will require start and end tags to be separated across multiple files
- If there is an HTML error it will be hard to track across these multiple files
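
As a rough sketch of the cat approach, assuming three hypothetical fragment files (header.html, body.html and footer.html, none of which come from the original project), the whole build is a single command. Note how the opening and closing tags end up in different files, which is exactly the messiness listed above:

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical fragments: <html> and <body> open in one file and close in another.
printf '<html>\n<body>\n' > header.html
printf '<p>Some copy</p>\n' > body.html
printf '</body>\n</html>\n' > footer.html

# Concatenate the fragments, in order, into the finished page.
cat header.html body.html footer.html > index.html
cat index.html
```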

###Using sed and cat

The second idea I came up with was to create some kind of template and include the separate pages in it. I thought it might be possible to use sed to do a regex find and replace, swapping something like

{ include myfile.html }

with the output of cat myfile.html. This seemed tricky and like too much work.

Advantages of using sed and cat:

- Gets me that nice-looking template I wanted

Disadvantages of using sed and cat:

- Requires a lot of coding and time spent troubleshooting regex, etc.
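
For what it is worth, here is one way the idea could be sketched for a single, known file name. It is not what I ended up using, and it leans on sed’s r (read file) command rather than a true regex substitution. The file names template.html and output.html are made up for the example:

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# A page fragment and a template that refers to it with the marker syntax.
printf '<p>Hello from myfile</p>\n' > myfile.html
printf '<body>\n{ include myfile.html }\n</body>\n' > template.html

# On the marker line: queue myfile.html for output (r), then delete the
# marker itself (d). Every other line passes through unchanged.
sed -e '/{ include myfile\.html }/{' \
    -e 'r myfile.html' \
    -e 'd' \
    -e '}' template.html > output.html

cat output.html
```

The catch is that the file name is hard-coded in the sed script, so every include needs its own expression. Handling arbitrary file names is where the real work would begin.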

###Using the C Preprocessor

I went for some help on IRC after many failed attempts to find relevant search results for my regex replacement method. After I explained my situation, someone recommended the use of a preprocessor, which is exactly what I was creating without even realizing it. We found a document on the web with instructions on “Using a C preprocessor as an HTML authoring tool.”

In short, this allows me to create a template file:

<html>
    <head>
        <title>blickity blah</title>
    </head>
    <body>
        #include "mypage.html"
    </body>
</html>

and run it through gcc:

gcc -E -x c -P -C sample.htm > output/sample.html

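The original document goes through a breakdown of each flag. In brief: -E stops after preprocessing, -x c treats the input as C regardless of its file extension, -P leaves out the line markers the preprocessor would otherwise emit, and -C keeps comments in the output instead of stripping them. You can also read the manpages for more detail (man gcc).

This was pretty neat and I could have stopped here, but I decided to go with a slightly more elegant solution: m4 rather than gcc.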

Advantages to using the C preprocessor:

- The tools and language are pre-existing
- It is fairly easy to invoke
- No extra work needed to be done

Disadvantages to using the C preprocessor:

- Remembering the proper gcc arguments to disable the C compiler and enable preprocessing only is a bit cumbersome unless you are extremely familiar with gcc’s command line options.

###Using M4

This method is nearly identical to using the C preprocessor. The difference is that it’s a little simpler. We create a template file similar to before:

<html>
    <head>
        <title>blickity blah</title>
    </head>
    <body>
        include(mypage.html)
    </body>
</html>

and run it through m4:

m4 index.html > output/index.html

The only issue I had with m4 is that an extra newline appeared after my include. Most likely this is because the included file ends with its own newline, and the newline that follows include(...) in the template is copied through as well. To fix that I used dnl, which stands for “discard to next line”: it throws away everything up to and including the next newline. If that’s too hard to remember, I like to think of it as “do not line,” mostly because this makes no sense at all but keeps the meaning clear. Now the original template looks something more like this:

<html>
    <head>
        <title>blickity blah</title>
    </head>
    <body>
        include(mypage.html)dnl
    </body>
</html>

Advantages to using m4:

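- The tools and language are pre-existing
- It is fairly easy to invoke
- No extra work needed to be done

As for disadvantages, there are no real show-stoppers that I can come up with. The one thing to watch is that m4 gives special meaning to the backquote, to # (which starts an m4 comment), and to its own macro names, so copy that contains them may need extra care.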

##Wrapping it Up

Here is how I set up a small example for production work.

mkdir site
cd site/
mkdir pages templates webroot
for i in {1..5}; do echo "<p>This is page $i</p>" > pages/page-$i.html; done

Which leaves us with:

.
├── pages
│   ├── page-1.html
│   ├── page-2.html
│   ├── page-3.html
│   ├── page-4.html
│   └── page-5.html
├── templates
└── webroot

Here’s my explanation. Pages (or sections) go into the pages/ directory. All templates go under templates/, and the published result is compiled and sent to webroot/.

Now let’s create a simple template (vim templates/index.html):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html>
    <head>
        <meta http-equiv="content-type" content="text/html; charset=utf-8"/>
        <title>Test Page</title>
    </head>
    <body>
        <div id="section-1">
            include(pages/page-1.html)dnl
        </div>
        <div id="section-2">
            include(pages/page-2.html)dnl
        </div>
        <div id="section-3">
            include(pages/page-3.html)dnl
        </div>
        <div id="section-4">
            include(pages/page-4.html)dnl
        </div>
        <div id="section-5">
            include(pages/page-5.html)dnl
        </div>
    </body>
</html>

To help manage everything, I’m using a Rakefile. If you don’t like rake you’re welcome to use make or a shell script of your own. Here’s what the Rakefile looks like (vim Rakefile):

require 'rake/clean'

SRC='templates/*.html'
DEST='webroot'
CLEAN.include("#{DEST}/*")

task :compile do
  puts 'Compiling...'
  templates = FileList[SRC]
  templates.each do |t|
    sh "m4 #{t} > #{DEST}/#{File.basename(t)}"
  end
end

task :tidy do
  puts 'Tidying...'
  files = FileList["#{DEST}/*.html"]
  files.each do |f|
    sh "tidy -i -n -m #{f}"
  end
end

task :default => ['compile']

It’s somewhat configurable by changing the SRC and DEST variables at the top, to allow you to name your directories differently.
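
If you would rather skip rake, the compile task could be approximated with a short shell script along these lines. This is only a sketch: the script name build.sh, the SRC_DIR and DEST_DIR variables, and the compile_templates function are my own inventions for illustration, not part of the Rakefile above.

```shell
#!/bin/sh
# Hypothetical build.sh: a shell stand-in for the Rakefile's compile task.
SRC_DIR=${SRC_DIR:-templates}   # where the m4 templates live
DEST_DIR=${DEST_DIR:-webroot}   # where the finished pages are written

# Run every template in the source directory through m4 and write the
# result, under the same file name, into the destination directory.
compile_templates() {
    src=$1
    dest=$2
    mkdir -p "$dest"
    for t in "$src"/*.html; do
        # Skip the unexpanded glob when the directory has no templates.
        [ -e "$t" ] || continue
        echo "Compiling $t"
        m4 "$t" > "$dest/$(basename "$t")"
    done
}

compile_templates "$SRC_DIR" "$DEST_DIR"
```

Run it from the site/ directory with sh build.sh, or override the directories on the command line, for example SRC_DIR=layouts DEST_DIR=public sh build.sh.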

Running rake will default to compiling your HTML:

rake
tree

Outputs:

.
├── Rakefile
├── pages
│   ├── page-1.html
│   ├── page-2.html
│   ├── page-3.html
│   ├── page-4.html
│   └── page-5.html
├── templates
│   └── index.html
└── webroot
    └── index.html

3 directories, 8 files

and if we examine the output file:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html>
    <head>
        <meta http-equiv="content-type" content="text/html; charset=utf-8"/>
        <title>Test Page</title>
    </head>
    <body>
        <div id="section-1">
            <p>This is page 1</p>
        </div>
        <div id="section-2">
            <p>This is page 2</p>
        </div>
        <div id="section-3">
            <p>This is page 3</p>
        </div>
        <div id="section-4">
            <p>This is page 4</p>
        </div>
        <div id="section-5">
            <p>This is page 5</p>
        </div>
    </body>
</html>

Success! As you may have noticed, I also added a bonus tidy task, which can be invoked afterwards with rake tidy. It formats the HTML nicely and validates it.