This guest blog post was written by Jack DeLand. Jack has provided Help projects and/or Help authoring training to Fortune 50 companies around the world for over 30 years. He specializes in MadCap Flare project development, architecture, and design, and resides in Ypsilanti, a small city next to Ann Arbor, Michigan.
This is the second post in a 3-part series on Jack’s revamping of an HTML 4 website for The James Fenimore Cooper Society using MadCap Flare. This part focuses on designing the site.
This is the third post in a 3-part series on Jack DeLand's revamping of a year-2000 HTML 4 website for The James Fenimore Cooper Society using MadCap Flare. After exploring the need to update an outdated website and delving into the design process, this final post focuses on some of the tools and techniques used in the transformation.
Missed Part 1? Start here.
With a website from 2000 on his hands, Jack DeLand set out to give The James Fenimore Cooper Society's digital home a much-needed facelift using MadCap Flare™. Because time constraints meant starting from scratch wasn’t an option, Jack reused elements from a previous project while introducing new design concepts, fonts, etc.
This post dives into the creative process, offering insights into the evolution of the site's layout, style, and structure.
Designing the Site
Designing is not a profession but an attitude.
— László Moholy-Nagy
Although we had free rein with the site, there was not enough time to start from scratch. I reused most of the Poe site’s functionality and didn’t monkey too much with the Cooper site’s existing file structure. The two sites soon diverged as the Cooper project evolved stylistically. In turn, I carried what I learned from the Cooper stylesheets back into the Poe site CSS. The sites are cousins.
A design (any design), in my experience, never starts out fully formed, and always undergoes some form of evolution over time. Progress is never linear. Similarly, a website is never finished: grow or die, says Nature.
Unity and Integration
What makes a site look more professional, more “of a piece”? The following notes are from a work in progress, and a full workup of designing with Flare for this one project will have to wait a bit.
Layout Standard
All articles, however, conform to a layout standard that normalizes the typographical conventions used to apply formatting. These are not all aesthetic; for example, Heading 1 must be followed by Heading 2 to allow proper searching by Google and other crawlers on the Web.
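The heading rule lends itself to an automated check. What follows is only a minimal sketch, assuming the topics use plain h1 through h6 tags; the function name skipped_levels is made up for illustration and is not part of Flare.

```python
import re

# Matches the opening of h1..h6 tags; <hr> and <head> are not matched.
HEADING = re.compile(r'<h([1-6])\b', re.IGNORECASE)

def skipped_levels(html):
    """Return (previous, current) pairs where a heading jumps more than one level down."""
    levels = [int(level) for level in HEADING.findall(html)]
    problems = []
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:  # e.g. an h1 followed directly by an h3
            problems.append((prev, cur))
    return problems

# Hypothetical sample: the h3 directly after the h1 is flagged.
sample = '<h1>Title</h1><h3>Too deep</h3><h2>Section</h2>'
print(skipped_levels(sample))
```

An empty result means every heading is at most one level deeper than the one before it.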
Components: an Article
An article has an entry point, body text with headings, Notes, Works Cited, and an (optional) Appendix. Each section of the article has certain dedicated styles applied, and these are consistently applied throughout the site.
Anomalies
Quirky uses of asterisks, equal signs, and other hacks are replaced by a 1-pixel horizontal rule displayed at 100% width, with right and left margins of “auto.”
Images
Images are displayed at 99% except where size is an issue, i.e., the image takes up too much screen area. The user is encouraged to use the Hide TOC feature to display images of artworks.
Page Layout
There are no plans to generate a PDF version of the site, so we’re talking about a “virtual” page laid out in HTML.
As usual, there was no documentation of the reasoning behind any of the design decisions. The previous webmaster used a free program called “First Page 2000,” which I thought might have been an error for “Front Page,” but it is still advertised on the web. However, clicking any of the links at the site popped up Norton’s infamous “Dangerous Webpage” warning.
Standardizing the Text
Standardizing Page Sections
Scholarly articles are composed of text blocks with identifying information and publication restrictions. There is a pattern to enforce. The text originally looked like this:
Separate the blocks of text in your mind before jumping in with the keyboard.
Out of sequence
“Presented” at block is correct.
Copyright statement too soon.
Next should be “Originally published” block if it exists.
Last is “[may be downloaded]” statement. This will be replaced by a snippet.
Then comes <hr /> statement for rule (no line space above the rule, which is styled in the CSS).
Corrected Sequence
In order of importance.
Style Guides
We use the MLA Handbook of Style, 9th edition for new material. Existing scholarly articles are not edited for any reason except to correct obvious errors such as typos or improper formatting.
Project-wide Operations
When performing operations on hundreds of files, you’ll of course need to plan carefully. Thing is, plans also need to evolve as you uncover more structural details of a never-documented file architecture or a standard set of subsections across folders (directories). There is a structure of some kind there; after all, it is a functioning website.
Be creative in your approach to project-wide operations. For example, one page of our sample has a list of URLs―about 300 of them. We need to know, quickly and easily, whether we have “found” all the files and therefore have working links to all of them. We need a list of files referenced, and a list of files in the project’s directory for comparison.
Trick: Creating a List of URLs from an XHTML File
You’ll recognize the graphic below as table text in XHTML format.
TIP: I used the Table Stylesheets editor to give the table rows meaningful names (author, year, title, etc.). It’s a simple, one-time tweak that enhances code readability and findability if you’re doing corrections by hand in a large table.
This table has 380+ entries, but we want to save only the a href links for matching against a file list during an update. The PDF names in the table should exactly equal the names in the file list.
Copy of file before running regex
This regular expression will find these links, which must begin with the partial path Resources/PDFs. It will also delete all other text in the file, so of course you must use it only on a copy of “the real thing.”
1. Open the file in Notepad++.
2. Use the Search > Replace dialog, Replace tab to enter the following:
3. Find:
.*?<a href="(Resources/PDFs/[^"]+)" target="_blank">.*?
4. Replace:
\1\n
5. Select these options.
- Wraparound
- Search Mode: regular expression
- . [period] matches newline
6. Select Replace All. Your result should look something like this:
Copy of file after running regex
You’ll note that there are two lines listing Typographia.pdf; this is correct, because that file has been linked twice inside this file.
7. You may find some garbage text left over in the converted file. Clean it up and save the file in .txt format.
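The same extraction can be sketched in a few lines of Python, which leaves the original file untouched and no leftover text to clean up. This is an illustrative alternative, not the method used on the project; the sample rows and the name extract_pdf_links are assumptions made for the example.

```python
import re

# Mirrors the Find expression above: capture hrefs that begin with Resources/PDFs/.
LINK_PATTERN = re.compile(r'<a href="(Resources/PDFs/[^"]+)"')

def extract_pdf_links(xhtml_text):
    """Return every Resources/PDFs link in document order, duplicates included."""
    return LINK_PATTERN.findall(xhtml_text)

# Hypothetical sample rows; the real table has 380+ entries.
sample = '''
<tr><td><a href="Resources/PDFs/Typographia.pdf" target="_blank">First</a></td></tr>
<tr><td><a href="Resources/PDFs/Other.pdf" target="_blank">Second</a></td></tr>
<tr><td><a href="Resources/PDFs/Typographia.pdf" target="_blank">Third</a></td></tr>
'''

for link in extract_pdf_links(sample):
    print(link)
```

Duplicates are kept on purpose, so a file linked twice shows up twice, just as in the Notepad++ result.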
Making a File List
Forget PowerShell; good old DOS will do for something this simple.
1. Start a command prompt by entering CMD.EXE in the Search box at the bottom left of the Windows desktop.
2. Change to the file folder (directory) that holds the files you want to list. In our case:
cd C:\POE\Poe\Content\Resources\PDFs
3. Enter the following at the command prompt. This DOS command will create a list of all files in the current directory and any subdirectories below it:
dir /s/b >files.txt
Command line
4. Open the file in Notepad++.
Lots and lots of options for line editing
5. Select all entries, then select Edit > Line Operations > Sort Lines Lexicographically Ascending.
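Once both lists exist, the comparison itself can also be scripted. The following is a rough sketch rather than the workflow used here; list_files and compare_lists are hypothetical helper names, and the sketch works with paths relative to the PDFs folder, whereas dir /s/b writes full paths.

```python
import os

def list_files(root):
    """Walk root and return sorted file paths relative to it, using forward slashes."""
    found = []
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            rel = os.path.relpath(os.path.join(folder, name), root)
            found.append(rel.replace(os.sep, "/"))
    return sorted(found)

def compare_lists(referenced, on_disk):
    """Return (links with no matching file, files that nothing links to)."""
    ref_set, disk_set = set(referenced), set(on_disk)
    return sorted(ref_set - disk_set), sorted(disk_set - ref_set)

# Hypothetical names for illustration; real input would come from the two lists above.
referenced = ["Typographia.pdf", "Missing.pdf", "Typographia.pdf"]
on_disk = ["Typographia.pdf", "Orphan.pdf"]
broken_links, unlinked_files = compare_lists(referenced, on_disk)
print("Links without a file:", broken_links)
print("Files without a link:", unlinked_files)
```

The comparison is exact and case-sensitive, which matches the requirement that the PDF names in the table equal the names in the file list.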
Splitting Topic Files with the Kaizen Toolbar
The Cooper site suffered from file bloat; that is, the determining factor is not the number of files but the size of each. I cannot imagine what it would have been like to use this site in the year 2000. Obviously, the smaller the chunks you have in memory, the faster the build. This 3 GB project took 40 minutes; the Poe project, at 8 GB, took 10. Again, it’s not total project size that slows the build; it’s the size of the material being munged, one piece at a time.
This function is quite speedy; you receive properly separated topics and concomitant TOC files. Imagine how long it would take to split out and save 100 of these files manually. How would you track them?
Running Regex to Clean Up Extraneous Text
Here are some samples of the regular expressions used to clean out tables that positioned navigation links in the old website.
Delete the “Return to page” table
FIND = (?-s)<p>Return to.*</p>\R
REPLACE = (leave input empty)
Delete any table
FIND = (<body>)\s*?<table.*?</table>
REPLACE = $1
Remove HTML comments
/(<!--.*?-->)|(<!--[\S\s]+?-->)|(<!--[\S\s]*?$)/g
Delete italic newline space
FIND = \r\n <span class="italic">
REPLACE = <span class="italic">
These probably aren’t much use to you as is; use them as a starting point for your own conversation with ChatGPT. Deleting tables was a big deal; there were about 1600 total. The tables held only navigation links, which were made unnecessary by a real TOC in Flare.
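For anyone who prefers to run such sweeps from a script, here is one way the four samples might be adapted to Python. It is a sketch under assumptions: Python's re module does not support \R or a global (?-s), so those parts are rewritten, and the italic pattern is loosened to accept either Windows or Unix line endings. The function name clean is invented for the example.

```python
import re

# Adapted from the Notepad++ samples above; not identical to them.
RETURN_TO = re.compile(r'<p>Return to.*</p>(?:\r\n|\r|\n)')              # "." stays on one line
LEADING_TABLE = re.compile(r'(<body>)\s*?<table.*?</table>', re.DOTALL)  # table right after <body>
COMMENTS = re.compile(r'<!--.*?-->', re.DOTALL)                          # single- or multi-line
ITALIC_BREAK = re.compile(r'\r?\n <span class="italic">')

def clean(text):
    """Apply the four clean-up passes in sequence and return the result."""
    text = RETURN_TO.sub('', text)
    text = LEADING_TABLE.sub(r'\1', text)
    text = COMMENTS.sub('', text)
    text = ITALIC_BREAK.sub('<span class="italic">', text)
    return text

# Hypothetical fragment of an old page.
sample = ('<body>\n<table><tr><td>nav</td></tr></table>\n'
          '<p>Return to index</p>\n'
          '<!-- old\nnote -->\n'
          '<p>Text\n <span class="italic">title</span></p></body>')
print(clean(sample))
```

As with the Notepad++ versions, run it on copies only and spot check the output before trusting it across hundreds of files.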
Correcting Text
A cornucopia of editing errors was revealed in the old web files. Troubleshooting would take a long time, with endless sweeps (automated and manual) of the topic files. Here are a few of the most time-consuming edits.
Unreadable Characters
The Flare text editor, like a browser, shows the special character when it encounters a character code that it cannot display. Unfortunately, the same special character is used for different originals. You may need to go under the hood (bonnet?) to distinguish them.
Launching Notepad++ from Flare
In this case, three types of substitution are found (single and double quotes, em dashes), with some ellipses and some hyphens used in dates. No overall pattern has yet been found, so a global replace is out and each file must be swept with repeated replacement runs in Notepad++.
What to Do
The procedure here is a manual and tedious point-by-point search and replace for each individual character, narrowing down the suspects until you can do a global replace. For example, look for “ and if it isn’t “ do not replace, but skip to the next point. You may be able to leave untouched only those points that need to become ‘, then replace them globally, once you have found all “. The Content folder holds 1351 files.
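A quick tally of the raw bytes can help narrow down the suspects before you start replacing. The sketch below rests on an assumption the old files may or may not satisfy: that the stray characters are single Windows-1252 bytes (curly quotes, dashes, ellipses). If the originals were already lost when the files were saved, it will not recover them. All function names are made up for illustration.

```python
import unicodedata
from collections import Counter

def tally_high_bytes(raw):
    """Count every byte above the ASCII range in a file's raw contents."""
    return Counter(b for b in raw if b >= 0x80)

def as_cp1252(byte_value):
    """Show what a byte would be if the file were Windows-1252 (an assumption)."""
    return bytes([byte_value]).decode("cp1252", errors="replace")

def report(raw):
    """Return 'byte, probable character name, count' lines, most frequent first."""
    lines = []
    for byte_value, count in tally_high_bytes(raw).most_common():
        name = unicodedata.name(as_cp1252(byte_value), "UNKNOWN")
        lines.append(f"0x{byte_value:02X} {name}: {count}")
    return lines

# Hypothetical bytes standing in for a file read with open(path, "rb").read().
raw = b'He said \x93yes\x94 \x97 twice \x93'
for line in report(raw):
    print(line)
```

Bytes that turn up hundreds of times with one consistent meaning are candidates for a global replace; the rare ones are the ones to inspect by hand.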
The replacement tasks can always become more difficult, usually in a slightly different pattern:
Yow! Where to begin? Look for patterns.
Keep in mind that because Windows applications nowadays can talk to each other, some tricks become available. For example, if you are converting a big swath of characters in corrupted HTML, you can use the Windows Clipboard to copy and paste from the live online source into Flare’s XML editor (not text mode), overwriting the corrupted part. Use the preview window to check the edits before building the project.
Trick: copying and pasting special characters from a live website into Flare
Improperly Entered Text
By far the greatest number of errors was in the maintenance edits to the text files. These were made by student assistants, mostly after Hugh was unable to continue, and some were quite egregious.
Non-tagged text
The text in smaller font in the figure above has been inserted without applying any formatting tags. Note the gray vertical bars at screen left: while others show a P for <p></p> tags, this text has no bar at all. The browser can display the text but only in the browser’s default font (which is why there are defaults).
Corrected text
Added <p> tag midline 46, and </p> tag in line 52 (highlighted). That’s it.
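Finding this kind of stray text across many files is something a script can help with. Here is a minimal sketch, assuming each topic is well-formed XML with a body element that is not in a default namespace and contains no undefined entities; untagged_text is a hypothetical name.

```python
import xml.etree.ElementTree as ET

def untagged_text(xhtml):
    """Return text that sits directly inside <body>, outside any child tag."""
    root = ET.fromstring(xhtml)
    body = root.find("body")
    loose = []
    if body is None:
        return loose
    # Text before the first child element of <body>.
    if body.text and body.text.strip():
        loose.append(body.text.strip())
    # Text that follows a child's closing tag but is still directly in <body>.
    for child in body:
        if child.tail and child.tail.strip():
            loose.append(child.tail.strip())
    return loose

# Hypothetical topic with one untagged line between two paragraphs.
sample = '<html><body><p>Tagged.</p>Loose text with no tag.<p>More.</p></body></html>'
print(untagged_text(sample))
```

Each string it returns is a spot that still needs its <p> and </p> tags added by hand.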
Remember that you can edit a topic file in more than one editor at a time, but be careful of what you’re saving where. Are you certain when the program asks, “This file has been modified outside; reload?”
List Style Errors
The period is not within the same tag as the footnote. What happened, and is it repeated? Does this error create others? How many fixes can we automate?
Inspection reveals that “list-style: none” has been applied to the period in error. This styling instruction is designed to tell the browser to show a list item indented as usual, but without a bullet, number, or icon. It does not make sense here. Can you do a global replace for all? Better spot check first.
Using Notepad++ to Convert Case
Original in Flare
If you come across files that have a great deal of full cap text to be converted, try using Notepad++ to change case in a variety of different ways. Note the options below:
Converting case in Notepad++
What’s the difference between Sentence case and Sentence case (blend)? Try it!
Proofing by Diff
A “diff” is coder slang for a difference listing: a line-by-line comparison of two files. In this case, Notepad++’s Compare plug-in lets us view old and new in a matched scrollable pattern. (Use Plugins > Plugin Admin… to install from a list of plugins.)
The colors in the box titled Compare NavBar indicate types of difference. White means zero difference and you can skip this entirely, moving to the next color section. The orange at the right highlights the difference between two uses of an em dash, and the addition of a space after the ellipsis. The new file is correct and needs no editing, so we scroll to the next color marker, and update where necessary.
Note the “map” at image right; it highlights only changed areas.
The easy way to get the two text versions is to open the page in the browser, position the cursor at the beginning point for your capture of the old page, then press Ctrl + Shift + End to select to the end of the page and Ctrl + C to copy. Paste into an empty file in Notepad++ and save as <any_name_OLD>. Repeat to create and save <any_name_NEW> from the new page. Run Plugins > Compare.
You can resolve missed characters by copying and pasting directly from the correct version. This situation usually arises when the underlying code has been displayed using a non-recognized character, represented on screen as a square or other odd symbol.
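If you ever need the same old-versus-new check without the plug-in, Python's standard difflib module does a comparable line-by-line comparison. This is only a sketch of the idea, not a replacement for the visual review described above; changed_lines is an invented name and the two sample texts are hypothetical.

```python
import difflib

def changed_lines(old_text, new_text):
    """Return only the removed ('- ') and added ('+ ') lines between two texts."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]

# Hypothetical captures of the old and new page text.
old_page = "He paused...and left\nSame line"
new_page = "He paused... and left\nSame line"
for line in changed_lines(old_page, new_page):
    print(line)
```

Identical lines are dropped from the result, which plays the same role as skipping the white sections in the Compare NavBar.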
While “copy and paste into text files” may seem an unbelievably crude workflow, consider that you have no better means of capturing what a browser puts out on screen at any moment. Similarly, the lightweight Notepad++ is all the “programming” editor you need for a minute text inspection. Save yourself a few yottabytes of installation space.
If you’ve been successful in your initial passes, then this phase should be fairly simple/speedy, but don’t let it lull you into complacency. Chew some Juicy-Fruit to make your ears pop and open the vent window to blow cold air in your face. Jump on every divergence or discrepancy without mercy and nip it in the bud, right here in the text editor. Slow down and become perfection itself. Your time to shine.