See also search engine optimization (SEO)
- After the various validations pass and I've hand-checked things according to [1], add "Viewable With Your Favorite Browser" to the footer.
- Later, write an essay on this topic, link to it from the footer, and link to anybrowser.org from the essay instead.
- RSS
HTML §
HTML Tidy §
The executable, not the Ruby library.
I pass everything through HTML Tidy. It's extremely difficult to build everything from scratch and still be compliant, and a number of the things I do are clumsy right now, so HTML Tidy is my cheat.
I build everything to be XHTML 1.0 Strict.
Tested and works on Unity Linux 64-bit RC1 as of 2010-04-25:
cvs -d:pserver:anonymous@tidy.cvs.sourceforge.net:/cvsroot/tidy login
# press Enter at the password prompt (anonymous access has no password)
cvs -z3 -d:pserver:anonymous@tidy.cvs.sourceforge.net:/cvsroot/tidy co -P tidy
cd tidy/build/gmake
make
su
smart install libxslt-proc
make install
The command I use is:
system(
  'tidy',
  '-clean',    # replace FONT, NOBR and CENTER tags with CSS
  '-quiet',    # suppress nonessential output
  '-omit',     # omit optional start and end tags
  '-asxhtml',  # convert HTML to well-formed XHTML
  '-access',   # additional accessibility checks
  '-modify',   # overwrite the input file with the cleaned output
  '--drop-empty-paras', 'true',
  '--indent', 'true',
  '--indent-spaces', '2',
  '--keep-time', 'true',
  '--wrap', '0',
  '--force-output', 'true',
  '--show-errors', '0',
  '--show-warnings', 'false',
  '--break-before-br', 'true',
  '--tidy-mark', 'false',
  '--output-encoding', 'utf8',
  '--escape-cdata', 'false',
  '--indent-cdata', 'true',
  '--hide-comments', 'true',
  '--join-classes', 'true',
  '--join-styles', 'true',
  source_file_full_path
)
For the full list of options, run tidy -help-config.
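A minimal sketch of the same call with the options kept in a hash and tidy's exit status checked, rather than ignored. It assumes tidy's documented exit codes (0 for no problems, 1 for warnings, 2 for errors) and relies on Ruby's system returning nil when the executable can't be run at all. The names TIDY_FLAGS, TIDY_OPTIONS, TIDY_EXIT, tidy_argv and run_tidy are hypothetical, and only a subset of the options from the command above is repeated here.

```ruby
# Hypothetical helpers around the tidy executable; not part of any library.
TIDY_FLAGS = %w[-clean -quiet -omit -asxhtml -access -modify].freeze

# Long-form options in the order they are passed. This is only a subset
# of the options used above; the rest can be added the same way.
TIDY_OPTIONS = {
  'indent'          => true,
  'indent-spaces'   => 2,
  'wrap'            => 0,
  'force-output'    => true,
  'tidy-mark'       => false,
  'output-encoding' => 'utf8'
}.freeze

# Assumed meaning of tidy's exit status.
TIDY_EXIT = { 0 => :clean, 1 => :warnings, 2 => :errors }.freeze

# Build the argument vector. Passing separate arguments to system
# bypasses the shell, so paths containing spaces need no quoting.
def tidy_argv(path, flags: TIDY_FLAGS, options: TIDY_OPTIONS)
  long = options.flat_map { |name, value| ["--#{name}", value.to_s] }
  ['tidy', *flags, *long, path]
end

# Run tidy on one file and summarize the result as a symbol.
def run_tidy(path)
  ran = system(*tidy_argv(path))
  return :not_run if ran.nil?            # executable missing or not runnable
  TIDY_EXIT.fetch($?.exitstatus, :unknown)
end
```

A build step could then, for example, warn when run_tidy(source_file_full_path) returns :errors instead of silently publishing the page.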
CSS §
I use Mozilla-only CSS for rounded corners, so my stylesheet fails validation.
Links §
I don't understand why their checker won't check links pointing to its own site; it's probably for traffic reasons. My robots.txt doesn't disallow their checker.
JavaScript §
See JavaScript for the list of features used with it.
My knowledge of JavaScript is sorely lacking. A few pretty critical things I've copied from elsewhere are not standards-compliant, and I just don't know enough to fix or replace them.
A HREF function-links §
JavaScript function-links are not actually valid. I've tried all sorts of things, but the best I can do is to wrap such links inside <script type="text/javascript"> so that they only appear when JavaScript is enabled. Example link:
<a href="javascript:toggle('styles')">Styles</a>
To fix the validation issue, I started doing:
<a accesskey="t" href="/javascript.html#s0" onclick="toggle('styles'); return false;">Styles</a>
This forces me to have a real link target, though. While that isn't really what I wanted, it does give me an opportunity to link to another page explaining what JavaScript would have allowed the user to do.
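Since the pages come out of a Ruby engine, the fallback-link pattern above (a real href plus an onclick that cancels navigation) could be generated by a small helper. This is a hypothetical sketch: toggle_link, escape_html and HTML_ESCAPES are made-up names, and it assumes the page already defines the JavaScript toggle() function. It writes onclick in lowercase because XHTML attribute names are case-sensitive.

```ruby
# Minimal HTML escaping for attribute values and link text.
HTML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

def escape_html(text)
  text.to_s.gsub(/[&<>"]/, HTML_ESCAPES)
end

# Hypothetical helper: an XHTML link that follows href when JavaScript is
# off and, when it is on, calls toggle(id) and cancels the navigation.
def toggle_link(target_id, label, fallback_href, accesskey: nil)
  # Only plain ids are accepted, so the id can go inside the
  # JavaScript string literal without further escaping.
  raise ArgumentError, "unsafe id: #{target_id.inspect}" unless target_id =~ /\A[\w-]+\z/

  attrs = []
  attrs << %(accesskey="#{escape_html(accesskey)}") if accesskey
  attrs << %(href="#{escape_html(fallback_href)}")
  attrs << %(onclick="toggle('#{target_id}'); return false;")
  "<a #{attrs.join(' ')}>#{escape_html(label)}</a>"
end
```

For example, toggle_link('styles', 'Styles', '/javascript.html#s0', accesskey: 't') produces the same markup as the hand-written Styles link, with the /javascript.html page still reachable when scripting is off.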
Using HTML with document.write §
Technically, JavaScript's document.write shouldn't have HTML opening tags within it, and HTML Tidy will escape any forward slashes in opening tags, ruining the code. If I put only the closing HTML tags inside the JavaScript (which is valid) and the other text outside, I still get validation errors. I have found no way around this. Example code:
<script type="text/javascript"><!--
// E4X XML literal (Mozilla-only), used here as a heredoc
var heredoc = (<r><![CDATA[
<p>some example text</p>
]]></r>).toString();
document.write(heredoc);
//--></script>
<noscript>
</noscript>
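One commonly used workaround, which I haven't tested against HTML Tidy, is to have the generator emit the markup as an ordinary JavaScript string and escape every "</" as "<\/". Inside a JavaScript string "\/" is just "/", so the browser writes the same markup, but the script body no longer contains the "</" sequence that HTML 4 treats as the end of a script element's content. The helper name document_write_js is hypothetical; it relies on Ruby's json library to quote the string and escape quotes, backslashes and newlines.

```ruby
require 'json'

# Hypothetical helper: turn a fragment of markup into a document.write
# call whose source never contains "</", the sequence that ends an
# HTML 4 script element early.
def document_write_js(html)
  literal = JSON.generate(html)                 # "...", with " \ and newlines escaped
  literal = literal.gsub('</') { '<\/' }        # "\/" is just "/" inside a JS string
  # JSON permits raw U+2028/U+2029, but older JavaScript engines treat them
  # as line terminators inside string literals.
  literal = literal.gsub("\u2028") { '\u2028' }.gsub("\u2029") { '\u2029' }
  "document.write(#{literal});"
end
```

The generator would place the result inside <script type="text/javascript"> in place of the E4X heredoc, keeping the plain-text fallback in <noscript>.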
Robots.txt §
- spiralofhope.com/robots.txt
- en.wikipedia.org/wiki/Robots_exclusion_standard
- www.robotstxt.org/robotstxt.html
Checkers:
- Those with a Google account can use Google Webmaster Tools to test robots.txt; see [2]
- Live.com users have a similar tool in the Webmaster Center, but holy shit are the URLs fugly: [3] [4]
- tool.motoricerca.info/robots-checker.phtml
- www.invision-graphics.com/robotstxt_validator.html
- www.targetable.com/scripts/robotstxt.html
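For a quick local check before reaching for the online checkers, a small parser covering only the original robotstxt.org rules might look like the sketch below. It assumes records made of User-agent lines followed by Disallow lines, a case-insensitive substring match on the robot's name, and plain prefix matching on paths; Allow lines, wildcards and other extensions that some crawlers support are deliberately ignored. RobotsRecord, parse_robots and allowed? are hypothetical names.

```ruby
# Hypothetical robots.txt checker covering only the original exclusion
# standard: no Allow lines, no wildcards, no Crawl-delay.
RobotsRecord = Struct.new(:agents, :disallows)

def parse_robots(text)
  records = []
  current = nil
  text.each_line do |raw|
    line = raw.sub(/#.*/, '').strip            # drop comments and whitespace
    next if line.empty?
    field, value = line.split(':', 2).map(&:strip)
    next if value.nil?                         # not a "field: value" line
    case field.downcase
    when 'user-agent'
      # A User-agent line after rules starts a new record; consecutive
      # User-agent lines share one record.
      if current.nil? || !current.disallows.empty?
        current = RobotsRecord.new([], [])
        records << current
      end
      current.agents << value.downcase
    when 'disallow'
      current.disallows << value if current    # empty value = allow everything
    end
  end
  records
end

def allowed?(records, robot, path)
  robot = robot.downcase
  # A record naming this robot wins over the catch-all '*' record.
  record = records.find { |r| r.agents.any? { |a| a != '*' && robot.include?(a) } } ||
           records.find { |r| r.agents.include?('*') }
  return true if record.nil?
  record.disallows.none? { |prefix| !prefix.empty? && path.start_with?(prefix) }
end
```

Feeding it the text of spiralofhope.com/robots.txt and a user-agent token from the notes below (ia_archiver, Googlebot, msnbot, Slurp, Twiceler) shows which paths that robot should skip under the original rules.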
Notes on specific robots §
Internet Archive
- archive.org
- info: www.alexa.com/help/webmasters
- User-agent: ia_archiver
Google
- google.com, www.google.com/imghp, video.google.com/, etc.: www.google.com/intl/en/options/
- search.aol.com/, search.aol.com/aol/imagehome, video.aol.com/
- info: en.wikipedia.org/wiki/Googlebot
- I can't find direct information, try around here: [5]
- submit: www.google.com/addurl/
- User-agent: Googlebot
Microsoft
- www.bing.com / live.com
- yahoo.com
- altavista.com
- info: en.wikipedia.org/wiki/Msnbot
- I can't even give a direct URL to official information on it.
- submit: www.bing.com/webmaster/SubmitSitePage.aspx
- User-agent: msnbot
Yahoo (switching to Microsoft's bing.com engine: [6] [7])
- yahoo.com
- altavista.com
- info: help.yahoo.com/l/us/yahoo/search/webcrawler/index.html
- submit (requires registration): search.yahoo.com/info/submit.html
- Yahoo directory submit: ecom.yahoo.com/dir/submit/intro/
- User-agent: Slurp
Cuil
- cuil.com
- info: www.cuil.com/info/webmaster_info/
- submit: www.cuil.com/info/contact_us/feedback/crawl_me
- User-agent: Twiceler
RSS §
(Not implemented yet)
Server §
Since this engine doesn't really care about the functionality of the server, I don't have much to say about it.
Misc. notes:
- Custom error documents
- Logging and statistics
- Security settings
Email §
I'm not using self-hosted email right now.

