
Thursday, June 07, 2007
I consider myself to have quite a decent array of Web searching skills. I'm quite up to speed on the various options Google provides, and tend to use them in complex combinations in order to try and get useful information with a minimum of chaff. I also have a tendency to do searches in parallel, popping up multiple tabs/windows at the same time to allow quicker cycling through the possibilities (something that infuriates Mrs. Dunce to no end when she's shoulder-surfing). Anyway, I tend to be pretty successful at finding what I'm looking for.
Or at least, when it comes to text. Or information that can be found using text, such as identifying a song from a snippet of lyrics (even when they're almost entirely in an unknown language, like this catchy number (link to lyrics)). But when it comes to non-text searches, let me just say "Ouch".
For example, Mrs. Dunce is a big fan of a certain plant that grows well near us. It's a flowering tree with some lovely bluish-purple flowers. As pictured here (with Mrs. Dunce wondering about its name).
I've certainly tried text-based searching to find this sort of thing (text-based search tricks work just fine in Google Images, as long as you know the right sort of terminology), but it just isn't happening. Searching for things like "purplish-blue flowers" gets you similar plants but they just don't look quite right. And I get really tired of lavender, lavender, lavender, lilac, lilac, lilac which don't look right either. I've managed to find a nice online flower identifier but its questions assume a level of knowledge/attention well beyond my own (in addition to referring only to northeastern and north central US and adjoining bits of Canada). And there's no way I'm going to admit my ignorance by going into a flower shop and asking questions that reveal my ignorance, or posting the picture on a plant identification forum where no doubt it would be instantly identified.
Of course, by creating this post, I'm revealing my ignorance after all. But it's related to the more general question of how one might go about using text-based search techniques to find out information about an image you are having trouble identifying. This does not just apply to plants or flowers. Say for instance you see an image like this one. The man in the blue shirt is very famous, but who is he? You might get some hints from the name of the website, and the people standing next to him, but then what? Or maybe you see this picture and want to figure out what kind of bus it is (there is a very precise answer to this one which can be found using a different set of simple search tricks).
Anyway, it would probably be much easier to ask someone. But that would take the challenge out of it.

Wednesday, February 14, 2007
I get all kinds of spam these days, despite any number of filtering/marking schemes. But today I received a message from the best spammer name of all time: Firmness K. Joystick.
Three guesses what he was trying to sell me.
Other good recent names include Holden Burns, Bishop N. Desfunction [sic], Snider Mat, Frailey G. Neblett, and Hockaday U. Sunday.

Wednesday, January 03, 2007
Q Who had a beard of burnt up black?
A Blackbeard.
I've spent most of the holiday break away from the computer, so it's been a while since I've posted anything. So imagine my surprise when I checked my access logs and found zillions of referrals from internet searches for beards. Not only beards but specifically, those of burnt up black. I immediately discarded the possibility that the bear community had launched some sort of major Christmas publicity effort, and followed the referrals to their target. Actually, the beard in question is mentioned only in this entry, referring to a particular literary work. To which I refer obliquely at the moment for reasons that will become clear in a bit.
So where did all these references to this literary work come from? I googled the phrase myself, and quickly found the answer. It's a question in this year's King William's College Quiz (PDF link from King William's College site; html link from the Guardian). Quizzes like these are quite a British tradition (and Mannish as well, if that's not covered by the term "British"); this time of year it's nearly impossible to get through a newspaper without a year-in-review quiz of some sort, never mind all the pub quizzes out there. But it seems the gentle art of quizzery has suffered a serious blow thanks to the readily available mountain of information out there (here!) these days. I've been quite a fan of trivia in my day, and have a great appreciation for those who are able to retain vast amounts of inconsequential information (myself included to a limited extent, much more limited when it comes to British pop culture predating my arrival here [though I have made up ground when it comes to pop culture of the 21st century]). But now answering quiz-type questions is very different. Answering a question like the one above now requires little more than typing it into your friendly search engine (Google, that is: referrals from Google are occurring more than 30 times as often as all other search engines combined) and seeing what comes up. Unless, of course, this particular entry appears on your search results. For Blackbeard is not the correct answer at all, but Svengali (also mentioned in KWC's 2000 quiz). Even if a question is written in a manner that prevents searching for the exact quotation, clever use of search terms (usually not the norm, if my referral logs are anything to go by) can still often get the answer quite readily. Never mind people who start compiling their own lists of answers. I was tempted to start compiling such a list, all with incorrect but acceptable-sounding answers. 
But then I figured that having all the wrong answers in one place might be too obvious. So I'll stick with my old friend alone: Blackbeard and his beard of burnt up black.

Thursday, December 14, 2006
This blog has just been upgraded to a new version of dasBlog, which apparently provides some new ways to protect bloggists against spam (or "smap" if you prefer). I don't get a lot of visible spam here (comments are fairly well protected by CAPTCHA [except for one or two that seemed to have been added by hand, and were just as easily deleted by hand], trackbacks are disabled, and referrals are not made visible anywhere). But back behind the scenes, there are any number of referral spamming techniques cluttering my logs (spam blogs, dodgy links of various styles, spammy linky postings from open message boards, and so on and so on). But strangely enough, they've almost all been attracted to one particularly exciting post I made last September with the title Spammy, spammy, spammy (which just so happened to mention a few terms that often occur in spammy spams, like poker, diet pills, phentermine, cialis, jackpots, and virtual slots [uh oh, there are those terms again!]). The upgrade log revealed that this entry had received 7683 referrals (where most of my posts are in the low hundreds). So I wonder if a new entry of a similar nature might do the same, as a sort of honeypot attracting smappy interest away from the rest of my posts. If I wrote such an entry, my logs suggest that it probably shouldn't contain terms related to pharmaceuticals like viagra, prozac, zoloft, wellbutrin, thorazine if I want to keep the spam away. I sure wouldn't want spam related to insurance or banking, either, so I'd better avoid using terms like geico, aetna, insure, annuity, account. And when it comes to gambling, I really want to steer clear of slots, roulette, blackjack, poker (although I think I'm safe with three-card monte or baccarat). And I'm terribly afraid what would happen if I mentioned porn. Anyway most of those terms appear in my blacklist, so I'm sure this particular entry will remain pristine, untouched by spammy referrals, comments and so on.

Tuesday, July 04, 2006
No, I didn't decide to take on bike thieves single-handed, ending up in the hospital or worse.
No, I haven't been spending every spare moment training for a mountain bike journey.*
Instead there's been quite a conspiracy of external factors that have pretty much wiped out the time I would ordinarily spend on blog entries. I had a couple of (work-related) visitors from sunny California (and the crunch-time work associated with their visits). And a couple of minor sporting events that have drawn my attention in a somewhat predictably obsessive way (World Cup, and now the Tour de France). And this year's journey to renowned music festival Tapestry Goes West (perhaps deserving its own entry, although I fear I'll end up writing more about buying loads of books in Hay, and watching England v. Portugal in a rugby-preferred pub in Port Talbot, Wales, than about the festival itself). And all sorts of work-related work that has somehow found itself all plopping onto my desk at once. Surely this will all evaporate soon.
*Although I have been doing a lot of investigation about the possibility of improving my touring bike's gear ratio for mountain climbing. It seemed like a fairly simple process to upgrade the rear cluster to an 8-speed (currently six), although I would need to obtain a new wheel with a slightly longer axle. Too bad the bike is of a retro style, most notably with 27" wheels which are not exactly easy to find in this day and age (the 700c is now standard). Switching wheels to 700c... well first of all it would probably require switching both front and back (additional cost) + tires for both. And it also seems I'd need to change the brakes as the current ones aren't very adjustable (when it comes to wheel diameter). So it seems I may be returning to my original plan: just putting on a different 6-speed cluster on the rear, one that has a serious granny gear. Plenty of wasted time getting to this conclusion, though.

Wednesday, May 31, 2006
One of my real difficulties with blogging is the extent to which it outright encourages obsessive-compulsive behavior on my part. Under ordinary circumstances I already score quite highly on whatever OCD scale is thrown at me. Although I no longer count continuously as I did as a youngster (counting under my breath simply for the sake of counting to a high number: 12,000 on one long family trip as I vaguely recall), I do occasionally fall into the number trap. The most recent example, determining which of the various local bus stops is actually closest to our front door:
414 paces: bus stop on Seven Sisters road serving routes 259/279 towards Manor House (and its counterpart headed toward Seven Sisters station and beyond just across the street, and approximately 50 paces further). Not very useful for my own journey to work, though. Hence,
588 paces: bus stop at Stamford Hill rail station, serving routes 253/254 (both directions approximately equidistant). However, on the way home it's slightly faster to get off a couple of stops earlier, getting home a little faster despite the added walking length of
614 paces: bus stop on Amhurst Park serving routes 253/254 towards Stamford Hill and Hackney. The best bet for minimizing the time waiting for a bus toward London, however, is a few steps further,
756 paces: bus stop on Seven Sisters Road, all four routes (253,254,259,279) converging. As Mrs. Dunce's commute can involve any of 253,254,259, this is her best bet (and often my own as well). Although sometimes a 259 may pass by while we trek the 342 steps between the nearest stop and this one.
I should note that there remain a few nearby bus stops not listed here: the 67 goes fairly close, along St. Ann's Road, but we seldom have occasion to take this bus. Another pair of 253/254 stops (between the two noted above) is closer to our house as the crow flies, but not on foot.
Anyway, erm, this is all to say that I occasionally fall into the trap of obsessive-compulsive behaviour, and that this is exacerbated by blogging as I occasionally feel additionally compelled to document these sorts of records rather than tracking for the short term, and then discarding them forever. As a frightening illustration of meta-obsessive-compulsive behavior here's a short list of my obsessive-compulsive topics, only covering my first six months of blogging (frankly, because even I have a limit, and I had no idea there were so many. Thus I haven't even reached what are in my mind the most egregious examples)....
What color is the new black?. In which I do a bit of Googling to try and decide which color (of many) is "the new black"
Saarbruecken. Saints of July 18, ranked in order of "saintliness"
Pub misery. Searching pub review sites to find the most miserable in North London.
London by Routemaster. First in a series of maps depicting the shrinking coverage of London's Routemaster buses.
Tip of the tongue. Documenting in slightly painful detail my long-running tip-of-the-tongue experience for the name "George Formby"
Could Do. Describing the tendency in UK English to use expressions like "Well, I might do."
Meal Time. Various English terms for meal times used in different parts of the world.
Recent speechifying. A shockingly dull breakdown of word frequency in a couple of speeches by George Bush and Tony Blair.
Beer festival. Box-ticking and ratings of beers consumed at a festival.
Eatin' vittles. Variation in the terms "vittles" and "victuals".
Not so obligatory plurals. Terms like "spectacles" and whether they should remain plural in phrases like "Spectacle wrangling".
Lady Marmalade. A bit too much on the origin of the term "marmalade"
London by Routemaster II. Another map depicting the further-shrinking coverage of London's Routemaster buses.
Breaking the Law, Breaking the Law. One of three sequential entries describing my attempt to cycle to work, strictly observing the rules and guidelines of the Highway Code. Sadly, based on a strict interpretation beyond the Highway Code itself.
Travel Games. Happy memories of childhood obsessions.
The Next Day. Overly detailed description of my journey to work through various roadblocks that remained in place a day after the London bombings of 7 July 2005.
Olympic Fever. My random selection of badminton as the sport I will pursue in the 2012 Olympics. At least now I can compete for Britain.
Of Nerds, Spazzes, Wonks and Dweebs. Etymology of various terms related to geekery.
Traffic Calming. A bit too much on the various devices and systems used for traffic calming. Here's where you find the difference between speed cushions and speed bumps.
More Ideal US Locations. Learning a little more about the cities that appear high on the list of "Your ideal US locations" generated by findyourspot.com
Route Planning. Fine details of minor variants in my commute.
Absorbubbles. Why does the nasty marketing term "absorbubbles" sound so bad?
Slug Bugs Gone Wild. Detailed rules for our own variant of the "Slug Bug" game.

Friday, April 28, 2006
It's official! We've received our official "CITIZENSHIP INVITATIONS":
"APPLICATION FOR BRITISH CITIZENSHIP BY
Dunce
I am pleased to tell you that this application for British citizenship has been approved. To complete the process of becoming a British citizen, you will need to attend a citizenship ceremony to receive your certificate. In the ceremony you will take an Oath or Affirmation of allegiance to the Crown and a Pledge of loyalty to the United Kingdom. This is a formal promise to Her Majesty the Queen and the United Kingdom."
The choice between Oath or Affirmation depends upon whether Almighty God is mentioned or not. We'll do the ceremony sometime in the next three months (to be scheduled quite soon). It only took one month for our citizenship to be approved (from the day we submitted our documents: 20th of March); we were shocked it happened this quickly as the official website gave an expected waiting time of four months. We haven't even thought about planning the party. I guess our applications were very straightforward.
The date on the official letter was the 20th of April (Adolf Hitler's birthday, the date of the Columbine school shooting, and also a date revered by numerous American cannabis users for its resemblance to the not-at-all-secret number 420). In a not very modern twist, both citizenship invitations arrived in a single envelope addressed only to me. I like to think this was only because my name was on the payment slip, rather than Mrs. Dunce being considered chattel.

Monday, March 06, 2006
I don't usually comment on the
various search terms that lead people to my blog, but came across one I
couldn't resist. A couple of days ago, someone visited from a Google
search for the following: nicer summer dresses suitable for wearing to a beach wedding (my blog is #16 on Google at the moment for this search).
It would be rude of me to criticize this query on the basis of its
not-entirely sensible use and combination of search terms (I personally
would start with "summer dress" "beach wedding" but then I'm
not in the market for dresses of any season and have not been invited
to any beach weddings, so maybe this is just sour grapes), but the
searcher clearly wasn't satisfied with the outcome if s/he actually
visited my site looking for it.
In fact, I cannot think of many sites more poorly suited for answering
such queries. The Dunce is known for the following fashion tips*.
1. Wear clothes until they wear out.
2. Wear a shirt on the upper half, and shorts on the lower half.
3. During the months of November, December, January and February, "shorts" in Tip 2 can be replaced by "trousers".
4. Purple is a favourite autumnal colour, brightening up the catwalks
year after year, but this season it's more wearable than ever, coming
in a range of gorgeous shades from lilac through to plum.
5. Put on a clean pair of underwear in the morning.
6. Wearing dirty clothing is fine, as long as the crotch area is mostly clean looking.
7. A wristwatch is a handy accessory if you wish to know the time on a regular basis.
So if you ask me about a nice summer dress suitable for wearing to a
beach wedding, I'm most likely to try and change the subject (likely
topics: various techniques for adjusting chain tension on a
single-speed bicycle; different kinds of speed control devices on
London roadways; why "Menzies" is nicknamed "Meng"; how history would
have changed if the embattled men at the Alamo had held out for another
week). If you do pin me down, don't blame me if you appear on the
"FASHION DON'Ts" page of the celebrity mags.
*OK, I admit, one of these is plagiarized from somewhere else.

Tuesday, December 13, 2005
Dear Friends,
Please forgive my excruciatingly long lapse in posting. Contrary to
what some of you are surely thinking, my recent trip to the US did not
include any road-to-Damascus moments which resulted in an overt
decision to post much, much less. Instead, I brought back with me not
only pleasant memories and bulging luggage (and belly) but some sort of
debilitating chest cold / flu / wasting fever which developed into a
lovely case of bronchitis, causing me to take to my bed for a period of
some days. Rather than walking you through the specific symptoms
(therefore you will have no need for your phlegm color chart on this
particular occasion) I will only report that I am back in business (if
my efforts can be said to be businesslike in any way; I am sure many
counterpoints can be made to such a claim). Normal posting will resume in due course (at least, I hope so).
Cheers,
The Dunce

Thursday, October 20, 2005
While writing a previous entry I noticed a high frequency of the term "fortunately" in my posts. Perhaps I've had many fortunate experiences, or perhaps I've been telling lots of tales involving possible misfortune, but in which the worst possibilities did not come to pass. Or maybe I just like the word "fortunately". Anyway, since I've been doing some simplistic work analyzing corpora of texts, I thought I'd turn these analyses on my own blog entries and see what other atypical patterns of word choice are present in my writings (up to and including my last entry). I am focusing here strictly upon word frequency: What uncommon words do I use especially frequently? What common words do I use less frequently than would be expected? And what do I write about the most, just in terms of the content words I recycle again and again?
For the sake of simplicity I am using a somewhat out-of-date word frequency database (Kucera & Francis, 1967. Information on the corpus can be found here); this was once the accepted source of word frequency information (approximately 1,000,000 words from 500 different sources), although much larger texts have since supplanted this database (for example, the British National Corpus is based on 100m words). To give you an idea of the distribution, here are a few of the most common words in the K&F corpus and how often each one occurred:
THE 69971
OF 36411
AND 28852
TO 26149
A 23237
IN 21341
THAT 10595
IS 10099
WAS 9816
HE 9543
I combined all the text of my blog entries (including titles, picture captions, and the text of hyperlinks, but not including dates, category labels or comments) and calculated how often each word occurred (a handy online tool for doing this can be found here). I discarded all words that occurred fewer than five times, and obtained K&F frequency values for each of the remaining words (a handy tool to do this and more can be found here). My ten most frequently used words were quite similar to the K&F set (above):
THE 3218
A 1663
OF 1646
TO 1477
AND 1242
IN 994
I 942
IS 602
FOR 478
IT 470
There are generally similar patterns between the two although I am clearly talking about myself more than the K&F sources ("I" is the 7th most popular word in my writing, and 20th most common in the K&F corpus), and less about other men ("HE" is #10 in K&F, but barely squeaks into the top 50 in my list).
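For anyone who would rather not rely on an online tool, the counting step described above (combine the text, count each word, discard anything occurring fewer than five times) can be sketched in a few lines of Python. This is only a rough sketch and not the tool I actually used: the function name `word_counts`, the tokenizing rule (runs of letters with optional internal apostrophes, so that "I'll" stays one word) and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter


def word_counts(text, min_count=5):
    """Count words case-insensitively, keeping only those seen at least min_count times."""
    # Assumed tokenizer: runs of letters, optionally joined by apostrophes (e.g. "i'll").
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = Counter(words)
    return {word: n for word, n in counts.items() if n >= min_count}


# Made-up sample text standing in for the combined blog entries.
sample = "the bike and the pub " * 5 + "fortunately the bus"
counts = word_counts(sample)

# Most frequent first; ties broken alphabetically.
top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
for word, n in top:
    print(word.upper(), n)
```

A different tokenizing rule (say, splitting hyphenated words or dropping contractions) would shift the counts a little, which is worth keeping in mind when comparing against a reference list built with its own rules.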
When it comes to "fortunately" (and words like it), unfortunately I neglected to consider an important aspect of the K&F frequency database: it seems that certain kinds of derivational terms were counted under their stem rather than as a specific wordform. So "fortunately" (which I have used 40 times) did not ever occur in the K&F database. Nonetheless, a list of my most frequently used words that never occur in the database is still somewhat informative about my usage tendencies. Among those that don't occur for derivational reasons are (in decreasing order of frequency)
especially (50)
seems (50)
fortunately (40)
words (33)
times (31)
folks (27)
things (25)
minutes (23)
probably (23)
definitely (22)
So it's not just "fortunately" but quite a few other similar adverbs that characterize my writing. Some other terms that I use frequently but don't appear in the database are contractions (I'll, 51; that's, 32; I'd, 31; there's, 21) or abbreviations (ABV, 40; UK, 33; OED, 23). Once all of the above are excluded we are left with the terms that I definitely produce more frequently than the database would predict:
dunce (61) (no surprise there)
bike (39) (I am quite bike-obsessed, and perhaps this abbreviation for "bicycle" is more popular now than in the mid-60s? It's been around since the 1880s, though.)
blog (30) (a very new term: OED's earliest citation is 1999, although the source "weblog" is seen as far back (!) as 1993.)
google (24) (rarely used except in cricket until 1996)
Tallinn (19) (I guess there was not so much mention of Soviet cities in the [American] texts that made up the K&F corpus).
website (14) (another new one; OED's first citation ("WEB site") is from 1993)
spam (14) (The product made of pork shoulder and ham certainly existed in the sixties, but this dirty little secret was brushed under the rug as far as the frequency corpus goes. Spam as a verb dates back only to 1991, again according to OED [which does not mention the Monty Python origin])
So there are a few (but not many) quite predictable terms that I use more often than the corpus would predict. Now how about the other direction? I selected the 200 most frequent words in the K&F database and checked which (if any) I used fewer than five times. There were four such words (wept, 507; united, 482; government, 417; knew, 395). "Wept" and "knew" are irritating because these are clearly derived from "weep" and "know" (why do these appear in the database, but "especially", "seems" and "fortunately" do not? Probably because they're irregular, but still...). I don't use the word "weep" in regular conversation unless I'm being dramatic, but am surprised not to have mentioned "knew" given my constant discussions that seem related to knowledge. "United" and "government": my infrequent use of these terms is probably a very good sign that I'm not a political blogger (I get riled up enough writing about traffic, meal times, classifications of nerds and so on).
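The two lookups described so far (words I use that never appear in the reference list, and very common reference words that I hardly use) boil down to simple dictionary checks. Here is a hedged Python sketch; the function names and the tiny dictionaries are made up for illustration, although the individual counts in them are ones quoted in this entry. It assumes my own counts have already been cut off at five occurrences, so a word missing from my list counts as "fewer than five".

```python
def missing_from_reference(mine, reference):
    """Words I use that never occur in the reference list, most frequent first."""
    absent = [w for w in mine if w not in reference]
    return sorted(absent, key=lambda w: (-mine[w], w))


def underused(mine, reference, top_n=200, threshold=5):
    """Of the top_n most frequent reference words, those I used fewer than threshold times."""
    top_ref = sorted(reference, key=lambda w: (-reference[w], w))[:top_n]
    return [w for w in top_ref if mine.get(w, 0) < threshold]


# Tiny illustrative dictionaries (word -> count); real lists would be far longer.
mine = {"the": 3218, "dunce": 61, "fortunately": 40}
reference = {"the": 69971, "government": 417, "knew": 395}

print(missing_from_reference(mine, reference))
print(underused(mine, reference, top_n=3))
```

Note that this sketch does nothing about the stemming problem mentioned above: a word like "fortunately" shows up as missing simply because the reference list filed it under a different form.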
Finally, I looked at all of those words that appear both in the frequency database and my own writing. I did some statistical tricks* in order to assess which words occurred unexpectedly often in my writing (as predicted by K&F frequencies), and which words occurred unexpectedly rarely. Here are the results:
My "unexpectedly often" words came from specific topic areas which I must admit I've spent perhaps too much time on: the consumption of alcohol (pub, ale, beer, cider), transportation (zebra, bus, cycle, traffic, destination, commute, London, route), language (noun, etymological, Albanian, verb, slang), and other more specific matters which have drawn my attention (marmalade, Portuguese, quince; slug, bug; badminton). Strangely very little about music ("festival" had a z-score of +1.79 but I've also referred to beer festivals). I should also note here that "toilet" still appears more often in my language than would be expected. I'm still the same little boy who got in trouble on a third grade assignment to write sentences including the words from that week's spelling list. All of my sentences included the word "toilet", and I was therefore given the opportunity to write "toilet" another 500 times. It clearly didn't cure me of it. In general, I also used function words (the, a, an, to, etc.) more often than would be expected from the corpus; perhaps this comes from my (attempted) conversational tone.
When it comes to words I didn't use as often as would be expected, there were a lot of male terms (men, himself, man, "John", Mr., him), and a lot more terms which you'd expect to see a lot on your bog-standard political blog (system, social, state, development, program, action, war, court, general, power, against, society, American, freedom, business). Am I intentionally avoiding these hot-button topics? Yeah, I guess so.
*Technical note: Frequency data like these are notoriously exponentially distributed, so in order to do this comparison I first transformed frequency by taking the logarithm, then converted the log frequencies into z-scores within each sample (K&F z-score for "the" = 4.16; K&F z-score for a word with frequency 1 = -3.22). I took the difference between K&F z-score and the z-score derived from my own word frequencies as a measure of the difference beyond the distributional patterns.
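The procedure in the technical note (take the logarithm of each frequency, convert to z-scores within each sample, then take the difference) might look something like the Python sketch below. Several details are assumptions rather than a record of what I did: the natural logarithm, the population standard deviation, computing the z-scores over whatever words are passed in, and the sign convention (my z-score minus the reference z-score, so that positive means "used more often than expected"). The function names and the toy counts are invented for illustration.

```python
import math
from statistics import mean, pstdev


def log_z_scores(freqs):
    """Map each word to the z-score of its log frequency within this sample."""
    logs = {w: math.log(n) for w, n in freqs.items()}
    mu = mean(logs.values())
    sigma = pstdev(logs.values())  # assumes the counts are not all identical
    return {w: (v - mu) / sigma for w, v in logs.items()}


def usage_difference(mine, reference):
    """My z-score minus the reference z-score, for words present in both samples."""
    z_mine = log_z_scores(mine)
    z_ref = log_z_scores(reference)
    shared = z_mine.keys() & z_ref.keys()
    return {w: z_mine[w] - z_ref[w] for w in shared}


# Toy counts, invented purely to show the direction of the result.
mine = {"the": 1000, "pub": 100, "he": 10}
reference = {"the": 1000, "pub": 10, "he": 100}

diff = usage_difference(mine, reference)
for word in sorted(diff, key=diff.get, reverse=True):
    print(f"{word}: {diff[word]:+.2f}")
```

In this toy case "pub" comes out positive and "he" negative, while "the" sits at zero because it holds the same relative position in both samples. With real data the z-scores depend heavily on which words are included in each sample, so the cut-off at five occurrences matters.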

Friday, October 14, 2005

Wednesday, October 05, 2005
As a blogger who hasn't been at it for so long (110 entries and not quite 9 months), eventually I must come to the time when I express amusement and befuddlement about the search terms that bring visitors to my site (perhaps in part as a not-so-subtle announcement that the blog is being read by more than just my parents, siblings, spouse and relatives-in-law [insert obligatory in-law joke here]). Now is that time; sorry about that. I've already mentioned the frequent visits by referral spammers (here) but now I'd prefer to discuss real visits by real people. Most of my regular visitors seem to come from bookmarks or (one of a few) blogrolls, and occasional visitors follow links from other blogs (thanks for linking me!). And then there are those who reach me by web searches. Especially Google's fairly recently-launched blog search. As it turns out, here are the top 5 search terms in the past month or so:
1. Inzest: (German translation of "incest"). Who would have thought my post about the Inzest-Baby would be so popular? Yes, I do come from Indiana. Yes, my parents do live in Kentucky. That doesn't mean anything in this day and age! Anyway, I suspect (hope?) most of these visitors are leaving entirely unsatisfied.
2. Zigni House: (Eritrean restaurant in north London). My review was a good one and there are not so many other reviews of this place online (undeservedly few!). I'm going there again soon, I promise.
3. Confederacy of Dunce: I'm pretty sure these are all misplaced references to the excellent novel A Confederacy of Dunces which is of course the inspiration for the name of my blog. I share perhaps a few too many characteristics with a particular character in that novel.
4. Boswelox: I was irritated at the pseudo-scientific tone of advertising (boswelox is frankincense + manganese), and I'm not the only one curious about this mysterious, amazing substance which (allegedly) helps reduce the appearance of lines caused by facial micro-contractions. Bah!
5. Sawney Beane: Lots of people are curious about this legendary cannibal about whom I wrote back in the very early days of my blog (only my seventh entry!). He's also known as "Sawney Bean", and apparently Sawney is a nickname for Alexander. No official word yet on whether he really existed, though. Here is the original post (in which I take a fairly a-sawney-ic position).
I can't leave this topic without mentioning my favorite searches of the month (none of which are actually relevant to anything I've written). Special credit is due for the MSN search: do girls fart. Although I have not written on this subject before I will officially reply with a solid "Yes". Second favorite is transporting a motorbike in an inflatable boat. Although I haven't written anything about this before either, I think I'll step forward with an equally solid "No". Finally there was gorge warshington. I'm not quite sure how this found me, but nonetheless it did (But not any more. If you google gorge warshington dunce, you get only one page [not mine]). I like this alternate spelling and may adopt it myself.

Friday, September 16, 2005
A curious thing has been happening in the world
of spam and its intersection with my blog. I went through the
standard blog growing pains of dealing with comment spam a while ago
(also trackback spam, but this has been only a very minor problem
thanks to the dasBlog upgrade). For those few who don't know, comment
spam is when someone places an advertising comment on a blog (intending
to have it displayed to other readers who read the comments and
possibly follow their links, also possibly trying to gain better
listings from search engines). I suddenly started getting a lot of
comment spam, which was easily stopped by requiring commenters to
recognize and type in distorted letters (captcha)
in order to make a comment. Only one spammer has made it through to
leave comment spam since then (related to construction services in
Philadelphia, and entered [by hand I assume] on this comment which I suppose is loosely related).
The real problem (and it's only a problem behind the scenes) is
referral spam. As is the case for most blogs, mine keeps details of who
is visiting my site (what links they clicked to get here, what sort of
browser they are using, when comments were added, etc.). Referral spam
abuses this system, making it look like visitors have come to a blog
from a commercial site (at least for my site, almost entirely related
to poker and/or diet pills, the names of which I have intentionally
included in this post without munging them in order to see whether this post attracts undue attention). Some blogs (like this one
for example [it hasn't been updated in a while, and has various other
problems to boot]) display an automatically-generated list of the top
referrers, which is probably why this sort of referral spam has caught
on (I doubt Mr. Max [former contestant on the UK version of the reality
TV program "The Apprentice" {Alan Sugar instead of Donald Trump?!?}] is
actually getting loads of referrals related to phentermine, norwegion
cruises [sic], ringtone, cheap calls, cialis, jackpots, virtual slots,
etc.). But on my blog, referrals are not displayed anywhere but to me
(when I look at the logs). In fact, the updated version of dasBlog goes
some ways to prevent referral spam. But my logs remain full of blocked
referrals (something like 30 today so far, nearly all blocked because
of texas-holdem, free-online-poker and the like, although once in a
while a referral spam will make it past the block. But it doesn't
matter because I don't display referrers anywhere). Like buses they
come all at once: a single spam site "refers" to one of my entries,
then quick as a wink they troll through various other entries with
their false referrals (I really appreciate real referrals, by the way,
like this one). But it seems like almost always the same entry is the one they start with: Cowes to Lymington and back again.
How and why they started with this entry, I have no idea. Other entries
of mine are far more popular in gaining visitors who get here through
search engines. This is my only entry about sailing (unless I mentioned
the 1980 Olympics in one of my Tallinn posts); perhaps spammers are
looking for sailors? Perhaps I'll write an entry about boxers and see
whether spammers are following the lead of Nina Hamnett.