While going through Google's Webmaster Tools earlier today for one of my sites, I noticed under Crawl > Blocked URLs that the number of URLs Google had blocked was 0, even though the site has a robots.txt with a few rules and Google has crawled and indexed tens of thousands of pages so far.
Upon further inspection, I noticed a peculiar thing:
Line 1: User-agent: * Syntax not understood
Interesting. How could User-agent not be recognized?
It turns out one possible cause is the file's encoding: if robots.txt is saved as UTF-8 with a BOM (byte order mark), three extra bytes are inserted at the very beginning of the file. Google reads those bytes as part of the first line, so the User-agent directive no longer parses.
Here’s what I had in my robots.txt before:
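The screenshot of my actual file isn't reproduced here, but the failure mode is easy to demonstrate. A short sketch — the rules below are placeholders, not the real contents of my robots.txt:

```python
# Reproducing the problem: saving robots.txt as "UTF-8 with BOM"
# prepends the bytes EF BB BF. The rules below are placeholders,
# not the actual contents of my file.
rules = "User-agent: *\nDisallow: /admin/\n"

# Python's "utf-8-sig" codec writes the BOM for us, like Notepad's
# "UTF-8" option does.
with open("robots.txt", "w", encoding="utf-8-sig") as f:
    f.write(rules)

raw = open("robots.txt", "rb").read()
print(raw[:3])    # b'\xef\xbb\xbf' -- the byte order mark
print(raw[3:16])  # b'User-agent: *' -- the directive Google expects on line 1
```

To a parser that reads the file byte-for-byte, line 1 is not `User-agent: *` but `\xef\xbb\xbfUser-agent: *` — hence the "Syntax not understood" message.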
To solve this issue, there were two things that I had to do.
First, I opened the robots.txt file in Notepad, did a Save As, and selected ANSI as the encoding, since ANSI-encoded files are written without a BOM.
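If you'd rather script this than use Notepad's Save As dialog, the BOM can be stripped in code. `strip_bom` is a helper I'm sketching here, not part of any library:

```python
# An alternative to Notepad's Save As dialog: strip the BOM in code.
BOM = b"\xef\xbb\xbf"

def strip_bom(path):
    """Rewrite the file at `path` without a leading UTF-8 BOM, if present."""
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(BOM):
        with open(path, "wb") as f:
            f.write(data[len(BOM):])  # drop the three BOM bytes

# Demo on a throwaway file saved "UTF-8 with BOM":
with open("demo-robots.txt", "wb") as f:
    f.write(BOM + b"User-agent: *\n")
strip_bom("demo-robots.txt")
print(open("demo-robots.txt", "rb").read())  # b'User-agent: *\n'
```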
Second, I added a comment on the first line and pressed ENTER, so that User-agent moved down to the second line.
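With the comment in place, any stray leading bytes attach to a line the parser ignores anyway, and the directives start cleanly on the next line. The top of the file ended up along these lines (the rules themselves are placeholders):

```
# robots.txt
User-agent: *
Disallow: /admin/
```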
After re-uploading the file to the site and running the test again, the message went away.
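To verify a deployed file yourself, fetch it and inspect the first three bytes. A minimal check — the helper below is my own sketch, and you'd point it at your site's robots.txt (for example with `curl -s` piped through a hex dump, or `urllib` in Python):

```python
# The UTF-8 byte order mark, as it appears on the wire.
BOM = b"\xef\xbb\xbf"

def starts_with_bom(payload: bytes) -> bool:
    """Return True if the payload begins with the UTF-8 byte order mark."""
    return payload.startswith(BOM)

# A file saved as "UTF-8 with BOM" gains three extra bytes up front:
print(starts_with_bom(b"\xef\xbb\xbfUser-agent: *"))  # True
print(starts_with_bom(b"User-agent: *"))              # False
```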
Hopefully this will help out anyone else who has come across the same problem.