Author Topic: Robot.txt Query & Archive.org Exclusion  (Read 1251 times)


Offline Thundercraft

  • Warrant Officer, Class 1
  • *****
  • Posts: 81
  • Thanked: 6 times
  • Ensign Navigator
Robot.txt Query & Archive.org Exclusion
« on: December 10, 2015, 10:10:00 AM »
Currently, the Aurora wiki is online. (Sometimes, the wiki is taken offline.) But, I found a page that would not display due to some error.

Anyway, I tried to use the Internet Archive Wayback Machine (archive.org) to see their copy of the page. Unfortunately, all they gave me was an error message:

Quote
Page cannot be crawled or displayed due to robots.txt.

See aurorawiki.pentarch.org robots.txt page. Learn more about robots.txt.

I've seen this problem more and more with archive.org. It's usually due to domain parking outfits that buy out domain names purely to display ads and generate ad revenue. They have extremely restrictive robots.txt exclusions that prevent any and all forms of bot crawling and also legally prevent Archive.org from displaying any archived/cached content they may have (even content from before robots.txt was changed by the new owner).

Currently, your robots.txt is very simple:
Quote
User-agent: *
Disallow: /

If I understand correctly, doesn't this block any and all forms of bots and web caching?
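
For what it's worth, the effect of those two lines can be checked locally. Below is a minimal sketch using Python's standard `urllib.robotparser`; the crawler names and the page path are illustrative assumptions, not something taken from the wiki's configuration. It only shows how this particular parser reads the rules, and robots.txt is advisory in any case: it affects only crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# The two rules quoted above.
BLANKET_BLOCK = """\
User-agent: *
Disallow: /
"""

# Hypothetical page URL on the wiki host; the path is an assumption.
PAGE_URL = "http://aurorawiki.pentarch.org/some_page"


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Agent names are examples; any name falls under the "*" rule here.
    for agent in ("Googlebot", "ia_archiver", "SomeOtherBot"):
        print(agent, is_allowed(BLANKET_BLOCK, agent, PAGE_URL))
```

Under this parser, every agent is refused for every path, which matches the reading that the file shuts out all compliant crawlers.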

My question:
I understand the need to have a robots.txt that blocks unnecessary bots which suck up precious bandwidth. But isn't a compromise possible to allow Archive.org and search engines like Google to cache your pages without allowing just any bots? Perhaps exceptions could be added specifically for Google and archive.org?

Most websites that I've checked out on Archive.org display just fine. To me, that says they struck a balance between blocking unnecessary bots/crawling while still allowing Archive.org and search engines to do their thing.
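
A rough sketch of what such a compromise might look like, again checked with Python's standard `urllib.robotparser`. The policy text is hypothetical, not a recommendation for this particular host. `Googlebot` is Google's crawler token; `ia_archiver` is assumed here to be the token the Wayback Machine honors, so that should be verified against the Internet Archive's current documentation. The page path is also an assumption.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical compromise policy: the named crawlers get an empty
# Disallow (nothing is off limits), everyone else stays blocked.
# "ia_archiver" is an assumed token for the Internet Archive.
SELECTIVE_POLICY = """\
User-agent: Googlebot
Disallow:

User-agent: ia_archiver
Disallow:

User-agent: *
Disallow: /
"""

# Hypothetical page URL on the wiki host; the path is an assumption.
PAGE_URL = "http://aurorawiki.pentarch.org/some_page"


def allowed_agents(robots_txt, agents, url):
    """Map each agent name to whether robots_txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    agents = ["Googlebot", "ia_archiver", "SomeOtherBot"]
    for agent, ok in allowed_agents(SELECTIVE_POLICY, agents, PAGE_URL).items():
        print(agent, "allowed" if ok else "blocked")
```

With this policy the two named crawlers are let through and any other compliant bot is still turned away. It does nothing against bots that ignore robots.txt, so it would not replace other anti-spam measures.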
"Not only is the universe stranger than we imagine, it is stranger than we can imagine." - Sir Arthur Stanley Eddington
 

Offline Erik Luken

  • Administrator
  • Admiral of the Fleet
  • *****
  • Posts: 5006
  • Thanked: 78 times
    • Arkayn Game Design
Re: Robot.txt Query & Archive.org Exclusion
« Reply #1 on: December 10, 2015, 10:19:46 AM »
Which page didn't display?

You are right about the robots.txt. It was also to combat spam before I made the logins really restrictive (needing a valid account here on the forums).

I can look into freeing it up some, though that probably won't happen until the weekend. :)
 

Offline Thundercraft

  • Warrant Officer, Class 1
  • *****
  • Posts: 81
  • Thanked: 6 times
  • Ensign Navigator
Re: Robot.txt Query & Archive.org Exclusion
« Reply #2 on: December 10, 2015, 10:26:43 AM »
Quote from: Erik Luken
Which page didn't display?

It was Beam Weapons and CIWS. However, when I tried again it displayed fine and I can't recreate the error. I think the page may have merely timed out due to a hiccup and my slow connection.

Quote from: Erik Luken
I can look into freeing it up some, though that probably won't happen until the weekend. :)

It's appreciated. Though, hopefully, we won't have to read archived wiki pages any time soon.  ;)
 

Offline Erik Luken

  • Administrator
  • Admiral of the Fleet
  • *****
  • Posts: 5006
  • Thanked: 78 times
    • Arkayn Game Design
Re: Robot.txt Query & Archive.org Exclusion
« Reply #3 on: December 10, 2015, 10:39:04 AM »
My host has issues with the Aurora wiki... They may have throttled it.

I would like to get the data from the wiki and put it in the KB here.
 

Offline 83athom

  • Big Ship Commander
  • Vice Admiral
  • **********
  • Posts: 1196
  • Thanked: 79 times
Re: Robot.txt Query & Archive.org Exclusion
« Reply #4 on: December 10, 2015, 11:07:02 AM »
I got onto the Beam Overview page. You want me to C&P it and other important pages to a word file(s) so one of you can have a field day editing them?
« Last Edit: December 10, 2015, 11:10:06 AM by 83athom »
Give a man a fire and he's warm for a day, but set fire to him and he's warm for the rest of his life.
 

Offline Erik Luken

  • Administrator
  • Admiral of the Fleet
  • *****
  • Posts: 5006
  • Thanked: 78 times
    • Arkayn Game Design
Re: Robot.txt Query & Archive.org Exclusion
« Reply #5 on: December 10, 2015, 11:21:59 AM »
Quote from: 83athom
I got onto the Beam Overview page. You want me to C&P it and other important pages to a word file(s) so one of you can have a field day editing them?

You could just copy & paste it into a new KB article :D
 

Offline 83athom

  • Big Ship Commander
  • Vice Admiral
  • **********
  • Posts: 1196
  • Thanked: 79 times
Re: Robot.txt Query & Archive.org Exclusion
« Reply #6 on: December 10, 2015, 11:33:00 AM »
They're getting 403s when I try to post, and I'm busy atm so I can't go into it and fix it. Although it does look good in a Word document (I'll attach it below) (cannot attach .docx, apparently)
 

Offline Erik Luken

  • Administrator
  • Admiral of the Fleet
  • *****
  • Posts: 5006
  • Thanked: 78 times
    • Arkayn Game Design
Re: Robot.txt Query & Archive.org Exclusion
« Reply #7 on: December 10, 2015, 01:19:53 PM »
Quote from: 83athom
They're getting 403s when I try to post, and I'm busy atm so I can't go into it and fix it. Although it does look good in a Word document (I'll attach it below) (cannot attach .docx, apparently)

Allowed file types: gif, jpg, pdf, png, csv, txt, zip

Zip it. :)
 

Offline Mor

  • Commander
  • *********
  • Posts: 305
  • Thanked: 10 times
Re: Robot.txt Query & Archive.org Exclusion
« Reply #8 on: January 08, 2016, 10:38:17 AM »
Quote from: Thundercraft
Currently, the Aurora wiki is online. (Sometimes, the wiki is taken offline.) But, I found a page that would not display due to some error.

I think that Erik or the host changed the setting. Previously, the setting excluded the forum; now it excludes the wiki. Apparently we can't have both  :(

Quote from: 83athom
I got onto the Beam Overview page. You want me to C&P it and other important pages to a word file(s) so one of you can have a field day editing them?

Honestly, a KB is a waste of time. People always get excited about new things, but I have yet to see a single good implementation. A KB is like a tutorial post and suffers from many of the same limitations/issues, especially with continued development.

Quote from: Erik Luken
My host has issues with the Aurora wiki... They may have throttled it.

I would like to get the data from the wiki and put it in the KB here.

You might want to back up the wiki DB to avoid losing info, especially if you are considering changing hosts.
 

 
