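I am a huge fan of Reddit. Sometimes I want to look through every link in a subreddit in the hope of stumbling upon some absolute intellectual treasures. Here is the Ruby script that I use to get all the links of a given subreddit. It logs each link's title and URL to a file and follows the next-page link until there are no more pages.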

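require 'nokogiri'
require 'open-uri'
require 'logger'

# Every link found is written to this log file.
$LOG = Logger.new('documentaries.log')
$i = 1 # running count of links across all pages

# Logs every post link on the page at url, then follows the "next" link.
def get_links(url)
  # URI.open replaces the bare open(url), which no longer fetches URLs on Ruby 3.0+.
  doc = Nokogiri::HTML(URI.open(url))

  # Each post title is an <a class="title"> element.
  doc.css('a.title').each do |link|
    $LOG.info("#{$i} #{link.content} --- #{link['href']}")
    $i += 1
  end

  # The pagination link carries rel="nofollow next"; recurse into it.
  doc.css('a[rel="nofollow next"]').each do |link|
    $LOG.info("NextPage: #{link['href']}")
    get_links(link['href'])
  end
rescue StandardError => e
  # StandardError instead of Exception, so Ctrl-C still stops the script.
  $LOG.error e.message
  $LOG.error "Oops, some issue in get_links method -- #{url}"
end

get_links('http://www.reddit.com/r/documentaries')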
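
The script writes its results to documentaries.log rather than to the screen, so the links have to be read back out of that file afterwards. Here is a minimal sketch of that step. It assumes Logger's default line format (a severity and timestamp prefix ending in " -- : " before the message), and the parse_log_line helper is a hypothetical name, not part of the script above:

```ruby
# Hypothetical helper: turns one line of documentaries.log back into its parts.
# Assumes Logger's default formatter, whose prefix ends with " -- : ", and the
# "<n> <title> --- <href>" message written by get_links.
LINK_LINE = /INFO -- : (\d+) (.*) --- (\S+)\s*\z/

def parse_log_line(line)
  match = LINK_LINE.match(line)
  return nil unless match # "NextPage:" and error lines are skipped

  { index: match[1].to_i, title: match[2], href: match[3] }
end

# Usage: print every logged link as "title<TAB>href".
if __FILE__ == $PROGRAM_NAME && File.exist?('documentaries.log')
  File.foreach('documentaries.log') do |line|
    entry = parse_log_line(line)
    puts "#{entry[:title]}\t#{entry[:href]}" if entry
  end
end
```

If the logger is given a custom formatter, the pattern would need adjusting to match.
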
  1. I need to thank you for this wonderful read!
    I absolutely loved every bit of it. I have got you bookmarked
    to check out new things you post…

  2. I was wondering exactly how I use the above information to get into the doc.
    Where do I enter it, and do I copy and paste the whole page there or just one line?
