31st March: World Backup Day 2013

backupday

World Backup Day is a day for people to learn about the increasing role of data in our lives and the importance of regular backups.

This independent initiative to raise awareness about backups and data preservation started out — like most good things on the internet – on reddit by a couple of concerned users. Let’s make this happen!

More info at http://www.worldbackupday.com

ViralSearch: Identifying and Visualizing Viral Content

What does it mean for online content to “go viral”? An analysis of almost a billion information cascades on Twitter news, videos, and photos has produced the first quantitative notion of whether something has indeed gone viral, thereby enabling further research into topic experts, trending topics, and viral-incident metrics.

More info at http://research.microsoft.com/apps/video/default.aspx?id=185452

Scrape and download pdf files from google or bing using PowerShell

Here is a very simple script that you could execute using PowerShell ISE. It could probably be written much better, but it works. The script just uses the power of the google search engine by searching for a specific filetype. This should also work with the Bing search engine.
To make the script work, make sure you have a directory C:\temp\dwnld\ created. Also you could easily change the regular expression pattern and the keywords.

Comments with modifications on the scripts are always welcome ;)

$keywords = @("manual", "microsoft", "powershell")
$pattern = 'http://(.*?)[.]{1}pdf'
$storageDir = "C:\temp\dwnld\"
$filetype = "pdf"
$rand = New-Object System.Random

$keywords | foreach {
    $urlToScrapeWithKeyword = "http://www.google.be/search?hl=nl&tbo=d&biw=1229&bih=677&output=search&sclient=psy-ab&q={0}+filetype%3A{1}&btnK=" -f $_, $filetype
    $urlToScrapeWithKeyword | Out-Default
    (Invoke-WebRequest -UseBasicParsing -Uri $urlToScrapeWithKeyword).Links | select -ExpandProperty href | Get-Unique | foreach {
        if ($_ -match $pattern) {
            $Matches[0] | Out-Default
            try {
                Start-BitsTransfer $Matches[0] $storageDir
                "Download ok" | Out-Default
            } catch [exception] {
                "Download failed:" | Out-Default
                $_.Exception.Message | Out-Default
            }
            "Sleeping" | Out-Default
            Start-Sleep -s $rand.Next(20, 43)
        }
    }
}

Enjoy ;-)
Don’t forget, web scraping can be illegal! Use it with care!