Scrape and download PDF files from Google or Bing using PowerShell

Here is a very simple script that you can run from the PowerShell ISE. It could probably be written much better, but it works. The script uses the Google search engine's filetype: operator to find files of a specific type. The same approach should also work with the Bing search engine.
To make the script work, make sure the directory C:\temp\dwnld\ exists. You can easily change the regular expression pattern and the keywords.

Comments with modifications on the scripts are always welcome ;)

# Search terms; one search query is issued per keyword.
$keywords = @("manual", "microsoft", "powershell")
# Matches an http or https URL ending in .pdf inside a result link.
$pattern = 'https?://(.*?)[.]pdf'
$storageDir = "C:\temp\dwnld\"
$filetype = "pdf"
$rand = New-Object System.Random

$keywords | foreach {
    $urlToScrapeWithKeyword = "http://www.google.be/search?hl=nl&tbo=d&biw=1229&bih=677&output=search&sclient=psy-ab&q={0}+filetype%3A{1}&btnK=" -f $_, $filetype
    $urlToScrapeWithKeyword | Out-Default
    # Sort-Object -Unique also removes duplicates that are not adjacent (Get-Unique expects sorted input).
    (Invoke-WebRequest -UseBasicParsing -Uri $urlToScrapeWithKeyword).Links | select -ExpandProperty href | Sort-Object -Unique | foreach {
        if ($_ -match $pattern) {
            $Matches[0] | Out-Default
            try {
                # -ErrorAction Stop turns transfer errors into exceptions so the catch block runs.
                Start-BitsTransfer $Matches[0] $storageDir -ErrorAction Stop
                "Download ok" | Out-Default
            } catch [exception] {
                "Download failed:" | Out-Default
                $_.Exception.Message | Out-Default
            }
            "Sleeping" | Out-Default
            # Random pause of 20 to 42 seconds between downloads.
            Start-Sleep -Seconds ($rand.Next(20, 43))
        }
    }
}
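
The pattern-matching step can be tried on its own, without querying a search engine. The sketch below is only an illustration: the function name Get-PdfUrl, the stricter pattern and the example.com links are assumptions and not part of the script above.

```powershell
# Hypothetical helper: returns the PDF URL embedded in each href, if any.
function Get-PdfUrl {
    param(
        [string[]]$Href,
        # Assumed pattern: http or https, stops at '&', quotes or whitespace.
        [string]$Pattern = 'https?://[^&"\s]+?[.]pdf'
    )
    foreach ($h in $Href) {
        if ($h -match $Pattern) {
            # $Matches[0] holds the text matched by the whole pattern.
            $Matches[0]
        }
    }
}

# Made-up hrefs in the shape a search result page might return.
$sampleHrefs = @(
    '/url?q=http://example.com/docs/manual.pdf&sa=U',
    '/search?q=powershell',
    'https://example.org/guide.pdf'
)

Get-PdfUrl -Href $sampleHrefs
```

Testing the pattern against a few saved hrefs like this makes it easier to adjust before running the full download loop.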

Enjoy ;-)
Don’t forget, web scraping can be illegal! Use it with care!

Web Scraping with PowerShell

In PowerShell v3 you have some new useful cmdlets that allow you to download and parse a website.
The code in this post will demonstrate very basic scripts that could get you started with Web Scraping.

If you don’t know if you have PowerShell v3, use this command to find out:

get-host
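
Get-Host reports the version of the host program, which normally matches the engine version in the console and the ISE but may differ in other hosts. A small sketch of a more direct check follows; the function name Test-WebCmdletAvailable is made up for this example.

```powershell
# Hypothetical helper: true when the running engine is v3 or later
# and the Invoke-WebRequest cmdlet can be found.
function Test-WebCmdletAvailable {
    $hasVersion = $PSVersionTable.PSVersion.Major -ge 3
    $hasCmdlet  = [bool](Get-Command Invoke-WebRequest -ErrorAction SilentlyContinue)
    $hasVersion -and $hasCmdlet
}

# Show the engine version, then the result of the check.
$PSVersionTable.PSVersion
Test-WebCmdletAvailable
```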

The first script to get you started with web scraping:

$site = Invoke-WebRequest -UseBasicParsing -Uri www.bing.com
$site.Links | Out-GridView

This will give you all the links from the given website in a gridview.

The next script will give you all the email addresses that are in a mailto: anchor:

$site = Invoke-WebRequest -UseBasicParsing -Uri www.mywebsite.net
$site.Links | foreach {
    # -like is case-insensitive and also skips links that have no href
    if ($_.href -like "mailto:*") {
        $_.href.SubString(7) | Out-Default
    }
}

The site ‘mywebsite.net’ happens to have anchors that use the mailto: prefix.
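
The same idea can be wrapped in a function that also removes duplicates and strips a trailing ?subject=... part. This is a sketch only: Get-MailtoAddress and the example.com links are assumptions, and the stand-in objects replace a live page.

```powershell
# Hypothetical helper: collects unique addresses from mailto: links.
function Get-MailtoAddress {
    param([object[]]$Link)
    $Link |
        Where-Object { $_.href -like 'mailto:*' } |
        ForEach-Object {
            # Drop the 'mailto:' prefix and any '?subject=...' suffix.
            ($_.href.Substring(7) -split '\?')[0]
        } |
        Sort-Object -Unique
}

# Stand-in objects instead of a live page; real input would be $site.Links.
$sampleLinks = @(
    (New-Object PSObject -Property @{ href = 'mailto:info@example.com?subject=Hello' }),
    (New-Object PSObject -Property @{ href = 'MAILTO:sales@example.com' }),
    (New-Object PSObject -Property @{ href = 'http://example.com/contact' }),
    (New-Object PSObject -Property @{ href = 'mailto:info@example.com' })
)

Get-MailtoAddress -Link $sampleLinks
```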

The last script is a very cool script from StackOverflow; I only modified the URL to make sure the script works in several European countries:

function Get-FlightStatus {
    param($query)
    $url = "http://www.bing.com?cc=us&q=flight status for $query"
    $result = Invoke-WebRequest $url
    $result.AllElements |
        Where Class -eq "ans" |
        Select -First 1 -ExpandProperty innerText
}

Use it like this:
(to test, you can paste this after the function in the Windows PowerShell ISE)

Get-FlightStatus LH3102

It will give you a result similar to this:

Flight status for Lufthansa 3102 
flightstats.com · 2 minutes ago   

Departing on time at 5:35 PM from HAM 
FROMHAM 
Hamburg5:35 PM 
12/30/2012Terminal 2 
TOVIE 
Vienna7:05 PM 
12/30/2012

PS C:\>
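
The URL in the function contains spaces and inserts the query text as typed. A minimal sketch of building the same URL with the query escaped is shown here; Get-FlightStatusUrl is a made-up name and the base URL is simply the one used above.

```powershell
# Hypothetical helper: builds the search URL with the query text escaped.
function Get-FlightStatusUrl {
    param([string]$Query)
    # EscapeDataString encodes spaces and characters such as '&'.
    $text = [uri]::EscapeDataString("flight status for $Query")
    "http://www.bing.com?cc=us&q=$text"
}

Get-FlightStatusUrl 'LH3102'
```

Escaping matters mostly when the query can contain characters such as & or #, which would otherwise change the meaning of the URL.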

Don’t forget, web scraping can be illegal!

Have fun ;-)


SharePoint service not working after changing password and installing updates

These steps will help if you recently changed the Active Directory password of an account that SharePoint 2010 uses and you notice that your services no longer work.

For example, my search service stopped working:

“The search request was unable to connect to the Search Service.”

  1. Log in to SharePoint 2010 Central Administration.
  2. Under “System Settings” go to “Manage services on server”.
    Here you will find services that have stopped because the password is wrong.
  3. If you try to start a service, you may receive a message that it cannot start because the passwords are different.

It is easy to fix this. Start the SharePoint 2010 Management Shell. Type in this command:

Set-SPManagedAccount -UseExistingPassword

It will ask for the identity (the username) and for the password of that account.

More information can be found here: http://technet.microsoft.com/en-us/library/ff607617(v=office.14).aspx

After entering the command, you might receive an error similar to:

"Set-SPManagedAccount : Microsoft.SharePoint is not supported with version 4. of the Microsoft .Net Runtime. ..."

This message can appear after installing recent Windows Updates. In that case:

  1. Go to the Windows Start menu.
  2. In the “Search programs and files” textbox of the Start menu, search for SharePoint.
  3. Right-click “SharePoint 2010 Management Shell” and select “Properties”.
  4. In the Target input box change this:
    C:\Windows\System32\WindowsPowerShell\v1.0\PowerShell.exe -NoExit  " & ' C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\CONFIG\POWERSHELL\Registration\\sharepoint.ps1 ' "

to this (add -v 2 or -version 2):

C:\Windows\System32\WindowsPowerShell\v1.0\PowerShell.exe -v 2 -NoExit  " & ' C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\CONFIG\POWERSHELL\Registration\\sharepoint.ps1 ' "

Now the SharePoint 2010 Management Shell will start with PowerShell version 2.

After updating the password you can go back to SharePoint 2010 Central Administration and start your services with the new password (you may have to enter the password again when starting a SharePoint service).

Windows Server: logging users logon and logoff via PowerShell

Are you planning a migration and do you want to monitor, for a few weeks, when your server is used the most?

  1. Open Windows PowerShell ISE (or Notepad ;-) ).
  2. Add the PowerShell line below and save the script as monitorlogon.ps1:
     "logon {0} {1} {2:yyyy-MM-dd HH:mm:ss}" -f $env:username, $env:computername, (Get-Date) >> logon.log
  3. Create another script file, add the PowerShell line below and save it as monitorlogoff.ps1:
     "logoff {0} {1} {2:yyyy-MM-dd HH:mm:ss}" -f $env:username, $env:computername, (Get-Date) >> logoff.log
  4. Start the Local Group Policy Editor ([Windows]+[r] > gpedit.msc).
  5. Navigate to [User Configuration] > [Windows Settings] > [Scripts (Logon/Logoff)].
  6. Double-click the [Logon] name.
  7. Navigate to the [PowerShell Scripts] tab page.
  8. Click the [Add] button and select your monitorlogon.ps1 script.
  9. Optionally select the execution order; the default is “Not configured”.
  10. Repeat from the Group Policy Editor step for the Logoff script.

You can change the >> filename.log part to >> \\MyShare\filename.log.
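
Once the log has collected entries for a few weeks, the busiest hours can be counted. The sketch below is one possible way to do that; Get-LogonCountByHour and the sample lines are invented for illustration, and it assumes the "event user computer date time" layout written by the PowerShell script and user names without spaces.

```powershell
# Hypothetical helper: counts log entries per hour of the day.
function Get-LogonCountByHour {
    param([string[]]$Line)
    $culture = [System.Globalization.CultureInfo]::InvariantCulture
    $Line |
        ForEach-Object {
            # Expected fields: event, user, computer, date, time.
            $parts = $_.Trim() -split '\s+'
            if ($parts.Count -eq 5) {
                $stamp = [datetime]::ParseExact(($parts[3] + ' ' + $parts[4]), 'yyyy-MM-dd HH:mm:ss', $culture)
                $stamp.Hour
            }
        } |
        Group-Object |
        Sort-Object Count -Descending |
        Select-Object @{ Name = 'Hour'; Expression = { [int]$_.Name } }, Count
}

# Invented sample entries; a real run would read the log file instead:
# Get-LogonCountByHour -Line (Get-Content .\logon.log)
$sampleLines = @(
    'logon alice SRV01 2012-12-30 09:15:00',
    'logon bob SRV01 2012-12-30 09:45:10',
    'logon alice SRV01 2012-12-31 17:05:00'
)

Get-LogonCountByHour -Line $sampleLines
```

Lines that do not have exactly five fields are skipped, so a stray blank line in the log does not stop the count.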

If you want to do this on Windows Server 2003, where you can’t run PowerShell, save the file as a *.cmd instead:

  1. Create a new file and call it monitorlogon.cmd.
  2. Enter the line below and save the file:
     echo logon %username% %computername% %date% %time% >> C:\logon.log
  3. Repeat this for monitorlogoff.cmd and adjust the script line.
  4. Follow the Group Policy steps from the PowerShell version.

How to find out if your LinkedIn password was found via PowerShell

First you need to download the combo_not.zip file and unpack it.
(for example read the comments on this post: http://tweakers.net/nieuws/82411/wachtwoorden-miljoenen-linkedin-gebruikers-op-straat.html )

Next, put the combo_not.txt file in the root of your C:\ drive: C:\combo_not.txt

Now open PowerShell or PowerShell ISE and run the script below (don’t forget to change YourPasswordHere):

cd c:\
$pass = "YourPasswordHere"
$sha1 = [System.Security.Cryptography.SHA1]::Create()
$bytes = [System.Text.Encoding]::UTF8.GetBytes($pass)
$hashArray = $sha1.ComputeHash($bytes)
$hashArray | foreach -Begin{$str=''} -Process{$str += "{0:x2}" -f $_} -End{$str}
$str2 = [String]::Concat("00000", $str.Substring(5))
findstr -I $str .\combo_not.txt
findstr -I $str2 .\combo_not.txt

I tested it and it didn’t give a result, so that must be a good thing ;-) (let’s hope it is not due to this quick script :-) )

LinkedIn commented on the stolen passwords/hashes. Read it here: http://blog.linkedin.com/2012/06/06/linkedin-member-passwords-compromised/

Update: apparently the first 5 hex characters of the hash (20 bits) need to be set to 0 for a second check of whether it is hacked
Update2: updated the script
Update3: reply from LinkedIn

Please post a comment if there are any suggestions/mistakes.

Learn more about the PowerShell pipeline script function: begin, process and end:
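
As a rough sketch of how the begin, process and end blocks work together in the script above, the hashing step can be written as a small function. Get-Sha1Hex is a made-up name for this example, and 'password' is just a sample input.

```powershell
# Hypothetical helper: lowercase hex SHA-1 of a UTF-8 string.
function Get-Sha1Hex {
    param([string]$Text)
    $sha1  = [System.Security.Cryptography.SHA1]::Create()
    $bytes = [System.Text.Encoding]::UTF8.GetBytes($Text)
    # -Begin runs once before the first byte, -Process once per byte,
    # -End once after the last byte and emits the finished string.
    $sha1.ComputeHash($bytes) |
        ForEach-Object -Begin { $hex = '' } -Process { $hex += '{0:x2}' -f $_ } -End { $hex }
}

$full   = Get-Sha1Hex 'password'
# Second variant to search for: first 5 hex characters replaced by zeros.
$masked = '00000' + $full.Substring(5)

$full
$masked
```

Both strings can then be passed to findstr in the same way as $str and $str2 in the script.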

PowerShell: List all folders where access is denied

I just needed a list with all folders I couldn’t access.

Here is the PowerShell script (you may have to change the directory C:\app):

$errors=@()
get-childitem -recurse 'C:\app' -ea silentlycontinue -ErrorVariable +errors | Out-Null
$errors.Count
$errors | Foreach-Object { Write-Host $_ }

If you want to see everything in your PowerShell console, remove the | Out-Null.

This is a possible output from the script:

3 
Access to the path 'C:\app\pfile' is denied. 
Access to the path 'C:\app\adump' is denied. 
Access to the path 'C:\app\diag' is denied.
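
If only the folder paths are needed, for example to save them to a file, the error records can be filtered. This is a sketch under the assumption that each access-denied error record carries the path in its TargetObject property; Get-DeniedFolder is a name made up for this example.

```powershell
# Hypothetical helper: returns the paths that could not be read.
function Get-DeniedFolder {
    param([string]$Path)
    $errs = @()
    Get-ChildItem -Path $Path -Recurse -ErrorAction SilentlyContinue -ErrorVariable +errs | Out-Null
    $errs |
        # Keep only access-denied errors, not e.g. path-too-long errors.
        Where-Object { $_.Exception -is [System.UnauthorizedAccessException] } |
        ForEach-Object { $_.TargetObject }
}

# Example call, using the folder from the script above:
# Get-DeniedFolder -Path 'C:\app' | Set-Content denied.txt
```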

Enjoy ;-)


Downloading files with PowerShell using the BitsTransfer module

@sstranger told me on Twitter that you can also use the BitsTransfer module to download files via PowerShell (see the previous post).

He wrote a script to download several files from the Microsoft website.

Here is a modified version of his script, using test files with size 50MB and 100MB:

# path to store the data
$global:path = "c:\TEMP\"

# loads the BitsTransfer Module
Import-Module BitsTransfer
Write-Host "BitsTransfer Module is loaded"

# test data from http://www.thinkbroadband.com/download.html
$fileLinks = @("http://download.thinkbroadband.com/50MB.zip",
 "http://download.thinkbroadband.com/100MB.zip");

# start the download
Foreach ($fileLink in $fileLinks)
{
 Start-BitsTransfer $fileLink $path
}
Write-Host "Files are downloaded to $path"
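
When the script is run more than once, it can be useful to skip files that are already in the target folder. The following is a minimal sketch of that check; Get-PendingDownload is a hypothetical helper and not part of the BitsTransfer module.

```powershell
# Hypothetical helper: returns the URLs whose file name is not yet
# present in the target directory.
function Get-PendingDownload {
    param(
        [string[]]$Url,
        [string]$Directory
    )
    foreach ($u in $Url) {
        # File name taken from the last part of the URL path.
        $name = [System.IO.Path]::GetFileName(([uri]$u).AbsolutePath)
        if (-not (Test-Path -LiteralPath (Join-Path $Directory $name))) {
            $u
        }
    }
}

# Same test files as in the script above.
$links = @(
    "http://download.thinkbroadband.com/50MB.zip",
    "http://download.thinkbroadband.com/100MB.zip"
)

# Possible use together with the download loop:
# Get-PendingDownload -Url $links -Directory $path |
#     ForEach-Object { Start-BitsTransfer $_ $path }
```

The check only compares file names, so a partially downloaded file with the right name would also be skipped.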

Save this script as PowerShellFileDownloader.ps1 and run it in the Windows PowerShell ISE (which is included by default in Windows 7).

If you can’t run the script because of the execution policy, change the policy:

Set-ExecutionPolicy RemoteSigned

Warning: changing the ExecutionPolicy might cause a security risk.


Have fun! ;)