
Since last month, I’ve made a small change to the way I rename the files produced by my scanner.

I wrote the following short function:

function nuname ($n) {
   [void]($n -match "Page(\d+)_(\d+)")
   return "Page" + [string]([int]$matches[1] + [int]$matches[2]) + ".png"
}

And now, instead of naming files “Page ###”, I simply export them from the HP scanner dialog using the number of the page before the first page I scanned. The HP scanner software adds a sequence number starting with 1, so my function adds the number I supplied and the sequence number from the software, then fills that in as the page number in the resulting file name.

For example, if I start scanning a number of pages starting with page 180 from the book, I save the bunch as
179.png. The HP scanner software saves that page as 179_1.png, and my function turns that into Page180.png
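Here is a minimal sketch of the same arithmetic. The pattern in the function above expects a “Page” prefix, while the example file names have none, so this version accepts either form. Get-PageName is a hypothetical name, not the original function.

```powershell
# Hypothetical variant of nuname: accepts "179_1.png" or "Page179_1.png".
function Get-PageName ([string]$Name) {
    if ($Name -match '^(?:Page)?(\d+)_(\d+)\.png$') {
        # Batch number plus the scanner's sequence number gives the page number.
        'Page{0}.png' -f ([int]$Matches[1] + [int]$Matches[2])
    }
    else {
        $Name   # leave names that do not fit the pattern unchanged
    }
}

Get-PageName '179_1.png'   # Page180.png
```

Assuming the scans live in a folder such as c:\scans\1, something like Get-ChildItem c:\scans\1 -Filter *_*.png | Rename-Item -NewName { Get-PageName $_.Name } should apply it to a whole batch.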

Of course, it would probably make more sense to name the bunch with the actual page number of the first page scanned, in which case the function would have to be changed slightly:

function nuname ($n) {
   [void]($n -match "Page(\d+)_(\d+)")
   return "Page" + [string]([int]$matches[1] + [int]$matches[2] - 1) + ".png"
}

Then, when I scanned a bunch starting with Page 180, I would save the bunch as 180.png, the first one would be saved (by the scanner software) as 180_1.png, and my function would change it to Page180.png. Much more intuitive.

I’ve been scanning pages from a paperback book (by the way, did you know that Kinko’s / Fedex stores often have a paper cutter capable of cutting the spine off a paperback book, making it very easy to scan those pages?).

My new HP scanner will save files in a sequence – but each time I scan, I have to figure out how to save the files so that the file name matches the page number.

For example, I started by scanning pages 18 through 29, and I saved the first file as “Page018.png”, hoping that the subsequent pages would be saved as “Page019.png”, “Page020.png”…”Page029.png”

However, what I got was “Page018_0001.png”, “Page018_0002.png”…”Page018_0012.png”

I used PowerShell to rename the files –

> $pages = Get-ChildItem c:\scans\1
> $pages |
  Rename-Item -NewName {$_.Name -replace "\d\d_\d{4}",
           [string](17 + [int]($_.Name).Substring(9,3)) }

For the next batch, I scanned pages 32 through 54, and saved the first one as “Page.png” – the pages were saved as “Page_0001.png”, “Page_0002.png”…”Page_0023.png”

This time, I had to use the following commands –

> $pages = Get-ChildItem c:\scans\1
> $pages | 
   Rename-Item -NewName {$_.Name -replace "_\d{4}", 
            ('00' + [string](31 + [int]($_.Name).Substring(7,2))) }

I do wish I could save these with the right sequence start and not have to rename them.
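Until the scanner software cooperates, the two one-off commands above could probably be folded into one helper. This is a sketch only: Convert-ScanName is a hypothetical name, and the three-digit padding is an assumption.

```powershell
# Hypothetical helper: turns the scanner's 4-digit sequence number into a
# zero-padded page number, given the page number of the first scan in the batch.
function Convert-ScanName ([string]$Name, [int]$FirstPage) {
    if ($Name -match '_(\d{4})\.png$') {
        # The sequence starts at 1, so subtract 1 to land on the first page.
        'Page{0:D3}.png' -f ($FirstPage + [int]$Matches[1] - 1)
    }
    else {
        $Name   # leave names without a sequence suffix unchanged
    }
}

Convert-ScanName 'Page018_0001.png' 18   # Page018.png
Convert-ScanName 'Page_0023.png' 32      # Page054.png
```

The offset no longer depends on counting characters with Substring, so the same call works whatever prefix the batch was saved under.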

Other than this quirk, I confess I’m enjoying this HP device. It’s got its own wireless NIC, so I can scan directly to my home file server, and I don’t have to mess with USB on my laptop.

More PowerShell work today…

I’ve been working with the output from a Ruby script that we use to access Solr indexes for customer data. As I wrote previously, I’ve been able to use the ISE to create command lines to feed job names to this Ruby script and determine whether the job in question was successfully ingested into Solr when it was first ingested.

Using the ISE, I was able to assemble scripts – well, I say scripts. More like lists of command lines – but it still saved a lot of work to use the PowerShell ISE’s command pane to write script lines to the script pane above, then save the script pane as a PowerShell script on the server where the command needed to be run.

But I want more.

Today, I started work on scripting a tool to take a specified day (which would default to the current day) and call the Ruby script itself for all the jobs processed on the specified day. I started with a script to look up just one job (the job name was hardcoded into the script) while I figured out how to invoke the Ruby script, get the output from the Ruby script, and search the output for the information I needed.
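The “specified day, defaulting to today” part might look something like the sketch below. Get-JobDayPath and its parameters are hypothetical, and the year\month\day directory layout is an assumption based on the paths used elsewhere in these posts.

```powershell
# Hypothetical helper: builds the <root>\yyyy\yyyyMM\yyyyMMdd path for a given day.
function Get-JobDayPath {
    param(
        [datetime]$Day = (Get-Date),   # defaults to the current day
        [string]$Root = '.'            # placeholder for the real data root
    )
    $year  = $Day.ToString('yyyy')
    $month = $Day.ToString('yyyyMM')
    $stamp = $Day.ToString('yyyyMMdd')
    Join-Path (Join-Path (Join-Path $Root $year) $month) $stamp
}

Get-JobDayPath -Day '2016-09-16'
```

The result can then be handed to Get-ChildItem to enumerate that day’s job directories.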

I discovered that the PowerShell tool I needed to capture the output of the Ruby script was Out-String

Specifically, the heart of the new functionality involves a script line like this:

 $result = jruby.exe PARAM1 PARAM2 PARAM3 "20160923 <job#>" | Out-String

The jruby.exe portion of the line (everything between the = and the |) uses the same command line format as the individual lines of the query scripts I’ve been writing. Piping it to Out-String collects the jruby output into a single string, and assigning that to $result stores it in a variable that I can search for the number of hits.
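Searching the captured text could be as simple as a -match. In this sketch the sample text stands in for the real jruby output, and its exact wording is an assumption.

```powershell
# Stand-in for the captured text; in the real script it would come from
#   $result = jruby.exe ... | Out-String
# The exact wording of the jruby output is an assumption.
$result = "Getting 20160923 JOB0001`r`nNumber of hits found: 42`r`n"

$hits = $null
if ($result -match 'Number of hits found\D*(\d+)') {
    # $Matches[1] holds the digits captured after the label.
    $hits = [int]$Matches[1]
}
$hits   # 42
```

Because Out-String returns one multi-line string, a single -match is enough; there is no need to loop over lines.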

I then wrapped the essential parts of this code inside a Get-ChildItem | ForEach-Object cycle. Instead of having the Ruby command line as a line of the script, I built up a string expression and used Invoke-Expression to call it:

Get-ChildItem <local\path\to\job\data\directories> | ForEach-Object {
    $expression = 'jruby.exe PARAM1 PARAM2 PARAM3 ' +
        '"20160923 ' + $_.PSChildName.ToString() + '"' + "`n"
    $result = Invoke-Expression $expression | Out-String
    <code to parse $result and extract the number of Solr hits>
}

I added the extracted hits data to a variable called $Body, which I could use after the main part of the script was done as the -Body of an email to be sent through our processing mail server.

I also created an empty array, $zeroByteJobs, before starting the Get-ChildItem | ForEach-Object cycle. Inside the cycle, whenever I found a line of output that showed 0 bytes returned from Solr, I added the associated job number to $zeroByteJobs. At the end, I used those job numbers to find the information needed for a file of Solr addonly command lines, which I then assembled into another script that I ran separately to add those jobs to Solr.

I do have one really nagging problem – the Ruby script burps out a recommendation to use ansicon to improve the Ruby command-line experience. This output was not captured by the $result = … | Out-String combination. Thinking about it after I got home, though, it occurs to me that it might be generated as an error, and if so the way to deal with it will be to redirect it using PowerShell error stream redirection.
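If the message does turn out to go to the error stream, the redirection operators would look roughly like this. The function below only stands in for the Ruby script, and the message text is made up; the same 2>&1 and 2>$null operators apply to a native command such as jruby.exe.

```powershell
# Stand-in for the Ruby script: one line on the output stream and one on the
# error stream. The message text is illustrative only.
function Invoke-NoisyScript {
    'Number of hits found: 3'
    Write-Error 'consider installing ansicon'
}

# 2>&1 merges the error stream into the output stream, so Out-String sees both.
$merged = Invoke-NoisyScript 2>&1 | ForEach-Object { "$_" } | Out-String

# 2>$null discards the error stream instead.
$clean = Invoke-NoisyScript 2>$null | Out-String
```

Converting each item with "$_" keeps only the message text of the error records, so $merged stays easy to search.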


I’m adding a new Category to this blog, because I’m getting serious about a new subject: JavaScript.

In some ways, it’s a natural development of some work I’m doing at my job. As I wrote previously, I’ve decided to try automating our web Quality Control (QC) processes using PowerShell to drive Internet Explorer.  Of course, a lot of the available information on accessing the browser Document Object Model (DOM) is written for JavaScript programmers. In fact, JavaScript is more likely to be the go-to choice for browser automation.

I do intend to keep moving forward with my QC automation project with PowerShell and IE because it would be a zero-footprint option – our work desktops have PowerShell and Internet Explorer installed, whereas a JavaScript-based solution would require installing and configuring additional software on my coworkers’ PCs. However, there’s nothing to stop me from installing, say, Selenium WebDriver and Node.js to automate the other browsers we use (Firefox and Chrome, mainly) and improve my knowledge of the DOM.

I’m somewhere between “beginner” and “intermediate” in my current JavaScript knowledge. I’m going to be focusing on JavaScript and PowerShell and leaving other scripting languages (Python, Ruby) alone for a while.

Outside of our IT area, I think I’m the only person here at work that uses PowerShell.

I want to record some videos showing how I use PowerShell to solve some of our daily problems, so I went looking for screen recording software.

I discovered that Microsoft Expression Encoder 4 (SP2) seems to be available as a free download.

(You may remember Microsoft’s Expression suite of products from a few years ago. It looked to me like Microsoft wanted to take on Adobe in the design field – but Adobe has that market pretty well locked up).

Expression Encoder is a non-linear editing (NLE) software package for editing and encoding video. The package includes a Screen Capture tool that allows you to define an area of your Windows desktop to be recorded. Once you’ve defined the recording area, you hit the familiar red record button and everything you do within that area is recorded to a file with an .XESC file extension.

You can preview the .XESC files in the Screen Capture utility, or send them to Expression Encoder for editing and/or saving as .WMV files.

The free download does not include the codecs needed to read .MPG (standard video) files or save files as .MP4 (the popular format used by non-Windows devices), but there are a number of free tools that will convert .WMV files to .MP4 files.

One thing I discovered from recording some PowerShell work – I definitely want to script my actions in PowerShell before starting to record, if possible. I had to re-do some actions to make the video smoother.

Searching a big text file in PowerShell

Where I work, we use simple text files as menus for our web applications. These menus may reference hundreds of jobs per month, and span up to 84 months (7 years) in some cases.

Looking up lines in these files can be very time-consuming. Since I’m writing scripts that have to fit around our processing automation, I usually can’t sort the files (or alter them in any other way, in fact) to make it possible to do some sort of binary search on them.

Doing a linear search might be workable if the lines I wanted were near the beginning of the files, but I usually want the information near the end (since new lines are appended to the end of the file). Also, the file server where the local copy of these files resides has historically been very dodgy, and communicating with it over our LAN has caused problems with normal processing, so I wouldn’t want to add to that load by doing Get-Content in my PowerShell script and passing the data line-by-line to the search logic.

Mind you, that server has been replaced recently, and the new one works much better (I think they upgraded from Windows Server 2003 to Server 2012, skipping right over Server 2008 – which gives you an idea how long we suffered with the previous incarnation).

Still, it’s better not to flog the network with more traffic than necessary, and in any case old habits die hard. So how do we make searching a plain text file across the LAN more efficient?

Well, for starters, I copy the entire file into a variable on my PC, so the whole thing is stored in RAM. We have ordinary desktop PCs, nothing high-end, but just about any PC you’ll find in a business environment these days will have at least 4 GB of RAM. A menu file with hundreds of thousands of lines will have an actual size on disk of just a few MB – so grab the entire file.

$big_menu = Get-Content \\Serv1\jobs\cust37\menufiles\MEMBERS.TXT

Since $big_menu is an array of strings, it’s not out of the question to do a brute-force linear search for one or two lines. If the lines you want are most likely to be at the end of the file, it’s possible to step backwards through an array of strings in a way that isn’t feasible when you’re reading the file line-by-line from disk.
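Stepping backwards is just a for loop that counts down. A minimal sketch, with made-up menu lines standing in for the real file:

```powershell
# Stand-in data; the real $big_menu would come from Get-Content.
$big_menu = 'JOB001 first line', 'JOB002 second line', 'JOB003 third line'

$found = $null
for ($i = $big_menu.Count - 1; $i -ge 0; $i--) {
    if ($big_menu[$i] -like 'JOB002*') {
        $found = $big_menu[$i]
        break   # stop at the first match from the end
    }
}
$found   # JOB002 second line
```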

Or if you know that, say, 117 jobs were processed today, and you just want the last 117 lines of the menu, you can create a cut-down version of the menu using the Select-Object cmdlet:

$todays_menu = $big_menu | Select-Object -Last 117

If the line(s) you need might be anywhere in the file, or if you’re trying to determine if a job number that ought to appear only once actually appears more than once, you’ll probably want to create an associative array, also known as a hash.

If you have a PowerShell function that can extract, say, a job number from an individual menu line, you can hash the entire file using just a couple of PowerShell commands:

$big_menu_hash = @{}        # create an empty hash
$big_menu | ForEach-Object { $big_menu_hash[(Get-JobNumber $_)] = $_ }

(if there is more than one instance of JobNumber in the file, the later instance(s) will overwrite the entries for the earlier one(s) in $big_menu_hash — you need a bit more logic than this to account for multiple appearances of a job number)
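One way to add that bit more logic is to store a list of lines per job number instead of a single line. This is a sketch under assumptions: the Get-JobNumber shown here is a hypothetical stand-in that treats the first whitespace-delimited field as the job number, and the menu lines are made up.

```powershell
# Hypothetical stand-in: the real Get-JobNumber depends on the menu format.
function Get-JobNumber ([string]$Line) { ($Line -split '\s+')[0] }

$big_menu = 'JOB001 first', 'JOB002 second', 'JOB001 again'   # stand-in data

$big_menu_hash = @{}
foreach ($line in $big_menu) {
    $key = Get-JobNumber $line
    if (-not $big_menu_hash.ContainsKey($key)) {
        $big_menu_hash[$key] = @()      # first sighting: start an empty list
    }
    $big_menu_hash[$key] += $line       # keep every line for this job number
}

# Job numbers that appear on more than one line.
$duplicates = @($big_menu_hash.Keys | Where-Object { $big_menu_hash[$_].Count -gt 1 })
$duplicates   # JOB001
```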

The Out-GridView cmdlet

Maybe you don’t need to search the entire file. For example, maybe you just want to grab a couple of lines (maybe for testing before running against the entire file). Is there an alternative to opening the file in a text editor and copying the line(s) you want to another text editor window?

If you’re using PowerShell 3.0 or greater, you have access to the Out-Gridview cmdlet with the -PassThru feature. Just pipe the variable to Out-Gridview, and then use Ctrl-Click to select the rows you want:

$selected_menu_lines = $todays_menu | Out-Gridview -PassThru

Of course, you aren’t limited to piping variables to Out-Gridview – you can make a selection from a text file on disk directly:

$selected_menu_lines2 = Get-Content \\serv2\cust42\menu\MENU.TXT | Out-Gridview -PassThru

I hadn’t used it myself before today, and I was just blown away by how well it worked!

A couple of days ago, I talked about using the ISE command line to create script lines in the active script tab.

My process was:

  1. Switch to the email program,
  2. Copy the job name or full line out of the error email,
  3. Switch back to the PowerShell ISE, and
  4. Up-Arrow to create each new line

That was four steps – FOR EACH LINE.

That’s WAY too much like work. If only I could get the entire list at once…

Well, actually, I can.

The unique job name that I copied out of the error emails for each line is also the name used for the directory containing that job’s processed files on our processing server. I can build the query using the name of the directory, instead of copying each problem job name out of the error email.

But how do I know which directories contain jobs that generated an error email?

Actually, it’s better not to know. We’re better off running queries against ALL the jobs – that way, we can catch all the jobs that failed to get ingested into the full-text indexing/search app we use, even if the respective error email goes missing.

So to create a script that runs a query against every job processed in a day:

a. Start the PowerShell ISE – if the current script (file) tab isn’t empty, create a New one
b. In the ISE command line pane:
PS P:\> $PCE = $PSISE.CurrentFile.Editor
PS P:\> $solr_cl = 'jruby solring.rb Cust-37 PROD hits MEMBERS "20160916 '
PS P:\>   # $solr_cl is the command line for the bulk of the Solr query –
PS P:\>   # oh, and it’s not quite the query that I use – that’s proprietary info, of course
PS P:\> $PCE.Text += Get-ChildItem <path-to-data>\2016\201609\20160916 |
ForEach-Object { $solr_cl + $_.PSChildName.ToString() + '"' + "`n"}
PS P:\>

So instead of taking four steps for each line of the query script, I can create the entire query script using just three steps. PowerShell FTW!

But That’s Not All! You Also Get…

Sure, it’s great that I no longer have to copy lines out of individual emails to create a hits query script for all the jobs processed in a given day, but what about creating the addonly query to ingest the missing jobs into Solr?

Well, as it happens, our Solr services were down on 9/16, so all the jobs failed and needed to be added. Also as it happened, there were 121 jobs processed on 9/16, so the hits query was 121 lines long.

The MEMBERS parameter in the Solr command line for the hits query corresponds to a MEMBERS.TXT file that contains web menu lines used by our web app. Each line of an addonly query uses the same format as the lines in the MEMBERS.TXT file.

So, to create the addonly query, I opened a New (blank) script file tab in the ISE, then entered:

PS P:\> $addonly_cl = 'jruby solring.rb Cust-37 PROD addonly MEMBERS '
PS P:\> Get-Content <path_to>\MEMBERS.TXT |
Select-Object -Last 121 |
ForEach-Object {$PCE.Text += $addonly_cl + $_ + "`n"}
PS P:\> # names have been changed to protect proprietary information

But Wait! There’s More!

So I’ve cut down the process of creating these Solr scripts from 4 steps per line to 2 or 3 steps for the whole script – but what about the output?

Previously, I was watching the results of each query, and logging each result in our trouble ticket system. I built in a 15-second delay (using Start-Sleep 15) for each line, and bounced back from our trouble ticket system (in a web browser on my PC’s desktop) to the processing server where I had to run the query.

Again, that’s WAY too much like work.

The results of each hits or addonly query are logged to individual text files in a directory on the processing server. This directory is not shared – however, the processing server can send email using our processing mail server.


  • I connected to the processing server (using Remote Desktop),
  • ran the hits query (to verify that all the jobs needed to be ingested using the addonly query)
  • ran the addonly query and watched as each job seemed to be added successfully, then
  • ran the hits query again (starting at 5:45 PM) to verify successful ingestion

I then used PowerShell to create and send an email report of the results:

PS C:\> $SMTPurl = <URL of processing email server>
PS C:\> $To = "<>", "<>"
PS C:\> $From = "<>"
PS C:\> $Subject = "Cust-37 hits for 20160916 <job#1>..<job#121> (after addonly)"
PS C:\> $Body = ""
PS C:\> $i = 0
PS C:\> Get-ChildItem <path-to-log-files>\solr_20160916*.log |
Where-Object{ $_.LastWriteTime -gt (Get-Date "9/16/2016 5:45 PM") } |
Get-Content | ForEach-Object {
if ($_ -match "Getting" ) { $Body += ($i++).ToString() + ": " + $_ + "`n"}
if ($_ -match "Number of hits found" ) { $Body += $_ + "`n`n" }
}
PS C:\> Send-MailMessage -SmtpServer $SMTPurl `
-To $To -From $From -Subject $Subject -Body $Body

Shazam! I (and the other tech in this project) got a nice summary report, by the numbers.

What’s next?

Well, these were all commands entered at the PowerShell command line, either in the ISE (on my desktop) or in a regular PowerShell prompt (on the processing server). One obvious improvement would be to create a couple of script-building scripts (how meta!) that I (or someone else) could run to create the query scripts, and a separate script (to be run on the processing server) to generate the summary email.

What if (as is usually the case) only a few of the jobs need to have addonly queries created to be re-ingested? Well, the brute-force way would be to create the addonly query with all the jobs included, then manually edit it, deleting all the lines where the initial ingestion was a success.

But the slick way would be to scan the query results log files, get the job numbers of the jobs that failed to ingest, and pull only the corresponding lines from the MEMBERS.TXT file.

(Spoiler: one way would be to append the name of each failed job to a single string, then get the contents of MEMBERS.TXT, extracting the lines that -match the string of failed jobs, and use those lines to create the addonly query.

It might be faster, though, to hash the lines of MEMBERS.TXT with the job number as the key and the entire line as the value, then return the entire line corresponding to each failed job).
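Both ideas can be sketched in a few lines. Everything here is illustrative: the menu lines and job names are made up, and Get-JobNumber is a hypothetical stand-in that takes the first whitespace-delimited field.

```powershell
# Hypothetical stand-ins: the real menu lines and job-number format will differ.
function Get-JobNumber ([string]$Line) { ($Line -split '\s+')[0] }
$members    = 'JOB001 menu line', 'JOB002 menu line', 'JOB003 menu line'
$failedJobs = 'JOB001', 'JOB003'
$addonly_cl = 'jruby solring.rb Cust-37 PROD addonly MEMBERS '

# Approach 1: one regex alternation built from the failed job names.
$pattern   = ($failedJobs | ForEach-Object { [regex]::Escape($_) }) -join '|'
$byPattern = @($members -match $pattern)

# Approach 2: hash the menu once, then look up each failed job.
$lookup = @{}
foreach ($line in $members) { $lookup[(Get-JobNumber $line)] = $line }
$byHash = @($failedJobs | Where-Object { $lookup.ContainsKey($_) } |
    ForEach-Object { $lookup[$_] })

# Either result can be turned into addonly command lines.
$addonlyLines = @($byHash | ForEach-Object { $addonly_cl + $_ })
```

The regex version matches anywhere in the line, so a job name that is a prefix of another would match both; the hash lookup is exact.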

Something that’s been bugging me for a while is that I couldn’t remember how to continue lines in the ISE command pane.

Of course, in a regular PowerShell window, you can just press Enter to continue entering a command on the next line:

PS C:\Users\Owner> 1..10 | Where-Object{ ($_ % 2) -eq 0 }
PS C:\Users\Owner> 1..10 | Where-Object{
>> ($_ % 2) -eq 0 }

You can press Enter after a “|” to start the next section of pipe on the next line, you can press Enter after a “,” when entering a list of items, you can press Enter after an opening bracket “{” when starting into a script block, and you can press Enter immediately after entering a “`” (a backtick) anywhere within a line (except inside a string literal).
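For reference, here is a small sketch of three of those continuation points as they would appear in a script file (variable names are arbitrary):

```powershell
# After a pipe:
$evens = 1..10 |
    Where-Object { ($_ % 2) -eq 0 }

# After a comma in a list:
$list = 'a',
    'b',
    'c'

# After a backtick:
$sum = 1 + `
    2
```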

But if you try to use Enter in the command pane of the ISE, you’ll get an error. For example, when I try to break a line after the opening bracket of Where-Object (as in the example above), I get:

PS P:\> 1..10 | Where-Object {
Missing closing ‘}’ in statement block.
    + CategoryInfo…

Very annoying.

I finally (after a couple weeks of suffering along without continuation lines in my recent orgy of ISE work) tracked down the keystroke needed – it’s Shift-Enter in the ISE.

Enter in the regular prompt, Shift-Enter in the ISE. Enter in the regular prompt, Shift-Enter in the ISE. Enter in the regular prompt, Shift-Enter in the ISE…

It is a truth universally acknowledged that not everyone is comfortable working with a command line. Let’s face it, some of our coworkers will never get over their desire for a good 5-cent GUI.

We’re a Ruby (and Java) shop here at work, so after I learned Ruby and started writing some serious scripts, I looked for a GUI toolkit for Ruby. Sadly, I found the available alternatives lacking for one reason or another.

Shoes looked good at first glance, but when its creator dropped out of programming, the Shoes project grew stale. More recently, according to The Ruby Way (3rd edition), work has resumed on Shoes, but in JRuby. Maybe it would be worth taking a new look at Shoes, since we use JRuby here.

I also looked at Tk, but when I last checked, there were holes in the Ruby implementation (the Python implementation was more complete).

Qt is another popular toolkit used in Ruby, but the licensing (for commercial entities) caused me to pass on it.

One of the things I looked for was support built-in — either built-in to the programming language (so it’s all part of the same install), or built-in to the operating system itself (Windows, in our case). For a while, I played with the idea of scripting in Python, so I could use its superior implementation of Tk — but the needs of the many (possible users of any scripts I might write) outweighed the needs of the few (programmers who could maintain Python scripts in a Ruby shop).

Lately, my inclination has been to use PowerShell to run Ruby scripts, and this extends to writing GUIs. PowerShell can build GUIs using either Windows Forms (the old and busted way) or Windows Presentation Foundation (WPF).

Here are a couple of links to building GUIs using WPF:

Of course, when using PowerShell, it’s also a good idea to remember the Out-GridView cmdlet, which might be all the GUI some scripts need.

One of the things I do at work involves creating scripts to run a Ruby script. For each line in each script I create, I have to:

  1. Go to our secondary email program
  2. Copy a job name (a word, basically) or an entire line from an error email
  3. Change to an editing program (like the PowerShell Integrated Scripting Environment, a.k.a. ISE)
  4. Create a line with the Ruby command line and space for the text I’ve copied from the error email, and add the error email text to that line

Today, I had almost 120 lines to create this way – some of them in two versions

Previously, I did it all by hand – I duplicated the Ruby script portion as many times as I needed, then copied and pasted the error text.

Today, though, contemplating the 200+ lines to create, I decided to dig a bit deeper into the PowerShell ISE.

I discovered I could open a PowerShell script file from disk using:

PS P:\> New-Item -ItemType File test0.ps1
PS P:\> $PSISE.CurrentPowerShellTab.Files.Add("P:\test0.PS1")

I then found I could access the open files in the tabbed script panes using standard array indexing notation:

PS P:\> $PSISE.CurrentPowerShellTab.Files[7]                # or [0], [1], [2], etc

With a little more experimentation, I found I could assign the tabbed script to a variable, Save its contents from the command line, and update the Text in the Editor property of the script pane:

PS P:\> $file0 = $PSISE.CurrentPowerShellTab.Files[7]
PS P:\> $file0.Editor.Text = "hello, world"
PS P:\> $file0.Editor.Text += "`n" + "Goodbye, cruel world"
PS P:\> $file0.Save()
PS P:\> Get-Content P:\test0.ps1
hello, world
Goodbye, cruel world

PS P:\>

I then discovered that I could access the current tab directly, without having to use the array indexing notation, or assign the tabbed script to a variable:

PS P:\> $PSISE.CurrentFile.Editor.Text = “”

I then adapted Recipe 8.3 (“Read and Write from the Windows Clipboard”) from the Windows PowerShell Cookbook to write a one-liner:

PS P:\> function Get-Clipboard { Add-Type -Assembly PresentationCore; [Windows.Clipboard]::GetText() }

Finally, I put the Ruby script lines into variables (for example, $ruby_script_1),
defined a new variable $PCE:

PS P:\> $PCE = $PSISE.CurrentFile.Editor

And used the results to add lines to the currently selected script tab:

PS P:\> $PCE.Text += $ruby_script_1 + (Get-Clipboard) + "`n"
# the `n is the PowerShell way to specify a newline character

Now, I just had to

  1. Switch to the email program,
  2. Copy the job name or full line out of the error email,
  3. Switch back to the PowerShell ISE, and
  4. Up-Arrow to create each new line

It looks like the same number of steps – but there’s a lot fewer keypresses, so…WIN!!!