RabbitFarm

2022-10-16

Zippy Fast Dubious OCR Process

The examples used here are from the weekly challenge problem statement and demonstrate the working solution.

Part 1

You are given two lists of the same size. Create a subroutine sub zip() that merges the two lists.

Solution


use v5.36;
use strict;
use warnings;

sub zip($a, $b){
    return map { $a->[$_], $b->[$_] } 0 .. @$a - 1;
}

MAIN:{
    print join(", ", zip([qw/1 2 3/], [qw/a b c/])) . "\n";
    print join(", ", zip([qw/a b c/], [qw/1 2 3/])) . "\n";
}

Sample Run


$ perl perl/ch-1.pl
1, a, 2, b, 3, c
a, 1, b, 2, c, 3

Notes

The solution here is basically that one line map. Since we know that the lists are of the same size we can map over the array indices and then construct the desired return list directly.

Part 2

You are given a string with possible unicode characters. Create a subroutine sub makeover($str) that replace the unicode characters with their ascii equivalent. For this task, let us assume the string only contains letters.

Solution


use utf8;
use v5.36;
use strict;
use warnings;
##
# You are given a string with possible unicode characters. Create a subroutine 
# sub makeover($str) that replace the unicode characters with their ascii equivalent.
# For this task, let us assume the string only contains letters.
##
use Imager;
use File::Temp q/tempfile/;
use Image::OCR::Tesseract q/get_ocr/;

use constant TEXT_SIZE => 30;
use constant FONT => q#/usr/pkg/share/fonts/X11/TTF/Symbola.ttf#;

sub makeover($s){
    my $image = Imager->new(xsize => 100, ysize => 100);
    my $temp = File::Temp->new(SUFFIX => q/.tiff/);
    my $font = Imager::Font->new(file => FONT) or die "Cannot load " . FONT . " ", Imager->errstr;
    $font->align(string => $s,
                 size => TEXT_SIZE,
                 color => q/white/,
                 x => $image->getwidth/2,
                 y => $image->getheight/2,
                 halign => q/center/,
                 valign => q/center/,
                 image => $image
    );
    $image->write(file => $temp) or die "Cannot save $temp", $image->errstr;
    my $text = get_ocr($temp);
    return $text;
}


MAIN:{
    say makeover(q/ Ã Ê Í Ò Ù /);
}

Sample Run


$ perl perl/ch-2.pl
EIO



Notes

First I have to say upfront that this code doesn't work all that well for the problem at hand! Rather than modify it to something that works better I thought I would share it as is. It's intentionally ridiculous and while it would have been great if it worked better I figure it's worth taking a look at anyway.

So, my idea was:

I wasn't so sure about that last one. A good ocr should maintain the true letters, accents and all. Tesseract, the ocr engine used here, claims to support Unicode and "more than 100 languages" so it should have reproduced the original input text, except that it didn't. In fact, for a variety of font sizes and letter combinations it never detected the accents. While I would be frustrated if I wanted that feature to work well, I was happy to find that it did not!

Anyway, to put it mildly, it's clear that this implementation is fragile for the task at hand! In other ways it's pretty solid though. Imager is a top notch image manipulation module that does the job nicely here. Image::OCR::Tesseract is similarly a high quality wrapper around the Tesseract ocr engine. Tesseract itself is widely accepted as being world class. My lack of a great result here is mainly due to my intentional misuse of these otherwise fine tools!

References

Imager

Image::OCR::Tesseract

Challenge 186

posted at: 22:38 by: Adam Russell | path: /perl | permanent link to this entry

The Weekly Challenge 186 (Prolog Solutions)

The examples used here are from the weekly challenge problem statement and demonstrate the working solution.

Part 1

You are given two lists of the same size. Write a predicate that merges the two lists.

Solution


zip([], [], []).
zip([Ha|Ta], [Hb|Tb], [Ha, Hb|Tc]):-
    zip(Ta, Tb, Tc).

Sample Run


$ gprolog --consult-file prolog/ch-1.p
| ?- zip([1,2,3],[a,b,c],C).

C = [1,a,2,b,3,c]

Notes

This is a great simple example of recursion in Prolog. If there is any doubt about what is happening here is what trace/0 shows:


| ?- trace.
The debugger will first creep -- showing everything (trace)

yes
{trace}
| ?- zip([1,2,3],[a,b,c],C).
      1    1  Call: zip([1,2,3],[a,b,c],_38) ? 
      2    2  Call: zip([2,3],[b,c],_73) ? 
      3    3  Call: zip([3],[c],_102) ? 
      4    4  Call: zip([],[],_131) ? 
      4    4  Exit: zip([],[],[]) ? 
      3    3  Exit: zip([3],[c],[3,c]) ? 
      2    2  Exit: zip([2,3],[b,c],[2,b,3,c]) ? 
      1    1  Exit: zip([1,2,3],[a,b,c],[1,a,2,b,3,c]) ? 

Given that we know the lists are of the same size we have an extremely elegant solution. That is, we state that the first argument is an element, followed by the rest of the list, and the same goes for the second argument. For the third argument we state that its head is the combined initial two elements from the first and second lists and this unifies our uninstantiated variable.

References

Challenge 186

posted at: 20:00 by: Adam Russell | path: /prolog | permanent link to this entry