Microsoft OCR Library for Windows Runtime (windows.com)
132 points by maouida on Sept 19, 2014 | 45 comments


We had great results using tesseract-ocr[1] with SWT (Stroke Width Transform, a state-of-the-art text detection algorithm, via libccv[2]) on Linux.

You can use our python bindings for both[3,4], although they might be slightly outdated:

[1] https://code.google.com/p/tesseract-ocr/

[2] http://libccv.org/doc/doc-swt/

[3] https://github.com/veezio/pytesseract

[4] https://github.com/veezio/pyccv


Be aware that SWT is patented [1] if you want to use it commercially.

[1] http://www.google.com/patents/US20090285482


This link shows the claims of the published application. The recently allowed claims are a lot more narrow and less problematic. Still worth reviewing though in case you are worried you infringe:

http://www.scribd.com/doc/240266916/12122729


It looks like Microsoft is the assignee? If so, is this included in the Microsoft OCR library?


Is it still possible to generate pixel-correct hOCR when using SWT? Also, what is the main advantage of SWT - improving speed or accuracy?


I'm not sure about speed, but for accuracy, it's great. We've had terrible results with tesseract when giving it text that wasn't properly cropped with SWT.
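To make the "properly cropped" point concrete, here is a minimal Python sketch of the bookkeeping involved. It assumes a text detector (SWT or anything else) has already produced boxes as (x0, y0, x1, y1) pixel tuples. The function names and the 4-pixel margin are illustrative assumptions, not part of libccv, tesseract, or the bindings linked above. It pads each box slightly, clamps it to the image, and shifts a box reported relative to the crop back into page coordinates, which is the step pixel-correct output such as hOCR would need.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), with x1/y1 exclusive


def pad_and_clamp(box: Box, width: int, height: int, margin: int = 4) -> Box:
    """Grow a detected text box by `margin` pixels, staying inside the image."""
    x0, y0, x1, y1 = box
    return (
        max(0, x0 - margin),
        max(0, y0 - margin),
        min(width, x1 + margin),
        min(height, y1 + margin),
    )


def crop_to_page(word_box: Box, crop_box: Box) -> Box:
    """Shift a box reported relative to a crop back into page coordinates."""
    ox, oy = crop_box[0], crop_box[1]
    x0, y0, x1, y1 = word_box
    return (x0 + ox, y0 + oy, x1 + ox, y1 + oy)


if __name__ == "__main__":
    # Hypothetical detector output on a 640x480 image.
    region = pad_and_clamp((2, 10, 120, 30), width=640, height=480)
    print(region)                                # rectangle to crop and hand to OCR
    print(crop_to_page((5, 3, 40, 18), region))  # OCR word box in page coordinates
```

The margin is there because OCR engines tend to do better with a little background around the glyphs; the right value depends on the images and is something to tune.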


Thanks, I use Tesseract in a product I'm developing, I'll try using SWT then.


What did you use to generate pyccv? (It looks automatically generated)

Does it still work with an up-to-date ccv?


This method: http://www.kaij.org/blog/?p=98

Although SWIG might work better now.


This is very cool! I've been working on a receipt scanning tool in C# for keeping track of kitchen inventory (tired of calling my wife asking if we have sesame oil or some oddball thing).

I found a few libraries, but they only worked with relatively perfect scans (my goal is to be able to just use a phone). When I get home I'm definitely going to give this a go.


Off topic, but this made me think that it would be neat if libraries on places like GitHub and NuGet could somehow include "cited by" data: something that listed the open source (maybe closed source too) projects that depend on the library, similar to Google Scholar or CiteSeerX.


You can get a DOI for a GitHub repository.


Then what? There's nothing magical about DOIs. You need someone to store the citation metadata. And generate / deposit citation metadata. And maintain the persistence of the DOI. What precisely does the DOI represent? A codebase? A fork of it? A file? A file at a particular revision? A changeset?


It doesn't appear that you can use this in a 'normal' .NET app. Any ideas why?


This is really one of my big frustrations with Microsoft.

On one hand, they really try to push everybody to upgrade to their newest and shiniest, by making a lot of stuff (like this) only available on Windows 8+.

On the other hand, they don't even bother to put in a box with "What operating systems will this work on", which would save you from having to do trial and error, research WinRT, and then be disappointed when you realize this will apparently never work on Windows 7. And maybe only in Metro apps? What is Windows Runtime and am I just supposed to know this?

I really enjoy coding C# and working in .NET. Microsoft has some really great stable techs which work well for years and years - but increasingly if you want anything new and shiny from them, you have to run the newest OS. And if you work with anything related to enterprise, good luck targeting only Windows 8.

And honestly, despite working almost exclusively with MS tech, I just don't really trust any platform from them that doesn't have significant traction and track record as they all too often just give up and try something new - and sometimes without real replacements available.


The MSDN documentation for the classes [1] clearly states the supported platforms. Admittedly the restriction to store apps is missing on the page for the namespace [2].

  Minimum supported client  Windows 8.1 [Windows Store apps only]
  Minimum supported server  Windows Server 2012 R2 [Windows Store apps only]
  Minimum supported phone   Windows Phone 8

[1] http://msdn.microsoft.com/en-us/library/windows/apps/xaml/wi...

[2] http://msdn.microsoft.com/en-us/library/windows/apps/xaml/wi...


Ah, don't know how I missed that. Thanks.

Crazy that it's limited to 8.1, and not even working on 8.0.

I really wonder if there is a valid technical reason, or they just use it to push upgrades.


It's crazy that it's Windows Store only. On my only Windows machine the Windows Store won't even open because I have UAC disabled.


Who on Earth uses Windows Store apps on Windows Server? We would totally consider using this if not for that.


I can't imagine there being a technical reason. OCR is almost purely algorithmic and it shouldn't depend much on the OS.


8.0 is deprecated, 8.1 has replaced it.


> What is Windows Runtime and am I just supposed to know this?

If you subscribe to MSDN like any Windows developer, this has been explained multiple times in the last two years.

Just for the clueless ones.

Windows Runtime is an evolution of COM, based on the ideas that were at the genesis of .NET, namely Ext-VOS.

http://blogs.msdn.com/b/dsyme/archive/2012/07/05/more-c-net-...

So a native version of .NET, so to speak. And unless Windows 9 changes it, the future of Windows APIs.

The .NET runtime starting with Windows 8 acquired additional capabilities:

- Ahead of time compilation to native code for Windows Phone apps, with the MDIL binary format

- Consumption and creation of Windows Runtime components

> but increasingly if you want anything new and shiny from them, you have to run the newest OS.

No different from other commercial vendors.


It was 7 years from XP to Vista - and 9 years to Windows 7.

Now it has just been 5 years since Windows 7 and apparently it's already completely outdated. It's only 2 years since Windows 8.0, and it's already out!

If this is the way Microsoft is going, then it's a huge change for anybody dealing with enterprise. I used to be able to develop on the same system using the same techs as my customers - now I may sit on Windows 8.1, but I can't use any new shiny features as my customers are still on Windows 7. Many of them just upgraded.

Sure, this may be the way others are doing it. It may be the new normal. But I still think Microsoft is shooting themselves in the foot big time regarding anything related to businesses.

In this case I'm actively researching an OCR solution. Tesseract is annoying compared to a nicely integrated .NET solution. But I'm not able to choose the solution from Microsoft. It will be at least 5 years until our customers have upgraded again, and by then - well, we're probably not going to switch OCR tech.

Instead we will be relying on third parties, open source projects - things that are not tied as much to Microsoft or the .NET ecosystem. I didn't mind being tightly tied to MS tech, I often preferred it as it was easier and worked great - but in this case I don't even have a choice. Basically I just wish Microsoft would stay Microsoft instead of trying to be Apple.

-- And regarding following MSDN. I develop WPF/C#, and I don't follow anything at all. I don't care about hype or news. I care about solid techs that are mature and sticking around for a long time. Most of the stuff being announced will significantly change or be cancelled anyway. When a product has stuck around for 3 versions and is having a pretty good following, then I might be interested. If my customers are actually able to run it of course.


Didn't down-vote you, but I can guess why someone else might've.

Up until this post you didn't have it (the Microsoft OCR library) anywhere. You're still free as ever to develop using all the non-RT technologies (Windows Forms, WPF, Silverlight, WF, WCF and so on). All of it's officially supported.

And all said and done, Microsoft is doing no more or less than any other tech company. They release new stuff. And sometimes a new release is constrained in some way, like when an iPhone app is released by some startup that doesn't bother with an Android or Windows Phone equivalent. They may come eventually, or not...


Can confirm that the actual package successfully installs into a Profile78 Portable Class Library. So whilst the marketing heavily mentions Windows Phone, in theory this library will also work on Xamarin (iOS/Android, etc) and also within standard .NET applications (ASP.NET/Console/etc).

nb: haven't actually tested past installation at this stage.

edit: nope :(


Ah excellent, will try that, thanks!


It installs - but there's no DLL reference and you don't see the WindowsPreview namespace.


I made a C# console app and added the NuGet package. It adds, but there aren't any references. Within the NuGet package, though, there are three subdirectories within 'lib', one being 'win81'.

  packages\Microsoft.Windows.Ocr.1.0.0\lib\win81

Within this, there are 'ARM', 'x86' and 'x64' directories with DLLs in them. VS refuses to add them to my project, so I'm guessing they're native and not COM libraries.

Why would I think they might be .NET libraries if they have 'x86' and 'x64' labels? Because C++/CLI has to be compiled to separate dlls, I believe.


It has nothing to do with C++/CLI, but with marshaling, JIT and NGEN.

When calling native code outside the CLR the runtime needs to know which type of marshaling code to generate.

It also plays a role when using unsafe code blocks in .NET.
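The per-architecture folders follow from this: native code is compiled for one instruction set and one pointer size, so a caller has to load the binary that matches its own process. A rough Python sketch of that selection, for illustration only. The `native_dir` helper is hypothetical, and the folder names are simply the three mentioned in the parent comment.

```python
import ctypes


def pointer_bits() -> int:
    """Pointer width of the current process (not of the OS)."""
    return ctypes.sizeof(ctypes.c_void_p) * 8


def native_dir(machine: str, bits: int) -> str:
    """Map a machine name and pointer width to 'ARM', 'x86' or 'x64' (hypothetical helper)."""
    m = machine.lower()
    if m.startswith("arm"):
        return "ARM"
    if m in ("amd64", "x86_64") and bits == 64:
        return "x64"
    if m in ("amd64", "x86_64", "x86", "i386", "i686"):
        return "x86"  # also covers a 32-bit process on a 64-bit OS
    raise ValueError("no native binary for " + machine)


if __name__ == "__main__":
    bits = pointer_bits()
    print(bits, "bit process")
    # "AMD64" is a sample machine name; platform.machine() would supply the real one.
    print(native_dir("AMD64", bits))
```

The point of passing `bits` separately is that a 32-bit process on 64-bit Windows still needs the x86 binary, which is the same distinction the runtime has to make when it generates marshaling code.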


On http://msdn.microsoft.com/en-us/library/windows/apps/windows... they mention the supported languages and their statuses, but Korean is only "Good".

I freely admit that I do not speak Korean, but if one compares "Chinese Simplified" characters (listed as "Very good") with those in the Korean alphabet, I am surprised those two entries aren't transposed.

Is there something that makes recognizing Korean harder than Chinese Simplified, or was that just a product management decision?


"demonstrated in code snippets below". The code snippets are actually images and even worse, they're JPEGs which is the reason why the text looks horrible.


If only there were some automated way to convert those images to text.


Now that is possibly the cruellest irony I've seen for a while. Well spotted :)


Yeah, stop being silly and just click the link on the page to the actual documentation with examples:

http://msdn.microsoft.com/en-us/library/windows/apps/xaml/wi...


You are expected to OCR them using the library.


"This blog was written by Jelena Mojasevic, Program Manager at Microsoft" - It seems no one told her how to embed code snippets.


Is this better than Tesseract OCR?



I think that's a sort of apples and pears type of comparison.

Tesseract can be used everywhere, and is used dominantly on open platforms. This is an offering from Microsoft to be used on their platform only.

They may both be good, but they have widely different platform targets.


My guess is he meant better at actually OCR'ing text, not better for implementation.


What are you talking about? It is always about the results. OCR is a tool, and it doesn't matter if it runs on Windows, Linux, OS X, a phone, a tablet, or a watch. If this Microsoft OCR produces better results than Tesseract, then people will simply create a service running on Windows (yes, even on Windows Phone) and some kind of API to talk to it. The question remains the same: does it produce better results than Tesseract?

So far, this Microsoft OCR is just a bunch of words without any proof whatsoever that it actually works. Show me some pictures or videos of results.


That's the big question. Tesseract is pretty good, though quite slow, I must say.


It depends on what is being scanned. Say you have a perfectly formatted image, directly taken from a scanner, it's a pretty darn quick process.

But in my experience, what adds to the slowness is pre-processing the image to make it suitable for OCR, especially for Tesseract. I still haven't found the magic combination of filters, because every image is different, especially if you source them from users' camera phones.
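The comment does not say which filters were tried. As one example of the kind of pre-processing meant, here is a small sketch of global binarization with Otsu's method, written in plain Python over a flat list of 8-bit grayscale values. This is an assumption about a typical first step, not the commenter's pipeline, and the function names are made up for the example.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return the threshold t (0..255) that maximises between-class variance.

    Pixels with value <= t form the dark class, the rest the bright class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class so far
    sum0 = 0  # sum of intensities in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break             # bright class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(pixels: Sequence[int], threshold: int) -> List[int]:
    """Map every pixel to pure black (0) or pure white (255)."""
    return [0 if p <= threshold else 255 for p in pixels]


if __name__ == "__main__":
    sample = [12, 15, 20, 18, 210, 220, 205, 14, 215]  # made-up gray values
    t = otsu_threshold(sample)
    print(t, binarize(sample, t))
```

A single global threshold like this tends to break down under the uneven lighting typical of phone photos, which fits the complaint that no one combination of filters works for every image.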


Tesseract is really looking great, with Google adding searchable PDF as an output in the latest release candidate.
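For anyone who wants to try it, a hedged sketch of driving the command-line tool from Python. It assumes a tesseract build that ships the `pdf` config (and `hocr` for comparison). The file names are placeholders, and the helper functions are made up for this example rather than taken from any binding.

```python
import shutil
import subprocess
from typing import List


def tesseract_cmd(image: str, out_base: str, fmt: str = "pdf", lang: str = "eng") -> List[str]:
    """Build the argv for: tesseract IMAGE OUTBASE -l LANG CONFIG."""
    if fmt not in ("pdf", "hocr"):
        raise ValueError("this sketch only handles the 'pdf' and 'hocr' configs")
    return ["tesseract", image, out_base, "-l", lang, fmt]


def run_ocr(image: str, out_base: str, fmt: str = "pdf") -> None:
    """Run tesseract if it is installed; raises if the binary is missing or fails."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not on PATH")
    subprocess.run(tesseract_cmd(image, out_base, fmt), check=True)


if __name__ == "__main__":
    # Only prints the command; call run_ocr() with a real image to execute it.
    print(" ".join(tesseract_cmd("scan.png", "scan")))
```

Tesseract appends the extension itself, so an output base of `scan` should produce `scan.pdf` with the `pdf` config.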


So from reading the list of reasons for inaccurate results, it sounds like this library is totally useless for images taken with mobile phones, yet it is only allowed to run on mobile ;)

Now I would be more interested in an image correction library.

"....
- Blurry images
- Handwritten or cursive text
- Artistic font styles
- Small text size (less than 15 pixels for Western languages, or less than 20 pixels for East Asian languages)
- Complex backgrounds
- Shadows or glare over text
- Perspective distortion
- Oversized or dropped capital letters at the beginnings of words
- Subscript, superscript, or strikethrough text"
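The text-size item is the one entry in that list with concrete numbers, so it is easy to check before calling any OCR engine. A minimal sketch, assuming the text height in pixels has already been measured somehow. The helper name and the `script` labels are made up for this example; only the 15 and 20 pixel limits come from the quoted documentation.

```python
# Minimum text heights quoted above, in pixels.
MIN_TEXT_PX = {"western": 15, "east_asian": 20}


def upscale_factor(text_height_px: float, script: str = "western") -> float:
    """Factor by which to enlarge an image so its text reaches the quoted minimum."""
    if text_height_px <= 0:
        raise ValueError("text height must be positive")
    minimum = MIN_TEXT_PX[script]
    # Never shrink: images whose text is already large enough get a factor of 1.0.
    return max(1.0, minimum / text_height_px)


if __name__ == "__main__":
    print(upscale_factor(10))                # Western text, 10 px tall
    print(upscale_factor(10, "east_asian"))  # East Asian text, 10 px tall
    print(upscale_factor(30))                # already large enough
```

Enlarging an image does not add detail, so this only helps with the size limit itself and does nothing for the blur, glare, or perspective items.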



