Using IFilter in C# by bypassing COM
I’ve been using IFilters in a C# application I’m working on, and it hasn’t been fun at all. There are all kinds of problems with COM threading and then there are some malfunctioning filter implementations.
Well, I decided one day to get to the bottom of these problems and finally created my own implementation of the LoadIFilter function.
The LoadIFilter function is used to find a filter implementation for a certain file. My implementation does what LoadIFilter does (and a bit more), but it does not involve COM in the process and avoids the threading problems mentioned above. So far, it hasn’t introduced any new problems.
Anyway, I packaged all that information (and source code) in an article and posted it to Code Project. You can find it here. It also has some nice information on how to dynamically load a DLL and call a function through a pointer obtained with GetProcAddress (which was not possible before .NET 2.0).
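For readers who haven’t seen the technique, here is a minimal sketch of dynamic loading in C#. It is not the code from the article. It loads kernel32.dll with LoadLibrary, looks up the GetTickCount export with GetProcAddress, and turns the returned pointer into a callable delegate with Marshal.GetDelegateForFunctionPointer, which was added in .NET 2.0. The class name DynamicCallSketch and the delegate name GetTickCountFn are made up for this example, and GetTickCount simply stands in for whatever export you would look up in a filter DLL. Windows only.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

// Hypothetical example class; the name is not taken from the article.
static class DynamicCallSketch
{
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern IntPtr LoadLibrary(string fileName);

    // GetProcAddress only takes ANSI export names and has no A/W variants.
    [DllImport("kernel32.dll", CharSet = CharSet.Ansi, ExactSpelling = true, SetLastError = true)]
    static extern IntPtr GetProcAddress(IntPtr module, string procName);

    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    static extern bool FreeLibrary(IntPtr module);

    // Managed signature matching the native export: DWORD GetTickCount(void).
    [UnmanagedFunctionPointer(CallingConvention.Winapi)]
    delegate uint GetTickCountFn();

    static void Main()
    {
        IntPtr module = LoadLibrary("kernel32.dll");
        if (module == IntPtr.Zero)
            throw new Win32Exception(Marshal.GetLastWin32Error());

        try
        {
            IntPtr proc = GetProcAddress(module, "GetTickCount");
            if (proc == IntPtr.Zero)
                throw new Win32Exception(Marshal.GetLastWin32Error());

            // Wrap the raw function pointer in a delegate so it can be called like a method.
            GetTickCountFn getTickCount = (GetTickCountFn)
                Marshal.GetDelegateForFunctionPointer(proc, typeof(GetTickCountFn));

            Console.WriteLine("Milliseconds since boot: " + getTickCount());
        }
        finally
        {
            // The delegate must not be called after the module is unloaded.
            FreeLibrary(module);
        }
    }
}
```

The delegate signature has to match the native function exactly. A mismatch usually shows up as a crash or corrupted data rather than a compile error.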
Hope you’ll find it useful.
UPDATE: The article moved to its permanent location after being edited. Link updated.
15 Responses to “Using IFilter in C# by bypassing COM”
Software/Technology Discussion
March 12th, 2006
Hey Eyal,
First, thanks for the code. We are testing your code on our site and are getting:
System.Runtime.InteropServices.COMException (0x80030050): already exists. (Exception from HRESULT: 0x80030050 (STG_E_FILEALREADYEXISTS))
at System.Runtime.InteropServices.ComTypes.IPersistFile.Load(String pszFileName, Int32 dwMode)
at EPocalipse.IFilter.FilterLoader.LoadAndInitIFilter(String fileName, String extension)
at EPocalipse.IFilter.FilterReader..ctor(String fileName)
Which is strange as you say there are no COM objects used.
Any help will be appreciated.
Alon
Alon
January 3rd, 2008
Shalom Eyal,
How can I program a global image filter in C# or Java?
I mean something like IFilter, but for the whole system.
I want any image, video, or Flash content coming to the screen to be shown according to the filter I built (a GUID or any image filter built in C# or Java).
Thanks,
Zeev
zeev
May 11th, 2008
Hi,
First off, thanks for posting this example. I’m trying to use your posted library to scan a collection of documents. Everything works fine for the first file in the collection, but then on the second file I get an error: “already exists. (Exception from HRESULT: 0x80030050 (STG_E_FILEALREADYEXISTS))”. Do you have any pointers on how to get around this? Thanks! Allan
Allan
June 10th, 2008
Hi Eyal,
Thanks for posting the code!
I’m trying to use your IFilter to extract text from different file types. It works well on Word and text files. I’m having some problems with Excel files: the text that I receive has the strings in a different order. First I receive the text and then the numbers. It sounds strange, but I get the same thing with another IFilter implementation.
With .csv files I get the error: “already exists. (Exception from HRESULT: 0x80030050 (STG_E_FILEALREADYEXISTS))”.
Any help is appreciated.
Thank you!
Doru
Doru
June 12th, 2008
I can’t seem to find the source zip anymore at:
http://www.codeproject.com/KB/cs/IFilter.aspx
Could you tell me where I might download it from, or send me a copy?
Thanks!
John
Massachusetts, US
JohnH
August 26th, 2008
The zip file source is back today. Thanks.
JohnH
August 26th, 2008
“It is always a pleasure to read smart people.”
Alex K
December 3rd, 2008
Hey Eyal, I need to know which properties (e.g. author, date created) are supported by each IFilter (e.g. .docx, .xlsx) in Filter Pack 2007. Do you have any idea about it?
UJ
January 7th, 2009
Hey Eyal,
Thanks for the code. I am using your code for my search solution and ran into a little issue when I tried to modify it. I only need to parse PDF and both Office 2003 and 2007 document types. Instead of looking up the IFilter in the registry by extension, I found which IFilters are used by my local system using the IFilter Explorer software, copied the DLLs into the bin directory of my project, and hardcoded the Persistent Handler Addin values in my code. Then I matched extensions to the DLL location and persistent handler value and extracted content using these hardcoded values. Everything worked perfectly fine until I tried to run the same code on another machine (both machines running Vista). I could no longer extract content on this other machine even though I used the same IFilter DLL files and the same Persistent Handler values. Is the Persistent Handler DLL-specific or installation/workstation-specific? What’s the difference between the “Persistent Handlers” and “Persistent Handlers Addins Registered” values? I see both of these values in IFilter Explorer next to the DLL name and file extension, but I don’t know what they are used for or how they differ. According to your code, you use the “Persistent Handlers Addins Registered” value in order to load the filter from the DLL. I did the same.
I’ve been stuck on this problem for almost a week now and you are my best chance to get it resolved. At least point me in the right direction. All I am trying to do is use the latest IFilter DLLs in my project instead of hoping every machine has the latest IFilter DLL versions for the file types that I need.
Thanks again for your code and time,
Ilia
Anonymous
April 10th, 2009
hi Eyal
Thanks for your code. There is a slight change in my requirement: I need to extract the page numbers from a PDF file while extracting its text. Could you please help me with this?
thanks a bunch.
Sanjay
April 21st, 2009
hi Eyal
Thanks for your code. There is a slight change in my requirement: I need to extract the page numbers from a PDF file while extracting its text. Could you please help me with this?
kiran
December 16th, 2009
I just posted a patch to Code Project that fixes the Adobe IFilter DLL load problem.
You can find it here:
http://www.codeproject.com/Messages/3414291/Re-pdf-files-ifilter-load-fails-for-AcroRdIF-dll-i.aspx
Thanks for your work.
Claudio
March 24th, 2010
Hi Eyal,
I just wanted to say thank you for making this code available to developers. I had written some code then whilst looking for a solution to some issues I came across yours which has saved me a lot of time.
All the best
Gary Lee
Gary
April 15th, 2010
re: Visio 2003 IFilter and MTA…
…