Unanswered: Full-text indexing failure - large PDF files
I am having a problem with full-text indexing of large PDF files. I have two large PDFs, each around 23 MB and around 2,000 pages. When the gatherer tries to index them, it fails and retries. It appears to be a 30-second timeout: CPU usage drops after 30 seconds and then ramps up again.
The gatherer retries repeatedly without moving on to other documents, so it effectively gets stuck. It does not log an error in the Windows Event Log or the SQL Server log. It does log the following in the gatherer log:
09/03/2005 14:38:24 Add The gatherer has started
09/03/2005 14:38:26 Add The initialization has completed
09/03/2005 16:10:36 Add The gatherer has started
09/03/2005 16:10:40 Add The recovery has completed
09/03/2005 16:45:06 MSSQL75://SQLServer/76cba758/F87750AC4AACBF4BA9F2816993FBE5EA Add Error fetching URL, (80041201 - The object was not found. )
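In case it helps anyone scan a longer gatherer log for the same repeated failure, here is a rough Python sketch. It assumes every line follows the layout of the sample above (date, time, an optional URL, the word "Add", then the message), which I have only inferred from these five lines. The function names are my own, not part of any SQL Server tooling.

```python
import re
from collections import Counter

# Assumed layout, inferred only from the sample lines above:
#   <date> <time> [<url>] Add <message>
LINE_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?:(?P<url>\S+://\S+) )?Add (?P<message>.*)$"
)
# Assumed error format: "(<8 hex digits> - <description>)"
ERROR_CODE_RE = re.compile(r"\((?P<code>[0-9A-Fa-f]{8}) - ")


def parse_line(line):
    """Split one gatherer log line into its fields, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    code = ERROR_CODE_RE.search(entry["message"])
    entry["code"] = code.group("code") if code else None
    return entry


def count_errors_by_url(lines):
    """Count how often each (url, error code) pair appears in the log lines."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and entry["url"] and entry["code"]:
            counts[(entry["url"], entry["code"])] += 1
    return counts
```

Passing the lines of the log file to count_errors_by_url would show which document URLs keep failing and with which error code. A URL with a high count is a likely candidate for the document the gatherer is stuck on.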
Other documents in the database get indexed properly (if they were indexed before these PDFs), and the full-text catalogs are searchable. If the two large PDFs are removed, indexing completes successfully. Other PDFs in the database are indexable and searchable.
I am using SQL Server 2000 SP3, Adobe IFilter 6.0, and Windows 2003. The database is a Windows SharePoint Services content database.