SharePoint will not crawl a site…

The specified address was excluded from the index. The crawl rules may have to be modified to include this address. (The item was deleted because it was either not found or the crawler was denied access to it.)

We were using a Robots.txt file to block search engines from indexing our public site. When I first tried to index the site from the SSP, it would only crawl a small part of it. After messing with permissions and other settings, I was informed that a Robots.txt file had been added to the site. Once we removed the file from the C:\inetpub\wwwroot\wss\VirtualDirectories\YourSiteName folder and rebooted the server, I was able to index the entire site from the SSP. The trouble began the following day: I tried a Full Crawl, and the crawl status would change from Crawling to Idle within a few seconds and display the error above.

Here is how I fixed the problem. In the content database, create this stored procedure (you can delete it later).

What does it do? It searches every character column (char, varchar, nchar, nvarchar) of every user table in the database for a given string.


CREATE PROC [dbo].[SearchAllTables]
(
      @SearchStr nvarchar(100)
)
AS
BEGIN

      -- Holds one row per match: the table.column name and the matching value
      CREATE TABLE #Results (ColumnName nvarchar(370), ColumnValue nvarchar(3630))

      SET NOCOUNT ON

      DECLARE @TableName nvarchar(256), @ColumnName nvarchar(128), @SearchStr2 nvarchar(110)
      SET @TableName = ''
      -- Wrap the search string in wildcards and single quotes for use in LIKE
      SET @SearchStr2 = QUOTENAME('%' + @SearchStr + '%', '''')

      -- Outer loop: walk the user tables in alphabetical order
      WHILE @TableName IS NOT NULL
      BEGIN
            SET @ColumnName = ''
            SET @TableName =
            (
                  SELECT MIN(QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME))
                  FROM INFORMATION_SCHEMA.TABLES
                  WHERE TABLE_TYPE = 'BASE TABLE'
                        AND QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME) > @TableName
                        AND OBJECTPROPERTY(
                                    OBJECT_ID(
                                          QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME)
                                    ), 'IsMSShipped'
                              ) = 0
            )

            -- Inner loop: walk the character columns of the current table
            WHILE (@TableName IS NOT NULL) AND (@ColumnName IS NOT NULL)
            BEGIN
                  SET @ColumnName =
                  (
                        SELECT MIN(QUOTENAME(COLUMN_NAME))
                        FROM INFORMATION_SCHEMA.COLUMNS
                        WHERE TABLE_SCHEMA = PARSENAME(@TableName, 2)
                              AND TABLE_NAME = PARSENAME(@TableName, 1)
                              AND DATA_TYPE IN ('char', 'varchar', 'nchar', 'nvarchar')
                              AND QUOTENAME(COLUMN_NAME) > @ColumnName
                  )

                  IF @ColumnName IS NOT NULL
                  BEGIN
                        -- Build and run a LIKE query against this column
                        INSERT INTO #Results
                        EXEC
                        (
                              'SELECT ''' + @TableName + '.' + @ColumnName + ''', LEFT(' + @ColumnName + ', 3630)
                              FROM ' + @TableName + ' (NOLOCK) ' +
                              ' WHERE ' + @ColumnName + ' LIKE ' + @SearchStr2
                        )
                  END
            END
      END

      SELECT ColumnName, ColumnValue FROM #Results
END

Once you have created the stored proc, execute this command:

EXEC SearchAllTables 'Robots.txt'

You will see a single record returned, pointing at the LeafName column of the AllDocs table.

Then run this command:

DELETE FROM AllDocs
WHERE LeafName = 'Robots.txt'
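If you would rather not fire the delete blind, a more cautious variant of the command above might look like the sketch below. It is only an illustration, not part of the original fix: it assumes the table lives in the dbo schema, that AllDocs has a DirName column alongside LeafName, and that exactly one row should match (as the search result suggested). Backing up the content database first is a sensible precaution.

```sql
-- Preview the row(s) that would be removed.
-- Assumption: DirName exists in AllDocs and shows which folder the file sits in.
SELECT DirName, LeafName
FROM dbo.AllDocs
WHERE LeafName = N'Robots.txt';

-- Delete inside a transaction and only commit if exactly one row was affected.
DECLARE @Deleted int;

BEGIN TRANSACTION;

DELETE FROM dbo.AllDocs
WHERE LeafName = N'Robots.txt';

SET @Deleted = @@ROWCOUNT;

IF @Deleted = 1
      COMMIT TRANSACTION;
ELSE
BEGIN
      -- Unexpected row count: undo the delete and report it.
      ROLLBACK TRANSACTION;
      RAISERROR('Expected 1 row but matched %d; nothing was deleted.', 16, 1, @Deleted);
END

-- Optional cleanup once you are done searching.
DROP PROCEDURE dbo.SearchAllTables;
```

If the preview shows more than one row, check the DirName values before deciding which entries really belong to the stray file.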

After this I rebooted the server and went to lunch. Why lunch? Well, for some reason SharePoint takes a little while to pick up the change (a timer job, I’m sure).

If only part of your site is being indexed, try adding a crawl rule.

Example:

Path: http://www.sharepointed.com/*

Crawl configuration: include

Then select the default content access account, or specify one that has read access to the site.
