morch
Joined: Mar 12, 2006
# Posts: 16
Posted: 2006-Mar-12 13:27
[sorry in advance for the length of this]
Hi, I am a webmaster who uses FP (FrontPage) as an editor. As this suggests, I am no wizard with coding or webmastering generally, but I have made a reasonable living over the last few years.
This post is really about Google and the non-www problem that I have seen discussed numerous times on this forum. I can't find answers to my specific problems and hope that my post may help me and others who are not as clued up as we perhaps should be.
The recent Google BD (Big Daddy) shuffle has caused me (and thousands of others) a few sleepless nights. I suffered from the supplemental/indexing problem that saw my SERPs disappear from 21/22 Feb, save the home page, which still brought in a bit of traffic. Today I have seen that Google has either reversed part of the algo OR simply indexed pages that were previously missing from the BD datacentres. I'm ranking well again and have most of my pages properly indexed.
But I'm not counting my chickens just yet. There may be more to come before BD fizzles out. This has caused me to reflect on issues that I have so far tried to ignore. I have some potentially serious flaws with my site, and that is why I am posting today. I want to know more about the non-www URLs on my site and about .htaccess.
My entire site seems to have another “duplicate” non-www version where all pages have been indexed, showing varying PageRank (i.e. different from the www version). They have no external links as far as I can tell.
Every time I put a new page up, it seems to get duplicated as non-www, and I can't work out why. My external links don't point to the non-www address, but it might be FP's internal links that are causing the problem.
I know nothing of .htaccess apart from what I have picked up reading this forum. My host does provide one global .htaccess file stored in my “protected” logs directory, but this is a directory shared by up to 12 other sites that I run. Only a handful are commercial. I am wary of messing up the other sites.
What I want to know is:
1) Do I need to resolve the duplicate pages indexed with non-www? So far they haven't been a problem, but the recent BD shuffle has changed my thinking on this.
2) I thought google would not penalise a duplicate page – merely rank the oldest page first and ignore the second one. If this is the case why is the non www version of my site such a problem?
3) Why do pages convert to the non-www version? I have read that incorrectly referenced backlinks can cause this, but I can't find any apart from the odd stray.
4) Could my internal linking structure be to blame? I use FP as an editor.
5) I have no understanding of the .htaccess file, but I have read much on this forum about it. How do I fix the non-www problem with a basic script that I can upload into my root directory? Can you recommend any code that will resolve this easily?
Really sorry about making such a lengthy first post
Many thanks
Morch
g1smd
Staff
Joined: Jul 28, 2002
# Posts: 10446
Posted: 2006-Mar-12 17:41
I have answered these questions a few hundred times in the last few years, but you have brought up a number of related issues too, so it is worth repeating again, and expanding on the information:
>> 1) Do I need to resolve the duplicate pages indexed with non www? So far they haven’t been problem but the recent BD shuffle has changed my thinking on this.
Yes, you must resolve the issues. Split page rank is just one problem that you can see. A trained eye will find a lot of other issues that you haven't yet noticed.
These searches give some clues:
site:domain.com
site:domain.com -inurl:www
site:www.domain.com
especially:
- how far you get before the "repeat search to show omitted results" link appears.
- entries shown as Supplemental Results.
- entries shown as URL-only.
- number of pages indexed compared to how many pages you actually have.
>> 2) I thought google would not penalise a duplicate page – merely rank the oldest page first and ignore the second one. If this is the case why is the non www version of my site such a problem?
Page rank splitting is just one problem, but you will also find that some of your pages will be shown as URL-only entries, and some pages may not be indexed at all. The above three searches give many clues.
The 301 redirect takes any request for non-www pages and sends the browser or bot to the www version of that page. Eventually only www pages will be listed, and the non-www pages will all drop out of the index.
>> 3) Why do pages convert to the non www version – I have read that incorrectly referenced backlinks can cause this but I cant find any apart from the odd stray.
If you use relative linking (domain not specified in the link), then it only takes ONE stray link to the "wrong" version to get the whole site indexed under the wrong version. Either use full http://www.domain.com/folder/subfolder/ type links on the site, or start all your links with a / (so that they count from the root) and make sure that every page of the site contains the <base href="http://www.domain.com/"> tag to tell the search engines which version site-wide is to be indexed.
Get a copy of Xenu LinkSleuth and test your site starting at http://domain.com/ and then again at http://www.domain.com/ and see what you get. Then do it again after you have put all the fixes in place.
>> 4) Could my internal linking structure be to blame? I use FP as an editor.
Yes it could. Relative linking allows the whole site to be crawled as both non-www and as www. Using the base tag stops that happening.
So, either use full URLs in all links (uses more bandwidth), OR use the base tag on every page of the site, combined with having all URLs start with a / and count FROM the root, within the site.
Using a 301 redirect from non-www to www is also a very good idea. Either the 301 redirect or the base tag can fix the problem, but it is a very good idea to do both of these things.
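To make the crawling point concrete, here is a toy Python model (an illustration only, not how Googlebot actually works): a breadth-first crawler over a hypothetical three-page site whose links all start with "/". Entering through one stray link to http://domain.com/ discovers every page on the non-www host. Resolving every link against a www base URL, as a <base href="http://www.domain.com/"> tag on each page would, leaves only the stray entry point on the non-www host. The SITE dictionary and the crawl() function are assumptions made up for this sketch.

```python
from collections import deque
from urllib.parse import urljoin, urlsplit

# Hypothetical site: each path maps to the hrefs written on that page.
# All links are root-relative (start with "/"), as recommended in the thread.
SITE = {
    "/": ["/page1.html", "/folder/page2.html"],
    "/page1.html": ["/", "/folder/page2.html"],
    "/folder/page2.html": ["/page1.html", "/"],
}

def crawl(start_url, base=None):
    """Breadth-first crawl of SITE, returning every absolute URL discovered.

    If `base` is given, each href is resolved against it (the effect of a
    <base href> tag on every page); otherwise against the current page URL.
    """
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        path = urlsplit(url).path or "/"
        for href in SITE.get(path, []):
            target = urljoin(base or url, href)
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

# Usage: one stray link to the non-www home page.
# crawl("http://domain.com/")                               -> whole site as non-www
# crawl("http://domain.com/", base="http://www.domain.com/") -> pages found under www
```

Without a base, every URL found is on domain.com; with the base, only the stray entry URL stays non-www, and the 301 redirect is what sends that last one to www too.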
>> 5) I have no understanding of the .htaccess file – but I have read much on this forum about it. How do I fix the non www problem with a basic script that I can upload intro my root directory? Can you recommend any code that will resolve this easily
If you don't want to mess with the main configuration files, you can add the 301 redirect to the .htaccess file in the root folder for each individual site. The code is very simple:
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^domain\.com [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [L,R=301]
When you have done that, test it using WebBug and make sure that you really do get the correct HTTP status code response.
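If WebBug isn't to hand, a short Python script can do a similar check. This is a swapped-in alternative, a minimal sketch assuming plain http:// URLs; fetch_status() and is_good_redirect() are hypothetical helper names. It sends one HEAD request and reports the status code and Location header without following the redirect, so you can confirm you get a 301 (not a 302 or a 200) pointing at the www version of the same page.

```python
import http.client
from urllib.parse import urlsplit

def fetch_status(url, timeout=10):
    """Send a single HEAD request and return (status, Location header).

    http.client never follows redirects, so a 301 is reported as-is.
    """
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()

def is_good_redirect(status, location, expected_url):
    """True only for a permanent (301) redirect to exactly the expected URL."""
    return status == 301 and location == expected_url

# Usage (needs network access; replace domain.com with the real domain):
# status, location = fetch_status("http://domain.com/page1.html")
# print(status, location,
#       is_good_redirect(status, location, "http://www.domain.com/page1.html"))
```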
[ Message was edited by: g1smd 03/12/2006 01:48 pm ]
morch
Joined: Mar 12, 2006
# Posts: 16
Posted: 2006-Mar-12 18:56
Many many thanks for such a lucid and detailed reply g1smd
I have a few questions before I implement [bear with me on the techy stuff]
>> make sure that every page of the site contains the <base href="http://www.domain.com/"> tag to tell the search engines which version site-wide is to be indexed <<
Do you mean that I should simply leave one absolute link like this on every page to act as some kind of sign post for the bot to re-direct it to the www page?
>> having all URLs start with a / and count FROM the root <<
Can you clarify that for me? Do you mean placing a / after the domain so you get www.domain.com/url1.html? Or do you place the / before the URL? Sounds simple, but for me it ain't!
Your .htaccess code reads
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^domain\.com [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [L,R=301]
This suggests I will need the "RewriteRule" for every page on the site so you would have:
RewriteRule ^(.*)$ http://www.domain.com/index.html [L,R=301]
RewriteRule ^(.*)$ http://www.domain.com/url1.html [L,R=301]
RewriteRule ^(.*)$ http://www.domain.com/url2.html [L,R=301]
RewriteRule ^(.*)$ http://www.domain.com/url3.html [L,R=301]
Is this correct? So on a 200 page site you'd need 200 such lines redirecting to the correct www page. Or do you only need to write this once and embed the file in the site root dir?
I'd appreciate your help on these points. I know it's basic, but that's where I'm at, I'm afraid.
Finally, what detrimental effect can all of this have on your site? I've read that people switching to new domains are having mixed success and some are being sandboxed. I don't want to risk that. Is this unlikely to happen with a redirect of this type (i.e. non-www --> www within the same root)?
Best regards
Morch
[ Message was edited by: g1smd 03/12/2006 01:49 pm ... Reason: Fixed Code Example ]
morch
Joined: Mar 12, 2006
# Posts: 16
Posted: 2006-Mar-12 19:05
I've just done the site:domain.com -inurl:www search.
It shows only 6 pages as non www - 4 of which are supplemental
But when I type domain.com into IE (i.e. remove the www in the browser), the whole site appears fully functional with non-www pages (in other words, hundreds of pages all navigable as domain.com).
So do I have 6 pages to redirect, or is there a whole site to fix?
Can't get my head round it at all.
g1smd
Staff
Joined: Jul 28, 2002
# Posts: 10446
Posted: 2006-Mar-12 19:46
You are lucky. Google has only indexed 6 "non-www" pages, and already realises that it is going down the wrong road (by marking 4 of them as "supplemental results"). Now is exactly the right time to be putting this fix in.
>> >> make sure that every page of the site contains the <base href="http://www.domain.com/"> tag to tell the search engines which version site-wide is to be indexed << <<
>> Do you mean that I should simply leave one absolute link like this on every page to act as some kind of sign post for the bot to re-direct it to the www page? <<
No. The <base> tag is NOT a link. The <base> tag goes in the <head> of the page. The <base> tag specifies the "default domain" that all domain-less links on the page are pointing to.
If you are on http://www.domain.com/page1.html then a link to /page2.html is pointing to http://www.domain.com/page2.html
If you are on http://domain.com/page1.html then a link to /page2.html is pointing to http://domain.com/page2.html
...BUT if that page had had <base href="http://www.domain.com/"> at the top, then it would not matter whether you were at http://domain.com/page1.html or you were at http://www.domain.com/page1.html, the URL for page2.html would ALWAYS be seen as being http://www.domain.com/page2.html because the <base> tag resolves the domain, and that is exactly what you want to do.
The base tag forces canonicalisation of every URL that the page containing it links to. Having the same <base> tag on every page of the site "fixes" every link to every page, from every page = problem solved.
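The resolution rules described above can be sketched with Python's urllib.parse.urljoin, which implements standard relative-URL resolution. resolve_link() is a hypothetical helper; it assumes that, when a <base href> is present, links on the page resolve against it instead of against the page's own URL (a relative base href would itself first be resolved against the page URL).

```python
from urllib.parse import urljoin

def resolve_link(page_url, href, base_href=None):
    """Return the absolute URL that a browser or bot assigns to `href`.

    With a <base href>, relative links resolve against the base;
    otherwise they resolve against the URL of the page itself.
    """
    base = urljoin(page_url, base_href) if base_href else page_url
    return urljoin(base, href)

# The three cases from the explanation above:
print(resolve_link("http://www.domain.com/page1.html", "/page2.html"))
print(resolve_link("http://domain.com/page1.html", "/page2.html"))
print(resolve_link("http://domain.com/page1.html", "/page2.html",
                   base_href="http://www.domain.com/"))
```

The first and third calls give http://www.domain.com/page2.html; only the second, reached via the non-www host with no base tag, gives http://domain.com/page2.html.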
>> >> having all URLs start with a / and count FROM the root << <<
>> Can you clarify that for me - do you mean placing a / after the domain so you get www.domain.com/url1.html? Or do you place the / before the url?? SOunds simple but for me it aint! <<
Link to your pages like this:
/page2.html
/folder/page2.html
/images/widgets/green.widget.jpg
Don't link like this:
page2.html
../../image.jpg
without a starting "/".
If you link to an index page, then you would use just "/" or "/folder/" to link to it.
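To audit existing pages for links that don't start with "/", a small parser built on Python's standard html.parser module can sort every href and src value into root-relative links (the style recommended above), document-relative links (the style to avoid) and absolute URLs. LinkAudit and classify() are hypothetical names for this sketch; it treats anything with a scheme (http:, mailto:) or a leading "//" as absolute.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

def classify(href):
    """Sort one href/src value into the link styles discussed in this thread."""
    parts = urlsplit(href)
    if parts.scheme or parts.netloc:   # http://..., mailto:..., or //host/...
        return "absolute"
    if href.startswith("/"):           # counts from the site root
        return "root-relative"
    return "document-relative"         # e.g. page2.html or ../../image.jpg

class LinkAudit(HTMLParser):
    """Collect every href and src attribute value, grouped by classify()."""

    def __init__(self):
        super().__init__()
        self.links = {"absolute": [], "root-relative": [], "document-relative": []}

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links[classify(value)].append(value)

# Usage (file name is an example):
# audit = LinkAudit()
# with open("index.html", encoding="utf-8") as f:
#     audit.feed(f.read())
# print(audit.links["document-relative"])   # links to rewrite with a leading "/"
```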
>> So on a 200 page site you'd need 200 such lines redirecting to the correct www page. Or do you only need to write this once and embed the file in the site root dir?
The .htaccess code that I posted will redirect domain.com/any.page.html over to www.domain.com/any.page.html so only those 4 lines of code are needed per domain.
The $1 in the rewrite rule contains the folder and/or page name of the URL that is being rewritten. It is a dynamic system, so any page on the non-www site will be redirected to the new URL (the one including the www) with the same page name.
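A rough Python model of what those four lines do, assuming the rules sit in the .htaccess file of the site's root folder: HOST_PATTERN mirrors the RewriteCond (the [NC] flag becomes re.IGNORECASE) and the capture group in PATH_PATTERN is what $1 refers to. In a per-directory .htaccess context Apache strips the leading "/" from the path before matching, which is why the substitution supplies it. This is an illustration, not Apache's implementation, and rewrite() is a hypothetical name; it also ignores the query string, which mod_rewrite passes through unchanged by default.

```python
import re

# The two patterns from the .htaccess rules, translated to Python regexes.
HOST_PATTERN = re.compile(r"^domain\.com", re.IGNORECASE)  # RewriteCond %{HTTP_HOST} ... [NC]
PATH_PATTERN = re.compile(r"^(.*)$")                       # RewriteRule pattern

def rewrite(host, path):
    """Return (status, location) the rules would produce, or None if they don't fire.

    `path` is what a root .htaccess rule sees: the URL path without its
    leading "/", e.g. "folder/page.html".
    """
    if not HOST_PATTERN.search(host):
        return None  # already www (or some other host): no redirect
    match = PATH_PATTERN.match(path)
    # $1 in the RewriteRule corresponds to match.group(1) here.
    return 301, "http://www.domain.com/" + match.group(1)

# Usage: every non-www page maps to the www page with the same name.
# rewrite("domain.com", "folder/page.html")
#   -> (301, "http://www.domain.com/folder/page.html")
```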
>> Finally - what detrimental effect can all of this have on your site. Ive read that people switching to new domains are having mixed success and some are being sandboxed. I dont want to risk that. Is this unlikely to happen with a re-direct of this type (ie non-www --> www within same root) <<
There will be no detrimental effect. Over 99% of your site is already listed at the www version. Adding the 301 redirect will make the last few "non-www" entries go away. Failing to add the redirect risks Google "finding" more "non-www" pages, indexing them, and those then causing you duplicate content issues and making the www version of your site lose rank.
Google recommends using this 301 redirect from non-www to www. Get it fixed now.
[ Message was edited by: g1smd 03/12/2006 12:22 pm ]
Dinkar
Staff
Joined: Aug 12, 2001
# Posts: 4391
Posted: 2006-Mar-12 20:08
g1smd, you should rename yourself to g301smd
g1smd
Staff
Joined: Jul 28, 2002
# Posts: 10446
Posted: 2006-Mar-12 20:23
Uh, huh.
I guess so: http://www.google.com/search?num=100&filter=0&q=g1smd+301
Dinkar
Staff
Joined: Aug 12, 2001
# Posts: 4391
Posted: 2006-Mar-12 20:41
lol Good job.
morch
Joined: Mar 12, 2006
# Posts: 16
Posted: 2006-Mar-12 20:48
It's becoming clearer!!
Can I resolve the whole thing by using the <base> tag on every page? I'm more than happy to do this because it doesn't involve playing with the .htaccess file, which makes me nervous. But if it's only a partial fix, then I guess I'm going to have to grasp the nettle.
In which case:
>>The $1 in the rewrite rule contains the folder and/or page name of the URL that is being re-written.<<
I'm still not clear on this. I'm not sure whether the $1 is to be replaced with /index.html, or whether you leave it because it's the precise syntax that goes with the command. I know this sounds thick, but you can read your comments in two ways if you're as green as I am on coding.
Presumably, by placing /$1 after www.domain.com, you are telling any bot to go looking for the www pages rather than the non-www ones?
And one last thing (I know, I'm pushing it):
How come I can access my site fully in the browser with non-www pages? The whole site is navigable without the www, which concerns me even though the indexing reveals only half a dozen supplementals or non-www pages. The PR is different on most pages.
I've even tested this on a few BIG authority sites today. They also allow you to revert to domain.com and navigate, with different PR displayed on the two versions (www PR7, non-www PR4).
Can't explain this last bit too well. Hopefully you know what I mean. I just don't geddit.
You've been great today. I know you're probably sick to death of repeating yourself on this 301 business, g1smd, but I really think these "basics" will help a lot of people.
Thanks again
Morch
g1smd
Staff
Joined: Jul 28, 2002
# Posts: 10446
Posted: 2006-Mar-12 20:57
"Normal" server configuration, as set up by most clueless hosting companies, allows you to access your site at either non-www or at www. In effect you have two identical sites. They fail to realise that search engines like Google see these two sites as being separate copies, not "one" site.
Search engines do not want to index the same content twice, so they try to filter one copy out. That means that your PageRank suffers. You have already seen the effect: domain.com/page.html showing PR3 while the "same" page at www.domain.com/page.html shows PR6.
If the redirect had been done properly, only www.domain.com/page.html would be listed and it might be PR7.
You can resolve it with the <base> tag, but I can recommend the 301 redirect too. It is 4 lines of code to add to the .htaccess file. Make sure you check your site with Xenu LinkSleuth too.
In the .htaccess file, the $1 contains the folder and/or name of the page, as a server function. You just type a "dollar" and a "one" there, and the server invokes that rule for every page of the site. You do not put real page names there.
morch
Joined: Mar 12, 2006
# Posts: 16
Posted: 2006-Mar-12 21:16
I can't ignore such unequivocal advice. I'm going to take you up on your recommendations:
1) I am going to set up a .htaccess file in Notepad and upload it to the root directory of the site that is suffering the problem. If I'm understanding you fully, there is already a .htaccess at a higher level (in my basic log file), but this additional .htaccess file overrides it at the local/lower level. There is no corruption or conflict.
2) I am going to use the following code
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^domain\.com [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [L,R=301]
And nothing else.
Presumably the capitals / upper case letters must remain precisely as they are here.
3) I will add the <base> tag to every page as a final measure. This will simply read
<base href="http://www.domain.com/">
And will go in the <head> tag
Could you [finally] confirm that I'm on the right track with this?
Thanks again, g1smd. It's been an education!
[ Message was edited by: g1smd 03/12/2006 01:52 pm ... Reason: Fixed Code Example ]
g1smd
Staff
Joined: Jul 28, 2002
# Posts: 10446
Posted: 2006-Mar-12 21:21
OK. To answer your questions.
Yes. Yes. Yes. and Yes.
Spot on!
BUT:
- do make sure that the <base> tag is placed before any links to any CSS or JS files in the <head> section
- make sure that your internal links to pages, files, images, stylesheets, etc all start with a / and that the URLs specify the full path to the file.
- make sure that you check your site using Xenu LinkSleuth for any problems.
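The first of those points can be partly checked automatically. This sketch, again using Python's html.parser, reports a missing or wrong <base href> and any stylesheet or script reference that appears before the <base> tag. BaseTagCheck and check_head() are hypothetical names, and http://www.domain.com/ stands in for your real domain; it does not replace a full Xenu LinkSleuth crawl.

```python
from html.parser import HTMLParser

class BaseTagCheck(HTMLParser):
    """Record the <base href> and any <link>/<script> references seen before it."""

    def __init__(self):
        super().__init__()
        self.base_href = None
        self.refs_before_base = []  # href/src values that appear before <base>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "base" and self.base_href is None:
            self.base_href = attrs.get("href")
        elif self.base_href is None and tag in ("link", "script"):
            ref = attrs.get("href") or attrs.get("src")
            if ref:
                self.refs_before_base.append(ref)

def check_head(html, expected_base="http://www.domain.com/"):
    """Return a list of problems found; an empty list means the head looks right."""
    parser = BaseTagCheck()
    parser.feed(html)
    problems = []
    if parser.base_href != expected_base:
        problems.append(f"base href is {parser.base_href!r}, expected {expected_base!r}")
    for ref in parser.refs_before_base:
        problems.append(f"{ref} appears before <base>")
    return problems

# Usage:
# check_head('<head><base href="http://www.domain.com/">'
#            '<link rel="stylesheet" href="/style.css"></head>')   -> []
```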