Empty image src can destroy your site
This is a problem I’ve come across frequently, and since it has come up again recently, I thought I’d explore this issue in the hope that it will save others some trouble. There are so many problems that this one issue can lead to that it’s baffling browsers still behave this way. The issue? An HTML image, either via <img> tag or JavaScript Image object, that has its src set to “” (an empty string).
The offending code
There are basically two patterns to identify. The first pattern is just straight HTML:
<img src="" >
The second pattern is JavaScript and involves the dynamic setting of the src property on either a newly created image or an existing one:
var img = new Image();
img.src = "";
Both patterns cause the same effect: another request is made to your server. There are two different ways that browsers do this.
- Internet Explorer makes a request to the directory in which the page is located. For example, if you have a page running at http://www.example.com/dir/mypage.htm that has one of these patterns, IE makes a request to http://www.example.com/dir/ to fill in the image.
- Safari and Chrome make a request to the actual page itself. So the page running at http://www.example.com/dir/mypage.htm results in a second request to http://www.example.com/dir/mypage.htm to fill in the image.
You’ll note that Opera and Firefox aren’t mentioned at all. Opera behaves as you might expect: it doesn’t do anything when an empty image src is encountered; the attribute is ignored. Firefox 3 and earlier behave the same as Safari and Chrome, but Firefox 3.5 addressed this issue and no longer sends a request (related bug).
Both cases, of course, are problematic because it’s an image making a request for a document. You can easily see this behavior using an HTTP debugging proxy (I highly recommend Fiddler).
The problems
There are two basic problems that this browser behavior causes. The first is a traffic spike. Imagine that you have <img src=""> on the page at http://www.example.com/. Each instance of <img src=""> makes a request to /, the homepage of the domain, in every affected browser. Congratulations, you’ve effectively doubled your traffic to the homepage.
For small sites, this may not be that big of a deal; jumping from 10,000 to 20,000 page views probably isn’t going to raise any flags for you or your host. If you run a site that gets millions of page views per day, and probably has a lot of machines to handle that load, doubling or tripling traffic can be crippling. You can very easily run out of capacity.
Another issue with the traffic increase is the computing power needed to generate that homepage. If the page is personalizable or is updated with some regular frequency, you could be wasting computing cycles creating a page that will never be viewed by anyone.
The second problem is user state corruption. If you’re tracking state in the request, either by cookies or in another way, you have the possibility of destroying data. Even though the image request doesn’t return an image, all of the headers are read and accepted by the browser, including all cookies. While the rest of the response is thrown away, the damage may already be done.
How does this code happen?
The first time I encountered this problem, I naively thought that it was a bad developer writing crappy code. Had this been 2000 or earlier, I probably would have been right. In today’s web development world, however, I’m mostly wrong. Today, there are so many templating engines and content management systems responsible for constructing pages on-the-fly that it’s quite possible for good developers to end up producing pages with this code. All it takes is something as simple as this PHP:
<img src="$imageUrl" >
If some other part of the code is responsible for filling in $imageUrl, and that code fails, then the offending code gets output to the browser.
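One way to guard against this at the template level is to refuse to emit the tag at all when the URL is missing. The following is a minimal sketch in JavaScript rather than PHP; renderImage and escapeAttribute are hypothetical helper names, and treating a whitespace-only URL as empty is an assumption, not something the browsers discussed here are documented to do.

```javascript
// Hypothetical helpers (not from any library): emit an <img> tag only
// when a usable URL is available.
function escapeAttribute(value) {
  // Escape the characters that could break out of a quoted attribute.
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function renderImage(imageUrl, altText) {
  // Assumption: undefined, null, and whitespace-only values mean "no image".
  if (typeof imageUrl !== "string" || imageUrl.trim() === "") {
    return "";
  }
  return '<img src="' + escapeAttribute(imageUrl.trim()) + '" alt="' +
    escapeAttribute(altText || "") + '">';
}
```

With a helper like this, a failed lookup produces no markup instead of <img src="">, so no phantom request is triggered.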
In today’s web development world, we’re all doing something along these lines, whether we know it or not. Download a new WordPress theme? Make sure you’ve filled in all default arguments. Using a CMS at work? Make sure all your image URL fields are validated. It’s frighteningly easy to end up with this bad code on your page.
Other tags with problems
Before getting too angry at browser vendors, I think it’s fair to take a look at the HTML 4 specification, specifically the part defining images. Even though the specification indicates that the src attribute should contain a URI, it fails to define the behavior when src doesn’t contain a URI. Of course, images aren’t the only tags that reference an external resource, and so it should come as no surprise that there are other tags with the same problem.
As it turns out, Internet Explorer is the most sane browser out there. Its problems are thankfully limited to images with an empty src attribute. It does make up for this by making the problem a pain to detect, but that will be discussed later.
For other browsers, there are two additional problem scenarios: <script src=""> and <link href="">. Chrome, Safari, and Firefox all initiate another request.
Thankfully, no browser has a problem with <iframe src="">, as all correctly do not make another request.
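To hunt for the three problem patterns in generated markup, a quick lint pass can help. This is only a sketch: findEmptyReferences is a hypothetical name, and a regular expression is not a real HTML parser, so it may miss unusual markup or be fooled by attribute values that happen to contain tag-like text.

```javascript
// Sketch only: scans a string of HTML for <img src="">, <script src="">,
// and <link href="">. findEmptyReferences is a hypothetical helper.
function findEmptyReferences(html) {
  const pattern =
    /<(img|script)\b[^>]*?\ssrc\s*=\s*(?:""|'')[^>]*>|<(link)\b[^>]*?\shref\s*=\s*(?:""|'')[^>]*>/gi;
  const matches = [];
  let match;
  while ((match = pattern.exec(html)) !== null) {
    matches.push({
      tag: (match[1] || match[2]).toLowerCase(), // which tag was found
      markup: match[0],                          // the offending tag text
      index: match.index                         // offset into the input
    });
  }
  return matches;
}
```

Running something like this over rendered pages in a test suite is one way to catch the pattern before it reaches production; <iframe src=""> is deliberately left out because it does not trigger a request.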
What can be done?
Of course, the best thing to do is eliminate the offending code from your pages whenever possible. That’s fixing the problem at the source. If you can’t do that, though, your next best option is to attempt to detect it on the server and abort any further execution.
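For the JavaScript pattern, fixing the problem at the source can be as simple as never assigning an empty string. A minimal sketch, assuming a hypothetical helper named setImageSource that every dynamic image assignment goes through:

```javascript
// Hypothetical helper: only assign src when there is a real URL to load.
// Returns true when the assignment happened, false when it was skipped.
function setImageSource(img, url) {
  // Assumption: leaving src untouched is preferable to assigning "".
  if (typeof url !== "string" || url.trim() === "") {
    return false;
  }
  img.src = url;
  return true;
}
```

In a page this would be called as setImageSource(new Image(), maybeUrl); the return value lets the caller decide whether to show a placeholder or log the missing URL.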
For browsers other than IE, it’s not too difficult to detect what’s going on from the server side. Since the request comes back to the exact same location that contains the offending code, there are two things you can do. First, you can check the request’s referrer. A request resulting from this issue coming from http://www.example.com/dir/mypage.htm will have a referrer of http://www.example.com/dir/mypage.htm. Assuming that there are no valid situations under which your page links to itself, this is a fairly safe way to detect these requests on the server-side.
Internet Explorer throws a wrench into the works by sending the request to the directory of the page instead of the page itself. If you’re only using path URLs (i.e., nothing with a file extension), then the effect is the same and you can use the same referrer check. Some sample code for use with PHP:
<?php
//Works for IE only when using path URLs and not file URLs
//get the referrer
$referrer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
//current URL (assuming HTTP and default port)
$url = "http://" . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI'];
//make sure they're not the same
if ($referrer == $url){
exit;
}
?>
The goal here is to detect that the page refers to itself and then exit immediately to prevent the server from doing anything additional. Another option, and probably a good idea, is to log that this has happened so it shows up on a dashboard for evaluation.
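The logging idea can be folded into the same check. Here is a sketch of that in JavaScript for a Node.js-style server; isSelfReferral and the log callback are hypothetical names, and like the PHP above it assumes plain HTTP on the default port.

```javascript
// Sketch of the self-referral check with optional logging.
// headers: an object with lowercased header names (as Node.js provides).
// requestPath: the path and query string of the current request.
// log: optional callback that receives a message when a match is found.
function isSelfReferral(headers, requestPath, log) {
  const referrer = headers["referer"] || "";
  // Assumes HTTP and the default port, as in the PHP example.
  const currentUrl = "http://" + (headers["host"] || "") + requestPath;
  const selfReferral = referrer !== "" && referrer === currentUrl;
  if (selfReferral && typeof log === "function") {
    log("Self-referring request: " + currentUrl);
  }
  return selfReferral;
}
```

A request handler could call isSelfReferral(req.headers, req.url, console.warn) first and end the response immediately when it returns true, before any expensive page generation starts.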
Another way to attempt to detect this type of request on the server is by looking at the HTTP Accept header. All browsers except IE send different HTTP Accept headers for image requests than they do for HTML requests. As an example, Chrome sends the following Accept header for an HTML request:
Accept: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Compare this to the Accept header that is sent for an image, script, or style sheet request:
Accept: */*
Firefox, Safari, and Opera all send roughly the same Accept header for HTML requests, meaning that you can check for an individual part, such as “text/html”, to determine if the request is an HTML request or something else. Unfortunately, IE only sends the latter Accept header for all requests, so there is no way to differentiate this on the server. For browsers other than IE, you can use something like the following:
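<?php
//Warning: Doesn't work for IE!
//get the Accept header (it may be missing entirely)
$accept = isset($_SERVER['HTTP_ACCEPT']) ? $_SERVER['HTTP_ACCEPT'] : '';
//make sure the Accept header has 'text/html' in it
if (strpos($accept, 'text/html') === false){
    exit;
}
?>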
This check is a little safer than the previous, but its big downside is that it doesn’t work in IE.
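A slightly stricter variant of the Accept check parses the header into media types instead of doing a substring search. This is a sketch in JavaScript with a hypothetical function name, acceptsHtml; it shares the limitation above and cannot distinguish request types in IE.

```javascript
// Hypothetical helper: returns true when the Accept header lists text/html
// as one of its media types. Parameters such as ";q=0.9" are ignored.
function acceptsHtml(acceptHeader) {
  if (typeof acceptHeader !== "string") {
    return false; // a missing header is treated as "not an HTML request"
  }
  return acceptHeader
    .split(",")
    .map(function (part) {
      return part.split(";")[0].trim().toLowerCase();
    })
    .includes("text/html");
}
```

A request for which acceptsHtml returns false can be ended early in the same way the PHP snippet calls exit.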
Why does this happen?
The real problem is the way that URI resolution is performed in browsers. This behavior is defined in RFC 3986 - Uniform Resource Identifiers. When an empty string is encountered as a URI, it’s considered a relative URI and is resolved according to the algorithm defined in section 5.2. This specific example, an empty string, is listed in section 5.4. Firefox, Safari, and Chrome are all resolving an empty string correctly per the specification, while Internet Explorer is resolving it incorrectly, apparently in line with an earlier version of the specification, RFC 2396 - Uniform Resource Identifiers (this was obsoleted by RFC 3986). So technically, the browsers are doing what they’re supposed to do to resolve relative URIs. The problem is that in this context, the empty string is clearly unintentional.
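The resolution step can be reproduced with the standard URL class found in modern browsers and Node.js. The sketch below only illustrates the two outcomes described in this post; using "." to produce the directory URL is an illustration of the result IE ends up with, not a claim about how IE computes it.

```javascript
// The page containing the empty src, taken from the example earlier in the post.
const pageUrl = "http://www.example.com/dir/mypage.htm";

// An empty reference resolved against the page URL yields the page itself,
// which is what Safari, Chrome, and Firefox 3 request.
const resolvedEmpty = new URL("", pageUrl).href;

// The directory containing the page, which is what IE requests.
const pageDirectory = new URL(".", pageUrl).href;

console.log(resolvedEmpty);  // http://www.example.com/dir/mypage.htm
console.log(pageDirectory);  // http://www.example.com/dir/
```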
It’s time to fix this
This is a serious flaw in browsers, and I’m not sure you can look at it in any way where it’s not considered a bug. The inconsistent behavior, from Opera completely ignoring all invalid external references, to IE falling victim only for <img> tags while others do the same for <script> and <link> as well, seems to indicate a bug in browsers. Though browsers seem to be following correct URI resolution (except IE), I think this is a case where common sense must win over the letter of the specification. There is no way that an image can possibly render an HTML page, and the same goes for <script> and <link>. This bug has cost web developers hundreds of lost hours and has potentially brought down sites, pushing servers over capacity. Enough is enough. It’s time for the browser vendors to fix this bug. I’ve taken the liberty of filing or locating bugs:
- Firefox: Bug 531327
- WebKit (Safari/Chrome): Bug 30303
Please show support for fixing these bugs, as I don’t see any reason why we should still be dealing with this browser behavior. And if anyone can get the note to Microsoft so they can address IE, we’d all greatly appreciate it.
HTML5 to the rescue
HTML5 adds to the description of the <img> tag’s src attribute to instruct browsers not to make an additional request in section 4.8.2:
The src attribute must be present, and must contain a valid URL referencing a non-interactive, optionally animated, image resource that is neither paged nor scripted. If the base URI of the element is the same as the document’s address, then the src attribute’s value must not be the empty string.
Hopefully, browsers won’t have this problem in the future. Unfortunately, there is no such clause for <script src=""> and <link href="">. Maybe there’s still time to make that adjustment to ensure browsers don’t accidentally implement this behavior.
Update (2 Dec 2009): It appears that <img src=""> has been patched in Firefox 3.5 (bug 444931). Problems with <script src=""> and <link href=""> still remain. Also, added a reference to the HTML5 section that aims to help this issue.
Disclaimer: Any viewpoints and opinions expressed in this article are those of Nicholas C. Zakas and do not, in any way, reflect those of Yahoo!, Wrox Publishing, O'Reilly Publishing, or anyone else. I speak only for myself, not for them.

23 Comments
It has been shown browser vendors can implement specifications.
So maybe it’s not so much a bug in the browsers, maybe it should be fixed in the RFC, the browsers will follow ?
Lennie on November 30th, 2009 at 4:24 pm
Interesting read and something one can easily fall victim to via use of third party data/ web services or indeed data provided Internally by other co workers/ departments who perhaps work further down the line and ship data to the front end. Will check this out.
Daithi44 on November 30th, 2009 at 4:32 pm
[...] Nicholas C. Zakas describes the details on his site. The problem lies mainly with scripts and templates, where it can quickly happen [...]
Ein leeres Bild verursacht Traffic [Javascript ist Toll!] on November 30th, 2009 at 4:57 pm
Would be nice if YSlow could detect these kind of occurrences. It is a performance issue after all
kangax on November 30th, 2009 at 5:48 pm
Here’s a bit of CSS to help you find those empty img tags:
img[src=""] {
border: 1px dotted red !important;
}
(Works in Firefox 3+, which most webdevs are already using!)
richtaur on November 30th, 2009 at 8:11 pm
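Problems caused by an empty image src attribute…
Today I read an article by Zakas, “Empty image src can destroy your site”, and it reminded me of a problem I ran into on an earlier project. At the time there was a strange bug: whenever the page was refreshed, a value on the page would increase, but in fact…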
Dreamer's Blog on November 30th, 2009 at 10:45 pm
Ran into this a few weeks back using a plugin to resolve a relative URL to an absolute one:
http://james.padolsey.com/javascript/getting-a-fully-qualified-url/
Turned out to be a killer, especially when using the Wicket framework. The symptom being that the Java constructor was called for each img created with a src attribute (obviously not ideal), corrupting server side state.
Phantom requests are definitely a major issue.
Zach Leatherman on December 1st, 2009 at 1:45 am
This also happens with CSS background-image:url();
Matthias on December 1st, 2009 at 3:39 am
[...] C. Zakas has published an article showing that every browser interprets web standards as it pleases. In this case he gives us a [...]
El src de una imágen puede cargarse tu página | aNieto2K on December 1st, 2009 at 7:12 am
[...] checks can be performed. For example, you could crawl a site looking for empty image sources (why you’d want to do this), or perhaps to look for unclosed tags, or instances of inline JavaScript or CSS. You could do a [...]
Introducing “SiteTraverser” – James Padolsey on December 2nd, 2009 at 1:40 pm
Hi Nicholas,
I completely agree with your explanation that browser behaviour should change on this.
Just for your information i have spotted this in a well known company (CTS) website
http://www.cognizant.com/html/home.asp
Viewed the source of the page and could see <img id="logoprintImg" src=""……
Nice catch !! Keep posting these kind of findings, we shall atleast rectify them
Cheers, Karthik
Karthik Reddy Chintaparthi on December 4th, 2009 at 12:52 pm
Thanks for posting this and raising awareness for developers. Issues like this are very good ones for browser vendors to take note of (and hopefully fix). I ran into this a few years ago and spent days banging my head against the desk to figure out why every request to an ecomm page was causing 3 updates to the shopping cart. Turns out the testers weren’t filling in all the fields for the dummy product data. Now I know what to look for in these instances. For someone stumbling across these types of bugs for the first time, its a head->desk->repeat (lots) moment as it usually appears in the form of really random and often costly quirks (ala multiple updates to carts etc).
Bart on December 7th, 2009 at 6:36 pm
Same issue is also caused due to use of BASE meta tag for example will cause extra hits on home page. Base meta tag is often used for relative images downloads. CRE Loaded and OSCommerce based sites use this by default.
Wasim Asif on December 16th, 2009 at 5:54 am
[...] a previous post, I discussed the problem with setting an HTML image’s src attribute to an empty string. In [...]
Protect IE from empty img src () | NCZOnline on December 22nd, 2009 at 9:01 am
[...] last year, after spending 10 days tracking down a horrific bug, I posted, Empty image src can destroy your site. The post laid out a problem present in almost all modern browsers regarding empty string URLs in [...]
Empty-string URLs in HTML - A followup | NCZOnline on March 16th, 2010 at 9:00 am
No, you were right the first time - it is bad developers writing crappy code.
Michael Wales on March 17th, 2010 at 5:47 pm
[...] I came across an article about the problem of empty src attributes. It concerns the sending of an additional request to the server. [...]
Niebezpieczeństwo pustych atrybutów src | Frontend.pl on April 7th, 2010 at 3:11 pm
[...] C. Zakas has written an excellent article that describes the problem really [...]
Hvem glor – din bror » Sådan laver du hurtigere websider on April 26th, 2010 at 3:55 pm
So when i use an images with the src attribute to the subdomain name instead of the main domain name, was the the traffic increased problem on the main homepage happen again? Or i should completely move website to html5?
adsl viettel on May 1st, 2010 at 11:37 am
Thanks a lot for the post. It’s really helpful for me because we found the error just recently - it cost us a lot of time :-(.. Just one question: how could it vary by configuration? We had no problems on our test servers and localhost but faced this problem on production servers. Does it mean that this behaviour depends on IIS as well?
Val on May 19th, 2010 at 10:02 pm
[...] Yahoo!’s JavaScript guru Nicolas C. Zakas. For more information check out his article “Empty image src can destroy your site“. Did you enjoy this article? If yes, then subscribe to my RSS [...]
Best Practices for Speeding Up Your Web Site - Quicsolv Blog on June 21st, 2010 at 2:58 am
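[...] First is the problem of loading an empty string: if an img’s src is set to an empty string, you may get unexpected results. For example, an <img src=""> inside http://xxx/test.htm leads to the following: IE makes a request to the relative address, i.e. http://xxx/; Safari/Chrome make a request to the current page’s address, i.e. http://xxx/test.htm; Opera/Firefox make no request. For details, see Nicholas C. Zakas’s “Empty image src can destroy your site”. If you don’t want to load an image, you shouldn’t set src to an empty value, because a request may still be sent, wasting resources. Instead, as the program does, simply remove the attribute with removeAttribute. [...]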
【原创】ImagesLazyLoad 图片延迟加载效果 - Web开发常见问题 - 原创 ImagesLazyLoad 图片延迟加载效果 Web 开发 JavaScript 原创 ImagesLazyLoad 图片延迟加载效果 Web 开发 JavaScript - 123Doing on June 21st, 2010 at 11:52 pm
[...] My guess is that the markers rendered on the page are rendered as img controls but their src is set to blank. You can read more about what happens when your control is having the src value as empty here: http://www.nczonline.net/blog/2009/11/30/empty-image-src-can-destroy-your-site/ [...]
Missing IMG src | The Largest Forum Archive on June 24th, 2010 at 4:02 pm