
error on bad <base> url with form

With the current CVS, viewing foo.html below gives an error:

    M-x w3m-find-file /tmp/foo.html
    w3m-expand-url: BASE must have a scheme part: /noschema/onthis/

This is a cut-down version of what archive.org pages are currently doing, e.g.


The <base> href has no scheme, which is probably slightly contrary to
the HTML specs.  Could w3m-check-header-tags tolerate it, with something
like the patch below?

The backtrace shows that the form setup code fails when
w3m-current-base-url is relative.  In foo.html and on archive.org the
action URL being expanded is actually already absolute, so
w3m-current-base-url has no effect there.  Even so, it seems unwise to
store a relative URL in w3m-current-base-url when other code might also
expect it to be absolute.
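A minimal sketch of the idea outside w3m: resolve a scheme-less <base>
href against the URL of the page containing it, so that only an
absolute URL is ever stored as the base.  It uses Emacs's built-in
url-expand-file-name (from the url package) rather than w3m-expand-url,
and the function name my-w3m-absolutize-base is hypothetical, for
illustration only; the patch below does the equivalent inside
w3m-check-header-tags.

```elisp
;;; Sketch only: `my-w3m-absolutize-base' is a hypothetical helper,
;;; not part of w3m.
(require 'url-expand)   ; provides `url-expand-file-name'

(defun my-w3m-absolutize-base (href page-url)
  "Return HREF from a <base> tag as an absolute URL, or nil if empty.
A relative HREF such as \"/noschema/onthis/\" is resolved against
PAGE-URL; an HREF that already has a scheme is returned unchanged."
  (when (< 0 (length href))
    (url-expand-file-name href page-url)))

;; Usage:
;; (my-w3m-absolutize-base "/noschema/onthis/" "http://example.com/a/b.html")
;;   => "http://example.com/noschema/onthis/"
;; (my-w3m-absolutize-base "http://example.com/" "http://other.org/")
;;   => "http://example.com/"
```

Resolving at the point where the base is recorded, rather than in each
consumer such as the form code, keeps the "base is always absolute"
assumption true for every caller.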

2015-09-29  Kevin Ryde  <user42_kevin@xxxxxxxxxxxx>

	* w3m.el (w3m-check-header-tags): w3m-expand-url <base> to tolerate a
	non-absolute there, as for example from archive.org.  (If
	w3m-current-base-url is relative then form setups error out.)


Debugger entered--Lisp error: (error "BASE must have a scheme part: /noschema/onthis/")
  signal(error ("BASE must have a scheme part: /noschema/onthis/"))
  error("BASE must have a scheme part: %s" "/noschema/onthis/")
  w3m-expand-url("http://example.com/" "/noschema/onthis/")
  w3m-form-normalize-action("http://example.com/" "/noschema/onthis/")
  w3m-create-text-page("file:///tmp/foo.html" "text/html" nil #<buffer *w3m*>)
  w3m-create-page("file:///tmp/foo.html" "text/html" nil #<buffer *w3m*>)
  w3m-retrieve-and-render ...
  w3m-goto-url("file:///tmp/foo.html" nil nil nil nil nil nil nil t)

--- w3m.el.~1.1640.~	2015-06-09 16:50:26.578341932 +1000
+++ w3m.el	2015-09-29 19:23:00.227913283 +1000
@@ -6079,10 +6079,13 @@
 		  (setq w3m-previous-url href))
 		 ((member "start" rel) (setq w3m-start-url href))
 		 ((member "contents" rel) (setq w3m-contents-url href))))))
+           ;; <base> ought to be absolute, but if it is not, absolutize it
+           ;; for w3m-current-base-url.  Helps with a bad <base href="/foo/bar/">
+           ;; seen from archive.org circa 2015.
 	   ((string= tag "base")
 	    (w3m-parse-attributes (href)
 	      (when (< 0 (length href))
-		(setq w3m-current-base-url href))))))))))
+		(setq w3m-current-base-url (w3m-expand-url href)))))))))))
 (defun w3m-check-refresh-attribute ()
   "Get REFRESH attribute in META tags."