
[no subject]



This bug report will be sent to the emacs-w3m development team,
 not to your local site managers!!
Please write in simple English, because the emacs-w3m developers
aren't good at English reading. ;-)

Please describe as succinctly as possible:
	- What happened.
	- What you thought should have happened.
	- Precisely what you were doing at the time.

Please also include any Lisp back-traces that you may have.
================================================================
Dear Bug Team!

As Katsumi Yamaoka requested, I have provided more information for the
emacs-w3m bug I reported last week. I have provided the stack
backtrace and the report-emacs-w3m-bug info.  I am sorry that I did
not include these in the original bug report.

I emphasize: I have narrowed the bug down to emacs-w3m's handling of
lines that look like

<input type="hidden" name="text" value="" />

where the value string is larger than 69K bytes.
It is possible that not every such value triggers the problem;
the only thing I am sure of is that the web page
 http://twiki.org/cgi-bin/edit/Sandbox/EmacsW3mBugDemo?t=1102963190
does contain such a large single-line input element.

Forget blogging: that was a red herring.  
The bug seems to be in the handling of very large single lines,
such as those automatically produced by the wiki.





================================================================
Debugger entered--Lisp error: (error "Stack overflow in regexp matcher")
  re-search-forward("<\\(form_int\\|map\\|img_alt\\|input_alt\\|/input_alt\\)\\([ 	
\f\n]+[^>]*\\)?/?>" nil t)
  w3m-form-parse-and-fontify(nil)
  w3m-fontify-forms()
  w3m-fontify()
  w3m-create-text-page("http://twiki.org/cgi-bin/save/Sandbox/EmacsW3mBugDemo" "text/html" "ISO-8859-1" #<buffer *w3m*>)
  w3m-create-page("http://twiki.org/cgi-bin/save/Sandbox/EmacsW3mBugDemo" "text/html" "ISO-8859-1" #<buffer *w3m*>)
  #[(G90087 G90088 G90089 G90090 G90091 G90092 G90093 G90094 type) "ÆJ!ƒ [...]

[...] this wiki page.
    Buffer menu line
    . % *w3m*       85297  w3m      TWiki . Glew . GlewBlog

Step 2 - OK: When I start trying to edit this wiki page, I am given a web
page with a 1793-line text area. Well, actually, only 16 lines
of the textarea are displayed, but I assume the others are there.

    Buffer menu line
    . % *w3m*        2616  w3m      TWiki . Glew . GlewBlog (edit)

Step 3 - OK: When I hit return in the textarea, I correctly enter the
emacs-w3m textarea editing buffer.

    Buffer menu lines
    .*  *w3m form textarea*    82925 w3m form textarea 
      % *w3m*        2616  w3m      TWiki . Glew . GlewBlog (edit)


Step 4 - OK: I can save the page from emacs-w3m via C-c C-c.  This
puts me back into the wiki page with the 1793-line text area.

    Buffer menu line:
    . % *w3m*        2616  w3m      TWiki . Glew . GlewBlog (edit)


Step 5 - Problem:  When I press the preview button on the wiki page of
step 4, I end up with an error.

    Error message in minibuffer (extracted from *messages* buffer):

   error in process sentinel: w3m-form-parse-and-fontify: Stack overflow in regexp matcher
   error in process sentinel: Stack overflow in regexp matcher

    Buffer menu line:
    .*% *w3m*      183853  w3m      TWiki . Glew . GlewBlog (edit)

    The *w3m* buffer contains partially formatted text that looks like

       |<base href="https://dpg-or.pdx.intel.com/Tools/TWiki/bin/view/Glew/GlewBlog"><pre_int><img_alt src="/Tools/TWiki/pub/wikiHome.gif" hseq="1" title="TWiki Home">TWiki Ho</img_alt></pre_int>                                        <_SYMBOL TYPE=32>•</_SYMBOL> To save changes: Press the [Save Changes] button.      
       |         TWiki . Glew . GlewBlog (preview)      <_SYMBOL TYPE=32>•</_SYMBOL> To make more changes: Go back in your browser.         
       |                                                <_SYMBOL TYPE=32>•</_SYMBOL> To cancel: Go back twice.                              
       |                                                                                                         
       |         Note: This is a preview. Do not forget to save your changes.                                    
       |
       |
       |Glew's pseudo-wiki-blog
       |
       |GlewBlogTOC - Table of Contents
       |
       |Friday December 10, 2004

    (The pipe symbol | indicates beginning of line, and is not in the buffer.)

    I.e. the buffer appears to be partially formatted.

    The w3m process is still running - at least, if I type "g" to go
    to a new page I get told that I cannot start the asynchronous
    process twice.

   Cannot run two w3m processes simultaneously (Type `C-c C-k' to stop asynchronous process)

---++ Reproducing the bug for you

Unfortunately, my blog is on an Intel internal web site.  Not only can
outsiders not access it, but my blog may contain stuff that I would
get in trouble for releasing to the outside world.

I'm recording the above, as I first encountered the bug, because it's
better than nothing.  I understand that it would be nicer to provide
you with an example where you can reproduce the bug, and I will be
attempting to do so.

(However, I am not one of those who say that a non-reproducible bug
report is worthless. Reproducible bug reports are best, but sometimes
reproducing a bug is hard. Sometimes a good description will allow
somebody more expert than the bug reporter to locate, reproduce,
and/or fix the bug.)

...

I have reproduced the bug at http://www.twiki.org.

Specifically, the page
http://twiki.org/cgi-bin/view/Sandbox/SandBoxW3m
holds the bug description (a version of this email),
http://twiki.org/cgi-bin/view/Sandbox/EmacsW3mBugReport
and a page that demonstrates the bug,
http://twiki.org/cgi-bin/view/Sandbox/EmacsW3mBugDemo

The page that demonstrates the bug can be edited,
but the edited page cannot be previewed.

...

I was able to edit and preview the demo page using
w3m, but not using emacs-w3m. From w3m I was
able to save the HTML.

w3m-find-file on the saved HTML reproduced the bug.
Unfortunately, the HTML that causes the bug is too large
to attach to this wiki or to mail (as you will see below).

Binary search revealed that the problem was a hidden input
element.

<verbatim>
   <input type="hidden" name="text" value="TBD deleted large text" />
</verbatim>

This element contained the entire text of the page,
converted to a single line.
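
Since the offending HTML is too large to attach, a small generator may be more practical. The following is a minimal Python sketch, assuming that any hidden input with a sufficiently long single-line value triggers the overflow (this report only confirms it for the TWiki page). The function name `make_demo_html` and the filler text are hypothetical.

```python
import html


def make_demo_html(value_bytes: int) -> str:
    """Return a small HTML page whose hidden input value is value_bytes long.

    Assumption: plain ASCII filler is enough to trigger the problem; the
    real TWiki page held the whole page text on one line instead.
    """
    # Words of 79 characters plus a space, repeated until long enough.
    filler = ("x" * 79 + " ") * (value_bytes // 80 + 1)
    # Escape in case the filler is ever changed to contain quotes or '&'.
    value = html.escape(filler[:value_bytes], quote=True)
    return (
        "<html><body><form action=\"#\">\n"
        f"<input type=\"hidden\" name=\"text\" value=\"{value}\" />\n"
        "</form></body></html>\n"
    )
```

Writing `make_demo_html(80000)` to a file gives a value comfortably above the roughly 69K bytes reported here; the file could then be opened with w3m-find-file to check whether the error appears.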

Binary search revealed that the stack overflow occurred between a line
length of 68K and 69K bytes.
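
The binary search over line length can be sketched in Python as follows. This is an illustration only: `find_threshold` and its `fails` argument are hypothetical names, and the sketch assumes the failure is monotonic, i.e. once a given length overflows, every longer length does too.

```python
from typing import Callable


def find_threshold(fails: Callable[[int], bool], lo: int, hi: int) -> int:
    """Return the smallest n in (lo, hi] for which fails(n) is true.

    Assumes fails(lo) is False, fails(hi) is True, and that failure is
    monotonic in n.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid  # overflow at mid: the threshold is at or below mid
        else:
            lo = mid  # no overflow at mid: the threshold is above mid
    return hi
```

Here `fails(n)` stands for "a hidden input value of n bytes produces the stack overflow in emacs-w3m"; how that check is carried out is left open. Each call halves the remaining range, so a range of 100000 bytes needs about 17 checks.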


I.e. apparently the regexps used by emacs-w3m cause a stack overflow
for line lengths of approximately 69K bytes.


COMMENT: that's a pretty long line!  However, I anticipate and
disagree with statements such as "nobody in their right mind would
create such a long line". TWiki apparently does. It is automatically
generated code.

Also note that an HTML document need have no newlines
at all. Line length limited regexp parsing is dangerous.

I have not yet investigated what might be done to emacs-w3m to avoid
this problem.  Usually what needs to be done in such cases is to
replace a single powerful regexp with simpler regexps. Possibly also
to skip long lines.
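
To illustrate the "simpler regexps" idea, here is a rough Python sketch (not emacs-w3m code, and not a proposed patch). It matches only the start of a tag with a small regexp, then finds the closing ">" with a plain string search, so no pattern ever has to span the long attribute text. The tag names are taken from the regexp in the backtrace above; `find_tags` is a hypothetical name.

```python
import re

# Matches only "<name" for the tags of interest; the lookahead requires
# whitespace, "/" or ">" next, so longer tag names do not match.
TAG_START = re.compile(r"<(form_int|map|img_alt|input_alt|/input_alt)(?=[\s/>])")


def find_tags(text: str):
    """Yield (name, attribute_text) for each matching tag in text."""
    pos = 0
    while True:
        m = TAG_START.search(text, pos)
        if m is None:
            return
        end = text.find(">", m.end())  # plain linear scan for the tag end
        if end == -1:
            return  # unterminated tag: stop scanning
        attrs = text[m.end():end]
        if attrs.endswith("/"):
            attrs = attrs[:-1]  # drop the "/" of a self-closing tag
        yield m.group(1), attrs.strip()
        pos = end + 1
```

Like the original regexp, this stops at the first ">" after the tag name, so it behaves the same way for attribute values that contain ">". Whether an equivalent two-step search in Emacs Lisp avoids the overflow would still need to be checked.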