Hidden hyperlinks
- From: Boruch Baum <boruch_baum@xxxxxxx>
- Date: Wed, 6 May 2020 14:28:48 -0400
- X-ml-name: emacs-w3m
- X-mail-count: 13623

Off-list, I've been discussing an idea with Andrés Ramírez that I think
is worth summarizing on-list, because it might be a useful feature to
pursue if there is more interest. The idea would make emacs-w3m much more
useful for sites that hide features within html <script> elements (and
maybe within other elements), without needing to run any of the scripts.

Such sites embed useful raw data in those elements. From what I can tell,
emacs-w3m can access that data using the function w3m-view-source, but it
can't associate the data with a specific region of the rendered buffer
without duplicating the work already being done when processing the
'w3m -halfdump' output.

Andrés pointed me to an example page[1]; here is some of the data of
interest:
8< ------------------------------------------------------------------ >8
_firtPlayers['3033'] = {
id: 3033,
url: 'https://rpp.pe/audio/podcast/larecetadegastonacurio/tacacho-con-cecina-las-raices-de-la-selva-en-la-mesa-de-tu-hogar-por-gaston-acurio-3033',
cover: 'https://md1.rpp-noticias.io/256x256/2018/09/10/6752021400x1400-gastonjpg.jpg',
title: 'Tacacho con cecina: las raíces de la Selva en la mesa de tu hogar por Gastón Acurio',
media: 'https://gruporpp.mc.tritondigital.com/LARECETADEGASTONACURIO_P/media/podcast%2F2020%2F03%2F24%2F402-tacacho-con-cecina-podcast.mp3'
}
8< ------------------------------------------------------------------ >8

Andrés would like to have easy access to the hidden urls (and possibly
other data?) at a particular position on the page.
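
Pulling the values out of the script text is the easy part. As a rough
illustration (in Python rather than elisp, and only a sketch), the
following extracts the fields of every entry shaped like the excerpt
above. It assumes each entry looks exactly like
_firtPlayers['ID'] = { key: 'value', ... } with single-quoted values,
no escaped quotes and no nested braces; the name extract_players is
hypothetical.

```python
import re

# One `_firtPlayers['ID'] = { ... }` assignment, as in the excerpt above.
# Assumption: the object body contains no nested braces.
ENTRY_RE = re.compile(r"_firtPlayers\['(\d+)'\]\s*=\s*\{(.*?)\}", re.S)

# One `key: 'value'` pair. Assumption: values are single-quoted and contain
# no escaped quotes; unquoted values such as `id: 3033` are skipped.
FIELD_RE = re.compile(r"(\w+)\s*:\s*'([^']*)'")


def extract_players(source):
    """Return {id: {field: value}} for every _firtPlayers entry in SOURCE."""
    players = {}
    for entry_id, body in ENTRY_RE.findall(source):
        players[entry_id] = dict(FIELD_RE.findall(body))
    return players


if __name__ == "__main__":
    sample = """
_firtPlayers['3033'] = {
id: 3033,
url: 'https://rpp.pe/audio/podcast/larecetadegastonacurio/tacacho-con-cecina-las-raices-de-la-selva-en-la-mesa-de-tu-hogar-por-gaston-acurio-3033',
media: 'https://gruporpp.mc.tritondigital.com/LARECETADEGASTONACURIO_P/media/podcast%2F2020%2F03%2F24%2F402-tacacho-con-cecina-podcast.mp3'
}
"""
    # Print each entry's id and its media url.
    for player_id, fields in extract_players(sample).items():
        print(player_id, fields.get("media"))
```

The hard part is not the extraction but knowing where on the rendered
page each entry belongs.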

From what I can tell, the current rendering process, which is based on
'w3m -halfdump', doesn't expose this information at all, so one question
is: does w3m have another undocumented feature that could help here?

A second question: is the only alternative to splice the data into the
correct position of the rendered page, using the page source obtained
with the function w3m-view-source? That could be quite CPU-heavy for a
page with many anchors. One would need to compare the source to the
rendered page incrementally, anchor-by-anchor and text-to-text, to find
where each script element is positioned. Then, at each such position,
one would create a new text property for each hidden data element and
offer a command or an optional setting to make those elements visible.
Is there a simpler way? Any ideas?
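
For what it's worth, one way to keep the cost of that comparison down is
to do it in a single forward pass. Below is a rough Python sketch of the
idea (emacs-w3m would of course do this in elisp against the buffer
text). It walks the page source with Python's html.parser, remembers the
last few visible words before each <script> element, then searches for
those words in the rendered text, resuming each search where the
previous one ended. The names ScriptLocator, locate_scripts and
HIDDEN_TAGS, and the five-word context window, are my own assumptions. It
ignores real complications such as text that w3m reflows into tables or
list markers, or words split across inline tags, so treat it as an
illustration, not a design.

```python
import re
from html.parser import HTMLParser

# Elements whose text is assumed not to appear in the rendered page body.
HIDDEN_TAGS = {"style", "title"}


class ScriptLocator(HTMLParser):
    """Record each <script> body together with the visible words before it."""

    def __init__(self, context_words=5):
        super().__init__()
        self.context_words = context_words
        self.words = []             # visible words seen so far, in source order
        self.scripts = []           # list of (preceding_words, script_text)
        self._hidden_depth = 0      # nesting depth inside HIDDEN_TAGS
        self._script_chunks = None  # text of the <script> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._script_chunks = []
        elif tag in HIDDEN_TAGS:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag == "script" and self._script_chunks is not None:
            start = max(0, len(self.words) - self.context_words)
            self.scripts.append((self.words[start:], "".join(self._script_chunks)))
            self._script_chunks = None
        elif tag in HIDDEN_TAGS and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if self._script_chunks is not None:
            self._script_chunks.append(data)
        elif self._hidden_depth == 0:
            self.words.extend(data.split())


def locate_scripts(source, rendered, context_words=5):
    """Map each <script> in SOURCE to an offset in the RENDERED text.

    The offset is just after the visible words that precede the script,
    or None if those words were not found. Each search resumes where the
    previous successful one ended, so RENDERED is scanned left to right.
    """
    parser = ScriptLocator(context_words)
    parser.feed(source)
    parser.close()
    results = []
    pos = 0
    for words, script in parser.scripts:
        if not words:
            # No visible text precedes this script yet.
            results.append((pos, script))
            continue
        # Allow any run of whitespace (spaces, line breaks) between words.
        pattern = re.compile(r"\s+".join(re.escape(w) for w in words))
        match = pattern.search(rendered, pos)
        if match:
            pos = match.end()
            results.append((pos, script))
        else:
            results.append((None, script))
    return results
```

In emacs-w3m the resulting offsets would become buffer positions where a
text property holding the script text could be set (e.g. with
put-text-property); the property name and the command to reveal it are
left open here.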
refs:
[1] https://rpp.pe/audio/podcast/larecetadegastonacurio/tacacho-con-cecina-las-raices-de-la-selva-en-la-mesa-de-tu-hogar-por-gaston-acurio-3033
--
hkp://keys.gnupg.net
CA45 09B5 5351 7C11 A9D1 7286 0036 9E45 1595 8BC0