
Hidden hyperlinks

Off-list, I've been discussing an idea with Andrés Ramírez that I thought
was worth summarizing on-list, because it might be a useful feature to
pursue if there is more interest. The idea would make emacs-w3m much more
useful for sites that hide features within HTML <script> elements (and
maybe within other elements), without needing to run any of the scripts.

Some sites hide useful raw data within HTML <script> elements, and maybe
within other elements too. From what I can tell, emacs-w3m can access that
information using the function w3m-view-source, but it can't associate the
data with a specific region of the rendered buffer without duplicating the
work already being done in processing the 'w3m -halfdump' output.

Andrés pointed me to an example page[1]; here is some of the data of
interest:

8< ------------------------------------------------------------------ >8

_firtPlayers['3033'] = {
        id: 3033,
        url: 'https://rpp.pe/audio/podcast/larecetadegastonacurio/tacacho-con-cecina-las-raices-de-la-selva-en-la-mesa-de-tu-hogar-por-gaston-acurio-3033',
        cover: 'https://md1.rpp-noticias.io/256x256/2018/09/10/6752021400x1400-gastonjpg.jpg',
        title: 'Tacacho con cecina: las ra&iacute;ces de la Selva en la mesa de tu hogar por Gast&oacute;n Acurio',
        media: 'https://gruporpp.mc.tritondigital.com/LARECETADEGASTONACURIO_P/media/podcast%2F2020%2F03%2F24%2F402-tacacho-con-cecina-podcast.mp3'
};
8< ------------------------------------------------------------------ >8

Andrés would like to have easy access to the hidden URLs (and possibly
other data) at a particular position on the page.
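Pulling the fields out of such a script does not require running it. Below
is a minimal sketch in Python (not Emacs Lisp, just for illustration),
assuming the script text has the shape of the excerpt above: single-quoted
string values or bare integers, with no quotes or closing braces inside
values. The regular expressions and the name extract_players are
hypothetical, not part of emacs-w3m; html.unescape decodes entities such as
&iacute; in the titles.

```python
import html
import re

# Hypothetical patterns for the assignment shape in the excerpt:
#   _firtPlayers['3033'] = { id: 3033, url: '...', ... };
# Assumes no value contains a single quote or a closing brace.
PLAYER_RE = re.compile(r"_firtPlayers\['(\d+)'\]\s*=\s*\{(.*?)\}", re.S)
FIELD_RE = re.compile(r"(\w+):\s*(?:'([^']*)'|(\d+))")


def extract_players(script_text):
    """Return {player_id: {field: value}} parsed from raw script text."""
    players = {}
    for player in PLAYER_RE.finditer(script_text):
        fields = {}
        for field in FIELD_RE.finditer(player.group(2)):
            key, text, number = field.groups()
            if text is not None:
                # Quoted values may carry HTML entities such as &iacute;
                fields[key] = html.unescape(text)
            else:
                fields[key] = int(number)
        players[player.group(1)] = fields
    return players


if __name__ == "__main__":
    sample = """_firtPlayers['3033'] = {
        id: 3033,
        title: 'Tacacho con cecina: las ra&iacute;ces de la Selva en la mesa de tu hogar por Gast&oacute;n Acurio',
        media: 'https://gruporpp.mc.tritondigital.com/LARECETADEGASTONACURIO_P/media/podcast%2F2020%2F03%2F24%2F402-tacacho-con-cecina-podcast.mp3'
    };"""
    for player_id, fields in extract_players(sample).items():
        print(player_id, fields["media"])
```

Because each quoted value is consumed whole, colons inside titles or URLs
are not mistaken for keys; anything outside the assumed shape (nested
objects, variables as values) would need a real JavaScript tokenizer.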

From what I can tell, the current rendering process, based upon 'w3m
-halfdump', doesn't reveal the information at all, so one question is: is
there another undocumented feature of w3m that could be useful for this?

A second question: is the only alternative to splice the data into the
correct position of the rendered page, using the source shown by
w3m-view-source? That could be quite CPU-heavy for a page with many
anchors. One would need to compare the source with the rendered page
incrementally, anchor by anchor and text to text, to find where each script
element is positioned. Then, at that position, one would create a new text
property for each hidden data element and offer a command or an optional
setting to make those elements visible.
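To make the incremental comparison a bit more concrete, here is a rough
sketch in Python rather than Emacs Lisp, for illustration only. It assumes
the page source and the rendered text are both available as strings. It
treats text outside <script>/<style> as visible, collapses whitespace, and
places each script just after the last `context` characters (40 by
default, an arbitrary choice) of visible text preceding it. Real w3m output
(link markers, image placeholders, tables, the page title) will differ from
this crude notion of "visible", so real matching would need to be fuzzier.
The names ScriptLocator, normalize_with_map and locate_scripts are
hypothetical, and the offsets are 0-based string indices, not Emacs buffer
positions.

```python
import re
from html.parser import HTMLParser


def normalize_with_map(text):
    """Collapse whitespace runs to single spaces.

    Returns (flat, positions) where positions[i] is the index in `text`
    of the character that flat[i] came from.
    """
    chunks, positions = [], []
    for word in re.finditer(r"\S+", text):
        if chunks:
            chunks.append(" ")
            positions.append(word.start())
        chunks.append(word.group())
        positions.extend(range(word.start(), word.end()))
    return "".join(chunks), positions


class ScriptLocator(HTMLParser):
    """Collect each <script> body with the visible text preceding it.

    Crude stand-in for a renderer: text outside <script>/<style> counts
    as visible and all other markup is ignored.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.visible = []       # visible text chunks seen so far
        self.hidden_tag = None  # "script" or "style" while inside one
        self.body = []          # chunks of the current script body
        self.scripts = []       # (preceding visible text, script body)

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.hidden_tag, self.body = tag, []

    def handle_endtag(self, tag):
        if tag == self.hidden_tag:
            if tag == "script":
                preceding, _ = normalize_with_map(" ".join(self.visible))
                self.scripts.append((preceding, "".join(self.body)))
            self.hidden_tag = None

    def handle_data(self, data):
        if self.hidden_tag:
            self.body.append(data)
        else:
            self.visible.append(data)


def locate_scripts(source_html, rendered_text, context=40):
    """Return [(offset or None, script body)] in document order."""
    parser = ScriptLocator()
    parser.feed(source_html)
    parser.close()
    flat, positions = normalize_with_map(rendered_text)
    results, last_end = [], 0
    for preceding, body in parser.scripts:
        tail = preceding[-context:]
        if not tail:  # script appears before any visible text
            results.append((0, body))
            continue
        # Only accept matches that end at or after the previous match.
        pos = flat.find(tail, max(0, last_end - len(tail)))
        if pos < 0:
            results.append((None, body))
            continue
        last_end = pos + len(tail)
        # Map the end of the match back to an index in rendered_text.
        results.append((positions[last_end - 1] + 1, body))
    return results
```

Searching forward from the previous match, instead of from the top of the
page for every script, is what keeps the comparison incremental; a script
whose preceding text cannot be found is reported with offset None rather
than guessed.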

Is there a simpler way? Any ideas?

[1] https://rpp.pe/audio/podcast/larecetadegastonacurio/tacacho-con-cecina-las-raices-de-la-selva-en-la-mesa-de-tu-hogar-por-gaston-acurio-3033

CA45 09B5 5351 7C11 A9D1  7286 0036 9E45 1595 8BC0