Re: youtube support [CORRECTION]
Apologies. I somehow posted the wrong solution. The correction is
included in-line below.
On 2017-09-19 13:37, Boruch Baum wrote:
> I got it into my head to use emacs-w3m for searching youtube videos, and
> decided to share my results, if it's of interest to the project or to
> other users.
>
> 1] Initial configuration
>
> This part was straightforward:
>
> #+BEGIN_SRC emacs-lisp
> (add-to-list 'w3m-uri-replace-alist
>              '("\\`yt:" w3m-search-uri-replace "youtube"))
>
> (add-to-list 'w3m-search-engine-alist
>              '("youtube" "https://www.youtube.com/search?q=%s"))
> #+END_SRC
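
To illustrate what the "yt:" prefix is meant to do, here is a minimal
sketch that can be evaluated without emacs-w3m loaded. The names
my/yt-search-format and my/yt-expand are hypothetical and exist only
for this example; the real work is done by w3m-search-uri-replace,
whose query escaping may differ from url-hexify-string.

#+BEGIN_SRC emacs-lisp
(require 'url-util)

;; Hypothetical constant: the URL template copied from the configuration.
(defconst my/yt-search-format "https://www.youtube.com/search?q=%s")

;; Hypothetical helper, not part of emacs-w3m.
(defun my/yt-expand (uri)
  "Return a search URL if URI starts with \"yt:\", else URI unchanged."
  (if (string-match "\\`yt:" uri)
      ;; Percent-encode everything after the prefix and fill the template.
      (format my/yt-search-format
              (url-hexify-string (substring uri (match-end 0))))
    uri))
#+END_SRC

With the configuration in place, entering something like "yt:emacs w3m"
at the w3m-goto-url prompt should lead to a search URL of this kind.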
>
>
> 2] Initial result
>
> The initial result was surprisingly satisfying because many of the site
> features don't require javascript. Twenty results appear per page,
> navigable using links at the footer.
>
> 3] Thumbnail correction
>
> YouTube seems to apply some form of server bandwidth optimization that
> limits thumbnail images to the first six results. I suspect
> this is because on javascript-enabled browsers the results appear in
> a single, infinitely-scrolling page.
>
> This turned out to be easy to fix, using a simple emacs-w3m 'filter' (but
> wait for more below):
>
> #+BEGIN_SRC emacs-lisp
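(goto-char (point-min))
(while (search-forward "<img" nil t)
  ;; p1: just after "<img"; p2: just after the tag's closing ">"
  (let ((p1 (match-end 0)) (p2 (search-forward ">" nil t)))
    (goto-char p1)
    ;; only touch tags that carry a data-thumb attribute
    (when (search-forward "data-thumb=" p2 t)
      (goto-char p1)
      ;; drop the existing src, then rename data-thumb to src
      (when (re-search-forward "src=\"[^\"]*\"" p2 t)
        (replace-match "")
        (goto-char p1)
        (re-search-forward "data-thumb=" p2 t)
        (replace-match "src=")))))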
> #+END_SRC
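
For anyone who wants to try the thumbnail rewrite outside of emacs-w3m,
here is a rough sketch that applies the same idea to a string. It is a
variant, not the filter itself: it tracks the end of each tag with a
marker, and it renames data-thumb even when the tag has no src. The
function names and the sample HTML in the usage note are made up for
the example.

#+BEGIN_SRC emacs-lisp
;; Hypothetical helper: works on the current buffer.
(defun my/yt-promote-data-thumb ()
  "Make each <img> tag's data-thumb value its src in the current buffer."
  (goto-char (point-min))
  (while (search-forward "<img" nil t)
    (let ((p1 (match-end 0))
          ;; A marker keeps pointing at the tag end while text is deleted.
          (p2 (copy-marker (or (search-forward ">" nil t) (point-max)))))
      (goto-char p1)
      (when (search-forward "data-thumb=" p2 t)
        (goto-char p1)
        ;; Remove the placeholder src attribute, if there is one.
        (when (re-search-forward "[ \t\n]src=\"[^\"]*\"" p2 t)
          (replace-match "" t t))
        (goto-char p1)
        ;; Rename data-thumb to src.
        (when (search-forward "data-thumb=" p2 t)
          (replace-match "src=" t t)))
      (goto-char p2)
      (set-marker p2 nil))))

;; Hypothetical wrapper so the rewrite can be tried on a string.
(defun my/yt-promote-data-thumb-in-string (html)
  "Return HTML with data-thumb promoted to src in every <img> tag."
  (with-temp-buffer
    (insert html)
    (my/yt-promote-data-thumb)
    (buffer-string)))
#+END_SRC

For example, evaluating
(my/yt-promote-data-thumb-in-string
 "<img src=\"pixel.gif\" data-thumb=\"real.jpg\">")
should leave a tag whose src is "real.jpg".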
>
> 4] My current filter
>
> Once I got started messing with the page, I made a bunch of other
> changes to make it appear more compact and to remove what I
> considered unnecessary stuff.
>
> #+BEGIN_SRC emacs-lisp
> ; ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
> ; ┃ youtube filter ┃
> ; ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
(defun w3m-filter-youtube (url)
  (w3m-filter-delete-regions url "<head>" "<title>" t t)
  (w3m-filter-delete-regions url "</title>" "</head>" t t)
  (insert "<body>")
  (w3m-filter-delete-regions url
                             "<body" "<p class=\"num-results" nil t (point))
  (w3m-filter-delete-regions url
                             "<div id=\"footer-container\"" "</body>" nil t)
  (w3m-filter-replace-regexp url "</?h4[^>]*>" "")
  (goto-char (point-min))
  (let ((p1 (point)) (p2 (search-forward "<ol" nil t)))
    (w3m-filter-replace-regexp url "<li>" " | " p1 p2)
    (w3m-filter-replace-regexp url "<ul>" "" p1 p2)
    (w3m-filter-replace-regexp url
                               "<li><div class=\"yt-lockup[^>]*>" "<p><li>" p2)
    (w3m-filter-replace-regexp url "<button.*</button>" "")
    (w3m-filter-replace-regexp url "<a aria-hidden[^>]*>" "")
    (w3m-filter-replace-regexp url "</?h3[^>]*>" "")
    (goto-char (point-min))
    (while (search-forward "<ul class=\"yt-lockup-meta-info\">" nil t)
      (delete-region (match-beginning 0) (match-end 0))
      (setq p1 (point) p2 (search-forward "</ul>" nil t))
      (w3m-filter-replace-regexp url "</?li>" " " p1 p2)
      (w3m-filter-replace-regexp url "</ul>" "" p1 nil 1))
    (goto-char (point-min))
    (while (search-forward "<ul class=\"yt-badge-list \">" nil t)
      (replace-match "")
      (setq p1 (point) p2 (search-forward "</ul>" nil t))
      (insert " ")
      (w3m-filter-replace-regexp url "</?li[^>]*>" " " p1 p2)
      (w3m-filter-replace-regexp url "</ul>" "" p1 nil 1))
    (w3m-filter-replace-regexp url
                               "</div><div class=\"yt-lockup-meta \">" "")
    (goto-char (point-min))
    (while (search-forward "<img" nil t)
      (let ((p1 (match-end 0)) (p2 (search-forward ">" nil t)))
        (goto-char p1)
        (when (search-forward "data-thumb=" p2 t)
          (goto-char p1)
          (when (re-search-forward "src=\"[^\"]*\"" p2 t)
            (replace-match "")
            (goto-char p1)
            (re-search-forward "data-thumb=" p2 t)
            (replace-match "src=")))))))
>
> (add-to-list 'w3m-filter-configuration
>              '(t
>                "filter for youtube.com"
>                "\\`https?://www\\.youtube\\.com/"
>                w3m-filter-youtube))
> #+END_SRC
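
To check which pages the w3m-filter-configuration entry would catch,
the URL pattern can be tried on its own. This is only a sketch: the
names my/yt-filter-regexp and my/yt-filter-applies-p are hypothetical,
and the pattern used here has every dot escaped so that "www." cannot
match an arbitrary character.

#+BEGIN_SRC emacs-lisp
;; Hypothetical constant: the filter's URL pattern with all dots escaped.
(defconst my/yt-filter-regexp "\\`https?://www\\.youtube\\.com/")

;; Hypothetical helper, not part of emacs-w3m.
(defun my/yt-filter-applies-p (url)
  "Return t if URL matches the pattern used for the YouTube filter."
  (and (string-match-p my/yt-filter-regexp url) t))
#+END_SRC

Both the http and https forms of a www.youtube.com address should
match, while a URL that merely contains the address later in its text
should not, because the pattern is anchored at the start of the string.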
>
> 5] TODO, maybe
>
> There is a minor bug at the bottom of the page that I'm not terribly
> motivated to try to fix, unless maybe someone makes a complaint. The
> search navigation links should read something like
>
> "1 2 3 4 5 6 7 Next »"
>
> where every item is a link except for the indicator of the current
> page. However, the current code deletes the element indicating the
> current page, so for example, at page three what appears is
>
> "1 2 4 5 6 7 Next »"
>
> This is because of how the filter currently deletes "button" elements.
>
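
One possible direction for that TODO, as an untested-against-YouTube
sketch: handle each button separately and keep its label when the label
is just a number. This assumes the current-page indicator is a button
whose visible text is only digits, which I have not verified against
the actual markup. The function names are hypothetical.

#+BEGIN_SRC emacs-lisp
(require 'subr-x) ; for `string-trim' on older Emacs versions

;; Hypothetical helper: works on the current buffer.
(defun my/yt-strip-buttons ()
  "Delete <button> elements, but keep labels that are purely numeric."
  (goto-char (point-min))
  (while (re-search-forward "<button[^>]*>" nil t)
    (let ((beg (match-beginning 0))
          (body-beg (match-end 0)))
      (when (search-forward "</button>" nil t)
        ;; Strip any nested tags to get the visible label.
        (let ((label (string-trim
                      (replace-regexp-in-string
                       "<[^>]*>" ""
                       (buffer-substring body-beg (match-beginning 0))))))
          (delete-region beg (point))
          ;; Assumption: a digits-only label is the current page number.
          (when (string-match-p "\\`[0-9]+\\'" label)
            (insert label)))))))

;; Hypothetical wrapper so the idea can be tried on a string.
(defun my/yt-strip-buttons-in-string (html)
  "Return HTML with buttons removed, keeping numeric labels."
  (with-temp-buffer
    (insert html)
    (my/yt-strip-buttons)
    (buffer-string)))
#+END_SRC

Because each button is matched up to its own closing tag, this also
avoids the greedy "<button.*</button>" pattern swallowing anything that
sits between two buttons on the same line.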
--
hkp://keys.gnupg.net
CA45 09B5 5351 7C11 A9D1 7286 0036 9E45 1595 8BC0