Submitted By:            Xi Ruoyao <xry111 at xry111 dot site>
Date:                    2025-04-07
Initial Package Version: 3.9.0
Upstream Status:         Committed
Origin:                  Self, upstream MR 597
Description:             Fix compatibility with libxml2 >= 2.14.

From f3245004ecebf1a9829875bc6232fc94dddb2858 Mon Sep 17 00:00:00 2001
From: Xi Ruoyao <xry111@xry111.site>
Date: Mon, 31 Mar 2025 14:03:44 +0800
Subject: [PATCH] extract/html: Don't assume a NUL-terminated string for the
 SAX callback

The HTML parser or libxml2 used to always pass NUL-terminated string to
the callback, but it's not documented and it no longer happens with
libxml2-2.14.0.

Fixes #391.

Signed-off-by: Xi Ruoyao <xry111@xry111.site>
---
 src/extractor/tracker-extract-html.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/src/extractor/tracker-extract-html.c b/src/extractor/tracker-extract-html.c
index 6ea58b5af..7189d61df 100644
--- a/src/extractor/tracker-extract-html.c
+++ b/src/extractor/tracker-extract-html.c
@@ -196,20 +196,16 @@ parser_characters (void          *data,
 
 	switch (pd->current) {
 	case READ_TITLE:
-		g_string_append (pd->title, ch);
+		g_string_append_len (pd->title, ch, len);
 		break;
 	case READ_IGNORE:
 		break;
 	default:
 		if (pd->in_body && pd->n_bytes_remaining > 0) {
-			gsize text_len;
-
-			text_len = strlen (ch);
-
 			if (tracker_text_validate_utf8 (ch,
-			                                (pd->n_bytes_remaining < text_len ?
+			                                (pd->n_bytes_remaining < len ?
 			                                 pd->n_bytes_remaining :
-			                                 text_len),
+			                                 len),
 			                                &pd->plain_text,
 			                                NULL)) {
 				/* In the case of HTML, each string arriving this
@@ -219,8 +215,8 @@ parser_characters (void          *data,
 				g_string_append_c (pd->plain_text, ' ');
 			}
 
-			if (pd->n_bytes_remaining > text_len) {
-				pd->n_bytes_remaining -= text_len;
+			if (pd->n_bytes_remaining > len) {
+				pd->n_bytes_remaining -= len;
 			} else {
 				pd->n_bytes_remaining = 0;
 			}
-- 
GitLab

