Utility functions for the engines
- searx.utils.convert_str_to_int(number_str: str) → int
- Convert number_str to int or 0 if number_str is not a number. 
- searx.utils.dict_subset(dictionnary: MutableMapping, properties: Set[str]) → Dict
- Extract a subset of a dict.
- Examples:
- >>> dict_subset({'A': 'a', 'B': 'b', 'C': 'c'}, ['A', 'C'])
  {'A': 'a', 'C': 'c'}
  >>> dict_subset({'A': 'a', 'B': 'b', 'C': 'c'}, ['A', 'D'])
  {'A': 'a'}
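A minimal sketch of the behaviour shown in these examples, written as a standalone function. The name `dict_subset_sketch` is hypothetical and this is not the searx source; it simply skips requested keys that are missing from the dict, as the 'D' example does.

```python
from typing import Dict, Iterable, Mapping


def dict_subset_sketch(dictionary: Mapping, properties: Iterable[str]) -> Dict:
    """Return a new dict holding only the requested keys that exist in `dictionary`."""
    # Keys absent from the source dict are skipped instead of raising KeyError.
    return {k: dictionary[k] for k in properties if k in dictionary}


print(dict_subset_sketch({'A': 'a', 'B': 'b', 'C': 'c'}, ['A', 'C']))  # {'A': 'a', 'C': 'c'}
print(dict_subset_sketch({'A': 'a', 'B': 'b', 'C': 'c'}, ['A', 'D']))  # {'A': 'a'}
```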
 
- searx.utils.ecma_unescape(string: str) → str
- Python implementation of the JavaScript unescape function. See https://www.ecma-international.org/ecma-262/6.0/#sec-unescape-string and https://developer.mozilla.org/fr/docs/Web/JavaScript/Reference/Objets_globaux/unescape
- Examples:
- >>> ecma_unescape('%u5409')
  '吉'
  >>> ecma_unescape('%20')
  ' '
  >>> ecma_unescape('%F3')
  'ó'
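The unescaping rule can be sketched with one regular expression: `%uXXXX` becomes the character with code point XXXX, `%XX` the character with code point XX, and everything else is left alone. This is an illustrative sketch under the hypothetical name `ecma_unescape_sketch`; it decodes both forms in a single left-to-right pass, as the ECMAScript algorithm does, which is not necessarily how searx implements it.

```python
import re

# %uXXXX (four hex digits) or %XX (two hex digits), as in ECMAScript's unescape().
_ESCAPE_RE = re.compile(r'%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})')


def ecma_unescape_sketch(string: str) -> str:
    """Decode %uXXXX and %XX escapes in one left-to-right pass."""

    def _decode(match: re.Match) -> str:
        # Exactly one of the two groups matched; it holds the hex code point.
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))

    return _ESCAPE_RE.sub(_decode, string)


print(ecma_unescape_sketch('%u5409'))  # 吉
```

Sequences that are not valid escapes, such as a trailing `%`, pass through unchanged.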
 
- searx.utils.eval_xpath(element: ElementBase, xpath_spec: Union[str, XPath])
- Equivalent of element.xpath(xpath_str), but xpath_str is compiled only once and then reused. See https://lxml.de/xpathxslt.html#xpath-return-values for the possible return types.
- Args:
- element (ElementBase): lxml element to apply the XPath to.
- xpath_spec (str|lxml.etree.XPath): XPath as a str or lxml.etree.XPath.
 
- Returns:
- result (bool, float, list, str): Results. 
 
- Raises:
- TypeError: Raised when xpath_spec is neither a str nor an lxml.etree.XPath.
- SearxXPathSyntaxException: Raised when there is a syntax error in the XPath.
- SearxEngineXPathException: Raised when the XPath can’t be evaluated.
 
 
- searx.utils.eval_xpath_getindex(elements: ElementBase, xpath_spec: Union[str, XPath], index: int, default=<searx.utils._NotSetClass object>)
- Call eval_xpath_list, then get one element using the index parameter. If the index does not exist, raise an exception if default is not set; otherwise return the default value (which can be None).
- Args:
- elements (ElementBase): lxml element to apply the XPath to.
- xpath_spec (str|lxml.etree.XPath): XPath as a str or lxml.etree.XPath.
- index (int): index of the element to return.
- default (Object, optional): value returned if the index doesn’t exist.
 
- Raises:
- TypeError: Raised when xpath_spec is neither a str nor an lxml.etree.XPath.
- SearxXPathSyntaxException: Raised when there is a syntax error in the XPath.
- SearxEngineXPathException: Raised if the index is not found and default is not set. Also see eval_xpath.
 
- Returns:
- result (bool, float, list, str): Results. 
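The index/default logic can be sketched in plain Python. Here `result` stands in for the list that eval_xpath_list would return, and `getindex_sketch` and `IndexNotFound` are hypothetical names used only for illustration. A private sentinel object (the same idea as the `_NotSetClass` default in the signature) lets a caller pass `default=None` explicitly and still get None back instead of an exception.

```python
from typing import Any, Sequence


class _NotSet:
    """Sentinel type: distinguishes "no default given" from default=None."""

    def __repr__(self) -> str:
        return '<NOTSET>'


_NOTSET = _NotSet()


class IndexNotFound(Exception):
    """Hypothetical stand-in for SearxEngineXPathException."""


def getindex_sketch(result: Sequence[Any], index: int, default: Any = _NOTSET) -> Any:
    """Pick result[index]; fall back to `default`, or raise if no default was given."""
    # Negative indexes are accepted, like ordinary Python indexing.
    if -len(result) <= index < len(result):
        return result[index]
    if default is _NOTSET:
        raise IndexNotFound(f'index {index} not found')
    return default


items = ['first', 'second']
print(getindex_sketch(items, 1))        # second
print(getindex_sketch(items, 5, None))  # None
```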
 
 
- searx.utils.eval_xpath_list(element: ElementBase, xpath_spec: Union[str, XPath], min_len: Optional[int] = None)
- Same as eval_xpath, but also checks that the result is a list and, if min_len is set, that it contains at least min_len items.
- Args:
- element (ElementBase): lxml element to apply the XPath to.
- xpath_spec (str|lxml.etree.XPath): XPath as a str or lxml.etree.XPath.
- min_len (int, optional): minimum number of items the result must contain. Defaults to None (no length check).
 
- Raises:
- TypeError: Raised when xpath_spec is neither a str nor an lxml.etree.XPath.
- SearxXPathSyntaxException: Raised when there is a syntax error in the XPath.
- SearxEngineXPathException: Raised if the result is not a list or has fewer than min_len items.

- Returns:
- result (list): Results.
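The two checks that eval_xpath_list adds on top of eval_xpath can be sketched on an already-evaluated result. `check_xpath_list_sketch` and `NotAListError` are hypothetical names, and the XPath evaluation itself is left out so the sketch runs without lxml.

```python
from typing import Any, List, Optional


class NotAListError(Exception):
    """Hypothetical stand-in for SearxEngineXPathException."""


def check_xpath_list_sketch(result: Any, min_len: Optional[int] = None) -> List[Any]:
    """Validate an XPath result the way eval_xpath_list is described to."""
    # XPath can also return a bool, float or str; only node-set results are lists.
    if not isinstance(result, list):
        raise NotAListError('the result is not a list')
    # min_len=None disables the length check.
    if min_len is not None and len(result) < min_len:
        raise NotAListError(f'len(result) < {min_len}')
    return result
```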
 
 
- searx.utils.extract_text(xpath_results, allow_none: bool = False) → Optional[str]
- Extract text from an lxml result:
- if xpath_results is a list, extract the text from each result and concatenate the pieces;
- if xpath_results is an XML element, extract all of its text nodes (the text_content() method from lxml);
- if xpath_results is a string, it is already text and needs no extraction.
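To illustrate the three cases without depending on lxml, this sketch swaps in the standard library's xml.etree.ElementTree, whose `itertext()` plays the role of lxml's `text_content()`. The name `extract_text_sketch` is hypothetical, and returning None for a None input only when allow_none is true is an assumption based on the signature.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def extract_text_sketch(xpath_results, allow_none: bool = False) -> Optional[str]:
    """Turn an XPath-style result (list, element or string) into plain text."""
    if xpath_results is None:
        # Assumption: None is only acceptable when the caller allows it.
        if allow_none:
            return None
        raise ValueError('xpath_results is None')
    if isinstance(xpath_results, list):
        # Extract the text of every item and concatenate the pieces.
        return ''.join(extract_text_sketch(item) for item in xpath_results)
    if isinstance(xpath_results, ET.Element):
        # Collect all text nodes below the element, like lxml's text_content().
        return ''.join(xpath_results.itertext())
    # Strings (and other scalar XPath results) are already text.
    return str(xpath_results)


print(extract_text_sketch(ET.fromstring('<p>Hello <b>world</b></p>')))  # Hello world
```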
 
- searx.utils.extract_url(xpath_results, base_url) → str
- Extract and normalize a URL from an lxml element.
- Args:
- xpath_results (Union[List[html.HtmlElement], html.HtmlElement]): lxml Element(s) 
- base_url (str): Base URL 
 
- Example:
- >>> def f(s, search_url):
  ...     return searx.utils.extract_url(html.fromstring(s), search_url)
  >>> f('<span id="42">https://example.com</span>', 'http://example.com/')
  'https://example.com/'
  >>> f('https://example.com', 'http://example.com/')
  'https://example.com/'
  >>> f('//example.com', 'http://example.com/')
  'http://example.com/'
  >>> f('//example.com', 'https://example.com/')
  'https://example.com/'
  >>> f('/path?a=1', 'https://example.com')
  'https://example.com/path?a=1'
  >>> f('', 'https://example.com')
  raise lxml.etree.ParserError
  >>> searx.utils.extract_url([], 'https://example.com')
  raise ValueError
- Raises:
- ValueError 
- lxml.etree.ParserError 
 
- Returns:
- str: normalized URL 
 
 
- searx.utils.gen_useragent(os_string: Optional[str] = None) → str
- Return a random browser User-Agent string. If os_string is given, it is used as the operating-system part instead of a random one. See searx/data/useragents.json.
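A sketch of how such a random User-Agent can be assembled from a template. The data layout below (a template string plus lists of OS strings and browser versions) is an assumption standing in for searx/data/useragents.json, the values are made up for illustration, and `gen_useragent_sketch` is a hypothetical name.

```python
import random
from typing import Optional

# Hypothetical stand-in for the contents of searx/data/useragents.json.
USER_AGENTS = {
    'ua': 'Mozilla/5.0 ({os}; rv:{version}) Gecko/20100101 Firefox/{version}',
    'os': ['Windows NT 10.0; Win64; x64', 'X11; Linux x86_64'],
    'versions': ['115.0', '128.0'],
}


def gen_useragent_sketch(os_string: Optional[str] = None) -> str:
    """Fill the UA template with the given (or a random) OS and a random version."""
    return USER_AGENTS['ua'].format(
        os=os_string or random.choice(USER_AGENTS['os']),
        version=random.choice(USER_AGENTS['versions']),
    )


print(gen_useragent_sketch('X11; Linux x86_64'))
```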
- searx.utils.get_engine_from_settings(name: str) → Dict
- Return the configuration, from settings.yml, of the engine with the given name (an empty dict if no such engine is configured).
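Looking up an engine by name amounts to a linear scan of the `engines` list in settings.yml. In this sketch the already-parsed settings are passed in as a plain dict (searx reads its own loaded settings instead), and `get_engine_sketch` is a hypothetical name.

```python
from typing import Any, Dict


def get_engine_sketch(settings: Dict[str, Any], name: str) -> Dict:
    """Return the engine entry whose 'name' equals `name`, or {} if none does."""
    for engine in settings.get('engines', []):
        # Entries without a name cannot match and are skipped.
        if engine.get('name') == name:
            return engine
    return {}


settings = {'engines': [{'name': 'wikipedia', 'shortcut': 'wp'},
                        {'name': 'duckduckgo', 'shortcut': 'ddg'}]}
print(get_engine_sketch(settings, 'duckduckgo'))  # {'name': 'duckduckgo', 'shortcut': 'ddg'}
```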
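- searx.utils.get_torrent_size(filesize: str, filesize_multiplier: str) → Optional[int]
- Convert a torrent size, given as a number and a unit, into a number of bytes.
- Args:
- filesize (str): size, as a numeric string (e.g. '3.14').
- filesize_multiplier (str): unit: TB, GB, …, TiB, GiB, …

- Returns:
- int: number of bytes, or None if filesize is not a number.

- Example:
- >>> get_torrent_size('5', 'GB')
  5368709120
  >>> get_torrent_size('3.14', 'MiB')
  3140000
- As these examples show, the units do not follow the IEC convention: 'GB' counts as 1024³ bytes and 'MiB' as 1000² bytes.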
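The conversion is a table lookup followed by a multiplication. This sketch (hypothetical name `torrent_size_sketch`) reproduces the two documented examples; its multiplier table gives the units the same non-IEC meaning the examples show. Returning None for a non-numeric size matches the Optional[int] return type, while treating an unknown unit as plain bytes is an assumption of this sketch.

```python
from typing import Optional

# Unit multipliers chosen to match the documented examples:
# 'GB' = 1024**3 bytes, 'MiB' = 1000**2 bytes (the reverse of IEC usage).
_MULTIPLIERS = {
    'KB': 1024, 'MB': 1024 ** 2, 'GB': 1024 ** 3, 'TB': 1024 ** 4,
    'KiB': 1000, 'MiB': 1000 ** 2, 'GiB': 1000 ** 3, 'TiB': 1000 ** 4,
}


def torrent_size_sketch(filesize: str, filesize_multiplier: str) -> Optional[int]:
    """Convert e.g. ('5', 'GB') into a number of bytes; None if filesize is not numeric."""
    try:
        size = float(filesize)
    except ValueError:
        return None
    # Assumption: an unknown unit is treated as plain bytes.
    return int(size * _MULTIPLIERS.get(filesize_multiplier, 1))


print(torrent_size_sketch('5', 'GB'))      # 5368709120
print(torrent_size_sketch('3.14', 'MiB'))  # 3140000
```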
 
- searx.utils.get_xpath(xpath_spec: Union[str, XPath]) → XPath
- Return a cached, compiled XPath. There is no thread lock: in the worst case, the same xpath_spec is compiled more than once.
- Args:
- xpath_spec (str|lxml.etree.XPath): XPath as a str or lxml.etree.XPath 
 
- Returns:
- lxml.etree.XPath: the compiled XPath.
 
- Raises:
- TypeError: Raised when xpath_spec is neither a str nor an lxml.etree.XPath.
- SearxXPathSyntaxException: Raised when there is a syntax error in the XPath.
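The lock-free caching pattern described above can be sketched independently of lxml. Here `compile_fn` stands for the compile step (lxml.etree.XPath in searx); the usage line passes re.compile only so the sketch runs without lxml, and `get_compiled_sketch` is a hypothetical name. If two threads miss the cache at the same moment, both compile and the later write wins, which is harmless because the two compiled objects are equivalent.

```python
import re
from typing import Callable, Dict, Tuple

# Cache keyed by (compile function, source string).
_CACHE: Dict[Tuple[Callable, str], object] = {}


def get_compiled_sketch(spec: str, compile_fn: Callable[[str], object]) -> object:
    """Return the compiled form of `spec`, compiling it only on a cache miss."""
    key = (compile_fn, spec)
    compiled = _CACHE.get(key)
    if compiled is None:
        # No lock: concurrent misses may compile the same spec twice,
        # but every result is equivalent, so whichever is stored last is fine.
        compiled = compile_fn(spec)
        _CACHE[key] = compiled
    return compiled


pattern = get_compiled_sketch(r'\d+', re.compile)
print(pattern.match('42').group())  # 42
```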
 
 
- searx.utils.html_to_text(html_str: str) → str
- Extract text from an HTML string.
- Args:
- html_str (str): HTML string.
 
- Returns:
- str: extracted text 
 
- Examples:
- >>> html_to_text('Example <span id="42">#2</span>')
  'Example #2'
  >>> html_to_text('<style>.span { color: red; }</style><span>Example</span>')
  'Example'
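A sketch of this extraction built on the standard library's html.parser. It is not the searx implementation: the class and function names are hypothetical, text inside <script> and <style> is skipped (matching the second example), and runs of whitespace are collapsed into single spaces.

```python
from html.parser import HTMLParser
from typing import List

_IGNORED_TAGS = {'script', 'style'}


class _TextCollector(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True, so entities arrive decoded
        self.parts: List[str] = []
        self._ignored_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in _IGNORED_TAGS:
            self._ignored_depth += 1

    def handle_endtag(self, tag):
        if tag in _IGNORED_TAGS and self._ignored_depth:
            self._ignored_depth -= 1

    def handle_data(self, data):
        if not self._ignored_depth:
            self.parts.append(data)


def html_to_text_sketch(html_str: str) -> str:
    collector = _TextCollector()
    collector.feed(html_str)
    collector.close()
    # Collapse newlines and runs of spaces into single spaces.
    return ' '.join(''.join(collector.parts).split())
```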
 
- searx.utils.int_or_zero(num: Union[List[str], str]) → int
- Convert num to int, or to 0 if that is not possible. num can be either a str or a list. If num is a list, its first element is converted to int (0 is returned if the list is empty). If num is a str, see convert_str_to_int.
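Both conversions, int_or_zero and the convert_str_to_int it relies on, can be sketched in a few lines. The names are hypothetical, and treating anything int() cannot parse as "not a number" is an assumption of this sketch; the exact rule searx applies may differ.

```python
from typing import List, Union


def convert_str_to_int_sketch(number_str: str) -> int:
    """Return int(number_str), or 0 if the string is not an integer literal."""
    try:
        return int(number_str)
    except ValueError:
        return 0


def int_or_zero_sketch(num: Union[List[str], str]) -> int:
    """Convert a str, or the first item of a list of str, to int (0 on failure)."""
    if isinstance(num, list):
        if not num:
            return 0  # empty list
        num = num[0]
    return convert_str_to_int_sketch(num)
```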
- searx.utils.is_valid_lang(lang) → Optional[Tuple[bool, str, str]]
- Return the language code and name if lang describes a language, otherwise None.
- Examples:
- >>> is_valid_lang('zz')
  None
  >>> is_valid_lang('uk')
  (True, 'uk', 'ukrainian')
  >>> is_valid_lang(b'uk')
  (True, 'uk', 'ukrainian')
  >>> is_valid_lang('en')
  (True, 'en', 'english')
  >>> searx.utils.is_valid_lang('Español')
  (True, 'es', 'spanish')
  >>> searx.utils.is_valid_lang('Spanish')
  (True, 'es', 'spanish')
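The lookup can be sketched against a small language table. The table below is a hypothetical three-entry excerpt in the form (code, native name, English name); searx uses its own, much larger language list, and `is_valid_lang_sketch` is a hypothetical name. As the examples suggest, two-letter inputs are matched against codes and longer inputs against the native or English name, case-insensitively; bytes are decoded first.

```python
from typing import Optional, Tuple, Union

# Hypothetical excerpt of a language table: (code, native name, English name).
LANGUAGES = [
    ('en', 'English', 'English'),
    ('es', 'Español', 'Spanish'),
    ('uk', 'Українська', 'Ukrainian'),
]


def is_valid_lang_sketch(lang: Union[str, bytes]) -> Optional[Tuple[bool, str, str]]:
    """Return (True, code, english_name) if `lang` names a known language, else None."""
    if isinstance(lang, bytes):
        lang = lang.decode()
    lang = lang.lower()
    for code, native, english in LANGUAGES:
        # Two-letter inputs are treated as codes, anything longer as a name.
        if len(lang) == 2:
            matched = code == lang
        else:
            matched = lang in (native.lower(), english.lower())
        if matched:
            return (True, code, english.lower())
    return None
```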
 
- searx.utils.match_language(locale_code, lang_list=[], custom_aliases={}, fallback: Optional[str] = 'en-US') → Optional[str]
- Get the language code from lang_list that best matches locale_code, or fallback if no language matches.
- searx.utils.normalize_url(url: str, base_url: str) → str
- Normalize a URL: add the protocol to protocol-relative URLs, join relative URLs with base_url, and add a trailing slash if there is no path.
- Args:
- url (str): URL to normalize; it may be relative.
- base_url (str): base URL; it must be an absolute URL.
 
- Example:
- >>> normalize_url('https://example.com', 'http://example.com/')
  'https://example.com/'
  >>> normalize_url('//example.com', 'http://example.com/')
  'http://example.com/'
  >>> normalize_url('//example.com', 'https://example.com/')
  'https://example.com/'
  >>> normalize_url('/path?a=1', 'https://example.com')
  'https://example.com/path?a=1'
  >>> normalize_url('', 'https://example.com')
  'https://example.com/'
  >>> normalize_url('/test', '/path')
  raise ValueError
- Raises:
- ValueError: if the URL cannot be resolved to an absolute URL with a host (see the last example).
 
- Returns:
- str: normalized URL
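The normalization steps can be sketched with urllib.parse. This is an illustrative reimplementation under the hypothetical name `normalize_url_sketch`, not the searx source. It reproduces the documented examples, and when it adds the missing path it places the slash before any query string, so 'https://example.com?a=1' becomes 'https://example.com/?a=1'.

```python
from urllib.parse import urljoin, urlparse, urlunparse


def normalize_url_sketch(url: str, base_url: str) -> str:
    """Make `url` absolute with respect to `base_url` and ensure it has a path."""
    if url.startswith('//'):
        # Protocol-relative URL: borrow the scheme of base_url (http by default).
        url = (urlparse(base_url).scheme or 'http') + ':' + url
    elif '://' not in url:
        # Relative URL (including ''): resolve it against base_url.
        url = urljoin(base_url, url)

    parsed = urlparse(url)
    if not parsed.netloc:
        raise ValueError('Cannot parse url')
    if not parsed.path:
        # Add the trailing slash before any query string or fragment.
        url = urlunparse(parsed._replace(path='/'))
    return url
```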