Utility functions for the engines

searx.utils.convert_str_to_int(number_str: str) int[source]

Convert number_str to int or 0 if number_str is not a number.
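
A minimal sketch of the documented behavior (a sketch, not necessarily the exact implementation):

```python
def convert_str_to_int(number_str: str) -> int:
    """Return number_str as an int, or 0 if it is not a number."""
    # str.isdigit() rejects signs, decimal points and non-numeric text
    return int(number_str) if number_str.isdigit() else 0
```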

searx.utils.dict_subset(dictionnary: MutableMapping, properties: Set[str]) Dict[source]

Extract a subset of a dict

Examples:
>>> dict_subset({'A': 'a', 'B': 'b', 'C': 'c'}, ['A', 'C'])
{'A': 'a', 'C': 'c'}
>>> dict_subset({'A': 'a', 'B': 'b', 'C': 'c'}, ['A', 'D'])
{'A': 'a'}
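
The behavior shown above amounts to a dict comprehension (a sketch, not the actual implementation; the spelling dictionnary follows the documented parameter name):

```python
from typing import Dict, Iterable, Mapping

def dict_subset(dictionnary: Mapping, properties: Iterable[str]) -> Dict:
    # Keep only the requested keys; missing keys are silently skipped,
    # which is why 'D' disappears in the second example above.
    return {k: dictionnary[k] for k in properties if k in dictionnary}
```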
searx.utils.ecma_unescape(string: str) str[source]

Python implementation of the JavaScript unescape function

https://www.ecma-international.org/ecma-262/6.0/#sec-unescape-string https://developer.mozilla.org/fr/docs/Web/JavaScript/Reference/Objets_globaux/unescape

Examples:
>>> ecma_unescape('%u5409')
'吉'
>>> ecma_unescape('%20')
' '
>>> ecma_unescape('%F3')
'ó'
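
The ECMA-262 unescape semantics can be sketched with two regular expressions (an assumed sketch, not the verbatim implementation): %uXXXX encodes a 4-hex-digit code point, %XX a 2-hex-digit one.

```python
import re

_unescape4_re = re.compile(r'%u([0-9a-fA-F]{4})')
_unescape2_re = re.compile(r'%([0-9a-fA-F]{2})')

def ecma_unescape(string: str) -> str:
    # '%u5409' -> '吉' (4 hex digits after %u)
    string = _unescape4_re.sub(lambda m: chr(int(m.group(1), 16)), string)
    # '%20' -> ' ', '%F3' -> 'ó' (2 hex digits after %)
    return _unescape2_re.sub(lambda m: chr(int(m.group(1), 16)), string)
```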
searx.utils.eval_xpath(element: ElementBase, xpath_spec: Union[str, XPath])[source]

Equivalent of element.xpath(xpath_str), but the XPath is compiled once and cached. See https://lxml.de/xpathxslt.html#xpath-return-values

Args:
  • element (ElementBase): lxml element to apply the XPath to.

  • xpath_spec (str|lxml.etree.XPath): XPath as a str or lxml.etree.XPath

Returns:
  • result (bool, float, list, str): Results.

Raises:
  • TypeError: Raise when xpath_spec is neither a str nor a lxml.etree.XPath

  • SearxXPathSyntaxException: Raise when there is a syntax error in the XPath

  • SearxEngineXPathException: Raise when the XPath can’t be evaluated.
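
Pre-compiling an XPath with lxml, as eval_xpath does internally, looks roughly like this (a usage sketch, not searx code; requires lxml):

```python
from lxml import etree

doc = etree.fromstring('<root><a>1</a><a>2</a></root>')
xpath = etree.XPath('//a/text()')   # compiled once, reusable across documents
print(xpath(doc))                   # ['1', '2']
```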

searx.utils.eval_xpath_getindex(elements: ~lxml.etree.ElementBase, xpath_spec: ~typing.Union[str, ~lxml.etree.XPath], index: int, default=<searx.utils._NotSetClass object>)[source]

Call eval_xpath_list, then get one element using the index parameter. If the index does not exist, either raise an exception if default is not set, otherwise return the default value (which can be None).

Args:
  • elements (ElementBase): lxml element to apply the xpath.

  • xpath_spec (str|lxml.etree.XPath): XPath as a str or lxml.etree.XPath.

  • index (int): index to get

  • default (Object, optional): Defaults if index doesn’t exist.

Raises:
  • TypeError: Raise when xpath_spec is neither a str nor a lxml.etree.XPath

  • SearxXPathSyntaxException: Raise when there is a syntax error in the XPath

  • SearxEngineXPathException: if the index is not found. Also see eval_xpath.

Returns:
  • result (bool, float, list, str): Results.
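
The index/default logic can be sketched as follows (the sentinel name _NOTSET and the helper name are illustrative; searx uses its own _NotSetClass):

```python
_NOTSET = object()  # sentinel: distinguishes "no default given" from default=None

def get_index(results: list, index: int, default=_NOTSET):
    if -len(results) <= index < len(results):
        return results[index]
    if default is _NOTSET:
        raise IndexError(index)  # searx raises SearxEngineXPathException here
    return default
```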

searx.utils.eval_xpath_list(element: ElementBase, xpath_spec: Union[str, XPath], min_len: Optional[int] = None)[source]

Same as eval_xpath, check if the result is a list

Args:
  • element (ElementBase): lxml element to apply the XPath to.

  • xpath_spec (str|lxml.etree.XPath): XPath as a str or lxml.etree.XPath

  • min_len (int, optional): minimum number of results expected. Defaults to None.

Raises:
  • TypeError: Raise when xpath_spec is neither a str nor a lxml.etree.XPath

  • SearxXPathSyntaxException: Raise when there is a syntax error in the XPath

  • SearxEngineXPathException: raise if the result is not a list

Returns:
  • result (bool, float, list, str): Results.

searx.utils.extract_text(xpath_results, allow_none: bool = False) Optional[str][source]

Extract text from an lxml result

  • if xpath_results is a list, extract the text from each result and concatenate them

  • if xpath_results is an xml element, extract all of its text nodes (text_content() method from lxml)

  • if xpath_results is a string, it is returned as-is

searx.utils.extract_url(xpath_results, base_url) str[source]

Extract and normalize URL from lxml Element

Args:
  • xpath_results (Union[List[html.HtmlElement], html.HtmlElement]): lxml Element(s)

  • base_url (str): Base URL

Example:
>>> def f(s, search_url):
>>>    return searx.utils.extract_url(html.fromstring(s), search_url)
>>> f('<span id="42">https://example.com</span>', 'http://example.com/')
'https://example.com/'
>>> f('https://example.com', 'http://example.com/')
'https://example.com/'
>>> f('//example.com', 'http://example.com/')
'http://example.com/'
>>> f('//example.com', 'https://example.com/')
'https://example.com/'
>>> f('/path?a=1', 'https://example.com')
'https://example.com/path?a=1'
>>> f('', 'https://example.com')
raise lxml.etree.ParserError
>>> searx.utils.extract_url([], 'https://example.com')
raise ValueError
Raises:
  • ValueError

  • lxml.etree.ParserError

Returns:
  • str: normalized URL

searx.utils.gen_useragent(os_string: Optional[str] = None) str[source]

Return a random browser User Agent

See searx/data/useragents.json

searx.utils.get_engine_from_settings(name: str) Dict[source]

Return the engine configuration from settings.yml for a given engine name

searx.utils.get_torrent_size(filesize: str, filesize_multiplier: str) Optional[int][source]

Convert a file size and its unit multiplier to a number of bytes.
Args:
  • filesize (str): size

  • filesize_multiplier (str): TB, GB, …. TiB, GiB…

Returns:
  • int: number of bytes

Example:
>>> get_torrent_size('5', 'GB')
5368709120
>>> get_torrent_size('3.14', 'MiB')
3140000
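
Note that the examples above imply the plain units (GB) are 1024-based while the "i" units (MiB) are 1000-based in this implementation. A sketch reproducing that mapping (the table is inferred from the examples, not copied from the source):

```python
_MULTIPLIERS = {
    # per the examples above, plain units are 1024-based ...
    'KB': 1024, 'MB': 1024**2, 'GB': 1024**3, 'TB': 1024**4,
    # ... while the "i" units are 1000-based
    'KiB': 1000, 'MiB': 1000**2, 'GiB': 1000**3, 'TiB': 1000**4,
}

def get_torrent_size(filesize: str, filesize_multiplier: str):
    try:
        return int(float(filesize) * _MULTIPLIERS.get(filesize_multiplier, 1))
    except ValueError:
        return None  # filesize was not a number
```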
searx.utils.get_xpath(xpath_spec: Union[str, XPath]) XPath[source]

Return cached compiled XPath

There is no thread lock. Worst case, xpath_str is compiled more than once.
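
The lock-free caching pattern described above, illustrated here with the stdlib re module instead of lxml.etree.XPath (an illustration of the pattern, not searx code):

```python
import re

_regex_cache = {}

def get_compiled(pattern: str) -> re.Pattern:
    compiled = _regex_cache.get(pattern)
    if compiled is None:
        # No lock: two threads may both compile the same pattern,
        # but each stores an equivalent object, so the race is harmless.
        compiled = re.compile(pattern)
        _regex_cache[pattern] = compiled
    return compiled
```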

Args:
  • xpath_spec (str|lxml.etree.XPath): XPath as a str or lxml.etree.XPath

Returns:
  • result (lxml.etree.XPath): Compiled XPath.

Raises:
  • TypeError: Raise when xpath_spec is neither a str nor a lxml.etree.XPath

  • SearxXPathSyntaxException: Raise when there is a syntax error in the XPath

searx.utils.html_to_text(html_str: str) str[source]

Extract text from a HTML string

Args:
  • html_str (str): string HTML

Returns:
  • str: extracted text

Examples:
>>> html_to_text('Example <span id="42">#2</span>')
'Example #2'
>>> html_to_text('<style>.span { color: red; }</style><span>Example</span>')
'Example'
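
A self-contained sketch of this behavior using the stdlib html.parser (the real searx implementation also builds on HTMLParser, but its details are assumed here):

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect text nodes, ignoring <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ('script', 'style'):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ('script', 'style') and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)

def html_to_text(html_str: str) -> str:
    parser = _TextExtractor()
    parser.feed(html_str)
    # collapse runs of whitespace, as the documented examples do
    return ' '.join(''.join(parser.parts).split())
```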
searx.utils.int_or_zero(num: Union[List[str], str]) int[source]

Convert num to int or 0. num can be either a str or a list. If num is a list, the first element is converted to int (or 0 is returned if the list is empty). If num is a str, see convert_str_to_int.
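
A sketch of the documented dispatch (the helper name _convert_str_to_int is illustrative, standing in for convert_str_to_int above):

```python
def _convert_str_to_int(number_str: str) -> int:
    return int(number_str) if number_str.isdigit() else 0

def int_or_zero(num) -> int:
    if isinstance(num, list):
        if not num:       # empty list -> 0
            return 0
        num = num[0]      # otherwise convert the first element
    return _convert_str_to_int(num)
```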

searx.utils.is_valid_lang(lang) Optional[Tuple[bool, str, str]][source]

Return the language code and name if lang describes a language.

Examples:
>>> is_valid_lang('zz')
None
>>> is_valid_lang('uk')
(True, 'uk', 'ukrainian')
>>> is_valid_lang(b'uk')
(True, 'uk', 'ukrainian')
>>> is_valid_lang('en')
(True, 'en', 'english')
>>> searx.utils.is_valid_lang('Español')
(True, 'es', 'spanish')
>>> searx.utils.is_valid_lang('Spanish')
(True, 'es', 'spanish')
searx.utils.match_language(locale_code, lang_list=[], custom_aliases={}, fallback: Optional[str] = 'en-US') Optional[str][source]

Get the language code from lang_list that best matches locale_code.
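
A greatly simplified sketch of best-match selection (the real function also handles territory aliases and the custom_aliases mapping, omitted here):

```python
from typing import List, Optional

def match_language(locale_code: str, lang_list: List[str],
                   fallback: Optional[str] = 'en-US') -> Optional[str]:
    if locale_code in lang_list:      # exact match wins
        return locale_code
    lang = locale_code.split('-')[0]  # 'pt-BR' -> 'pt'
    for candidate in lang_list:
        if candidate.split('-')[0] == lang:
            return candidate          # same base language
    return fallback
```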

searx.utils.normalize_url(url: str, base_url: str) str[source]

Normalize URL: add protocol, join URL with base_url, add trailing slash if there is no path

Args:
  • url (str): Relative URL

  • base_url (str): Base URL, it must be an absolute URL.

Example:
>>> normalize_url('https://example.com', 'http://example.com/')
'https://example.com/'
>>> normalize_url('//example.com', 'http://example.com/')
'http://example.com/'
>>> normalize_url('//example.com', 'https://example.com/')
'https://example.com/'
>>> normalize_url('/path?a=1', 'https://example.com')
'https://example.com/path?a=1'
>>> normalize_url('', 'https://example.com')
'https://example.com/'
>>> normalize_url('/test', '/path')
raise ValueError
Raises:
  • ValueError: if base_url is not an absolute URL

  • lxml.etree.ParserError

Returns:
  • str: normalized URL

searx.utils.searx_useragent() str[source]

Return the searx User Agent

searx.utils.to_string(obj: Any) str[source]

Convert obj to its string representation.
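
A plausible sketch (an assumption, not the verbatim implementation):

```python
def to_string(obj) -> str:
    if isinstance(obj, str):
        return obj        # already a string
    if hasattr(obj, '__str__'):
        return str(obj)   # usual case: delegate to the object's __str__
    return repr(obj)      # fallback
```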