Utility functions for the engines¶
- searx.utils.convert_str_to_int(number_str: str) int [source]¶
Convert number_str to int or 0 if number_str is not a number.
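The described behavior can be sketched in a few lines (a minimal illustration, not the actual searx implementation, which may handle additional input types):

```python
def convert_str_to_int(number_str):
    """Return int(number_str), or 0 if number_str is not a number."""
    try:
        return int(number_str)
    except (ValueError, TypeError):
        return 0
```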
- searx.utils.dict_subset(dictionnary: MutableMapping, properties: Set[str]) Dict [source]¶
Extract a subset of a dict
- Examples:
>>> dict_subset({'A': 'a', 'B': 'b', 'C': 'c'}, ['A', 'C'])
{'A': 'a', 'C': 'c'}
>>> dict_subset({'A': 'a', 'B': 'b', 'C': 'c'}, ['A', 'D'])
{'A': 'a'}
- searx.utils.ecma_unescape(string: str) str [source]¶
Python implementation of the JavaScript unescape function.
https://www.ecma-international.org/ecma-262/6.0/#sec-unescape-string https://developer.mozilla.org/fr/docs/Web/JavaScript/Reference/Objets_globaux/unescape
- Examples:
>>> ecma_unescape('%u5409')
'吉'
>>> ecma_unescape('%20')
' '
>>> ecma_unescape('%F3')
'ó'
- searx.utils.eval_xpath(element: ElementBase, xpath_spec: Union[str, XPath])[source]¶
Equivalent of element.xpath(xpath_str), except that a str xpath_spec is compiled once and cached. See https://lxml.de/xpathxslt.html#xpath-return-values
- Args:
element (ElementBase): lxml element to apply the XPath to.
xpath_spec (str|lxml.etree.XPath): XPath as a str or lxml.etree.XPath
- Returns:
result (bool, float, list, str): Results.
- Raises:
TypeError: Raise when xpath_spec is neither a str nor a lxml.etree.XPath
SearxXPathSyntaxException: Raise when there is a syntax error in the XPath
SearxEngineXPathException: Raise when the XPath can’t be evaluated.
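A hypothetical usage sketch using plain lxml, showing the equivalence described above (the `doc` fragment and the precompiled spec are illustrative, not from searx):

```python
from lxml import etree, html

# eval_xpath(element, xpath_spec) returns the same values as
# element.xpath(xpath_spec); the gain is that a str spec is compiled only once.
doc = html.fromstring('<div><p class="x">hello</p><p>world</p></div>')
compiled = etree.XPath('//p[@class="x"]/text()')  # precompiled form of the spec

result = compiled(doc)
# same values as the uncompiled call:
assert result == doc.xpath('//p[@class="x"]/text()')
```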
- searx.utils.eval_xpath_getindex(elements: ~lxml.etree.ElementBase, xpath_spec: ~typing.Union[str, ~lxml.etree.XPath], index: int, default=<searx.utils._NotSetClass object>)[source]¶
Call eval_xpath_list, then get one element using the index parameter. If the index does not exist, either raise an exception if default is not set, or return the default value (which can be None).
- Args:
elements (ElementBase): lxml element to apply the xpath.
xpath_spec (str|lxml.etree.XPath): XPath as a str or lxml.etree.XPath.
index (int): index to get
default (Object, optional): default value returned if the index doesn’t exist.
- Raises:
TypeError: Raise when xpath_spec is neither a str nor a lxml.etree.XPath
SearxXPathSyntaxException: Raise when there is a syntax error in the XPath
SearxEngineXPathException: if the index is not found. Also see eval_xpath.
- Returns:
result (bool, float, list, str): Results.
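The index/default logic described above can be sketched as a standalone helper (the sentinel mirrors searx.utils._NotSetClass; names here are hypothetical):

```python
_NOTSET = object()  # sentinel: distinguishes "no default given" from default=None

def get_index_or_default(results, index, default=_NOTSET):
    # Return results[index] if the index exists; otherwise raise unless a
    # default was explicitly given (a default of None is a valid default).
    if -len(results) <= index < len(results):
        return results[index]
    if default is _NOTSET:
        raise IndexError('index %d not found' % index)
    return default
```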
- searx.utils.eval_xpath_list(element: ElementBase, xpath_spec: Union[str, XPath], min_len: Optional[int] = None)[source]¶
Same as eval_xpath, but checks that the result is a list.
- Args:
element (ElementBase): lxml element to apply the XPath to.
xpath_spec (str|lxml.etree.XPath): XPath as a str or lxml.etree.XPath
min_len (int, optional): minimum expected length of the result list. Defaults to None.
- Raises:
TypeError: Raise when xpath_spec is neither a str nor a lxml.etree.XPath
SearxXPathSyntaxException: Raise when there is a syntax error in the XPath
SearxEngineXPathException: raise if the result is not a list
- Returns:
result (bool, float, list, str): Results.
- searx.utils.extract_text(xpath_results, allow_none: bool = False) Optional[str] [source]¶
Extract text from lxml results:
- if xpath_results is a list, extract the text from each result and concatenate it
- if xpath_results is an XML element, extract all of its text nodes (lxml’s text_content() method)
- if xpath_results is a string, it is returned as-is
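A simplified sketch of the three cases above (the real extract_text also normalizes whitespace and handles more types; the function name here is illustrative):

```python
from lxml import html

def extract_text_sketch(xpath_results):
    # list -> extract from each item and concatenate
    if isinstance(xpath_results, list):
        return ''.join(extract_text_sketch(r) for r in xpath_results)
    # element -> all text nodes via lxml's text_content()
    if isinstance(xpath_results, html.HtmlElement):
        return xpath_results.text_content()
    # string -> already done
    return str(xpath_results)

doc = html.fromstring('<p>Example <span id="42">#2</span></p>')
```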
- searx.utils.extract_url(xpath_results, base_url) str [source]¶
Extract and normalize URL from lxml Element
- Args:
xpath_results (Union[List[html.HtmlElement], html.HtmlElement]): lxml Element(s)
base_url (str): Base URL
- Example:
>>> def f(s, search_url):
...     return searx.utils.extract_url(html.fromstring(s), search_url)
>>> f('<span id="42">https://example.com</span>', 'http://example.com/')
'https://example.com/'
>>> f('https://example.com', 'http://example.com/')
'https://example.com/'
>>> f('//example.com', 'http://example.com/')
'http://example.com/'
>>> f('//example.com', 'https://example.com/')
'https://example.com/'
>>> f('/path?a=1', 'https://example.com')
'https://example.com/path?a=1'
>>> f('', 'https://example.com')
raise lxml.etree.ParserError
>>> searx.utils.extract_url([], 'https://example.com')
raise ValueError
- Raises:
ValueError
lxml.etree.ParserError
- Returns:
str: normalized URL
- searx.utils.gen_useragent(os_string: Optional[str] = None) str [source]¶
Return a random browser User Agent
See searx/data/useragents.json
- searx.utils.get_engine_from_settings(name: str) Dict [source]¶
Return the configuration of the given engine name from settings.yml.
- searx.utils.get_torrent_size(filesize: str, filesize_multiplier: str) Optional[int] [source]¶
- Args:
filesize (str): size
filesize_multiplier (str): TB, GB, …. TiB, GiB…
- Returns:
int: number of bytes
- Example:
>>> get_torrent_size('5', 'GB')
5368709120
>>> get_torrent_size('3.14', 'MiB')
3140000
- searx.utils.get_xpath(xpath_spec: Union[str, XPath]) XPath [source]¶
Return cached compiled XPath
There is no thread lock. Worst case scenario, the same XPath string is compiled more than once.
- Args:
xpath_spec (str|lxml.etree.XPath): XPath as a str or lxml.etree.XPath
- Returns:
result (lxml.etree.XPath): the compiled XPath.
- Raises:
TypeError: Raise when xpath_spec is neither a str nor a lxml.etree.XPath
SearxXPathSyntaxException: Raise when there is a syntax error in the XPath
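The compile-once caching pattern can be sketched like this (the cache name and function name are hypothetical; the real implementation raises searx-specific exceptions instead of plain TypeError for invalid specs):

```python
from lxml import etree

_XPATH_CACHE = {}  # hypothetical module-level cache, keyed by the XPath string

def get_xpath_sketch(xpath_spec):
    # Already-compiled XPath objects are returned as-is; str specs are
    # compiled once and cached. No lock is taken, so in the worst case the
    # same str is compiled more than once under concurrency.
    if isinstance(xpath_spec, etree.XPath):
        return xpath_spec
    if isinstance(xpath_spec, str):
        if xpath_spec not in _XPATH_CACHE:
            _XPATH_CACHE[xpath_spec] = etree.XPath(xpath_spec)
        return _XPATH_CACHE[xpath_spec]
    raise TypeError('xpath_spec must be str or lxml.etree.XPath')
```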
- searx.utils.html_to_text(html_str: str) str [source]¶
Extract text from a HTML string
- Args:
html_str (str): string HTML
- Returns:
str: extracted text
- Examples:
>>> html_to_text('Example <span id="42">#2</span>')
'Example #2'
>>> html_to_text('<style>.span { color: red; }</style><span>Example</span>')
'Example'
- searx.utils.int_or_zero(num: Union[List[str], str]) int [source]¶
Convert num to int or 0. num can be either a str or a list. If num is a list, the first element is converted to int (or return 0 if the list is empty). If num is a str, see convert_str_to_int
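A minimal sketch of the described behavior (illustrative only; the real function delegates the str case to convert_str_to_int):

```python
def int_or_zero_sketch(num):
    # list -> take the first element, or return 0 for an empty list
    if isinstance(num, list):
        if not num:
            return 0
        num = num[0]
    # str (or anything else) -> int on success, 0 otherwise
    try:
        return int(num)
    except (ValueError, TypeError):
        return 0
```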
- searx.utils.is_valid_lang(lang) Optional[Tuple[bool, str, str]] [source]¶
Return the language code and name if lang describes a language, else None.
- Examples:
>>> is_valid_lang('zz')
None
>>> is_valid_lang('uk')
(True, 'uk', 'ukrainian')
>>> is_valid_lang(b'uk')
(True, 'uk', 'ukrainian')
>>> is_valid_lang('en')
(True, 'en', 'english')
>>> searx.utils.is_valid_lang('Español')
(True, 'es', 'spanish')
>>> searx.utils.is_valid_lang('Spanish')
(True, 'es', 'spanish')
- searx.utils.match_language(locale_code, lang_list=[], custom_aliases={}, fallback: Optional[str] = 'en-US') Optional[str] [source]¶
Get the language code from lang_list that best matches locale_code.
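A simplified sketch of the matching order (the real function also applies custom_aliases and locale normalization; this function name is illustrative):

```python
def match_language_sketch(locale_code, lang_list, fallback='en-US'):
    # 1. exact match on the full locale code
    if locale_code in lang_list:
        return locale_code
    # 2. match on the bare language part only
    lang = locale_code.split('-')[0]
    if lang in lang_list:
        return lang
    # 3. any locale in lang_list sharing that language
    for code in lang_list:
        if code.split('-')[0] == lang:
            return code
    # 4. give up and use the fallback
    return fallback
```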
- searx.utils.normalize_url(url: str, base_url: str) str [source]¶
Normalize URL: add protocol, join URL with base_url, add trailing slash if there is no path
- Args:
url (str): Relative URL
base_url (str): Base URL, it must be an absolute URL.
- Example:
>>> normalize_url('https://example.com', 'http://example.com/')
'https://example.com/'
>>> normalize_url('//example.com', 'http://example.com/')
'http://example.com/'
>>> normalize_url('//example.com', 'https://example.com/')
'https://example.com/'
>>> normalize_url('/path?a=1', 'https://example.com')
'https://example.com/path?a=1'
>>> normalize_url('', 'https://example.com')
'https://example.com/'
>>> normalize_url('/test', '/path')
raise ValueError
- Raises:
ValueError
lxml.etree.ParserError
- Returns:
str: normalized URL