Reference

Core

Api

RedditApi

A class that provides functions to access the Reddit API.
Methods

def generate_headers(self, token: str) -> Dict[str, str]: Generates headers for a Reddit API request.

def generate_reddit_api_token(self, verbose: bool = False) -> str: Generate a Reddit API token using the user's credentials.

def get_logged_user_profile(self, verbose: bool) -> Union[Dict, None]: Retrieve the logged-in user's profile information.

def generate_params_for_reddit_api_req(self, after: Optional[str], before: Optional[str], count: Optional[int], limit: Optional[int], show: Optional[str], sr_detail: bool) -> Dict[str, str]: Generate a dictionary of parameters to be used in a Reddit API request.

generate_headers(token)

Generates headers for a Reddit API request.

Parameters:

Name Type Description Default
token str

A Reddit API access token.

required

Returns:

Type Description
Dict[str, str]

A dictionary of headers with the authorization token added.
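As a rough illustration, the header construction might look like the sketch below. It assumes the token is sent as a bearer token in the `Authorization` header next to a `User-Agent` entry; the standalone function and the default user-agent string are hypothetical stand-ins, not the class's actual code.

```python
from typing import Dict


def generate_headers(token: str, user_agent: str = "example-app/0.1") -> Dict[str, str]:
    """Return request headers that carry the access token (illustrative only)."""
    return {
        "User-Agent": user_agent,  # hypothetical default value
        "Authorization": f"bearer {token}",
    }


headers = generate_headers("abc123")
```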

generate_params_for_reddit_api_req(after, before, count, limit, show, sr_detail)

Generate a dictionary of parameters to be used in a Reddit API request.

Parameters:

Name Type Description Default
after Optional[str]

The fullname of the post to start after.

required
before Optional[str]

The fullname of the post to start before.

required
count Optional[int]

The number of items in the listing to skip.

required
limit Optional[int]

The maximum number of items to return.

required
show Optional[str]

The types of items to show.

required
sr_detail Optional[bool]

Whether to return details about the subreddit.

required

Returns:

Type Description
Dict[str, object]

A dictionary containing the specified parameters for a Reddit API request.
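A minimal sketch of how such a parameter dictionary could be assembled. It assumes that parameters left as `None` are simply omitted from the request and that `sr_detail` is only sent when it is true; the function name and defaults are illustrative, not the library's own.

```python
from typing import Dict, Optional


def generate_params(
    after: Optional[str] = None,
    before: Optional[str] = None,
    count: Optional[int] = None,
    limit: Optional[int] = None,
    show: Optional[str] = None,
    sr_detail: bool = False,
) -> Dict[str, object]:
    """Build listing parameters, skipping every value that is None."""
    candidates = {
        "after": after,
        "before": before,
        "count": count,
        "limit": limit,
        "show": show,
    }
    params: Dict[str, object] = {k: v for k, v in candidates.items() if v is not None}
    if sr_detail:
        params["sr_detail"] = True  # assumption: only sent when requested
    return params


params = generate_params(after="t3_abc", limit=25)
```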

generate_reddit_api_token(verbose)

Generate a Reddit API token using the user's credentials.

Parameters:

Name Type Description Default
verbose Optional[bool]

Whether to log additional debug information. Defaults to False.

False

Raises:

Type Description
TokenErrorException

If the request to obtain the token fails.

Returns:

Name Type Description
str str

The generated Reddit API token.
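The sketch below shows one way a token request could be prepared. It assumes the OAuth2 password grant against `https://www.reddit.com/api/v1/access_token`, with the client ID and secret sent through HTTP basic authentication. The helper name is hypothetical, and the request is only built here, not sent.

```python
import base64
import urllib.parse
import urllib.request

TOKEN_URL = "https://www.reddit.com/api/v1/access_token"  # assumed endpoint


def build_token_request(
    client_id: str, secret_token: str, username: str, password: str, user_agent: str
) -> urllib.request.Request:
    """Prepare (but do not send) a password-grant token request."""
    credentials = f"{client_id}:{secret_token}".encode("utf-8")
    basic = base64.b64encode(credentials).decode("ascii")
    body = urllib.parse.urlencode(
        {"grant_type": "password", "username": username, "password": password}
    ).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={"Authorization": f"Basic {basic}", "User-Agent": user_agent},
        method="POST",
    )


request = build_token_request("id", "secret", "user", "pass", "example-app/0.1")
```

Sending the request and reading `access_token` from the JSON response is left out; a failed response is where an error such as `TokenErrorException` would presumably be raised.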

get_logged_user_profile(verbose)

Retrieve the logged-in user's profile information.

Parameters:

Name Type Description Default
verbose Optional[bool]

If True, prints the API request information and response.

required

Returns:

Type Description
Union[Dict, None]

A dictionary of the user profile information if successful, otherwise None.

Raises:

Type Description
UserNotFoundException

If the user profile is unavailable or could not be reached.

Helpers

MainHelper

A class that provides helper functions for the main application.

Methods

export_threads_detailed_information(user_or_subreddit: str, export_mode: str, threads_list: dict[str, dict[str, dict[str, Any] | ResultSet | Any] | Any], output_directory: str, verbose: bool) -> None: Writes all the image URLs to a JSON file, sorted by subreddit and post.

scrape_user(reddit_user: str, sort: str, number_results: int, output_directory: str, verbose: bool) -> None: Scrapes a Reddit user's submissions, downloads the images, and exports detailed thread information.

scrape_subreddit(subreddits: Optional[List[str]], sorting_type: str, number_results: Optional[int], details: bool, output_directory: str, verbose: bool) -> None: Scrapes posts and comments from a subreddit.

export_threads_detailed_information(user_or_subreddit, export_mode, threads_list, output_directory, verbose)

Write all the image URLs to a JSON file, sorted by subreddit and post.

Parameters:

Name Type Description Default
output_directory str

Directory in which to write the detailed thread information.

required
user_or_subreddit str

The name of the user or subreddit to export data for.

required
export_mode str

The export mode, which can be "single", "multiple", or "user".

required
threads_list List[dict]

A list of dictionaries containing thread information.

required
verbose Optional[bool]

Whether to print verbose output.

required

Returns:

Type Description
None

None
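A minimal sketch of the export step, assuming the thread data is a JSON-serializable dictionary and that one file is written per user or subreddit. The function name, file naming scheme, and sample data are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def export_threads(name: str, threads: Dict[str, Any], output_directory: str) -> Path:
    """Write the thread dictionary to <output_directory>/<name>.json."""
    out_dir = Path(output_directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"{name}.json"
    with target.open("w", encoding="utf-8") as handle:
        # sort_keys gives a stable ordering of threads in the output file
        json.dump(threads, handle, indent=4, ensure_ascii=False, sort_keys=True)
    return target


sample = {"https://example.com/thread1": {"author": "someone", "urls": []}}
with tempfile.TemporaryDirectory() as tmp:
    path = export_threads("pics", sample, tmp)
    loaded = json.loads(path.read_text(encoding="utf-8"))
```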

scrape_subreddit(subreddits, sorting_type, number_results, details, output_directory, verbose)

Scrape posts and comments from a subreddit.

Parameters:

Name Type Description Default
subreddits List[str]

A list of subreddits to scrape. If None, the user's subreddits will be used.

required
sorting_type str

A string indicating how to sort the posts. Valid values: 'hot', 'new', 'top', 'controversial', 'rising'.

required
number_results int

The maximum number of posts to scrape. If None, all posts will be scraped.

required
details bool

If True, exports detailed information about each post to a JSON file.

required
output_directory str

Directory to output the downloaded files and reports.

required
verbose bool

If True, displays logging information during the scraping process.

required

Returns:

Type Description
None

None

scrape_user(reddit_user, sort, number_results, output_directory, verbose)

Scrape a Reddit user's submissions, download the images, and export detailed thread information.

Parameters:

Name Type Description Default
reddit_user str

The name of the Reddit user to scrape.

required
sort str

The sort order of the posts to be scraped: hot, new, or top.

required
verbose Optional[bool]

Whether to print verbose output.

required
output_directory str

Directory to output the downloaded files and reports.

required
number_results int

The number of results to scrape.

required

Returns:

Type Description
None

None

Scrapers

ThreadScraper

A class that provides methods to scrape threads from subreddits or users.

Methods

def scrape_threads(self, subreddit_or_user: str, sort: str, scrape_mode: str, verbose: bool, max_counter: Optional[int] = None) -> Dict[str, Union[Dict[str, Union[Dict[str, Any], Dict[str, Any], ResultSet[Any], Dict[str, Any], Any]], Any]]: Scrape threads from a subreddit or user and return the results.

def scrape_single_thread(self, link: str, verbose: bool): Scrapes the given Reddit thread URL and returns a tuple containing a list of image URLs and a dictionary of comments.

scrape_single_thread(link, verbose)

Scrapes the given Reddit thread URL and returns a tuple containing a list of image URLs and a dictionary of comments.

Parameters:

Name Type Description Default
link str

The URL of the Reddit thread to be scraped.

required
verbose bool

Whether to print verbose logging messages.

required

Returns:

Type Description
tuple[Any, dict[str, dict[str, Any]]]

A tuple containing a list of image URLs and a dictionary of comments.

scrape_threads(subreddit_or_user, sort, scrape_mode, verbose, max_counter)

Scrape threads from a subreddit or user and return the results.

Parameters:

Name Type Description Default
subreddit_or_user str

The name of the subreddit or user from which to scrape threads.

required
sort str

The method to sort the threads, such as "hot" or "top".

required
scrape_mode str

The mode in which to scrape threads, either "subreddit" or "user".

required
verbose bool

A flag indicating whether to log verbose output.

required
max_counter Optional[int]

The maximum number of threads to scrape.

required

Returns:

Type Description
Dict[str, Union[Dict[str, Union[Dict[str, Any], Dict[str, Any], ResultSet[Any], Dict[str, Any], Any]], Any]]

A dictionary of thread URLs and their corresponding information, such as their author, datetime, rating, URLs, and comments.

CommentScraper

A class for scraping comments and replies from a given HTML element.

Attributes:

Name Type Description
logging_funcs LoggingSetup

An instance of LoggingSetup for logging purposes.

constants CommonConstants

An instance of ConstantsNamespace for constants.

validations UrlValidations

An instance of UrlValidations for validating URLs.

image_downloader ImageDownloader

An instance of ImageDownloader for downloading images.

scraper_helper ScraperHelper

An instance of ScraperHelper for helper methods.

Methods

def scrape_comments(soup): Scrape comments from the given comments' element.

def scrape_replies(reply_divs): Scrapes the replies from the given reply divs and returns a list of dictionaries representing each reply.

scrape_comments(soup)

Scrape comments from the given comments' element.

Parameters:

Name Type Description Default
soup bs4.BeautifulSoup

A BeautifulSoup object representing the HTML or XML source.

required

Returns:

Type Description
Tuple[List[Dict[str, Any]], List[str]]

A tuple containing two items:

- A list of dictionaries representing each comment, where each dictionary contains the following keys:
    - 'text': A string representing the text content of the comment.
    - 'author': A dictionary representing the author of the comment.
    - 'rating': A dictionary representing the rating of the comment, where each key is a rating category (e.g. 'score_likes') and the value is the score for that category.
    - 'datetime': A dictionary representing the datetime information of the comment, where each key is a datetime category (e.g. 'time_since_posting') and the value is the corresponding datetime information.
    - 'numChildren': An integer representing the number of child replies of the comment.
    - 'hasChildren': A boolean indicating whether the comment has child replies.
    - 'urls': A list of strings representing the URLs found in the comment.
    - 'replies': A list of nested dictionaries representing the child replies of the comment.
- A list of strings representing the URLs of any images found in the comments.

scrape_replies(reply_divs)

Scrapes the replies from the given reply divs and returns a list of dictionaries representing each reply.

Parameters:

Name Type Description Default
reply_divs List[Any]

A list of reply divs to scrape from.

required

Returns:

Type Description
Tuple[List[Dict[str, Any]], List[str]]

A tuple containing:

- A list of dictionaries representing each reply, where each dictionary contains the following keys:
    - 'text': A string representing the text content of the reply.
    - 'author': A dictionary representing the author of the reply.
    - 'rating': A dictionary representing the rating of the reply, where each key is a rating category (e.g. 'score_likes') and the value is the score for that category.
    - 'datetime': A dictionary representing the datetime information of the reply, where each key is a datetime category (e.g. 'time_since_posting') and the value is the corresponding datetime information.
    - 'numChildren': An integer representing the number of children replies for the given reply.
    - 'hasChildren': A boolean indicating whether the given reply has children replies or not.
    - 'urls': A list of strings representing the URLs found in the given reply.
    - 'replies': A list of nested dictionaries representing the children replies for the given reply.
- A list of unique URLs found in the scraped replies.
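To make the nested shape concrete, the sketch below builds a small reply tree using the keys listed above and walks it recursively to gather unique URLs. The sample data and the `collect_urls` helper are hypothetical and only illustrate the structure, not the scraper's own traversal.

```python
from typing import Any, Dict, List


def collect_urls(replies: List[Dict[str, Any]]) -> List[str]:
    """Gather unique URLs from replies and their nested children, in first-seen order."""
    seen: List[str] = []
    for reply in replies:
        nested = collect_urls(reply.get("replies", []))
        for url in reply.get("urls", []) + nested:
            if url not in seen:
                seen.append(url)
    return seen


replies = [
    {
        "text": "first reply",
        "urls": ["https://example.com/a.png"],
        "numChildren": 1,
        "hasChildren": True,
        "replies": [
            {
                "text": "nested reply",
                "urls": ["https://example.com/b.png", "https://example.com/a.png"],
                "numChildren": 0,
                "hasChildren": False,
                "replies": [],
            }
        ],
    }
]
unique_urls = collect_urls(replies)
```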

ScraperHelper

This class provides helper methods for web scraping comments and threads from forums.

Methods:

- def construct_author_dict(self, div_ele: BeautifulSoup) -> Dict[str, str]: Constructs a dictionary with the author's username and profile URL.
- def construct_rating_dict(self, div_ele: BeautifulSoup) -> Dict[str, str]: Constructs a dictionary with the comment's rating scores.
- def construct_thread_rating_dict(self, div_ele: BeautifulSoup) -> Dict[str, str]: Constructs a dictionary with the thread's rating scores.
- def construct_time_dict(self, div_ele: BeautifulSoup) -> Dict[str, str]: Constructs a dictionary with the time information of the comment or thread.
- def define_children_fields(self, div_ele: BeautifulSoup) -> Tuple[bool, int]: Defines the number of children and whether a comment or thread has children.
- def construct_urls_list(self, div_ele: BeautifulSoup) -> List[str]: Constructs a list of URLs from the div element that contains the URLs.

construct_author_dict(div_ele)

Constructs a dictionary with the author's username and profile URL.

Parameters:

Name Type Description Default
div_ele element.PageElement

A BeautifulSoup object representing a div element that contains the author's information.

required

Returns:

Type Description
Dict[str, str]

A dictionary with the author's username and profile URL.

construct_rating_dict(div_ele)

Constructs a dictionary with the comment's rating scores.

Parameters:

Name Type Description Default
div_ele element.PageElement

A BeautifulSoup object representing a div element that contains the comment's rating scores.

required

Returns:

Type Description
Dict[str, str]

A dictionary with the comment's rating scores.

construct_thread_rating_dict(div_ele)

Constructs a dictionary with the thread's rating scores.

Parameters:

Name Type Description Default
div_ele element.PageElement

A BeautifulSoup object representing a div element that contains the thread's rating scores.

required

Returns:

Type Description
Dict[str, str]

A dictionary with the thread's rating scores.

construct_time_dict(div_ele)

Constructs a dictionary with the time information of the comment or thread.

Parameters:

Name Type Description Default
div_ele element.PageElement

A BeautifulSoup object representing a div element that contains the time information.

required

Returns:

Type Description
Dict[str, str]

A dictionary with the time information of the comment or thread.

construct_urls_list(div_ele)

Constructs a list of URLs from the div element that contains the URLs.

Parameters:

Name Type Description Default
div_ele element.PageElement

A BeautifulSoup object representing a div element that contains URLs.

required

Returns:

Type Description
List[str]

A list of URLs that are valid image links.

define_children_fields(div_ele)

Defines the number of children and whether a comment or thread has children.

Parameters:

Name Type Description Default
div_ele element.PageElement

A BeautifulSoup object representing a div element that contains the information about children.

required

Returns:

Type Description
Tuple[bool, int]

A tuple with the boolean value indicating whether the comment or thread has children, and the number of children (an integer) if there are any.
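A small sketch of the idea, under the assumption that the child count appears as text such as "(3 children)" and is extracted with a pattern that matches decimal numbers. It works on a plain string rather than on a BeautifulSoup element, and the pattern shown is an assumption, not the value of `extract_num_children`.

```python
import re
from typing import Tuple

NUM_CHILDREN_PATTERN = re.compile(r"\d+")  # assumed pattern for decimal numbers


def define_children_fields(label: str) -> Tuple[bool, int]:
    """Return (has_children, num_children) parsed from a label string."""
    match = NUM_CHILDREN_PATTERN.search(label)
    num_children = int(match.group()) if match else 0
    return num_children > 0, num_children


with_children = define_children_fields("(3 children)")
without_children = define_children_fields("")
```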

get_list_of_img_files_in_dir(directory)

Generate a list of image files in a directory.

Parameters:

Name Type Description Default
directory pathlib.Path

The directory to search for image files.

required

Returns:

Name Type Description
list List[str]

A list of image file paths.
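One possible shape for this helper, assuming files are selected by extension and that subdirectories are not searched. The set of suffixes and the function name are illustrative guesses.

```python
import tempfile
from pathlib import Path
from typing import List

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}  # assumed extensions


def list_image_files(directory: Path) -> List[str]:
    """Return the sorted paths of image files directly inside `directory`."""
    return sorted(
        str(entry)
        for entry in directory.iterdir()
        if entry.is_file() and entry.suffix.lower() in IMAGE_SUFFIXES
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.PNG").write_bytes(b"")
    (root / "notes.txt").write_text("not an image", encoding="utf-8")
    names = [Path(p).name for p in list_image_files(root)]
```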

remove_empty_lists(lst)

Recursively remove empty lists from a nested list.

Parameters:

Name Type Description Default
lst list

A list to remove empty lists from.

required

Returns:

Name Type Description
list List[Dict[str, Any]]

A new list with all empty lists removed.
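The recursion can be sketched as follows. This version assumes that a nested list which becomes empty after cleaning is dropped as well; whether the real helper does so is not stated here.

```python
from typing import Any, List


def remove_empty_lists(lst: List[Any]) -> List[Any]:
    """Return a new list without empty lists, cleaning nested lists first."""
    cleaned: List[Any] = []
    for item in lst:
        if isinstance(item, list):
            item = remove_empty_lists(item)
            if not item:
                continue  # skip lists that are (or became) empty
        cleaned.append(item)
    return cleaned


result = remove_empty_lists([1, [], [2, []], [[]]])
```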

Logging

LoguruSetup

A class to set up logging for scripts.

Attributes:

Name Type Description
constants LoggingConstants

an instance of LoggingConstantsNamespace class.

Methods

def script_logger_config_dict: creates a configuration object for scripts to set up logging.

__init__()

Initializes LoggingSetup class.

script_logger_config_dict(logger, output_directory, log_filename, level=constants.default_log_file_level, log_format=constants.default_log_format, colorize=constants.default_log_colorizing, rotation=constants.default_log_rotation, retention=constants.default_log_retention, compression=constants.default_log_compression, delay=constants.default_log_delay, mode=constants.default_log_mode, buffering=constants.default_log_buffering, encoding=constants.default_log_encoding, serialize=constants.default_log_serialize, backtrace=constants.default_log_backtrace, diagnose=constants.default_log_diagnose, enqueue=constants.default_log_enqueue, catch=constants.default_log_catch, debug=False) staticmethod

Creates a configuration object for scripts to set up logging.

Parameters:

Name Type Description Default
output_directory str required
logger Any

the logging object.

required
log_filename str

the name of the log file.

required
level str

the logging level. Defaults to constants.default_log_file_level.

constants.default_log_file_level
log_format str

the logging format. Defaults to constants.default_log_format.

constants.default_log_format
colorize bool

whether to colorize the logging. Defaults to constants.default_log_colorizing.

constants.default_log_colorizing
rotation float

the rotation size of the log file. Defaults to constants.default_log_rotation.

constants.default_log_rotation
retention str

the retention period of the log file. Defaults to constants.default_log_retention.

constants.default_log_retention
compression str

the compression format of the log file. Defaults to constants.default_log_compression.

constants.default_log_compression
delay bool

whether to delay logging. Defaults to constants.default_log_delay.

constants.default_log_delay
mode str

the mode of the log file. Defaults to constants.default_log_mode.

constants.default_log_mode
buffering int

the buffering size of the log file. Defaults to constants.default_log_buffering.

constants.default_log_buffering
encoding str

the encoding format of the log file. Defaults to constants.default_log_encoding.

constants.default_log_encoding
serialize bool

whether to serialize the logging. Defaults to constants.default_log_serialize.

constants.default_log_serialize
backtrace bool

whether to include a backtrace in the logging. Defaults to constants.default_log_backtrace.

constants.default_log_backtrace
diagnose bool

whether to diagnose the logging. Defaults to constants.default_log_diagnose.

constants.default_log_diagnose
enqueue bool

whether to enqueue the logging. Defaults to constants.default_log_enqueue.

constants.default_log_enqueue
catch bool

whether to catch the logging. Defaults to constants.default_log_catch.

constants.default_log_catch
debug bool

whether to include debug messages. Defaults to False.

False
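For orientation, a configuration object of this kind could resemble the sketch below. It assumes the result is passed to Loguru's `logger.configure`, which accepts a `handlers` list of dictionaries whose keys mirror the arguments of `logger.add`. The helper name and all default values shown are placeholders rather than the values held in `LoggingConstants`.

```python
from pathlib import Path
from typing import Any, Dict


def build_logger_config(
    output_directory: str,
    log_filename: str,
    level: str = "DEBUG",
    log_format: str = "{time} | {level} | {message}",
    rotation: str = "10 MB",
    retention: str = "7 days",
) -> Dict[str, Any]:
    """Return a dict shaped for `logger.configure(**config)` (placeholder defaults)."""
    file_handler = {
        "sink": str(Path(output_directory) / log_filename),
        "level": level,
        "format": log_format,
        "rotation": rotation,
        "retention": retention,
        "encoding": "utf-8",
    }
    return {"handlers": [file_handler]}


config = build_logger_config("logs", "app.log")
# With Loguru installed, this could be applied with: logger.configure(**config)
```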

Logging Utils

LogRotator

Custom log rotator that rotates the log file based on time or on a size limit.

__init__(*, size, at)

Initializes a LogRotator instance.

Parameters:

Name Type Description Default
size float

The maximum size in bytes of the log file before it rotates.

required
at datetime.time

The time at which the log file should rotate.

required

should_rotate(message, file)

Determines whether the log file should rotate based on the given message and file.

Parameters:

Name Type Description Default
message loguru._record.Record

A LogRecord instance representing the log message.

required
file io.IOBase

An IOBase instance representing the log file.

required

Returns:

Type Description
bool

A boolean indicating whether the log file should rotate.
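The sketch below follows the size-or-time rotation recipe from the Loguru documentation, which matches the `size` and `at` parameters described here; whether this class is implemented the same way is an assumption. `FakeMessage` is a hypothetical stand-in for Loguru's message object, a string that carries a `record` dictionary.

```python
import datetime
import io


class LogRotator:
    def __init__(self, *, size: float, at: datetime.time) -> None:
        now = datetime.datetime.now()
        self._size_limit = size
        # Next point in time at which a rotation is due.
        self._time_limit = now.replace(
            hour=at.hour, minute=at.minute, second=at.second, microsecond=0
        )
        if now >= self._time_limit:
            self._time_limit += datetime.timedelta(days=1)

    def should_rotate(self, message, file) -> bool:
        file.seek(0, 2)  # move to the end to measure the current file size
        if file.tell() + len(message) > self._size_limit:
            return True
        excess = message.record["time"].timestamp() - self._time_limit.timestamp()
        if excess >= 0:
            elapsed_days = datetime.timedelta(seconds=excess).days
            self._time_limit += datetime.timedelta(days=elapsed_days + 1)
            return True
        return False


class FakeMessage(str):
    """Hypothetical stand-in for a Loguru message."""

    record = {"time": datetime.datetime.now()}


rotator = LogRotator(size=10, at=datetime.time(0, 0, 0))
small_file = io.BytesIO(b"12345678")
too_big = rotator.should_rotate(FakeMessage("abc"), small_file)
fits = rotator.should_rotate(FakeMessage("a"), small_file)
```

With Loguru, a callable like `rotator.should_rotate` can be passed as the `rotation` argument of `logger.add`.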

PaddingFormatter

A logging formatter that adjusts padding length based on previously encountered values.

This formatter is used to vertically align log messages by fixing the length of {name}, {function}, and {line} fields.

Attributes:

Name Type Description
padding int

The padding length to use.

fmt str

The log format string to use.

Methods

  • format(record): Formats the log record according to the format string.

__init__()

Initializes a new PaddingFormatter instance.

format(record)

Formats the specified log record.

Parameters:

Name Type Description Default
record loguru.Record

The log record to format.

required

Returns:

Type Description
str

The formatted log record string.
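A sketch of the dynamic-padding idea, modeled on the formatter recipe in the Loguru documentation and assuming this class works along the same lines. In that recipe, `format` stores the padding in `record["extra"]` and returns the format string, which Loguru then fills in. The record dictionaries below are simplified stand-ins.

```python
class PaddingFormatter:
    def __init__(self) -> None:
        self.padding = 0
        self.fmt = (
            "{time} | {level: <8} | {name}:{function}:{line}{extra[padding]} | "
            "{message}\n{exception}"
        )

    def format(self, record) -> str:
        # Width of the location field for this record.
        length = len("{name}:{function}:{line}".format(**record))
        # Remember the widest location seen so far and pad shorter ones up to it.
        self.padding = max(self.padding, length)
        record["extra"]["padding"] = " " * (self.padding - length)
        return self.fmt


formatter = PaddingFormatter()
short = {"name": "a", "function": "b", "line": 1, "extra": {}}
longer = {"name": "abc", "function": "b", "line": 1, "extra": {}}
formatter.format(short)
formatter.format(longer)
formatter.format(short)  # now padded to the width of the longer record
```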

logger_wraps(*, entry=True, exit_trigger=True, level='DEBUG')

Decorator that logs entry and exit of a function.

Parameters:

Name Type Description Default
entry bool

Whether to log the entry of the function.

True
exit_trigger bool

Whether to log the exit of the function.

True
level str

The logging level to use for the messages.

'DEBUG'

Returns:

Type Description
Callable[..., Any]

A function decorator.

Example

@logger_wraps()
def foo(x):
    return x**2

timeit(func)

Decorator that times the execution of a function and logs the elapsed time.

Parameters:

Name Type Description Default
func Callable[..., Any]

The function to decorate.

required

Returns:

Type Description
Callable[..., Any]

The decorated function.

Example

@timeit
def foo(x):
    return x**2
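A timing decorator of this kind can be sketched as below. To stay self-contained, this version logs through the standard library's `logging` module rather than Loguru, and the message format is an assumption.

```python
import functools
import logging
import time
from typing import Any, Callable

log = logging.getLogger(__name__)


def timeit(func: Callable[..., Any]) -> Callable[..., Any]:
    """Log how long each call to `func` takes, then return its result."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapped(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        log.debug("Function '%s' executed in %.6f s", func.__name__, elapsed)
        return result

    return wrapped


@timeit
def foo(x):
    return x**2
```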

Constants

CommonConstants

A class that contains constants used throughout the application.

Attributes:

Name Type Description
old_reddit_url str

Old reddit base url.

reddit_api_base_url str

Reddit API base url.

resolutions defaultdict

Mapping of the most popular resolutions to their labels.

default_logger_format str

Default logger format.

default_logger_date_format str

Default logger date format.

attrs dict

Default attributes for scraping reddit posts.

possible_urls list

Valid possible url prefixes that may appear during scraping.

invalid_urls list

Invalid possible url prefixes that may appear during scraping.

match_url str

Regex that matches URLs in the format https://subdomain.domain.

domain_regex str

Regex for validating domains.

Properties

logs_default_output_directory (str): Logs default output directory.
user_reports_default_output_directory (str): User reports default output directory.
subreddits_reports_default_output_directory (str): Subreddits reports default output directory.
user_img_downloads_default_output_directory (str): User image downloads default output directory.
subreddits_img_downloads_default_output_directory (str): Subreddits image downloads default output directory.
client_id (str): User API key.
secret_token (str): User API secret.
username (str): User Reddit username.
password (str): User Reddit password.
user_subreddits_list (str): User subreddit list. A string of comma-separated subreddits.
user_profile_to_scrape (str): Reddit user profile to scrape.
user_subreddits_sort_method (str): Sort method for the user's subreddit list.
reddit_headers (dict): A dictionary containing the user agent header information.
check_mark_symbol (str): Check mark symbol.
cross_symbol (str): Cross symbol.
reddit_url (str): New Reddit base URL.
old_reddit_url (str): Old Reddit base URL.
reddit_api_base_url (str): Reddit API base URL.
output_path (Path): Output path for subreddits.
resolutions (defaultdict): A dictionary containing a tuple of integers representing a resolution as the key and the corresponding resolution label as the value.
user_agent (str): User agent information for Reddit.

attrs property

Returns the default attributes for scraping Reddit posts.

Returns:

Type Description
dict

a dictionary containing the default attributes.

check_mark_symbol property

Returns:

Type Description
str

Check mark symbol.

client_id property

Returns:

Type Description
str

User API key.

cross_symbol property

Returns:

Type Description
str

Cross symbol.

current_date property

Returns the current date and time as a formatted string.

Returns:

Type Description
str

A formatted date string in the format: "dd_mm_yyyy_hh_mm_ss".
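The "dd_mm_yyyy_hh_mm_ss" layout corresponds to the `strftime` pattern shown in this sketch. The helper takes an explicit `datetime` so the result is reproducible; the property itself presumably uses the current time.

```python
from datetime import datetime

DATE_FORMAT = "%d_%m_%Y_%H_%M_%S"  # day_month_year_hour_minute_second


def format_timestamp(moment: datetime) -> str:
    """Format a datetime as dd_mm_yyyy_hh_mm_ss."""
    return moment.strftime(DATE_FORMAT)


stamp = format_timestamp(datetime(2023, 4, 5, 13, 7, 9))
# stamp == "05_04_2023_13_07_09"
```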

domain_regex property

Returns the regular expression for validating domains.

The first regular expression uses a positive lookahead to assert that the string contains between 1 and 254 characters ((?=^.{1,254}$)). Then it uses a capturing group to match one or more repetitions of a subpattern that consists of one or more characters that are not a digit followed by an optional dot. Finally, it uses a non-capturing group to match two or more alphabetical characters ((?:[a-zA-Z]{2,})$). In summary, this regex matches a string that consists of one or more non-digit characters followed by an optional dot, and ends with two or more alphabetical characters. This would match a domain name such as "example.com".

The second regular expression is a simplified version of the first one, and it only matches domain names that consist of one or more subdomains separated by dots, followed by a top-level domain. The regex starts by using a capturing group to match one or more repetitions of a subpattern that consists of a letter or digit ([a-zA-Z0-9]), followed by an optional subpattern that consists of between 0 and 61 characters that are either a letter, digit, or a hyphen (([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?), followed by a dot. This pattern is repeated one or more times, followed by a subpattern that matches two or more alphabetical characters ([a-zA-Z]{2,}). In summary, this regex matches a string that consists of one or more subdomains separated by dots, followed by a top-level domain. This would match a domain name such as "sub.example.com".

Returns:

Type Description
str

a string containing the regular expression.
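The description above can be turned into a pattern along these lines. This is a reconstruction from the fragments quoted in the text, combining the length lookahead of the first expression with the label structure of the second; it is not necessarily the exact string the property returns.

```python
import re

# Reconstructed from the prose description; treat as an approximation.
DOMAIN_PATTERN = re.compile(
    r"^(?=.{1,254}$)"                               # overall length: 1 to 254 characters
    r"([a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+"  # one or more labels, each ending in a dot
    r"[a-zA-Z]{2,}$"                                # top-level domain: two or more letters
)


def is_valid_domain(candidate: str) -> bool:
    """Return True if `candidate` looks like a domain name such as sub.example.com."""
    return DOMAIN_PATTERN.match(candidate) is not None
```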

extract_num_children property

Extracts decimal numbers from a string.

Returns:

Type Description
str

A regular expression pattern that matches decimal numbers.

invalid_url_prefixes property

Returns the invalid possible URL prefixes that may appear during scraping.

Returns:

Type Description
list

a list of strings containing the invalid URL prefixes.

logs_default_output_directory property

Returns:

Type Description
str

Logs default output directory.

match_url property

Returns the regular expression that matches URLs in the format https://subdomain.domain.

Returns:

Type Description
str

a string containing the regular expression.

old_reddit_url property

Returns:

Type Description
str

Old Reddit base URL.

output_path property

Returns:

Type Description
Path

Output path for subreddits.

password property

Returns:

Type Description
str

User Reddit password.

path_regex property

Returns the regular expression for validating paths.

Returns:

Type Description
str

a string containing the regular expression.

path_regex_with_query_params property

Returns the regular expression for validating paths with query parameters.

Returns:

Type Description
str

a string containing the regular expression.

possible_urls property

Returns the valid possible URL prefixes that may appear during scraping.

Returns:

Type Description
list

a list of strings containing the valid URL prefixes.

reddit_api_base_url property

Returns:

Type Description
str

Reddit API base URL.

reddit_headers property

Sets up the header info, which gives Reddit a brief description of the app.

Returns:

Type Description
dict

A dictionary containing the user agent header information.

reddit_url property

Returns:

Type Description
str

New Reddit base URL.

remove_empty_groups_from_comma_or_semicolon_separated_string property

Regular expression used to remove any remaining empty groups in a comma- or semicolon-separated string.

Returns:

Type Description
str

a string containing the regular expression.

replace_consecutive_commas_or_semicolons_regex property

Regular expression used to replace consecutive commas or semicolons with a single comma in a comma- or semicolon-separated string.

Returns:

Type Description
str

a string containing the regular expression.
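A sketch of how such clean-up patterns might be applied to a user-supplied list of subreddits. Both patterns are assumptions written for illustration: one collapses runs of two or more separators into a single comma, the other strips separators left at either end of the string.

```python
import re

CONSECUTIVE_SEPARATORS = re.compile(r"[,;]{2,}")  # assumed pattern
EDGE_SEPARATORS = re.compile(r"^[,;]+|[,;]+$")    # assumed pattern


def normalize_list_string(value: str) -> str:
    """Collapse repeated separators, then drop leading and trailing ones."""
    collapsed = CONSECUTIVE_SEPARATORS.sub(",", value)
    return EDGE_SEPARATORS.sub("", collapsed)


cleaned = normalize_list_string(",,pics,,earthporn;;cats;")
# cleaned == "pics,earthporn,cats"
```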

resolutions property

Returns the most popular resolutions and their labels.

Returns:

Type Description
defaultdict

a dictionary containing a tuple of integers representing a resolution as the key and the corresponding resolution label as the value.

secret_token property

Returns:

Type Description
str

User API secret.

subreddits_img_downloads_default_output_directory property

Returns:

Type Description
str

Subreddits image downloads default output directory.

subreddits_reports_default_output_directory property

Returns:

Type Description
str

Subreddits reports default output directory.

user_agent property

Returns the User-Agent header for HTTP requests.

Returns:

Type Description
dict

a dictionary containing the User-Agent header.

user_img_downloads_default_output_directory property

Returns:

Type Description
str

User image downloads default output directory.

user_profile_to_scrape property

Returns:

Type Description
str

Reddit user profile to scrape.

user_reports_default_output_directory property

Returns:

Type Description
str

User reports default output directory.

user_subreddits_list property

Returns:

Type Description
str

User subreddit list. A string of comma-separated subreddits.

user_subreddits_sort_method property

Returns:

Type Description
str

Sort method for the user's subreddit list.

username property

Returns:

Type Description
str

User Reddit username.

validate_and_split_string_regex property

Validates if a string is composed of values separated by commas or semicolons and only contains plus signs, underscores, and alphanumeric characters.

Returns:

Type Description
str

A regular expression pattern that matches the validated string pattern.
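The validation described here can be sketched with a pattern like the one below, which accepts alphanumeric characters, underscores, and plus signs separated by single commas or semicolons. The pattern and the helper name are assumptions made for illustration.

```python
import re
from typing import List

VALID_LIST_PATTERN = re.compile(r"^[A-Za-z0-9_+]+(?:[,;][A-Za-z0-9_+]+)*$")  # assumed


def validate_and_split(value: str) -> List[str]:
    """Split a validated comma- or semicolon-separated string into its items."""
    if not VALID_LIST_PATTERN.match(value):
        raise ValueError(f"Not a valid separated list: {value!r}")
    return re.split(r"[,;]", value)


items = validate_and_split("pics;earth_porn,a+b")
```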

LoggingConstants

A class that contains logging constants used for configuring logging settings. The properties of this class represent the logging levels and their corresponding severity levels, as well as various symbols and log formats used in the logging process.

Methods:

log_filename (str): The name of the log file to be used for logging.
trace_level (str): The logging level for trace messages.
trace_severity_level (int): The severity level for trace messages.
debug_level (str): The logging level for debug messages.
debug_severity_level (int): The severity level for debug messages.
info_level (str): The logging level for info messages.
info_severity_level (int): The severity level for info messages.
success_level (str): The logging level for success messages.
success_severity_level (int): The severity level for success messages.
warning_level (str): The logging level for warning messages.
warning_severity_level (int): The severity level for warning messages.
error_level (str): The logging level for error messages.
error_severity_level (int): The severity level for error messages.
critical_level (str): The logging level for critical messages.
critical_severity_level (int): The severity level for critical messages.
bug_symbol (str): The symbol used to represent a bug.
robot_symbol (str): The symbol used to represent a robot.
rocket_symbol (str): The symbol used to represent a rocket.
red_alarm_symbol (str): The symbol used to represent a red alarm.
red_circle_symbol (str): The symbol used to represent a red circle.
green_circle_symbol (str): The symbol used to represent a green circle.
warning_symbol (str): The symbol used to represent a warning.
lightning_bolt_symbol (str): The lightning bolt symbol as a string: "⚡".
skull_symbol (str): The skull symbol as a string: "☠️".
check_mark_symbol (str): The check mark symbol as a string: "✔️".
cross_symbol (str): The cross symbol as a string: "❌".
info_symbol (str): The information symbol as a string: "🛈 ".
default_log_format (str): The default log format as a string, which consists of a time stamp, log level, file name, function name, line number, and log message.
default_log_format2 (str): An alternative default log format as a string, which consists of a time stamp, log level, file name, function name, line number, and log message.
padding_log_format (str): A padded log format as a string, which consists of a time stamp, log level, file name, function name, line number, log message, and exception details.

bug_symbol property

The bug symbol as a string: "🐞".

Returns:

Type Description
str

the bug symbol as a string: "🐞".

check_mark_symbol property

The check mark symbol as a string: "✔️".

Returns:

Type Description
str

the check mark symbol as a string: "✔️"

critical_level property

The logging level for critical messages.

Returns:

Type Description
str

the logging level for critical messages.

critical_severity_level property

The severity level for critical messages.

Returns:

Type Description
int

the severity level for critical messages.

cross_symbol property

The cross symbol as a string: "❌"

Returns:

Type Description
str

the cross symbol as a string: "❌".

debug_level property

The logging level for debug messages.

Returns:

Type Description
str

the logging level for debug messages.

debug_mode_flag property

A boolean value indicating whether debug mode is enabled or not.

Returns:

Type Description
bool

boolean value indicating whether debug mode is enabled or not

debug_severity_level property

The severity level for debug messages.

Returns:

Type Description
int

the severity level for debug messages.

default_log_backtrace property

A boolean representing whether a backtrace should be included in log messages by default.

Returns:

Type Description
bool

a boolean representing whether a backtrace should be included in log messages by default

default_log_buffering property

An integer representing the default buffering value for log files.

Returns:

Type Description
int

an integer representing the default buffering value for log files

default_log_catch property

A boolean representing whether exceptions should be caught during log writing by default.

Returns:

Type Description
bool

a boolean representing whether exceptions should be caught during log writing by default

default_log_colorizing property

A boolean representing whether log output should be colorized by default.

Returns:

Type Description
bool

a boolean representing whether log output should be colorized by default

default_log_compression property

A string representing the default compression type for log files.

Returns:

Type Description
str

a string representing the default compression type for log files

default_log_delay property

A boolean representing whether log messages should be written immediately or with a delay.

Returns:

Type Description
bool

a boolean representing whether log messages should be written immediately or with a delay

default_log_diagnose property

A boolean representing whether diagnosis information should be included in log messages by default.

Returns:

Type Description
bool

a boolean representing whether diagnosis information should be included in log messages by default

default_log_encoding property

A string representing the default character encoding for log files.

Returns:

Type Description
str

a string representing the default character encoding for log files

default_log_enqueue property

A boolean representing whether log messages should be enqueued for writing by default.

Returns:

Type Description
bool

a boolean representing whether log messages should be enqueued for writing by default

default_log_format property

The default log format as a string, which consists of a time stamp, log level, file name, function name, line number, and log message.

Returns:

Type Description
str

the default log format as a string, which consists of a time stamp, log level, file name, function name, line number, and log message.

default_log_format2 property

An alternative default log format as a string, which consists of a time stamp, log level, file name, function name, line number, and log message.

Returns:

Type Description
str

an alternative default log format as a string

default_log_mode property

A string representing the default mode for opening log files.

Returns:

Type Description
str

a string representing the default mode for opening log files

default_log_retention property

A string representing the default retention period for log files.

Returns:

Type Description
str

a string representing the default retention period for log files

default_log_rotation property

A float representing the default maximum size of the log file before it is rotated.

Returns:

Type Description
float

a float representing the default maximum size of the log file before it is rotated

default_log_rotation_time property

A time object representing the default log rotation time, which is midnight (0 hours, 0 minutes, 0 seconds).

Returns:

Type Description
time

a time object representing the default log rotation time

default_log_serialize property

A boolean representing whether log messages should be serialized by default.

Returns:

Type Description
bool

a boolean representing whether log messages should be serialized by default

default_log_stdout_level property

A string representing the default logging level for standard output.

Returns:

Type Description
str
default_log_stfout_level property

A string representing the default logging level for standard output.

Returns:

Type Description
str
default_logger_date_format property

The default logger date format as a string, which consists of the day, month, year, hour, minute, and second.

Returns:

Type Description
str

the default logger date format as a string

default_logger_format property

The default logger format as a string, which consists of a time stamp, log level, logger name, and log message.

Returns:

Type Description
str

the default logger format as a string

error_level property

The logging level for error messages.

Returns:

Type Description
str

the logging level for error messages.

error_severity_level property

The severity level for error messages.

Returns:

Type Description
int

the severity level for error messages.

green_circle_symbol property

The green circle symbol as a string: "🟢".

Returns:

Type Description
str

the green circle symbol as a string: "🟢"

info_level property

The logging level for info messages.

Returns:

Type Description
str

the logging level for info messages.

info_severity_level property

The severity level for info messages.

Returns:

Type Description
int

the severity level for info messages

info_symbol property

The information symbol as a string: "🛈 "

Returns:

Type Description
str

the information symbol as a string: "🛈 ".

lightning_bolt_symbol property

The lightning bolt symbol as a string: "⚡".

Returns:

Type Description
str
log_filename property

The name of the log file to be used for logging.

Returns:

Type Description
str

the name of the log file to be used for logging.

padding_log_format property

A padded log format as a string, which consists of a time stamp, log level, file name, function name, line number, log message, and exception details.

Returns:

Type Description
str

a padded log format as a string

red_alarm_symbol property

The red alarm symbol as a string: "🚨".

Returns:

Type Description
str

the red alarm symbol as a string: "🚨".

red_circle_symbol property

The red circle symbol as a string: "🔴".

Returns:

Type Description
str

the red circle symbol as a string: "🔴"

robot_symbol property

The robot symbol as a string: "🤖".

Returns:

Type Description
str

the robot symbol as a string: "🤖".

rocket_symbol property

The rocket symbol as a string: "🚀".

Returns:

Type Description
str

the rocket symbol as a string: "🚀".

skull_symbol property

The skull symbol as a string: "☠️"

Returns:

Type Description
str

the skull symbol as a string: "☠️".

success_level property

The logging level for success messages.

Returns:

Type Description
str

the logging level for success messages.

success_severity_level property

The severity level for success messages.

Returns:

Type Description
int

the severity level for success messages.

trace_level property

The logging level for trace messages.

Returns:

Type Description
str

the logging level for trace messages.

trace_severity_level property

The severity level for trace messages.

Returns:

Type Description
int

the severity level for trace messages.

warning_level property

The logging level for warning messages.

Returns:

Type Description
str

the logging level for warning messages.

warning_severity_level property

The severity level for warning messages.

Returns:

Type Description
int

the severity level for warning messages.

warning_symbol property

The warning symbol as a string: "⚠️".

Returns:

Type Description
str

the warning symbol as a string: "⚠️"

default_log_file_level()

A string representing the default logging level for the log file.

Returns:

Type Description
str

Exceptions

Validations

ParameterValidations

A class that provides methods to validate the application parameters.

Methods

def validate_subreddits_parameter(self, input_str: str) -> list[str]: Validates that the input string uses only commas or semicolons as separators, and splits it into a list of the values separated by those characters. Returns the list of values.

def validate_user(self, token: str, reddit_user: str) -> Optional[bool]: Checks if the given username exists on Reddit API.

def validate_user_v2(self, reddit_user: str) -> Optional[bool]: Checks if the given username exists on Reddit API.

validate_subreddits_parameter(input_str)

Validates that the input string uses only commas or semicolons as separators, and splits it into a list of the values separated by those characters. Returns the list of values.

Parameters:

Name Type Description Default
input_str str

The string to be validated and split.

required

Returns:

Type Description
list[str]

If the input string contains only alphanumeric characters, the input string is returned as is. Otherwise, the result is a list of the substrings obtained by splitting the input on the separator (comma or semicolon) it contains.

Raises:

Type Description
ValueError

If the input string contains special characters other than commas and semicolons, or if the input string contains both commas and semicolons.
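
The validation rules above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the library's implementation: it allows underscores in names, always returns a list (even for a single value), and drops empty entries, none of which this reference confirms.

```python
import re
from typing import List


def validate_subreddits_parameter(input_str: str) -> List[str]:
    """Split a comma- or semicolon-separated string into a list of names."""
    # Assumption: letters, digits, and underscores are valid name characters.
    if re.search(r"[^A-Za-z0-9_,;]", input_str):
        raise ValueError("only commas and semicolons are allowed as separators")
    # Mixing both separators in one string is rejected.
    if "," in input_str and ";" in input_str:
        raise ValueError("use either commas or semicolons, not both")
    separator = ";" if ";" in input_str else ","
    # Assumption: empty entries from doubled or trailing separators are dropped.
    return [value for value in input_str.split(separator) if value]


print(validate_subreddits_parameter("pics,aww"))
print(validate_subreddits_parameter("pics;aww"))
```

Returning a list in every case keeps the caller's code uniform, which is why the sketch does not special-case a single name.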

validate_user(token, reddit_user)

Checks if the given username exists on Reddit API.

Parameters:

Name Type Description Default
token str

The OAuth token for Reddit API.

required
reddit_user str

The name of the user to check.

required

Returns:

Name Type Description
bool Optional[bool]

True if the user exists, False otherwise.

Notes

In line with Reddit's API rules, the client's User-Agent string was changed to something unique and descriptive, including the target platform, a unique application identifier, a version string, and your username as contact information.

This check can also be accomplished by targeting this endpoint: url = "{constants.reddit_api_base_url}/api/v1/user/{username}/trophies"
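
To make the User-Agent convention in the note above concrete, here is a small sketch. Reddit's API rules publish the pattern `<platform>:<app ID>:<version string> (by /u/<reddit username>)`; the helper names `build_user_agent` and `build_headers`, the sample values, and the exact header layout are assumptions for illustration only.

```python
from typing import Dict


def build_user_agent(platform: str, app_id: str, version: str, username: str) -> str:
    """Compose a User-Agent string in the pattern given by Reddit's API rules."""
    return f"{platform}:{app_id}:{version} (by /u/{username})"


def build_headers(token: str, user_agent: str) -> Dict[str, str]:
    """Hypothetical helper: attach the OAuth token and User-Agent to a request."""
    return {"Authorization": f"bearer {token}", "User-Agent": user_agent}


# Placeholder values, not real credentials or identifiers.
agent = build_user_agent("python", "com.example.scraper", "v0.1.0", "some_user")
print(build_headers("TOKEN", agent))
```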

validate_user_v2(reddit_user)

Checks if the given username exists on Reddit API.

Parameters:

Name Type Description Default
reddit_user str

The name of the user to check.

required

Returns:

Name Type Description
bool Optional[bool]

True if the user exists, False otherwise.

RedditApiValidations

A class that provides methods to validate users and subreddits against the Reddit API.

Methods
  • def validate_subreddits_list(self, subreddits): Validates a list of subreddits
  • def validate_subreddit(self, subreddit): Validates if a subreddit exists
  • def check_if_subreddit_exists(self, token: str, subreddit: str) -> Optional[bool]: Checks if the given subreddit exists on Reddit API.
  • def validate_reddit_user(self, reddit_user: str, verbose: bool) -> Optional[bool]: Checks if the given username exists on Reddit API.
  • def validate_user(self, token: str, reddit_user: str) -> Optional[bool]: Checks if the given username exists on Reddit API.
  • def validate_user_v2(self, reddit_user: str) -> Optional[bool]: Checks if the given username exists on Reddit API.
validate_reddit_user(reddit_user, verbose)

Checks if the given username exists on Reddit API.

Parameters:

Name Type Description Default
reddit_user str

The name of the user to check.

required
verbose bool

Controls the verbosity level

required

Returns:

Name Type Description
bool Optional[bool]

True if the user exists, False otherwise.

Notes

In line with Reddit's API rules, the client's User-Agent string was changed to something unique and descriptive, including the target platform, a unique application identifier, a version string, and your username as contact information.

This check can also be accomplished by targeting this endpoint: url = "{constants.reddit_api_base_url}/api/v1/user/{username}/trophies"

validate_subreddit(subreddit, verbose)

Checks if the given subreddit exists on Reddit API.

Parameters:

Name Type Description Default
subreddit str

The name of the subreddit to check.

required
verbose bool

Controls the verbosity level

required

Returns:

Name Type Description
bool Optional[bool]

True if the subreddit exists, False otherwise.

Notes

In line with Reddit's API rules, the client's User-Agent string was changed to something unique and descriptive, including the target platform, a unique application identifier, a version string, and your username as contact information.

validate_subreddits_list(subreddits, verbose)

Validates a list of subreddits

Parameters:

Name Type Description Default
subreddits str

A comma- or semicolon-separated list of subreddit names.

required
verbose bool

Controls the verbosity level

required

UrlValidations

A class that provides methods for validating URLs.

Methods
  • def validate_if_url_is_a_valid_img_link(self, url: str, possible_base_urls: List[str]) -> Optional[bool]: This method extracts the URL from the given string that contains one of the possible base URLs.
  • def validate_image_url(self, possible_base_urls: List[str], provided_url: str) -> bool: This method validates whether any of the URLs in a URL list is equal to, or starts with, a provided URL.

validate_if_url_is_a_valid_img_link(url, possible_base_urls)

This method extracts the URL from the given string that contains one of the possible base URLs.

Parameters:

Name Type Description Default
url str

The string to check and extract from.

required
possible_base_urls list

A list of possible base URLs that the extracted URL might contain.

required

Returns:

Name Type Description
bool Optional[bool]

True if the given string contains a URL that contains one of the possible base URLs, or False otherwise.

validate_image_url(possible_base_urls, provided_url)

This method validates whether any of the URLs in a URL list is equal to, or starts with, a provided URL.

Parameters:

Name Type Description Default
possible_base_urls list

A list of URLs to check.

required
provided_url str

The URL to compare against.

required

Returns:

Name Type Description
bool bool

True if any of the URLs in the list is equal to, or starts with, the provided URL, else False.
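
A prefix check of this kind can be sketched in a few lines. Note the assumption: the sketch reads the comparison as "the provided URL equals or starts with one of the base URLs", which seems the likelier intent given the parameter name `possible_base_urls`, but the reference does not settle the direction.

```python
from typing import List


def validate_image_url(possible_base_urls: List[str], provided_url: str) -> bool:
    """Return True when provided_url equals or begins with any base URL."""
    # str.startswith accepts a tuple of prefixes; equality is covered
    # because every string starts with itself.
    return provided_url.startswith(tuple(possible_base_urls))


# Example base URLs are placeholders chosen for the illustration.
bases = ["https://i.redd.it", "https://i.imgur.com"]
print(validate_image_url(bases, "https://i.redd.it/abc.jpg"))
print(validate_image_url(bases, "https://example.com/abc.jpg"))
```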

IO Operations

IOOperations

The IOOperations class provides methods for input-output operations, including writing detailed post information, validating directories, sorting files by MIME type and resolution, and deleting original files.

This class requires access to the ConstantsNamespace and LoggingSetup classes from the common package, as well as the PIL library for image processing.

Methods
  • def write_detailed_post_information(payload: Any, operation: str, filename) -> None: Writes payload to a file with the given filename, using the specified operation mode. The file format is determined by the extension of the filename.
  • def output_post_detailed_information(payload): Writes all image URLs in payload to a JSON file named src/output/image_urls.json, sorted by subreddit and post.
  • def validate_directories(input_dir): Checks if the input_dir directory exists. If it does not exist, logs an error message and exits the program.
  • def sort_by_mime_type_and_resolution(input_dir: Path, output_dir: Path, remove: bool): Sorts all files in the input_dir directory by MIME type and resolution. The sorted files are copied to the corresponding folders in the output_dir directory, and original files are deleted if the remove flag is set to True.
  • def delete_original_files(input_dir, remove): Deletes all original files in the input_dir directory if the remove flag is set to True.
create_output_folder_and_move_files(file_path, output_dir, matching_res, mimetype)

Creates the output folder and then moves the files to said folder

Parameters:

Name Type Description Default
file_path Path

Path to the file to be moved

required
output_dir Path

Path to the output directory

required
matching_res str

Matching resolution of the file

required
mimetype str

Mimetype of the file

required
delete_original_files(input_dir, remove, verbose)

Delete the original files from the input directory if remove is True.

Parameters:

Name Type Description Default
input_dir str

The path to the input directory.

required
remove bool

Whether to remove the original files.

required
verbose bool

Determines the level of verbosity

required
init_directory(path)

Checks if a directory exists at the given path and creates it if it does not exist.

Parameters:

Name Type Description Default
path str

The path to the directory to check/create.

required
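
The check-then-create behaviour described above maps onto a single `pathlib` call. This is a minimal sketch; returning the resulting `Path` is an assumption added for convenience, since the reference lists no return value.

```python
from pathlib import Path


def init_directory(path: str) -> Path:
    """Create the directory at path if it does not exist yet."""
    directory = Path(path)
    # parents=True creates missing ancestors; exist_ok=True makes the
    # call a no-op when the directory is already there.
    directory.mkdir(parents=True, exist_ok=True)
    return directory
```

Using `exist_ok=True` avoids a separate existence check, so there is no window between checking and creating in which another process could create the directory.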
sort_by_mime_type_and_resolution(input_dir, output_dir, remove, verbose)

Sort image files in the input directory by MIME type and resolution and save them to the output directory.

Parameters:

Name Type Description Default
input_dir Path

The path to the input directory.

required
output_dir Path

The path to the output directory.

required
remove bool

Whether to remove the original files.

required
verbose bool

Determines the level of verbosity

required
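
One way to picture the sorting step is a helper that derives a destination folder from a file's MIME type and resolution, then copies the file there. The sketch below is illustrative only: the helper names, the `image_png/1920x1080` folder layout, and passing the resolution in as a tuple are all assumptions. In practice the resolution would likely be read from the image with PIL, which is left out here to keep the example free of third-party imports.

```python
import mimetypes
import shutil
from pathlib import Path
from typing import Optional, Tuple


def target_folder(file_path: Path, output_dir: Path,
                  resolution: Optional[Tuple[int, int]]) -> Path:
    """Build a destination folder from MIME type and resolution (hypothetical layout)."""
    mime_type, _ = mimetypes.guess_type(file_path.name)
    mime_folder = (mime_type or "unknown").replace("/", "_")
    res_folder = f"{resolution[0]}x{resolution[1]}" if resolution else "unknown_resolution"
    return output_dir / mime_folder / res_folder


def copy_sorted(file_path: Path, output_dir: Path,
                resolution: Optional[Tuple[int, int]]) -> Path:
    """Copy file_path into its sorted folder and return the new location."""
    destination = target_folder(file_path, output_dir, resolution)
    destination.mkdir(parents=True, exist_ok=True)
    # copy2 preserves file metadata such as modification time.
    return Path(shutil.copy2(file_path, destination / file_path.name))
```

Copying first and deleting originals in a separate pass (as `delete_original_files` does when `remove` is True) means a failed copy never loses the source file.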
validate_directory(input_dir)

Check if the input directory exists and exit if it does not.

Parameters:

Name Type Description Default
input_dir str

The path to the input directory.

required
write_detailed_post_information(payload, operation, filename, verbose)

Write payload to a file in the specified format.

Parameters:

Name Type Description Default
payload Any

The data to be written to the file.

required
operation str

The file operation mode ('w' for write, 'a' for append, etc.).

required
filename str

The path and filename of the file to write to.

required
verbose bool

Boolean flag that controls the verbosity output

required

Returns:

Type Description
None

None
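
The idea of choosing the file format from the filename extension can be sketched as follows. This is a simplified illustration that omits the `verbose` flag and assumes only two cases, JSON for `.json` files and plain text for everything else; the real method may support other formats.

```python
import json
from pathlib import Path
from typing import Any


def write_detailed_post_information(payload: Any, operation: str, filename: str) -> None:
    """Write payload to filename, picking the format from the extension."""
    extension = Path(filename).suffix.lower()
    # operation is passed straight through as the open() mode ('w', 'a', ...).
    with open(filename, operation, encoding="utf-8") as handle:
        if extension == ".json":
            json.dump(payload, handle, indent=2, ensure_ascii=False)
        else:
            # Assumption: any other extension is written as plain text.
            handle.write(str(payload))
```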

RequestManager

The RequestManager class is used to send HTTP GET requests to URLs and receive the response.

Methods

request_page(endpoint, headers, params=None, timeout=10): sends an HTTP GET request to a given URL and returns the response if the request is successful. If the request fails, an exception is raised. The endpoint parameter is the URL to send the request to. The method includes a user-agent in the request headers to mimic browser activity, and has a default timeout of 10 seconds for the request.

request_page(endpoint, headers, params=None, timeout=10) staticmethod

The request_page method sends an HTTP GET request to a given URL and returns the response if the request is successful. A user-agent is included in the request headers to mimic browser activity. The endpoint parameter is the URL to send the request to. If the request is successful (status code 200), the response object is returned. If the request fails, an exception is raised. The request times out after 10 seconds by default.

Parameters:

Name Type Description Default
endpoint str

url to make the request

required
headers str

headers to use in the request

required
params dict

params to use in the request

None
timeout int

time to wait before request timeout

10
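
A GET helper with this signature can be sketched with the standard library alone. This is not the documented implementation: it swaps in `urllib` so the example has no third-party dependency, introduces a hypothetical `build_request` helper, and returns the raw response body rather than a response object.

```python
import urllib.parse
import urllib.request
from typing import Dict, Optional


def build_request(endpoint: str, headers: Dict[str, str],
                  params: Optional[Dict[str, object]] = None) -> urllib.request.Request:
    """Hypothetical helper: assemble a GET request with query parameters."""
    url = endpoint
    if params:
        url = f"{endpoint}?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(url, headers=headers, method="GET")


def request_page(endpoint: str, headers: Dict[str, str],
                 params: Optional[Dict[str, object]] = None,
                 timeout: int = 10) -> bytes:
    """Send the GET request and return the response body."""
    request = build_request(endpoint, headers, params)
    # urlopen raises urllib.error.HTTPError for error status codes and
    # urllib.error.URLError for connection problems.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Separating request construction from sending keeps the URL and header logic testable without network access.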

ImageDownloader

The ImageDownloader class is used to download images from URLs.

Methods

def download_img_url_list(self, subreddit_or_user, subreddit_or_user_threads_list, output_dir, log_mode, verbose): downloads the images found in the given subreddit or user threads and saves each one to a file with a filename inferred from the URL and the appropriate file extension. The parameters are: subreddit_or_user, which is the name of the subreddit or user; subreddit_or_user_threads_list, which holds the posts and their URLs; output_dir, which is the directory to save the downloaded images to (the current directory by default); log_mode, which determines where to output the downloaded images; and verbose, which is a boolean indicating whether to print verbose output.

def download_img_urls_sync(self, scrapped_subreddit: str, subreddits_img_list: List[str], directory: str, verbose: bool) -> None: downloads the images from the URLs scraped from a given subreddit. The parameters are: scrapped_subreddit, which is the name of the subreddit from which the URLs were scraped; subreddits_img_list, which holds the image URLs scraped from the subreddit; directory, which is the directory to save the downloaded images to; and verbose, which is a boolean indicating whether to print verbose output.

download_img_url_list(subreddit_or_user, subreddit_or_user_threads_list, log_mode, output_directory, verbose)

Downloads and sorts image URLs from a subreddit.

Parameters:

Name Type Description Default
subreddit_or_user str

The name of the subreddit.

required
subreddit_or_user_threads_list dict

A dictionary containing a list of subreddit posts.

required
log_mode str

Determines where to output the downloaded images

required
verbose bool

Whether to output verbose messages or not.

required
download_img_urls_sync(subreddits_img_list, output_directory, verbose)

Downloads the images from the URLs scraped from the given subreddit.

Parameters:

Name Type Description Default
subreddits_img_list List[str]

list of subreddits images

required
output_directory Path

Directory to output the downloaded images

required
verbose bool

Whether to print verbose output.

required
generate_download_report(output_dir, subreddits_img_list, failed_urls, verbose)

Generates a report on the download process for the subreddit's images.

Parameters:

Name Type Description Default
output_dir Path

Output path

required
subreddits_img_list List[str]

List of images scraped from the subreddit

required
failed_urls List[str]

List of urls that failed the downloading process

required
verbose bool

Controls the verbosity level

required
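
As an illustration of what such a report might contain, the sketch below counts successes and failures and writes a summary to the output directory. The report fields, the `download_report.json` filename, and the returned dictionary are all hypothetical; the `verbose` flag is omitted for brevity.

```python
import json
from pathlib import Path
from typing import Dict, List


def generate_download_report(output_dir: Path, subreddits_img_list: List[str],
                             failed_urls: List[str]) -> Dict[str, object]:
    """Summarise a download run and save the summary as JSON."""
    failed = set(failed_urls)
    report: Dict[str, object] = {
        "total": len(subreddits_img_list),
        "downloaded": sum(1 for url in subreddits_img_list if url not in failed),
        "failed": sorted(failed),
    }
    # Hypothetical filename; the real report location is not documented here.
    report_path = output_dir / "download_report.json"
    report_path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return report
```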

Utils

StringBuilder

A class that represents a mutable sequence of characters.

Methods

append_string(string: str) -> 'StringBuilder': Appends the given string to the current sequence.

append_strings(*strings: str) -> 'StringBuilder': Appends the given strings to the current sequence.

__str__() -> str: Returns a string representing the current sequence.

__str__()

Returns a string representing the current sequence.

Returns:

Type Description
str

A string representing the current sequence.

append_string(string)

Appends the given string to the current sequence.

Parameters:

Name Type Description Default
string str

The string to be appended to the current sequence.

required

Returns:

Type Description
StringBuilder

The current StringBuilder object with the appended string.

append_strings(*strings)

Appends the given strings to the current sequence.

Parameters:

Name Type Description Default
strings str

The strings to be appended to the current sequence.

()

Returns:

Type Description
StringBuilder

The current StringBuilder object with the appended strings.
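
The interface described above can be approximated with a list of string parts that is joined on demand. This is a minimal sketch of the pattern, with the internal `_parts` list being an assumption about storage rather than the documented implementation.

```python
from typing import List


class StringBuilder:
    """Accumulates strings and joins them when converted to str."""

    def __init__(self) -> None:
        # Assumption: parts are stored in a list and joined lazily.
        self._parts: List[str] = []

    def append_string(self, string: str) -> "StringBuilder":
        self._parts.append(string)
        return self  # returning self allows call chaining

    def append_strings(self, *strings: str) -> "StringBuilder":
        self._parts.extend(strings)
        return self

    def __str__(self) -> str:
        return "".join(self._parts)


builder = StringBuilder().append_string("r/").append_strings("pics", "+", "aww")
print(str(builder))
```

Collecting parts in a list and joining once avoids repeatedly copying a growing string, which is the usual motivation for this pattern in Python.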