Python Web Tools

A collection of tools I commonly use in my web development work.


Lorem Ipsum style text generator

Thanks to Luca De Vitis for the inspiration


From PyPI

pip install lorem_pysum

From GitHub

pip install git+
python -m pip install -e C:\Users\corji\dinma\git\py_web_tools # from directory
pip install

Import the appropriate class

import py_web_tools

Create a single LoremPysum instance object

p = py_web_tools.LoremPysum()

By default, the standard lorem ipsum text is used. You can also supply your own list of words in a text file by passing the sample parameter when creating the object.

p = py_web_tools.LoremPysum(sample="some_file.txt")
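
As a rough illustration of the idea behind the sample parameter, the sketch below reads a word list from a text file. The helper name load_words and the whitespace-splitting rule are assumptions made for this example; they are not taken from the lorem_pysum source.

```python
import os
import tempfile


def load_words(path):
    """Read a text file and return its whitespace-separated words.

    Hypothetical helper: the real library may parse the sample differently.
    """
    with open(path, encoding="utf-8") as handle:
        return handle.read().split()


# Write a tiny sample file, then load it back as a word list.
with tempfile.TemporaryDirectory() as folder:
    sample_path = os.path.join(folder, "some_file.txt")
    with open(sample_path, "w", encoding="utf-8") as handle:
        handle.write("alpha beta gamma\ndelta epsilon\n")
    sample_words = load_words(sample_path)
```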

The following methods are available for the object. In addition to the calls listed below, the object can return an email address and a name in the form "firstname I. lastname".
p.sentence() # generate a single sentence.
p.paragraphs() # return a single paragraph of standard Lorem Ipsum text.
p.paragraphs(count=3) # return 3 paragraphs where the first paragraph is the standard text.
p.paragraphs(common=False) # return a single random paragraph instead of the standard text.
p.title() # generate a string (title case) with 2 to 6 words. Good for article titles.

In case you want to look into the words used, the following object attributes are defined.

p.common # A list of the first few words in the lorem ipsum text
p.words # A list of all the words in the lorem ipsum text.
p.standard # Standard lorem ipsum text. Usually the first 1/3rd paragraph of a sample file.


Lorem Pysum: Name, email, title, sentence and paragraph generator

class py_web_tools.lorem_pysum.LoremPysum(sample=None)

Generate random sentences and paragraphs

Parameters: sample (file, optional) – A file containing the text to be used as sample. Default is Lorem Ipsum text.



Return an email address


Return any name with a middle initial.
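
A minimal sketch of how a name with a middle initial could be assembled. The name lists, the function name random_name, and the use of a random uppercase letter for the initial are all assumptions for illustration, not the library's implementation.

```python
import random
import string

# Hypothetical name pools; the real library uses its own data.
FIRST_NAMES = ["Ada", "Grace", "Alan"]
LAST_NAMES = ["Lovelace", "Hopper", "Turing"]


def random_name(rng=None):
    """Return a name in the form 'firstname I. lastname'."""
    rng = rng or random.Random()
    initial = rng.choice(string.ascii_uppercase)
    return f"{rng.choice(FIRST_NAMES)} {initial}. {rng.choice(LAST_NAMES)}"


example_name = random_name(random.Random(3))
```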

paragraphs(count=1, common=True)

Return paragraphs

  • count (int) – The number of paragraphs required. Default is 1.
  • common (bool) – Whether the first paragraph will be the standard lorem ipsum text. Default is True.
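
The interaction of count and common can be sketched as follows. This is a standalone approximation, not the library's code: the STANDARD text, the WORDS list, the random_paragraph helper, and the choice to return a list are all assumptions.

```python
import random

# Stand-in text and vocabulary; the real library ships its own.
STANDARD = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
WORDS = ["lorem", "ipsum", "dolor", "sit", "amet",
         "consectetur", "adipiscing", "elit"]


def random_paragraph(rng, sentence_count=3):
    """Build a paragraph from a few simple random sentences."""
    sentences = []
    for _ in range(sentence_count):
        chosen = rng.sample(WORDS, rng.randint(3, 6))
        sentences.append(" ".join(chosen).capitalize() + ".")
    return " ".join(sentences)


def paragraphs(count=1, common=True, rng=None):
    """Return `count` paragraphs; the first is STANDARD when `common` is true."""
    rng = rng or random.Random()
    result = []
    for index in range(count):
        if common and index == 0:
            result.append(STANDARD)
        else:
            result.append(random_paragraph(rng))
    return result


three = paragraphs(count=3, rng=random.Random(0))
one_random = paragraphs(common=False, rng=random.Random(0))
```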

Return a sentence


The first word is capitalized, and the sentence ends in either a period or question mark. Commas are added at random.

Determine the number of comma-separated sections and number of words in each section for this sentence.
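
The sentence rules described above can be approximated with a short standalone sketch. The word list, the range of 1 to 4 sections, and the range of 3 to 8 words per section are assumptions chosen for the example; the library's actual limits are not stated here.

```python
import random

WORDS = ["lorem", "ipsum", "dolor", "sit", "amet",
         "consectetur", "adipiscing", "elit"]


def sentence(rng=None):
    """Return a capitalized sentence ending in '.' or '?'."""
    rng = rng or random.Random()
    # Decide how many comma-separated sections this sentence has.
    section_count = rng.randint(1, 4)
    sections = []
    for _ in range(section_count):
        # Decide how many words go into this section.
        word_count = rng.randint(3, 8)
        sections.append(" ".join(rng.choice(WORDS) for _ in range(word_count)))
    text = ", ".join(sections)
    ending = rng.choice(".?")
    return text[0].upper() + text[1:] + ending


example_sentence = sentence(random.Random(1))
```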


Return a title consisting of 2 to 6 words.
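
A title generator along these lines might look like the sketch below. The word list and the function name make_title are assumptions; only the 2 to 6 word range and the title casing come from the description.

```python
import random

WORDS = ["lorem", "ipsum", "dolor", "sit", "amet",
         "consectetur", "adipiscing", "elit"]


def make_title(rng=None):
    """Return 2 to 6 random words in title case."""
    rng = rng or random.Random()
    word_count = rng.randint(2, 6)  # both bounds are inclusive
    return " ".join(rng.choice(WORDS) for _ in range(word_count)).title()


example_title = make_title(random.Random(7))
```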

Page Tools

A tool for harvesting words and links on a web page.



Create a BeautifulSoup object of a webpage

class py_web_tools.page_ripper.PageRipper(page_url='')

Harvest words and links from a webpage

Parameters: page_url (str) – Page url. Default is ‘



  1. PageRipper('').page_soup()
  2. PageRipper('').page_soup(to_file='yes')
  3. PageRipper('').all_links()
  4. PageRipper('').crawlable_links()
  5. PageRipper('').words_on_page()


Return all <a> tags and their content

Returns: List of all <a> tags in webpage source file
Return type: list

Return all crawlable links (clickable url) on webpage

Yields: str – Clickable url


Links with “#” are excluded
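
To illustrate the filtering idea, here is a sketch that uses the standard library's html.parser rather than BeautifulSoup, so it runs without extra dependencies. The LinkCollector class and the rule that a link must start with http:// or https:// are assumptions; only the exclusion of links containing "#" comes from the description above.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def crawlable_links(html):
    """Yield hrefs that are absolute http(s) urls and contain no '#'."""
    collector = LinkCollector()
    collector.feed(html)
    for href in collector.hrefs:
        # Assumption: "clickable" means an absolute http(s) url.
        if "#" not in href and href.startswith(("http://", "https://")):
            yield href


PAGE = """
<a href="https://example.com/docs">Docs</a>
<a href="#top">Back to top</a>
<a href="https://example.com/page#section">Section</a>
<a href="/relative">Relative</a>
"""
links = list(crawlable_links(PAGE))
```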


Create BeautifulSoup object of given url page source

Parameters: str – Web page url in the form “http://something.extension”
Returns: Beautiful soup object of webpage source file
Return type: BeautifulSoup


Pass “to_file = ‘yes’” to get the output in a text file


Harvest all words enclosed in <p> tags in webpage source

Yields: str – Single word which is not in list of excluded words
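
The word harvesting step can be sketched with the standard library's html.parser in place of BeautifulSoup. The EXCLUDED set, the lowercasing, and the punctuation stripping are assumptions for this example; the library's own excluded-word list is not shown in this document.

```python
from html.parser import HTMLParser

EXCLUDED = {"the", "a", "an", "and", "of"}  # hypothetical excluded words


class ParagraphWords(HTMLParser):
    """Collect whitespace-separated words that appear inside <p> tags."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # number of <p> tags currently open
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.words.extend(data.split())


def words_on_page(html):
    """Yield cleaned words from <p> tags that are not in EXCLUDED."""
    parser = ParagraphWords()
    parser.feed(html)
    parser.close()
    for word in parser.words:
        cleaned = word.strip(".,;:!?\"'()").lower()
        if cleaned and cleaned not in EXCLUDED:
            yield cleaned


PAGE = ("<h1>The Title</h1>"
        "<p>The quick fox, and a <b>lazy</b> dog.</p>"
        "<div>ignored text</div>")
page_words = list(words_on_page(PAGE))
```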

Indexing functions

py_web_tools.index_utils.add_page_to_index(word_index, page_url)

Add all words found in a webpage to the word index

  • word_index (dict) – Index of words
  • page_url (str) – url from which words are to be extracted

Returns: Word index with entries added/updated
Return type: dict



Modifies the input dictionary in place

py_web_tools.index_utils.add_to_index(word_index, word, page_url)

Add a word to the word index and add a page url to the list of urls associated with that word

  • word_index (dict) – Index of words
  • word (str) – Word to be added to the index
  • page_url (str) – url to be added in the list of urls associated with “word”

Returns: Word index with “word” and “page_url” added/updated.
Return type: dict



This function modifies the input dictionary in-situ (in place)
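
The in-place behaviour of the indexing functions can be sketched as below. This is an approximation under stated assumptions: each url is stored at most once per word, and the page-level helper (named add_words_to_index here, a hypothetical name) receives its words from the caller instead of fetching the page itself.

```python
def add_to_index(word_index, word, page_url):
    """Add `page_url` to the list of urls stored under `word`, in place."""
    urls = word_index.setdefault(word, [])
    if page_url not in urls:  # assumption: each url is stored once per word
        urls.append(page_url)
    return word_index


def add_words_to_index(word_index, page_url, words):
    """Index every word in `words` under `page_url` (words supplied by caller)."""
    for word in words:
        add_to_index(word_index, word, page_url)
    return word_index


index = {}
returned = add_words_to_index(index, "https://example.com/a", ["fox", "dog", "fox"])
add_to_index(index, "dog", "https://example.com/b")
```

Because the dictionary is modified in place, the returned value and the original index are the same object, so callers can ignore the return value if they prefer.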
