Python Web Tools

A collection of tools I commonly use in my web development work.

LoremPysum

Lorem Ipsum style text generator

Thanks to Luca De Vitis for the inspiration

Usage

From PyPI

pip install lorem_pysum

From GitHub

pip install git+https://github.com/Parousiaic/py_web_tools.git
python -m pip install -e C:\Users\corji\dinma\git\py_web_tools # from directory
pip install https://github.com/Parousiaic/py_web_tools/archive/master.zip

Import the package

import py_web_tools

Create a LoremPysum instance

p = py_web_tools.LoremPysum()

By default, the standard lorem ipsum text is used. You can instead supply your own list of words in a text file by passing the sample parameter when creating the object.

p = py_web_tools.LoremPysum(sample="some_file.txt")

The following methods are available for the object.

p.email() # return an email address.
p.name() # return a name in the form "firstname I. lastname".
p.sentence() # generate a single sentence.
p.paragraphs() # return a single paragraph of standard Lorem Ipsum text.
p.paragraphs(count=3) # return 3 paragraphs where the first paragraph is the standard text.
p.paragraphs(common=False) # return a single random paragraph instead of the standard text.
p.title() # generate a string (title case) with 2 to 6 words. Good for article titles.

To inspect the words used, the following object attributes are available.

p.common # A list of the first few words in the lorem ipsum text
p.words # A list of all the words in the lorem ipsum text.
p.standard # Standard lorem ipsum text. Usually the first 1/3rd paragraph of a sample file.

Code

Lorem Pysum: Name, email, title, sentence and paragraph generator

class py_web_tools.lorem_pysum.LoremPysum(sample=None)

Generate random sentences and paragraphs

Parameters: sample (file, optional) – A file containing the text to be used as the sample. Default is the Lorem Ipsum text.

Methods

email()

Return an email address

name()

Return any name with a middle initial.
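
The "firstname I. lastname" shape mentioned in the usage section could be produced roughly as follows. This is only a sketch and not the library's implementation: make_name, the WORDS pool and the capitalization of each part are assumptions made for illustration.

```python
import random
import string

# Hypothetical word pool; the real class draws from its own sample text.
WORDS = ["lorem", "ipsum", "dolor", "sit", "amet", "consectetur"]


def make_name(words, rng=random):
    """Return a name shaped like 'Firstname I. Lastname'."""
    first = rng.choice(words).capitalize()
    last = rng.choice(words).capitalize()
    # The middle initial is a single random uppercase letter plus a period.
    initial = rng.choice(string.ascii_uppercase)
    return f"{first} {initial}. {last}"


print(make_name(WORDS))
```

Passing a seeded random.Random instance as rng makes the output repeatable, which can be handy in tests.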

paragraphs(count=1, common=True)

Return paragraphs

Parameters:
  • count (int) – The number of paragraphs required. Default is 1
  • common (bool) – Whether the first paragraph is the standard lorem ipsum text. Default is True
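
How count and common interact can be sketched like this. It is an illustration under assumptions, not the actual method: STANDARD, WORDS, random_paragraph, the sentence sizes and the choice to return a list are all made up for the example.

```python
import random

# Assumed stand-ins: the real class keeps its own standard text and word list.
STANDARD = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
WORDS = ["lorem", "ipsum", "dolor", "sit", "amet",
         "consectetur", "adipiscing", "elit"]


def random_paragraph(words, rng):
    """Build one paragraph from 3 to 5 short random sentences (arbitrary sizes)."""
    sentences = []
    for _ in range(rng.randint(3, 5)):
        chosen = rng.sample(words, rng.randint(3, len(words)))
        sentences.append(" ".join(chosen).capitalize() + ".")
    return " ".join(sentences)


def paragraphs(count=1, common=True, rng=random):
    """Return `count` paragraphs; the first is STANDARD when `common` is True."""
    result = []
    for i in range(count):
        if common and i == 0:
            result.append(STANDARD)
        else:
            result.append(random_paragraph(WORDS, rng))
    return result


for paragraph in paragraphs(count=2):
    print(paragraph)
```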

sentence()

Return a sentence

Notes

The first word is capitalized, and the sentence ends in either a period or question mark. Commas are added at random.

The method first determines the number of comma-separated sections and the number of words in each section for the sentence.
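
The rules in these notes can be sketched as below. The ranges (1 to 4 sections, 3 to 8 words per section) and the WORDS list are assumptions chosen for the example; the real method may use different numbers.

```python
import random

# Hypothetical word pool standing in for the class's word list.
WORDS = ["lorem", "ipsum", "dolor", "sit", "amet", "consectetur"]


def sentence(words, rng=random):
    """Return one sentence: capitalized, comma-separated, ending in '.' or '?'."""
    sections = []
    # Decide how many comma-separated sections this sentence has.
    for _ in range(rng.randint(1, 4)):
        # Decide how many words go into this section.
        n_words = rng.randint(3, 8)
        sections.append(" ".join(rng.choice(words) for _ in range(n_words)))
    text = ", ".join(sections)
    # Capitalize only the first letter and pick the closing punctuation.
    return text[0].upper() + text[1:] + rng.choice(".?")


print(sentence(WORDS))
```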

title()

Return a title consisting of 2 to 6 words
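
A title of this kind might be built as in the following sketch. make_title and WORDS are hypothetical names used only for illustration.

```python
import random

# Hypothetical word pool standing in for the class's word list.
WORDS = ["lorem", "ipsum", "dolor", "sit", "amet", "consectetur"]


def make_title(words, rng=random):
    """Return 2 to 6 random words joined by spaces, in title case."""
    n_words = rng.randint(2, 6)  # randint includes both endpoints
    return " ".join(rng.choice(words) for _ in range(n_words)).title()


print(make_title(WORDS))
```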

Page Tools

A tool for harvesting words and links on a web page.

Usage

Code

Create a BeautifulSoup object of a webpage

class py_web_tools.page_ripper.PageRipper(page_url='http://python.org')

Harvest words and links from a webpage

Parameters: page_url (str) – Page URL. Default is 'http://python.org'

Notes

Usage:

  1. PageRipper('http://python.org').page_soup()
  2. PageRipper('http://python.org').page_soup(to_file='yes')
  3. PageRipper('http://python.org').all_links()
  4. PageRipper('http://python.org').crawlable_links()
  5. PageRipper('http://python.org').words_on_page()

Methods

all_links()

Return all <a> tags and their content

Returns: List of all <a> tags in the webpage source
Return type: list

crawlable_links()

Return all crawlable links (clickable URLs) on the webpage

Yields: str – Clickable URL

Notes

Links with “#” are excluded
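
The filtering described here can be sketched without any third-party dependency. Note the swap: this example uses the standard library's html.parser instead of BeautifulSoup, and it assumes that "clickable" means an absolute http or https URL. LinkCollector, the standalone crawlable_links function and SAMPLE are illustrative names, not part of the package.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawlable_links(html):
    """Yield absolute http(s) links, skipping any link that contains '#'."""
    parser = LinkCollector()
    parser.feed(html)
    for href in parser.links:
        if "#" in href:
            continue  # fragment links are excluded
        if href.startswith(("http://", "https://")):
            yield href


SAMPLE = (
    '<a href="https://python.org/about">About</a>'
    '<a href="#top">Top</a>'
    '<a href="/docs#intro">Intro</a>'
    '<a href="http://example.com">Example</a>'
)
print(list(crawlable_links(SAMPLE)))
```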

page_soup(to_file='no')

Create BeautifulSoup object of given url page source

Parameters: str – Web page URL in the form “http://something.extension”
Returns: BeautifulSoup object of the webpage source
Return type: BeautifulSoup

Notes

Pass to_file='yes' to write the output to a text file

words()

Harvest all words enclosed in <p> tags in webpage source

Yields: str – A single word that is not in the list of excluded words
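
A rough illustration of this kind of harvesting follows. It is a sketch under assumptions: it uses the standard library's html.parser rather than BeautifulSoup, the EXCLUDED set is invented for the example, and lowercasing each word is a choice made here rather than documented behaviour.

```python
import re
from html.parser import HTMLParser

# Hypothetical exclusion list; the package defines its own.
EXCLUDED = {"the", "a", "an", "and", "of", "is"}


class ParagraphText(HTMLParser):
    """Collect text that appears inside <p> tags, including nested inline tags."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # how many <p> tags are currently open
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def harvest_words(html, excluded=EXCLUDED):
    """Yield each lowercased word found in <p> tags that is not excluded."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    for chunk in parser.chunks:
        for word in re.findall(r"[A-Za-z']+", chunk):
            if word.lower() not in excluded:
                yield word.lower()


PAGE = "<h1>The Title</h1><p>The quick fox is <b>fast</b> and sly.</p><div>ignored</div>"
print(list(harvest_words(PAGE)))
```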

Indexing functions

py_web_tools.index_utils.add_page_to_index(word_index, page_url)

Add all words found in a webpage to the word index

Parameters:
  • word_index (dict) – Index of words
  • page_url (str) – url from which words are to be extracted
Returns: Word index with entries added/updated
Return type: dict

Notes

Modifies the input dictionary in place

py_web_tools.index_utils.add_to_index(word_index, word, page_url)

Add a word to the word index and add a page URL to the list of URLs associated with that word

Parameters:
  • word_index (dict) – Index of words
  • word (str) – Word to be added to the index
  • page_url (str) – url to be added in the list of urls associated with “word”
Returns: Word index with “word” and “page_url” added/updated.
Return type: dict

Notes

This function modifies the input dictionary in place
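
The in-place behaviour can be seen in a small stand-in like the one below. It is a sketch of the idea rather than the package's code; in particular, skipping a URL that is already listed for a word is an assumption.

```python
def add_to_index(word_index, word, page_url):
    """Record page_url under word, mutating and returning word_index."""
    # setdefault creates the list on first sight of the word.
    urls = word_index.setdefault(word, [])
    if page_url not in urls:  # assumed: each URL is stored once per word
        urls.append(page_url)
    return word_index


index = {}
returned = add_to_index(index, "python", "http://python.org")
add_to_index(index, "python", "http://example.com")

# The caller's dictionary is updated directly; the return value is the same object.
print(returned is index)
print(index)
```

Because the same dictionary is both mutated and returned, callers can either ignore the return value or chain on it.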
