Python Web Tools

A collection of tools I commonly use in my web development work.

Installation

From PyPI (not available yet)

Straight from GitHub

pip install https://github.com/Parousiaic/py_web_tools/archive/master.zip

Usage

LoremPysum - Generate random texts

Credits to Luca De Vitis for the inspiration and starter code

Import the class

from py_web_tools import LoremPysum

Create a single LoremPysum instance with default Lorem Ipsum text

p = LoremPysum()

You can also supply your own list of words in a text file by passing the file name as the sample parameter when creating the object.

p = LoremPysum(sample="some_file.txt")
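A minimal sketch of preparing such a file. It assumes the sample is plain text with whitespace-separated words, which is not stated anywhere above; the word list and temporary path are purely illustrative.

```python
import os
import tempfile

# Hypothetical vocabulary; replace with your own words.
words = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta"]

# Write the words to a text file. Whitespace-separated plain text is an
# assumption here, not a documented requirement of LoremPysum.
sample_path = os.path.join(tempfile.mkdtemp(), "some_file.txt")
with open(sample_path, "w", encoding="utf-8") as handle:
    handle.write(" ".join(words) + "\n")

# With the package installed, the file would then be passed like this:
# p = LoremPysum(sample=sample_path)
```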

The following instance methods are defined.

p.email() # return an email address.
p.name() # return a name in the form "firstname I. lastname".
p.sentence() # generate a single sentence.
p.paragraphs() # return a single paragraph of standard Lorem Ipsum text.
p.paragraphs(count=3) # return 3 paragraphs where the first paragraph is the standard text.
p.paragraphs(common=False) # return a single paragraph of random text instead of the standard text.
p.title() # generate a string (title case) with 2 to 6 words. Good for article titles.

In case you want to look into the words used, the following instance attributes are defined.

p.common # A list of the first few words in the lorem ipsum text
p.words # A list of all the words in the lorem ipsum text.
p.standard # Standard lorem ipsum text. Usually the first third of a sample file.

Code

LoremPysum

Lorem Pysum: Name, email, title, sentence and paragraph generator

class py_web_tools.lorem_pysum.LoremPysum(sample=None)

Generate random sentences and paragraphs

Parameters: sample (file, optional) – A file containing the text to be used as the sample. Default is Lorem Ipsum text.

Methods

email()

Return an email address

name()

Return any name with a middle initial.

paragraphs(count=1, common=True)

Return paragraphs

Parameters:
  • count (int) – The number of paragraphs required. Default is 1
  • common (bool) – Whether the first paragraph will be the standard lorem ipsum text. Default is True

sentence()

Return a sentence

Notes

The first word is capitalized, and the sentence ends in either a period or question mark. Commas are added at random.

The method first determines the number of comma-separated sections and the number of words in each section for the sentence.
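The rules in these notes can be sketched roughly as follows. This is not the library's implementation: the function name make_sentence, the rng parameter, and the ranges used for section and word counts are all assumptions made for illustration.

```python
import random


def make_sentence(words, rng=random):
    """Build one sentence from `words` following the rules in the notes."""
    # Decide how many comma-separated sections to use (range is an assumption).
    section_count = rng.randint(1, 4)
    sections = []
    for _ in range(section_count):
        # Decide how many words this section holds (range is an assumption).
        word_count = rng.randint(3, 8)
        sections.append(" ".join(rng.choice(words) for _ in range(word_count)))
    body = ", ".join(sections)
    # Capitalize the first letter and end with a period or a question mark.
    return body[0].upper() + body[1:] + rng.choice(".?")


sample_words = ["lorem", "ipsum", "dolor", "sit", "amet", "consectetur"]
print(make_sentence(sample_words, random.Random(0)))
```

Passing a seeded random.Random instance keeps the output reproducible, which is handy when generated text is used in tests.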

title()

Return a title consisting of 2 to 6 words
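One way such a title could be produced, shown as a sketch only. The name make_title and the rng parameter are hypothetical, and the real method may select or capitalize words differently.

```python
import random


def make_title(words, rng=random):
    """Return 2 to 6 randomly chosen words in title case."""
    word_count = rng.randint(2, 6)  # randint is inclusive on both ends
    chosen = [rng.choice(words) for _ in range(word_count)]
    return " ".join(word.capitalize() for word in chosen)


print(make_title(["lorem", "ipsum", "dolor", "sit", "amet"], random.Random(0)))
```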

PageRipper

Create a BeautifulSoup object of a webpage

class py_web_tools.page_ripper.PageRipper(url='http://python.org')

Harvest words and links from a webpage

Parameters: url (str) – Page URL. Default is 'http://python.org'

Notes

Usage:

  1. PageRipper('http://python.org').soup
  2. PageRipper('http://python.org').page_soup(to_file='no')
  3. PageRipper('http://python.org').raw_links()
  4. PageRipper('http://python.org').links()
  5. PageRipper('http://python.org').words()

Methods

Return all crawlable links (clickable URLs) on the webpage

Yields: str – Clickable URL

Notes

Links with “#” are excluded
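The filtering described here can be illustrated without any network access. The sketch below uses the standard library's html.parser in place of BeautifulSoup, works on an HTML string instead of a fetched page, and treats any href containing "#" as excluded; the names crawlable_links and _AnchorCollector are hypothetical and are not part of py_web_tools.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _AnchorCollector(HTMLParser):
    """Collect the href value of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def crawlable_links(html, base_url):
    """Yield absolute URLs for anchors in `html`, skipping hrefs with '#'."""
    collector = _AnchorCollector()
    collector.feed(html)
    for href in collector.hrefs:
        if "#" in href:
            continue  # assumption: any href containing '#' is excluded
        yield urljoin(base_url, href)


page = '<a href="/about">About</a> <a href="#top">Top</a>'
print(list(crawlable_links(page, "http://python.org")))
```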

words()

Harvest all words enclosed in <p> tags in webpage source

Yields: str – A single word that is not in the list of excluded words
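A rough sketch of this kind of harvesting, again using the standard library's html.parser as a stand-in for BeautifulSoup. The excluded-word set, the lowercasing, the word pattern, and the names paragraph_words and _ParagraphText are all assumptions; the library's actual exclusion list is not shown here.

```python
import re
from html.parser import HTMLParser

# Hypothetical exclusion list, for illustration only.
EXCLUDED = {"the", "a", "an", "and", "of", "to"}


class _ParagraphText(HTMLParser):
    """Collect text that appears between <p> and </p>."""

    def __init__(self):
        super().__init__()
        self._in_paragraph = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph:
            self.chunks.append(data)


def paragraph_words(html, excluded=EXCLUDED):
    """Yield lowercased words found inside <p> tags, skipping excluded ones."""
    parser = _ParagraphText()
    parser.feed(html)
    parser.close()
    for chunk in parser.chunks:
        for word in re.findall(r"[A-Za-z']+", chunk):
            word = word.lower()
            if word not in excluded:
                yield word


print(list(paragraph_words("<h1>Title</h1><p>The quick brown fox.</p>")))
```

Because it is a generator, words are produced lazily, matching the "Yields" wording above.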

Indexing

Indexing functions

py_web_tools.indexing.add_page_to_index(word_index, page_url)

Add all words found in a webpage to the word index

Parameters:
  • word_index (dict) – Index of words
  • page_url (str) – url from which words are to be extracted
Returns: dict – Word index with entries added/updated

Notes

Modifies the input dictionary in place
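The in-place behaviour can be sketched as below. To keep the example runnable offline it takes an extra, hypothetical fetch_words argument standing in for whatever extracts words from the page; the documented function takes only word_index and page_url. Skipping duplicate URLs is also an assumption.

```python
def add_page_to_index(word_index, page_url, fetch_words):
    """Add every word from `fetch_words(page_url)` to `word_index` in place."""
    for word in fetch_words(page_url):
        urls = word_index.setdefault(word, [])
        if page_url not in urls:  # assumption: each URL is listed once per word
            urls.append(page_url)
    return word_index


def fake_words(url):
    """Stub word source used here instead of a real page fetch."""
    return ["python", "web", "python"]


index = {}
result = add_page_to_index(index, "http://python.org", fake_words)
print(result is index)  # the same dictionary object is returned
```

Since the dictionary is mutated, callers holding a reference to it see the new entries whether or not they use the return value.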

py_web_tools.indexing.add_to_index(word_index, word, page_url)

Add a word to the word index and add a page URL to the list of URLs associated with that word

Parameters:
  • word_index (dict) – Index of words
  • word (str) – Word to be added to the index
  • page_url (str) – url to be added in the list of urls associated with “word”
Returns: dict – Word index with "word" and "page_url" added/updated.

Notes

This function modifies the input dictionary in place
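A minimal sketch of what such a function might look like, assuming the index maps each word to a list of URLs and that a URL is stored only once per word. It is an illustration and not necessarily the library's code.

```python
def add_to_index(word_index, word, page_url):
    """Record that `word` appears on `page_url`, mutating `word_index`."""
    urls = word_index.setdefault(word, [])  # create the list on first sight
    if page_url not in urls:  # assumption: avoid duplicate URLs per word
        urls.append(page_url)
    return word_index


index = {}
add_to_index(index, "python", "http://python.org")
add_to_index(index, "python", "http://docs.python.org")
print(index)
```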
