nti.contentfragments.censor module

Algorithms for content censoring.

The algorithms in this module are trivially simple. We could do much better with, for example, prefix trees. See https://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/ and http://pypi.python.org/pypi/trie/0.1.1

If efficiency really matters and we are applying many different filters, we would need to do a better job of pipelining them to avoid copying the content.
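To illustrate the prefix-tree idea mentioned above, here is a minimal standalone sketch. It is not part of this module; `build_trie` and `find_matches` are hypothetical names, and the half-open `(start, end)` range convention is an assumption. It finds the longest prohibited word starting at each position in a single walk of the tree, rather than searching once per word.

```python
def build_trie(words):
    """Build a nested-dict prefix tree; the key None marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word.lower():
            node = node.setdefault(ch, {})
        node[None] = True
    return root


def find_matches(text, trie):
    """Yield (start, end) for the longest prohibited word starting at each index.

    Ranges are half-open and refer to the lower-cased text (an assumption
    of this sketch, not a statement about the library).
    """
    lowered = text.lower()
    for start in range(len(lowered)):
        node = trie
        longest_end = None
        for pos in range(start, len(lowered)):
            node = node.get(lowered[pos])
            if node is None:
                break  # no prohibited word continues with this character
            if None in node:
                longest_end = pos + 1  # a complete word ends here
        if longest_end is not None:
            yield (start, longest_end)
```

A full Aho-Corasick automaton would also avoid restarting the walk at every index; this sketch only shows the shared-prefix lookup.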

class nti.contentfragments.censor.BasicScanner[source]

Bases: object

do_scan(fragment, ranges)[source]

do_scan is passed a fragment that is guaranteed to be unicode and lower case.

scan(content_fragment)[source]
test_range(new_range, yielded)[source]
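The division of labour between these three methods can be sketched as follows. This is a standalone illustration, not the library's implementation: `SketchScanner` and `SubstringScanner` are hypothetical classes, and the behaviour assumed here (that `scan` lower-cases the fragment before delegating, and that `test_range` rejects ranges overlapping ones already yielded) is inferred from the signatures and the `do_scan` docstring only.

```python
class SketchScanner(object):
    """Hypothetical stand-in for a scanner base class (not the library's code)."""

    def scan(self, content_fragment):
        # Normalise once so subclasses can assume lower-case text.
        yielded = []
        for match_range in self.do_scan(content_fragment.lower(), yielded):
            yielded.append(match_range)
            yield match_range

    def do_scan(self, fragment, yielded):
        raise NotImplementedError

    def test_range(self, new_range, yielded):
        # Accept a half-open range only if it overlaps nothing reported so far.
        return all(new_range[1] <= start or new_range[0] >= end
                   for start, end in yielded)


class SubstringScanner(SketchScanner):
    """Reports every non-overlapping occurrence of each prohibited value."""

    def __init__(self, prohibited_values=()):
        # Empty strings are skipped because they would match everywhere.
        self.prohibited_values = [v.lower() for v in prohibited_values if v]

    def do_scan(self, fragment, yielded):
        for value in self.prohibited_values:
            start = fragment.find(value)
            while start != -1:
                match_range = (start, start + len(value))
                if self.test_range(match_range, yielded):
                    yield match_range
                start = fragment.find(value, start + len(value))
```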
class nti.contentfragments.censor.DefaultCensoredContentPolicy(fragment=None, target=None)[source]

Bases: object

A content censoring policy that looks up the default scanner and strategy utilities and uses them.

This package does not register this policy as an adapter for anything; you must do that yourself, on (content-fragment, target-object). It can also be registered as a utility or instantiated directly with no arguments.

censor(fragment, target)[source]
censor_html(fragment, target)[source]
censor_text(fragment, target)[source]
class nti.contentfragments.censor.NoOpCensoredContentPolicy(*args, **kwargs)[source]

Bases: object

A content censoring policy that does no censoring whatsoever.

This package does not register this policy as an adapter for anything; you must do that yourself, on (content-fragment, target-object). It can also be registered as a utility or instantiated directly with no arguments.

censor(fragment, _target)[source]
class nti.contentfragments.censor.PipeLineMatchScanner(scanners=())[source]

Bases: nti.contentfragments.censor.BasicScanner

do_scan(content_fragment, yielded)[source]

do_scan is passed a fragment that is guaranteed to be unicode and lower case.

class nti.contentfragments.censor.SimpleReplacementCensoredContentStrategy(replacement_char=u'*')[source]

Bases: object

censor_ranges(content_fragment, censored_ranges)[source]
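As a rough illustration of what a replacement strategy of this kind does, the sketch below overwrites each censored range with a replacement character. It is a standalone function, not the library's method, and it assumes that ranges are half-open `(start, end)` pairs of character offsets.

```python
def censor_ranges(content_fragment, censored_ranges, replacement_char=u'*'):
    """Return a copy of the text with every censored range masked.

    Assumes half-open (start, end) offsets; ranges running past the end
    of the text are clipped rather than treated as errors.
    """
    chars = list(content_fragment)
    for start, end in censored_ranges:
        for i in range(start, min(end, len(chars))):
            chars[i] = replacement_char
    return u''.join(chars)
```

Because each character is replaced one for one, the censored text keeps its original length, so offsets computed by a scanner remain valid afterwards.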
class nti.contentfragments.censor.TrivialMatchScanner(prohibited_values=())[source]

Bases: nti.contentfragments.censor.BasicScanner

do_scan(content_fragment, yielded)[source]

do_scan is passed a fragment that is guaranteed to be unicode and lower case.

class nti.contentfragments.censor.WordMatchScanner(white_words=(), prohibited_words=())[source]

Bases: nti.contentfragments.censor.BasicScanner

do_scan(content_fragment, yielded)[source]

do_scan is passed a fragment that is guaranteed to be unicode and lower case.

char_tester[source]
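The combination of `white_words` and `prohibited_words` suggests whole-word matching with an allow list. The sketch below shows one way that could work, assuming that a prohibited word is ignored when it falls inside an allowed phrase. The functions `find_word_ranges` and `scan_words` are hypothetical, and the regular-expression boundary test stands in for whatever `char_tester` actually does.

```python
import re


def find_word_ranges(fragment, word):
    """Return (start, end) spans where `word` appears as a whole word."""
    pattern = re.compile(r'(?<!\w)' + re.escape(word) + r'(?!\w)', re.UNICODE)
    return [m.span() for m in pattern.finditer(fragment)]


def scan_words(fragment, white_words=(), prohibited_words=()):
    """Return sorted ranges of prohibited words not covered by an allowed phrase."""
    fragment = fragment.lower()
    allowed = [span
               for white in white_words
               for span in find_word_ranges(fragment, white.lower())]
    result = []
    for word in prohibited_words:
        for start, end in find_word_ranges(fragment, word.lower()):
            inside_allowed = any(a_start <= start and end <= a_end
                                 for a_start, a_end in allowed)
            if not inside_allowed:
                result.append((start, end))
    return sorted(result)
```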
nti.contentfragments.censor.censor_assign(fragment, target, field_name)[source]

Manually censor fragment as it is assigned to the field field_name of target.

nti.contentfragments.censor.censor_before_assign_components_of_sequence(sequence, target, event)[source]

Register this adapter for (usually any) sequence, some specific interface target, and the nti.schema.interfaces.IBeforeSequenceAssignedEvent. It will then iterate across the fields and attempt to censor each of them.

This package DOES NOT register this event.

nti.contentfragments.censor.censor_before_text_assigned(fragment, target, event)[source]

Watches for field values to be assigned, and looks for specific policies for the given object and field name to handle censoring. If such a policy is found and returns something that is not the original fragment, the event is updated (and so the value assigned to the target is also updated).
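The flow described above can be sketched without any component registry. Everything here is hypothetical: `FakeBeforeAssignedEvent` only mimics an event carrying the value and the field name, and `find_policy` stands in for the real policy lookup, whose mechanics this page does not describe.

```python
class FakeBeforeAssignedEvent(object):
    """Hypothetical event holding the value about to be assigned and the field name."""

    def __init__(self, value, name):
        self.object = value
        self.name = name


def censor_on_assign(fragment, target, event, find_policy):
    """Replace the event's value if a policy returns a different fragment."""
    policy = find_policy(fragment, target, event.name)
    if policy is None:
        return  # no policy for this object and field, so nothing changes
    censored = policy(fragment, target)
    if censored is not fragment:
        # Updating the event changes what ends up assigned to the target.
        event.object = censored


def example_find_policy(fragment, target, field_name):
    """Hypothetical lookup: only the 'body' field has a censoring policy."""
    if field_name == 'body':
        return lambda frag, tgt: frag.replace(u'bad', u'***')
    return None
```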

nti.contentfragments.censor.punkt_re_char(lang='en')[source]