-----------------------------------
| ==> installation info <== |
-----------------------------------
synOCR-user: synOCR
synOCR-user is admin: yes
synOCR-version: 1.4.5
Architecture: x86_64
DSM-build: 69057
Device: 920plus (3647395103)
current profile: default
monitor is running?: yes
DB-version: 9
used image (created): geimist/ocrmypdf-polyglot:latest (2024-04-16T20:14:01)
document author:
used ocr-parameter (raw): -srd -l deu
ocropt_array: -srd -l deu
search prefix:
replace search prefix: yes
renaming syntax: §yocr4-§mocr-§docr_§tag
Symbol for tag marking:
target file handling: useCatDir
Document split pattern: SYNOCR-SEPARATOR-SHEET
split page handling: discard
delete blank pages:
threshold black/white:
threshold black pixels:
clean up spaces: false
Date search method: use Python
date found order: firstfound
source for filedate: ocr
ignored dates by search: ;
date range in past: 0 [absolute: 0]
date range in future: 0 [absolute: 0]
Docker test: OK
DSM notify to user: almin
apprise notify service:
apprise attachment: false
notify language: ger
Loglevel: normal
max. count of logfiles: 10
rotate backupfiles after: (purge backup deactivated)
Source directory: /volume1/DokumenteScan/INPUT/
Target directory: /volume1/DokumenteScan/OUTPUT/
BackUp directory: /volume1/DokumenteScan/BACKUP/
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
● ---------------------------------- ●
● | ==> RUN THE FUNCTIONS <== | ●
● ---------------------------------- ●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
-----------------------------------------------------------------------------------
| check the python3 installation and the necessary modules: |
-----------------------------------------------------------------------------------
prepare_python: OK
-----------------------------------------------------------------------------------
| convert images to pdf |
-----------------------------------------------------------------------------------
nothing to do ...
Target temp directory: /tmp/tmp.gH7Eg4t8yu
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
● STEP 1 - RUN OCR / SPLIT FILES, IF NEEDED: ●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
CURRENT FILE: ➜ pdf_04172024_171545_000081.pdf
temp. target file: /tmp/tmp.gH7Eg4t8yu/step1_tmp_1713366973/pdf_04172024_171545_000081.pdf
-----------------------------------------------------------------------------------
| processing PDF @ OCRmyPDF: |
-----------------------------------------------------------------------------------
➜ OCRmyPDF-LOG:
reading file from standard input
Start processing 4 pages concurrently
2 page is facing ⇧, confidence 11.09 - no change
3 page is facing ⇧, confidence 11.73 - no change
4 page is facing ⇧, confidence 13.57 - no change
1 page is facing ⇧, confidence 11.59 - no change
5 page is facing ⇧, confidence 6.91 - no change
2 [tesseract] lots of diacritics - possibly poor OCR
Postprocessing...
Image optimization ratio: 1.09 savings: 8.2%
Total file size ratio: 0.95 savings: -5.8%
Output sent to stdout
← OCRmyPDF-LOG-END
target file (OK): /tmp/tmp.gH7Eg4t8yu/step1_tmp_1713366973/pdf_04172024_171545_000081.pdf
-----------------------------------------------------------------------------------
| document split handling: |
-----------------------------------------------------------------------------------
splitpage count: 0
no separator sheet found, or number of pages too small
-----------------------------------------------------------------------------------
| handle source file: |
-----------------------------------------------------------------------------------
➜ backup source file to: /volume1/DokumenteScan/BACKUP/pdf_04172024_171545_000081.pdf
removed directory '/tmp/tmp.gH7Eg4t8yu/step1_tmp_1713366973/'
Stats:
runtime last file: ➜ 00:00:56
runtime 1st step (all files): ➜ 00:00:57
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
● STEP 2 - SEARCH TAGS / RENAME / SORT: ●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
CURRENT FILE: ➜ pdf_04172024_171545_000081.pdf
-----------------------------------------------------------------------------------
| search tags in ocr text: |
-----------------------------------------------------------------------------------
source for tags is yaml based tag rule file [/volume1/DokumenteScan/INPUT/_TagConfig_[profile_default].txt]
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/syno/synoman/webman/3rdparty/synOCR/python3_env/lib/python3.8/site-packages/yaml/__init__.py", line 125, in safe_load
return load(stream, SafeLoader)
File "/usr/syno/synoman/webman/3rdparty/synOCR/python3_env/lib/python3.8/site-packages/yaml/__init__.py", line 81, in load
return loader.get_single_data()
File "/usr/syno/synoman/webman/3rdparty/synOCR/python3_env/lib/python3.8/site-packages/yaml/constructor.py", line 49, in get_single_data
node = self.get_single_node()
File "/usr/syno/synoman/webman/3rdparty/synOCR/python3_env/lib/python3.8/site-packages/yaml/composer.py", line 36, in get_single_node
document = self.compose_document()
File "/usr/syno/synoman/webman/3rdparty/synOCR/python3_env/lib/python3.8/site-packages/yaml/composer.py", line 55, in compose_document
node = self.compose_node(None, None)
File "/usr/syno/synoman/webman/3rdparty/synOCR/python3_env/lib/python3.8/site-packages/yaml/composer.py", line 84, in compose_node
node = self.compose_mapping_node(anchor)
File "/usr/syno/synoman/webman/3rdparty/synOCR/python3_env/lib/python3.8/site-packages/yaml/composer.py", line 133, in compose_mapping_node
item_value = self.compose_node(node, item_key)
File "/usr/syno/synoman/webman/3rdparty/synOCR/python3_env/lib/python3.8/site-packages/yaml/composer.py", line 84, in compose_node
node = self.compose_mapping_node(anchor)
File "/usr/syno/synoman/webman/3rdparty/synOCR/python3_env/lib/python3.8/site-packages/yaml/composer.py", line 133, in compose_mapping_node
item_value = self.compose_node(node, item_key)
File "/usr/syno/synoman/webman/3rdparty/synOCR/python3_env/lib/python3.8/site-packages/yaml/composer.py", line 82, in compose_node
node = self.compose_sequence_node(anchor)
File "/usr/syno/synoman/webman/3rdparty/synOCR/python3_env/lib/python3.8/site-packages/yaml/composer.py", line 111, in compose_sequence_node
node.value.append(self.compose_node(node, index))
File "/usr/syno/synoman/webman/3rdparty/synOCR/python3_env/lib/python3.8/site-packages/yaml/composer.py", line 84, in compose_node
node = self.compose_mapping_node(anchor)
File "/usr/syno/synoman/webman/3rdparty/synOCR/python3_env/lib/python3.8/site-packages/yaml/composer.py", line 133, in compose_mapping_node
item_value = self.compose_node(node, item_key)
File "/usr/syno/synoman/webman/3rdparty/synOCR/python3_env/lib/python3.8/site-packages/yaml/composer.py", line 64, in compose_node
if self.check_event(AliasEvent):
File "/usr/syno/synoman/webman/3rdparty/synOCR/python3_env/lib/python3.8/site-packages/yaml/parser.py", line 98, in check_event
self.current_event = self.state()
File "/usr/syno/synoman/webman/3rdparty/synOCR/python3_env/lib/python3.8/site-packages/yaml/parser.py", line 449, in parse_block_mapping_value
if not self.check_token(KeyToken, ValueToken, BlockEndToken):
File "/usr/syno/synoman/webman/3rdparty/synOCR/python3_env/lib/python3.8/site-packages/yaml/scanner.py", line 116, in check_token
self.fetch_more_tokens()
File "/usr/syno/synoman/webman/3rdparty/synOCR/python3_env/lib/python3.8/site-packages/yaml/scanner.py", line 251, in fetch_more_tokens
return self.fetch_double()
File "/usr/syno/synoman/webman/3rdparty/synOCR/python3_env/lib/python3.8/site-packages/yaml/scanner.py", line 655, in fetch_double
self.fetch_flow_scalar(style='"')
File "/usr/syno/synoman/webman/3rdparty/synOCR/python3_env/lib/python3.8/site-packages/yaml/scanner.py", line 666, in fetch_flow_scalar
self.tokens.append(self.scan_flow_scalar(style))
File "/usr/syno/synoman/webman/3rdparty/synOCR/python3_env/lib/python3.8/site-packages/yaml/scanner.py", line 1149, in scan_flow_scalar
chunks.extend(self.scan_flow_scalar_non_spaces(double, start_mark))
File "/usr/syno/synoman/webman/3rdparty/synOCR/python3_env/lib/python3.8/site-packages/yaml/scanner.py", line 1223, in scan_flow_scalar_non_spaces
raise ScannerError("while scanning a double-quoted scalar", start_mark,
yaml.scanner.ScannerError: while scanning a double-quoted scalar
  in "<unicode string>", line 184, column 21:
        - searchstring: "(?i)(?|(Steuer(\D*[N|n]um\S+|[\ ...
                        ^
found unknown escape character 'D'
  in "<unicode string>", line 184, column 38:
     ...  searchstring: "(?i)(?|(Steuer(\D*[N|n]um\S+|[\.\-\:\;\s]*Nr\S+) ...
                                         ^
ERROR at line 448: tag_rule_content=$( python3 -c 'import sys, yaml, json; print(json.dumps(yaml.safe_load(sys.stdin.read()), indent=2, sort_keys=False))' < "${taglisttmp}")
ERROR - YAML-check failed!
ERROR at line 2456: return 1
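The tag step failed because PyYAML rejected line 184 of the tag rule file: inside a double-quoted YAML scalar a backslash starts an escape sequence, and regex escapes such as `\D`, `\S` or `\.` are not valid YAML escapes. The sketch below (plain Python, no PyYAML required) lists the offending sequences and produces the single-quoted alternative, in which YAML keeps backslashes literal. `invalid_escapes` and `to_single_quoted` are hypothetical helper names, the escape table is an assumption meant to mirror PyYAML's, and only the visible part of the search string is used because the rest is cut off in the log.

```python
from typing import List

# Characters allowed after a backslash in a YAML double-quoted scalar.
# Assumption: this mirrors PyYAML's ESCAPE_REPLACEMENTS table; check your version.
SIMPLE_ESCAPES = set('0abt\tnvfre "/\\N_LP')
# Numeric escapes and the number of hex digits each one requires.
NUMERIC_ESCAPES = {'x': 2, 'u': 4, 'U': 8}
HEX_DIGITS = set('0123456789abcdefABCDEF')


def invalid_escapes(body: str) -> List[str]:
    """Return every backslash sequence in a double-quoted scalar body that YAML rejects.

    Assumes a single-line scalar: escaped line breaks are not handled here.
    """
    bad = []
    i = 0
    while i < len(body):
        if body[i] != '\\':
            i += 1
            continue
        nxt = body[i + 1] if i + 1 < len(body) else ''
        if nxt and nxt in SIMPLE_ESCAPES:
            i += 2  # valid single-character escape such as \n, \t or \\
        elif nxt in NUMERIC_ESCAPES:
            width = NUMERIC_ESCAPES[nxt]
            digits = body[i + 2:i + 2 + width]
            if len(digits) != width or not set(digits) <= HEX_DIGITS:
                bad.append(body[i:i + 2 + width])
            i += 2 + width
        else:
            bad.append(body[i:i + 2])  # e.g. \D, \S, \. from a regex
            i += 2
    return bad


def to_single_quoted(value: str) -> str:
    """Quote a string with YAML single quotes: backslashes stay literal, ' becomes ''."""
    return "'" + value.replace("'", "''") + "'"


# Visible fragment of the searchstring from the ScannerError above.
fragment = r'(?i)(?|(Steuer(\D*[N|n]um\S+|[\.\-\:\;\s]*Nr\S+)'
print(invalid_escapes(fragment))
print('searchstring: ' + to_single_quoted(fragment))
```

For this fragment the first reported sequence is `\D`, the same character PyYAML complained about. In the tag rule file, either wrapping the searchstring in single quotes or doubling every backslash inside the double quotes (`\\D`) should let the file load with the regex backslashes intact.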
-----------------------------------------------------------------------------------
| search for a valid date in ocr text: |
-----------------------------------------------------------------------------------
2024-04-17 17:17:12,173 - Date scanning started
2024-04-17 17:17:12,174 - Version: 1.04
2024-04-17 17:17:12,174 - Parameter minYear = 0
2024-04-17 17:17:12,174 - Parameter maxYear = 0
2024-04-17 17:17:12,174 - Parameter searchnearest = off
2024-04-17 17:17:12,174 - set searchnearest = off
2024-04-17 17:17:12,174 - Parameter fileWithTextFindings = /tmp/tmp.gH7Eg4t8yu/step2_tmp_1713367030//synOCR.txt
2024-04-17 17:17:12,175 - Parameter dateBlackLIst = ;
2024-04-17 17:17:12,175 - start checking blacklist
2024-04-17 17:17:12,176 - end checking blacklist
2024-04-17 17:17:12,176 - Start searching for alphanumerical and numerical dates......
2024-04-17 17:17:19,172 - finish searching for alphanumerical and numerical dates......
2024-04-17 17:17:19,172 - found 2 dates
2024-04-17 17:17:19,172 - found date 2024-04-09
2024-04-17 17:17:19,172 - Date scanning ended
Dates found: 1
check date ([yy]yy mm dd): 2024-04-09
➜ valid
day: 09
month: 04
year: 2024
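The date check above (`check date ([yy]yy mm dd)` ➜ `valid`) amounts to a calendar validation of the extracted year, month and day. This is a minimal sketch, not synOCR's code: `check_date` is a hypothetical name, two-digit years are assumed to mean 20yy, and a bound of 0 is assumed to mean "no limit", which is consistent with `minYear = 0` / `maxYear = 0` in this run still accepting 2024-04-09.

```python
from datetime import date
from typing import Optional


def check_date(year: int, month: int, day: int,
               min_year: int = 0, max_year: int = 0) -> Optional[date]:
    """Return a date if (year, month, day) is a real calendar date inside the bounds, else None.

    Hypothetical rules: a two-digit year means 20yy, and a bound of 0 disables that bound.
    """
    if year < 100:
        year += 2000
    try:
        found = date(year, month, day)  # raises ValueError for e.g. 2024-02-30
    except ValueError:
        return None
    if min_year and found.year < min_year:
        return None
    if max_year and found.year > max_year:
        return None
    return found


result = check_date(2024, 4, 9)
if result:
    print("check date ([yy]yy mm dd):", result.isoformat(), "➜ valid")
```

Relying on `datetime.date` for validation rejects impossible days such as 30 February without hand-written month-length tables.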
-----------------------------------------------------------------------------------
| rename and sort to target folder: |
-----------------------------------------------------------------------------------
➜ renaming:
apply renaming syntax ➜ 2024-04-09_
➜ insert metadata (use python pikepdf)
used metadata:
➜ '/Author': '',
➜ '/Keywords': '',
➜ '/CreationDate': 'D:20240409',
➜ '/CreatorTool': 'synOCR 1.4.5'
2024-04-17 17:17:19,684 - INFO - HandlePdf started
2024-04-17 17:17:19,685 - INFO - Version: 0.2
2024-04-17 17:17:19,685 - INFO - Task=metadata
2024-04-17 17:17:19,688 - INFO - >>>>> write meta_data started
2024-04-17 17:17:19,699 - INFO - save pdf to file (/tmp/tmp.gH7Eg4t8yu/step2_tmp_1713367030/temp_pdf_04172024_171545_000081_1713367030.pdf_meta.pdf)
empty
0
➜ File name already exists! Add counter (1)
target file: 2024-04-09_ (1).pdf
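The renaming above can be reproduced as plain placeholder substitution followed by a collision counter. The placeholder meanings (`§yocr4` = four-digit year, `§mocr` = month, `§docr` = day, `§tag` = found tags) are inferred from this log: the configured syntax `§yocr4-§mocr-§docr_§tag` became `2024-04-09_` because the failed tag search left `§tag` empty. `apply_syntax`, `unique_name` and the space used to join tags are hypothetical; synOCR may check the target directory on disk rather than a set of names.

```python
from typing import Dict, Iterable, List


def apply_syntax(syntax: str, values: Dict[str, str]) -> str:
    """Replace each §placeholder in `syntax` with its value, longest placeholder first."""
    for token in sorted(values, key=len, reverse=True):
        syntax = syntax.replace(token, values[token])
    return syntax


def unique_name(stem: str, ext: str, existing: Iterable[str]) -> str:
    """Return stem+ext, or 'stem (n)ext' with the smallest n >= 1 that is not taken."""
    taken = set(existing)
    candidate = stem + ext
    counter = 0
    while candidate in taken:
        counter += 1
        candidate = f"{stem} ({counter}){ext}"
    return candidate


tags: List[str] = []  # the tag search failed, so no tags were found
stem = apply_syntax("§yocr4-§mocr-§docr_§tag", {
    "§yocr4": "2024",
    "§mocr": "04",
    "§docr": "09",
    "§tag": " ".join(tags),  # hypothetical tag separator
})
print(unique_name(stem, ".pdf", {"2024-04-09_.pdf"}))
```

With `2024-04-09_.pdf` already present, this prints `2024-04-09_ (1).pdf`, matching the `target file:` line. The trailing underscore is a side effect of the empty `§tag`, so fixing the tag rule file should also change the resulting name.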
-----------------------------------------------------------------------------------
| adjusts the attributes of the target file: |
-----------------------------------------------------------------------------------
➜ Adapt file date (Source: OCR)
-----------------------------------------------------------------------------------
| final tasks: |
-----------------------------------------------------------------------------------
INFO: Notify for apprise not defined ...
run user defined post scripts:
Stats:
runtime last file: ➜ 00:00:10
pagecount last file: ➜ 5
file count profile: ➜ (profile default) - 32972 PDFs / 41285 pages processed up to now
file count total: ➜ 33479 PDFs / 42936 pages processed up to now since 2021-09-03
cleanup:
delete tmp-files ...
removed '/tmp/tmp.gH7Eg4t8yu/pdf_04172024_171545_000081.pdf'
removed '/tmp/tmp.gH7Eg4t8yu/step2_tmp_1713367030/tmprulefile.txt'
removed '/tmp/tmp.gH7Eg4t8yu/step2_tmp_1713367030/synOCR.txt'
removed '/tmp/tmp.gH7Eg4t8yu/step2_tmp_1713367030/synOCR_filename.txt'
removed directory '/tmp/tmp.gH7Eg4t8yu/step2_tmp_1713367030/'
removed directory '/tmp/tmp.gH7Eg4t8yu'
purge log files ...
delete 1 log files ( > 10 files)
delete 0 search files ( > 10 files)
purge backup deactivated!
runtime all files: ➜ 00:01:08
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
● ---------------------------------- ●
● | ==> END OF FUNCTIONS <== | ●
● ---------------------------------- ●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●