
Of course. Here are the two sections from the OCR process and the date detection:

[CODE]

-----------------------------------------------------------------------------------
| processing PDF @ OCRmyPDF: |
-----------------------------------------------------------------------------------

➜ OCRmyPDF-LOG:
reading file from standard input
Start processing 4 pages concurrently
1 page is facing ⇧, confidence 14.29 - rotation appears correct
4 page is facing ⇧, confidence 14.87 - rotation appears correct
2 page is facing ⇧, confidence 15.51 - rotation appears correct
3 page is facing ⇧, confidence 15.86 - rotation appears correct
4 [tesseract] lots of diacritics - possibly poor OCR
6 [tesseract] Too few characters. Skipping this page
6 [tesseract] Too few characters. Skipping this page
6 [tesseract] Error during processing.
6 page is facing ⇧, confidence 0.00 - no change
5 page is facing ⇧, confidence 13.73 - no change
6 [tesseract] Empty page!!
6 [tesseract] Empty page!!
Postprocessing...
Optimize ratio: 1.00 savings: -0.0%
Image optimization did not improve the file - optimizations will not be used
Output sent to stdout
← OCRmyPDF-LOG-END

target file (OK): /tmp/tmp.shl28WamYI/step1_tmp_1727708403/300982024165537.pdf

-----------------------------------------------------------------------------------
| search for a valid date in ocr text: |
-----------------------------------------------------------------------------------

2024-09-30 17:00:50,955 - Date scanning started
2024-09-30 17:00:50,955 - Version: 1.04
2024-09-30 17:00:50,955 - Parameter minYear = 0
2024-09-30 17:00:50,955 - Parameter maxYear = 0
2024-09-30 17:00:50,955 - Parameter searchnearest = off
2024-09-30 17:00:50,955 - set searchnearest = off
2024-09-30 17:00:50,955 - Parameter fileWithTextFindings = /tmp/tmp.shl28WamYI/step2_tmp_1727708446//synOCR.txt
2024-09-30 17:00:50,955 - Parameter dateBlackLIst = off
2024-09-30 17:00:50,955 - start checking blacklist
2024-09-30 17:00:51,077 - end checking blacklist
2024-09-30 17:00:51,078 - Start searching for alphanumerical and numerical dates......
2024-09-30 17:00:55,206 - finish searching for alphanumerical and numerical dates......
2024-09-30 17:00:55,207 - found 0 dates
2024-09-30 17:00:55,207 - no dates found
2024-09-30 17:00:55,207 - found date None
2024-09-30 17:00:55,207 - Date scanning ended
Date not found in OCR text - use file date:
day: 30
month:09
year: 2024[/CODE]

I have attached the upper part of the PDF. Strangely, the blank page is not removed either.
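
To double-check which pages tesseract treated as empty, the `[tesseract]` lines from the log above can be scanned with a small script. This is only a rough sketch: the function name `pages_flagged_empty` and the regular expression are my own assumptions based on the line format visible in the log, and nothing here is part of synOCR or OCRmyPDF.

```python
import re

# Hypothetical helper, not part of synOCR or OCRmyPDF: it only scans log text
# shaped like the excerpt in this post ("<page> [tesseract] <message>").
PAGE_LINE = re.compile(r"^\s*(\d+)\s+\[tesseract\]\s+(.*)$")


def pages_flagged_empty(log_text):
    """Return sorted page numbers that tesseract reported as empty or skipped."""
    flagged = set()
    for line in log_text.splitlines():
        match = PAGE_LINE.match(line)
        if not match:
            continue  # not a per-page tesseract message
        page, message = int(match.group(1)), match.group(2)
        if "Empty page" in message or "Too few characters" in message:
            flagged.add(page)
    return sorted(flagged)


# A few lines copied from the log in this post.
sample_log = """\
4 [tesseract] lots of diacritics - possibly poor OCR
6 [tesseract] Too few characters. Skipping this page
6 [tesseract] Error during processing.
5 page is facing ⇧, confidence 13.73 - no change
6 [tesseract] Empty page!!
"""

print(pages_flagged_empty(sample_log))  # prints [6]
```

For this log it reports page 6, which matches the blank page that was expected to be removed.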

