Merge request !9 (Closed)
WIP: Early version of the user-facing part of the website and of submitting
Martin Mareš requested to merge devel into master on Jan 6, 2021
41 files changed, +1239 −342
bin/shorten-schools (new file, mode 0 → 100755, +338 −0), viewed at a0aeeb3e
#!/usr/bin/env python3
"""
Shortens the official long school names in the database into something more
readable and stores the result in the column places.name.

The algorithm tries to bring each name into the form SHORT_NAME, where

    SHORT_NAME = NAME PLACE
    NAME = e.g. "SŠ", "ZŠ T. G. Masaryka", "SPŠ strojnická a SOŠ profesora Švejcara"
    PLACE = CITY [STREET [HOUSE_NUMBER]]
        e.g. "Slatinice", "Praha 7", "Olomouc, Svatoplukova"

There can be several ways to shorten a name, e.g.

    ZŠ a MŠ Olomouc, Svatoplukova 11
    ZŠ a MŠ Olomouc, Svatoplukova
    ZŠ a MŠ Olomouc

The algorithm generates all variants of the names and then checks whether using
the shortest variant ("ZŠ a MŠ Olomouc") causes a name conflict with another school.
If it does, it tries a longer variant of the name for both schools. This repeats
until the conflicts are resolved.
"""

import copy
import random
import re
import sys
import argparse
from sqlalchemy.orm import aliased

import mo.db as db


def eprint(*args, **kwargs):
    print(*args, file=sys.stderr, **kwargs)


def sorted_by_length(schools):
    schools2 = copy.copy(schools)
    schools2.sort(key=lambda sc: len(sc["names"][-1]))
    return schools2


def summarize(schools, k=5):
    lens = [len(sc["names"][-1]) for sc in schools]
    avg_len = sum(lens) / len(schools)
    eprint("Average length:", avg_len)
    eprint("Maximum length:", max(lens))
    names_by_lens = sorted_by_length(schools)
    eprint()
    eprint(f"{k} longest:")
    for sc in names_by_lens[::-1][:k]:
        eprint(f'{sc["names"][-1]} (@ {sc["city"]})')
    random.shuffle(names_by_lens)
    eprint()
    eprint(f"{k} random:")
    for sc in names_by_lens[:k]:
        eprint(f'Old: {sc["names"][0]}')
        eprint(f'{sc["names"][-1]}')
        eprint()


def remove_formalities(name):
    for formality in formalities:
        name = re.sub(formality, "", name, flags=re.IGNORECASE)
    return name


def shorten_name(name):
    for re_from, re_to in school_kinds:
        name = re.sub(re_from, re_to, name, flags=re.IGNORECASE)
    return name


def partition(name, city):
    """
    Splits the school name into the part before the city name and the part after it
    """
    # Try minor modifications of the city name
    for rule in city_rules:
        # For words like "Táborské" we want to remove the rest of the word too, not just "Tábor"
        pat = r"\b{}\w*\b".format(city)
        if re.search(pat, name) is not None:
            parts = re.split(pat, name)
            if len(parts) != 2:
                # The city name occurs more than once, it is unclear what to do
                return None
            else:
                ok = True
                for kind, _ in school_kinds:
                    if kind.lower() in parts[1].lower():
                        ok = False
                if not ok:
                    # The name part of the school continues after the city name
                    # (e.g. "Táborské gymnázium"), this cannot be resolved automatically
                    return None
                else:
                    return parts
        if rule is not None:
            city = re.sub(rule[0], rule[1], city)
    # We did not find the city name
    return [name]


def remove_house_number(name):
    name, n = re.subn(r"(, ([^\W\d_]| |\.)+) [0-9/]+[a-z]?$", r"\1", name)
    # True if the name changed
    return name, n > 0


def should_have_comma_after_name(p_name):
    # We want a comma in cases such as
    # "Základní škola generála Zdeňka Škarvady, Ostrava-Poruba"
    # but not for
    # "Základní škola Dolní Ředice, okres Pardubice"
    for sk in school_kinds:
        # sk is a (long form, abbreviation) tuple; str.endswith accepts a tuple
        # of suffixes, so both forms are checked here
        if p_name.endswith(sk):
            return False
    return True


def postprocess_name_part(p_name):
    # Handles edge cases of the part of the name that precedes the city
    p_name = p_name.strip(" ,-")
    p_name = re.sub(" v$", "", p_name)  # For cases like "G v Kroměříži" -> "G v, Kroměříž"
    if should_have_comma_after_name(p_name):
        p_name += ","
    return p_name


def shorten_all(schools):
    for sc in schools:
        sc["names"].append(remove_formalities(sc["names"][-1]))
        sc["parts"] = partition(sc["names"][-1], sc["city"])
    eprint("Total schools: {}".format(len(schools)))
    n_split = 0
    for sc in schools:
        sc["names"].append(shorten_name(sc["names"][-1]))
        if sc["parts"] is not None:
            if len(sc["parts"]) == 1:
                # City name not found in the school name
                p_name = postprocess_name_part(sc["names"][-1])
                sc["names"].append(f"{p_name} {sc['city']}")
            else:
                # With a split available, we can try removing the house number
                # and possibly the whole street name as well
                n_split += 1
                assert len(sc["parts"]) == 2
                p_name, p_place = sc["parts"]
                p_name = shorten_name(p_name)
                p_name = postprocess_name_part(p_name)
                p_place2, changed = remove_house_number(p_place)
                if changed:
                    sc["names"].append(f"{p_name} {sc['city']}, {p_place2.strip(' ,-')}")
                if "Praha" not in sc["city"]:
                    # e.g. we do not want "G Praha 2"
                    sc["names"].append(f"{p_name} {sc['city']}")
    eprint(f"Successfully split up {n_split} schools")
    return schools


def is_conflict(names1, names2):
    return any([(name in names1) for name in names2])


def remove_conflicts(shortened):
    """
    Falls back to longer name variants where conflicts occurred
    """
    again = True
    while again:
        shortened.sort(key=lambda sc: sc["names"][-1])
        eprint("----------------------------")
        n_conflicts = 0
        again = False
        bad_names = set()
        for sc1, sc2 in zip(shortened, shortened[1:]):
            if is_conflict(sc1["names"], sc2["names"]):
                n_conflicts += 1
                if sc1["names"][0] != sc2["names"][0]:
                    bad_names.add(sc1["names"][-1])
                    again = True
        for sc in shortened:
            if sc["names"][-1] in bad_names:
                assert len(sc["names"]) > 1
                sc["names"].pop()
        eprint(f"Found {n_conflicts} conflicts")
    # Hack: we always want to apply these shortenings and assume they create no conflicts
    for sc in shortened:
        sc["names"].append(remove_formalities(shorten_name(sc["names"][-1])))
    eprint("Done (possible unremovable conflicts)")


city_rules = [
    (r"(\w)-(\w)", r"\1 - \2"),  # Spaces around hyphens are sometimes inconsistent
    ("Praha", "v Praze"),
    ("v Praze 4", "v Praze 12"),
    (r"v Praze [0-9]+", "v Praze"),
    ("v Praze", "Praha"),
    None,  # Dummy
]

school_kinds = [
    ("Gymnázium", "G"),
    ("Vyšší odborná škola", "VOŠ"),
    ("Střední odborná škola", "SOŠ"),
    ("Střední zdravotnická škola", "SZŠ"),
    ("Střední průmyslová škola", "SPŠ"),
    ("Střední pedagogická škola", "SPŠ"),
    ("Střední odborné učiliště", "SOU"),
    ("Střední škola", "SŠ"),
    ("Základní škola", "ZŠ"),
    ("Základní umělecká škola", "ZUŠ"),
    ("Mateřská škola", "MŠ"),
    # We do not want separate "ZŠ Nový Rychnov" and "ZŠ a MŠ Nový Rychnov" that differ only by "MŠ"
    ("ZŠ a MŠ", "ZŠ"),
    ("MŠ a ZŠ", "ZŠ"),
]

formalities = [
    r",?-? ?příspěvková organizace",
    r",? s\.r\.o\.",
    r",? o\.p\.s\.",
    r"s právem státní jazykové zkoušky",
    r",? ?okres .+$",
]


def main():
    # Description: "Automatically shortens school names in the database"
    parser = argparse.ArgumentParser(description="Automaticky zkrátí jména škol v databázi")
    parser.add_argument(
        "-n",
        "--dry-run",
        action="store_true",
        # "Only saves the generated shortenings to 'prejmenovani.tsv', does not change the database"
        help="Jen uloží vygenerovaná zkrácení do 'prejmenovani.tsv', nemění databázi",
    )
    # "Reverts to the official names"
    parser.add_argument("--restore", action="store_true", help="Vrátí se k oficiálním názvům")
    args = parser.parse_args()

    session = db.get_session()
    school_place_t = aliased(db.Place)
    parent_place_t = aliased(db.Place)
    schools_q = (
        session.query(db.School, school_place_t, parent_place_t)
        .filter(db.School.place_id == school_place_t.place_id)
        .filter(parent_place_t.place_id == school_place_t.parent)
        .all()
    )

    if args.restore:
        eprint("Vracím se k původním názvům.")  # "Reverting to the original names."
        for school, place, parent_place in schools_q:
            place.name = school.official_name
        session.commit()
        return

    schools = []
    for school, place, parent_place in schools_q:
        # The parent should be a school
        assert parent_place.level == 3
        # This holds before the first run of the script, but not afterwards (we change place.name)
        # assert place.name == school.official_name
        schools.append(
            {
                "place_id": school.place_id,
                "names": [school.official_name],
                "city": parent_place.name,
                "db_place": place,
            }
        )

    shortened = shorten_all(schools)
    remove_conflicts(shortened)
    summarize(shortened, k=10)

    if args.dry_run:
        filename = "prejmenovani.tsv"
        with open(filename, "w") as f:
            shortened.sort(key=lambda sc: sc["names"][0])
            for sc in shortened:
                f.write(f"{sc['names'][-1]}\t{sc['names'][0]}\n")
        # "List of all renamings saved to {filename}."
        print(f"Seznam všech přejmenování uložen do {filename}.")
        return

    # Write to the DB
    for sc in shortened:
        sc["db_place"].name = sc["names"][-1]
    session.commit()


if __name__ == "__main__":
    main()
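
The back-off idea from the module docstring (use the shortest variant, and return to a longer one when two schools collide) can be illustrated on toy data. This is a simplified sketch, not the script's `remove_conflicts`: it detects clashes by counting identical current names instead of comparing neighbours in a sorted list, and the function name `resolve_conflicts` and the sample schools are made up for illustration.

```python
from collections import Counter


def resolve_conflicts(variants_by_school):
    """Pick the shortest variant per school, backing off to longer ones on clashes.

    Each list is ordered from the longest (official) to the shortest variant,
    mirroring the order in which the script appends to sc["names"].
    """
    # Work on copies so the caller's lists stay untouched.
    stacks = {key: list(names) for key, names in variants_by_school.items()}
    while True:
        counts = Counter(names[-1] for names in stacks.values())
        clashing = {name for name, n in counts.items() if n > 1}
        progressed = False
        for names in stacks.values():
            # Drop the current variant if it clashes and a longer one is left.
            if names[-1] in clashing and len(names) > 1:
                names.pop()
                progressed = True
        if not progressed:
            # Either no clashes remain or the remaining ones cannot be removed.
            break
    return {key: names[-1] for key, names in stacks.items()}


# Hypothetical sample data, ordered longest to shortest.
schools = {
    "a": ["Základní škola a Mateřská škola Olomouc, Svatoplukova 11",
          "ZŠ a MŠ Olomouc, Svatoplukova", "ZŠ a MŠ Olomouc"],
    "b": ["Základní škola a Mateřská škola Olomouc, Demlova 18",
          "ZŠ a MŠ Olomouc, Demlova", "ZŠ a MŠ Olomouc"],
    "c": ["Gymnázium Slatinice", "G Slatinice"],
}
result = resolve_conflicts(schools)
for key, name in result.items():
    print(key, name)
```

Each pass pops at least one variant or stops, so the loop always terminates. Like the script, it may end with clashes that no longer have a longer variant to fall back to.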
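
To see what the pattern in `remove_house_number` accepts, it can be tried on a few place fragments of the kind `partition` returns after the city name. The sketch below reuses the regular expression from the script unchanged; the wrapper name `strip_house_number` and the sample inputs are assumptions chosen for illustration.

```python
import re

# Same pattern as in remove_house_number: a ", street name" group followed by
# a house number such as "11", "12/3" or "14a" at the very end of the string.
HOUSE_NUMBER_RE = re.compile(r"(, ([^\W\d_]| |\.)+) [0-9/]+[a-z]?$")


def strip_house_number(place):
    """Return the place without a trailing house number and a changed flag."""
    stripped, n = HOUSE_NUMBER_RE.subn(r"\1", place)
    return stripped, n > 0


# Hypothetical inputs: with a number, with a lettered number and abbreviations,
# with no number at all, and a bare number without a preceding street name.
examples = [", Svatoplukova 11", ", tř. Kpt. Jaroše 14a", ", Svatoplukova", " 7"]
results = [strip_house_number(e) for e in examples]
for before, (after, changed) in zip(examples, results):
    print(repr(before), "->", repr(after), changed)
```

The last input stays untouched because the pattern requires a comma and a street name before the number, which is why a city such as "Praha 7" keeps its district number.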