New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch list implementation to more effective one #7898
base: master
Are you sure you want to change the base?
Conversation
567ccfa
to
6d48c33
Compare
Some stats out of @justinmk log: ---
allalloc: 8920
allappend: 1262
allendfind: 280
allfind: 1105
allfindcachemiss: 1091
allfreelist: 8907
allknown_alloc_int: 0
allmov: 0
allremov: 0
allrevers: 59
alls10init: 0
allsinit: 9904
allsort: 60
allstartfind: 317
allstatic_alloc: 9904
allunknown_alloc: 6
append: {avg: 0.0668503019387647, len: 18878, max: 41, med: 0.0, min: 0}
drop: {avg: 0.07193558639686408, len: 18878, max: 59, med: 0.0, min: 0}
dropsiz: {avg: 1.0, len: 1358, max: 1, med: 1.0, min: 1}
find: {avg: 0.05853374298124801, len: 18878, max: 13, med: 0.0, min: 0}
findcachediff: {avg: 0.8816568047337278, len: 169, max: 1, med: 1, min: -1}
findcachemiss: {avg: 0.05779213899777519, len: 18878, max: 6, med: 0.0, min: 0}
findenddiff: {avg: 4.664253393665159, len: 1105, max: 8, med: 8, min: 0}
findstartdiff: {avg: 0.7809954751131222, len: 1105, max: 4, med: 1, min: 0}
first: {avg: 1.0343256700921708, len: 18878, max: 91, med: 0.0, min: 0}
freecont: {avg: 8907.0, len: 1, max: 8907, med: 8907, min: 8907}
freecontsiz: {avg: 2.484562703491636, len: 8907, max: 53, med: 2, min: 0}
freelistsiz: {avg: 0.0, len: 8907, max: 0, med: 0, min: 0}
insert: {avg: 0.0, len: 18878, max: 0, med: 0.0, min: 0}
iter: {avg: 0.24811950418476533, len: 18878, max: 66, med: 0.0, min: 0}
iterconst: {avg: 0.00031783027863121094, len: 18878, max: 1, med: 0.0, min: 0}
known_alloc_len: {avg: 2.1035372144436257, len: 2714, max: 100, med: 2.0, min: 0}
last: {avg: 0.0, len: 18878, max: 0, med: 0.0, min: 0}
len: {avg: 1.3253522618921496, len: 18878, max: 92, med: 0.0, min: 0}
mov: {avg: 0.0, len: 18878, max: 0, med: 0.0, min: 0}
remov: {avg: 0.0, len: 18878, max: 0, med: 0.0, min: 0}
revers: {avg: 0.0031253310732069074, len: 18878, max: 1, med: 0.0, min: 0}
sort: {avg: 0.003178302786312109, len: 18878, max: 1, med: 0.0, min: 0}
startinsert: {avg: 0.0, len: 18878, max: 0, med: 0.0, min: 0}
unknown_alloc_len: {avg: 2.856670341786108, len: 5442, max: 53, med: 1.0, min: 0}
unsure_alloc_len: {avg: 2.503957783641161, len: 758, max: 5, med: 3.0, min: 0} Full output: https://gist.github.com/e2745696b5c5c60e4d694fbb51c9659e. |
Updated after latest commit: ---
allalloc: 8920
allappend: 1262
allendfind: 280
allfind: 1105
allfindcachemiss: 1091
allfreelist: 8907
allknown_alloc_int: 0
allmov: 0
allremov: 0
allrevers: 59
alls10init: 0
allsinit: 9904
allsort: 60
allstartfind: 317
allstatic_alloc: 9904
allunknown_alloc: 6
append: {avg: 0.0668503019387647, len: 18878, max: 41, med: 0.0, min: 0}
drop: {avg: 0.07193558639686408, len: 18878, max: 59, med: 0.0, min: 0}
dropsiz: {avg: 1.0, len: 1358, max: 1, med: 1.0, min: 1}
find: {avg: 0.05853374298124801, len: 18878, max: 13, med: 0.0, min: 0}
findcachediff: {avg: 0.8816568047337278, len: 169, max: 1, med: 1, min: -1}
findcachemiss: {avg: 0.05779213899777519, len: 18878, max: 6, med: 0.0, min: 0}
findenddiff: {avg: 4.664253393665159, len: 1105, max: 8, med: 8, min: 0}
findstartdiff: {avg: 0.7809954751131222, len: 1105, max: 4, med: 1, min: 0}
first: {avg: 1.0343256700921708, len: 18878, max: 91, med: 0.0, min: 0}
freecont: {avg: 8907.0, len: 1, max: 8907, med: 8907, min: 8907}
freecontsiz: {avg: 2.484562703491636, len: 8907, max: 53, med: 2, min: 0}
freelistsiz: {avg: 0.0, len: 8907, max: 0, med: 0, min: 0}
insert: {avg: 0.0, len: 18878, max: 0, med: 0.0, min: 0}
iter: {avg: 0.24811950418476533, len: 18878, max: 66, med: 0.0, min: 0}
iterconst: {avg: 0.00031783027863121094, len: 18878, max: 1, med: 0.0, min: 0}
known_alloc_len: {avg: 2.1035372144436257, len: 2714, max: 100, med: 2.0, min: 0}
last: {avg: 0.0, len: 18878, max: 0, med: 0.0, min: 0}
len: {avg: 1.3253522618921496, len: 18878, max: 92, med: 0.0, min: 0}
lengthdiff: {avg: 2.6475207538703165, len: 8914, max: 100, med: 2.0, min: 0}
maxlength: {avg: 1.2853883701188455, len: 18848, max: 100, med: 0.0, min: 0}
minlength: {avg: 0.003477675566524568, len: 8914, max: 5, med: 0.0, min: 0}
mov: {avg: 0.0, len: 18878, max: 0, med: 0.0, min: 0}
remov: {avg: 0.0, len: 18878, max: 0, med: 0.0, min: 0}
revers: {avg: 0.0031253310732069074, len: 18878, max: 1, med: 0.0, min: 0}
sort: {avg: 0.003178302786312109, len: 18878, max: 1, med: 0.0, min: 0}
startinsert: {avg: 0.0, len: 18878, max: 0, med: 0.0, min: 0}
unknown_alloc_len: {avg: 2.856670341786108, len: 5442, max: 53, med: 1.0, min: 0}
unsure_alloc_len: {avg: 2.503957783641161, len: 758, max: 5, med: 3.0, min: 0} My “typical editing session” (though it yields 69 MiB log after something like five minutes) stats: ---
allalloc: 31952
allappend: 31387
allendfind: 10840
allfind: 22234
allfindcachemiss: 16109
allfreelist: 25144
allknown_alloc_int: 361
allmov: 722
allremov: 0
allrevers: 0
alls10init: 0
allsinit: 28976
allsort: 1
allstartfind: 14108
allstatic_alloc: 28976
allunknown_alloc: 13
append: {avg: 0.5151405734543485, len: 60929, max: 870, med: 0, min: 0}
drop: {avg: 0.6805133844310591, len: 60929, max: 869, med: 0, min: 0}
dropsiz: {avg: 1.0278320430263126, len: 41463, max: 98, med: 1, min: 1}
find: {avg: 0.3649165422048614, len: 60929, max: 1739, med: 0, min: 0}
findcachediff: {avg: 0.484904738641915, len: 10235, max: 97, med: 0, min: -29}
findcachemiss: {avg: 0.2643896994862873, len: 60929, max: 1337, med: 0, min: 0}
findenddiff: {avg: 4.397994063146532, len: 22234, max: 117, med: 1.0, min: 0}
findstartdiff: {avg: 1.8097958082216425, len: 22234, max: 97, med: 0.0, min: 0}
first: {avg: 4.671896797912324, len: 60929, max: 3702, med: 0, min: 0}
freecont: {avg: 25144.0, len: 1, max: 25144, med: 25144, min: 25144}
freecontsiz: {avg: 5.380806554247534, len: 25144, max: 1542, med: 1.0, min: 0}
freelistsiz: {avg: 0.0, len: 25144, max: 0, med: 0.0, min: 0}
insert: {avg: 0.032644553496692874, len: 60929, max: 795, med: 0, min: 0}
iter: {avg: 0.15700241264422524, len: 60929, max: 80, med: 0, min: 0}
iterconst: {avg: 0.0033317467872441696, len: 60929, max: 1, med: 0, min: 0}
known_alloc_len: {avg: 15.274523007856342, len: 8910, max: 1542, med: 2.0, min: 0}
last: {avg: 0.0, len: 60929, max: 0, med: 0, min: 0}
len: {avg: 3.2614846788885425, len: 60929, max: 7400, med: 1, min: 0}
lengthdiff: {avg: 5.580937401698647, len: 31790, max: 1542, med: 1.0, min: 0}
maxlength: {avg: 3.271500787815126, len: 60928, max: 1542, med: 0.0, min: 0}
minlength: {avg: 0.35460836741113555, len: 31790, max: 49, med: 0.0, min: 0}
mov: {avg: 0.011849858031479263, len: 60929, max: 160, med: 0, min: 0}
remov: {avg: 0.0, len: 60929, max: 0, med: 0, min: 0}
revers: {avg: 0.0, len: 60929, max: 0, med: 0, min: 0}
sort: {avg: 1.6412545749971278e-05, len: 60929, max: 1, med: 0, min: 0}
startinsert: {avg: 0.017413711040719527, len: 60929, max: 635, med: 0, min: 0}
unknown_alloc_len: {avg: 0.9374942209893666, len: 10815, max: 58, med: 0, min: 0}
unsure_alloc_len: {avg: 1.9331852358168147, len: 11704, max: 93, med: 2.0, min: 0} |
7c85d4b
to
1ae7381
Compare
It does not do anything useful with the parsed log yet though. [ci skip]
[ci skip]
It appears that typical list is small and immutable - hardly a surprise. What is a surprise is that at least half of the lists are empty ones. As for operations: |
In case somebody wants to play with my log: |
I am thinking of implementing list on top of |
What are the benefits? |
Of what exactly? Using vector over linked list should be faster (well, except for empty lists), and avoiding second allocation is going to speed up things a bit further for most lists. The vector allocation pattern is though merely “double vector size always”, yet a number of languages (like lua or perl, though first is special case (has two vectors to grow separately with some logic behind whether item with numeric key is going to stored in hash or actual array) and second is a bit less then doubles: actually, old_len + min_len - 2; and apparently whoever wrote kvec.h was OK with that) manage to work fine with that. |
By the way, it appears that when using vectors behaviour of |
Now I see, here you were talking about the "double vector size always" pattern. |
@justinmk I was talking about allocating a small array as a part of the list structure always, specifically the second argument to |
Of what I checked so far only Ruby somewhat pays this price for its arrays, but it pays with three elements per array (why not four?), and they are of pointer integer type and not |
Probably not four to not really pay any price: that array is in a union with long, union(long, pointer) and pointer. |
ee9f76e
to
8eaac17
Compare
373cd71
to
5d15b11
Compare
5d15b11
to
5c84d48
Compare
Current status: functional tests |
5c84d48
to
d08ad17
Compare
All functional tests pass locally now; and nvim works normally with my actual config only crashing at exit with |
Looks like preallocating four elements per list was a very bad idea: before (
after:
(these are memory footprints of nvim running with my config on this branch and on master where by “master” I mean the commit I branched of). |
This is strange, removing four preallocated elements actually doubled vsize in place of reducing it:
|
This reverts commit fb7057b.
Intended as optimization, but did not achieve anything significant.
Gcc complains about “missing initializer” for |
This was actually an attempt to fix local clang warnings from glibc-2.25 macros, specifically: 1. `-Wconversion` on fpclassify() macros. 2. `-Wgnu-statement-expression` from `assert()` macros. Though after some attempts it appeared that 1 is not actually fixable without making clang ignore warnings from math.h (should already be doing that, but does not for some reason) or making it pretend being gcc-4.4 or higher and not gcc-4.2 (in the former case glibc uses __builtin_fpclassify which is actually defined). 2 is also not fixable withoun using `-ansi` (or defining `__STRICT_ANSI__` or undefining `__GNUC__`). Though this should be fixed by a newer libc: it [uses `__extension__`][1] to mask warnings. [1]: https://sourceware.org/git/?p=glibc.git;a=blob;f=assert/assert.h;h=5468c16c2c4e354e86454386c39a049daa130523;hb=23158b08a0908f381459f273a984c6fd328363cb#l105
Hi! I'm sorry for just dropping into this pull request, since I'm not a neovim dev and don't really follow the existing code. I just had a few questions about this, since similar comparisons are often done in c++ for std::list vs std::vector. Do I understand correctly that the old implementation was a doubly-linked list of basic types with some sort of caching mechanism? and the new one is a dynamic array implementation?
|
|
Hi, thanks for replying =) Possibly I'm misunderstanding and you're doing it already, but would it make sense to not allocate at all until an element is added to the vector? By a small_vector I meant a vector that has an inline buffer (union with the vector's pointer to the buffer, size and capacity) where on small sizes it uses the inline buffer instead of allocating. This may be total overkill / counter productive however. |
@dgel How do you imagine not allocating? I need a editable object so that two variables holding the same empty vector will get “changed” by putting a value to it. About small: maybe some code? I kind of have something like this now: allocating a list will allocate space for structure which has four list items among other things. There are no unions though, vector with preallocated items is supposed to be compatible with vector with no preallocated items (i.e. any function which operates on the pointer to vector without preallocated items should also work with pointer to vector with preallocated items) as long as changes made to the vector do not trigger some kind of reallocation. So in place of code checking size against number of preallocated items and selecting where to get values from pointer to the items array is used, which incidentally points to the memory just after the memory where pointer itself is stored. Less space efficient, but easier to work with: I actually use this feature to have vectors with different number of preallocated items (Vim has two lists with different number of preallocated items: for group captures when pattern matching and for user function arguments), and also still be able to easily switch back to vector without any preallocated items at all. |
Ahh yeah it looks like I was misunderstanding. I thought a Thanks for talking about this! It's nice to learn about how such lists are implemented in different places. Perhaps I can find an English-language equivalent of that article about different list implementations |
I can’t grow the entire struct ever. First, that makes vector not compatible with vector without initial elements. Second, I do need a never-changing pointer to hold and right now nearly the same struct you wrote is a part of another struct which additionally has data for GC and some other technical thingies. I considered the variant of having (About article: I have the data collected, but did not quite like how I structured it yet, so did not publish it so far. So you would not have that in Russian either yet.) |
5cc8a9f
to
05b433c
Compare
With this `make -jN` is somewhat working (if you are lucky, tests still do share common filenames). Better though is that there are no implicit dependencies now.
This actually makes sense only on developer machines and adds overhead (copying all test files for each test run), but not much of that.
Pretty sure it will end up being dynamically allocated array, but some things are undecided yet. Have a nice collection of data (in Russian though) for an article regarding list implementations in different languages (VimL, Python, Lua(Jit), Perl, MoarVM (Perl6), Ruby, PHP), most (except for VimL) are using reallocated arrays, but details are very different (especially PHP which shares structures and code with hash table; Lua(Jit) also has different kind of hash/array hybrid).
TODO: