File modules/editor/codemirror/doc/internals.html

Last commit: Tue May 22 22:39:52 2018 +0200	Jan Dankert	Fix for PHP 7.2: 'Object' may no longer be used as a class name. NOT EVEN INSIDE A NAMESPACE! WTF, why did I put it in a damn namespace, then? What else is it supposed to be for??? Amateurs. So it has now, reluctantly, been renamed to 'BaseObject'.
<!doctype html>

<title>CodeMirror: Internals</title>
<meta charset="utf-8"/>
<link rel=stylesheet href="docs.css">
<style>dl dl {margin: 0;} .update {color: #d40 !important}</style>
<script src="activebookmark.js"></script>

<div id=nav>
  <a href="http://codemirror.net"><h1>CodeMirror</h1><img id=logo src="logo.png"></a>

  <ul>
    <li><a href="../index.html">Home</a>
    <li><a href="manual.html">Manual</a>
    <li><a href="https://github.com/codemirror/codemirror">Code</a>
  </ul>
  <ul>
    <li><a href="#top">Introduction</a></li>
    <li><a href="#approach">General Approach</a></li>
    <li><a href="#input">Input</a></li>
    <li><a href="#selection">Selection</a></li>
    <li><a href="#update">Intelligent Updating</a></li>
    <li><a href="#parse">Parsing</a></li>
    <li><a href="#summary">What Gives?</a></li>
    <li><a href="#btree">Content Representation</a></li>
    <li><a href="#keymap">Key Maps</a></li>
  </ul>
</div>

<article>

<h2 id=top>(Re-) Implementing A Syntax-Highlighting Editor in JavaScript</h2>

<p style="font-size: 85%" id="intro">
  <strong>Topic:</strong> JavaScript, code editor implementation<br>
  <strong>Author:</strong> Marijn Haverbeke<br>
  <strong>Date:</strong> March 2nd 2011 (updated November 13th 2011)
</p>

<p style="padding: 0 3em 0 2em"><strong>Caution</strong>: this text was written shortly after
version 2 was initially written. It no longer (even including the
update at the bottom) fully represents the current implementation. I'm
leaving it here as a historic document. For more up-to-date
information, look at the entries
tagged <a href="http://marijnhaverbeke.nl/blog/#cm-internals">cm-internals</a>
on my blog.</p>

<p>This is a follow-up to
my <a href="http://codemirror.net/story.html">Brutal Odyssey to the
Dark Side of the DOM Tree</a> story. That one describes the
mind-bending process of implementing (what would become) CodeMirror 1.
This one describes the internals of CodeMirror 2, a complete rewrite
and rethink of the old code base. I wanted to give this piece another
Hunter Thompson copycat subtitle, but somehow that would be out of
place: the process this time around was one of straightforward
engineering, requiring no serious mind-bending whatsoever.</p>

<p>So, what is wrong with CodeMirror 1? I'd estimate, by mailing list
activity and general search-engine presence, that it has been
integrated into about a thousand systems by now. The most prominent
one, for the past few weeks,
is <a href="http://googlecode.blogspot.com/2011/01/make-quick-fixes-quicker-on-google.html">Google
Code's project hosting</a>. It works, and it's being used widely.</p>

<p>Still, I did not start replacing it because I was bored. CodeMirror
1 was heavily reliant on <code>designMode</code>
or <code>contentEditable</code> (depending on the browser). Neither of
these is well specified (HTML5 tries
to <a href="http://www.w3.org/TR/html5/editing.html#contenteditable">specify</a>
their basics), and, more importantly, they tend to be one of the more
obscure and buggy areas of browser functionality. CodeMirror, by using
this functionality in a non-typical way, was constantly running up
against browser bugs. WebKit wouldn't show an empty line at the end of
the document, and in some releases would suddenly get unbearably slow.
Firefox would show the cursor in the wrong place. Internet Explorer
would insist on linkifying everything that looked like a URL or email
address, a behaviour that can't be turned off.
Some bugs I managed to
work around (which was often a frustrating, painful process); others,
such as the Firefox cursor placement, I gave up on, and had to tell
user after user that they were known problems, but not something I
could help.</p>

<p>Also, there is the fact that <code>designMode</code> (which seemed
to be less buggy than <code>contentEditable</code> in WebKit and
Firefox, and was thus used by CodeMirror 1 in those browsers) requires
a frame. Frames are another tricky area. It takes some effort to
avoid getting tripped up by domain restrictions, they don't
initialize synchronously, they behave strangely in response to the back
button, and, on several browsers, they can't be moved around the DOM
without re-initializing. They did provide a very nice way to
namespace the library, though: CodeMirror 1 could freely pollute the
namespace inside the frame.</p>

<p>Finally, working with an editable document means working with
selection in arbitrary DOM structures. Internet Explorer (8 and
before) has an utterly different (and awkward) selection API from all
of the other browsers, and even among the different implementations of
<code>document.selection</code>, details about how exactly a selection
is represented vary quite a bit. Add to that the fact that Opera's
selection support tended to be very buggy until recently, and you can
imagine why CodeMirror 1 contains 700 lines of selection-handling
code.</p>

<p>And that brings us to the main issue with the CodeMirror 1
code base: the proportion of browser-bug workarounds to real
application code was getting dangerously high. By building on top of a
few dodgy features, I put the system in a vulnerable position: any
incompatibility or bugginess in these features, I had to paper over
with my own code.
Not only did I have to do some serious stunt-work to
get it to work on older browsers (as detailed in the
previous <a href="http://codemirror.net/story.html">story</a>), but things
also kept breaking in newly released versions, requiring me to come up
with <em>new</em> scary hacks in order to keep up. This was starting
to lose its appeal.</p>

<section id=approach>
<h2>General Approach</h2>

<p>What CodeMirror 2 does is try to sidestep most of the hairy hacks
that came up in version 1. I owe a lot to the
<a href="http://ace.ajax.org">ACE</a> editor for inspiration on how to
approach this.</p>

<p>I absolutely did not want to be completely reliant on key events to
generate my input. Every JavaScript programmer knows that key event
information is horrible and incomplete. Some people (most awesomely
Mihai Bazon with <a href="http://ymacs.org">Ymacs</a>) have been able
to build more or less functioning editors by directly reading key
events, but it takes a lot of work (the kind of never-ending, fragile
work I described earlier), and such an editor will never be able to
properly support things like multi-keystroke international character
input. <a href="#keymap" class="update">[see below for caveat]</a></p>

<p>So what I do is focus a hidden textarea, and let the browser
believe that the user is typing into that. What we show to the user is
a DOM structure we built to represent his document. If this is updated
quickly enough, and shows some kind of believable cursor, it feels
like a real text-input control.</p>

<p>Another big win is that this DOM representation does not have to
span the whole document. Some CodeMirror 1 users insisted that they
needed to put a 30,000-line XML document into CodeMirror. Putting
all that into the DOM takes a while, especially since, for some
reason, an editable DOM tree is slower than a normal one on most
browsers.
If we have full control over what we show, we only need to
ensure that the visible part of the document has been added, and can
do the rest when needed. (Fortunately, the <code>onscroll</code>
event works almost the same on all browsers, and lends itself well to
displaying things only as they are scrolled into view.)</p>
</section>
<section id="input">
<h2>Input</h2>

<p>ACE uses its hidden textarea only as a text input shim, and does
all cursor movement and things like text deletion itself by directly
handling key events. CodeMirror's way is to let the browser do its
thing as much as possible, and not, for example, define its own set of
key bindings. One way to do this would have been to have the whole
document inside the hidden textarea, and after each key event update
the display DOM to reflect what's in that textarea.</p>

<p>That'd be simple, but it is not realistic. For even medium-sized
documents, the editor would constantly be munging huge strings and
get terribly slow. What CodeMirror 2 does is put the current selection,
along with an extra line on the top and on the bottom, into the
textarea.</p>

<p>This means that the arrow keys (and their ctrl-variations), home,
end, etcetera, do not have to be handled specially. We just read the
cursor position in the textarea, and update our cursor to match it.
Also, copy and paste work pretty much for free, and people get their
native key bindings, without any special work on my part. For example,
I have Emacs key bindings configured for Chrome and Firefox. There is
no way for a script to detect this. <a class="update"
href="#keymap">[no longer the case]</a></p>

<p>Of course, since only a small part of the document sits in the
textarea, keys like page up and ctrl-end won't do the right thing.
CodeMirror catches those events and handles them itself.</p>
</section>
<section id="selection">
<h2>Selection</h2>

<p>Getting and setting the selection range of a textarea in modern
browsers is trivial: you just use the <code>selectionStart</code>
and <code>selectionEnd</code> properties. On IE you have to do some
insane stuff with temporary ranges and compensating for the fact that
moving the selection by a 'character' will treat \r\n as a single
character, but even there it is possible to build functions that
reliably set and get the selection range.</p>

<p>But consider this typical case: When I'm somewhere in my document,
press shift, and press the up arrow, something gets selected. Then, if
I, still holding shift, press the up arrow again, the top of my
selection is adjusted. The selection remembers where its <em>head</em>
and its <em>anchor</em> are, and moves the head when we shift-move.
This is a generally accepted property of selections, and done right by
every editing component built in the past twenty years.</p>

<p>But not something that the browser selection APIs expose.</p>

<p>Great. So when someone creates an 'upside-down' selection, the next
time CodeMirror has to update the textarea, it'll re-create the
selection as an 'upside-up' selection, with the anchor at the top, and
the next cursor motion will behave in an unexpected way: our second
up-arrow press in the example above will not do anything, since it is
interpreted in exactly the same way as the first.</p>

<p>No problem.
We'll just, ehm, detect that the selection is
upside-down (you can tell by the way it was created), and then, when
an upside-down selection is present and a cursor-moving key is
pressed in combination with shift, we quickly collapse the selection
in the textarea to its start, allow the key to take effect, and then
combine its new head with its old anchor to get the <em>real</em>
selection.</p>

<p>In short, scary hacks could not be avoided entirely in CodeMirror
2.</p>

<p>And, the observant reader might ask, how do you even know that a
key combo is a cursor-moving combo, if you claim you support any
native key bindings? Well, we don't, but we can learn. The editor
keeps a set of known cursor-movement combos (initialized to the
predictable defaults), and updates this set when it observes that
pressing a certain key had (only) the effect of moving the cursor.
This, of course, doesn't work if the first time the key is used was
for extending an inverted selection, but it works most of the
time.</p>
</section>
<section id="update">
<h2>Intelligent Updating</h2>

<p>One thing that always comes up when you have a complicated internal
state that's reflected in some user-visible external representation
(in this case, the displayed code and the textarea's content) is
keeping the two in sync. The naive way is to just update the display
every time you change your state, but this is not only error-prone
(you'll forget), but also easily leads to duplicate work on big,
composite operations.
Then you start passing around flags indicating
whether the display should be updated in an attempt to be efficient
again and, well, at that point you might as well give up completely.</p>

<p>I did go down that road, but then switched to a much simpler model:
simply keep track of all the things that have been changed during an
action, and then, only at the end, use this information to update the
user-visible display.</p>

<p>CodeMirror uses a concept of <em>operations</em>, which start by
calling a specific set-up function that clears the state and end by
calling another function that reads this state and does the required
updating. Most event handlers, and all the user-visible methods that
change state, are wrapped like this. There's a method
called <code>operation</code> that accepts a function, and returns
another function that wraps the given function as an operation.</p>

<p>It's trivial to extend this (as CodeMirror does) to detect nesting:
when an operation is started inside an operation, simply
increment the nesting count, decrement it when that operation ends,
and only do the updating when this count reaches zero again.</p>

<p>If we have a set of changed ranges and know the currently shown
range, we can (with some awkward code to deal with the fact that
changes can add and remove lines, so we're dealing with a changing
coordinate system) construct a map of the ranges that were left
intact. We can then compare this map with the part of the document
that's currently visible (based on scroll offset and editor height) to
determine whether something needs to be updated.</p>

<p>CodeMirror uses two update algorithms: a full refresh, where it just
discards the whole part of the DOM that contains the edited text and
rebuilds it, and a patch algorithm, where it uses the information
about changed and intact ranges to update only the out-of-date parts
of the DOM.
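</p>

<p>Returning to operations for a moment, the nesting scheme described above can be illustrated with a small sketch. It is not CodeMirror's code: <code>makeOperationHost</code>, <code>markChanged</code> and the <code>render</code> callback are hypothetical names, and a "change" is reduced to a <code>{from, to}</code> line range. Only the outermost operation clears the change log when it starts and hands the collected ranges to <code>render</code> when it ends.</p>

```javascript
// Sketch only: makeOperationHost, markChanged and render are hypothetical
// names, not CodeMirror's internals.
function makeOperationHost(render) {
  let nesting = 0;   // number of operations currently open
  let changes = [];  // changed line ranges recorded by the outermost operation

  function startOperation() {
    // Only the outermost operation clears the change log.
    if (nesting === 0) changes = [];
    nesting += 1;
  }

  function endOperation() {
    nesting -= 1;
    // Inner operations just close; the outermost one updates the display once.
    if (nesting === 0 && changes.length > 0) render(changes.slice());
  }

  // Takes a function and returns a function that runs it as an operation.
  function operation(fn) {
    return function (...args) {
      startOperation();
      try {
        return fn.apply(this, args);
      } finally {
        endOperation();
      }
    };
  }

  // Records that the lines from `from` up to (not including) `to` changed.
  function markChanged(from, to) {
    changes.push({ from, to });
  }

  return { operation, markChanged };
}

// Usage: a wrapped function that calls another wrapped function
// still produces exactly one display update.
const renders = [];
const host = makeOperationHost((ranges) => renders.push(ranges));
const setLine = host.operation((n) => host.markChanged(n, n + 1));
const setTwoLines = host.operation((a, b) => {
  setLine(a); // nested operation: no update yet
  setLine(b);
});
setTwoLines(3, 7);
console.log(renders.length);             // 1
console.log(JSON.stringify(renders[0])); // [{"from":3,"to":4},{"from":7,"to":8}]
```

<p>Because <code>endOperation</code> runs in a <code>finally</code> block, an exception thrown inside a wrapped function still closes the operation, so the nesting count cannot get stuck above zero.</p>

<p>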
When more than 30 percent (which is the current heuristic and
might change) of the lines need to be updated, the full refresh is
chosen (since it's faster to do than painstakingly finding and
updating all the changed lines); otherwise it does the
patching (so that, if you scroll a line or select another character,
the whole screen doesn't have to be
re-rendered). <span class="update">[the full-refresh
algorithm was dropped; it wasn't really faster than the patching
one]</span></p>

<p>All updating uses <code>innerHTML</code> rather than direct DOM
manipulation, since that still seems to be by far the fastest way to
build documents. There's a per-line function that combines the
highlighting, <a href="manual.html#markText">marking</a>, and
selection info for that line into a snippet of HTML. The patch updater
uses this to reset individual lines; the refresh updater builds an
HTML chunk for the whole visible document at once, and then uses a
single <code>innerHTML</code> update to do the refresh.</p>
</section>
<section id="parse">
<h2>Parsers can be Simple</h2>

<p>When I wrote CodeMirror 1, I
thought <a href="http://codemirror.net/story.html#parser">interruptible
parsers</a> were a hugely scary and complicated thing, and I used a
bunch of heavyweight abstractions to keep this supposed complexity
under control: parsers
were <a href="http://bob.pythonmac.org/archives/2005/07/06/iteration-in-javascript/">iterators</a>
that consumed input from another iterator, and used funny
closure-resetting tricks to copy and resume themselves.</p>

<p>This made for a rather nice system, in that parsers formed strictly
separate modules, and could be composed in predictable ways.
Unfortunately, it was quite slow (stacking three or four iterators on
top of each other), and extremely intimidating to people not used to a
functional programming style.</p>

<p>With a few small changes, however, we can keep all those
advantages, but simplify the API and make the whole thing less
indirect and more efficient. CodeMirror
2's <a href="manual.html#modeapi">mode API</a> uses explicit state
objects, and makes the parser/tokenizer a function that simply takes a
state and a character stream abstraction, advances the stream one
token, and returns the way the token should be styled. This state may
be copied, optionally in a mode-defined way, in order to be able to
continue a parse at a given point. Even someone who's never touched a
lambda in his life can understand this approach. Additionally, far
fewer objects are allocated in the course of parsing now.</p>

<p>The biggest speedup, though, comes from the fact that parsing no
longer has to touch the DOM. In CodeMirror 1, on an older browser, you
could <em>see</em> the parser work its way through the document,
managing some twenty lines in each 50-millisecond time slice it got. It
was reading its input from the DOM, and updating the DOM as it went
along, which any experienced JavaScript programmer will immediately
spot as a recipe for slowness. In CodeMirror 2, the parser usually
finishes the whole document in a single 100-millisecond time slice: it
manages some 1500 lines during that time on Chrome. All it has to do
is munge strings, so there is no real reason for it to be slow
anymore.</p>
</section>
<section id="summary">
<h2>What Gives?</h2>

<p>Given all this, what can you expect from CodeMirror 2?</p>

<ul>

<li><strong>Small.</strong> The base library is
some <span class="update">45k</span> when minified
now, <span class="update">17k</span> when gzipped.
It's smaller than
its own logo.</li>

<li><strong>Lightweight.</strong> CodeMirror 2 initializes very
quickly, and does almost no work when it is not focused. This means
you can treat it almost like a textarea, and have multiple instances on a
page without trouble.</li>

<li><strong>Huge document support.</strong> Since highlighting is
really fast, and no DOM structure is being built for non-visible
content, you don't have to worry about locking up your browser when a
user enters a megabyte-sized document.</li>

<li><strong>Extended API.</strong> Some things kept coming up in the
mailing list, such as marking pieces of text or lines, which were
extremely hard to do with CodeMirror 1. The new version has proper
support for these built in.</li>

<li><strong>Tab support.</strong> Tabs inside editable documents were,
for some reason, a no-go. At least six different people announced they
were going to add tab support to CodeMirror 1, but none survived (I mean,
none delivered a working version). CodeMirror 2 no longer removes tabs
from your document.</li>

<li><strong>Sane styling.</strong> <code>iframe</code> nodes aren't
really known for respecting document flow. Now that an editor instance
is a plain <code>div</code> element, it is much easier to size it to
fit the surrounding elements. You don't even have to make it scroll if
you do not <a href="../demo/resize.html">want to</a>.</li>

</ul>

<p>On the downside, a CodeMirror 2 instance is <em>not</em> a native
editable component. Though it does its best to emulate such a
component as much as possible, there is functionality that browsers
just do not allow us to hook into.
Doing select-all from the context
menu, for example, is not currently detected by CodeMirror.</p>

<p id="changes" style="margin-top: 2em;"><span style="font-weight: bold">[Updates from November 13th 2011]</span> Recently, I've made
some changes to the codebase that cause some of the text above to no
longer be current. I've left the text intact, but added markers at the
passages that are now inaccurate. The new situation is described
below.</p>
</section>
<section id="btree">
<h2>Content Representation</h2>

<p>The original implementation of CodeMirror 2 represented the
document as a flat array of line objects. This worked well: splicing
arrays requires the part of the array after the splice to be
moved, but this is basically just a simple <code>memmove</code> of a
bunch of pointers, so it is cheap even for huge documents.</p>

<p>However, I recently added line wrapping and code folding (line
collapsing, basically). Once lines start taking up a non-constant
amount of vertical space, looking up a line by vertical position
(which is needed when someone clicks the document, and to determine
the visible part of the document during scrolling) can only be done
with a linear scan through the whole array, summing up line heights as
you go. Seeing how I've been going out of my way to make big documents
fast, this is not acceptable.</p>

<p>The new representation is based on a B-tree. The leaves of the tree
contain arrays of line objects, with a fixed minimum and maximum size,
and the non-leaf nodes simply hold arrays of child nodes. Each node
stores both the number of lines that live below it and the vertical
space taken up by these lines.
This allows the tree to be indexed both
by line number and by vertical position, and all access has
logarithmic complexity in relation to the document size.</p>

<p>I gave line objects and tree nodes parent pointers to the node
above them. When a line has to update its height, it can simply walk
these pointers to the top of the tree, adding or subtracting the
difference in height from each node it encounters. The parent pointers
also make it cheaper (in complexity terms; the difference is probably
tiny in normal-sized documents) to find the current line number when
given a line object. In the old approach, the whole document array had
to be searched. Now, we can just walk up the tree and count the sizes
of the nodes coming before us at each level.</p>

<p>I chose B-trees, not regular binary trees, mostly because they
allow for very fast bulk insertions and deletions. When there is a big
change to a document, it typically involves adding, deleting, or
replacing a chunk of consecutive lines. In a regular balanced tree, all
these inserts or deletes would have to be done separately, which could
be really expensive. In a B-tree, to insert a chunk, you just walk
down the tree once to find where it should go, insert all the lines in
one shot, and then break up the node if needed. This breaking up might
involve breaking up nodes further up, but only requires a single pass
back up the tree. For deletion, I'm somewhat lax in keeping things
balanced: I just collapse nodes into a leaf when their child count goes
below a given number.
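</p>

<p>The lookup side of such a tree can be sketched in a few dozen lines. This is a simplified illustration, not CodeMirror's implementation: leaves and branches are plain objects, the node shapes and the function names <code>getLine</code>, <code>lineAtHeight</code>, <code>setLineHeight</code> and <code>lineNo</code> are hypothetical, and insertion, node splitting and rebalancing are left out entirely. Each node caches <code>size</code> (lines below it) and <code>height</code>, and lines and nodes carry a <code>parent</code> pointer.</p>

```javascript
// Sketch only: node shapes and function names are hypothetical.
// A leaf holds line objects; a branch holds child nodes. Every node caches
// `size` (number of lines below it) and `height` (their total height).
function makeLeaf(lines) {
  const leaf = { lines, size: lines.length, height: 0, parent: null };
  for (const line of lines) {
    line.parent = leaf;
    leaf.height += line.height;
  }
  return leaf;
}

function makeBranch(children) {
  const node = { children, size: 0, height: 0, parent: null };
  for (const child of children) {
    child.parent = node;
    node.size += child.size;
    node.height += child.height;
  }
  return node;
}

// Find line number n (0-based), skipping whole subtrees by their size.
function getLine(node, n) {
  if (n >= node.size || 0 > n) throw new RangeError("no line " + n);
  while (node.children) {
    let i = 0;
    while (n >= node.children[i].size) {
      n -= node.children[i].size;
      i += 1;
    }
    node = node.children[i];
  }
  return node.lines[n];
}

// Find the line covering vertical offset y, skipping subtrees by height.
// Offsets outside the document clamp to the first or last line.
function lineAtHeight(node, y) {
  while (node.children) {
    let i = 0;
    while (node.children.length - 1 > i && y >= node.children[i].height) {
      y -= node.children[i].height;
      i += 1;
    }
    node = node.children[i];
  }
  let i = 0;
  while (node.lines.length - 1 > i && y >= node.lines[i].height) {
    y -= node.lines[i].height;
    i += 1;
  }
  return node.lines[i];
}

// Change a line's height and fix the cached totals on the way to the root.
function setLineHeight(line, height) {
  const diff = height - line.height;
  line.height = height;
  for (let node = line.parent; node; node = node.parent) node.height += diff;
}

// Recover a line's number by walking up and adding the sizes of the
// siblings that come before us at each level.
function lineNo(line) {
  let node = line.parent;
  let n = node.lines.indexOf(line);
  for (let parent = node.parent; parent; node = parent, parent = parent.parent) {
    for (const child of parent.children) {
      if (child === node) break;
      n += child.size;
    }
  }
  return n;
}

// Usage: two leaves under one branch.
const makeLine = (text, height) => ({ text, height, parent: null });
const doc = makeBranch([
  makeLeaf([makeLine("a", 10), makeLine("b", 10), makeLine("c", 20)]),
  makeLeaf([makeLine("d", 10), makeLine("e", 30)]),
]);
console.log(getLine(doc, 3).text);       // d
console.log(lineAtHeight(doc, 45).text); // d
setLineHeight(getLine(doc, 0), 25);      // line "a" grows by 15
console.log(doc.height);                 // 95
console.log(lineAtHeight(doc, 45).text); // c
console.log(lineNo(getLine(doc, 4)));    // 4
```

<p>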
Because of this lax deletion strategy, there are some weird editing
patterns that may result in a seriously unbalanced tree, but even such
an unbalanced tree will perform well, unless you spend a day making
strangely repeating edits to a really big document.</p>
</section>
<section id="keymap">
<h2>Keymaps</h2>

<p><a href="#approach">Above</a>, I claimed that directly catching key
events for things like cursor movement is impractical because it
requires some browser-specific kludges. I then proceeded to explain
some awful <a href="#selection">hacks</a> that were needed to make it
possible for the selection changes to be detected through the
textarea. In fact, those hacks are about as bad as the kludges they
were meant to avoid.</p>

<p>On top of that, in the presence of user-configurable tab sizes and
collapsed and wrapped lines, lining up cursor movement in the textarea
with what's visible on the screen becomes a nightmare. Thus, I've
decided to stop depending on the textarea's selection.</p>

<p>All cursor movement is now handled by my
own code. This adds support for a goal column, makes cursor movement
interact properly with collapsed lines, and makes it possible for
vertical movement to move through wrapped lines properly, instead of
just treating them like non-wrapped lines.</p>

<p>The key event handlers now translate the key event into a string,
something like <code>Ctrl-Home</code> or <code>Shift-Cmd-R</code>, and
use that string to look up an action to perform.
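</p>

<p>As a rough sketch of that translation and lookup, the code below builds a name from a <code>KeyboardEvent</code>-like object (it only reads the standard <code>key</code>, <code>shiftKey</code>, <code>metaKey</code>, <code>ctrlKey</code> and <code>altKey</code> properties) and resolves it through chained tables. The modifier order, the table contents, the <code>fallthrough</code> property and every command name except <code>goLineUp</code> are assumptions made for illustration; the <code>commands</code> object merely stands in for <code>CodeMirror.commands</code>.</p>

```javascript
// Sketch only: the modifier order, the keymap contents, the `fallthrough`
// property and all command names except "goLineUp" are assumptions.

// Turn a KeyboardEvent-like object into a name such as "Shift-Cmd-R".
function keyName(event) {
  const special = { ArrowUp: "Up", ArrowDown: "Down", ArrowLeft: "Left", ArrowRight: "Right", " ": "Space" };
  let name = special[event.key] || event.key;
  if (name.length === 1) name = name.toUpperCase();
  // Prefix modifiers so that the result reads Shift-Cmd-Ctrl-Alt-Key.
  if (event.altKey) name = "Alt-" + name;
  if (event.ctrlKey) name = "Ctrl-" + name;
  if (event.metaKey) name = "Cmd-" + name;
  if (event.shiftKey) name = "Shift-" + name;
  return name;
}

// Keymaps map key names to command names; `fallthrough` names the next table.
const keyMaps = {
  emacsy: { "Ctrl-F": "goCharRight", "Ctrl-B": "goCharLeft" },
  macDefault: { "Up": "goLineUp", "Cmd-Up": "goDocStart", fallthrough: "emacsy" },
};

// Look a name up in a table, following the chain until something matches.
function lookupKey(name, mapName) {
  if (name === "fallthrough") return null;
  for (let map = keyMaps[mapName]; map; map = keyMaps[map.fallthrough]) {
    if (Object.prototype.hasOwnProperty.call(map, name)) return map[name];
  }
  return null;
}

// Command names resolve to functions, so many keymaps can share them.
// These bodies only log; a real editor would move the cursor.
const commands = {
  goLineUp: (editor) => editor.log.push("up"),
  goCharRight: (editor) => editor.log.push("right"),
  goCharLeft: (editor) => editor.log.push("left"),
  goDocStart: (editor) => editor.log.push("docStart"),
};

// Returns true if the key was handled, false to leave it to the browser.
function handleKey(editor, event, mapName) {
  const bound = lookupKey(keyName(event), mapName);
  // A binding may be a command name or, extraKeys-style, a function.
  const fn = typeof bound === "string" ? commands[bound] : bound;
  if (typeof fn !== "function") return false;
  fn(editor);
  return true;
}

// Usage
const editor = { log: [] };
console.log(keyName({ key: "r", shiftKey: true, metaKey: true })); // Shift-Cmd-R
handleKey(editor, { key: "f", ctrlKey: true }, "macDefault"); // found via 'emacsy'
handleKey(editor, { key: "ArrowUp" }, "macDefault");
console.log(editor.log.join(",")); // right,up
```

<p>Keeping command names as strings in the tables is what lets one command implementation serve several keymaps; only the <code>commands</code> table holds functions.</p>

<p>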
To make keybindings
customizable, the lookup goes through
a <a href="manual.html#option_keyMap">table</a>, using a scheme that
allows such tables to be chained together (for example, the default
Mac bindings fall through to a table named 'emacsy', which defines
basic Emacs-style bindings like <code>Ctrl-F</code>, and which is also
used by the custom Emacs bindings).</p>

<p>A new
option <a href="manual.html#option_extraKeys"><code>extraKeys</code></a>
allows ad-hoc keybindings to be defined in a much nicer way than what
was possible with the
old <a href="manual.html#option_onKeyEvent"><code>onKeyEvent</code></a>
callback. You simply provide an object mapping key identifiers to
functions, instead of painstakingly looking at raw key events.</p>

<p>Keymaps refer to built-in commands by name (a string) rather than
by function; for example, <code>"goLineUp"</code> is the default action
bound to the up arrow key. This allows new keymaps to refer to them without
duplicating any code. New commands can be defined by assigning to
the <code>CodeMirror.commands</code> object, which maps such command
names to functions.</p>

<p>The hidden textarea now only holds the current selection, with no
extra characters around it. This has a nice advantage: polling for
input becomes much, much faster. If there's a big selection, this text
does not have to be read from the textarea every time; when we poll,
just noticing that something is still selected is enough to tell us
that no new text was typed.</p>

<p>The reason that cheap polling is important is that many browsers do
not fire useful events on IME (input method editor) input, which is
the thing where people inputting a language like Japanese or Chinese
use multiple keystrokes to create a character or sequence of
characters.
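</p>

<p>The cheap polling check described above can be sketched as follows, assuming only that the textarea exposes <code>value</code>, <code>selectionStart</code> and <code>selectionEnd</code> (a real <code>HTMLTextAreaElement</code> does). <code>makePoller</code> and its callback are hypothetical names; a real editor would call <code>poll</code> from a timer while it has focus and splice the reported text into the document.</p>

```javascript
// Sketch only: makePoller and its callback are hypothetical names.
// `textarea` needs value, selectionStart and selectionEnd, which a real
// HTMLTextAreaElement provides.
function makePoller(textarea, onInput) {
  let prevInput = textarea.value; // the selected text we put there ourselves

  return function poll() {
    // Typing replaces a selection and collapses it, so while something is
    // still selected, nothing can have been typed: skip reading the value.
    if (textarea.selectionStart !== textarea.selectionEnd) return false;
    const text = textarea.value;
    if (text === prevInput) return false; // cursor moved, text unchanged
    onInput(prevInput, text); // the editor should replace prevInput with text
    prevInput = text;
    return true;
  };
}

// Usage with a stand-in object instead of a real textarea.
const fake = { value: "hello", selectionStart: 0, selectionEnd: 5 };
const seen = [];
const poll = makePoller(fake, (before, after) => seen.push([before, after]));
console.log(poll()); // false
Object.assign(fake, { value: "x", selectionStart: 1, selectionEnd: 1 }); // user typed "x"
console.log(poll()); // true
console.log(JSON.stringify(seen)); // [["hello","x"]]
```

<p>Reporting the previous text together with the new one lets an IME composition, which may rewrite the same characters several times before it is committed, be shown as a series of replacements.</p>

<p>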
Most modern browsers fire <code>input</code> when
composition is finished, but many don't fire anything when the character
is updated <em>during</em> composition. So we poll, whenever the
editor is focused, to provide immediate updates of the display.</p>

</section>
</article>